TOP ACTIVATIONS
MAX = 0.112

mom and their brother and sister . <|endoftext|> S ara was
They are good brother and sister . <|endoftext|> L ily liked
treats and toys with the girls and they all lived happily
, whenever Daisy and her sister look back on that special
mom and dad and his sister and his dog in the
on the little boys and girls always looked up to the
too . Sue and her sister had made their home much
happy to be brother and sister . They did not want
mom and dad and his sister . Ben nodded
on the floor and the sisters fighting . She
, while his brothers and sisters hugged him .
in the shirt and the twins crying . "
sees Sam 's mom and sister . He sees their angry
castle with his brothers and sisters . Then
amazed . Daisy and her sister were so proud of themselves
to the mom and the twins . He said he would
father and your brothers and sisters . They are all le
in the closet and her sister outside . She asked them
. Tim my and his sister quickly ran inside to stay
Max were normal brother and sister again . <|endoftext|> Tim had

MIN ACTIVATIONS
MIN = -0.141

little girl all about how volcan oes work . She taught
. Lily and Ben think volcan oes are cool . They
Lucy noticed the weather was uncom fort ably hot . She
big book with pictures of volcan oes . Vol can oes
but the other animal was ins istent . The
big book with pictures of volcan oes . Vol can oes
the little girl was so ins istent that her mum finally
a family of turtles sun bat hing in a nearby pond
The book has pictures of volcan oes . Vol can oes
teach you something special about volcan oes ." So
pony . She was very ins istent about it and her
learned even more about how volcan oes work . She thought
, but it was too high . So , she got
, there was a happy ostr ich . She was the
There was a gr umpy ranc her named Tom . Tom
, but it was too high for him .
, but it was too high . So he pushed a
, but it was too high for him to reach .
, a pink car named Spe edy liked to go fast
, but it was too high for him . So ,

INTERVAL 0.049 - 0.112
CONTAINS 2.246%

not forgiving . " You are bad boys ,"
bugs and worms , mom ," Ben said . " We
carried them carefully to the room . He put them on
hill . They laughed and had a great time . But
. But she likes her friends and her class . She

INTERVAL -0.014 - 0.049
CONTAINS 69.283%

his success . He celebrated by having a big party .
party and the loud yelling that made it so much fun
a monster ! It was a cat ! Billy
From that day , Daisy went to that hotel
. She thinks the player is broken . She stands up

INTERVAL -0.077 - -0.014
CONTAINS 28.362%

. They knew she wanted them to be happy . They
anyone . One day , she went to the forest with
" We have a modern truck . A truck is a
scared and confused , but he was determined to stay brave
girl named Lily who was going to start school soon .

INTERVAL -0.141 - -0.077
CONTAINS 0.109%

park and look for something y ummy to eat .
legs . He goes very high . He feels happy .
fall . Sp ide y listened to his
berries . They look so y ummy !" Her
. But they are too high for you to reach .
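
NOTE ON THE INTERVALS
The four INTERVAL ranges above appear to be equal-width bins spanning MIN = -0.141 to MAX = 0.112 (width about 0.063), and the CONTAINS shares sum to 100%. The sketch below shows one way such bins and shares could be computed. It is an illustration under that equal-width assumption, not the dashboard's actual code; the names bin_edges and interval_shares are hypothetical.

```python
def bin_edges(lo, hi, n_bins=4):
    """Return the n_bins + 1 edges of equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def interval_shares(activations, n_bins=4):
    """Split [min, max] of the activations into equal-width bins.

    Returns (low, high, percent_of_values) per bin, highest bin first,
    matching the order of the INTERVAL listing.
    """
    lo, hi = min(activations), max(activations)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp the index so the maximum value falls in the last bin.
        idx = min(int((a - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    total = len(activations)
    return [
        (lo + i * width, lo + (i + 1) * width, 100.0 * counts[i] / total)
        for i in reversed(range(n_bins))
    ]


# Edges implied by the rounded MIN and MAX reported above (assumed inputs).
edges = bin_edges(-0.141, 0.112)
```

Because MIN and MAX are shown rounded to three decimals, the computed edges agree with the listed interval boundaries only to within about 0.001.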
Token mom
Token ID1995
Feature activation+4.72e-02

Pos logit contributions
 coloring+0.012
 tea+0.011
 dress+0.011
 blush+0.010
 kit+0.010
Neg logit contributions
urious-0.022
ust-0.020
rated-0.020
uct-0.019
 habitat-0.018
Token and
Token ID290
Feature activation-3.55e-03

Pos logit contributions
 blush+0.081
 coloring+0.076
 Grac+0.074
 piping+0.070
 groom+0.069
Neg logit contributions
urious-0.095
rated-0.077
ust-0.075
uct-0.071
 dangers-0.070
Token their
Token ID511
Feature activation+3.07e-02

Pos logit contributions
urious+0.008
rated+0.007
ust+0.007
uct+0.007
 eternity+0.007
Neg logit contributions
 blush-0.005
 coloring-0.005
 Grac-0.004
 tea-0.004
 dress-0.004
Token brother
Token ID3956
Feature activation+7.88e-02

Pos logit contributions
 coloring+0.040
 blush+0.038
 tea+0.038
 dress+0.037
 groom+0.033
Neg logit contributions
urious-0.076
ust-0.067
rated-0.067
uct-0.064
appointed-0.060
Token and
Token ID290
Feature activation-3.55e-03

Pos logit contributions
 blush+0.138
 coloring+0.131
 Grac+0.123
 tea+0.120
 dress+0.119
Neg logit contributions
urious-0.156
rated-0.127
ust-0.123
uct-0.116
 eternity-0.114
Token sister
Token ID6621
Feature activation+1.12e-01

Pos logit contributions
urious+0.008
rated+0.007
ust+0.007
uct+0.007
 dangers+0.007
Neg logit contributions
 blush-0.005
 coloring-0.004
 Grac-0.004
 piping-0.004
 tea-0.004
Token.
Token ID13
Feature activation+3.10e-02

Pos logit contributions
 blush+0.213
 coloring+0.201
 Grac+0.200
 piping+0.191
acey+0.188
Neg logit contributions
urious-0.197
rated-0.152
ust-0.150
uct-0.142
 dangers-0.139
Token<|endoftext|>
Token ID50256
Feature activation-4.20e-02

Pos logit contributions
 blush+0.047
 coloring+0.045
 Grac+0.042
 tea+0.041
 piping+0.041
Neg logit contributions
urious-0.070
rated-0.059
ust-0.057
uct-0.054
 eternity-0.053
TokenS
Token ID50
Feature activation-1.98e-02

Pos logit contributions
urious+0.085
rated+0.073
ust+0.070
uct+0.065
 eternity+0.064
Neg logit contributions
 coloring-0.071
 blush-0.071
 tea-0.066
 Grac-0.065
 dress-0.064
Tokenara
Token ID3301
Feature activation+1.82e-02

Pos logit contributions
urious+0.040
rated+0.031
ust+0.031
uct+0.030
 eternity+0.029
Neg logit contributions
 blush-0.034
 coloring-0.032
 Grac-0.031
 piping-0.030
 groom-0.030
Token was
Token ID373
Feature activation-1.68e-02

Pos logit contributions
 blush+0.029
 coloring+0.028
 Grac+0.026
 tea+0.026
 dress+0.025
Neg logit contributions
urious-0.038
rated-0.031
ust-0.031
uct-0.029
 eternity-0.028
Token They
Token ID1119
Feature activation+3.63e-02

Pos logit contributions
 blush+0.047
 Grac+0.044
 coloring+0.044
 piping+0.042
 Rose+0.042
Neg logit contributions
urious-0.076
rated-0.065
ust-0.061
uct-0.060
 eternity-0.058
Token are
Token ID389
Feature activation+3.79e-02

Pos logit contributions
 blush+0.048
 coloring+0.045
 Grac+0.043
 piping+0.041
acey+0.039
Neg logit contributions
urious-0.085
rated-0.072
ust-0.071
uct-0.068
 dangers-0.067
Token good
Token ID922
Feature activation+3.61e-02

Pos logit contributions
 blush+0.069
 coloring+0.059
 Grac+0.059
 mascara+0.059
 lipstick+0.058
Neg logit contributions
urious-0.070
ust-0.059
uct-0.056
 dangers-0.055
rated-0.055
Token brother
Token ID3956
Feature activation+6.68e-02

Pos logit contributions
 blush+0.060
 coloring+0.056
 Grac+0.054
 piping+0.052
acey+0.051
Neg logit contributions
urious-0.075
rated-0.060
ust-0.060
 dangers-0.056
uct-0.056
Token and
Token ID290
Feature activation-1.10e-02

Pos logit contributions
 blush+0.118
 coloring+0.111
 Grac+0.107
 piping+0.103
acey+0.101
Neg logit contributions
urious-0.130
rated-0.104
ust-0.103
uct-0.096
 dangers-0.096
Token sister
Token ID6621
Feature activation+1.06e-01

Pos logit contributions
urious+0.023
rated+0.018
ust+0.018
 dangers+0.018
 eternity+0.017
Neg logit contributions
 blush-0.018
 coloring-0.016
 Grac-0.016
 piping-0.016
acey-0.015
Token.
Token ID13
Feature activation+3.25e-02

Pos logit contributions
 blush+0.199
 coloring+0.187
 Grac+0.182
 piping+0.175
 groom+0.172
Neg logit contributions
urious-0.195
rated-0.154
ust-0.151
uct-0.142
 dangers-0.141
Token<|endoftext|>
Token ID50256
Feature activation-4.61e-02

Pos logit contributions
 blush+0.050
 coloring+0.047
 Grac+0.047
 piping+0.045
 tea+0.044
Neg logit contributions
urious-0.071
rated-0.061
ust-0.057
uct-0.055
 eternity-0.054
TokenL
Token ID43
Feature activation+1.84e-02

Pos logit contributions
urious+0.094
rated+0.082
ust+0.078
uct+0.073
 eternity+0.071
Neg logit contributions
 coloring-0.076
 blush-0.076
 tea-0.071
 Grac-0.071
acey-0.069
Tokenily
Token ID813
Feature activation+3.74e-02

Pos logit contributions
 blush+0.032
 coloring+0.030
 Grac+0.029
 groom+0.027
 piping+0.027
Neg logit contributions
urious-0.037
ust-0.029
rated-0.029
uct-0.028
 dangers-0.027
Token liked
Token ID8288
Feature activation-3.96e-02

Pos logit contributions
 blush+0.064
 coloring+0.061
 Grac+0.058
 tea+0.055
 piping+0.055
Neg logit contributions
urious-0.075
rated-0.061
ust-0.059
uct-0.056
 eternity-0.055
Token treats
Token ID18432
Feature activation+9.88e-03

Pos logit contributions
 blush+0.042
 coloring+0.039
 Grac+0.038
 piping+0.036
acey+0.036
Neg logit contributions
urious-0.052
rated-0.043
ust-0.041
 dangers-0.039
 eternity-0.039
Token and
Token ID290
Feature activation-2.91e-02

Pos logit contributions
 blush+0.017
 coloring+0.015
 piping+0.015
 Grac+0.015
acey+0.015
Neg logit contributions
urious-0.018
ust-0.015
rated-0.015
 eternity-0.014
uct-0.014
Token toys
Token ID14958
Feature activation+2.52e-02

Pos logit contributions
urious+0.062
rated+0.053
ust+0.051
 dangers+0.049
 eternity+0.049
Neg logit contributions
 blush-0.043
 coloring-0.040
 Grac-0.038
acey-0.037
 piping-0.037
Token with
Token ID351
Feature activation+1.85e-02

Pos logit contributions
 blush+0.046
 piping+0.043
 Grac+0.042
 coloring+0.042
acey+0.042
Neg logit contributions
urious-0.043
ust-0.036
rated-0.034
 dangers-0.033
 eternity-0.033
Token the
Token ID262
Feature activation+1.17e-02

Pos logit contributions
 blush+0.027
 coloring+0.025
 Grac+0.024
 piping+0.023
 wig+0.023
Neg logit contributions
urious-0.039
ust-0.032
rated-0.031
 dangers-0.031
 eternity-0.030
Token girls
Token ID4813
Feature activation+1.06e-01

Pos logit contributions
 blush+0.020
 coloring+0.018
 Grac+0.018
 piping+0.017
acey+0.017
Neg logit contributions
urious-0.024
rated-0.019
ust-0.019
 dangers-0.018
uct-0.018
Token and
Token ID290
Feature activation-1.89e-02

Pos logit contributions
 blush+0.188
 Grac+0.169
 coloring+0.169
 mascara+0.166
 piping+0.165
Neg logit contributions
urious-0.190
ust-0.155
rated-0.152
 eternity-0.144
uct-0.142
Token they
Token ID484
Feature activation+3.84e-02

Pos logit contributions
urious+0.040
rated+0.033
ust+0.032
 dangers+0.031
 eternity+0.031
Neg logit contributions
 blush-0.028
 coloring-0.026
 Grac-0.025
acey-0.025
 piping-0.025
Token all
Token ID477
Feature activation+1.21e-03

Pos logit contributions
 blush+0.062
 coloring+0.058
 piping+0.055
 Grac+0.055
 wig+0.054
Neg logit contributions
urious-0.077
rated-0.065
ust-0.063
uct-0.059
 dangers-0.059
Token lived
Token ID5615
Feature activation+2.61e-03

Pos logit contributions
 blush+0.002
 coloring+0.002
 Grac+0.002
 lipstick+0.002
acey+0.002
Neg logit contributions
urious-0.002
rated-0.002
ust-0.002
uct-0.002
 eternity-0.002
Token happily
Token ID18177
Feature activation+5.89e-04

Pos logit contributions
 blush+0.004
 coloring+0.004
 Grac+0.004
 piping+0.004
acey+0.004
Neg logit contributions
urious-0.005
rated-0.004
ust-0.004
uct-0.004
 dangers-0.004
Token,
Token ID11
Feature activation+2.51e-02

Pos logit contributions
urious+0.006
rated+0.005
ust+0.005
uct+0.005
 eternity+0.005
Neg logit contributions
 blush-0.004
 coloring-0.003
 Grac-0.003
 piping-0.003
 wig-0.003
Token whenever
Token ID8797
Feature activation+3.90e-03

Pos logit contributions
 blush+0.033
 coloring+0.031
 Grac+0.031
 tea+0.029
 dress+0.028
Neg logit contributions
urious-0.061
rated-0.051
ust-0.049
uct-0.047
 dangers-0.047
Token Daisy
Token ID40355
Feature activation+2.00e-02

Pos logit contributions
 blush+0.005
 Grac+0.005
 coloring+0.005
 tea+0.004
 Rose+0.004
Neg logit contributions
urious-0.010
rated-0.008
ust-0.008
uct-0.008
 habitat-0.008
Token and
Token ID290
Feature activation-7.31e-03

Pos logit contributions
 blush+0.030
 coloring+0.029
 Grac+0.027
 dress+0.026
 groom+0.026
Neg logit contributions
urious-0.044
rated-0.037
ust-0.036
uct-0.034
 dangers-0.034
Token her
Token ID607
Feature activation+6.59e-03

Pos logit contributions
urious+0.018
rated+0.015
ust+0.015
uct+0.014
 dangers+0.014
Neg logit contributions
 blush-0.009
 coloring-0.008
 piping-0.007
 wig-0.007
acey-0.007
Token sister
Token ID6621
Feature activation+1.06e-01

Pos logit contributions
 blush+0.008
 coloring+0.007
 Grac+0.007
 piping+0.006
 groom+0.006
Neg logit contributions
urious-0.017
rated-0.014
ust-0.014
 dangers-0.013
uct-0.013
Token look
Token ID804
Feature activation+2.40e-02

Pos logit contributions
 blush+0.162
 coloring+0.152
 Grac+0.144
 piping+0.138
 tea+0.137
Neg logit contributions
urious-0.230
rated-0.190
ust-0.185
uct-0.177
 dangers-0.174
Token back
Token ID736
Feature activation+5.73e-04

Pos logit contributions
 blush+0.044
 wig+0.041
 groom+0.039
 mascara+0.039
 coloring+0.039
Neg logit contributions
urious-0.043
rated-0.035
ust-0.035
 eternity-0.032
 dangers-0.031
Token on
Token ID319
Feature activation+3.30e-02

Pos logit contributions
 blush+0.001
 coloring+0.001
 Grac+0.001
 tea+0.001
 dress+0.001
Neg logit contributions
urious-0.001
rated-0.001
ust-0.001
uct-0.001
 dangers-0.001
Token that
Token ID326
Feature activation+4.53e-02

Pos logit contributions
 blush+0.053
 coloring+0.048
 Grac+0.046
 wig+0.045
acey+0.045
Neg logit contributions
urious-0.069
rated-0.055
ust-0.054
 eternity-0.053
 dangers-0.053
Token special
Token ID2041
Feature activation+1.32e-02

Pos logit contributions
 blush+0.080
 coloring+0.077
 Grac+0.073
 tea+0.071
 dress+0.071
Neg logit contributions
urious-0.088
ust-0.071
rated-0.070
uct-0.067
 habitat-0.065
Token mom
Token ID1995
Feature activation+2.98e-02

Pos logit contributions
urious+0.075
rated+0.065
ust+0.062
uct+0.061
appointed+0.057
Neg logit contributions
 blush-0.048
 coloring-0.047
 dress-0.044
 tea-0.042
 groom-0.042
Token and
Token ID290
Feature activation-1.95e-02

Pos logit contributions
 blush+0.051
 coloring+0.047
 Grac+0.046
 tea+0.044
 piping+0.044
Neg logit contributions
urious-0.059
rated-0.047
ust-0.045
uct-0.044
 eternity-0.043
Token dad
Token ID9955
Feature activation+4.15e-02

Pos logit contributions
urious+0.027
uct+0.019
 dangers+0.018
rated+0.018
 next+0.018
Neg logit contributions
 blush-0.042
 Grac-0.038
 coloring-0.038
acey-0.038
 tea-0.038
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
 blush+0.067
 Grac+0.062
 coloring+0.061
acey+0.058
 piping+0.058
Neg logit contributions
urious-0.084
rated-0.067
ust-0.066
uct-0.064
 eternity-0.063
Token his
Token ID465
Feature activation+1.10e-03

Pos logit contributions
urious+0.019
 dangers+0.014
rated+0.013
 eternity+0.013
 next+0.013
Neg logit contributions
 blush-0.029
acey-0.026
 Grac-0.026
 coloring-0.026
 piping-0.026
Token sister
Token ID6621
Feature activation+1.04e-01

Pos logit contributions
 blush+0.002
 coloring+0.002
 Grac+0.002
 piping+0.001
 tea+0.001
Neg logit contributions
urious-0.002
rated-0.002
ust-0.002
uct-0.002
 dangers-0.002
Token and
Token ID290
Feature activation-9.74e-03

Pos logit contributions
 blush+0.178
 coloring+0.167
 Grac+0.163
 piping+0.156
acey+0.154
Neg logit contributions
urious-0.206
rated-0.165
ust-0.161
uct-0.154
 eternity-0.151
Token his
Token ID465
Feature activation+1.31e-03

Pos logit contributions
urious+0.016
rated+0.013
 dangers+0.013
 eternity+0.013
 habitat+0.012
Neg logit contributions
 blush-0.017
 piping-0.016
acey-0.016
 Grac-0.015
 coloring-0.015
Token dog
Token ID3290
Feature activation+3.24e-02

Pos logit contributions
 blush+0.002
 coloring+0.002
 Grac+0.002
 piping+0.002
 tea+0.002
Neg logit contributions
urious-0.003
rated-0.002
ust-0.002
uct-0.002
 dangers-0.002
Token in
Token ID287
Feature activation+3.75e-02

Pos logit contributions
 blush+0.056
 coloring+0.053
 Grac+0.050
 tea+0.048
 piping+0.048
Neg logit contributions
urious-0.065
rated-0.053
ust-0.051
uct-0.049
 dangers-0.048
Token the
Token ID262
Feature activation+3.75e-02

Pos logit contributions
 blush+0.059
 coloring+0.055
 Grac+0.052
 piping+0.050
acey+0.050
Neg logit contributions
urious-0.080
rated-0.064
ust-0.063
 dangers-0.061
uct-0.061
Token on
Token ID319
Feature activation+3.77e-03

Pos logit contributions
 blush+0.046
 coloring+0.042
 Grac+0.040
 groom+0.038
 piping+0.038
Neg logit contributions
urious-0.102
rated-0.088
ust-0.086
uct-0.083
 eternity-0.083
Token the
Token ID262
Feature activation-1.15e-02

Pos logit contributions
 blush+0.005
 coloring+0.004
 groom+0.004
 Grac+0.004
 piping+0.004
Neg logit contributions
urious-0.009
rated-0.008
ust-0.008
 eternity-0.007
uct-0.007
Token little
Token ID1310
Feature activation+2.75e-02

Pos logit contributions
urious+0.023
rated+0.019
ust+0.018
uct+0.017
 eternity+0.017
Neg logit contributions
 blush-0.020
 coloring-0.019
 Grac-0.018
 tea-0.018
 dress-0.018
Token boys
Token ID6510
Feature activation+5.88e-02

Pos logit contributions
 blush+0.037
 coloring+0.034
 Grac+0.032
 piping+0.031
acey+0.031
Neg logit contributions
urious-0.065
rated-0.055
ust-0.053
uct-0.051
 eternity-0.051
Token and
Token ID290
Feature activation-2.63e-02

Pos logit contributions
 blush+0.094
 coloring+0.088
 Grac+0.084
 piping+0.081
 tea+0.080
Neg logit contributions
urious-0.124
rated-0.102
ust-0.100
uct-0.095
 dangers-0.093
Token girls
Token ID4813
Feature activation+1.04e-01

Pos logit contributions
urious+0.051
rated+0.043
ust+0.040
 dangers+0.040
 eternity+0.039
Neg logit contributions
 blush-0.044
 coloring-0.040
 piping-0.039
 Grac-0.038
acey-0.038
Token always
Token ID1464
Feature activation+4.75e-03

Pos logit contributions
 blush+0.173
 piping+0.163
 Grac+0.160
 coloring+0.160
 wig+0.159
Neg logit contributions
urious-0.189
rated-0.156
ust-0.154
uct-0.141
 eternity-0.138
Token looked
Token ID3114
Feature activation+1.67e-02

Pos logit contributions
 blush+0.008
 coloring+0.007
 Grac+0.007
 piping+0.007
acey+0.007
Neg logit contributions
urious-0.010
rated-0.008
ust-0.008
 dangers-0.008
uct-0.007
Token up
Token ID510
Feature activation-3.85e-03

Pos logit contributions
 blush+0.028
 coloring+0.026
 Grac+0.025
 piping+0.025
 wig+0.024
Neg logit contributions
urious-0.034
rated-0.028
ust-0.028
 dangers-0.026
 eternity-0.026
Token to
Token ID284
Feature activation+1.92e-02

Pos logit contributions
urious+0.008
rated+0.007
uct+0.007
ust+0.007
 dangers+0.006
Neg logit contributions
 coloring-0.006
 blush-0.005
 Grac-0.005
 tea-0.005
 dress-0.005
Token the
Token ID262
Feature activation-1.69e-03

Pos logit contributions
 blush+0.031
 coloring+0.029
 Grac+0.027
acey+0.027
 piping+0.027
Neg logit contributions
urious-0.040
rated-0.033
ust-0.032
 eternity-0.031
 dangers-0.030
Token too
Token ID1165
Feature activation+1.13e-03

Pos logit contributions
urious+0.004
rated+0.003
ust+0.003
uct+0.003
 eternity+0.003
Neg logit contributions
 blush-0.003
 coloring-0.003
 Grac-0.003
 piping-0.003
 tea-0.003
Token.
Token ID13
Feature activation-1.99e-03

Pos logit contributions
 blush+0.002
 coloring+0.002
 Grac+0.001
 dress+0.001
 tea+0.001
Neg logit contributions
urious-0.003
rated-0.002
uct-0.002
ust-0.002
 dangers-0.002
Token Sue
Token ID26089
Feature activation+1.77e-02

Pos logit contributions
urious+0.005
rated+0.004
ust+0.004
 dangers+0.004
 eternity+0.003
Neg logit contributions
 blush-0.003
 coloring-0.003
 Grac-0.002
 piping-0.002
 tea-0.002
Token and
Token ID290
Feature activation-2.22e-02

Pos logit contributions
 blush+0.031
 coloring+0.029
 Grac+0.028
 piping+0.027
 tea+0.027
Neg logit contributions
urious-0.035
rated-0.028
ust-0.027
uct-0.026
 dangers-0.025
Token her
Token ID607
Feature activation+1.28e-02

Pos logit contributions
urious+0.058
rated+0.050
ust+0.049
uct+0.047
 eternity+0.047
Neg logit contributions
 blush-0.025
 coloring-0.022
 Grac-0.020
 piping-0.020
acey-0.019
Token sister
Token ID6621
Feature activation+1.04e-01

Pos logit contributions
 blush+0.017
 coloring+0.015
 piping+0.015
 Grac+0.014
 wig+0.014
Neg logit contributions
urious-0.029
ust-0.024
rated-0.024
 dangers-0.023
 eternity-0.023
Token had
Token ID550
Feature activation+6.12e-02

Pos logit contributions
 blush+0.174
 coloring+0.165
 Grac+0.162
 piping+0.160
 wig+0.158
Neg logit contributions
urious-0.199
rated-0.160
ust-0.154
uct-0.149
 eternity-0.145
Token made
Token ID925
Feature activation+2.50e-02

Pos logit contributions
 blush+0.103
 wig+0.091
 Grac+0.089
 coloring+0.089
 piping+0.087
Neg logit contributions
urious-0.121
uct-0.098
ust-0.097
 dangers-0.093
rated-0.093
Token their
Token ID511
Feature activation+1.12e-02

Pos logit contributions
 blush+0.043
 wig+0.037
 coloring+0.037
 piping+0.036
 mascara+0.036
Neg logit contributions
urious-0.049
 dangers-0.039
 eternity-0.039
ust-0.038
rated-0.037
Token home
Token ID1363
Feature activation+1.54e-02

Pos logit contributions
 blush+0.020
 coloring+0.020
 tea+0.019
 dress+0.019
 Grac+0.018
Neg logit contributions
urious-0.021
rated-0.018
ust-0.017
uct-0.016
 eternity-0.016
Token much
Token ID881
Feature activation+4.46e-03

Pos logit contributions
 blush+0.024
 coloring+0.021
 Grac+0.021
 piping+0.021
acey+0.021
Neg logit contributions
urious-0.032
rated-0.027
ust-0.027
 eternity-0.025
 dangers-0.025
Token happy
Token ID3772
Feature activation+2.24e-02

Pos logit contributions
 blush+0.068
 coloring+0.062
 Grac+0.061
 piping+0.058
 wig+0.057
Neg logit contributions
urious-0.076
ust-0.062
rated-0.061
uct-0.059
 dangers-0.058
Token to
Token ID284
Feature activation+5.61e-03

Pos logit contributions
 blush+0.039
 coloring+0.037
 Grac+0.035
 piping+0.034
 tea+0.033
Neg logit contributions
urious-0.045
rated-0.036
ust-0.035
uct-0.034
 eternity-0.033
Token be
Token ID307
Feature activation+3.78e-02

Pos logit contributions
 blush+0.009
 coloring+0.008
 Grac+0.008
acey+0.008
 groom+0.008
Neg logit contributions
urious-0.012
rated-0.010
ust-0.009
 eternity-0.009
uct-0.009
Token brother
Token ID3956
Feature activation+6.07e-02

Pos logit contributions
 blush+0.071
 piping+0.064
 Grac+0.062
 mascara+0.060
 coloring+0.060
Neg logit contributions
urious-0.067
ust-0.056
 next-0.053
 dangers-0.053
rated-0.052
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
 blush+0.099
 coloring+0.093
 Grac+0.088
 tea+0.085
 dress+0.085
Neg logit contributions
urious-0.128
rated-0.105
ust-0.102
uct-0.097
 eternity-0.095
Token sister
Token ID6621
Feature activation+1.04e-01

Pos logit contributions
urious+0.020
 next+0.017
 dangers+0.017
ust+0.017
rated+0.016
Neg logit contributions
 blush-0.018
 piping-0.017
acey-0.016
 Grac-0.016
 lipstick-0.015
Token.
Token ID13
Feature activation+4.03e-02

Pos logit contributions
 blush+0.186
 coloring+0.176
 Grac+0.168
 piping+0.162
 tea+0.161
Neg logit contributions
urious-0.201
rated-0.162
ust-0.157
uct-0.149
 dangers-0.146
Token They
Token ID1119
Feature activation+3.54e-02

Pos logit contributions
 blush+0.063
 coloring+0.059
 Grac+0.055
 tea+0.054
 piping+0.054
Neg logit contributions
urious-0.088
rated-0.074
ust-0.072
uct-0.069
 dangers-0.067
Token did
Token ID750
Feature activation+1.55e-02

Pos logit contributions
 blush+0.058
 coloring+0.056
 piping+0.053
 Grac+0.053
 wig+0.050
Neg logit contributions
urious-0.070
rated-0.057
ust-0.057
uct-0.054
 dangers-0.052
Token not
Token ID407
Feature activation-5.88e-02

Pos logit contributions
 blush+0.029
 coloring+0.028
 Grac+0.026
 tea+0.026
 dress+0.025
Neg logit contributions
urious-0.029
rated-0.024
ust-0.023
uct-0.021
 dangers-0.021
Token want
Token ID765
Feature activation-9.08e-03

Pos logit contributions
urious+0.121
rated+0.100
ust+0.097
uct+0.091
 dangers+0.090
Neg logit contributions
 blush-0.097
 coloring-0.093
 Grac-0.089
 piping-0.085
acey-0.084
Token mom
Token ID1995
Feature activation+3.17e-02

Pos logit contributions
urious+0.071
rated+0.059
ust+0.057
uct+0.055
 eternity+0.053
Neg logit contributions
 blush-0.051
 coloring-0.049
 Grac-0.045
 dress-0.044
 tea-0.044
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
 blush+0.049
 coloring+0.046
 Grac+0.043
 dress+0.042
 tea+0.042
Neg logit contributions
urious-0.070
rated-0.058
ust-0.057
uct-0.054
 dangers-0.053
Token dad
Token ID9955
Feature activation+3.68e-02

Pos logit contributions
urious+0.028
rated+0.023
ust+0.022
uct+0.021
 dangers+0.021
Neg logit contributions
 blush-0.023
 coloring-0.021
 Grac-0.020
 piping-0.019
 tea-0.019
Token and
Token ID290
Feature activation-1.33e-02

Pos logit contributions
 blush+0.056
 coloring+0.053
 Grac+0.048
 dress+0.048
 tea+0.047
Neg logit contributions
urious-0.082
rated-0.068
ust-0.067
uct-0.064
 dangers-0.062
Token his
Token ID465
Feature activation-7.63e-03

Pos logit contributions
urious+0.029
rated+0.024
ust+0.023
uct+0.022
 dangers+0.022
Neg logit contributions
 blush-0.020
 coloring-0.019
 Grac-0.018
 piping-0.018
acey-0.017
Token sister
Token ID6621
Feature activation+1.03e-01

Pos logit contributions
urious+0.017
rated+0.014
ust+0.014
uct+0.013
 eternity+0.013
Neg logit contributions
 blush-0.011
 coloring-0.011
 Grac-0.010
 tea-0.010
 dress-0.010
Token.
Token ID13
Feature activation+2.39e-02

Pos logit contributions
 blush+0.168
 coloring+0.160
 dress+0.148
 Grac+0.145
 tea+0.144
Neg logit contributions
urious-0.220
rated-0.180
ust-0.176
uct-0.166
 dangers-0.162
Token
Token ID198
Feature activation-2.51e-03

Pos logit contributions
 blush+0.043
 coloring+0.041
 Grac+0.040
 Rose+0.039
acey+0.038
Neg logit contributions
urious-0.046
rated-0.040
ust-0.037
uct-0.035
 eternity-0.034
Token
Token ID198
Feature activation-1.91e-03

Pos logit contributions
urious+0.006
rated+0.005
ust+0.005
uct+0.004
 dangers+0.004
Neg logit contributions
 blush-0.004
 coloring-0.004
 Grac-0.003
acey-0.003
 piping-0.003
TokenBen
Token ID11696
Feature activation+3.79e-02

Pos logit contributions
urious+0.004
rated+0.004
ust+0.003
 eternity+0.003
uct+0.003
Neg logit contributions
 blush-0.003
 coloring-0.003
 Grac-0.003
acey-0.002
 piping-0.002
Token nodded
Token ID14464
Feature activation+1.55e-02

Pos logit contributions
 blush+0.050
 coloring+0.046
 dress+0.045
 groom+0.043
 wig+0.042
Neg logit contributions
urious-0.089
rated-0.077
ust-0.076
 eternity-0.072
uct-0.071
Token on
Token ID319
Feature activation+6.73e-03

Pos logit contributions
 blush+0.051
 Grac+0.048
 coloring+0.047
 piping+0.046
acey+0.045
Neg logit contributions
urious-0.064
rated-0.052
ust-0.051
uct-0.049
 dangers-0.049
Token the
Token ID262
Feature activation+1.58e-02

Pos logit contributions
 blush+0.013
 coloring+0.012
 piping+0.011
 Grac+0.011
acey+0.011
Neg logit contributions
urious-0.011
 dangers-0.008
 eternity-0.008
 habitat-0.008
 next-0.008
Token floor
Token ID4314
Feature activation+4.13e-02

Pos logit contributions
 blush+0.033
 Grac+0.032
 piping+0.031
 groom+0.030
 coloring+0.030
Neg logit contributions
urious-0.021
rated-0.018
 next-0.017
 dangers-0.016
uct-0.015
Token and
Token ID290
Feature activation-2.51e-02

Pos logit contributions
 blush+0.069
 coloring+0.065
 Grac+0.063
 piping+0.060
 groom+0.059
Neg logit contributions
urious-0.084
rated-0.068
ust-0.067
uct-0.063
 dangers-0.063
Token the
Token ID262
Feature activation+1.47e-02

Pos logit contributions
urious+0.035
 dangers+0.028
uct+0.028
 next+0.027
rated+0.025
Neg logit contributions
 piping-0.047
 blush-0.045
acey-0.044
 gown-0.044
 coloring-0.044
Token sisters
Token ID15153
Feature activation+1.03e-01

Pos logit contributions
acey+0.019
 Grac+0.019
 blush+0.019
 piping+0.019
 Stacy+0.018
Neg logit contributions
urious-0.028
 next-0.025
 dangers-0.025
rated-0.024
 earthquake-0.023
Token fighting
Token ID4330
Feature activation+2.23e-03

Pos logit contributions
 blush+0.189
 coloring+0.178
 Grac+0.178
 piping+0.172
 groom+0.170
Neg logit contributions
urious-0.188
rated-0.147
ust-0.144
uct-0.137
 dangers-0.134
Token.
Token ID13
Feature activation+3.00e-02

Pos logit contributions
 blush+0.004
 Grac+0.004
 piping+0.004
 coloring+0.004
 groom+0.004
Neg logit contributions
urious-0.004
rated-0.003
ust-0.003
uct-0.003
 dangers-0.003
Token
Token ID198
Feature activation+7.56e-03

Pos logit contributions
 blush+0.042
 coloring+0.039
 Grac+0.037
 piping+0.035
 tea+0.035
Neg logit contributions
urious-0.070
rated-0.059
ust-0.058
uct-0.055
 eternity-0.054
Token
Token ID198
Feature activation+8.52e-03

Pos logit contributions
 blush+0.013
 coloring+0.012
 Grac+0.012
 piping+0.011
 tea+0.011
Neg logit contributions
urious-0.015
rated-0.012
ust-0.012
uct-0.011
 eternity-0.011
TokenShe
Token ID3347
Feature activation+1.58e-02

Pos logit contributions
 blush+0.015
 coloring+0.014
 Grac+0.013
 tea+0.013
 piping+0.013
Neg logit contributions
urious-0.017
ust-0.013
rated-0.013
uct-0.013
 dangers-0.013
Token,
Token ID11
Feature activation+3.13e-02

Pos logit contributions
 blush+0.033
 piping+0.030
 coloring+0.030
acey+0.029
 Grac+0.029
Neg logit contributions
urious-0.031
rated-0.026
ust-0.025
 eternity-0.024
 dangers-0.023
Token while
Token ID981
Feature activation-7.98e-03

Pos logit contributions
 blush+0.049
 coloring+0.046
 Grac+0.044
 piping+0.043
acey+0.042
Neg logit contributions
urious-0.066
rated-0.055
ust-0.053
uct-0.051
 eternity-0.051
Token his
Token ID465
Feature activation-3.65e-02

Pos logit contributions
urious+0.017
rated+0.014
ust+0.013
uct+0.013
 dangers+0.012
Neg logit contributions
 blush-0.013
 coloring-0.013
 Grac-0.012
 tea-0.011
 piping-0.011
Token brothers
Token ID9397
Feature activation+6.13e-02

Pos logit contributions
urious+0.087
uct+0.080
rated+0.080
ust+0.078
appointed+0.073
Neg logit contributions
 coloring-0.047
 blush-0.045
 tea-0.044
 dress-0.044
 Grac-0.042
Token and
Token ID290
Feature activation-2.42e-02

Pos logit contributions
 blush+0.090
 coloring+0.086
 Grac+0.082
 piping+0.081
 tea+0.077
Neg logit contributions
urious-0.132
rated-0.110
ust-0.108
uct-0.103
 eternity-0.101
Token sisters
Token ID15153
Feature activation+1.03e-01

Pos logit contributions
urious+0.044
rated+0.037
 dangers+0.036
 habitats+0.034
 habitat+0.034
Neg logit contributions
 blush-0.042
 piping-0.038
acey-0.038
 coloring-0.037
 Grac-0.036
Token hugged
Token ID41130
Feature activation+1.90e-02

Pos logit contributions
 blush+0.146
 piping+0.141
 coloring+0.138
 Grac+0.137
acey+0.132
Neg logit contributions
urious-0.205
rated-0.175
ust-0.173
uct-0.166
 eternity-0.160
Token him
Token ID683
Feature activation-5.83e-03

Pos logit contributions
 blush+0.036
 coloring+0.034
 Grac+0.033
 piping+0.032
 tea+0.032
Neg logit contributions
urious-0.034
rated-0.027
ust-0.026
uct-0.025
 eternity-0.025
Token.
Token ID13
Feature activation+1.89e-03

Pos logit contributions
urious+0.012
rated+0.010
ust+0.009
uct+0.009
 dangers+0.009
Neg logit contributions
 blush-0.010
 coloring-0.009
 Grac-0.009
 tea-0.009
 dress-0.009
Token
Token ID198
Feature activation-7.58e-03

Pos logit contributions
 blush+0.003
 coloring+0.003
 Grac+0.003
 piping+0.003
 tea+0.003
Neg logit contributions
urious-0.004
rated-0.003
ust-0.003
uct-0.003
 dangers-0.003
Token
Token ID198
Feature activation-5.91e-03

Pos logit contributions
urious+0.017
rated+0.014
ust+0.014
 habitat+0.013
 dangers+0.013
Neg logit contributions
 blush-0.011
 coloring-0.011
 Grac-0.011
 piping-0.010
acey-0.010
Token in
Token ID287
Feature activation+2.34e-02

Pos logit contributions
 blush+0.014
 coloring+0.013
 Grac+0.012
 tea+0.012
 dress+0.012
Neg logit contributions
urious-0.020
rated-0.016
ust-0.016
uct-0.015
 eternity-0.015
Token the
Token ID262
Feature activation+2.54e-02

Pos logit contributions
 blush+0.038
 coloring+0.036
 Grac+0.034
acey+0.033
 piping+0.033
Neg logit contributions
urious-0.048
rated-0.038
 dangers-0.037
ust-0.037
uct-0.036
Token shirt
Token ID10147
Feature activation+3.07e-02

Pos logit contributions
 blush+0.058
 coloring+0.054
 Grac+0.054
acey+0.053
 groom+0.052
Neg logit contributions
urious-0.035
rated-0.026
 dangers-0.026
 habitat-0.024
 territory-0.023
Token and
Token ID290
Feature activation-1.27e-02

Pos logit contributions
 blush+0.054
 coloring+0.051
 Grac+0.050
 piping+0.048
 groom+0.047
Neg logit contributions
urious-0.059
rated-0.048
ust-0.047
uct-0.044
 eternity-0.043
Token the
Token ID262
Feature activation+1.99e-02

Pos logit contributions
urious+0.023
 dangers+0.018
rated+0.017
uct+0.017
 eternity+0.016
Neg logit contributions
 blush-0.022
acey-0.021
 coloring-0.021
 Grac-0.021
 piping-0.021
Token twins
Token ID20345
Feature activation+1.02e-01

Pos logit contributions
 blush+0.030
 Grac+0.028
 coloring+0.028
acey+0.027
 piping+0.027
Neg logit contributions
urious-0.039
 dangers-0.034
rated-0.032
 habitat-0.030
 property-0.030
Token crying
Token ID13774
Feature activation+2.43e-02

Pos logit contributions
 blush+0.183
 coloring+0.173
 Grac+0.162
 dress+0.158
 tea+0.157
Neg logit contributions
urious-0.201
rated-0.164
ust-0.158
uct-0.148
 dangers-0.146
Token.
Token ID13
Feature activation+4.37e-02

Pos logit contributions
 blush+0.048
 coloring+0.046
 Grac+0.045
 piping+0.044
 groom+0.043
Neg logit contributions
urious-0.042
rated-0.034
ust-0.032
 dangers-0.030
uct-0.029
Token
Token ID198
Feature activation+1.17e-02

Pos logit contributions
 blush+0.069
 coloring+0.065
 Grac+0.062
 piping+0.059
 tea+0.059
Neg logit contributions
urious-0.094
rated-0.079
ust-0.076
uct-0.073
 eternity-0.071
Token
Token ID198
Feature activation+1.35e-02

Pos logit contributions
 blush+0.019
 coloring+0.018
 Grac+0.017
 piping+0.016
 dress+0.016
Neg logit contributions
urious-0.025
rated-0.021
ust-0.020
uct-0.019
 eternity-0.019
Token"
Token ID1
Feature activation+6.41e-02

Pos logit contributions
 blush+0.022
 coloring+0.020
 Grac+0.019
 piping+0.019
 tea+0.019
Neg logit contributions
urious-0.029
rated-0.023
ust-0.023
uct-0.022
 dangers-0.022
Token sees
Token ID7224
Feature activation+2.36e-03

Pos logit contributions
urious+0.037
rated+0.032
ust+0.031
uct+0.029
 eternity+0.029
Neg logit contributions
 blush-0.028
 coloring-0.025
 dress-0.024
 tea-0.024
 te-0.024
Token Sam
Token ID3409
Feature activation+7.01e-03

Pos logit contributions
 blush+0.004
 coloring+0.004
 Grac+0.004
 piping+0.004
 groom+0.004
Neg logit contributions
urious-0.005
rated-0.004
ust-0.004
 dangers-0.004
uct-0.003
Token's
Token ID338
Feature activation+2.34e-02

Pos logit contributions
 blush+0.010
 coloring+0.010
 dress+0.009
 tea+0.009
 Grac+0.009
Neg logit contributions
urious-0.016
rated-0.013
ust-0.013
uct-0.012
 habitat-0.012
Token mom
Token ID1995
Feature activation+3.19e-02

Pos logit contributions
 blush+0.034
 coloring+0.033
 dress+0.030
 tea+0.030
 Grac+0.029
Neg logit contributions
urious-0.054
rated-0.046
ust-0.045
uct-0.044
appointed-0.041
Token and
Token ID290
Feature activation-7.55e-03

Pos logit contributions
 blush+0.052
 coloring+0.050
 dress+0.045
 Grac+0.045
 tea+0.045
Neg logit contributions
urious-0.068
rated-0.057
ust-0.056
uct-0.052
 eternity-0.051
Token sister
Token ID6621
Feature activation+1.02e-01

Pos logit contributions
urious+0.016
rated+0.013
ust+0.012
uct+0.012
 dangers+0.012
Neg logit contributions
 blush-0.012
 coloring-0.012
 Grac-0.011
 piping-0.011
 tea-0.011
Token.
Token ID13
Feature activation+3.75e-02

Pos logit contributions
 blush+0.175
 coloring+0.166
 dress+0.153
 Grac+0.152
 tea+0.151
Neg logit contributions
urious-0.210
rated-0.173
ust-0.169
uct-0.157
 eternity-0.155
Token He
Token ID679
Feature activation-6.84e-03

Pos logit contributions
 blush+0.075
 coloring+0.071
 Grac+0.068
 dress+0.066
 Rose+0.066
Neg logit contributions
urious-0.065
rated-0.054
ust-0.051
uct-0.048
 eternity-0.047
Token sees
Token ID7224
Feature activation+6.10e-03

Pos logit contributions
urious+0.015
rated+0.012
ust+0.012
uct+0.011
 eternity+0.011
Neg logit contributions
 blush-0.011
 coloring-0.010
 Grac-0.010
 tea-0.009
 dress-0.009
Token their
Token ID511
Feature activation+1.17e-02

Pos logit contributions
 blush+0.011
 coloring+0.010
 Grac+0.010
 groom+0.010
 piping+0.010
Neg logit contributions
urious-0.011
 dangers-0.009
rated-0.008
ust-0.008
 eternity-0.008
Token angry
Token ID7954
Feature activation+1.08e-02

Pos logit contributions
 blush+0.017
 coloring+0.017
 dress+0.015
 tea+0.015
 Grac+0.014
Neg logit contributions
urious-0.027
rated-0.023
ust-0.023
uct-0.022
appointed-0.021
Token castle
Token ID16669
Feature activation-1.69e-03

Pos logit contributions
 blush+0.011
 coloring+0.010
 Grac+0.010
 piping+0.009
acey+0.009
Neg logit contributions
urious-0.013
rated-0.011
ust-0.011
uct-0.010
 dangers-0.010
Token with
Token ID351
Feature activation+1.15e-02

Pos logit contributions
urious+0.004
rated+0.003
ust+0.003
uct+0.003
 dangers+0.003
Neg logit contributions
 blush-0.002
 coloring-0.002
 dress-0.002
 Grac-0.002
 tea-0.002
Token his
Token ID465
Feature activation-1.28e-02

Pos logit contributions
 blush+0.020
acey+0.018
 coloring+0.018
 Grac+0.018
 kit+0.018
Neg logit contributions
urious-0.022
 dangers-0.017
ust-0.017
 habitat-0.017
rated-0.017
Token brothers
Token ID9397
Feature activation+5.91e-02

Pos logit contributions
urious+0.026
rated+0.021
ust+0.021
 dangers+0.020
uct+0.020
Neg logit contributions
 blush-0.021
 coloring-0.020
 Grac-0.019
 piping-0.018
 tea-0.018
Token and
Token ID290
Feature activation-2.04e-02

Pos logit contributions
 blush+0.106
 coloring+0.099
 Grac+0.097
 piping+0.093
 tea+0.092
Neg logit contributions
urious-0.112
rated-0.090
ust-0.087
uct-0.083
 eternity-0.082
Token sisters
Token ID15153
Feature activation+1.02e-01

Pos logit contributions
urious+0.036
 dangers+0.030
rated+0.029
 habitat+0.029
 habitats+0.028
Neg logit contributions
 blush-0.035
acey-0.032
 piping-0.031
 Grac-0.031
 coloring-0.030
Token.
Token ID13
Feature activation+4.30e-03

Pos logit contributions
 blush+0.196
 Grac+0.182
 coloring+0.179
 piping+0.176
acey+0.175
Neg logit contributions
urious-0.176
rated-0.138
ust-0.133
uct-0.127
 eternity-0.125
Token
Token ID220
Feature activation-1.25e-02

Pos logit contributions
 blush+0.007
 Grac+0.007
 coloring+0.007
 Rose+0.006
acey+0.006
Neg logit contributions
urious-0.009
rated-0.008
ust-0.007
 dangers-0.007
uct-0.007
Token
Token ID198
Feature activation-1.33e-02

Pos logit contributions
urious+0.031
rated+0.028
 dangers+0.026
 eternity+0.025
 habitat+0.025
Neg logit contributions
 coloring-0.014
acey-0.013
 Grac-0.013
 blush-0.013
 Rose-0.013
Token
Token ID198
Feature activation-1.16e-02

Pos logit contributions
urious+0.027
rated+0.022
ust+0.021
uct+0.020
 dangers+0.020
Neg logit contributions
 blush-0.023
 coloring-0.022
 Grac-0.021
 piping-0.020
 tea-0.020
TokenThen
Token ID6423
Feature activation+1.06e-02

Pos logit contributions
urious+0.023
rated+0.020
ust+0.018
 eternity+0.018
 habitat+0.018
Neg logit contributions
 blush-0.019
 coloring-0.018
 Grac-0.017
 groom-0.017
acey-0.017
Token amazed
Token ID24562
Feature activation-8.97e-03

Pos logit contributions
 blush+0.028
 Grac+0.025
 coloring+0.024
 wig+0.024
acey+0.024
Neg logit contributions
urious-0.036
rated-0.029
ust-0.029
uct-0.028
 eternity-0.028
Token.
Token ID13
Feature activation+8.65e-04

Pos logit contributions
urious+0.018
rated+0.015
ust+0.014
uct+0.014
appointed+0.014
Neg logit contributions
 blush-0.014
 coloring-0.014
 Grac-0.013
 dress-0.013
 groom-0.013
Token Daisy
Token ID40355
Feature activation+1.77e-02

Pos logit contributions
 blush+0.001
 Grac+0.001
 coloring+0.001
 Rose+0.001
 piping+0.001
Neg logit contributions
urious-0.002
rated-0.002
ust-0.002
 dangers-0.001
uct-0.001
Token and
Token ID290
Feature activation-1.86e-02

Pos logit contributions
 blush+0.032
 coloring+0.030
 piping+0.028
 tea+0.028
 Grac+0.028
Neg logit contributions
urious-0.033
rated-0.026
ust-0.025
uct-0.024
 habitat-0.023
Token her
Token ID607
Feature activation+6.38e-03

Pos logit contributions
urious+0.045
rated+0.038
ust+0.038
uct+0.036
 dangers+0.035
Neg logit contributions
 blush-0.025
 coloring-0.023
 Grac-0.022
 groom-0.021
 dress-0.020
Token sister
Token ID6621
Feature activation+1.01e-01

Pos logit contributions
 blush+0.010
 coloring+0.009
 Grac+0.009
 piping+0.009
 groom+0.008
Neg logit contributions
urious-0.014
rated-0.011
ust-0.011
uct-0.010
 dangers-0.010
Token were
Token ID547
Feature activation+4.12e-02

Pos logit contributions
 blush+0.166
 wig+0.158
 coloring+0.156
acey+0.156
 Grac+0.156
Neg logit contributions
urious-0.193
rated-0.149
ust-0.145
uct-0.141
 habitat-0.137
Token so
Token ID523
Feature activation+1.03e-02

Pos logit contributions
 blush+0.071
 wig+0.064
 mascara+0.063
 lipstick+0.062
 kit+0.061
Neg logit contributions
urious-0.079
uct-0.065
ust-0.065
rated-0.061
 eternity-0.060
Token proud
Token ID6613
Feature activation+2.32e-02

Pos logit contributions
 blush+0.018
 coloring+0.016
 wig+0.016
 kit+0.016
 gown+0.015
Neg logit contributions
urious-0.021
ust-0.017
rated-0.017
uct-0.016
appointed-0.015
Token of
Token ID286
Feature activation-3.75e-03

Pos logit contributions
 blush+0.037
 coloring+0.035
 Grac+0.033
 piping+0.032
 tea+0.032
Neg logit contributions
urious-0.049
rated-0.040
ust-0.040
uct-0.038
 dangers-0.037
Token themselves
Token ID2405
Feature activation+2.94e-02

Pos logit contributions
urious+0.009
rated+0.008
ust+0.008
uct+0.007
 dangers+0.007
Neg logit contributions
 blush-0.005
 coloring-0.005
 Grac-0.004
 tea-0.004
 piping-0.004
Token to
Token ID284
Feature activation+1.31e-02

Pos logit contributions
urious+0.000
rated+0.000
uct+0.000
 habitat+0.000
appointed+0.000
Neg logit contributions
 blush-0.000
 coloring-0.000
 tea-0.000
 Rose-0.000
 dress-0.000
Token the
Token ID262
Feature activation+4.46e-03

Pos logit contributions
 blush+0.018
 coloring+0.016
 Grac+0.016
 dress+0.015
 tea+0.015
Neg logit contributions
urious-0.032
rated-0.027
ust-0.027
uct-0.025
appointed-0.024
Token mom
Token ID1995
Feature activation+3.40e-02

Pos logit contributions
 blush+0.007
 coloring+0.007
 dress+0.006
 tea+0.006
 groom+0.006
Neg logit contributions
urious-0.010
rated-0.008
ust-0.008
uct-0.008
 eternity-0.007
Token and
Token ID290
Feature activation-7.55e-03

Pos logit contributions
 blush+0.057
 coloring+0.053
 Grac+0.050
 tea+0.049
 dress+0.049
Neg logit contributions
urious-0.071
rated-0.058
ust-0.056
uct-0.053
 dangers-0.052
Token the
Token ID262
Feature activation+1.16e-02

Pos logit contributions
urious+0.016
rated+0.013
ust+0.013
uct+0.012
 dangers+0.012
Neg logit contributions
 blush-0.012
 coloring-0.011
 Grac-0.011
 piping-0.011
acey-0.010
Token twins
Token ID20345
Feature activation+1.01e-01

Pos logit contributions
 blush+0.019
 coloring+0.018
 Grac+0.017
 tea+0.016
 dress+0.016
Neg logit contributions
urious-0.024
rated-0.020
ust-0.019
uct-0.018
 eternity-0.018
Token.
Token ID13
Feature activation+3.94e-02

Pos logit contributions
 blush+0.169
 coloring+0.162
 Grac+0.149
 dress+0.148
 tea+0.147
Neg logit contributions
urious-0.209
rated-0.171
ust-0.164
uct-0.155
 dangers-0.154
Token He
Token ID679
Feature activation-6.95e-03

Pos logit contributions
 blush+0.082
 coloring+0.078
 Grac+0.076
 piping+0.073
 dress+0.073
Neg logit contributions
urious-0.066
rated-0.053
ust-0.049
uct-0.046
 dangers-0.044
Token said
Token ID531
Feature activation+3.02e-02

Pos logit contributions
urious+0.016
rated+0.013
ust+0.013
uct+0.012
 eternity+0.012
Neg logit contributions
 blush-0.010
 coloring-0.009
 Grac-0.009
 dress-0.009
 groom-0.009
Token he
Token ID339
Feature activation-3.43e-02

Pos logit contributions
 blush+0.052
 coloring+0.049
 Grac+0.047
 piping+0.045
 tea+0.045
Neg logit contributions
urious-0.061
rated-0.050
ust-0.048
uct-0.046
 eternity-0.045
Token would
Token ID561
Feature activation+3.83e-03

Pos logit contributions
urious+0.077
rated+0.065
ust+0.062
 eternity+0.061
uct+0.060
Neg logit contributions
 blush-0.051
 coloring-0.048
 dress-0.044
 Grac-0.044
acey-0.044
Token father
Token ID2988
Feature activation+1.31e-02

Pos logit contributions
 blush+0.007
 coloring+0.006
 Grac+0.006
 piping+0.006
 tea+0.006
Neg logit contributions
urious-0.008
rated-0.007
ust-0.007
uct-0.006
 dangers-0.006
Token and
Token ID290
Feature activation-1.19e-02

Pos logit contributions
 blush+0.020
 coloring+0.019
 Grac+0.017
 tea+0.017
 piping+0.017
Neg logit contributions
urious-0.029
rated-0.024
ust-0.024
uct-0.023
 dangers-0.022
Token your
Token ID534
Feature activation+2.41e-02

Pos logit contributions
urious+0.025
rated+0.020
ust+0.020
 dangers+0.019
 habitat+0.019
Neg logit contributions
 blush-0.018
 coloring-0.017
 Grac-0.016
 piping-0.016
acey-0.016
Token brothers
Token ID9397
Feature activation+6.70e-02

Pos logit contributions
 blush+0.041
 coloring+0.038
 Grac+0.036
 piping+0.035
acey+0.035
Neg logit contributions
urious-0.049
rated-0.039
ust-0.038
 dangers-0.037
 eternity-0.036
Token and
Token ID290
Feature activation-1.24e-02

Pos logit contributions
 blush+0.106
 coloring+0.100
 Grac+0.094
 tea+0.090
 piping+0.090
Neg logit contributions
urious-0.144
rated-0.118
ust-0.115
uct-0.110
 dangers-0.108
Token sisters
Token ID15153
Feature activation+1.01e-01

Pos logit contributions
urious+0.026
rated+0.021
ust+0.020
 dangers+0.020
 habitat+0.019
Neg logit contributions
 blush-0.020
 coloring-0.019
 Grac-0.018
 piping-0.017
acey-0.017
Token.
Token ID13
Feature activation+2.87e-02

Pos logit contributions
 blush+0.168
 coloring+0.157
 Grac+0.151
 piping+0.144
 tea+0.143
Neg logit contributions
urious-0.206
rated-0.169
ust-0.165
uct-0.156
 dangers-0.154
Token They
Token ID1119
Feature activation+3.13e-02

Pos logit contributions
 blush+0.050
 coloring+0.047
 Grac+0.046
 piping+0.044
 tea+0.043
Neg logit contributions
urious-0.058
rated-0.047
ust-0.045
uct-0.043
 dangers-0.042
Token are
Token ID389
Feature activation+3.08e-02

Pos logit contributions
 blush+0.040
 coloring+0.037
 Grac+0.034
 piping+0.034
 tea+0.032
Neg logit contributions
urious-0.075
rated-0.064
ust-0.063
 dangers-0.060
uct-0.059
Token all
Token ID477
Feature activation+5.39e-03

Pos logit contributions
 blush+0.055
 mascara+0.048
 coloring+0.046
 Grac+0.046
 lipstick+0.046
Neg logit contributions
urious-0.054
 dangers-0.047
ust-0.046
 next-0.045
uct-0.043
Token le
Token ID443
Feature activation-3.80e-02

Pos logit contributions
 blush+0.010
 mascara+0.008
acey+0.008
 Stacy+0.008
 coloring+0.008
Neg logit contributions
urious-0.009
 dangers-0.008
ust-0.008
rated-0.008
 next-0.008
Token in
Token ID287
Feature activation+2.83e-02

Pos logit contributions
 blush+0.065
 coloring+0.060
 Grac+0.058
 piping+0.056
 tea+0.055
Neg logit contributions
urious-0.086
rated-0.071
ust-0.069
uct-0.066
 dangers-0.065
Token the
Token ID262
Feature activation+3.58e-02

Pos logit contributions
 blush+0.044
 coloring+0.042
 Grac+0.039
 piping+0.038
 groom+0.038
Neg logit contributions
urious-0.060
rated-0.048
ust-0.047
 dangers-0.046
uct-0.046
Token closet
Token ID21615
Feature activation+4.88e-02

Pos logit contributions
 blush+0.050
 groom+0.050
 piping+0.049
 Grac+0.048
 coloring+0.045
Neg logit contributions
urious-0.076
 dangers-0.061
 next-0.060
 habitat-0.058
 property-0.058
Token and
Token ID290
Feature activation-1.32e-02

Pos logit contributions
 blush+0.079
 coloring+0.074
 Grac+0.071
 piping+0.068
 tea+0.068
Neg logit contributions
urious-0.102
rated-0.084
ust-0.082
uct-0.078
 dangers-0.077
Token her
Token ID607
Feature activation+2.88e-02

Pos logit contributions
urious+0.020
 dangers+0.016
 next+0.016
uct+0.016
rated+0.014
Neg logit contributions
 piping-0.024
 blush-0.023
acey-0.022
 coloring-0.022
 tea-0.022
Token sister
Token ID6621
Feature activation+1.01e-01

Pos logit contributions
 blush+0.038
 Grac+0.035
 coloring+0.035
 piping+0.034
 groom+0.033
Neg logit contributions
urious-0.064
 dangers-0.052
rated-0.051
ust-0.051
 habitat-0.050
Token outside
Token ID2354
Feature activation+3.83e-02

Pos logit contributions
 blush+0.176
 coloring+0.166
 Grac+0.158
 piping+0.152
 tea+0.152
Neg logit contributions
urious-0.200
rated-0.162
ust-0.157
uct-0.149
 dangers-0.146
Token.
Token ID13
Feature activation+3.44e-02

Pos logit contributions
 blush+0.066
 coloring+0.062
 Grac+0.059
 dress+0.057
 tea+0.057
Neg logit contributions
urious-0.078
rated-0.064
ust-0.063
uct-0.059
 dangers-0.058
Token She
Token ID1375
Feature activation+1.61e-02

Pos logit contributions
 blush+0.052
 coloring+0.049
 groom+0.047
 piping+0.046
 Grac+0.046
Neg logit contributions
urious-0.075
rated-0.059
ust-0.059
uct-0.057
 dangers-0.056
Token asked
Token ID1965
Feature activation+2.16e-02

Pos logit contributions
 blush+0.028
 coloring+0.026
 piping+0.025
 tea+0.025
 Grac+0.025
Neg logit contributions
urious-0.032
ust-0.024
uct-0.024
rated-0.023
 dangers-0.023
Token them
Token ID606
Feature activation-1.45e-02

Pos logit contributions
 blush+0.029
 coloring+0.027
 piping+0.026
 groom+0.025
 wig+0.025
Neg logit contributions
urious-0.050
rated-0.041
ust-0.041
 dangers-0.040
 eternity-0.039
Token.
Token ID13
Feature activation+1.75e-03

Pos logit contributions
 blush+0.003
 coloring+0.003
 dress+0.003
 Grac+0.003
 tea+0.003
Neg logit contributions
urious-0.005
rated-0.004
ust-0.004
uct-0.004
 dangers-0.004
Token Tim
Token ID5045
Feature activation-5.24e-03

Pos logit contributions
 blush+0.003
 coloring+0.003
 Grac+0.003
 lipstick+0.002
 tea+0.002
Neg logit contributions
urious-0.004
rated-0.003
ust-0.003
 dangers-0.003
 habitats-0.003
Tokenmy
Token ID1820
Feature activation+2.85e-02

Pos logit contributions
urious+0.011
rated+0.008
ust+0.008
uct+0.008
 dangers+0.007
Neg logit contributions
 blush-0.009
 coloring-0.009
 Grac-0.008
 piping-0.008
 tea-0.008
Token and
Token ID290
Feature activation-2.55e-02

Pos logit contributions
 blush+0.047
 coloring+0.044
 Grac+0.042
 piping+0.041
 tea+0.040
Neg logit contributions
urious-0.059
rated-0.048
ust-0.047
uct-0.045
 dangers-0.044
Token his
Token ID465
Feature activation-6.87e-03

Pos logit contributions
urious+0.058
rated+0.049
ust+0.048
uct+0.045
 habitat+0.045
Neg logit contributions
 blush-0.038
 coloring-0.036
 Grac-0.035
 dress-0.032
 Rose-0.032
Token sister
Token ID6621
Feature activation+1.01e-01

Pos logit contributions
urious+0.017
ust+0.014
rated+0.014
 dangers+0.013
uct+0.013
Neg logit contributions
 blush-0.009
 coloring-0.008
 Grac-0.007
 piping-0.007
 groom-0.007
Token quickly
Token ID2952
Feature activation+1.57e-02

Pos logit contributions
 blush+0.165
 coloring+0.157
 Grac+0.151
 piping+0.149
 tea+0.144
Neg logit contributions
urious-0.207
rated-0.171
ust-0.166
uct-0.156
 eternity-0.153
Token ran
Token ID4966
Feature activation+6.19e-03

Pos logit contributions
 blush+0.026
 coloring+0.025
 Grac+0.024
 piping+0.024
acey+0.023
Neg logit contributions
urious-0.032
rated-0.026
ust-0.025
uct-0.024
 dangers-0.024
Token inside
Token ID2641
Feature activation+3.86e-02

Pos logit contributions
 blush+0.010
 piping+0.010
 coloring+0.009
acey+0.009
 Grac+0.009
Neg logit contributions
urious-0.012
rated-0.010
ust-0.010
 eternity-0.010
 dangers-0.010
Token to
Token ID284
Feature activation+5.89e-03

Pos logit contributions
 blush+0.059
 coloring+0.055
 Grac+0.052
 piping+0.051
 groom+0.050
Neg logit contributions
urious-0.084
rated-0.070
ust-0.068
uct-0.065
 eternity-0.065
Token stay
Token ID2652
Feature activation-1.28e-02

Pos logit contributions
 blush+0.008
 coloring+0.007
 Grac+0.007
 dress+0.007
 tea+0.007
Neg logit contributions
urious-0.014
rated-0.012
ust-0.012
uct-0.011
 dangers-0.011
Token Max
Token ID5436
Feature activation+4.40e-02

Pos logit contributions
urious+0.034
rated+0.029
ust+0.028
uct+0.027
 dangers+0.026
Neg logit contributions
 blush-0.020
 coloring-0.019
 Grac-0.018
 tea-0.017
 piping-0.017
Token were
Token ID547
Feature activation+3.98e-02

Pos logit contributions
 blush+0.075
 piping+0.072
 coloring+0.072
 Grac+0.071
 lipstick+0.067
Neg logit contributions
urious-0.081
rated-0.069
ust-0.063
 habitat-0.060
 dangers-0.060
Token normal
Token ID3487
Feature activation+2.12e-02

Pos logit contributions
 blush+0.074
 mascara+0.064
 Grac+0.064
 piping+0.063
 lipstick+0.063
Neg logit contributions
urious-0.068
 dangers-0.057
ust-0.057
 next-0.056
uct-0.055
Token brother
Token ID3956
Feature activation+5.46e-02

Pos logit contributions
 blush+0.039
 coloring+0.036
 Grac+0.035
 piping+0.034
acey+0.033
Neg logit contributions
urious-0.040
rated-0.032
ust-0.032
 dangers-0.030
 eternity-0.030
Token and
Token ID290
Feature activation-7.84e-03

Pos logit contributions
 blush+0.099
 coloring+0.093
 Grac+0.091
 piping+0.089
 groom+0.087
Neg logit contributions
urious-0.102
ust-0.081
rated-0.081
 dangers-0.076
 eternity-0.075
Token sister
Token ID6621
Feature activation+1.01e-01

Pos logit contributions
urious+0.015
rated+0.013
ust+0.012
 dangers+0.012
 eternity+0.012
Neg logit contributions
 blush-0.013
 piping-0.012
 coloring-0.012
 Grac-0.012
acey-0.011
Token again
Token ID757
Feature activation-1.70e-03

Pos logit contributions
 blush+0.196
 Grac+0.183
 coloring+0.181
 piping+0.179
acey+0.172
Neg logit contributions
urious-0.170
rated-0.132
ust-0.130
 dangers-0.123
 habitat-0.121
Token.
Token ID13
Feature activation+1.74e-02

Pos logit contributions
urious+0.003
rated+0.003
ust+0.003
uct+0.003
appointed+0.003
Neg logit contributions
 blush-0.003
 coloring-0.003
 Grac-0.003
 tea-0.003
 dress-0.003
Token<|endoftext|>
Token ID50256
Feature activation-3.74e-02

Pos logit contributions
 blush+0.030
 coloring+0.028
 Grac+0.027
 piping+0.026
 tea+0.026
Neg logit contributions
urious-0.035
rated-0.029
ust-0.028
uct-0.027
 dangers-0.026
TokenTim
Token ID14967
Feature activation-2.52e-03

Pos logit contributions
urious+0.086
rated+0.076
ust+0.073
uct+0.068
appointed+0.067
Neg logit contributions
 coloring-0.052
 blush-0.052
 tea-0.049
 Grac-0.049
 Rose-0.047
Token had
Token ID550
Feature activation+4.17e-02

Pos logit contributions
urious+0.005
rated+0.004
ust+0.004
uct+0.004
 dangers+0.004
Neg logit contributions
 blush-0.004
 coloring-0.004
 Grac-0.004
 piping-0.004
 tea-0.004
Token little
Token ID1310
Feature activation-9.20e-03

Pos logit contributions
urious+0.092
uct+0.079
ust+0.079
rated+0.078
 habitat+0.075
Neg logit contributions
 coloring-0.050
 Grac-0.050
 blush-0.049
 tea-0.049
 dress-0.045
Token girl
Token ID2576
Feature activation+4.70e-02

Pos logit contributions
urious+0.022
rated+0.019
ust+0.018
 dangers+0.017
 eternity+0.017
Neg logit contributions
 blush-0.013
 coloring-0.011
 wig-0.010
 piping-0.010
acey-0.010
Token all
Token ID477
Feature activation-2.48e-02

Pos logit contributions
 blush+0.090
 Grac+0.085
 piping+0.084
acey+0.082
 wig+0.082
Neg logit contributions
urious-0.084
ust-0.061
 dangers-0.059
rated-0.058
uct-0.057
Token about
Token ID546
Feature activation-1.26e-02

Pos logit contributions
urious+0.051
rated+0.041
ust+0.040
uct+0.038
 dangers+0.038
Neg logit contributions
 blush-0.042
 coloring-0.039
 Grac-0.038
 piping-0.037
acey-0.037
Token how
Token ID703
Feature activation+2.38e-02

Pos logit contributions
urious+0.029
rated+0.024
ust+0.023
uct+0.022
appointed+0.021
Neg logit contributions
 blush-0.018
 coloring-0.018
 Grac-0.016
 tea-0.016
 dress-0.016
Token volcan
Token ID17516
Feature activation-1.37e-01

Pos logit contributions
 coloring+0.027
 blush+0.026
 Grac+0.024
 tea+0.024
 groom+0.022
Neg logit contributions
urious-0.059
rated-0.053
ust-0.050
uct-0.050
 habitat-0.049
Tokenoes
Token ID3028
Feature activation-2.56e-02

Pos logit contributions
urious+0.258
uct+0.213
rated+0.203
 eternity+0.199
 habitat+0.194
Neg logit contributions
 coloring-0.229
 blush-0.229
 Grac-0.208
 piping-0.203
 tea-0.200
Token work
Token ID670
Feature activation+3.12e-03

Pos logit contributions
urious+0.063
rated+0.053
ust+0.052
uct+0.052
 eternity+0.049
Neg logit contributions
 blush-0.033
 coloring-0.031
 Grac-0.028
 dress-0.028
 tea-0.028
Token.
Token ID13
Feature activation-7.61e-03

Pos logit contributions
 blush+0.005
 coloring+0.005
 Grac+0.004
 tea+0.004
 dress+0.004
Neg logit contributions
urious-0.007
rated-0.006
ust-0.005
uct-0.005
 eternity-0.005
Token She
Token ID1375
Feature activation-6.40e-03

Pos logit contributions
urious+0.016
rated+0.013
ust+0.013
uct+0.012
 eternity+0.012
Neg logit contributions
 blush-0.012
 coloring-0.012
 Grac-0.011
 piping-0.011
 groom-0.011
Token taught
Token ID7817
Feature activation-1.49e-02

Pos logit contributions
urious+0.014
rated+0.012
ust+0.012
uct+0.011
 eternity+0.011
Neg logit contributions
 blush-0.009
 coloring-0.009
 Grac-0.008
 tea-0.008
 dress-0.008
Token.
Token ID13
Feature activation-2.30e-03

Pos logit contributions
urious+0.043
rated+0.035
ust+0.034
uct+0.033
 eternity+0.031
Neg logit contributions
 blush-0.037
 coloring-0.035
 Grac-0.033
 tea-0.032
 dress-0.032
Token Lily
Token ID20037
Feature activation+3.13e-02

Pos logit contributions
urious+0.006
rated+0.005
ust+0.005
 dangers+0.005
uct+0.005
Neg logit contributions
 blush-0.003
 Grac-0.002
 coloring-0.002
 tea-0.002
 Rose-0.002
Token and
Token ID290
Feature activation-3.70e-02

Pos logit contributions
 blush+0.050
 coloring+0.048
 Grac+0.044
 dress+0.043
 tea+0.043
Neg logit contributions
urious-0.066
rated-0.055
ust-0.054
uct-0.051
 eternity-0.050
Token Ben
Token ID3932
Feature activation+4.23e-02

Pos logit contributions
urious+0.092
rated+0.078
ust+0.076
uct+0.073
 dangers+0.072
Neg logit contributions
 blush-0.047
 coloring-0.043
 Grac-0.040
 tea-0.038
 piping-0.038
Token think
Token ID892
Feature activation-3.87e-02

Pos logit contributions
 blush+0.068
 coloring+0.065
 Grac+0.063
 piping+0.060
 tea+0.059
Neg logit contributions
urious-0.087
rated-0.073
ust-0.069
uct-0.065
 dangers-0.065
Token volcan
Token ID17516
Feature activation-1.35e-01

Pos logit contributions
urious+0.097
rated+0.082
ust+0.080
uct+0.077
appointed+0.074
Neg logit contributions
 blush-0.049
 coloring-0.047
 tea-0.043
 Grac-0.043
 dress-0.041
Tokenoes
Token ID3028
Feature activation-3.38e-02

Pos logit contributions
urious+0.226
rated+0.175
uct+0.169
ust+0.162
 eternity+0.160
Neg logit contributions
 blush-0.273
 coloring-0.263
 Grac-0.247
 tea-0.241
 piping-0.240
Token are
Token ID389
Feature activation+1.18e-02

Pos logit contributions
urious+0.091
rated+0.078
ust+0.076
uct+0.074
 eternity+0.073
Neg logit contributions
 blush-0.035
 coloring-0.032
 Grac-0.029
 dress-0.028
 tea-0.028
Token cool
Token ID3608
Feature activation+6.36e-03

Pos logit contributions
 blush+0.018
 coloring+0.017
 tea+0.016
 Grac+0.016
 dress+0.015
Neg logit contributions
urious-0.026
rated-0.022
ust-0.021
uct-0.021
 eternity-0.020
Token.
Token ID13
Feature activation-3.14e-03

Pos logit contributions
 blush+0.010
 coloring+0.010
 tea+0.009
 Grac+0.009
 dress+0.009
Neg logit contributions
urious-0.014
rated-0.011
ust-0.011
uct-0.011
 eternity-0.010
Token They
Token ID1119
Feature activation+1.19e-02

Pos logit contributions
urious+0.006
rated+0.005
ust+0.005
uct+0.005
 dangers+0.004
Neg logit contributions
 blush-0.006
 coloring-0.005
 Grac-0.005
 piping-0.005
 groom-0.005
Token Lucy
Token ID22162
Feature activation-2.74e-03

Pos logit contributions
 blush+0.029
 coloring+0.028
acey+0.026
 wig+0.026
 groom+0.026
Neg logit contributions
urious-0.037
rated-0.030
ust-0.029
 eternity-0.029
uct-0.028
Token noticed
Token ID6810
Feature activation-3.38e-02

Pos logit contributions
urious+0.006
rated+0.005
ust+0.005
uct+0.005
 eternity+0.005
Neg logit contributions
 blush-0.004
 coloring-0.004
 Grac-0.003
 dress-0.003
 tea-0.003
Token the
Token ID262
Feature activation-3.08e-02

Pos logit contributions
urious+0.082
rated+0.069
ust+0.068
uct+0.065
 dangers+0.064
Neg logit contributions
 blush-0.044
 coloring-0.041
 Grac-0.038
 piping-0.036
 tea-0.036
Token weather
Token ID6193
Feature activation-4.60e-02

Pos logit contributions
urious+0.063
rated+0.051
ust+0.050
uct+0.048
 eternity+0.047
Neg logit contributions
 blush-0.053
 coloring-0.050
 Grac-0.047
 dress-0.046
 tea-0.046
Token was
Token ID373
Feature activation-1.24e-02

Pos logit contributions
urious+0.103
ust+0.085
rated+0.083
uct+0.079
 habitat+0.079
Neg logit contributions
 blush-0.072
 coloring-0.068
 dress-0.062
 tea-0.061
 Grac-0.059
Token uncom
Token ID8820
Feature activation-1.34e-01

Pos logit contributions
urious+0.026
rated+0.021
ust+0.021
uct+0.020
 eternity+0.019
Neg logit contributions
 blush-0.020
 coloring-0.020
 tea-0.018
 Grac-0.018
 dress-0.018
Tokenfort
Token ID3319
Feature activation-9.72e-02

Pos logit contributions
urious+0.254
rated+0.205
 dangers+0.202
ust+0.200
 eternity+0.192
Neg logit contributions
 blush-0.238
 coloring-0.222
 Grac-0.208
 lipstick-0.205
 dress-0.203
Tokenably
Token ID1346
Feature activation-1.76e-02

Pos logit contributions
urious+0.169
uct+0.128
 dangers+0.126
 eternity+0.123
rated+0.123
Neg logit contributions
 blush-0.174
 coloring-0.173
 dress-0.162
 groom-0.159
acey-0.157
Token hot
Token ID3024
Feature activation-4.12e-02

Pos logit contributions
urious+0.038
ust+0.030
rated+0.030
uct+0.030
 eternity+0.028
Neg logit contributions
 blush-0.028
 coloring-0.027
 dress-0.025
 tea-0.025
 Grac-0.024
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
urious+0.098
ust+0.081
rated+0.079
uct+0.076
appointed+0.073
Neg logit contributions
 blush-0.061
 coloring-0.059
 tea-0.056
 dress-0.053
 lipstick-0.051
Token She
Token ID1375
Feature activation-1.62e-02

Pos logit contributions
urious+0.042
rated+0.037
ust+0.036
 dangers+0.034
uct+0.034
Neg logit contributions
 blush-0.019
 coloring-0.016
 Grac-0.016
 dress-0.015
 Rose-0.014
Token big
Token ID1263
Feature activation-1.71e-03

Pos logit contributions
 blush+0.011
acey+0.010
 Grac+0.010
 piping+0.009
 Stacy+0.009
Neg logit contributions
urious-0.010
 dangers-0.008
ust-0.008
 next-0.008
 risky-0.008
Token book
Token ID1492
Feature activation-5.71e-02

Pos logit contributions
urious+0.003
 earthquake+0.003
rated+0.003
 eternity+0.003
 dangers+0.003
Neg logit contributions
 blush-0.003
acey-0.002
 Grac-0.002
 Stacy-0.002
 groom-0.002
Token with
Token ID351
Feature activation-8.21e-03

Pos logit contributions
urious+0.124
rated+0.102
ust+0.099
uct+0.095
 dangers+0.093
Neg logit contributions
 blush-0.090
 coloring-0.085
 Grac-0.079
 dress-0.076
 tea-0.076
Token pictures
Token ID5986
Feature activation-1.01e-02

Pos logit contributions
urious+0.017
 dangers+0.014
rated+0.014
ust+0.014
 habitats+0.013
Neg logit contributions
 blush-0.012
 Grac-0.011
 coloring-0.011
acey-0.011
 piping-0.011
Token of
Token ID286
Feature activation+4.11e-03

Pos logit contributions
urious+0.019
rated+0.015
ust+0.015
 eternity+0.015
uct+0.015
Neg logit contributions
 blush-0.017
 Grac-0.017
 coloring-0.016
 groom-0.016
 piping-0.016
Token volcan
Token ID17516
Feature activation-1.34e-01

Pos logit contributions
 blush+0.007
acey+0.006
 Grac+0.006
 coloring+0.006
 piping+0.006
Neg logit contributions
urious-0.008
 dangers-0.006
ust-0.006
 eternity-0.006
 habitats-0.006
Tokenoes
Token ID3028
Feature activation-1.84e-02

Pos logit contributions
urious+0.234
uct+0.194
rated+0.189
 eternity+0.178
 dangers+0.172
Neg logit contributions
 blush-0.251
 coloring-0.250
 tea-0.223
 piping-0.218
 lipstick-0.215
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
urious+0.039
rated+0.032
ust+0.031
uct+0.030
 dangers+0.028
Neg logit contributions
 blush-0.030
 coloring-0.029
 Grac-0.027
 dress-0.026
 tea-0.026
Token Vol
Token ID4709
Feature activation-9.22e-02

Pos logit contributions
urious+0.028
rated+0.024
ust+0.023
 dangers+0.022
 eternity+0.022
Neg logit contributions
 blush-0.019
 coloring-0.018
 Grac-0.017
 piping-0.017
 tea-0.016
Tokencan
Token ID5171
Feature activation-2.52e-02

Pos logit contributions
urious+0.229
rated+0.194
ust+0.193
 dangers+0.187
 eternity+0.186
Neg logit contributions
 coloring-0.109
 blush-0.104
acey-0.103
 kit-0.097
 Grac-0.096
Tokenoes
Token ID3028
Feature activation-3.76e-02

Pos logit contributions
urious+0.056
rated+0.047
uct+0.046
ust+0.044
appointed+0.043
Neg logit contributions
 coloring-0.037
 blush-0.036
 Grac-0.034
 dress-0.033
 kit-0.033
Token but
Token ID475
Feature activation-6.31e-02

Pos logit contributions
 blush+0.022
 coloring+0.020
 Grac+0.019
 piping+0.018
 tea+0.018
Neg logit contributions
urious-0.034
rated-0.029
ust-0.028
uct-0.027
 dangers-0.026
Token the
Token ID262
Feature activation-3.53e-02

Pos logit contributions
urious+0.163
rated+0.140
ust+0.137
 habitat+0.131
rig+0.126
Neg logit contributions
 blush-0.074
 coloring-0.067
 Grac-0.064
 Rose-0.064
 dress-0.064
Token other
Token ID584
Feature activation+1.55e-04

Pos logit contributions
urious+0.081
rated+0.068
ust+0.067
uct+0.064
 eternity+0.063
Neg logit contributions
 blush-0.050
 coloring-0.047
 Grac-0.044
 piping-0.043
 tea-0.043
Token animal
Token ID5044
Feature activation-2.87e-02

Pos logit contributions
 blush+0.000
 coloring+0.000
 dress+0.000
 tea+0.000
 groom+0.000
Neg logit contributions
urious-0.000
ust-0.000
rated-0.000
uct-0.000
appointed-0.000
Token was
Token ID373
Feature activation-1.36e-02

Pos logit contributions
urious+0.061
rated+0.050
ust+0.048
uct+0.046
 habitat+0.046
Neg logit contributions
 blush-0.045
 coloring-0.043
 dress-0.041
 Grac-0.039
 tea-0.039
Token ins
Token ID1035
Feature activation-1.34e-01

Pos logit contributions
urious+0.029
rated+0.024
ust+0.023
uct+0.022
 dangers+0.022
Neg logit contributions
 blush-0.022
 coloring-0.021
 Grac-0.020
 piping-0.019
 tea-0.019
Tokenistent
Token ID7609
Feature activation-6.31e-02

Pos logit contributions
urious+0.222
rated+0.184
 dangers+0.183
 habitats+0.176
 habitat+0.175
Neg logit contributions
 blush-0.249
acey-0.237
 coloring-0.234
 Grac-0.220
ula-0.220
Token.
Token ID13
Feature activation-2.99e-02

Pos logit contributions
urious+0.155
rated+0.129
 habitat+0.125
ust+0.122
 habitats+0.122
Neg logit contributions
 coloring-0.078
 blush-0.076
 dress-0.068
 tea-0.066
 kit-0.065
Token
Token ID198
Feature activation-2.83e-02

Pos logit contributions
urious+0.071
rated+0.064
ust+0.061
 dangers+0.059
 eternity+0.056
Neg logit contributions
 blush-0.038
 Grac-0.036
 coloring-0.035
 Rose-0.034
 piping-0.032
Token
Token ID198
Feature activation-2.52e-02

Pos logit contributions
urious+0.061
rated+0.052
ust+0.050
 dangers+0.047
uct+0.047
Neg logit contributions
 blush-0.042
 coloring-0.040
 Grac-0.039
 piping-0.037
acey-0.037
TokenThe
Token ID464
Feature activation-4.89e-02

Pos logit contributions
urious+0.051
rated+0.044
ust+0.041
 eternity+0.041
 habitat+0.040
Neg logit contributions
 blush-0.039
 coloring-0.036
 Grac-0.036
Rose-0.035
 dress-0.034
Token big
Token ID1263
Feature activation-3.50e-04

Pos logit contributions
 blush+0.013
acey+0.011
 Grac+0.011
 piping+0.011
 Stacy+0.011
Neg logit contributions
urious-0.012
 dangers-0.009
ust-0.009
 next-0.009
 risky-0.009
Token book
Token ID1492
Feature activation-5.37e-02

Pos logit contributions
urious+0.001
 earthquake+0.001
rated+0.001
 eternity+0.001
 dangers+0.001
Neg logit contributions
 blush-0.001
acey-0.000
 Grac-0.000
 groom-0.000
 Stacy-0.000
Token with
Token ID351
Feature activation-5.39e-03

Pos logit contributions
urious+0.115
rated+0.094
ust+0.091
uct+0.087
 dangers+0.086
Neg logit contributions
 blush-0.086
 coloring-0.082
 Grac-0.076
 dress-0.074
 tea-0.074
Token pictures
Token ID5986
Feature activation-7.57e-03

Pos logit contributions
urious+0.011
 dangers+0.009
rated+0.009
ust+0.009
 eternity+0.009
Neg logit contributions
 blush-0.008
 Grac-0.007
 coloring-0.007
acey-0.007
 piping-0.007
Token of
Token ID286
Feature activation+5.11e-03

Pos logit contributions
urious+0.014
rated+0.011
ust+0.011
uct+0.011
 eternity+0.011
Neg logit contributions
 blush-0.013
 Grac-0.012
 coloring-0.012
 groom-0.012
 piping-0.012
Token volcan
Token ID17516
Feature activation-1.32e-01

Pos logit contributions
 blush+0.009
 Grac+0.008
acey+0.008
 coloring+0.008
 piping+0.008
Neg logit contributions
urious-0.009
 dangers-0.008
ust-0.007
 eternity-0.007
 habitats-0.007
Tokenoes
Token ID3028
Feature activation-1.57e-02

Pos logit contributions
urious+0.231
uct+0.192
rated+0.187
 eternity+0.176
 dangers+0.171
Neg logit contributions
 blush-0.245
 coloring-0.244
 tea-0.216
 piping-0.212
 lipstick-0.208
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
urious+0.033
rated+0.027
ust+0.026
uct+0.025
 dangers+0.024
Neg logit contributions
 blush-0.026
 coloring-0.025
 Grac-0.023
 dress-0.023
 tea-0.022
Token Vol
Token ID4709
Feature activation-8.92e-02

Pos logit contributions
urious+0.024
rated+0.021
ust+0.020
 dangers+0.019
 eternity+0.019
Neg logit contributions
 blush-0.016
 coloring-0.015
 Grac-0.014
 piping-0.014
 tea-0.014
Tokencan
Token ID5171
Feature activation-2.35e-02

Pos logit contributions
urious+0.220
rated+0.185
ust+0.185
 dangers+0.180
 eternity+0.178
Neg logit contributions
 coloring-0.106
 blush-0.102
acey-0.101
 kit-0.095
 Grac-0.095
Tokenoes
Token ID3028
Feature activation-3.69e-02

Pos logit contributions
urious+0.053
rated+0.044
uct+0.044
ust+0.042
 eternity+0.041
Neg logit contributions
 coloring-0.034
 blush-0.033
 Grac-0.031
 dress-0.030
 tea-0.030
Token the
Token ID262
Feature activation-3.11e-02

Pos logit contributions
urious+0.098
rated+0.084
ust+0.080
 habitat+0.077
 dangers+0.076
Neg logit contributions
 blush-0.057
 coloring-0.053
 Grac-0.050
 groom-0.047
 dress-0.046
Token little
Token ID1310
Feature activation-2.01e-03

Pos logit contributions
urious+0.072
ust+0.060
rated+0.060
uct+0.059
 eternity+0.057
Neg logit contributions
 blush-0.043
 coloring-0.041
 dress-0.040
 tea-0.039
 Grac-0.038
Token girl
Token ID2576
Feature activation+4.28e-02

Pos logit contributions
urious+0.005
rated+0.004
ust+0.004
 dangers+0.003
uct+0.003
Neg logit contributions
 blush-0.003
 coloring-0.003
 piping-0.003
 wig-0.003
acey-0.002
Token was
Token ID373
Feature activation+5.73e-04

Pos logit contributions
 blush+0.071
 coloring+0.067
 Grac+0.064
 dress+0.061
 tea+0.061
Neg logit contributions
urious-0.088
rated-0.072
ust-0.070
uct-0.066
 dangers-0.066
Token so
Token ID523
Feature activation-6.61e-03

Pos logit contributions
 blush+0.001
 coloring+0.001
 Grac+0.001
 piping+0.001
 tea+0.001
Neg logit contributions
urious-0.001
ust-0.001
rated-0.001
uct-0.001
 dangers-0.001
Token ins
Token ID1035
Feature activation-1.32e-01

Pos logit contributions
urious+0.016
rated+0.014
 habitat+0.013
 habitats+0.013
 dangers+0.013
Neg logit contributions
 blush-0.009
 coloring-0.008
 Grac-0.008
 groom-0.007
 dress-0.007
Tokenistent
Token ID7609
Feature activation-6.30e-02

Pos logit contributions
urious+0.227
rated+0.185
 dangers+0.184
 territory+0.176
 habitat+0.176
Neg logit contributions
 blush-0.246
acey-0.233
 coloring-0.230
 Grac-0.226
ula-0.213
Token that
Token ID326
Feature activation-9.35e-03

Pos logit contributions
urious+0.141
rated+0.117
ust+0.111
 habitat+0.108
uct+0.106
Neg logit contributions
 blush-0.097
 coloring-0.091
 Grac-0.083
 dress-0.082
 tea-0.081
Token her
Token ID607
Feature activation-3.13e-02

Pos logit contributions
urious+0.027
rated+0.024
ust+0.023
 habitat+0.022
uct+0.022
Neg logit contributions
 blush-0.008
 coloring-0.008
 Grac-0.007
 tea-0.006
 dress-0.006
Token mum
Token ID25682
Feature activation+4.88e-03

Pos logit contributions
urious+0.073
rated+0.063
ust+0.062
uct+0.062
 eternity+0.058
Neg logit contributions
 blush-0.042
 coloring-0.039
 dress-0.039
 tea-0.036
 groom-0.035
Token finally
Token ID3443
Feature activation-1.88e-02

Pos logit contributions
 blush+0.007
 coloring+0.007
 Grac+0.006
 dress+0.006
acey+0.006
Neg logit contributions
urious-0.011
rated-0.009
ust-0.009
 habitat-0.008
 dangers-0.008
Token a
Token ID257
Feature activation+1.65e-02

Pos logit contributions
urious+0.097
rated+0.086
ust+0.083
uct+0.078
 eternity+0.076
Neg logit contributions
 blush-0.042
 coloring-0.039
 Grac-0.035
 tea-0.035
 dress-0.034
Token family
Token ID1641
Feature activation+1.51e-03

Pos logit contributions
 blush+0.031
 coloring+0.028
 Grac+0.028
acey+0.028
 piping+0.027
Neg logit contributions
urious-0.030
rated-0.023
ust-0.023
 dangers-0.022
 earthquake-0.021
Token of
Token ID286
Feature activation+7.18e-03

Pos logit contributions
 blush+0.002
 coloring+0.002
 Grac+0.002
 piping+0.002
acey+0.002
Neg logit contributions
urious-0.003
rated-0.003
ust-0.002
uct-0.002
 eternity-0.002
Token turtles
Token ID36288
Feature activation-3.42e-02

Pos logit contributions
 blush+0.013
 coloring+0.012
 Grac+0.011
 piping+0.011
acey+0.011
Neg logit contributions
urious-0.014
rated-0.011
ust-0.011
 dangers-0.010
uct-0.010
Token sun
Token ID4252
Feature activation-2.60e-02

Pos logit contributions
urious+0.076
rated+0.064
ust+0.062
uct+0.059
 eternity+0.057
Neg logit contributions
 blush-0.052
 coloring-0.050
 Grac-0.046
 tea-0.045
 dress-0.045
Tokenbat
Token ID8664
Feature activation-1.30e-01

Pos logit contributions
urious+0.053
rated+0.043
ust+0.042
uct+0.040
 dangers+0.039
Neg logit contributions
 blush-0.045
 coloring-0.043
 Grac-0.040
 dress-0.039
 tea-0.039
Tokenhing
Token ID722
Feature activation-3.16e-02

Pos logit contributions
urious+0.312
rated+0.258
ust+0.257
 dangers+0.252
 habitat+0.249
Neg logit contributions
 blush-0.171
 coloring-0.159
 Grac-0.148
 piping-0.139
acey-0.139
Token in
Token ID287
Feature activation+1.25e-02

Pos logit contributions
urious+0.069
rated+0.056
ust+0.056
uct+0.053
 dangers+0.052
Neg logit contributions
 blush-0.049
 coloring-0.046
 Grac-0.044
 tea-0.042
 dress-0.042
Token a
Token ID257
Feature activation+6.39e-02

Pos logit contributions
 blush+0.018
 coloring+0.017
 Grac+0.016
 piping+0.016
 tea+0.016
Neg logit contributions
urious-0.028
rated-0.024
ust-0.023
uct-0.022
 dangers-0.022
Token nearby
Token ID6716
Feature activation-6.45e-03

Pos logit contributions
 blush+0.124
 coloring+0.114
 Grac+0.112
acey+0.110
 mascara+0.105
Neg logit contributions
urious-0.111
 dangers-0.082
ust-0.080
rated-0.079
 habitat-0.076
Token pond
Token ID16723
Feature activation-1.17e-02

Pos logit contributions
urious+0.012
rated+0.010
ust+0.010
uct+0.009
 dangers+0.009
Neg logit contributions
 blush-0.012
 coloring-0.011
 Grac-0.011
 tea-0.011
 dress-0.010
Token The
Token ID383
Feature activation-3.20e-02

Pos logit contributions
 blush+0.013
 coloring+0.012
 groom+0.011
 piping+0.011
 Grac+0.011
Neg logit contributions
urious-0.012
rated-0.009
ust-0.009
 habitat-0.008
uct-0.008
Token book
Token ID1492
Feature activation-4.12e-02

Pos logit contributions
urious+0.074
 dangers+0.059
rated+0.059
 next+0.058
uct+0.056
Neg logit contributions
 blush-0.042
 Grac-0.042
 piping-0.038
 coloring-0.037
acey-0.037
Token has
Token ID468
Feature activation+3.19e-02

Pos logit contributions
urious+0.084
rated+0.068
ust+0.067
uct+0.063
 dangers+0.062
Neg logit contributions
 blush-0.069
 coloring-0.065
 Grac-0.062
 piping-0.060
 tea-0.059
Token pictures
Token ID5986
Feature activation-6.45e-04

Pos logit contributions
 blush+0.050
 Grac+0.049
acey+0.045
 coloring+0.045
 piping+0.044
Neg logit contributions
urious-0.062
 dangers-0.053
 habitat-0.049
ust-0.049
 habitats-0.048
Token of
Token ID286
Feature activation+6.02e-03

Pos logit contributions
urious+0.001
 next+0.001
 dangers+0.001
ust+0.001
 eternity+0.001
Neg logit contributions
 blush-0.001
 Grac-0.001
 groom-0.001
acey-0.001
 piping-0.001
Token volcan
Token ID17516
Feature activation-1.27e-01

Pos logit contributions
 blush+0.011
 coloring+0.010
 Grac+0.010
 piping+0.009
acey+0.009
Neg logit contributions
urious-0.011
 dangers-0.009
ust-0.009
rated-0.009
 eternity-0.008
Tokenoes
Token ID3028
Feature activation-1.46e-02

Pos logit contributions
urious+0.208
rated+0.164
uct+0.161
ust+0.151
 eternity+0.150
Neg logit contributions
 blush-0.260
 coloring-0.254
 tea-0.229
 piping-0.225
 lipstick-0.224
Token.
Token ID13
Feature activation+3.18e-03

Pos logit contributions
urious+0.030
rated+0.024
ust+0.023
uct+0.022
 dangers+0.022
Neg logit contributions
 blush-0.025
 coloring-0.023
 Grac-0.022
 piping-0.021
 tea-0.021
Token Vol
Token ID4709
Feature activation-8.62e-02

Pos logit contributions
 blush+0.005
 coloring+0.005
 Grac+0.005
 tea+0.004
 piping+0.004
Neg logit contributions
urious-0.007
rated-0.006
ust-0.006
uct-0.005
 dangers-0.005
Tokencan
Token ID5171
Feature activation-2.49e-02

Pos logit contributions
urious+0.210
rated+0.177
ust+0.176
 eternity+0.170
 dangers+0.170
Neg logit contributions
 coloring-0.106
 blush-0.105
acey-0.098
 Grac-0.095
 kit-0.093
Tokenoes
Token ID3028
Feature activation-3.62e-02

Pos logit contributions
urious+0.055
rated+0.045
uct+0.044
ust+0.043
 eternity+0.042
Neg logit contributions
 coloring-0.038
 blush-0.037
 Grac-0.035
 dress-0.034
 Rose-0.034
Token teach
Token ID4545
Feature activation+2.24e-03

Pos logit contributions
 blush+0.010
 coloring+0.009
 Grac+0.009
 piping+0.009
acey+0.009
Neg logit contributions
urious-0.015
rated-0.013
ust-0.012
 eternity-0.012
uct-0.012
Token you
Token ID345
Feature activation-1.48e-02

Pos logit contributions
 blush+0.004
 coloring+0.004
 Grac+0.004
 tea+0.004
 dress+0.004
Neg logit contributions
urious-0.004
rated-0.003
ust-0.003
uct-0.003
 eternity-0.003
Token something
Token ID1223
Feature activation+2.75e-03

Pos logit contributions
urious+0.029
ust+0.023
rated+0.022
 dangers+0.021
uct+0.021
Neg logit contributions
 blush-0.026
 coloring-0.024
 Grac-0.024
 piping-0.024
acey-0.023
Token special
Token ID2041
Feature activation-4.15e-02

Pos logit contributions
 blush+0.005
 coloring+0.004
 piping+0.004
 Grac+0.004
acey+0.004
Neg logit contributions
urious-0.005
ust-0.004
rated-0.004
 dangers-0.004
uct-0.004
Token about
Token ID546
Feature activation-1.04e-02

Pos logit contributions
urious+0.084
rated+0.069
ust+0.068
 dangers+0.064
 eternity+0.064
Neg logit contributions
 blush-0.070
 coloring-0.064
 Grac-0.062
 piping-0.060
 groom-0.059
Token volcan
Token ID17516
Feature activation-1.27e-01

Pos logit contributions
urious+0.025
rated+0.021
ust+0.020
uct+0.019
appointed+0.019
Neg logit contributions
 coloring-0.015
 blush-0.014
 tea-0.013
 Grac-0.013
 dress-0.013
Tokenoes
Token ID3028
Feature activation-1.22e-02

Pos logit contributions
urious+0.238
uct+0.196
rated+0.187
appointed+0.177
 eternity+0.175
Neg logit contributions
 coloring-0.218
 blush-0.208
 tea-0.188
 piping-0.183
 Grac-0.180
Token."
Token ID526
Feature activation+5.93e-03

Pos logit contributions
urious+0.028
rated+0.023
ust+0.022
uct+0.022
 eternity+0.021
Neg logit contributions
 blush-0.018
 coloring-0.017
 Grac-0.016
 tea-0.015
 dress-0.015
Token
Token ID198
Feature activation-2.93e-02

Pos logit contributions
 blush+0.012
 coloring+0.012
 wig+0.012
 groom+0.012
acey+0.011
Neg logit contributions
urious-0.009
uct-0.006
rated-0.006
 eternity-0.006
 Next-0.006
Token
Token ID198
Feature activation-2.54e-02

Pos logit contributions
urious+0.051
rated+0.039
ust+0.038
uct+0.037
 eternity+0.035
Neg logit contributions
 blush-0.059
 coloring-0.056
 Grac-0.052
 groom-0.052
 piping-0.051
TokenSo
Token ID2396
Feature activation-2.27e-02

Pos logit contributions
urious+0.044
ust+0.032
rated+0.031
uct+0.030
appointed+0.028
Neg logit contributions
 blush-0.053
 coloring-0.051
 Grac-0.048
 piping-0.048
acey-0.047
Token pony
Token ID26902
Feature activation-3.79e-02

Pos logit contributions
 blush+0.053
 Grac+0.051
acey+0.051
 Stacy+0.050
Rose+0.049
Neg logit contributions
urious-0.066
 adventure-0.055
 storm-0.052
 risky-0.051
ust-0.051
Token.
Token ID13
Feature activation-1.97e-02

Pos logit contributions
urious+0.070
ust+0.056
rated+0.056
 eternity+0.052
uct+0.052
Neg logit contributions
 blush-0.067
 Grac-0.062
 coloring-0.062
 piping-0.062
acey-0.059
Token She
Token ID1375
Feature activation-1.29e-02

Pos logit contributions
urious+0.045
rated+0.036
ust+0.035
uct+0.035
appointed+0.034
Neg logit contributions
 blush-0.028
 coloring-0.027
 Grac-0.025
 piping-0.025
 groom-0.024
Token was
Token ID373
Feature activation-5.86e-03

Pos logit contributions
urious+0.029
rated+0.024
ust+0.023
uct+0.022
 dangers+0.022
Neg logit contributions
 blush-0.020
 coloring-0.018
 Grac-0.017
 piping-0.017
 tea-0.017
Token very
Token ID845
Feature activation-1.11e-02

Pos logit contributions
urious+0.012
ust+0.010
uct+0.010
rated+0.010
appointed+0.009
Neg logit contributions
 blush-0.009
 coloring-0.008
 Grac-0.008
 piping-0.008
 wig-0.008
Token ins
Token ID1035
Feature activation-1.26e-01

Pos logit contributions
urious+0.023
rated+0.018
ust+0.018
uct+0.017
 eternity+0.017
Neg logit contributions
 blush-0.019
 coloring-0.018
 Grac-0.017
 piping-0.016
 tea-0.016
Tokenistent
Token ID7609
Feature activation-5.67e-02

Pos logit contributions
urious+0.199
rated+0.161
 dangers+0.156
 habitats+0.143
 habitat+0.142
Neg logit contributions
 blush-0.258
 coloring-0.245
acey-0.238
 Grac-0.235
 piping-0.223
Token about
Token ID546
Feature activation-2.21e-02

Pos logit contributions
urious+0.126
rated+0.104
ust+0.101
uct+0.097
 dangers+0.095
Neg logit contributions
 blush-0.086
 coloring-0.081
 Grac-0.076
 tea-0.073
 dress-0.072
Token it
Token ID340
Feature activation-3.09e-02

Pos logit contributions
urious+0.052
rated+0.044
ust+0.042
uct+0.040
 eternity+0.040
Neg logit contributions
 blush-0.031
 coloring-0.029
 Grac-0.027
 tea-0.026
 dress-0.026
Token and
Token ID290
Feature activation-5.16e-02

Pos logit contributions
urious+0.073
rated+0.061
ust+0.059
uct+0.057
 habitat+0.056
Neg logit contributions
 blush-0.042
 coloring-0.041
 dress-0.037
 Grac-0.037
 tea-0.036
Token her
Token ID607
Feature activation-2.60e-02

Pos logit contributions
urious+0.122
rated+0.103
ust+0.101
uct+0.096
 dangers+0.095
Neg logit contributions
 blush-0.071
 coloring-0.065
 Grac-0.062
 tea-0.058
 dress-0.058
Token learned
Token ID4499
Feature activation+1.39e-02

Pos logit contributions
urious+0.063
rated+0.053
uct+0.050
 eternity+0.049
ust+0.049
Neg logit contributions
 blush-0.043
 coloring-0.040
 Grac-0.040
 piping-0.040
acey-0.039
Token even
Token ID772
Feature activation-3.69e-02

Pos logit contributions
 blush+0.023
 coloring+0.022
 Grac+0.021
 piping+0.021
acey+0.020
Neg logit contributions
urious-0.028
rated-0.023
ust-0.023
uct-0.021
 dangers-0.021
Token more
Token ID517
Feature activation+2.65e-02

Pos logit contributions
urious+0.077
rated+0.064
ust+0.062
uct+0.059
 eternity+0.059
Neg logit contributions
 blush-0.060
 coloring-0.057
 Grac-0.053
 tea-0.051
 groom-0.051
Token about
Token ID546
Feature activation-1.13e-02

Pos logit contributions
 coloring+0.046
 blush+0.046
 Grac+0.041
 dress+0.041
 tea+0.041
Neg logit contributions
urious-0.053
rated-0.045
ust-0.041
uct-0.040
 habitat-0.039
Token how
Token ID703
Feature activation+3.70e-02

Pos logit contributions
urious+0.026
rated+0.022
ust+0.021
uct+0.020
 eternity+0.019
Neg logit contributions
 blush-0.017
 coloring-0.016
 Grac-0.015
 tea-0.014
 dress-0.014
Token volcan
Token ID17516
Feature activation-1.26e-01

Pos logit contributions
 blush+0.051
 coloring+0.050
 Grac+0.047
 tea+0.044
 groom+0.043
Neg logit contributions
urious-0.085
rated-0.073
ust-0.069
uct-0.067
 eternity-0.067
Tokenoes
Token ID3028
Feature activation-5.76e-03

Pos logit contributions
urious+0.231
uct+0.192
rated+0.184
 eternity+0.178
appointed+0.171
Neg logit contributions
 blush-0.221
 coloring-0.218
 Grac-0.192
 piping-0.190
 tea-0.188
Token work
Token ID670
Feature activation+2.08e-02

Pos logit contributions
urious+0.013
rated+0.011
ust+0.011
uct+0.011
 eternity+0.010
Neg logit contributions
 blush-0.008
 coloring-0.008
 Grac-0.007
 dress-0.007
 tea-0.007
Token.
Token ID13
Feature activation-1.09e-02

Pos logit contributions
 blush+0.034
 coloring+0.033
 Grac+0.030
 tea+0.030
 dress+0.029
Neg logit contributions
urious-0.044
rated-0.036
ust-0.035
uct-0.033
 eternity-0.032
Token She
Token ID1375
Feature activation+4.39e-03

Pos logit contributions
urious+0.024
rated+0.021
ust+0.020
 dangers+0.019
 eternity+0.018
Neg logit contributions
 blush-0.017
 coloring-0.016
 Grac-0.015
 Rose-0.014
 piping-0.014
Token thought
Token ID1807
Feature activation-2.92e-02

Pos logit contributions
 blush+0.007
 coloring+0.007
 Grac+0.007
 piping+0.007
 tea+0.006
Neg logit contributions
urious-0.009
rated-0.007
ust-0.007
uct-0.007
 dangers-0.006
Token,
Token ID11
Feature activation+1.45e-02

Pos logit contributions
urious+0.070
rated+0.059
ust+0.058
uct+0.055
 dangers+0.055
Neg logit contributions
 blush-0.037
 coloring-0.035
 Grac-0.033
 dress-0.031
 tea-0.031
Token but
Token ID475
Feature activation-6.87e-02

Pos logit contributions
 blush+0.024
 coloring+0.023
 Grac+0.022
acey+0.022
 piping+0.022
Neg logit contributions
urious-0.029
rated-0.024
ust-0.023
uct-0.023
 eternity-0.022
Token it
Token ID340
Feature activation-6.40e-02

Pos logit contributions
urious+0.185
rated+0.159
ust+0.155
uct+0.148
 eternity+0.148
Neg logit contributions
 blush-0.074
 coloring-0.067
 Grac-0.062
 dress-0.058
 tea-0.057
Token was
Token ID373
Feature activation-2.93e-02

Pos logit contributions
urious+0.155
 eternity+0.134
 habitats+0.134
 habitat+0.134
ust+0.131
Neg logit contributions
 blush-0.074
 dress-0.070
 coloring-0.070
 comb-0.066
 Grac-0.064
Token too
Token ID1165
Feature activation-5.56e-02

Pos logit contributions
urious+0.069
rated+0.055
 habitat+0.054
ust+0.053
 eternity+0.052
Neg logit contributions
 blush-0.042
 coloring-0.041
 tea-0.038
 Grac-0.037
 Rose-0.036
Token high
Token ID1029
Feature activation-1.25e-01

Pos logit contributions
urious+0.135
 habitat+0.122
 habitats+0.120
 eternity+0.119
uct+0.117
Neg logit contributions
 coloring-0.052
 few-0.052
 handful-0.051
 blush-0.050
 comb-0.049
Token.
Token ID13
Feature activation-2.62e-02

Pos logit contributions
urious+0.291
ust+0.231
rated+0.228
uct+0.223
 dangers+0.220
Neg logit contributions
 blush-0.177
 coloring-0.173
 Grac-0.158
 tea-0.154
 dress-0.153
Token So
Token ID1406
Feature activation-2.58e-02

Pos logit contributions
urious+0.060
rated+0.050
ust+0.049
uct+0.047
 eternity+0.046
Neg logit contributions
 blush-0.037
 coloring-0.035
 Grac-0.033
 piping-0.032
 groom-0.032
Token,
Token ID11
Feature activation+1.98e-03

Pos logit contributions
 next+0.034
urious+0.034
rated+0.029
appointed+0.028
 eternity+0.028
Neg logit contributions
acey-0.050
 groom-0.049
 piping-0.048
 kettle-0.046
 wig-0.046
Token she
Token ID673
Feature activation-2.53e-02

Pos logit contributions
 piping+0.003
acey+0.003
 kit+0.003
 volunteer+0.003
 coloring+0.003
Neg logit contributions
urious-0.003
 next-0.003
uct-0.003
appointed-0.002
 Next-0.002
Token got
Token ID1392
Feature activation+2.25e-02

Pos logit contributions
urious+0.056
rated+0.047
ust+0.046
 eternity+0.045
 dangers+0.044
Neg logit contributions
 blush-0.038
 coloring-0.035
 dress-0.033
 Grac-0.033
 groom-0.032
Token,
Token ID11
Feature activation+2.11e-02

Pos logit contributions
 blush+0.003
 coloring+0.003
 Grac+0.002
 tea+0.002
 dress+0.002
Neg logit contributions
urious-0.010
rated-0.009
ust-0.008
uct-0.008
 dangers-0.008
Token there
Token ID612
Feature activation-4.55e-03

Pos logit contributions
 blush+0.033
 coloring+0.032
 Grac+0.031
 tea+0.029
 dress+0.029
Neg logit contributions
urious-0.045
rated-0.037
ust-0.036
uct-0.035
 habitat-0.035
Token was
Token ID373
Feature activation-2.95e-02

Pos logit contributions
urious+0.010
rated+0.008
ust+0.008
uct+0.008
 dangers+0.008
Neg logit contributions
 blush-0.007
 coloring-0.007
 Grac-0.006
 dress-0.006
 tea-0.006
Token a
Token ID257
Feature activation+1.33e-03

Pos logit contributions
urious+0.072
rated+0.060
ust+0.059
uct+0.057
 eternity+0.057
Neg logit contributions
 blush-0.038
 coloring-0.034
 Grac-0.033
 groom-0.031
 piping-0.031
Token happy
Token ID3772
Feature activation-5.01e-03

Pos logit contributions
 blush+0.003
 coloring+0.002
 Grac+0.002
 piping+0.002
 mascara+0.002
Neg logit contributions
urious-0.002
ust-0.002
rated-0.002
 dangers-0.001
 next-0.001
Token ostr
Token ID42387
Feature activation-1.25e-01

Pos logit contributions
urious+0.010
rated+0.008
ust+0.008
 eternity+0.007
 dangers+0.007
Neg logit contributions
 blush-0.009
 coloring-0.008
 Grac-0.008
 piping-0.008
acey-0.007
Tokenich
Token ID488
Feature activation-6.12e-02

Pos logit contributions
urious+0.251
rated+0.224
uct+0.197
ust+0.191
 dangers+0.189
Neg logit contributions
 blush-0.195
 coloring-0.194
 groom-0.183
 Grac-0.182
 wig-0.180
Token.
Token ID13
Feature activation-1.36e-02

Pos logit contributions
urious+0.133
rated+0.109
ust+0.107
 eternity+0.102
uct+0.101
Neg logit contributions
 blush-0.095
 coloring-0.088
 Grac-0.085
 piping-0.082
acey-0.080
Token She
Token ID1375
Feature activation-2.29e-02

Pos logit contributions
urious+0.030
rated+0.025
ust+0.025
 dangers+0.023
uct+0.023
Neg logit contributions
 blush-0.021
 coloring-0.020
 Grac-0.019
 dress-0.018
 piping-0.018
Token was
Token ID373
Feature activation-1.07e-02

Pos logit contributions
urious+0.038
rated+0.029
ust+0.028
 dangers+0.027
uct+0.026
Neg logit contributions
 blush-0.047
 coloring-0.044
 Grac-0.043
 piping-0.042
acey-0.041
Token the
Token ID262
Feature activation-2.71e-02

Pos logit contributions
urious+0.018
 next+0.017
ust+0.016
appointed+0.015
iring+0.015
Neg logit contributions
 blush-0.018
 piping-0.016
 mascara-0.016
 wig-0.015
acey-0.015
TokenThere
Token ID1858
Feature activation+8.67e-03

Pos logit contributions
urious+0.144
rated+0.123
ust+0.120
uct+0.112
 eternity+0.111
Neg logit contributions
 blush-0.092
 coloring-0.092
 Grac-0.088
 tea-0.084
 Rose-0.084
Token was
Token ID373
Feature activation-2.73e-02

Pos logit contributions
 blush+0.014
 coloring+0.013
 Grac+0.013
 tea+0.012
 piping+0.012
Neg logit contributions
urious-0.018
rated-0.015
ust-0.015
uct-0.014
 dangers-0.014
Token a
Token ID257
Feature activation-1.28e-03

Pos logit contributions
urious+0.071
rated+0.061
ust+0.060
uct+0.057
 dangers+0.057
Neg logit contributions
 blush-0.031
 coloring-0.030
 tea-0.026
 Grac-0.026
 piping-0.026
Token gr
Token ID1036
Feature activation-5.55e-02

Pos logit contributions
urious+0.002
rated+0.002
ust+0.002
 eternity+0.002
uct+0.002
Neg logit contributions
 blush-0.002
 coloring-0.002
 Grac-0.002
 piping-0.002
acey-0.002
Tokenumpy
Token ID32152
Feature activation-3.39e-02

Pos logit contributions
urious+0.070
rated+0.049
uct+0.046
ust+0.044
 dangers+0.042
Neg logit contributions
 blush-0.132
 coloring-0.127
acey-0.126
 Grac-0.125
 tea-0.121
Token ranc
Token ID32205
Feature activation-1.24e-01

Pos logit contributions
urious+0.071
rated+0.058
ust+0.057
uct+0.054
 dangers+0.052
Neg logit contributions
 blush-0.056
 coloring-0.054
 Grac-0.051
 tea-0.050
 dress-0.049
Tokenher
Token ID372
Feature activation-1.50e-02

Pos logit contributions
urious+0.240
rated+0.195
ust+0.183
 dangers+0.179
 habitat+0.173
Neg logit contributions
 coloring-0.208
 blush-0.197
 Grac-0.190
acey-0.184
 te-0.181
Token named
Token ID3706
Feature activation+2.45e-02

Pos logit contributions
urious+0.030
rated+0.024
ust+0.024
uct+0.022
 dangers+0.022
Neg logit contributions
 blush-0.026
 coloring-0.024
 Grac-0.023
 piping-0.022
 tea-0.022
Token Tom
Token ID4186
Feature activation-2.69e-02

Pos logit contributions
 Grac+0.031
 blush+0.031
 coloring+0.029
 tea+0.029
 Rose+0.028
Neg logit contributions
urious-0.063
rated-0.050
ust-0.049
uct-0.047
 dangers-0.047
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
urious+0.054
rated+0.044
ust+0.042
uct+0.040
 dangers+0.040
Neg logit contributions
 blush-0.046
 coloring-0.044
 Grac-0.042
 dress-0.040
 tea-0.040
Token Tom
Token ID4186
Feature activation-1.56e-02

Pos logit contributions
urious+0.035
ust+0.031
rated+0.031
iring+0.029
ths+0.028
Neg logit contributions
 blush-0.019
 coloring-0.019
 tea-0.017
 piping-0.016
 wig-0.016
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
urious+0.045
rated+0.038
ust+0.037
uct+0.035
 dangers+0.035
Neg logit contributions
 blush-0.027
 coloring-0.026
 Grac-0.024
 tea-0.023
 dress-0.023
Token but
Token ID475
Feature activation-6.79e-02

Pos logit contributions
 blush+0.019
 coloring+0.017
 Grac+0.016
 piping+0.016
 tea+0.016
Neg logit contributions
urious-0.029
rated-0.024
ust-0.023
uct-0.022
 dangers-0.022
Token it
Token ID340
Feature activation-6.00e-02

Pos logit contributions
urious+0.185
ust+0.155
 habitat+0.153
rated+0.153
 habitats+0.148
Neg logit contributions
 blush-0.071
 dress-0.068
 coloring-0.066
 Rose-0.064
 comb-0.062
Token was
Token ID373
Feature activation-2.54e-02

Pos logit contributions
urious+0.147
 eternity+0.129
ust+0.127
 habitats+0.127
 habitat+0.127
Neg logit contributions
 blush-0.063
 coloring-0.062
 dress-0.060
 comb-0.060
 tea-0.058
Token too
Token ID1165
Feature activation-5.11e-02

Pos logit contributions
urious+0.063
rated+0.050
 habitat+0.050
 habitats+0.049
 eternity+0.049
Neg logit contributions
 coloring-0.033
 blush-0.031
 tea-0.030
 comb-0.028
 brush-0.028
Token high
Token ID1029
Feature activation-1.24e-01

Pos logit contributions
urious+0.124
 eternity+0.112
 habitat+0.111
 habitats+0.111
uct+0.109
Neg logit contributions
 handful-0.050
 coloring-0.049
 spoon-0.048
 comb-0.047
 tea-0.046
Token for
Token ID329
Feature activation-3.07e-02

Pos logit contributions
urious+0.330
ust+0.268
uct+0.265
rated+0.256
 habitats+0.255
Neg logit contributions
 coloring-0.131
 blush-0.120
 kit-0.112
 tea-0.112
 dress-0.108
Token him
Token ID683
Feature activation-4.21e-02

Pos logit contributions
urious+0.076
rated+0.067
ust+0.065
rig+0.059
aut+0.057
Neg logit contributions
 blush-0.043
 tea-0.041
 coloring-0.041
 dress-0.040
 Elsa-0.038
Token.
Token ID13
Feature activation-2.79e-02

Pos logit contributions
urious+0.099
rated+0.083
uct+0.079
ust+0.078
 eternity+0.076
Neg logit contributions
 blush-0.056
 coloring-0.056
 dress-0.050
 Grac-0.048
 tea-0.048
Token
Token ID198
Feature activation-2.27e-02

Pos logit contributions
urious+0.074
rated+0.068
ust+0.066
 dangers+0.062
 eternity+0.061
Neg logit contributions
 blush-0.027
 Rose-0.026
 coloring-0.023
 Elsa-0.023
 Grac-0.023
Token
Token ID198
Feature activation-2.04e-02

Pos logit contributions
urious+0.051
rated+0.045
ust+0.042
uct+0.041
 habitat+0.041
Neg logit contributions
 blush-0.030
 coloring-0.029
 Grac-0.028
 piping-0.027
 tea-0.027
Token,
Token ID11
Feature activation+1.79e-02

Pos logit contributions
urious+0.031
rated+0.026
ust+0.025
uct+0.024
 dangers+0.024
Neg logit contributions
 blush-0.020
 coloring-0.019
 Grac-0.018
 tea-0.017
 piping-0.017
Token but
Token ID475
Feature activation-7.01e-02

Pos logit contributions
 blush+0.027
 coloring+0.025
 Grac+0.024
acey+0.024
 piping+0.023
Neg logit contributions
urious-0.039
rated-0.032
ust-0.032
uct-0.031
 eternity-0.030
Token it
Token ID340
Feature activation-6.24e-02

Pos logit contributions
urious+0.194
rated+0.167
ust+0.166
 habitat+0.165
 habitats+0.161
Neg logit contributions
 coloring-0.066
 blush-0.065
 dress-0.063
 Rose-0.062
 Grac-0.060
Token was
Token ID373
Feature activation-2.68e-02

Pos logit contributions
urious+0.154
 habitats+0.136
 habitat+0.133
 eternity+0.132
 situations+0.130
Neg logit contributions
 blush-0.063
 tea-0.060
 coloring-0.060
 dress-0.059
 spoon-0.058
Token too
Token ID1165
Feature activation-5.00e-02

Pos logit contributions
urious+0.064
rated+0.052
 habitat+0.051
 habitats+0.050
 eternity+0.050
Neg logit contributions
 coloring-0.035
 blush-0.034
 tea-0.033
 Grac-0.030
 brush-0.030
Token high
Token ID1029
Feature activation-1.23e-01

Pos logit contributions
urious+0.118
 habitats+0.105
 eternity+0.105
 habitat+0.105
 situations+0.104
Neg logit contributions
 handful-0.052
 spoon-0.051
 coloring-0.051
 tea-0.048
 blush-0.048
Token.
Token ID13
Feature activation-3.03e-02

Pos logit contributions
urious+0.313
ust+0.249
uct+0.244
rated+0.243
 habitat+0.240
Neg logit contributions
 coloring-0.145
 blush-0.139
 kit-0.126
 Grac-0.126
 Rose-0.124
Token So
Token ID1406
Feature activation-2.76e-02

Pos logit contributions
urious+0.073
rated+0.069
ust+0.066
 dangers+0.064
 eternity+0.062
Neg logit contributions
 blush-0.034
 coloring-0.033
 Rose-0.033
 Grac-0.032
 kit-0.029
Token he
Token ID339
Feature activation-4.40e-02

Pos logit contributions
urious+0.063
rated+0.053
 eternity+0.051
ust+0.051
uct+0.050
Neg logit contributions
 blush-0.039
 coloring-0.035
acey-0.033
 Grac-0.033
 piping-0.033
Token pushed
Token ID7121
Feature activation-2.29e-02

Pos logit contributions
urious+0.100
rated+0.083
ust+0.081
 eternity+0.079
 dangers+0.078
Neg logit contributions
 blush-0.063
 coloring-0.058
 Grac-0.055
 dress-0.054
 tea-0.054
Token a
Token ID257
Feature activation+3.29e-02

Pos logit contributions
urious+0.058
ust+0.050
rated+0.049
 habitat+0.047
 habitats+0.047
Neg logit contributions
 blush-0.028
 coloring-0.026
 Grac-0.024
 dress-0.023
 lipstick-0.023
Token,
Token ID11
Feature activation+1.38e-02

Pos logit contributions
urious+0.091
rated+0.077
ust+0.074
uct+0.072
 dangers+0.071
Neg logit contributions
 blush-0.045
 coloring-0.042
 Grac-0.039
 tea-0.038
 dress-0.037
Token but
Token ID475
Feature activation-7.03e-02

Pos logit contributions
 blush+0.022
 coloring+0.020
 Grac+0.020
 piping+0.019
acey+0.019
Neg logit contributions
urious-0.029
rated-0.024
ust-0.024
uct-0.023
 eternity-0.022
Token it
Token ID340
Feature activation-6.18e-02

Pos logit contributions
urious+0.167
rated+0.138
ust+0.137
 habitat+0.130
 eternity+0.129
Neg logit contributions
 blush-0.098
 coloring-0.093
 dress-0.087
 Rose-0.086
 Grac-0.085
Token was
Token ID373
Feature activation-2.78e-02

Pos logit contributions
urious+0.150
 habitats+0.130
 habitat+0.130
ust+0.129
 eternity+0.128
Neg logit contributions
 blush-0.070
 coloring-0.066
 dress-0.063
 tea-0.062
 comb-0.062
Token too
Token ID1165
Feature activation-5.36e-02

Pos logit contributions
urious+0.067
rated+0.054
 habitat+0.053
 habitats+0.052
ust+0.052
Neg logit contributions
 coloring-0.037
 blush-0.036
 tea-0.035
 Grac-0.031
 brush-0.031
Token high
Token ID1029
Feature activation-1.23e-01

Pos logit contributions
urious+0.128
 habitat+0.115
 habitats+0.114
 situations+0.114
 eternity+0.113
Neg logit contributions
 coloring-0.053
 tea-0.053
 handful-0.053
 spoon-0.052
 blush-0.050
Token for
Token ID329
Feature activation-3.05e-02

Pos logit contributions
urious+0.314
ust+0.254
uct+0.249
rated+0.246
 habitat+0.243
Neg logit contributions
 coloring-0.144
 blush-0.139
 kit-0.124
 tea-0.123
 Rose-0.121
Token him
Token ID683
Feature activation-4.26e-02

Pos logit contributions
urious+0.059
rated+0.049
ust+0.047
uct+0.043
 habitat+0.042
Neg logit contributions
 blush-0.056
 coloring-0.054
 Grac-0.050
 tea-0.050
 dress-0.049
Token to
Token ID284
Feature activation-6.21e-03

Pos logit contributions
urious+0.097
rated+0.081
ust+0.077
uct+0.076
 eternity+0.074
Neg logit contributions
 blush-0.062
 coloring-0.059
 Grac-0.054
 dress-0.053
 tea-0.053
Token reach
Token ID3151
Feature activation-2.77e-02

Pos logit contributions
urious+0.015
ust+0.012
rated+0.012
 habitats+0.012
uct+0.012
Neg logit contributions
 blush-0.009
 dress-0.008
 coloring-0.008
 brush-0.008
 tea-0.008
Token.
Token ID13
Feature activation-2.87e-02

Pos logit contributions
urious+0.071
ust+0.059
rated+0.058
uct+0.055
 habitat+0.054
Neg logit contributions
 coloring-0.034
 blush-0.033
 tea-0.031
 Grac-0.030
 dress-0.030
Token,
Token ID11
Feature activation+9.02e-03

Pos logit contributions
 blush+0.010
 coloring+0.010
 Grac+0.009
 tea+0.008
 dress+0.008
Neg logit contributions
urious-0.028
rated-0.024
ust-0.024
uct-0.023
 dangers-0.023
Token a
Token ID257
Feature activation-3.55e-03

Pos logit contributions
 blush+0.011
 coloring+0.010
 Grac+0.010
 piping+0.009
 tea+0.009
Neg logit contributions
urious-0.023
rated-0.019
ust-0.019
uct-0.018
 dangers-0.018
Token pink
Token ID11398
Feature activation-2.67e-02

Pos logit contributions
urious+0.007
rated+0.005
ust+0.005
 dangers+0.005
 eternity+0.005
Neg logit contributions
 blush-0.006
 coloring-0.006
 Grac-0.006
 piping-0.006
acey-0.005
Token car
Token ID1097
Feature activation-3.78e-02

Pos logit contributions
urious+0.052
rated+0.040
 eternity+0.040
ust+0.039
 next+0.039
Neg logit contributions
 blush-0.043
 Grac-0.040
 coloring-0.039
 piping-0.039
acey-0.038
Token named
Token ID3706
Feature activation+1.24e-02

Pos logit contributions
urious+0.086
rated+0.072
ust+0.069
uct+0.066
 eternity+0.066
Neg logit contributions
 blush-0.055
 coloring-0.051
 Grac-0.049
 piping-0.046
 tea-0.046
Token Spe
Token ID2531
Feature activation-1.23e-01

Pos logit contributions
 blush+0.024
 coloring+0.023
 Grac+0.022
 piping+0.021
 tea+0.021
Neg logit contributions
urious-0.022
rated-0.017
ust-0.017
uct-0.016
 eternity-0.015
Tokenedy
Token ID4716
Feature activation-5.08e-02

Pos logit contributions
urious+0.231
rated+0.203
 dangers+0.193
ust+0.187
 situations+0.185
Neg logit contributions
 blush-0.201
 coloring-0.199
 dress-0.191
 lipstick-0.184
 wig-0.183
Token liked
Token ID8288
Feature activation-3.91e-02

Pos logit contributions
urious+0.106
rated+0.088
ust+0.088
uct+0.084
 dangers+0.081
Neg logit contributions
 blush-0.080
 coloring-0.078
 dress-0.072
 Grac-0.072
 tea-0.070
Token to
Token ID284
Feature activation-4.20e-02

Pos logit contributions
urious+0.108
rated+0.096
ust+0.093
uct+0.088
 eternity+0.085
Neg logit contributions
 coloring-0.040
 blush-0.038
 tea-0.034
 dress-0.034
 groom-0.033
Token go
Token ID467
Feature activation-2.95e-02

Pos logit contributions
urious+0.080
rated+0.067
ust+0.061
 eternity+0.061
uct+0.059
Neg logit contributions
 blush-0.072
 coloring-0.069
 Grac-0.068
acey-0.064
 piping-0.064
Token fast
Token ID3049
Feature activation-9.38e-03

Pos logit contributions
urious+0.074
rated+0.063
ust+0.062
uct+0.059
 eternity+0.058
Neg logit contributions
 blush-0.036
 coloring-0.035
 Grac-0.032
 tea-0.031
 dress-0.030
Token,
Token ID11
Feature activation+1.12e-02

Pos logit contributions
urious+0.044
rated+0.036
ust+0.035
uct+0.033
 dangers+0.033
Neg logit contributions
 blush-0.035
 coloring-0.033
 Grac-0.031
 piping-0.031
 groom-0.030
Token but
Token ID475
Feature activation-6.88e-02

Pos logit contributions
 blush+0.016
 coloring+0.015
 Grac+0.014
 piping+0.013
 tea+0.013
Neg logit contributions
urious-0.026
rated-0.022
ust-0.021
uct-0.020
 dangers-0.020
Token it
Token ID340
Feature activation-5.88e-02

Pos logit contributions
urious+0.187
rated+0.160
ust+0.158
 habitat+0.154
uct+0.150
Neg logit contributions
 coloring-0.068
 blush-0.068
 Grac-0.063
 dress-0.062
 Rose-0.061
Token was
Token ID373
Feature activation-2.50e-02

Pos logit contributions
urious+0.140
 eternity+0.122
 habitat+0.122
rated+0.120
 habitats+0.119
Neg logit contributions
 blush-0.065
 coloring-0.064
 dress-0.062
 comb-0.061
 te-0.059
Token too
Token ID1165
Feature activation-5.08e-02

Pos logit contributions
urious+0.058
rated+0.047
 habitat+0.045
ust+0.045
 habitats+0.045
Neg logit contributions
 coloring-0.035
 blush-0.035
 tea-0.032
 Grac-0.031
 Rose-0.030
Token high
Token ID1029
Feature activation-1.23e-01

Pos logit contributions
urious+0.127
uct+0.118
 habitat+0.118
 habitats+0.117
 eternity+0.116
Neg logit contributions
 handful-0.041
 coloring-0.041
 brush-0.040
 few-0.039
 cutter-0.038
Token for
Token ID329
Feature activation-2.98e-02

Pos logit contributions
urious+0.320
ust+0.258
uct+0.256
 habitat+0.249
 habitats+0.248
Neg logit contributions
 coloring-0.138
 blush-0.126
 tea-0.118
 kit-0.118
 Rose-0.117
Token him
Token ID683
Feature activation-4.11e-02

Pos logit contributions
urious+0.071
rated+0.062
ust+0.061
uct+0.057
 habitat+0.055
Neg logit contributions
 blush-0.041
 coloring-0.039
 tea-0.037
 Grac-0.036
 dress-0.035
Token.
Token ID13
Feature activation-1.67e-02

Pos logit contributions
urious+0.092
rated+0.076
uct+0.072
ust+0.072
 eternity+0.070
Neg logit contributions
 blush-0.061
 coloring-0.060
 dress-0.054
 Grac-0.053
 tea-0.053
Token So
Token ID1406
Feature activation-2.30e-02

Pos logit contributions
urious+0.037
rated+0.033
ust+0.032
 dangers+0.030
uct+0.029
Neg logit contributions
 blush-0.024
 coloring-0.023
 Grac-0.022
 Rose-0.021
 te-0.021
Token,
Token ID11
Feature activation+5.27e-03

Pos logit contributions
urious+0.055
rated+0.047
ust+0.045
 eternity+0.044
 dangers+0.044
Neg logit contributions
 blush-0.031
 coloring-0.028
 Grac-0.026
acey-0.026
 piping-0.026
Token not
Token ID407
Feature activation-4.83e-02

Pos logit contributions
 blush+0.030
 coloring+0.028
 Grac+0.027
 piping+0.026
 tea+0.026
Neg logit contributions
urious-0.031
rated-0.025
ust-0.025
uct-0.023
 dangers-0.023
Token forgiving
Token ID43486
Feature activation+2.01e-02

Pos logit contributions
urious+0.086
ust+0.071
rated+0.070
 dangers+0.067
uct+0.065
Neg logit contributions
 blush-0.090
 coloring-0.082
 Grac-0.082
 piping-0.081
acey-0.080
Token.
Token ID13
Feature activation+4.02e-02

Pos logit contributions
 blush+0.027
 coloring+0.026
 dress+0.025
 tea+0.025
 Rose+0.024
Neg logit contributions
urious-0.046
rated-0.039
uct-0.038
appointed-0.037
ust-0.036
Token
Token ID198
Feature activation+6.94e-03

Pos logit contributions
 blush+0.061
 coloring+0.059
 Rose+0.058
 Grac+0.056
acey+0.055
Neg logit contributions
urious-0.086
rated-0.077
ust-0.069
 eternity-0.065
uct-0.064
Token
Token ID198
Feature activation+6.90e-03

Pos logit contributions
 coloring+0.007
 blush+0.007
 Grac+0.007
 piping+0.007
 Rose+0.006
Neg logit contributions
urious-0.017
rated-0.016
ust-0.015
uct-0.014
 eternity-0.014
Token"
Token ID1
Feature activation+5.84e-02

Pos logit contributions
 coloring+0.007
 blush+0.007
 kit+0.007
 lipstick+0.007
Rose+0.007
Neg logit contributions
urious-0.017
rated-0.016
ust-0.014
 eternity-0.014
 habitat-0.014
TokenYou
Token ID1639
Feature activation+1.30e-02

Pos logit contributions
 blush+0.082
 groom+0.078
 coloring+0.076
 wig+0.072
 Grac+0.071
Neg logit contributions
urious-0.132
rated-0.122
 eternity-0.108
ust-0.107
 habitat-0.107
Token are
Token ID389
Feature activation+3.43e-02

Pos logit contributions
 blush+0.016
 coloring+0.015
 Grac+0.014
 piping+0.013
 tea+0.013
Neg logit contributions
urious-0.032
rated-0.027
ust-0.027
uct-0.026
 dangers-0.026
Token bad
Token ID2089
Feature activation+1.96e-02

Pos logit contributions
 blush+0.058
 coloring+0.055
 Grac+0.052
 tea+0.050
 piping+0.050
Neg logit contributions
urious-0.071
rated-0.058
ust-0.056
uct-0.053
 eternity-0.052
Token boys
Token ID6510
Feature activation+5.07e-02

Pos logit contributions
 blush+0.030
 coloring+0.028
 Grac+0.026
 tea+0.025
 dress+0.025
Neg logit contributions
urious-0.044
rated-0.036
ust-0.036
uct-0.034
 eternity-0.033
Token,"
Token ID553
Feature activation+5.93e-02

Pos logit contributions
 blush+0.068
 coloring+0.064
 Grac+0.059
 dress+0.057
 tea+0.057
Neg logit contributions
urious-0.121
rated-0.101
ust-0.099
uct-0.095
 eternity-0.093
Token bugs
Token ID11316
Feature activation-1.49e-02

Pos logit contributions
urious+0.020
rated+0.018
ust+0.017
uct+0.016
appointed+0.016
Neg logit contributions
 blush-0.011
 coloring-0.011
 Grac-0.010
 tea-0.010
 dress-0.010
Token and
Token ID290
Feature activation-1.38e-02

Pos logit contributions
urious+0.033
rated+0.028
ust+0.027
uct+0.026
 eternity+0.025
Neg logit contributions
 blush-0.022
 coloring-0.021
 Grac-0.020
 piping-0.019
 tea-0.019
Token worms
Token ID29174
Feature activation-6.82e-03

Pos logit contributions
urious+0.027
rated+0.022
ust+0.021
uct+0.020
 dangers+0.020
Neg logit contributions
 blush-0.024
 coloring-0.023
 Grac-0.022
 piping-0.021
acey-0.021
Token,
Token ID11
Feature activation+3.52e-02

Pos logit contributions
urious+0.015
rated+0.013
ust+0.012
uct+0.012
 dangers+0.012
Neg logit contributions
 blush-0.010
 coloring-0.009
 Grac-0.009
 piping-0.008
 tea-0.008
Token mom
Token ID1995
Feature activation+3.27e-02

Pos logit contributions
 blush+0.055
 coloring+0.051
 Grac+0.049
 tea+0.047
 dress+0.046
Neg logit contributions
urious-0.077
rated-0.064
ust-0.062
uct-0.059
 eternity-0.058
Token,"
Token ID553
Feature activation+6.14e-02

Pos logit contributions
 blush+0.055
 coloring+0.051
 Grac+0.049
 piping+0.047
acey+0.047
Neg logit contributions
urious-0.067
rated-0.055
ust-0.053
uct-0.050
 eternity-0.049
Token Ben
Token ID3932
Feature activation+3.78e-02

Pos logit contributions
 blush+0.094
 coloring+0.092
 piping+0.084
acey+0.083
 gown+0.082
Neg logit contributions
urious-0.124
uct-0.099
ust-0.097
rated-0.097
 eternity-0.097
Token said
Token ID531
Feature activation+3.45e-02

Pos logit contributions
 blush+0.059
 coloring+0.055
 Grac+0.052
 piping+0.050
 tea+0.050
Neg logit contributions
urious-0.082
rated-0.068
ust-0.066
uct-0.063
 eternity-0.062
Token.
Token ID13
Feature activation+1.58e-02

Pos logit contributions
 blush+0.070
 coloring+0.066
 piping+0.064
 Grac+0.063
 groom+0.063
Neg logit contributions
urious-0.055
rated-0.044
ust-0.042
 eternity-0.040
uct-0.039
Token "
Token ID366
Feature activation+1.67e-02

Pos logit contributions
 blush+0.031
 coloring+0.030
 groom+0.028
 Grac+0.028
 tea+0.028
Neg logit contributions
urious-0.027
ust-0.020
rated-0.020
 next-0.019
uct-0.019
TokenWe
Token ID1135
Feature activation+4.09e-02

Pos logit contributions
 blush+0.027
 coloring+0.025
 Grac+0.024
 groom+0.023
 piping+0.023
Neg logit contributions
urious-0.035
rated-0.030
ust-0.028
 eternity-0.027
uct-0.027
Token carried
Token ID5281
Feature activation+1.89e-02

Pos logit contributions
urious+0.015
rated+0.012
ust+0.012
uct+0.012
 dangers+0.011
Neg logit contributions
 blush-0.012
 coloring-0.010
 Grac-0.010
 piping-0.010
 gown-0.010
Token them
Token ID606
Feature activation+1.39e-02

Pos logit contributions
 blush+0.033
 coloring+0.030
 groom+0.029
 piping+0.029
 wig+0.028
Neg logit contributions
urious-0.033
rated-0.027
 next-0.027
uct-0.026
ust-0.025
Token carefully
Token ID7773
Feature activation-7.68e-03

Pos logit contributions
 blush+0.020
 coloring+0.019
 Grac+0.018
 piping+0.018
acey+0.017
Neg logit contributions
urious-0.031
rated-0.026
ust-0.025
uct-0.024
 eternity-0.024
Token to
Token ID284
Feature activation+1.51e-02

Pos logit contributions
urious+0.016
rated+0.013
ust+0.012
uct+0.012
 eternity+0.012
Neg logit contributions
 blush-0.013
 coloring-0.012
 Grac-0.011
 piping-0.011
acey-0.011
Token the
Token ID262
Feature activation+6.47e-03

Pos logit contributions
 blush+0.027
 Grac+0.024
 coloring+0.024
acey+0.023
 piping+0.023
Neg logit contributions
urious-0.028
 eternity-0.022
rated-0.022
uct-0.021
 dangers-0.021
Token room
Token ID2119
Feature activation+5.49e-02

Pos logit contributions
 blush+0.010
 Grac+0.009
acey+0.009
 Stacy+0.008
 piping+0.008
Neg logit contributions
urious-0.013
 next-0.011
 dangers-0.011
rated-0.011
 habitat-0.010
Token.
Token ID13
Feature activation+2.26e-02

Pos logit contributions
 blush+0.099
 coloring+0.092
 Grac+0.091
 piping+0.088
 groom+0.086
Neg logit contributions
urious-0.104
rated-0.084
ust-0.081
uct-0.077
 eternity-0.076
Token He
Token ID679
Feature activation-6.69e-03

Pos logit contributions
 blush+0.047
 coloring+0.044
 Grac+0.042
 groom+0.042
 piping+0.042
Neg logit contributions
urious-0.037
rated-0.027
ust-0.027
uct-0.026
 eternity-0.025
Token put
Token ID1234
Feature activation+3.63e-02

Pos logit contributions
urious+0.014
rated+0.012
ust+0.011
uct+0.011
 dangers+0.011
Neg logit contributions
 blush-0.010
 coloring-0.009
 Grac-0.009
 piping-0.009
 lipstick-0.008
Token them
Token ID606
Feature activation+7.89e-03

Pos logit contributions
 blush+0.065
 coloring+0.059
 Grac+0.057
 groom+0.057
 wig+0.056
Neg logit contributions
urious-0.059
 next-0.051
rated-0.048
uct-0.047
 dangers-0.046
Token on
Token ID319
Feature activation+1.53e-02

Pos logit contributions
 blush+0.010
 coloring+0.009
 Grac+0.008
 tea+0.008
 dress+0.008
Neg logit contributions
urious-0.020
rated-0.017
ust-0.016
uct-0.016
 dangers-0.016
Token hill
Token ID12788
Feature activation-3.09e-02

Pos logit contributions
 blush+0.006
 coloring+0.006
 Grac+0.005
 piping+0.005
 tea+0.005
Neg logit contributions
urious-0.006
rated-0.005
ust-0.005
uct-0.004
 dangers-0.004
Token.
Token ID13
Feature activation-1.59e-02

Pos logit contributions
urious+0.064
rated+0.051
ust+0.051
uct+0.048
 dangers+0.048
Neg logit contributions
 blush-0.051
 coloring-0.049
 Grac-0.045
 dress-0.044
 tea-0.044
Token They
Token ID1119
Feature activation+2.62e-02

Pos logit contributions
urious+0.034
rated+0.028
ust+0.027
 dangers+0.026
uct+0.025
Neg logit contributions
 blush-0.026
 coloring-0.024
 Grac-0.024
 piping-0.022
 Rose-0.022
Token laughed
Token ID13818
Feature activation+3.77e-03

Pos logit contributions
 blush+0.041
 coloring+0.040
 Grac+0.038
 piping+0.037
 tea+0.036
Neg logit contributions
urious-0.053
rated-0.045
ust-0.044
uct-0.042
 eternity-0.040
Token and
Token ID290
Feature activation-2.22e-03

Pos logit contributions
 blush+0.007
 coloring+0.006
 Grac+0.006
 piping+0.006
acey+0.006
Neg logit contributions
urious-0.007
rated-0.006
ust-0.006
 eternity-0.006
uct-0.005
Token had
Token ID550
Feature activation+6.38e-02

Pos logit contributions
urious+0.004
rated+0.004
uct+0.004
ust+0.004
 eternity+0.004
Neg logit contributions
 blush-0.003
 coloring-0.003
 piping-0.003
acey-0.003
 Grac-0.003
Token a
Token ID257
Feature activation+5.63e-02

Pos logit contributions
 blush+0.113
 coloring+0.105
 Grac+0.104
acey+0.100
 piping+0.099
Neg logit contributions
urious-0.124
ust-0.099
rated-0.099
uct-0.094
 dangers-0.093
Token great
Token ID1049
Feature activation+4.12e-02

Pos logit contributions
 blush+0.097
 Grac+0.092
 coloring+0.087
 Stacy+0.087
acey+0.087
Neg logit contributions
urious-0.107
ust-0.087
rated-0.084
 dangers-0.081
 risky-0.079
Token time
Token ID640
Feature activation+2.08e-02

Pos logit contributions
 blush+0.082
 coloring+0.081
 dress+0.078
 tea+0.078
 piping+0.074
Neg logit contributions
urious-0.069
rated-0.052
ust-0.051
uct-0.051
 dangers-0.048
Token.
Token ID13
Feature activation+4.00e-03

Pos logit contributions
 coloring+0.031
 blush+0.031
 dress+0.027
 tea+0.027
 Grac+0.026
Neg logit contributions
urious-0.048
rated-0.041
ust-0.038
uct-0.036
 habitat-0.035
Token But
Token ID887
Feature activation-3.31e-02

Pos logit contributions
 blush+0.006
 coloring+0.006
 Grac+0.006
 Rose+0.005
 tea+0.005
Neg logit contributions
urious-0.009
rated-0.007
ust-0.007
 dangers-0.007
uct-0.007
Token.
Token ID13
Feature activation+2.93e-02

Pos logit contributions
urious+0.009
rated+0.007
ust+0.007
uct+0.007
 eternity+0.007
Neg logit contributions
 blush-0.007
 coloring-0.007
 Grac-0.006
 dress-0.006
 tea-0.006
Token But
Token ID887
Feature activation-3.26e-02

Pos logit contributions
 blush+0.035
 coloring+0.031
 Grac+0.028
 Rose+0.027
 piping+0.027
Neg logit contributions
urious-0.076
rated-0.067
ust-0.064
uct-0.061
 dangers-0.061
Token she
Token ID673
Feature activation-4.48e-03

Pos logit contributions
urious+0.100
rated+0.089
 habitat+0.086
ust+0.083
 habitats+0.082
Neg logit contributions
 blush-0.028
 coloring-0.025
 Rose-0.021
 Grac-0.020
 dress-0.018
Token likes
Token ID7832
Feature activation-1.55e-02

Pos logit contributions
urious+0.009
rated+0.008
ust+0.008
uct+0.007
 eternity+0.007
Neg logit contributions
 blush-0.007
 coloring-0.007
 dress-0.006
 tea-0.006
acey-0.006
Token her
Token ID607
Feature activation-4.68e-03

Pos logit contributions
urious+0.034
rated+0.028
ust+0.027
uct+0.026
 dangers+0.025
Neg logit contributions
 blush-0.024
 coloring-0.022
 Grac-0.021
 piping-0.020
 tea-0.020
Token friends
Token ID2460
Feature activation+6.08e-02

Pos logit contributions
urious+0.011
rated+0.009
ust+0.008
 dangers+0.008
 eternity+0.008
Neg logit contributions
 blush-0.006
 Grac-0.006
 coloring-0.006
 piping-0.006
acey-0.006
Token and
Token ID290
Feature activation-2.95e-02

Pos logit contributions
 blush+0.111
 coloring+0.105
 Grac+0.100
 piping+0.097
 tea+0.096
Neg logit contributions
urious-0.116
rated-0.093
ust-0.090
uct-0.085
 dangers-0.083
Token her
Token ID607
Feature activation+5.94e-03

Pos logit contributions
urious+0.063
rated+0.052
ust+0.050
 dangers+0.049
uct+0.048
Neg logit contributions
 blush-0.045
 coloring-0.042
 Grac-0.041
acey-0.039
 piping-0.039
Token class
Token ID1398
Feature activation+5.53e-02

Pos logit contributions
 blush+0.009
 coloring+0.008
 Grac+0.008
 piping+0.008
acey+0.007
Neg logit contributions
urious-0.013
rated-0.011
ust-0.011
uct-0.010
 dangers-0.010
Token.
Token ID13
Feature activation+2.33e-02

Pos logit contributions
 blush+0.102
 coloring+0.097
 Grac+0.094
 piping+0.090
 tea+0.089
Neg logit contributions
urious-0.102
rated-0.082
ust-0.080
uct-0.075
 dangers-0.074
Token She
Token ID1375
Feature activation+9.57e-03

Pos logit contributions
 blush+0.030
 coloring+0.027
 Grac+0.026
 tea+0.024
 piping+0.024
Neg logit contributions
urious-0.058
rated-0.050
ust-0.048
uct-0.046
 dangers-0.046
Token his
Token ID465
Feature activation-2.26e-02

Pos logit contributions
urious+0.011
rated+0.009
ust+0.009
 habitat+0.008
appointed+0.008
Neg logit contributions
 blush-0.011
 coloring-0.011
 Grac-0.010
 tea-0.010
 Rose-0.010
Token success
Token ID1943
Feature activation+8.57e-03

Pos logit contributions
urious+0.045
rated+0.038
ust+0.037
uct+0.036
appointed+0.033
Neg logit contributions
 blush-0.039
 coloring-0.039
 dress-0.035
 tea-0.035
 Grac-0.034
Token.
Token ID13
Feature activation+1.13e-02

Pos logit contributions
 blush+0.014
 coloring+0.014
 Grac+0.013
 dress+0.012
 tea+0.012
Neg logit contributions
urious-0.018
rated-0.015
ust-0.014
uct-0.013
 habitat-0.013
Token He
Token ID679
Feature activation-1.16e-02

Pos logit contributions
 blush+0.021
 Grac+0.020
 coloring+0.020
 Rose+0.019
acey+0.018
Neg logit contributions
urious-0.021
rated-0.018
ust-0.016
 dangers-0.015
uct-0.014
Token celebrated
Token ID13943
Feature activation+2.52e-02

Pos logit contributions
urious+0.024
rated+0.020
ust+0.019
uct+0.019
 eternity+0.018
Neg logit contributions
 blush-0.019
 coloring-0.018
 Grac-0.017
 piping-0.016
 tea-0.016
Token by
Token ID416
Feature activation-9.95e-03

Pos logit contributions
 blush+0.039
 coloring+0.037
 Grac+0.035
 tea+0.033
 dress+0.033
Neg logit contributions
urious-0.055
rated-0.046
ust-0.045
uct-0.042
 dangers-0.041
Token having
Token ID1719
Feature activation+4.58e-02

Pos logit contributions
urious+0.021
rated+0.017
ust+0.017
uct+0.016
 dangers+0.016
Neg logit contributions
 blush-0.016
 coloring-0.015
 Grac-0.015
 piping-0.014
 tea-0.014
Token a
Token ID257
Feature activation+4.16e-02

Pos logit contributions
 blush+0.068
 coloring+0.062
 Grac+0.059
 piping+0.058
acey+0.057
Neg logit contributions
urious-0.101
rated-0.082
ust-0.082
 dangers-0.079
uct-0.079
Token big
Token ID1263
Feature activation+2.83e-02

Pos logit contributions
 blush+0.076
 Grac+0.066
 Stacy+0.066
 mascara+0.066
acey+0.066
Neg logit contributions
urious-0.072
ust-0.059
 eternity-0.055
 next-0.055
rated-0.054
Token party
Token ID2151
Feature activation+2.71e-02

Pos logit contributions
 blush+0.039
 Grac+0.035
 coloring+0.035
acey+0.034
 groom+0.033
Neg logit contributions
urious-0.064
rated-0.053
 eternity-0.052
ust-0.051
 dangers-0.050
Token.
Token ID13
Feature activation+6.09e-03

Pos logit contributions
 blush+0.047
 coloring+0.043
 Grac+0.043
 piping+0.042
acey+0.041
Neg logit contributions
urious-0.052
rated-0.043
ust-0.042
 eternity-0.040
uct-0.039
Token party
Token ID2151
Feature activation+3.27e-02

Pos logit contributions
 blush+0.030
 coloring+0.029
 dress+0.026
 tea+0.026
 Grac+0.026
Neg logit contributions
urious-0.045
rated-0.037
ust-0.036
uct-0.034
 dangers-0.033
Token and
Token ID290
Feature activation-2.46e-02

Pos logit contributions
 blush+0.055
 coloring+0.052
 Grac+0.049
 tea+0.047
 dress+0.047
Neg logit contributions
urious-0.067
rated-0.055
ust-0.053
uct-0.050
 dangers-0.049
Token the
Token ID262
Feature activation-8.29e-03

Pos logit contributions
urious+0.055
rated+0.046
ust+0.045
uct+0.043
 eternity+0.042
Neg logit contributions
 blush-0.037
 coloring-0.034
 Grac-0.032
 tea-0.031
 dress-0.031
Token loud
Token ID7812
Feature activation-2.27e-03

Pos logit contributions
urious+0.017
rated+0.014
ust+0.014
uct+0.013
 eternity+0.013
Neg logit contributions
 blush-0.014
 coloring-0.013
 tea-0.013
 dress-0.013
 Grac-0.012
Token yelling
Token ID22187
Feature activation+1.55e-02

Pos logit contributions
urious+0.005
ust+0.004
rated+0.004
uct+0.004
aut+0.003
Neg logit contributions
 blush-0.004
 coloring-0.004
 dress-0.003
 tea-0.003
 Grac-0.003
Token that
Token ID326
Feature activation+1.39e-02

Pos logit contributions
 blush+0.027
 coloring+0.025
 Grac+0.024
 piping+0.024
 tea+0.023
Neg logit contributions
urious-0.030
rated-0.025
ust-0.024
 dangers-0.023
uct-0.022
Token made
Token ID925
Feature activation+1.39e-02

Pos logit contributions
 blush+0.018
 coloring+0.017
 Grac+0.016
 tea+0.016
 dress+0.015
Neg logit contributions
urious-0.034
rated-0.028
ust-0.028
uct-0.027
 eternity-0.026
Token it
Token ID340
Feature activation-1.02e-02

Pos logit contributions
 blush+0.019
 coloring+0.018
 Grac+0.017
 piping+0.016
 tea+0.016
Neg logit contributions
urious-0.032
rated-0.027
ust-0.027
uct-0.025
 dangers-0.025
Token so
Token ID523
Feature activation-6.26e-04

Pos logit contributions
urious+0.023
rated+0.019
ust+0.019
uct+0.018
 eternity+0.018
Neg logit contributions
 blush-0.015
 coloring-0.014
 Grac-0.013
 tea-0.012
 dress-0.012
Token much
Token ID881
Feature activation-2.51e-02

Pos logit contributions
urious+0.002
rated+0.001
uct+0.001
 habitat+0.001
ust+0.001
Neg logit contributions
 blush-0.001
 tea-0.001
 coloring-0.001
 Grac-0.001
 dress-0.001
Token fun
Token ID1257
Feature activation-6.56e-03

Pos logit contributions
urious+0.054
rated+0.044
ust+0.042
uct+0.040
 habitat+0.039
Neg logit contributions
 blush-0.041
 coloring-0.039
 tea-0.037
 Grac-0.037
 dress-0.036
Token a
Token ID257
Feature activation+3.24e-02

Pos logit contributions
urious+0.125
rated+0.100
ust+0.097
uct+0.095
 habitat+0.093
Neg logit contributions
 blush-0.093
 coloring-0.089
 Grac-0.085
 tea-0.082
 dress-0.081
Token monster
Token ID9234
Feature activation-2.89e-02

Pos logit contributions
 blush+0.068
acey+0.061
 coloring+0.061
 piping+0.060
 lipstick+0.059
Neg logit contributions
urious-0.050
rated-0.042
 dangers-0.040
ust-0.040
 eternity-0.038
Token!
Token ID0
Feature activation-2.57e-03

Pos logit contributions
urious+0.059
rated+0.049
ust+0.048
 eternity+0.045
uct+0.045
Neg logit contributions
 blush-0.047
 coloring-0.044
 Grac-0.042
 piping-0.041
 groom-0.041
Token It
Token ID632
Feature activation-2.98e-02

Pos logit contributions
urious+0.006
rated+0.005
ust+0.005
 dangers+0.004
 eternity+0.004
Neg logit contributions
 blush-0.004
 Grac-0.004
 coloring-0.004
 Rose-0.004
acey-0.003
Token was
Token ID373
Feature activation+2.24e-03

Pos logit contributions
urious+0.072
ust+0.059
rated+0.058
 eternity+0.058
 habitat+0.057
Neg logit contributions
 blush-0.038
 Grac-0.034
 dress-0.033
 coloring-0.033
 Rose-0.031
Token a
Token ID257
Feature activation+4.14e-02

Pos logit contributions
 blush+0.003
 coloring+0.003
 Grac+0.003
 tea+0.003
 dress+0.003
Neg logit contributions
urious-0.005
rated-0.004
ust-0.004
uct-0.004
 habitat-0.004
Token cat
Token ID3797
Feature activation-9.03e-03

Pos logit contributions
 blush+0.081
 coloring+0.072
acey+0.072
 Grac+0.071
 piping+0.071
Neg logit contributions
urious-0.066
 dangers-0.054
rated-0.053
ust-0.051
 earthquake-0.050
Token!
Token ID0
Feature activation-4.40e-03

Pos logit contributions
urious+0.018
rated+0.015
ust+0.014
 eternity+0.014
uct+0.013
Neg logit contributions
 blush-0.015
 coloring-0.014
 Grac-0.014
 piping-0.014
 groom-0.013
Token
Token ID198
Feature activation-8.06e-03

Pos logit contributions
urious+0.009
rated+0.008
ust+0.007
 dangers+0.007
uct+0.007
Neg logit contributions
 blush-0.007
 Grac-0.007
 coloring-0.007
 Rose-0.006
acey-0.006
Token
Token ID198
Feature activation-7.30e-03

Pos logit contributions
urious+0.018
rated+0.016
ust+0.015
 eternity+0.014
 dangers+0.014
Neg logit contributions
 blush-0.012
 Grac-0.011
 coloring-0.011
acey-0.010
 piping-0.010
TokenBilly
Token ID49640
Feature activation-8.15e-03

Pos logit contributions
urious+0.015
rated+0.013
ust+0.012
 eternity+0.012
 habitat+0.011
Neg logit contributions
 blush-0.012
 coloring-0.011
 Grac-0.011
acey-0.010
 piping-0.010
Token
Token ID198
Feature activation-1.04e-02

Pos logit contributions
urious+0.014
rated+0.012
ust+0.011
 dangers+0.011
uct+0.010
Neg logit contributions
 blush-0.010
 coloring-0.009
 Grac-0.009
 piping-0.008
 tea-0.008
Token
Token ID198
Feature activation-8.78e-03

Pos logit contributions
urious+0.020
rated+0.016
ust+0.015
uct+0.014
 eternity+0.014
Neg logit contributions
 blush-0.019
 coloring-0.018
 Grac-0.017
 groom-0.017
 piping-0.017
TokenFrom
Token ID4863
Feature activation-4.81e-02

Pos logit contributions
urious+0.018
rated+0.015
ust+0.015
uct+0.014
 dangers+0.013
Neg logit contributions
 blush-0.015
 coloring-0.014
 Grac-0.013
 piping-0.013
 tea-0.013
Token that
Token ID326
Feature activation+2.40e-02

Pos logit contributions
urious+0.118
rated+0.102
ust+0.100
 habitat+0.092
 dangers+0.092
Neg logit contributions
 blush-0.061
 coloring-0.057
 Grac-0.054
 dress-0.052
 Rose-0.052
Token day
Token ID1110
Feature activation+4.33e-02

Pos logit contributions
 blush+0.053
 coloring+0.051
 Grac+0.049
 piping+0.048
 tea+0.048
Neg logit contributions
urious-0.036
rated-0.027
ust-0.026
uct-0.024
 eternity-0.024
Token,
Token ID11
Feature activation+2.60e-02

Pos logit contributions
 blush+0.061
 coloring+0.057
 Grac+0.054
 piping+0.052
 groom+0.051
Neg logit contributions
urious-0.100
rated-0.084
ust-0.082
uct-0.079
 eternity-0.078
Token Daisy
Token ID40355
Feature activation+3.75e-03

Pos logit contributions
 blush+0.041
 coloring+0.038
 Grac+0.036
 piping+0.035
 tea+0.035
Neg logit contributions
urious-0.056
rated-0.046
ust-0.045
uct-0.043
 eternity-0.042
Token went
Token ID1816
Feature activation+9.90e-03

Pos logit contributions
 blush+0.007
 coloring+0.006
 Grac+0.006
 piping+0.006
 tea+0.006
Neg logit contributions
urious-0.007
rated-0.006
ust-0.006
uct-0.005
 dangers-0.005
Token to
Token ID284
Feature activation-6.69e-04

Pos logit contributions
 blush+0.017
 piping+0.016
 Grac+0.016
 groom+0.015
acey+0.015
Neg logit contributions
urious-0.017
 next-0.015
 eternity-0.014
 dangers-0.014
 adventure-0.013
Token that
Token ID326
Feature activation+1.76e-02

Pos logit contributions
urious+0.001
rated+0.001
 eternity+0.001
uct+0.001
ust+0.001
Neg logit contributions
 blush-0.001
 piping-0.001
 Grac-0.001
 coloring-0.001
acey-0.001
Token hotel
Token ID7541
Feature activation-2.42e-02

Pos logit contributions
 blush+0.033
 coloring+0.031
 Grac+0.030
 tea+0.029
 dress+0.029
Neg logit contributions
urious-0.033
rated-0.026
ust-0.026
uct-0.024
 eternity-0.024
Token.
Token ID13
Feature activation+2.40e-02

Pos logit contributions
 blush+0.007
 coloring+0.007
 Grac+0.007
 piping+0.007
acey+0.007
Neg logit contributions
urious-0.008
rated-0.006
ust-0.006
uct-0.006
 eternity-0.006
Token She
Token ID1375
Feature activation+7.95e-03

Pos logit contributions
 blush+0.031
 coloring+0.028
 Grac+0.027
 dress+0.025
 piping+0.025
Neg logit contributions
urious-0.059
rated-0.051
ust-0.049
uct-0.047
 dangers-0.046
Token thinks
Token ID6834
Feature activation-2.30e-02

Pos logit contributions
 blush+0.013
 coloring+0.012
 dress+0.011
 tea+0.011
 Grac+0.011
Neg logit contributions
urious-0.016
rated-0.014
ust-0.014
 eternity-0.013
uct-0.013
Token the
Token ID262
Feature activation-1.57e-02

Pos logit contributions
urious+0.051
rated+0.043
ust+0.042
 eternity+0.041
uct+0.041
Neg logit contributions
 blush-0.033
 coloring-0.031
 Grac-0.029
 piping-0.028
 groom-0.028
Token player
Token ID2137
Feature activation+2.51e-02

Pos logit contributions
urious+0.032
rated+0.027
 dangers+0.026
 eternity+0.025
ust+0.025
Neg logit contributions
 blush-0.025
 coloring-0.023
 Grac-0.023
 piping-0.021
acey-0.021
Token is
Token ID318
Feature activation+1.39e-02

Pos logit contributions
 blush+0.041
 coloring+0.039
 Grac+0.037
 tea+0.036
 dress+0.036
Neg logit contributions
urious-0.052
rated-0.042
ust-0.041
uct-0.039
 eternity-0.039
Token broken
Token ID5445
Feature activation+3.30e-03

Pos logit contributions
 blush+0.024
 coloring+0.023
 Grac+0.021
 tea+0.021
 dress+0.020
Neg logit contributions
urious-0.029
rated-0.023
ust-0.022
uct-0.021
 eternity-0.021
Token.
Token ID13
Feature activation+2.27e-02

Pos logit contributions
 blush+0.005
 coloring+0.005
 dress+0.005
 Grac+0.005
 tea+0.005
Neg logit contributions
urious-0.007
rated-0.006
ust-0.006
uct-0.006
 habitat-0.005
Token She
Token ID1375
Feature activation+4.80e-03

Pos logit contributions
 blush+0.029
 coloring+0.026
 Grac+0.025
 piping+0.023
 dress+0.023
Neg logit contributions
urious-0.056
rated-0.049
ust-0.047
uct-0.045
 eternity-0.045
Token stands
Token ID6296
Feature activation-6.83e-03

Pos logit contributions
 blush+0.008
 coloring+0.008
 dress+0.007
 Grac+0.007
 tea+0.007
Neg logit contributions
urious-0.010
rated-0.008
ust-0.008
uct-0.007
 eternity-0.007
Token up
Token ID510
Feature activation+8.27e-04

Pos logit contributions
urious+0.014
rated+0.012
ust+0.012
 habitat+0.011
 eternity+0.011
Neg logit contributions
 blush-0.011
 coloring-0.011
 Grac-0.010
 dress-0.010
 tea-0.009
Token.
Token ID13
Feature activation+3.29e-02

Pos logit contributions
 blush+0.051
 coloring+0.048
 Grac+0.046
 piping+0.044
 tea+0.044
Neg logit contributions
urious-0.056
rated-0.045
ust-0.044
uct-0.041
 dangers-0.041
Token They
Token ID1119
Feature activation+2.92e-02

Pos logit contributions
 blush+0.053
 coloring+0.049
 Grac+0.047
 piping+0.045
 tea+0.045
Neg logit contributions
urious-0.071
rated-0.059
ust-0.057
uct-0.055
 dangers-0.054
Token knew
Token ID2993
Feature activation-1.17e-02

Pos logit contributions
 blush+0.045
 coloring+0.042
 Grac+0.040
 dress+0.039
 groom+0.039
Neg logit contributions
urious-0.064
rated-0.053
ust-0.051
uct-0.049
 eternity-0.049
Token she
Token ID673
Feature activation-3.07e-02

Pos logit contributions
urious+0.033
rated+0.029
ust+0.026
uct+0.026
appointed+0.026
Neg logit contributions
 blush-0.012
 coloring-0.012
 tea-0.011
 Grac-0.011
 dress-0.010
Token wanted
Token ID2227
Feature activation-7.45e-03

Pos logit contributions
urious+0.077
rated+0.065
 eternity+0.065
uct+0.064
 habitat+0.064
Neg logit contributions
 blush-0.033
 coloring-0.033
 dress-0.030
 kit-0.030
acey-0.029
Token them
Token ID606
Feature activation-1.72e-02

Pos logit contributions
urious+0.018
rated+0.015
ust+0.015
uct+0.014
 dangers+0.014
Neg logit contributions
 blush-0.010
 coloring-0.009
 Grac-0.009
 piping-0.008
 tea-0.008
Token to
Token ID284
Feature activation-7.59e-03

Pos logit contributions
urious+0.042
rated+0.035
ust+0.034
uct+0.033
 eternity+0.033
Neg logit contributions
 blush-0.022
 coloring-0.021
 Grac-0.019
 tea-0.019
 dress-0.018
Token be
Token ID307
Feature activation+2.37e-02

Pos logit contributions
urious+0.018
rated+0.014
ust+0.014
uct+0.014
 eternity+0.013
Neg logit contributions
 blush-0.011
 coloring-0.010
 dress-0.010
 Grac-0.010
 tea-0.010
Token happy
Token ID3772
Feature activation+2.16e-02

Pos logit contributions
 blush+0.037
 coloring+0.035
 Grac+0.033
 piping+0.032
acey+0.031
Neg logit contributions
urious-0.051
rated-0.042
ust-0.042
uct-0.039
 dangers-0.039
Token.
Token ID13
Feature activation+2.28e-02

Pos logit contributions
 blush+0.033
 coloring+0.032
 tea+0.030
 dress+0.030
 Grac+0.030
Neg logit contributions
urious-0.049
rated-0.041
uct-0.038
ust-0.038
appointed-0.037
Token They
Token ID1119
Feature activation+2.36e-02

Pos logit contributions
 blush+0.033
 coloring+0.031
 Grac+0.031
 piping+0.029
 Rose+0.029
Neg logit contributions
urious-0.051
rated-0.044
ust-0.042
uct-0.041
 eternity-0.040
Token anyone
Token ID2687
Feature activation-2.25e-03

Pos logit contributions
urious+0.064
rated+0.054
ust+0.052
uct+0.050
appointed+0.048
Neg logit contributions
 blush-0.042
 coloring-0.040
 Grac-0.037
 tea-0.036
 dress-0.035
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
urious+0.005
rated+0.004
ust+0.004
uct+0.004
 eternity+0.004
Neg logit contributions
 blush-0.004
 coloring-0.003
 Grac-0.003
 dress-0.003
 tea-0.003
Token One
Token ID1881
Feature activation-4.06e-02

Pos logit contributions
urious+0.027
rated+0.023
ust+0.021
 dangers+0.020
 eternity+0.019
Neg logit contributions
 blush-0.033
 coloring-0.031
 Grac-0.030
 Rose-0.029
 tea-0.029
Token day
Token ID1110
Feature activation+1.18e-02

Pos logit contributions
urious+0.083
 habitat+0.071
ust+0.071
uct+0.071
rated+0.068
Neg logit contributions
 coloring-0.065
 blush-0.063
 tea-0.060
 dress-0.059
 brush-0.057
Token,
Token ID11
Feature activation+7.64e-03

Pos logit contributions
 groom+0.019
 piping+0.018
 blush+0.018
 wig+0.018
 kettle+0.018
Neg logit contributions
urious-0.018
 next-0.016
 eternity-0.015
rated-0.015
 earthquake-0.015
Token she
Token ID673
Feature activation-1.77e-02

Pos logit contributions
 piping+0.011
 wig+0.011
 blush+0.011
 groom+0.010
acey+0.010
Neg logit contributions
urious-0.014
 next-0.012
uct-0.011
 eternity-0.011
appointed-0.011
Token went
Token ID1816
Feature activation-6.41e-03

Pos logit contributions
urious+0.040
rated+0.035
ust+0.034
 eternity+0.033
uct+0.033
Neg logit contributions
 blush-0.025
 coloring-0.024
 dress-0.022
 Grac-0.022
acey-0.021
Token to
Token ID284
Feature activation-1.59e-02

Pos logit contributions
urious+0.010
 next+0.009
 dangers+0.008
 adventure+0.008
 eternity+0.008
Neg logit contributions
 piping-0.012
 blush-0.012
acey-0.011
 groom-0.011
 Grac-0.011
Token the
Token ID262
Feature activation-3.12e-02

Pos logit contributions
urious+0.035
rated+0.028
ust+0.028
uct+0.027
 eternity+0.027
Neg logit contributions
 blush-0.024
 coloring-0.022
 Grac-0.021
 piping-0.021
acey-0.020
Token forest
Token ID8222
Feature activation-3.67e-02

Pos logit contributions
urious+0.052
ust+0.038
 dangers+0.036
rated+0.036
 habitat+0.036
Neg logit contributions
 blush-0.064
 Grac-0.059
 coloring-0.058
 piping-0.057
acey-0.056
Token with
Token ID351
Feature activation+4.95e-03

Pos logit contributions
urious+0.086
rated+0.074
ust+0.072
uct+0.069
 dangers+0.067
Neg logit contributions
 blush-0.049
 coloring-0.048
 Grac-0.043
 dress-0.042
 tea-0.042
Token "
Token ID366
Feature activation+1.79e-02

Pos logit contributions
 blush+0.042
 coloring+0.038
acey+0.038
 tea+0.037
 piping+0.037
Neg logit contributions
urious-0.045
 next-0.038
rated-0.037
uct-0.037
 eternity-0.036
TokenWe
Token ID1135
Feature activation+3.71e-02

Pos logit contributions
 blush+0.035
 coloring+0.033
 Grac+0.032
 tea+0.031
 piping+0.031
Neg logit contributions
urious-0.032
rated-0.025
ust-0.025
uct-0.023
 dangers-0.023
Token have
Token ID423
Feature activation+4.74e-02

Pos logit contributions
 blush+0.059
 coloring+0.054
 Grac+0.052
 piping+0.051
 wig+0.050
Neg logit contributions
urious-0.077
rated-0.064
ust-0.063
 dangers-0.059
uct-0.058
Token a
Token ID257
Feature activation+5.52e-02

Pos logit contributions
 blush+0.064
 coloring+0.059
 Grac+0.055
 piping+0.052
 tea+0.052
Neg logit contributions
urious-0.113
rated-0.095
ust-0.093
uct-0.089
 dangers-0.088
Token modern
Token ID3660
Feature activation+2.27e-02

Pos logit contributions
 blush+0.105
 mascara+0.091
 Grac+0.090
 Alexa+0.089
acey+0.088
Neg logit contributions
urious-0.092
 adventure-0.076
 risky-0.073
 journey-0.073
 dangers-0.071
Token truck
Token ID7779
Feature activation-2.67e-02

Pos logit contributions
 blush+0.039
 Grac+0.034
acey+0.034
 Alexa+0.033
 coloring+0.033
Neg logit contributions
urious-0.043
 dangers-0.036
rated-0.035
 eternity-0.035
 habitat-0.034
Token.
Token ID13
Feature activation+2.47e-02

Pos logit contributions
urious+0.052
rated+0.042
ust+0.042
 eternity+0.040
 dangers+0.040
Neg logit contributions
 blush-0.046
 Grac-0.041
 coloring-0.041
 piping-0.040
 groom-0.038
Token A
Token ID317
Feature activation+3.36e-02

Pos logit contributions
 blush+0.043
 coloring+0.041
 Grac+0.039
 piping+0.038
 tea+0.037
Neg logit contributions
urious-0.049
rated-0.040
ust-0.039
uct-0.036
 dangers-0.036
Token truck
Token ID7779
Feature activation-3.43e-02

Pos logit contributions
 blush+0.066
 Grac+0.058
 mascara+0.056
acey+0.056
 Alexa+0.055
Neg logit contributions
urious-0.052
 dangers-0.045
 risky-0.044
rated-0.042
 next-0.042
Token is
Token ID318
Feature activation-1.17e-03

Pos logit contributions
urious+0.070
rated+0.057
ust+0.056
uct+0.053
 dangers+0.052
Neg logit contributions
 blush-0.057
 coloring-0.053
 Grac-0.051
 piping-0.049
 tea-0.049
Token a
Token ID257
Feature activation+3.53e-02

Pos logit contributions
urious+0.003
rated+0.002
ust+0.002
uct+0.002
 dangers+0.002
Neg logit contributions
 blush-0.002
 coloring-0.002
 Grac-0.002
 piping-0.001
 tea-0.001
Token scared
Token ID12008
Feature activation-2.84e-02

Pos logit contributions
 blush+0.044
 coloring+0.042
 Grac+0.039
 piping+0.038
 tea+0.038
Neg logit contributions
urious-0.059
rated-0.049
ust-0.048
uct-0.045
 eternity-0.045
Token and
Token ID290
Feature activation-3.35e-02

Pos logit contributions
urious+0.064
rated+0.054
ust+0.052
uct+0.050
 habitat+0.048
Neg logit contributions
 blush-0.042
 coloring-0.040
 Grac-0.038
 tea-0.036
 dress-0.036
Token confused
Token ID10416
Feature activation-2.60e-02

Pos logit contributions
urious+0.077
ust+0.063
rated+0.060
aut+0.058
 habitat+0.058
Neg logit contributions
 blush-0.049
 coloring-0.042
 Grac-0.042
 dress-0.041
 Rose-0.040
Token,
Token ID11
Feature activation+1.98e-02

Pos logit contributions
urious+0.058
rated+0.048
ust+0.046
uct+0.045
 dangers+0.043
Neg logit contributions
 blush-0.040
 coloring-0.037
 Grac-0.035
 dress-0.034
 tea-0.033
Token but
Token ID475
Feature activation-4.71e-02

Pos logit contributions
 blush+0.027
 coloring+0.025
 Grac+0.023
 dress+0.022
 tea+0.021
Neg logit contributions
urious-0.048
rated-0.040
ust-0.040
uct-0.037
 dangers-0.036
Token he
Token ID339
Feature activation-2.84e-02

Pos logit contributions
urious+0.118
rated+0.100
ust+0.095
 habitat+0.095
 habitats+0.092
Neg logit contributions
 blush-0.060
 coloring-0.058
 Grac-0.054
 Rose-0.051
 dress-0.050
Token was
Token ID373
Feature activation-2.38e-04

Pos logit contributions
urious+0.063
rated+0.054
ust+0.050
 eternity+0.050
 habitat+0.050
Neg logit contributions
 blush-0.041
 coloring-0.038
 dress-0.036
 Grac-0.036
 groom-0.035
Token determined
Token ID5295
Feature activation-5.66e-02

Pos logit contributions
urious+0.000
rated+0.000
ust+0.000
 habitat+0.000
 dangers+0.000
Neg logit contributions
 blush-0.000
 coloring-0.000
 Grac-0.000
 groom-0.000
 dress-0.000
Token to
Token ID284
Feature activation-2.35e-02

Pos logit contributions
urious+0.144
rated+0.123
ust+0.118
uct+0.114
 habitat+0.112
Neg logit contributions
 blush-0.064
 coloring-0.064
 dress-0.055
 Grac-0.053
 kit-0.051
Token stay
Token ID2652
Feature activation-1.15e-02

Pos logit contributions
urious+0.053
rated+0.044
ust+0.043
uct+0.041
 dangers+0.041
Neg logit contributions
 blush-0.034
 coloring-0.032
 Grac-0.030
 dress-0.029
 tea-0.029
Token brave
Token ID14802
Feature activation-1.17e-02

Pos logit contributions
urious+0.026
rated+0.022
ust+0.021
uct+0.021
 eternity+0.020
Neg logit contributions
 blush-0.016
 coloring-0.016
 Grac-0.015
 dress-0.014
 tea-0.014
Token girl
Token ID2576
Feature activation+2.88e-02

Pos logit contributions
urious+0.024
rated+0.018
 earthquake+0.018
 dangers+0.018
 next+0.017
Neg logit contributions
 blush-0.025
 coloring-0.023
acey-0.022
 Grac-0.022
 piping-0.022
Token named
Token ID3706
Feature activation+2.48e-02

Pos logit contributions
 blush+0.042
 Grac+0.037
 piping+0.037
 coloring+0.036
acey+0.036
Neg logit contributions
urious-0.064
rated-0.051
ust-0.050
 eternity-0.049
 next-0.049
Token Lily
Token ID20037
Feature activation+7.57e-03

Pos logit contributions
 blush+0.031
 coloring+0.029
 piping+0.027
 groom+0.027
 wig+0.027
Neg logit contributions
urious-0.060
rated-0.050
ust-0.050
uct-0.048
 eternity-0.046
Token who
Token ID508
Feature activation-1.78e-02

Pos logit contributions
 blush+0.012
 coloring+0.012
 dress+0.011
 Grac+0.011
 tea+0.011
Neg logit contributions
urious-0.016
rated-0.013
ust-0.013
uct-0.012
 dangers-0.012
Token was
Token ID373
Feature activation-2.60e-02

Pos logit contributions
urious+0.036
rated+0.029
ust+0.028
 dangers+0.027
uct+0.026
Neg logit contributions
 blush-0.031
 coloring-0.029
 Grac-0.028
 piping-0.027
 groom-0.026
Token going
Token ID1016
Feature activation-6.02e-02

Pos logit contributions
urious+0.055
ust+0.044
rated+0.043
 next+0.043
 dangers+0.042
Neg logit contributions
 blush-0.039
 piping-0.035
 wig-0.034
 Grac-0.033
 coloring-0.033
Token to
Token ID284
Feature activation-4.78e-02

Pos logit contributions
urious+0.137
rated+0.114
ust+0.111
 dangers+0.107
uct+0.106
Neg logit contributions
 blush-0.087
 Grac-0.076
 coloring-0.076
 piping-0.074
acey-0.072
Token start
Token ID923
Feature activation-8.64e-03

Pos logit contributions
urious+0.100
rated+0.081
ust+0.079
uct+0.077
 eternity+0.076
Neg logit contributions
 blush-0.076
 coloring-0.069
 Grac-0.067
acey-0.065
 piping-0.064
Token school
Token ID1524
Feature activation+7.30e-03

Pos logit contributions
urious+0.021
rated+0.018
ust+0.017
uct+0.016
 dangers+0.016
Neg logit contributions
 blush-0.012
 coloring-0.011
 Grac-0.010
 tea-0.010
 dress-0.010
Token soon
Token ID2582
Feature activation-1.67e-02

Pos logit contributions
 blush+0.015
 Grac+0.014
 coloring+0.013
 piping+0.013
 groom+0.013
Neg logit contributions
urious-0.012
rated-0.009
ust-0.009
uct-0.009
 eternity-0.009
Token.
Token ID13
Feature activation-1.88e-02

Pos logit contributions
urious+0.037
ust+0.030
rated+0.030
uct+0.028
 dangers+0.027
Neg logit contributions
 blush-0.026
 coloring-0.026
 dress-0.023
 tea-0.023
 Grac-0.022
Token park
Token ID3952
Feature activation+9.06e-03

Pos logit contributions
 blush+0.018
 coloring+0.016
 dress+0.016
 tea+0.016
 piping+0.015
Neg logit contributions
urious-0.021
rated-0.018
ust-0.018
 eternity-0.017
uct-0.016
Token and
Token ID290
Feature activation-4.91e-02

Pos logit contributions
 blush+0.013
 coloring+0.013
 Grac+0.012
 tea+0.011
 dress+0.011
Neg logit contributions
urious-0.020
rated-0.017
ust-0.017
uct-0.016
 dangers-0.016
Token look
Token ID804
Feature activation-4.96e-03

Pos logit contributions
urious+0.100
rated+0.086
ust+0.080
 eternity+0.079
uct+0.079
Neg logit contributions
 blush-0.073
acey-0.071
 coloring-0.069
 Grac-0.068
 piping-0.067
Token for
Token ID329
Feature activation-3.54e-02

Pos logit contributions
urious+0.008
rated+0.007
ust+0.006
 eternity+0.006
uct+0.006
Neg logit contributions
 blush-0.010
 coloring-0.009
 kit-0.009
 Grac-0.009
 piping-0.009
Token something
Token ID1223
Feature activation+1.34e-02

Pos logit contributions
urious+0.077
rated+0.065
ust+0.062
uct+0.060
 eternity+0.059
Neg logit contributions
 blush-0.055
 coloring-0.052
 Grac-0.049
 tea-0.047
 dress-0.047
Token y
Token ID331
Feature activation-7.97e-02

Pos logit contributions
 blush+0.021
 coloring+0.019
 Grac+0.018
 piping+0.017
 tea+0.017
Neg logit contributions
urious-0.030
rated-0.025
ust-0.024
uct-0.023
 dangers-0.022
Tokenummy
Token ID13513
Feature activation-1.58e-02

Pos logit contributions
urious+0.134
rated+0.118
uct+0.110
 dangers+0.104
appointed+0.098
Neg logit contributions
 blush-0.147
 coloring-0.139
 tea-0.136
 lipstick-0.133
 dress-0.129
Token to
Token ID284
Feature activation+1.26e-03

Pos logit contributions
urious+0.037
rated+0.031
ust+0.030
uct+0.029
 eternity+0.028
Neg logit contributions
 blush-0.023
 coloring-0.021
 Grac-0.019
 tea-0.019
 dress-0.019
Token eat
Token ID4483
Feature activation-4.21e-03

Pos logit contributions
 blush+0.002
 Grac+0.002
 coloring+0.002
acey+0.002
 piping+0.002
Neg logit contributions
urious-0.003
rated-0.002
ust-0.002
uct-0.002
 eternity-0.002
Token.
Token ID13
Feature activation-3.04e-02

Pos logit contributions
urious+0.010
rated+0.008
ust+0.008
appointed+0.008
uct+0.008
Neg logit contributions
 blush-0.006
 coloring-0.006
 tea-0.005
 dress-0.005
 lipstick-0.005
Token
Token ID198
Feature activation-2.69e-02

Pos logit contributions
urious+0.066
rated+0.057
ust+0.054
uct+0.051
 dangers+0.051
Neg logit contributions
 blush-0.047
 coloring-0.045
 Grac-0.043
 tea-0.041
 Rose-0.041
Token legs
Token ID7405
Feature activation-9.49e-03

Pos logit contributions
urious+0.051
rated+0.040
ust+0.040
 dangers+0.038
uct+0.038
Neg logit contributions
 blush-0.040
 coloring-0.038
 Grac-0.037
 piping-0.035
acey-0.035
Token.
Token ID13
Feature activation-4.11e-03

Pos logit contributions
urious+0.018
rated+0.014
ust+0.014
uct+0.013
 dangers+0.013
Neg logit contributions
 blush-0.017
 coloring-0.016
 Grac-0.016
 piping-0.015
 tea-0.015
Token He
Token ID679
Feature activation-2.11e-02

Pos logit contributions
urious+0.007
rated+0.006
ust+0.005
uct+0.005
 eternity+0.005
Neg logit contributions
 blush-0.008
 coloring-0.008
 Grac-0.007
 groom-0.007
 piping-0.007
Token goes
Token ID2925
Feature activation+6.54e-04

Pos logit contributions
urious+0.039
rated+0.030
ust+0.029
uct+0.028
 dangers+0.028
Neg logit contributions
 blush-0.038
 coloring-0.037
 Grac-0.036
 piping-0.035
acey-0.034
Token very
Token ID845
Feature activation+1.83e-02

Pos logit contributions
 blush+0.001
 piping+0.001
 Grac+0.001
acey+0.001
 coloring+0.001
Neg logit contributions
urious-0.001
rated-0.001
 next-0.001
uct-0.001
appointed-0.001
Token high
Token ID1029
Feature activation-8.24e-02

Pos logit contributions
 blush+0.024
 coloring+0.023
 Grac+0.021
 dress+0.020
 groom+0.020
Neg logit contributions
urious-0.044
rated-0.036
ust-0.035
uct-0.035
 habitat-0.034
Token.
Token ID13
Feature activation-4.10e-03

Pos logit contributions
urious+0.197
ust+0.160
rated+0.158
uct+0.154
 habitat+0.149
Neg logit contributions
 coloring-0.105
 blush-0.104
 dress-0.094
 tea-0.093
 Grac-0.092
Token He
Token ID679
Feature activation-2.58e-02

Pos logit contributions
urious+0.008
rated+0.006
ust+0.006
uct+0.006
 dangers+0.006
Neg logit contributions
 blush-0.007
 coloring-0.007
 Grac-0.007
 piping-0.007
 tea-0.006
Token feels
Token ID5300
Feature activation+1.80e-02

Pos logit contributions
urious+0.048
rated+0.038
ust+0.037
uct+0.035
 dangers+0.035
Neg logit contributions
 blush-0.047
 coloring-0.044
 Grac-0.044
 piping-0.042
acey-0.041
Token happy
Token ID3772
Feature activation+7.18e-03

Pos logit contributions
 blush+0.032
 coloring+0.031
acey+0.029
 Grac+0.029
 kit+0.029
Neg logit contributions
urious-0.032
rated-0.026
 dangers-0.026
ust-0.025
uct-0.025
Token.
Token ID13
Feature activation-5.02e-03

Pos logit contributions
 blush+0.011
 coloring+0.011
 dress+0.010
 tea+0.010
 Grac+0.010
Neg logit contributions
urious-0.016
rated-0.013
ust-0.013
uct-0.012
appointed-0.012
Token fall
Token ID2121
Feature activation-2.47e-02

Pos logit contributions
urious+0.081
rated+0.064
ust+0.063
uct+0.063
 habitats+0.062
Neg logit contributions
 blush-0.045
 coloring-0.042
 dress-0.040
 Grac-0.040
 tea-0.039
Token.
Token ID13
Feature activation-1.41e-02

Pos logit contributions
urious+0.057
rated+0.046
ust+0.045
uct+0.045
 eternity+0.042
Neg logit contributions
 blush-0.036
 coloring-0.034
 Grac-0.032
 dress-0.031
 tea-0.031
Token
Token ID220
Feature activation-2.57e-02

Pos logit contributions
urious+0.026
rated+0.020
uct+0.020
ust+0.019
 eternity+0.019
Neg logit contributions
 blush-0.025
 groom-0.024
 coloring-0.024
 piping-0.023
 wig-0.023
Token
Token ID198
Feature activation-2.49e-02

Pos logit contributions
urious+0.060
rated+0.053
ust+0.049
 dangers+0.048
 habitat+0.048
Neg logit contributions
 blush-0.033
 Grac-0.032
 coloring-0.032
 Rose-0.030
acey-0.030
Token
Token ID198
Feature activation-2.17e-02

Pos logit contributions
urious+0.042
rated+0.031
ust+0.031
uct+0.030
 eternity+0.028
Neg logit contributions
 blush-0.051
 coloring-0.048
 groom-0.047
 wig-0.045
 piping-0.045
TokenSp
Token ID4561
Feature activation-7.90e-02

Pos logit contributions
urious+0.040
rated+0.030
ust+0.030
uct+0.028
 dangers+0.027
Neg logit contributions
 blush-0.042
 coloring-0.040
 groom-0.038
 piping-0.038
 Grac-0.037
Tokenide
Token ID485
Feature activation-4.70e-02

Pos logit contributions
urious+0.127
rated+0.102
uct+0.099
appointed+0.093
 eternity+0.093
Neg logit contributions
 blush-0.151
 coloring-0.150
 Grac-0.145
acey-0.142
 tea-0.139
Tokeny
Token ID88
Feature activation+1.95e-02

Pos logit contributions
urious+0.118
rated+0.099
uct+0.099
ust+0.098
 eternity+0.096
Neg logit contributions
 blush-0.053
 coloring-0.049
 Grac-0.048
 tea-0.043
acey-0.043
Token listened
Token ID16399
Feature activation-3.07e-02

Pos logit contributions
 blush+0.026
 dress+0.024
 coloring+0.023
 comb+0.022
 Grac+0.022
Neg logit contributions
urious-0.044
rated-0.038
ust-0.038
 eternity-0.036
 dangers-0.036
Token to
Token ID284
Feature activation+7.67e-03

Pos logit contributions
urious+0.074
rated+0.062
ust+0.061
uct+0.059
 dangers+0.059
Neg logit contributions
 blush-0.040
 coloring-0.038
 Grac-0.035
 tea-0.033
 dress-0.033
Token his
Token ID465
Feature activation-3.44e-02

Pos logit contributions
 blush+0.015
 coloring+0.014
 piping+0.013
 Grac+0.013
 groom+0.013
Neg logit contributions
urious-0.014
rated-0.011
ust-0.010
 eternity-0.010
 dangers-0.010
Tokenberries
Token ID20853
Feature activation-2.61e-02

Pos logit contributions
urious+0.031
rated+0.025
ust+0.025
uct+0.023
 eternity+0.023
Neg logit contributions
 blush-0.023
 coloring-0.021
 Grac-0.020
acey-0.020
 piping-0.019
Token.
Token ID13
Feature activation-3.98e-03

Pos logit contributions
urious+0.060
rated+0.048
ust+0.047
uct+0.046
 dangers+0.046
Neg logit contributions
 blush-0.038
 coloring-0.036
 Grac-0.034
 tea-0.034
 dress-0.033
Token They
Token ID1119
Feature activation+1.54e-02

Pos logit contributions
urious+0.008
rated+0.006
ust+0.006
uct+0.006
 dangers+0.006
Neg logit contributions
 blush-0.007
 coloring-0.007
 Grac-0.006
 piping-0.006
 tea-0.006
Token look
Token ID804
Feature activation+3.66e-03

Pos logit contributions
 blush+0.021
 coloring+0.020
 piping+0.018
 wig+0.018
acey+0.018
Neg logit contributions
urious-0.035
rated-0.031
ust-0.030
uct-0.028
 dangers-0.028
Token so
Token ID523
Feature activation-2.25e-03

Pos logit contributions
 blush+0.006
 coloring+0.006
 kit+0.005
 wig+0.005
acey+0.005
Neg logit contributions
urious-0.007
rated-0.006
ust-0.006
uct-0.006
 eternity-0.006
Token y
Token ID331
Feature activation-8.01e-02

Pos logit contributions
urious+0.005
rated+0.005
ust+0.004
 eternity+0.004
uct+0.004
Neg logit contributions
 blush-0.003
 coloring-0.003
 Grac-0.003
 tea-0.003
 piping-0.002
Tokenummy
Token ID13513
Feature activation-1.91e-02

Pos logit contributions
urious+0.134
rated+0.106
 dangers+0.099
uct+0.098
ust+0.095
Neg logit contributions
 blush-0.158
 coloring-0.149
 Grac-0.148
 tea-0.145
 lipstick-0.141
Token!"
Token ID2474
Feature activation+1.61e-02

Pos logit contributions
urious+0.038
rated+0.032
ust+0.031
 eternity+0.029
uct+0.029
Neg logit contributions
 blush-0.032
 coloring-0.030
 Grac-0.029
 piping-0.028
acey-0.028
Token
Token ID198
Feature activation-3.00e-02

Pos logit contributions
 blush+0.030
 coloring+0.028
 Grac+0.027
 piping+0.026
 groom+0.026
Neg logit contributions
urious-0.030
rated-0.024
ust-0.023
uct-0.022
 eternity-0.022
Token
Token ID198
Feature activation-2.71e-02

Pos logit contributions
urious+0.056
rated+0.044
ust+0.043
uct+0.041
 eternity+0.040
Neg logit contributions
 blush-0.056
 coloring-0.053
 Grac-0.051
 piping-0.049
 tea-0.049
TokenHer
Token ID9360
Feature activation-1.91e-02

Pos logit contributions
urious+0.052
rated+0.040
ust+0.040
uct+0.038
appointed+0.037
Neg logit contributions
 blush-0.050
 coloring-0.048
 Grac-0.045
 piping-0.044
 tea-0.044
Token.
Token ID13
Feature activation-5.99e-03

Pos logit contributions
 blush+0.047
 coloring+0.046
 Grac+0.042
 tea+0.042
 dress+0.041
Neg logit contributions
urious-0.058
rated-0.048
ust-0.047
uct-0.045
 eternity-0.043
Token But
Token ID887
Feature activation-4.61e-02

Pos logit contributions
urious+0.012
rated+0.009
ust+0.009
uct+0.009
 dangers+0.008
Neg logit contributions
 blush-0.011
 coloring-0.010
 Grac-0.010
 piping-0.009
 tea-0.009
Token they
Token ID484
Feature activation-3.81e-03

Pos logit contributions
urious+0.107
rated+0.089
ust+0.086
uct+0.082
 habitat+0.080
Neg logit contributions
 blush-0.068
 coloring-0.067
 Grac-0.064
 dress-0.061
 tea-0.059
Token are
Token ID389
Feature activation+1.24e-02

Pos logit contributions
urious+0.009
rated+0.008
ust+0.007
uct+0.007
 habitat+0.007
Neg logit contributions
 blush-0.005
 coloring-0.005
 Grac-0.005
 tea-0.004
 dress-0.004
Token too
Token ID1165
Feature activation-2.79e-02

Pos logit contributions
 blush+0.019
 coloring+0.018
 Grac+0.017
 tea+0.017
 piping+0.016
Neg logit contributions
urious-0.027
rated-0.022
ust-0.022
uct-0.021
 eternity-0.020
Token high
Token ID1029
Feature activation-1.04e-01

Pos logit contributions
urious+0.070
rated+0.064
uct+0.062
 eternity+0.061
ust+0.059
Neg logit contributions
 coloring-0.027
 Grac-0.026
 tea-0.025
 blush-0.025
 handful-0.024
Token for
Token ID329
Feature activation-1.71e-02

Pos logit contributions
urious+0.251
ust+0.205
rated+0.198
uct+0.198
 dangers+0.187
Neg logit contributions
 coloring-0.136
 blush-0.136
 Grac-0.125
 tea-0.122
 dress-0.116
Token you
Token ID345
Feature activation+5.36e-03

Pos logit contributions
urious+0.032
rated+0.027
ust+0.026
uct+0.023
appointed+0.022
Neg logit contributions
 blush-0.033
 coloring-0.032
 tea-0.031
 Grac-0.030
 dress-0.029
Token to
Token ID284
Feature activation+1.57e-03

Pos logit contributions
 blush+0.007
 coloring+0.007
 Grac+0.006
 tea+0.006
 dress+0.006
Neg logit contributions
urious-0.013
rated-0.011
ust-0.010
uct-0.010
 eternity-0.010
Token reach
Token ID3151
Feature activation-1.67e-02

Pos logit contributions
 blush+0.002
 coloring+0.002
 dress+0.002
 tea+0.002
 Grac+0.002
Neg logit contributions
urious-0.004
rated-0.003
ust-0.003
uct-0.003
 dangers-0.003
Token.
Token ID13
Feature activation-9.02e-03

Pos logit contributions
urious+0.037
rated+0.031
ust+0.030
uct+0.029
 dangers+0.028
Neg logit contributions
 blush-0.025
 coloring-0.024
 Grac-0.023
 tea-0.022
 piping-0.021