TOP ACTIVATIONS
MAX = 0.100

a little girl called Stacy . She was just three years
the window . T ara had a great time
believe it ! T ara jumped in and started
it pretty . T ina had an idea to
But one day , when D ina was riding her tr
back . T ina was curious . She
beside Tara . T ara was very patient and
It smelled horrible . Elsa was sad . She
noticed before . T ina wanted to explore the
be seen . B izzy tried asking some of
could go on . One day Stacy decided she
to examine it . So she took a closer
v ase . T ina saw a tall vine
playing in the park , D ina would ride her tr
the wind . B izzy woke up and couldn
noise from outside . St acy opened the door
special little girl named Stacy . She was always so curious
in the mud . Gr ace found it .
towel . T ina named the kitten Lightning
neck . T illy spent every day playing

MIN ACTIVATIONS
MIN = -0.119

Hello , kids ," the firefighter says . " Do you
make you sick !" the man yells . Tim
" Bad dog !" the man says . He is angry
, qu ack !" The ducks and ge ese made loud
a crocod ile !" the man says . " We need
worry , son !" the man says . " I will
, kids !" the zo oke eper said . " The
M ira and R avi and their friends ran to
back my things ," the man says . He is right
!" one of the fire men said . " Did you
Stop ! Stop !" the man calls . He runs to
might hurt you ." the man says . Anna
stop !" L ila and Ben shout . They run to
I 'm sorry ," the man says . " Rex is
are missing out ," the man says . Tom
, bad dog !" the man says . He sounds angry
" O w !" the man says . " The dog
oc eros !" Anna and Ben said . The
the zoo ," the zo oke eper says . " Don
" Hey , you !" he shouted . " You

INTERVAL 0.045 - 0.100
CONTAINS 1.081%

practiced in an icy room . This time it was easier
. As soon as she entered the room and started to
stepped into the movie theater , she immediately saw popcorn on
m op by her side . <|endoftext|> Once upon a time
very happy . She kept dreaming about her style each night

INTERVAL -0.009 - 0.045
CONTAINS 67.893%

went to the store . So the little girl happily ran
, mom . We are sorry . We will clean our
there was a little girl named Lily . She loved to
The little girl smiled , laughed , and kept playing .
she went to the stage . She saw a man .

INTERVAL -0.064 - -0.009
CONTAINS 31.000%

will be mad if you break the h angers ."
the boat so much that she ran away from him and
home , John was angry . He had realized that Sam
were so excited that they all joined hands and wished as
the bear and the bear grate fully accepted .

INTERVAL -0.119 - -0.064
CONTAINS 0.026%

?" Tom and Mia shook their heads . They
sorry , we are sorry !" The angry man
! O ink !" The pig said . It shook its
. Ben and Mia see the dragon 's face
with cubes . They have many cubes of different colors and
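
The four INTERVAL boundaries above look consistent with splitting the range from MIN (-0.119) to MAX (0.100) into four equal-width bins. This is an inference from the listed numbers, not something the dump states. A minimal sketch under that assumption follows; the helper name `interval_edges` and the bin count of 4 are hypothetical, chosen only for illustration.

```python
def interval_edges(lo: float, hi: float, n_bins: int = 4) -> list[float]:
    """Return the n_bins + 1 edges of equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as printed in the dump (already rounded to three decimals).
edges = interval_edges(-0.119, 0.100)

# Shares listed under each CONTAINS line, highest interval first.
shares = [1.081, 67.893, 31.000, 0.026]
total_share = sum(shares)
```

Under this assumption the interior edges fall near -0.064, -0.009 and 0.045, and the four CONTAINS shares add up to roughly 100%. Small differences in the last digit are expected because the printed MIN and MAX are themselves rounded.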
Token a
Token ID257
Feature activation+7.31e-03

Pos logit contributions
enny+0.011
 Jewel+0.011
Em+0.011
pillar+0.011
Well+0.010
Neg logit contributions
 hon-0.011
ans-0.010
cks-0.010
 ge-0.010
ep-0.009
Token little
Token ID1310
Feature activation+1.28e-02

Pos logit contributions
 Jewel+0.011
Em+0.011
enny+0.011
pillar+0.011
Of+0.010
Neg logit contributions
ans-0.013
 hon-0.013
cks-0.013
 steer-0.012
 ge-0.012
Token girl
Token ID2576
Feature activation+1.47e-02

Pos logit contributions
enny+0.018
 Jewel+0.018
Em+0.017
pillar+0.017
Well+0.016
Neg logit contributions
 hon-0.026
ans-0.025
 ge-0.024
cks-0.024
 paddle-0.023
Token called
Token ID1444
Feature activation+2.73e-02

Pos logit contributions
enny+0.028
 Jewel+0.028
Em+0.027
pillar+0.027
Well+0.026
Neg logit contributions
 hon-0.023
ans-0.022
cks-0.020
 ge-0.019
 steer-0.019
Token Stacy
Token ID49523
Feature activation+1.86e-02

Pos logit contributions
pillar+0.048
Well+0.047
Really+0.046
Hmm+0.046
ometime+0.046
Neg logit contributions
 hon-0.043
 ge-0.039
ans-0.039
cks-0.036
 polar-0.036
Token.
Token ID13
Feature activation+9.99e-02

Pos logit contributions
enny+0.033
Em+0.033
 Jewel+0.032
pillar+0.031
 she+0.031
Neg logit contributions
 hon-0.032
cks-0.030
ans-0.029
 paddle-0.028
 afloat-0.028
Token She
Token ID1375
Feature activation+2.22e-02

Pos logit contributions
ths+0.155
stone+0.142
Em+0.138
cha+0.124
 Emmy+0.123
Neg logit contributions
 hon-0.170
 their-0.162
 paddle-0.159
cks-0.157
om-0.157
Token was
Token ID373
Feature activation+1.46e-02

Pos logit contributions
enny+0.040
 Jewel+0.040
pillar+0.039
Em+0.038
Hmm+0.038
Neg logit contributions
 hon-0.037
ans-0.033
ep-0.032
cks-0.031
 ge-0.031
Token just
Token ID655
Feature activation+2.51e-03

Pos logit contributions
enny+0.029
pillar+0.028
Em+0.028
oph+0.027
ometime+0.027
Neg logit contributions
 hon-0.023
ans-0.020
 ge-0.019
cks-0.019
 polar-0.019
Token three
Token ID1115
Feature activation-6.23e-03

Pos logit contributions
enny+0.004
Em+0.004
pillar+0.004
 Jewel+0.004
Well+0.003
Neg logit contributions
 hon-0.005
ans-0.005
 ge-0.005
cks-0.004
 paddle-0.004
Token years
Token ID812
Feature activation+1.51e-02

Pos logit contributions
 hon+0.014
ans+0.013
 paddle+0.013
cks+0.013
 steer+0.013
Neg logit contributions
 Jewel-0.007
enny-0.007
Em-0.007
pillar-0.007
 she-0.006
Token the
Token ID262
Feature activation-4.28e-03

Pos logit contributions
 Jewel+0.029
enny+0.029
Em+0.029
pillar+0.028
 she+0.027
Neg logit contributions
 hon-0.026
ans-0.026
cks-0.024
 ge-0.023
ep-0.023
Token window
Token ID4324
Feature activation+6.84e-03

Pos logit contributions
 hon+0.008
ans+0.008
cks+0.008
 ge+0.008
ull+0.008
Neg logit contributions
 Jewel-0.006
Em-0.006
enny-0.006
pillar-0.006
ometime-0.005
Token.
Token ID13
Feature activation+5.06e-02

Pos logit contributions
enny+0.013
 Jewel+0.013
Em+0.013
pillar+0.013
 she+0.013
Neg logit contributions
 hon-0.010
ans-0.009
cks-0.009
 ge-0.008
ep-0.008
Token
Token ID198
Feature activation+2.51e-02

Pos logit contributions
stone+0.096
ths+0.093
pillar+0.090
ience+0.087
umm+0.086
Neg logit contributions
 hon-0.067
ans-0.059
om-0.057
 sled-0.057
 paddle-0.056
Token
Token ID198
Feature activation+8.29e-02

Pos logit contributions
enny+0.046
Em+0.045
 Jewel+0.045
pillar+0.044
Well+0.042
Neg logit contributions
 hon-0.041
ans-0.039
cks-0.037
 ge-0.036
 paddle-0.035
TokenT
Token ID51
Feature activation+9.94e-02

Pos logit contributions
umm+0.125
stone+0.121
 Jewel+0.120
 Well+0.114
 cartridge+0.113
Neg logit contributions
 hon-0.154
ep-0.153
ans-0.143
 steer-0.135
cks-0.130
Tokenara
Token ID3301
Feature activation+4.28e-03

Pos logit contributions
 Jewel+0.149
Really+0.143
stone+0.140
Certainly+0.140
 Close+0.138
Neg logit contributions
ans-0.180
ep-0.169
 hon-0.164
cks-0.149
 snowball-0.147
Token had
Token ID550
Feature activation+1.20e-02

Pos logit contributions
enny+0.006
Em+0.006
pillar+0.006
 Jewel+0.006
Well+0.005
Neg logit contributions
ans-0.008
 hon-0.008
cks-0.008
 ge-0.008
 paddle-0.008
Token a
Token ID257
Feature activation+3.20e-02

Pos logit contributions
enny+0.020
 Jewel+0.020
Em+0.020
pillar+0.019
innie+0.018
Neg logit contributions
 hon-0.022
ans-0.020
cks-0.019
 ge-0.019
ep-0.019
Token great
Token ID1049
Feature activation+2.19e-02

Pos logit contributions
enny+0.058
 Jewel+0.055
Em+0.055
pillar+0.054
stone+0.052
Neg logit contributions
 hon-0.056
 ge-0.047
 paddle-0.047
ans-0.047
 sled-0.047
Token time
Token ID640
Feature activation+6.15e-03

Pos logit contributions
 Jewel+0.035
enny+0.035
Em+0.034
pillar+0.034
Well+0.032
Neg logit contributions
 hon-0.040
ans-0.039
cks-0.037
 ge-0.036
 steer-0.035
Token believe
Token ID1975
Feature activation+2.51e-02

Pos logit contributions
 Jewel+0.001
Em+0.001
enny+0.001
pillar+0.001
stone+0.001
Neg logit contributions
ans-0.002
 hon-0.002
cks-0.002
 ge-0.002
ep-0.002
Token it
Token ID340
Feature activation+3.70e-02

Pos logit contributions
pillar+0.042
enny+0.041
Hmm+0.041
Em+0.040
 Close+0.039
Neg logit contributions
 hon-0.044
cks-0.042
ans-0.040
 ge-0.037
 their-0.037
Token!
Token ID0
Feature activation+3.62e-02

Pos logit contributions
enny+0.099
Really+0.098
pillar+0.097
Em+0.097
Of+0.097
Neg logit contributions
 hon-0.028
ep-0.022
cks-0.021
ans-0.020
 steer-0.015
Token
Token ID198
Feature activation+2.39e-02

Pos logit contributions
stone+0.067
 Jewel+0.065
pillar+0.065
Em+0.065
enny+0.064
Neg logit contributions
 hon-0.052
ans-0.050
cks-0.045
 paddle-0.044
 ge-0.044
Token
Token ID198
Feature activation+8.27e-02

Pos logit contributions
enny+0.041
Em+0.041
 Jewel+0.040
pillar+0.039
Well+0.038
Neg logit contributions
 hon-0.042
ans-0.041
cks-0.039
 ge-0.038
 paddle-0.037
TokenT
Token ID51
Feature activation+9.93e-02

Pos logit contributions
stone+0.114
 Jewel+0.113
umm+0.110
 Well+0.107
enny+0.106
Neg logit contributions
 hon-0.164
ep-0.158
ans-0.157
 steer-0.144
cks-0.142
Tokenara
Token ID3301
Feature activation-1.43e-03

Pos logit contributions
 Jewel+0.147
Really+0.143
stone+0.136
Certainly+0.136
Hmm+0.136
Neg logit contributions
ans-0.182
ep-0.172
 hon-0.171
cks-0.153
ull-0.152
Token jumped
Token ID11687
Feature activation+2.38e-02

Pos logit contributions
 hon+0.003
ans+0.003
cks+0.003
 ge+0.002
 paddle+0.002
Neg logit contributions
enny-0.002
Em-0.002
 Jewel-0.002
pillar-0.002
Well-0.002
Token in
Token ID287
Feature activation-4.75e-03

Pos logit contributions
 Jewel+0.040
enny+0.038
Em+0.038
pillar+0.037
 Grace+0.036
Neg logit contributions
ans-0.041
 hon-0.041
cks-0.040
 ge-0.038
ep-0.036
Token and
Token ID290
Feature activation+1.24e-02

Pos logit contributions
ans+0.007
 hon+0.007
cks+0.007
 ge+0.006
ep+0.006
Neg logit contributions
 Jewel-0.009
Em-0.009
enny-0.009
pillar-0.009
 Grace-0.008
Token started
Token ID2067
Feature activation+4.10e-03

Pos logit contributions
enny+0.020
 Jewel+0.020
Em+0.020
pillar+0.019
innie+0.019
Neg logit contributions
 hon-0.023
ans-0.021
cks-0.020
 ge-0.020
ep-0.019
Token it
Token ID340
Feature activation+2.44e-04

Pos logit contributions
 Jewel+0.006
enny+0.006
Em+0.006
pillar+0.006
 she+0.006
Neg logit contributions
 hon-0.008
ans-0.008
cks-0.008
 steer-0.007
 paddle-0.007
Token pretty
Token ID2495
Feature activation+5.76e-03

Pos logit contributions
 she+0.000
 Jewel+0.000
stone+0.000
 diary+0.000
enny+0.000
Neg logit contributions
ans-0.000
 hon-0.000
cks-0.000
 ge-0.000
ep-0.000
Token.
Token ID13
Feature activation+3.88e-02

Pos logit contributions
 Jewel+0.009
enny+0.008
Em+0.008
 she+0.008
pillar+0.008
Neg logit contributions
 hon-0.011
ans-0.011
cks-0.010
 ge-0.010
 paddle-0.010
Token
Token ID198
Feature activation+1.20e-02

Pos logit contributions
 Jewel+0.066
pillar+0.065
enny+0.064
Really+0.063
umm+0.063
Neg logit contributions
 hon-0.060
ans-0.058
 paddle-0.052
 ge-0.051
ep-0.051
Token
Token ID198
Feature activation+6.84e-02

Pos logit contributions
enny+0.021
Em+0.021
 Jewel+0.021
pillar+0.020
Well+0.019
Neg logit contributions
 hon-0.020
ans-0.020
cks-0.019
 ge-0.018
 paddle-0.018
TokenT
Token ID51
Feature activation+9.88e-02

Pos logit contributions
umm+0.114
 Jewel+0.109
Really+0.104
 Well+0.103
 Close+0.102
Neg logit contributions
ep-0.117
 hon-0.116
ans-0.109
 steer-0.100
 paddle-0.097
Tokenina
Token ID1437
Feature activation-1.35e-02

Pos logit contributions
Really+0.158
 Jewel+0.153
Certainly+0.148
Hmm+0.146
Of+0.143
Neg logit contributions
ans-0.178
ep-0.168
ull-0.157
 hon-0.150
an-0.146
Token had
Token ID550
Feature activation-1.27e-03

Pos logit contributions
 hon+0.025
ans+0.024
cks+0.022
 ge+0.022
ep+0.022
Neg logit contributions
 Jewel-0.021
enny-0.021
Em-0.021
pillar-0.020
Hmm-0.020
Token an
Token ID281
Feature activation+5.13e-03

Pos logit contributions
 hon+0.002
ans+0.002
cks+0.002
 ge+0.002
ep+0.002
Neg logit contributions
enny-0.002
 Jewel-0.002
Em-0.002
pillar-0.002
innie-0.002
Token idea
Token ID2126
Feature activation-2.51e-03

Pos logit contributions
enny+0.012
Em+0.012
pillar+0.012
 Jewel+0.012
Ruby+0.011
Neg logit contributions
 hon-0.006
ans-0.006
cks-0.005
ep-0.005
 ge-0.005
Token to
Token ID284
Feature activation-5.36e-03

Pos logit contributions
 hon+0.003
ans+0.003
cks+0.003
 ge+0.002
ep+0.002
Neg logit contributions
 Jewel-0.006
enny-0.006
Em-0.006
pillar-0.006
Hmm-0.005
TokenBut
Token ID1537
Feature activation+4.94e-02

Pos logit contributions
 Jewel+0.095
enny+0.095
stone+0.095
 diary+0.095
 cartridge+0.092
Neg logit contributions
 hon-0.132
ans-0.126
 ge-0.124
ep-0.119
 paddle-0.117
Token one
Token ID530
Feature activation+2.06e-02

Pos logit contributions
pillar+0.095
enny+0.095
Em+0.091
ometime+0.088
umm+0.088
Neg logit contributions
 hon-0.077
 ge-0.073
 their-0.068
ans-0.068
 paddle-0.065
Token day
Token ID1110
Feature activation+4.01e-02

Pos logit contributions
 Jewel+0.036
Em+0.034
enny+0.034
 cartridge+0.034
stone+0.033
Neg logit contributions
 hon-0.036
 ge-0.032
ans-0.031
 polar-0.031
 snowball-0.030
Token,
Token ID11
Feature activation+5.51e-02

Pos logit contributions
Em+0.100
 cartridge+0.096
enny+0.096
 Jewel+0.096
pillar+0.095
Neg logit contributions
 hon-0.038
 ge-0.036
 paddle-0.030
 polar-0.029
ans-0.028
Token when
Token ID618
Feature activation+4.35e-02

Pos logit contributions
pillar+0.131
Em+0.130
enny+0.130
innie+0.127
 cartridge+0.126
Neg logit contributions
 ge-0.060
 their-0.057
 hon-0.055
 paddle-0.049
 polar-0.047
Token D
Token ID360
Feature activation+9.85e-02

Pos logit contributions
enny+0.087
Em+0.087
pillar+0.086
stone+0.083
 cartridge+0.082
Neg logit contributions
 hon-0.065
 ge-0.062
 paddle-0.058
 their-0.057
 polar-0.056
Tokenina
Token ID1437
Feature activation+2.25e-02

Pos logit contributions
Really+0.188
 Jewel+0.170
ths+0.169
Em+0.163
Hmm+0.161
Neg logit contributions
ans-0.156
an-0.120
ull-0.119
 polar-0.118
om-0.117
Token was
Token ID373
Feature activation-1.15e-02

Pos logit contributions
enny+0.034
 Jewel+0.034
Em+0.033
pillar+0.033
Well+0.031
Neg logit contributions
 hon-0.043
ans-0.041
cks-0.039
 ge-0.038
ep-0.037
Token riding
Token ID10311
Feature activation+4.12e-02

Pos logit contributions
ans+0.021
 hon+0.021
cks+0.020
 ge+0.019
cker+0.019
Neg logit contributions
enny-0.017
Em-0.017
 Jewel-0.017
 she-0.016
pillar-0.016
Token her
Token ID607
Feature activation+3.72e-02

Pos logit contributions
enny+0.070
 Jewel+0.069
Em+0.068
pillar+0.067
Well+0.064
Neg logit contributions
 hon-0.072
ans-0.068
cks-0.065
 ge-0.063
ep-0.062
Token tr
Token ID491
Feature activation+3.23e-02

Pos logit contributions
Em+0.074
enny+0.074
 Jewel+0.072
Hmm+0.072
 cartridge+0.072
Neg logit contributions
 hon-0.052
 sled-0.050
ans-0.047
 paddle-0.044
 ge-0.044
Token back
Token ID736
Feature activation+1.20e-02

Pos logit contributions
enny+0.063
 Jewel+0.063
Em+0.062
pillar+0.061
Well+0.059
Neg logit contributions
 hon-0.050
ans-0.048
cks-0.045
 ge-0.043
ep-0.042
Token.
Token ID13
Feature activation+4.24e-02

Pos logit contributions
 Jewel+0.023
enny+0.023
pillar+0.022
Em+0.022
 she+0.021
Neg logit contributions
 hon-0.019
ans-0.018
cks-0.017
ep-0.016
 ge-0.016
Token
Token ID220
Feature activation-4.86e-03

Pos logit contributions
stone+0.079
brate+0.078
ths+0.077
pillar+0.076
ometime+0.076
Neg logit contributions
 hon-0.050
ans-0.046
 sled-0.045
 paddle-0.044
 polar-0.042
Token
Token ID198
Feature activation+3.62e-02

Pos logit contributions
 hon+0.009
ans+0.008
 ge+0.008
cks+0.008
 paddle+0.008
Neg logit contributions
enny-0.008
Em-0.008
 Jewel-0.008
pillar-0.008
Well-0.008
Token
Token ID198
Feature activation+6.60e-02

Pos logit contributions
 Jewel+0.071
enny+0.071
pillar+0.069
Em+0.069
stone+0.067
Neg logit contributions
 hon-0.052
ans-0.048
cks-0.045
ep-0.043
 ge-0.043
TokenT
Token ID51
Feature activation+9.80e-02

Pos logit contributions
umm+0.097
 Jewel+0.091
 Close+0.090
oph+0.088
 cartridge+0.087
Neg logit contributions
ep-0.127
 hon-0.123
ans-0.109
 steer-0.108
 sled-0.105
Tokenina
Token ID1437
Feature activation-2.96e-03

Pos logit contributions
 Jewel+0.154
Really+0.149
brate+0.145
Hmm+0.141
Certainly+0.139
Neg logit contributions
ans-0.175
ep-0.163
ull-0.159
 hon-0.157
 polar-0.150
Token was
Token ID373
Feature activation+1.05e-03

Pos logit contributions
 hon+0.006
ans+0.005
cks+0.005
 ge+0.005
ep+0.005
Neg logit contributions
enny-0.004
 Jewel-0.004
Em-0.004
pillar-0.004
Well-0.004
Token curious
Token ID11040
Feature activation+2.94e-02

Pos logit contributions
 Jewel+0.001
Em+0.001
enny+0.001
 she+0.001
 Charlotte+0.001
Neg logit contributions
ans-0.002
 hon-0.002
cks-0.002
 steer-0.002
 ge-0.002
Token.
Token ID13
Feature activation+4.05e-02

Pos logit contributions
 Jewel+0.066
Em+0.066
enny+0.065
Hmm+0.064
innie+0.064
Neg logit contributions
 hon-0.034
ans-0.028
cks-0.026
 ge-0.026
 steer-0.025
Token She
Token ID1375
Feature activation+5.89e-03

Pos logit contributions
ha+0.054
brate+0.053
Em+0.052
pillar+0.051
enny+0.051
Neg logit contributions
 hon-0.072
ans-0.067
 paddle-0.065
 steer-0.064
 horn-0.062
Token beside
Token ID13970
Feature activation+3.37e-02

Pos logit contributions
enny+0.023
 she+0.023
 Jewel+0.023
pillar+0.023
stone+0.022
Neg logit contributions
ans-0.020
 hon-0.020
cks-0.020
 ge-0.018
ep-0.018
Token Tara
Token ID37723
Feature activation+2.73e-02

Pos logit contributions
enny+0.052
 Jewel+0.051
Em+0.051
pillar+0.050
Well+0.047
Neg logit contributions
 hon-0.064
ans-0.061
cks-0.058
 ge-0.057
ep-0.055
Token.
Token ID13
Feature activation+5.19e-02

Pos logit contributions
 Jewel+0.052
enny+0.052
Em+0.052
 she+0.051
pillar+0.051
Neg logit contributions
 hon-0.041
ans-0.039
cks-0.038
 ge-0.037
ep-0.036
Token
Token ID198
Feature activation+3.07e-02

Pos logit contributions
 Jewel+0.082
Em+0.082
enny+0.081
stone+0.081
pillar+0.081
Neg logit contributions
 hon-0.092
ans-0.085
cks-0.081
 paddle-0.079
 ge-0.079
Token
Token ID198
Feature activation+8.38e-02

Pos logit contributions
enny+0.054
Em+0.054
 Jewel+0.053
pillar+0.052
Well+0.050
Neg logit contributions
 hon-0.052
ans-0.051
cks-0.048
 ge-0.047
 paddle-0.045
TokenT
Token ID51
Feature activation+9.75e-02

Pos logit contributions
 Jewel+0.113
stone+0.111
enny+0.107
 cartridge+0.104
umm+0.103
Neg logit contributions
 hon-0.174
ep-0.164
ans-0.162
 steer-0.152
cks-0.150
Tokenara
Token ID3301
Feature activation+1.45e-02

Pos logit contributions
 Jewel+0.153
Really+0.146
stone+0.146
Hmm+0.142
Certainly+0.142
Neg logit contributions
ans-0.166
ep-0.157
 hon-0.154
 snowball-0.142
ull-0.138
Token was
Token ID373
Feature activation+6.59e-03

Pos logit contributions
enny+0.020
Em+0.020
pillar+0.020
 Jewel+0.019
Ruby+0.018
Neg logit contributions
 hon-0.029
ans-0.028
cks-0.028
 ge-0.026
 paddle-0.026
Token very
Token ID845
Feature activation+2.71e-03

Pos logit contributions
 Jewel+0.009
 she+0.009
 Grace+0.009
Em+0.009
 Charlotte+0.008
Neg logit contributions
ans-0.014
 hon-0.013
cks-0.013
 steer-0.012
 paddle-0.012
Token patient
Token ID5827
Feature activation+8.19e-04

Pos logit contributions
 Jewel+0.005
enny+0.005
 she+0.005
Em+0.005
Of+0.005
Neg logit contributions
ans-0.004
 hon-0.004
cks-0.004
ep-0.003
 sled-0.003
Token and
Token ID290
Feature activation+1.06e-02

Pos logit contributions
 Jewel+0.001
enny+0.001
Em+0.001
pillar+0.001
 she+0.001
Neg logit contributions
ans-0.001
 hon-0.001
cks-0.001
 ge-0.001
 paddle-0.001
Token It
Token ID632
Feature activation+1.15e-02

Pos logit contributions
ths+0.119
pr+0.110
enny+0.108
illy+0.108
years+0.106
Neg logit contributions
 hon-0.125
 their-0.115
 paddle-0.113
 horns-0.111
ans-0.109
Token smelled
Token ID37609
Feature activation-1.61e-03

Pos logit contributions
enny+0.018
 Jewel+0.018
Em+0.018
pillar+0.017
innie+0.017
Neg logit contributions
 hon-0.022
ans-0.020
cks-0.020
 ge-0.019
ep-0.019
Token horrible
Token ID12361
Feature activation+1.33e-02

Pos logit contributions
 hon+0.003
ans+0.003
cks+0.003
 ge+0.003
ep+0.003
Neg logit contributions
 Jewel-0.003
enny-0.003
Em-0.003
pillar-0.003
Well-0.002
Token.
Token ID13
Feature activation+6.95e-02

Pos logit contributions
Em+0.029
enny+0.029
 Jewel+0.029
pillar+0.028
Hmm+0.028
Neg logit contributions
 hon-0.017
ans-0.016
cks-0.015
 ge-0.014
ep-0.013
Token
Token ID198
Feature activation+3.51e-02

Pos logit contributions
ths+0.105
pr+0.096
illy+0.094
years+0.093
enny+0.093
Neg logit contributions
 hon-0.121
 their-0.111
ans-0.108
 horns-0.107
 paddle-0.106
Token
Token ID198
Feature activation+9.63e-02

Pos logit contributions
 Jewel+0.072
enny+0.072
pillar+0.070
Em+0.070
stone+0.070
Neg logit contributions
 hon-0.046
ans-0.042
cks-0.041
ep-0.038
 steer-0.038
TokenElsa
Token ID49050
Feature activation+1.61e-02

Pos logit contributions
umm+0.148
stone+0.136
 Well+0.135
enny+0.135
ths+0.133
Neg logit contributions
 hon-0.168
ep-0.165
ans-0.152
 steer-0.152
cks-0.149
Token was
Token ID373
Feature activation-6.20e-03

Pos logit contributions
enny+0.024
 Jewel+0.023
Em+0.023
pillar+0.023
Well+0.021
Neg logit contributions
 hon-0.031
ans-0.030
cks-0.029
 ge-0.028
 steer-0.027
Token sad
Token ID6507
Feature activation+2.49e-02

Pos logit contributions
ans+0.011
 hon+0.010
cks+0.010
 steer+0.010
 ge+0.009
Neg logit contributions
 Jewel-0.011
Em-0.010
 she-0.010
enny-0.010
 Close-0.010
Token.
Token ID13
Feature activation+5.30e-02

Pos logit contributions
Em+0.059
 Jewel+0.057
Hmm+0.057
Ruby+0.056
Close+0.056
Neg logit contributions
 hon-0.028
ans-0.023
 sled-0.022
 ge-0.021
 steer-0.021
Token She
Token ID1375
Feature activation-2.69e-03

Pos logit contributions
ha+0.084
pr+0.083
Em+0.081
prise+0.081
ths+0.080
Neg logit contributions
 hon-0.077
ans-0.068
 horn-0.068
 head-0.067
 noises-0.067
Token noticed
Token ID6810
Feature activation+1.60e-02

Pos logit contributions
enny+0.003
Em+0.003
 she+0.003
 Jewel+0.003
pillar+0.003
Neg logit contributions
ans-0.004
 hon-0.004
cks-0.004
 paddle-0.004
 ge-0.004
Token before
Token ID878
Feature activation-1.05e-02

Pos logit contributions
 Jewel+0.032
enny+0.032
Em+0.032
pillar+0.031
Well+0.030
Neg logit contributions
 hon-0.023
ans-0.022
cks-0.020
 ge-0.019
 steer-0.019
Token.
Token ID13
Feature activation+3.46e-02

Pos logit contributions
ans+0.015
 hon+0.015
cks+0.014
 ge+0.013
ep+0.013
Neg logit contributions
 Jewel-0.020
Em-0.020
enny-0.020
 she-0.020
 Emmy-0.020
Token
Token ID198
Feature activation+3.00e-02

Pos logit contributions
pillar+0.066
stone+0.063
ometime+0.063
 Jewel+0.063
ths+0.061
Neg logit contributions
 hon-0.048
ans-0.043
 paddle-0.042
 polar-0.039
ep-0.039
Token
Token ID198
Feature activation+5.63e-02

Pos logit contributions
 Jewel+0.056
enny+0.056
Em+0.056
pillar+0.055
Well+0.052
Neg logit contributions
 hon-0.046
ans-0.044
cks-0.042
 ge-0.040
ep-0.039
TokenT
Token ID51
Feature activation+9.61e-02

Pos logit contributions
 Jewel+0.079
pillar+0.077
umm+0.075
enny+0.074
ometime+0.074
Neg logit contributions
 hon-0.111
ep-0.106
ans-0.102
 steer-0.097
cks-0.095
Tokenina
Token ID1437
Feature activation-6.67e-04

Pos logit contributions
Really+0.145
 Jewel+0.144
Hmm+0.137
brate+0.136
Certainly+0.135
Neg logit contributions
ans-0.180
ull-0.165
 hon-0.164
ep-0.164
 polar-0.154
Token wanted
Token ID2227
Feature activation+3.87e-02

Pos logit contributions
 hon+0.001
ans+0.001
cks+0.001
 ge+0.001
 steer+0.001
Neg logit contributions
enny-0.001
Em-0.001
 Jewel-0.001
pillar-0.001
Well-0.001
Token to
Token ID284
Feature activation-2.21e-03

Pos logit contributions
enny+0.067
Em+0.066
pillar+0.065
 Jewel+0.065
Really+0.063
Neg logit contributions
 hon-0.068
ans-0.062
 ge-0.060
cks-0.058
 paddle-0.057
Token explore
Token ID7301
Feature activation+1.98e-02

Pos logit contributions
ans+0.004
 hon+0.004
cks+0.004
ep+0.004
 ge+0.004
Neg logit contributions
 Jewel-0.004
enny-0.003
Em-0.003
pillar-0.003
 Close-0.003
Token the
Token ID262
Feature activation+1.64e-02

Pos logit contributions
enny+0.037
 Jewel+0.037
Em+0.036
pillar+0.036
Well+0.034
Neg logit contributions
 hon-0.031
ans-0.029
cks-0.028
 ge-0.027
ep-0.026
Token be
Token ID307
Feature activation+1.05e-02

Pos logit contributions
 Jewel+0.026
enny+0.026
Em+0.026
pillar+0.025
Hmm+0.024
Neg logit contributions
 hon-0.019
ans-0.019
cks-0.018
 ge-0.017
ep-0.016
Token seen
Token ID1775
Feature activation+2.67e-02

Pos logit contributions
enny+0.012
Em+0.012
 she+0.012
 diary+0.011
pillar+0.011
Neg logit contributions
 hon-0.024
ans-0.023
cks-0.022
 paddle-0.022
 ge-0.021
Token.
Token ID13
Feature activation+5.01e-02

Pos logit contributions
enny+0.056
 Jewel+0.056
Em+0.056
pillar+0.054
Well+0.053
Neg logit contributions
 hon-0.035
ans-0.033
cks-0.031
 ge-0.029
ep-0.029
Token
Token ID198
Feature activation+2.83e-02

Pos logit contributions
oph+0.096
stone+0.090
pillar+0.089
ometime+0.088
iley+0.087
Neg logit contributions
 hon-0.068
 polar-0.064
 sled-0.060
 paddle-0.059
om-0.058
Token
Token ID198
Feature activation+7.10e-02

Pos logit contributions
 Jewel+0.056
enny+0.056
Em+0.055
pillar+0.055
stone+0.053
Neg logit contributions
 hon-0.040
ans-0.037
cks-0.036
ep-0.034
 ge-0.034
TokenB
Token ID33
Feature activation+9.58e-02

Pos logit contributions
oph+0.102
umm+0.097
 Jewel+0.094
 cartridge+0.093
enny+0.090
Neg logit contributions
 hon-0.143
ep-0.141
ans-0.127
 steer-0.125
 sled-0.124
Tokenizzy
Token ID40593
Feature activation+1.88e-02

Pos logit contributions
 Jewel+0.127
Really+0.114
 Faith+0.111
 cartridge+0.111
Em+0.109
Neg logit contributions
 hon-0.193
ans-0.191
ep-0.184
 polar-0.180
om-0.178
Token tried
Token ID3088
Feature activation+3.23e-02

Pos logit contributions
enny+0.031
 Jewel+0.031
Em+0.030
pillar+0.029
innie+0.029
Neg logit contributions
 hon-0.034
ans-0.032
ep-0.030
cks-0.030
 ge-0.030
Token asking
Token ID4737
Feature activation+4.31e-02

Pos logit contributions
enny+0.054
Em+0.054
pillar+0.053
 Jewel+0.053
Well+0.051
Neg logit contributions
 hon-0.058
ans-0.051
 ge-0.049
cks-0.049
 steer-0.049
Token some
Token ID617
Feature activation+3.18e-02

Pos logit contributions
pillar+0.076
Em+0.076
enny+0.075
Well+0.074
oph+0.073
Neg logit contributions
 hon-0.071
ans-0.063
 steer-0.061
 their-0.058
 paddle-0.058
Token of
Token ID286
Feature activation+3.11e-02

Pos logit contributions
enny+0.057
Em+0.056
 Jewel+0.052
Well+0.052
 Emmy+0.052
Neg logit contributions
 hon-0.056
ans-0.053
 ge-0.051
cks-0.049
 steer-0.047
Token could
Token ID714
Feature activation-8.52e-03

Pos logit contributions
Em+0.021
enny+0.020
 Jewel+0.020
Charl+0.020
innie+0.020
Neg logit contributions
 hon-0.023
ep-0.020
ans-0.020
 ge-0.019
cks-0.019
Token go
Token ID467
Feature activation+2.37e-02

Pos logit contributions
 hon+0.015
ans+0.014
cks+0.014
 ge+0.014
 polar+0.013
Neg logit contributions
enny-0.014
 she-0.014
 Jewel-0.013
Em-0.013
pillar-0.013
Token on
Token ID319
Feature activation+1.12e-02

Pos logit contributions
enny+0.048
 Jewel+0.048
Em+0.047
pillar+0.046
Well+0.044
Neg logit contributions
 hon-0.033
ans-0.032
cks-0.030
 ge-0.028
ep-0.028
Token.
Token ID13
Feature activation+6.77e-02

Pos logit contributions
enny+0.022
Em+0.021
 Jewel+0.021
pillar+0.021
Of+0.021
Neg logit contributions
 hon-0.017
ans-0.016
cks-0.015
 ge-0.014
ep-0.014
Token
Token ID198
Feature activation+3.56e-02

Pos logit contributions
ths+0.149
stone+0.146
Em+0.142
umm+0.136
Really+0.132
Neg logit contributions
 hon-0.072
ans-0.068
 paddle-0.063
 sled-0.062
cks-0.062
Token
Token ID198
Feature activation+9.57e-02

Pos logit contributions
 Jewel+0.069
enny+0.069
Em+0.068
pillar+0.067
stone+0.065
Neg logit contributions
 hon-0.052
ans-0.049
cks-0.046
 ge-0.044
ep-0.044
TokenOne
Token ID3198
Feature activation+3.12e-02

Pos logit contributions
umm+0.154
 Emmy+0.151
stone+0.151
igsaw+0.142
ths+0.141
Neg logit contributions
 hon-0.164
ep-0.160
ans-0.148
cks-0.145
 steer-0.145
Token day
Token ID1110
Feature activation+7.39e-02

Pos logit contributions
 Jewel+0.049
enny+0.048
Em+0.048
pillar+0.047
 cartridge+0.045
Neg logit contributions
 hon-0.058
ans-0.055
 ge-0.052
cks-0.051
 steer-0.051
Token Stacy
Token ID49523
Feature activation+1.90e-02

Pos logit contributions
Em+0.179
umm+0.178
stone+0.176
ometime+0.176
Certainly+0.175
Neg logit contributions
 hon-0.064
 ge-0.051
 paddle-0.051
 steer-0.048
 their-0.047
Token decided
Token ID3066
Feature activation+4.72e-02

Pos logit contributions
 Jewel+0.031
enny+0.030
Em+0.030
pillar+0.029
Hmm+0.028
Neg logit contributions
 hon-0.035
ans-0.033
cks-0.031
 ge-0.031
ep-0.030
Token she
Token ID673
Feature activation+2.21e-02

Pos logit contributions
enny+0.090
innie+0.088
pillar+0.088
 Jewel+0.088
Em+0.087
Neg logit contributions
 hon-0.075
 ge-0.064
ans-0.063
 steer-0.060
 paddle-0.058
Token to
Token ID284
Feature activation-9.28e-03

Pos logit contributions
enny+0.057
 Jewel+0.056
Em+0.056
pillar+0.055
innie+0.054
Neg logit contributions
 hon-0.062
ans-0.056
 ge-0.054
cks-0.052
 steer-0.052
Token examine
Token ID10716
Feature activation+3.32e-02

Pos logit contributions
 hon+0.016
ans+0.015
cks+0.015
ep+0.014
 ge+0.014
Neg logit contributions
 Jewel-0.016
enny-0.015
Em-0.015
pillar-0.015
 Close-0.015
Token it
Token ID340
Feature activation+2.51e-02

Pos logit contributions
enny+0.069
Em+0.068
 Jewel+0.065
 Close+0.065
oph+0.065
Neg logit contributions
 hon-0.045
ans-0.043
cks-0.042
 ge-0.039
ep-0.038
Token.
Token ID13
Feature activation+5.63e-02

Pos logit contributions
Em+0.060
enny+0.059
pillar+0.057
Well+0.057
 Jewel+0.057
Neg logit contributions
 hon-0.028
ans-0.026
cks-0.025
 steer-0.023
ep-0.022
Token
Token ID198
Feature activation+1.96e-02

Pos logit contributions
ths+0.136
umm+0.131
years+0.130
stone+0.129
enny+0.127
Neg logit contributions
 hon-0.047
ans-0.039
 paddle-0.037
 their-0.036
cks-0.034
Token
Token ID198
Feature activation+9.50e-02

Pos logit contributions
enny+0.035
Em+0.034
 Jewel+0.034
pillar+0.033
Well+0.032
Neg logit contributions
 hon-0.033
ans-0.032
cks-0.030
 ge-0.030
 paddle-0.029
TokenSo
Token ID2396
Feature activation+6.66e-02

Pos logit contributions
umm+0.145
 Well+0.136
ths+0.133
stone+0.131
enny+0.129
Neg logit contributions
 hon-0.172
ep-0.166
 steer-0.155
ans-0.152
cks-0.149
Token she
Token ID673
Feature activation+1.32e-02

Pos logit contributions
enny+0.186
pillar+0.182
years+0.181
Certainly+0.181
umm+0.180
Neg logit contributions
 their-0.034
 hon-0.032
 his-0.019
 paddle-0.011
 steer-0.009
Token took
Token ID1718
Feature activation+3.11e-02

Pos logit contributions
 Jewel+0.022
enny+0.022
Em+0.022
pillar+0.021
innie+0.021
Neg logit contributions
 hon-0.025
ans-0.022
ep-0.021
 paddle-0.021
 ge-0.021
Token a
Token ID257
Feature activation+1.56e-02

Pos logit contributions
Em+0.069
enny+0.069
Really+0.068
pillar+0.067
oph+0.064
Neg logit contributions
 hon-0.040
ans-0.036
 ge-0.034
cks-0.033
 steer-0.032
Token closer
Token ID5699
Feature activation+1.22e-02

Pos logit contributions
 Jewel+0.029
enny+0.029
Em+0.029
pillar+0.028
Well+0.027
Neg logit contributions
 hon-0.024
ans-0.023
cks-0.022
 ge-0.021
ep-0.020
Token v
Token ID410
Feature activation+9.31e-03

Pos logit contributions
enny+0.019
Em+0.018
 Jewel+0.018
Well+0.018
pillar+0.018
Neg logit contributions
 hon-0.017
ans-0.016
 ge-0.016
 paddle-0.015
cks-0.015
Tokenase
Token ID589
Feature activation+7.01e-03

Pos logit contributions
enny+0.010
Em+0.009
pillar+0.009
 Jewel+0.009
innie+0.008
Neg logit contributions
 hon-0.023
cks-0.021
ans-0.021
 ge-0.021
 paddle-0.021
Token.
Token ID13
Feature activation+4.18e-02

Pos logit contributions
enny+0.014
 Jewel+0.014
Em+0.014
pillar+0.013
stone+0.013
Neg logit contributions
 hon-0.010
ans-0.009
cks-0.009
 ge-0.009
 paddle-0.008
Token
Token ID198
Feature activation+2.90e-02

Pos logit contributions
pillar+0.070
stone+0.068
brate+0.064
ience+0.064
ometime+0.063
Neg logit contributions
 hon-0.064
 paddle-0.061
 their-0.060
ans-0.060
om-0.057
Token
Token ID198
Feature activation+6.58e-02

Pos logit contributions
 Jewel+0.060
pillar+0.058
enny+0.057
stone+0.057
Em+0.056
Neg logit contributions
 hon-0.039
ans-0.035
cks-0.033
ep-0.033
 steer-0.032
TokenT
Token ID51
Feature activation+9.49e-02

Pos logit contributions
umm+0.102
 Jewel+0.102
pillar+0.095
stone+0.095
oph+0.092
Neg logit contributions
ep-0.120
 hon-0.117
 steer-0.106
ans-0.106
 paddle-0.101
Tokenina
Token ID1437
Feature activation+1.18e-02

Pos logit contributions
 Jewel+0.144
Really+0.139
Em+0.133
Hmm+0.132
Certainly+0.131
Neg logit contributions
ans-0.183
 hon-0.167
ep-0.166
ull-0.164
 polar-0.157
Token saw
Token ID2497
Feature activation+2.02e-02

Pos logit contributions
enny+0.018
 Jewel+0.018
Em+0.018
pillar+0.018
Well+0.017
Neg logit contributions
 hon-0.022
ans-0.021
cks-0.020
 ge-0.020
ep-0.019
Token a
Token ID257
Feature activation+2.13e-02

Pos logit contributions
enny+0.035
 Jewel+0.034
Em+0.034
pillar+0.033
Well+0.032
Neg logit contributions
 hon-0.035
ans-0.033
cks-0.031
 ge-0.031
ep-0.030
Token tall
Token ID7331
Feature activation-9.44e-03

Pos logit contributions
 Jewel+0.035
Em+0.035
enny+0.035
pillar+0.034
Hmm+0.032
Neg logit contributions
 hon-0.037
ans-0.036
cks-0.035
 ge-0.033
 steer-0.032
Token vine
Token ID17793
Feature activation-9.18e-03

Pos logit contributions
ans+0.016
cks+0.016
 hon+0.016
 steer+0.014
ep+0.014
Neg logit contributions
 Jewel-0.016
pillar-0.016
Em-0.015
 she-0.015
stone-0.014
Token playing
Token ID2712
Feature activation+2.61e-02

Pos logit contributions
enny+0.039
pillar+0.039
Em+0.038
 Jewel+0.038
 cartridge+0.037
Neg logit contributions
 hon-0.037
cks-0.031
ans-0.031
 sled-0.031
 ge-0.031
Token in
Token ID287
Feature activation+1.27e-02

Pos logit contributions
enny+0.051
Em+0.050
pillar+0.049
 Jewel+0.049
oph+0.048
Neg logit contributions
 hon-0.040
ans-0.036
cks-0.034
ep-0.033
 ge-0.033
Token the
Token ID262
Feature activation-5.66e-03

Pos logit contributions
 Jewel+0.025
Em+0.024
 she+0.023
enny+0.023
 Grace+0.023
Neg logit contributions
ans-0.020
 hon-0.019
cks-0.017
 ge-0.017
cker-0.016
Token park
Token ID3952
Feature activation+2.93e-02

Pos logit contributions
 hon+0.010
ans+0.010
cks+0.010
 ge+0.009
 steer+0.009
Neg logit contributions
 Jewel-0.009
Em-0.009
enny-0.009
pillar-0.009
Well-0.008
Token,
Token ID11
Feature activation+6.35e-02

Pos logit contributions
 Jewel+0.054
enny+0.054
Em+0.054
pillar+0.052
Well+0.050
Neg logit contributions
 hon-0.046
ans-0.044
cks-0.042
 ge-0.040
ep-0.039
Token D
Token ID360
Feature activation+9.46e-02

Pos logit contributions
Em+0.094
enny+0.091
innie+0.089
pillar+0.087
ometime+0.086
Neg logit contributions
 hon-0.124
 ge-0.123
 paddle-0.119
ans-0.115
 sled-0.112
Tokenina
Token ID1437
Feature activation+1.38e-02

Pos logit contributions
Really+0.166
 Jewel+0.163
ths+0.153
Hmm+0.153
brate+0.152
Neg logit contributions
ans-0.158
ep-0.123
an-0.119
 hon-0.118
om-0.117
Token would
Token ID561
Feature activation+1.45e-02

Pos logit contributions
 she+0.015
Em+0.015
pillar+0.014
enny+0.014
Well+0.013
Neg logit contributions
cks-0.033
ans-0.031
 hon-0.030
 afloat-0.030
 explode-0.029
Token ride
Token ID6594
Feature activation+3.76e-02

Pos logit contributions
 she+0.023
 Jewel+0.023
Sure+0.022
Of+0.021
 Close+0.021
Neg logit contributions
ans-0.027
cks-0.026
ull-0.024
 ge-0.024
 hon-0.023
Token her
Token ID607
Feature activation+2.13e-02

Pos logit contributions
 Jewel+0.048
enny+0.046
Em+0.046
pillar+0.044
 she+0.043
Neg logit contributions
 hon-0.080
ans-0.080
cks-0.077
 ge-0.074
 paddle-0.073
Token tr
Token ID491
Feature activation+1.94e-02

Pos logit contributions
enny+0.037
 Jewel+0.037
Em+0.036
pillar+0.035
 cartridge+0.035
Neg logit contributions
 hon-0.037
ans-0.034
cks-0.032
 sled-0.031
ep-0.031
Token the
Token ID262
Feature activation+1.21e-02

Pos logit contributions
enny+0.023
 Jewel+0.023
Em+0.023
pillar+0.022
Well+0.021
Neg logit contributions
 hon-0.024
ans-0.022
cks-0.022
 ge-0.021
ep-0.021
Token wind
Token ID2344
Feature activation-1.68e-03

Pos logit contributions
enny+0.024
Em+0.023
 Jewel+0.022
innie+0.022
Well+0.021
Neg logit contributions
 hon-0.019
ans-0.018
 ge-0.017
cks-0.016
ep-0.016
Token.
Token ID13
Feature activation+4.44e-02

Pos logit contributions
ans+0.003
 hon+0.003
cks+0.003
 ge+0.003
 paddle+0.002
Neg logit contributions
 she-0.003
pillar-0.003
stone-0.003
enny-0.003
Em-0.003
Token↵
Token ID198
Feature activation+2.58e-02

Pos logit contributions
enny+0.084
oph+0.084
pillar+0.083
ometime+0.083
stone+0.082
Neg logit contributions
 hon-0.063
 paddle-0.054
 polar-0.053
ans-0.053
 sled-0.052
Token↵
Token ID198
Feature activation+6.78e-02

Pos logit contributions
enny+0.049
 Jewel+0.048
Em+0.048
pillar+0.047
Well+0.045
Neg logit contributions
 hon-0.040
ans-0.038
cks-0.036
 ge-0.034
ep-0.034
TokenB
Token ID33
Feature activation+9.45e-02

Pos logit contributions
oph+0.083
 Jewel+0.081
enny+0.079
 cartridge+0.077
umm+0.076
Neg logit contributions
 hon-0.149
ep-0.140
ans-0.135
 paddle-0.132
 steer-0.132
Tokenizzy
Token ID40593
Feature activation+2.05e-02

Pos logit contributions
 Jewel+0.124
Really+0.111
Em+0.108
 cartridge+0.108
 Faith+0.108
Neg logit contributions
ans-0.192
 hon-0.192
ep-0.182
 polar-0.180
om-0.176
Token woke
Token ID19092
Feature activation+4.57e-02

Pos logit contributions
enny+0.031
 Jewel+0.031
Em+0.031
pillar+0.030
Well+0.028
Neg logit contributions
 hon-0.039
ans-0.037
cks-0.036
 ge-0.034
ep-0.034
Token up
Token ID510
Feature activation+1.74e-02

Pos logit contributions
enny+0.096
Em+0.094
pillar+0.093
 Jewel+0.093
Hmm+0.090
Neg logit contributions
 hon-0.064
ans-0.059
cks-0.056
ep-0.052
 steer-0.051
Token and
Token ID290
Feature activation+2.64e-02

Pos logit contributions
 Jewel+0.032
enny+0.032
Em+0.032
pillar+0.031
Well+0.030
Neg logit contributions
 hon-0.027
ans-0.026
cks-0.025
 ge-0.024
ep-0.023
Token couldn
Token ID3521
Feature activation+4.33e-02

Pos logit contributions
enny+0.046
 Jewel+0.045
Em+0.044
pillar+0.044
innie+0.042
Neg logit contributions
 hon-0.047
ans-0.043
cks-0.041
 ge-0.041
ep-0.040
Token noise
Token ID7838
Feature activation+9.36e-03

Pos logit contributions
enny+0.052
 Jewel+0.051
Em+0.051
pillar+0.050
Well+0.049
Neg logit contributions
 hon-0.030
ans-0.027
cks-0.026
 ge-0.025
ep-0.024
Token from
Token ID422
Feature activation-2.98e-03

Pos logit contributions
enny+0.019
 Jewel+0.019
Em+0.018
pillar+0.018
Hmm+0.017
Neg logit contributions
 hon-0.014
ans-0.013
cks-0.012
 ge-0.011
 steer-0.011
Token outside
Token ID2354
Feature activation+8.40e-03

Pos logit contributions
 hon+0.005
ans+0.005
cks+0.005
 ge+0.005
ep+0.005
Neg logit contributions
 Jewel-0.005
Em-0.005
enny-0.005
pillar-0.005
 Grace-0.004
Token.
Token ID13
Feature activation+6.52e-02

Pos logit contributions
 Jewel+0.013
 she+0.013
enny+0.013
Em+0.013
pillar+0.013
Neg logit contributions
ans-0.015
 hon-0.015
cks-0.014
 ge-0.014
 paddle-0.014
Token↵
Token ID198
Feature activation+3.96e-02

Pos logit contributions
ths+0.107
Em+0.106
stone+0.099
 occasion+0.098
umm+0.097
Neg logit contributions
 hon-0.109
 paddle-0.102
ans-0.101
cks-0.098
 ge-0.097
Token↵
Token ID198
Feature activation+9.45e-02

Pos logit contributions
 Jewel+0.078
enny+0.078
Em+0.077
pillar+0.076
stone+0.074
Neg logit contributions
 hon-0.057
ans-0.052
cks-0.050
 ge-0.047
ep-0.047
TokenSt
Token ID1273
Feature activation+5.86e-02

Pos logit contributions
umm+0.156
 Emmy+0.149
ths+0.141
stone+0.140
igsaw+0.138
Neg logit contributions
 hon-0.161
ep-0.157
 steer-0.149
ans-0.147
cks-0.147
Tokenacy
Token ID1590
Feature activation-5.29e-03

Pos logit contributions
 Jewel+0.065
Em+0.064
Of+0.062
Really+0.061
Hmm+0.059
Neg logit contributions
ans-0.131
 hon-0.130
cks-0.129
ep-0.121
 steer-0.119
Token opened
Token ID4721
Feature activation+2.61e-02

Pos logit contributions
 hon+0.010
ans+0.009
cks+0.009
 ge+0.009
ep+0.008
Neg logit contributions
 Jewel-0.008
enny-0.008
Em-0.008
pillar-0.008
Hmm-0.008
Token the
Token ID262
Feature activation+1.44e-02

Pos logit contributions
enny+0.040
 Jewel+0.039
Em+0.039
pillar+0.038
Well+0.036
Neg logit contributions
 hon-0.050
ans-0.048
cks-0.046
 ge-0.044
ep-0.044
Token door
Token ID3420
Feature activation+1.64e-02

Pos logit contributions
enny+0.027
Em+0.027
Well+0.026
�+0.026
oph+0.026
Neg logit contributions
 hon-0.023
ans-0.020
 polar-0.020
 horn-0.019
 sled-0.019
Token special
Token ID2041
Feature activation+2.48e-02

Pos logit contributions
Em+0.010
 Jewel+0.010
pillar+0.010
enny+0.010
Of+0.009
Neg logit contributions
cks-0.016
ans-0.016
 hon-0.015
ep-0.013
 steer-0.013
Token little
Token ID1310
Feature activation+1.47e-02

Pos logit contributions
enny+0.043
 Jewel+0.042
Em+0.042
pillar+0.041
Hmm+0.040
Neg logit contributions
 hon-0.043
ans-0.039
 ge-0.037
 paddle-0.036
cks-0.036
Token girl
Token ID2576
Feature activation+1.44e-02

Pos logit contributions
enny+0.022
 Jewel+0.022
Em+0.022
pillar+0.021
Well+0.021
Neg logit contributions
 hon-0.028
ans-0.026
 ge-0.025
cks-0.025
 paddle-0.024
Token named
Token ID3706
Feature activation+3.92e-02

Pos logit contributions
enny+0.025
Em+0.025
 Jewel+0.025
pillar+0.024
 she+0.023
Neg logit contributions
 hon-0.024
ans-0.023
cks-0.022
 ge-0.021
 paddle-0.021
Token Stacy
Token ID49523
Feature activation+1.07e-02

Pos logit contributions
pillar+0.072
ths+0.071
Hmm+0.070
 cartridge+0.070
stone+0.070
Neg logit contributions
 hon-0.052
ans-0.049
 ge-0.048
 their-0.047
 Jen-0.046
Token.
Token ID13
Feature activation+9.42e-02

Pos logit contributions
Em+0.019
enny+0.019
 Jewel+0.018
 she+0.018
pillar+0.018
Neg logit contributions
 hon-0.018
ans-0.017
cks-0.017
 paddle-0.016
 ge-0.016
Token She
Token ID1375
Feature activation+2.31e-02

Pos logit contributions
ths+0.140
stone+0.131
Em+0.127
 Emmy+0.113
Really+0.111
Neg logit contributions
 hon-0.167
ans-0.162
 their-0.160
cks-0.159
 paddle-0.157
Token was
Token ID373
Feature activation+1.69e-02

Pos logit contributions
enny+0.044
 Jewel+0.043
pillar+0.042
Em+0.042
Hmm+0.041
Neg logit contributions
 hon-0.037
ans-0.032
ep-0.031
cks-0.030
 ge-0.030
Token always
Token ID1464
Feature activation+1.56e-02

Pos logit contributions
enny+0.032
pillar+0.031
Em+0.030
 Jewel+0.030
ometime+0.029
Neg logit contributions
 hon-0.028
ans-0.025
cks-0.024
 ge-0.024
 polar-0.023
Token so
Token ID523
Feature activation+4.92e-03

Pos logit contributions
enny+0.028
 Jewel+0.028
Em+0.028
pillar+0.027
Well+0.026
Neg logit contributions
 hon-0.026
ans-0.024
cks-0.023
 ge-0.022
ep-0.022
Token curious
Token ID11040
Feature activation+2.82e-02

Pos logit contributions
enny+0.011
 Jewel+0.011
pillar+0.011
Em+0.011
ometime+0.010
Neg logit contributions
 hon-0.006
ans-0.006
 ge-0.005
cks-0.005
 paddle-0.005
Token in
Token ID287
Feature activation+6.86e-03

Pos logit contributions
 Jewel+0.030
enny+0.030
Em+0.029
pillar+0.029
Well+0.028
Neg logit contributions
 hon-0.027
ans-0.026
cks-0.025
 ge-0.024
ep-0.023
Token the
Token ID262
Feature activation-2.05e-02

Pos logit contributions
 Jewel+0.013
Em+0.013
enny+0.013
 she+0.013
pillar+0.012
Neg logit contributions
 hon-0.010
ans-0.010
cks-0.009
 ge-0.009
 paddle-0.009
Token mud
Token ID17492
Feature activation-1.91e-03

Pos logit contributions
 hon+0.030
cks+0.029
ans+0.028
 steer+0.026
 paddle+0.024
Neg logit contributions
Em-0.039
enny-0.039
pillar-0.039
 Jewel-0.039
Hmm-0.038
Token.
Token ID13
Feature activation+5.85e-02

Pos logit contributions
 hon+0.003
ans+0.003
cks+0.003
 ge+0.003
 steer+0.003
Neg logit contributions
enny-0.003
 she-0.003
 Jewel-0.003
pillar-0.003
Em-0.003
Token↵
Token ID198
Feature activation+3.40e-02

Pos logit contributions
ths+0.131
oph+0.128
years+0.127
 exam+0.126
ometime+0.126
Neg logit contributions
 their-0.050
 hon-0.048
 paddle-0.042
 sled-0.042
ans-0.039
Token↵
Token ID198
Feature activation+9.40e-02

Pos logit contributions
 Jewel+0.065
enny+0.065
Em+0.064
pillar+0.063
Well+0.060
Neg logit contributions
 hon-0.051
ans-0.048
cks-0.046
 ge-0.044
ep-0.043
TokenGr
Token ID8642
Feature activation+6.59e-02

Pos logit contributions
umm+0.175
enny+0.157
 Well+0.154
oll+0.154
elt+0.152
Neg logit contributions
ep-0.148
 hon-0.139
 steer-0.125
 their-0.113
Their-0.113
Tokenace
Token ID558
Feature activation+1.11e-02

Pos logit contributions
Really+0.129
Sure+0.129
Hmm+0.129
Of+0.129
 she+0.128
Neg logit contributions
ans-0.088
 hon-0.080
cks-0.077
ep-0.077
ull-0.076
Token found
Token ID1043
Feature activation+3.98e-02

Pos logit contributions
enny+0.016
Em+0.015
pillar+0.015
 Jewel+0.015
Well+0.014
Neg logit contributions
 hon-0.022
cks-0.022
ans-0.021
 ge-0.020
 steer-0.020
Token it
Token ID340
Feature activation+3.84e-02

Pos logit contributions
enny+0.082
Em+0.082
pillar+0.081
oph+0.080
Really+0.080
Neg logit contributions
 hon-0.053
ans-0.047
 ge-0.046
cks-0.044
ep-0.044
Token.
Token ID13
Feature activation+5.76e-02

Pos logit contributions
 Jewel+0.090
Em+0.089
Really+0.087
enny+0.087
pillar+0.087
Neg logit contributions
 hon-0.042
ans-0.036
ep-0.036
 steer-0.033
cks-0.032
Token towel
Token ID24808
Feature activation+1.79e-02

Pos logit contributions
 Jewel+0.049
Em+0.048
enny+0.048
pillar+0.048
 Grace+0.046
Neg logit contributions
 hon-0.052
ans-0.051
cks-0.050
 ge-0.046
ep-0.045
Token.
Token ID13
Feature activation+5.23e-02

Pos logit contributions
 she+0.031
enny+0.030
stone+0.030
pillar+0.030
 cartridge+0.029
Neg logit contributions
 hon-0.030
ans-0.030
cks-0.030
 ge-0.028
 steer-0.027
Token␣
Token ID220
Feature activation-5.64e-03

Pos logit contributions
pillar+0.081
 Jewel+0.080
stone+0.078
Em+0.077
brate+0.076
Neg logit contributions
 hon-0.087
ans-0.083
ep-0.075
 paddle-0.075
 steer-0.075
Token↵
Token ID198
Feature activation+4.30e-02

Pos logit contributions
 hon+0.011
ans+0.010
 ge+0.010
 paddle+0.010
cks+0.009
Neg logit contributions
enny-0.009
Em-0.009
 Jewel-0.009
Well-0.008
pillar-0.008
Token↵
Token ID198
Feature activation+8.45e-02

Pos logit contributions
 Jewel+0.086
enny+0.085
Em+0.083
pillar+0.083
stone+0.080
Neg logit contributions
 hon-0.060
ans-0.056
cks-0.053
ep-0.051
 ge-0.050
TokenT
Token ID51
Feature activation+9.38e-02

Pos logit contributions
umm+0.137
 Jewel+0.135
 Close+0.122
enny+0.122
stone+0.121
Neg logit contributions
ep-0.153
 hon-0.146
ans-0.135
 steer-0.129
cks-0.123
Tokenina
Token ID1437
Feature activation+1.17e-02

Pos logit contributions
Really+0.141
 Jewel+0.140
Certainly+0.134
Hmm+0.131
 Really+0.127
Neg logit contributions
ans-0.175
ep-0.164
 hon-0.155
ull-0.153
an-0.144
Token named
Token ID3706
Feature activation+3.08e-02

Pos logit contributions
enny+0.017
Em+0.017
pillar+0.016
 Jewel+0.016
Well+0.015
Neg logit contributions
 hon-0.023
ans-0.023
cks-0.022
 steer-0.021
 ge-0.021
Token the
Token ID262
Feature activation+3.07e-02

Pos logit contributions
 Jewel+0.054
enny+0.054
Em+0.053
pillar+0.052
 Grace+0.050
Neg logit contributions
 hon-0.051
ans-0.049
cks-0.047
 ge-0.045
ep-0.045
Token kitten
Token ID42143
Feature activation+1.41e-02

Pos logit contributions
enny+0.049
ometime+0.044
oph+0.043
Em+0.042
Hmm+0.041
Neg logit contributions
 hon-0.059
 ge-0.055
ans-0.055
 polar-0.053
 hipp-0.052
Token Lightning
Token ID12469
Feature activation+1.15e-02

Pos logit contributions
 Jewel+0.024
enny+0.024
Em+0.023
pillar+0.023
 Grace+0.022
Neg logit contributions
 hon-0.024
ans-0.024
cks-0.023
ep-0.021
 ge-0.021
Token neck
Token ID7393
Feature activation+4.77e-03

Pos logit contributions
 Jewel+0.022
pillar+0.021
 diary+0.020
stone+0.020
Em+0.020
Neg logit contributions
 hon-0.015
cks-0.015
ans-0.014
ull-0.013
 ge-0.013
Token.
Token ID13
Feature activation+3.75e-02

Pos logit contributions
enny+0.008
 she+0.008
stone+0.008
pillar+0.008
 Jewel+0.008
Neg logit contributions
 hon-0.008
cks-0.007
ans-0.007
 ge-0.007
 polar-0.007
Token␣
Token ID220
Feature activation-5.71e-03

Pos logit contributions
Em+0.059
pillar+0.057
Ruby+0.056
enny+0.055
 Jewel+0.054
Neg logit contributions
 hon-0.065
ans-0.064
 paddle-0.059
 steer-0.058
 ge-0.057
Token↵
Token ID198
Feature activation+1.97e-02

Pos logit contributions
 hon+0.011
ans+0.010
 ge+0.010
cks+0.009
 paddle+0.009
Neg logit contributions
enny-0.010
Em-0.009
 Jewel-0.009
pillar-0.009
Well-0.009
Token↵
Token ID198
Feature activation+6.75e-02

Pos logit contributions
enny+0.036
Em+0.035
 Jewel+0.035
pillar+0.034
Well+0.033
Neg logit contributions
 hon-0.032
ans-0.031
cks-0.029
 ge-0.028
 paddle-0.027
TokenT
Token ID51
Feature activation+9.30e-02

Pos logit contributions
 Jewel+0.093
enny+0.086
 cartridge+0.085
pillar+0.084
 Close+0.084
Neg logit contributions
 hon-0.135
ans-0.128
ep-0.127
 steer-0.122
 ge-0.118
Tokenilly
Token ID6548
Feature activation+2.77e-03

Pos logit contributions
 Jewel+0.092
Ruby+0.082
Em+0.081
 Faith+0.081
Really+0.078
Neg logit contributions
ans-0.226
ull-0.211
 polar-0.195
an-0.195
ep-0.193
Token spent
Token ID3377
Feature activation+2.77e-02

Pos logit contributions
 Jewel+0.004
enny+0.004
Em+0.004
pillar+0.004
Well+0.004
Neg logit contributions
 hon-0.005
ans-0.005
cks-0.005
 ge-0.005
 steer-0.005
Token every
Token ID790
Feature activation+2.42e-02

Pos logit contributions
enny+0.061
 Jewel+0.060
pillar+0.059
Em+0.059
 Close+0.057
Neg logit contributions
 hon-0.035
ans-0.032
cks-0.031
 ge-0.029
ep-0.029
Token day
Token ID1110
Feature activation+3.99e-03

Pos logit contributions
enny+0.044
 Jewel+0.041
oph+0.041
innie+0.040
illy+0.040
Neg logit contributions
 hon-0.041
 steer-0.036
 paddle-0.036
ep-0.035
 snowball-0.034
Token playing
Token ID2712
Feature activation+1.80e-02

Pos logit contributions
Em+0.006
 Jewel+0.006
enny+0.006
pillar+0.006
stone+0.005
Neg logit contributions
ans-0.008
 hon-0.008
cks-0.007
 ge-0.007
 steer-0.007
TokenHello
Token ID15496
Feature activation-1.33e-02

Pos logit contributions
enny+0.022
 Jewel+0.021
Em+0.020
pillar+0.020
innie+0.019
Neg logit contributions
 hon-0.049
ans-0.046
cks-0.045
 ge-0.044
ep-0.044
Token,
Token ID11
Feature activation-2.76e-02

Pos logit contributions
ans+0.023
 hon+0.023
cks+0.022
ep+0.020
 paddle+0.020
Neg logit contributions
 Jewel-0.022
enny-0.021
Em-0.021
pillar-0.021
 she-0.021
Token kids
Token ID3988
Feature activation-5.50e-02

Pos logit contributions
ans+0.048
cks+0.046
 hon+0.045
ep+0.041
 afloat+0.041
Neg logit contributions
 Jewel-0.047
Em-0.046
 she-0.044
 Grace-0.044
enny-0.043
Token,"
Token ID553
Feature activation-5.67e-02

Pos logit contributions
ans+0.106
 hon+0.101
 paddle+0.100
cks+0.097
 ge+0.094
Neg logit contributions
 she-0.079
 Jewel-0.076
pillar-0.074
enny-0.073
Em-0.072
Token the
Token ID262
Feature activation-5.11e-02

Pos logit contributions
ans+0.101
 hon+0.097
cks+0.096
ep+0.090
 afloat+0.087
Neg logit contributions
 Jewel-0.093
 she-0.091
Em-0.089
enny-0.087
 Grace-0.086
Token firefighter
Token ID43105
Feature activation-9.07e-02

Pos logit contributions
cks+0.084
ans+0.080
 afloat+0.075
 hon+0.074
 steer+0.071
Neg logit contributions
 Jewel-0.089
Em-0.086
 she-0.085
pillar-0.083
 diary-0.083
Token says
Token ID1139
Feature activation-4.81e-02

Pos logit contributions
ans+0.165
 afloat+0.162
cks+0.156
 sled+0.155
 hon+0.153
Neg logit contributions
 Jewel-0.126
 she-0.125
 Close-0.122
enny-0.120
pillar-0.118
Token.
Token ID13
Feature activation-4.36e-02

Pos logit contributions
ans+0.089
cks+0.081
 hon+0.081
 ge+0.080
ep+0.079
Neg logit contributions
 she-0.082
enny-0.074
 Jewel-0.073
ly-0.071
Em-0.071
Token "
Token ID366
Feature activation+1.04e-02

Pos logit contributions
ans+0.102
cks+0.094
 ge+0.089
 hon+0.089
 afloat+0.088
Neg logit contributions
 Jewel-0.055
 she-0.054
 Well-0.051
Well-0.051
Em-0.051
TokenDo
Token ID5211
Feature activation+3.29e-03

Pos logit contributions
enny+0.009
 Jewel+0.009
Em+0.009
pillar+0.008
Well+0.008
Neg logit contributions
 hon-0.027
ans-0.026
cks-0.025
 ge-0.024
ep-0.024
Token you
Token ID345
Feature activation-2.01e-02

Pos logit contributions
 Jewel+0.002
enny+0.002
Em+0.002
pillar+0.002
Well+0.002
Neg logit contributions
 hon-0.009
ans-0.009
cks-0.008
 ge-0.008
 paddle-0.008
Token make
Token ID787
Feature activation-1.30e-02

Pos logit contributions
 Jewel+0.005
Em+0.005
enny+0.005
pillar+0.005
Ruby+0.004
Neg logit contributions
 hon-0.004
ans-0.004
cks-0.004
 ge-0.003
ep-0.003
Token you
Token ID345
Feature activation-5.20e-03

Pos logit contributions
 hon+0.029
ans+0.028
 paddle+0.028
 steer+0.027
cks+0.027
Neg logit contributions
 Jewel-0.014
pillar-0.013
 she-0.013
Em-0.013
enny-0.013
Token sick
Token ID6639
Feature activation-2.83e-02

Pos logit contributions
ans+0.009
 hon+0.009
cks+0.009
 ge+0.008
Dan+0.008
Neg logit contributions
 she-0.008
 diary-0.007
stone-0.007
pillar-0.006
 blush-0.006
Token!"
Token ID2474
Feature activation-6.92e-02

Pos logit contributions
 hon+0.045
ans+0.044
cks+0.042
 paddle+0.040
ep+0.040
Neg logit contributions
pillar-0.049
 Jewel-0.049
 she-0.049
enny-0.048
Em-0.047
Token the
Token ID262
Feature activation-3.74e-02

Pos logit contributions
ans+0.122
cks+0.108
ep+0.108
 afloat+0.107
cker+0.102
Neg logit contributions
 she-0.132
Well-0.118
 Well-0.114
 Jewel-0.109
Hmm-0.109
Token man
Token ID582
Feature activation-8.93e-02

Pos logit contributions
cks+0.066
 afloat+0.061
ans+0.061
 hon+0.055
ric+0.053
Neg logit contributions
 she-0.062
pillar-0.060
 diary-0.058
Sure-0.058
 tailor-0.056
Token yells
Token ID42675
Feature activation-5.31e-02

Pos logit contributions
 afloat+0.159
ans+0.157
 sled+0.146
 polar+0.146
 paddle+0.145
Neg logit contributions
 Jewel-0.128
 she-0.128
pillar-0.119
she-0.119
Well-0.119
Token.
Token ID13
Feature activation-6.11e-02

Pos logit contributions
ep+0.102
ric+0.100
ans+0.099
cks+0.099
cker+0.096
Neg logit contributions
 she-0.077
ly-0.074
 Jewel-0.070
enny-0.069
pillar-0.065
Token↵
Token ID198
Feature activation-4.56e-02

Pos logit contributions
ans+0.133
cks+0.121
 hon+0.116
 afloat+0.115
ep+0.114
Neg logit contributions
Well-0.087
 Well-0.083
 she-0.083
 Jewel-0.082
enny-0.079
Token↵
Token ID198
Feature activation-1.46e-02

Pos logit contributions
ans+0.106
 hon+0.099
cker+0.097
 paddle+0.096
 afloat+0.095
Neg logit contributions
Well-0.055
Sure-0.050
enny-0.050
 she-0.049
 Well-0.047
TokenTim
Token ID14967
Feature activation-3.32e-02

Pos logit contributions
ans+0.025
 hon+0.024
cks+0.023
 paddle+0.022
 afloat+0.022
Neg logit contributions
Well-0.025
Hmm-0.025
Sure-0.024
enny-0.023
Em-0.023
Token"
Token ID1
Feature activation+1.62e-02

Pos logit contributions
ans+0.020
 hon+0.019
cks+0.018
 paddle+0.018
cker+0.018
Neg logit contributions
Em-0.012
Ruby-0.011
enny-0.011
 Jewel-0.011
Well-0.011
TokenBad
Token ID22069
Feature activation-1.56e-02

Pos logit contributions
enny+0.015
 Jewel+0.014
Em+0.014
pillar+0.013
innie+0.013
Neg logit contributions
 hon-0.042
ans-0.039
 ge-0.038
cks-0.038
ep-0.038
Token dog
Token ID3290
Feature activation-4.89e-02

Pos logit contributions
cks+0.027
ans+0.027
 afloat+0.026
 hon+0.026
ep+0.025
Neg logit contributions
pillar-0.024
 Jewel-0.024
 she-0.023
enny-0.021
 Charlotte-0.021
Token!"
Token ID2474
Feature activation-5.03e-02

Pos logit contributions
ans+0.089
 paddle+0.084
 hon+0.083
cks+0.082
ric+0.081
Neg logit contributions
pillar-0.079
 Jewel-0.072
stone-0.069
enny-0.069
ha-0.068
Token the
Token ID262
Feature activation-3.82e-02

Pos logit contributions
ans+0.107
 hon+0.101
cks+0.101
ep+0.098
cker+0.095
Neg logit contributions
 she-0.069
 Jewel-0.067
Em-0.066
enny-0.063
 Charlotte-0.062
Token man
Token ID582
Feature activation-8.79e-02

Pos logit contributions
cks+0.068
ans+0.065
ull+0.061
 afloat+0.060
ric+0.060
Neg logit contributions
pillar-0.063
 she-0.062
 Jewel-0.062
Sure-0.061
 diary-0.060
Token says
Token ID1139
Feature activation-5.03e-02

Pos logit contributions
ans+0.153
 afloat+0.152
ull+0.147
 polar+0.145
 sled+0.142
Neg logit contributions
 Jewel-0.130
 she-0.129
she-0.120
pillar-0.118
Well-0.115
Token.
Token ID13
Feature activation-3.97e-02

Pos logit contributions
ans+0.104
 ge+0.097
ep+0.094
ric+0.094
 hon+0.091
Neg logit contributions
 she-0.079
enny-0.069
ly-0.069
 Jewel-0.065
stone-0.061
Token He
Token ID679
Feature activation-3.86e-02

Pos logit contributions
ans+0.094
cks+0.084
 hon+0.082
cker+0.082
ep+0.080
Neg logit contributions
 Jewel-0.050
 she-0.047
pillar-0.047
enny-0.046
Em-0.046
Token is
Token ID318
Feature activation-4.32e-02

Pos logit contributions
ans+0.065
 hon+0.065
cks+0.060
 ge+0.059
 steer+0.057
Neg logit contributions
 Jewel-0.064
enny-0.062
pillar-0.062
Em-0.061
Well-0.059
Token angry
Token ID7954
Feature activation-3.34e-02

Pos logit contributions
ans+0.086
cker+0.075
 paddle+0.073
 mast+0.070
 steer+0.070
Neg logit contributions
 she-0.062
 Jewel-0.062
 Faith-0.057
 Charlotte-0.056
 Emma-0.056
Token,
Token ID11
Feature activation-2.15e-02

Pos logit contributions
 hon+0.100
 paddle+0.097
 steer+0.095
 afloat+0.095
 stret+0.093
Neg logit contributions
pillar-0.061
ly-0.057
olly-0.055
ha-0.054
 she-0.052
Token qu
Token ID627
Feature activation-1.91e-02

Pos logit contributions
ans+0.034
cks+0.033
 hon+0.032
ep+0.031
 afloat+0.031
Neg logit contributions
pillar-0.036
 Jewel-0.036
 she-0.036
Of-0.036
Sure-0.035
Tokenack
Token ID441
Feature activation-5.62e-02

Pos logit contributions
 hon+0.033
 paddle+0.030
 sled+0.030
 steer+0.030
 polar+0.029
Neg logit contributions
pillar-0.033
enny-0.031
olly-0.029
Ruby-0.028
illy-0.028
Token!"
Token ID2474
Feature activation-5.26e-02

Pos logit contributions
 hon+0.107
 paddle+0.107
 afloat+0.103
 steer+0.102
 stret+0.102
Neg logit contributions
pillar-0.068
ly-0.061
ha-0.059
olly-0.059
ww-0.058
Token The
Token ID383
Feature activation-5.01e-02

Pos logit contributions
ans+0.111
 hon+0.106
cks+0.105
ep+0.103
cker+0.099
Neg logit contributions
 she-0.072
 Jewel-0.068
enny-0.068
Em-0.067
 Penny-0.066
Token ducks
Token ID39694
Feature activation-8.78e-02

Pos logit contributions
cks+0.085
 afloat+0.078
rum+0.068
aks+0.068
 steer+0.067
Neg logit contributions
 she-0.079
pillar-0.078
Hi-0.075
 butterfly-0.071
Sure-0.070
Token and
Token ID290
Feature activation-4.29e-02

Pos logit contributions
ans+0.173
cks+0.163
ric+0.158
 sled+0.158
 polar+0.157
Neg logit contributions
 she-0.112
ted-0.099
Sure-0.098
pillar-0.098
Well-0.093
Token ge
Token ID4903
Feature activation-3.48e-02

Pos logit contributions
ans+0.065
cks+0.065
 afloat+0.063
ull+0.061
ep+0.058
Neg logit contributions
 she-0.075
 Jewel-0.072
ted-0.069
 Penny-0.068
Of-0.068
Tokenese
Token ID2771
Feature activation-7.74e-02

Pos logit contributions
 hon+0.044
 afloat+0.041
 paddle+0.041
 polar+0.039
 steer+0.039
Neg logit contributions
pillar-0.070
als-0.063
 Jewel-0.063
elt-0.063
enny-0.062
Token made
Token ID925
Feature activation-5.06e-02

Pos logit contributions
ans+0.151
cks+0.141
 polar+0.137
 sled+0.135
 hon+0.133
Neg logit contributions
 she-0.097
ted-0.096
pillar-0.092
Sure-0.091
als-0.091
Token loud
Token ID7812
Feature activation-6.03e-02

Pos logit contributions
cker+0.096
ans+0.094
cks+0.092
ep+0.090
 hon+0.090
Neg logit contributions
 she-0.068
Sure-0.063
ted-0.060
 Jewel-0.060
 Penny-0.060
Token a
Token ID257
Feature activation+5.16e-03

Pos logit contributions
ans+0.032
 hon+0.031
 paddle+0.029
cks+0.029
 steer+0.029
Neg logit contributions
 Jewel-0.028
Em-0.026
enny-0.025
stone-0.025
 cartridge-0.025
Token crocod
Token ID37565
Feature activation-4.03e-02

Pos logit contributions
 Jewel+0.009
enny+0.009
Em+0.009
pillar+0.009
Well+0.008
Neg logit contributions
 hon-0.009
ans-0.008
cks-0.008
 ge-0.008
ep-0.008
Tokenile
Token ID576
Feature activation-3.61e-02

Pos logit contributions
 hon+0.055
ans+0.053
cks+0.049
 paddle+0.047
 ge+0.047
Neg logit contributions
pillar-0.083
enny-0.080
 Jewel-0.080
Em-0.078
stone-0.076
Token!"
Token ID2474
Feature activation-5.89e-02

Pos logit contributions
ans+0.055
 hon+0.053
cks+0.051
 paddle+0.049
 steer+0.048
Neg logit contributions
pillar-0.066
 Jewel-0.065
 she-0.064
stone-0.063
enny-0.062
Token the
Token ID262
Feature activation-4.95e-02

Pos logit contributions
ans+0.126
cks+0.116
ep+0.112
 hon+0.111
 afloat+0.108
Neg logit contributions
 she-0.095
Well-0.080
Em-0.080
 Jewel-0.078
 Emmy-0.078
Token man
Token ID582
Feature activation-8.72e-02

Pos logit contributions
cks+0.086
 afloat+0.083
ans+0.079
 hon+0.073
 steer+0.071
Neg logit contributions
pillar-0.079
 she-0.077
Really-0.075
 Jewel-0.074
Hmm-0.073
Token says
Token ID1139
Feature activation-4.74e-02

Pos logit contributions
 afloat+0.157
ans+0.156
 sled+0.148
 paddle+0.146
 polar+0.145
Neg logit contributions
 Jewel-0.126
 she-0.124
Sure-0.116
pillar-0.115
Well-0.115
Token.
Token ID13
Feature activation-5.16e-02

Pos logit contributions
ans+0.098
 ge+0.091
ep+0.088
 hon+0.087
cks+0.086
Neg logit contributions
 she-0.076
enny-0.067
ly-0.063
 Jewel-0.061
stone-0.059
Token "
Token ID366
Feature activation+2.52e-03

Pos logit contributions
ans+0.105
cks+0.098
 hon+0.095
ep+0.091
 paddle+0.089
Neg logit contributions
enny-0.078
 Jewel-0.077
Em-0.077
Well-0.077
 she-0.076
TokenWe
Token ID1135
Feature activation-3.21e-02

Pos logit contributions
Em+0.002
 Jewel+0.002
pillar+0.002
Well+0.002
enny+0.002
Neg logit contributions
 hon-0.006
ans-0.006
cks-0.006
 ge-0.006
 paddle-0.006
Token need
Token ID761
Feature activation-5.99e-03

Pos logit contributions
 hon+0.059
ans+0.058
cks+0.056
 ge+0.052
 steer+0.052
Neg logit contributions
 Jewel-0.050
enny-0.049
pillar-0.048
Em-0.047
Well-0.045
Token worry
Token ID5490
Feature activation-3.83e-02

Pos logit contributions
ans+0.029
cks+0.027
 hon+0.027
ep+0.026
 afloat+0.024
Neg logit contributions
 Jewel-0.022
 she-0.022
enny-0.022
pillar-0.021
Em-0.021
Token,
Token ID11
Feature activation-3.09e-02

Pos logit contributions
cks+0.083
ans+0.082
 hon+0.081
 paddle+0.078
cker+0.077
Neg logit contributions
 she-0.048
enny-0.048
olly-0.044
pillar-0.044
ha-0.044
Token son
Token ID3367
Feature activation-4.38e-02

Pos logit contributions
cks+0.054
ans+0.054
 hon+0.051
cker+0.049
 afloat+0.048
Neg logit contributions
 she-0.049
enny-0.049
pillar-0.049
 Jewel-0.047
Of-0.047
Token!"
Token ID2474
Feature activation-6.32e-02

Pos logit contributions
ans+0.083
 hon+0.077
cks+0.076
ric+0.076
 afloat+0.075
Neg logit contributions
 she-0.071
 Jewel-0.063
stone-0.061
enny-0.060
 I-0.059
Token the
Token ID262
Feature activation-4.55e-02

Pos logit contributions
ans+0.134
cks+0.124
ep+0.119
cker+0.118
 hon+0.117
Neg logit contributions
 she-0.100
Well-0.086
 Jewel-0.082
 Well-0.081
enny-0.081
Token man
Token ID582
Feature activation-8.58e-02

Pos logit contributions
ans+0.081
cks+0.080
 afloat+0.076
ric+0.075
cker+0.073
Neg logit contributions
 she-0.073
pillar-0.073
Sure-0.071
 Jewel-0.069
 diary-0.068
Token says
Token ID1139
Feature activation-5.13e-02

Pos logit contributions
 afloat+0.163
ans+0.160
ric+0.151
 polar+0.149
 sled+0.148
Neg logit contributions
 she-0.119
 Jewel-0.112
she-0.105
Sure-0.105
pillar-0.103
Token.
Token ID13
Feature activation-5.41e-02

Pos logit contributions
ans+0.104
ric+0.096
 ge+0.096
ep+0.094
 hon+0.092
Neg logit contributions
 she-0.082
enny-0.074
ly-0.071
 Jewel-0.067
stone-0.067
Token "
Token ID366
Feature activation-7.13e-03

Pos logit contributions
ans+0.131
cks+0.122
 hon+0.115
ep+0.115
 afloat+0.114
Neg logit contributions
 she-0.062
 Jewel-0.060
Well-0.060
enny-0.059
pillar-0.059
TokenI
Token ID40
Feature activation-1.12e-02

Pos logit contributions
 hon+0.018
ans+0.018
cks+0.017
 ge+0.017
 polar+0.017
Neg logit contributions
Em-0.006
pillar-0.006
 Jewel-0.006
Well-0.006
enny-0.006
Token will
Token ID481
Feature activation-2.39e-02

Pos logit contributions
ans+0.023
cks+0.022
 hon+0.022
ep+0.020
 ge+0.020
Neg logit contributions
enny-0.015
 Jewel-0.015
pillar-0.014
Sure-0.014
Em-0.014
Token,
Token ID11
Feature activation-4.04e-02

Pos logit contributions
cks+0.086
ans+0.086
 hon+0.082
 afloat+0.082
ric+0.081
Neg logit contributions
 she-0.049
 Jewel-0.047
ha-0.046
enny-0.046
olly-0.045
Token kids
Token ID3988
Feature activation-4.90e-02

Pos logit contributions
ans+0.071
cks+0.067
 afloat+0.063
 hon+0.062
cker+0.061
Neg logit contributions
 Jewel-0.069
 she-0.067
Em-0.064
Of-0.064
 May-0.063
Token!"
Token ID2474
Feature activation-6.90e-02

Pos logit contributions
ans+0.092
cks+0.086
 hon+0.084
ull+0.083
ric+0.081
Neg logit contributions
 she-0.077
enny-0.071
 Jewel-0.070
stone-0.067
pillar-0.067
Token the
Token ID262
Feature activation-5.15e-02

Pos logit contributions
ans+0.140
cks+0.127
ric+0.125
ep+0.123
 afloat+0.122
Neg logit contributions
 she-0.115
 Jewel-0.102
Well-0.100
Em-0.098
Of-0.096
Token zo
Token ID40565
Feature activation-5.54e-02

Pos logit contributions
ans+0.078
 afloat+0.076
cks+0.076
ull+0.068
ric+0.064
Neg logit contributions
 Jewel-0.089
 she-0.084
pillar-0.084
Sure-0.084
Really-0.083
Tokenoke
Token ID2088
Feature activation-8.56e-02

Pos logit contributions
ans+0.091
 hon+0.087
cks+0.085
 afloat+0.083
ep+0.081
Neg logit contributions
enny-0.099
 Jewel-0.093
pillar-0.092
Em-0.089
 Well-0.088
Tokeneper
Token ID5723
Feature activation-7.03e-02

Pos logit contributions
 afloat+0.156
 gra+0.142
 stret+0.138
 paddle+0.138
 explode+0.137
Neg logit contributions
Well-0.107
ha-0.107
 she-0.104
ths-0.104
issa-0.102
Token said
Token ID531
Feature activation-4.47e-02

Pos logit contributions
ans+0.159
ric+0.155
cks+0.152
 afloat+0.149
 paddle+0.145
Neg logit contributions
 she-0.079
enny-0.077
Em-0.070
stone-0.070
ly-0.064
Token.
Token ID13
Feature activation-4.85e-02

Pos logit contributions
ans+0.097
 ge+0.090
ric+0.089
ep+0.088
cks+0.087
Neg logit contributions
 she-0.069
enny-0.059
ly-0.059
 Jewel-0.055
 Emma-0.053
Token "
Token ID366
Feature activation-4.61e-04

Pos logit contributions
ans+0.121
cks+0.111
ep+0.106
cker+0.105
 afloat+0.105
Neg logit contributions
 she-0.054
 Jewel-0.053
 Well-0.051
Well-0.051
enny-0.050
TokenThe
Token ID464
Feature activation-4.03e-02

Pos logit contributions
 hon+0.001
ans+0.001
cks+0.001
 ge+0.001
 polar+0.001
Neg logit contributions
Em-0.000
enny-0.000
 Jewel-0.000
pillar-0.000
Well-0.000
Token\n
Token ID198
Feature activation-6.77e-03

Pos logit contributions
ans+0.069
cks+0.064
 hon+0.064
cker+0.062
 ge+0.061
Neg logit contributions
enny-0.041
Em-0.040
Well-0.037
Sure-0.036
 Jewel-0.036
TokenM
Token ID44
Feature activation-1.33e-02

Pos logit contributions
ans+0.016
cks+0.015
 hon+0.015
cker+0.014
 afloat+0.014
Neg logit contributions
Em-0.009
enny-0.008
Sure-0.008
Well-0.008
Hmm-0.008
Tokenira
Token ID8704
Feature activation-4.84e-02

Pos logit contributions
 hon+0.028
cks+0.027
ans+0.026
 ge+0.025
 paddle+0.025
Neg logit contributions
enny-0.018
pillar-0.017
 Jewel-0.016
innie-0.016
Em-0.016
Token and
Token ID290
Feature activation-6.63e-03

Pos logit contributions
 hon+0.091
ans+0.091
cks+0.088
 ge+0.083
 paddle+0.083
Neg logit contributions
enny-0.071
Em-0.070
pillar-0.070
 Jewel-0.069
Well-0.066
Token R
Token ID371
Feature activation-1.79e-02

Pos logit contributions
ans+0.012
cks+0.011
 hon+0.011
ep+0.010
 afloat+0.010
Neg logit contributions
 Jewel-0.012
 she-0.011
 Min-0.011
Em-0.011
 Charlotte-0.011
Tokenavi
Token ID15820
Feature activation-8.56e-02

Pos logit contributions
 hon+0.025
 afloat+0.024
 paddle+0.023
cks+0.022
ans+0.022
Neg logit contributions
pillar-0.035
enny-0.035
innie-0.035
iley-0.033
illy-0.033
Token and
Token ID290
Feature activation-5.98e-03

Pos logit contributions
ans+0.180
cks+0.173
 hon+0.170
 afloat+0.159
 polar+0.157
Neg logit contributions
enny-0.106
Well-0.105
Em-0.103
pillar-0.101
Sure-0.101
Token their
Token ID511
Feature activation-3.64e-02

Pos logit contributions
ans+0.009
 hon+0.008
cks+0.008
ep+0.008
 afloat+0.008
Neg logit contributions
 Jewel-0.012
Em-0.011
enny-0.011
Well-0.011
Of-0.011
Token friends
Token ID2460
Feature activation-4.35e-02

Pos logit contributions
cks+0.067
ans+0.061
 afloat+0.060
 hon+0.059
ull+0.058
Neg logit contributions
innie-0.052
 Jewel-0.052
Em-0.051
Of-0.050
Sure-0.050
Token ran
Token ID4966
Feature activation-1.02e-04

Pos logit contributions
ans+0.085
 hon+0.084
cks+0.083
 ge+0.077
ep+0.076
Neg logit contributions
enny-0.061
Em-0.061
 Jewel-0.060
pillar-0.058
Well-0.056
Token to
Token ID284
Feature activation-4.09e-02

Pos logit contributions
cks+0.000
ans+0.000
 hon+0.000
ep+0.000
cker+0.000
Neg logit contributions
 Jewel-0.000
enny-0.000
Sure-0.000
Em-0.000
 she-0.000
Token back
Token ID736
Feature activation-1.82e-02

Pos logit contributions
enny+0.006
 Jewel+0.005
pillar+0.005
Of+0.005
Em+0.005
Neg logit contributions
 hon-0.008
cks-0.007
ans-0.007
 ge-0.007
 paddle-0.007
Token my
Token ID616
Feature activation-5.67e-03

Pos logit contributions
 hon+0.036
ans+0.034
cks+0.034
 paddle+0.033
 afloat+0.033
Neg logit contributions
 Jewel-0.026
enny-0.025
 she-0.024
Em-0.024
pillar-0.023
Token things
Token ID1243
Feature activation-3.05e-02

Pos logit contributions
 hon+0.010
ans+0.010
cks+0.010
ep+0.009
 steer+0.009
Neg logit contributions
 Jewel-0.009
enny-0.009
Em-0.009
pillar-0.009
 diary-0.008
Token,"
Token ID553
Feature activation-4.20e-02

Pos logit contributions
ans+0.051
 hon+0.051
cks+0.048
ep+0.045
 ge+0.045
Neg logit contributions
 she-0.052
enny-0.052
 Jewel-0.052
Em-0.051
pillar-0.050
Token the
Token ID262
Feature activation-2.85e-02

Pos logit contributions
ans+0.089
 hon+0.086
cks+0.084
ep+0.081
 afloat+0.078
Neg logit contributions
 Jewel-0.056
 she-0.056
Em-0.055
enny-0.054
Well-0.052
Token man
Token ID582
Feature activation-8.53e-02

Pos logit contributions
 hon+0.048
ans+0.048
cks+0.048
 steer+0.043
ep+0.043
Neg logit contributions
 Jewel-0.047
pillar-0.046
Em-0.046
enny-0.045
Ruby-0.043
Token says
Token ID1139
Feature activation-4.34e-02

Pos logit contributions
ans+0.175
 afloat+0.166
cks+0.160
 sled+0.159
 paddle+0.158
Neg logit contributions
 Jewel-0.109
 she-0.107
enny-0.103
Sure-0.102
pillar-0.102
Token.
Token ID13
Feature activation-4.33e-02

Pos logit contributions
ans+0.085
 ge+0.077
cks+0.076
 hon+0.074
ep+0.072
Neg logit contributions
 she-0.074
enny-0.070
 Jewel-0.064
ly-0.063
Em-0.063
Token He
Token ID679
Feature activation-3.85e-02

Pos logit contributions
ans+0.082
 hon+0.077
cks+0.076
ep+0.071
 ge+0.070
Neg logit contributions
enny-0.072
 Jewel-0.071
Em-0.071
Well-0.068
pillar-0.067
Token is
Token ID318
Feature activation-3.82e-02

Pos logit contributions
 hon+0.066
ans+0.065
cks+0.062
 ge+0.059
 steer+0.058
Neg logit contributions
 Jewel-0.063
enny-0.062
Em-0.061
pillar-0.061
Well-0.058
Token right
Token ID826
Feature activation-2.17e-02

Pos logit contributions
ans+0.076
cker+0.066
 paddle+0.065
 steer+0.063
 sled+0.063
Neg logit contributions
 she-0.056
 Jewel-0.056
 Well-0.050
 makeup-0.050
 Emma-0.049
Token!"
Token ID2474
Feature activation-6.37e-02

Pos logit contributions
ans+0.038
cks+0.037
 hon+0.036
ep+0.033
 paddle+0.033
Neg logit contributions
 she-0.031
 Jewel-0.031
enny-0.030
Em-0.030
 Charlotte-0.029
Token one
Token ID530
Feature activation-4.31e-02

Pos logit contributions
ans+0.106
cks+0.097
 hon+0.096
ep+0.095
cker+0.091
Neg logit contributions
 she-0.117
 Jewel-0.113
Well-0.109
Em-0.108
 Well-0.107
Token of
Token ID286
Feature activation-5.59e-02

Pos logit contributions
ans+0.088
 hon+0.086
cks+0.084
 ge+0.079
 paddle+0.078
Neg logit contributions
enny-0.060
Em-0.057
 Jewel-0.056
pillar-0.055
 she-0.054
Token the
Token ID262
Feature activation-4.95e-02

Pos logit contributions
ans+0.098
 hon+0.095
cker+0.093
cks+0.090
 afloat+0.089
Neg logit contributions
 Jewel-0.086
 Grace-0.085
 she-0.085
 Penny-0.083
 Emmy-0.083
Token fire
Token ID2046
Feature activation-6.36e-02

Pos logit contributions
cks+0.082
ans+0.081
 afloat+0.071
 steer+0.070
ull+0.070
Neg logit contributions
 she-0.079
 Jewel-0.077
 diary-0.074
Really-0.072
Ruby-0.071
Tokenmen
Token ID3653
Feature activation-8.48e-02

Pos logit contributions
ans+0.091
cks+0.086
 afloat+0.079
 ge+0.078
 paddle+0.073
Neg logit contributions
 Well-0.104
als-0.101
 she-0.100
ths-0.099
 Girl-0.099
Token said
Token ID531
Feature activation-4.04e-02

Pos logit contributions
ans+0.169
cks+0.161
ric+0.153
 snowball+0.148
opl+0.144
Neg logit contributions
 she-0.112
Em-0.106
enny-0.105
ted-0.100
ly-0.099
Token.
Token ID13
Feature activation-5.62e-02

Pos logit contributions
ans+0.086
cks+0.078
 ge+0.078
ep+0.078
cker+0.075
Neg logit contributions
 she-0.062
enny-0.052
ly-0.051
 Jewel-0.051
 Emma-0.049
Token "
Token ID366
Feature activation+8.46e-03

Pos logit contributions
ans+0.116
cks+0.109
 hon+0.105
ep+0.103
cker+0.101
Neg logit contributions
Well-0.080
 Jewel-0.080
 Well-0.079
Em-0.077
 However-0.076
TokenDid
Token ID11633
Feature activation-1.03e-02

Pos logit contributions
Em+0.007
 Jewel+0.007
Well+0.007
pillar+0.007
enny+0.007
Neg logit contributions
 hon-0.021
ans-0.021
cks-0.020
 paddle-0.020
 ge-0.020
Token you
Token ID345
Feature activation-1.21e-02

Pos logit contributions
 hon+0.029
ans+0.029
cks+0.028
 afloat+0.027
 paddle+0.027
Neg logit contributions
 Jewel-0.005
 she-0.005
enny-0.004
Em-0.004
Well-0.004
TokenStop
Token ID19485
Feature activation-2.33e-02

Pos logit contributions
 Jewel+0.011
Em+0.011
enny+0.011
pillar+0.010
Well+0.009
Neg logit contributions
 hon-0.044
ans-0.043
cks-0.042
 ge-0.040
ep-0.040
Token!
Token ID0
Feature activation-1.52e-02

Pos logit contributions
ans+0.049
cks+0.043
cker+0.043
 hon+0.043
ep+0.042
Neg logit contributions
 she-0.032
 Jewel-0.031
pillar-0.031
 Emmy-0.028
enny-0.028
Token Stop
Token ID13707
Feature activation-1.51e-02

Pos logit contributions
ans+0.040
 hon+0.040
cks+0.039
ep+0.038
 afloat+0.037
Neg logit contributions
enny-0.012
 Jewel-0.012
pillar-0.011
Em-0.010
 Well-0.010
Token!"
Token ID2474
Feature activation-5.20e-02

Pos logit contributions
ans+0.033
cker+0.031
cks+0.030
 hon+0.029
ull+0.029
Neg logit contributions
 she-0.019
 Jewel-0.017
pillar-0.017
 Charlotte-0.016
Em-0.016
Token the
Token ID262
Feature activation-4.32e-02

Pos logit contributions
ans+0.101
 hon+0.096
cks+0.093
ep+0.092
 ge+0.088
Neg logit contributions
 Jewel-0.079
 she-0.079
Em-0.076
enny-0.074
 Grace-0.073
Token man
Token ID582
Feature activation-8.47e-02

Pos logit contributions
cks+0.080
ans+0.075
 afloat+0.074
ric+0.068
ull+0.066
Neg logit contributions
 she-0.068
pillar-0.068
Sure-0.064
 Jewel-0.064
 diary-0.063
Token calls
Token ID3848
Feature activation-3.30e-02

Pos logit contributions
ans+0.137
 afloat+0.136
 polar+0.127
ull+0.127
 ge+0.126
Neg logit contributions
 Jewel-0.134
 she-0.134
pillar-0.127
she-0.125
Well-0.123
Token.
Token ID13
Feature activation-4.39e-02

Pos logit contributions
ans+0.073
cks+0.068
ric+0.068
ep+0.066
cker+0.065
Neg logit contributions
 she-0.052
 Grace-0.047
 Emma-0.046
 Jewel-0.044
 May-0.043
Token He
Token ID679
Feature activation-3.41e-02

Pos logit contributions
ans+0.094
cks+0.085
ep+0.080
 hon+0.080
 ge+0.080
Neg logit contributions
 Jewel-0.065
 she-0.064
 Well-0.061
Well-0.060
Em-0.059
Token runs
Token ID4539
Feature activation-3.29e-04

Pos logit contributions
ans+0.060
 hon+0.057
cks+0.055
 afloat+0.054
 ge+0.054
Neg logit contributions
 Jewel-0.053
pillar-0.051
enny-0.050
Well-0.050
 she-0.049
Token to
Token ID284
Feature activation-2.76e-02

Pos logit contributions
ans+0.001
cks+0.001
 hon+0.001
 ge+0.001
ep+0.001
Neg logit contributions
 Jewel-0.000
enny-0.000
Em-0.000
 she-0.000
Sure-0.000
Token might
Token ID1244
Feature activation-1.32e-02

Pos logit contributions
 hon+0.028
ans+0.027
cks+0.026
 ge+0.025
ep+0.024
Neg logit contributions
enny-0.024
 Jewel-0.024
Em-0.023
pillar-0.023
Well-0.022
Token hurt
Token ID5938
Feature activation-4.11e-02

Pos logit contributions
ans+0.023
cks+0.022
ep+0.021
 hon+0.020
ull+0.020
Neg logit contributions
 Jewel-0.021
 she-0.021
pillar-0.020
Em-0.019
Ruby-0.019
Token you
Token ID345
Feature activation-2.06e-02

Pos logit contributions
ans+0.106
 hon+0.106
 paddle+0.102
cks+0.102
 afloat+0.099
Neg logit contributions
 Jewel-0.031
 she-0.030
enny-0.026
pillar-0.026
Em-0.025
Token."
Token ID526
Feature activation-4.32e-02

Pos logit contributions
cks+0.035
ans+0.035
 hon+0.035
ull+0.034
 ge+0.034
Neg logit contributions
 she-0.033
 Jewel-0.030
pillar-0.028
stone-0.028
ly-0.028
Token the
Token ID262
Feature activation-3.32e-02

Pos logit contributions
ans+0.080
 hon+0.076
cks+0.074
ep+0.073
cker+0.068
Neg logit contributions
 Jewel-0.071
enny-0.070
 she-0.070
Em-0.069
Well-0.067
Token man
Token ID582
Feature activation-8.47e-02

Pos logit contributions
cks+0.055
ans+0.054
 afloat+0.048
ull+0.047
 hon+0.047
Neg logit contributions
pillar-0.059
 she-0.058
 Jewel-0.057
 diary-0.056
Sure-0.055
Token says
Token ID1139
Feature activation-4.69e-02

Pos logit contributions
ans+0.143
 afloat+0.139
 hon+0.131
 polar+0.131
 paddle+0.131
Neg logit contributions
 Jewel-0.134
 she-0.127
pillar-0.127
Well-0.125
Sure-0.124
Token.
Token ID13
Feature activation-4.42e-02

Pos logit contributions
ans+0.087
 ge+0.083
 hon+0.079
 polar+0.078
ep+0.078
Neg logit contributions
 she-0.079
enny-0.074
ly-0.071
 Jewel-0.069
Em-0.067
Token\n
Token ID198
Feature activation-3.48e-02

Pos logit contributions
ans+0.083
 hon+0.078
cks+0.076
ep+0.072
 ge+0.071
Neg logit contributions
 Jewel-0.073
enny-0.072
Em-0.071
pillar-0.070
Well-0.070
Token\n
Token ID198
Feature activation-1.36e-03

Pos logit contributions
ans+0.082
cker+0.078
 hon+0.076
 ge+0.076
 paddle+0.074
Neg logit contributions
enny-0.042
Em-0.041
 she-0.038
Well-0.038
 Jewel-0.037
TokenAnna
Token ID31160
Feature activation-4.13e-02

Pos logit contributions
ans+0.003
 hon+0.003
cks+0.003
 paddle+0.003
 ge+0.003
Neg logit contributions
enny-0.002
Em-0.002
pillar-0.002
Ruby-0.002
 Jewel-0.002
Token stop
Token ID2245
Feature activation-3.36e-02

Pos logit contributions
ans+0.029
 hon+0.028
cks+0.028
ep+0.026
cker+0.025
Neg logit contributions
 Jewel-0.036
pillar-0.035
enny-0.034
Of-0.034
Sure-0.034
Token!"
Token ID2474
Feature activation-3.21e-02

Pos logit contributions
ans+0.063
 hon+0.060
ep+0.058
cker+0.058
 paddle+0.056
Neg logit contributions
 she-0.046
pillar-0.045
 Jewel-0.044
cha-0.042
 I-0.041
Token L
Token ID406
Feature activation-1.93e-02

Pos logit contributions
 hon+0.059
ans+0.059
cks+0.056
ep+0.054
 ge+0.052
Neg logit contributions
 Jewel-0.050
enny-0.048
Em-0.048
pillar-0.047
 she-0.046
Tokenila
Token ID10102
Feature activation-6.30e-02

Pos logit contributions
 hon+0.037
ans+0.035
 afloat+0.034
cker+0.034
 paddle+0.033
Neg logit contributions
enny-0.031
pillar-0.028
 Jewel-0.027
illy-0.027
olly-0.027
Token and
Token ID290
Feature activation+2.98e-03

Pos logit contributions
ans+0.101
 polar+0.097
 afloat+0.096
ull+0.095
cker+0.094
Neg logit contributions
pillar-0.106
 Jewel-0.104
 she-0.102
Well-0.094
enny-0.094
Token Ben
Token ID3932
Feature activation-8.46e-02

Pos logit contributions
 Jewel+0.005
enny+0.005
pillar+0.004
Em+0.004
 she+0.004
Neg logit contributions
 hon-0.005
ans-0.005
cks-0.005
ep-0.005
 ge-0.005
Token shout
Token ID9191
Feature activation-4.54e-02

Pos logit contributions
ull+0.132
ans+0.132
cker+0.126
 polar+0.124
ep+0.122
Neg logit contributions
 she-0.136
 Jewel-0.133
Sure-0.130
 blush-0.126
pillar-0.124
Token.
Token ID13
Feature activation-4.00e-02

Pos logit contributions
ans+0.085
 ge+0.080
cks+0.078
 hon+0.078
ep+0.077
Neg logit contributions
 she-0.068
 Jewel-0.065
enny-0.065
ly-0.062
pillar-0.061
Token They
Token ID1119
Feature activation-5.41e-02

Pos logit contributions
ans+0.070
 hon+0.068
cks+0.066
ep+0.065
cker+0.061
Neg logit contributions
 Jewel-0.067
Well-0.063
pillar-0.062
enny-0.062
 she-0.062
Token run
Token ID1057
Feature activation-8.09e-03

Pos logit contributions
ans+0.100
ull+0.098
ep+0.092
cker+0.091
cks+0.089
Neg logit contributions
 Jewel-0.078
Sure-0.073
pillar-0.073
 she-0.072
Of-0.071
Token to
Token ID284
Feature activation-3.29e-02

Pos logit contributions
cks+0.018
ans+0.018
 hon+0.018
ep+0.017
cker+0.017
Neg logit contributions
 Jewel-0.009
enny-0.009
Sure-0.008
 she-0.008
Of-0.008
Token I
Token ID314
Feature activation-1.54e-02

Pos logit contributions
ans+0.044
cks+0.043
 afloat+0.039
ull+0.039
 hon+0.039
Neg logit contributions
 Jewel-0.046
 she-0.045
pillar-0.044
Of-0.044
enny-0.043
Token'm
Token ID1101
Feature activation-1.84e-02

Pos logit contributions
ans+0.035
cks+0.034
 hon+0.033
 steer+0.031
 ge+0.030
Neg logit contributions
enny-0.019
Em-0.018
pillar-0.018
Well-0.017
 Jewel-0.017
Token sorry
Token ID7926
Feature activation-2.10e-02

Pos logit contributions
ans+0.038
 steer+0.035
cker+0.035
 paddle+0.035
 hon+0.034
Neg logit contributions
 she-0.026
 Jewel-0.025
 Charlotte-0.024
 Emmy-0.023
 Grace-0.023
Token,"
Token ID553
Feature activation-5.20e-02

Pos logit contributions
ans+0.039
cks+0.037
 hon+0.037
ep+0.036
cker+0.035
Neg logit contributions
 she-0.034
 Jewel-0.033
enny-0.032
pillar-0.031
Em-0.030
Token the
Token ID262
Feature activation-3.32e-02

Pos logit contributions
ans+0.092
cks+0.084
ull+0.081
ep+0.080
 afloat+0.079
Neg logit contributions
 she-0.097
 Jewel-0.090
Well-0.086
Of-0.085
Sure-0.084
Token man
Token ID582
Feature activation-8.40e-02

Pos logit contributions
cks+0.053
ans+0.052
 afloat+0.047
 hon+0.047
ull+0.046
Neg logit contributions
pillar-0.062
 Jewel-0.060
Sure-0.059
 she-0.058
Of-0.058
Token says
Token ID1139
Feature activation-4.99e-02

Pos logit contributions
 afloat+0.155
ans+0.154
 paddle+0.144
 sled+0.144
ric+0.143
Neg logit contributions
 Jewel-0.118
 she-0.118
Sure-0.114
Of-0.111
pillar-0.109
Token.
Token ID13
Feature activation-4.36e-02

Pos logit contributions
ans+0.098
 ge+0.095
 hon+0.088
ep+0.088
 polar+0.088
Neg logit contributions
 she-0.080
enny-0.074
ly-0.070
 Jewel-0.068
stone-0.066
Token "
Token ID366
Feature activation-1.27e-03

Pos logit contributions
ans+0.097
cks+0.090
 hon+0.086
 ge+0.084
 afloat+0.083
Neg logit contributions
 Jewel-0.058
Well-0.056
enny-0.056
 she-0.055
pillar-0.055
TokenRex
Token ID47389
Feature activation-2.13e-02

Pos logit contributions
ans+0.003
 hon+0.003
cks+0.003
 ge+0.003
 paddle+0.003
Neg logit contributions
Em-0.001
Well-0.001
pillar-0.001
enny-0.001
 Jewel-0.001
Token is
Token ID318
Feature activation-2.62e-02

Pos logit contributions
ans+0.042
ull+0.041
 hon+0.040
cks+0.040
 afloat+0.039
Neg logit contributions
 Jewel-0.029
pillar-0.027
 she-0.027
�-0.025
Ruby-0.025
Token are
Token ID389
Feature activation-2.38e-02

Pos logit contributions
 hon+0.027
ans+0.027
cks+0.025
 ge+0.025
ep+0.024
Neg logit contributions
 Jewel-0.019
enny-0.019
pillar-0.019
Em-0.018
 Close-0.018
Token missing
Token ID4814
Feature activation-1.77e-02

Pos logit contributions
ans+0.042
 hon+0.041
 paddle+0.039
 steer+0.039
cks+0.038
Neg logit contributions
 Jewel-0.038
Em-0.034
 cartridge-0.034
 Charlotte-0.034
 she-0.034
Token out
Token ID503
Feature activation-3.43e-02

Pos logit contributions
 hon+0.028
ans+0.028
cks+0.026
 ge+0.025
 paddle+0.025
Neg logit contributions
 Jewel-0.031
Em-0.030
enny-0.029
pillar-0.029
 Grace-0.029
Token,"
Token ID553
Feature activation-5.63e-02

Pos logit contributions
 hon+0.055
ans+0.054
cks+0.050
 paddle+0.049
 ge+0.048
Neg logit contributions
 Jewel-0.059
enny-0.059
Em-0.058
pillar-0.057
 she-0.056
Token the
Token ID262
Feature activation-4.30e-02

Pos logit contributions
ans+0.104
cks+0.098
ep+0.091
 afloat+0.090
 hon+0.085
Neg logit contributions
 she-0.102
 Jewel-0.093
Well-0.089
Em-0.087
Sure-0.086
Token man
Token ID582
Feature activation-8.39e-02

Pos logit contributions
cks+0.066
ans+0.065
 afloat+0.061
 hon+0.058
 steer+0.055
Neg logit contributions
pillar-0.079
 she-0.077
 Jewel-0.076
Well-0.074
Really-0.074
Token says
Token ID1139
Feature activation-4.55e-02

Pos logit contributions
ans+0.149
 afloat+0.148
ull+0.139
 ge+0.138
 sled+0.138
Neg logit contributions
 Jewel-0.124
 she-0.120
pillar-0.114
 Close-0.110
Well-0.109
Token.
Token ID13
Feature activation-5.28e-02

Pos logit contributions
ans+0.091
 ge+0.088
 polar+0.081
cks+0.080
 hon+0.080
Neg logit contributions
 she-0.072
enny-0.065
ly-0.062
 Jewel-0.061
stone-0.057
Token\n
Token ID198
Feature activation-3.87e-02

Pos logit contributions
ans+0.110
cks+0.099
 hon+0.097
 paddle+0.095
 ge+0.094
Neg logit contributions
Well-0.078
 Jewel-0.077
enny-0.077
 she-0.077
Em-0.075
Token\n
Token ID198
Feature activation-1.37e-02

Pos logit contributions
ans+0.092
 paddle+0.088
 ge+0.087
cker+0.087
 hon+0.086
Neg logit contributions
enny-0.044
Em-0.042
Well-0.041
Sure-0.040
 she-0.039
TokenTom
Token ID13787
Feature activation-2.94e-02

Pos logit contributions
ans+0.027
 paddle+0.026
 hon+0.026
 polar+0.025
cks+0.025
Neg logit contributions
enny-0.020
Em-0.020
Well-0.020
Hmm-0.019
Sure-0.019
Token,
Token ID11
Feature activation-1.36e-03

Pos logit contributions
ans+0.019
cks+0.019
 hon+0.019
ep+0.018
cker+0.018
Neg logit contributions
 Jewel-0.015
 she-0.014
 Grace-0.014
 Charlotte-0.013
Em-0.013
Token bad
Token ID2089
Feature activation+1.96e-03

Pos logit contributions
cks+0.002
ans+0.002
 hon+0.002
ep+0.002
cker+0.002
Neg logit contributions
 Jewel-0.002
pillar-0.002
 she-0.002
 Charlotte-0.002
enny-0.002
Token dog
Token ID3290
Feature activation-3.36e-02

Pos logit contributions
pillar+0.003
 Jewel+0.003
 diary+0.003
 Charlotte+0.003
 she+0.003
Neg logit contributions
cks-0.003
 hon-0.003
ans-0.003
ep-0.003
 steer-0.003
Token!"
Token ID2474
Feature activation-4.66e-02

Pos logit contributions
ans+0.057
 hon+0.056
cks+0.055
 paddle+0.054
ric+0.054
Neg logit contributions
pillar-0.054
 Jewel-0.051
 she-0.050
enny-0.049
stone-0.049
Token the
Token ID262
Feature activation-3.16e-02

Pos logit contributions
ans+0.092
cks+0.088
 hon+0.088
ep+0.083
 steer+0.081
Neg logit contributions
 she-0.068
 Jewel-0.067
Em-0.066
enny-0.064
 Grace-0.063
Token man
Token ID582
Feature activation-8.38e-02

Pos logit contributions
cks+0.055
ans+0.053
 hon+0.051
 steer+0.047
 afloat+0.046
Neg logit contributions
pillar-0.054
 Jewel-0.052
 diary-0.050
Em-0.050
 she-0.049
Token says
Token ID1139
Feature activation-4.28e-02

Pos logit contributions
 afloat+0.153
ans+0.152
 sled+0.142
ric+0.142
ull+0.141
Neg logit contributions
 she-0.118
 Jewel-0.110
pillar-0.105
she-0.104
 Close-0.103
Token.
Token ID13
Feature activation-4.06e-02

Pos logit contributions
ans+0.082
cks+0.075
ric+0.074
 ge+0.074
 hon+0.072
Neg logit contributions
 she-0.071
enny-0.065
ly-0.062
Em-0.062
 Emma-0.061
Token He
Token ID679
Feature activation-3.14e-02

Pos logit contributions
ans+0.088
cks+0.080
 ge+0.075
 hon+0.075
ep+0.074
Neg logit contributions
 she-0.059
 Jewel-0.058
Well-0.056
Em-0.055
enny-0.055
Token sounds
Token ID5238
Feature activation-2.57e-02

Pos logit contributions
ans+0.056
 hon+0.055
cks+0.052
 ge+0.050
ull+0.050
Neg logit contributions
 Jewel-0.049
pillar-0.047
enny-0.047
Em-0.046
Well-0.045
Token angry
Token ID7954
Feature activation-4.24e-02

Pos logit contributions
ans+0.051
cker+0.045
cks+0.045
 hon+0.045
ep+0.044
Neg logit contributions
 she-0.039
 Jewel-0.036
Em-0.034
Well-0.034
Ruby-0.034
Token"
Token ID1
Feature activation+2.62e-02

Pos logit contributions
Em+0.007
enny+0.007
 Jewel+0.007
pillar+0.007
Ruby+0.007
Neg logit contributions
ans-0.011
 hon-0.010
cks-0.010
 paddle-0.010
 ge-0.009
TokenO
Token ID46
Feature activation-5.69e-03

Pos logit contributions
enny+0.030
 Jewel+0.029
Em+0.028
pillar+0.027
innie+0.026
Neg logit contributions
 hon-0.062
ans-0.059
cks-0.057
 ge-0.057
ep-0.055
Tokenw
Token ID86
Feature activation-1.24e-02

Pos logit contributions
 hon+0.015
ans+0.015
 ge+0.014
 polar+0.014
 paddle+0.014
Neg logit contributions
pillar-0.004
enny-0.004
innie-0.003
 Jewel-0.003
Well-0.003
Token!"
Token ID2474
Feature activation-4.28e-02

Pos logit contributions
 hon+0.023
ans+0.022
 ge+0.021
 paddle+0.021
 steer+0.021
Neg logit contributions
pillar-0.019
enny-0.019
 Jewel-0.018
stone-0.018
 she-0.017
Token the
Token ID262
Feature activation-2.75e-02

Pos logit contributions
ans+0.079
cks+0.075
 hon+0.075
ep+0.073
cker+0.069
Neg logit contributions
 Jewel-0.069
 she-0.068
Em-0.066
 Grace-0.064
 Penny-0.064
Token man
Token ID582
Feature activation-8.34e-02

Pos logit contributions
ans+0.044
cks+0.044
 hon+0.043
ep+0.038
 steer+0.038
Neg logit contributions
pillar-0.050
 Jewel-0.049
enny-0.046
Em-0.046
 diary-0.046
Token says
Token ID1139
Feature activation-4.41e-02

Pos logit contributions
ans+0.141
 afloat+0.138
ull+0.134
 polar+0.134
 sled+0.129
Neg logit contributions
 Jewel-0.127
 she-0.123
pillar-0.122
Sure-0.116
Well-0.116
Token.
Token ID13
Feature activation-4.00e-02

Pos logit contributions
ans+0.087
ep+0.079
cks+0.077
 ge+0.077
 hon+0.075
Neg logit contributions
 she-0.073
 Jewel-0.065
enny-0.063
ly-0.061
 Well-0.060
Token "
Token ID366
Feature activation+5.01e-03

Pos logit contributions
ans+0.088
cks+0.082
 hon+0.079
ep+0.078
cker+0.076
Neg logit contributions
 Jewel-0.055
 she-0.052
enny-0.051
Em-0.050
pillar-0.050
TokenThe
Token ID464
Feature activation-1.69e-02

Pos logit contributions
pillar+0.004
Em+0.004
Well+0.004
 Jewel+0.004
Hmm+0.004
Neg logit contributions
 hon-0.013
ans-0.013
cks-0.012
 paddle-0.012
 ge-0.012
Token dog
Token ID3290
Feature activation-3.47e-02

Pos logit contributions
cks+0.028
ans+0.027
 hon+0.026
ep+0.024
ull+0.024
Neg logit contributions
 Jewel-0.030
pillar-0.030
 diary-0.028
Of-0.027
Really-0.027
Tokenoc
Token ID420
Feature activation-2.26e-02

Pos logit contributions
ans+0.105
cks+0.102
 afloat+0.095
 paddle+0.094
allows+0.090
Neg logit contributions
pillar-0.107
stone-0.096
 Jewel-0.096
ha-0.095
 she-0.090
Tokeneros
Token ID27498
Feature activation-5.66e-02

Pos logit contributions
 hon+0.038
ans+0.037
cks+0.035
 paddle+0.034
 ge+0.034
Neg logit contributions
 Jewel-0.039
pillar-0.036
enny-0.036
stone-0.035
innie-0.034
Token!"
Token ID2474
Feature activation-3.92e-02

Pos logit contributions
ans+0.106
 paddle+0.098
 afloat+0.097
cks+0.096
 hon+0.094
Neg logit contributions
pillar-0.088
 she-0.083
 Jewel-0.082
stone-0.081
ha-0.077
Token Anna
Token ID11735
Feature activation-5.74e-02

Pos logit contributions
 hon+0.070
ans+0.068
cks+0.064
 ge+0.062
ep+0.061
Neg logit contributions
 Jewel-0.064
enny-0.064
Em-0.063
pillar-0.062
Well-0.059
Token and
Token ID290
Feature activation-8.34e-04

Pos logit contributions
ans+0.102
cks+0.098
 hon+0.094
 paddle+0.092
 afloat+0.091
Neg logit contributions
pillar-0.093
enny-0.092
 Jewel-0.092
Em-0.090
 she-0.090
Token Ben
Token ID3932
Feature activation-8.34e-02

Pos logit contributions
 hon+0.001
ans+0.001
 ge+0.001
cks+0.001
 paddle+0.001
Neg logit contributions
enny-0.002
Em-0.002
 Jewel-0.002
pillar-0.001
innie-0.001
Token said
Token ID531
Feature activation-4.05e-02

Pos logit contributions
ans+0.172
cks+0.163
 hon+0.158
 snowball+0.156
ep+0.154
Neg logit contributions
 Jewel-0.102
 she-0.102
enny-0.098
Em-0.098
pillar-0.098
Token.
Token ID13
Feature activation-4.25e-02

Pos logit contributions
ans+0.074
 ge+0.069
 hon+0.067
cks+0.066
ep+0.066
Neg logit contributions
 she-0.070
enny-0.064
 Jewel-0.062
stone-0.061
 Grace-0.059
Token\n
Token ID198
Feature activation-3.74e-02

Pos logit contributions
ans+0.080
 hon+0.077
cks+0.074
ep+0.072
 ge+0.069
Neg logit contributions
 Jewel-0.068
Em-0.066
enny-0.065
pillar-0.063
Well-0.063
Token\n
Token ID198
Feature activation-9.60e-03

Pos logit contributions
ans+0.093
cker+0.087
 hon+0.085
 ge+0.085
cks+0.084
Neg logit contributions
Em-0.037
 she-0.033
Ruby-0.032
 Jewel-0.032
enny-0.032
TokenThe
Token ID464
Feature activation-3.04e-02

Pos logit contributions
ans+0.020
 hon+0.020
cks+0.019
 polar+0.018
 ge+0.018
Neg logit contributions
Em-0.013
Ruby-0.012
 Jewel-0.012
enny-0.012
�-0.011
Token the
Token ID262
Feature activation-1.98e-02

Pos logit contributions
ans+0.042
 hon+0.040
 afloat+0.040
 snowball+0.040
ep+0.039
Neg logit contributions
 Jewel-0.036
 Grace-0.034
 Penny-0.033
 she-0.032
Of-0.032
Token zoo
Token ID26626
Feature activation-5.29e-02

Pos logit contributions
ans+0.030
cks+0.030
 hon+0.028
 steer+0.026
 afloat+0.025
Neg logit contributions
 Jewel-0.038
Really-0.035
pillar-0.035
Em-0.034
Hmm-0.034
Token,"
Token ID553
Feature activation-5.09e-02

Pos logit contributions
ans+0.090
 paddle+0.088
cks+0.084
 afloat+0.083
 hon+0.083
Neg logit contributions
 she-0.086
 Jewel-0.083
stone-0.079
pillar-0.076
 May-0.074
Token the
Token ID262
Feature activation-4.42e-02

Pos logit contributions
ans+0.098
cks+0.091
 hon+0.090
ep+0.088
 afloat+0.083
Neg logit contributions
 Jewel-0.081
 she-0.081
Em-0.077
 Grace-0.074
Well-0.074
Token zo
Token ID40565
Feature activation-5.39e-02

Pos logit contributions
cks+0.057
ans+0.057
 afloat+0.054
ull+0.048
 hon+0.046
Neg logit contributions
 Jewel-0.092
Really-0.085
Sure-0.084
Well-0.084
Hmm-0.084
Tokenoke
Token ID2088
Feature activation-8.33e-02

Pos logit contributions
ans+0.091
 hon+0.088
 afloat+0.086
cks+0.085
ep+0.084
Neg logit contributions
enny-0.091
 Jewel-0.087
pillar-0.087
 Well-0.082
olly-0.082
Tokeneper
Token ID5723
Feature activation-7.04e-02

Pos logit contributions
 afloat+0.163
 explode+0.146
 paddle+0.145
 snowball+0.140
 drown+0.139
Neg logit contributions
 Jewel-0.098
Well-0.098
ray-0.095
stone-0.094
ha-0.093
Token says
Token ID1139
Feature activation-4.35e-02

Pos logit contributions
 afloat+0.146
ans+0.137
 paddle+0.133
 ge+0.131
 sled+0.129
Neg logit contributions
 she-0.091
enny-0.085
 Jewel-0.082
 Close-0.079
she-0.077
Token.
Token ID13
Feature activation-4.69e-02

Pos logit contributions
ans+0.090
 ge+0.084
ep+0.082
 hon+0.080
cks+0.079
Neg logit contributions
 she-0.068
enny-0.062
 Jewel-0.058
stone-0.056
ly-0.056
Token "
Token ID366
Feature activation+9.22e-03

Pos logit contributions
ans+0.099
cks+0.092
 hon+0.089
ep+0.087
 ge+0.083
Neg logit contributions
 Jewel-0.069
enny-0.066
Em-0.065
 she-0.065
Well-0.065
TokenDon
Token ID3987
Feature activation+1.28e-02

Pos logit contributions
enny+0.009
 Jewel+0.009
Em+0.008
pillar+0.008
innie+0.007
Neg logit contributions
 hon-0.023
ans-0.022
cks-0.022
 ge-0.021
 steer-0.021
Token\n
Token ID198
Feature activation-4.09e-02

Pos logit contributions
ans+0.147
cker+0.138
 afloat+0.135
 hon+0.134
 polar+0.133
Neg logit contributions
Well-0.060
 she-0.057
Sure-0.056
Em-0.056
I-0.052
Token"
Token ID1
Feature activation+7.27e-05

Pos logit contributions
ans+0.096
ric+0.090
 hon+0.089
 polar+0.089
 afloat+0.088
Neg logit contributions
Well-0.046
Sure-0.044
 she-0.043
Em-0.043
Hmm-0.042
TokenHey
Token ID10814
Feature activation-3.91e-02

Pos logit contributions
 Jewel+0.000
enny+0.000
Em+0.000
pillar+0.000
Well+0.000
Neg logit contributions
 hon-0.000
ans-0.000
cks-0.000
 ge-0.000
ep-0.000
Token,
Token ID11
Feature activation-2.95e-02

Pos logit contributions
ans+0.075
cks+0.070
 hon+0.070
cker+0.065
 afloat+0.065
Neg logit contributions
 Jewel-0.057
 she-0.056
pillar-0.054
enny-0.054
ha-0.054
Token you
Token ID345
Feature activation-3.08e-02

Pos logit contributions
 hon+0.062
ans+0.062
cks+0.060
ep+0.057
 steer+0.056
Neg logit contributions
 Jewel-0.037
enny-0.036
pillar-0.035
Em-0.035
Of-0.034
Token!"
Token ID2474
Feature activation-8.32e-02

Pos logit contributions
 hon+0.054
ans+0.052
cks+0.050
 ge+0.048
 steer+0.047
Neg logit contributions
enny-0.051
 Jewel-0.051
Em-0.050
pillar-0.050
Well-0.047
Token he
Token ID339
Feature activation-6.48e-02

Pos logit contributions
ans+0.166
cks+0.151
 afloat+0.146
ep+0.144
ric+0.143
Neg logit contributions
 she-0.140
Well-0.128
 Jewel-0.121
 Well-0.120
Sure-0.118
Token shouted
Token ID17293
Feature activation-4.68e-02

Pos logit contributions
ans+0.121
cks+0.116
 afloat+0.111
 paddle+0.108
ric+0.108
Neg logit contributions
 Jewel-0.097
Em-0.097
 she-0.094
enny-0.092
Well-0.092
Token.
Token ID13
Feature activation-7.10e-02

Pos logit contributions
ans+0.096
ric+0.095
cks+0.094
ep+0.093
cker+0.087
Neg logit contributions
 she-0.071
 Jewel-0.063
ly-0.061
enny-0.061
 Emma-0.060
Token "
Token ID366
Feature activation+6.02e-03

Pos logit contributions
ans+0.171
cks+0.160
 afloat+0.155
ep+0.152
cker+0.149
Neg logit contributions
 she-0.079
 Jewel-0.079
Well-0.079
 However-0.076
 Well-0.075
TokenYou
Token ID1639
Feature activation-1.45e-02

Pos logit contributions
Em+0.005
 Jewel+0.005
enny+0.005
pillar+0.005
Well+0.004
Neg logit contributions
 hon-0.016
ans-0.016
cks-0.015
 ge-0.014
 paddle-0.014
Token practiced
Token ID19893
Feature activation+3.74e-02

Pos logit contributions
 Jewel+0.041
enny+0.040
Em+0.040
pillar+0.039
Hmm+0.037
Neg logit contributions
 hon-0.056
ans-0.053
cks-0.051
 ge-0.049
ep-0.049
Token in
Token ID287
Feature activation+2.43e-02

Pos logit contributions
 Jewel+0.058
enny+0.057
Em+0.057
pillar+0.055
Well+0.053
Neg logit contributions
 hon-0.070
ans-0.068
cks-0.064
 ge-0.063
ep-0.061
Token an
Token ID281
Feature activation+1.65e-02

Pos logit contributions
enny+0.038
 Jewel+0.037
Em+0.037
pillar+0.037
Well+0.035
Neg logit contributions
 hon-0.045
ans-0.042
cks-0.041
 ge-0.040
 steer-0.039
Token icy
Token ID30284
Feature activation+1.44e-02

Pos logit contributions
enny+0.024
 Jewel+0.024
Em+0.024
pillar+0.023
Well+0.022
Neg logit contributions
 hon-0.033
ans-0.031
cks-0.030
 ge-0.029
ep-0.029
Token room
Token ID2119
Feature activation+2.65e-02

Pos logit contributions
Em+0.024
pillar+0.024
 Jewel+0.024
enny+0.024
 Grace+0.022
Neg logit contributions
cks-0.024
 hon-0.024
ans-0.024
 ge-0.022
ep-0.021
Token.
Token ID13
Feature activation+5.12e-02

Pos logit contributions
 Jewel+0.058
enny+0.058
Em+0.057
pillar+0.056
Hmm+0.055
Neg logit contributions
 hon-0.033
ans-0.030
cks-0.028
 ge-0.027
ep-0.026
Token This
Token ID770
Feature activation+3.57e-02

Pos logit contributions
stone+0.065
ths+0.062
pillar+0.060
brate+0.060
Em+0.059
Neg logit contributions
 hon-0.096
ans-0.090
 paddle-0.088
 steer-0.087
 sled-0.086
Token time
Token ID640
Feature activation+5.67e-02

Pos logit contributions
enny+0.076
Really+0.074
Of+0.073
 Jewel+0.073
Certainly+0.071
Neg logit contributions
 hon-0.047
ep-0.041
ans-0.037
 steer-0.034
 ge-0.034
Token it
Token ID340
Feature activation+4.88e-02

Pos logit contributions
Em+0.086
pillar+0.084
Really+0.084
 Jewel+0.084
 Close+0.083
Neg logit contributions
 hon-0.103
 ge-0.091
ans-0.088
 steer-0.086
 paddle-0.085
Token was
Token ID373
Feature activation+1.08e-02

Pos logit contributions
 Jewel+0.083
enny+0.083
Em+0.081
innie+0.079
Really+0.079
Neg logit contributions
 hon-0.086
ans-0.075
ep-0.074
 steer-0.071
 ge-0.071
Token easier
Token ID4577
Feature activation+3.60e-02

Pos logit contributions
 Jewel+0.016
Em+0.016
enny+0.016
pillar+0.015
Well+0.015
Neg logit contributions
 hon-0.020
ans-0.020
cks-0.019
 ge-0.018
 steer-0.018
Token.
Token ID13
Feature activation+5.28e-02

Pos logit contributions
 Jewel+0.023
enny+0.023
Em+0.023
pillar+0.022
 she+0.021
Neg logit contributions
 hon-0.018
ans-0.018
cks-0.017
 ge-0.016
 paddle-0.016
Token As
Token ID1081
Feature activation+6.36e-02

Pos logit contributions
illy+0.074
ths+0.073
stone+0.070
innie+0.069
pillar+0.068
Neg logit contributions
 hon-0.100
 steer-0.084
ans-0.082
 paddle-0.081
 their-0.080
Token soon
Token ID2582
Feature activation+3.19e-02

Pos logit contributions
pillar+0.068
Hmm+0.067
stone+0.066
umm+0.065
 cartridge+0.065
Neg logit contributions
 hon-0.152
 their-0.136
 ge-0.133
 steer-0.128
 sled-0.125
Token as
Token ID355
Feature activation+4.08e-02

Pos logit contributions
enny+0.058
pillar+0.057
Really+0.056
 Jewel+0.056
Hmm+0.056
Neg logit contributions
 hon-0.054
 ge-0.047
ans-0.046
 paddle-0.044
 steer-0.043
Token she
Token ID673
Feature activation+3.70e-02

Pos logit contributions
pillar+0.039
 cartridge+0.036
innie+0.035
�+0.035
 Close+0.035
Neg logit contributions
 hon-0.102
 ge-0.091
 their-0.090
 steer-0.089
 sled-0.085
Token entered
Token ID5982
Feature activation+4.66e-02

Pos logit contributions
 Jewel+0.069
Em+0.066
enny+0.066
 diary+0.066
Hmm+0.066
Neg logit contributions
 hon-0.064
ep-0.054
 steer-0.052
 paddle-0.051
 ge-0.051
Token the
Token ID262
Feature activation+1.80e-02

Pos logit contributions
enny+0.111
Em+0.110
pillar+0.109
Hmm+0.107
Really+0.106
Neg logit contributions
 hon-0.049
ans-0.039
 ge-0.037
cks-0.037
 steer-0.036
Token room
Token ID2119
Feature activation+3.27e-02

Pos logit contributions
enny+0.030
innie+0.027
Em+0.027
Hmm+0.027
illy+0.026
Neg logit contributions
 hon-0.035
ans-0.030
ep-0.029
 ge-0.028
 paddle-0.028
Token and
Token ID290
Feature activation+4.27e-02

Pos logit contributions
 Jewel+0.070
Em+0.069
pillar+0.068
innie+0.068
Hmm+0.068
Neg logit contributions
 hon-0.042
ans-0.035
 ge-0.033
 steer-0.032
cks-0.032
Token started
Token ID2067
Feature activation+3.21e-02

Pos logit contributions
innie+0.078
enny+0.077
Em+0.075
pillar+0.075
 Jewel+0.075
Neg logit contributions
 hon-0.073
 ge-0.060
ep-0.059
ans-0.059
 steer-0.059
Token to
Token ID284
Feature activation+1.70e-02

Pos logit contributions
 Jewel+0.042
enny+0.042
Em+0.041
 she+0.041
pillar+0.040
Neg logit contributions
ans-0.066
 hon-0.065
cks-0.063
ep-0.060
 ge-0.060
Token stepped
Token ID10764
Feature activation+3.16e-02

Pos logit contributions
 Jewel+0.071
enny+0.070
Em+0.069
pillar+0.067
Hmm+0.066
Neg logit contributions
 hon-0.080
ans-0.073
 ge-0.070
cks-0.068
 steer-0.068
Token into
Token ID656
Feature activation+1.18e-02

Pos logit contributions
Em+0.057
pillar+0.057
 Jewel+0.056
enny+0.056
oph+0.055
Neg logit contributions
 hon-0.053
 sled-0.046
 polar-0.045
 paddle-0.045
ans-0.045
Token the
Token ID262
Feature activation-1.06e-02

Pos logit contributions
enny+0.025
pillar+0.024
 Jewel+0.024
Em+0.024
Well+0.023
Neg logit contributions
 hon-0.016
ans-0.014
cks-0.014
 ge-0.014
 steer-0.013
Token movie
Token ID3807
Feature activation+1.36e-02

Pos logit contributions
ans+0.017
 hon+0.017
cks+0.017
 ge+0.016
 paddle+0.016
Neg logit contributions
 Jewel-0.018
Em-0.018
enny-0.018
pillar-0.018
 Close-0.017
Token theater
Token ID13766
Feature activation+1.82e-02

Pos logit contributions
 Jewel+0.026
enny+0.026
Em+0.025
pillar+0.025
Hmm+0.024
Neg logit contributions
 hon-0.021
ans-0.019
cks-0.018
 ge-0.018
ep-0.017
Token,
Token ID11
Feature activation+4.53e-02

Pos logit contributions
enny+0.035
 Jewel+0.035
Em+0.035
pillar+0.034
Well+0.033
Neg logit contributions
 hon-0.027
ans-0.026
cks-0.024
 ge-0.024
ep-0.023
Token she
Token ID673
Feature activation+1.58e-02

Pos logit contributions
innie+0.068
Em+0.059
umm+0.055
enny+0.053
ths+0.052
Neg logit contributions
 ge-0.090
 their-0.089
 hon-0.089
 his-0.085
 paddle-0.083
Token immediately
Token ID3393
Feature activation+2.69e-03

Pos logit contributions
enny+0.021
Em+0.020
pillar+0.020
 she+0.020
 Jewel+0.019
Neg logit contributions
ans-0.033
cks-0.032
 hon-0.031
 afloat-0.029
 steer-0.029
Token saw
Token ID2497
Feature activation+2.22e-02

Pos logit contributions
 she+0.003
enny+0.003
pillar+0.003
Em+0.003
 Jewel+0.003
Neg logit contributions
cks-0.006
ans-0.006
 hon-0.005
 afloat-0.005
ep-0.005
Token popcorn
Token ID38914
Feature activation-5.72e-03

Pos logit contributions
enny+0.039
 Jewel+0.038
Em+0.038
pillar+0.037
 cartridge+0.035
Neg logit contributions
 hon-0.038
ans-0.036
cks-0.034
 ge-0.034
ep-0.033
Token on
Token ID319
Feature activation-4.01e-03

Pos logit contributions
ans+0.010
 hon+0.009
cks+0.009
 ge+0.009
cker+0.009
Neg logit contributions
 she-0.009
 Jewel-0.008
stone-0.008
pillar-0.008
enny-0.008
Token m
Token ID285
Feature activation+1.15e-02

Pos logit contributions
enny+0.043
 Jewel+0.042
pillar+0.042
Em+0.042
 diary+0.039
Neg logit contributions
 hon-0.040
cks-0.039
ans-0.038
ep-0.034
 ge-0.034
Tokenop
Token ID404
Feature activation+5.84e-03

Pos logit contributions
enny+0.014
innie+0.013
pillar+0.013
Em+0.012
 Jewel+0.012
Neg logit contributions
 hon-0.027
ans-0.025
cks-0.025
 ge-0.024
 paddle-0.024
Token by
Token ID416
Feature activation+7.80e-03

Pos logit contributions
enny+0.010
 Jewel+0.010
pillar+0.010
 she+0.010
Em+0.010
Neg logit contributions
 hon-0.010
cks-0.010
ans-0.009
 ge-0.009
 paddle-0.009
Token her
Token ID607
Feature activation+1.92e-02

Pos logit contributions
 Jewel+0.010
Em+0.009
enny+0.009
pillar+0.009
 Grace+0.009
Neg logit contributions
 hon-0.017
ans-0.017
cks-0.016
 ge-0.015
 paddle-0.015
Token side
Token ID1735
Feature activation+1.05e-02

Pos logit contributions
 Jewel+0.034
enny+0.033
Em+0.033
pillar+0.032
 diary+0.032
Neg logit contributions
 hon-0.032
ans-0.031
cks-0.030
ep-0.028
 ge-0.028
Token.
Token ID13
Feature activation+5.05e-02

Pos logit contributions
enny+0.021
Em+0.020
pillar+0.020
 Jewel+0.020
 she+0.020
Neg logit contributions
 hon-0.016
ans-0.015
cks-0.014
 paddle-0.013
 ge-0.013
Token<|endoftext|>
Token ID50256
Feature activation-1.89e-03

Pos logit contributions
 Jewel+0.107
stone+0.105
Em+0.105
 cartridge+0.104
enny+0.104
Neg logit contributions
 hon-0.058
ans-0.052
 ge-0.049
cks-0.047
 steer-0.044
TokenOnce
Token ID7454
Feature activation+3.22e-03

Pos logit contributions
ans+0.004
 hon+0.004
 paddle+0.004
cks+0.004
cker+0.004
Neg logit contributions
Em-0.003
enny-0.003
 Jewel-0.003
pillar-0.003
Well-0.003
Token upon
Token ID2402
Feature activation-1.63e-02

Pos logit contributions
enny+0.005
Em+0.004
 she+0.004
 Jewel+0.004
pillar+0.004
Neg logit contributions
 hon-0.007
ans-0.007
cks-0.006
 ge-0.006
 steer-0.006
Token a
Token ID257
Feature activation+6.55e-03

Pos logit contributions
ans+0.033
 hon+0.032
cks+0.030
 ge+0.029
 paddle+0.029
Neg logit contributions
Em-0.023
enny-0.023
 Jewel-0.023
pillar-0.022
 she-0.022
Token time
Token ID640
Feature activation-4.28e-03

Pos logit contributions
Em+0.010
enny+0.010
pillar+0.009
 Jewel+0.009
oph+0.009
Neg logit contributions
 hon-0.013
ans-0.013
cks-0.012
 paddle-0.012
 ge-0.012
Token very
Token ID845
Feature activation+1.22e-02

Pos logit contributions
 Jewel+0.032
Em+0.032
enny+0.032
pillar+0.031
Well+0.030
Neg logit contributions
 hon-0.033
ans-0.033
cks-0.031
 ge-0.030
 steer-0.029
Token happy
Token ID3772
Feature activation+3.85e-02

Pos logit contributions
enny+0.026
 Jewel+0.026
pillar+0.026
Em+0.026
Hmm+0.025
Neg logit contributions
 hon-0.016
ans-0.014
cks-0.014
 ge-0.013
 steer-0.013
Token.
Token ID13
Feature activation+5.46e-02

Pos logit contributions
Hmm+0.089
enny+0.088
pillar+0.088
Em+0.087
 Jewel+0.087
Neg logit contributions
 hon-0.042
 ge-0.031
ans-0.031
 their-0.030
 steer-0.030
Token She
Token ID1375
Feature activation+2.18e-02

Pos logit contributions
brate+0.086
pr+0.077
ha+0.077
Hmm+0.076
stone+0.075
Neg logit contributions
 their-0.075
ans-0.073
 hon-0.073
om-0.073
unts-0.070
Token kept
Token ID4030
Feature activation+3.33e-02

Pos logit contributions
 Jewel+0.036
enny+0.035
Em+0.034
pillar+0.033
Hmm+0.033
Neg logit contributions
 hon-0.041
ans-0.037
ep-0.035
 ge-0.035
cks-0.035
Token dreaming
Token ID28368
Feature activation+5.41e-02

Pos logit contributions
enny+0.053
pillar+0.052
 Jewel+0.050
Em+0.050
Hmm+0.049
Neg logit contributions
 hon-0.065
ans-0.059
cks-0.058
ep-0.058
 ge-0.055
Token about
Token ID546
Feature activation+3.55e-02

Pos logit contributions
 Jewel+0.097
enny+0.096
Em+0.095
Hmm+0.094
pillar+0.093
Neg logit contributions
 hon-0.087
ans-0.077
cks-0.074
ep-0.072
 ge-0.072
Token her
Token ID607
Feature activation+4.91e-02

Pos logit contributions
enny+0.060
Em+0.059
 Jewel+0.058
pillar+0.058
Hmm+0.055
Neg logit contributions
 hon-0.063
ans-0.058
cks-0.056
 ge-0.055
 steer-0.053
Token style
Token ID3918
Feature activation+3.75e-02

Pos logit contributions
enny+0.086
Em+0.084
pillar+0.083
Hmm+0.083
 Jewel+0.082
Neg logit contributions
 hon-0.083
ans-0.074
 ge-0.070
 steer-0.070
 paddle-0.069
Token each
Token ID1123
Feature activation+1.81e-02

Pos logit contributions
 Jewel+0.072
Em+0.071
enny+0.071
pillar+0.069
Hmm+0.069
Neg logit contributions
 hon-0.055
ans-0.051
cks-0.047
ep-0.046
 steer-0.046
Token night
Token ID1755
Feature activation+3.04e-02

Pos logit contributions
Really+0.035
enny+0.035
Em+0.035
 cartridge+0.034
Well+0.033
Neg logit contributions
 hon-0.026
ans-0.023
 steer-0.023
 ge-0.022
 paddle-0.022
Token went
Token ID1816
Feature activation+8.48e-03

Pos logit contributions
enny+0.003
 Jewel+0.003
Em+0.003
pillar+0.003
Well+0.003
Neg logit contributions
 hon-0.005
ans-0.005
cks-0.005
 ge-0.004
ep-0.004
Token to
Token ID284
Feature activation-1.37e-02

Pos logit contributions
 Jewel+0.011
enny+0.011
Em+0.011
pillar+0.010
 she+0.010
Neg logit contributions
 hon-0.018
ans-0.017
cks-0.017
 ge-0.016
 steer-0.016
Token the
Token ID262
Feature activation-2.30e-02

Pos logit contributions
ans+0.024
 hon+0.023
cks+0.022
ep+0.021
cker+0.021
Neg logit contributions
 Jewel-0.023
 she-0.022
Em-0.022
 Penny-0.022
 Charlotte-0.022
Token store
Token ID3650
Feature activation+5.47e-04

Pos logit contributions
 hon+0.047
ans+0.046
cks+0.046
 steer+0.043
ep+0.042
Neg logit contributions
 Jewel-0.031
Em-0.030
enny-0.030
pillar-0.029
 Close-0.028
Token.
Token ID13
Feature activation+1.20e-02

Pos logit contributions
enny+0.001
 she+0.001
 Jewel+0.001
Em+0.001
 Emmy+0.001
Neg logit contributions
 hon-0.001
cks-0.001
ans-0.001
 ge-0.001
 paddle-0.001
Token So
Token ID1406
Feature activation+3.33e-02

Pos logit contributions
stone+0.022
pillar+0.022
enny+0.022
 diary+0.021
ience+0.021
Neg logit contributions
 hon-0.017
 ge-0.015
ans-0.015
 their-0.014
 steer-0.013
Token the
Token ID262
Feature activation+1.01e-02

Pos logit contributions
pillar+0.093
enny+0.092
Em+0.086
 cartridge+0.085
Charl+0.085
Neg logit contributions
 hon-0.024
 their-0.023
 ge-0.019
 steer-0.013
ans-0.013
Token little
Token ID1310
Feature activation+3.36e-02

Pos logit contributions
enny+0.025
pillar+0.023
 Jewel+0.023
 cartridge+0.023
Charl+0.023
Neg logit contributions
 hon-0.009
 ge-0.009
 hipp-0.008
 polar-0.008
ans-0.007
Token girl
Token ID2576
Feature activation+1.17e-02

Pos logit contributions
innie+0.066
enny+0.066
brate+0.066
Charl+0.064
Em+0.063
Neg logit contributions
 ge-0.043
 hon-0.041
 snowball-0.037
ans-0.037
 their-0.036
Token happily
Token ID18177
Feature activation+7.87e-03

Pos logit contributions
 Jewel+0.020
Em+0.020
enny+0.020
pillar+0.019
Hmm+0.019
Neg logit contributions
 hon-0.020
ans-0.019
 ge-0.018
ep-0.017
 paddle-0.017
Token ran
Token ID4966
Feature activation+4.27e-02

Pos logit contributions
 Jewel+0.012
enny+0.012
Em+0.012
pillar+0.012
Hmm+0.011
Neg logit contributions
 hon-0.015
ans-0.014
cks-0.013
 ge-0.013
ep-0.013
Token,
Token ID11
Feature activation-3.17e-03

Pos logit contributions
ans+0.030
 hon+0.029
cks+0.028
 steer+0.027
 paddle+0.026
Neg logit contributions
 Jewel-0.024
enny-0.024
 she-0.023
Em-0.022
 Grace-0.022
Token mom
Token ID1995
Feature activation-1.61e-02

Pos logit contributions
 hon+0.006
ans+0.005
cks+0.005
ep+0.005
 steer+0.005
Neg logit contributions
 Jewel-0.005
enny-0.005
Em-0.005
pillar-0.005
Of-0.005
Token.
Token ID13
Feature activation+3.20e-03

Pos logit contributions
ans+0.030
 paddle+0.029
 hon+0.029
cks+0.029
 afloat+0.027
Neg logit contributions
 she-0.027
enny-0.026
Em-0.025
 Jewel-0.024
pillar-0.024
Token We
Token ID775
Feature activation-9.74e-03

Pos logit contributions
enny+0.005
Em+0.005
pillar+0.004
 Jewel+0.004
ometime+0.004
Neg logit contributions
 hon-0.006
ans-0.006
cks-0.006
 ge-0.006
 steer-0.005
Token are
Token ID389
Feature activation-1.00e-02

Pos logit contributions
 hon+0.018
ans+0.017
cks+0.017
 ge+0.015
 steer+0.015
Neg logit contributions
 Jewel-0.016
enny-0.016
Em-0.015
pillar-0.015
Well-0.015
Token sorry
Token ID7926
Feature activation+5.27e-03

Pos logit contributions
ans+0.018
 hon+0.018
 steer+0.017
 paddle+0.017
cks+0.017
Neg logit contributions
 Jewel-0.016
Em-0.016
 Charlotte-0.015
 she-0.015
 Emmy-0.015
Token.
Token ID13
Feature activation-4.53e-03

Pos logit contributions
enny+0.010
 Jewel+0.010
Em+0.010
pillar+0.010
Well+0.009
Neg logit contributions
 hon-0.008
ans-0.008
cks-0.007
 ge-0.007
ep-0.007
Token We
Token ID775
Feature activation-1.35e-02

Pos logit contributions
 hon+0.008
ans+0.008
cks+0.008
 ge+0.008
 steer+0.007
Neg logit contributions
enny-0.007
Em-0.007
pillar-0.007
 Jewel-0.007
oph-0.006
Token will
Token ID481
Feature activation+1.69e-03

Pos logit contributions
 hon+0.026
ans+0.025
cks+0.025
 ge+0.023
 steer+0.023
Neg logit contributions
 Jewel-0.020
enny-0.020
Em-0.020
pillar-0.019
Well-0.019
Token clean
Token ID3424
Feature activation-2.22e-02

Pos logit contributions
 Jewel+0.003
Em+0.003
enny+0.003
 Emmy+0.003
 she+0.003
Neg logit contributions
ans-0.003
 hon-0.003
cks-0.003
 ge-0.003
ep-0.002
Token our
Token ID674
Feature activation+9.79e-03

Pos logit contributions
 hon+0.042
ans+0.040
 ge+0.039
cker+0.039
 paddle+0.038
Neg logit contributions
 Jewel-0.033
 she-0.031
 Emmy-0.031
 Penny-0.031
 Charlotte-0.031
Token there
Token ID612
Feature activation-5.98e-03

Pos logit contributions
 hon+0.004
ans+0.003
cks+0.003
 ge+0.003
ep+0.003
Neg logit contributions
enny-0.003
 Jewel-0.003
Em-0.003
pillar-0.003
oph-0.003
Token was
Token ID373
Feature activation+7.54e-03

Pos logit contributions
ans+0.013
 hon+0.013
cks+0.012
 steer+0.011
 ge+0.011
Neg logit contributions
enny-0.008
 Jewel-0.008
Em-0.008
pillar-0.007
Well-0.007
Token a
Token ID257
Feature activation+8.84e-03

Pos logit contributions
enny+0.013
 Jewel+0.013
Em+0.013
pillar+0.013
Well+0.012
Neg logit contributions
 hon-0.013
ans-0.012
cks-0.012
 ge-0.011
ep-0.011
Token little
Token ID1310
Feature activation+1.43e-02

Pos logit contributions
Em+0.013
 Jewel+0.013
enny+0.013
pillar+0.012
Of+0.012
Neg logit contributions
ans-0.017
cks-0.017
 hon-0.017
 steer-0.015
ep-0.015
Token girl
Token ID2576
Feature activation+1.53e-02

Pos logit contributions
enny+0.021
 Jewel+0.021
Em+0.021
pillar+0.020
Well+0.019
Neg logit contributions
 hon-0.028
ans-0.027
cks-0.026
 ge-0.025
ep-0.025
Token named
Token ID3706
Feature activation+3.38e-02

Pos logit contributions
enny+0.028
 Jewel+0.028
Em+0.028
pillar+0.027
Well+0.026
Neg logit contributions
 hon-0.024
ans-0.023
cks-0.022
 ge-0.021
 paddle-0.021
Token Lily
Token ID20037
Feature activation+2.19e-02

Pos logit contributions
pillar+0.075
ths+0.073
Well+0.072
stone+0.070
Really+0.070
Neg logit contributions
 their-0.036
 Jen-0.033
 hon-0.032
 ge-0.030
ans-0.030
Token.
Token ID13
Feature activation+4.42e-02

Pos logit contributions
Em+0.040
enny+0.039
 she+0.039
 Jewel+0.038
pillar+0.038
Neg logit contributions
ans-0.036
 hon-0.035
 ge-0.032
 paddle-0.032
cks-0.032
Token She
Token ID1375
Feature activation+2.58e-02

Pos logit contributions
stone+0.062
ths+0.061
pillar+0.058
ience+0.056
innie+0.052
Neg logit contributions
 hon-0.083
 their-0.076
 snowball-0.074
 polar-0.073
ans-0.073
Token loved
Token ID6151
Feature activation+3.59e-02

Pos logit contributions
enny+0.047
 Jewel+0.047
pillar+0.045
Em+0.045
Hmm+0.044
Neg logit contributions
 hon-0.043
ans-0.038
ep-0.037
cks-0.036
 ge-0.035
Token to
Token ID284
Feature activation+2.32e-02

Pos logit contributions
enny+0.065
pillar+0.062
 Jewel+0.062
Em+0.062
Hmm+0.060
Neg logit contributions
 hon-0.063
ans-0.057
 ge-0.057
cks-0.052
ep-0.052
Token The
Token ID383
Feature activation-1.99e-03

Pos logit contributions
enny+0.004
 Jewel+0.004
Em+0.004
pillar+0.004
Well+0.004
Neg logit contributions
 hon-0.004
ans-0.003
cks-0.003
 ge-0.003
 steer-0.003
Token little
Token ID1310
Feature activation+4.06e-03

Pos logit contributions
 hon+0.004
ans+0.003
 ge+0.003
ep+0.003
cks+0.003
Neg logit contributions
enny-0.003
 Jewel-0.003
Em-0.003
pillar-0.003
 cartridge-0.003
Token girl
Token ID2576
Feature activation-4.28e-03

Pos logit contributions
enny+0.006
Em+0.006
 Jewel+0.005
innie+0.005
pillar+0.005
Neg logit contributions
 hon-0.008
 ge-0.008
ans-0.008
 paddle-0.008
 steer-0.007
Token smiled
Token ID13541
Feature activation+1.39e-02

Pos logit contributions
ans+0.008
 hon+0.008
cks+0.008
 ge+0.007
 steer+0.007
Neg logit contributions
enny-0.007
Em-0.007
 Jewel-0.006
pillar-0.006
Well-0.006
Token,
Token ID11
Feature activation+9.29e-03

Pos logit contributions
enny+0.026
 Jewel+0.026
Em+0.026
pillar+0.025
Of+0.024
Neg logit contributions
 hon-0.021
ans-0.021
cks-0.020
 ge-0.019
ep-0.018
Token laughed
Token ID13818
Feature activation+1.61e-02

Pos logit contributions
Em+0.016
 Jewel+0.016
enny+0.016
Of+0.016
 she+0.016
Neg logit contributions
ans-0.015
 hon-0.015
cks-0.015
 ge-0.013
 paddle-0.013
Token,
Token ID11
Feature activation+8.21e-03

Pos logit contributions
enny+0.030
 Jewel+0.030
Em+0.029
pillar+0.029
Well+0.028
Neg logit contributions
 hon-0.025
ans-0.024
cks-0.023
 ge-0.022
ep-0.022
Token and
Token ID290
Feature activation+1.62e-02

Pos logit contributions
Em+0.014
 she+0.014
Of+0.014
 Jewel+0.014
enny+0.014
Neg logit contributions
ans-0.014
 hon-0.013
cks-0.013
 ge-0.012
 paddle-0.012
Token kept
Token ID4030
Feature activation+2.31e-02

Pos logit contributions
enny+0.024
 Jewel+0.024
Em+0.024
pillar+0.023
innie+0.022
Neg logit contributions
 hon-0.032
ans-0.030
cks-0.028
 ge-0.028
 steer-0.027
Token playing
Token ID2712
Feature activation+2.55e-02

Pos logit contributions
 Jewel+0.034
Em+0.033
enny+0.033
pillar+0.031
 Charlotte+0.031
Neg logit contributions
 hon-0.045
ans-0.044
cks-0.041
 ge-0.041
 steer-0.039
Token.
Token ID13
Feature activation+2.73e-02

Pos logit contributions
 Jewel+0.045
Em+0.044
enny+0.044
pillar+0.043
Ruby+0.041
Neg logit contributions
 hon-0.042
ans-0.042
cks-0.040
 ge-0.039
 steer-0.037
Token she
Token ID673
Feature activation+1.43e-02

Pos logit contributions
pillar+0.045
enny+0.043
Em+0.041
Charl+0.040
innie+0.040
Neg logit contributions
 hon-0.059
 their-0.054
 ge-0.052
 paddle-0.050
 sled-0.050
Token went
Token ID1816
Feature activation+3.47e-02

Pos logit contributions
enny+0.022
 Jewel+0.022
Em+0.022
pillar+0.021
Well+0.020
Neg logit contributions
 hon-0.027
ans-0.026
cks-0.025
 ge-0.024
 steer-0.023
Token to
Token ID284
Feature activation+3.52e-03

Pos logit contributions
enny+0.055
 Jewel+0.055
Em+0.055
pillar+0.054
Well+0.051
Neg logit contributions
 hon-0.064
ans-0.061
cks-0.058
 ge-0.056
 steer-0.056
Token the
Token ID262
Feature activation+6.04e-03

Pos logit contributions
 Jewel+0.007
Em+0.006
enny+0.006
 Grace+0.006
 Charlotte+0.006
Neg logit contributions
ans-0.006
 hon-0.005
cks-0.005
ep-0.005
 ge-0.005
Token stage
Token ID3800
Feature activation+1.08e-02

Pos logit contributions
enny+0.011
Em+0.011
 Jewel+0.011
pillar+0.011
innie+0.010
Neg logit contributions
 hon-0.010
ans-0.009
 ge-0.008
cks-0.008
ep-0.008
Token.
Token ID13
Feature activation+2.11e-02

Pos logit contributions
enny+0.022
 Jewel+0.022
Em+0.022
pillar+0.022
 she+0.021
Neg logit contributions
 hon-0.014
ans-0.014
cks-0.013
 ge-0.012
 paddle-0.012
Token She
Token ID1375
Feature activation+1.37e-02

Pos logit contributions
ths+0.036
stone+0.034
pillar+0.034
ience+0.033
brate+0.032
Neg logit contributions
 hon-0.035
 their-0.031
 Jen-0.031
 paddle-0.029
ans-0.029
Token saw
Token ID2497
Feature activation+1.25e-02

Pos logit contributions
 Jewel+0.022
enny+0.022
Em+0.022
pillar+0.021
Well+0.020
Neg logit contributions
 hon-0.025
ans-0.024
cks-0.022
 ge-0.022
ep-0.021
Token a
Token ID257
Feature activation+8.92e-03

Pos logit contributions
 Jewel+0.021
enny+0.020
Em+0.020
pillar+0.019
 she+0.019
Neg logit contributions
 hon-0.022
ans-0.022
cks-0.020
 steer-0.019
 paddle-0.019
Token man
Token ID582
Feature activation-2.53e-02

Pos logit contributions
 Jewel+0.015
Em+0.015
pillar+0.015
enny+0.015
 she+0.014
Neg logit contributions
ans-0.015
cks-0.015
 hon-0.014
 steer-0.013
cker-0.013
Token.
Token ID13
Feature activation-2.09e-02

Pos logit contributions
ans+0.050
 ge+0.045
 hon+0.045
cks+0.044
ull+0.042
Neg logit contributions
 she-0.037
 Jewel-0.035
enny-0.035
pillar-0.034
Em-0.034
Token will
Token ID481
Feature activation+5.32e-03

Pos logit contributions
 hon+0.021
ans+0.020
cks+0.019
 ge+0.018
 steer+0.018
Neg logit contributions
 Jewel-0.019
enny-0.019
Em-0.019
pillar-0.019
Well-0.018
Token be
Token ID307
Feature activation-9.36e-03

Pos logit contributions
 Jewel+0.010
Em+0.009
enny+0.009
pillar+0.009
 she+0.009
Neg logit contributions
ans-0.008
 hon-0.008
cks-0.008
 ge-0.008
ep-0.007
Token mad
Token ID8805
Feature activation-1.12e-02

Pos logit contributions
ans+0.016
 hon+0.015
ull+0.015
 paddle+0.015
cker+0.014
Neg logit contributions
 Jewel-0.015
Em-0.014
 she-0.014
 cartridge-0.014
 Charlotte-0.014
Token if
Token ID611
Feature activation+1.13e-02

Pos logit contributions
 hon+0.019
ans+0.018
cks+0.017
 ge+0.016
ep+0.016
Neg logit contributions
 Jewel-0.020
enny-0.020
Em-0.020
pillar-0.019
Hmm-0.018
Token you
Token ID345
Feature activation-2.18e-02

Pos logit contributions
enny+0.013
Em+0.013
 Jewel+0.013
pillar+0.012
Hmm+0.011
Neg logit contributions
 hon-0.026
ans-0.024
 ge-0.024
cks-0.023
 steer-0.023
Token break
Token ID2270
Feature activation-1.41e-02

Pos logit contributions
 hon+0.038
ans+0.037
cks+0.037
ep+0.034
 ge+0.034
Neg logit contributions
enny-0.035
 Jewel-0.034
pillar-0.034
Em-0.033
Of-0.032
Token the
Token ID262
Feature activation+9.46e-03

Pos logit contributions
 hon+0.028
ans+0.027
 ge+0.025
cks+0.025
 paddle+0.025
Neg logit contributions
 Jewel-0.020
enny-0.019
Em-0.019
pillar-0.018
 she-0.018
Token h
Token ID289
Feature activation+2.79e-02

Pos logit contributions
enny+0.016
Em+0.016
 Jewel+0.015
pillar+0.015
Well+0.015
Neg logit contributions
 hon-0.017
ans-0.016
cks-0.015
 ge-0.015
ep-0.015
Tokenangers
Token ID6606
Feature activation-3.76e-02

Pos logit contributions
Em+0.036
 Jewel+0.035
Hmm+0.033
Really+0.033
Of+0.033
Neg logit contributions
 hon-0.060
ans-0.060
ep-0.056
cks-0.056
 sled-0.053
Token."
Token ID526
Feature activation-1.85e-02

Pos logit contributions
 hon+0.058
ans+0.055
cks+0.053
 ge+0.050
 polar+0.049
Neg logit contributions
 Jewel-0.070
enny-0.069
Em-0.067
pillar-0.067
 she-0.066
Token
Token ID198
Feature activation-1.68e-02

Pos logit contributions
 hon+0.032
ans+0.030
cks+0.029
ep+0.028
 ge+0.027
Neg logit contributions
enny-0.032
 Jewel-0.032
Em-0.032
pillar-0.031
Well-0.030
Token the
Token ID262
Feature activation-2.39e-02

Pos logit contributions
 hon+0.018
ans+0.018
cks+0.017
ep+0.016
cker+0.016
Neg logit contributions
 she-0.017
 Jewel-0.017
 Grace-0.016
 Emmy-0.016
 Penny-0.016
Token boat
Token ID8848
Feature activation-2.15e-02

Pos logit contributions
cks+0.040
 hon+0.035
ans+0.033
cker+0.032
 seventeen+0.031
Neg logit contributions
 she-0.037
 diary-0.036
 cartridge-0.036
pillar-0.035
Well-0.035
Token so
Token ID523
Feature activation-1.28e-02

Pos logit contributions
ans+0.040
cks+0.039
 hon+0.037
 ge+0.037
ep+0.036
Neg logit contributions
 she-0.033
 Jewel-0.030
stone-0.029
 Grace-0.027
pillar-0.027
Token much
Token ID881
Feature activation-6.36e-03

Pos logit contributions
ans+0.026
 hon+0.026
cks+0.025
 ge+0.024
ep+0.024
Neg logit contributions
enny-0.017
 Jewel-0.017
 she-0.017
Em-0.016
pillar-0.016
Token that
Token ID326
Feature activation+2.59e-03

Pos logit contributions
 hon+0.011
ans+0.011
cks+0.011
ep+0.010
 ge+0.010
Neg logit contributions
 Jewel-0.010
enny-0.010
 she-0.010
Em-0.010
pillar-0.010
Token she
Token ID673
Feature activation-1.27e-02

Pos logit contributions
 Jewel+0.003
enny+0.003
Em+0.003
pillar+0.003
Well+0.003
Neg logit contributions
 hon-0.006
ans-0.006
cks-0.005
 ge-0.005
ep-0.005
Token ran
Token ID4966
Feature activation+1.10e-02

Pos logit contributions
 hon+0.026
ans+0.025
cks+0.024
 ge+0.024
 steer+0.023
Neg logit contributions
enny-0.017
 Jewel-0.017
Em-0.017
pillar-0.017
Well-0.016
Token away
Token ID1497
Feature activation-6.09e-03

Pos logit contributions
 Jewel+0.016
enny+0.016
Em+0.015
 she+0.015
pillar+0.015
Neg logit contributions
ans-0.021
 hon-0.021
cks-0.020
 ge-0.019
ep-0.019
Token from
Token ID422
Feature activation-1.95e-02

Pos logit contributions
ans+0.011
cks+0.010
 hon+0.010
ep+0.009
 ge+0.009
Neg logit contributions
 Jewel-0.010
 she-0.010
enny-0.010
Em-0.010
 Grace-0.009
Token him
Token ID683
Feature activation+1.26e-03

Pos logit contributions
ans+0.034
 hon+0.033
cks+0.031
ep+0.031
cker+0.029
Neg logit contributions
 Jewel-0.032
 she-0.032
 Grace-0.030
 Penny-0.030
 Emmy-0.030
Token and
Token ID290
Feature activation+3.08e-04

Pos logit contributions
 Jewel+0.002
enny+0.002
pillar+0.002
 she+0.002
stone+0.002
Neg logit contributions
 hon-0.002
ans-0.002
cks-0.002
ep-0.002
 ge-0.002
Token home
Token ID1363
Feature activation+1.21e-02

Pos logit contributions
enny+0.016
 Jewel+0.016
Em+0.016
pillar+0.015
 she+0.015
Neg logit contributions
 hon-0.019
ans-0.018
cks-0.017
ep-0.016
 ge-0.016
Token,
Token ID11
Feature activation-1.19e-02

Pos logit contributions
 she+0.019
enny+0.018
 Emmy+0.017
 Jewel+0.017
 Grace+0.017
Neg logit contributions
ans-0.024
cks-0.024
 hon-0.023
ep-0.022
 ge-0.022
Token John
Token ID1757
Feature activation+1.58e-02

Pos logit contributions
ans+0.024
 hon+0.023
cks+0.023
ep+0.021
 ge+0.021
Neg logit contributions
 Jewel-0.017
enny-0.017
Em-0.017
 she-0.017
 Emmy-0.016
Token was
Token ID373
Feature activation-7.13e-03

Pos logit contributions
enny+0.021
Em+0.021
pillar+0.020
 Jewel+0.020
 she+0.019
Neg logit contributions
ans-0.032
cks-0.032
 hon-0.031
 ge-0.029
 steer-0.029
Token angry
Token ID7954
Feature activation-7.20e-03

Pos logit contributions
ans+0.016
cks+0.014
 steer+0.014
 hon+0.014
cker+0.014
Neg logit contributions
 she-0.009
Em-0.009
 Emmy-0.008
 Charlotte-0.008
 Grace-0.008
Token.
Token ID13
Feature activation-1.88e-02

Pos logit contributions
 hon+0.010
ans+0.010
cks+0.009
 ge+0.009
ep+0.009
Neg logit contributions
enny-0.014
 Jewel-0.014
Em-0.014
pillar-0.014
Well-0.013
Token He
Token ID679
Feature activation-7.33e-03

Pos logit contributions
 hon+0.028
ans+0.026
cks+0.024
 ge+0.024
 steer+0.023
Neg logit contributions
 Jewel-0.036
enny-0.036
Em-0.036
pillar-0.035
stone-0.034
Token had
Token ID550
Feature activation-1.13e-02

Pos logit contributions
 hon+0.014
ans+0.013
cks+0.012
 ge+0.012
ep+0.012
Neg logit contributions
enny-0.012
 Jewel-0.011
Em-0.011
pillar-0.011
Hmm-0.010
Token realized
Token ID6939
Feature activation-1.31e-04

Pos logit contributions
ans+0.023
 hon+0.022
cks+0.022
 paddle+0.020
 steer+0.020
Neg logit contributions
Em-0.015
enny-0.015
 Jewel-0.015
pillar-0.014
Well-0.014
Token that
Token ID326
Feature activation+5.14e-04

Pos logit contributions
 hon+0.000
ans+0.000
cks+0.000
 ge+0.000
 steer+0.000
Neg logit contributions
enny-0.000
 Jewel-0.000
Em-0.000
 she-0.000
pillar-0.000
Token Sam
Token ID3409
Feature activation+6.33e-03

Pos logit contributions
 Jewel+0.001
 she+0.001
Em+0.001
enny+0.001
pillar+0.001
Neg logit contributions
ans-0.001
 hon-0.001
cks-0.001
ep-0.001
 paddle-0.001
Token were
Token ID547
Feature activation-2.09e-02

Pos logit contributions
ans+0.059
 hon+0.057
cks+0.056
 ge+0.053
ep+0.051
Neg logit contributions
 Jewel-0.047
enny-0.047
Em-0.047
pillar-0.045
Well-0.044
Token so
Token ID523
Feature activation-2.43e-02

Pos logit contributions
ans+0.048
 steer+0.044
cker+0.043
 paddle+0.042
 ge+0.041
Neg logit contributions
 Jewel-0.023
 she-0.022
 Grace-0.022
 Charlotte-0.022
 May-0.022
Token excited
Token ID6568
Feature activation+3.63e-03

Pos logit contributions
ans+0.048
 hon+0.042
 sled+0.042
cks+0.041
 snowball+0.041
Neg logit contributions
 she-0.036
Well-0.033
 Jewel-0.033
Em-0.032
 Grace-0.032
Token that
Token ID326
Feature activation+2.44e-03

Pos logit contributions
enny+0.006
 Jewel+0.006
Em+0.006
pillar+0.006
Well+0.006
Neg logit contributions
 hon-0.006
ans-0.006
cks-0.006
 ge-0.005
ep-0.005
Token they
Token ID484
Feature activation-1.04e-02

Pos logit contributions
 she+0.004
 Jewel+0.003
Em+0.003
 Grace+0.003
 Charlotte+0.003
Neg logit contributions
ans-0.005
 hon-0.005
cks-0.005
ep-0.004
 steer-0.004
Token all
Token ID477
Feature activation-1.57e-02

Pos logit contributions
 hon+0.021
ans+0.020
cks+0.019
 ge+0.019
ep+0.018
Neg logit contributions
enny-0.015
 Jewel-0.015
Em-0.015
pillar-0.014
Well-0.014
Token joined
Token ID5399
Feature activation-1.26e-02

Pos logit contributions
ans+0.031
 hon+0.031
cks+0.030
 ge+0.028
 steer+0.028
Neg logit contributions
Em-0.022
 Jewel-0.021
enny-0.021
pillar-0.021
Well-0.020
Token hands
Token ID2832
Feature activation-8.80e-03

Pos logit contributions
 hon+0.021
ans+0.020
cks+0.020
 ge+0.019
ep+0.018
Neg logit contributions
 Jewel-0.022
enny-0.022
Em-0.022
pillar-0.021
Well-0.020
Token and
Token ID290
Feature activation-8.54e-03

Pos logit contributions
cks+0.016
ans+0.016
 hon+0.016
ep+0.014
 ge+0.014
Neg logit contributions
 she-0.014
enny-0.014
pillar-0.013
Em-0.013
 Grace-0.013
Token wished
Token ID16555
Feature activation+8.85e-03

Pos logit contributions
ans+0.016
cks+0.016
 hon+0.016
ep+0.015
 ge+0.014
Neg logit contributions
 Jewel-0.012
 she-0.012
Em-0.012
enny-0.012
 Grace-0.011
Token as
Token ID355
Feature activation+2.45e-03

Pos logit contributions
 Jewel+0.013
enny+0.013
Em+0.013
 she+0.013
 Grace+0.012
Neg logit contributions
ans-0.017
 hon-0.017
cks-0.016
ep-0.015
 ge-0.015
Token the
Token ID262
Feature activation+3.37e-03

Pos logit contributions
ans+0.023
 hon+0.023
cks+0.022
ep+0.021
 ge+0.020
Neg logit contributions
 Jewel-0.022
 she-0.022
 Penny-0.021
enny-0.021
 Grace-0.021
Token bear
Token ID6842
Feature activation-4.37e-03

Pos logit contributions
enny+0.006
Em+0.006
 Jewel+0.006
pillar+0.005
Well+0.005
Neg logit contributions
 hon-0.006
ans-0.005
 ge-0.005
 polar-0.005
 paddle-0.005
Token and
Token ID290
Feature activation-2.04e-03

Pos logit contributions
 hon+0.008
cks+0.007
ans+0.007
ep+0.007
 ge+0.007
Neg logit contributions
enny-0.007
 Jewel-0.007
 she-0.007
pillar-0.007
stone-0.007
Token the
Token ID262
Feature activation-3.80e-03

Pos logit contributions
ans+0.004
 hon+0.004
cks+0.004
ep+0.003
 ge+0.003
Neg logit contributions
 Jewel-0.003
enny-0.003
Em-0.003
 she-0.003
pillar-0.003
Token bear
Token ID6842
Feature activation-1.47e-02

Pos logit contributions
 hon+0.006
ans+0.006
cks+0.006
 ge+0.006
 paddle+0.005
Neg logit contributions
enny-0.007
Em-0.007
 Jewel-0.007
pillar-0.006
Well-0.006
Token grate
Token ID48612
Feature activation-2.56e-02

Pos logit contributions
cks+0.029
 hon+0.029
ans+0.028
ep+0.026
 ge+0.025
Neg logit contributions
 she-0.021
enny-0.020
 Jewel-0.019
stone-0.019
pillar-0.019
Tokenfully
Token ID2759
Feature activation-1.05e-02

Pos logit contributions
 hon+0.071
ans+0.070
cks+0.068
 ge+0.067
 snowball+0.067
Neg logit contributions
enny-0.015
 Jewel-0.013
 she-0.012
 Grace-0.012
illy-0.012
Token accepted
Token ID6292
Feature activation-1.92e-02

Pos logit contributions
ans+0.020
 hon+0.020
cks+0.020
ep+0.019
 ge+0.018
Neg logit contributions
 Jewel-0.014
enny-0.014
Em-0.014
 she-0.014
stone-0.014
Token.
Token ID13
Feature activation-1.69e-02

Pos logit contributions
ans+0.032
 hon+0.032
cks+0.030
ep+0.030
cker+0.029
Neg logit contributions
 she-0.033
 Jewel-0.033
enny-0.031
 Penny-0.031
 Grace-0.030
Token
Token ID198
Feature activation-9.39e-03

Pos logit contributions
 hon+0.033
ans+0.033
cks+0.033
ep+0.032
cker+0.030
Neg logit contributions
 Jewel-0.025
enny-0.025
Well-0.024
Em-0.024
 However-0.024
Token
Token ID198
Feature activation+1.29e-02

Pos logit contributions
ans+0.020
 hon+0.020
cker+0.019
cks+0.019
 paddle+0.018
Neg logit contributions
Em-0.013
enny-0.013
Well-0.013
Sure-0.012
pillar-0.012
Token?"
Token ID1701
Feature activation-5.06e-02

Pos logit contributions
 hon+0.003
ans+0.003
cks+0.003
 ge+0.003
 paddle+0.003
Neg logit contributions
enny-0.003
 Jewel-0.003
Em-0.002
pillar-0.002
ometime-0.002
Token
Token ID198
Feature activation-2.76e-02

Pos logit contributions
ans+0.105
cks+0.099
 hon+0.098
 afloat+0.097
ep+0.097
Neg logit contributions
 she-0.080
 Jewel-0.072
Em-0.071
enny-0.070
 Grace-0.069
Token
Token ID198
Feature activation-5.33e-04

Pos logit contributions
ans+0.063
 hon+0.060
 afloat+0.060
 paddle+0.059
cks+0.057
Neg logit contributions
Em-0.037
enny-0.035
 she-0.034
Ruby-0.034
Sure-0.032
TokenTom
Token ID13787
Feature activation-2.79e-02

Pos logit contributions
ans+0.001
 hon+0.001
cks+0.001
 afloat+0.001
 paddle+0.001
Neg logit contributions
Em-0.001
enny-0.001
Ruby-0.001
 Jewel-0.001
pillar-0.001
Token and
Token ID290
Feature activation-1.39e-02

Pos logit contributions
ans+0.048
 hon+0.047
 afloat+0.046
cks+0.045
 ge+0.045
Neg logit contributions
 Jewel-0.044
pillar-0.044
enny-0.044
Em-0.042
ometime-0.042
Token Mia
Token ID32189
Feature activation-6.54e-02

Pos logit contributions
ans+0.029
cks+0.028
 hon+0.027
ep+0.026
 afloat+0.026
Neg logit contributions
 Jewel-0.019
Em-0.019
enny-0.018
 she-0.018
Of-0.018
Token shook
Token ID14682
Feature activation-5.06e-02

Pos logit contributions
ans+0.110
 hon+0.108
cks+0.106
ep+0.100
 polar+0.098
Neg logit contributions
 Jewel-0.107
pillar-0.103
enny-0.102
Sure-0.102
 she-0.101
Token their
Token ID511
Feature activation-3.23e-02

Pos logit contributions
ans+0.077
 afloat+0.062
ipples+0.062
orses+0.062
 ge+0.061
Neg logit contributions
 she-0.095
 Jewel-0.092
 Grace-0.090
Hi-0.090
 Emma-0.089
Token heads
Token ID6665
Feature activation-5.24e-02

Pos logit contributions
ull+0.057
 afloat+0.056
cks+0.054
 hon+0.053
ans+0.051
Neg logit contributions
 diary-0.044
 Jewel-0.044
Sure-0.040
Ruby-0.040
Really-0.040
Token.
Token ID13
Feature activation-4.82e-02

Pos logit contributions
ans+0.093
cks+0.089
 ge+0.085
 hon+0.082
 polar+0.081
Neg logit contributions
 she-0.085
 Jewel-0.077
enny-0.076
stone-0.075
Sure-0.075
Token They
Token ID1119
Feature activation-5.01e-02

Pos logit contributions
ans+0.095
cks+0.091
 afloat+0.088
ep+0.087
 hon+0.086
Neg logit contributions
 Jewel-0.071
 However-0.069
Well-0.069
 she-0.069
Really-0.069
Token sorry
Token ID7926
Feature activation-1.06e-02

Pos logit contributions
ans+0.056
 hon+0.054
 steer+0.053
 paddle+0.052
cker+0.051
Neg logit contributions
 Jewel-0.041
 cartridge-0.038
Em-0.038
 Emmy-0.037
 she-0.037
Token,
Token ID11
Feature activation-2.53e-02

Pos logit contributions
 hon+0.018
ans+0.018
cks+0.018
 ge+0.016
cker+0.016
Neg logit contributions
enny-0.018
 Jewel-0.018
 she-0.017
pillar-0.017
Em-0.017
Token we
Token ID356
Feature activation-2.74e-02

Pos logit contributions
cks+0.048
ans+0.048
 afloat+0.042
 hon+0.042
cker+0.042
Neg logit contributions
 Jewel-0.038
 she-0.038
Of-0.037
Sure-0.036
pillar-0.035
Token are
Token ID389
Feature activation-3.39e-02

Pos logit contributions
 hon+0.054
ans+0.054
cks+0.052
 ge+0.049
 steer+0.049
Neg logit contributions
enny-0.038
 Jewel-0.037
Em-0.037
pillar-0.037
Well-0.034
Token sorry
Token ID7926
Feature activation-1.67e-02

Pos logit contributions
ans+0.065
 steer+0.064
 paddle+0.063
 hon+0.062
 sled+0.059
Neg logit contributions
 Jewel-0.046
 cartridge-0.044
Em-0.043
 she-0.043
 Emma-0.042
Token!"
Token ID2474
Feature activation-6.79e-02

Pos logit contributions
ans+0.029
 hon+0.028
cks+0.028
cker+0.026
 paddle+0.026
Neg logit contributions
enny-0.027
 she-0.027
 Jewel-0.026
pillar-0.026
Em-0.025
Token
Token ID198
Feature activation-4.48e-02

Pos logit contributions
cks+0.137
ans+0.136
ep+0.133
 hon+0.128
 afloat+0.127
Neg logit contributions
 she-0.095
Well-0.093
 However-0.093
Sure-0.090
Of-0.090
Token
Token ID198
Feature activation-2.44e-02

Pos logit contributions
ans+0.110
cker+0.102
 paddle+0.099
 hon+0.099
cks+0.099
Neg logit contributions
Sure-0.046
Em-0.044
Well-0.044
Of-0.041
 she-0.041
TokenThe
Token ID464
Feature activation-3.67e-02

Pos logit contributions
ans+0.051
cks+0.047
 hon+0.046
 paddle+0.045
 afloat+0.045
Neg logit contributions
Em-0.033
Sure-0.033
Well-0.031
Hmm-0.031
Of-0.031
Token angry
Token ID7954
Feature activation-3.70e-02

Pos logit contributions
cks+0.065
ans+0.062
 hon+0.059
 steer+0.057
cker+0.055
Neg logit contributions
Em-0.059
 Jewel-0.059
enny-0.057
pillar-0.057
Well-0.057
Token man
Token ID582
Feature activation-5.98e-02

Pos logit contributions
cks+0.076
 afloat+0.067
 steer+0.064
 hon+0.062
ans+0.061
Neg logit contributions
 exam-0.048
enny-0.047
ly-0.047
Hi-0.046
olly-0.045
Token!
Token ID0
Feature activation-1.08e-02

Pos logit contributions
 hon+0.063
ans+0.060
cks+0.059
 steer+0.059
 paddle+0.058
Neg logit contributions
pillar-0.049
ha-0.045
stone-0.043
enny-0.043
 she-0.042
Token O
Token ID440
Feature activation-1.13e-03

Pos logit contributions
 hon+0.026
ans+0.026
cks+0.025
ep+0.024
cker+0.024
Neg logit contributions
enny-0.012
 Jewel-0.011
pillar-0.011
 Well-0.010
Em-0.010
Tokenink
Token ID676
Feature activation-3.32e-02

Pos logit contributions
 hon+0.002
ans+0.002
 steer+0.002
 paddle+0.002
cks+0.002
Neg logit contributions
pillar-0.001
enny-0.001
 Jewel-0.001
stone-0.001
Em-0.001
Token!"
Token ID2474
Feature activation-2.65e-02

Pos logit contributions
 hon+0.064
ans+0.061
cks+0.060
 steer+0.059
 paddle+0.059
Neg logit contributions
pillar-0.048
ha-0.044
enny-0.042
stone-0.042
 she-0.041
Token The
Token ID383
Feature activation-1.94e-02

Pos logit contributions
 hon+0.047
ans+0.043
cks+0.041
 ge+0.041
 steer+0.039
Neg logit contributions
enny-0.045
 Jewel-0.044
Em-0.044
pillar-0.043
stone-0.042
Token pig
Token ID12967
Feature activation-6.50e-02

Pos logit contributions
cks+0.035
ans+0.034
 hon+0.032
ull+0.031
 steer+0.031
Neg logit contributions
 Jewel-0.029
 cartridge-0.029
 she-0.029
pillar-0.027
 diary-0.027
Token said
Token ID531
Feature activation-2.62e-02

Pos logit contributions
opl+0.157
cks+0.156
ans+0.155
ric+0.152
 afloat+0.152
Neg logit contributions
 she-0.052
stone-0.045
als-0.042
Sure-0.040
enny-0.039
Token.
Token ID13
Feature activation-2.95e-02

Pos logit contributions
ans+0.047
 hon+0.042
cks+0.042
 ge+0.041
ep+0.041
Neg logit contributions
 she-0.047
 Jewel-0.045
enny-0.045
Em-0.043
 Grace-0.042
Token It
Token ID632
Feature activation-2.97e-02

Pos logit contributions
 hon+0.056
ans+0.055
cks+0.052
 ge+0.050
ep+0.049
Neg logit contributions
 Jewel-0.046
enny-0.045
Em-0.045
pillar-0.044
Well-0.042
Token shook
Token ID14682
Feature activation-2.48e-02

Pos logit contributions
ans+0.064
cks+0.058
 hon+0.057
 afloat+0.055
 steer+0.054
Neg logit contributions
 Jewel-0.036
 she-0.035
Em-0.035
Ruby-0.035
Well-0.035
Token its
Token ID663
Feature activation-2.45e-02

Pos logit contributions
ans+0.050
 hon+0.044
cks+0.042
cker+0.042
 ge+0.041
Neg logit contributions
 she-0.038
 Jewel-0.036
Really-0.033
 Grace-0.033
Sure-0.032
Token.
Token ID13
Feature activation-4.70e-02

Pos logit contributions
ans+0.090
 ge+0.086
ep+0.080
cks+0.079
 hon+0.079
Neg logit contributions
 she-0.074
enny-0.066
 Jewel-0.063
ly-0.062
 Well-0.059
Token
Token ID198
Feature activation-3.42e-02

Pos logit contributions
ans+0.104
 hon+0.094
cks+0.093
 afloat+0.092
ep+0.090
Neg logit contributions
 she-0.061
 Jewel-0.059
pillar-0.058
 Well-0.058
Well-0.058
Token
Token ID198
Feature activation-7.80e-03

Pos logit contributions
ans+0.077
 hon+0.076
 paddle+0.074
 ge+0.072
 afloat+0.072
Neg logit contributions
enny-0.041
Well-0.037
Em-0.037
Ruby-0.037
Sure-0.036
TokenBen
Token ID11696
Feature activation-3.62e-02

Pos logit contributions
 hon+0.017
ans+0.017
 paddle+0.016
cks+0.016
 afloat+0.016
Neg logit contributions
enny-0.010
Em-0.009
Ruby-0.009
Hmm-0.009
Well-0.009
Token and
Token ID290
Feature activation-2.18e-03

Pos logit contributions
ans+0.055
 hon+0.055
cks+0.053
 ge+0.050
 steer+0.049
Neg logit contributions
pillar-0.065
 Jewel-0.065
enny-0.063
Em-0.061
Well-0.061
Token Mia
Token ID32189
Feature activation-6.68e-02

Pos logit contributions
 hon+0.004
ans+0.004
cks+0.003
ep+0.003
 ge+0.003
Neg logit contributions
 Jewel-0.004
enny-0.004
Em-0.004
pillar-0.004
Well-0.004
Token see
Token ID766
Feature activation-2.27e-02

Pos logit contributions
ans+0.108
cks+0.103
 hon+0.102
ep+0.097
 polar+0.096
Neg logit contributions
 Jewel-0.114
 she-0.111
pillar-0.110
Sure-0.109
enny-0.109
Token the
Token ID262
Feature activation-2.07e-02

Pos logit contributions
ans+0.042
ep+0.040
 hon+0.040
cker+0.039
 steer+0.039
Neg logit contributions
 she-0.035
 diary-0.030
 Penny-0.030
 Well-0.030
 Jewel-0.030
Token dragon
Token ID10441
Feature activation-3.24e-02

Pos logit contributions
ans+0.038
cks+0.037
 hon+0.036
 steer+0.034
 ge+0.033
Neg logit contributions
 diary-0.031
pillar-0.030
 Jewel-0.029
 cartridge-0.028
Em-0.028
Token's
Token ID338
Feature activation-2.48e-02

Pos logit contributions
ans+0.055
 hon+0.051
 ge+0.049
 paddle+0.046
ep+0.046
Neg logit contributions
 she-0.054
 diary-0.051
pillar-0.050
stone-0.050
 Jewel-0.048
Token face
Token ID1986
Feature activation-2.43e-02

Pos logit contributions
 hon+0.044
ans+0.042
cks+0.041
ep+0.039
 steer+0.039
Neg logit contributions
 diary-0.039
 she-0.032
 folder-0.031
 makeup-0.031
 cartridge-0.030
Token with
Token ID351
Feature activation-2.73e-02

Pos logit contributions
ans+0.054
 hon+0.052
cks+0.049
 ge+0.048
ull+0.047
Neg logit contributions
 Jewel-0.072
enny-0.068
Em-0.068
Of-0.067
 she-0.067
Token cubes
Token ID34896
Feature activation-4.54e-02

Pos logit contributions
 hon+0.046
ans+0.046
cks+0.042
 ge+0.041
ep+0.040
Neg logit contributions
 Jewel-0.046
Em-0.045
enny-0.045
pillar-0.044
 Grace-0.043
Token.
Token ID13
Feature activation-1.97e-02

Pos logit contributions
 hon+0.082
cks+0.077
 paddle+0.077
ans+0.076
 polar+0.074
Neg logit contributions
enny-0.072
 she-0.071
stone-0.067
pillar-0.066
Em-0.063
Token They
Token ID1119
Feature activation-5.86e-02

Pos logit contributions
 hon+0.037
ans+0.036
cks+0.035
ep+0.033
 ge+0.033
Neg logit contributions
 Jewel-0.030
enny-0.030
Em-0.030
pillar-0.029
Well-0.028
Token have
Token ID423
Feature activation-3.45e-02

Pos logit contributions
ans+0.122
cks+0.117
ull+0.108
 hon+0.107
 snowball+0.106
Neg logit contributions
 she-0.074
 Jewel-0.068
Em-0.067
 Penny-0.066
 blush-0.066
Token many
Token ID867
Feature activation-8.06e-02

Pos logit contributions
ans+0.073
cks+0.073
 hon+0.071
cker+0.068
 paddle+0.068
Neg logit contributions
Em-0.041
 she-0.038
 Jewel-0.037
enny-0.037
 cartridge-0.037
Token cubes
Token ID34896
Feature activation-3.99e-02

Pos logit contributions
cks+0.144
 hon+0.133
ans+0.132
 stret+0.132
cker+0.130
Neg logit contributions
 makeup-0.104
 markers-0.098
 weeks-0.097
brate-0.094
ometime-0.094
Token of
Token ID286
Feature activation-7.81e-02

Pos logit contributions
 hon+0.084
 paddle+0.080
cks+0.080
ans+0.079
 polar+0.077
Neg logit contributions
enny-0.056
 she-0.053
stone-0.048
pillar-0.047
Em-0.046
Token different
Token ID1180
Feature activation-7.38e-02

Pos logit contributions
cks+0.140
ans+0.139
 hon+0.136
 afloat+0.133
cker+0.128
Neg logit contributions
 she-0.109
Sure-0.107
Em-0.106
 Jewel-0.106
 Emmy-0.105
Token colors
Token ID7577
Feature activation-3.69e-02

Pos logit contributions
 hon+0.136
 stret+0.133
 paddle+0.128
cks+0.124
 drown+0.122
Neg logit contributions
ometime-0.095
enny-0.094
 makeup-0.092
ly-0.091
brate-0.087
Token and
Token ID290
Feature activation-4.71e-02

Pos logit contributions
 hon+0.065
ans+0.062
 ge+0.059
 paddle+0.059
cks+0.059
Neg logit contributions
 she-0.062
enny-0.058
stone-0.057
pillar-0.056
 Jewel-0.055