TOP ACTIVATIONS
MAX = 0.127

Yes , they are . Magn ets are really strong !"
ised and studied hard -- mem or izing the answers and
day , while hop - ski pping through the garden ,
looked at the egg - du ck and then at each
to share the egg - du ck and try to fit
money couldn 't buy -- mem ories that he would hold
to play with magnets . Magn ets are things that stick
inside and sque aked " aunt 's coming for dinner !'
They chase the car - nos aur around the room ,
" We are the weapon creators ! We can stop the
. " A car - nos aur !" They
... Ben . I ... miss ... you ... very ...
lesson â pat ience pays off ! <|endoftext|>
didn 't know what " nos y " meant , so
, he made some money transactions . At the post office
. Mom helped John re hang the shelf and it was
old man was right -- pat ience was key . He
Larry was a good - nat ured bird and always looked
. The all igator was so upset that he
," she said , " Made this cub by house all

MIN ACTIVATIONS
MIN = -0.117

her sweater * po pped * ! Everyone stopped and looked
a loud * be ep * from the oven . The
at them with surprise and displeasure . " But
work hard ." * wh istle sound <|endoftext|> One
danced along and the audience ( L ily 's mom and
friend . * The End * <|endoftext|> Once
he heard a whistle ! * He was so happy and
* The End * <|endoftext|> Once upon a time
hair tied in two pig tails . " M ummy !"
," the bird said . ( Or , if you think
calling him Tim my the Couch - P usher ! <|endoftext|>
smiled , happy that the crisis was aver ted . The
," Be good , Holly ,. I 'll be back soon
NO , L ILY ! STOP !" Mom shouted .
He says , " O w , ow , ow !
make bra ids and pony tails and curls . You can
asked her mum if the hor was ready yet .
there was a river . River was very wide and long
my said , " E w , Tim my ! That
and had the word " w elcome " written in big

INTERVAL 0.066 - 0.127
CONTAINS 0.057%

â That looks amazing !
said , It â
said , Do not cry ,
. â It â
happened . The books he lf became restless . It was

INTERVAL 0.005 - 0.066
CONTAINS 30.612%

basket . " This is shrimp . It is a small
know what to do . Tim my 's
With the old woman â s wise words ,
said it was made from special plants and woods . She
big and had a smooth floor that was fun to run

INTERVAL -0.056 - 0.005
CONTAINS 69.291%

One day , a 3 year old walked into the room
new friend today . Her name is Emma and she is
They looked at each other . They smiled and hugged .
" Wow , Cat ty is a butterfly !" Ben
was not . It snapped . The tomato fell to the

INTERVAL -0.117 - -0.056
CONTAINS 0.040%

was off on her golf ing adventure with her clean golf
they marched outside with their bro oms and began to clean
Tim my 's house and celebrated with ice cream . The
and scary . It bark ed and grow led and showed
her hand . " O uch !" cried Lily . "
TokenYes
Token ID5297
Feature activation+1.78e-02

Pos logit contributions
w+0.036
z+0.031
inks+0.030
ings+0.030
s+0.029
Neg logit contributions
 agreement-0.035
 unison-0.027
 Princess-0.024
 Prince-0.024
 horror-0.024
Token,
Token ID11
Feature activation+2.00e-03

Pos logit contributions
 agreement+0.043
 unison+0.034
 horror+0.033
 Prince+0.033
 Freddie+0.031
Neg logit contributions
w-0.035
z-0.029
s-0.028
inks-0.027
ing-0.026
Token they
Token ID484
Feature activation+1.02e-02

Pos logit contributions
 agreement+0.004
 unison+0.003
 horror+0.003
 Prince+0.003
 Princess+0.003
Neg logit contributions
w-0.004
z-0.004
inks-0.003
s-0.003
ings-0.003
Token are
Token ID389
Feature activation+8.62e-03

Pos logit contributions
 agreement+0.023
 unison+0.018
 horror+0.017
 relief+0.017
 Freddie+0.017
Neg logit contributions
w-0.021
z-0.019
s-0.017
inks-0.017
ings-0.016
Token.
Token ID13
Feature activation-3.35e-03

Pos logit contributions
 agreement+0.020
 unison+0.015
 horror+0.015
 Prince+0.014
 Freddie+0.014
Neg logit contributions
w-0.018
z-0.015
inks-0.014
s-0.014
ings-0.013
Token Magn
Token ID14872
Feature activation+1.11e-01

Pos logit contributions
w+0.006
z+0.005
s+0.005
inks+0.005
mons+0.005
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 Freddie-0.006
Tokenets
Token ID1039
Feature activation-3.93e-04

Pos logit contributions
 agreement+0.255
 unison+0.198
 horror+0.197
 Prince+0.190
 Freddie+0.183
Neg logit contributions
w-0.236
z-0.196
s-0.185
inks-0.183
ing-0.172
Token are
Token ID389
Feature activation+5.27e-03

Pos logit contributions
w+0.001
z+0.001
inks+0.001
mons+0.001
s+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 delight-0.001
 Freddie-0.001
Token really
Token ID1107
Feature activation-1.50e-02

Pos logit contributions
 agreement+0.011
 unison+0.008
 horror+0.008
 Prince+0.008
 Freddie+0.008
Neg logit contributions
w-0.012
z-0.010
s-0.010
inks-0.010
ings-0.009
Token strong
Token ID1913
Feature activation+7.65e-03

Pos logit contributions
w+0.036
z+0.032
inks+0.029
s+0.029
ings+0.028
Neg logit contributions
 agreement-0.029
 unison-0.022
 horror-0.021
 Prince-0.021
 Freddie-0.020
Token!"
Token ID2474
Feature activation+1.30e-02

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.013
 Prince+0.013
 Freddie+0.012
Neg logit contributions
w-0.016
z-0.014
inks-0.013
s-0.013
mons-0.012
Tokenised
Token ID1417
Feature activation-2.70e-02

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 delight+0.004
 Prince+0.004
Neg logit contributions
w-0.005
z-0.004
inks-0.004
s-0.004
mons-0.004
Token and
Token ID290
Feature activation+1.80e-03

Pos logit contributions
w+0.064
z+0.054
mons+0.053
inks+0.052
s+0.050
Neg logit contributions
 agreement-0.052
 unison-0.037
 horror-0.036
 delight-0.036
 Freddie-0.035
Token studied
Token ID9713
Feature activation+1.83e-02

Pos logit contributions
 agreement+0.004
 unison+0.003
 horror+0.003
 Prince+0.003
 Freddie+0.003
Neg logit contributions
w-0.004
z-0.004
s-0.003
inks-0.003
ing-0.003
Token hard
Token ID1327
Feature activation-2.45e-03

Pos logit contributions
 agreement+0.038
 unison+0.029
 horror+0.028
 Prince+0.027
 Freddie+0.026
Neg logit contributions
w-0.042
z-0.036
s-0.034
inks-0.034
ings-0.032
Token--
Token ID438
Feature activation-1.62e-02

Pos logit contributions
w+0.006
z+0.005
inks+0.004
s+0.004
mons+0.004
Neg logit contributions
 agreement-0.005
 unison-0.004
 horror-0.004
 Prince-0.004
 Princess-0.004
Tokenmem
Token ID11883
Feature activation+1.10e-01

Pos logit contributions
w+0.031
inks+0.025
z+0.023
s+0.023
mons+0.022
Neg logit contributions
 agreement-0.037
 Princess-0.028
 Prince-0.028
 unison-0.027
 knight-0.027
Tokenor
Token ID273
Feature activation-1.44e-02

Pos logit contributions
 agreement+0.235
 horror+0.187
 unison+0.182
 Prince+0.182
 Princess+0.177
Neg logit contributions
w-0.245
z-0.208
inks-0.194
s-0.189
ings-0.187
Tokenizing
Token ID2890
Feature activation-3.72e-02

Pos logit contributions
w+0.028
z+0.022
inks+0.021
s+0.021
mons+0.020
Neg logit contributions
 agreement-0.034
 unison-0.027
 horror-0.026
 Prince-0.026
 Princess-0.026
Token the
Token ID262
Feature activation+4.37e-03

Pos logit contributions
w+0.084
z+0.072
inks+0.066
s+0.066
mons+0.063
Neg logit contributions
 agreement-0.078
 unison-0.059
 horror-0.058
 Prince-0.056
 Freddie-0.055
Token answers
Token ID7429
Feature activation+6.92e-03

Pos logit contributions
 agreement+0.009
 unison+0.007
 horror+0.007
 Freddie+0.007
 Prince+0.006
Neg logit contributions
w-0.010
z-0.008
s-0.008
inks-0.008
ings-0.007
Token and
Token ID290
Feature activation+1.52e-03

Pos logit contributions
 agreement+0.015
 unison+0.012
 horror+0.011
 Prince+0.011
 Princess+0.011
Neg logit contributions
w-0.015
z-0.013
inks-0.012
s-0.012
mons-0.011
Token day
Token ID1110
Feature activation+1.71e-03

Pos logit contributions
 agreement+0.013
 unison+0.010
 horror+0.010
 Prince+0.009
 Freddie+0.009
Neg logit contributions
w-0.012
z-0.010
s-0.010
inks-0.009
ing-0.009
Token,
Token ID11
Feature activation+2.52e-03

Pos logit contributions
 agreement+0.004
 unison+0.003
 horror+0.003
 relief+0.003
 Prince+0.003
Neg logit contributions
w-0.003
z-0.003
s-0.003
inks-0.002
ing-0.002
Token while
Token ID981
Feature activation+2.38e-03

Pos logit contributions
 agreement+0.005
 unison+0.003
 horror+0.003
 relief+0.003
 knight+0.003
Neg logit contributions
w-0.006
s-0.006
z-0.005
ing-0.005
inks-0.005
Token hop
Token ID1725
Feature activation-8.69e-06

Pos logit contributions
 agreement+0.006
 unison+0.005
 horror+0.004
 relief+0.004
 delight+0.004
Neg logit contributions
w-0.005
z-0.004
s-0.004
ing-0.003
inks-0.003
Token-
Token ID12
Feature activation+5.93e-04

Feature not active on prev token;
no predictions were affected.
Tokenski
Token ID20545
Feature activation+1.08e-01

Pos logit contributions
 agreement+0.001
 Freddie+0.001
 Princess+0.001
 Prince+0.001
 Molly+0.001
Neg logit contributions
w-0.001
ings-0.001
inks-0.001
uc-0.001
uch-0.001
Tokenpping
Token ID2105
Feature activation-6.85e-03

Pos logit contributions
 agreement+0.221
 unison+0.166
 horror+0.166
 Prince+0.156
 relief+0.155
Neg logit contributions
w-0.254
z-0.226
s-0.208
inks-0.205
ing-0.197
Token through
Token ID832
Feature activation+8.90e-03

Pos logit contributions
w+0.016
z+0.013
inks+0.012
s+0.012
mons+0.012
Neg logit contributions
 agreement-0.014
 unison-0.011
 horror-0.011
 Prince-0.010
 Freddie-0.010
Token the
Token ID262
Feature activation-5.40e-04

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.012
 relief+0.011
 Prince+0.011
Neg logit contributions
w-0.021
z-0.019
s-0.018
inks-0.017
ing-0.017
Token garden
Token ID11376
Feature activation-2.49e-03

Pos logit contributions
w+0.001
z+0.001
s+0.001
inks+0.001
ings+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Prince-0.001
 Freddie-0.001
Token,
Token ID11
Feature activation-4.40e-04

Pos logit contributions
w+0.006
z+0.005
inks+0.005
s+0.004
mons+0.004
Neg logit contributions
 agreement-0.005
 horror-0.004
 unison-0.004
 Prince-0.004
 Princess-0.004
Token looked
Token ID3114
Feature activation+2.63e-02

Pos logit contributions
 agreement+0.033
 unison+0.025
 horror+0.024
 Prince+0.024
 Freddie+0.023
Neg logit contributions
w-0.038
z-0.032
s-0.032
inks-0.030
ing-0.029
Token at
Token ID379
Feature activation+6.35e-03

Pos logit contributions
 agreement+0.069
 unison+0.058
 horror+0.054
 Prince+0.054
 Freddie+0.052
Neg logit contributions
w-0.044
s-0.036
z-0.035
inks-0.034
ing-0.032
Token the
Token ID262
Feature activation+2.11e-04

Pos logit contributions
 agreement+0.014
 knight+0.011
 unison+0.011
 Prince+0.011
 Princess+0.011
Neg logit contributions
w-0.013
s-0.010
z-0.010
ings-0.009
inks-0.009
Token egg
Token ID5935
Feature activation+2.38e-02

Pos logit contributions
 agreement+0.000
 horror+0.000
 unison+0.000
 relief+0.000
 Prince+0.000
Neg logit contributions
w-0.001
z-0.000
inks-0.000
s-0.000
mons-0.000
Token-
Token ID12
Feature activation-3.16e-03

Pos logit contributions
 agreement+0.053
 unison+0.041
 horror+0.040
 Prince+0.039
 Freddie+0.038
Neg logit contributions
w-0.051
z-0.043
s-0.041
inks-0.040
ings-0.038
Tokendu
Token ID646
Feature activation+1.05e-01

Pos logit contributions
w+0.004
inks+0.004
ings+0.004
z+0.004
ve+0.003
Neg logit contributions
 agreement-0.007
 end-0.005
 Freddie-0.005
 Prince-0.005
 End-0.005
Tokenck
Token ID694
Feature activation-3.28e-03

Pos logit contributions
 agreement+0.219
 unison+0.164
 horror+0.156
 Freddie+0.149
 Prince+0.148
Neg logit contributions
w-0.232
z-0.203
s-0.194
inks-0.192
ing-0.180
Token and
Token ID290
Feature activation+2.28e-03

Pos logit contributions
w+0.006
z+0.005
s+0.005
inks+0.005
ings+0.004
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 Freddie-0.006
Token then
Token ID788
Feature activation+2.25e-02

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.003
 Prince+0.003
 Freddie+0.003
Neg logit contributions
w-0.005
z-0.004
s-0.004
inks-0.004
mons-0.004
Token at
Token ID379
Feature activation+4.72e-03

Pos logit contributions
 agreement+0.056
 unison+0.045
 horror+0.043
 Prince+0.042
 Princess+0.041
Neg logit contributions
w-0.043
z-0.035
s-0.035
inks-0.033
ing-0.031
Token each
Token ID1123
Feature activation+7.36e-03

Pos logit contributions
 agreement+0.009
 unison+0.007
 knight+0.007
 Prince+0.007
 Freddie+0.007
Neg logit contributions
w-0.011
z-0.009
s-0.009
inks-0.009
ings-0.008
Token to
Token ID284
Feature activation-1.39e-03

Pos logit contributions
w+0.023
z+0.018
inks+0.016
ings+0.014
s+0.014
Neg logit contributions
 agreement-0.041
 horror-0.033
 Prince-0.032
 unison-0.032
 Princess-0.031
Token share
Token ID2648
Feature activation+2.76e-03

Pos logit contributions
w+0.003
z+0.003
s+0.003
inks+0.003
ings+0.002
Neg logit contributions
 agreement-0.003
 unison-0.002
 horror-0.002
 Prince-0.002
 Freddie-0.002
Token the
Token ID262
Feature activation-2.24e-04

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 Prince+0.004
 Freddie+0.004
Neg logit contributions
w-0.007
z-0.006
s-0.006
inks-0.005
ings-0.005
Token egg
Token ID5935
Feature activation+2.19e-02

Pos logit contributions
w+0.001
z+0.001
inks+0.001
s+0.001
ing+0.001
Neg logit contributions
 agreement-0.000
 horror-0.000
 relief-0.000
 unison-0.000
 delight-0.000
Token-
Token ID12
Feature activation-3.51e-03

Pos logit contributions
 agreement+0.052
 unison+0.039
 horror+0.039
 Prince+0.038
 Freddie+0.037
Neg logit contributions
w-0.045
z-0.037
inks-0.034
s-0.032
mons-0.032
Tokendu
Token ID646
Feature activation+1.04e-01

Pos logit contributions
inks+0.006
w+0.006
z+0.005
ings+0.005
uffy+0.005
Neg logit contributions
 agreement-0.006
 Princess-0.005
 end-0.005
 Freddie-0.005
 Prince-0.005
Tokenck
Token ID694
Feature activation-3.80e-03

Pos logit contributions
 agreement+0.188
 unison+0.133
 horror+0.122
 Freddie+0.118
 Prince+0.114
Neg logit contributions
w-0.259
z-0.230
inks-0.222
s-0.221
ing-0.210
Token and
Token ID290
Feature activation-1.61e-04

Pos logit contributions
w+0.008
z+0.007
inks+0.006
s+0.006
mons+0.006
Neg logit contributions
 agreement-0.009
 unison-0.007
 horror-0.007
 Prince-0.006
 Freddie-0.006
Token try
Token ID1949
Feature activation-3.03e-02

Pos logit contributions
w+0.000
z+0.000
s+0.000
inks+0.000
ings+0.000
Neg logit contributions
 agreement-0.000
 unison-0.000
 horror-0.000
 Prince-0.000
 Freddie-0.000
Token to
Token ID284
Feature activation-4.52e-03

Pos logit contributions
w+0.053
z+0.042
inks+0.035
mons+0.034
uch+0.033
Neg logit contributions
 agreement-0.080
 Prince-0.065
 horror-0.064
 Freddie-0.063
 unison-0.063
Token fit
Token ID4197
Feature activation+5.16e-03

Pos logit contributions
w+0.010
z+0.009
s+0.008
inks+0.008
ings+0.008
Neg logit contributions
 agreement-0.009
 horror-0.007
 unison-0.007
 Prince-0.007
 Freddie-0.007
Token money
Token ID1637
Feature activation-1.01e-02

Pos logit contributions
 agreement+0.028
 unison+0.022
 horror+0.021
 Prince+0.021
 Freddie+0.020
Neg logit contributions
w-0.025
z-0.021
s-0.020
inks-0.019
ings-0.018
Token couldn
Token ID3521
Feature activation-9.27e-03

Pos logit contributions
w+0.020
z+0.017
inks+0.015
s+0.015
mons+0.014
Neg logit contributions
 agreement-0.024
 unison-0.019
 horror-0.018
 Prince-0.018
 Freddie-0.018
Token't
Token ID470
Feature activation+1.31e-02

Pos logit contributions
w+0.017
z+0.014
inks+0.014
mons+0.013
s+0.012
Neg logit contributions
 agreement-0.023
 unison-0.018
 horror-0.018
 Freddie-0.017
 Prince-0.017
Token buy
Token ID2822
Feature activation+6.44e-03

Pos logit contributions
 agreement+0.028
 unison+0.023
 horror+0.023
 Prince+0.022
 Princess+0.021
Neg logit contributions
w-0.028
z-0.023
s-0.022
ings-0.020
inks-0.020
Token--
Token ID438
Feature activation-2.32e-02

Pos logit contributions
 agreement+0.013
 unison+0.010
 horror+0.010
 Prince+0.009
 Freddie+0.009
Neg logit contributions
w-0.015
z-0.013
inks-0.012
s-0.012
mons-0.011
Tokenmem
Token ID11883
Feature activation+1.04e-01

Pos logit contributions
w+0.044
inks+0.037
z+0.034
s+0.032
mons+0.032
Neg logit contributions
 agreement-0.049
 Prince-0.039
 Freddie-0.037
 Princess-0.037
 unison-0.037
Tokenories
Token ID1749
Feature activation-1.80e-02

Pos logit contributions
 agreement+0.235
 horror+0.183
 unison+0.182
 Prince+0.176
 Princess+0.172
Neg logit contributions
w-0.225
z-0.189
s-0.175
inks-0.174
ing-0.167
Token that
Token ID326
Feature activation-7.21e-03

Pos logit contributions
w+0.038
z+0.033
s+0.030
inks+0.030
ings+0.028
Neg logit contributions
 agreement-0.040
 unison-0.031
 horror-0.030
 Prince-0.029
 Freddie-0.029
Token he
Token ID339
Feature activation-1.05e-02

Pos logit contributions
w+0.016
z+0.014
s+0.014
inks+0.013
ings+0.012
Neg logit contributions
 agreement-0.015
 unison-0.012
 horror-0.011
 Prince-0.011
 Princess-0.010
Token would
Token ID561
Feature activation+1.38e-02

Pos logit contributions
w+0.024
z+0.021
inks+0.020
s+0.019
ings+0.019
Neg logit contributions
 agreement-0.021
 unison-0.016
 horror-0.015
 Prince-0.015
 Freddie-0.015
Token hold
Token ID1745
Feature activation-1.80e-02

Pos logit contributions
 agreement+0.030
 unison+0.023
 horror+0.023
 Prince+0.022
 Freddie+0.022
Neg logit contributions
w-0.029
z-0.025
s-0.023
inks-0.023
ings-0.022
Token to
Token ID284
Feature activation-1.34e-02

Pos logit contributions
w+0.061
z+0.049
inks+0.046
ings+0.042
uch+0.040
Neg logit contributions
 agreement-0.090
 Prince-0.077
 Freddie-0.074
 knight-0.073
 Princess-0.073
Token play
Token ID711
Feature activation-5.72e-03

Pos logit contributions
w+0.029
z+0.025
s+0.023
inks+0.023
ings+0.022
Neg logit contributions
 agreement-0.029
 unison-0.022
 horror-0.022
 Prince-0.021
 relief-0.021
Token with
Token ID351
Feature activation-1.91e-02

Pos logit contributions
w+0.010
z+0.008
s+0.008
inks+0.007
mons+0.007
Neg logit contributions
 agreement-0.015
 horror-0.012
 unison-0.012
 Prince-0.012
 relief-0.011
Token magnets
Token ID37446
Feature activation+5.22e-03

Pos logit contributions
w+0.043
z+0.037
inks+0.035
s+0.034
uck+0.032
Neg logit contributions
 agreement-0.040
 unison-0.031
 horror-0.030
 Prince-0.029
 Freddie-0.029
Token.
Token ID13
Feature activation-5.57e-03

Pos logit contributions
 agreement+0.012
 unison+0.010
 horror+0.010
 relief+0.009
 delight+0.009
Neg logit contributions
w-0.011
z-0.009
s-0.009
inks-0.008
ing-0.008
Token Magn
Token ID14872
Feature activation+1.02e-01

Pos logit contributions
w+0.011
s+0.010
inks+0.009
z+0.009
mons+0.009
Neg logit contributions
 agreement-0.012
 horror-0.009
 knight-0.009
 relief-0.009
 Prince-0.009
Tokenets
Token ID1039
Feature activation-1.45e-02

Pos logit contributions
 agreement+0.232
 unison+0.181
 horror+0.178
 Prince+0.172
 Freddie+0.168
Neg logit contributions
w-0.216
z-0.181
s-0.170
inks-0.169
ing-0.158
Token are
Token ID389
Feature activation-7.34e-03

Pos logit contributions
w+0.030
z+0.027
inks+0.023
mons+0.023
uck+0.022
Neg logit contributions
 agreement-0.033
 unison-0.026
 horror-0.025
 end-0.024
 Princess-0.024
Token things
Token ID1243
Feature activation-1.69e-02

Pos logit contributions
w+0.014
s+0.012
z+0.012
inks+0.011
mons+0.011
Neg logit contributions
 agreement-0.017
 horror-0.013
 unison-0.013
 Freddie-0.013
 Prince-0.013
Token that
Token ID326
Feature activation-2.04e-02

Pos logit contributions
w+0.030
z+0.024
s+0.024
inks+0.023
ing+0.021
Neg logit contributions
 agreement-0.044
 unison-0.035
 horror-0.034
 Prince-0.033
 Freddie-0.033
Token stick
Token ID4859
Feature activation-1.60e-02

Pos logit contributions
w+0.041
z+0.035
inks+0.032
s+0.031
ing+0.030
Neg logit contributions
 agreement-0.048
 unison-0.037
 horror-0.037
 Prince-0.036
 Freddie-0.035
Token inside
Token ID2641
Feature activation+2.35e-02

Pos logit contributions
w+0.001
z+0.001
s+0.001
inks+0.001
ings+0.001
Neg logit contributions
 agreement-0.002
 unison-0.001
 horror-0.001
 Prince-0.001
 Freddie-0.001
Token and
Token ID290
Feature activation-9.86e-03

Pos logit contributions
 agreement+0.057
 unison+0.047
 horror+0.044
 Prince+0.044
 knight+0.043
Neg logit contributions
w-0.043
z-0.037
s-0.036
inks-0.033
ings-0.033
Token sque
Token ID35776
Feature activation-4.59e-02

Pos logit contributions
w+0.022
z+0.019
s+0.018
inks+0.018
ing+0.017
Neg logit contributions
 agreement-0.021
 unison-0.016
 horror-0.015
 Prince-0.015
 Freddie-0.014
Tokenaked
Token ID4335
Feature activation-4.75e-02

Pos logit contributions
w+0.064
z+0.050
s+0.044
inks+0.043
mons+0.039
Neg logit contributions
 agreement-0.138
 unison-0.114
 horror-0.112
 Prince-0.109
 Freddie-0.107
Token "
Token ID366
Feature activation-2.14e-02

Pos logit contributions
w+0.096
z+0.083
s+0.077
inks+0.076
ings+0.073
Neg logit contributions
 agreement-0.109
 unison-0.085
 horror-0.083
 Prince-0.081
 Freddie-0.079
Tokenaunt
Token ID12968
Feature activation+1.02e-01

Pos logit contributions
w+0.040
inks+0.034
z+0.033
s+0.032
ings+0.032
Neg logit contributions
 agreement-0.049
 unison-0.039
 Prince-0.036
 Princess-0.036
 Freddie-0.035
Token's
Token ID338
Feature activation+2.50e-05

Pos logit contributions
 agreement+0.226
 unison+0.176
 horror+0.171
 Prince+0.170
 Freddie+0.168
Neg logit contributions
w-0.207
z-0.180
inks-0.169
mons-0.161
s-0.160
Token coming
Token ID2406
Feature activation+1.98e-03

Pos logit contributions
 agreement+0.000
 unison+0.000
 horror+0.000
 Prince+0.000
 Freddie+0.000
Neg logit contributions
w-0.000
z-0.000
s-0.000
inks-0.000
ings-0.000
Token for
Token ID329
Feature activation-1.63e-04

Pos logit contributions
 agreement+0.004
 unison+0.003
 horror+0.003
 Prince+0.003
 Princess+0.003
Neg logit contributions
w-0.004
z-0.004
inks-0.003
s-0.003
mons-0.003
Token dinner
Token ID8073
Feature activation-2.47e-03

Pos logit contributions
w+0.000
z+0.000
s+0.000
inks+0.000
ings+0.000
Neg logit contributions
 agreement-0.000
 unison-0.000
 horror-0.000
 Prince-0.000
 Freddie-0.000
Token!'
Token ID13679
Feature activation-3.33e-02

Pos logit contributions
w+0.004
z+0.004
s+0.003
inks+0.003
ings+0.003
Neg logit contributions
 agreement-0.006
 unison-0.005
 horror-0.005
 Prince-0.005
 Freddie-0.005
TokenThey
Token ID2990
Feature activation+2.00e-02

Pos logit contributions
w+0.002
z+0.002
inks+0.002
s+0.002
ings+0.002
Neg logit contributions
 agreement-0.003
 unison-0.002
 horror-0.002
 Prince-0.002
 Freddie-0.002
Token chase
Token ID15505
Feature activation+6.16e-03

Pos logit contributions
 agreement+0.041
 Prince+0.033
 Freddie+0.032
 unison+0.031
 Princess+0.031
Neg logit contributions
w-0.041
z-0.036
s-0.034
inks-0.033
ings-0.029
Token the
Token ID262
Feature activation+1.44e-03

Pos logit contributions
 agreement+0.012
 unison+0.008
 horror+0.008
 Prince+0.008
 Freddie+0.008
Neg logit contributions
w-0.015
z-0.013
s-0.013
inks-0.013
ings-0.012
Token car
Token ID1097
Feature activation-1.04e-02

Pos logit contributions
 agreement+0.003
 unison+0.003
 horror+0.002
 Freddie+0.002
 Prince+0.002
Neg logit contributions
w-0.003
z-0.003
s-0.002
inks-0.002
ings-0.002
Token-
Token ID12
Feature activation-1.19e-03

Pos logit contributions
w+0.022
z+0.019
s+0.018
inks+0.017
ings+0.016
Neg logit contributions
 agreement-0.023
 unison-0.018
 horror-0.017
 Prince-0.017
 Freddie-0.017
Tokennos
Token ID39369
Feature activation+1.01e-01

Pos logit contributions
inks+0.002
w+0.002
ings+0.002
ucker+0.002
iated+0.002
Neg logit contributions
 agreement-0.002
 end-0.002
 unison-0.002
 finally-0.002
 Princess-0.002
Tokenaur
Token ID2899
Feature activation+3.68e-02

Pos logit contributions
 agreement+0.234
 unison+0.187
 horror+0.175
 knight+0.174
 end+0.173
Neg logit contributions
w-0.193
z-0.168
s-0.162
inks-0.158
mons-0.153
Token around
Token ID1088
Feature activation+1.02e-02

Pos logit contributions
 agreement+0.075
 unison+0.057
 horror+0.055
 Prince+0.054
 relief+0.052
Neg logit contributions
w-0.087
z-0.075
s-0.071
inks-0.069
ing-0.067
Token the
Token ID262
Feature activation+2.65e-04

Pos logit contributions
 agreement+0.020
 Prince+0.015
 unison+0.015
 Freddie+0.014
 Princess+0.014
Neg logit contributions
w-0.023
z-0.020
s-0.019
ings-0.019
ing-0.019
Token room
Token ID2119
Feature activation+1.23e-03

Pos logit contributions
 agreement+0.001
 unison+0.000
 horror+0.000
 Prince+0.000
 Princess+0.000
Neg logit contributions
w-0.001
z-0.001
inks-0.001
s-0.001
mons-0.001
Token,
Token ID11
Feature activation+2.78e-03

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.002
s-0.002
inks-0.002
ings-0.002
Token "
Token ID366
Feature activation-2.07e-02

Pos logit contributions
z+0.003
s+0.003
w+0.003
inks+0.002
om+0.002
Neg logit contributions
 agreement-0.004
 unison-0.003
 horror-0.003
 relief-0.003
 Prince-0.003
TokenWe
Token ID1135
Feature activation+4.67e-03

Pos logit contributions
w+0.045
z+0.039
inks+0.036
s+0.036
ings+0.034
Neg logit contributions
 agreement-0.046
 unison-0.035
 horror-0.034
 Prince-0.033
 Freddie-0.032
Token are
Token ID389
Feature activation+7.41e-03

Pos logit contributions
 agreement+0.011
 unison+0.009
 horror+0.009
 Prince+0.009
 Princess+0.008
Neg logit contributions
w-0.009
z-0.008
s-0.007
inks-0.007
ings-0.006
Token the
Token ID262
Feature activation-2.34e-03

Pos logit contributions
 agreement+0.018
 unison+0.014
 horror+0.014
 relief+0.013
 hesitation+0.013
Neg logit contributions
w-0.013
z-0.011
inks-0.010
om-0.010
s-0.010
Token weapon
Token ID4282
Feature activation+6.48e-03

Pos logit contributions
w+0.005
z+0.004
s+0.004
inks+0.004
ings+0.003
Neg logit contributions
 agreement-0.005
 unison-0.004
 horror-0.004
 relief-0.004
 delight-0.004
Token creators
Token ID16294
Feature activation+1.00e-01

Pos logit contributions
 agreement+0.015
 unison+0.012
 horror+0.012
 Prince+0.011
 Freddie+0.011
Neg logit contributions
w-0.013
z-0.011
inks-0.010
s-0.010
mons-0.010
Token!
Token ID0
Feature activation-1.08e-02

Pos logit contributions
 agreement+0.259
 horror+0.210
 unison+0.203
 relief+0.200
 Prince+0.199
Neg logit contributions
w-0.181
z-0.152
s-0.136
ings-0.128
inks-0.127
Token We
Token ID775
Feature activation-7.31e-03

Pos logit contributions
w+0.019
z+0.015
s+0.014
inks+0.014
ings+0.013
Neg logit contributions
 agreement-0.028
 unison-0.022
 horror-0.022
 Prince-0.021
 Freddie-0.021
Token can
Token ID460
Feature activation+8.27e-03

Pos logit contributions
w+0.015
z+0.012
s+0.011
inks+0.011
ings+0.010
Neg logit contributions
 agreement-0.017
 unison-0.014
 horror-0.013
 Prince-0.013
 Princess-0.013
Token stop
Token ID2245
Feature activation-9.46e-03

Pos logit contributions
 agreement+0.016
 unison+0.012
 horror+0.012
 Prince+0.011
 Freddie+0.011
Neg logit contributions
w-0.020
z-0.017
s-0.016
inks-0.016
ings-0.015
Token the
Token ID262
Feature activation-4.55e-03

Pos logit contributions
w+0.022
z+0.019
inks+0.017
mons+0.017
s+0.017
Neg logit contributions
 agreement-0.019
 unison-0.014
 horror-0.014
 Prince-0.014
 Freddie-0.014
Token.
Token ID13
Feature activation-3.42e-04

Pos logit contributions
w+0.011
z+0.010
s+0.010
inks+0.009
ing+0.009
Neg logit contributions
 agreement-0.013
 unison-0.010
 horror-0.009
 Prince-0.009
 Freddie-0.009
Token "
Token ID366
Feature activation-1.78e-02

Pos logit contributions
w+0.001
s+0.001
z+0.001
inks+0.001
ings+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Prince-0.001
 relief-0.001
TokenA
Token ID32
Feature activation-1.40e-02

Pos logit contributions
w+0.041
inks+0.035
ings+0.034
z+0.034
s+0.032
Neg logit contributions
 agreement-0.036
 unison-0.027
 knight-0.024
 Prince-0.024
 horror-0.024
Token car
Token ID1097
Feature activation-1.15e-02

Pos logit contributions
w+0.029
z+0.026
s+0.024
inks+0.022
ing+0.021
Neg logit contributions
 agreement-0.032
 unison-0.025
 horror-0.025
 Freddie-0.024
 Prince-0.023
Token-
Token ID12
Feature activation-1.75e-03

Pos logit contributions
w+0.020
z+0.016
inks+0.015
mons+0.014
s+0.014
Neg logit contributions
 agreement-0.030
 unison-0.024
 horror-0.024
 knight-0.023
 Princess-0.023
Tokennos
Token ID39369
Feature activation+1.00e-01

Pos logit contributions
w+0.003
inks+0.003
ings+0.003
mons+0.003
ll+0.003
Neg logit contributions
 agreement-0.003
 knight-0.003
 unison-0.003
 Princess-0.003
 Prince-0.003
Tokenaur
Token ID2899
Feature activation+3.65e-02

Pos logit contributions
 agreement+0.243
 unison+0.195
 knight+0.185
 horror+0.183
 Prince+0.182
Neg logit contributions
w-0.184
z-0.157
inks-0.149
s-0.147
mons-0.141
Token!"
Token ID2474
Feature activation+1.22e-02

Pos logit contributions
 agreement+0.089
 unison+0.070
 horror+0.069
 Prince+0.067
 relief+0.065
Neg logit contributions
w-0.073
z-0.061
s-0.056
inks-0.054
ing-0.052
Token\n
Token ID198
Feature activation-3.05e-03

Pos logit contributions
 agreement+0.032
 unison+0.026
 Prince+0.024
 horror+0.024
 knight+0.023
Neg logit contributions
w-0.021
s-0.019
z-0.017
inks-0.016
uck-0.015
Token\n
Token ID198
Feature activation-1.13e-03

Pos logit contributions
w+0.007
z+0.006
inks+0.006
mons+0.006
s+0.006
Neg logit contributions
 agreement-0.006
 unison-0.004
 horror-0.004
 Prince-0.004
 Freddie-0.004
TokenThey
Token ID2990
Feature activation+2.00e-02

Pos logit contributions
w+0.002
z+0.002
inks+0.002
s+0.002
ings+0.002
Neg logit contributions
 agreement-0.003
 unison-0.002
 horror-0.002
 Prince-0.002
 Freddie-0.002
Token...
Token ID986
Feature activation-4.79e-02

Pos logit contributions
w+0.060
z+0.052
inks+0.050
s+0.049
mons+0.047
Neg logit contributions
 agreement-0.060
 unison-0.045
 Prince-0.044
 Princess-0.043
 horror-0.043
TokenBen
Token ID11696
Feature activation+3.06e-03

Pos logit contributions
w+0.100
z+0.085
inks+0.082
s+0.078
ings+0.077
Neg logit contributions
 agreement-0.101
 Princess-0.077
 Prince-0.076
 unison-0.076
 horror-0.073
Token.
Token ID13
Feature activation-4.89e-04

Pos logit contributions
 agreement+0.007
 unison+0.006
 horror+0.005
 Prince+0.005
 Freddie+0.005
Neg logit contributions
w-0.006
z-0.005
inks-0.005
s-0.005
mons-0.004
Token I
Token ID314
Feature activation+6.15e-03

Pos logit contributions
w+0.001
z+0.001
s+0.001
inks+0.001
ings+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Prince-0.001
 Freddie-0.001
Token...
Token ID986
Feature activation-4.82e-02

Pos logit contributions
 agreement+0.014
 unison+0.011
 horror+0.011
 Prince+0.010
 Princess+0.010
Neg logit contributions
w-0.013
z-0.011
inks-0.010
s-0.010
ing-0.010
Tokenmiss
Token ID3927
Feature activation+9.86e-02

Pos logit contributions
w+0.099
z+0.084
inks+0.081
s+0.078
ings+0.077
Neg logit contributions
 agreement-0.105
 unison-0.081
 Prince-0.080
 Princess-0.079
 horror-0.077
Token...
Token ID986
Feature activation-4.82e-02

Pos logit contributions
 agreement+0.227
 unison+0.179
 Prince+0.170
 horror+0.168
 Princess+0.168
Neg logit contributions
w-0.187
z-0.165
inks-0.157
s-0.152
mons-0.146
Tokenyou
Token ID5832
Feature activation+3.09e-02

Pos logit contributions
w+0.097
z+0.084
inks+0.080
s+0.076
ings+0.075
Neg logit contributions
 agreement-0.105
 unison-0.080
 Prince-0.080
 Princess-0.079
 horror-0.077
Token...
Token ID986
Feature activation-4.79e-02

Pos logit contributions
 agreement+0.071
 unison+0.056
 horror+0.055
 Prince+0.053
 Princess+0.052
Neg logit contributions
w-0.062
z-0.053
inks-0.049
s-0.048
mons-0.046
Tokenvery
Token ID548
Feature activation-2.87e-02

Pos logit contributions
w+0.093
z+0.081
inks+0.077
s+0.073
ings+0.072
Neg logit contributions
 agreement-0.103
 Prince-0.081
 Princess-0.081
 unison-0.080
 Freddie-0.076
Token...
Token ID986
Feature activation-4.73e-02

Pos logit contributions
w+0.051
z+0.042
inks+0.038
s+0.036
mons+0.036
Neg logit contributions
 agreement-0.072
 unison-0.059
 horror-0.056
 Prince-0.055
 Princess-0.055
Token lesson
Token ID11483
Feature activation+2.30e-02

Pos logit contributions
w+0.000
z+0.000
inks+0.000
s+0.000
mons+0.000
Neg logit contributions
 agreement-0.000
 unison-0.000
 horror-0.000
 Prince-0.000
 relief-0.000
Tokenâ
Token ID22940
Feature activation+4.47e-03

Pos logit contributions
 agreement+0.047
 unison+0.036
 horror+0.034
 Prince+0.033
 Freddie+0.032
Neg logit contributions
w-0.053
z-0.046
inks-0.043
s-0.041
mons-0.040
Token
Token ID26391
Feature activation+3.96e-02

Pos logit contributions
 agreement+0.010
 unison+0.007
 horror+0.007
 Prince+0.007
 Freddie+0.007
Neg logit contributions
w-0.010
z-0.009
s-0.008
inks-0.008
ings-0.008
Token
Token ID447
Feature activation+1.08e-02

Pos logit contributions
 agreement+0.087
 unison+0.076
 horror+0.069
 Prince+0.066
 delight+0.066
Neg logit contributions
w-0.082
z-0.060
inks-0.057
mons-0.056
ing-0.055
Token
Token ID251
Feature activation-9.44e-03

Pos logit contributions
 agreement+0.020
 unison+0.015
 horror+0.013
 Freddie+0.013
 Prince+0.013
Neg logit contributions
w-0.027
z-0.023
inks-0.023
s-0.022
ing-0.021
Tokenpat
Token ID8071
Feature activation+9.81e-02

Pos logit contributions
w+0.020
z+0.017
s+0.017
inks+0.016
ings+0.015
Neg logit contributions
 agreement-0.020
 horror-0.016
 unison-0.015
 knight-0.015
 Freddie-0.015
Tokenience
Token ID1240
Feature activation-3.37e-02

Pos logit contributions
 agreement+0.222
 unison+0.175
 horror+0.174
 Prince+0.169
 Princess+0.162
Neg logit contributions
w-0.209
z-0.177
s-0.166
ing-0.157
inks-0.155
Token pays
Token ID13831
Feature activation-2.16e-02

Pos logit contributions
w+0.070
z+0.059
s+0.055
inks+0.054
ings+0.051
Neg logit contributions
 agreement-0.078
 unison-0.060
 horror-0.059
 Prince-0.057
 Freddie-0.056
Token off
Token ID572
Feature activation+3.65e-02

Pos logit contributions
w+0.050
z+0.043
inks+0.041
s+0.040
mons+0.039
Neg logit contributions
 agreement-0.045
 unison-0.033
 horror-0.032
 Prince-0.031
 Freddie-0.030
Token!
Token ID0
Feature activation+4.18e-03

Pos logit contributions
 agreement+0.080
 unison+0.062
 horror+0.060
 Prince+0.058
 Freddie+0.057
Neg logit contributions
w-0.079
z-0.067
inks-0.062
s-0.062
ings-0.058
Token<|endoftext|>
Token ID50256
Feature activation+5.05e-03

Pos logit contributions
 agreement+0.010
 horror+0.008
 unison+0.008
 knight+0.007
 relief+0.007
Neg logit contributions
w-0.008
s-0.007
z-0.007
inks-0.006
ings-0.006
Token didn
Token ID1422
Feature activation+1.76e-02

Pos logit contributions
 agreement+0.012
 unison+0.009
 horror+0.009
 Prince+0.008
 Freddie+0.008
Neg logit contributions
w-0.013
z-0.011
s-0.010
inks-0.010
ings-0.009
Token't
Token ID470
Feature activation+1.85e-02

Pos logit contributions
 agreement+0.047
 unison+0.038
 horror+0.038
 Prince+0.037
 Freddie+0.036
Neg logit contributions
w-0.029
z-0.024
inks-0.021
s-0.021
mons-0.019
Token know
Token ID760
Feature activation-2.50e-02

Pos logit contributions
 agreement+0.042
 unison+0.034
 horror+0.032
 Freddie+0.031
 Prince+0.031
Neg logit contributions
w-0.039
z-0.032
s-0.030
inks-0.030
ings-0.028
Token what
Token ID644
Feature activation-1.73e-02

Pos logit contributions
w+0.058
z+0.050
s+0.048
inks+0.047
ing+0.046
Neg logit contributions
 agreement-0.051
 unison-0.038
 horror-0.036
 Prince-0.035
 relief-0.034
Token "
Token ID366
Feature activation-1.84e-02

Pos logit contributions
w+0.035
z+0.029
inks+0.027
s+0.026
ings+0.025
Neg logit contributions
 agreement-0.041
 unison-0.032
 Prince-0.031
 horror-0.031
 Princess-0.031
Tokennos
Token ID39369
Feature activation+9.73e-02

Pos logit contributions
w+0.032
inks+0.029
z+0.027
ings+0.025
s+0.025
Neg logit contributions
 agreement-0.040
 Prince-0.031
 Princess-0.031
 unison-0.030
 end-0.029
Tokeny
Token ID88
Feature activation-4.35e-02

Pos logit contributions
 agreement+0.251
 knight+0.203
 Prince+0.202
 Princess+0.196
 end+0.196
Neg logit contributions
w-0.145
z-0.129
inks-0.118
s-0.115
mons-0.114
Token"
Token ID1
Feature activation-7.84e-04

Pos logit contributions
w+0.087
z+0.076
inks+0.071
s+0.067
mons+0.064
Neg logit contributions
 agreement-0.098
 unison-0.077
 Prince-0.075
 Princess-0.073
 horror-0.073
Token meant
Token ID4001
Feature activation-5.11e-04

Pos logit contributions
w+0.002
z+0.002
inks+0.002
s+0.002
ings+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 Prince-0.001
 horror-0.001
 Princess-0.001
Token,
Token ID11
Feature activation+2.05e-03

Pos logit contributions
w+0.001
z+0.001
s+0.001
ing+0.001
inks+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 relief-0.001
 Prince-0.001
Token so
Token ID523
Feature activation-2.81e-02

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 knight+0.003
 alien+0.003
Neg logit contributions
w-0.004
s-0.003
ing-0.003
mons-0.003
z-0.003
Token,
Token ID11
Feature activation+4.83e-03

Pos logit contributions
 agreement+0.006
 end+0.004
 horror+0.004
 Princess+0.004
 Prince+0.004
Neg logit contributions
w-0.010
z-0.008
inks-0.008
we-0.007
to-0.007
Token he
Token ID339
Feature activation-1.98e-03

Pos logit contributions
 agreement+0.009
 unison+0.006
 horror+0.006
 Prince+0.006
 Princess+0.006
Neg logit contributions
w-0.012
z-0.011
inks-0.010
s-0.010
ings-0.010
Token made
Token ID925
Feature activation-8.45e-03

Pos logit contributions
w+0.004
z+0.004
s+0.004
inks+0.004
ings+0.003
Neg logit contributions
 agreement-0.004
 unison-0.003
 horror-0.003
 Prince-0.003
 Freddie-0.003
Token some
Token ID617
Feature activation+3.70e-03

Pos logit contributions
w+0.021
z+0.018
inks+0.017
s+0.017
mons+0.016
Neg logit contributions
 agreement-0.016
 unison-0.012
 horror-0.012
 Prince-0.011
 Princess-0.011
Token money
Token ID1637
Feature activation-7.13e-03

Pos logit contributions
 agreement+0.008
 unison+0.006
 horror+0.006
 Prince+0.005
 Freddie+0.005
Neg logit contributions
w-0.009
z-0.008
inks-0.007
s-0.007
ings-0.007
Token transactions
Token ID8945
Feature activation+9.65e-02

Pos logit contributions
w+0.017
z+0.014
inks+0.013
s+0.013
mons+0.013
Neg logit contributions
 agreement-0.014
 unison-0.011
 horror-0.010
 Prince-0.010
 Freddie-0.010
Token.
Token ID13
Feature activation+9.74e-03

Pos logit contributions
 agreement+0.241
 horror+0.195
 unison+0.192
 relief+0.188
 delight+0.182
Neg logit contributions
w-0.173
z-0.157
s-0.142
ings-0.135
ing-0.131
Token At
Token ID1629
Feature activation+9.92e-03

Pos logit contributions
 agreement+0.022
 unison+0.017
 horror+0.017
 hesitation+0.016
 relief+0.016
Neg logit contributions
w-0.019
z-0.016
s-0.016
inks-0.015
uck-0.015
Token the
Token ID262
Feature activation+6.57e-04

Pos logit contributions
 agreement+0.019
 unison+0.014
 horror+0.013
 Prince+0.012
 relief+0.012
Neg logit contributions
w-0.025
z-0.022
s-0.020
inks-0.019
ing-0.019
Token post
Token ID1281
Feature activation+1.72e-02

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.001
 Prince+0.001
 Princess+0.001
Neg logit contributions
w-0.002
z-0.001
inks-0.001
s-0.001
ings-0.001
Token office
Token ID2607
Feature activation-1.11e-02

Pos logit contributions
 agreement+0.036
 horror+0.027
 Prince+0.027
 unison+0.027
 Princess+0.027
Neg logit contributions
w-0.039
z-0.032
inks-0.031
s-0.029
mons-0.028
Token.
Token ID13
Feature activation-1.71e-03

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.013
 Prince+0.012
 relief+0.012
Neg logit contributions
w-0.014
z-0.013
s-0.012
inks-0.011
ings-0.011
Token Mom
Token ID11254
Feature activation-2.14e-02

Pos logit contributions
w+0.004
z+0.003
s+0.003
inks+0.003
ings+0.003
Neg logit contributions
 agreement-0.004
 unison-0.003
 horror-0.003
 Prince-0.003
 relief-0.003
Token helped
Token ID4193
Feature activation+4.49e-03

Pos logit contributions
w+0.041
z+0.033
inks+0.031
s+0.030
mons+0.030
Neg logit contributions
 agreement-0.052
 unison-0.041
 horror-0.040
 Prince-0.039
 Freddie-0.039
Token John
Token ID1757
Feature activation-1.63e-02

Pos logit contributions
 agreement+0.008
 unison+0.006
 horror+0.006
 Freddie+0.005
 Prince+0.005
Neg logit contributions
w-0.012
z-0.010
inks-0.010
mons-0.009
s-0.009
Token re
Token ID302
Feature activation+8.42e-03

Pos logit contributions
w+0.042
mons+0.033
inks+0.033
z+0.033
ings+0.031
Neg logit contributions
 agreement-0.028
 Freddie-0.019
 horror-0.019
 unison-0.019
 delight-0.019
Tokenhang
Token ID33255
Feature activation+9.64e-02

Pos logit contributions
 agreement+0.020
 unison+0.017
 horror+0.015
 delight+0.014
 captain+0.014
Neg logit contributions
w-0.014
mons-0.012
s-0.011
inks-0.011
bows-0.011
Token the
Token ID262
Feature activation-1.06e-02

Pos logit contributions
 agreement+0.222
 unison+0.172
 horror+0.169
 Prince+0.165
 Freddie+0.161
Neg logit contributions
w-0.199
z-0.165
inks-0.155
s-0.153
mons-0.145
Token shelf
Token ID18316
Feature activation+5.27e-03

Pos logit contributions
w+0.025
z+0.022
s+0.020
inks+0.020
ings+0.019
Neg logit contributions
 agreement-0.021
 unison-0.016
 horror-0.015
 Prince-0.015
 Freddie-0.014
Token and
Token ID290
Feature activation-1.27e-02

Pos logit contributions
 agreement+0.011
 unison+0.009
 horror+0.008
 Prince+0.008
 Freddie+0.008
Neg logit contributions
w-0.012
z-0.010
inks-0.009
s-0.009
mons-0.009
Token it
Token ID340
Feature activation-6.44e-03

Pos logit contributions
w+0.031
z+0.027
s+0.026
inks+0.026
ing+0.024
Neg logit contributions
 agreement-0.024
 unison-0.018
 horror-0.017
 Prince-0.016
 Freddie-0.016
Token was
Token ID373
Feature activation-1.49e-02

Pos logit contributions
w+0.013
z+0.011
s+0.011
inks+0.010
ing+0.010
Neg logit contributions
 agreement-0.015
 unison-0.012
 horror-0.012
 Prince-0.011
 Freddie-0.011
Token old
Token ID1468
Feature activation+1.21e-02

Pos logit contributions
w+0.003
z+0.003
inks+0.003
s+0.003
ings+0.002
Neg logit contributions
 agreement-0.003
 unison-0.002
 horror-0.002
 Prince-0.002
 Princess-0.002
Token man
Token ID582
Feature activation-5.12e-03

Pos logit contributions
 agreement+0.022
 horror+0.015
 unison+0.014
 Prince+0.014
 knight+0.014
Neg logit contributions
w-0.032
z-0.028
inks-0.027
s-0.026
ings-0.025
Token was
Token ID373
Feature activation-7.48e-03

Pos logit contributions
w+0.011
z+0.010
inks+0.009
ings+0.009
mons+0.009
Neg logit contributions
 agreement-0.011
 unison-0.008
 horror-0.008
 Prince-0.008
 Freddie-0.008
Token right
Token ID826
Feature activation-3.16e-02

Pos logit contributions
w+0.014
z+0.012
ing+0.011
s+0.011
inks+0.011
Neg logit contributions
 agreement-0.017
 unison-0.014
 horror-0.013
 Prince-0.013
 knight-0.012
Token--
Token ID438
Feature activation-2.05e-02

Pos logit contributions
w+0.066
z+0.055
s+0.051
inks+0.051
ings+0.048
Neg logit contributions
 agreement-0.073
 unison-0.057
 horror-0.056
 Prince-0.054
 Freddie-0.053
Tokenpat
Token ID8071
Feature activation+9.55e-02

Pos logit contributions
w+0.039
inks+0.034
s+0.031
z+0.030
mons+0.030
Neg logit contributions
 agreement-0.043
 Prince-0.033
 Princess-0.033
 knight-0.033
 Freddie-0.033
Tokenience
Token ID1240
Feature activation-4.02e-02

Pos logit contributions
 agreement+0.202
 unison+0.153
 horror+0.149
 Prince+0.144
 Freddie+0.141
Neg logit contributions
w-0.216
z-0.184
inks-0.172
s-0.172
ings-0.162
Token was
Token ID373
Feature activation-7.57e-03

Pos logit contributions
w+0.075
z+0.062
s+0.056
inks+0.056
ings+0.052
Neg logit contributions
 agreement-0.101
 unison-0.081
 horror-0.079
 Prince-0.077
 Freddie-0.075
Token key
Token ID1994
Feature activation-8.63e-03

Pos logit contributions
w+0.017
z+0.014
s+0.013
inks+0.013
ing+0.013
Neg logit contributions
 agreement-0.017
 unison-0.013
 horror-0.013
 Prince-0.012
 Princess-0.012
Token.
Token ID13
Feature activation+3.50e-03

Pos logit contributions
w+0.019
z+0.015
inks+0.015
mons+0.014
iffs+0.014
Neg logit contributions
 agreement-0.018
 unison-0.014
 horror-0.013
 knight-0.013
 Freddie-0.013
Token He
Token ID679
Feature activation+1.25e-03

Pos logit contributions
 agreement+0.007
 unison+0.005
 horror+0.005
 Prince+0.005
 Freddie+0.005
Neg logit contributions
w-0.009
z-0.007
s-0.007
inks-0.007
ings-0.007
Token Larry
Token ID13633
Feature activation-1.18e-02

Pos logit contributions
 agreement+0.005
 unison+0.003
 horror+0.003
 Prince+0.003
 Freddie+0.003
Neg logit contributions
w-0.012
z-0.010
s-0.010
inks-0.010
ings-0.009
Token was
Token ID373
Feature activation-4.17e-03

Pos logit contributions
w+0.028
z+0.026
inks+0.026
mons+0.026
ings+0.025
Neg logit contributions
 agreement-0.018
 end-0.013
 unison-0.013
 Freddie-0.013
 Princess-0.012
Token a
Token ID257
Feature activation+4.13e-03

Pos logit contributions
w+0.010
z+0.009
inks+0.008
s+0.008
ings+0.008
Neg logit contributions
 agreement-0.008
 horror-0.006
 unison-0.006
 Prince-0.006
 Princess-0.006
Token good
Token ID922
Feature activation+1.66e-03

Pos logit contributions
 agreement+0.009
 unison+0.007
 horror+0.007
 Prince+0.006
 Freddie+0.006
Neg logit contributions
w-0.009
z-0.008
s-0.007
inks-0.007
ing-0.007
Token-
Token ID12
Feature activation-1.16e-04

Pos logit contributions
 agreement+0.003
 knight+0.002
 Prince+0.002
 Princess+0.002
 horror+0.002
Neg logit contributions
w-0.004
z-0.004
inks-0.004
s-0.004
ings-0.003
Tokennat
Token ID32353
Feature activation+9.08e-02

Pos logit contributions
w+0.000
inks+0.000
z+0.000
ings+0.000
s+0.000
Neg logit contributions
 agreement-0.000
 Princess-0.000
 Prince-0.000
 knight-0.000
 Freddie-0.000
Tokenured
Token ID1522
Feature activation+1.09e-02

Pos logit contributions
 agreement+0.195
 unison+0.151
 horror+0.149
 Prince+0.140
 Freddie+0.137
Neg logit contributions
w-0.204
z-0.176
s-0.161
inks-0.159
mons-0.152
Token bird
Token ID6512
Feature activation+1.24e-02

Pos logit contributions
 agreement+0.019
 knight+0.013
 Prince+0.013
 horror+0.013
 Princess+0.013
Neg logit contributions
w-0.028
z-0.025
inks-0.024
s-0.023
mons-0.021
Token and
Token ID290
Feature activation-1.75e-03

Pos logit contributions
 agreement+0.022
 unison+0.016
 Prince+0.016
 Freddie+0.015
 Princess+0.015
Neg logit contributions
w-0.031
z-0.027
inks-0.027
mons-0.025
ings-0.024
Token always
Token ID1464
Feature activation-8.17e-04

Pos logit contributions
w+0.004
z+0.003
s+0.003
inks+0.003
ings+0.003
Neg logit contributions
 agreement-0.004
 unison-0.003
 horror-0.003
 Prince-0.003
 Freddie-0.003
Token looked
Token ID3114
Feature activation+2.26e-02

Pos logit contributions
w+0.002
z+0.002
inks+0.002
s+0.002
ings+0.002
Neg logit contributions
 agreement-0.002
 unison-0.001
 horror-0.001
 Freddie-0.001
 relief-0.001
Token.
Token ID13
Feature activation+1.23e-02

Pos logit contributions
w+0.010
z+0.009
s+0.008
inks+0.008
ings+0.008
Neg logit contributions
 agreement-0.012
 unison-0.009
 horror-0.009
 Prince-0.009
 Princess-0.009
Token
Token ID198
Feature activation+9.58e-03

Pos logit contributions
 agreement+0.029
 horror+0.022
 unison+0.021
 knight+0.021
 relief+0.021
Neg logit contributions
w-0.023
s-0.021
z-0.020
inks-0.019
mons-0.018
Token
Token ID198
Feature activation+1.26e-02

Pos logit contributions
 agreement+0.021
 unison+0.016
 horror+0.016
 Prince+0.015
 Freddie+0.015
Neg logit contributions
w-0.021
z-0.018
inks-0.016
s-0.016
ings-0.015
TokenThe
Token ID464
Feature activation+8.36e-03

Pos logit contributions
 agreement+0.029
 horror+0.023
 unison+0.023
 Prince+0.022
 relief+0.021
Neg logit contributions
w-0.026
z-0.022
s-0.020
inks-0.020
ing-0.019
Token all
Token ID477
Feature activation-9.99e-03

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.013
 Freddie+0.012
 Prince+0.012
Neg logit contributions
w-0.019
z-0.016
s-0.015
inks-0.015
ing-0.015
Tokenigator
Token ID23823
Feature activation+9.07e-02

Pos logit contributions
w+0.026
s+0.023
z+0.022
ing+0.021
o+0.020
Neg logit contributions
 agreement-0.017
 unison-0.013
 Prince-0.012
 horror-0.012
 Deal-0.012
Token was
Token ID373
Feature activation+9.73e-03

Pos logit contributions
 agreement+0.191
 unison+0.144
 horror+0.141
 Prince+0.136
 Freddie+0.132
Neg logit contributions
w-0.207
z-0.176
s-0.168
inks-0.163
ing-0.155
Token so
Token ID523
Feature activation-8.34e-03

Pos logit contributions
 agreement+0.021
 unison+0.017
 Prince+0.016
 horror+0.016
 knight+0.015
Neg logit contributions
w-0.020
z-0.018
s-0.016
inks-0.016
ings-0.015
Token upset
Token ID9247
Feature activation+3.32e-02

Pos logit contributions
w+0.021
z+0.018
inks+0.017
s+0.017
mons+0.016
Neg logit contributions
 agreement-0.016
 unison-0.012
 horror-0.011
 Prince-0.011
 Freddie-0.010
Token that
Token ID326
Feature activation+1.00e-02

Pos logit contributions
 agreement+0.077
 unison+0.060
 horror+0.058
 Prince+0.057
 Princess+0.055
Neg logit contributions
w-0.069
z-0.058
s-0.055
inks-0.054
ing-0.051
Token he
Token ID339
Feature activation+5.95e-03

Pos logit contributions
 agreement+0.021
 unison+0.016
 horror+0.015
 Prince+0.015
 relief+0.014
Neg logit contributions
w-0.023
z-0.019
s-0.019
inks-0.018
ing-0.017
Token,"
Token ID553
Feature activation+4.14e-03

Pos logit contributions
w+0.019
z+0.017
inks+0.016
s+0.016
mons+0.015
Neg logit contributions
 agreement-0.019
 unison-0.014
 horror-0.014
 Prince-0.014
 Freddie-0.014
Token she
Token ID673
Feature activation+1.12e-03

Pos logit contributions
 agreement+0.009
 unison+0.007
 horror+0.007
 knight+0.006
 Prince+0.006
Neg logit contributions
w-0.009
s-0.008
z-0.007
uck-0.007
inks-0.007
Token said
Token ID531
Feature activation-8.01e-03

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 End+0.002
 relief+0.002
Neg logit contributions
w-0.002
s-0.002
z-0.002
inks-0.002
uck-0.001
Token,
Token ID11
Feature activation+3.92e-03

Pos logit contributions
w+0.014
z+0.013
s+0.013
ings+0.012
ing+0.012
Neg logit contributions
 agreement-0.019
 unison-0.016
 horror-0.015
 knight-0.014
 Prince-0.014
Token "
Token ID366
Feature activation-9.83e-03

Pos logit contributions
 agreement+0.009
 unison+0.007
 horror+0.007
 knight+0.007
 end+0.007
Neg logit contributions
z-0.006
w-0.006
s-0.006
inks-0.006
elt-0.005
TokenMade
Token ID24616
Feature activation+8.99e-02

Pos logit contributions
w+0.019
z+0.016
inks+0.015
s+0.015
ings+0.015
Neg logit contributions
 agreement-0.023
 unison-0.018
 Prince-0.017
 horror-0.017
 Princess-0.017
Token this
Token ID428
Feature activation+1.67e-02

Pos logit contributions
 agreement+0.198
 unison+0.159
 horror+0.157
 Prince+0.149
 delight+0.148
Neg logit contributions
w-0.198
z-0.159
ing-0.147
inks-0.143
uck-0.140
Token cub
Token ID13617
Feature activation-1.08e-02

Pos logit contributions
 agreement+0.032
 unison+0.024
 horror+0.023
 Prince+0.022
 Freddie+0.022
Neg logit contributions
w-0.041
z-0.035
inks-0.033
s-0.033
ings-0.031
Tokenby
Token ID1525
Feature activation-2.55e-02

Pos logit contributions
w+0.020
z+0.018
inks+0.016
mons+0.015
ings+0.015
Neg logit contributions
 agreement-0.025
 unison-0.020
 Prince-0.019
 Freddie-0.019
 horror-0.019
Token house
Token ID2156
Feature activation-1.25e-02

Pos logit contributions
w+0.048
z+0.040
inks+0.037
s+0.036
mons+0.034
Neg logit contributions
 agreement-0.061
 unison-0.049
 horror-0.047
 Prince-0.047
 Freddie-0.046
Token all
Token ID477
Feature activation-2.06e-02

Pos logit contributions
w+0.021
z+0.018
inks+0.016
s+0.015
mons+0.015
Neg logit contributions
 agreement-0.032
 unison-0.026
 horror-0.025
 Prince-0.025
 Princess-0.025
Token her
Token ID607
Feature activation+7.77e-03

Pos logit contributions
 agreement+0.064
 knight+0.048
 unison+0.048
 Prince+0.047
 horror+0.047
Neg logit contributions
w-0.071
z-0.060
s-0.058
ings-0.058
inks-0.056
Token sweater
Token ID35685
Feature activation+1.22e-03

Pos logit contributions
 agreement+0.014
 unison+0.010
 horror+0.010
 Prince+0.009
 Freddie+0.009
Neg logit contributions
w-0.020
z-0.018
s-0.017
inks-0.017
ings-0.016
Token *
Token ID1635
Feature activation-4.39e-02

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.002
inks-0.002
s-0.002
ings-0.002
Tokenpo
Token ID7501
Feature activation+1.38e-02

Pos logit contributions
w+0.080
z+0.066
inks+0.062
s+0.061
ings+0.057
Neg logit contributions
 agreement-0.109
 unison-0.087
 horror-0.084
 Prince-0.083
 Princess-0.081
Tokenpped
Token ID1496
Feature activation-1.56e-02

Pos logit contributions
 agreement+0.028
 unison+0.020
 horror+0.019
 Prince+0.019
 end+0.019
Neg logit contributions
w-0.027
inks-0.025
mons-0.024
z-0.024
bel-0.023
Token*
Token ID9
Feature activation-1.17e-01

Pos logit contributions
w+0.033
z+0.028
s+0.026
inks+0.026
ings+0.024
Neg logit contributions
 agreement-0.035
 unison-0.027
 horror-0.027
 Prince-0.026
 Freddie-0.025
Token!
Token ID0
Feature activation+1.11e-04

Pos logit contributions
w+0.261
z+0.223
s+0.208
inks+0.207
ings+0.195
Neg logit contributions
 agreement-0.252
 unison-0.192
 horror-0.187
 Prince-0.181
 Freddie-0.177
Token Everyone
Token ID11075
Feature activation-2.51e-03

Pos logit contributions
 agreement+0.000
 unison+0.000
 horror+0.000
 Prince+0.000
 knight+0.000
Neg logit contributions
w-0.000
z-0.000
s-0.000
inks-0.000
uck-0.000
Token stopped
Token ID5025
Feature activation-1.79e-02

Pos logit contributions
w+0.005
z+0.005
inks+0.004
s+0.004
ings+0.004
Neg logit contributions
 agreement-0.006
 unison-0.004
 horror-0.004
 Prince-0.004
 Freddie-0.004
Token and
Token ID290
Feature activation+1.26e-04

Pos logit contributions
w+0.044
z+0.036
inks+0.033
mons+0.033
s+0.032
Neg logit contributions
 agreement-0.036
 horror-0.027
 unison-0.026
 Prince-0.024
 delight-0.024
Token looked
Token ID3114
Feature activation+2.32e-02

Pos logit contributions
 agreement+0.000
 unison+0.000
 horror+0.000
 Prince+0.000
 Freddie+0.000
Neg logit contributions
w-0.000
z-0.000
s-0.000
inks-0.000
ing-0.000
Token a
Token ID257
Feature activation+2.58e-03

Pos logit contributions
w+0.004
z+0.004
s+0.003
inks+0.003
ings+0.003
Neg logit contributions
 agreement-0.005
 unison-0.003
 horror-0.003
 Prince-0.003
 Princess-0.003
Token loud
Token ID7812
Feature activation-1.02e-02

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 Prince+0.004
 Freddie+0.004
Neg logit contributions
w-0.006
z-0.005
s-0.005
inks-0.005
ings-0.004
Token *
Token ID1635
Feature activation-3.90e-02

Pos logit contributions
w+0.024
z+0.021
s+0.020
inks+0.019
ing+0.019
Neg logit contributions
 agreement-0.020
 unison-0.015
 horror-0.014
 Prince-0.014
 Freddie-0.013
Tokenbe
Token ID1350
Feature activation-5.80e-03

Pos logit contributions
w+0.073
z+0.061
inks+0.058
s+0.057
ings+0.053
Neg logit contributions
 agreement-0.094
 unison-0.075
 horror-0.072
 Prince-0.071
 Princess-0.070
Tokenep
Token ID538
Feature activation-1.27e-02

Pos logit contributions
w+0.008
inks+0.006
z+0.006
s+0.006
mons+0.006
Neg logit contributions
 agreement-0.017
 unison-0.013
 horror-0.013
 delight-0.013
 relief-0.013
Token*
Token ID9
Feature activation-1.14e-01

Pos logit contributions
w+0.029
z+0.024
inks+0.023
s+0.023
mons+0.022
Neg logit contributions
 agreement-0.027
 unison-0.020
 horror-0.020
 Prince-0.019
 Princess-0.018
Token from
Token ID422
Feature activation+2.79e-02

Pos logit contributions
w+0.268
z+0.232
s+0.218
inks+0.212
ings+0.208
Neg logit contributions
 agreement-0.230
 unison-0.172
 horror-0.166
 Prince-0.163
 Princess-0.159
Token the
Token ID262
Feature activation+1.33e-03

Pos logit contributions
 agreement+0.055
 unison+0.042
 horror+0.039
 Prince+0.038
 Princess+0.037
Neg logit contributions
w-0.063
z-0.054
ings-0.049
inks-0.049
bel-0.048
Token oven
Token ID14361
Feature activation+1.39e-02

Pos logit contributions
 agreement+0.002
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.003
s-0.003
inks-0.003
ings-0.003
Token.
Token ID13
Feature activation+1.22e-04

Pos logit contributions
 agreement+0.031
 unison+0.023
 horror+0.023
 Prince+0.022
 Freddie+0.022
Neg logit contributions
w-0.030
z-0.026
s-0.024
inks-0.024
ings-0.023
Token The
Token ID383
Feature activation-6.72e-03

Pos logit contributions
 agreement+0.000
 unison+0.000
 horror+0.000
 knight+0.000
 Prince+0.000
Neg logit contributions
w-0.000
s-0.000
z-0.000
inks-0.000
ings-0.000
Token at
Token ID379
Feature activation+1.45e-03

Pos logit contributions
 agreement+0.045
 unison+0.037
 horror+0.035
 Prince+0.034
 Freddie+0.033
Neg logit contributions
w-0.036
z-0.030
s-0.029
inks-0.028
ing-0.027
Token them
Token ID606
Feature activation+3.30e-04

Pos logit contributions
 agreement+0.003
 knight+0.003
 unison+0.002
 respond+0.002
 horror+0.002
Neg logit contributions
w-0.003
s-0.002
ings-0.002
ing-0.002
z-0.002
Token with
Token ID351
Feature activation-1.08e-02

Pos logit contributions
 agreement+0.001
 unison+0.001
 Prince+0.001
 knight+0.001
 Princess+0.001
Neg logit contributions
w-0.001
z-0.001
s-0.000
ing-0.000
ings-0.000
Token surprise
Token ID5975
Feature activation+1.33e-02

Pos logit contributions
w+0.020
ings+0.017
s+0.017
z+0.016
inks+0.016
Neg logit contributions
 agreement-0.022
 Princess-0.018
 Prince-0.018
 knight-0.017
 Freddie-0.017
Token and
Token ID290
Feature activation-1.89e-03

Pos logit contributions
 agreement+0.032
 unison+0.026
 Prince+0.025
 Princess+0.025
 knight+0.025
Neg logit contributions
w-0.024
s-0.021
z-0.020
ing-0.019
ings-0.019
Token displeasure
Token ID47313
Feature activation-1.13e-01

Pos logit contributions
w+0.004
z+0.004
s+0.004
inks+0.003
ings+0.003
Neg logit contributions
 agreement-0.004
 unison-0.003
 Prince-0.003
 Freddie-0.003
 horror-0.003
Token.
Token ID13
Feature activation-3.80e-03

Pos logit contributions
w+0.233
s+0.198
z+0.197
ing+0.184
ings+0.182
Neg logit contributions
 agreement-0.264
 unison-0.209
 horror-0.200
 Prince-0.200
 Princess-0.196
Token
Token ID198
Feature activation-3.61e-03

Pos logit contributions
w+0.005
s+0.005
z+0.005
inks+0.005
ings+0.004
Neg logit contributions
 agreement-0.010
 unison-0.008
 hesitation-0.008
 knight-0.008
 horror-0.008
Token
Token ID198
Feature activation-2.73e-04

Pos logit contributions
w+0.007
z+0.006
s+0.006
inks+0.006
ings+0.005
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 Freddie-0.006
Token"
Token ID1
Feature activation+4.53e-04

Pos logit contributions
w+0.000
z+0.000
s+0.000
inks+0.000
ing+0.000
Neg logit contributions
 agreement-0.001
 horror-0.001
 Prince-0.001
 unison-0.001
 Princess-0.001
TokenBut
Token ID1537
Feature activation+7.90e-03

Pos logit contributions
 agreement+0.001
 horror+0.001
 unison+0.001
 hesitation+0.001
 Prince+0.001
Neg logit contributions
w-0.001
z-0.001
we-0.001
s-0.001
mons-0.001
Token work
Token ID670
Feature activation-6.01e-03

Pos logit contributions
w+0.015
z+0.014
inks+0.013
s+0.013
mons+0.012
Neg logit contributions
 agreement-0.012
 unison-0.009
 horror-0.009
 Prince-0.008
 relief-0.008
Token hard
Token ID1327
Feature activation-8.46e-03

Pos logit contributions
w+0.013
z+0.011
inks+0.010
s+0.010
mons+0.010
Neg logit contributions
 agreement-0.013
 unison-0.010
 horror-0.010
 Prince-0.010
 Freddie-0.010
Token."
Token ID526
Feature activation+6.13e-03

Pos logit contributions
w+0.018
z+0.015
inks+0.015
s+0.014
mons+0.014
Neg logit contributions
 agreement-0.019
 unison-0.014
 Prince-0.014
 horror-0.014
 Freddie-0.013
Token
Token ID198
Feature activation-7.33e-03

Pos logit contributions
 agreement+0.012
 unison+0.009
 horror+0.008
 knight+0.008
 Prince+0.008
Neg logit contributions
w-0.015
z-0.013
s-0.013
inks-0.012
ings-0.012
Token
Token ID198
Feature activation-4.83e-03

Pos logit contributions
w+0.016
z+0.014
s+0.013
inks+0.013
ings+0.012
Neg logit contributions
 agreement-0.016
 unison-0.012
 horror-0.012
 Prince-0.011
 Princess-0.011
Token*
Token ID9
Feature activation-1.13e-01

Pos logit contributions
w+0.010
z+0.009
s+0.008
inks+0.008
ing+0.008
Neg logit contributions
 agreement-0.011
 horror-0.008
 unison-0.008
 Prince-0.008
 Princess-0.008
Tokenwh
Token ID1929
Feature activation+2.91e-02

Pos logit contributions
w+0.244
z+0.205
s+0.195
ing+0.190
ings+0.186
Neg logit contributions
 agreement-0.240
 unison-0.197
 horror-0.194
 hesitation-0.178
 Prince-0.176
Tokenistle
Token ID12535
Feature activation+1.73e-02

Pos logit contributions
 agreement+0.068
 unison+0.053
 horror+0.053
 Prince+0.051
 Princess+0.050
Neg logit contributions
w-0.061
z-0.051
s-0.046
inks-0.046
ing-0.044
Token sound
Token ID2128
Feature activation-3.99e-03

Pos logit contributions
 agreement+0.037
 unison+0.028
 horror+0.027
 Prince+0.026
 Princess+0.026
Neg logit contributions
w-0.039
z-0.032
inks-0.031
s-0.030
mons-0.030
Token<|endoftext|>
Token ID50256
Feature activation+3.60e-03

Pos logit contributions
w+0.009
z+0.008
s+0.008
inks+0.008
ing+0.007
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 Freddie-0.006
TokenOne
Token ID3198
Feature activation-6.09e-03

Pos logit contributions
 agreement+0.010
 unison+0.008
 horror+0.008
 Prince+0.007
 relief+0.007
Neg logit contributions
w-0.006
z-0.005
s-0.005
inks-0.004
ing-0.004
Token danced
Token ID39480
Feature activation-1.32e-02

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.002
s-0.002
inks-0.002
ing-0.002
Token along
Token ID1863
Feature activation-1.63e-02

Pos logit contributions
w+0.027
z+0.023
s+0.021
inks+0.021
ing+0.020
Neg logit contributions
 agreement-0.031
 unison-0.024
 horror-0.024
 Prince-0.023
 Freddie-0.022
Token and
Token ID290
Feature activation-8.72e-03

Pos logit contributions
w+0.032
z+0.028
s+0.026
inks+0.025
ings+0.024
Neg logit contributions
 agreement-0.038
 unison-0.030
 horror-0.029
 Prince-0.028
 Freddie-0.028
Token the
Token ID262
Feature activation-5.93e-03

Pos logit contributions
w+0.022
z+0.019
s+0.018
inks+0.018
ing+0.017
Neg logit contributions
 agreement-0.017
 unison-0.012
 horror-0.012
 Prince-0.011
 Freddie-0.011
Token audience
Token ID5386
Feature activation-2.14e-02

Pos logit contributions
w+0.014
z+0.011
s+0.011
ing+0.011
inks+0.010
Neg logit contributions
 agreement-0.012
 unison-0.009
 horror-0.009
 Freddie-0.008
 hesitation-0.008
Token (
Token ID357
Feature activation-1.07e-01

Pos logit contributions
w+0.049
z+0.042
s+0.039
inks+0.039
ings+0.037
Neg logit contributions
 agreement-0.045
 unison-0.034
 horror-0.033
 Prince-0.032
 Freddie-0.031
TokenL
Token ID43
Feature activation+1.07e-03

Pos logit contributions
w+0.185
inks+0.154
z+0.141
ings+0.141
s+0.138
Neg logit contributions
 agreement-0.258
 unison-0.207
 Princess-0.200
 Prince-0.197
 knight-0.194
Tokenily
Token ID813
Feature activation-1.59e-02

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.003
z-0.003
s-0.003
inks-0.003
ings-0.003
Token's
Token ID338
Feature activation+2.79e-03

Pos logit contributions
w+0.035
z+0.028
mons+0.027
inks+0.027
ings+0.026
Neg logit contributions
 agreement-0.033
 unison-0.026
 horror-0.025
 Princess-0.024
 Prince-0.024
Token mom
Token ID1995
Feature activation-7.67e-03

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 Prince+0.004
 Freddie+0.004
Neg logit contributions
w-0.007
z-0.006
s-0.006
inks-0.006
ings-0.005
Token and
Token ID290
Feature activation-1.15e-02

Pos logit contributions
w+0.018
z+0.015
mons+0.014
inks+0.014
ings+0.013
Neg logit contributions
 agreement-0.016
 unison-0.012
 horror-0.011
 Prince-0.011
 Princess-0.011
Token friend
Token ID1545
Feature activation+6.73e-03

Pos logit contributions
 agreement+0.007
 unison+0.006
 horror+0.005
 Prince+0.005
 Freddie+0.005
Neg logit contributions
w-0.006
z-0.006
s-0.005
inks-0.005
ings-0.005
Token.
Token ID13
Feature activation+2.15e-03

Pos logit contributions
 agreement+0.016
 unison+0.013
 horror+0.012
 Prince+0.012
 relief+0.012
Neg logit contributions
w-0.013
z-0.011
s-0.011
inks-0.010
ing-0.010
Token
Token ID220
Feature activation+1.27e-02

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 Prince+0.004
 Freddie+0.003
Neg logit contributions
w-0.004
z-0.004
s-0.004
inks-0.004
ings-0.003
Token
Token ID198
Feature activation+1.28e-03

Pos logit contributions
 agreement+0.026
 unison+0.020
 horror+0.019
 Prince+0.019
 Freddie+0.018
Neg logit contributions
w-0.030
z-0.025
s-0.023
inks-0.023
mons-0.022
Token
Token ID198
Feature activation+4.30e-03

Pos logit contributions
 agreement+0.002
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.003
inks-0.003
s-0.003
mons-0.002
Token*
Token ID9
Feature activation-1.07e-01

Pos logit contributions
 agreement+0.009
 unison+0.007
 horror+0.007
 Prince+0.007
 Freddie+0.007
Neg logit contributions
w-0.009
z-0.008
inks-0.008
s-0.007
ings-0.007
TokenThe
Token ID464
Feature activation+1.70e-03

Pos logit contributions
w+0.233
z+0.198
s+0.183
inks+0.176
ings+0.176
Neg logit contributions
 agreement-0.236
 unison-0.182
 horror-0.181
 Prince-0.177
 Freddie-0.172
Token End
Token ID5268
Feature activation+1.16e-04

Pos logit contributions
 agreement+0.002
 unison+0.002
 Prince+0.002
 Freddie+0.002
 Princess+0.001
Neg logit contributions
w-0.005
z-0.004
s-0.004
inks-0.004
mons-0.004
Token*
Token ID9
Feature activation-1.06e-01

Pos logit contributions
 agreement+0.000
 unison+0.000
 horror+0.000
 Prince+0.000
 Freddie+0.000
Neg logit contributions
w-0.000
z-0.000
inks-0.000
s-0.000
mons-0.000
Token<|endoftext|>
Token ID50256
Feature activation+1.08e-02

Pos logit contributions
w+0.247
z+0.208
s+0.200
uck+0.188
ing+0.187
Neg logit contributions
 agreement-0.219
 unison-0.165
 horror-0.163
 Prince-0.158
 knight-0.155
TokenOnce
Token ID7454
Feature activation+2.32e-02

Pos logit contributions
 agreement+0.028
 unison+0.023
 horror+0.023
 Prince+0.022
 relief+0.021
Neg logit contributions
w-0.019
z-0.015
s-0.014
inks-0.014
ing-0.013
Token he
Token ID339
Feature activation+5.93e-04

Pos logit contributions
 agreement+0.018
 unison+0.014
 horror+0.013
 Freddie+0.012
 Prince+0.012
Neg logit contributions
w-0.018
z-0.015
s-0.014
ing-0.014
ings-0.013
Token heard
Token ID2982
Feature activation+2.16e-03

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.001
z-0.001
s-0.001
inks-0.001
ing-0.001
Token a
Token ID257
Feature activation+8.56e-03

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.003
 Freddie+0.003
 Prince+0.003
Neg logit contributions
w-0.004
z-0.004
s-0.004
ing-0.004
inks-0.004
Token whistle
Token ID16121
Feature activation+1.62e-03

Pos logit contributions
 agreement+0.016
 unison+0.012
 horror+0.012
 Prince+0.012
 Freddie+0.012
Neg logit contributions
w-0.020
z-0.017
s-0.017
inks-0.016
ing-0.016
Token!
Token ID0
Feature activation+4.50e-03

Pos logit contributions
 agreement+0.004
 unison+0.003
 horror+0.003
 Prince+0.003
 Freddie+0.003
Neg logit contributions
w-0.003
z-0.003
s-0.003
ing-0.003
inks-0.003
Token*
Token ID9
Feature activation-1.07e-01

Pos logit contributions
 agreement+0.010
 unison+0.008
 horror+0.008
 hesitation+0.007
 knight+0.007
Neg logit contributions
w-0.009
z-0.008
s-0.008
inks-0.007
ings-0.007
Token He
Token ID679
Feature activation+1.11e-02

Pos logit contributions
w+0.226
z+0.191
s+0.182
ing+0.170
inks+0.169
Neg logit contributions
 agreement-0.230
 unison-0.181
 horror-0.178
 Prince-0.171
 Freddie-0.168
Token was
Token ID373
Feature activation+2.22e-03

Pos logit contributions
 agreement+0.025
 unison+0.019
 horror+0.019
 Prince+0.018
 Freddie+0.018
Neg logit contributions
w-0.024
z-0.021
s-0.020
inks-0.019
ing-0.018
Token so
Token ID523
Feature activation-1.73e-02

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.003
 Prince+0.003
 Freddie+0.003
Neg logit contributions
w-0.005
z-0.004
inks-0.004
s-0.004
ing-0.004
Token happy
Token ID3772
Feature activation+8.52e-03

Pos logit contributions
w+0.047
z+0.041
s+0.038
inks+0.038
mons+0.038
Neg logit contributions
 agreement-0.030
 unison-0.020
 horror-0.020
 Prince-0.019
 delight-0.019
Token and
Token ID290
Feature activation+6.76e-03

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.013
 Prince+0.013
 Freddie+0.013
Neg logit contributions
w-0.020
z-0.017
inks-0.015
mons-0.015
s-0.015
Token\n
Token ID198
Feature activation+1.28e-03

Pos logit contributions
 agreement+0.026
 unison+0.020
 horror+0.019
 Prince+0.019
 Freddie+0.018
Neg logit contributions
w-0.030
z-0.025
s-0.023
inks-0.023
mons-0.022
Token\n
Token ID198
Feature activation+4.30e-03

Pos logit contributions
 agreement+0.002
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.003
inks-0.003
s-0.003
mons-0.002
Token*
Token ID9
Feature activation-1.07e-01

Pos logit contributions
 agreement+0.009
 unison+0.007
 horror+0.007
 Prince+0.007
 Freddie+0.007
Neg logit contributions
w-0.009
z-0.008
inks-0.008
s-0.007
ings-0.007
TokenThe
Token ID464
Feature activation+1.70e-03

Pos logit contributions
w+0.233
z+0.198
s+0.183
inks+0.176
ings+0.176
Neg logit contributions
 agreement-0.236
 unison-0.182
 horror-0.181
 Prince-0.177
 Freddie-0.172
Token End
Token ID5268
Feature activation+1.16e-04

Pos logit contributions
 agreement+0.002
 unison+0.002
 Prince+0.002
 Freddie+0.002
 Princess+0.001
Neg logit contributions
w-0.005
z-0.004
s-0.004
inks-0.004
mons-0.004
Token*
Token ID9
Feature activation-1.06e-01

Pos logit contributions
 agreement+0.000
 unison+0.000
 horror+0.000
 Prince+0.000
 Freddie+0.000
Neg logit contributions
w-0.000
z-0.000
inks-0.000
s-0.000
mons-0.000
Token<|endoftext|>
Token ID50256
Feature activation+1.08e-02

Pos logit contributions
w+0.247
z+0.208
s+0.200
uck+0.188
ing+0.187
Neg logit contributions
 agreement-0.219
 unison-0.165
 horror-0.163
 Prince-0.158
 knight-0.155
TokenOnce
Token ID7454
Feature activation+2.32e-02

Pos logit contributions
 agreement+0.028
 unison+0.023
 horror+0.023
 Prince+0.022
 relief+0.021
Neg logit contributions
w-0.019
z-0.015
s-0.014
inks-0.014
ing-0.013
Token upon
Token ID2402
Feature activation-4.77e-03

Pos logit contributions
 agreement+0.057
 horror+0.044
 unison+0.043
 Freddie+0.042
 Prince+0.042
Neg logit contributions
w-0.044
z-0.037
inks-0.033
mons-0.032
s-0.030
Token a
Token ID257
Feature activation+9.89e-03

Pos logit contributions
w+0.011
z+0.010
s+0.009
inks+0.009
ings+0.009
Neg logit contributions
 agreement-0.009
 unison-0.007
 horror-0.007
 Prince-0.007
 Freddie-0.006
Token time
Token ID640
Feature activation+4.85e-03

Pos logit contributions
 agreement+0.013
 unison+0.008
 horror+0.008
 Prince+0.008
 Princess+0.007
Neg logit contributions
w-0.030
z-0.027
s-0.025
inks-0.025
ings-0.024
Token hair
Token ID4190
Feature activation-2.31e-03

Pos logit contributions
w+0.010
z+0.008
s+0.008
inks+0.008
ings+0.008
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 Freddie-0.006
Token tied
Token ID8165
Feature activation+5.62e-03

Pos logit contributions
w+0.005
z+0.004
inks+0.004
s+0.004
mons+0.004
Neg logit contributions
 agreement-0.005
 unison-0.004
 horror-0.004
 Prince-0.003
 Princess-0.003
Token in
Token ID287
Feature activation-1.60e-02

Pos logit contributions
 agreement+0.013
 unison+0.010
 Freddie+0.010
 horror+0.010
 Prince+0.009
Neg logit contributions
w-0.011
z-0.009
inks-0.009
s-0.009
mons-0.008
Token two
Token ID734
Feature activation-3.12e-02

Pos logit contributions
w+0.034
z+0.030
s+0.028
ings+0.028
inks+0.027
Neg logit contributions
 agreement-0.032
 horror-0.025
 knight-0.024
 unison-0.024
 Prince-0.024
Token pig
Token ID12967
Feature activation-2.73e-02

Pos logit contributions
w+0.068
s+0.059
z+0.059
inks+0.056
ing+0.053
Neg logit contributions
 agreement-0.066
 unison-0.050
 horror-0.049
 Prince-0.048
 Freddie-0.047
Tokentails
Token ID26404
Feature activation-1.00e-01

Pos logit contributions
w+0.045
z+0.034
inks+0.031
s+0.029
uck+0.029
Neg logit contributions
 agreement-0.072
 unison-0.060
 Prince-0.057
 horror-0.057
 Freddie-0.056
Token.
Token ID13
Feature activation-7.19e-03

Pos logit contributions
w+0.193
z+0.172
s+0.163
inks+0.155
ings+0.149
Neg logit contributions
 agreement-0.238
 horror-0.185
 Prince-0.181
 relief-0.181
 unison-0.181
Token "
Token ID366
Feature activation-2.68e-02

Pos logit contributions
w+0.012
s+0.011
inks+0.010
z+0.010
elt+0.010
Neg logit contributions
 agreement-0.018
 unison-0.014
 horror-0.014
 relief-0.014
 knight-0.014
TokenM
Token ID44
Feature activation-9.88e-03

Pos logit contributions
w+0.057
z+0.048
s+0.045
inks+0.045
ings+0.042
Neg logit contributions
 agreement-0.060
 unison-0.047
 horror-0.045
 Prince-0.044
 Freddie-0.043
Tokenummy
Token ID13513
Feature activation-3.87e-02

Pos logit contributions
w+0.017
z+0.014
s+0.013
inks+0.013
ings+0.012
Neg logit contributions
 agreement-0.026
 unison-0.021
 horror-0.020
 Prince-0.020
 Princess-0.020
Token!"
Token ID2474
Feature activation+9.68e-03

Pos logit contributions
w+0.081
z+0.070
inks+0.066
s+0.063
mons+0.062
Neg logit contributions
 agreement-0.083
 unison-0.064
 Princess-0.063
 Prince-0.063
 Freddie-0.063
Token,"
Token ID553
Feature activation-3.46e-03

Pos logit contributions
w+0.040
z+0.033
inks+0.031
s+0.030
mons+0.029
Neg logit contributions
 agreement-0.057
 unison-0.045
 horror-0.044
 Prince-0.043
 Freddie-0.043
Token the
Token ID262
Feature activation-1.89e-03

Pos logit contributions
w+0.010
z+0.009
inks+0.008
s+0.008
ing+0.008
Neg logit contributions
 agreement-0.005
 horror-0.004
 unison-0.004
 Prince-0.003
 relief-0.003
Token bird
Token ID6512
Feature activation+6.27e-03

Pos logit contributions
w+0.005
z+0.004
s+0.004
inks+0.004
ings+0.003
Neg logit contributions
 agreement-0.004
 unison-0.003
 horror-0.003
 Prince-0.003
 Freddie-0.003
Token said
Token ID531
Feature activation-9.00e-03

Pos logit contributions
 agreement+0.013
 unison+0.009
 horror+0.009
 Prince+0.009
 Freddie+0.008
Neg logit contributions
w-0.015
z-0.013
s-0.012
inks-0.012
ings-0.011
Token.
Token ID13
Feature activation+6.02e-04

Pos logit contributions
w+0.021
z+0.018
inks+0.017
s+0.017
mons+0.016
Neg logit contributions
 agreement-0.019
 unison-0.014
 horror-0.014
 Prince-0.013
 Freddie-0.013
Token (
Token ID357
Feature activation-9.59e-02

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.001
z-0.001
s-0.001
inks-0.001
ings-0.001
TokenOr
Token ID5574
Feature activation+2.11e-03

Pos logit contributions
w+0.198
z+0.171
inks+0.170
s+0.159
ings+0.157
Neg logit contributions
 agreement-0.213
 unison-0.163
 horror-0.155
 relief-0.148
 Prince-0.147
Token,
Token ID11
Feature activation-3.78e-03

Pos logit contributions
 agreement+0.004
 unison+0.003
 horror+0.003
 Prince+0.003
 Princess+0.003
Neg logit contributions
w-0.005
z-0.005
inks-0.004
s-0.004
ings-0.004
Token if
Token ID611
Feature activation-2.58e-02

Pos logit contributions
w+0.008
z+0.006
s+0.006
inks+0.005
mons+0.005
Neg logit contributions
 agreement-0.009
 unison-0.007
 horror-0.007
 Prince-0.006
 Freddie-0.006
Token you
Token ID345
Feature activation-2.27e-03

Pos logit contributions
w+0.048
z+0.043
inks+0.040
s+0.036
ings+0.035
Neg logit contributions
 agreement-0.062
 unison-0.050
 horror-0.049
 Prince-0.049
 Princess-0.048
Token think
Token ID892
Feature activation-1.82e-02

Pos logit contributions
w+0.005
z+0.004
s+0.004
inks+0.004
ings+0.004
Neg logit contributions
 agreement-0.005
 unison-0.004
 horror-0.004
 Prince-0.004
 Princess-0.004
Token calling
Token ID4585
Feature activation-2.47e-02

Pos logit contributions
w+0.001
z+0.001
inks+0.001
s+0.001
mons+0.001
Neg logit contributions
 agreement-0.002
 horror-0.002
 unison-0.001
 Prince-0.001
 Freddie-0.001
Token him
Token ID683
Feature activation-2.61e-02

Pos logit contributions
w+0.047
z+0.039
s+0.037
inks+0.035
ing+0.035
Neg logit contributions
 agreement-0.059
 unison-0.048
 horror-0.046
 Prince-0.045
 Princess-0.044
Token Tim
Token ID5045
Feature activation-1.87e-02

Pos logit contributions
w+0.055
z+0.045
inks+0.042
s+0.041
mons+0.040
Neg logit contributions
 agreement-0.060
 unison-0.046
 horror-0.046
 Prince-0.045
 Freddie-0.044
Tokenmy
Token ID1820
Feature activation-1.96e-02

Pos logit contributions
w+0.037
z+0.031
inks+0.030
s+0.028
mons+0.028
Neg logit contributions
 agreement-0.043
 unison-0.033
 Prince-0.033
 horror-0.033
 Freddie-0.032
Token the
Token ID262
Feature activation-5.95e-03

Pos logit contributions
w+0.042
z+0.035
inks+0.033
s+0.032
mons+0.030
Neg logit contributions
 agreement-0.044
 unison-0.034
 horror-0.034
 Prince-0.033
 Freddie-0.032
Token Couch
Token ID48225
Feature activation-9.54e-02

Pos logit contributions
w+0.013
z+0.012
s+0.011
inks+0.010
ing+0.010
Neg logit contributions
 agreement-0.012
 unison-0.010
 Freddie-0.009
 horror-0.009
 Prince-0.009
Token-
Token ID12
Feature activation-3.77e-03

Pos logit contributions
w+0.204
z+0.173
s+0.162
inks+0.160
ing+0.151
Neg logit contributions
 agreement-0.214
 unison-0.165
 horror-0.161
 Prince-0.156
 Princess-0.151
TokenP
Token ID47
Feature activation+7.12e-03

Pos logit contributions
w+0.005
inks+0.005
ings+0.005
z+0.004
s+0.004
Neg logit contributions
 agreement-0.009
 Princess-0.008
 Prince-0.008
 knight-0.008
 Freddie-0.008
Tokenusher
Token ID34055
Feature activation+2.65e-03

Pos logit contributions
 agreement+0.017
 unison+0.014
 horror+0.013
 Prince+0.012
 knight+0.012
Neg logit contributions
w-0.014
z-0.011
s-0.010
inks-0.010
ings-0.010
Token!
Token ID0
Feature activation-7.07e-03

Pos logit contributions
 agreement+0.006
 unison+0.005
 horror+0.005
 Prince+0.005
 relief+0.005
Neg logit contributions
w-0.005
z-0.004
s-0.004
ing-0.004
ings-0.004
Token<|endoftext|>
Token ID50256
Feature activation-9.33e-03

Pos logit contributions
w+0.015
z+0.013
s+0.013
inks+0.012
ings+0.011
Neg logit contributions
 agreement-0.016
 unison-0.012
 horror-0.012
 Prince-0.011
 knight-0.011
Token smiled
Token ID13541
Feature activation-2.09e-02

Pos logit contributions
w+0.020
z+0.017
inks+0.016
s+0.015
mons+0.015
Neg logit contributions
 agreement-0.020
 unison-0.015
 horror-0.015
 Prince-0.014
 Princess-0.014
Token,
Token ID11
Feature activation-6.75e-03

Pos logit contributions
w+0.032
s+0.030
z+0.029
ing+0.028
inks+0.027
Neg logit contributions
 agreement-0.053
 unison-0.044
 horror-0.043
 Freddie-0.041
 Prince-0.040
Token happy
Token ID3772
Feature activation-3.38e-03

Pos logit contributions
w+0.012
s+0.011
z+0.011
inks+0.010
elt+0.010
Neg logit contributions
 agreement-0.015
 unison-0.012
 horror-0.012
 Prince-0.011
 knight-0.011
Token that
Token ID326
Feature activation-8.72e-03

Pos logit contributions
w+0.006
z+0.005
s+0.005
inks+0.004
ing+0.004
Neg logit contributions
 agreement-0.009
 unison-0.007
 horror-0.007
 Prince-0.007
 Freddie-0.007
Token the
Token ID262
Feature activation-5.68e-03

Pos logit contributions
w+0.018
s+0.015
z+0.015
ing+0.014
ings+0.013
Neg logit contributions
 agreement-0.020
 unison-0.016
 horror-0.014
 knight-0.014
 relief-0.013
Token crisis
Token ID4902
Feature activation-9.48e-02

Pos logit contributions
w+0.013
z+0.011
s+0.010
ing+0.010
mons+0.009
Neg logit contributions
 agreement-0.010
 unison-0.008
 horror-0.008
 Freddie-0.008
 Prince-0.007
Token was
Token ID373
Feature activation-3.27e-03

Pos logit contributions
w+0.177
z+0.147
s+0.136
inks+0.134
ings+0.125
Neg logit contributions
 agreement-0.237
 unison-0.188
 horror-0.184
 Prince-0.180
 Freddie-0.176
Token aver
Token ID9076
Feature activation+3.10e-03

Pos logit contributions
w+0.007
z+0.006
s+0.005
ing+0.005
ings+0.005
Neg logit contributions
 agreement-0.007
 unison-0.006
 horror-0.006
 Prince-0.005
 knight-0.005
Tokented
Token ID1513
Feature activation-2.30e-02

Pos logit contributions
 agreement+0.007
 unison+0.006
 Prince+0.005
 Freddie+0.005
 Princess+0.005
Neg logit contributions
w-0.006
z-0.005
inks-0.005
s-0.005
ings-0.004
Token.
Token ID13
Feature activation+3.82e-03

Pos logit contributions
w+0.052
z+0.045
inks+0.042
s+0.041
mons+0.039
Neg logit contributions
 agreement-0.048
 unison-0.036
 horror-0.036
 Prince-0.035
 Freddie-0.034
Token The
Token ID383
Feature activation-7.77e-03

Pos logit contributions
 agreement+0.009
 unison+0.007
 knight+0.007
 hesitation+0.007
 horror+0.007
Neg logit contributions
w-0.007
s-0.006
z-0.006
inks-0.006
o-0.006
Token,"
Token ID553
Feature activation+6.98e-03

Pos logit contributions
 agreement+0.015
 unison+0.012
 horror+0.011
 Prince+0.011
 Freddie+0.011
Neg logit contributions
w-0.013
z-0.011
s-0.010
inks-0.010
ings-0.010
TokenBe
Token ID3856
Feature activation+6.11e-03

Pos logit contributions
 agreement+0.014
 horror+0.011
 unison+0.010
 knight+0.010
 Prince+0.010
Neg logit contributions
w-0.016
z-0.014
s-0.013
inks-0.012
uck-0.012
Token good
Token ID922
Feature activation-1.14e-03

Pos logit contributions
 agreement+0.013
 horror+0.010
 unison+0.010
 Prince+0.010
 Freddie+0.009
Neg logit contributions
w-0.014
z-0.012
inks-0.011
uck-0.010
s-0.010
Token,
Token ID11
Feature activation+1.62e-03

Pos logit contributions
w+0.002
z+0.002
inks+0.002
s+0.002
mons+0.002
Neg logit contributions
 agreement-0.003
 unison-0.002
 horror-0.002
 Prince-0.002
 Princess-0.002
Token Holly
Token ID7764
Feature activation+3.53e-03

Pos logit contributions
 agreement+0.003
 unison+0.003
 horror+0.003
 knight+0.003
 Prince+0.003
Neg logit contributions
w-0.003
s-0.002
z-0.002
inks-0.002
elt-0.002
Token,.
Token ID38508
Feature activation-9.40e-02

Pos logit contributions
 agreement+0.008
 unison+0.006
 horror+0.006
 relief+0.006
 Prince+0.006
Neg logit contributions
w-0.007
z-0.006
s-0.006
inks-0.006
ing-0.005
Token I
Token ID314
Feature activation-2.28e-03

Pos logit contributions
w+0.206
z+0.175
s+0.161
inks+0.160
ings+0.151
Neg logit contributions
 agreement-0.209
 unison-0.160
 horror-0.156
 Prince-0.154
 Princess-0.151
Token'll
Token ID1183
Feature activation-4.16e-03

Pos logit contributions
w+0.004
z+0.003
s+0.003
inks+0.003
ings+0.003
Neg logit contributions
 agreement-0.006
 unison-0.005
 horror-0.005
 Prince-0.005
 Freddie-0.005
Token be
Token ID307
Feature activation-2.03e-03

Pos logit contributions
w+0.009
z+0.008
s+0.008
inks+0.007
ings+0.007
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Freddie-0.006
 Prince-0.006
Token back
Token ID736
Feature activation+5.53e-03

Pos logit contributions
w+0.004
z+0.004
uck+0.003
inks+0.003
ing+0.003
Neg logit contributions
 agreement-0.004
 unison-0.004
 horror-0.004
 Freddie-0.003
 Prince-0.003
Token soon
Token ID2582
Feature activation-1.52e-02

Pos logit contributions
 agreement+0.011
 unison+0.008
 Prince+0.008
 Princess+0.007
 horror+0.007
Neg logit contributions
w-0.014
z-0.011
inks-0.011
s-0.011
mons-0.011
TokenNO
Token ID15285
Feature activation-6.55e-04

Pos logit contributions
w+0.008
z+0.007
inks+0.006
s+0.006
ings+0.006
Neg logit contributions
 agreement-0.007
 unison-0.005
 horror-0.005
 Prince-0.005
 relief-0.005
Token,
Token ID11
Feature activation-1.35e-03

Pos logit contributions
w+0.001
z+0.001
inks+0.001
s+0.001
mons+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Prince-0.001
 Freddie-0.001
Token L
Token ID406
Feature activation-2.68e-02

Pos logit contributions
w+0.003
s+0.002
mons+0.002
uck+0.002
ew+0.002
Neg logit contributions
 agreement-0.003
 unison-0.002
 hesitation-0.002
 Freddie-0.002
 Prince-0.002
TokenILY
Token ID33340
Feature activation+2.40e-02

Pos logit contributions
w+0.070
z+0.061
s+0.057
inks+0.057
mons+0.055
Neg logit contributions
 agreement-0.049
 unison-0.035
 horror-0.034
 Prince-0.033
 Freddie-0.032
Token!
Token ID0
Feature activation-1.34e-02

Pos logit contributions
 agreement+0.054
 unison+0.041
 horror+0.041
 Prince+0.039
 Freddie+0.038
Neg logit contributions
w-0.051
z-0.044
inks-0.041
s-0.040
mons-0.038
Token STOP
Token ID44934
Feature activation-9.16e-02

Pos logit contributions
w+0.031
z+0.027
inks+0.025
s+0.025
ings+0.024
Neg logit contributions
 agreement-0.027
 horror-0.021
 unison-0.020
 Prince-0.020
 Princess-0.019
Token!"
Token ID2474
Feature activation-3.43e-03

Pos logit contributions
w+0.174
z+0.160
bel+0.147
inks+0.145
mons+0.144
Neg logit contributions
 agreement-0.202
 unison-0.156
 Freddie-0.151
 horror-0.149
 Prince-0.148
Token Mom
Token ID11254
Feature activation-1.47e-02

Pos logit contributions
w+0.009
z+0.008
s+0.008
inks+0.008
ings+0.007
Neg logit contributions
 agreement-0.006
 unison-0.004
 horror-0.004
 Freddie-0.004
 Prince-0.004
Token shouted
Token ID17293
Feature activation-1.37e-02

Pos logit contributions
w+0.034
z+0.030
s+0.028
inks+0.027
ing+0.026
Neg logit contributions
 agreement-0.031
 unison-0.023
 horror-0.022
 Prince-0.022
 Princess-0.021
Token.
Token ID13
Feature activation-7.53e-03

Pos logit contributions
w+0.026
z+0.022
s+0.022
ing+0.021
inks+0.021
Neg logit contributions
 agreement-0.032
 unison-0.027
 Prince-0.025
 horror-0.024
 Freddie-0.024
Token\n
Token ID198
Feature activation-8.94e-03

Pos logit contributions
w+0.014
s+0.012
z+0.012
inks+0.011
ings+0.011
Neg logit contributions
 agreement-0.019
 unison-0.015
 horror-0.014
 Freddie-0.014
 Prince-0.014
Token He
Token ID679
Feature activation-9.55e-03

Pos logit contributions
w+0.031
z+0.026
s+0.026
inks+0.025
ings+0.024
Neg logit contributions
 agreement-0.032
 unison-0.025
 horror-0.023
 Prince-0.022
 hesitation-0.022
Token says
Token ID1139
Feature activation-1.21e-02

Pos logit contributions
w+0.019
z+0.016
inks+0.014
ings+0.014
mons+0.014
Neg logit contributions
 agreement-0.022
 horror-0.018
 unison-0.018
 Prince-0.017
 Princess-0.017
Token,
Token ID11
Feature activation-1.23e-02

Pos logit contributions
w+0.028
z+0.025
inks+0.023
mons+0.022
s+0.022
Neg logit contributions
 agreement-0.024
 horror-0.018
 Prince-0.018
 Freddie-0.017
 Princess-0.017
Token "
Token ID366
Feature activation-3.18e-02

Pos logit contributions
w+0.025
s+0.022
z+0.021
ing+0.019
inks+0.019
Neg logit contributions
 agreement-0.028
 unison-0.023
 horror-0.020
 delight-0.020
 Prince-0.019
TokenO
Token ID46
Feature activation-8.91e-03

Pos logit contributions
w+0.068
z+0.060
inks+0.058
s+0.056
ings+0.054
Neg logit contributions
 agreement-0.068
 unison-0.051
 horror-0.051
 Prince-0.049
 relief-0.048
Tokenw
Token ID86
Feature activation-9.14e-02

Pos logit contributions
bel+0.005
to+0.004
elt+0.003
z+0.003
inks+0.003
Neg logit contributions
 agreement-0.022
 end-0.022
kin-0.022
 End-0.021
 relief-0.021
Token,
Token ID11
Feature activation-1.27e-02

Pos logit contributions
w+0.202
z+0.178
inks+0.167
mons+0.164
bel+0.164
Neg logit contributions
 agreement-0.181
 horror-0.133
 unison-0.132
 Freddie-0.131
 Prince-0.129
Token ow
Token ID12334
Feature activation-4.35e-03

Pos logit contributions
w+0.034
z+0.028
s+0.027
uck+0.026
inks+0.026
Neg logit contributions
 agreement-0.023
 unison-0.017
 horror-0.014
 Prince-0.014
 Princess-0.014
Token,
Token ID11
Feature activation-1.26e-02

Pos logit contributions
w+0.010
z+0.009
inks+0.009
bel+0.009
mons+0.008
Neg logit contributions
 agreement-0.008
 horror-0.006
 end-0.006
 unison-0.005
 Freddie-0.005
Token ow
Token ID12334
Feature activation-4.20e-03

Pos logit contributions
w+0.033
z+0.027
s+0.026
uck+0.025
ing+0.025
Neg logit contributions
 agreement-0.023
 unison-0.018
 Prince-0.015
 horror-0.015
 Princess-0.015
Token!
Token ID0
Feature activation-2.28e-02

Pos logit contributions
w+0.009
z+0.008
inks+0.008
mons+0.007
bel+0.007
Neg logit contributions
 agreement-0.008
 horror-0.006
 unison-0.006
 end-0.006
 Freddie-0.006
Token make
Token ID787
Feature activation-1.10e-02

Pos logit contributions
 agreement+0.020
 unison+0.015
 horror+0.014
 Prince+0.014
 Princess+0.014
Neg logit contributions
w-0.024
z-0.021
inks-0.019
s-0.019
ing-0.019
Token bra
Token ID8290
Feature activation+1.31e-02

Pos logit contributions
w+0.025
z+0.022
s+0.020
inks+0.020
ings+0.019
Neg logit contributions
 agreement-0.023
 unison-0.017
 horror-0.017
 Prince-0.016
 Freddie-0.016
Tokenids
Token ID2340
Feature activation+6.30e-03

Pos logit contributions
 agreement+0.026
 unison+0.019
 horror+0.018
 Prince+0.018
 Princess+0.017
Neg logit contributions
w-0.032
z-0.027
s-0.026
inks-0.026
mons-0.024
Token and
Token ID290
Feature activation-5.50e-04

Pos logit contributions
 agreement+0.014
 unison+0.011
 horror+0.011
 Prince+0.010
 Freddie+0.010
Neg logit contributions
w-0.013
z-0.011
s-0.011
inks-0.010
ings-0.010
Token pony
Token ID26902
Feature activation+6.43e-03

Pos logit contributions
w+0.001
z+0.001
s+0.001
inks+0.001
mons+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Prince-0.001
 Freddie-0.001
Tokentails
Token ID26404
Feature activation-9.06e-02

Pos logit contributions
 agreement+0.013
 unison+0.010
 Prince+0.010
 Princess+0.010
 horror+0.010
Neg logit contributions
w-0.015
z-0.012
inks-0.012
s-0.011
mons-0.011
Token and
Token ID290
Feature activation-2.93e-03

Pos logit contributions
w+0.198
z+0.168
inks+0.157
s+0.156
ings+0.147
Neg logit contributions
 agreement-0.198
 unison-0.151
 horror-0.148
 Prince-0.143
 Princess-0.139
Token curls
Token ID49507
Feature activation+1.09e-02

Pos logit contributions
w+0.007
z+0.006
s+0.005
inks+0.005
mons+0.005
Neg logit contributions
 agreement-0.006
 unison-0.005
 horror-0.004
 Prince-0.004
 Freddie-0.004
Token.
Token ID13
Feature activation-8.44e-03

Pos logit contributions
 agreement+0.021
 unison+0.015
 Prince+0.015
 horror+0.015
 Princess+0.015
Neg logit contributions
w-0.026
z-0.022
inks-0.021
s-0.020
mons-0.020
Token You
Token ID921
Feature activation+1.06e-02

Pos logit contributions
w+0.016
z+0.014
inks+0.013
s+0.013
mons+0.012
Neg logit contributions
 agreement-0.020
 unison-0.016
 horror-0.015
 Freddie-0.015
 Prince-0.015
Token can
Token ID460
Feature activation+6.48e-03

Pos logit contributions
 agreement+0.025
 unison+0.019
 horror+0.019
 Prince+0.018
 Princess+0.018
Neg logit contributions
w-0.021
z-0.018
inks-0.017
s-0.017
ings-0.016
Token asked
Token ID1965
Feature activation+7.64e-03

Pos logit contributions
 agreement+0.004
 unison+0.003
 horror+0.003
 Prince+0.003
 Freddie+0.003
Neg logit contributions
w-0.005
z-0.005
inks-0.004
s-0.004
ings-0.004
Token her
Token ID607
Feature activation+1.04e-02

Pos logit contributions
 agreement+0.014
 unison+0.011
 horror+0.010
 Prince+0.010
 Freddie+0.010
Neg logit contributions
w-0.019
z-0.016
s-0.016
inks-0.015
ing-0.015
Token mum
Token ID25682
Feature activation-7.31e-04

Pos logit contributions
 agreement+0.021
 unison+0.018
 Freddie+0.016
 horror+0.016
 relief+0.015
Neg logit contributions
w-0.021
s-0.020
z-0.019
ing-0.018
mons-0.017
Token if
Token ID611
Feature activation-2.36e-02

Pos logit contributions
w+0.002
z+0.001
inks+0.001
s+0.001
mons+0.001
Neg logit contributions
 agreement-0.002
 unison-0.001
 Prince-0.001
 horror-0.001
 Princess-0.001
Token the
Token ID262
Feature activation+3.55e-03

Pos logit contributions
w+0.050
z+0.042
s+0.040
inks+0.039
mons+0.037
Neg logit contributions
 agreement-0.054
 unison-0.042
 horror-0.040
 Prince-0.038
 Freddie-0.038
Token hor
Token ID3076
Feature activation-9.03e-02

Pos logit contributions
 agreement+0.007
 unison+0.005
 horror+0.005
 Prince+0.005
 Princess+0.005
Neg logit contributions
w-0.009
z-0.007
inks-0.007
s-0.007
ings-0.007
Token was
Token ID373
Feature activation-3.28e-03

Pos logit contributions
w+0.209
z+0.179
uck+0.166
ings+0.161
s+0.160
Neg logit contributions
 agreement-0.185
 horror-0.144
 unison-0.141
 relief-0.138
 delight-0.136
Token ready
Token ID3492
Feature activation-3.41e-03

Pos logit contributions
w+0.006
z+0.005
ing+0.005
s+0.005
inks+0.005
Neg logit contributions
 agreement-0.008
 unison-0.007
 horror-0.006
 delight-0.006
 relief-0.006
Token yet
Token ID1865
Feature activation+8.92e-03

Pos logit contributions
w+0.007
z+0.005
inks+0.005
s+0.005
mons+0.005
Neg logit contributions
 agreement-0.008
 unison-0.007
 horror-0.006
 Prince-0.006
 Princess-0.006
Token.
Token ID13
Feature activation+1.06e-02

Pos logit contributions
 agreement+0.019
 unison+0.014
 horror+0.014
 Prince+0.013
 Freddie+0.013
Neg logit contributions
w-0.020
z-0.017
inks-0.016
s-0.016
ings-0.015
Token\n
Token ID198
Feature activation+3.16e-03

Pos logit contributions
 agreement+0.023
 unison+0.018
 horror+0.017
 Prince+0.017
 knight+0.017
Neg logit contributions
w-0.023
z-0.020
s-0.019
inks-0.019
ings-0.018
Token there
Token ID612
Feature activation-2.18e-02

Pos logit contributions
w+0.062
z+0.052
inks+0.050
mons+0.048
s+0.047
Neg logit contributions
 agreement-0.051
 horror-0.037
 Freddie-0.037
 unison-0.037
 Prince-0.036
Token was
Token ID373
Feature activation-1.87e-02

Pos logit contributions
w+0.041
z+0.034
inks+0.031
s+0.030
mons+0.029
Neg logit contributions
 agreement-0.054
 unison-0.043
 horror-0.042
 Prince-0.041
 Freddie-0.040
Token a
Token ID257
Feature activation-1.08e-02

Pos logit contributions
w+0.043
z+0.036
s+0.034
inks+0.033
mons+0.032
Neg logit contributions
 agreement-0.040
 unison-0.030
 horror-0.030
 Prince-0.028
 relief-0.027
Token river
Token ID7850
Feature activation-1.07e-02

Pos logit contributions
w+0.026
z+0.023
s+0.023
inks+0.021
ing+0.021
Neg logit contributions
 agreement-0.020
 horror-0.015
 unison-0.015
 hesitation-0.014
 delight-0.013
Token.
Token ID13
Feature activation-1.15e-02

Pos logit contributions
w+0.024
z+0.020
inks+0.019
s+0.019
mons+0.018
Neg logit contributions
 agreement-0.023
 unison-0.018
 horror-0.017
 Prince-0.017
 Freddie-0.016
Token River
Token ID5866
Feature activation-9.02e-02

Pos logit contributions
w+0.022
z+0.019
inks+0.017
s+0.017
mons+0.016
Neg logit contributions
 agreement-0.027
 horror-0.021
 unison-0.021
 hesitation-0.020
 relief-0.020
Token was
Token ID373
Feature activation-1.34e-02

Pos logit contributions
w+0.194
z+0.162
inks+0.151
ings+0.142
mons+0.141
Neg logit contributions
 agreement-0.190
 unison-0.148
 Freddie-0.142
 Prince-0.140
 Princess-0.138
Token very
Token ID845
Feature activation+2.08e-03

Pos logit contributions
w+0.032
z+0.027
inks+0.026
s+0.026
ings+0.024
Neg logit contributions
 agreement-0.027
 unison-0.020
 horror-0.020
 Prince-0.020
 Princess-0.019
Token wide
Token ID3094
Feature activation-2.53e-02

Pos logit contributions
 agreement+0.004
 horror+0.003
 Prince+0.002
 unison+0.002
 Freddie+0.002
Neg logit contributions
w-0.005
z-0.005
inks-0.004
s-0.004
ings-0.004
Token and
Token ID290
Feature activation-9.05e-03

Pos logit contributions
w+0.059
z+0.049
inks+0.045
s+0.045
mons+0.044
Neg logit contributions
 agreement-0.053
 unison-0.040
 horror-0.039
 Prince-0.039
 Freddie-0.038
Token long
Token ID890
Feature activation-2.17e-02

Pos logit contributions
w+0.022
z+0.019
inks+0.018
s+0.017
ings+0.017
Neg logit contributions
 agreement-0.018
 unison-0.013
 horror-0.012
 Freddie-0.012
 Prince-0.012
Tokenmy
Token ID1820
Feature activation-2.13e-02

Pos logit contributions
w+0.012
z+0.010
inks+0.009
s+0.009
mons+0.009
Neg logit contributions
 agreement-0.015
 unison-0.012
 horror-0.012
 Prince-0.012
 Freddie-0.011
Token said
Token ID531
Feature activation-1.68e-02

Pos logit contributions
w+0.049
z+0.042
s+0.041
inks+0.039
ing+0.037
Neg logit contributions
 agreement-0.045
 unison-0.034
 horror-0.033
 Prince-0.031
 relief-0.030
Token,
Token ID11
Feature activation-5.48e-03

Pos logit contributions
w+0.038
z+0.032
inks+0.030
s+0.030
ings+0.028
Neg logit contributions
 agreement-0.036
 unison-0.027
 horror-0.027
 Prince-0.026
 Freddie-0.026
Token "
Token ID366
Feature activation-2.37e-02

Pos logit contributions
w+0.013
z+0.012
s+0.011
inks+0.011
ing+0.011
Neg logit contributions
 agreement-0.010
 unison-0.008
 horror-0.007
 Prince-0.007
 relief-0.007
TokenE
Token ID36
Feature activation-3.11e-02

Pos logit contributions
inks+0.056
w+0.055
z+0.053
ings+0.053
s+0.047
Neg logit contributions
 agreement-0.040
 end-0.025
 unison-0.025
 Freddie-0.025
 horror-0.024
Tokenw
Token ID86
Feature activation-8.97e-02

Pos logit contributions
z+0.028
inks+0.025
bel+0.025
to+0.021
w+0.021
Neg logit contributions
 agreement-0.079
 Freddie-0.072
 end-0.072
 relief-0.070
kin-0.069
Token,
Token ID11
Feature activation-5.07e-03

Pos logit contributions
w+0.197
z+0.169
inks+0.160
s+0.156
mons+0.152
Neg logit contributions
 agreement-0.183
 Freddie-0.136
 Prince-0.135
 unison-0.135
 horror-0.134
Token Tim
Token ID5045
Feature activation-1.98e-02

Pos logit contributions
w+0.012
z+0.010
inks+0.009
s+0.009
ings+0.009
Neg logit contributions
 agreement-0.011
 unison-0.008
 horror-0.008
 Prince-0.008
 Freddie-0.007
Tokenmy
Token ID1820
Feature activation-2.08e-02

Pos logit contributions
w+0.042
z+0.037
inks+0.036
mons+0.035
s+0.034
Neg logit contributions
 agreement-0.043
 horror-0.031
 end-0.030
 Freddie-0.030
 unison-0.029
Token!
Token ID0
Feature activation-1.39e-02

Pos logit contributions
w+0.051
z+0.047
bel+0.044
mons+0.044
iffs+0.043
Neg logit contributions
 agreement-0.033
 horror-0.025
 Freddie-0.024
 Prince-0.023
 unison-0.023
Token That
Token ID1320
Feature activation-1.45e-02

Pos logit contributions
w+0.029
z+0.025
inks+0.023
s+0.022
ings+0.021
Neg logit contributions
 agreement-0.032
 unison-0.025
 horror-0.024
 Prince-0.024
 Freddie-0.023
Token and
Token ID290
Feature activation-3.05e-03

Pos logit contributions
 agreement+0.032
 unison+0.025
 horror+0.024
 Prince+0.023
 Freddie+0.023
Neg logit contributions
w-0.038
z-0.033
inks-0.030
s-0.029
ings-0.028
Token had
Token ID550
Feature activation-1.08e-02

Pos logit contributions
w+0.007
z+0.006
s+0.006
inks+0.006
mons+0.005
Neg logit contributions
 agreement-0.007
 unison-0.005
 horror-0.005
 Prince-0.005
 relief-0.004
Token the
Token ID262
Feature activation-8.33e-03

Pos logit contributions
w+0.027
z+0.024
inks+0.023
s+0.022
ings+0.022
Neg logit contributions
 agreement-0.019
 unison-0.014
 Prince-0.013
 delight-0.013
 horror-0.013
Token word
Token ID1573
Feature activation-2.83e-02

Pos logit contributions
w+0.019
z+0.016
inks+0.015
s+0.015
ings+0.014
Neg logit contributions
 agreement-0.017
 unison-0.013
 horror-0.013
 Prince-0.012
 Princess-0.012
Token "
Token ID366
Feature activation-2.85e-02

Pos logit contributions
w+0.060
z+0.051
s+0.049
inks+0.048
ings+0.045
Neg logit contributions
 agreement-0.064
 unison-0.049
 horror-0.048
 Prince-0.046
 Freddie-0.045
Tokenw
Token ID86
Feature activation-8.91e-02

Pos logit contributions
w+0.053
inks+0.045
z+0.044
s+0.041
ings+0.040
Neg logit contributions
 agreement-0.064
 unison-0.051
 Prince-0.048
 Princess-0.048
 Freddie-0.046
Tokenelcome
Token ID9571
Feature activation-4.32e-02

Pos logit contributions
w+0.165
z+0.140
inks+0.128
s+0.127
mons+0.117
Neg logit contributions
 agreement-0.215
 unison-0.171
 Prince-0.164
 horror-0.163
 Freddie-0.161
Token"
Token ID1
Feature activation-7.00e-03

Pos logit contributions
w+0.091
z+0.079
inks+0.075
s+0.071
mons+0.068
Neg logit contributions
 agreement-0.093
 unison-0.070
 Prince-0.069
 Princess-0.068
 Freddie-0.067
Token written
Token ID3194
Feature activation-2.01e-02

Pos logit contributions
w+0.015
z+0.013
inks+0.012
s+0.011
ings+0.011
Neg logit contributions
 agreement-0.015
 Prince-0.011
 unison-0.011
 Princess-0.011
 Freddie-0.011
Token in
Token ID287
Feature activation-5.25e-03

Pos logit contributions
w+0.046
z+0.040
s+0.037
inks+0.037
ings+0.035
Neg logit contributions
 agreement-0.042
 unison-0.031
 horror-0.030
 Prince-0.029
 Freddie-0.029
Token big
Token ID1263
Feature activation+1.06e-02

Pos logit contributions
w+0.013
z+0.011
s+0.010
inks+0.010
ings+0.010
Neg logit contributions
 agreement-0.010
 unison-0.007
 horror-0.007
 Prince-0.007
 Freddie-0.007
Token
Token ID220
Feature activation+8.63e-03

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Prince+0.002
 knight+0.002
Neg logit contributions
w-0.003
z-0.002
s-0.002
inks-0.002
ings-0.002
Token
Token ID198
Feature activation-3.27e-03

Pos logit contributions
 agreement+0.015
 unison+0.011
 Prince+0.011
 Freddie+0.011
 horror+0.011
Neg logit contributions
w-0.022
z-0.018
inks-0.017
mons-0.016
s-0.016
Token
Token ID198
Feature activation-1.99e-04

Pos logit contributions
w+0.007
z+0.006
s+0.005
inks+0.005
ings+0.005
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 Freddie-0.005
Tokenâ
Token ID22940
Feature activation+5.08e-03

Pos logit contributions
w+0.000
z+0.000
s+0.000
inks+0.000
ing+0.000
Neg logit contributions
 agreement-0.000
 horror-0.000
 unison-0.000
 Prince-0.000
 Princess-0.000
Token
Token ID26391
Feature activation+4.28e-02

Pos logit contributions
 agreement+0.010
 unison+0.007
 horror+0.006
 relief+0.006
 Princess+0.006
Neg logit contributions
w-0.012
z-0.011
inks-0.010
s-0.010
ings-0.010
Token
Token ID129
Feature activation+7.61e-02

Pos logit contributions
 agreement+0.077
 unison+0.064
 horror+0.061
 delight+0.056
 Prince+0.056
Neg logit contributions
w-0.100
mons-0.077
z-0.076
ing-0.073
uck-0.073
Token
Token ID241
Feature activation+4.24e-03

Pos logit contributions
 agreement+0.145
 unison+0.106
 horror+0.104
 Prince+0.100
 Freddie+0.097
Neg logit contributions
w-0.188
z-0.165
s-0.154
inks-0.153
ings-0.146
TokenThat
Token ID2504
Feature activation-3.13e-03

Pos logit contributions
 agreement+0.009
 unison+0.007
 horror+0.006
 relief+0.006
 delight+0.006
Neg logit contributions
w-0.009
inks-0.008
z-0.008
s-0.007
ings-0.007
Token looks
Token ID3073
Feature activation-9.53e-03

Pos logit contributions
w+0.007
z+0.006
inks+0.005
mons+0.005
iffs+0.005
Neg logit contributions
 agreement-0.007
 unison-0.005
 Princess-0.005
 Prince-0.005
 horror-0.005
Token amazing
Token ID4998
Feature activation+1.93e-02

Pos logit contributions
w+0.023
z+0.020
s+0.019
inks+0.019
mons+0.018
Neg logit contributions
 agreement-0.018
 unison-0.013
 horror-0.013
 Prince-0.013
 Princess-0.012
Token!
Token ID0
Feature activation+4.44e-03

Pos logit contributions
 agreement+0.045
 unison+0.035
 horror+0.034
 Prince+0.033
 Freddie+0.033
Neg logit contributions
w-0.040
z-0.033
inks-0.031
s-0.030
ings-0.029
Token said
Token ID531
Feature activation-8.38e-03

Pos logit contributions
w+0.004
s+0.004
z+0.003
inks+0.003
o+0.003
Neg logit contributions
 agreement-0.006
 unison-0.005
 horror-0.005
 Deal-0.004
 End-0.004
Token,
Token ID11
Feature activation+1.87e-04

Pos logit contributions
w+0.018
z+0.016
inks+0.015
s+0.014
mons+0.014
Neg logit contributions
 agreement-0.018
 unison-0.014
 horror-0.014
 Prince-0.014
 Freddie-0.013
Token
Token ID6184
Feature activation-1.98e-02

Pos logit contributions
 agreement+0.000
 unison+0.000
 horror+0.000
 relief+0.000
 delight+0.000
Neg logit contributions
w-0.000
z-0.000
s-0.000
ing-0.000
inks-0.000
Token
Token ID95
Feature activation+9.21e-03

Pos logit contributions
w+0.040
z+0.033
s+0.031
inks+0.031
ings+0.029
Neg logit contributions
 agreement-0.047
 unison-0.037
 horror-0.036
 Prince-0.035
 Freddie-0.034
Token
Token ID26391
Feature activation+3.66e-02

Pos logit contributions
 agreement+0.021
 unison+0.016
 horror+0.016
 Prince+0.016
 Freddie+0.015
Neg logit contributions
w-0.019
z-0.016
inks-0.015
s-0.015
ings-0.014
Token
Token ID129
Feature activation+6.84e-02

Pos logit contributions
 agreement+0.078
 unison+0.067
 Prince+0.065
 Freddie+0.064
 knight+0.063
Neg logit contributions
w-0.069
z-0.053
mons-0.049
ing-0.049
inks-0.048
Token
Token ID241
Feature activation-3.28e-03

Pos logit contributions
 agreement+0.133
 unison+0.100
 horror+0.097
 Prince+0.094
 Princess+0.090
Neg logit contributions
w-0.164
z-0.143
inks-0.135
s-0.134
ings-0.127
TokenIt
Token ID1026
Feature activation+1.97e-03

Pos logit contributions
w+0.007
z+0.006
inks+0.006
s+0.006
ings+0.005
Neg logit contributions
 agreement-0.007
 unison-0.006
 Prince-0.005
 horror-0.005
 Princess-0.005
Tokenâ
Token ID22940
Feature activation-5.72e-03

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 Prince+0.004
 Freddie+0.004
Neg logit contributions
w-0.004
z-0.003
s-0.003
inks-0.003
ings-0.003
Token
Token ID26391
Feature activation+3.59e-02

Pos logit contributions
w+0.013
z+0.011
s+0.011
inks+0.010
mons+0.010
Neg logit contributions
 agreement-0.012
 unison-0.009
 horror-0.009
 Prince-0.009
 Freddie-0.008
Token
Token ID8151
Feature activation-3.49e-03

Pos logit contributions
 agreement+0.075
 unison+0.061
 Prince+0.058
 Freddie+0.057
 horror+0.057
Neg logit contributions
w-0.077
z-0.064
inks-0.059
s-0.058
ing-0.057
Token said
Token ID531
Feature activation+3.24e-03

Pos logit contributions
 agreement+0.035
 horror+0.027
 unison+0.027
 relief+0.025
 Prince+0.025
Neg logit contributions
w-0.031
z-0.027
s-0.026
inks-0.024
ing-0.023
Token,
Token ID11
Feature activation+8.86e-03

Pos logit contributions
 agreement+0.007
 Prince+0.005
 horror+0.005
 unison+0.005
 Princess+0.005
Neg logit contributions
w-0.007
z-0.006
inks-0.006
s-0.005
mons-0.005
Token
Token ID6184
Feature activation-4.47e-03

Pos logit contributions
 agreement+0.019
 unison+0.015
 horror+0.015
 relief+0.014
 delight+0.013
Neg logit contributions
w-0.018
z-0.017
s-0.016
ing-0.015
inks-0.015
Token
Token ID95
Feature activation+2.23e-02

Pos logit contributions
w+0.011
z+0.009
s+0.009
inks+0.009
ings+0.008
Neg logit contributions
 agreement-0.009
 unison-0.006
 Prince-0.006
 horror-0.006
 Princess-0.006
Token
Token ID26391
Feature activation+4.38e-02

Pos logit contributions
 agreement+0.053
 unison+0.041
 horror+0.041
 Prince+0.039
 Freddie+0.039
Neg logit contributions
w-0.045
z-0.037
s-0.035
inks-0.035
ings-0.032
Token
Token ID129
Feature activation+7.53e-02

Pos logit contributions
 agreement+0.080
 knight+0.070
 Prince+0.070
 Princess+0.067
 unison+0.065
Neg logit contributions
w-0.099
z-0.076
mons-0.075
ings-0.072
uck-0.071
Token
Token ID241
Feature activation+6.22e-03

Pos logit contributions
 agreement+0.153
 unison+0.117
 horror+0.112
 Prince+0.107
 Freddie+0.104
Neg logit contributions
w-0.174
z-0.152
inks-0.142
s-0.142
ings-0.134
TokenDo
Token ID5211
Feature activation+2.12e-02

Pos logit contributions
 agreement+0.012
 unison+0.009
 Prince+0.009
 Princess+0.009
 Freddie+0.009
Neg logit contributions
w-0.015
z-0.012
inks-0.012
s-0.012
ings-0.012
Token not
Token ID407
Feature activation+2.45e-02

Pos logit contributions
 agreement+0.053
 unison+0.042
 horror+0.042
 Prince+0.040
 relief+0.039
Neg logit contributions
w-0.041
z-0.033
s-0.031
inks-0.030
ing-0.028
Token cry
Token ID3960
Feature activation+8.75e-03

Pos logit contributions
 agreement+0.048
 unison+0.038
 horror+0.037
 Freddie+0.034
 Prince+0.034
Neg logit contributions
w-0.058
z-0.048
inks-0.044
uck-0.043
s-0.043
Token,
Token ID11
Feature activation+3.95e-03

Pos logit contributions
 agreement+0.020
 unison+0.015
 horror+0.015
 Prince+0.015
 Freddie+0.014
Neg logit contributions
w-0.019
z-0.016
s-0.015
inks-0.015
ings-0.014
Token.
Token ID13
Feature activation+7.67e-03

Pos logit contributions
 agreement+0.023
 unison+0.018
 horror+0.018
 Prince+0.017
 relief+0.017
Neg logit contributions
w-0.019
z-0.016
s-0.015
inks-0.015
ings-0.014
Token
Token ID198
Feature activation+1.34e-03

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.013
 knight+0.013
 hesitation+0.013
Neg logit contributions
w-0.014
s-0.013
inks-0.012
z-0.012
uck-0.012
Token
Token ID198
Feature activation+5.16e-03

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.003
inks-0.002
s-0.002
mons-0.002
Tokenâ
Token ID22940
Feature activation+1.29e-02

Pos logit contributions
 agreement+0.011
 horror+0.008
 unison+0.008
 Prince+0.008
 relief+0.007
Neg logit contributions
w-0.012
z-0.010
s-0.010
inks-0.009
ing-0.009
Token
Token ID26391
Feature activation+4.73e-02

Pos logit contributions
 agreement+0.026
 unison+0.019
 horror+0.019
 Prince+0.018
 Freddie+0.017
Neg logit contributions
w-0.031
z-0.027
s-0.025
inks-0.025
ings-0.024
Token
Token ID129
Feature activation+7.53e-02

Pos logit contributions
 agreement+0.085
 unison+0.075
 horror+0.067
 Freddie+0.067
 Prince+0.067
Neg logit contributions
w-0.108
z-0.081
ing-0.080
mons-0.078
inks-0.076
Token
Token ID241
Feature activation+8.30e-03

Pos logit contributions
 agreement+0.142
 unison+0.104
 horror+0.101
 Prince+0.097
 Princess+0.094
Neg logit contributions
w-0.187
z-0.164
s-0.154
inks-0.154
ings-0.146
TokenIt
Token ID1026
Feature activation+1.18e-02

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.012
 relief+0.012
 delight+0.012
Neg logit contributions
w-0.018
inks-0.016
z-0.015
ings-0.014
s-0.014
Tokenâ
Token ID22940
Feature activation+1.25e-02

Pos logit contributions
 agreement+0.030
 unison+0.024
 horror+0.023
 Prince+0.023
 Freddie+0.022
Neg logit contributions
w-0.022
z-0.018
inks-0.017
s-0.016
ings-0.015
Token
Token ID26391
Feature activation+4.72e-02

Pos logit contributions
 agreement+0.024
 horror+0.018
 unison+0.018
 Prince+0.017
 knight+0.016
Neg logit contributions
w-0.031
z-0.027
s-0.025
inks-0.025
uck-0.024
Token
Token ID8151
Feature activation+6.93e-03

Pos logit contributions
 agreement+0.090
 unison+0.076
 Prince+0.069
 Freddie+0.069
 Princess+0.068
Neg logit contributions
w-0.107
z-0.086
inks-0.081
ing-0.080
s-0.079
Token happened
Token ID3022
Feature activation-1.15e-02

Pos logit contributions
w+0.029
z+0.023
s+0.021
inks+0.021
ing+0.020
Neg logit contributions
 agreement-0.043
 unison-0.034
 horror-0.034
 Prince-0.033
 Princess-0.032
Token.
Token ID13
Feature activation+1.11e-02

Pos logit contributions
w+0.025
z+0.022
s+0.020
inks+0.020
ings+0.019
Neg logit contributions
 agreement-0.025
 unison-0.019
 horror-0.019
 Prince-0.018
 Freddie-0.018
Token The
Token ID383
Feature activation+1.81e-03

Pos logit contributions
 agreement+0.025
 horror+0.019
 unison+0.018
 hesitation+0.018
 end+0.018
Neg logit contributions
w-0.020
z-0.018
s-0.018
inks-0.017
elt-0.017
Token books
Token ID3835
Feature activation+7.46e-03

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Freddie+0.002
 Prince+0.002
Neg logit contributions
w-0.005
z-0.004
s-0.004
inks-0.004
ing-0.004
Tokenhe
Token ID258
Feature activation+1.25e-03

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.013
 Prince+0.013
 Princess+0.012
Neg logit contributions
w-0.016
z-0.013
s-0.013
inks-0.012
ing-0.012
Tokenlf
Token ID1652
Feature activation+7.19e-02

Pos logit contributions
 agreement+0.002
 unison+0.001
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.004
z-0.003
s-0.003
inks-0.003
ings-0.003
Token became
Token ID2627
Feature activation+2.62e-02

Pos logit contributions
 agreement+0.148
 unison+0.112
 horror+0.109
 Prince+0.105
 Freddie+0.103
Neg logit contributions
w-0.164
z-0.140
inks-0.132
s-0.129
mons-0.124
Token restless
Token ID42537
Feature activation+7.93e-03

Pos logit contributions
 agreement+0.059
 unison+0.046
 horror+0.044
 Prince+0.042
 Freddie+0.041
Neg logit contributions
w-0.052
z-0.045
inks-0.042
ings-0.041
s-0.041
Token.
Token ID13
Feature activation+8.57e-03

Pos logit contributions
 agreement+0.019
 unison+0.015
 horror+0.014
 Prince+0.014
 Freddie+0.013
Neg logit contributions
w-0.015
z-0.013
s-0.013
inks-0.012
ing-0.012
Token It
Token ID632
Feature activation-1.39e-03

Pos logit contributions
 agreement+0.020
 horror+0.015
 unison+0.015
 hesitation+0.015
 relief+0.015
Neg logit contributions
w-0.015
s-0.013
z-0.013
inks-0.013
ings-0.012
Token was
Token ID373
Feature activation+3.65e-03

Pos logit contributions
w+0.003
z+0.002
s+0.002
inks+0.002
bel+0.002
Neg logit contributions
 agreement-0.003
 unison-0.003
 Prince-0.003
 horror-0.002
 Freddie-0.002
Token basket
Token ID7988
Feature activation-5.15e-03

Pos logit contributions
 agreement+0.052
 unison+0.039
 horror+0.038
 Prince+0.035
 relief+0.035
Neg logit contributions
w-0.061
z-0.051
s-0.050
inks-0.048
ings-0.047
Token.
Token ID13
Feature activation+9.13e-03

Pos logit contributions
w+0.011
z+0.009
s+0.009
inks+0.009
ings+0.008
Neg logit contributions
 agreement-0.012
 unison-0.009
 horror-0.009
 Prince-0.008
 relief-0.008
Token "
Token ID366
Feature activation-1.34e-02

Pos logit contributions
 agreement+0.022
 unison+0.017
 horror+0.016
 relief+0.016
 hesitation+0.016
Neg logit contributions
w-0.016
s-0.016
z-0.015
inks-0.015
ings-0.014
TokenThis
Token ID1212
Feature activation-1.55e-03

Pos logit contributions
w+0.030
z+0.026
inks+0.025
ings+0.024
s+0.023
Neg logit contributions
 agreement-0.028
 unison-0.022
 Prince-0.020
 horror-0.020
 Princess-0.020
Token is
Token ID318
Feature activation+2.58e-03

Pos logit contributions
w+0.003
z+0.003
inks+0.002
s+0.002
bel+0.002
Neg logit contributions
 agreement-0.004
 Prince-0.003
 Freddie-0.003
 unison-0.003
 Princess-0.003
Token shrimp
Token ID30060
Feature activation+9.09e-03

Pos logit contributions
 agreement+0.006
 horror+0.004
 unison+0.004
 relief+0.004
 delight+0.004
Neg logit contributions
w-0.005
s-0.004
z-0.004
inks-0.004
mons-0.004
Token.
Token ID13
Feature activation+2.71e-03

Pos logit contributions
 agreement+0.020
 unison+0.015
 Prince+0.015
 horror+0.015
 Freddie+0.014
Neg logit contributions
w-0.020
z-0.017
inks-0.016
s-0.015
ings-0.015
Token It
Token ID632
Feature activation+2.92e-03

Pos logit contributions
 agreement+0.007
 unison+0.005
 horror+0.005
 relief+0.005
 hesitation+0.005
Neg logit contributions
w-0.005
s-0.004
z-0.004
inks-0.004
mons-0.004
Token is
Token ID318
Feature activation+2.03e-03

Pos logit contributions
 agreement+0.008
 unison+0.007
 horror+0.007
 Prince+0.007
 Freddie+0.007
Neg logit contributions
w-0.004
z-0.004
inks-0.003
s-0.003
ings-0.003
Token a
Token ID257
Feature activation+3.96e-03

Pos logit contributions
 agreement+0.005
 unison+0.003
 horror+0.003
 relief+0.003
 knight+0.003
Neg logit contributions
w-0.004
s-0.003
z-0.003
inks-0.003
ings-0.003
Token small
Token ID1402
Feature activation-1.68e-02

Pos logit contributions
 agreement+0.008
 unison+0.006
 horror+0.006
 Prince+0.005
 Freddie+0.005
Neg logit contributions
w-0.009
z-0.008
s-0.008
inks-0.008
ings-0.007
Token know
Token ID760
Feature activation-2.76e-02

Pos logit contributions
 agreement+0.038
 unison+0.031
 horror+0.030
 Prince+0.029
 Freddie+0.028
Neg logit contributions
w-0.036
z-0.030
s-0.028
inks-0.027
ing-0.025
Token what
Token ID644
Feature activation-1.89e-02

Pos logit contributions
w+0.054
z+0.046
s+0.045
ing+0.044
inks+0.042
Neg logit contributions
 agreement-0.064
 unison-0.050
 horror-0.046
 Freddie-0.045
 Prince-0.045
Token to
Token ID284
Feature activation-2.75e-03

Pos logit contributions
w+0.029
z+0.023
inks+0.022
ings+0.019
s+0.018
Neg logit contributions
 agreement-0.052
 Prince-0.043
 Freddie-0.043
 Princess-0.043
 horror-0.042
Token do
Token ID466
Feature activation-2.12e-02

Pos logit contributions
w+0.007
z+0.006
inks+0.006
mons+0.005
s+0.005
Neg logit contributions
 agreement-0.005
 end-0.004
 horror-0.004
 unison-0.004
 relief-0.004
Token.
Token ID13
Feature activation+1.37e-03

Pos logit contributions
w+0.049
z+0.041
inks+0.039
s+0.039
mons+0.037
Neg logit contributions
 agreement-0.044
 unison-0.033
 horror-0.032
 Prince-0.031
 Freddie-0.031
Token
Token ID220
Feature activation+8.55e-03

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.003
z-0.003
inks-0.003
s-0.003
ings-0.002
Token
Token ID198
Feature activation-4.46e-03

Pos logit contributions
 agreement+0.017
 unison+0.012
 horror+0.012
 Prince+0.012
 Freddie+0.011
Neg logit contributions
w-0.021
z-0.017
inks-0.016
mons-0.016
s-0.016
Token
Token ID198
Feature activation-1.84e-03

Pos logit contributions
w+0.012
z+0.010
inks+0.010
mons+0.009
ing+0.009
Neg logit contributions
 agreement-0.008
 unison-0.005
 Freddie-0.005
 Prince-0.005
 horror-0.005
TokenTim
Token ID14967
Feature activation+5.95e-03

Pos logit contributions
w+0.004
z+0.003
inks+0.003
s+0.003
ings+0.003
Neg logit contributions
 agreement-0.004
 unison-0.003
 horror-0.003
 Prince-0.003
 Freddie-0.003
Tokenmy
Token ID1820
Feature activation-1.04e-02

Pos logit contributions
 agreement+0.014
 unison+0.011
 horror+0.011
 Prince+0.011
 Freddie+0.011
Neg logit contributions
w-0.011
z-0.010
inks-0.009
s-0.009
ings-0.009
Token's
Token ID338
Feature activation+1.18e-02

Pos logit contributions
w+0.023
z+0.019
inks+0.018
s+0.018
ings+0.017
Neg logit contributions
 agreement-0.023
 unison-0.017
 horror-0.017
 Prince-0.017
 Princess-0.016
Token With
Token ID2080
Feature activation+8.86e-03

Pos logit contributions
 agreement+0.031
 unison+0.024
 horror+0.023
 relief+0.022
 knight+0.022
Neg logit contributions
w-0.024
z-0.022
s-0.022
inks-0.021
ing-0.020
Token the
Token ID262
Feature activation+9.41e-03

Pos logit contributions
 agreement+0.018
 unison+0.014
 horror+0.014
 Prince+0.013
 knight+0.013
Neg logit contributions
w-0.020
z-0.017
s-0.016
ing-0.015
ings-0.014
Token old
Token ID1468
Feature activation+2.37e-02

Pos logit contributions
 agreement+0.019
 unison+0.015
 horror+0.014
 Prince+0.014
 Freddie+0.014
Neg logit contributions
w-0.021
z-0.018
s-0.018
ings-0.017
ing-0.017
Token woman
Token ID2415
Feature activation-7.27e-03

Pos logit contributions
 agreement+0.045
 unison+0.033
 horror+0.032
 Prince+0.030
 Freddie+0.029
Neg logit contributions
w-0.059
z-0.052
s-0.049
inks-0.048
ings-0.046
Tokenâ
Token ID22940
Feature activation+1.45e-02

Pos logit contributions
w+0.016
z+0.014
inks+0.013
s+0.013
ings+0.012
Neg logit contributions
 agreement-0.016
 unison-0.012
 horror-0.012
 Prince-0.011
 Freddie-0.011
Token
Token ID26391
Feature activation+4.64e-02

Pos logit contributions
 agreement+0.029
 unison+0.022
 horror+0.022
 Prince+0.021
 Freddie+0.020
Neg logit contributions
w-0.034
z-0.029
s-0.027
inks-0.027
ings-0.026
Token
Token ID8151
Feature activation+9.27e-03

Pos logit contributions
 agreement+0.109
 unison+0.096
 Freddie+0.089
 Prince+0.088
 horror+0.087
Neg logit contributions
w-0.083
z-0.057
inks-0.054
ing-0.053
mons-0.052
Tokens
Token ID82
Feature activation+1.66e-03

Pos logit contributions
 agreement+0.022
 unison+0.017
 horror+0.016
 Freddie+0.016
 Prince+0.016
Neg logit contributions
w-0.019
z-0.015
inks-0.015
mons-0.014
s-0.013
Token wise
Token ID10787
Feature activation-2.32e-03

Pos logit contributions
 agreement+0.003
 unison+0.002
 horror+0.002
 Prince+0.002
 Freddie+0.002
Neg logit contributions
w-0.004
z-0.004
s-0.003
inks-0.003
ings-0.003
Token words
Token ID2456
Feature activation+5.91e-03

Pos logit contributions
w+0.006
z+0.005
inks+0.005
s+0.005
mons+0.004
Neg logit contributions
 agreement-0.005
 unison-0.003
 horror-0.003
 Prince-0.003
 Freddie-0.003
Token,
Token ID11
Feature activation+9.20e-03

Pos logit contributions
 agreement+0.013
 unison+0.010
 horror+0.010
 Prince+0.010
 Freddie+0.010
Neg logit contributions
w-0.012
z-0.011
s-0.010
inks-0.010
ings-0.009
Token said
Token ID531
Feature activation-2.67e-03

Pos logit contributions
 agreement+0.016
 unison+0.012
 horror+0.012
 Prince+0.011
 relief+0.011
Neg logit contributions
w-0.015
z-0.012
s-0.012
inks-0.011
mons-0.011
Token it
Token ID340
Feature activation+7.56e-03

Pos logit contributions
w+0.006
z+0.005
s+0.005
inks+0.005
ings+0.005
Neg logit contributions
 agreement-0.006
 unison-0.004
 horror-0.004
 Prince-0.004
 Freddie-0.004
Token was
Token ID373
Feature activation+1.45e-03

Pos logit contributions
 agreement+0.017
 unison+0.014
 horror+0.014
 Prince+0.013
 Princess+0.013
Neg logit contributions
w-0.015
z-0.013
inks-0.012
s-0.011
ings-0.011
Token made
Token ID925
Feature activation-1.05e-02

Pos logit contributions
 agreement+0.003
 unison+0.003
 horror+0.003
 delight+0.002
 knight+0.002
Neg logit contributions
w-0.003
s-0.002
inks-0.002
z-0.002
ings-0.002
Token from
Token ID422
Feature activation+2.55e-02

Pos logit contributions
w+0.024
z+0.020
inks+0.019
s+0.019
mons+0.018
Neg logit contributions
 agreement-0.022
 unison-0.017
 horror-0.016
 Prince-0.016
 Princess-0.016
Token special
Token ID2041
Feature activation+1.73e-02

Pos logit contributions
 agreement+0.049
 unison+0.036
 horror+0.035
 Prince+0.034
 Freddie+0.033
Neg logit contributions
w-0.062
z-0.054
s-0.051
inks-0.050
ings-0.048
Token plants
Token ID6134
Feature activation+5.26e-03

Pos logit contributions
 agreement+0.040
 horror+0.030
 unison+0.030
 relief+0.029
 delight+0.029
Neg logit contributions
w-0.037
z-0.032
inks-0.029
s-0.029
uch-0.027
Token and
Token ID290
Feature activation+3.45e-04

Pos logit contributions
 agreement+0.011
 unison+0.009
 horror+0.008
 Prince+0.008
 Freddie+0.008
Neg logit contributions
w-0.012
z-0.010
inks-0.009
s-0.009
mons-0.009
Token woods
Token ID16479
Feature activation-2.33e-02

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.000
 Prince+0.000
 Princess+0.000
Neg logit contributions
w-0.001
z-0.001
inks-0.001
s-0.001
ings-0.001
Token.
Token ID13
Feature activation+2.51e-03

Pos logit contributions
w+0.054
z+0.046
inks+0.043
s+0.042
mons+0.040
Neg logit contributions
 agreement-0.048
 unison-0.036
 horror-0.036
 Prince-0.034
 Princess-0.033
Token She
Token ID1375
Feature activation+5.25e-03

Pos logit contributions
 agreement+0.005
 unison+0.004
 horror+0.004
 knight+0.004
 captain+0.004
Neg logit contributions
w-0.005
s-0.005
inks-0.004
z-0.004
ings-0.004
Token big
Token ID1263
Feature activation+1.27e-02

Pos logit contributions
w+0.018
z+0.016
s+0.014
inks+0.014
ing+0.013
Neg logit contributions
 agreement-0.019
 unison-0.015
 horror-0.015
 Prince-0.014
 Freddie-0.014
Token and
Token ID290
Feature activation-3.92e-03

Pos logit contributions
 agreement+0.028
 unison+0.021
 horror+0.021
 Prince+0.021
 Princess+0.020
Neg logit contributions
w-0.028
z-0.023
inks-0.022
s-0.022
ings-0.021
Token had
Token ID550
Feature activation-3.99e-04

Pos logit contributions
w+0.009
z+0.008
s+0.007
inks+0.007
ing+0.007
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 Freddie-0.006
Token a
Token ID257
Feature activation-1.77e-03

Pos logit contributions
w+0.001
z+0.001
inks+0.001
s+0.001
ings+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 delight-0.001
 Princess-0.001
Token smooth
Token ID7209
Feature activation+5.91e-03

Pos logit contributions
w+0.004
z+0.003
s+0.003
inks+0.003
ings+0.003
Neg logit contributions
 agreement-0.004
 unison-0.003
 horror-0.003
 Prince-0.003
 Princess-0.003
Token floor
Token ID4314
Feature activation+1.27e-02

Pos logit contributions
 agreement+0.011
 unison+0.007
 horror+0.007
 Prince+0.007
 delight+0.007
Neg logit contributions
w-0.015
z-0.013
s-0.013
inks-0.013
mons-0.012
Token that
Token ID326
Feature activation-1.05e-02

Pos logit contributions
 agreement+0.025
 unison+0.019
 horror+0.019
 Prince+0.018
 Princess+0.018
Neg logit contributions
w-0.030
z-0.025
inks-0.023
s-0.023
mons-0.022
Token was
Token ID373
Feature activation-9.50e-03

Pos logit contributions
w+0.024
z+0.020
s+0.019
inks+0.019
ings+0.018
Neg logit contributions
 agreement-0.022
 unison-0.017
 horror-0.016
 Prince-0.016
 Freddie-0.015
Token fun
Token ID1257
Feature activation-1.34e-02

Pos logit contributions
w+0.019
z+0.017
inks+0.015
s+0.015
ing+0.014
Neg logit contributions
 agreement-0.022
 unison-0.018
 Prince-0.017
 horror-0.017
 Freddie-0.017
Token to
Token ID284
Feature activation-7.19e-03

Pos logit contributions
w+0.027
z+0.022
inks+0.021
s+0.020
mons+0.019
Neg logit contributions
 agreement-0.031
 horror-0.024
 unison-0.024
 Prince-0.023
 Princess-0.023
Token run
Token ID1057
Feature activation-8.76e-03

Pos logit contributions
w+0.016
z+0.014
s+0.013
inks+0.013
ings+0.012
Neg logit contributions
 agreement-0.015
 unison-0.012
 Prince-0.011
 horror-0.011
 Freddie-0.011
TokenOne
Token ID3198
Feature activation+8.45e-04

Pos logit contributions
 agreement+0.002
 unison+0.002
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.002
z-0.002
inks-0.002
s-0.002
ings-0.002
Token day
Token ID1110
Feature activation+1.06e-02

Pos logit contributions
 agreement+0.002
 unison+0.001
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.002
z-0.002
s-0.002
inks-0.001
ing-0.001
Token,
Token ID11
Feature activation+5.81e-03

Pos logit contributions
 agreement+0.020
 unison+0.015
 horror+0.015
 Freddie+0.015
 Prince+0.015
Neg logit contributions
w-0.024
z-0.022
inks-0.021
s-0.019
mons-0.019
Token a
Token ID257
Feature activation+8.52e-03

Pos logit contributions
 agreement+0.012
 unison+0.009
 horror+0.008
 Prince+0.008
 Freddie+0.008
Neg logit contributions
w-0.014
z-0.012
s-0.011
inks-0.011
mons-0.011
Token 3
Token ID513
Feature activation+1.67e-02

Pos logit contributions
 agreement+0.017
 unison+0.012
 horror+0.012
 Freddie+0.011
 delight+0.011
Neg logit contributions
w-0.020
z-0.017
s-0.017
ing-0.016
inks-0.016
Token year
Token ID614
Feature activation-2.13e-02

Pos logit contributions
 agreement+0.044
 unison+0.036
 horror+0.035
 Prince+0.034
 Freddie+0.033
Neg logit contributions
w-0.029
z-0.023
s-0.021
inks-0.021
ings-0.020
Token old
Token ID1468
Feature activation+2.06e-02

Pos logit contributions
w+0.055
z+0.048
inks+0.045
s+0.044
ings+0.043
Neg logit contributions
 agreement-0.038
 horror-0.027
 unison-0.027
 Prince-0.026
 Princess-0.025
Token walked
Token ID6807
Feature activation+1.98e-02

Pos logit contributions
 agreement+0.045
 unison+0.035
 horror+0.034
 Prince+0.032
 Freddie+0.032
Neg logit contributions
w-0.045
z-0.039
s-0.036
inks-0.036
ing-0.034
Token into
Token ID656
Feature activation+2.46e-02

Pos logit contributions
 agreement+0.043
 Prince+0.032
 Freddie+0.031
 Princess+0.031
 knight+0.030
Neg logit contributions
w-0.043
z-0.036
inks-0.035
mons-0.033
ings-0.033
Token the
Token ID262
Feature activation+4.03e-03

Pos logit contributions
 agreement+0.046
 unison+0.035
 horror+0.032
 Prince+0.031
 Princess+0.031
Neg logit contributions
w-0.059
z-0.052
s-0.050
inks-0.048
ings-0.047
Token room
Token ID2119
Feature activation+5.61e-03

Pos logit contributions
 agreement+0.007
 Prince+0.005
 horror+0.005
 Princess+0.005
 unison+0.005
Neg logit contributions
w-0.011
z-0.009
inks-0.009
mons-0.009
s-0.009
Token new
Token ID649
Feature activation-1.04e-02

Pos logit contributions
w+0.001
z+0.001
s+0.001
inks+0.001
mons+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 delight-0.001
 relief-0.001
Token friend
Token ID1545
Feature activation+5.79e-03

Pos logit contributions
w+0.024
z+0.020
inks+0.019
s+0.019
ings+0.018
Neg logit contributions
 agreement-0.022
 unison-0.016
 horror-0.016
 Prince-0.015
 Princess-0.015
Token today
Token ID1909
Feature activation+4.52e-04

Pos logit contributions
 agreement+0.014
 unison+0.011
 horror+0.011
 relief+0.011
 delight+0.011
Neg logit contributions
w-0.011
z-0.009
s-0.009
inks-0.008
ing-0.008
Token.
Token ID13
Feature activation-2.34e-03

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.001
z-0.001
s-0.001
inks-0.001
ings-0.001
Token Her
Token ID2332
Feature activation+9.67e-03

Pos logit contributions
w+0.004
s+0.003
z+0.003
inks+0.003
o+0.003
Neg logit contributions
 agreement-0.006
 unison-0.005
 horror-0.005
 hesitation-0.005
 relief-0.004
Token name
Token ID1438
Feature activation+2.26e-03

Pos logit contributions
 agreement+0.023
 unison+0.018
 horror+0.017
 Freddie+0.016
 Prince+0.016
Neg logit contributions
w-0.019
z-0.016
inks-0.015
s-0.015
ings-0.015
Token is
Token ID318
Feature activation-1.89e-03

Pos logit contributions
 agreement+0.006
 unison+0.005
 Princess+0.005
 Prince+0.005
 Freddie+0.005
Neg logit contributions
w-0.003
z-0.003
inks-0.003
mons-0.003
bel-0.002
Token Emma
Token ID18966
Feature activation-2.09e-03

Pos logit contributions
w+0.005
z+0.004
s+0.004
ing+0.004
inks+0.004
Neg logit contributions
 agreement-0.003
 horror-0.002
 unison-0.002
 relief-0.002
 hesitation-0.002
Token and
Token ID290
Feature activation-7.16e-04

Pos logit contributions
w+0.004
z+0.004
s+0.003
inks+0.003
ing+0.003
Neg logit contributions
 agreement-0.005
 unison-0.004
 horror-0.004
 Prince-0.004
 relief-0.003
Token she
Token ID673
Feature activation-3.89e-03

Pos logit contributions
w+0.002
z+0.001
s+0.001
inks+0.001
ing+0.001
Neg logit contributions
 agreement-0.002
 unison-0.001
 horror-0.001
 Prince-0.001
 relief-0.001
Token is
Token ID318
Feature activation-3.89e-03

Pos logit contributions
w+0.007
z+0.006
mons+0.006
ings+0.006
inks+0.006
Neg logit contributions
 agreement-0.009
 Princess-0.007
 Prince-0.007
 Freddie-0.007
 unison-0.007
Token They
Token ID1119
Feature activation+4.71e-03

Pos logit contributions
w+0.017
s+0.014
z+0.014
inks+0.014
mons+0.013
Neg logit contributions
 agreement-0.026
 unison-0.020
 horror-0.020
 hesitation-0.019
 relief-0.019
Token looked
Token ID3114
Feature activation+2.14e-02

Pos logit contributions
 agreement+0.012
 unison+0.009
 horror+0.009
 Freddie+0.009
 Prince+0.008
Neg logit contributions
w-0.009
s-0.007
z-0.007
inks-0.007
bel-0.006
Token at
Token ID379
Feature activation+6.01e-04

Pos logit contributions
 agreement+0.052
 unison+0.043
 horror+0.041
 Freddie+0.039
 Prince+0.039
Neg logit contributions
w-0.038
s-0.032
z-0.032
inks-0.031
ing-0.029
Token each
Token ID1123
Feature activation+2.61e-03

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.001
 Prince+0.001
 Princess+0.001
Neg logit contributions
w-0.001
s-0.001
z-0.001
inks-0.001
ings-0.001
Token other
Token ID584
Feature activation-1.45e-02

Pos logit contributions
 agreement+0.005
 unison+0.003
 Freddie+0.003
 horror+0.003
 relief+0.003
Neg logit contributions
w-0.007
s-0.006
z-0.006
ing-0.005
ings-0.005
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
w+0.029
z+0.026
s+0.025
inks+0.024
ings+0.023
Neg logit contributions
 agreement-0.033
 unison-0.026
 horror-0.025
 Prince-0.024
 Freddie-0.024
Token They
Token ID1119
Feature activation+4.67e-03

Pos logit contributions
w+0.019
z+0.016
s+0.016
inks+0.015
mons+0.014
Neg logit contributions
 agreement-0.025
 unison-0.020
 horror-0.019
 Prince-0.018
 relief-0.018
Token smiled
Token ID13541
Feature activation-1.71e-02

Pos logit contributions
 agreement+0.011
 unison+0.009
 horror+0.009
 Freddie+0.009
 Prince+0.008
Neg logit contributions
w-0.009
s-0.007
z-0.007
inks-0.007
mons-0.006
Token and
Token ID290
Feature activation-4.38e-03

Pos logit contributions
w+0.027
s+0.026
ing+0.023
z+0.023
inks+0.023
Neg logit contributions
 agreement-0.043
 unison-0.035
 horror-0.034
 Freddie-0.034
 hesitation-0.033
Token hugged
Token ID41130
Feature activation-5.35e-03

Pos logit contributions
w+0.009
s+0.008
inks+0.007
z+0.007
aped+0.007
Neg logit contributions
 agreement-0.010
 horror-0.007
 unison-0.007
 Freddie-0.007
 Prince-0.007
Token.
Token ID13
Feature activation-1.17e-02

Pos logit contributions
w+0.011
s+0.009
z+0.009
inks+0.008
ing+0.008
Neg logit contributions
 agreement-0.012
 unison-0.009
 horror-0.009
 Freddie-0.009
 Prince-0.009
Token
Token ID198
Feature activation-5.06e-03

Pos logit contributions
w+0.016
z+0.014
inks+0.013
s+0.013
mons+0.012
Neg logit contributions
 agreement-0.016
 unison-0.013
 horror-0.012
 Prince-0.012
 Freddie-0.012
Token"
Token ID1
Feature activation-3.09e-03

Pos logit contributions
w+0.010
z+0.009
s+0.008
inks+0.008
ings+0.007
Neg logit contributions
 agreement-0.012
 unison-0.009
 horror-0.009
 Prince-0.009
 Freddie-0.009
TokenWow
Token ID22017
Feature activation+8.26e-03

Pos logit contributions
w+0.007
z+0.006
inks+0.006
s+0.005
ings+0.005
Neg logit contributions
 agreement-0.007
 unison-0.005
 horror-0.005
 Prince-0.005
 Princess-0.005
Token,
Token ID11
Feature activation+8.91e-04

Pos logit contributions
 agreement+0.019
 unison+0.015
 horror+0.015
 Prince+0.014
 Freddie+0.014
Neg logit contributions
w-0.017
z-0.014
inks-0.013
s-0.013
mons-0.012
Token Cat
Token ID5181
Feature activation-2.60e-02

Pos logit contributions
 agreement+0.002
 unison+0.001
 horror+0.001
 Prince+0.001
 Freddie+0.001
Neg logit contributions
w-0.002
z-0.002
s-0.002
inks-0.002
uck-0.002
Tokenty
Token ID774
Feature activation-4.05e-03

Pos logit contributions
w+0.052
z+0.044
s+0.041
inks+0.040
ings+0.038
Neg logit contributions
 agreement-0.062
 unison-0.048
 horror-0.047
 Prince-0.046
 Freddie-0.045
Token is
Token ID318
Feature activation-4.62e-03

Pos logit contributions
w+0.008
z+0.007
inks+0.006
mons+0.006
s+0.006
Neg logit contributions
 agreement-0.009
 horror-0.007
 Freddie-0.007
 unison-0.007
 Prince-0.007
Token a
Token ID257
Feature activation-5.06e-04

Pos logit contributions
w+0.009
z+0.007
inks+0.007
s+0.007
mons+0.006
Neg logit contributions
 agreement-0.011
 unison-0.009
 horror-0.008
 hesitation-0.007
 delight-0.007
Token butterfly
Token ID35113
Feature activation+1.44e-02

Pos logit contributions
w+0.001
z+0.001
inks+0.001
s+0.001
ings+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Freddie-0.001
 Prince-0.001
Token!"
Token ID2474
Feature activation-3.16e-03

Pos logit contributions
 agreement+0.037
 unison+0.030
 horror+0.029
 Prince+0.029
 Freddie+0.029
Neg logit contributions
w-0.024
z-0.020
inks-0.019
s-0.018
mons-0.017
Token Ben
Token ID3932
Feature activation+3.73e-03

Pos logit contributions
w+0.007
s+0.007
z+0.006
inks+0.006
o+0.006
Neg logit contributions
 agreement-0.006
 unison-0.005
 horror-0.004
 relief-0.004
 approval-0.004
Token was
Token ID373
Feature activation-1.08e-02

Pos logit contributions
 agreement+0.019
 unison+0.015
 horror+0.014
 Freddie+0.014
 Prince+0.014
Neg logit contributions
w-0.019
z-0.017
inks-0.016
s-0.015
mons-0.014
Token not
Token ID407
Feature activation-1.04e-02

Pos logit contributions
w+0.024
z+0.020
s+0.019
inks+0.019
ings+0.018
Neg logit contributions
 agreement-0.024
 unison-0.018
 horror-0.018
 Prince-0.017
 Princess-0.016
Token.
Token ID13
Feature activation-1.52e-02

Pos logit contributions
w+0.024
z+0.020
inks+0.019
s+0.019
ings+0.018
Neg logit contributions
 agreement-0.022
 unison-0.017
 horror-0.016
 Prince-0.016
 Princess-0.015
Token It
Token ID632
Feature activation-2.49e-02

Pos logit contributions
w+0.029
s+0.025
z+0.025
inks+0.024
ings+0.023
Neg logit contributions
 agreement-0.035
 unison-0.027
 horror-0.027
 relief-0.026
 hesitation-0.025
Token snapped
Token ID20821
Feature activation-2.44e-02

Pos logit contributions
w+0.047
z+0.038
s+0.036
inks+0.034
ings+0.032
Neg logit contributions
 agreement-0.063
 unison-0.050
 horror-0.049
 Prince-0.048
 Princess-0.046
Token.
Token ID13
Feature activation-1.06e-02

Pos logit contributions
w+0.053
z+0.045
s+0.042
inks+0.041
ings+0.039
Neg logit contributions
 agreement-0.054
 unison-0.042
 horror-0.041
 Prince-0.040
 Freddie-0.039
Token The
Token ID383
Feature activation-1.38e-02

Pos logit contributions
w+0.021
z+0.017
s+0.017
inks+0.017
ings+0.016
Neg logit contributions
 agreement-0.024
 unison-0.019
 horror-0.018
 hesitation-0.018
 relief-0.017
Token tomato
Token ID24240
Feature activation-2.28e-03

Pos logit contributions
w+0.032
z+0.027
s+0.026
inks+0.025
mons+0.025
Neg logit contributions
 agreement-0.028
 unison-0.021
 horror-0.021
 Prince-0.019
 Freddie-0.019
Token fell
Token ID3214
Feature activation-2.46e-02

Pos logit contributions
w+0.005
z+0.004
s+0.004
inks+0.004
ings+0.004
Neg logit contributions
 agreement-0.005
 unison-0.004
 horror-0.004
 Prince-0.004
 Princess-0.004
Token to
Token ID284
Feature activation-6.69e-03

Pos logit contributions
w+0.051
z+0.043
inks+0.040
s+0.039
mons+0.038
Neg logit contributions
 agreement-0.056
 unison-0.043
 horror-0.043
 Prince-0.041
 Freddie-0.041
Token the
Token ID262
Feature activation-8.34e-03

Pos logit contributions
w+0.014
z+0.011
s+0.011
ings+0.011
inks+0.010
Neg logit contributions
 agreement-0.014
 unison-0.011
 horror-0.010
 Prince-0.010
 knight-0.010
Token was
Token ID373
Feature activation+5.81e-04

Pos logit contributions
w+0.001
z+0.001
s+0.001
inks+0.001
ings+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Freddie-0.001
 knight-0.001
Token off
Token ID572
Feature activation+2.23e-02

Pos logit contributions
 agreement+0.001
 unison+0.001
 horror+0.001
 Prince+0.001
 Princess+0.001
Neg logit contributions
w-0.001
z-0.001
inks-0.001
s-0.001
ings-0.001
Token on
Token ID319
Feature activation-1.66e-02

Pos logit contributions
 agreement+0.051
 unison+0.040
 horror+0.039
 Prince+0.038
 Freddie+0.037
Neg logit contributions
w-0.046
z-0.039
inks-0.036
s-0.036
mons-0.034
Token her
Token ID607
Feature activation-5.51e-03

Pos logit contributions
w+0.042
z+0.037
s+0.035
inks+0.034
ings+0.033
Neg logit contributions
 agreement-0.030
 unison-0.022
 horror-0.021
 Prince-0.021
 Princess-0.020
Token golf
Token ID13126
Feature activation+9.78e-03

Pos logit contributions
w+0.013
z+0.012
s+0.011
inks+0.011
ings+0.010
Neg logit contributions
 agreement-0.011
 unison-0.008
 horror-0.008
 Prince-0.007
 Freddie-0.007
Tokening
Token ID278
Feature activation-6.73e-02

Pos logit contributions
 agreement+0.022
 horror+0.017
 unison+0.016
 Prince+0.016
 relief+0.015
Neg logit contributions
w-0.021
z-0.018
s-0.017
inks-0.017
ings-0.016
Token adventure
Token ID8855
Feature activation+9.29e-03

Pos logit contributions
w+0.150
z+0.128
s+0.119
inks+0.119
ings+0.113
Neg logit contributions
 agreement-0.144
 unison-0.110
 horror-0.108
 Prince-0.104
 Freddie-0.101
Token with
Token ID351
Feature activation-1.56e-02

Pos logit contributions
 agreement+0.020
 unison+0.015
 horror+0.014
 Prince+0.014
 Freddie+0.014
Neg logit contributions
w-0.021
z-0.018
inks-0.017
s-0.016
mons-0.016
Token her
Token ID607
Feature activation-5.98e-03

Pos logit contributions
w+0.034
z+0.029
s+0.027
ings+0.026
inks+0.026
Neg logit contributions
 agreement-0.032
 unison-0.024
 horror-0.024
 Prince-0.023
 knight-0.023
Token clean
Token ID3424
Feature activation+1.10e-02

Pos logit contributions
w+0.013
z+0.012
s+0.011
inks+0.011
ings+0.010
Neg logit contributions
 agreement-0.013
 horror-0.009
 unison-0.009
 Prince-0.009
 Freddie-0.009
Token golf
Token ID13126
Feature activation+1.17e-02

Pos logit contributions
 agreement+0.023
 unison+0.017
 horror+0.017
 Prince+0.016
 Freddie+0.016
Neg logit contributions
w-0.025
z-0.022
s-0.021
inks-0.020
ings-0.020
Token they
Token ID484
Feature activation+5.11e-03

Pos logit contributions
w+0.008
s+0.007
z+0.007
mons+0.006
ing+0.006
Neg logit contributions
 agreement-0.007
 unison-0.006
 horror-0.005
 knight-0.005
 hesitation-0.005
Token marched
Token ID23558
Feature activation-1.83e-02

Pos logit contributions
 agreement+0.013
 horror+0.011
 unison+0.011
 Prince+0.011
 Princess+0.010
Neg logit contributions
w-0.010
s-0.008
z-0.007
o-0.007
mons-0.007
Token outside
Token ID2354
Feature activation+2.16e-02

Pos logit contributions
w+0.034
z+0.027
inks+0.026
s+0.025
ings+0.024
Neg logit contributions
 agreement-0.047
 unison-0.037
 horror-0.036
 Prince-0.035
 Freddie-0.035
Token with
Token ID351
Feature activation-1.55e-02

Pos logit contributions
 agreement+0.059
 unison+0.047
 horror+0.047
 Prince+0.045
 Freddie+0.044
Neg logit contributions
w-0.037
z-0.029
inks-0.027
s-0.026
mons-0.024
Token their
Token ID511
Feature activation+4.33e-03

Pos logit contributions
w+0.028
s+0.024
z+0.023
ings+0.023
mons+0.022
Neg logit contributions
 agreement-0.034
 knight-0.028
 Prince-0.028
 unison-0.027
 Princess-0.027
Token bro
Token ID1379
Feature activation-6.77e-02

Pos logit contributions
 agreement+0.009
 unison+0.007
 Prince+0.007
 horror+0.007
 Princess+0.007
Neg logit contributions
w-0.010
z-0.008
s-0.008
inks-0.008
mons-0.008
Tokenoms
Token ID3150
Feature activation-2.00e-02

Pos logit contributions
w+0.125
z+0.106
ings+0.092
inks+0.090
s+0.085
Neg logit contributions
 agreement-0.159
 unison-0.126
 horror-0.120
 end-0.113
 Prince-0.112
Token and
Token ID290
Feature activation-7.65e-03

Pos logit contributions
w+0.038
z+0.033
s+0.032
ing+0.029
inks+0.029
Neg logit contributions
 agreement-0.050
 horror-0.039
 Prince-0.039
 unison-0.039
 Princess-0.038
Token began
Token ID2540
Feature activation-1.35e-02

Pos logit contributions
w+0.018
z+0.016
s+0.014
inks+0.014
mons+0.014
Neg logit contributions
 agreement-0.016
 unison-0.012
 horror-0.012
 Prince-0.011
 Princess-0.011
Token to
Token ID284
Feature activation-9.45e-03

Pos logit contributions
w+0.021
z+0.014
inks+0.013
s+0.013
mons+0.011
Neg logit contributions
 agreement-0.039
 unison-0.032
 horror-0.032
 Prince-0.031
 delight-0.031
Token clean
Token ID3424
Feature activation+1.50e-02

Pos logit contributions
w+0.019
z+0.017
inks+0.015
s+0.015
ings+0.015
Neg logit contributions
 agreement-0.020
 Prince-0.016
 unison-0.016
 horror-0.015
 Princess-0.015
Token Tim
Token ID5045
Feature activation-1.64e-02

Pos logit contributions
w+0.007
s+0.006
z+0.006
inks+0.006
to+0.005
Neg logit contributions
 agreement-0.008
 unison-0.006
 knight-0.006
 horror-0.006
 Princess-0.006
Tokenmy
Token ID1820
Feature activation-1.78e-02

Pos logit contributions
w+0.032
z+0.027
inks+0.025
s+0.024
mons+0.023
Neg logit contributions
 agreement-0.040
 unison-0.031
 horror-0.030
 Prince-0.030
 Freddie-0.029
Token's
Token ID338
Feature activation+7.64e-03

Pos logit contributions
w+0.034
z+0.029
s+0.028
inks+0.027
ing+0.025
Neg logit contributions
 agreement-0.043
 unison-0.034
 horror-0.033
 Prince-0.032
 relief-0.032
Token house
Token ID2156
Feature activation-1.95e-02

Pos logit contributions
 agreement+0.017
 unison+0.013
 horror+0.013
 Prince+0.013
 Freddie+0.012
Neg logit contributions
w-0.016
z-0.014
s-0.013
inks-0.013
ings-0.012
Token and
Token ID290
Feature activation-7.21e-03

Pos logit contributions
w+0.039
z+0.033
s+0.031
inks+0.030
ings+0.028
Neg logit contributions
 agreement-0.046
 unison-0.036
 horror-0.035
 Prince-0.034
 Freddie-0.034
Token celebrated
Token ID13943
Feature activation-6.79e-02

Pos logit contributions
w+0.015
s+0.013
z+0.012
inks+0.011
o+0.011
Neg logit contributions
 agreement-0.017
 unison-0.013
 horror-0.012
 Deal-0.011
 hesitation-0.011
Token with
Token ID351
Feature activation-1.52e-02

Pos logit contributions
w+0.147
z+0.124
s+0.117
inks+0.116
ings+0.109
Neg logit contributions
 agreement-0.151
 unison-0.117
 horror-0.114
 Prince-0.109
 Freddie-0.107
Token ice
Token ID4771
Feature activation+1.74e-02

Pos logit contributions
w+0.030
s+0.026
z+0.024
ings+0.023
inks+0.023
Neg logit contributions
 agreement-0.032
 unison-0.025
 horror-0.024
 knight-0.023
 Princess-0.023
Token cream
Token ID8566
Feature activation-3.43e-04

Pos logit contributions
 agreement+0.034
 end+0.028
 Prince+0.027
kin+0.026
 Nem+0.026
Neg logit contributions
w-0.033
z-0.028
inks-0.027
ew-0.025
eware-0.025
Token.
Token ID13
Feature activation-5.19e-03

Pos logit contributions
w+0.001
z+0.001
s+0.001
inks+0.001
ing+0.001
Neg logit contributions
 agreement-0.001
 unison-0.001
 horror-0.001
 Prince-0.001
 Princess-0.001
Token The
Token ID383
Feature activation-1.59e-02

Pos logit contributions
w+0.013
z+0.011
inks+0.010
s+0.010
ings+0.010
Neg logit contributions
 agreement-0.010
 unison-0.007
 horror-0.007
 Prince-0.007
 Freddie-0.007
Token and
Token ID290
Feature activation-3.49e-03

Pos logit contributions
 agreement+0.013
 unison+0.010
 horror+0.010
 Prince+0.009
 Freddie+0.009
Neg logit contributions
w-0.012
z-0.010
inks-0.009
s-0.009
ings-0.009
Token scary
Token ID14343
Feature activation-8.94e-03

Pos logit contributions
w+0.008
s+0.007
z+0.007
inks+0.006
ing+0.006
Neg logit contributions
 agreement-0.007
 unison-0.006
 horror-0.006
 Prince-0.005
 delight-0.005
Token.
Token ID13
Feature activation+2.33e-03

Pos logit contributions
w+0.019
z+0.016
s+0.016
inks+0.015
ing+0.015
Neg logit contributions
 agreement-0.020
 unison-0.015
 horror-0.015
 Prince-0.015
 Princess-0.014
Token It
Token ID632
Feature activation-8.53e-03

Pos logit contributions
 agreement+0.005
 unison+0.003
 horror+0.003
 hesitation+0.003
 knight+0.003
Neg logit contributions
w-0.005
s-0.005
z-0.005
ew-0.004
inks-0.004
Token bark
Token ID21405
Feature activation+6.91e-03

Pos logit contributions
w+0.016
s+0.014
z+0.013
o+0.011
inks+0.011
Neg logit contributions
 agreement-0.022
 unison-0.018
 horror-0.016
 Prince-0.016
 delight-0.016
Tokened
Token ID276
Feature activation-5.93e-02

Pos logit contributions
 agreement+0.022
 unison+0.019
 horror+0.018
 Prince+0.018
 Freddie+0.018
Neg logit contributions
w-0.008
z-0.005
inks-0.005
ings-0.004
mons-0.004
Token and
Token ID290
Feature activation-3.42e-03

Pos logit contributions
w+0.123
z+0.104
s+0.098
inks+0.096
ing+0.091
Neg logit contributions
 agreement-0.136
 unison-0.107
 horror-0.104
 Prince-0.101
 Freddie-0.098
Token grow
Token ID1663
Feature activation-2.21e-02

Pos logit contributions
w+0.007
s+0.006
z+0.006
inks+0.005
mons+0.005
Neg logit contributions
 agreement-0.008
 unison-0.006
 horror-0.006
 Prince-0.006
 knight-0.006
Tokenled
Token ID992
Feature activation-3.69e-02

Pos logit contributions
w+0.041
z+0.035
inks+0.032
mons+0.029
s+0.029
Neg logit contributions
 agreement-0.053
 unison-0.042
 horror-0.041
 Freddie-0.040
 Prince-0.040
Token and
Token ID290
Feature activation-4.56e-03

Pos logit contributions
w+0.071
z+0.060
s+0.059
inks+0.055
ing+0.055
Neg logit contributions
 agreement-0.087
 unison-0.070
 horror-0.067
 Prince-0.066
 Princess-0.065
Token showed
Token ID3751
Feature activation+1.30e-03

Pos logit contributions
w+0.010
z+0.008
s+0.008
inks+0.007
mons+0.007
Neg logit contributions
 agreement-0.010
 unison-0.008
 horror-0.007
 Prince-0.007
 Princess-0.007
Token her
Token ID607
Feature activation+2.98e-04

Pos logit contributions
w+0.035
z+0.031
inks+0.029
s+0.029
mons+0.028
Neg logit contributions
 agreement-0.023
 unison-0.016
 horror-0.015
 Prince-0.015
 Freddie-0.014
Token hand
Token ID1021
Feature activation-6.20e-03

Pos logit contributions
 agreement+0.001
 unison+0.000
 horror+0.000
 Prince+0.000
 Princess+0.000
Neg logit contributions
w-0.001
z-0.001
s-0.001
inks-0.001
ings-0.001
Token.
Token ID13
Feature activation-2.49e-03

Pos logit contributions
w+0.014
z+0.012
inks+0.012
s+0.011
mons+0.011
Neg logit contributions
 agreement-0.012
 horror-0.009
 unison-0.009
 Prince-0.008
 Freddie-0.008
Token "
Token ID366
Feature activation-2.47e-02

Pos logit contributions
w+0.006
z+0.005
s+0.005
inks+0.005
ings+0.005
Neg logit contributions
 agreement-0.005
 unison-0.003
 horror-0.003
 relief-0.003
 Prince-0.003
TokenO
Token ID46
Feature activation-5.35e-03

Pos logit contributions
w+0.053
z+0.045
s+0.042
inks+0.042
ings+0.039
Neg logit contributions
 agreement-0.055
 unison-0.043
 horror-0.042
 Prince-0.040
 Freddie-0.039
Tokenuch
Token ID794
Feature activation-6.46e-02

Pos logit contributions
bel+0.006
elt+0.005
mons+0.004
inks+0.004
iated+0.004
Neg logit contributions
 agreement-0.011
 gust-0.011
kin-0.011
 Freddie-0.011
lf-0.010
Token!"
Token ID2474
Feature activation+7.66e-03

Pos logit contributions
w+0.127
mons+0.113
z+0.111
inks+0.109
to+0.104
Neg logit contributions
 agreement-0.128
 gust-0.102
 Freddie-0.102
 end-0.097
 knight-0.094
Token cried
Token ID16896
Feature activation-1.07e-02

Pos logit contributions
 agreement+0.015
 unison+0.012
 horror+0.011
 Prince+0.010
 relief+0.010
Neg logit contributions
w-0.018
z-0.016
s-0.015
inks-0.015
ings-0.014
Token Lily
Token ID20037
Feature activation-5.06e-03

Pos logit contributions
w+0.027
z+0.023
s+0.023
ing+0.022
ings+0.021
Neg logit contributions
 agreement-0.019
 unison-0.015
 horror-0.014
 delight-0.013
 hesitation-0.013
Token.
Token ID13
Feature activation-9.38e-04

Pos logit contributions
w+0.012
z+0.010
inks+0.009
s+0.009
mons+0.009
Neg logit contributions
 agreement-0.010
 unison-0.008
 horror-0.008
 Prince-0.007
 Freddie-0.007
Token "
Token ID366
Feature activation-2.34e-02

Pos logit contributions
w+0.002
z+0.002
s+0.002
inks+0.002
ings+0.002
Neg logit contributions
 agreement-0.002
 unison-0.001
 horror-0.001
 Prince-0.001
 relief-0.001