TOP ACTIVATIONS
MAX = 0.283

saw all the tasty ch ocol ates inside . She couldn
looking at all the ch ocol ates , Lex i chose
started eating the delicious ch ocol ates . She was so
was filled with delicious ch ocol ates . He was happy
as he opened the ch ocol ates and started eating them
bled up all the ch ocol ates , and she was
't have all these ch ocol ates ! Now , go
a big box of ch ocol ates ! <|endoftext|> Once upon
a large selection of ch ocol ates . Ell a picked
he ate all the ch ocol ates and sweets . Then
box and share the ch ocol ates . They are happy
and her favourite , ch ocol ates ! She asked mummy
not eat too many ch ocol ates and his friend learned
and started eating the ch ocol ates . The shop keeper
noticed something . The ch ocol ates were bitter . They
a big box of ch ocol ates . It was so
a big box of ch ocol ates to share with the
a big box of ch ocol ates !
was a box of ch ocol ates ! Alice and her
she paid for the ch ocol ates and the two of

MIN ACTIVATIONS
MIN = -0.143

counters . When they were done , they sat down to
do . But I 'm done now . Come on ,
. And when we are done , we can play with
golden . When they were done , June opened the oven
sand . When they were done , Jane thanked Bob for
him . When they were done , he thanked her for
carefully . When Daddy was done , the door was locked
away . When they were done , her mother said ,
patiently . When they were done , Lily 's mom took
carefully . When she was done , the duck said ,
bake . When they were done , Tim my 's wish
fun ! When they were done , Lily 's mom looked
correctly . When she was done , the owl smiled and
smooth . When they were done , Jack and Jill hopped
before . When they were done , they were very proud
finish . When they were done , Lily noticed that her
cloth . When they were done , they went inside the
snacks . When they were done , Lucy felt ready for
another . When they were done , John felt proud of
him and said he 'd done a great job . The

INTERVAL 0.177 - 0.283
CONTAINS 0.002%

find a box of ch ocol ates . "
. <|endoftext|> Peter was a mis chie vous 3 year old
a big box of ch ocol ates to share with the
, Bugs said , " Twe ety , I have a
just a little bit of mis beh aving .

INTERVAL 0.070 - 0.177
CONTAINS 3.319%

a smile y face . Her mom saw the mark and
sad anymore . She had her very own beautiful rainbow to
didn 't want to . Her friends asked her why she
. He quickly stole something from his friend and ran away
That is the moon , my child . It 's very

INTERVAL -0.036 - 0.070
CONTAINS 80.662%

animal . One day , the turtle was running through the
Fr anny understood and carefully put the animal back .
flew all around the room . Mom thought for
and cut the lion 's dull paw . The lion roared
together . They drank water and ate healthy meat .

INTERVAL -0.143 - -0.036
CONTAINS 16.017%

Tim says . They open the box and take
was very angry and he shouted at Tom . He said
than usual for fun , they speed . Paul
When she was done , she felt happy and
She cooked the mixture until it smelled y ummy .

TOKEN DETAILS (TOP ACTIVATIONS)

Token saw
Token ID2497
Feature activation+1.47e-02

Pos logit contributions
 exclaimed+0.016
 replied+0.016
cakes+0.015
s+0.014
�+0.014
Neg logit contributions
ocol-0.038
 loser-0.033
ubb-0.033
uff-0.033
 bruised-0.033
Token all
Token ID477
Feature activation-1.05e-02

Pos logit contributions
ocol+0.044
 loser+0.039
uff+0.039
ithering+0.039
ubb+0.039
Neg logit contributions
 exclaimed-0.013
 replied-0.012
 said-0.012
s-0.011
�-0.010
Token the
Token ID262
Feature activation-1.14e-02

Pos logit contributions
 exclaimed+0.011
 replied+0.011
 said+0.010
�+0.010
s+0.010
Neg logit contributions
ocol-0.029
uff-0.025
ubb-0.025
 loser-0.025
 bruised-0.025
Token tasty
Token ID25103
Feature activation+2.40e-02

Pos logit contributions
 said+0.017
 exclaimed+0.017
 replied+0.016
s+0.015
 looked+0.015
Neg logit contributions
ocol-0.027
ithering-0.024
uff-0.024
 loser-0.024
ubb-0.024
Token ch
Token ID442
Feature activation+7.50e-02

Pos logit contributions
ocol+0.056
 loser+0.051
uff+0.050
aught+0.050
ithering+0.050
Neg logit contributions
 said-0.034
 exclaimed-0.034
 replied-0.034
s-0.032
�-0.030
Tokenocol
Token ID4668
Feature activation+2.83e-01

Pos logit contributions
 Kang+0.024
 bruised+0.024
aught+0.022
asured+0.020
 loser+0.020
Neg logit contributions
s-0.219
 exclaimed-0.212
 said-0.210
cakes-0.208
 replied-0.208
Tokenates
Token ID689
Feature activation+1.45e-02

Pos logit contributions
ocol+0.364
 loser+0.302
 bruised+0.283
ubb+0.282
aught+0.279
Neg logit contributions
 exclaimed-0.683
 replied-0.670
 said-0.666
s-0.660
cakes-0.645
Token inside
Token ID2641
Feature activation+1.11e-02

Pos logit contributions
ocol+0.038
uff+0.033
 loser+0.033
ubb+0.033
ithering+0.032
Neg logit contributions
 exclaimed-0.017
 replied-0.017
 said-0.016
�-0.015
s-0.014
Token.
Token ID13
Feature activation+3.74e-03

Pos logit contributions
ocol+0.031
uff+0.028
 loser+0.028
ubb+0.027
ithering+0.027
Neg logit contributions
 exclaimed-0.011
 replied-0.011
 said-0.011
s-0.010
�-0.010
Token She
Token ID1375
Feature activation+2.05e-02

Pos logit contributions
ocol+0.009
uff+0.008
uzz+0.008
aught+0.008
 loser+0.008
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
s-0.005
�-0.005
Token couldn
Token ID3521
Feature activation-2.07e-02

Pos logit contributions
ocol+0.051
asured+0.047
uff+0.046
uzz+0.046
 bruised+0.046
Neg logit contributions
�-0.021
cakes-0.020
 exclaimed-0.019
 replied-0.019
s-0.019
Token looking
Token ID2045
Feature activation+2.53e-03

Pos logit contributions
ocol+0.053
 loser+0.048
ubb+0.047
uff+0.047
aught+0.046
Neg logit contributions
 replied-0.020
 said-0.020
 exclaimed-0.020
 they-0.018
s-0.017
Token at
Token ID379
Feature activation+5.84e-02

Pos logit contributions
ocol+0.006
 loser+0.005
uff+0.005
ubb+0.005
ithering+0.005
Neg logit contributions
 exclaimed-0.004
 replied-0.004
 said-0.004
s-0.004
�-0.004
Token all
Token ID477
Feature activation-1.17e-02

Pos logit contributions
ocol+0.174
 loser+0.162
ithering+0.159
uff+0.153
aught+0.153
Neg logit contributions
 said-0.050
 replied-0.049
 exclaimed-0.047
s-0.044
 -0.044
Token the
Token ID262
Feature activation-9.44e-03

Pos logit contributions
 exclaimed+0.010
 replied+0.010
 said+0.010
s+0.009
�+0.008
Neg logit contributions
ocol-0.035
 loser-0.031
uff-0.031
ithering-0.031
ubb-0.030
Token ch
Token ID442
Feature activation+7.29e-02

Pos logit contributions
 said+0.013
 exclaimed+0.012
 replied+0.012
s+0.012
�+0.011
Neg logit contributions
ocol-0.024
 loser-0.022
uzz-0.021
ubb-0.021
ithering-0.021
Tokenocol
Token ID4668
Feature activation+2.81e-01

Pos logit contributions
 loser+0.033
 prisoner+0.029
 unsolved+0.028
 property+0.028
 seeker+0.027
Neg logit contributions
s-0.216
cakes-0.210
 exclaimed-0.207
 said-0.207
 replied-0.206
Tokenates
Token ID689
Feature activation+1.22e-02

Pos logit contributions
ocol+0.386
 loser+0.347
 bruised+0.327
ubb+0.325
aught+0.313
Neg logit contributions
 exclaimed-0.625
s-0.610
 said-0.608
 replied-0.608
cakes-0.597
Token,
Token ID11
Feature activation+1.93e-02

Pos logit contributions
ocol+0.036
 loser+0.032
uff+0.032
ubb+0.032
ithering+0.032
Neg logit contributions
 exclaimed-0.011
 replied-0.011
 said-0.010
s-0.009
�-0.009
Token Lex
Token ID17210
Feature activation+7.14e-02

Pos logit contributions
ocol+0.043
uff+0.036
 loser+0.036
ubb+0.036
ithering+0.035
Neg logit contributions
 exclaimed-0.031
 replied-0.031
 said-0.030
s-0.029
�-0.028
Tokeni
Token ID72
Feature activation+2.82e-02

Pos logit contributions
ocol+0.116
 loser+0.094
uff+0.092
ubb+0.090
ithering+0.090
Neg logit contributions
 exclaimed-0.158
 replied-0.157
 said-0.155
s-0.150
�-0.147
Token chose
Token ID7690
Feature activation+2.44e-02

Pos logit contributions
ocol+0.086
 loser+0.076
uff+0.076
ubb+0.075
ithering+0.075
Neg logit contributions
 exclaimed-0.023
 replied-0.023
 said-0.022
s-0.020
�-0.019
Token started
Token ID2067
Feature activation+1.75e-02

Pos logit contributions
 replied+0.037
 exclaimed+0.037
�+0.036
cakes+0.033
s+0.033
Neg logit contributions
ocol-0.081
uff-0.072
uzz-0.071
 bruised-0.070
ubb-0.069
Token eating
Token ID6600
Feature activation-1.64e-02

Pos logit contributions
ocol+0.044
uff+0.038
uzz+0.037
aught+0.037
ubb+0.037
Neg logit contributions
 replied-0.023
 exclaimed-0.023
�-0.022
 said-0.021
cakes-0.020
Token the
Token ID262
Feature activation-2.70e-02

Pos logit contributions
 exclaimed+0.018
 replied+0.017
 said+0.016
�+0.015
s+0.015
Neg logit contributions
ocol-0.045
uff-0.040
 loser-0.039
ubb-0.039
 bruised-0.039
Token delicious
Token ID12625
Feature activation-2.53e-03

Pos logit contributions
 exclaimed+0.038
 replied+0.037
 said+0.035
s+0.034
�+0.034
Neg logit contributions
ocol-0.066
uff-0.055
 bruised-0.055
ubb-0.054
 loser-0.054
Token ch
Token ID442
Feature activation+6.84e-02

Pos logit contributions
 exclaimed+0.003
 replied+0.003
 said+0.003
s+0.003
�+0.003
Neg logit contributions
ocol-0.006
uff-0.005
 loser-0.005
ubb-0.005
ithering-0.005
Tokenocol
Token ID4668
Feature activation+2.78e-01

Pos logit contributions
 loser+0.021
 bruised+0.020
uzz+0.018
ithering+0.016
asured+0.016
Neg logit contributions
s-0.205
cakes-0.202
 exclaimed-0.202
 replied-0.200
 said-0.198
Tokenates
Token ID689
Feature activation+6.71e-03

Pos logit contributions
ocol+0.365
 loser+0.291
ubb+0.278
 bruised+0.275
uff+0.275
Neg logit contributions
 exclaimed-0.679
 replied-0.667
 said-0.660
s-0.652
cakes-0.639
Token.
Token ID13
Feature activation-5.91e-03

Pos logit contributions
ocol+0.017
uff+0.015
ubb+0.015
 loser+0.015
uzz+0.015
Neg logit contributions
 replied-0.008
 exclaimed-0.008
�-0.007
 said-0.007
cakes-0.007
Token She
Token ID1375
Feature activation+1.77e-02

Pos logit contributions
 replied+0.006
 exclaimed+0.006
 said+0.005
cakes+0.005
s+0.005
Neg logit contributions
ocol-0.015
uzz-0.014
uff-0.014
ithering-0.013
 loser-0.013
Token was
Token ID373
Feature activation+1.09e-02

Pos logit contributions
ocol+0.049
uff+0.044
 bruised+0.043
aught+0.043
uzz+0.043
Neg logit contributions
 replied-0.014
 exclaimed-0.014
�-0.014
cakes-0.013
s-0.013
Token so
Token ID523
Feature activation-6.46e-02

Pos logit contributions
ocol+0.033
 loser+0.029
uff+0.029
ubb+0.029
ithering+0.029
Neg logit contributions
 exclaimed-0.010
 replied-0.009
 said-0.009
s-0.008
 -0.007
Token was
Token ID373
Feature activation+3.56e-02

Pos logit contributions
 exclaimed+0.056
 replied+0.055
 said+0.052
�+0.051
cakes+0.049
Neg logit contributions
ocol-0.122
uff-0.107
ubb-0.106
 loser-0.106
aught-0.105
Token filled
Token ID5901
Feature activation+8.05e-03

Pos logit contributions
ocol+0.090
uff+0.079
 loser+0.079
ubb+0.077
ithering+0.076
Neg logit contributions
 exclaimed-0.048
 replied-0.047
 said-0.046
s-0.042
�-0.041
Token with
Token ID351
Feature activation+3.82e-02

Pos logit contributions
ocol+0.019
 loser+0.016
uff+0.016
ubb+0.016
ithering+0.016
Neg logit contributions
 exclaimed-0.012
 replied-0.012
 said-0.011
s-0.011
�-0.011
Token delicious
Token ID12625
Feature activation+7.75e-04

Pos logit contributions
ocol+0.098
 loser+0.090
uff+0.089
ithering+0.088
ubb+0.085
Neg logit contributions
 exclaimed-0.048
 said-0.047
 replied-0.047
s-0.041
 -0.039
Token ch
Token ID442
Feature activation+6.69e-02

Pos logit contributions
ocol+0.002
 loser+0.002
uff+0.002
ubb+0.002
ithering+0.002
Neg logit contributions
 exclaimed-0.001
 replied-0.001
 said-0.001
s-0.001
�-0.001
Tokenocol
Token ID4668
Feature activation+2.77e-01

Pos logit contributions
 loser+0.042
 bruised+0.036
uzz+0.033
ateful+0.032
asured+0.031
Neg logit contributions
 exclaimed-0.185
s-0.184
 replied-0.182
 said-0.179
�-0.174
Tokenates
Token ID689
Feature activation+5.37e-03

Pos logit contributions
ocol+0.354
 loser+0.298
 bruised+0.276
ubb+0.274
uff+0.270
Neg logit contributions
 exclaimed-0.675
 replied-0.656
 said-0.652
s-0.646
cakes-0.627
Token.
Token ID13
Feature activation-3.61e-03

Pos logit contributions
ocol+0.014
uff+0.013
 loser+0.012
ubb+0.012
aught+0.012
Neg logit contributions
 exclaimed-0.006
 replied-0.006
 said-0.006
�-0.005
s-0.005
Token He
Token ID679
Feature activation+1.74e-02

Pos logit contributions
 exclaimed+0.004
 replied+0.004
 said+0.004
s+0.003
�+0.003
Neg logit contributions
ocol-0.010
uff-0.009
 loser-0.009
ubb-0.009
ithering-0.009
Token was
Token ID373
Feature activation+1.03e-02

Pos logit contributions
ocol+0.048
uff+0.043
ubb+0.042
aught+0.042
 bruised+0.042
Neg logit contributions
 exclaimed-0.015
 replied-0.015
�-0.014
cakes-0.014
s-0.014
Token happy
Token ID3772
Feature activation-3.11e-02

Pos logit contributions
ocol+0.029
 loser+0.026
uff+0.025
ubb+0.025
ithering+0.025
Neg logit contributions
 exclaimed-0.011
 replied-0.011
 said-0.010
s-0.009
�-0.009
Token as
Token ID355
Feature activation+3.15e-02

Pos logit contributions
 exclaimed+0.058
 replied+0.056
 said+0.053
�+0.051
cakes+0.050
Neg logit contributions
ocol-0.130
 loser-0.116
uff-0.115
ubb-0.113
aught-0.112
Token he
Token ID339
Feature activation+1.33e-02

Pos logit contributions
ocol+0.101
uff+0.091
 loser+0.090
ithering+0.089
ubb+0.089
Neg logit contributions
 exclaimed-0.020
 replied-0.020
 said-0.020
s-0.016
 they-0.016
Token opened
Token ID4721
Feature activation+2.76e-05

Pos logit contributions
ocol+0.039
uff+0.034
 loser+0.034
ubb+0.034
ithering+0.033
Neg logit contributions
 exclaimed-0.013
 replied-0.013
 said-0.012
s-0.011
�-0.011
Token the
Token ID262
Feature activation-2.53e-02

Pos logit contributions
ocol+0.000
 loser+0.000
uff+0.000
ithering+0.000
ubb+0.000
Neg logit contributions
 exclaimed-0.000
 replied-0.000
 said-0.000
s-0.000
 -0.000
Token ch
Token ID442
Feature activation+5.90e-02

Pos logit contributions
 replied+0.033
 exclaimed+0.033
 said+0.032
s+0.030
�+0.030
Neg logit contributions
ocol-0.064
uff-0.056
 loser-0.055
ubb-0.055
ithering-0.055
Tokenocol
Token ID4668
Feature activation+2.77e-01

Pos logit contributions
ocol+0.034
 bruised+0.032
 loser+0.032
ithering+0.031
ities+0.029
Neg logit contributions
 replied-0.169
 exclaimed-0.167
s-0.163
 said-0.163
�-0.158
Tokenates
Token ID689
Feature activation+4.73e-03

Pos logit contributions
ocol+0.350
 loser+0.305
 bruised+0.297
aught+0.281
ubb+0.281
Neg logit contributions
 exclaimed-0.655
 replied-0.638
 said-0.632
s-0.630
�-0.608
Token and
Token ID290
Feature activation-8.89e-03

Pos logit contributions
ocol+0.013
uff+0.011
 loser+0.011
ubb+0.011
ithering+0.011
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
s-0.005
�-0.005
Token started
Token ID2067
Feature activation+2.35e-02

Pos logit contributions
 exclaimed+0.010
 replied+0.010
 said+0.010
s+0.009
�+0.009
Neg logit contributions
ocol-0.024
uff-0.021
 loser-0.021
ubb-0.021
ithering-0.021
Token eating
Token ID6600
Feature activation-1.34e-02

Pos logit contributions
ocol+0.063
 loser+0.056
uff+0.055
ubb+0.055
ithering+0.054
Neg logit contributions
 exclaimed-0.027
 replied-0.027
 said-0.026
s-0.024
�-0.022
Token them
Token ID606
Feature activation-7.99e-03

Pos logit contributions
 said+0.013
 exclaimed+0.013
 replied+0.012
s+0.011
 they+0.011
Neg logit contributions
ocol-0.038
 loser-0.035
uff-0.035
ithering-0.034
aught-0.034
Tokenbled
Token ID9342
Feature activation-4.36e-02

Pos logit contributions
ocol+0.008
uff+0.007
ithering+0.006
uzz+0.006
ubb+0.006
Neg logit contributions
 exclaimed-0.013
 replied-0.012
 said-0.012
�-0.012
s-0.012
Token up
Token ID510
Feature activation+2.49e-03

Pos logit contributions
 exclaimed+0.050
 replied+0.049
 said+0.047
s+0.044
�+0.043
Neg logit contributions
ocol-0.117
 loser-0.103
uff-0.103
ubb-0.102
ithering-0.101
Token all
Token ID477
Feature activation-1.83e-02

Pos logit contributions
ocol+0.007
uff+0.006
 loser+0.006
ubb+0.006
 bruised+0.006
Neg logit contributions
 exclaimed-0.002
 replied-0.002
 said-0.002
s-0.002
�-0.002
Token the
Token ID262
Feature activation-1.15e-02

Pos logit contributions
 exclaimed+0.024
 replied+0.023
�+0.022
 said+0.021
cakes+0.020
Neg logit contributions
ocol-0.045
uff-0.039
 bruised-0.039
uzz-0.039
ubb-0.038
Token ch
Token ID442
Feature activation+6.75e-02

Pos logit contributions
 exclaimed+0.018
 replied+0.017
 said+0.017
s+0.016
�+0.016
Neg logit contributions
ocol-0.027
uff-0.023
 loser-0.023
ubb-0.023
ithering-0.022
Tokenocol
Token ID4668
Feature activation+2.76e-01

Pos logit contributions
 Kang+0.025
 loser+0.025
 bruised+0.024
 seeker+0.023
uding+0.022
Neg logit contributions
s-0.195
 exclaimed-0.189
 said-0.187
 replied-0.185
cakes-0.184
Tokenates
Token ID689
Feature activation+6.37e-03

Pos logit contributions
ocol+0.359
 loser+0.298
 bruised+0.280
ubb+0.278
aught+0.274
Neg logit contributions
 exclaimed-0.666
 replied-0.654
 said-0.649
s-0.645
cakes-0.628
Token,
Token ID11
Feature activation+1.49e-02

Pos logit contributions
ocol+0.015
uff+0.014
uzz+0.013
ubb+0.013
 bruised+0.013
Neg logit contributions
 exclaimed-0.008
 replied-0.008
�-0.007
 said-0.007
cakes-0.007
Token and
Token ID290
Feature activation-2.59e-02

Pos logit contributions
ocol+0.040
 loser+0.035
uff+0.035
ithering+0.035
ubb+0.034
Neg logit contributions
 exclaimed-0.018
 replied-0.018
 said-0.018
s-0.016
�-0.015
Token she
Token ID673
Feature activation-2.62e-02

Pos logit contributions
 exclaimed+0.027
 replied+0.027
 said+0.025
�+0.024
s+0.024
Neg logit contributions
ocol-0.070
uff-0.062
 loser-0.062
ubb-0.062
 bruised-0.062
Token was
Token ID373
Feature activation+1.02e-02

Pos logit contributions
 exclaimed+0.024
 replied+0.024
�+0.023
cakes+0.023
s+0.022
Neg logit contributions
ocol-0.069
uff-0.063
 bruised-0.062
asured-0.062
aught-0.062
Token't
Token ID470
Feature activation-2.66e-02

Pos logit contributions
 exclaimed+0.025
 replied+0.025
 said+0.024
s+0.022
�+0.022
Neg logit contributions
ocol-0.053
 loser-0.047
uff-0.046
ubb-0.046
ithering-0.045
Token have
Token ID423
Feature activation-4.14e-02

Pos logit contributions
 replied+0.032
 exclaimed+0.031
 said+0.031
�+0.028
s+0.027
Neg logit contributions
ocol-0.072
uff-0.064
ubb-0.063
 loser-0.061
ithering-0.061
Token all
Token ID477
Feature activation+6.90e-03

Pos logit contributions
 exclaimed+0.059
 replied+0.059
 said+0.055
s+0.054
�+0.053
Neg logit contributions
ocol-0.099
 loser-0.085
ubb-0.083
 bruised-0.083
uff-0.083
Token these
Token ID777
Feature activation+5.55e-02

Pos logit contributions
ocol+0.017
 loser+0.015
aught+0.015
 bruised+0.015
uff+0.015
Neg logit contributions
 replied-0.009
 exclaimed-0.009
�-0.008
 said-0.008
s-0.008
Token ch
Token ID442
Feature activation+8.76e-02

Pos logit contributions
ocol+0.136
ithering+0.123
 loser+0.122
uff+0.122
ubb+0.121
Neg logit contributions
 said-0.075
 exclaimed-0.075
 replied-0.072
s-0.070
�-0.067
Tokenocol
Token ID4668
Feature activation+2.76e-01

Pos logit contributions
 unsolved+0.039
 loser+0.035
ithering+0.031
 seeker+0.030
 bruised+0.030
Neg logit contributions
s-0.255
 exclaimed-0.246
 said-0.244
 replied-0.239
cakes-0.239
Tokenates
Token ID689
Feature activation+2.15e-02

Pos logit contributions
ocol+0.372
 loser+0.343
 bruised+0.334
ubb+0.320
arter+0.311
Neg logit contributions
 exclaimed-0.607
s-0.592
 said-0.587
 replied-0.578
cakes-0.576
Token!
Token ID0
Feature activation+1.71e-02

Pos logit contributions
ocol+0.057
uff+0.050
 loser+0.050
ubb+0.049
ithering+0.049
Neg logit contributions
 exclaimed-0.026
 replied-0.025
 said-0.025
s-0.024
�-0.023
Token Now
Token ID2735
Feature activation-4.68e-02

Pos logit contributions
ocol+0.036
ubb+0.031
arter+0.030
ithering+0.030
uff+0.030
Neg logit contributions
 -0.032
 exclaimed-0.030
 said-0.030
 replied-0.030
�-0.029
Token,
Token ID11
Feature activation+1.35e-02

Pos logit contributions
 exclaimed+0.059
 replied+0.059
 said+0.058
s+0.053
�+0.053
Neg logit contributions
ocol-0.123
uff-0.108
 loser-0.106
ubb-0.106
ithering-0.106
Token go
Token ID467
Feature activation+2.91e-02

Pos logit contributions
ocol+0.036
uff+0.032
ithering+0.031
ubb+0.031
arter+0.031
Neg logit contributions
 said-0.018
 replied-0.017
 exclaimed-0.017
 they-0.017
�-0.015
Token a
Token ID257
Feature activation-6.86e-02

Pos logit contributions
ocol+0.100
uff+0.091
 loser+0.091
ithering+0.089
ubb+0.088
Neg logit contributions
 said-0.030
 exclaimed-0.030
 replied-0.029
s-0.024
 -0.024
Token big
Token ID1263
Feature activation-3.65e-02

Pos logit contributions
 exclaimed+0.090
 replied+0.088
 said+0.085
s+0.081
�+0.079
Neg logit contributions
ocol-0.172
 loser-0.151
uff-0.149
ubb-0.148
ithering-0.146
Token box
Token ID3091
Feature activation-7.71e-03

Pos logit contributions
 replied+0.037
 said+0.035
 exclaimed+0.033
�+0.031
 looked+0.029
Neg logit contributions
ocol-0.104
uff-0.094
aught-0.093
ubb-0.093
ithering-0.092
Token of
Token ID286
Feature activation+6.33e-02

Pos logit contributions
 exclaimed+0.008
 replied+0.008
 said+0.008
s+0.007
�+0.007
Neg logit contributions
ocol-0.021
uff-0.019
 loser-0.019
ubb-0.018
ithering-0.018
Token ch
Token ID442
Feature activation+6.21e-02

Pos logit contributions
ocol+0.152
uff+0.144
ithering+0.142
 loser+0.142
ubb+0.140
Neg logit contributions
 said-0.088
 exclaimed-0.086
 replied-0.086
s-0.074
�-0.072
Tokenocol
Token ID4668
Feature activation+2.76e-01

Pos logit contributions
 loser+0.039
 bruised+0.033
 property+0.032
 Kang+0.030
 seeker+0.030
Neg logit contributions
s-0.168
 exclaimed-0.165
 replied-0.162
 said-0.161
les-0.158
Tokenates
Token ID689
Feature activation+4.42e-03

Pos logit contributions
ocol+0.353
 loser+0.290
ubb+0.274
 bruised+0.272
aught+0.267
Neg logit contributions
 exclaimed-0.668
 replied-0.653
 said-0.649
s-0.638
cakes-0.621
Token!
Token ID0
Feature activation-6.84e-03

Pos logit contributions
ocol+0.012
 loser+0.011
uff+0.011
ubb+0.011
ithering+0.011
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
s-0.004
�-0.004
Token<|endoftext|>
Token ID50256
Feature activation+1.35e-02

Pos logit contributions
 exclaimed+0.005
 replied+0.005
 said+0.005
s+0.005
�+0.004
Neg logit contributions
ocol-0.021
uff-0.018
 loser-0.018
ubb-0.018
aught-0.018
TokenOnce
Token ID7454
Feature activation+2.24e-02

Pos logit contributions
ocol+0.032
uff+0.028
 loser+0.028
ithering+0.028
uzz+0.027
Neg logit contributions
 exclaimed-0.019
 replied-0.018
 said-0.017
cakes-0.017
s-0.016
Token upon
Token ID2402
Feature activation+6.97e-02

Pos logit contributions
ocol+0.037
uff+0.031
uzz+0.030
 loser+0.029
aught+0.029
Neg logit contributions
 exclaimed-0.047
 replied-0.046
cakes-0.044
 said-0.044
�-0.044
Token a
Token ID257
Feature activation-5.21e-02

Pos logit contributions
 exclaimed+0.048
 replied+0.048
�+0.044
s+0.043
 said+0.042
Neg logit contributions
ocol-0.113
 loser-0.101
uff-0.100
 bruised-0.099
aught-0.098
Token large
Token ID1588
Feature activation-5.52e-03

Pos logit contributions
 replied+0.058
 exclaimed+0.057
 said+0.057
s+0.052
�+0.052
Neg logit contributions
ocol-0.143
ubb-0.127
uff-0.126
 loser-0.124
uzz-0.123
Token selection
Token ID6356
Feature activation+2.98e-02

Pos logit contributions
 replied+0.008
 said+0.007
 exclaimed+0.007
�+0.007
 looked+0.007
Neg logit contributions
ocol-0.013
ubb-0.012
ateful-0.012
uzz-0.012
aught-0.012
Token of
Token ID286
Feature activation+6.66e-02

Pos logit contributions
ocol+0.086
 loser+0.077
uff+0.076
ubb+0.076
ithering+0.075
Neg logit contributions
 exclaimed-0.028
 replied-0.028
 said-0.027
s-0.026
cakes-0.024
Token ch
Token ID442
Feature activation+7.02e-02

Pos logit contributions
ocol+0.157
 loser+0.156
ithering+0.151
ubb+0.149
arter+0.149
Neg logit contributions
 said-0.092
 exclaimed-0.090
 replied-0.089
s-0.083
 -0.079
Tokenocol
Token ID4668
Feature activation+2.75e-01

Pos logit contributions
 bruised+0.036
 unsolved+0.030
 loser+0.030
 property+0.028
 seeker+0.027
Neg logit contributions
s-0.210
 exclaimed-0.195
cakes-0.192
 said-0.192
les-0.188
Tokenates
Token ID689
Feature activation+1.72e-02

Pos logit contributions
ocol+0.356
 loser+0.324
 bruised+0.320
ubb+0.303
aught+0.300
Neg logit contributions
 exclaimed-0.620
s-0.605
 said-0.598
 replied-0.597
cakes-0.589
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
ocol+0.050
 loser+0.044
uff+0.044
ubb+0.044
ithering+0.043
Neg logit contributions
 exclaimed-0.017
 replied-0.016
 said-0.016
s-0.015
�-0.014
Token Ell
Token ID7122
Feature activation+3.41e-02

Pos logit contributions
ocol+0.034
 loser+0.030
ubb+0.030
ithering+0.030
arter+0.030
Neg logit contributions
 exclaimed-0.016
 replied-0.015
 said-0.015
 -0.015
s-0.014
Tokena
Token ID64
Feature activation-3.04e-02

Pos logit contributions
ocol+0.055
uff+0.047
aught+0.044
 loser+0.044
uzz+0.044
Neg logit contributions
 replied-0.073
 exclaimed-0.073
 said-0.070
cakes-0.069
�-0.069
Token picked
Token ID6497
Feature activation-1.76e-02

Pos logit contributions
 exclaimed+0.025
 replied+0.024
 said+0.023
s+0.021
�+0.021
Neg logit contributions
ocol-0.089
uff-0.080
 loser-0.079
ubb-0.078
aught-0.078
Token he
Token ID339
Feature activation+2.57e-03

Pos logit contributions
 exclaimed+0.020
 replied+0.020
 said+0.019
s+0.016
�+0.015
Neg logit contributions
ocol-0.084
uff-0.076
 loser-0.075
ubb-0.074
ithering-0.074
Token ate
Token ID15063
Feature activation-4.29e-02

Pos logit contributions
ocol+0.006
uff+0.006
 loser+0.006
 bruised+0.006
ubb+0.006
Neg logit contributions
 replied-0.003
 exclaimed-0.003
cakes-0.003
�-0.003
s-0.003
Token all
Token ID477
Feature activation-2.26e-02

Pos logit contributions
 exclaimed+0.049
 replied+0.048
 said+0.046
s+0.043
�+0.042
Neg logit contributions
ocol-0.116
uff-0.101
 loser-0.100
ubb-0.100
 bruised-0.098
Token the
Token ID262
Feature activation-1.82e-02

Pos logit contributions
 exclaimed+0.032
 replied+0.031
�+0.029
 said+0.028
cakes+0.027
Neg logit contributions
ocol-0.054
uff-0.045
uzz-0.045
ubb-0.044
aught-0.044
Token ch
Token ID442
Feature activation+6.41e-02

Pos logit contributions
 exclaimed+0.024
 replied+0.024
 said+0.023
s+0.022
�+0.021
Neg logit contributions
ocol-0.046
uff-0.038
ubb-0.038
 loser-0.038
 bruised-0.038
Tokenocol
Token ID4668
Feature activation+2.74e-01

Pos logit contributions
 loser+0.021
 property+0.019
 seeker+0.016
ithering+0.016
 unsolved+0.014
Neg logit contributions
 exclaimed-0.200
s-0.199
 replied-0.197
 said-0.195
cakes-0.188
Tokenates
Token ID689
Feature activation+9.42e-03

Pos logit contributions
ocol+0.352
 loser+0.306
 bruised+0.287
ubb+0.281
aught+0.281
Neg logit contributions
 exclaimed-0.653
 replied-0.635
 said-0.633
s-0.628
cakes-0.608
Token and
Token ID290
Feature activation-4.44e-03

Pos logit contributions
ocol+0.023
uff+0.021
ubb+0.020
uzz+0.020
 loser+0.020
Neg logit contributions
 exclaimed-0.012
 replied-0.012
�-0.011
 said-0.011
cakes-0.010
Token sweets
Token ID42402
Feature activation+1.12e-02

Pos logit contributions
 exclaimed+0.004
 replied+0.004
s+0.004
�+0.004
 said+0.004
Neg logit contributions
ocol-0.012
 bruised-0.011
 loser-0.011
ubb-0.011
uff-0.011
Token.
Token ID13
Feature activation+6.54e-03

Pos logit contributions
ocol+0.026
uff+0.023
 loser+0.023
 bruised+0.023
ubb+0.022
Neg logit contributions
 exclaimed-0.015
 replied-0.015
�-0.014
 said-0.013
 -0.013
Token Then
Token ID3244
Feature activation-2.24e-02

Pos logit contributions
ocol+0.018
 loser+0.016
ubb+0.016
ithering+0.015
ateful+0.015
Neg logit contributions
 replied-0.008
 -0.008
 exclaimed-0.008
 said-0.008
s-0.007
Token box
Token ID3091
Feature activation-1.32e-02

Pos logit contributions
 replied+0.032
 exclaimed+0.031
 said+0.031
s+0.030
�+0.029
Neg logit contributions
ocol-0.073
uff-0.066
ubb-0.066
 loser-0.065
aught-0.065
Token and
Token ID290
Feature activation-1.91e-02

Pos logit contributions
 exclaimed+0.017
�+0.017
 replied+0.016
cakes+0.015
 said+0.014
Neg logit contributions
ocol-0.030
uff-0.029
uzz-0.028
aught-0.028
 loser-0.027
Token share
Token ID2648
Feature activation-3.54e-02

Pos logit contributions
 exclaimed+0.022
 replied+0.021
 said+0.020
s+0.019
cakes+0.018
Neg logit contributions
ocol-0.051
 loser-0.045
uff-0.044
ubb-0.044
 bruised-0.044
Token the
Token ID262
Feature activation-3.21e-02

Pos logit contributions
 exclaimed+0.039
 replied+0.039
 said+0.034
�+0.034
cakes+0.033
Neg logit contributions
ocol-0.096
uff-0.084
 loser-0.083
uzz-0.083
ubb-0.082
Token ch
Token ID442
Feature activation+6.09e-02

Pos logit contributions
 replied+0.043
 said+0.043
 exclaimed+0.042
�+0.039
 +0.038
Neg logit contributions
ocol-0.080
uff-0.072
ubb-0.072
aught-0.071
ithering-0.071
Tokenocol
Token ID4668
Feature activation+2.73e-01

Pos logit contributions
 bruised+0.038
conf+0.032
tons+0.030
 unatt+0.030
ithering+0.029
Neg logit contributions
s-0.168
 exclaimed-0.162
 replied-0.162
 said-0.157
�-0.156
Tokenates
Token ID689
Feature activation-4.29e-03

Pos logit contributions
ocol+0.372
 bruised+0.314
 loser+0.310
ubb+0.306
aught+0.296
Neg logit contributions
 exclaimed-0.621
s-0.616
 said-0.603
 replied-0.603
cakes-0.592
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
 replied+0.006
 exclaimed+0.005
�+0.005
 said+0.005
cakes+0.005
Neg logit contributions
ocol-0.010
uff-0.009
uzz-0.009
 loser-0.009
aught-0.009
Token They
Token ID1119
Feature activation-4.51e-02

Pos logit contributions
ocol+0.040
uff+0.035
 loser+0.035
uzz+0.034
ithering+0.034
Neg logit contributions
 exclaimed-0.020
 replied-0.019
 said-0.018
s-0.017
�-0.017
Token are
Token ID389
Feature activation+2.22e-03

Pos logit contributions
 exclaimed+0.054
 replied+0.054
 said+0.053
s+0.049
�+0.048
Neg logit contributions
ocol-0.119
uff-0.104
 loser-0.104
ubb-0.103
ithering-0.102
Token happy
Token ID3772
Feature activation-3.56e-02

Pos logit contributions
ocol+0.006
uff+0.005
 loser+0.005
ubb+0.005
ithering+0.005
Neg logit contributions
 exclaimed-0.003
 said-0.003
 replied-0.003
s-0.002
 they-0.002
Token and
Token ID290
Feature activation+2.44e-03

Pos logit contributions
ocol+0.010
uff+0.009
 loser+0.009
ubb+0.009
ithering+0.009
Neg logit contributions
 exclaimed-0.004
 replied-0.004
 said-0.004
�-0.004
s-0.003
Token her
Token ID607
Feature activation+6.68e-02

Pos logit contributions
ocol+0.006
 loser+0.005
uff+0.005
ubb+0.005
 bruised+0.005
Neg logit contributions
 exclaimed-0.003
 replied-0.003
 said-0.003
s-0.003
�-0.003
Token favourite
Token ID12507
Feature activation+1.56e-02

Pos logit contributions
ocol+0.160
 loser+0.141
ithering+0.141
uff+0.139
ubb+0.136
Neg logit contributions
 replied-0.098
 said-0.095
 exclaimed-0.095
�-0.087
s-0.087
Token,
Token ID11
Feature activation+2.75e-02

Pos logit contributions
ocol+0.041
uff+0.036
 loser+0.036
ubb+0.035
ithering+0.035
Neg logit contributions
 exclaimed-0.019
 replied-0.019
 said-0.018
s-0.017
�-0.017
Token ch
Token ID442
Feature activation+6.17e-02

Pos logit contributions
ocol+0.073
 loser+0.066
uff+0.065
ithering+0.065
ubb+0.065
Neg logit contributions
 said-0.033
 replied-0.032
 exclaimed-0.032
s-0.029
�-0.028
Tokenocol
Token ID4668
Feature activation+2.73e-01

Pos logit contributions
 bruised+0.029
 property+0.028
 loser+0.026
 slaves+0.022
 unsolved+0.022
Neg logit contributions
 replied-0.178
cakes-0.174
 exclaimed-0.173
s-0.173
�-0.169
Tokenates
Token ID689
Feature activation+5.16e-03

Pos logit contributions
ocol+0.372
 loser+0.315
ubb+0.305
 bruised+0.304
aught+0.291
Neg logit contributions
 exclaimed-0.624
 replied-0.610
 said-0.608
s-0.603
cakes-0.592
Token!
Token ID0
Feature activation-1.08e-02

Pos logit contributions
ocol+0.014
uff+0.012
 loser+0.012
ubb+0.012
aught+0.012
Neg logit contributions
 exclaimed-0.006
 replied-0.006
 said-0.005
�-0.005
s-0.005
Token She
Token ID1375
Feature activation+1.92e-02

Pos logit contributions
 exclaimed+0.015
 replied+0.015
 said+0.014
cakes+0.013
�+0.013
Neg logit contributions
ocol-0.024
uff-0.022
aught-0.021
ubb-0.021
uzz-0.021
Token asked
Token ID1965
Feature activation+2.14e-02

Pos logit contributions
ocol+0.049
uff+0.044
asured+0.044
uzz+0.043
aught+0.043
Neg logit contributions
 exclaimed-0.018
�-0.017
 replied-0.017
cakes-0.017
s-0.016
Token mummy
Token ID38567
Feature activation-4.23e-02

Pos logit contributions
ocol+0.053
uff+0.046
ubb+0.045
 loser+0.045
aught+0.044
Neg logit contributions
 exclaimed-0.030
 replied-0.029
 said-0.028
�-0.027
s-0.026
Token not
Token ID407
Feature activation+9.85e-03

Pos logit contributions
 replied+0.032
 exclaimed+0.032
 said+0.031
s+0.029
�+0.028
Neg logit contributions
ocol-0.057
uff-0.049
 loser-0.049
ubb-0.048
ithering-0.048
Token eat
Token ID4483
Feature activation+1.67e-02

Pos logit contributions
ocol+0.027
uff+0.024
ubb+0.023
ithering+0.023
 loser+0.023
Neg logit contributions
 said-0.012
 replied-0.012
 exclaimed-0.011
s-0.010
�-0.010
Token too
Token ID1165
Feature activation-8.27e-03

Pos logit contributions
ocol+0.043
aught+0.041
 loser+0.041
uff+0.041
ithering+0.041
Neg logit contributions
 said-0.019
 they-0.019
s-0.018
 exclaimed-0.017
 looked-0.016
Token many
Token ID867
Feature activation+2.40e-02

Pos logit contributions
 exclaimed+0.008
 said+0.008
s+0.008
 replied+0.008
�+0.007
Neg logit contributions
ocol-0.023
uff-0.021
 loser-0.021
ubb-0.020
ithering-0.020
Token ch
Token ID442
Feature activation+8.13e-02

Pos logit contributions
ocol+0.057
ithering+0.056
uff+0.055
 loser+0.055
aught+0.053
Neg logit contributions
 said-0.034
 exclaimed-0.032
s-0.031
 replied-0.031
 looked-0.028
Tokenocol
Token ID4668
Feature activation+2.73e-01

Pos logit contributions
 loser+0.053
 seeker+0.053
 unsolved+0.049
ocated+0.046
tons+0.043
Neg logit contributions
s-0.223
 said-0.203
 exclaimed-0.201
ons-0.199
 replied-0.196
Tokenates
Token ID689
Feature activation+2.49e-02

Pos logit contributions
ocol+0.387
 bruised+0.341
 loser+0.339
ubb+0.328
aught+0.320
Neg logit contributions
 exclaimed-0.581
s-0.577
 said-0.567
 replied-0.558
�-0.545
Token and
Token ID290
Feature activation+3.57e-03

Pos logit contributions
ocol+0.072
uff+0.064
ithering+0.064
 loser+0.063
ubb+0.063
Neg logit contributions
s-0.025
 said-0.025
 exclaimed-0.023
 replied-0.023
 they-0.022
Token his
Token ID465
Feature activation+8.31e-02

Pos logit contributions
ocol+0.010
uff+0.009
ithering+0.008
 loser+0.008
ubb+0.008
Neg logit contributions
 said-0.004
 exclaimed-0.004
 replied-0.004
s-0.004
 they-0.004
Token friend
Token ID1545
Feature activation-7.44e-03

Pos logit contributions
ocol+0.189
ithering+0.180
udge+0.171
uff+0.170
 loser+0.167
Neg logit contributions
 said-0.129
 replied-0.125
 exclaimed-0.119
�-0.110
s-0.109
Token learned
Token ID4499
Feature activation-6.01e-02

Pos logit contributions
 said+0.007
 replied+0.006
 exclaimed+0.006
s+0.005
 they+0.005
Neg logit contributions
ocol-0.023
uff-0.021
ithering-0.021
ubb-0.021
 loser-0.021
Token and
Token ID290
Feature activation-3.08e-02

Pos logit contributions
�+0.021
 exclaimed+0.021
 replied+0.020
 said+0.018
cakes+0.018
Neg logit contributions
ocol-0.042
uff-0.039
aught-0.038
 bruised-0.037
ithering-0.037
Token started
Token ID2067
Feature activation+1.35e-02

Pos logit contributions
 exclaimed+0.031
 replied+0.031
 said+0.028
s+0.028
�+0.027
Neg logit contributions
ocol-0.085
uff-0.074
 loser-0.074
 bruised-0.073
ubb-0.073
Token eating
Token ID6600
Feature activation-2.27e-02

Pos logit contributions
ocol+0.040
uff+0.035
 loser+0.035
ubb+0.035
ithering+0.034
Neg logit contributions
 exclaimed-0.012
 replied-0.012
 said-0.011
s-0.010
�-0.010
Token the
Token ID262
Feature activation-2.95e-02

Pos logit contributions
 exclaimed+0.021
 replied+0.020
 said+0.020
s+0.018
�+0.017
Neg logit contributions
ocol-0.066
 loser-0.059
uff-0.059
ubb-0.059
ithering-0.058
Token ch
Token ID442
Feature activation+5.87e-02

Pos logit contributions
 exclaimed+0.039
 replied+0.038
 said+0.036
s+0.034
�+0.034
Neg logit contributions
ocol-0.075
 loser-0.065
uff-0.065
ubb-0.064
ithering-0.064
Tokenocol
Token ID4668
Feature activation+2.73e-01

Pos logit contributions
conf+0.016
 bruised+0.014
uzz+0.012
ateful+0.011
 Kang+0.011
Neg logit contributions
s-0.181
cakes-0.178
 exclaimed-0.177
 replied-0.177
�-0.174
Tokenates
Token ID689
Feature activation-1.62e-03

Pos logit contributions
ocol+0.349
 loser+0.289
 bruised+0.276
ubb+0.275
uff+0.269
Neg logit contributions
 exclaimed-0.659
 replied-0.644
s-0.637
 said-0.637
cakes-0.622
Token.
Token ID13
Feature activation+5.22e-05

Pos logit contributions
 exclaimed+0.002
 replied+0.002
 said+0.002
�+0.002
s+0.002
Neg logit contributions
ocol-0.004
uff-0.004
 loser-0.004
ubb-0.004
ithering-0.004
Token The
Token ID383
Feature activation-2.54e-02

Pos logit contributions
ocol+0.000
 loser+0.000
ubb+0.000
uff+0.000
ithering+0.000
Neg logit contributions
 exclaimed-0.000
 replied-0.000
 said-0.000
 -0.000
s-0.000
Token shop
Token ID6128
Feature activation+3.83e-02

Pos logit contributions
 replied+0.036
 said+0.035
 exclaimed+0.035
�+0.032
s+0.032
Neg logit contributions
ocol-0.062
ithering-0.055
uff-0.054
uzz-0.054
ubb-0.054
Tokenkeeper
Token ID13884
Feature activation+5.10e-03

Pos logit contributions
ocol+0.093
uff+0.082
 loser+0.081
ubb+0.080
aught+0.080
Neg logit contributions
 exclaimed-0.051
 replied-0.049
 said-0.047
�-0.046
s-0.045
Token noticed
Token ID6810
Feature activation+2.04e-03

Pos logit contributions
 exclaimed+0.033
 replied+0.032
cakes+0.031
�+0.028
 said+0.027
Neg logit contributions
ocol-0.088
 loser-0.082
uff-0.081
 bruised-0.080
ithering-0.079
Token something
Token ID1223
Feature activation-2.36e-02

Pos logit contributions
ocol+0.006
 loser+0.005
uff+0.005
ubb+0.005
ithering+0.005
Neg logit contributions
 exclaimed-0.002
 replied-0.002
 said-0.002
s-0.002
�-0.002
Token.
Token ID13
Feature activation+1.84e-02

Pos logit contributions
 exclaimed+0.033
 replied+0.033
 said+0.033
s+0.031
�+0.029
Neg logit contributions
ocol-0.058
 loser-0.050
uff-0.050
ubb-0.049
aught-0.048
Token The
Token ID383
Feature activation-1.85e-02

Pos logit contributions
ocol+0.052
asured+0.048
ind+0.047
ubb+0.047
uff+0.047
Neg logit contributions
 -0.018
 replied-0.017
 said-0.016
 exclaimed-0.016
�-0.016
Token ch
Token ID442
Feature activation+6.46e-02

Pos logit contributions
 replied+0.030
 exclaimed+0.029
 said+0.029
s+0.027
�+0.027
Neg logit contributions
ocol-0.042
uff-0.036
 loser-0.035
ubb-0.035
aught-0.035
Tokenocol
Token ID4668
Feature activation+2.72e-01

Pos logit contributions
conf+0.057
 property+0.056
 unsolved+0.054
 supervision+0.052
aught+0.050
Neg logit contributions
 replied-0.161
s-0.160
 exclaimed-0.156
 said-0.154
�-0.149
Tokenates
Token ID689
Feature activation+2.65e-03

Pos logit contributions
ocol+0.366
 loser+0.300
ubb+0.291
aught+0.291
 bruised+0.286
Neg logit contributions
 exclaimed-0.632
 replied-0.625
 said-0.622
s-0.620
�-0.594
Token were
Token ID547
Feature activation+2.40e-02

Pos logit contributions
ocol+0.008
 loser+0.007
uff+0.007
ubb+0.007
aught+0.007
Neg logit contributions
 exclaimed-0.002
 replied-0.002
 said-0.002
s-0.002
�-0.001
Token bitter
Token ID12922
Feature activation+2.19e-02

Pos logit contributions
ocol+0.061
conf+0.055
 loser+0.054
aught+0.054
 property+0.053
Neg logit contributions
 said-0.032
 exclaimed-0.029
 replied-0.028
 they-0.027
 looked-0.027
Token.
Token ID13
Feature activation+3.19e-02

Pos logit contributions
ocol+0.061
uff+0.054
 loser+0.054
ubb+0.053
ithering+0.053
Neg logit contributions
 exclaimed-0.023
 replied-0.022
 said-0.021
�-0.019
s-0.019
Token They
Token ID1119
Feature activation-1.93e-02

Pos logit contributions
ocol+0.088
asured+0.078
uff+0.078
ubb+0.078
ind+0.077
Neg logit contributions
 replied-0.034
 said-0.033
 exclaimed-0.033
 -0.033
�-0.031
Token a
Token ID257
Feature activation-7.89e-02

Pos logit contributions
ocol+0.002
 loser+0.002
uff+0.002
ubb+0.002
ithering+0.001
Neg logit contributions
 exclaimed-0.001
 replied-0.001
 said-0.001
s-0.000
�-0.000
Token big
Token ID1263
Feature activation-4.29e-02

Pos logit contributions
 replied+0.091
 exclaimed+0.090
 said+0.088
s+0.081
�+0.081
Neg logit contributions
ocol-0.214
uff-0.186
ubb-0.185
uzz-0.183
 loser-0.183
Token box
Token ID3091
Feature activation-1.77e-02

Pos logit contributions
 replied+0.055
 said+0.053
�+0.049
 exclaimed+0.048
 looked+0.047
Neg logit contributions
ocol-0.110
uzz-0.101
ubb-0.100
aught-0.099
uff-0.099
Token of
Token ID286
Feature activation+5.69e-02

Pos logit contributions
 exclaimed+0.022
 replied+0.022
 said+0.021
s+0.020
�+0.020
Neg logit contributions
ocol-0.045
uff-0.040
 loser-0.040
ubb-0.039
ithering-0.039
Token ch
Token ID442
Feature activation+5.89e-02

Pos logit contributions
ocol+0.131
 loser+0.123
uff+0.123
ithering+0.122
ubb+0.122
Neg logit contributions
 said-0.083
 exclaimed-0.082
 replied-0.080
s-0.072
cakes-0.071
Tokenocol
Token ID4668
Feature activation+2.72e-01

Pos logit contributions
 bruised+0.046
tons+0.045
 loser+0.045
uding+0.042
asured+0.042
Neg logit contributions
s-0.154
 exclaimed-0.142
les-0.138
 said-0.138
ons-0.137
Tokenates
Token ID689
Feature activation+6.97e-03

Pos logit contributions
ocol+0.345
 loser+0.282
ubb+0.273
 bruised+0.270
aught+0.262
Neg logit contributions
 exclaimed-0.661
 replied-0.643
 said-0.639
s-0.634
cakes-0.617
Token.
Token ID13
Feature activation+9.66e-03

Pos logit contributions
ocol+0.020
 loser+0.017
uff+0.017
ubb+0.017
ithering+0.017
Neg logit contributions
 exclaimed-0.007
 replied-0.007
 said-0.007
s-0.006
�-0.006
Token It
Token ID632
Feature activation-3.93e-03

Pos logit contributions
ocol+0.026
ubb+0.023
 loser+0.023
ithering+0.022
uff+0.022
Neg logit contributions
 replied-0.012
 exclaimed-0.012
 said-0.012
 -0.011
cakes-0.010
Token was
Token ID373
Feature activation+4.22e-02

Pos logit contributions
 exclaimed+0.003
�+0.003
 replied+0.003
cakes+0.002
s+0.002
Neg logit contributions
ocol-0.011
 bruised-0.010
uff-0.010
 loser-0.010
uzz-0.010
Token so
Token ID523
Feature activation-4.51e-02

Pos logit contributions
ocol+0.130
 loser+0.121
ubb+0.119
uff+0.119
aught+0.118
Neg logit contributions
 said-0.035
 replied-0.032
 exclaimed-0.031
 they-0.026
 looked-0.026
Token a
Token ID257
Feature activation-7.92e-02

Pos logit contributions
 exclaimed+0.058
 replied+0.056
 said+0.053
cakes+0.053
�+0.053
Neg logit contributions
ocol-0.065
uff-0.055
 loser-0.054
uzz-0.053
aught-0.053
Token big
Token ID1263
Feature activation-4.29e-02

Pos logit contributions
 exclaimed+0.098
 replied+0.094
 said+0.090
s+0.086
�+0.085
Neg logit contributions
ocol-0.201
 loser-0.179
uff-0.176
 bruised-0.173
ubb-0.173
Token box
Token ID3091
Feature activation-1.36e-02

Pos logit contributions
 replied+0.059
 exclaimed+0.057
 said+0.056
�+0.052
s+0.052
Neg logit contributions
ocol-0.106
uff-0.093
ubb-0.093
 loser-0.092
ithering-0.092
Token of
Token ID286
Feature activation+5.45e-02

Pos logit contributions
 exclaimed+0.017
 replied+0.016
�+0.016
 said+0.015
cakes+0.015
Neg logit contributions
ocol-0.033
aught-0.030
uff-0.030
uzz-0.029
 loser-0.029
Token ch
Token ID442
Feature activation+5.97e-02

Pos logit contributions
ocol+0.117
 loser+0.110
ithering+0.109
uff+0.109
ubb+0.106
Neg logit contributions
 said-0.091
 exclaimed-0.089
 replied-0.089
s-0.079
 looked-0.076
Tokenocol
Token ID4668
Feature activation+2.72e-01

Pos logit contributions
 bruised+0.053
 loser+0.051
tons+0.051
 Kang+0.045
uding+0.045
Neg logit contributions
s-0.146
 exclaimed-0.141
 replied-0.138
les-0.136
 said-0.135
Tokenates
Token ID689
Feature activation+2.71e-03

Pos logit contributions
ocol+0.363
 loser+0.293
ubb+0.281
 bruised+0.274
uff+0.271
Neg logit contributions
 exclaimed-0.653
 replied-0.639
 said-0.634
s-0.627
cakes-0.609
Token to
Token ID284
Feature activation+4.93e-03

Pos logit contributions
ocol+0.007
uff+0.006
 loser+0.006
uzz+0.006
aught+0.006
Neg logit contributions
 exclaimed-0.003
 replied-0.003
 said-0.003
�-0.003
s-0.003
Token share
Token ID2648
Feature activation-2.54e-02

Pos logit contributions
ocol+0.013
uff+0.011
 loser+0.011
ubb+0.011
ithering+0.011
Neg logit contributions
 exclaimed-0.006
 replied-0.006
 said-0.006
s-0.005
�-0.005
Token with
Token ID351
Feature activation+2.51e-02

Pos logit contributions
 exclaimed+0.026
 said+0.026
 replied+0.025
s+0.025
 they+0.022
Neg logit contributions
ocol-0.072
 loser-0.065
ubb-0.065
uff-0.065
ithering-0.064
Token the
Token ID262
Feature activation-2.62e-02

Pos logit contributions
ocol+0.061
uff+0.053
 loser+0.053
ubb+0.052
ithering+0.052
Neg logit contributions
 exclaimed-0.036
 replied-0.035
 said-0.034
s-0.032
�-0.031
Token a
Token ID257
Feature activation-7.08e-02

Pos logit contributions
ocol+0.104
ubb+0.093
arter+0.092
uff+0.092
 loser+0.090
Neg logit contributions
 said-0.026
 they-0.025
 looked-0.022
 -0.021
 replied-0.021
Token big
Token ID1263
Feature activation-3.23e-02

Pos logit contributions
 said+0.086
 replied+0.086
 exclaimed+0.083
�+0.077
 looked+0.075
Neg logit contributions
ocol-0.192
uff-0.168
ubb-0.167
uzz-0.165
aught-0.165
Token box
Token ID3091
Feature activation-6.07e-03

Pos logit contributions
 replied+0.039
 said+0.037
 looked+0.034
�+0.034
 exclaimed+0.033
Neg logit contributions
ocol-0.087
ubb-0.080
aught-0.080
uff-0.079
uzz-0.079
Token of
Token ID286
Feature activation+6.52e-02

Pos logit contributions
 exclaimed+0.006
 replied+0.006
 said+0.006
s+0.006
�+0.005
Neg logit contributions
ocol-0.017
 loser-0.015
uff-0.015
ubb-0.015
ithering-0.015
Token ch
Token ID442
Feature activation+6.37e-02

Pos logit contributions
ocol+0.155
uff+0.153
ithering+0.150
ubb+0.149
aught+0.149
Neg logit contributions
 said-0.091
 replied-0.086
 exclaimed-0.086
 looked-0.077
 they-0.074
Tokenocol
Token ID4668
Feature activation+2.72e-01

Pos logit contributions
 Kang+0.032
uding+0.031
tons+0.031
aught+0.031
 bruised+0.030
Neg logit contributions
s-0.172
 exclaimed-0.166
 said-0.165
ons-0.164
�-0.163
Tokenates
Token ID689
Feature activation+5.07e-03

Pos logit contributions
ocol+0.354
 loser+0.290
ubb+0.277
 bruised+0.272
aught+0.270
Neg logit contributions
 exclaimed-0.654
 replied-0.641
 said-0.636
s-0.624
cakes-0.609
Token!
Token ID0
Feature activation-6.14e-03

Pos logit contributions
ocol+0.014
 loser+0.013
uff+0.013
ubb+0.013
ithering+0.012
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
s-0.005
�-0.004
Token
Token ID220
Feature activation-5.90e-02

Pos logit contributions
 exclaimed+0.010
 replied+0.010
 said+0.009
s+0.009
�+0.009
Neg logit contributions
ocol-0.013
uff-0.012
 loser-0.011
aught-0.011
ubb-0.011
Token
Token ID198
Feature activation-7.17e-03

Pos logit contributions
 exclaimed+0.042
 replied+0.041
 said+0.037
s+0.033
�+0.032
Neg logit contributions
ocol-0.183
uff-0.165
 loser-0.164
ithering-0.162
ubb-0.162
Token
Token ID198
Feature activation-6.65e-04

Pos logit contributions
cakes+0.011
 replied+0.010
 started+0.009
 talked+0.009
 began+0.009
Neg logit contributions
uzz-0.013
uff-0.013
udge-0.013
ocol-0.012
aught-0.012
Token was
Token ID373
Feature activation+1.92e-02

Pos logit contributions
�+0.001
 exclaimed+0.001
 replied+0.001
cakes+0.001
s+0.001
Neg logit contributions
ocol-0.003
uff-0.003
aught-0.003
uzz-0.003
 bruised-0.003
Token a
Token ID257
Feature activation-6.60e-02

Pos logit contributions
ocol+0.057
 loser+0.050
uff+0.050
ubb+0.050
ithering+0.049
Neg logit contributions
 exclaimed-0.018
 replied-0.017
 said-0.017
s-0.015
�-0.014
Token box
Token ID3091
Feature activation-2.90e-03

Pos logit contributions
 replied+0.079
 exclaimed+0.079
 said+0.077
s+0.069
�+0.069
Neg logit contributions
ocol-0.177
uff-0.156
ubb-0.153
 loser-0.153
ithering-0.151
Token of
Token ID286
Feature activation+6.47e-02

Pos logit contributions
 exclaimed+0.004
 replied+0.004
�+0.003
 said+0.003
s+0.003
Neg logit contributions
ocol-0.007
uff-0.006
ithering-0.006
 loser-0.006
aught-0.006
Token ch
Token ID442
Feature activation+6.51e-02

Pos logit contributions
ocol+0.161
 loser+0.150
uff+0.149
ithering+0.147
ubb+0.147
Neg logit contributions
 said-0.086
 exclaimed-0.085
 replied-0.084
s-0.074
cakes-0.072
Tokenocol
Token ID4668
Feature activation+2.71e-01

Pos logit contributions
 loser+0.043
tons+0.036
 bruised+0.036
 Kang+0.035
 slaves+0.033
Neg logit contributions
s-0.172
 exclaimed-0.170
 replied-0.167
 said-0.165
les-0.165
Tokenates
Token ID689
Feature activation+7.65e-03

Pos logit contributions
ocol+0.362
 loser+0.297
ubb+0.284
 bruised+0.277
uff+0.275
Neg logit contributions
 exclaimed-0.652
 replied-0.638
 said-0.633
s-0.619
cakes-0.606
Token!
Token ID0
Feature activation-1.12e-02

Pos logit contributions
ocol+0.020
uff+0.018
 loser+0.018
ubb+0.018
ithering+0.017
Neg logit contributions
 exclaimed-0.009
 replied-0.009
 said-0.008
�-0.008
s-0.008
Token Alice
Token ID14862
Feature activation-3.55e-02

Pos logit contributions
 replied+0.017
 exclaimed+0.017
 said+0.016
s+0.016
ade+0.016
Neg logit contributions
ocol-0.020
uff-0.018
uzz-0.018
 bruised-0.018
 loser-0.018
Token and
Token ID290
Feature activation-2.67e-02

Pos logit contributions
 exclaimed+0.030
 replied+0.029
�+0.028
cakes+0.028
 said+0.026
Neg logit contributions
ocol-0.099
uff-0.090
aught-0.088
uzz-0.088
 loser-0.088
Token her
Token ID607
Feature activation+5.36e-02

Pos logit contributions
 exclaimed+0.040
 replied+0.038
s+0.038
�+0.037
cakes+0.037
Neg logit contributions
ocol-0.058
uff-0.051
 loser-0.050
uzz-0.050
 bruised-0.050
Token she
Token ID673
Feature activation-2.31e-02

Pos logit contributions
 exclaimed+0.016
 replied+0.015
cakes+0.015
s+0.014
 said+0.014
Neg logit contributions
ocol-0.044
 loser-0.040
uff-0.040
 bruised-0.039
ubb-0.039
Token paid
Token ID3432
Feature activation+9.05e-03

Pos logit contributions
 exclaimed+0.021
�+0.020
 replied+0.019
s+0.019
cakes+0.018
Neg logit contributions
ocol-0.062
 bruised-0.056
uff-0.055
uzz-0.054
aught-0.054
Token for
Token ID329
Feature activation-8.72e-03

Pos logit contributions
ocol+0.024
 loser+0.021
uff+0.021
ubb+0.021
ithering+0.021
Neg logit contributions
 exclaimed-0.011
 replied-0.011
 said-0.011
s-0.010
�-0.009
Token the
Token ID262
Feature activation-2.76e-02

Pos logit contributions
 said+0.007
 exclaimed+0.007
 replied+0.006
 they+0.006
 +0.005
Neg logit contributions
ocol-0.026
uff-0.024
ithering-0.024
 loser-0.024
aught-0.024
Token ch
Token ID442
Feature activation+6.42e-02

Pos logit contributions
 said+0.036
 replied+0.036
 exclaimed+0.034
�+0.032
 +0.031
Neg logit contributions
ocol-0.071
ithering-0.063
uff-0.063
uzz-0.062
arter-0.062
Tokenocol
Token ID4668
Feature activation+2.70e-01

Pos logit contributions
ithering+0.018
 unsolved+0.018
ateful+0.018
 property+0.016
conf+0.014
Neg logit contributions
 replied-0.191
s-0.188
 exclaimed-0.187
�-0.185
 said-0.185
Tokenates
Token ID689
Feature activation+9.19e-04

Pos logit contributions
ocol+0.371
 loser+0.320
 bruised+0.319
ubb+0.308
aught+0.304
Neg logit contributions
 exclaimed-0.593
s-0.591
 said-0.581
 replied-0.580
cakes-0.569
Token and
Token ID290
Feature activation-1.67e-02

Pos logit contributions
ocol+0.003
 loser+0.002
uff+0.002
ubb+0.002
ithering+0.002
Neg logit contributions
 exclaimed-0.001
 replied-0.001
 said-0.001
s-0.001
�-0.001
Token the
Token ID262
Feature activation-3.49e-02

Pos logit contributions
 exclaimed+0.016
 replied+0.016
 said+0.015
s+0.014
�+0.014
Neg logit contributions
ocol-0.047
uff-0.042
 loser-0.042
ubb-0.041
 bruised-0.041
Token two
Token ID734
Feature activation-8.05e-02

Pos logit contributions
 said+0.048
 replied+0.047
 exclaimed+0.046
�+0.042
cakes+0.041
Neg logit contributions
ocol-0.087
ithering-0.078
uff-0.077
uzz-0.076
arter-0.076
Token of
Token ID286
Feature activation+4.55e-02

Pos logit contributions
 exclaimed+0.099
 replied+0.097
 said+0.091
�+0.089
s+0.085
Neg logit contributions
ocol-0.210
uff-0.182
 loser-0.181
ubb-0.180
 bruised-0.177
Token counters
Token ID21154
Feature activation-3.88e-02

Pos logit contributions
 exclaimed+0.059
 replied+0.058
 said+0.056
s+0.054
�+0.053
Neg logit contributions
ocol-0.088
 loser-0.076
uff-0.076
ubb-0.075
ithering-0.073
Token.
Token ID13
Feature activation+1.55e-02

Pos logit contributions
 exclaimed+0.045
 replied+0.044
 said+0.042
s+0.039
�+0.039
Neg logit contributions
ocol-0.103
uff-0.090
 loser-0.090
ubb-0.089
ithering-0.088
Token When
Token ID1649
Feature activation+3.88e-03

Pos logit contributions
ocol+0.043
 loser+0.039
ubb+0.038
ithering+0.037
uff+0.037
Neg logit contributions
 exclaimed-0.018
 replied-0.017
 -0.017
 said-0.017
�-0.015
Token they
Token ID484
Feature activation-1.06e-01

Pos logit contributions
ocol+0.011
 loser+0.010
uff+0.010
ubb+0.010
 bruised+0.010
Neg logit contributions
 exclaimed-0.004
 replied-0.003
 said-0.003
s-0.003
�-0.003
Token were
Token ID547
Feature activation-2.41e-02

Pos logit contributions
 exclaimed+0.101
 replied+0.096
�+0.091
s+0.085
cakes+0.085
Neg logit contributions
ocol-0.287
ubb-0.257
 bruised-0.256
 loser-0.254
uff-0.250
Token done
Token ID1760
Feature activation-1.43e-01

Pos logit contributions
 replied+0.029
s+0.028
 exclaimed+0.028
�+0.028
 said+0.025
Neg logit contributions
ocol-0.061
uff-0.055
 bruised-0.054
ithering-0.053
aught-0.052
Token,
Token ID11
Feature activation-7.33e-03

Pos logit contributions
 exclaimed+0.158
 replied+0.152
 said+0.146
�+0.135
s+0.134
Neg logit contributions
ocol-0.391
uff-0.345
ithering-0.336
 loser-0.336
ubb-0.334
Token they
Token ID484
Feature activation-9.35e-02

Pos logit contributions
 exclaimed+0.008
 replied+0.008
 said+0.008
s+0.007
 they+0.007
Neg logit contributions
ocol-0.021
uff-0.018
 loser-0.018
ubb-0.018
ithering-0.018
Token sat
Token ID3332
Feature activation+1.21e-02

Pos logit contributions
 exclaimed+0.096
s+0.092
cakes+0.092
�+0.091
 replied+0.090
Neg logit contributions
ocol-0.231
 bruised-0.212
uff-0.208
ubb-0.207
uzz-0.206
Token down
Token ID866
Feature activation+5.89e-03

Pos logit contributions
ocol+0.028
 loser+0.024
uff+0.024
ubb+0.024
ithering+0.024
Neg logit contributions
 exclaimed-0.018
 replied-0.018
 said-0.018
s-0.017
�-0.017
Token to
Token ID284
Feature activation-1.28e-02

Pos logit contributions
ocol+0.014
uff+0.013
 bruised+0.013
conf+0.012
uzz+0.012
Neg logit contributions
 exclaimed-0.008
 replied-0.007
�-0.007
 said-0.007
cakes-0.006
Token do
Token ID466
Feature activation-4.12e-02

Pos logit contributions
ocol+0.027
uff+0.023
 loser+0.023
ubb+0.023
ithering+0.023
Neg logit contributions
 exclaimed-0.014
 replied-0.014
 said-0.013
�-0.012
s-0.012
Token.
Token ID13
Feature activation+3.28e-02

Pos logit contributions
 exclaimed+0.048
 replied+0.046
 said+0.046
s+0.043
�+0.042
Neg logit contributions
ocol-0.111
uff-0.097
 loser-0.097
ubb-0.096
ithering-0.095
Token But
Token ID887
Feature activation+4.53e-02

Pos logit contributions
ocol+0.081
ubb+0.074
uff+0.072
ithering+0.070
ind+0.069
Neg logit contributions
 -0.046
�-0.046
 said-0.045
 replied-0.044
 exclaimed-0.042
Token I
Token ID314
Feature activation-8.24e-03

Pos logit contributions
ocol+0.129
ithering+0.118
udge+0.117
aught+0.115
ubb+0.114
Neg logit contributions
 said-0.054
 they-0.054
 replied-0.050
 looked-0.044
 -0.041
Token'm
Token ID1101
Feature activation-6.38e-03

Pos logit contributions
 exclaimed+0.012
 replied+0.012
 said+0.012
s+0.011
�+0.011
Neg logit contributions
ocol-0.020
uff-0.017
 loser-0.017
ubb-0.017
ithering-0.017
Token done
Token ID1760
Feature activation-1.40e-01

Pos logit contributions
 exclaimed+0.012
 replied+0.012
 said+0.012
�+0.011
s+0.011
Neg logit contributions
ocol-0.013
 loser-0.011
uff-0.011
ubb-0.010
ithering-0.010
Token now
Token ID783
Feature activation-5.21e-02

Pos logit contributions
 exclaimed+0.158
 replied+0.155
 said+0.149
s+0.139
�+0.138
Neg logit contributions
ocol-0.381
uff-0.335
 loser-0.335
ubb-0.332
ithering-0.328
Token.
Token ID13
Feature activation+3.71e-02

Pos logit contributions
 exclaimed+0.050
 replied+0.050
 said+0.049
�+0.045
s+0.044
Neg logit contributions
ocol-0.151
uff-0.133
ubb-0.132
 loser-0.131
ithering-0.131
Token Come
Token ID7911
Feature activation+1.40e-02

Pos logit contributions
ocol+0.089
uff+0.081
ubb+0.081
ithering+0.078
ind+0.077
Neg logit contributions
�-0.054
 -0.054
 said-0.052
 replied-0.052
 exclaimed-0.050
Token on
Token ID319
Feature activation-4.50e-02

Pos logit contributions
ocol+0.039
ubb+0.034
uff+0.033
 loser+0.033
aught+0.033
Neg logit contributions
 replied-0.016
 exclaimed-0.015
 said-0.015
s-0.014
�-0.014
Token,
Token ID11
Feature activation+1.78e-02

Pos logit contributions
 exclaimed+0.046
 replied+0.043
 said+0.041
s+0.039
cakes+0.037
Neg logit contributions
ocol-0.125
 loser-0.113
uff-0.112
 bruised-0.109
ithering-0.109
Token.
Token ID13
Feature activation+2.77e-02

Pos logit contributions
 exclaimed+0.023
 replied+0.023
 said+0.022
s+0.021
�+0.020
Neg logit contributions
ocol-0.058
uff-0.051
 loser-0.050
ubb-0.050
ithering-0.050
Token And
Token ID843
Feature activation+2.76e-02

Pos logit contributions
ocol+0.064
ubb+0.055
uff+0.054
ithering+0.053
ateful+0.052
Neg logit contributions
 replied-0.044
 said-0.043
 -0.043
 exclaimed-0.043
�-0.042
Token when
Token ID618
Feature activation+7.43e-03

Pos logit contributions
ocol+0.076
ithering+0.066
uff+0.065
ubb+0.065
arter+0.065
Neg logit contributions
 said-0.035
 replied-0.034
 exclaimed-0.033
 they-0.032
 -0.029
Token we
Token ID356
Feature activation-6.23e-02

Pos logit contributions
ocol+0.020
uff+0.018
ubb+0.017
ithering+0.017
aught+0.017
Neg logit contributions
 replied-0.008
 exclaimed-0.008
 said-0.008
�-0.008
s-0.007
Token are
Token ID389
Feature activation-1.12e-02

Pos logit contributions
 exclaimed+0.074
 replied+0.074
 said+0.072
s+0.066
�+0.066
Neg logit contributions
ocol-0.166
uff-0.146
 loser-0.145
ubb-0.143
ithering-0.142
Token done
Token ID1760
Feature activation-1.40e-01

Pos logit contributions
 exclaimed+0.014
 replied+0.014
 said+0.013
s+0.012
�+0.012
Neg logit contributions
ocol-0.029
uff-0.025
 loser-0.025
ubb-0.025
ithering-0.025
Token,
Token ID11
Feature activation+1.28e-02

Pos logit contributions
 exclaimed+0.204
 replied+0.200
 said+0.194
s+0.185
�+0.181
Neg logit contributions
ocol-0.334
uff-0.288
 loser-0.287
ubb-0.283
ithering-0.279
Token we
Token ID356
Feature activation-5.99e-02

Pos logit contributions
ocol+0.035
uff+0.031
ithering+0.031
ubb+0.031
aught+0.030
Neg logit contributions
 said-0.015
 replied-0.015
 exclaimed-0.014
 they-0.013
�-0.013
Token can
Token ID460
Feature activation-4.04e-02

Pos logit contributions
 replied+0.083
 exclaimed+0.083
 said+0.082
�+0.076
s+0.076
Neg logit contributions
ocol-0.148
uff-0.128
ubb-0.126
 loser-0.126
ithering-0.125
Token play
Token ID711
Feature activation-1.86e-02

Pos logit contributions
 exclaimed+0.057
 replied+0.055
 said+0.053
s+0.051
�+0.049
Neg logit contributions
ocol-0.098
uff-0.085
 loser-0.085
ubb-0.083
ithering-0.083
Token with
Token ID351
Feature activation+2.55e-02

Pos logit contributions
 exclaimed+0.022
 replied+0.022
 said+0.021
s+0.019
�+0.019
Neg logit contributions
ocol-0.049
 loser-0.043
uff-0.043
ubb-0.043
ithering-0.042
Token golden
Token ID10861
Feature activation+2.25e-02

Pos logit contributions
 exclaimed+0.007
 replied+0.007
 said+0.007
s+0.006
�+0.006
Neg logit contributions
ocol-0.014
uff-0.013
 loser-0.013
ubb-0.012
ithering-0.012
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
ocol+0.059
 loser+0.052
uff+0.052
ithering+0.052
ubb+0.052
Neg logit contributions
 exclaimed-0.026
 replied-0.026
 said-0.026
s-0.024
�-0.023
Token When
Token ID1649
Feature activation+9.08e-03

Pos logit contributions
ocol+0.034
 loser+0.031
ubb+0.030
uff+0.030
ithering+0.030
Neg logit contributions
 exclaimed-0.011
 replied-0.010
 said-0.010
 -0.009
s-0.009
Token they
Token ID484
Feature activation-9.91e-02

Pos logit contributions
ocol+0.028
 loser+0.025
uff+0.025
ubb+0.025
 bruised+0.025
Neg logit contributions
 exclaimed-0.006
 replied-0.006
 said-0.005
s-0.005
�-0.005
Token were
Token ID547
Feature activation-1.74e-02

Pos logit contributions
 exclaimed+0.088
 replied+0.083
�+0.080
cakes+0.079
s+0.076
Neg logit contributions
ocol-0.274
ubb-0.241
 bruised-0.239
 loser-0.238
uzz-0.236
Token done
Token ID1760
Feature activation-1.40e-01

Pos logit contributions
 exclaimed+0.019
 replied+0.019
s+0.018
�+0.017
 said+0.016
Neg logit contributions
ocol-0.047
uff-0.042
 bruised-0.041
ubb-0.041
 loser-0.041
Token,
Token ID11
Feature activation-5.27e-03

Pos logit contributions
 exclaimed+0.141
 replied+0.135
 said+0.129
s+0.119
�+0.118
Neg logit contributions
ocol-0.398
uff-0.351
 loser-0.346
ubb-0.344
ithering-0.343
Token June
Token ID2795
Feature activation-4.49e-02

Pos logit contributions
 exclaimed+0.005
 replied+0.004
 said+0.004
s+0.004
�+0.004
Neg logit contributions
ocol-0.016
uff-0.014
 loser-0.014
ubb-0.014
ithering-0.014
Token opened
Token ID4721
Feature activation-1.50e-02

Pos logit contributions
cakes+0.036
 exclaimed+0.035
 replied+0.034
�+0.034
s+0.032
Neg logit contributions
ocol-0.120
 bruised-0.110
uff-0.107
aught-0.107
asured-0.106
Token the
Token ID262
Feature activation-3.62e-02

Pos logit contributions
 exclaimed+0.019
 replied+0.019
cakes+0.018
�+0.018
 said+0.017
Neg logit contributions
ocol-0.037
uff-0.032
uzz-0.032
 bruised-0.031
ubb-0.031
Token oven
Token ID14361
Feature activation-2.11e-02

Pos logit contributions
 exclaimed+0.072
 replied+0.070
 said+0.068
s+0.065
�+0.064
Neg logit contributions
ocol-0.069
 loser-0.055
uff-0.055
 bruised-0.055
ubb-0.053
Token sand
Token ID6450
Feature activation+6.38e-03

Pos logit contributions
ocol+0.080
 loser+0.074
uff+0.072
ithering+0.072
ubb+0.071
Neg logit contributions
 said-0.026
 exclaimed-0.026
 replied-0.025
s-0.022
 they-0.020
Token.
Token ID13
Feature activation+3.84e-03

Pos logit contributions
ocol+0.018
 loser+0.016
uff+0.016
ubb+0.015
ithering+0.015
Neg logit contributions
 exclaimed-0.007
 replied-0.007
 said-0.006
s-0.006
�-0.006
Token When
Token ID1649
Feature activation+1.02e-02

Pos logit contributions
ocol+0.011
 loser+0.010
ubb+0.009
ithering+0.009
 bruised+0.009
Neg logit contributions
 exclaimed-0.004
 replied-0.004
 said-0.004
 -0.004
s-0.004
Token they
Token ID484
Feature activation-1.03e-01

Pos logit contributions
ocol+0.030
 loser+0.027
uff+0.027
ubb+0.026
uzz+0.026
Neg logit contributions
 exclaimed-0.009
 replied-0.008
 said-0.007
s-0.007
�-0.007
Token were
Token ID547
Feature activation-2.24e-02

Pos logit contributions
 exclaimed+0.094
 replied+0.090
�+0.088
cakes+0.081
s+0.081
Neg logit contributions
ocol-0.288
ubb-0.256
 loser-0.256
uff-0.254
aught-0.251
Token done
Token ID1760
Feature activation-1.40e-01

Pos logit contributions
 exclaimed+0.023
 replied+0.022
�+0.021
s+0.021
 said+0.020
Neg logit contributions
ocol-0.063
uff-0.056
 loser-0.055
 bruised-0.055
ubb-0.055
Token,
Token ID11
Feature activation-8.17e-03

Pos logit contributions
 exclaimed+0.142
 replied+0.137
 said+0.131
s+0.122
�+0.119
Neg logit contributions
ocol-0.397
uff-0.351
 loser-0.348
ubb-0.345
ithering-0.343
Token Jane
Token ID12091
Feature activation-6.03e-02

Pos logit contributions
 exclaimed+0.008
 replied+0.008
 said+0.007
�+0.007
s+0.007
Neg logit contributions
ocol-0.023
uff-0.020
 loser-0.020
ubb-0.020
uzz-0.020
Token thanked
Token ID26280
Feature activation-2.67e-02

Pos logit contributions
 exclaimed+0.042
cakes+0.040
 replied+0.040
�+0.039
s+0.034
Neg logit contributions
ocol-0.168
uff-0.156
 bruised-0.153
aught-0.153
asured-0.151
Token Bob
Token ID5811
Feature activation-7.08e-03

Pos logit contributions
 replied+0.041
 exclaimed+0.040
 said+0.040
s+0.036
�+0.036
Neg logit contributions
ocol-0.062
 loser-0.055
ubb-0.053
ithering-0.053
uff-0.053
Token for
Token ID329
Feature activation-1.29e-02

Pos logit contributions
 exclaimed+0.009
 replied+0.008
 said+0.008
s+0.007
�+0.007
Neg logit contributions
ocol-0.019
uff-0.016
 loser-0.016
ubb-0.016
ithering-0.016
Token him
Token ID683
Feature activation+3.39e-02

Pos logit contributions
 exclaimed+0.060
 replied+0.059
 said+0.057
s+0.054
�+0.053
Neg logit contributions
ocol-0.112
 loser-0.096
uff-0.096
ubb-0.095
aught-0.094
Token.
Token ID13
Feature activation+6.67e-03

Pos logit contributions
ocol+0.085
 loser+0.074
uff+0.074
ubb+0.073
ithering+0.073
Neg logit contributions
 exclaimed-0.043
 replied-0.043
 said-0.041
s-0.039
�-0.038
Token When
Token ID1649
Feature activation+9.10e-05

Pos logit contributions
ocol+0.020
ubb+0.018
 loser+0.017
uff+0.017
 bruised+0.017
Neg logit contributions
 replied-0.007
 -0.007
 exclaimed-0.007
 said-0.006
�-0.006
Token they
Token ID484
Feature activation-1.04e-01

Pos logit contributions
ocol+0.000
uff+0.000
 loser+0.000
ubb+0.000
ithering+0.000
Neg logit contributions
 exclaimed-0.000
 replied-0.000
 said-0.000
s-0.000
�-0.000
Token were
Token ID547
Feature activation-2.13e-02

Pos logit contributions
 exclaimed+0.089
 replied+0.084
 said+0.076
s+0.075
�+0.072
Neg logit contributions
ocol-0.308
 loser-0.273
ubb-0.271
uff-0.269
ithering-0.267
Token done
Token ID1760
Feature activation-1.39e-01

Pos logit contributions
 exclaimed+0.026
 replied+0.026
 said+0.025
s+0.024
�+0.023
Neg logit contributions
ocol-0.056
uff-0.048
 loser-0.048
ubb-0.048
ithering-0.047
Token,
Token ID11
Feature activation-9.39e-03

Pos logit contributions
 exclaimed+0.125
 replied+0.123
 said+0.117
s+0.107
�+0.103
Neg logit contributions
ocol-0.410
 loser-0.366
uff-0.365
ubb-0.361
ithering-0.357
Token he
Token ID339
Feature activation-1.37e-02

Pos logit contributions
 exclaimed+0.011
 replied+0.010
 said+0.010
s+0.009
�+0.009
Neg logit contributions
ocol-0.026
uff-0.023
 loser-0.022
ubb-0.022
ithering-0.022
Token thanked
Token ID26280
Feature activation-2.95e-02

Pos logit contributions
 exclaimed+0.012
cakes+0.011
 replied+0.011
�+0.011
s+0.011
Neg logit contributions
ocol-0.036
 bruised-0.033
aught-0.033
uff-0.033
 loser-0.033
Token her
Token ID607
Feature activation+4.88e-02

Pos logit contributions
 exclaimed+0.037
 replied+0.036
 said+0.035
s+0.033
�+0.033
Neg logit contributions
ocol-0.076
uff-0.066
 loser-0.066
ubb-0.065
ithering-0.064
Token for
Token ID329
Feature activation-1.53e-02

Pos logit contributions
ocol+0.132
 loser+0.116
ithering+0.114
ubb+0.114
uff+0.114
Neg logit contributions
 replied-0.060
 exclaimed-0.059
 said-0.058
s-0.053
�-0.050
Token carefully
Token ID7773
Feature activation-3.23e-02

Pos logit contributions
 exclaimed+0.031
 replied+0.031
 said+0.029
s+0.027
�+0.026
Neg logit contributions
ocol-0.069
 loser-0.062
uff-0.061
ithering-0.060
ubb-0.060
Token.
Token ID13
Feature activation+1.92e-02

Pos logit contributions
 replied+0.034
 said+0.034
 exclaimed+0.033
s+0.029
 +0.028
Neg logit contributions
ocol-0.091
 loser-0.083
ubb-0.081
uff-0.080
arter-0.080
Token When
Token ID1649
Feature activation+1.74e-02

Pos logit contributions
ocol+0.050
 loser+0.043
ubb+0.042
uff+0.042
ithering+0.042
Neg logit contributions
 exclaimed-0.025
 replied-0.025
 said-0.024
s-0.023
 -0.023
Token Daddy
Token ID26926
Feature activation-5.46e-02

Pos logit contributions
ocol+0.051
 loser+0.045
uff+0.045
ubb+0.045
ithering+0.044
Neg logit contributions
 exclaimed-0.016
 replied-0.015
 said-0.014
s-0.013
�-0.013
Token was
Token ID373
Feature activation-4.93e-03

Pos logit contributions
 exclaimed+0.047
 replied+0.045
 said+0.041
s+0.039
�+0.039
Neg logit contributions
ocol-0.161
uff-0.143
 loser-0.142
ubb-0.141
ithering-0.140
Token done
Token ID1760
Feature activation-1.39e-01

Pos logit contributions
 exclaimed+0.005
 replied+0.005
s+0.005
 said+0.005
�+0.005
Neg logit contributions
ocol-0.013
uff-0.012
 loser-0.012
ithering-0.011
 bruised-0.011
Token,
Token ID11
Feature activation-1.36e-03

Pos logit contributions
 replied+0.087
 exclaimed+0.083
 said+0.082
 they+0.075
s+0.067
Neg logit contributions
ocol-0.447
 loser-0.413
ubb-0.405
ateful-0.402
asured-0.401
Token the
Token ID262
Feature activation-4.04e-02

Pos logit contributions
 exclaimed+0.001
 replied+0.001
 said+0.001
s+0.001
�+0.001
Neg logit contributions
ocol-0.004
 loser-0.004
uff-0.004
ubb-0.004
ithering-0.004
Token door
Token ID3420
Feature activation+5.17e-05

Pos logit contributions
 said+0.054
 replied+0.054
 exclaimed+0.050
 looked+0.046
�+0.045
Neg logit contributions
ocol-0.109
ithering-0.094
uff-0.094
ubb-0.093
uzz-0.093
Token was
Token ID373
Feature activation+3.28e-02

Pos logit contributions
ocol+0.000
uff+0.000
 bruised+0.000
 loser+0.000
ubb+0.000
Neg logit contributions
 exclaimed-0.000
 replied-0.000
�-0.000
 said-0.000
s-0.000
Token locked
Token ID8970
Feature activation+3.07e-02

Pos logit contributions
ocol+0.084
uff+0.074
 loser+0.074
ubb+0.073
aught+0.070
Neg logit contributions
 said-0.045
 exclaimed-0.045
 replied-0.044
 looked-0.038
 -0.038
Token away
Token ID1497
Feature activation+1.75e-03

Pos logit contributions
 exclaimed+0.001
 said+0.001
 replied+0.001
s+0.001
�+0.001
Neg logit contributions
ocol-0.002
 loser-0.002
uff-0.002
ubb-0.002
ithering-0.002
Token.
Token ID13
Feature activation+1.87e-04

Pos logit contributions
ocol+0.005
 loser+0.004
uff+0.004
ithering+0.004
ubb+0.004
Neg logit contributions
 exclaimed-0.002
 said-0.002
 replied-0.002
s-0.002
cakes-0.002
Token When
Token ID1649
Feature activation+1.17e-02

Pos logit contributions
ocol+0.001
 loser+0.000
ubb+0.000
uff+0.000
ithering+0.000
Neg logit contributions
 exclaimed-0.000
 replied-0.000
 said-0.000
s-0.000
 -0.000
Token they
Token ID484
Feature activation-1.02e-01

Pos logit contributions
ocol+0.034
 loser+0.030
uff+0.030
ubb+0.029
 bruised+0.029
Neg logit contributions
 exclaimed-0.011
 replied-0.011
 said-0.010
s-0.009
�-0.009
Token were
Token ID547
Feature activation-2.25e-02

Pos logit contributions
 exclaimed+0.088
 replied+0.082
�+0.078
s+0.074
cakes+0.073
Neg logit contributions
ocol-0.293
ubb-0.260
 loser-0.257
uff-0.255
 bruised-0.253
Token done
Token ID1760
Feature activation-1.39e-01

Pos logit contributions
 exclaimed+0.022
 replied+0.022
�+0.020
s+0.020
 said+0.019
Neg logit contributions
ocol-0.063
uff-0.056
 bruised-0.055
ubb-0.055
 loser-0.055
Token,
Token ID11
Feature activation-9.04e-03

Pos logit contributions
 exclaimed+0.154
 replied+0.150
 said+0.144
s+0.134
�+0.132
Neg logit contributions
ocol-0.382
uff-0.335
 loser-0.331
ubb-0.329
ithering-0.327
Token her
Token ID607
Feature activation+5.16e-02

Pos logit contributions
 exclaimed+0.012
 replied+0.011
 said+0.011
s+0.010
�+0.010
Neg logit contributions
ocol-0.023
 loser-0.020
uff-0.020
ubb-0.020
 bruised-0.019
Token mother
Token ID2802
Feature activation-1.50e-02

Pos logit contributions
ocol+0.119
 loser+0.101
uff+0.099
ubb+0.099
ithering+0.099
Neg logit contributions
 replied-0.083
 exclaimed-0.082
 said-0.081
s-0.076
�-0.073
Token said
Token ID531
Feature activation-5.44e-02

Pos logit contributions
 exclaimed+0.009
�+0.009
 replied+0.008
s+0.007
cakes+0.007
Neg logit contributions
ocol-0.045
uff-0.041
 bruised-0.041
aught-0.040
ubb-0.040
Token,
Token ID11
Feature activation-2.48e-03

Pos logit contributions
 exclaimed+0.058
 replied+0.054
 said+0.052
�+0.050
s+0.049
Neg logit contributions
ocol-0.147
uff-0.129
 loser-0.129
ubb-0.127
uzz-0.126
Token patiently
Token ID37220
Feature activation-7.81e-03

Pos logit contributions
 exclaimed+0.112
 replied+0.110
 said+0.105
s+0.101
�+0.100
Neg logit contributions
ocol-0.185
uff-0.160
 loser-0.155
ubb-0.154
ithering-0.154
Token.
Token ID13
Feature activation+2.33e-03

Pos logit contributions
 exclaimed+0.009
 replied+0.009
 said+0.009
s+0.008
�+0.008
Neg logit contributions
ocol-0.021
 loser-0.018
uff-0.018
ubb-0.018
ithering-0.018
Token When
Token ID1649
Feature activation+1.25e-02

Pos logit contributions
ocol+0.006
 loser+0.005
uff+0.005
ubb+0.005
ithering+0.005
Neg logit contributions
 exclaimed-0.003
 replied-0.003
 said-0.003
s-0.003
�-0.003
Token they
Token ID484
Feature activation-9.54e-02

Pos logit contributions
ocol+0.036
 loser+0.032
uff+0.032
 bruised+0.032
ubb+0.031
Neg logit contributions
 exclaimed-0.011
 replied-0.010
�-0.009
 said-0.009
s-0.009
Token were
Token ID547
Feature activation-1.64e-02

Pos logit contributions
 exclaimed+0.092
 replied+0.089
�+0.082
s+0.076
cakes+0.074
Neg logit contributions
ocol-0.261
ubb-0.229
 bruised-0.228
 loser-0.227
ateful-0.221
Token done
Token ID1760
Feature activation-1.39e-01

Pos logit contributions
 replied+0.018
 exclaimed+0.018
s+0.017
�+0.017
 said+0.015
Neg logit contributions
ocol-0.044
uff-0.039
 bruised-0.039
 loser-0.038
ubb-0.038
Token,
Token ID11
Feature activation-2.67e-03

Pos logit contributions
 exclaimed+0.186
 replied+0.179
 said+0.172
s+0.162
�+0.162
Neg logit contributions
ocol-0.353
uff-0.305
 loser-0.298
ubb-0.296
ithering-0.296
Token Lily
Token ID20037
Feature activation-3.56e-02

Pos logit contributions
 exclaimed+0.003
 replied+0.003
 said+0.003
s+0.003
�+0.003
Neg logit contributions
ocol-0.007
 loser-0.006
uff-0.006
ubb-0.006
ithering-0.006
Token's
Token ID338
Feature activation+1.64e-02

Pos logit contributions
 replied+0.034
cakes+0.034
 exclaimed+0.033
�+0.032
 +0.029
Neg logit contributions
ocol-0.082
 bruised-0.079
uff-0.078
aught-0.078
asured-0.078
Token mom
Token ID1995
Feature activation-2.95e-02

Pos logit contributions
ocol+0.037
uff+0.031
 loser+0.031
ubb+0.031
aught+0.030
Neg logit contributions
 exclaimed-0.026
 replied-0.026
 said-0.025
s-0.024
�-0.024
Token took
Token ID1718
Feature activation-4.42e-02

Pos logit contributions
�+0.028
 exclaimed+0.027
 replied+0.025
s+0.024
cakes+0.024
Neg logit contributions
ocol-0.075
 bruised-0.071
uff-0.070
aught-0.069
asured-0.066
Token carefully
Token ID7773
Feature activation-2.47e-02

Pos logit contributions
ocol+0.062
uff+0.054
 loser+0.054
ubb+0.053
ithering+0.053
Neg logit contributions
 exclaimed-0.032
 replied-0.032
 said-0.031
s-0.029
�-0.028
Token.
Token ID13
Feature activation+5.81e-03

Pos logit contributions
 exclaimed+0.025
 replied+0.025
 said+0.024
s+0.022
�+0.021
Neg logit contributions
ocol-0.070
 loser-0.062
uff-0.062
ubb-0.061
ithering-0.061
Token When
Token ID1649
Feature activation+2.77e-03

Pos logit contributions
ocol+0.017
 loser+0.016
ubb+0.015
ithering+0.015
uff+0.015
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
 -0.005
s-0.004
Token she
Token ID673
Feature activation-3.72e-02

Pos logit contributions
ocol+0.008
 loser+0.007
uff+0.007
ubb+0.007
 bruised+0.007
Neg logit contributions
 exclaimed-0.003
 replied-0.002
 said-0.002
s-0.002
�-0.002
Token was
Token ID373
Feature activation-7.97e-03

Pos logit contributions
 exclaimed+0.032
 replied+0.030
cakes+0.029
�+0.028
s+0.027
Neg logit contributions
ocol-0.101
 bruised-0.090
ubb-0.089
uff-0.089
aught-0.088
Token done
Token ID1760
Feature activation-1.39e-01

Pos logit contributions
 replied+0.008
 exclaimed+0.008
s+0.008
�+0.008
 said+0.007
Neg logit contributions
ocol-0.022
uff-0.019
 bruised-0.019
 loser-0.019
ithering-0.019
Token,
Token ID11
Feature activation-3.74e-03

Pos logit contributions
 exclaimed+0.132
 replied+0.130
 said+0.124
s+0.114
�+0.109
Neg logit contributions
ocol-0.401
 loser-0.359
uff-0.355
ubb-0.353
ithering-0.348
Token the
Token ID262
Feature activation-3.83e-02

Pos logit contributions
 exclaimed+0.003
 replied+0.003
 said+0.003
s+0.003
�+0.003
Neg logit contributions
ocol-0.011
uff-0.010
 loser-0.010
ithering-0.010
ubb-0.010
Token duck
Token ID22045
Feature activation+1.56e-02

Pos logit contributions
 said+0.057
 replied+0.057
 exclaimed+0.056
�+0.052
s+0.051
Neg logit contributions
ocol-0.093
ithering-0.080
uzz-0.079
arter-0.079
uff-0.078
Token said
Token ID531
Feature activation-4.24e-02

Pos logit contributions
ocol+0.042
 bruised+0.039
ubb+0.039
uff+0.039
aught+0.038
Neg logit contributions
 exclaimed-0.013
 replied-0.012
�-0.012
cakes-0.012
 said-0.010
Token,
Token ID11
Feature activation+6.17e-03

Pos logit contributions
 exclaimed+0.041
 replied+0.040
 said+0.038
s+0.036
�+0.035
Neg logit contributions
ocol-0.121
 loser-0.107
uff-0.107
ubb-0.106
 bruised-0.104
Token bake
Token ID28450
Feature activation-7.47e-02

Pos logit contributions
 exclaimed+0.014
 replied+0.014
 said+0.013
s+0.013
�+0.012
Neg logit contributions
ocol-0.030
 loser-0.026
uff-0.025
ubb-0.025
 bruised-0.025
Token.
Token ID13
Feature activation-5.02e-03

Pos logit contributions
 exclaimed+0.103
 replied+0.102
 said+0.089
�+0.085
s+0.084
Neg logit contributions
ocol-0.174
uff-0.145
 bruised-0.140
 loser-0.140
ities-0.139
Token When
Token ID1649
Feature activation+8.17e-03

Pos logit contributions
 exclaimed+0.007
 replied+0.007
 said+0.007
s+0.007
�+0.007
Neg logit contributions
ocol-0.012
uff-0.010
 loser-0.010
aught-0.010
ithering-0.010
Token they
Token ID484
Feature activation-1.01e-01

Pos logit contributions
ocol+0.022
 bruised+0.019
 loser+0.019
uff+0.019
ubb+0.018
Neg logit contributions
 exclaimed-0.009
 replied-0.008
cakes-0.007
 said-0.007
s-0.007
Token were
Token ID547
Feature activation-2.27e-02

Pos logit contributions
 exclaimed+0.109
 replied+0.099
�+0.090
cakes+0.088
s+0.087
Neg logit contributions
ocol-0.266
ubb-0.229
 bruised-0.225
 loser-0.221
ateful-0.221
Token done
Token ID1760
Feature activation-1.39e-01

Pos logit contributions
 replied+0.030
 exclaimed+0.029
s+0.029
�+0.028
 said+0.025
Neg logit contributions
ocol-0.055
uff-0.048
 bruised-0.047
ubb-0.046
aught-0.046
Token,
Token ID11
Feature activation-7.31e-03

Pos logit contributions
 exclaimed+0.185
 replied+0.176
 said+0.169
s+0.157
�+0.157
Neg logit contributions
ocol-0.356
uff-0.306
ithering-0.294
ubb-0.293
 loser-0.292
Token Tim
Token ID5045
Feature activation+4.09e-03

Pos logit contributions
 exclaimed+0.012
 replied+0.012
 said+0.011
s+0.011
�+0.011
Neg logit contributions
ocol-0.016
 loser-0.013
uff-0.013
 bruised-0.013
ubb-0.013
Tokenmy
Token ID1820
Feature activation-4.24e-02

Pos logit contributions
ocol+0.005
 bruised+0.005
asured+0.005
uff+0.005
aught+0.005
Neg logit contributions
 exclaimed-0.009
cakes-0.009
 replied-0.008
 they-0.008
�-0.008
Token's
Token ID338
Feature activation+1.88e-02

Pos logit contributions
cakes+0.047
 replied+0.041
 exclaimed+0.040
�+0.040
lic+0.040
Neg logit contributions
asured-0.084
 bruised-0.083
ocol-0.083
aught-0.082
conf-0.080
Token wish
Token ID4601
Feature activation+1.14e-02

Pos logit contributions
ocol+0.044
 loser+0.037
uff+0.037
ubb+0.037
ithering+0.036
Neg logit contributions
 exclaimed-0.029
 replied-0.028
 said-0.027
s-0.026
�-0.026
Token fun
Token ID1257
Feature activation-9.11e-03

Pos logit contributions
ocol+0.042
 bruised+0.041
 loser+0.040
 unsolved+0.039
 cheating+0.038
Neg logit contributions
 exclaimed-0.035
 replied-0.034
�-0.032
s-0.032
cakes-0.032
Token!
Token ID0
Feature activation-7.17e-03

Pos logit contributions
 exclaimed+0.011
 replied+0.011
 said+0.010
�+0.010
cakes+0.010
Neg logit contributions
ocol-0.023
uff-0.021
 loser-0.020
aught-0.020
conf-0.020
Token When
Token ID1649
Feature activation+1.42e-02

Pos logit contributions
 exclaimed+0.009
 replied+0.009
 chat+0.009
 said+0.008
cakes+0.008
Neg logit contributions
ocol-0.014
uff-0.014
aught-0.014
 loser-0.014
uzz-0.013
Token they
Token ID484
Feature activation-9.49e-02

Pos logit contributions
ocol+0.031
 loser+0.029
 bruised+0.028
arter+0.027
conf+0.027
Neg logit contributions
 exclaimed-0.020
 replied-0.019
cakes-0.018
lic-0.017
 said-0.016
Token were
Token ID547
Feature activation-2.04e-02

Pos logit contributions
 exclaimed+0.094
cakes+0.092
�+0.092
 replied+0.091
s+0.077
Neg logit contributions
ocol-0.234
 loser-0.218
ubb-0.213
uzz-0.207
uff-0.206
Token done
Token ID1760
Feature activation-1.39e-01

Pos logit contributions
 exclaimed+0.024
 replied+0.023
�+0.022
s+0.022
 said+0.021
Neg logit contributions
ocol-0.053
uff-0.049
 loser-0.048
ithering-0.047
 bruised-0.047
Token,
Token ID11
Feature activation-4.87e-03

Pos logit contributions
 exclaimed+0.196
 replied+0.191
 said+0.185
s+0.174
�+0.172
Neg logit contributions
ocol-0.339
uff-0.294
 loser-0.290
ubb-0.286
ithering-0.286
Token Lily
Token ID20037
Feature activation-3.76e-02

Pos logit contributions
 exclaimed+0.006
 replied+0.006
s+0.005
 said+0.005
�+0.005
Neg logit contributions
ocol-0.011
 loser-0.011
 bruised-0.010
uff-0.010
ubb-0.010
Token's
Token ID338
Feature activation+1.50e-02

Pos logit contributions
cakes+0.045
�+0.039
Happy+0.038
 replied+0.038
 exclaimed+0.037
Neg logit contributions
asured-0.073
uff-0.073
 bruised-0.072
aught-0.072
ocol-0.070
Token mom
Token ID1995
Feature activation-3.05e-02

Pos logit contributions
ocol+0.037
uff+0.031
ubb+0.031
 loser+0.031
ithering+0.031
Neg logit contributions
 replied-0.021
 exclaimed-0.021
 said-0.021
s-0.019
�-0.019
Token looked
Token ID3114
Feature activation-8.82e-02

Pos logit contributions
�+0.028
 exclaimed+0.025
cakes+0.024
s+0.024
 replied+0.023
Neg logit contributions
ocol-0.074
 bruised-0.073
uff-0.072
aught-0.072
asured-0.070
Token correctly
Token ID9380
Feature activation-5.72e-03

Pos logit contributions
 replied+0.047
 exclaimed+0.046
 said+0.043
�+0.043
cakes+0.041
Neg logit contributions
ocol-0.060
uff-0.052
 property-0.051
 loser-0.051
aught-0.050
Token.
Token ID13
Feature activation+7.93e-03

Pos logit contributions
 exclaimed+0.006
 replied+0.006
 said+0.006
s+0.005
�+0.005
Neg logit contributions
ocol-0.016
 loser-0.014
uff-0.014
ubb-0.014
ithering-0.013
Token When
Token ID1649
Feature activation-4.39e-03

Pos logit contributions
ocol+0.023
ubb+0.021
uff+0.020
 loser+0.020
 bruised+0.020
Neg logit contributions
 replied-0.008
 exclaimed-0.008
 said-0.008
 -0.008
�-0.007
Token she
Token ID673
Feature activation-4.06e-02

Pos logit contributions
 exclaimed+0.004
 replied+0.004
 said+0.004
s+0.003
�+0.003
Neg logit contributions
ocol-0.013
 loser-0.012
uff-0.012
ubb-0.011
ithering-0.011
Token was
Token ID373
Feature activation-8.32e-03

Pos logit contributions
 exclaimed+0.032
 replied+0.030
s+0.027
cakes+0.027
 said+0.026
Neg logit contributions
ocol-0.118
 loser-0.106
uff-0.104
ubb-0.104
 bruised-0.103
Token done
Token ID1760
Feature activation-1.38e-01

Pos logit contributions
 exclaimed+0.008
 replied+0.008
s+0.007
 said+0.007
�+0.007
Neg logit contributions
ocol-0.023
uff-0.021
 loser-0.021
ubb-0.020
ithering-0.020
Token,
Token ID11
Feature activation-3.80e-03

Pos logit contributions
 exclaimed+0.120
 replied+0.119
 said+0.113
s+0.102
�+0.099
Neg logit contributions
ocol-0.412
 loser-0.367
uff-0.365
ubb-0.364
ithering-0.359
Token the
Token ID262
Feature activation-4.14e-02

Pos logit contributions
 exclaimed+0.003
 replied+0.003
 said+0.003
s+0.003
�+0.003
Neg logit contributions
ocol-0.011
uff-0.010
 loser-0.010
ubb-0.010
ithering-0.010
Token owl
Token ID39610
Feature activation+2.62e-02

Pos logit contributions
 replied+0.061
 said+0.060
 exclaimed+0.058
�+0.054
cakes+0.051
Neg logit contributions
ocol-0.101
ithering-0.087
uzz-0.086
uff-0.086
aught-0.085
Token smiled
Token ID13541
Feature activation-3.39e-02

Pos logit contributions
ocol+0.075
 bruised+0.067
ubb+0.067
uff+0.066
 loser+0.066
Neg logit contributions
 exclaimed-0.021
 replied-0.019
cakes-0.018
�-0.018
s-0.018
Token and
Token ID290
Feature activation-1.53e-02

Pos logit contributions
 exclaimed+0.038
 replied+0.037
 said+0.036
s+0.033
�+0.032
Neg logit contributions
ocol-0.093
 loser-0.082
uff-0.081
ubb-0.081
ithering-0.080
Token smooth
Token ID7209
Feature activation-2.31e-02

Pos logit contributions
 exclaimed+0.011
 replied+0.011
�+0.010
s+0.010
 said+0.010
Neg logit contributions
ocol-0.016
 bruised-0.014
 loser-0.014
ubb-0.014
uff-0.014
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
 exclaimed+0.026
 replied+0.025
 said+0.024
s+0.022
�+0.022
Neg logit contributions
ocol-0.063
uff-0.055
 loser-0.055
ubb-0.055
 bruised-0.054
Token When
Token ID1649
Feature activation+1.14e-02

Pos logit contributions
 exclaimed+0.017
 replied+0.016
�+0.016
 said+0.016
s+0.015
Neg logit contributions
ocol-0.022
uff-0.021
aught-0.020
uzz-0.020
udge-0.019
Token they
Token ID484
Feature activation-9.78e-02

Pos logit contributions
ocol+0.027
 loser+0.024
 cheating+0.024
ateful+0.024
ubb+0.023
Neg logit contributions
 exclaimed-0.014
 replied-0.013
lic-0.012
s-0.012
cakes-0.011
Token were
Token ID547
Feature activation-2.17e-02

Pos logit contributions
 exclaimed+0.093
 replied+0.088
cakes+0.087
�+0.085
s+0.078
Neg logit contributions
ocol-0.258
ubb-0.229
 bruised-0.226
uff-0.225
uzz-0.224
Token done
Token ID1760
Feature activation-1.38e-01

Pos logit contributions
 replied+0.026
 exclaimed+0.026
s+0.025
�+0.025
cakes+0.022
Neg logit contributions
ocol-0.053
uff-0.049
 bruised-0.048
 loser-0.047
ubb-0.047
Token,
Token ID11
Feature activation-9.24e-03

Pos logit contributions
 exclaimed+0.168
 replied+0.162
 said+0.156
�+0.147
s+0.146
Neg logit contributions
ocol-0.364
uff-0.320
 loser-0.311
ithering-0.310
ubb-0.310
Token Jack
Token ID3619
Feature activation-5.26e-02

Pos logit contributions
 exclaimed+0.009
�+0.009
 replied+0.008
s+0.008
cakes+0.008
Neg logit contributions
ocol-0.023
ateful-0.022
 loser-0.021
ubb-0.021
uff-0.021
Token and
Token ID290
Feature activation-2.49e-02

Pos logit contributions
cakes+0.059
 exclaimed+0.055
 replied+0.054
�+0.052
 +0.049
Neg logit contributions
ocol-0.115
asured-0.111
 bruised-0.110
uff-0.109
aught-0.107
Token Jill
Token ID22772
Feature activation-7.92e-02

Pos logit contributions
�+0.051
 exclaimed+0.051
s+0.048
 replied+0.048
cakes+0.047
Neg logit contributions
ocol-0.041
ubb-0.034
 loser-0.034
uzz-0.034
ateful-0.034
Token hopped
Token ID45230
Feature activation+7.61e-03

Pos logit contributions
cakes+0.094
lic+0.080
 exclaimed+0.078
 replied+0.075
�+0.074
Neg logit contributions
ocol-0.158
uff-0.151
 bruised-0.151
onel-0.150
asured-0.149
Token before
Token ID878
Feature activation+1.24e-02

Pos logit contributions
ocol+0.334
uff+0.303
ithering+0.301
 loser+0.291
aught+0.290
Neg logit contributions
 said-0.159
 they-0.152
 exclaimed-0.152
 replied-0.147
s-0.140
Token.
Token ID13
Feature activation+2.53e-04

Pos logit contributions
ocol+0.034
uff+0.030
 loser+0.030
ubb+0.030
ithering+0.030
Neg logit contributions
 exclaimed-0.013
 replied-0.013
 said-0.012
s-0.011
�-0.011
Token When
Token ID1649
Feature activation+1.58e-02

Pos logit contributions
ocol+0.001
uff+0.001
 loser+0.001
ubb+0.001
ithering+0.001
Neg logit contributions
 exclaimed-0.000
 replied-0.000
 said-0.000
s-0.000
�-0.000
Token they
Token ID484
Feature activation-9.86e-02

Pos logit contributions
ocol+0.044
 loser+0.040
ubb+0.039
uff+0.039
 bruised+0.038
Neg logit contributions
 exclaimed-0.015
 replied-0.014
�-0.013
s-0.012
 said-0.012
Token were
Token ID547
Feature activation-2.09e-02

Pos logit contributions
 exclaimed+0.088
 replied+0.085
�+0.078
s+0.075
cakes+0.075
Neg logit contributions
ocol-0.282
 loser-0.253
ubb-0.250
uff-0.247
aught-0.245
Token done
Token ID1760
Feature activation-1.38e-01

Pos logit contributions
 exclaimed+0.022
 replied+0.021
s+0.020
�+0.020
 said+0.019
Neg logit contributions
ocol-0.058
 loser-0.051
uff-0.051
ubb-0.051
 bruised-0.051
Token,
Token ID11
Feature activation-7.59e-03

Pos logit contributions
 exclaimed+0.142
 replied+0.136
 said+0.130
s+0.120
�+0.120
Neg logit contributions
ocol-0.390
uff-0.344
 loser-0.339
ubb-0.336
ithering-0.335
Token they
Token ID484
Feature activation-9.17e-02

Pos logit contributions
 exclaimed+0.005
 replied+0.005
 said+0.004
�+0.004
s+0.004
Neg logit contributions
ocol-0.024
 loser-0.022
uff-0.021
ubb-0.021
uzz-0.021
Token were
Token ID547
Feature activation-8.19e-03

Pos logit contributions
 exclaimed+0.094
cakes+0.092
�+0.089
s+0.089
 replied+0.088
Neg logit contributions
ocol-0.224
uzz-0.204
 bruised-0.204
uff-0.203
 loser-0.203
Token very
Token ID845
Feature activation-7.18e-04

Pos logit contributions
 replied+0.012
 exclaimed+0.012
s+0.011
�+0.011
 said+0.011
Neg logit contributions
ocol-0.019
 loser-0.016
uff-0.016
 bruised-0.016
ubb-0.016
Token proud
Token ID6613
Feature activation-3.74e-02

Pos logit contributions
 exclaimed+0.001
 replied+0.001
 said+0.001
�+0.001
s+0.001
Neg logit contributions
ocol-0.002
uff-0.001
 loser-0.001
ubb-0.001
aught-0.001
Token finish
Token ID5461
Feature activation-1.68e-02

Pos logit contributions
ocol+0.009
uff+0.008
 loser+0.008
ubb+0.008
aught+0.008
Neg logit contributions
 exclaimed-0.004
 replied-0.004
 said-0.004
�-0.004
s-0.004
Token.
Token ID13
Feature activation+1.86e-02

Pos logit contributions
 exclaimed+0.016
 said+0.015
 replied+0.015
s+0.013
 they+0.013
Neg logit contributions
ocol-0.049
 loser-0.044
ubb-0.043
uff-0.043
ithering-0.043
Token When
Token ID1649
Feature activation+1.29e-02

Pos logit contributions
ocol+0.058
ubb+0.051
 loser+0.051
uff+0.050
ithering+0.050
Neg logit contributions
 exclaimed-0.016
 replied-0.016
 said-0.015
 -0.014
s-0.013
Token they
Token ID484
Feature activation-9.91e-02

Pos logit contributions
ocol+0.041
 loser+0.037
uff+0.037
ubb+0.037
ithering+0.036
Neg logit contributions
 exclaimed-0.008
 replied-0.008
 said-0.007
s-0.006
�-0.006
Token were
Token ID547
Feature activation-1.58e-02

Pos logit contributions
 exclaimed+0.085
 replied+0.080
�+0.072
s+0.071
cakes+0.069
Neg logit contributions
ocol-0.285
 loser-0.254
ubb-0.253
 bruised-0.249
uff-0.247
Token done
Token ID1760
Feature activation-1.38e-01

Pos logit contributions
 exclaimed+0.019
 replied+0.018
s+0.018
�+0.017
 said+0.016
Neg logit contributions
ocol-0.041
uff-0.036
 bruised-0.036
ithering-0.035
 loser-0.035
Token,
Token ID11
Feature activation-4.88e-03

Pos logit contributions
 exclaimed+0.134
 replied+0.130
 said+0.124
s+0.114
�+0.112
Neg logit contributions
ocol-0.396
uff-0.351
 loser-0.349
ubb-0.345
ithering-0.343
Token Lily
Token ID20037
Feature activation-3.68e-02

Pos logit contributions
 exclaimed+0.005
 replied+0.005
 said+0.005
s+0.005
�+0.005
Neg logit contributions
ocol-0.013
 loser-0.012
uff-0.012
ubb-0.012
 bruised-0.011
Token noticed
Token ID6810
Feature activation-4.24e-03

Pos logit contributions
cakes+0.034
 replied+0.034
 exclaimed+0.033
�+0.032
s+0.031
Neg logit contributions
ocol-0.087
 bruised-0.084
uff-0.083
asured-0.082
aught-0.082
Token that
Token ID326
Feature activation-4.10e-02

Pos logit contributions
 exclaimed+0.004
 replied+0.004
 said+0.004
s+0.003
�+0.003
Neg logit contributions
ocol-0.013
 loser-0.011
uff-0.011
ubb-0.011
ithering-0.011
Token her
Token ID607
Feature activation+7.19e-02

Pos logit contributions
 they+0.035
 said+0.035
 exclaimed+0.031
 replied+0.030
s+0.025
Neg logit contributions
ocol-0.132
ithering-0.117
uff-0.117
ubb-0.116
 loser-0.115
Token cloth
Token ID16270
Feature activation-3.37e-02

Pos logit contributions
 said+0.036
 replied+0.036
 exclaimed+0.035
s+0.033
�+0.033
Neg logit contributions
ocol-0.059
uff-0.052
uzz-0.051
ubb-0.051
aught-0.051
Token.
Token ID13
Feature activation+7.14e-03

Pos logit contributions
 exclaimed+0.034
 replied+0.033
 said+0.032
s+0.030
�+0.028
Neg logit contributions
ocol-0.096
 loser-0.085
uff-0.084
ubb-0.084
ithering-0.083
Token When
Token ID1649
Feature activation+5.84e-03

Pos logit contributions
ocol+0.020
 loser+0.017
uff+0.017
ubb+0.017
ithering+0.017
Neg logit contributions
 exclaimed-0.008
 replied-0.008
 said-0.008
s-0.007
�-0.007
Token they
Token ID484
Feature activation-1.03e-01

Pos logit contributions
ocol+0.017
 loser+0.015
 bruised+0.015
ubb+0.015
uff+0.015
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.004
s-0.004
cakes-0.004
Token were
Token ID547
Feature activation-2.03e-02

Pos logit contributions
 exclaimed+0.085
 replied+0.080
cakes+0.077
�+0.074
s+0.070
Neg logit contributions
ocol-0.296
ubb-0.262
 loser-0.260
 bruised-0.259
uff-0.257
Token done
Token ID1760
Feature activation-1.38e-01

Pos logit contributions
 exclaimed+0.022
 replied+0.022
s+0.021
�+0.021
 said+0.020
Neg logit contributions
ocol-0.055
uff-0.049
ithering-0.048
 bruised-0.048
 loser-0.048
Token,
Token ID11
Feature activation-8.83e-03

Pos logit contributions
 exclaimed+0.119
 replied+0.117
 said+0.111
s+0.102
�+0.097
Neg logit contributions
ocol-0.410
 loser-0.366
uff-0.364
ubb-0.362
ithering-0.357
Token they
Token ID484
Feature activation-9.13e-02

Pos logit contributions
 exclaimed+0.007
 replied+0.007
 said+0.007
s+0.006
�+0.006
Neg logit contributions
ocol-0.027
 loser-0.024
uff-0.024
ubb-0.023
ithering-0.023
Token went
Token ID1816
Feature activation-1.04e-02

Pos logit contributions
 exclaimed+0.091
cakes+0.089
 replied+0.086
s+0.084
�+0.083
Neg logit contributions
ocol-0.234
uff-0.210
 bruised-0.209
ubb-0.208
uzz-0.207
Token inside
Token ID2641
Feature activation-2.37e-02

Pos logit contributions
 exclaimed+0.020
 replied+0.019
�+0.018
 said+0.018
cakes+0.017
Neg logit contributions
ocol-0.019
uff-0.017
uzz-0.017
ithering-0.017
 loser-0.017
Token the
Token ID262
Feature activation-2.80e-02

Pos logit contributions
 exclaimed+0.028
 replied+0.027
cakes+0.027
�+0.026
 said+0.023
Neg logit contributions
ocol-0.057
aught-0.053
uff-0.052
uzz-0.052
conf-0.051
Token snacks
Token ID28487
Feature activation-4.41e-03

Pos logit contributions
 exclaimed+0.029
 replied+0.029
 said+0.026
�+0.025
 +0.025
Neg logit contributions
ocol-0.045
 bruised-0.036
ubb-0.036
uff-0.035
 loser-0.034
Token.
Token ID13
Feature activation-1.92e-03

Pos logit contributions
 exclaimed+0.005
 replied+0.005
 said+0.005
�+0.004
s+0.004
Neg logit contributions
ocol-0.012
uff-0.010
 loser-0.010
ubb-0.010
aught-0.010
Token When
Token ID1649
Feature activation+8.31e-03

Pos logit contributions
 exclaimed+0.002
 replied+0.002
 said+0.002
s+0.002
�+0.002
Neg logit contributions
ocol-0.005
uff-0.004
 loser-0.004
ubb-0.004
uzz-0.004
Token they
Token ID484
Feature activation-9.97e-02

Pos logit contributions
ocol+0.024
 loser+0.021
 bruised+0.021
ubb+0.021
uff+0.021
Neg logit contributions
 exclaimed-0.007
 replied-0.006
s-0.006
 said-0.005
cakes-0.005
Token were
Token ID547
Feature activation-1.82e-02

Pos logit contributions
 exclaimed+0.095
�+0.087
 replied+0.086
cakes+0.086
s+0.081
Neg logit contributions
ocol-0.260
ubb-0.230
 bruised-0.227
 loser-0.225
uzz-0.221
Token done
Token ID1760
Feature activation-1.38e-01

Pos logit contributions
 exclaimed+0.020
 replied+0.020
�+0.019
s+0.019
 said+0.017
Neg logit contributions
ocol-0.048
uff-0.043
 bruised-0.043
 loser-0.042
ubb-0.042
Token,
Token ID11
Feature activation-7.69e-03

Pos logit contributions
 exclaimed+0.150
 replied+0.145
 said+0.139
s+0.129
�+0.127
Neg logit contributions
ocol-0.380
uff-0.334
 loser-0.332
ubb-0.329
ithering-0.327
Token Lucy
Token ID22162
Feature activation-2.18e-02

Pos logit contributions
 exclaimed+0.009
 replied+0.009
 said+0.008
s+0.008
�+0.008
Neg logit contributions
ocol-0.020
 loser-0.017
uff-0.017
ubb-0.017
 bruised-0.017
Token felt
Token ID2936
Feature activation-5.46e-02

Pos logit contributions
cakes+0.019
 exclaimed+0.019
 replied+0.019
�+0.018
s+0.016
Neg logit contributions
ocol-0.054
 bruised-0.051
aught-0.050
uff-0.050
asured-0.050
Token ready
Token ID3492
Feature activation-3.99e-02

Pos logit contributions
 replied+0.080
 exclaimed+0.079
s+0.074
 said+0.074
cakes+0.073
Neg logit contributions
ocol-0.126
ubb-0.108
 bruised-0.108
 loser-0.106
aught-0.104
Token for
Token ID329
Feature activation-4.26e-03

Pos logit contributions
 exclaimed+0.050
 replied+0.047
�+0.045
cakes+0.044
 said+0.043
Neg logit contributions
ocol-0.098
uff-0.086
 bruised-0.085
aught-0.085
 loser-0.085
Token another
Token ID1194
Feature activation-2.32e-02

Pos logit contributions
ocol+0.074
uff+0.066
ubb+0.065
ithering+0.065
onel+0.064
Neg logit contributions
 said-0.020
 replied-0.019
 exclaimed-0.019
 -0.019
 they-0.016
Token.
Token ID13
Feature activation+6.98e-03

Pos logit contributions
 said+0.031
 replied+0.030
 exclaimed+0.030
 +0.029
s+0.029
Neg logit contributions
ocol-0.060
uff-0.053
ubb-0.052
ateful-0.052
uzz-0.052
Token When
Token ID1649
Feature activation+1.31e-02

Pos logit contributions
ocol+0.019
uff+0.017
 loser+0.016
aught+0.016
ithering+0.016
Neg logit contributions
 exclaimed-0.008
 replied-0.007
 said-0.007
s-0.007
�-0.006
Token they
Token ID484
Feature activation-9.72e-02

Pos logit contributions
ocol+0.036
 loser+0.032
 bruised+0.031
ubb+0.031
uff+0.031
Neg logit contributions
 exclaimed-0.013
 replied-0.012
s-0.011
 said-0.010
cakes-0.010
Token were
Token ID547
Feature activation-1.87e-02

Pos logit contributions
 exclaimed+0.086
 replied+0.082
�+0.079
s+0.073
cakes+0.072
Neg logit contributions
ocol-0.275
 loser-0.245
ubb-0.245
uff-0.243
aught-0.241
Token done
Token ID1760
Feature activation-1.38e-01

Pos logit contributions
 exclaimed+0.017
 replied+0.017
 said+0.015
s+0.015
�+0.015
Neg logit contributions
ocol-0.054
uff-0.048
 loser-0.048
ubb-0.048
ithering-0.047
Token,
Token ID11
Feature activation-6.29e-03

Pos logit contributions
 exclaimed+0.123
 replied+0.120
 said+0.115
s+0.105
�+0.101
Neg logit contributions
ocol-0.405
 loser-0.361
uff-0.359
ubb-0.357
ithering-0.352
Token John
Token ID1757
Feature activation-3.08e-02

Pos logit contributions
 exclaimed+0.008
 replied+0.008
s+0.007
cakes+0.007
 said+0.007
Neg logit contributions
ocol-0.015
 loser-0.013
uff-0.013
ateful-0.013
 bruised-0.013
Token felt
Token ID2936
Feature activation-5.05e-02

Pos logit contributions
cakes+0.026
 exclaimed+0.026
 replied+0.026
�+0.025
s+0.022
Neg logit contributions
ocol-0.080
uff-0.075
aught-0.074
 bruised-0.074
asured-0.074
Token proud
Token ID6613
Feature activation-3.13e-02

Pos logit contributions
 exclaimed+0.070
 replied+0.070
 said+0.066
s+0.065
�+0.063
Neg logit contributions
ocol-0.121
 loser-0.105
ubb-0.104
 bruised-0.103
uff-0.103
Token of
Token ID286
Feature activation+5.19e-02

Pos logit contributions
 exclaimed+0.032
 replied+0.031
 said+0.030
�+0.028
s+0.028
Neg logit contributions
ocol-0.087
uff-0.077
 loser-0.077
ubb-0.076
aught-0.076
Token him
Token ID683
Feature activation+4.83e-02

Pos logit contributions
 exclaimed+0.026
 replied+0.025
 said+0.025
�+0.022
s+0.022
Neg logit contributions
ocol-0.060
uff-0.053
 loser-0.053
ubb-0.052
ithering-0.052
Token and
Token ID290
Feature activation-2.54e-02

Pos logit contributions
ocol+0.154
 loser+0.136
ubb+0.136
uff+0.133
ateful+0.132
Neg logit contributions
 replied-0.040
 said-0.039
 exclaimed-0.038
s-0.034
�-0.030
Token said
Token ID531
Feature activation-4.68e-02

Pos logit contributions
 exclaimed+0.012
 replied+0.011
s+0.010
 said+0.010
�+0.009
Neg logit contributions
ocol-0.082
 bruised-0.074
 loser-0.073
uff-0.073
ubb-0.073
Token he
Token ID339
Feature activation+4.22e-03

Pos logit contributions
 replied+0.038
 exclaimed+0.037
 said+0.036
s+0.031
 they+0.029
Neg logit contributions
ocol-0.144
uff-0.129
 loser-0.128
ubb-0.127
ithering-0.127
Token'd
Token ID1549
Feature activation-4.13e-02

Pos logit contributions
ocol+0.013
uff+0.012
 loser+0.012
ubb+0.011
ithering+0.011
Neg logit contributions
 replied-0.004
 exclaimed-0.004
 said-0.003
s-0.003
�-0.003
Token done
Token ID1760
Feature activation-1.37e-01

Pos logit contributions
 said+0.056
 exclaimed+0.056
 replied+0.055
�+0.049
s+0.048
Neg logit contributions
ocol-0.105
uff-0.093
ubb-0.090
ithering-0.089
 loser-0.089
Token a
Token ID257
Feature activation-6.99e-02

Pos logit contributions
 exclaimed+0.152
 replied+0.149
 said+0.145
s+0.133
�+0.131
Neg logit contributions
ocol-0.379
uff-0.334
 loser-0.332
ubb-0.331
ithering-0.326
Token great
Token ID1049
Feature activation-2.01e-02

Pos logit contributions
 exclaimed+0.092
 replied+0.091
 said+0.089
s+0.083
�+0.082
Neg logit contributions
ocol-0.177
uff-0.154
 loser-0.151
ubb-0.150
ithering-0.149
Token job
Token ID1693
Feature activation-4.77e-02

Pos logit contributions
 exclaimed+0.033
 replied+0.033
 said+0.032
s+0.030
�+0.030
Neg logit contributions
ocol-0.044
uff-0.038
 loser-0.037
ubb-0.037
ithering-0.036
Token.
Token ID13
Feature activation+6.93e-03

Pos logit contributions
 exclaimed+0.047
 said+0.047
 replied+0.045
s+0.041
 they+0.040
Neg logit contributions
ocol-0.138
uff-0.122
ubb-0.122
ithering-0.121
 loser-0.121
Token The
Token ID383
Feature activation-2.61e-02

Pos logit contributions
ocol+0.020
 loser+0.017
uff+0.017
ubb+0.017
ithering+0.017
Neg logit contributions
 exclaimed-0.008
 replied-0.007
 said-0.007
s-0.006
�-0.006
Token find
Token ID1064
Feature activation-1.41e-03

Pos logit contributions
 said+0.055
 exclaimed+0.053
 replied+0.053
s+0.049
�+0.048
Neg logit contributions
ocol-0.118
uff-0.104
 loser-0.102
ubb-0.102
aught-0.102
Token a
Token ID257
Feature activation-7.13e-02

Pos logit contributions
 exclaimed+0.001
 replied+0.001
 said+0.001
s+0.001
�+0.001
Neg logit contributions
ocol-0.004
 loser-0.004
uff-0.004
ubb-0.004
ithering-0.004
Token box
Token ID3091
Feature activation-1.61e-02

Pos logit contributions
 replied+0.093
 said+0.092
 exclaimed+0.091
s+0.086
�+0.083
Neg logit contributions
ocol-0.184
ubb-0.160
uff-0.158
ateful-0.155
aught-0.154
Token of
Token ID286
Feature activation+5.65e-02

Pos logit contributions
 exclaimed+0.020
 replied+0.020
 said+0.019
s+0.018
�+0.018
Neg logit contributions
ocol-0.041
uff-0.036
 loser-0.036
ubb-0.036
ithering-0.035
Token ch
Token ID442
Feature activation+5.77e-02

Pos logit contributions
ocol+0.135
 loser+0.129
uff+0.128
ithering+0.128
ubb+0.127
Neg logit contributions
 said-0.077
 exclaimed-0.073
 replied-0.073
s-0.067
 looked-0.064
Tokenocol
Token ID4668
Feature activation+2.66e-01

Pos logit contributions
 bruised+0.059
 loser+0.055
tons+0.055
 property+0.054
conf+0.054
Neg logit contributions
s-0.142
 said-0.126
 exclaimed-0.125
�-0.122
 they-0.122
Tokenates
Token ID689
Feature activation-2.77e-03

Pos logit contributions
ocol+0.353
 loser+0.285
ubb+0.281
 bruised+0.278
aught+0.271
Neg logit contributions
 exclaimed-0.628
 replied-0.612
s-0.612
 said-0.612
cakes-0.590
Token.
Token ID13
Feature activation+1.22e-02

Pos logit contributions
 exclaimed+0.003
 replied+0.003
 said+0.003
s+0.002
�+0.002
Neg logit contributions
ocol-0.008
 loser-0.007
uff-0.007
ubb-0.007
ithering-0.007
Token\n
Token ID198
Feature activation+1.61e-03

Pos logit contributions
ocol+0.035
uff+0.031
 loser+0.031
ubb+0.031
ithering+0.031
Neg logit contributions
 exclaimed-0.012
 replied-0.012
 said-0.011
s-0.010
�-0.010
Token\n
Token ID198
Feature activation+7.21e-03

Pos logit contributions
ocol+0.004
uff+0.004
uzz+0.004
aught+0.004
ithering+0.004
Neg logit contributions
 replied-0.002
 exclaimed-0.002
cakes-0.002
 said-0.001
�-0.001
Token"
Token ID1
Feature activation+5.66e-02

Pos logit contributions
ocol+0.018
 loser+0.016
uff+0.016
ithering+0.015
ubb+0.015
Neg logit contributions
 exclaimed-0.010
 replied-0.010
 said-0.009
s-0.009
�-0.008
Token.
Token ID13
Feature activation+6.73e-03

Pos logit contributions
 exclaimed+0.069
 replied+0.067
 said+0.064
s+0.060
�+0.059
Neg logit contributions
ocol-0.166
uff-0.146
 loser-0.146
ubb-0.144
aught-0.142
Token<|endoftext|>
Token ID50256
Feature activation+1.06e-02

Pos logit contributions
ocol+0.021
uff+0.018
 loser+0.018
ubb+0.018
aught+0.018
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
s-0.004
�-0.004
TokenPeter
Token ID19727
Feature activation-2.19e-02

Pos logit contributions
ocol+0.026
uff+0.023
 loser+0.023
ithering+0.023
uzz+0.022
Neg logit contributions
 exclaimed-0.014
 replied-0.013
 said-0.012
cakes-0.012
s-0.011
Token was
Token ID373
Feature activation+3.23e-02

Pos logit contributions
 exclaimed+0.020
 replied+0.019
 said+0.018
s+0.017
cakes+0.017
Neg logit contributions
ocol-0.063
uff-0.056
 loser-0.056
ubb-0.055
aught-0.055
Token a
Token ID257
Feature activation-5.40e-02

Pos logit contributions
ocol+0.084
 loser+0.073
ubb+0.072
ities+0.070
uff+0.070
Neg logit contributions
 exclaimed-0.043
 said-0.043
 replied-0.042
 -0.037
�-0.036
Token mis
Token ID2984
Feature activation+2.07e-01

Pos logit contributions
 said+0.068
 replied+0.068
�+0.066
 exclaimed+0.063
 looked+0.061
Neg logit contributions
ocol-0.145
ities-0.122
ithering-0.122
uzz-0.121
uff-0.121
Tokenchie
Token ID3043
Feature activation+3.21e-02

Pos logit contributions
ocol+0.306
arter+0.244
 bruised+0.237
ities+0.232
 appearance+0.224
Neg logit contributions
 exclaimed-0.486
s-0.450
 replied-0.449
 said-0.447
cakes-0.430
Tokenvous
Token ID31222
Feature activation-2.61e-02

Pos logit contributions
ocol+0.042
aught+0.035
 loser+0.035
ang+0.033
uff+0.033
Neg logit contributions
 replied-0.073
 exclaimed-0.072
�-0.070
 said-0.069
s-0.069
Token 3
Token ID513
Feature activation-1.28e-02

Pos logit contributions
 said+0.033
 replied+0.033
�+0.029
 looked+0.029
 exclaimed+0.028
Neg logit contributions
ocol-0.070
ithering-0.062
aught-0.059
ubb-0.059
ateful-0.059
Token year
Token ID614
Feature activation+2.85e-02

Pos logit contributions
 exclaimed+0.016
 replied+0.016
 said+0.015
s+0.014
�+0.014
Neg logit contributions
ocol-0.033
uff-0.029
 loser-0.029
ubb-0.028
aught-0.028
Token old
Token ID1468
Feature activation+3.65e-02

Pos logit contributions
ocol+0.064
 loser+0.056
uff+0.055
ithering+0.055
ubb+0.055
Neg logit contributions
 exclaimed-0.045
 replied-0.045
 said-0.044
s-0.042
�-0.040
Token a
Token ID257
Feature activation-7.92e-02

Pos logit contributions
 exclaimed+0.058
 replied+0.056
 said+0.053
cakes+0.053
�+0.053
Neg logit contributions
ocol-0.065
uff-0.055
 loser-0.054
uzz-0.053
aught-0.053
Token big
Token ID1263
Feature activation-4.29e-02

Pos logit contributions
 exclaimed+0.098
 replied+0.094
 said+0.090
s+0.086
�+0.085
Neg logit contributions
ocol-0.201
 loser-0.179
uff-0.176
 bruised-0.173
ubb-0.173
Token box
Token ID3091
Feature activation-1.36e-02

Pos logit contributions
 replied+0.059
 exclaimed+0.057
 said+0.056
�+0.052
s+0.052
Neg logit contributions
ocol-0.106
uff-0.093
ubb-0.093
 loser-0.092
ithering-0.092
Token of
Token ID286
Feature activation+5.45e-02

Pos logit contributions
 exclaimed+0.017
 replied+0.016
�+0.016
 said+0.015
cakes+0.015
Neg logit contributions
ocol-0.033
aught-0.030
uff-0.030
uzz-0.029
 loser-0.029
Token ch
Token ID442
Feature activation+5.97e-02

Pos logit contributions
ocol+0.117
 loser+0.110
ithering+0.109
uff+0.109
ubb+0.106
Neg logit contributions
 said-0.091
 exclaimed-0.089
 replied-0.089
s-0.079
 looked-0.076
Tokenocol
Token ID4668
Feature activation+2.72e-01

Pos logit contributions
 bruised+0.053
 loser+0.051
tons+0.051
 Kang+0.045
uding+0.045
Neg logit contributions
s-0.146
 exclaimed-0.141
 replied-0.138
les-0.136
 said-0.135
Tokenates
Token ID689
Feature activation+2.71e-03

Pos logit contributions
ocol+0.363
 loser+0.293
ubb+0.281
 bruised+0.274
uff+0.271
Neg logit contributions
 exclaimed-0.653
 replied-0.639
 said-0.634
s-0.627
cakes-0.609
Token to
Token ID284
Feature activation+4.93e-03

Pos logit contributions
ocol+0.007
uff+0.006
 loser+0.006
uzz+0.006
aught+0.006
Neg logit contributions
 exclaimed-0.003
 replied-0.003
 said-0.003
�-0.003
s-0.003
Token share
Token ID2648
Feature activation-2.54e-02

Pos logit contributions
ocol+0.013
uff+0.011
 loser+0.011
ubb+0.011
ithering+0.011
Neg logit contributions
 exclaimed-0.006
 replied-0.006
 said-0.006
s-0.005
�-0.005
Token with
Token ID351
Feature activation+2.51e-02

Pos logit contributions
 exclaimed+0.026
 said+0.026
 replied+0.025
s+0.025
 they+0.022
Neg logit contributions
ocol-0.072
 loser-0.065
ubb-0.065
uff-0.065
ithering-0.064
Token the
Token ID262
Feature activation-2.62e-02

Pos logit contributions
ocol+0.061
uff+0.053
 loser+0.053
ubb+0.052
ithering+0.052
Neg logit contributions
 exclaimed-0.036
 replied-0.035
 said-0.034
s-0.032
�-0.031
Token,
Token ID11
Feature activation+1.20e-02

Pos logit contributions
ocol+0.048
 loser+0.043
uff+0.043
ubb+0.042
ithering+0.042
Neg logit contributions
 exclaimed-0.017
 replied-0.016
 said-0.016
s-0.014
�-0.014
Token Bugs
Token ID44991
Feature activation+3.25e-02

Pos logit contributions
ocol+0.021
 loser+0.017
uff+0.017
 bruised+0.017
ithering+0.017
Neg logit contributions
 exclaimed-0.025
 replied-0.024
 said-0.023
cakes-0.022
s-0.022
Token said
Token ID531
Feature activation-3.08e-02

Pos logit contributions
ocol+0.085
uff+0.082
 bruised+0.080
aught+0.080
ithering+0.079
Neg logit contributions
cakes-0.029
 exclaimed-0.023
�-0.022
 replied-0.021
 chat-0.018
Token,
Token ID11
Feature activation+1.17e-02

Pos logit contributions
 exclaimed+0.033
 replied+0.031
 said+0.030
�+0.030
cakes+0.029
Neg logit contributions
ocol-0.081
 loser-0.073
uff-0.072
uzz-0.071
ithering-0.071
Token "
Token ID366
Feature activation+8.83e-02

Pos logit contributions
ocol+0.023
uff+0.021
uzz+0.020
ithering+0.020
 loser+0.020
Neg logit contributions
 exclaimed-0.019
 replied-0.018
cakes-0.018
�-0.018
s-0.018
TokenTwe
Token ID32665
Feature activation+1.88e-01

Pos logit contributions
ocol+0.181
uff+0.152
 loser+0.152
ubb+0.149
ithering+0.147
Neg logit contributions
 exclaimed-0.159
 replied-0.157
 said-0.153
s-0.147
�-0.145
Tokenety
Token ID2963
Feature activation+2.12e-02

Pos logit contributions
ocol+0.336
ubb+0.304
 loser+0.289
arter+0.272
onel+0.271
Neg logit contributions
s-0.383
 replied-0.366
 exclaimed-0.364
 said-0.356
�-0.331
Token,
Token ID11
Feature activation+3.54e-02

Pos logit contributions
ocol+0.066
 loser+0.059
uff+0.059
ubb+0.059
ithering+0.058
Neg logit contributions
 replied-0.016
 exclaimed-0.015
 said-0.015
s-0.014
�-0.013
Token I
Token ID314
Feature activation+1.11e-02

Pos logit contributions
ocol+0.094
uff+0.083
ubb+0.081
 loser+0.081
ithering+0.080
Neg logit contributions
 said-0.046
 replied-0.046
 they-0.042
 exclaimed-0.041
 looked-0.040
Token have
Token ID423
Feature activation-5.38e-02

Pos logit contributions
ocol+0.028
uff+0.024
ubb+0.023
 loser+0.023
ithering+0.023
Neg logit contributions
 said-0.016
 replied-0.016
 exclaimed-0.015
s-0.015
�-0.015
Token a
Token ID257
Feature activation-5.75e-02

Pos logit contributions
 exclaimed+0.063
 replied+0.062
 said+0.057
s+0.055
�+0.054
Neg logit contributions
ocol-0.143
 loser-0.127
uff-0.124
 bruised-0.123
ubb-0.123
Token just
Token ID655
Feature activation+6.25e-02

Pos logit contributions
ocol+0.048
 loser+0.043
uff+0.042
ubb+0.042
ithering+0.042
Neg logit contributions
 exclaimed-0.021
 said-0.021
 replied-0.020
 they-0.018
s-0.017
Token a
Token ID257
Feature activation-5.98e-02

Pos logit contributions
ocol+0.153
uff+0.136
 loser+0.135
ithering+0.135
ubb+0.132
Neg logit contributions
 said-0.091
 replied-0.088
 exclaimed-0.085
 they-0.079
 looked-0.076
Token little
Token ID1310
Feature activation+3.02e-03

Pos logit contributions
 exclaimed+0.081
 replied+0.079
 said+0.077
s+0.073
�+0.071
Neg logit contributions
ocol-0.150
uff-0.130
 loser-0.128
ubb-0.128
ithering-0.127
Token bit
Token ID1643
Feature activation+1.72e-02

Pos logit contributions
ocol+0.007
uff+0.006
ithering+0.006
ubb+0.006
 loser+0.006
Neg logit contributions
 replied-0.004
 exclaimed-0.004
 said-0.004
s-0.004
�-0.004
Token of
Token ID286
Feature activation+5.50e-02

Pos logit contributions
ocol+0.051
uff+0.046
ithering+0.045
 loser+0.045
uzz+0.045
Neg logit contributions
 said-0.015
 exclaimed-0.015
 replied-0.015
s-0.014
�-0.012
Token mis
Token ID2984
Feature activation+1.98e-01

Pos logit contributions
ocol+0.138
uff+0.125
ithering+0.124
 loser+0.123
uzz+0.121
Neg logit contributions
 said-0.072
 exclaimed-0.072
 replied-0.071
s-0.062
�-0.061
Tokenbeh
Token ID20709
Feature activation+1.02e-01

Pos logit contributions
ocol+0.340
arter+0.280
uff+0.275
 loser+0.266
 bruised+0.264
Neg logit contributions
 exclaimed-0.420
s-0.396
 said-0.393
 replied-0.389
�-0.374
Tokenaving
Token ID2703
Feature activation-1.63e-02

Pos logit contributions
ocol+0.175
uff+0.144
 loser+0.138
ubb+0.137
ithering+0.135
Neg logit contributions
 exclaimed-0.217
 replied-0.211
 said-0.209
s-0.206
�-0.201
Token.
Token ID13
Feature activation+1.61e-02

Pos logit contributions
 exclaimed+0.015
 replied+0.015
 said+0.015
s+0.014
�+0.013
Neg logit contributions
ocol-0.048
uff-0.042
 loser-0.042
ithering-0.041
ubb-0.041
Token\n
Token ID198
Feature activation+6.10e-04

Pos logit contributions
ocol+0.052
uff+0.047
 loser+0.047
ubb+0.046
aught+0.046
Neg logit contributions
 exclaimed-0.009
 replied-0.008
 said-0.008
s-0.007
cakes-0.006
Token\n
Token ID198
Feature activation+4.83e-03

Pos logit contributions
ocol+0.002
uzz+0.001
uff+0.001
aught+0.001
ithering+0.001
Neg logit contributions
cakes-0.001
 replied-0.001
 exclaimed-0.001
 said-0.001
�-0.001
Token a
Token ID257
Feature activation-7.76e-02

Pos logit contributions
ocol+0.098
 loser+0.089
uff+0.089
ithering+0.088
ubb+0.087
Neg logit contributions
 exclaimed-0.030
 replied-0.029
 said-0.029
s-0.025
�-0.024
Token smile
Token ID8212
Feature activation-6.39e-02

Pos logit contributions
 exclaimed+0.109
 replied+0.106
 said+0.102
cakes+0.097
s+0.096
Neg logit contributions
ocol-0.186
 loser-0.164
uff-0.158
 bruised-0.158
ubb-0.157
Tokeny
Token ID88
Feature activation-1.77e-02

Pos logit contributions
 exclaimed+0.075
 replied+0.073
 said+0.070
s+0.066
�+0.065
Neg logit contributions
ocol-0.170
uff-0.149
 loser-0.149
ubb-0.147
ithering-0.146
Token face
Token ID1986
Feature activation-1.46e-02

Pos logit contributions
 exclaimed+0.026
 replied+0.026
 said+0.025
s+0.024
�+0.023
Neg logit contributions
ocol-0.041
 loser-0.036
uff-0.036
ubb-0.035
ithering-0.034
Token.
Token ID13
Feature activation+7.05e-03

Pos logit contributions
 exclaimed+0.014
 replied+0.014
 said+0.013
s+0.012
cakes+0.011
Neg logit contributions
ocol-0.042
 loser-0.038
uff-0.037
ubb-0.037
ithering-0.037
Token Her
Token ID2332
Feature activation+8.09e-02

Pos logit contributions
ocol+0.018
 loser+0.016
ubb+0.015
uff+0.015
ithering+0.015
Neg logit contributions
 exclaimed-0.009
 replied-0.009
 said-0.009
s-0.009
�-0.008
Token mom
Token ID1995
Feature activation-3.47e-02

Pos logit contributions
ocol+0.193
 loser+0.165
uff+0.165
ubb+0.164
ithering+0.162
Neg logit contributions
 exclaimed-0.121
 replied-0.120
 said-0.117
s-0.110
�-0.108
Token saw
Token ID2497
Feature activation-4.41e-03

Pos logit contributions
�+0.026
 exclaimed+0.025
 replied+0.022
cakes+0.022
s+0.022
Neg logit contributions
ocol-0.096
uff-0.089
 bruised-0.088
aught-0.088
asured-0.086
Token the
Token ID262
Feature activation-3.36e-02

Pos logit contributions
 exclaimed+0.006
 replied+0.006
 said+0.006
s+0.006
�+0.006
Neg logit contributions
ocol-0.011
uff-0.009
 loser-0.009
ubb-0.009
ithering-0.009
Token mark
Token ID1317
Feature activation+8.13e-03

Pos logit contributions
 exclaimed+0.047
 replied+0.046
 said+0.044
s+0.042
�+0.041
Neg logit contributions
ocol-0.082
 loser-0.072
uff-0.072
ubb-0.071
 bruised-0.070
Token and
Token ID290
Feature activation-2.08e-02

Pos logit contributions
ocol+0.022
uff+0.020
 loser+0.020
ubb+0.020
ithering+0.019
Neg logit contributions
 exclaimed-0.009
 replied-0.008
 said-0.008
s-0.008
�-0.007
Token sad
Token ID6507
Feature activation+7.69e-03

Pos logit contributions
 exclaimed+0.011
 replied+0.011
 said+0.010
s+0.010
�+0.010
Neg logit contributions
ocol-0.014
uff-0.012
 loser-0.012
ubb-0.012
ithering-0.012
Token anymore
Token ID7471
Feature activation+3.90e-02

Pos logit contributions
ocol+0.021
ithering+0.019
 loser+0.019
uff+0.018
ubb+0.018
Neg logit contributions
 said-0.009
 replied-0.009
 exclaimed-0.008
s-0.008
 they-0.008
Token.
Token ID13
Feature activation+1.77e-02

Pos logit contributions
ocol+0.120
 loser+0.106
ubb+0.106
uff+0.105
ithering+0.105
Neg logit contributions
 exclaimed-0.034
 said-0.034
 replied-0.032
s-0.029
 they-0.027
Token She
Token ID1375
Feature activation+3.27e-02

Pos logit contributions
ocol+0.052
 loser+0.045
ithering+0.044
ubb+0.044
uff+0.043
Neg logit contributions
 replied-0.020
 -0.020
 exclaimed-0.019
 said-0.019
�-0.017
Token had
Token ID550
Feature activation-5.09e-02

Pos logit contributions
ocol+0.084
 bruised+0.076
asured+0.075
uff+0.074
 loser+0.074
Neg logit contributions
 exclaimed-0.033
cakes-0.033
 replied-0.031
s-0.031
�-0.031
Token her
Token ID607
Feature activation+7.87e-02

Pos logit contributions
 replied+0.072
 exclaimed+0.070
s+0.070
�+0.069
cakes+0.069
Neg logit contributions
ocol-0.110
 bruised-0.100
asured-0.099
aught-0.099
 loser-0.099
Token very
Token ID845
Feature activation+1.13e-02

Pos logit contributions
ocol+0.190
uff+0.164
 loser+0.162
ubb+0.162
ithering+0.162
Neg logit contributions
 replied-0.114
 said-0.113
 exclaimed-0.113
�-0.103
s-0.103
Token own
Token ID898
Feature activation+4.26e-02

Pos logit contributions
ocol+0.025
uff+0.022
 loser+0.022
ubb+0.021
ithering+0.021
Neg logit contributions
 exclaimed-0.018
 replied-0.018
 said-0.017
s-0.016
�-0.016
Token beautiful
Token ID4950
Feature activation+9.94e-03

Pos logit contributions
ocol+0.103
uff+0.089
 loser+0.089
ubb+0.087
ithering+0.086
Neg logit contributions
 exclaimed-0.061
 replied-0.060
 said-0.058
s-0.055
�-0.054
Token rainbow
Token ID27223
Feature activation+1.53e-02

Pos logit contributions
ocol+0.023
uff+0.020
 loser+0.020
ubb+0.019
aught+0.019
Neg logit contributions
 exclaimed-0.015
 replied-0.015
 said-0.014
s-0.014
�-0.013
Token to
Token ID284
Feature activation+2.46e-02

Pos logit contributions
ocol+0.042
 loser+0.037
uff+0.037
ubb+0.037
ithering+0.037
Neg logit contributions
 exclaimed-0.016
 replied-0.016
 said-0.016
s-0.015
�-0.014
Token didn
Token ID1422
Feature activation-1.72e-02

Pos logit contributions
 exclaimed+0.033
 replied+0.032
 said+0.030
cakes+0.029
s+0.029
Neg logit contributions
ocol-0.084
uff-0.076
 loser-0.075
aught-0.073
ithering-0.073
Token't
Token ID470
Feature activation-2.17e-02

Pos logit contributions
 exclaimed+0.027
 replied+0.027
 said+0.026
�+0.023
cakes+0.023
Neg logit contributions
ocol-0.036
ithering-0.034
 loser-0.033
uff-0.033
conf-0.031
Token want
Token ID765
Feature activation+5.22e-02

Pos logit contributions
 exclaimed+0.040
 replied+0.040
 said+0.038
s+0.037
�+0.037
Neg logit contributions
ocol-0.043
 loser-0.036
uff-0.036
ubb-0.036
ithering-0.035
Token to
Token ID284
Feature activation+1.20e-02

Pos logit contributions
ocol+0.159
 loser+0.142
uff+0.142
ubb+0.140
aught+0.139
Neg logit contributions
 exclaimed-0.041
 replied-0.039
 said-0.036
�-0.033
s-0.033
Token.
Token ID13
Feature activation+1.39e-03

Pos logit contributions
ocol+0.031
uff+0.027
ubb+0.026
 loser+0.026
ithering+0.026
Neg logit contributions
 exclaimed-0.016
 replied-0.016
 said-0.015
s-0.014
�-0.014
Token Her
Token ID2332
Feature activation+8.33e-02

Pos logit contributions
ocol+0.003
uff+0.003
 loser+0.003
aught+0.003
ubb+0.003
Neg logit contributions
 exclaimed-0.002
 replied-0.002
 said-0.002
s-0.002
�-0.002
Token friends
Token ID2460
Feature activation-3.51e-02

Pos logit contributions
ocol+0.208
ubb+0.172
 loser+0.170
uff+0.167
aught+0.167
Neg logit contributions
 replied-0.123
 said-0.123
 exclaimed-0.121
s-0.112
cakes-0.106
Token asked
Token ID1965
Feature activation+2.85e-02

Pos logit contributions
cakes+0.034
 exclaimed+0.034
�+0.034
s+0.031
 replied+0.031
Neg logit contributions
ocol-0.088
uff-0.083
 bruised-0.081
 loser-0.081
aught-0.081
Token her
Token ID607
Feature activation+7.12e-02

Pos logit contributions
ocol+0.084
ubb+0.077
 loser+0.075
uff+0.074
arter+0.074
Neg logit contributions
 said-0.028
 replied-0.028
 exclaimed-0.026
 they-0.026
s-0.024
Token why
Token ID1521
Feature activation+5.97e-02

Pos logit contributions
ocol+0.175
ubb+0.147
 loser+0.146
uff+0.146
ithering+0.144
Neg logit contributions
 replied-0.105
 said-0.104
 exclaimed-0.104
s-0.099
 they-0.096
Token she
Token ID673
Feature activation-5.00e-03

Pos logit contributions
ocol+0.204
ubb+0.183
 loser+0.183
uff+0.181
ithering+0.180
Neg logit contributions
 said-0.031
 replied-0.028
 exclaimed-0.028
s-0.026
 they-0.025
Token.
Token ID13
Feature activation+1.74e-02

Pos logit contributions
ocol+0.206
ubb+0.180
uff+0.179
 loser+0.178
ithering+0.176
Neg logit contributions
 said-0.050
 exclaimed-0.048
 replied-0.046
s-0.045
 they-0.038
Token He
Token ID679
Feature activation+3.74e-02

Pos logit contributions
ocol+0.050
uff+0.044
 loser+0.044
ubb+0.044
ithering+0.043
Neg logit contributions
 exclaimed-0.017
 replied-0.016
 said-0.015
s-0.014
�-0.014
Token quickly
Token ID2952
Feature activation-4.72e-02

Pos logit contributions
ocol+0.106
 loser+0.095
uff+0.095
ubb+0.093
ithering+0.093
Neg logit contributions
 exclaimed-0.036
 replied-0.035
 said-0.033
s-0.031
�-0.031
Token stole
Token ID18240
Feature activation+1.83e-02

Pos logit contributions
 exclaimed+0.053
�+0.051
 replied+0.051
cakes+0.051
s+0.050
Neg logit contributions
ocol-0.120
 bruised-0.104
 loser-0.104
uff-0.104
ithering-0.103
Token something
Token ID1223
Feature activation-2.00e-02

Pos logit contributions
ocol+0.058
ithering+0.054
uff+0.053
arter+0.053
 loser+0.053
Neg logit contributions
 said-0.013
 exclaimed-0.012
 replied-0.012
s-0.010
 -0.010
Token from
Token ID422
Feature activation+7.94e-02

Pos logit contributions
 exclaimed+0.027
 said+0.027
 replied+0.026
s+0.025
 they+0.022
Neg logit contributions
ocol-0.053
 loser-0.046
uff-0.045
ubb-0.045
ithering-0.044
Token his
Token ID465
Feature activation+7.75e-02

Pos logit contributions
ocol+0.256
uff+0.234
ithering+0.233
 loser+0.232
ubb+0.231
Neg logit contributions
 said-0.053
 replied-0.050
 exclaimed-0.049
�-0.037
 -0.037
Token friend
Token ID1545
Feature activation-6.70e-03

Pos logit contributions
ocol+0.202
ubb+0.174
ateful+0.171
uzz+0.168
ithering+0.167
Neg logit contributions
 said-0.107
 replied-0.102
 exclaimed-0.100
s-0.098
�-0.092
Token and
Token ID290
Feature activation-1.71e-02

Pos logit contributions
 exclaimed+0.006
 replied+0.006
 said+0.006
s+0.005
�+0.005
Neg logit contributions
ocol-0.020
ubb-0.018
uff-0.018
 loser-0.018
ithering-0.018
Token ran
Token ID4966
Feature activation-1.27e-02

Pos logit contributions
 exclaimed+0.017
 replied+0.017
 said+0.016
s+0.015
�+0.015
Neg logit contributions
ocol-0.048
 loser-0.042
uff-0.042
ubb-0.042
ithering-0.041
Token away
Token ID1497
Feature activation+1.52e-03

Pos logit contributions
 exclaimed+0.018
 replied+0.017
 said+0.017
s+0.016
�+0.016
Neg logit contributions
ocol-0.031
 loser-0.027
uff-0.027
ubb-0.026
ithering-0.026
TokenThat
Token ID2504
Feature activation+4.86e-02

Pos logit contributions
ocol+0.271
uff+0.231
 loser+0.230
arter+0.227
ubb+0.226
Neg logit contributions
 exclaimed-0.191
 said-0.187
 replied-0.187
s-0.179
�-0.176
Token is
Token ID318
Feature activation+5.64e-02

Pos logit contributions
ocol+0.114
 property+0.106
uff+0.105
aught+0.104
 bruised+0.103
Neg logit contributions
 exclaimed-0.057
cakes-0.054
 replied-0.050
 said-0.048
�-0.044
Token the
Token ID262
Feature activation+9.61e-03

Pos logit contributions
ocol+0.188
uff+0.171
 loser+0.170
ubb+0.169
ithering+0.166
Neg logit contributions
 said-0.038
 replied-0.036
 exclaimed-0.030
 they-0.028
 -0.025
Token moon
Token ID8824
Feature activation+3.86e-02

Pos logit contributions
ocol+0.025
uzz+0.021
ubb+0.021
ateful+0.021
uff+0.020
Neg logit contributions
 said-0.014
 replied-0.014
 exclaimed-0.013
 -0.012
�-0.012
Token,
Token ID11
Feature activation+4.29e-02

Pos logit contributions
ocol+0.108
uff+0.097
 loser+0.096
ubb+0.095
ithering+0.094
Neg logit contributions
 replied-0.040
 exclaimed-0.039
 said-0.039
s-0.037
�-0.035
Token my
Token ID616
Feature activation+8.57e-02

Pos logit contributions
ocol+0.119
ithering+0.108
 loser+0.107
ubb+0.105
ateful+0.105
Neg logit contributions
 said-0.054
 replied-0.052
 they-0.047
 exclaimed-0.047
 -0.042
Token child
Token ID1200
Feature activation+1.13e-02

Pos logit contributions
ocol+0.215
ateful+0.187
ubb+0.184
ithering+0.180
 loser+0.179
Neg logit contributions
 replied-0.130
 said-0.128
 exclaimed-0.114
s-0.111
�-0.109
Token.
Token ID13
Feature activation+1.48e-02

Pos logit contributions
ocol+0.033
 loser+0.030
uff+0.030
ubb+0.030
ithering+0.030
Neg logit contributions
 replied-0.010
 exclaimed-0.010
 said-0.010
s-0.009
�-0.009
Token It
Token ID632
Feature activation+1.30e-03

Pos logit contributions
ocol+0.035
 loser+0.031
ubb+0.031
uff+0.030
ithering+0.030
Neg logit contributions
 replied-0.023
 exclaimed-0.022
 said-0.022
 -0.021
�-0.021
Token's
Token ID338
Feature activation+4.87e-02

Pos logit contributions
ocol+0.003
 bruised+0.003
ubb+0.003
aught+0.003
uff+0.003
Neg logit contributions
 exclaimed-0.002
 replied-0.001
cakes-0.001
 said-0.001
�-0.001
Token very
Token ID845
Feature activation+3.14e-02

Pos logit contributions
ocol+0.143
 loser+0.129
ubb+0.127
uff+0.124
iker+0.124
Neg logit contributions
 said-0.057
 replied-0.054
 exclaimed-0.048
 -0.043
 looked-0.042
Token animal
Token ID5044
Feature activation+7.79e-02

Pos logit contributions
ocol+0.157
uzz+0.139
uff+0.138
aught+0.135
 bruised+0.135
Neg logit contributions
 said-0.091
s-0.088
 exclaimed-0.083
 replied-0.082
 they-0.080
Token.
Token ID13
Feature activation+7.35e-03

Pos logit contributions
ocol+0.244
uff+0.216
ubb+0.213
ithering+0.211
 loser+0.209
Neg logit contributions
 said-0.069
s-0.066
 exclaimed-0.064
 they-0.059
 replied-0.058
Token One
Token ID1881
Feature activation+5.05e-02

Pos logit contributions
ocol+0.024
ubb+0.021
 loser+0.021
uff+0.021
ithering+0.021
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
s-0.004
�-0.004
Token day
Token ID1110
Feature activation-6.66e-02

Pos logit contributions
ocol+0.137
uff+0.120
 loser+0.120
ubb+0.118
ithering+0.117
Neg logit contributions
 exclaimed-0.059
 replied-0.058
 said-0.056
s-0.052
�-0.051
Token,
Token ID11
Feature activation+7.71e-03

Pos logit contributions
 exclaimed+0.055
 replied+0.054
 said+0.052
s+0.047
�+0.045
Neg logit contributions
ocol-0.202
uff-0.180
 loser-0.179
ubb-0.178
ithering-0.176
Token the
Token ID262
Feature activation-2.78e-02

Pos logit contributions
ocol+0.024
uff+0.022
ubb+0.021
 loser+0.021
aught+0.021
Neg logit contributions
 exclaimed-0.006
 replied-0.006
 said-0.005
s-0.005
�-0.005
Token turtle
Token ID28699
Feature activation+3.69e-02

Pos logit contributions
 said+0.042
 replied+0.041
 exclaimed+0.038
�+0.036
s+0.036
Neg logit contributions
ocol-0.071
uzz-0.060
aught-0.059
 bruised-0.058
udge-0.058
Token was
Token ID373
Feature activation+2.61e-02

Pos logit contributions
ocol+0.096
 loser+0.087
uff+0.086
aught+0.085
ubb+0.084
Neg logit contributions
 exclaimed-0.039
 replied-0.038
cakes-0.036
 said-0.034
�-0.034
Token running
Token ID2491
Feature activation+1.33e-02

Pos logit contributions
ocol+0.064
ubb+0.053
uff+0.053
 loser+0.053
uzz+0.052
Neg logit contributions
 exclaimed-0.039
 said-0.038
 replied-0.038
s-0.035
�-0.034
Token through
Token ID832
Feature activation+2.40e-02

Pos logit contributions
ocol+0.033
 loser+0.029
uff+0.029
ubb+0.028
ithering+0.028
Neg logit contributions
 exclaimed-0.018
 replied-0.018
 said-0.017
s-0.016
�-0.016
Token the
Token ID262
Feature activation-1.65e-02

Pos logit contributions
ocol+0.080
uff+0.073
 loser+0.072
ithering+0.071
ubb+0.071
Neg logit contributions
 exclaimed-0.012
 said-0.012
 replied-0.011
s-0.011
 -0.010
Token\n
Token ID198
Feature activation+7.94e-03

Pos logit contributions
ocol+0.004
uff+0.003
 loser+0.003
ubb+0.003
ithering+0.003
Neg logit contributions
 exclaimed-0.001
 replied-0.001
 said-0.001
s-0.001
�-0.001
TokenFr
Token ID6732
Feature activation+5.69e-02

Pos logit contributions
ocol+0.021
ubb+0.018
 bruised+0.018
arter+0.018
 loser+0.018
Neg logit contributions
 exclaimed-0.011
 replied-0.010
 said-0.010
s-0.009
�-0.009
Tokenanny
Token ID7737
Feature activation-2.85e-02

Pos logit contributions
ocol+0.076
 bruised+0.063
 loser+0.060
ithering+0.057
ubb+0.056
Neg logit contributions
 exclaimed-0.143
 replied-0.141
 said-0.137
s-0.128
cakes-0.128
Token understood
Token ID7247
Feature activation-2.55e-02

Pos logit contributions
 exclaimed+0.024
cakes+0.022
 replied+0.021
s+0.021
�+0.020
Neg logit contributions
ocol-0.078
uff-0.071
 loser-0.070
ubb-0.069
 bruised-0.069
Token and
Token ID290
Feature activation-3.10e-02

Pos logit contributions
 exclaimed+0.028
 replied+0.027
 said+0.025
s+0.024
�+0.024
Neg logit contributions
ocol-0.069
uff-0.061
 loser-0.060
ubb-0.059
aught-0.059
Token carefully
Token ID7773
Feature activation-3.07e-02

Pos logit contributions
�+0.027
 exclaimed+0.027
s+0.025
 replied+0.025
cakes+0.024
Neg logit contributions
ocol-0.085
 bruised-0.077
ubb-0.076
aught-0.076
uff-0.076
Token put
Token ID1234
Feature activation-1.93e-02

Pos logit contributions
 exclaimed+0.032
 replied+0.030
�+0.029
s+0.029
 said+0.028
Neg logit contributions
ocol-0.084
uff-0.073
ubb-0.073
 loser-0.072
 bruised-0.072
Token the
Token ID262
Feature activation-2.48e-02

Pos logit contributions
 said+0.014
 replied+0.014
 exclaimed+0.013
 +0.012
 looked+0.011
Neg logit contributions
ocol-0.059
 loser-0.056
ithering-0.055
arter-0.054
aught-0.054
Token animal
Token ID5044
Feature activation+5.80e-02

Pos logit contributions
 said+0.035
 replied+0.034
 exclaimed+0.033
�+0.032
s+0.030
Neg logit contributions
ocol-0.059
uff-0.053
ithering-0.052
aught-0.052
uzz-0.051
Token back
Token ID736
Feature activation+4.17e-02

Pos logit contributions
ocol+0.152
uff+0.134
 loser+0.133
ubb+0.130
ithering+0.129
Neg logit contributions
 exclaimed-0.075
 said-0.074
 replied-0.072
s-0.071
 they-0.066
Token.
Token ID13
Feature activation+6.89e-03

Pos logit contributions
ocol+0.113
 loser+0.104
ubb+0.101
uff+0.099
ithering+0.099
Neg logit contributions
 said-0.049
 exclaimed-0.048
s-0.047
 replied-0.047
 they-0.041
Token flew
Token ID13112
Feature activation+1.71e-02

Pos logit contributions
ocol+0.028
 loser+0.025
uff+0.025
ubb+0.025
 bruised+0.024
Neg logit contributions
 exclaimed-0.011
 replied-0.011
 said-0.010
s-0.010
�-0.010
Token all
Token ID477
Feature activation-1.56e-02

Pos logit contributions
ocol+0.040
 loser+0.035
uff+0.035
ubb+0.034
ithering+0.034
Neg logit contributions
 exclaimed-0.025
 replied-0.025
 said-0.024
s-0.023
�-0.023
Token around
Token ID1088
Feature activation-1.08e-03

Pos logit contributions
 exclaimed+0.022
 replied+0.022
�+0.021
cakes+0.020
 said+0.020
Neg logit contributions
ocol-0.037
ubb-0.032
 bruised-0.031
uff-0.031
 loser-0.031
Token the
Token ID262
Feature activation-1.74e-02

Pos logit contributions
 exclaimed+0.001
 said+0.001
 replied+0.001
s+0.001
 they+0.001
Neg logit contributions
ocol-0.003
uff-0.003
 loser-0.003
ubb-0.003
ithering-0.003
Token room
Token ID2119
Feature activation-1.78e-02

Pos logit contributions
 exclaimed+0.030
 replied+0.029
 said+0.028
s+0.027
�+0.026
Neg logit contributions
ocol-0.037
 loser-0.031
uff-0.031
 bruised-0.030
aught-0.030
Token.
Token ID13
Feature activation+1.56e-02

Pos logit contributions
 exclaimed+0.020
 replied+0.020
�+0.019
 said+0.018
s+0.018
Neg logit contributions
ocol-0.046
 loser-0.041
uff-0.041
aught-0.040
ubb-0.040
Token\n
Token ID198
Feature activation-1.25e-03

Pos logit contributions
ocol+0.037
uff+0.034
 loser+0.033
ithering+0.033
aught+0.033
Neg logit contributions
 exclaimed-0.019
 replied-0.018
s-0.017
�-0.017
 said-0.017
Token\n
Token ID198
Feature activation+4.25e-03

Pos logit contributions
cakes+0.002
 replied+0.002
 chat+0.002
 exclaimed+0.001
�+0.001
Neg logit contributions
uzz-0.002
uff-0.002
aught-0.002
ithering-0.002
ocol-0.002
TokenMom
Token ID29252
Feature activation-2.80e-02

Pos logit contributions
ocol+0.010
 loser+0.009
uff+0.009
ithering+0.008
ubb+0.008
Neg logit contributions
 exclaimed-0.006
 replied-0.006
 said-0.006
s-0.006
�-0.006
Token thought
Token ID1807
Feature activation-7.80e-02

Pos logit contributions
�+0.026
 exclaimed+0.025
cakes+0.023
s+0.023
 replied+0.022
Neg logit contributions
ocol-0.072
uff-0.068
 bruised-0.067
uzz-0.065
aught-0.064
Token for
Token ID329
Feature activation+4.22e-03

Pos logit contributions
 exclaimed+0.121
�+0.115
 replied+0.113
cakes+0.104
s+0.103
Neg logit contributions
ocol-0.178
 bruised-0.145
uff-0.142
arter-0.138
ubb-0.138
Token and
Token ID290
Feature activation-2.62e-02

Pos logit contributions
ocol+0.045
uff+0.040
 loser+0.040
ubb+0.040
ithering+0.039
Neg logit contributions
 exclaimed-0.017
 replied-0.017
 said-0.016
�-0.015
s-0.015
Token cut
Token ID2005
Feature activation-1.62e-02

Pos logit contributions
 exclaimed+0.028
 replied+0.027
 said+0.025
s+0.025
�+0.024
Neg logit contributions
ocol-0.071
uff-0.062
 loser-0.062
ubb-0.062
 bruised-0.062
Token the
Token ID262
Feature activation-3.74e-02

Pos logit contributions
 exclaimed+0.011
 said+0.011
 replied+0.011
s+0.011
 +0.009
Neg logit contributions
ocol-0.051
 loser-0.047
uff-0.046
arter-0.046
ubb-0.045
Token lion
Token ID18744
Feature activation+1.26e-02

Pos logit contributions
 exclaimed+0.060
 replied+0.060
 said+0.059
�+0.055
s+0.055
Neg logit contributions
ocol-0.085
uff-0.074
uzz-0.073
ubb-0.072
aught-0.071
Token's
Token ID338
Feature activation+1.75e-02

Pos logit contributions
ocol+0.037
uff+0.033
 loser+0.033
ubb+0.033
ithering+0.032
Neg logit contributions
 exclaimed-0.012
 replied-0.011
 said-0.011
s-0.010
�-0.009
Token dull
Token ID19222
Feature activation+2.62e-03

Pos logit contributions
ocol+0.038
uff+0.034
uzz+0.033
ateful+0.032
ubb+0.032
Neg logit contributions
 said-0.029
 exclaimed-0.029
 replied-0.029
s-0.027
�-0.026
Token paw
Token ID41827
Feature activation+1.46e-02

Pos logit contributions
ocol+0.007
uff+0.006
uzz+0.006
ithering+0.006
ubb+0.006
Neg logit contributions
 exclaimed-0.003
 said-0.003
 replied-0.003
s-0.003
�-0.003
Token.
Token ID13
Feature activation+4.24e-03

Pos logit contributions
ocol+0.044
 loser+0.039
uff+0.039
ateful+0.039
ubb+0.039
Neg logit contributions
 exclaimed-0.013
s-0.013
 said-0.013
 replied-0.012
 they-0.010
Token The
Token ID383
Feature activation-2.39e-02

Pos logit contributions
ocol+0.011
 loser+0.010
uff+0.010
ubb+0.010
ithering+0.010
Neg logit contributions
 exclaimed-0.005
 replied-0.005
 said-0.005
s-0.005
�-0.004
Token lion
Token ID18744
Feature activation+1.24e-02

Pos logit contributions
 replied+0.036
 said+0.035
 exclaimed+0.034
�+0.031
s+0.031
Neg logit contributions
ocol-0.059
uff-0.051
uzz-0.051
aught-0.050
ubb-0.050
Token roared
Token ID45615
Feature activation-2.03e-02

Pos logit contributions
ocol+0.036
 loser+0.032
uff+0.032
ubb+0.032
 bruised+0.032
Neg logit contributions
 exclaimed-0.010
 replied-0.010
�-0.009
 said-0.009
cakes-0.009
Token together
Token ID1978
Feature activation-8.54e-02

Pos logit contributions
 exclaimed+0.083
 replied+0.080
 said+0.072
�+0.071
s+0.069
Neg logit contributions
ocol-0.152
uff-0.137
 loser-0.135
aught-0.134
ithering-0.132
Token.
Token ID13
Feature activation-2.06e-03

Pos logit contributions
 exclaimed+0.084
 replied+0.082
 said+0.079
s+0.073
�+0.070
Neg logit contributions
ocol-0.244
 loser-0.216
uff-0.216
ubb-0.213
ithering-0.211
Token They
Token ID1119
Feature activation-4.54e-02

Pos logit contributions
 exclaimed+0.002
 replied+0.002
 said+0.002
cakes+0.002
s+0.002
Neg logit contributions
ocol-0.006
uff-0.005
 loser-0.005
aught-0.005
uzz-0.005
Token drank
Token ID24070
Feature activation-5.40e-02

Pos logit contributions
 exclaimed+0.048
 replied+0.043
cakes+0.040
s+0.038
�+0.038
Neg logit contributions
ocol-0.123
ubb-0.108
 bruised-0.108
 loser-0.107
uzz-0.106
Token water
Token ID1660
Feature activation-2.60e-02

Pos logit contributions
 exclaimed+0.076
 replied+0.074
 said+0.065
�+0.065
cakes+0.060
Neg logit contributions
ocol-0.124
 bruised-0.109
ubb-0.104
uff-0.102
 loser-0.099
Token and
Token ID290
Feature activation-1.91e-02

Pos logit contributions
 exclaimed+0.036
 replied+0.033
�+0.031
 said+0.030
 +0.028
Neg logit contributions
ocol-0.058
uff-0.054
 bruised-0.052
ubb-0.051
uzz-0.051
Token ate
Token ID15063
Feature activation-4.55e-02

Pos logit contributions
 exclaimed+0.024
 replied+0.024
cakes+0.023
s+0.023
lic+0.021
Neg logit contributions
ocol-0.044
 bruised-0.041
ubb-0.038
ities-0.037
asured-0.036
Token healthy
Token ID5448
Feature activation-8.36e-03

Pos logit contributions
 exclaimed+0.059
 replied+0.058
 said+0.053
cakes+0.049
�+0.049
Neg logit contributions
ocol-0.115
uff-0.098
 bruised-0.097
ubb-0.096
 loser-0.094
Token meat
Token ID6174
Feature activation+3.52e-02

Pos logit contributions
 replied+0.014
 exclaimed+0.014
 said+0.013
cakes+0.012
�+0.012
Neg logit contributions
ocol-0.018
ities-0.015
ubb-0.014
 bruised-0.014
 loser-0.014
Token.
Token ID13
Feature activation-3.17e-03

Pos logit contributions
ocol+0.086
uff+0.077
 loser+0.075
ubb+0.074
uzz+0.074
Neg logit contributions
 replied-0.046
 exclaimed-0.046
�-0.042
 said-0.042
 -0.038
Token␣
Token ID220
Feature activation-6.09e-02

Pos logit contributions
 exclaimed+0.003
 replied+0.003
 said+0.003
cakes+0.003
s+0.003
Neg logit contributions
ocol-0.009
uff-0.008
 loser-0.008
aught-0.008
uzz-0.008
Token Tim
Token ID5045
Feature activation-1.86e-02

Pos logit contributions
 exclaimed+0.184
 replied+0.182
 said+0.180
s+0.170
�+0.166
Neg logit contributions
ocol-0.224
 loser-0.189
ubb-0.188
uff-0.187
ithering-0.183
Token says
Token ID1139
Feature activation-4.44e-02

Pos logit contributions
 exclaimed+0.019
cakes+0.018
�+0.017
 they+0.017
ves+0.017
Neg logit contributions
ocol-0.044
uff-0.042
uzz-0.040
asured-0.040
 loser-0.040
Token.
Token ID13
Feature activation+2.31e-02

Pos logit contributions
 exclaimed+0.059
 replied+0.058
�+0.055
 said+0.054
cakes+0.052
Neg logit contributions
ocol-0.106
uff-0.094
uzz-0.093
 loser-0.093
 bruised-0.092
Token\n
Token ID198
Feature activation+1.01e-02

Pos logit contributions
ocol+0.064
uff+0.058
 loser+0.058
uzz+0.058
ithering+0.057
Neg logit contributions
 exclaimed-0.022
 replied-0.021
 said-0.019
�-0.019
cakes-0.018
Token\n
Token ID198
Feature activation+1.57e-02

Pos logit contributions
ocol+0.025
uff+0.024
uzz+0.024
aught+0.023
ithering+0.023
Neg logit contributions
 replied-0.011
cakes-0.011
 exclaimed-0.011
 said-0.010
�-0.009
TokenThey
Token ID2990
Feature activation-5.39e-02

Pos logit contributions
ocol+0.048
ubb+0.042
 loser+0.042
uff+0.042
 bruised+0.041
Neg logit contributions
 exclaimed-0.014
 replied-0.013
 said-0.013
s-0.012
�-0.011
Token open
Token ID1280
Feature activation+1.46e-02

Pos logit contributions
 exclaimed+0.073
 replied+0.071
 said+0.068
s+0.065
�+0.063
Neg logit contributions
ocol-0.135
 loser-0.117
uff-0.117
ubb-0.115
ithering-0.114
Token the
Token ID262
Feature activation-2.14e-02

Pos logit contributions
ocol+0.043
 loser+0.038
uff+0.038
ubb+0.038
ithering+0.037
Neg logit contributions
 exclaimed-0.013
 replied-0.013
 said-0.012
s-0.011
�-0.011
Token box
Token ID3091
Feature activation-1.12e-02

Pos logit contributions
 exclaimed+0.030
 replied+0.030
 said+0.030
s+0.029
�+0.028
Neg logit contributions
ocol-0.051
uff-0.045
 loser-0.044
ubb-0.044
ithering-0.043
Token and
Token ID290
Feature activation-1.62e-02

Pos logit contributions
 exclaimed+0.015
�+0.015
 replied+0.014
 said+0.012
cakes+0.012
Neg logit contributions
ocol-0.025
uff-0.024
uzz-0.023
aught-0.023
ithering-0.023
Token take
Token ID1011
Feature activation-1.22e-02

Pos logit contributions
 exclaimed+0.020
 replied+0.019
 said+0.018
s+0.018
�+0.017
Neg logit contributions
ocol-0.042
 loser-0.037
uff-0.037
ubb-0.036
 bruised-0.036
Token was
Token ID373
Feature activation+1.99e-02

Pos logit contributions
ocol+0.057
 bruised+0.054
uff+0.052
 loser+0.052
ateful+0.052
Neg logit contributions
�-0.021
cakes-0.021
 exclaimed-0.020
 replied-0.018
s-0.018
Token very
Token ID845
Feature activation+7.49e-03

Pos logit contributions
ocol+0.049
ubb+0.042
uff+0.042
 loser+0.042
eter+0.041
Neg logit contributions
 said-0.030
 exclaimed-0.029
 replied-0.028
 -0.025
 looked-0.024
Token angry
Token ID7954
Feature activation-2.08e-02

Pos logit contributions
ocol+0.017
ities+0.014
uff+0.014
 loser+0.014
ubb+0.014
Neg logit contributions
 said-0.013
 exclaimed-0.013
 replied-0.013
s-0.012
 they-0.011
Token and
Token ID290
Feature activation+8.50e-03

Pos logit contributions
 said+0.015
 exclaimed+0.013
 replied+0.013
s+0.013
 they+0.011
Neg logit contributions
ocol-0.069
ubb-0.061
 loser-0.060
ithering-0.060
uff-0.059
Token he
Token ID339
Feature activation+4.03e-03

Pos logit contributions
ocol+0.015
 bruised+0.013
 loser+0.013
uff+0.013
ubb+0.013
Neg logit contributions
 exclaimed-0.016
 replied-0.016
�-0.015
cakes-0.015
s-0.015
Token shouted
Token ID17293
Feature activation-3.89e-02

Pos logit contributions
ocol+0.009
 bruised+0.009
 loser+0.009
uff+0.009
aught+0.008
Neg logit contributions
cakes-0.005
�-0.004
 replied-0.004
 exclaimed-0.004
s-0.004
Token at
Token ID379
Feature activation+4.70e-02

Pos logit contributions
 exclaimed+0.061
 said+0.060
 replied+0.060
s+0.060
 they+0.054
Neg logit contributions
ocol-0.090
ubb-0.077
 loser-0.076
uff-0.076
ithering-0.074
Token Tom
Token ID4186
Feature activation-2.31e-02

Pos logit contributions
ocol+0.117
ubb+0.102
 loser+0.102
uff+0.101
ithering+0.101
Neg logit contributions
 replied-0.063
 said-0.063
 exclaimed-0.062
s-0.057
�-0.056
Token.
Token ID13
Feature activation+2.55e-02

Pos logit contributions
 exclaimed+0.019
 said+0.019
 replied+0.019
s+0.018
 +0.015
Neg logit contributions
ocol-0.072
ubb-0.063
 loser-0.063
uff-0.062
ithering-0.062
Token He
Token ID679
Feature activation+3.33e-02

Pos logit contributions
ocol+0.064
ubb+0.055
 loser+0.054
uff+0.053
ithering+0.053
Neg logit contributions
 exclaimed-0.037
 replied-0.037
 said-0.037
s-0.033
 -0.033
Token said
Token ID531
Feature activation-3.80e-02

Pos logit contributions
ocol+0.087
 bruised+0.083
uff+0.080
 loser+0.080
ateful+0.078
Neg logit contributions
cakes-0.028
�-0.028
 exclaimed-0.026
s-0.025
 replied-0.024
Token than
Token ID621
Feature activation+1.33e-01

Pos logit contributions
 said+0.028
 exclaimed+0.027
s+0.026
 replied+0.026
�+0.024
Neg logit contributions
ocol-0.096
ubb-0.085
ithering-0.083
uff-0.083
arter-0.083
Token usual
Token ID6678
Feature activation+1.90e-02

Pos logit contributions
ocol+0.361
uff+0.325
ithering+0.323
ubb+0.317
 loser+0.316
Neg logit contributions
 said-0.151
 exclaimed-0.145
 replied-0.141
 they-0.138
s-0.132
Token for
Token ID329
Feature activation+2.04e-02

Pos logit contributions
ocol+0.060
uff+0.054
ubb+0.053
uzz+0.052
ithering+0.052
Neg logit contributions
 said-0.015
 exclaimed-0.014
 replied-0.013
�-0.013
s-0.013
Token fun
Token ID1257
Feature activation-1.40e-02

Pos logit contributions
ocol+0.056
 loser+0.050
ubb+0.050
uff+0.050
ithering+0.049
Neg logit contributions
 said-0.022
 exclaimed-0.021
 they-0.020
 replied-0.020
s-0.019
Token,
Token ID11
Feature activation+2.27e-02

Pos logit contributions
 exclaimed+0.013
 said+0.012
 replied+0.012
s+0.012
�+0.011
Neg logit contributions
ocol-0.041
ubb-0.036
uff-0.036
 loser-0.036
ithering-0.036
Token they
Token ID484
Feature activation-7.62e-02

Pos logit contributions
ocol+0.071
uff+0.063
ithering+0.063
ubb+0.063
 loser+0.062
Neg logit contributions
 said-0.019
 they-0.018
 exclaimed-0.017
 replied-0.017
s-0.015
Token speed
Token ID2866
Feature activation+2.37e-02

Pos logit contributions
 exclaimed+0.097
 replied+0.095
 said+0.090
s+0.086
�+0.085
Neg logit contributions
ocol-0.192
 loser-0.168
uff-0.167
ubb-0.166
aught-0.164
Token.
Token ID13
Feature activation+1.08e-02

Pos logit contributions
ocol+0.063
 loser+0.055
uff+0.054
ubb+0.054
ithering+0.053
Neg logit contributions
 exclaimed-0.029
 replied-0.028
 said-0.028
s-0.026
�-0.025
Token\n
Token ID198
Feature activation+2.16e-03

Pos logit contributions
ocol+0.029
 loser+0.026
ubb+0.026
uff+0.026
ithering+0.025
Neg logit contributions
 exclaimed-0.013
 replied-0.012
 said-0.012
s-0.011
�-0.011
Token\n
Token ID198
Feature activation+7.96e-03

Pos logit contributions
ocol+0.006
uzz+0.005
uff+0.005
aught+0.005
ithering+0.005
Neg logit contributions
 replied-0.002
 exclaimed-0.002
 said-0.002
cakes-0.002
�-0.002
TokenPaul
Token ID12041
Feature activation+1.95e-02

Pos logit contributions
ocol+0.020
ubb+0.017
 loser+0.017
uff+0.017
 bruised+0.017
Neg logit contributions
 exclaimed-0.011
 replied-0.011
 said-0.011
s-0.010
�-0.010
Token\n
Token ID198
Feature activation-8.35e-03

Pos logit contributions
ocol+0.023
 loser+0.021
uff+0.021
ubb+0.020
ithering+0.020
Neg logit contributions
 exclaimed-0.007
 replied-0.006
 said-0.006
s-0.006
 -0.006
Token\n
Token ID198
Feature activation-1.92e-03

Pos logit contributions
 replied+0.008
 exclaimed+0.008
cakes+0.008
 said+0.008
�+0.007
Neg logit contributions
ocol-0.022
uff-0.020
ithering-0.020
aught-0.020
uzz-0.020
TokenWhen
Token ID2215
Feature activation+1.78e-02

Pos logit contributions
 exclaimed+0.003
 replied+0.002
 said+0.002
s+0.002
�+0.002
Neg logit contributions
ocol-0.005
uff-0.004
 loser-0.004
ubb-0.004
ithering-0.004
Token she
Token ID673
Feature activation-2.88e-02

Pos logit contributions
ocol+0.046
ubb+0.041
uff+0.040
 loser+0.040
 bruised+0.039
Neg logit contributions
 exclaimed-0.022
 replied-0.021
 said-0.020
cakes-0.019
�-0.019
Token was
Token ID373
Feature activation+2.34e-03

Pos logit contributions
 exclaimed+0.025
 replied+0.023
cakes+0.023
�+0.022
s+0.022
Neg logit contributions
ocol-0.078
 loser-0.069
ubb-0.069
 bruised-0.069
uff-0.068
Token done
Token ID1760
Feature activation-1.32e-01

Pos logit contributions
ocol+0.007
 loser+0.006
uff+0.006
ithering+0.006
 bruised+0.006
Neg logit contributions
 exclaimed-0.002
 replied-0.002
s-0.002
 said-0.002
�-0.002
Token,
Token ID11
Feature activation+3.05e-03

Pos logit contributions
 exclaimed+0.125
 replied+0.122
 said+0.116
s+0.107
�+0.104
Neg logit contributions
ocol-0.383
 loser-0.339
uff-0.339
ubb-0.335
ithering-0.332
Token she
Token ID673
Feature activation-2.43e-02

Pos logit contributions
ocol+0.008
ubb+0.007
uff+0.007
 loser+0.007
ithering+0.007
Neg logit contributions
 exclaimed-0.004
 replied-0.004
 said-0.003
s-0.003
cakes-0.003
Token felt
Token ID2936
Feature activation-5.61e-02

Pos logit contributions
cakes+0.021
 exclaimed+0.021
�+0.020
 replied+0.020
s+0.020
Neg logit contributions
ocol-0.061
 bruised-0.057
asured-0.056
aught-0.056
uff-0.056
Token happy
Token ID3772
Feature activation-2.90e-02

Pos logit contributions
 exclaimed+0.076
 replied+0.076
 said+0.072
s+0.071
�+0.068
Neg logit contributions
ocol-0.136
ubb-0.118
 loser-0.118
 bruised-0.117
uff-0.116
Token and
Token ID290
Feature activation-5.97e-03

Pos logit contributions
 exclaimed+0.030
 replied+0.029
 said+0.028
s+0.026
�+0.025
Neg logit contributions
ocol-0.082
uff-0.072
 loser-0.072
ubb-0.071
ithering-0.071
Token She
Token ID1375
Feature activation+1.80e-02

Pos logit contributions
 exclaimed+0.006
 replied+0.006
 said+0.006
s+0.005
�+0.005
Neg logit contributions
ocol-0.011
uff-0.009
 loser-0.009
ubb-0.009
ithering-0.009
Token cooked
Token ID15847
Feature activation-6.40e-02

Pos logit contributions
ocol+0.038
asured+0.034
uff+0.033
 bruised+0.033
aught+0.033
Neg logit contributions
 replied-0.024
 exclaimed-0.024
�-0.024
cakes-0.023
 -0.023
Token the
Token ID262
Feature activation-5.07e-02

Pos logit contributions
 replied+0.084
 exclaimed+0.084
 said+0.074
�+0.073
cakes+0.068
Neg logit contributions
ocol-0.159
uff-0.136
ubb-0.130
ithering-0.127
udge-0.126
Token mixture
Token ID11710
Feature activation-5.82e-02

Pos logit contributions
 exclaimed+0.078
 replied+0.077
 said+0.070
s+0.067
�+0.067
Neg logit contributions
ocol-0.118
ubb-0.099
uff-0.098
 loser-0.098
 bruised-0.096
Token until
Token ID1566
Feature activation-3.41e-02

Pos logit contributions
 exclaimed+0.087
 replied+0.087
�+0.080
 said+0.077
 +0.070
Neg logit contributions
ocol-0.122
uff-0.111
ubb-0.108
aught-0.105
udge-0.103
Token it
Token ID340
Feature activation-5.32e-02

Pos logit contributions
 exclaimed+0.043
 replied+0.041
s+0.037
cakes+0.035
 said+0.035
Neg logit contributions
ocol-0.085
ubb-0.074
 bruised-0.073
 loser-0.072
ateful-0.070
Token smelled
Token ID37609
Feature activation-4.48e-02

Pos logit contributions
 exclaimed+0.058
 replied+0.056
�+0.051
 +0.046
s+0.044
Neg logit contributions
ocol-0.124
ubb-0.116
 bruised-0.115
 loser-0.113
 unsolved-0.112
Token y
Token ID331
Feature activation+2.73e-02

Pos logit contributions
 exclaimed+0.069
 replied+0.069
 said+0.063
cakes+0.062
s+0.060
Neg logit contributions
ocol-0.100
ubb-0.084
 bruised-0.082
 loser-0.079
uff-0.078
Tokenummy
Token ID13513
Feature activation+7.20e-03

Pos logit contributions
ocol+0.051
uff+0.042
 loser+0.041
ubb+0.041
ithering+0.040
Neg logit contributions
 exclaimed-0.054
 replied-0.054
 said-0.052
�-0.050
s-0.050
Token.
Token ID13
Feature activation-3.95e-03

Pos logit contributions
ocol+0.017
ubb+0.014
 bruised+0.014
ities+0.014
uff+0.014
Neg logit contributions
 replied-0.010
 exclaimed-0.010
 said-0.009
 -0.009
�-0.008
Token\n
Token ID198
Feature activation-9.94e-03

Pos logit contributions
 exclaimed+0.006
 replied+0.006
 said+0.005
s+0.005
�+0.005
Neg logit contributions
ocol-0.009
uff-0.008
 loser-0.007
aught-0.007
uzz-0.007