TOP ACTIVATIONS
MAX = 0.218

nodded . She liked mac aron i and cheese . She
replied . Sh aron helped her mom pour the
on end . <|endoftext|> Sh aron was walking in the park
a delicious bowl of mac aron i and cheese . And
their favorite food : mac aron i and cheese .
Soon , the mac aron i and cheese was ready
" I still love mac aron i and cheese ... especially
Mom and they loved mac aron i and cheese .
table . Sh aron smiled and took a big
'm making a delicious mac aron i and cheese ," M
an ig loo ! Er in had never seen an
smiled . She loved mac aron i and cheese !
dinner . It was mac aron i and cheese with a
about we make some mac aron i and cheese for dinner
a lot . Er in was glad that she
broccoli and had her mac aron i and cheese too .
ig loo . Er in felt so happy and
perfect battery ! Er nie thanked the creature and
broccoli . She wanted mac aron i and cheese instead .
the forest !" Er nie nodded and thanked the

MIN ACTIVATIONS
MIN = -0.237

snake sees Tom and his ses . It opens its mouth
like that . It his ses and fl aps its wings
her hair in a pony tail so she could look pretty
is angry . It his ses . It scratches . It
, Emma decided a pony tail wasn 't so important after
ked tongue . It his ses and swim s near the
nose . The cat his ses and scratches his face .
it . The cat his ses and scratches him . Ben
scare it . It his ses and shows its teeth .
slide . The cat his ses at him . Tom is
The snake his ses . L ila and Tom
a cat . She his ses and scratches at the dog
angry snake . It his ses and snaps at Ben .
of the web and his ses at Ben . Ben drops
is afraid . It his ses and scratches Tom 's hand
claws and bites and his ses at the bars . He
with glasses and a pony tail . She is reading a
and angry . It his ses and fl aps its wings
is afraid . It his ses and scratches Tom 's hand
of Tom . It his ses and scratches . It hurts

INTERVAL 0.104 - 0.218
CONTAINS 0.938%

. <|endoftext|> Once upon a time , there was a little
across a big , shiny mirror . The bear looked
. <|endoftext|> Once upon a time , there was a little
. <|endoftext|> Once upon a time , a boy and his
. It was her favorite color and it made her feel

INTERVAL -0.009 - 0.104
CONTAINS 62.063%

for her brother as a surprise . The little
love you too . I will try school ." She gave
the forehead . He hugged his mom my back and felt
What happened , Tom ?" Mom asked . "
, but it showed that even though it was competitive ,

INTERVAL -0.123 - -0.009
CONTAINS 36.836%

delicate and very pretty . She was excited to go outside
, " Santa , is that you ? We are so
One day , they see a real train outside their
had a hole in the middle . He wondered what it
her mom and dad . She ran away and found a

INTERVAL -0.237 - -0.123
CONTAINS 0.164%

was curious too . Ab ig ail offered to teach him
S ara nodded and stro ked the velvet dress gently .
She me owed and me owed to tell her family she
scratch his ear with his paw . The fle a jumps
now reach the spaghetti . Kitty took a big bite and
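
Note on the intervals: the four INTERVAL boundaries listed above look like equal-width bins between MIN and MAX. That is an inference from the printed numbers, not something the dump states. The sketch below is only a sanity check under that assumption: it recomputes the edges from the rounded MIN and MAX and compares them with the listed values. The function name `interval_edges` is hypothetical and chosen for illustration.

```python
def interval_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Values copied from the dump above (already rounded to three decimals).
edges = interval_edges(-0.237, 0.218, 4)
listed = [-0.237, -0.123, -0.009, 0.104, 0.218]
shares = [0.938, 62.063, 36.836, 0.164]  # the CONTAINS percentages

# Largest difference between a recomputed edge and the listed edge.
max_gap = max(abs(e - l) for e, l in zip(edges, listed))

# The four shares should add up to roughly 100%.
total = sum(shares)
```

Because MIN and MAX are themselves rounded in the dump, the recomputed edges can only be expected to agree with the listed ones to about the third decimal place.
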
Token nodded
Token ID14464
Feature activation+2.82e-02

Pos logit contributions
bo+0.011
y+0.010
en+0.009
by+0.009
aron+0.009
Neg logit contributions
ishly-0.016
stack-0.016
anger-0.016
ses-0.016
sed-0.015
Token.
Token ID13
Feature activation+1.45e-02

Pos logit contributions
bo+0.042
ud+0.040
aron+0.038
by+0.037
en+0.037
Neg logit contributions
 yes-0.047
ishly-0.046
ble-0.044
stack-0.044
 hi-0.043
Token She
Token ID1375
Feature activation-1.26e-02

Pos logit contributions
bo+0.026
y+0.026
en+0.025
aron+0.025
by+0.024
Neg logit contributions
anger-0.024
stack-0.023
 plunge-0.022
ses-0.021
ittens-0.020
Token liked
Token ID8288
Feature activation+2.17e-02

Pos logit contributions
stack+0.023
anger+0.023
sed+0.023
ses+0.023
ishly+0.022
Neg logit contributions
en-0.018
bo-0.017
y-0.017
by-0.015
aron-0.015
Token mac
Token ID8352
Feature activation+9.39e-02

Pos logit contributions
bo+0.038
aron+0.037
y+0.035
en+0.033
by+0.032
Neg logit contributions
 plunge-0.036
anger-0.036
stack-0.036
ses-0.034
ittens-0.033
Tokenaron
Token ID8045
Feature activation+2.18e-01

Pos logit contributions
bo+0.083
Em+0.082
by+0.081
en+0.077
Kat+0.077
Neg logit contributions
anger-0.218
ses-0.211
stack-0.210
ishly-0.201
sed-0.195
Tokeni
Token ID72
Feature activation+1.51e-02

Pos logit contributions
bo+0.304
aron+0.303
y+0.288
en+0.284
by+0.266
Neg logit contributions
stack-0.415
anger-0.410
 plunge-0.400
ittens-0.389
ishly-0.387
Token and
Token ID290
Feature activation+6.30e-03

Pos logit contributions
en+0.016
bo+0.016
Em+0.016
aron+0.016
Kat+0.016
Neg logit contributions
stack-0.032
anger-0.032
ishly-0.031
 plunge-0.031
ittens-0.030
Token cheese
Token ID9891
Feature activation+1.05e-02

Pos logit contributions
en+0.011
y+0.011
aron+0.010
bo+0.010
Kat+0.010
Neg logit contributions
ses-0.010
stack-0.010
sed-0.010
 plunge-0.009
aroo-0.009
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
bo+0.013
en+0.012
by+0.011
y+0.011
aron+0.011
Neg logit contributions
stack-0.023
anger-0.023
 plunge-0.021
ses-0.021
ishly-0.020
Token She
Token ID1375
Feature activation-1.41e-02

Pos logit contributions
bo+0.023
y+0.023
en+0.022
aron+0.022
by+0.021
Neg logit contributions
anger-0.021
stack-0.020
 plunge-0.019
ses-0.018
ittens-0.018
Token replied
Token ID8712
Feature activation-6.13e-02

Pos logit contributions
by+0.008
bo+0.008
en+0.008
aron+0.008
y+0.008
Neg logit contributions
ishly-0.007
stack-0.007
anger-0.007
 plunge-0.007
ittens-0.007
Token.
Token ID13
Feature activation+5.05e-03

Pos logit contributions
anger+0.115
ishly+0.115
stack+0.114
 plunge+0.107
ittens+0.104
Neg logit contributions
bo-0.086
en-0.080
by-0.080
y-0.078
 gre-0.077
Token\n
Token ID198
Feature activation+1.59e-02

Pos logit contributions
bo+0.009
y+0.008
en+0.008
by+0.008
aron+0.008
Neg logit contributions
stack-0.009
anger-0.009
-0.008
 plunge-0.008
ittens-0.008
Token\n
Token ID198
Feature activation+1.69e-02

Pos logit contributions
y+0.021
bo+0.021
en+0.021
aron+0.020
by+0.019
Neg logit contributions
anger-0.033
stack-0.032
 plunge-0.031
ittens-0.031
aroo-0.030
TokenSh
Token ID2484
Feature activation+3.91e-02

Pos logit contributions
bo+0.025
en+0.025
y+0.025
aron+0.024
by+0.023
Neg logit contributions
anger-0.031
stack-0.031
 plunge-0.030
ittens-0.030
aroo-0.029
Tokenaron
Token ID8045
Feature activation+2.17e-01

Pos logit contributions
bo+0.068
by+0.066
 gre+0.064
 cared+0.064
 shared+0.062
Neg logit contributions
anger-0.066
ittens-0.058
aroo-0.058
stack-0.057
oot-0.056
Token helped
Token ID4193
Feature activation-5.21e-03

Pos logit contributions
 gre+0.265
aron+0.262
bo+0.258
 late+0.253
able+0.242
Neg logit contributions
stack-0.446
aroo-0.422
anger-0.415
ishly-0.415
ittens-0.406
Token her
Token ID607
Feature activation-2.59e-02

Pos logit contributions
stack+0.008
 plunge+0.008
anger+0.008
ishly+0.007
ittens+0.007
Neg logit contributions
bo-0.009
y-0.009
 cared-0.009
aron-0.008
en-0.008
Token mom
Token ID1995
Feature activation+9.89e-03

Pos logit contributions
stack+0.032
anger+0.032
 plunge+0.031
ittens+0.027
ishly+0.027
Neg logit contributions
bo-0.055
aron-0.055
y-0.053
en-0.053
ky-0.050
Token pour
Token ID12797
Feature activation+4.54e-02

Pos logit contributions
aron+0.014
bo+0.014
en+0.013
y+0.013
Em+0.013
Neg logit contributions
anger-0.017
stack-0.017
 plunge-0.017
ishly-0.016
ittens-0.016
Token the
Token ID262
Feature activation-1.10e-03

Pos logit contributions
aron+0.089
bo+0.087
Em+0.085
 cared+0.084
y+0.082
Neg logit contributions
 plunge-0.066
anger-0.062
stack-0.060
 umbrella-0.058
 raft-0.057
Token on
Token ID319
Feature activation-1.64e-02

Pos logit contributions
bo+0.017
aron+0.016
by+0.015
y+0.015
en+0.015
Neg logit contributions
anger-0.030
stack-0.030
 plunge-0.027
ittens-0.027
ishly-0.026
Token end
Token ID886
Feature activation+2.45e-03

Pos logit contributions
anger+0.034
stack+0.033
 plunge+0.032
+0.031
ittens+0.030
Neg logit contributions
bo-0.022
aron-0.022
y-0.021
by-0.019
 cared-0.019
Token.
Token ID13
Feature activation+2.30e-03

Pos logit contributions
bo+0.003
aron+0.003
 cared+0.003
y+0.003
en+0.003
Neg logit contributions
anger-0.005
stack-0.005
 plunge-0.005
ittens-0.004
aroo-0.004
Token<|endoftext|>
Token ID50256
Feature activation-1.69e-02

Pos logit contributions
bo+0.003
y+0.003
en+0.003
aron+0.003
by+0.003
Neg logit contributions
stack-0.005
anger-0.004
 plunge-0.004
-0.004
ittens-0.004
TokenSh
Token ID2484
Feature activation+3.79e-02

Pos logit contributions
anger+0.032
stack+0.031
 plunge+0.029
+0.029
aroo+0.027
Neg logit contributions
aron-0.025
en-0.025
bo-0.025
y-0.024
by-0.024
Tokenaron
Token ID8045
Feature activation+2.16e-01

Pos logit contributions
bo+0.063
 gre+0.059
 cared+0.058
 added+0.056
aron+0.056
Neg logit contributions
anger-0.066
stack-0.062
ittens-0.060
aroo-0.057
 plunge-0.056
Token was
Token ID373
Feature activation+1.98e-02

Pos logit contributions
aron+0.327
bo+0.302
y+0.292
en+0.281
 gre+0.274
Neg logit contributions
stack-0.417
anger-0.403
 plunge-0.393
aroo-0.374
ishly-0.371
Token walking
Token ID6155
Feature activation+1.55e-02

Pos logit contributions
bo+0.034
aron+0.033
en+0.033
by+0.033
y+0.033
Neg logit contributions
stack-0.031
 plunge-0.030
anger-0.029
ittens-0.028
ses-0.028
Token in
Token ID287
Feature activation+2.26e-02

Pos logit contributions
bo+0.023
aron+0.023
en+0.021
y+0.021
 cared+0.021
Neg logit contributions
anger-0.030
stack-0.030
 plunge-0.029
aroo-0.027
ute-0.026
Token the
Token ID262
Feature activation+1.04e-03

Pos logit contributions
y+0.027
bo+0.027
aron+0.026
en+0.026
by+0.023
Neg logit contributions
anger-0.051
stack-0.049
 plunge-0.048
ses-0.046
ittens-0.046
Token park
Token ID3952
Feature activation-1.30e-02

Pos logit contributions
aron+0.002
y+0.002
bo+0.002
en+0.002
by+0.002
Neg logit contributions
stack-0.002
anger-0.002
 plunge-0.002
ses-0.001
 raft-0.001
Token a
Token ID257
Feature activation-2.13e-02

Pos logit contributions
anger+0.018
stack+0.018
 plunge+0.018
ittens+0.016
aroo+0.015
Neg logit contributions
bo-0.025
aron-0.025
y-0.024
en-0.024
by-0.022
Token delicious
Token ID12625
Feature activation-1.17e-02

Pos logit contributions
 plunge+0.034
anger+0.033
stack+0.033
aroo+0.030
 raft+0.030
Neg logit contributions
en-0.040
bo-0.039
aron-0.038
y-0.037
by-0.032
Token bowl
Token ID9396
Feature activation-1.37e-02

Pos logit contributions
anger+0.015
 plunge+0.015
stack+0.015
 raft+0.014
+0.013
Neg logit contributions
bo-0.025
aron-0.023
en-0.023
y-0.023
 cared-0.021
Token of
Token ID286
Feature activation-1.63e-02

Pos logit contributions
stack+0.026
anger+0.026
 plunge+0.023
aroo+0.022
ishly+0.022
Neg logit contributions
bo-0.022
aron-0.021
en-0.020
by-0.020
y-0.019
Token mac
Token ID8352
Feature activation+8.93e-02

Pos logit contributions
 plunge+0.024
stack+0.023
anger+0.021
 raft+0.020
ittens+0.020
Neg logit contributions
bo-0.034
y-0.032
aron-0.032
en-0.031
by-0.029
Tokenaron
Token ID8045
Feature activation+2.14e-01

Pos logit contributions
Em+0.080
Kat+0.076
bo+0.075
by+0.075
 cared+0.074
Neg logit contributions
anger-0.216
stack-0.216
ses-0.193
ittens-0.190
comb-0.189
Tokeni
Token ID72
Feature activation+1.28e-02

Pos logit contributions
aron+0.353
Em+0.340
bo+0.329
Kat+0.316
 cared+0.315
Neg logit contributions
stack-0.358
 plunge-0.337
anger-0.331
ishly-0.331
ittens-0.326
Token and
Token ID290
Feature activation-4.18e-03

Pos logit contributions
en+0.016
Em+0.016
bo+0.016
Kat+0.015
aron+0.015
Neg logit contributions
stack-0.026
anger-0.025
ittens-0.024
ses-0.023
 plunge-0.023
Token cheese
Token ID9891
Feature activation+1.07e-02

Pos logit contributions
stack+0.007
 plunge+0.006
ses+0.006
sed+0.006
aroo+0.006
Neg logit contributions
en-0.008
y-0.007
aron-0.007
bo-0.007
ud-0.007
Token.
Token ID13
Feature activation+4.14e-03

Pos logit contributions
bo+0.015
en+0.014
by+0.013
aron+0.013
 cared+0.013
Neg logit contributions
stack-0.023
anger-0.022
 plunge-0.020
ittens-0.019
aroo-0.019
Token And
Token ID843
Feature activation-4.45e-02

Pos logit contributions
y+0.008
bo+0.008
en+0.008
by+0.007
aron+0.007
Neg logit contributions
stack-0.007
anger-0.006
 plunge-0.006
-0.006
ishly-0.006
Token their
Token ID511
Feature activation+6.98e-03

Pos logit contributions
bo+0.106
y+0.106
aron+0.104
en+0.098
by+0.093
Neg logit contributions
anger-0.123
 plunge-0.120
stack-0.116
ses-0.109
ittens-0.107
Token favorite
Token ID4004
Feature activation-2.91e-02

Pos logit contributions
bo+0.013
y+0.013
aron+0.013
en+0.013
by+0.012
Neg logit contributions
anger-0.011
stack-0.011
 plunge-0.010
ses-0.009
ittens-0.009
Token food
Token ID2057
Feature activation-9.37e-03

Pos logit contributions
anger+0.047
stack+0.046
 plunge+0.045
ses+0.041
ittens+0.041
Neg logit contributions
bo-0.053
y-0.052
aron-0.052
en-0.050
by-0.048
Token:
Token ID25
Feature activation+1.45e-02

Pos logit contributions
anger+0.021
stack+0.021
 plunge+0.020
ses+0.019
ittens+0.019
Neg logit contributions
bo-0.011
y-0.011
aron-0.011
en-0.011
by-0.010
Token mac
Token ID8352
Feature activation+9.05e-02

Pos logit contributions
bo+0.025
y+0.024
en+0.024
by+0.023
aron+0.023
Neg logit contributions
stack-0.024
anger-0.023
ses-0.023
 plunge-0.022
ittens-0.022
Tokenaron
Token ID8045
Feature activation+2.14e-01

Pos logit contributions
Em+0.081
Kat+0.072
bo+0.069
by+0.069
 cared+0.065
Neg logit contributions
anger-0.217
ses-0.215
stack-0.205
sed-0.205
ittens-0.204
Tokeni
Token ID72
Feature activation+1.54e-02

Pos logit contributions
bo+0.314
aron+0.313
Em+0.301
y+0.289
 gre+0.288
Neg logit contributions
stack-0.385
anger-0.377
ittens-0.367
 plunge-0.367
ishly-0.357
Token and
Token ID290
Feature activation+9.49e-03

Pos logit contributions
Em+0.020
Kat+0.019
en+0.018
 gre+0.017
bo+0.017
Neg logit contributions
ittens-0.031
anger-0.030
stack-0.030
ses-0.029
sed-0.028
Token cheese
Token ID9891
Feature activation+8.55e-03

Pos logit contributions
en+0.019
y+0.019
aron+0.017
bo+0.017
ud+0.017
Neg logit contributions
ses-0.013
sed-0.013
stack-0.012
 plunge-0.012
ittens-0.011
Token.
Token ID13
Feature activation+1.31e-02

Pos logit contributions
bo+0.011
en+0.010
by+0.010
 cared+0.010
y+0.009
Neg logit contributions
stack-0.018
anger-0.018
ittens-0.017
ses-0.017
 plunge-0.017
Token\n
Token ID198
Feature activation+1.36e-02

Pos logit contributions
bo+0.021
y+0.020
en+0.020
aron+0.019
by+0.019
Neg logit contributions
anger-0.025
stack-0.024
 plunge-0.023
ittens-0.022
ses-0.022
Token\n
Token ID198
Feature activation+1.19e-02

Pos logit contributions
bo+0.013
y+0.013
aron+0.013
en+0.013
by+0.012
Neg logit contributions
anger-0.025
stack-0.024
 plunge-0.024
ittens-0.023
ses-0.022
TokenSoon
Token ID28093
Feature activation+9.99e-03

Pos logit contributions
en+0.017
bo+0.017
y+0.016
aron+0.016
by+0.015
Neg logit contributions
stack-0.023
anger-0.022
 plunge-0.022
ittens-0.021
fs-0.021
Token,
Token ID11
Feature activation-8.00e-03

Pos logit contributions
bo+0.017
aron+0.017
y+0.016
en+0.015
 cared+0.014
Neg logit contributions
anger-0.017
 plunge-0.017
stack-0.016
ittens-0.015
ses-0.015
Token the
Token ID262
Feature activation+4.38e-03

Pos logit contributions
anger+0.015
stack+0.015
 plunge+0.015
ses+0.014
ittens+0.014
Neg logit contributions
bo-0.012
y-0.012
aron-0.012
en-0.011
by-0.011
Token mac
Token ID8352
Feature activation+9.13e-02

Pos logit contributions
en+0.008
aron+0.008
y+0.008
bo+0.008
by+0.007
Neg logit contributions
anger-0.007
stack-0.007
 plunge-0.007
ses-0.006
sed-0.006
Tokenaron
Token ID8045
Feature activation+2.14e-01

Pos logit contributions
Em+0.069
Kat+0.067
by+0.067
en+0.066
bo+0.063
Neg logit contributions
anger-0.231
stack-0.226
ses-0.217
sed-0.207
ittens-0.205
Tokeni
Token ID72
Feature activation+1.40e-02

Pos logit contributions
aron+0.304
Em+0.279
bo+0.276
 gre+0.268
Kat+0.265
Neg logit contributions
stack-0.399
anger-0.390
aroo-0.380
 plunge-0.376
ishly-0.371
Token and
Token ID290
Feature activation+8.81e-03

Pos logit contributions
Em+0.017
en+0.017
Kat+0.016
bo+0.016
aron+0.016
Neg logit contributions
stack-0.028
anger-0.028
sed-0.026
aroo-0.026
ittens-0.026
Token cheese
Token ID9891
Feature activation+7.34e-03

Pos logit contributions
en+0.015
aron+0.015
y+0.015
bo+0.014
Kat+0.014
Neg logit contributions
stack-0.014
ses-0.013
 plunge-0.013
sed-0.013
aroo-0.013
Token was
Token ID373
Feature activation+2.58e-02

Pos logit contributions
bo+0.009
aron+0.008
en+0.008
y+0.008
by+0.008
Neg logit contributions
anger-0.017
stack-0.016
 plunge-0.015
ses-0.015
ittens-0.015
Token ready
Token ID3492
Feature activation+2.10e-02

Pos logit contributions
Em+0.049
y+0.048
en+0.048
bo+0.047
aron+0.046
Neg logit contributions
stack-0.032
sed-0.031
ble-0.030
ses-0.030
ittens-0.029
Token "
Token ID366
Feature activation-1.27e-03

Pos logit contributions
bo+0.014
en+0.013
y+0.013
by+0.013
aron+0.013
Neg logit contributions
stack-0.012
anger-0.012
-0.011
 plunge-0.010
ishly-0.010
TokenI
Token ID40
Feature activation+5.69e-02

Pos logit contributions
stack+0.002
anger+0.002
 plunge+0.002
+0.002
aroo+0.002
Neg logit contributions
en-0.002
y-0.002
aron-0.002
by-0.002
bo-0.002
Token still
Token ID991
Feature activation-6.90e-03

Pos logit contributions
aron+0.104
bo+0.102
y+0.101
en+0.101
ky+0.093
Neg logit contributions
 plunge-0.086
anger-0.086
stack-0.084
ses-0.075
ishly-0.074
Token love
Token ID1842
Feature activation+1.71e-02

Pos logit contributions
stack+0.010
 plunge+0.010
ses+0.010
anger+0.009
ishly+0.009
Neg logit contributions
bo-0.013
aron-0.013
y-0.012
by-0.012
en-0.012
Token mac
Token ID8352
Feature activation+9.36e-02

Pos logit contributions
bo+0.038
aron+0.037
y+0.036
ny+0.034
by+0.034
Neg logit contributions
 plunge-0.021
stack-0.021
anger-0.020
ittens-0.018
aroo-0.017
Tokenaron
Token ID8045
Feature activation+2.14e-01

Pos logit contributions
bo+0.081
by+0.081
Em+0.078
idi+0.077
en+0.074
Neg logit contributions
anger-0.225
stack-0.214
ses-0.204
ittens-0.197
ishly-0.196
Tokeni
Token ID72
Feature activation+1.37e-02

Pos logit contributions
aron+0.294
bo+0.293
y+0.284
en+0.277
 cared+0.259
Neg logit contributions
anger-0.418
stack-0.417
 plunge-0.405
ittens-0.388
ishly-0.380
Token and
Token ID290
Feature activation-7.58e-04

Pos logit contributions
en+0.020
Em+0.020
bo+0.020
Kat+0.019
idi+0.019
Neg logit contributions
anger-0.024
stack-0.023
 plunge-0.023
ittens-0.023
aroo-0.022
Token cheese
Token ID9891
Feature activation+1.28e-02

Pos logit contributions
stack+0.001
ses+0.001
 plunge+0.001
sed+0.001
aroo+0.001
Neg logit contributions
y-0.001
en-0.001
aron-0.001
bo-0.001
by-0.001
Token...
Token ID986
Feature activation+1.19e-02

Pos logit contributions
bo+0.019
en+0.018
by+0.018
y+0.017
 cared+0.017
Neg logit contributions
anger-0.024
stack-0.024
 plunge-0.022
ittens-0.022
aroo-0.021
Tokenespecially
Token ID16480
Feature activation+1.08e-02

Pos logit contributions
en+0.020
by+0.019
y+0.019
aron+0.018
bo+0.018
Neg logit contributions
anger-0.021
stack-0.020
aroo-0.019
 plunge-0.018
-0.018
Token Mom
Token ID11254
Feature activation+2.83e-02

Pos logit contributions
bo+0.019
aron+0.019
y+0.018
en+0.017
ny+0.017
Neg logit contributions
stack-0.017
ses-0.017
 plunge-0.016
ittens-0.016
anger-0.015
Token and
Token ID290
Feature activation+9.81e-03

Pos logit contributions
en+0.038
bo+0.036
Em+0.036
aron+0.035
Kat+0.034
Neg logit contributions
stack-0.050
ishly-0.049
ses-0.047
anger-0.047
ittens-0.046
Token they
Token ID484
Feature activation+2.93e-02

Pos logit contributions
bo+0.017
en+0.017
aron+0.017
y+0.017
by+0.015
Neg logit contributions
stack-0.017
ses-0.016
 plunge-0.015
anger-0.015
sed-0.015
Token loved
Token ID6151
Feature activation+1.02e-02

Pos logit contributions
en+0.036
y+0.034
bo+0.034
aron+0.034
ce+0.032
Neg logit contributions
stack-0.062
 plunge-0.059
ses-0.058
ishly-0.058
anger-0.056
Token mac
Token ID8352
Feature activation+8.98e-02

Pos logit contributions
bo+0.018
aron+0.018
y+0.017
en+0.016
ny+0.016
Neg logit contributions
stack-0.016
 plunge-0.015
anger-0.015
ittens-0.015
ses-0.015
Tokenaron
Token ID8045
Feature activation+2.14e-01

Pos logit contributions
Em+0.079
bo+0.074
idi+0.072
Kat+0.071
by+0.069
Neg logit contributions
anger-0.207
ses-0.202
stack-0.198
ittens-0.188
sed-0.187
Tokeni
Token ID72
Feature activation+1.57e-02

Pos logit contributions
bo+0.306
aron+0.305
y+0.287
en+0.281
 cared+0.271
Neg logit contributions
stack-0.397
anger-0.395
 plunge-0.387
ittens-0.377
ishly-0.369
Token and
Token ID290
Feature activation+9.45e-03

Pos logit contributions
Em+0.018
en+0.018
bo+0.018
Kat+0.017
idi+0.017
Neg logit contributions
anger-0.031
stack-0.031
ittens-0.030
 plunge-0.030
ishly-0.029
Token cheese
Token ID9891
Feature activation+7.61e-03

Pos logit contributions
en+0.017
y+0.017
aron+0.016
bo+0.016
Kat+0.015
Neg logit contributions
stack-0.015
ses-0.015
sed-0.014
 plunge-0.014
aroo-0.013
Token.
Token ID13
Feature activation+1.25e-02

Pos logit contributions
bo+0.010
en+0.009
by+0.009
y+0.008
aron+0.008
Neg logit contributions
stack-0.016
anger-0.016
 plunge-0.015
ses-0.015
ittens-0.014
Token\n
Token ID198
Feature activation+1.27e-02

Pos logit contributions
bo+0.021
y+0.020
en+0.020
aron+0.019
by+0.019
Neg logit contributions
stack-0.021
anger-0.021
ses-0.020
ittens-0.020
-0.020
Token table
Token ID3084
Feature activation+4.26e-02

Pos logit contributions
y+0.002
en+0.002
bo+0.002
aron+0.002
 cared+0.002
Neg logit contributions
stack-0.002
anger-0.001
 plunge-0.001
ses-0.001
ishly-0.001
Token.
Token ID13
Feature activation+5.42e-03

Pos logit contributions
bo+0.059
aron+0.052
en+0.051
y+0.050
by+0.049
Neg logit contributions
anger-0.091
stack-0.090
 plunge-0.085
-0.075
ishly-0.075
Token\n
Token ID198
Feature activation+1.16e-02

Pos logit contributions
bo+0.009
y+0.009
en+0.008
aron+0.008
by+0.008
Neg logit contributions
anger-0.010
stack-0.010
 plunge-0.010
ses-0.009
ittens-0.009
Token\n
Token ID198
Feature activation+1.20e-02

Pos logit contributions
bo+0.013
y+0.013
aron+0.012
en+0.012
by+0.011
Neg logit contributions
anger-0.027
stack-0.026
 plunge-0.026
ittens-0.024
ses-0.024
TokenSh
Token ID2484
Feature activation+3.61e-02

Pos logit contributions
bo+0.018
y+0.017
en+0.017
aron+0.017
by+0.016
Neg logit contributions
anger-0.024
stack-0.023
 plunge-0.023
ittens-0.021
ses-0.021
Tokenaron
Token ID8045
Feature activation+2.14e-01

Pos logit contributions
bo+0.050
by+0.049
y+0.047
 shared+0.046
 gre+0.046
Neg logit contributions
anger-0.074
ittens-0.066
stack-0.064
oot-0.063
aroo-0.062
Token smiled
Token ID13541
Feature activation+3.47e-02

Pos logit contributions
aron+0.258
 gre+0.242
bo+0.237
en+0.229
able+0.225
Neg logit contributions
stack-0.448
aroo-0.422
ishly-0.422
anger-0.419
 plunge-0.407
Token and
Token ID290
Feature activation+1.94e-03

Pos logit contributions
bo+0.052
aron+0.046
 cared+0.046
en+0.044
y+0.043
Neg logit contributions
anger-0.065
stack-0.063
ishly-0.061
ble-0.061
 plunge-0.060
Token took
Token ID1718
Feature activation+2.34e-02

Pos logit contributions
y+0.003
bo+0.003
en+0.003
aron+0.003
by+0.002
Neg logit contributions
stack-0.004
anger-0.004
 plunge-0.004
ishly-0.004
ses-0.004
Token a
Token ID257
Feature activation-1.81e-02

Pos logit contributions
y+0.043
aron+0.040
bo+0.040
en+0.038
 cared+0.037
Neg logit contributions
anger-0.037
 plunge-0.036
stack-0.032
aroo-0.031
ishly-0.029
Token big
Token ID1263
Feature activation+5.97e-02

Pos logit contributions
anger+0.032
 plunge+0.031
stack+0.030
ittens+0.027
aroo+0.027
Neg logit contributions
bo-0.030
en-0.030
aron-0.030
y-0.030
 cared-0.026
Token'm
Token ID1101
Feature activation+1.85e-02

Pos logit contributions
aron+0.106
bo+0.103
en+0.099
y+0.098
by+0.094
Neg logit contributions
 plunge-0.067
anger-0.067
stack-0.063
-0.061
ses-0.060
Token making
Token ID1642
Feature activation+6.34e-02

Pos logit contributions
bo+0.032
y+0.032
aron+0.031
by+0.030
en+0.030
Neg logit contributions
stack-0.032
anger-0.031
 plunge-0.031
ses-0.029
ittens-0.028
Token a
Token ID257
Feature activation-2.05e-02

Pos logit contributions
y+0.117
bo+0.116
aron+0.114
by+0.106
 cared+0.106
Neg logit contributions
 plunge-0.101
anger-0.097
stack-0.089
ittens-0.087
ble-0.085
Token delicious
Token ID12625
Feature activation-1.28e-02

Pos logit contributions
anger+0.033
stack+0.032
 plunge+0.032
ittens+0.029
ses+0.029
Neg logit contributions
bo-0.037
y-0.036
aron-0.036
en-0.036
 cared-0.033
Token mac
Token ID8352
Feature activation+8.85e-02

Pos logit contributions
anger+0.020
 plunge+0.019
stack+0.018
ittens+0.017
+0.016
Neg logit contributions
bo-0.025
aron-0.024
y-0.024
en-0.023
 cared-0.022
Tokenaron
Token ID8045
Feature activation+2.13e-01

Pos logit contributions
 cared+0.081
bo+0.075
Em+0.072
en+0.072
by+0.072
Neg logit contributions
anger-0.214
stack-0.204
ittens-0.194
ble-0.186
ses-0.186
Tokeni
Token ID72
Feature activation+1.50e-02

Pos logit contributions
aron+0.303
bo+0.297
y+0.287
en+0.282
 cared+0.271
Neg logit contributions
anger-0.406
stack-0.402
 plunge-0.397
ittens-0.377
aroo-0.371
Token and
Token ID290
Feature activation+1.56e-03

Pos logit contributions
en+0.025
Em+0.025
 cared+0.025
bo+0.024
Kat+0.023
Neg logit contributions
anger-0.023
ittens-0.023
 plunge-0.022
stack-0.022
ble-0.021
Token cheese
Token ID9891
Feature activation+8.83e-03

Pos logit contributions
y+0.003
en+0.003
bo+0.003
aron+0.003
by+0.003
Neg logit contributions
stack-0.002
 plunge-0.002
ses-0.002
ittens-0.002
sed-0.002
Token,"
Token ID553
Feature activation+3.71e-02

Pos logit contributions
bo+0.016
en+0.015
 cared+0.015
by+0.014
y+0.014
Neg logit contributions
anger-0.014
stack-0.014
 plunge-0.013
ittens-0.012
aroo-0.012
Token M
Token ID337
Feature activation-1.67e-02

Pos logit contributions
aron+0.070
bo+0.068
y+0.067
en+0.067
by+0.063
Neg logit contributions
stack-0.055
 plunge-0.055
anger-0.052
aroo-0.050
ittens-0.049
Token an
Token ID281
Feature activation-3.73e-02

Pos logit contributions
bo+0.007
aron+0.007
y+0.006
en+0.006
by+0.006
Neg logit contributions
anger-0.006
stack-0.006
 plunge-0.006
aroo-0.005
ses-0.005
Token ig
Token ID45329
Feature activation-6.26e-02

Pos logit contributions
anger+0.069
stack+0.067
 plunge+0.065
ses+0.060
ittens+0.060
Neg logit contributions
bo-0.060
y-0.059
aron-0.058
en-0.057
by-0.053
Tokenloo
Token ID29680
Feature activation-7.01e-02

Pos logit contributions
anger+0.075
stack+0.072
 plunge+0.070
ses+0.063
ittens+0.061
Neg logit contributions
bo-0.141
y-0.140
aron-0.137
en-0.135
by-0.130
Token!
Token ID0
Feature activation-5.39e-02

Pos logit contributions
anger+0.149
stack+0.148
 plunge+0.138
ishly+0.127
ittens+0.126
Neg logit contributions
bo-0.094
aron-0.091
by-0.087
y-0.087
en-0.084
Token\n
Token ID198
Feature activation+5.50e-03

Pos logit contributions
anger+0.102
stack+0.099
 plunge+0.097
ses+0.090
ittens+0.090
Neg logit contributions
bo-0.084
y-0.083
aron-0.081
en-0.079
by-0.075
TokenEr
Token ID9139
Feature activation+2.13e-01

Pos logit contributions
y+0.007
bo+0.007
en+0.006
aron+0.006
by+0.006
Neg logit contributions
anger-0.012
stack-0.012
 plunge-0.012
ittens-0.011
aroo-0.011
Tokenin
Token ID259
Feature activation+8.87e-03

Pos logit contributions
aron+0.272
 gre+0.245
bo+0.242
en+0.237
 sav+0.235
Neg logit contributions
anger-0.393
aroo-0.392
ble-0.390
 plunge-0.363
stack-0.359
Token had
Token ID550
Feature activation+4.93e-02

Pos logit contributions
aron+0.011
en+0.011
bo+0.010
y+0.010
 gre+0.009
Neg logit contributions
stack-0.018
anger-0.017
aroo-0.017
ittens-0.017
 raft-0.016
Token never
Token ID1239
Feature activation-2.46e-02

Pos logit contributions
aron+0.076
y+0.076
en+0.072
bo+0.070
by+0.066
Neg logit contributions
 plunge-0.091
anger-0.090
stack-0.088
ses-0.079
 umbrella-0.079
Token seen
Token ID1775
Feature activation-4.98e-02

Pos logit contributions
 plunge+0.042
 footing+0.039
anger+0.037
 bearings+0.036
stack+0.036
Neg logit contributions
aron-0.042
bo-0.038
y-0.036
en-0.035
ce-0.035
Token an
Token ID281
Feature activation-3.78e-02

Pos logit contributions
anger+0.079
 plunge+0.076
stack+0.072
aroo+0.063
 umbrella+0.061
Neg logit contributions
bo-0.094
y-0.093
aron-0.092
by-0.087
 cared-0.083
Token smiled
Token ID13541
Feature activation+3.38e-02

Pos logit contributions
stack+0.005
ishly+0.005
aroo+0.004
 yes+0.004
sed+0.004
Neg logit contributions
able-0.004
en-0.003
bo-0.003
aron-0.003
 late-0.003
Token.
Token ID13
Feature activation+3.16e-03

Pos logit contributions
bo+0.053
aron+0.049
Em+0.049
ud+0.047
ce+0.047
Neg logit contributions
ishly-0.056
ble-0.055
anger-0.052
 plunge-0.051
stack-0.050
Token She
Token ID1375
Feature activation-1.58e-02

Pos logit contributions
bo+0.007
aron+0.006
y+0.006
en+0.006
by+0.006
Neg logit contributions
anger-0.004
stack-0.004
 plunge-0.004
-0.004
ses-0.004
Token loved
Token ID6151
Feature activation+1.03e-02

Pos logit contributions
stack+0.028
sed+0.026
ishly+0.025
aroo+0.025
+0.025
Neg logit contributions
en-0.022
bo-0.021
able-0.020
older-0.020
by-0.020
Token mac
Token ID8352
Feature activation+8.84e-02

Pos logit contributions
bo+0.020
aron+0.019
y+0.019
ny+0.019
en+0.018
Neg logit contributions
 plunge-0.014
stack-0.014
ittens-0.013
anger-0.012
ses-0.012
Tokenaron
Token ID8045
Feature activation+2.12e-01

Pos logit contributions
Em+0.057
bo+0.054
idi+0.052
Kat+0.050
 gre+0.049
Neg logit contributions
anger-0.215
stack-0.209
ittens-0.201
ishly-0.201
ses-0.201
Tokeni
Token ID72
Feature activation+1.53e-02

Pos logit contributions
aron+0.325
bo+0.311
y+0.290
en+0.281
 cared+0.278
Neg logit contributions
stack-0.379
anger-0.376
 plunge-0.374
ittens-0.359
ishly-0.356
Token and
Token ID290
Feature activation+1.61e-03

Pos logit contributions
Em+0.021
Kat+0.020
en+0.019
bo+0.019
idi+0.019
Neg logit contributions
ittens-0.028
 plunge-0.027
stack-0.027
anger-0.027
ishly-0.027
Token cheese
Token ID9891
Feature activation+9.06e-03

Pos logit contributions
en+0.003
y+0.003
bo+0.003
aron+0.003
by+0.002
Neg logit contributions
stack-0.003
ses-0.002
 plunge-0.002
sed-0.002
ittens-0.002
Token!
Token ID0
Feature activation-4.81e-02

Pos logit contributions
bo+0.012
en+0.012
by+0.011
aron+0.011
Em+0.010
Neg logit contributions
stack-0.018
anger-0.018
 plunge-0.017
ittens-0.017
ishly-0.016
Token\n
Token ID198
Feature activation+9.69e-03

Pos logit contributions
anger+0.074
stack+0.074
 plunge+0.069
+0.064
ittens+0.063
Neg logit contributions
bo-0.092
y-0.089
aron-0.088
en-0.086
by-0.084
Token dinner
Token ID8073
Feature activation-2.29e-02

Pos logit contributions
anger+0.110
stack+0.109
 plunge+0.106
ittens+0.100
ses+0.097
Neg logit contributions
bo-0.087
y-0.086
aron-0.083
en-0.080
by-0.077
Token.
Token ID13
Feature activation+6.96e-04

Pos logit contributions
stack+0.045
anger+0.045
ittens+0.041
 plunge+0.040
+0.040
Neg logit contributions
bo-0.033
en-0.030
by-0.028
y-0.028
aron-0.027
Token It
Token ID632
Feature activation-9.23e-02

Pos logit contributions
bo+0.001
y+0.001
en+0.001
aron+0.001
by+0.001
Neg logit contributions
stack-0.001
anger-0.001
 plunge-0.001
ittens-0.001
-0.001
Token was
Token ID373
Feature activation+1.73e-02

Pos logit contributions
stack+0.181
 plunge+0.165
ishly+0.164
anger+0.156
aroo+0.151
Neg logit contributions
aron-0.122
en-0.108
older-0.106
eared-0.105
ky-0.105
Token mac
Token ID8352
Feature activation+8.76e-02

Pos logit contributions
bo+0.032
y+0.031
by+0.030
aron+0.030
en+0.030
Neg logit contributions
stack-0.026
 plunge-0.024
ittens-0.022
anger-0.021
ses-0.021
Tokenaron
Token ID8045
Feature activation+2.11e-01

Pos logit contributions
bo+0.053
by+0.052
Em+0.049
Kat+0.045
en+0.044
Neg logit contributions
anger-0.227
stack-0.224
ses-0.210
ittens-0.208
ishly-0.205
Tokeni
Token ID72
Feature activation+1.09e-02

Pos logit contributions
aron+0.314
bo+0.309
y+0.287
en+0.283
 cared+0.277
Neg logit contributions
stack-0.391
anger-0.374
 plunge-0.372
ittens-0.362
ishly-0.361
Token and
Token ID290
Feature activation-7.62e-03

Pos logit contributions
en+0.013
bo+0.013
Em+0.013
aron+0.012
Kat+0.012
Neg logit contributions
stack-0.022
ittens-0.021
anger-0.021
 plunge-0.021
ishly-0.020
Token cheese
Token ID9891
Feature activation+7.43e-03

Pos logit contributions
stack+0.012
 plunge+0.011
ittens+0.011
ses+0.011
aroo+0.011
Neg logit contributions
en-0.014
y-0.013
aron-0.013
bo-0.013
by-0.012
Token with
Token ID351
Feature activation+1.86e-02

Pos logit contributions
bo+0.010
en+0.009
by+0.009
y+0.009
aron+0.009
Neg logit contributions
stack-0.016
anger-0.015
 plunge-0.014
ittens-0.014
aroo-0.014
Token a
Token ID257
Feature activation-2.53e-02

Pos logit contributions
aron+0.034
bo+0.033
y+0.033
en+0.032
ny+0.031
Neg logit contributions
 plunge-0.026
stack-0.025
anger-0.024
ittens-0.023
aroo-0.022
Token about
Token ID546
Feature activation+1.79e-02

Pos logit contributions
anger+0.019
stack+0.018
 plunge+0.018
ses+0.017
ittens+0.017
Neg logit contributions
bo-0.012
y-0.012
aron-0.012
en-0.012
by-0.011
Token we
Token ID356
Feature activation-6.82e-03

Pos logit contributions
bo+0.032
y+0.031
aron+0.031
en+0.030
by+0.029
Neg logit contributions
anger-0.030
stack-0.029
 plunge-0.028
ses-0.026
ittens-0.026
Token make
Token ID787
Feature activation+2.42e-02

Pos logit contributions
anger+0.010
stack+0.010
 plunge+0.010
ittens+0.009
ses+0.009
Neg logit contributions
bo-0.013
aron-0.013
y-0.013
en-0.012
by-0.012
Token some
Token ID617
Feature activation-2.62e-02

Pos logit contributions
bo+0.040
y+0.040
aron+0.040
en+0.037
by+0.036
Neg logit contributions
anger-0.043
 plunge-0.041
stack-0.040
ittens-0.038
ses-0.037
Token mac
Token ID8352
Feature activation+9.09e-02

Pos logit contributions
 plunge+0.041
anger+0.041
ittens+0.038
stack+0.036
ble+0.036
Neg logit contributions
bo-0.049
aron-0.047
y-0.046
en-0.044
by-0.042
Tokenaron
Token ID8045
Feature activation+2.11e-01

Pos logit contributions
Em+0.085
bo+0.080
by+0.080
 cared+0.078
Kat+0.077
Neg logit contributions
anger-0.212
stack-0.198
ses-0.192
ittens-0.190
anks-0.185
Tokeni
Token ID72
Feature activation+1.63e-02

Pos logit contributions
bo+0.292
aron+0.289
y+0.285
en+0.273
by+0.257
Neg logit contributions
anger-0.421
stack-0.413
 plunge-0.405
ittens-0.382
ses-0.373
Token and
Token ID290
Feature activation+9.41e-03

Pos logit contributions
bo+0.023
aron+0.023
en+0.022
y+0.022
Em+0.022
Neg logit contributions
anger-0.030
stack-0.029
 plunge-0.028
ittens-0.028
ble-0.027
Token cheese
Token ID9891
Feature activation+7.71e-03

Pos logit contributions
y+0.017
aron+0.016
bo+0.016
en+0.016
by+0.015
Neg logit contributions
stack-0.015
ses-0.014
 plunge-0.014
sed-0.014
anger-0.013
Token for
Token ID329
Feature activation-4.76e-02

Pos logit contributions
bo+0.012
aron+0.011
en+0.011
y+0.011
by+0.011
Neg logit contributions
anger-0.015
stack-0.014
 plunge-0.013
ittens-0.013
ses-0.013
Token dinner
Token ID8073
Feature activation-2.04e-02

Pos logit contributions
anger+0.077
stack+0.075
 plunge+0.074
ittens+0.068
ses+0.067
Neg logit contributions
bo-0.086
y-0.086
aron-0.084
en-0.082
by-0.078
Token a
Token ID257
Feature activation-2.40e-02

Pos logit contributions
anger+0.069
stack+0.068
 plunge+0.067
ses+0.064
ittens+0.062
Neg logit contributions
y-0.034
bo-0.033
aron-0.032
en-0.032
by-0.030
Token lot
Token ID1256
Feature activation+2.76e-02

Pos logit contributions
anger+0.041
stack+0.041
 plunge+0.040
ittens+0.037
aroo+0.036
Neg logit contributions
bo-0.041
en-0.041
y-0.040
aron-0.039
by-0.036
Token.
Token ID13
Feature activation+1.12e-03

Pos logit contributions
bo+0.032
y+0.029
en+0.027
aron+0.027
 cared+0.026
Neg logit contributions
anger-0.065
stack-0.062
 plunge-0.060
ittens-0.058
aroo-0.057
Token\n
Token ID198
Feature activation+7.95e-03

Pos logit contributions
y+0.002
bo+0.002
aron+0.002
en+0.002
by+0.002
Neg logit contributions
anger-0.002
stack-0.002
 plunge-0.002
ittens-0.002
ses-0.002
Token\n
Token ID198
Feature activation+9.59e-03

Pos logit contributions
bo+0.011
y+0.011
aron+0.011
en+0.011
by+0.010
Neg logit contributions
anger-0.016
stack-0.016
 plunge-0.015
ittens-0.014
ses-0.014
TokenEr
Token ID9139
Feature activation+2.11e-01

Pos logit contributions
y+0.015
bo+0.015
aron+0.014
en+0.014
by+0.013
Neg logit contributions
anger-0.018
stack-0.018
 plunge-0.017
ittens-0.016
aroo-0.016
Tokenin
Token ID259
Feature activation+7.52e-03

Pos logit contributions
aron+0.257
en+0.235
 gre+0.234
bo+0.232
 sav+0.222
Neg logit contributions
aroo-0.399
anger-0.391
ble-0.387
stack-0.378
ute-0.368
Token was
Token ID373
Feature activation+2.06e-02

Pos logit contributions
aron+0.009
y+0.009
en+0.008
bo+0.008
 gre+0.007
Neg logit contributions
stack-0.016
anger-0.015
ishly-0.015
aroo-0.015
ittens-0.014
Token glad
Token ID9675
Feature activation-3.03e-02

Pos logit contributions
y+0.036
aron+0.035
en+0.035
by+0.034
bo+0.034
Neg logit contributions
stack-0.033
 plunge-0.032
anger-0.031
aroo-0.030
ble-0.029
Token that
Token ID326
Feature activation-2.98e-02

Pos logit contributions
anger+0.039
stack+0.038
 plunge+0.035
ittens+0.034
aroo+0.033
Neg logit contributions
bo-0.065
aron-0.063
y-0.062
en-0.060
ny-0.058
Token she
Token ID673
Feature activation+3.74e-02

Pos logit contributions
stack+0.043
anger+0.043
 plunge+0.042
aroo+0.037
ittens+0.036
Neg logit contributions
aron-0.059
bo-0.058
y-0.058
en-0.055
by-0.052
Token broccoli
Token ID44653
Feature activation+7.20e-02

Pos logit contributions
 plunge+0.045
anger+0.044
stack+0.042
ittens+0.042
ses+0.040
Neg logit contributions
aron-0.063
bo-0.062
en-0.061
y-0.061
Em-0.054
Token and
Token ID290
Feature activation-9.02e-03

Pos logit contributions
bo+0.097
en+0.087
aron+0.086
y+0.081
by+0.079
Neg logit contributions
stack-0.156
anger-0.148
 plunge-0.143
ishly-0.136
-0.132
Token had
Token ID550
Feature activation+5.34e-02

Pos logit contributions
stack+0.018
sed+0.016
ses+0.016
 plunge+0.015
ishly+0.015
Neg logit contributions
bo-0.013
aron-0.013
y-0.012
ny-0.012
en-0.012
Token her
Token ID607
Feature activation-2.87e-02

Pos logit contributions
y+0.117
en+0.114
bo+0.113
aron+0.113
ce+0.108
Neg logit contributions
 plunge-0.054
 yes-0.046
 umbrella-0.042
-0.042
stack-0.042
Token mac
Token ID8352
Feature activation+8.96e-02

Pos logit contributions
anger+0.042
stack+0.042
 plunge+0.041
ses+0.038
ittens+0.037
Neg logit contributions
bo-0.056
aron-0.055
y-0.055
en-0.054
by-0.050
Tokenaron
Token ID8045
Feature activation+2.10e-01

Pos logit contributions
Em+0.051
Kat+0.047
en+0.046
by+0.045
bo+0.044
Neg logit contributions
anger-0.238
stack-0.230
ses-0.221
ittens-0.216
sed-0.214
Tokeni
Token ID72
Feature activation+9.18e-03

Pos logit contributions
aron+0.322
bo+0.314
Em+0.301
 gre+0.299
 cared+0.290
Neg logit contributions
stack-0.364
anger-0.347
ishly-0.340
ittens-0.339
 plunge-0.339
Token and
Token ID290
Feature activation-4.90e-03

Pos logit contributions
en+0.012
Em+0.011
Kat+0.011
bo+0.011
 gre+0.010
Neg logit contributions
stack-0.018
anger-0.018
ittens-0.017
ses-0.016
ishly-0.016
Token cheese
Token ID9891
Feature activation+5.57e-03

Pos logit contributions
stack+0.007
ses+0.007
sed+0.007
ittens+0.007
 plunge+0.007
Neg logit contributions
y-0.009
en-0.009
bo-0.009
aron-0.009
ud-0.008
Token too
Token ID1165
Feature activation-9.04e-02

Pos logit contributions
bo+0.007
en+0.007
by+0.006
y+0.006
aron+0.006
Neg logit contributions
stack-0.012
anger-0.012
ittens-0.011
 plunge-0.011
aroo-0.010
Token.
Token ID13
Feature activation+2.86e-03

Pos logit contributions
anger+0.155
ittens+0.151
+0.148
stack+0.141
 plunge+0.139
Neg logit contributions
en-0.140
bo-0.135
Em-0.121
idi-0.120
y-0.119
Token ig
Token ID45329
Feature activation-6.63e-02

Pos logit contributions
stack+0.009
anger+0.008
 plunge+0.008
 raft+0.007
 umbrella+0.007
Neg logit contributions
y-0.013
aron-0.013
en-0.012
bo-0.012
 cared-0.012
Tokenloo
Token ID29680
Feature activation-7.37e-02

Pos logit contributions
anger+0.075
stack+0.068
 plunge+0.062
ittens+0.057
ses+0.053
Neg logit contributions
bo-0.156
y-0.156
aron-0.154
en-0.153
by-0.147
Token.
Token ID13
Feature activation+1.21e-03

Pos logit contributions
stack+0.164
anger+0.163
 plunge+0.149
ishly+0.140
ittens+0.138
Neg logit contributions
bo-0.092
aron-0.087
by-0.085
y-0.084
en-0.082
Token
Token ID220
Feature activation-1.72e-02

Pos logit contributions
y+0.002
bo+0.002
aron+0.002
en+0.002
by+0.002
Neg logit contributions
anger-0.002
stack-0.002
 plunge-0.002
-0.002
ittens-0.002
Token\n
Token ID198
Feature activation+5.32e-03

Pos logit contributions
anger+0.042
stack+0.041
 plunge+0.041
ses+0.039
ittens+0.038
Neg logit contributions
bo-0.017
y-0.017
aron-0.016
en-0.015
by-0.014
TokenEr
Token ID9139
Feature activation+2.09e-01

Pos logit contributions
y+0.008
en+0.008
bo+0.008
aron+0.007
by+0.007
Neg logit contributions
anger-0.010
stack-0.010
 plunge-0.010
ble-0.009
ute-0.009
Tokenin
Token ID259
Feature activation+7.52e-03

Pos logit contributions
aron+0.271
 gre+0.254
 sav+0.245
bo+0.245
en+0.242
Neg logit contributions
ble-0.371
aroo-0.359
anger-0.357
 plunge-0.324
ute-0.322
Token felt
Token ID2936
Feature activation+4.02e-03

Pos logit contributions
aron+0.009
en+0.008
bo+0.008
y+0.008
Em+0.008
Neg logit contributions
stack-0.016
aroo-0.014
anger-0.014
 raft-0.014
ittens-0.014
Token so
Token ID523
Feature activation-3.57e-02

Pos logit contributions
y+0.008
bo+0.007
by+0.007
aron+0.007
en+0.007
Neg logit contributions
 plunge-0.007
stack-0.006
anger-0.006
aroo-0.006
ble-0.006
Token happy
Token ID3772
Feature activation-1.74e-02

Pos logit contributions
 plunge+0.061
anger+0.060
stack+0.059
ses+0.053
ittens+0.053
Neg logit contributions
y-0.062
bo-0.061
aron-0.060
en-0.059
by-0.055
Token and
Token ID290
Feature activation-4.48e-03

Pos logit contributions
anger+0.031
stack+0.030
 plunge+0.029
ittens+0.026
ishly+0.026
Neg logit contributions
bo-0.031
aron-0.028
y-0.027
by-0.026
en-0.026
Token perfect
Token ID2818
Feature activation+2.19e-02

Pos logit contributions
bo+0.006
y+0.006
aron+0.006
en+0.006
by+0.006
Neg logit contributions
stack-0.004
 plunge-0.004
anger-0.004
ses-0.004
ishly-0.003
Token battery
Token ID6555
Feature activation+3.96e-02

Pos logit contributions
bo+0.044
aron+0.041
y+0.040
 cared+0.037
by+0.037
Neg logit contributions
anger-0.035
 plunge-0.033
stack-0.032
ses-0.029
 footing-0.028
Token!
Token ID0
Feature activation-5.17e-02

Pos logit contributions
bo+0.057
aron+0.056
en+0.053
y+0.053
by+0.049
Neg logit contributions
anger-0.081
stack-0.079
 plunge-0.076
aroo-0.070
ses-0.069
Token\n
Token ID198
Feature activation+1.34e-02

Pos logit contributions
anger+0.100
stack+0.100
 plunge+0.093
+0.088
ittens+0.087
Neg logit contributions
y-0.077
en-0.073
bo-0.073
aron-0.072
by-0.070
Token\n
Token ID198
Feature activation+1.42e-02

Pos logit contributions
y+0.019
en+0.019
bo+0.018
aron+0.018
by+0.017
Neg logit contributions
anger-0.027
stack-0.026
 plunge-0.025
ittens-0.024
aroo-0.024
TokenEr
Token ID9139
Feature activation+2.09e-01

Pos logit contributions
y+0.022
bo+0.021
aron+0.021
en+0.021
by+0.021
Neg logit contributions
anger-0.024
stack-0.024
 plunge-0.024
aroo-0.023
vere-0.022
Tokennie
Token ID11952
Feature activation+1.37e-01

Pos logit contributions
 gre+0.195
 sav+0.190
aron+0.176
bo+0.174
 che+0.162
Neg logit contributions
aroo-0.445
ble-0.443
stack-0.434
anger-0.432
ute-0.411
Token thanked
Token ID26280
Feature activation+2.12e-02

Pos logit contributions
aron+0.164
bo+0.159
 gre+0.151
en+0.148
 late+0.148
Neg logit contributions
stack-0.278
ishly-0.262
aroo-0.258
anger-0.254
 plunge-0.248
Token the
Token ID262
Feature activation+2.81e-03

Pos logit contributions
bo+0.030
y+0.029
aron+0.028
en+0.027
by+0.026
Neg logit contributions
anger-0.043
stack-0.043
 plunge-0.042
ittens-0.038
aroo-0.038
Token creature
Token ID7185
Feature activation-5.67e-02

Pos logit contributions
y+0.005
bo+0.005
en+0.004
aron+0.004
by+0.004
Neg logit contributions
anger-0.005
stack-0.005
 plunge-0.005
ittens-0.005
ses-0.005
Token and
Token ID290
Feature activation+2.10e-03

Pos logit contributions
stack+0.116
anger+0.116
 plunge+0.111
ishly+0.106
aroo+0.102
Neg logit contributions
bo-0.079
aron-0.074
y-0.072
en-0.070
 cared-0.064
Token broccoli
Token ID44653
Feature activation+7.46e-02

Pos logit contributions
anger+0.002
 plunge+0.002
stack+0.002
ses+0.002
ittens+0.002
Neg logit contributions
bo-0.002
aron-0.002
y-0.002
en-0.002
by-0.002
Token.
Token ID13
Feature activation-3.71e-03

Pos logit contributions
bo+0.085
aron+0.082
y+0.080
en+0.078
by+0.072
Neg logit contributions
anger-0.172
stack-0.168
 plunge-0.165
ses-0.154
ittens-0.153
Token She
Token ID1375
Feature activation-2.06e-02

Pos logit contributions
anger+0.007
stack+0.007
 plunge+0.007
ses+0.006
ittens+0.006
Neg logit contributions
bo-0.006
y-0.006
aron-0.006
en-0.005
by-0.005
Token wanted
Token ID2227
Feature activation-7.58e-03

Pos logit contributions
stack+0.039
anger+0.038
 plunge+0.038
ishly+0.038
sed+0.038
Neg logit contributions
aron-0.027
en-0.026
y-0.026
bo-0.025
Kat-0.023
Token mac
Token ID8352
Feature activation+8.82e-02

Pos logit contributions
 plunge+0.013
stack+0.012
anger+0.012
ses+0.011
aroo+0.011
Neg logit contributions
aron-0.013
bo-0.013
y-0.012
en-0.012
by-0.011
Tokenaron
Token ID8045
Feature activation+2.08e-01

Pos logit contributions
bo+0.069
en+0.064
by+0.064
Em+0.064
 gre+0.062
Neg logit contributions
anger-0.216
stack-0.203
ses-0.202
ittens-0.194
ishly-0.193
Tokeni
Token ID72
Feature activation+1.27e-02

Pos logit contributions
aron+0.279
bo+0.278
y+0.268
en+0.256
by+0.239
Neg logit contributions
anger-0.421
stack-0.416
 plunge-0.410
ittens-0.385
ishly-0.377
Token and
Token ID290
Feature activation-6.69e-03

Pos logit contributions
bo+0.015
aron+0.015
en+0.014
Em+0.014
y+0.013
Neg logit contributions
anger-0.027
stack-0.026
 plunge-0.026
ittens-0.025
ishly-0.025
Token cheese
Token ID9891
Feature activation+5.29e-03

Pos logit contributions
 plunge+0.011
stack+0.011
ses+0.010
ittens+0.010
anger+0.010
Neg logit contributions
y-0.012
en-0.012
aron-0.012
bo-0.011
by-0.010
Token instead
Token ID2427
Feature activation-9.33e-02

Pos logit contributions
bo+0.006
en+0.006
aron+0.006
y+0.006
by+0.006
Neg logit contributions
anger-0.012
stack-0.012
 plunge-0.011
ittens-0.010
ses-0.010
Token.
Token ID13
Feature activation-3.34e-03

Pos logit contributions
anger+0.224
stack+0.217
 plunge+0.212
ittens+0.202
aroo+0.198
Neg logit contributions
bo-0.097
aron-0.094
y-0.092
en-0.090
by-0.082
Token the
Token ID262
Feature activation+1.71e-03

Pos logit contributions
bo+0.038
aron+0.037
y+0.037
en+0.035
by+0.034
Neg logit contributions
anger-0.040
stack-0.039
 plunge-0.038
ses-0.034
ittens-0.034
Token forest
Token ID8222
Feature activation-5.11e-02

Pos logit contributions
aron+0.003
y+0.003
bo+0.003
en+0.003
by+0.003
Neg logit contributions
anger-0.003
stack-0.003
 plunge-0.003
ses-0.002
ishly-0.002
Token!"
Token ID2474
Feature activation-4.25e-03

Pos logit contributions
stack+0.102
anger+0.100
 plunge+0.097
ishly+0.089
aroo+0.088
Neg logit contributions
bo-0.073
aron-0.069
y-0.067
en-0.066
by-0.064
Token\n
Token ID198
Feature activation+1.04e-02

Pos logit contributions
anger+0.008
stack+0.007
 plunge+0.007
ishly+0.007
+0.007
Neg logit contributions
y-0.007
bo-0.007
en-0.006
by-0.006
aron-0.006
Token\n
Token ID198
Feature activation+1.10e-02

Pos logit contributions
y+0.014
en+0.014
bo+0.013
aron+0.013
by+0.012
Neg logit contributions
anger-0.021
stack-0.021
 plunge-0.020
ittens-0.019
aroo-0.019
TokenEr
Token ID9139
Feature activation+2.08e-01

Pos logit contributions
y+0.017
aron+0.016
bo+0.016
en+0.016
by+0.015
Neg logit contributions
anger-0.019
stack-0.019
 plunge-0.019
ute-0.018
vere-0.018
Tokennie
Token ID11952
Feature activation+1.37e-01

Pos logit contributions
 gre+0.195
 sav+0.187
aron+0.179
bo+0.171
 che+0.160
Neg logit contributions
aroo-0.454
ble-0.448
stack-0.438
anger-0.431
ute-0.418
Token nodded
Token ID14464
Feature activation+2.91e-02

Pos logit contributions
aron+0.164
bo+0.155
 gre+0.150
en+0.148
y+0.147
Neg logit contributions
stack-0.288
ishly-0.283
aroo-0.261
 plunge-0.260
anger-0.259
Token and
Token ID290
Feature activation+8.31e-04

Pos logit contributions
bo+0.042
aron+0.039
ud+0.037
y+0.036
by+0.036
Neg logit contributions
stack-0.055
ishly-0.054
 plunge-0.052
anger-0.052
 yes-0.052
Token thanked
Token ID26280
Feature activation+2.17e-02

Pos logit contributions
aron+0.001
bo+0.001
y+0.001
en+0.001
ud+0.001
Neg logit contributions
stack-0.002
 plunge-0.002
anger-0.002
ishly-0.002
aroo-0.002
Token the
Token ID262
Feature activation+2.58e-03

Pos logit contributions
bo+0.033
y+0.033
aron+0.032
en+0.029
by+0.028
Neg logit contributions
stack-0.041
anger-0.041
 plunge-0.040
aroo-0.036
ishly-0.036
Token snake
Token ID17522
Feature activation-3.38e-02

Pos logit contributions
anger+0.082
ittens+0.075
 plunge+0.072
stack+0.072
ishly+0.069
Neg logit contributions
y-0.097
bo-0.092
aron-0.086
en-0.086
ny-0.085
Token sees
Token ID7224
Feature activation+1.01e-02

Pos logit contributions
anger+0.062
stack+0.061
 plunge+0.059
ses+0.058
sed+0.055
Neg logit contributions
aron-0.053
bo-0.052
y-0.052
en-0.050
by-0.048
Token Tom
Token ID4186
Feature activation-6.90e-03

Pos logit contributions
bo+0.017
aron+0.017
y+0.017
en+0.016
by+0.016
Neg logit contributions
anger-0.018
stack-0.017
 plunge-0.017
ses-0.016
ittens-0.016
Token and
Token ID290
Feature activation+1.05e-02

Pos logit contributions
anger+0.016
stack+0.016
 plunge+0.015
ses+0.015
ittens+0.014
Neg logit contributions
aron-0.007
bo-0.007
y-0.007
en-0.006
by-0.006
Token his
Token ID465
Feature activation+9.09e-03

Pos logit contributions
aron+0.016
y+0.016
bo+0.016
en+0.015
by+0.014
Neg logit contributions
stack-0.020
anger-0.020
 plunge-0.019
ses-0.019
aroo-0.018
Tokenses
Token ID8448
Feature activation-2.18e-01

Pos logit contributions
y+0.019
bo+0.019
aron+0.018
en+0.017
by+0.017
Neg logit contributions
anger-0.013
stack-0.012
 plunge-0.012
ittens-0.011
aroo-0.010
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
anger+0.442
 plunge+0.425
ses+0.418
stack+0.411
ishly+0.398
Neg logit contributions
bo-0.275
aron-0.272
Em-0.258
by-0.255
en-0.251
Token It
Token ID632
Feature activation-8.22e-02

Pos logit contributions
bo+0.019
y+0.019
aron+0.018
en+0.018
by+0.017
Neg logit contributions
anger-0.021
stack-0.021
 plunge-0.020
ittens-0.019
ses-0.019
Token opens
Token ID9808
Feature activation-2.40e-03

Pos logit contributions
anger+0.158
stack+0.155
 plunge+0.151
ses+0.143
ittens+0.139
Neg logit contributions
bo-0.122
y-0.122
aron-0.121
en-0.117
by-0.110
Token its
Token ID663
Feature activation-6.09e-02

Pos logit contributions
stack+0.004
anger+0.004
 plunge+0.004
ses+0.004
aroo+0.004
Neg logit contributions
bo-0.004
y-0.004
aron-0.004
by-0.004
 cared-0.004
Token mouth
Token ID5422
Feature activation-4.90e-02

Pos logit contributions
anger+0.084
stack+0.080
 plunge+0.078
ittens+0.071
ses+0.068
Neg logit contributions
bo-0.127
y-0.126
aron-0.124
en-0.121
by-0.117
Token like
Token ID588
Feature activation+3.84e-02

Pos logit contributions
 plunge+0.021
anger+0.020
stack+0.019
ses+0.019
aroo+0.018
Neg logit contributions
aron-0.021
bo-0.021
en-0.021
y-0.021
ce-0.019
Token that
Token ID326
Feature activation-3.35e-02

Pos logit contributions
aron+0.070
bo+0.067
y+0.064
en+0.061
ce+0.060
Neg logit contributions
anger-0.062
 plunge-0.061
stack-0.059
ses-0.056
aroo-0.056
Token.
Token ID13
Feature activation+9.24e-03

Pos logit contributions
stack+0.068
anger+0.067
 plunge+0.065
ses+0.062
aroo+0.062
Neg logit contributions
bo-0.045
aron-0.044
en-0.041
by-0.041
Em-0.040
Token It
Token ID632
Feature activation-8.66e-02

Pos logit contributions
bo+0.016
y+0.016
aron+0.016
en+0.015
by+0.014
Neg logit contributions
anger-0.016
stack-0.015
 plunge-0.015
ses-0.014
ittens-0.014
Token his
Token ID465
Feature activation+3.76e-03

Pos logit contributions
stack+0.150
 plunge+0.146
ses+0.146
anger+0.142
sed+0.137
Neg logit contributions
aron-0.144
Em-0.134
y-0.131
en-0.128
Kat-0.128
Tokenses
Token ID8448
Feature activation-2.18e-01

Pos logit contributions
bo+0.008
y+0.008
aron+0.008
en+0.008
by+0.007
Neg logit contributions
anger-0.005
stack-0.005
 plunge-0.005
ittens-0.004
ses-0.004
Token and
Token ID290
Feature activation+8.36e-03

Pos logit contributions
anger+0.452
ses+0.439
 plunge+0.430
stack+0.430
sed+0.428
Neg logit contributions
bo-0.263
aron-0.258
by-0.246
Kat-0.243
en-0.238
Token fl
Token ID781
Feature activation-5.11e-02

Pos logit contributions
bo+0.012
y+0.012
aron+0.012
en+0.012
by+0.011
Neg logit contributions
anger-0.016
stack-0.016
 plunge-0.016
ittens-0.014
ses-0.014
Tokenaps
Token ID1686
Feature activation+1.39e-02

Pos logit contributions
stack+0.084
 plunge+0.079
anger+0.076
 treadmill+0.069
 stool+0.067
Neg logit contributions
y-0.100
bo-0.099
by-0.089
aron-0.086
en-0.084
Token its
Token ID663
Feature activation-6.93e-02

Pos logit contributions
bo+0.021
aron+0.021
y+0.020
en+0.020
by+0.019
Neg logit contributions
anger-0.026
stack-0.025
 plunge-0.025
ses-0.023
ittens-0.023
Token wings
Token ID12098
Feature activation-4.51e-02

Pos logit contributions
anger+0.082
stack+0.080
 plunge+0.079
ses+0.074
ittens+0.069
Neg logit contributions
bo-0.151
y-0.151
aron-0.150
en-0.149
by-0.140
Token her
Token ID607
Feature activation-2.22e-02

Pos logit contributions
bo+0.089
y+0.089
aron+0.088
en+0.085
by+0.081
Neg logit contributions
anger-0.124
stack-0.121
 plunge-0.118
ittens-0.111
ses-0.108
Token hair
Token ID4190
Feature activation+9.54e-02

Pos logit contributions
anger+0.036
 plunge+0.034
stack+0.033
ittens+0.031
ses+0.030
Neg logit contributions
bo-0.041
y-0.040
aron-0.040
en-0.040
by-0.037
Token in
Token ID287
Feature activation+2.42e-02

Pos logit contributions
aron+0.126
bo+0.125
en+0.122
y+0.121
by+0.112
Neg logit contributions
anger-0.201
stack-0.191
 plunge-0.188
ittens-0.178
aroo-0.172
Token a
Token ID257
Feature activation-1.98e-02

Pos logit contributions
bo+0.039
y+0.038
aron+0.037
by+0.035
en+0.035
Neg logit contributions
stack-0.044
anger-0.043
 plunge-0.043
-0.039
ittens-0.039
Token pony
Token ID26902
Feature activation-5.97e-02

Pos logit contributions
anger+0.030
stack+0.029
 plunge+0.029
ittens+0.026
ute+0.024
Neg logit contributions
aron-0.038
bo-0.037
en-0.037
y-0.037
by-0.034
Tokentail
Token ID13199
Feature activation-2.17e-01

Pos logit contributions
anger+0.135
stack+0.130
 plunge+0.129
ses+0.123
ittens+0.121
Neg logit contributions
bo-0.072
y-0.072
aron-0.068
en-0.066
by-0.061
Token so
Token ID523
Feature activation-3.20e-02

Pos logit contributions
anger+0.512
stack+0.501
 plunge+0.477
ittens+0.458
ses+0.447
Neg logit contributions
bo-0.237
y-0.227
en-0.222
aron-0.222
by-0.207
Token she
Token ID673
Feature activation+3.62e-02

Pos logit contributions
anger+0.048
stack+0.047
 plunge+0.045
ittens+0.041
ses+0.041
Neg logit contributions
bo-0.062
y-0.061
aron-0.061
en-0.059
by-0.056
Token could
Token ID714
Feature activation-3.44e-02

Pos logit contributions
aron+0.047
en+0.046
y+0.044
bo+0.044
by+0.039
Neg logit contributions
anger-0.075
stack-0.073
 plunge-0.071
aroo-0.067
ittens-0.067
Token look
Token ID804
Feature activation+2.92e-02

Pos logit contributions
anger+0.060
stack+0.058
 plunge+0.056
ses+0.054
ittens+0.052
Neg logit contributions
bo-0.060
y-0.058
aron-0.057
en-0.056
by-0.052
Token pretty
Token ID2495
Feature activation+3.56e-02

Pos logit contributions
bo+0.053
aron+0.050
y+0.049
en+0.047
ud+0.046
Neg logit contributions
anger-0.045
 plunge-0.044
stack-0.039
 yes-0.039
 hi-0.039
Token is
Token ID318
Feature activation+1.95e-02

Pos logit contributions
stack+0.139
ses+0.139
anger+0.135
 plunge+0.133
sed+0.128
Neg logit contributions
aron-0.142
y-0.133
by-0.128
bo-0.128
en-0.128
Token angry
Token ID7954
Feature activation-2.75e-02

Pos logit contributions
bo+0.035
y+0.034
en+0.034
aron+0.033
by+0.033
Neg logit contributions
stack-0.032
anger-0.031
ses-0.031
 plunge-0.030
sed-0.029
Token.
Token ID13
Feature activation+1.58e-02

Pos logit contributions
anger+0.057
ses+0.051
 plunge+0.051
ittens+0.050
stack+0.050
Neg logit contributions
bo-0.036
by-0.035
aron-0.033
idi-0.032
en-0.031
Token It
Token ID632
Feature activation-8.09e-02

Pos logit contributions
y+0.027
bo+0.027
en+0.026
aron+0.026
by+0.026
Neg logit contributions
anger-0.027
stack-0.026
 plunge-0.025
ses-0.024
ittens-0.023
Token his
Token ID465
Feature activation+1.40e-02

Pos logit contributions
anger+0.138
ses+0.135
stack+0.134
 plunge+0.131
sed+0.126
Neg logit contributions
aron-0.136
y-0.133
bo-0.129
en-0.128
by-0.123
Tokenses
Token ID8448
Feature activation-2.17e-01

Pos logit contributions
bo+0.038
y+0.038
aron+0.037
en+0.036
by+0.036
Neg logit contributions
anger-0.010
stack-0.009
 plunge-0.009
ittens-0.007
aroo-0.006
Token.
Token ID13
Feature activation+1.84e-02

Pos logit contributions
anger+0.449
ses+0.445
sed+0.423
stack+0.414
 plunge+0.413
Neg logit contributions
bo-0.258
by-0.255
aron-0.253
Em-0.249
Kat-0.248
Token It
Token ID632
Feature activation-8.03e-02

Pos logit contributions
bo+0.030
y+0.029
aron+0.029
en+0.028
by+0.026
Neg logit contributions
anger-0.034
stack-0.033
 plunge-0.032
ses-0.030
ittens-0.030
Token scratches
Token ID37627
Feature activation+3.16e-02

Pos logit contributions
anger+0.149
stack+0.145
 plunge+0.142
ses+0.140
sed+0.131
Neg logit contributions
aron-0.124
y-0.123
bo-0.121
en-0.118
by-0.112
Token.
Token ID13
Feature activation+1.99e-02

Pos logit contributions
bo+0.043
y+0.042
aron+0.041
en+0.040
by+0.038
Neg logit contributions
anger-0.066
stack-0.064
 plunge-0.063
ittens-0.059
ses-0.059
Token It
Token ID632
Feature activation-7.83e-02

Pos logit contributions
bo+0.031
aron+0.030
y+0.030
en+0.028
by+0.026
Neg logit contributions
anger-0.038
stack-0.037
 plunge-0.037
ittens-0.034
ses-0.034
Token,
Token ID11
Feature activation-1.11e-02

Pos logit contributions
anger+0.046
stack+0.045
 plunge+0.045
ishly+0.041
ses+0.039
Neg logit contributions
aron-0.045
bo-0.045
y-0.040
en-0.040
by-0.037
Token Emma
Token ID18966
Feature activation+6.37e-02

Pos logit contributions
anger+0.021
stack+0.021
 plunge+0.020
+0.019
ishly+0.019
Neg logit contributions
bo-0.017
aron-0.017
y-0.016
en-0.015
by-0.015
Token decided
Token ID3066
Feature activation-2.75e-03

Pos logit contributions
aron+0.083
able+0.074
 gre+0.073
older+0.073
ams+0.072
Neg logit contributions
ishly-0.111
stack-0.107
anger-0.103
-0.099
icking-0.095
Token a
Token ID257
Feature activation-1.92e-02

Pos logit contributions
anger+0.005
stack+0.005
 plunge+0.005
ishly+0.004
 hi+0.004
Neg logit contributions
bo-0.005
aron-0.005
y-0.004
en-0.004
ud-0.004
Token pony
Token ID26902
Feature activation-5.93e-02

Pos logit contributions
anger+0.031
 plunge+0.031
stack+0.030
ses+0.027
ittens+0.027
Neg logit contributions
en-0.035
aron-0.035
bo-0.034
y-0.034
ams-0.029
Tokentail
Token ID13199
Feature activation-2.16e-01

Pos logit contributions
anger+0.135
stack+0.133
 plunge+0.129
ses+0.121
ittens+0.120
Neg logit contributions
bo-0.069
aron-0.067
y-0.067
en-0.064
by-0.059
Token wasn
Token ID2492
Feature activation-6.24e-03

Pos logit contributions
anger+0.523
stack+0.497
 plunge+0.479
aroo+0.450
ses+0.446
Neg logit contributions
bo-0.244
aron-0.232
en-0.218
y-0.212
by-0.198
Token't
Token ID470
Feature activation-5.14e-02

Pos logit contributions
anger+0.007
stack+0.007
ses+0.007
 plunge+0.006
sed+0.006
Neg logit contributions
bo-0.015
y-0.014
en-0.014
aron-0.014
ce-0.013
Token so
Token ID523
Feature activation-3.06e-02

Pos logit contributions
 plunge+0.074
anger+0.071
aroo+0.067
ble+0.066
stack+0.065
Neg logit contributions
bo-0.090
aron-0.088
en-0.087
by-0.085
y-0.083
Token important
Token ID1593
Feature activation+5.51e-02

Pos logit contributions
anger+0.045
 plunge+0.045
aroo+0.041
vere+0.039
 hi+0.039
Neg logit contributions
aron-0.056
en-0.056
bo-0.054
y-0.053
by-0.049
Token after
Token ID706
Feature activation-7.19e-02

Pos logit contributions
bo+0.082
y+0.068
aron+0.067
en+0.065
by+0.065
Neg logit contributions
anger-0.118
stack-0.108
 plunge-0.107
ittens-0.105
ishly-0.099
Tokenked
Token ID9091
Feature activation-1.27e-01

Pos logit contributions
anger+0.114
stack+0.108
 plunge+0.103
ute+0.101
ittens+0.099
Neg logit contributions
bo-0.054
y-0.054
en-0.054
aron-0.049
by-0.046
Token tongue
Token ID11880
Feature activation-2.45e-02

Pos logit contributions
anger+0.208
stack+0.202
 plunge+0.196
ses+0.182
ittens+0.179
Neg logit contributions
bo-0.229
y-0.226
aron-0.224
en-0.218
by-0.208
Token.
Token ID13
Feature activation+1.02e-03

Pos logit contributions
anger+0.057
stack+0.055
 plunge+0.054
ses+0.051
ittens+0.051
Neg logit contributions
bo-0.026
aron-0.025
y-0.024
by-0.024
en-0.024
Token It
Token ID632
Feature activation-8.58e-02

Pos logit contributions
bo+0.002
y+0.002
en+0.002
by+0.002
aron+0.002
Neg logit contributions
anger-0.002
stack-0.002
 plunge-0.002
ses-0.002
ittens-0.002
Token his
Token ID465
Feature activation+4.80e-03

Pos logit contributions
stack+0.157
anger+0.155
 plunge+0.153
ses+0.152
sed+0.143
Neg logit contributions
aron-0.132
y-0.127
bo-0.125
en-0.122
by-0.119
Tokenses
Token ID8448
Feature activation-2.16e-01

Pos logit contributions
y+0.012
bo+0.011
by+0.011
aron+0.011
en+0.010
Neg logit contributions
anger-0.005
stack-0.004
 plunge-0.004
otion-0.004
fs-0.004
Token and
Token ID290
Feature activation-4.04e-03

Pos logit contributions
anger+0.427
ses+0.419
sed+0.403
 plunge+0.398
ishly+0.391
Neg logit contributions
by-0.274
bo-0.274
Kat-0.265
aron-0.265
Em-0.260
Token swim
Token ID9422
Feature activation-4.50e-02

Pos logit contributions
anger+0.008
stack+0.008
 plunge+0.008
ittens+0.007
aroo+0.007
Neg logit contributions
bo-0.006
y-0.006
aron-0.006
en-0.006
by-0.005
Tokens
Token ID82
Feature activation-3.18e-02

Pos logit contributions
anger+0.094
stack+0.090
 plunge+0.089
ses+0.085
ittens+0.081
Neg logit contributions
aron-0.061
bo-0.060
en-0.058
y-0.057
by-0.052
Token near
Token ID1474
Feature activation-5.03e-02

Pos logit contributions
anger+0.076
stack+0.074
 plunge+0.073
ittens+0.069
ses+0.069
Neg logit contributions
bo-0.034
y-0.034
aron-0.032
en-0.031
by-0.029
Token the
Token ID262
Feature activation+1.41e-03

Pos logit contributions
anger+0.101
stack+0.101
 plunge+0.098
ses+0.092
ittens+0.090
Neg logit contributions
bo-0.071
y-0.069
aron-0.069
en-0.065
by-0.062
Token nose
Token ID9686
Feature activation-6.40e-02

Pos logit contributions
anger+0.021
stack+0.020
 plunge+0.020
ses+0.018
ittens+0.018
Neg logit contributions
bo-0.028
y-0.027
aron-0.027
en-0.026
by-0.025
Token.
Token ID13
Feature activation+1.46e-02

Pos logit contributions
anger+0.124
stack+0.116
 plunge+0.115
ishly+0.115
ittens+0.114
Neg logit contributions
Em-0.078
en-0.077
aron-0.077
by-0.075
bo-0.075
Token The
Token ID383
Feature activation-4.76e-02

Pos logit contributions
bo+0.021
y+0.021
aron+0.020
en+0.020
by+0.019
Neg logit contributions
anger-0.029
stack-0.029
 plunge-0.028
ses-0.026
ittens-0.026
Token cat
Token ID3797
Feature activation-8.32e-02

Pos logit contributions
anger+0.076
stack+0.073
 plunge+0.070
ittens+0.064
aroo+0.061
Neg logit contributions
y-0.090
bo-0.089
aron-0.084
en-0.082
by-0.081
Token his
Token ID465
Feature activation+1.17e-02

Pos logit contributions
stack+0.140
ses+0.139
anger+0.137
 plunge+0.135
sed+0.132
Neg logit contributions
aron-0.129
y-0.127
bo-0.127
Em-0.126
Kat-0.123
Tokenses
Token ID8448
Feature activation-2.16e-01

Pos logit contributions
bo+0.025
y+0.024
aron+0.024
by+0.023
en+0.022
Neg logit contributions
anger-0.016
stack-0.016
 plunge-0.015
otion-0.013
aroo-0.013
Token and
Token ID290
Feature activation+1.33e-02

Pos logit contributions
ses+0.381
ishly+0.374
anger+0.370
sed+0.363
 hi+0.360
Neg logit contributions
Em-0.302
Kat-0.300
nie-0.293
by-0.292
bo-0.288
Token scratches
Token ID37627
Feature activation+2.48e-02

Pos logit contributions
bo+0.018
y+0.017
aron+0.017
en+0.017
by+0.015
Neg logit contributions
anger-0.029
stack-0.028
 plunge-0.027
ittens-0.025
aroo-0.024
Token his
Token ID465
Feature activation+1.18e-02

Pos logit contributions
bo+0.036
aron+0.035
y+0.034
en+0.033
by+0.033
Neg logit contributions
anger-0.049
stack-0.047
 plunge-0.046
ses-0.043
ittens-0.043
Token face
Token ID1986
Feature activation+3.21e-02

Pos logit contributions
bo+0.023
y+0.022
aron+0.021
en+0.021
by+0.020
Neg logit contributions
anger-0.019
stack-0.019
 plunge-0.018
ittens-0.016
aroo-0.016
Token.
Token ID13
Feature activation+1.44e-02

Pos logit contributions
Em+0.039
by+0.039
aron+0.039
bo+0.038
en+0.037
Neg logit contributions
anger-0.063
stack-0.060
 plunge-0.058
ishly-0.058
ittens-0.057
Token it
Token ID340
Feature activation-8.25e-03

Pos logit contributions
anger+0.037
stack+0.036
ittens+0.035
ses+0.035
 plunge+0.035
Neg logit contributions
bo-0.023
y-0.023
en-0.022
aron-0.020
by-0.019
Token.
Token ID13
Feature activation+1.67e-02

Pos logit contributions
anger+0.022
stack+0.022
 plunge+0.021
ittens+0.021
ses+0.020
Neg logit contributions
y-0.007
bo-0.007
aron-0.006
en-0.006
by-0.005
Token The
Token ID383
Feature activation-4.87e-02

Pos logit contributions
bo+0.023
y+0.023
aron+0.023
en+0.022
by+0.021
Neg logit contributions
anger-0.034
stack-0.033
 plunge-0.033
ses-0.031
ittens-0.031
Token cat
Token ID3797
Feature activation-8.36e-02

Pos logit contributions
anger+0.086
stack+0.083
 plunge+0.082
ittens+0.076
+0.075
Neg logit contributions
y-0.082
bo-0.080
aron-0.074
by-0.073
 heard-0.072
Token his
Token ID465
Feature activation+9.39e-03

Pos logit contributions
anger+0.154
stack+0.152
 plunge+0.148
ses+0.145
sed+0.137
Neg logit contributions
aron-0.127
bo-0.127
y-0.126
en-0.122
by-0.117
Tokenses
Token ID8448
Feature activation-2.15e-01

Pos logit contributions
y+0.025
bo+0.024
aron+0.024
by+0.024
en+0.023
Neg logit contributions
anger-0.007
stack-0.007
 plunge-0.007
otion-0.005
-0.005
Token and
Token ID290
Feature activation+1.24e-02

Pos logit contributions
anger+0.444
ses+0.431
 plunge+0.424
ishly+0.416
stack+0.412
Neg logit contributions
bo-0.258
by-0.249
aron-0.247
Em-0.236
nie-0.235
Token scratches
Token ID37627
Feature activation+2.39e-02

Pos logit contributions
bo+0.017
y+0.016
aron+0.016
en+0.016
 cared+0.014
Neg logit contributions
anger-0.027
stack-0.025
 plunge-0.025
ittens-0.023
ute-0.023
Token him
Token ID683
Feature activation-8.66e-02

Pos logit contributions
bo+0.033
y+0.033
aron+0.032
en+0.031
by+0.030
Neg logit contributions
anger-0.049
stack-0.047
 plunge-0.046
ittens-0.043
ses-0.043
Token.
Token ID13
Feature activation+1.55e-02

Pos logit contributions
anger+0.215
stack+0.210
 plunge+0.207
ses+0.196
ittens+0.196
Neg logit contributions
bo-0.084
y-0.083
aron-0.080
en-0.077
by-0.069
Token Ben
Token ID3932
Feature activation+3.49e-02

Pos logit contributions
bo+0.022
y+0.022
aron+0.021
en+0.021
by+0.019
Neg logit contributions
anger-0.031
stack-0.031
 plunge-0.030
ses-0.028
ittens-0.028
Token scare
Token ID19437
Feature activation+5.37e-03

Pos logit contributions
bo+0.050
y+0.050
aron+0.049
en+0.048
by+0.046
Neg logit contributions
anger-0.043
stack-0.041
 plunge-0.040
ittens-0.037
ses-0.037
Token it
Token ID340
Feature activation-6.84e-03

Pos logit contributions
aron+0.009
y+0.009
bo+0.009
en+0.008
Em+0.008
Neg logit contributions
anger-0.009
 plunge-0.009
stack-0.008
aroo-0.008
ses-0.008
Token.
Token ID13
Feature activation+2.32e-03

Pos logit contributions
anger+0.015
stack+0.015
 plunge+0.014
ses+0.013
ittens+0.013
Neg logit contributions
bo-0.008
aron-0.008
y-0.008
en-0.008
by-0.007
Token It
Token ID632
Feature activation-8.29e-02

Pos logit contributions
bo+0.004
y+0.004
aron+0.004
en+0.004
by+0.004
Neg logit contributions
anger-0.004
stack-0.004
 plunge-0.004
ses-0.004
ittens-0.004
Token his
Token ID465
Feature activation+8.16e-03

Pos logit contributions
stack+0.144
anger+0.144
 plunge+0.143
ses+0.140
sed+0.131
Neg logit contributions
aron-0.133
y-0.130
bo-0.127
en-0.122
Em-0.117
Tokenses
Token ID8448
Feature activation-2.15e-01

Pos logit contributions
bo+0.017
y+0.017
by+0.017
aron+0.016
 cared+0.016
Neg logit contributions
anger-0.010
stack-0.010
 plunge-0.010
ittens-0.009
otion-0.008
Token and
Token ID290
Feature activation+2.01e-03

Pos logit contributions
anger+0.449
ses+0.429
 plunge+0.425
ishly+0.419
stack+0.414
Neg logit contributions
bo-0.263
aron-0.253
by-0.244
Kat-0.238
Em-0.233
Token shows
Token ID2523
Feature activation+2.24e-02

Pos logit contributions
bo+0.003
y+0.003
en+0.003
aron+0.003
by+0.002
Neg logit contributions
anger-0.004
stack-0.004
 plunge-0.004
ittens-0.004
ute-0.004
Token its
Token ID663
Feature activation-5.79e-02

Pos logit contributions
bo+0.034
y+0.034
aron+0.034
en+0.032
by+0.030
Neg logit contributions
anger-0.043
stack-0.041
 plunge-0.041
ses-0.038
ittens-0.037
Token teeth
Token ID9941
Feature activation-1.29e-02

Pos logit contributions
anger+0.112
stack+0.109
 plunge+0.103
ittens+0.097
+0.096
Neg logit contributions
y-0.090
bo-0.090
aron-0.083
en-0.082
by-0.082
Token.
Token ID13
Feature activation+3.61e-03

Pos logit contributions
anger+0.027
stack+0.026
 plunge+0.025
ses+0.024
ishly+0.024
Neg logit contributions
bo-0.016
aron-0.016
Em-0.015
by-0.014
en-0.014
Token slide
Token ID10649
Feature activation-6.00e-03

Pos logit contributions
bo+0.006
y+0.006
aron+0.006
en+0.005
by+0.005
Neg logit contributions
anger-0.003
stack-0.003
 plunge-0.003
ittens-0.003
ses-0.003
Token.
Token ID13
Feature activation+1.12e-02

Pos logit contributions
anger+0.014
stack+0.013
 plunge+0.013
ittens+0.012
ses+0.012
Neg logit contributions
aron-0.007
bo-0.007
by-0.006
en-0.006
y-0.006
Token The
Token ID383
Feature activation-5.29e-02

Pos logit contributions
bo+0.016
y+0.016
aron+0.015
en+0.015
by+0.014
Neg logit contributions
anger-0.023
stack-0.022
 plunge-0.022
ses-0.021
ittens-0.021
Token cat
Token ID3797
Feature activation-8.52e-02

Pos logit contributions
anger+0.102
stack+0.100
 plunge+0.099
ancing+0.095
ishly+0.090
Neg logit contributions
y-0.079
bo-0.073
 heard-0.069
 time-0.068
 had-0.065
Token his
Token ID465
Feature activation+6.32e-03

Pos logit contributions
ses+0.133
anger+0.132
stack+0.131
sed+0.125
 plunge+0.123
Neg logit contributions
Em-0.143
aron-0.142
by-0.140
y-0.139
en-0.138
Tokenses
Token ID8448
Feature activation-2.15e-01

Pos logit contributions
bo+0.013
y+0.013
aron+0.013
by+0.013
en+0.012
Neg logit contributions
anger-0.008
stack-0.008
 plunge-0.008
otion-0.007
fs-0.006
Token at
Token ID379
Feature activation-4.33e-02

Pos logit contributions
anger+0.424
ses+0.412
 plunge+0.400
sed+0.390
stack+0.383
Neg logit contributions
bo-0.276
by-0.271
aron-0.270
Em-0.269
Kat-0.260
Token him
Token ID683
Feature activation-8.78e-02

Pos logit contributions
anger+0.091
stack+0.087
 plunge+0.086
ittens+0.081
ses+0.080
Neg logit contributions
bo-0.061
y-0.059
aron-0.057
en-0.057
by-0.053
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
anger+0.229
stack+0.224
 plunge+0.221
ses+0.209
ittens+0.209
Neg logit contributions
bo-0.075
y-0.074
aron-0.070
en-0.067
by-0.059
Token Tom
Token ID4186
Feature activation-8.39e-03

Pos logit contributions
bo+0.013
aron+0.012
y+0.012
en+0.011
 cared+0.010
Neg logit contributions
anger-0.023
stack-0.022
 plunge-0.022
ittens-0.021
ses-0.020
Token is
Token ID318
Feature activation+1.44e-02

Pos logit contributions
ses+0.014
anger+0.014
stack+0.014
 plunge+0.014
sed+0.013
Neg logit contributions
aron-0.013
bo-0.013
y-0.013
Em-0.013
by-0.013
Token\n
Token ID198
Feature activation+2.94e-03

Pos logit contributions
bo+0.003
y+0.003
aron+0.003
en+0.003
by+0.003
Neg logit contributions
anger-0.004
stack-0.004
 plunge-0.004
ses-0.004
ittens-0.004
Token\n
Token ID198
Feature activation+3.26e-03

Pos logit contributions
bo+0.003
y+0.003
aron+0.003
en+0.003
by+0.003
Neg logit contributions
anger-0.007
stack-0.006
 plunge-0.006
ittens-0.006
ses-0.006
TokenThe
Token ID464
Feature activation-4.21e-02

Pos logit contributions
aron+0.004
y+0.004
en+0.004
bo+0.004
by+0.004
Neg logit contributions
anger-0.007
stack-0.006
 plunge-0.006
ittens-0.006
aroo-0.006
Token snake
Token ID17522
Feature activation-3.18e-02

Pos logit contributions
anger+0.060
stack+0.056
 plunge+0.053
ittens+0.052
ishly+0.049
Neg logit contributions
y-0.086
bo-0.085
aron-0.081
en-0.081
by-0.079
Token his
Token ID465
Feature activation+6.90e-03

Pos logit contributions
stack+0.058
anger+0.057
ses+0.056
 plunge+0.055
sed+0.053
Neg logit contributions
aron-0.050
y-0.047
bo-0.046
en-0.046
Em-0.044
Tokenses
Token ID8448
Feature activation-2.15e-01

Pos logit contributions
y+0.016
by+0.016
bo+0.016
aron+0.015
en+0.015
Neg logit contributions
anger-0.007
stack-0.007
 plunge-0.007
ittens-0.006
otion-0.006
Token.
Token ID13
Feature activation+2.81e-03

Pos logit contributions
anger+0.433
ses+0.418
 plunge+0.409
ishly+0.403
sed+0.397
Neg logit contributions
bo-0.261
Em-0.257
Kat-0.257
aron-0.256
by-0.256
Token L
Token ID406
Feature activation+2.94e-02

Pos logit contributions
bo+0.004
aron+0.004
y+0.004
en+0.004
by+0.004
Neg logit contributions
stack-0.005
anger-0.005
 plunge-0.005
ittens-0.005
aroo-0.005
Tokenila
Token ID10102
Feature activation-1.69e-02

Pos logit contributions
y+0.047
bo+0.043
en+0.043
aron+0.042
ce+0.039
Neg logit contributions
stack-0.061
ishly-0.056
 plunge-0.055
ses-0.053
sing-0.051
Token and
Token ID290
Feature activation+3.26e-04

Pos logit contributions
anger+0.040
stack+0.039
 plunge+0.038
ses+0.038
ishly+0.036
Neg logit contributions
bo-0.017
y-0.016
aron-0.016
en-0.015
by-0.015
Token Tom
Token ID4186
Feature activation-1.09e-02

Pos logit contributions
bo+0.000
y+0.000
aron+0.000
en+0.000
by+0.000
Neg logit contributions
anger-0.001
stack-0.001
 plunge-0.001
ittens-0.001
ute-0.001
Token a
Token ID257
Feature activation-2.38e-02

Pos logit contributions
y+0.061
bo+0.061
aron+0.059
en+0.058
by+0.054
Neg logit contributions
anger-0.083
stack-0.082
 plunge-0.079
ses-0.075
ittens-0.074
Token cat
Token ID3797
Feature activation-8.60e-02

Pos logit contributions
stack+0.037
anger+0.036
 plunge+0.035
ishly+0.031
ittens+0.031
Neg logit contributions
y-0.047
bo-0.045
aron-0.044
by-0.043
en-0.042
Token.
Token ID13
Feature activation-1.97e-03

Pos logit contributions
anger+0.214
stack+0.210
 plunge+0.206
ses+0.195
ittens+0.195
Neg logit contributions
bo-0.084
y-0.083
aron-0.079
en-0.076
by-0.068
Token She
Token ID1375
Feature activation-1.75e-02

Pos logit contributions
anger+0.004
stack+0.004
 plunge+0.004
ses+0.003
ittens+0.003
Neg logit contributions
bo-0.003
y-0.003
aron-0.003
en-0.003
by-0.003
Token his
Token ID465
Feature activation+5.19e-03

Pos logit contributions
anger+0.036
stack+0.035
 plunge+0.034
ses+0.032
ittens+0.032
Neg logit contributions
bo-0.024
y-0.024
aron-0.024
en-0.023
by-0.021
Tokenses
Token ID8448
Feature activation-2.15e-01

Pos logit contributions
 time+0.010
y+0.010
by+0.010
bo+0.009
 enjoyed+0.009
Neg logit contributions
anger-0.007
 plunge-0.007
otion-0.007
stack-0.007
fs-0.006
Token and
Token ID290
Feature activation-3.23e-03

Pos logit contributions
anger+0.457
 plunge+0.435
ses+0.433
stack+0.422
sed+0.416
Neg logit contributions
bo-0.262
aron-0.252
en-0.240
y-0.237
by-0.237
Token scratches
Token ID37627
Feature activation+2.29e-02

Pos logit contributions
anger+0.008
stack+0.007
 plunge+0.007
ute+0.006
ittens+0.006
Neg logit contributions
bo-0.004
en-0.004
y-0.004
aron-0.004
 sav-0.004
Token at
Token ID379
Feature activation-4.39e-02

Pos logit contributions
bo+0.031
y+0.031
aron+0.030
en+0.029
by+0.028
Neg logit contributions
anger-0.048
stack-0.046
 plunge-0.045
ittens-0.042
ses-0.042
Token the
Token ID262
Feature activation+3.77e-03

Pos logit contributions
anger+0.105
stack+0.099
 plunge+0.096
vere+0.095
ittens+0.095
Neg logit contributions
bo-0.051
y-0.049
en-0.047
by-0.043
aron-0.041
Token dog
Token ID3290
Feature activation-9.40e-02

Pos logit contributions
y+0.007
bo+0.006
aron+0.006
by+0.006
ny+0.006
Neg logit contributions
anger-0.006
stack-0.006
 plunge-0.006
ittens-0.006
-0.005
Token angry
Token ID7954
Feature activation-2.73e-02

Pos logit contributions
stack+0.018
ses+0.018
 plunge+0.017
anger+0.017
sed+0.016
Neg logit contributions
aron-0.018
bo-0.017
en-0.017
y-0.017
by-0.017
Token snake
Token ID17522
Feature activation-3.22e-02

Pos logit contributions
anger+0.044
stack+0.042
 plunge+0.041
ses+0.039
ittens+0.038
Neg logit contributions
bo-0.050
y-0.049
aron-0.049
en-0.047
by-0.045
Token.
Token ID13
Feature activation+6.61e-03

Pos logit contributions
anger+0.071
stack+0.071
 plunge+0.067
ses+0.066
aroo+0.064
Neg logit contributions
aron-0.039
bo-0.038
en-0.036
y-0.036
by-0.034
Token It
Token ID632
Feature activation-8.31e-02

Pos logit contributions
bo+0.009
aron+0.009
y+0.009
en+0.008
 added+0.008
Neg logit contributions
 plunge-0.013
anger-0.013
stack-0.013
ittens-0.012
ses-0.012
Token his
Token ID465
Feature activation+7.49e-03

Pos logit contributions
stack+0.147
anger+0.146
ses+0.143
 plunge+0.143
sed+0.135
Neg logit contributions
aron-0.133
y-0.130
bo-0.125
en-0.123
by-0.121
Tokenses
Token ID8448
Feature activation-2.14e-01

Pos logit contributions
bo+0.018
y+0.018
aron+0.018
en+0.018
by+0.017
Neg logit contributions
anger-0.007
stack-0.007
 plunge-0.007
ittens-0.006
ishly-0.005
Token and
Token ID290
Feature activation+4.04e-03

Pos logit contributions
anger+0.465
ses+0.446
 plunge+0.438
sed+0.427
stack+0.427
Neg logit contributions
bo-0.251
aron-0.247
by-0.234
Em-0.226
y-0.223
Token snaps
Token ID23429
Feature activation+4.21e-02

Pos logit contributions
bo+0.006
y+0.005
aron+0.005
en+0.005
by+0.005
Neg logit contributions
anger-0.008
stack-0.008
 plunge-0.008
ittens-0.007
ute-0.007
Token at
Token ID379
Feature activation-3.97e-02

Pos logit contributions
bo+0.062
aron+0.059
y+0.057
en+0.055
by+0.055
Neg logit contributions
anger-0.085
 plunge-0.080
stack-0.079
ses-0.077
sed-0.074
Token Ben
Token ID3932
Feature activation+3.41e-02

Pos logit contributions
anger+0.082
stack+0.080
 plunge+0.078
ses+0.073
ittens+0.072
Neg logit contributions
bo-0.056
y-0.055
en-0.054
aron-0.051
by-0.050
Token.
Token ID13
Feature activation+7.95e-03

Pos logit contributions
bo+0.037
aron+0.036
 cared+0.035
en+0.035
by+0.034
Neg logit contributions
anger-0.075
stack-0.073
 plunge-0.070
ses-0.068
ittens-0.067
Token of
Token ID286
Feature activation-1.27e-02

Pos logit contributions
bo+0.021
aron+0.020
y+0.020
en+0.019
by+0.019
Neg logit contributions
anger-0.032
stack-0.031
 plunge-0.030
ittens-0.028
ses-0.028
Token the
Token ID262
Feature activation+6.40e-03

Pos logit contributions
stack+0.025
anger+0.024
 plunge+0.024
ses+0.022
ittens+0.021
Neg logit contributions
bo-0.020
y-0.019
aron-0.018
by-0.017
en-0.017
Token web
Token ID3992
Feature activation-8.68e-02

Pos logit contributions
bo+0.014
aron+0.014
y+0.014
en+0.013
 cared+0.013
Neg logit contributions
anger-0.008
stack-0.008
 plunge-0.008
ses-0.007
ittens-0.007
Token and
Token ID290
Feature activation+1.22e-02

Pos logit contributions
anger+0.213
stack+0.210
 plunge+0.202
ses+0.194
ittens+0.191
Neg logit contributions
bo-0.086
y-0.081
aron-0.081
en-0.078
by-0.074
Token his
Token ID465
Feature activation+7.04e-03

Pos logit contributions
y+0.018
bo+0.018
aron+0.018
en+0.017
by+0.016
Neg logit contributions
anger-0.023
stack-0.023
 plunge-0.023
ses-0.022
ittens-0.020
Tokenses
Token ID8448
Feature activation-2.14e-01

Pos logit contributions
y+0.014
bo+0.014
aron+0.013
by+0.013
en+0.013
Neg logit contributions
anger-0.010
 plunge-0.009
stack-0.009
ittens-0.008
aroo-0.008
Token at
Token ID379
Feature activation-4.37e-02

Pos logit contributions
anger+0.411
ses+0.389
 plunge+0.379
sed+0.369
stack+0.366
Neg logit contributions
bo-0.293
aron-0.288
Em-0.284
by-0.282
Kat-0.279
Token Ben
Token ID3932
Feature activation+3.57e-02

Pos logit contributions
anger+0.094
stack+0.091
 plunge+0.090
ittens+0.084
ses+0.084
Neg logit contributions
bo-0.058
y-0.057
en-0.055
aron-0.055
by-0.050
Token.
Token ID13
Feature activation+1.63e-02

Pos logit contributions
bo+0.040
 cared+0.039
aron+0.039
en+0.038
y+0.037
Neg logit contributions
anger-0.077
stack-0.076
 plunge-0.072
ses-0.069
ittens-0.069
Token Ben
Token ID3932
Feature activation+3.60e-02

Pos logit contributions
bo+0.024
y+0.023
aron+0.023
en+0.022
by+0.021
Neg logit contributions
anger-0.033
stack-0.032
 plunge-0.031
ittens-0.029
ses-0.029
Token drops
Token ID10532
Feature activation-2.27e-02

Pos logit contributions
bo+0.057
y+0.057
aron+0.056
en+0.054
by+0.053
Neg logit contributions
anger-0.064
stack-0.063
 plunge-0.062
ses-0.061
sed-0.057
Token is
Token ID318
Feature activation+1.41e-02

Pos logit contributions
anger+0.147
stack+0.145
 plunge+0.140
ses+0.138
ishly+0.130
Neg logit contributions
y-0.129
bo-0.129
aron-0.125
en-0.122
by-0.118
Token afraid
Token ID7787
Feature activation-3.09e-02

Pos logit contributions
bo+0.023
y+0.023
en+0.022
aron+0.022
by+0.021
Neg logit contributions
stack-0.024
anger-0.024
 plunge-0.024
ses-0.023
ittens-0.022
Token.
Token ID13
Feature activation+1.63e-02

Pos logit contributions
anger+0.053
 plunge+0.052
ittens+0.051
aroo+0.049
 umbrella+0.049
Neg logit contributions
aron-0.049
bo-0.046
en-0.044
y-0.043
by-0.043
Token It
Token ID632
Feature activation-8.18e-02

Pos logit contributions
aron+0.021
bo+0.020
y+0.018
en+0.016
 cared+0.016
Neg logit contributions
anger-0.037
 plunge-0.036
stack-0.035
ittens-0.033
ute-0.033
Token his
Token ID465
Feature activation+9.81e-03

Pos logit contributions
anger+0.152
stack+0.151
 plunge+0.148
ses+0.143
sed+0.134
Neg logit contributions
y-0.124
bo-0.123
aron-0.122
en-0.119
by-0.111
Tokenses
Token ID8448
Feature activation-2.14e-01

Pos logit contributions
y+0.026
bo+0.026
aron+0.025
by+0.025
en+0.024
Neg logit contributions
anger-0.007
 plunge-0.007
stack-0.007
ittens-0.005
fs-0.005
Token and
Token ID290
Feature activation+1.21e-02

Pos logit contributions
anger+0.462
ses+0.449
 plunge+0.445
stack+0.432
sed+0.427
Neg logit contributions
bo-0.249
aron-0.246
y-0.225
Em-0.225
by-0.225
Token scratches
Token ID37627
Feature activation+2.34e-02

Pos logit contributions
bo+0.015
y+0.014
en+0.014
aron+0.014
 sav+0.013
Neg logit contributions
anger-0.028
stack-0.026
 plunge-0.025
ittens-0.024
ute-0.024
Token Tom
Token ID4186
Feature activation-4.21e-03

Pos logit contributions
bo+0.030
y+0.029
aron+0.029
en+0.028
by+0.027
Neg logit contributions
anger-0.050
stack-0.049
 plunge-0.048
ses-0.045
ittens-0.045
Token's
Token ID338
Feature activation-1.09e-02

Pos logit contributions
anger+0.010
stack+0.010
 plunge+0.010
ses+0.009
ittens+0.009
Neg logit contributions
aron-0.004
bo-0.004
en-0.004
y-0.004
by-0.004
Token hand
Token ID1021
Feature activation-6.32e-02

Pos logit contributions
anger+0.016
stack+0.016
 plunge+0.015
ittens+0.014
ses+0.013
Neg logit contributions
bo-0.022
y-0.022
aron-0.021
en-0.021
by-0.020
Token claws
Token ID28421
Feature activation-4.97e-02

Pos logit contributions
y+0.021
aron+0.021
bo+0.020
en+0.019
by+0.019
Neg logit contributions
anger-0.024
stack-0.024
 plunge-0.024
ses-0.023
sed-0.021
Token and
Token ID290
Feature activation-2.38e-03

Pos logit contributions
anger+0.109
stack+0.105
 plunge+0.104
ses+0.098
ittens+0.096
Neg logit contributions
bo-0.062
y-0.061
aron-0.061
en-0.058
by-0.055
Token bites
Token ID26081
Feature activation-4.99e-02

Pos logit contributions
anger+0.004
stack+0.004
 plunge+0.004
ses+0.004
ittens+0.004
Neg logit contributions
bo-0.004
y-0.004
aron-0.004
en-0.004
by-0.004
Token and
Token ID290
Feature activation-2.17e-03

Pos logit contributions
anger+0.083
 plunge+0.078
stack+0.078
aroo+0.071
ittens+0.071
Neg logit contributions
aron-0.086
bo-0.086
en-0.080
y-0.080
by-0.079
Token his
Token ID465
Feature activation+4.24e-03

Pos logit contributions
stack+0.004
 plunge+0.003
anger+0.003
ses+0.003
sed+0.003
Neg logit contributions
y-0.004
bo-0.004
aron-0.004
en-0.004
by-0.004
Tokenses
Token ID8448
Feature activation-2.14e-01

Pos logit contributions
bo+0.007
y+0.007
by+0.007
aron+0.007
en+0.006
Neg logit contributions
anger-0.007
stack-0.007
 plunge-0.007
otion-0.006
ittens-0.006
Token at
Token ID379
Feature activation-4.37e-02

Pos logit contributions
anger+0.417
 plunge+0.390
ishly+0.389
ses+0.387
stack+0.381
Neg logit contributions
bo-0.283
aron-0.272
Kat-0.270
by-0.267
en-0.261
Token the
Token ID262
Feature activation+1.87e-03

Pos logit contributions
anger+0.091
stack+0.088
 plunge+0.086
ses+0.081
ittens+0.080
Neg logit contributions
bo-0.061
y-0.061
en-0.059
aron-0.059
by-0.054
Token bars
Token ID9210
Feature activation-1.20e-01

Pos logit contributions
y+0.003
bo+0.003
aron+0.003
en+0.003
by+0.003
Neg logit contributions
anger-0.003
stack-0.003
 plunge-0.003
ittens-0.003
-0.003
Token.
Token ID13
Feature activation+1.68e-04

Pos logit contributions
anger+0.327
stack+0.325
 plunge+0.323
ses+0.305
ittens+0.305
Neg logit contributions
y-0.081
bo-0.081
aron-0.077
en-0.069
by-0.054
Token He
Token ID679
Feature activation+1.39e-02

Pos logit contributions
y+0.000
bo+0.000
en+0.000
aron+0.000
by+0.000
Neg logit contributions
anger-0.000
stack-0.000
 plunge-0.000
ses-0.000
ittens-0.000
Token with
Token ID351
Feature activation+2.48e-02

Pos logit contributions
aron+0.082
bo+0.080
y+0.076
en+0.075
by+0.073
Neg logit contributions
stack-0.100
anger-0.098
 plunge-0.094
ishly-0.090
aroo-0.089
Token glasses
Token ID15232
Feature activation-1.71e-03

Pos logit contributions
bo+0.037
y+0.036
aron+0.036
en+0.035
by+0.032
Neg logit contributions
anger-0.048
stack-0.047
 plunge-0.046
ses-0.044
ittens-0.043
Token and
Token ID290
Feature activation+2.07e-03

Pos logit contributions
anger+0.004
stack+0.004
 plunge+0.004
ittens+0.004
ses+0.004
Neg logit contributions
bo-0.002
y-0.002
aron-0.002
en-0.002
by-0.002
Token a
Token ID257
Feature activation-2.04e-02

Pos logit contributions
bo+0.003
y+0.003
aron+0.003
en+0.003
by+0.003
Neg logit contributions
anger-0.004
stack-0.004
 plunge-0.004
ses-0.003
sed-0.003
Token pony
Token ID26902
Feature activation-5.67e-02

Pos logit contributions
anger+0.032
stack+0.031
 plunge+0.030
ses+0.029
ittens+0.028
Neg logit contributions
bo-0.038
en-0.037
y-0.037
aron-0.036
by-0.034
Tokentail
Token ID13199
Feature activation-2.14e-01

Pos logit contributions
anger+0.143
stack+0.139
 plunge+0.138
ittens+0.130
ses+0.130
Neg logit contributions
bo-0.053
y-0.053
aron-0.049
en-0.048
by-0.043
Token.
Token ID13
Feature activation+4.95e-03

Pos logit contributions
anger+0.518
stack+0.508
 plunge+0.487
ses+0.472
ittens+0.464
Neg logit contributions
bo-0.221
y-0.215
aron-0.204
en-0.203
by-0.192
Token She
Token ID1375
Feature activation-1.24e-02

Pos logit contributions
bo+0.009
y+0.009
en+0.009
aron+0.009
by+0.008
Neg logit contributions
anger-0.008
stack-0.008
ses-0.007
 plunge-0.007
ittens-0.007
Token is
Token ID318
Feature activation+1.22e-02

Pos logit contributions
ses+0.022
anger+0.022
stack+0.021
sed+0.021
 plunge+0.021
Neg logit contributions
bo-0.019
aron-0.018
y-0.018
by-0.017
Em-0.017
Token reading
Token ID3555
Feature activation+4.03e-02

Pos logit contributions
bo+0.022
en+0.022
by+0.022
y+0.021
aron+0.021
Neg logit contributions
stack-0.018
ses-0.017
 plunge-0.017
sed-0.016
anger-0.016
Token a
Token ID257
Feature activation-2.00e-02

Pos logit contributions
bo+0.071
y+0.066
aron+0.066
en+0.065
by+0.064
Neg logit contributions
anger-0.065
stack-0.064
 plunge-0.061
ishly-0.057
ittens-0.057
Token and
Token ID290
Feature activation+1.08e-02

Pos logit contributions
anger+0.043
ittens+0.042
 plunge+0.041
 umbrella+0.040
aroo+0.040
Neg logit contributions
aron-0.040
bo-0.039
by-0.038
Kat-0.037
idi-0.036
Token angry
Token ID7954
Feature activation-2.93e-02

Pos logit contributions
bo+0.017
y+0.017
aron+0.016
en+0.016
by+0.015
Neg logit contributions
anger-0.021
stack-0.020
 plunge-0.019
ittens-0.018
ses-0.018
Token.
Token ID13
Feature activation+1.38e-02

Pos logit contributions
anger+0.060
ses+0.056
 plunge+0.055
stack+0.055
ittens+0.053
Neg logit contributions
bo-0.039
aron-0.037
Em-0.035
by-0.035
y-0.035
Token It
Token ID632
Feature activation-8.33e-02

Pos logit contributions
bo+0.024
y+0.024
aron+0.024
en+0.023
by+0.022
Neg logit contributions
anger-0.023
stack-0.023
 plunge-0.022
ittens-0.020
ses-0.020
Token his
Token ID465
Feature activation+7.12e-03

Pos logit contributions
anger+0.149
stack+0.147
 plunge+0.145
ses+0.144
sed+0.135
Neg logit contributions
aron-0.131
y-0.130
bo-0.126
en-0.125
by-0.118
Tokenses
Token ID8448
Feature activation-2.14e-01

Pos logit contributions
y+0.017
bo+0.017
by+0.017
aron+0.017
en+0.016
Neg logit contributions
anger-0.007
stack-0.007
 plunge-0.006
otion-0.005
ishly-0.005
Token and
Token ID290
Feature activation+1.14e-02

Pos logit contributions
anger+0.451
ses+0.444
 plunge+0.428
sed+0.426
stack+0.414
Neg logit contributions
bo-0.249
aron-0.243
Em-0.236
by-0.235
Kat-0.232
Token fl
Token ID781
Feature activation-4.63e-02

Pos logit contributions
bo+0.015
y+0.014
en+0.014
aron+0.014
 sav+0.012
Neg logit contributions
anger-0.025
stack-0.024
 plunge-0.023
ittens-0.022
ute-0.022
Tokenaps
Token ID1686
Feature activation+1.65e-02

Pos logit contributions
stack+0.096
 plunge+0.095
anger+0.083
fs+0.081
ishly+0.079
Neg logit contributions
y-0.070
bo-0.069
by-0.062
aron-0.059
ud-0.058
Token its
Token ID663
Feature activation-6.26e-02

Pos logit contributions
bo+0.025
aron+0.024
y+0.024
en+0.023
by+0.022
Neg logit contributions
anger-0.031
stack-0.030
 plunge-0.030
ses-0.028
ittens-0.027
Token wings
Token ID12098
Feature activation-4.30e-02

Pos logit contributions
anger+0.088
stack+0.085
 plunge+0.082
ses+0.076
ittens+0.074
Neg logit contributions
bo-0.127
y-0.127
aron-0.125
en-0.123
by-0.117
Token is
Token ID318
Feature activation+1.42e-02

Pos logit contributions
anger+0.145
stack+0.143
 plunge+0.138
ses+0.137
sed+0.128
Neg logit contributions
y-0.128
bo-0.128
aron-0.124
en-0.121
by-0.117
Token afraid
Token ID7787
Feature activation-3.03e-02

Pos logit contributions
bo+0.023
y+0.023
en+0.022
aron+0.022
by+0.021
Neg logit contributions
anger-0.025
stack-0.025
 plunge-0.025
ses-0.024
ittens-0.023
Token.
Token ID13
Feature activation+1.47e-02

Pos logit contributions
anger+0.052
 plunge+0.050
ittens+0.049
aroo+0.047
stack+0.047
Neg logit contributions
aron-0.048
bo-0.046
en-0.043
by-0.043
y-0.042
Token It
Token ID632
Feature activation-8.21e-02

Pos logit contributions
aron+0.019
bo+0.018
y+0.016
en+0.015
 cared+0.015
Neg logit contributions
anger-0.033
 plunge-0.033
stack-0.032
 footing-0.030
ittens-0.030
Token his
Token ID465
Feature activation+9.74e-03

Pos logit contributions
anger+0.152
stack+0.151
 plunge+0.148
ses+0.143
sed+0.134
Neg logit contributions
y-0.125
bo-0.123
aron-0.123
en-0.119
by-0.112
Tokenses
Token ID8448
Feature activation-2.14e-01

Pos logit contributions
y+0.025
bo+0.025
by+0.024
aron+0.024
 time+0.023
Neg logit contributions
anger-0.008
 plunge-0.007
stack-0.007
otion-0.006
aroo-0.006
Token and
Token ID290
Feature activation+9.82e-03

Pos logit contributions
anger+0.478
ses+0.462
 plunge+0.459
stack+0.450
sed+0.440
Neg logit contributions
bo-0.237
aron-0.233
y-0.216
en-0.213
by-0.212
Token scratches
Token ID37627
Feature activation+2.52e-02

Pos logit contributions
bo+0.012
y+0.012
en+0.012
aron+0.011
 sav+0.011
Neg logit contributions
anger-0.022
stack-0.021
 plunge-0.020
ittens-0.019
ute-0.019
Token Tom
Token ID4186
Feature activation-4.83e-03

Pos logit contributions
bo+0.033
y+0.032
aron+0.032
en+0.030
by+0.029
Neg logit contributions
anger-0.054
stack-0.052
 plunge-0.051
ses-0.048
ittens-0.048
Token's
Token ID338
Feature activation-1.27e-02

Pos logit contributions
anger+0.011
stack+0.011
 plunge+0.011
ses+0.011
ittens+0.011
Neg logit contributions
aron-0.004
bo-0.004
en-0.004
y-0.004
by-0.004
Token hand
Token ID1021
Feature activation-6.50e-02

Pos logit contributions
anger+0.019
stack+0.018
 plunge+0.018
ittens+0.016
aroo+0.016
Neg logit contributions
bo-0.025
y-0.025
aron-0.024
en-0.024
by-0.023
Token of
Token ID286
Feature activation-1.19e-02

Pos logit contributions
anger+0.047
ittens+0.047
 plunge+0.044
aroo+0.043
 umbrella+0.043
Neg logit contributions
aron-0.040
by-0.039
bo-0.039
en-0.037
Kat-0.037
Token Tom
Token ID4186
Feature activation-1.34e-02

Pos logit contributions
anger+0.029
stack+0.028
 plunge+0.028
ute+0.026
ittens+0.026
Neg logit contributions
en-0.012
bo-0.012
y-0.012
aron-0.011
by-0.010
Token.
Token ID13
Feature activation+1.45e-03

Pos logit contributions
anger+0.032
stack+0.031
 plunge+0.031
ittens+0.029
ses+0.029
Neg logit contributions
bo-0.013
aron-0.013
y-0.013
en-0.012
by-0.011
Token It
Token ID632
Feature activation-8.37e-02

Pos logit contributions
bo+0.002
aron+0.002
y+0.002
en+0.002
 cared+0.002
Neg logit contributions
anger-0.003
stack-0.003
 plunge-0.003
ittens-0.003
aroo-0.003
Token his
Token ID465
Feature activation+7.39e-03

Pos logit contributions
anger+0.159
stack+0.155
 plunge+0.151
ses+0.142
ittens+0.140
Neg logit contributions
bo-0.128
y-0.128
aron-0.125
en-0.122
by-0.115
Tokenses
Token ID8448
Feature activation-2.13e-01

Pos logit contributions
y+0.018
bo+0.017
by+0.017
aron+0.017
 time+0.017
Neg logit contributions
anger-0.007
stack-0.007
 plunge-0.007
otion-0.006
ishly-0.006
Token and
Token ID290
Feature activation-1.24e-03

Pos logit contributions
anger+0.477
ses+0.457
 plunge+0.446
stack+0.441
sed+0.435
Neg logit contributions
bo-0.235
aron-0.225
by-0.217
en-0.217
y-0.209
Token scratches
Token ID37627
Feature activation+2.39e-02

Pos logit contributions
anger+0.003
stack+0.003
 plunge+0.003
ittens+0.002
ute+0.002
Neg logit contributions
bo-0.002
y-0.002
en-0.001
aron-0.001
 cared-0.001
Token.
Token ID13
Feature activation+2.74e-03

Pos logit contributions
bo+0.026
y+0.026
aron+0.025
en+0.024
by+0.023
Neg logit contributions
anger-0.056
stack-0.054
 plunge-0.053
ittens-0.050
ses-0.050
Token It
Token ID632
Feature activation-8.32e-02

Pos logit contributions
bo+0.004
aron+0.004
y+0.004
en+0.004
 cared+0.004
Neg logit contributions
anger-0.005
 plunge-0.005
stack-0.005
ishly-0.005
ittens-0.005
Token hurts
Token ID20406
Feature activation+1.73e-02

Pos logit contributions
anger+0.168
stack+0.164
 plunge+0.160
ittens+0.150
ses+0.147
Neg logit contributions
bo-0.121
y-0.118
aron-0.114
en-0.112
by-0.105
Token.
Token ID13
Feature activation+6.97e-03

Pos logit contributions
bo+0.005
aron+0.005
die+0.005
idi+0.005
Kat+0.005
Neg logit contributions
stack-0.008
anger-0.008
 plunge-0.007
ishly-0.007
-0.007
Token<|endoftext|>
Token ID50256
Feature activation-1.47e-02

Pos logit contributions
y+0.010
bo+0.010
en+0.010
aron+0.009
by+0.009
Neg logit contributions
stack-0.014
anger-0.014
 plunge-0.013
ittens-0.013
ses-0.012
TokenOnce
Token ID7454
Feature activation+2.88e-03

Pos logit contributions
anger+0.027
stack+0.027
 plunge+0.025
+0.025
aroo+0.023
Neg logit contributions
aron-0.022
en-0.022
bo-0.021
y-0.021
by-0.021
Token upon
Token ID2402
Feature activation-2.03e-02

Pos logit contributions
y+0.004
bo+0.004
by+0.004
en+0.004
aron+0.004
Neg logit contributions
anger-0.006
stack-0.006
 plunge-0.005
ses-0.005
ishly-0.005
Token a
Token ID257
Feature activation-1.89e-02

Pos logit contributions
anger+0.040
stack+0.039
 plunge+0.039
ses+0.036
ittens+0.036
Neg logit contributions
bo-0.029
y-0.029
aron-0.028
en-0.028
by-0.026
Token time
Token ID640
Feature activation+1.34e-01

Pos logit contributions
ittens+0.044
ses+0.041
+0.041
anger+0.041
 plunge+0.040
Neg logit contributions
en-0.019
bo-0.017
y-0.016
aron-0.015
 added-0.014
Token,
Token ID11
Feature activation-1.14e-02

Pos logit contributions
y+0.151
bo+0.138
aron+0.125
en+0.124
by+0.119
Neg logit contributions
anger-0.322
stack-0.318
 plunge-0.306
ses-0.295
ittens-0.286
Token there
Token ID612
Feature activation-4.18e-02

Pos logit contributions
anger+0.019
stack+0.019
 plunge+0.018
ses+0.017
ittens+0.017
Neg logit contributions
bo-0.020
aron-0.019
en-0.019
y-0.019
by-0.018
Token was
Token ID373
Feature activation+2.26e-02

Pos logit contributions
anger+0.103
stack+0.102
 plunge+0.100
aroo+0.093
ses+0.093
Neg logit contributions
aron-0.040
bo-0.039
en-0.035
y-0.035
by-0.031
Token a
Token ID257
Feature activation-1.84e-02

Pos logit contributions
bo+0.038
aron+0.038
y+0.038
en+0.038
by+0.035
Neg logit contributions
anger-0.037
 plunge-0.037
stack-0.037
ittens-0.035
ses-0.034
Token little
Token ID1310
Feature activation+3.38e-02

Pos logit contributions
anger+0.037
stack+0.036
 plunge+0.035
ittens+0.033
ses+0.033
Neg logit contributions
bo-0.026
y-0.025
aron-0.025
en-0.025
by-0.023
Token across
Token ID1973
Feature activation+2.59e-02

Pos logit contributions
aron+0.048
bo+0.047
en+0.044
y+0.044
 cared+0.039
Neg logit contributions
 plunge-0.077
anger-0.076
stack-0.075
ses-0.069
ittens-0.067
Token a
Token ID257
Feature activation-2.22e-02

Pos logit contributions
aron+0.049
y+0.046
bo+0.046
en+0.044
by+0.041
Neg logit contributions
 plunge-0.041
anger-0.039
stack-0.039
ses-0.034
aroo-0.034
Token big
Token ID1263
Feature activation+5.93e-02

Pos logit contributions
anger+0.038
stack+0.038
 plunge+0.037
ittens+0.034
ses+0.033
Neg logit contributions
bo-0.038
aron-0.037
y-0.037
en-0.036
by-0.033
Token,
Token ID11
Feature activation-1.44e-02

Pos logit contributions
bo+0.118
y+0.115
aron+0.115
en+0.112
by+0.107
Neg logit contributions
anger-0.086
stack-0.084
 plunge-0.081
ses-0.073
ittens-0.073
Token shiny
Token ID22441
Feature activation+5.45e-02

Pos logit contributions
 plunge+0.022
anger+0.022
stack+0.022
 raft+0.021
ses+0.020
Neg logit contributions
en-0.026
bo-0.026
y-0.026
aron-0.025
by-0.025
Token mirror
Token ID10162
Feature activation+1.30e-01

Pos logit contributions
bo+0.103
y+0.102
aron+0.100
en+0.098
by+0.094
Neg logit contributions
anger-0.085
stack-0.082
 plunge-0.079
ses-0.072
ittens-0.072
Token.
Token ID13
Feature activation+1.41e-03

Pos logit contributions
bo+0.134
aron+0.130
y+0.127
en+0.125
by+0.119
Neg logit contributions
anger-0.316
stack-0.306
 plunge-0.291
ses-0.281
ittens-0.281
Token\n
Token ID198
Feature activation+4.46e-03

Pos logit contributions
bo+0.002
y+0.002
aron+0.002
en+0.002
by+0.002
Neg logit contributions
anger-0.003
stack-0.003
 plunge-0.003
ses-0.003
ittens-0.003
TokenThe
Token ID464
Feature activation-4.61e-02

Pos logit contributions
bo+0.005
y+0.005
aron+0.005
en+0.004
by+0.004
Neg logit contributions
anger-0.011
stack-0.010
 plunge-0.010
ittens-0.010
ses-0.010
Token bear
Token ID6842
Feature activation-5.25e-02

Pos logit contributions
anger+0.081
stack+0.078
 plunge+0.076
ses+0.070
ittens+0.070
Neg logit contributions
bo-0.079
y-0.078
aron-0.077
en-0.075
by-0.071
Token looked
Token ID3114
Feature activation+7.82e-03

Pos logit contributions
stack+0.101
anger+0.094
sed+0.091
 raft+0.091
aroo+0.090
Neg logit contributions
aron-0.074
en-0.069
bo-0.065
y-0.058
eared-0.057
Token.
Token ID13
Feature activation-3.82e-03

Pos logit contributions
stack+0.043
 plunge+0.043
anger+0.041
ble+0.040
ittens+0.040
Neg logit contributions
bo-0.041
aron-0.039
y-0.037
ny-0.035
by-0.034
Token<|endoftext|>
Token ID50256
Feature activation-2.55e-02

Pos logit contributions
anger+0.008
stack+0.008
 plunge+0.007
ittens+0.007
ses+0.007
Neg logit contributions
bo-0.005
y-0.005
aron-0.005
en-0.005
by-0.005
TokenOnce
Token ID7454
Feature activation+1.36e-03

Pos logit contributions
anger+0.052
stack+0.050
 plunge+0.048
+0.046
aroo+0.044
Neg logit contributions
en-0.034
y-0.034
aron-0.034
bo-0.034
by-0.032
Token upon
Token ID2402
Feature activation-2.23e-02

Pos logit contributions
y+0.002
bo+0.002
en+0.002
by+0.002
aron+0.002
Neg logit contributions
anger-0.003
stack-0.003
 plunge-0.003
ses-0.002
ishly-0.002
Token a
Token ID257
Feature activation-1.97e-02

Pos logit contributions
anger+0.045
stack+0.044
 plunge+0.043
ses+0.040
ittens+0.040
Neg logit contributions
bo-0.031
y-0.031
aron-0.030
en-0.030
by-0.028
Token time
Token ID640
Feature activation+1.32e-01

Pos logit contributions
ittens+0.049
anger+0.048
 plunge+0.047
ses+0.046
+0.046
Neg logit contributions
en-0.016
bo-0.016
aron-0.014
y-0.014
 added-0.012
Token,
Token ID11
Feature activation-1.08e-02

Pos logit contributions
y+0.138
bo+0.117
en+0.108
aron+0.104
by+0.104
Neg logit contributions
stack-0.328
anger-0.326
 plunge-0.313
ses-0.303
ittens-0.294
Token there
Token ID612
Feature activation-4.16e-02

Pos logit contributions
anger+0.019
stack+0.019
 plunge+0.018
ses+0.017
ittens+0.017
Neg logit contributions
bo-0.018
y-0.018
aron-0.017
en-0.017
by-0.016
Token was
Token ID373
Feature activation+2.42e-02

Pos logit contributions
anger+0.109
stack+0.107
 plunge+0.105
ses+0.099
ittens+0.099
Neg logit contributions
bo-0.034
aron-0.033
y-0.031
en-0.030
by-0.026
Token a
Token ID257
Feature activation-1.76e-02

Pos logit contributions
bo+0.037
y+0.036
aron+0.036
en+0.036
by+0.033
Neg logit contributions
anger-0.045
stack-0.044
 plunge-0.044
ittens-0.041
ses-0.041
Token little
Token ID1310
Feature activation+3.42e-02

Pos logit contributions
anger+0.034
stack+0.033
 plunge+0.032
ittens+0.031
ses+0.031
Neg logit contributions
bo-0.026
aron-0.025
en-0.025
y-0.025
by-0.022
Token.
Token ID13
Feature activation+1.35e-03

Pos logit contributions
anger+0.056
stack+0.056
 plunge+0.054
ishly+0.051
aroo+0.050
Neg logit contributions
bo-0.033
aron-0.030
y-0.030
en-0.029
by-0.028
Token<|endoftext|>
Token ID50256
Feature activation-1.29e-02

Pos logit contributions
bo+0.002
y+0.002
aron+0.002
en+0.002
by+0.001
Neg logit contributions
anger-0.003
stack-0.003
 plunge-0.003
ses-0.003
ittens-0.003
TokenOnce
Token ID7454
Feature activation+3.54e-03

Pos logit contributions
anger+0.024
stack+0.023
 plunge+0.022
+0.022
aroo+0.020
Neg logit contributions
aron-0.020
en-0.019
bo-0.019
by-0.018
y-0.018
Token upon
Token ID2402
Feature activation-2.30e-02

Pos logit contributions
y+0.005
bo+0.005
by+0.005
en+0.004
aron+0.004
Neg logit contributions
stack-0.007
anger-0.007
 plunge-0.007
ses-0.007
ishly-0.007
Token a
Token ID257
Feature activation-2.31e-02

Pos logit contributions
anger+0.051
stack+0.050
 plunge+0.048
ses+0.045
ittens+0.045
Neg logit contributions
bo-0.029
y-0.029
aron-0.028
en-0.027
by-0.025
Token time
Token ID640
Feature activation+1.33e-01

Pos logit contributions
ittens+0.053
 plunge+0.050
+0.050
anger+0.050
ses+0.050
Neg logit contributions
en-0.023
bo-0.021
aron-0.019
y-0.019
 added-0.017
Token,
Token ID11
Feature activation-1.35e-02

Pos logit contributions
y+0.144
bo+0.124
en+0.114
by+0.112
aron+0.111
Neg logit contributions
anger-0.324
stack-0.324
 plunge-0.309
ses-0.300
ittens-0.293
Token a
Token ID257
Feature activation-2.28e-02

Pos logit contributions
anger+0.025
stack+0.024
 plunge+0.023
ses+0.022
ittens+0.022
Neg logit contributions
y-0.022
bo-0.022
aron-0.021
en-0.021
by-0.020
Token boy
Token ID2933
Feature activation+7.02e-02

Pos logit contributions
anger+0.039
stack+0.036
 plunge+0.036
ittens+0.036
aroo+0.036
Neg logit contributions
en-0.039
aron-0.037
bo-0.037
y-0.035
 cared-0.032
Token and
Token ID290
Feature activation-4.01e-04

Pos logit contributions
aron+0.153
en+0.133
bo+0.131
die+0.127
nie+0.125
Neg logit contributions
aroo-0.085
stack-0.084
anger-0.081
ittens-0.078
 plunge-0.078
Token his
Token ID465
Feature activation+2.70e-03

Pos logit contributions
anger+0.001
stack+0.001
 plunge+0.001
ittens+0.001
ses+0.001
Neg logit contributions
bo-0.000
y-0.000
aron-0.000
en-0.000
by-0.000
Token.
Token ID13
Feature activation+1.56e-03

Pos logit contributions
bo+0.091
aron+0.086
en+0.085
y+0.083
by+0.078
Neg logit contributions
anger-0.175
stack-0.173
 plunge-0.162
ittens-0.155
aroo-0.153
Token It
Token ID632
Feature activation-8.46e-02

Pos logit contributions
bo+0.003
y+0.003
aron+0.003
by+0.003
en+0.003
Neg logit contributions
anger-0.002
stack-0.002
 plunge-0.002
ittens-0.002
ses-0.002
Token was
Token ID373
Feature activation+2.09e-02

Pos logit contributions
stack+0.178
anger+0.175
aroo+0.158
ittens+0.157
 plunge+0.157
Neg logit contributions
aron-0.102
en-0.097
by-0.093
bo-0.089
y-0.089
Token her
Token ID607
Feature activation-2.05e-02

Pos logit contributions
bo+0.036
aron+0.036
by+0.034
y+0.034
en+0.034
Neg logit contributions
stack-0.032
anger-0.032
 plunge-0.031
ittens-0.031
 umbrella-0.029
Token favorite
Token ID4004
Feature activation-2.60e-02

Pos logit contributions
anger+0.036
stack+0.035
 plunge+0.033
ittens+0.031
aroo+0.030
Neg logit contributions
y-0.035
bo-0.034
aron-0.032
en-0.032
by-0.032
Token color
Token ID3124
Feature activation+1.11e-01

Pos logit contributions
anger+0.045
 plunge+0.042
stack+0.042
ittens+0.040
ute+0.038
Neg logit contributions
bo-0.044
aron-0.042
y-0.040
en-0.040
by-0.038
Token and
Token ID290
Feature activation-2.57e-04

Pos logit contributions
bo+0.151
aron+0.145
en+0.136
by+0.131
y+0.130
Neg logit contributions
anger-0.235
stack-0.220
 plunge-0.209
ittens-0.204
ute-0.198
Token it
Token ID340
Feature activation-1.00e-02

Pos logit contributions
anger+0.000
stack+0.000
 plunge+0.000
ittens+0.000
aroo+0.000
Neg logit contributions
bo-0.000
y-0.000
aron-0.000
en-0.000
by-0.000
Token made
Token ID925
Feature activation+5.84e-02

Pos logit contributions
anger+0.020
stack+0.020
 plunge+0.019
fs+0.018
ishly+0.018
Neg logit contributions
aron-0.013
en-0.012
by-0.011
bo-0.011
y-0.011
Token her
Token ID607
Feature activation-2.02e-02

Pos logit contributions
bo+0.109
aron+0.103
y+0.100
en+0.095
by+0.095
Neg logit contributions
 plunge-0.090
anger-0.089
ittens-0.078
stack-0.078
fs-0.071
Token feel
Token ID1254
Feature activation+4.47e-02

Pos logit contributions
anger+0.041
stack+0.040
 plunge+0.039
ses+0.036
ittens+0.036
Neg logit contributions
bo-0.029
y-0.028
aron-0.028
en-0.027
by-0.025
Token for
Token ID329
Feature activation-5.77e-02

Pos logit contributions
 plunge+0.018
anger+0.018
stack+0.018
ittens+0.017
ses+0.017
Neg logit contributions
y-0.013
bo-0.012
aron-0.011
en-0.011
by-0.010
Token her
Token ID607
Feature activation-3.03e-02

Pos logit contributions
anger+0.128
stack+0.127
 plunge+0.122
ses+0.117
ute+0.113
Neg logit contributions
y-0.067
bo-0.067
en-0.066
aron-0.065
by-0.059
Token brother
Token ID3956
Feature activation-3.73e-02

Pos logit contributions
anger+0.056
stack+0.054
 plunge+0.053
ses+0.049
ittens+0.049
Neg logit contributions
bo-0.049
y-0.048
aron-0.047
en-0.046
by-0.043
Token as
Token ID355
Feature activation-5.68e-02

Pos logit contributions
anger+0.069
stack+0.069
ishly+0.065
ittens+0.062
 plunge+0.062
Neg logit contributions
bo-0.054
aron-0.050
en-0.050
by-0.048
 gre-0.046
Token a
Token ID257
Feature activation-2.26e-02

Pos logit contributions
anger+0.088
stack+0.085
 plunge+0.083
ittens+0.077
aroo+0.075
Neg logit contributions
bo-0.108
aron-0.106
y-0.104
en-0.102
by-0.095
Token surprise
Token ID5975
Feature activation-7.50e-03

Pos logit contributions
anger+0.036
stack+0.034
 plunge+0.034
ses+0.031
ittens+0.030
Neg logit contributions
bo-0.043
aron-0.042
y-0.042
en-0.042
by-0.038
Token.
Token ID13
Feature activation+5.27e-03

Pos logit contributions
anger+0.017
stack+0.016
 plunge+0.016
ittens+0.015
ses+0.015
Neg logit contributions
bo-0.010
aron-0.009
y-0.009
en-0.008
by-0.008
Token\n
Token ID198
Feature activation+8.64e-03

Pos logit contributions
bo+0.008
y+0.008
aron+0.008
en+0.007
by+0.007
Neg logit contributions
anger-0.010
stack-0.010
 plunge-0.010
ittens-0.009
ses-0.009
Token\n
Token ID198
Feature activation+9.97e-03

Pos logit contributions
bo+0.010
y+0.010
en+0.010
aron+0.010
by+0.009
Neg logit contributions
anger-0.019
stack-0.019
 plunge-0.018
ittens-0.018
aroo-0.017
TokenThe
Token ID464
Feature activation-4.24e-02

Pos logit contributions
bo+0.014
y+0.014
aron+0.014
en+0.014
by+0.013
Neg logit contributions
anger-0.020
stack-0.019
 plunge-0.019
ittens-0.018
aroo-0.017
Token little
Token ID1310
Feature activation+3.06e-02

Pos logit contributions
anger+0.078
stack+0.076
 plunge+0.075
ses+0.069
ittens+0.068
Neg logit contributions
bo-0.068
aron-0.068
y-0.067
en-0.065
by-0.060
Token love
Token ID1842
Feature activation+1.70e-02

Pos logit contributions
en+0.037
bo+0.035
aron+0.035
y+0.035
ky+0.032
Neg logit contributions
 plunge-0.037
anger-0.037
stack-0.036
ses-0.035
-0.032
Token you
Token ID345
Feature activation-6.37e-04

Pos logit contributions
aron+0.041
bo+0.041
y+0.040
en+0.038
by+0.037
Neg logit contributions
anger-0.018
 plunge-0.017
stack-0.016
ittens-0.015
ses-0.014
Token too
Token ID1165
Feature activation-8.49e-02

Pos logit contributions
anger+0.002
stack+0.001
 plunge+0.001
ses+0.001
ittens+0.001
Neg logit contributions
bo-0.001
y-0.001
aron-0.001
en-0.001
by-0.001
Token.
Token ID13
Feature activation+1.41e-02

Pos logit contributions
anger+0.205
stack+0.200
 plunge+0.197
ses+0.187
ittens+0.186
Neg logit contributions
bo-0.088
y-0.085
aron-0.083
en-0.080
by-0.073
Token I
Token ID314
Feature activation+2.29e-02

Pos logit contributions
bo+0.026
y+0.025
aron+0.025
en+0.024
by+0.023
Neg logit contributions
anger-0.023
stack-0.022
 plunge-0.022
ittens-0.020
ses-0.020
Token will
Token ID481
Feature activation+1.27e-02

Pos logit contributions
en+0.039
aron+0.038
bo+0.037
y+0.037
Em+0.035
Neg logit contributions
 plunge-0.035
anger-0.035
ses-0.034
stack-0.033
 seeker-0.031
Token try
Token ID1949
Feature activation-3.63e-03

Pos logit contributions
bo+0.022
y+0.022
aron+0.022
en+0.021
by+0.020
Neg logit contributions
anger-0.022
stack-0.021
 plunge-0.020
ittens-0.019
ses-0.019
Token school
Token ID1524
Feature activation+2.08e-02

Pos logit contributions
anger+0.007
 plunge+0.007
ittens+0.006
stack+0.006
ses+0.006
Neg logit contributions
aron-0.005
y-0.005
bo-0.005
en-0.005
 cared-0.005
Token."
Token ID526
Feature activation+2.16e-02

Pos logit contributions
bo+0.028
y+0.026
aron+0.026
en+0.026
by+0.024
Neg logit contributions
anger-0.043
 plunge-0.040
stack-0.039
ittens-0.037
fs-0.036
Token She
Token ID1375
Feature activation-1.51e-02

Pos logit contributions
bo+0.035
en+0.034
y+0.034
aron+0.034
by+0.031
Neg logit contributions
anger-0.038
stack-0.038
ittens-0.036
 plunge-0.035
ses-0.035
Token gave
Token ID2921
Feature activation+2.82e-02

Pos logit contributions
anger+0.032
stack+0.031
 plunge+0.030
ses+0.030
sed+0.029
Neg logit contributions
bo-0.019
y-0.019
en-0.018
aron-0.018
by-0.017
Token the
Token ID262
Feature activation+1.30e-03

Pos logit contributions
anger+0.026
stack+0.026
 plunge+0.025
ittens+0.023
+0.023
Neg logit contributions
bo-0.025
y-0.024
aron-0.022
by-0.022
 cared-0.021
Token forehead
Token ID22645
Feature activation-6.57e-02

Pos logit contributions
bo+0.003
y+0.003
aron+0.003
en+0.003
by+0.003
Neg logit contributions
anger-0.002
stack-0.002
 plunge-0.002
ittens-0.001
ses-0.001
Token.
Token ID13
Feature activation+5.91e-03

Pos logit contributions
anger+0.139
stack+0.133
 plunge+0.123
ishly+0.120
ittens+0.118
Neg logit contributions
bo-0.083
en-0.078
y-0.077
by-0.077
aron-0.076
Token He
Token ID679
Feature activation+1.74e-02

Pos logit contributions
bo+0.010
y+0.010
en+0.009
aron+0.009
by+0.009
Neg logit contributions
anger-0.010
stack-0.010
 plunge-0.009
ittens-0.009
-0.009
Token hugged
Token ID41130
Feature activation-1.67e-02

Pos logit contributions
aron+0.022
y+0.022
bo+0.021
en+0.021
older+0.019
Neg logit contributions
stack-0.033
anger-0.032
ishly-0.031
ses-0.031
sed-0.030
Token his
Token ID465
Feature activation+5.57e-03

Pos logit contributions
anger+0.029
stack+0.028
 plunge+0.027
ishly+0.026
aroo+0.025
Neg logit contributions
bo-0.027
y-0.026
aron-0.026
en-0.024
 gre-0.024
Token mom
Token ID1995
Feature activation+1.03e-02

Pos logit contributions
bo+0.010
y+0.010
by+0.010
aron+0.010
en+0.009
Neg logit contributions
stack-0.009
anger-0.009
 plunge-0.009
otion-0.008
aroo-0.008
Tokenmy
Token ID1820
Feature activation+9.14e-02

Pos logit contributions
aron+0.009
bo+0.008
 gre+0.008
en+0.008
 cared+0.008
Neg logit contributions
stack-0.023
ishly-0.023
anger-0.023
 plunge-0.022
ittens-0.022
Token back
Token ID736
Feature activation-5.32e-02

Pos logit contributions
bo+0.116
aron+0.114
en+0.110
y+0.106
 cared+0.105
Neg logit contributions
stack-0.180
anger-0.176
ishly-0.174
 plunge-0.170
ittens-0.164
Token and
Token ID290
Feature activation+4.98e-03

Pos logit contributions
anger+0.119
stack+0.118
 plunge+0.108
ittens+0.107
ishly+0.106
Neg logit contributions
bo-0.063
aron-0.057
y-0.055
en-0.055
 cared-0.054
Token felt
Token ID2936
Feature activation+6.78e-03

Pos logit contributions
bo+0.007
aron+0.007
y+0.007
en+0.007
ny+0.006
Neg logit contributions
stack-0.010
 plunge-0.009
anger-0.009
ishly-0.009
aroo-0.008
TokenWhat
Token ID2061
Feature activation+1.33e-02

Pos logit contributions
stack+0.052
anger+0.051
 plunge+0.050
aroo+0.048
ittens+0.046
Neg logit contributions
en-0.039
y-0.039
aron-0.039
bo-0.039
by-0.038
Token happened
Token ID3022
Feature activation+7.64e-02

Pos logit contributions
bo+0.019
aron+0.018
y+0.018
en+0.018
by+0.016
Neg logit contributions
anger-0.027
stack-0.026
 plunge-0.026
ses-0.025
ittens-0.024
Token,
Token ID11
Feature activation-5.14e-03

Pos logit contributions
bo+0.135
y+0.132
aron+0.131
en+0.127
by+0.121
Neg logit contributions
anger-0.129
stack-0.125
 plunge-0.123
ses-0.112
ittens-0.112
Token Tom
Token ID4186
Feature activation-3.92e-03

Pos logit contributions
stack+0.012
anger+0.011
+0.011
 plunge+0.011
ishly+0.010
Neg logit contributions
y-0.006
aron-0.006
en-0.006
bo-0.006
 added-0.005
Token?"
Token ID1701
Feature activation+3.72e-02

Pos logit contributions
stack+0.004
anger+0.004
ble+0.004
 plunge+0.004
aroo+0.004
Neg logit contributions
aron-0.008
bo-0.008
en-0.008
by-0.008
 cared-0.008
Token Mom
Token ID11254
Feature activation+2.83e-02

Pos logit contributions
bo+0.064
y+0.064
aron+0.062
en+0.061
by+0.058
Neg logit contributions
anger-0.064
stack-0.063
 plunge-0.061
ses-0.056
ittens-0.055
Token asked
Token ID1965
Feature activation+6.77e-02

Pos logit contributions
by+0.030
 cared+0.024
en+0.024
 enjoyed+0.024
aron+0.024
Neg logit contributions
stack-0.061
ittens-0.060
anger-0.060
aroo-0.059
otion-0.058
Token.
Token ID13
Feature activation+1.73e-02

Pos logit contributions
bo+0.073
aron+0.071
y+0.069
en+0.068
by+0.062
Neg logit contributions
anger-0.159
stack-0.152
 plunge-0.152
ittens-0.144
ses-0.142
Token\n
Token ID198
Feature activation+1.63e-02

Pos logit contributions
bo+0.031
y+0.031
aron+0.030
en+0.030
by+0.028
Neg logit contributions
anger-0.029
stack-0.028
 plunge-0.027
ses-0.025
ittens-0.025
Token\n
Token ID198
Feature activation+1.69e-02

Pos logit contributions
bo+0.017
y+0.016
aron+0.016
en+0.015
by+0.014
Neg logit contributions
anger-0.040
stack-0.039
 plunge-0.038
ses-0.036
ittens-0.036
Token"
Token ID1
Feature activation-2.59e-02

Pos logit contributions
bo+0.022
y+0.022
aron+0.021
en+0.021
by+0.019
Neg logit contributions
anger-0.036
stack-0.035
 plunge-0.035
ses-0.033
ittens-0.032
Token,
Token ID11
Feature activation-1.30e-02

Pos logit contributions
anger+0.076
stack+0.072
 plunge+0.071
ishly+0.066
ble+0.065
Neg logit contributions
bo-0.081
aron-0.072
Em-0.068
by-0.067
y-0.067
Token but
Token ID475
Feature activation-1.17e-02

Pos logit contributions
stack+0.025
anger+0.023
 plunge+0.022
ses+0.022
ishly+0.021
Neg logit contributions
bo-0.022
y-0.020
by-0.020
en-0.018
aron-0.018
Token it
Token ID340
Feature activation-9.74e-03

Pos logit contributions
stack+0.019
 plunge+0.018
anger+0.018
aroo+0.017
ses+0.016
Neg logit contributions
bo-0.022
aron-0.021
y-0.021
en-0.020
by-0.019
Token showed
Token ID3751
Feature activation+5.68e-03

Pos logit contributions
stack+0.022
anger+0.020
aroo+0.020
 plunge+0.019
ishly+0.019
Neg logit contributions
aron-0.012
bo-0.011
en-0.011
y-0.011
by-0.010
Token that
Token ID326
Feature activation-2.91e-02

Pos logit contributions
aron+0.010
bo+0.010
y+0.010
ce+0.009
en+0.009
Neg logit contributions
anger-0.009
 plunge-0.009
stack-0.009
ishly-0.008
aroo-0.008
Token even
Token ID772
Feature activation+8.63e-02

Pos logit contributions
anger+0.048
stack+0.048
 plunge+0.047
ses+0.042
aroo+0.042
Neg logit contributions
bo-0.052
aron-0.051
y-0.050
en-0.048
by-0.046
Token though
Token ID996
Feature activation+1.53e-02

Pos logit contributions
aron+0.176
bo+0.168
ce+0.149
y+0.143
Kat+0.143
Neg logit contributions
 plunge-0.119
ble-0.104
 raft-0.104
aroo-0.102
anger-0.100
Token it
Token ID340
Feature activation-9.64e-03

Pos logit contributions
bo+0.026
aron+0.026
y+0.026
en+0.024
by+0.023
Neg logit contributions
anger-0.027
stack-0.026
 plunge-0.026
ittens-0.023
ses-0.023
Token was
Token ID373
Feature activation+2.09e-02

Pos logit contributions
anger+0.021
stack+0.021
 plunge+0.021
+0.019
aroo+0.019
Neg logit contributions
aron-0.012
bo-0.010
en-0.010
y-0.010
by-0.010
Token competitive
Token ID7606
Feature activation+1.11e-02

Pos logit contributions
aron+0.039
bo+0.039
y+0.037
by+0.036
en+0.036
Neg logit contributions
stack-0.030
 plunge-0.030
 footing-0.028
ses-0.027
ble-0.027
Token,
Token ID11
Feature activation-1.29e-02

Pos logit contributions
bo+0.019
aron+0.019
by+0.017
Grand+0.016
Kat+0.016
Neg logit contributions
anger-0.019
 plunge-0.018
stack-0.017
ishly-0.017
-0.016
Token delicate
Token ID19217
Feature activation+1.37e-02

Pos logit contributions
bo+0.029
aron+0.029
y+0.028
en+0.028
by+0.028
Neg logit contributions
anger-0.034
 plunge-0.033
stack-0.033
ittens-0.031
ses-0.030
Token and
Token ID290
Feature activation-2.22e-03

Pos logit contributions
bo+0.018
aron+0.016
by+0.015
y+0.015
en+0.014
Neg logit contributions
anger-0.030
stack-0.029
 plunge-0.029
ishly-0.027
ittens-0.027
Token very
Token ID845
Feature activation+7.72e-03

Pos logit contributions
anger+0.005
stack+0.005
 plunge+0.004
ses+0.004
ittens+0.004
Neg logit contributions
bo-0.003
y-0.003
aron-0.003
en-0.003
by-0.003
Token pretty
Token ID2495
Feature activation+3.43e-02

Pos logit contributions
bo+0.011
y+0.011
aron+0.011
en+0.011
by+0.010
Neg logit contributions
anger-0.015
stack-0.015
 plunge-0.015
ses-0.014
ittens-0.014
Token.
Token ID13
Feature activation-9.71e-04

Pos logit contributions
bo+0.050
aron+0.045
by+0.042
die+0.041
Em+0.041
Neg logit contributions
anger-0.071
stack-0.066
 plunge-0.065
ishly-0.061
-0.059
Token She
Token ID1375
Feature activation-2.09e-02

Pos logit contributions
anger+0.002
stack+0.002
 plunge+0.002
ses+0.002
ittens+0.002
Neg logit contributions
bo-0.002
y-0.001
aron-0.001
en-0.001
by-0.001
Token was
Token ID373
Feature activation+1.82e-02

Pos logit contributions
anger+0.044
 plunge+0.043
stack+0.042
ses+0.042
sed+0.041
Neg logit contributions
aron-0.025
bo-0.024
en-0.023
y-0.022
by-0.020
Token excited
Token ID6568
Feature activation-1.76e-02

Pos logit contributions
bo+0.027
y+0.026
aron+0.026
en+0.025
by+0.025
Neg logit contributions
anger-0.035
stack-0.035
 plunge-0.034
ses-0.032
ittens-0.031
Token to
Token ID284
Feature activation+2.55e-02

Pos logit contributions
anger+0.034
 plunge+0.033
stack+0.032
ittens+0.030
idents+0.030
Neg logit contributions
bo-0.028
aron-0.026
en-0.023
y-0.022
ce-0.022
Token go
Token ID467
Feature activation-2.88e-02

Pos logit contributions
en+0.038
bo+0.036
y+0.034
 added+0.034
 enjoyed+0.034
Neg logit contributions
anger-0.052
stack-0.050
ses-0.047
ittens-0.046
ishly-0.046
Token outside
Token ID2354
Feature activation+1.06e-02

Pos logit contributions
stack+0.076
ses+0.074
anger+0.073
sed+0.071
ishly+0.069
Neg logit contributions
y-0.026
bo-0.025
en-0.024
by-0.021
 over-0.020
Token,
Token ID11
Feature activation-8.03e-03

Pos logit contributions
anger+0.030
 plunge+0.030
stack+0.029
 hi+0.028
aroo+0.027
Neg logit contributions
bo-0.042
en-0.041
y-0.041
aron-0.040
ky-0.039
Token "
Token ID366
Feature activation+2.92e-03

Pos logit contributions
stack+0.011
anger+0.010
 plunge+0.009
ses+0.009
ittens+0.009
Neg logit contributions
bo-0.017
y-0.017
en-0.017
by-0.016
aron-0.016
TokenSanta
Token ID42694
Feature activation+1.24e-01

Pos logit contributions
en+0.004
y+0.004
aron+0.004
by+0.004
bo+0.004
Neg logit contributions
 plunge-0.006
stack-0.005
aroo-0.005
anger-0.005
ikes-0.005
Token,
Token ID11
Feature activation-8.35e-03

Pos logit contributions
aron+0.187
bo+0.185
en+0.179
y+0.178
Em+0.168
Neg logit contributions
anger-0.230
stack-0.223
 plunge-0.216
ses-0.206
ishly-0.199
Token is
Token ID318
Feature activation+1.79e-02

Pos logit contributions
stack+0.013
anger+0.013
 plunge+0.013
ittens+0.011
aroo+0.011
Neg logit contributions
bo-0.015
y-0.015
aron-0.015
en-0.015
by-0.014
Token that
Token ID326
Feature activation-2.33e-02

Pos logit contributions
y+0.036
en+0.035
aron+0.033
by+0.033
bo+0.032
Neg logit contributions
stack-0.026
 plunge-0.024
anger-0.023
aroo-0.023
 ours-0.022
Token you
Token ID345
Feature activation+9.42e-04

Pos logit contributions
 plunge+0.034
stack+0.031
anger+0.030
 hi+0.029
aroo+0.028
Neg logit contributions
aron-0.046
en-0.044
bo-0.043
y-0.042
ky-0.042
Token?
Token ID30
Feature activation+2.62e-02

Pos logit contributions
aron+0.002
bo+0.002
y+0.002
en+0.002
by+0.002
Neg logit contributions
anger-0.001
 plunge-0.001
stack-0.001
ses-0.001
aroo-0.001
Token We
Token ID775
Feature activation+3.03e-02

Pos logit contributions
y+0.052
en+0.052
aron+0.052
bo+0.051
by+0.051
Neg logit contributions
stack-0.036
anger-0.035
 plunge-0.033
aroo-0.030
ishly-0.029
Token are
Token ID389
Feature activation-3.88e-02

Pos logit contributions
aron+0.062
y+0.061
bo+0.061
en+0.060
by+0.057
Neg logit contributions
 plunge-0.039
stack-0.038
anger-0.038
ses-0.033
aroo-0.032
Token so
Token ID523
Feature activation-2.88e-02

Pos logit contributions
stack+0.057
 plunge+0.055
anger+0.052
ses+0.050
aroo+0.049
Neg logit contributions
y-0.076
en-0.073
bo-0.072
aron-0.071
by-0.070
Token\n
Token ID198
Feature activation+3.14e-03

Pos logit contributions
bo+0.003
y+0.003
aron+0.003
en+0.003
by+0.003
Neg logit contributions
anger-0.006
stack-0.006
 plunge-0.005
ittens-0.005
aroo-0.005
TokenOne
Token ID3198
Feature activation+1.93e-03

Pos logit contributions
bo+0.004
aron+0.004
en+0.004
y+0.004
by+0.004
Neg logit contributions
anger-0.006
stack-0.006
 plunge-0.006
ittens-0.006
aroo-0.006
Token day
Token ID1110
Feature activation+5.15e-02

Pos logit contributions
aron+0.003
bo+0.003
en+0.002
y+0.002
by+0.002
Neg logit contributions
anger-0.004
stack-0.004
 plunge-0.004
ittens-0.004
aroo-0.004
Token,
Token ID11
Feature activation-1.50e-02

Pos logit contributions
bo+0.089
aron+0.087
en+0.081
y+0.080
 cared+0.075
Neg logit contributions
anger-0.093
 plunge-0.091
stack-0.085
ittens-0.082
aroo-0.080
Token they
Token ID484
Feature activation+2.40e-02

Pos logit contributions
anger+0.027
stack+0.027
 plunge+0.026
ses+0.024
ittens+0.024
Neg logit contributions
bo-0.024
y-0.024
aron-0.024
en-0.023
by-0.022
Token see
Token ID766
Feature activation-1.37e-02

Pos logit contributions
aron+0.047
bo+0.043
y+0.042
ce+0.041
en+0.041
Neg logit contributions
 plunge-0.036
stack-0.036
anger-0.034
ses-0.031
 raft-0.031
Token a
Token ID257
Feature activation-2.33e-02

Pos logit contributions
anger+0.024
stack+0.023
 plunge+0.023
ses+0.022
ittens+0.021
Neg logit contributions
bo-0.023
aron-0.023
y-0.022
en-0.022
by-0.020
Token real
Token ID1103
Feature activation+4.43e-03

Pos logit contributions
anger+0.042
stack+0.040
 plunge+0.039
ses+0.036
ittens+0.036
Neg logit contributions
bo-0.039
y-0.038
aron-0.038
en-0.037
by-0.034
Token train
Token ID4512
Feature activation-9.07e-02

Pos logit contributions
y+0.010
bo+0.010
en+0.010
aron+0.010
by+0.009
Neg logit contributions
anger-0.005
stack-0.005
 plunge-0.005
ittens-0.004
ses-0.004
Token outside
Token ID2354
Feature activation+9.52e-03

Pos logit contributions
anger+0.188
stack+0.183
 plunge+0.180
ittens+0.169
ses+0.168
Neg logit contributions
bo-0.124
y-0.123
aron-0.119
en-0.116
by-0.108
Token their
Token ID511
Feature activation+2.70e-03

Pos logit contributions
bo+0.011
aron+0.010
y+0.010
en+0.010
by+0.009
Neg logit contributions
anger-0.022
stack-0.022
 plunge-0.021
ses-0.020
ittens-0.020
Token had
Token ID550
Feature activation+5.15e-02

Pos logit contributions
aron+0.005
en+0.005
bo+0.005
y+0.005
by+0.004
Neg logit contributions
stack-0.006
anger-0.006
 plunge-0.006
ses-0.006
sed-0.005
Token a
Token ID257
Feature activation-1.95e-02

Pos logit contributions
aron+0.101
y+0.101
bo+0.099
en+0.095
ce+0.093
Neg logit contributions
anger-0.073
 plunge-0.072
ses-0.068
stack-0.065
sed-0.065
Token hole
Token ID7604
Feature activation-2.39e-02

Pos logit contributions
anger+0.032
stack+0.030
 plunge+0.030
ses+0.028
ittens+0.027
Neg logit contributions
bo-0.036
y-0.035
aron-0.035
en-0.035
by-0.032
Token in
Token ID287
Feature activation+2.23e-02

Pos logit contributions
anger+0.049
stack+0.048
 plunge+0.046
ble+0.043
aroo+0.043
Neg logit contributions
bo-0.033
aron-0.032
by-0.031
en-0.030
 cared-0.029
Token the
Token ID262
Feature activation+4.88e-03

Pos logit contributions
bo+0.039
aron+0.039
y+0.037
by+0.036
 cared+0.036
Neg logit contributions
stack-0.037
anger-0.036
 plunge-0.036
aroo-0.031
ses-0.031
Token middle
Token ID3504
Feature activation-7.25e-02

Pos logit contributions
y+0.010
bo+0.010
aron+0.010
en+0.010
by+0.009
Neg logit contributions
stack-0.007
anger-0.007
 plunge-0.006
ses-0.006
aroo-0.005
Token.
Token ID13
Feature activation+9.58e-03

Pos logit contributions
anger+0.148
stack+0.139
 plunge+0.136
ittens+0.127
aroo+0.122
Neg logit contributions
bo-0.101
aron-0.099
by-0.094
y-0.093
en-0.093
Token He
Token ID679
Feature activation+1.72e-02

Pos logit contributions
bo+0.013
aron+0.013
y+0.013
en+0.012
by+0.011
Neg logit contributions
anger-0.020
stack-0.019
 plunge-0.019
ses-0.018
ittens-0.018
Token wondered
Token ID14028
Feature activation+2.30e-02

Pos logit contributions
bo+0.023
aron+0.023
en+0.022
y+0.022
by+0.020
Neg logit contributions
stack-0.032
sed-0.032
anger-0.031
ses-0.031
aroo-0.030
Token what
Token ID644
Feature activation-8.11e-03

Pos logit contributions
y+0.037
bo+0.036
aron+0.036
en+0.036
by+0.034
Neg logit contributions
anger-0.042
stack-0.041
 plunge-0.039
ses-0.037
ittens-0.037
Token it
Token ID340
Feature activation-9.35e-03

Pos logit contributions
anger+0.014
 plunge+0.014
stack+0.014
aroo+0.013
 raft+0.012
Neg logit contributions
aron-0.014
bo-0.013
en-0.013
y-0.013
by-0.011
Token her
Token ID607
Feature activation-2.07e-02

Pos logit contributions
anger+0.024
stack+0.024
 plunge+0.023
ittens+0.022
ses+0.022
Neg logit contributions
bo-0.018
y-0.018
aron-0.017
en-0.017
by-0.016
Token mom
Token ID1995
Feature activation+5.07e-03

Pos logit contributions
anger+0.034
stack+0.033
 plunge+0.033
ittens+0.030
otion+0.030
Neg logit contributions
y-0.036
bo-0.035
by-0.034
en-0.034
aron-0.033
Token and
Token ID290
Feature activation+4.12e-04

Pos logit contributions
aron+0.006
bo+0.006
en+0.005
by+0.005
y+0.005
Neg logit contributions
stack-0.011
anger-0.011
 plunge-0.010
ittens-0.010
ishly-0.010
Token dad
Token ID9955
Feature activation+7.15e-02

Pos logit contributions
bo+0.001
y+0.001
aron+0.001
en+0.001
by+0.001
Neg logit contributions
anger-0.001
stack-0.001
 plunge-0.001
ses-0.001
ittens-0.001
Token.
Token ID13
Feature activation+3.41e-03

Pos logit contributions
bo+0.093
aron+0.090
en+0.086
y+0.084
by+0.079
Neg logit contributions
stack-0.153
anger-0.147
 plunge-0.147
ishly-0.138
ittens-0.131
Token She
Token ID1375
Feature activation-1.66e-02

Pos logit contributions
bo+0.007
y+0.006
aron+0.006
en+0.006
by+0.006
Neg logit contributions
anger-0.005
stack-0.005
 plunge-0.005
ses-0.004
ittens-0.004
Token ran
Token ID4966
Feature activation-2.15e-02

Pos logit contributions
stack+0.035
anger+0.035
 plunge+0.034
sed+0.033
ses+0.033
Neg logit contributions
aron-0.020
en-0.019
y-0.019
bo-0.018
able-0.015
Token away
Token ID1497
Feature activation-3.60e-02

Pos logit contributions
anger+0.046
stack+0.045
 plunge+0.045
ses+0.041
ittens+0.040
Neg logit contributions
bo-0.029
aron-0.027
y-0.027
en-0.025
 cared-0.024
Token and
Token ID290
Feature activation+6.37e-04

Pos logit contributions
anger+0.076
stack+0.075
 plunge+0.073
ses+0.068
aroo+0.066
Neg logit contributions
bo-0.051
aron-0.045
y-0.044
en-0.044
by-0.041
Token found
Token ID1043
Feature activation-3.90e-02

Pos logit contributions
bo+0.001
aron+0.001
y+0.001
en+0.001
by+0.001
Neg logit contributions
anger-0.001
stack-0.001
 plunge-0.001
ses-0.001
ittens-0.001
Token a
Token ID257
Feature activation-2.14e-02

Pos logit contributions
anger+0.087
stack+0.084
 plunge+0.081
ittens+0.080
ses+0.078
Neg logit contributions
bo-0.047
y-0.046
aron-0.044
en-0.043
by-0.041
Token was
Token ID373
Feature activation+2.06e-02

Pos logit contributions
bo+0.064
y+0.063
aron+0.063
en+0.062
by+0.055
Neg logit contributions
stack-0.119
anger-0.119
 plunge-0.115
aroo-0.110
ses-0.109
Token curious
Token ID11040
Feature activation-1.49e-03

Pos logit contributions
y+0.035
bo+0.035
en+0.034
aron+0.034
by+0.033
Neg logit contributions
stack-0.035
anger-0.034
 plunge-0.034
ses-0.031
aroo-0.031
Token too
Token ID1165
Feature activation-8.05e-02

Pos logit contributions
anger+0.002
 plunge+0.002
stack+0.002
ishly+0.002
ittens+0.002
Neg logit contributions
bo-0.002
aron-0.002
by-0.002
en-0.002
y-0.002
Token.
Token ID13
Feature activation+3.98e-03

Pos logit contributions
anger+0.185
stack+0.182
 plunge+0.178
ittens+0.170
aroo+0.169
Neg logit contributions
bo-0.090
aron-0.084
y-0.084
en-0.083
by-0.075
Token Ab
Token ID2275
Feature activation+8.79e-02

Pos logit contributions
bo+0.006
y+0.006
en+0.006
aron+0.006
by+0.005
Neg logit contributions
anger-0.008
stack-0.008
 plunge-0.007
ses-0.007
ittens-0.007
Tokenig
Token ID328
Feature activation-1.46e-01

Pos logit contributions
bo+0.197
y+0.196
en+0.188
by+0.184
aron+0.181
Neg logit contributions
stack-0.116
anger-0.109
 plunge-0.104
ses-0.099
ishly-0.090
Tokenail
Token ID603
Feature activation-7.12e-02

Pos logit contributions
anger+0.170
stack+0.162
 plunge+0.159
ses+0.144
ittens+0.135
Neg logit contributions
bo-0.340
y-0.337
aron-0.324
en-0.323
by-0.314
Token offered
Token ID4438
Feature activation-4.42e-03

Pos logit contributions
stack+0.154
anger+0.151
ishly+0.148
 plunge+0.146
aroo+0.140
Neg logit contributions
aron-0.083
bo-0.081
y-0.077
en-0.077
by-0.069
Token to
Token ID284
Feature activation+2.65e-02

Pos logit contributions
anger+0.009
stack+0.009
 plunge+0.009
aroo+0.008
ses+0.008
Neg logit contributions
bo-0.006
y-0.006
aron-0.006
en-0.006
by-0.005
Token teach
Token ID4545
Feature activation-1.61e-02

Pos logit contributions
bo+0.047
en+0.045
y+0.045
aron+0.044
 cared+0.044
Neg logit contributions
anger-0.047
ittens-0.040
ses-0.040
stack-0.039
 plunge-0.038
Token him
Token ID683
Feature activation-8.52e-02

Pos logit contributions
anger+0.032
stack+0.031
 plunge+0.031
ittens+0.028
ses+0.028
Neg logit contributions
bo-0.024
y-0.024
aron-0.023
en-0.022
by-0.021
TokenS
Token ID50
Feature activation+8.66e-02

Pos logit contributions
en+0.018
y+0.016
bo+0.016
aron+0.016
nie+0.015
Neg logit contributions
ittens-0.021
anger-0.021
stack-0.020
 plunge-0.020
aroo-0.020
Tokenara
Token ID3301
Feature activation+1.04e-01

Pos logit contributions
bo+0.096
aron+0.093
y+0.093
en+0.090
 cared+0.081
Neg logit contributions
anger-0.204
stack-0.191
 plunge-0.190
ittens-0.184
ses-0.177
Token nodded
Token ID14464
Feature activation+2.76e-02

Pos logit contributions
en+0.139
bo+0.138
 late+0.136
able+0.130
y+0.129
Neg logit contributions
anger-0.180
ishly-0.178
stack-0.176
sed-0.173
ses-0.169
Token and
Token ID290
Feature activation+7.70e-03

Pos logit contributions
bo+0.037
aron+0.034
ud+0.032
en+0.031
by+0.031
Neg logit contributions
anger-0.051
 yes-0.050
 plunge-0.049
ishly-0.049
stack-0.049
Token stro
Token ID9660
Feature activation+1.02e-02

Pos logit contributions
bo+0.010
y+0.010
aron+0.010
en+0.010
by+0.009
Neg logit contributions
stack-0.016
anger-0.016
 plunge-0.016
ses-0.015
ishly-0.015
Tokenked
Token ID9091
Feature activation-1.23e-01

Pos logit contributions
bo+0.021
y+0.020
aron+0.020
en+0.020
by+0.019
Neg logit contributions
anger-0.016
stack-0.015
 plunge-0.014
ses-0.013
ittens-0.013
Token the
Token ID262
Feature activation+3.79e-03

Pos logit contributions
anger+0.214
stack+0.209
 plunge+0.209
ses+0.197
ishly+0.189
Neg logit contributions
bo-0.196
aron-0.195
y-0.191
 cared-0.178
en-0.178
Token velvet
Token ID47750
Feature activation-3.45e-02

Pos logit contributions
bo+0.007
y+0.007
aron+0.006
en+0.006
by+0.006
Neg logit contributions
stack-0.007
anger-0.006
 plunge-0.006
ittens-0.006
aroo-0.006
Token dress
Token ID6576
Feature activation-1.83e-03

Pos logit contributions
anger+0.051
stack+0.051
 plunge+0.050
ittens+0.045
ses+0.045
Neg logit contributions
y-0.066
bo-0.066
aron-0.064
en-0.063
by-0.060
Token gently
Token ID15165
Feature activation-4.56e-02

Pos logit contributions
anger+0.005
stack+0.004
 plunge+0.004
ittens+0.004
ses+0.004
Neg logit contributions
bo-0.002
y-0.002
aron-0.002
en-0.002
by-0.001
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
anger+0.091
stack+0.087
 plunge+0.085
ishly+0.081
ses+0.079
Neg logit contributions
bo-0.062
Em-0.058
Kat-0.058
by-0.057
y-0.057
Token She
Token ID1375
Feature activation-2.41e-02

Pos logit contributions
anger+0.005
 plunge+0.005
stack+0.005
ute+0.004
fs+0.004
Neg logit contributions
bo-0.003
y-0.003
aron-0.003
en-0.003
 cared-0.002
Token me
Token ID502
Feature activation+9.89e-03

Pos logit contributions
stack+0.049
sed+0.047
 plunge+0.047
ses+0.046
anger+0.046
Neg logit contributions
en-0.030
aron-0.030
bo-0.027
y-0.026
ams-0.024
Tokenowed
Token ID6972
Feature activation-1.39e-01

Pos logit contributions
y+0.022
bo+0.021
aron+0.021
en+0.020
by+0.020
Neg logit contributions
anger-0.013
stack-0.013
 plunge-0.012
ses-0.011
ittens-0.011
Token and
Token ID290
Feature activation-4.96e-03

Pos logit contributions
anger+0.287
stack+0.281
 plunge+0.277
ittens+0.261
ishly+0.258
Neg logit contributions
bo-0.183
aron-0.177
en-0.169
y-0.168
by-0.160
Token me
Token ID502
Feature activation+1.02e-02

Pos logit contributions
stack+0.011
 plunge+0.011
anger+0.011
ses+0.010
sed+0.010
Neg logit contributions
bo-0.006
en-0.006
aron-0.006
y-0.006
by-0.005
Tokenowed
Token ID6972
Feature activation-1.39e-01

Pos logit contributions
y+0.023
bo+0.023
aron+0.023
en+0.022
by+0.022
Neg logit contributions
anger-0.012
stack-0.011
 plunge-0.011
ses-0.010
ittens-0.010
Token to
Token ID284
Feature activation+2.52e-02

Pos logit contributions
anger+0.257
stack+0.251
 plunge+0.250
ishly+0.234
ittens+0.232
Neg logit contributions
bo-0.208
aron-0.201
y-0.192
en-0.190
by-0.185
Token tell
Token ID1560
Feature activation-5.24e-02

Pos logit contributions
bo+0.038
y+0.036
en+0.036
aron+0.035
by+0.034
Neg logit contributions
anger-0.052
stack-0.047
 plunge-0.046
ittens-0.045
ute-0.044
Token her
Token ID607
Feature activation-2.30e-02

Pos logit contributions
anger+0.105
stack+0.103
 plunge+0.099
ses+0.093
ittens+0.092
Neg logit contributions
bo-0.077
y-0.076
aron-0.074
en-0.073
by-0.068
Token family
Token ID1641
Feature activation+1.59e-02

Pos logit contributions
anger+0.041
stack+0.040
 plunge+0.039
ses+0.036
aroo+0.035
Neg logit contributions
y-0.040
bo-0.039
aron-0.037
by-0.037
en-0.036
Token she
Token ID673
Feature activation+3.62e-02

Pos logit contributions
bo+0.023
aron+0.022
y+0.022
en+0.021
by+0.020
Neg logit contributions
anger-0.032
stack-0.031
 plunge-0.030
ses-0.028
ittens-0.028
Token scratch
Token ID12692
Feature activation+1.01e-02

Pos logit contributions
bo+0.046
y+0.044
en+0.044
aron+0.043
by+0.040
Neg logit contributions
anger-0.048
stack-0.044
 plunge-0.041
ittens-0.041
ses-0.039
Token his
Token ID465
Feature activation+6.38e-03

Pos logit contributions
aron+0.017
bo+0.016
y+0.016
en+0.015
by+0.015
Neg logit contributions
anger-0.017
stack-0.017
 plunge-0.017
aroo-0.016
ses-0.015
Token ear
Token ID1027
Feature activation-4.79e-02

Pos logit contributions
bo+0.012
y+0.012
aron+0.011
by+0.011
en+0.011
Neg logit contributions
anger-0.011
stack-0.010
 plunge-0.010
otion-0.010
-0.009
Token with
Token ID351
Feature activation+2.10e-02

Pos logit contributions
anger+0.122
stack+0.120
 plunge+0.118
ses+0.112
ittens+0.112
Neg logit contributions
bo-0.042
y-0.042
aron-0.040
en-0.038
by-0.034
Token his
Token ID465
Feature activation+6.17e-03

Pos logit contributions
y+0.027
bo+0.027
aron+0.026
en+0.025
by+0.024
Neg logit contributions
anger-0.045
stack-0.044
 plunge-0.044
ses-0.041
ittens-0.041
Token paw
Token ID41827
Feature activation-1.33e-01

Pos logit contributions
bo+0.011
y+0.010
aron+0.010
by+0.010
en+0.009
Neg logit contributions
stack-0.011
anger-0.011
 plunge-0.010
otion-0.010
ishly-0.009
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
anger+0.293
stack+0.282
 plunge+0.276
ittens+0.267
ses+0.265
Neg logit contributions
bo-0.155
aron-0.154
en-0.150
by-0.145
y-0.144
Token The
Token ID383
Feature activation-4.99e-02

Pos logit contributions
y+0.016
bo+0.016
en+0.016
aron+0.015
by+0.014
Neg logit contributions
anger-0.028
stack-0.027
 plunge-0.026
ses-0.025
ittens-0.025
Token fle
Token ID5104
Feature activation+5.45e-03

Pos logit contributions
anger+0.098
stack+0.096
 plunge+0.092
ittens+0.087
ses+0.085
Neg logit contributions
bo-0.075
y-0.074
aron-0.070
en-0.069
by-0.066
Tokena
Token ID64
Feature activation+7.42e-02

Pos logit contributions
bo+0.005
en+0.005
aron+0.005
y+0.005
by+0.005
Neg logit contributions
anger-0.013
stack-0.013
 plunge-0.012
ses-0.012
ittens-0.012
Token jumps
Token ID18045
Feature activation-3.93e-03

Pos logit contributions
aron+0.124
y+0.123
bo+0.120
en+0.119
Em+0.116
Neg logit contributions
anger-0.122
ses-0.118
 plunge-0.116
stack-0.115
sed-0.111
Token now
Token ID783
Feature activation-5.42e-02

Pos logit contributions
anger+0.057
stack+0.052
 plunge+0.052
ittens+0.049
ses+0.049
Neg logit contributions
bo-0.056
y-0.054
en-0.053
aron-0.053
 added-0.049
Token reach
Token ID3151
Feature activation-1.22e-02

Pos logit contributions
anger+0.081
stack+0.079
 plunge+0.076
ses+0.069
ittens+0.069
Neg logit contributions
bo-0.105
y-0.105
aron-0.103
en-0.100
by-0.096
Token the
Token ID262
Feature activation-1.59e-03

Pos logit contributions
anger+0.022
stack+0.021
 plunge+0.021
ittens+0.018
ble+0.018
Neg logit contributions
bo-0.021
aron-0.020
y-0.019
 cared-0.019
en-0.017
Token spaghetti
Token ID48241
Feature activation-2.04e-02

Pos logit contributions
anger+0.002
stack+0.002
 plunge+0.002
ses+0.002
sed+0.002
Neg logit contributions
bo-0.003
aron-0.003
en-0.003
y-0.003
 cared-0.003
Token.
Token ID13
Feature activation+6.85e-03

Pos logit contributions
anger+0.043
stack+0.043
 plunge+0.041
ishly+0.037
ses+0.037
Neg logit contributions
bo-0.028
aron-0.026
y-0.025
en-0.025
by-0.025
Token Kitty
Token ID21168
Feature activation-1.37e-01

Pos logit contributions
bo+0.011
y+0.011
aron+0.011
en+0.010
by+0.010
Neg logit contributions
anger-0.013
stack-0.012
 plunge-0.012
ittens-0.011
ses-0.011
Token took
Token ID1718
Feature activation+2.11e-02

Pos logit contributions
stack+0.264
ishly+0.252
ses+0.234
sed+0.234
anger+0.232
Neg logit contributions
Kat-0.164
older-0.153
ams-0.152
bo-0.149
 late-0.149
Token a
Token ID257
Feature activation-1.92e-02

Pos logit contributions
y+0.032
bo+0.031
aron+0.031
en+0.030
by+0.028
Neg logit contributions
anger-0.041
stack-0.039
 plunge-0.039
ittens-0.036
ses-0.036
Token big
Token ID1263
Feature activation+6.14e-02

Pos logit contributions
anger+0.035
 plunge+0.033
stack+0.032
ittens+0.030
ses+0.029
Neg logit contributions
bo-0.032
aron-0.032
en-0.031
y-0.031
 cared-0.028
Token bite
Token ID13197
Feature activation-4.27e-02

Pos logit contributions
bo+0.103
y+0.102
aron+0.100
en+0.098
by+0.093
Neg logit contributions
anger-0.109
stack-0.105
 plunge-0.103
ses-0.095
ittens-0.095
Token and
Token ID290
Feature activation-8.46e-03

Pos logit contributions
anger+0.080
stack+0.072
 plunge+0.072
ittens+0.066
ble+0.064
Neg logit contributions
bo-0.066
by-0.063
aron-0.060
 cared-0.060
idi-0.058