TOP ACTIVATIONS
MAX = 16.842

ice cream , and bought souven irs . They
't let yourself become too dependent on one thing . It
bring the quarrel to a resolution , until everyone was happy
took a few spark ly souven irs home with her to
can keep it as a souven ir of your adventure .
take it home as a souven ir of their sm elly
keep the ring as a souven ir . They thanked the
to keep them as a souven ir of Frost y .
gold coin as a special souven ir of Sarah 's adventure
could keep it as a souven ir . S
He loaded his backpack with souven irs and kept travelling until
take it home as a souven ir from the park .
a purple feather as a souven ir .
keep the spring as a souven ir of his adventure .
never dared to join them <|endoftext|> Ab by and Tommy were
one for himself as a souven ir ! But his mum
found a beautiful sun flower <|endoftext|> Once upon a time ,
to have such a cool souven ir from their exciting adventure
a blue pass in it <|endoftext|> Once upon a time there
tr ink et as a souven ir . Then he hopped

MIN ACTIVATIONS
MIN = -12.063

then , Tim my and his dad have made sure to
the pond , fl apping his wings . He was so
Max w agged his tail and ran around happily
. Jack felt his heart stop . He knew
As Tim my and his friends were playing , they
now he could qu ench his thirst and have a cool
Tom my and his friend were both very glad
Brown ie w agged his tail and said , "
Suddenly , his dad had an idea .
so , Tim my and his mom picked up all the
! John and his mom walked away with happy
Then , his mom came and hugged him
." Tim , his mom , and his dad
. Tim and his friends worked together to raise
Jo jo cl apped his hands and danced along as
." John and his mom found some water and
Max w agged his tail and wanted to play
Tim my and his family continued their walk ,
delightful . Tim my and his friend tasted the delic iously
Little Johnny nodded his head . He was so

INTERVAL 9.616 - 16.842
CONTAINS 0.833%

sharing made everyone happy . <|endoftext|> Once upon a time ,
. You need a better joke , Tom . A funny
asked why they wouldn 't give it back . The person
They loved growing tomatoes . <|endoftext|> L ila was lonely in
things the right way . <|endoftext|> Once there was a tall

INTERVAL 2.389 - 9.616
CONTAINS 40.407%

came back with a big piece of paper . She smiled
puppy is brown and fluffy . Sara and Max go to
. We nap and pur r and scratch ."
home Sam from the store . She was also happy that
. <|endoftext|> Once upon a time , there was a princess

INTERVAL -4.837 - 2.389
CONTAINS 36.775%

upon a time , there was a tiny seed . It
warm breath . Snow y likes Lily and Tom
." The boy smiled happily . " That
were done , Jack said , " Look at the pretty
got closer , they realized it was an elephant 's trunk

INTERVAL -12.063 - -4.837
CONTAINS 21.985%

day , someone knocked on the door , and it was
for his pet . But he didn 't find it and
in the yard , in the street , and in the
He gathered enough pieces of the red ball to bring back
you stay out too long you might freeze . Come inside
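
NOTE ON INTERVALS
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. This is an inference from the listed numbers, not something the dump states. The sketch below is a minimal check under that assumption: it recomputes the bin edges from the MIN and MAX values and sums the CONTAINS percentages. The helper name equal_width_edges is hypothetical and is not part of any tool that produced this output. Because MIN and MAX are shown rounded to three decimals, recomputed edges may differ from the listed ones in the last digit.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi inclusive."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# Values copied from the MIN / MAX lines and the INTERVAL headers above.
MIN_ACT, MAX_ACT = -12.063, 16.842
listed_edges = [-12.063, -4.837, 2.389, 9.616, 16.842]
# CONTAINS percentages, ordered from the lowest interval to the highest.
listed_shares = [21.985, 36.775, 40.407, 0.833]

# Assumption: four equal-width bins between MIN and MAX.
edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

for computed, listed in zip(edges, listed_edges):
    print(f"computed {computed:9.4f}   listed {listed:8.3f}")
print(f"total share: {sum(listed_shares):.3f}%")
```

If the computed and listed edges agree to within rounding, the equal-width reading is consistent with the data. The top interval then holds under 1% of tokens, which matches the feature firing strongly on only a few tokens such as "souven".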
Token ice
Token ID4771
Feature activation+2.38e+00

Pos logit contributions
anes+0.226
uts+0.217
all+0.203
ous+0.196
that+0.192
Neg logit contributions
 forgot-0.155
 scar-0.129
 fort-0.126
 hospital-0.122
 trap-0.121
Token cream
Token ID8566
Feature activation+4.58e+00

Pos logit contributions
anes+0.072
uts+0.067
all+0.063
ous+0.062
 Isn+0.061
Neg logit contributions
 forgot-0.051
 scar-0.044
 fort-0.043
 bit-0.040
 scarf-0.040
Token,
Token ID11
Feature activation-1.06e+00

Pos logit contributions
anes+0.154
uts+0.141
all+0.135
 Isn+0.134
ous+0.133
Neg logit contributions
 forgot-0.079
 scar-0.069
 fort-0.068
 bit-0.066
 head-0.063
Token and
Token ID290
Feature activation-6.04e+00

Pos logit contributions
 forgot+0.025
iper+0.024
 fort+0.023
 trap+0.022
 hospital+0.022
Neg logit contributions
uts-0.026
anes-0.026
all-0.024
that-0.022
unch-0.022
Token bought
Token ID5839
Feature activation+6.34e+00

Pos logit contributions
 forgot+0.124
iper+0.107
 fort+0.107
 trap+0.102
 hospital+0.101
Neg logit contributions
anes-0.171
uts-0.167
all-0.156
ous-0.150
 Isn-0.147
Token souven
Token ID45043
Feature activation+1.68e+01

Pos logit contributions
anes+0.189
uts+0.181
ous+0.164
all+0.164
unch+0.160
Neg logit contributions
 forgot-0.135
 scar-0.127
 scarf-0.115
 fort-0.114
 hospital-0.114
Tokenirs
Token ID17062
Feature activation+7.71e+00

Pos logit contributions
uts+0.281
anes+0.281
 Isn+0.266
that+0.254
ous+0.246
Neg logit contributions
 forgot-0.502
iper-0.497
 scar-0.487
ches-0.465
 fort-0.453
Token.
Token ID13
Feature activation+5.04e+00

Pos logit contributions
anes+0.203
unch+0.186
uts+0.182
 Isn+0.178
all+0.174
Neg logit contributions
 forgot-0.172
 fort-0.154
 scar-0.149
 hospital-0.144
 shield-0.142
Token
Token ID198
Feature activation+3.98e+00

Pos logit contributions
anes+0.143
ous+0.131
uts+0.131
qu+0.127
all+0.123
Neg logit contributions
 forgot-0.109
 closet-0.095
 hospital-0.092
 scar-0.090
 fort-0.089
Token
Token ID198
Feature activation+3.62e+00

Pos logit contributions
anes+0.094
uts+0.088
ous+0.086
umm+0.085
all+0.079
Neg logit contributions
 forgot-0.103
 hospital-0.089
 fort-0.089
 scar-0.089
 learned-0.087
TokenThey
Token ID2990
Feature activation-6.96e+00

Pos logit contributions
anes+0.102
uts+0.096
ous+0.091
all+0.087
 Isn+0.085
Neg logit contributions
 forgot-0.084
 scar-0.070
 fort-0.068
 hospital-0.067
 trap-0.066
Token't
Token ID470
Feature activation-2.66e+00

Pos logit contributions
 Hello+0.131
 Isn+0.129
umm+0.128
anes+0.127
uts+0.127
Neg logit contributions
 forgot-0.178
ches-0.162
 didn-0.160
 hospital-0.157
 cast-0.157
Token let
Token ID1309
Feature activation+6.15e+00

Pos logit contributions
 forgot+0.055
 bit+0.041
 scar+0.040
 hospital+0.040
 head+0.039
Neg logit contributions
anes-0.086
uts-0.081
all-0.074
ous-0.072
unch-0.072
Token yourself
Token ID3511
Feature activation-2.59e+00

Pos logit contributions
anes+0.186
uts+0.178
all+0.165
ous+0.160
unch+0.157
Neg logit contributions
 forgot-0.131
 scar-0.108
 fort-0.104
 hospital-0.104
 trap-0.101
Token become
Token ID1716
Feature activation+1.67e+00

Pos logit contributions
 forgot+0.062
 bit+0.050
 scar+0.050
 shield+0.050
 head+0.050
Neg logit contributions
anes-0.072
uts-0.066
all-0.061
unch-0.060
ous-0.059
Token too
Token ID1165
Feature activation+2.11e+00

Pos logit contributions
anes+0.050
uts+0.048
all+0.045
ous+0.043
that+0.043
Neg logit contributions
 forgot-0.035
 scar-0.028
 fort-0.028
 hospital-0.027
 closet-0.027
Token dependent
Token ID10795
Feature activation+1.68e+01

Pos logit contributions
anes+0.076
uts+0.068
amb+0.067
all+0.066
that+0.065
Neg logit contributions
 forgot-0.038
 bit-0.028
 dangerous-0.026
 way-0.025
 remembered-0.024
Token on
Token ID319
Feature activation+2.95e+00

Pos logit contributions
anes+0.388
uts+0.372
unch+0.349
all+0.345
ous+0.344
Neg logit contributions
 forgot-0.421
ches-0.357
iper-0.356
 head-0.355
 hospital-0.352
Token one
Token ID530
Feature activation+8.80e-02

Pos logit contributions
anes+0.085
uts+0.081
all+0.075
ous+0.072
unch+0.071
Neg logit contributions
 forgot-0.067
 scar-0.056
 fort-0.054
 hospital-0.053
 shield-0.053
Token thing
Token ID1517
Feature activation+7.03e+00

Pos logit contributions
anes+0.002
uts+0.002
all+0.002
unch+0.002
ous+0.002
Neg logit contributions
 forgot-0.002
 bit-0.002
 fort-0.002
 trap-0.002
 scar-0.002
Token.
Token ID13
Feature activation+4.93e+00

Pos logit contributions
anes+0.168
uts+0.152
all+0.144
unch+0.142
that+0.136
Neg logit contributions
 forgot-0.190
 scar-0.160
 fort-0.159
 hospital-0.159
 bit-0.158
Token It
Token ID632
Feature activation-5.09e+00

Pos logit contributions
anes+0.116
uts+0.112
ous+0.105
qu+0.098
ges+0.098
Neg logit contributions
 forgot-0.123
 hospital-0.109
 closet-0.104
iper-0.103
 learned-0.102
Token bring
Token ID2222
Feature activation+5.39e+00

Pos logit contributions
 forgot+0.053
 scar+0.043
 fort+0.043
 hospital+0.041
 closet+0.040
Neg logit contributions
anes-0.082
uts-0.078
all-0.074
ous-0.071
that-0.070
Token the
Token ID262
Feature activation-6.10e+00

Pos logit contributions
anes+0.143
uts+0.136
all+0.127
ous+0.120
that+0.118
Neg logit contributions
 forgot-0.133
 scar-0.112
 fort-0.111
 hospital-0.107
 closet-0.106
Token quarrel
Token ID41998
Feature activation+9.87e+00

Pos logit contributions
 forgot+0.134
 scar+0.109
 fort+0.109
iper+0.106
 hospital+0.103
Neg logit contributions
anes-0.176
uts-0.166
all-0.157
ous-0.151
that-0.149
Token to
Token ID284
Feature activation-1.62e+00

Pos logit contributions
anes+0.253
uts+0.239
all+0.221
ous+0.211
that+0.207
Neg logit contributions
 forgot-0.252
 scar-0.213
 fort-0.212
 hospital-0.206
 shield-0.204
Token a
Token ID257
Feature activation-4.40e+00

Pos logit contributions
 forgot+0.037
 scar+0.031
 fort+0.030
 hospital+0.030
 shield+0.029
Neg logit contributions
anes-0.046
uts-0.044
all-0.041
ous-0.039
that-0.039
Token resolution
Token ID6323
Feature activation+1.67e+01

Pos logit contributions
 forgot+0.089
 scar+0.072
iper+0.070
 fort+0.069
 closet+0.066
Neg logit contributions
anes-0.131
uts-0.125
all-0.119
ous-0.117
that-0.114
Token,
Token ID11
Feature activation-4.90e-01

Pos logit contributions
anes+0.476
uts+0.447
all+0.413
 Isn+0.398
unch+0.398
Neg logit contributions
 forgot-0.376
 scar-0.306
 fort-0.302
 closet-0.295
 hospital-0.295
Token until
Token ID1566
Feature activation+5.58e-02

Pos logit contributions
 forgot+0.011
 fort+0.009
 scar+0.009
 trap+0.009
iper+0.009
Neg logit contributions
anes-0.014
uts-0.013
all-0.013
that-0.012
ous-0.012
Token everyone
Token ID2506
Feature activation-2.22e+00

Pos logit contributions
anes+0.001
uts+0.001
all+0.001
ous+0.001
that+0.001
Neg logit contributions
 forgot-0.001
 scar-0.001
 fort-0.001
 hospital-0.001
 trap-0.001
Token was
Token ID373
Feature activation-3.24e+00

Pos logit contributions
 forgot+0.039
 scar+0.034
 fort+0.033
 scarf+0.031
 closet+0.031
Neg logit contributions
anes-0.073
uts-0.070
all-0.066
ous-0.064
od-0.063
Token happy
Token ID3772
Feature activation+3.48e+00

Pos logit contributions
 forgot+0.069
 scar+0.054
 fort+0.052
 hospital+0.052
 bit+0.052
Neg logit contributions
anes-0.100
uts-0.095
all-0.088
ous-0.087
 Isn-0.084
Token took
Token ID1718
Feature activation+4.84e+00

Pos logit contributions
 fort+0.098
 trap+0.090
 hospital+0.089
 forgot+0.086
 closet+0.085
Neg logit contributions
all-0.140
anes-0.133
od-0.127
ges-0.126
Welcome-0.126
Token a
Token ID257
Feature activation-5.01e+00

Pos logit contributions
anes+0.128
uts+0.122
all+0.113
ous+0.108
that+0.106
Neg logit contributions
 forgot-0.120
 scar-0.102
 fort-0.100
 hospital-0.098
 trap-0.097
Token few
Token ID1178
Feature activation-1.13e-01

Pos logit contributions
 forgot+0.101
 hospital+0.078
 fort+0.077
 scar+0.077
iper+0.075
Neg logit contributions
anes-0.152
uts-0.146
all-0.138
ous-0.136
 Isn-0.132
Token spark
Token ID9009
Feature activation+4.72e+00

Pos logit contributions
 forgot+0.003
 fort+0.002
 hospital+0.002
 head+0.002
iper+0.002
Neg logit contributions
anes-0.003
uts-0.003
all-0.003
 Isn-0.003
qu-0.003
Tokenly
Token ID306
Feature activation-3.17e+00

Pos logit contributions
anes+0.125
uts+0.121
all+0.115
ous+0.112
qu+0.106
Neg logit contributions
 forgot-0.117
 closet-0.097
 fort-0.096
 trap-0.095
 hospital-0.095
Token souven
Token ID45043
Feature activation+1.66e+01

Pos logit contributions
 forgot+0.073
 scar+0.057
 hospital+0.057
 fort+0.057
iper+0.056
Neg logit contributions
anes-0.090
uts-0.085
all-0.080
ous-0.079
 Isn-0.076
Tokenirs
Token ID17062
Feature activation+7.28e+00

Pos logit contributions
uts+0.311
anes+0.300
 Isn+0.291
that+0.273
 Hello+0.265
Neg logit contributions
iper-0.455
 forgot-0.453
 scar-0.428
 closet-0.415
ches-0.401
Token home
Token ID1363
Feature activation+6.37e+00

Pos logit contributions
anes+0.193
uts+0.177
unch+0.173
 Share+0.164
all+0.163
Neg logit contributions
 forgot-0.168
 fort-0.145
 scar-0.142
 shield-0.139
 closet-0.138
Token with
Token ID351
Feature activation+2.69e+00

Pos logit contributions
anes+0.170
uts+0.154
all+0.141
unch+0.139
 Isn+0.139
Neg logit contributions
 forgot-0.158
 scar-0.133
 fort-0.133
 closet-0.130
 hospital-0.129
Token her
Token ID607
Feature activation-8.35e+00

Pos logit contributions
anes+0.083
uts+0.079
all+0.072
ous+0.072
unch+0.070
Neg logit contributions
 forgot-0.056
 scar-0.048
 closet-0.046
 scarf-0.045
 shield-0.045
Token to
Token ID284
Feature activation-2.04e+00

Pos logit contributions
 forgot+0.168
 scar+0.137
 fort+0.132
 closet+0.129
 shield+0.129
Neg logit contributions
anes-0.266
uts-0.252
all-0.237
ous-0.228
that-0.225
Token can
Token ID460
Feature activation+3.06e+00

Pos logit contributions
 fort+0.072
 forgot+0.070
 scar+0.068
eh+0.068
 closet+0.067
Neg logit contributions
anes-0.085
uts-0.081
all-0.073
od-0.072
qu-0.071
Token keep
Token ID1394
Feature activation+4.11e+00

Pos logit contributions
anes+0.093
uts+0.089
all+0.082
ous+0.082
that+0.080
Neg logit contributions
 forgot-0.063
 hospital-0.052
 cast-0.052
 scar-0.052
 shield-0.050
Token it
Token ID340
Feature activation-3.14e+00

Pos logit contributions
anes+0.106
uts+0.100
all+0.088
ous+0.088
unch+0.086
Neg logit contributions
 forgot-0.108
 scar-0.095
 shield-0.090
 hospital-0.089
 trap-0.089
Token as
Token ID355
Feature activation+2.39e+00

Pos logit contributions
 forgot+0.083
 scar+0.069
 hospital+0.068
 fort+0.068
iper+0.066
Neg logit contributions
anes-0.076
uts-0.075
all-0.069
ous-0.064
that-0.063
Token a
Token ID257
Feature activation-4.81e+00

Pos logit contributions
anes+0.071
uts+0.068
all+0.063
ous+0.061
that+0.060
Neg logit contributions
 forgot-0.052
 scar-0.043
 fort-0.042
 hospital-0.041
 trap-0.041
Token souven
Token ID45043
Feature activation+1.62e+01

Pos logit contributions
 forgot+0.099
 scar+0.082
 fort+0.080
 bit+0.078
 trap+0.078
Neg logit contributions
anes-0.153
uts-0.145
all-0.135
ous-0.129
that-0.128
Tokenir
Token ID343
Feature activation+1.71e+00

Pos logit contributions
 Isn+0.374
uts+0.373
anes+0.369
that+0.356
 Hello+0.347
Neg logit contributions
 forgot-0.366
iper-0.357
ig-0.329
 cast-0.323
ches-0.323
Token of
Token ID286
Feature activation+1.36e-01

Pos logit contributions
anes+0.042
uts+0.039
 Isn+0.035
that+0.035
all+0.035
Neg logit contributions
 forgot-0.044
 scar-0.038
 trap-0.038
 hospital-0.038
 head-0.037
Token your
Token ID534
Feature activation-8.48e+00

Pos logit contributions
anes+0.004
uts+0.003
all+0.003
ous+0.003
that+0.003
Neg logit contributions
 forgot-0.003
 scar-0.003
 fort-0.003
 hospital-0.003
 trap-0.003
Token adventure
Token ID8855
Feature activation+7.99e+00

Pos logit contributions
 forgot+0.182
 scar+0.142
iper+0.139
 fort+0.138
 slammed+0.138
Neg logit contributions
anes-0.243
uts-0.238
all-0.221
ous-0.212
 Isn-0.210
Token.
Token ID13
Feature activation+5.11e+00

Pos logit contributions
anes+0.184
uts+0.169
all+0.156
ous+0.151
that+0.148
Neg logit contributions
 forgot-0.225
 scar-0.196
 fort-0.193
 hospital-0.188
 trap-0.188
Token take
Token ID1011
Feature activation+5.04e+00

Pos logit contributions
 forgot+0.048
 scar+0.041
 fort+0.040
 closet+0.038
iper+0.038
Neg logit contributions
anes-0.059
uts-0.057
all-0.055
 Share-0.051
qu-0.051
Token it
Token ID340
Feature activation-2.54e+00

Pos logit contributions
anes+0.115
uts+0.109
all+0.100
ous+0.094
that+0.092
Neg logit contributions
 forgot-0.144
 scar-0.124
 fort-0.122
 hospital-0.119
 closet-0.119
Token home
Token ID1363
Feature activation+6.46e+00

Pos logit contributions
 forgot+0.068
 scar+0.057
 fort+0.055
 hospital+0.055
iper+0.054
Neg logit contributions
anes-0.062
uts-0.060
all-0.056
ous-0.052
that-0.051
Token as
Token ID355
Feature activation+2.53e+00

Pos logit contributions
anes+0.147
uts+0.136
all+0.124
ous+0.119
that+0.115
Neg logit contributions
 forgot-0.186
 scar-0.161
 fort-0.159
 hospital-0.156
 shield-0.155
Token a
Token ID257
Feature activation-4.89e+00

Pos logit contributions
anes+0.072
uts+0.067
all+0.062
ous+0.060
that+0.060
Neg logit contributions
 forgot-0.059
 scar-0.049
 fort-0.049
 shield-0.047
 hospital-0.047
Token souven
Token ID45043
Feature activation+1.61e+01

Pos logit contributions
 forgot+0.103
 scar+0.082
 fort+0.079
 hospital+0.078
iper+0.078
Neg logit contributions
anes-0.143
uts-0.139
all-0.130
ous-0.127
 Isn-0.124
Tokenir
Token ID343
Feature activation+1.28e+00

Pos logit contributions
uts+0.351
anes+0.347
 Isn+0.342
that+0.320
 Hello+0.315
Neg logit contributions
 forgot-0.411
iper-0.377
 closet-0.349
ches-0.347
 scar-0.344
Token of
Token ID286
Feature activation+5.01e-02

Pos logit contributions
anes+0.031
uts+0.029
 Isn+0.026
all+0.026
that+0.025
Neg logit contributions
 forgot-0.034
 fort-0.029
 scar-0.029
 hospital-0.029
 trap-0.029
Token their
Token ID511
Feature activation-9.07e+00

Pos logit contributions
anes+0.001
uts+0.001
all+0.001
ous+0.001
that+0.001
Neg logit contributions
 forgot-0.001
 scar-0.001
 fort-0.001
 hospital-0.001
 trap-0.001
Token sm
Token ID895
Feature activation+7.89e+00

Pos logit contributions
 forgot+0.199
 scar+0.151
iper+0.148
 slammed+0.148
 remembered+0.146
Neg logit contributions
anes-0.254
uts-0.248
all-0.237
ous-0.225
that-0.218
Tokenelly
Token ID6148
Feature activation-1.90e+00

Pos logit contributions
anes+0.134
uts+0.125
 Isn+0.110
that+0.109
qu+0.106
Neg logit contributions
 forgot-0.253
 fort-0.234
 hospital-0.232
 closet-0.228
iper-0.227
Token keep
Token ID1394
Feature activation+4.15e+00

Pos logit contributions
 forgot+0.053
 scar+0.043
 fort+0.043
 hospital+0.041
 shield+0.041
Neg logit contributions
anes-0.075
uts-0.072
all-0.068
ous-0.065
that-0.064
Token the
Token ID262
Feature activation-6.53e+00

Pos logit contributions
anes+0.123
uts+0.118
all+0.109
ous+0.106
that+0.103
Neg logit contributions
 forgot-0.092
 scar-0.077
 fort-0.074
 hospital-0.073
 closet-0.072
Token ring
Token ID5858
Feature activation+5.01e+00

Pos logit contributions
 forgot+0.159
 fort+0.132
 scar+0.132
iper+0.129
 closet+0.125
Neg logit contributions
anes-0.175
uts-0.165
all-0.156
that-0.149
ous-0.148
Token as
Token ID355
Feature activation+2.40e+00

Pos logit contributions
anes+0.115
uts+0.111
all+0.103
ous+0.098
that+0.094
Neg logit contributions
 forgot-0.141
 scar-0.118
 fort-0.114
 hospital-0.114
iper-0.113
Token a
Token ID257
Feature activation-4.98e+00

Pos logit contributions
anes+0.067
uts+0.064
all+0.060
ous+0.057
that+0.056
Neg logit contributions
 forgot-0.056
 scar-0.047
 fort-0.046
 hospital-0.045
 shield-0.045
Token souven
Token ID45043
Feature activation+1.61e+01

Pos logit contributions
 forgot+0.109
 scar+0.089
 fort+0.088
 hospital+0.085
iper+0.084
Neg logit contributions
anes-0.144
uts-0.138
all-0.129
ous-0.125
that-0.122
Tokenir
Token ID343
Feature activation+1.61e+00

Pos logit contributions
uts+0.359
anes+0.352
 Isn+0.342
that+0.323
ous+0.316
Neg logit contributions
 forgot-0.402
iper-0.370
 closet-0.347
 scar-0.347
ig-0.344
Token.
Token ID13
Feature activation+5.32e+00

Pos logit contributions
anes+0.039
uts+0.037
 Isn+0.033
that+0.033
all+0.033
Neg logit contributions
 forgot-0.041
 head-0.036
 scar-0.035
 trap-0.035
 hospital-0.035
Token They
Token ID1119
Feature activation-7.27e+00

Pos logit contributions
anes+0.145
uts+0.135
ous+0.132
qu+0.127
all+0.124
Neg logit contributions
 forgot-0.130
 closet-0.112
 scar-0.110
 fort-0.109
 hospital-0.108
Token thanked
Token ID26280
Feature activation+5.38e+00

Pos logit contributions
 fort+0.142
 trap+0.130
 shield+0.123
iper+0.123
 hospital+0.123
Neg logit contributions
anes-0.195
uts-0.194
all-0.181
 Mike-0.178
od-0.175
Token the
Token ID262
Feature activation-6.46e+00

Pos logit contributions
uts+0.147
anes+0.147
ous+0.126
unch+0.124
all+0.123
Neg logit contributions
 forgot-0.128
 scar-0.116
 closet-0.108
 hospital-0.106
 scarf-0.106
Token to
Token ID284
Feature activation-2.71e+00

Pos logit contributions
anes+0.032
uts+0.031
all+0.028
ous+0.026
that+0.025
Neg logit contributions
 forgot-0.046
 scar-0.040
 fort-0.040
 hospital-0.039
 trap-0.039
Token keep
Token ID1394
Feature activation+4.08e+00

Pos logit contributions
 forgot+0.059
 scar+0.049
 fort+0.048
 closet+0.045
 shield+0.045
Neg logit contributions
anes-0.078
uts-0.074
all-0.071
ous-0.066
qu-0.066
Token them
Token ID606
Feature activation-5.07e+00

Pos logit contributions
anes+0.095
uts+0.090
all+0.081
ous+0.078
unch+0.075
Neg logit contributions
 forgot-0.116
 scar-0.102
 fort-0.099
 hospital-0.097
 shield-0.097
Token as
Token ID355
Feature activation+2.14e+00

Pos logit contributions
 forgot+0.122
 hospital+0.099
iper+0.099
 trap+0.096
eh+0.095
Neg logit contributions
uts-0.129
anes-0.123
all-0.118
eddy-0.106
that-0.106
Token a
Token ID257
Feature activation-5.06e+00

Pos logit contributions
anes+0.059
uts+0.057
all+0.053
ous+0.051
that+0.049
Neg logit contributions
 forgot-0.051
 scar-0.042
 fort-0.042
 hospital-0.041
 closet-0.040
Token souven
Token ID45043
Feature activation+1.61e+01

Pos logit contributions
 forgot+0.115
 scar+0.093
 fort+0.092
 hospital+0.090
iper+0.088
Neg logit contributions
anes-0.143
uts-0.138
all-0.128
ous-0.124
that-0.121
Tokenir
Token ID343
Feature activation+1.47e+00

Pos logit contributions
anes+0.336
uts+0.335
 Isn+0.326
that+0.307
ous+0.291
Neg logit contributions
 forgot-0.430
iper-0.395
 scar-0.379
 closet-0.371
 fort-0.370
Token of
Token ID286
Feature activation-3.11e-02

Pos logit contributions
anes+0.036
uts+0.034
 Isn+0.030
all+0.030
that+0.030
Neg logit contributions
 forgot-0.038
 scar-0.033
 fort-0.033
 head-0.032
 trap-0.032
Token Frost
Token ID15122
Feature activation-2.03e+00

Pos logit contributions
 forgot+0.001
 scar+0.001
 fort+0.001
 hospital+0.001
 shield+0.001
Neg logit contributions
anes-0.001
uts-0.001
all-0.001
ous-0.001
that-0.001
Tokeny
Token ID88
Feature activation-5.42e+00

Pos logit contributions
 forgot+0.053
 fort+0.046
 scar+0.046
 bit+0.044
 head+0.043
Neg logit contributions
anes-0.052
uts-0.047
all-0.043
 Isn-0.043
that-0.042
Token.
Token ID13
Feature activation+4.52e+00

Pos logit contributions
 forgot+0.150
 scar+0.125
iper+0.121
 fort+0.120
 slammed+0.120
Neg logit contributions
anes-0.123
uts-0.120
all-0.109
ous-0.104
ges-0.101
Token gold
Token ID3869
Feature activation+2.05e+00

Pos logit contributions
 forgot+0.151
 scar+0.124
iper+0.124
 fort+0.123
 hospital+0.119
Neg logit contributions
anes-0.166
uts-0.156
all-0.151
that-0.144
 Isn-0.142
Token coin
Token ID10752
Feature activation+3.29e+00

Pos logit contributions
anes+0.060
uts+0.058
all+0.054
ous+0.051
that+0.051
Neg logit contributions
 forgot-0.045
 scar-0.037
 fort-0.036
 hospital-0.036
 closet-0.035
Token as
Token ID355
Feature activation+2.19e+00

Pos logit contributions
uts+0.075
anes+0.074
all+0.069
ous+0.063
that+0.063
Neg logit contributions
 forgot-0.090
 scar-0.078
 hospital-0.076
 fort-0.074
 closet-0.074
Token a
Token ID257
Feature activation-5.14e+00

Pos logit contributions
anes+0.062
uts+0.059
all+0.055
ous+0.053
that+0.052
Neg logit contributions
 forgot-0.051
 scar-0.042
 fort-0.041
 hospital-0.040
 shield-0.040
Token special
Token ID2041
Feature activation+1.76e-01

Pos logit contributions
 forgot+0.118
 scar+0.098
 fort+0.096
 hospital+0.093
 trap+0.093
Neg logit contributions
anes-0.147
uts-0.140
all-0.130
ous-0.125
that-0.123
Token souven
Token ID45043
Feature activation+1.60e+01

Pos logit contributions
anes+0.005
uts+0.005
all+0.004
ous+0.004
that+0.004
Neg logit contributions
 forgot-0.004
 scar-0.003
 fort-0.003
 hospital-0.003
 closet-0.003
Tokenir
Token ID343
Feature activation+1.58e+00

Pos logit contributions
uts+0.350
anes+0.338
 Isn+0.335
 Hello+0.311
that+0.310
Neg logit contributions
 forgot-0.399
iper-0.385
 closet-0.344
 scar-0.344
 cast-0.341
Token of
Token ID286
Feature activation+1.72e-01

Pos logit contributions
anes+0.038
uts+0.036
all+0.032
 Isn+0.032
that+0.031
Neg logit contributions
 forgot-0.041
 scar-0.035
 fort-0.035
 trap-0.035
 hospital-0.035
Token Sarah
Token ID10490
Feature activation-6.87e+00

Pos logit contributions
anes+0.005
uts+0.004
all+0.004
ous+0.004
that+0.004
Neg logit contributions
 forgot-0.004
 scar-0.004
 fort-0.004
 hospital-0.003
 shield-0.003
Token's
Token ID338
Feature activation-5.64e+00

Pos logit contributions
 forgot+0.177
 scar+0.153
 fort+0.151
 hospital+0.148
 shield+0.146
Neg logit contributions
anes-0.166
uts-0.164
all-0.151
ous-0.143
od-0.140
Token adventure
Token ID8855
Feature activation+8.29e+00

Pos logit contributions
 forgot+0.124
 scar+0.100
 fort+0.097
 hospital+0.094
iper+0.094
Neg logit contributions
anes-0.164
uts-0.157
all-0.148
ous-0.141
that-0.139
Token could
Token ID714
Feature activation+2.98e+00

Pos logit contributions
 forgot+0.138
 fort+0.135
 scar+0.133
 trap+0.129
iper+0.127
Neg logit contributions
anes-0.202
uts-0.192
all-0.178
ous-0.173
that-0.171
Token keep
Token ID1394
Feature activation+3.83e+00

Pos logit contributions
anes+0.099
uts+0.095
ous+0.088
all+0.087
 Isn+0.085
Neg logit contributions
 forgot-0.057
 scar-0.044
 hospital-0.044
 cast-0.043
 fort-0.043
Token it
Token ID340
Feature activation-3.13e+00

Pos logit contributions
anes+0.091
uts+0.086
all+0.079
ous+0.075
that+0.073
Neg logit contributions
 forgot-0.106
 scar-0.092
 fort-0.090
 hospital-0.089
 shield-0.088
Token as
Token ID355
Feature activation+2.25e+00

Pos logit contributions
 forgot+0.079
iper+0.067
 hospital+0.065
 scar+0.064
 slammed+0.063
Neg logit contributions
uts-0.075
anes-0.070
all-0.070
that-0.061
ous-0.061
Token a
Token ID257
Feature activation-4.96e+00

Pos logit contributions
anes+0.062
uts+0.060
all+0.056
ous+0.053
that+0.051
Neg logit contributions
 forgot-0.053
 scar-0.044
 fort-0.044
 hospital-0.043
 closet-0.042
Token souven
Token ID45043
Feature activation+1.60e+01

Pos logit contributions
 forgot+0.106
 scar+0.086
 fort+0.085
 hospital+0.082
 shield+0.081
Neg logit contributions
anes-0.149
uts-0.142
all-0.133
ous-0.128
that-0.126
Tokenir
Token ID343
Feature activation+1.68e+00

Pos logit contributions
anes+0.336
uts+0.330
 Isn+0.320
that+0.302
 Hello+0.294
Neg logit contributions
 forgot-0.434
iper-0.395
 scar-0.375
 closet-0.370
 cast-0.370
Token.
Token ID13
Feature activation+5.19e+00

Pos logit contributions
anes+0.040
uts+0.038
all+0.034
 Isn+0.034
that+0.033
Neg logit contributions
 forgot-0.045
 hospital-0.038
 fort-0.038
 scar-0.038
 trap-0.038
Token
Token ID198
Feature activation+4.67e+00

Pos logit contributions
anes+0.141
ous+0.132
qu+0.123
work+0.120
od+0.119
Neg logit contributions
 forgot-0.124
 closet-0.109
 hospital-0.107
 scar-0.102
 learned-0.099
Token
Token ID198
Feature activation+4.20e+00

Pos logit contributions
anes+0.115
ous+0.105
umm+0.103
uts+0.103
all+0.097
Neg logit contributions
 forgot-0.118
 hospital-0.105
 learned-0.101
 fort-0.101
 scar-0.101
TokenS
Token ID50
Feature activation-1.84e+00

Pos logit contributions
anes+0.100
uts+0.091
ous+0.085
all+0.083
od+0.078
Neg logit contributions
 forgot-0.121
 scar-0.103
 fort-0.100
 hospital-0.100
 bit-0.099
Token He
Token ID679
Feature activation-1.01e+01

Pos logit contributions
anes+0.105
uts+0.103
ous+0.096
qu+0.094
all+0.088
Neg logit contributions
 forgot-0.108
 hospital-0.093
 closet-0.092
 fort-0.090
iper-0.090
Token loaded
Token ID9639
Feature activation+6.90e+00

Pos logit contributions
 fort+0.182
 forgot+0.176
 closet+0.170
 trap+0.169
 scar+0.165
Neg logit contributions
anes-0.280
all-0.277
uts-0.268
ous-0.254
od-0.251
Token his
Token ID465
Feature activation-9.98e+00

Pos logit contributions
anes+0.167
uts+0.156
all+0.149
that+0.138
 Mike+0.136
Neg logit contributions
 forgot-0.188
 fort-0.153
 hospital-0.153
 scar-0.152
 trap-0.149
Token backpack
Token ID22526
Feature activation+5.28e+00

Pos logit contributions
 forgot+0.218
 slammed+0.168
 accidentally+0.163
 hospital+0.163
eh+0.160
Neg logit contributions
anes-0.252
uts-0.245
 aboard-0.237
all-0.231
ach-0.227
Token with
Token ID351
Feature activation+1.40e+00

Pos logit contributions
anes+0.123
uts+0.119
all+0.110
ous+0.101
that+0.100
Neg logit contributions
 forgot-0.147
 scar-0.126
 hospital-0.123
 fort-0.121
 trap-0.121
Token souven
Token ID45043
Feature activation+1.60e+01

Pos logit contributions
anes+0.035
uts+0.033
all+0.030
ous+0.029
that+0.028
Neg logit contributions
 forgot-0.037
 scar-0.031
 fort-0.031
 hospital-0.030
 trap-0.030
Tokenirs
Token ID17062
Feature activation+7.54e+00

Pos logit contributions
uts+0.247
anes+0.237
 Isn+0.227
that+0.213
 Hello+0.200
Neg logit contributions
 forgot-0.502
iper-0.492
 scar-0.473
 shield-0.463
 cast-0.452
Token and
Token ID290
Feature activation-5.06e+00

Pos logit contributions
anes+0.200
uts+0.187
all+0.173
unch+0.170
 Isn+0.167
Neg logit contributions
 forgot-0.182
 scar-0.153
 fort-0.153
 shield-0.150
 hospital-0.146
Token kept
Token ID4030
Feature activation+4.72e+00

Pos logit contributions
 forgot+0.104
 fort+0.091
 scar+0.091
 trap+0.088
 closet+0.088
Neg logit contributions
anes-0.140
uts-0.134
all-0.131
ous-0.126
 Mike-0.121
Token travelling
Token ID16574
Feature activation+9.06e+00

Pos logit contributions
anes+0.125
uts+0.120
ous+0.106
all+0.106
unch+0.103
Neg logit contributions
 forgot-0.117
 scar-0.105
 shield-0.100
 cast-0.098
 fort-0.098
Token until
Token ID1566
Feature activation+4.87e-01

Pos logit contributions
anes+0.219
uts+0.208
all+0.192
ous+0.183
that+0.178
Neg logit contributions
 forgot-0.245
 scar-0.208
 fort-0.204
 hospital-0.202
 trap-0.200
Token take
Token ID1011
Feature activation+4.70e+00

Pos logit contributions
 forgot+0.126
iper+0.119
 scar+0.117
 fort+0.114
eh+0.110
Neg logit contributions
anes-0.147
uts-0.145
all-0.138
that-0.132
ous-0.131
Token it
Token ID340
Feature activation-3.00e+00

Pos logit contributions
anes+0.106
uts+0.102
all+0.095
that+0.087
ous+0.086
Neg logit contributions
 forgot-0.133
 fort-0.115
 scar-0.115
 closet-0.112
 shield-0.110
Token home
Token ID1363
Feature activation+5.99e+00

Pos logit contributions
 forgot+0.069
 scar+0.058
 fort+0.056
 hospital+0.055
 closet+0.054
Neg logit contributions
anes-0.084
uts-0.081
all-0.076
ous-0.072
that-0.071
Token as
Token ID355
Feature activation+2.31e+00

Pos logit contributions
anes+0.137
uts+0.125
all+0.113
ous+0.109
 Isn+0.107
Neg logit contributions
 forgot-0.174
 scar-0.150
 fort-0.149
 hospital-0.146
 bit-0.145
Token a
Token ID257
Feature activation-5.09e+00

Pos logit contributions
anes+0.065
uts+0.061
all+0.057
ous+0.055
that+0.054
Neg logit contributions
 forgot-0.055
 scar-0.046
 fort-0.045
 hospital-0.044
 shield-0.043
Token souven
Token ID45043
Feature activation+1.60e+01

Pos logit contributions
 forgot+0.108
 scar+0.089
 fort+0.088
 trap+0.085
 bit+0.085
Neg logit contributions
anes-0.157
uts-0.149
all-0.138
ous-0.133
that-0.131
Tokenir
Token ID343
Feature activation+1.44e+00

Pos logit contributions
anes+0.337
uts+0.327
 Isn+0.306
that+0.289
ous+0.287
Neg logit contributions
 forgot-0.445
iper-0.397
 scar-0.383
 cast-0.378
 hospital-0.377
Token from
Token ID422
Feature activation+5.83e-01

Pos logit contributions
anes+0.036
uts+0.033
 Isn+0.031
that+0.029
all+0.029
Neg logit contributions
 forgot-0.037
 hospital-0.032
 trap-0.032
 head-0.032
 fort-0.031
Token the
Token ID262
Feature activation-6.24e+00

Pos logit contributions
anes+0.017
uts+0.016
all+0.015
ous+0.014
that+0.014
Neg logit contributions
 forgot-0.013
 scar-0.011
 fort-0.011
 hospital-0.010
 shield-0.010
Token park
Token ID3952
Feature activation+4.50e+00

Pos logit contributions
 forgot+0.124
 scar+0.098
iper+0.096
 fort+0.095
 closet+0.090
Neg logit contributions
anes-0.194
uts-0.183
all-0.176
ous-0.169
that-0.166
Token.
Token ID13
Feature activation+4.69e+00

Pos logit contributions
anes+0.101
uts+0.094
all+0.085
ous+0.081
that+0.079
Neg logit contributions
 forgot-0.131
 fort-0.112
 scar-0.112
 hospital-0.111
 trap-0.109
Token a
Token ID257
Feature activation-5.41e+00

Pos logit contributions
 forgot+0.194
 scar+0.164
 fort+0.163
 trap+0.157
 closet+0.157
Neg logit contributions
anes-0.179
uts-0.176
all-0.163
that-0.153
ous-0.150
Token purple
Token ID14032
Feature activation-3.18e-01

Pos logit contributions
 forgot+0.109
iper+0.084
 fort+0.083
 scar+0.082
 slammed+0.080
Neg logit contributions
anes-0.154
uts-0.152
ous-0.145
all-0.143
 Isn-0.138
Token feather
Token ID17972
Feature activation+4.49e+00

Pos logit contributions
 forgot+0.007
 scar+0.005
iper+0.005
 slammed+0.005
 fort+0.005
Neg logit contributions
anes-0.010
uts-0.009
all-0.009
ous-0.009
unch-0.008
Token as
Token ID355
Feature activation+1.61e+00

Pos logit contributions
anes+0.115
uts+0.106
all+0.099
 Isn+0.093
ous+0.093
Neg logit contributions
 forgot-0.116
 scar-0.100
 fort-0.099
 shield-0.097
 scarf-0.096
Token a
Token ID257
Feature activation-5.29e+00

Pos logit contributions
anes+0.044
uts+0.042
all+0.039
ous+0.037
that+0.037
Neg logit contributions
 forgot-0.038
 scar-0.032
 fort-0.032
 hospital-0.031
 shield-0.030
Token souven
Token ID45043
Feature activation+1.58e+01

Pos logit contributions
 forgot+0.111
 scar+0.091
 fort+0.089
 hospital+0.086
 shield+0.085
Neg logit contributions
anes-0.160
uts-0.153
all-0.143
ous-0.138
that-0.135
Tokenir
Token ID343
Feature activation+1.48e+00

Pos logit contributions
anes+0.344
uts+0.340
 Isn+0.323
that+0.302
 Hello+0.289
Neg logit contributions
 forgot-0.410
iper-0.388
 scar-0.364
 shield-0.350
ches-0.350
Token.
Token ID13
Feature activation+4.78e+00

Pos logit contributions
anes+0.030
uts+0.028
 Isn+0.025
all+0.024
that+0.024
Neg logit contributions
 forgot-0.044
 trap-0.039
 scar-0.039
 shield-0.039
 hospital-0.039
Token
Token ID220
Feature activation+8.21e+00

Pos logit contributions
anes+0.115
uts+0.107
all+0.096
ous+0.096
od+0.093
Neg logit contributions
 forgot-0.130
 fort-0.112
 hospital-0.110
 scar-0.110
 closet-0.109
Token
Token ID198
Feature activation+4.44e+00

Pos logit contributions
anes+0.185
ous+0.173
umm+0.170
uts+0.164
all+0.163
Neg logit contributions
 forgot-0.203
iper-0.173
 fort-0.173
 closet-0.169
 hospital-0.168
Token
Token ID198
Feature activation+4.01e+00

Pos logit contributions
anes+0.104
uts+0.093
ous+0.092
work+0.084
od+0.082
Neg logit contributions
 forgot-0.122
 learned-0.101
eh-0.101
iper-0.100
 cast-0.100
Token keep
Token ID1394
Feature activation+3.89e+00

Pos logit contributions
 forgot+0.055
 scar+0.046
 fort+0.045
 closet+0.042
iper+0.042
Neg logit contributions
anes-0.078
uts-0.075
all-0.072
ous-0.068
qu-0.067
Token the
Token ID262
Feature activation-6.63e+00

Pos logit contributions
anes+0.102
uts+0.097
all+0.090
ous+0.086
that+0.084
Neg logit contributions
 forgot-0.099
 scar-0.083
 fort-0.082
 hospital-0.079
 trap-0.079
Token spring
Token ID6076
Feature activation+5.68e+00

Pos logit contributions
 forgot+0.144
iper+0.115
 scar+0.102
 slammed+0.101
 remembered+0.100
Neg logit contributions
anes-0.192
uts-0.179
all-0.174
that-0.169
unch-0.166
Token as
Token ID355
Feature activation+2.07e+00

Pos logit contributions
anes+0.135
uts+0.131
all+0.122
ous+0.116
that+0.111
Neg logit contributions
 forgot-0.159
 scar-0.130
 remembered-0.128
 closet-0.125
iper-0.124
Token a
Token ID257
Feature activation-5.10e+00

Pos logit contributions
anes+0.055
uts+0.053
all+0.050
ous+0.047
that+0.045
Neg logit contributions
 forgot-0.050
 scar-0.043
 fort-0.042
 closet-0.041
iper-0.041
Token souven
Token ID45043
Feature activation+1.58e+01

Pos logit contributions
 forgot+0.103
 scar+0.081
iper+0.080
 fort+0.079
 closet+0.076
Neg logit contributions
anes-0.149
uts-0.148
all-0.139
ous-0.136
unch-0.132
Tokenir
Token ID343
Feature activation+1.33e+00

Pos logit contributions
uts+0.329
anes+0.328
 Isn+0.325
that+0.300
 Hello+0.293
Neg logit contributions
 forgot-0.415
iper-0.382
 hospital-0.361
 cast-0.361
 scar-0.359
Token of
Token ID286
Feature activation-2.07e-01

Pos logit contributions
anes+0.029
uts+0.027
all+0.024
 Isn+0.023
that+0.023
Neg logit contributions
 forgot-0.039
 scar-0.034
 fort-0.034
 hospital-0.033
 trap-0.033
Token his
Token ID465
Feature activation-9.14e+00

Pos logit contributions
 forgot+0.005
 scar+0.004
 fort+0.004
 hospital+0.004
 shield+0.004
Neg logit contributions
anes-0.006
uts-0.006
all-0.005
ous-0.005
that-0.005
Token adventure
Token ID8855
Feature activation+7.74e+00

Pos logit contributions
 forgot+0.198
iper+0.150
 scar+0.148
 slammed+0.145
 remembered+0.142
Neg logit contributions
uts-0.258
anes-0.257
all-0.240
ous-0.231
unch-0.225
Token.
Token ID13
Feature activation+4.78e+00

Pos logit contributions
anes+0.169
uts+0.161
all+0.146
ous+0.138
that+0.133
Neg logit contributions
 forgot-0.228
 scar-0.199
 fort-0.195
 hospital-0.191
 closet-0.190
Token never
Token ID1239
Feature activation-3.31e+00

Pos logit contributions
 forgot+0.076
 scar+0.066
 fort+0.065
 closet+0.063
iper+0.063
Neg logit contributions
anes-0.083
uts-0.080
all-0.076
ous-0.072
that-0.070
Token dared
Token ID28765
Feature activation+8.83e+00

Pos logit contributions
 forgot+0.064
 scar+0.053
 fort+0.052
 hospital+0.050
 trap+0.050
Neg logit contributions
anes-0.104
uts-0.100
all-0.094
ous-0.092
that-0.089
Token to
Token ID284
Feature activation-1.85e+00

Pos logit contributions
anes+0.212
uts+0.198
all+0.175
ous+0.172
unch+0.170
Neg logit contributions
 forgot-0.242
 scar-0.215
 fort-0.205
 hospital-0.204
 shield-0.204
Token join
Token ID4654
Feature activation+9.23e+00

Pos logit contributions
 forgot+0.039
 way+0.030
 bit+0.030
 head+0.030
 cast+0.030
Neg logit contributions
anes-0.061
uts-0.057
eddy-0.053
unch-0.052
all-0.052
Token them
Token ID606
Feature activation-3.87e+00

Pos logit contributions
anes+0.255
uts+0.246
all+0.219
ous+0.216
unch+0.215
Neg logit contributions
 forgot-0.214
 scar-0.185
 shield-0.180
 fort-0.179
 hospital-0.178
Token<|endoftext|>
Token ID50256
Feature activation+1.58e+01

Pos logit contributions
 forgot+0.097
 fort+0.082
 hospital+0.081
 scar+0.080
 shield+0.080
Neg logit contributions
anes-0.103
uts-0.091
 Isn-0.085
ous-0.085
unch-0.085
TokenAb
Token ID4826
Feature activation-4.24e+00

Pos logit contributions
uts+0.462
ous+0.441
anes+0.420
amb+0.401
all+0.392
Neg logit contributions
 saw-0.294
 forgot-0.292
 closet-0.280
 might-0.277
 fort-0.276
Tokenby
Token ID1525
Feature activation-3.43e+00

Pos logit contributions
 forgot+0.089
 hospital+0.073
 scar+0.073
 fort+0.072
 shield+0.071
Neg logit contributions
anes-0.126
uts-0.120
all-0.115
that-0.109
ges-0.108
Token and
Token ID290
Feature activation-5.34e+00

Pos logit contributions
 forgot+0.088
 scar+0.082
 fort+0.082
 closet+0.080
 hospital+0.080
Neg logit contributions
uts-0.077
all-0.073
anes-0.073
od-0.067
ous-0.065
Token Tommy
Token ID19919
Feature activation-5.87e+00

Pos logit contributions
 forgot+0.120
 scar+0.098
 fort+0.095
 hospital+0.095
 shield+0.094
Neg logit contributions
anes-0.155
uts-0.149
all-0.138
ous-0.133
unch-0.130
Token were
Token ID547
Feature activation-1.39e+00

Pos logit contributions
 forgot+0.135
 scar+0.109
 fort+0.106
 hospital+0.105
 closet+0.104
Neg logit contributions
anes-0.170
uts-0.163
all-0.150
ous-0.145
unch-0.142
Token one
Token ID530
Feature activation-1.22e+00

Pos logit contributions
uts+0.224
anes+0.223
ous+0.203
unch+0.197
all+0.191
Neg logit contributions
 forgot-0.131
 scar-0.131
 scarf-0.115
 hospital-0.113
 shield-0.113
Token for
Token ID329
Feature activation+7.26e-01

Pos logit contributions
 forgot+0.027
 scar+0.024
 bit+0.023
 fort+0.023
 way+0.023
Neg logit contributions
anes-0.036
uts-0.034
ous-0.030
all-0.030
qu-0.029
Token himself
Token ID2241
Feature activation-7.24e+00

Pos logit contributions
anes+0.021
uts+0.020
ous+0.018
all+0.018
 Isn+0.017
Neg logit contributions
 forgot-0.017
 scar-0.014
 hospital-0.014
 cast-0.013
 shield-0.013
Token as
Token ID355
Feature activation+1.77e+00

Pos logit contributions
 forgot+0.172
 scar+0.143
 fort+0.140
 bit+0.137
 shield+0.135
Neg logit contributions
anes-0.205
uts-0.184
all-0.171
 Isn-0.166
ous-0.166
Token a
Token ID257
Feature activation-5.01e+00

Pos logit contributions
anes+0.056
uts+0.053
qu+0.048
 Isn+0.048
ous+0.048
Neg logit contributions
 forgot-0.034
 scar-0.029
 shield-0.028
 scarf-0.026
 sword-0.026
Token souven
Token ID45043
Feature activation+1.58e+01

Pos logit contributions
 forgot+0.105
 scar+0.089
 fort+0.086
 bit+0.085
 trap+0.083
Neg logit contributions
anes-0.157
uts-0.149
all-0.137
ous-0.131
that-0.130
Tokenir
Token ID343
Feature activation+1.22e+00

Pos logit contributions
 Isn+0.363
uts+0.358
anes+0.344
that+0.335
 Hello+0.332
Neg logit contributions
iper-0.361
 forgot-0.359
ig-0.319
ches-0.319
 scar-0.316
Token!
Token ID0
Feature activation+4.09e+00

Pos logit contributions
anes+0.031
uts+0.030
 Isn+0.027
eddy+0.026
that+0.026
Neg logit contributions
 forgot-0.029
iper-0.025
 head-0.025
 trap-0.025
 scar-0.025
Token But
Token ID887
Feature activation-9.84e-01

Pos logit contributions
anes+0.114
uts+0.111
qu+0.102
ous+0.101
all+0.098
Neg logit contributions
 forgot-0.095
 hospital-0.078
 closet-0.078
 scar-0.076
 fort-0.075
Token his
Token ID465
Feature activation-1.07e+01

Pos logit contributions
 forgot+0.024
 fort+0.022
 scar+0.022
 trap+0.021
iper+0.021
Neg logit contributions
all-0.023
anes-0.023
uts-0.022
 Mike-0.021
that-0.021
Token mum
Token ID25682
Feature activation-1.34e+00

Pos logit contributions
 forgot+0.289
 scar+0.241
 fort+0.239
 hospital+0.234
 slammed+0.231
Neg logit contributions
anes-0.247
uts-0.243
all-0.229
ous-0.212
unch-0.207
Token found
Token ID1043
Feature activation+3.62e+00

Pos logit contributions
 forgot+0.073
 scar+0.057
 bit+0.055
 remembered+0.055
 cast+0.055
Neg logit contributions
anes-0.090
uts-0.084
all-0.077
ous-0.076
unch-0.074
Token a
Token ID257
Feature activation-4.64e+00

Pos logit contributions
anes+0.108
uts+0.103
all+0.092
unch+0.092
ous+0.090
Neg logit contributions
 forgot-0.076
 scar-0.068
 scarf-0.064
 trap-0.063
 hospital-0.063
Token beautiful
Token ID4950
Feature activation-5.22e-01

Pos logit contributions
 forgot+0.097
 scar+0.078
 fort+0.077
 hospital+0.075
 shield+0.074
Neg logit contributions
anes-0.138
uts-0.132
all-0.125
ous-0.122
that-0.119
Token sun
Token ID4252
Feature activation+4.57e+00

Pos logit contributions
 forgot+0.011
 scar+0.009
 fort+0.009
 hospital+0.008
 shield+0.008
Neg logit contributions
anes-0.016
uts-0.015
all-0.014
ous-0.014
that-0.013
Tokenflower
Token ID25547
Feature activation+2.21e+00

Pos logit contributions
anes+0.109
uts+0.104
 Share+0.088
all+0.086
 Isn+0.085
Neg logit contributions
 forgot-0.116
 head-0.113
 scarf-0.111
 scar-0.108
 fort-0.107
Token<|endoftext|>
Token ID50256
Feature activation+1.57e+01

Pos logit contributions
anes+0.052
uts+0.046
all+0.041
ges+0.040
 Share+0.039
Neg logit contributions
 forgot-0.060
 bit-0.055
 head-0.054
 fort-0.054
 scarf-0.054
TokenOnce
Token ID7454
Feature activation+7.54e+00

Pos logit contributions
uts+0.438
anes+0.408
ous+0.401
all+0.360
amb+0.356
Neg logit contributions
 forgot-0.336
 saw-0.314
 closet-0.311
 might-0.298
 fort-0.296
Token upon
Token ID2402
Feature activation-9.52e-02

Pos logit contributions
anes+0.223
uts+0.213
all+0.197
ous+0.190
qu+0.186
Neg logit contributions
 forgot-0.167
 scar-0.136
 fort-0.134
 hospital-0.130
 closet-0.129
Token a
Token ID257
Feature activation-3.22e+00

Pos logit contributions
 forgot+0.002
 scar+0.002
 fort+0.002
 scarf+0.002
 hospital+0.002
Neg logit contributions
anes-0.003
uts-0.003
ous-0.003
all-0.003
qu-0.002
Token time
Token ID640
Feature activation+5.19e+00

Pos logit contributions
 forgot+0.064
 scar+0.054
 shield+0.054
 hospital+0.052
 needle+0.051
Neg logit contributions
all-0.090
uts-0.088
anes-0.086
unch-0.085
that-0.083
Token,
Token ID11
Feature activation+7.80e-02

Pos logit contributions
anes+0.200
uts+0.184
eddy+0.172
good+0.170
qu+0.170
Neg logit contributions
 forgot-0.074
 didn-0.051
 fort-0.050
 closet-0.048
 way-0.048
Token to
Token ID284
Feature activation-2.66e+00

Pos logit contributions
anes+0.078
uts+0.068
qu+0.063
all+0.062
 Isn+0.061
Neg logit contributions
 forgot-0.059
 scar-0.047
 bit-0.047
 fort-0.046
 hospital-0.046
Token have
Token ID423
Feature activation-8.95e-01

Pos logit contributions
 forgot+0.041
 head+0.031
 hospital+0.028
 learned+0.028
 remembered+0.028
Neg logit contributions
anes-0.101
uts-0.093
unch-0.088
all-0.087
ous-0.087
Token such
Token ID884
Feature activation-3.48e+00

Pos logit contributions
 forgot+0.020
 fort+0.018
 scar+0.018
 hospital+0.017
 closet+0.017
Neg logit contributions
anes-0.024
uts-0.024
all-0.022
that-0.021
od-0.020
Token a
Token ID257
Feature activation-5.07e+00

Pos logit contributions
 forgot+0.092
 fort+0.077
 scar+0.075
 closet+0.073
 hospital+0.073
Neg logit contributions
anes-0.083
uts-0.081
all-0.076
that-0.071
ous-0.070
Token cool
Token ID3608
Feature activation-1.97e-01

Pos logit contributions
 forgot+0.114
ns+0.104
iper+0.103
ts+0.101
ches+0.098
Neg logit contributions
uts-0.112
all-0.109
eddy-0.107
that-0.103
ous-0.102
Token souven
Token ID45043
Feature activation+1.57e+01

Pos logit contributions
 forgot+0.005
ns+0.004
 scar+0.004
 slammed+0.004
 remembered+0.004
Neg logit contributions
uts-0.005
anes-0.005
all-0.005
eddy-0.005
unch-0.005
Tokenir
Token ID343
Feature activation+1.09e+00

Pos logit contributions
uts+0.337
anes+0.332
 Isn+0.332
 Hello+0.306
that+0.299
Neg logit contributions
 forgot-0.399
iper-0.372
 scar-0.339
 closet-0.336
ches-0.336
Token from
Token ID422
Feature activation+6.18e-01

Pos logit contributions
anes+0.025
uts+0.023
 Isn+0.021
all+0.020
that+0.020
Neg logit contributions
 forgot-0.030
 fort-0.026
 hospital-0.026
 scar-0.026
 trap-0.025
Token their
Token ID511
Feature activation-9.02e+00

Pos logit contributions
anes+0.018
uts+0.017
all+0.016
ous+0.015
that+0.015
Neg logit contributions
 forgot-0.014
 scar-0.011
 fort-0.011
 hospital-0.011
 shield-0.011
Token exciting
Token ID7895
Feature activation+1.48e-01

Pos logit contributions
 forgot+0.203
eh+0.159
 scar+0.156
 slammed+0.151
 shield+0.151
Neg logit contributions
anes-0.238
uts-0.234
all-0.227
ous-0.216
that-0.210
Token adventure
Token ID8855
Feature activation+8.42e+00

Pos logit contributions
anes+0.004
uts+0.004
all+0.004
ous+0.004
eddy+0.004
Neg logit contributions
 forgot-0.003
 scar-0.003
 slammed-0.002
 remembered-0.002
 sank-0.002
Token a
Token ID257
Feature activation-4.82e+00

Pos logit contributions
anes+0.057
uts+0.054
all+0.050
ous+0.048
that+0.047
Neg logit contributions
 forgot-0.049
 scar-0.041
 fort-0.041
 hospital-0.040
 shield-0.039
Token blue
Token ID4171
Feature activation+2.91e+00

Pos logit contributions
 forgot+0.104
 scar+0.084
 fort+0.084
 hospital+0.082
iper+0.082
Neg logit contributions
anes-0.138
uts-0.135
all-0.122
ous-0.121
 Isn-0.120
Token pass
Token ID1208
Feature activation+8.10e+00

Pos logit contributions
anes+0.103
uts+0.099
all+0.093
ous+0.090
that+0.089
Neg logit contributions
 forgot-0.047
 scar-0.036
 fort-0.035
 hospital-0.033
 trap-0.033
Token in
Token ID287
Feature activation-1.65e-01

Pos logit contributions
anes+0.189
uts+0.180
all+0.163
unch+0.157
ous+0.154
Neg logit contributions
 forgot-0.220
 fort-0.188
 scar-0.188
 closet-0.187
 hospital-0.184
Token it
Token ID340
Feature activation-2.41e+00

Pos logit contributions
 forgot+0.003
 closet+0.003
 scar+0.003
 bit+0.003
 scarf+0.003
Neg logit contributions
anes-0.005
uts-0.005
ous-0.005
unch-0.004
amb-0.004
Token<|endoftext|>
Token ID50256
Feature activation+1.56e+01

Pos logit contributions
 forgot+0.068
 scar+0.059
 fort+0.058
 closet+0.057
 bit+0.056
Neg logit contributions
anes-0.056
uts-0.052
all-0.047
ous-0.045
unch-0.044
TokenOnce
Token ID7454
Feature activation+7.50e+00

Pos logit contributions
uts+0.470
ous+0.434
anes+0.428
amb+0.395
all+0.392
Neg logit contributions
 forgot-0.287
 saw-0.287
 closet-0.278
 might-0.270
 way-0.265
Token upon
Token ID2402
Feature activation-2.20e-01

Pos logit contributions
anes+0.219
uts+0.210
all+0.194
ous+0.187
that+0.183
Neg logit contributions
 forgot-0.167
 scar-0.136
 fort-0.135
 hospital-0.130
 closet-0.129
Token a
Token ID257
Feature activation-3.43e+00

Pos logit contributions
 forgot+0.005
 scar+0.004
 fort+0.004
 scarf+0.004
 hospital+0.004
Neg logit contributions
anes-0.007
uts-0.006
all-0.006
ous-0.006
qu-0.006
Token time
Token ID640
Feature activation+5.00e+00

Pos logit contributions
 forgot+0.067
 scar+0.057
 shield+0.055
 hospital+0.054
 closet+0.053
Neg logit contributions
uts-0.096
all-0.096
anes-0.092
unch-0.091
that-0.089
Token there
Token ID612
Feature activation+3.24e+00

Pos logit contributions
anes+0.192
uts+0.177
eddy+0.165
good+0.165
qu+0.165
Neg logit contributions
 forgot-0.070
 didn-0.048
 fort-0.048
 closet-0.046
 way-0.046
Token tr
Token ID491
Feature activation+5.96e+00

Pos logit contributions
 forgot+0.039
iper+0.028
 scar+0.028
 hospital+0.027
 fort+0.027
Neg logit contributions
anes-0.068
uts-0.067
all-0.063
ous-0.062
 Isn-0.061
Tokenink
Token ID676
Feature activation-3.50e-01

Pos logit contributions
anes+0.120
uts+0.111
ous+0.101
all+0.100
 Isn+0.100
Neg logit contributions
 forgot-0.172
 fort-0.157
 closet-0.154
 scar-0.152
iper-0.151
Tokenet
Token ID316
Feature activation-1.17e+00

Pos logit contributions
 forgot+0.009
 scar+0.007
 fort+0.007
 closet+0.007
 hospital+0.007
Neg logit contributions
anes-0.009
uts-0.009
all-0.008
ous-0.008
that-0.008
Token as
Token ID355
Feature activation+1.43e+00

Pos logit contributions
 forgot+0.024
 scar+0.021
 fort+0.020
 scarf+0.020
 shield+0.020
Neg logit contributions
anes-0.035
uts-0.033
all-0.031
 Isn-0.030
qu-0.030
Token a
Token ID257
Feature activation-5.40e+00

Pos logit contributions
anes+0.044
uts+0.042
all+0.038
qu+0.037
ous+0.037
Neg logit contributions
 forgot-0.030
 scar-0.024
 shield-0.024
 fort-0.023
 scarf-0.023
Token souven
Token ID45043
Feature activation+1.56e+01

Pos logit contributions
 forgot+0.113
 scar+0.091
 fort+0.089
 hospital+0.087
 closet+0.085
Neg logit contributions
anes-0.162
uts-0.156
all-0.146
ous-0.143
that-0.139
Tokenir
Token ID343
Feature activation+1.06e+00

Pos logit contributions
anes+0.332
uts+0.331
 Isn+0.316
that+0.296
 Hello+0.291
Neg logit contributions
 forgot-0.407
iper-0.382
 scar-0.358
 closet-0.355
 fort-0.346
Token.
Token ID13
Feature activation+4.81e+00

Pos logit contributions
anes+0.024
uts+0.023
 Isn+0.020
eddy+0.020
all+0.019
Neg logit contributions
 forgot-0.028
 shield-0.024
 trap-0.024
 head-0.024
 scar-0.024
Token Then
Token ID3244
Feature activation+4.37e-01

Pos logit contributions
anes+0.122
uts+0.114
qu+0.107
ous+0.107
all+0.106
Neg logit contributions
 forgot-0.117
 hospital-0.100
 fort-0.098
iper-0.098
 shield-0.098
Token he
Token ID339
Feature activation-9.53e+00

Pos logit contributions
anes+0.011
uts+0.010
all+0.010
ous+0.009
 Mike+0.009
Neg logit contributions
 forgot-0.011
 fort-0.010
 scar-0.009
 closet-0.009
iper-0.009
Token hopped
Token ID45230
Feature activation+4.94e+00

Pos logit contributions
 forgot+0.156
 hospital+0.150
 fort+0.149
 trap+0.144
 proper+0.140
Neg logit contributions
anes-0.274
uts-0.271
all-0.267
od-0.245
ous-0.244
Token then
Token ID788
Feature activation-2.59e+00

Pos logit contributions
 forgot+0.053
 fort+0.045
 scar+0.044
 closet+0.043
 trap+0.042
Neg logit contributions
anes-0.072
uts-0.068
all-0.067
that-0.063
ous-0.062
Token,
Token ID11
Feature activation-2.01e+00

Pos logit contributions
 forgot+0.053
 scar+0.042
 fort+0.040
 hospital+0.040
 bit+0.039
Neg logit contributions
anes-0.082
uts-0.079
all-0.072
ous-0.071
unch-0.069
Token Tim
Token ID5045
Feature activation-8.36e+00

Pos logit contributions
 forgot+0.054
 scar+0.047
 fort+0.046
 hospital+0.045
 shield+0.044
Neg logit contributions
anes-0.049
uts-0.047
all-0.042
ous-0.041
that-0.039
Tokenmy
Token ID1820
Feature activation-8.02e+00

Pos logit contributions
 forgot+0.213
 trap+0.206
 fort+0.204
 slammed+0.202
 scar+0.201
Neg logit contributions
all-0.184
anes-0.177
that-0.162
eddy-0.154
ous-0.153
Token and
Token ID290
Feature activation-7.92e+00

Pos logit contributions
 forgot+0.150
 scar+0.148
 fort+0.143
 trap+0.139
eh+0.138
Neg logit contributions
anes-0.223
all-0.215
uts-0.212
amb-0.195
 Share-0.195
Token his
Token ID465
Feature activation-1.21e+01

Pos logit contributions
 forgot+0.166
 scar+0.130
 shield+0.128
 hospital+0.128
 fort+0.126
Neg logit contributions
anes-0.249
uts-0.238
all-0.217
ous-0.215
qu-0.213
Token dad
Token ID9955
Feature activation-1.51e+00

Pos logit contributions
 forgot+0.287
 scar+0.249
 fort+0.241
ches+0.234
 trap+0.229
Neg logit contributions
anes-0.312
uts-0.305
all-0.279
ous-0.273
that-0.264
Token have
Token ID423
Feature activation-2.52e+00

Pos logit contributions
 forgot+0.029
 fort+0.027
 scar+0.027
 trap+0.025
eh+0.025
Neg logit contributions
anes-0.043
uts-0.040
all-0.040
ous-0.038
od-0.037
Token made
Token ID925
Feature activation+3.69e+00

Pos logit contributions
 fort+0.042
 forgot+0.041
 hospital+0.039
 head+0.039
 shield+0.039
Neg logit contributions
uts-0.074
anes-0.074
all-0.072
that-0.069
qu-0.068
Token sure
Token ID1654
Feature activation+1.44e+00

Pos logit contributions
anes+0.097
uts+0.092
all+0.086
ous+0.081
that+0.080
Neg logit contributions
 forgot-0.093
 scar-0.078
 fort-0.077
 closet-0.075
 shield-0.074
Token to
Token ID284
Feature activation-2.92e+00

Pos logit contributions
anes+0.031
uts+0.029
all+0.026
ous+0.025
that+0.024
Neg logit contributions
 forgot-0.043
 scar-0.038
 fort-0.037
 closet-0.036
 shield-0.036
Token the
Token ID262
Feature activation-6.90e+00

Pos logit contributions
anes+0.082
uts+0.078
all+0.071
ous+0.069
unch+0.068
Neg logit contributions
 forgot-0.066
 scar-0.056
 fort-0.055
 hospital-0.054
 shield-0.054
Token pond
Token ID16723
Feature activation+3.32e+00

Pos logit contributions
 forgot+0.154
iper+0.125
 scar+0.124
 slammed+0.116
 fort+0.115
Neg logit contributions
anes-0.194
uts-0.177
all-0.173
that-0.169
 Isn-0.166
Token,
Token ID11
Feature activation-1.50e+00

Pos logit contributions
anes+0.084
uts+0.080
all+0.074
ous+0.071
that+0.069
Neg logit contributions
 forgot-0.085
 scar-0.073
 fort-0.071
 hospital-0.070
 closet-0.069
Token fl
Token ID781
Feature activation+7.99e+00

Pos logit contributions
 forgot+0.039
 fort+0.034
 scar+0.034
 trap+0.033
iper+0.033
Neg logit contributions
anes-0.036
uts-0.035
all-0.033
ous-0.031
that-0.031
Tokenapping
Token ID5912
Feature activation+5.14e+00

Pos logit contributions
anes+0.120
uts+0.107
all+0.091
 Isn+0.091
qu+0.084
Neg logit contributions
 forgot-0.284
 scar-0.250
iper-0.247
 fort-0.247
 head-0.247
Token his
Token ID465
Feature activation-1.19e+01

Pos logit contributions
anes+0.130
uts+0.126
all+0.117
ous+0.113
that+0.110
Neg logit contributions
 forgot-0.130
 scar-0.110
 fort-0.109
 hospital-0.108
 trap-0.106
Token wings
Token ID12098
Feature activation+4.07e+00

Pos logit contributions
 forgot+0.261
 hospital+0.228
 slammed+0.221
 closet+0.213
 fort+0.207
Neg logit contributions
anes-0.302
uts-0.302
ous-0.272
ach-0.269
amb-0.258
Token.
Token ID13
Feature activation+4.37e+00

Pos logit contributions
anes+0.115
uts+0.110
all+0.102
ous+0.099
that+0.097
Neg logit contributions
 forgot-0.093
 scar-0.078
 fort-0.076
 hospital-0.075
 closet-0.074
Token He
Token ID679
Feature activation-1.03e+01

Pos logit contributions
anes+0.111
uts+0.105
ous+0.099
qu+0.097
od+0.093
Neg logit contributions
 forgot-0.107
 hospital-0.092
 closet-0.088
iper-0.088
 way-0.087
Token was
Token ID373
Feature activation-3.54e+00

Pos logit contributions
 fort+0.193
 closet+0.184
 forgot+0.179
 trap+0.177
 hospital+0.173
Neg logit contributions
anes-0.293
uts-0.285
all-0.283
ous-0.269
 Isn-0.263
Token so
Token ID523
Feature activation-1.26e+00

Pos logit contributions
 forgot+0.083
 scar+0.068
 fort+0.066
 hospital+0.065
 shield+0.064
Neg logit contributions
anes-0.100
uts-0.096
all-0.088
ous-0.085
that-0.083
Token
Token ID198
Feature activation+4.78e+00

Pos logit contributions
anes+0.192
umm+0.192
ous+0.182
all+0.177
work+0.176
Neg logit contributions
 forgot-0.178
 might-0.153
 fort-0.149
 way-0.148
ches-0.145
Token
Token ID198
Feature activation+4.34e+00

Pos logit contributions
anes+0.104
uts+0.096
umm+0.091
ous+0.089
all+0.088
Neg logit contributions
 forgot-0.135
 fort-0.115
 scar-0.115
 closet-0.113
 hospital-0.113
TokenMax
Token ID11518
Feature activation-7.12e+00

Pos logit contributions
uts+0.114
anes+0.113
all+0.103
that+0.101
qu+0.098
Neg logit contributions
 forgot-0.103
 scar-0.090
 hospital-0.088
 fort-0.088
 closet-0.085
Token w
Token ID266
Feature activation+1.41e+00

Pos logit contributions
 hospital+0.136
 fort+0.133
 proper+0.130
 forgot+0.129
 trap+0.126
Neg logit contributions
all-0.186
uts-0.186
anes-0.174
od-0.171
qu-0.165
Tokenagged
Token ID14655
Feature activation+1.46e+00

Pos logit contributions
anes+0.029
uts+0.027
unch+0.023
all+0.023
od+0.021
Neg logit contributions
 forgot-0.046
 scar-0.041
 bit-0.041
 closet-0.041
 fort-0.040
Token his
Token ID465
Feature activation-1.19e+01

Pos logit contributions
uts+0.029
all+0.029
anes+0.028
that+0.026
ous+0.025
Neg logit contributions
 forgot-0.042
ns-0.037
iper-0.036
 fort-0.035
 scar-0.035
Token tail
Token ID7894
Feature activation+2.56e+00

Pos logit contributions
 forgot+0.315
 hospital+0.288
 wouldn+0.280
iper+0.276
ience+0.276
Neg logit contributions
uts-0.212
that-0.180
amb-0.178
 That-0.178
eddy-0.174
Token and
Token ID290
Feature activation-6.35e+00

Pos logit contributions
uts+0.062
anes+0.062
all+0.058
ous+0.055
unch+0.054
Neg logit contributions
 forgot-0.065
 scar-0.055
iper-0.054
 fort-0.052
 hospital-0.051
Token ran
Token ID4966
Feature activation+3.92e+00

Pos logit contributions
iper+0.129
ns+0.124
ches+0.123
 trap+0.122
 fort+0.118
Neg logit contributions
 Mike-0.152
uts-0.147
all-0.143
 Sus-0.136
 Lib-0.136
Token around
Token ID1088
Feature activation+3.01e+00

Pos logit contributions
anes+0.099
uts+0.094
all+0.086
ous+0.083
that+0.080
Neg logit contributions
 forgot-0.103
 scar-0.088
 fort-0.086
 hospital-0.084
 trap-0.084
Token happily
Token ID18177
Feature activation+2.31e-01

Pos logit contributions
anes+0.081
uts+0.077
all+0.071
ous+0.068
that+0.067
Neg logit contributions
 forgot-0.074
 scar-0.063
 fort-0.061
 hospital-0.060
 shield-0.059
Token.
Token ID13
Feature activation+4.85e+00

Pos logit contributions
anes+0.095
uts+0.083
all+0.078
 Isn+0.077
unch+0.075
Neg logit contributions
 forgot-0.093
 head-0.079
 bit-0.077
 shield-0.077
 cast-0.075
Token
Token ID198
Feature activation+4.81e+00

Pos logit contributions
anes+0.113
uts+0.105
ous+0.097
qu+0.094
all+0.093
Neg logit contributions
 forgot-0.133
 hospital-0.111
 bit-0.110
 shield-0.109
 remembered-0.109
Token
Token ID198
Feature activation+4.41e+00

Pos logit contributions
anes+0.109
uts+0.103
ous+0.099
umm+0.090
all+0.089
Neg logit contributions
 forgot-0.131
 hospital-0.113
 cast-0.112
 learned-0.111
 trap-0.110
TokenJack
Token ID14295
Feature activation-7.72e+00

Pos logit contributions
anes+0.106
uts+0.100
all+0.091
ous+0.089
that+0.084
Neg logit contributions
 forgot-0.122
 scar-0.104
 fort-0.102
 hospital-0.101
 trap-0.100
Token felt
Token ID2936
Feature activation-1.62e+00

Pos logit contributions
 fort+0.155
 forgot+0.149
 hospital+0.148
 trap+0.146
 scar+0.141
Neg logit contributions
all-0.212
uts-0.200
anes-0.199
ous-0.193
that-0.191
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.036
 fort+0.033
 scar+0.032
 closet+0.030
iper+0.030
Neg logit contributions
anes-0.044
uts-0.043
all-0.041
that-0.039
 Isn-0.038
Token heart
Token ID2612
Feature activation+3.32e+00

Pos logit contributions
 forgot+0.240
eh+0.202
 scar+0.195
 slammed+0.189
 fort+0.187
Neg logit contributions
ous-0.321
uts-0.319
anes-0.304
all-0.295
amb-0.285
Token stop
Token ID2245
Feature activation+5.79e+00

Pos logit contributions
uts+0.098
anes+0.097
ous+0.090
all+0.089
that+0.084
Neg logit contributions
 forgot-0.066
 scar-0.057
 hospital-0.056
 fort-0.055
iper-0.054
Token.
Token ID13
Feature activation+3.96e+00

Pos logit contributions
anes+0.164
uts+0.156
all+0.139
ous+0.134
 Isn+0.134
Neg logit contributions
 forgot-0.136
 scar-0.116
 head-0.112
 shield-0.111
 fort-0.111
Token He
Token ID679
Feature activation-1.02e+01

Pos logit contributions
anes+0.111
uts+0.105
qu+0.101
all+0.095
ous+0.094
Neg logit contributions
 forgot-0.088
 head-0.073
 hospital-0.072
 shield-0.071
 closet-0.071
Token knew
Token ID2993
Feature activation+3.44e+00

Pos logit contributions
 fort+0.211
 hospital+0.193
 trap+0.185
 forgot+0.183
 closet+0.183
Neg logit contributions
all-0.260
uts-0.253
anes-0.242
ous-0.237
 Isn-0.235
Token
Token ID198
Feature activation+3.64e+00

Pos logit contributions
anes+0.101
uts+0.092
umm+0.091
ous+0.088
qu+0.082
Neg logit contributions
 forgot-0.101
 fort-0.087
 cast-0.086
 hospital-0.086
 saw-0.085
TokenAs
Token ID1722
Feature activation+1.77e+00

Pos logit contributions
anes+0.102
uts+0.097
all+0.090
ous+0.087
that+0.084
Neg logit contributions
 forgot-0.087
 scar-0.072
 fort-0.071
 hospital-0.069
 trap-0.069
Token Tim
Token ID5045
Feature activation-8.26e+00

Pos logit contributions
anes+0.043
uts+0.041
all+0.038
ous+0.036
that+0.035
Neg logit contributions
 forgot-0.048
 scar-0.041
 fort-0.040
 hospital-0.039
 closet-0.039
Tokenmy
Token ID1820
Feature activation-7.99e+00

Pos logit contributions
 forgot+0.213
 shield+0.200
 trap+0.199
 sword+0.191
 scar+0.190
Neg logit contributions
all-0.173
anes-0.171
eddy-0.160
uts-0.157
unch-0.155
Token and
Token ID290
Feature activation-7.86e+00

Pos logit contributions
 fort+0.157
 closet+0.156
 forgot+0.156
iper+0.155
 shield+0.154
Neg logit contributions
uts-0.192
all-0.189
unch-0.186
anes-0.177
good-0.176
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.193
 scar+0.160
 fort+0.159
 hospital+0.155
iper+0.155
Neg logit contributions
anes-0.204
uts-0.196
all-0.187
ous-0.174
that-0.173
Token friends
Token ID2460
Feature activation+6.34e-01

Pos logit contributions
 forgot+0.243
 scar+0.200
iper+0.199
 trap+0.189
 fort+0.188
Neg logit contributions
anes-0.347
uts-0.345
all-0.312
eddy-0.303
unch-0.303
Token were
Token ID547
Feature activation-2.82e+00

Pos logit contributions
anes+0.014
 Mike+0.014
all+0.013
qu+0.013
good+0.013
Neg logit contributions
 fort-0.014
 trap-0.014
 shield-0.013
 sword-0.013
iper-0.013
Token playing
Token ID2712
Feature activation+7.08e+00

Pos logit contributions
 forgot+0.071
 scar+0.063
iper+0.062
 fort+0.062
 trap+0.061
Neg logit contributions
anes-0.067
uts-0.066
all-0.061
unch-0.060
ous-0.055
Token,
Token ID11
Feature activation-2.23e+00

Pos logit contributions
uts+0.114
anes+0.113
that+0.104
all+0.103
whe+0.100
Neg logit contributions
 forgot-0.228
iper-0.218
 scar-0.204
 closet-0.194
ns-0.189
Token they
Token ID484
Feature activation-7.97e+00

Pos logit contributions
 forgot+0.065
 scar+0.057
iper+0.057
 fort+0.057
 trap+0.055
Neg logit contributions
anes-0.048
uts-0.044
all-0.041
ous-0.038
that-0.038
Token now
Token ID783
Feature activation-2.79e+00

Pos logit contributions
 forgot+0.031
 scar+0.026
 fort+0.025
 hospital+0.025
 shield+0.025
Neg logit contributions
anes-0.038
uts-0.037
all-0.033
ous-0.033
qu-0.032
Token he
Token ID339
Feature activation-9.05e+00

Pos logit contributions
 forgot+0.071
 scar+0.063
 fort+0.062
 closet+0.060
iper+0.060
Neg logit contributions
anes-0.069
uts-0.067
all-0.062
ous-0.059
that-0.058
Token could
Token ID714
Feature activation+2.50e+00

Pos logit contributions
 fort+0.186
 closet+0.182
iper+0.174
 scar+0.173
 garage+0.168
Neg logit contributions
all-0.236
uts-0.230
anes-0.220
that-0.216
qu-0.214
Token qu
Token ID627
Feature activation+7.86e+00

Pos logit contributions
anes+0.076
uts+0.074
all+0.070
ous+0.067
that+0.066
Neg logit contributions
 forgot-0.050
 scar-0.042
 fort-0.040
 closet-0.039
iper-0.039
Tokenench
Token ID24421
Feature activation+6.63e+00

Pos logit contributions
anes+0.206
uts+0.195
all+0.181
ous+0.172
that+0.169
Neg logit contributions
 forgot-0.200
 scar-0.170
 fort-0.166
 hospital-0.162
 shield-0.161
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
anes+0.210
uts+0.198
ous+0.182
all+0.181
qu+0.174
Neg logit contributions
 forgot-0.135
 scar-0.115
 fort-0.107
 hospital-0.106
 shield-0.106
Token thirst
Token ID24613
Feature activation+5.42e+00

Pos logit contributions
 forgot+0.229
 hospital+0.178
 slammed+0.177
iper+0.173
 fort+0.173
Neg logit contributions
uts-0.334
anes-0.333
unch-0.333
all-0.330
that-0.309
Token and
Token ID290
Feature activation-5.90e+00

Pos logit contributions
anes+0.146
uts+0.140
all+0.129
ous+0.124
that+0.121
Neg logit contributions
 forgot-0.131
 scar-0.111
 fort-0.109
 hospital-0.106
 shield-0.105
Token have
Token ID423
Feature activation-8.35e-01

Pos logit contributions
 forgot+0.116
iper+0.106
 fort+0.102
 scar+0.100
 closet+0.099
Neg logit contributions
uts-0.159
anes-0.157
all-0.153
ous-0.145
 Isn-0.142
Token a
Token ID257
Feature activation-5.26e+00

Pos logit contributions
 forgot+0.021
 fort+0.018
 hospital+0.018
 closet+0.018
 scar+0.018
Neg logit contributions
anes-0.020
uts-0.019
all-0.019
that-0.017
 Isn-0.017
Token cool
Token ID3608
Feature activation-1.09e-01

Pos logit contributions
 forgot+0.116
iper+0.099
 slammed+0.094
 hospital+0.093
 closet+0.093
Neg logit contributions
all-0.129
anes-0.127
uts-0.126
unch-0.123
that-0.119
Token
Token ID198
Feature activation+4.56e+00

Pos logit contributions
anes+0.166
ous+0.153
umm+0.150
uts+0.149
all+0.148
Neg logit contributions
 forgot-0.207
 hospital-0.177
 fort-0.172
 cast-0.172
iper-0.171
Token
Token ID198
Feature activation+4.16e+00

Pos logit contributions
anes+0.121
ous+0.114
uts+0.111
umm+0.108
work+0.103
Neg logit contributions
 forgot-0.107
 cast-0.095
 learned-0.094
 hospital-0.092
 scar-0.090
TokenTom
Token ID13787
Feature activation-5.33e+00

Pos logit contributions
anes+0.120
uts+0.113
ous+0.108
all+0.102
od+0.101
Neg logit contributions
 forgot-0.094
 hospital-0.077
 cast-0.077
 scar-0.077
 scarf-0.073
Tokenmy
Token ID1820
Feature activation-7.50e+00

Pos logit contributions
 forgot+0.135
 trap+0.127
 fort+0.126
 hospital+0.126
 scar+0.125
Neg logit contributions
all-0.117
anes-0.117
ous-0.104
uts-0.103
that-0.103
Token and
Token ID290
Feature activation-7.60e+00

Pos logit contributions
 fort+0.159
 scar+0.151
 forgot+0.150
 trap+0.147
 closet+0.147
Neg logit contributions
all-0.186
anes-0.184
uts-0.182
ous-0.171
 Mike-0.168
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.189
 scar+0.160
 fort+0.159
 closet+0.153
 hospital+0.151
Neg logit contributions
anes-0.195
uts-0.184
all-0.179
ous-0.167
that-0.165
Token friend
Token ID1545
Feature activation+6.23e-01

Pos logit contributions
 forgot+0.297
 slammed+0.239
 scar+0.230
iper+0.228
 closet+0.223
Neg logit contributions
anes-0.273
uts-0.265
all-0.246
ous-0.242
unch-0.228
Token were
Token ID547
Feature activation-2.48e+00

Pos logit contributions
anes+0.016
 Mike+0.016
uts+0.015
all+0.015
ous+0.015
Neg logit contributions
 forgot-0.012
 fort-0.011
 scar-0.011
 trap-0.011
 needle-0.011
Token both
Token ID1111
Feature activation-6.37e+00

Pos logit contributions
 forgot+0.054
 scar+0.045
 fort+0.044
 hospital+0.043
 trap+0.043
Neg logit contributions
anes-0.073
uts-0.070
all-0.065
ous-0.063
that-0.062
Token very
Token ID845
Feature activation-2.80e+00

Pos logit contributions
 forgot+0.143
 scar+0.122
 fort+0.120
 trap+0.116
iper+0.115
Neg logit contributions
anes-0.178
uts-0.172
all-0.162
ous-0.157
that-0.153
Token glad
Token ID9675
Feature activation+2.52e+00

Pos logit contributions
 forgot+0.056
 scar+0.045
 fort+0.043
 hospital+0.042
 shield+0.042
Neg logit contributions
anes-0.089
uts-0.085
all-0.080
ous-0.077
that-0.076
Token
Token ID198
Feature activation+4.38e+00

Pos logit contributions
anes+0.111
uts+0.109
umm+0.108
ous+0.100
all+0.097
Neg logit contributions
 forgot-0.112
 cast-0.108
 needle-0.103
 hospital-0.103
 scarf-0.101
TokenBrown
Token ID20644
Feature activation-4.66e+00

Pos logit contributions
anes+0.125
uts+0.120
all+0.110
ous+0.110
unch+0.108
Neg logit contributions
 forgot-0.092
 scar-0.079
 cast-0.078
 shield-0.076
 bit-0.076
Tokenie
Token ID494
Feature activation-7.32e+00

Pos logit contributions
 forgot+0.100
 hospital+0.087
 fort+0.085
 trap+0.085
 scar+0.080
Neg logit contributions
anes-0.134
all-0.127
ous-0.124
uts-0.123
eddy-0.118
Token w
Token ID266
Feature activation+1.83e+00

Pos logit contributions
 forgot+0.154
 fort+0.146
 scar+0.137
 closet+0.135
 hospital+0.133
Neg logit contributions
anes-0.200
uts-0.198
all-0.194
ous-0.183
od-0.180
Tokenagged
Token ID14655
Feature activation+1.17e+00

Pos logit contributions
anes+0.044
uts+0.041
all+0.037
ous+0.035
unch+0.035
Neg logit contributions
 forgot-0.053
 scar-0.045
 fort-0.045
 closet-0.044
 bit-0.043
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
anes+0.029
uts+0.029
all+0.027
ous+0.026
that+0.025
Neg logit contributions
 forgot-0.030
 fort-0.025
 scar-0.025
 hospital-0.024
 trap-0.024
Token tail
Token ID7894
Feature activation+2.80e+00

Pos logit contributions
 forgot+0.337
 hospital+0.307
 wouldn+0.276
 slammed+0.269
 nurses+0.265
Neg logit contributions
uts-0.208
amb-0.183
that-0.178
ous-0.174
aps-0.171
Token and
Token ID290
Feature activation-6.33e+00

Pos logit contributions
anes+0.078
uts+0.076
all+0.072
ous+0.069
that+0.067
Neg logit contributions
 forgot-0.064
 scar-0.052
 fort-0.052
 hospital-0.050
 closet-0.050
Token said
Token ID531
Feature activation+3.32e+00

Pos logit contributions
 forgot+0.131
 fort+0.119
 closet+0.115
 trap+0.113
 hospital+0.112
Neg logit contributions
anes-0.169
uts-0.169
all-0.159
ous-0.157
 Mike-0.153
Token,
Token ID11
Feature activation-9.55e-01

Pos logit contributions
anes+0.119
uts+0.110
all+0.105
unch+0.102
eddy+0.102
Neg logit contributions
 forgot-0.053
 shield-0.041
 remembered-0.039
 bit-0.038
 scar-0.038
Token "
Token ID366
Feature activation+3.49e+00

Pos logit contributions
 forgot+0.019
 head+0.016
 bit+0.015
 remembered+0.015
 didn+0.015
Neg logit contributions
anes-0.031
uts-0.028
eddy-0.026
reating-0.025
ach-0.025
Token
Token ID220
Feature activation+7.64e+00

Pos logit contributions
anes+0.118
uts+0.111
qu+0.104
ous+0.101
od+0.101
Neg logit contributions
 forgot-0.100
 way-0.083
 hospital-0.082
 closet-0.081
 head-0.079
Token
Token ID198
Feature activation+4.21e+00

Pos logit contributions
anes+0.175
ous+0.163
umm+0.161
reating+0.155
uts+0.155
Neg logit contributions
 forgot-0.191
 might-0.158
 hospital-0.156
 way-0.156
 fort-0.155
Token
Token ID198
Feature activation+3.87e+00

Pos logit contributions
anes+0.107
ous+0.096
uts+0.096
umm+0.096
work+0.088
Neg logit contributions
 forgot-0.107
 cast-0.089
 learned-0.089
 hospital-0.085
 didn-0.085
TokenSuddenly
Token ID38582
Feature activation+1.25e+00

Pos logit contributions
anes+0.116
uts+0.109
ous+0.101
all+0.099
od+0.097
Neg logit contributions
 forgot-0.086
 scar-0.068
 hospital-0.067
 fort-0.066
 cast-0.066
Token,
Token ID11
Feature activation-1.90e+00

Pos logit contributions
anes+0.028
uts+0.027
all+0.026
 Mike+0.025
ous+0.025
Neg logit contributions
 forgot-0.034
 fort-0.029
 scar-0.029
 trap-0.028
 closet-0.028
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.063
 scar+0.056
 fort+0.056
 trap+0.055
iper+0.055
Neg logit contributions
anes-0.033
uts-0.030
all-0.029
ous-0.026
that-0.025
Token dad
Token ID9955
Feature activation-1.20e+00

Pos logit contributions
 forgot+0.269
 scar+0.226
 fort+0.225
 trap+0.216
iper+0.213
Neg logit contributions
anes-0.324
uts-0.317
all-0.289
ous-0.280
that-0.269
Token had
Token ID550
Feature activation-1.89e+00

Pos logit contributions
 forgot+0.021
 fort+0.018
 scar+0.018
 hospital+0.017
 proper+0.017
Neg logit contributions
anes-0.038
uts-0.037
all-0.035
ous-0.034
qu-0.033
Token an
Token ID281
Feature activation-5.49e+00

Pos logit contributions
 forgot+0.046
 scar+0.039
 fort+0.037
 hospital+0.037
 shield+0.037
Neg logit contributions
anes-0.052
uts-0.049
all-0.046
ous-0.044
that-0.043
Token idea
Token ID2126
Feature activation+6.80e+00

Pos logit contributions
 forgot+0.128
 scar+0.101
 fort+0.098
 bit+0.094
 closet+0.094
Neg logit contributions
anes-0.156
uts-0.147
all-0.143
ous-0.134
that-0.131
Token.
Token ID13
Feature activation+4.40e+00

Pos logit contributions
anes+0.182
uts+0.164
all+0.146
unch+0.144
eddy+0.143
Neg logit contributions
 forgot-0.173
 bit-0.146
 fort-0.145
 cast-0.144
 scar-0.143
Token so
Token ID523
Feature activation-1.81e+00

Pos logit contributions
 forgot+0.074
 scar+0.065
 fort+0.065
 trap+0.063
 shield+0.063
Neg logit contributions
anes-0.057
uts-0.056
all-0.052
ous-0.048
that-0.047
Token,
Token ID11
Feature activation-1.68e+00

Pos logit contributions
 forgot+0.048
 scar+0.045
 closet+0.044
 hospital+0.044
 trap+0.043
Neg logit contributions
anes-0.038
uts-0.038
all-0.035
ous-0.034
that-0.033
Token Tim
Token ID5045
Feature activation-7.94e+00

Pos logit contributions
 forgot+0.047
 scar+0.041
 fort+0.040
 trap+0.039
 closet+0.039
Neg logit contributions
anes-0.039
uts-0.037
all-0.034
ous-0.032
that-0.032
Tokenmy
Token ID1820
Feature activation-7.36e+00

Pos logit contributions
 trap+0.228
 hospital+0.224
 traps+0.221
ns+0.217
 forgot+0.215
Neg logit contributions
all-0.146
eddy-0.126
that-0.123
anes-0.120
ous-0.112
Token and
Token ID290
Feature activation-7.72e+00

Pos logit contributions
 trap+0.143
 traps+0.139
 hospital+0.138
ns+0.132
 scar+0.131
Neg logit contributions
uts-0.194
all-0.193
that-0.176
anes-0.171
amb-0.165
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.216
ns+0.190
iper+0.188
 trap+0.187
 scar+0.186
Neg logit contributions
anes-0.154
all-0.148
uts-0.148
 Mike-0.135
ges-0.133
Token mom
Token ID1995
Feature activation-2.98e+00

Pos logit contributions
 forgot+0.289
iper+0.247
 scar+0.236
ches+0.234
eh+0.234
Neg logit contributions
anes-0.279
uts-0.276
eddy-0.255
all-0.250
ous-0.249
Token picked
Token ID6497
Feature activation+5.45e+00

Pos logit contributions
 forgot+0.047
 trap+0.045
 hospital+0.042
 fort+0.039
 shield+0.039
Neg logit contributions
anes-0.095
all-0.091
uts-0.091
ous-0.086
that-0.086
Token up
Token ID510
Feature activation+3.71e-01

Pos logit contributions
anes+0.137
uts+0.131
all+0.119
ous+0.115
unch+0.110
Neg logit contributions
 forgot-0.143
 scar-0.124
 fort-0.122
 hospital-0.120
 shield-0.118
Token all
Token ID477
Feature activation+1.46e-01

Pos logit contributions
anes+0.008
all+0.007
uts+0.007
 Mike+0.007
that+0.007
Neg logit contributions
 forgot-0.011
 trap-0.009
 fort-0.009
iper-0.009
 closet-0.008
Token the
Token ID262
Feature activation-7.14e+00

Pos logit contributions
anes+0.004
uts+0.004
all+0.003
that+0.003
ous+0.003
Neg logit contributions
 forgot-0.004
 scar-0.003
 fort-0.003
 trap-0.003
 shield-0.003
Token!
Token ID0
Feature activation+3.81e+00

Pos logit contributions
anes+0.089
uts+0.078
 Share+0.070
all+0.070
unch+0.069
Neg logit contributions
 forgot-0.076
 fort-0.064
 bit-0.064
 scar-0.063
 trap-0.063
Token
Token ID198
Feature activation+4.39e+00

Pos logit contributions
uts+0.099
qu+0.099
anes+0.096
od+0.094
ous+0.093
Neg logit contributions
 forgot-0.076
 way-0.068
 hospital-0.065
 garage-0.065
 learned-0.064
Token
Token ID198
Feature activation+4.00e+00

Pos logit contributions
anes+0.109
umm+0.109
uts+0.106
ous+0.104
ges+0.097
Neg logit contributions
 forgot-0.098
 learned-0.089
 cast-0.086
 hospital-0.086
 saw-0.082
TokenJohn
Token ID7554
Feature activation-9.26e+00

Pos logit contributions
anes+0.087
uts+0.086
ous+0.083
ges+0.077
umm+0.075
Neg logit contributions
 forgot-0.107
 learned-0.093
 cast-0.091
eh-0.090
 hospital-0.090
Token and
Token ID290
Feature activation-7.74e+00

Pos logit contributions
 fort+0.174
 forgot+0.171
 closet+0.167
 hospital+0.163
 trap+0.163
Neg logit contributions
all-0.258
anes-0.257
uts-0.254
od-0.236
ous-0.236
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.175
 scar+0.145
 fort+0.142
 hospital+0.138
 trap+0.137
Neg logit contributions
anes-0.223
uts-0.213
all-0.200
ous-0.192
that-0.188
Token mom
Token ID1995
Feature activation-3.00e+00

Pos logit contributions
 forgot+0.264
 scar+0.210
iper+0.207
 slammed+0.199
ches+0.197
Neg logit contributions
anes-0.332
uts-0.317
all-0.292
ous-0.284
eddy-0.281
Token walked
Token ID6807
Feature activation+3.88e+00

Pos logit contributions
 forgot+0.055
 fort+0.048
 scar+0.047
 trap+0.047
 hospital+0.046
Neg logit contributions
anes-0.094
uts-0.088
all-0.086
ous-0.083
that-0.081
Token away
Token ID1497
Feature activation+5.49e-01

Pos logit contributions
anes+0.109
uts+0.105
all+0.094
ous+0.092
unch+0.091
Neg logit contributions
 forgot-0.091
 scar-0.079
 fort-0.077
 hospital-0.073
 shield-0.073
Token with
Token ID351
Feature activation+1.17e+00

Pos logit contributions
anes+0.015
uts+0.014
all+0.013
ous+0.013
 Isn+0.012
Neg logit contributions
 forgot-0.013
 scar-0.011
 fort-0.011
 bit-0.011
 hospital-0.011
Token happy
Token ID3772
Feature activation+2.67e+00

Pos logit contributions
anes+0.031
uts+0.029
all+0.027
ous+0.026
that+0.026
Neg logit contributions
 forgot-0.029
 scar-0.024
 fort-0.024
 hospital-0.023
 closet-0.023
Token
Token ID220
Feature activation+7.83e+00

Pos logit contributions
anes+0.110
uts+0.104
all+0.095
ous+0.091
that+0.088
Neg logit contributions
 forgot-0.130
 scar-0.112
 fort-0.110
 hospital-0.108
 trap-0.107
Token
Token ID198
Feature activation+4.43e+00

Pos logit contributions
anes+0.180
umm+0.170
ous+0.169
qu+0.164
uts+0.161
Neg logit contributions
 forgot-0.189
 hospital-0.161
 closet-0.153
 way-0.153
 might-0.152
Token
Token ID198
Feature activation+4.03e+00

Pos logit contributions
anes+0.107
ous+0.097
uts+0.096
umm+0.093
work+0.086
Neg logit contributions
 forgot-0.118
 cast-0.101
 learned-0.100
 hospital-0.099
 way-0.095
TokenThen
Token ID6423
Feature activation-1.10e+00

Pos logit contributions
anes+0.113
uts+0.109
all+0.102
that+0.096
ous+0.095
Neg logit contributions
 forgot-0.094
 scar-0.080
 fort-0.079
 closet-0.076
 shield-0.075
Token,
Token ID11
Feature activation-1.89e+00

Pos logit contributions
 forgot+0.031
 fort+0.028
iper+0.028
 closet+0.027
ches+0.027
Neg logit contributions
anes-0.021
 Mike-0.020
all-0.020
uts-0.020
ous-0.018
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.051
 fort+0.044
iper+0.044
 closet+0.043
 scar+0.043
Neg logit contributions
anes-0.043
all-0.040
uts-0.040
 Mike-0.037
ous-0.037
Token mom
Token ID1995
Feature activation-2.40e+00

Pos logit contributions
 forgot+0.267
 scar+0.211
iper+0.201
 slammed+0.200
eh+0.198
Neg logit contributions
anes-0.320
uts-0.314
all-0.283
ous-0.280
eddy-0.280
Token came
Token ID1625
Feature activation+3.33e+00

Pos logit contributions
 forgot+0.049
 fort+0.044
 proper+0.043
 trap+0.043
 hospital+0.042
Neg logit contributions
anes-0.068
all-0.062
uts-0.061
ous-0.061
od-0.058
Token and
Token ID290
Feature activation-6.73e+00

Pos logit contributions
anes+0.097
uts+0.093
ous+0.082
unch+0.080
all+0.079
Neg logit contributions
 forgot-0.075
 scar-0.067
 closet-0.065
 head-0.065
 fort-0.065
Token hugged
Token ID41130
Feature activation+7.55e+00

Pos logit contributions
 forgot+0.115
 fort+0.109
 trap+0.101
 hospital+0.099
 shield+0.094
Neg logit contributions
anes-0.205
all-0.196
uts-0.195
that-0.188
qu-0.185
Token him
Token ID683
Feature activation-6.77e+00

Pos logit contributions
anes+0.116
uts+0.105
all+0.096
ous+0.085
that+0.085
Neg logit contributions
 forgot-0.273
 scar-0.239
 fort-0.239
 trap-0.234
 hospital-0.233
Token."
Token ID526
Feature activation+7.54e+00

Pos logit contributions
anes+0.084
uts+0.068
unch+0.067
that+0.066
all+0.066
Neg logit contributions
 forgot-0.088
 bit-0.071
 way-0.070
 fort-0.070
 remembered-0.070
Token
Token ID198
Feature activation+4.58e+00

Pos logit contributions
anes+0.179
uts+0.170
all+0.155
ous+0.150
that+0.142
Neg logit contributions
 forgot-0.210
 scar-0.179
 fort-0.178
 hospital-0.175
 closet-0.172
Token
Token ID198
Feature activation+4.15e+00

Pos logit contributions
anes+0.106
uts+0.098
ous+0.092
umm+0.088
all+0.085
Neg logit contributions
 forgot-0.131
 fort-0.110
 cast-0.110
 hospital-0.108
 learned-0.108
TokenTim
Token ID14967
Feature activation-8.08e+00

Pos logit contributions
anes+0.099
uts+0.095
all+0.089
that+0.086
qu+0.083
Neg logit contributions
 forgot-0.111
 scar-0.099
 fort-0.096
 closet-0.096
 shield-0.096
Token,
Token ID11
Feature activation-2.72e+00

Pos logit contributions
 injuries+0.189
 fort+0.189
 trap+0.187
 traps+0.182
 scar+0.182
Neg logit contributions
all-0.182
uts-0.159
amb-0.154
anes-0.151
good-0.145
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.073
 fort+0.067
iper+0.066
ches+0.065
 trap+0.063
Neg logit contributions
anes-0.057
 Mike-0.056
all-0.056
uts-0.053
that-0.051
Token mom
Token ID1995
Feature activation-2.93e+00

Pos logit contributions
 forgot+0.252
 scar+0.208
 fort+0.200
iper+0.198
 trap+0.194
Neg logit contributions
anes-0.342
uts-0.333
all-0.308
ous-0.297
od-0.291
Token,
Token ID11
Feature activation-2.40e+00

Pos logit contributions
 forgot+0.063
 fort+0.057
 scar+0.057
 trap+0.056
 proper+0.055
Neg logit contributions
anes-0.082
uts-0.077
all-0.077
ous-0.071
od-0.069
Token and
Token ID290
Feature activation-7.07e+00

Pos logit contributions
 forgot+0.054
 fort+0.052
 trap+0.048
 web+0.046
 hospital+0.046
Neg logit contributions
anes-0.062
all-0.058
 Mike-0.056
uts-0.055
that-0.054
Token his
Token ID465
Feature activation-1.12e+01

Pos logit contributions
 forgot+0.169
 fort+0.147
 trap+0.144
ches+0.141
 closet+0.140
Neg logit contributions
anes-0.184
uts-0.173
all-0.166
 Mike-0.155
ous-0.154
Token dad
Token ID9955
Feature activation-1.04e+00

Pos logit contributions
 forgot+0.228
 scar+0.184
 fort+0.178
 hospital+0.174
 trap+0.172
Neg logit contributions
anes-0.343
uts-0.330
all-0.308
ous-0.300
od-0.291
Token.
Token ID13
Feature activation+4.50e+00

Pos logit contributions
anes+0.065
uts+0.060
all+0.054
ous+0.052
unch+0.051
Neg logit contributions
 forgot-0.074
 scar-0.064
 fort-0.063
 trap-0.062
 bit-0.062
Token
Token ID198
Feature activation+4.09e+00

Pos logit contributions
anes+0.106
uts+0.101
ous+0.096
all+0.091
od+0.090
Neg logit contributions
 forgot-0.117
 fort-0.100
 hospital-0.099
 slammed-0.096
 scar-0.096
Token
Token ID198
Feature activation+3.78e+00

Pos logit contributions
anes+0.090
uts+0.084
ous+0.081
umm+0.077
all+0.072
Neg logit contributions
 forgot-0.115
 cast-0.100
 fort-0.098
 scar-0.098
 hospital-0.097
TokenTim
Token ID14967
Feature activation-7.91e+00

Pos logit contributions
anes+0.108
uts+0.103
ous+0.095
all+0.094
od+0.090
Neg logit contributions
 forgot-0.086
 scar-0.071
 fort-0.070
 hospital-0.068
 trap-0.068
Token and
Token ID290
Feature activation-7.62e+00

Pos logit contributions
 closet+0.160
 trap+0.158
 fort+0.158
 hospital+0.154
 injuries+0.153
Neg logit contributions
all-0.198
anes-0.194
good-0.178
od-0.177
qu-0.175
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.229
iper+0.197
ches+0.196
 scar+0.194
 closet+0.193
Neg logit contributions
anes-0.146
 Mike-0.137
all-0.135
uts-0.124
that-0.124
Token friends
Token ID2460
Feature activation+1.01e+00

Pos logit contributions
 forgot+0.287
 slammed+0.232
 hospital+0.221
 scar+0.220
 bit+0.217
Neg logit contributions
anes-0.313
uts-0.297
all-0.275
ous-0.273
eddy-0.261
Token worked
Token ID3111
Feature activation+4.96e+00

Pos logit contributions
anes+0.028
uts+0.025
all+0.025
od+0.025
ous+0.025
Neg logit contributions
 forgot-0.019
 fort-0.019
 hospital-0.018
 trap-0.018
 scar-0.018
Token together
Token ID1978
Feature activation+4.09e-01

Pos logit contributions
anes+0.075
uts+0.071
all+0.065
that+0.055
ous+0.055
Neg logit contributions
 forgot-0.178
 closet-0.158
 hospital-0.156
 fort-0.155
 scar-0.155
Token to
Token ID284
Feature activation-2.73e+00

Pos logit contributions
anes+0.010
uts+0.009
all+0.008
ous+0.008
that+0.008
Neg logit contributions
 forgot-0.011
 scar-0.010
 fort-0.009
 hospital-0.009
 shield-0.009
Token raise
Token ID5298
Feature activation+5.76e+00

Pos logit contributions
 forgot+0.060
 scar+0.049
iper+0.048
 fort+0.048
 closet+0.047
Neg logit contributions
anes-0.077
uts-0.072
all-0.070
qu-0.067
ous-0.066
Token
Token ID198
Feature activation+4.10e+00

Pos logit contributions
anes+0.106
umm+0.103
uts+0.102
ous+0.091
all+0.088
Neg logit contributions
 forgot-0.113
 scar-0.104
 fort-0.101
 scarf-0.100
 cast-0.099
TokenJo
Token ID9908
Feature activation-4.64e+00

Pos logit contributions
anes+0.098
uts+0.094
ous+0.082
all+0.080
unch+0.080
Neg logit contributions
 forgot-0.110
 scar-0.099
 fort-0.094
 scarf-0.092
 cast-0.091
Tokenjo
Token ID7639
Feature activation-8.14e+00

Pos logit contributions
 forgot+0.122
 trap+0.110
 scar+0.109
 fort+0.109
 hospital+0.107
Neg logit contributions
anes-0.113
uts-0.106
all-0.105
ous-0.102
od-0.095
Token cl
Token ID537
Feature activation+8.28e+00

Pos logit contributions
 fort+0.155
 trap+0.151
 forgot+0.151
 scar+0.147
 closet+0.146
Neg logit contributions
uts-0.226
all-0.225
anes-0.224
od-0.207
ous-0.207
Tokenapped
Token ID6320
Feature activation+7.45e+00

Pos logit contributions
anes+0.188
uts+0.174
reating+0.174
Welcome+0.165
qu+0.165
Neg logit contributions
 forgot-0.206
 head-0.186
 remembered-0.186
iper-0.181
 searched-0.180
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
anes+0.207
uts+0.201
all+0.190
ous+0.180
that+0.177
Neg logit contributions
 forgot-0.173
 scar-0.141
 fort-0.141
 hospital-0.138
 trap-0.138
Token hands
Token ID2832
Feature activation+3.51e+00

Pos logit contributions
 forgot+0.282
 hospital+0.261
 fort+0.237
 closet+0.232
 scar+0.227
Neg logit contributions
uts-0.300
anes-0.293
all-0.265
ous-0.264
that-0.261
Token and
Token ID290
Feature activation-6.31e+00

Pos logit contributions
anes+0.119
uts+0.108
 Isn+0.105
 Share+0.100
ous+0.100
Neg logit contributions
 forgot-0.059
 bit-0.053
 scar-0.051
 head-0.049
 scarf-0.048
Token danced
Token ID39480
Feature activation+7.76e+00

Pos logit contributions
 forgot+0.119
 fort+0.107
 trap+0.105
 hospital+0.104
 scar+0.100
Neg logit contributions
uts-0.185
anes-0.185
all-0.174
ous-0.169
that-0.166
Token along
Token ID1863
Feature activation+2.24e+00

Pos logit contributions
anes+0.210
uts+0.196
all+0.178
unch+0.176
ous+0.173
Neg logit contributions
 forgot-0.192
 scar-0.167
 fort-0.162
 shield-0.159
 bit-0.157
Token as
Token ID355
Feature activation+1.11e+00

Pos logit contributions
anes+0.067
uts+0.064
all+0.058
unch+0.058
ous+0.057
Neg logit contributions
 forgot-0.048
 scar-0.040
 bit-0.039
 fort-0.039
 hospital-0.039
Token."
Token ID526
Feature activation+7.07e+00

Pos logit contributions
 forgot+0.114
 fort+0.100
 scar+0.100
 shield+0.097
 scarf+0.096
Neg logit contributions
anes-0.076
uts-0.070
all-0.061
unch-0.059
ous-0.059
Token
Token ID198
Feature activation+4.25e+00

Pos logit contributions
anes+0.166
uts+0.159
ous+0.146
all+0.144
qu+0.140
Neg logit contributions
 forgot-0.185
 hospital-0.154
 closet-0.153
 fort-0.152
iper-0.151
Token
Token ID198
Feature activation+3.88e+00

Pos logit contributions
anes+0.091
uts+0.090
ous+0.085
umm+0.085
all+0.078
Neg logit contributions
 forgot-0.112
iper-0.100
 cast-0.100
 hospital-0.100
 scarf-0.099
TokenJohn
Token ID7554
Feature activation-9.17e+00

Pos logit contributions
anes+0.078
uts+0.074
ous+0.068
all+0.063
ges+0.062
Neg logit contributions
 forgot-0.118
 scar-0.103
 cast-0.101
 hospital-0.100
 fort-0.100
Token and
Token ID290
Feature activation-7.52e+00

Pos logit contributions
 forgot+0.179
 fort+0.173
 trap+0.171
 hospital+0.168
 closet+0.163
Neg logit contributions
all-0.256
anes-0.256
uts-0.248
od-0.234
ous-0.231
Token his
Token ID465
Feature activation-1.18e+01

Pos logit contributions
 forgot+0.217
 scar+0.187
 fort+0.183
 shield+0.182
 hospital+0.181
Neg logit contributions
anes-0.171
uts-0.165
all-0.144
unch-0.139
ous-0.138
Token mom
Token ID1995
Feature activation-2.50e+00

Pos logit contributions
 forgot+0.257
 scar+0.202
 fort+0.199
 hospital+0.194
 trap+0.193
Neg logit contributions
anes-0.342
uts-0.326
all-0.307
ous-0.302
od-0.289
Token found
Token ID1043
Feature activation+2.61e+00

Pos logit contributions
 forgot+0.045
 trap+0.037
 fort+0.037
 hospital+0.037
 scar+0.036
Neg logit contributions
anes-0.081
uts-0.075
all-0.074
ous-0.071
od-0.070
Token some
Token ID617
Feature activation-1.65e+00

Pos logit contributions
anes+0.075
uts+0.072
all+0.066
ous+0.064
unch+0.063
Neg logit contributions
 forgot-0.059
 scar-0.050
 fort-0.049
 hospital-0.048
 shield-0.048
Token water
Token ID1660
Feature activation+3.06e+00

Pos logit contributions
 forgot+0.037
iper+0.027
 fort+0.027
 hospital+0.026
 scar+0.026
Neg logit contributions
anes-0.048
uts-0.044
all-0.042
that-0.040
od-0.040
Token and
Token ID290
Feature activation-5.94e+00

Pos logit contributions
anes+0.089
uts+0.085
all+0.079
ous+0.076
unch+0.074
Neg logit contributions
 forgot-0.068
 scar-0.057
 fort-0.056
 hospital-0.054
 closet-0.054
Token
Token ID198
Feature activation+4.51e+00

Pos logit contributions
anes+0.186
uts+0.177
all+0.162
ous+0.155
that+0.152
Neg logit contributions
 forgot-0.189
 scar-0.161
 fort-0.159
 hospital-0.155
 shield-0.154
Token
Token ID198
Feature activation+4.08e+00

Pos logit contributions
anes+0.099
uts+0.092
umm+0.085
ous+0.085
all+0.084
Neg logit contributions
 forgot-0.126
 fort-0.111
 scar-0.110
 cast-0.109
 hospital-0.108
TokenMax
Token ID11518
Feature activation-7.35e+00

Pos logit contributions
uts+0.095
anes+0.092
all+0.083
that+0.083
qu+0.082
Neg logit contributions
 forgot-0.108
 hospital-0.092
 scar-0.090
 closet-0.089
 pocket-0.089
Token w
Token ID266
Feature activation+1.16e+00

Pos logit contributions
 hospital+0.146
 fort+0.145
 forgot+0.141
 injuries+0.137
 closet+0.136
Neg logit contributions
uts-0.198
anes-0.181
all-0.181
od-0.174
qu-0.172
Tokenagged
Token ID14655
Feature activation+1.12e+00

Pos logit contributions
anes+0.025
uts+0.024
all+0.022
unch+0.021
od+0.020
Neg logit contributions
 forgot-0.036
 scar-0.032
 fort-0.031
 closet-0.031
 bit-0.031
Token his
Token ID465
Feature activation-1.17e+01

Pos logit contributions
all+0.021
uts+0.020
anes+0.019
that+0.019
ges+0.018
Neg logit contributions
 forgot-0.033
ns-0.029
iper-0.029
 hospital-0.029
 fort-0.028
Token tail
Token ID7894
Feature activation+2.52e+00

Pos logit contributions
 forgot+0.306
 hospital+0.286
 wouldn+0.270
 slammed+0.260
 punished+0.254
Neg logit contributions
uts-0.238
amb-0.208
that-0.206
eddy-0.202
all-0.193
Token and
Token ID290
Feature activation-6.42e+00

Pos logit contributions
uts+0.060
anes+0.060
all+0.056
that+0.052
unch+0.052
Neg logit contributions
 forgot-0.065
 scar-0.055
iper-0.055
 fort-0.053
 hospital-0.051
Token wanted
Token ID2227
Feature activation+6.31e+00

Pos logit contributions
iper+0.137
 injuries+0.134
 hospital+0.132
 fort+0.127
ns+0.126
Neg logit contributions
uts-0.146
 Isn-0.133
all-0.131
 Mike-0.131
that-0.128
Token to
Token ID284
Feature activation-2.21e+00

Pos logit contributions
anes+0.113
uts+0.109
all+0.099
 Isn+0.089
that+0.088
Neg logit contributions
 forgot-0.206
 fort-0.181
 scar-0.180
iper-0.176
 closet-0.176
Token play
Token ID711
Feature activation+7.63e+00

Pos logit contributions
 forgot+0.045
 scar+0.035
 trap+0.035
 shield+0.035
 fort+0.034
Neg logit contributions
anes-0.071
uts-0.068
all-0.062
ous-0.061
unch-0.060
Token
Token ID198
Feature activation+4.35e+00

Pos logit contributions
anes+0.180
ous+0.170
umm+0.170
uts+0.164
all+0.162
Neg logit contributions
 forgot-0.184
iper-0.160
 hospital-0.160
 fort-0.159
 closet-0.155
Token
Token ID198
Feature activation+3.96e+00

Pos logit contributions
anes+0.095
uts+0.090
ous+0.085
umm+0.080
all+0.077
Neg logit contributions
 forgot-0.120
 learned-0.104
 hospital-0.104
 scar-0.103
 cast-0.102
TokenTim
Token ID14967
Feature activation-7.58e+00

Pos logit contributions
anes+0.111
uts+0.105
ous+0.099
all+0.094
 Isn+0.092
Neg logit contributions
 forgot-0.091
 scar-0.075
 hospital-0.073
 cast-0.072
 fort-0.072
Tokenmy
Token ID1820
Feature activation-7.60e+00

Pos logit contributions
 trap+0.183
 forgot+0.178
 hospital+0.176
 fort+0.175
 traps+0.169
Neg logit contributions
all-0.175
anes-0.166
amb-0.152
uts-0.151
ous-0.150
Token and
Token ID290
Feature activation-7.66e+00

Pos logit contributions
 fort+0.153
 trap+0.151
 forgot+0.144
 hospital+0.142
 traps+0.141
Neg logit contributions
all-0.201
uts-0.193
anes-0.193
qu-0.180
amb-0.180
Token his
Token ID465
Feature activation-1.17e+01

Pos logit contributions
 forgot+0.193
 fort+0.165
 scar+0.164
 trap+0.162
 hospital+0.158
Neg logit contributions
anes-0.193
uts-0.182
all-0.176
ous-0.167
 Mike-0.160
Token family
Token ID1641
Feature activation+3.47e-01

Pos logit contributions
 forgot+0.265
 fort+0.208
 slammed+0.206
 scar+0.206
 hospital+0.202
Neg logit contributions
anes-0.320
uts-0.306
ous-0.288
all-0.286
eddy-0.275
Token continued
Token ID3767
Feature activation+3.87e+00

Pos logit contributions
anes+0.011
uts+0.010
all+0.010
ous+0.009
od+0.009
Neg logit contributions
 forgot-0.006
 fort-0.006
 trap-0.006
 scar-0.005
 hospital-0.005
Token their
Token ID511
Feature activation-1.02e+01

Pos logit contributions
anes+0.104
uts+0.099
all+0.089
 Isn+0.087
ous+0.087
Neg logit contributions
 forgot-0.094
 scar-0.085
 fort-0.082
 shield-0.080
 hospital-0.079
Token walk
Token ID2513
Feature activation+6.13e+00

Pos logit contributions
 forgot+0.188
 closet+0.134
 shield+0.134
 remembered+0.132
 hospital+0.130
Neg logit contributions
anes-0.308
all-0.300
uts-0.297
qu-0.287
ous-0.285
Token,
Token ID11
Feature activation-1.78e+00

Pos logit contributions
anes+0.150
uts+0.141
all+0.131
ous+0.124
that+0.121
Neg logit contributions
 forgot-0.167
 scar-0.140
 fort-0.136
 trap-0.136
 closet-0.135
Token delightful
Token ID32327
Feature activation+9.17e-01

Pos logit contributions
 forgot+0.008
 scar+0.006
 fort+0.006
 shield+0.006
 trap+0.006
Neg logit contributions
anes-0.013
uts-0.012
all-0.011
ous-0.011
that-0.011
Token.
Token ID13
Feature activation+3.99e+00

Pos logit contributions
anes+0.024
uts+0.021
unch+0.020
all+0.019
ges+0.019
Neg logit contributions
 forgot-0.023
 fort-0.020
 bit-0.020
 way-0.020
 scar-0.019
Token Tim
Token ID5045
Feature activation-8.37e+00

Pos logit contributions
anes+0.093
uts+0.091
ous+0.083
qu+0.080
all+0.078
Neg logit contributions
 forgot-0.100
 hospital-0.086
 closet-0.083
iper-0.083
 learned-0.082
Tokenmy
Token ID1820
Feature activation-7.56e+00

Pos logit contributions
 trap+0.201
 forgot+0.199
 fort+0.199
 hospital+0.193
 scar+0.189
Neg logit contributions
all-0.191
anes-0.183
that-0.170
od-0.165
uts-0.164
Token and
Token ID290
Feature activation-7.70e+00

Pos logit contributions
 fort+0.153
 trap+0.150
 forgot+0.145
 closet+0.142
 hospital+0.142
Neg logit contributions
all-0.198
anes-0.196
uts-0.191
od-0.183
ous-0.180
Token his
Token ID465
Feature activation-1.17e+01

Pos logit contributions
 forgot+0.195
 scar+0.165
 fort+0.163
 hospital+0.159
 trap+0.157
Neg logit contributions
anes-0.201
uts-0.191
all-0.177
ous-0.169
that-0.165
Token friend
Token ID1545
Feature activation+9.54e-01

Pos logit contributions
 forgot+0.255
 scar+0.202
 fort+0.197
 slammed+0.195
iper+0.194
Neg logit contributions
anes-0.336
uts-0.322
all-0.300
ous-0.297
od-0.284
Token tasted
Token ID29187
Feature activation+4.80e+00

Pos logit contributions
anes+0.029
uts+0.027
all+0.026
ous+0.025
od+0.025
Neg logit contributions
 forgot-0.019
 scar-0.016
 fort-0.016
 trap-0.016
 hospital-0.015
Token the
Token ID262
Feature activation-6.61e+00

Pos logit contributions
anes+0.139
uts+0.131
ous+0.117
all+0.116
unch+0.112
Neg logit contributions
 forgot-0.112
 scar-0.095
 hospital-0.094
 bit-0.092
 shield-0.092
Token delic
Token ID8675
Feature activation+3.36e+00

Pos logit contributions
 forgot+0.142
 scar+0.112
 fort+0.110
iper+0.108
 hospital+0.105
Neg logit contributions
anes-0.197
uts-0.187
all-0.178
ous-0.172
that-0.169
Tokeniously
Token ID6819
Feature activation-1.09e+00

Pos logit contributions
anes+0.095
uts+0.090
 Isn+0.088
 Share+0.085
that+0.085
Neg logit contributions
 forgot-0.071
 head-0.058
iper-0.057
 bit-0.055
 scar-0.055
Token
Token ID198
Feature activation+4.60e+00

Pos logit contributions
anes+0.194
uts+0.179
qu+0.166
ous+0.164
reating+0.163
Neg logit contributions
 forgot-0.169
 remembered-0.148
 saw-0.148
 accidentally-0.146
 didn-0.145
Token
Token ID198
Feature activation+4.22e+00

Pos logit contributions
anes+0.115
uts+0.110
umm+0.107
ous+0.106
work+0.104
Neg logit contributions
 forgot-0.104
 accidentally-0.090
 saw-0.089
 closet-0.089
 scarf-0.089
TokenLittle
Token ID22253
Feature activation-1.85e+00

Pos logit contributions
anes+0.113
uts+0.105
ous+0.099
all+0.093
 Share+0.089
Neg logit contributions
 forgot-0.103
 scar-0.085
 remembered-0.084
 scarf-0.083
 closet-0.082
Token Johnny
Token ID15470
Feature activation-7.55e+00

Pos logit contributions
 forgot+0.056
 scar+0.049
 fort+0.048
 hospital+0.048
 trap+0.048
Neg logit contributions
anes-0.038
uts-0.036
all-0.033
ous-0.031
that-0.031
Token nodded
Token ID14464
Feature activation+1.99e+00

Pos logit contributions
 forgot+0.132
 fort+0.126
 scar+0.123
 trap+0.121
 shield+0.114
Neg logit contributions
anes-0.227
uts-0.219
all-0.217
ous-0.207
od-0.203
Token his
Token ID465
Feature activation-1.17e+01

Pos logit contributions
anes+0.063
uts+0.058
unch+0.052
qu+0.052
ous+0.051
Neg logit contributions
 forgot-0.042
 head-0.037
 remembered-0.036
 bit-0.035
 shield-0.035
Token head
Token ID1182
Feature activation+2.96e+00

Pos logit contributions
 forgot+0.181
 hospital+0.130
 scar+0.126
ns+0.120
iper+0.118
Neg logit contributions
uts-0.374
ous-0.361
all-0.348
anes-0.348
amb-0.346
Token.
Token ID13
Feature activation+4.25e+00

Pos logit contributions
anes+0.089
uts+0.079
 Isn+0.074
all+0.073
ous+0.071
Neg logit contributions
 forgot-0.066
 bit-0.057
 head-0.056
 scar-0.054
 scarf-0.053
Token He
Token ID679
Feature activation-1.03e+01

Pos logit contributions
anes+0.130
uts+0.127
qu+0.123
good+0.114
od+0.113
Neg logit contributions
 forgot-0.080
 closet-0.074
 head-0.069
 pile-0.069
 backyard-0.065
Token was
Token ID373
Feature activation-3.66e+00

Pos logit contributions
 fort+0.217
 trap+0.200
 forgot+0.198
 hospital+0.198
 closet+0.196
Neg logit contributions
all-0.253
uts-0.249
 Mike-0.244
anes-0.236
od-0.224
Token so
Token ID523
Feature activation-1.38e+00

Pos logit contributions
 forgot+0.078
 scar+0.065
 fort+0.065
 shield+0.061
iper+0.061
Neg logit contributions
anes-0.108
uts-0.104
all-0.099
ous-0.093
that-0.093
Token sharing
Token ID7373
Feature activation+6.82e+00

Pos logit contributions
 forgot+0.140
 scar+0.117
 fort+0.116
 hospital+0.112
 closet+0.110
Neg logit contributions
anes-0.190
uts-0.184
all-0.172
ous-0.165
that-0.163
Token made
Token ID925
Feature activation+5.04e+00

Pos logit contributions
anes+0.192
uts+0.189
all+0.174
that+0.170
 Share+0.166
Neg logit contributions
 forgot-0.147
 scar-0.126
 fort-0.123
iper-0.122
 hospital-0.118
Token everyone
Token ID2506
Feature activation-6.88e-01

Pos logit contributions
anes+0.134
uts+0.127
all+0.119
ous+0.113
that+0.112
Neg logit contributions
 forgot-0.126
 scar-0.106
 fort-0.104
 closet-0.101
 shield-0.101
Token happy
Token ID3772
Feature activation+3.25e+00

Pos logit contributions
 forgot+0.013
 scar+0.011
 fort+0.010
 hospital+0.010
 trap+0.010
Neg logit contributions
anes-0.022
uts-0.021
all-0.020
ous-0.019
that-0.019
Token.
Token ID13
Feature activation+4.81e+00

Pos logit contributions
anes+0.086
uts+0.077
unch+0.074
all+0.073
ous+0.069
Neg logit contributions
 forgot-0.086
 fort-0.071
 hospital-0.069
 head-0.068
 scar-0.068
Token<|endoftext|>
Token ID50256
Feature activation+1.37e+01

Pos logit contributions
anes+0.116
ous+0.108
uts+0.106
all+0.101
ach+0.100
Neg logit contributions
 forgot-0.118
 learned-0.108
 hospital-0.100
 trap-0.098
 fort-0.098
TokenOnce
Token ID7454
Feature activation+6.71e+00

Pos logit contributions
ous+0.403
uts+0.394
anes+0.389
umm+0.377
amb+0.359
Neg logit contributions
 forgot-0.262
 saw-0.258
 learned-0.248
 way-0.248
 room-0.243
Token upon
Token ID2402
Feature activation-5.10e-01

Pos logit contributions
anes+0.190
uts+0.182
all+0.168
ous+0.162
qu+0.159
Neg logit contributions
 forgot-0.155
 scar-0.127
 fort-0.126
 hospital-0.122
 closet-0.121
Token a
Token ID257
Feature activation-3.61e+00

Pos logit contributions
 forgot+0.011
 scar+0.009
 fort+0.009
 hospital+0.009
 trap+0.009
Neg logit contributions
anes-0.015
uts-0.015
all-0.013
ous-0.013
qu-0.013
Token time
Token ID640
Feature activation+4.81e+00

Pos logit contributions
 forgot+0.072
 scar+0.062
 shield+0.060
 hospital+0.059
 closet+0.059
Neg logit contributions
uts-0.101
all-0.099
anes-0.097
unch-0.097
that-0.090
Token,
Token ID11
Feature activation-2.87e-01

Pos logit contributions
anes+0.173
uts+0.160
all+0.151
ous+0.149
 Isn+0.148
Neg logit contributions
 forgot-0.078
 fort-0.057
 way-0.053
 bit-0.053
 scar-0.052
Token.
Token ID13
Feature activation+4.97e+00

Pos logit contributions
 forgot+0.053
 scar+0.045
 fort+0.045
 hospital+0.044
 trap+0.043
Neg logit contributions
anes-0.046
uts-0.043
all-0.039
ous-0.037
that-0.037
Token You
Token ID921
Feature activation-3.38e+00

Pos logit contributions
anes+0.117
uts+0.106
ous+0.103
all+0.101
qu+0.098
Neg logit contributions
 forgot-0.124
 hospital-0.111
 closet-0.110
 head-0.107
 slammed-0.106
Token need
Token ID761
Feature activation+4.24e+00

Pos logit contributions
 forgot+0.076
 fort+0.074
 scar+0.071
 closet+0.070
iper+0.069
Neg logit contributions
anes-0.087
uts-0.084
all-0.075
od-0.073
 Share-0.072
Token a
Token ID257
Feature activation-4.96e+00

Pos logit contributions
anes+0.099
uts+0.094
all+0.087
ous+0.082
that+0.080
Neg logit contributions
 forgot-0.118
 scar-0.101
 fort-0.100
 hospital-0.097
 shield-0.096
Token better
Token ID1365
Feature activation+6.79e-01

Pos logit contributions
 forgot+0.099
iper+0.077
 fort+0.077
 scar+0.077
 slammed+0.075
Neg logit contributions
anes-0.144
uts-0.142
ous-0.132
all-0.132
 Isn-0.128
Token joke
Token ID9707
Feature activation+1.00e+01

Pos logit contributions
anes+0.022
uts+0.021
all+0.020
ous+0.019
that+0.018
Neg logit contributions
 forgot-0.013
 scar-0.010
 fort-0.010
 hospital-0.010
 closet-0.010
Token,
Token ID11
Feature activation-1.30e+00

Pos logit contributions
anes+0.273
unch+0.238
all+0.233
uts+0.230
that+0.226
Neg logit contributions
 forgot-0.238
 scar-0.205
 hospital-0.204
 bit-0.204
 fort-0.201
Token Tom
Token ID4186
Feature activation-6.46e+00

Pos logit contributions
 forgot+0.029
 fort+0.024
iper+0.023
 scar+0.023
 shield+0.023
Neg logit contributions
anes-0.037
uts-0.036
all-0.034
 Isn-0.031
ous-0.031
Token.
Token ID13
Feature activation+4.32e+00

Pos logit contributions
 forgot+0.214
 scar+0.196
 fort+0.194
 closet+0.188
iper+0.187
Neg logit contributions
uts-0.105
anes-0.105
all-0.091
that-0.080
ous-0.080
Token A
Token ID317
Feature activation-4.31e+00

Pos logit contributions
anes+0.104
ous+0.094
uts+0.092
all+0.090
that+0.087
Neg logit contributions
 forgot-0.103
 head-0.094
 hospital-0.093
 closet-0.091
 bit-0.089
Token funny
Token ID8258
Feature activation-3.67e-01

Pos logit contributions
 forgot+0.092
 scar+0.075
 fort+0.074
 hospital+0.071
 shield+0.071
Neg logit contributions
anes-0.128
uts-0.123
all-0.115
ous-0.111
that-0.109
Token asked
Token ID1965
Feature activation+4.76e+00

Pos logit contributions
 fort+0.116
 forgot+0.110
 scar+0.108
 scarf+0.102
eh+0.100
Neg logit contributions
uts-0.157
anes-0.154
all-0.152
ous-0.149
 Mike-0.149
Token why
Token ID1521
Feature activation+5.42e+00

Pos logit contributions
anes+0.131
uts+0.124
all+0.115
ous+0.110
that+0.108
Neg logit contributions
 forgot-0.115
 scar-0.096
 fort-0.094
 hospital-0.092
 shield-0.092
Token they
Token ID484
Feature activation-6.69e+00

Pos logit contributions
anes+0.133
uts+0.130
all+0.118
ous+0.114
that+0.110
Neg logit contributions
 forgot-0.141
 scar-0.124
 fort-0.122
 trap-0.117
 hospital-0.117
Token wouldn
Token ID3636
Feature activation+5.29e+00

Pos logit contributions
 forgot+0.133
 fort+0.129
 scar+0.126
 hospital+0.119
iper+0.117
Neg logit contributions
anes-0.189
uts-0.182
all-0.171
ous-0.169
that-0.161
Token't
Token ID470
Feature activation-2.52e+00

Pos logit contributions
umm+0.116
anes+0.113
 Isn+0.105
Happy+0.101
ucks+0.101
Neg logit contributions
 forgot-0.137
 hospital-0.125
 cast-0.119
 head-0.118
 needle-0.117
Token give
Token ID1577
Feature activation+9.68e+00

Pos logit contributions
 forgot+0.050
 head+0.037
 shield+0.036
 bit+0.036
 trap+0.036
Neg logit contributions
anes-0.085
uts-0.078
all-0.073
 Isn-0.073
ous-0.071
Token it
Token ID340
Feature activation-2.01e+00

Pos logit contributions
anes+0.236
uts+0.223
all+0.205
ous+0.196
that+0.190
Neg logit contributions
 forgot-0.262
 scar-0.225
 fort-0.221
 hospital-0.216
 shield-0.215
Token back
Token ID736
Feature activation+3.38e-01

Pos logit contributions
 forgot+0.049
 scar+0.041
 fort+0.040
 hospital+0.039
 trap+0.039
Neg logit contributions
anes-0.055
uts-0.052
all-0.048
ous-0.046
unch-0.046
Token.
Token ID13
Feature activation+4.88e+00

Pos logit contributions
anes+0.008
uts+0.008
all+0.007
ous+0.007
unch+0.007
Neg logit contributions
 forgot-0.009
 scar-0.008
 fort-0.008
 hospital-0.008
 closet-0.008
Token The
Token ID383
Feature activation-6.81e+00

Pos logit contributions
anes+0.124
uts+0.115
all+0.104
ous+0.103
qu+0.100
Neg logit contributions
 forgot-0.128
 hospital-0.107
 bit-0.105
 closet-0.105
 fort-0.104
Token person
Token ID1048
Feature activation-2.66e-01

Pos logit contributions
 forgot+0.154
 scar+0.132
 fort+0.125
iper+0.121
 scarf+0.120
Neg logit contributions
anes-0.194
uts-0.184
all-0.174
ous-0.167
that-0.165
Token They
Token ID1119
Feature activation-7.90e+00

Pos logit contributions
anes+0.130
ous+0.117
uts+0.112
all+0.111
qu+0.109
Neg logit contributions
 forgot-0.121
 closet-0.105
 scar-0.101
 hospital-0.099
 fort-0.099
Token loved
Token ID6151
Feature activation+4.64e+00

Pos logit contributions
 fort+0.168
iper+0.156
 trap+0.156
 hospital+0.148
 closet+0.147
Neg logit contributions
anes-0.188
uts-0.186
all-0.178
 Mike-0.178
od-0.173
Token growing
Token ID3957
Feature activation+6.94e+00

Pos logit contributions
anes+0.133
uts+0.126
all+0.115
ous+0.113
unch+0.110
Neg logit contributions
 forgot-0.108
 scar-0.093
 fort-0.088
 hospital-0.086
 shield-0.086
Token tomatoes
Token ID23972
Feature activation+4.04e+00

Pos logit contributions
anes+0.207
uts+0.199
all+0.184
ous+0.181
unch+0.176
Neg logit contributions
 forgot-0.146
 scar-0.123
 fort-0.121
 trap-0.117
 shield-0.116
Token.
Token ID13
Feature activation+4.91e+00

Pos logit contributions
anes+0.122
uts+0.117
all+0.109
ous+0.106
that+0.103
Neg logit contributions
 forgot-0.085
 scar-0.070
 fort-0.068
 hospital-0.066
 shield-0.065
Token<|endoftext|>
Token ID50256
Feature activation+1.37e+01

Pos logit contributions
anes+0.139
ous+0.129
all+0.121
uts+0.121
qu+0.118
Neg logit contributions
 forgot-0.114
 closet-0.101
 scar-0.096
 fort-0.094
 hospital-0.093
TokenL
Token ID43
Feature activation-1.33e+00

Pos logit contributions
ous+0.423
uts+0.414
anes+0.382
all+0.375
umm+0.371
Neg logit contributions
 saw-0.220
 might-0.220
 room-0.216
 closet-0.213
 trap-0.212
Tokenila
Token ID10102
Feature activation-6.90e+00

Pos logit contributions
 forgot+0.038
 bit+0.032
 scar+0.032
 shield+0.032
 trap+0.031
Neg logit contributions
anes-0.031
uts-0.028
all-0.028
od-0.026
ges-0.025
Token was
Token ID373
Feature activation-2.62e+00

Pos logit contributions
 scar+0.169
 forgot+0.153
 fort+0.153
 shield+0.152
iper+0.151
Neg logit contributions
uts-0.154
anes-0.154
all-0.148
od-0.135
ges-0.134
Token lonely
Token ID21757
Feature activation+3.96e+00

Pos logit contributions
 forgot+0.071
 scar+0.060
 fort+0.060
 trap+0.058
 shield+0.057
Neg logit contributions
anes-0.061
uts-0.057
all-0.057
unch-0.051
that-0.051
Token in
Token ID287
Feature activation-1.25e+00

Pos logit contributions
anes+0.086
uts+0.082
all+0.075
ous+0.071
that+0.068
Neg logit contributions
 forgot-0.115
 scar-0.100
 fort-0.098
iper-0.096
 trap-0.095
Token things
Token ID1243
Feature activation+3.04e+00

Pos logit contributions
anes+0.088
uts+0.086
all+0.080
ous+0.076
that+0.076
Neg logit contributions
 forgot-0.080
 scar-0.069
 fort-0.068
 closet-0.065
iper-0.065
Token the
Token ID262
Feature activation-6.76e+00

Pos logit contributions
anes+0.083
uts+0.076
all+0.070
unch+0.068
ous+0.068
Neg logit contributions
 forgot-0.076
 scar-0.062
 fort-0.061
 hospital-0.061
 shield-0.061
Token right
Token ID826
Feature activation+7.35e-01

Pos logit contributions
 forgot+0.179
 scar+0.154
iper+0.153
 slammed+0.146
ches+0.144
Neg logit contributions
anes-0.154
all-0.141
that-0.139
uts-0.137
 Isn-0.132
Token way
Token ID835
Feature activation+5.64e+00

Pos logit contributions
anes+0.022
uts+0.021
all+0.019
ous+0.019
that+0.018
Neg logit contributions
 forgot-0.016
 scar-0.013
 fort-0.013
 hospital-0.013
 shield-0.012
Token.
Token ID13
Feature activation+4.89e+00

Pos logit contributions
anes+0.144
uts+0.132
all+0.120
unch+0.119
qu+0.116
Neg logit contributions
 forgot-0.150
 fort-0.122
 way-0.121
 hospital-0.121
 head-0.120
Token<|endoftext|>
Token ID50256
Feature activation+1.38e+01

Pos logit contributions
anes+0.133
uts+0.124
ous+0.121
qu+0.117
ach+0.112
Neg logit contributions
 forgot-0.112
 learned-0.098
 hospital-0.091
 pocket-0.089
 closet-0.089
TokenOnce
Token ID7454
Feature activation+6.84e+00

Pos logit contributions
uts+0.400
anes+0.371
ous+0.370
all+0.342
qu+0.338
Neg logit contributions
 forgot-0.271
 saw-0.262
 closet-0.256
 might-0.252
 way-0.251
Token there
Token ID612
Feature activation+2.81e+00

Pos logit contributions
anes+0.199
uts+0.191
all+0.177
ous+0.170
that+0.167
Neg logit contributions
 forgot-0.152
 scar-0.125
 fort-0.123
 hospital-0.119
 closet-0.118
Token was
Token ID373
Feature activation-2.52e+00

Pos logit contributions
anes+0.091
uts+0.088
all+0.083
ous+0.080
that+0.079
Neg logit contributions
 forgot-0.052
 scar-0.042
 fort-0.041
 hospital-0.039
 shield-0.039
Token a
Token ID257
Feature activation-4.06e+00

Pos logit contributions
 forgot+0.048
 scar+0.042
 scarf+0.040
 fort+0.040
 hospital+0.039
Neg logit contributions
anes-0.080
uts-0.077
ous-0.071
all-0.069
qu-0.068
Token tall
Token ID7331
Feature activation-1.53e+00

Pos logit contributions
 forgot+0.074
iper+0.061
 scar+0.057
 slammed+0.055
 hospital+0.054
Neg logit contributions
anes-0.120
uts-0.117
all-0.113
unch-0.112
eddy-0.112
Token came
Token ID1625
Feature activation+3.75e+00

Pos logit contributions
 fort+0.105
 scar+0.103
 trap+0.101
 forgot+0.100
iper+0.098
Neg logit contributions
uts-0.170
anes-0.166
all-0.165
 Mike-0.163
ges-0.155
Token back
Token ID736
Feature activation-5.36e-01

Pos logit contributions
anes+0.109
uts+0.103
all+0.094
ous+0.092
qu+0.091
Neg logit contributions
 forgot-0.086
 scar-0.070
 fort-0.070
 closet-0.068
 head-0.067
Token with
Token ID351
Feature activation+1.41e+00

Pos logit contributions
 forgot+0.013
 closet+0.011
 fort+0.011
 bit+0.011
 hospital+0.011
Neg logit contributions
anes-0.016
uts-0.014
unch-0.013
qu-0.013
ous-0.012
Token a
Token ID257
Feature activation-5.47e+00

Pos logit contributions
anes+0.037
uts+0.036
all+0.033
ous+0.032
that+0.031
Neg logit contributions
 forgot-0.035
 scar-0.029
 fort-0.029
 hospital-0.028
 trap-0.027
Token big
Token ID1263
Feature activation-3.14e+00

Pos logit contributions
 forgot+0.107
iper+0.089
 hospital+0.085
 fort+0.084
 scar+0.083
Neg logit contributions
anes-0.151
uts-0.150
ous-0.140
that-0.139
 Isn-0.138
Token piece
Token ID3704
Feature activation+7.41e+00

Pos logit contributions
 forgot+0.077
iper+0.061
ns+0.060
 hospital+0.058
ches+0.058
Neg logit contributions
anes-0.075
ous-0.070
uts-0.068
 cheering-0.067
 Isn-0.067
Token of
Token ID286
Feature activation-1.18e+00

Pos logit contributions
anes+0.114
uts+0.104
all+0.090
ous+0.083
that+0.079
Neg logit contributions
 forgot-0.266
 scar-0.238
 fort-0.235
 hospital-0.231
 shield-0.230
Token paper
Token ID3348
Feature activation+2.45e+00

Pos logit contributions
 forgot+0.024
 fort+0.018
 hospital+0.017
 slammed+0.016
 trap+0.016
Neg logit contributions
anes-0.036
uts-0.033
all-0.033
that-0.032
od-0.032
Token.
Token ID13
Feature activation+4.27e+00

Pos logit contributions
anes+0.058
uts+0.055
all+0.050
ous+0.048
that+0.047
Neg logit contributions
 forgot-0.068
 scar-0.059
 fort-0.058
 hospital-0.056
 shield-0.056
Token She
Token ID1375
Feature activation-1.05e+01

Pos logit contributions
anes+0.111
uts+0.102
qu+0.096
eddy+0.095
ous+0.094
Neg logit contributions
 forgot-0.101
 closet-0.085
 learned-0.085
 hospital-0.083
 slammed-0.083
Token smiled
Token ID13541
Feature activation+3.53e+00

Pos logit contributions
 fort+0.208
 trap+0.198
 hospital+0.196
iper+0.194
 scar+0.191
Neg logit contributions
uts-0.267
all-0.256
anes-0.252
od-0.245
 Mike-0.237
Token puppy
Token ID26188
Feature activation-3.55e-02

Pos logit contributions
 forgot+0.115
 scar+0.089
 fort+0.086
 hospital+0.083
 trap+0.081
Neg logit contributions
anes-0.233
uts-0.225
all-0.212
ous-0.205
that-0.202
Token is
Token ID318
Feature activation-3.54e+00

Pos logit contributions
iper+0.001
 forgot+0.001
eh+0.001
 fort+0.001
Days+0.001
Neg logit contributions
anes-0.001
uts-0.001
 Hello-0.001
 Share-0.001
 Clown-0.001
Token brown
Token ID7586
Feature activation+9.98e-01

Pos logit contributions
 forgot+0.074
 fort+0.060
 scar+0.059
 hospital+0.057
 shield+0.056
Neg logit contributions
anes-0.106
uts-0.103
all-0.096
ous-0.092
that-0.090
Token and
Token ID290
Feature activation-6.89e+00

Pos logit contributions
anes+0.029
uts+0.028
all+0.026
ous+0.025
that+0.025
Neg logit contributions
 forgot-0.022
 scar-0.018
 fort-0.017
iper-0.017
 hospital-0.017
Token fluffy
Token ID39145
Feature activation+7.33e-01

Pos logit contributions
 forgot+0.147
iper+0.129
 fort+0.128
 hospital+0.127
 closet+0.117
Neg logit contributions
uts-0.182
anes-0.177
 Isn-0.165
ous-0.163
that-0.162
Token.
Token ID13
Feature activation+4.49e+00

Pos logit contributions
anes+0.021
uts+0.020
all+0.019
ous+0.018
that+0.017
Neg logit contributions
 forgot-0.017
 scar-0.014
 fort-0.014
 shield-0.014
 trap-0.014
Token Sara
Token ID24799
Feature activation-8.71e+00

Pos logit contributions
anes+0.115
uts+0.105
all+0.102
ous+0.100
qu+0.097
Neg logit contributions
 forgot-0.116
 scar-0.100
 hospital-0.097
 bit-0.095
 closet-0.095
Token and
Token ID290
Feature activation-7.92e+00

Pos logit contributions
 forgot+0.197
 fort+0.174
 scar+0.170
 shield+0.170
 trap+0.170
Neg logit contributions
anes-0.231
uts-0.231
all-0.206
 Share-0.200
od-0.193
Token Max
Token ID5436
Feature activation-7.46e+00

Pos logit contributions
 forgot+0.200
eh+0.167
 fort+0.159
 trap+0.156
 closet+0.154
Neg logit contributions
anes-0.201
uts-0.188
all-0.170
 Share-0.167
ous-0.167
Token go
Token ID467
Feature activation+3.61e+00

Pos logit contributions
 forgot+0.177
 fort+0.157
 scar+0.154
iper+0.151
 trap+0.150
Neg logit contributions
anes-0.191
uts-0.184
all-0.169
that-0.162
 Share-0.162
Token to
Token ID284
Feature activation-2.84e+00

Pos logit contributions
anes+0.082
uts+0.078
all+0.071
ous+0.067
that+0.065
Neg logit contributions
 forgot-0.103
 scar-0.090
 fort-0.088
 hospital-0.086
 shield-0.086
Token.
Token ID13
Feature activation+5.18e+00

Pos logit contributions
anes+0.067
uts+0.062
all+0.058
unch+0.057
that+0.057
Neg logit contributions
 forgot-0.056
 scar-0.049
 fort-0.048
 bit-0.047
 hospital-0.046
Token We
Token ID775
Feature activation-4.13e+00

Pos logit contributions
anes+0.128
uts+0.122
all+0.114
unch+0.113
ous+0.112
Neg logit contributions
 forgot-0.128
 scar-0.114
 closet-0.113
 hospital-0.113
 fort-0.112
Token nap
Token ID25422
Feature activation+7.72e+00

Pos logit contributions
 forgot+0.102
 fort+0.097
 closet+0.094
 scar+0.093
eh+0.092
Neg logit contributions
anes-0.088
uts-0.087
all-0.080
 Share-0.078
qu-0.077
Token and
Token ID290
Feature activation-6.45e+00

Pos logit contributions
uts+0.175
anes+0.171
all+0.158
ous+0.147
that+0.145
Neg logit contributions
 forgot-0.209
 hospital-0.179
 scar-0.177
 fort-0.174
iper-0.173
Token pur
Token ID1308
Feature activation+5.75e+00

Pos logit contributions
 forgot+0.143
iper+0.125
 fort+0.125
 scar+0.122
eh+0.120
Neg logit contributions
anes-0.164
uts-0.163
all-0.150
 Share-0.145
ous-0.144
Tokenr
Token ID81
Feature activation+6.57e+00

Pos logit contributions
anes+0.137
uts+0.122
that+0.118
all+0.115
unch+0.111
Neg logit contributions
 forgot-0.148
 scar-0.138
 bit-0.133
 head-0.131
iper-0.130
Token and
Token ID290
Feature activation-6.47e+00

Pos logit contributions
anes+0.168
uts+0.147
 Isn+0.142
unch+0.140
all+0.140
Neg logit contributions
 forgot-0.163
 head-0.143
 scar-0.142
 fort-0.141
 bit-0.134
Token scratch
Token ID12692
Feature activation+8.64e+00

Pos logit contributions
 forgot+0.137
 fort+0.115
 hospital+0.114
iper+0.112
 closet+0.110
Neg logit contributions
anes-0.175
uts-0.174
all-0.161
 Isn-0.154
ous-0.154
Token."
Token ID526
Feature activation+7.02e+00

Pos logit contributions
anes+0.219
uts+0.204
unch+0.188
all+0.186
ous+0.182
Neg logit contributions
 forgot-0.223
 scar-0.197
 bit-0.189
 fort-0.184
 trap-0.184
Token
Token ID198
Feature activation+4.48e+00

Pos logit contributions
anes+0.160
ous+0.148
uts+0.142
all+0.142
od+0.131
Neg logit contributions
 forgot-0.197
 scar-0.173
 hospital-0.169
 closet-0.166
 fort-0.165
Token
Token ID198
Feature activation+4.09e+00

Pos logit contributions
anes+0.112
umm+0.105
ous+0.104
uts+0.102
all+0.100
Neg logit contributions
 forgot-0.111
 scar-0.098
 cast-0.097
 hospital-0.097
 learned-0.096
Token home
Token ID1363
Feature activation+5.54e+00

Pos logit contributions
anes+0.131
uts+0.125
all+0.116
ous+0.111
that+0.108
Neg logit contributions
 forgot-0.114
 scar-0.096
 fort-0.094
 hospital-0.092
 shield-0.091
Token Sam
Token ID3409
Feature activation-5.30e+00

Pos logit contributions
anes+0.137
uts+0.133
all+0.124
ous+0.114
that+0.114
Neg logit contributions
 forgot-0.144
 scar-0.122
 fort-0.121
 trap-0.119
 hospital-0.118
Token from
Token ID422
Feature activation-1.36e-01

Pos logit contributions
 forgot+0.133
 scar+0.111
 fort+0.110
 hospital+0.108
 trap+0.108
Neg logit contributions
anes-0.133
uts-0.131
all-0.122
that-0.114
ous-0.112
Token the
Token ID262
Feature activation-6.49e+00

Pos logit contributions
 forgot+0.003
 fort+0.003
 scar+0.003
 shield+0.003
 scarf+0.003
Neg logit contributions
anes-0.003
all-0.003
uts-0.003
ges-0.003
 Mike-0.003
Token store
Token ID3650
Feature activation+6.81e+00

Pos logit contributions
 forgot+0.149
iper+0.121
 scar+0.117
 remembered+0.113
ches+0.112
Neg logit contributions
anes-0.174
uts-0.165
that-0.154
all-0.154
 Isn-0.151
Token.
Token ID13
Feature activation+4.32e+00

Pos logit contributions
anes+0.151
uts+0.143
all+0.130
ous+0.123
that+0.120
Neg logit contributions
 forgot-0.198
 scar-0.173
 fort-0.168
 shield-0.165
 trap-0.165
Token She
Token ID1375
Feature activation-1.06e+01

Pos logit contributions
anes+0.112
uts+0.102
qu+0.100
ous+0.097
all+0.096
Neg logit contributions
 forgot-0.102
 fort-0.090
 closet-0.089
 hospital-0.089
iper-0.086
Token was
Token ID373
Feature activation-3.22e+00

Pos logit contributions
 fort+0.208
 trap+0.208
 traps+0.191
 scar+0.189
eh+0.188
Neg logit contributions
all-0.276
uts-0.272
anes-0.263
 Mike-0.252
amb-0.247
Token also
Token ID635
Feature activation-7.91e+00

Pos logit contributions
 forgot+0.072
 fort+0.062
 scar+0.062
 trap+0.059
 shield+0.058
Neg logit contributions
anes-0.090
uts-0.087
all-0.084
that-0.078
ous-0.078
Token happy
Token ID3772
Feature activation+3.03e+00

Pos logit contributions
 forgot+0.162
 scar+0.132
 fort+0.130
 hospital+0.125
 trap+0.124
Neg logit contributions
anes-0.244
uts-0.234
all-0.220
ous-0.211
that-0.208
Token that
Token ID326
Feature activation-2.40e+00

Pos logit contributions
anes+0.082
uts+0.073
all+0.066
qu+0.065
unch+0.064
Neg logit contributions
 forgot-0.079
 scar-0.066
 fort-0.065
 hospital-0.064
 bit-0.063
Token.
Token ID13
Feature activation+4.87e+00

Pos logit contributions
anes+0.080
uts+0.072
all+0.066
ous+0.064
unch+0.063
Neg logit contributions
 forgot-0.092
 fort-0.077
 scar-0.077
 bit-0.076
 hospital-0.075
Token<|endoftext|>
Token ID50256
Feature activation+1.38e+01

Pos logit contributions
anes+0.109
uts+0.100
all+0.089
ach+0.087
qu+0.087
Neg logit contributions
 forgot-0.144
 fort-0.124
 hospital-0.123
 closet-0.122
 learned-0.119
TokenOnce
Token ID7454
Feature activation+6.72e+00

Pos logit contributions
anes+0.393
ous+0.381
uts+0.377
umm+0.363
amb+0.349
Neg logit contributions
 forgot-0.284
 way-0.265
 saw-0.262
 learned-0.259
 room-0.255
Token upon
Token ID2402
Feature activation-7.90e-01

Pos logit contributions
anes+0.197
uts+0.188
all+0.174
ous+0.168
qu+0.166
Neg logit contributions
 forgot-0.149
 scar-0.121
 fort-0.120
 hospital-0.116
 closet-0.115
Token a
Token ID257
Feature activation-3.57e+00

Pos logit contributions
 forgot+0.016
 scar+0.013
 fort+0.013
 scarf+0.013
 shield+0.013
Neg logit contributions
anes-0.025
uts-0.023
ous-0.021
all-0.021
qu-0.020
Token time
Token ID640
Feature activation+4.71e+00

Pos logit contributions
 forgot+0.068
 scar+0.057
 shield+0.055
 hospital+0.055
 closet+0.053
Neg logit contributions
uts-0.104
all-0.103
anes-0.102
unch-0.100
that-0.094
Token,
Token ID11
Feature activation-1.56e-01

Pos logit contributions
anes+0.180
uts+0.164
qu+0.154
all+0.154
ous+0.154
Neg logit contributions
 forgot-0.066
 fort-0.046
 way-0.044
 didn-0.042
 remembered-0.041
Token there
Token ID612
Feature activation+2.48e+00

Pos logit contributions
 forgot+0.004
 scar+0.004
 fort+0.004
 hospital+0.004
 shield+0.004
Neg logit contributions
anes-0.004
uts-0.003
all-0.003
ous-0.003
that-0.003
Token was
Token ID373
Feature activation-2.67e+00

Pos logit contributions
anes+0.087
uts+0.084
all+0.078
ous+0.077
qu+0.075
Neg logit contributions
 forgot-0.041
 fort-0.030
 scar-0.030
 hospital-0.029
 closet-0.029
Token a
Token ID257
Feature activation-4.40e+00

Pos logit contributions
 forgot+0.055
 scar+0.047
 fort+0.046
 hospital+0.045
 scarf+0.044
Neg logit contributions
anes-0.082
uts-0.078
ous-0.072
all-0.071
 Isn-0.068
Token princess
Token ID21752
Feature activation+1.14e-01

Pos logit contributions
 forgot+0.089
iper+0.078
ches+0.072
 scar+0.070
 sank+0.069
Neg logit contributions
anes-0.118
uts-0.112
eddy-0.112
unch-0.110
all-0.109
Token upon
Token ID2402
Feature activation-8.94e-01

Pos logit contributions
anes+0.203
uts+0.194
all+0.180
qu+0.174
ous+0.173
Neg logit contributions
 forgot-0.145
 fort-0.116
 scar-0.115
 hospital-0.111
 closet-0.111
Token a
Token ID257
Feature activation-3.70e+00

Pos logit contributions
 forgot+0.019
 scar+0.016
 fort+0.016
 hospital+0.015
 scarf+0.015
Neg logit contributions
anes-0.027
uts-0.026
all-0.024
ous-0.023
qu-0.022
Token time
Token ID640
Feature activation+4.72e+00

Pos logit contributions
 forgot+0.075
 scar+0.063
 shield+0.061
 hospital+0.061
 closet+0.060
Neg logit contributions
all-0.102
uts-0.102
anes-0.099
unch-0.097
that-0.093
Token,
Token ID11
Feature activation-3.18e-01

Pos logit contributions
anes+0.179
uts+0.165
qu+0.155
good+0.154
eddy+0.154
Neg logit contributions
 forgot-0.065
 fort-0.046
 way-0.043
 didn-0.043
 closet-0.042
Token there
Token ID612
Feature activation+2.52e+00

Pos logit contributions
 forgot+0.009
 scar+0.007
 fort+0.007
 hospital+0.007
 shield+0.007
Neg logit contributions
anes-0.008
uts-0.007
all-0.007
ous-0.006
that-0.006
Token was
Token ID373
Feature activation-2.76e+00

Pos logit contributions
anes+0.089
uts+0.085
all+0.080
ous+0.078
qu+0.076
Neg logit contributions
 forgot-0.042
 fort-0.031
 scar-0.031
 hospital-0.030
 closet-0.029
Token a
Token ID257
Feature activation-4.46e+00

Pos logit contributions
 forgot+0.060
 scar+0.050
 fort+0.049
 hospital+0.048
 shield+0.048
Neg logit contributions
anes-0.082
uts-0.079
all-0.072
ous-0.071
qu-0.068
Token tiny
Token ID7009
Feature activation-1.66e+00

Pos logit contributions
 forgot+0.092
iper+0.082
ches+0.076
 scar+0.073
 slammed+0.072
Neg logit contributions
anes-0.114
eddy-0.111
uts-0.108
unch-0.107
all-0.107
Token seed
Token ID9403
Feature activation+1.51e+00

Pos logit contributions
 forgot+0.039
iper+0.030
 sank+0.028
 scar+0.028
ches+0.028
Neg logit contributions
anes-0.044
eddy-0.042
uts-0.041
ous-0.040
all-0.039
Token.
Token ID13
Feature activation+5.26e+00

Pos logit contributions
anes+0.034
uts+0.032
all+0.029
ous+0.028
that+0.027
Neg logit contributions
 forgot-0.044
 scar-0.038
 fort-0.037
 hospital-0.037
 shield-0.036
Token It
Token ID632
Feature activation-5.09e+00

Pos logit contributions
anes+0.137
uts+0.135
all+0.120
ous+0.119
qu+0.117
Neg logit contributions
 forgot-0.122
 trap-0.108
 fort-0.107
 closet-0.105
 bit-0.104
Token warm
Token ID5814
Feature activation+9.64e-01

Pos logit contributions
 forgot+0.211
 hospital+0.159
 slammed+0.159
iper+0.156
 fort+0.144
Neg logit contributions
uts-0.332
anes-0.318
ous-0.291
all-0.284
unch-0.283
Token breath
Token ID8033
Feature activation+6.78e+00

Pos logit contributions
anes+0.033
uts+0.032
all+0.030
ous+0.030
that+0.029
Neg logit contributions
 forgot-0.016
 scar-0.011
 hospital-0.011
 fort-0.011
iper-0.011
Token.
Token ID13
Feature activation+4.78e+00

Pos logit contributions
anes+0.184
uts+0.168
all+0.157
ous+0.153
unch+0.150
Neg logit contributions
 forgot-0.171
 scar-0.143
 bit-0.142
 fort-0.138
 hospital-0.137
Token
Token ID198
Feature activation+3.99e+00

Pos logit contributions
anes+0.123
all+0.113
uts+0.110
ous+0.110
od+0.109
Neg logit contributions
 forgot-0.120
 scar-0.113
 scarf-0.105
 closet-0.105
 fort-0.103
Token
Token ID198
Feature activation+3.64e+00

Pos logit contributions
anes+0.091
uts+0.083
ous+0.079
umm+0.077
all+0.077
Neg logit contributions
 forgot-0.113
 scar-0.100
 fort-0.099
 hospital-0.098
 scarf-0.096
TokenSnow
Token ID28974
Feature activation+4.39e-01

Pos logit contributions
anes+0.078
uts+0.075
all+0.068
that+0.064
qu+0.063
Neg logit contributions
 forgot-0.107
 fort-0.093
 scar-0.092
 shield-0.091
 hospital-0.091
Tokeny
Token ID88
Feature activation-7.50e+00

Pos logit contributions
anes+0.010
uts+0.009
all+0.008
ous+0.008
eddy+0.007
Neg logit contributions
 forgot-0.013
 scar-0.011
 hospital-0.011
 bit-0.011
 trap-0.011
Token likes
Token ID7832
Feature activation+2.58e+00

Pos logit contributions
 forgot+0.183
 proper+0.167
Days+0.166
eh+0.164
iper+0.163
Neg logit contributions
uts-0.156
anes-0.152
all-0.141
 Share-0.138
amb-0.134
Token Lily
Token ID20037
Feature activation-7.12e+00

Pos logit contributions
anes+0.070
uts+0.066
all+0.061
ous+0.059
that+0.058
Neg logit contributions
 forgot-0.063
 scar-0.055
 fort-0.052
 hospital-0.052
 scarf-0.051
Token and
Token ID290
Feature activation-7.61e+00

Pos logit contributions
 forgot+0.169
 fort+0.148
 scar+0.146
iper+0.144
 shield+0.144
Neg logit contributions
uts-0.181
anes-0.163
all-0.161
that-0.152
unch-0.149
Token Tom
Token ID4186
Feature activation-6.88e+00

Pos logit contributions
 forgot+0.229
iper+0.202
 fort+0.201
 trap+0.199
 closet+0.194
Neg logit contributions
uts-0.136
anes-0.127
all-0.122
 Mike-0.116
that-0.108
Token."
Token ID526
Feature activation+7.74e+00

Pos logit contributions
anes+0.126
uts+0.119
all+0.105
unch+0.100
ous+0.100
Neg logit contributions
 forgot-0.187
 fort-0.164
 scar-0.163
 hospital-0.160
 bit-0.159
Token
Token ID220
Feature activation+8.18e+00

Pos logit contributions
anes+0.187
uts+0.177
ous+0.164
all+0.161
qu+0.152
Neg logit contributions
 forgot-0.202
 scar-0.170
 hospital-0.170
 fort-0.169
 closet-0.167
Token
Token ID198
Feature activation+5.01e+00

Pos logit contributions
anes+0.192
ous+0.183
reating+0.178
umm+0.177
 Isn+0.172
Neg logit contributions
 forgot-0.193
 hospital-0.168
 fort-0.164
 might-0.160
 way-0.159
Token
Token ID198
Feature activation+4.54e+00

Pos logit contributions
anes+0.116
ous+0.109
uts+0.109
umm+0.104
all+0.098
Neg logit contributions
 forgot-0.131
 cast-0.115
 learned-0.114
 scar-0.112
 hospital-0.108
TokenThe
Token ID464
Feature activation-7.40e+00

Pos logit contributions
anes+0.127
uts+0.121
all+0.111
ous+0.111
 Isn+0.104
Neg logit contributions
 forgot-0.107
 scar-0.090
 fort-0.087
 hospital-0.086
 trap-0.085
Token boy
Token ID2933
Feature activation-4.20e+00

Pos logit contributions
 forgot+0.138
iper+0.113
 scar+0.109
 fort+0.108
 shield+0.105
Neg logit contributions
anes-0.234
uts-0.218
all-0.215
that-0.207
ous-0.205
Token smiled
Token ID13541
Feature activation+3.22e+00

Pos logit contributions
 forgot+0.085
 fort+0.085
 hospital+0.084
 trap+0.083
 closet+0.082
Neg logit contributions
all-0.110
uts-0.108
anes-0.107
ous-0.102
od-0.102
Token happily
Token ID18177
Feature activation-5.94e-01

Pos logit contributions
anes+0.110
uts+0.104
all+0.096
qu+0.095
unch+0.094
Neg logit contributions
 forgot-0.057
 scar-0.047
 shield-0.045
 fort-0.044
 head-0.044
Token.
Token ID13
Feature activation+4.51e+00

Pos logit contributions
 forgot+0.012
 scar+0.010
 fort+0.009
 bit+0.009
 head+0.009
Neg logit contributions
anes-0.019
uts-0.018
all-0.016
ous-0.016
unch-0.016
Token "
Token ID366
Feature activation+2.71e+00

Pos logit contributions
anes+0.130
uts+0.124
qu+0.124
od+0.116
all+0.115
Neg logit contributions
 forgot-0.092
 head-0.078
 fort-0.076
 closet-0.076
 way-0.075
TokenThat
Token ID2504
Feature activation-2.79e+00

Pos logit contributions
anes+0.068
uts+0.068
ous+0.059
unch+0.054
ges+0.053
Neg logit contributions
 forgot-0.072
 bit-0.060
 scar-0.060
 remembered-0.060
 cast-0.059
Token were
Token ID547
Feature activation-2.75e+00

Pos logit contributions
 forgot+0.183
 scar+0.151
 fort+0.147
 hospital+0.143
 shield+0.141
Neg logit contributions
anes-0.262
uts-0.251
all-0.234
ous-0.226
that-0.222
Token done
Token ID1760
Feature activation+4.74e+00

Pos logit contributions
 forgot+0.050
 scar+0.040
 fort+0.040
 trap+0.039
 shield+0.039
Neg logit contributions
anes-0.089
uts-0.085
all-0.083
od-0.078
ous-0.077
Token,
Token ID11
Feature activation-1.43e+00

Pos logit contributions
anes+0.177
uts+0.175
amb+0.164
eddy+0.163
 Isn+0.160
Neg logit contributions
 forgot-0.054
 hospital-0.042
 fort-0.040
 accidentally-0.039
 way-0.037
Token Jack
Token ID3619
Feature activation-7.79e+00

Pos logit contributions
 forgot+0.037
 scar+0.030
 fort+0.029
 hospital+0.029
 bit+0.029
Neg logit contributions
anes-0.038
uts-0.037
all-0.033
ous-0.032
unch-0.032
Token said
Token ID531
Feature activation+2.72e+00

Pos logit contributions
 forgot+0.155
 fort+0.154
 scar+0.153
 hospital+0.149
 shield+0.145
Neg logit contributions
uts-0.209
anes-0.208
all-0.195
od-0.184
ous-0.180
Token,
Token ID11
Feature activation-1.62e+00

Pos logit contributions
anes+0.086
uts+0.079
all+0.074
unch+0.073
that+0.071
Neg logit contributions
 forgot-0.057
 shield-0.044
 hospital-0.044
 bit-0.044
 fort-0.044
Token "
Token ID366
Feature activation+3.21e+00

Pos logit contributions
 forgot+0.029
 didn+0.024
 remembered+0.023
 scar+0.023
 bit+0.023
Neg logit contributions
anes-0.054
uts-0.050
umm-0.047
unch-0.047
ach-0.046
TokenLook
Token ID8567
Feature activation+4.78e+00

Pos logit contributions
uts+0.082
anes+0.079
ous+0.073
unch+0.067
all+0.061
Neg logit contributions
 forgot-0.089
 bit-0.073
 learned-0.072
 remembered-0.072
 cast-0.069
Token at
Token ID379
Feature activation+2.91e+00

Pos logit contributions
anes+0.129
uts+0.122
all+0.112
ous+0.108
that+0.105
Neg logit contributions
 forgot-0.119
 scar-0.100
 fort-0.098
 hospital-0.095
 shield-0.095
Token the
Token ID262
Feature activation-5.86e+00

Pos logit contributions
anes+0.090
uts+0.087
all+0.079
ous+0.077
unch+0.076
Neg logit contributions
 forgot-0.061
 scar-0.050
 fort-0.049
 hospital-0.049
 trap-0.047
Token pretty
Token ID2495
Feature activation-3.03e-01

Pos logit contributions
 forgot+0.137
 fort+0.120
 hospital+0.118
 scar+0.117
 way+0.117
Neg logit contributions
uts-0.160
anes-0.159
all-0.143
unch-0.138
ges-0.136
Token got
Token ID1392
Feature activation+3.43e+00

Pos logit contributions
 forgot+0.135
 fort+0.119
 scar+0.117
iper+0.110
 closet+0.109
Neg logit contributions
anes-0.259
uts-0.244
all-0.243
that-0.227
ous-0.227
Token closer
Token ID5699
Feature activation+2.38e+00

Pos logit contributions
anes+0.085
uts+0.081
all+0.074
ous+0.071
that+0.069
Neg logit contributions
 forgot-0.092
 scar-0.079
 fort-0.078
 hospital-0.076
 shield-0.076
Token,
Token ID11
Feature activation-1.30e+00

Pos logit contributions
anes+0.070
uts+0.067
all+0.062
ous+0.060
that+0.059
Neg logit contributions
 forgot-0.052
 scar-0.043
 fort-0.042
 hospital-0.041
 shield-0.041
Token they
Token ID484
Feature activation-7.51e+00

Pos logit contributions
 forgot+0.034
 scar+0.029
 fort+0.029
iper+0.028
 closet+0.028
Neg logit contributions
anes-0.032
uts-0.030
all-0.028
ous-0.027
that-0.026
Token realized
Token ID6939
Feature activation+1.01e+00

Pos logit contributions
 forgot+0.088
 fort+0.088
 hospital+0.081
iper+0.080
 scar+0.080
Neg logit contributions
anes-0.266
uts-0.250
all-0.244
od-0.237
ous-0.236
Token it
Token ID340
Feature activation-3.81e+00

Pos logit contributions
anes+0.026
uts+0.025
all+0.023
ous+0.022
unch+0.022
Neg logit contributions
 forgot-0.026
 scar-0.022
 hospital-0.022
 fort-0.021
 shield-0.021
Token was
Token ID373
Feature activation-2.12e+00

Pos logit contributions
 forgot+0.081
 scar+0.081
 fort+0.073
iper+0.072
 scarf+0.072
Neg logit contributions
anes-0.104
uts-0.102
all-0.095
ous-0.094
that-0.091
Token an
Token ID281
Feature activation-4.62e+00

Pos logit contributions
 forgot+0.052
iper+0.046
 scar+0.045
 fort+0.044
 closet+0.042
Neg logit contributions
all-0.049
anes-0.049
uts-0.049
that-0.044
ous-0.043
Token elephant
Token ID20950
Feature activation+2.71e-01

Pos logit contributions
 forgot+0.090
 scar+0.072
 fort+0.070
 hospital+0.066
 closet+0.066
Neg logit contributions
anes-0.147
uts-0.141
all-0.134
ous-0.129
that-0.126
Token's
Token ID338
Feature activation-6.12e+00

Pos logit contributions
anes+0.006
uts+0.006
all+0.005
ous+0.005
that+0.005
Neg logit contributions
 forgot-0.008
 scar-0.007
 fort-0.007
 hospital-0.007
 trap-0.006
Token trunk
Token ID21427
Feature activation+2.77e+00

Pos logit contributions
 forgot+0.116
 scar+0.095
 fort+0.094
 hospital+0.090
 closet+0.090
Neg logit contributions
anes-0.198
uts-0.190
all-0.178
ous-0.171
that-0.169
Token day
Token ID1110
Feature activation+4.34e+00

Pos logit contributions
anes+0.021
all+0.020
uts+0.020
ous+0.019
that+0.018
Neg logit contributions
 forgot-0.024
 scar-0.021
iper-0.020
 shield-0.019
 closet-0.019
Token,
Token ID11
Feature activation-1.44e+00

Pos logit contributions
anes+0.104
uts+0.101
all+0.093
 Mike+0.088
ges+0.087
Neg logit contributions
 forgot-0.107
 scar-0.100
 fort-0.096
eh-0.092
iper-0.092
Token someone
Token ID2130
Feature activation-2.66e+00

Pos logit contributions
 forgot+0.036
eh+0.033
iper+0.033
 fort+0.031
 scar+0.031
Neg logit contributions
anes-0.032
uts-0.031
all-0.030
 Isn-0.028
that-0.027
Token knocked
Token ID13642
Feature activation+9.70e+00

Pos logit contributions
 fort+0.050
 scar+0.050
 closet+0.046
 scarf+0.046
iper+0.045
Neg logit contributions
uts-0.070
all-0.066
anes-0.066
ous-0.063
od-0.062
Token on
Token ID319
Feature activation+1.10e+00

Pos logit contributions
anes+0.223
uts+0.212
all+0.188
ous+0.183
unch+0.177
Neg logit contributions
 forgot-0.277
 scar-0.241
 fort-0.237
 hospital-0.233
 shield-0.231
Token the
Token ID262
Feature activation-6.51e+00

Pos logit contributions
anes+0.026
uts+0.024
all+0.023
ous+0.021
that+0.021
Neg logit contributions
 forgot-0.031
 scar-0.026
 fort-0.025
 closet-0.024
 trap-0.024
Token door
Token ID3420
Feature activation+4.36e+00

Pos logit contributions
 forgot+0.137
 scar+0.109
iper+0.105
 fort+0.100
 remembered+0.100
Neg logit contributions
anes-0.198
uts-0.188
all-0.175
ous-0.174
 Isn-0.171
Token,
Token ID11
Feature activation-9.93e-01

Pos logit contributions
anes+0.127
uts+0.121
all+0.111
ous+0.107
unch+0.107
Neg logit contributions
 forgot-0.097
 fort-0.080
 scar-0.079
 hospital-0.078
 closet-0.078
Token and
Token ID290
Feature activation-6.07e+00

Pos logit contributions
 forgot+0.022
 fort+0.020
iper+0.019
 scar+0.019
 trap+0.018
Neg logit contributions
anes-0.026
uts-0.026
all-0.025
 Isn-0.024
that-0.023
Token it
Token ID340
Feature activation-4.60e+00

Pos logit contributions
 forgot+0.136
 fort+0.122
 scar+0.120
eh+0.120
iper+0.118
Neg logit contributions
anes-0.154
uts-0.148
all-0.139
ous-0.138
 Isn-0.131
Token was
Token ID373
Feature activation-2.64e+00

Pos logit contributions
iper+0.088
 scar+0.085
 forgot+0.081
 fort+0.076
 shield+0.076
Neg logit contributions
uts-0.122
anes-0.120
all-0.116
ous-0.114
qu-0.107
Token for
Token ID329
Feature activation+9.58e-01

Pos logit contributions
all+0.055
anes+0.054
 along+0.054
uts+0.052
ges+0.052
Neg logit contributions
 forgot-0.114
iper-0.101
 scar-0.096
 closet-0.095
 scarf-0.094
Token his
Token ID465
Feature activation-9.43e+00

Pos logit contributions
uts+0.018
anes+0.018
all+0.017
 Mike+0.017
ges+0.017
Neg logit contributions
 forgot-0.027
 fort-0.023
iper-0.022
 closet-0.022
eh-0.022
Token pet
Token ID4273
Feature activation+4.40e+00

Pos logit contributions
 forgot+0.224
iper+0.183
 shield+0.178
 scar+0.177
 fort+0.174
Neg logit contributions
uts-0.234
anes-0.221
all-0.218
that-0.207
unch-0.202
Token.
Token ID13
Feature activation+4.69e+00

Pos logit contributions
anes+0.115
uts+0.112
all+0.103
ous+0.099
that+0.097
Neg logit contributions
 forgot-0.111
 scar-0.088
 fort-0.084
 remembered-0.082
 shield-0.082
Token But
Token ID887
Feature activation-1.13e+00

Pos logit contributions
anes+0.104
uts+0.098
all+0.091
ous+0.085
that+0.083
Neg logit contributions
 forgot-0.138
 scar-0.121
 fort-0.116
 scarf-0.115
 shield-0.115
Token he
Token ID339
Feature activation-9.31e+00

Pos logit contributions
 forgot+0.031
 fort+0.030
 closet+0.029
iper+0.029
 scar+0.029
Neg logit contributions
all-0.023
 Mike-0.021
uts-0.019
anes-0.019
that-0.018
Token didn
Token ID1422
Feature activation+5.69e+00

Pos logit contributions
 fort+0.224
 scar+0.223
 coats+0.212
 closet+0.206
 scarf+0.199
Neg logit contributions
all-0.230
 Mike-0.190
uts-0.186
od-0.182
 ever-0.180
Token't
Token ID470
Feature activation-3.37e+00

Pos logit contributions
anes+0.117
ges+0.106
 Isn+0.104
uts+0.103
umm+0.103
Neg logit contributions
 forgot-0.145
 hospital-0.144
 needle-0.131
 trap-0.131
 cast-0.131
Token find
Token ID1064
Feature activation+2.48e+00

Pos logit contributions
 forgot+0.081
 scar+0.072
 fort+0.071
 closet+0.068
 scarf+0.067
Neg logit contributions
anes-0.087
uts-0.085
all-0.079
ous-0.075
that-0.073
Token it
Token ID340
Feature activation-3.37e+00

Pos logit contributions
anes+0.052
uts+0.049
all+0.048
ges+0.046
that+0.044
Neg logit contributions
 forgot-0.074
 fort-0.063
 scar-0.061
 closet-0.060
iper-0.059
Token and
Token ID290
Feature activation-5.67e+00

Pos logit contributions
 forgot+0.096
 scar+0.088
 fort+0.082
iper+0.082
 scarf+0.081
Neg logit contributions
uts-0.071
anes-0.070
all-0.065
ous-0.061
that-0.060
Token in
Token ID287
Feature activation-6.92e-01

Pos logit contributions
anes+0.053
uts+0.050
all+0.048
ous+0.045
amb+0.045
Neg logit contributions
 forgot-0.036
iper-0.025
 scar-0.025
 trap-0.025
 fort-0.025
Token the
Token ID262
Feature activation-6.37e+00

Pos logit contributions
 forgot+0.016
 scar+0.013
 fort+0.013
 hospital+0.012
 trap+0.012
Neg logit contributions
anes-0.020
uts-0.019
all-0.018
ous-0.017
that-0.017
Token yard
Token ID12699
Feature activation+6.50e+00

Pos logit contributions
 forgot+0.169
iper+0.125
 sank+0.123
 slammed+0.122
 remembered+0.122
Neg logit contributions
anes-0.175
that-0.144
all-0.142
eddy-0.140
Welcome-0.139
Token,
Token ID11
Feature activation-1.44e+00

Pos logit contributions
anes+0.159
uts+0.150
all+0.142
ous+0.133
that+0.131
Neg logit contributions
 forgot-0.174
 scar-0.151
 fort-0.144
 trap-0.144
 shield-0.142
Token in
Token ID287
Feature activation-1.11e+00

Pos logit contributions
 forgot+0.040
 trap+0.035
 fort+0.035
iper+0.035
 scar+0.033
Neg logit contributions
anes-0.031
all-0.030
uts-0.028
that-0.028
ges-0.026
Token the
Token ID262
Feature activation-6.78e+00

Pos logit contributions
 forgot+0.030
 fort+0.024
iper+0.024
 scar+0.024
 trap+0.024
Neg logit contributions
anes-0.026
all-0.024
uts-0.024
that-0.023
 Isn-0.022
Token street
Token ID4675
Feature activation+4.69e+00

Pos logit contributions
 forgot+0.175
ches+0.139
iper+0.138
 sank+0.136
 slammed+0.132
Neg logit contributions
anes-0.167
Welcome-0.154
that-0.147
 Hello-0.147
Ice-0.144
Token,
Token ID11
Feature activation-1.37e+00

Pos logit contributions
anes+0.122
uts+0.116
all+0.110
ous+0.104
that+0.103
Neg logit contributions
 forgot-0.117
 scar-0.100
 fort-0.098
 trap-0.097
 shield-0.096
Token and
Token ID290
Feature activation-5.96e+00

Pos logit contributions
 forgot+0.037
 trap+0.031
 fort+0.031
 scar+0.031
iper+0.031
Neg logit contributions
anes-0.033
uts-0.031
all-0.031
that-0.029
ous-0.028
Token in
Token ID287
Feature activation-1.34e+00

Pos logit contributions
 forgot+0.145
 fort+0.124
 scar+0.123
 trap+0.121
iper+0.118
Neg logit contributions
anes-0.155
uts-0.148
all-0.139
ous-0.133
that-0.131
Token the
Token ID262
Feature activation-7.03e+00

Pos logit contributions
 forgot+0.036
 fort+0.029
eh+0.029
iper+0.029
 trap+0.028
Neg logit contributions
anes-0.031
uts-0.029
all-0.028
that-0.028
 Isn-0.027
Token He
Token ID679
Feature activation-1.03e+01

Pos logit contributions
anes+0.122
uts+0.118
qu+0.114
ous+0.109
all+0.107
Neg logit contributions
 forgot-0.087
 hospital-0.075
 shield-0.074
 closet-0.073
iper-0.073
Token gathered
Token ID9272
Feature activation+4.51e+00

Pos logit contributions
 fort+0.185
 forgot+0.181
 hospital+0.176
 closet+0.175
 trap+0.170
Neg logit contributions
anes-0.290
all-0.288
uts-0.283
ous-0.266
 Mike-0.263
Token enough
Token ID1576
Feature activation+1.70e-01

Pos logit contributions
anes+0.124
uts+0.119
all+0.108
ous+0.106
unch+0.102
Neg logit contributions
 forgot-0.108
 scar-0.093
 fort-0.090
 shield-0.088
 hospital-0.088
Token pieces
Token ID5207
Feature activation+5.02e+00

Pos logit contributions
anes+0.004
uts+0.004
all+0.004
 Isn+0.004
ous+0.004
Neg logit contributions
 forgot-0.004
 fort-0.004
 hospital-0.003
iper-0.003
 scar-0.003
Token of
Token ID286
Feature activation-7.18e-01

Pos logit contributions
anes+0.095
uts+0.088
all+0.078
ous+0.074
that+0.071
Neg logit contributions
 forgot-0.162
 scar-0.143
 fort-0.141
 shield-0.138
 hospital-0.138
Token the
Token ID262
Feature activation-7.08e+00

Pos logit contributions
 forgot+0.018
 fort+0.014
 hospital+0.014
iper+0.014
 scar+0.014
Neg logit contributions
anes-0.019
uts-0.018
all-0.017
that-0.016
ous-0.016
Token red
Token ID2266
Feature activation+1.57e+00

Pos logit contributions
 forgot+0.158
iper+0.130
 scar+0.124
 slammed+0.124
 hospital+0.123
Neg logit contributions
anes-0.195
uts-0.180
all-0.170
that-0.166
 Isn-0.165
Token ball
Token ID2613
Feature activation+4.53e+00

Pos logit contributions
anes+0.049
uts+0.047
all+0.044
ous+0.043
that+0.042
Neg logit contributions
 forgot-0.032
 scar-0.025
 fort-0.024
 hospital-0.024
 closet-0.023
Token to
Token ID284
Feature activation-2.19e+00

Pos logit contributions
anes+0.126
uts+0.122
all+0.113
ous+0.109
that+0.106
Neg logit contributions
 forgot-0.105
 scar-0.089
 fort-0.085
 hospital-0.085
 closet-0.083
Token bring
Token ID2222
Feature activation+5.54e+00

Pos logit contributions
 forgot+0.048
 scar+0.039
 fort+0.038
 closet+0.037
 hospital+0.036
Neg logit contributions
anes-0.064
uts-0.061
all-0.058
ous-0.056
that-0.055
Token back
Token ID736
Feature activation+1.18e-01

Pos logit contributions
anes+0.146
uts+0.139
all+0.127
ous+0.123
that+0.119
Neg logit contributions
 forgot-0.139
 scar-0.119
 fort-0.116
 shield-0.113
 hospital-0.113
Token you
Token ID345
Feature activation-5.90e+00

Pos logit contributions
anes+0.043
uts+0.041
all+0.038
ous+0.036
that+0.035
Neg logit contributions
 forgot-0.054
 fort-0.047
 scar-0.047
 shield-0.045
 trap-0.045
Token stay
Token ID2652
Feature activation+2.15e+00

Pos logit contributions
 forgot+0.132
 fort+0.114
 scar+0.113
 hospital+0.109
 closet+0.109
Neg logit contributions
anes-0.164
uts-0.159
all-0.150
ous-0.141
unch-0.138
Token out
Token ID503
Feature activation+6.37e-01

Pos logit contributions
anes+0.047
uts+0.046
all+0.046
eddy+0.040
that+0.040
Neg logit contributions
 forgot-0.057
 scar-0.047
iper-0.047
 fort-0.046
 trap-0.046
Token too
Token ID1165
Feature activation+1.06e+00

Pos logit contributions
anes+0.016
uts+0.015
all+0.014
ous+0.013
that+0.013
Neg logit contributions
 forgot-0.017
 scar-0.014
 fort-0.014
 hospital-0.014
 shield-0.014
Token long
Token ID890
Feature activation+1.06e+00

Pos logit contributions
anes+0.030
uts+0.029
all+0.027
ous+0.026
that+0.025
Neg logit contributions
 forgot-0.024
 scar-0.020
 fort-0.020
 hospital-0.019
 shield-0.019
Token you
Token ID345
Feature activation-5.15e+00

Pos logit contributions
anes+0.030
uts+0.028
all+0.025
that+0.025
ous+0.025
Neg logit contributions
 forgot-0.024
 scar-0.020
 fort-0.019
 hospital-0.019
 bit-0.019
Token might
Token ID1244
Feature activation+2.22e+00

Pos logit contributions
 forgot+0.116
 scar+0.097
 fort+0.096
 hospital+0.093
 shield+0.092
Neg logit contributions
anes-0.146
uts-0.140
all-0.130
ous-0.125
that-0.122
Token freeze
Token ID16611
Feature activation+5.97e+00

Pos logit contributions
anes+0.068
uts+0.064
all+0.060
ous+0.058
that+0.057
Neg logit contributions
 forgot-0.047
 scar-0.037
 fort-0.037
 hospital-0.037
 trap-0.036
Token.
Token ID13
Feature activation+5.42e+00

Pos logit contributions
anes+0.146
uts+0.136
all+0.124
ous+0.119
unch+0.117
Neg logit contributions
 forgot-0.161
 scar-0.140
 fort-0.138
 hospital-0.136
 shield-0.134
Token Come
Token ID7911
Feature activation+5.03e+00

Pos logit contributions
anes+0.123
uts+0.116
ous+0.105
od+0.100
qu+0.097
Neg logit contributions
 forgot-0.149
 scar-0.133
 closet-0.131
 hospital-0.131
iper-0.127
Token inside
Token ID2641
Feature activation+4.02e+00

Pos logit contributions
anes+0.132
uts+0.125
ous+0.109
all+0.107
qu+0.106
Neg logit contributions
 forgot-0.129
 scar-0.116
 fort-0.114
 scarf-0.110
 closet-0.110