TOP ACTIVATIONS
MAX = 0.327

't know the rh in oc eros was wild . Can
he would take the av oc ados and scatter them around
reward . It was ch ocol ates and sweets !
quiet compared to the noisy se ag ull s outside .
blanket . He lifts the ostr ich off the fence .
give to the rh in oc eros . Lily walked to
that the bad rh in oc eros was now nice .
there was a rh in oc eros who was always naughty
walked , they saw an ostr ich . " Look !"
air was mild . The ostr ich saw some other animals
am taking the rh in oc eros to a new home
couldn 't see what the ostr ich was hiding , but
giving him too many ch ocol ates ." The king learned
" Welcome home , B oun cy !", they said ,
it to the rh in oc eros . When he tasted
He touched the brightly - colored pet als , and as
was a big rh in oc eros . He was troubled
Once there was a little ot ter who lived near a
need . Even the t ini est creature needs our help
time there was a blue ostr ich that lived in the

MIN ACTIVATIONS
MIN = -0.106

their faces , enjoying every last bite .
, hugged and sang together until sunset .
and they raced and laughed until it was time for Billy
him . They played together until the sun started to set
Milo and together they played until the sun went down .
, they laughed and laughed until the sun set . <|endoftext|>
and over . He sang until he had eaten all the
stopped they all said goodbye until next time ! <|endoftext|> Once
laughed and smiled and played until it was time to go
enthusiastic . Every time Amelia got an answer right her Mom
would hug and say goodbye until the next time . They
They played and laughed together until the sun went down .
. She laughed and played until it was time to go
all of the support they got . They worked hard ,
. They laughed and talked until the sun went down .
They played and laughed together until the monkey tried to take
. They laughed and laughed until they were both very tired
and happy . He played until he was so tired he
to help keep her warm until morning and never regretted it
little boy played and played until it was time to go

INTERVAL 0.219 - 0.327
CONTAINS 0.039%

't know the rh in oc eros was wild . Can

INTERVAL 0.110 - 0.219
CONTAINS 1.676%

without asking . <|endoftext|> Once upon a time , there was
was a little boy named Tim my . Tim my was
you doing ?" Tim and Lily stopped . They
the king laugh . But the king did not like that
toy car !" <|endoftext|> Once upon a time , there was

INTERVAL 0.002 - 0.110
CONTAINS 39.842%

spotted a lady bug . She was very friendly and they
couldn 't believe it . It had become alive !
time , there was a little girl named Lily . She
run away from the black car . They run back to
to sleep with his mom and dad in their big bed

INTERVAL -0.106 - 0.002
CONTAINS 58.444%

sn iffs the box and tries to open it . The
Sam and Lily are playing with a ball . Tom and
. Max started to shoot with the microphone .
out and said , " Sur prise !" The lion laughed
. They wanted to go back to the park . They
Token't
Token ID470
Feature activation+1.54e-01

Pos logit contributions
Listen+0.055
 unbearable+0.053
 crumbling+0.053
Less+0.053
Warning+0.052
Neg logit contributions
ins-0.022
er-0.022
aw-0.021
adow-0.021
cer-0.021
Token know
Token ID760
Feature activation+1.15e-01

Pos logit contributions
cer+0.260
an+0.254
adow+0.249
pit+0.248
ins+0.248
Neg logit contributions
Listen-0.140
Who-0.134
 crumbling-0.126
 unbearable-0.122
 failed-0.120
Token the
Token ID262
Feature activation+1.73e-01

Pos logit contributions
cer+0.162
an+0.160
adow+0.152
pit+0.149
ins+0.148
Neg logit contributions
Listen-0.135
Who-0.134
 crumbling-0.129
 tempting-0.121
 unbearable-0.121
Token rh
Token ID9529
Feature activation+2.14e-01

Pos logit contributions
cer+0.243
adow+0.235
an+0.231
ins+0.230
pit+0.223
Neg logit contributions
Listen-0.206
Who-0.199
 crumbling-0.194
 unbearable-0.187
 tempting-0.183
Tokenin
Token ID259
Feature activation+1.37e-01

Pos logit contributions
adow+0.293
pit+0.283
robe+0.263
 majesty+0.262
ows+0.254
Neg logit contributions
Who-0.232
 failed-0.231
Listen-0.230
 afford-0.229
Less-0.218
Tokenoc
Token ID420
Feature activation+2.30e-01

Pos logit contributions
cer+0.107
adow+0.102
an+0.101
ins+0.095
pit+0.094
Neg logit contributions
Listen-0.248
Who-0.240
 crumbling-0.236
 unbearable-0.230
Less-0.229
Tokeneros
Token ID27498
Feature activation+1.26e-01

Pos logit contributions
adow+0.234
eh+0.199
cer+0.196
pit+0.191
robe+0.187
Neg logit contributions
Listen-0.349
Who-0.347
 crumbling-0.345
 failed-0.341
Less-0.338
Token was
Token ID373
Feature activation+8.15e-02

Pos logit contributions
cer+0.160
adow+0.155
an+0.153
ins+0.146
pit+0.146
Neg logit contributions
Listen-0.170
Who-0.160
 crumbling-0.157
 unbearable-0.152
Less-0.152
Token wild
Token ID4295
Feature activation+1.31e-01

Pos logit contributions
cer+0.123
adow+0.119
an+0.119
ins+0.115
pit+0.115
Neg logit contributions
Listen-0.089
Who-0.085
 crumbling-0.082
 unbearable-0.079
Less-0.078
Token.
Token ID13
Feature activation+7.78e-02

Pos logit contributions
eh+0.176
ins+0.174
an+0.173
cer+0.172
oco+0.170
Neg logit contributions
 crumbling-0.149
 tempting-0.138
 damage-0.138
 confusing-0.137
Who-0.137
Token Can
Token ID1680
Feature activation+7.90e-02

Pos logit contributions
cer+0.116
adow+0.114
an+0.112
pit+0.109
ins+0.108
Neg logit contributions
Listen-0.086
Who-0.080
 crumbling-0.079
 unbearable-0.076
Less-0.075
Token he
Token ID339
Feature activation+1.16e-01

Pos logit contributions
cer+0.252
an+0.249
adow+0.243
ins+0.241
ese+0.237
Neg logit contributions
Listen-0.134
Who-0.129
 crumbling-0.121
Um-0.115
 unbearable-0.115
Token would
Token ID561
Feature activation+8.08e-02

Pos logit contributions
cer+0.179
an+0.173
adow+0.173
ins+0.169
pit+0.167
Neg logit contributions
Listen-0.121
Who-0.115
 crumbling-0.111
 unbearable-0.106
Um-0.105
Token take
Token ID1011
Feature activation+7.34e-02

Pos logit contributions
cer+0.130
an+0.126
adow+0.125
ins+0.123
pit+0.122
Neg logit contributions
Listen-0.080
Who-0.076
 crumbling-0.073
 unbearable-0.070
Less-0.069
Token the
Token ID262
Feature activation+1.40e-01

Pos logit contributions
cer+0.105
adow+0.103
an+0.103
pit+0.100
ins+0.099
Neg logit contributions
Listen-0.086
Who-0.082
 crumbling-0.079
 unbearable-0.076
Less-0.075
Token av
Token ID1196
Feature activation+1.59e-01

Pos logit contributions
cer+0.185
adow+0.178
an+0.177
ins+0.174
pit+0.170
Neg logit contributions
Listen-0.178
Who-0.171
 crumbling-0.168
 unbearable-0.162
Less-0.160
Tokenoc
Token ID420
Feature activation+2.15e-01

Pos logit contributions
robe+0.170
pit+0.166
adow+0.162
cer+0.160
an+0.159
Neg logit contributions
Pick-0.213
Listen-0.213
Less-0.213
Po-0.213
Who-0.206
Tokenados
Token ID22484
Feature activation+6.64e-02

Pos logit contributions
adow+0.272
cer+0.267
robe+0.246
pit+0.245
an+0.241
Neg logit contributions
Listen-0.262
 crumbling-0.259
Who-0.259
Less-0.255
 unbearable-0.249
Token and
Token ID290
Feature activation+5.59e-02

Pos logit contributions
an+0.095
cer+0.095
ins+0.090
adow+0.089
ese+0.089
Neg logit contributions
Listen-0.072
Who-0.068
 crumbling-0.068
 afford-0.066
Less-0.066
Token scatter
Token ID41058
Feature activation+5.28e-02

Pos logit contributions
cer+0.084
adow+0.081
an+0.081
ins+0.079
pit+0.078
Neg logit contributions
Listen-0.062
Who-0.058
 crumbling-0.056
 unbearable-0.054
Less-0.053
Token them
Token ID606
Feature activation+5.21e-02

Pos logit contributions
cer+0.074
adow+0.073
an+0.072
pit+0.069
ins+0.069
Neg logit contributions
Listen-0.063
Who-0.060
 crumbling-0.059
 unbearable-0.057
Less-0.056
Token around
Token ID1088
Feature activation+6.44e-02

Pos logit contributions
cer+0.056
adow+0.054
an+0.053
pit+0.051
ins+0.051
Neg logit contributions
Listen-0.081
Who-0.077
 crumbling-0.075
Um-0.073
Less-0.073
Token reward
Token ID6721
Feature activation+1.46e-01

Pos logit contributions
ins+0.300
adow+0.300
an+0.289
ese+0.284
cer+0.277
Neg logit contributions
 crumbling-0.304
Who-0.302
 damage-0.291
Listen-0.286
 unbearable-0.281
Token.
Token ID13
Feature activation+1.04e-01

Pos logit contributions
an+0.177
cer+0.176
adow+0.167
pit+0.164
ese+0.162
Neg logit contributions
Listen-0.192
Who-0.186
 crumbling-0.178
Less-0.176
 afford-0.176
Token It
Token ID632
Feature activation+8.59e-02

Pos logit contributions
cer+0.168
an+0.165
adow+0.163
ins+0.160
ese+0.157
Neg logit contributions
Listen-0.103
Who-0.099
 crumbling-0.092
Less-0.090
Um-0.090
Token was
Token ID373
Feature activation+8.33e-02

Pos logit contributions
cer+0.116
adow+0.112
an+0.112
ins+0.108
pit+0.108
Neg logit contributions
Listen-0.108
Who-0.102
 crumbling-0.100
 unbearable-0.096
Less-0.096
Token ch
Token ID442
Feature activation+1.09e-01

Pos logit contributions
cer+0.106
adow+0.104
an+0.102
pit+0.099
ins+0.098
Neg logit contributions
Listen-0.110
Who-0.105
 crumbling-0.100
Um-0.098
Less-0.097
Tokenocol
Token ID4668
Feature activation+2.14e-01

Pos logit contributions
cer+0.106
an+0.100
adow+0.098
pit+0.098
ese+0.094
Neg logit contributions
Listen-0.174
 crumbling-0.172
 unbearable-0.167
Who-0.165
 tempting-0.162
Tokenates
Token ID689
Feature activation+8.53e-02

Pos logit contributions
an+0.159
cer+0.158
robe+0.143
adow+0.138
 player+0.138
Neg logit contributions
Listen-0.358
Less-0.348
 crumbling-0.345
Who-0.343
Po-0.340
Token and
Token ID290
Feature activation+6.82e-02

Pos logit contributions
an+0.100
cer+0.098
ins+0.095
adow+0.092
ese+0.091
Neg logit contributions
 crumbling-0.110
Listen-0.108
 unbearable-0.106
 tempting-0.104
Who-0.104
Token sweets
Token ID42402
Feature activation+7.12e-02

Pos logit contributions
cer+0.102
adow+0.100
an+0.099
ins+0.096
pit+0.095
Neg logit contributions
Listen-0.075
Who-0.072
 crumbling-0.069
 unbearable-0.066
Less-0.066
Token!
Token ID0
Feature activation+6.39e-02

Pos logit contributions
an+0.078
cer+0.078
adow+0.076
ins+0.075
ese+0.072
Neg logit contributions
Listen-0.101
Who-0.099
 crumbling-0.098
 unbearable-0.095
 failed-0.094
Token\n
Token ID198
Feature activation+5.67e-02

Pos logit contributions
cer+0.097
an+0.095
adow+0.094
ins+0.092
pit+0.091
Neg logit contributions
Listen-0.070
Who-0.067
 crumbling-0.063
Less-0.062
Um-0.061
Token quiet
Token ID5897
Feature activation+2.17e-01

Pos logit contributions
Listen+0.022
Who+0.021
 crumbling+0.020
 unbearable+0.020
Less+0.020
Neg logit contributions
cer-0.016
adow-0.015
an-0.015
ins-0.014
pit-0.014
Token compared
Token ID3688
Feature activation+1.62e-01

Pos logit contributions
cer+0.296
oco+0.291
ins+0.284
an+0.284
ase+0.281
Neg logit contributions
Who-0.229
Listen-0.226
 notice-0.224
 damage-0.220
 sigh-0.219
Token to
Token ID284
Feature activation+8.63e-02

Pos logit contributions
an+0.253
cer+0.248
adow+0.247
ese+0.241
ase+0.239
Neg logit contributions
Listen-0.165
Who-0.159
 crumbling-0.159
 unbearable-0.149
 damage-0.147
Token the
Token ID262
Feature activation+1.67e-01

Pos logit contributions
cer+0.123
an+0.120
adow+0.119
ins+0.116
pit+0.115
Neg logit contributions
Listen-0.101
Who-0.097
 crumbling-0.094
 unbearable-0.090
Less-0.089
Token noisy
Token ID31210
Feature activation+1.48e-01

Pos logit contributions
cer+0.226
adow+0.220
an+0.216
ins+0.215
pit+0.207
Neg logit contributions
Listen-0.206
Who-0.201
 crumbling-0.198
 unbearable-0.190
 tempting-0.187
Token se
Token ID384
Feature activation+2.14e-01

Pos logit contributions
adow+0.178
cer+0.175
ins+0.170
an+0.167
ese+0.162
Neg logit contributions
Listen-0.196
Who-0.194
 crumbling-0.190
 damage-0.186
 unbearable-0.181
Tokenag
Token ID363
Feature activation+1.48e-01

Pos logit contributions
cer+0.242
pit+0.240
robe+0.212
 knights+0.190
 room+0.189
Neg logit contributions
Listen-0.292
rel-0.276
 unbearable-0.273
 persever-0.272
 afford-0.270
Tokenull
Token ID724
Feature activation+9.75e-02

Pos logit contributions
oco+0.198
pit+0.192
cer+0.188
ese+0.187
 majesty+0.179
Neg logit contributions
Listen-0.160
 crumbling-0.151
 single-0.149
 unbearable-0.148
None-0.146
Tokens
Token ID82
Feature activation+9.20e-02

Pos logit contributions
cer+0.118
pit+0.111
adow+0.108
an+0.108
ins+0.107
Neg logit contributions
Listen-0.126
 crumbling-0.124
Who-0.121
 unbearable-0.121
 failed-0.119
Token outside
Token ID2354
Feature activation+9.04e-02

Pos logit contributions
cer+0.101
an+0.097
adow+0.095
ins+0.094
pit+0.092
Neg logit contributions
Listen-0.135
 crumbling-0.131
Who-0.129
 unbearable-0.126
 failed-0.125
Token.
Token ID13
Feature activation+7.99e-02

Pos logit contributions
cer+0.096
an+0.094
adow+0.092
ins+0.089
ese+0.087
Neg logit contributions
Listen-0.136
Who-0.132
 crumbling-0.130
 unbearable-0.126
Less-0.123
Token blanket
Token ID18447
Feature activation+1.42e-01

Pos logit contributions
cer+0.283
adow+0.279
ins+0.278
an+0.259
ese+0.251
Neg logit contributions
Who-0.269
 crumbling-0.265
Listen-0.264
 damage-0.260
 tempting-0.253
Token.
Token ID13
Feature activation+1.02e-01

Pos logit contributions
cer+0.159
an+0.152
ins+0.151
adow+0.148
ese+0.145
Neg logit contributions
Listen-0.199
 crumbling-0.194
Who-0.190
 unbearable-0.188
 failed-0.184
Token He
Token ID679
Feature activation+7.81e-02

Pos logit contributions
cer+0.166
an+0.162
adow+0.159
ins+0.156
ese+0.154
Neg logit contributions
Listen-0.103
Who-0.098
 crumbling-0.092
Um-0.089
Less-0.089
Token lifts
Token ID27103
Feature activation+6.47e-02

Pos logit contributions
cer+0.127
an+0.122
adow+0.122
ins+0.119
pit+0.119
Neg logit contributions
Listen-0.076
Who-0.072
 crumbling-0.070
Um-0.066
 unbearable-0.066
Token the
Token ID262
Feature activation+1.29e-01

Pos logit contributions
adow+0.118
cer+0.115
pit+0.113
ins+0.112
ese+0.110
Neg logit contributions
Who-0.051
Listen-0.051
 crumbling-0.048
 unbearable-0.046
 damage-0.044
Token ostr
Token ID42387
Feature activation+2.12e-01

Pos logit contributions
adow+0.162
cer+0.161
ins+0.160
pit+0.148
an+0.147
Neg logit contributions
Who-0.167
 crumbling-0.167
Listen-0.166
 damage-0.159
 unbearable-0.158
Tokenich
Token ID488
Feature activation+8.70e-02

Pos logit contributions
cer+0.233
robe+0.231
room+0.220
adow+0.216
 ways+0.215
Neg logit contributions
 crumbling-0.276
Um-0.262
Who-0.261
 failed-0.257
 unbearable-0.253
Token off
Token ID572
Feature activation+4.78e-02

Pos logit contributions
cer+0.100
ins+0.093
adow+0.090
an+0.090
pit+0.089
Neg logit contributions
Listen-0.120
Who-0.120
 crumbling-0.119
 unbearable-0.115
 afford-0.111
Token the
Token ID262
Feature activation+1.14e-01

Pos logit contributions
cer+0.067
adow+0.066
an+0.065
ins+0.063
pit+0.062
Neg logit contributions
Listen-0.057
Who-0.055
 crumbling-0.054
 unbearable-0.052
Less-0.051
Token fence
Token ID13990
Feature activation+6.06e-02

Pos logit contributions
cer+0.141
ins+0.140
adow+0.135
pit+0.125
an+0.123
Neg logit contributions
Listen-0.154
Who-0.152
 crumbling-0.152
 unbearable-0.144
 damage-0.142
Token.
Token ID13
Feature activation+5.34e-02

Pos logit contributions
cer+0.069
ins+0.066
an+0.065
adow+0.062
ese+0.062
Neg logit contributions
 crumbling-0.086
Listen-0.084
 unbearable-0.081
Who-0.081
 failed-0.079
Token give
Token ID1577
Feature activation+8.70e-02

Pos logit contributions
cer+0.181
ins+0.177
an+0.175
adow+0.173
pit+0.172
Neg logit contributions
Who-0.088
Listen-0.086
 crumbling-0.081
 damage-0.081
 unbearable-0.079
Token to
Token ID284
Feature activation+6.83e-02

Pos logit contributions
cer+0.139
adow+0.138
pit+0.138
an+0.134
ins+0.128
Neg logit contributions
Listen-0.087
Who-0.082
 crumbling-0.078
 unbearable-0.075
 tempting-0.075
Token the
Token ID262
Feature activation+1.33e-01

Pos logit contributions
cer+0.123
an+0.117
pit+0.114
ins+0.112
adow+0.112
Neg logit contributions
Who-0.056
 damage-0.054
Listen-0.054
 crumbling-0.053
Less-0.053
Token rh
Token ID9529
Feature activation+1.83e-01

Pos logit contributions
cer+0.166
adow+0.158
an+0.155
ins+0.152
pit+0.149
Neg logit contributions
Listen-0.180
Who-0.176
 crumbling-0.172
 unbearable-0.166
Less-0.164
Tokenin
Token ID259
Feature activation+1.06e-01

Pos logit contributions
adow+0.268
pit+0.267
robe+0.261
 majesty+0.241
 ways+0.238
Neg logit contributions
 afford-0.174
Who-0.171
Listen-0.170
Less-0.166
Pay-0.163
Tokenoc
Token ID420
Feature activation+2.11e-01

Pos logit contributions
cer+0.098
pit+0.090
ins+0.089
adow+0.088
an+0.086
Neg logit contributions
Listen-0.172
Who-0.171
Less-0.165
 crumbling-0.162
Um-0.161
Tokeneros
Token ID27498
Feature activation+9.51e-02

Pos logit contributions
adow+0.216
cer+0.187
eh+0.182
robe+0.178
pit+0.177
Neg logit contributions
Listen-0.320
 crumbling-0.316
Who-0.315
Less-0.309
 failed-0.307
Token.
Token ID13
Feature activation+7.24e-02

Pos logit contributions
cer+0.123
an+0.116
pit+0.114
adow+0.112
ins+0.112
Neg logit contributions
Listen-0.120
Who-0.116
 crumbling-0.112
Less-0.111
 unbearable-0.110
Token Lily
Token ID20037
Feature activation+1.05e-01

Pos logit contributions
cer+0.111
adow+0.108
an+0.108
ins+0.105
pit+0.104
Neg logit contributions
Listen-0.077
Who-0.073
 crumbling-0.071
 unbearable-0.068
Less-0.067
Token walked
Token ID6807
Feature activation+5.72e-02

Pos logit contributions
cer+0.155
adow+0.150
an+0.148
ins+0.146
over+0.145
Neg logit contributions
Listen-0.109
Who-0.104
 failed-0.104
 tempting-0.099
 crumbling-0.098
Token to
Token ID284
Feature activation+3.12e-02

Pos logit contributions
cer+0.086
adow+0.084
ins+0.084
an+0.083
ese+0.081
Neg logit contributions
Listen-0.065
Who-0.060
 crumbling-0.057
 unbearable-0.055
 tempting-0.054
Token that
Token ID326
Feature activation+7.93e-02

Pos logit contributions
an+0.173
cer+0.170
adow+0.165
eh+0.163
ese+0.163
Neg logit contributions
Listen-0.143
Who-0.139
 crumbling-0.133
 afford-0.127
 tempting-0.125
Token the
Token ID262
Feature activation+1.41e-01

Pos logit contributions
cer+0.127
adow+0.123
an+0.121
ins+0.118
pit+0.118
Neg logit contributions
Listen-0.080
Who-0.075
 crumbling-0.073
 unbearable-0.070
Um-0.069
Token bad
Token ID2089
Feature activation+1.11e-01

Pos logit contributions
cer+0.175
adow+0.171
an+0.166
ins+0.164
pit+0.160
Neg logit contributions
Listen-0.191
Who-0.186
 crumbling-0.181
 unbearable-0.174
Less-0.173
Token rh
Token ID9529
Feature activation+1.92e-01

Pos logit contributions
cer+0.140
adow+0.137
an+0.135
ins+0.132
pit+0.129
Neg logit contributions
Listen-0.149
Who-0.143
 crumbling-0.139
 unbearable-0.134
Less-0.133
Tokenin
Token ID259
Feature activation+1.13e-01

Pos logit contributions
adow+0.243
pit+0.236
cer+0.216
an+0.215
robe+0.214
Neg logit contributions
Listen-0.232
Who-0.230
 failed-0.221
Less-0.217
 crumbling-0.215
Tokenoc
Token ID420
Feature activation+2.11e-01

Pos logit contributions
cer+0.100
adow+0.096
an+0.093
ins+0.092
pit+0.091
Neg logit contributions
Listen-0.192
Who-0.187
 crumbling-0.182
Less-0.178
 unbearable-0.178
Tokeneros
Token ID27498
Feature activation+1.02e-01

Pos logit contributions
adow+0.206
cer+0.178
eh+0.176
pit+0.172
robe+0.160
Neg logit contributions
Listen-0.329
Who-0.322
 crumbling-0.322
 failed-0.318
Less-0.315
Token was
Token ID373
Feature activation+5.10e-02

Pos logit contributions
cer+0.144
adow+0.140
an+0.140
ins+0.133
pit+0.132
Neg logit contributions
Listen-0.124
Who-0.117
 crumbling-0.113
Less-0.109
 unbearable-0.109
Token now
Token ID783
Feature activation+5.12e-02

Pos logit contributions
cer+0.070
adow+0.069
an+0.067
ins+0.065
pit+0.064
Neg logit contributions
Listen-0.063
Who-0.060
Um-0.057
 crumbling-0.057
Less-0.055
Token nice
Token ID3621
Feature activation+8.84e-02

Pos logit contributions
cer+0.064
adow+0.062
an+0.058
er+0.057
pit+0.056
Neg logit contributions
Listen-0.072
Who-0.068
Um-0.068
Warning-0.064
Less-0.063
Token.
Token ID13
Feature activation+4.92e-02

Pos logit contributions
an+0.098
cer+0.098
adow+0.095
ins+0.094
ese+0.092
Neg logit contributions
Listen-0.125
Who-0.122
 crumbling-0.120
 unbearable-0.116
 failed-0.116
Token there
Token ID612
Feature activation+9.22e-02

Pos logit contributions
an+0.117
cer+0.116
ese+0.110
ins+0.108
adow+0.107
Neg logit contributions
Listen-0.121
Who-0.118
 crumbling-0.113
 failed-0.109
 Who-0.108
Token was
Token ID373
Feature activation+9.24e-02

Pos logit contributions
cer+0.123
an+0.121
adow+0.118
ins+0.117
ese+0.114
Neg logit contributions
Listen-0.115
Who-0.110
 crumbling-0.106
 unbearable-0.102
Less-0.102
Token a
Token ID257
Feature activation+9.59e-02

Pos logit contributions
an+0.160
cer+0.154
ins+0.152
adow+0.148
ese+0.145
Neg logit contributions
Listen-0.094
Who-0.089
 crumbling-0.085
 unbearable-0.082
Um-0.080
Token rh
Token ID9529
Feature activation+1.83e-01

Pos logit contributions
cer+0.093
an+0.089
adow+0.085
pit+0.084
ins+0.079
Neg logit contributions
Listen-0.159
Who-0.146
 crumbling-0.145
Less-0.145
Warning-0.144
Tokenin
Token ID259
Feature activation+1.13e-01

Pos logit contributions
 majesty+0.280
robe+0.272
adow+0.270
pit+0.270
 pads+0.269
Neg logit contributions
Who-0.163
Listen-0.161
 crumbling-0.157
 failed-0.151
 single-0.150
Tokenoc
Token ID420
Feature activation+2.10e-01

Pos logit contributions
cer+0.102
adow+0.102
pit+0.099
ins+0.099
ese+0.092
Neg logit contributions
Who-0.180
Listen-0.177
 crumbling-0.170
Less-0.168
Um-0.168
Tokeneros
Token ID27498
Feature activation+1.22e-01

Pos logit contributions
adow+0.242
eh+0.213
robe+0.213
cer+0.209
as+0.202
Neg logit contributions
 crumbling-0.285
 failed-0.274
 burg-0.272
Listen-0.271
Who-0.269
Token who
Token ID508
Feature activation+7.45e-02

Pos logit contributions
cer+0.126
pit+0.121
an+0.121
ins+0.121
adow+0.117
Neg logit contributions
Who-0.176
Listen-0.173
 crumbling-0.170
 afford-0.165
 failed-0.165
Token was
Token ID373
Feature activation+6.95e-02

Pos logit contributions
cer+0.093
adow+0.089
an+0.088
ie+0.082
ins+0.082
Neg logit contributions
Listen-0.105
Who-0.098
 crumbling-0.096
Less-0.095
 unbearable-0.094
Token always
Token ID1464
Feature activation+7.60e-02

Pos logit contributions
cer+0.109
an+0.105
ins+0.104
adow+0.104
pit+0.103
Neg logit contributions
 crumbling-0.070
Listen-0.069
 unbearable-0.066
Who-0.066
 tempting-0.065
Token naughty
Token ID45128
Feature activation+8.43e-02

Pos logit contributions
an+0.111
cer+0.110
adow+0.108
ins+0.107
pit+0.105
Neg logit contributions
Listen-0.082
 crumbling-0.081
Who-0.080
 unbearable-0.077
 tempting-0.076
Token walked
Token ID6807
Feature activation+8.86e-02

Pos logit contributions
cer+0.235
an+0.228
adow+0.227
ins+0.222
pit+0.221
Neg logit contributions
Listen-0.145
Who-0.136
 crumbling-0.131
 unbearable-0.126
Less-0.124
Token,
Token ID11
Feature activation+8.25e-02

Pos logit contributions
cer+0.117
an+0.114
ins+0.114
adow+0.113
ese+0.110
Neg logit contributions
Listen-0.113
Who-0.106
 crumbling-0.104
 unbearable-0.099
Less-0.099
Token they
Token ID484
Feature activation+6.74e-02

Pos logit contributions
cer+0.116
adow+0.113
an+0.111
pit+0.108
ins+0.108
Neg logit contributions
Listen-0.098
Who-0.093
 crumbling-0.091
Less-0.088
 unbearable-0.088
Token saw
Token ID2497
Feature activation+8.11e-02

Pos logit contributions
cer+0.101
adow+0.099
an+0.097
pit+0.095
ins+0.094
Neg logit contributions
Listen-0.075
Who-0.070
 crumbling-0.068
 unbearable-0.066
Less-0.066
Token an
Token ID281
Feature activation+1.07e-01

Pos logit contributions
cer+0.125
an+0.123
adow+0.122
ins+0.119
pit+0.118
Neg logit contributions
Listen-0.088
Who-0.084
 crumbling-0.083
 unbearable-0.077
 tempting-0.075
Token ostr
Token ID42387
Feature activation+2.08e-01

Pos logit contributions
cer+0.134
an+0.132
adow+0.131
pit+0.124
ins+0.123
Neg logit contributions
Listen-0.148
Who-0.140
 crumbling-0.135
Less-0.132
Um-0.132
Tokenich
Token ID488
Feature activation+9.40e-02

Pos logit contributions
robe+0.221
cer+0.220
adow+0.202
ins+0.201
una+0.201
Neg logit contributions
 crumbling-0.293
Um-0.278
Who-0.278
 failed-0.275
 unbearable-0.271
Token.
Token ID13
Feature activation+5.12e-02

Pos logit contributions
cer+0.106
an+0.101
adow+0.101
ins+0.099
pit+0.096
Neg logit contributions
Listen-0.136
Who-0.133
 crumbling-0.130
 unbearable-0.125
Less-0.124
Token "
Token ID366
Feature activation+4.56e-02

Pos logit contributions
cer+0.072
adow+0.071
an+0.069
pit+0.068
ins+0.067
Neg logit contributions
Listen-0.059
 crumbling-0.055
 unbearable-0.054
Who-0.054
 failed-0.053
TokenLook
Token ID8567
Feature activation+7.12e-02

Pos logit contributions
cer+0.073
ese+0.068
an+0.067
aw+0.066
room+0.064
Neg logit contributions
 failed-0.046
 tempting-0.045
 confusing-0.043
 crumbling-0.043
 unbearable-0.042
Token!"
Token ID2474
Feature activation+4.47e-02

Pos logit contributions
cer+0.091
ins+0.085
adow+0.085
an+0.084
pit+0.083
Neg logit contributions
Who-0.089
Listen-0.086
 crumbling-0.083
 Who-0.081
 unbearable-0.081
Token air
Token ID1633
Feature activation+1.21e-01

Pos logit contributions
adow+0.198
cer+0.197
ins+0.192
an+0.183
ase+0.179
Neg logit contributions
Who-0.201
Listen-0.197
 crumbling-0.195
 unbearable-0.184
 damage-0.183
Token was
Token ID373
Feature activation+8.86e-02

Pos logit contributions
cer+0.166
adow+0.158
an+0.155
ie+0.153
pit+0.152
Neg logit contributions
Listen-0.154
Who-0.146
Um-0.143
Less-0.141
 crumbling-0.138
Token mild
Token ID11607
Feature activation+1.10e-01

Pos logit contributions
cer+0.138
adow+0.136
an+0.133
pit+0.131
ins+0.129
Neg logit contributions
Listen-0.093
Who-0.088
Um-0.081
 crumbling-0.081
Less-0.081
Token.
Token ID13
Feature activation+7.20e-02

Pos logit contributions
cer+0.112
an+0.108
adow+0.106
ins+0.102
pit+0.100
Neg logit contributions
Listen-0.173
Who-0.167
 crumbling-0.166
 unbearable-0.161
 tempting-0.157
Token The
Token ID383
Feature activation+1.12e-01

Pos logit contributions
cer+0.116
an+0.114
adow+0.111
ins+0.109
ese+0.108
Neg logit contributions
Listen-0.073
Who-0.070
 crumbling-0.066
Um-0.064
Less-0.063
Token ostr
Token ID42387
Feature activation+2.07e-01

Pos logit contributions
cer+0.129
adow+0.126
ins+0.121
an+0.119
pit+0.116
Neg logit contributions
Listen-0.161
Who-0.160
 crumbling-0.155
 unbearable-0.149
Less-0.146
Tokenich
Token ID488
Feature activation+9.45e-02

Pos logit contributions
cer+0.200
adow+0.175
robe+0.175
room+0.168
una+0.164
Neg logit contributions
 crumbling-0.318
Who-0.313
Listen-0.310
Um-0.307
 failed-0.303
Token saw
Token ID2497
Feature activation+6.47e-02

Pos logit contributions
cer+0.135
adow+0.132
an+0.131
ins+0.126
pit+0.126
Neg logit contributions
Listen-0.111
Who-0.105
 crumbling-0.102
Less-0.099
 unbearable-0.098
Token some
Token ID617
Feature activation+7.02e-02

Pos logit contributions
cer+0.097
adow+0.095
an+0.093
pit+0.090
ins+0.088
Neg logit contributions
Listen-0.073
Who-0.072
 crumbling-0.068
 unbearable-0.065
 tempting-0.064
Token other
Token ID584
Feature activation+7.36e-02

Pos logit contributions
cer+0.089
adow+0.086
an+0.086
ins+0.083
pit+0.082
Neg logit contributions
Listen-0.094
Who-0.089
 crumbling-0.087
Less-0.085
Um-0.085
Token animals
Token ID4695
Feature activation+6.82e-02

Pos logit contributions
cer+0.064
adow+0.059
an+0.058
pit+0.056
ins+0.056
Neg logit contributions
Listen-0.127
Who-0.124
 crumbling-0.122
 unbearable-0.118
 tempting-0.117
Token am
Token ID716
Feature activation+7.92e-02

Pos logit contributions
adow+0.165
cer+0.164
an+0.156
ins+0.149
ie+0.148
Neg logit contributions
Listen-0.146
 crumbling-0.139
Who-0.138
Less-0.131
 unbearable-0.129
Token taking
Token ID2263
Feature activation+7.42e-02

Pos logit contributions
cer+0.113
an+0.109
adow+0.109
ins+0.105
pit+0.105
Neg logit contributions
Listen-0.094
Who-0.089
 crumbling-0.087
 unbearable-0.084
Less-0.083
Token the
Token ID262
Feature activation+1.59e-01

Pos logit contributions
cer+0.103
pit+0.101
adow+0.101
an+0.100
ins+0.096
Neg logit contributions
Listen-0.091
Who-0.086
 crumbling-0.082
 unbearable-0.080
Less-0.080
Token rh
Token ID9529
Feature activation+1.88e-01

Pos logit contributions
cer+0.212
adow+0.206
an+0.202
ins+0.201
pit+0.196
Neg logit contributions
Listen-0.199
Who-0.191
 crumbling-0.188
 unbearable-0.181
Less-0.178
Tokenin
Token ID259
Feature activation+1.17e-01

Pos logit contributions
pit+0.280
adow+0.273
robe+0.270
 majesty+0.252
ows+0.250
Neg logit contributions
 afford-0.177
 single-0.169
Less-0.169
 failed-0.169
Who-0.168
Tokenoc
Token ID420
Feature activation+2.06e-01

Pos logit contributions
ins+0.120
pit+0.120
cer+0.118
adow+0.114
an+0.103
Neg logit contributions
Who-0.170
Listen-0.168
Less-0.165
 afford-0.164
 single-0.162
Tokeneros
Token ID27498
Feature activation+1.06e-01

Pos logit contributions
adow+0.209
cer+0.183
eh+0.183
robe+0.181
pit+0.180
Neg logit contributions
 crumbling-0.305
Listen-0.301
 failed-0.296
Who-0.296
Less-0.295
Token to
Token ID284
Feature activation+4.48e-02

Pos logit contributions
cer+0.139
pit+0.136
ins+0.134
an+0.133
adow+0.128
Neg logit contributions
Listen-0.122
Who-0.119
 afford-0.119
 failed-0.119
 crumbling-0.115
Token a
Token ID257
Feature activation+7.82e-02

Pos logit contributions
cer+0.063
adow+0.061
an+0.060
ins+0.058
pit+0.058
Neg logit contributions
Listen-0.054
Who-0.051
 crumbling-0.050
 unbearable-0.048
Less-0.048
Token new
Token ID649
Feature activation+1.29e-01

Pos logit contributions
cer+0.111
adow+0.108
an+0.108
ins+0.104
pit+0.104
Neg logit contributions
Listen-0.092
Who-0.087
 crumbling-0.085
 unbearable-0.082
Less-0.081
Token home
Token ID1363
Feature activation+6.03e-02

Pos logit contributions
ins+0.173
adow+0.160
cer+0.153
pit+0.149
as+0.149
Neg logit contributions
Who-0.156
 crumbling-0.155
Listen-0.152
 afford-0.150
 single-0.148
Token couldn
Token ID3521
Feature activation+7.97e-02

Pos logit contributions
cer+0.194
adow+0.186
an+0.184
ins+0.183
pit+0.182
Neg logit contributions
Listen-0.166
Who-0.155
 crumbling-0.154
 failed-0.153
 unbearable-0.151
Token't
Token ID470
Feature activation+8.01e-02

Pos logit contributions
cer+0.137
adow+0.135
an+0.134
pit+0.128
ins+0.128
Neg logit contributions
Listen-0.066
Who-0.064
 crumbling-0.060
 unbearable-0.057
Um-0.056
Token see
Token ID766
Feature activation+7.14e-02

Pos logit contributions
adow+0.111
an+0.108
cer+0.108
ie+0.104
ese+0.097
Neg logit contributions
Listen-0.101
 crumbling-0.092
 tempting-0.092
Who-0.091
Um-0.088
Token what
Token ID644
Feature activation+5.56e-02

Pos logit contributions
adow+0.122
an+0.122
cer+0.121
ins+0.121
ese+0.119
Neg logit contributions
Who-0.064
 crumbling-0.059
Listen-0.057
 damage-0.055
 Who-0.055
Token the
Token ID262
Feature activation+1.23e-01

Pos logit contributions
cer+0.084
adow+0.081
an+0.079
ins+0.077
ie+0.077
Neg logit contributions
Listen-0.060
Who-0.056
 crumbling-0.055
Less-0.054
 unbearable-0.054
Token ostr
Token ID42387
Feature activation+2.06e-01

Pos logit contributions
cer+0.154
adow+0.149
an+0.148
ins+0.143
pit+0.142
Neg logit contributions
Listen-0.164
Who-0.158
 crumbling-0.154
 unbearable-0.149
Less-0.147
Tokenich
Token ID488
Feature activation+8.82e-02

Pos logit contributions
cer+0.227
robe+0.219
room+0.209
una+0.199
adow+0.198
Neg logit contributions
 crumbling-0.278
Who-0.274
Um-0.273
 failed-0.266
Po-0.258
Token was
Token ID373
Feature activation+6.76e-02

Pos logit contributions
cer+0.111
adow+0.110
an+0.107
ie+0.104
pit+0.102
Neg logit contributions
Listen-0.121
Who-0.111
Less-0.110
 crumbling-0.110
Um-0.108
Token hiding
Token ID11816
Feature activation+1.01e-01

Pos logit contributions
cer+0.095
an+0.090
adow+0.090
ins+0.089
pit+0.087
Neg logit contributions
Listen-0.081
Who-0.076
 crumbling-0.076
 unbearable-0.073
 tempting-0.072
Token,
Token ID11
Feature activation+4.43e-02

Pos logit contributions
ese+0.128
ins+0.127
an+0.124
adow+0.122
eh+0.121
Neg logit contributions
 crumbling-0.124
Who-0.121
Listen-0.117
 unbearable-0.116
 afford-0.113
Token but
Token ID475
Feature activation+3.83e-02

Pos logit contributions
cer+0.055
adow+0.053
an+0.052
ins+0.050
pit+0.050
Neg logit contributions
Listen-0.061
Who-0.058
 crumbling-0.057
 unbearable-0.055
Less-0.055
Token giving
Token ID3501
Feature activation+7.00e-02

Pos logit contributions
cer+0.120
adow+0.117
an+0.115
ins+0.112
pit+0.112
Neg logit contributions
Listen-0.102
Who-0.098
 crumbling-0.095
 unbearable-0.092
Less-0.092
Token him
Token ID683
Feature activation+5.15e-02

Pos logit contributions
pit+0.091
adow+0.087
cer+0.085
an+0.078
ase+0.078
Neg logit contributions
Listen-0.095
Who-0.090
 tempting-0.088
Po-0.085
 damage-0.084
Token too
Token ID1165
Feature activation+5.16e-02

Pos logit contributions
cer+0.076
pit+0.072
adow+0.071
an+0.070
ins+0.070
Neg logit contributions
Listen-0.056
Who-0.053
 unbearable-0.053
 crumbling-0.052
Less-0.052
Token many
Token ID867
Feature activation+7.79e-02

Pos logit contributions
adow+0.081
cer+0.078
an+0.077
ins+0.076
ie+0.075
Neg logit contributions
Who-0.054
Listen-0.053
 crumbling-0.047
Um-0.047
Less-0.046
Token ch
Token ID442
Feature activation+9.42e-02

Pos logit contributions
cer+0.097
an+0.090
pit+0.089
adow+0.088
ese+0.084
Neg logit contributions
Listen-0.105
Who-0.103
 crumbling-0.101
 tempting-0.100
 damage-0.098
Tokenocol
Token ID4668
Feature activation+2.05e-01

Pos logit contributions
cer+0.114
pit+0.110
an+0.103
robe+0.102
adow+0.099
Neg logit contributions
Listen-0.119
 crumbling-0.119
 unbearable-0.117
 failed-0.114
Po-0.114
Tokenates
Token ID689
Feature activation+6.41e-02

Pos logit contributions
 player+0.177
cer+0.173
robe+0.171
an+0.169
pit+0.168
Neg logit contributions
Listen-0.295
Less-0.294
Po-0.288
 failed-0.281
 Much-0.278
Token."
Token ID526
Feature activation+6.00e-02

Pos logit contributions
cer+0.067
an+0.066
pit+0.063
adow+0.062
ins+0.061
Neg logit contributions
Listen-0.092
 crumbling-0.089
Who-0.089
 unbearable-0.087
 afford-0.087
Token The
Token ID383
Feature activation+9.62e-02

Pos logit contributions
cer+0.095
an+0.093
adow+0.092
ins+0.090
pit+0.090
Neg logit contributions
Listen-0.062
Who-0.059
Less-0.055
Um-0.054
 crumbling-0.054
Token king
Token ID5822
Feature activation+1.00e-01

Pos logit contributions
cer+0.118
adow+0.115
ins+0.107
an+0.104
pit+0.103
Neg logit contributions
Who-0.136
Listen-0.133
 crumbling-0.127
Less-0.125
Um-0.124
Token learned
Token ID4499
Feature activation+5.89e-02

Pos logit contributions
cer+0.159
adow+0.148
pit+0.146
an+0.145
ese+0.145
Neg logit contributions
Listen-0.094
 failed-0.091
Who-0.090
Um-0.086
 crumbling-0.085
Token "
Token ID366
Feature activation+1.11e-01

Pos logit contributions
cer+0.244
an+0.239
adow+0.234
ins+0.231
ese+0.227
Neg logit contributions
Listen-0.147
Who-0.141
 crumbling-0.131
Um-0.128
Less-0.126
TokenWelcome
Token ID14618
Feature activation+9.25e-02

Pos logit contributions
cer+0.186
adow+0.176
an+0.175
ins+0.172
ese+0.171
Neg logit contributions
 crumbling-0.097
 unbearable-0.091
Listen-0.091
 tempting-0.090
 failed-0.089
Token home
Token ID1363
Feature activation+7.53e-02

Pos logit contributions
an+0.136
ins+0.134
cer+0.133
ese+0.129
ase+0.128
Neg logit contributions
Listen-0.095
Who-0.094
 Who-0.090
 afford-0.087
 tempting-0.083
Token,
Token ID11
Feature activation+6.03e-02

Pos logit contributions
an+0.094
cer+0.093
ins+0.089
ests+0.086
ese+0.085
Neg logit contributions
 afford-0.090
Who-0.086
Listen-0.086
 tempting-0.085
 Who-0.083
Token B
Token ID347
Feature activation+1.31e-01

Pos logit contributions
cer+0.087
ins+0.085
an+0.085
adow+0.082
ase+0.082
Neg logit contributions
Who-0.071
Listen-0.069
 Who-0.065
 tempting-0.063
 crumbling-0.062
Tokenoun
Token ID977
Feature activation+2.03e-01

Pos logit contributions
cer+0.151
pit+0.144
ins+0.144
ese+0.134
 ways+0.132
Neg logit contributions
Listen-0.181
 afford-0.174
 burg-0.162
 unbearable-0.162
 crumbling-0.162
Tokency
Token ID948
Feature activation+1.35e-01

Pos logit contributions
adow+0.177
pit+0.173
eh+0.164
ins+0.164
una+0.162
Neg logit contributions
Listen-0.330
Who-0.318
 failed-0.298
 Who-0.297
Less-0.296
Token!",
Token ID40754
Feature activation+5.36e-02

Pos logit contributions
ins+0.151
pit+0.146
cer+0.145
ests+0.144
ese+0.143
Neg logit contributions
Who-0.164
Listen-0.163
 afford-0.162
 Who-0.159
 forgotten-0.156
Token they
Token ID484
Feature activation+5.32e-02

Pos logit contributions
cer+0.082
adow+0.079
an+0.079
pit+0.077
ins+0.077
Neg logit contributions
Listen-0.057
Who-0.055
 crumbling-0.052
Less-0.050
Um-0.050
Token said
Token ID531
Feature activation+5.37e-02

Pos logit contributions
cer+0.090
ins+0.089
an+0.087
ase+0.085
pit+0.085
Neg logit contributions
Listen-0.046
Who-0.045
 crumbling-0.043
 failed-0.042
 unbearable-0.042
Token,
Token ID11
Feature activation+5.55e-02

Pos logit contributions
cer+0.063
an+0.063
adow+0.061
pit+0.060
ins+0.059
Neg logit contributions
Listen-0.075
Who-0.072
 crumbling-0.072
 unbearable-0.070
 failed-0.068
Token it
Token ID340
Feature activation+8.21e-02

Pos logit contributions
adow+0.170
pit+0.170
cer+0.167
an+0.163
ase+0.158
Neg logit contributions
Listen-0.086
Who-0.082
 tempting-0.075
Less-0.075
 crumbling-0.075
Token to
Token ID284
Feature activation+5.35e-02

Pos logit contributions
cer+0.120
adow+0.115
an+0.115
pit+0.113
ins+0.113
Neg logit contributions
Listen-0.093
Who-0.088
 crumbling-0.086
 unbearable-0.084
Less-0.082
Token the
Token ID262
Feature activation+1.35e-01

Pos logit contributions
cer+0.079
adow+0.079
an+0.078
pit+0.077
ins+0.076
Neg logit contributions
Who-0.059
Listen-0.059
Less-0.055
 crumbling-0.054
 unbearable-0.053
Token rh
Token ID9529
Feature activation+1.74e-01

Pos logit contributions
cer+0.169
adow+0.169
ins+0.163
an+0.159
pit+0.155
Neg logit contributions
Listen-0.177
Who-0.177
 crumbling-0.171
 unbearable-0.164
Less-0.163
Tokenin
Token ID259
Feature activation+1.03e-01

Pos logit contributions
pit+0.260
adow+0.256
robe+0.252
 majesty+0.237
 pads+0.234
Neg logit contributions
 afford-0.159
Listen-0.158
 failed-0.157
Less-0.154
Who-0.153
Tokenoc
Token ID420
Feature activation+2.03e-01

Pos logit contributions
cer+0.099
ins+0.094
pit+0.093
adow+0.090
an+0.086
Neg logit contributions
Who-0.157
Listen-0.156
Less-0.155
 afford-0.153
 unbearable-0.150
Tokeneros
Token ID27498
Feature activation+9.46e-02

Pos logit contributions
adow+0.208
eh+0.182
cer+0.179
pit+0.174
robe+0.173
Neg logit contributions
Listen-0.305
 crumbling-0.304
Who-0.296
 failed-0.295
Less-0.293
Token.
Token ID13
Feature activation+6.23e-02

Pos logit contributions
cer+0.108
an+0.100
pit+0.099
ins+0.098
adow+0.097
Neg logit contributions
Listen-0.132
Who-0.129
 crumbling-0.126
Less-0.125
 unbearable-0.123
Token When
Token ID1649
Feature activation+5.17e-02

Pos logit contributions
cer+0.093
adow+0.090
an+0.089
ins+0.087
pit+0.087
Neg logit contributions
Listen-0.069
Who-0.065
 crumbling-0.064
 unbearable-0.062
Less-0.060
Token he
Token ID339
Feature activation+4.90e-02

Pos logit contributions
cer+0.066
ie+0.061
er+0.060
pit+0.059
adow+0.058
Neg logit contributions
Listen-0.067
 crumbling-0.063
Who-0.061
Um-0.061
 unbearable-0.061
Token tasted
Token ID29187
Feature activation+4.83e-02

Pos logit contributions
cer+0.070
adow+0.067
an+0.066
ie+0.065
ins+0.064
Neg logit contributions
Listen-0.060
 crumbling-0.057
Who-0.055
 unbearable-0.053
Less-0.053
Token He
Token ID679
Feature activation+1.24e-01

Pos logit contributions
Listen+0.054
Who+0.051
 crumbling+0.049
 unbearable+0.047
Less+0.047
Neg logit contributions
cer-0.079
adow-0.077
an-0.077
ins-0.075
pit-0.074
Token touched
Token ID12615
Feature activation+1.02e-01

Pos logit contributions
cer+0.194
adow+0.187
an+0.187
ins+0.184
pit+0.181
Neg logit contributions
Listen-0.128
Who-0.122
 crumbling-0.118
 unbearable-0.113
Um-0.113
Token the
Token ID262
Feature activation+1.57e-01

Pos logit contributions
adow+0.198
pit+0.190
an+0.190
cer+0.188
ese+0.187
Neg logit contributions
Listen-0.071
Who-0.068
 crumbling-0.060
 unbearable-0.058
 damage-0.055
Token brightly
Token ID35254
Feature activation+1.14e-01

Pos logit contributions
cer+0.222
adow+0.221
ins+0.214
pit+0.202
an+0.201
Neg logit contributions
Who-0.179
 crumbling-0.176
Listen-0.172
 damage-0.170
 unbearable-0.167
Token-
Token ID12
Feature activation+1.61e-01

Pos logit contributions
cer+0.141
adow+0.137
an+0.134
ins+0.131
pit+0.130
Neg logit contributions
Listen-0.155
Who-0.148
 crumbling-0.144
 unbearable-0.140
Less-0.138
Tokencolored
Token ID25717
Feature activation+2.02e-01

Pos logit contributions
cer+0.212
pit+0.211
adow+0.209
ese+0.197
an+0.194
Neg logit contributions
Listen-0.213
Who-0.204
Po-0.189
Um-0.185
 crumbling-0.185
Token pet
Token ID4273
Feature activation+1.27e-01

Pos logit contributions
adow+0.276
cer+0.263
ins+0.255
ese+0.246
an+0.243
Neg logit contributions
Who-0.239
 crumbling-0.225
Listen-0.221
 damage-0.215
Po-0.213
Tokenals
Token ID874
Feature activation+7.94e-02

Pos logit contributions
cer+0.124
adow+0.122
an+0.118
ins+0.114
pit+0.113
Neg logit contributions
Listen-0.205
Who-0.200
 crumbling-0.194
 unbearable-0.188
Um-0.186
Token,
Token ID11
Feature activation+7.08e-02

Pos logit contributions
cer+0.067
adow+0.066
an+0.063
pit+0.061
ie+0.061
Neg logit contributions
Listen-0.141
Who-0.136
Um-0.132
Less-0.130
 crumbling-0.129
Token and
Token ID290
Feature activation+6.82e-02

Pos logit contributions
cer+0.091
adow+0.088
an+0.088
ins+0.084
pit+0.083
Neg logit contributions
Listen-0.094
Who-0.091
 crumbling-0.089
 unbearable-0.086
Less-0.084
Token as
Token ID355
Feature activation+7.37e-02

Pos logit contributions
cer+0.108
adow+0.107
an+0.106
ins+0.102
pit+0.101
Neg logit contributions
Listen-0.069
Who-0.067
 crumbling-0.064
 unbearable-0.061
Um-0.059
Token was
Token ID373
Feature activation+8.57e-02

Pos logit contributions
an+0.136
cer+0.133
ins+0.132
ese+0.130
adow+0.127
Neg logit contributions
Listen-0.090
Who-0.083
 failed-0.081
 crumbling-0.078
Less-0.077
Token a
Token ID257
Feature activation+9.22e-02

Pos logit contributions
an+0.150
cer+0.146
ins+0.143
adow+0.140
pit+0.136
Neg logit contributions
Listen-0.087
Who-0.082
 crumbling-0.078
 unbearable-0.076
Um-0.074
Token big
Token ID1263
Feature activation+1.11e-01

Pos logit contributions
cer+0.092
an+0.088
adow+0.085
pit+0.084
ins+0.079
Neg logit contributions
Listen-0.150
Who-0.137
 crumbling-0.137
Less-0.136
 unbearable-0.135
Token rh
Token ID9529
Feature activation+1.79e-01

Pos logit contributions
cer+0.123
adow+0.119
an+0.117
ins+0.115
pit+0.111
Neg logit contributions
Listen-0.163
Who-0.158
 crumbling-0.156
 unbearable-0.151
Less-0.148
Tokenin
Token ID259
Feature activation+1.10e-01

Pos logit contributions
 majesty+0.265
adow+0.262
pit+0.262
robe+0.259
 pads+0.258
Neg logit contributions
Who-0.167
Listen-0.162
 crumbling-0.160
Po-0.154
 single-0.151
Tokenoc
Token ID420
Feature activation+2.01e-01

Pos logit contributions
cer+0.100
adow+0.099
ins+0.098
pit+0.096
ese+0.090
Neg logit contributions
Who-0.175
Listen-0.171
 crumbling-0.165
Less-0.163
Um-0.162
Tokeneros
Token ID27498
Feature activation+1.18e-01

Pos logit contributions
adow+0.234
eh+0.205
cer+0.202
robe+0.201
as+0.194
Neg logit contributions
 crumbling-0.273
 burg-0.262
Listen-0.260
Who-0.259
 failed-0.257
Token.
Token ID13
Feature activation+5.09e-02

Pos logit contributions
cer+0.124
an+0.120
ins+0.119
pit+0.118
adow+0.116
Neg logit contributions
Who-0.168
Listen-0.166
 crumbling-0.164
 failed-0.158
 afford-0.157
Token He
Token ID679
Feature activation+5.59e-02

Pos logit contributions
cer+0.076
adow+0.076
an+0.074
pit+0.072
ese+0.071
Neg logit contributions
Listen-0.056
 crumbling-0.051
Who-0.051
 unbearable-0.050
 failed-0.048
Token was
Token ID373
Feature activation+6.00e-02

Pos logit contributions
cer+0.061
an+0.061
adow+0.058
ie+0.055
aw+0.054
Neg logit contributions
Listen-0.085
 crumbling-0.079
Who-0.078
 unbearable-0.077
Less-0.077
Token troubled
Token ID17840
Feature activation+8.00e-02

Pos logit contributions
cer+0.090
an+0.086
ins+0.085
adow+0.084
pit+0.083
Neg logit contributions
Listen-0.066
 crumbling-0.065
Who-0.063
 unbearable-0.062
 tempting-0.061
TokenOnce
Token ID7454
Feature activation+1.28e-01

Pos logit contributions
 afford+0.064
 crumbling+0.064
 failed+0.062
 forgotten+0.062
 tempting+0.061
Neg logit contributions
an-0.075
ins-0.072
ese-0.070
cer-0.069
ase-0.068
Token there
Token ID612
Feature activation+1.04e-01

Pos logit contributions
cer+0.163
an+0.162
ese+0.154
ins+0.146
er+0.140
Neg logit contributions
Listen-0.169
 crumbling-0.158
Less-0.157
 failed-0.157
Who-0.156
Token was
Token ID373
Feature activation+9.22e-02

Pos logit contributions
an+0.161
cer+0.158
ins+0.154
ese+0.152
oco+0.147
Neg logit contributions
Listen-0.105
Who-0.099
 failed-0.094
 crumbling-0.092
Less-0.090
Token a
Token ID257
Feature activation+9.68e-02

Pos logit contributions
an+0.145
cer+0.144
ins+0.139
adow+0.135
pit+0.135
Neg logit contributions
Listen-0.101
 crumbling-0.094
Who-0.094
 unbearable-0.091
 failed-0.087
Token little
Token ID1310
Feature activation+1.38e-01

Pos logit contributions
cer+0.094
an+0.089
adow+0.088
pit+0.084
ins+0.083
Neg logit contributions
Listen-0.159
Who-0.151
 crumbling-0.149
Less-0.146
 unbearable-0.146
Token ot
Token ID30972
Feature activation+2.00e-01

Pos logit contributions
adow+0.139
ins+0.137
cer+0.131
ese+0.131
an+0.128
Neg logit contributions
Who-0.211
 crumbling-0.206
Listen-0.205
 confusing-0.197
 unbearable-0.196
Tokenter
Token ID353
Feature activation+1.17e-01

Pos logit contributions
adow+0.202
 majesty+0.193
 ways+0.181
oco+0.179
pit+0.176
Neg logit contributions
Listen-0.293
Who-0.286
 crumbling-0.272
Po-0.272
 burg-0.263
Token who
Token ID508
Feature activation+7.34e-02

Pos logit contributions
pit+0.122
oco+0.119
cer+0.118
ymes+0.113
ins+0.113
Neg logit contributions
 crumbling-0.160
Who-0.158
Listen-0.155
 failed-0.152
Po-0.150
Token lived
Token ID5615
Feature activation+7.10e-02

Pos logit contributions
cer+0.087
adow+0.084
an+0.083
ins+0.079
ie+0.077
Neg logit contributions
Listen-0.106
Who-0.102
 crumbling-0.098
Less-0.097
 unbearable-0.095
Token near
Token ID1474
Feature activation+6.84e-02

Pos logit contributions
an+0.089
cer+0.088
adow+0.086
pit+0.086
ese+0.085
Neg logit contributions
Listen-0.089
 crumbling-0.088
 unbearable-0.083
Who-0.082
 discouraged-0.082
Token a
Token ID257
Feature activation+8.72e-02

Pos logit contributions
cer+0.092
adow+0.089
an+0.089
ins+0.086
pit+0.086
Neg logit contributions
Listen-0.086
Who-0.082
 crumbling-0.080
 unbearable-0.077
Less-0.077
Token need
Token ID761
Feature activation+1.15e-01

Pos logit contributions
an+0.218
ins+0.211
cer+0.206
ese+0.205
adow+0.197
Neg logit contributions
Listen-0.127
Who-0.125
 crumbling-0.123
 unbearable-0.114
 failed-0.114
Token.
Token ID13
Feature activation+9.41e-02

Pos logit contributions
cer+0.143
adow+0.140
ins+0.136
an+0.134
er+0.131
Neg logit contributions
Listen-0.151
Who-0.145
 crumbling-0.140
Um-0.137
 unbearable-0.136
Token Even
Token ID3412
Feature activation+9.87e-02

Pos logit contributions
cer+0.147
adow+0.142
an+0.142
ins+0.138
pit+0.137
Neg logit contributions
Listen-0.098
Who-0.093
 crumbling-0.090
 unbearable-0.086
Less-0.085
Token the
Token ID262
Feature activation+1.40e-01

Pos logit contributions
an+0.143
cer+0.138
adow+0.136
ese+0.135
ase+0.132
Neg logit contributions
Listen-0.117
Who-0.114
 crumbling-0.112
 tempting-0.110
 unbearable-0.108
Token t
Token ID256
Feature activation+1.58e-01

Pos logit contributions
cer+0.184
adow+0.178
an+0.176
ins+0.172
pit+0.168
Neg logit contributions
Listen-0.180
Who-0.174
 crumbling-0.170
 unbearable-0.164
Less-0.162
Tokenini
Token ID5362
Feature activation+1.98e-01

Pos logit contributions
eh+0.161
ese+0.155
adow+0.152
ans+0.144
pit+0.141
Neg logit contributions
 afford-0.235
Um-0.219
Who-0.218
Listen-0.215
 Who-0.211
Tokenest
Token ID395
Feature activation+1.20e-01

Pos logit contributions
pit+0.167
cer+0.164
oco+0.159
adow+0.158
ins+0.154
Neg logit contributions
Listen-0.332
Who-0.316
 unbearable-0.314
 failed-0.311
 crumbling-0.307
Token creature
Token ID7185
Feature activation+9.46e-02

Pos logit contributions
cer+0.140
adow+0.135
an+0.134
ins+0.129
pit+0.128
Neg logit contributions
Listen-0.170
Who-0.164
 crumbling-0.161
 unbearable-0.155
Um-0.153
Token needs
Token ID2476
Feature activation+6.82e-02

Pos logit contributions
cer+0.131
adow+0.127
an+0.126
ins+0.122
pit+0.121
Neg logit contributions
Listen-0.116
Who-0.110
 crumbling-0.106
Less-0.103
 unbearable-0.103
Token our
Token ID674
Feature activation+1.15e-01

Pos logit contributions
cer+0.081
adow+0.079
er+0.076
an+0.075
ins+0.075
Neg logit contributions
Listen-0.095
Who-0.092
Less-0.086
Um-0.086
 crumbling-0.086
Token help
Token ID1037
Feature activation+6.26e-02

Pos logit contributions
cer+0.150
adow+0.145
an+0.144
ins+0.141
pit+0.139
Neg logit contributions
Listen-0.148
Who-0.142
 crumbling-0.139
 unbearable-0.134
Less-0.133
Token time
Token ID640
Feature activation+5.86e-02

Pos logit contributions
cer+0.212
er+0.203
robe+0.202
keeping+0.201
ins+0.199
Neg logit contributions
 tug-0.064
 notice-0.063
 tempting-0.062
 hard-0.061
 long-0.061
Token there
Token ID612
Feature activation+6.15e-02

Pos logit contributions
an+0.093
cer+0.092
ese+0.089
ins+0.088
ans+0.083
Neg logit contributions
Listen-0.059
 crumbling-0.056
Who-0.053
 Much-0.053
 tempting-0.052
Token was
Token ID373
Feature activation+6.71e-02

Pos logit contributions
ese+0.106
ins+0.104
an+0.103
cer+0.101
ans+0.100
Neg logit contributions
Listen-0.054
Who-0.053
 crumbling-0.048
 failed-0.047
 Who-0.046
Token a
Token ID257
Feature activation+7.19e-02

Pos logit contributions
cer+0.135
pit+0.134
an+0.134
ins+0.130
er+0.128
Neg logit contributions
Listen-0.049
There-0.040
 long-0.040
Who-0.039
 Much-0.039
Token blue
Token ID4171
Feature activation+1.28e-01

Pos logit contributions
adow+0.090
cer+0.088
ins+0.086
pit+0.085
ese+0.083
Neg logit contributions
Who-0.097
Listen-0.093
 crumbling-0.089
Po-0.087
 tempting-0.086
Token ostr
Token ID42387
Feature activation+1.98e-01

Pos logit contributions
ins+0.182
adow+0.170
cer+0.170
as+0.162
ows+0.159
Neg logit contributions
Who-0.133
 crumbling-0.127
Listen-0.124
 single-0.124
Po-0.122
Tokenich
Token ID488
Feature activation+8.99e-02

Pos logit contributions
cer+0.230
robe+0.214
 ways+0.204
adow+0.202
ins+0.199
Neg logit contributions
 crumbling-0.257
Who-0.243
Um-0.242
Po-0.239
Dead-0.234
Token that
Token ID326
Feature activation+3.13e-02

Pos logit contributions
pit+0.121
ins+0.118
cer+0.118
 majesty+0.110
immers+0.109
Neg logit contributions
Who-0.103
Dead-0.098
 crumbling-0.097
Listen-0.097
Po-0.095
Token lived
Token ID5615
Feature activation+4.95e-02

Pos logit contributions
cer+0.039
adow+0.038
an+0.037
ins+0.036
pit+0.035
Neg logit contributions
Listen-0.043
Who-0.041
 crumbling-0.040
 unbearable-0.039
Less-0.039
Token in
Token ID287
Feature activation+4.21e-02

Pos logit contributions
pit+0.071
cer+0.069
an+0.069
 majesty+0.067
adow+0.066
Neg logit contributions
Listen-0.051
 crumbling-0.051
 loose-0.047
On-0.047
Less-0.047
Token the
Token ID262
Feature activation+8.48e-02

Pos logit contributions
cer+0.061
adow+0.059
an+0.059
pit+0.059
ins+0.059
Neg logit contributions
Listen-0.050
Who-0.049
 crumbling-0.046
 unbearable-0.044
Less-0.044
Token their
Token ID511
Feature activation-3.74e-02

Pos logit contributions
Listen+0.108
Who+0.102
 crumbling+0.102
Um+0.100
 failed+0.097
Neg logit contributions
cer-0.084
an-0.078
ese-0.075
ins-0.074
adow-0.073
Token faces
Token ID6698
Feature activation-7.09e-02

Pos logit contributions
Listen+0.060
Who+0.057
 crumbling+0.055
Um+0.055
Less+0.054
Neg logit contributions
cer-0.038
an-0.038
adow-0.036
ese-0.035
ins-0.035
Token,
Token ID11
Feature activation-6.07e-02

Pos logit contributions
Listen+0.137
Who+0.131
Um+0.128
Less+0.124
 crumbling+0.124
Neg logit contributions
cer-0.053
adow-0.050
pit-0.048
an-0.045
er-0.045
Token enjoying
Token ID13226
Feature activation-5.93e-02

Pos logit contributions
Listen+0.077
Who+0.074
 crumbling+0.072
 unbearable+0.069
Less+0.068
Neg logit contributions
cer-0.082
an-0.080
adow-0.079
ins-0.076
pit-0.075
Token every
Token ID790
Feature activation-5.40e-02

Pos logit contributions
Listen+0.062
Who+0.062
 crumbling+0.057
Less+0.057
 tempting+0.055
Neg logit contributions
an-0.093
adow-0.092
cer-0.089
pit-0.088
ase-0.087
Tokenlast
Token ID12957
Feature activation-1.06e-01

Pos logit contributions
Listen+0.060
Who+0.056
 crumbling+0.056
Um+0.053
Less+0.053
Neg logit contributions
cer-0.080
adow-0.078
an-0.076
pit-0.074
ie-0.074
Token bite
Token ID13197
Feature activation-4.93e-02

Pos logit contributions
Listen+0.143
 crumbling+0.134
Who+0.134
Um+0.129
Less+0.127
Neg logit contributions
adow-0.134
cer-0.134
ins-0.127
an-0.125
ie-0.121
Token.
Token ID13
Feature activation-6.45e-02

Pos logit contributions
Listen+0.070
Who+0.069
 crumbling+0.068
 unbearable+0.067
 tempting+0.065
Neg logit contributions
an-0.054
cer-0.054
adow-0.054
pit-0.050
ese-0.050
Token
Token ID220
Feature activation-3.35e-02

Pos logit contributions
Listen+0.061
Who+0.057
Less+0.053
 crumbling+0.053
Um+0.051
Neg logit contributions
cer-0.109
an-0.108
adow-0.107
ins-0.103
pit-0.103
Token\n
Token ID198
Feature activation-5.65e-02

Pos logit contributions
Listen+0.030
Who+0.028
Less+0.027
 Who+0.027
 crumbling+0.025
Neg logit contributions
cer-0.058
an-0.058
adow-0.054
ie-0.054
pit-0.054
Token\n
Token ID198
Feature activation-6.57e-02

Pos logit contributions
Listen+0.069
Who+0.066
 crumbling+0.065
 unbearable+0.063
Um+0.061
Neg logit contributions
cer-0.077
adow-0.075
an-0.075
ins-0.073
pit-0.071
Token,
Token ID11
Feature activation-4.37e-02

Pos logit contributions
Listen+0.055
Who+0.052
 crumbling+0.052
 unbearable+0.051
Less+0.049
Neg logit contributions
cer-0.037
adow-0.036
an-0.036
ins-0.035
pit-0.034
Token hugged
Token ID41130
Feature activation-4.23e-02

Pos logit contributions
Listen+0.057
Who+0.055
 crumbling+0.053
 unbearable+0.051
Less+0.051
Neg logit contributions
cer-0.058
adow-0.055
an-0.055
ins-0.053
pit-0.053
Token and
Token ID290
Feature activation-6.01e-02

Pos logit contributions
Listen+0.060
Who+0.057
 crumbling+0.055
 unbearable+0.054
Less+0.054
Neg logit contributions
adow-0.050
cer-0.050
pit-0.050
an-0.049
ins-0.048
Token sang
Token ID25889
Feature activation-5.97e-02

Pos logit contributions
Listen+0.065
Who+0.061
 crumbling+0.060
 unbearable+0.057
Less+0.057
Neg logit contributions
cer-0.092
adow-0.089
an-0.089
ins-0.086
pit-0.086
Token together
Token ID1978
Feature activation-5.73e-02

Pos logit contributions
Listen+0.077
Who+0.072
 crumbling+0.071
 unbearable+0.071
Less+0.068
Neg logit contributions
an-0.076
cer-0.076
pit-0.075
adow-0.074
eh-0.071
Token until
Token ID1566
Feature activation-1.04e-01

Pos logit contributions
Listen+0.100
Who+0.096
 crumbling+0.093
Um+0.092
 unbearable+0.091
Neg logit contributions
cer-0.051
adow-0.049
an-0.047
pit-0.045
ins-0.045
Token sunset
Token ID26428
Feature activation-3.70e-02

Pos logit contributions
Um+0.181
Po+0.179
Warning+0.179
Hmm+0.173
Value+0.172
Neg logit contributions
cer-0.067
aw-0.066
 friends-0.063
ie-0.061
 knights-0.060
Token.
Token ID13
Feature activation-7.41e-02

Pos logit contributions
Listen+0.055
Who+0.053
 crumbling+0.052
 unbearable+0.051
 failed+0.050
Neg logit contributions
cer-0.040
an-0.039
adow-0.039
pit-0.037
ins-0.037
Token
Token ID220
Feature activation-2.79e-02

Pos logit contributions
 crumbling+0.108
 unbearable+0.108
 damage+0.106
Who+0.104
 failed+0.104
Neg logit contributions
adow-0.076
ase-0.074
aw-0.070
room-0.070
cer-0.070
Token\n
Token ID198
Feature activation-5.80e-02

Pos logit contributions
Listen+0.026
Who+0.024
Less+0.023
 Who+0.022
 crumbling+0.022
Neg logit contributions
cer-0.048
an-0.047
pit-0.045
ie-0.045
adow-0.044
Token\n
Token ID198
Feature activation-6.96e-02

Pos logit contributions
 crumbling+0.081
 unbearable+0.079
Listen+0.079
Who+0.079
 failed+0.077
Neg logit contributions
adow-0.065
an-0.063
cer-0.062
ins-0.061
ase-0.061
Token and
Token ID290
Feature activation-5.30e-02

Pos logit contributions
Listen+0.102
Who+0.101
 crumbling+0.100
Um+0.100
 unbearable+0.100
Neg logit contributions
er-0.025
cer-0.023
adow-0.023
an-0.022
ins-0.022
Token they
Token ID484
Feature activation-5.70e-02

Pos logit contributions
Listen+0.036
Who+0.033
 crumbling+0.030
Less+0.030
Um+0.028
Neg logit contributions
cer-0.100
adow-0.100
an-0.100
pit-0.097
ese-0.096
Token raced
Token ID32897
Feature activation-7.02e-02

Pos logit contributions
Listen+0.064
Who+0.059
 crumbling+0.059
Less+0.057
 unbearable+0.056
Neg logit contributions
cer-0.084
an-0.082
adow-0.082
ins-0.080
pit-0.079
Token and
Token ID290
Feature activation-6.01e-02

Pos logit contributions
Listen+0.089
Who+0.084
 crumbling+0.082
Less+0.078
 unbearable+0.078
Neg logit contributions
cer-0.094
adow-0.093
an-0.092
ins-0.090
pit-0.088
Token laughed
Token ID13818
Feature activation-5.62e-02

Pos logit contributions
Listen+0.061
Who+0.057
 crumbling+0.056
Less+0.053
 unbearable+0.053
Neg logit contributions
cer-0.095
adow-0.094
an-0.094
ins-0.091
pit-0.090
Token until
Token ID1566
Feature activation-1.03e-01

Pos logit contributions
Listen+0.085
Who+0.082
 crumbling+0.081
 unbearable+0.079
Less+0.078
Neg logit contributions
cer-0.060
an-0.059
adow-0.059
ins-0.056
pit-0.055
Token it
Token ID340
Feature activation-5.96e-02

Pos logit contributions
Po+0.144
Um+0.142
 tempting+0.141
Warning+0.140
Listen+0.140
Neg logit contributions
cer-0.114
ie-0.101
aw-0.097
er-0.095
pit-0.095
Token was
Token ID373
Feature activation-5.20e-02

Pos logit contributions
Listen+0.066
Who+0.060
 failed+0.059
 crumbling+0.058
 unbearable+0.057
Neg logit contributions
adow-0.090
cer-0.089
ins-0.087
an-0.086
ese-0.086
Token time
Token ID640
Feature activation-4.06e-02

Pos logit contributions
Listen+0.070
Who+0.066
 crumbling+0.066
 unbearable+0.065
Less+0.063
Neg logit contributions
cer-0.066
an-0.064
adow-0.063
ins-0.062
pit-0.061
Token for
Token ID329
Feature activation-4.15e-02

Pos logit contributions
Who+0.065
Listen+0.064
Um+0.061
 crumbling+0.059
Less+0.059
Neg logit contributions
adow-0.041
cer-0.040
an-0.037
pit-0.037
ins-0.036
Token Billy
Token ID15890
Feature activation-5.53e-02

Pos logit contributions
Listen+0.076
Who+0.074
Um+0.071
Dead+0.071
Value+0.071
Neg logit contributions
ie-0.028
adow-0.027
 ways-0.026
er-0.026
cer-0.025
Token him
Token ID683
Feature activation-6.31e-02

Pos logit contributions
Listen+0.048
Who+0.048
 crumbling+0.044
Less+0.043
 unbearable+0.042
Neg logit contributions
cer-0.065
adow-0.064
an-0.063
pit-0.062
ins-0.062
Token.
Token ID13
Feature activation-6.34e-02

Pos logit contributions
Listen+0.080
Who+0.078
Less+0.078
 crumbling+0.076
 afford+0.075
Neg logit contributions
cer-0.081
an-0.076
pit-0.074
adow-0.074
ins-0.074
Token They
Token ID1119
Feature activation-5.73e-02

Pos logit contributions
Listen+0.068
Who+0.064
 crumbling+0.063
 unbearable+0.060
Less+0.059
Neg logit contributions
cer-0.097
adow-0.094
an-0.093
pit-0.091
ins-0.091
Token played
Token ID2826
Feature activation-5.36e-02

Pos logit contributions
Listen+0.073
Who+0.070
 crumbling+0.067
Um+0.065
 unbearable+0.064
Neg logit contributions
cer-0.077
adow-0.076
an-0.074
ie-0.072
ese-0.071
Token together
Token ID1978
Feature activation-5.67e-02

Pos logit contributions
Listen+0.047
 tug+0.045
Who+0.043
Less+0.042
 persu+0.042
Neg logit contributions
cer-0.084
an-0.083
pit-0.082
ins-0.081
adow-0.080
Token until
Token ID1566
Feature activation-1.00e-01

Pos logit contributions
Listen+0.069
 crumbling+0.066
Who+0.066
Less+0.065
 failed+0.065
Neg logit contributions
cer-0.073
an-0.072
ins-0.070
adow-0.068
ese-0.067
Token the
Token ID262
Feature activation+1.75e-03

Pos logit contributions
Listen+0.142
Um+0.139
 crumbling+0.135
Po+0.135
Who+0.135
Neg logit contributions
cer-0.110
ie-0.100
aw-0.098
adow-0.095
pit-0.093
Token sun
Token ID4252
Feature activation-1.99e-02

Pos logit contributions
cer+0.002
an+0.002
adow+0.002
pit+0.002
ins+0.002
Neg logit contributions
Listen-0.002
Who-0.002
 crumbling-0.002
Less-0.002
Um-0.002
Token started
Token ID2067
Feature activation-5.71e-02

Pos logit contributions
Listen+0.020
Who+0.019
 crumbling+0.018
 unbearable+0.018
Less+0.017
Neg logit contributions
cer-0.032
adow-0.031
an-0.031
ins-0.030
pit-0.030
Token to
Token ID284
Feature activation-5.07e-02

Pos logit contributions
Who+0.083
Listen+0.083
Um+0.080
Less+0.079
 Who+0.076
Neg logit contributions
adow-0.063
cer-0.063
ie-0.062
an-0.060
er-0.059
Token set
Token ID900
Feature activation-6.02e-02

Pos logit contributions
Listen+0.071
Um+0.069
 pricey+0.067
Who+0.067
 failed+0.067
Neg logit contributions
ie-0.057
an-0.057
adow-0.055
cer-0.055
ese-0.053
Token Milo
Token ID32662
Feature activation-3.59e-02

Pos logit contributions
Listen+0.036
Who+0.036
 Who+0.032
 tug+0.032
Po+0.031
Neg logit contributions
adow-0.075
pit-0.074
ins-0.073
an-0.072
cer-0.071
Token and
Token ID290
Feature activation-6.01e-02

Pos logit contributions
 crumbling+0.042
Listen+0.041
 unbearable+0.041
Less+0.040
 tempting+0.039
Neg logit contributions
cer-0.047
ins-0.046
pit-0.045
ase-0.045
room-0.044
Token together
Token ID1978
Feature activation-5.99e-02

Pos logit contributions
Listen+0.059
Who+0.057
 crumbling+0.054
 unbearable+0.052
Less+0.051
Neg logit contributions
cer-0.097
an-0.096
adow-0.095
ins-0.093
pit-0.092
Token they
Token ID484
Feature activation-5.76e-02

Pos logit contributions
Listen+0.087
Who+0.085
Um+0.084
Warning+0.083
 pricey+0.081
Neg logit contributions
ie-0.070
adow-0.069
cer-0.069
er-0.067
aw-0.066
Token played
Token ID2826
Feature activation-6.15e-02

Pos logit contributions
Listen+0.072
Who+0.069
Um+0.065
 crumbling+0.065
Less+0.064
Neg logit contributions
cer-0.080
adow-0.078
an-0.075
ie-0.074
pit-0.073
Token until
Token ID1566
Feature activation-9.94e-02

Pos logit contributions
 tug+0.058
Listen+0.058
Who+0.057
 persu+0.054
 notice+0.054
Neg logit contributions
an-0.093
ins-0.091
rim-0.088
ie-0.088
adow-0.087
Token the
Token ID262
Feature activation+5.76e-03

Pos logit contributions
Listen+0.127
Um+0.123
 crumbling+0.122
Who+0.122
Po+0.120
Neg logit contributions
cer-0.126
ie-0.113
aw-0.111
adow-0.110
pit-0.110
Token sun
Token ID4252
Feature activation-1.85e-02

Pos logit contributions
cer+0.008
adow+0.008
ins+0.007
an+0.007
pit+0.007
Neg logit contributions
Listen-0.008
Who-0.007
 crumbling-0.007
Less-0.007
 unbearable-0.007
Token went
Token ID1816
Feature activation-5.59e-02

Pos logit contributions
Listen+0.019
Who+0.018
 crumbling+0.017
Less+0.017
 unbearable+0.017
Neg logit contributions
cer-0.029
an-0.028
adow-0.028
ins-0.027
pit-0.027
Token down
Token ID866
Feature activation-6.37e-02

Pos logit contributions
Listen+0.083
Who+0.083
Um+0.081
Less+0.079
 crumbling+0.079
Neg logit contributions
cer-0.061
an-0.057
ie-0.055
adow-0.054
er-0.054
Token.
Token ID13
Feature activation-7.18e-02

Pos logit contributions
Listen+0.127
Um+0.124
Who+0.124
Less+0.119
 crumbling+0.119
Neg logit contributions
cer-0.040
er-0.034
adow-0.033
aw-0.032
ie-0.032
Token,
Token ID11
Feature activation-7.21e-02

Pos logit contributions
Listen+0.080
Who+0.078
Um+0.075
 crumbling+0.073
Warning+0.072
Neg logit contributions
cer-0.092
adow-0.088
ie-0.087
er-0.084
aw-0.083
Token they
Token ID484
Feature activation-5.44e-02

Pos logit contributions
Listen+0.083
Who+0.079
 crumbling+0.077
 unbearable+0.074
Less+0.073
Neg logit contributions
cer-0.105
adow-0.102
an-0.101
ins-0.098
pit-0.097
Token laughed
Token ID13818
Feature activation-3.86e-02

Pos logit contributions
Listen+0.060
Who+0.057
 crumbling+0.055
 unbearable+0.053
Um+0.052
Neg logit contributions
cer-0.082
adow-0.080
an-0.079
ins-0.077
pit-0.077
Token and
Token ID290
Feature activation-5.68e-02

Pos logit contributions
Listen+0.059
Who+0.056
 crumbling+0.056
 unbearable+0.055
Less+0.054
Neg logit contributions
cer-0.041
an-0.040
adow-0.039
ins-0.038
pit-0.038
Token laughed
Token ID13818
Feature activation-4.48e-02

Pos logit contributions
Listen+0.068
Who+0.065
 crumbling+0.063
 unbearable+0.060
Um+0.060
Neg logit contributions
cer-0.081
adow-0.078
an-0.077
ins-0.074
pit-0.074
Token until
Token ID1566
Feature activation-9.92e-02

Pos logit contributions
Listen+0.065
Who+0.062
 crumbling+0.062
 unbearable+0.060
Less+0.059
Neg logit contributions
cer-0.050
an-0.050
adow-0.049
ins-0.048
pit-0.047
Token the
Token ID262
Feature activation+6.97e-03

Pos logit contributions
Um+0.140
Listen+0.140
Po+0.139
Warning+0.139
 crumbling+0.138
Neg logit contributions
cer-0.108
ie-0.097
aw-0.094
adow-0.092
er-0.090
Token sun
Token ID4252
Feature activation-1.83e-02

Pos logit contributions
cer+0.008
adow+0.008
an+0.008
ins+0.008
pit+0.008
Neg logit contributions
Listen-0.010
Who-0.010
 crumbling-0.009
 unbearable-0.009
Less-0.009
Token set
Token ID900
Feature activation-4.45e-02

Pos logit contributions
Listen+0.020
Who+0.019
 crumbling+0.019
Um+0.018
Less+0.018
Neg logit contributions
cer-0.028
an-0.027
adow-0.026
pit-0.026
ins-0.026
Token.
Token ID13
Feature activation-6.93e-02

Pos logit contributions
Listen+0.068
Who+0.065
 crumbling+0.064
 unbearable+0.063
Less+0.062
Neg logit contributions
cer-0.048
adow-0.046
an-0.046
pit-0.044
ins-0.044
Token<|endoftext|>
Token ID50256
Feature activation+1.96e-02

Pos logit contributions
Listen+0.090
Who+0.086
 crumbling+0.086
 unbearable+0.084
 failed+0.081
Neg logit contributions
cer-0.087
adow-0.086
ins-0.082
an-0.081
pit-0.080
Token and
Token ID290
Feature activation-7.04e-02

Pos logit contributions
Listen+0.079
Who+0.075
 crumbling+0.073
Um+0.071
Less+0.071
Neg logit contributions
cer-0.074
adow-0.070
an-0.069
ins-0.068
pit-0.068
Token over
Token ID625
Feature activation-4.76e-02

Pos logit contributions
Listen+0.082
 crumbling+0.076
Who+0.076
 unbearable+0.073
Um+0.072
Neg logit contributions
cer-0.102
adow-0.097
an-0.097
ins-0.095
pit-0.094
Token.
Token ID13
Feature activation-7.35e-02

Pos logit contributions
Listen+0.073
 crumbling+0.067
Who+0.067
Um+0.066
Less+0.066
Neg logit contributions
cer-0.053
adow-0.050
pit-0.048
ins-0.047
an-0.046
Token He
Token ID679
Feature activation-6.12e-02

Pos logit contributions
Listen+0.093
 crumbling+0.092
Who+0.090
 unbearable+0.089
 damage+0.087
Neg logit contributions
adow-0.091
cer-0.090
ase-0.088
an-0.086
room-0.085
Token sang
Token ID25889
Feature activation-5.74e-02

Pos logit contributions
Listen+0.073
Who+0.069
 crumbling+0.068
 unbearable+0.065
Less+0.064
Neg logit contributions
cer-0.087
adow-0.084
an-0.084
ins-0.081
pit-0.080
Token until
Token ID1566
Feature activation-9.90e-02

Pos logit contributions
Listen+0.068
Who+0.064
 unbearable+0.060
 Who+0.059
Less+0.058
Neg logit contributions
cer-0.084
an-0.083
adow-0.082
pit-0.081
ins-0.078
Token he
Token ID339
Feature activation-5.80e-02

Pos logit contributions
Listen+0.144
Warning+0.141
 crumbling+0.140
Po+0.139
 tempting+0.139
Neg logit contributions
cer-0.104
ie-0.097
aw-0.096
er-0.094
adow-0.092
Token had
Token ID550
Feature activation-6.41e-02

Pos logit contributions
Listen+0.067
Who+0.064
 crumbling+0.062
 unbearable+0.060
Less+0.059
Neg logit contributions
cer-0.084
adow-0.082
an-0.081
ins-0.079
pit-0.078
Token eaten
Token ID17065
Feature activation-5.01e-02

Pos logit contributions
Listen+0.072
Who+0.069
 crumbling+0.067
Um+0.064
Less+0.063
Neg logit contributions
cer-0.094
adow-0.092
ins-0.089
an-0.088
ese-0.088
Token all
Token ID477
Feature activation-4.74e-02

Pos logit contributions
Listen+0.058
Who+0.055
 crumbling+0.055
 unbearable+0.054
 tempting+0.053
Neg logit contributions
an-0.071
cer-0.071
adow-0.070
pit-0.067
ins-0.066
Token the
Token ID262
Feature activation-6.84e-03

Pos logit contributions
Listen+0.039
 single+0.033
 unbearable+0.033
Less+0.033
 crumbling+0.033
Neg logit contributions
an-0.083
pit-0.082
cer-0.081
 majesty-0.078
adow-0.078
Token stopped
Token ID5025
Feature activation-6.84e-02

Pos logit contributions
Listen+0.055
Who+0.052
 crumbling+0.051
Less+0.049
Um+0.049
Neg logit contributions
cer-0.080
adow-0.078
an-0.076
ins-0.075
pit-0.074
Token they
Token ID484
Feature activation-6.79e-02

Pos logit contributions
Listen+0.128
Less+0.124
Um+0.123
Who+0.121
Warning+0.119
Neg logit contributions
cer-0.051
ie-0.051
adow-0.048
an-0.045
ins-0.045
Token all
Token ID477
Feature activation-3.97e-02

Pos logit contributions
Listen+0.081
Who+0.076
 crumbling+0.073
Um+0.071
Less+0.070
Neg logit contributions
adow-0.099
cer-0.097
an-0.092
ie-0.091
ese-0.090
Token said
Token ID531
Feature activation-6.23e-02

Pos logit contributions
Listen+0.046
Who+0.044
 crumbling+0.042
Um+0.040
 unbearable+0.040
Neg logit contributions
cer-0.058
adow-0.058
an-0.056
ins-0.054
pit-0.054
Token goodbye
Token ID24829
Feature activation-4.12e-02

Pos logit contributions
Listen+0.089
Who+0.086
 crumbling+0.084
Um+0.083
 unbearable+0.082
Neg logit contributions
cer-0.072
adow-0.071
an-0.068
ins-0.066
pit-0.065
Token until
Token ID1566
Feature activation-9.86e-02

Pos logit contributions
Listen+0.065
Who+0.062
 crumbling+0.061
 unbearable+0.060
Less+0.059
Neg logit contributions
cer-0.042
an-0.040
adow-0.040
ins-0.039
pit-0.038
Token next
Token ID1306
Feature activation-3.67e-02

Pos logit contributions
Um+0.163
Po+0.161
Who+0.157
 tempting+0.156
Listen+0.156
Neg logit contributions
cer-0.079
ie-0.074
er-0.073
 friends-0.071
over-0.070
Token time
Token ID640
Feature activation-5.17e-02

Pos logit contributions
Listen+0.055
Who+0.054
 crumbling+0.052
Less+0.051
Um+0.050
Neg logit contributions
cer-0.040
adow-0.039
an-0.038
pit-0.036
ins-0.036
Token!
Token ID0
Feature activation-5.10e-02

Pos logit contributions
Listen+0.064
Who+0.061
Um+0.059
 crumbling+0.059
 unbearable+0.058
Neg logit contributions
cer-0.071
adow-0.068
an-0.064
er-0.064
pit-0.064
Token<|endoftext|>
Token ID50256
Feature activation+2.46e-02

Pos logit contributions
Listen+0.061
Who+0.058
 crumbling+0.057
 unbearable+0.055
Um+0.054
Neg logit contributions
cer-0.072
adow-0.070
an-0.068
ins-0.066
pit-0.066
TokenOnce
Token ID7454
Feature activation-1.75e-02

Pos logit contributions
cer+0.040
an+0.040
adow+0.039
ese+0.038
ins+0.038
Neg logit contributions
Listen-0.025
Who-0.023
 crumbling-0.021
Less-0.021
 Who-0.020
Token laughed
Token ID13818
Feature activation-3.70e-02

Pos logit contributions
Listen+0.059
Who+0.055
 crumbling+0.054
 unbearable+0.052
Less+0.051
Neg logit contributions
cer-0.083
adow-0.080
an-0.080
ins-0.077
pit-0.077
Token and
Token ID290
Feature activation-5.44e-02

Pos logit contributions
Listen+0.058
 crumbling+0.055
Who+0.055
 unbearable+0.053
Less+0.052
Neg logit contributions
cer-0.038
an-0.038
adow-0.037
ins-0.036
pit-0.035
Token smiled
Token ID13541
Feature activation-2.31e-02

Pos logit contributions
Listen+0.065
Who+0.062
 crumbling+0.061
 unbearable+0.059
Um+0.058
Neg logit contributions
cer-0.077
adow-0.073
an-0.072
ins-0.070
pit-0.070
Token and
Token ID290
Feature activation-5.46e-02

Pos logit contributions
Listen+0.034
 crumbling+0.033
Who+0.033
 unbearable+0.032
Less+0.031
Neg logit contributions
an-0.025
cer-0.025
adow-0.024
ins-0.024
ese-0.023
Token played
Token ID2826
Feature activation-6.07e-02

Pos logit contributions
Listen+0.060
Who+0.057
 crumbling+0.055
 unbearable+0.053
Less+0.052
Neg logit contributions
cer-0.083
adow-0.080
an-0.080
ins-0.077
pit-0.077
Token until
Token ID1566
Feature activation-9.83e-02

Pos logit contributions
 tug+0.058
Listen+0.057
Who+0.057
 notice+0.054
Less+0.052
Neg logit contributions
an-0.094
adow-0.092
ins-0.090
ese-0.088
pit-0.088
Token it
Token ID340
Feature activation-4.99e-02

Pos logit contributions
Um+0.155
Listen+0.153
Po+0.149
Warning+0.149
 crumbling+0.148
Neg logit contributions
cer-0.092
ie-0.079
aw-0.077
er-0.076
over-0.074
Token was
Token ID373
Feature activation-4.36e-02

Pos logit contributions
Listen+0.055
Who+0.051
 failed+0.050
 unbearable+0.048
 crumbling+0.048
Neg logit contributions
cer-0.075
adow-0.075
an-0.073
ins-0.073
ese-0.072
Token time
Token ID640
Feature activation-3.49e-02

Pos logit contributions
Listen+0.058
 crumbling+0.055
Who+0.054
 unbearable+0.054
Less+0.052
Neg logit contributions
cer-0.056
an-0.055
adow-0.054
ins-0.053
pit-0.052
Token to
Token ID284
Feature activation-5.30e-02

Pos logit contributions
Who+0.060
Listen+0.058
Um+0.057
 crumbling+0.055
Warning+0.055
Neg logit contributions
adow-0.032
cer-0.029
ie-0.026
pit-0.026
an-0.026
Token go
Token ID467
Feature activation-7.13e-02

Pos logit contributions
Listen+0.048
Who+0.045
 crumbling+0.043
Less+0.042
 unbearable+0.042
Neg logit contributions
cer-0.090
ins-0.088
adow-0.088
an-0.087
pit-0.086
Token enthusiastic
Token ID21848
Feature activation-1.72e-02

Pos logit contributions
Listen+0.115
Warning+0.108
Who+0.102
Um+0.097
 crumbling+0.097
Neg logit contributions
cer-0.094
adow-0.092
ins-0.090
pit-0.085
as-0.085
Token.
Token ID13
Feature activation-7.34e-02

Pos logit contributions
Listen+0.024
Who+0.023
 crumbling+0.023
 unbearable+0.022
 tempting+0.022
Neg logit contributions
an-0.021
cer-0.020
ese-0.019
adow-0.019
ins-0.019
Token Every
Token ID3887
Feature activation-3.51e-02

Pos logit contributions
Listen+0.093
 unbearable+0.090
 crumbling+0.090
Who+0.088
 failed+0.087
Neg logit contributions
adow-0.091
ase-0.086
cer-0.085
pit-0.082
ins-0.081
Token time
Token ID640
Feature activation-6.47e-02

Pos logit contributions
Listen+0.055
Who+0.054
 crumbling+0.052
Um+0.052
Less+0.050
Neg logit contributions
adow-0.037
cer-0.036
an-0.033
ie-0.033
pit-0.032
Token Amelia
Token ID49769
Feature activation-4.62e-02

Pos logit contributions
Listen+0.082
Who+0.076
Less+0.076
 unbearable+0.075
 crumbling+0.074
Neg logit contributions
cer-0.084
adow-0.082
er-0.079
ie-0.079
ins-0.076
Token got
Token ID1392
Feature activation-9.81e-02

Pos logit contributions
Listen+0.057
Who+0.053
 crumbling+0.052
Less+0.051
 unbearable+0.050
Neg logit contributions
cer-0.066
adow-0.064
an-0.062
ins-0.062
pit-0.061
Token an
Token ID281
Feature activation-2.49e-02

Pos logit contributions
Warning+0.159
Listen+0.157
Who+0.153
Um+0.149
 crumbling+0.148
Neg logit contributions
ins-0.094
cer-0.090
adow-0.086
ie-0.085
pit-0.084
Token answer
Token ID3280
Feature activation-6.53e-02

Pos logit contributions
Listen+0.034
Who+0.031
 crumbling+0.030
Um+0.029
Less+0.029
Neg logit contributions
adow-0.034
an-0.033
cer-0.032
ie-0.030
pit-0.030
Token right
Token ID826
Feature activation-4.23e-02

Pos logit contributions
Listen+0.111
Who+0.106
 crumbling+0.103
Less+0.103
Um+0.103
Neg logit contributions
cer-0.061
adow-0.059
pit-0.056
an-0.054
ie-0.053
Token her
Token ID607
Feature activation-2.88e-02

Pos logit contributions
Listen+0.072
Who+0.070
Less+0.069
 crumbling+0.067
Um+0.067
Neg logit contributions
cer-0.041
er-0.038
pit-0.038
adow-0.037
ie-0.036
Token Mom
Token ID11254
Feature activation-3.21e-02

Pos logit contributions
Listen+0.047
Who+0.041
Warning+0.040
Um+0.040
 crumbling+0.039
Neg logit contributions
adow-0.034
pit-0.032
 ways-0.032
an-0.030
 friends-0.029
Token would
Token ID561
Feature activation-4.83e-02

Pos logit contributions
Listen+0.053
Who+0.051
 crumbling+0.049
 unbearable+0.047
Less+0.047
Neg logit contributions
cer-0.070
adow-0.068
an-0.067
ins-0.065
pit-0.065
Token hug
Token ID16225
Feature activation-3.73e-02

Pos logit contributions
Listen+0.050
Who+0.047
 crumbling+0.046
 afford+0.044
 unbearable+0.044
Neg logit contributions
cer-0.076
ins-0.074
an-0.073
adow-0.071
pit-0.071
Token and
Token ID290
Feature activation-4.86e-02

Pos logit contributions
Listen+0.041
Who+0.040
 Who+0.035
 crumbling+0.035
 unbearable+0.035
Neg logit contributions
pit-0.057
eh-0.056
an-0.056
adow-0.056
ins-0.054
Token say
Token ID910
Feature activation-5.24e-02

Pos logit contributions
Listen+0.038
Who+0.036
 failed+0.035
 tug+0.034
 notice+0.033
Neg logit contributions
ins-0.092
an-0.092
ase-0.089
pit-0.088
cer-0.087
Token goodbye
Token ID24829
Feature activation-2.93e-02

Pos logit contributions
Listen+0.068
Who+0.066
 crumbling+0.065
 unbearable+0.062
Um+0.062
Neg logit contributions
cer-0.067
adow-0.066
an-0.064
ins-0.064
pit-0.061
Token until
Token ID1566
Feature activation-9.75e-02

Pos logit contributions
Listen+0.040
Who+0.038
 crumbling+0.038
 unbearable+0.036
 failed+0.036
Neg logit contributions
cer-0.036
an-0.034
ins-0.034
adow-0.034
pit-0.034
Token the
Token ID262
Feature activation+7.47e-03

Pos logit contributions
Listen+0.145
Um+0.145
 crumbling+0.144
Po+0.143
Warning+0.142
Neg logit contributions
cer-0.097
ie-0.086
adow-0.083
er-0.082
over-0.079
Token next
Token ID1306
Feature activation-1.21e-02

Pos logit contributions
cer+0.008
adow+0.008
an+0.008
pit+0.008
ins+0.008
Neg logit contributions
Listen-0.011
Who-0.011
 crumbling-0.010
Less-0.010
 unbearable-0.010
Token time
Token ID640
Feature activation-5.02e-02

Pos logit contributions
Listen+0.018
Who+0.017
 crumbling+0.016
 unbearable+0.016
Less+0.016
Neg logit contributions
cer-0.014
adow-0.014
an-0.014
ins-0.013
pit-0.013
Token.
Token ID13
Feature activation-7.21e-02

Pos logit contributions
Listen+0.049
Who+0.046
 crumbling+0.045
 unbearable+0.043
Less+0.042
Neg logit contributions
cer-0.082
adow-0.080
an-0.079
ins-0.077
pit-0.077
Token They
Token ID1119
Feature activation-5.66e-02

Pos logit contributions
Listen+0.086
Who+0.082
 crumbling+0.081
 unbearable+0.079
Um+0.076
Neg logit contributions
cer-0.100
adow-0.098
an-0.095
pit-0.093
ins-0.093
Token They
Token ID1119
Feature activation-5.33e-02

Pos logit contributions
Listen+0.090
Who+0.087
 crumbling+0.087
 unbearable+0.085
 failed+0.082
Neg logit contributions
cer-0.093
adow-0.090
an-0.087
ins-0.087
ase-0.086
Token played
Token ID2826
Feature activation-5.51e-02

Pos logit contributions
Listen+0.066
Who+0.062
 crumbling+0.061
 unbearable+0.059
Less+0.058
Neg logit contributions
cer-0.073
adow-0.071
an-0.071
ins-0.068
pit-0.068
Token and
Token ID290
Feature activation-5.65e-02

Pos logit contributions
 tug+0.052
Listen+0.049
 hard+0.047
 notice+0.045
Who+0.045
Neg logit contributions
adow-0.089
an-0.088
cer-0.084
ins-0.083
pit-0.083
Token laughed
Token ID13818
Feature activation-5.05e-02

Pos logit contributions
Listen+0.057
Who+0.052
 crumbling+0.051
 failed+0.050
Less+0.048
Neg logit contributions
an-0.092
adow-0.091
cer-0.090
pit-0.088
ins-0.088
Token together
Token ID1978
Feature activation-4.86e-02

Pos logit contributions
Listen+0.057
 crumbling+0.054
 unbearable+0.053
Who+0.053
Less+0.052
Neg logit contributions
an-0.070
adow-0.070
cer-0.068
ins-0.068
ese-0.066
Token until
Token ID1566
Feature activation-9.69e-02

Pos logit contributions
Listen+0.072
Who+0.069
 crumbling+0.068
 unbearable+0.066
Less+0.066
Neg logit contributions
cer-0.054
an-0.052
adow-0.052
ins-0.050
pit-0.049
Token the
Token ID262
Feature activation+2.28e-03

Pos logit contributions
Um+0.147
Listen+0.146
Po+0.144
Warning+0.144
Who+0.142
Neg logit contributions
cer-0.096
ie-0.088
aw-0.085
er-0.079
over-0.074
Token sun
Token ID4252
Feature activation-1.73e-02

Pos logit contributions
cer+0.003
an+0.003
adow+0.003
pit+0.003
ese+0.003
Neg logit contributions
Listen-0.003
Who-0.003
 crumbling-0.003
Less-0.003
 afford-0.003
Token went
Token ID1816
Feature activation-5.36e-02

Pos logit contributions
Listen+0.019
Who+0.018
 crumbling+0.017
Less+0.016
 unbearable+0.016
Neg logit contributions
cer-0.026
an-0.026
adow-0.026
ins-0.025
pit-0.025
Token down
Token ID866
Feature activation-6.55e-02

Pos logit contributions
Listen+0.079
Who+0.079
Um+0.077
Less+0.075
 crumbling+0.075
Neg logit contributions
cer-0.059
an-0.054
ie-0.054
adow-0.052
er-0.052
Token.
Token ID13
Feature activation-7.33e-02

Pos logit contributions
Listen+0.129
Um+0.126
Who+0.126
Less+0.121
 crumbling+0.120
Neg logit contributions
cer-0.043
er-0.037
adow-0.036
room-0.035
ie-0.034
Token.
Token ID13
Feature activation-6.48e-02

Pos logit contributions
Listen+0.041
Who+0.040
 crumbling+0.038
 unbearable+0.037
Less+0.037
Neg logit contributions
cer-0.026
an-0.026
adow-0.025
ins-0.024
pit-0.024
Token She
Token ID1375
Feature activation-3.99e-02

Pos logit contributions
Listen+0.069
Who+0.065
 crumbling+0.063
 unbearable+0.061
Less+0.060
Neg logit contributions
cer-0.100
adow-0.098
an-0.097
ins-0.094
pit-0.094
Token laughed
Token ID13818
Feature activation-3.42e-02

Pos logit contributions
Listen+0.045
Who+0.042
 crumbling+0.041
 unbearable+0.040
Less+0.039
Neg logit contributions
cer-0.059
adow-0.058
an-0.057
pit-0.055
ins-0.055
Token and
Token ID290
Feature activation-4.95e-02

Pos logit contributions
Listen+0.056
Who+0.054
 crumbling+0.053
 unbearable+0.052
Less+0.051
Neg logit contributions
cer-0.033
adow-0.032
an-0.032
ins-0.030
pit-0.029
Token played
Token ID2826
Feature activation-6.03e-02

Pos logit contributions
Listen+0.061
Who+0.058
 crumbling+0.057
 unbearable+0.055
Um+0.054
Neg logit contributions
cer-0.069
adow-0.066
an-0.065
pit-0.063
ins-0.063
Token until
Token ID1566
Feature activation-9.66e-02

Pos logit contributions
Who+0.076
Listen+0.075
 crumbling+0.070
 notice+0.069
Less+0.068
Neg logit contributions
an-0.078
cer-0.078
adow-0.076
ese-0.075
ins-0.075
Token it
Token ID340
Feature activation-5.05e-02

Pos logit contributions
Um+0.143
Listen+0.141
Warning+0.140
 tempting+0.138
Po+0.137
Neg logit contributions
cer-0.092
ie-0.089
aw-0.087
er-0.084
pit-0.083
Token was
Token ID373
Feature activation-4.54e-02

Pos logit contributions
Listen+0.068
Who+0.065
 crumbling+0.064
Less+0.062
 unbearable+0.061
Neg logit contributions
cer-0.064
an-0.061
adow-0.061
pit-0.059
ins-0.058
Token time
Token ID640
Feature activation-3.36e-02

Pos logit contributions
Listen+0.065
Who+0.063
 crumbling+0.061
Um+0.059
Less+0.059
Neg logit contributions
cer-0.053
adow-0.051
an-0.051
ins-0.049
pit-0.049
Token to
Token ID284
Feature activation-5.49e-02

Pos logit contributions
Who+0.056
Listen+0.055
Um+0.053
 crumbling+0.052
Warning+0.052
Neg logit contributions
adow-0.032
cer-0.029
ie-0.027
pit-0.027
an-0.026
Token go
Token ID467
Feature activation-7.13e-02

Pos logit contributions
Listen+0.055
Who+0.052
 crumbling+0.050
 unbearable+0.048
Less+0.048
Neg logit contributions
cer-0.088
adow-0.085
an-0.085
ins-0.083
pit-0.082
Token all
Token ID477
Feature activation-3.86e-02

Pos logit contributions
Listen+0.100
Um+0.093
Who+0.093
 crumbling+0.091
 unbearable+0.090
Neg logit contributions
cer-0.077
ins-0.070
an-0.068
 ways-0.068
adow-0.068
Token of
Token ID286
Feature activation-4.93e-02

Pos logit contributions
Listen+0.048
Who+0.046
 crumbling+0.046
Less+0.044
 unbearable+0.044
Neg logit contributions
cer-0.052
an-0.050
adow-0.050
pit-0.050
ese-0.048
Token the
Token ID262
Feature activation-1.54e-02

Pos logit contributions
Listen+0.071
Who+0.067
Um+0.066
 crumbling+0.065
 unbearable+0.064
Neg logit contributions
cer-0.058
ins-0.055
an-0.054
adow-0.054
ie-0.052
Token support
Token ID1104
Feature activation-4.65e-02

Pos logit contributions
Listen+0.022
Who+0.021
Um+0.021
 crumbling+0.020
 unbearable+0.020
Neg logit contributions
cer-0.018
an-0.017
ins-0.017
adow-0.017
 ways-0.016
Token they
Token ID484
Feature activation-6.37e-02

Pos logit contributions
Listen+0.066
 crumbling+0.063
Who+0.063
 failed+0.062
 unbearable+0.060
Neg logit contributions
an-0.051
cer-0.051
adow-0.050
pit-0.048
ins-0.048
Token got
Token ID1392
Feature activation-9.65e-02

Pos logit contributions
Listen+0.083
 crumbling+0.079
Um+0.076
Who+0.075
 unbearable+0.074
Neg logit contributions
cer-0.079
adow-0.077
ie-0.077
er-0.071
ese-0.071
Token.
Token ID13
Feature activation-7.45e-02

Pos logit contributions
Listen+0.160
Who+0.155
Um+0.151
Warning+0.146
Less+0.142
Neg logit contributions
ins-0.099
cer-0.097
er-0.092
adow-0.089
an-0.087
Token They
Token ID1119
Feature activation-6.21e-02

Pos logit contributions
Listen+0.097
Who+0.094
 crumbling+0.093
 unbearable+0.091
Um+0.087
Neg logit contributions
cer-0.096
adow-0.091
an-0.090
ase-0.087
ins-0.087
Token worked
Token ID3111
Feature activation-6.48e-02

Pos logit contributions
Listen+0.080
Who+0.077
 crumbling+0.072
Um+0.071
 unbearable+0.070
Neg logit contributions
cer-0.084
adow-0.083
an-0.079
ie-0.078
ins-0.076
Token hard
Token ID1327
Feature activation-4.96e-02

Pos logit contributions
Who+0.115
Um+0.108
Listen+0.108
Warning+0.102
 Who+0.101
Neg logit contributions
cer-0.065
ins-0.057
er-0.055
ie-0.054
aw-0.053
Token,
Token ID11
Feature activation-6.67e-02

Pos logit contributions
Who+0.098
Um+0.095
Listen+0.094
 crumbling+0.088
Less+0.088
Neg logit contributions
cer-0.035
er-0.032
ins-0.030
adow-0.030
ie-0.029
Token.
Token ID13
Feature activation-6.57e-02

Pos logit contributions
Listen+0.041
Who+0.039
 crumbling+0.039
 failed+0.038
Less+0.037
Neg logit contributions
an-0.036
cer-0.035
adow-0.034
ese-0.033
ins-0.033
Token They
Token ID1119
Feature activation-5.41e-02

Pos logit contributions
Listen+0.076
Who+0.073
 crumbling+0.070
 unbearable+0.068
Um+0.067
Neg logit contributions
cer-0.095
adow-0.093
an-0.092
ins-0.089
pit-0.089
Token laughed
Token ID13818
Feature activation-3.76e-02

Pos logit contributions
Listen+0.074
Who+0.072
 crumbling+0.069
 unbearable+0.068
Um+0.068
Neg logit contributions
cer-0.068
adow-0.065
an-0.064
ie-0.061
ese-0.061
Token and
Token ID290
Feature activation-5.56e-02

Pos logit contributions
Listen+0.059
Who+0.055
 crumbling+0.055
 unbearable+0.054
Less+0.053
Neg logit contributions
cer-0.039
an-0.038
adow-0.037
ins-0.037
pit-0.036
Token talked
Token ID6619
Feature activation-7.31e-02

Pos logit contributions
Listen+0.072
Who+0.071
 crumbling+0.068
 unbearable+0.067
Um+0.067
Neg logit contributions
cer-0.073
adow-0.068
an-0.067
ese-0.064
pit-0.064
Token until
Token ID1566
Feature activation-9.62e-02

Pos logit contributions
Listen+0.116
Who+0.112
 crumbling+0.110
 unbearable+0.107
Less+0.106
Neg logit contributions
cer-0.074
an-0.071
adow-0.071
ins-0.068
pit-0.067
Token the
Token ID262
Feature activation-8.11e-03

Pos logit contributions
Um+0.161
Po+0.160
Listen+0.156
Warning+0.155
Value+0.154
Neg logit contributions
aw-0.080
cer-0.073
ie-0.070
 friends-0.069
er-0.066
Token sun
Token ID4252
Feature activation-2.56e-02

Pos logit contributions
Listen+0.014
 crumbling+0.013
Who+0.013
 afford+0.013
Less+0.013
Neg logit contributions
an-0.007
adow-0.007
pit-0.007
ese-0.007
cer-0.006
Token went
Token ID1816
Feature activation-5.76e-02

Pos logit contributions
Listen+0.030
Who+0.028
 crumbling+0.027
Less+0.027
Um+0.027
Neg logit contributions
cer-0.037
an-0.037
adow-0.036
pit-0.035
er-0.035
Token down
Token ID866
Feature activation-6.98e-02

Pos logit contributions
Listen+0.087
Who+0.087
Um+0.084
Less+0.082
 crumbling+0.081
Neg logit contributions
cer-0.062
an-0.057
adow-0.056
ie-0.056
er-0.055
Token.
Token ID13
Feature activation-7.14e-02

Pos logit contributions
Listen+0.144
Um+0.142
Who+0.142
Less+0.135
 crumbling+0.133
Neg logit contributions
cer-0.041
er-0.035
adow-0.035
aw-0.032
room-0.032
TokenThey
Token ID2990
Feature activation-5.11e-02

Pos logit contributions
 crumbling+0.074
 unbearable+0.072
Listen+0.070
 failed+0.069
 tempting+0.068
Neg logit contributions
cer-0.089
adow-0.088
an-0.087
ins-0.085
eh-0.084
Token played
Token ID2826
Feature activation-5.35e-02

Pos logit contributions
Listen+0.068
Who+0.065
 crumbling+0.063
 unbearable+0.061
Um+0.061
Neg logit contributions
cer-0.066
adow-0.064
an-0.063
ie-0.062
ese-0.061
Token and
Token ID290
Feature activation-5.43e-02

Pos logit contributions
Listen+0.043
 tug+0.042
 persu+0.041
Less+0.040
Who+0.040
Neg logit contributions
an-0.084
cer-0.083
ins-0.083
adow-0.083
rim-0.081
Token laughed
Token ID13818
Feature activation-4.85e-02

Pos logit contributions
Listen+0.050
Who+0.047
 crumbling+0.045
Less+0.042
 failed+0.042
Neg logit contributions
adow-0.091
cer-0.091
an-0.091
ins-0.089
pit-0.086
Token together
Token ID1978
Feature activation-5.06e-02

Pos logit contributions
Listen+0.057
 crumbling+0.054
Who+0.053
 unbearable+0.052
Less+0.051
Neg logit contributions
an-0.065
adow-0.064
cer-0.064
ins-0.064
ese-0.061
Token until
Token ID1566
Feature activation-9.62e-02

Pos logit contributions
Listen+0.076
Who+0.073
 crumbling+0.072
 unbearable+0.070
Less+0.070
Neg logit contributions
cer-0.055
adow-0.053
an-0.053
ins-0.051
pit-0.050
Token the
Token ID262
Feature activation+3.98e-03

Pos logit contributions
Listen+0.145
Um+0.143
Value+0.141
 crumbling+0.141
Warning+0.140
Neg logit contributions
cer-0.091
aw-0.089
ie-0.088
er-0.078
adow-0.075
Token monkey
Token ID21657
Feature activation-2.03e-02

Pos logit contributions
an+0.005
pit+0.004
cer+0.004
adow+0.004
ese+0.004
Neg logit contributions
Listen-0.006
 crumbling-0.005
Who-0.005
 unbearable-0.005
Less-0.005
Token tried
Token ID3088
Feature activation-7.15e-02

Pos logit contributions
Listen+0.019
Who+0.018
 crumbling+0.017
 failed+0.017
 unbearable+0.017
Neg logit contributions
cer-0.033
adow-0.032
ins-0.032
an-0.032
pit-0.031
Token to
Token ID284
Feature activation-4.67e-02

Pos logit contributions
Listen+0.110
Warning+0.105
Who+0.105
Um+0.105
Less+0.103
Neg logit contributions
cer-0.073
pit-0.066
adow-0.066
ins-0.065
ie-0.065
Token take
Token ID1011
Feature activation-5.01e-02

Pos logit contributions
Who+0.039
Listen+0.038
 unbearable+0.035
 crumbling+0.035
 afford+0.034
Neg logit contributions
ins-0.081
cer-0.081
adow-0.080
ase-0.079
pit-0.078
Token.
Token ID13
Feature activation-7.12e-02

Pos logit contributions
 failed+0.040
 crumbling+0.039
 afford+0.039
 eventually+0.038
 confusing+0.038
Neg logit contributions
an-0.058
ins-0.054
ese-0.053
cer-0.052
rim-0.052
Token They
Token ID1119
Feature activation-5.21e-02

Pos logit contributions
Listen+0.093
 unbearable+0.092
Who+0.091
Um+0.090
Po+0.088
Neg logit contributions
adow-0.085
cer-0.078
ie-0.078
er-0.077
room-0.077
Token laughed
Token ID13818
Feature activation-3.63e-02

Pos logit contributions
Listen+0.054
Who+0.051
 crumbling+0.049
 unbearable+0.047
Um+0.047
Neg logit contributions
cer-0.082
adow-0.080
an-0.079
ins-0.077
pit-0.077
Token and
Token ID290
Feature activation-5.56e-02

Pos logit contributions
Listen+0.048
 crumbling+0.045
 unbearable+0.044
Who+0.044
 tempting+0.042
Neg logit contributions
adow-0.046
an-0.045
cer-0.045
ins-0.044
pit-0.042
Token laughed
Token ID13818
Feature activation-4.51e-02

Pos logit contributions
Listen+0.057
Who+0.054
 crumbling+0.052
 unbearable+0.050
Um+0.049
Neg logit contributions
cer-0.088
adow-0.085
an-0.085
ins-0.082
pit-0.082
Token until
Token ID1566
Feature activation-9.62e-02

Pos logit contributions
Listen+0.056
 crumbling+0.054
Who+0.053
 unbearable+0.053
Less+0.050
Neg logit contributions
an-0.059
adow-0.059
cer-0.058
ins-0.057
ese-0.055
Token they
Token ID484
Feature activation-5.21e-02

Pos logit contributions
Warning+0.167
Um+0.163
Po+0.161
 tempting+0.159
Value+0.159
Neg logit contributions
ie-0.080
aw-0.066
er-0.065
 friends-0.063
cer-0.061
Token were
Token ID547
Feature activation-6.00e-02

Pos logit contributions
Listen+0.050
Who+0.047
 crumbling+0.045
 unbearable+0.043
Less+0.043
Neg logit contributions
cer-0.086
an-0.084
adow-0.084
ins-0.082
pit-0.081
Token both
Token ID1111
Feature activation-4.77e-02

Pos logit contributions
Listen+0.048
 crumbling+0.047
 unbearable+0.046
 failed+0.045
 tempting+0.042
Neg logit contributions
cer-0.108
an-0.107
ins-0.102
adow-0.101
pit-0.100
Token very
Token ID845
Feature activation-7.08e-02

Pos logit contributions
 failed+0.035
 crumbling+0.034
 forgotten+0.033
 unbearable+0.032
 discouraged+0.032
Neg logit contributions
an-0.090
ins-0.084
ase-0.084
cer-0.084
una-0.083
Token tired
Token ID10032
Feature activation-3.99e-02

Pos logit contributions
Listen+0.083
Who+0.080
Warning+0.074
 crumbling+0.073
Less+0.073
Neg logit contributions
cer-0.104
adow-0.103
ins-0.099
an-0.098
pit-0.095
Token and
Token ID290
Feature activation-4.88e-02

Pos logit contributions
 crumbling+0.020
 failed+0.018
 tempting+0.018
 forgotten+0.018
 unbearable+0.018
Neg logit contributions
an-0.025
ase-0.024
eh-0.023
ins-0.023
oco-0.023
Token happy
Token ID3772
Feature activation-3.30e-02

Pos logit contributions
Listen+0.052
Who+0.049
 crumbling+0.048
 unbearable+0.046
Less+0.045
Neg logit contributions
cer-0.075
an-0.074
adow-0.073
ins-0.072
pit-0.071
Token.
Token ID13
Feature activation-6.70e-02

Pos logit contributions
 crumbling+0.040
Who+0.040
Listen+0.039
 unbearable+0.039
 failed+0.038
Neg logit contributions
an-0.044
eh-0.042
ese-0.041
cer-0.041
ins-0.041
Token He
Token ID679
Feature activation-5.58e-02

Pos logit contributions
Listen+0.081
Who+0.080
 crumbling+0.078
 afford+0.075
 unbearable+0.074
Neg logit contributions
cer-0.087
adow-0.086
an-0.084
er-0.079
ie-0.078
Token played
Token ID2826
Feature activation-5.51e-02

Pos logit contributions
Listen+0.056
Who+0.053
 crumbling+0.051
 unbearable+0.049
Less+0.048
Neg logit contributions
cer-0.090
adow-0.088
an-0.087
ins-0.084
pit-0.084
Token until
Token ID1566
Feature activation-9.61e-02

Pos logit contributions
Listen+0.058
Who+0.055
 tug+0.053
 notice+0.051
 persu+0.051
Neg logit contributions
cer-0.081
adow-0.080
an-0.079
ese-0.078
ins-0.078
Token he
Token ID339
Feature activation-5.60e-02

Pos logit contributions
Listen+0.124
Who+0.120
Um+0.120
 crumbling+0.117
 tempting+0.116
Neg logit contributions
cer-0.124
adow-0.112
ie-0.112
an-0.107
er-0.107
Token was
Token ID373
Feature activation-6.35e-02

Pos logit contributions
Listen+0.053
Who+0.051
 failed+0.049
 crumbling+0.048
 unbearable+0.047
Neg logit contributions
cer-0.090
an-0.089
ins-0.088
adow-0.087
ese-0.086
Token so
Token ID523
Feature activation-7.54e-02

Pos logit contributions
 crumbling+0.051
Listen+0.051
 unbearable+0.050
 failed+0.048
 forgotten+0.046
Neg logit contributions
cer-0.114
an-0.113
ins-0.109
ese-0.107
pit-0.105
Token tired
Token ID10032
Feature activation-4.52e-02

Pos logit contributions
Listen+0.085
Who+0.082
 crumbling+0.076
Um+0.075
Warning+0.075
Neg logit contributions
adow-0.113
cer-0.110
an-0.107
ins-0.106
pit-0.103
Token he
Token ID339
Feature activation-5.57e-02

Pos logit contributions
Listen+0.048
Who+0.046
 crumbling+0.046
 damage+0.045
 unbearable+0.045
Neg logit contributions
ins-0.069
adow-0.068
an-0.068
cer-0.067
ese-0.066
Token to
Token ID284
Feature activation-7.20e-02

Pos logit contributions
Listen+0.028
 crumbling+0.027
Who+0.027
 unbearable+0.027
 tempting+0.026
Neg logit contributions
cer-0.022
ins-0.021
an-0.020
pit-0.020
adow-0.020
Token help
Token ID1037
Feature activation-5.21e-02

Pos logit contributions
Listen+0.073
Who+0.069
 crumbling+0.066
 unbearable+0.064
Less+0.063
Neg logit contributions
cer-0.116
adow-0.112
an-0.112
ins-0.109
pit-0.109
Token keep
Token ID1394
Feature activation-5.91e-02

Pos logit contributions
Listen+0.061
Who+0.061
 crumbling+0.059
 unbearable+0.057
 damage+0.055
Neg logit contributions
cer-0.073
adow-0.071
pit-0.071
an-0.070
ins-0.069
Token her
Token ID607
Feature activation-3.09e-02

Pos logit contributions
Listen+0.077
Who+0.074
 crumbling+0.071
Um+0.070
Less+0.069
Neg logit contributions
cer-0.078
adow-0.075
an-0.074
ins-0.072
ese-0.070
Token warm
Token ID5814
Feature activation-9.48e-03

Pos logit contributions
 crumbling+0.033
 unbearable+0.033
Listen+0.033
Who+0.032
 tempting+0.031
Neg logit contributions
cer-0.046
pit-0.044
ins-0.044
ase-0.042
an-0.042
Token until
Token ID1566
Feature activation-9.61e-02

Pos logit contributions
 unbearable+0.012
 crumbling+0.011
Listen+0.011
 damage+0.011
 tempting+0.011
Neg logit contributions
ins-0.013
cer-0.012
aw-0.012
pit-0.012
eh-0.011
Token morning
Token ID3329
Feature activation-5.19e-02

Pos logit contributions
Listen+0.131
Um+0.127
Warning+0.123
Who+0.122
Po+0.120
Neg logit contributions
cer-0.110
ie-0.107
adow-0.104
er-0.104
pit-0.097
Token and
Token ID290
Feature activation-5.08e-02

Pos logit contributions
Listen+0.083
Who+0.079
 crumbling+0.078
Um+0.076
Less+0.076
Neg logit contributions
cer-0.053
adow-0.050
an-0.049
ins-0.047
pit-0.047
Token never
Token ID1239
Feature activation-3.03e-02

Pos logit contributions
Listen+0.045
Who+0.043
 crumbling+0.041
 unbearable+0.040
 failed+0.039
Neg logit contributions
cer-0.087
an-0.086
adow-0.085
ins-0.084
pit-0.084
Token regretted
Token ID44652
Feature activation-4.04e-02

Pos logit contributions
Listen+0.035
Who+0.032
Less+0.032
Warning+0.032
 crumbling+0.031
Neg logit contributions
adow-0.043
cer-0.043
ie-0.042
an-0.039
pit-0.039
Token it
Token ID340
Feature activation-5.96e-02

Pos logit contributions
Listen+0.041
Who+0.040
 crumbling+0.040
 unbearable+0.038
Less+0.037
Neg logit contributions
cer-0.062
an-0.061
adow-0.061
pit-0.060
ins-0.059
Token little
Token ID1310
Feature activation+1.70e-02

Pos logit contributions
Listen+0.010
Who+0.009
 crumbling+0.009
Um+0.009
 unbearable+0.009
Neg logit contributions
cer-0.006
an-0.006
adow-0.006
pit-0.006
ese-0.006
Token boy
Token ID2933
Feature activation-1.57e-02

Pos logit contributions
adow+0.014
cer+0.014
an+0.014
ins+0.014
ese+0.013
Neg logit contributions
Listen-0.030
Who-0.029
 crumbling-0.029
 unbearable-0.027
 failed-0.027
Token played
Token ID2826
Feature activation-5.37e-02

Pos logit contributions
Listen+0.019
Who+0.018
 crumbling+0.018
 unbearable+0.017
Less+0.017
Neg logit contributions
cer-0.022
adow-0.021
an-0.021
ins-0.020
pit-0.020
Token and
Token ID290
Feature activation-5.29e-02

Pos logit contributions
Who+0.055
Listen+0.053
 tug+0.052
 notice+0.051
 hard+0.050
Neg logit contributions
an-0.079
adow-0.079
ins-0.077
ese-0.076
cer-0.076
Token played
Token ID2826
Feature activation-6.29e-02

Pos logit contributions
Listen+0.046
Who+0.043
 crumbling+0.041
 failed+0.040
Less+0.038
Neg logit contributions
an-0.092
adow-0.092
cer-0.090
ins-0.089
ese-0.089
Token until
Token ID1566
Feature activation-9.59e-02

Pos logit contributions
Who+0.062
 notice+0.060
 tug+0.059
 hard+0.058
Listen+0.057
Neg logit contributions
adow-0.096
an-0.095
ese-0.093
ins-0.093
rim-0.091
Token it
Token ID340
Feature activation-5.23e-02

Pos logit contributions
Listen+0.123
Um+0.119
Po+0.117
 crumbling+0.116
Who+0.116
Neg logit contributions
cer-0.124
aw-0.112
ie-0.110
pit-0.110
er-0.107
Token was
Token ID373
Feature activation-4.53e-02

Pos logit contributions
Listen+0.057
Who+0.054
 failed+0.052
 crumbling+0.051
 unbearable+0.051
Neg logit contributions
adow-0.079
cer-0.078
ins-0.077
an-0.076
ese-0.076
Token time
Token ID640
Feature activation-3.32e-02

Pos logit contributions
Listen+0.056
 crumbling+0.054
 unbearable+0.053
Who+0.052
 tempting+0.051
Neg logit contributions
cer-0.062
an-0.062
ins-0.059
adow-0.059
ese-0.058
Token to
Token ID284
Feature activation-5.34e-02

Pos logit contributions
Listen+0.053
Who+0.053
Um+0.050
 crumbling+0.049
Less+0.049
Neg logit contributions
adow-0.034
cer-0.033
an-0.030
pit-0.030
ie-0.029
Token go
Token ID467
Feature activation-7.16e-02

Pos logit contributions
Listen+0.040
Who+0.039
 crumbling+0.036
 unbearable+0.036
Less+0.036
Neg logit contributions
ins-0.098
cer-0.098
adow-0.093
an-0.093
pit-0.093
Token't
Token ID470
Feature activation+1.54e-01

Pos logit contributions
Listen+0.055
 unbearable+0.053
 crumbling+0.053
Less+0.053
Warning+0.052
Neg logit contributions
ins-0.022
er-0.022
aw-0.021
adow-0.021
cer-0.021
Token know
Token ID760
Feature activation+1.15e-01

Pos logit contributions
cer+0.260
an+0.254
adow+0.249
pit+0.248
ins+0.248
Neg logit contributions
Listen-0.140
Who-0.134
 crumbling-0.126
 unbearable-0.122
 failed-0.120
Token the
Token ID262
Feature activation+1.73e-01

Pos logit contributions
cer+0.162
an+0.160
adow+0.152
pit+0.149
ins+0.148
Neg logit contributions
Listen-0.135
Who-0.134
 crumbling-0.129
 tempting-0.121
 unbearable-0.121
Token rh
Token ID9529
Feature activation+2.14e-01

Pos logit contributions
cer+0.243
adow+0.235
an+0.231
ins+0.230
pit+0.223
Neg logit contributions
Listen-0.206
Who-0.199
 crumbling-0.194
 unbearable-0.187
 tempting-0.183
Tokenin
Token ID259
Feature activation+1.37e-01

Pos logit contributions
adow+0.293
pit+0.283
robe+0.263
 majesty+0.262
ows+0.254
Neg logit contributions
Who-0.232
 failed-0.231
Listen-0.230
 afford-0.229
Less-0.218
Tokenoc
Token ID420
Feature activation+2.30e-01

Pos logit contributions
cer+0.107
adow+0.102
an+0.101
ins+0.095
pit+0.094
Neg logit contributions
Listen-0.248
Who-0.240
 crumbling-0.236
 unbearable-0.230
Less-0.229
Tokeneros
Token ID27498
Feature activation+1.26e-01

Pos logit contributions
adow+0.234
eh+0.199
cer+0.196
pit+0.191
robe+0.187
Neg logit contributions
Listen-0.349
Who-0.347
 crumbling-0.345
 failed-0.341
Less-0.338
Token was
Token ID373
Feature activation+8.15e-02

Pos logit contributions
cer+0.160
adow+0.155
an+0.153
ins+0.146
pit+0.146
Neg logit contributions
Listen-0.170
Who-0.160
 crumbling-0.157
 unbearable-0.152
Less-0.152
Token wild
Token ID4295
Feature activation+1.31e-01

Pos logit contributions
cer+0.123
adow+0.119
an+0.119
ins+0.115
pit+0.115
Neg logit contributions
Listen-0.089
Who-0.085
 crumbling-0.082
 unbearable-0.079
Less-0.078
Token.
Token ID13
Feature activation+7.78e-02

Pos logit contributions
eh+0.176
ins+0.174
an+0.173
cer+0.172
oco+0.170
Neg logit contributions
 crumbling-0.149
 tempting-0.138
 damage-0.138
 confusing-0.137
Who-0.137
Token Can
Token ID1680
Feature activation+7.90e-02

Pos logit contributions
cer+0.116
adow+0.114
an+0.112
pit+0.109
ins+0.108
Neg logit contributions
Listen-0.086
Who-0.080
 crumbling-0.079
 unbearable-0.076
Less-0.075
Token without
Token ID1231
Feature activation+4.23e-02

Pos logit contributions
cer+0.048
adow+0.047
an+0.046
ins+0.044
pit+0.044
Neg logit contributions
Listen-0.053
Who-0.050
 crumbling-0.049
 unbearable-0.047
Um-0.047
Token asking
Token ID4737
Feature activation+2.95e-02

Pos logit contributions
cer+0.058
ins+0.056
ese+0.054
an+0.053
adow+0.053
Neg logit contributions
Listen-0.053
Who-0.048
Um-0.046
 crumbling-0.046
Less-0.046
Token.
Token ID13
Feature activation+4.72e-02

Pos logit contributions
cer+0.030
adow+0.029
an+0.029
ins+0.027
pit+0.027
Neg logit contributions
Listen-0.047
Who-0.045
 crumbling-0.044
 unbearable-0.043
Less-0.043
Token<|endoftext|>
Token ID50256
Feature activation+1.12e-01

Pos logit contributions
cer+0.076
an+0.074
adow+0.073
ins+0.071
ese+0.071
Neg logit contributions
Listen-0.048
Who-0.046
 crumbling-0.043
Um-0.042
Less-0.042
TokenOnce
Token ID7454
Feature activation+6.83e-02

Pos logit contributions
an+0.202
cer+0.198
ese+0.194
adow+0.192
pit+0.191
Neg logit contributions
Listen-0.095
Who-0.087
 Who-0.080
Less-0.075
Warning-0.075
Token upon
Token ID2402
Feature activation+1.38e-01

Pos logit contributions
ese+0.093
cer+0.092
an+0.088
aw+0.082
ins+0.082
Neg logit contributions
Listen-0.085
 crumbling-0.080
 tempting-0.080
 failed-0.079
Less-0.078
Token a
Token ID257
Feature activation+7.58e-02

Pos logit contributions
an+0.315
pit+0.311
cer+0.301
ins+0.298
rim+0.297
Neg logit contributions
Who-0.091
Listen-0.084
It-0.064
On-0.064
What-0.057
Token time
Token ID640
Feature activation+3.46e-02

Pos logit contributions
cer+0.137
ins+0.134
er+0.134
robe+0.134
an+0.129
Neg logit contributions
 tug-0.059
Listen-0.059
 tempting-0.057
 notice-0.055
 failed-0.055
Token,
Token ID11
Feature activation+3.71e-02

Pos logit contributions
an+0.054
cer+0.053
ese+0.051
ins+0.051
ans+0.048
Neg logit contributions
Listen-0.037
 crumbling-0.035
Who-0.034
 tempting-0.033
 Much-0.032
Token there
Token ID612
Feature activation+5.42e-02

Pos logit contributions
cer+0.054
an+0.053
adow+0.053
ese+0.053
ins+0.052
Neg logit contributions
Listen-0.042
Who-0.040
 crumbling-0.039
 unbearable-0.037
Less-0.036
Token was
Token ID373
Feature activation+4.61e-02

Pos logit contributions
ese+0.086
an+0.086
ins+0.086
cer+0.085
adow+0.084
Neg logit contributions
Listen-0.054
Who-0.051
 crumbling-0.048
 failed-0.047
 Who-0.046
Token was
Token ID373
Feature activation+1.20e-01

Pos logit contributions
an+0.226
cer+0.224
ins+0.224
ese+0.215
pit+0.211
Neg logit contributions
Listen-0.144
Who-0.131
 failed-0.129
 crumbling-0.123
Less-0.120
Token a
Token ID257
Feature activation+1.11e-01

Pos logit contributions
an+0.202
cer+0.198
ins+0.194
ese+0.188
adow+0.185
Neg logit contributions
Listen-0.117
 crumbling-0.111
Who-0.108
 unbearable-0.107
 tempting-0.104
Token little
Token ID1310
Feature activation+1.50e-01

Pos logit contributions
cer+0.118
an+0.113
adow+0.110
pit+0.107
ins+0.103
Neg logit contributions
Listen-0.174
Who-0.162
 crumbling-0.160
Less-0.159
 unbearable-0.157
Token boy
Token ID2933
Feature activation+1.30e-01

Pos logit contributions
adow+0.149
ins+0.145
cer+0.141
ese+0.139
an+0.137
Neg logit contributions
Who-0.232
 crumbling-0.228
Listen-0.225
 tempting-0.219
 confusing-0.219
Token named
Token ID3706
Feature activation+1.12e-01

Pos logit contributions
cer+0.118
pit+0.114
an+0.113
ins+0.109
adow+0.108
Neg logit contributions
Who-0.207
Listen-0.205
 crumbling-0.202
 afford-0.193
 failed-0.193
Token Tim
Token ID5045
Feature activation+1.30e-01

Pos logit contributions
an+0.187
adow+0.184
cer+0.183
ese+0.182
ins+0.181
Neg logit contributions
Who-0.108
Listen-0.108
 crumbling-0.103
 Who-0.103
Um-0.095
Tokenmy
Token ID1820
Feature activation+1.10e-01

Pos logit contributions
cer+0.153
pit+0.151
ase+0.150
ese+0.149
an+0.148
Neg logit contributions
Who-0.168
Listen-0.167
 failed-0.161
 crumbling-0.159
 afford-0.155
Token.
Token ID13
Feature activation+6.08e-02

Pos logit contributions
cer+0.131
pit+0.128
an+0.127
ins+0.127
ese+0.124
Neg logit contributions
Listen-0.147
Who-0.146
 crumbling-0.143
 failed-0.139
 unbearable-0.136
Token Tim
Token ID5045
Feature activation+1.08e-01

Pos logit contributions
cer+0.083
adow+0.082
an+0.080
pit+0.080
ins+0.077
Neg logit contributions
Listen-0.074
Who-0.070
 crumbling-0.067
 unbearable-0.066
 failed-0.065
Tokenmy
Token ID1820
Feature activation+9.15e-02

Pos logit contributions
cer+0.117
ins+0.112
pit+0.111
over+0.108
ese+0.107
Neg logit contributions
Listen-0.155
 crumbling-0.145
 failed-0.145
 tempting-0.145
Who-0.144
Token was
Token ID373
Feature activation+7.11e-02

Pos logit contributions
cer+0.107
an+0.101
adow+0.101
ins+0.101
pit+0.100
Neg logit contributions
Listen-0.130
Who-0.124
 crumbling-0.122
 failed-0.118
 unbearable-0.118
Token you
Token ID345
Feature activation+1.21e-01

Pos logit contributions
cer+0.094
adow+0.088
an+0.087
ins+0.086
ese+0.081
Neg logit contributions
Listen-0.202
Who-0.195
 crumbling-0.192
 unbearable-0.188
Less-0.185
Token doing
Token ID1804
Feature activation+7.99e-02

Pos logit contributions
cer+0.144
an+0.139
adow+0.137
ins+0.133
pit+0.132
Neg logit contributions
Listen-0.171
Who-0.165
 crumbling-0.161
 unbearable-0.156
Less-0.155
Token?"
Token ID1701
Feature activation+6.77e-02

Pos logit contributions
cer+0.088
an+0.086
adow+0.085
ins+0.081
pit+0.080
Neg logit contributions
Listen-0.117
Who-0.113
 crumbling-0.111
 unbearable-0.108
 tempting-0.107
Token\n
Token ID198
Feature activation+5.92e-02

Pos logit contributions
cer+0.090
adow+0.088
an+0.086
ins+0.084
ie+0.083
Neg logit contributions
Listen-0.084
 crumbling-0.080
Who-0.078
 unbearable-0.077
 tempting-0.074
Token\n
Token ID198
Feature activation+5.21e-02

Pos logit contributions
cer+0.097
pit+0.093
an+0.093
adow+0.092
ins+0.090
Neg logit contributions
Listen-0.061
Who-0.058
Less-0.053
Um-0.053
 crumbling-0.052
TokenTim
Token ID14967
Feature activation+1.12e-01

Pos logit contributions
cer+0.072
adow+0.069
an+0.069
ins+0.066
pit+0.066
Neg logit contributions
Listen-0.064
Who-0.061
 crumbling-0.060
 unbearable-0.058
Less-0.057
Token and
Token ID290
Feature activation+5.87e-02

Pos logit contributions
cer+0.124
an+0.117
adow+0.115
ins+0.113
pit+0.110
Neg logit contributions
Listen-0.167
Who-0.160
 crumbling-0.156
 unbearable-0.152
 failed-0.151
Token Lily
Token ID20037
Feature activation+6.98e-02

Pos logit contributions
an+0.086
adow+0.085
ins+0.085
cer+0.085
ese+0.083
Neg logit contributions
Who-0.073
Listen-0.069
Um-0.066
 Who-0.065
 crumbling-0.065
Token stopped
Token ID5025
Feature activation+4.78e-02

Pos logit contributions
adow+0.098
cer+0.096
pit+0.091
an+0.091
ie+0.090
Neg logit contributions
Listen-0.084
Who-0.079
 crumbling-0.078
Less-0.076
 unbearable-0.074
Token.
Token ID13
Feature activation+3.09e-02

Pos logit contributions
cer+0.062
adow+0.060
an+0.060
ins+0.057
pit+0.057
Neg logit contributions
Listen-0.063
Who-0.060
 crumbling-0.059
 unbearable-0.057
Less-0.056
Token They
Token ID1119
Feature activation+3.58e-02

Pos logit contributions
room+0.026
cer+0.022
pit+0.022
robe+0.022
er+0.022
Neg logit contributions
 failed-0.053
 unbearable-0.051
 tragic-0.051
 afford-0.051
 damage-0.051
Token the
Token ID262
Feature activation+1.83e-01

Pos logit contributions
cer+0.230
pit+0.221
adow+0.220
an+0.216
ase+0.215
Neg logit contributions
Listen-0.125
 damage-0.121
 crumbling-0.119
Who-0.115
 unbearable-0.115
Token king
Token ID5822
Feature activation+1.11e-01

Pos logit contributions
adow+0.254
cer+0.251
ins+0.246
an+0.232
ase+0.231
Neg logit contributions
 crumbling-0.212
Who-0.209
Listen-0.208
 damage-0.207
 unbearable-0.204
Token laugh
Token ID6487
Feature activation+1.19e-01

Pos logit contributions
cer+0.153
an+0.143
ins+0.141
adow+0.140
pit+0.139
Neg logit contributions
Listen-0.129
Who-0.126
 crumbling-0.125
 unbearable-0.122
 failed-0.118
Token.
Token ID13
Feature activation+8.19e-02

Pos logit contributions
adow+0.154
cer+0.149
an+0.149
pit+0.143
ese+0.141
Neg logit contributions
Listen-0.145
Who-0.142
 unbearable-0.133
 crumbling-0.130
 Who-0.130
Token But
Token ID887
Feature activation+6.36e-02

Pos logit contributions
cer+0.135
an+0.132
adow+0.130
ins+0.127
ese+0.126
Neg logit contributions
Listen-0.080
Who-0.076
 crumbling-0.070
Less-0.069
Um-0.069
Token the
Token ID262
Feature activation+1.25e-01

Pos logit contributions
cer+0.091
adow+0.088
an+0.087
pit+0.084
ins+0.084
Neg logit contributions
Listen-0.075
Who-0.071
 crumbling-0.069
 unbearable-0.067
Less-0.066
Token king
Token ID5822
Feature activation+9.67e-02

Pos logit contributions
cer+0.161
adow+0.159
ins+0.157
an+0.143
ese+0.141
Neg logit contributions
Who-0.168
Listen-0.158
 crumbling-0.158
 damage-0.154
 burg-0.153
Token did
Token ID750
Feature activation+5.65e-02

Pos logit contributions
cer+0.145
adow+0.139
an+0.139
ins+0.137
pit+0.136
Neg logit contributions
Listen-0.098
 failed-0.095
Who-0.094
 crumbling-0.092
 unbearable-0.090
Token not
Token ID407
Feature activation+5.45e-02

Pos logit contributions
cer+0.056
adow+0.053
er+0.052
an+0.052
ie+0.050
Neg logit contributions
Listen-0.088
 crumbling-0.082
Who-0.082
Warning-0.080
Um-0.079
Token like
Token ID588
Feature activation+7.05e-02

Pos logit contributions
cer+0.067
adow+0.067
ie+0.065
an+0.065
ese+0.062
Neg logit contributions
Listen-0.075
 crumbling-0.071
Less-0.068
Warning-0.068
Um-0.067
Token that
Token ID326
Feature activation+4.96e-02

Pos logit contributions
cer+0.117
adow+0.113
an+0.112
pit+0.106
ins+0.105
Neg logit contributions
Who-0.071
Listen-0.069
 crumbling-0.067
 unbearable-0.063
 damage-0.062
Token toy
Token ID13373
Feature activation+1.16e-01

Pos logit contributions
adow+0.228
ins+0.226
cer+0.218
as+0.204
an+0.202
Neg logit contributions
Who-0.207
Listen-0.206
 crumbling-0.200
 afford-0.195
 damage-0.193
Token car
Token ID1097
Feature activation+8.40e-02

Pos logit contributions
adow+0.132
cer+0.130
ins+0.127
an+0.123
ese+0.118
Neg logit contributions
Listen-0.161
Who-0.159
 crumbling-0.157
Um-0.150
 afford-0.150
Token!"
Token ID2474
Feature activation+7.49e-02

Pos logit contributions
ins+0.104
cer+0.103
an+0.102
adow+0.099
eh+0.095
Neg logit contributions
 afford-0.102
Listen-0.097
 failed-0.094
 cost-0.094
Who-0.094
Token<|endoftext|>
Token ID50256
Feature activation+1.35e-01

Pos logit contributions
cer+0.118
an+0.115
adow+0.113
pit+0.111
ins+0.110
Neg logit contributions
Listen-0.079
Who-0.076
Less-0.070
 crumbling-0.070
Um-0.070
TokenOnce
Token ID7454
Feature activation+7.79e-02

Pos logit contributions
an+0.269
pit+0.258
cer+0.257
ese+0.257
ans+0.249
Neg logit contributions
Listen-0.095
Who-0.084
 Who-0.079
Warning-0.073
 Much-0.073
Token upon
Token ID2402
Feature activation+1.54e-01

Pos logit contributions
ese+0.108
cer+0.104
an+0.101
pit+0.095
adow+0.094
Neg logit contributions
Listen-0.097
Less-0.090
 failed-0.090
 crumbling-0.089
 tempting-0.089
Token a
Token ID257
Feature activation+9.20e-02

Pos logit contributions
an+0.362
pit+0.361
cer+0.348
ins+0.347
rim+0.346
Neg logit contributions
Who-0.085
Listen-0.079
It-0.064
There-0.058
On-0.054
Token time
Token ID640
Feature activation+4.87e-02

Pos logit contributions
cer+0.178
robe+0.177
er+0.175
keeping+0.172
ins+0.169
Neg logit contributions
 tempting-0.064
 notice-0.062
 tug-0.062
 failed-0.060
Listen-0.060
Token,
Token ID11
Feature activation+4.85e-02

Pos logit contributions
an+0.078
ese+0.074
cer+0.074
ins+0.072
ans+0.071
Neg logit contributions
Listen-0.052
 crumbling-0.048
Who-0.047
 failed-0.046
Less-0.045
Token there
Token ID612
Feature activation+6.41e-02

Pos logit contributions
ese+0.078
an+0.077
ins+0.077
adow+0.076
cer+0.075
Neg logit contributions
Listen-0.047
Who-0.045
 crumbling-0.044
Less-0.040
 tempting-0.039
Token was
Token ID373
Feature activation+5.60e-02

Pos logit contributions
ins+0.112
an+0.112
ese+0.112
ans+0.108
adow+0.107
Neg logit contributions
Listen-0.055
Who-0.051
 failed-0.047
 Who-0.045
 crumbling-0.045
Token spotted
Token ID13489
Feature activation+6.80e-02

Pos logit contributions
cer+0.148
adow+0.143
an+0.143
ins+0.139
pit+0.138
Neg logit contributions
Listen-0.109
Who-0.104
 crumbling-0.101
 unbearable-0.097
Um-0.096
Token a
Token ID257
Feature activation+7.42e-02

Pos logit contributions
cer+0.108
an+0.106
adow+0.105
pit+0.102
ins+0.101
Neg logit contributions
Listen-0.069
Who-0.068
 crumbling-0.065
 unbearable-0.062
 tempting-0.061
Token lady
Token ID10846
Feature activation+9.83e-02

Pos logit contributions
cer+0.100
adow+0.096
an+0.096
pit+0.093
ins+0.090
Neg logit contributions
Listen-0.095
Who-0.088
 crumbling-0.085
Less-0.084
Um-0.084
Tokenbug
Token ID25456
Feature activation+1.11e-01

Pos logit contributions
cer+0.119
an+0.115
ins+0.111
pit+0.107
ese+0.107
Neg logit contributions
Who-0.132
Listen-0.129
 crumbling-0.129
 afford-0.121
 unbearable-0.121
Token.
Token ID13
Feature activation+4.09e-02

Pos logit contributions
ins+0.142
cer+0.140
an+0.137
ests+0.137
oco+0.133
Neg logit contributions
 crumbling-0.127
Who-0.126
Listen-0.122
 failed-0.121
 unbearable-0.119
Token She
Token ID1375
Feature activation+5.10e-02

Pos logit contributions
cer+0.062
adow+0.061
an+0.060
pit+0.058
ins+0.058
Neg logit contributions
Listen-0.044
Who-0.041
 crumbling-0.040
 unbearable-0.039
Less-0.038
Token was
Token ID373
Feature activation+4.65e-02

Pos logit contributions
cer+0.070
adow+0.068
an+0.068
ins+0.065
ie+0.065
Neg logit contributions
Listen-0.062
 crumbling-0.057
Who-0.057
 unbearable-0.056
Less-0.055
Token very
Token ID845
Feature activation+2.29e-02

Pos logit contributions
cer+0.068
an+0.065
adow+0.063
ins+0.062
pit+0.062
Neg logit contributions
Listen-0.054
 crumbling-0.052
Who-0.052
 unbearable-0.050
 tempting-0.049
Token friendly
Token ID8030
Feature activation+7.45e-02

Pos logit contributions
cer+0.032
adow+0.032
an+0.031
ins+0.031
pit+0.030
Neg logit contributions
Listen-0.027
Who-0.026
 crumbling-0.025
Less-0.024
 unbearable-0.024
Token and
Token ID290
Feature activation+3.01e-02

Pos logit contributions
an+0.119
eh+0.115
cer+0.111
ans+0.109
ins+0.108
Neg logit contributions
Who-0.074
 Who-0.069
 crumbling-0.068
 tempting-0.066
Listen-0.065
Token they
Token ID484
Feature activation+2.99e-02

Pos logit contributions
cer+0.048
an+0.047
adow+0.047
pit+0.045
ins+0.045
Neg logit contributions
Listen-0.031
Who-0.029
 crumbling-0.028
 unbearable-0.027
Um-0.027
Token couldn
Token ID3521
Feature activation+1.17e-02

Pos logit contributions
cer+0.013
adow+0.013
an+0.012
pit+0.012
aw+0.012
Neg logit contributions
Listen-0.011
 crumbling-0.010
Who-0.010
 unbearable-0.009
Less-0.009
Token't
Token ID470
Feature activation+1.98e-02

Pos logit contributions
cer+0.019
adow+0.019
an+0.019
ie+0.018
ase+0.018
Neg logit contributions
Listen-0.011
Who-0.010
 crumbling-0.009
 unbearable-0.009
Um-0.009
Token believe
Token ID1975
Feature activation+6.16e-02

Pos logit contributions
adow+0.025
ie+0.024
an+0.024
cer+0.023
room+0.023
Neg logit contributions
Listen-0.029
 tempting-0.028
 pricey-0.026
 frustrating-0.025
 crumbling-0.025
Token it
Token ID340
Feature activation+1.09e-02

Pos logit contributions
an+0.108
ese+0.107
cer+0.107
adow+0.107
eh+0.103
Neg logit contributions
Who-0.054
Listen-0.050
 crumbling-0.048
 Who-0.044
 unbearable-0.042
Token.
Token ID13
Feature activation-6.62e-04

Pos logit contributions
cer+0.008
adow+0.008
an+0.007
ie+0.007
pit+0.007
Neg logit contributions
Listen-0.021
Who-0.020
 crumbling-0.019
Um-0.019
Less-0.019
Token It
Token ID632
Feature activation+4.76e-03

Pos logit contributions
Listen+0.001
Who+0.001
 crumbling+0.001
 unbearable+0.001
Less+0.001
Neg logit contributions
cer-0.001
adow-0.001
an-0.001
pit-0.001
ins-0.001
Token had
Token ID550
Feature activation+1.48e-02

Pos logit contributions
cer+0.006
an+0.006
adow+0.006
pit+0.005
ie+0.005
Neg logit contributions
Listen-0.007
Who-0.006
 crumbling-0.006
Less-0.006
 unbearable-0.006
Token become
Token ID1716
Feature activation+1.41e-02

Pos logit contributions
cer+0.020
adow+0.019
an+0.019
aw+0.019
pit+0.018
Neg logit contributions
Listen-0.018
Who-0.017
Less-0.016
 crumbling-0.016
 unbearable-0.016
Token alive
Token ID6776
Feature activation+3.01e-02

Pos logit contributions
cer+0.018
adow+0.017
pit+0.017
an+0.016
ins+0.016
Neg logit contributions
Listen-0.020
Who-0.018
Less-0.017
Um-0.017
 Who-0.017
Token!
Token ID0
Feature activation+1.44e-03

Pos logit contributions
ins+0.039
an+0.039
ese+0.038
adow+0.036
ans+0.036
Neg logit contributions
 crumbling-0.037
Who-0.035
 tempting-0.034
 unbearable-0.033
 failed-0.033
Token\n
Token ID198
Feature activation+4.17e-04

Pos logit contributions
cer+0.002
adow+0.002
an+0.002
ins+0.002
pit+0.002
Neg logit contributions
Listen-0.002
Who-0.002
 crumbling-0.002
 unbearable-0.001
Less-0.001
Token time
Token ID640
Feature activation-1.56e-02

Pos logit contributions
cer+0.059
ins+0.057
pit+0.056
an+0.056
er+0.056
Neg logit contributions
Listen-0.029
 tempting-0.027
 failed-0.025
 tug-0.025
 unbearable-0.025
Token,
Token ID11
Feature activation-9.87e-03

Pos logit contributions
Listen+0.018
 crumbling+0.016
Who+0.016
 failed+0.016
 tempting+0.016
Neg logit contributions
an-0.024
cer-0.023
ese-0.022
ins-0.022
ans-0.021
Token there
Token ID612
Feature activation+1.31e-02

Pos logit contributions
Listen+0.010
Who+0.010
 crumbling+0.009
 tempting+0.009
 unbearable+0.009
Neg logit contributions
an-0.016
cer-0.016
ese-0.015
ins-0.015
adow-0.015
Token was
Token ID373
Feature activation+2.41e-03

Pos logit contributions
an+0.020
ins+0.020
cer+0.019
ese+0.019
adow+0.019
Neg logit contributions
Listen-0.014
Who-0.013
 crumbling-0.013
 failed-0.012
 Who-0.012
Token a
Token ID257
Feature activation+1.56e-02

Pos logit contributions
an+0.005
pit+0.005
cer+0.005
ins+0.005
ese+0.004
Neg logit contributions
Listen-0.002
Who-0.002
Less-0.001
 crumbling-0.001
 Much-0.001
Token little
Token ID1310
Feature activation+4.84e-02

Pos logit contributions
cer+0.016
adow+0.016
an+0.015
ins+0.015
pit+0.015
Neg logit contributions
Listen-0.024
Who-0.024
 crumbling-0.023
 unbearable-0.022
Um-0.022
Token girl
Token ID2576
Feature activation+4.64e-02

Pos logit contributions
adow+0.053
pit+0.049
ins+0.049
cer+0.049
ese+0.048
Neg logit contributions
Who-0.070
Listen-0.069
 crumbling-0.068
 burg-0.066
Po-0.066
Token named
Token ID3706
Feature activation+1.81e-02

Pos logit contributions
pit+0.073
ins+0.070
cer+0.067
ests+0.067
immers+0.066
Neg logit contributions
Who-0.050
Po-0.047
 knock-0.046
 crumbling-0.045
 knocked-0.045
Token Lily
Token ID20037
Feature activation+5.55e-02

Pos logit contributions
cer+0.023
an+0.022
ins+0.021
adow+0.021
ese+0.021
Neg logit contributions
Listen-0.026
Who-0.025
 crumbling-0.024
Um-0.023
Less-0.023
Token.
Token ID13
Feature activation-1.75e-02

Pos logit contributions
pit+0.101
ins+0.097
 majesty+0.094
ese+0.093
immers+0.091
Neg logit contributions
Who-0.050
Listen-0.047
 afford-0.044
 crumbling-0.044
Dead-0.041
Token She
Token ID1375
Feature activation+8.57e-03

Pos logit contributions
Listen+0.019
Who+0.018
 crumbling+0.017
 unbearable+0.016
Less+0.016
Neg logit contributions
cer-0.027
adow-0.026
an-0.026
ins-0.025
pit-0.025
Token run
Token ID1057
Feature activation-1.22e-03

Pos logit contributions
Listen+0.001
Who+0.001
 crumbling+0.001
Less+0.001
 unbearable+0.001
Neg logit contributions
adow-0.001
cer-0.001
an-0.001
pit-0.001
ese-0.001
Token away
Token ID1497
Feature activation-8.30e-03

Pos logit contributions
Listen+0.001
Who+0.001
 crumbling+0.001
 unbearable+0.001
 failed+0.001
Neg logit contributions
cer-0.002
adow-0.002
an-0.002
ins-0.002
ese-0.002
Token from
Token ID422
Feature activation+5.18e-03

Pos logit contributions
Listen+0.015
Who+0.014
 crumbling+0.014
Um+0.014
 unbearable+0.013
Neg logit contributions
cer-0.007
adow-0.007
an-0.006
pit-0.006
ins-0.006
Token the
Token ID262
Feature activation+3.72e-02

Pos logit contributions
adow+0.007
cer+0.007
an+0.007
ins+0.007
pit+0.007
Neg logit contributions
Listen-0.006
Who-0.006
 crumbling-0.006
 unbearable-0.005
Less-0.005
Token black
Token ID2042
Feature activation+7.47e-02

Pos logit contributions
cer+0.031
an+0.029
pit+0.028
ese+0.026
adow+0.026
Neg logit contributions
Listen-0.066
 unbearable-0.063
Um-0.062
Less-0.062
 crumbling-0.062
Token car
Token ID1097
Feature activation+8.12e-03

Pos logit contributions
adow+0.086
cer+0.083
ins+0.083
an+0.081
ese+0.079
Neg logit contributions
Listen-0.107
Who-0.104
 crumbling-0.099
 afford-0.096
 burg-0.095
Token.
Token ID13
Feature activation-2.14e-02

Pos logit contributions
ins+0.010
cer+0.010
an+0.010
adow+0.009
ese+0.009
Neg logit contributions
 crumbling-0.010
 failed-0.010
 afford-0.010
Who-0.009
Listen-0.009
Token They
Token ID1119
Feature activation-1.44e-02

Pos logit contributions
 unbearable+0.032
Listen+0.032
 crumbling+0.031
Less+0.030
 failed+0.030
Neg logit contributions
cer-0.022
room-0.021
adow-0.021
aw-0.021
pit-0.019
Token run
Token ID1057
Feature activation-6.96e-03

Pos logit contributions
Listen+0.016
Who+0.014
 crumbling+0.014
 unbearable+0.014
Less+0.014
Neg logit contributions
cer-0.022
adow-0.022
an-0.021
pit-0.021
ese-0.020
Token back
Token ID736
Feature activation-5.73e-03

Pos logit contributions
Listen+0.007
Who+0.007
 crumbling+0.006
 failed+0.006
 afford+0.006
Neg logit contributions
adow-0.011
ins-0.011
an-0.011
cer-0.011
ese-0.011
Token to
Token ID284
Feature activation-3.53e-02

Pos logit contributions
 crumbling+0.004
Listen+0.004
Who+0.004
 failed+0.004
 confusing+0.004
Neg logit contributions
ese-0.010
an-0.010
ins-0.010
cer-0.010
pit-0.010
Token to
Token ID284
Feature activation+2.56e-02

Pos logit contributions
cer+0.047
adow+0.046
an+0.045
ins+0.044
ie+0.043
Neg logit contributions
Listen-0.050
Who-0.048
Less-0.047
 crumbling-0.047
 unbearable-0.046
Token sleep
Token ID3993
Feature activation+3.36e-02

Pos logit contributions
cer+0.040
adow+0.040
an+0.039
ins+0.038
pit+0.038
Neg logit contributions
Listen-0.026
Who-0.025
 crumbling-0.024
 unbearable-0.023
Less-0.023
Token with
Token ID351
Feature activation+3.97e-02

Pos logit contributions
cer+0.039
an+0.037
adow+0.037
ins+0.036
pit+0.035
Neg logit contributions
Listen-0.048
Who-0.046
 crumbling-0.046
 unbearable-0.045
Less-0.044
Token his
Token ID465
Feature activation+6.27e-02

Pos logit contributions
cer+0.066
adow+0.064
an+0.064
pit+0.063
ins+0.061
Neg logit contributions
Listen-0.038
Who-0.037
 crumbling-0.036
 unbearable-0.035
 damage-0.034
Token mom
Token ID1995
Feature activation+4.30e-02

Pos logit contributions
cer+0.078
adow+0.074
an+0.073
ins+0.072
ese+0.071
Neg logit contributions
Listen-0.082
 crumbling-0.081
Who-0.081
 unbearable-0.079
 tempting-0.077
Token and
Token ID290
Feature activation+1.46e-02

Pos logit contributions
cer+0.050
an+0.049
ins+0.047
pit+0.045
 majesty+0.043
Neg logit contributions
Listen-0.057
Who-0.056
 crumbling-0.055
 tempting-0.054
 unbearable-0.054
Token dad
Token ID9955
Feature activation+2.55e-02

Pos logit contributions
an+0.023
cer+0.022
ins+0.022
ase+0.022
adow+0.021
Neg logit contributions
Who-0.016
Listen-0.016
Um-0.015
 crumbling-0.015
 failed-0.014
Token in
Token ID287
Feature activation+2.27e-02

Pos logit contributions
cer+0.026
an+0.025
adow+0.024
ins+0.023
pit+0.023
Neg logit contributions
Listen-0.040
Who-0.039
 crumbling-0.038
 unbearable-0.037
Less-0.037
Token their
Token ID511
Feature activation+6.08e-02

Pos logit contributions
cer+0.033
an+0.033
adow+0.032
ins+0.032
pit+0.031
Neg logit contributions
Listen-0.026
Who-0.025
 crumbling-0.024
 unbearable-0.023
Less-0.023
Token big
Token ID1263
Feature activation+7.46e-02

Pos logit contributions
cer+0.068
ins+0.065
adow+0.062
an+0.061
ese+0.059
Neg logit contributions
 crumbling-0.087
Who-0.087
Listen-0.087
 confusing-0.084
 tempting-0.083
Token bed
Token ID3996
Feature activation+1.98e-02

Pos logit contributions
cer+0.092
an+0.086
ins+0.085
adow+0.083
ese+0.079
Neg logit contributions
 crumbling-0.099
Listen-0.098
Who-0.096
 unbearable-0.094
 confusing-0.094
Token sn
Token ID3013
Feature activation-2.40e-02

Pos logit contributions
Listen+0.050
Who+0.049
 crumbling+0.047
 failed+0.046
Um+0.045
Neg logit contributions
cer-0.074
adow-0.073
an-0.072
ins-0.070
ase-0.069
Tokeniffs
Token ID10203
Feature activation-4.64e-02

Pos logit contributions
Listen+0.030
Who+0.027
 crumbling+0.027
Less+0.026
 unbearable+0.026
Neg logit contributions
cer-0.033
an-0.032
adow-0.032
ins-0.031
pit-0.031
Token the
Token ID262
Feature activation-1.76e-02

Pos logit contributions
Listen+0.044
Who+0.042
 crumbling+0.041
 unbearable+0.038
 tempting+0.037
Neg logit contributions
adow-0.074
cer-0.073
ese-0.072
an-0.072
ins-0.071
Token box
Token ID3091
Feature activation-6.47e-02

Pos logit contributions
Listen+0.032
Who+0.030
Um+0.030
 unbearable+0.030
 crumbling+0.030
Neg logit contributions
cer-0.015
an-0.015
pit-0.013
adow-0.013
ese-0.013
Token and
Token ID290
Feature activation-5.12e-02

Pos logit contributions
Listen+0.098
 crumbling+0.097
 unbearable+0.094
Who+0.094
 tempting+0.091
Neg logit contributions
cer-0.067
ins-0.064
an-0.063
adow-0.062
ese-0.061
Token tries
Token ID8404
Feature activation-6.21e-02

Pos logit contributions
Listen+0.060
Who+0.056
 crumbling+0.055
 unbearable+0.053
Less+0.052
Neg logit contributions
cer-0.074
adow-0.072
an-0.072
ins-0.069
pit-0.069
Token to
Token ID284
Feature activation-5.03e-02

Pos logit contributions
Listen+0.103
Um+0.099
Who+0.098
 unbearable+0.097
Less+0.097
Neg logit contributions
cer-0.056
er-0.053
adow-0.050
 ways-0.050
ins-0.049
Token open
Token ID1280
Feature activation-4.41e-02

Pos logit contributions
Listen+0.047
Who+0.042
 crumbling+0.042
Less+0.040
 unbearable+0.040
Neg logit contributions
cer-0.085
an-0.083
adow-0.082
pit-0.079
ese-0.079
Token it
Token ID340
Feature activation-6.59e-02

Pos logit contributions
Listen+0.050
Who+0.047
 crumbling+0.045
Um+0.044
 unbearable+0.044
Neg logit contributions
cer-0.066
an-0.063
adow-0.063
pit-0.061
ins-0.061
Token.
Token ID13
Feature activation-6.14e-02

Pos logit contributions
Listen+0.128
Who+0.121
Less+0.117
 crumbling+0.117
Um+0.117
Neg logit contributions
cer-0.048
adow-0.045
an-0.045
pit-0.044
er-0.043
Token The
Token ID383
Feature activation-1.54e-02

Pos logit contributions
Listen+0.089
 crumbling+0.088
 unbearable+0.085
 afford+0.083
Who+0.083
Neg logit contributions
adow-0.063
room-0.063
aw-0.062
an-0.060
cer-0.059
Token Sam
Token ID3409
Feature activation-2.22e-02

Pos logit contributions
Listen+0.053
 crumbling+0.050
Who+0.050
 unbearable+0.049
Less+0.048
Neg logit contributions
cer-0.060
adow-0.057
an-0.056
pit-0.056
ins-0.055
Token and
Token ID290
Feature activation-2.01e-02

Pos logit contributions
Listen+0.035
Who+0.033
 crumbling+0.033
 unbearable+0.032
Less+0.032
Neg logit contributions
cer-0.023
adow-0.023
an-0.023
pit-0.022
ins-0.022
Token Lily
Token ID20037
Feature activation-1.93e-02

Pos logit contributions
Listen+0.031
Who+0.031
 crumbling+0.028
Um+0.028
 Who+0.028
Neg logit contributions
cer-0.023
adow-0.022
an-0.022
ese-0.020
ins-0.020
Token are
Token ID389
Feature activation-3.79e-02

Pos logit contributions
Listen+0.029
Um+0.026
 crumbling+0.026
 unbearable+0.026
Who+0.026
Neg logit contributions
adow-0.023
cer-0.023
pit-0.022
ese-0.022
an-0.021
Token playing
Token ID2712
Feature activation-2.22e-02

Pos logit contributions
Listen+0.038
Who+0.038
 crumbling+0.038
 unbearable+0.036
 tempting+0.035
Neg logit contributions
cer-0.060
an-0.058
adow-0.057
ase-0.055
ins-0.054
Token with
Token ID351
Feature activation-2.03e-02

Pos logit contributions
Who+0.031
Listen+0.030
 crumbling+0.028
 tug+0.028
Less+0.027
Neg logit contributions
adow-0.026
an-0.026
cer-0.025
ie-0.024
ins-0.024
Token a
Token ID257
Feature activation-2.88e-02

Pos logit contributions
Who+0.020
Listen+0.020
 crumbling+0.020
 unbearable+0.018
 damage+0.018
Neg logit contributions
adow-0.032
an-0.031
cer-0.031
ins-0.029
pit-0.029
Token ball
Token ID2613
Feature activation-2.13e-02

Pos logit contributions
Listen+0.041
Who+0.039
 crumbling+0.038
 unbearable+0.038
Less+0.037
Neg logit contributions
cer-0.035
an-0.033
adow-0.033
pit-0.032
ins-0.032
Token.
Token ID13
Feature activation-4.14e-02

Pos logit contributions
 crumbling+0.024
Listen+0.023
Who+0.023
 failed+0.023
 afford+0.023
Neg logit contributions
an-0.027
ins-0.027
adow-0.027
cer-0.027
 majesty-0.026
Token Tom
Token ID4186
Feature activation-1.72e-02

Pos logit contributions
Listen+0.047
Who+0.044
 crumbling+0.042
Um+0.041
 unbearable+0.041
Neg logit contributions
cer-0.062
adow-0.060
an-0.060
ins-0.058
pit-0.058
Token and
Token ID290
Feature activation-2.65e-02

Pos logit contributions
Listen+0.026
Who+0.025
 crumbling+0.025
 unbearable+0.024
Less+0.024
Neg logit contributions
cer-0.019
adow-0.018
an-0.018
ins-0.017
pit-0.017
Token.
Token ID13
Feature activation+8.07e-04

Pos logit contributions
cer+0.019
ins+0.018
ese+0.017
pit+0.017
an+0.017
Neg logit contributions
Listen-0.017
 crumbling-0.017
 afford-0.017
 unbearable-0.016
Who-0.016
Token\n
Token ID198
Feature activation+3.99e-03

Pos logit contributions
cer+0.001
adow+0.001
an+0.001
ins+0.001
pit+0.001
Neg logit contributions
Listen-0.001
Who-0.001
 crumbling-0.001
Less-0.001
Um-0.001
Token\n
Token ID198
Feature activation-5.59e-03

Pos logit contributions
cer+0.006
adow+0.006
an+0.006
pit+0.006
ins+0.006
Neg logit contributions
Listen-0.004
Who-0.004
 crumbling-0.004
Less-0.004
Um-0.004
TokenMax
Token ID11518
Feature activation+1.45e-02

Pos logit contributions
Listen+0.007
Who+0.006
 crumbling+0.006
 unbearable+0.006
Um+0.006
Neg logit contributions
cer-0.008
an-0.008
adow-0.008
ins-0.007
pit-0.007
Token started
Token ID2067
Feature activation-3.91e-03

Pos logit contributions
cer+0.019
an+0.018
adow+0.018
pit+0.017
ins+0.017
Neg logit contributions
Listen-0.019
Who-0.018
 crumbling-0.018
Less-0.017
 unbearable-0.017
Token to
Token ID284
Feature activation-2.86e-03

Pos logit contributions
Listen+0.005
Who+0.005
Less+0.005
Um+0.005
 crumbling+0.005
Neg logit contributions
cer-0.005
an-0.004
adow-0.004
ins-0.004
pit-0.004
Token shoot
Token ID2686
Feature activation-2.07e-03

Pos logit contributions
Listen+0.003
Who+0.003
Less+0.003
Um+0.003
 crumbling+0.003
Neg logit contributions
cer-0.004
an-0.004
adow-0.004
ie-0.004
pit-0.004
Token with
Token ID351
Feature activation+1.09e-02

Pos logit contributions
Listen+0.002
Who+0.002
 crumbling+0.002
 unbearable+0.002
Less+0.002
Neg logit contributions
adow-0.003
cer-0.003
an-0.003
pit-0.003
ins-0.003
Token the
Token ID262
Feature activation+3.00e-02

Pos logit contributions
adow+0.017
an+0.017
cer+0.017
ins+0.017
pit+0.016
Neg logit contributions
Listen-0.011
 crumbling-0.011
Who-0.011
 unbearable-0.011
 failed-0.010
Token microphone
Token ID21822
Feature activation-9.14e-03

Pos logit contributions
adow+0.034
cer+0.034
an+0.032
pit+0.032
ins+0.031
Neg logit contributions
Listen-0.044
Who-0.043
 crumbling-0.042
 unbearable-0.040
Less-0.040
Token.
Token ID13
Feature activation-2.02e-02

Pos logit contributions
Listen+0.016
Who+0.016
Um+0.015
Less+0.015
 crumbling+0.015
Neg logit contributions
cer-0.008
an-0.008
adow-0.008
pit-0.007
ins-0.007
Token out
Token ID503
Feature activation-5.99e-02

Pos logit contributions
Listen+0.085
Who+0.082
Um+0.079
Less+0.078
 crumbling+0.078
Neg logit contributions
cer-0.051
an-0.049
adow-0.046
pit-0.046
er-0.045
Token and
Token ID290
Feature activation-4.54e-02

Pos logit contributions
Listen+0.135
Um+0.128
Less+0.126
Who+0.125
 afford+0.125
Neg logit contributions
er-0.027
cer-0.025
ie-0.024
room-0.021
pit-0.020
Token said
Token ID531
Feature activation-5.45e-02

Pos logit contributions
Listen+0.057
 afford+0.052
Warning+0.051
 crumbling+0.051
Who+0.050
Neg logit contributions
cer-0.063
an-0.057
ie-0.057
er-0.055
adow-0.055
Token,
Token ID11
Feature activation-4.52e-02

Pos logit contributions
Listen+0.101
Who+0.098
 crumbling+0.097
Less+0.095
 unbearable+0.095
Neg logit contributions
cer-0.039
adow-0.037
an-0.036
er-0.035
ins-0.034
Token "
Token ID366
Feature activation-4.20e-02

Pos logit contributions
 afford+0.090
 damage+0.089
 failed+0.087
Warning+0.087
Listen+0.086
Neg logit contributions
er-0.021
 and-0.016
 when-0.016
 knights-0.015
 where-0.015
TokenSur
Token ID14214
Feature activation-2.93e-02

Pos logit contributions
Listen+0.036
 crumbling+0.034
Who+0.033
 unbearable+0.032
 tempting+0.031
Neg logit contributions
cer-0.073
an-0.070
adow-0.069
ins-0.068
ese-0.068
Tokenprise
Token ID7919
Feature activation-8.41e-03

Pos logit contributions
Listen+0.028
Who+0.026
Less+0.024
 crumbling+0.024
Um+0.024
Neg logit contributions
cer-0.048
an-0.047
pit-0.046
adow-0.046
ins-0.044
Token!"
Token ID2474
Feature activation-4.56e-02

Pos logit contributions
Who+0.011
 Who+0.011
Listen+0.010
 Really+0.010
 tempting+0.010
Neg logit contributions
cer-0.010
an-0.009
adow-0.009
eh-0.009
ins-0.009
Token The
Token ID383
Feature activation-4.74e-04

Pos logit contributions
Listen+0.054
 crumbling+0.050
Who+0.050
 unbearable+0.048
Um+0.048
Neg logit contributions
cer-0.065
adow-0.064
an-0.063
ins-0.060
ese-0.060
Token lion
Token ID18744
Feature activation-1.76e-02

Pos logit contributions
Listen+0.001
Who+0.001
 crumbling+0.001
 unbearable+0.001
Less+0.001
Neg logit contributions
cer-0.000
adow-0.000
an-0.000
ins-0.000
pit-0.000
Token laughed
Token ID13818
Feature activation-2.42e-02

Pos logit contributions
Listen+0.024
Who+0.022
 crumbling+0.021
Less+0.021
 unbearable+0.021
Neg logit contributions
cer-0.023
an-0.023
adow-0.022
pit-0.021
ese-0.021
Token.
Token ID13
Feature activation-2.09e-02

Pos logit contributions
Listen+0.038
Who+0.037
 crumbling+0.036
Um+0.036
Less+0.035
Neg logit contributions
cer-0.018
adow-0.017
an-0.017
ins-0.016
pit-0.016
Token They
Token ID1119
Feature activation-2.29e-02

Pos logit contributions
 unbearable+0.032
 crumbling+0.032
Listen+0.032
 persever+0.032
Less+0.031
Neg logit contributions
room-0.019
aw-0.019
cer-0.018
pit-0.017
er-0.017
Token wanted
Token ID2227
Feature activation-1.69e-02

Pos logit contributions
Listen+0.026
Who+0.024
 crumbling+0.024
Less+0.023
 unbearable+0.023
Neg logit contributions
cer-0.033
an-0.032
adow-0.032
ese-0.030
aw-0.030
Token to
Token ID284
Feature activation-1.92e-02

Pos logit contributions
Listen+0.025
Less+0.025
Who+0.024
 crumbling+0.024
Warning+0.024
Neg logit contributions
 ways-0.016
cer-0.015
ie-0.015
er-0.015
 knights-0.015
Token go
Token ID467
Feature activation-2.58e-02

Pos logit contributions
Who+0.012
 damage+0.012
Listen+0.011
 afford+0.011
 break+0.011
Neg logit contributions
ins-0.040
pit-0.039
cer-0.039
adow-0.038
ase-0.037
Token back
Token ID736
Feature activation-2.37e-02

Pos logit contributions
Listen+0.034
Who+0.032
 crumbling+0.031
 afford+0.030
 unbearable+0.030
Neg logit contributions
cer-0.034
adow-0.032
an-0.032
ins-0.032
pit-0.031
Token to
Token ID284
Feature activation-3.19e-02

Pos logit contributions
Who+0.021
Listen+0.020
 crumbling+0.019
 damage+0.019
 afford+0.018
Neg logit contributions
an-0.040
ins-0.039
ese-0.039
pit-0.038
eh-0.038
Token the
Token ID262
Feature activation+4.25e-02

Pos logit contributions
Listen+0.036
Who+0.035
 crumbling+0.033
Less+0.031
 unbearable+0.031
Neg logit contributions
cer-0.048
an-0.047
adow-0.047
ins-0.046
pit-0.045
Token park
Token ID3952
Feature activation+4.35e-03

Pos logit contributions
adow+0.046
cer+0.045
ins+0.043
an+0.042
pit+0.041
Neg logit contributions
Listen-0.066
Who-0.065
 crumbling-0.062
 damage-0.060
Less-0.060
Token.
Token ID13
Feature activation-2.45e-02

Pos logit contributions
ins+0.006
an+0.006
cer+0.006
eh+0.006
adow+0.006
Neg logit contributions
Who-0.005
 afford-0.005
 failed-0.005
Listen-0.005
 crumbling-0.005
Token They
Token ID1119
Feature activation-2.46e-02

Pos logit contributions
 crumbling+0.033
Listen+0.033
 unbearable+0.033
Who+0.031
Less+0.031
Neg logit contributions
cer-0.029
adow-0.026
room-0.026
aw-0.026
an-0.026