TOP ACTIVATIONS
MAX = 0.548

angry . They cried and blamed each other for losing the
sad and angry . He blamed Mia for not helping him
time . They yelled and blamed and fired people .
crying and angry . He blamed Sam for his pain .
. He was angry and blamed Mia . He said it
Mill ie blamed her mom for why she
sad and angry . They blamed Tom and Mia for being
sad and angry . He blamed Mia for making him fall
what happened . They both blamed each other . They both
's your fault !" Mia blamed Sam . Their
what you did !" Sam blamed Mia . "
he just kept sh ru gging . Rob by thought this
everyone had been sh ru gging when she was looking for
Tom guesses , sh ru gging . They turn
Ben said , sh ru gging . They saw
, who is surprised and amused . She says they are
Mom was not amused . She took off the
angry . She is also amused . She laughs . She
not surprised . She is amused . She tells them the
lady was surprised , but amused by the young child

MIN ACTIVATIONS
MIN = -0.508

, and always made sure to listen to her mom .
stool , so she decided to sit down .
both of them had something to remember each other by .
out and look for something to chew . He
she had figured out how to attach the car to the
music , so she went to the neighbor 's house to
on so she went up to her mom and asked her
that he had been able to climb to the top .
and looked around for something to wear . Suddenly , she
now it was too late to yield . <|endoftext|> Once there
tall , so she had to stand on her t ipp
beautiful , so he decided to design something that he could
all sticky so he decided to take a bath . He
, so he used it to open the chest . Inside
!" They start to fight . They push and
around , so he decided to keep looking through the store
hills that were very tricky to climb . Daisy was very
nest , so she decided to give him a helping hand
were so many fun things to look at . She saw
time , she made sure to put them in a basket

INTERVAL 0.284 - 0.548
CONTAINS 1.478%

play together nicely . They both agreed and had fun playing
apple juice . They are happy and say thank you .
size . She was so happy that she succeeded in finding
ground . The cat was happy and safe , all thanks
small they were . He learned that it 's important to

INTERVAL 0.020 - 0.284
CONTAINS 43.498%

does not like dolls . He thinks they are boring .
and try each other 's toys . They found out that
He was not angry . He was glad that his son
big fish named Max . Max looked strong and happy .
a long time . Every time he took a step forward

INTERVAL -0.244 - 0.020
CONTAINS 49.756%

cried and screamed . They had cuts and bruises . They
be restored . I can fix it for you ." Lily
so excited ! When they got to the
. Despite being scared , she had still allowed her uncle
The man lay down on the grass and soon began to

INTERVAL -0.508 - -0.244
CONTAINS 5.267%

was painting , everyone wanted to watch . One
day , her family went to the store to buy a
she needed ! She carefully we aved them around her toe
was a girl who loved to pick tomatoes from her garden
the swing . He says to Mia , " Let me
Token angry
Token ID7954
Feature activation+4.10e-01

Pos logit contributions
aron+0.017
gging+0.015
ady+0.015
cks+0.015
aky+0.015
Neg logit contributions
 to-0.024
 arrived-0.021
ho-0.021
 hill-0.021
t-0.020
Token.
Token ID13
Feature activation+1.89e-01

Pos logit contributions
 blamed+0.738
 amused+0.691
cks+0.678
aky+0.677
gging+0.670
Neg logit contributions
 to-0.876
t-0.780
 hill-0.713
anger-0.706
ho-0.661
Token They
Token ID1119
Feature activation+1.56e-01

Pos logit contributions
 blamed+0.357
 amused+0.343
aky+0.340
gging+0.337
cks+0.332
Neg logit contributions
 to-0.323
t-0.321
 hill-0.307
 ladder-0.291
anger-0.289
Token cried
Token ID16896
Feature activation+3.69e-01

Pos logit contributions
cks+0.301
aron+0.295
 amused+0.281
gging+0.281
ags+0.279
Neg logit contributions
 to-0.274
 arrived-0.264
 would-0.245
t-0.242
 hill-0.238
Token and
Token ID290
Feature activation+1.10e-02

Pos logit contributions
cks+0.952
 blamed+0.921
eling+0.901
 amused+0.884
gging+0.866
Neg logit contributions
 to-0.563
t-0.421
 hill-0.358
anger-0.346
 arrived-0.330
Token blamed
Token ID13772
Feature activation+5.48e-01

Pos logit contributions
iver+0.019
ady+0.019
aky+0.019
cks+0.019
aron+0.019
Neg logit contributions
 to-0.022
 arrived-0.019
ho-0.018
 hill-0.018
t-0.017
Token each
Token ID1123
Feature activation+1.27e-01

Pos logit contributions
aky+1.188
 blamed+1.159
 amused+1.138
ady+1.098
aron+1.095
Neg logit contributions
 to-0.938
 hill-0.769
ho-0.754
oes-0.683
 ladder-0.648
Token other
Token ID584
Feature activation+3.05e-02

Pos logit contributions
 blamed+0.225
 amused+0.204
gging+0.202
 ashamed+0.197
cks+0.194
Neg logit contributions
t-0.238
 hill-0.235
 to-0.233
anger-0.230
 ladder-0.219
Token for
Token ID329
Feature activation-6.92e-02

Pos logit contributions
 blamed+0.059
gging+0.054
 amused+0.054
cks+0.053
 ashamed+0.051
Neg logit contributions
 to-0.055
t-0.052
 hill-0.051
anger-0.049
 ladder-0.048
Token losing
Token ID6078
Feature activation+1.85e-01

Pos logit contributions
 to+0.131
 prize+0.100
 hill+0.099
 visit+0.093
 ladder+0.092
Neg logit contributions
aron-0.140
aky-0.135
ady-0.130
 blamed-0.127
gging-0.124
Token the
Token ID262
Feature activation+2.22e-02

Pos logit contributions
 blamed+0.372
 amused+0.347
aky+0.338
gging+0.325
cks+0.325
Neg logit contributions
 to-0.346
 hill-0.290
 ladder-0.272
 prize-0.260
anger-0.259
Token sad
Token ID6507
Feature activation+2.85e-01

Pos logit contributions
aron+0.209
 blamed+0.205
gging+0.190
ities+0.187
 unison+0.186
Neg logit contributions
 to-0.370
t-0.355
 ladder-0.342
 hill-0.338
ho-0.324
Token and
Token ID290
Feature activation+1.02e-02

Pos logit contributions
 blamed+0.457
 amused+0.409
gging+0.403
cks+0.395
 ashamed+0.383
Neg logit contributions
 to-0.605
t-0.591
 hill-0.585
anger-0.567
 ladder-0.544
Token angry
Token ID7954
Feature activation+4.09e-01

Pos logit contributions
aron+0.016
gging+0.015
 blamed+0.015
cks+0.015
aky+0.015
Neg logit contributions
 to-0.022
 hill-0.020
t-0.020
ho-0.019
 arrived-0.019
Token.
Token ID13
Feature activation+1.86e-01

Pos logit contributions
 blamed+0.756
 amused+0.697
cks+0.685
aky+0.669
gging+0.669
Neg logit contributions
 to-0.869
t-0.763
anger-0.722
 hill-0.712
ho-0.655
Token He
Token ID679
Feature activation+6.72e-02

Pos logit contributions
 blamed+0.370
cks+0.366
ck+0.363
gging+0.358
 amused+0.357
Neg logit contributions
 to-0.311
t-0.287
 hill-0.280
 ladder-0.265
anger-0.260
Token blamed
Token ID13772
Feature activation+5.47e-01

Pos logit contributions
cks+0.115
gging+0.115
 amused+0.115
ags+0.114
aron+0.113
Neg logit contributions
 to-0.123
 arrived-0.114
t-0.113
 hill-0.111
 would-0.104
Token Mia
Token ID32189
Feature activation+1.20e-01

Pos logit contributions
 blamed+1.116
 amused+1.023
 ashamed+0.982
aron+0.980
aky+0.959
Neg logit contributions
 to-0.997
 hill-0.881
ho-0.833
oes-0.825
t-0.787
Token for
Token ID329
Feature activation-6.88e-02

Pos logit contributions
 blamed+0.242
 amused+0.221
gging+0.220
 ashamed+0.215
cks+0.213
Neg logit contributions
t-0.195
 to-0.193
 hill-0.192
anger-0.187
 ladder-0.177
Token not
Token ID407
Feature activation+1.71e-01

Pos logit contributions
 to+0.133
 hill+0.114
t+0.110
 ladder+0.108
 arrived+0.105
Neg logit contributions
 blamed-0.123
aron-0.122
aky-0.119
gging-0.117
cks-0.113
Token helping
Token ID5742
Feature activation+3.25e-02

Pos logit contributions
 blamed+0.315
 amused+0.286
gging+0.285
 ashamed+0.277
cks+0.275
Neg logit contributions
t-0.307
 to-0.305
 hill-0.303
anger-0.296
 ladder-0.282
Token him
Token ID683
Feature activation+5.59e-02

Pos logit contributions
 blamed+0.054
 amused+0.049
gging+0.048
 ashamed+0.047
cks+0.046
Neg logit contributions
 to-0.065
t-0.064
 hill-0.064
anger-0.062
oes-0.060
Token time
Token ID640
Feature activation+4.92e-02

Pos logit contributions
 blamed+0.010
gging+0.009
 amused+0.009
cks+0.009
 ashamed+0.009
Neg logit contributions
 to-0.010
t-0.010
 hill-0.010
anger-0.010
 ladder-0.010
Token.
Token ID13
Feature activation+1.84e-01

Pos logit contributions
 blamed+0.126
 amused+0.118
gging+0.117
 ashamed+0.115
cks+0.114
Neg logit contributions
t-0.052
 hill-0.051
 to-0.050
anger-0.050
 ladder-0.045
Token They
Token ID1119
Feature activation+1.53e-01

Pos logit contributions
 blamed+0.351
 amused+0.347
ck+0.347
cks+0.346
gging+0.346
Neg logit contributions
 to-0.311
t-0.302
 hill-0.281
anger-0.270
 ladder-0.260
Token yelled
Token ID22760
Feature activation+2.11e-01

Pos logit contributions
gging+0.265
cks+0.255
 amused+0.250
 blamed+0.249
aron+0.246
Neg logit contributions
 to-0.285
t-0.279
 arrived-0.271
 hill-0.269
 ladder-0.252
Token and
Token ID290
Feature activation+5.07e-03

Pos logit contributions
 blamed+0.425
cks+0.401
gging+0.391
 amused+0.384
ck+0.371
Neg logit contributions
 to-0.370
t-0.343
anger-0.324
 hill-0.310
 arrived-0.298
Token blamed
Token ID13772
Feature activation+5.47e-01

Pos logit contributions
gging+0.008
aron+0.008
iver+0.008
cks+0.008
aky+0.008
Neg logit contributions
 to-0.010
 arrived-0.009
 hill-0.009
t-0.009
 ladder-0.009
Token and
Token ID290
Feature activation+5.25e-03

Pos logit contributions
 blamed+1.106
 amused+1.041
 ashamed+1.004
aky+0.990
aron+0.974
Neg logit contributions
 to-0.992
 hill-0.872
ho-0.785
or-0.779
 stair-0.763
Token fired
Token ID6294
Feature activation+9.62e-02

Pos logit contributions
gging+0.007
aron+0.007
 blamed+0.007
cks+0.007
aky+0.007
Neg logit contributions
 to-0.012
t-0.011
 arrived-0.011
 hill-0.010
ho-0.010
Token people
Token ID661
Feature activation+5.24e-02

Pos logit contributions
 blamed+0.170
 amused+0.151
gging+0.149
cks+0.143
aky+0.143
Neg logit contributions
 to-0.199
t-0.177
 hill-0.175
anger-0.171
 ladder-0.166
Token.
Token ID13
Feature activation+1.84e-01

Pos logit contributions
 blamed+0.089
 amused+0.081
gging+0.079
 ashamed+0.078
cks+0.078
Neg logit contributions
t-0.099
 hill-0.099
anger-0.098
 plunge-0.093
 to-0.092
Token\n
Token ID198
Feature activation-5.66e-02

Pos logit contributions
eling+0.379
ck+0.377
cks+0.376
gging+0.373
 blamed+0.373
Neg logit contributions
 to-0.287
t-0.281
 hill-0.250
anger-0.243
 ladder-0.236
Token crying
Token ID13774
Feature activation+9.67e-02

Pos logit contributions
 to+0.270
 door+0.183
 home+0.181
't+0.180
 ladder+0.179
Neg logit contributions
itive-0.247
leness-0.239
ities-0.230
ishment-0.225
 unison-0.225
Token and
Token ID290
Feature activation+7.61e-03

Pos logit contributions
 blamed+0.244
cks+0.237
 amused+0.224
ck+0.222
eling+0.220
Neg logit contributions
 to-0.140
t-0.098
 hill-0.093
 door-0.082
s-0.081
Token angry
Token ID7954
Feature activation+4.09e-01

Pos logit contributions
aron+0.014
gging+0.014
ady+0.013
 blamed+0.013
aky+0.013
Neg logit contributions
 to-0.014
 hill-0.013
 arrived-0.012
 ladder-0.012
ho-0.012
Token.
Token ID13
Feature activation+1.84e-01

Pos logit contributions
 blamed+0.720
 amused+0.676
cks+0.673
aky+0.654
gging+0.653
Neg logit contributions
 to-0.881
t-0.786
anger-0.706
 hill-0.694
oes-0.690
Token He
Token ID679
Feature activation+6.55e-02

Pos logit contributions
ck+0.385
 blamed+0.382
cks+0.374
eling+0.369
 amused+0.366
Neg logit contributions
 to-0.293
 hill-0.274
t-0.271
 ladder-0.255
anger-0.253
Token blamed
Token ID13772
Feature activation+5.47e-01

Pos logit contributions
cks+0.134
ags+0.131
gging+0.127
 amused+0.124
aron+0.122
Neg logit contributions
 to-0.103
 arrived-0.096
 would-0.091
t-0.088
 hill-0.083
Token Sam
Token ID3409
Feature activation+7.44e-02

Pos logit contributions
 blamed+1.084
 amused+1.045
aron+0.995
 ashamed+0.975
aky+0.959
Neg logit contributions
 to-0.987
 hill-0.911
oes-0.868
ho-0.818
t-0.816
Token for
Token ID329
Feature activation-7.08e-02

Pos logit contributions
 blamed+0.156
 amused+0.143
gging+0.142
cks+0.138
 ashamed+0.137
Neg logit contributions
 to-0.125
t-0.115
 hill-0.112
anger-0.106
 ladder-0.104
Token his
Token ID465
Feature activation+1.47e-01

Pos logit contributions
 to+0.138
 hill+0.113
t+0.111
 ladder+0.107
 prize+0.103
Neg logit contributions
aron-0.127
 blamed-0.127
aky-0.124
ady-0.121
gging-0.119
Token pain
Token ID2356
Feature activation+2.92e-01

Pos logit contributions
 blamed+0.301
 amused+0.277
gging+0.274
 ashamed+0.271
cks+0.265
Neg logit contributions
t-0.234
 hill-0.230
anger-0.225
 to-0.225
 ladder-0.211
Token.
Token ID13
Feature activation+1.85e-01

Pos logit contributions
 blamed+0.468
 amused+0.421
gging+0.409
cks+0.396
 ashamed+0.395
Neg logit contributions
 to-0.616
t-0.605
 hill-0.589
anger-0.579
oes-0.555
Token.
Token ID13
Feature activation+1.83e-01

Pos logit contributions
 blamed+0.844
gging+0.774
 amused+0.773
 ashamed+0.763
 innocent+0.742
Neg logit contributions
 hill-0.689
t-0.688
anger-0.676
 ladder-0.640
 plunge-0.635
Token He
Token ID679
Feature activation+6.39e-02

Pos logit contributions
 blamed+0.351
 amused+0.328
gging+0.324
cks+0.321
ck+0.318
Neg logit contributions
 to-0.322
t-0.310
 hill-0.307
anger-0.291
 ladder-0.288
Token was
Token ID373
Feature activation-1.40e-01

Pos logit contributions
cks+0.124
gging+0.124
 amused+0.123
ags+0.120
 blamed+0.119
Neg logit contributions
 to-0.100
 arrived-0.098
t-0.097
 hill-0.091
ho-0.087
Token angry
Token ID7954
Feature activation+4.07e-01

Pos logit contributions
 to+0.312
 door+0.236
 home+0.222
 ladder+0.220
t+0.220
Neg logit contributions
itive-0.220
gging-0.214
aron-0.212
 unison-0.198
leness-0.192
Token and
Token ID290
Feature activation+6.81e-03

Pos logit contributions
 blamed+0.721
 amused+0.656
gging+0.647
 ashamed+0.638
cks+0.620
Neg logit contributions
t-0.753
 hill-0.752
anger-0.739
 to-0.718
 ladder-0.704
Token blamed
Token ID13772
Feature activation+5.46e-01

Pos logit contributions
 blamed+0.007
gging+0.007
aron+0.007
cks+0.007
 years+0.007
Neg logit contributions
 to-0.017
t-0.017
 hill-0.016
 ladder-0.016
ho-0.016
Token Mia
Token ID32189
Feature activation+1.18e-01

Pos logit contributions
 blamed+1.089
 amused+1.004
aron+0.985
 ashamed+0.934
aky+0.919
Neg logit contributions
 to-1.028
 hill-0.935
oes-0.879
ho-0.877
t-0.854
Token.
Token ID13
Feature activation+1.84e-01

Pos logit contributions
 blamed+0.220
gging+0.201
 amused+0.199
cks+0.192
 ashamed+0.189
Neg logit contributions
 to-0.219
t-0.210
 hill-0.205
anger-0.198
 ladder-0.193
Token He
Token ID679
Feature activation+6.43e-02

Pos logit contributions
 blamed+0.395
ck+0.386
gging+0.385
cks+0.381
 amused+0.377
Neg logit contributions
 to-0.282
t-0.257
 hill-0.245
 ladder-0.235
anger-0.221
Token said
Token ID531
Feature activation+1.70e-01

Pos logit contributions
cks+0.122
gging+0.122
ags+0.121
 amused+0.115
iver+0.115
Neg logit contributions
 to-0.104
 arrived-0.101
t-0.094
 would-0.090
 hill-0.089
Token it
Token ID340
Feature activation-1.74e-01

Pos logit contributions
 blamed+0.371
 ashamed+0.342
 amused+0.340
 sorry+0.330
gging+0.330
Neg logit contributions
 hill-0.265
anger-0.254
t-0.250
 plunge-0.239
oes-0.238
Token
Token ID220
Feature activation-1.57e-02

Pos logit contributions
 blamed+0.050
 ashamed+0.045
 amused+0.044
 innocent+0.041
 years+0.041
Neg logit contributions
anger-0.049
 hill-0.049
t-0.048
 plunge-0.046
oes-0.045
Token\n
Token ID198
Feature activation-5.09e-02

Pos logit contributions
t+0.023
 to+0.023
 hill+0.022
anger+0.022
 ladder+0.021
Neg logit contributions
 blamed-0.034
 amused-0.032
gging-0.032
 ashamed-0.031
cks-0.031
Token\n
Token ID198
Feature activation-5.09e-02

Pos logit contributions
anger+0.089
 hill+0.088
t+0.087
oes+0.084
 to+0.083
Neg logit contributions
 blamed-0.095
 amused-0.087
 ashamed-0.083
gging-0.083
 years-0.081
TokenMill
Token ID22603
Feature activation+3.06e-02

Pos logit contributions
anger+0.106
 hill+0.103
t+0.101
oes+0.101
 to+0.099
Neg logit contributions
 blamed-0.080
 amused-0.072
 years-0.069
gging-0.068
 ashamed-0.068
Tokenie
Token ID494
Feature activation-2.96e-02

Pos logit contributions
 blamed+0.062
 amused+0.057
gging+0.056
 ashamed+0.055
cks+0.055
Neg logit contributions
t-0.049
 hill-0.048
 to-0.047
anger-0.047
 ladder-0.044
Token blamed
Token ID13772
Feature activation+5.46e-01

Pos logit contributions
t+0.047
 hill+0.047
 to+0.047
anger+0.045
 ladder+0.043
Neg logit contributions
 blamed-0.060
 amused-0.055
gging-0.055
 ashamed-0.053
cks-0.053
Token her
Token ID607
Feature activation+2.13e-01

Pos logit contributions
 blamed+0.890
 amused+0.812
aky+0.786
cks+0.763
 ashamed+0.756
Neg logit contributions
 to-1.212
 hill-1.095
ho-1.054
t-1.046
 ladder-1.045
Token mom
Token ID1995
Feature activation-1.32e-02

Pos logit contributions
 blamed+0.419
gging+0.383
 amused+0.382
cks+0.371
 ashamed+0.370
Neg logit contributions
t-0.360
 hill-0.356
 to-0.355
anger-0.347
 ladder-0.331
Token for
Token ID329
Feature activation-7.13e-02

Pos logit contributions
t+0.020
 to+0.020
 hill+0.020
anger+0.019
 ladder+0.018
Neg logit contributions
 blamed-0.028
 amused-0.026
gging-0.025
 ashamed-0.025
cks-0.025
Token why
Token ID1521
Feature activation+1.36e-01

Pos logit contributions
 to+0.132
 hill+0.118
t+0.113
 plunge+0.112
 arrived+0.110
Neg logit contributions
 blamed-0.135
gging-0.125
aky-0.123
cks-0.123
aron-0.122
Token she
Token ID673
Feature activation-6.53e-02

Pos logit contributions
 blamed+0.304
gging+0.282
 amused+0.279
cks+0.275
 ashamed+0.270
Neg logit contributions
t-0.194
 to-0.194
 hill-0.188
anger-0.181
 ladder-0.176
Token sad
Token ID6507
Feature activation+2.81e-01

Pos logit contributions
gging+0.027
 blamed+0.027
aron+0.026
itive+0.026
ities+0.024
Neg logit contributions
 to-0.047
 arrived-0.039
t-0.038
 ladder-0.038
 home-0.037
Token and
Token ID290
Feature activation+2.75e-03

Pos logit contributions
 blamed+0.511
 amused+0.464
gging+0.460
 ashamed+0.449
cks+0.444
Neg logit contributions
t-0.513
 hill-0.507
 to-0.502
anger-0.496
 ladder-0.472
Token angry
Token ID7954
Feature activation+4.06e-01

Pos logit contributions
aron+0.003
gging+0.003
 blamed+0.003
cks+0.003
ady+0.003
Neg logit contributions
 to-0.007
t-0.007
 hill-0.006
ho-0.006
 arrived-0.006
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
 blamed+0.675
 amused+0.615
gging+0.605
aky+0.593
cks+0.581
Neg logit contributions
 to-0.881
t-0.816
 hill-0.780
anger-0.769
ho-0.731
Token They
Token ID1119
Feature activation+1.52e-01

Pos logit contributions
gging+0.370
eling+0.367
ck+0.364
aky+0.363
 blamed+0.358
Neg logit contributions
 to-0.274
t-0.264
 hill-0.253
 ladder-0.234
anger-0.231
Token blamed
Token ID13772
Feature activation+5.45e-01

Pos logit contributions
gging+0.274
aron+0.268
cks+0.266
ags+0.254
 amused+0.252
Neg logit contributions
 to-0.271
 arrived-0.261
 hill-0.252
t-0.248
ho-0.238
Token Tom
Token ID4186
Feature activation+1.77e-01

Pos logit contributions
 blamed+1.076
 amused+0.998
aky+0.989
aron+0.967
 ashamed+0.966
Neg logit contributions
 to-0.995
 hill-0.902
ho-0.849
oes-0.798
t-0.797
Token and
Token ID290
Feature activation+3.34e-03

Pos logit contributions
 blamed+0.371
 amused+0.340
gging+0.337
 ashamed+0.328
cks+0.327
Neg logit contributions
 to-0.280
t-0.278
 hill-0.269
anger-0.262
 ladder-0.248
Token Mia
Token ID32189
Feature activation+1.16e-01

Pos logit contributions
gging+0.005
 blamed+0.005
cks+0.005
aky+0.005
 amused+0.005
Neg logit contributions
 to-0.007
 hill-0.006
t-0.006
 arrived-0.006
ho-0.006
Token for
Token ID329
Feature activation-7.22e-02

Pos logit contributions
 blamed+0.230
gging+0.214
 amused+0.213
cks+0.207
aron+0.204
Neg logit contributions
 to-0.198
t-0.190
 hill-0.183
anger-0.172
 ladder-0.170
Token being
Token ID852
Feature activation+1.67e-01

Pos logit contributions
 to+0.139
 hill+0.117
 ladder+0.109
t+0.108
 prize+0.108
Neg logit contributions
aron-0.133
aky-0.129
 blamed-0.128
ady-0.125
gging-0.124
Token sad
Token ID6507
Feature activation+2.82e-01

Pos logit contributions
aron+0.134
cks+0.116
itive+0.115
ena+0.114
gging+0.113
Neg logit contributions
 to-0.224
t-0.183
 can-0.178
 ladder-0.175
ore-0.174
Token and
Token ID290
Feature activation+6.17e-03

Pos logit contributions
 blamed+0.550
 amused+0.502
gging+0.499
cks+0.486
 ashamed+0.483
Neg logit contributions
 to-0.481
t-0.480
 hill-0.473
anger-0.460
 ladder-0.436
Token angry
Token ID7954
Feature activation+4.07e-01

Pos logit contributions
aron+0.007
gging+0.006
ady+0.006
cks+0.006
aky+0.006
Neg logit contributions
 to-0.016
 hill-0.014
ho-0.014
 arrived-0.014
t-0.014
Token.
Token ID13
Feature activation+1.81e-01

Pos logit contributions
 blamed+0.733
 amused+0.656
cks+0.653
gging+0.651
aky+0.634
Neg logit contributions
 to-0.868
t-0.761
 hill-0.727
anger-0.714
ho-0.668
Token He
Token ID679
Feature activation+6.31e-02

Pos logit contributions
 blamed+0.385
 amused+0.361
gging+0.359
cks+0.352
 ashamed+0.346
Neg logit contributions
t-0.273
 to-0.273
 hill-0.263
anger-0.253
 ladder-0.243
Token blamed
Token ID13772
Feature activation+5.45e-01

Pos logit contributions
cks+0.120
ags+0.118
gging+0.116
 amused+0.113
iver+0.110
Neg logit contributions
 to-0.106
 arrived-0.098
 would-0.093
t-0.092
ho-0.086
Token Mia
Token ID32189
Feature activation+1.17e-01

Pos logit contributions
 blamed+1.066
 amused+0.992
 ashamed+0.938
aky+0.935
aron+0.935
Neg logit contributions
 to-1.007
 hill-0.956
oes-0.864
ho-0.861
t-0.854
Token for
Token ID329
Feature activation-7.22e-02

Pos logit contributions
 blamed+0.247
 amused+0.227
gging+0.226
 ashamed+0.219
cks+0.219
Neg logit contributions
 to-0.181
t-0.180
 hill-0.177
anger-0.172
 ladder-0.163
Token making
Token ID1642
Feature activation+2.55e-02

Pos logit contributions
 to+0.143
 hill+0.121
t+0.119
 ladder+0.115
 prize+0.112
Neg logit contributions
 blamed-0.127
aron-0.124
aky-0.121
gging-0.120
ady-0.116
Token him
Token ID683
Feature activation+5.49e-02

Pos logit contributions
 blamed+0.053
aron+0.048
aky+0.048
gging+0.046
 amused+0.046
Neg logit contributions
 to-0.046
 hill-0.038
 plunge-0.037
anger-0.037
 ladder-0.037
Token fall
Token ID2121
Feature activation-4.90e-02

Pos logit contributions
 blamed+0.094
gging+0.085
aron+0.083
 amused+0.083
cks+0.080
Neg logit contributions
 to-0.119
t-0.107
 hill-0.104
 ladder-0.101
oes-0.096
Token what
Token ID644
Feature activation+1.40e-01

Pos logit contributions
 blamed+0.137
 amused+0.125
gging+0.124
cks+0.122
 ashamed+0.121
Neg logit contributions
 to-0.116
t-0.114
 hill-0.112
anger-0.109
 ladder-0.104
Token happened
Token ID3022
Feature activation-7.21e-02

Pos logit contributions
 blamed+0.342
 amused+0.323
 ashamed+0.316
 innocent+0.307
 annoying+0.307
Neg logit contributions
anger-0.183
 hill-0.169
t-0.160
 plunge-0.155
oes-0.149
Token.
Token ID13
Feature activation+1.70e-01

Pos logit contributions
t+0.127
 hill+0.125
 to+0.123
anger+0.123
 ladder+0.116
Neg logit contributions
 blamed-0.135
 amused-0.123
gging-0.122
 ashamed-0.119
cks-0.118
Token They
Token ID1119
Feature activation+1.43e-01

Pos logit contributions
 blamed+0.302
 amused+0.280
gging+0.279
cks+0.273
 ashamed+0.268
Neg logit contributions
t-0.311
 to-0.309
 hill-0.303
anger-0.297
 ladder-0.285
Token both
Token ID1111
Feature activation+3.13e-01

Pos logit contributions
cks+0.251
gging+0.245
 blamed+0.244
aron+0.238
 amused+0.238
Neg logit contributions
 to-0.260
 arrived-0.251
 hill-0.243
t-0.243
ho-0.232
Token blamed
Token ID13772
Feature activation+5.35e-01

Pos logit contributions
ags+0.626
cks+0.603
gging+0.578
ivers+0.567
iver+0.556
Neg logit contributions
 arrived-0.535
 to-0.529
t-0.472
 would-0.447
ho-0.447
Token each
Token ID1123
Feature activation+1.13e-01

Pos logit contributions
 blamed+1.070
aky+1.017
 amused+1.003
 ashamed+0.990
aron+0.989
Neg logit contributions
 to-0.950
 hill-0.825
oes-0.765
ho-0.751
t-0.745
Token other
Token ID584
Feature activation+2.32e-02

Pos logit contributions
 blamed+0.203
 amused+0.184
gging+0.182
 ashamed+0.178
cks+0.176
Neg logit contributions
t-0.209
 hill-0.207
 to-0.205
anger-0.202
 ladder-0.193
Token.
Token ID13
Feature activation+1.72e-01

Pos logit contributions
 blamed+0.044
gging+0.040
 amused+0.039
cks+0.039
 ashamed+0.038
Neg logit contributions
 to-0.042
t-0.040
 hill-0.040
anger-0.038
 ladder-0.037
Token They
Token ID1119
Feature activation+1.47e-01

Pos logit contributions
 blamed+0.334
 amused+0.326
gging+0.324
eling+0.321
aky+0.320
Neg logit contributions
 to-0.284
t-0.279
 hill-0.267
anger-0.256
 ladder-0.250
Token both
Token ID1111
Feature activation+3.15e-01

Pos logit contributions
cks+0.286
ags+0.274
aron+0.271
gging+0.271
ivers+0.262
Neg logit contributions
 arrived-0.245
 to-0.243
t-0.219
 would-0.217
 hill-0.216
Token's
Token ID338
Feature activation-1.07e-01

Pos logit contributions
t+0.145
anger+0.144
 hill+0.144
 to+0.135
 plunge+0.132
Neg logit contributions
 blamed-0.204
 ashamed-0.185
 amused-0.184
gging-0.179
cks-0.179
Token your
Token ID534
Feature activation-2.73e-03

Pos logit contributions
 to+0.204
t+0.170
 hill+0.168
 ladder+0.166
 prize+0.163
Neg logit contributions
 blamed-0.207
gging-0.198
 amused-0.189
cks-0.181
aron-0.175
Token fault
Token ID8046
Feature activation-1.55e-02

Pos logit contributions
t+0.004
 hill+0.004
anger+0.004
 arrived+0.004
 plunge+0.004
Neg logit contributions
 blamed-0.005
 ashamed-0.005
 amused-0.005
 innocent-0.005
 sorry-0.005
Token!"
Token ID2474
Feature activation-5.24e-02

Pos logit contributions
t+0.026
 to+0.025
 hill+0.025
anger+0.025
 ladder+0.024
Neg logit contributions
 blamed-0.031
 amused-0.028
gging-0.028
 ashamed-0.027
cks-0.027
Token Mia
Token ID32189
Feature activation+1.06e-01

Pos logit contributions
 to+0.096
 hill+0.083
t+0.083
 ladder+0.082
 arrived+0.077
Neg logit contributions
gging-0.098
cks-0.097
 blamed-0.095
eling-0.094
aky-0.094
Token blamed
Token ID13772
Feature activation+5.34e-01

Pos logit contributions
 blamed+0.177
cks+0.171
gging+0.168
aky+0.166
 amused+0.166
Neg logit contributions
 to-0.204
 hill-0.187
t-0.183
 ladder-0.177
 arrived-0.177
Token Sam
Token ID3409
Feature activation+6.44e-02

Pos logit contributions
 blamed+0.945
 amused+0.837
 ashamed+0.818
aky+0.812
aron+0.794
Neg logit contributions
 to-1.139
 hill-1.021
oes-0.983
ho-0.980
t-0.948
Token.
Token ID13
Feature activation+1.71e-01

Pos logit contributions
 blamed+0.114
gging+0.102
 amused+0.102
 annoying+0.098
 years+0.096
Neg logit contributions
 to-0.135
t-0.115
 hill-0.112
 ladder-0.109
oes-0.108
Token\n
Token ID198
Feature activation-6.35e-02

Pos logit contributions
 blamed+0.397
gging+0.377
cks+0.376
aky+0.374
ck+0.373
Neg logit contributions
 to-0.236
t-0.217
 hill-0.213
 ladder-0.203
anger-0.198
Token\n
Token ID198
Feature activation-6.40e-02

Pos logit contributions
 to+0.076
t+0.073
 hill+0.069
anger+0.063
 ladder+0.062
Neg logit contributions
 blamed-0.159
 amused-0.150
 ashamed-0.149
gging-0.148
 innocent-0.145
TokenTheir
Token ID14574
Feature activation+2.41e-01

Pos logit contributions
 hill+0.101
t+0.100
 to+0.100
 ladder+0.090
anger+0.088
Neg logit contributions
 blamed-0.132
 ashamed-0.123
 amused-0.123
gging-0.117
 innocent-0.115
Token what
Token ID644
Feature activation+1.26e-01

Pos logit contributions
t+0.198
 hill+0.195
 to+0.189
anger+0.189
 ladder+0.178
Neg logit contributions
 blamed-0.265
 amused-0.245
gging-0.243
 ashamed-0.238
cks-0.236
Token you
Token ID345
Feature activation-1.34e-01

Pos logit contributions
 blamed+0.227
 ashamed+0.210
 amused+0.201
 sorry+0.200
 innocent+0.197
Neg logit contributions
anger-0.230
t-0.223
 hill-0.215
oes-0.208
ho-0.207
Token did
Token ID750
Feature activation+1.01e-01

Pos logit contributions
t+0.217
 to+0.211
 hill+0.211
anger+0.205
 arrived+0.197
Neg logit contributions
 blamed-0.271
 amused-0.250
gging-0.249
 ashamed-0.241
cks-0.240
Token!"
Token ID2474
Feature activation-6.46e-02

Pos logit contributions
 blamed+0.177
 amused+0.166
gging+0.164
cks+0.161
 ashamed+0.161
Neg logit contributions
 hill-0.180
anger-0.176
t-0.175
oes-0.166
 ladder-0.165
Token Sam
Token ID3409
Feature activation+5.63e-02

Pos logit contributions
 to+0.125
t+0.119
 hill+0.114
anger+0.112
 ladder+0.111
Neg logit contributions
 blamed-0.110
cks-0.105
gging-0.105
 amused-0.102
aky-0.100
Token blamed
Token ID13772
Feature activation+5.26e-01

Pos logit contributions
 blamed+0.088
cks+0.086
gging+0.085
 amused+0.084
 years+0.081
Neg logit contributions
 to-0.115
t-0.103
 hill-0.103
 arrived-0.103
 ladder-0.097
Token Mia
Token ID32189
Feature activation+1.04e-01

Pos logit contributions
 blamed+1.008
 amused+0.946
aky+0.914
 ashamed+0.903
aron+0.898
Neg logit contributions
 to-1.027
oes-0.857
 hill-0.856
t-0.823
anger-0.818
Token.
Token ID13
Feature activation+1.65e-01

Pos logit contributions
 blamed+0.177
gging+0.160
 amused+0.158
 annoying+0.154
cks+0.152
Neg logit contributions
 to-0.223
t-0.196
 hill-0.189
 ladder-0.182
oes-0.180
Token\n
Token ID198
Feature activation-6.80e-02

Pos logit contributions
 blamed+0.378
gging+0.361
cks+0.356
 years+0.353
 amused+0.352
Neg logit contributions
 to-0.228
t-0.212
 hill-0.209
 ladder-0.200
anger-0.197
Token\n
Token ID198
Feature activation-6.76e-02

Pos logit contributions
 to+0.085
t+0.084
 hill+0.079
anger+0.073
 ladder+0.072
Neg logit contributions
 blamed-0.165
gging-0.155
 amused-0.155
 ashamed-0.154
 innocent-0.149
Token"
Token ID1
Feature activation-1.13e-01

Pos logit contributions
t+0.097
 hill+0.095
 to+0.094
 ladder+0.085
anger+0.083
Neg logit contributions
 blamed-0.151
 amused-0.141
 ashamed-0.140
gging-0.138
cks-0.134
Token he
Token ID339
Feature activation-1.23e-01

Pos logit contributions
t+0.150
 to+0.148
 hill+0.148
anger+0.144
 ladder+0.137
Neg logit contributions
 blamed-0.191
 amused-0.175
gging-0.175
cks-0.170
 ashamed-0.170
Token just
Token ID655
Feature activation+1.50e-01

Pos logit contributions
t+0.212
 to+0.210
 hill+0.203
anger+0.196
 arrived+0.194
Neg logit contributions
 blamed-0.232
gging-0.216
 amused-0.215
cks-0.213
 ashamed-0.204
Token kept
Token ID4030
Feature activation+4.02e-02

Pos logit contributions
 blamed+0.264
gging+0.249
cks+0.244
 amused+0.242
aron+0.231
Neg logit contributions
 to-0.282
t-0.271
 arrived-0.261
 hill-0.260
ho-0.251
Token sh
Token ID427
Feature activation+9.74e-02

Pos logit contributions
 blamed+0.074
gging+0.066
 amused+0.066
 ashamed+0.063
cks+0.063
Neg logit contributions
 to-0.075
t-0.070
 hill-0.070
anger-0.067
 ladder-0.066
Tokenru
Token ID622
Feature activation+2.87e-01

Pos logit contributions
 blamed+0.125
 amused+0.099
 ashamed+0.090
 innocent+0.089
cks+0.085
Neg logit contributions
t-0.245
or-0.234
ore-0.230
oes-0.228
 to-0.226
Tokengging
Token ID18792
Feature activation+4.86e-01

Pos logit contributions
 blamed+0.409
 amused+0.330
 ashamed+0.299
istered+0.275
 importantly+0.269
Neg logit contributions
or-0.652
 to-0.641
ho-0.588
t-0.583
 hill-0.576
Token.
Token ID13
Feature activation+1.91e-01

Pos logit contributions
 blamed+0.854
 amused+0.764
gging+0.740
cks+0.729
 ashamed+0.725
Neg logit contributions
 to-0.957
t-0.935
 hill-0.903
anger-0.890
 ladder-0.853
Token Rob
Token ID3851
Feature activation-6.10e-02

Pos logit contributions
 blamed+0.375
 amused+0.350
cks+0.346
gging+0.345
aky+0.336
Neg logit contributions
 to-0.315
t-0.313
 hill-0.311
anger-0.302
 ladder-0.289
Tokenby
Token ID1525
Feature activation-2.21e-01

Pos logit contributions
t+0.068
 to+0.067
 hill+0.064
anger+0.061
 arrived+0.061
Neg logit contributions
 blamed-0.155
 amused-0.146
gging-0.143
cks-0.142
 ashamed-0.142
Token thought
Token ID1807
Feature activation+1.26e-01

Pos logit contributions
t+0.382
 to+0.382
 hill+0.365
 arrived+0.352
anger+0.352
Neg logit contributions
 blamed-0.416
 amused-0.385
cks-0.382
gging-0.378
 ashamed-0.366
Token this
Token ID428
Feature activation-1.35e-01

Pos logit contributions
 blamed+0.243
 amused+0.227
gging+0.221
 ashamed+0.219
 innocent+0.215
Neg logit contributions
 hill-0.209
t-0.206
anger-0.201
oes-0.195
 to-0.194
Token everyone
Token ID2506
Feature activation+2.79e-02

Pos logit contributions
 blamed+0.291
gging+0.266
cks+0.264
 amused+0.263
aron+0.253
Neg logit contributions
 to-0.214
t-0.206
 hill-0.200
anger-0.192
 ladder-0.189
Token had
Token ID550
Feature activation+1.33e-02

Pos logit contributions
 blamed+0.059
 amused+0.055
gging+0.053
 ashamed+0.053
cks+0.052
Neg logit contributions
t-0.042
 hill-0.042
anger-0.042
 to-0.041
 ladder-0.039
Token been
Token ID587
Feature activation+7.99e-02

Pos logit contributions
 blamed+0.025
gging+0.023
 amused+0.023
cks+0.023
aron+0.022
Neg logit contributions
 to-0.024
t-0.022
 hill-0.021
 arrived-0.021
anger-0.020
Token sh
Token ID427
Feature activation+9.82e-02

Pos logit contributions
 blamed+0.130
cks+0.119
gging+0.119
 amused+0.110
aron+0.108
Neg logit contributions
 to-0.174
t-0.152
 arrived-0.146
 hill-0.141
 ladder-0.141
Tokenru
Token ID622
Feature activation+2.83e-01

Pos logit contributions
 blamed+0.107
 amused+0.091
cks+0.080
 innocent+0.079
 ashamed+0.079
Neg logit contributions
t-0.259
anger-0.246
 to-0.242
or-0.241
oes-0.240
Tokengging
Token ID18792
Feature activation+4.85e-01

Pos logit contributions
 blamed+0.382
 amused+0.368
 ashamed+0.304
 sorry+0.287
 annoying+0.279
Neg logit contributions
or-0.622
 to-0.577
t-0.562
st-0.550
ho-0.549
Token when
Token ID618
Feature activation+5.33e-02

Pos logit contributions
 blamed+0.966
 amused+0.871
gging+0.867
cks+0.852
 ashamed+0.837
Neg logit contributions
 to-0.826
t-0.810
 hill-0.782
anger-0.763
 ladder-0.731
Token she
Token ID673
Feature activation-6.58e-02

Pos logit contributions
 blamed+0.110
 amused+0.103
 ashamed+0.100
gging+0.100
 innocent+0.097
Neg logit contributions
t-0.082
 hill-0.081
anger-0.081
 to-0.077
oes-0.075
Token was
Token ID373
Feature activation-1.37e-01

Pos logit contributions
 to+0.098
t+0.097
 hill+0.095
 arrived+0.090
anger+0.090
Neg logit contributions
 blamed-0.140
gging-0.132
 amused-0.132
cks-0.129
 ashamed-0.125
Token looking
Token ID2045
Feature activation+9.86e-03

Pos logit contributions
 to+0.287
t+0.247
 arrived+0.243
 ladder+0.226
 hill+0.223
Neg logit contributions
 blamed-0.226
gging-0.214
cks-0.212
aron-0.192
 amused-0.190
Token for
Token ID329
Feature activation-7.10e-02

Pos logit contributions
 blamed+0.022
gging+0.020
 amused+0.020
cks+0.019
 ashamed+0.019
Neg logit contributions
 to-0.015
t-0.014
 hill-0.014
anger-0.014
 ladder-0.013
Token Tom
Token ID4186
Feature activation+1.75e-01

Pos logit contributions
aky+0.147
aron+0.145
 blamed+0.145
 amused+0.139
gging+0.137
Neg logit contributions
 to-0.130
 hill-0.120
t-0.113
 ladder-0.111
ho-0.110
Token guesses
Token ID44774
Feature activation+1.97e-01

Pos logit contributions
aky+0.318
 blamed+0.317
cks+0.311
aming+0.311
cking+0.310
Neg logit contributions
 to-0.300
 hill-0.266
oes-0.246
 arrived-0.243
 ladder-0.241
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
 blamed+0.386
cks+0.369
 annoying+0.357
gging+0.356
aky+0.355
Neg logit contributions
 to-0.372
t-0.308
 hill-0.295
 arrived-0.282
oes-0.277
Token sh
Token ID427
Feature activation+9.44e-02

Pos logit contributions
 to+0.184
 hill+0.154
 ladder+0.147
 arrived+0.142
ho+0.138
Neg logit contributions
aky-0.191
 blamed-0.188
wy-0.182
 awful-0.182
ags-0.178
Tokenru
Token ID622
Feature activation+2.88e-01

Pos logit contributions
 blamed+0.151
 amused+0.133
 ashamed+0.132
 innocent+0.119
 annoying+0.118
Neg logit contributions
t-0.199
oes-0.191
ho-0.188
anger-0.187
zag-0.184
Tokengging
Token ID18792
Feature activation+4.85e-01

Pos logit contributions
 blamed+0.342
 importantly+0.267
 amused+0.260
 ashamed+0.253
istered+0.210
Neg logit contributions
 to-0.702
or-0.698
ho-0.687
 hill-0.675
anger-0.671
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
 blamed+0.778
 amused+0.682
gging+0.669
 ashamed+0.647
 years+0.642
Neg logit contributions
 to-1.046
t-1.005
 hill-0.966
anger-0.959
oes-0.945
Token\n
Token ID198
Feature activation-6.03e-02

Pos logit contributions
 blamed+0.404
gging+0.380
 amused+0.376
cks+0.365
 ashamed+0.364
Neg logit contributions
 to-0.258
t-0.255
 hill-0.242
anger-0.238
 ladder-0.225
Token\n
Token ID198
Feature activation-6.04e-02

Pos logit contributions
 to+0.075
t+0.071
 hill+0.067
anger+0.063
 ladder+0.061
Neg logit contributions
 blamed-0.149
gging-0.141
 amused-0.141
 ashamed-0.137
 annoying-0.135
TokenThey
Token ID2990
Feature activation+2.50e-01

Pos logit contributions
t+0.113
 hill+0.112
 to+0.111
anger+0.105
 ladder+0.104
Neg logit contributions
 blamed-0.108
 amused-0.100
gging-0.098
 ashamed-0.097
 innocent-0.092
Token turn
Token ID1210
Feature activation-1.85e-01

Pos logit contributions
 blamed+0.551
 amused+0.527
gging+0.523
cks+0.517
 ashamed+0.507
Neg logit contributions
 hill-0.345
 to-0.337
t-0.326
 arrived-0.318
 ladder-0.307
Token Ben
Token ID3932
Feature activation+1.47e-01

Pos logit contributions
cks+0.094
 blamed+0.093
aron+0.092
aky+0.088
 amused+0.087
Neg logit contributions
 to-0.100
 hill-0.090
 arrived-0.088
t-0.087
 plunge-0.085
Token said
Token ID531
Feature activation+1.75e-01

Pos logit contributions
cks+0.292
ags+0.277
cking+0.265
gging+0.258
eling+0.258
Neg logit contributions
 to-0.264
t-0.241
 arrived-0.239
 would-0.208
'll-0.206
Token,
Token ID11
Feature activation-9.88e-02

Pos logit contributions
 blamed+0.423
cks+0.415
gging+0.401
 amused+0.393
aky+0.393
Neg logit contributions
 to-0.245
t-0.215
anger-0.186
 ladder-0.181
 arrived-0.180
Token sh
Token ID427
Feature activation+9.44e-02

Pos logit contributions
 to+0.164
 hill+0.134
 ladder+0.134
 arrived+0.125
t+0.123
Neg logit contributions
aky-0.207
 blamed-0.196
ags-0.194
cks-0.194
wy-0.191
Tokenru
Token ID622
Feature activation+2.85e-01

Pos logit contributions
 blamed+0.156
 amused+0.148
 ashamed+0.145
 innocent+0.133
 awful+0.131
Neg logit contributions
t-0.197
ho-0.178
zag-0.176
oes-0.173
anger-0.172
Tokengging
Token ID18792
Feature activation+4.84e-01

Pos logit contributions
 blamed+0.308
 importantly+0.281
 amused+0.272
 ashamed+0.263
 enjoyable+0.229
Neg logit contributions
or-0.689
ho-0.678
t-0.665
 to-0.662
anger-0.646
Token.
Token ID13
Feature activation+1.89e-01

Pos logit contributions
 blamed+0.840
cks+0.735
 amused+0.730
 years+0.729
gging+0.723
Neg logit contributions
 to-1.059
t-0.942
anger-0.862
oes-0.861
 hill-0.860
Token\n
Token ID198
Feature activation-5.02e-02

Pos logit contributions
 blamed+0.378
 amused+0.344
 ashamed+0.338
gging+0.337
 innocent+0.322
Neg logit contributions
 hill-0.310
t-0.308
anger-0.301
 to-0.292
 plunge-0.286
Token\n
Token ID198
Feature activation-5.06e-02

Pos logit contributions
t+0.075
 to+0.073
 hill+0.072
anger+0.071
 ladder+0.066
Neg logit contributions
 blamed-0.108
 amused-0.100
gging-0.100
 ashamed-0.098
cks-0.097
TokenThey
Token ID2990
Feature activation+2.52e-01

Pos logit contributions
t+0.088
 to+0.086
 hill+0.084
anger+0.081
 ladder+0.078
Neg logit contributions
 blamed-0.098
 amused-0.091
cks-0.089
gging-0.089
 ashamed-0.087
Token saw
Token ID2497
Feature activation-1.18e-01

Pos logit contributions
 blamed+0.499
gging+0.495
cks+0.495
 amused+0.491
aron+0.474
Neg logit contributions
 to-0.395
 arrived-0.386
t-0.377
 hill-0.376
ho-0.343
Token,
Token ID11
Feature activation-1.02e-01

Pos logit contributions
 to+0.033
t+0.032
 hill+0.032
anger+0.030
 ladder+0.029
Neg logit contributions
 blamed-0.034
gging-0.031
 amused-0.031
 ashamed-0.030
cks-0.030
Token who
Token ID508
Feature activation+2.38e-01

Pos logit contributions
 to+0.212
 hill+0.197
t+0.187
 ladder+0.185
 arrived+0.182
Neg logit contributions
 blamed-0.165
gging-0.155
cks-0.150
 awful-0.144
 amused-0.142
Token is
Token ID318
Feature activation-1.76e-01

Pos logit contributions
 blamed+0.499
 amused+0.461
gging+0.453
 ashamed+0.446
cks+0.443
Neg logit contributions
t-0.367
 hill-0.356
anger-0.354
 to-0.351
 ladder-0.328
Token surprised
Token ID6655
Feature activation+2.38e-01

Pos logit contributions
 to+0.367
 ladder+0.301
t+0.299
 hill+0.296
 contest+0.291
Neg logit contributions
 blamed-0.308
gging-0.292
aron-0.274
cks-0.261
aky-0.253
Token and
Token ID290
Feature activation+3.81e-03

Pos logit contributions
 blamed+0.471
 amused+0.436
gging+0.433
 years+0.426
 ashamed+0.424
Neg logit contributions
 hill-0.387
anger-0.386
t-0.384
 ladder-0.364
 plunge-0.362
Token amused
Token ID36979
Feature activation+4.77e-01

Pos logit contributions
gging+0.004
aron+0.004
 blamed+0.004
aky+0.004
cks+0.004
Neg logit contributions
 to-0.010
t-0.009
 arrived-0.009
 ladder-0.009
 hill-0.008
Token.
Token ID13
Feature activation+1.81e-01

Pos logit contributions
 blamed+0.869
cks+0.768
 amused+0.748
gging+0.746
aky+0.724
Neg logit contributions
 to-1.036
t-0.927
oes-0.841
 hill-0.838
anger-0.824
Token She
Token ID1375
Feature activation+6.85e-02

Pos logit contributions
 blamed+0.328
gging+0.314
cks+0.309
 amused+0.308
ck+0.302
Neg logit contributions
 to-0.337
t-0.320
 hill-0.313
anger-0.309
 ladder-0.298
Token says
Token ID1139
Feature activation+1.53e-01

Pos logit contributions
 blamed+0.119
gging+0.114
 amused+0.108
cks+0.107
 ashamed+0.105
Neg logit contributions
 to-0.126
 hill-0.122
t-0.120
anger-0.118
 ladder-0.116
Token they
Token ID484
Feature activation-9.03e-02

Pos logit contributions
 blamed+0.312
 amused+0.298
 sorry+0.296
 ashamed+0.294
gging+0.280
Neg logit contributions
 hill-0.248
t-0.246
anger-0.240
oes-0.235
 plunge-0.220
Token are
Token ID389
Feature activation-1.92e-02

Pos logit contributions
 to+0.126
t+0.124
 hill+0.120
'll+0.116
 arrived+0.113
Neg logit contributions
 blamed-0.199
gging-0.191
 amused-0.187
cks-0.184
 ashamed-0.180
Token\n
Token ID198
Feature activation-5.43e-02

Pos logit contributions
 blamed+0.357
 amused+0.326
gging+0.323
 ashamed+0.318
cks+0.312
Neg logit contributions
t-0.304
 hill-0.300
 to-0.295
anger-0.294
 ladder-0.277
Token\n
Token ID198
Feature activation-5.46e-02

Pos logit contributions
t+0.079
 to+0.078
 hill+0.077
anger+0.075
 ladder+0.071
Neg logit contributions
 blamed-0.119
 amused-0.110
gging-0.110
 ashamed-0.107
cks-0.106
TokenMom
Token ID29252
Feature activation+2.31e-02

Pos logit contributions
t+0.090
 hill+0.089
 to+0.088
anger+0.087
 ladder+0.082
Neg logit contributions
 blamed-0.109
 amused-0.100
gging-0.099
 ashamed-0.097
cks-0.096
Token was
Token ID373
Feature activation-1.40e-01

Pos logit contributions
 blamed+0.038
 amused+0.034
gging+0.034
 ashamed+0.033
cks+0.033
Neg logit contributions
t-0.046
 hill-0.045
 to-0.045
anger-0.045
 ladder-0.043
Token not
Token ID407
Feature activation+1.71e-01

Pos logit contributions
 to+0.324
t+0.270
 ladder+0.270
 door+0.256
 arrived+0.255
Neg logit contributions
 blamed-0.186
gging-0.183
itive-0.178
cks-0.175
aron-0.172
Token amused
Token ID36979
Feature activation+4.76e-01

Pos logit contributions
 blamed+0.230
cks+0.206
 unison+0.205
 awful+0.197
gging+0.191
Neg logit contributions
 to-0.438
t-0.371
 home-0.350
 ladder-0.345
 arrived-0.344
Token.
Token ID13
Feature activation+1.82e-01

Pos logit contributions
 blamed+0.827
 amused+0.739
cks+0.722
gging+0.716
 ashamed+0.709
Neg logit contributions
 to-0.962
t-0.930
 hill-0.901
anger-0.882
oes-0.850
Token She
Token ID1375
Feature activation+6.89e-02

Pos logit contributions
 blamed+0.332
gging+0.304
cks+0.303
 amused+0.302
ck+0.292
Neg logit contributions
 to-0.333
t-0.329
 hill-0.324
anger-0.316
 ladder-0.304
Token took
Token ID1718
Feature activation-1.03e-02

Pos logit contributions
gging+0.145
cks+0.145
 blamed+0.141
ags+0.135
 amused+0.134
Neg logit contributions
 to-0.099
 arrived-0.097
t-0.096
 hill-0.090
anger-0.086
Token off
Token ID572
Feature activation-1.34e-01

Pos logit contributions
t+0.022
 hill+0.022
anger+0.021
 to+0.020
 ladder+0.020
Neg logit contributions
 blamed-0.015
gging-0.014
 amused-0.013
 ashamed-0.013
cks-0.013
Token the
Token ID262
Feature activation+1.86e-02

Pos logit contributions
 to+0.272
t+0.260
 hill+0.253
anger+0.248
 ladder+0.240
Neg logit contributions
 blamed-0.228
 amused-0.208
gging-0.207
cks-0.202
 ashamed-0.197
Token angry
Token ID7954
Feature activation+4.08e-01

Pos logit contributions
 to+0.396
 door+0.308
t+0.304
 ladder+0.300
't+0.285
Neg logit contributions
gging-0.263
itive-0.263
aron-0.253
 unison-0.252
 blamed-0.243
Token.
Token ID13
Feature activation+1.81e-01

Pos logit contributions
 blamed+0.690
gging+0.617
 amused+0.607
cks+0.596
ck+0.578
Neg logit contributions
 to-0.886
t-0.819
 hill-0.781
anger-0.756
oes-0.738
Token She
Token ID1375
Feature activation+6.77e-02

Pos logit contributions
 blamed+0.318
 amused+0.285
gging+0.281
 ashamed+0.280
cks+0.266
Neg logit contributions
t-0.339
 hill-0.338
anger-0.332
 to-0.325
 plunge-0.318
Token is
Token ID318
Feature activation-1.74e-01

Pos logit contributions
 blamed+0.117
gging+0.113
 amused+0.107
aky+0.106
aron+0.106
Neg logit contributions
 to-0.124
 hill-0.117
t-0.116
anger-0.113
 ladder-0.113
Token also
Token ID635
Feature activation+1.97e-01

Pos logit contributions
 to+0.397
t+0.310
 door+0.299
 ladder+0.296
 arrived+0.289
Neg logit contributions
gging-0.264
itive-0.257
aron-0.255
 unison-0.254
ishment-0.244
Token amused
Token ID36979
Feature activation+4.76e-01

Pos logit contributions
 blamed+0.213
gging+0.205
aron+0.202
cks+0.181
ck+0.179
Neg logit contributions
 to-0.522
t-0.460
 ladder-0.442
 arrived-0.441
 hill-0.434
Token.
Token ID13
Feature activation+1.82e-01

Pos logit contributions
 blamed+0.815
gging+0.726
cks+0.719
 amused+0.716
aky+0.679
Neg logit contributions
 to-1.061
t-0.970
 hill-0.884
oes-0.875
anger-0.869
Token She
Token ID1375
Feature activation+6.85e-02

Pos logit contributions
 blamed+0.325
 amused+0.292
 ashamed+0.286
gging+0.286
 innocent+0.272
Neg logit contributions
 hill-0.337
t-0.336
anger-0.329
 to-0.318
 plunge-0.318
Token laughs
Token ID22051
Feature activation+2.49e-01

Pos logit contributions
 blamed+0.121
gging+0.117
 amused+0.111
cks+0.109
aky+0.108
Neg logit contributions
 to-0.123
 hill-0.117
t-0.116
anger-0.114
 ladder-0.112
Token.
Token ID13
Feature activation+1.82e-01

Pos logit contributions
 blamed+0.585
eling+0.569
cks+0.550
ck+0.550
gging+0.545
Neg logit contributions
 to-0.466
t-0.306
s-0.299
 door-0.279
 can-0.261
Token She
Token ID1375
Feature activation+6.88e-02

Pos logit contributions
 blamed+0.341
 amused+0.309
gging+0.304
 ashamed+0.302
cks+0.290
Neg logit contributions
t-0.322
 hill-0.320
anger-0.312
 to-0.306
 plunge-0.298
Token not
Token ID407
Feature activation+1.73e-01

Pos logit contributions
 to+0.427
t+0.317
 home+0.316
 ladder+0.308
 arrived+0.305
Neg logit contributions
gging-0.268
itive-0.252
 blamed-0.248
ities-0.238
 unison-0.235
Token surprised
Token ID6655
Feature activation+2.39e-01

Pos logit contributions
 blamed+0.211
gging+0.207
 unison+0.197
cks+0.193
eling+0.177
Neg logit contributions
 to-0.476
 home-0.375
t-0.366
 arrived-0.360
 ladder-0.355
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
 blamed+0.403
gging+0.361
 amused+0.360
cks+0.351
 ashamed+0.338
Neg logit contributions
 to-0.516
t-0.479
 hill-0.450
oes-0.448
anger-0.442
Token She
Token ID1375
Feature activation+6.89e-02

Pos logit contributions
 blamed+0.343
gging+0.327
cks+0.325
ck+0.322
 amused+0.321
Neg logit contributions
 to-0.319
t-0.310
anger-0.303
 hill-0.296
 ladder-0.280
Token is
Token ID318
Feature activation-1.77e-01

Pos logit contributions
gging+0.148
 blamed+0.147
aky+0.139
 amused+0.139
cks+0.137
Neg logit contributions
 to-0.099
 hill-0.088
 arrived-0.088
t-0.087
anger-0.085
Token amused
Token ID36979
Feature activation+4.76e-01

Pos logit contributions
 to+0.426
t+0.314
 home+0.310
 arrived+0.308
 ladder+0.303
Neg logit contributions
gging-0.270
itive-0.252
 blamed-0.247
ities-0.238
 unison-0.233
Token.
Token ID13
Feature activation+1.81e-01

Pos logit contributions
 blamed+0.751
gging+0.662
 amused+0.656
cks+0.644
 ashamed+0.630
Neg logit contributions
 to-1.054
t-1.025
 hill-0.967
anger-0.962
oes-0.934
Token She
Token ID1375
Feature activation+6.81e-02

Pos logit contributions
 blamed+0.331
gging+0.309
 amused+0.306
cks+0.303
ck+0.293
Neg logit contributions
 to-0.330
t-0.326
 hill-0.315
anger-0.314
 ladder-0.297
Token tells
Token ID4952
Feature activation+7.95e-02

Pos logit contributions
 blamed+0.124
gging+0.122
 amused+0.113
cks+0.113
aky+0.111
Neg logit contributions
 to-0.120
 hill-0.113
t-0.112
 arrived-0.110
anger-0.110
Token them
Token ID606
Feature activation+1.46e-02

Pos logit contributions
 blamed+0.156
gging+0.143
aron+0.141
 amused+0.141
aky+0.139
Neg logit contributions
 to-0.145
t-0.133
anger-0.129
 hill-0.122
ho-0.121
Token the
Token ID262
Feature activation+1.60e-02

Pos logit contributions
 blamed+0.037
 amused+0.035
 ashamed+0.034
gging+0.034
 innocent+0.033
Neg logit contributions
 hill-0.016
t-0.016
anger-0.015
 to-0.014
 ladder-0.014
Token lady
Token ID10846
Feature activation-1.05e-01

Pos logit contributions
 blamed+0.409
 amused+0.379
 ashamed+0.374
gging+0.367
 innocent+0.360
Neg logit contributions
t-0.247
anger-0.241
 plunge-0.239
 hill-0.235
 to-0.232
Token was
Token ID373
Feature activation-1.39e-01

Pos logit contributions
 to+0.212
t+0.212
 hill+0.207
anger+0.201
 arrived+0.198
Neg logit contributions
 blamed-0.163
gging-0.148
 amused-0.147
cks-0.146
 ashamed-0.143
Token surprised
Token ID6655
Feature activation+2.41e-01

Pos logit contributions
 to+0.309
 arrived+0.271
t+0.264
 hill+0.256
 ladder+0.255
Neg logit contributions
 blamed-0.202
gging-0.192
cks-0.190
eling-0.178
ck-0.177
Token,
Token ID11
Feature activation-9.47e-02

Pos logit contributions
 blamed+0.514
 amused+0.468
gging+0.466
cks+0.466
 ashamed+0.456
Neg logit contributions
 to-0.379
t-0.369
 hill-0.356
anger-0.344
oes-0.326
Token but
Token ID475
Feature activation+8.01e-03

Pos logit contributions
 to+0.160
t+0.140
 hill+0.140
 ladder+0.132
 arrived+0.129
Neg logit contributions
 blamed-0.194
cks-0.189
gging-0.182
aky-0.180
ck-0.174
Token amused
Token ID36979
Feature activation+4.76e-01

Pos logit contributions
 blamed+0.016
cks+0.015
gging+0.015
ags+0.014
aron+0.014
Neg logit contributions
 to-0.013
t-0.013
 hill-0.012
 arrived-0.012
 ladder-0.011
Token by
Token ID416
Feature activation-8.63e-02

Pos logit contributions
 blamed+0.960
cks+0.857
 amused+0.856
gging+0.852
 ashamed+0.833
Neg logit contributions
 to-0.842
t-0.804
 hill-0.762
anger-0.746
ho-0.716
Token the
Token ID262
Feature activation+2.29e-02

Pos logit contributions
 to+0.153
 hill+0.142
t+0.136
 arrived+0.133
 plunge+0.131
Neg logit contributions
 blamed-0.168
gging-0.152
aky-0.149
cks-0.147
 ashamed-0.146
Token young
Token ID1862
Feature activation+9.87e-02

Pos logit contributions
 blamed+0.044
 amused+0.040
gging+0.040
 ashamed+0.039
cks+0.038
Neg logit contributions
t-0.040
 hill-0.039
 to-0.039
anger-0.038
 ladder-0.036
Token child
Token ID1200
Feature activation-1.02e-01

Pos logit contributions
 blamed+0.225
 amused+0.209
 ashamed+0.206
gging+0.206
 innocent+0.200
Neg logit contributions
t-0.135
anger-0.129
 hill-0.127
 to-0.125
 plunge-0.118
Tokenâ
Token ID22940
Feature activation+3.87e-02

Pos logit contributions
 to+0.195
t+0.187
 hill+0.184
anger+0.174
 arrived+0.173
Neg logit contributions
 blamed-0.183
gging-0.167
 amused-0.164
cks-0.162
 ashamed-0.158
Token,
Token ID11
Feature activation-1.53e-01

Pos logit contributions
 blamed+0.364
 amused+0.330
gging+0.327
 ashamed+0.321
cks+0.313
Neg logit contributions
t-0.334
 hill-0.330
anger-0.322
 to-0.321
 ladder-0.305
Token and
Token ID290
Feature activation-2.84e-02

Pos logit contributions
t+0.255
 hill+0.250
anger+0.247
 to+0.245
oes+0.230
Neg logit contributions
 blamed-0.299
 amused-0.274
gging-0.270
 ashamed-0.267
cks-0.261
Token always
Token ID1464
Feature activation+8.49e-02

Pos logit contributions
t+0.050
 to+0.050
 hill+0.050
anger+0.048
 ladder+0.046
Neg logit contributions
 blamed-0.053
gging-0.048
 amused-0.048
 ashamed-0.046
cks-0.046
Token made
Token ID925
Feature activation-5.95e-02

Pos logit contributions
 blamed+0.159
gging+0.146
 amused+0.145
cks+0.140
 ashamed+0.139
Neg logit contributions
 to-0.149
 hill-0.145
t-0.144
 arrived-0.138
anger-0.135
Token sure
Token ID1654
Feature activation+2.14e-01

Pos logit contributions
 to+0.125
 hill+0.110
anger+0.108
 ladder+0.108
 plunge+0.105
Neg logit contributions
 blamed-0.103
 amused-0.090
gging-0.089
aky-0.085
 years-0.084
Token to
Token ID284
Feature activation-4.84e-01

Pos logit contributions
 blamed+0.609
 ashamed+0.594
 amused+0.571
 innocent+0.569
 years+0.562
Neg logit contributions
anger-0.169
 hill-0.148
t-0.140
 Pole-0.120
oes-0.118
Token listen
Token ID6004
Feature activation-3.19e-02

Pos logit contributions
 to+0.683
 hill+0.672
t+0.666
anger+0.616
 ladder+0.611
Neg logit contributions
 blamed-1.078
 amused-1.012
gging-0.983
 ashamed-0.971
cks-0.970
Token to
Token ID284
Feature activation-4.73e-01

Pos logit contributions
t+0.028
 hill+0.027
anger+0.026
 to+0.026
 ladder+0.023
Neg logit contributions
 blamed-0.088
 amused-0.083
gging-0.082
 ashamed-0.081
cks-0.080
Token her
Token ID607
Feature activation+2.01e-01

Pos logit contributions
 to+0.836
 hill+0.797
t+0.754
 ladder+0.734
anger+0.724
Neg logit contributions
 blamed-0.925
 amused-0.833
gging-0.825
 ashamed-0.797
cks-0.791
Token mom
Token ID1995
Feature activation-2.31e-02

Pos logit contributions
 blamed+0.372
 amused+0.338
gging+0.335
 ashamed+0.326
cks+0.325
Neg logit contributions
t-0.357
 hill-0.356
 to-0.355
anger-0.346
 ladder-0.331
Token.
Token ID13
Feature activation+1.72e-01

Pos logit contributions
t+0.044
 hill+0.044
anger+0.043
 to+0.042
 ladder+0.041
Neg logit contributions
 blamed-0.039
 amused-0.036
gging-0.035
 ashamed-0.034
cks-0.034
Token stool
Token ID38753
Feature activation-3.42e-01

Pos logit contributions
t+0.308
 hill+0.302
 to+0.297
anger+0.297
 ladder+0.281
Neg logit contributions
 blamed-0.331
 amused-0.303
gging-0.300
 ashamed-0.294
cks-0.290
Token,
Token ID11
Feature activation-1.42e-01

Pos logit contributions
 hill+0.589
anger+0.577
 plunge+0.570
t+0.568
 mountain+0.551
Neg logit contributions
 blamed-0.601
 ashamed-0.538
 amused-0.521
gging-0.520
aron-0.506
Token so
Token ID523
Feature activation+7.93e-02

Pos logit contributions
t+0.240
 to+0.239
 hill+0.237
anger+0.229
 ladder+0.221
Neg logit contributions
 blamed-0.274
 amused-0.251
gging-0.250
cks-0.243
 ashamed-0.243
Token she
Token ID673
Feature activation-1.00e-01

Pos logit contributions
 blamed+0.182
gging+0.169
 amused+0.168
cks+0.165
 ashamed+0.162
Neg logit contributions
 to-0.107
t-0.107
 hill-0.105
anger-0.099
 ladder-0.097
Token decided
Token ID3066
Feature activation+1.32e-01

Pos logit contributions
t+0.178
 hill+0.176
anger+0.173
 to+0.172
oes+0.163
Neg logit contributions
 blamed-0.185
 amused-0.168
gging-0.165
 ashamed-0.163
cks-0.159
Token to
Token ID284
Feature activation-4.84e-01

Pos logit contributions
 blamed+0.388
 amused+0.375
 ashamed+0.373
 innocent+0.365
 angry+0.361
Neg logit contributions
 hill-0.080
anger-0.073
oes-0.063
ho-0.057
 mountain-0.056
Token sit
Token ID1650
Feature activation-4.93e-02

Pos logit contributions
t+0.706
anger+0.690
 hill+0.674
 to+0.659
oes+0.632
Neg logit contributions
 blamed-1.050
gging-0.954
 amused-0.946
 ashamed-0.934
cks-0.917
Token down
Token ID866
Feature activation-1.90e-01

Pos logit contributions
 to+0.058
 hill+0.052
t+0.051
anger+0.048
 ladder+0.047
Neg logit contributions
 blamed-0.131
 amused-0.118
gging-0.118
 ashamed-0.116
cks-0.114
Token.
Token ID13
Feature activation+1.73e-01

Pos logit contributions
t+0.349
 hill+0.345
anger+0.340
 to+0.337
 ladder+0.321
Neg logit contributions
 blamed-0.337
 amused-0.307
gging-0.302
 ashamed-0.297
cks-0.292
Token\n
Token ID198
Feature activation-6.63e-02

Pos logit contributions
 blamed+0.324
 amused+0.294
gging+0.291
 ashamed+0.286
cks+0.279
Neg logit contributions
t-0.303
 hill-0.300
 to-0.294
anger-0.293
 ladder-0.277
Token\n
Token ID198
Feature activation-6.60e-02

Pos logit contributions
t+0.101
 hill+0.100
 to+0.099
anger+0.098
 ladder+0.092
Neg logit contributions
 blamed-0.139
 amused-0.128
gging-0.127
 ashamed-0.125
cks-0.123
Token both
Token ID1111
Feature activation+2.71e-01

Pos logit contributions
 blamed+0.405
 amused+0.378
gging+0.371
 ashamed+0.364
 years+0.363
Neg logit contributions
 hill-0.430
t-0.412
anger-0.412
 ladder-0.401
 plunge-0.399
Token of
Token ID286
Feature activation-2.19e-01

Pos logit contributions
 blamed+0.489
gging+0.453
 amused+0.445
cks+0.434
aron+0.423
Neg logit contributions
 to-0.501
t-0.489
 hill-0.482
 arrived-0.452
 ladder-0.449
Token them
Token ID606
Feature activation-1.79e-02

Pos logit contributions
 to+0.399
 hill+0.392
t+0.383
anger+0.366
 ladder+0.365
Neg logit contributions
 blamed-0.403
 amused-0.367
gging-0.365
 ashamed-0.355
cks-0.346
Token had
Token ID550
Feature activation-2.18e-02

Pos logit contributions
 to+0.035
t+0.035
 hill+0.035
anger+0.033
 ladder+0.032
Neg logit contributions
 blamed-0.030
 amused-0.027
gging-0.027
cks-0.026
 ashamed-0.026
Token something
Token ID1223
Feature activation-1.42e-02

Pos logit contributions
 to+0.038
t+0.037
 hill+0.035
 arrived+0.034
anger+0.034
Neg logit contributions
 blamed-0.042
gging-0.039
 amused-0.039
cks-0.038
 ashamed-0.037
Token to
Token ID284
Feature activation-4.83e-01

Pos logit contributions
t+0.026
 hill+0.026
anger+0.026
 to+0.025
 ladder+0.024
Neg logit contributions
 blamed-0.025
 amused-0.023
gging-0.022
 ashamed-0.022
cks-0.022
Token remember
Token ID3505
Feature activation+1.11e-01

Pos logit contributions
 to+0.678
t+0.636
 hill+0.633
 prize+0.578
 plunge+0.578
Neg logit contributions
 blamed-1.095
 amused-1.017
gging-0.993
 ashamed-0.985
cks-0.967
Token each
Token ID1123
Feature activation+1.05e-01

Pos logit contributions
 blamed+0.187
 amused+0.169
gging+0.167
 ashamed+0.163
cks+0.161
Neg logit contributions
t-0.216
 hill-0.213
anger-0.209
 to-0.209
 ladder-0.199
Token other
Token ID584
Feature activation+1.80e-02

Pos logit contributions
 blamed+0.211
 amused+0.194
gging+0.192
 ashamed+0.189
cks+0.186
Neg logit contributions
t-0.170
 hill-0.167
anger-0.165
 to-0.164
 ladder-0.154
Token by
Token ID416
Feature activation-1.04e-01

Pos logit contributions
 blamed+0.031
gging+0.027
 amused+0.027
cks+0.026
 ashamed+0.026
Neg logit contributions
 to-0.035
t-0.035
 hill-0.034
anger-0.033
 ladder-0.032
Token.
Token ID13
Feature activation+1.73e-01

Pos logit contributions
 to+0.199
 hill+0.191
t+0.182
ho+0.173
 plunge+0.173
Neg logit contributions
 blamed-0.187
 amused-0.161
 ashamed-0.160
aky-0.154
aron-0.153
Token out
Token ID503
Feature activation-2.59e-01

Pos logit contributions
t+0.028
 hill+0.027
anger+0.027
 to+0.027
oes+0.026
Neg logit contributions
 blamed-0.025
 amused-0.023
gging-0.022
 ashamed-0.022
 innocent-0.021
Token and
Token ID290
Feature activation-3.12e-02

Pos logit contributions
t+0.431
 to+0.429
 hill+0.427
anger+0.414
 ladder+0.396
Neg logit contributions
 blamed-0.505
 amused-0.460
gging-0.459
 ashamed-0.447
cks-0.443
Token look
Token ID804
Feature activation-2.30e-01

Pos logit contributions
t+0.056
 hill+0.055
 to+0.055
anger+0.054
 ladder+0.052
Neg logit contributions
 blamed-0.057
 amused-0.052
gging-0.051
 ashamed-0.050
cks-0.050
Token for
Token ID329
Feature activation-9.86e-02

Pos logit contributions
 to+0.377
t+0.294
 arrived+0.294
 plunge+0.273
'll+0.271
Neg logit contributions
 blamed-0.493
gging-0.484
cks-0.459
aron-0.446
aky-0.432
Token something
Token ID1223
Feature activation-1.49e-02

Pos logit contributions
 to+0.168
 hill+0.148
t+0.145
 plunge+0.144
anger+0.143
Neg logit contributions
 blamed-0.205
gging-0.193
cks-0.188
 amused-0.187
aky-0.185
Token to
Token ID284
Feature activation-4.82e-01

Pos logit contributions
t+0.021
 hill+0.021
anger+0.021
oes+0.019
 ladder+0.019
Neg logit contributions
 blamed-0.032
 amused-0.030
 ashamed-0.030
 innocent-0.029
 years-0.029
Token chew
Token ID34722
Feature activation+1.18e-02

Pos logit contributions
 to+0.652
 hill+0.632
t+0.613
 ladder+0.576
anger+0.570
Neg logit contributions
 blamed-1.116
 amused-1.035
gging-1.024
 ashamed-1.010
cks-0.982
Token.
Token ID13
Feature activation+1.64e-01

Pos logit contributions
 blamed+0.020
 amused+0.018
gging+0.017
 ashamed+0.017
cks+0.017
Neg logit contributions
t-0.023
 hill-0.023
 to-0.023
anger-0.023
 ladder-0.022
Token\n
Token ID198
Feature activation-6.91e-02

Pos logit contributions
 blamed+0.316
 amused+0.292
gging+0.290
cks+0.282
 ashamed+0.281
Neg logit contributions
t-0.279
 to-0.275
 hill-0.274
anger-0.267
 ladder-0.255
Token\n
Token ID198
Feature activation-6.97e-02

Pos logit contributions
t+0.096
 to+0.095
 hill+0.092
anger+0.087
 ladder+0.084
Neg logit contributions
 blamed-0.157
gging-0.146
 amused-0.146
 ashamed-0.143
cks-0.141
TokenHe
Token ID1544
Feature activation+1.64e-01

Pos logit contributions
t+0.119
 hill+0.115
 to+0.114
anger+0.107
 ladder+0.105
Neg logit contributions
 blamed-0.136
 amused-0.126
gging-0.124
 ashamed-0.124
cks-0.122
Token she
Token ID673
Feature activation-1.23e-01

Pos logit contributions
 blamed+0.426
 amused+0.391
gging+0.391
 ashamed+0.390
 years+0.384
Neg logit contributions
anger-0.429
t-0.428
 hill-0.422
oes-0.402
 ladder-0.395
Token had
Token ID550
Feature activation-3.13e-02

Pos logit contributions
 to+0.201
t+0.199
 hill+0.193
anger+0.186
 arrived+0.184
Neg logit contributions
 blamed-0.244
gging-0.229
 amused-0.225
cks-0.221
 ashamed-0.216
Token figured
Token ID11638
Feature activation-1.69e-02

Pos logit contributions
 to+0.051
t+0.049
 hill+0.047
anger+0.046
 arrived+0.045
Neg logit contributions
 blamed-0.064
gging-0.060
 amused-0.059
cks-0.058
 ashamed-0.057
Token out
Token ID503
Feature activation-2.18e-01

Pos logit contributions
 hill+0.025
t+0.022
anger+0.022
ho+0.021
 mountain+0.021
Neg logit contributions
 blamed-0.037
 amused-0.035
 ashamed-0.034
 innocent-0.033
gging-0.033
Token how
Token ID703
Feature activation+1.09e-01

Pos logit contributions
 to+0.425
t+0.396
 hill+0.394
anger+0.385
 ladder+0.374
Neg logit contributions
 blamed-0.384
gging-0.347
 amused-0.341
cks-0.336
 ashamed-0.324
Token to
Token ID284
Feature activation-4.82e-01

Pos logit contributions
 blamed+0.274
 amused+0.260
 ashamed+0.257
gging+0.251
 innocent+0.251
Neg logit contributions
 hill-0.118
anger-0.115
t-0.114
oes-0.103
 to-0.102
Token attach
Token ID10199
Feature activation-9.14e-02

Pos logit contributions
t+0.643
 hill+0.642
 to+0.633
anger+0.610
 ladder+0.580
Neg logit contributions
 blamed-1.110
 amused-1.034
gging-1.015
 ashamed-1.006
cks-0.989
Token the
Token ID262
Feature activation+4.08e-03

Pos logit contributions
t+0.159
 hill+0.158
anger+0.154
 to+0.149
 ladder+0.145
Neg logit contributions
 blamed-0.168
 amused-0.153
gging-0.152
 ashamed-0.150
cks-0.149
Token car
Token ID1097
Feature activation-2.81e-01

Pos logit contributions
 blamed+0.008
 innocent+0.007
 ashamed+0.007
 amused+0.007
 years+0.007
Neg logit contributions
t-0.006
anger-0.006
oes-0.006
 hill-0.006
ho-0.006
Token to
Token ID284
Feature activation-4.68e-01

Pos logit contributions
t+0.457
 hill+0.457
anger+0.445
 plunge+0.437
ho+0.418
Neg logit contributions
 blamed-0.502
 ashamed-0.463
 amused-0.462
 innocent-0.461
gging-0.459
Token the
Token ID262
Feature activation+1.22e-02

Pos logit contributions
 to+0.909
 hill+0.877
t+0.863
anger+0.843
 ladder+0.828
Neg logit contributions
 blamed-0.833
gging-0.744
 amused-0.742
 ashamed-0.714
cks-0.713
Token music
Token ID2647
Feature activation+8.05e-02

Pos logit contributions
 blamed+0.266
 ashamed+0.263
 amused+0.259
 angry+0.255
 mad+0.253
Neg logit contributions
 hill-0.231
anger-0.230
ho-0.230
t-0.228
oes-0.220
Token,
Token ID11
Feature activation-1.33e-01

Pos logit contributions
 blamed+0.139
 amused+0.127
gging+0.125
 ashamed+0.124
cks+0.120
Neg logit contributions
t-0.149
 hill-0.148
anger-0.146
 to-0.143
 ladder-0.139
Token so
Token ID523
Feature activation+7.83e-02

Pos logit contributions
 to+0.236
t+0.232
 hill+0.231
anger+0.220
 ladder+0.214
Neg logit contributions
 blamed-0.248
gging-0.226
 amused-0.224
cks-0.220
 ashamed-0.217
Token she
Token ID673
Feature activation-9.98e-02

Pos logit contributions
 blamed+0.183
gging+0.172
 amused+0.169
cks+0.168
aron+0.163
Neg logit contributions
 to-0.105
t-0.102
 hill-0.099
 ladder-0.091
 arrived-0.090
Token went
Token ID1816
Feature activation-4.80e-02

Pos logit contributions
t+0.172
 to+0.171
 hill+0.170
anger+0.165
 ladder+0.158
Neg logit contributions
 blamed-0.187
 amused-0.171
gging-0.171
cks-0.166
 ashamed-0.166
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
t+0.048
ho+0.048
 hill+0.046
anger+0.043
oes+0.040
Neg logit contributions
 blamed-0.117
 amused-0.116
 ashamed-0.113
 angry-0.112
 sorry-0.111
Token the
Token ID262
Feature activation+4.51e-03

Pos logit contributions
t+0.783
 hill+0.779
 to+0.777
anger+0.755
 ladder+0.719
Neg logit contributions
 blamed-0.963
 amused-0.882
gging-0.876
 ashamed-0.857
cks-0.850
Token neighbor
Token ID4780
Feature activation-3.04e-02

Pos logit contributions
 blamed+0.009
 amused+0.009
 ashamed+0.009
 innocent+0.009
 years+0.009
Neg logit contributions
t-0.007
anger-0.006
oes-0.006
 to-0.006
ho-0.006
Token's
Token ID338
Feature activation-1.08e-01

Pos logit contributions
t+0.052
anger+0.052
 hill+0.048
 plunge+0.047
ho+0.046
Neg logit contributions
 blamed-0.059
 years-0.052
 ashamed-0.052
 amused-0.052
aky-0.050
Token house
Token ID2156
Feature activation-2.58e-01

Pos logit contributions
t+0.156
anger+0.138
 plunge+0.122
oes+0.122
ho+0.120
Neg logit contributions
 blamed-0.251
 ashamed-0.234
 innocent-0.233
 amused-0.231
 years-0.231
Token to
Token ID284
Feature activation-4.64e-01

Pos logit contributions
 to+0.503
t+0.503
 hill+0.498
anger+0.485
 ladder+0.466
Neg logit contributions
 blamed-0.435
 amused-0.391
gging-0.389
 ashamed-0.376
cks-0.375
Token on
Token ID319
Feature activation-2.11e-01

Pos logit contributions
t+0.529
 hill+0.522
 to+0.517
anger+0.509
 ladder+0.482
Neg logit contributions
 blamed-0.627
 amused-0.574
gging-0.569
 ashamed-0.557
cks-0.551
Token so
Token ID523
Feature activation+6.26e-02

Pos logit contributions
 to+0.404
 hill+0.400
t+0.379
 ladder+0.374
anger+0.361
Neg logit contributions
 blamed-0.382
 amused-0.348
gging-0.341
 ashamed-0.337
aky-0.327
Token she
Token ID673
Feature activation-1.12e-01

Pos logit contributions
 blamed+0.155
gging+0.146
 amused+0.144
cks+0.142
aron+0.141
Neg logit contributions
 to-0.078
t-0.076
 hill-0.073
 ladder-0.067
anger-0.063
Token went
Token ID1816
Feature activation-5.59e-02

Pos logit contributions
 to+0.197
t+0.196
 hill+0.193
anger+0.186
 arrived+0.181
Neg logit contributions
 blamed-0.206
gging-0.190
 amused-0.189
cks-0.185
 ashamed-0.181
Token up
Token ID510
Feature activation-1.21e-01

Pos logit contributions
t+0.072
 hill+0.068
anger+0.067
ho+0.065
oes+0.062
Neg logit contributions
 blamed-0.125
 amused-0.119
 ashamed-0.117
 angry-0.114
cks-0.114
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
t+0.111
anger+0.110
oes+0.099
 hill+0.095
 plunge+0.088
Neg logit contributions
 blamed-0.316
 amused-0.304
 years-0.303
 angry-0.301
gging-0.297
Token her
Token ID607
Feature activation+1.95e-01

Pos logit contributions
t+0.940
anger+0.905
 hill+0.902
 to+0.881
oes+0.848
Neg logit contributions
 blamed-0.827
 amused-0.751
gging-0.742
 ashamed-0.720
cks-0.711
Token mom
Token ID1995
Feature activation-2.58e-02

Pos logit contributions
 blamed+0.404
 amused+0.373
 ashamed+0.372
 years+0.367
gging+0.364
Neg logit contributions
t-0.306
anger-0.285
 hill-0.272
oes-0.268
 to-0.264
Token and
Token ID290
Feature activation-1.15e-03

Pos logit contributions
t+0.049
 hill+0.048
anger+0.048
 to+0.046
 plunge+0.045
Neg logit contributions
 blamed-0.044
 amused-0.040
gging-0.039
 ashamed-0.039
cks-0.038
Token asked
Token ID1965
Feature activation+1.20e-01

Pos logit contributions
t+0.002
 hill+0.002
 to+0.002
anger+0.002
 ladder+0.002
Neg logit contributions
 blamed-0.002
 amused-0.002
gging-0.002
 ashamed-0.001
cks-0.001
Token her
Token ID607
Feature activation+2.04e-01

Pos logit contributions
 blamed+0.213
cks+0.194
gging+0.192
 amused+0.190
 annoying+0.184
Neg logit contributions
 to-0.243
t-0.223
anger-0.214
 hill-0.212
 ladder-0.205
Token that
Token ID326
Feature activation-1.46e-01

Pos logit contributions
 to+0.286
t+0.284
 hill+0.278
anger+0.266
 arrived+0.256
Neg logit contributions
 blamed-0.337
gging-0.307
 amused-0.303
cks-0.295
 ashamed-0.294
Token he
Token ID339
Feature activation-1.76e-01

Pos logit contributions
t+0.229
anger+0.226
 hill+0.223
 to+0.215
 plunge+0.206
Neg logit contributions
 blamed-0.301
 amused-0.277
 ashamed-0.274
gging-0.270
cks-0.263
Token had
Token ID550
Feature activation-2.51e-02

Pos logit contributions
 to+0.284
t+0.278
 hill+0.268
 arrived+0.260
'll+0.249
Neg logit contributions
 blamed-0.346
gging-0.335
 amused-0.330
cks-0.326
aron-0.310
Token been
Token ID587
Feature activation+4.99e-02

Pos logit contributions
t+0.037
 to+0.037
 hill+0.037
anger+0.036
 ladder+0.034
Neg logit contributions
 blamed-0.054
 amused-0.050
gging-0.049
cks-0.048
 ashamed-0.048
Token able
Token ID1498
Feature activation+1.72e-01

Pos logit contributions
 blamed+0.085
gging+0.078
cks+0.076
 amused+0.075
aron+0.071
Neg logit contributions
 to-0.100
t-0.094
 hill-0.090
 arrived-0.089
 ladder-0.087
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
 blamed+0.504
 ashamed+0.503
 amused+0.490
 years+0.485
 angry+0.483
Neg logit contributions
anger-0.124
 hill-0.102
 Pole-0.082
t-0.076
 stair-0.075
Token climb
Token ID12080
Feature activation-2.12e-01

Pos logit contributions
 hill+0.635
 to+0.629
t+0.629
anger+0.591
 ladder+0.573
Neg logit contributions
 blamed-1.110
 amused-1.038
gging-1.019
 ashamed-1.003
cks-1.000
Token to
Token ID284
Feature activation-4.71e-01

Pos logit contributions
t+0.384
anger+0.371
 hill+0.363
oes+0.351
 to+0.350
Neg logit contributions
 blamed-0.370
gging-0.339
 amused-0.336
cks-0.333
 ashamed-0.331
Token the
Token ID262
Feature activation+1.09e-02

Pos logit contributions
t+0.834
 hill+0.832
 to+0.824
anger+0.807
 ladder+0.773
Neg logit contributions
 blamed-0.876
 amused-0.798
gging-0.791
 ashamed-0.771
cks-0.762
Token top
Token ID1353
Feature activation-2.40e-01

Pos logit contributions
 blamed+0.024
 ashamed+0.023
 awful+0.023
 years+0.023
 innocent+0.022
Neg logit contributions
t-0.015
anger-0.014
 to-0.013
oes-0.013
or-0.012
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
anger+0.361
t+0.353
 plunge+0.344
ho+0.337
 hill+0.328
Neg logit contributions
 blamed-0.463
 years-0.442
 amused-0.423
 ashamed-0.423
 annoying-0.418
Token and
Token ID290
Feature activation-3.44e-02

Pos logit contributions
anger+0.091
 hill+0.091
t+0.088
 plunge+0.086
ho+0.086
Neg logit contributions
 blamed-0.095
 amused-0.086
gging-0.085
cks-0.083
 years-0.082
Token looked
Token ID3114
Feature activation-7.98e-02

Pos logit contributions
 to+0.064
 hill+0.062
t+0.062
anger+0.059
 ladder+0.059
Neg logit contributions
 blamed-0.061
gging-0.056
 amused-0.055
cks-0.054
 ashamed-0.053
Token around
Token ID1088
Feature activation-8.72e-02

Pos logit contributions
 to+0.142
 hill+0.110
t+0.109
 arrived+0.109
 ladder+0.106
Neg logit contributions
 blamed-0.165
gging-0.157
cks-0.153
aron-0.149
aky-0.143
Token for
Token ID329
Feature activation-9.64e-02

Pos logit contributions
 to+0.174
 hill+0.169
t+0.168
anger+0.162
 ladder+0.158
Neg logit contributions
 blamed-0.148
gging-0.131
 amused-0.131
cks-0.128
 ashamed-0.127
Token something
Token ID1223
Feature activation-1.01e-02

Pos logit contributions
 to+0.168
 hill+0.145
 plunge+0.141
 arrived+0.140
 prize+0.137
Neg logit contributions
 blamed-0.198
cks-0.187
gging-0.186
aky-0.180
 amused-0.179
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
t+0.016
 hill+0.015
anger+0.015
 to+0.015
oes+0.014
Neg logit contributions
 blamed-0.021
 amused-0.019
gging-0.019
 ashamed-0.019
 innocent-0.018
Token wear
Token ID5806
Feature activation+1.82e-02

Pos logit contributions
 to+0.627
 hill+0.604
t+0.576
 plunge+0.543
 ladder+0.542
Neg logit contributions
 blamed-1.142
 amused-1.059
 ashamed-1.039
gging-1.038
aron-1.013
Token.
Token ID13
Feature activation+1.65e-01

Pos logit contributions
 blamed+0.029
 amused+0.026
gging+0.026
 ashamed+0.025
cks+0.025
Neg logit contributions
t-0.037
 to-0.037
 hill-0.036
anger-0.036
 ladder-0.034
Token Suddenly
Token ID24975
Feature activation-4.82e-03

Pos logit contributions
 blamed+0.325
 amused+0.303
gging+0.302
cks+0.296
 ashamed+0.291
Neg logit contributions
t-0.270
 to-0.269
 hill-0.266
anger-0.257
 ladder-0.247
Token,
Token ID11
Feature activation-1.07e-01

Pos logit contributions
t+0.007
 hill+0.007
anger+0.007
oes+0.007
 to+0.007
Neg logit contributions
 blamed-0.010
 amused-0.009
 ashamed-0.009
gging-0.009
 innocent-0.009
Token she
Token ID673
Feature activation-7.72e-02

Pos logit contributions
t+0.170
 hill+0.167
 to+0.165
anger+0.164
 ladder+0.154
Neg logit contributions
 blamed-0.216
 amused-0.199
gging-0.197
 ashamed-0.193
cks-0.190
Token now
Token ID783
Feature activation-5.96e-02

Pos logit contributions
 blamed+0.109
 amused+0.105
cks+0.104
gging+0.102
 ashamed+0.098
Neg logit contributions
 to-0.132
t-0.131
 hill-0.126
 arrived-0.121
 would-0.117
Token it
Token ID340
Feature activation-2.22e-01

Pos logit contributions
t+0.097
 hill+0.096
 to+0.095
anger+0.094
 ladder+0.089
Neg logit contributions
 blamed-0.118
 amused-0.109
gging-0.108
 ashamed-0.105
cks-0.104
Token was
Token ID373
Feature activation-1.79e-01

Pos logit contributions
t+0.284
 hill+0.281
anger+0.275
 to+0.273
 ladder+0.254
Neg logit contributions
 blamed-0.517
 amused-0.481
gging-0.475
 ashamed-0.471
cks-0.463
Token too
Token ID1165
Feature activation+7.76e-02

Pos logit contributions
 to+0.355
t+0.303
 hill+0.293
 arrived+0.292
 ladder+0.290
Neg logit contributions
 blamed-0.328
gging-0.316
cks-0.282
 amused-0.282
aron-0.280
Token late
Token ID2739
Feature activation+1.30e-01

Pos logit contributions
 blamed+0.147
cks+0.135
gging+0.134
ags+0.132
agged+0.122
Neg logit contributions
 to-0.151
t-0.147
 hill-0.133
 ladder-0.130
 arrived-0.122
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
 blamed+0.205
 amused+0.185
gging+0.183
 ashamed+0.178
cks+0.176
Neg logit contributions
t-0.263
 hill-0.261
anger-0.257
 to-0.252
 ladder-0.245
Token yield
Token ID7800
Feature activation-5.99e-02

Pos logit contributions
 to+0.645
 hill+0.637
t+0.588
 arrived+0.574
 ladder+0.573
Neg logit contributions
 blamed-1.093
 amused-1.045
gging-1.027
 ashamed-0.998
cks-0.997
Token.
Token ID13
Feature activation+1.69e-01

Pos logit contributions
t+0.088
 hill+0.087
anger+0.085
 to+0.083
 ladder+0.079
Neg logit contributions
 blamed-0.128
 amused-0.119
gging-0.118
 ashamed-0.117
cks-0.114
Token<|endoftext|>
Token ID50256
Feature activation+6.66e-02

Pos logit contributions
 blamed+0.329
gging+0.319
 amused+0.316
cks+0.316
ck+0.304
Neg logit contributions
t-0.276
 to-0.274
 hill-0.263
anger-0.253
 ladder-0.246
TokenOnce
Token ID7454
Feature activation-4.83e-02

Pos logit contributions
 blamed+0.127
 amused+0.113
gging+0.112
cks+0.112
 years+0.111
Neg logit contributions
t-0.107
anger-0.106
 hill-0.106
 plunge-0.103
ho-0.102
Token there
Token ID612
Feature activation-1.31e-01

Pos logit contributions
 hill+0.051
anger+0.050
t+0.050
 plunge+0.047
 to+0.046
Neg logit contributions
 blamed-0.121
 amused-0.115
 ashamed-0.113
gging-0.110
cks-0.109
Token tall
Token ID7331
Feature activation-1.27e-02

Pos logit contributions
 blamed+0.087
gging+0.082
aron+0.078
cks+0.077
aky+0.070
Neg logit contributions
 to-0.165
t-0.158
 ladder-0.152
 hill-0.152
ho-0.146
Token,
Token ID11
Feature activation-1.23e-01

Pos logit contributions
t+0.021
 hill+0.021
anger+0.020
 to+0.020
 ladder+0.019
Neg logit contributions
 blamed-0.025
 amused-0.023
gging-0.022
 ashamed-0.022
cks-0.022
Token so
Token ID523
Feature activation+8.84e-02

Pos logit contributions
t+0.177
 hill+0.173
anger+0.171
 to+0.170
 ladder+0.158
Neg logit contributions
 blamed-0.270
 amused-0.251
gging-0.247
 ashamed-0.244
cks-0.240
Token she
Token ID673
Feature activation-9.86e-02

Pos logit contributions
 blamed+0.192
gging+0.179
 amused+0.175
cks+0.173
 ashamed+0.170
Neg logit contributions
 to-0.132
t-0.130
 hill-0.129
 ladder-0.120
anger-0.120
Token had
Token ID550
Feature activation-1.65e-02

Pos logit contributions
t+0.174
 hill+0.171
 to+0.171
anger+0.167
 ladder+0.160
Neg logit contributions
 blamed-0.183
 amused-0.167
gging-0.166
 ashamed-0.162
cks-0.161
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
t+0.027
 hill+0.027
anger+0.027
 to+0.026
oes+0.025
Neg logit contributions
 blamed-0.033
 amused-0.030
gging-0.030
 ashamed-0.029
cks-0.029
Token stand
Token ID1302
Feature activation-2.03e-01

Pos logit contributions
 hill+0.615
 to+0.604
 ladder+0.554
t+0.530
 plunge+0.527
Neg logit contributions
 blamed-1.140
 amused-1.098
 ashamed-1.044
cks-1.042
gging-1.040
Token on
Token ID319
Feature activation-1.65e-01

Pos logit contributions
 to+0.278
t+0.260
 hill+0.260
anger+0.247
 ladder+0.239
Neg logit contributions
 blamed-0.483
gging-0.437
 amused-0.432
 ashamed-0.422
cks-0.420
Token her
Token ID607
Feature activation+1.99e-01

Pos logit contributions
t+0.290
 hill+0.289
 to+0.289
anger+0.281
 ladder+0.269
Neg logit contributions
 blamed-0.311
 amused-0.282
gging-0.279
 ashamed-0.273
cks-0.270
Token t
Token ID256
Feature activation-5.00e-02

Pos logit contributions
 blamed+0.417
 amused+0.389
 years+0.388
gging+0.388
 ashamed+0.385
Neg logit contributions
t-0.301
anger-0.278
 hill-0.267
oes-0.264
 to-0.263
Tokenipp
Token ID3974
Feature activation+3.36e-04

Pos logit contributions
 hill+0.079
 plunge+0.079
 arrived+0.076
 ladder+0.075
 prize+0.073
Neg logit contributions
aky-0.091
 years-0.090
 blamed-0.087
aron-0.084
ady-0.083
Token beautiful
Token ID4950
Feature activation+9.30e-02

Pos logit contributions
 blamed+0.087
gging+0.082
aron+0.078
cks+0.077
aky+0.070
Neg logit contributions
 to-0.165
t-0.158
 ladder-0.152
 hill-0.152
ho-0.146
Token,
Token ID11
Feature activation-1.25e-01

Pos logit contributions
 blamed+0.157
 amused+0.145
gging+0.143
 ashamed+0.140
 years+0.138
Neg logit contributions
t-0.176
 hill-0.172
anger-0.171
 to-0.166
oes-0.162
Token so
Token ID523
Feature activation+8.61e-02

Pos logit contributions
t+0.201
 hill+0.197
anger+0.195
 to+0.192
 ladder+0.182
Neg logit contributions
 blamed-0.250
 amused-0.231
gging-0.226
 ashamed-0.224
cks-0.220
Token he
Token ID339
Feature activation-1.59e-01

Pos logit contributions
 blamed+0.188
gging+0.175
 amused+0.170
cks+0.169
 ashamed+0.165
Neg logit contributions
 to-0.130
t-0.127
 hill-0.124
 ladder-0.115
anger-0.115
Token decided
Token ID3066
Feature activation+1.31e-01

Pos logit contributions
t+0.277
 to+0.273
 hill+0.273
anger+0.265
 ladder+0.254
Neg logit contributions
 blamed-0.297
 amused-0.271
gging-0.270
 ashamed-0.263
cks-0.262
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
 blamed+0.373
 amused+0.364
 ashamed+0.360
 innocent+0.355
 angry+0.348
Neg logit contributions
anger-0.089
 hill-0.085
oes-0.074
t-0.071
ho-0.068
Token design
Token ID1486
Feature activation-2.28e-01

Pos logit contributions
t+0.695
anger+0.674
 hill+0.656
 to+0.640
oes+0.605
Neg logit contributions
 blamed-1.056
gging-0.967
 amused-0.956
 ashamed-0.941
cks-0.937
Token something
Token ID1223
Feature activation-7.09e-03

Pos logit contributions
 to+0.401
t+0.384
 hill+0.381
anger+0.376
 ladder+0.355
Neg logit contributions
 blamed-0.444
gging-0.400
 amused-0.398
 ashamed-0.387
cks-0.378
Token that
Token ID326
Feature activation-9.85e-02

Pos logit contributions
t+0.014
 hill+0.013
anger+0.013
oes+0.013
 to+0.013
Neg logit contributions
 blamed-0.012
 amused-0.011
 ashamed-0.011
 innocent-0.010
gging-0.010
Token he
Token ID339
Feature activation-1.45e-01

Pos logit contributions
anger+0.148
t+0.146
 hill+0.145
 to+0.139
 ladder+0.134
Neg logit contributions
 blamed-0.206
 amused-0.192
 ashamed-0.188
gging-0.183
 innocent-0.182
Token could
Token ID714
Feature activation-1.55e-01

Pos logit contributions
t+0.223
 hill+0.219
 to+0.217
anger+0.214
 ladder+0.202
Neg logit contributions
 blamed-0.301
 amused-0.277
gging-0.276
 ashamed-0.270
cks-0.267
Token all
Token ID477
Feature activation-4.73e-03

Pos logit contributions
t+0.093
 hill+0.089
 to+0.087
anger+0.081
 ladder+0.080
Neg logit contributions
 blamed-0.097
 amused-0.090
cks-0.089
 ashamed-0.089
gging-0.088
Token sticky
Token ID23408
Feature activation+2.08e-01

Pos logit contributions
 to+0.007
t+0.007
 hill+0.007
anger+0.007
 ladder+0.007
Neg logit contributions
 blamed-0.010
gging-0.009
 amused-0.009
cks-0.009
 ashamed-0.009
Token so
Token ID523
Feature activation+7.59e-02

Pos logit contributions
 blamed+0.384
 amused+0.349
gging+0.347
 ashamed+0.340
cks+0.335
Neg logit contributions
t-0.368
 hill-0.363
 to-0.357
anger-0.356
 ladder-0.338
Token he
Token ID339
Feature activation-1.60e-01

Pos logit contributions
 blamed+0.142
gging+0.127
cks+0.124
aron+0.122
 amused+0.120
Neg logit contributions
 to-0.149
t-0.145
 ladder-0.132
 hill-0.130
ho-0.129
Token decided
Token ID3066
Feature activation+1.33e-01

Pos logit contributions
t+0.263
 to+0.262
 hill+0.256
anger+0.243
 ladder+0.238
Neg logit contributions
 blamed-0.314
 amused-0.292
gging-0.292
cks-0.283
 ashamed-0.277
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
 blamed+0.400
 ashamed+0.386
 amused+0.380
 innocent+0.374
 angry+0.371
Neg logit contributions
anger-0.074
 hill-0.073
oes-0.060
 plunge-0.053
t-0.052
Token take
Token ID1011
Feature activation-1.56e-01

Pos logit contributions
t+0.680
 hill+0.670
 to+0.664
anger+0.650
 ladder+0.612
Neg logit contributions
 blamed-1.061
 amused-0.983
gging-0.975
 ashamed-0.956
cks-0.948
Token a
Token ID257
Feature activation+1.23e-01

Pos logit contributions
t+0.297
 to+0.297
 hill+0.290
anger+0.286
 ladder+0.272
Neg logit contributions
 blamed-0.275
 amused-0.249
gging-0.244
 ashamed-0.238
cks-0.236
Token bath
Token ID7837
Feature activation-2.93e-01

Pos logit contributions
 angry+0.215
 mad+0.213
 sorry+0.207
 ashamed+0.202
 amused+0.198
Neg logit contributions
ane-0.199
anger-0.199
oes-0.197
 awaits-0.185
ho-0.183
Token.
Token ID13
Feature activation+1.74e-01

Pos logit contributions
 hill+0.570
t+0.566
anger+0.551
 ladder+0.546
 to+0.542
Neg logit contributions
 blamed-0.459
 amused-0.414
gging-0.412
 ashamed-0.411
 years-0.403
Token He
Token ID679
Feature activation+5.63e-02

Pos logit contributions
 blamed+0.334
 amused+0.307
gging+0.304
 ashamed+0.296
cks+0.296
Neg logit contributions
t-0.297
 hill-0.292
 to-0.291
anger-0.285
 ladder-0.271
Token,
Token ID11
Feature activation-1.38e-01

Pos logit contributions
t+0.254
anger+0.251
 hill+0.249
 plunge+0.240
 to+0.239
Neg logit contributions
 blamed-0.195
 amused-0.173
 ashamed-0.169
gging-0.169
 years-0.166
Token so
Token ID523
Feature activation+8.04e-02

Pos logit contributions
 to+0.237
 hill+0.221
t+0.220
 ladder+0.207
anger+0.204
Neg logit contributions
 blamed-0.268
gging-0.253
cks-0.247
 amused-0.244
 ashamed-0.237
Token he
Token ID339
Feature activation-1.57e-01

Pos logit contributions
 blamed+0.186
gging+0.176
cks+0.172
 amused+0.171
aron+0.166
Neg logit contributions
 to-0.109
t-0.106
 hill-0.103
 ladder-0.095
 arrived-0.095
Token used
Token ID973
Feature activation+9.24e-02

Pos logit contributions
t+0.275
 to+0.272
 hill+0.271
anger+0.264
 ladder+0.253
Neg logit contributions
 blamed-0.292
 amused-0.267
gging-0.267
cks-0.258
 ashamed-0.258
Token it
Token ID340
Feature activation-1.99e-01

Pos logit contributions
 blamed+0.159
 amused+0.143
 ashamed+0.139
gging+0.139
cks+0.138
Neg logit contributions
t-0.174
 hill-0.173
anger-0.173
oes-0.163
ho-0.160
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
 hill+0.102
anger+0.095
t+0.094
 ladder+0.072
 plunge+0.070
Neg logit contributions
 blamed-0.608
 amused-0.575
 innocent-0.573
 ashamed-0.572
gging-0.571
Token open
Token ID1280
Feature activation+1.92e-02

Pos logit contributions
 to+0.670
 hill+0.668
t+0.636
 ladder+0.601
anger+0.593
Neg logit contributions
 blamed-1.081
 amused-1.010
gging-0.997
 ashamed-0.985
cks-0.968
Token the
Token ID262
Feature activation+9.21e-03

Pos logit contributions
 blamed+0.034
 amused+0.031
gging+0.031
cks+0.031
 years+0.031
Neg logit contributions
t-0.035
anger-0.034
 hill-0.033
oes-0.032
ho-0.032
Token chest
Token ID7721
Feature activation-2.58e-02

Pos logit contributions
 blamed+0.020
 innocent+0.019
 years+0.019
 ashamed+0.019
 awful+0.019
Neg logit contributions
t-0.013
anger-0.011
oes-0.011
or-0.011
ho-0.011
Token.
Token ID13
Feature activation+1.75e-01

Pos logit contributions
 hill+0.052
t+0.052
anger+0.052
 plunge+0.050
 to+0.049
Neg logit contributions
 blamed-0.038
 amused-0.034
 ashamed-0.034
gging-0.033
 innocent-0.033
Token Inside
Token ID14384
Feature activation-1.99e-01

Pos logit contributions
 blamed+0.337
 amused+0.308
gging+0.305
 ashamed+0.299
cks+0.295
Neg logit contributions
t-0.299
 hill-0.295
 to-0.292
anger-0.289
 ladder-0.274
Token!"
Token ID2474
Feature activation-1.20e-01

Pos logit contributions
 blamed+0.006
 amused+0.006
 years+0.006
gging+0.005
aron+0.005
Neg logit contributions
 to-0.007
 hill-0.007
 ladder-0.006
oes-0.006
anger-0.006
Token\n
Token ID198
Feature activation-1.04e-01

Pos logit contributions
 to+0.200
t+0.193
 hill+0.179
 ladder+0.171
 arrived+0.171
Neg logit contributions
 blamed-0.235
cks-0.225
 amused-0.224
gging-0.223
 innocent-0.217
Token\n
Token ID198
Feature activation-9.26e-02

Pos logit contributions
 to+0.102
t+0.101
 hill+0.087
ore+0.078
 ladder+0.077
Neg logit contributions
 blamed-0.269
 amused-0.257
 ashamed-0.256
 annoying-0.255
 innocent-0.254
TokenThey
Token ID2990
Feature activation+2.07e-01

Pos logit contributions
t+0.155
 hill+0.150
 to+0.144
 ladder+0.135
where+0.131
Neg logit contributions
 blamed-0.176
 ashamed-0.168
 amused-0.167
 awful-0.161
 innocent-0.160
Token start
Token ID923
Feature activation-1.51e-01

Pos logit contributions
 blamed+0.349
 amused+0.345
gging+0.344
cks+0.340
aron+0.324
Neg logit contributions
 to-0.379
 arrived-0.379
 hill-0.378
t-0.355
ho-0.351
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
anger+0.076
t+0.063
 hill+0.060
 plunge+0.047
oes+0.044
Neg logit contributions
 blamed-0.480
 amused-0.458
gging-0.457
 ashamed-0.447
 angry-0.445
Token fight
Token ID1907
Feature activation-1.17e-01

Pos logit contributions
 hill+0.703
 to+0.678
 ladder+0.623
 plunge+0.592
 stair+0.582
Neg logit contributions
 blamed-1.046
 amused-1.011
 ashamed-0.982
aky-0.978
 years-0.965
Token.
Token ID13
Feature activation+1.64e-01

Pos logit contributions
 to+0.203
t+0.184
 hill+0.178
 ladder+0.163
anger+0.163
Neg logit contributions
 blamed-0.240
 amused-0.220
aky-0.218
gging-0.217
 ashamed-0.214
Token They
Token ID1119
Feature activation+1.32e-01

Pos logit contributions
 blamed+0.284
 amused+0.279
gging+0.277
aky+0.275
cks+0.274
Neg logit contributions
 to-0.302
 hill-0.292
t-0.291
anger-0.277
 ladder-0.274
Token push
Token ID4574
Feature activation-1.05e-01

Pos logit contributions
 blamed+0.283
 amused+0.283
gging+0.280
aron+0.280
 years+0.274
Neg logit contributions
 to-0.177
 hill-0.177
t-0.171
 ladder-0.160
 arrived-0.158
Token and
Token ID290
Feature activation-5.79e-03

Pos logit contributions
 to+0.184
 hill+0.166
t+0.162
 ladder+0.150
anger+0.149
Neg logit contributions
 blamed-0.215
 amused-0.199
 ashamed-0.191
cks-0.188
 sincere-0.188
Token around
Token ID1088
Feature activation-9.94e-02

Pos logit contributions
 to+0.408
t+0.373
 hill+0.364
 ladder+0.353
 arrived+0.350
Neg logit contributions
 blamed-0.312
gging-0.293
aron-0.274
cks-0.272
 amused-0.267
Token,
Token ID11
Feature activation-1.20e-01

Pos logit contributions
t+0.174
anger+0.169
 hill+0.168
 to+0.164
oes+0.158
Neg logit contributions
 blamed-0.186
 amused-0.172
gging-0.170
 ashamed-0.166
cks-0.163
Token so
Token ID523
Feature activation+9.18e-02

Pos logit contributions
 to+0.208
 hill+0.202
t+0.202
anger+0.188
 ladder+0.188
Neg logit contributions
 blamed-0.233
gging-0.213
 amused-0.210
cks-0.209
 ashamed-0.204
Token he
Token ID339
Feature activation-1.56e-01

Pos logit contributions
 blamed+0.211
gging+0.197
 amused+0.194
cks+0.192
 ashamed+0.188
Neg logit contributions
 to-0.126
t-0.123
 hill-0.122
 ladder-0.112
anger-0.111
Token decided
Token ID3066
Feature activation+1.33e-01

Pos logit contributions
t+0.268
 to+0.264
 hill+0.263
anger+0.255
 ladder+0.245
Neg logit contributions
 blamed-0.299
 amused-0.274
gging-0.273
cks-0.265
 ashamed-0.264
Token to
Token ID284
Feature activation-4.80e-01

Pos logit contributions
 blamed+0.387
 amused+0.376
 ashamed+0.375
 innocent+0.370
 angry+0.363
Neg logit contributions
anger-0.086
 hill-0.079
oes-0.069
t-0.065
ho-0.064
Token keep
Token ID1394
Feature activation-9.29e-02

Pos logit contributions
t+0.672
anger+0.654
 hill+0.633
 to+0.621
oes+0.590
Neg logit contributions
 blamed-1.077
gging-0.987
 amused-0.975
 ashamed-0.966
cks-0.952
Token looking
Token ID2045
Feature activation-3.76e-03

Pos logit contributions
 to+0.147
 hill+0.141
t+0.139
anger+0.135
 plunge+0.130
Neg logit contributions
 blamed-0.197
 amused-0.179
gging-0.179
 ashamed-0.173
cks-0.171
Token through
Token ID832
Feature activation-1.59e-01

Pos logit contributions
 to+0.006
 hill+0.006
t+0.005
 ladder+0.005
anger+0.005
Neg logit contributions
 blamed-0.008
gging-0.008
cks-0.007
 amused-0.007
aron-0.007
Token the
Token ID262
Feature activation+1.36e-02

Pos logit contributions
 to+0.283
 hill+0.277
t+0.260
 ladder+0.254
anger+0.253
Neg logit contributions
 blamed-0.313
 amused-0.283
gging-0.283
 ashamed-0.274
cks-0.272
Token store
Token ID3650
Feature activation-2.34e-01

Pos logit contributions
 blamed+0.026
 years+0.025
 innocent+0.025
 ashamed+0.024
 amused+0.024
Neg logit contributions
t-0.024
anger-0.021
 to-0.020
oes-0.020
 arrived-0.019
Token hills
Token ID18639
Feature activation-3.23e-01

Pos logit contributions
 to+0.176
t+0.165
 hill+0.164
anger+0.152
 ladder+0.151
Neg logit contributions
 blamed-0.195
 amused-0.175
gging-0.173
cks-0.171
 ashamed-0.165
Token that
Token ID326
Feature activation-1.19e-01

Pos logit contributions
t+0.617
 hill+0.608
anger+0.601
 to+0.601
 ladder+0.570
Neg logit contributions
 blamed-0.547
 amused-0.494
gging-0.490
 ashamed-0.477
cks-0.472
Token were
Token ID547
Feature activation-5.75e-03

Pos logit contributions
t+0.189
anger+0.187
 to+0.176
 hill+0.174
 plunge+0.169
Neg logit contributions
 blamed-0.240
 amused-0.222
 ashamed-0.220
gging-0.215
 innocent-0.213
Token very
Token ID845
Feature activation+1.30e-01

Pos logit contributions
t+0.012
 to+0.012
 hill+0.012
anger+0.011
 ladder+0.011
Neg logit contributions
 blamed-0.009
gging-0.008
 amused-0.008
cks-0.008
 ashamed-0.008
Token tricky
Token ID17198
Feature activation+1.14e-01

Pos logit contributions
 blamed+0.218
gging+0.205
cks+0.197
 amused+0.192
aron+0.188
Neg logit contributions
 to-0.257
 hill-0.256
t-0.254
 ladder-0.244
anger-0.233
Token to
Token ID284
Feature activation-4.80e-01

Pos logit contributions
 blamed+0.170
 amused+0.151
gging+0.149
 ashamed+0.147
 years+0.146
Neg logit contributions
t-0.239
 hill-0.235
anger-0.235
 to-0.226
oes-0.224
Token climb
Token ID12080
Feature activation-2.13e-01

Pos logit contributions
 hill+0.644
t+0.638
 to+0.637
anger+0.599
 ladder+0.585
Neg logit contributions
 blamed-1.107
 amused-1.034
gging-1.017
 ashamed-1.003
cks-0.990
Token.
Token ID13
Feature activation+1.70e-01

Pos logit contributions
t+0.387
 hill+0.376
anger+0.373
 to+0.368
oes+0.353
Neg logit contributions
 blamed-0.378
 amused-0.343
gging-0.343
 ashamed-0.335
cks-0.333
Token Daisy
Token ID40355
Feature activation-3.27e-02

Pos logit contributions
 blamed+0.331
 amused+0.313
gging+0.310
cks+0.307
 ashamed+0.298
Neg logit contributions
 to-0.280
t-0.279
 hill-0.276
anger-0.262
 ladder-0.257
Token was
Token ID373
Feature activation-1.51e-01

Pos logit contributions
t+0.053
 to+0.053
 hill+0.052
anger+0.050
 ladder+0.048
Neg logit contributions
 blamed-0.065
 amused-0.060
gging-0.060
cks-0.058
 ashamed-0.058
Token very
Token ID845
Feature activation+1.47e-01

Pos logit contributions
 to+0.336
t+0.303
 ladder+0.297
 hill+0.296
 arrived+0.290
Neg logit contributions
 blamed-0.225
gging-0.208
cks-0.206
aron-0.192
 amused-0.188
Token nest
Token ID16343
Feature activation-1.70e-01

Pos logit contributions
 to+0.801
 hill+0.765
anger+0.726
t+0.723
 ladder+0.716
Neg logit contributions
 blamed-0.872
 amused-0.797
aky-0.795
gging-0.780
 years-0.778
Token,
Token ID11
Feature activation-1.31e-01

Pos logit contributions
t+0.310
 hill+0.308
anger+0.303
 to+0.299
 plunge+0.286
Neg logit contributions
 blamed-0.298
 amused-0.270
gging-0.267
 ashamed-0.265
cks-0.260
Token so
Token ID523
Feature activation+8.02e-02

Pos logit contributions
 to+0.222
t+0.221
 hill+0.220
anger+0.210
 ladder+0.204
Neg logit contributions
 blamed-0.251
 amused-0.229
gging-0.229
cks-0.223
 ashamed-0.222
Token she
Token ID673
Feature activation-9.89e-02

Pos logit contributions
 blamed+0.183
gging+0.170
 amused+0.169
cks+0.166
 ashamed+0.163
Neg logit contributions
 to-0.109
t-0.108
 hill-0.107
anger-0.099
 ladder-0.098
Token decided
Token ID3066
Feature activation+1.34e-01

Pos logit contributions
t+0.175
 hill+0.172
 to+0.171
anger+0.168
 ladder+0.160
Neg logit contributions
 blamed-0.183
 amused-0.167
gging-0.166
 ashamed-0.162
cks-0.160
Token to
Token ID284
Feature activation-4.80e-01

Pos logit contributions
 blamed+0.378
 amused+0.371
 ashamed+0.370
 innocent+0.362
 angry+0.357
Neg logit contributions
anger-0.095
 hill-0.094
oes-0.077
t-0.071
loo-0.071
Token give
Token ID1577
Feature activation-7.80e-02

Pos logit contributions
t+0.725
anger+0.712
 hill+0.672
 to+0.656
oes+0.637
Neg logit contributions
 blamed-1.024
gging-0.935
 ashamed-0.915
 amused-0.911
cks-0.909
Token him
Token ID683
Feature activation+4.24e-02

Pos logit contributions
t+0.140
 hill+0.138
 to+0.136
anger+0.135
 ladder+0.129
Neg logit contributions
 blamed-0.144
 amused-0.131
gging-0.130
 ashamed-0.127
cks-0.125
Token a
Token ID257
Feature activation+1.26e-01

Pos logit contributions
 blamed+0.073
 amused+0.066
gging+0.065
cks+0.062
 ashamed+0.062
Neg logit contributions
 to-0.085
t-0.081
 hill-0.081
anger-0.079
 ladder-0.076
Token helping
Token ID5742
Feature activation+1.31e-02

Pos logit contributions
 ashamed+0.204
 sorry+0.198
 amused+0.187
 angry+0.187
 blamed+0.187
Neg logit contributions
t-0.222
anger-0.220
or-0.216
ho-0.214
oes-0.211
Token hand
Token ID1021
Feature activation-1.11e-01

Pos logit contributions
 blamed+0.024
 years+0.023
aron+0.022
gging+0.022
aky+0.022
Neg logit contributions
t-0.021
anger-0.021
 hill-0.020
 Pole-0.019
ho-0.019
Token were
Token ID547
Feature activation-3.89e-02

Pos logit contributions
 blamed+0.382
 amused+0.347
gging+0.346
 ashamed+0.339
cks+0.333
Neg logit contributions
t-0.332
 hill-0.328
 to-0.323
anger-0.323
 ladder-0.304
Token so
Token ID523
Feature activation+6.28e-02

Pos logit contributions
 to+0.081
t+0.074
 hill+0.072
 ladder+0.070
 arrived+0.070
Neg logit contributions
 blamed-0.064
gging-0.062
aron-0.058
cks-0.057
 amused-0.054
Token many
Token ID867
Feature activation+5.58e-02

Pos logit contributions
gging+0.084
 blamed+0.084
aron+0.080
cks+0.077
ags+0.072
Neg logit contributions
 to-0.147
t-0.141
 ladder-0.137
 hill-0.134
ho-0.130
Token fun
Token ID1257
Feature activation+1.86e-01

Pos logit contributions
 blamed+0.098
gging+0.094
cks+0.091
 amused+0.088
aky+0.086
Neg logit contributions
 to-0.107
 hill-0.106
 ladder-0.099
t-0.098
 mountain-0.096
Token things
Token ID1243
Feature activation-1.08e-01

Pos logit contributions
 blamed+0.316
 amused+0.288
gging+0.280
 ashamed+0.278
 years+0.276
Neg logit contributions
t-0.351
anger-0.344
 hill-0.342
 to-0.328
oes-0.326
Token to
Token ID284
Feature activation-4.80e-01

Pos logit contributions
t+0.169
 hill+0.169
anger+0.168
oes+0.157
 to+0.157
Neg logit contributions
 blamed-0.216
 amused-0.199
 ashamed-0.194
gging-0.193
cks-0.188
Token look
Token ID804
Feature activation-2.20e-01

Pos logit contributions
 hill+0.592
 to+0.583
t+0.544
 ladder+0.510
anger+0.507
Neg logit contributions
 blamed-1.173
 amused-1.097
 ashamed-1.078
gging-1.077
cks-1.055
Token at
Token ID379
Feature activation+7.49e-02

Pos logit contributions
 to+0.392
t+0.338
 hill+0.329
anger+0.312
or+0.308
Neg logit contributions
 blamed-0.458
gging-0.423
cks-0.408
 amused-0.398
aron-0.387
Token.
Token ID13
Feature activation+1.65e-01

Pos logit contributions
 blamed+0.135
 amused+0.123
gging+0.122
 ashamed+0.119
cks+0.117
Neg logit contributions
t-0.136
 to-0.136
 hill-0.136
anger-0.132
 ladder-0.126
Token She
Token ID1375
Feature activation+5.31e-02

Pos logit contributions
 blamed+0.314
 amused+0.294
gging+0.293
cks+0.285
 ashamed+0.282
Neg logit contributions
t-0.283
 to-0.281
 hill-0.281
anger-0.269
 ladder-0.260
Token saw
Token ID2497
Feature activation-1.29e-01

Pos logit contributions
 blamed+0.100
gging+0.096
 amused+0.094
cks+0.094
 ashamed+0.090
Neg logit contributions
 to-0.091
 hill-0.089
t-0.088
 arrived-0.085
anger-0.083
Token time
Token ID640
Feature activation-5.99e-03

Pos logit contributions
 to+0.091
t+0.083
 hill+0.080
 arrived+0.079
'll+0.076
Neg logit contributions
 blamed-0.064
gging-0.060
aron-0.057
cks-0.056
 amused-0.055
Token,
Token ID11
Feature activation-1.37e-01

Pos logit contributions
t+0.009
 hill+0.009
anger+0.009
oes+0.009
 plunge+0.008
Neg logit contributions
 blamed-0.012
 amused-0.011
 ashamed-0.011
gging-0.011
cks-0.011
Token she
Token ID673
Feature activation-1.00e-01

Pos logit contributions
t+0.213
anger+0.208
 hill+0.206
 to+0.201
oes+0.193
Neg logit contributions
 blamed-0.284
 amused-0.261
gging-0.257
 ashamed-0.256
 innocent-0.248
Token made
Token ID925
Feature activation-5.87e-02

Pos logit contributions
 to+0.154
 hill+0.152
t+0.152
anger+0.144
 arrived+0.142
Neg logit contributions
 blamed-0.206
gging-0.193
 amused-0.192
cks-0.186
 ashamed-0.184
Token sure
Token ID1654
Feature activation+2.15e-01

Pos logit contributions
 to+0.121
anger+0.107
t+0.106
 hill+0.105
 ladder+0.105
Neg logit contributions
 blamed-0.103
gging-0.090
 amused-0.089
cks-0.089
aky-0.087
Token to
Token ID284
Feature activation-4.80e-01

Pos logit contributions
 blamed+0.562
 ashamed+0.543
 amused+0.528
 innocent+0.519
 angry+0.515
Neg logit contributions
anger-0.217
 hill-0.207
t-0.192
 Pole-0.181
oes-0.181
Token put
Token ID1234
Feature activation+5.97e-02

Pos logit contributions
 to+0.679
 hill+0.661
t+0.652
 ladder+0.607
anger+0.606
Neg logit contributions
 blamed-1.075
 amused-1.014
gging-0.987
 ashamed-0.975
cks-0.969
Token them
Token ID606
Feature activation+4.07e-04

Pos logit contributions
 blamed+0.121
 amused+0.109
gging+0.109
 ashamed+0.106
cks+0.102
Neg logit contributions
 to-0.101
 hill-0.097
t-0.096
anger-0.096
 ladder-0.092
Token in
Token ID287
Feature activation-5.17e-02

Pos logit contributions
 blamed+0.001
 amused+0.001
gging+0.001
 ashamed+0.001
cks+0.001
Neg logit contributions
t-0.001
 hill-0.001
anger-0.001
 to-0.001
 ladder-0.001
Token a
Token ID257
Feature activation+1.23e-01

Pos logit contributions
 to+0.107
 hill+0.101
t+0.097
 ladder+0.095
anger+0.093
Neg logit contributions
 blamed-0.090
gging-0.082
 amused-0.081
cks-0.077
 ashamed-0.077
Token basket
Token ID7988
Feature activation-2.40e-01

Pos logit contributions
 blamed+0.243
 ashamed+0.235
 years+0.232
 innocent+0.231
aky+0.231
Neg logit contributions
t-0.197
oes-0.192
anger-0.184
ho-0.171
eware-0.166
Token play
Token ID711
Feature activation-1.33e-01

Pos logit contributions
 blamed+0.026
 ashamed+0.023
 amused+0.022
 innocent+0.021
 spit+0.021
Neg logit contributions
anger-0.023
 hill-0.021
t-0.021
ane-0.020
 Pole-0.020
Token together
Token ID1978
Feature activation+5.44e-02

Pos logit contributions
 to+0.209
 hill+0.184
t+0.173
 prize+0.169
 ladder+0.164
Neg logit contributions
 blamed-0.291
 amused-0.280
cks-0.277
gging-0.272
 annoying-0.266
Token nicely
Token ID16576
Feature activation+1.27e-01

Pos logit contributions
 blamed+0.088
 amused+0.079
gging+0.078
 ashamed+0.076
cks+0.075
Neg logit contributions
 to-0.117
t-0.112
 hill-0.111
anger-0.104
 ladder-0.104
Token.
Token ID13
Feature activation+1.90e-01

Pos logit contributions
 blamed+0.207
 amused+0.184
gging+0.182
 ashamed+0.176
cks+0.173
Neg logit contributions
 to-0.278
t-0.266
 hill-0.261
anger-0.249
 ladder-0.242
Token They
Token ID1119
Feature activation+1.59e-01

Pos logit contributions
 blamed+0.386
gging+0.382
cks+0.373
 amused+0.366
aky+0.359
Neg logit contributions
 to-0.308
t-0.299
 hill-0.282
anger-0.274
 ladder-0.268
Token both
Token ID1111
Feature activation+3.26e-01

Pos logit contributions
gging+0.285
cks+0.281
 blamed+0.275
aron+0.269
 amused+0.267
Neg logit contributions
 to-0.297
t-0.286
 hill-0.284
 arrived-0.282
'll-0.262
Token agreed
Token ID4987
Feature activation+2.21e-01

Pos logit contributions
cks+0.717
gging+0.715
ags+0.684
 amused+0.643
ivers+0.643
Neg logit contributions
 to-0.511
 arrived-0.509
t-0.462
 would-0.458
'll-0.438
Token and
Token ID290
Feature activation+1.18e-02

Pos logit contributions
 blamed+0.539
 amused+0.504
gging+0.497
 ashamed+0.496
 innocent+0.483
Neg logit contributions
 hill-0.260
t-0.257
anger-0.256
oes-0.237
 to-0.233
Token had
Token ID550
Feature activation+1.48e-02

Pos logit contributions
 blamed+0.022
 amused+0.020
gging+0.020
 ashamed+0.019
cks+0.019
Neg logit contributions
t-0.021
 hill-0.021
 to-0.021
anger-0.020
 ladder-0.019
Token fun
Token ID1257
Feature activation+2.25e-01

Pos logit contributions
 blamed+0.026
cks+0.025
gging+0.025
 amused+0.024
aky+0.023
Neg logit contributions
 to-0.031
t-0.027
 hill-0.026
 arrived-0.026
'll-0.025
Token playing
Token ID2712
Feature activation+6.47e-02

Pos logit contributions
 blamed+0.409
 amused+0.376
gging+0.369
 ashamed+0.363
aky+0.357
Neg logit contributions
 to-0.426
 hill-0.411
t-0.407
anger-0.383
 ladder-0.379
Token apple
Token ID17180
Feature activation-5.90e-02

Pos logit contributions
 blamed+0.027
gging+0.026
 amused+0.026
cks+0.026
aky+0.024
Neg logit contributions
 to-0.021
 hill-0.019
t-0.019
 arrived-0.018
 ladder-0.018
Token juice
Token ID13135
Feature activation+2.58e-02

Pos logit contributions
t+0.105
 hill+0.105
anger+0.104
 plunge+0.100
 ladder+0.099
Neg logit contributions
 blamed-0.101
cks-0.092
gging-0.092
 amused-0.091
 ashamed-0.091
Token.
Token ID13
Feature activation+1.90e-01

Pos logit contributions
 blamed+0.030
 amused+0.025
aron+0.025
gging+0.025
 ashamed+0.025
Neg logit contributions
 hill-0.061
t-0.060
anger-0.059
 plunge-0.059
 ladder-0.058
Token They
Token ID1119
Feature activation+1.60e-01

Pos logit contributions
 blamed+0.308
 amused+0.276
gging+0.272
 ashamed+0.266
cks+0.261
Neg logit contributions
t-0.385
 hill-0.381
 to-0.377
anger-0.374
 ladder-0.357
Token are
Token ID389
Feature activation-1.36e-02

Pos logit contributions
 blamed+0.301
gging+0.294
 amused+0.288
cks+0.286
aron+0.276
Neg logit contributions
 hill-0.276
 to-0.273
t-0.262
 arrived-0.255
 ladder-0.250
Token happy
Token ID3772
Feature activation+3.28e-01

Pos logit contributions
 to+0.027
t+0.022
 ladder+0.022
 hill+0.021
 arrived+0.021
Neg logit contributions
gging-0.026
 blamed-0.024
cks-0.023
aron-0.022
ck-0.022
Token and
Token ID290
Feature activation+1.27e-02

Pos logit contributions
 blamed+0.731
 amused+0.679
gging+0.668
 ashamed+0.664
 years+0.649
Neg logit contributions
t-0.460
 hill-0.455
anger-0.448
 to-0.427
 ladder-0.417
Token say
Token ID910
Feature activation+1.31e-01

Pos logit contributions
gging+0.024
aky+0.023
 blamed+0.023
cks+0.022
aron+0.022
Neg logit contributions
 to-0.024
 hill-0.020
 arrived-0.020
t-0.020
 visit-0.020
Token thank
Token ID5875
Feature activation+6.67e-02

Pos logit contributions
 blamed+0.260
 amused+0.238
gging+0.236
 ashamed+0.230
cks+0.230
Neg logit contributions
t-0.218
 to-0.215
 hill-0.214
anger-0.209
 ladder-0.199
Token you
Token ID345
Feature activation-1.03e-01

Pos logit contributions
 blamed+0.186
 amused+0.175
 ashamed+0.172
aron+0.169
cks+0.168
Neg logit contributions
 to-0.067
t-0.065
or-0.052
oes-0.049
anger-0.047
Token.
Token ID13
Feature activation+1.91e-01

Pos logit contributions
 to+0.209
t+0.208
 hill+0.200
anger+0.197
 ladder+0.188
Neg logit contributions
 blamed-0.176
 amused-0.157
gging-0.156
cks-0.152
 ashamed-0.148
Token size
Token ID2546
Feature activation+4.12e-02

Pos logit contributions
 blamed+0.070
 years+0.067
aron+0.066
gging+0.065
 innocent+0.065
Neg logit contributions
t-0.085
anger-0.083
 hill-0.080
 plunge-0.079
 ladder-0.076
Token.
Token ID13
Feature activation+1.86e-01

Pos logit contributions
 blamed+0.058
 amused+0.054
gging+0.052
 ashamed+0.052
 years+0.051
Neg logit contributions
 hill-0.087
t-0.085
anger-0.083
 plunge-0.081
 ladder-0.080
Token She
Token ID1375
Feature activation+7.11e-02

Pos logit contributions
 blamed+0.340
 amused+0.319
cks+0.317
gging+0.317
ck+0.300
Neg logit contributions
 to-0.337
t-0.337
anger-0.326
 hill-0.325
 ladder-0.309
Token was
Token ID373
Feature activation-1.39e-01

Pos logit contributions
 blamed+0.148
gging+0.143
 amused+0.142
cks+0.142
 ashamed+0.136
Neg logit contributions
 to-0.107
 hill-0.103
t-0.102
 arrived-0.100
 ladder-0.093
Token so
Token ID523
Feature activation+1.16e-01

Pos logit contributions
 to+0.274
t+0.222
 arrived+0.222
 ladder+0.216
 hill+0.210
Neg logit contributions
 blamed-0.251
gging-0.251
cks-0.238
aron-0.222
itive-0.218
Token happy
Token ID3772
Feature activation+3.24e-01

Pos logit contributions
 blamed+0.153
gging+0.148
cks+0.146
aron+0.145
ities+0.136
Neg logit contributions
 to-0.269
t-0.260
 ladder-0.244
ho-0.235
 arrived-0.234
Token that
Token ID326
Feature activation-8.10e-02

Pos logit contributions
 blamed+0.589
 years+0.565
 amused+0.563
 ashamed+0.549
gging+0.549
Neg logit contributions
 hill-0.543
anger-0.538
t-0.535
 plunge-0.529
oes-0.518
Token she
Token ID673
Feature activation-6.77e-02

Pos logit contributions
t+0.135
anger+0.134
 hill+0.134
 to+0.128
oes+0.124
Neg logit contributions
 blamed-0.159
 amused-0.146
 ashamed-0.143
gging-0.141
 innocent-0.138
Token succeeded
Token ID14131
Feature activation+6.95e-02

Pos logit contributions
t+0.114
 to+0.114
 hill+0.110
 arrived+0.107
anger+0.106
Neg logit contributions
 blamed-0.130
gging-0.122
 amused-0.120
cks-0.120
 ashamed-0.115
Token in
Token ID287
Feature activation-3.56e-02

Pos logit contributions
 blamed+0.140
 amused+0.127
gging+0.126
 ashamed+0.124
cks+0.122
Neg logit contributions
t-0.114
 to-0.113
 hill-0.112
anger-0.109
 ladder-0.103
Token finding
Token ID4917
Feature activation-3.25e-02

Pos logit contributions
 to+0.065
t+0.065
 hill+0.064
anger+0.062
 ladder+0.060
Neg logit contributions
 blamed-0.065
 amused-0.058
gging-0.058
 ashamed-0.056
cks-0.056
Token ground
Token ID2323
Feature activation-3.42e-02

Pos logit contributions
 blamed+0.055
 amused+0.051
 ashamed+0.050
gging+0.050
 innocent+0.049
Neg logit contributions
t-0.031
anger-0.029
 to-0.028
 hill-0.027
oes-0.026
Token.
Token ID13
Feature activation+1.89e-01

Pos logit contributions
t+0.068
 hill+0.067
 to+0.067
anger+0.066
 ladder+0.063
Neg logit contributions
 blamed-0.056
 amused-0.050
gging-0.050
 ashamed-0.049
cks-0.048
Token The
Token ID383
Feature activation+5.12e-02

Pos logit contributions
 blamed+0.356
 amused+0.323
gging+0.319
 ashamed+0.314
cks+0.307
Neg logit contributions
t-0.331
 hill-0.328
 to-0.321
anger-0.319
 ladder-0.303
Token cat
Token ID3797
Feature activation-1.55e-01

Pos logit contributions
 blamed+0.130
 amused+0.122
gging+0.121
cks+0.119
 ashamed+0.118
Neg logit contributions
t-0.056
 hill-0.056
 to-0.055
anger-0.053
 ladder-0.050
Token was
Token ID373
Feature activation-1.37e-01

Pos logit contributions
t+0.239
 hill+0.236
 to+0.235
anger+0.228
 ladder+0.217
Neg logit contributions
 blamed-0.321
 amused-0.296
gging-0.295
 ashamed-0.288
cks-0.286
Token happy
Token ID3772
Feature activation+3.26e-01

Pos logit contributions
 to+0.274
t+0.231
 arrived+0.230
 ladder+0.227
 hill+0.227
Neg logit contributions
 blamed-0.249
gging-0.236
itive-0.212
cks-0.206
aron-0.204
Token and
Token ID290
Feature activation+1.24e-02

Pos logit contributions
 blamed+0.687
 amused+0.661
 ashamed+0.651
 years+0.637
gging+0.628
Neg logit contributions
anger-0.476
t-0.472
 hill-0.464
 plunge-0.447
oes-0.440
Token safe
Token ID3338
Feature activation+1.45e-01

Pos logit contributions
 blamed+0.020
gging+0.019
cks+0.018
aky+0.017
 amused+0.017
Neg logit contributions
 to-0.026
t-0.025
 hill-0.024
 arrived-0.023
 ladder-0.023
Token,
Token ID11
Feature activation-9.53e-02

Pos logit contributions
 blamed+0.244
 amused+0.221
gging+0.218
 ashamed+0.214
cks+0.210
Neg logit contributions
t-0.281
 hill-0.277
 to-0.273
anger-0.272
 ladder-0.259
Token all
Token ID477
Feature activation+5.85e-02

Pos logit contributions
 to+0.168
t+0.159
 hill+0.159
 ladder+0.148
anger+0.148
Neg logit contributions
 blamed-0.187
gging-0.173
cks-0.169
 amused-0.165
aky-0.162
Token thanks
Token ID5176
Feature activation+2.24e-01

Pos logit contributions
 blamed+0.127
gging+0.122
 amused+0.114
cks+0.114
 ashamed+0.111
Neg logit contributions
 to-0.092
 hill-0.086
t-0.086
 ladder-0.079
 arrived-0.078
Token small
Token ID1402
Feature activation+5.94e-02

Pos logit contributions
 blamed+0.068
gging+0.059
cks+0.058
aron+0.057
 amused+0.056
Neg logit contributions
 to-0.115
t-0.113
 hill-0.112
 ladder-0.105
oes-0.104
Token they
Token ID484
Feature activation-8.65e-02

Pos logit contributions
 blamed+0.120
 amused+0.110
gging+0.109
cks+0.106
 ashamed+0.106
Neg logit contributions
t-0.095
 to-0.095
 hill-0.095
anger-0.091
 ladder-0.086
Token were
Token ID547
Feature activation+2.83e-02

Pos logit contributions
t+0.144
 to+0.143
 hill+0.138
 arrived+0.134
 plunge+0.124
Neg logit contributions
 blamed-0.169
aron-0.158
cks-0.157
 amused-0.157
gging-0.157
Token.
Token ID13
Feature activation+1.93e-01

Pos logit contributions
 blamed+0.050
 amused+0.044
gging+0.044
cks+0.044
aron+0.043
Neg logit contributions
 to-0.056
t-0.053
 hill-0.051
oes-0.048
 arrived-0.048
Token He
Token ID679
Feature activation+7.00e-02

Pos logit contributions
 blamed+0.331
 amused+0.301
gging+0.297
cks+0.288
 ashamed+0.287
Neg logit contributions
t-0.374
 hill-0.369
 to-0.368
anger-0.360
 ladder-0.345
Token learned
Token ID4499
Feature activation+2.87e-01

Pos logit contributions
 blamed+0.138
 amused+0.129
gging+0.129
cks+0.128
 innocent+0.121
Neg logit contributions
 to-0.116
t-0.113
 hill-0.112
 arrived-0.105
anger-0.103
Token that
Token ID326
Feature activation-7.79e-02

Pos logit contributions
 blamed+0.650
 amused+0.603
gging+0.597
cks+0.586
 ashamed+0.581
Neg logit contributions
 to-0.403
t-0.398
 hill-0.386
anger-0.376
 ladder-0.352
Token it
Token ID340
Feature activation-1.70e-01

Pos logit contributions
anger+0.155
t+0.150
 hill+0.145
 ladder+0.139
 arrived+0.139
Neg logit contributions
 blamed-0.132
 amused-0.122
 ashamed-0.120
gging-0.118
 innocent-0.115
Token's
Token ID338
Feature activation-9.24e-02

Pos logit contributions
anger+0.269
 hill+0.262
t+0.250
 ladder+0.246
 plunge+0.244
Neg logit contributions
 blamed-0.357
 ashamed-0.329
 amused-0.326
gging-0.320
 annoying-0.314
Token important
Token ID1593
Feature activation+2.35e-01

Pos logit contributions
 to+0.190
t+0.159
where+0.157
 would+0.155
'll+0.152
Neg logit contributions
 blamed-0.166
aron-0.148
cks-0.148
 amused-0.144
gging-0.143
Token to
Token ID284
Feature activation-4.52e-01

Pos logit contributions
 blamed+0.654
 ashamed+0.649
gging+0.636
 years+0.635
 innocent+0.626
Neg logit contributions
 hill-0.160
anger-0.159
 Pole-0.154
 plunge-0.148
 ladder-0.137
Token does
Token ID857
Feature activation+3.75e-02

Pos logit contributions
 blamed+0.134
gging+0.126
 amused+0.126
 ashamed+0.122
aky+0.121
Neg logit contributions
 to-0.079
 hill-0.078
t-0.075
anger-0.071
 ladder-0.071
Token not
Token ID407
Feature activation+1.67e-01

Pos logit contributions
 blamed+0.070
aky+0.066
aron+0.062
 years+0.060
ady+0.060
Neg logit contributions
 to-0.074
t-0.068
 ladder-0.062
 prize-0.062
 hill-0.061
Token like
Token ID588
Feature activation+2.46e-01

Pos logit contributions
 blamed+0.339
gging+0.324
cks+0.316
 years+0.305
 amused+0.305
Neg logit contributions
 to-0.285
t-0.276
 hill-0.245
 arrived-0.242
oes-0.239
Token dolls
Token ID36062
Feature activation+2.98e-02

Pos logit contributions
 blamed+0.453
 amused+0.416
gging+0.408
 ashamed+0.403
cks+0.396
Neg logit contributions
t-0.443
 hill-0.432
anger-0.429
 plunge-0.401
oes-0.400
Token.
Token ID13
Feature activation+1.75e-01

Pos logit contributions
 blamed+0.047
 amused+0.042
gging+0.042
cks+0.040
 ashamed+0.039
Neg logit contributions
 to-0.064
t-0.062
 hill-0.060
anger-0.059
 arrived-0.057
Token He
Token ID679
Feature activation+5.95e-02

Pos logit contributions
 blamed+0.357
gging+0.335
 amused+0.332
ck+0.331
cks+0.330
Neg logit contributions
 to-0.291
t-0.277
 hill-0.266
anger-0.261
 ladder-0.252
Token thinks
Token ID6834
Feature activation+9.12e-02

Pos logit contributions
 blamed+0.123
 amused+0.119
gging+0.118
aky+0.115
aron+0.113
Neg logit contributions
 to-0.086
 hill-0.083
t-0.081
 arrived-0.077
 ladder-0.077
Token they
Token ID484
Feature activation-9.57e-02

Pos logit contributions
 blamed+0.178
 amused+0.164
gging+0.161
 ashamed+0.159
cks+0.156
Neg logit contributions
t-0.153
 hill-0.151
 to-0.148
anger-0.147
 ladder-0.139
Token are
Token ID389
Feature activation-2.41e-02

Pos logit contributions
t+0.146
 hill+0.144
 to+0.142
anger+0.135
 ladder+0.131
Neg logit contributions
 blamed-0.202
 amused-0.189
gging-0.186
 ashamed-0.181
cks-0.179
Token boring
Token ID14262
Feature activation+2.69e-01

Pos logit contributions
 to+0.048
 prize+0.038
 hill+0.037
 ladder+0.035
 box+0.035
Neg logit contributions
 blamed-0.043
gging-0.043
aron-0.040
iver-0.039
ivery-0.038
Token.
Token ID13
Feature activation+1.75e-01

Pos logit contributions
 blamed+0.421
 amused+0.379
gging+0.378
cks+0.367
 ashamed+0.354
Neg logit contributions
 to-0.587
t-0.560
 hill-0.550
anger-0.532
 ladder-0.507
Token and
Token ID290
Feature activation+2.94e-03

Pos logit contributions
 blamed+0.100
 amused+0.092
gging+0.091
 ashamed+0.089
cks+0.088
Neg logit contributions
t-0.093
 hill-0.091
anger-0.090
 to-0.089
 ladder-0.085
Token try
Token ID1949
Feature activation-2.24e-01

Pos logit contributions
 blamed+0.006
 amused+0.005
 ashamed+0.005
gging+0.005
 innocent+0.005
Neg logit contributions
anger-0.005
t-0.005
 hill-0.005
 to-0.005
 plunge-0.005
Token each
Token ID1123
Feature activation+1.21e-01

Pos logit contributions
t+0.222
 hill+0.212
anger+0.211
 to+0.190
 plunge+0.185
Neg logit contributions
 blamed-0.595
 amused-0.563
gging-0.559
 ashamed-0.551
cks-0.543
Token other
Token ID584
Feature activation+2.50e-02

Pos logit contributions
 blamed+0.219
 amused+0.199
gging+0.197
 ashamed+0.192
cks+0.191
Neg logit contributions
t-0.223
 hill-0.221
 to-0.220
anger-0.216
 ladder-0.206
Token's
Token ID338
Feature activation-9.21e-02

Pos logit contributions
 blamed+0.055
 amused+0.050
gging+0.049
 ashamed+0.049
cks+0.048
Neg logit contributions
 to-0.038
t-0.037
 hill-0.037
anger-0.035
 ladder-0.033
Token toys
Token ID14958
Feature activation+3.32e-02

Pos logit contributions
 to+0.166
 hill+0.162
t+0.161
anger+0.154
 ladder+0.152
Neg logit contributions
 blamed-0.175
 amused-0.162
gging-0.157
cks-0.154
 ashamed-0.153
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
 blamed+0.051
 amused+0.045
gging+0.045
 ashamed+0.043
cks+0.043
Neg logit contributions
t-0.070
 hill-0.069
 to-0.068
anger-0.068
 ladder-0.065
Token They
Token ID1119
Feature activation+1.53e-01

Pos logit contributions
 blamed+0.320
gging+0.311
cks+0.308
 amused+0.304
eling+0.301
Neg logit contributions
 to-0.348
t-0.325
 hill-0.322
anger-0.310
 ladder-0.304
Token found
Token ID1043
Feature activation-7.51e-02

Pos logit contributions
cks+0.337
gging+0.328
aron+0.309
 amused+0.307
ags+0.304
Neg logit contributions
 to-0.239
 would-0.238
 arrived-0.238
t-0.225
 hill-0.224
Token out
Token ID503
Feature activation-2.03e-01

Pos logit contributions
 to+0.145
t+0.135
 hill+0.135
anger+0.133
 ladder+0.131
Neg logit contributions
 blamed-0.133
gging-0.124
 amused-0.121
 years-0.117
aron-0.116
Token that
Token ID326
Feature activation-8.06e-02

Pos logit contributions
 to+0.303
t+0.291
 hill+0.274
anger+0.269
 ladder+0.257
Neg logit contributions
 blamed-0.450
gging-0.419
 amused-0.411
aron-0.402
cks-0.400
Token He
Token ID679
Feature activation+6.59e-02

Pos logit contributions
 blamed+0.382
gging+0.363
cks+0.361
 years+0.359
ck+0.359
Neg logit contributions
 to-0.298
 hill-0.283
anger-0.274
t-0.269
 ladder-0.268
Token was
Token ID373
Feature activation-1.40e-01

Pos logit contributions
gging+0.132
cks+0.132
ags+0.128
 blamed+0.127
 amused+0.120
Neg logit contributions
 to-0.107
 arrived-0.101
t-0.095
 hill-0.095
 would-0.091
Token not
Token ID407
Feature activation+1.69e-01

Pos logit contributions
 to+0.305
 home+0.227
 arrived+0.217
t+0.213
 ladder+0.209
Neg logit contributions
gging-0.223
ities-0.223
itive-0.220
 blamed-0.205
ishment-0.204
Token angry
Token ID7954
Feature activation+4.07e-01

Pos logit contributions
 unison+0.275
cks+0.255
gging+0.242
icious+0.239
 blamed+0.238
Neg logit contributions
 to-0.394
 home-0.308
 arrived-0.288
t-0.276
 ladder-0.274
Token.
Token ID13
Feature activation+1.87e-01

Pos logit contributions
 blamed+0.783
 amused+0.715
gging+0.708
cks+0.699
 ashamed+0.684
Neg logit contributions
 to-0.708
t-0.705
 hill-0.681
anger-0.664
 ladder-0.630
Token He
Token ID679
Feature activation+6.60e-02

Pos logit contributions
 blamed+0.377
ck+0.364
cks+0.361
gging+0.360
 amused+0.355
Neg logit contributions
 to-0.301
 hill-0.283
t-0.279
anger-0.275
 ladder-0.263
Token was
Token ID373
Feature activation-1.39e-01

Pos logit contributions
cks+0.152
gging+0.149
ags+0.148
 blamed+0.136
 amused+0.136
Neg logit contributions
 to-0.092
 arrived-0.089
 would-0.083
t-0.081
 hill-0.075
Token glad
Token ID9675
Feature activation+3.72e-01

Pos logit contributions
 to+0.312
 home+0.225
 arrived+0.224
t+0.222
 ladder+0.218
Neg logit contributions
gging-0.224
itive-0.216
ities-0.213
 blamed-0.200
 unison-0.200
Token that
Token ID326
Feature activation-7.83e-02

Pos logit contributions
 blamed+0.870
 amused+0.809
gging+0.799
 ashamed+0.791
cks+0.777
Neg logit contributions
t-0.484
 hill-0.475
anger-0.466
 to-0.455
 ladder-0.431
Token his
Token ID465
Feature activation+1.48e-01

Pos logit contributions
 to+0.115
t+0.110
 hill+0.109
 arrived+0.100
anger+0.099
Neg logit contributions
 blamed-0.174
gging-0.166
 amused-0.161
cks-0.160
aron-0.158
Token son
Token ID3367
Feature activation+1.21e-01

Pos logit contributions
 blamed+0.301
gging+0.272
 amused+0.270
cks+0.269
 ashamed+0.257
Neg logit contributions
 to-0.245
 hill-0.244
t-0.237
anger-0.236
 ladder-0.228
Token big
Token ID1263
Feature activation-6.94e-02

Pos logit contributions
 ashamed+0.191
 angry+0.191
 amused+0.186
 mad+0.180
 innocent+0.179
Neg logit contributions
t-0.312
oes-0.300
anger-0.298
or-0.295
ane-0.279
Token fish
Token ID5916
Feature activation-9.09e-02

Pos logit contributions
t+0.123
anger+0.106
oes+0.104
or+0.102
'll+0.095
Neg logit contributions
 angry-0.139
 sorry-0.137
 ashamed-0.136
 blamed-0.135
 mad-0.135
Token named
Token ID3706
Feature activation+2.41e-01

Pos logit contributions
anger+0.169
t+0.167
 hill+0.156
 ladder+0.154
oes+0.153
Neg logit contributions
 blamed-0.160
cks-0.146
 ashamed-0.145
 amused-0.144
gging-0.143
Token Max
Token ID5436
Feature activation+7.64e-02

Pos logit contributions
 blamed+0.498
 amused+0.461
gging+0.453
 years+0.451
aky+0.440
Neg logit contributions
 to-0.386
 hill-0.374
t-0.373
anger-0.351
 plunge-0.349
Token.
Token ID13
Feature activation+1.86e-01

Pos logit contributions
 blamed+0.119
 amused+0.108
gging+0.105
 ashamed+0.103
aron+0.102
Neg logit contributions
 to-0.164
 hill-0.157
t-0.155
 arrived-0.149
anger-0.146
Token Max
Token ID5436
Feature activation+7.65e-02

Pos logit contributions
 blamed+0.309
 amused+0.272
gging+0.270
 ashamed+0.268
 innocent+0.255
Neg logit contributions
t-0.374
anger-0.365
 hill-0.364
 to-0.355
 plunge-0.348
Token looked
Token ID3114
Feature activation-3.99e-02

Pos logit contributions
 blamed+0.156
 amused+0.151
cks+0.149
ags+0.146
gging+0.145
Neg logit contributions
 to-0.121
 hill-0.116
t-0.115
 arrived-0.110
ho-0.104
Token strong
Token ID1913
Feature activation+1.28e-01

Pos logit contributions
 to+0.074
 hill+0.069
t+0.069
anger+0.065
 ladder+0.065
Neg logit contributions
 blamed-0.073
gging-0.067
cks-0.065
 amused-0.065
aron-0.064
Token and
Token ID290
Feature activation+1.06e-02

Pos logit contributions
 blamed+0.257
 amused+0.235
gging+0.232
cks+0.227
 ashamed+0.226
Neg logit contributions
 to-0.215
t-0.211
 hill-0.209
anger-0.201
 ladder-0.191
Token happy
Token ID3772
Feature activation+3.25e-01

Pos logit contributions
 blamed+0.018
aron+0.018
gging+0.018
cks+0.018
ags+0.017
Neg logit contributions
 to-0.021
 hill-0.019
t-0.018
 ladder-0.018
ho-0.018
Token.
Token ID13
Feature activation+1.87e-01

Pos logit contributions
 blamed+0.561
 amused+0.507
gging+0.503
 ashamed+0.493
cks+0.481
Neg logit contributions
t-0.624
 hill-0.616
anger-0.608
 to-0.593
 ladder-0.580
Token a
Token ID257
Feature activation+1.42e-01

Pos logit contributions
 to+0.139
 hill+0.128
t+0.124
 mountain+0.121
 prize+0.117
Neg logit contributions
 blamed-0.132
 amused-0.119
cks-0.114
gging-0.113
 ashamed-0.111
Token long
Token ID890
Feature activation-1.21e-02

Pos logit contributions
 ashamed+0.217
 amused+0.215
 sorry+0.208
 innocent+0.205
 blamed+0.205
Neg logit contributions
anger-0.271
oes-0.268
ane-0.260
t-0.246
or-0.245
Token time
Token ID640
Feature activation+4.99e-02

Pos logit contributions
anger+0.024
t+0.024
oes+0.023
 hill+0.023
 to+0.023
Neg logit contributions
 blamed-0.019
gging-0.018
 amused-0.017
 years-0.017
 ashamed-0.016
Token.
Token ID13
Feature activation+1.92e-01

Pos logit contributions
 blamed+0.087
 amused+0.078
gging+0.078
cks+0.075
 ashamed+0.075
Neg logit contributions
 to-0.097
t-0.095
 hill-0.094
anger-0.091
 ladder-0.087
Token Every
Token ID3887
Feature activation+1.79e-01

Pos logit contributions
 blamed+0.310
 ashamed+0.270
gging+0.268
 amused+0.268
 innocent+0.252
Neg logit contributions
t-0.388
 hill-0.384
anger-0.376
 to-0.369
 plunge-0.366
Token time
Token ID640
Feature activation+4.97e-02

Pos logit contributions
 years+0.359
 blamed+0.356
 ashamed+0.348
 amused+0.342
 awful+0.337
Neg logit contributions
anger-0.275
t-0.254
oes-0.252
 Pole-0.246
or-0.240
Token he
Token ID339
Feature activation-1.25e-01

Pos logit contributions
 blamed+0.090
 ashamed+0.082
 sorry+0.077
 amused+0.076
 angry+0.076
Neg logit contributions
anger-0.093
t-0.088
oes-0.088
 hill-0.085
ho-0.083
Token took
Token ID1718
Feature activation-8.16e-03

Pos logit contributions
t+0.227
anger+0.223
 hill+0.217
 to+0.214
oes+0.208
Neg logit contributions
 blamed-0.232
 amused-0.205
gging-0.203
 ashamed-0.201
 years-0.197
Token a
Token ID257
Feature activation+1.42e-01

Pos logit contributions
t+0.016
 hill+0.016
anger+0.016
ho+0.015
 to+0.015
Neg logit contributions
 blamed-0.013
gging-0.012
 amused-0.012
 ashamed-0.012
cks-0.012
Token step
Token ID2239
Feature activation-2.82e-01

Pos logit contributions
 mad+0.224
 angry+0.224
 happy+0.223
 amused+0.219
 ashamed+0.212
Neg logit contributions
anger-0.254
oes-0.248
ane-0.242
ase-0.238
lor-0.232
Token forward
Token ID2651
Feature activation-1.92e-02

Pos logit contributions
anger+0.479
 hill+0.474
 exhibit+0.465
 Pole+0.462
t+0.444
Neg logit contributions
 angry-0.469
 sorry-0.458
 mad-0.447
 innocent-0.445
 ashamed-0.445
Token cried
Token ID16896
Feature activation+3.64e-01

Pos logit contributions
gging+0.225
 amused+0.220
aron+0.214
cks+0.213
ags+0.209
Neg logit contributions
 to-0.324
t-0.303
 hill-0.302
 arrived-0.297
 ladder-0.286
Token and
Token ID290
Feature activation+2.19e-03

Pos logit contributions
 blamed+0.784
cks+0.767
eling+0.730
gging+0.727
 amused+0.724
Neg logit contributions
 to-0.701
t-0.550
 hill-0.515
anger-0.497
 ladder-0.474
Token screamed
Token ID25421
Feature activation+2.72e-01

Pos logit contributions
gging+0.003
icious+0.003
aron+0.003
aky+0.003
iver+0.003
Neg logit contributions
 to-0.005
 arrived-0.004
 ladder-0.004
ho-0.004
 hill-0.004
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
 blamed+0.523
cks+0.474
gging+0.465
 amused+0.464
eling+0.450
Neg logit contributions
 to-0.570
t-0.483
 hill-0.454
anger-0.436
 ladder-0.428
Token They
Token ID1119
Feature activation+1.54e-01

Pos logit contributions
 blamed+0.388
 amused+0.366
gging+0.366
cks+0.365
ck+0.357
Neg logit contributions
 to-0.271
t-0.264
 hill-0.244
anger-0.236
 ladder-0.227
Token had
Token ID550
Feature activation+1.08e-02

Pos logit contributions
gging+0.279
cks+0.275
 amused+0.268
ags+0.265
aron+0.264
Neg logit contributions
 to-0.284
 arrived-0.268
t-0.254
 would-0.251
'll-0.242
Token cuts
Token ID6630
Feature activation+4.04e-02

Pos logit contributions
 blamed+0.022
 amused+0.021
gging+0.021
cks+0.020
 ashamed+0.020
Neg logit contributions
t-0.017
 to-0.017
 hill-0.016
anger-0.016
 ladder-0.015
Token and
Token ID290
Feature activation+2.46e-03

Pos logit contributions
 blamed+0.080
 amused+0.072
gging+0.071
cks+0.068
 ashamed+0.067
Neg logit contributions
 to-0.076
t-0.070
anger-0.067
 hill-0.067
 ladder-0.064
Token bruises
Token ID42232
Feature activation+2.77e-01

Pos logit contributions
 blamed+0.003
gging+0.003
 amused+0.003
aron+0.003
cks+0.002
Neg logit contributions
 to-0.006
 hill-0.006
t-0.006
 ladder-0.006
anger-0.005
Token.
Token ID13
Feature activation+1.79e-01

Pos logit contributions
 blamed+0.493
 amused+0.452
gging+0.441
cks+0.433
 ashamed+0.427
Neg logit contributions
 to-0.538
t-0.530
 hill-0.504
anger-0.492
 arrived-0.478
Token They
Token ID1119
Feature activation+1.54e-01

Pos logit contributions
 blamed+0.368
 amused+0.350
gging+0.349
cks+0.348
eling+0.344
Neg logit contributions
 to-0.293
t-0.289
 hill-0.260
anger-0.254
 ladder-0.243
Token be
Token ID307
Feature activation-9.62e-02

Pos logit contributions
 to+0.598
t+0.580
 hill+0.579
 plunge+0.530
anger+0.525
Neg logit contributions
 blamed-1.074
 amused-1.016
gging-0.992
 ashamed-0.971
cks-0.967
Token restored
Token ID15032
Feature activation+8.82e-02

Pos logit contributions
 to+0.202
t+0.168
 arrived+0.167
'll+0.160
 ladder+0.156
Neg logit contributions
 blamed-0.164
gging-0.157
 amused-0.146
aron-0.144
cks-0.143
Token.
Token ID13
Feature activation+1.79e-01

Pos logit contributions
 blamed+0.133
 years+0.126
gging+0.122
 ashamed+0.120
 amused+0.120
Neg logit contributions
t-0.175
anger-0.175
 hill-0.171
 plunge-0.165
 ladder-0.164
Token I
Token ID314
Feature activation-2.21e-01

Pos logit contributions
 blamed+0.386
 amused+0.349
 ashamed+0.346
gging+0.345
cks+0.335
Neg logit contributions
 hill-0.263
t-0.262
anger-0.254
 to-0.253
 ladder-0.239
Token can
Token ID460
Feature activation-2.95e-01

Pos logit contributions
 hill+0.276
anger+0.270
t+0.255
 to+0.246
 ladder+0.245
Neg logit contributions
 blamed-0.530
 ashamed-0.489
 amused-0.485
gging-0.470
 innocent-0.465
Token fix
Token ID4259
Feature activation+8.63e-03

Pos logit contributions
anger+0.435
t+0.434
 hill+0.432
 to+0.416
 ladder+0.399
Neg logit contributions
 blamed-0.627
gging-0.574
 amused-0.566
 ashamed-0.562
 innocent-0.549
Token it
Token ID340
Feature activation-1.79e-01

Pos logit contributions
 blamed+0.021
 amused+0.020
gging+0.019
 ashamed+0.019
cks+0.019
Neg logit contributions
t-0.010
 hill-0.010
 to-0.010
anger-0.010
 ladder-0.009
Token for
Token ID329
Feature activation-7.66e-02

Pos logit contributions
t+0.334
 hill+0.332
anger+0.328
 to+0.318
 plunge+0.312
Neg logit contributions
 blamed-0.303
 amused-0.275
gging-0.275
 ashamed-0.269
cks-0.267
Token you
Token ID345
Feature activation-1.08e-01

Pos logit contributions
 to+0.109
t+0.101
 hill+0.101
 prize+0.091
anger+0.091
Neg logit contributions
 blamed-0.176
 amused-0.166
gging-0.159
 ashamed-0.157
aron-0.156
Token."
Token ID526
Feature activation+7.69e-02

Pos logit contributions
 hill+0.215
anger+0.214
t+0.213
 plunge+0.201
 ladder+0.201
Neg logit contributions
 blamed-0.171
 amused-0.154
 ashamed-0.153
gging-0.153
 years-0.148
Token Lily
Token ID20037
Feature activation+1.37e-01

Pos logit contributions
 blamed+0.156
 amused+0.148
gging+0.147
cks+0.145
 annoying+0.141
Neg logit contributions
 to-0.124
t-0.121
 hill-0.118
anger-0.114
 ladder-0.110
Token so
Token ID523
Feature activation+1.19e-01

Pos logit contributions
 to+0.291
t+0.230
 ladder+0.225
 arrived+0.216
 hill+0.215
Neg logit contributions
cks-0.240
gging-0.240
aron-0.233
ck-0.226
 blamed-0.221
Token excited
Token ID6568
Feature activation+2.75e-01

Pos logit contributions
cks+0.180
gging+0.176
 unison+0.172
 blamed+0.169
aron+0.166
Neg logit contributions
t-0.259
 to-0.257
 arrived-0.227
 ladder-0.226
 hill-0.224
Token!
Token ID0
Feature activation+3.15e-02

Pos logit contributions
 blamed+0.654
 years+0.637
gging+0.618
 amused+0.616
 ashamed+0.612
Neg logit contributions
 hill-0.303
anger-0.299
 plunge-0.286
 ladder-0.281
ho-0.276
Token
Token ID220
Feature activation-1.34e-02

Pos logit contributions
 blamed+0.052
 ashamed+0.044
 amused+0.044
gging+0.043
aron+0.042
Neg logit contributions
anger-0.061
t-0.061
 plunge-0.060
 hill-0.060
ho-0.059
Token
Token ID198
Feature activation-5.02e-02

Pos logit contributions
t+0.019
 to+0.018
 hill+0.017
anger+0.017
oes+0.016
Neg logit contributions
 blamed-0.030
gging-0.028
 amused-0.028
 ashamed-0.028
cks-0.027
Token
Token ID198
Feature activation-5.03e-02

Pos logit contributions
anger+0.089
 hill+0.088
t+0.084
ho+0.084
 plunge+0.084
Neg logit contributions
 blamed-0.089
 amused-0.081
gging-0.080
cks-0.078
 ashamed-0.077
TokenWhen
Token ID2215
Feature activation-3.89e-02

Pos logit contributions
t+0.090
 hill+0.089
anger+0.088
 to+0.088
 ladder+0.083
Neg logit contributions
 blamed-0.093
 amused-0.084
gging-0.084
 ashamed-0.081
cks-0.081
Token they
Token ID484
Feature activation-9.01e-02

Pos logit contributions
anger+0.074
t+0.071
oes+0.067
 hill+0.066
 plunge+0.066
Neg logit contributions
 blamed-0.069
 ashamed-0.061
 amused-0.059
 sorry-0.059
 years-0.058
Token got
Token ID1392
Feature activation-1.51e-01

Pos logit contributions
anger+0.114
 hill+0.114
 ladder+0.104
oes+0.103
 mountain+0.098
Neg logit contributions
 blamed-0.222
 amused-0.198
 ashamed-0.194
 angry-0.188
 years-0.187
Token to
Token ID284
Feature activation-4.52e-01

Pos logit contributions
anger+0.183
 hill+0.165
t+0.157
ho+0.155
oes+0.146
Neg logit contributions
 angry-0.367
 blamed-0.366
 mad-0.362
 ashamed-0.352
 amused-0.343
Token the
Token ID262
Feature activation+2.38e-02

Pos logit contributions
 hill+0.801
t+0.800
 to+0.798
anger+0.774
 ladder+0.744
Neg logit contributions
 blamed-0.836
 amused-0.761
gging-0.759
 ashamed-0.738
cks-0.731
Token.
Token ID13
Feature activation+1.88e-01

Pos logit contributions
t+0.004
 hill+0.004
 to+0.004
anger+0.004
 ladder+0.003
Neg logit contributions
 blamed-0.004
 amused-0.004
gging-0.004
 ashamed-0.004
cks-0.004
Token Despite
Token ID7945
Feature activation+8.78e-02

Pos logit contributions
 blamed+0.342
 amused+0.318
gging+0.314
cks+0.311
aky+0.299
Neg logit contributions
 to-0.344
t-0.337
 hill-0.328
anger-0.328
oes-0.311
Token being
Token ID852
Feature activation+1.68e-01

Pos logit contributions
 blamed+0.145
gging+0.129
 amused+0.125
aron+0.124
cks+0.122
Neg logit contributions
 to-0.185
 hill-0.165
t-0.163
ho-0.161
 plunge-0.159
Token scared
Token ID12008
Feature activation+2.74e-01

Pos logit contributions
 blamed+0.210
cks+0.194
gging+0.192
aron+0.183
 amused+0.176
Neg logit contributions
 to-0.423
t-0.373
 arrived-0.357
'll-0.347
 hill-0.343
Token,
Token ID11
Feature activation-9.19e-02

Pos logit contributions
 blamed+0.603
 amused+0.555
gging+0.552
 ashamed+0.548
 innocent+0.532
Neg logit contributions
 hill-0.388
t-0.385
anger-0.374
 to-0.366
 ladder-0.352
Token she
Token ID673
Feature activation-6.44e-02

Pos logit contributions
 to+0.150
t+0.148
 hill+0.144
anger+0.141
 ladder+0.134
Neg logit contributions
 blamed-0.187
gging-0.174
 amused-0.171
cks-0.171
 ashamed-0.163
Token had
Token ID550
Feature activation+1.70e-02

Pos logit contributions
 to+0.122
t+0.120
 hill+0.116
 arrived+0.114
anger+0.109
Neg logit contributions
 blamed-0.109
cks-0.105
gging-0.104
 amused-0.103
 innocent-0.095
Token still
Token ID991
Feature activation+2.19e-01

Pos logit contributions
 blamed+0.045
 amused+0.041
gging+0.041
 ashamed+0.041
cks+0.040
Neg logit contributions
 hill-0.017
t-0.017
anger-0.017
 to-0.016
 ladder-0.015
Token allowed
Token ID3142
Feature activation+1.22e-01

Pos logit contributions
 blamed+0.408
gging+0.378
 amused+0.368
cks+0.365
aron+0.348
Neg logit contributions
 to-0.409
t-0.388
 hill-0.379
 arrived-0.365
 ladder-0.356
Token her
Token ID607
Feature activation+2.13e-01

Pos logit contributions
 blamed+0.213
 amused+0.193
gging+0.191
 ashamed+0.186
cks+0.184
Neg logit contributions
t-0.230
 hill-0.227
 to-0.226
anger-0.223
 ladder-0.212
Token uncle
Token ID7711
Feature activation+1.11e-01

Pos logit contributions
 blamed+0.491
 ashamed+0.472
 years+0.468
 amused+0.468
 innocent+0.465
Neg logit contributions
anger-0.264
 hill-0.260
t-0.254
oes-0.230
ho-0.229
TokenThe
Token ID464
Feature activation+2.50e-02

Pos logit contributions
t+0.139
 hill+0.133
 to+0.133
 ladder+0.122
anger+0.121
Neg logit contributions
 blamed-0.166
 amused-0.154
 ashamed-0.153
cks-0.153
gging-0.153
Token man
Token ID582
Feature activation-1.72e-01

Pos logit contributions
 blamed+0.053
 amused+0.049
gging+0.049
 ashamed+0.048
 innocent+0.047
Neg logit contributions
t-0.037
 to-0.036
 hill-0.036
anger-0.036
oes-0.033
Token lay
Token ID3830
Feature activation+1.95e-02

Pos logit contributions
 to+0.311
t+0.310
 hill+0.307
anger+0.294
 arrived+0.289
Neg logit contributions
 blamed-0.307
gging-0.282
 amused-0.280
cks-0.274
 ashamed-0.271
Token down
Token ID866
Feature activation-1.81e-01

Pos logit contributions
 blamed+0.046
 amused+0.042
gging+0.042
cks+0.041
 ashamed+0.041
Neg logit contributions
 to-0.026
 hill-0.025
t-0.025
anger-0.024
 ladder-0.023
Token on
Token ID319
Feature activation-1.57e-01

Pos logit contributions
t+0.288
 hill+0.283
anger+0.279
 to+0.274
 ladder+0.260
Neg logit contributions
 blamed-0.368
 amused-0.340
gging-0.334
 ashamed-0.331
cks-0.324
Token the
Token ID262
Feature activation+1.32e-02

Pos logit contributions
 to+0.297
 hill+0.291
t+0.285
anger+0.277
 ladder+0.270
Neg logit contributions
 blamed-0.284
 amused-0.257
gging-0.252
 ashamed-0.248
cks-0.245
Token grass
Token ID8701
Feature activation+2.94e-02

Pos logit contributions
 blamed+0.026
 amused+0.025
 innocent+0.024
 years+0.024
gging+0.024
Neg logit contributions
t-0.021
anger-0.020
 to-0.019
oes-0.018
 hill-0.018
Token and
Token ID290
Feature activation+6.77e-03

Pos logit contributions
 blamed+0.051
gging+0.046
 amused+0.046
 ashamed+0.045
cks+0.044
Neg logit contributions
t-0.054
 hill-0.053
anger-0.053
 to-0.051
 ladder-0.049
Token soon
Token ID2582
Feature activation+1.61e-02

Pos logit contributions
 blamed+0.012
 amused+0.011
gging+0.011
 ashamed+0.011
cks+0.011
Neg logit contributions
t-0.012
 hill-0.012
 to-0.012
anger-0.012
 ladder-0.011
Token began
Token ID2540
Feature activation-5.48e-02

Pos logit contributions
 blamed+0.034
 amused+0.031
 ashamed+0.031
gging+0.031
cks+0.030
Neg logit contributions
t-0.024
 hill-0.024
anger-0.024
 to-0.023
oes-0.022
Token to
Token ID284
Feature activation-4.62e-01

Pos logit contributions
t+0.027
anger+0.026
 hill+0.024
oes+0.017
 ladder+0.017
Neg logit contributions
 blamed-0.172
 amused-0.165
 ashamed-0.163
gging-0.161
cks-0.160
Token was
Token ID373
Feature activation-1.42e-01

Pos logit contributions
t+0.125
 hill+0.123
anger+0.122
 to+0.120
 ladder+0.115
Neg logit contributions
 blamed-0.132
 amused-0.119
gging-0.118
 ashamed-0.116
cks-0.114
Token painting
Token ID12036
Feature activation-1.21e-01

Pos logit contributions
t+0.258
 to+0.254
 hill+0.252
anger+0.247
 ladder+0.236
Neg logit contributions
 blamed-0.259
 amused-0.234
gging-0.234
 ashamed-0.227
cks-0.226
Token,
Token ID11
Feature activation-9.56e-02

Pos logit contributions
t+0.194
 hill+0.192
anger+0.187
 to+0.187
 ladder+0.176
Neg logit contributions
 blamed-0.243
 amused-0.223
gging-0.221
 ashamed-0.218
cks-0.214
Token everyone
Token ID2506
Feature activation+2.16e-02

Pos logit contributions
t+0.171
anger+0.171
 hill+0.167
oes+0.158
ho+0.154
Neg logit contributions
 blamed-0.174
 amused-0.160
 ashamed-0.158
gging-0.153
 innocent-0.152
Token wanted
Token ID2227
Feature activation+1.61e-01

Pos logit contributions
 blamed+0.039
 amused+0.036
 ashamed+0.035
gging+0.034
cks+0.033
Neg logit contributions
anger-0.041
t-0.040
 hill-0.039
 plunge-0.038
 ladder-0.037
Token to
Token ID284
Feature activation-4.56e-01

Pos logit contributions
 blamed+0.502
 amused+0.476
 ashamed+0.474
 years+0.472
 innocent+0.467
Neg logit contributions
 hill-0.085
anger-0.079
t-0.071
 ladder-0.061
 plunge-0.058
Token watch
Token ID2342
Feature activation-1.77e-01

Pos logit contributions
 to+0.583
t+0.566
 hill+0.554
anger+0.518
 ladder+0.502
Neg logit contributions
 blamed-1.085
 amused-1.020
gging-1.010
cks-0.992
 ashamed-0.989
Token.
Token ID13
Feature activation+1.89e-01

Pos logit contributions
 to+0.321
t+0.302
 hill+0.301
anger+0.291
 ladder+0.280
Neg logit contributions
 blamed-0.338
gging-0.304
 amused-0.302
 ashamed-0.294
cks-0.292
Token
Token ID198
Feature activation-6.04e-02

Pos logit contributions
 blamed+0.347
 amused+0.315
gging+0.312
 ashamed+0.305
cks+0.301
Neg logit contributions
t-0.341
 hill-0.337
 to-0.333
anger-0.330
 ladder-0.313
Token
Token ID198
Feature activation-6.04e-02

Pos logit contributions
 hill+0.097
t+0.097
anger+0.096
 to+0.094
 plunge+0.091
Neg logit contributions
 blamed-0.119
 amused-0.109
gging-0.108
 ashamed-0.105
cks-0.104
TokenOne
Token ID3198
Feature activation+2.20e-02

Pos logit contributions
t+0.099
 hill+0.096
 to+0.096
anger+0.089
 ladder+0.086
Neg logit contributions
 blamed-0.124
 amused-0.116
 ashamed-0.114
cks-0.114
gging-0.114
Token day
Token ID1110
Feature activation-2.26e-01

Pos logit contributions
 blamed+0.039
 amused+0.037
gging+0.037
 ashamed+0.036
cks+0.036
Neg logit contributions
t-0.013
 hill-0.012
 to-0.012
anger-0.012
 ladder-0.010
Token,
Token ID11
Feature activation-1.01e-01

Pos logit contributions
t+0.335
 hill+0.332
 to+0.323
anger+0.323
oes+0.303
Neg logit contributions
 blamed-0.484
 amused-0.445
gging-0.442
 ashamed-0.434
cks-0.426
Token her
Token ID607
Feature activation+2.09e-01

Pos logit contributions
t+0.183
 to+0.183
 hill+0.181
anger+0.176
 ladder+0.169
Neg logit contributions
 blamed-0.184
 amused-0.167
gging-0.166
cks-0.163
 ashamed-0.161
Token family
Token ID1641
Feature activation-1.98e-02

Pos logit contributions
 blamed+0.361
 ashamed+0.332
 amused+0.331
 innocent+0.323
gging+0.319
Neg logit contributions
t-0.399
 hill-0.373
oes-0.372
anger-0.371
 to-0.354
Token went
Token ID1816
Feature activation-2.11e-02

Pos logit contributions
anger+0.036
t+0.036
 hill+0.036
oes+0.034
 to+0.034
Neg logit contributions
 blamed-0.036
 amused-0.032
 ashamed-0.032
gging-0.031
 years-0.030
Token to
Token ID284
Feature activation-4.57e-01

Pos logit contributions
t+0.022
 hill+0.021
ho+0.021
anger+0.021
oes+0.019
Neg logit contributions
 blamed-0.050
 sorry-0.049
 ashamed-0.049
 amused-0.049
 angry-0.049
Token the
Token ID262
Feature activation+2.02e-02

Pos logit contributions
t+0.905
anger+0.876
 hill+0.875
 to+0.852
oes+0.826
Neg logit contributions
 blamed-0.769
 amused-0.696
gging-0.683
 ashamed-0.672
cks-0.654
Token store
Token ID3650
Feature activation-2.27e-01

Pos logit contributions
 blamed+0.042
 innocent+0.040
 ashamed+0.040
 years+0.039
 angry+0.039
Neg logit contributions
t-0.031
anger-0.028
oes-0.027
or-0.025
 to-0.024
Token to
Token ID284
Feature activation-4.58e-01

Pos logit contributions
t+0.372
 hill+0.369
anger+0.368
 plunge+0.350
 to+0.342
Neg logit contributions
 blamed-0.441
 amused-0.406
 ashamed-0.400
gging-0.397
 years-0.389
Token buy
Token ID2822
Feature activation-1.63e-01

Pos logit contributions
t+0.565
 hill+0.545
anger+0.540
 to+0.525
oes+0.491
Neg logit contributions
 blamed-1.107
 amused-1.028
gging-1.014
 ashamed-1.008
cks-0.986
Token a
Token ID257
Feature activation+1.37e-01

Pos logit contributions
 to+0.294
t+0.250
 hill+0.246
anger+0.246
 ladder+0.238
Neg logit contributions
 blamed-0.342
gging-0.316
cks-0.311
 amused-0.306
aky-0.295
Token she
Token ID673
Feature activation-7.13e-02

Pos logit contributions
 blamed+0.320
 amused+0.298
 ashamed+0.289
gging+0.288
 years+0.284
Neg logit contributions
anger-0.217
 hill-0.216
t-0.214
 plunge-0.199
oes-0.197
Token needed
Token ID2622
Feature activation+1.24e-02

Pos logit contributions
 to+0.117
t+0.115
 hill+0.108
 arrived+0.108
anger+0.103
Neg logit contributions
 blamed-0.143
gging-0.136
 amused-0.132
cks-0.132
 ashamed-0.129
Token!
Token ID0
Feature activation+2.28e-02

Pos logit contributions
 blamed+0.016
 amused+0.015
 years+0.015
gging+0.014
 ashamed+0.014
Neg logit contributions
 hill-0.028
anger-0.027
t-0.027
 ladder-0.026
 mountain-0.026
Token She
Token ID1375
Feature activation+6.73e-02

Pos logit contributions
 blamed+0.045
 amused+0.041
gging+0.041
cks+0.040
 ashamed+0.040
Neg logit contributions
t-0.038
 to-0.038
 hill-0.038
anger-0.037
 ladder-0.035
Token carefully
Token ID7773
Feature activation+7.32e-02

Pos logit contributions
 blamed+0.132
 amused+0.121
gging+0.121
 ashamed+0.118
cks+0.118
Neg logit contributions
t-0.112
 hill-0.110
 to-0.110
anger-0.107
 ladder-0.102
Token we
Token ID356
Feature activation-3.04e-01

Pos logit contributions
 blamed+0.129
 amused+0.116
gging+0.114
 ashamed+0.111
cks+0.109
Neg logit contributions
t-0.138
 hill-0.137
anger-0.134
 to-0.130
oes-0.128
Tokenaved
Token ID9586
Feature activation-5.12e-02

Pos logit contributions
t+0.461
 hill+0.453
 to+0.451
anger+0.441
 ladder+0.416
Neg logit contributions
 blamed-0.641
 amused-0.593
gging-0.589
 ashamed-0.576
cks-0.571
Token them
Token ID606
Feature activation+1.69e-02

Pos logit contributions
t+0.097
 hill+0.095
anger+0.093
oes+0.091
 plunge+0.089
Neg logit contributions
 blamed-0.084
gging-0.078
 amused-0.077
cks-0.075
 years-0.075
Token around
Token ID1088
Feature activation-6.56e-02

Pos logit contributions
 blamed+0.029
 amused+0.027
gging+0.026
 innocent+0.026
cks+0.026
Neg logit contributions
t-0.031
anger-0.030
 hill-0.030
oes-0.029
 ladder-0.028
Token her
Token ID607
Feature activation+2.08e-01

Pos logit contributions
t+0.122
 hill+0.120
 to+0.119
anger+0.118
 ladder+0.112
Neg logit contributions
 blamed-0.116
 amused-0.105
gging-0.104
 ashamed-0.102
cks-0.101
Token toe
Token ID21189
Feature activation+8.37e-02

Pos logit contributions
 blamed+0.393
 amused+0.359
gging+0.356
 ashamed+0.349
cks+0.344
Neg logit contributions
t-0.365
 hill-0.357
 to-0.354
anger-0.351
 ladder-0.332
Token was
Token ID373
Feature activation-1.38e-01

Pos logit contributions
anger+0.174
t+0.172
 hill+0.167
oes+0.162
 to+0.161
Neg logit contributions
 blamed-0.262
 amused-0.239
gging-0.236
 ashamed-0.233
 years-0.231
Token a
Token ID257
Feature activation+1.43e-01

Pos logit contributions
 to+0.235
t+0.193
 hill+0.186
where+0.184
 arrived+0.181
Neg logit contributions
 blamed-0.300
gging-0.280
cks-0.279
eling-0.273
 amused-0.271
Token girl
Token ID2576
Feature activation-9.54e-02

Pos logit contributions
 blamed+0.281
gging+0.257
cks+0.254
 amused+0.253
 ashamed+0.245
Neg logit contributions
 hill-0.242
 to-0.237
t-0.236
anger-0.227
 ladder-0.224
Token who
Token ID508
Feature activation+2.36e-01

Pos logit contributions
 to+0.212
 hill+0.193
t+0.193
ho+0.184
 arrived+0.180
Neg logit contributions
 blamed-0.150
gging-0.141
 amused-0.139
 ashamed-0.130
cks-0.130
Token loved
Token ID6151
Feature activation+2.43e-01

Pos logit contributions
 blamed+0.408
gging+0.405
cks+0.393
 amused+0.382
uted+0.362
Neg logit contributions
 to-0.484
t-0.472
 arrived-0.447
 would-0.435
 visit-0.427
Token to
Token ID284
Feature activation-4.55e-01

Pos logit contributions
 blamed+0.698
 years+0.669
 angry+0.659
 amused+0.658
 ashamed+0.657
Neg logit contributions
t-0.195
anger-0.189
or-0.147
 plunge-0.136
zag-0.133
Token pick
Token ID2298
Feature activation-1.88e-01

Pos logit contributions
t+0.573
 hill+0.567
 to+0.561
anger+0.546
 ladder+0.511
Neg logit contributions
 blamed-1.080
 amused-1.005
gging-0.997
 ashamed-0.981
cks-0.973
Token tomatoes
Token ID23972
Feature activation-2.90e-03

Pos logit contributions
 to+0.284
 hill+0.274
t+0.272
anger+0.269
 ladder+0.252
Neg logit contributions
 blamed-0.408
 amused-0.376
gging-0.371
cks-0.362
 ashamed-0.361
Token from
Token ID422
Feature activation-5.05e-02

Pos logit contributions
t+0.007
 hill+0.007
anger+0.007
 to+0.007
 plunge+0.007
Neg logit contributions
 blamed-0.003
 amused-0.003
gging-0.003
 ashamed-0.003
cks-0.003
Token her
Token ID607
Feature activation+2.17e-01

Pos logit contributions
 to+0.089
 hill+0.089
t+0.087
anger+0.085
 ladder+0.082
Neg logit contributions
 blamed-0.096
 amused-0.088
gging-0.086
 ashamed-0.084
cks-0.083
Token garden
Token ID11376
Feature activation-1.40e-01

Pos logit contributions
 years+0.363
 blamed+0.362
 ashamed+0.346
 innocent+0.344
 angry+0.331
Neg logit contributions
t-0.439
 plunge-0.373
anger-0.372
oes-0.367
 arrived-0.352
Token the
Token ID262
Feature activation+5.39e-03

Pos logit contributions
t+0.277
 hill+0.274
 to+0.272
anger+0.268
 ladder+0.255
Neg logit contributions
 blamed-0.287
 amused-0.262
gging-0.259
 ashamed-0.253
cks-0.250
Token swing
Token ID9628
Feature activation-2.12e-01

Pos logit contributions
 blamed+0.012
 amused+0.011
 annoying+0.011
 ashamed+0.011
 angry+0.011
Neg logit contributions
t-0.007
anger-0.007
or-0.006
 to-0.006
oes-0.006
Token.
Token ID13
Feature activation+1.68e-01

Pos logit contributions
t+0.438
 hill+0.435
anger+0.430
 to+0.425
 plunge+0.411
Neg logit contributions
 blamed-0.318
 amused-0.281
gging-0.280
 ashamed-0.274
cks-0.268
Token He
Token ID679
Feature activation+5.84e-02

Pos logit contributions
 blamed+0.264
 amused+0.225
 ashamed+0.224
gging+0.222
cks+0.215
Neg logit contributions
t-0.345
anger-0.344
 hill-0.343
 plunge-0.331
 to-0.327
Token says
Token ID1139
Feature activation+1.48e-01

Pos logit contributions
 blamed+0.108
 amused+0.098
gging+0.097
 ashamed+0.095
cks+0.094
Neg logit contributions
t-0.104
 hill-0.102
 to-0.101
anger-0.101
 ladder-0.095
Token to
Token ID284
Feature activation-4.55e-01

Pos logit contributions
 blamed+0.289
 sorry+0.279
 ashamed+0.271
 amused+0.268
 innocent+0.258
Neg logit contributions
 hill-0.247
anger-0.240
t-0.230
oes-0.227
 exhibit-0.218
Token Mia
Token ID32189
Feature activation+1.08e-01

Pos logit contributions
 to+0.866
 hill+0.830
 ladder+0.781
t+0.764
 prize+0.763
Neg logit contributions
 blamed-0.824
 amused-0.778
aky-0.768
gging-0.763
 ashamed-0.746
Token,
Token ID11
Feature activation-1.11e-01

Pos logit contributions
 blamed+0.222
 amused+0.204
gging+0.203
 ashamed+0.198
cks+0.196
Neg logit contributions
t-0.173
 to-0.170
 hill-0.170
anger-0.165
 ladder-0.157
Token "
Token ID366
Feature activation-1.00e-01

Pos logit contributions
anger+0.182
 hill+0.181
t+0.180
oes+0.168
 plunge+0.168
Neg logit contributions
 blamed-0.219
 amused-0.209
 ashamed-0.199
gging-0.195
 sorry-0.194
TokenLet
Token ID5756
Feature activation-1.50e-01

Pos logit contributions
anger+0.183
 hill+0.167
t+0.164
 plunge+0.163
 ladder+0.163
Neg logit contributions
 blamed-0.182
gging-0.160
 amused-0.156
cks-0.155
ck-0.154
Token me
Token ID502
Feature activation-5.84e-02

Pos logit contributions
anger+0.251
t+0.246
 hill+0.243
oes+0.228
 ladder+0.226
Neg logit contributions
 blamed-0.293
gging-0.268
 ashamed-0.265
 amused-0.262
 angry-0.259