TOP ACTIVATIONS
MAX = 0.167

of the story : Gener osity is sometimes the best gift
than being jealous . Gener osity is the best way to
Jack smiled and nodded before continuing his tour with Daisy .
fun . Now , whenever There se gets bored , she
and thanked the river before continuing with her wild adventure .
From then on , whenever there was a disagreement between Marvin
his microscope once more before continuing his exploration . <|endoftext|> One
for a few moments before continuing their journey .
bunny for a while before continuing their walk . They saw
safe and dry inside whenever there was a storm . <|endoftext|>
The End ! <|endoftext|> There was a boy named Jake
in the sky . Up there , everything was so quiet
helping out and kept getting rewarded , until one day he
ucked the da isy before continuing her climb .
. The end . <|endoftext|> There was once a farmer who
just like before . <|endoftext|> There was a big pit near
night and during storms when there was no light outside .
that the sausage was still intact . She continued rolling it
for a little while before continuing on their way home .
park . There , the baby saw some

MIN ACTIVATIONS
MIN = -0.190

. Tara had a long - lost magazine and she wanted
i was a small 3 - year - old who wanted
S ally was a 3 - year - old girl who
He was wearing a big - ear ring , so Jack
<|endoftext|> Jimmy was a 3 - year - old who loved
. Jack was a 3 - year old who liked to
her mother holding a freshly - b aked muff in .
Tom my was a 3 - year - old who loved
S ally was a 3 - year - old girl who
my saw a big T - Rex and said , "
<|endoftext|> Jack was a 3 - year - old boy and
<|endoftext|> Tom was a 3 - year - old who liked
Joe y was a 3 - year - old boy who
, there was a 3 - year - old girl .
, holding his green T - rex . " He can
<|endoftext|> George was a 3 - year - old boy who
<|endoftext|> Ben was a 3 - year - old boy who
, there lived a 3 - year - old girl named
Tom was wearing a t - shirt and shorts , and
Tom my was a 3 - year - old boy and

INTERVAL 0.078 - 0.167
CONTAINS 4.418%

that she jumped up and down and shouted , " Y
She heard music coming from inside . When she opened the
helpful . Later that day , Lily 's
whale felt very sad . He felt like his special design
the big fish , but he could not do it .

INTERVAL -0.012 - 0.078
CONTAINS 53.388%

special project from the pattern , she was thrilled . She
told Tim my to help her clean the television . Tim
it is very expensive . She also sees that it is
. L ily comes back and sees that her
She was not happy . " What are you

INTERVAL -0.101 - -0.012
CONTAINS 41.682%

to talk to other people . One day
bridges and tunnels . They have fun . One
was getting closer and closer . She started to feel scared
you . Do you want to play with our ball ?"
The little boy thought that was lovely ,

INTERVAL -0.190 - -0.101
CONTAINS 0.512%

Lily 's mom my had an idea . She took the
a great job . I 'm glad you solved the zoom
Emily could hardly keep her eyes open . After a few
that she would have a lot of fun there and she
began to take a few l icks . Y um !
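
The four INTERVAL ranges above appear to be equal-width bins between MIN = -0.190 and MAX = 0.167, and the CONTAINS percentages sum to 100%. The following is a minimal sketch of how such bins and shares could be computed. It assumes equal-width binning; the function names `interval_edges` and `interval_shares` are hypothetical and are not taken from the tool that produced this dump. The sketch returns intervals in ascending order, whereas the listing above shows them from highest to lowest.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def interval_shares(activations, lo, hi, n_bins=4):
    """Return the percentage of activations in each equal-width interval."""
    if not activations:
        raise ValueError("activations must be non-empty")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp the index so values at or beyond the ends fall in the outer bins.
        idx = min(max(int((a - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(activations)
    return [100.0 * c / total for c in counts]


# Edges implied by the MIN and MAX values reported in this dump.
edges = interval_edges(-0.190, 0.167)

# Hypothetical activations, for illustration only (not data from the dump).
sample = [-0.19, -0.05, 0.0, 0.1, 0.167]
shares = interval_shares(sample, -0.190, 0.167)
```

Under this assumption the computed edges agree with the printed interval boundaries (-0.101, -0.012, 0.078) to within rounding at three decimal places.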
Token of
Token ID286
Feature activation+4.79e-02

Pos logit contributions
 scree+0.020
 arm+0.019
 fright+0.019
 lip+0.019
 fingert+0.018
Neg logit contributions
-0.020
orting-0.019
-0.018
uing-0.018
ating-0.017
Token the
Token ID262
Feature activation+4.46e-02

Pos logit contributions
+0.071
uing+0.068
orting+0.067
+0.065
 Others+0.063
Neg logit contributions
 scree-0.071
hy-0.069
 fright-0.066
 arm-0.066
 speech-0.064
Token story
Token ID1621
Feature activation+3.75e-03

Pos logit contributions
+0.068
orting+0.066
uing+0.062
+0.062
 Others+0.061
Neg logit contributions
 scree-0.063
hy-0.062
 arm-0.061
 ped-0.060
 fright-0.060
Token:
Token ID25
Feature activation+2.32e-02

Pos logit contributions
+0.007
uing+0.007
+0.006
orting+0.006
 Pole+0.006
Neg logit contributions
 arm-0.005
hy-0.004
 speech-0.004
 very-0.004
 fright-0.004
Token Gener
Token ID2980
Feature activation-5.80e-02

Pos logit contributions
uing+0.035
+0.034
 Pole+0.033
orting+0.033
+0.032
Neg logit contributions
 arm-0.033
-0.032
 fright-0.031
 scree-0.031
 speech-0.030
Tokenosity
Token ID16579
Feature activation+1.63e-01

Pos logit contributions
 arm+0.091
hy+0.090
 supposed+0.088
 written+0.085
 er+0.085
Neg logit contributions
-0.077
-0.074
orting-0.067
 Pole-0.067
cast-0.066
Token is
Token ID318
Feature activation-6.57e-02

Pos logit contributions
+0.252
uing+0.242
+0.235
orting+0.230
 Pole+0.225
Neg logit contributions
 arm-0.233
 scree-0.227
 fright-0.226
hy-0.213
 speech-0.212
Token sometimes
Token ID3360
Feature activation+4.68e-02

Pos logit contributions
 arm+0.074
 scree+0.074
 fright+0.071
hy+0.071
 speech+0.065
Neg logit contributions
-0.122
uing-0.118
-0.115
orting-0.113
 Others-0.109
Token the
Token ID262
Feature activation+3.09e-02

Pos logit contributions
+0.082
uing+0.079
+0.077
orting+0.076
 Others+0.073
Neg logit contributions
 arm-0.058
 scree-0.057
 fright-0.056
hy-0.054
 supposed-0.052
Token best
Token ID1266
Feature activation+2.39e-02

Pos logit contributions
+0.053
uing+0.050
+0.049
orting+0.048
 Others+0.046
Neg logit contributions
 scree-0.040
hy-0.038
 arm-0.038
 fright-0.037
 ped-0.036
Token gift
Token ID6979
Feature activation+1.49e-02

Pos logit contributions
+0.041
uing+0.040
+0.038
orting+0.038
 Others+0.036
Neg logit contributions
 arm-0.030
 scree-0.030
 fright-0.030
hy-0.029
 speech-0.028
Token than
Token ID621
Feature activation-6.15e-03

Pos logit contributions
+0.089
uing+0.085
+0.084
orting+0.082
 Pole+0.080
Neg logit contributions
 arm-0.059
 fright-0.058
 scree-0.057
 supposed-0.056
hy-0.054
Token being
Token ID852
Feature activation-3.62e-03

Pos logit contributions
 arm+0.007
 scree+0.007
 fright+0.007
hy+0.007
 speech+0.007
Neg logit contributions
-0.011
uing-0.011
-0.010
orting-0.010
 Others-0.010
Token jealous
Token ID19354
Feature activation+4.15e-02

Pos logit contributions
 arm+0.004
 scree+0.004
 fright+0.004
hy+0.003
 speech+0.003
Neg logit contributions
-0.007
uing-0.007
-0.007
orting-0.007
 Others-0.006
Token.
Token ID13
Feature activation-2.49e-02

Pos logit contributions
+0.079
+0.075
uing+0.075
 Pole+0.072
orting+0.071
Neg logit contributions
 arm-0.046
 fright-0.044
 scree-0.043
 er-0.042
hy-0.042
Token Gener
Token ID2980
Feature activation-7.03e-02

Pos logit contributions
 arm+0.042
 fright+0.042
 scree+0.042
hy+0.040
 speech+0.040
Neg logit contributions
uing-0.031
-0.031
-0.029
orting-0.028
 Pole-0.027
Tokenosity
Token ID16579
Feature activation+1.52e-01

Pos logit contributions
 arm+0.098
hy+0.097
 supposed+0.097
 scree+0.092
 written+0.091
Neg logit contributions
-0.106
-0.099
orting-0.092
cast-0.092
 Pole-0.091
Token is
Token ID318
Feature activation-7.40e-02

Pos logit contributions
+0.245
uing+0.234
+0.230
orting+0.224
 Pole+0.223
Neg logit contributions
 arm-0.211
 scree-0.203
 fright-0.203
 speech-0.191
 very-0.190
Token the
Token ID262
Feature activation+1.60e-02

Pos logit contributions
 arm+0.084
 scree+0.083
 fright+0.081
hy+0.081
 speech+0.075
Neg logit contributions
-0.136
uing-0.133
-0.128
orting-0.127
 Others-0.122
Token best
Token ID1266
Feature activation+1.92e-02

Pos logit contributions
+0.024
uing+0.023
+0.022
orting+0.022
 Others+0.020
Neg logit contributions
 scree-0.023
hy-0.023
 arm-0.022
 fright-0.021
 ped-0.021
Token way
Token ID835
Feature activation-1.49e-02

Pos logit contributions
+0.033
uing+0.032
+0.031
orting+0.030
 Others+0.029
Neg logit contributions
 arm-0.025
 scree-0.024
 fright-0.024
hy-0.023
 speech-0.022
Token to
Token ID284
Feature activation-2.25e-02

Pos logit contributions
 scree+0.021
 arm+0.021
 fright+0.020
hy+0.020
 speech+0.019
Neg logit contributions
-0.023
uing-0.023
-0.022
orting-0.022
 Others-0.021
TokenJack
Token ID14295
Feature activation+2.42e-02

Pos logit contributions
uing+0.031
 Pole+0.031
places+0.029
cast+0.029
 Majesty+0.029
Neg logit contributions
-0.027
 fright-0.025
 scree-0.024
leaf-0.023
 arm-0.023
Token smiled
Token ID13541
Feature activation+1.90e-04

Pos logit contributions
+0.044
uing+0.043
orting+0.042
+0.041
 Others+0.040
Neg logit contributions
 arm-0.028
 scree-0.028
 fright-0.027
hy-0.026
 speech-0.025
Token and
Token ID290
Feature activation-6.34e-02

Pos logit contributions
+0.000
uing+0.000
+0.000
orting+0.000
 Others+0.000
Neg logit contributions
 scree-0.000
 arm-0.000
 fright-0.000
hy-0.000
 supposed-0.000
Token nodded
Token ID14464
Feature activation+4.60e-02

Pos logit contributions
 scree+0.071
 arm+0.071
 fright+0.069
hy+0.067
 speech+0.064
Neg logit contributions
-0.118
uing-0.115
-0.111
orting-0.110
 Others-0.106
Token before
Token ID878
Feature activation-1.98e-02

Pos logit contributions
+0.075
uing+0.073
+0.071
orting+0.069
 Others+0.066
Neg logit contributions
 arm-0.062
 scree-0.061
 fright-0.060
hy-0.059
 supposed-0.057
Token continuing
Token ID8282
Feature activation+1.46e-01

Pos logit contributions
 scree+0.027
 arm+0.026
 fright+0.026
hy+0.025
 speech+0.024
Neg logit contributions
-0.033
uing-0.033
-0.031
orting-0.031
 Others-0.029
Token his
Token ID465
Feature activation+2.89e-02

Pos logit contributions
+0.247
uing+0.237
+0.231
orting+0.231
 Others+0.217
Neg logit contributions
 scree-0.187
 arm-0.183
 fright-0.182
hy-0.178
 supposed-0.172
Token tour
Token ID4205
Feature activation+4.81e-02

Pos logit contributions
+0.051
uing+0.049
+0.048
orting+0.047
 Pole+0.047
Neg logit contributions
 arm-0.032
 scree-0.032
 written-0.031
hy-0.031
 fright-0.031
Token with
Token ID351
Feature activation-1.23e-02

Pos logit contributions
+0.080
uing+0.077
+0.075
orting+0.073
 Others+0.070
Neg logit contributions
 arm-0.064
 scree-0.063
 fright-0.062
hy-0.061
 speech-0.058
Token Daisy
Token ID40355
Feature activation+3.50e-02

Pos logit contributions
 scree+0.014
 arm+0.013
hy+0.013
 supposed+0.012
 fright+0.012
Neg logit contributions
-0.022
uing-0.022
orting-0.021
 Others-0.021
-0.021
Token.
Token ID13
Feature activation-2.27e-02

Pos logit contributions
+0.054
uing+0.052
+0.051
orting+0.050
 Others+0.047
Neg logit contributions
 scree-0.050
 fright-0.049
 arm-0.049
hy-0.047
 supposed-0.045
Token fun
Token ID1257
Feature activation+4.07e-02

Pos logit contributions
+0.034
uing+0.034
orting+0.033
+0.033
 Others+0.031
Neg logit contributions
 arm-0.022
 scree-0.021
hy-0.021
 fright-0.020
 speech-0.019
Token.
Token ID13
Feature activation-2.35e-02

Pos logit contributions
+0.077
uing+0.075
+0.072
orting+0.071
 Others+0.069
Neg logit contributions
 arm-0.045
 scree-0.044
 fright-0.043
hy-0.042
 supposed-0.041
Token Now
Token ID2735
Feature activation+3.93e-02

Pos logit contributions
 arm+0.021
 scree+0.020
 fright+0.019
hy+0.019
 speech+0.018
Neg logit contributions
uing-0.049
-0.049
orting-0.047
-0.047
 Others-0.044
Token,
Token ID11
Feature activation+7.40e-02

Pos logit contributions
+0.060
orting+0.058
uing+0.058
+0.056
 Others+0.054
Neg logit contributions
 arm-0.057
hy-0.052
 very-0.052
 scree-0.051
-0.051
Token whenever
Token ID8797
Feature activation+1.05e-01

Pos logit contributions
+0.141
uing+0.137
+0.133
orting+0.132
 Others+0.126
Neg logit contributions
 arm-0.080
 scree-0.078
 fright-0.076
hy-0.074
 speech-0.071
Token There
Token ID1318
Feature activation+1.44e-01

Pos logit contributions
+0.203
uing+0.198
+0.192
orting+0.190
 Others+0.183
Neg logit contributions
 arm-0.111
 scree-0.109
 fright-0.107
hy-0.103
 speech-0.098
Tokense
Token ID325
Feature activation-1.66e-03

Pos logit contributions
uing+0.229
+0.226
orting+0.220
+0.211
 Others+0.202
Neg logit contributions
 arm-0.202
 fright-0.189
 scree-0.188
hy-0.186
 very-0.184
Token gets
Token ID3011
Feature activation+1.75e-02

Pos logit contributions
 arm+0.002
 scree+0.002
 fright+0.002
hy+0.002
 er+0.002
Neg logit contributions
-0.003
uing-0.003
orting-0.003
-0.003
 Others-0.003
Token bored
Token ID17533
Feature activation+7.41e-02

Pos logit contributions
+0.032
uing+0.031
+0.030
orting+0.030
 Others+0.030
Neg logit contributions
 scree-0.019
 arm-0.019
hy-0.018
 fright-0.018
 fingert-0.017
Token,
Token ID11
Feature activation+7.71e-02

Pos logit contributions
+0.111
uing+0.111
+0.104
orting+0.102
 Others+0.094
Neg logit contributions
 arm-0.110
 fright-0.109
hy-0.107
 scree-0.106
 very-0.105
Token she
Token ID673
Feature activation+9.82e-02

Pos logit contributions
+0.127
uing+0.122
+0.119
orting+0.116
 Others+0.112
Neg logit contributions
 scree-0.103
 arm-0.102
 fright-0.100
hy-0.098
 supposed-0.093
Token and
Token ID290
Feature activation-6.40e-02

Pos logit contributions
 arm+0.001
 scree+0.001
 fright+0.001
 very+0.001
hy+0.001
Neg logit contributions
-0.002
uing-0.002
orting-0.002
 Others-0.001
-0.001
Token thanked
Token ID26280
Feature activation+4.80e-02

Pos logit contributions
 arm+0.090
 scree+0.089
 fright+0.088
hy+0.087
 supposed+0.084
Neg logit contributions
-0.101
uing-0.096
-0.096
orting-0.092
 Others-0.087
Token the
Token ID262
Feature activation+2.54e-02

Pos logit contributions
+0.080
uing+0.076
+0.074
orting+0.073
 Others+0.072
Neg logit contributions
 scree-0.063
 arm-0.062
 supposed-0.061
hy-0.061
 fright-0.059
Token river
Token ID7850
Feature activation+4.63e-02

Pos logit contributions
uing+0.053
+0.053
orting+0.049
+0.049
 Others+0.047
Neg logit contributions
 arm-0.025
 scree-0.024
 fright-0.024
hy-0.022
 speech-0.022
Token before
Token ID878
Feature activation-2.46e-02

Pos logit contributions
+0.079
uing+0.075
+0.073
orting+0.071
 Others+0.069
Neg logit contributions
 arm-0.061
 scree-0.059
 fright-0.058
hy-0.058
 speech-0.055
Token continuing
Token ID8282
Feature activation+1.43e-01

Pos logit contributions
 arm+0.031
 scree+0.030
 fright+0.030
hy+0.030
 supposed+0.028
Neg logit contributions
-0.042
uing-0.040
-0.040
orting-0.039
 Others-0.038
Token with
Token ID351
Feature activation+4.43e-03

Pos logit contributions
+0.260
uing+0.245
+0.242
orting+0.238
 Others+0.226
Neg logit contributions
 scree-0.169
 arm-0.162
 fright-0.162
hy-0.159
 supposed-0.153
Token her
Token ID607
Feature activation+1.65e-02

Pos logit contributions
 them+0.006
+0.006
uing+0.006
 Others+0.006
orting+0.006
Neg logit contributions
 scree-0.006
hy-0.006
 supposed-0.005
leaf-0.005
 arm-0.005
Token wild
Token ID4295
Feature activation+1.39e-02

Pos logit contributions
+0.029
+0.027
uing+0.027
orting+0.025
 Others+0.025
Neg logit contributions
 scree-0.020
hy-0.020
 supposed-0.019
 fright-0.019
 arm-0.018
Token adventure
Token ID8855
Feature activation+6.07e-02

Pos logit contributions
+0.025
uing+0.024
+0.023
orting+0.023
 Others+0.022
Neg logit contributions
 scree-0.016
 arm-0.016
 fright-0.016
hy-0.016
 supposed-0.015
Token.
Token ID13
Feature activation-2.08e-02

Pos logit contributions
+0.102
uing+0.099
+0.096
orting+0.094
 Others+0.091
Neg logit contributions
 arm-0.079
 scree-0.078
 fright-0.076
hy-0.075
 speech-0.071
TokenFrom
Token ID4863
Feature activation+9.63e-03

Pos logit contributions
uing+0.031
+0.030
orting+0.029
 Pole+0.029
+0.027
Neg logit contributions
 scree-0.026
 arm-0.025
 fright-0.025
-0.024
 speech-0.024
Token then
Token ID788
Feature activation+5.85e-02

Pos logit contributions
+0.012
uing+0.012
+0.011
orting+0.011
 Others+0.010
Neg logit contributions
 arm-0.017
 scree-0.017
 fright-0.016
hy-0.016
 speech-0.016
Token on
Token ID319
Feature activation-6.06e-03

Pos logit contributions
+0.120
uing+0.118
+0.114
orting+0.113
 Others+0.110
Neg logit contributions
 arm-0.055
 scree-0.053
 fright-0.051
hy-0.050
 supposed-0.047
Token,
Token ID11
Feature activation+7.97e-02

Pos logit contributions
 arm+0.009
 very+0.008
 a+0.008
hy+0.008
 fright+0.008
Neg logit contributions
uing-0.010
orting-0.009
-0.009
 Pole-0.009
-0.008
Token whenever
Token ID8797
Feature activation+1.06e-01

Pos logit contributions
uing+0.133
orting+0.128
+0.128
+0.119
 Pole+0.117
Neg logit contributions
 arm-0.105
 scree-0.100
 fright-0.099
 very-0.099
hy-0.097
Token there
Token ID612
Feature activation+1.42e-01

Pos logit contributions
uing+0.175
+0.171
orting+0.169
+0.161
 Others+0.152
Neg logit contributions
 arm-0.142
 fright-0.141
 scree-0.138
 speech-0.132
hy-0.131
Token was
Token ID373
Feature activation-4.61e-02

Pos logit contributions
uing+0.247
+0.240
orting+0.237
+0.228
 Others+0.217
Neg logit contributions
 arm-0.179
 scree-0.172
 fright-0.171
hy-0.165
 very-0.160
Token a
Token ID257
Feature activation-4.34e-02

Pos logit contributions
 arm+0.044
 scree+0.044
 fright+0.042
hy+0.041
 speech+0.038
Neg logit contributions
-0.093
uing-0.090
-0.088
orting-0.087
 Others-0.084
Token disagreement
Token ID25800
Feature activation+1.75e-02

Pos logit contributions
 arm+0.051
 scree+0.050
hy+0.048
 fright+0.048
 supposed+0.047
Neg logit contributions
-0.078
-0.075
uing-0.074
orting-0.070
 Others-0.069
Token between
Token ID1022
Feature activation-5.05e-02

Pos logit contributions
+0.026
uing+0.025
+0.024
orting+0.024
 Pole+0.023
Neg logit contributions
 arm-0.026
 scree-0.025
hy-0.025
 fright-0.025
 er-0.024
Token Marvin
Token ID35105
Feature activation-1.72e-02

Pos logit contributions
 arm+0.072
 scree+0.072
 fright+0.071
hy+0.069
 speech+0.066
Neg logit contributions
-0.079
uing-0.076
-0.073
orting-0.072
 Others-0.068
Token his
Token ID465
Feature activation+5.23e-03

Pos logit contributions
 arm+0.002
 fright+0.002
 scree+0.002
hy+0.002
 very+0.002
Neg logit contributions
-0.003
uing-0.003
-0.002
orting-0.002
 Pole-0.002
Token microscope
Token ID36396
Feature activation-2.82e-02

Pos logit contributions
+0.009
+0.008
uing+0.008
orting+0.007
 Majesty+0.007
Neg logit contributions
hy-0.007
 scree-0.007
 supposed-0.007
 written-0.007
 fright-0.007
Token once
Token ID1752
Feature activation-6.51e-03

Pos logit contributions
 scree+0.037
 supposed+0.034
 fright+0.034
 arm+0.033
hy+0.033
Neg logit contributions
uing-0.047
-0.047
-0.044
orting-0.044
 Others-0.041
Token more
Token ID517
Feature activation+2.05e-02

Pos logit contributions
 arm+0.012
 scree+0.012
 fright+0.012
hy+0.011
 supposed+0.011
Neg logit contributions
-0.008
uing-0.007
-0.007
orting-0.007
 Others-0.006
Token before
Token ID878
Feature activation-2.60e-02

Pos logit contributions
+0.036
uing+0.035
+0.034
orting+0.034
 Others+0.032
Neg logit contributions
 arm-0.025
 scree-0.025
 fright-0.024
hy-0.024
 speech-0.023
Token continuing
Token ID8282
Feature activation+1.42e-01

Pos logit contributions
 scree+0.033
 arm+0.033
 fright+0.032
hy+0.030
 supposed+0.030
Neg logit contributions
-0.045
uing-0.044
orting-0.042
-0.042
 Others-0.040
Token his
Token ID465
Feature activation+3.11e-02

Pos logit contributions
+0.271
uing+0.263
+0.256
orting+0.252
 Others+0.243
Neg logit contributions
 arm-0.156
 scree-0.154
 fright-0.150
hy-0.145
 speech-0.138
Token exploration
Token ID13936
Feature activation+8.09e-02

Pos logit contributions
+0.053
+0.049
uing+0.049
orting+0.046
 Pole+0.045
Neg logit contributions
 scree-0.039
hy-0.039
 written-0.038
leaf-0.037
 arm-0.037
Token.
Token ID13
Feature activation-2.49e-02

Pos logit contributions
+0.148
uing+0.143
+0.139
orting+0.134
 Others+0.131
Neg logit contributions
 arm-0.098
 scree-0.093
 fright-0.093
hy-0.093
 supposed-0.086
Token<|endoftext|>
Token ID50256
Feature activation+4.46e-02

Pos logit contributions
 arm+0.045
 scree+0.045
 fright+0.044
hy+0.043
 speech+0.042
Neg logit contributions
-0.030
uing-0.029
-0.027
orting-0.026
 Others-0.025
TokenOne
Token ID3198
Feature activation+6.51e-02

Pos logit contributions
uing+0.066
+0.063
orting+0.057
+0.057
 Majesty+0.055
Neg logit contributions
 arm-0.070
 fright-0.069
 scree-0.067
hy-0.066
 speech-0.065
Token for
Token ID329
Feature activation-3.30e-02

Pos logit contributions
 scree+0.054
 arm+0.053
 fright+0.053
hy+0.052
 speech+0.050
Neg logit contributions
-0.040
uing-0.038
-0.037
orting-0.036
 Others-0.034
Token a
Token ID257
Feature activation-7.67e-02

Pos logit contributions
 scree+0.043
 arm+0.043
 fright+0.042
hy+0.042
 speech+0.040
Neg logit contributions
-0.054
uing-0.052
orting-0.050
-0.050
 Others-0.049
Token few
Token ID1178
Feature activation+2.83e-02

Pos logit contributions
 scree+0.132
 arm+0.132
 fright+0.128
hy+0.127
 supposed+0.125
Neg logit contributions
-0.096
-0.091
uing-0.091
orting-0.081
 Others-0.079
Token moments
Token ID7188
Feature activation-3.60e-02

Pos logit contributions
+0.044
uing+0.042
+0.041
orting+0.040
 Others+0.039
Neg logit contributions
 scree-0.040
 arm-0.040
hy-0.039
 fright-0.038
 supposed-0.037
Token before
Token ID878
Feature activation-2.95e-02

Pos logit contributions
 arm+0.047
 very+0.044
 scree+0.043
 written+0.043
 fright+0.043
Neg logit contributions
-0.061
uing-0.057
-0.057
 Pole-0.056
orting-0.055
Token continuing
Token ID8282
Feature activation+1.42e-01

Pos logit contributions
 arm+0.042
 scree+0.042
 fright+0.041
hy+0.040
 speech+0.039
Neg logit contributions
-0.046
uing-0.044
-0.043
orting-0.042
 Others-0.040
Token their
Token ID511
Feature activation+3.55e-02

Pos logit contributions
+0.262
uing+0.255
+0.247
orting+0.244
 Others+0.235
Neg logit contributions
 arm-0.163
 scree-0.160
 fright-0.157
hy-0.152
 speech-0.145
Token journey
Token ID7002
Feature activation+3.49e-02

Pos logit contributions
+0.060
uing+0.058
+0.056
orting+0.055
 Others+0.052
Neg logit contributions
 scree-0.044
 arm-0.043
 fright-0.042
hy-0.042
 written-0.041
Token.
Token ID13
Feature activation-2.00e-02

Pos logit contributions
+0.062
uing+0.059
+0.058
orting+0.056
 Others+0.055
Neg logit contributions
 arm-0.043
hy-0.041
 scree-0.041
 fright-0.041
 very-0.039
Token
Token ID198
Feature activation+2.02e-02

Pos logit contributions
 arm+0.031
 scree+0.031
hy+0.030
 fright+0.030
 speech+0.030
Neg logit contributions
uing-0.028
-0.027
orting-0.025
ating-0.024
-0.024
Token
Token ID198
Feature activation+2.14e-02

Pos logit contributions
+0.031
uing+0.030
+0.029
orting+0.028
 Others+0.028
Neg logit contributions
 arm-0.029
 scree-0.029
 fright-0.028
hy-0.028
 er-0.027
Token bunny
Token ID44915
Feature activation-2.43e-03

Pos logit contributions
+0.042
+0.040
uing+0.037
orting+0.036
 Others+0.034
Neg logit contributions
 scree-0.052
hy-0.050
 supposed-0.049
 fright-0.049
 arm-0.049
Token for
Token ID329
Feature activation-2.06e-02

Pos logit contributions
 arm+0.003
 scree+0.003
 fright+0.003
hy+0.003
 speech+0.003
Neg logit contributions
-0.004
uing-0.004
-0.004
orting-0.004
 Pole-0.004
Token a
Token ID257
Feature activation-6.87e-02

Pos logit contributions
hy+0.020
 supposed+0.019
 scree+0.019
 arm+0.019
 er+0.018
Neg logit contributions
 them-0.037
 Others-0.036
orting-0.035
-0.035
 until-0.034
Token while
Token ID981
Feature activation+2.71e-03

Pos logit contributions
 arm+0.110
 scree+0.109
hy+0.105
 fright+0.105
 supposed+0.105
Neg logit contributions
-0.093
-0.089
uing-0.089
orting-0.082
 Pole-0.079
Token before
Token ID878
Feature activation-2.68e-02

Pos logit contributions
+0.004
uing+0.004
+0.004
orting+0.004
 Others+0.004
Neg logit contributions
 scree-0.004
 arm-0.004
 fright-0.004
hy-0.004
 supposed-0.004
Token continuing
Token ID8282
Feature activation+1.42e-01

Pos logit contributions
 arm+0.044
hy+0.043
 scree+0.043
 fright+0.043
 supposed+0.042
Neg logit contributions
-0.035
-0.033
uing-0.033
orting-0.032
 Others-0.031
Token their
Token ID511
Feature activation+3.81e-02

Pos logit contributions
+0.254
uing+0.245
+0.240
orting+0.236
 Others+0.226
Neg logit contributions
 scree-0.171
 arm-0.170
 fright-0.166
hy-0.162
 supposed-0.154
Token walk
Token ID2513
Feature activation+2.28e-02

Pos logit contributions
+0.067
uing+0.064
+0.063
orting+0.062
 Pole+0.060
Neg logit contributions
 arm-0.044
 scree-0.044
 fright-0.043
hy-0.043
 written-0.042
Token.
Token ID13
Feature activation-2.25e-02

Pos logit contributions
+0.043
uing+0.042
+0.039
orting+0.038
 Pole+0.038
Neg logit contributions
 arm-0.027
hy-0.025
 fright-0.025
 scree-0.025
 very-0.024
Token They
Token ID1119
Feature activation+5.45e-02

Pos logit contributions
 arm+0.034
 very+0.033
 speech+0.033
+0.033
hy+0.032
Neg logit contributions
uing-0.035
ating-0.033
-0.032
orting-0.031
ying-0.029
Token saw
Token ID2497
Feature activation+5.33e-02

Pos logit contributions
+0.084
+0.081
uing+0.079
orting+0.076
 Others+0.073
Neg logit contributions
 arm-0.077
 fright-0.076
 scree-0.075
hy-0.074
 supposed-0.074
Token safe
Token ID3338
Feature activation+7.52e-02

Pos logit contributions
 arm+0.078
 scree+0.076
 fright+0.074
hy+0.072
 supposed+0.071
Neg logit contributions
-0.107
uing-0.104
-0.099
orting-0.098
 Others-0.095
Token and
Token ID290
Feature activation-6.53e-02

Pos logit contributions
+0.148
uing+0.146
 Others+0.136
orting+0.136
+0.135
Neg logit contributions
 arm-0.085
 very-0.081
 fright-0.077
 supposed-0.074
 scree-0.073
Token dry
Token ID5894
Feature activation+3.12e-02

Pos logit contributions
 arm+0.070
 scree+0.070
hy+0.067
 fright+0.066
 er+0.065
Neg logit contributions
-0.120
uing-0.117
-0.116
orting-0.114
 Others-0.107
Token inside
Token ID2641
Feature activation+8.28e-02

Pos logit contributions
+0.060
uing+0.058
+0.056
 Others+0.053
 Pole+0.052
Neg logit contributions
 arm-0.037
 fright-0.034
 very-0.033
hy-0.033
 scree-0.032
Token whenever
Token ID8797
Feature activation+9.95e-02

Pos logit contributions
+0.128
uing+0.125
+0.120
orting+0.119
 Others+0.114
Neg logit contributions
 scree-0.119
 arm-0.113
 fright-0.112
hy-0.111
 speech-0.105
Token there
Token ID612
Feature activation+1.42e-01

Pos logit contributions
uing+0.172
+0.164
orting+0.162
+0.155
 Pole+0.148
Neg logit contributions
 arm-0.130
 fright-0.126
 scree-0.125
hy-0.118
 very-0.117
Token was
Token ID373
Feature activation-5.20e-02

Pos logit contributions
+0.229
uing+0.223
+0.215
orting+0.212
 Others+0.202
Neg logit contributions
 arm-0.195
 scree-0.193
 fright-0.189
hy-0.185
 speech-0.177
Token a
Token ID257
Feature activation-4.76e-02

Pos logit contributions
 scree+0.053
 arm+0.052
hy+0.051
 fright+0.050
 speech+0.046
Neg logit contributions
-0.099
uing-0.096
-0.093
orting-0.092
 Others-0.091
Token storm
Token ID6388
Feature activation+1.84e-02

Pos logit contributions
 arm+0.053
 scree+0.053
 fright+0.051
hy+0.051
 supposed+0.048
Neg logit contributions
-0.088
uing-0.085
-0.084
orting-0.081
 Others-0.079
Token.
Token ID13
Feature activation-1.82e-02

Pos logit contributions
+0.034
uing+0.034
+0.032
orting+0.031
 Others+0.031
Neg logit contributions
 scree-0.021
 arm-0.021
hy-0.021
 fright-0.021
 supposed-0.019
Token<|endoftext|>
Token ID50256
Feature activation+4.53e-02

Pos logit contributions
 scree+0.035
 arm+0.034
 fright+0.034
hy+0.033
 supposed+0.032
Neg logit contributions
-0.020
uing-0.018
-0.018
orting-0.018
 Others-0.016
Token
Token ID198
Feature activation+2.47e-02

Pos logit contributions
+0.035
uing+0.034
+0.032
orting+0.032
 Pole+0.030
Neg logit contributions
 arm-0.039
 scree-0.039
 fright-0.038
hy-0.037
 speech-0.037
TokenThe
Token ID464
Feature activation+2.92e-02

Pos logit contributions
+0.032
uing+0.031
+0.029
orting+0.029
 Others+0.027
Neg logit contributions
 arm-0.042
 scree-0.042
 fright-0.041
hy-0.040
 speech-0.039
Token End
Token ID5268
Feature activation+6.84e-02

Pos logit contributions
+0.041
+0.039
uing+0.036
orting+0.035
 Pole+0.035
Neg logit contributions
 supposed-0.046
 scree-0.044
 fright-0.043
hy-0.043
 written-0.042
Token!
Token ID0
Feature activation-1.82e-02

Pos logit contributions
+0.076
+0.075
 Others+0.070
orting+0.067
uing+0.064
Neg logit contributions
 arm-0.127
 scree-0.125
 er-0.121
 supposed-0.121
hy-0.119
Token<|endoftext|>
Token ID50256
Feature activation+5.41e-02

Pos logit contributions
 scree+0.035
 arm+0.035
 fright+0.034
hy+0.034
 supposed+0.033
Neg logit contributions
-0.020
uing-0.018
-0.018
orting-0.017
 Others-0.016
TokenThere
Token ID1858
Feature activation+1.42e-01

Pos logit contributions
uing+0.064
+0.055
orting+0.051
+0.049
 Majesty+0.049
Neg logit contributions
 arm-0.103
 fright-0.099
 scree-0.099
-0.098
hy-0.097
Token was
Token ID373
Feature activation-8.54e-02

Pos logit contributions
uing+0.276
orting+0.261
+0.259
+0.241
 Others+0.241
Neg logit contributions
 arm-0.162
 very-0.144
 aer-0.140
hy-0.139
-0.135
Token a
Token ID257
Feature activation-6.72e-02

Pos logit contributions
 scree+0.083
 fright+0.083
hy+0.082
 arm+0.079
 vel+0.070
Neg logit contributions
-0.168
-0.159
orting-0.159
uing-0.157
 Others-0.156
Token boy
Token ID2933
Feature activation-3.73e-02

Pos logit contributions
 scree+0.085
 fright+0.085
 arm+0.084
hy+0.083
 supposed+0.081
Neg logit contributions
-0.115
-0.112
uing-0.107
orting-0.102
 Pole-0.101
Token named
Token ID3706
Feature activation-5.66e-02

Pos logit contributions
 scree+0.048
 arm+0.048
 fright+0.047
hy+0.045
 speech+0.044
Neg logit contributions
-0.063
uing-0.060
-0.059
orting-0.058
 Others-0.055
Token Jake
Token ID14757
Feature activation-2.21e-02

Pos logit contributions
 arm+0.078
 scree+0.077
hy+0.077
 fright+0.076
 supposed+0.072
Neg logit contributions
-0.093
uing-0.091
-0.086
orting-0.085
ating-0.079
Token in
Token ID287
Feature activation-9.47e-04

Pos logit contributions
uing+0.064
+0.064
orting+0.060
+0.060
 Others+0.059
Neg logit contributions
 arm-0.040
 fright-0.037
 very-0.037
 scree-0.037
hy-0.036
Token the
Token ID262
Feature activation+2.64e-02

Pos logit contributions
 scree+0.001
hy+0.001
 supposed+0.001
 arm+0.001
 fright+0.001
Neg logit contributions
-0.001
uing-0.001
-0.001
 Others-0.001
orting-0.001
Token sky
Token ID6766
Feature activation+1.67e-02

Pos logit contributions
+0.040
+0.037
uing+0.037
orting+0.037
 Pole+0.035
Neg logit contributions
 scree-0.039
 fright-0.038
 arm-0.038
 supposed-0.037
hy-0.036
Token.
Token ID13
Feature activation-2.41e-02

Pos logit contributions
+0.032
uing+0.031
+0.030
orting+0.029
 Others+0.028
Neg logit contributions
 arm-0.019
 scree-0.018
hy-0.018
 fright-0.017
 very-0.017
Token Up
Token ID3205
Feature activation+5.43e-02

Pos logit contributions
 arm+0.037
 scree+0.037
 fright+0.036
 speech+0.036
hy+0.036
Neg logit contributions
uing-0.036
-0.035
-0.032
orting-0.032
ating-0.031
Token there
Token ID612
Feature activation+1.42e-01

Pos logit contributions
uing+0.097
+0.089
orting+0.087
 Majesty+0.085
 Others+0.084
Neg logit contributions
 very-0.069
 arm-0.068
 scree-0.066
 fright-0.064
 a-0.061
Token,
Token ID11
Feature activation+9.94e-02

Pos logit contributions
+0.235
uing+0.229
orting+0.214
+0.206
stals+0.201
Neg logit contributions
 arm-0.195
 very-0.192
 scree-0.183
 a-0.182
hy-0.177
Token everything
Token ID2279
Feature activation+5.60e-02

Pos logit contributions
+0.186
uing+0.184
orting+0.170
 Others+0.167
+0.166
Neg logit contributions
 arm-0.111
 very-0.106
 scree-0.105
 fright-0.100
 ped-0.099
Token was
Token ID373
Feature activation-6.31e-02

Pos logit contributions
+0.108
uing+0.100
+0.100
orting+0.098
 Others+0.094
Neg logit contributions
 scree-0.065
 arm-0.062
 fright-0.061
 speech-0.058
 very-0.058
Token so
Token ID523
Feature activation-4.69e-02

Pos logit contributions
 arm+0.066
 scree+0.065
 fright+0.063
hy+0.062
 speech+0.058
Neg logit contributions
-0.122
uing-0.119
-0.116
orting-0.114
 Others-0.110
Token quiet
Token ID5897
Feature activation+4.98e-02

Pos logit contributions
 scree+0.037
 arm+0.035
 fright+0.034
hy+0.033
 very+0.031
Neg logit contributions
-0.106
uing-0.102
orting-0.099
-0.099
 Others-0.095
Token helping
Token ID5742
Feature activation+5.49e-02

Pos logit contributions
+0.013
uing+0.013
+0.012
orting+0.012
 Others+0.012
Neg logit contributions
hy-0.009
 scree-0.009
 fright-0.008
 arm-0.008
 speech-0.008
Token out
Token ID503
Feature activation+3.47e-02

Pos logit contributions
+0.088
uing+0.086
+0.082
orting+0.081
 Others+0.078
Neg logit contributions
 scree-0.075
 arm-0.074
 fright-0.073
hy-0.072
 supposed-0.069
Token and
Token ID290
Feature activation-5.27e-02

Pos logit contributions
+0.062
uing+0.059
+0.059
orting+0.057
 Pole+0.055
Neg logit contributions
 arm-0.043
 scree-0.042
 fright-0.041
hy-0.040
 speech-0.038
Token kept
Token ID4030
Feature activation+1.14e-02

Pos logit contributions
 arm+0.075
 scree+0.074
 fright+0.073
hy+0.070
 speech+0.068
Neg logit contributions
-0.084
uing-0.080
-0.078
orting-0.077
 Others-0.074
Token getting
Token ID1972
Feature activation+5.63e-03

Pos logit contributions
+0.020
uing+0.020
+0.019
orting+0.019
 Others+0.018
Neg logit contributions
hy-0.013
 scree-0.013
 arm-0.013
 fright-0.013
 speech-0.012
Token rewarded
Token ID20945
Feature activation+1.42e-01

Pos logit contributions
+0.010
uing+0.010
+0.010
orting+0.010
 Others+0.009
Neg logit contributions
 arm-0.006
 scree-0.006
 fright-0.006
hy-0.006
 supposed-0.006
Token,
Token ID11
Feature activation+7.30e-02

Pos logit contributions
+0.225
uing+0.220
+0.211
orting+0.208
 Others+0.198
Neg logit contributions
 scree-0.196
 arm-0.194
 fright-0.191
hy-0.185
 speech-0.179
Token until
Token ID1566
Feature activation+6.56e-02

Pos logit contributions
+0.113
uing+0.109
+0.106
orting+0.103
 Others+0.098
Neg logit contributions
 arm-0.102
 scree-0.102
 fright-0.100
hy-0.099
 supposed-0.095
Token one
Token ID530
Feature activation+1.65e-02

Pos logit contributions
+0.090
uing+0.083
+0.082
orting+0.079
 Others+0.076
Neg logit contributions
 scree-0.107
 arm-0.105
 fright-0.105
hy-0.103
 supposed-0.101
Token day
Token ID1110
Feature activation+4.56e-02

Pos logit contributions
+0.023
uing+0.022
+0.022
ry+0.021
orting+0.021
Neg logit contributions
 scree-0.024
 fright-0.023
hy-0.023
 fingert-0.022
 arm-0.022
Token he
Token ID339
Feature activation+1.04e-01

Pos logit contributions
+0.065
uing+0.064
+0.060
orting+0.060
 Others+0.056
Neg logit contributions
 arm-0.071
 scree-0.069
hy-0.068
 fright-0.067
 written-0.065
Tokenucked
Token ID17758
Feature activation+3.58e-02

Pos logit contributions
 arm+0.081
 scree+0.080
 fright+0.079
hy+0.077
 speech+0.074
Neg logit contributions
-0.081
uing-0.078
-0.075
orting-0.074
 Others-0.070
Token the
Token ID262
Feature activation+2.12e-02

Pos logit contributions
+0.067
uing+0.065
+0.063
orting+0.062
 Others+0.060
Neg logit contributions
 arm-0.040
 scree-0.040
 fright-0.038
hy-0.037
 supposed-0.036
Token da
Token ID12379
Feature activation-7.31e-02

Pos logit contributions
+0.034
+0.034
ar+0.030
uing+0.030
 Pole+0.030
Neg logit contributions
 ped-0.025
 supposed-0.025
 fright-0.024
leaf-0.024
 scree-0.024
Tokenisy
Token ID13560
Feature activation-7.56e-03

Pos logit contributions
 arm+0.115
 scree+0.114
 fright+0.112
hy+0.109
 supposed+0.106
Neg logit contributions
-0.103
uing-0.101
orting-0.096
-0.095
 Others-0.089
Token before
Token ID878
Feature activation-3.05e-02

Pos logit contributions
 scree+0.011
 fright+0.011
hy+0.010
 arm+0.010
 supposed+0.010
Neg logit contributions
-0.012
uing-0.011
-0.011
orting-0.010
 Others-0.010
Token continuing
Token ID8282
Feature activation+1.41e-01

Pos logit contributions
 arm+0.045
hy+0.045
 scree+0.044
 fright+0.043
 supposed+0.043
Neg logit contributions
-0.044
-0.042
uing-0.040
orting-0.039
 Others-0.039
Token her
Token ID607
Feature activation+3.20e-02

Pos logit contributions
+0.245
uing+0.233
+0.230
orting+0.225
 Others+0.214
Neg logit contributions
 scree-0.176
 arm-0.169
 fright-0.168
hy-0.165
 supposed-0.159
Token climb
Token ID12080
Feature activation+1.96e-02

Pos logit contributions
+0.055
+0.051
uing+0.050
 Others+0.046
orting+0.046
Neg logit contributions
leaf-0.038
 scree-0.038
hy-0.036
oof-0.036
 written-0.036
Token.
Token ID13
Feature activation-2.16e-02

Pos logit contributions
+0.031
uing+0.030
+0.029
orting+0.029
 Others+0.028
Neg logit contributions
 arm-0.027
 scree-0.027
 fright-0.026
hy-0.026
 supposed-0.025
Token\n
Token ID198
Feature activation+1.78e-02

Pos logit contributions
 arm+0.033
 fright+0.032
 speech+0.032
 scree+0.031
+0.031
Neg logit contributions
uing-0.033
-0.030
orting-0.029
 Pole-0.028
-0.028
Token\n
Token ID198
Feature activation+1.86e-02

Pos logit contributions
+0.025
uing+0.024
orting+0.022
+0.022
 Pole+0.022
Neg logit contributions
 arm-0.029
 fright-0.028
 scree-0.028
 speech-0.027
hy-0.027
Token.
Token ID13
Feature activation-1.85e-02

Pos logit contributions
 arm+0.073
 scree+0.071
hy+0.070
 fright+0.070
 very+0.066
Neg logit contributions
-0.122
uing-0.115
-0.113
orting-0.113
 Pole-0.109
Token The
Token ID383
Feature activation+3.17e-02

Pos logit contributions
 arm+0.031
 scree+0.031
 fright+0.030
hy+0.030
 speech+0.029
Neg logit contributions
uing-0.024
-0.023
orting-0.022
-0.021
ating-0.020
Token end
Token ID886
Feature activation+7.32e-03

Pos logit contributions
+0.046
+0.045
 Pole+0.042
orting+0.042
uing+0.041
Neg logit contributions
 supposed-0.046
 scree-0.044
 fright-0.043
hy-0.042
 arm-0.041
Token.
Token ID13
Feature activation-1.86e-02

Pos logit contributions
+0.010
uing+0.009
+0.009
orting+0.009
 Others+0.009
Neg logit contributions
 scree-0.012
hy-0.011
 arm-0.011
 fright-0.011
 supposed-0.011
Token<|endoftext|>
Token ID50256
Feature activation+4.64e-02

Pos logit contributions
 arm+0.033
 scree+0.033
 fright+0.032
hy+0.032
 speech+0.031
Neg logit contributions
-0.022
uing-0.022
-0.020
orting-0.020
 Others-0.019
TokenThere
Token ID1858
Feature activation+1.40e-01

Pos logit contributions
uing+0.057
+0.055
+0.050
orting+0.049
 Others+0.045
Neg logit contributions
 arm-0.084
 fright-0.082
 scree-0.081
hy-0.080
 speech-0.078
Token was
Token ID373
Feature activation-8.47e-02

Pos logit contributions
uing+0.269
+0.255
orting+0.253
 Others+0.235
+0.234
Neg logit contributions
 arm-0.163
 very-0.148
hy-0.143
 scree-0.138
 lip-0.137
Token once
Token ID1752
Feature activation-2.37e-03

Pos logit contributions
 scree+0.069
 fright+0.068
 arm+0.065
hy+0.064
 fingert+0.055
Neg logit contributions
-0.180
-0.173
orting-0.170
uing-0.170
 Others-0.170
Token a
Token ID257
Feature activation-5.86e-02

Pos logit contributions
 scree+0.003
 arm+0.003
 fright+0.003
hy+0.003
 supposed+0.003
Neg logit contributions
-0.004
uing-0.004
-0.004
orting-0.003
 Others-0.003
Token farmer
Token ID18739
Feature activation-4.73e-03

Pos logit contributions
 arm+0.079
 scree+0.078
 fright+0.075
hy+0.073
 speech+0.070
Neg logit contributions
-0.096
uing-0.095
orting-0.091
-0.089
 Others-0.086
Token who
Token ID508
Feature activation+5.36e-03

Pos logit contributions
 arm+0.006
 scree+0.006
 fright+0.006
hy+0.006
 speech+0.006
Neg logit contributions
-0.008
uing-0.008
-0.007
orting-0.007
 Others-0.007
Token just
Token ID655
Feature activation-4.36e-02

Pos logit contributions
+0.136
uing+0.132
+0.128
orting+0.126
 Others+0.121
Neg logit contributions
 arm-0.099
 scree-0.098
 fright-0.096
hy-0.093
 speech-0.089
Token like
Token ID588
Feature activation+2.83e-02

Pos logit contributions
 arm+0.056
 scree+0.055
 fright+0.054
hy+0.053
 speech+0.051
Neg logit contributions
-0.072
uing-0.069
-0.067
orting-0.067
 Others-0.063
Token before
Token ID878
Feature activation-2.93e-02

Pos logit contributions
+0.045
uing+0.044
+0.042
orting+0.041
 Others+0.039
Neg logit contributions
 arm-0.040
 scree-0.039
 fright-0.038
hy-0.037
 supposed-0.036
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
 arm+0.042
 scree+0.042
hy+0.041
 fright+0.041
 speech+0.039
Neg logit contributions
-0.044
uing-0.042
-0.042
orting-0.040
 Others-0.040
Token<|endoftext|>
Token ID50256
Feature activation+4.43e-02

Pos logit contributions
 arm+0.022
 scree+0.021
hy+0.021
 fright+0.021
 speech+0.021
Neg logit contributions
uing-0.019
-0.018
orting-0.017
ating-0.016
-0.016
TokenThere
Token ID1858
Feature activation+1.39e-01

Pos logit contributions
uing+0.060
+0.056
orting+0.051
+0.050
 Majesty+0.047
Neg logit contributions
 arm-0.076
 fright-0.074
 scree-0.073
hy-0.072
 speech-0.072
Token was
Token ID373
Feature activation-8.64e-02

Pos logit contributions
uing+0.292
orting+0.276
+0.273
 Others+0.256
+0.253
Neg logit contributions
 arm-0.139
 very-0.131
hy-0.120
 aer-0.117
-0.116
Token a
Token ID257
Feature activation-6.82e-02

Pos logit contributions
 fright+0.094
 scree+0.093
hy+0.087
 arm+0.085
 vel+0.080
Neg logit contributions
-0.157
orting-0.152
 Others-0.151
-0.150
uing-0.150
Token big
Token ID1263
Feature activation+9.14e-04

Pos logit contributions
 scree+0.083
 fright+0.083
 arm+0.081
hy+0.080
 supposed+0.078
Neg logit contributions
-0.120
-0.117
uing-0.113
orting-0.107
 Pole-0.106
Token pit
Token ID6028
Feature activation+3.22e-02

Pos logit contributions
+0.001
+0.001
uing+0.001
orting+0.001
 Pole+0.001
Neg logit contributions
 scree-0.001
 fright-0.001
 supposed-0.001
hy-0.001
 arm-0.001
Token near
Token ID1474
Feature activation+8.72e-03

Pos logit contributions
+0.053
uing+0.052
+0.049
orting+0.049
 Others+0.047
Neg logit contributions
 arm-0.044
 scree-0.042
 fright-0.041
hy-0.041
 speech-0.039
Token night
Token ID1755
Feature activation+4.53e-02

Pos logit contributions
uing+0.039
+0.038
 Others+0.037
orting+0.036
+0.036
Neg logit contributions
 scree-0.031
hy-0.031
 fright-0.029
leaf-0.029
 speech-0.029
Token and
Token ID290
Feature activation-6.27e-02

Pos logit contributions
+0.073
uing+0.070
+0.068
orting+0.066
 Others+0.064
Neg logit contributions
 arm-0.065
 scree-0.063
 fright-0.062
hy-0.061
 speech-0.059
Token during
Token ID1141
Feature activation+4.10e-02

Pos logit contributions
 arm+0.085
 scree+0.085
 fright+0.083
hy+0.082
 supposed+0.078
Neg logit contributions
-0.101
uing-0.098
-0.095
orting-0.093
 Others-0.089
Token storms
Token ID20228
Feature activation+5.03e-02

Pos logit contributions
+0.061
 Others+0.057
uing+0.055
 Days+0.054
+0.054
Neg logit contributions
leaf-0.055
hy-0.052
 speech-0.050
 supposed-0.050
 fingert-0.049
Token when
Token ID618
Feature activation+4.86e-02

Pos logit contributions
+0.101
uing+0.096
+0.094
 Majesty+0.089
 Others+0.086
Neg logit contributions
hy-0.060
 scree-0.058
 fright-0.054
 arm-0.054
 very-0.053
Token there
Token ID612
Feature activation+1.39e-01

Pos logit contributions
+0.062
uing+0.059
+0.057
orting+0.055
 Others+0.053
Neg logit contributions
 arm-0.083
 scree-0.083
 fright-0.081
hy-0.080
 supposed-0.077
Token was
Token ID373
Feature activation-6.37e-02

Pos logit contributions
+0.242
uing+0.238
+0.227
orting+0.227
 Others+0.213
Neg logit contributions
 arm-0.177
 scree-0.173
 fright-0.168
hy-0.165
 speech-0.157
Token no
Token ID645
Feature activation-1.54e-02

Pos logit contributions
 scree+0.056
hy+0.056
 arm+0.055
 fright+0.054
leaf+0.052
Neg logit contributions
-0.126
uing-0.122
-0.118
 Others-0.118
orting-0.117
Token light
Token ID1657
Feature activation+3.08e-03

Pos logit contributions
hy+0.020
 scree+0.020
 fingert+0.019
 supposed+0.019
leaf+0.019
Neg logit contributions
uing-0.022
 Others-0.021
orting-0.021
-0.021
 Pole-0.020
Token outside
Token ID2354
Feature activation+6.41e-02

Pos logit contributions
+0.006
uing+0.005
+0.005
orting+0.005
 Others+0.005
Neg logit contributions
 arm-0.004
 scree-0.004
 fright-0.004
hy-0.004
 speech-0.003
Token.
Token ID13
Feature activation-2.22e-02

Pos logit contributions
+0.109
uing+0.105
+0.102
orting+0.100
 Others+0.096
Neg logit contributions
 arm-0.083
 scree-0.082
 fright-0.080
hy-0.078
 speech-0.075
Token that
Token ID326
Feature activation+9.60e-02

Pos logit contributions
+0.053
uing+0.049
orting+0.049
+0.049
 Pole+0.048
Neg logit contributions
 arm-0.032
 fright-0.031
 scree-0.030
 very-0.028
 speech-0.028
Token the
Token ID262
Feature activation+2.75e-02

Pos logit contributions
+0.154
uing+0.144
+0.142
orting+0.138
 Others+0.132
Neg logit contributions
 scree-0.135
 arm-0.130
 fright-0.129
hy-0.129
 speech-0.124
Token sausage
Token ID31889
Feature activation-9.70e-03

Pos logit contributions
+0.048
uing+0.047
+0.045
orting+0.045
 Others+0.043
Neg logit contributions
 arm-0.034
 scree-0.033
 fright-0.033
hy-0.032
 supposed-0.031
Token was
Token ID373
Feature activation-8.06e-02

Pos logit contributions
 arm+0.013
 scree+0.013
 fright+0.012
hy+0.012
 speech+0.012
Neg logit contributions
-0.016
uing-0.016
-0.015
orting-0.015
 Others-0.014
Token still
Token ID991
Feature activation+1.25e-02

Pos logit contributions
 arm+0.085
 scree+0.085
 fright+0.084
hy+0.083
 speech+0.076
Neg logit contributions
uing-0.147
-0.147
-0.140
orting-0.138
 Others-0.133
Token intact
Token ID16572
Feature activation+1.39e-01

Pos logit contributions
uing+0.020
+0.020
+0.019
orting+0.019
ating+0.018
Neg logit contributions
 scree-0.015
 fright-0.015
hy-0.015
 arm-0.014
leaf-0.014
Token.
Token ID13
Feature activation-2.27e-02

Pos logit contributions
+0.254
uing+0.244
+0.237
orting+0.234
 Others+0.229
Neg logit contributions
 arm-0.168
 scree-0.156
 fright-0.153
hy-0.151
 very-0.149
Token She
Token ID1375
Feature activation+7.45e-02

Pos logit contributions
 arm+0.037
 fright+0.036
 scree+0.036
hy+0.036
 speech+0.035
Neg logit contributions
uing-0.031
-0.030
orting-0.028
-0.027
 Pole-0.026
Token continued
Token ID3767
Feature activation+8.33e-02

Pos logit contributions
+0.110
+0.110
uing+0.108
 Others+0.097
cast+0.095
Neg logit contributions
hy-0.104
 supposed-0.104
 arm-0.102
 fright-0.100
 scree-0.096
Token rolling
Token ID10708
Feature activation+5.70e-02

Pos logit contributions
+0.143
uing+0.139
+0.135
orting+0.132
 Others+0.127
Neg logit contributions
 arm-0.104
 scree-0.104
 fright-0.102
hy-0.100
 speech-0.095
Token it
Token ID340
Feature activation+4.67e-02

Pos logit contributions
+0.097
uing+0.096
+0.090
orting+0.089
 Pole+0.088
Neg logit contributions
 arm-0.075
 fright-0.073
 scree-0.073
hy-0.068
 very-0.066
Token for
Token ID329
Feature activation-2.59e-02

Pos logit contributions
 arm+0.023
 scree+0.023
 fright+0.022
hy+0.022
 supposed+0.021
Neg logit contributions
-0.038
uing-0.038
-0.036
orting-0.036
 Others-0.035
Token a
Token ID257
Feature activation-7.07e-02

Pos logit contributions
hy+0.022
 scree+0.021
 arm+0.020
 supposed+0.020
 speech+0.019
Neg logit contributions
-0.050
 Others-0.050
uing-0.050
orting-0.049
-0.048
Token little
Token ID1310
Feature activation-1.12e-02

Pos logit contributions
 scree+0.117
 arm+0.117
hy+0.114
 supposed+0.114
 fright+0.112
Neg logit contributions
-0.088
-0.086
uing-0.085
 Pole-0.075
orting-0.074
Token while
Token ID981
Feature activation-4.70e-03

Pos logit contributions
 scree+0.014
 arm+0.014
hy+0.014
 supposed+0.013
 fright+0.013
Neg logit contributions
-0.019
uing-0.018
-0.018
orting-0.018
 Others-0.017
Token before
Token ID878
Feature activation-3.12e-02

Pos logit contributions
 arm+0.008
 scree+0.008
 fright+0.008
hy+0.007
 speech+0.007
Neg logit contributions
-0.006
uing-0.006
-0.006
orting-0.006
 Others-0.005
Token continuing
Token ID8282
Feature activation+1.39e-01

Pos logit contributions
 arm+0.044
 scree+0.043
 fright+0.042
hy+0.042
 speech+0.040
Neg logit contributions
-0.050
uing-0.048
-0.046
orting-0.045
 Others-0.043
Token on
Token ID319
Feature activation-1.55e-03

Pos logit contributions
+0.249
uing+0.240
+0.235
orting+0.230
 Others+0.221
Neg logit contributions
 scree-0.165
 arm-0.163
 fright-0.160
hy-0.156
 supposed-0.149
Token their
Token ID511
Feature activation+2.25e-02

Pos logit contributions
 scree+0.002
hy+0.002
 arm+0.002
 fright+0.002
leaf+0.002
Neg logit contributions
-0.002
uing-0.002
-0.002
orting-0.002
 Others-0.002
Token way
Token ID835
Feature activation-3.26e-03

Pos logit contributions
+0.036
uing+0.034
+0.033
orting+0.032
 Others+0.031
Neg logit contributions
 scree-0.030
hy-0.030
 written-0.029
leaf-0.029
 fright-0.029
Token home
Token ID1363
Feature activation+4.65e-02

Pos logit contributions
 arm+0.004
 fright+0.004
 scree+0.004
hy+0.004
 speech+0.004
Neg logit contributions
-0.006
uing-0.006
-0.005
orting-0.005
 Others-0.005
Token.
Token ID13
Feature activation-2.25e-02

Pos logit contributions
+0.083
uing+0.078
+0.077
orting+0.076
 Others+0.073
Neg logit contributions
 arm-0.059
 scree-0.056
 fright-0.055
hy-0.055
 speech-0.052
Token park
Token ID3952
Feature activation+5.21e-02

Pos logit contributions
+0.022
+0.021
uing+0.020
 Pole+0.020
orting+0.019
Neg logit contributions
 scree-0.025
 fright-0.024
hy-0.024
 supposed-0.024
 arm-0.023
Token.
Token ID13
Feature activation-2.67e-02

Pos logit contributions
+0.080
uing+0.078
+0.076
orting+0.075
 Others+0.071
Neg logit contributions
 scree-0.074
 arm-0.073
 fright-0.072
hy-0.070
 supposed-0.069
Token
Token ID220
Feature activation+1.44e-02

Pos logit contributions
 arm+0.045
 scree+0.044
 fright+0.044
hy+0.044
 speech+0.042
Neg logit contributions
uing-0.035
-0.034
orting-0.031
-0.031
ating-0.030
Token\n
Token ID198
Feature activation+2.40e-02

Pos logit contributions
+0.021
uing+0.020
+0.019
orting+0.018
 Pole+0.018
Neg logit contributions
 arm-0.022
 fright-0.022
 scree-0.021
hy-0.021
 speech-0.021
Token\n
Token ID198
Feature activation+2.44e-02

Pos logit contributions
+0.033
uing+0.033
orting+0.030
+0.030
 Pole+0.029
Neg logit contributions
 arm-0.038
 fright-0.038
 scree-0.037
hy-0.036
 speech-0.036
TokenThere
Token ID1858
Feature activation+1.38e-01

Pos logit contributions
+0.031
uing+0.030
orting+0.029
+0.028
 Pole+0.027
Neg logit contributions
 scree-0.041
 arm-0.041
 fright-0.040
hy-0.039
 speech-0.038
Token,
Token ID11
Feature activation+7.57e-02

Pos logit contributions
uing+0.273
+0.255
orting+0.251
 Others+0.234
 Pole+0.234
Neg logit contributions
 arm-0.150
 very-0.145
 a-0.136
 scree-0.131
-0.131
Token the
Token ID262
Feature activation+3.04e-02

Pos logit contributions
uing+0.117
+0.116
orting+0.108
+0.101
 Others+0.098
Neg logit contributions
 arm-0.114
hy-0.108
 scree-0.108
 fright-0.108
 very-0.107
Token baby
Token ID5156
Feature activation-1.97e-02

Pos logit contributions
+0.046
+0.044
uing+0.043
orting+0.042
 Pole+0.041
Neg logit contributions
 scree-0.044
 fright-0.043
 arm-0.043
 supposed-0.042
hy-0.042
Token saw
Token ID2497
Feature activation+5.37e-02

Pos logit contributions
 arm+0.026
 scree+0.025
hy+0.024
 fright+0.023
 er+0.023
Neg logit contributions
uing-0.034
-0.033
orting-0.032
-0.030
 Others-0.030
Token some
Token ID617
Feature activation-2.86e-02

Pos logit contributions
+0.111
uing+0.109
+0.106
orting+0.105
 Others+0.101
Neg logit contributions
 arm-0.048
 scree-0.048
 fright-0.047
hy-0.045
 supposed-0.042
Token.
Token ID13
Feature activation-3.97e-02

Pos logit contributions
+0.018
+0.017
uing+0.017
orting+0.017
 Others+0.016
Neg logit contributions
 scree-0.017
 fright-0.017
 arm-0.017
hy-0.016
 supposed-0.016
Token Tara
Token ID37723
Feature activation-1.59e-02

Pos logit contributions
 arm+0.070
 scree+0.068
 fright+0.067
hy+0.067
 speech+0.064
Neg logit contributions
-0.049
uing-0.047
-0.045
orting-0.044
 Others-0.041
Token had
Token ID550
Feature activation-2.37e-02

Pos logit contributions
 arm+0.019
 scree+0.018
 fright+0.017
hy+0.017
 supposed+0.016
Neg logit contributions
uing-0.029
-0.029
orting-0.028
-0.027
 Others-0.026
Token a
Token ID257
Feature activation-9.17e-02

Pos logit contributions
 scree+0.028
 fright+0.027
leaf+0.025
 mid+0.024
 tal+0.024
Neg logit contributions
-0.035
-0.034
 Others-0.034
 them-0.033
 Days-0.032
Token long
Token ID890
Feature activation-2.23e-02

Pos logit contributions
 scree+0.134
e+0.127
hy+0.127
leaf+0.125
 fright+0.125
Neg logit contributions
-0.145
-0.137
 Pole-0.113
 Majesty-0.112
uddle-0.111
Token-
Token ID12
Feature activation-1.90e-01

Pos logit contributions
leaf+0.034
 fright+0.033
 scree+0.032
 supposed+0.031
 speech+0.030
Neg logit contributions
-0.035
-0.034
uing-0.030
orting-0.028
 Pole-0.027
Tokenlost
Token ID33224
Feature activation+2.18e-03

Pos logit contributions
 scree+0.306
 mid+0.267
 arm+0.260
 fright+0.260
 cry+0.257
Neg logit contributions
-0.250
orting-0.239
-0.238
or-0.228
uing-0.228
Token magazine
Token ID7093
Feature activation-3.27e-02

Pos logit contributions
+0.003
+0.003
 Pole+0.003
 Majesty+0.002
uing+0.002
Neg logit contributions
 supposed-0.003
 scree-0.003
 fright-0.003
leaf-0.003
 rhyth-0.003
Token and
Token ID290
Feature activation-7.52e-02

Pos logit contributions
 fright+0.064
 scree+0.063
 tal+0.058
 cry+0.058
oof+0.058
Neg logit contributions
-0.031
-0.031
orting-0.029
uing-0.025
 Pole-0.025
Token she
Token ID673
Feature activation+8.65e-02

Pos logit contributions
 fright+0.141
 scree+0.140
 arm+0.128
 mid+0.127
 speech+0.126
Neg logit contributions
-0.083
-0.082
 Others-0.070
uing-0.067
 Majesty-0.060
Token wanted
Token ID2227
Feature activation-8.08e-03

Pos logit contributions
+0.098
+0.096
uing+0.083
 Others+0.078
 regularly+0.074
Neg logit contributions
 fright-0.156
 scree-0.152
 arm-0.147
 mid-0.145
hy-0.144
Tokeni
Token ID72
Feature activation-3.74e-02

Pos logit contributions
+0.043
uing+0.042
orting+0.041
 Pole+0.039
 Others+0.039
Neg logit contributions
 arm-0.028
 scree-0.026
 lip-0.025
 very-0.025
 fright-0.024
Token was
Token ID373
Feature activation-1.02e-01

Pos logit contributions
 arm+0.040
 scree+0.037
+0.036
 very+0.035
hy+0.034
Neg logit contributions
uing-0.073
-0.070
orting-0.070
 Pole-0.065
 Others-0.065
Token a
Token ID257
Feature activation-8.83e-02

Pos logit contributions
 scree+0.116
 fright+0.115
hy+0.114
leaf+0.114
 fingert+0.109
Neg logit contributions
-0.170
-0.167
uing-0.161
 Others-0.156
orting-0.154
Token small
Token ID1402
Feature activation+2.38e-02

Pos logit contributions
 supposed+0.128
leaf+0.128
 written+0.127
oof+0.126
hy+0.125
Neg logit contributions
-0.140
-0.135
 Pole-0.113
uing-0.108
 Others-0.101
Token 3
Token ID513
Feature activation-9.96e-02

Pos logit contributions
+0.031
+0.030
uing+0.028
 Pole+0.027
orting+0.026
Neg logit contributions
 scree-0.039
 supposed-0.038
 fright-0.038
hy-0.037
 speech-0.037
Token-
Token ID12
Feature activation-1.86e-01

Pos logit contributions
 scree+0.137
hy+0.133
 written+0.127
 supposed+0.126
 vel+0.126
Neg logit contributions
-0.149
-0.148
uing-0.144
 Others-0.136
ating-0.136
Tokenyear
Token ID1941
Feature activation-1.14e-01

Pos logit contributions
 scree+0.228
 arm+0.217
 fright+0.209
 poem+0.198
 vel+0.196
Neg logit contributions
uing-0.324
orting-0.318
-0.304
 Others-0.300
-0.295
Token-
Token ID12
Feature activation-1.79e-01

Pos logit contributions
 scree+0.173
 fright+0.168
 written+0.160
 speech+0.159
 supposed+0.158
Neg logit contributions
uing-0.160
-0.154
orting-0.151
-0.150
 Pole-0.141
Tokenold
Token ID727
Feature activation-7.34e-03

Pos logit contributions
 scree+0.294
 arm+0.284
 fright+0.280
 fingert+0.261
 vel+0.259
Neg logit contributions
orting-0.241
uing-0.240
-0.223
 Others-0.218
ating-0.212
Token who
Token ID508
Feature activation-1.40e-02

Pos logit contributions
 scree+0.011
 fright+0.011
 arm+0.011
hy+0.011
 supposed+0.010
Neg logit contributions
-0.011
-0.010
uing-0.010
orting-0.009
 Others-0.009
Token wanted
Token ID2227
Feature activation-8.91e-03

Pos logit contributions
 arm+0.019
 scree+0.019
 fright+0.019
hy+0.018
 speech+0.018
Neg logit contributions
-0.022
uing-0.021
-0.021
orting-0.020
 Others-0.019
TokenS
Token ID50
Feature activation-1.09e-03

Pos logit contributions
uing+0.056
+0.051
orting+0.046
+0.045
 Majesty+0.043
Neg logit contributions
 arm-0.080
 fright-0.077
 scree-0.076
hy-0.076
 speech-0.075
Tokenally
Token ID453
Feature activation+8.77e-03

Pos logit contributions
 arm+0.001
hy+0.001
 written+0.001
 fright+0.001
 lip+0.001
Neg logit contributions
-0.002
uing-0.002
-0.002
orting-0.002
 Others-0.002
Token was
Token ID373
Feature activation-9.56e-02

Pos logit contributions
+0.015
uing+0.014
orting+0.014
+0.014
 Others+0.013
Neg logit contributions
 arm-0.012
 scree-0.011
 fright-0.011
hy-0.011
 supposed-0.010
Token a
Token ID257
Feature activation-7.90e-02

Pos logit contributions
 fright+0.139
 scree+0.138
hy+0.136
 vel+0.128
 poem+0.126
Neg logit contributions
-0.125
-0.122
orting-0.120
 Others-0.120
uing-0.120
Token 3
Token ID513
Feature activation-9.32e-02

Pos logit contributions
 supposed+0.112
 fright+0.112
 scree+0.111
 written+0.108
hy+0.107
Neg logit contributions
-0.126
-0.124
uing-0.105
 Pole-0.105
 Others-0.099
Token-
Token ID12
Feature activation-1.85e-01

Pos logit contributions
 scree+0.133
 supposed+0.126
hy+0.126
 written+0.125
 vel+0.124
Neg logit contributions
-0.133
-0.131
uing-0.128
 Others-0.122
ating-0.120
Tokenyear
Token ID1941
Feature activation-1.07e-01

Pos logit contributions
 scree+0.226
 arm+0.211
 fright+0.209
 vel+0.198
 fingert+0.197
Neg logit contributions
uing-0.321
orting-0.306
-0.300
 Others-0.300
-0.295
Token-
Token ID12
Feature activation-1.73e-01

Pos logit contributions
 scree+0.170
 fright+0.167
 written+0.157
 speech+0.155
 supposed+0.155
Neg logit contributions
uing-0.140
orting-0.131
-0.131
-0.129
rier-0.126
Tokenold
Token ID727
Feature activation-9.11e-04

Pos logit contributions
 scree+0.284
 arm+0.273
 fright+0.271
 fingert+0.254
 vel+0.252
Neg logit contributions
uing-0.232
orting-0.226
-0.215
 Others-0.212
ating-0.207
Token girl
Token ID2576
Feature activation-3.44e-02

Pos logit contributions
 scree+0.001
 fright+0.001
 arm+0.001
hy+0.001
 supposed+0.001
Neg logit contributions
-0.001
-0.001
uing-0.001
orting-0.001
 Others-0.001
Token who
Token ID508
Feature activation-8.82e-03

Pos logit contributions
 scree+0.047
 fright+0.047
 arm+0.046
hy+0.044
 supposed+0.043
Neg logit contributions
-0.055
-0.053
uing-0.051
orting-0.049
 Others-0.048
Token He
Token ID679
Feature activation+7.51e-02

Pos logit contributions
 arm+0.038
 fright+0.038
 speech+0.037
 scree+0.037
+0.036
Neg logit contributions
uing-0.037
-0.035
orting-0.033
ating-0.032
-0.031
Token was
Token ID373
Feature activation-9.06e-02

Pos logit contributions
+0.115
+0.115
uing+0.107
 Others+0.102
 Pole+0.099
Neg logit contributions
 arm-0.103
 supposed-0.102
 fright-0.100
hy-0.099
 scree-0.095
Token wearing
Token ID5762
Feature activation+6.23e-03

Pos logit contributions
 poem+0.085
oof+0.085
hy+0.084
 arm+0.083
 scree+0.083
Neg logit contributions
uing-0.154
-0.153
 Others-0.153
-0.152
orting-0.144
Token a
Token ID257
Feature activation-7.91e-02

Pos logit contributions
+0.012
+0.012
uing+0.012
 Others+0.011
orting+0.011
Neg logit contributions
 scree-0.006
 supposed-0.005
 fright-0.005
 arm-0.005
 speech-0.005
Token big
Token ID1263
Feature activation-1.40e-02

Pos logit contributions
 supposed+0.093
leaf+0.091
 scree+0.088
hy+0.084
 fright+0.081
Neg logit contributions
-0.132
-0.127
uing-0.120
 Pole-0.114
 Others-0.112
Token-
Token ID12
Feature activation-1.85e-01

Pos logit contributions
 supposed+0.019
leaf+0.019
 scree+0.017
 written+0.017
hy+0.017
Neg logit contributions
-0.022
-0.022
uing-0.020
orting-0.019
pit-0.019
Tokenear
Token ID451
Feature activation+8.54e-03

Pos logit contributions
 scree+0.248
 arm+0.232
 vel+0.226
 supposed+0.220
 fingert+0.220
Neg logit contributions
-0.279
orting-0.271
-0.267
uing-0.262
 Others-0.259
Tokenring
Token ID1806
Feature activation-3.35e-02

Pos logit contributions
+0.013
uing+0.012
+0.012
orting+0.012
 Pole+0.011
Neg logit contributions
 scree-0.012
 fright-0.012
 supposed-0.011
 arm-0.011
 speech-0.011
Token,
Token ID11
Feature activation+5.57e-02

Pos logit contributions
 supposed+0.043
 scree+0.043
 fright+0.042
 speech+0.040
leaf+0.040
Neg logit contributions
-0.054
-0.051
uing-0.051
orting-0.050
 Pole-0.047
Token so
Token ID523
Feature activation-5.69e-02

Pos logit contributions
+0.086
+0.085
uing+0.078
orting+0.078
 Pole+0.076
Neg logit contributions
 arm-0.070
 fright-0.069
hy-0.067
 supposed-0.067
 scree-0.067
Token Jack
Token ID3619
Feature activation+3.70e-02

Pos logit contributions
 arm+0.096
 scree+0.096
 fright+0.094
hy+0.092
 speech+0.089
Neg logit contributions
-0.078
uing-0.074
-0.070
orting-0.070
 Pole-0.064
Token<|endoftext|>
Token ID50256
Feature activation+4.46e-02

Pos logit contributions
 arm+0.059
 fright+0.058
 scree+0.057
hy+0.057
 speech+0.056
Neg logit contributions
uing-0.042
-0.042
-0.039
orting-0.039
 Pole-0.036
TokenJimmy
Token ID40335
Feature activation+2.30e-02

Pos logit contributions
uing+0.058
+0.050
orting+0.046
 Majesty+0.045
+0.043
Neg logit contributions
 arm-0.080
 fright-0.077
-0.077
 very-0.077
 speech-0.076
Token was
Token ID373
Feature activation-9.61e-02

Pos logit contributions
uing+0.043
+0.041
orting+0.040
 Others+0.038
+0.038
Neg logit contributions
 arm-0.027
 scree-0.026
hy-0.024
 very-0.024
 fright-0.024
Token a
Token ID257
Feature activation-8.14e-02

Pos logit contributions
 scree+0.122
 fright+0.121
hy+0.116
leaf+0.110
 fingert+0.107
Neg logit contributions
-0.154
-0.152
uing-0.147
 Others-0.141
orting-0.138
Token 3
Token ID513
Feature activation-9.45e-02

Pos logit contributions
 supposed+0.111
 fright+0.110
 scree+0.109
hy+0.107
 written+0.106
Neg logit contributions
-0.138
-0.133
 Pole-0.113
uing-0.109
 Majesty-0.105
Token-
Token ID12
Feature activation-1.85e-01

Pos logit contributions
 scree+0.135
hy+0.131
 supposed+0.125
 fright+0.124
 written+0.123
Neg logit contributions
-0.139
-0.138
uing-0.133
 Others-0.126
ating-0.125
Tokenyear
Token ID1941
Feature activation-1.11e-01

Pos logit contributions
 scree+0.259
 arm+0.240
 fright+0.234
 vel+0.227
 tal+0.225
Neg logit contributions
uing-0.301
orting-0.282
or-0.271
ins-0.271
 Others-0.269
Token-
Token ID12
Feature activation-1.77e-01

Pos logit contributions
 scree+0.176
 fright+0.172
 written+0.163
 supposed+0.160
 speech+0.158
Neg logit contributions
uing-0.148
-0.142
-0.141
orting-0.136
ating-0.130
Tokenold
Token ID727
Feature activation-1.63e-03

Pos logit contributions
 scree+0.318
 arm+0.301
 fright+0.300
 vel+0.284
 fingert+0.281
Neg logit contributions
uing-0.221
orting-0.213
or-0.197
 Others-0.195
ins-0.193
Token who
Token ID508
Feature activation-1.06e-02

Pos logit contributions
 scree+0.003
 fright+0.003
 arm+0.003
hy+0.002
 supposed+0.002
Neg logit contributions
-0.002
-0.002
uing-0.002
orting-0.002
 Others-0.002
Token loved
Token ID6151
Feature activation-9.83e-03

Pos logit contributions
 arm+0.014
 scree+0.014
 fright+0.014
hy+0.013
 speech+0.013
Neg logit contributions
-0.018
uing-0.017
-0.017
orting-0.016
 Others-0.016
Token.
Token ID13
Feature activation-3.33e-02

Pos logit contributions
 scree+0.039
 fright+0.039
 speech+0.038
hy+0.036
 vel+0.036
Neg logit contributions
-0.025
-0.024
uing-0.023
orting-0.023
 Others-0.020
Token Jack
Token ID3619
Feature activation+2.69e-02

Pos logit contributions
 written+0.054
 a+0.052
 fright+0.052
 arm+0.051
 mid+0.051
Neg logit contributions
uing-0.046
-0.044
rier-0.042
 Majesty-0.040
-0.040
Token was
Token ID373
Feature activation-9.22e-02

Pos logit contributions
+0.051
uing+0.051
orting+0.048
+0.047
 Others+0.047
Neg logit contributions
 arm-0.029
 scree-0.029
hy-0.027
 fright-0.027
 supposed-0.026
Token a
Token ID257
Feature activation-8.24e-02

Pos logit contributions
 fright+0.090
 fingert+0.084
 scree+0.083
 arm+0.081
 poem+0.080
Neg logit contributions
-0.168
 Others-0.168
-0.167
uing-0.162
orting-0.160
Token 3
Token ID513
Feature activation-9.54e-02

Pos logit contributions
 fright+0.092
 arm+0.091
 scree+0.090
hy+0.090
 supposed+0.087
Neg logit contributions
-0.150
-0.149
uing-0.137
 Pole-0.133
 Others-0.130
Token-
Token ID12
Feature activation-1.85e-01

Pos logit contributions
 scree+0.134
hy+0.133
 vel+0.128
 supposed+0.127
 fright+0.127
Neg logit contributions
-0.139
uing-0.134
-0.133
orting-0.128
 Others-0.126
Tokenyear
Token ID1941
Feature activation-1.10e-01

Pos logit contributions
 arm+0.207
 scree+0.201
 fright+0.197
 fingert+0.181
 poem+0.179
Neg logit contributions
uing-0.336
orting-0.331
 Others-0.318
-0.314
-0.310
Token old
Token ID1468
Feature activation-9.08e-03

Pos logit contributions
 fright+0.165
 scree+0.163
 written+0.157
 speech+0.154
 arm+0.150
Neg logit contributions
uing-0.151
orting-0.144
-0.143
-0.139
 Pole-0.137
Token who
Token ID508
Feature activation-1.36e-02

Pos logit contributions
 arm+0.013
 scree+0.013
 fright+0.012
 er+0.012
 lip+0.012
Neg logit contributions
-0.015
uing-0.015
orting-0.014
-0.013
 Others-0.013
Token liked
Token ID8288
Feature activation-1.20e-02

Pos logit contributions
 scree+0.017
 arm+0.017
hy+0.017
 fright+0.017
 supposed+0.016
Neg logit contributions
-0.023
uing-0.023
orting-0.022
-0.022
 Others-0.021
Token to
Token ID284
Feature activation-5.46e-02

Pos logit contributions
 scree+0.018
hy+0.018
 arm+0.018
 fright+0.018
 speech+0.017
Neg logit contributions
-0.016
-0.016
uing-0.015
 Others-0.015
orting-0.015
Token her
Token ID607
Feature activation+1.01e-02

Pos logit contributions
+0.106
+0.104
orting+0.104
 Others+0.102
uing+0.101
Neg logit contributions
 scree-0.046
 arm-0.044
 fright-0.043
 supposed-0.043
hy-0.042
Token mother
Token ID2802
Feature activation-7.07e-02

Pos logit contributions
+0.011
+0.010
 Majesty+0.010
uing+0.010
 Pole+0.009
Neg logit contributions
leaf-0.018
 scree-0.017
hy-0.017
 supposed-0.016
 fright-0.016
Token holding
Token ID4769
Feature activation+6.29e-03

Pos logit contributions
 arm+0.077
 scree+0.076
 fright+0.074
hy+0.072
 supposed+0.069
Neg logit contributions
-0.133
uing-0.130
-0.127
orting-0.124
 Others-0.120
Token a
Token ID257
Feature activation-8.54e-02

Pos logit contributions
+0.011
 them+0.010
+0.010
uing+0.009
orting+0.009
Neg logit contributions
 scree-0.008
 supposed-0.008
hy-0.007
leaf-0.007
 rhyth-0.007
Token freshly
Token ID29026
Feature activation-8.05e-02

Pos logit contributions
 scree+0.111
hy+0.106
 ped+0.105
 supposed+0.105
oof+0.099
Neg logit contributions
-0.135
-0.128
uing-0.126
 Pole-0.118
uddle-0.111
Token-
Token ID12
Feature activation-1.84e-01

Pos logit contributions
 scree+0.096
 fright+0.094
hy+0.093
 arm+0.093
 speech+0.085
Neg logit contributions
-0.144
-0.136
uing-0.135
orting-0.132
ating-0.126
Tokenb
Token ID65
Feature activation-6.64e-02

Pos logit contributions
 scree+0.272
 arm+0.248
 supposed+0.246
 cry+0.234
 speech+0.233
Neg logit contributions
-0.280
-0.265
uing-0.259
orting-0.255
 Others-0.242
Tokenaked
Token ID4335
Feature activation-3.03e-02

Pos logit contributions
 scree+0.123
 fright+0.121
hy+0.118
 supposed+0.113
leaf+0.112
Neg logit contributions
uing-0.078
-0.074
-0.072
orting-0.064
ating-0.061
Token muff
Token ID27563
Feature activation-5.34e-02

Pos logit contributions
 scree+0.049
oof+0.048
 supposed+0.046
hy+0.046
 fright+0.045
Neg logit contributions
uing-0.039
orting-0.036
-0.036
-0.034
ating-0.033
Tokenin
Token ID259
Feature activation+4.04e-04

Pos logit contributions
 scree+0.123
 arm+0.121
 fright+0.120
hy+0.120
 speech+0.115
Neg logit contributions
uing-0.037
-0.036
orting-0.033
-0.032
 Others-0.027
Token.
Token ID13
Feature activation-3.08e-02

Pos logit contributions
+0.001
uing+0.001
+0.001
orting+0.001
ry+0.000
Neg logit contributions
 scree-0.001
 fright-0.001
 supposed-0.001
hy-0.001
 speech-0.001
TokenTom
Token ID13787
Feature activation-1.31e-03

Pos logit contributions
uing+0.055
+0.048
orting+0.044
 Majesty+0.043
+0.041
Neg logit contributions
 arm-0.083
 fright-0.080
 scree-0.079
hy-0.079
-0.078
Tokenmy
Token ID1820
Feature activation-1.17e-02

Pos logit contributions
 arm+0.001
 scree+0.001
 fright+0.001
hy+0.001
 er+0.001
Neg logit contributions
-0.003
uing-0.003
orting-0.003
-0.002
 Others-0.002
Token was
Token ID373
Feature activation-9.13e-02

Pos logit contributions
 arm+0.014
 scree+0.014
 fright+0.013
hy+0.013
 er+0.012
Neg logit contributions
uing-0.022
-0.021
orting-0.020
-0.020
 Others-0.019
Token a
Token ID257
Feature activation-7.82e-02

Pos logit contributions
 scree+0.125
 fright+0.123
hy+0.118
 fingert+0.111
 poem+0.111
Neg logit contributions
-0.130
-0.128
uing-0.124
 Others-0.124
 Days-0.120
Token 3
Token ID513
Feature activation-9.29e-02

Pos logit contributions
 fright+0.107
 supposed+0.106
 scree+0.105
hy+0.103
leaf+0.100
Neg logit contributions
-0.129
-0.127
 Pole-0.109
uing-0.107
 Others-0.102
Token-
Token ID12
Feature activation-1.84e-01

Pos logit contributions
 scree+0.132
hy+0.127
 supposed+0.124
 fright+0.121
 fingert+0.121
Neg logit contributions
-0.137
-0.134
uing-0.131
 Others-0.123
ating-0.123
Tokenyear
Token ID1941
Feature activation-1.10e-01

Pos logit contributions
 scree+0.226
 arm+0.217
 fright+0.210
 vel+0.199
 fingert+0.198
Neg logit contributions
uing-0.320
orting-0.302
-0.296
 Others-0.292
-0.292
Token-
Token ID12
Feature activation-1.76e-01

Pos logit contributions
 scree+0.170
 fright+0.169
 supposed+0.159
 written+0.159
 speech+0.157
Neg logit contributions
uing-0.147
-0.139
-0.138
orting-0.137
 Pole-0.132
Tokenold
Token ID727
Feature activation+1.09e-03

Pos logit contributions
 scree+0.281
 arm+0.275
 fright+0.271
 fingert+0.254
 vel+0.253
Neg logit contributions
uing-0.243
orting-0.235
-0.221
 Others-0.219
-0.214
Token who
Token ID508
Feature activation-9.92e-03

Pos logit contributions
+0.002
+0.001
uing+0.001
orting+0.001
 Others+0.001
Neg logit contributions
 scree-0.002
 arm-0.002
 fright-0.002
hy-0.002
 supposed-0.002
Token loved
Token ID6151
Feature activation-7.83e-03

Pos logit contributions
 arm+0.013
 scree+0.013
 fright+0.013
hy+0.012
 speech+0.012
Neg logit contributions
-0.017
uing-0.016
-0.016
orting-0.015
 Others-0.015
TokenS
Token ID50
Feature activation+5.29e-03

Pos logit contributions
uing+0.053
+0.047
orting+0.043
 Majesty+0.041
+0.041
Neg logit contributions
 arm-0.080
 fright-0.077
 scree-0.076
 speech-0.076
 very-0.076
Tokenally
Token ID453
Feature activation+2.32e-03

Pos logit contributions
+0.007
uing+0.007
+0.007
orting+0.007
 Others+0.006
Neg logit contributions
 scree-0.009
 arm-0.009
 fright-0.008
hy-0.008
 supposed-0.008
Token was
Token ID373
Feature activation-9.27e-02

Pos logit contributions
uing+0.004
+0.004
orting+0.004
+0.004
 Others+0.004
Neg logit contributions
 arm-0.003
 scree-0.003
 fright-0.003
hy-0.003
 supposed-0.003
Token a
Token ID257
Feature activation-8.06e-02

Pos logit contributions
 scree+0.126
 fright+0.122
hy+0.116
 poem+0.108
 tal+0.106
Neg logit contributions
-0.139
-0.136
 Others-0.128
 nicely-0.127
 Days-0.127
Token 3
Token ID513
Feature activation-9.41e-02

Pos logit contributions
 fright+0.118
 supposed+0.117
 scree+0.116
hy+0.114
 written+0.111
Neg logit contributions
-0.129
-0.126
 Pole-0.105
uing-0.102
 Others-0.098
Token-
Token ID12
Feature activation-1.84e-01

Pos logit contributions
 scree+0.142
hy+0.138
 supposed+0.133
 fright+0.132
 vel+0.131
Neg logit contributions
-0.133
-0.127
uing-0.123
ating-0.117
ling-0.116
Tokenyear
Token ID1941
Feature activation-1.11e-01

Pos logit contributions
 scree+0.266
 arm+0.245
 fright+0.244
 vel+0.238
 tal+0.236
Neg logit contributions
uing-0.284
orting-0.261
ins-0.261
or-0.259
 Others-0.257
Token-
Token ID12
Feature activation-1.74e-01

Pos logit contributions
 scree+0.178
 fright+0.176
 written+0.163
 supposed+0.162
 speech+0.161
Neg logit contributions
uing-0.143
-0.140
-0.138
orting-0.131
 Pole-0.128
Tokenold
Token ID727
Feature activation-8.75e-04

Pos logit contributions
 scree+0.332
 fright+0.318
 arm+0.313
 vel+0.300
 fingert+0.295
Neg logit contributions
uing-0.196
orting-0.187
or-0.177
ins-0.176
 Others-0.174
Token girl
Token ID2576
Feature activation-3.20e-02

Pos logit contributions
 scree+0.001
 fright+0.001
 arm+0.001
hy+0.001
 supposed+0.001
Neg logit contributions
-0.001
-0.001
uing-0.001
orting-0.001
 Others-0.001
Token who
Token ID508
Feature activation-8.04e-03

Pos logit contributions
 scree+0.051
 fright+0.051
 arm+0.048
hy+0.048
 supposed+0.047
Neg logit contributions
-0.046
-0.044
uing-0.041
orting-0.039
 Others-0.038
Tokenmy
Token ID1820
Feature activation-2.12e-02

Pos logit contributions
 arm+0.004
 scree+0.004
 fright+0.004
hy+0.004
 supposed+0.004
Neg logit contributions
-0.008
uing-0.008
orting-0.008
-0.008
 Others-0.008
Token saw
Token ID2497
Feature activation+5.55e-02

Pos logit contributions
 arm+0.026
 scree+0.026
 fright+0.024
hy+0.024
 very+0.023
Neg logit contributions
-0.038
uing-0.038
orting-0.037
-0.034
 Others-0.034
Token a
Token ID257
Feature activation-7.32e-02

Pos logit contributions
+0.106
uing+0.105
+0.105
orting+0.105
 Others+0.105
Neg logit contributions
 scree-0.050
 fright-0.050
 arm-0.049
hy-0.048
 supposed-0.046
Token big
Token ID1263
Feature activation-7.61e-03

Pos logit contributions
 scree+0.087
 arm+0.087
 fright+0.086
 supposed+0.085
hy+0.084
Neg logit contributions
-0.125
 Pole-0.120
-0.120
uing-0.115
 Others-0.109
Token T
Token ID309
Feature activation-7.36e-02

Pos logit contributions
 supposed+0.011
 scree+0.011
 fright+0.011
 fingert+0.011
 arm+0.011
Neg logit contributions
-0.010
-0.010
 Pole-0.010
uing-0.010
orting-0.010
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 arm+0.104
 scree+0.102
 fright+0.100
hy+0.098
 supposed+0.094
Neg logit contributions
-0.118
uing-0.113
-0.110
orting-0.106
 Others-0.103
TokenRex
Token ID47389
Feature activation-9.27e-02

Pos logit contributions
 scree+0.277
 supposed+0.269
 fright+0.269
 arm+0.262
 speech+0.258
Neg logit contributions
-0.240
 Pole-0.236
orting-0.234
-0.233
uing-0.229
Token and
Token ID290
Feature activation-9.35e-02

Pos logit contributions
 arm+0.127
 scree+0.126
hy+0.120
 fright+0.120
 er+0.113
Neg logit contributions
-0.154
uing-0.148
-0.141
orting-0.141
 Others-0.135
Token said
Token ID531
Feature activation-1.53e-04

Pos logit contributions
 arm+0.154
 fright+0.153
 supposed+0.151
hy+0.149
 speech+0.148
Neg logit contributions
-0.122
-0.109
 Others-0.096
 Pole-0.095
uing-0.093
Token,
Token ID11
Feature activation+6.62e-02

Pos logit contributions
 very+0.000
hy+0.000
 arm+0.000
 fright+0.000
 scree+0.000
Neg logit contributions
-0.000
uing-0.000
orting-0.000
ating-0.000
places-0.000
Token "
Token ID366
Feature activation+4.41e-02

Pos logit contributions
uing+0.086
+0.085
orting+0.080
ating+0.077
 Others+0.071
Neg logit contributions
 scree-0.116
 arm-0.113
 very-0.110
 fright-0.108
hy-0.108
Token<|endoftext|>
Token ID50256
Feature activation+4.04e-02

Pos logit contributions
+0.037
 speech+0.037
 arm+0.036
 fright+0.035
 very+0.035
Neg logit contributions
uing-0.052
-0.048
ating-0.047
orting-0.047
 Pole-0.042
TokenJack
Token ID14295
Feature activation+3.38e-02

Pos logit contributions
uing+0.060
+0.057
orting+0.051
+0.050
 Majesty+0.049
Neg logit contributions
 arm-0.064
 fright-0.063
 scree-0.061
 speech-0.060
hy-0.060
Token was
Token ID373
Feature activation-9.37e-02

Pos logit contributions
uing+0.060
+0.060
orting+0.057
+0.056
 Others+0.055
Neg logit contributions
 arm-0.042
 scree-0.039
 fright-0.038
hy-0.037
 er-0.036
Token a
Token ID257
Feature activation-7.65e-02

Pos logit contributions
 fright+0.128
hy+0.126
 scree+0.126
leaf+0.114
 poem+0.114
Neg logit contributions
-0.136
-0.136
uing-0.135
orting-0.130
 Others-0.130
Token 3
Token ID513
Feature activation-9.11e-02

Pos logit contributions
 fright+0.098
 scree+0.098
 supposed+0.096
hy+0.096
 arm+0.094
Neg logit contributions
-0.130
-0.129
uing-0.116
 Pole-0.112
 Others-0.108
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.128
hy+0.123
 supposed+0.122
 written+0.119
 vel+0.119
Neg logit contributions
-0.132
-0.131
uing-0.127
ating-0.120
 Others-0.120
Tokenyear
Token ID1941
Feature activation-1.08e-01

Pos logit contributions
 scree+0.222
 arm+0.210
 fright+0.203
 vel+0.196
 fingert+0.193
Neg logit contributions
uing-0.321
orting-0.304
-0.296
 Others-0.294
-0.291
Token-
Token ID12
Feature activation-1.71e-01

Pos logit contributions
 scree+0.162
 fright+0.159
 written+0.150
 supposed+0.149
 speech+0.148
Neg logit contributions
uing-0.151
orting-0.143
-0.141
-0.140
ating-0.134
Tokenold
Token ID727
Feature activation+3.32e-04

Pos logit contributions
 scree+0.278
 arm+0.270
 fright+0.265
 vel+0.249
 fingert+0.249
Neg logit contributions
uing-0.234
orting-0.227
-0.215
 Others-0.210
ating-0.205
Token boy
Token ID2933
Feature activation-4.74e-02

Pos logit contributions
+0.000
+0.000
uing+0.000
orting+0.000
 Others+0.000
Neg logit contributions
 scree-0.001
 arm-0.001
 fright-0.001
hy-0.001
 supposed-0.000
Token and
Token ID290
Feature activation-8.72e-02

Pos logit contributions
 scree+0.065
 arm+0.065
 fright+0.064
hy+0.062
 speech+0.060
Neg logit contributions
-0.076
uing-0.072
-0.071
orting-0.069
 Others-0.066
Token<|endoftext|>
Token ID50256
Feature activation+4.58e-02

Pos logit contributions
 arm+0.039
 fright+0.039
 speech+0.038
 scree+0.038
+0.038
Neg logit contributions
uing-0.029
-0.028
-0.025
orting-0.025
ating-0.024
TokenTom
Token ID13787
Feature activation+4.91e-04

Pos logit contributions
uing+0.069
+0.059
 Majesty+0.058
orting+0.056
rier+0.052
Neg logit contributions
 arm-0.073
-0.072
 very-0.071
 fright-0.070
 a-0.070
Token was
Token ID373
Feature activation-9.19e-02

Pos logit contributions
+0.001
uing+0.001
orting+0.001
 Others+0.001
+0.001
Neg logit contributions
 arm-0.000
 supposed-0.000
 very-0.000
 er-0.000
 scree-0.000
Token a
Token ID257
Feature activation-7.90e-02

Pos logit contributions
 scree+0.122
 fright+0.121
hy+0.117
 poem+0.112
 fingert+0.108
Neg logit contributions
-0.135
-0.130
uing-0.128
 Others-0.127
 Days-0.124
Token 3
Token ID513
Feature activation-9.34e-02

Pos logit contributions
 fright+0.111
 supposed+0.109
 scree+0.108
hy+0.106
 arm+0.104
Neg logit contributions
-0.128
-0.126
 Pole-0.109
uing-0.108
 Others-0.102
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.133
hy+0.129
 supposed+0.125
 fright+0.124
 written+0.123
Neg logit contributions
-0.139
-0.135
uing-0.131
 Others-0.124
ating-0.123
Tokenyear
Token ID1941
Feature activation-1.10e-01

Pos logit contributions
 scree+0.226
 arm+0.218
 fright+0.212
 fingert+0.201
 vel+0.200
Neg logit contributions
uing-0.319
orting-0.300
 Others-0.291
-0.286
-0.285
Token-
Token ID12
Feature activation-1.75e-01

Pos logit contributions
 fright+0.168
 scree+0.168
 written+0.159
 supposed+0.157
 speech+0.156
Neg logit contributions
uing-0.149
-0.144
-0.142
orting-0.140
 Pole-0.137
Tokenold
Token ID727
Feature activation-1.57e-04

Pos logit contributions
 scree+0.282
 arm+0.276
 fright+0.273
 fingert+0.257
 vel+0.255
Neg logit contributions
uing-0.240
orting-0.231
 Others-0.217
-0.215
-0.209
Token who
Token ID508
Feature activation-9.41e-03

Pos logit contributions
 scree+0.000
 arm+0.000
 fright+0.000
hy+0.000
 supposed+0.000
Neg logit contributions
-0.000
-0.000
uing-0.000
orting-0.000
 Others-0.000
Token liked
Token ID8288
Feature activation-9.11e-03

Pos logit contributions
 arm+0.012
 scree+0.012
 fright+0.012
hy+0.012
 speech+0.011
Neg logit contributions
-0.016
uing-0.015
-0.015
orting-0.014
 Others-0.014
TokenJoe
Token ID19585
Feature activation+2.34e-02

Pos logit contributions
uing+0.053
+0.052
+0.047
orting+0.047
 Others+0.043
Neg logit contributions
 arm-0.069
 fright-0.067
 scree-0.067
hy-0.065
 speech-0.064
Tokeny
Token ID88
Feature activation+5.05e-02

Pos logit contributions
+0.041
uing+0.041
orting+0.039
+0.038
 Others+0.037
Neg logit contributions
 arm-0.030
 scree-0.028
hy-0.027
 fright-0.027
 lip-0.026
Token was
Token ID373
Feature activation-9.22e-02

Pos logit contributions
+0.087
uing+0.087
orting+0.082
+0.081
 Others+0.078
Neg logit contributions
 arm-0.064
 scree-0.062
 fright-0.060
hy-0.059
 speech-0.057
Token a
Token ID257
Feature activation-7.72e-02

Pos logit contributions
 scree+0.117
 fright+0.116
hy+0.112
 poem+0.109
 fingert+0.107
Neg logit contributions
-0.141
-0.139
 Others-0.137
uing-0.136
 Days-0.132
Token 3
Token ID513
Feature activation-9.28e-02

Pos logit contributions
 fright+0.104
 supposed+0.104
 scree+0.104
hy+0.101
 written+0.101
Neg logit contributions
-0.127
-0.125
uing-0.108
 Pole-0.107
 Others-0.103
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.128
hy+0.120
 supposed+0.120
 written+0.119
 fright+0.118
Neg logit contributions
-0.140
-0.139
uing-0.134
 Others-0.129
orting-0.127
Tokenyear
Token ID1941
Feature activation-1.07e-01

Pos logit contributions
 scree+0.212
 arm+0.203
 fright+0.196
 vel+0.186
 fingert+0.185
Neg logit contributions
uing-0.330
orting-0.318
-0.314
 Others-0.310
-0.308
Token-
Token ID12
Feature activation-1.73e-01

Pos logit contributions
 scree+0.157
 fright+0.155
 written+0.143
 speech+0.143
 supposed+0.141
Neg logit contributions
uing-0.155
-0.150
orting-0.148
-0.147
 Pole-0.139
Tokenold
Token ID727
Feature activation+7.53e-04

Pos logit contributions
 scree+0.270
 arm+0.262
 fright+0.259
 fingert+0.242
 vel+0.241
Neg logit contributions
uing-0.247
orting-0.240
-0.234
 Others-0.227
-0.224
Token boy
Token ID2933
Feature activation-4.96e-02

Pos logit contributions
+0.001
+0.001
uing+0.001
orting+0.001
 Others+0.001
Neg logit contributions
 scree-0.001
 fright-0.001
 arm-0.001
hy-0.001
 supposed-0.001
Token who
Token ID508
Feature activation-7.37e-03

Pos logit contributions
 scree+0.069
 arm+0.068
 fright+0.068
hy+0.065
 speech+0.063
Neg logit contributions
-0.079
uing-0.074
-0.074
orting-0.072
 Others-0.069
Token,
Token ID11
Feature activation+6.67e-02

Pos logit contributions
 arm+0.011
 fright+0.010
 very+0.010
hy+0.010
 scree+0.010
Neg logit contributions
uing-0.011
-0.011
orting-0.010
-0.009
 Others-0.009
Token there
Token ID612
Feature activation+1.19e-01

Pos logit contributions
uing+0.106
+0.103
orting+0.102
 Pole+0.096
 Majesty+0.094
Neg logit contributions
 arm-0.098
 very-0.095
 fright-0.092
 scree-0.091
 a-0.089
Token was
Token ID373
Feature activation-8.33e-02

Pos logit contributions
+0.228
uing+0.227
orting+0.220
+0.213
 Pole+0.206
Neg logit contributions
 arm-0.133
 fright-0.124
 scree-0.123
 very-0.117
hy-0.116
Token a
Token ID257
Feature activation-7.51e-02

Pos logit contributions
 scree+0.118
 fright+0.111
hy+0.108
 speech+0.101
 arm+0.098
Neg logit contributions
 them-0.122
-0.121
 Others-0.118
 They-0.115
-0.114
Token 3
Token ID513
Feature activation-8.89e-02

Pos logit contributions
 fright+0.108
 scree+0.108
 arm+0.107
hy+0.106
 supposed+0.104
Neg logit contributions
-0.114
-0.111
uing-0.106
orting-0.100
 Pole-0.099
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.126
 supposed+0.124
 written+0.124
hy+0.124
 fright+0.121
Neg logit contributions
-0.128
-0.120
uing-0.117
 Others-0.113
ins-0.112
Tokenyear
Token ID1941
Feature activation-1.06e-01

Pos logit contributions
 scree+0.217
 arm+0.208
 fright+0.205
 vel+0.190
 fingert+0.189
Neg logit contributions
uing-0.317
 Others-0.304
orting-0.304
-0.294
-0.290
Token-
Token ID12
Feature activation-1.74e-01

Pos logit contributions
 fright+0.165
 scree+0.162
 written+0.156
 speech+0.150
+0.150
Neg logit contributions
uing-0.135
-0.134
 Pole-0.129
 Majesty-0.129
orting-0.128
Tokenold
Token ID727
Feature activation+3.27e-03

Pos logit contributions
 scree+0.266
 arm+0.261
 fright+0.258
 fingert+0.239
 mid+0.237
Neg logit contributions
uing-0.246
orting-0.240
 Others-0.233
-0.231
-0.224
Token girl
Token ID2576
Feature activation-3.10e-02

Pos logit contributions
+0.004
+0.004
uing+0.004
orting+0.003
 Others+0.003
Neg logit contributions
 scree-0.006
 fright-0.006
 supposed-0.005
 arm-0.005
hy-0.005
Token.
Token ID13
Feature activation-3.27e-02

Pos logit contributions
 scree+0.042
 fright+0.041
 speech+0.039
leaf+0.039
 vel+0.038
Neg logit contributions
-0.051
-0.050
orting-0.043
uing-0.043
ry-0.043
Token,
Token ID11
Feature activation+5.36e-02

Pos logit contributions
 arm+0.012
 scree+0.012
 fright+0.012
hy+0.012
 supposed+0.011
Neg logit contributions
-0.013
uing-0.013
-0.012
orting-0.012
 Others-0.011
Token holding
Token ID4769
Feature activation+1.65e-02

Pos logit contributions
+0.091
+0.090
uing+0.086
 Others+0.080
orting+0.079
Neg logit contributions
hy-0.063
 arm-0.063
 supposed-0.062
 fright-0.062
 scree-0.060
Token his
Token ID465
Feature activation-3.92e-03

Pos logit contributions
+0.027
uing+0.027
+0.026
 Others+0.025
orting+0.024
Neg logit contributions
 scree-0.022
hy-0.021
 supposed-0.021
 fright-0.020
 arm-0.019
Token green
Token ID4077
Feature activation-3.61e-03

Pos logit contributions
 supposed+0.006
leaf+0.006
 written+0.006
 scree+0.006
hy+0.006
Neg logit contributions
-0.005
 Pole-0.005
-0.005
uing-0.005
 Majesty-0.005
Token T
Token ID309
Feature activation-8.65e-02

Pos logit contributions
 supposed+0.005
 scree+0.005
 fright+0.005
leaf+0.005
hy+0.005
Neg logit contributions
-0.006
-0.006
uing-0.005
 Pole-0.005
 Others-0.005
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.132
 arm+0.130
 fright+0.129
hy+0.125
 speech+0.121
Neg logit contributions
-0.125
uing-0.123
-0.117
orting-0.116
 Others-0.110
Tokenrex
Token ID21510
Feature activation-1.00e-01

Pos logit contributions
 scree+0.263
 supposed+0.253
 cry+0.241
 speech+0.238
 fright+0.237
Neg logit contributions
-0.271
-0.261
uing-0.255
orting-0.245
ar-0.242
Token.
Token ID13
Feature activation-3.21e-02

Pos logit contributions
leaf+0.181
 supposed+0.177
Ta+0.176
oof+0.175
 sch+0.173
Neg logit contributions
uing-0.094
orting-0.087
-0.085
-0.084
der-0.081
Token "
Token ID366
Feature activation+4.26e-02

Pos logit contributions
 arm+0.045
 speech+0.044
 scree+0.043
 very+0.043
 fright+0.042
Neg logit contributions
uing-0.054
-0.053
orting-0.050
ating-0.049
-0.045
TokenHe
Token ID1544
Feature activation+6.04e-02

Pos logit contributions
+0.049
+0.046
uing+0.039
 Others+0.038
orting+0.037
Neg logit contributions
 arm-0.081
 supposed-0.079
 fright-0.077
hy-0.077
 scree-0.077
Token can
Token ID460
Feature activation-4.11e-02

Pos logit contributions
+0.096
uing+0.094
orting+0.089
+0.089
 Others+0.084
Neg logit contributions
 scree-0.086
 arm-0.085
 fright-0.082
hy-0.081
 er-0.078
Token<|endoftext|>
Token ID50256
Feature activation+4.50e-02

Pos logit contributions
 arm+0.033
hy+0.032
 scree+0.032
 fright+0.032
 speech+0.031
Neg logit contributions
uing-0.034
-0.032
orting-0.030
-0.029
 Pole-0.029
TokenGeorge
Token ID20191
Feature activation+3.19e-02

Pos logit contributions
uing+0.063
+0.057
orting+0.053
 Majesty+0.051
+0.051
Neg logit contributions
 arm-0.076
 fright-0.073
 very-0.071
 scree-0.071
hy-0.071
Token was
Token ID373
Feature activation-8.98e-02

Pos logit contributions
uing+0.059
+0.058
orting+0.056
+0.054
 Others+0.053
Neg logit contributions
 arm-0.038
 scree-0.036
hy-0.034
 fright-0.034
 er-0.033
Token a
Token ID257
Feature activation-7.72e-02

Pos logit contributions
 fright+0.116
 scree+0.116
hy+0.111
 fingert+0.105
leaf+0.105
Neg logit contributions
-0.137
-0.137
uing-0.132
 Others-0.128
orting-0.124
Token 3
Token ID513
Feature activation-9.08e-02

Pos logit contributions
 supposed+0.104
 fright+0.104
 scree+0.104
hy+0.100
 written+0.100
Neg logit contributions
-0.127
-0.125
 Pole-0.107
uing-0.106
 Majesty-0.102
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.126
hy+0.120
 supposed+0.117
 fright+0.116
 written+0.115
Neg logit contributions
-0.136
-0.135
uing-0.131
 Others-0.124
ating-0.123
Tokenyear
Token ID1941
Feature activation-1.09e-01

Pos logit contributions
 scree+0.229
 arm+0.218
 fright+0.209
 fingert+0.200
 vel+0.200
Neg logit contributions
uing-0.317
orting-0.297
-0.289
 Others-0.288
-0.284
Token-
Token ID12
Feature activation-1.74e-01

Pos logit contributions
 scree+0.166
 fright+0.164
 supposed+0.153
 speech+0.152
 written+0.152
Neg logit contributions
uing-0.150
-0.146
-0.145
orting-0.142
 Pole-0.135
Tokenold
Token ID727
Feature activation+7.08e-04

Pos logit contributions
 scree+0.291
 arm+0.283
 fright+0.278
 fingert+0.262
 vel+0.259
Neg logit contributions
uing-0.232
orting-0.225
-0.210
 Others-0.209
-0.202
Token boy
Token ID2933
Feature activation-4.95e-02

Pos logit contributions
+0.001
+0.001
uing+0.001
orting+0.001
 Others+0.001
Neg logit contributions
 scree-0.001
 fright-0.001
 arm-0.001
hy-0.001
 supposed-0.001
Token who
Token ID508
Feature activation-6.73e-03

Pos logit contributions
 scree+0.071
 arm+0.070
 fright+0.070
hy+0.067
 speech+0.066
Neg logit contributions
-0.076
uing-0.072
-0.071
orting-0.069
 Others-0.066
Token<|endoftext|>
Token ID50256
Feature activation+4.38e-02

Pos logit contributions
 arm+0.023
 speech+0.022
 fright+0.022
+0.022
 very+0.021
Neg logit contributions
uing-0.025
-0.024
orting-0.023
ating-0.022
 Pole-0.022
TokenBen
Token ID11696
Feature activation-4.55e-04

Pos logit contributions
uing+0.069
+0.063
orting+0.058
 Majesty+0.057
+0.055
Neg logit contributions
 arm-0.067
 fright-0.065
 very-0.063
 speech-0.063
hy-0.063
Token was
Token ID373
Feature activation-9.13e-02

Pos logit contributions
 arm+0.001
 scree+0.001
hy+0.000
 fright+0.000
 very+0.000
Neg logit contributions
uing-0.001
-0.001
orting-0.001
-0.001
 Others-0.001
Token a
Token ID257
Feature activation-7.89e-02

Pos logit contributions
 scree+0.128
 fright+0.123
hy+0.119
 fingert+0.114
leaf+0.113
Neg logit contributions
-0.129
-0.127
uing-0.126
 Others-0.123
 Days-0.119
Token 3
Token ID513
Feature activation-9.21e-02

Pos logit contributions
 supposed+0.114
 fright+0.114
 scree+0.112
hy+0.111
leaf+0.109
Neg logit contributions
-0.124
-0.121
 Pole-0.104
uing-0.101
 Others-0.096
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.133
hy+0.129
 supposed+0.126
 vel+0.125
 fingert+0.124
Neg logit contributions
-0.133
-0.129
uing-0.125
 Others-0.118
ating-0.118
Tokenyear
Token ID1941
Feature activation-1.08e-01

Pos logit contributions
 scree+0.248
 arm+0.229
 fright+0.227
 vel+0.222
 fingert+0.218
Neg logit contributions
uing-0.299
orting-0.282
or-0.270
 Others-0.270
-0.266
Token-
Token ID12
Feature activation-1.74e-01

Pos logit contributions
 scree+0.173
 fright+0.172
 written+0.164
 supposed+0.162
 speech+0.159
Neg logit contributions
uing-0.136
-0.130
-0.128
orting-0.128
 Pole-0.126
Tokenold
Token ID727
Feature activation+2.04e-03

Pos logit contributions
 scree+0.317
 fright+0.302
 arm+0.301
 vel+0.288
 fingert+0.287
Neg logit contributions
uing-0.207
orting-0.204
or-0.184
 Others-0.184
-0.179
Token boy
Token ID2933
Feature activation-4.85e-02

Pos logit contributions
+0.003
+0.003
uing+0.003
orting+0.002
 Others+0.002
Neg logit contributions
 scree-0.003
 fright-0.003
 arm-0.003
hy-0.003
 supposed-0.003
Token who
Token ID508
Feature activation-5.78e-03

Pos logit contributions
 scree+0.072
 fright+0.072
 arm+0.070
hy+0.068
 speech+0.067
Neg logit contributions
-0.072
-0.068
uing-0.067
orting-0.066
 Others-0.063
Token,
Token ID11
Feature activation+7.12e-02

Pos logit contributions
 very+0.003
 arm+0.003
+0.003
 fright+0.003
hy+0.003
Neg logit contributions
uing-0.004
-0.003
orting-0.003
ating-0.003
 Others-0.003
Token there
Token ID612
Feature activation+1.23e-01

Pos logit contributions
uing+0.132
+0.129
orting+0.129
 Pole+0.121
 Majesty+0.121
Neg logit contributions
 very-0.096
 a-0.090
 arm-0.090
-0.079
 stuck-0.078
Token lived
Token ID5615
Feature activation+4.65e-02

Pos logit contributions
+0.219
uing+0.217
orting+0.207
+0.205
 Others+0.195
Neg logit contributions
 arm-0.151
 scree-0.144
 fright-0.143
hy-0.137
 speech-0.132
Token a
Token ID257
Feature activation-7.57e-02

Pos logit contributions
+0.119
uing+0.116
+0.114
orting+0.113
 Others+0.110
Neg logit contributions
 scree-0.020
 arm-0.020
 fright-0.018
hy-0.017
 supposed-0.014
Token 3
Token ID513
Feature activation-8.93e-02

Pos logit contributions
 arm+0.101
 scree+0.100
 fright+0.098
hy+0.096
 speech+0.092
Neg logit contributions
-0.125
uing-0.120
-0.117
orting-0.115
 Others-0.110
Token-
Token ID12
Feature activation-1.83e-01

Pos logit contributions
 scree+0.122
 supposed+0.121
 written+0.118
 fright+0.118
hy+0.117
Neg logit contributions
-0.130
-0.124
uing-0.122
 Others-0.118
orting-0.117
Tokenyear
Token ID1941
Feature activation-1.05e-01

Pos logit contributions
 scree+0.207
 fright+0.199
 arm+0.198
 vel+0.178
 speech+0.177
Neg logit contributions
uing-0.323
orting-0.311
 Others-0.306
-0.304
-0.300
Token-
Token ID12
Feature activation-1.74e-01

Pos logit contributions
 fright+0.160
 scree+0.157
 written+0.150
 speech+0.147
 supposed+0.147
Neg logit contributions
uing-0.137
orting-0.132
 Pole-0.131
-0.130
-0.128
Tokenold
Token ID727
Feature activation+2.59e-03

Pos logit contributions
 scree+0.252
 arm+0.246
 fright+0.246
 speech+0.224
 fingert+0.223
Neg logit contributions
uing-0.257
orting-0.250
-0.246
 Others-0.240
-0.238
Token girl
Token ID2576
Feature activation-3.15e-02

Pos logit contributions
+0.003
+0.003
uing+0.003
orting+0.003
 Pole+0.003
Neg logit contributions
 scree-0.004
 fright-0.004
 arm-0.004
 supposed-0.004
 speech-0.004
Token named
Token ID3706
Feature activation-6.00e-02

Pos logit contributions
 fright+0.039
 scree+0.038
 speech+0.036
leaf+0.035
hy+0.035
Neg logit contributions
-0.055
-0.054
orting-0.048
uing-0.048
 Others-0.047
Token Tom
Token ID4186
Feature activation-4.20e-03

Pos logit contributions
 arm+0.044
 scree+0.044
hy+0.043
 fright+0.043
 speech+0.041
Neg logit contributions
-0.049
uing-0.049
-0.045
orting-0.045
 Others-0.042
Token was
Token ID373
Feature activation-8.92e-02

Pos logit contributions
 scree+0.004
 arm+0.004
 very+0.004
 fright+0.004
hy+0.004
Neg logit contributions
uing-0.009
-0.009
orting-0.008
 Others-0.008
 Pole-0.008
Token wearing
Token ID5762
Feature activation+7.15e-03

Pos logit contributions
 scree+0.095
 fright+0.094
 arm+0.091
leaf+0.090
 fingert+0.088
Neg logit contributions
uing-0.149
-0.148
-0.147
 Others-0.144
orting-0.140
Token a
Token ID257
Feature activation-8.51e-02

Pos logit contributions
 them+0.012
+0.012
orting+0.012
 Others+0.012
+0.011
Neg logit contributions
 scree-0.008
 supposed-0.008
 fright-0.007
leaf-0.007
 fingert-0.007
Token t
Token ID256
Feature activation-1.00e-01

Pos logit contributions
 supposed+0.112
 scree+0.108
 fright+0.104
leaf+0.099
hy+0.098
Neg logit contributions
-0.136
-0.129
uing-0.124
 Pole-0.118
orting-0.117
Token-
Token ID12
Feature activation-1.82e-01

Pos logit contributions
 scree+0.195
 fright+0.186
 supposed+0.181
 rhyth+0.174
+0.173
Neg logit contributions
uing-0.107
ry-0.096
-0.094
orting-0.091
 Pole-0.090
Tokenshirt
Token ID15600
Feature activation-7.88e-02

Pos logit contributions
 scree+0.243
 arm+0.223
 cry+0.214
 supposed+0.214
 fright+0.212
Neg logit contributions
-0.278
orting-0.270
-0.259
ar-0.257
uing-0.257
Token and
Token ID290
Feature activation-9.15e-02

Pos logit contributions
 fright+0.143
 scree+0.141
leaf+0.140
 supposed+0.139
 speech+0.129
Neg logit contributions
-0.086
-0.086
orting-0.084
uing-0.080
 Pole-0.075
Token shorts
Token ID22078
Feature activation-8.30e-02

Pos logit contributions
 fright+0.101
 supposed+0.094
hy+0.092
 scree+0.091
leaf+0.090
Neg logit contributions
-0.159
-0.158
 Others-0.145
 Pole-0.142
 Majesty-0.140
Token,
Token ID11
Feature activation+5.49e-02

Pos logit contributions
 scree+0.117
 fright+0.117
 arm+0.115
hy+0.109
 speech+0.108
Neg logit contributions
-0.127
uing-0.125
-0.122
orting-0.118
 Others-0.111
Token and
Token ID290
Feature activation-7.52e-02

Pos logit contributions
+0.068
+0.068
orting+0.063
uing+0.062
 Pole+0.062
Neg logit contributions
 fright-0.086
 supposed-0.084
hy-0.082
 arm-0.082
 scree-0.081
TokenTom
Token ID13787
Feature activation-2.14e-03

Pos logit contributions
uing+0.051
+0.048
orting+0.043
+0.043
 Majesty+0.040
Neg logit contributions
 arm-0.072
 fright-0.070
 scree-0.069
hy-0.068
 speech-0.068
Tokenmy
Token ID1820
Feature activation-9.58e-03

Pos logit contributions
 arm+0.002
 scree+0.002
 fright+0.002
hy+0.002
 lip+0.002
Neg logit contributions
-0.004
uing-0.004
orting-0.004
-0.004
 Others-0.004
Token was
Token ID373
Feature activation-9.35e-02

Pos logit contributions
 arm+0.012
 scree+0.011
 fright+0.011
hy+0.010
 er+0.010
Neg logit contributions
uing-0.017
-0.017
orting-0.016
-0.016
 Others-0.016
Token a
Token ID257
Feature activation-7.55e-02

Pos logit contributions
 fright+0.124
 scree+0.124
hy+0.123
 poem+0.114
 fingert+0.111
Neg logit contributions
-0.132
-0.132
 Others-0.131
uing-0.131
orting-0.129
Token 3
Token ID513
Feature activation-9.19e-02

Pos logit contributions
 fright+0.101
 scree+0.100
 supposed+0.099
hy+0.097
 written+0.096
Neg logit contributions
-0.125
-0.125
uing-0.110
 Pole-0.107
 Others-0.104
Token-
Token ID12
Feature activation-1.82e-01

Pos logit contributions
 scree+0.121
hy+0.115
 supposed+0.113
 fright+0.113
 written+0.112
Neg logit contributions
-0.144
-0.143
uing-0.141
 Others-0.133
orting-0.132
Tokenyear
Token ID1941
Feature activation-1.06e-01

Pos logit contributions
 scree+0.209
 fright+0.196
 arm+0.196
 fingert+0.182
 vel+0.180
Neg logit contributions
uing-0.334
-0.319
orting-0.315
-0.310
 Others-0.310
Token-
Token ID12
Feature activation-1.70e-01

Pos logit contributions
 scree+0.152
 fright+0.149
 arm+0.138
 speech+0.138
 supposed+0.136
Neg logit contributions
uing-0.160
-0.156
-0.152
orting-0.151
ating-0.143
Tokenold
Token ID727
Feature activation+1.57e-03

Pos logit contributions
 scree+0.260
 arm+0.251
 fright+0.250
 fingert+0.232
 vel+0.229
Neg logit contributions
uing-0.249
-0.239
orting-0.239
 Others-0.227
-0.226
Token boy
Token ID2933
Feature activation-4.82e-02

Pos logit contributions
+0.002
uing+0.002
+0.002
orting+0.002
 Others+0.002
Neg logit contributions
 scree-0.002
 arm-0.002
 fright-0.002
hy-0.002
 supposed-0.002
Token and
Token ID290
Feature activation-8.97e-02

Pos logit contributions
 scree+0.064
 arm+0.064
 fright+0.063
hy+0.061
 speech+0.059
Neg logit contributions
-0.079
uing-0.076
-0.074
orting-0.072
 Others-0.070
Token that
Token ID326
Feature activation+8.49e-02

Pos logit contributions
+0.047
uing+0.045
+0.042
 Others+0.041
orting+0.041
Neg logit contributions
 arm-0.025
 scree-0.025
 written-0.025
 fright-0.025
 very-0.025
Token she
Token ID673
Feature activation+9.02e-02

Pos logit contributions
+0.099
+0.091
uing+0.089
orting+0.080
 Others+0.077
Neg logit contributions
 scree-0.159
hy-0.153
 fright-0.152
 arm-0.151
leaf-0.147
Token jumped
Token ID11687
Feature activation-7.21e-03

Pos logit contributions
+0.140
+0.139
uing+0.135
 Others+0.121
orting+0.120
Neg logit contributions
 arm-0.124
hy-0.121
 fright-0.119
 scree-0.117
 supposed-0.114
Token up
Token ID510
Feature activation+6.56e-02

Pos logit contributions
 arm+0.010
 scree+0.010
 fright+0.009
hy+0.009
 supposed+0.009
Neg logit contributions
-0.012
uing-0.011
-0.011
orting-0.011
 Others-0.010
Token and
Token ID290
Feature activation-5.22e-02

Pos logit contributions
+0.082
+0.079
uing+0.077
orting+0.074
cast+0.072
Neg logit contributions
 scree-0.106
hy-0.105
 supposed-0.105
 arm-0.103
oof-0.102
Token down
Token ID866
Feature activation+8.24e-02

Pos logit contributions
 arm+0.067
 scree+0.066
 fright+0.065
hy+0.064
 supposed+0.061
Neg logit contributions
-0.088
uing-0.085
-0.083
orting-0.081
 Others-0.078
Token and
Token ID290
Feature activation-5.89e-02

Pos logit contributions
+0.140
uing+0.137
orting+0.130
+0.130
 Pole+0.124
Neg logit contributions
 arm-0.109
 scree-0.108
 fright-0.106
hy-0.102
 speech-0.099
Token shouted
Token ID17293
Feature activation-2.59e-02

Pos logit contributions
 arm+0.073
 scree+0.072
 fright+0.071
hy+0.069
 speech+0.066
Neg logit contributions
-0.103
uing-0.100
-0.097
orting-0.095
 Others-0.092
Token,
Token ID11
Feature activation+7.16e-02

Pos logit contributions
 very+0.036
 arm+0.036
 speech+0.035
 fright+0.034
 scree+0.033
Neg logit contributions
-0.042
 Pole-0.039
orting-0.039
uing-0.038
-0.037
Token "
Token ID366
Feature activation+4.57e-02

Pos logit contributions
uing+0.097
orting+0.092
+0.092
ating+0.088
 Others+0.083
Neg logit contributions
 scree-0.123
 arm-0.118
 very-0.116
 fright-0.115
 speech-0.113
TokenY
Token ID56
Feature activation-6.79e-02

Pos logit contributions
+0.066
+0.060
uing+0.058
orting+0.056
 Others+0.053
Neg logit contributions
 arm-0.074
 scree-0.072
 fright-0.071
hy-0.070
 supposed-0.069
Token She
Token ID1375
Feature activation+7.19e-02

Pos logit contributions
 scree+0.062
 arm+0.062
 fright+0.061
hy+0.059
 supposed+0.058
Neg logit contributions
-0.038
uing-0.036
-0.035
orting-0.034
 Others-0.032
Token heard
Token ID2982
Feature activation-1.90e-02

Pos logit contributions
+0.098
+0.098
uing+0.090
 Others+0.085
 Pole+0.083
Neg logit contributions
hy-0.115
 fright-0.113
 arm-0.110
 scree-0.110
 supposed-0.109
Token music
Token ID2647
Feature activation-1.81e-02

Pos logit contributions
 scree+0.018
 arm+0.018
 fright+0.018
 supposed+0.017
hy+0.017
Neg logit contributions
uing-0.035
-0.035
 Others-0.035
-0.034
orting-0.034
Token coming
Token ID2406
Feature activation-9.43e-03

Pos logit contributions
 scree+0.023
 arm+0.023
 fright+0.022
hy+0.022
 speech+0.021
Neg logit contributions
-0.032
uing-0.030
-0.029
orting-0.028
 Others-0.028
Token from
Token ID422
Feature activation-1.06e-02

Pos logit contributions
 scree+0.013
hy+0.013
leaf+0.013
 er+0.013
 fingert+0.013
Neg logit contributions
uing-0.013
-0.013
-0.012
orting-0.012
 Pole-0.012
Token inside
Token ID2641
Feature activation+9.30e-02

Pos logit contributions
hy+0.016
 scree+0.016
 fingert+0.015
 speech+0.015
 supposed+0.015
Neg logit contributions
-0.015
-0.014
stairs-0.013
 Others-0.013
 Pole-0.013
Token.
Token ID13
Feature activation-3.14e-02

Pos logit contributions
+0.148
uing+0.144
+0.140
orting+0.137
 Others+0.133
Neg logit contributions
 scree-0.127
 arm-0.123
 fright-0.121
hy-0.121
 supposed-0.115
Token When
Token ID1649
Feature activation+5.53e-02

Pos logit contributions
 arm+0.055
 scree+0.053
hy+0.052
 fright+0.051
 speech+0.050
Neg logit contributions
uing-0.040
-0.040
orting-0.037
-0.037
ating-0.033
Token she
Token ID673
Feature activation+1.01e-01

Pos logit contributions
+0.066
+0.063
uing+0.059
orting+0.056
 Others+0.056
Neg logit contributions
 fright-0.097
 scree-0.097
 arm-0.097
 supposed-0.094
hy-0.094
Token opened
Token ID4721
Feature activation+4.63e-02

Pos logit contributions
+0.167
uing+0.158
+0.158
orting+0.150
 Others+0.145
Neg logit contributions
 arm-0.133
 scree-0.131
 fright-0.130
hy-0.130
 supposed-0.125
Token the
Token ID262
Feature activation+3.57e-02

Pos logit contributions
+0.092
uing+0.090
orting+0.086
+0.086
 Others+0.082
Neg logit contributions
 arm-0.049
 fright-0.047
 scree-0.046
hy-0.043
 speech-0.042
Token helpful
Token ID7613
Feature activation+3.17e-02

Pos logit contributions
 arm+0.094
 scree+0.093
 fright+0.093
hy+0.090
 speech+0.085
Neg logit contributions
-0.147
uing-0.143
-0.140
orting-0.137
 Others-0.129
Token.
Token ID13
Feature activation-1.85e-02

Pos logit contributions
+0.058
uing+0.056
+0.054
orting+0.052
 Pole+0.051
Neg logit contributions
 arm-0.039
 fright-0.037
 scree-0.037
hy-0.036
 supposed-0.034
Token
Token ID220
Feature activation+7.13e-03

Pos logit contributions
 very+0.020
 arm+0.019
+0.019
 a+0.018
 scree+0.017
Neg logit contributions
uing-0.037
ating-0.035
-0.033
immers-0.033
rier-0.031
Token
Token ID198
Feature activation+1.89e-02

Pos logit contributions
+0.012
uing+0.011
+0.010
 Pole+0.010
 Majesty+0.010
Neg logit contributions
 arm-0.009
 supposed-0.009
 very-0.009
 speech-0.009
 fright-0.009
Token
Token ID198
Feature activation+2.09e-02

Pos logit contributions
+0.031
uing+0.030
 Pole+0.027
 Majesty+0.027
orting+0.027
Neg logit contributions
 arm-0.025
 fright-0.025
 scree-0.025
 very-0.025
hy-0.024
TokenLater
Token ID18602
Feature activation+1.06e-01

Pos logit contributions
 Pole+0.033
uing+0.033
+0.033
 Majesty+0.031
orting+0.030
Neg logit contributions
 arm-0.027
 scree-0.027
 fright-0.026
-0.026
 very-0.026
Token that
Token ID326
Feature activation+9.72e-02

Pos logit contributions
+0.148
uing+0.145
+0.136
orting+0.133
 Others+0.128
Neg logit contributions
 arm-0.172
 scree-0.169
 fright-0.166
hy-0.162
 speech-0.158
Token day
Token ID1110
Feature activation+6.01e-02

Pos logit contributions
+0.182
+0.174
uing+0.172
orting+0.168
 Others+0.160
Neg logit contributions
 scree-0.112
 fright-0.110
hy-0.108
 arm-0.108
 speech-0.101
Token,
Token ID11
Feature activation+7.42e-02

Pos logit contributions
uing+0.106
 Majesty+0.100
+0.098
places+0.097
 Pole+0.096
Neg logit contributions
 arm-0.078
 very-0.078
 a-0.073
 er-0.068
 scree-0.067
Token Lily
Token ID20037
Feature activation+1.62e-02

Pos logit contributions
uing+0.128
+0.123
orting+0.120
 Pole+0.120
ating+0.112
Neg logit contributions
 very-0.100
 arm-0.098
 a-0.097
 er-0.088
 fright-0.088
Token's
Token ID338
Feature activation-3.13e-02

Pos logit contributions
+0.029
uing+0.028
+0.027
orting+0.027
 Others+0.026
Neg logit contributions
 arm-0.020
 scree-0.019
 fright-0.019
hy-0.018
 supposed-0.018
Token whale
Token ID22206
Feature activation-2.69e-02

Pos logit contributions
+0.049
+0.048
uing+0.045
 Pole+0.044
orting+0.044
Neg logit contributions
hy-0.056
 scree-0.054
 arm-0.053
 supposed-0.053
 fright-0.052
Token felt
Token ID2936
Feature activation-4.71e-02

Pos logit contributions
 arm+0.034
 scree+0.033
 fright+0.032
hy+0.031
 speech+0.030
Neg logit contributions
-0.047
uing-0.046
orting-0.044
-0.044
 Others-0.042
Token very
Token ID845
Feature activation-7.66e-02

Pos logit contributions
 arm+0.043
 scree+0.042
hy+0.040
 fright+0.038
oof+0.037
Neg logit contributions
-0.091
uing-0.090
orting-0.088
-0.087
 Others-0.086
Token sad
Token ID6507
Feature activation+4.93e-02

Pos logit contributions
 arm+0.072
 scree+0.071
 fright+0.069
hy+0.067
 supposed+0.063
Neg logit contributions
-0.156
uing-0.153
-0.149
orting-0.146
 Others-0.141
Token.
Token ID13
Feature activation-2.15e-02

Pos logit contributions
+0.112
uing+0.101
+0.100
 Others+0.098
 Pole+0.097
Neg logit contributions
 very-0.049
 arm-0.045
 a-0.041
 scree-0.039
 fright-0.038
Token He
Token ID679
Feature activation+7.97e-02

Pos logit contributions
 arm+0.035
 fright+0.034
 scree+0.034
 speech+0.034
hy+0.033
Neg logit contributions
uing-0.031
-0.030
-0.028
orting-0.027
ating-0.026
Token felt
Token ID2936
Feature activation-4.23e-02

Pos logit contributions
+0.121
+0.117
uing+0.116
orting+0.108
 Others+0.108
Neg logit contributions
 arm-0.107
hy-0.106
 scree-0.103
 fright-0.102
 supposed-0.102
Token like
Token ID588
Feature activation+1.37e-02

Pos logit contributions
 arm+0.040
 scree+0.039
hy+0.037
 fright+0.037
 supposed+0.035
Neg logit contributions
-0.083
uing-0.081
-0.079
orting-0.079
 Others-0.077
Token his
Token ID465
Feature activation+4.05e-03

Pos logit contributions
+0.020
uing+0.019
+0.018
orting+0.018
 Others+0.017
Neg logit contributions
 arm-0.021
 scree-0.021
 fright-0.020
hy-0.020
 speech-0.019
Token special
Token ID2041
Feature activation+3.21e-02

Pos logit contributions
+0.006
uing+0.006
+0.006
 Others+0.006
 Pole+0.005
Neg logit contributions
 scree-0.006
hy-0.006
 fright-0.005
 supposed-0.005
 arm-0.005
Token design
Token ID1486
Feature activation+3.64e-03

Pos logit contributions
+0.056
uing+0.053
+0.053
orting+0.050
 Others+0.049
Neg logit contributions
 scree-0.039
 fright-0.038
hy-0.038
 arm-0.037
 supposed-0.036
Token the
Token ID262
Feature activation+3.16e-02

Pos logit contributions
 arm+0.016
 scree+0.016
 fright+0.016
hy+0.015
 speech+0.015
Neg logit contributions
-0.011
uing-0.010
-0.010
orting-0.010
 Others-0.009
Token big
Token ID1263
Feature activation+4.26e-02

Pos logit contributions
+0.042
 Pole+0.040
+0.040
orting+0.039
uing+0.039
Neg logit contributions
 supposed-0.046
hy-0.046
 scree-0.045
 written-0.045
 arm-0.043
Token fish
Token ID5916
Feature activation+6.31e-03

Pos logit contributions
+0.061
+0.057
uing+0.055
orting+0.055
 Pole+0.055
Neg logit contributions
 supposed-0.064
hy-0.061
 scree-0.061
 fright-0.060
 arm-0.060
Token,
Token ID11
Feature activation+6.05e-02

Pos logit contributions
+0.011
uing+0.011
+0.010
orting+0.010
 Others+0.010
Neg logit contributions
 arm-0.008
 scree-0.008
 fright-0.008
 very-0.007
hy-0.007
Token but
Token ID475
Feature activation-4.08e-02

Pos logit contributions
+0.086
uing+0.083
+0.079
orting+0.078
 Others+0.074
Neg logit contributions
 arm-0.096
 scree-0.095
 fright-0.094
hy-0.090
 speech-0.088
Token he
Token ID339
Feature activation+8.83e-02

Pos logit contributions
 scree+0.070
 fright+0.069
hy+0.069
 supposed+0.068
leaf+0.068
Neg logit contributions
-0.051
-0.048
uing-0.047
orting-0.044
 Others-0.041
Token could
Token ID714
Feature activation-3.06e-02

Pos logit contributions
+0.121
+0.121
uing+0.120
orting+0.108
 Others+0.105
Neg logit contributions
hy-0.137
 fright-0.131
 arm-0.128
leaf-0.128
 scree-0.128
Token not
Token ID407
Feature activation-4.46e-02

Pos logit contributions
 arm+0.029
 scree+0.028
 fright+0.028
hy+0.026
 speech+0.025
Neg logit contributions
-0.064
uing-0.062
-0.060
orting-0.060
 Others-0.057
Token do
Token ID466
Feature activation-1.87e-02

Pos logit contributions
 arm+0.054
 scree+0.054
 fright+0.053
hy+0.049
 speech+0.048
Neg logit contributions
-0.081
uing-0.078
-0.077
orting-0.075
 Others-0.072
Token it
Token ID340
Feature activation+2.30e-02

Pos logit contributions
 arm+0.025
 scree+0.025
 fright+0.024
hy+0.023
 supposed+0.023
Neg logit contributions
-0.032
uing-0.030
-0.030
orting-0.029
 Pole-0.028
Token.
Token ID13
Feature activation-1.82e-02

Pos logit contributions
+0.040
+0.038
uing+0.036
 Pole+0.035
orting+0.035
Neg logit contributions
 arm-0.031
 scree-0.030
 fright-0.029
 supposed-0.029
 very-0.029
Token special
Token ID2041
Feature activation+3.90e-02

Pos logit contributions
+0.019
uing+0.019
 Majesty+0.018
+0.018
cast+0.018
Neg logit contributions
hy-0.026
 scree-0.026
oof-0.024
 fright-0.023
leaf-0.023
Token project
Token ID1628
Feature activation-1.32e-03

Pos logit contributions
uing+0.060
+0.058
+0.053
ry+0.051
 Majesty+0.051
Neg logit contributions
hy-0.055
 scree-0.054
 supposed-0.054
leaf-0.049
 fright-0.048
Token from
Token ID422
Feature activation-2.14e-02

Pos logit contributions
 arm+0.002
 scree+0.002
 fright+0.002
hy+0.002
 speech+0.002
Neg logit contributions
-0.002
uing-0.002
-0.002
orting-0.002
 Others-0.002
Token the
Token ID262
Feature activation+2.07e-02

Pos logit contributions
 scree+0.029
hy+0.028
 fright+0.027
 arm+0.027
 supposed+0.025
Neg logit contributions
-0.034
uing-0.033
-0.032
orting-0.031
 Others-0.031
Token pattern
Token ID3912
Feature activation-6.69e-03

Pos logit contributions
+0.031
+0.029
uing+0.028
orting+0.027
 Others+0.026
Neg logit contributions
 scree-0.030
 fright-0.028
 supposed-0.028
hy-0.028
oof-0.028
Token,
Token ID11
Feature activation+6.45e-02

Pos logit contributions
 arm+0.010
 scree+0.009
 fright+0.009
hy+0.009
 speech+0.009
Neg logit contributions
-0.011
uing-0.010
-0.010
orting-0.010
 Others-0.009
Token she
Token ID673
Feature activation+9.22e-02

Pos logit contributions
+0.087
uing+0.085
+0.082
orting+0.078
 Others+0.076
Neg logit contributions
 scree-0.102
 arm-0.102
 fright-0.101
hy-0.099
 supposed-0.095
Token was
Token ID373
Feature activation-8.27e-02

Pos logit contributions
+0.151
uing+0.147
+0.143
orting+0.137
 Others+0.133
Neg logit contributions
 arm-0.123
 scree-0.122
 fright-0.120
hy-0.119
 supposed-0.112
Token thrilled
Token ID20536
Feature activation+4.85e-02

Pos logit contributions
 scree+0.095
hy+0.093
 fright+0.092
 arm+0.089
oof+0.088
Neg logit contributions
uing-0.143
-0.141
-0.138
 Others-0.130
orting-0.127
Token.
Token ID13
Feature activation-2.39e-02

Pos logit contributions
+0.093
uing+0.089
orting+0.085
+0.085
 Others+0.085
Neg logit contributions
 arm-0.053
 fright-0.050
 scree-0.050
 written-0.049
hy-0.048
Token She
Token ID1375
Feature activation+7.75e-02

Pos logit contributions
 arm+0.039
 fright+0.038
 scree+0.038
hy+0.038
 speech+0.037
Neg logit contributions
uing-0.032
-0.031
orting-0.030
-0.029
 Pole-0.027
Token told
Token ID1297
Feature activation+9.31e-03

Pos logit contributions
+0.103
+0.103
uing+0.100
 Others+0.091
orting+0.090
Neg logit contributions
hy-0.103
 fright-0.101
 scree-0.098
 arm-0.097
 supposed-0.095
Token Tim
Token ID5045
Feature activation+5.82e-03

Pos logit contributions
+0.011
uing+0.010
orting+0.010
+0.010
 Others+0.009
Neg logit contributions
 scree-0.017
 arm-0.017
hy-0.016
 fright-0.016
 speech-0.016
Tokenmy
Token ID1820
Feature activation-2.14e-02

Pos logit contributions
uing+0.009
+0.009
orting+0.009
+0.009
 Others+0.008
Neg logit contributions
 scree-0.008
 fright-0.008
 arm-0.008
hy-0.008
 speech-0.007
Token to
Token ID284
Feature activation-3.71e-02

Pos logit contributions
 arm+0.030
 scree+0.028
 fright+0.027
 very+0.027
hy+0.027
Neg logit contributions
-0.036
uing-0.034
-0.033
orting-0.033
 Pole-0.032
Token help
Token ID1037
Feature activation-8.65e-03

Pos logit contributions
 scree+0.048
hy+0.045
 fright+0.044
 arm+0.044
 supposed+0.043
Neg logit contributions
uing-0.062
-0.062
orting-0.059
-0.058
 Others-0.056
Token her
Token ID607
Feature activation+2.06e-02

Pos logit contributions
 arm+0.010
 scree+0.009
 fright+0.009
 supposed+0.009
hy+0.009
Neg logit contributions
-0.016
uing-0.016
-0.016
orting-0.015
 Pole-0.015
Token clean
Token ID3424
Feature activation+5.36e-02

Pos logit contributions
+0.044
orting+0.042
+0.041
uing+0.040
 Pole+0.040
Neg logit contributions
 arm-0.023
 very-0.018
 lip-0.018
 fright-0.018
 cry-0.017
Token the
Token ID262
Feature activation+2.01e-02

Pos logit contributions
+0.105
+0.098
uing+0.097
orting+0.096
 Pole+0.092
Neg logit contributions
 arm-0.067
 fright-0.057
 very-0.056
 er-0.056
 lip-0.055
Token television
Token ID5581
Feature activation+4.41e-03

Pos logit contributions
+0.039
uing+0.037
orting+0.036
+0.035
ub+0.034
Neg logit contributions
 arm-0.025
 fright-0.022
 very-0.022
 speech-0.022
 supposed-0.021
Token.
Token ID13
Feature activation-3.72e-02

Pos logit contributions
+0.008
+0.008
uing+0.008
orting+0.007
 Pole+0.007
Neg logit contributions
 arm-0.006
 very-0.005
 fright-0.005
 er-0.005
 scree-0.005
Token Tim
Token ID5045
Feature activation+1.24e-02

Pos logit contributions
 arm+0.039
 very+0.037
 lip+0.036
+0.035
 a+0.035
Neg logit contributions
uing-0.077
 Majesty-0.071
-0.070
rier-0.069
ating-0.066
Token it
Token ID340
Feature activation+3.37e-02

Pos logit contributions
+0.117
+0.108
uing+0.107
orting+0.102
 Others+0.099
Neg logit contributions
 scree-0.149
 fright-0.145
 arm-0.143
hy-0.142
 supposed-0.136
Token is
Token ID318
Feature activation-7.51e-02

Pos logit contributions
+0.052
uing+0.051
+0.049
orting+0.048
 Others+0.046
Neg logit contributions
 arm-0.049
 scree-0.048
 fright-0.047
hy-0.046
 er-0.045
Token very
Token ID845
Feature activation-7.11e-02

Pos logit contributions
leaf+0.097
 fright+0.095
 scree+0.095
hy+0.086
 rhyth+0.082
Neg logit contributions
uing-0.118
-0.117
 nicely-0.114
-0.113
 them-0.107
Token expensive
Token ID5789
Feature activation+3.02e-02

Pos logit contributions
 scree+0.090
 fright+0.086
 arm+0.084
hy+0.078
 fingert+0.077
Neg logit contributions
uing-0.122
-0.117
-0.117
orting-0.112
 Others-0.108
Token.
Token ID13
Feature activation-2.96e-02

Pos logit contributions
+0.052
uing+0.050
+0.048
orting+0.047
 Others+0.046
Neg logit contributions
 arm-0.040
 scree-0.038
 fright-0.038
hy-0.037
 supposed-0.035
Token She
Token ID1375
Feature activation+6.52e-02

Pos logit contributions
 arm+0.049
hy+0.047
 fright+0.046
 scree+0.046
 speech+0.045
Neg logit contributions
uing-0.042
-0.040
orting-0.037
-0.037
ating-0.035
Token also
Token ID635
Feature activation-1.01e-02

Pos logit contributions
+0.093
uing+0.090
+0.088
orting+0.084
 Others+0.082
Neg logit contributions
 arm-0.100
 scree-0.100
 fright-0.099
hy-0.098
 supposed-0.094
Token sees
Token ID7224
Feature activation+5.06e-02

Pos logit contributions
 arm+0.017
 scree+0.017
 fright+0.017
hy+0.017
 speech+0.016
Neg logit contributions
-0.013
uing-0.012
-0.012
orting-0.012
 Others-0.011
Token that
Token ID326
Feature activation+8.82e-02

Pos logit contributions
+0.083
uing+0.080
+0.079
orting+0.078
 Others+0.076
Neg logit contributions
 scree-0.065
 fright-0.064
 arm-0.064
hy-0.062
 supposed-0.060
Token it
Token ID340
Feature activation+3.37e-02

Pos logit contributions
+0.119
+0.109
uing+0.107
orting+0.102
 Others+0.101
Neg logit contributions
 scree-0.147
 fright-0.143
hy-0.140
 arm-0.139
 fingert-0.134
Token is
Token ID318
Feature activation-7.47e-02

Pos logit contributions
+0.049
uing+0.047
+0.045
orting+0.044
 Others+0.042
Neg logit contributions
 arm-0.052
 scree-0.052
 fright-0.051
hy-0.050
 speech-0.048
Token.
Token ID13
Feature activation-2.80e-02

Pos logit contributions
 arm+0.021
 scree+0.020
 fright+0.019
hy+0.019
 supposed+0.019
Neg logit contributions
-0.034
uing-0.033
-0.031
orting-0.031
 Others-0.030
Token
Token ID198
Feature activation+1.74e-02

Pos logit contributions
 arm+0.040
 speech+0.037
 a+0.037
 fright+0.037
hy+0.037
Neg logit contributions
uing-0.049
-0.046
ating-0.043
-0.041
orting-0.041
Token
Token ID198
Feature activation+1.95e-02

Pos logit contributions
+0.029
uing+0.027
+0.025
orting+0.025
 Pole+0.024
Neg logit contributions
 arm-0.025
 fright-0.025
 scree-0.024
 very-0.024
 speech-0.023
TokenL
Token ID43
Feature activation-4.46e-02

Pos logit contributions
uing+0.033
 Pole+0.032
+0.031
 Majesty+0.031
ishes+0.030
Neg logit contributions
-0.025
 arm-0.025
 scree-0.024
 very-0.024
 fright-0.023
Tokenily
Token ID813
Feature activation-9.22e-05

Pos logit contributions
 arm+0.066
 scree+0.065
 fright+0.064
hy+0.062
 speech+0.060
Neg logit contributions
-0.068
uing-0.066
-0.064
orting-0.062
 Others-0.059
Token comes
Token ID2058
Feature activation+1.18e-02

Pos logit contributions
 arm+0.000
 scree+0.000
 fright+0.000
hy+0.000
 speech+0.000
Neg logit contributions
-0.000
uing-0.000
-0.000
orting-0.000
 Others-0.000
Token back
Token ID736
Feature activation+6.63e-02

Pos logit contributions
uing+0.017
+0.016
+0.016
orting+0.015
 Others+0.015
Neg logit contributions
hy-0.018
 scree-0.018
 arm-0.017
 fright-0.017
 er-0.016
Token and
Token ID290
Feature activation-4.82e-02

Pos logit contributions
+0.115
uing+0.111
+0.107
orting+0.106
 Others+0.101
Neg logit contributions
 arm-0.088
 scree-0.084
 fright-0.084
hy-0.080
 supposed-0.078
Token sees
Token ID7224
Feature activation+6.05e-02

Pos logit contributions
 arm+0.079
 scree+0.079
 fright+0.077
 speech+0.074
hy+0.073
Neg logit contributions
-0.068
uing-0.065
-0.061
orting-0.060
 Others-0.057
Token that
Token ID326
Feature activation+9.39e-02

Pos logit contributions
+0.109
uing+0.104
+0.101
orting+0.096
 Pole+0.093
Neg logit contributions
 arm-0.079
 scree-0.075
 fright-0.073
 er-0.072
hy-0.071
Token her
Token ID607
Feature activation+1.84e-02

Pos logit contributions
+0.123
uing+0.115
+0.114
orting+0.111
 Others+0.106
Neg logit contributions
 scree-0.157
hy-0.154
 fright-0.154
 arm-0.150
 supposed-0.146
Token She
Token ID1375
Feature activation+6.77e-02

Pos logit contributions
 arm+0.037
 speech+0.036
hy+0.035
 fright+0.035
 scree+0.035
Neg logit contributions
uing-0.043
-0.039
ating-0.038
orting-0.037
-0.035
Token was
Token ID373
Feature activation-9.23e-02

Pos logit contributions
+0.089
+0.084
 Others+0.080
uing+0.079
 Pole+0.072
Neg logit contributions
hy-0.105
 supposed-0.104
 fright-0.102
 arm-0.098
oof-0.097
Token not
Token ID407
Feature activation-5.45e-02

Pos logit contributions
oof+0.104
 poem+0.097
 tal+0.095
hy+0.095
 fingert+0.092
Neg logit contributions
 nicely-0.145
 Others-0.143
uing-0.142
-0.140
 hey-0.138
Token happy
Token ID3772
Feature activation+6.76e-02

Pos logit contributions
hy+0.059
 fright+0.058
 scree+0.057
 arm+0.057
 fingert+0.055
Neg logit contributions
uing-0.096
-0.093
-0.091
 Others-0.088
orting-0.087
Token.
Token ID13
Feature activation-2.10e-02

Pos logit contributions
+0.138
+0.128
uing+0.127
orting+0.121
 Others+0.120
Neg logit contributions
 arm-0.075
 fright-0.071
 scree-0.071
 supposed-0.071
 written-0.068
Token
Token ID198
Feature activation+1.98e-02

Pos logit contributions
 arm+0.027
 speech+0.026
 fright+0.026
 very+0.025
hy+0.025
Neg logit contributions
uing-0.040
ating-0.035
-0.034
orting-0.034
ling-0.033
Token
Token ID198
Feature activation+2.16e-02

Pos logit contributions
+0.029
uing+0.028
orting+0.026
+0.026
 Pole+0.025
Neg logit contributions
 arm-0.031
 fright-0.030
 scree-0.030
 speech-0.028
hy-0.028
Token"
Token ID1
Feature activation+4.25e-02

Pos logit contributions
uing+0.028
+0.026
 Pole+0.026
orting+0.025
cast+0.024
Neg logit contributions
 arm-0.036
 scree-0.035
-0.035
 fright-0.035
hy-0.034
TokenWhat
Token ID2061
Feature activation+3.05e-02

Pos logit contributions
uing+0.071
+0.067
orting+0.066
+0.063
 Others+0.062
Neg logit contributions
 scree-0.058
 arm-0.058
 fright-0.056
 written-0.053
 speech-0.052
Token are
Token ID389
Feature activation-5.06e-02

Pos logit contributions
+0.029
+0.029
uing+0.027
 Others+0.027
ains+0.024
Neg logit contributions
 scree-0.057
 tal-0.055
 fingert-0.055
hy-0.055
 fright-0.054
Token you
Token ID345
Feature activation-5.79e-02

Pos logit contributions
 arm+0.061
 scree+0.060
 fright+0.058
hy+0.058
 supposed+0.057
Neg logit contributions
-0.089
uing-0.088
-0.083
orting-0.082
 Others-0.078
Token to
Token ID284
Feature activation-2.60e-02

Pos logit contributions
+0.010
uing+0.009
orting+0.009
+0.009
 Others+0.009
Neg logit contributions
 scree-0.009
 arm-0.009
hy-0.009
 fright-0.009
 speech-0.008
Token talk
Token ID1561
Feature activation-2.76e-02

Pos logit contributions
 scree+0.037
 arm+0.036
hy+0.036
leaf+0.035
 fingert+0.034
Neg logit contributions
-0.036
uing-0.036
 Others-0.034
orting-0.033
-0.032
Token to
Token ID284
Feature activation-1.51e-02

Pos logit contributions
 arm+0.032
 fright+0.031
 scree+0.031
 very+0.030
hy+0.030
Neg logit contributions
-0.052
uing-0.049
-0.048
orting-0.047
 Pole-0.045
Token other
Token ID584
Feature activation+1.51e-02

Pos logit contributions
 scree+0.019
 arm+0.019
 fright+0.018
hy+0.018
 supposed+0.017
Neg logit contributions
-0.026
uing-0.025
-0.024
orting-0.024
 Others-0.024
Token people
Token ID661
Feature activation-7.68e-03

Pos logit contributions
+0.021
uing+0.021
+0.020
orting+0.019
 Others+0.019
Neg logit contributions
 scree-0.024
 arm-0.023
 fright-0.023
hy-0.023
 supposed-0.022
Token.
Token ID13
Feature activation-3.12e-02

Pos logit contributions
 arm+0.008
 very+0.008
hy+0.008
 scree+0.008
 supposed+0.008
Neg logit contributions
-0.015
uing-0.015
-0.014
orting-0.014
 Others-0.013
Token
Token ID220
Feature activation+7.67e-03

Pos logit contributions
 arm+0.043
 very+0.041
hy+0.039
 speech+0.038
 er+0.038
Neg logit contributions
uing-0.052
-0.050
ating-0.049
orting-0.047
-0.046
Token
Token ID198
Feature activation+1.98e-02

Pos logit contributions
+0.012
uing+0.012
orting+0.011
+0.011
 Pole+0.011
Neg logit contributions
 arm-0.011
 very-0.010
 fright-0.010
 speech-0.010
 supposed-0.010
Token
Token ID198
Feature activation+2.17e-02

Pos logit contributions
+0.032
uing+0.030
orting+0.029
 Pole+0.028
+0.028
Neg logit contributions
 arm-0.028
 very-0.027
 fright-0.027
 speech-0.026
hy-0.025
TokenOne
Token ID3198
Feature activation+6.92e-02

Pos logit contributions
+0.032
uing+0.031
orting+0.029
+0.029
 Pole+0.028
Neg logit contributions
 arm-0.033
 scree-0.032
hy-0.031
 fright-0.031
 very-0.030
Token day
Token ID1110
Feature activation+4.93e-02

Pos logit contributions
+0.069
+0.067
 nicely+0.062
ry+0.061
acky+0.060
Neg logit contributions
 scree-0.131
 fright-0.130
 tal-0.121
 fingert-0.121
hy-0.119
Token bridges
Token ID19432
Feature activation+4.55e-02

Pos logit contributions
 arm+0.042
 fright+0.042
 scree+0.041
hy+0.041
 supposed+0.039
Neg logit contributions
uing-0.052
-0.051
-0.048
orting-0.048
 Others-0.047
Token and
Token ID290
Feature activation-7.03e-02

Pos logit contributions
+0.082
+0.076
uing+0.076
orting+0.073
 Others+0.070
Neg logit contributions
 arm-0.058
 scree-0.058
 fright-0.055
hy-0.054
 very-0.053
Token tunnels
Token ID22642
Feature activation+6.04e-02

Pos logit contributions
 arm+0.142
 scree+0.142
 fright+0.140
hy+0.137
 speech+0.134
Neg logit contributions
-0.068
uing-0.064
-0.060
orting-0.058
 Others-0.054
Token.
Token ID13
Feature activation-3.08e-02

Pos logit contributions
+0.124
+0.115
uing+0.114
orting+0.107
 Pole+0.106
Neg logit contributions
 scree-0.065
 arm-0.065
 very-0.063
 fright-0.062
hy-0.061
Token They
Token ID1119
Feature activation+4.49e-02

Pos logit contributions
 fright+0.039
 arm+0.039
 very+0.038
+0.037
 a+0.037
Neg logit contributions
uing-0.057
-0.051
 Majesty-0.050
rier-0.049
ating-0.049
Token have
Token ID423
Feature activation-2.04e-02

Pos logit contributions
+0.076
uing+0.074
+0.072
orting+0.070
 Others+0.068
Neg logit contributions
 arm-0.057
 scree-0.056
 fright-0.055
hy-0.054
 speech-0.052
Token fun
Token ID1257
Feature activation+3.99e-02

Pos logit contributions
 arm+0.019
 scree+0.019
 fright+0.019
hy+0.018
 speech+0.017
Neg logit contributions
-0.040
uing-0.039
-0.039
orting-0.038
 Others-0.038
Token.
Token ID13
Feature activation-3.35e-02

Pos logit contributions
+0.079
uing+0.074
orting+0.072
+0.071
 Pole+0.070
Neg logit contributions
 scree-0.045
 arm-0.044
 very-0.042
 fright-0.040
 er-0.039
Token
Token ID198
Feature activation+1.90e-02

Pos logit contributions
 fright+0.043
 arm+0.043
 very+0.041
+0.040
 scree+0.040
Neg logit contributions
uing-0.061
-0.055
rier-0.053
 Majesty-0.053
orting-0.052
Token
Token ID198
Feature activation+2.14e-02

Pos logit contributions
+0.030
uing+0.029
 Majesty+0.027
orting+0.027
+0.027
Neg logit contributions
 arm-0.027
 fright-0.027
 very-0.026
 scree-0.025
 supposed-0.025
TokenOne
Token ID3198
Feature activation+6.84e-02

Pos logit contributions
uing+0.036
 Majesty+0.033
 Pole+0.033
orting+0.033
+0.033
Neg logit contributions
 arm-0.027
 scree-0.027
 fright-0.027
 very-0.025
-0.025
Token was
Token ID373
Feature activation-8.47e-02

Pos logit contributions
 arm+0.086
 scree+0.085
 fright+0.085
hy+0.083
 supposed+0.081
Neg logit contributions
-0.095
-0.091
uing-0.090
orting-0.086
 Others-0.083
Token getting
Token ID1972
Feature activation-1.17e-02

Pos logit contributions
leaf+0.101
hy+0.091
Ta+0.090
oof+0.089
 fingert+0.087
Neg logit contributions
 them-0.124
uing-0.122
-0.122
-0.122
orting-0.121
Token closer
Token ID5699
Feature activation+6.03e-02

Pos logit contributions
hy+0.017
 poem+0.016
 fingert+0.015
oof+0.015
 arm+0.015
Neg logit contributions
 them-0.018
orting-0.015
ling-0.015
 nicely-0.015
 houses-0.014
Token and
Token ID290
Feature activation-4.81e-02

Pos logit contributions
+0.092
uing+0.089
+0.087
orting+0.085
 Others+0.080
Neg logit contributions
 scree-0.086
 arm-0.085
 fright-0.084
hy-0.083
 speech-0.079
Token closer
Token ID5699
Feature activation+6.29e-02

Pos logit contributions
 scree+0.053
 fright+0.051
 arm+0.050
 very+0.047
hy+0.045
Neg logit contributions
uing-0.095
-0.094
orting-0.090
 Others-0.086
-0.086
Token.
Token ID13
Feature activation-2.18e-02

Pos logit contributions
+0.142
uing+0.133
 Pole+0.128
orting+0.128
 Others+0.127
Neg logit contributions
 very-0.059
 arm-0.057
 scree-0.053
 fright-0.053
 written-0.049
Token She
Token ID1375
Feature activation+7.80e-02

Pos logit contributions
 arm+0.032
 speech+0.030
hy+0.030
 very+0.030
 fright+0.029
Neg logit contributions
uing-0.037
ating-0.033
orting-0.032
-0.032
ins-0.029
Token started
Token ID2067
Feature activation+2.32e-02

Pos logit contributions
+0.121
+0.117
uing+0.113
orting+0.107
 Others+0.105
Neg logit contributions
hy-0.109
 arm-0.109
 fright-0.107
 scree-0.105
 supposed-0.105
Token to
Token ID284
Feature activation-2.78e-02

Pos logit contributions
+0.033
+0.032
uing+0.032
orting+0.030
 Others+0.030
Neg logit contributions
hy-0.034
 arm-0.033
 scree-0.032
 fright-0.032
 fingert-0.031
Token feel
Token ID1254
Feature activation-3.38e-02

Pos logit contributions
hy+0.028
 supposed+0.027
 arm+0.027
 written+0.026
leaf+0.026
Neg logit contributions
-0.050
uing-0.049
-0.048
orting-0.046
 Others-0.046
Token scared
Token ID12008
Feature activation+1.84e-02

Pos logit contributions
hy+0.033
 arm+0.032
 scree+0.031
oof+0.030
 supposed+0.030
Neg logit contributions
-0.063
orting-0.061
-0.060
 Others-0.058
uing-0.058
Token you
Token ID345
Feature activation-5.63e-02

Pos logit contributions
 arm+0.014
 scree+0.014
hy+0.014
 fright+0.014
 speech+0.013
Neg logit contributions
-0.012
uing-0.012
-0.011
orting-0.011
 Others-0.011
Token.
Token ID13
Feature activation-1.85e-02

Pos logit contributions
 arm+0.075
 fright+0.071
 scree+0.070
 very+0.070
hy+0.068
Neg logit contributions
-0.096
uing-0.092
-0.088
orting-0.087
 Pole-0.085
Token Do
Token ID2141
Feature activation+7.12e-03

Pos logit contributions
 arm+0.030
 scree+0.030
 fright+0.029
hy+0.029
 speech+0.028
Neg logit contributions
-0.025
uing-0.024
-0.023
orting-0.023
 Others-0.022
Token you
Token ID345
Feature activation-6.18e-02

Pos logit contributions
uing+0.012
+0.012
orting+0.011
+0.011
 Pole+0.011
Neg logit contributions
 arm-0.009
 scree-0.009
 fright-0.008
hy-0.008
 very-0.008
Token want
Token ID765
Feature activation-2.95e-02

Pos logit contributions
 scree+0.084
 arm+0.082
hy+0.077
 fright+0.076
 er+0.076
Neg logit contributions
uing-0.099
-0.096
orting-0.093
-0.091
 Others-0.088
Token to
Token ID284
Feature activation-3.54e-02

Pos logit contributions
 scree+0.045
hy+0.044
 arm+0.044
 fright+0.042
 supposed+0.042
Neg logit contributions
-0.042
orting-0.040
uing-0.039
-0.039
 Others-0.038
Token play
Token ID711
Feature activation+2.44e-02

Pos logit contributions
 arm+0.047
 scree+0.047
 fright+0.046
hy+0.045
 supposed+0.044
Neg logit contributions
-0.058
uing-0.056
-0.054
orting-0.053
 Others-0.051
Token with
Token ID351
Feature activation-2.52e-02

Pos logit contributions
+0.044
uing+0.042
+0.040
 Pole+0.040
orting+0.039
Neg logit contributions
 arm-0.030
 fright-0.028
 scree-0.028
 very-0.027
 speech-0.026
Token our
Token ID674
Feature activation+6.75e-03

Pos logit contributions
 arm+0.042
 fright+0.041
 scree+0.040
 er+0.038
 very+0.038
Neg logit contributions
uing-0.034
-0.034
-0.031
orting-0.031
 Pole-0.029
Token ball
Token ID2613
Feature activation-5.29e-02

Pos logit contributions
+0.010
+0.009
uing+0.009
orting+0.009
 Pole+0.009
Neg logit contributions
 scree-0.010
hy-0.010
 fright-0.010
 written-0.009
 supposed-0.009
Token?"
Token ID1701
Feature activation-2.72e-02

Pos logit contributions
 arm+0.074
 scree+0.070
 fright+0.070
hy+0.068
 er+0.066
Neg logit contributions
-0.085
uing-0.083
-0.079
orting-0.077
 Others-0.075
Token
Token ID220
Feature activation+4.80e-03

Pos logit contributions
 arm+0.029
 scree+0.029
 speech+0.028
 fright+0.028
 very+0.028
Neg logit contributions
uing-0.040
-0.037
ating-0.037
orting-0.036
cast-0.033
Token
Token ID198
Feature activation+1.52e-02

Pos logit contributions
+0.009
uing+0.008
+0.008
orting+0.007
 Pole+0.007
Neg logit contributions
-0.006
 arm-0.006
 very-0.006
 speech-0.006
 fright-0.006
Token
Token ID198
Feature activation+1.65e-02

Pos logit contributions
+0.025
uing+0.024
orting+0.022
 Pole+0.022
+0.022
Neg logit contributions
 fright-0.020
 arm-0.020
 scree-0.020
-0.020
 speech-0.019
TokenThe
Token ID464
Feature activation+2.97e-02

Pos logit contributions
uing+0.030
+0.029
 Pole+0.029
orting+0.029
cast+0.028
Neg logit contributions
 scree-0.018
-0.018
 arm-0.017
 fright-0.017
hy-0.017
Token little
Token ID1310
Feature activation+1.39e-02

Pos logit contributions
+0.040
uing+0.038
+0.038
orting+0.036
 Pole+0.034
Neg logit contributions
 scree-0.048
 arm-0.048
 fright-0.047
hy-0.046
 supposed-0.046
Token boy
Token ID2933
Feature activation-3.15e-02

Pos logit contributions
+0.020
uing+0.020
orting+0.018
+0.018
 Others+0.017
Neg logit contributions
 arm-0.022
 scree-0.022
 fright-0.021
hy-0.021
 er-0.020
Token thought
Token ID1807
Feature activation-1.61e-02

Pos logit contributions
 arm+0.039
 scree+0.038
 fright+0.038
hy+0.037
 supposed+0.036
Neg logit contributions
-0.054
uing-0.053
-0.052
orting-0.050
 Others-0.048
Token that
Token ID326
Feature activation+9.25e-02

Pos logit contributions
 scree+0.022
 arm+0.022
 fright+0.022
hy+0.021
 speech+0.021
Neg logit contributions
uing-0.025
-0.025
orting-0.024
-0.023
 Others-0.023
Token was
Token ID373
Feature activation-6.99e-02

Pos logit contributions
+0.120
uing+0.111
+0.109
 Others+0.109
orting+0.105
Neg logit contributions
hy-0.149
 arm-0.142
 scree-0.142
 fright-0.140
 fingert-0.139
Token lovely
Token ID14081
Feature activation+5.64e-02

Pos logit contributions
 arm+0.057
 scree+0.053
 fright+0.053
hy+0.052
 fingert+0.046
Neg logit contributions
-0.148
-0.144
uing-0.141
orting-0.137
 Others-0.136
Token,
Token ID11
Feature activation+5.60e-02

Pos logit contributions
+0.093
uing+0.090
+0.086
orting+0.085
 Others+0.081
Neg logit contributions
 arm-0.078
 scree-0.076
 fright-0.075
hy-0.073
 speech-0.071
Token Lily
Token ID20037
Feature activation+5.10e-03

Pos logit contributions
 fright+0.007
 scree+0.007
 arm+0.007
hy+0.007
 supposed+0.007
Neg logit contributions
-0.006
-0.005
uing-0.005
orting-0.005
 Others-0.005
Token's
Token ID338
Feature activation-4.22e-02

Pos logit contributions
+0.009
uing+0.009
orting+0.008
+0.008
 Others+0.008
Neg logit contributions
 arm-0.006
 scree-0.006
 fright-0.006
hy-0.006
 supposed-0.006
Token mom
Token ID1995
Feature activation-6.85e-02

Pos logit contributions
 scree+0.076
leaf+0.076
 supposed+0.069
hy+0.069
 written+0.069
Neg logit contributions
-0.046
-0.045
 Majesty-0.043
uing-0.042
 nicely-0.042
Tokenmy
Token ID1820
Feature activation-2.83e-02

Pos logit contributions
 arm+0.089
 scree+0.087
 fright+0.084
hy+0.080
 supposed+0.079
Neg logit contributions
-0.120
uing-0.117
orting-0.110
-0.109
 Others-0.107
Token had
Token ID550
Feature activation-1.94e-02

Pos logit contributions
 arm+0.034
 scree+0.034
 fright+0.032
 written+0.031
 very+0.031
Neg logit contributions
-0.052
uing-0.051
orting-0.048
 Others-0.047
 Pole-0.046
Token an
Token ID281
Feature activation-1.03e-01

Pos logit contributions
 scree+0.020
hy+0.019
 fright+0.018
 fingert+0.018
 rhyth+0.018
Neg logit contributions
-0.037
-0.036
uing-0.034
 Others-0.034
orting-0.033
Token idea
Token ID2126
Feature activation-3.75e-02

Pos logit contributions
 scree+0.195
hy+0.182
 fingert+0.175
 fright+0.172
leaf+0.167
Neg logit contributions
uing-0.113
orting-0.102
-0.089
 Others-0.088
ying-0.087
Token.
Token ID13
Feature activation-2.86e-02

Pos logit contributions
 scree+0.070
hy+0.068
 fright+0.067
 fingert+0.066
 vel+0.066
Neg logit contributions
-0.038
-0.038
orting-0.038
uing-0.036
ry-0.032
Token She
Token ID1375
Feature activation+7.35e-02

Pos logit contributions
 very+0.036
 arm+0.034
+0.031
 a+0.030
 er+0.029
Neg logit contributions
uing-0.055
ating-0.054
orting-0.051
-0.051
 Majesty-0.050
Token took
Token ID1718
Feature activation-5.13e-03

Pos logit contributions
+0.109
+0.108
uing+0.107
orting+0.098
 Others+0.095
Neg logit contributions
hy-0.107
 fright-0.107
 arm-0.105
 scree-0.103
 fingert-0.098
Token the
Token ID262
Feature activation+1.73e-02

Pos logit contributions
 scree+0.007
hy+0.006
 fright+0.006
 arm+0.006
 supposed+0.006
Neg logit contributions
-0.009
uing-0.008
 Others-0.008
orting-0.008
-0.008
Token a
Token ID257
Feature activation-7.42e-02

Pos logit contributions
hy+0.051
 scree+0.051
 arm+0.051
leaf+0.047
 fright+0.047
Neg logit contributions
uing-0.081
-0.079
 Others-0.076
-0.074
orting-0.073
Token great
Token ID1049
Feature activation+1.94e-03

Pos logit contributions
hy+0.107
 scree+0.101
 arm+0.099
 supposed+0.098
 fright+0.095
Neg logit contributions
uing-0.115
-0.114
-0.110
orting-0.100
 Others-0.099
Token job
Token ID1693
Feature activation-3.48e-02

Pos logit contributions
uing+0.003
+0.003
+0.003
orting+0.003
 Majesty+0.002
Neg logit contributions
 scree-0.003
hy-0.003
 fright-0.003
 supposed-0.003
leaf-0.003
Token.
Token ID13
Feature activation-2.41e-02

Pos logit contributions
 arm+0.045
 scree+0.045
 fright+0.044
hy+0.042
 supposed+0.041
Neg logit contributions
-0.059
uing-0.057
-0.055
orting-0.054
 Others-0.052
Token I
Token ID314
Feature activation-2.74e-02

Pos logit contributions
 arm+0.038
 scree+0.038
 fright+0.037
hy+0.036
 speech+0.035
Neg logit contributions
-0.034
uing-0.033
-0.031
orting-0.031
 Others-0.029
Token'm
Token ID1101
Feature activation-1.02e-01

Pos logit contributions
 arm+0.035
 scree+0.034
 fright+0.034
hy+0.033
 speech+0.031
Neg logit contributions
-0.047
uing-0.046
-0.044
orting-0.044
 Others-0.042
Token glad
Token ID9675
Feature activation+5.97e-02

Pos logit contributions
 arm+0.126
 scree+0.125
hy+0.125
 fright+0.124
 fingert+0.117
Neg logit contributions
uing-0.163
-0.159
-0.158
 Others-0.155
orting-0.153
Token you
Token ID345
Feature activation-5.07e-02

Pos logit contributions
+0.095
uing+0.092
+0.088
orting+0.087
 Others+0.083
Neg logit contributions
 arm-0.084
 scree-0.084
 fright-0.082
hy-0.080
 speech-0.077
Token solved
Token ID16019
Feature activation+9.49e-02

Pos logit contributions
 arm+0.067
 scree+0.066
 fright+0.065
hy+0.063
 speech+0.060
Neg logit contributions
-0.085
uing-0.082
-0.080
orting-0.078
 Others-0.075
Token the
Token ID262
Feature activation+2.45e-02

Pos logit contributions
+0.159
uing+0.154
+0.149
orting+0.146
 Others+0.141
Neg logit contributions
 scree-0.123
 arm-0.122
hy-0.119
 fright-0.119
 speech-0.112
Token zoom
Token ID19792
Feature activation+5.37e-03

Pos logit contributions
+0.035
uing+0.033
+0.033
orting+0.032
 Pole+0.031
Neg logit contributions
hy-0.036
 scree-0.036
 supposed-0.034
 arm-0.034
 fingert-0.033
Token Emily
Token ID17608
Feature activation+2.84e-02

Pos logit contributions
+0.093
uing+0.093
orting+0.088
+0.087
 Pole+0.082
Neg logit contributions
 arm-0.074
 scree-0.074
 fright-0.072
hy-0.068
 very-0.067
Token could
Token ID714
Feature activation-2.19e-02

Pos logit contributions
+0.046
uing+0.045
+0.044
orting+0.042
 Others+0.041
Neg logit contributions
 arm-0.037
 scree-0.037
 fright-0.037
hy-0.035
 supposed-0.034
Token hardly
Token ID8941
Feature activation-7.06e-02

Pos logit contributions
 arm+0.026
 scree+0.026
hy+0.025
 fright+0.025
 supposed+0.024
Neg logit contributions
-0.038
uing-0.038
-0.036
orting-0.035
 Others-0.034
Token keep
Token ID1394
Feature activation-2.99e-02

Pos logit contributions
 arm+0.087
 scree+0.083
 fright+0.080
 supposed+0.080
hy+0.079
Neg logit contributions
-0.119
uing-0.118
orting-0.111
-0.111
 Others-0.108
Token her
Token ID607
Feature activation-7.11e-04

Pos logit contributions
hy+0.047
 scree+0.044
 fright+0.044
 arm+0.044
leaf+0.042
Neg logit contributions
uing-0.041
-0.040
-0.038
orting-0.038
cast-0.036
Token eyes
Token ID2951
Feature activation-1.29e-01

Pos logit contributions
leaf+0.001
 supposed+0.001
 scree+0.001
hy+0.001
Ta+0.001
Neg logit contributions
uing-0.001
-0.001
-0.001
orting-0.001
 until-0.001
Token open
Token ID1280
Feature activation+3.02e-02

Pos logit contributions
 scree+0.161
leaf+0.161
 fright+0.154
oof+0.153
 arm+0.149
Neg logit contributions
uing-0.213
-0.206
-0.197
orting-0.195
 Others-0.181
Token.
Token ID13
Feature activation-2.16e-02

Pos logit contributions
+0.051
uing+0.049
+0.047
orting+0.047
 Others+0.045
Neg logit contributions
 arm-0.040
 scree-0.039
 fright-0.038
hy-0.037
 speech-0.036
Token After
Token ID2293
Feature activation+4.54e-02

Pos logit contributions
 arm+0.033
 fright+0.032
 speech+0.032
 scree+0.031
+0.031
Neg logit contributions
uing-0.032
-0.030
orting-0.029
ating-0.028
 Pole-0.027
Token a
Token ID257
Feature activation-6.55e-02

Pos logit contributions
+0.088
uing+0.085
+0.084
orting+0.083
 Others+0.081
Neg logit contributions
 arm-0.046
 scree-0.044
 fright-0.044
hy-0.043
 supposed-0.041
Token few
Token ID1178
Feature activation+3.84e-02

Pos logit contributions
 arm+0.106
 scree+0.105
 fright+0.103
hy+0.102
 supposed+0.098
Neg logit contributions
-0.090
uing-0.086
-0.083
orting-0.080
 Others-0.076
Token that
Token ID326
Feature activation+7.98e-02

Pos logit contributions
+0.015
uing+0.014
+0.014
orting+0.013
 Others+0.013
Neg logit contributions
 scree-0.014
 arm-0.014
 fright-0.014
hy-0.014
 speech-0.013
Token she
Token ID673
Feature activation+9.06e-02

Pos logit contributions
+0.104
uing+0.099
+0.095
orting+0.093
 Others+0.089
Neg logit contributions
 scree-0.134
 arm-0.133
 fright-0.131
hy-0.129
 speech-0.124
Token would
Token ID561
Feature activation-2.68e-02

Pos logit contributions
+0.146
uing+0.138
+0.137
 Others+0.128
orting+0.128
Neg logit contributions
 scree-0.124
 arm-0.122
 fright-0.121
hy-0.118
 speech-0.114
Token have
Token ID423
Feature activation-1.09e-02

Pos logit contributions
 scree+0.031
 arm+0.031
 fright+0.030
hy+0.029
 supposed+0.028
Neg logit contributions
-0.049
uing-0.047
-0.046
orting-0.045
 Others-0.043
Token a
Token ID257
Feature activation-7.53e-02

Pos logit contributions
 scree+0.012
 arm+0.011
 fright+0.011
hy+0.011
 fingert+0.010
Neg logit contributions
-0.020
uing-0.020
-0.019
orting-0.019
 Others-0.019
Token lot
Token ID1256
Feature activation-1.14e-01

Pos logit contributions
 scree+0.105
hy+0.102
 arm+0.097
 fright+0.094
 written+0.092
Neg logit contributions
-0.116
uing-0.113
-0.113
 Pole-0.109
 Others-0.107
Token of
Token ID286
Feature activation-2.70e-02

Pos logit contributions
 scree+0.125
 arm+0.124
 fright+0.120
hy+0.116
 speech+0.109
Neg logit contributions
-0.215
uing-0.210
orting-0.203
-0.202
 Others-0.195
Token fun
Token ID1257
Feature activation+3.14e-02

Pos logit contributions
 scree+0.039
hy+0.037
 arm+0.037
 fright+0.034
 fingert+0.034
Neg logit contributions
uing-0.040
 Others-0.040
orting-0.039
-0.037
 Pole-0.035
Token there
Token ID612
Feature activation+1.05e-01

Pos logit contributions
+0.054
uing+0.051
+0.051
orting+0.050
 Pole+0.047
Neg logit contributions
 arm-0.040
 scree-0.039
 fright-0.038
 supposed-0.037
 very-0.037
Token and
Token ID290
Feature activation-7.40e-02

Pos logit contributions
+0.171
uing+0.166
+0.160
orting+0.158
 Others+0.150
Neg logit contributions
 arm-0.144
 scree-0.139
 fright-0.139
hy-0.135
 supposed-0.131
Token she
Token ID673
Feature activation+8.75e-02

Pos logit contributions
 arm+0.094
 scree+0.093
 fright+0.091
hy+0.088
 speech+0.086
Neg logit contributions
-0.128
uing-0.124
-0.121
orting-0.119
 Others-0.112
Token began
Token ID2540
Feature activation+3.14e-02

Pos logit contributions
 scree+0.052
 very+0.049
 arm+0.049
 fright+0.049
 ped+0.045
Neg logit contributions
uing-0.105
-0.103
orting-0.100
 Others-0.095
ating-0.093
Token to
Token ID284
Feature activation-2.69e-02

Pos logit contributions
+0.046
uing+0.044
+0.044
orting+0.041
 Others+0.041
Neg logit contributions
hy-0.046
 arm-0.046
 scree-0.046
 fright-0.044
 supposed-0.042
Token take
Token ID1011
Feature activation-2.21e-02

Pos logit contributions
 arm+0.035
 scree+0.034
 fright+0.033
hy+0.033
 supposed+0.032
Neg logit contributions
-0.046
uing-0.044
-0.043
orting-0.042
 Others-0.041
Token a
Token ID257
Feature activation-6.01e-02

Pos logit contributions
 scree+0.025
hy+0.025
 arm+0.024
 fright+0.023
 speech+0.023
Neg logit contributions
-0.041
uing-0.038
orting-0.038
-0.037
 Others-0.037
Token few
Token ID1178
Feature activation+3.32e-02

Pos logit contributions
hy+0.067
 scree+0.067
 arm+0.066
 supposed+0.064
 fright+0.061
Neg logit contributions
uing-0.104
-0.104
-0.103
 Pole-0.098
orting-0.097
Token l
Token ID300
Feature activation-1.16e-01

Pos logit contributions
+0.051
uing+0.051
orting+0.049
 Others+0.048
+0.047
Neg logit contributions
hy-0.046
 scree-0.045
 arm-0.044
 supposed-0.043
 speech-0.042
Tokenicks
Token ID3378
Feature activation+1.53e-02

Pos logit contributions
 scree+0.179
 arm+0.175
 fright+0.173
hy+0.165
 vel+0.161
Neg logit contributions
uing-0.167
orting-0.166
-0.159
 Others-0.152
-0.146
Token.
Token ID13
Feature activation-3.45e-02

Pos logit contributions
+0.025
uing+0.024
orting+0.023
+0.023
 Others+0.023
Neg logit contributions
 scree-0.021
 supposed-0.020
hy-0.019
 arm-0.019
 fright-0.019
Token Y
Token ID575
Feature activation-5.38e-02

Pos logit contributions
 arm+0.054
 fright+0.053
 scree+0.052
hy+0.052
 speech+0.051
Neg logit contributions
uing-0.051
-0.047
orting-0.046
ating-0.043
-0.043
Tokenum
Token ID388
Feature activation-3.29e-02

Pos logit contributions
 arm+0.079
 scree+0.077
 fright+0.075
hy+0.075
 supposed+0.071
Neg logit contributions
-0.083
uing-0.080
-0.078
orting-0.075
 Others-0.073
Token!
Token ID0
Feature activation-4.42e-02

Pos logit contributions
 arm+0.045
 scree+0.044
hy+0.042
 er+0.042
 very+0.042
Neg logit contributions
-0.056
uing-0.054
orting-0.051
 Others-0.049
-0.049