TOP ACTIVATIONS
MAX = 18.511

voice . " Ho ho ho ! Merry Christmas
. He said , " Ho ho ho ! Don 't
to reach . Ho pper asked his friend ,
sack . " Ho ho ho ! Merry Christmas
. He said , " Ho ho ho , hello Lily
light . " Ho , ho , ho !
She put some on her eyel ids and cheeks . It
. He blinked his long eyel ashes and kept hopping .
They started playing hide - and - seek again and had
decided to play hide - and - seek . Lily hid
Jack said firmly , " and I 'm not going to
. They played hide - and - seek and had so
Let 's play hide - and - seek !" Lily was
loved to play hide - and - seek with her friends
I 'm playing hide - and - seek !"
much fun playing hide - and - seek . Tim my
Lily was playing hide - and - seek with her friends
she would play hide - and - seek with her friends
friends were playing hide - and - seek behind the old
they were playing hide - and - seek when Lily couldn

MIN ACTIVATIONS
MIN = -16.998

We have different likes and disl ikes . We can be
, but only with our supervision . And maybe you can
to use it with adult supervision ," Mr . Lee said
a broom and a dust pan and start to clean the
her little broom and dust pan , trying to get all
m op and a dust pan and cleaned up the dough
a broom and a dust pan . You have to clean
of paper and a dust pan and they cleaned up the
a broom and a dust pan . " Can
you should do something more constructive with those mar bles .
Lily . It was my fault . I was not careful
trust and care , and that breaking or losing things can
. It 's not your fault . Fl uffy was old
and forgiveness . It shows that lending and borrowing things requires
hurts Ben . She says that brothers should be kind and
. It 's not your fault . The rope was old
and said goodbye . From that day on , Lily knew
a broom and a dust pan . They help him sweep
" It 's your fault , you crashed into me
she accidentally broke the sign that said " Do not touch

INTERVAL 9.634 - 18.511
CONTAINS 0.155%

. She was only three years old and she loved playing
smelled delicious . Har per and her mom had
ier . They walk back home . They are
on tight as the pony tro tted around the park .
he decided to take a walk down to the beach .

INTERVAL 0.757 - 9.634
CONTAINS 38.406%

more hearts and put them on the floor . They made
dad spent the whole day visiting the dragon . They laughed
a good idea and they went to the neighbor 's house
he knew it , the school day was over . He
said , " Yes , please !". So , Joe drove

INTERVAL -8.121 - 0.757
CONTAINS 60.724%

carrots . Bob had an idea . He would
to feel a little better . But then ,
Her mom was so happy and they both hugged . The
Mia 's paper . She is using the red cr ay
. She ran outside to say hello to the pony .

INTERVAL -16.998 - -8.121
CONTAINS 0.714%

. He was so curious that he wanted to find out
that day . They learned that sometimes it is good to
they continued walking . Even though Joe couldn 't help the
want to be mean like that bird . I want to
new friend . She knew that sometimes it takes a little

PER-TOKEN BREAKDOWN

Token voice
Token ID3809
Feature activation+2.88e+00

Pos logit contributions
ust+0.254
 centuries+0.238
ur+0.228
Ar+0.228
abb+0.221
Neg logit contributions
 sees-0.185
-0.181
 "-0.174
 draws-0.172
 understands-0.170
Token.
Token ID13
Feature activation+1.14e+00

Pos logit contributions
ust+0.102
Ar+0.099
 centuries+0.099
abb+0.093
ur+0.092
Neg logit contributions
 sees-0.070
 means-0.069
-0.066
 understands-0.066
 draws-0.065
Token
Token ID198
Feature activation+3.16e+00

Pos logit contributions
Ar+0.037
 centuries+0.036
ur+0.035
 ever+0.034
ust+0.034
Neg logit contributions
-0.027
 disl-0.026
liness-0.026
 draws-0.026
Fighting-0.026
Token
Token ID198
Feature activation+3.16e+00

Pos logit contributions
Ar+0.105
 centuries+0.102
ust+0.096
 ever+0.096
ur+0.095
Neg logit contributions
-0.077
 understands-0.075
 draws-0.074
 means-0.073
liness-0.073
Token"
Token ID1
Feature activation-7.03e+00

Pos logit contributions
Ar+0.108
 centuries+0.098
 ever+0.091
Flo+0.090
 ho+0.090
Neg logit contributions
 understands-0.077
liness-0.075
 means-0.073
 draws-0.072
 disl-0.071
TokenHo
Token ID28900
Feature activation+1.85e+01

Pos logit contributions
+0.165
 understands+0.161
 sees+0.157
 draws+0.156
 means+0.154
Neg logit contributions
ust-0.245
 centuries-0.243
Ar-0.235
ur-0.234
 ever-0.232
Token ho
Token ID8169
Feature activation+1.29e+01

Pos logit contributions
ust+0.329
 centuries+0.313
Ar+0.313
and+0.275
 prey+0.272
Neg logit contributions
-0.791
 sees-0.763
 understands-0.750
 draws-0.740
 means-0.729
Token ho
Token ID8169
Feature activation+1.29e+01

Pos logit contributions
ust+0.292
 centuries+0.280
Ar+0.272
 prey+0.246
and+0.242
Neg logit contributions
-0.499
 sees-0.484
 understands-0.469
 means-0.452
 draws-0.452
Token!
Token ID0
Feature activation-4.96e-01

Pos logit contributions
ust+0.262
 centuries+0.251
Ar+0.240
and+0.211
 prey+0.211
Neg logit contributions
-0.535
 sees-0.514
 understands-0.495
 means-0.481
 draws-0.476
Token Merry
Token ID39001
Feature activation+6.83e+00

Pos logit contributions
+0.012
 understands+0.011
 sees+0.011
 draws+0.011
 means+0.011
Neg logit contributions
ust-0.017
 centuries-0.017
Ar-0.016
 ever-0.016
ur-0.016
Token Christmas
Token ID6786
Feature activation+4.55e+00

Pos logit contributions
ust+0.218
 centuries+0.199
Ar+0.195
and+0.189
 prey+0.188
Neg logit contributions
-0.207
 sees-0.203
 draws-0.197
 understands-0.192
 means-0.190
Token.
Token ID13
Feature activation+9.07e-01

Pos logit contributions
Ar+0.257
ust+0.244
addle+0.226
abb+0.225
 centuries+0.225
Neg logit contributions
 "-0.113
-0.101
:"-0.099
 the-0.098
 sees-0.096
Token He
Token ID679
Feature activation+1.17e+00

Pos logit contributions
 centuries+0.032
ur+0.032
Ar+0.031
ust+0.031
 ever+0.031
Neg logit contributions
 draws-0.020
 understands-0.019
-0.019
 sees-0.018
Fighting-0.018
Token said
Token ID531
Feature activation-1.74e+00

Pos logit contributions
 centuries+0.048
Ar+0.048
ust+0.048
 ever+0.047
ur+0.047
Neg logit contributions
 understands-0.018
-0.017
 draws-0.017
 sees-0.016
 means-0.016
Token,
Token ID11
Feature activation-1.04e+00

Pos logit contributions
 draws+0.038
 understands+0.038
+0.036
 sees+0.036
Fighting+0.036
Neg logit contributions
ur-0.062
 centuries-0.061
 ever-0.060
ust-0.059
 ho-0.059
Token "
Token ID366
Feature activation-5.74e+00

Pos logit contributions
 draws+0.012
 understands+0.012
 sees+0.011
liness+0.011
 means+0.010
Neg logit contributions
 ho-0.047
ur-0.047
 centuries-0.046
 ever-0.046
Ar-0.046
TokenHo
Token ID28900
Feature activation+1.85e+01

Pos logit contributions
+0.125
 sees+0.112
 understands+0.112
 means+0.111
Fighting+0.108
Neg logit contributions
 centuries-0.213
ust-0.211
 eyel-0.206
ur-0.206
 prey-0.204
Token ho
Token ID8169
Feature activation+1.31e+01

Pos logit contributions
ust+0.293
Ar+0.283
 centuries+0.275
 prey+0.238
 souven+0.238
Neg logit contributions
-0.821
 sees-0.784
 understands-0.776
 draws-0.766
 "-0.753
Token ho
Token ID8169
Feature activation+1.31e+01

Pos logit contributions
ust+0.278
Ar+0.274
 centuries+0.271
 prey+0.243
 souven+0.238
Neg logit contributions
-0.507
 sees-0.481
 understands-0.477
 draws-0.461
 means-0.457
Token!
Token ID0
Feature activation-6.25e-01

Pos logit contributions
ust+0.253
Ar+0.250
 centuries+0.247
 prey+0.216
 souven+0.214
Neg logit contributions
-0.536
 sees-0.505
 understands-0.497
 draws-0.480
 "-0.478
Token Don
Token ID2094
Feature activation-2.38e+00

Pos logit contributions
+0.014
 understands+0.013
 means+0.013
 sees+0.013
 "+0.013
Neg logit contributions
ust-0.023
 centuries-0.022
Ar-0.021
 prey-0.021
 ever-0.021
Token't
Token ID470
Feature activation+3.89e+00

Pos logit contributions
+0.070
 means+0.068
 understands+0.067
 sees+0.067
 draws+0.064
Neg logit contributions
ust-0.069
 centuries-0.067
Ar-0.066
 prey-0.065
 souven-0.064
Token to
Token ID284
Feature activation+8.91e-02

Pos logit contributions
Ar+0.151
 centuries+0.144
ust+0.139
 souven+0.134
 eyel+0.132
Neg logit contributions
 sees-0.064
 "-0.064
 realized-0.063
 understands-0.062
 means-0.062
Token reach
Token ID3151
Feature activation+5.16e-02

Pos logit contributions
ust+0.003
 centuries+0.003
Ar+0.003
and+0.003
ur+0.003
Neg logit contributions
 sees-0.002
 realized-0.002
 saw-0.002
 the-0.002
 it-0.002
Token.
Token ID13
Feature activation+4.37e-01

Pos logit contributions
ust+0.002
Ar+0.002
 centuries+0.002
ur+0.002
ider+0.002
Neg logit contributions
 it-0.001
 "-0.001
-0.001
 sees-0.001
 means-0.001
Token
Token ID198
Feature activation+2.49e+00

Pos logit contributions
Ar+0.016
 centuries+0.016
ust+0.016
ur+0.016
 ho+0.016
Neg logit contributions
 understands-0.009
 draws-0.008
 sees-0.008
-0.008
 means-0.008
Token
Token ID198
Feature activation+2.46e+00

Pos logit contributions
ust+0.103
 centuries+0.102
ur+0.099
 ho+0.099
 ever+0.098
Neg logit contributions
-0.041
 understands-0.040
 sees-0.039
 means-0.037
 draws-0.037
TokenHo
Token ID28900
Feature activation+1.78e+01

Pos logit contributions
ust+0.064
 centuries+0.064
 ever+0.064
ur+0.061
 prey+0.060
Neg logit contributions
-0.083
 realized-0.076
 sees-0.075
 understands-0.074
 draws-0.074
Tokenpper
Token ID2848
Feature activation+2.15e+00

Pos logit contributions
ust+0.337
 centuries+0.329
Ar+0.314
 ever+0.296
ur+0.285
Neg logit contributions
-0.720
 sees-0.692
 understands-0.689
 draws-0.680
 means-0.670
Token asked
Token ID1965
Feature activation-2.56e+00

Pos logit contributions
ur+0.081
 ho+0.080
 ever+0.079
 centuries+0.077
ust+0.076
Neg logit contributions
Fighting-0.042
 disl-0.039
 understands-0.039
 draws-0.037
liness-0.037
Token his
Token ID465
Feature activation+3.24e+00

Pos logit contributions
+0.060
 understands+0.058
 "+0.058
 means+0.058
 sees+0.058
Neg logit contributions
ust-0.093
 centuries-0.087
Ar-0.085
 ever-0.084
ider-0.082
Token friend
Token ID1545
Feature activation-2.98e-01

Pos logit contributions
 centuries+0.113
ust+0.111
Ar+0.106
ur+0.103
 ever+0.102
Neg logit contributions
 understands-0.080
 sees-0.079
-0.076
 "-0.074
 realized-0.074
Token,
Token ID11
Feature activation-1.44e+00

Pos logit contributions
+0.005
 understands+0.005
 sees+0.005
 draws+0.005
 means+0.005
Neg logit contributions
ust-0.012
 centuries-0.012
Ar-0.012
 ever-0.011
ur-0.011
Token sack
Token ID23704
Feature activation-1.70e+00

Pos logit contributions
 understands+0.174
 means+0.173
 disl+0.169
 draws+0.164
+0.161
Neg logit contributions
 centuries-0.186
 prey-0.172
 ho-0.169
Ar-0.168
 souven-0.167
Token.
Token ID13
Feature activation+6.91e-01

Pos logit contributions
+0.046
 understands+0.046
 sees+0.045
 draws+0.044
 means+0.044
Neg logit contributions
ust-0.052
 centuries-0.051
Ar-0.050
ur-0.049
 ever-0.049
Token
Token ID198
Feature activation+2.45e+00

Pos logit contributions
 centuries+0.029
ur+0.029
Ar+0.028
ust+0.028
 ever+0.027
Neg logit contributions
-0.010
 understands-0.009
 sees-0.009
 draws-0.009
Fighting-0.009
Token
Token ID198
Feature activation+2.44e+00

Pos logit contributions
Ar+0.084
 centuries+0.083
ust+0.080
 ever+0.079
ur+0.078
Neg logit contributions
-0.057
 understands-0.056
 draws-0.056
 means-0.055
 sees-0.054
Token"
Token ID1
Feature activation-7.78e+00

Pos logit contributions
Ar+0.090
 centuries+0.086
ust+0.085
ur+0.083
 ever+0.083
Neg logit contributions
 understands-0.052
 means-0.051
 draws-0.051
 sees-0.049
liness-0.048
TokenHo
Token ID28900
Feature activation+1.75e+01

Pos logit contributions
+0.162
 understands+0.154
 sees+0.152
 draws+0.149
 means+0.148
Neg logit contributions
ust-0.294
 centuries-0.294
ur-0.288
 ho-0.279
 ever-0.279
Token ho
Token ID8169
Feature activation+1.22e+01

Pos logit contributions
Ar+0.354
ust+0.349
 centuries+0.342
and+0.308
 souven+0.306
Neg logit contributions
-0.695
 sees-0.673
 understands-0.658
 draws-0.645
 "-0.637
Token ho
Token ID8169
Feature activation+1.22e+01

Pos logit contributions
 centuries+0.321
ust+0.320
Ar+0.316
 prey+0.285
 eyel+0.283
Neg logit contributions
-0.412
 sees-0.393
 understands-0.385
 means-0.366
 draws-0.366
Token!
Token ID0
Feature activation-1.04e+00

Pos logit contributions
 centuries+0.300
ust+0.297
Ar+0.293
 eyel+0.263
 souven+0.259
Neg logit contributions
-0.440
 sees-0.415
 understands-0.402
 means-0.384
 realized-0.383
Token Merry
Token ID39001
Feature activation+5.94e+00

Pos logit contributions
+0.023
 understands+0.022
 sees+0.022
 means+0.021
 draws+0.021
Neg logit contributions
ust-0.038
 centuries-0.038
Ar-0.036
 ever-0.036
ur-0.036
Token Christmas
Token ID6786
Feature activation+3.74e+00

Pos logit contributions
ust+0.185
 centuries+0.176
Ar+0.174
 prey+0.164
and+0.164
Neg logit contributions
-0.179
 sees-0.172
 draws-0.167
 understands-0.165
 means-0.161
Token.
Token ID13
Feature activation+6.63e-01

Pos logit contributions
+0.023
 understands+0.022
 sees+0.022
 "+0.021
 means+0.021
Neg logit contributions
 centuries-0.031
ust-0.031
Ar-0.030
ur-0.029
 ever-0.029
Token He
Token ID679
Feature activation+2.62e-01

Pos logit contributions
ust+0.026
 centuries+0.026
Ar+0.025
 ever+0.025
ur+0.025
Neg logit contributions
-0.012
 understands-0.012
 sees-0.012
 means-0.011
 draws-0.011
Token said
Token ID531
Feature activation-2.39e+00

Pos logit contributions
 centuries+0.011
ust+0.011
Ar+0.011
 ever+0.011
ur+0.011
Neg logit contributions
-0.004
 understands-0.003
 draws-0.003
 sees-0.003
 means-0.003
Token,
Token ID11
Feature activation-1.51e+00

Pos logit contributions
+0.048
 understands+0.047
 draws+0.047
 sees+0.046
 means+0.045
Neg logit contributions
 centuries-0.088
ust-0.087
ur-0.087
 ever-0.086
 ho-0.085
Token "
Token ID366
Feature activation-6.41e+00

Pos logit contributions
 understands+0.017
 draws+0.016
 sees+0.015
 means+0.014
liness+0.014
Neg logit contributions
 ho-0.069
ur-0.068
 centuries-0.067
Ar-0.067
 ever-0.066
TokenHo
Token ID28900
Feature activation+1.74e+01

Pos logit contributions
+0.152
 understands+0.139
 means+0.135
 draws+0.134
 sees+0.134
Neg logit contributions
 centuries-0.227
ur-0.223
ust-0.223
 eyel-0.221
 prey-0.220
Token ho
Token ID8169
Feature activation+1.21e+01

Pos logit contributions
ust+0.396
Ar+0.377
 centuries+0.377
 prey+0.345
 ever+0.338
Neg logit contributions
-0.649
 sees-0.625
 understands-0.620
 draws-0.607
 means-0.597
Token ho
Token ID8169
Feature activation+1.21e+01

Pos logit contributions
ust+0.332
 centuries+0.324
Ar+0.317
 prey+0.298
 ever+0.291
Neg logit contributions
-0.391
 sees-0.374
 understands-0.374
 means-0.362
 draws-0.361
Token,
Token ID11
Feature activation-1.47e+00

Pos logit contributions
ust+0.307
 centuries+0.298
Ar+0.290
 prey+0.265
 souven+0.260
Neg logit contributions
-0.427
 sees-0.403
 understands-0.397
 means-0.388
 draws-0.382
Token hello
Token ID23748
Feature activation+4.02e+00

Pos logit contributions
+0.037
 sees+0.037
 understands+0.036
 "+0.036
 means+0.035
Neg logit contributions
ust-0.049
 centuries-0.047
Ar-0.046
abb-0.045
ur-0.045
Token Lily
Token ID20037
Feature activation-1.26e+00

Pos logit contributions
 centuries+0.175
Ar+0.170
ust+0.169
 eyel+0.160
addle+0.160
Neg logit contributions
-0.071
 "-0.067
 sees-0.066
 means-0.060
 understands-0.060
Token light
Token ID1657
Feature activation-1.00e-01

Pos logit contributions
 centuries+0.041
ust+0.040
Ar+0.040
 ho+0.040
 ever+0.039
Neg logit contributions
-0.029
 understands-0.028
 means-0.027
 draws-0.027
 sees-0.027
Token.
Token ID13
Feature activation+6.19e-01

Pos logit contributions
+0.002
 sees+0.002
 understands+0.002
 draws+0.002
 means+0.002
Neg logit contributions
ust-0.003
 centuries-0.003
Ar-0.003
ur-0.003
abb-0.003
Token
Token ID198
Feature activation+2.19e+00

Pos logit contributions
ust+0.024
 centuries+0.024
Ar+0.023
ur+0.023
 ever+0.023
Neg logit contributions
-0.012
 understands-0.012
 sees-0.011
 draws-0.011
 means-0.011
Token
Token ID198
Feature activation+2.15e+00

Pos logit contributions
 centuries+0.080
ust+0.080
Ar+0.079
ur+0.077
 ever+0.076
Neg logit contributions
-0.046
 understands-0.045
 draws-0.044
 sees-0.044
 means-0.043
Token"
Token ID1
Feature activation-8.05e+00

Pos logit contributions
ust+0.071
 centuries+0.071
Ar+0.069
ur+0.067
 ever+0.067
Neg logit contributions
 understands-0.053
-0.053
 sees-0.051
 draws-0.051
 means-0.051
TokenHo
Token ID28900
Feature activation+1.73e+01

Pos logit contributions
+0.152
Fighting+0.133
 draws+0.132
 means+0.129
 understands+0.127
Neg logit contributions
ur-0.325
ust-0.315
 eyel-0.313
 centuries-0.312
 prey-0.311
Token,
Token ID11
Feature activation-1.59e+00

Pos logit contributions
ust+0.425
Ar+0.413
 centuries+0.412
 prey+0.382
 ever+0.377
Neg logit contributions
-0.606
 sees-0.578
 understands-0.574
 draws-0.564
 means-0.554
Token ho
Token ID8169
Feature activation+1.21e+01

Pos logit contributions
+0.040
 understands+0.039
 sees+0.039
 means+0.038
 draws+0.037
Neg logit contributions
ust-0.052
 centuries-0.051
Ar-0.050
ur-0.049
 prey-0.049
Token,
Token ID11
Feature activation-1.60e+00

Pos logit contributions
ust+0.346
 centuries+0.343
Ar+0.334
 prey+0.320
 ever+0.318
Neg logit contributions
-0.366
 understands-0.349
 sees-0.346
 draws-0.338
 means-0.337
Token ho
Token ID8169
Feature activation+1.21e+01

Pos logit contributions
+0.041
 sees+0.040
 understands+0.040
 means+0.039
 "+0.038
Neg logit contributions
ust-0.052
 centuries-0.051
Ar-0.050
 prey-0.049
ur-0.049
Token!
Token ID0
Feature activation-8.91e-01

Pos logit contributions
ust+0.353
 centuries+0.348
Ar+0.342
 prey+0.326
 eyel+0.318
Neg logit contributions
-0.368
 sees-0.338
 understands-0.337
 means-0.327
 draws-0.324
Token She
Token ID1375
Feature activation+5.29e-01

Pos logit contributions
 centuries+0.019
and+0.019
ur+0.018
Ar+0.018
Flo+0.018
Neg logit contributions
 realized-0.021
 sees-0.020
 understands-0.020
edience-0.020
 disl-0.020
Token put
Token ID1234
Feature activation-4.18e+00

Pos logit contributions
 ever+0.020
ust+0.019
 centuries+0.019
 eyel+0.019
Ar+0.019
Neg logit contributions
-0.011
Fighting-0.010
 disl-0.010
liness-0.010
 understands-0.009
Token some
Token ID617
Feature activation+1.39e+00

Pos logit contributions
 disl+0.101
 understands+0.099
+0.098
Fighting+0.094
:"+0.092
Neg logit contributions
 ever-0.146
 centuries-0.140
 souven-0.139
 ho-0.137
Ar-0.128
Token on
Token ID319
Feature activation+5.21e+00

Pos logit contributions
 centuries+0.046
ust+0.046
Ar+0.044
 ever+0.044
ur+0.044
Neg logit contributions
-0.034
 understands-0.034
 sees-0.033
 draws-0.032
 means-0.032
Token her
Token ID607
Feature activation+3.66e+00

Pos logit contributions
 centuries+0.196
ust+0.192
 ever+0.190
Ar+0.187
 ho+0.187
Neg logit contributions
-0.106
 understands-0.105
 sees-0.099
 means-0.097
 draws-0.096
Token eyel
Token ID29708
Feature activation+1.67e+01

Pos logit contributions
 centuries+0.109
 eyel+0.108
ust+0.107
 ever+0.106
 ho+0.105
Neg logit contributions
 understands-0.097
-0.096
 means-0.096
Fighting-0.094
 draws-0.091
Tokenids
Token ID2340
Feature activation-7.68e-01

Pos logit contributions
Ar+0.539
 centuries+0.511
 prey+0.501
ust+0.483
 souven+0.478
Neg logit contributions
 draws-0.471
 sees-0.451
 understands-0.450
:"-0.424
 "-0.418
Token and
Token ID290
Feature activation+1.44e-01

Pos logit contributions
 understands+0.021
+0.020
 disl+0.020
Fighting+0.019
 draws+0.019
Neg logit contributions
 ever-0.024
 centuries-0.024
 ho-0.023
ust-0.023
ur-0.023
Token cheeks
Token ID25839
Feature activation+2.75e+00

Pos logit contributions
 ho+0.005
 eyel+0.004
 souven+0.004
 centuries+0.004
 ever+0.004
Neg logit contributions
Fighting-0.004
-0.004
:"-0.004
liness-0.004
 understands-0.004
Token.
Token ID13
Feature activation+7.82e-01

Pos logit contributions
 centuries+0.073
 ever+0.073
ust+0.072
 ho+0.072
ur+0.069
Neg logit contributions
 understands-0.085
-0.082
 sees-0.080
 draws-0.080
 means-0.080
Token It
Token ID632
Feature activation-7.09e+00

Pos logit contributions
 centuries+0.023
and+0.022
ur+0.022
 eyel+0.022
Flo+0.021
Neg logit contributions
 sees-0.020
 realized-0.020
 understands-0.019
 disl-0.019
-0.019
Token.
Token ID13
Feature activation+7.89e-01

Pos logit contributions
 means+0.059
 sees+0.056
 understands+0.056
+0.056
 "+0.055
Neg logit contributions
ust-0.102
Ar-0.101
 centuries-0.097
abb-0.095
ur-0.094
Token He
Token ID679
Feature activation+8.84e-01

Pos logit contributions
 centuries+0.029
ust+0.029
Ar+0.029
ur+0.028
 ever+0.028
Neg logit contributions
-0.016
 understands-0.016
 sees-0.016
 draws-0.016
 means-0.015
Token blinked
Token ID40360
Feature activation+4.56e-02

Pos logit contributions
ust+0.038
 centuries+0.037
Ar+0.037
ur+0.036
 ever+0.036
Neg logit contributions
-0.014
 understands-0.013
 sees-0.013
 draws-0.012
 means-0.012
Token his
Token ID465
Feature activation+3.85e+00

Pos logit contributions
 centuries+0.001
ust+0.001
 ho+0.001
 ever+0.001
 prey+0.001
Neg logit contributions
 understands-0.001
-0.001
 draws-0.001
 means-0.001
Fighting-0.001
Token long
Token ID890
Feature activation+4.83e+00

Pos logit contributions
ust+0.139
 centuries+0.135
Ar+0.134
ur+0.129
 ever+0.129
Neg logit contributions
-0.091
 understands-0.090
 sees-0.090
 draws-0.083
 "-0.081
Token eyel
Token ID29708
Feature activation+1.66e+01

Pos logit contributions
ust+0.151
 centuries+0.143
Ar+0.141
ur+0.139
 ever+0.136
Neg logit contributions
-0.134
 sees-0.132
 understands-0.129
 draws-0.129
 means-0.125
Tokenashes
Token ID7465
Feature activation+4.03e-01

Pos logit contributions
Ar+0.573
 centuries+0.548
 prey+0.516
 souven+0.510
ust+0.508
Neg logit contributions
 draws-0.417
 sees-0.415
 understands-0.412
:"-0.380
-0.370
Token and
Token ID290
Feature activation+1.28e-01

Pos logit contributions
 centuries+0.014
 ever+0.014
 ho+0.013
ust+0.013
ur+0.013
Neg logit contributions
-0.009
 understands-0.009
 draws-0.009
 sees-0.009
 disl-0.009
Token kept
Token ID4030
Feature activation-5.21e+00

Pos logit contributions
 ever+0.005
 centuries+0.005
 ho+0.005
Ar+0.005
ur+0.005
Neg logit contributions
 understands-0.002
-0.002
Fighting-0.002
 means-0.002
 draws-0.002
Token hopping
Token ID47153
Feature activation+7.41e+00

Pos logit contributions
 sees+0.134
+0.130
 "+0.128
 understands+0.128
 draws+0.127
Neg logit contributions
ust-0.178
 centuries-0.164
Ar-0.163
ur-0.159
abb-0.159
Token.
Token ID13
Feature activation+8.41e-01

Pos logit contributions
ust+0.330
 centuries+0.319
Ar+0.313
ur+0.304
 eyel+0.303
Neg logit contributions
-0.100
 "-0.099
 sees-0.094
 realized-0.090
 means-0.087
Token They
Token ID1119
Feature activation+4.69e-01

Pos logit contributions
ust+0.029
 centuries+0.029
Ar+0.029
 ever+0.028
 ho+0.028
Neg logit contributions
-0.014
 understands-0.014
 sees-0.013
 draws-0.013
 means-0.013
Token started
Token ID2067
Feature activation+1.28e+00

Pos logit contributions
ust+0.015
 ever+0.015
 centuries+0.015
ur+0.015
 ho+0.014
Neg logit contributions
-0.012
 understands-0.012
 means-0.012
 draws-0.012
 sees-0.011
Token playing
Token ID2712
Feature activation+1.51e+00

Pos logit contributions
 ever+0.037
 centuries+0.036
ur+0.036
 ho+0.036
ust+0.035
Neg logit contributions
-0.038
 understands-0.036
 draws-0.035
 means-0.035
 sees-0.034
Token hide
Token ID7808
Feature activation+6.94e+00

Pos logit contributions
 ever+0.039
 centuries+0.039
 prey+0.038
Ar+0.037
 souven+0.036
Neg logit contributions
 understands-0.046
-0.046
 draws-0.045
 means-0.044
 disl-0.042
Token-
Token ID12
Feature activation+6.05e+00

Pos logit contributions
ust+0.286
Ar+0.283
 centuries+0.282
ur+0.272
 eyel+0.267
Neg logit contributions
-0.116
 sees-0.110
 understands-0.108
 realized-0.104
 means-0.104
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
 centuries+0.075
 ho+0.071
ust+0.068
 prey+0.067
 eyel+0.065
Neg logit contributions
-0.266
 understands-0.263
 draws-0.263
 sees-0.262
 disl-0.262
Token-
Token ID12
Feature activation+6.09e+00

Pos logit contributions
ust+0.461
 centuries+0.459
Ar+0.445
ur+0.424
 prey+0.422
Neg logit contributions
-0.494
 understands-0.486
 sees-0.476
 draws-0.465
 means-0.465
Tokenseek
Token ID36163
Feature activation+2.55e+00

Pos logit contributions
ust+0.124
 centuries+0.122
Ar+0.115
 ho+0.113
 prey+0.111
Neg logit contributions
-0.229
 understands-0.225
 sees-0.221
 draws-0.220
 means-0.219
Token again
Token ID757
Feature activation-1.13e+00

Pos logit contributions
ust+0.097
 centuries+0.095
Ar+0.094
ur+0.092
 ho+0.090
Neg logit contributions
-0.051
 understands-0.050
 sees-0.049
 draws-0.048
 means-0.048
Token and
Token ID290
Feature activation+1.71e-01

Pos logit contributions
+0.024
 understands+0.023
 sees+0.023
 means+0.022
 draws+0.022
Neg logit contributions
ust-0.042
 centuries-0.042
Ar-0.041
ur-0.040
 ever-0.039
Token had
Token ID550
Feature activation-1.86e+00

Pos logit contributions
 ever+0.006
 centuries+0.006
ur+0.005
ust+0.005
 ho+0.005
Neg logit contributions
-0.004
 understands-0.004
 means-0.004
 draws-0.004
:"-0.004
Token decided
Token ID3066
Feature activation-1.56e-01

Pos logit contributions
ust+0.023
 centuries+0.023
 ever+0.022
Ar+0.022
ur+0.022
Neg logit contributions
-0.014
 understands-0.014
 means-0.013
 sees-0.013
 draws-0.013
Token to
Token ID284
Feature activation+6.30e-01

Pos logit contributions
+0.005
 draws+0.005
 understands+0.005
 End+0.005
 sees+0.005
Neg logit contributions
 ever-0.004
ust-0.003
 souven-0.003
 centuries-0.003
ur-0.003
Token play
Token ID711
Feature activation+2.29e+00

Pos logit contributions
ust+0.023
 centuries+0.022
 prey+0.022
Ar+0.021
and+0.021
Neg logit contributions
 draws-0.014
 sees-0.013
 means-0.013
-0.013
 understands-0.013
Token hide
Token ID7808
Feature activation+7.03e+00

Pos logit contributions
 centuries+0.057
 ever+0.055
 prey+0.053
Ar+0.052
 souven+0.051
Neg logit contributions
 understands-0.077
-0.076
 means-0.070
 draws-0.070
 disl-0.070
Token-
Token ID12
Feature activation+6.19e+00

Pos logit contributions
ust+0.262
 centuries+0.259
Ar+0.256
ur+0.252
 ho+0.247
Neg logit contributions
-0.146
 understands-0.139
 sees-0.139
 means-0.136
 draws-0.133
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
 ho+0.087
 prey+0.079
 centuries+0.076
ust+0.076
 eyel+0.073
Neg logit contributions
 draws-0.267
-0.263
 sees-0.261
Fighting-0.257
 means-0.256
Token-
Token ID12
Feature activation+6.22e+00

Pos logit contributions
ust+0.482
 centuries+0.479
Ar+0.473
 ho+0.450
 prey+0.449
Neg logit contributions
-0.469
 understands-0.458
 sees-0.455
 draws-0.442
 means-0.440
Tokenseek
Token ID36163
Feature activation+2.49e+00

Pos logit contributions
ust+0.137
 centuries+0.131
 ho+0.127
 prey+0.124
Ar+0.123
Neg logit contributions
-0.226
 draws-0.219
 understands-0.219
 sees-0.218
 means-0.217
Token.
Token ID13
Feature activation+9.57e-01

Pos logit contributions
ust+0.094
ur+0.092
Ar+0.092
abb+0.090
 centuries+0.088
Neg logit contributions
 "-0.050
 sees-0.050
-0.049
 means-0.048
 understands-0.047
Token Lily
Token ID20037
Feature activation-2.16e-01

Pos logit contributions
 centuries+0.029
ust+0.029
ur+0.029
Ar+0.028
and+0.028
Neg logit contributions
 understands-0.025
 sees-0.024
 means-0.023
-0.023
edience-0.023
Token hid
Token ID24519
Feature activation+5.76e-01

Pos logit contributions
Fighting+0.005
+0.005
 disl+0.005
 understands+0.004
 means+0.004
Neg logit contributions
 ever-0.008
ur-0.007
 ho-0.007
Ar-0.007
 centuries-0.007
Token Jack
Token ID3619
Feature activation-1.32e-01

Pos logit contributions
+0.132
 understands+0.131
 draws+0.128
 sees+0.128
liness+0.122
Neg logit contributions
 ever-0.104
 centuries-0.103
ust-0.097
 ho-0.096
Ar-0.095
Token said
Token ID531
Feature activation-1.77e+00

Pos logit contributions
+0.002
 understands+0.002
 sees+0.002
 draws+0.002
 means+0.002
Neg logit contributions
ust-0.006
 centuries-0.006
 ever-0.005
ur-0.005
Ar-0.005
Token firmly
Token ID14245
Feature activation-2.02e+00

Pos logit contributions
+0.043
 "+0.041
 understands+0.041
 means+0.040
 realized+0.040
Neg logit contributions
Ar-0.061
ust-0.060
 centuries-0.060
 prey-0.057
abb-0.056
Token,
Token ID11
Feature activation-1.06e+00

Pos logit contributions
 understands+0.060
+0.059
 sees+0.058
 draws+0.058
 means+0.056
Neg logit contributions
ust-0.056
 centuries-0.055
ur-0.054
 ever-0.053
 ho-0.053
Token "
Token ID366
Feature activation-5.81e+00

Pos logit contributions
 understands+0.012
+0.011
 sees+0.011
 draws+0.011
 means+0.011
Neg logit contributions
 centuries-0.049
ust-0.048
 ever-0.047
ur-0.047
 ho-0.047
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
+0.128
Fighting+0.115
 means+0.114
 understands+0.113
 draws+0.111
Neg logit contributions
 centuries-0.216
ur-0.213
 ho-0.208
 eyel-0.206
 prey-0.204
Token I
Token ID314
Feature activation+2.50e+00

Pos logit contributions
 ever+0.613
 centuries+0.604
ur+0.599
ust+0.599
 ho+0.591
Neg logit contributions
 understands-0.321
 draws-0.319
-0.316
 sees-0.310
Fighting-0.300
Token'm
Token ID1101
Feature activation-3.04e-01

Pos logit contributions
ust+0.086
 centuries+0.085
Ar+0.083
 ever+0.082
ur+0.082
Neg logit contributions
-0.059
 understands-0.057
 sees-0.056
 draws-0.056
 means-0.055
Token not
Token ID407
Feature activation+1.75e+00

Pos logit contributions
 understands+0.007
+0.007
 sees+0.007
 means+0.007
 "+0.007
Neg logit contributions
ust-0.010
Ar-0.010
 centuries-0.010
ur-0.010
 prey-0.010
Token going
Token ID1016
Feature activation+5.94e+00

Pos logit contributions
ust+0.056
 centuries+0.056
Ar+0.055
ur+0.054
 prey+0.053
Neg logit contributions
-0.044
 understands-0.044
 sees-0.043
 means-0.043
 draws-0.042
Token to
Token ID284
Feature activation+6.22e-01

Pos logit contributions
ust+0.212
 centuries+0.207
Ar+0.203
ur+0.199
 eyel+0.196
Neg logit contributions
-0.137
 sees-0.132
 means-0.127
 understands-0.125
 draws-0.125
Token.
Token ID13
Feature activation+8.55e-01

Pos logit contributions
 means+0.107
 "+0.106
+0.106
 understands+0.105
 sees+0.104
Neg logit contributions
Ar-0.154
ust-0.149
 centuries-0.145
ur-0.142
abb-0.139
Token They
Token ID1119
Feature activation+2.91e-01

Pos logit contributions
 centuries+0.028
ur+0.028
ust+0.028
 eyel+0.028
 ho+0.027
Neg logit contributions
 understands-0.021
-0.020
 sees-0.019
 means-0.019
Fighting-0.018
Token played
Token ID2826
Feature activation+7.27e-01

Pos logit contributions
 ever+0.010
 ho+0.009
ur+0.009
ust+0.009
 centuries+0.009
Neg logit contributions
 understands-0.008
-0.007
 disl-0.007
 means-0.007
Fighting-0.007
Token hide
Token ID7808
Feature activation+6.96e+00

Pos logit contributions
 centuries+0.018
 ever+0.017
Ar+0.017
 prey+0.016
and+0.016
Neg logit contributions
 understands-0.024
-0.024
 means-0.023
 draws-0.023
 sees-0.023
Token-
Token ID12
Feature activation+6.11e+00

Pos logit contributions
ust+0.264
 centuries+0.260
Ar+0.257
ur+0.251
 eyel+0.247
Neg logit contributions
-0.142
 sees-0.134
 understands-0.132
 realized-0.129
 means-0.128
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
 centuries+0.075
 ho+0.073
 prey+0.072
ust+0.071
 eyel+0.064
Neg logit contributions
-0.273
 draws-0.266
 sees-0.260
 understands-0.259
Fighting-0.258
Token-
Token ID12
Feature activation+6.15e+00

Pos logit contributions
ust+0.446
 centuries+0.443
Ar+0.427
ur+0.415
 ever+0.413
Neg logit contributions
-0.507
 understands-0.496
 sees-0.487
 draws-0.483
 means-0.478
Tokenseek
Token ID36163
Feature activation+2.47e+00

Pos logit contributions
ust+0.126
 centuries+0.123
Ar+0.115
 ho+0.114
 prey+0.113
Neg logit contributions
-0.232
 understands-0.225
 draws-0.223
 sees-0.222
 means-0.220
Token and
Token ID290
Feature activation+1.98e-01

Pos logit contributions
ust+0.102
 centuries+0.099
ur+0.099
Ar+0.099
abb+0.096
Neg logit contributions
-0.042
 sees-0.041
 understands-0.040
 "-0.039
 means-0.038
Token had
Token ID550
Feature activation-1.78e+00

Pos logit contributions
 centuries+0.007
ust+0.007
Ar+0.007
 ever+0.007
ur+0.007
Neg logit contributions
-0.004
 understands-0.004
 sees-0.004
 draws-0.004
 means-0.004
Token so
Token ID523
Feature activation+4.43e+00

Pos logit contributions
 understands+0.049
+0.048
 sees+0.048
 draws+0.047
 realized+0.047
Neg logit contributions
ust-0.054
 centuries-0.052
ur-0.052
Ar-0.051
 ho-0.050
TokenLet
Token ID5756
Feature activation-4.77e+00

Pos logit contributions
+0.152
 understands+0.140
 draws+0.140
Fighting+0.138
 sees+0.137
Neg logit contributions
 centuries-0.195
ust-0.190
ur-0.189
 ho-0.187
 prey-0.184
Token's
Token ID338
Feature activation-3.65e-01

Pos logit contributions
+0.092
 understands+0.092
 disl+0.088
 draws+0.086
Fighting+0.086
Neg logit contributions
ust-0.180
 ever-0.175
 ho-0.171
 centuries-0.170
ct-0.168
Token play
Token ID711
Feature activation+2.21e+00

Pos logit contributions
+0.009
 draws+0.009
 sees+0.008
 understands+0.008
 means+0.008
Neg logit contributions
ust-0.013
 centuries-0.012
Ar-0.012
ur-0.012
 prey-0.012
Token hide
Token ID7808
Feature activation+6.99e+00

Pos logit contributions
 centuries+0.056
 ever+0.054
Ar+0.053
ust+0.053
 prey+0.052
Neg logit contributions
 understands-0.072
-0.071
 draws-0.067
 means-0.067
 sees-0.067
Token-
Token ID12
Feature activation+6.17e+00

Pos logit contributions
ust+0.248
 centuries+0.246
Ar+0.239
ur+0.236
 ever+0.235
Neg logit contributions
-0.157
 understands-0.154
 sees-0.150
 draws-0.148
 means-0.147
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
ust+0.064
 ho+0.063
 centuries+0.062
 prey+0.057
 eyel+0.054
Neg logit contributions
-0.286
 draws-0.284
 sees-0.281
 means-0.278
 understands-0.276
Token-
Token ID12
Feature activation+6.20e+00

Pos logit contributions
ust+0.449
 centuries+0.445
Ar+0.431
ur+0.418
 ever+0.414
Neg logit contributions
-0.505
 understands-0.495
 sees-0.487
 draws-0.481
 means-0.479
Tokenseek
Token ID36163
Feature activation+2.50e+00

Pos logit contributions
 centuries+0.094
Ar+0.090
ur+0.089
 ever+0.089
ust+0.088
Neg logit contributions
 understands-0.267
-0.263
 sees-0.259
 draws-0.257
 means-0.255
Token!"
Token ID2474
Feature activation-4.15e+00

Pos logit contributions
ust+0.089
Ar+0.086
ur+0.086
 centuries+0.084
 eyel+0.083
Neg logit contributions
-0.057
 sees-0.055
 means-0.054
 "-0.053
 understands-0.053
Token Lily
Token ID20037
Feature activation-3.24e-01

Pos logit contributions
 understands+0.112
+0.108
 sees+0.107
 means+0.104
 disl+0.102
Neg logit contributions
 centuries-0.122
 eyel-0.122
and-0.121
ust-0.119
ur-0.119
Token was
Token ID373
Feature activation-9.71e-01

Pos logit contributions
+0.007
 understands+0.007
Fighting+0.006
 sees+0.006
 draws+0.006
Neg logit contributions
ust-0.012
 centuries-0.012
ur-0.012
 ever-0.011
Ar-0.011
Token loved
Token ID6151
Feature activation-1.06e+00

Pos logit contributions
 ever+0.027
 centuries+0.026
Ar+0.026
ust+0.026
 eyel+0.025
Neg logit contributions
-0.010
 understands-0.010
 means-0.009
liness-0.009
 sees-0.009
Token to
Token ID284
Feature activation+5.84e-01

Pos logit contributions
+0.028
 understands+0.028
 sees+0.027
 means+0.026
 draws+0.026
Neg logit contributions
 centuries-0.033
ust-0.032
 ever-0.032
Ar-0.032
 ho-0.031
Token play
Token ID711
Feature activation+2.23e+00

Pos logit contributions
ust+0.025
 centuries+0.023
Ar+0.023
 prey+0.022
ur+0.022
Neg logit contributions
 draws-0.012
 sees-0.011
 "-0.010
-0.010
 disl-0.010
Token hide
Token ID7808
Feature activation+6.96e+00

Pos logit contributions
ust+0.068
 centuries+0.066
Ar+0.064
ur+0.063
 ever+0.063
Neg logit contributions
-0.064
 sees-0.061
 draws-0.061
 understands-0.060
 means-0.059
Token-
Token ID12
Feature activation+6.07e+00

Pos logit contributions
ust+0.268
Ar+0.266
 centuries+0.260
ur+0.254
 eyel+0.252
Neg logit contributions
-0.137
 sees-0.129
 realized-0.126
 means-0.119
 draws-0.118
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
 ho+0.105
ust+0.100
 prey+0.099
 centuries+0.097
 eyel+0.093
Neg logit contributions
 disl-0.226
 sees-0.224
 draws-0.222
 that-0.218
 realized-0.217
Token-
Token ID12
Feature activation+6.09e+00

Pos logit contributions
ust+0.531
 centuries+0.523
Ar+0.521
 prey+0.500
 souven+0.482
Neg logit contributions
-0.430
 sees-0.408
 understands-0.399
 realized-0.395
 draws-0.385
Tokenseek
Token ID36163
Feature activation+2.48e+00

Pos logit contributions
ust+0.166
 centuries+0.154
 ho+0.153
 prey+0.150
 souven+0.149
Neg logit contributions
-0.186
 sees-0.182
 draws-0.180
 realized-0.177
 disl-0.177
Token with
Token ID351
Feature activation+3.96e+00

Pos logit contributions
ur+0.102
ust+0.101
Ar+0.099
iri+0.097
abb+0.097
Neg logit contributions
 "-0.048
 the-0.046
 that-0.045
 sees-0.044
-0.043
Token her
Token ID607
Feature activation+3.94e+00

Pos logit contributions
ust+0.201
ider+0.194
ur+0.189
Ar+0.188
 centuries+0.182
Neg logit contributions
 the-0.041
 draws-0.041
 "-0.040
 sees-0.039
 it-0.034
Token friends
Token ID2460
Feature activation+3.27e+00

Pos logit contributions
ust+0.139
 centuries+0.136
ur+0.133
Ar+0.131
abb+0.127
Neg logit contributions
-0.098
 sees-0.096
 understands-0.096
 "-0.091
 draws-0.087
TokenI
Token ID40
Feature activation+3.79e+00

Pos logit contributions
+0.119
 understands+0.105
 draws+0.103
Fighting+0.102
 sees+0.101
Neg logit contributions
 centuries-0.228
ur-0.225
 prey-0.222
 ho-0.219
ust-0.217
Token'm
Token ID1101
Feature activation-4.51e-01

Pos logit contributions
 centuries+0.139
ust+0.139
Ar+0.134
 prey+0.134
ur+0.131
Neg logit contributions
 sees-0.082
 understands-0.081
-0.080
 means-0.077
 draws-0.077
Token playing
Token ID2712
Feature activation+1.50e+00

Pos logit contributions
 sees+0.011
+0.011
 understands+0.011
 means+0.011
 draws+0.011
Neg logit contributions
ust-0.015
Ar-0.015
 centuries-0.015
 prey-0.014
abb-0.014
Token hide
Token ID7808
Feature activation+6.92e+00

Pos logit contributions
 centuries+0.042
ust+0.041
Ar+0.040
 ever+0.039
ur+0.039
Neg logit contributions
 understands-0.045
-0.045
 sees-0.043
 draws-0.043
 means-0.043
Token-
Token ID12
Feature activation+6.04e+00

Pos logit contributions
ust+0.231
 centuries+0.226
Ar+0.226
ur+0.219
 prey+0.214
Neg logit contributions
-0.170
 sees-0.163
 understands-0.163
 means-0.160
 draws-0.157
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
 ho+0.089
 prey+0.082
ust+0.080
 eyel+0.079
 centuries+0.078
Neg logit contributions
 draws-0.247
 sees-0.247
-0.246
 disl-0.245
Fighting-0.240
Token-
Token ID12
Feature activation+6.06e+00

Pos logit contributions
ust+0.487
 centuries+0.475
Ar+0.474
 prey+0.448
ur+0.441
Neg logit contributions
-0.469
 understands-0.454
 sees-0.451
 means-0.440
 draws-0.434
Tokenseek
Token ID36163
Feature activation+2.46e+00

Pos logit contributions
ust+0.138
 centuries+0.128
 ho+0.125
 prey+0.123
Ar+0.122
Neg logit contributions
-0.218
 sees-0.209
 understands-0.209
 draws-0.208
 means-0.207
Token!"
Token ID2474
Feature activation-4.23e+00

Pos logit contributions
ust+0.085
Ar+0.085
ur+0.082
 centuries+0.082
abb+0.080
Neg logit contributions
-0.056
 sees-0.056
 means-0.054
 understands-0.054
 "-0.053
Token
Token ID198
Feature activation+2.75e+00

Pos logit contributions
+0.081
 understands+0.080
 sees+0.077
 draws+0.076
 means+0.075
Neg logit contributions
ust-0.165
 centuries-0.164
Ar-0.160
ur-0.157
 ever-0.157
Token
Token ID198
Feature activation+2.75e+00

Pos logit contributions
ust+0.110
 centuries+0.109
Ar+0.105
 ever+0.105
ur+0.104
Neg logit contributions
-0.052
 understands-0.050
 sees-0.048
 draws-0.047
 means-0.046
Token much
Token ID881
Feature activation+7.62e+00

Pos logit contributions
 centuries+0.106
ur+0.105
ust+0.105
 ho+0.104
 prey+0.103
Neg logit contributions
 understands-0.136
 sees-0.135
-0.134
 "-0.128
 realized-0.127
Token fun
Token ID1257
Feature activation+1.72e+00

Pos logit contributions
ust+0.293
 centuries+0.275
 ho+0.274
ur+0.271
Flo+0.263
Neg logit contributions
 sees-0.168
 understands-0.160
 means-0.156
 draws-0.155
 disl-0.154
Token playing
Token ID2712
Feature activation+1.48e+00

Pos logit contributions
ust+0.060
 centuries+0.058
Ar+0.057
ur+0.056
 ho+0.055
Neg logit contributions
-0.041
 understands-0.039
 sees-0.039
 means-0.039
 draws-0.039
Token hide
Token ID7808
Feature activation+6.88e+00

Pos logit contributions
ust+0.042
 centuries+0.041
Ar+0.040
ur+0.039
 ever+0.039
Neg logit contributions
-0.044
 understands-0.044
 sees-0.043
 draws-0.042
 means-0.042
Token-
Token ID12
Feature activation+6.09e+00

Pos logit contributions
ust+0.284
Ar+0.283
 centuries+0.282
ur+0.275
 eyel+0.274
Neg logit contributions
-0.113
 sees-0.104
 realized-0.101
 means-0.101
 understands-0.098
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
 ho+0.092
 centuries+0.089
 eyel+0.085
ust+0.085
 prey+0.082
Neg logit contributions
 disl-0.249
 draws-0.245
-0.245
 sees-0.243
Fighting-0.241
Token-
Token ID12
Feature activation+6.11e+00

Pos logit contributions
 centuries+0.546
ust+0.536
Ar+0.530
 prey+0.498
 eyel+0.498
Neg logit contributions
-0.409
 disl-0.379
 sees-0.377
 understands-0.374
 realized-0.373
Tokenseek
Token ID36163
Feature activation+2.43e+00

Pos logit contributions
ust+0.143
 centuries+0.138
 ho+0.131
Ar+0.129
 souven+0.128
Neg logit contributions
-0.214
 sees-0.205
 understands-0.204
 draws-0.204
 means-0.202
Token.
Token ID13
Feature activation+9.41e-01

Pos logit contributions
ust+0.097
ur+0.096
Ar+0.096
abb+0.093
 centuries+0.093
Neg logit contributions
 "-0.046
-0.044
 sees-0.042
 means-0.040
 the-0.040
Token Tim
Token ID5045
Feature activation+6.46e-01

Pos logit contributions
 centuries+0.029
ust+0.029
Ar+0.029
ur+0.028
 ever+0.028
Neg logit contributions
 understands-0.025
 draws-0.023
-0.023
 sees-0.023
 means-0.023
Tokenmy
Token ID1820
Feature activation+7.57e-01

Pos logit contributions
ust+0.022
 ho+0.020
 ever+0.020
ct+0.020
ippy+0.019
Neg logit contributions
 disl-0.017
-0.017
Fighting-0.015
:"-0.015
 understands-0.015
Token Lily
Token ID20037
Feature activation-4.58e-01

Pos logit contributions
 understands+0.029
liness+0.026
edience+0.026
 disl+0.026
 means+0.026
Neg logit contributions
 ever-0.029
 eyel-0.029
 souven-0.029
 centuries-0.028
 ho-0.028
Token was
Token ID373
Feature activation-1.07e+00

Pos logit contributions
Fighting+0.010
+0.009
 disl+0.009
 understands+0.009
:"+0.009
Neg logit contributions
 ever-0.016
Ar-0.016
 ho-0.016
 centuries-0.016
ust-0.016
Token playing
Token ID2712
Feature activation+1.51e+00

Pos logit contributions
+0.027
 understands+0.027
 means+0.026
 draws+0.026
 sees+0.025
Neg logit contributions
 centuries-0.035
ust-0.035
 ever-0.033
Ar-0.033
ur-0.033
Token hide
Token ID7808
Feature activation+6.94e+00

Pos logit contributions
ust+0.043
 centuries+0.042
Ar+0.040
ur+0.040
 ever+0.039
Neg logit contributions
-0.046
 sees-0.045
 understands-0.044
 draws-0.044
 means-0.043
Token-
Token ID12
Feature activation+6.09e+00

Pos logit contributions
Ar+0.251
ust+0.248
 centuries+0.244
ur+0.237
 souven+0.236
Neg logit contributions
-0.152
 sees-0.148
 realized-0.144
 means-0.140
 understands-0.135
Tokenand
Token ID392
Feature activation+1.64e+01

Pos logit contributions
 centuries+0.128
 ho+0.126
ust+0.124
 eyel+0.123
 prey+0.122
Neg logit contributions
 draws-0.205
 sees-0.204
-0.203
 disl-0.200
 means-0.199
Token-
Token ID12
Feature activation+6.11e+00

Pos logit contributions
ust+0.510
 centuries+0.508
Ar+0.499
 prey+0.467
ur+0.464
Neg logit contributions
-0.450
 sees-0.429
 understands-0.419
 realized-0.414
 draws-0.410
Tokenseek
Token ID36163
Feature activation+2.50e+00

Pos logit contributions
ust+0.129
 centuries+0.124
 ho+0.116
 prey+0.115
Ar+0.114
Neg logit contributions
-0.228
 understands-0.221
 draws-0.220
 sees-0.220
 means-0.218
Token with
Token ID351
Feature activation+3.88e+00

Pos logit contributions
Ar+0.108
ust+0.104
ur+0.102
abb+0.101
iri+0.099
Neg logit contributions
 "-0.044
 the-0.041
 that-0.041
 sees-0.039
-0.038
Token her
Token ID607
Feature activation+3.80e+00

Pos logit contributions
ust+0.171
Ar+0.161
 centuries+0.159
ur+0.159
abb+0.157
Neg logit contributions
 sees-0.060
 draws-0.059
 understands-0.058
 "-0.057
-0.054
Token friends
Token ID2460
Feature activation+3.16e+00

Pos logit contributions
ust+0.134
 centuries+0.132
ur+0.128
Ar+0.126
abb+0.123
Neg logit contributions
 sees-0.092
-0.091
 understands-0.090
 draws-0.084
 realized-0.083
Token she
Token ID673
Feature activation+1.44e+00

Pos logit contributions
+0.019
 understands+0.018
 sees+0.018
 "+0.017
 means+0.017
Neg logit contributions
Ar-0.042
ust-0.042
 centuries-0.041
ur-0.041
 prey-0.041
Token would
Token ID561
Feature activation-2.56e+00

Pos logit contributions
 ever+0.054
Ar+0.052
ust+0.052
 ho+0.051
 centuries+0.051
Neg logit contributions
-0.028
 understands-0.027
Fighting-0.027
:"-0.026
liness-0.026
Token play
Token ID711
Feature activation+2.10e+00

Pos logit contributions
+0.069
 sees+0.069
 draws+0.068
 understands+0.067
 means+0.065
Neg logit contributions
ust-0.080
 centuries-0.079
Ar-0.077
 prey-0.076
ur-0.074
Token hide
Token ID7808
Feature activation+6.89e+00

Pos logit contributions
 centuries+0.053
ust+0.053
Ar+0.051
 ever+0.050
ur+0.050
Neg logit contributions
-0.068
 understands-0.068
 sees-0.066
 draws-0.065
 means-0.065
Token-
Token ID12
Feature activation+6.03e+00

Pos logit contributions
Ar+0.269
ust+0.268
 centuries+0.259
ur+0.256
 eyel+0.252
Neg logit contributions
-0.131
 sees-0.125
 realized-0.120
 means-0.117
 draws-0.114
Tokenand
Token ID392
Feature activation+1.63e+01

Pos logit contributions
 ho+0.096
ust+0.095
 centuries+0.089
 prey+0.089
 eyel+0.088
Neg logit contributions
 draws-0.237
 disl-0.233
 sees-0.232
-0.232
 realized-0.227
Token-
Token ID12
Feature activation+6.04e+00

Pos logit contributions
ust+0.532
 centuries+0.517
Ar+0.516
 prey+0.492
 souven+0.475
Neg logit contributions
-0.426
 sees-0.411
 understands-0.400
 realized-0.395
 draws-0.394
Tokenseek
Token ID36163
Feature activation+2.41e+00

Pos logit contributions
ust+0.148
 centuries+0.137
 ho+0.134
 prey+0.131
 souven+0.130
Neg logit contributions
-0.206
 draws-0.201
 sees-0.200
 understands-0.198
 means-0.197
Token with
Token ID351
Feature activation+3.83e+00

Pos logit contributions
ust+0.093
ur+0.093
Ar+0.092
abb+0.089
iri+0.088
Neg logit contributions
 "-0.050
 sees-0.047
 the-0.047
-0.046
 that-0.044
Token her
Token ID607
Feature activation+3.82e+00

Pos logit contributions
ust+0.189
Ar+0.181
ider+0.180
ur+0.177
abb+0.174
Neg logit contributions
 the-0.043
 sees-0.042
 "-0.041
 draws-0.040
 understands-0.038
Token friends
Token ID2460
Feature activation+3.12e+00

Pos logit contributions
ust+0.132
 centuries+0.127
ur+0.124
Ar+0.123
abb+0.120
Neg logit contributions
-0.097
 sees-0.095
 understands-0.095
 draws-0.089
 "-0.089
Token friends
Token ID2460
Feature activation+3.01e+00

Pos logit contributions
 centuries+0.154
ust+0.151
Ar+0.146
ur+0.145
and+0.141
Neg logit contributions
 sees-0.074
 understands-0.074
-0.072
 "-0.070
 realized-0.070
Token were
Token ID547
Feature activation+2.69e+00

Pos logit contributions
 centuries+0.120
ust+0.119
Ar+0.118
 prey+0.116
ur+0.114
Neg logit contributions
-0.054
 sees-0.052
 understands-0.052
 realized-0.051
 means-0.049
Token playing
Token ID2712
Feature activation+1.46e+00

Pos logit contributions
ust+0.090
 centuries+0.089
Ar+0.088
 prey+0.085
ur+0.085
Neg logit contributions
-0.067
 sees-0.065
 understands-0.064
 draws-0.062
 means-0.062
Token hide
Token ID7808
Feature activation+6.92e+00

Pos logit contributions
 centuries+0.037
ust+0.036
Ar+0.035
 ever+0.035
 prey+0.034
Neg logit contributions
 understands-0.047
-0.047
 draws-0.045
 sees-0.045
 means-0.045
Token-
Token ID12
Feature activation+6.02e+00

Pos logit contributions
Ar+0.286
ust+0.285
 centuries+0.278
ur+0.273
 eyel+0.266
Neg logit contributions
-0.117
 sees-0.112
 means-0.107
 realized-0.106
 understands-0.100
Tokenand
Token ID392
Feature activation+1.63e+01

Pos logit contributions
 ho+0.107
 prey+0.093
 centuries+0.091
iri+0.089
 souven+0.088
Neg logit contributions
 disl-0.230
Fighting-0.229
 draws-0.227
-0.225
 sees-0.225
Token-
Token ID12
Feature activation+6.06e+00

Pos logit contributions
Ar+0.505
 centuries+0.503
ust+0.503
 ho+0.469
 prey+0.468
Neg logit contributions
-0.437
 sees-0.422
 understands-0.421
 means-0.408
 realized-0.407
Tokenseek
Token ID36163
Feature activation+2.45e+00

Pos logit contributions
ust+0.151
 ho+0.147
 centuries+0.145
 souven+0.141
 prey+0.141
Neg logit contributions
-0.197
 draws-0.188
 sees-0.188
 means-0.188
 realized-0.187
Token behind
Token ID2157
Feature activation+4.54e+00

Pos logit contributions
Ar+0.097
ust+0.096
ur+0.093
abb+0.092
iri+0.091
Neg logit contributions
 "-0.048
-0.045
 sees-0.045
 means-0.043
:"-0.042
Token the
Token ID262
Feature activation-5.85e+00

Pos logit contributions
ust+0.193
Ar+0.179
ur+0.179
 centuries+0.178
abb+0.177
Neg logit contributions
 sees-0.076
-0.075
 draws-0.073
 means-0.073
 understands-0.072
Token old
Token ID1468
Feature activation-5.71e+00

Pos logit contributions
+0.128
 understands+0.125
 sees+0.123
 draws+0.121
 means+0.119
Neg logit contributions
ust-0.212
 centuries-0.209
Ar-0.203
ur-0.201
 ever-0.200
Token they
Token ID484
Feature activation+5.86e-01

Pos logit contributions
 understands+0.028
 means+0.025
liness+0.024
 disl+0.024
Fighting+0.024
Neg logit contributions
 ever-0.031
 souven-0.030
 eyel-0.030
 centuries-0.030
 ho-0.030
Token were
Token ID547
Feature activation+2.75e+00

Pos logit contributions
ust+0.024
 centuries+0.023
Ar+0.023
 ever+0.023
ur+0.023
Neg logit contributions
-0.010
 understands-0.010
 sees-0.010
 draws-0.010
 means-0.010
Token playing
Token ID2712
Feature activation+1.51e+00

Pos logit contributions
ust+0.108
 centuries+0.106
Ar+0.106
 prey+0.102
 souven+0.101
Neg logit contributions
 sees-0.054
-0.054
 "-0.050
 understands-0.050
 draws-0.049
Token hide
Token ID7808
Feature activation+6.94e+00

Pos logit contributions
ust+0.041
 centuries+0.040
Ar+0.038
ur+0.038
 ho+0.036
Neg logit contributions
-0.048
 sees-0.047
 understands-0.047
 draws-0.046
 means-0.045
Token-
Token ID12
Feature activation+6.09e+00

Pos logit contributions
ust+0.272
Ar+0.272
 centuries+0.268
ur+0.261
 souven+0.260
Neg logit contributions
-0.132
 sees-0.128
 realized-0.121
 means-0.118
 understands-0.113
Tokenand
Token ID392
Feature activation+1.63e+01

Pos logit contributions
 ho+0.102
 centuries+0.094
 prey+0.093
ust+0.088
 eyel+0.088
Neg logit contributions
 sees-0.235
 draws-0.234
-0.234
 disl-0.232
Fighting-0.231
Token-
Token ID12
Feature activation+6.11e+00

Pos logit contributions
 centuries+0.528
Ar+0.525
ust+0.523
 prey+0.492
 souven+0.484
Neg logit contributions
-0.423
 sees-0.405
 understands-0.395
 realized-0.393
 draws-0.376
Tokenseek
Token ID36163
Feature activation+2.38e+00

Pos logit contributions
ust+0.148
 centuries+0.142
 ho+0.138
 prey+0.135
 souven+0.134
Neg logit contributions
-0.208
 sees-0.201
 draws-0.200
 understands-0.199
 realized-0.196
Token when
Token ID618
Feature activation-1.14e+00

Pos logit contributions
Ar+0.105
ust+0.103
ur+0.100
iri+0.100
abb+0.100
Neg logit contributions
 "-0.040
 the-0.037
 that-0.037
 sees-0.036
-0.035
Token Lily
Token ID20037
Feature activation-3.76e-01

Pos logit contributions
+0.018
 sees+0.017
 understands+0.017
 draws+0.017
 means+0.016
Neg logit contributions
ust-0.048
 centuries-0.047
ur-0.046
Ar-0.046
 prey-0.045
Token couldn
Token ID3521
Feature activation-2.12e+00

Pos logit contributions
+0.004
 understands+0.004
 sees+0.004
 draws+0.003
 means+0.003
Neg logit contributions
ust-0.018
 centuries-0.018
Ar-0.017
ur-0.017
 ever-0.017
Token We
Token ID775
Feature activation+1.95e+00

Pos logit contributions
ust+0.033
 centuries+0.032
Ar+0.032
 prey+0.032
ur+0.031
Neg logit contributions
 draws-0.015
 means-0.014
-0.014
 understands-0.013
 "-0.013
Token have
Token ID423
Feature activation-2.70e+00

Pos logit contributions
ust+0.069
 centuries+0.069
Ar+0.067
ur+0.066
 prey+0.065
Neg logit contributions
-0.043
 understands-0.043
 sees-0.042
 draws-0.041
 means-0.041
Token different
Token ID1180
Feature activation-3.62e+00

Pos logit contributions
+0.068
 understands+0.067
 sees+0.065
 draws+0.065
 means+0.064
Neg logit contributions
ust-0.088
 centuries-0.088
Ar-0.085
ur-0.084
 ever-0.084
Token likes
Token ID7832
Feature activation-4.76e+00

Pos logit contributions
+0.090
 understands+0.086
 sees+0.083
 draws+0.081
 means+0.080
Neg logit contributions
 centuries-0.122
ust-0.120
Ar-0.117
 ever-0.116
 ho-0.115
Token and
Token ID290
Feature activation+2.56e-01

Pos logit contributions
 draws+0.101
 sees+0.100
 means+0.099
+0.096
 understands+0.095
Neg logit contributions
ust-0.180
 centuries-0.170
Ar-0.170
ur-0.168
abb-0.161
Token disl
Token ID19621
Feature activation-1.70e+01

Pos logit contributions
 ever+0.008
 centuries+0.008
ust+0.008
 ho+0.008
ur+0.007
Neg logit contributions
-0.007
 understands-0.007
:"-0.006
 sees-0.006
 means-0.006
Tokenikes
Token ID7938
Feature activation+3.62e+00

Pos logit contributions
 means+0.506
 draws+0.505
 understands+0.505
+0.502
 sees+0.479
Neg logit contributions
 centuries-0.485
ur-0.469
 ho-0.464
Ar-0.460
 souven-0.456
Token.
Token ID13
Feature activation+8.30e-01

Pos logit contributions
ust+0.095
 centuries+0.095
 ever+0.094
 ho+0.090
Ar+0.089
Neg logit contributions
-0.111
 understands-0.110
 sees-0.106
 draws-0.106
 realized-0.104
Token We
Token ID775
Feature activation+1.98e+00

Pos logit contributions
ust+0.033
 centuries+0.033
 prey+0.032
ur+0.032
Ar+0.032
Neg logit contributions
 draws-0.016
-0.015
 means-0.015
 understands-0.014
 "-0.014
Token can
Token ID460
Feature activation-5.43e+00

Pos logit contributions
ust+0.068
 centuries+0.067
 ever+0.066
Ar+0.066
ur+0.065
Neg logit contributions
-0.047
 understands-0.045
 sees-0.043
 draws-0.043
 means-0.043
Token be
Token ID307
Feature activation+1.81e+00

Pos logit contributions
 draws+0.118
+0.118
 sees+0.117
 understands+0.114
 means+0.112
Neg logit contributions
 centuries-0.196
ust-0.195
ur-0.187
Ar-0.187
 prey-0.186
Token,
Token ID11
Feature activation-1.53e+00

Pos logit contributions
ust+0.099
 centuries+0.098
Ar+0.097
ur+0.095
 prey+0.093
Neg logit contributions
-0.043
 understands-0.042
 sees-0.041
 means-0.040
 draws-0.039
Token but
Token ID475
Feature activation-4.14e+00

Pos logit contributions
+0.039
 understands+0.038
 sees+0.038
 means+0.037
 draws+0.037
Neg logit contributions
 centuries-0.050
ust-0.049
Ar-0.049
ur-0.047
 prey-0.046
Token only
Token ID691
Feature activation+1.53e+00

Pos logit contributions
+0.117
 understands+0.116
 sees+0.113
 draws+0.113
 means+0.111
Neg logit contributions
ust-0.122
 centuries-0.121
Ar-0.117
 ever-0.116
ur-0.116
Token with
Token ID351
Feature activation+3.13e+00

Pos logit contributions
ust+0.054
Ar+0.054
 centuries+0.053
abb+0.051
ur+0.050
Neg logit contributions
 means-0.036
 sees-0.035
-0.035
 draws-0.035
 understands-0.033
Token our
Token ID674
Feature activation+1.37e+00

Pos logit contributions
ust+0.113
 centuries+0.112
Ar+0.109
ur+0.108
 ever+0.107
Neg logit contributions
-0.068
 understands-0.067
 sees-0.065
 draws-0.064
 means-0.064
Token supervision
Token ID20865
Feature activation-1.27e+01

Pos logit contributions
 centuries+0.044
ust+0.044
Ar+0.042
 ho+0.042
 ever+0.042
Neg logit contributions
-0.035
 understands-0.034
 draws-0.033
 means-0.033
 sees-0.033
Token.
Token ID13
Feature activation+4.35e-01

Pos logit contributions
+0.333
 understands+0.332
 draws+0.327
 sees+0.325
 means+0.313
Neg logit contributions
ust-0.388
 centuries-0.384
 ever-0.379
 ho-0.371
ur-0.367
Token And
Token ID843
Feature activation+2.41e+00

Pos logit contributions
ust+0.016
 centuries+0.016
Ar+0.016
 ever+0.016
ur+0.016
Neg logit contributions
-0.009
 understands-0.009
 draws-0.008
 means-0.008
 sees-0.008
Token maybe
Token ID3863
Feature activation-1.44e+00

Pos logit contributions
ust+0.076
 centuries+0.075
Ar+0.073
 ever+0.072
ur+0.072
Neg logit contributions
-0.063
 understands-0.062
 sees-0.061
 draws-0.060
 means-0.059
Token you
Token ID345
Feature activation+6.85e+00

Pos logit contributions
+0.035
 means+0.034
 sees+0.033
 draws+0.033
 understands+0.032
Neg logit contributions
ust-0.051
 centuries-0.049
Ar-0.049
ur-0.048
abb-0.047
Token can
Token ID460
Feature activation-5.65e+00

Pos logit contributions
ust+0.293
 centuries+0.292
Ar+0.285
ur+0.281
 prey+0.278
Neg logit contributions
-0.103
 understands-0.101
 sees-0.098
 draws-0.095
 means-0.095
Token to
Token ID284
Feature activation+2.74e-01

Pos logit contributions
+0.079
 understands+0.073
:"+0.072
 draws+0.071
 sees+0.071
Neg logit contributions
 centuries-0.081
 ever-0.081
ust-0.080
 ho-0.079
Ar-0.077
Token use
Token ID779
Feature activation-1.87e-01

Pos logit contributions
and+0.010
ust+0.010
 centuries+0.010
ider+0.010
ur+0.010
Neg logit contributions
 draws-0.006
 sees-0.005
 realized-0.005
 means-0.005
 disl-0.005
Token it
Token ID340
Feature activation-7.01e+00

Pos logit contributions
+0.003
 understands+0.003
 sees+0.003
 draws+0.003
Fighting+0.003
Neg logit contributions
 centuries-0.008
 ho-0.008
 ever-0.008
ust-0.007
Ar-0.007
Token with
Token ID351
Feature activation+3.53e+00

Pos logit contributions
 means+0.145
+0.145
 sees+0.141
 understands+0.139
 draws+0.136
Neg logit contributions
Ar-0.253
ust-0.253
 centuries-0.250
 eyel-0.240
ur-0.236
Token adult
Token ID4044
Feature activation-1.91e-01

Pos logit contributions
 centuries+0.130
ust+0.130
Ar+0.126
 ever+0.126
ur+0.125
Neg logit contributions
-0.074
 understands-0.072
 sees-0.069
 draws-0.069
 means-0.067
Token supervision
Token ID20865
Feature activation-1.18e+01

Pos logit contributions
+0.005
 understands+0.005
 sees+0.005
 draws+0.005
:"+0.005
Neg logit contributions
 centuries-0.006
 ever-0.006
ust-0.006
ur-0.006
 ho-0.006
Token,"
Token ID553
Feature activation-3.94e+00

Pos logit contributions
+0.276
 understands+0.271
 sees+0.263
 means+0.260
 draws+0.260
Neg logit contributions
ust-0.404
 centuries-0.401
Ar-0.392
ur-0.384
 ever-0.380
Token Mr
Token ID1770
Feature activation+8.05e+00

Pos logit contributions
+0.077
 understands+0.075
 sees+0.073
 draws+0.072
 means+0.071
Neg logit contributions
ust-0.151
 centuries-0.151
Ar-0.146
ur-0.145
 ever-0.145
Token.
Token ID13
Feature activation+9.25e-01

Pos logit contributions
 centuries+0.261
ust+0.258
Ar+0.250
ur+0.242
 ever+0.240
Neg logit contributions
-0.210
 understands-0.209
 sees-0.205
 means-0.203
 draws-0.200
Token Lee
Token ID5741
Feature activation+1.56e+00

Pos logit contributions
ust+0.032
 centuries+0.031
Ar+0.030
ur+0.030
 ever+0.030
Neg logit contributions
-0.022
 understands-0.022
 sees-0.021
 draws-0.021
 means-0.021
Token said
Token ID531
Feature activation-1.63e+00

Pos logit contributions
ust+0.064
 centuries+0.064
ur+0.063
 ever+0.063
 ho+0.062
Neg logit contributions
-0.024
 understands-0.023
 draws-0.022
 sees-0.022
 disl-0.021
Token a
Token ID257
Feature activation+5.82e+00

Pos logit contributions
ust+0.075
 centuries+0.072
Ar+0.071
ur+0.069
 prey+0.068
Neg logit contributions
-0.051
 sees-0.051
 understands-0.051
 means-0.050
 draws-0.049
Token broom
Token ID47085
Feature activation+4.92e-01

Pos logit contributions
ust+0.212
 centuries+0.206
ur+0.197
Ar+0.195
abb+0.194
Neg logit contributions
 sees-0.135
 understands-0.134
-0.133
 draws-0.125
 means-0.121
Token and
Token ID290
Feature activation-1.04e-01

Pos logit contributions
ust+0.018
 centuries+0.018
Ar+0.018
ur+0.017
 ever+0.017
Neg logit contributions
-0.010
 understands-0.010
 sees-0.010
 draws-0.010
 means-0.009
Token a
Token ID257
Feature activation+5.83e+00

Pos logit contributions
+0.003
 understands+0.003
 sees+0.003
 draws+0.003
 means+0.003
Neg logit contributions
 centuries-0.003
ust-0.003
Ar-0.003
ur-0.003
 ever-0.003
Token dust
Token ID8977
Feature activation-8.61e-01

Pos logit contributions
 centuries+0.176
ust+0.175
Ar+0.172
 ho+0.171
 ever+0.168
Neg logit contributions
-0.155
 understands-0.154
 means-0.150
 draws-0.150
 sees-0.147
Tokenpan
Token ID6839
Feature activation-1.16e+01

Pos logit contributions
+0.016
 understands+0.016
 sees+0.015
 draws+0.015
 means+0.015
Neg logit contributions
 centuries-0.033
ust-0.033
 ever-0.032
Ar-0.032
ur-0.032
Token and
Token ID290
Feature activation-8.61e-02

Pos logit contributions
+0.300
 understands+0.295
 sees+0.287
 draws+0.285
 means+0.281
Neg logit contributions
ust-0.367
 centuries-0.365
Ar-0.353
 ever-0.353
ur-0.350
Token start
Token ID923
Feature activation-1.08e+00

Pos logit contributions
+0.002
 understands+0.002
 sees+0.002
 draws+0.002
 means+0.002
Neg logit contributions
 centuries-0.003
ust-0.002
Ar-0.002
 ever-0.002
 ho-0.002
Token to
Token ID284
Feature activation-1.51e-01

Pos logit contributions
 understands+0.031
+0.031
 sees+0.030
 draws+0.029
 realized+0.029
Neg logit contributions
 centuries-0.031
 ever-0.031
ur-0.030
ust-0.030
 eyel-0.029
Token clean
Token ID3424
Feature activation-5.35e+00

Pos logit contributions
+0.004
 understands+0.004
 sees+0.004
:"+0.004
 means+0.004
Neg logit contributions
 centuries-0.004
ust-0.004
 ho-0.004
Ar-0.004
 ever-0.004
Token the
Token ID262
Feature activation-6.43e+00

Pos logit contributions
 understands+0.148
+0.141
 realized+0.134
Fighting+0.131
 sees+0.129
Neg logit contributions
 centuries-0.160
 ever-0.155
 ho-0.154
 eyel-0.153
 prey-0.150
Token her
Token ID607
Feature activation+3.09e+00

Pos logit contributions
 ever+0.104
 centuries+0.104
 ho+0.099
 souven+0.097
 prey+0.097
Neg logit contributions
-0.097
 understands-0.087
 means-0.086
 draws-0.086
 sees-0.084
Token little
Token ID1310
Feature activation+2.41e+00

Pos logit contributions
 ho+0.095
 ever+0.095
ust+0.091
 souven+0.090
 prey+0.090
Neg logit contributions
 means-0.085
 draws-0.085
 understands-0.082
-0.080
 disl-0.079
Token broom
Token ID47085
Feature activation+8.47e-01

Pos logit contributions
ust+0.092
 centuries+0.090
Ar+0.088
ur+0.087
 ever+0.085
Neg logit contributions
-0.050
 sees-0.049
 understands-0.049
 draws-0.048
 means-0.046
Token and
Token ID290
Feature activation-2.02e-01

Pos logit contributions
 ho+0.025
 ever+0.025
ur+0.025
 eyel+0.024
 centuries+0.024
Neg logit contributions
 understands-0.024
-0.023
 draws-0.023
Fighting-0.023
 means-0.023
Token dust
Token ID8977
Feature activation-5.73e-01

Pos logit contributions
 understands+0.006
+0.005
 means+0.005
:"+0.005
 disl+0.005
Neg logit contributions
 ever-0.007
 ho-0.006
 eyel-0.006
 centuries-0.006
ur-0.006
Tokenpan
Token ID6839
Feature activation-1.13e+01

Pos logit contributions
+0.017
 understands+0.017
 draws+0.017
 disl+0.016
Fighting+0.016
Neg logit contributions
 ever-0.016
 centuries-0.016
 ho-0.016
 eyel-0.015
ur-0.015
Token,
Token ID11
Feature activation-1.36e+00

Pos logit contributions
+0.328
 draws+0.316
 understands+0.314
 sees+0.310
 means+0.296
Neg logit contributions
 ever-0.332
 centuries-0.327
ust-0.325
 ho-0.320
ur-0.319
Token trying
Token ID2111
Feature activation-2.38e+00

Pos logit contributions
 understands+0.041
+0.040
 means+0.039
 draws+0.038
:"+0.037
Neg logit contributions
 ho-0.038
 ever-0.038
 centuries-0.037
 eyel-0.036
ust-0.035
Token to
Token ID284
Feature activation+1.29e-01

Pos logit contributions
+0.070
 understands+0.067
 draws+0.066
 sees+0.065
 means+0.064
Neg logit contributions
ust-0.069
 centuries-0.069
 ever-0.068
 ho-0.065
ur-0.065
Token get
Token ID651
Feature activation+2.13e+00

Pos logit contributions
ust+0.004
 centuries+0.004
Ar+0.004
ur+0.004
 ever+0.004
Neg logit contributions
-0.003
 understands-0.003
 sees-0.003
 draws-0.003
 means-0.003
Token all
Token ID477
Feature activation-2.39e-01

Pos logit contributions
 centuries+0.072
ust+0.071
 ever+0.070
ur+0.069
Ar+0.068
Neg logit contributions
-0.053
 understands-0.051
 sees-0.049
 draws-0.049
 means-0.048
Token m
Token ID285
Feature activation+1.34e+00

Pos logit contributions
ust+0.198
 centuries+0.197
Ar+0.191
ur+0.189
 ever+0.188
Neg logit contributions
-0.137
 understands-0.134
 sees-0.130
 draws-0.130
 means-0.128
Tokenop
Token ID404
Feature activation-4.90e-01

Pos logit contributions
 centuries+0.051
Ar+0.049
 prey+0.046
ust+0.045
 souven+0.044
Neg logit contributions
-0.032
 understands-0.031
 sees-0.028
:"-0.027
 realized-0.026
Token and
Token ID290
Feature activation-1.72e-01

Pos logit contributions
 understands+0.011
+0.011
 draws+0.011
 sees+0.011
 means+0.010
Neg logit contributions
 ever-0.017
ust-0.017
 centuries-0.017
 ho-0.016
ur-0.016
Token a
Token ID257
Feature activation+5.85e+00

Pos logit contributions
+0.004
 understands+0.004
 sees+0.004
 means+0.004
 draws+0.004
Neg logit contributions
 centuries-0.006
 ever-0.006
ust-0.006
Ar-0.006
 ho-0.006
Token dust
Token ID8977
Feature activation-6.45e-01

Pos logit contributions
ust+0.197
 centuries+0.196
Ar+0.193
 ho+0.189
 ever+0.189
Neg logit contributions
-0.136
 understands-0.135
 draws-0.131
 means-0.131
 sees-0.130
Tokenpan
Token ID6839
Feature activation-1.13e+01

Pos logit contributions
 understands+0.013
+0.013
 draws+0.012
 sees+0.012
 means+0.012
Neg logit contributions
 centuries-0.024
ust-0.024
 ever-0.024
Ar-0.023
 ho-0.023
Token and
Token ID290
Feature activation-1.37e-01

Pos logit contributions
+0.273
 understands+0.265
 draws+0.261
 sees+0.257
 means+0.249
Neg logit contributions
 ever-0.373
ust-0.369
 centuries-0.368
 ho-0.358
ur-0.358
Token cleaned
Token ID20750
Feature activation-6.06e+00

Pos logit contributions
+0.003
 understands+0.003
:"+0.003
 sees+0.003
Fighting+0.003
Neg logit contributions
 centuries-0.004
 ever-0.004
Ar-0.004
 ho-0.004
ust-0.004
Token up
Token ID510
Feature activation+1.15e+00

Pos logit contributions
+0.147
 understands+0.141
 disl+0.138
Fighting+0.136
 draws+0.135
Neg logit contributions
 centuries-0.193
 ever-0.192
 ho-0.192
 eyel-0.188
Ar-0.187
Token the
Token ID262
Feature activation-6.34e+00

Pos logit contributions
 centuries+0.048
ust+0.047
 ever+0.047
Ar+0.046
 ho+0.046
Neg logit contributions
-0.019
 understands-0.018
 sees-0.017
 draws-0.017
 means-0.016
Token dough
Token ID15756
Feature activation-3.61e+00

Pos logit contributions
+0.134
 understands+0.131
 means+0.120
:"+0.120
 disl+0.116
Neg logit contributions
 centuries-0.232
Ar-0.224
 ho-0.222
ust-0.221
 eyel-0.221
Token a
Token ID257
Feature activation+5.99e+00

Pos logit contributions
ust+0.077
 centuries+0.073
Ar+0.072
abb+0.070
 prey+0.070
Neg logit contributions
-0.053
 sees-0.052
 understands-0.051
 draws-0.051
 means-0.050
Token broom
Token ID47085
Feature activation+8.57e-01

Pos logit contributions
ust+0.228
 centuries+0.226
izards+0.216
ur+0.211
abb+0.210
Neg logit contributions
 sees-0.134
-0.129
 understands-0.126
 "-0.119
 machine-0.116
Token and
Token ID290
Feature activation-8.61e-02

Pos logit contributions
ust+0.030
 centuries+0.029
Ar+0.028
ur+0.028
 prey+0.027
Neg logit contributions
-0.020
 sees-0.020
 means-0.020
 draws-0.020
 understands-0.019
Token a
Token ID257
Feature activation+6.00e+00

Pos logit contributions
+0.002
 understands+0.002
 sees+0.002
 draws+0.002
 means+0.002
Neg logit contributions
 centuries-0.003
ust-0.003
Ar-0.002
 ever-0.002
ur-0.002
Token dust
Token ID8977
Feature activation-6.02e-01

Pos logit contributions
ust+0.180
 centuries+0.179
Ar+0.174
 ever+0.171
ur+0.170
Neg logit contributions
-0.164
 understands-0.162
 draws-0.158
 sees-0.157
 means-0.157
Tokenpan
Token ID6839
Feature activation-1.12e+01

Pos logit contributions
+0.013
 understands+0.013
 sees+0.012
 draws+0.012
 means+0.012
Neg logit contributions
ust-0.022
 centuries-0.022
Ar-0.021
ur-0.021
 ever-0.021
Token.
Token ID13
Feature activation+7.35e-01

Pos logit contributions
+0.295
 understands+0.290
 sees+0.283
 draws+0.281
 means+0.278
Neg logit contributions
ust-0.349
 centuries-0.346
Ar-0.335
ur-0.331
 ever-0.330
Token You
Token ID921
Feature activation+2.57e+00

Pos logit contributions
 centuries+0.027
ust+0.027
Ar+0.027
ur+0.026
 ever+0.026
Neg logit contributions
-0.015
 understands-0.015
 sees-0.014
 draws-0.014
 means-0.014
Token have
Token ID423
Feature activation-3.08e+00

Pos logit contributions
ust+0.115
 centuries+0.114
Ar+0.112
ur+0.111
 ever+0.110
Neg logit contributions
-0.034
 understands-0.033
 sees-0.032
 draws-0.031
 means-0.030
Token to
Token ID284
Feature activation-1.76e-02

Pos logit contributions
+0.094
 understands+0.088
:"+0.087
liness+0.087
 means+0.085
Neg logit contributions
 ever-0.087
 centuries-0.086
 ho-0.083
ust-0.082
Ar-0.082
Token clean
Token ID3424
Feature activation-5.21e+00

Pos logit contributions
 draws+0.000
 sees+0.000
 means+0.000
+0.000
 realized+0.000
Neg logit contributions
ust-0.001
and-0.001
 centuries-0.001
Ar-0.001
ur-0.001
Token of
Token ID286
Feature activation+3.85e+00

Pos logit contributions
 ever+0.037
 centuries+0.037
ust+0.037
ur+0.036
 ho+0.035
Neg logit contributions
-0.042
 understands-0.040
 draws-0.039
 sees-0.039
 means-0.038
Token paper
Token ID3348
Feature activation-1.52e+00

Pos logit contributions
 centuries+0.134
 ho+0.128
 prey+0.127
 souven+0.126
 ever+0.125
Neg logit contributions
-0.096
 understands-0.089
 sees-0.084
:"-0.083
 draws-0.082
Token and
Token ID290
Feature activation-4.61e-01

Pos logit contributions
+0.038
 understands+0.038
 sees+0.036
 draws+0.035
Fighting+0.035
Neg logit contributions
 centuries-0.048
 ever-0.048
ust-0.048
 ho-0.047
ur-0.047
Token a
Token ID257
Feature activation+5.92e+00

Pos logit contributions
+0.012
 understands+0.011
:"+0.011
 sees+0.011
Fighting+0.011
Neg logit contributions
 centuries-0.015
 ever-0.015
 ho-0.015
Ar-0.014
ust-0.014
Token dust
Token ID8977
Feature activation-4.97e-01

Pos logit contributions
 centuries+0.180
 ho+0.179
Ar+0.177
ust+0.177
 souven+0.174
Neg logit contributions
 understands-0.153
-0.153
 means-0.151
 draws-0.148
 sees-0.144
Tokenpan
Token ID6839
Feature activation-1.11e+01

Pos logit contributions
 understands+0.011
+0.011
 sees+0.010
 draws+0.010
Fighting+0.010
Neg logit contributions
 ever-0.017
 centuries-0.017
 ho-0.017
ust-0.017
 souven-0.016
Token and
Token ID290
Feature activation-4.31e-01

Pos logit contributions
+0.344
 understands+0.331
 draws+0.328
 sees+0.324
Fighting+0.307
Neg logit contributions
 ever-0.310
ust-0.295
 centuries-0.290
 ho-0.285
ur-0.283
Token they
Token ID484
Feature activation-2.08e-01

Pos logit contributions
+0.012
 understands+0.011
 sees+0.011
:"+0.011
Fighting+0.011
Neg logit contributions
 centuries-0.013
 ever-0.013
ust-0.013
Ar-0.013
 ho-0.013
Token cleaned
Token ID20750
Feature activation-5.87e+00

Pos logit contributions
+0.005
 understands+0.005
Fighting+0.005
 disl+0.005
 sees+0.005
Neg logit contributions
 ever-0.007
ust-0.007
ur-0.006
 ho-0.006
 centuries-0.006
Token up
Token ID510
Feature activation+1.35e+00

Pos logit contributions
+0.138
 understands+0.134
 sees+0.127
 realized+0.126
Fighting+0.125
Neg logit contributions
 centuries-0.200
 ever-0.193
 eyel-0.187
 ho-0.186
 souven-0.185
Token the
Token ID262
Feature activation-6.44e+00

Pos logit contributions
 centuries+0.050
 ever+0.050
ust+0.049
 souven+0.048
 ho+0.047
Neg logit contributions
-0.028
 understands-0.028
 sees-0.026
 draws-0.026
Fighting-0.025
Token a
Token ID257
Feature activation+5.97e+00

Pos logit contributions
+0.125
 draws+0.110
 understands+0.108
:"+0.106
Fighting+0.105
Neg logit contributions
 ever-0.130
 centuries-0.129
 ho-0.128
 prey-0.127
ust-0.125
Token broom
Token ID47085
Feature activation+9.62e-01

Pos logit contributions
 ho+0.174
Ar+0.172
 prey+0.165
 souven+0.164
 ever+0.162
Neg logit contributions
 draws-0.166
 means-0.165
 disl-0.163
 understands-0.163
-0.159
Token and
Token ID290
Feature activation+1.30e-01

Pos logit contributions
 ever+0.027
 ho+0.026
 centuries+0.026
ust+0.025
 prey+0.025
Neg logit contributions
 understands-0.030
-0.028
 disl-0.027
 draws-0.026
 means-0.026
Token a
Token ID257
Feature activation+5.98e+00

Pos logit contributions
 centuries+0.004
 ever+0.004
 ho+0.004
Ar+0.004
ust+0.004
Neg logit contributions
-0.004
 understands-0.004
:"-0.004
 draws-0.004
 means-0.003
Token dust
Token ID8977
Feature activation-5.67e-01

Pos logit contributions
 ho+0.168
Ar+0.164
 centuries+0.160
 prey+0.160
ust+0.158
Neg logit contributions
 means-0.171
 understands-0.171
 draws-0.170
-0.168
 disl-0.162
Tokenpan
Token ID6839
Feature activation-1.11e+01

Pos logit contributions
 understands+0.013
+0.013
 draws+0.012
 sees+0.012
 disl+0.012
Neg logit contributions
 centuries-0.019
 ever-0.019
ust-0.019
 ho-0.019
 prey-0.018
Token.
Token ID13
Feature activation+8.51e-01

Pos logit contributions
+0.338
 understands+0.331
 draws+0.323
 sees+0.318
 means+0.310
Neg logit contributions
ust-0.300
 ever-0.300
 centuries-0.299
 ho-0.286
Ar-0.283
Token\n
Token ID198
Feature activation+2.86e+00

Pos logit contributions
 centuries+0.029
ust+0.029
Ar+0.029
 ever+0.028
 prey+0.028
Neg logit contributions
-0.019
 understands-0.019
 sees-0.018
 draws-0.018
 realized-0.018
Token\n
Token ID198
Feature activation+2.83e+00

Pos logit contributions
Ar+0.088
 centuries+0.085
ust+0.082
 ever+0.081
and+0.081
Neg logit contributions
 draws-0.076
-0.075
liness-0.075
 means-0.075
 understands-0.073
Token"
Token ID1
Feature activation-7.31e+00

Pos logit contributions
Ar+0.095
ust+0.091
 centuries+0.090
 ho+0.087
 ever+0.085
Neg logit contributions
 understands-0.071
 means-0.070
 draws-0.069
liness-0.067
 sees-0.067
TokenCan
Token ID6090
Feature activation-2.25e+00

Pos logit contributions
+0.161
 understands+0.155
 sees+0.152
 draws+0.151
 means+0.149
Neg logit contributions
ust-0.267
 centuries-0.266
ur-0.261
 ever-0.254
Ar-0.254
Token you
Token ID345
Feature activation+7.13e+00

Pos logit contributions
+0.030
 understands+0.030
 sees+0.029
 draws+0.028
 means+0.028
Neg logit contributions
ust-0.046
 centuries-0.045
Ar-0.044
ur-0.043
 ever-0.043
Token should
Token ID815
Feature activation-5.69e+00

Pos logit contributions
ust+0.271
 centuries+0.271
Ar+0.267
 prey+0.259
ur+0.258
Neg logit contributions
 understands-0.139
-0.137
 sees-0.136
 means-0.131
 realized-0.129
Token do
Token ID466
Feature activation-3.53e+00

Pos logit contributions
 understands+0.141
+0.140
 sees+0.140
 draws+0.137
 means+0.136
Neg logit contributions
 centuries-0.183
ust-0.182
Ar-0.180
 prey-0.176
and-0.174
Token something
Token ID1223
Feature activation+1.34e+00

Pos logit contributions
+0.082
 understands+0.075
 draws+0.074
 sees+0.073
 means+0.070
Neg logit contributions
 centuries-0.127
 ever-0.125
ust-0.124
 ho-0.119
 prey-0.118
Token more
Token ID517
Feature activation+8.71e-01

Pos logit contributions
Ar+0.049
 centuries+0.049
ust+0.048
 ho+0.046
 prey+0.046
Neg logit contributions
 understands-0.029
-0.028
 sees-0.028
 means-0.028
 "-0.026
Token constructive
Token ID26938
Feature activation-1.11e+01

Pos logit contributions
ust+0.026
 centuries+0.026
Ar+0.025
ur+0.025
 ever+0.025
Neg logit contributions
-0.024
 understands-0.024
 sees-0.023
 draws-0.023
 means-0.023
Token with
Token ID351
Feature activation+3.57e+00

Pos logit contributions
+0.315
 understands+0.309
 sees+0.304
 draws+0.302
 means+0.300
Neg logit contributions
ust-0.323
 centuries-0.321
Ar-0.311
ur-0.306
 ever-0.302
Token those
Token ID883
Feature activation-1.16e+00

Pos logit contributions
ust+0.132
 centuries+0.132
Ar+0.128
 ever+0.126
ur+0.126
Neg logit contributions
-0.074
 understands-0.073
 sees-0.070
 draws-0.070
 means-0.068
Token mar
Token ID1667
Feature activation+7.89e+00

Pos logit contributions
+0.027
 understands+0.026
 sees+0.026
 draws+0.025
 means+0.025
Neg logit contributions
ust-0.040
 centuries-0.040
Ar-0.039
ur-0.038
 ever-0.038
Tokenbles
Token ID7689
Feature activation+4.99e+00

Pos logit contributions
 centuries+0.223
 eyel+0.219
ust+0.213
 prey+0.213
Ar+0.212
Neg logit contributions
-0.233
 sees-0.224
 draws-0.221
 understands-0.221
 means-0.209
Token.
Token ID13
Feature activation+5.85e-01

Pos logit contributions
ust+0.145
 centuries+0.144
Ar+0.138
 ever+0.137
ur+0.137
Neg logit contributions
-0.143
 understands-0.141
 sees-0.138
 draws-0.137
 means-0.135
Token Lily
Token ID20037
Feature activation-1.18e+00

Pos logit contributions
+0.021
 understands+0.021
 sees+0.020
 draws+0.020
 means+0.020
Neg logit contributions
 centuries-0.044
ust-0.044
Ar-0.043
ur-0.042
 ever-0.042
Token.
Token ID13
Feature activation+8.77e-01

Pos logit contributions
Fighting+0.040
 disl+0.039
 understands+0.039
+0.039
 draws+0.039
Neg logit contributions
 ever-0.027
 ho-0.025
ct-0.025
ust-0.024
 centuries-0.024
Token It
Token ID632
Feature activation-7.88e+00

Pos logit contributions
 ever+0.021
 eyel+0.020
ust+0.020
ur+0.020
 ho+0.020
Neg logit contributions
 realized-0.028
 understands-0.027
Fighting-0.026
 sees-0.026
:"-0.026
Token was
Token ID373
Feature activation-1.38e+00

Pos logit contributions
+0.114
 understands+0.099
 draws+0.092
:"+0.092
 sees+0.090
Neg logit contributions
ust-0.344
 ever-0.344
 centuries-0.336
Ar-0.328
 eyel-0.323
Token my
Token ID616
Feature activation+8.20e-01

Pos logit contributions
 draws+0.031
 sees+0.031
 understands+0.030
 "+0.029
+0.029
Neg logit contributions
Ar-0.048
ust-0.048
 centuries-0.047
ur-0.047
and-0.045
Token fault
Token ID8046
Feature activation-1.10e+01

Pos logit contributions
 centuries+0.030
ust+0.030
Ar+0.030
ur+0.029
abb+0.028
Neg logit contributions
 sees-0.018
 understands-0.018
-0.017
 draws-0.017
 means-0.016
Token.
Token ID13
Feature activation+6.93e-01

Pos logit contributions
+0.273
 understands+0.257
 means+0.255
 sees+0.254
 "+0.250
Neg logit contributions
Ar-0.373
 centuries-0.371
ust-0.367
ur-0.357
 prey-0.343
Token I
Token ID314
Feature activation+1.76e+00

Pos logit contributions
ust+0.023
 centuries+0.023
 ever+0.022
ur+0.022
Ar+0.022
Neg logit contributions
-0.016
 understands-0.016
 sees-0.016
 realized-0.016
:"-0.015
Token was
Token ID373
Feature activation-1.50e+00

Pos logit contributions
ust+0.072
 centuries+0.071
Ar+0.069
 ever+0.069
ur+0.069
Neg logit contributions
-0.030
 understands-0.029
 sees-0.028
 draws-0.028
 means-0.027
Token not
Token ID407
Feature activation+8.97e-01

Pos logit contributions
+0.035
 understands+0.035
 sees+0.034
 draws+0.034
 means+0.033
Neg logit contributions
ust-0.052
 centuries-0.051
Ar-0.051
ur-0.049
 prey-0.049
Token careful
Token ID8161
Feature activation-7.11e+00

Pos logit contributions
 centuries+0.030
ust+0.029
Ar+0.029
ur+0.028
 prey+0.027
Neg logit contributions
 understands-0.022
-0.022
 sees-0.022
 draws-0.021
 means-0.021
Token trust
Token ID3774
Feature activation+1.40e+00

Pos logit contributions
+0.084
 understands+0.082
 sees+0.080
 draws+0.079
 means+0.077
Neg logit contributions
 centuries-0.094
ust-0.093
Ar-0.090
 ever-0.089
ur-0.088
Token and
Token ID290
Feature activation-3.78e-01

Pos logit contributions
ust+0.043
 centuries+0.043
 ever+0.042
 ho+0.040
ur+0.040
Neg logit contributions
-0.038
 understands-0.037
 sees-0.036
 draws-0.036
 realized-0.035
Token care
Token ID1337
Feature activation-3.06e+00

Pos logit contributions
+0.011
 understands+0.010
 sees+0.010
:"+0.010
 draws+0.010
Neg logit contributions
 ever-0.011
 centuries-0.011
ust-0.011
 ho-0.011
 prey-0.011
Token,
Token ID11
Feature activation-1.93e+00

Pos logit contributions
+0.105
 draws+0.102
 sees+0.101
 understands+0.100
 realized+0.098
Neg logit contributions
ust-0.072
 ever-0.069
 centuries-0.068
and-0.066
ur-0.065
Token and
Token ID290
Feature activation-4.60e-01

Pos logit contributions
+0.054
 understands+0.053
 sees+0.051
 draws+0.051
 means+0.050
Neg logit contributions
ust-0.058
 centuries-0.057
 ever-0.056
 ho-0.055
Ar-0.054
Token that
Token ID326
Feature activation-1.10e+01

Pos logit contributions
+0.012
 understands+0.011
:"+0.011
 sees+0.011
 draws+0.010
Neg logit contributions
 ever-0.015
 centuries-0.014
 ho-0.014
ust-0.014
 souven-0.014
Token breaking
Token ID7163
Feature activation-2.26e+00

Pos logit contributions
+0.229
:"+0.203
 understands+0.200
 sees+0.195
 draws+0.191
Neg logit contributions
 ever-0.422
ust-0.404
 centuries-0.398
 ho-0.397
 prey-0.382
Token or
Token ID393
Feature activation+4.09e-01

Pos logit contributions
+0.053
 understands+0.050
 sees+0.049
 draws+0.048
Fighting+0.047
Neg logit contributions
 centuries-0.076
ust-0.075
 ever-0.075
 ho-0.073
 prey-0.071
Token losing
Token ID6078
Feature activation-6.99e+00

Pos logit contributions
 ever+0.013
ust+0.013
 ho+0.013
 centuries+0.013
 prey+0.012
Neg logit contributions
-0.010
 understands-0.010
:"-0.010
 draws-0.010
 sees-0.009
Token things
Token ID1243
Feature activation-1.76e+00

Pos logit contributions
+0.167
 understands+0.152
 sees+0.150
 draws+0.149
:"+0.143
Neg logit contributions
 centuries-0.239
 ever-0.237
ust-0.234
 ho-0.227
 prey-0.226
Token can
Token ID460
Feature activation-6.15e+00

Pos logit contributions
+0.046
 understands+0.045
 draws+0.044
 sees+0.043
:"+0.041
Neg logit contributions
 ever-0.055
ust-0.055
 centuries-0.054
ur-0.052
 ho-0.051
Token.
Token ID13
Feature activation+6.54e-01

Pos logit contributions
+0.089
 understands+0.088
 draws+0.086
 sees+0.085
 means+0.084
Neg logit contributions
ust-0.083
 centuries-0.081
 ever-0.080
 ho-0.078
ur-0.078
Token It
Token ID632
Feature activation-7.96e+00

Pos logit contributions
ur+0.018
 eyel+0.018
 ho+0.018
 ever+0.018
 centuries+0.018
Neg logit contributions
 realized-0.018
-0.017
 understands-0.017
Fighting-0.017
:"-0.017
Token's
Token ID338
Feature activation-1.61e+00

Pos logit contributions
+0.113
:"+0.093
 understands+0.091
 draws+0.090
Fighting+0.088
Neg logit contributions
 ever-0.348
ust-0.347
 centuries-0.336
Ar-0.330
ur-0.329
Token not
Token ID407
Feature activation+7.95e-01

Pos logit contributions
 sees+0.044
 understands+0.044
 means+0.043
 draws+0.043
 "+0.043
Neg logit contributions
 centuries-0.049
Ar-0.049
ur-0.047
ust-0.047
and-0.045
Token your
Token ID534
Feature activation-3.42e-01

Pos logit contributions
Ar+0.028
 centuries+0.028
ider+0.026
ust+0.026
ur+0.026
Neg logit contributions
 means-0.018
 understands-0.018
 sees-0.018
 draws-0.018
 "-0.017
Token fault
Token ID8046
Feature activation-1.10e+01

Pos logit contributions
 understands+0.008
 sees+0.008
+0.008
 draws+0.008
 means+0.008
Neg logit contributions
 centuries-0.012
ust-0.012
Ar-0.011
ur-0.011
 ever-0.011
Token.
Token ID13
Feature activation+5.36e-01

Pos logit contributions
+0.312
 understands+0.308
 sees+0.301
 draws+0.300
 means+0.295
Neg logit contributions
ust-0.325
 centuries-0.321
Ar-0.309
 ever-0.308
ur-0.306
Token Fl
Token ID1610
Feature activation+4.74e+00

Pos logit contributions
 centuries+0.020
ust+0.020
ur+0.019
 ever+0.019
Ar+0.019
Neg logit contributions
-0.010
 sees-0.010
 realized-0.010
 understands-0.010
:"-0.009
Tokenuffy
Token ID15352
Feature activation+3.69e+00

Pos logit contributions
 centuries+0.128
ust+0.125
Ar+0.124
ur+0.119
 ever+0.118
Neg logit contributions
-0.152
 understands-0.147
 sees-0.146
 draws-0.144
 means-0.142
Token was
Token ID373
Feature activation-1.64e+00

Pos logit contributions
 ever+0.156
 ho+0.151
 centuries+0.145
ust+0.144
 eyel+0.144
Neg logit contributions
-0.055
:"-0.049
Fighting-0.049
 disl-0.046
 "'-0.046
Token old
Token ID1468
Feature activation-6.70e+00

Pos logit contributions
 sees+0.041
 understands+0.041
 draws+0.039
 means+0.039
+0.038
Neg logit contributions
ust-0.054
 centuries-0.053
Ar-0.053
ur-0.052
 souven-0.050
Token and
Token ID290
Feature activation-4.20e-01

Pos logit contributions
+0.046
 understands+0.045
 sees+0.044
 draws+0.044
 means+0.043
Neg logit contributions
ust-0.063
 centuries-0.063
 ever-0.060
Ar-0.060
ur-0.060
Token forgiveness
Token ID26027
Feature activation-3.71e+00

Pos logit contributions
+0.010
 understands+0.009
:"+0.009
 draws+0.009
Fighting+0.009
Neg logit contributions
 ever-0.014
 ho-0.014
 centuries-0.014
 prey-0.013
 souven-0.013
Token.
Token ID13
Feature activation-4.11e-02

Pos logit contributions
+0.118
 sees+0.111
 draws+0.111
 understands+0.109
Fighting+0.109
Neg logit contributions
 ever-0.100
ust-0.091
 centuries-0.087
 ho-0.085
 prey-0.081
Token It
Token ID632
Feature activation-8.32e+00

Pos logit contributions
+0.001
 understands+0.001
 sees+0.001
 draws+0.001
 means+0.001
Neg logit contributions
ust-0.002
 centuries-0.002
Ar-0.002
ur-0.002
 ever-0.002
Token shows
Token ID2523
Feature activation-6.10e-01

Pos logit contributions
+0.125
:"+0.097
Fighting+0.097
 Building+0.090
liness+0.089
Neg logit contributions
 ever-0.360
ust-0.358
 centuries-0.348
Ar-0.340
ur-0.339
Token that
Token ID326
Feature activation-1.10e+01

Pos logit contributions
+0.013
 understands+0.013
 sees+0.013
 draws+0.013
 means+0.013
Neg logit contributions
ust-0.022
 centuries-0.022
Ar-0.021
ur-0.021
 ever-0.021
Token lending
Token ID20983
Feature activation-2.41e+00

Pos logit contributions
+0.216
 understands+0.205
 sees+0.198
 draws+0.196
 means+0.185
Neg logit contributions
ust-0.419
 centuries-0.418
 ever-0.412
Ar-0.401
 prey-0.401
Token and
Token ID290
Feature activation-4.37e-01

Pos logit contributions
+0.047
:"+0.039
Fighting+0.039
 End+0.039
 "'+0.037
Neg logit contributions
 ever-0.091
 centuries-0.089
 ho-0.088
 prey-0.086
ust-0.086
Token borrowing
Token ID23669
Feature activation-2.34e+00

Pos logit contributions
+0.012
 understands+0.011
:"+0.011
 sees+0.010
 draws+0.010
Neg logit contributions
 ho-0.014
 ever-0.013
 centuries-0.013
 prey-0.013
 souven-0.013
Token things
Token ID1243
Feature activation-1.73e+00

Pos logit contributions
+0.052
 sees+0.047
 understands+0.047
Fighting+0.046
 draws+0.046
Neg logit contributions
 centuries-0.081
 ever-0.080
ust-0.078
 ho-0.077
 prey-0.077
Token requires
Token ID4433
Feature activation-3.06e+00

Pos logit contributions
+0.043
 understands+0.042
 sees+0.041
 draws+0.041
:"+0.039
Neg logit contributions
ust-0.056
 ever-0.056
 centuries-0.055
ur-0.053
Ar-0.052
Token hurts
Token ID20406
Feature activation-3.71e+00

Pos logit contributions
liness+0.001
+0.001
:"+0.001
+0.001
 draws+0.001
Neg logit contributions
 ever-0.001
 ho-0.001
ur-0.001
ust-0.001
beh-0.001
Token Ben
Token ID3932
Feature activation-2.09e+00

Pos logit contributions
+0.095
 understands+0.092
 draws+0.089
 sees+0.089
 means+0.084
Neg logit contributions
ust-0.120
 centuries-0.119
 ever-0.119
 ho-0.114
 prey-0.113
Token.
Token ID13
Feature activation+4.49e-01

Pos logit contributions
+0.055
 understands+0.052
 draws+0.052
 sees+0.050
 disl+0.049
Neg logit contributions
ust-0.066
 ever-0.065
 centuries-0.064
 ho-0.063
 prey-0.061
Token She
Token ID1375
Feature activation+5.06e-01

Pos logit contributions
ust+0.017
 centuries+0.017
Ar+0.016
ur+0.016
 ever+0.016
Neg logit contributions
-0.009
 understands-0.009
 sees-0.009
 draws-0.008
 means-0.008
Token says
Token ID1139
Feature activation-2.93e+00

Pos logit contributions
ust+0.021
 centuries+0.020
 ever+0.020
ur+0.020
ct+0.020
Neg logit contributions
-0.008
liness-0.007
 "'-0.007
:"-0.006
Fighting-0.006
Token that
Token ID326
Feature activation-1.10e+01

Pos logit contributions
liness+0.077
 draws+0.074
+0.073
 understands+0.068
 Building+0.067
Neg logit contributions
 ever-0.098
 ho-0.089
ur-0.088
ust-0.082
 souven-0.081
Token brothers
Token ID9397
Feature activation+1.60e+00

Pos logit contributions
+0.269
 understands+0.244
:"+0.235
 sees+0.231
 draws+0.231
Neg logit contributions
 ever-0.381
ust-0.367
 centuries-0.365
 prey-0.355
Ar-0.354
Token should
Token ID815
Feature activation-5.90e+00

Pos logit contributions
 ever+0.052
ust+0.049
 ho+0.047
 centuries+0.047
ur+0.046
Neg logit contributions
-0.043
 draws-0.040
:"-0.039
 understands-0.039
 "'-0.038
Token be
Token ID307
Feature activation+1.51e+00

Pos logit contributions
+0.173
 understands+0.170
 sees+0.166
 draws+0.165
 means+0.163
Neg logit contributions
ust-0.169
 centuries-0.167
Ar-0.162
 ever-0.160
ur-0.159
Token kind
Token ID1611
Feature activation-2.27e+00

Pos logit contributions
ust+0.041
 centuries+0.040
 ever+0.040
 ho+0.039
Ar+0.039
Neg logit contributions
-0.046
 draws-0.044
 understands-0.043
 sees-0.042
liness-0.042
Token and
Token ID290
Feature activation-1.36e-01

Pos logit contributions
+0.057
 understands+0.053
 draws+0.052
 sees+0.051
:"+0.050
Neg logit contributions
 ever-0.076
ust-0.074
 centuries-0.074
 prey-0.072
 ho-0.071
Token.
Token ID13
Feature activation+7.50e-01

Pos logit contributions
+0.094
 understands+0.093
 sees+0.091
 draws+0.090
 realized+0.087
Neg logit contributions
ust-0.099
 centuries-0.095
 ever-0.093
ur-0.091
 ho-0.091
Token It
Token ID632
Feature activation-7.94e+00

Pos logit contributions
ust+0.025
ur+0.025
 centuries+0.025
 ever+0.024
 ho+0.024
Neg logit contributions
-0.017
 understands-0.017
 sees-0.017
 realized-0.016
Fighting-0.016
Token's
Token ID338
Feature activation-1.38e+00

Pos logit contributions
+0.121
:"+0.102
 understands+0.101
Fighting+0.100
 draws+0.098
Neg logit contributions
ust-0.343
 ever-0.336
 centuries-0.326
ur-0.323
 ho-0.322
Token not
Token ID407
Feature activation+9.41e-01

Pos logit contributions
 means+0.038
 understands+0.037
 sees+0.037
 draws+0.036
 "+0.036
Neg logit contributions
Ar-0.042
 centuries-0.041
ur-0.041
and-0.040
ust-0.040
Token your
Token ID534
Feature activation-2.23e-01

Pos logit contributions
Ar+0.033
 centuries+0.032
ider+0.032
ur+0.031
ippy+0.031
Neg logit contributions
 means-0.022
 sees-0.022
 understands-0.021
 draws-0.021
 "-0.021
Token fault
Token ID8046
Feature activation-1.10e+01

Pos logit contributions
 sees+0.005
 understands+0.005
+0.005
 means+0.005
 draws+0.005
Neg logit contributions
 centuries-0.008
ust-0.008
Ar-0.008
ur-0.007
 ever-0.007
Token.
Token ID13
Feature activation+5.87e-01

Pos logit contributions
+0.332
 understands+0.332
 draws+0.326
 sees+0.324
 means+0.309
Neg logit contributions
ust-0.302
 ever-0.301
 centuries-0.288
 ho-0.284
ur-0.279
Token The
Token ID383
Feature activation-7.32e+00

Pos logit contributions
ust+0.026
 centuries+0.025
Ar+0.025
ur+0.025
 ever+0.025
Neg logit contributions
-0.008
 understands-0.008
 sees-0.008
 draws-0.008
 means-0.007
Token rope
Token ID17182
Feature activation-1.72e+00

Pos logit contributions
 sees+0.156
+0.154
 understands+0.153
 draws+0.149
 means+0.147
Neg logit contributions
ust-0.272
 centuries-0.266
ur-0.265
Ar-0.256
abb-0.251
Token was
Token ID373
Feature activation-1.55e+00

Pos logit contributions
+0.021
 understands+0.019
 draws+0.018
Fighting+0.017
 sees+0.017
Neg logit contributions
 ever-0.077
 centuries-0.077
ust-0.077
ur-0.075
 ho-0.075
Token old
Token ID1468
Feature activation-6.68e+00

Pos logit contributions
 sees+0.037
 understands+0.036
 means+0.036
 draws+0.036
+0.036
Neg logit contributions
ust-0.052
 centuries-0.051
Ar-0.050
ur-0.049
and-0.049
Token and
Token ID290
Feature activation-6.30e-01

Pos logit contributions
+0.031
 understands+0.030
 sees+0.030
 draws+0.029
 means+0.029
Neg logit contributions
ust-0.057
 centuries-0.056
Ar-0.055
ur-0.054
 ever-0.054
Token said
Token ID531
Feature activation-2.90e+00

Pos logit contributions
+0.013
 understands+0.012
 draws+0.012
Fighting+0.012
 sees+0.011
Neg logit contributions
 centuries-0.023
 ever-0.023
 ho-0.023
ur-0.023
 souven-0.023
Token goodbye
Token ID24829
Feature activation+1.11e+00

Pos logit contributions
 draws+0.069
 sees+0.069
+0.067
 understands+0.066
 means+0.064
Neg logit contributions
 ever-0.097
ur-0.096
 centuries-0.095
ust-0.094
 ho-0.094
Token.
Token ID13
Feature activation-6.84e-02

Pos logit contributions
ust+0.040
 centuries+0.040
Ar+0.039
 prey+0.037
ur+0.037
Neg logit contributions
-0.026
 understands-0.024
 "-0.023
 sees-0.023
 realized-0.023
Token From
Token ID3574
Feature activation+1.47e+00

Pos logit contributions
+0.001
 understands+0.001
 sees+0.001
 draws+0.001
 means+0.001
Neg logit contributions
ust-0.003
 centuries-0.003
Ar-0.003
ur-0.003
 ever-0.003
Token that
Token ID326
Feature activation-1.09e+01

Pos logit contributions
 centuries+0.063
 ever+0.062
ust+0.061
ur+0.061
 ho+0.061
Neg logit contributions
 understands-0.021
-0.021
 draws-0.021
 sees-0.019
Fighting-0.018
Token day
Token ID1110
Feature activation-3.50e+00

Pos logit contributions
+0.329
 understands+0.326
 disl+0.324
 draws+0.321
 sees+0.309
Neg logit contributions
 ever-0.322
 centuries-0.305
 ho-0.300
Ar-0.298
 souven-0.282
Token on
Token ID319
Feature activation+4.32e+00

Pos logit contributions
+0.087
 understands+0.080
 means+0.079
 sees+0.078
 "+0.077
Neg logit contributions
ust-0.124
 centuries-0.117
Ar-0.117
 prey-0.115
ur-0.113
Token,
Token ID11
Feature activation-2.21e+00

Pos logit contributions
 centuries+0.148
 ever+0.147
 ho+0.143
ur+0.139
ust+0.139
Neg logit contributions
 understands-0.099
-0.098
 draws-0.097
Fighting-0.097
 sees-0.093
Token Lily
Token ID20037
Feature activation-1.91e+00

Pos logit contributions
+0.036
 understands+0.035
 sees+0.033
 draws+0.033
 means+0.032
Neg logit contributions
ust-0.092
 centuries-0.092
Ar-0.090
 ever-0.089
ur-0.089
Token knew
Token ID2993
Feature activation-4.50e+00

Pos logit contributions
 disl+0.048
+0.046
Fighting+0.045
:"+0.044
 mixer+0.042
Neg logit contributions
 ever-0.067
 ho-0.061
 gall-0.057
ur-0.056
Ar-0.056
Token a
Token ID257
Feature activation+6.04e+00

Pos logit contributions
+0.006
 understands+0.006
 sees+0.006
 draws+0.006
 means+0.006
Neg logit contributions
ust-0.011
 centuries-0.011
Ar-0.011
ur-0.011
 ever-0.011
Token broom
Token ID47085
Feature activation+9.39e-01

Pos logit contributions
ust+0.209
 centuries+0.206
Ar+0.197
 ever+0.197
ur+0.196
Neg logit contributions
-0.144
 understands-0.143
 sees-0.140
 draws-0.137
 means-0.135
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
ust+0.032
 centuries+0.031
Ar+0.031
ur+0.030
 ever+0.030
Neg logit contributions
-0.023
 understands-0.022
 sees-0.022
 draws-0.021
 means-0.021
Token a
Token ID257
Feature activation+6.05e+00

Pos logit contributions
+0.000
 understands+0.000
 sees+0.000
 draws+0.000
 means+0.000
Neg logit contributions
ust-0.000
 centuries-0.000
Ar-0.000
ur-0.000
 ever-0.000
Token dust
Token ID8977
Feature activation-5.45e-01

Pos logit contributions
 centuries+0.184
ust+0.184
Ar+0.180
 ho+0.177
ur+0.176
Neg logit contributions
-0.160
 understands-0.159
 draws-0.154
 sees-0.154
 means-0.153
Tokenpan
Token ID6839
Feature activation-1.09e+01

Pos logit contributions
+0.011
 understands+0.010
 sees+0.010
 draws+0.010
 means+0.010
Neg logit contributions
ust-0.021
 centuries-0.021
Ar-0.020
ur-0.020
 ever-0.020
Token.
Token ID13
Feature activation+7.57e-01

Pos logit contributions
 understands+0.283
+0.282
 sees+0.278
 means+0.274
 draws+0.273
Neg logit contributions
ust-0.349
 centuries-0.345
Ar-0.336
ur-0.327
 prey-0.321
Token They
Token ID1119
Feature activation+3.57e-01

Pos logit contributions
 centuries+0.022
Ar+0.021
ust+0.021
ur+0.021
 prey+0.020
Neg logit contributions
-0.022
 understands-0.021
 draws-0.020
 sees-0.020
Fighting-0.020
Token help
Token ID1037
Feature activation-9.15e-01

Pos logit contributions
 centuries+0.010
ust+0.010
Ar+0.010
ur+0.010
 ever+0.010
Neg logit contributions
-0.010
 understands-0.010
 sees-0.010
 draws-0.010
 means-0.010
Token him
Token ID683
Feature activation+3.72e+00

Pos logit contributions
 sees+0.014
+0.014
 means+0.014
 understands+0.013
 draws+0.013
Neg logit contributions
ust-0.040
 centuries-0.039
Ar-0.038
ur-0.038
abb-0.038
Token sweep
Token ID16085
Feature activation-4.40e+00

Pos logit contributions
ust+0.115
 centuries+0.114
ur+0.111
 ever+0.110
Ar+0.110
Neg logit contributions
-0.099
 understands-0.098
 sees-0.094
 draws-0.094
 means-0.092
Token\n
Token ID198
Feature activation+6.94e-01

Pos logit contributions
 ever+0.044
 centuries+0.043
 ho+0.042
ct+0.041
 prey+0.040
Neg logit contributions
 understands-0.038
-0.038
Fighting-0.036
 disl-0.035
edience-0.035
Token"
Token ID1
Feature activation-7.48e+00

Pos logit contributions
ust+0.027
 prey+0.027
 ho+0.027
ur+0.027
 centuries+0.026
Neg logit contributions
-0.014
 understands-0.012
 "-0.011
Fighting-0.011
 sees-0.011
TokenIt
Token ID1026
Feature activation-4.37e+00

Pos logit contributions
+0.169
 means+0.155
 understands+0.153
 draws+0.150
 sees+0.148
Neg logit contributions
 centuries-0.276
ur-0.273
ust-0.273
 ho-0.265
 prey-0.264
Token's
Token ID338
Feature activation-8.13e-01

Pos logit contributions
Fighting+0.077
+0.075
 End+0.071
Playing+0.066
 disl+0.065
Neg logit contributions
 ever-0.168
ust-0.168
 centuries-0.161
ur-0.160
 ho-0.159
Token your
Token ID534
Feature activation+2.17e-01

Pos logit contributions
 means+0.021
+0.021
 understands+0.020
 sees+0.020
 draws+0.020
Neg logit contributions
ust-0.025
Ar-0.025
 centuries-0.025
ur-0.025
and-0.024
Token fault
Token ID8046
Feature activation-1.09e+01

Pos logit contributions
ust+0.008
 centuries+0.008
Ar+0.007
ur+0.007
abb+0.007
Neg logit contributions
 sees-0.005
 understands-0.005
 draws-0.005
 means-0.005
-0.005
Token,
Token ID11
Feature activation-8.36e-01

Pos logit contributions
+0.287
 understands+0.285
 draws+0.279
 sees+0.278
 means+0.268
Neg logit contributions
ust-0.341
 centuries-0.334
 ever-0.333
 ho-0.324
ur-0.322
Token you
Token ID345
Feature activation+7.11e+00

Pos logit contributions
+0.017
 understands+0.017
 sees+0.016
 means+0.016
 draws+0.016
Neg logit contributions
 centuries-0.031
ust-0.031
Ar-0.031
ur-0.030
 ever-0.029
Token crashed
Token ID14997
Feature activation-1.16e+00

Pos logit contributions
ust+0.253
 centuries+0.250
 ever+0.243
ur+0.241
Ar+0.241
Neg logit contributions
-0.160
 understands-0.155
 draws-0.151
 sees-0.150
 means-0.147
Token into
Token ID656
Feature activation+6.33e+00

Pos logit contributions
+0.030
 understands+0.030
 sees+0.029
 draws+0.029
 means+0.028
Neg logit contributions
ust-0.037
 centuries-0.036
Ar-0.035
 ever-0.035
ur-0.035
Token me
Token ID502
Feature activation+5.05e+00

Pos logit contributions
ust+0.270
ur+0.261
 centuries+0.260
Ar+0.258
 ever+0.251
Neg logit contributions
-0.097
 understands-0.095
 sees-0.094
 means-0.093
 draws-0.092
Token she
Token ID673
Feature activation+4.00e-01

Pos logit contributions
+0.032
 understands+0.031
 draws+0.030
 sees+0.030
 means+0.028
Neg logit contributions
 centuries-0.072
ust-0.071
 ever-0.070
ur-0.069
 ho-0.068
Token accidentally
Token ID14716
Feature activation-4.44e+00

Pos logit contributions
ust+0.017
 centuries+0.017
 ever+0.016
Ar+0.016
ur+0.016
Neg logit contributions
-0.006
 understands-0.006
 means-0.006
 draws-0.006
 sees-0.006
Token broke
Token ID6265
Feature activation-5.00e+00

Pos logit contributions
 sees+0.097
 understands+0.097
+0.097
 draws+0.095
 realized+0.093
Neg logit contributions
ust-0.161
 centuries-0.157
Ar-0.155
 prey-0.149
abb-0.149
Token the
Token ID262
Feature activation-6.79e+00

Pos logit contributions
+0.072
 understands+0.069
 sees+0.067
 means+0.065
 draws+0.065
Neg logit contributions
ust-0.218
 centuries-0.216
Ar-0.213
ur-0.210
 ever-0.206
Token sign
Token ID1051
Feature activation-8.22e-01

Pos logit contributions
+0.181
 understands+0.178
 sees+0.176
 draws+0.171
 means+0.170
Neg logit contributions
ust-0.214
 centuries-0.211
Ar-0.204
ur-0.204
 ever-0.199
Token that
Token ID326
Feature activation-1.09e+01

Pos logit contributions
+0.022
 understands+0.021
 sees+0.021
 draws+0.021
 means+0.020
Neg logit contributions
ust-0.026
 centuries-0.025
 ever-0.024
Ar-0.024
ur-0.024
Token said
Token ID531
Feature activation-2.82e+00

Pos logit contributions
+0.212
 understands+0.196
 draws+0.190
 sees+0.186
:"+0.177
Neg logit contributions
 ever-0.435
 centuries-0.423
ust-0.416
 ho-0.403
ur-0.402
Token "
Token ID366
Feature activation-6.64e+00

Pos logit contributions
+0.055
 draws+0.055
 sees+0.055
 understands+0.054
Fighting+0.054
Neg logit contributions
 ever-0.108
 centuries-0.103
ur-0.102
ust-0.102
 souven-0.101
TokenDo
Token ID5211
Feature activation-1.06e+00

Pos logit contributions
+0.179
 means+0.167
Fighting+0.162
 draws+0.161
 sees+0.154
Neg logit contributions
ur-0.210
 centuries-0.206
ust-0.202
 eyel-0.199
 prey-0.195
Token not
Token ID407
Feature activation+6.45e-01

Pos logit contributions
+0.040
 understands+0.038
 draws+0.038
liness+0.037
 sees+0.037
Neg logit contributions
ust-0.024
 ever-0.023
 centuries-0.022
ur-0.021
 ho-0.020
Token touch
Token ID3638
Feature activation+8.37e-01

Pos logit contributions
Ar+0.024
 centuries+0.023
ust+0.022
ippy+0.022
ider+0.022
Neg logit contributions
 draws-0.015
 means-0.014
 sees-0.014
-0.014
 "-0.013
Token.
Token ID13
Feature activation+5.66e-01

Pos logit contributions
 understands+0.021
+0.021
Fighting+0.020
 means+0.020
 disl+0.019
Neg logit contributions
 ever-0.013
ust-0.012
 centuries-0.012
 eyel-0.011
ur-0.011
Token She
Token ID1375
Feature activation+2.47e-01

Pos logit contributions
 ever+0.015
ust+0.014
 eyel+0.014
 centuries+0.014
and+0.014
Neg logit contributions
-0.017
 understands-0.017
 means-0.016
 realized-0.016
 sees-0.015
Token was
Token ID373
Feature activation-1.46e+00

Pos logit contributions
 ever+0.010
 eyel+0.009
Ar+0.009
 centuries+0.009
ust+0.009
Neg logit contributions
liness-0.004
-0.004
 means-0.004
 understands-0.004
Fighting-0.004
Token only
Token ID691
Feature activation+1.70e+00

Pos logit contributions
 "+0.034
 sees+0.034
+0.032
 draws+0.032
 means+0.031
Neg logit contributions
ust-0.052
Ar-0.051
 centuries-0.050
 prey-0.049
abb-0.048
Token three
Token ID1115
Feature activation+4.75e+00

Pos logit contributions
Ar+0.064
 centuries+0.063
ust+0.063
addle+0.058
 prey+0.058
Neg logit contributions
 "-0.037
 sees-0.036
-0.034
 understands-0.034
 means-0.033
Token years
Token ID812
Feature activation+1.02e+01

Pos logit contributions
ust+0.088
 centuries+0.088
Ar+0.084
ur+0.081
 ever+0.080
Neg logit contributions
-0.187
 understands-0.185
 sees-0.182
 draws-0.181
 means-0.180
Token old
Token ID1468
Feature activation-6.04e+00

Pos logit contributions
ust+0.478
 centuries+0.475
Ar+0.466
ur+0.459
 ever+0.459
Neg logit contributions
-0.115
 understands-0.110
 sees-0.105
 draws-0.102
 means-0.100
Token and
Token ID290
Feature activation-2.04e-01

Pos logit contributions
 understands+0.160
 draws+0.156
+0.156
 disl+0.154
 sees+0.152
Neg logit contributions
 centuries-0.178
 ever-0.178
ust-0.178
 ho-0.176
 eyel-0.172
Token she
Token ID673
Feature activation+1.08e+00

Pos logit contributions
 draws+0.005
 sees+0.005
+0.005
liness+0.005
 understands+0.004
Neg logit contributions
 ever-0.007
 eyel-0.007
ust-0.007
 centuries-0.006
 quite-0.006
Token loved
Token ID6151
Feature activation-1.54e+00

Pos logit contributions
 ever+0.044
ust+0.041
 eyel+0.040
 centuries+0.040
ur+0.039
Neg logit contributions
-0.020
liness-0.020
 means-0.019
 understands-0.019
 draws-0.019
Token playing
Token ID2712
Feature activation+1.15e+00

Pos logit contributions
+0.042
 understands+0.041
 sees+0.039
 draws+0.039
 means+0.039
Neg logit contributions
 centuries-0.048
ust-0.047
 ever-0.047
 ho-0.045
 prey-0.045
Token smelled
Token ID37609
Feature activation+3.75e-01

Pos logit contributions
Fighting+0.007
 understands+0.007
:"+0.007
+0.007
 sees+0.007
Neg logit contributions
 ever-0.011
 centuries-0.011
 ho-0.011
Ar-0.010
 souven-0.010
Token delicious
Token ID12625
Feature activation+1.83e-01

Pos logit contributions
 centuries+0.010
 ever+0.010
ust+0.010
 souven+0.009
 ho+0.009
Neg logit contributions
-0.012
 understands-0.012
 draws-0.012
 sees-0.011
 means-0.011
Token.
Token ID13
Feature activation+4.34e-01

Pos logit contributions
ust+0.006
 centuries+0.006
Ar+0.006
ur+0.005
 prey+0.005
Neg logit contributions
-0.005
 understands-0.005
 sees-0.005
 means-0.005
 draws-0.005
Token
Token ID198
Feature activation+2.14e+00

Pos logit contributions
 centuries+0.014
Ar+0.014
 ho+0.014
ust+0.014
ur+0.013
Neg logit contributions
 disl-0.010
 sees-0.010
 draws-0.009
 understands-0.009
Fighting-0.009
Token
Token ID198
Feature activation+2.12e+00

Pos logit contributions
Ar+0.072
 centuries+0.071
ust+0.069
 ever+0.067
 prey+0.067
Neg logit contributions
 draws-0.051
-0.050
 means-0.049
 sees-0.049
 understands-0.049
TokenHar
Token ID13587
Feature activation+1.01e+01

Pos logit contributions
Ar+0.047
ust+0.046
 centuries+0.046
 ho+0.043
 prey+0.041
Neg logit contributions
 understands-0.074
 sees-0.074
 draws-0.074
 means-0.072
-0.072
Tokenper
Token ID525
Feature activation+2.94e+00

Pos logit contributions
ust+0.300
 centuries+0.289
Ar+0.283
 eyel+0.281
 prey+0.280
Neg logit contributions
 understands-0.285
-0.285
 means-0.281
 sees-0.275
 draws-0.271
Token and
Token ID290
Feature activation-3.25e-01

Pos logit contributions
ust+0.124
 centuries+0.123
Ar+0.120
 ever+0.119
ur+0.119
Neg logit contributions
-0.046
 understands-0.045
 sees-0.043
 draws-0.043
 means-0.041
Token her
Token ID607
Feature activation+2.98e+00

Pos logit contributions
+0.007
 understands+0.007
 sees+0.007
 draws+0.007
 means+0.007
Neg logit contributions
ust-0.012
 centuries-0.012
Ar-0.011
ur-0.011
 ever-0.011
Token mom
Token ID1995
Feature activation-2.50e+00

Pos logit contributions
ust+0.125
 centuries+0.123
Ar+0.120
ur+0.120
 ever+0.116
Neg logit contributions
-0.050
 understands-0.050
 sees-0.048
 draws-0.046
 means-0.045
Token had
Token ID550
Feature activation-2.69e+00

Pos logit contributions
+0.058
 understands+0.058
 means+0.056
 disl+0.056
Fighting+0.056
Neg logit contributions
 ever-0.086
ust-0.084
 centuries-0.083
 ho-0.082
ur-0.081
Tokenier
Token ID959
Feature activation-1.75e-01

Pos logit contributions
Ar+0.186
 centuries+0.180
ust+0.176
 prey+0.169
 eyel+0.165
Neg logit contributions
 means-0.134
 understands-0.134
-0.133
 sees-0.130
:"-0.125
Token.
Token ID13
Feature activation+1.34e+00

Pos logit contributions
+0.005
 understands+0.005
 sees+0.005
 draws+0.005
 means+0.005
Neg logit contributions
ust-0.005
 centuries-0.005
Ar-0.005
 ever-0.005
ur-0.005
Token
Token ID198
Feature activation+3.39e+00

Pos logit contributions
ust+0.049
 centuries+0.049
Ar+0.048
 ever+0.047
ur+0.047
Neg logit contributions
-0.028
 understands-0.028
 sees-0.027
 draws-0.027
 means-0.026
Token
Token ID198
Feature activation+3.39e+00

Pos logit contributions
 centuries+0.117
Ar+0.117
ust+0.114
 ever+0.111
ur+0.111
Neg logit contributions
-0.078
 understands-0.075
 draws-0.074
 sees-0.073
 means-0.073
TokenThey
Token ID2990
Feature activation+2.34e+00

Pos logit contributions
ust+0.105
 centuries+0.103
ur+0.100
 prey+0.098
 ever+0.098
Neg logit contributions
-0.096
 understands-0.091
 sees-0.090
 draws-0.089
 means-0.088
Token walk
Token ID2513
Feature activation+9.80e+00

Pos logit contributions
ust+0.081
 centuries+0.079
 prey+0.077
Ar+0.076
and+0.075
Neg logit contributions
 sees-0.058
 understands-0.057
-0.056
 draws-0.055
 realized-0.054
Token back
Token ID736
Feature activation+9.67e-01

Pos logit contributions
 centuries+0.314
ust+0.312
Ar+0.306
 ever+0.304
ur+0.301
Neg logit contributions
-0.250
 understands-0.246
 draws-0.240
 sees-0.236
 means-0.235
Token home
Token ID1363
Feature activation-1.55e+00

Pos logit contributions
ust+0.031
 centuries+0.031
Ar+0.030
ur+0.030
 ever+0.030
Neg logit contributions
-0.025
 understands-0.024
 draws-0.023
 sees-0.023
 means-0.023
Token.
Token ID13
Feature activation+1.38e+00

Pos logit contributions
+0.042
 understands+0.042
 sees+0.041
 draws+0.040
 means+0.040
Neg logit contributions
ust-0.048
 centuries-0.047
Ar-0.046
ur-0.045
 ever-0.044
Token They
Token ID1119
Feature activation+1.12e+00

Pos logit contributions
ust+0.059
 centuries+0.056
ur+0.055
 ho+0.055
 ever+0.054
Neg logit contributions
-0.023
 understands-0.023
 draws-0.023
 means-0.023
 sees-0.022
Token are
Token ID389
Feature activation+1.02e+00

Pos logit contributions
ust+0.041
 centuries+0.040
Ar+0.039
 prey+0.038
ur+0.038
Neg logit contributions
 sees-0.025
 understands-0.025
-0.025
 draws-0.024
 means-0.023
Token on
Token ID319
Feature activation+4.50e+00

Pos logit contributions
+0.040
 understands+0.039
 sees+0.038
 draws+0.037
 means+0.037
Neg logit contributions
ust-0.098
 centuries-0.097
Ar-0.095
ur-0.093
 ever-0.093
Token tight
Token ID5381
Feature activation+1.19e+00

Pos logit contributions
ust+0.157
 centuries+0.152
Ar+0.147
 prey+0.144
ur+0.141
Neg logit contributions
-0.112
 understands-0.107
 sees-0.107
 draws-0.104
 "-0.102
Token as
Token ID355
Feature activation+3.66e+00

Pos logit contributions
ust+0.047
 centuries+0.045
Ar+0.044
ur+0.043
 ever+0.043
Neg logit contributions
-0.024
 understands-0.023
 sees-0.022
 means-0.022
 draws-0.022
Token the
Token ID262
Feature activation-6.53e+00

Pos logit contributions
ust+0.156
 centuries+0.152
Ar+0.149
ur+0.147
abb+0.146
Neg logit contributions
-0.060
 understands-0.057
 sees-0.055
 "-0.055
 means-0.053
Token pony
Token ID26902
Feature activation+1.53e+00

Pos logit contributions
+0.180
 understands+0.177
 sees+0.174
 draws+0.170
 means+0.169
Neg logit contributions
ust-0.201
 centuries-0.199
Ar-0.191
ur-0.191
 ever-0.187
Token tro
Token ID4161
Feature activation+1.03e+01

Pos logit contributions
ust+0.056
 centuries+0.055
Ar+0.053
abb+0.051
ur+0.051
Neg logit contributions
-0.035
 understands-0.033
 sees-0.033
 means-0.032
 draws-0.032
Tokentted
Token ID28734
Feature activation-8.67e-02

Pos logit contributions
 prey+0.290
ust+0.287
 centuries+0.282
 ho+0.276
 eyel+0.266
Neg logit contributions
-0.337
 "-0.296
:"-0.294
 understands-0.290
 sees-0.287
Token around
Token ID1088
Feature activation-4.86e+00

Pos logit contributions
 "+0.002
+0.002
 means+0.002
 that+0.001
 sees+0.001
Neg logit contributions
ust-0.003
 centuries-0.003
abb-0.003
izards-0.003
Ar-0.003
Token the
Token ID262
Feature activation-6.50e+00

Pos logit contributions
+0.061
 understands+0.059
 sees+0.056
 draws+0.055
 means+0.054
Neg logit contributions
ust-0.220
 centuries-0.220
Ar-0.215
ur-0.213
 ever-0.212
Token park
Token ID3952
Feature activation+4.89e-01

Pos logit contributions
 means+0.147
 understands+0.145
+0.142
 draws+0.140
 disl+0.139
Neg logit contributions
 centuries-0.221
ust-0.220
Ar-0.220
 ho-0.219
 prey-0.214
Token.
Token ID13
Feature activation+3.16e-01

Pos logit contributions
ust+0.015
 centuries+0.015
Ar+0.015
ur+0.014
 ever+0.014
Neg logit contributions
-0.013
 understands-0.013
 sees-0.012
 draws-0.012
 means-0.012
Token he
Token ID339
Feature activation+1.78e+00

Pos logit contributions
ust+0.150
 centuries+0.149
Ar+0.145
ur+0.143
 ever+0.143
Neg logit contributions
-0.098
 understands-0.096
 sees-0.094
 draws-0.093
 means-0.092
Token decided
Token ID3066
Feature activation-3.10e-01

Pos logit contributions
 centuries+0.074
ust+0.074
 ever+0.070
and+0.070
 prey+0.070
Neg logit contributions
-0.032
 realized-0.031
 sees-0.029
 saw-0.028
 understands-0.027
Token to
Token ID284
Feature activation+4.72e-01

Pos logit contributions
 draws+0.010
 understands+0.010
+0.010
liness+0.009
 means+0.009
Neg logit contributions
 ever-0.008
ur-0.008
ust-0.008
 centuries-0.008
 prey-0.008
Token take
Token ID1011
Feature activation+4.02e+00

Pos logit contributions
ust+0.016
ider+0.015
 centuries+0.015
Ar+0.015
abb+0.014
Neg logit contributions
 sees-0.012
 means-0.012
 draws-0.012
 understands-0.011
-0.011
Token a
Token ID257
Feature activation+6.47e+00

Pos logit contributions
ust+0.173
ider+0.157
abb+0.152
 centuries+0.146
ur+0.146
Neg logit contributions
 sees-0.078
 means-0.077
 draws-0.077
 it-0.076
 the-0.076
Token walk
Token ID2513
Feature activation+9.80e+00

Pos logit contributions
ust+0.217
 centuries+0.205
Ar+0.199
ider+0.196
abb+0.194
Neg logit contributions
 sees-0.176
-0.162
 understands-0.159
 saw-0.157
 draws-0.156
Token down
Token ID866
Feature activation+2.23e+00

Pos logit contributions
ust+0.376
 centuries+0.361
Ar+0.358
 eyel+0.350
abb+0.347
Neg logit contributions
-0.195
 sees-0.192
 means-0.191
 "-0.182
 draws-0.175
Token to
Token ID284
Feature activation+5.21e-01

Pos logit contributions
ust+0.126
abb+0.119
ur+0.117
 eyel+0.116
 centuries+0.116
Neg logit contributions
 means-0.009
 sees-0.009
 "-0.007
-0.006
 understands-0.005
Token the
Token ID262
Feature activation-5.86e+00

Pos logit contributions
ust+0.021
 centuries+0.021
Ar+0.020
ur+0.020
 ever+0.020
Neg logit contributions
-0.009
 understands-0.009
 sees-0.009
 draws-0.009
 means-0.009
Token beach
Token ID10481
Feature activation+2.14e+00

Pos logit contributions
+0.166
 understands+0.166
 draws+0.161
 means+0.158
 sees+0.156
Neg logit contributions
 centuries-0.173
ust-0.170
Ar-0.168
 prey-0.163
 ever-0.163
Token.
Token ID13
Feature activation+8.94e-01

Pos logit contributions
ust+0.067
 centuries+0.066
Ar+0.064
ur+0.063
 ever+0.062
Neg logit contributions
-0.057
 understands-0.056
 sees-0.056
 draws-0.054
 means-0.054
Token more
Token ID517
Feature activation+7.96e-01

Pos logit contributions
 centuries+0.312
ust+0.309
Ar+0.299
 ever+0.299
 ho+0.295
Neg logit contributions
-0.226
 understands-0.216
 sees-0.210
 draws-0.208
 means-0.207
Token hearts
Token ID11954
Feature activation+1.87e+00

Pos logit contributions
 centuries+0.029
 ho+0.028
ust+0.028
 ever+0.028
 souven+0.027
Neg logit contributions
-0.018
 understands-0.017
:"-0.016
 sees-0.016
Fighting-0.016
Token and
Token ID290
Feature activation-6.82e-02

Pos logit contributions
ust+0.067
 centuries+0.067
Ar+0.065
ur+0.064
 ever+0.063
Neg logit contributions
-0.041
 understands-0.040
 sees-0.039
 draws-0.039
 means-0.038
Token put
Token ID1234
Feature activation-4.41e+00

Pos logit contributions
+0.002
:"+0.002
 understands+0.002
Fighting+0.001
 disl+0.001
Neg logit contributions
 ever-0.002
 souven-0.002
 centuries-0.002
 ho-0.002
Ar-0.002
Token them
Token ID606
Feature activation-4.65e-02

Pos logit contributions
+0.086
 understands+0.082
 sees+0.077
Fighting+0.076
:"+0.076
Neg logit contributions
 centuries-0.169
 ever-0.166
Ar-0.165
ust-0.164
 ho-0.164
Token on
Token ID319
Feature activation+5.08e+00

Pos logit contributions
 draws+0.001
 sees+0.001
+0.001
 understands+0.001
 means+0.001
Neg logit contributions
ust-0.002
 centuries-0.002
Ar-0.002
 prey-0.001
ur-0.001
Token the
Token ID262
Feature activation-5.97e+00

Pos logit contributions
ust+0.213
 centuries+0.209
Ar+0.205
ur+0.205
 ever+0.201
Neg logit contributions
-0.083
 understands-0.080
 sees-0.078
 draws-0.077
 means-0.076
Token floor
Token ID4314
Feature activation-7.81e-01

Pos logit contributions
+0.136
 understands+0.133
 means+0.130
:"+0.124
 disl+0.123
Neg logit contributions
 centuries-0.208
 ho-0.204
 ever-0.203
 eyel-0.201
ust-0.199
Token.
Token ID13
Feature activation+7.58e-01

Pos logit contributions
+0.021
 understands+0.021
 sees+0.021
 draws+0.021
 means+0.020
Neg logit contributions
ust-0.024
 centuries-0.023
Ar-0.023
ur-0.022
 ever-0.022
Token They
Token ID1119
Feature activation+4.07e-01

Pos logit contributions
 centuries+0.027
Ar+0.027
ust+0.026
ur+0.026
 ever+0.025
Neg logit contributions
-0.016
 understands-0.016
 sees-0.016
 draws-0.015
 means-0.015
Token made
Token ID925
Feature activation-7.84e+00

Pos logit contributions
 ever+0.014
 centuries+0.014
 ho+0.014
ust+0.014
Ar+0.014
Neg logit contributions
-0.009
 understands-0.009
:"-0.009
 disl-0.009
liness-0.009
Token dad
Token ID9955
Feature activation-2.50e+00

Pos logit contributions
ust+0.149
 centuries+0.147
ur+0.146
Ar+0.144
abb+0.139
Neg logit contributions
 understands-0.059
 sees-0.056
-0.054
 realized-0.051
 draws-0.048
Token spent
Token ID3377
Feature activation-1.77e+00

Pos logit contributions
+0.043
 realized+0.042
 understands+0.042
 sees+0.042
 draws+0.038
Neg logit contributions
ust-0.102
 centuries-0.102
Ar-0.099
 prey-0.097
ur-0.096
Token the
Token ID262
Feature activation-6.26e+00

Pos logit contributions
+0.029
 understands+0.028
 sees+0.027
 draws+0.027
 means+0.026
Neg logit contributions
ust-0.074
 centuries-0.073
Ar-0.072
 ever-0.071
ur-0.071
Token whole
Token ID2187
Feature activation+2.14e+00

Pos logit contributions
 sees+0.126
+0.119
 understands+0.114
 End+0.111
 has+0.108
Neg logit contributions
ust-0.249
ur-0.247
uzz-0.232
ala-0.232
abb-0.230
Token day
Token ID1110
Feature activation-2.79e+00

Pos logit contributions
ust+0.096
ur+0.093
 prey+0.090
izards+0.090
 centuries+0.090
Neg logit contributions
-0.034
 sees-0.031
 understands-0.027
 realized-0.026
 End-0.026
Token visiting
Token ID10013
Feature activation+2.69e+00

Pos logit contributions
+0.057
 sees+0.057
 realized+0.051
 saw+0.050
 "+0.049
Neg logit contributions
ust-0.112
Ar-0.103
iri-0.101
 centuries-0.101
abb-0.099
Token the
Token ID262
Feature activation-6.22e+00

Pos logit contributions
ust+0.135
 centuries+0.129
ur+0.128
Ar+0.128
abb+0.126
Neg logit contributions
-0.024
 understands-0.023
 sees-0.022
 means-0.022
 "-0.019
Token dragon
Token ID10441
Feature activation+4.20e+00

Pos logit contributions
 sees+0.181
+0.181
 understands+0.180
 draws+0.175
 means+0.172
Neg logit contributions
ust-0.182
 centuries-0.176
ur-0.175
Ar-0.172
 ho-0.166
Token.
Token ID13
Feature activation+5.11e-01

Pos logit contributions
ust+0.150
Ar+0.147
 centuries+0.146
 eyel+0.141
ur+0.139
Neg logit contributions
 sees-0.093
-0.093
 understands-0.090
 means-0.089
 draws-0.088
Token They
Token ID1119
Feature activation-1.68e-04

Pos logit contributions
 centuries+0.019
ust+0.019
Ar+0.019
 ever+0.018
ur+0.018
Neg logit contributions
-0.010
 understands-0.010
 draws-0.010
 sees-0.010
 means-0.010
Token laughed
Token ID13818
Feature activation-3.53e-01

Pos logit contributions
+0.000
 understands+0.000
 draws+0.000
 sees+0.000
 means+0.000
Neg logit contributions
ust-0.000
 centuries-0.000
Ar-0.000
 ever-0.000
ur-0.000
Token a
Token ID257
Feature activation+6.31e+00

Pos logit contributions
+0.038
 understands+0.037
 draws+0.036
 sees+0.036
 means+0.034
Neg logit contributions
 centuries-0.028
ust-0.028
 ever-0.028
 prey-0.026
ur-0.026
Token good
Token ID922
Feature activation-1.73e+00

Pos logit contributions
ust+0.201
 centuries+0.200
ur+0.196
abb+0.186
 ever+0.185
Neg logit contributions
-0.171
 understands-0.165
 sees-0.164
 draws-0.157
 means-0.156
Token idea
Token ID2126
Feature activation-4.22e+00

Pos logit contributions
+0.042
 understands+0.042
 sees+0.041
 means+0.040
 draws+0.040
Neg logit contributions
 centuries-0.058
ust-0.058
Ar-0.056
ur-0.055
 ever-0.054
Token and
Token ID290
Feature activation+5.76e-03

Pos logit contributions
+0.098
 understands+0.096
 draws+0.095
 sees+0.094
 means+0.089
Neg logit contributions
ust-0.145
 centuries-0.145
 ever-0.144
ur-0.139
Ar-0.138
Token they
Token ID484
Feature activation+5.23e-01

Pos logit contributions
 ever+0.000
ur+0.000
 centuries+0.000
ust+0.000
 souven+0.000
Neg logit contributions
 draws-0.000
-0.000
 understands-0.000
 sees-0.000
Fighting-0.000
Token went
Token ID1816
Feature activation+7.29e+00

Pos logit contributions
ust+0.019
 centuries+0.019
Ar+0.018
ur+0.018
 ever+0.018
Neg logit contributions
-0.012
 understands-0.011
 sees-0.011
 draws-0.011
 means-0.011
Token to
Token ID284
Feature activation+5.05e-01

Pos logit contributions
ust+0.291
 centuries+0.275
Ar+0.267
abb+0.263
 prey+0.262
Neg logit contributions
 sees-0.145
-0.144
 understands-0.137
 draws-0.132
 "-0.131
Token the
Token ID262
Feature activation-5.93e+00

Pos logit contributions
 centuries+0.018
Ar+0.018
ust+0.017
 ever+0.017
ur+0.017
Neg logit contributions
-0.011
 understands-0.011
 draws-0.010
 means-0.010
 sees-0.010
Token neighbor
Token ID4780
Feature activation-2.28e+00

Pos logit contributions
+0.129
 understands+0.127
 means+0.127
 draws+0.124
 Building+0.116
Neg logit contributions
 centuries-0.209
ust-0.206
 ever-0.203
Ar-0.200
 prey-0.198
Token's
Token ID338
Feature activation-5.27e-01

Pos logit contributions
+0.029
 understands+0.029
 draws+0.026
 means+0.026
 sees+0.026
Neg logit contributions
 centuries-0.102
ust-0.101
ur-0.099
 ever-0.099
Ar-0.098
Token house
Token ID2156
Feature activation-1.77e+00

Pos logit contributions
+0.012
 draws+0.012
 means+0.012
 understands+0.012
Fighting+0.012
Neg logit contributions
 centuries-0.017
 prey-0.016
ust-0.016
 ever-0.016
 ho-0.016
Token he
Token ID339
Feature activation+7.51e-01

Pos logit contributions
 "+0.073
 the+0.070
 it+0.066
 that+0.062
+0.062
Neg logit contributions
ust-0.215
abb-0.191
and-0.182
beh-0.181
ur-0.179
Token knew
Token ID2993
Feature activation-4.22e+00

Pos logit contributions
ust+0.032
 centuries+0.032
Ar+0.031
 prey+0.030
ur+0.030
Neg logit contributions
 understands-0.012
 sees-0.012
-0.012
 realized-0.011
 means-0.011
Token it
Token ID340
Feature activation-7.57e+00

Pos logit contributions
 understands+0.055
 means+0.054
+0.052
 sees+0.051
 "+0.050
Neg logit contributions
Ar-0.189
ust-0.187
 centuries-0.184
abb-0.178
 prey-0.176
Token,
Token ID11
Feature activation-1.79e+00

Pos logit contributions
+0.163
 draws+0.154
 understands+0.154
 sees+0.149
 disl+0.145
Neg logit contributions
 centuries-0.279
 ever-0.279
ust-0.276
ur-0.270
 ho-0.264
Token the
Token ID262
Feature activation-6.59e+00

Pos logit contributions
+0.028
 means+0.026
 understands+0.026
 "+0.026
 sees+0.026
Neg logit contributions
ust-0.077
 centuries-0.074
Ar-0.074
ur-0.073
abb-0.072
Token school
Token ID1524
Feature activation+1.03e+00

Pos logit contributions
+0.140
 understands+0.139
 sees+0.137
 draws+0.133
 means+0.131
Neg logit contributions
ust-0.242
 centuries-0.239
ur-0.233
Ar-0.232
 prey-0.227
Token day
Token ID1110
Feature activation-3.22e+00

Pos logit contributions
 ever+0.043
ust+0.042
 centuries+0.042
Ar+0.040
ur+0.040
Neg logit contributions
-0.016
 understands-0.015
 draws-0.015
 disl-0.015
 sees-0.014
Token was
Token ID373
Feature activation-1.87e+00

Pos logit contributions
+0.060
 understands+0.059
 sees+0.057
 means+0.056
 draws+0.056
Neg logit contributions
ust-0.126
 centuries-0.125
Ar-0.123
ur-0.120
 prey-0.119
Token over
Token ID625
Feature activation+8.12e-02

Pos logit contributions
+0.052
 understands+0.051
 sees+0.050
 draws+0.050
 means+0.049
Neg logit contributions
ust-0.056
 centuries-0.056
Ar-0.054
 ever-0.054
ur-0.053
Token.
Token ID13
Feature activation+4.17e-01

Pos logit contributions
ust+0.003
 centuries+0.003
Ar+0.003
ur+0.002
 ever+0.002
Neg logit contributions
-0.002
 understands-0.002
 sees-0.002
 draws-0.002
 means-0.002
Token He
Token ID679
Feature activation+1.56e-01

Pos logit contributions
ust+0.017
Ar+0.017
 centuries+0.017
 ever+0.016
 ho+0.016
Neg logit contributions
-0.007
 "-0.007
 understands-0.007
 means-0.007
 draws-0.007
Token said
Token ID531
Feature activation-2.18e+00

Pos logit contributions
+0.003
 understands+0.003
 draws+0.003
 sees+0.003
 means+0.002
Neg logit contributions
ust-0.008
 centuries-0.008
ur-0.008
 ever-0.008
Ar-0.008
Token,
Token ID11
Feature activation-1.46e+00

Pos logit contributions
+0.037
 understands+0.035
 sees+0.033
 means+0.033
 draws+0.032
Neg logit contributions
ust-0.091
 centuries-0.091
Ar-0.089
 prey-0.086
ur-0.086
Token "
Token ID366
Feature activation-6.19e+00

Pos logit contributions
 understands+0.012
 draws+0.012
 sees+0.011
 means+0.011
liness+0.009
Neg logit contributions
 centuries-0.071
 ho-0.071
 ever-0.070
ust-0.070
ur-0.070
TokenYes
Token ID5297
Feature activation+3.97e+00

Pos logit contributions
+0.165
 understands+0.145
Fighting+0.144
 draws+0.142
 means+0.141
Neg logit contributions
 centuries-0.208
ur-0.199
ust-0.197
 prey-0.195
 ho-0.194
Token,
Token ID11
Feature activation-1.44e+00

Pos logit contributions
Ar+0.153
 centuries+0.153
ust+0.152
 prey+0.145
ur+0.144
Neg logit contributions
-0.079
 understands-0.074
 sees-0.072
 "-0.071
 means-0.070
Token please
Token ID3387
Feature activation+2.34e+00

Pos logit contributions
+0.039
 understands+0.038
 sees+0.037
 means+0.037
 draws+0.036
Neg logit contributions
 centuries-0.045
ust-0.044
Ar-0.044
ur-0.042
 eyel-0.041
Token!".
Token ID48220
Feature activation-3.98e+00

Pos logit contributions
ust+0.074
 centuries+0.072
Ar+0.070
 prey+0.068
ur+0.068
Neg logit contributions
-0.065
 understands-0.062
 sees-0.061
 means-0.060
 draws-0.060
Token So
Token ID1406
Feature activation+5.82e+00

Pos logit contributions
+0.078
 understands+0.076
 sees+0.073
 means+0.072
 draws+0.071
Neg logit contributions
ust-0.156
 centuries-0.153
Ar-0.151
 ever-0.146
ur-0.146
Token,
Token ID11
Feature activation-1.48e+00

Pos logit contributions
ust+0.218
 centuries+0.218
 ever+0.213
Ar+0.211
ur+0.211
Neg logit contributions
-0.116
 understands-0.116
 draws-0.112
 sees-0.112
 means-0.108
Token Joe
Token ID5689
Feature activation-4.04e-01

Pos logit contributions
+0.030
 understands+0.028
 sees+0.028
 means+0.028
 realized+0.027
Neg logit contributions
ust-0.057
 centuries-0.056
Ar-0.055
ur-0.053
 eyel-0.053
Token drove
Token ID10357
Feature activation+5.43e+00

Pos logit contributions
+0.009
 understands+0.008
 draws+0.008
 sees+0.008
 means+0.008
Neg logit contributions
ust-0.014
 centuries-0.014
 ever-0.014
Ar-0.014
ur-0.014
Token carrots
Token ID34397
Feature activation+3.94e+00

Pos logit contributions
 understands+0.127
+0.124
Fighting+0.124
 draws+0.121
 realized+0.117
Neg logit contributions
 prey-0.100
 ho-0.097
 centuries-0.096
 souven-0.093
Ar-0.093
Token.
Token ID13
Feature activation+5.15e-01

Pos logit contributions
ust+0.118
 centuries+0.117
 ever+0.113
Ar+0.112
ur+0.112
Neg logit contributions
-0.109
 understands-0.109
 draws-0.106
 sees-0.105
 means-0.103
Token
Token ID198
Feature activation+2.36e+00

Pos logit contributions
ust+0.019
 centuries+0.018
Ar+0.018
ur+0.018
 ever+0.018
Neg logit contributions
-0.011
 understands-0.011
 sees-0.011
 draws-0.011
 means-0.011
Token
Token ID198
Feature activation+2.31e+00

Pos logit contributions
ust+0.092
 centuries+0.092
Ar+0.088
ur+0.088
 ever+0.088
Neg logit contributions
-0.044
 understands-0.044
 sees-0.042
 means-0.041
 draws-0.041
TokenBob
Token ID18861
Feature activation+3.05e+00

Pos logit contributions
 centuries+0.064
 eyel+0.064
ur+0.062
ust+0.062
 ho+0.062
Neg logit contributions
-0.073
 understands-0.067
:"-0.065
 means-0.065
 realized-0.064
Token had
Token ID550
Feature activation-2.78e+00

Pos logit contributions
ust+0.121
 centuries+0.121
 ever+0.120
Ar+0.119
 ho+0.118
Neg logit contributions
-0.051
 understands-0.049
 draws-0.048
 disl-0.047
Fighting-0.046
Token an
Token ID281
Feature activation+4.78e+00

Pos logit contributions
+0.092
 understands+0.087
 draws+0.086
 means+0.085
 sees+0.084
Neg logit contributions
 centuries-0.071
 ever-0.070
ust-0.070
Ar-0.068
 ho-0.066
Token idea
Token ID2126
Feature activation-4.81e+00

Pos logit contributions
ust+0.159
 centuries+0.158
Ar+0.157
ur+0.151
 ever+0.151
Neg logit contributions
 understands-0.115
 sees-0.112
 means-0.112
-0.111
 draws-0.109
Token.
Token ID13
Feature activation+5.47e-01

Pos logit contributions
+0.123
 understands+0.121
 means+0.119
 sees+0.119
 draws+0.115
Neg logit contributions
ust-0.155
 centuries-0.153
Ar-0.153
ur-0.146
 ho-0.144
Token He
Token ID679
Feature activation+2.66e-01

Pos logit contributions
ust+0.016
 centuries+0.016
ur+0.016
Ar+0.016
 ever+0.015
Neg logit contributions
-0.015
 understands-0.015
 draws-0.015
 sees-0.015
 means-0.014
Token would
Token ID561
Feature activation-3.24e+00

Pos logit contributions
ust+0.011
 centuries+0.010
Ar+0.010
ur+0.010
 ever+0.010
Neg logit contributions
-0.005
 understands-0.005
 sees-0.005
 draws-0.005
 means-0.004
Token to
Token ID284
Feature activation-7.16e-02

Pos logit contributions
 centuries+0.013
ust+0.013
 ever+0.013
ur+0.013
 ho+0.013
Neg logit contributions
-0.012
 understands-0.012
 sees-0.011
 draws-0.011
 means-0.011
Token feel
Token ID1254
Feature activation-4.10e+00

Pos logit contributions
+0.002
 understands+0.002
 sees+0.002
 draws+0.002
 means+0.002
Neg logit contributions
 centuries-0.002
ust-0.002
Ar-0.002
 ever-0.002
ur-0.002
Token a
Token ID257
Feature activation+5.81e+00

Pos logit contributions
+0.147
 draws+0.138
 Painter+0.130
liness+0.129
 understands+0.128
Neg logit contributions
 ever-0.093
 centuries-0.092
 souven-0.091
 prey-0.087
Ar-0.086
Token little
Token ID1310
Feature activation+2.27e+00

Pos logit contributions
ust+0.186
 centuries+0.185
ur+0.181
Ar+0.177
 ever+0.173
Neg logit contributions
-0.153
 understands-0.149
 sees-0.146
 realized-0.143
 draws-0.140
Token better
Token ID1365
Feature activation-2.53e+00

Pos logit contributions
ust+0.075
 centuries+0.075
Ar+0.073
 ever+0.072
ur+0.072
Neg logit contributions
-0.056
 understands-0.054
 draws-0.053
 sees-0.052
 means-0.052
Token.
Token ID13
Feature activation+7.42e-01

Pos logit contributions
+0.061
 understands+0.061
 sees+0.061
 means+0.060
 draws+0.058
Neg logit contributions
ust-0.085
 centuries-0.085
Ar-0.083
ur-0.082
 ho-0.078
Token
Token ID198
Feature activation+2.72e+00

Pos logit contributions
 centuries+0.028
Ar+0.028
ur+0.027
ust+0.027
 ever+0.027
Neg logit contributions
-0.014
 understands-0.014
 sees-0.014
 draws-0.013
 means-0.013
Token
Token ID198
Feature activation+2.67e+00

Pos logit contributions
Ar+0.092
 centuries+0.090
ust+0.087
 ever+0.086
and+0.085
Neg logit contributions
-0.066
 draws-0.065
 understands-0.064
 means-0.063
 sees-0.062
TokenBut
Token ID1537
Feature activation-5.39e+00

Pos logit contributions
Ar+0.092
ust+0.092
 centuries+0.091
 ever+0.087
ur+0.086
Neg logit contributions
 understands-0.064
 sees-0.061
 draws-0.061
 means-0.061
-0.061
Token then
Token ID788
Feature activation-5.82e+00

Pos logit contributions
 understands+0.098
 draws+0.093
 disl+0.093
+0.087
liness+0.087
Neg logit contributions
 ever-0.225
 centuries-0.219
 ho-0.213
ur-0.209
 prey-0.209
Token,
Token ID11
Feature activation-1.49e+00

Pos logit contributions
 understands+0.125
+0.125
Fighting+0.123
 draws+0.122
 disl+0.117
Neg logit contributions
 centuries-0.206
 ever-0.202
 ho-0.200
Ar-0.196
ur-0.194
Token Her
Token ID2332
Feature activation+1.91e+00

Pos logit contributions
 centuries+0.018
 eyel+0.018
ur+0.018
ust+0.017
 ever+0.017
Neg logit contributions
 sees-0.016
 understands-0.016
-0.015
Fighting-0.015
 realized-0.015
Token mom
Token ID1995
Feature activation-2.33e+00

Pos logit contributions
ust+0.066
 ho+0.065
 centuries+0.065
 ever+0.064
 eyel+0.063
Neg logit contributions
-0.043
 means-0.042
 draws-0.041
 understands-0.041
Fighting-0.040
Token was
Token ID373
Feature activation-1.36e+00

Pos logit contributions
 disl+0.053
+0.049
Fighting+0.048
 Chess+0.045
 Fighting+0.043
Neg logit contributions
 ever-0.085
 ho-0.080
ur-0.079
 centuries-0.077
ust-0.077
Token so
Token ID523
Feature activation+4.08e+00

Pos logit contributions
+0.039
 understands+0.039
 draws+0.038
 means+0.038
 sees+0.037
Neg logit contributions
ust-0.040
 centuries-0.039
 ever-0.039
Ar-0.037
ur-0.037
Token happy
Token ID3772
Feature activation-4.63e+00

Pos logit contributions
ust+0.156
 centuries+0.151
 ever+0.150
Ar+0.148
ur+0.148
Neg logit contributions
-0.080
 draws-0.079
 understands-0.078
 means-0.078
 sees-0.076
Token and
Token ID290
Feature activation-6.15e-02

Pos logit contributions
+0.100
 understands+0.099
 draws+0.095
 sees+0.094
 means+0.091
Neg logit contributions
 ever-0.167
 centuries-0.165
ust-0.165
ur-0.161
 ho-0.157
Token they
Token ID484
Feature activation+2.43e-01

Pos logit contributions
+0.001
Fighting+0.001
 draws+0.001
 means+0.001
 understands+0.001
Neg logit contributions
 ever-0.002
ur-0.002
 ho-0.002
 centuries-0.002
 souven-0.002
Token both
Token ID1111
Feature activation+4.88e+00

Pos logit contributions
ust+0.009
 centuries+0.009
 ever+0.008
ur+0.008
Ar+0.008
Neg logit contributions
-0.005
 understands-0.005
 means-0.005
 sees-0.005
 draws-0.005
Token hugged
Token ID41130
Feature activation+1.15e-01

Pos logit contributions
ust+0.189
 centuries+0.188
 ever+0.183
ur+0.183
Ar+0.183
Neg logit contributions
-0.094
 understands-0.090
 sees-0.087
 draws-0.086
 means-0.086
Token.
Token ID13
Feature activation+6.15e-01

Pos logit contributions
 ever+0.004
 centuries+0.004
ust+0.004
 ho+0.004
 souven+0.004
Neg logit contributions
-0.003
 understands-0.003
 sees-0.003
 draws-0.003
 means-0.003
Token The
Token ID383
Feature activation-7.36e+00

Pos logit contributions
 centuries+0.017
ur+0.017
 eyel+0.016
and+0.015
uzz+0.015
Neg logit contributions
 sees-0.017
 understands-0.016
 realized-0.016
liness-0.016
 means-0.016
Token Mia
Token ID32189
Feature activation+1.06e+00

Pos logit contributions
 ever+0.101
 centuries+0.100
 prey+0.096
 ho+0.095
ust+0.093
Neg logit contributions
-0.066
 understands-0.055
:"-0.054
 disl-0.054
Fighting-0.053
Token's
Token ID338
Feature activation-1.50e+00

Pos logit contributions
ust+0.034
 centuries+0.034
 ever+0.034
 ho+0.033
ur+0.033
Neg logit contributions
-0.026
 understands-0.026
 draws-0.025
 sees-0.024
 means-0.024
Token paper
Token ID3348
Feature activation-1.88e+00

Pos logit contributions
+0.035
 understands+0.035
 means+0.034
 sees+0.032
 draws+0.032
Neg logit contributions
 centuries-0.051
ust-0.050
 ever-0.050
 ho-0.049
 prey-0.048
Token.
Token ID13
Feature activation+6.42e-01

Pos logit contributions
+0.048
 understands+0.047
 sees+0.047
 draws+0.046
 means+0.046
Neg logit contributions
ust-0.061
 centuries-0.060
Ar-0.059
ur-0.058
 ever-0.056
Token She
Token ID1375
Feature activation+4.58e-01

Pos logit contributions
ust+0.021
 centuries+0.021
Ar+0.020
ur+0.020
 ever+0.020
Neg logit contributions
-0.016
 understands-0.016
 sees-0.015
 draws-0.015
 means-0.015
Token is
Token ID318
Feature activation-3.92e+00

Pos logit contributions
 ho+0.019
 eyel+0.019
ust+0.018
 centuries+0.018
ur+0.018
Neg logit contributions
-0.007
Fighting-0.006
edience-0.006
liness-0.006
:"-0.006
Token using
Token ID1262
Feature activation-4.88e+00

Pos logit contributions
+0.108
 understands+0.101
 means+0.099
 draws+0.098
 sees+0.096
Neg logit contributions
ust-0.122
 centuries-0.121
 ever-0.119
 ho-0.116
ur-0.116
Token the
Token ID262
Feature activation-6.32e+00

Pos logit contributions
+0.095
 understands+0.079
Fighting+0.077
:"+0.074
 sees+0.073
Neg logit contributions
 centuries-0.197
 ho-0.194
 ever-0.193
 souven-0.188
ust-0.186
Token red
Token ID2266
Feature activation-4.20e+00

Pos logit contributions
+0.182
 understands+0.175
Fighting+0.170
 means+0.168
:"+0.166
Neg logit contributions
 centuries-0.182
 ho-0.178
 ever-0.177
 souven-0.176
Ar-0.175
Token cr
Token ID1067
Feature activation+4.75e+00

Pos logit contributions
Fighting+0.141
 understands+0.140
 disl+0.134
+0.133
:"+0.129
Neg logit contributions
 ho-0.099
 ever-0.098
 centuries-0.097
 souven-0.095
 eyel-0.094
Tokenay
Token ID323
Feature activation-2.14e-01

Pos logit contributions
ust+0.151
 centuries+0.150
Ar+0.147
 prey+0.145
ur+0.144
Neg logit contributions
 draws-0.121
-0.121
 understands-0.119
 means-0.116
 sees-0.115
Token.
Token ID13
Feature activation+5.82e-01

Pos logit contributions
ust+0.055
abb+0.050
Ar+0.050
 centuries+0.049
ur+0.049
Neg logit contributions
 "-0.028
-0.028
 sees-0.027
 means-0.026
 understands-0.025
Token She
Token ID1375
Feature activation+9.68e-02

Pos logit contributions
 centuries+0.020
Ar+0.019
ur+0.019
ust+0.019
 ever+0.019
Neg logit contributions
 understands-0.013
 draws-0.013
 sees-0.013
-0.013
 realized-0.013
Token ran
Token ID4966
Feature activation+5.90e+00

Pos logit contributions
ust+0.004
 centuries+0.004
Ar+0.004
ur+0.004
 ever+0.004
Neg logit contributions
-0.002
 understands-0.002
 draws-0.002
 sees-0.002
 means-0.002
Token outside
Token ID2354
Feature activation+4.44e+00

Pos logit contributions
ust+0.243
 centuries+0.231
Ar+0.221
abb+0.220
 eyel+0.217
Neg logit contributions
-0.110
 sees-0.109
 "-0.103
 understands-0.101
 means-0.097
Token to
Token ID284
Feature activation+9.80e-03

Pos logit contributions
ust+0.177
 centuries+0.172
Ar+0.169
ur+0.165
 prey+0.163
Neg logit contributions
-0.084
 sees-0.082
 understands-0.081
 means-0.079
 draws-0.078
Token say
Token ID910
Feature activation-1.31e+00

Pos logit contributions
 centuries+0.000
ust+0.000
 ever+0.000
Ar+0.000
ur+0.000
Neg logit contributions
-0.000
 understands-0.000
 means-0.000
 draws-0.000
 sees-0.000
Token hello
Token ID23748
Feature activation+4.35e+00

Pos logit contributions
+0.046
 draws+0.046
 understands+0.045
Fighting+0.045
 sees+0.045
Neg logit contributions
 ho-0.028
ust-0.027
ur-0.027
 centuries-0.027
 ever-0.027
Token to
Token ID284
Feature activation+2.35e-02

Pos logit contributions
 centuries+0.175
Ar+0.172
ust+0.170
ur+0.159
 souven+0.158
Neg logit contributions
-0.088
 "-0.084
 sees-0.081
 understands-0.080
 means-0.078
Token the
Token ID262
Feature activation-6.43e+00

Pos logit contributions
 centuries+0.001
ust+0.001
 ever+0.001
Ar+0.001
ur+0.001
Neg logit contributions
-0.000
 understands-0.000
 draws-0.000
 sees-0.000
 means-0.000
Token pony
Token ID26902
Feature activation+1.63e+00

Pos logit contributions
+0.186
 understands+0.184
 draws+0.184
 means+0.180
 sees+0.179
Neg logit contributions
ust-0.184
 centuries-0.181
Ar-0.176
 ho-0.174
 ever-0.174
Token.
Token ID13
Feature activation+5.53e-01

Pos logit contributions
ust+0.057
Ar+0.057
 centuries+0.056
abb+0.052
 prey+0.052
Neg logit contributions
-0.039
 sees-0.037
 understands-0.037
 means-0.037
 draws-0.036
Token.
Token ID13
Feature activation+7.33e-01

Pos logit contributions
 sees+0.045
+0.045
 "+0.044
 means+0.043
 understands+0.042
Neg logit contributions
ust-0.078
Ar-0.077
 centuries-0.075
ur-0.074
 souven-0.072
Token He
Token ID679
Feature activation+8.64e-01

Pos logit contributions
Ar+0.022
 centuries+0.021
ur+0.021
 ho+0.020
ust+0.020
Neg logit contributions
 understands-0.019
-0.019
Fighting-0.019
 draws-0.019
 realized-0.018
Token was
Token ID373
Feature activation-1.29e+00

Pos logit contributions
ust+0.036
 centuries+0.036
Ar+0.035
ur+0.034
 ever+0.034
Neg logit contributions
-0.014
 understands-0.014
 sees-0.013
 draws-0.013
 means-0.013
Token so
Token ID523
Feature activation+4.21e+00

Pos logit contributions
+0.036
 understands+0.036
 draws+0.035
 sees+0.035
 means+0.034
Neg logit contributions
ust-0.038
 centuries-0.038
Ar-0.037
ur-0.036
 ever-0.036
Token curious
Token ID11040
Feature activation-3.13e+00

Pos logit contributions
 centuries+0.150
ust+0.147
Ar+0.141
ur+0.140
 prey+0.140
Neg logit contributions
 understands-0.098
-0.098
 sees-0.097
 means-0.090
 realized-0.089
Token that
Token ID326
Feature activation-9.98e+00

Pos logit contributions
+0.071
 understands+0.068
 draws+0.067
Fighting+0.063
 sees+0.063
Neg logit contributions
 ever-0.113
 centuries-0.111
ust-0.109
 prey-0.107
ur-0.107
Token he
Token ID339
Feature activation+1.62e+00

Pos logit contributions
 understands+0.325
+0.324
 draws+0.317
Fighting+0.307
:"+0.302
Neg logit contributions
 ever-0.270
 centuries-0.257
 souven-0.248
 ho-0.248
Ar-0.244
Token wanted
Token ID2227
Feature activation-1.78e+00

Pos logit contributions
 centuries+0.064
ust+0.064
Ar+0.062
ur+0.061
 ever+0.061
Neg logit contributions
-0.030
 understands-0.029
 sees-0.029
 draws-0.028
 means-0.028
Token to
Token ID284
Feature activation+3.95e-01

Pos logit contributions
 understands+0.037
 sees+0.037
+0.036
 means+0.035
 draws+0.035
Neg logit contributions
ust-0.067
 centuries-0.065
Ar-0.064
ur-0.063
abb-0.062
Token find
Token ID1064
Feature activation-2.28e+00

Pos logit contributions
ust+0.014
ider+0.014
 centuries+0.014
abb+0.013
Ar+0.013
Neg logit contributions
 sees-0.009
 realized-0.009
 "-0.009
 the-0.009
 means-0.009
Token out
Token ID503
Feature activation+9.61e+00

Pos logit contributions
+0.064
 understands+0.063
 sees+0.061
 draws+0.061
 means+0.060
Neg logit contributions
ust-0.068
 centuries-0.068
Ar-0.065
ur-0.065
 ever-0.064
Token that
Token ID326
Feature activation-1.04e+01

Pos logit contributions
+0.093
 understands+0.091
 sees+0.089
 means+0.088
 draws+0.088
Neg logit contributions
ust-0.128
 centuries-0.126
Ar-0.124
ur-0.121
 ho-0.120
Token day
Token ID1110
Feature activation-2.89e+00

Pos logit contributions
+0.319
:"+0.290
 understands+0.286
 sees+0.279
 draws+0.270
Neg logit contributions
 ever-0.328
 centuries-0.290
Ar-0.287
 ho-0.278
ust-0.278
Token.
Token ID13
Feature activation+5.60e-01

Pos logit contributions
+0.089
 draws+0.087
 understands+0.087
 sees+0.087
 disl+0.083
Neg logit contributions
 centuries-0.077
 ever-0.077
ust-0.076
ur-0.072
 ho-0.072
Token They
Token ID1119
Feature activation-2.20e-03

Pos logit contributions
ust+0.023
 centuries+0.023
ur+0.022
 ever+0.022
Ar+0.022
Neg logit contributions
 means-0.009
 understands-0.009
-0.009
 draws-0.009
 "-0.009
Token learned
Token ID4499
Feature activation-5.66e+00

Pos logit contributions
+0.000
 understands+0.000
 draws+0.000
 sees+0.000
:"+0.000
Neg logit contributions
ust-0.000
 ever-0.000
 centuries-0.000
Ar-0.000
ur-0.000
Token that
Token ID326
Feature activation-1.04e+01

Pos logit contributions
+0.105
 understands+0.102
 sees+0.099
 draws+0.098
 means+0.096
Neg logit contributions
ust-0.223
 centuries-0.222
Ar-0.216
ur-0.214
 ever-0.214
Token sometimes
Token ID3360
Feature activation-6.75e+00

Pos logit contributions
+0.217
 understands+0.216
 means+0.212
 sees+0.210
 draws+0.209
Neg logit contributions
ust-0.386
 centuries-0.381
Ar-0.371
ur-0.368
 ho-0.357
Token it
Token ID340
Feature activation-7.28e+00

Pos logit contributions
+0.142
 understands+0.139
 draws+0.133
 sees+0.132
:"+0.128
Neg logit contributions
 centuries-0.247
ust-0.243
 ever-0.241
 ho-0.237
Ar-0.236
Token is
Token ID318
Feature activation-4.01e+00

Pos logit contributions
+0.103
:"+0.086
 understands+0.083
 draws+0.083
 disl+0.083
Neg logit contributions
 ever-0.319
ust-0.316
 centuries-0.306
ur-0.303
 ho-0.297
Token good
Token ID922
Feature activation-2.10e+00

Pos logit contributions
+0.102
 understands+0.102
 means+0.100
 sees+0.099
 draws+0.098
Neg logit contributions
 centuries-0.128
ust-0.127
Ar-0.124
ur-0.123
 ever-0.121
Token to
Token ID284
Feature activation+7.92e-02

Pos logit contributions
+0.056
 understands+0.053
 draws+0.052
 sees+0.052
:"+0.050
Neg logit contributions
 centuries-0.065
ust-0.065
 ever-0.065
ur-0.063
 ho-0.063
Token they
Token ID484
Feature activation+9.04e-02

Pos logit contributions
+0.004
 understands+0.004
 draws+0.004
 sees+0.004
Fighting+0.004
Neg logit contributions
 centuries-0.008
 ever-0.008
 ho-0.008
ust-0.008
Ar-0.008
Token continued
Token ID3767
Feature activation+3.30e+00

Pos logit contributions
ust+0.003
 centuries+0.003
Ar+0.003
 ever+0.003
ur+0.003
Neg logit contributions
-0.002
 understands-0.002
 draws-0.002
 sees-0.002
 means-0.002
Token walking
Token ID6155
Feature activation+8.45e+00

Pos logit contributions
 centuries+0.101
ust+0.100
Ar+0.098
 ever+0.098
ur+0.097
Neg logit contributions
 understands-0.088
-0.088
 draws-0.085
 sees-0.085
 means-0.084
Token.
Token ID13
Feature activation+4.89e-01

Pos logit contributions
 centuries+0.263
ust+0.263
Ar+0.255
 ever+0.252
ur+0.251
Neg logit contributions
-0.223
 understands-0.222
 draws-0.217
 sees-0.215
 means-0.212
Token Even
Token ID3412
Feature activation-1.28e+00

Pos logit contributions
 centuries+0.018
ust+0.018
Ar+0.017
 ever+0.017
ur+0.017
Neg logit contributions
-0.010
 understands-0.010
 sees-0.010
 draws-0.010
 means-0.010
Token though
Token ID996
Feature activation-8.35e+00

Pos logit contributions
+0.031
 understands+0.031
 draws+0.030
 sees+0.029
 means+0.029
Neg logit contributions
ust-0.043
 centuries-0.043
Ar-0.041
 ever-0.041
ur-0.041
Token Joe
Token ID5689
Feature activation-3.43e-01

Pos logit contributions
+0.123
 understands+0.117
 means+0.115
 sees+0.112
 realized+0.107
Neg logit contributions
ust-0.366
ur-0.354
Ar-0.351
 centuries-0.351
abb-0.346
Token couldn
Token ID3521
Feature activation-2.64e+00

Pos logit contributions
+0.006
 draws+0.006
 disl+0.006
Fighting+0.006
:"+0.005
Neg logit contributions
 ever-0.014
ust-0.013
 ho-0.013
ur-0.013
 centuries-0.012
Token't
Token ID470
Feature activation+3.34e+00

Pos logit contributions
+0.075
 means+0.073
 understands+0.071
 sees+0.071
 "+0.069
Neg logit contributions
ust-0.081
 centuries-0.078
Ar-0.076
 prey-0.076
 souven-0.075
Token help
Token ID1037
Feature activation-6.08e-01

Pos logit contributions
 centuries+0.117
Ar+0.115
ust+0.114
abb+0.109
and+0.108
Neg logit contributions
 realized-0.079
-0.075
 that-0.074
 sees-0.073
 "-0.071
Token the
Token ID262
Feature activation-6.09e+00

Pos logit contributions
 means+0.012
 "+0.012
 sees+0.012
+0.011
 understands+0.011
Neg logit contributions
Ar-0.024
 centuries-0.023
ust-0.023
ur-0.022
abb-0.022
Token want
Token ID765
Feature activation-3.00e-01

Pos logit contributions
ust+0.098
 centuries+0.097
Ar+0.093
 ever+0.092
ur+0.092
Neg logit contributions
-0.103
 understands-0.102
 sees-0.099
 draws-0.099
 means-0.098
Token to
Token ID284
Feature activation+2.86e-01

Pos logit contributions
 sees+0.006
 understands+0.006
 means+0.006
 draws+0.006
+0.005
Neg logit contributions
ust-0.012
Ar-0.012
 centuries-0.011
ur-0.011
 ho-0.011
Token be
Token ID307
Feature activation+1.67e+00

Pos logit contributions
 centuries+0.009
ust+0.009
Ar+0.009
ur+0.009
 prey+0.009
Neg logit contributions
-0.007
 sees-0.007
 understands-0.007
 draws-0.007
 means-0.007
Token mean
Token ID1612
Feature activation-1.52e+00

Pos logit contributions
 centuries+0.057
Ar+0.057
ust+0.056
ur+0.053
 souven+0.052
Neg logit contributions
 sees-0.039
 understands-0.039
-0.038
 means-0.038
 draws-0.037
Token like
Token ID588
Feature activation-1.95e+00

Pos logit contributions
+0.035
 understands+0.034
 means+0.034
 sees+0.034
 draws+0.033
Neg logit contributions
 centuries-0.053
ust-0.053
Ar-0.053
ur-0.051
 souven-0.049
Token that
Token ID326
Feature activation-1.01e+01

Pos logit contributions
+0.046
 understands+0.045
 sees+0.044
 draws+0.044
 means+0.043
Neg logit contributions
ust-0.067
 centuries-0.065
Ar-0.065
ur-0.063
 ever-0.062
Token bird
Token ID6512
Feature activation-1.42e+00

Pos logit contributions
+0.261
 understands+0.257
 sees+0.249
 draws+0.249
 means+0.244
Neg logit contributions
ust-0.323
 centuries-0.321
Ar-0.309
 ever-0.308
ur-0.304
Token.
Token ID13
Feature activation+5.16e-01

Pos logit contributions
+0.039
 understands+0.038
 draws+0.038
 sees+0.037
Fighting+0.036
Neg logit contributions
ust-0.044
 ever-0.043
 centuries-0.041
 ho-0.040
ur-0.040
Token I
Token ID314
Feature activation+2.23e+00

Pos logit contributions
ust+0.017
ur+0.017
 centuries+0.017
 ever+0.017
Ar+0.016
Neg logit contributions
 sees-0.012
 understands-0.012
-0.011
 realized-0.011
 draws-0.011
Token want
Token ID765
Feature activation-2.63e-01

Pos logit contributions
ust+0.076
 centuries+0.074
 ever+0.073
Ar+0.073
ur+0.073
Neg logit contributions
-0.054
 understands-0.052
 draws-0.050
 sees-0.050
 means-0.049
Token to
Token ID284
Feature activation+3.50e-01

Pos logit contributions
+0.006
 understands+0.006
 sees+0.006
 draws+0.006
 means+0.006
Neg logit contributions
ust-0.009
 centuries-0.009
Ar-0.009
 ever-0.008
ur-0.008
Token new
Token ID649
Feature activation-8.52e-01

Pos logit contributions
ust+0.206
 centuries+0.206
Ar+0.199
ur+0.197
 ever+0.196
Neg logit contributions
-0.133
 understands-0.131
 sees-0.127
 draws-0.125
 means-0.124
Token friend
Token ID1545
Feature activation-4.83e-01

Pos logit contributions
 understands+0.022
 sees+0.022
+0.021
 means+0.021
 draws+0.021
Neg logit contributions
ust-0.028
 centuries-0.028
ur-0.027
Ar-0.027
abb-0.026
Token.
Token ID13
Feature activation+4.38e-01

Pos logit contributions
+0.014
 understands+0.014
 sees+0.013
 draws+0.013
 means+0.013
Neg logit contributions
ust-0.014
 centuries-0.014
 ever-0.014
ur-0.013
Ar-0.013
Token She
Token ID1375
Feature activation+2.58e-02

Pos logit contributions
ust+0.014
 centuries+0.014
ur+0.014
 eyel+0.013
and+0.013
Neg logit contributions
 sees-0.011
 understands-0.010
 means-0.010
-0.010
Fighting-0.010
Token knew
Token ID2993
Feature activation-4.15e+00

Pos logit contributions
ust+0.001
 ever+0.001
 centuries+0.001
Ar+0.001
ur+0.001
Neg logit contributions
-0.000
 understands-0.000
 means-0.000
Fighting-0.000
 draws-0.000
Token that
Token ID326
Feature activation-1.05e+01

Pos logit contributions
+0.069
 sees+0.064
 draws+0.063
 understands+0.062
 means+0.054
Neg logit contributions
 ever-0.177
ust-0.175
 centuries-0.174
ur-0.172
 prey-0.166
Token sometimes
Token ID3360
Feature activation-6.68e+00

Pos logit contributions
+0.228
 understands+0.221
 sees+0.213
 draws+0.209
 means+0.202
Neg logit contributions
ust-0.382
 centuries-0.381
 ever-0.375
Ar-0.366
ur-0.364
Token it
Token ID340
Feature activation-7.59e+00

Pos logit contributions
+0.147
 understands+0.143
 draws+0.136
 sees+0.135
:"+0.133
Neg logit contributions
 ever-0.242
 centuries-0.236
ust-0.232
 ho-0.226
 prey-0.221
Token takes
Token ID2753
Feature activation+2.18e-01

Pos logit contributions
+0.116
 understands+0.101
 draws+0.098
:"+0.097
 sees+0.096
Neg logit contributions
ust-0.330
 ever-0.329
 centuries-0.314
ur-0.311
 ho-0.304
Token a
Token ID257
Feature activation+5.74e+00

Pos logit contributions
ust+0.008
 centuries+0.008
Ar+0.008
ider+0.008
ur+0.008
Neg logit contributions
 means-0.005
 understands-0.005
 draws-0.005
 sees-0.005
-0.004
Token little
Token ID1310
Feature activation+2.17e+00

Pos logit contributions
ust+0.197
 centuries+0.194
ider+0.194
izards+0.189
ur+0.185
Neg logit contributions
 sees-0.139
 "-0.126
 machine-0.125
-0.125
 understands-0.125