TOP ACTIVATIONS
MAX = 0.163

thought about how important the judge 's job was .
his lawyer . As the judge read out the verdict ,
didn 't understand why the judge was angry at him .
had to hope that the judge and the man who was
. He had let the judge down and he was very
lesson that day to never judge an object by the way
. She walks to the judge . She says hello .
the rules set by the judge . <|endoftext|> Once upon a
to run . Everywhere the turtle went , he loved to
closer and closer to the judge . When he reached the
a girl came to the judge . She had a problem
harsh queen . Everywhere the Prince looked , he could see
came along and saw how selfish the f auc et was
decided to try and unsc rew it . He twisted and
away . The judge felt bad for sc aring
fun . The judge blows his whistle . He
on the grass , unsc rew ed the lid of the
Every day the judge had to decide cases in
until they finally saw the judge 's house . When her
everyone ." The judge picked up Lily 's doll

MIN ACTIVATIONS
MIN = -0.187

outside and Sus ie showed Jo jo a big tire .
with him . Winston told Jo jo , " Cour age
mom gave her a night light to make it less scary
the tent with their flash lights and their t eddy bears
by night she used moon light to help it shine .
to bed with her night light . The next day ,
so excited ! She loved pic n ics and spending time
went on adventures , had pic n ics , and even
and nodded . He loved pic n ics and couldn 't
girl named Lily who loved pic n ics . She would
favourite place to go for pic n ics . <|endoftext|> Once
was very proud of his ig loo . He liked playing
robe and wear her p aj amas instead . But Lily
liked to play and have pic n ics . One day
was so impressed with her ig loo at the end !
pond , with the moon light dancing on its feathers .
was still wearing her p aj amas .
roof . They would have pic n ics and play games
a family who really enjoyed pic n ics . They loved
even showed the rabbit his ig loo , which the rabbit

INTERVAL 0.076 - 0.163
CONTAINS 0.371%

hair now ." Lily sat up and her mom poured water
. He decided to clean up the mess and then rock
decided to help her clean up the mess . They learned
ily nodded , and climbed up the ladder . She felt
hug and told her how proud she was of her for

INTERVAL -0.012 - 0.076
CONTAINS 69.878%

best friend who had left the note . They had a
field full of colors . He decided to display his flowers
. Mom called the shop and ordered a deep pizza .
L ily and Tim my played together for a
to a hive to get some honey , but the bees

INTERVAL -0.099 - -0.012
CONTAINS 29.653%

They still have a lot of paper . They can make
looked in awe . She had never seen anything so big
back and he knew he had made the right choice in
full of danger and difficult times . The warrior
came back . They saw Mae and excited ly ran towards

INTERVAL -0.187 - -0.099
CONTAINS 0.097%

l adders . They were firefighters . They sprayed water on
promised he wouldn ’ t complain about the hot
. It was the prett iest thing Lucy had ever seen
, it doesn ’ t feel safe . ”
One day , she was tid ying her room when she
Token thought
Token ID1807
Feature activation+4.86e-02

Pos logit contributions
 end+0.034
 other+0.028
 mean+0.026
 her+0.026
 decide+0.025
Neg logit contributions
belt-0.043
amas-0.041
aj-0.040
icol-0.040
roller-0.040
Token about
Token ID546
Feature activation-5.72e-02

Pos logit contributions
 her+0.078
 end+0.070
 other+0.064
 the+0.063
 say+0.062
Neg logit contributions
amas-0.122
belt-0.117
roller-0.117
aj-0.116
coat-0.113
Token how
Token ID703
Feature activation+3.52e-02

Pos logit contributions
belt+0.143
+0.141
amas+0.140
adows+0.136
anas+0.136
Neg logit contributions
 end-0.048
 decide-0.041
 asks-0.040
 say-0.038
 angry-0.036
Token important
Token ID1593
Feature activation+1.16e-01

Pos logit contributions
 end+0.036
 her+0.032
 say+0.030
 decide+0.028
 other+0.026
Neg logit contributions
amas-0.098
belt-0.097
aj-0.094
roller-0.092
-0.092
Token the
Token ID262
Feature activation+3.04e-02

Pos logit contributions
 end+0.117
 say+0.103
 decide+0.097
 asks+0.093
 her+0.092
Neg logit contributions
belt-0.313
amas-0.303
-0.295
coat-0.292
roller-0.290
Token judge
Token ID5052
Feature activation+1.63e-01

Pos logit contributions
 her+0.035
 end+0.029
 both+0.029
 say+0.029
 says+0.028
Neg logit contributions
belt-0.072
roller-0.069
amas-0.069
coat-0.067
ney-0.066
Token's
Token ID338
Feature activation+4.03e-02

Pos logit contributions
 end+0.250
 her+0.229
 decide+0.218
 say+0.214
 angry+0.202
Neg logit contributions
belt-0.354
amas-0.339
coat-0.325
 outage-0.319
aj-0.319
Token job
Token ID1693
Feature activation+2.51e-02

Pos logit contributions
 end+0.058
 her+0.057
 say+0.052
 decide+0.050
 other+0.049
Neg logit contributions
amas-0.096
belt-0.094
aj-0.092
roller-0.091
coat-0.088
Token was
Token ID373
Feature activation+7.00e-03

Pos logit contributions
 end+0.030
 her+0.030
 say+0.027
 other+0.026
 decide+0.024
Neg logit contributions
amas-0.068
aj-0.066
belt-0.065
roller-0.064
-0.062
Token.
Token ID13
Feature activation-2.09e-04

Pos logit contributions
 her+0.007
 end+0.007
 say+0.007
 other+0.006
 angry+0.006
Neg logit contributions
amas-0.021
belt-0.020
aj-0.020
roller-0.020
-0.019
Token
Token ID220
Feature activation+1.00e-02

Pos logit contributions
amas+0.000
belt+0.000
aj+0.000
roller+0.000
coat+0.000
Neg logit contributions
 her-0.000
 end-0.000
 say-0.000
 other-0.000
 decide-0.000
Token his
Token ID465
Feature activation-2.70e-02

Pos logit contributions
 end+0.022
 her+0.020
 say+0.018
 decide+0.017
 other+0.016
Neg logit contributions
amas-0.073
belt-0.073
aj-0.070
roller-0.070
coat-0.069
Token lawyer
Token ID6853
Feature activation+6.79e-03

Pos logit contributions
amas+0.065
belt+0.064
aj+0.062
roller+0.062
coat+0.060
Neg logit contributions
 end-0.038
 her-0.037
 say-0.034
 decide-0.033
 other-0.032
Token.
Token ID13
Feature activation-5.97e-03

Pos logit contributions
 end+0.008
 her+0.008
 say+0.007
 other+0.007
 decide+0.007
Neg logit contributions
amas-0.018
belt-0.017
aj-0.017
roller-0.017
coat-0.016
Token As
Token ID1081
Feature activation+3.68e-02

Pos logit contributions
coat+0.014
belt+0.014
imm+0.013
aj+0.013
amas+0.013
Neg logit contributions
 her-0.010
 end-0.009
 asks-0.009
 say-0.008
 the-0.008
Token the
Token ID262
Feature activation+1.62e-02

Pos logit contributions
 her+0.045
 end+0.045
 say+0.039
 other+0.038
 decide+0.036
Neg logit contributions
amas-0.099
belt-0.095
aj-0.094
roller-0.093
coat-0.092
Token judge
Token ID5052
Feature activation+1.58e-01

Pos logit contributions
 her+0.023
 end+0.022
 say+0.021
 decide+0.021
 asks+0.020
Neg logit contributions
belt-0.037
amas-0.034
roller-0.032
coat-0.030
-0.030
Token read
Token ID1100
Feature activation+4.96e-02

Pos logit contributions
 end+0.198
 her+0.192
 say+0.175
 decide+0.166
 other+0.162
Neg logit contributions
amas-0.398
belt-0.396
aj-0.379
roller-0.378
coat-0.370
Token out
Token ID503
Feature activation-2.00e-02

Pos logit contributions
 her+0.044
 end+0.043
 say+0.035
 other+0.033
 angry+0.032
Neg logit contributions
amas-0.149
aj-0.145
belt-0.145
roller-0.142
-0.140
Token the
Token ID262
Feature activation-1.86e-02

Pos logit contributions
amas+0.058
belt+0.057
aj+0.056
roller+0.055
coat+0.054
Neg logit contributions
 end-0.018
 her-0.017
 say-0.016
 decide-0.014
 other-0.014
Token verdict
Token ID15593
Feature activation+3.58e-02

Pos logit contributions
belt+0.043
amas+0.043
roller+0.040
coat+0.039
aj+0.039
Neg logit contributions
 her-0.025
 end-0.025
 say-0.022
 decide-0.022
 asks-0.021
Token,
Token ID11
Feature activation+2.24e-02

Pos logit contributions
 her+0.035
 end+0.033
 other+0.027
 say+0.026
 the+0.025
Neg logit contributions
amas-0.104
belt-0.102
roller-0.101
aj-0.100
-0.098
Token didn
Token ID1422
Feature activation+1.50e-02

Pos logit contributions
 end+0.027
 her+0.026
 say+0.024
 decide+0.023
 other+0.022
Neg logit contributions
amas-0.051
belt-0.050
aj-0.049
roller-0.049
coat-0.047
Token't
Token ID470
Feature activation+2.09e-02

Pos logit contributions
 end+0.024
 her+0.023
 say+0.022
 decide+0.021
 other+0.021
Neg logit contributions
amas-0.033
belt-0.033
roller-0.031
aj-0.031
-0.030
Token understand
Token ID1833
Feature activation-1.02e-02

Pos logit contributions
 her+0.025
 end+0.024
 asks+0.023
 says+0.022
els+0.021
Neg logit contributions
coat-0.045
 outage-0.045
onna-0.041
belt-0.041
anas-0.041
Token why
Token ID1521
Feature activation-3.70e-02

Pos logit contributions
amas+0.028
belt+0.027
aj+0.027
roller+0.026
+0.026
Neg logit contributions
 end-0.010
 her-0.009
 say-0.009
 decide-0.008
 asks-0.008
Token the
Token ID262
Feature activation+2.21e-02

Pos logit contributions
amas+0.095
belt+0.093
aj+0.090
roller+0.089
+0.087
Neg logit contributions
 end-0.046
 her-0.044
 say-0.041
 decide-0.039
 other-0.037
Token judge
Token ID5052
Feature activation+1.52e-01

Pos logit contributions
 her+0.025
 end+0.020
 say+0.020
 says+0.019
 asks+0.019
Neg logit contributions
belt-0.054
roller-0.051
amas-0.051
coat-0.050
tub-0.049
Token was
Token ID373
Feature activation+9.43e-03

Pos logit contributions
 end+0.212
 her+0.203
 say+0.184
 decide+0.183
 other+0.173
Neg logit contributions
belt-0.353
amas-0.347
coat-0.330
roller-0.329
aj-0.327
Token angry
Token ID7954
Feature activation+3.12e-02

Pos logit contributions
 end+0.013
 her+0.012
 say+0.012
 decide+0.012
 asks+0.011
Neg logit contributions
amas-0.021
belt-0.021
aj-0.020
roller-0.020
coat-0.020
Token at
Token ID379
Feature activation+2.25e-03

Pos logit contributions
 end+0.036
 her+0.035
 say+0.031
 decide+0.029
 other+0.029
Neg logit contributions
amas-0.084
belt-0.082
aj-0.080
roller-0.079
coat-0.078
Token him
Token ID683
Feature activation+6.19e-03

Pos logit contributions
 end+0.002
 say+0.002
 her+0.002
 asks+0.002
 decide+0.002
Neg logit contributions
amas-0.006
belt-0.006
aj-0.006
roller-0.006
-0.006
Token.
Token ID13
Feature activation-8.20e-03

Pos logit contributions
 end+0.007
 her+0.007
 say+0.006
 decide+0.006
 other+0.006
Neg logit contributions
amas-0.017
belt-0.016
aj-0.016
roller-0.016
coat-0.016
Token had
Token ID550
Feature activation-3.02e-03

Pos logit contributions
 end+0.017
 her+0.016
 say+0.014
 other+0.013
 decide+0.013
Neg logit contributions
belt-0.035
aj-0.034
amas-0.034
roller-0.033
coat-0.032
Token to
Token ID284
Feature activation-1.15e-02

Pos logit contributions
amas+0.008
belt+0.008
aj+0.007
roller+0.007
+0.007
Neg logit contributions
 her-0.004
 end-0.004
 say-0.004
 other-0.003
 angry-0.003
Token hope
Token ID2911
Feature activation-6.12e-02

Pos logit contributions
belt+0.030
amas+0.030
roller+0.029
coat+0.028
aj+0.028
Neg logit contributions
 end-0.011
 her-0.010
 asks-0.010
 other-0.009
 angry-0.009
Token that
Token ID326
Feature activation+4.22e-02

Pos logit contributions
amas+0.175
belt+0.171
aj+0.168
roller+0.166
coat+0.163
Neg logit contributions
 end-0.062
 her-0.062
 say-0.053
 other-0.050
 decide-0.049
Token the
Token ID262
Feature activation+1.61e-02

Pos logit contributions
 end+0.050
 her+0.045
 say+0.043
 decide+0.043
 asks+0.040
Neg logit contributions
belt-0.104
amas-0.103
roller-0.096
aj-0.096
-0.096
Token judge
Token ID5052
Feature activation+1.50e-01

Pos logit contributions
 her+0.027
 end+0.027
 say+0.025
 decide+0.024
 asks+0.024
Neg logit contributions
amas-0.034
belt-0.033
aj-0.031
roller-0.031
coat-0.030
Token and
Token ID290
Feature activation+1.38e-02

Pos logit contributions
 end+0.218
 her+0.210
 say+0.195
 decide+0.189
 other+0.184
Neg logit contributions
belt-0.347
amas-0.345
aj-0.328
roller-0.324
coat-0.321
Token the
Token ID262
Feature activation+1.88e-03

Pos logit contributions
 end+0.012
 her+0.010
 say+0.009
 decide+0.009
 other+0.008
Neg logit contributions
belt-0.040
amas-0.040
aj-0.038
roller-0.038
coat-0.037
Token man
Token ID582
Feature activation+7.96e-02

Pos logit contributions
 her+0.002
 end+0.002
 say+0.002
 asks+0.002
 decide+0.002
Neg logit contributions
amas-0.005
belt-0.005
aj-0.004
roller-0.004
coat-0.004
Token who
Token ID508
Feature activation+8.42e-04

Pos logit contributions
 end+0.106
 her+0.103
 say+0.094
 decide+0.090
 other+0.089
Neg logit contributions
amas-0.193
belt-0.193
aj-0.185
roller-0.182
coat-0.180
Token was
Token ID373
Feature activation+1.20e-02

Pos logit contributions
 end+0.001
 her+0.001
 decide+0.001
 say+0.001
 other+0.001
Neg logit contributions
belt-0.002
amas-0.002
coat-0.002
aj-0.002
roller-0.002
Token.
Token ID13
Feature activation-1.09e-03

Pos logit contributions
amas+0.093
aj+0.091
roller+0.091
belt+0.090
coat+0.088
Neg logit contributions
 end-0.026
 her-0.026
 say-0.020
 other-0.020
 angry-0.019
Token He
Token ID679
Feature activation+1.01e-02

Pos logit contributions
belt+0.002
aj+0.002
coat+0.002
amas+0.002
roller+0.002
Neg logit contributions
 her-0.002
 end-0.002
 other-0.002
 asks-0.002
 say-0.002
Token had
Token ID550
Feature activation-1.40e-02

Pos logit contributions
 her+0.014
 end+0.014
 say+0.012
 decide+0.012
 other+0.012
Neg logit contributions
belt-0.022
amas-0.021
roller-0.020
aj-0.020
ney-0.020
Token let
Token ID1309
Feature activation-1.03e-02

Pos logit contributions
amas+0.037
belt+0.036
aj+0.035
roller+0.034
coat+0.034
Neg logit contributions
 her-0.019
 end-0.019
 other-0.017
 say-0.016
 decide-0.016
Token the
Token ID262
Feature activation+4.07e-03

Pos logit contributions
amas+0.032
belt+0.032
aj+0.031
roller+0.031
coat+0.031
Neg logit contributions
 end-0.007
 her-0.007
 say-0.006
 decide-0.005
 other-0.005
Token judge
Token ID5052
Feature activation+1.49e-01

Pos logit contributions
 her+0.005
eks+0.004
 asks+0.004
 end+0.004
 says+0.004
Neg logit contributions
belt-0.009
roller-0.009
amas-0.009
-0.009
tted-0.009
Token down
Token ID866
Feature activation+2.71e-02

Pos logit contributions
 end+0.174
 her+0.167
 say+0.150
 decide+0.141
 other+0.140
Neg logit contributions
amas-0.385
belt-0.384
aj-0.368
roller-0.366
coat-0.359
Token and
Token ID290
Feature activation+1.28e-02

Pos logit contributions
 her+0.024
 end+0.023
 say+0.020
 other+0.018
 decide+0.017
Neg logit contributions
amas-0.083
belt-0.081
aj-0.081
roller-0.079
-0.078
Token he
Token ID339
Feature activation-2.98e-02

Pos logit contributions
 end+0.016
 her+0.015
 say+0.013
 decide+0.013
 other+0.012
Neg logit contributions
belt-0.033
amas-0.033
aj-0.031
roller-0.031
-0.030
Token was
Token ID373
Feature activation-1.39e-02

Pos logit contributions
amas+0.068
belt+0.068
aj+0.065
roller+0.064
coat+0.063
Neg logit contributions
 end-0.044
 her-0.043
 say-0.039
 decide-0.038
 other-0.037
Token very
Token ID845
Feature activation-2.02e-02

Pos logit contributions
amas+0.036
belt+0.035
aj+0.034
roller+0.034
coat+0.033
Neg logit contributions
 end-0.017
 her-0.016
 say-0.015
 decide-0.014
 other-0.014
Token lesson
Token ID11483
Feature activation+5.36e-02

Pos logit contributions
 end+0.078
 her+0.077
 say+0.073
 decide+0.071
 angry+0.069
Neg logit contributions
amas-0.087
belt-0.086
roller-0.082
aj-0.081
coat-0.079
Token that
Token ID326
Feature activation+1.83e-02

Pos logit contributions
 her+0.049
 end+0.048
 say+0.039
 other+0.037
 decide+0.033
Neg logit contributions
amas-0.165
aj-0.158
belt-0.155
roller-0.153
coat-0.153
Token day
Token ID1110
Feature activation+2.88e-02

Pos logit contributions
 end+0.023
 her+0.022
 say+0.020
 other+0.019
 decide+0.019
Neg logit contributions
amas-0.048
belt-0.047
aj-0.046
roller-0.045
coat-0.044
Token to
Token ID284
Feature activation+2.12e-02

Pos logit contributions
 her+0.031
 end+0.031
 say+0.025
 other+0.024
 the+0.023
Neg logit contributions
amas-0.084
aj-0.081
belt-0.081
coat-0.080
roller-0.080
Token never
Token ID1239
Feature activation-2.07e-02

Pos logit contributions
 her+0.024
 end+0.023
 angry+0.023
 asks+0.023
 says+0.021
Neg logit contributions
amas-0.050
belt-0.050
roller-0.048
aj-0.047
coat-0.046
Token judge
Token ID5052
Feature activation+1.48e-01

Pos logit contributions
onna+0.045
amas+0.045
roller+0.045
coat+0.045
belt+0.045
Neg logit contributions
 her-0.024
 end-0.023
 other-0.019
 angry-0.019
 asks-0.019
Token an
Token ID281
Feature activation-4.19e-03

Pos logit contributions
 end+0.146
 her+0.136
 say+0.124
 decide+0.116
 other+0.109
Neg logit contributions
amas-0.414
belt-0.408
aj-0.397
roller-0.392
-0.386
Token object
Token ID2134
Feature activation+7.95e-02

Pos logit contributions
belt+0.009
 outage+0.008
roller+0.008
amas+0.008
ney+0.008
Neg logit contributions
 her-0.006
 say-0.005
 asks-0.005
 says-0.005
 end-0.005
Token by
Token ID416
Feature activation+6.25e-02

Pos logit contributions
 end+0.094
 her+0.092
 say+0.082
 other+0.078
 decide+0.076
Neg logit contributions
amas-0.218
aj-0.210
belt-0.209
roller-0.205
-0.200
Token the
Token ID262
Feature activation+1.23e-02

Pos logit contributions
 end+0.078
 her+0.076
 say+0.069
 decide+0.065
 other+0.065
Neg logit contributions
amas-0.162
belt-0.159
aj-0.155
roller-0.153
coat-0.150
Token way
Token ID835
Feature activation-2.10e-02

Pos logit contributions
 her+0.019
 say+0.017
 asks+0.017
 ask+0.016
 says+0.016
Neg logit contributions
belt-0.025
amas-0.024
roller-0.024
ney-0.022
aj-0.022
Token.
Token ID13
Feature activation-1.13e-02

Pos logit contributions
amas+0.022
belt+0.022
roller+0.021
aj+0.021
+0.021
Neg logit contributions
 her-0.009
 end-0.009
 say-0.008
 angry-0.007
 other-0.007
Token She
Token ID1375
Feature activation+1.59e-02

Pos logit contributions
belt+0.031
amas+0.031
aj+0.030
coat+0.030
roller+0.029
Neg logit contributions
 her-0.014
 end-0.011
 say-0.011
 asks-0.011
 ask-0.010
Token walks
Token ID11114
Feature activation+6.21e-03

Pos logit contributions
 end+0.021
 her+0.020
 other+0.016
 decide+0.016
 angry+0.015
Neg logit contributions
aj-0.037
amas-0.034
roller-0.033
belt-0.033
coat-0.033
Token to
Token ID284
Feature activation+1.66e-02

Pos logit contributions
 end+0.011
 say+0.010
 her+0.010
 decide+0.010
 asks+0.009
Neg logit contributions
belt-0.013
amas-0.012
roller-0.012
coat-0.012
aj-0.011
Token the
Token ID262
Feature activation+1.68e-02

Pos logit contributions
 end+0.015
vious+0.014
 mean+0.014
 decides+0.013
 pigeon+0.012
Neg logit contributions
amas-0.037
roller-0.036
 lap-0.036
onna-0.036
belt-0.036
Token judge
Token ID5052
Feature activation+1.47e-01

Pos logit contributions
 her+0.028
 end+0.025
 asks+0.025
 say+0.025
 says+0.023
Neg logit contributions
belt-0.034
amas-0.033
roller-0.033
aj-0.030
coat-0.030
Token.
Token ID13
Feature activation-9.76e-03

Pos logit contributions
 end+0.181
 her+0.172
 say+0.156
 decide+0.155
 other+0.147
Neg logit contributions
amas-0.371
belt-0.369
aj-0.359
roller-0.354
coat-0.346
Token She
Token ID1375
Feature activation+1.39e-02

Pos logit contributions
amas+0.021
belt+0.020
roller+0.019
aj+0.019
+0.019
Neg logit contributions
 end-0.017
 her-0.015
 say-0.015
 decide-0.014
 other-0.014
Token says
Token ID1139
Feature activation+2.32e-02

Pos logit contributions
 end+0.019
 her+0.017
 decide+0.015
 other+0.015
 angry+0.013
Neg logit contributions
aj-0.033
amas-0.030
roller-0.030
belt-0.030
onna-0.029
Token hello
Token ID23748
Feature activation-3.04e-02

Pos logit contributions
 end+0.031
 decide+0.027
 say+0.025
 her+0.024
 other+0.024
Neg logit contributions
belt-0.053
amas-0.052
aj-0.051
onna-0.050
roller-0.050
Token.
Token ID13
Feature activation-1.16e-02

Pos logit contributions
amas+0.078
belt+0.076
aj+0.074
roller+0.073
coat+0.072
Neg logit contributions
 end-0.039
 her-0.038
 say-0.035
 decide-0.033
 other-0.032
Token the
Token ID262
Feature activation-1.56e-02

Pos logit contributions
amas+0.029
belt+0.029
roller+0.028
aj+0.028
+0.028
Neg logit contributions
 end-0.006
 say-0.005
 asks-0.004
 decide-0.004
 her-0.004
Token rules
Token ID3173
Feature activation+5.09e-03

Pos logit contributions
belt+0.039
amas+0.038
roller+0.038
coat+0.036
ables+0.036
Neg logit contributions
 her-0.017
 end-0.014
 asks-0.013
 say-0.013
 says-0.012
Token set
Token ID900
Feature activation-1.47e-02

Pos logit contributions
 her+0.005
 end+0.005
 say+0.004
 other+0.004
 the+0.004
Neg logit contributions
amas-0.015
aj-0.015
belt-0.014
roller-0.014
-0.014
Token by
Token ID416
Feature activation+5.66e-02

Pos logit contributions
amas+0.042
belt+0.041
aj+0.041
roller+0.040
+0.039
Neg logit contributions
 her-0.015
 end-0.014
 say-0.013
 other-0.012
 decide-0.011
Token the
Token ID262
Feature activation+3.04e-03

Pos logit contributions
 end+0.045
 her+0.045
 say+0.037
 other+0.034
 decide+0.033
Neg logit contributions
amas-0.172
belt-0.170
aj-0.166
roller-0.164
coat-0.161
Token judge
Token ID5052
Feature activation+1.46e-01

Pos logit contributions
 her+0.005
 say+0.004
 asks+0.004
 end+0.004
 says+0.004
Neg logit contributions
belt-0.006
amas-0.006
roller-0.006
coat-0.006
ney-0.006
Token.
Token ID13
Feature activation-1.98e-04

Pos logit contributions
 end+0.166
 her+0.164
 say+0.148
 other+0.138
 decide+0.137
Neg logit contributions
amas-0.397
belt-0.386
aj-0.380
roller-0.374
coat-0.367
Token<|endoftext|>
Token ID50256
Feature activation+3.22e-02

Pos logit contributions
belt+0.001
amas+0.001
aj+0.001
coat+0.001
roller+0.001
Neg logit contributions
 her-0.000
 end-0.000
 say-0.000
 the-0.000
 other-0.000
TokenOnce
Token ID7454
Feature activation+8.85e-03

Pos logit contributions
 her+0.034
 end+0.034
 other+0.029
 say+0.028
 two+0.027
Neg logit contributions
belt-0.092
amas-0.090
aj-0.088
roller-0.086
coat-0.085
Token upon
Token ID2402
Feature activation-1.18e-02

Pos logit contributions
 end+0.014
 her+0.014
 say+0.013
 decide+0.013
 other+0.013
Neg logit contributions
amas-0.019
belt-0.019
aj-0.018
roller-0.018
coat-0.017
Token a
Token ID257
Feature activation+3.92e-02

Pos logit contributions
amas+0.038
belt+0.038
aj+0.037
roller+0.036
coat+0.036
Neg logit contributions
 end-0.007
 her-0.007
 say-0.006
 decide-0.005
 other-0.005
Token to
Token ID284
Feature activation-7.56e-03

Pos logit contributions
 end+0.009
 her+0.009
 say+0.008
 decide+0.008
 other+0.008
Neg logit contributions
amas-0.012
belt-0.012
aj-0.011
roller-0.011
coat-0.011
Token run
Token ID1057
Feature activation-3.52e-03

Pos logit contributions
amas+0.019
belt+0.019
aj+0.019
roller+0.019
coat+0.018
Neg logit contributions
 end-0.009
 her-0.009
 say-0.008
 decide-0.007
 angry-0.007
Token.
Token ID13
Feature activation-1.39e-02

Pos logit contributions
amas+0.010
belt+0.009
aj+0.009
roller+0.009
coat+0.009
Neg logit contributions
 end-0.004
 her-0.004
 say-0.003
 other-0.003
 decide-0.003
Token Everywhere
Token ID48201
Feature activation+4.79e-02

Pos logit contributions
aj+0.037
belt+0.037
amas+0.037
roller+0.034
lor+0.034
Neg logit contributions
 her-0.019
 end-0.017
 say-0.016
 the-0.016
 ask-0.016
Token the
Token ID262
Feature activation+6.71e-02

Pos logit contributions
 end+0.061
 her+0.060
 say+0.054
 other+0.051
 decide+0.050
Neg logit contributions
amas-0.125
belt-0.121
aj-0.119
roller-0.118
coat-0.115
Token turtle
Token ID28699
Feature activation+1.45e-01

Pos logit contributions
 say+0.100
 her+0.097
 says+0.095
eks+0.091
 end+0.090
Neg logit contributions
belt-0.119
tub-0.118
coat-0.112
-0.109
amas-0.108
Token went
Token ID1816
Feature activation+5.16e-02

Pos logit contributions
 end+0.204
 her+0.194
 say+0.176
 decide+0.170
 angry+0.168
Neg logit contributions
belt-0.338
amas-0.334
aj-0.327
coat-0.321
roller-0.317
Token,
Token ID11
Feature activation+4.73e-02

Pos logit contributions
 end+0.065
 her+0.060
 say+0.058
 decide+0.058
 angry+0.051
Neg logit contributions
amas-0.128
belt-0.126
roller-0.120
aj-0.119
coat-0.118
Token he
Token ID339
Feature activation-8.47e-03

Pos logit contributions
 end+0.061
 her+0.058
 say+0.054
 decide+0.051
 other+0.050
Neg logit contributions
amas-0.119
belt-0.118
aj-0.113
roller-0.112
coat-0.110
Token loved
Token ID6151
Feature activation+1.62e-03

Pos logit contributions
amas+0.022
belt+0.021
aj+0.020
roller+0.020
+0.019
Neg logit contributions
 end-0.012
 her-0.012
 say-0.010
 other-0.010
 decide-0.010
Token to
Token ID284
Feature activation-9.45e-03

Pos logit contributions
 her+0.002
 end+0.002
 other+0.002
 say+0.002
 angry+0.002
Neg logit contributions
amas-0.004
belt-0.004
aj-0.004
roller-0.004
coat-0.004
Token closer
Token ID5699
Feature activation+2.87e-02

Pos logit contributions
amas+0.103
belt+0.100
aj+0.099
roller+0.097
+0.095
Neg logit contributions
 end-0.048
 her-0.047
 say-0.042
 other-0.040
 decide-0.039
Token and
Token ID290
Feature activation+7.29e-03

Pos logit contributions
 her+0.030
 end+0.025
 say+0.023
 the+0.022
 other+0.022
Neg logit contributions
amas-0.089
belt-0.087
aj-0.085
roller-0.083
-0.083
Token closer
Token ID5699
Feature activation+3.02e-02

Pos logit contributions
 end+0.009
 her+0.007
 say+0.006
 decide+0.006
 other+0.006
Neg logit contributions
belt-0.018
amas-0.017
aj-0.017
coat-0.017
roller-0.017
Token to
Token ID284
Feature activation+1.50e-02

Pos logit contributions
 her+0.036
 end+0.033
 say+0.030
 other+0.029
 the+0.028
Neg logit contributions
amas-0.088
belt-0.085
aj-0.084
roller-0.082
coat-0.080
Token the
Token ID262
Feature activation+9.47e-03

Pos logit contributions
 end+0.012
 her+0.010
 angry+0.010
 says+0.009
 say+0.009
Neg logit contributions
belt-0.036
-0.036
roller-0.035
amas-0.035
adows-0.035
Token judge
Token ID5052
Feature activation+1.44e-01

Pos logit contributions
 her+0.015
eks+0.012
 say+0.011
 asks+0.011
 end+0.011
Neg logit contributions
roller-0.020
tub-0.019
amas-0.019
keeping-0.019
belt-0.019
Token.
Token ID13
Feature activation-5.46e-03

Pos logit contributions
 end+0.167
 her+0.164
 say+0.148
 other+0.138
 decide+0.137
Neg logit contributions
amas-0.385
belt-0.377
aj-0.368
roller-0.363
coat-0.356
Token When
Token ID1649
Feature activation+2.77e-02

Pos logit contributions
belt+0.014
aj+0.013
amas+0.013
coat+0.013
roller+0.013
Neg logit contributions
 her-0.007
 end-0.007
 say-0.007
 asks-0.007
 other-0.007
Token he
Token ID339
Feature activation+1.33e-02

Pos logit contributions
 end+0.039
 her+0.037
 say+0.036
 decide+0.035
 asks+0.033
Neg logit contributions
amas-0.063
belt-0.063
roller-0.059
-0.058
aj-0.058
Token reached
Token ID4251
Feature activation+1.19e-02

Pos logit contributions
 end+0.019
 her+0.018
 say+0.017
 decide+0.016
 other+0.016
Neg logit contributions
amas-0.030
belt-0.030
roller-0.029
aj-0.029
-0.029
Token the
Token ID262
Feature activation+6.85e-03

Pos logit contributions
 end+0.006
 her+0.006
 say+0.004
 other+0.003
 decide+0.003
Neg logit contributions
amas-0.040
belt-0.039
aj-0.038
roller-0.038
coat-0.037
Token a
Token ID257
Feature activation+2.21e-02

Pos logit contributions
 end+0.026
 decide+0.025
 say+0.024
 asks+0.022
 her+0.020
Neg logit contributions
amas-0.077
belt-0.077
coat-0.072
-0.071
roller-0.070
Token girl
Token ID2576
Feature activation+2.22e-02

Pos logit contributions
 her+0.031
 end+0.028
 say+0.028
 decide+0.027
 asks+0.027
Neg logit contributions
belt-0.050
amas-0.049
roller-0.048
coat-0.048
-0.047
Token came
Token ID1625
Feature activation-1.61e-02

Pos logit contributions
 end+0.030
 her+0.026
 say+0.026
 decide+0.025
 other+0.024
Neg logit contributions
amas-0.049
belt-0.049
coat-0.049
roller-0.048
-0.048
Token to
Token ID284
Feature activation+4.12e-03

Pos logit contributions
amas+0.031
belt+0.030
roller+0.029
aj+0.029
coat+0.029
Neg logit contributions
 end-0.029
 her-0.027
 say-0.027
 decide-0.026
 asks-0.025
Token the
Token ID262
Feature activation+6.20e-03

Pos logit contributions
 end+0.004
 says+0.004
 asks+0.003
 angry+0.003
 decides+0.003
Neg logit contributions
amas-0.009
roller-0.009
belt-0.008
coat-0.008
onna-0.008
Token judge
Token ID5052
Feature activation+1.43e-01

Pos logit contributions
 her+0.010
 say+0.010
 says+0.010
 end+0.010
 asks+0.010
Neg logit contributions
amas-0.012
belt-0.012
roller-0.011
tub-0.011
ney-0.011
Token.
Token ID13
Feature activation-7.37e-03

Pos logit contributions
 end+0.191
 her+0.180
 say+0.171
 decide+0.166
 angry+0.160
Neg logit contributions
amas-0.344
belt-0.334
aj-0.325
roller-0.322
coat-0.316
Token She
Token ID1375
Feature activation+6.70e-03

Pos logit contributions
amas+0.016
belt+0.016
roller+0.015
aj+0.015
coat+0.015
Neg logit contributions
 end-0.012
 her-0.011
 say-0.010
 decide-0.010
 other-0.010
Token had
Token ID550
Feature activation-8.31e-03

Pos logit contributions
 end+0.008
 her+0.007
 say+0.007
 decide+0.007
 other+0.006
Neg logit contributions
amas-0.017
belt-0.017
aj-0.017
coat-0.016
roller-0.016
Token a
Token ID257
Feature activation+3.86e-04

Pos logit contributions
amas+0.026
belt+0.026
aj+0.025
roller+0.025
coat+0.024
Neg logit contributions
 end-0.006
 her-0.006
 say-0.005
 other-0.004
 decide-0.004
Token problem
Token ID1917
Feature activation+2.39e-02

Pos logit contributions
 end+0.001
 her+0.001
 decide+0.001
 asks+0.000
 say+0.000
Neg logit contributions
belt-0.001
roller-0.001
amas-0.001
coat-0.001
aj-0.001
Token harsh
Token ID11859
Feature activation-9.17e-03

Pos logit contributions
 her+0.034
 say+0.032
 end+0.031
 decide+0.029
 asks+0.029
Neg logit contributions
amas-0.048
belt-0.046
roller-0.046
ables-0.045
-0.045
Token queen
Token ID16599
Feature activation+1.77e-02

Pos logit contributions
amas+0.021
belt+0.020
aj+0.020
roller+0.020
coat+0.019
Neg logit contributions
 end-0.014
 her-0.014
 say-0.013
 decide-0.012
 other-0.012
Token.
Token ID13
Feature activation-1.87e-02

Pos logit contributions
 end+0.022
 her+0.021
 say+0.019
 decide+0.019
 angry+0.018
Neg logit contributions
amas-0.044
belt-0.043
aj-0.042
roller-0.042
-0.041
Token Everywhere
Token ID48201
Feature activation+4.24e-02

Pos logit contributions
belt+0.049
amas+0.048
aj+0.046
roller+0.045
coat+0.045
Neg logit contributions
 her-0.025
 end-0.025
 say-0.022
 other-0.022
 asks-0.021
Token the
Token ID262
Feature activation+6.13e-02

Pos logit contributions
 end+0.065
 her+0.062
 say+0.060
 decide+0.058
 angry+0.054
Neg logit contributions
amas-0.093
belt-0.092
aj-0.088
roller-0.086
-0.086
Token Prince
Token ID9005
Feature activation+1.43e-01

Pos logit contributions
 say+0.081
 her+0.077
 end+0.074
 decide+0.073
 asks+0.070
Neg logit contributions
amas-0.124
belt-0.119
tub-0.117
aj-0.116
-0.110
Token looked
Token ID3114
Feature activation+5.14e-02

Pos logit contributions
 end+0.213
 her+0.190
 say+0.180
 decide+0.176
 other+0.175
Neg logit contributions
amas-0.314
aj-0.305
belt-0.304
-0.291
anas-0.291
Token,
Token ID11
Feature activation+5.00e-02

Pos logit contributions
 end+0.057
 her+0.054
 say+0.050
 decide+0.047
 other+0.045
Neg logit contributions
amas-0.138
belt-0.137
aj-0.133
roller-0.130
coat-0.129
Token he
Token ID339
Feature activation-1.32e-02

Pos logit contributions
 end+0.068
 her+0.065
 say+0.061
 decide+0.058
 other+0.056
Neg logit contributions
amas-0.122
belt-0.120
aj-0.116
roller-0.114
-0.112
Token could
Token ID714
Feature activation-1.90e-02

Pos logit contributions
belt+0.029
amas+0.027
aj+0.027
coat+0.026
+0.026
Neg logit contributions
 end-0.019
 her-0.019
 say-0.018
 decide-0.017
 other-0.017
Token see
Token ID766
Feature activation-1.78e-02

Pos logit contributions
belt+0.050
amas+0.050
roller+0.048
coat+0.047
aj+0.047
Neg logit contributions
 end-0.019
 her-0.018
 say-0.016
 other-0.016
 asks-0.015
Token came
Token ID1625
Feature activation-1.43e-02

Pos logit contributions
 end+0.091
 her+0.084
 decide+0.077
 other+0.076
 say+0.071
Neg logit contributions
belt-0.152
amas-0.149
roller-0.144
aj-0.144
coat-0.143
Token along
Token ID1863
Feature activation-1.76e-02

Pos logit contributions
amas+0.035
belt+0.034
roller+0.033
aj+0.033
coat+0.033
Neg logit contributions
 end-0.019
 her-0.018
 say-0.017
 decide-0.016
 other-0.015
Token and
Token ID290
Feature activation+9.99e-03

Pos logit contributions
amas+0.048
belt+0.047
aj+0.046
roller+0.045
coat+0.044
Neg logit contributions
 end-0.019
 her-0.019
 say-0.017
 decide-0.016
 other-0.016
Token saw
Token ID2497
Feature activation-1.82e-02

Pos logit contributions
 end+0.012
 her+0.010
 decide+0.009
 say+0.009
 other+0.008
Neg logit contributions
belt-0.026
amas-0.025
aj-0.025
coat-0.024
roller-0.024
Token how
Token ID703
Feature activation+2.50e-02

Pos logit contributions
amas+0.046
+0.044
belt+0.044
tub+0.044
aj+0.044
Neg logit contributions
 end-0.018
 decide-0.015
 say-0.014
 ask-0.012
 asks-0.012
Token selfish
Token ID20363
Feature activation+1.42e-01

Pos logit contributions
 end+0.032
 her+0.029
 say+0.028
 decide+0.026
 asks+0.025
Neg logit contributions
belt-0.062
amas-0.061
aj-0.059
roller-0.058
-0.058
Token the
Token ID262
Feature activation+3.05e-02

Pos logit contributions
 end+0.105
 her+0.101
 say+0.085
 decide+0.076
 other+0.075
Neg logit contributions
amas-0.440
belt-0.433
aj-0.424
roller-0.420
coat-0.412
Token f
Token ID277
Feature activation+2.41e-02

Pos logit contributions
 her+0.041
 end+0.038
 say+0.034
 decide+0.033
 asks+0.032
Neg logit contributions
amas-0.074
belt-0.074
roller-0.071
aj-0.071
coat-0.069
Tokenauc
Token ID14272
Feature activation-6.07e-02

Pos logit contributions
 end+0.056
 her+0.055
 say+0.053
 other+0.052
 decide+0.051
Neg logit contributions
amas-0.035
belt-0.033
aj-0.032
roller-0.032
-0.030
Tokenet
Token ID316
Feature activation+1.77e-02

Pos logit contributions
amas+0.126
belt+0.123
aj+0.118
roller+0.117
coat+0.114
Neg logit contributions
 end-0.107
 her-0.105
 say-0.098
 decide-0.095
 other-0.094
Token was
Token ID373
Feature activation+1.31e-02

Pos logit contributions
 end+0.023
 her+0.023
 say+0.021
 decide+0.020
 other+0.020
Neg logit contributions
amas-0.045
belt-0.044
aj-0.043
roller-0.042
coat-0.041
Token decided
Token ID3066
Feature activation-7.54e-03

Pos logit contributions
 end+0.023
 her+0.018
 say+0.018
 decide+0.017
 other+0.016
Neg logit contributions
belt-0.036
amas-0.035
roller-0.033
aj-0.032
-0.032
Token to
Token ID284
Feature activation-1.65e-02

Pos logit contributions
amas+0.018
belt+0.017
aj+0.017
roller+0.017
coat+0.016
Neg logit contributions
 her-0.012
 end-0.011
 say-0.010
 other-0.010
 decide-0.010
Token try
Token ID1949
Feature activation+4.18e-02

Pos logit contributions
amas+0.045
belt+0.044
aj+0.043
roller+0.042
coat+0.042
Neg logit contributions
 end-0.019
 her-0.019
 say-0.017
 decide-0.016
 other-0.015
Token and
Token ID290
Feature activation+1.04e-02

Pos logit contributions
 her+0.031
 end+0.030
 say+0.025
 other+0.023
 decide+0.021
Neg logit contributions
amas-0.130
belt-0.128
aj-0.126
roller-0.124
coat-0.122
Token unsc
Token ID28594
Feature activation-4.83e-03

Pos logit contributions
 end+0.013
 her+0.012
 angry+0.010
 say+0.010
 asks+0.010
Neg logit contributions
belt-0.025
amas-0.025
roller-0.024
-0.023
 outage-0.023
Tokenrew
Token ID1809
Feature activation+1.42e-01

Pos logit contributions
belt+0.011
amas+0.011
aj+0.010
roller+0.010
icycle+0.009
Neg logit contributions
 end-0.007
 other-0.007
 say-0.007
 her-0.006
 girl-0.006
Token it
Token ID340
Feature activation+2.55e-02

Pos logit contributions
 end+0.168
 say+0.150
 her+0.148
 decide+0.141
 asks+0.129
Neg logit contributions
belt-0.362
amas-0.362
roller-0.345
aj-0.335
-0.335
Token.
Token ID13
Feature activation-1.92e-03

Pos logit contributions
 her+0.026
 end+0.025
 say+0.022
 other+0.021
 decide+0.020
Neg logit contributions
amas-0.073
belt-0.071
aj-0.071
roller-0.069
-0.069
Token He
Token ID679
Feature activation+1.43e-02

Pos logit contributions
belt+0.006
 outage+0.005
aj+0.005
lor+0.005
adows+0.005
Neg logit contributions
 her-0.002
 the-0.002
 say-0.002
 ask-0.002
 says-0.002
Token twisted
Token ID19074
Feature activation-1.16e-02

Pos logit contributions
 end+0.018
 her+0.017
 say+0.016
 decide+0.015
 other+0.014
Neg logit contributions
amas-0.037
belt-0.037
aj-0.035
roller-0.035
coat-0.034
Token and
Token ID290
Feature activation+5.46e-04

Pos logit contributions
amas+0.036
belt+0.036
aj+0.035
roller+0.034
coat+0.034
Neg logit contributions
 end-0.008
 her-0.008
 say-0.007
 decide-0.006
 other-0.006
Token away
Token ID1497
Feature activation+2.07e-02

Pos logit contributions
amas+0.071
belt+0.069
aj+0.068
roller+0.066
+0.066
Neg logit contributions
 end-0.029
 her-0.029
 say-0.026
 other-0.024
 decide-0.024
Token.
Token ID13
Feature activation-3.56e-03

Pos logit contributions
 her+0.021
 end+0.020
 say+0.017
 other+0.016
 angry+0.016
Neg logit contributions
amas-0.062
belt-0.060
aj-0.058
-0.057
roller-0.057
Token
Token ID198
Feature activation+1.41e-03

Pos logit contributions
amas+0.009
belt+0.009
coat+0.008
aj+0.008
imm+0.008
Neg logit contributions
 her-0.005
 end-0.005
 say-0.005
 asks-0.004
 other-0.004
Token
Token ID198
Feature activation+5.26e-03

Pos logit contributions
 her+0.002
 end+0.002
 say+0.002
 other+0.002
 decide+0.001
Neg logit contributions
amas-0.004
belt-0.004
coat-0.003
aj-0.003
roller-0.003
TokenThe
Token ID464
Feature activation-1.83e-03

Pos logit contributions
 end+0.008
 her+0.008
 say+0.008
 decide+0.007
 other+0.007
Neg logit contributions
amas-0.012
belt-0.011
aj-0.011
roller-0.011
coat-0.011
Token judge
Token ID5052
Feature activation+1.42e-01

Pos logit contributions
belt+0.004
amas+0.004
aj+0.004
coat+0.004
roller+0.004
Neg logit contributions
 her-0.002
 asks-0.002
 say-0.002
 decide-0.002
 ask-0.002
Token felt
Token ID2936
Feature activation+1.01e-03

Pos logit contributions
 end+0.185
 her+0.180
 say+0.164
 decide+0.159
 other+0.148
Neg logit contributions
belt-0.341
amas-0.336
roller-0.323
aj-0.320
coat-0.316
Token bad
Token ID2089
Feature activation+5.21e-03

Pos logit contributions
 end+0.001
 her+0.001
 say+0.001
 asks+0.001
 decide+0.001
Neg logit contributions
belt-0.003
amas-0.003
aj-0.002
-0.002
roller-0.002
Token for
Token ID329
Feature activation+1.99e-02

Pos logit contributions
 end+0.005
 her+0.005
 say+0.005
 other+0.004
 angry+0.004
Neg logit contributions
amas-0.015
belt-0.015
aj-0.015
roller-0.014
coat-0.014
Token sc
Token ID629
Feature activation+9.93e-03

Pos logit contributions
 end+0.022
 decide+0.019
 say+0.019
 her+0.018
 asks+0.018
Neg logit contributions
amas-0.051
belt-0.051
roller-0.048
aj-0.048
-0.048
Tokenaring
Token ID1723
Feature activation+1.45e-02

Pos logit contributions
 end+0.025
 her+0.025
 say+0.024
 decide+0.023
 other+0.023
Neg logit contributions
amas-0.013
belt-0.013
aj-0.012
roller-0.012
coat-0.011
Token fun
Token ID1257
Feature activation-5.11e-02

Pos logit contributions
 end+0.041
 her+0.040
 say+0.036
 other+0.035
 decide+0.034
Neg logit contributions
amas-0.084
belt-0.082
aj-0.079
roller-0.079
coat-0.077
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
amas+0.125
belt+0.122
aj+0.119
roller+0.116
+0.115
Neg logit contributions
 end-0.064
 her-0.064
 say-0.060
 decide-0.057
 asks-0.054
Token
Token ID198
Feature activation-7.35e-03

Pos logit contributions
amas+0.032
belt+0.031
aj+0.030
coat+0.030
roller+0.029
Neg logit contributions
 end-0.018
 her-0.018
 say-0.017
 other-0.016
 asks-0.015
Token
Token ID198
Feature activation-2.24e-03

Pos logit contributions
amas+0.018
belt+0.018
aj+0.017
roller+0.017
+0.017
Neg logit contributions
 end-0.010
 her-0.010
 say-0.009
 decide-0.008
 other-0.008
TokenThe
Token ID464
Feature activation-7.55e-04

Pos logit contributions
amas+0.004
belt+0.004
aj+0.004
roller+0.004
+0.004
Neg logit contributions
 her-0.004
 end-0.004
 say-0.004
 decide-0.004
 angry-0.004
Token judge
Token ID5052
Feature activation+1.41e-01

Pos logit contributions
belt+0.002
roller+0.002
aj+0.002
tub+0.002
coat+0.002
Neg logit contributions
 her-0.001
 asks-0.001
 ask-0.001
 say-0.001
eks-0.001
Token blows
Token ID20385
Feature activation+2.09e-02

Pos logit contributions
 her+0.193
 end+0.181
 decide+0.154
 other+0.152
 pet+0.150
Neg logit contributions
belt-0.287
aj-0.287
amas-0.286
roller-0.279
ables-0.276
Token his
Token ID465
Feature activation+6.87e-03

Pos logit contributions
 end+0.020
 decide+0.018
 her+0.018
 say+0.017
 asks+0.017
Neg logit contributions
amas-0.055
belt-0.053
roller-0.052
aj-0.052
coat-0.051
Token whistle
Token ID16121
Feature activation+2.32e-02

Pos logit contributions
 her+0.010
 end+0.010
 say+0.009
 asks+0.009
 decide+0.009
Neg logit contributions
amas-0.015
belt-0.015
aj-0.014
roller-0.014
coat-0.014
Token.
Token ID13
Feature activation-3.52e-03

Pos logit contributions
 end+0.025
 her+0.024
 say+0.022
 other+0.020
 decide+0.020
Neg logit contributions
amas-0.064
belt-0.063
aj-0.062
roller-0.061
coat-0.059
Token He
Token ID679
Feature activation+1.50e-02

Pos logit contributions
amas+0.008
belt+0.007
aj+0.007
coat+0.007
roller+0.007
Neg logit contributions
 her-0.006
 end-0.006
 say-0.006
 other-0.005
 decide-0.005
Token on
Token ID319
Feature activation-1.31e-02

Pos logit contributions
 end+0.029
 say+0.027
 decide+0.026
 her+0.026
 asks+0.024
Neg logit contributions
belt-0.054
amas-0.054
coat-0.051
roller-0.051
aj-0.051
Token the
Token ID262
Feature activation+7.54e-03

Pos logit contributions
amas+0.039
belt+0.038
roller+0.037
coat+0.037
+0.037
Neg logit contributions
 end-0.009
 say-0.008
 asks-0.008
 decide-0.008
 ask-0.006
Token grass
Token ID8701
Feature activation+1.95e-02

Pos logit contributions
 end+0.011
 her+0.010
 other+0.010
 say+0.010
 decide+0.009
Neg logit contributions
amas-0.019
belt-0.018
aj-0.018
roller-0.017
-0.017
Token,
Token ID11
Feature activation+5.93e-03

Pos logit contributions
 end+0.020
 her+0.020
 say+0.017
 other+0.016
 decide+0.015
Neg logit contributions
amas-0.055
belt-0.054
aj-0.053
roller-0.052
-0.052
Token unsc
Token ID28594
Feature activation-3.72e-03

Pos logit contributions
 end+0.009
 her+0.008
 say+0.008
 decide+0.007
 other+0.007
Neg logit contributions
amas-0.014
belt-0.014
aj-0.013
roller-0.013
coat-0.013
Tokenrew
Token ID1809
Feature activation+1.41e-01

Pos logit contributions
belt+0.008
amas+0.008
aj+0.008
roller+0.008
coat+0.008
Neg logit contributions
 end-0.007
 say-0.006
 other-0.006
 decide-0.005
 her-0.005
Tokened
Token ID276
Feature activation+2.96e-02

Pos logit contributions
 end+0.318
 say+0.297
 decide+0.294
 ask+0.282
 asks+0.275
Neg logit contributions
belt-0.201
amas-0.196
coat-0.191
roller-0.188
run-0.182
Token the
Token ID262
Feature activation+4.09e-03

Pos logit contributions
 end+0.028
 decide+0.020
 say+0.019
 asks+0.018
 judge+0.016
Neg logit contributions
belt-0.080
amas-0.079
coat-0.079
roller-0.078
run-0.076
Token lid
Token ID19789
Feature activation-2.72e-03

Pos logit contributions
 end+0.006
 her+0.006
 decide+0.006
 say+0.006
 asks+0.005
Neg logit contributions
amas-0.009
roller-0.009
belt-0.009
aj-0.009
coat-0.009
Token of
Token ID286
Feature activation-2.66e-02

Pos logit contributions
amas+0.008
belt+0.008
+0.008
aj+0.008
anas+0.008
Neg logit contributions
 her-0.003
 end-0.002
 say-0.002
 other-0.002
 angry-0.002
Token the
Token ID262
Feature activation+1.41e-02

Pos logit contributions
amas+0.074
coat+0.073
roller+0.073
belt+0.072
aj+0.071
Neg logit contributions
 end-0.022
 decide-0.019
 asks-0.017
 say-0.016
 ask-0.014
Token
Token ID198
Feature activation-9.28e-03

Pos logit contributions
 her+0.010
 end+0.009
 other+0.008
 say+0.008
 the+0.007
Neg logit contributions
belt-0.024
amas-0.024
aj-0.023
coat-0.023
roller-0.023
Token
Token ID198
Feature activation-7.52e-03

Pos logit contributions
belt+0.026
amas+0.025
aj+0.025
roller+0.024
coat+0.023
Neg logit contributions
 her-0.011
 end-0.010
 other-0.010
 say-0.009
 the-0.008
TokenEvery
Token ID6109
Feature activation+1.65e-02

Pos logit contributions
amas+0.019
belt+0.018
aj+0.018
roller+0.018
+0.017
Neg logit contributions
 end-0.011
 her-0.010
 say-0.009
 other-0.009
 decide-0.009
Token day
Token ID1110
Feature activation+3.38e-02

Pos logit contributions
 her+0.025
 say+0.025
 decide+0.024
 end+0.024
 asks+0.024
Neg logit contributions
belt-0.034
amas-0.033
roller-0.032
coat-0.032
-0.032
Token the
Token ID262
Feature activation+1.44e-02

Pos logit contributions
 decide+0.046
 say+0.045
 end+0.043
 angry+0.043
 asks+0.041
Neg logit contributions
belt-0.070
amas-0.069
coat-0.067
-0.066
roller-0.064
Token judge
Token ID5052
Feature activation+1.41e-01

Pos logit contributions
 say+0.015
 end+0.015
eks+0.015
 says+0.015
 her+0.015
Neg logit contributions
amas-0.033
belt-0.033
coat-0.032
tub-0.031
roller-0.031
Token had
Token ID550
Feature activation+4.85e-03

Pos logit contributions
 end+0.244
 her+0.238
 say+0.226
 decide+0.216
 angry+0.216
Neg logit contributions
belt-0.273
amas-0.264
aj-0.255
coat-0.253
roller-0.248
Token to
Token ID284
Feature activation-8.40e-03

Pos logit contributions
 her+0.005
 end+0.005
 other+0.004
 say+0.004
 decide+0.004
Neg logit contributions
amas-0.014
belt-0.014
aj-0.013
roller-0.013
coat-0.013
Token decide
Token ID5409
Feature activation+2.57e-02

Pos logit contributions
amas+0.021
belt+0.021
roller+0.020
coat+0.020
aj+0.020
Neg logit contributions
 end-0.009
 her-0.009
 asks-0.008
 angry-0.008
 other-0.007
Token cases
Token ID2663
Feature activation-9.79e-03

Pos logit contributions
 end+0.032
 her+0.031
 say+0.029
 decide+0.027
 other+0.027
Neg logit contributions
amas-0.066
belt-0.065
aj-0.063
roller-0.062
coat-0.061
Token in
Token ID287
Feature activation+4.89e-03

Pos logit contributions
amas+0.025
belt+0.024
aj+0.023
roller+0.023
+0.022
Neg logit contributions
 her-0.013
 end-0.013
 say-0.012
 other-0.011
 decide-0.011
Token until
Token ID1566
Feature activation-1.28e-03

Pos logit contributions
 her+0.001
 end+0.001
 say+0.001
 the+0.001
 one+0.001
Neg logit contributions
amas-0.005
belt-0.004
aj-0.004
roller-0.004
-0.004
Token they
Token ID484
Feature activation+7.31e-04

Pos logit contributions
belt+0.004
amas+0.004
roller+0.003
aj+0.003
coat+0.003
Neg logit contributions
 end-0.001
 say-0.001
 her-0.001
 decide-0.001
 asks-0.001
Token finally
Token ID3443
Feature activation+1.81e-03

Pos logit contributions
 end+0.001
 her+0.001
 say+0.001
 decide+0.001
 asks+0.001
Neg logit contributions
amas-0.002
belt-0.002
roller-0.002
aj-0.002
coat-0.002
Token saw
Token ID2497
Feature activation-1.42e-02

Pos logit contributions
 end+0.002
 her+0.002
 say+0.002
 decide+0.002
 other+0.002
Neg logit contributions
amas-0.005
belt-0.005
aj-0.005
roller-0.005
coat-0.004
Token the
Token ID262
Feature activation-1.38e-02

Pos logit contributions
amas+0.041
belt+0.040
aj+0.040
roller+0.040
+0.039
Neg logit contributions
 end-0.011
 say-0.009
 decide-0.008
 asks-0.008
 her-0.008
Token judge
Token ID5052
Feature activation+1.40e-01

Pos logit contributions
amas+0.033
belt+0.033
aj+0.031
roller+0.031
coat+0.031
Neg logit contributions
 her-0.019
 end-0.019
 say-0.018
 decide-0.017
 other-0.016
Token's
Token ID338
Feature activation+1.43e-02

Pos logit contributions
 end+0.176
 her+0.172
 say+0.157
 decide+0.147
 other+0.146
Neg logit contributions
amas-0.362
belt-0.355
aj-0.346
roller-0.341
coat-0.334
Token house
Token ID2156
Feature activation-3.12e-02

Pos logit contributions
 end+0.020
 her+0.019
 say+0.018
 other+0.017
 decide+0.017
Neg logit contributions
amas-0.035
belt-0.035
aj-0.034
roller-0.033
coat-0.032
Token.
Token ID13
Feature activation-2.07e-03

Pos logit contributions
amas+0.094
aj+0.093
+0.090
belt+0.090
roller+0.089
Neg logit contributions
 her-0.030
 end-0.029
 say-0.023
 angry-0.022
 the-0.022
Token When
Token ID1649
Feature activation+2.82e-02

Pos logit contributions
belt+0.006
coat+0.005
amas+0.005
imm+0.005
 outage+0.005
Neg logit contributions
 her-0.003
 say-0.002
 asks-0.002
 the-0.002
 says-0.002
Token her
Token ID607
Feature activation+3.32e-02

Pos logit contributions
 end+0.027
 her+0.025
 say+0.023
 decide+0.022
 angry+0.020
Neg logit contributions
amas-0.080
belt-0.079
aj-0.076
roller-0.076
-0.074
Token everyone
Token ID2506
Feature activation-7.79e-03

Pos logit contributions
 end+0.013
 her+0.012
 say+0.011
 decide+0.011
 angry+0.010
Neg logit contributions
amas-0.036
belt-0.036
aj-0.035
roller-0.035
coat-0.034
Token."
Token ID526
Feature activation+2.97e-03

Pos logit contributions
amas+0.022
belt+0.022
aj+0.021
roller+0.021
coat+0.021
Neg logit contributions
 her-0.008
 end-0.008
 say-0.007
 other-0.007
 angry-0.006
Token
Token ID198
Feature activation-1.83e-03

Pos logit contributions
 end+0.005
 her+0.004
 say+0.004
 decide+0.004
 other+0.004
Neg logit contributions
amas-0.007
belt-0.007
aj-0.006
roller-0.006
coat-0.006
Token
Token ID198
Feature activation+7.31e-05

Pos logit contributions
amas+0.005
belt+0.004
aj+0.004
roller+0.004
coat+0.004
Neg logit contributions
 end-0.003
 her-0.003
 say-0.002
 other-0.002
 decide-0.002
TokenThe
Token ID464
Feature activation+9.60e-04

Pos logit contributions
 end+0.000
 her+0.000
 say+0.000
 decide+0.000
 other+0.000
Neg logit contributions
amas-0.000
belt-0.000
aj-0.000
roller-0.000
coat-0.000
Token judge
Token ID5052
Feature activation+1.40e-01

Pos logit contributions
 her+0.001
eks+0.001
 both+0.001
ck+0.001
 say+0.001
Neg logit contributions
belt-0.002
tub-0.002
amas-0.002
roller-0.002
coat-0.002
Token picked
Token ID6497
Feature activation+5.96e-02

Pos logit contributions
 end+0.202
 her+0.199
 decide+0.183
 say+0.182
 other+0.170
Neg logit contributions
belt-0.298
amas-0.286
aj-0.280
ney-0.276
roller-0.275
Token up
Token ID510
Feature activation+5.93e-02

Pos logit contributions
 end+0.094
 decide+0.080
 say+0.076
 judge+0.072
 ask+0.071
Neg logit contributions
amas-0.125
roller-0.115
ables-0.111
belt-0.111
-0.109
Token Lily
Token ID20037
Feature activation+3.47e-02

Pos logit contributions
 end+0.053
 decide+0.045
 say+0.044
 asks+0.040
 ask+0.037
Neg logit contributions
belt-0.163
amas-0.163
aj-0.159
roller-0.158
coat-0.156
Token's
Token ID338
Feature activation+1.20e-02

Pos logit contributions
 end+0.046
 her+0.043
 decide+0.041
 say+0.041
 other+0.039
Neg logit contributions
amas-0.084
belt-0.081
aj-0.080
roller-0.079
coat-0.078
Token doll
Token ID3654
Feature activation+1.64e-02

Pos logit contributions
 end+0.018
 her+0.017
 say+0.016
 decide+0.016
 asks+0.015
Neg logit contributions
amas-0.027
belt-0.027
aj-0.027
roller-0.026
coat-0.026
Token outside
Token ID2354
Feature activation-4.19e-03

Pos logit contributions
 end+0.010
 her+0.009
 say+0.009
 decide+0.009
 asks+0.008
Neg logit contributions
belt-0.015
amas-0.015
roller-0.015
aj-0.015
coat-0.015
Token and
Token ID290
Feature activation+1.51e-02

Pos logit contributions
amas+0.012
belt+0.012
aj+0.011
roller+0.011
+0.011
Neg logit contributions
 her-0.005
 end-0.005
 say-0.004
 other-0.004
 decide-0.004
Token Sus
Token ID8932
Feature activation-5.60e-02

Pos logit contributions
 end+0.023
 her+0.020
 asks+0.019
 angry+0.019
 decide+0.019
Neg logit contributions
belt-0.033
amas-0.031
coat-0.031
roller-0.030
-0.030
Tokenie
Token ID494
Feature activation-4.41e-02

Pos logit contributions
amas+0.170
belt+0.167
aj+0.163
roller+0.162
coat+0.159
Neg logit contributions
 end-0.045
 her-0.043
 say-0.037
 decide-0.033
 other-0.033
Token showed
Token ID3751
Feature activation-2.34e-02

Pos logit contributions
amas+0.126
belt+0.123
aj+0.119
roller+0.118
+0.116
Neg logit contributions
 her-0.045
 end-0.044
 say-0.040
 other-0.037
 decide-0.036
Token Jo
Token ID5302
Feature activation-1.87e-01

Pos logit contributions
amas+0.042
belt+0.042
roller+0.041
aj+0.039
+0.038
Neg logit contributions
 end-0.045
 decide-0.039
 say-0.038
 angry-0.037
 asks-0.037
Tokenjo
Token ID7639
Feature activation-1.17e-01

Pos logit contributions
amas+0.389
belt+0.367
roller+0.355
aj+0.347
+0.338
Neg logit contributions
 her-0.352
 end-0.333
 say-0.313
 other-0.300
 angry-0.296
Token a
Token ID257
Feature activation-2.42e-02

Pos logit contributions
belt+0.318
amas+0.317
aj+0.312
roller+0.309
+0.308
Neg logit contributions
 end-0.110
 decide-0.096
 say-0.095
 asks-0.088
 her-0.083
Token big
Token ID1263
Feature activation-3.91e-02

Pos logit contributions
amas+0.064
belt+0.063
aj+0.061
roller+0.060
+0.059
Neg logit contributions
 end-0.029
 her-0.028
 say-0.026
 other-0.024
 decide-0.023
Token tire
Token ID15867
Feature activation+1.23e-03

Pos logit contributions
amas+0.095
belt+0.093
roller+0.091
aj+0.090
coat+0.089
Neg logit contributions
 end-0.052
 her-0.052
 say-0.047
 decide-0.047
 other-0.045
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
 end+0.001
 her+0.001
 say+0.001
 decide+0.001
 other+0.001
Neg logit contributions
amas-0.003
belt-0.003
aj-0.003
roller-0.003
coat-0.003
Token with
Token ID351
Feature activation+2.48e-02

Pos logit contributions
 her+0.053
 end+0.049
 angry+0.044
 say+0.043
 other+0.040
Neg logit contributions
amas-0.140
aj-0.134
belt-0.134
roller-0.130
-0.129
Token him
Token ID683
Feature activation-1.78e-02

Pos logit contributions
 her+0.019
 end+0.018
 say+0.015
 other+0.014
 decide+0.013
Neg logit contributions
amas-0.078
belt-0.076
aj-0.075
roller-0.074
coat-0.073
Token.
Token ID13
Feature activation-1.10e-02

Pos logit contributions
amas+0.053
belt+0.051
aj+0.051
roller+0.050
+0.049
Neg logit contributions
 her-0.017
 end-0.016
 say-0.014
 other-0.013
 angry-0.013
Token Winston
Token ID21400
Feature activation-1.18e-02

Pos logit contributions
coat+0.025
aj+0.024
amas+0.024
belt+0.024
imm+0.023
Neg logit contributions
 her-0.018
 the-0.016
 asks-0.016
 say-0.016
 two-0.015
Token told
Token ID1297
Feature activation-4.06e-02

Pos logit contributions
amas+0.031
belt+0.030
aj+0.030
roller+0.029
coat+0.029
Neg logit contributions
 end-0.014
 her-0.014
 say-0.013
 other-0.012
 decide-0.012
Token Jo
Token ID5302
Feature activation-1.80e-01

Pos logit contributions
amas+0.110
belt+0.108
aj+0.105
roller+0.105
coat+0.102
Neg logit contributions
 end-0.044
 her-0.041
 say-0.038
 decide-0.037
 other-0.035
Tokenjo
Token ID7639
Feature activation-1.16e-01

Pos logit contributions
amas+0.251
roller+0.204
belt+0.200
aj+0.195
+0.187
Neg logit contributions
 her-0.485
 end-0.448
 say-0.440
 other-0.419
 angry-0.417
Token,
Token ID11
Feature activation-2.22e-02

Pos logit contributions
amas+0.353
belt+0.343
aj+0.339
roller+0.333
coat+0.329
Neg logit contributions
 her-0.103
 end-0.100
 say-0.083
 other-0.077
 angry-0.072
Token "
Token ID366
Feature activation-4.54e-02

Pos logit contributions
amas+0.057
aj+0.055
belt+0.053
roller+0.053
+0.052
Neg logit contributions
 her-0.034
 end-0.031
 say-0.030
 other-0.028
 the-0.028
TokenCour
Token ID25877
Feature activation-4.26e-02

Pos logit contributions
amas+0.112
belt+0.110
aj+0.109
roller+0.107
+0.103
Neg logit contributions
 her-0.062
 end-0.061
 say-0.056
 decide-0.053
 other-0.053
Tokenage
Token ID496
Feature activation-5.18e-02

Pos logit contributions
aj+0.070
amas+0.063
 pic+0.054
lights+0.053
tub+0.052
Neg logit contributions
 end-0.104
 her-0.087
 one-0.086
 say-0.084
 mean-0.081
Token mom
Token ID1995
Feature activation-3.51e-02

Pos logit contributions
amas+0.047
belt+0.046
roller+0.045
aj+0.044
coat+0.044
Neg logit contributions
 end-0.016
 her-0.015
 say-0.013
 other-0.012
 angry-0.012
Token gave
Token ID2921
Feature activation-1.26e-02

Pos logit contributions
amas+0.100
belt+0.098
aj+0.096
roller+0.095
+0.093
Neg logit contributions
 end-0.035
 her-0.034
 say-0.031
 decide-0.027
 other-0.027
Token her
Token ID607
Feature activation-3.48e-02

Pos logit contributions
amas+0.038
coat+0.035
belt+0.035
aj+0.035
roller+0.034
Neg logit contributions
 end-0.008
 decide-0.006
 judge-0.005
 say-0.005
 asks-0.004
Token a
Token ID257
Feature activation-2.06e-02

Pos logit contributions
amas+0.108
belt+0.107
aj+0.104
roller+0.103
coat+0.102
Neg logit contributions
 end-0.025
 her-0.023
 say-0.020
 decide-0.018
 other-0.017
Token night
Token ID1755
Feature activation-6.30e-02

Pos logit contributions
amas+0.056
belt+0.055
aj+0.053
roller+0.052
+0.052
Neg logit contributions
 end-0.024
 her-0.023
 say-0.021
 other-0.020
 decide-0.019
Tokenlight
Token ID2971
Feature activation-1.75e-01

Pos logit contributions
amas+0.183
belt+0.183
aj+0.175
roller+0.174
anas+0.173
Neg logit contributions
 end-0.068
 her-0.066
 say-0.060
 the-0.053
 one-0.052
Token to
Token ID284
Feature activation+8.54e-03

Pos logit contributions
amas+0.521
belt+0.513
+0.509
roller+0.504
anas+0.487
Neg logit contributions
 her-0.180
 end-0.152
 say-0.147
 all-0.124
 one-0.115
Token make
Token ID787
Feature activation-2.54e-02

Pos logit contributions
 end+0.011
 her+0.011
 say+0.010
 decide+0.010
 other+0.010
Neg logit contributions
amas-0.022
belt-0.021
aj-0.021
roller-0.020
coat-0.020
Token it
Token ID340
Feature activation+3.95e-03

Pos logit contributions
belt+0.082
amas+0.081
aj+0.079
roller+0.079
coat+0.077
Neg logit contributions
 her-0.017
 end-0.016
 say-0.013
 other-0.011
 decide-0.010
Token less
Token ID1342
Feature activation+6.18e-03

Pos logit contributions
 her+0.004
 end+0.004
 say+0.004
 other+0.003
 decide+0.003
Neg logit contributions
belt-0.011
amas-0.011
roller-0.011
aj-0.011
anas-0.010
Token scary
Token ID14343
Feature activation+2.81e-02

Pos logit contributions
 her+0.008
 end+0.008
 say+0.008
 angry+0.007
 other+0.007
Neg logit contributions
amas-0.016
belt-0.016
roller-0.015
aj-0.015
coat-0.015
Token the
Token ID262
Feature activation+9.41e-03

Pos logit contributions
 end+0.016
 her+0.015
 say+0.013
 decide+0.011
 other+0.011
Neg logit contributions
amas-0.077
belt-0.076
aj-0.074
roller-0.073
coat-0.072
Token tent
Token ID11105
Feature activation-5.93e-02

Pos logit contributions
 end+0.018
 her+0.018
 say+0.017
 other+0.017
 angry+0.016
Neg logit contributions
amas-0.018
belt-0.018
aj-0.017
roller-0.017
-0.017
Token with
Token ID351
Feature activation+3.53e-02

Pos logit contributions
amas+0.181
belt+0.180
aj+0.179
+0.178
roller+0.170
Neg logit contributions
 her-0.055
 end-0.053
 say-0.043
 other-0.038
 all-0.038
Token their
Token ID511
Feature activation-3.58e-02

Pos logit contributions
 end+0.039
 her+0.035
 say+0.034
 decide+0.034
 asks+0.032
Neg logit contributions
amas-0.092
belt-0.091
aj-0.089
roller-0.088
coat-0.086
Token flash
Token ID7644
Feature activation-6.86e-02

Pos logit contributions
amas+0.089
belt+0.087
aj+0.084
roller+0.084
coat+0.082
Neg logit contributions
 end-0.049
 her-0.048
 say-0.044
 other-0.041
 decide-0.041
Tokenlights
Token ID8091
Feature activation-1.74e-01

Pos logit contributions
amas+0.071
belt+0.069
+0.069
aj+0.063
anas+0.063
Neg logit contributions
 her-0.200
 end-0.197
 say-0.180
 other-0.180
 the-0.179
Token and
Token ID290
Feature activation+3.60e-02

Pos logit contributions
amas+0.512
belt+0.509
aj+0.498
+0.489
roller+0.486
Neg logit contributions
 her-0.171
 end-0.163
 say-0.146
 other-0.125
 angry-0.119
Token their
Token ID511
Feature activation+3.82e-03

Pos logit contributions
 end+0.041
 her+0.040
 say+0.036
 decide+0.034
 other+0.033
Neg logit contributions
amas-0.097
belt-0.095
aj-0.093
roller-0.092
coat-0.090
Token t
Token ID256
Feature activation-1.62e-02

Pos logit contributions
 end+0.007
 her+0.007
 say+0.006
 other+0.006
 angry+0.006
Neg logit contributions
amas-0.008
belt-0.008
roller-0.007
aj-0.007
-0.007
Tokeneddy
Token ID21874
Feature activation-4.69e-02

Pos logit contributions
belt+0.019
amas+0.019
coat+0.018
+0.018
aj+0.018
Neg logit contributions
 her-0.040
 end-0.040
 say-0.037
 other-0.036
 decide-0.036
Token bears
Token ID13062
Feature activation-4.11e-02

Pos logit contributions
amas+0.125
belt+0.121
aj+0.120
roller+0.116
coat+0.115
Neg logit contributions
 end-0.053
 her-0.051
 say-0.047
 decide-0.047
 angry-0.043
Token by
Token ID416
Feature activation+5.60e-02

Pos logit contributions
 her+0.010
 end+0.010
 say+0.009
 other+0.009
 decide+0.009
Neg logit contributions
amas-0.021
belt-0.020
aj-0.020
roller-0.020
coat-0.019
Token night
Token ID1755
Feature activation-2.39e-02

Pos logit contributions
 end+0.062
 her+0.059
 say+0.055
 decide+0.053
 other+0.050
Neg logit contributions
amas-0.148
belt-0.147
aj-0.142
roller-0.141
coat-0.138
Token she
Token ID673
Feature activation-2.99e-02

Pos logit contributions
belt+0.071
aj+0.070
amas+0.069
roller+0.068
+0.067
Neg logit contributions
 her-0.024
 end-0.020
 the-0.018
 say-0.017
 other-0.017
Token used
Token ID973
Feature activation-6.52e-04

Pos logit contributions
amas+0.078
belt+0.075
aj+0.074
roller+0.074
coat+0.072
Neg logit contributions
 end-0.039
 her-0.039
 say-0.035
 decide-0.033
 other-0.033
Token moon
Token ID8824
Feature activation-8.03e-02

Pos logit contributions
amas+0.002
belt+0.002
aj+0.002
roller+0.002
+0.002
Neg logit contributions
 her-0.000
 end-0.000
 say-0.000
 other-0.000
 decide-0.000
Tokenlight
Token ID2971
Feature activation-1.73e-01

Pos logit contributions
amas+0.209
belt+0.202
roller+0.199
aj+0.199
+0.194
Neg logit contributions
 her-0.107
 end-0.100
 say-0.091
 other-0.086
 decide-0.080
Token to
Token ID284
Feature activation+1.42e-03

Pos logit contributions
amas+0.517
roller+0.500
belt+0.496
+0.493
aj+0.476
Neg logit contributions
 her-0.189
 end-0.162
 say-0.140
 all-0.138
 the-0.127
Token help
Token ID1037
Feature activation+4.68e-04

Pos logit contributions
 end+0.002
 her+0.002
 say+0.001
 decide+0.001
 other+0.001
Neg logit contributions
amas-0.004
belt-0.004
aj-0.004
roller-0.003
coat-0.003
Token it
Token ID340
Feature activation+1.88e-02

Pos logit contributions
 end+0.000
 her+0.000
 say+0.000
 decide+0.000
 other+0.000
Neg logit contributions
amas-0.002
belt-0.002
aj-0.002
roller-0.002
coat-0.001
Token shine
Token ID18340
Feature activation-2.21e-02

Pos logit contributions
 her+0.022
 end+0.021
 say+0.019
 decide+0.018
 other+0.018
Neg logit contributions
amas-0.052
belt-0.051
aj-0.051
roller-0.050
coat-0.049
Token.
Token ID13
Feature activation-7.81e-03

Pos logit contributions
amas+0.065
roller+0.064
belt+0.063
aj+0.063
+0.061
Neg logit contributions
 her-0.025
 end-0.022
 other-0.018
 say-0.018
 the-0.018
Token to
Token ID284
Feature activation+3.78e-04

Pos logit contributions
 her+0.010
 end+0.010
 say+0.009
 decide+0.009
 other+0.009
Neg logit contributions
amas-0.019
belt-0.019
aj-0.018
roller-0.018
-0.018
Token bed
Token ID3996
Feature activation-4.99e-02

Pos logit contributions
 end+0.001
 her+0.001
 say+0.001
 decide+0.001
 other+0.001
Neg logit contributions
amas-0.001
belt-0.001
aj-0.001
roller-0.001
coat-0.001
Token with
Token ID351
Feature activation+3.58e-02

Pos logit contributions
+0.137
amas+0.136
belt+0.135
roller+0.132
ney+0.129
Neg logit contributions
 her-0.064
 end-0.050
 say-0.045
 all-0.044
 angry-0.044
Token her
Token ID607
Feature activation-2.83e-02

Pos logit contributions
 end+0.014
 her+0.011
 say+0.010
 decide+0.008
 asks+0.007
Neg logit contributions
amas-0.120
belt-0.118
aj-0.116
roller-0.115
coat-0.113
Token night
Token ID1755
Feature activation-6.90e-02

Pos logit contributions
amas+0.079
belt+0.077
roller+0.075
aj+0.075
+0.074
Neg logit contributions
 end-0.031
 her-0.030
 say-0.026
 other-0.025
 angry-0.024
Tokenlight
Token ID2971
Feature activation-1.73e-01

Pos logit contributions
belt+0.191
amas+0.183
+0.182
anas+0.181
roller+0.180
Neg logit contributions
 her-0.085
 end-0.079
 the-0.065
 other-0.065
 one-0.062
Token.
Token ID13
Feature activation-6.91e-03

Pos logit contributions
amas+0.513
belt+0.503
+0.494
roller+0.494
aj+0.481
Neg logit contributions
 her-0.188
 end-0.162
 say-0.140
 all-0.131
 one-0.119
Token The
Token ID383
Feature activation-1.78e-03

Pos logit contributions
belt+0.019
imm+0.019
 outage+0.019
amas+0.018
coat+0.018
Neg logit contributions
 her-0.009
 end-0.006
 say-0.006
 ask-0.006
 asks-0.006
Token next
Token ID1306
Feature activation+4.15e-02

Pos logit contributions
amas+0.005
belt+0.005
aj+0.004
roller+0.004
coat+0.004
Neg logit contributions
 her-0.002
 end-0.002
 say-0.002
 decide-0.002
 asks-0.002
Token day
Token ID1110
Feature activation+2.84e-02

Pos logit contributions
 end+0.049
 her+0.047
 say+0.042
 other+0.039
 decide+0.037
Neg logit contributions
amas-0.113
belt-0.111
aj-0.108
roller-0.108
-0.105
Token,
Token ID11
Feature activation+2.08e-02

Pos logit contributions
 her+0.027
 end+0.026
 say+0.022
 other+0.021
 decide+0.019
Neg logit contributions
amas-0.084
belt-0.084
aj-0.081
roller-0.080
coat-0.079
Token so
Token ID523
Feature activation-3.71e-02

Pos logit contributions
amas+0.019
belt+0.019
aj+0.019
roller+0.018
coat+0.018
Neg logit contributions
 her-0.010
 end-0.009
 say-0.009
 angry-0.008
 other-0.008
Token excited
Token ID6568
Feature activation+1.34e-02

Pos logit contributions
amas+0.098
belt+0.096
aj+0.093
roller+0.093
coat+0.091
Neg logit contributions
 her-0.045
 end-0.044
 say-0.040
 decide-0.038
 other-0.037
Token!
Token ID0
Feature activation-1.64e-02

Pos logit contributions
 her+0.018
 end+0.015
 say+0.015
 other+0.013
 the+0.013
Neg logit contributions
belt-0.037
amas-0.036
aj-0.035
roller-0.034
-0.034
Token She
Token ID1375
Feature activation+2.31e-03

Pos logit contributions
belt+0.048
 outage+0.045
amas+0.045
aj+0.045
imm+0.045
Neg logit contributions
 her-0.019
 end-0.015
 say-0.015
 asks-0.013
 ask-0.013
Token loved
Token ID6151
Feature activation-9.29e-03

Pos logit contributions
 her+0.002
 end+0.002
 say+0.002
 decide+0.002
 other+0.002
Neg logit contributions
amas-0.007
belt-0.007
roller-0.006
aj-0.006
-0.006
Token pic
Token ID8301
Feature activation-1.70e-01

Pos logit contributions
amas+0.022
belt+0.022
aj+0.021
roller+0.021
coat+0.020
Neg logit contributions
 her-0.014
 end-0.013
 say-0.012
 other-0.012
 decide-0.011
Tokenn
Token ID77
Feature activation+6.84e-03

Pos logit contributions
amas+0.346
belt+0.343
aj+0.330
roller+0.324
coat+0.311
Neg logit contributions
 end-0.314
 her-0.308
 say-0.287
 other-0.271
 decide-0.266
Tokenics
Token ID873
Feature activation-3.42e-02

Pos logit contributions
 her+0.014
 end+0.014
 say+0.013
 other+0.013
 angry+0.012
Neg logit contributions
belt-0.012
aj-0.012
amas-0.012
roller-0.012
-0.011
Token and
Token ID290
Feature activation-2.35e-03

Pos logit contributions
amas+0.102
aj+0.100
belt+0.099
roller+0.097
+0.093
Neg logit contributions
 her-0.036
 end-0.033
 say-0.027
 other-0.025
 all-0.025
Token spending
Token ID4581
Feature activation-3.71e-02

Pos logit contributions
amas+0.007
belt+0.006
roller+0.006
aj+0.006
 outage+0.006
Neg logit contributions
 her-0.003
 say-0.002
 end-0.002
 other-0.002
 decide-0.002
Token time
Token ID640
Feature activation-6.11e-03

Pos logit contributions
amas+0.092
belt+0.090
aj+0.087
roller+0.086
coat+0.083
Neg logit contributions
 her-0.054
 end-0.052
 say-0.046
 other-0.045
 decide-0.043
Token went
Token ID1816
Feature activation+7.80e-03

Pos logit contributions
 asks+0.009
 end+0.009
 angry+0.009
 say+0.009
 mad+0.009
Neg logit contributions
coat-0.010
belt-0.010
-0.010
anas-0.010
adows-0.010
Token on
Token ID319
Feature activation-2.36e-02

Pos logit contributions
 end+0.010
 her+0.009
 say+0.008
 other+0.008
 decide+0.008
Neg logit contributions
amas-0.020
belt-0.020
aj-0.020
roller-0.019
-0.019
Token adventures
Token ID17545
Feature activation-2.97e-02

Pos logit contributions
amas+0.058
belt+0.058
aj+0.055
roller+0.055
coat+0.054
Neg logit contributions
 her-0.033
 end-0.033
 say-0.029
 other-0.029
 decide-0.028
Token,
Token ID11
Feature activation+7.60e-03

Pos logit contributions
amas+0.091
aj+0.090
belt+0.089
roller+0.087
+0.083
Neg logit contributions
 her-0.028
 end-0.026
 all-0.024
 other-0.023
 the-0.022
Token had
Token ID550
Feature activation-1.63e-02

Pos logit contributions
 her+0.009
 end+0.008
 say+0.007
 other+0.007
 the+0.006
Neg logit contributions
amas-0.022
belt-0.021
aj-0.021
roller-0.021
 outage-0.020
Token pic
Token ID8301
Feature activation-1.67e-01

Pos logit contributions
belt+0.048
amas+0.047
aj+0.045
roller+0.044
 outage+0.044
Neg logit contributions
 her-0.020
 other-0.018
 end-0.017
 the-0.016
 two-0.016
Tokenn
Token ID77
Feature activation+2.06e-04

Pos logit contributions
amas+0.350
belt+0.346
aj+0.332
roller+0.324
+0.309
Neg logit contributions
 end-0.310
 her-0.295
 say-0.279
 other-0.265
 decide-0.261
Tokenics
Token ID873
Feature activation-3.91e-02

Pos logit contributions
 her+0.000
 end+0.000
 other+0.000
 both+0.000
 say+0.000
Neg logit contributions
belt-0.000
amas-0.000
aj-0.000
roller-0.000
icol-0.000
Token,
Token ID11
Feature activation-8.91e-03

Pos logit contributions
amas+0.121
belt+0.115
aj+0.114
roller+0.114
 outage+0.110
Neg logit contributions
 her-0.038
 end-0.036
 all-0.030
 say-0.029
 other-0.029
Token and
Token ID290
Feature activation-1.79e-03

Pos logit contributions
amas+0.028
belt+0.027
roller+0.027
aj+0.026
 outage+0.026
Neg logit contributions
 her-0.009
 end-0.007
 other-0.007
 say-0.007
 the-0.006
Token even
Token ID772
Feature activation-3.43e-02

Pos logit contributions
amas+0.005
belt+0.005
aj+0.004
roller+0.004
coat+0.004
Neg logit contributions
 end-0.002
 her-0.002
 say-0.002
 decide-0.002
 other-0.002
Token and
Token ID290
Feature activation-3.35e-02

Pos logit contributions
amas+0.004
belt+0.004
aj+0.004
roller+0.003
coat+0.003
Neg logit contributions
 her-0.001
 end-0.001
 say-0.001
 angry-0.001
 decide-0.001
Token nodded
Token ID14464
Feature activation+9.48e-03

Pos logit contributions
amas+0.103
belt+0.102
aj+0.099
roller+0.098
coat+0.097
Neg logit contributions
 end-0.026
 her-0.022
 say-0.019
 decide-0.018
 other-0.017
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 her+0.010
 end+0.009
 say+0.007
 angry+0.007
 decide+0.006
Neg logit contributions
amas-0.029
belt-0.028
roller-0.028
aj-0.028
-0.027
Token He
Token ID679
Feature activation+4.72e-03

Pos logit contributions
lor+0.028
 outage+0.027
aj+0.027
coat+0.027
belt+0.026
Neg logit contributions
 her-0.015
 asks-0.013
 the-0.013
 say-0.013
 two-0.013
Token loved
Token ID6151
Feature activation-8.37e-03

Pos logit contributions
 her+0.005
 end+0.005
 say+0.004
 decide+0.004
 other+0.004
Neg logit contributions
amas-0.014
belt-0.013
aj-0.013
roller-0.013
-0.013
Token pic
Token ID8301
Feature activation-1.67e-01

Pos logit contributions
amas+0.025
belt+0.025
roller+0.024
aj+0.024
coat+0.024
Neg logit contributions
 her-0.008
 end-0.007
 say-0.006
 other-0.006
 angry-0.005
Tokenn
Token ID77
Feature activation+1.12e-02

Pos logit contributions
amas+0.342
belt+0.335
aj+0.327
roller+0.319
coat+0.307
Neg logit contributions
 end-0.312
 her-0.302
 say-0.286
 other-0.268
 decide-0.265
Tokenics
Token ID873
Feature activation-3.33e-02

Pos logit contributions
 her+0.024
 end+0.023
 say+0.022
 other+0.021
 angry+0.021
Neg logit contributions
belt-0.020
amas-0.020
aj-0.019
roller-0.018
-0.018
Token and
Token ID290
Feature activation-3.02e-03

Pos logit contributions
amas+0.104
aj+0.101
roller+0.098
belt+0.097
 outage+0.094
Neg logit contributions
 her-0.032
 end-0.030
 say-0.025
 other-0.023
 all-0.023
Token couldn
Token ID3521
Feature activation-6.52e-03

Pos logit contributions
amas+0.008
belt+0.008
roller+0.008
aj+0.008
+0.008
Neg logit contributions
 her-0.004
 end-0.003
 say-0.003
 other-0.003
 decide-0.003
Token't
Token ID470
Feature activation+4.94e-02

Pos logit contributions
aj+0.018
belt+0.017
amas+0.017
coat+0.017
anas+0.016
Neg logit contributions
 end-0.008
 her-0.008
 say-0.007
 other-0.007
 the-0.006
Token girl
Token ID2576
Feature activation+1.71e-02

Pos logit contributions
 decide+0.015
 her+0.014
 end+0.014
 say+0.013
 asks+0.013
Neg logit contributions
coat-0.035
amas-0.035
belt-0.034
roller-0.033
aj-0.033
Token named
Token ID3706
Feature activation-1.74e-02

Pos logit contributions
 end+0.022
 decide+0.021
 her+0.021
 say+0.020
 other+0.019
Neg logit contributions
belt-0.040
amas-0.039
aj-0.039
coat-0.038
roller-0.038
Token Lily
Token ID20037
Feature activation+1.18e-02

Pos logit contributions
amas+0.045
aj+0.044
belt+0.044
coat+0.043
roller+0.043
Neg logit contributions
 end-0.020
 decide-0.018
 say-0.017
 her-0.017
 asks-0.016
Token who
Token ID508
Feature activation-3.07e-02

Pos logit contributions
 end+0.014
 her+0.013
 say+0.012
 decide+0.011
 other+0.011
Neg logit contributions
amas-0.031
belt-0.031
aj-0.030
roller-0.030
coat-0.029
Token loved
Token ID6151
Feature activation-2.20e-03

Pos logit contributions
amas+0.066
belt+0.065
coat+0.063
+0.062
aj+0.062
Neg logit contributions
 end-0.047
 her-0.044
 say-0.043
 decide-0.043
 asks-0.042
Token pic
Token ID8301
Feature activation-1.66e-01

Pos logit contributions
amas+0.004
belt+0.004
aj+0.004
roller+0.004
coat+0.004
Neg logit contributions
 end-0.004
 her-0.004
 say-0.004
 decide-0.003
 other-0.003
Tokenn
Token ID77
Feature activation+1.06e-02

Pos logit contributions
amas+0.319
belt+0.313
aj+0.300
roller+0.295
coat+0.286
Neg logit contributions
 end-0.322
 her-0.315
 say-0.296
 other-0.284
 decide-0.283
Tokenics
Token ID873
Feature activation-2.61e-02

Pos logit contributions
 her+0.022
 end+0.022
 say+0.020
 other+0.020
 angry+0.019
Neg logit contributions
belt-0.019
aj-0.018
amas-0.018
roller-0.017
-0.017
Token.
Token ID13
Feature activation-1.60e-02

Pos logit contributions
amas+0.072
belt+0.071
aj+0.069
roller+0.069
+0.067
Neg logit contributions
 end-0.028
 her-0.028
 say-0.024
 other-0.022
 decide-0.022
Token She
Token ID1375
Feature activation-2.60e-03

Pos logit contributions
belt+0.046
amas+0.045
aj+0.043
roller+0.042
+0.042
Neg logit contributions
 her-0.019
 end-0.018
 say-0.015
 other-0.014
 ask-0.014
Token would
Token ID561
Feature activation-3.97e-02

Pos logit contributions
amas+0.008
belt+0.007
roller+0.007
 outage+0.007
aj+0.007
Neg logit contributions
 her-0.003
 end-0.003
 other-0.002
 say-0.002
 pet-0.002
Token favourite
Token ID12507
Feature activation-5.76e-02

Pos logit contributions
amas+0.131
belt+0.128
aj+0.124
roller+0.123
coat+0.120
Neg logit contributions
 end-0.082
 her-0.079
 say-0.073
 other-0.070
 decide-0.070
Token place
Token ID1295
Feature activation-6.54e-02

Pos logit contributions
amas+0.139
belt+0.136
aj+0.132
roller+0.130
+0.127
Neg logit contributions
 end-0.090
 her-0.085
 other-0.076
 say-0.076
 decide-0.071
Token to
Token ID284
Feature activation+8.80e-03

Pos logit contributions
amas+0.208
aj+0.194
+0.191
belt+0.190
roller+0.189
Neg logit contributions
 her-0.066
 end-0.061
 say-0.051
 the-0.051
 all-0.049
Token go
Token ID467
Feature activation+5.12e-03

Pos logit contributions
 end+0.010
 her+0.010
 say+0.009
 decide+0.008
 other+0.008
Neg logit contributions
amas-0.024
belt-0.024
aj-0.023
roller-0.023
coat-0.022
Token for
Token ID329
Feature activation+2.06e-02

Pos logit contributions
 her+0.005
 end+0.005
 say+0.004
 the+0.004
 other+0.004
Neg logit contributions
amas-0.016
belt-0.015
aj-0.015
anas-0.015
-0.015
Token pic
Token ID8301
Feature activation-1.66e-01

Pos logit contributions
 end+0.020
 her+0.019
 say+0.017
 decide+0.016
 other+0.016
Neg logit contributions
amas-0.059
belt-0.058
aj-0.057
roller-0.056
coat-0.055
Tokenn
Token ID77
Feature activation+9.34e-03

Pos logit contributions
amas+0.365
belt+0.361
aj+0.346
roller+0.333
coat+0.322
Neg logit contributions
 end-0.295
 her-0.273
 say-0.258
 other-0.246
 decide-0.242
Tokenics
Token ID873
Feature activation-3.26e-02

Pos logit contributions
 her+0.011
 end+0.011
 all+0.009
 other+0.009
 both+0.009
Neg logit contributions
belt-0.026
aj-0.025
amas-0.024
icol-0.024
anas-0.024
Token.
Token ID13
Feature activation-4.82e-03

Pos logit contributions
amas+0.102
belt+0.098
aj+0.097
roller+0.096
+0.093
Neg logit contributions
 her-0.031
 end-0.029
 say-0.023
 other-0.022
 all-0.021
Token<|endoftext|>
Token ID50256
Feature activation+3.22e-02

Pos logit contributions
belt+0.014
amas+0.013
coat+0.013
aj+0.013
imm+0.012
Neg logit contributions
 her-0.005
 end-0.005
 say-0.005
 other-0.005
 the-0.004
TokenOnce
Token ID7454
Feature activation+6.21e-03

Pos logit contributions
 her+0.032
 end+0.031
 other+0.026
 the+0.025
 two+0.024
Neg logit contributions
belt-0.095
amas-0.092
aj-0.092
roller-0.089
coat-0.088
Token was
Token ID373
Feature activation-1.32e-02

Pos logit contributions
 end+0.012
 her+0.012
 say+0.011
 decide+0.011
 asks+0.011
Neg logit contributions
belt-0.016
amas-0.015
aj-0.015
coat-0.014
-0.014
Token very
Token ID845
Feature activation-2.76e-02

Pos logit contributions
belt+0.033
amas+0.032
coat+0.031
aj+0.031
+0.031
Neg logit contributions
 end-0.016
 decide-0.014
 her-0.014
 say-0.014
 asks-0.013
Token proud
Token ID6613
Feature activation+6.33e-03

Pos logit contributions
belt+0.063
amas+0.061
aj+0.061
roller+0.058
coat+0.058
Neg logit contributions
 end-0.041
 her-0.038
 decide-0.035
 say-0.034
 other-0.031
Token of
Token ID286
Feature activation-8.23e-02

Pos logit contributions
 her+0.008
 end+0.008
 say+0.007
 other+0.007
 asks+0.006
Neg logit contributions
amas-0.017
belt-0.017
roller-0.016
aj-0.016
coat-0.016
Token his
Token ID465
Feature activation-2.65e-02

Pos logit contributions
amas+0.204
belt+0.204
aj+0.194
+0.191
roller+0.189
Neg logit contributions
 end-0.097
 say-0.091
 her-0.089
 decide-0.088
 angry-0.082
Token ig
Token ID45329
Feature activation-1.66e-01

Pos logit contributions
amas+0.068
belt+0.066
aj+0.065
roller+0.064
coat+0.062
Neg logit contributions
 end-0.033
 her-0.033
 say-0.030
 decide-0.028
 other-0.027
Tokenloo
Token ID29680
Feature activation-1.34e-01

Pos logit contributions
amas+0.240
roller+0.226
+0.219
aj+0.213
coat+0.213
Neg logit contributions
 end-0.393
 her-0.358
 other-0.358
 say-0.349
 angry-0.345
Token.
Token ID13
Feature activation-1.43e-02

Pos logit contributions
amas+0.412
aj+0.411
roller+0.400
belt+0.389
+0.387
Neg logit contributions
 end-0.121
 her-0.117
 say-0.092
 other-0.092
 all-0.088
Token He
Token ID679
Feature activation+6.79e-03

Pos logit contributions
belt+0.035
aj+0.034
amas+0.034
coat+0.033
roller+0.032
Neg logit contributions
 her-0.021
 end-0.021
 other-0.020
 say-0.020
 asks-0.019
Token liked
Token ID8288
Feature activation-4.14e-03

Pos logit contributions
 end+0.010
 her+0.010
 say+0.009
 decide+0.008
 angry+0.008
Neg logit contributions
belt-0.016
amas-0.015
aj-0.015
roller-0.015
coat-0.015
Token playing
Token ID2712
Feature activation-1.97e-02

Pos logit contributions
amas+0.011
belt+0.011
aj+0.011
roller+0.011
coat+0.010
Neg logit contributions
 her-0.006
 end-0.005
 other-0.005
 the-0.005
 say-0.005
Tokenrobe
Token ID25481
Feature activation-5.21e-02

Pos logit contributions
+0.098
anas+0.097
 outage+0.094
icol+0.093
belt+0.092
Neg logit contributions
 her-0.124
 end-0.123
 all-0.116
 other-0.113
 one-0.106
Token and
Token ID290
Feature activation+6.74e-03

Pos logit contributions
aj+0.149
amas+0.148
+0.148
belt+0.145
anas+0.144
Neg logit contributions
 her-0.055
 end-0.053
 say-0.043
 all-0.041
 other-0.040
Token wear
Token ID5806
Feature activation-3.26e-03

Pos logit contributions
 end+0.007
 her+0.007
 say+0.006
 decide+0.006
 other+0.006
Neg logit contributions
amas-0.019
belt-0.019
aj-0.018
roller-0.018
coat-0.018
Token her
Token ID607
Feature activation-2.85e-02

Pos logit contributions
amas+0.011
belt+0.011
aj+0.010
roller+0.010
coat+0.010
Neg logit contributions
 her-0.002
 end-0.002
 say-0.001
 other-0.001
 decide-0.001
Token p
Token ID279
Feature activation-6.34e-02

Pos logit contributions
amas+0.073
belt+0.071
roller+0.070
anas+0.069
+0.068
Neg logit contributions
 her-0.039
 end-0.038
 other-0.034
 say-0.033
 angry-0.031
Tokenaj
Token ID1228
Feature activation-1.66e-01

Pos logit contributions
amas+0.041
 outage+0.039
anas+0.037
fire+0.037
+0.036
Neg logit contributions
 her-0.186
 end-0.178
 say-0.173
 me-0.169
 says-0.169
Tokenamas
Token ID17485
Feature activation-9.46e-02

Pos logit contributions
 outage+0.192
belt+0.147
onna+0.141
 steep+0.139
ney+0.138
Neg logit contributions
 her-0.455
 me-0.434
 both-0.426
 all-0.422
 end-0.416
Token instead
Token ID2427
Feature activation+1.75e-02

Pos logit contributions
amas+0.302
+0.297
aj+0.296
roller+0.291
anas+0.289
Neg logit contributions
 her-0.077
 end-0.067
 say-0.054
 all-0.052
 one-0.044
Token.
Token ID13
Feature activation-1.39e-03

Pos logit contributions
 her+0.015
 end+0.013
 other+0.010
 say+0.009
 one+0.008
Neg logit contributions
amas-0.056
belt-0.055
aj-0.055
icol-0.055
anas-0.054
Token But
Token ID887
Feature activation-1.43e-02

Pos logit contributions
belt+0.004
amas+0.004
aj+0.004
roller+0.004
+0.003
Neg logit contributions
 her-0.002
 end-0.002
 say-0.001
 other-0.001
 decide-0.001
Token Lily
Token ID20037
Feature activation+1.62e-02

Pos logit contributions
belt+0.035
amas+0.034
aj+0.033
coat+0.033
roller+0.032
Neg logit contributions
 end-0.016
 decide-0.015
 say-0.015
 asks-0.015
 angry-0.014
Token liked
Token ID8288
Feature activation-1.78e-03

Pos logit contributions
 end+0.001
 angry+0.001
 her+0.001
 asks+0.001
 mad+0.001
Neg logit contributions
coat-0.001
adows-0.001
amas-0.001
anas-0.001
belt-0.001
Token to
Token ID284
Feature activation-1.36e-02

Pos logit contributions
amas+0.004
belt+0.004
aj+0.004
roller+0.004
coat+0.004
Neg logit contributions
 her-0.003
 end-0.003
 say-0.002
 other-0.002
 decide-0.002
Token play
Token ID711
Feature activation+1.96e-02

Pos logit contributions
amas+0.035
belt+0.035
coat+0.034
roller+0.033
aj+0.033
Neg logit contributions
 end-0.015
 her-0.014
 angry-0.013
 asks-0.012
 say-0.012
Token and
Token ID290
Feature activation+1.17e-02

Pos logit contributions
 end+0.029
 her+0.028
 say+0.026
 other+0.025
 decide+0.025
Neg logit contributions
amas-0.047
belt-0.046
aj-0.045
roller-0.044
coat-0.043
Token have
Token ID423
Feature activation+3.62e-02

Pos logit contributions
 end+0.014
 her+0.013
 angry+0.011
 asks+0.011
 say+0.011
Neg logit contributions
belt-0.029
amas-0.028
coat-0.028
-0.028
roller-0.028
Token pic
Token ID8301
Feature activation-1.65e-01

Pos logit contributions
 her+0.049
 end+0.047
 other+0.043
 say+0.042
 decide+0.040
Neg logit contributions
belt-0.096
amas-0.095
aj-0.090
roller-0.090
coat-0.086
Tokenn
Token ID77
Feature activation+1.09e-02

Pos logit contributions
amas+0.337
belt+0.331
aj+0.318
roller+0.309
coat+0.298
Neg logit contributions
 end-0.311
 her-0.298
 say-0.282
 other-0.269
 decide-0.267
Tokenics
Token ID873
Feature activation-3.19e-02

Pos logit contributions
 her+0.023
 end+0.022
 other+0.021
 say+0.021
 decide+0.020
Neg logit contributions
belt-0.019
amas-0.019
aj-0.018
roller-0.018
-0.017
Token.
Token ID13
Feature activation-9.69e-03

Pos logit contributions
amas+0.085
belt+0.082
aj+0.081
roller+0.080
coat+0.077
Neg logit contributions
 end-0.039
 her-0.038
 say-0.034
 other-0.032
 decide-0.032
Token One
Token ID1881
Feature activation+1.42e-02

Pos logit contributions
amas+0.022
belt+0.021
aj+0.021
roller+0.020
coat+0.020
Neg logit contributions
 end-0.015
 her-0.015
 say-0.014
 decide-0.013
 other-0.013
Token day
Token ID1110
Feature activation+1.81e-02

Pos logit contributions
 end+0.018
 her+0.018
 say+0.016
 decide+0.015
 other+0.015
Neg logit contributions
amas-0.036
belt-0.036
aj-0.035
roller-0.034
coat-0.034
Token was
Token ID373
Feature activation-9.98e-03

Pos logit contributions
 her+0.003
 end+0.003
 say+0.003
 other+0.003
 decide+0.003
Neg logit contributions
amas-0.007
belt-0.007
aj-0.007
roller-0.007
-0.007
Token so
Token ID523
Feature activation-3.60e-02

Pos logit contributions
amas+0.027
belt+0.026
aj+0.026
roller+0.026
+0.025
Neg logit contributions
 her-0.013
 end-0.012
 angry-0.011
 say-0.011
 other-0.010
Token impressed
Token ID12617
Feature activation+3.34e-02

Pos logit contributions
amas+0.100
belt+0.098
roller+0.095
aj+0.094
coat+0.092
Neg logit contributions
 her-0.041
 end-0.040
 say-0.036
 angry-0.035
 decide-0.034
Token with
Token ID351
Feature activation+2.63e-02

Pos logit contributions
 end+0.046
 her+0.042
 say+0.040
 decide+0.040
 other+0.038
Neg logit contributions
amas-0.081
belt-0.080
aj-0.077
roller-0.076
coat-0.076
Token her
Token ID607
Feature activation-2.38e-02

Pos logit contributions
 end+0.019
 say+0.017
 decide+0.016
 asks+0.015
 her+0.013
Neg logit contributions
amas-0.076
belt-0.076
-0.073
roller-0.073
aj-0.072
Token ig
Token ID45329
Feature activation-1.64e-01

Pos logit contributions
amas+0.062
belt+0.060
roller+0.059
coat+0.058
+0.057
Neg logit contributions
 end-0.032
 her-0.030
 other-0.029
 say-0.028
 angry-0.027
Tokenloo
Token ID29680
Feature activation-1.26e-01

Pos logit contributions
amas+0.257
roller+0.254
+0.254
icol+0.244
anas+0.236
Neg logit contributions
 end-0.351
 her-0.307
 other-0.304
 pe-0.297
 say-0.296
Token at
Token ID379
Feature activation+2.66e-03

Pos logit contributions
amas+0.362
aj+0.355
belt+0.344
roller+0.343
+0.340
Neg logit contributions
 her-0.135
 end-0.127
 say-0.106
 other-0.099
 angry-0.097
Token the
Token ID262
Feature activation-1.65e-02

Pos logit contributions
 end+0.002
 her+0.002
 say+0.001
 other+0.001
 decide+0.001
Neg logit contributions
amas-0.009
belt-0.009
aj-0.008
roller-0.008
coat-0.008
Token end
Token ID886
Feature activation+8.79e-02

Pos logit contributions
amas+0.046
belt+0.046
aj+0.044
roller+0.044
+0.043
Neg logit contributions
 end-0.019
 her-0.017
 say-0.016
 other-0.015
 decide-0.014
Token!
Token ID0
Feature activation-1.08e-02

Pos logit contributions
 her+0.098
 end+0.080
 other+0.070
 the+0.068
 say+0.064
Neg logit contributions
amas-0.267
aj-0.255
belt-0.255
anas-0.252
acket-0.247
Token pond
Token ID16723
Feature activation+3.00e-02

Pos logit contributions
 her+0.020
 end+0.018
 say+0.017
 asks+0.017
 decide+0.017
Neg logit contributions
amas-0.025
belt-0.025
roller-0.025
aj-0.024
coat-0.023
Token,
Token ID11
Feature activation+7.29e-03

Pos logit contributions
 her+0.029
 end+0.028
 say+0.023
 other+0.022
 all+0.021
Neg logit contributions
amas-0.091
belt-0.089
aj-0.088
-0.087
roller-0.086
Token with
Token ID351
Feature activation+2.25e-02

Pos logit contributions
 her+0.010
 end+0.009
 say+0.009
 angry+0.008
 other+0.008
Neg logit contributions
amas-0.019
belt-0.018
aj-0.018
roller-0.018
-0.017
Token the
Token ID262
Feature activation-3.32e-02

Pos logit contributions
 end+0.022
 her+0.019
 say+0.018
 decide+0.018
 asks+0.016
Neg logit contributions
amas-0.063
belt-0.062
roller-0.061
aj-0.061
coat-0.059
Token moon
Token ID8824
Feature activation-6.35e-02

Pos logit contributions
belt+0.081
amas+0.080
roller+0.080
aj+0.078
coat+0.077
Neg logit contributions
 her-0.040
 end-0.038
 asks-0.034
 decide-0.033
 say-0.032
Tokenlight
Token ID2971
Feature activation-1.64e-01

Pos logit contributions
amas+0.173
belt+0.167
aj+0.164
roller+0.161
+0.160
Neg logit contributions
 her-0.075
 end-0.074
 say-0.067
 other-0.060
 angry-0.059
Token dancing
Token ID15360
Feature activation-3.89e-02

Pos logit contributions
amas+0.463
belt+0.450
aj+0.443
+0.438
roller+0.435
Neg logit contributions
 her-0.196
 end-0.181
 say-0.163
 angry-0.146
 all-0.145
Token on
Token ID319
Feature activation-2.84e-02

Pos logit contributions
amas+0.119
aj+0.116
belt+0.115
+0.112
roller+0.112
Neg logit contributions
 her-0.038
 end-0.032
 all-0.028
 the-0.026
 say-0.026
Token its
Token ID663
Feature activation-2.20e-02

Pos logit contributions
amas+0.091
belt+0.089
aj+0.087
roller+0.087
coat+0.085
Neg logit contributions
 end-0.017
 her-0.015
 say-0.014
 decide-0.012
 other-0.011
Token feathers
Token ID27400
Feature activation-2.96e-02

Pos logit contributions
amas+0.056
belt+0.054
aj+0.053
+0.052
coat+0.052
Neg logit contributions
 end-0.034
 her-0.031
 other-0.030
 say-0.029
 pet-0.026
Token.
Token ID13
Feature activation-5.51e-03

Pos logit contributions
amas+0.090
belt+0.087
aj+0.086
roller+0.084
+0.084
Neg logit contributions
 her-0.028
 end-0.026
 say-0.024
 other-0.021
 all-0.020
Token was
Token ID373
Feature activation-1.97e-02

Pos logit contributions
belt+0.108
amas+0.108
+0.102
aj+0.100
coat+0.099
Neg logit contributions
 end-0.093
 her-0.092
 say-0.090
 decide-0.086
 ask-0.082
Token still
Token ID991
Feature activation-1.71e-02

Pos logit contributions
belt+0.039
amas+0.038
+0.037
coat+0.036
aj+0.036
Neg logit contributions
 end-0.029
 decide-0.028
 say-0.028
 says-0.026
 asks-0.025
Token wearing
Token ID5762
Feature activation-8.01e-03

Pos logit contributions
belt+0.042
amas+0.042
aj+0.041
coat+0.040
roller+0.040
Neg logit contributions
 end-0.022
 decide-0.019
 say-0.019
 her-0.019
 asks-0.018
Token her
Token ID607
Feature activation-2.04e-02

Pos logit contributions
amas+0.025
belt+0.024
aj+0.024
roller+0.023
coat+0.023
Neg logit contributions
 end-0.006
 say-0.005
 her-0.004
 decide-0.004
 asks-0.004
Token p
Token ID279
Feature activation-6.08e-02

Pos logit contributions
amas+0.053
belt+0.052
roller+0.050
aj+0.050
+0.049
Neg logit contributions
 end-0.026
 her-0.026
 say-0.023
 other-0.022
 angry-0.021
Tokenaj
Token ID1228
Feature activation-1.64e-01

Pos logit contributions
amas+0.039
belt+0.035
 outage+0.028
+0.027
roller+0.025
Neg logit contributions
 her-0.187
 end-0.184
 say-0.178
 pet-0.173
 angry-0.173
Tokenamas
Token ID17485
Feature activation-9.57e-02

Pos logit contributions
 outage+0.134
belt+0.121
onna+0.092
ney+0.083
run+0.082
Neg logit contributions
 her-0.491
 me-0.472
 end-0.472
 two-0.453
 both-0.451
Token.
Token ID13
Feature activation-5.50e-03

Pos logit contributions
amas+0.303
aj+0.301
+0.292
roller+0.292
anas+0.289
Neg logit contributions
 her-0.083
 end-0.071
 all-0.054
 say-0.052
 angry-0.047
Token
Token ID220
Feature activation+7.57e-03

Pos logit contributions
belt+0.014
amas+0.014
aj+0.013
roller+0.013
coat+0.013
Neg logit contributions
 her-0.007
 end-0.007
 say-0.006
 other-0.006
 asks-0.006
Token
Token ID198
Feature activation-1.69e-02

Pos logit contributions
 her+0.009
 end+0.007
 say+0.007
 other+0.007
 both+0.006
Neg logit contributions
belt-0.021
amas-0.021
aj-0.020
coat-0.020
roller-0.020
Token
Token ID198
Feature activation-1.36e-02

Pos logit contributions
belt+0.044
amas+0.044
aj+0.042
roller+0.041
coat+0.041
Neg logit contributions
 her-0.022
 end-0.021
 say-0.019
 other-0.019
 decide-0.017
Token roof
Token ID9753
Feature activation-8.18e-02

Pos logit contributions
 end+0.001
 her+0.001
 other+0.001
 say+0.001
 decide+0.001
Neg logit contributions
amas-0.002
belt-0.002
aj-0.002
roller-0.002
-0.002
Token.
Token ID13
Feature activation+2.23e-03

Pos logit contributions
amas+0.243
aj+0.237
belt+0.233
roller+0.231
+0.226
Neg logit contributions
 her-0.088
 end-0.086
 other-0.070
 say-0.069
 all-0.068
Token They
Token ID1119
Feature activation+9.41e-03

Pos logit contributions
 her+0.003
 end+0.003
 other+0.003
 say+0.003
 decide+0.003
Neg logit contributions
belt-0.006
aj-0.005
amas-0.005
coat-0.005
roller-0.005
Token would
Token ID561
Feature activation-3.28e-02

Pos logit contributions
 end+0.015
 angry+0.015
 her+0.014
 asks+0.014
 mad+0.014
Neg logit contributions
anas-0.016
coat-0.016
belt-0.015
ney-0.015
adows-0.015
Token have
Token ID423
Feature activation+3.22e-02

Pos logit contributions
amas+0.074
belt+0.072
roller+0.070
coat+0.070
anas+0.069
Neg logit contributions
 end-0.044
 her-0.043
 angry-0.040
 other-0.039
 asks-0.039
Token pic
Token ID8301
Feature activation-1.63e-01

Pos logit contributions
 her+0.036
 end+0.033
 other+0.031
 say+0.030
 the+0.027
Neg logit contributions
belt-0.094
amas-0.093
aj-0.089
roller-0.089
coat-0.087
Tokenn
Token ID77
Feature activation+1.62e-02

Pos logit contributions
amas+0.330
belt+0.328
aj+0.312
roller+0.306
coat+0.293
Neg logit contributions
 end-0.308
 her-0.297
 say-0.281
 other-0.265
 decide-0.265
Tokenics
Token ID873
Feature activation-3.07e-02

Pos logit contributions
 her+0.034
 end+0.032
 say+0.030
 other+0.030
 both+0.029
Neg logit contributions
belt-0.030
amas-0.029
aj-0.028
roller-0.028
coat-0.027
Token and
Token ID290
Feature activation-2.95e-03

Pos logit contributions
amas+0.090
aj+0.086
belt+0.085
roller+0.085
 outage+0.082
Neg logit contributions
 her-0.034
 end-0.033
 all-0.028
 say-0.028
 other-0.027
Token play
Token ID711
Feature activation+2.68e-02

Pos logit contributions
amas+0.008
belt+0.008
aj+0.007
roller+0.007
coat+0.007
Neg logit contributions
 end-0.003
 her-0.003
 say-0.003
 decide-0.003
 other-0.003
Token games
Token ID1830
Feature activation-2.94e-02

Pos logit contributions
 her+0.038
 end+0.037
 other+0.033
 say+0.033
 decide+0.032
Neg logit contributions
belt-0.067
amas-0.066
aj-0.066
roller-0.066
 outage-0.062
Token a
Token ID257
Feature activation+1.26e-02

Pos logit contributions
 end+0.021
 say+0.019
 her+0.018
 decide+0.017
 asks+0.016
Neg logit contributions
amas-0.079
belt-0.077
aj-0.076
coat-0.075
-0.075
Token family
Token ID1641
Feature activation-1.42e-02

Pos logit contributions
 her+0.016
 end+0.016
 decide+0.015
 say+0.015
 asks+0.014
Neg logit contributions
amas-0.029
belt-0.029
aj-0.029
coat-0.028
roller-0.028
Token who
Token ID508
Feature activation-1.94e-02

Pos logit contributions
amas+0.035
belt+0.034
roller+0.033
aj+0.033
coat+0.032
Neg logit contributions
 end-0.018
 her-0.017
 say-0.016
 decide-0.016
 angry-0.015
Token really
Token ID1107
Feature activation+4.12e-02

Pos logit contributions
amas+0.049
belt+0.048
aj+0.046
roller+0.046
coat+0.045
Neg logit contributions
 end-0.026
 her-0.026
 say-0.024
 decide-0.022
 other-0.022
Token enjoyed
Token ID8359
Feature activation-2.60e-02

Pos logit contributions
 end+0.060
 her+0.056
 say+0.051
 decide+0.049
 asks+0.048
Neg logit contributions
belt-0.096
amas-0.095
aj-0.091
coat-0.091
roller-0.090
Token pic
Token ID8301
Feature activation-1.63e-01

Pos logit contributions
belt+0.065
amas+0.065
roller+0.063
aj+0.062
coat+0.060
Neg logit contributions
 her-0.037
 end-0.035
 say-0.032
 other-0.031
 angry-0.030
Tokenn
Token ID77
Feature activation+1.75e-02

Pos logit contributions
amas+0.327
belt+0.323
aj+0.311
roller+0.305
coat+0.295
Neg logit contributions
 end-0.307
 her-0.300
 say-0.281
 other-0.269
 decide-0.264
Tokenics
Token ID873
Feature activation-2.88e-02

Pos logit contributions
 her+0.036
 end+0.035
 say+0.032
 other+0.032
 decide+0.031
Neg logit contributions
belt-0.032
aj-0.031
amas-0.031
roller-0.029
coat-0.029
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
amas+0.088
belt+0.086
roller+0.085
aj+0.085
+0.082
Neg logit contributions
 her-0.027
 end-0.025
 say-0.021
 other-0.020
 all-0.019
Token They
Token ID1119
Feature activation-9.68e-05

Pos logit contributions
belt+0.028
amas+0.028
aj+0.027
lor+0.025
imm+0.025
Neg logit contributions
 her-0.015
 end-0.013
 say-0.013
 ask-0.012
 the-0.012
Token loved
Token ID6151
Feature activation-1.03e-02

Pos logit contributions
belt+0.000
amas+0.000
coat+0.000
aj+0.000
+0.000
Neg logit contributions
 end-0.000
 her-0.000
 say-0.000
 angry-0.000
 decide-0.000
Token even
Token ID772
Feature activation-3.27e-02

Pos logit contributions
 end+0.069
 her+0.067
 say+0.061
 decide+0.057
 other+0.057
Neg logit contributions
amas-0.149
belt-0.146
aj-0.142
roller-0.141
coat-0.138
Token showed
Token ID3751
Feature activation-1.11e-02

Pos logit contributions
belt+0.080
aj+0.078
amas+0.077
coat+0.076
roller+0.075
Neg logit contributions
 end-0.042
 her-0.038
 decide-0.036
 say-0.036
 angry-0.033
Token the
Token ID262
Feature activation-4.94e-02

Pos logit contributions
amas+0.031
belt+0.031
aj+0.030
roller+0.030
+0.029
Neg logit contributions
 end-0.010
 her-0.009
 say-0.008
 decide-0.008
 angry-0.007
Token rabbit
Token ID22746
Feature activation+8.92e-03

Pos logit contributions
amas+0.127
belt+0.125
aj+0.122
roller+0.120
coat+0.118
Neg logit contributions
 end-0.061
 her-0.060
 say-0.054
 decide-0.052
 other-0.049
Token his
Token ID465
Feature activation-4.28e-02

Pos logit contributions
 end+0.010
 her+0.010
 say+0.009
 decide+0.008
 other+0.008
Neg logit contributions
amas-0.024
belt-0.024
aj-0.023
roller-0.023
coat-0.022
Token ig
Token ID45329
Feature activation-1.63e-01

Pos logit contributions
amas+0.109
belt+0.105
aj+0.101
roller+0.101
coat+0.099
Neg logit contributions
 end-0.058
 her-0.056
 say-0.052
 other-0.051
 angry-0.048
Tokenloo
Token ID29680
Feature activation-1.25e-01

Pos logit contributions
amas+0.254
+0.241
roller+0.236
icol+0.234
anas+0.234
Neg logit contributions
 end-0.360
 her-0.315
 other-0.314
 all-0.305
 pe-0.302
Token,
Token ID11
Feature activation-8.88e-03

Pos logit contributions
amas+0.360
aj+0.356
+0.337
roller+0.337
belt+0.330
Neg logit contributions
 her-0.140
 end-0.136
 all-0.113
 say-0.109
 the-0.107
Token which
Token ID543
Feature activation-2.08e-02

Pos logit contributions
amas+0.025
belt+0.024
aj+0.023
roller+0.023
+0.023
Neg logit contributions
 her-0.011
 end-0.010
 say-0.009
 other-0.008
 the-0.008
Token the
Token ID262
Feature activation+4.36e-03

Pos logit contributions
amas+0.058
aj+0.056
belt+0.056
roller+0.054
coat+0.053
Neg logit contributions
 her-0.025
 end-0.024
 say-0.022
 other-0.021
 angry-0.019
Token rabbit
Token ID22746
Feature activation+4.45e-02

Pos logit contributions
 her+0.005
 end+0.005
 say+0.004
 decide+0.004
 asks+0.004
Neg logit contributions
amas-0.011
belt-0.011
aj-0.011
roller-0.011
coat-0.011
Token hair
Token ID4190
Feature activation+2.24e-02

Pos logit contributions
 end+0.021
 her+0.020
 say+0.019
 decide+0.018
 other+0.018
Neg logit contributions
amas-0.033
belt-0.032
aj-0.032
roller-0.031
coat-0.030
Token now
Token ID783
Feature activation-1.12e-02

Pos logit contributions
 her+0.025
 end+0.023
 say+0.019
 all+0.018
 other+0.018
Neg logit contributions
-0.062
 outage-0.061
amas-0.061
anas-0.061
roller-0.060
Token."
Token ID526
Feature activation+1.62e-04

Pos logit contributions
belt+0.029
roller+0.028
amas+0.027
aj+0.027
+0.027
Neg logit contributions
 her-0.016
 end-0.015
 say-0.014
 other-0.013
 all-0.013
Token Lily
Token ID20037
Feature activation+2.31e-02

Pos logit contributions
 her+0.000
 end+0.000
 say+0.000
 other+0.000
 decide+0.000
Neg logit contributions
belt-0.000
amas-0.000
aj-0.000
roller-0.000
-0.000
Token sat
Token ID3332
Feature activation-2.30e-02

Pos logit contributions
 her+0.023
 end+0.022
 say+0.020
 other+0.018
 decide+0.017
Neg logit contributions
belt-0.066
amas-0.066
roller-0.063
aj-0.062
 outage-0.062
Token up
Token ID510
Feature activation+8.03e-02

Pos logit contributions
amas+0.059
belt+0.059
aj+0.056
coat+0.054
roller+0.054
Neg logit contributions
 end-0.029
 say-0.026
 her-0.026
 decide-0.025
 other-0.024
Token and
Token ID290
Feature activation+4.92e-03

Pos logit contributions
 end+0.091
 her+0.088
 say+0.080
 decide+0.075
 other+0.074
Neg logit contributions
amas-0.216
belt-0.212
aj-0.207
roller-0.205
coat-0.201
Token her
Token ID607
Feature activation+9.16e-03

Pos logit contributions
 end+0.006
 her+0.005
 say+0.005
 decide+0.005
 other+0.005
Neg logit contributions
amas-0.013
belt-0.013
aj-0.013
roller-0.013
coat-0.012
Token mom
Token ID1995
Feature activation-1.38e-02

Pos logit contributions
 end+0.009
 her+0.009
 say+0.008
 other+0.007
 decide+0.007
Neg logit contributions
amas-0.026
belt-0.026
aj-0.025
roller-0.025
-0.024
Token poured
Token ID19036
Feature activation-1.83e-02

Pos logit contributions
amas+0.040
belt+0.040
roller+0.038
 outage+0.038
aj+0.038
Neg logit contributions
 her-0.015
 end-0.013
 say-0.012
 other-0.011
 pet-0.011
Token water
Token ID1660
Feature activation+3.77e-03

Pos logit contributions
amas+0.055
belt+0.053
aj+0.052
coat+0.052
roller+0.052
Neg logit contributions
 end-0.013
 say-0.011
 decide-0.011
 her-0.010
 asks-0.010
Token.
Token ID13
Feature activation-5.10e-03

Pos logit contributions
amas+0.015
belt+0.015
aj+0.015
coat+0.014
roller+0.014
Neg logit contributions
 end-0.008
 her-0.007
 decide-0.006
 say-0.006
 other-0.006
Token He
Token ID679
Feature activation+1.21e-02

Pos logit contributions
belt+0.012
 outage+0.011
coat+0.011
amas+0.011
+0.011
Neg logit contributions
 her-0.008
 say-0.007
 end-0.007
 asks-0.007
 says-0.007
Token decided
Token ID3066
Feature activation-1.30e-02

Pos logit contributions
 end+0.016
 her+0.015
 say+0.014
 decide+0.013
 other+0.013
Neg logit contributions
belt-0.026
aj-0.026
amas-0.025
roller-0.025
anas-0.024
Token to
Token ID284
Feature activation-1.93e-02

Pos logit contributions
amas+0.030
belt+0.029
aj+0.028
roller+0.028
coat+0.027
Neg logit contributions
 her-0.020
 end-0.020
 say-0.019
 decide-0.018
 other-0.018
Token clean
Token ID3424
Feature activation-6.06e-02

Pos logit contributions
amas+0.051
belt+0.050
roller+0.049
aj+0.048
coat+0.048
Neg logit contributions
 end-0.021
 her-0.020
 angry-0.018
 say-0.017
 other-0.017
Token up
Token ID510
Feature activation+8.55e-02

Pos logit contributions
amas+0.175
belt+0.171
aj+0.166
roller+0.165
coat+0.162
Neg logit contributions
 end-0.058
 her-0.051
 say-0.048
 decide-0.046
 other-0.043
Token the
Token ID262
Feature activation+7.31e-03

Pos logit contributions
 end+0.073
 her+0.071
 say+0.061
 decide+0.055
 other+0.055
Neg logit contributions
amas-0.255
belt-0.251
aj-0.245
roller-0.243
coat-0.239
Token mess
Token ID2085
Feature activation+2.36e-02

Pos logit contributions
 end+0.012
 her+0.012
 say+0.011
 decide+0.011
 other+0.010
Neg logit contributions
amas-0.016
belt-0.016
aj-0.015
roller-0.015
coat-0.015
Token and
Token ID290
Feature activation+2.39e-02

Pos logit contributions
 her+0.025
 end+0.023
 say+0.019
 other+0.018
 angry+0.017
Neg logit contributions
amas-0.069
aj-0.068
roller-0.068
belt-0.067
-0.066
Token then
Token ID788
Feature activation+2.91e-02

Pos logit contributions
 end+0.027
 her+0.026
 say+0.024
 decide+0.022
 other+0.022
Neg logit contributions
amas-0.064
belt-0.063
aj-0.061
roller-0.060
coat-0.059
Token rock
Token ID3881
Feature activation+3.82e-02

Pos logit contributions
 end+0.036
 her+0.033
 say+0.031
 other+0.029
 decide+0.029
Neg logit contributions
belt-0.075
amas-0.074
aj-0.072
roller-0.070
coat-0.069
Token decided
Token ID3066
Feature activation-2.20e-03

Pos logit contributions
 end+0.025
 her+0.022
 asks+0.020
 flower+0.019
els+0.019
Neg logit contributions
belt-0.038
coat-0.037
amas-0.036
roller-0.036
anas-0.036
Token to
Token ID284
Feature activation-1.19e-02

Pos logit contributions
amas+0.005
belt+0.005
aj+0.005
roller+0.005
coat+0.005
Neg logit contributions
 her-0.003
 end-0.003
 say-0.003
 decide-0.003
 other-0.003
Token help
Token ID1037
Feature activation+7.76e-05

Pos logit contributions
amas+0.030
belt+0.029
roller+0.029
coat+0.028
aj+0.028
Neg logit contributions
 end-0.013
 her-0.013
 angry-0.012
 asks-0.011
 other-0.011
Token her
Token ID607
Feature activation+2.15e-02

Pos logit contributions
 end+0.000
 her+0.000
 say+0.000
 decide+0.000
 other+0.000
Neg logit contributions
amas-0.000
belt-0.000
roller-0.000
aj-0.000
coat-0.000
Token clean
Token ID3424
Feature activation-6.09e-02

Pos logit contributions
 end+0.029
 her+0.028
 say+0.026
 decide+0.025
 other+0.024
Neg logit contributions
amas-0.054
belt-0.053
aj-0.052
roller-0.051
coat-0.050
Token up
Token ID510
Feature activation+9.18e-02

Pos logit contributions
amas+0.176
belt+0.172
aj+0.167
roller+0.166
coat+0.163
Neg logit contributions
 end-0.057
 her-0.052
 say-0.049
 decide-0.046
 other-0.043
Token the
Token ID262
Feature activation+1.19e-02

Pos logit contributions
 her+0.074
 end+0.071
 say+0.056
 other+0.054
 angry+0.049
Neg logit contributions
amas-0.293
belt-0.284
aj-0.280
roller-0.278
-0.275
Token mess
Token ID2085
Feature activation+2.59e-02

Pos logit contributions
 end+0.020
 her+0.020
 say+0.019
 decide+0.018
 other+0.018
Neg logit contributions
amas-0.026
belt-0.025
aj-0.024
roller-0.024
coat-0.023
Token.
Token ID13
Feature activation+2.28e-03

Pos logit contributions
 her+0.027
 end+0.026
 say+0.023
 other+0.021
 angry+0.020
Neg logit contributions
amas-0.074
belt-0.073
aj-0.072
roller-0.071
-0.070
Token They
Token ID1119
Feature activation+2.17e-02

Pos logit contributions
 her+0.003
 asks+0.002
 says+0.002
 say+0.002
 ask+0.002
Neg logit contributions
imm-0.007
belt-0.007
amas-0.006
adows-0.006
coat-0.006
Token learned
Token ID4499
Feature activation+2.76e-02

Pos logit contributions
 end+0.030
 her+0.029
 say+0.026
 other+0.025
 asks+0.025
Neg logit contributions
belt-0.050
amas-0.049
aj-0.048
coat-0.048
roller-0.047
Tokenily
Token ID813
Feature activation+2.58e-02

Pos logit contributions
 her+0.049
 end+0.048
 pet+0.048
 say+0.047
 asks+0.046
Neg logit contributions
aj-0.094
amas-0.091
belt-0.091
and-0.090
 outage-0.087
Token nodded
Token ID14464
Feature activation+4.81e-03

Pos logit contributions
 end+0.032
 her+0.031
 say+0.027
 other+0.027
 decide+0.027
Neg logit contributions
amas-0.065
aj-0.063
belt-0.062
coat-0.061
roller-0.061
Token,
Token ID11
Feature activation+3.34e-03

Pos logit contributions
 end+0.006
 her+0.005
 say+0.005
 other+0.005
 asks+0.005
Neg logit contributions
amas-0.012
belt-0.012
aj-0.012
roller-0.011
coat-0.011
Token and
Token ID290
Feature activation+1.60e-03

Pos logit contributions
 her+0.005
 end+0.005
 say+0.004
 angry+0.004
 other+0.004
Neg logit contributions
amas-0.008
belt-0.008
aj-0.008
roller-0.008
-0.008
Token climbed
Token ID19952
Feature activation+1.44e-02

Pos logit contributions
 end+0.002
 her+0.001
 decide+0.001
 say+0.001
 other+0.001
Neg logit contributions
belt-0.004
amas-0.004
aj-0.004
coat-0.004
roller-0.004
Token up
Token ID510
Feature activation+7.58e-02

Pos logit contributions
 end+0.018
 her+0.016
 say+0.016
 decide+0.015
 other+0.014
Neg logit contributions
amas-0.037
belt-0.037
aj-0.035
roller-0.035
coat-0.035
Token the
Token ID262
Feature activation+1.64e-02

Pos logit contributions
 her+0.064
 end+0.061
 say+0.051
 other+0.046
 angry+0.044
Neg logit contributions
amas-0.232
belt-0.227
aj-0.224
roller-0.219
-0.216
Token ladder
Token ID18002
Feature activation-1.59e-02

Pos logit contributions
 end+0.033
 her+0.032
 say+0.031
 other+0.029
 decide+0.029
Neg logit contributions
amas-0.030
belt-0.029
aj-0.028
roller-0.028
coat-0.027
Token.
Token ID13
Feature activation-2.45e-03

Pos logit contributions
amas+0.049
aj+0.048
+0.047
anas+0.046
belt+0.046
Neg logit contributions
 her-0.015
 end-0.014
 say-0.011
 other-0.011
 one-0.010
Token She
Token ID1375
Feature activation+1.50e-02

Pos logit contributions
belt+0.007
amas+0.007
aj+0.007
imm+0.006
 outage+0.006
Neg logit contributions
 her-0.003
 end-0.002
 say-0.002
 asks-0.002
 ask-0.002
Token felt
Token ID2936
Feature activation+1.84e-03

Pos logit contributions
 end+0.017
 her+0.017
 say+0.015
 decide+0.014
 other+0.014
Neg logit contributions
amas-0.041
belt-0.040
aj-0.039
roller-0.039
coat-0.038
Token hug
Token ID16225
Feature activation+1.97e-02

Pos logit contributions
amas+0.108
belt+0.106
aj+0.104
roller+0.102
coat+0.100
Neg logit contributions
 end-0.044
 her-0.043
 say-0.039
 decide-0.036
 other-0.036
Token and
Token ID290
Feature activation-5.67e-03

Pos logit contributions
 her+0.020
 end+0.017
 say+0.015
 the+0.013
 other+0.012
Neg logit contributions
amas-0.060
aj-0.059
roller-0.058
belt-0.058
-0.057
Token told
Token ID1297
Feature activation-3.16e-02

Pos logit contributions
amas+0.016
belt+0.016
aj+0.016
roller+0.016
coat+0.015
Neg logit contributions
 end-0.005
 her-0.005
 say-0.004
 decide-0.004
 other-0.004
Token her
Token ID607
Feature activation-4.79e-02

Pos logit contributions
amas+0.104
belt+0.102
aj+0.100
roller+0.100
+0.098
Neg logit contributions
 end-0.012
 decide-0.008
 say-0.006
 asks-0.005
 angry-0.004
Token how
Token ID703
Feature activation-1.09e-04

Pos logit contributions
amas+0.137
belt+0.134
aj+0.130
roller+0.130
coat+0.127
Neg logit contributions
 end-0.048
 her-0.048
 say-0.042
 other-0.038
 decide-0.038
Token proud
Token ID6613
Feature activation+8.74e-02

Pos logit contributions
amas+0.000
belt+0.000
roller+0.000
aj+0.000
coat+0.000
Neg logit contributions
 her-0.000
 end-0.000
 say-0.000
 other-0.000
 angry-0.000
Token she
Token ID673
Feature activation-2.75e-02

Pos logit contributions
 end+0.077
 her+0.071
 say+0.064
 decide+0.061
 other+0.057
Neg logit contributions
amas-0.254
belt-0.251
aj-0.245
roller-0.242
coat-0.239
Token was
Token ID373
Feature activation-1.32e-02

Pos logit contributions
amas+0.061
belt+0.061
+0.057
aj+0.057
coat+0.057
Neg logit contributions
 end-0.041
 her-0.039
 say-0.037
 decide-0.036
 other-0.035
Token of
Token ID286
Feature activation-7.44e-02

Pos logit contributions
amas+0.036
belt+0.035
aj+0.034
roller+0.034
coat+0.033
Neg logit contributions
 end-0.015
 her-0.015
 say-0.013
 other-0.012
 decide-0.012
Token her
Token ID607
Feature activation-1.08e-02

Pos logit contributions
amas+0.224
belt+0.222
aj+0.216
roller+0.212
coat+0.210
Neg logit contributions
 end-0.055
 say-0.045
 her-0.043
 decide-0.042
 asks-0.036
Token for
Token ID329
Feature activation+2.96e-03

Pos logit contributions
amas+0.027
belt+0.026
aj+0.026
roller+0.025
coat+0.025
Neg logit contributions
 end-0.014
 her-0.014
 say-0.013
 decide-0.012
 asks-0.012
Token best
Token ID1266
Feature activation-7.64e-02

Pos logit contributions
belt+0.082
amas+0.081
aj+0.081
roller+0.078
coat+0.077
Neg logit contributions
 end-0.037
 her-0.035
 say-0.032
 decide-0.031
 asks-0.030
Token friend
Token ID1545
Feature activation+2.29e-02

Pos logit contributions
amas+0.199
belt+0.197
aj+0.194
roller+0.192
coat+0.188
Neg logit contributions
 end-0.088
 her-0.085
 say-0.078
 decide-0.074
 angry-0.071
Token who
Token ID508
Feature activation-1.99e-02

Pos logit contributions
 end+0.028
 her+0.027
 say+0.025
 decide+0.023
 other+0.023
Neg logit contributions
amas-0.060
belt-0.059
aj-0.057
roller-0.057
coat-0.056
Token had
Token ID550
Feature activation-9.53e-04

Pos logit contributions
belt+0.044
amas+0.043
coat+0.042
roller+0.041
aj+0.041
Neg logit contributions
 end-0.030
 decide-0.025
 her-0.025
 say-0.025
 angry-0.024
Token left
Token ID1364
Feature activation-3.38e-02

Pos logit contributions
amas+0.003
belt+0.003
aj+0.003
roller+0.003
coat+0.003
Neg logit contributions
 her-0.001
 end-0.001
 say-0.001
 other-0.001
 decide-0.001
Token the
Token ID262
Feature activation+2.71e-03

Pos logit contributions
amas+0.109
belt+0.107
aj+0.105
roller+0.104
+0.102
Neg logit contributions
 her-0.022
 end-0.022
 say-0.017
 other-0.015
 decide-0.014
Token note
Token ID3465
Feature activation+8.32e-02

Pos logit contributions
 end+0.004
 her+0.004
 say+0.004
 decide+0.003
 other+0.003
Neg logit contributions
amas-0.006
belt-0.006
aj-0.006
roller-0.006
coat-0.006
Token.
Token ID13
Feature activation-7.79e-03

Pos logit contributions
 her+0.087
 end+0.086
 say+0.075
 other+0.069
 decide+0.066
Neg logit contributions
amas-0.239
belt-0.229
aj-0.226
roller-0.223
-0.222
Token They
Token ID1119
Feature activation+2.09e-03

Pos logit contributions
amas+0.019
belt+0.019
aj+0.018
coat+0.018
roller+0.018
Neg logit contributions
 her-0.011
 end-0.010
 say-0.010
 other-0.009
 asks-0.009
Token had
Token ID550
Feature activation-1.09e-02

Pos logit contributions
 end+0.003
 asks+0.002
 angry+0.002
 her+0.002
 mad+0.002
Neg logit contributions
belt-0.004
coat-0.004
anas-0.004
roller-0.004
aj-0.004
Token a
Token ID257
Feature activation-5.57e-03

Pos logit contributions
amas+0.030
belt+0.030
aj+0.029
roller+0.028
coat+0.027
Neg logit contributions
 her-0.013
 end-0.012
 say-0.011
 other-0.011
 decide-0.010
Token field
Token ID2214
Feature activation-2.25e-02

Pos logit contributions
amas+0.006
aj+0.006
belt+0.006
coat+0.006
roller+0.006
Neg logit contributions
 her-0.003
 end-0.003
 say-0.003
 asks-0.003
 decide-0.003
Token full
Token ID1336
Feature activation-4.77e-02

Pos logit contributions
amas+0.060
belt+0.059
aj+0.058
roller+0.057
coat+0.056
Neg logit contributions
 end-0.027
 her-0.026
 say-0.023
 other-0.022
 decide-0.022
Token of
Token ID286
Feature activation-4.01e-02

Pos logit contributions
amas+0.115
belt+0.113
aj+0.109
roller+0.108
coat+0.105
Neg logit contributions
 end-0.069
 her-0.067
 say-0.062
 decide-0.059
 other-0.058
Token colors
Token ID7577
Feature activation+1.62e-02

Pos logit contributions
amas+0.084
aj+0.080
adows+0.079
belt+0.078
coat+0.077
Neg logit contributions
 end-0.057
 say-0.051
 decide-0.051
 her-0.049
 asks-0.048
Token.
Token ID13
Feature activation-9.09e-03

Pos logit contributions
 end+0.018
 her+0.018
 say+0.016
 decide+0.015
 other+0.015
Neg logit contributions
amas-0.044
belt-0.043
aj-0.042
roller-0.041
coat-0.040
Token He
Token ID679
Feature activation+8.94e-03

Pos logit contributions
belt+0.020
amas+0.019
aj+0.019
 outage+0.019
coat+0.019
Neg logit contributions
 her-0.016
 end-0.014
 say-0.014
 asks-0.014
 other-0.014
Token decided
Token ID3066
Feature activation-1.36e-02

Pos logit contributions
 end+0.012
 her+0.011
 say+0.010
 decide+0.010
 angry+0.010
Neg logit contributions
amas-0.020
aj-0.020
belt-0.020
roller-0.019
coat-0.019
Token to
Token ID284
Feature activation-1.87e-02

Pos logit contributions
amas+0.031
belt+0.030
aj+0.029
roller+0.029
coat+0.028
Neg logit contributions
 end-0.021
 her-0.021
 say-0.020
 decide-0.019
 other-0.019
Token display
Token ID3359
Feature activation+1.87e-03

Pos logit contributions
amas+0.046
roller+0.044
belt+0.044
coat+0.043
+0.042
Neg logit contributions
 her-0.020
 end-0.020
 angry-0.020
 asks-0.017
 mad-0.017
Token his
Token ID465
Feature activation-1.64e-02

Pos logit contributions
 end+0.002
 say+0.001
 decide+0.001
 asks+0.001
 her+0.001
Neg logit contributions
amas-0.005
belt-0.005
aj-0.005
lights-0.005
roller-0.005
Token flowers
Token ID12734
Feature activation-3.59e-02

Pos logit contributions
amas+0.039
belt+0.038
aj+0.038
roller+0.037
coat+0.036
Neg logit contributions
 end-0.022
 her-0.021
 say-0.020
 decide-0.019
 asks-0.018
Token.
Token ID13
Feature activation-4.01e-03

Pos logit contributions
 end+0.016
 her+0.016
 say+0.014
 other+0.013
 decide+0.013
Neg logit contributions
amas-0.041
belt-0.040
aj-0.039
roller-0.039
coat-0.038
Token Mom
Token ID11254
Feature activation+5.90e-04

Pos logit contributions
belt+0.010
amas+0.010
imm+0.010
 outage+0.010
coat+0.010
Neg logit contributions
 her-0.005
 end-0.005
 say-0.005
 other-0.005
 asks-0.005
Token called
Token ID1444
Feature activation-1.97e-02

Pos logit contributions
 end+0.001
 her+0.001
 say+0.001
 decide+0.001
 other+0.001
Neg logit contributions
amas-0.001
belt-0.001
aj-0.001
coat-0.001
roller-0.001
Token the
Token ID262
Feature activation-1.92e-02

Pos logit contributions
amas+0.058
belt+0.056
roller+0.055
aj+0.055
coat+0.054
Neg logit contributions
 end-0.016
 her-0.014
 say-0.013
 decide-0.013
 asks-0.012
Token shop
Token ID6128
Feature activation+2.74e-02

Pos logit contributions
amas+0.040
belt+0.037
aj+0.036
roller+0.036
coat+0.035
Neg logit contributions
 her-0.033
 end-0.029
 say-0.029
 decide-0.028
 asks-0.027
Token and
Token ID290
Feature activation+1.22e-02

Pos logit contributions
 end+0.054
 her+0.053
 say+0.050
 decide+0.049
 other+0.048
Neg logit contributions
amas-0.050
belt-0.050
aj-0.047
roller-0.047
coat-0.045
Token ordered
Token ID6149
Feature activation-5.66e-03

Pos logit contributions
 end+0.015
 her+0.013
 say+0.012
 decide+0.012
 other+0.012
Neg logit contributions
belt-0.031
amas-0.031
aj-0.029
roller-0.029
coat-0.029
Token a
Token ID257
Feature activation+1.87e-02

Pos logit contributions
amas+0.015
belt+0.014
aj+0.014
roller+0.014
coat+0.014
Neg logit contributions
 end-0.006
 her-0.005
 say-0.005
 decide-0.005
 asks-0.005
Token deep
Token ID2769
Feature activation-4.29e-02

Pos logit contributions
 her+0.032
 end+0.032
 say+0.029
 decide+0.029
 other+0.028
Neg logit contributions
amas-0.039
belt-0.038
aj-0.037
roller-0.036
coat-0.035
Token pizza
Token ID14256
Feature activation+3.34e-02

Pos logit contributions
amas+0.089
belt+0.086
aj+0.084
roller+0.083
coat+0.081
Neg logit contributions
 her-0.073
 end-0.073
 decide-0.067
 other-0.067
 say-0.067
Token.
Token ID13
Feature activation-5.40e-04

Pos logit contributions
 end+0.041
 her+0.040
 say+0.037
 decide+0.035
 other+0.034
Neg logit contributions
amas-0.086
belt-0.084
aj-0.082
roller-0.081
coat-0.080
Token
Token ID198
Feature activation-1.64e-02

Pos logit contributions
belt+0.033
amas+0.032
 outage+0.031
aj+0.030
coat+0.030
Neg logit contributions
 her-0.016
 end-0.015
 say-0.014
 other-0.013
 asks-0.013
Token
Token ID198
Feature activation-1.35e-02

Pos logit contributions
belt+0.046
amas+0.046
aj+0.043
coat+0.043
roller+0.043
Neg logit contributions
 her-0.018
 end-0.017
 other-0.015
 say-0.015
 the-0.013
TokenL
Token ID43
Feature activation+2.26e-02

Pos logit contributions
amas+0.031
belt+0.030
aj+0.030
roller+0.029
+0.029
Neg logit contributions
 end-0.021
 her-0.020
 say-0.018
 other-0.018
 decide-0.017
Tokenily
Token ID813
Feature activation+1.25e-02

Pos logit contributions
 end+0.025
 her+0.023
 say+0.023
 decide+0.021
 other+0.020
Neg logit contributions
amas-0.059
aj-0.058
belt-0.058
coat-0.056
roller-0.056
Token and
Token ID290
Feature activation-9.60e-04

Pos logit contributions
 end+0.015
 her+0.013
 say+0.012
 decide+0.011
 other+0.011
Neg logit contributions
amas-0.032
aj-0.032
belt-0.032
roller-0.031
coat-0.030
Token Tim
Token ID5045
Feature activation-7.74e-03

Pos logit contributions
amas+0.001
belt+0.001
coat+0.001
+0.001
aj+0.001
Neg logit contributions
 end-0.002
 decide-0.002
 mean-0.002
 say-0.002
 angry-0.002
Tokenmy
Token ID1820
Feature activation+8.76e-03

Pos logit contributions
coat+0.014
amas+0.013
belt+0.013
aj+0.013
ney+0.013
Neg logit contributions
 end-0.013
 her-0.013
 angry-0.012
 other-0.012
 flower-0.012
Token played
Token ID2826
Feature activation-1.77e-02

Pos logit contributions
 end+0.012
 her+0.011
 say+0.010
 angry+0.010
 asks+0.010
Neg logit contributions
belt-0.019
coat-0.019
amas-0.019
roller-0.018
anas-0.018
Token together
Token ID1978
Feature activation+5.64e-03

Pos logit contributions
amas+0.043
belt+0.042
aj+0.041
roller+0.040
coat+0.039
Neg logit contributions
 end-0.025
 her-0.025
 say-0.023
 decide-0.022
 other-0.021
Token for
Token ID329
Feature activation+3.53e-02

Pos logit contributions
 end+0.007
 say+0.006
 decide+0.006
 her+0.006
 asks+0.006
Neg logit contributions
amas-0.014
belt-0.013
roller-0.013
coat-0.013
aj-0.013
Token a
Token ID257
Feature activation+7.04e-03

Pos logit contributions
 end+0.040
 say+0.039
 decide+0.038
 asks+0.037
 her+0.036
Neg logit contributions
amas-0.086
-0.084
anas-0.083
belt-0.083
aj-0.081
Token to
Token ID284
Feature activation+1.67e-03

Pos logit contributions
 end+0.005
 her+0.005
 say+0.005
 decide+0.005
 other+0.005
Neg logit contributions
amas-0.006
belt-0.006
aj-0.006
roller-0.006
coat-0.006
Token a
Token ID257
Feature activation+2.85e-02

Pos logit contributions
 end+0.002
 her+0.001
 decide+0.001
 angry+0.001
 say+0.001
Neg logit contributions
amas-0.005
roller-0.004
aj-0.004
belt-0.004
-0.004
Token hive
Token ID35881
Feature activation-1.77e-02

Pos logit contributions
 end+0.039
 her+0.038
 say+0.035
 decide+0.033
 other+0.033
Neg logit contributions
amas-0.070
belt-0.068
aj-0.066
roller-0.066
coat-0.064
Token to
Token ID284
Feature activation+1.40e-02

Pos logit contributions
amas+0.051
+0.049
belt+0.049
aj+0.048
roller+0.047
Neg logit contributions
 her-0.021
 end-0.019
 say-0.018
 the-0.015
 ask-0.015
Token get
Token ID651
Feature activation+4.85e-02

Pos logit contributions
 end+0.016
 asks+0.016
 angry+0.015
 her+0.015
 says+0.015
Neg logit contributions
amas-0.031
belt-0.029
coat-0.029
onna-0.028
aj-0.027
Token some
Token ID617
Feature activation+2.05e-02

Pos logit contributions
 end+0.059
 judge+0.054
 say+0.053
 decide+0.051
 asks+0.050
Neg logit contributions
amas-0.112
belt-0.106
aj-0.104
onna-0.098
coat-0.097
Token honey
Token ID12498
Feature activation-4.08e-02

Pos logit contributions
 end+0.032
 her+0.030
 decide+0.029
 say+0.029
 asks+0.028
Neg logit contributions
amas-0.044
aj-0.042
belt-0.040
coat-0.039
roller-0.038
Token,
Token ID11
Feature activation+9.61e-03

Pos logit contributions
belt+0.122
amas+0.122
aj+0.119
roller+0.119
+0.117
Neg logit contributions
 her-0.039
 end-0.035
 say-0.032
 all-0.026
 other-0.025
Token but
Token ID475
Feature activation-4.51e-02

Pos logit contributions
 her+0.018
 say+0.017
 end+0.016
 other+0.015
 ask+0.015
Neg logit contributions
amas-0.021
belt-0.021
roller-0.020
aj-0.020
-0.019
Token the
Token ID262
Feature activation-1.37e-02

Pos logit contributions
belt+0.118
amas+0.118
aj+0.112
roller+0.111
coat+0.108
Neg logit contributions
 end-0.052
 her-0.050
 say-0.048
 decide-0.045
 asks-0.043
Token bees
Token ID17002
Feature activation-6.41e-02

Pos logit contributions
amas+0.034
belt+0.032
aj+0.031
coat+0.031
roller+0.031
Neg logit contributions
 end-0.017
 her-0.017
 say-0.016
 decide-0.016
 angry-0.015
Token They
Token ID1119
Feature activation+1.36e-02

Pos logit contributions
belt+0.029
amas+0.029
coat+0.028
imm+0.027
aj+0.027
Neg logit contributions
 her-0.011
 say-0.010
 end-0.010
 asks-0.009
 the-0.009
Token still
Token ID991
Feature activation+1.42e-02

Pos logit contributions
 her+0.015
 end+0.014
 angry+0.013
 asks+0.012
 other+0.012
Neg logit contributions
belt-0.032
coat-0.031
aj-0.031
amas-0.031
roller-0.031
Token have
Token ID423
Feature activation+2.98e-02

Pos logit contributions
 end+0.014
 her+0.014
 say+0.012
 other+0.012
 decide+0.011
Neg logit contributions
amas-0.039
belt-0.039
aj-0.038
roller-0.037
coat-0.037
Token a
Token ID257
Feature activation+4.34e-03

Pos logit contributions
 her+0.036
 end+0.033
 other+0.032
 the+0.029
 say+0.028
Neg logit contributions
amas-0.088
belt-0.087
roller-0.082
 outage-0.080
aj-0.080
Token lot
Token ID1256
Feature activation+2.85e-02

Pos logit contributions
 end+0.005
 her+0.005
 say+0.005
 decide+0.004
 other+0.004
Neg logit contributions
amas-0.011
belt-0.011
aj-0.011
roller-0.011
coat-0.010
Token of
Token ID286
Feature activation-3.28e-02

Pos logit contributions
 end+0.039
 her+0.038
 say+0.034
 other+0.033
 decide+0.032
Neg logit contributions
amas-0.074
belt-0.071
aj-0.069
roller-0.068
-0.067
Token paper
Token ID3348
Feature activation+1.68e-02

Pos logit contributions
amas+0.061
aj+0.060
belt+0.059
coat+0.057
roller+0.056
Neg logit contributions
 end-0.058
 her-0.056
 decide-0.053
 say-0.052
 asks-0.050
Token.
Token ID13
Feature activation-8.27e-03

Pos logit contributions
 end+0.019
 her+0.019
 say+0.017
 other+0.016
 decide+0.015
Neg logit contributions
amas-0.046
belt-0.045
aj-0.043
roller-0.043
-0.042
Token They
Token ID1119
Feature activation+1.52e-02

Pos logit contributions
belt+0.024
amas+0.023
coat+0.023
imm+0.022
aj+0.022
Neg logit contributions
 her-0.009
 end-0.009
 say-0.009
 other-0.008
 asks-0.008
Token can
Token ID460
Feature activation+1.42e-02

Pos logit contributions
 her+0.017
 end+0.016
 angry+0.015
 asks+0.015
 girls+0.014
Neg logit contributions
belt-0.034
coat-0.034
amas-0.033
aj-0.033
-0.032
Token make
Token ID787
Feature activation-1.98e-02

Pos logit contributions
 end+0.016
 her+0.016
 angry+0.015
 other+0.014
 asks+0.014
Neg logit contributions
amas-0.034
belt-0.034
roller-0.033
coat-0.032
aj-0.032
Token looked
Token ID3114
Feature activation-1.05e-02

Pos logit contributions
 end+0.017
 her+0.016
 say+0.015
 decide+0.014
 other+0.014
Neg logit contributions
amas-0.032
belt-0.032
aj-0.031
roller-0.031
coat-0.030
Token in
Token ID287
Feature activation-8.00e-03

Pos logit contributions
belt+0.026
amas+0.025
aj+0.025
roller+0.024
coat+0.024
Neg logit contributions
 end-0.013
 her-0.012
 say-0.011
 asks-0.010
 other-0.010
Token awe
Token ID25030
Feature activation-4.50e-02

Pos logit contributions
amas+0.019
belt+0.019
roller+0.018
aj+0.018
coat+0.018
Neg logit contributions
 end-0.010
 her-0.009
 say-0.009
 decide-0.009
 asks-0.009
Token.
Token ID13
Feature activation-5.66e-03

Pos logit contributions
amas+0.122
belt+0.120
aj+0.117
roller+0.116
coat+0.113
Neg logit contributions
 end-0.051
 her-0.050
 say-0.044
 decide-0.041
 other-0.041
Token She
Token ID1375
Feature activation+3.42e-03

Pos logit contributions
belt+0.015
amas+0.015
aj+0.014
coat+0.014
roller+0.014
Neg logit contributions
 her-0.007
 end-0.007
 say-0.006
 other-0.006
 asks-0.006
Token had
Token ID550
Feature activation-1.82e-02

Pos logit contributions
 end+0.004
 her+0.004
 say+0.004
 decide+0.003
 other+0.003
Neg logit contributions
amas-0.009
belt-0.009
aj-0.008
roller-0.008
coat-0.008
Token never
Token ID1239
Feature activation-5.14e-02

Pos logit contributions
amas+0.046
belt+0.045
aj+0.044
roller+0.043
coat+0.042
Neg logit contributions
 her-0.026
 end-0.025
 say-0.023
 other-0.022
 decide-0.022
Token seen
Token ID1775
Feature activation-6.81e-02

Pos logit contributions
belt+0.079
amas+0.078
aj+0.075
+0.074
roller+0.074
Neg logit contributions
 end-0.109
 her-0.106
 say-0.100
 decide-0.098
 other-0.096
Token anything
Token ID1997
Feature activation-7.61e-03

Pos logit contributions
amas+0.178
belt+0.176
aj+0.170
roller+0.168
+0.167
Neg logit contributions
 end-0.077
 her-0.071
 say-0.069
 decide-0.066
 angry-0.061
Token so
Token ID523
Feature activation+8.24e-03

Pos logit contributions
amas+0.020
belt+0.020
aj+0.019
roller+0.019
coat+0.018
Neg logit contributions
 end-0.009
 her-0.009
 say-0.008
 other-0.008
 decide-0.008
Token big
Token ID1263
Feature activation+3.85e-02

Pos logit contributions
 end+0.011
 her+0.011
 say+0.010
 decide+0.010
 other+0.009
Neg logit contributions
amas-0.020
belt-0.020
aj-0.019
roller-0.019
coat-0.018
Token back
Token ID736
Feature activation-3.98e-02

Pos logit contributions
amas+0.009
belt+0.009
roller+0.009
aj+0.009
+0.009
Neg logit contributions
 her-0.003
 end-0.003
 say-0.002
 angry-0.002
 decide-0.002
Token and
Token ID290
Feature activation-2.14e-02

Pos logit contributions
amas+0.110
belt+0.108
aj+0.105
roller+0.104
coat+0.102
Neg logit contributions
 end-0.042
 her-0.041
 say-0.037
 decide-0.034
 other-0.034
Token he
Token ID339
Feature activation-5.08e-02

Pos logit contributions
belt+0.056
amas+0.056
aj+0.054
roller+0.053
coat+0.053
Neg logit contributions
 end-0.025
 her-0.022
 say-0.020
 decide-0.020
 other-0.019
Token knew
Token ID2993
Feature activation+1.30e-02

Pos logit contributions
amas+0.131
belt+0.129
aj+0.125
roller+0.124
coat+0.122
Neg logit contributions
 end-0.062
 her-0.060
 say-0.054
 decide-0.052
 other-0.051
Token he
Token ID339
Feature activation-6.65e-02

Pos logit contributions
 end+0.012
 her+0.012
 say+0.010
 other+0.009
 decide+0.009
Neg logit contributions
amas-0.038
belt-0.037
aj-0.037
roller-0.036
coat-0.036
Token had
Token ID550
Feature activation-3.34e-02

Pos logit contributions
amas+0.148
belt+0.145
aj+0.140
roller+0.138
coat+0.135
Neg logit contributions
 end-0.107
 her-0.105
 say-0.097
 decide-0.093
 other-0.092
Token made
Token ID925
Feature activation-7.23e-02

Pos logit contributions
amas+0.089
belt+0.087
aj+0.085
roller+0.084
coat+0.082
Neg logit contributions
 her-0.043
 end-0.041
 say-0.037
 other-0.036
 decide-0.035
Token the
Token ID262
Feature activation-3.68e-02

Pos logit contributions
amas+0.221
belt+0.217
aj+0.212
roller+0.210
coat+0.207
Neg logit contributions
 end-0.055
 her-0.051
 say-0.044
 decide-0.040
 other-0.038
Token right
Token ID826
Feature activation+9.42e-03

Pos logit contributions
amas+0.079
coat+0.077
belt+0.076
+0.073
aj+0.072
Neg logit contributions
 her-0.051
 asks-0.048
 say-0.045
 end-0.044
 decide-0.043
Token choice
Token ID3572
Feature activation+1.19e-02

Pos logit contributions
 her+0.016
 end+0.015
 say+0.014
 decide+0.014
 other+0.014
Neg logit contributions
amas-0.019
belt-0.019
aj-0.019
roller-0.018
coat-0.018
Token in
Token ID287
Feature activation-1.46e-04

Pos logit contributions
 her+0.013
 end+0.012
 say+0.010
 other+0.009
 the+0.009
Neg logit contributions
amas-0.036
aj-0.035
roller-0.034
belt-0.033
coat-0.033
Token full
Token ID1336
Feature activation-7.84e-02

Pos logit contributions
 her+0.039
 end+0.034
 angry+0.033
 the+0.032
 other+0.031
Neg logit contributions
amas-0.090
roller-0.087
belt-0.084
aj-0.083
coat-0.083
Token of
Token ID286
Feature activation-6.11e-02

Pos logit contributions
amas+0.245
aj+0.238
belt+0.237
roller+0.236
+0.235
Neg logit contributions
 end-0.080
 her-0.075
 other-0.067
 the-0.061
 all-0.059
Token danger
Token ID3514
Feature activation-3.96e-02

Pos logit contributions
amas+0.150
belt+0.147
aj+0.143
roller+0.141
coat+0.138
Neg logit contributions
 end-0.083
 her-0.081
 say-0.074
 decide-0.070
 other-0.069
Token and
Token ID290
Feature activation+2.31e-02

Pos logit contributions
amas+0.124
aj+0.123
belt+0.120
roller+0.119
coat+0.115
Neg logit contributions
 her-0.034
 end-0.033
 other-0.029
 say-0.027
 the-0.027
Token difficult
Token ID2408
Feature activation+2.06e-02

Pos logit contributions
 end+0.029
 her+0.029
 say+0.026
 other+0.026
 angry+0.024
Neg logit contributions
amas-0.062
aj-0.059
belt-0.058
roller-0.058
coat-0.057
Token times
Token ID1661
Feature activation-1.87e-02

Pos logit contributions
 end+0.026
 her+0.025
 other+0.022
 say+0.022
 decide+0.021
Neg logit contributions
amas-0.056
aj-0.055
belt-0.055
roller-0.054
coat-0.054
Token.
Token ID13
Feature activation-7.27e-03

Pos logit contributions
amas+0.059
aj+0.058
coat+0.055
+0.055
roller+0.055
Neg logit contributions
 her-0.016
 end-0.016
 other-0.012
 say-0.012
 the-0.012
Token\n
Token ID198
Feature activation-1.82e-02

Pos logit contributions
belt+0.020
amas+0.020
coat+0.019
aj+0.019
 outage+0.019
Neg logit contributions
 her-0.008
 end-0.008
 other-0.007
 say-0.007
 asks-0.007
Token\n
Token ID198
Feature activation-1.48e-02

Pos logit contributions
amas+0.052
belt+0.051
aj+0.050
coat+0.049
roller+0.048
Neg logit contributions
 her-0.019
 end-0.019
 other-0.017
 say-0.017
 the-0.015
TokenThe
Token ID464
Feature activation-1.54e-02

Pos logit contributions
amas+0.037
aj+0.036
belt+0.035
roller+0.034
coat+0.034
Neg logit contributions
 end-0.022
 her-0.020
 other-0.019
 say-0.018
 decide-0.018
Token warrior
Token ID16491
Feature activation+3.16e-02

Pos logit contributions
belt+0.031
amas+0.030
+0.028
aj+0.026
ables+0.026
Neg logit contributions
 her-0.027
 say-0.025
 says-0.023
 asks-0.023
 end-0.022
Token came
Token ID1625
Feature activation-2.17e-02

Pos logit contributions
 end+0.031
 her+0.030
 say+0.027
 angry+0.025
 decide+0.025
Neg logit contributions
amas-0.061
belt-0.060
roller-0.058
aj-0.058
coat-0.058
Token back
Token ID736
Feature activation-1.03e-02

Pos logit contributions
amas+0.055
belt+0.054
aj+0.052
roller+0.051
+0.050
Neg logit contributions
 her-0.030
 end-0.030
 say-0.027
 other-0.025
 decide-0.025
Token.
Token ID13
Feature activation-3.14e-03

Pos logit contributions
amas+0.028
belt+0.027
aj+0.026
roller+0.026
coat+0.026
Neg logit contributions
 her-0.012
 end-0.012
 say-0.011
 decide-0.010
 other-0.010
Token They
Token ID1119
Feature activation+6.45e-03

Pos logit contributions
belt+0.007
amas+0.007
aj+0.007
coat+0.007
+0.007
Neg logit contributions
 her-0.005
 end-0.005
 say-0.004
 asks-0.004
 other-0.004
Token saw
Token ID2497
Feature activation-2.36e-02

Pos logit contributions
 end+0.007
 her+0.006
 say+0.006
 asks+0.006
 angry+0.006
Neg logit contributions
belt-0.015
coat-0.015
amas-0.015
aj-0.015
roller-0.015
Token Mae
Token ID34673
Feature activation-5.72e-02

Pos logit contributions
tub+0.054
aj+0.053
+0.053
amas+0.053
belt+0.052
Neg logit contributions
 end-0.026
 say-0.023
 decide-0.022
 ask-0.021
 judge-0.020
Token and
Token ID290
Feature activation-6.85e-03

Pos logit contributions
amas+0.135
belt+0.132
aj+0.132
roller+0.128
coat+0.125
Neg logit contributions
 end-0.077
 her-0.073
 decide-0.067
 say-0.067
 other-0.065
Token excited
Token ID6568
Feature activation-2.37e-03

Pos logit contributions
belt+0.017
amas+0.017
aj+0.017
roller+0.016
coat+0.016
Neg logit contributions
 end-0.008
 say-0.006
 decide-0.006
 her-0.006
 asks-0.006
Tokenly
Token ID306
Feature activation+5.45e-03

Pos logit contributions
amas+0.005
aj+0.005
belt+0.005
roller+0.004
coat+0.004
Neg logit contributions
 end-0.004
 her-0.003
 decide-0.003
 say-0.003
 asks-0.003
Token ran
Token ID4966
Feature activation-1.85e-02

Pos logit contributions
 end+0.008
 her+0.007
 say+0.007
 decide+0.006
 other+0.006
Neg logit contributions
amas-0.013
belt-0.013
aj-0.013
roller-0.012
coat-0.012
Token towards
Token ID3371
Feature activation-3.22e-03

Pos logit contributions
amas+0.040
belt+0.040
roller+0.039
coat+0.038
aj+0.038
Neg logit contributions
 end-0.030
 her-0.027
 say-0.026
 asks-0.025
 decide-0.025
Token l
Token ID300
Feature activation+6.11e-02

Pos logit contributions
 end+0.009
 her+0.008
 say+0.008
 other+0.008
 decide+0.007
Neg logit contributions
amas-0.013
belt-0.012
aj-0.012
roller-0.012
coat-0.011
Tokenadders
Token ID45940
Feature activation-1.47e-01

Pos logit contributions
 end+0.124
 her+0.124
 say+0.119
 other+0.116
 decide+0.115
Neg logit contributions
amas-0.109
belt-0.102
roller-0.098
aj-0.096
-0.095
Token.
Token ID13
Feature activation-9.36e-03

Pos logit contributions
amas+0.475
aj+0.451
+0.446
roller+0.435
belt+0.430
Neg logit contributions
 her-0.126
 end-0.114
 say-0.098
 all-0.093
 the-0.093
Token They
Token ID1119
Feature activation+1.10e-02

Pos logit contributions
amas+0.024
belt+0.024
coat+0.023
aj+0.023
imm+0.022
Neg logit contributions
 her-0.013
 end-0.012
 say-0.011
 other-0.011
 asks-0.011
Token were
Token ID547
Feature activation-1.67e-02

Pos logit contributions
 end+0.014
 her+0.014
 say+0.012
 decide+0.012
 other+0.011
Neg logit contributions
belt-0.027
amas-0.027
aj-0.026
roller-0.026
coat-0.026
Token firefighters
Token ID22611
Feature activation-1.03e-01

Pos logit contributions
amas+0.047
belt+0.046
roller+0.045
aj+0.044
+0.043
Neg logit contributions
 her-0.018
 end-0.018
 say-0.016
 angry-0.016
 other-0.015
Token.
Token ID13
Feature activation-7.26e-03

Pos logit contributions
amas+0.322
belt+0.308
aj+0.307
roller+0.299
+0.296
Neg logit contributions
 end-0.093
 her-0.087
 say-0.081
 angry-0.067
 other-0.066
Token They
Token ID1119
Feature activation+1.49e-02

Pos logit contributions
amas+0.019
belt+0.018
coat+0.018
aj+0.018
imm+0.017
Neg logit contributions
 her-0.010
 say-0.009
 end-0.009
 other-0.009
 asks-0.009
Token sprayed
Token ID35729
Feature activation-3.69e-02

Pos logit contributions
 end+0.019
 her+0.019
 say+0.017
 decide+0.016
 other+0.016
Neg logit contributions
amas-0.038
belt-0.037
aj-0.036
roller-0.036
coat-0.035
Token water
Token ID1660
Feature activation+8.12e-03

Pos logit contributions
amas+0.113
belt+0.111
aj+0.109
roller+0.108
+0.105
Neg logit contributions
 end-0.031
 her-0.030
 say-0.025
 other-0.024
 decide-0.022
Token on
Token ID319
Feature activation+3.16e-04

Pos logit contributions
 end+0.007
 her+0.007
 say+0.006
 other+0.006
 angry+0.005
Neg logit contributions
amas-0.025
belt-0.024
aj-0.024
-0.023
roller-0.023
Token promised
Token ID8072
Feature activation-2.16e-02

Pos logit contributions
 end+0.004
 her+0.004
 say+0.004
 decide+0.004
 other+0.004
Neg logit contributions
amas-0.011
belt-0.011
aj-0.011
roller-0.011
coat-0.010
Token he
Token ID339
Feature activation-6.38e-02

Pos logit contributions
amas+0.058
aj+0.055
belt+0.055
roller+0.054
coat+0.053
Neg logit contributions
 her-0.029
 end-0.027
 say-0.025
 other-0.024
 decide-0.023
Token wouldn
Token ID3636
Feature activation-4.92e-02

Pos logit contributions
amas+0.133
belt+0.130
aj+0.125
roller+0.124
coat+0.121
Neg logit contributions
 end-0.111
 her-0.108
 say-0.101
 decide-0.098
 other-0.097
Tokenâ
Token ID22940
Feature activation-5.20e-02

Pos logit contributions
amas+0.122
belt+0.120
aj+0.120
roller+0.114
coat+0.113
Neg logit contributions
 end-0.069
 her-0.066
 say-0.062
 decide-0.058
 other-0.058
Token€
Token ID26391
Feature activation-4.53e-04

Pos logit contributions
amas+0.089
belt+0.086
aj+0.083
roller+0.081
coat+0.079
Neg logit contributions
 end-0.110
 her-0.109
 say-0.103
 decide-0.100
 other-0.099
Token™
Token ID8151
Feature activation-1.19e-01

Pos logit contributions
 outage+0.001
uster+0.001
onna+0.001
imm+0.001
lights+0.001
Neg logit contributions
 her-0.001
 both-0.001
 the-0.001
 say-0.001
 end-0.001
Tokent
Token ID83
Feature activation-3.18e-03

Pos logit contributions
amas+0.204
aj+0.192
icol+0.190
belt+0.180
anas+0.178
Neg logit contributions
 her-0.261
 end-0.260
 other-0.241
 both-0.230
 say-0.228
Token complain
Token ID13121
Feature activation+9.32e-03

Pos logit contributions
amas+0.008
belt+0.008
roller+0.008
coat+0.008
aj+0.008
Neg logit contributions
 end-0.004
 her-0.003
 say-0.003
 decide-0.003
 other-0.003
Token about
Token ID546
Feature activation-6.09e-02

Pos logit contributions
 her+0.011
 end+0.010
 other+0.009
 all+0.008
 say+0.008
Neg logit contributions
amas-0.028
aj-0.027
roller-0.027
belt-0.027
coat-0.026
Token the
Token ID262
Feature activation-7.88e-03

Pos logit contributions
amas+0.202
belt+0.199
aj+0.194
roller+0.194
coat+0.191
Neg logit contributions
 her-0.034
 end-0.034
 say-0.025
 other-0.022
 decide-0.020
Token hot
Token ID3024
Feature activation-3.03e-02

Pos logit contributions
amas+0.018
belt+0.017
aj+0.017
coat+0.016
roller+0.016
Neg logit contributions
 her-0.011
 end-0.011
 say-0.011
 decide-0.010
 asks-0.010
Token.
Token ID13
Feature activation-3.43e-03

Pos logit contributions
 end+0.045
 her+0.042
 say+0.036
 other+0.035
 angry+0.034
Neg logit contributions
belt-0.099
amas-0.097
roller-0.092
 outage-0.090
aj-0.090
Token It
Token ID632
Feature activation+2.03e-02

Pos logit contributions
amas+0.008
belt+0.008
aj+0.008
roller+0.007
coat+0.007
Neg logit contributions
 her-0.005
 end-0.005
 say-0.005
 asks-0.004
 other-0.004
Token was
Token ID373
Feature activation-1.59e-02

Pos logit contributions
 end+0.028
 her+0.027
 say+0.024
 decide+0.023
 other+0.023
Neg logit contributions
amas-0.049
belt-0.048
aj-0.047
roller-0.046
coat-0.045
Token the
Token ID262
Feature activation-3.80e-02

Pos logit contributions
amas+0.045
belt+0.044
roller+0.043
aj+0.043
+0.042
Neg logit contributions
 her-0.017
 end-0.016
 say-0.014
 other-0.014
 angry-0.013
Token prett
Token ID46442
Feature activation-2.41e-02

Pos logit contributions
amas+0.086
belt+0.086
aj+0.081
coat+0.081
roller+0.080
Neg logit contributions
 her-0.056
 end-0.051
 say-0.049
 asks-0.048
 decide-0.048
Tokeniest
Token ID6386
Feature activation-1.01e-01

Pos logit contributions
amas+0.029
belt+0.029
aj+0.026
roller+0.025
coat+0.025
Neg logit contributions
 end-0.061
 her-0.061
 say-0.058
 decide-0.058
 asks-0.057
Token thing
Token ID1517
Feature activation+3.63e-02

Pos logit contributions
amas+0.247
belt+0.243
aj+0.235
roller+0.234
coat+0.230
Neg logit contributions
 end-0.133
 her-0.131
 say-0.120
 decide-0.116
 angry-0.112
Token Lucy
Token ID22162
Feature activation-3.90e-02

Pos logit contributions
 her+0.045
 end+0.042
 say+0.039
 other+0.036
 angry+0.034
Neg logit contributions
amas-0.100
belt-0.095
aj-0.095
roller-0.093
-0.090
Token had
Token ID550
Feature activation-1.77e-02

Pos logit contributions
amas+0.098
belt+0.096
aj+0.093
roller+0.093
coat+0.090
Neg logit contributions
 end-0.051
 her-0.050
 say-0.046
 decide-0.043
 other-0.043
Token ever
Token ID1683
Feature activation-4.18e-02

Pos logit contributions
amas+0.042
belt+0.040
aj+0.039
roller+0.039
+0.038
Neg logit contributions
 her-0.027
 end-0.027
 say-0.024
 other-0.023
 decide-0.023
Token seen
Token ID1775
Feature activation-9.02e-02

Pos logit contributions
amas+0.071
aj+0.071
belt+0.070
coat+0.068
+0.067
Neg logit contributions
 end-0.079
 her-0.077
 say-0.072
 decide-0.069
 other-0.069
Token,
Token ID11
Feature activation+9.89e-03

Pos logit contributions
 end+0.007
 her+0.007
 say+0.006
 other+0.006
 decide+0.006
Neg logit contributions
amas-0.014
belt-0.014
aj-0.014
roller-0.014
coat-0.013
Token it
Token ID340
Feature activation-7.23e-04

Pos logit contributions
 her+0.011
 end+0.011
 say+0.010
 other+0.009
 decide+0.009
Neg logit contributions
amas-0.027
belt-0.026
aj-0.026
roller-0.025
coat-0.025
Token doesn
Token ID1595
Feature activation+2.49e-02

Pos logit contributions
amas+0.002
belt+0.002
roller+0.002
aj+0.002
coat+0.002
Neg logit contributions
 her-0.001
 end-0.001
 say-0.001
 other-0.001
 angry-0.001
Tokenâ
Token ID22940
Feature activation-3.33e-02

Pos logit contributions
 end+0.034
 her+0.033
 say+0.029
 other+0.028
 decide+0.026
Neg logit contributions
belt-0.064
aj-0.064
amas-0.063
coat-0.060
roller-0.060
Token€
Token ID26391
Feature activation+1.82e-02

Pos logit contributions
amas+0.055
belt+0.053
aj+0.049
roller+0.049
coat+0.049
Neg logit contributions
 end-0.072
 her-0.071
 say-0.067
 other-0.066
 decide-0.066
Token™
Token ID8151
Feature activation-1.04e-01

Pos logit contributions
 her+0.052
 say+0.049
 end+0.049
 the+0.048
 two+0.048
Neg logit contributions
belt-0.014
imm-0.014
lights-0.014
 outage-0.013
amas-0.013
Tokent
Token ID83
Feature activation+1.07e-02

Pos logit contributions
amas+0.169
aj+0.158
belt+0.157
icol+0.151
anas+0.147
Neg logit contributions
 her-0.231
 end-0.229
 say-0.216
 other-0.212
 angry-0.208
Token feel
Token ID1254
Feature activation+1.71e-02

Pos logit contributions
 end+0.011
 her+0.011
 decide+0.010
 say+0.010
 asks+0.009
Neg logit contributions
amas-0.028
belt-0.028
coat-0.027
aj-0.027
roller-0.027
Token safe
Token ID3338
Feature activation-5.18e-02

Pos logit contributions
 her+0.017
 end+0.017
 say+0.015
 other+0.014
 angry+0.014
Neg logit contributions
amas-0.051
belt-0.050
roller-0.049
aj-0.049
coat-0.047
Token.
Token ID13
Feature activation+8.91e-03

Pos logit contributions
amas+0.151
roller+0.148
belt+0.147
aj+0.146
+0.144
Neg logit contributions
 her-0.056
 end-0.053
 say-0.044
 other-0.042
 the-0.041
Tokenâ
Token ID22940
Feature activation-2.43e-02

Pos logit contributions
 her+0.013
 say+0.012
 end+0.012
 asks+0.012
 ask+0.012
Neg logit contributions
belt-0.021
amas-0.020
aj-0.020
coat-0.019
roller-0.019
Token One
Token ID1881
Feature activation+1.50e-02

Pos logit contributions
belt+0.031
amas+0.031
aj+0.030
roller+0.029
coat+0.028
Neg logit contributions
 her-0.017
 end-0.016
 say-0.015
 other-0.014
 decide-0.013
Token day
Token ID1110
Feature activation+1.85e-02

Pos logit contributions
 end+0.020
 her+0.020
 say+0.018
 decide+0.017
 other+0.017
Neg logit contributions
amas-0.037
belt-0.037
aj-0.036
roller-0.035
coat-0.035
Token,
Token ID11
Feature activation+2.76e-02

Pos logit contributions
 end+0.026
 decide+0.026
 say+0.025
 asks+0.024
 angry+0.024
Neg logit contributions
amas-0.040
belt-0.039
coat-0.038
aj-0.038
roller-0.038
Token she
Token ID673
Feature activation-3.31e-02

Pos logit contributions
 end+0.034
 asks+0.034
 decide+0.034
 decides+0.032
 say+0.030
Neg logit contributions
amas-0.060
coat-0.059
belt-0.058
-0.053
spread-0.053
Token was
Token ID373
Feature activation-6.28e-03

Pos logit contributions
amas+0.094
belt+0.093
aj+0.090
roller+0.090
coat+0.088
Neg logit contributions
 end-0.033
 her-0.032
 say-0.028
 decide-0.026
 other-0.026
Token tid
Token ID29770
Feature activation-1.24e-01

Pos logit contributions
amas+0.015
belt+0.015
aj+0.015
roller+0.015
coat+0.014
Neg logit contributions
 end-0.009
 her-0.008
 say-0.008
 decide-0.007
 other-0.007
Tokenying
Token ID1112
Feature activation-4.20e-02

Pos logit contributions
amas+0.238
belt+0.230
aj+0.223
roller+0.221
+0.214
Neg logit contributions
 end-0.240
 her-0.240
 say-0.222
 decide-0.214
 other-0.214
Token her
Token ID607
Feature activation+8.67e-03

Pos logit contributions
amas+0.122
belt+0.119
aj+0.115
roller+0.114
coat+0.112
Neg logit contributions
 end-0.038
 say-0.034
 her-0.034
 decide-0.030
 asks-0.029
Token room
Token ID2119
Feature activation-6.44e-02

Pos logit contributions
 end+0.017
 her+0.017
 say+0.015
 other+0.015
 decide+0.015
Neg logit contributions
amas-0.017
belt-0.016
roller-0.015
aj-0.015
-0.015
Token when
Token ID618
Feature activation+4.61e-02

Pos logit contributions
amas+0.189
+0.185
roller+0.184
aj+0.183
belt+0.179
Neg logit contributions
 her-0.069
 end-0.066
 other-0.051
 all-0.050
 one-0.047
Token she
Token ID673
Feature activation+7.12e-03

Pos logit contributions
 end+0.041
 say+0.041
 decide+0.035
 asks+0.035
 says+0.033
Neg logit contributions
amas-0.126
belt-0.126
-0.118
aj-0.118
roller-0.118