TOP ACTIVATIONS
MAX = 7.256

opened and he was over joy ed that he had found
happy . She was over joy ed that her pet was
M ia was over joy ed and thanked her friend
Cl aire was over joy ed - her mug had
The little girl was over joy ed and she thanked her
and Tim my was over joy ed . He and his
all ." Lucy was over joy ed and quickly grabbed the
The bald man was over joy ed . He thanked the
Anna was over joy ed . She had guessed
John was over joy ed and asked the woman
win . Jane was over joy ed ! <|endoftext|> Once upon
rock . Lily was over joy ed and knew it was
The oct opus was over joy ed , and he thanked
S ally was so over joy ed with her present that
Jean was over joy ed and waved to the
flower . Lily was over joy ed and ran to her
The little girl was over joy ed and jumped out of
The bear was over joy ed ! He quickly ran
The rabbit was over joy ed ! He hopped up
wind . Benny was over joy ed .

MIN ACTIVATIONS
MIN = -7.972

make it whole again . Amen ." Then ,
to mom and dad . Please mend the toy and make
and fell . The ash tr ay flew out of her
I have a treat ? Please !" Her mom
Please , help me . Please , hug me ."
you need to sleep . Please , put this book away
show off his powerful ash tr ay . He flew it
ones , Mom my ! Please , please , please ."
hit her mom 's ash tr ay . The ash tr
let us watch TV . Please do not be mad at
she could have the ash tr ay to put her rocks
and I like them . Please , mom , just one
me fix my pants . Please ." L ily
else to play with . Please , please , don 't
, it caused the ash tr ay to scatter all over
his mom saw the ash tr ay , she laughed and
a mask at all . Please , Sam ."
will tidy our room . Please do not take away our
me what happened next . Please don 't stop ," Lily
. He drops the ash tr ay . It breaks .

INTERVAL 3.449 - 7.256
CONTAINS 0.317%

said . M ia yawn ed . She did
. She watched his moves carefully and learned a lot .
. She wrapped the yarn carefully around each thread of the
proud and excited to be able to play all by herself
It is a very distant building . It is in the

INTERVAL -0.358 - 3.449
CONTAINS 69.815%

I think I know which book you mean ," the park
" We are digging for minerals ," Tom said
first ." They ran to the kitchen where Mom
falls on the grass and gets dirty . He cries and
story is about Jane and her friend Bob . Jane and

INTERVAL -4.165 - -0.358
CONTAINS 29.751%

'm here . But you were foolish to go to the
Follow your dreams - the true treasure lies within . The
box of candy . He said to her that the scary
After dinner , Lily 's dad came home with
web and wished he could be mighty like the cob web

INTERVAL -7.972 - -4.165
CONTAINS 0.117%

. It sh ined very brightly as it passed over the
." Sur pr ised , the little boy asked
cold ." " Hi , Lily ," Ben said
surprised and said , " Wow , Mr . Cable is
straight . " Wow , you are good at

TOKEN DETAILS (TOP ACTIVATIONS)

Token opened
Token ID4721
Feature activation+1.04e+00

Pos logit contributions
ogo+0.029
ules+0.027
ades+0.027
eds+0.026
wen+0.026
Neg logit contributions
 don-0.016
 Are-0.014
 blame-0.013
 worry-0.013
 give-0.012
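
The record above and the ones that follow share one layout: the labels "Token", "Token ID" and "Feature activation" are fused directly onto their values, and each logit line is a token string followed by a signed number. Below is a minimal Python sketch of a parser for one record, assuming that layout holds throughout. The function parse_record and the dictionary keys are hypothetical names chosen for illustration. A leading space in a token string is kept because it is part of the token.

```python
import re

# A token string (possibly with a leading space) followed by a signed decimal.
LOGIT_LINE = re.compile(r"(.*?)([+-]\d+\.\d+)")


def parse_record(lines):
    """Parse one per-token record, given as a list of text lines."""
    record = {
        "token": lines[0][len("Token"):],
        "token_id": int(lines[1][len("Token ID"):]),
        "activation": float(lines[2][len("Feature activation"):]),
        "pos": [],
        "neg": [],
    }
    bucket = None
    for line in lines[3:]:
        line = line.rstrip()
        if line == "Pos logit contributions":
            bucket = record["pos"]
        elif line == "Neg logit contributions":
            bucket = record["neg"]
        elif line and bucket is not None:
            match = LOGIT_LINE.fullmatch(line)
            if match:
                bucket.append((match.group(1), float(match.group(2))))
    return record


# The first record on this page, copied verbatim.
sample = """Token opened
Token ID4721
Feature activation+1.04e+00

Pos logit contributions
ogo+0.029
ules+0.027
ades+0.027
eds+0.026
wen+0.026
Neg logit contributions
 don-0.016
 Are-0.014
 blame-0.013
 worry-0.013
 give-0.012"""

rec = parse_record(sample.splitlines())
print(rec["token_id"], rec["activation"], rec["pos"][0], rec["neg"][0])
```

Parsing the records this way makes it straightforward to filter for the "joy" token (ID 2633), which carries the activations near +7.1 in every example shown.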
Token and
Token ID290
Feature activation+2.75e-02

Pos logit contributions
ogo+0.099
ules+0.094
wen+0.093
ades+0.093
eds+0.092
Neg logit contributions
 don-0.038
 Are-0.030
 blame-0.026
 worry-0.025
 Lily-0.025
Token he
Token ID339
Feature activation+4.74e-01

Pos logit contributions
ogo+0.002
Andrew+0.002
eds+0.002
wen+0.002
ades+0.002
Neg logit contributions
 don-0.001
 blame-0.001
 Are-0.001
 worry-0.001
 weep-0.001
Token was
Token ID373
Feature activation+3.86e-01

Pos logit contributions
ogo+0.044
ades+0.042
ules+0.042
wen+0.041
eds+0.041
Neg logit contributions
 don-0.019
 Are-0.015
 blame-0.014
 do-0.013
 give-0.013
Token over
Token ID625
Feature activation+2.78e-01

Pos logit contributions
ogo+0.032
ades+0.031
eds+0.029
ules+0.029
wen+0.029
Neg logit contributions
 don-0.019
 Are-0.017
 blame-0.016
 worry-0.015
 give-0.015
Tokenjoy
Token ID2633
Feature activation+7.17e+00

Pos logit contributions
ogo+0.016
ades+0.014
wen+0.014
ules+0.014
eds+0.014
Neg logit contributions
 don-0.021
 Are-0.019
 blame-0.019
 worry-0.018
 girl-0.018
Tokened
Token ID276
Feature activation-1.42e+00

Pos logit contributions
ogo+0.634
ades+0.603
ules+0.594
wen+0.591
eds+0.591
Neg logit contributions
 don-0.303
 Are-0.262
 blame-0.246
 worry-0.215
 girl-0.213
Token that
Token ID326
Feature activation+3.17e-03

Pos logit contributions
 don+0.062
 Are+0.053
 blame+0.049
 girl+0.045
 do+0.045
Neg logit contributions
ogo-0.125
ades-0.119
wen-0.117
ules-0.117
eds-0.116
Token he
Token ID339
Feature activation+4.85e-01

Pos logit contributions
ogo+0.000
ules+0.000
Andrew+0.000
eds+0.000
wen+0.000
Neg logit contributions
 don-0.000
 blame-0.000
 Are-0.000
 weep-0.000
 worry-0.000
Token had
Token ID550
Feature activation+3.50e-01

Pos logit contributions
ogo+0.041
ades+0.038
eds+0.038
wen+0.038
ules+0.038
Neg logit contributions
 don-0.021
 blame-0.019
 Are-0.019
 worry-0.016
 cry-0.015
Token found
Token ID1043
Feature activation+1.17e+00

Pos logit contributions
ogo+0.030
ades+0.028
ules+0.028
wen+0.028
eds+0.027
Neg logit contributions
 don-0.017
 Are-0.014
 blame-0.013
 do-0.013
 worry-0.012
Token happy
Token ID3772
Feature activation+2.39e-01

Pos logit contributions
ogo+0.142
ades+0.135
ules+0.134
eds+0.132
wen+0.131
Neg logit contributions
 don-0.099
 Are-0.090
 blame-0.087
 girl-0.076
 worry-0.076
Token.
Token ID13
Feature activation-2.01e-02

Pos logit contributions
ogo+0.018
ades+0.018
ules+0.017
eds+0.017
wen+0.017
Neg logit contributions
 don-0.012
 blame-0.011
 Are-0.011
 worry-0.009
 weep-0.009
Token She
Token ID1375
Feature activation+1.47e-02

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 Lily+0.001
 girl+0.000
Neg logit contributions
ogo-0.002
ades-0.002
eds-0.002
wen-0.002
ules-0.002
Token was
Token ID373
Feature activation+2.65e-01

Pos logit contributions
ogo+0.001
ades+0.001
ules+0.001
eds+0.001
wen+0.001
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 do-0.000
 worry-0.000
Token over
Token ID625
Feature activation+1.27e-01

Pos logit contributions
ogo+0.021
ades+0.020
eds+0.019
ules+0.019
wen+0.019
Neg logit contributions
 don-0.014
 Are-0.012
 blame-0.012
 worry-0.011
 do-0.011
Tokenjoy
Token ID2633
Feature activation+7.15e+00

Pos logit contributions
ogo+0.006
ades+0.005
ules+0.005
eds+0.005
wen+0.005
Neg logit contributions
 don-0.011
 Are-0.010
 blame-0.010
 worry-0.009
 cry-0.009
Tokened
Token ID276
Feature activation-1.46e+00

Pos logit contributions
ogo+0.606
ades+0.575
ules+0.572
wen+0.568
eds+0.567
Neg logit contributions
 don-0.323
 Are-0.283
 blame-0.277
 worry-0.240
 girl-0.231
Token that
Token ID326
Feature activation-1.54e-01

Pos logit contributions
 don+0.060
 Are+0.053
 blame+0.052
 worry+0.043
 girl+0.042
Neg logit contributions
ogo-0.128
ades-0.122
ules-0.121
wen-0.120
eds-0.120
Token her
Token ID607
Feature activation+2.25e-01

Pos logit contributions
 don+0.006
 blame+0.006
 Are+0.006
 weep+0.005
 Sorry+0.004
Neg logit contributions
ogo-0.013
ules-0.013
wen-0.012
Andrew-0.012
eds-0.012
Token pet
Token ID4273
Feature activation+1.73e+00

Pos logit contributions
ogo+0.016
wen+0.015
ades+0.015
ules+0.015
eds+0.014
Neg logit contributions
 don-0.013
 Are-0.012
 blame-0.011
 Sorry-0.010
 weep-0.010
Token was
Token ID373
Feature activation+2.78e-01

Pos logit contributions
ogo+0.141
ules+0.132
ades+0.131
wen+0.131
eds+0.129
Neg logit contributions
 don-0.080
 Are-0.074
 blame-0.072
 worry-0.062
 weep-0.061
Token\n
Token ID198
Feature activation+6.62e-01

Pos logit contributions
ogo+0.067
eds+0.064
urger+0.063
ules+0.062
osaurus+0.062
Neg logit contributions
 don-0.018
 Lily-0.015
 Are-0.014
 do-0.012
 Tim-0.011
TokenM
Token ID44
Feature activation+2.12e+00

Pos logit contributions
ogo+0.047
ules+0.041
eds+0.041
opl+0.040
wen+0.040
Neg logit contributions
 don-0.041
 Are-0.038
 blame-0.034
 girl-0.034
 do-0.034
Tokenia
Token ID544
Feature activation+3.75e+00

Pos logit contributions
ogo+0.093
eds+0.090
Andrew+0.084
ules+0.084
lor+0.081
Neg logit contributions
 don-0.174
 blame-0.160
 Are-0.154
 do-0.151
 girl-0.149
Token was
Token ID373
Feature activation+4.69e-01

Pos logit contributions
ogo+0.327
ades+0.313
wen+0.308
ules+0.308
eds+0.305
Neg logit contributions
 don-0.156
 Are-0.138
 blame-0.131
 worry-0.113
 girl-0.107
Token over
Token ID625
Feature activation+2.87e-01

Pos logit contributions
ogo+0.036
ades+0.034
ules+0.034
eds+0.034
wen+0.033
Neg logit contributions
 don-0.025
 Are-0.022
 blame-0.022
 worry-0.020
 do-0.019
Tokenjoy
Token ID2633
Feature activation+7.15e+00

Pos logit contributions
ogo+0.010
ades+0.010
ules+0.010
wen+0.009
Andrew+0.009
Neg logit contributions
 don-0.026
 blame-0.024
 Are-0.024
 cry-0.022
 weep-0.022
Tokened
Token ID276
Feature activation-1.40e+00

Pos logit contributions
ogo+0.507
ules+0.499
wen+0.493
ades+0.487
us+0.486
Neg logit contributions
 don-0.393
 blame-0.368
 Are-0.353
 worry-0.320
 weep-0.316
Token and
Token ID290
Feature activation+1.75e-02

Pos logit contributions
 don+0.065
 Are+0.060
 blame+0.060
 weep+0.049
 worry+0.049
Neg logit contributions
ogo-0.113
ules-0.108
ades-0.107
wen-0.107
eds-0.106
Token thanked
Token ID26280
Feature activation-4.93e-01

Pos logit contributions
ogo+0.001
ules+0.001
wen+0.001
ades+0.001
eds+0.001
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 worry-0.001
 weep-0.001
Token her
Token ID607
Feature activation+3.11e-01

Pos logit contributions
 don+0.023
 Are+0.020
 blame+0.019
 worry+0.017
 girl+0.016
Neg logit contributions
ogo-0.041
ades-0.039
ules-0.039
wen-0.038
eds-0.038
Token friend
Token ID1545
Feature activation+2.89e-01

Pos logit contributions
ogo+0.025
ades+0.024
ules+0.024
wen+0.024
eds+0.023
Neg logit contributions
 don-0.015
 Are-0.014
 blame-0.013
 worry-0.012
 girl-0.011
Token\n
Token ID198
Feature activation+5.23e-01

Pos logit contributions
ogo+0.048
ules+0.045
eds+0.045
wen+0.044
ades+0.044
Neg logit contributions
 don-0.023
 Are-0.021
 Lily-0.018
 do-0.018
 blame-0.017
TokenCl
Token ID2601
Feature activation-6.53e-01

Pos logit contributions
ogo+0.035
ules+0.032
eds+0.032
wen+0.031
ades+0.030
Neg logit contributions
 don-0.035
 Are-0.032
 blame-0.029
 girl-0.029
 do-0.029
Tokenaire
Token ID7626
Feature activation+1.45e+00

Pos logit contributions
 don+0.045
 Are+0.041
 blame+0.041
 worry+0.037
 cry+0.037
Neg logit contributions
ogo-0.039
ades-0.037
eds-0.036
wen-0.036
ules-0.036
Token was
Token ID373
Feature activation+3.80e-01

Pos logit contributions
ogo+0.128
ades+0.122
ules+0.121
wen+0.120
eds+0.120
Neg logit contributions
 don-0.060
 Are-0.052
 blame-0.050
 worry-0.043
 girl-0.042
Token over
Token ID625
Feature activation+1.77e-01

Pos logit contributions
ogo+0.029
ades+0.027
ules+0.027
wen+0.027
eds+0.027
Neg logit contributions
 don-0.020
 Are-0.018
 blame-0.018
 worry-0.016
 girl-0.016
Tokenjoy
Token ID2633
Feature activation+7.12e+00

Pos logit contributions
ogo+0.006
ules+0.006
ades+0.006
wen+0.006
eds+0.006
Neg logit contributions
 don-0.016
 blame-0.015
 Are-0.015
 weep-0.014
 cry-0.014
Tokened
Token ID276
Feature activation-1.44e+00

Pos logit contributions
ogo+0.520
ules+0.504
wen+0.502
lor+0.500
ades+0.498
Neg logit contributions
 don-0.375
 blame-0.351
 Are-0.335
 worry-0.301
 weep-0.296
Token -
Token ID532
Feature activation+7.53e-01

Pos logit contributions
 don+0.065
 Are+0.060
 blame+0.060
 weep+0.049
 worry+0.048
Neg logit contributions
ogo-0.117
ules-0.111
wen-0.110
ades-0.110
eds-0.110
Token her
Token ID607
Feature activation+2.25e-01

Pos logit contributions
ogo+0.058
ules+0.057
wen+0.057
lor+0.056
Andrew+0.055
Neg logit contributions
 blame-0.030
 don-0.029
 Are-0.027
 weep-0.022
 worry-0.021
Token mug
Token ID25152
Feature activation+2.49e-02

Pos logit contributions
ogo+0.014
ades+0.014
wen+0.014
eds+0.013
ules+0.013
Neg logit contributions
 don-0.014
 Are-0.013
 blame-0.012
 Sorry-0.011
ter-0.011
Token had
Token ID550
Feature activation+3.71e-01

Pos logit contributions
ogo+0.002
ades+0.002
wen+0.002
eds+0.002
ules+0.002
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 worry-0.001
 weep-0.001
TokenThe
Token ID464
Feature activation+7.67e-02

Pos logit contributions
ogo+0.046
eds+0.043
wen+0.042
ades+0.041
ules+0.040
Neg logit contributions
 don-0.041
 Are-0.036
 blame-0.034
 do-0.034
 Lily-0.031
Token little
Token ID1310
Feature activation+1.88e-02

Pos logit contributions
ogo+0.005
ades+0.005
ules+0.005
wen+0.005
Andrew+0.005
Neg logit contributions
 don-0.005
 Are-0.004
 blame-0.004
 weep-0.004
 Sorry-0.004
Token girl
Token ID2576
Feature activation-2.26e+00

Pos logit contributions
is+0.001
asper+0.001
ules+0.001
ups+0.001
oki+0.001
Neg logit contributions
 don-0.001
 blame-0.001
 Sorry-0.001
 die-0.001
 Are-0.001
Token was
Token ID373
Feature activation+3.59e-01

Pos logit contributions
 don+0.088
 Are+0.076
 blame+0.072
 worry+0.061
 girl+0.059
Neg logit contributions
ogo-0.205
ades-0.195
ules-0.193
wen-0.192
eds-0.192
Token over
Token ID625
Feature activation+1.59e-01

Pos logit contributions
ogo+0.031
ades+0.030
eds+0.029
wen+0.028
lor+0.028
Neg logit contributions
 don-0.017
 Are-0.014
 blame-0.014
 worry-0.013
 do-0.013
Tokenjoy
Token ID2633
Feature activation+7.10e+00

Pos logit contributions
ogo+0.008
ades+0.007
ules+0.007
wen+0.007
eds+0.007
Neg logit contributions
 don-0.012
 Are-0.012
 blame-0.012
 worry-0.011
 weep-0.010
Tokened
Token ID276
Feature activation-1.49e+00

Pos logit contributions
ogo+0.549
ules+0.532
ades+0.523
wen+0.520
eds+0.519
Neg logit contributions
 don-0.359
 blame-0.324
 Are-0.319
 worry-0.283
 weep-0.273
Token and
Token ID290
Feature activation-2.88e-02

Pos logit contributions
 don+0.069
 Are+0.064
 blame+0.062
 worry+0.052
 weep+0.052
Neg logit contributions
ogo-0.121
ules-0.116
ades-0.115
wen-0.114
eds-0.113
Token she
Token ID673
Feature activation+4.19e-01

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 worry+0.001
 weep+0.001
Neg logit contributions
ogo-0.002
ules-0.002
wen-0.002
eds-0.002
ades-0.002
Token thanked
Token ID26280
Feature activation-5.44e-01

Pos logit contributions
ogo+0.040
ades+0.038
eds+0.037
wen+0.037
ules+0.037
Neg logit contributions
 don-0.015
 Are-0.013
 blame-0.012
 do-0.010
 girl-0.010
Token her
Token ID607
Feature activation+2.18e-01

Pos logit contributions
 don+0.023
 Are+0.022
 blame+0.020
 weep+0.017
 Sorry+0.017
Neg logit contributions
ogo-0.046
ules-0.044
wen-0.043
ades-0.042
eds-0.042
Token and
Token ID290
Feature activation-8.51e-02

Pos logit contributions
 don+0.024
 Are+0.019
 blame+0.017
 girl+0.016
 do+0.016
Neg logit contributions
ogo-0.057
ades-0.054
wen-0.053
eds-0.053
ules-0.053
Token Tim
Token ID5045
Feature activation-4.85e-01

Pos logit contributions
 Are+0.004
 don+0.003
 blame+0.003
ter+0.003
 weep+0.003
Neg logit contributions
Andrew-0.006
ogo-0.006
ades-0.006
wen-0.006
ules-0.006
Tokenmy
Token ID1820
Feature activation+8.33e-01

Pos logit contributions
 don+0.024
 Are+0.020
 blame+0.019
 worry+0.018
 girl+0.018
Neg logit contributions
ogo-0.040
ades-0.039
ules-0.038
wen-0.037
eds-0.037
Token was
Token ID373
Feature activation+3.41e-01

Pos logit contributions
ogo+0.075
ades+0.071
ules+0.070
wen+0.070
eds+0.070
Neg logit contributions
 don-0.034
 Are-0.029
 blame-0.027
 worry-0.024
 girl-0.023
Token over
Token ID625
Feature activation+2.31e-01

Pos logit contributions
ogo+0.028
ades+0.027
eds+0.026
ules+0.026
wen+0.026
Neg logit contributions
 don-0.017
 Are-0.015
 blame-0.014
 worry-0.013
 do-0.013
Tokenjoy
Token ID2633
Feature activation+7.10e+00

Pos logit contributions
ogo+0.015
ades+0.014
wen+0.014
ules+0.014
eds+0.014
Neg logit contributions
 don-0.015
 Are-0.014
 blame-0.013
 worry-0.013
 girl-0.012
Tokened
Token ID276
Feature activation-1.43e+00

Pos logit contributions
ogo+0.640
ades+0.608
wen+0.597
ules+0.596
eds+0.595
Neg logit contributions
 don-0.290
 Are-0.247
 blame-0.228
 girl-0.201
 worry-0.199
Token.
Token ID13
Feature activation-4.76e-02

Pos logit contributions
 don+0.063
 Are+0.054
 blame+0.051
 worry+0.046
 girl+0.045
Neg logit contributions
ogo-0.124
ades-0.117
ules-0.116
wen-0.116
eds-0.115
Token He
Token ID679
Feature activation-1.74e-01

Pos logit contributions
 don+0.002
 Are+0.002
 blame+0.002
 Lily+0.002
 girl+0.002
Neg logit contributions
ogo-0.004
ades-0.004
eds-0.004
wen-0.004
ules-0.004
Token and
Token ID290
Feature activation-7.50e-02

Pos logit contributions
 don+0.007
 Are+0.006
 blame+0.005
 do+0.005
 girl+0.004
Neg logit contributions
ogo-0.016
ades-0.015
ules-0.015
eds-0.015
wen-0.015
Token his
Token ID465
Feature activation+3.78e-01

Pos logit contributions
 don+0.006
 Are+0.005
 blame+0.005
 want+0.005
 like+0.004
Neg logit contributions
ogo-0.004
eds-0.003
ules-0.003
wen-0.003
Andrew-0.003
Token all
Token ID477
Feature activation+2.33e-01

Pos logit contributions
ogo+0.128
ades+0.121
ules+0.119
eds+0.118
wen+0.118
Neg logit contributions
 don-0.094
 Are-0.085
 blame-0.082
 worry-0.074
 girl-0.073
Token."
Token ID526
Feature activation+7.30e-01

Pos logit contributions
ogo+0.018
ades+0.017
ules+0.017
wen+0.017
eds+0.017
Neg logit contributions
 don-0.013
 Are-0.011
 blame-0.011
 worry-0.010
 girl-0.010
Token Lucy
Token ID22162
Feature activation+2.95e-01

Pos logit contributions
ogo+0.054
wen+0.051
eds+0.051
ades+0.051
lor+0.051
Neg logit contributions
 don-0.037
 blame-0.031
 Are-0.030
 worry-0.028
 want-0.027
Token was
Token ID373
Feature activation+3.41e-01

Pos logit contributions
ogo+0.025
ades+0.024
wen+0.024
ules+0.024
eds+0.024
Neg logit contributions
 don-0.013
 Are-0.011
 blame-0.011
 worry-0.009
 girl-0.009
Token over
Token ID625
Feature activation+2.22e-01

Pos logit contributions
ogo+0.024
ades+0.023
ules+0.023
wen+0.022
eds+0.022
Neg logit contributions
 don-0.018
 Are-0.016
 blame-0.015
 Sorry-0.014
 girl-0.014
Tokenjoy
Token ID2633
Feature activation+7.10e+00

Pos logit contributions
ogo+0.006
ules+0.006
Andrew+0.006
ades+0.005
eds+0.005
Neg logit contributions
 don-0.021
 blame-0.020
 Are-0.020
 Sorry-0.019
 weep-0.018
Tokened
Token ID276
Feature activation-1.44e+00

Pos logit contributions
ogo+0.546
ules+0.523
ades+0.518
wen+0.517
eds+0.517
Neg logit contributions
 don-0.359
 blame-0.328
 Are-0.322
 worry-0.282
 weep-0.272
Token and
Token ID290
Feature activation-1.98e-02

Pos logit contributions
 don+0.074
 Are+0.071
 blame+0.071
 Sorry+0.060
 weep+0.058
Neg logit contributions
ogo-0.106
ules-0.101
eds-0.099
wen-0.099
ades-0.099
Token quickly
Token ID2952
Feature activation+2.66e+00

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 worry+0.001
 girl+0.001
Neg logit contributions
ogo-0.001
ules-0.001
wen-0.001
eds-0.001
ades-0.001
Token grabbed
Token ID13176
Feature activation+1.42e+00

Pos logit contributions
ogo+0.229
ades+0.218
ules+0.215
wen+0.214
eds+0.214
Neg logit contributions
 don-0.115
 Are-0.102
 blame-0.097
 worry-0.084
 girl-0.082
Token the
Token ID262
Feature activation+3.21e-02

Pos logit contributions
ogo+0.122
ades+0.116
ules+0.116
wen+0.115
eds+0.114
Neg logit contributions
 don-0.058
 Are-0.054
 blame-0.049
 worry-0.042
 weep-0.041
TokenThe
Token ID464
Feature activation+4.64e-02

Pos logit contributions
ogo+0.046
eds+0.044
wen+0.043
ules+0.043
alks+0.041
Neg logit contributions
 don-0.031
 Are-0.028
 do-0.026
No-0.025
Thank-0.024
Token bald
Token ID28273
Feature activation+9.93e-01

Pos logit contributions
ogo+0.003
ades+0.003
ules+0.003
wen+0.003
eds+0.003
Neg logit contributions
 don-0.003
 Are-0.003
 blame-0.003
 worry-0.003
 girl-0.003
Token man
Token ID582
Feature activation+3.62e-01

Pos logit contributions
ogo+0.081
ades+0.077
ules+0.076
wen+0.076
eds+0.075
Neg logit contributions
 don-0.049
 Are-0.043
 blame-0.042
 worry-0.037
 girl-0.035
Token was
Token ID373
Feature activation+3.39e-01

Pos logit contributions
ogo+0.028
ades+0.027
eds+0.027
wen+0.027
ules+0.027
Neg logit contributions
 don-0.017
 Are-0.016
 blame-0.015
 Sorry-0.013
 worry-0.013
Token over
Token ID625
Feature activation+2.21e-01

Pos logit contributions
ogo+0.027
ades+0.025
ules+0.025
wen+0.025
eds+0.025
Neg logit contributions
 don-0.018
 Are-0.016
 blame-0.015
 worry-0.014
 girl-0.013
Tokenjoy
Token ID2633
Feature activation+7.10e+00

Pos logit contributions
ogo+0.010
ades+0.009
ules+0.009
eds+0.009
wen+0.009
Neg logit contributions
 don-0.018
 Are-0.017
 blame-0.017
 worry-0.016
 cry-0.015
Tokened
Token ID276
Feature activation-1.46e+00

Pos logit contributions
ogo+0.545
ules+0.523
eds+0.520
ades+0.516
wen+0.513
Neg logit contributions
 don-0.363
 blame-0.334
 Are-0.322
 worry-0.290
 weep-0.279
Token.
Token ID13
Feature activation-3.08e-02

Pos logit contributions
 don+0.063
 Are+0.056
 blame+0.055
 worry+0.046
 weep+0.045
Neg logit contributions
ogo-0.126
ades-0.119
ules-0.119
eds-0.118
wen-0.118
Token He
Token ID679
Feature activation-1.14e-01

Pos logit contributions
 don+0.002
 Are+0.001
 blame+0.001
 girl+0.001
 Lily+0.001
Neg logit contributions
ogo-0.003
ades-0.002
wen-0.002
eds-0.002
ules-0.002
Token thanked
Token ID26280
Feature activation-5.29e-01

Pos logit contributions
 don+0.005
 Are+0.004
 blame+0.004
 worry+0.003
 girl+0.003
Neg logit contributions
ogo-0.010
ades-0.010
ules-0.010
eds-0.010
wen-0.010
Token the
Token ID262
Feature activation+6.37e-02

Pos logit contributions
 don+0.019
 Are+0.017
 blame+0.016
 worry+0.013
 girl+0.012
Neg logit contributions
ogo-0.049
ades-0.047
ules-0.047
eds-0.046
wen-0.046
Token\n
Token ID198
Feature activation+6.32e-01

Pos logit contributions
ogo+0.175
eds+0.167
ades+0.167
ules+0.167
wen+0.162
Neg logit contributions
 don-0.094
 Are-0.080
 Lily-0.071
 blame-0.070
 do-0.065
Token\n
Token ID198
Feature activation+6.26e-01

Pos logit contributions
ogo+0.053
ules+0.051
ades+0.050
wen+0.050
eds+0.050
Neg logit contributions
 don-0.029
 Are-0.026
 blame-0.023
 Lily-0.023
 do-0.023
TokenAnna
Token ID31160
Feature activation+9.82e-01

Pos logit contributions
ogo+0.041
ades+0.038
ules+0.038
wen+0.037
eds+0.037
Neg logit contributions
 don-0.041
 Are-0.037
 blame-0.036
 worry-0.033
 girl-0.033
Token was
Token ID373
Feature activation+3.88e-01

Pos logit contributions
ogo+0.084
ades+0.080
ules+0.079
eds+0.079
wen+0.079
Neg logit contributions
 don-0.044
 Are-0.038
 blame-0.037
 worry-0.032
 girl-0.031
Token over
Token ID625
Feature activation+2.71e-01

Pos logit contributions
ogo+0.027
ules+0.025
wen+0.025
ades+0.025
eds+0.025
Neg logit contributions
 don-0.023
 Are-0.021
 blame-0.020
 girl-0.018
 worry-0.018
Tokenjoy
Token ID2633
Feature activation+7.09e+00

Pos logit contributions
ogo+0.008
ules+0.008
ades+0.008
wen+0.008
Andrew+0.007
Neg logit contributions
 don-0.025
 blame-0.024
 Are-0.024
 Sorry-0.022
 weep-0.022
Tokened
Token ID276
Feature activation-1.45e+00

Pos logit contributions
ogo+0.501
ules+0.488
eds+0.481
wen+0.478
us+0.476
Neg logit contributions
 don-0.398
 blame-0.372
 Are-0.352
 worry-0.327
 weep-0.313
Token.
Token ID13
Feature activation-9.66e-03

Pos logit contributions
 don+0.071
 blame+0.067
 Are+0.067
 weep+0.056
 Sorry+0.055
Neg logit contributions
ogo-0.112
ules-0.107
wen-0.106
eds-0.105
ades-0.104
Token She
Token ID1375
Feature activation-1.62e-02

Pos logit contributions
 don+0.000
 blame+0.000
 Are+0.000
 worry+0.000
 weep+0.000
Neg logit contributions
ogo-0.001
ules-0.001
wen-0.001
ades-0.001
eds-0.001
Token had
Token ID550
Feature activation+3.44e-01

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 do+0.000
 worry+0.000
Neg logit contributions
ogo-0.001
ades-0.001
ules-0.001
eds-0.001
wen-0.001
Token guessed
Token ID25183
Feature activation+3.88e-01

Pos logit contributions
ogo+0.033
ades+0.032
ules+0.031
eds+0.030
wen+0.030
Neg logit contributions
 don-0.013
 Are-0.010
 Lily-0.010
 do-0.010
 give-0.010
Token\n
Token ID198
Feature activation+6.04e-01

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 Lily+0.001
 boy+0.001
Neg logit contributions
ogo-0.002
eds-0.002
ades-0.002
wen-0.002
ules-0.002
Token\n
Token ID198
Feature activation+6.04e-01

Pos logit contributions
ogo+0.065
eds+0.063
ules+0.061
urger+0.061
lor+0.059
Neg logit contributions
 don-0.014
 Are-0.012
 Lily-0.010
 do-0.010
 with-0.010
TokenJohn
Token ID7554
Feature activation+1.46e+00

Pos logit contributions
ogo+0.047
eds+0.042
ules+0.040
lor+0.039
wen+0.039
Neg logit contributions
 don-0.036
 Are-0.035
 blame-0.031
 do-0.031
No-0.030
Token was
Token ID373
Feature activation+2.46e-01

Pos logit contributions
ogo+0.132
ades+0.126
ules+0.125
wen+0.124
eds+0.124
Neg logit contributions
 don-0.057
 Are-0.049
 blame-0.047
 worry-0.040
 girl-0.038
Token over
Token ID625
Feature activation+1.70e-01

Pos logit contributions
ogo+0.018
ades+0.017
ules+0.017
wen+0.016
eds+0.016
Neg logit contributions
 don-0.014
 Are-0.013
 blame-0.012
 girl-0.011
 worry-0.011
Tokenjoy
Token ID2633
Feature activation+7.09e+00

Pos logit contributions
ogo+0.006
ades+0.006
ules+0.006
Andrew+0.006
eds+0.006
Neg logit contributions
 don-0.015
 blame-0.014
 Are-0.014
 weep-0.013
 cry-0.013
Tokened
Token ID276
Feature activation-1.49e+00

Pos logit contributions
ogo+0.578
ules+0.550
ades+0.549
wen+0.544
eds+0.544
Neg logit contributions
 don-0.339
 Are-0.297
 blame-0.297
 worry-0.257
 weep-0.246
Token and
Token ID290
Feature activation-1.06e-01

Pos logit contributions
 don+0.071
 Are+0.063
 blame+0.062
 worry+0.053
 weep+0.052
Neg logit contributions
ogo-0.122
ades-0.115
ules-0.115
eds-0.114
wen-0.114
Token asked
Token ID1965
Feature activation-2.28e+00

Pos logit contributions
 don+0.005
 Are+0.005
 blame+0.005
 worry+0.004
 weep+0.004
Neg logit contributions
ogo-0.008
ules-0.008
ades-0.008
eds-0.007
wen-0.007
Token the
Token ID262
Feature activation+2.88e-02

Pos logit contributions
 don+0.089
 Are+0.082
 blame+0.081
 weep+0.071
 Sorry+0.069
Neg logit contributions
ogo-0.186
ules-0.181
ades-0.178
us-0.175
Andrew-0.174
Token woman
Token ID2415
Feature activation-3.68e-01

Pos logit contributions
ogo+0.002
ades+0.002
ules+0.002
eds+0.002
wen+0.002
Neg logit contributions
 don-0.002
 Are-0.001
 blame-0.001
 worry-0.001
 girl-0.001
Token win
Token ID1592
Feature activation+1.24e+00

Pos logit contributions
 don+0.003
 Are+0.002
 blame+0.002
 worry+0.002
 weep+0.002
Neg logit contributions
ogo-0.003
ades-0.003
ules-0.003
wen-0.003
eds-0.003
Token.
Token ID13
Feature activation-9.69e-02

Pos logit contributions
ogo+0.102
ades+0.096
ules+0.095
eds+0.094
wen+0.094
Neg logit contributions
 don-0.061
 Are-0.052
 blame-0.050
 worry-0.045
 do-0.045
Token Jane
Token ID12091
Feature activation+2.43e+00

Pos logit contributions
 don+0.005
 Are+0.004
 blame+0.004
 Lily+0.004
 girl+0.004
Neg logit contributions
ogo-0.008
ades-0.007
eds-0.007
ules-0.007
wen-0.007
Token was
Token ID373
Feature activation+2.83e-01

Pos logit contributions
ogo+0.223
ades+0.213
ules+0.210
eds+0.208
wen+0.208
Neg logit contributions
 don-0.095
 Are-0.080
 blame-0.075
 worry-0.065
 girl-0.065
Token over
Token ID625
Feature activation+1.47e-01

Pos logit contributions
ogo+0.022
ades+0.020
ules+0.020
wen+0.020
eds+0.020
Neg logit contributions
 don-0.015
 Are-0.014
 blame-0.013
 worry-0.012
 girl-0.012
Tokenjoy
Token ID2633
Feature activation+7.08e+00

Pos logit contributions
ogo+0.006
ades+0.006
ules+0.006
wen+0.006
eds+0.006
Neg logit contributions
 don-0.013
 Are-0.012
 blame-0.012
 worry-0.011
 weep-0.011
Tokened
Token ID276
Feature activation-1.50e+00

Pos logit contributions
ogo+0.561
ules+0.536
wen+0.534
ades+0.534
eds+0.528
Neg logit contributions
 don-0.345
 blame-0.310
 Are-0.307
 worry-0.267
 weep-0.258
Token!
Token ID0
Feature activation+2.18e+00

Pos logit contributions
 don+0.067
 Are+0.060
 blame+0.059
 worry+0.049
 weep+0.049
Neg logit contributions
ogo-0.125
ades-0.119
ules-0.118
wen-0.117
eds-0.117
Token<|endoftext|>
Token ID50256
Feature activation-1.87e+00

Pos logit contributions
ogo+0.184
ades+0.175
eds+0.171
ules+0.169
wen+0.169
Neg logit contributions
 don-0.106
 Are-0.097
 blame-0.086
 Lily-0.079
 girl-0.079
TokenOnce
Token ID7454
Feature activation+8.24e-01

Pos logit contributions
 don+0.106
 Are+0.097
 blame+0.090
 girl+0.083
 do+0.082
Neg logit contributions
ogo-0.137
ades-0.129
ules-0.127
wen-0.127
eds-0.126
Token upon
Token ID2402
Feature activation-4.96e-01

Pos logit contributions
ogo+0.064
ades+0.062
eds+0.061
lor+0.061
wen+0.060
Neg logit contributions
 don-0.045
 Are-0.036
 blame-0.034
 Lily-0.034
 worry-0.033
Token rock
Token ID3881
Feature activation+8.76e-01

Pos logit contributions
ogo+0.063
ades+0.062
lor+0.061
wen+0.060
eds+0.059
Neg logit contributions
 don-0.053
 blame-0.051
 Are-0.051
 Sorry-0.042
 worry-0.041
Token.
Token ID13
Feature activation+8.64e-02

Pos logit contributions
ogo+0.071
ades+0.068
eds+0.067
wen+0.066
ules+0.066
Neg logit contributions
 don-0.042
 Are-0.038
 blame-0.037
 worry-0.032
 girl-0.031
Token Lily
Token ID20037
Feature activation-1.82e-01

Pos logit contributions
ogo+0.008
ades+0.007
ules+0.007
wen+0.007
eds+0.007
Neg logit contributions
 don-0.003
 blame-0.003
 Are-0.003
 worry-0.003
 weep-0.002
Token was
Token ID373
Feature activation+3.50e-01

Pos logit contributions
 don+0.007
 Are+0.005
 blame+0.005
 worry+0.004
 do+0.004
Neg logit contributions
ogo-0.017
ades-0.016
ules-0.016
wen-0.016
eds-0.016
Token over
Token ID625
Feature activation+1.59e-01

Pos logit contributions
ogo+0.027
ades+0.025
ules+0.025
wen+0.025
eds+0.025
Neg logit contributions
 don-0.019
 Are-0.017
 blame-0.016
 worry-0.014
 girl-0.014
Tokenjoy
Token ID2633
Feature activation+7.07e+00

Pos logit contributions
ogo+0.006
ades+0.005
ules+0.005
wen+0.005
Andrew+0.005
Neg logit contributions
 don-0.014
 blame-0.014
 Are-0.013
 cry-0.012
 weep-0.012
Tokened
Token ID276
Feature activation-1.49e+00

Pos logit contributions
ogo+0.557
ules+0.533
ades+0.528
wen+0.527
eds+0.526
Neg logit contributions
 don-0.353
 blame-0.318
 Are-0.312
 worry-0.275
 weep-0.264
Token and
Token ID290
Feature activation-7.77e-02

Pos logit contributions
 don+0.071
 Are+0.065
 blame+0.065
 worry+0.054
 weep+0.053
Neg logit contributions
ogo-0.120
ules-0.113
ades-0.113
eds-0.112
wen-0.112
Token knew
Token ID2993
Feature activation+2.58e+00

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
 worry+0.003
 girl+0.003
Neg logit contributions
ogo-0.006
ules-0.005
ades-0.005
eds-0.005
wen-0.005
Token it
Token ID340
Feature activation-1.39e-01

Pos logit contributions
ogo+0.228
ades+0.218
wen+0.217
ules+0.216
lor+0.211
Neg logit contributions
 don-0.101
 Are-0.092
 blame-0.089
 weep-0.074
 cry-0.072
Token was
Token ID373
Feature activation+3.62e-01

Pos logit contributions
 don+0.006
 Are+0.006
 blame+0.006
 worry+0.005
 weep+0.005
Neg logit contributions
ogo-0.011
wen-0.011
ades-0.011
eds-0.011
ules-0.011
TokenThe
Token ID464
Feature activation-1.27e-02

Pos logit contributions
ogo+0.043
ules+0.039
eds+0.039
wen+0.039
ades+0.037
Neg logit contributions
 don-0.030
 Are-0.028
 do-0.025
 blame-0.025
Yes-0.024
Token oct
Token ID19318
Feature activation+1.22e+00

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 worry+0.001
 weep+0.001
Neg logit contributions
ogo-0.001
ades-0.001
ules-0.001
eds-0.001
wen-0.001
Tokenopus
Token ID25790
Feature activation+1.60e+00

Pos logit contributions
ogo+0.064
eds+0.059
wen+0.059
Andrew+0.056
ules+0.056
Neg logit contributions
 don-0.090
 Are-0.084
 blame-0.078
 do-0.076
 boy-0.076
Token was
Token ID373
Feature activation+3.08e-01

Pos logit contributions
ogo+0.135
ades+0.128
eds+0.126
wen+0.126
ules+0.126
Neg logit contributions
 don-0.070
 Are-0.064
 blame-0.061
 worry-0.051
 cry-0.049
Token over
Token ID625
Feature activation+1.72e-01

Pos logit contributions
ogo+0.026
ades+0.024
ules+0.024
eds+0.024
wen+0.024
Neg logit contributions
 don-0.015
 Are-0.013
 blame-0.013
 worry-0.011
 do-0.011
Tokenjoy
Token ID2633
Feature activation+7.05e+00

Pos logit contributions
ogo+0.008
ades+0.008
ules+0.007
eds+0.007
wen+0.007
Neg logit contributions
 don-0.014
 Are-0.013
 blame-0.013
 worry-0.012
 girl-0.012
Tokened
Token ID276
Feature activation-1.47e+00

Pos logit contributions
ogo+0.589
ades+0.560
ules+0.557
wen+0.553
eds+0.552
Neg logit contributions
 don-0.326
 Are-0.287
 blame-0.283
 worry-0.245
 weep-0.235
Token,
Token ID11
Feature activation-8.04e-01

Pos logit contributions
 don+0.064
 Are+0.057
 blame+0.056
 worry+0.047
 weep+0.045
Neg logit contributions
ogo-0.126
ades-0.119
ules-0.119
eds-0.118
wen-0.117
Token and
Token ID290
Feature activation-8.72e-02

Pos logit contributions
 don+0.038
 Are+0.034
 blame+0.034
 worry+0.029
 girl+0.028
Neg logit contributions
ogo-0.064
ules-0.061
ades-0.060
eds-0.060
wen-0.060
Token he
Token ID339
Feature activation+3.47e-01

Pos logit contributions
 don+0.004
 blame+0.004
 Are+0.004
 worry+0.003
 weep+0.003
Neg logit contributions
ogo-0.007
ules-0.006
eds-0.006
ades-0.006
wen-0.006
Token thanked
Token ID26280
Feature activation-5.81e-01

Pos logit contributions
ogo+0.030
ades+0.028
ules+0.028
wen+0.028
eds+0.028
Neg logit contributions
 don-0.015
 Are-0.014
 blame-0.013
 worry-0.011
 girl-0.011
TokenS
Token ID50
Feature activation-4.95e-02

Pos logit contributions
ogo+0.042
eds+0.040
wen+0.039
ades+0.038
osaurus+0.037
Neg logit contributions
 don-0.024
 Are-0.021
 Lily-0.019
Yes-0.018
Thank-0.018
Tokenally
Token ID453
Feature activation+3.69e+00

Pos logit contributions
 don+0.004
 blame+0.003
 Are+0.003
 girl+0.003
 worry+0.003
Neg logit contributions
ogo-0.003
eds-0.003
wen-0.003
ules-0.003
ades-0.003
Token was
Token ID373
Feature activation+1.88e-01

Pos logit contributions
ogo+0.340
ades+0.327
ules+0.321
eds+0.318
wen+0.315
Neg logit contributions
 don-0.142
 Are-0.114
 blame-0.109
 girl-0.098
 do-0.097
Token so
Token ID523
Feature activation+5.68e-01

Pos logit contributions
ogo+0.015
ades+0.014
ules+0.014
eds+0.014
wen+0.014
Neg logit contributions
 don-0.010
 Are-0.009
 blame-0.008
 worry-0.008
 girl-0.007
Token over
Token ID625
Feature activation+6.47e-02

Pos logit contributions
ogo+0.041
ades+0.039
ules+0.039
wen+0.038
eds+0.038
Neg logit contributions
 don-0.031
 Are-0.028
 blame-0.028
 girl-0.024
 worry-0.024
Tokenjoy
Token ID2633
Feature activation+7.05e+00

Pos logit contributions
ogo+0.005
ades+0.005
ules+0.005
wen+0.005
eds+0.005
Neg logit contributions
 don-0.003
 Are-0.003
 blame-0.003
 worry-0.002
 cry-0.002
Tokened
Token ID276
Feature activation-1.55e+00

Pos logit contributions
ogo+0.559
ules+0.538
ades+0.537
wen+0.531
eds+0.528
Neg logit contributions
 don-0.344
 blame-0.308
 Are-0.307
 worry-0.268
 weep-0.261
Token with
Token ID351
Feature activation-4.03e-01

Pos logit contributions
 don+0.065
 Are+0.058
 blame+0.057
 worry+0.047
 weep+0.046
Neg logit contributions
ogo-0.134
ades-0.127
ules-0.127
wen-0.126
eds-0.126
Token her
Token ID607
Feature activation+7.27e-02

Pos logit contributions
 don+0.017
 Are+0.015
 blame+0.014
 girl+0.012
 worry+0.012
Neg logit contributions
ogo-0.035
ades-0.034
ules-0.034
wen-0.033
eds-0.033
Token present
Token ID1944
Feature activation+2.51e-02

Pos logit contributions
ogo+0.005
ades+0.005
ules+0.005
wen+0.005
eds+0.005
Neg logit contributions
 don-0.004
 Are-0.004
 blame-0.004
 worry-0.003
 girl-0.003
Token that
Token ID326
Feature activation-1.78e-01

Pos logit contributions
ogo+0.002
ades+0.002
ules+0.002
wen+0.002
eds+0.002
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 worry-0.001
 girl-0.001
Token
Token ID198
Feature activation+6.28e-01

Pos logit contributions
ogo+0.155
ades+0.145
ules+0.143
wen+0.142
eds+0.142
Neg logit contributions
 don-0.132
 Are-0.119
 blame-0.116
 worry-0.105
 girl-0.104
Token
Token ID198
Feature activation+6.27e-01

Pos logit contributions
ogo+0.062
ades+0.058
eds+0.058
ules+0.058
urger+0.056
Neg logit contributions
 don-0.021
 Are-0.017
 do-0.016
 blame-0.014
 with-0.013
TokenJean
Token ID38248
Feature activation+3.32e+00

Pos logit contributions
ogo+0.038
ades+0.034
eds+0.033
ules+0.033
us+0.031
Neg logit contributions
 don-0.046
 Are-0.044
 blame-0.042
 do-0.040
 worry-0.039
Token was
Token ID373
Feature activation+1.92e-01

Pos logit contributions
ogo+0.307
ades+0.293
ules+0.289
wen+0.287
eds+0.286
Neg logit contributions
 don-0.128
 Are-0.106
 blame-0.101
 girl-0.087
 worry-0.087
Token over
Token ID625
Feature activation+1.40e-01

Pos logit contributions
ogo+0.015
ades+0.014
ules+0.014
wen+0.014
eds+0.014
Neg logit contributions
 don-0.010
 Are-0.009
 blame-0.009
 worry-0.008
 girl-0.008
Tokenjoy
Token ID2633
Feature activation+7.04e+00

Pos logit contributions
ogo+0.004
ules+0.004
ades+0.004
Andrew+0.004
eds+0.004
Neg logit contributions
 don-0.013
 blame-0.012
 Are-0.012
 cry-0.011
 weep-0.011
Tokened
Token ID276
Feature activation-1.55e+00

Pos logit contributions
ogo+0.544
ules+0.527
eds+0.518
ades+0.515
wen+0.515
Neg logit contributions
 don-0.359
 blame-0.327
 Are-0.318
 worry-0.281
 cry-0.272
Token and
Token ID290
Feature activation-9.53e-02

Pos logit contributions
 don+0.074
 blame+0.069
 Are+0.068
 weep+0.057
 worry+0.057
Neg logit contributions
ogo-0.122
ules-0.117
eds-0.115
wen-0.114
ades-0.114
Token waved
Token ID26834
Feature activation-7.91e-02

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
 worry+0.003
 weep+0.003
Neg logit contributions
ogo-0.008
ules-0.007
eds-0.007
ades-0.007
wen-0.007
Token to
Token ID284
Feature activation+5.01e-01

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
 weep+0.003
 worry+0.003
Neg logit contributions
ogo-0.006
ules-0.006
ades-0.006
eds-0.005
Andrew-0.005
Token the
Token ID262
Feature activation+2.39e-02

Pos logit contributions
ogo+0.050
ades+0.048
ules+0.047
wen+0.047
eds+0.047
Neg logit contributions
 don-0.016
 Are-0.013
 blame-0.013
 worry-0.010
 do-0.010
Token flower
Token ID15061
Feature activation+1.93e+00

Pos logit contributions
 don+0.030
 blame+0.029
 Are+0.028
 cry+0.024
 Sorry+0.023
Neg logit contributions
ades-0.034
eds-0.034
ogo-0.034
ules-0.032
lor-0.032
Token.
Token ID13
Feature activation+3.21e-02

Pos logit contributions
ogo+0.157
ades+0.149
ules+0.147
eds+0.147
wen+0.146
Neg logit contributions
 don-0.093
 Are-0.083
 blame-0.080
 worry-0.070
 girl-0.067
Token Lily
Token ID20037
Feature activation-1.49e-01

Pos logit contributions
ogo+0.003
ades+0.003
ules+0.003
lor+0.003
wen+0.003
Neg logit contributions
 don-0.001
 blame-0.001
 Are-0.001
 worry-0.001
 weep-0.001
Token was
Token ID373
Feature activation+3.09e-01

Pos logit contributions
 don+0.006
 Are+0.005
 blame+0.005
 worry+0.004
 girl+0.004
Neg logit contributions
ogo-0.013
ades-0.013
ules-0.013
wen-0.012
eds-0.012
Token over
Token ID625
Feature activation+1.90e-01

Pos logit contributions
ogo+0.024
ades+0.023
ules+0.022
eds+0.022
wen+0.022
Neg logit contributions
 don-0.016
 Are-0.015
 blame-0.014
 worry-0.013
 girl-0.013
Tokenjoy
Token ID2633
Feature activation+7.04e+00

Pos logit contributions
ogo+0.007
ules+0.006
ades+0.006
eds+0.006
wen+0.006
Neg logit contributions
 don-0.017
 blame-0.016
 Are-0.016
 cry-0.015
 Sorry-0.015
Tokened
Token ID276
Feature activation-1.52e+00

Pos logit contributions
ogo+0.542
ules+0.523
wen+0.515
eds+0.514
ades+0.514
Neg logit contributions
 don-0.361
 blame-0.330
 Are-0.318
 worry-0.285
 weep-0.274
Token and
Token ID290
Feature activation-7.18e-02

Pos logit contributions
 don+0.074
 blame+0.070
 Are+0.069
 Sorry+0.058
 weep+0.057
Neg logit contributions
ogo-0.116
ules-0.112
eds-0.111
wen-0.110
ades-0.109
Token ran
Token ID4966
Feature activation+1.45e-01

Pos logit contributions
 don+0.003
 Are+0.003
 blame+0.003
 worry+0.003
 boy+0.003
Neg logit contributions
ogo-0.005
ules-0.005
eds-0.005
wen-0.005
ades-0.005
Token to
Token ID284
Feature activation+5.79e-01

Pos logit contributions
ogo+0.012
ades+0.011
ules+0.011
wen+0.011
eds+0.011
Neg logit contributions
 don-0.007
 Are-0.006
 blame-0.006
 worry-0.005
 girl-0.005
Token her
Token ID607
Feature activation+2.48e-01

Pos logit contributions
ogo+0.053
ades+0.050
ules+0.050
eds+0.050
wen+0.050
Neg logit contributions
 don-0.022
 Are-0.019
 blame-0.018
 worry-0.015
 girl-0.015
TokenThe
Token ID464
Feature activation+8.16e-02

Pos logit contributions
ogo+0.043
eds+0.040
ules+0.039
wen+0.039
ades+0.037
Neg logit contributions
 don-0.035
 Are-0.031
 blame-0.030
 girl-0.029
 do-0.028
Token little
Token ID1310
Feature activation+5.82e-02

Pos logit contributions
ogo+0.006
ades+0.006
ules+0.006
wen+0.006
eds+0.006
Neg logit contributions
 don-0.004
 Are-0.004
 blame-0.004
 worry-0.003
 weep-0.003
Token girl
Token ID2576
Feature activation-2.27e+00

Pos logit contributions
Andrew+0.005
uan+0.005
ogo+0.005
ules+0.005
asper+0.004
Neg logit contributions
 don-0.002
 blame-0.002
 Are-0.002
 Sorry-0.001
 weep-0.001
Token was
Token ID373
Feature activation+3.44e-01

Pos logit contributions
 don+0.099
 Are+0.090
 blame+0.085
 worry+0.073
 weep+0.069
Neg logit contributions
ogo-0.190
ades-0.182
wen-0.180
ules-0.179
eds-0.176
Token over
Token ID625
Feature activation+2.00e-01

Pos logit contributions
ogo+0.028
ades+0.026
eds+0.026
ules+0.026
wen+0.026
Neg logit contributions
 don-0.017
 Are-0.015
 blame-0.015
 worry-0.013
 do-0.013
Tokenjoy
Token ID2633
Feature activation+7.03e+00

Pos logit contributions
ogo+0.006
ules+0.006
ades+0.006
Andrew+0.006
eds+0.005
Neg logit contributions
 don-0.018
 blame-0.018
 Are-0.017
 Sorry-0.016
 cry-0.016
Tokened
Token ID276
Feature activation-1.53e+00

Pos logit contributions
ogo+0.536
ules+0.518
ades+0.510
wen+0.508
eds+0.506
Neg logit contributions
 don-0.363
 blame-0.332
 Are-0.324
 worry-0.286
 weep-0.280
Token and
Token ID290
Feature activation-5.01e-02

Pos logit contributions
 don+0.074
 blame+0.069
 Are+0.069
 Sorry+0.057
 weep+0.057
Neg logit contributions
ogo-0.119
ules-0.114
ades-0.112
wen-0.111
eds-0.111
Token jumped
Token ID11687
Feature activation+1.19e+00

Pos logit contributions
 don+0.003
 Are+0.002
 blame+0.002
 worry+0.002
 weep+0.002
Neg logit contributions
ogo-0.004
ules-0.003
wen-0.003
eds-0.003
ades-0.003
Token out
Token ID503
Feature activation+9.96e-01

Pos logit contributions
ogo+0.091
ades+0.088
ules+0.084
eds+0.083
us+0.082
Neg logit contributions
 don-0.061
 Are-0.057
 blame-0.055
 worry-0.049
 Sorry-0.048
Token of
Token ID286
Feature activation+2.00e-02

Pos logit contributions
ogo+0.076
ades+0.073
ules+0.072
eds+0.071
wen+0.070
Neg logit contributions
 don-0.052
 Are-0.047
 blame-0.045
 worry-0.039
 Sorry-0.038
Token
Token ID198
Feature activation+6.41e-01

Pos logit contributions
ogo+0.067
eds+0.066
urger+0.065
ules+0.065
opl+0.064
Neg logit contributions
 don-0.017
 Are-0.013
 do-0.011
 Lily-0.011
 with-0.010
TokenThe
Token ID464
Feature activation-6.26e-03

Pos logit contributions
ogo+0.048
ules+0.045
wen+0.045
eds+0.044
ades+0.043
Neg logit contributions
 don-0.038
 Are-0.035
 do-0.032
 blame-0.031
No-0.030
Token bear
Token ID6842
Feature activation+4.32e-01

Pos logit contributions
 don+0.000
 Are+0.000
 blame+0.000
 Sorry+0.000
 want+0.000
Neg logit contributions
ogo-0.000
ades-0.000
eds-0.000
iflower-0.000
Andrew-0.000
Token was
Token ID373
Feature activation+2.79e-01

Pos logit contributions
ogo+0.037
ades+0.035
eds+0.035
wen+0.035
ules+0.035
Neg logit contributions
 don-0.018
 Are-0.017
 blame-0.016
 worry-0.013
 girl-0.013
Token over
Token ID625
Feature activation+1.67e-01

Pos logit contributions
ogo+0.023
ades+0.022
ules+0.022
wen+0.022
eds+0.022
Neg logit contributions
 don-0.013
 Are-0.011
 blame-0.011
 worry-0.010
 do-0.010
Tokenjoy
Token ID2633
Feature activation+7.03e+00

Pos logit contributions
ogo+0.009
ades+0.008
ules+0.008
eds+0.008
wen+0.008
Neg logit contributions
 don-0.013
 Are-0.012
 blame-0.012
 worry-0.011
 girl-0.011
Tokened
Token ID276
Feature activation-1.51e+00

Pos logit contributions
ogo+0.564
ules+0.537
eds+0.534
ades+0.534
wen+0.530
Neg logit contributions
 don-0.342
 blame-0.307
 Are-0.304
 worry-0.266
 weep-0.255
Token!
Token ID0
Feature activation+2.25e+00

Pos logit contributions
 don+0.065
 Are+0.057
 blame+0.054
 worry+0.047
 girl+0.046
Neg logit contributions
ogo-0.131
ades-0.125
ules-0.123
wen-0.123
eds-0.123
Token He
Token ID679
Feature activation-1.32e-01

Pos logit contributions
ogo+0.190
ades+0.184
eds+0.180
wen+0.179
ules+0.175
Neg logit contributions
 don-0.107
 Are-0.099
 blame-0.087
 girl-0.080
 Lily-0.080
Token quickly
Token ID2952
Feature activation+2.61e+00

Pos logit contributions
 don+0.005
 Are+0.005
 blame+0.004
 worry+0.004
 girl+0.004
Neg logit contributions
ogo-0.012
ades-0.011
ules-0.011
wen-0.011
eds-0.011
Token ran
Token ID4966
Feature activation+1.50e-01

Pos logit contributions
ogo+0.221
ades+0.210
ules+0.207
eds+0.206
wen+0.206
Neg logit contributions
 don-0.118
 Are-0.104
 blame-0.099
 worry-0.087
 girl-0.085
Token
Token ID198
Feature activation+6.31e-01

Pos logit contributions
ogo+0.060
eds+0.057
ules+0.057
wen+0.056
ades+0.055
Neg logit contributions
 don-0.022
 Are-0.020
 Lily-0.019
 do-0.016
 blame-0.016
TokenThe
Token ID464
Feature activation-1.27e-02

Pos logit contributions
ogo+0.049
wen+0.044
ules+0.044
eds+0.044
ades+0.042
Neg logit contributions
 don-0.036
 Are-0.034
 blame-0.031
 girl-0.030
 do-0.029
Token rabbit
Token ID22746
Feature activation+4.83e-01

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 Sorry+0.001
 weep+0.001
Neg logit contributions
ogo-0.001
ules-0.001
eds-0.001
ades-0.001
wen-0.001
Token was
Token ID373
Feature activation+3.06e-01

Pos logit contributions
ogo+0.040
ades+0.038
wen+0.037
ules+0.037
lor+0.036
Neg logit contributions
 don-0.021
 Are-0.019
 blame-0.018
 Sorry-0.016
 worry-0.015
Token over
Token ID625
Feature activation+1.77e-01

Pos logit contributions
ogo+0.025
ades+0.024
ules+0.024
wen+0.024
eds+0.023
Neg logit contributions
 don-0.015
 Are-0.013
 blame-0.012
 worry-0.011
 girl-0.011
Tokenjoy
Token ID2633
Feature activation+7.03e+00

Pos logit contributions
ogo+0.007
ades+0.007
ules+0.007
eds+0.006
wen+0.006
Neg logit contributions
 don-0.015
 Are-0.014
 blame-0.014
 worry-0.013
 weep-0.013
Tokened
Token ID276
Feature activation-1.49e+00

Pos logit contributions
ogo+0.540
ules+0.520
ades+0.510
eds+0.509
us+0.509
Neg logit contributions
 don-0.361
 blame-0.330
 Are-0.322
 worry-0.284
 weep-0.276
Token!
Token ID0
Feature activation+2.26e+00

Pos logit contributions
 don+0.070
 Are+0.065
 blame+0.064
 worry+0.053
 weep+0.053
Neg logit contributions
ogo-0.120
ules-0.114
ades-0.113
eds-0.112
wen-0.111
Token He
Token ID679
Feature activation-1.25e-01

Pos logit contributions
ogo+0.203
ades+0.198
eds+0.195
wen+0.192
ules+0.184
Neg logit contributions
 don-0.095
 Are-0.092
 Lily-0.075
 blame-0.072
 boy-0.072
Token hopped
Token ID45230
Feature activation+4.14e-02

Pos logit contributions
 don+0.005
 Are+0.004
 blame+0.004
 worry+0.004
 girl+0.004
Neg logit contributions
ogo-0.011
ades-0.011
ules-0.010
eds-0.010
wen-0.010
Token up
Token ID510
Feature activation+1.55e+00

Pos logit contributions
ogo+0.003
ades+0.003
ules+0.003
eds+0.003
wen+0.003
Neg logit contributions
 don-0.002
 Are-0.002
 blame-0.002
 worry-0.001
 weep-0.001
Token wind
Token ID2344
Feature activation-4.73e+00

Pos logit contributions
ades+0.002
eds+0.002
ogo+0.002
opl+0.002
ules+0.002
Neg logit contributions
 don-0.002
 Are-0.002
 blame-0.002
 worry-0.001
 give-0.001
Token.
Token ID13
Feature activation-1.30e-02

Pos logit contributions
 don+0.193
 Are+0.172
 blame+0.169
 worry+0.139
 girl+0.133
Neg logit contributions
ogo-0.417
ades-0.397
ules-0.393
eds-0.393
wen-0.390
Token Benny
Token ID44275
Feature activation+4.44e-01

Pos logit contributions
 don+0.001
 blame+0.001
 Are+0.000
 worry+0.000
 weep+0.000
Neg logit contributions
ogo-0.001
ades-0.001
ules-0.001
eds-0.001
wen-0.001
Token was
Token ID373
Feature activation+2.82e-01

Pos logit contributions
ogo+0.040
ades+0.038
ules+0.038
wen+0.037
eds+0.037
Neg logit contributions
 don-0.018
 Are-0.015
 blame-0.015
 worry-0.013
 girl-0.012
Token over
Token ID625
Feature activation+1.69e-01

Pos logit contributions
ogo+0.022
ades+0.021
ules+0.021
wen+0.021
eds+0.021
Neg logit contributions
 don-0.015
 Are-0.013
 blame-0.012
 worry-0.011
 girl-0.011
Tokenjoy
Token ID2633
Feature activation+7.03e+00

Pos logit contributions
ogo+0.008
ades+0.008
ules+0.007
wen+0.007
eds+0.007
Neg logit contributions
 don-0.014
 Are-0.013
 blame-0.013
 worry-0.012
 girl-0.012
Tokened
Token ID276
Feature activation-1.50e+00

Pos logit contributions
ogo+0.605
ades+0.575
ules+0.569
eds+0.565
wen+0.565
Neg logit contributions
 don-0.312
 Are-0.272
 blame-0.262
 worry-0.228
 girl-0.221
Token.
Token ID13
Feature activation-8.83e-03

Pos logit contributions
 don+0.069
 Are+0.060
 blame+0.058
 worry+0.051
 girl+0.050
Neg logit contributions
ogo-0.126
ades-0.120
ules-0.118
wen-0.118
eds-0.118
Token
Token ID220
Feature activation+1.95e+00

Pos logit contributions
 don+0.000
 Are+0.000
 blame+0.000
 Lily+0.000
 girl+0.000
Neg logit contributions
ogo-0.001
ades-0.001
eds-0.001
wen-0.001
ules-0.001
Token
Token ID198
Feature activation+6.24e-01

Pos logit contributions
ogo+0.181
eds+0.181
ades+0.175
ules+0.173
wen+0.172
Neg logit contributions
 don-0.072
 Are-0.059
 Lily-0.054
 Tim-0.047
 do-0.045
Token
Token ID198
Feature activation+6.15e-01

Pos logit contributions
ogo+0.063
opl+0.063
wen+0.062
urger+0.062
ades+0.061
Neg logit contributions
 don-0.017
 Are-0.014
 Lily-0.013
 with-0.011
 do-0.009
Token make
Token ID787
Feature activation+2.63e-01

Pos logit contributions
ogo+0.012
Andrew+0.012
eds+0.012
ades+0.011
wen+0.011
Neg logit contributions
 Are-0.008
 don-0.008
 blame-0.008
Wow-0.007
Hi-0.007
Token it
Token ID340
Feature activation-3.85e-01

Pos logit contributions
ogo+0.026
ules+0.025
ades+0.024
wen+0.024
eds+0.024
Neg logit contributions
 don-0.009
 Are-0.007
 blame-0.007
 do-0.006
 Lily-0.006
Token whole
Token ID2187
Feature activation+5.64e-01

Pos logit contributions
 Are+0.020
 don+0.019
 blame+0.018
Hi+0.016
 girl+0.016
Neg logit contributions
ogo-0.028
ades-0.027
eds-0.026
vern-0.025
wen-0.025
Token again
Token ID757
Feature activation-1.36e+00

Pos logit contributions
ogo+0.042
ades+0.040
ules+0.040
wen+0.039
eds+0.039
Neg logit contributions
 don-0.031
 Are-0.028
 blame-0.027
 worry-0.025
 girl-0.024
Token.
Token ID13
Feature activation+1.03e-01

Pos logit contributions
 don+0.076
 Are+0.068
 blame+0.066
 worry+0.059
 girl+0.059
Neg logit contributions
ogo-0.101
ades-0.095
ules-0.094
eds-0.093
wen-0.093
Token Amen
Token ID34717
Feature activation-6.88e+00

Pos logit contributions
ogo+0.007
Andrew+0.007
lor+0.007
ules+0.007
opl+0.007
Neg logit contributions
 don-0.004
 blame-0.004
 worry-0.004
 girl-0.004
 forth-0.004
Token."
Token ID526
Feature activation+7.27e-01

Pos logit contributions
 don+0.344
 Are+0.296
 blame+0.286
 worry+0.256
 do+0.255
Neg logit contributions
ogo-0.557
ades-0.527
ules-0.519
wen-0.519
eds-0.513
Token
Token ID198
Feature activation+9.33e-01

Pos logit contributions
ogo+0.047
ules+0.046
ades+0.045
Andrew+0.045
wen+0.044
Neg logit contributions
 don-0.043
 blame-0.038
 Are-0.038
 worry-0.035
 do-0.034
Token
Token ID198
Feature activation+9.32e-01

Pos logit contributions
ogo+0.090
ules+0.084
eds+0.084
ades+0.083
lor+0.083
Neg logit contributions
 don-0.033
 Are-0.028
 Lily-0.028
 blame-0.025
 do-0.022
TokenThen
Token ID6423
Feature activation-1.32e+00

Pos logit contributions
ogo+0.069
ules+0.064
ades+0.064
eds+0.063
wen+0.062
Neg logit contributions
 don-0.055
 Are-0.050
 blame-0.048
 worry-0.043
 girl-0.043
Token,
Token ID11
Feature activation-8.85e-01

Pos logit contributions
 blame+0.061
 don+0.059
 Are+0.057
 weep+0.055
 want+0.049
Neg logit contributions
Andrew-0.091
ogo-0.089
ades-0.085
ules-0.083
Karl-0.083
Token to
Token ID284
Feature activation+1.01e+00

Pos logit contributions
ogo+0.038
eds+0.036
ules+0.036
Andrew+0.036
ades+0.035
Neg logit contributions
 don-0.017
 Are-0.016
 blame-0.014
ter-0.012
Wow-0.012
Token mom
Token ID1995
Feature activation-8.59e-01

Pos logit contributions
ogo+0.079
eds+0.076
ades+0.074
Andrew+0.072
wen+0.072
Neg logit contributions
 Are-0.041
 don-0.038
 girl-0.032
Wow-0.032
 boy-0.029
Token and
Token ID290
Feature activation+3.54e-01

Pos logit contributions
 don+0.041
 Are+0.037
 blame+0.035
 worry+0.030
 girl+0.030
Neg logit contributions
ogo-0.069
ades-0.066
ules-0.065
eds-0.065
wen-0.063
Token dad
Token ID9955
Feature activation+1.18e-01

Pos logit contributions
ogo+0.022
Andrew+0.022
ules+0.021
eds+0.021
ades+0.020
Neg logit contributions
 Are-0.018
 don-0.018
Wow-0.016
 blame-0.016
ter-0.014
Token.
Token ID13
Feature activation+1.74e-01

Pos logit contributions
ogo+0.009
eds+0.008
ades+0.008
ules+0.008
wen+0.008
Neg logit contributions
 don-0.006
 Are-0.006
 blame-0.005
 girl-0.005
 Sorry-0.005
Token Please
Token ID4222
Feature activation-6.73e+00

Pos logit contributions
ogo+0.014
ades+0.014
ules+0.014
wen+0.013
eds+0.013
Neg logit contributions
 don-0.008
 Are-0.007
 blame-0.007
 worry-0.006
 girl-0.006
Token mend
Token ID47618
Feature activation-9.12e-01

Pos logit contributions
 Are+0.246
 girl+0.227
Wow+0.227
 boy+0.200
 blame+0.185
Neg logit contributions
ogo-0.525
ades-0.518
eds-0.496
vern-0.491
us-0.475
Token the
Token ID262
Feature activation+9.17e-02

Pos logit contributions
 don+0.041
 Are+0.035
 blame+0.033
 Lily+0.030
 worry+0.029
Neg logit contributions
ogo-0.078
ades-0.075
ules-0.074
wen-0.073
eds-0.072
Token toy
Token ID13373
Feature activation-2.25e-01

Pos logit contributions
ogo+0.006
ades+0.006
eds+0.006
wen+0.005
ules+0.005
Neg logit contributions
 don-0.005
 Are-0.005
 blame-0.005
 worry-0.004
 Sorry-0.004
Token and
Token ID290
Feature activation+1.88e-01

Pos logit contributions
 don+0.011
 Are+0.010
 blame+0.009
 worry+0.008
 weep+0.008
Neg logit contributions
ogo-0.018
ades-0.017
eds-0.017
ules-0.017
wen-0.017
Token make
Token ID787
Feature activation+2.63e-01

Pos logit contributions
ogo+0.012
Andrew+0.012
eds+0.012
ades+0.011
wen+0.011
Neg logit contributions
 Are-0.008
 don-0.008
 blame-0.008
Wow-0.007
Hi-0.007
Token and
Token ID290
Feature activation+1.34e-01

Pos logit contributions
 don+0.039
 Are+0.032
 blame+0.028
 Lily+0.027
 girl+0.026
Neg logit contributions
ogo-0.105
ades-0.103
ules-0.100
wen-0.100
Andrew-0.098
Token fell
Token ID3214
Feature activation-4.47e-01

Pos logit contributions
ogo+0.010
eds+0.010
wen+0.010
us+0.010
ules+0.009
Neg logit contributions
 don-0.006
 blame-0.006
 Are-0.005
 worry-0.005
 girl-0.005
Token.
Token ID13
Feature activation-6.88e-03

Pos logit contributions
 don+0.019
 Are+0.017
 blame+0.016
 worry+0.014
 girl+0.014
Neg logit contributions
ogo-0.039
ades-0.037
ules-0.036
eds-0.036
wen-0.036
Token The
Token ID383
Feature activation-7.28e-02

Pos logit contributions
 don+0.000
 blame+0.000
 Are+0.000
 worry+0.000
 girl+0.000
Neg logit contributions
ogo-0.001
ules-0.001
ades-0.001
wen-0.001
Andrew-0.000
Token ash
Token ID12530
Feature activation-4.13e-01

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
 Sorry+0.004
 Mind+0.004
Neg logit contributions
ogo-0.004
ades-0.004
eds-0.004
wen-0.004
ules-0.004
Tokentr
Token ID2213
Feature activation-6.60e+00

Pos logit contributions
 don+0.017
 Are+0.015
 blame+0.015
 worry+0.013
 weep+0.012
Neg logit contributions
ogo-0.036
ades-0.034
eds-0.034
wen-0.034
ules-0.033
Tokenay
Token ID323
Feature activation+4.21e-02

Pos logit contributions
 don+0.505
 Are+0.468
 blame+0.456
 worry+0.425
 girl+0.420
Neg logit contributions
ogo-0.353
ades-0.325
ules-0.319
wen-0.316
eds-0.315
Token flew
Token ID13112
Feature activation-1.50e+00

Pos logit contributions
ogo+0.004
ades+0.003
wen+0.003
eds+0.003
ules+0.003
Neg logit contributions
 don-0.002
 blame-0.002
 Are-0.002
 worry-0.001
 weep-0.001
Token out
Token ID503
Feature activation+9.17e-01

Pos logit contributions
 don+0.070
 Are+0.063
 blame+0.062
 worry+0.053
 girl+0.051
Neg logit contributions
ogo-0.124
ades-0.117
eds-0.116
ules-0.114
wen-0.114
Token of
Token ID286
Feature activation-1.11e-01

Pos logit contributions
ogo+0.078
ades+0.074
ules+0.073
wen+0.073
eds+0.072
Neg logit contributions
 don-0.042
 Are-0.036
 blame-0.035
 worry-0.031
 girl-0.030
Token her
Token ID607
Feature activation-7.23e-02

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
 worry+0.003
 do+0.003
Neg logit contributions
ogo-0.010
ades-0.010
eds-0.010
wen-0.010
ules-0.009
Token I
Token ID314
Feature activation-3.21e+00

Pos logit contributions
 don+0.010
 Are+0.009
 blame+0.008
 worry+0.007
 girl+0.007
Neg logit contributions
ogo-0.021
ades-0.020
ules-0.019
wen-0.019
eds-0.019
Token have
Token ID423
Feature activation+3.37e-01

Pos logit contributions
 Are+0.113
 blame+0.112
 don+0.104
Wow+0.098
ter+0.096
Neg logit contributions
ogo-0.235
ades-0.234
eds-0.231
ules-0.229
us-0.227
Token a
Token ID257
Feature activation-2.44e-01

Pos logit contributions
ogo+0.029
ades+0.027
ules+0.027
wen+0.027
eds+0.026
Neg logit contributions
 don-0.016
 Are-0.013
 blame-0.013
 worry-0.012
 do-0.011
Token treat
Token ID2190
Feature activation+2.29e+00

Pos logit contributions
 don+0.014
 Are+0.012
 blame+0.012
 worry+0.011
 girl+0.010
Neg logit contributions
ogo-0.018
ades-0.017
ules-0.016
eds-0.016
wen-0.016
Token?
Token ID30
Feature activation-2.12e-01

Pos logit contributions
ogo+0.181
ades+0.172
ules+0.171
eds+0.169
wen+0.168
Neg logit contributions
 don-0.110
 blame-0.103
 Are-0.101
 worry-0.088
 weep-0.086
Token Please
Token ID4222
Feature activation-6.59e+00

Pos logit contributions
 don+0.009
 blame+0.009
 worry+0.008
 forth+0.008
 weep+0.008
Neg logit contributions
ules-0.015
ogo-0.015
Andrew-0.015
lor-0.015
us-0.014
Token!"
Token ID2474
Feature activation+2.09e+00

Pos logit contributions
 Are+0.251
 girl+0.237
Wow+0.232
 blame+0.227
 boy+0.221
Neg logit contributions
ogo-0.508
ades-0.475
Andrew-0.471
wen-0.468
ules-0.463
Token
Token ID198
Feature activation+1.15e+00

Pos logit contributions
ogo+0.145
ules+0.141
Andrew+0.140
ades+0.139
wen+0.136
Neg logit contributions
 don-0.111
 blame-0.100
 Are-0.098
 worry-0.090
 girl-0.087
Token
Token ID198
Feature activation+1.15e+00

Pos logit contributions
ogo+0.107
eds+0.103
ades+0.100
ules+0.099
wen+0.098
Neg logit contributions
 don-0.043
 Are-0.038
 Lily-0.035
 blame-0.032
 do-0.029
TokenHer
Token ID9360
Feature activation-5.56e-02

Pos logit contributions
ogo+0.078
ades+0.072
eds+0.071
ules+0.071
wen+0.070
Neg logit contributions
 don-0.074
 Are-0.068
 blame-0.066
 worry-0.060
 girl-0.059
Token mom
Token ID1995
Feature activation-9.26e-01

Pos logit contributions
 don+0.003
 Are+0.002
 blame+0.002
 worry+0.002
 weep+0.002
Neg logit contributions
ogo-0.005
ades-0.004
wen-0.004
ules-0.004
Andrew-0.004
Token Please
Token ID4222
Feature activation-6.79e+00

Pos logit contributions
ogo+0.034
ules+0.032
ades+0.032
wen+0.031
eds+0.031
Neg logit contributions
 don-0.021
 blame-0.018
 Are-0.018
 worry-0.016
 girl-0.016
Token,
Token ID11
Feature activation-2.92e-01

Pos logit contributions
 Are+0.213
 girl+0.194
 blame+0.188
Wow+0.186
 don+0.178
Neg logit contributions
ogo-0.572
ades-0.543
ules-0.525
us-0.523
lor-0.522
Token help
Token ID1037
Feature activation-1.37e-01

Pos logit contributions
 don+0.010
 Are+0.010
 blame+0.010
 weep+0.009
 worry+0.008
Neg logit contributions
ogo-0.024
ules-0.023
Andrew-0.022
ades-0.022
wen-0.022
Token me
Token ID502
Feature activation-2.64e+00

Pos logit contributions
 don+0.008
 Are+0.008
 blame+0.007
 worry+0.007
 girl+0.007
Neg logit contributions
ogo-0.010
ades-0.009
ules-0.009
eds-0.009
wen-0.009
Token.
Token ID13
Feature activation+3.50e-01

Pos logit contributions
 don+0.123
 Are+0.116
 blame+0.109
 girl+0.095
 worry+0.092
Neg logit contributions
ogo-0.211
eds-0.201
ades-0.201
ules-0.196
wen-0.194
Token Please
Token ID4222
Feature activation-6.58e+00

Pos logit contributions
lor+0.022
ogo+0.022
Andrew+0.021
ules+0.021
ables+0.020
Neg logit contributions
 forth-0.015
 don-0.015
 feel-0.014
 worry-0.014
 blame-0.013
Token,
Token ID11
Feature activation-5.07e-01

Pos logit contributions
 Are+0.241
Wow+0.229
 girl+0.223
 blame+0.202
 boy+0.202
Neg logit contributions
ogo-0.497
ades-0.488
eds-0.476
vern-0.469
us-0.459
Token hug
Token ID16225
Feature activation+6.73e-01

Pos logit contributions
 Are+0.017
 don+0.016
 blame+0.016
 weep+0.015
Wow+0.014
Neg logit contributions
ogo-0.041
Andrew-0.038
ules-0.038
ades-0.037
wen-0.037
Token me
Token ID502
Feature activation-2.69e+00

Pos logit contributions
ogo+0.043
eds+0.041
ules+0.039
ades+0.039
wen+0.037
Neg logit contributions
 don-0.041
 blame-0.040
 Are-0.039
 weep-0.035
 cry-0.034
Token."
Token ID526
Feature activation+8.01e-01

Pos logit contributions
 don+0.141
 Are+0.125
 blame+0.121
 worry+0.108
 girl+0.106
Neg logit contributions
ogo-0.209
ades-0.198
ules-0.195
wen-0.194
eds-0.194
Token
Token ID198
Feature activation+1.08e+00

Pos logit contributions
ogo+0.057
ades+0.054
ules+0.054
Andrew+0.053
wen+0.053
Neg logit contributions
 don-0.044
 blame-0.038
 Are-0.038
 worry-0.035
 do-0.034
Token you
Token ID345
Feature activation-1.90e+00

Pos logit contributions
 don+0.005
 Are+0.005
 blame+0.005
 worry+0.004
ter+0.004
Neg logit contributions
ogo-0.010
ules-0.009
ades-0.009
eds-0.009
Andrew-0.009
Token need
Token ID761
Feature activation-1.44e+00

Pos logit contributions
 Are+0.077
 blame+0.074
 don+0.074
 girl+0.064
 worry+0.062
Neg logit contributions
ogo-0.150
ades-0.143
ules-0.141
eds-0.140
Andrew-0.137
Token to
Token ID284
Feature activation+1.12e+00

Pos logit contributions
 don+0.072
 Are+0.063
 blame+0.061
 worry+0.055
 do+0.054
Neg logit contributions
ogo-0.117
ades-0.110
ules-0.110
wen-0.109
eds-0.107
Token sleep
Token ID3993
Feature activation-1.15e+00

Pos logit contributions
ogo+0.096
ades+0.090
eds+0.089
wen+0.089
Andrew+0.088
Neg logit contributions
 Are-0.041
 don-0.039
 blame-0.034
 girl-0.034
 boy-0.032
Token.
Token ID13
Feature activation+3.60e-01

Pos logit contributions
 don+0.059
 blame+0.056
 Are+0.056
 worry+0.048
 girl+0.046
Neg logit contributions
ogo-0.087
eds-0.082
ades-0.082
wen-0.080
ules-0.080
Token Please
Token ID4222
Feature activation-6.58e+00

Pos logit contributions
ogo+0.027
ules+0.026
Andrew+0.025
lor+0.025
wen+0.025
Neg logit contributions
 don-0.017
 blame-0.015
 worry-0.014
 girl-0.013
 cry-0.013
Token,
Token ID11
Feature activation-5.24e-01

Pos logit contributions
Wow+0.266
 Are+0.254
 blame+0.239
 girl+0.233
 boy+0.221
Neg logit contributions
ogo-0.502
eds-0.461
vern-0.454
lor-0.450
us-0.448
Token put
Token ID1234
Feature activation+4.00e-01

Pos logit contributions
 Are+0.019
 blame+0.018
 don+0.017
Wow+0.017
 weep+0.016
Neg logit contributions
ogo-0.041
Andrew-0.038
ules-0.037
wen-0.036
Karl-0.035
Token this
Token ID428
Feature activation-1.86e-02

Pos logit contributions
ogo+0.031
eds+0.030
ades+0.030
wen+0.029
ules+0.029
Neg logit contributions
 don-0.020
 Are-0.019
 blame-0.018
 worry-0.016
 weep-0.015
Token book
Token ID1492
Feature activation+8.04e-01

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 Sorry+0.001
Hi+0.001
Neg logit contributions
ogo-0.001
ades-0.001
eds-0.001
lor-0.001
wen-0.001
Token away
Token ID1497
Feature activation-1.10e+00

Pos logit contributions
ogo+0.060
ades+0.059
eds+0.056
lor+0.055
ables+0.054
Neg logit contributions
 Are-0.039
 blame-0.039
 don-0.039
 worry-0.032
 weep-0.031
Token show
Token ID905
Feature activation+2.59e+00

Pos logit contributions
ogo+0.060
ades+0.058
ules+0.057
wen+0.056
eds+0.056
Neg logit contributions
 don-0.023
 Are-0.021
 blame-0.019
 girl-0.016
 worry-0.015
Token off
Token ID572
Feature activation+2.90e-01

Pos logit contributions
ogo+0.196
ules+0.183
ades+0.183
wen+0.180
eds+0.178
Neg logit contributions
 don-0.135
 Are-0.125
 blame-0.118
 girl-0.105
 worry-0.105
Token his
Token ID465
Feature activation+2.62e-01

Pos logit contributions
ogo+0.025
ades+0.024
wen+0.024
ules+0.024
eds+0.023
Neg logit contributions
 don-0.013
 Are-0.011
 blame-0.010
 worry-0.009
 do-0.009
Token powerful
Token ID3665
Feature activation+1.02e+00

Pos logit contributions
ogo+0.019
ades+0.017
ules+0.017
wen+0.017
eds+0.017
Neg logit contributions
 don-0.016
 Are-0.014
 blame-0.014
 worry-0.012
 girl-0.012
Token ash
Token ID12530
Feature activation-3.94e-01

Pos logit contributions
ogo+0.069
ades+0.065
eds+0.064
ules+0.064
wen+0.063
Neg logit contributions
 don-0.063
 Are-0.058
 blame-0.055
 Lily-0.049
 worry-0.048
Tokentr
Token ID2213
Feature activation-6.57e+00

Pos logit contributions
 don+0.023
 Are+0.021
 blame+0.020
 worry+0.018
 girl+0.017
Neg logit contributions
ogo-0.029
ades-0.027
ules-0.026
wen-0.026
eds-0.026
Tokenay
Token ID323
Feature activation+1.96e-01

Pos logit contributions
 don+0.453
 Are+0.416
 blame+0.398
 worry+0.372
 do+0.371
Neg logit contributions
ogo-0.380
ules-0.357
eds-0.353
wen-0.350
Andrew-0.349
Token.
Token ID13
Feature activation+3.81e-01

Pos logit contributions
ogo+0.017
ades+0.016
ules+0.016
eds+0.016
wen+0.016
Neg logit contributions
 don-0.009
 Are-0.008
 blame-0.007
 worry-0.006
 do-0.006
Token He
Token ID679
Feature activation+4.89e-03

Pos logit contributions
ogo+0.030
ades+0.028
ules+0.028
wen+0.028
eds+0.028
Neg logit contributions
 don-0.019
 Are-0.017
 blame-0.017
 worry-0.015
 girl-0.014
Token flew
Token ID13112
Feature activation-1.51e+00

Pos logit contributions
ogo+0.000
ades+0.000
wen+0.000
ules+0.000
eds+0.000
Neg logit contributions
 don-0.000
 Are-0.000
 blame-0.000
 worry-0.000
 Sorry-0.000
Token it
Token ID340
Feature activation-2.79e-01

Pos logit contributions
 don+0.066
 Are+0.058
 blame+0.056
 worry+0.048
 girl+0.047
Neg logit contributions
ogo-0.130
ades-0.124
ules-0.122
wen-0.121
eds-0.121
Token ones
Token ID3392
Feature activation-2.70e-01

Pos logit contributions
ogo+0.094
ades+0.090
ules+0.089
eds+0.087
wen+0.087
Neg logit contributions
 don-0.070
 Are-0.064
 blame-0.062
 worry-0.056
 weep-0.054
Token,
Token ID11
Feature activation-7.93e-01

Pos logit contributions
 don+0.012
 Are+0.011
 blame+0.011
 worry+0.009
 weep+0.009
Neg logit contributions
ogo-0.023
ades-0.022
ules-0.021
eds-0.021
wen-0.021
Token Mom
Token ID11254
Feature activation-1.38e+00

Pos logit contributions
 don+0.030
 Are+0.028
 blame+0.028
 weep+0.024
 cry+0.023
Neg logit contributions
ogo-0.067
ades-0.065
ules-0.064
Andrew-0.062
wen-0.062
Tokenmy
Token ID1820
Feature activation+3.85e-01

Pos logit contributions
 don+0.069
 Are+0.062
 blame+0.060
 worry+0.053
 weep+0.051
Neg logit contributions
ogo-0.109
ades-0.103
ules-0.102
eds-0.101
wen-0.101
Token!
Token ID0
Feature activation+2.41e+00

Pos logit contributions
ogo+0.029
ades+0.028
ules+0.028
eds+0.027
wen+0.027
Neg logit contributions
 don-0.020
 Are-0.018
 blame-0.018
 worry-0.016
 weep-0.015
Token Please
Token ID4222
Feature activation-6.56e+00

Pos logit contributions
ogo+0.205
ules+0.195
ades+0.195
wen+0.190
eds+0.189
Neg logit contributions
 don-0.104
 blame-0.088
 Are-0.087
 worry-0.078
 weep-0.075
Token,
Token ID11
Feature activation-8.70e-01

Pos logit contributions
 Are+0.233
 blame+0.209
 girl+0.204
 don+0.199
 boy+0.194
Neg logit contributions
ogo-0.538
ades-0.518
ules-0.502
eds-0.500
us-0.491
Token please
Token ID3387
Feature activation-2.30e+00

Pos logit contributions
 Are+0.027
 blame+0.027
 don+0.027
 weep+0.026
ter+0.022
Neg logit contributions
ogo-0.069
ules-0.066
Andrew-0.065
ades-0.064
wen-0.062
Token,
Token ID11
Feature activation-8.25e-01

Pos logit contributions
 Are+0.093
 don+0.091
 blame+0.090
 girl+0.082
Wow+0.081
Neg logit contributions
ogo-0.174
ades-0.166
eds-0.163
ules-0.162
wen-0.161
Token please
Token ID3387
Feature activation-2.27e+00

Pos logit contributions
 Are+0.027
 blame+0.026
 don+0.026
 weep+0.024
 cry+0.021
Neg logit contributions
ogo-0.068
ules-0.065
ades-0.063
Andrew-0.063
wen-0.061
Token."
Token ID526
Feature activation+7.70e-01

Pos logit contributions
 Are+0.116
 don+0.116
 blame+0.116
 girl+0.105
 boy+0.102
Neg logit contributions
ogo-0.149
ades-0.140
eds-0.139
ules-0.136
wen-0.136
Token hit
Token ID2277
Feature activation-1.11e+00

Pos logit contributions
ogo+0.207
eds+0.197
ades+0.194
wen+0.193
ules+0.190
Neg logit contributions
 don-0.119
 Are-0.109
 blame-0.104
 girl-0.092
 worry-0.092
Token her
Token ID607
Feature activation-2.52e-01

Pos logit contributions
 don+0.047
 Are+0.043
 blame+0.042
 worry+0.034
 Sorry+0.033
Neg logit contributions
ogo-0.094
eds-0.089
wen-0.088
ules-0.087
ades-0.086
Token mom
Token ID1995
Feature activation-1.18e+00

Pos logit contributions
 don+0.012
 Are+0.012
 blame+0.011
 worry+0.009
Hi+0.009
Neg logit contributions
ogo-0.020
eds-0.019
ades-0.019
wen-0.018
ules-0.018
Token's
Token ID338
Feature activation+2.25e-01

Pos logit contributions
 don+0.059
 Are+0.053
 blame+0.050
 worry+0.044
 girl+0.044
Neg logit contributions
ogo-0.093
eds-0.089
ades-0.088
ules-0.087
wen-0.087
Token ash
Token ID12530
Feature activation-4.54e-01

Pos logit contributions
ogo+0.016
eds+0.015
ades+0.015
wen+0.015
ules+0.014
Neg logit contributions
 don-0.013
 Are-0.012
 blame-0.011
 Sorry-0.010
Hi-0.009
Tokentr
Token ID2213
Feature activation-6.55e+00

Pos logit contributions
 don+0.023
 Are+0.021
 blame+0.020
 worry+0.018
 girl+0.018
Neg logit contributions
ogo-0.036
ades-0.034
ules-0.033
eds-0.033
wen-0.033
Tokenay
Token ID323
Feature activation+1.49e-01

Pos logit contributions
 don+0.461
 Are+0.425
 blame+0.409
 worry+0.383
 girl+0.378
Neg logit contributions
ogo-0.377
ules-0.354
ades-0.347
wen-0.346
Andrew-0.343
Token.
Token ID13
Feature activation-2.29e-01

Pos logit contributions
ogo+0.013
ades+0.012
ules+0.012
wen+0.012
eds+0.012
Neg logit contributions
 don-0.007
 Are-0.006
 blame-0.006
 worry-0.005
 girl-0.005
Token The
Token ID383
Feature activation-2.95e-01

Pos logit contributions
 don+0.010
 blame+0.008
 Are+0.008
 worry+0.007
 do+0.007
Neg logit contributions
ogo-0.019
ades-0.018
ules-0.018
wen-0.018
eds-0.018
Token ash
Token ID12530
Feature activation-4.37e-01

Pos logit contributions
 don+0.017
 Are+0.016
 blame+0.014
 Sorry+0.013
 give+0.012
Neg logit contributions
ogo-0.021
ades-0.020
wen-0.019
eds-0.019
ules-0.019
Tokentr
Token ID2213
Feature activation-6.42e+00

Pos logit contributions
 don+0.026
 Are+0.024
 blame+0.023
 worry+0.021
 weep+0.020
Neg logit contributions
ogo-0.030
ades-0.029
wen-0.029
eds-0.028
lor-0.027
Token let
Token ID1309
Feature activation+5.36e-01

Pos logit contributions
 Are+0.204
 girl+0.185
Wow+0.177
 blame+0.167
 boy+0.156
Neg logit contributions
ogo-0.572
ades-0.535
vern-0.516
eds-0.514
wen-0.512
Token us
Token ID514
Feature activation-1.61e+00

Pos logit contributions
ogo+0.039
ades+0.037
ules+0.037
wen+0.036
eds+0.036
Neg logit contributions
 don-0.031
 Are-0.028
 blame-0.027
 worry-0.024
 do-0.024
Token watch
Token ID2342
Feature activation+3.08e+00

Pos logit contributions
 don+0.059
 Are+0.058
 blame+0.051
 girl+0.044
 boy+0.041
Neg logit contributions
ogo-0.143
ades-0.139
eds-0.136
ules-0.135
wen-0.131
Token TV
Token ID3195
Feature activation+4.58e-01

Pos logit contributions
ogo+0.231
ades+0.223
eds+0.220
ules+0.218
wen+0.211
Neg logit contributions
 don-0.157
 Are-0.145
 blame-0.142
 worry-0.125
 girl-0.123
Token.
Token ID13
Feature activation+3.50e-01

Pos logit contributions
ogo+0.036
ades+0.035
ules+0.034
eds+0.034
wen+0.033
Neg logit contributions
 don-0.022
 Are-0.020
 blame-0.020
 worry-0.017
 weep-0.017
Token Please
Token ID4222
Feature activation-6.54e+00

Pos logit contributions
ogo+0.024
Andrew+0.024
ules+0.024
ades+0.023
lor+0.023
Neg logit contributions
 don-0.015
 worry-0.014
 weep-0.014
 blame-0.014
 forth-0.014
Token do
Token ID466
Feature activation+4.42e-02

Pos logit contributions
 Are+0.267
 girl+0.240
Wow+0.233
 blame+0.226
 boy+0.215
Neg logit contributions
ogo-0.495
ades-0.482
eds-0.453
ules-0.448
wen-0.442
Token not
Token ID407
Feature activation+8.66e-01

Pos logit contributions
ogo+0.003
ades+0.003
ules+0.003
wen+0.003
eds+0.003
Neg logit contributions
 don-0.002
 Are-0.002
 blame-0.002
 girl-0.002
 worry-0.002
Token be
Token ID307
Feature activation-2.49e+00

Pos logit contributions
ogo+0.067
eds+0.066
ules+0.066
ades+0.065
Andrew+0.063
Neg logit contributions
 Are-0.033
 don-0.032
 girl-0.027
Wow-0.026
 blame-0.025
Token mad
Token ID8805
Feature activation-1.39e+00

Pos logit contributions
 don+0.110
 Are+0.095
 blame+0.092
 worry+0.081
 girl+0.078
Neg logit contributions
ogo-0.217
ades-0.205
wen-0.203
ules-0.202
eds-0.199
Token at
Token ID379
Feature activation-2.50e+00

Pos logit contributions
 don+0.067
 Are+0.061
 blame+0.060
 weep+0.052
 worry+0.051
Neg logit contributions
ogo-0.109
ules-0.105
ades-0.105
eds-0.101
us-0.098
Token she
Token ID673
Feature activation+2.64e-01

Pos logit contributions
 blame+0.030
 Are+0.029
 don+0.026
 weep+0.026
 cry+0.025
Neg logit contributions
Andrew-0.042
wen-0.041
ules-0.041
ogo-0.040
Kevin-0.040
Token could
Token ID714
Feature activation-6.84e-02

Pos logit contributions
ogo+0.016
wen+0.015
ules+0.015
us+0.015
ades+0.014
Neg logit contributions
 blame-0.015
 Are-0.015
 don-0.014
 Sorry-0.013
 weep-0.013
Token have
Token ID423
Feature activation-9.38e-04

Pos logit contributions
 Are+0.002
 don+0.002
 blame+0.002
 girl+0.002
Wow+0.002
Neg logit contributions
ogo-0.006
ades-0.006
ules-0.006
wen-0.006
eds-0.006
Token the
Token ID262
Feature activation+8.37e-02

Pos logit contributions
 don+0.000
 Are+0.000
 blame+0.000
 worry+0.000
 girl+0.000
Neg logit contributions
ogo-0.000
ades-0.000
eds-0.000
ules-0.000
wen-0.000
Token ash
Token ID12530
Feature activation-3.00e-01

Pos logit contributions
ogo+0.005
ades+0.005
ules+0.005
wen+0.005
eds+0.005
Neg logit contributions
 don-0.005
 Are-0.005
 blame-0.005
 Sorry-0.004
 weep-0.004
Tokentr
Token ID2213
Feature activation-6.52e+00

Pos logit contributions
 don+0.018
 Are+0.017
 blame+0.017
 worry+0.015
 cry+0.014
Neg logit contributions
ogo-0.020
ades-0.019
ules-0.018
eds-0.018
lor-0.018
Tokenay
Token ID323
Feature activation+2.09e-01

Pos logit contributions
 don+0.496
 Are+0.459
 blame+0.448
 worry+0.417
 girl+0.413
Neg logit contributions
ogo-0.350
ades-0.322
ules-0.317
wen-0.314
eds-0.314
Token to
Token ID284
Feature activation+5.85e-01

Pos logit contributions
ogo+0.017
ades+0.016
ules+0.016
wen+0.016
eds+0.015
Neg logit contributions
 don-0.010
 Are-0.009
 blame-0.009
 worry-0.008
 weep-0.007
Token put
Token ID1234
Feature activation+6.34e-02

Pos logit contributions
ogo+0.047
ades+0.044
wen+0.043
Andrew+0.043
eds+0.042
Neg logit contributions
 Are-0.024
 don-0.020
 girl-0.020
 blame-0.019
 boy-0.019
Token her
Token ID607
Feature activation-7.90e-02

Pos logit contributions
ogo+0.005
eds+0.005
ades+0.005
us+0.005
wen+0.005
Neg logit contributions
 blame-0.003
 don-0.003
 Are-0.003
 worry-0.002
 weep-0.002
Token rocks
Token ID12586
Feature activation-2.81e-02

Pos logit contributions
 Are+0.004
 don+0.004
 blame+0.004
 Sorry+0.003
Hi+0.003
Neg logit contributions
ogo-0.006
eds-0.005
ades-0.005
wen-0.005
lor-0.005
Token and
Token ID290
Feature activation+1.66e-01

Pos logit contributions
 don+0.016
 Are+0.014
 blame+0.013
 worry+0.012
 girl+0.011
Neg logit contributions
ogo-0.032
ades-0.030
ules-0.030
wen-0.030
eds-0.029
Token I
Token ID314
Feature activation-2.21e+00

Pos logit contributions
ogo+0.010
ules+0.010
ades+0.010
eds+0.010
Andrew+0.010
Neg logit contributions
 Are-0.009
 don-0.009
 blame-0.008
ter-0.008
 weep-0.007
Token like
Token ID588
Feature activation-2.14e+00

Pos logit contributions
 Are+0.066
 blame+0.060
 don+0.059
ter+0.055
 girl+0.050
Neg logit contributions
ogo-0.195
ules-0.187
ades-0.186
wen-0.185
Andrew-0.181
Token them
Token ID606
Feature activation+1.50e+00

Pos logit contributions
 don+0.099
 Are+0.083
 Lily+0.077
 blame+0.077
 do+0.075
Neg logit contributions
ogo-0.189
wen-0.180
ades-0.176
lor-0.175
eds-0.172
Token.
Token ID13
Feature activation+8.34e-02

Pos logit contributions
ogo+0.118
ades+0.112
ules+0.111
eds+0.110
wen+0.109
Neg logit contributions
 don-0.075
 Are-0.068
 blame-0.065
 worry-0.057
 girl-0.056
Token Please
Token ID4222
Feature activation-6.52e+00

Pos logit contributions
ogo+0.007
ules+0.006
ades+0.006
Andrew+0.006
wen+0.006
Neg logit contributions
 don-0.004
 blame-0.003
 worry-0.003
 Are-0.003
 weep-0.003
Token,
Token ID11
Feature activation-7.61e-01

Pos logit contributions
 Are+0.233
 girl+0.213
 blame+0.202
 boy+0.197
Wow+0.196
Neg logit contributions
ogo-0.530
ades-0.489
eds-0.485
wen-0.478
ules-0.470
Token mom
Token ID1995
Feature activation-9.45e-01

Pos logit contributions
 weep+0.027
 Are+0.027
 blame+0.026
 don+0.026
ter+0.025
Neg logit contributions
ogo-0.057
Andrew-0.051
ules-0.051
wen-0.049
ades-0.048
Token,
Token ID11
Feature activation-7.85e-01

Pos logit contributions
 don+0.036
 Are+0.034
 blame+0.033
 weep+0.028
 cry+0.027
Neg logit contributions
ogo-0.081
ules-0.077
ades-0.076
eds-0.076
wen-0.075
Token just
Token ID655
Feature activation+1.01e+00

Pos logit contributions
 blame+0.028
 Are+0.026
 weep+0.026
 don+0.025
 cry+0.023
Neg logit contributions
ogo-0.061
ules-0.058
Andrew-0.056
ades-0.055
wen-0.054
Token one
Token ID530
Feature activation-2.57e+00

Pos logit contributions
 supposed+0.064
eds+0.064
ogo+0.064
ules+0.062
Andrew+0.059
Neg logit contributions
 Are-0.039
 blame-0.037
Wow-0.037
 don-0.034
Hi-0.033
Token me
Token ID502
Feature activation-2.65e+00

Pos logit contributions
 don+0.013
 Are+0.011
 blame+0.011
 worry+0.010
 do+0.010
Neg logit contributions
ogo-0.016
ades-0.016
ules-0.015
wen-0.015
eds-0.015
Token fix
Token ID4259
Feature activation+7.68e-01

Pos logit contributions
 don+0.103
 Are+0.098
 blame+0.090
 girl+0.075
 boy+0.072
Neg logit contributions
ogo-0.230
ades-0.224
eds-0.222
ules-0.217
wen-0.210
Token my
Token ID616
Feature activation-1.62e-01

Pos logit contributions
ogo+0.067
ades+0.064
ules+0.063
wen+0.063
eds+0.063
Neg logit contributions
 don-0.033
 Are-0.028
 blame-0.027
 worry-0.023
 girl-0.023
Token pants
Token ID12581
Feature activation-2.17e+00

Pos logit contributions
 don+0.009
 Are+0.008
 blame+0.008
 worry+0.007
 Sorry+0.007
Neg logit contributions
ogo-0.011
ades-0.011
eds-0.011
wen-0.010
ules-0.010
Token.
Token ID13
Feature activation+6.91e-02

Pos logit contributions
 don+0.106
 Are+0.099
 blame+0.097
 worry+0.083
 girl+0.081
Neg logit contributions
ogo-0.171
ades-0.164
ules-0.159
eds-0.159
wen-0.157
Token Please
Token ID4222
Feature activation-6.52e+00

Pos logit contributions
ogo+0.005
ules+0.005
Andrew+0.005
lor+0.005
wen+0.005
Neg logit contributions
 don-0.003
 blame-0.003
 worry-0.003
 weep-0.003
 girl-0.003
Token."
Token ID526
Feature activation+7.41e-01

Pos logit contributions
 Are+0.247
 girl+0.233
Wow+0.228
 boy+0.215
 blame+0.212
Neg logit contributions
ogo-0.491
ades-0.472
eds-0.450
vern-0.447
us-0.436
Token\n
Token ID198
Feature activation+8.41e-01

Pos logit contributions
ogo+0.052
ules+0.051
Andrew+0.050
ades+0.049
us+0.048
Neg logit contributions
 don-0.039
 blame-0.034
 Are-0.033
 worry-0.031
 do-0.030
Token\n
Token ID198
Feature activation+8.84e-01

Pos logit contributions
ogo+0.078
eds+0.074
ades+0.073
ules+0.073
wen+0.072
Neg logit contributions
 don-0.032
 Are-0.028
 Lily-0.027
 blame-0.025
 do-0.022
TokenL
Token ID43
Feature activation-3.55e-01

Pos logit contributions
ogo+0.063
ades+0.059
ules+0.058
eds+0.058
wen+0.058
Neg logit contributions
 don-0.053
 Are-0.048
 blame-0.047
 worry-0.042
 girl-0.042
Tokenily
Token ID813
Feature activation-1.30e+00

Pos logit contributions
 don+0.015
 Are+0.012
 blame+0.012
 Lily+0.011
 do+0.010
Neg logit contributions
ogo-0.031
ades-0.030
eds-0.030
ules-0.029
Andrew-0.029
Token else
Token ID2073
Feature activation+1.04e+00

Pos logit contributions
 don+0.002
 Are+0.002
 blame+0.002
 worry+0.001
 girl+0.001
Neg logit contributions
ogo-0.004
ades-0.004
ules-0.003
eds-0.003
wen-0.003
Token to
Token ID284
Feature activation+9.20e-01

Pos logit contributions
ogo+0.085
ades+0.081
ules+0.080
eds+0.080
wen+0.078
Neg logit contributions
 don-0.048
 Are-0.044
 blame-0.043
 worry-0.037
 girl-0.036
Token play
Token ID711
Feature activation-8.51e-01

Pos logit contributions
ades+0.073
ogo+0.073
Andrew+0.071
eds+0.070
asts+0.068
Neg logit contributions
 Are-0.034
 don-0.030
 girl-0.029
 blame-0.025
 boy-0.025
Token with
Token ID351
Feature activation-1.24e-01

Pos logit contributions
 don+0.030
 Are+0.027
 blame+0.027
 worry+0.022
 cry+0.020
Neg logit contributions
ades-0.076
ogo-0.076
eds-0.074
ules-0.070
lor-0.070
Token.
Token ID13
Feature activation+2.54e-01

Pos logit contributions
 don+0.006
 Are+0.005
 blame+0.005
 worry+0.004
 girl+0.004
Neg logit contributions
ogo-0.011
ades-0.010
ules-0.010
wen-0.010
eds-0.010
Token Please
Token ID4222
Feature activation-6.51e+00

Pos logit contributions
Andrew+0.017
ogo+0.017
lor+0.017
ades+0.016
ules+0.016
Neg logit contributions
 don-0.012
 blame-0.011
 worry-0.011
 weep-0.011
 feel-0.011
Token,
Token ID11
Feature activation-6.68e-01

Pos logit contributions
 Are+0.232
 girl+0.211
Wow+0.209
 blame+0.195
 boy+0.188
Neg logit contributions
ogo-0.509
ades-0.496
eds-0.480
vern-0.462
us-0.457
Token please
Token ID3387
Feature activation-2.24e+00

Pos logit contributions
 Are+0.022
 don+0.022
 blame+0.021
 weep+0.020
 cry+0.018
Neg logit contributions
ogo-0.055
ules-0.051
Andrew-0.051
ades-0.051
wen-0.049
Token,
Token ID11
Feature activation-6.40e-01

Pos logit contributions
Wow+0.091
 Are+0.091
 blame+0.084
 girl+0.082
 don+0.077
Neg logit contributions
ogo-0.157
ades-0.151
eds-0.149
Andrew-0.142
wen-0.140
Token don
Token ID836
Feature activation-2.36e+00

Pos logit contributions
 Are+0.021
 don+0.021
 blame+0.021
 weep+0.019
 cry+0.017
Neg logit contributions
ogo-0.053
ades-0.050
ules-0.050
Andrew-0.049
wen-0.047
Token't
Token ID470
Feature activation+1.52e+00

Pos logit contributions
 don+0.139
 Are+0.123
 blame+0.119
 do+0.110
 worry+0.108
Neg logit contributions
ogo-0.174
ules-0.163
ades-0.161
wen-0.161
eds-0.161
Token,
Token ID11
Feature activation-7.30e-01

Pos logit contributions
 don+0.021
 blame+0.021
 Are+0.020
 weep+0.016
 worry+0.016
Neg logit contributions
ogo-0.045
ades-0.043
eds-0.042
ules-0.042
us-0.042
Token it
Token ID340
Feature activation-2.63e-01

Pos logit contributions
 blame+0.033
 don+0.032
 Are+0.031
 weep+0.028
 worry+0.026
Neg logit contributions
ogo-0.055
Andrew-0.053
ules-0.053
wen-0.052
ades-0.052
Token caused
Token ID4073
Feature activation-1.80e-01

Pos logit contributions
 don+0.012
 Are+0.011
 blame+0.011
 worry+0.009
 girl+0.009
Neg logit contributions
ogo-0.021
ades-0.020
eds-0.020
wen-0.020
ules-0.019
Token the
Token ID262
Feature activation-2.76e-02

Pos logit contributions
 don+0.007
 Are+0.006
 blame+0.006
 worry+0.005
 girl+0.005
Neg logit contributions
ogo-0.017
ades-0.016
ules-0.016
wen-0.016
eds-0.016
Token ash
Token ID12530
Feature activation-2.99e-01

Pos logit contributions
 don+0.001
 Are+0.001
 blame+0.001
 Sorry+0.001
 Lily+0.001
Neg logit contributions
ogo-0.002
ades-0.002
wen-0.002
eds-0.002
ules-0.002
Tokentr
Token ID2213
Feature activation-6.50e+00

Pos logit contributions
 don+0.016
 Are+0.014
 blame+0.014
 worry+0.012
 girl+0.012
Neg logit contributions
ogo-0.023
ades-0.022
eds-0.021
wen-0.021
ules-0.021
Tokenay
Token ID323
Feature activation+1.83e-01

Pos logit contributions
 don+0.460
 Are+0.421
 blame+0.407
 worry+0.382
 do+0.375
Neg logit contributions
ogo-0.376
ules-0.349
ades-0.342
wen-0.342
eds-0.341
Token to
Token ID284
Feature activation+6.42e-01

Pos logit contributions
ogo+0.014
ades+0.013
ules+0.013
wen+0.013
eds+0.013
Neg logit contributions
 don-0.010
 Are-0.009
 blame-0.009
 worry-0.008
 girl-0.008
Token scatter
Token ID41058
Feature activation+1.14e+00

Pos logit contributions
ogo+0.058
ades+0.055
ules+0.055
wen+0.054
eds+0.054
Neg logit contributions
 don-0.026
 Are-0.022
 blame-0.021
 worry-0.018
 girl-0.017
Token all
Token ID477
Feature activation+1.37e-01

Pos logit contributions
ogo+0.093
ades+0.089
eds+0.089
ules+0.087
wen+0.086
Neg logit contributions
 don-0.052
 Are-0.047
 blame-0.046
 worry-0.039
 girl-0.038
Token over
Token ID625
Feature activation+5.46e-02

Pos logit contributions
ogo+0.011
ades+0.011
eds+0.011
wen+0.011
ules+0.011
Neg logit contributions
 don-0.006
 Are-0.006
 blame-0.005
 worry-0.005
 do-0.004
Token his
Token ID465
Feature activation+3.30e-02

Pos logit contributions
ogo+0.023
Andrew+0.022
ules+0.022
wen+0.022
ades+0.021
Neg logit contributions
 don-0.016
 blame-0.015
 Are-0.014
 weep-0.013
 want-0.013
Token mom
Token ID1995
Feature activation-1.11e+00

Pos logit contributions
ogo+0.003
ades+0.003
wen+0.002
ules+0.002
eds+0.002
Neg logit contributions
 don-0.002
 Are-0.001
 blame-0.001
 worry-0.001
 weep-0.001
Token saw
Token ID2497
Feature activation+1.42e+00

Pos logit contributions
 don+0.047
 Are+0.041
 blame+0.039
 worry+0.034
 girl+0.033
Neg logit contributions
ogo-0.097
ades-0.092
ules-0.091
eds-0.090
wen-0.090
Token the
Token ID262
Feature activation-1.71e-01

Pos logit contributions
ogo+0.130
ades+0.126
eds+0.124
ules+0.122
wen+0.122
Neg logit contributions
 don-0.057
 Are-0.045
 blame-0.043
 Lily-0.039
 worry-0.038
Token ash
Token ID12530
Feature activation-3.51e-01

Pos logit contributions
 don+0.010
 Are+0.010
 blame+0.009
 Sorry+0.008
 weep+0.008
Neg logit contributions
ogo-0.011
ades-0.011
ules-0.011
wen-0.011
eds-0.010
Tokentr
Token ID2213
Feature activation-6.50e+00

Pos logit contributions
 don+0.018
 Are+0.017
 blame+0.016
 worry+0.014
 weep+0.014
Neg logit contributions
ogo-0.028
ades-0.027
wen-0.026
ules-0.025
eds-0.025
Tokenay
Token ID323
Feature activation+1.98e-01

Pos logit contributions
 don+0.498
 Are+0.461
 blame+0.450
 worry+0.419
 girl+0.415
Neg logit contributions
ogo-0.347
ades-0.319
ules-0.314
wen-0.310
eds-0.310
Token,
Token ID11
Feature activation-9.11e-01

Pos logit contributions
ogo+0.016
ades+0.015
wen+0.015
ules+0.015
eds+0.015
Neg logit contributions
 don-0.009
 Are-0.009
 blame-0.008
 worry-0.007
 girl-0.007
Token she
Token ID673
Feature activation+1.44e-01

Pos logit contributions
 don+0.043
 blame+0.041
 Are+0.039
 weep+0.035
 want+0.034
Neg logit contributions
ogo-0.066
ules-0.064
Andrew-0.064
wen-0.063
ades-0.063
Token laughed
Token ID13818
Feature activation+1.42e+00

Pos logit contributions
ogo+0.013
ades+0.012
ules+0.012
wen+0.012
eds+0.012
Neg logit contributions
 don-0.006
 Are-0.005
 blame-0.005
 worry-0.004
 girl-0.004
Token and
Token ID290
Feature activation-1.15e-01

Pos logit contributions
ogo+0.107
ades+0.106
ules+0.103
eds+0.098
wen+0.098
Neg logit contributions
 don-0.072
 blame-0.067
 Are-0.065
 worry-0.055
 weep-0.054
Token a
Token ID257
Feature activation-1.02e-01

Pos logit contributions
 don+0.000
 Are+0.000
 blame+0.000
 worry+0.000
 girl+0.000
Neg logit contributions
ogo-0.000
ades-0.000
eds-0.000
ules-0.000
wen-0.000
Token mask
Token ID9335
Feature activation+3.98e-01

Pos logit contributions
 don+0.006
 Are+0.005
 blame+0.005
 worry+0.004
ter+0.004
Neg logit contributions
ogo-0.007
ades-0.007
eds-0.007
wen-0.007
ules-0.007
Token at
Token ID379
Feature activation-2.30e+00

Pos logit contributions
ogo+0.032
ades+0.031
lor+0.030
eds+0.030
ules+0.029
Neg logit contributions
 don-0.018
 Are-0.018
 blame-0.017
 worry-0.015
 girl-0.014
Token all
Token ID477
Feature activation+2.74e-01

Pos logit contributions
 don+0.104
 Are+0.091
 blame+0.087
 worry+0.076
 girl+0.075
Neg logit contributions
ogo-0.195
ades-0.185
ules-0.183
wen-0.182
eds-0.181
Token.
Token ID13
Feature activation+3.36e-01

Pos logit contributions
ogo+0.023
ades+0.022
ules+0.021
eds+0.021
wen+0.021
Neg logit contributions
 don-0.013
 Are-0.011
 blame-0.011
 worry-0.010
 girl-0.009
Token Please
Token ID4222
Feature activation-6.50e+00

Pos logit contributions
ogo+0.028
ades+0.026
ules+0.026
wen+0.026
eds+0.025
Neg logit contributions
 don-0.016
 Are-0.014
 blame-0.014
 worry-0.012
 girl-0.012
Token,
Token ID11
Feature activation-6.15e-01

Pos logit contributions
Wow+0.223
 Are+0.217
 girl+0.207
 boy+0.195
 blame+0.185
Neg logit contributions
ogo-0.536
ades-0.482
lor-0.478
vern-0.477
eds-0.476
Token Sam
Token ID3409
Feature activation+3.26e+00

Pos logit contributions
 blame+0.019
 Are+0.019
 weep+0.018
 don+0.018
Wow+0.017
Neg logit contributions
ogo-0.048
Andrew-0.045
ules-0.044
ades-0.044
wen-0.043
Token."
Token ID526
Feature activation+9.18e-01

Pos logit contributions
ogo+0.313
eds+0.293
ades+0.292
ules+0.292
wen+0.292
Neg logit contributions
 don-0.097
 Are-0.087
 blame-0.085
 worry-0.063
 girl-0.060
Token\n
Token ID198
Feature activation+1.12e+00

Pos logit contributions
ogo+0.071
ades+0.067
ules+0.067
wen+0.066
eds+0.065
Neg logit contributions
 don-0.046
 Are-0.041
 blame-0.040
 worry-0.036
 girl-0.035
Token\n
Token ID198
Feature activation+1.12e+00

Pos logit contributions
ogo+0.110
ules+0.105
eds+0.104
lor+0.103
ades+0.103
Neg logit contributions
 don-0.037
 Lily-0.031
 Are-0.031
 blame-0.024
 Tim-0.024
Token will
Token ID481
Feature activation+1.07e+00

Pos logit contributions
 Are+0.075
 girl+0.059
 Of+0.059
 oh+0.056
ter+0.055
Neg logit contributions
ades-0.099
ogo-0.092
eds-0.091
wen-0.090
asper-0.088
Token tidy
Token ID43044
Feature activation-1.78e-01

Pos logit contributions
ogo+0.080
ades+0.077
wen+0.076
Andrew+0.075
eds+0.074
Neg logit contributions
 Are-0.048
 don-0.043
 girl-0.041
 boy-0.039
 blame-0.039
Token our
Token ID674
Feature activation+7.02e-01

Pos logit contributions
 don+0.008
 Are+0.007
 blame+0.007
 worry+0.006
 girl+0.006
Neg logit contributions
ogo-0.015
ades-0.014
eds-0.014
wen-0.014
ules-0.014
Token room
Token ID2119
Feature activation-8.26e-01

Pos logit contributions
ogo+0.047
ades+0.045
eds+0.044
wen+0.043
ules+0.043
Neg logit contributions
 don-0.043
 Are-0.040
 blame-0.038
 worry-0.034
 girl-0.034
Token.
Token ID13
Feature activation+1.97e-01

Pos logit contributions
 don+0.039
 Are+0.035
 blame+0.033
 worry+0.029
 girl+0.028
Neg logit contributions
ogo-0.068
ades-0.065
eds-0.063
ules-0.063
wen-0.063
Token Please
Token ID4222
Feature activation-6.50e+00

Pos logit contributions
Andrew+0.012
lor+0.012
ogo+0.012
wen+0.011
ades+0.011
Neg logit contributions
 don-0.010
 blame-0.009
 weep-0.009
 worry-0.009
 forth-0.009
Token do
Token ID466
Feature activation-1.42e-01

Pos logit contributions
 Are+0.237
Wow+0.211
 girl+0.203
 blame+0.200
 boy+0.193
Neg logit contributions
ades-0.477
ogo-0.476
eds-0.463
wen-0.445
vern-0.441
Token not
Token ID407
Feature activation+8.29e-01

Pos logit contributions
 don+0.008
 Are+0.007
 blame+0.007
 girl+0.006
 worry+0.006
Neg logit contributions
ogo-0.011
ades-0.010
wen-0.010
eds-0.010
ules-0.010
Token take
Token ID1011
Feature activation+9.28e-01

Pos logit contributions
eds+0.054
 supposed+0.053
ades+0.052
ogo+0.052
Andrew+0.050
Neg logit contributions
 Are-0.036
Wow-0.032
 don-0.031
 girl-0.031
Grab-0.030
Token away
Token ID1497
Feature activation-1.12e+00

Pos logit contributions
ogo+0.070
eds+0.067
ades+0.067
ules+0.064
wen+0.064
Neg logit contributions
 don-0.049
 Are-0.045
 blame-0.043
 worry-0.038
 girl-0.037
Token our
Token ID674
Feature activation+7.39e-01

Pos logit contributions
 don+0.058
 Are+0.052
 blame+0.051
 worry+0.045
 girl+0.044
Neg logit contributions
ogo-0.087
ades-0.083
eds-0.082
ules-0.081
wen-0.081
Token me
Token ID502
Feature activation-2.72e+00

Pos logit contributions
ogo+0.047
ules+0.043
ades+0.042
eds+0.042
wen+0.042
Neg logit contributions
 don-0.046
 Are-0.045
 blame-0.042
 girl-0.038
 weep-0.038
Token what
Token ID644
Feature activation+1.65e+00

Pos logit contributions
 don+0.118
 Are+0.109
 blame+0.104
 worry+0.087
 girl+0.087
Neg logit contributions
ogo-0.230
ades-0.219
ules-0.216
eds-0.215
wen-0.212
Token happened
Token ID3022
Feature activation-3.32e-01

Pos logit contributions
ogo+0.137
ades+0.134
ules+0.130
wen+0.129
eds+0.129
Neg logit contributions
 don-0.071
 Are-0.063
 blame-0.062
 worry-0.053
 girl-0.052
Token next
Token ID1306
Feature activation-2.02e+00

Pos logit contributions
 don+0.016
 blame+0.015
 Are+0.015
 worry+0.013
 weep+0.012
Neg logit contributions
ogo-0.026
ades-0.024
ules-0.024
eds-0.024
wen-0.024
Token.
Token ID13
Feature activation+3.28e-01

Pos logit contributions
 don+0.123
 Are+0.113
 blame+0.109
 worry+0.097
 girl+0.095
Neg logit contributions
ogo-0.139
ades-0.130
ules-0.127
wen-0.127
eds-0.127
Token Please
Token ID4222
Feature activation-6.50e+00

Pos logit contributions
ogo+0.025
lor+0.024
Andrew+0.024
ules+0.024
wen+0.023
Neg logit contributions
 don-0.013
 forth-0.012
 blame-0.012
 worry-0.011
 feel-0.011
Token don
Token ID836
Feature activation-2.41e+00

Pos logit contributions
 Are+0.230
Wow+0.218
 girl+0.216
 blame+0.199
 boy+0.187
Neg logit contributions
ogo-0.514
ades-0.488
eds-0.473
wen-0.472
Andrew-0.469
Token't
Token ID470
Feature activation+1.49e+00

Pos logit contributions
 don+0.143
 Are+0.128
 blame+0.124
 do+0.113
 worry+0.112
Neg logit contributions
ogo-0.174
ades-0.162
ules-0.161
eds-0.161
wen-0.160
Token stop
Token ID2245
Feature activation+1.40e+00

Pos logit contributions
ogo+0.108
eds+0.104
ules+0.099
ades+0.099
Andrew+0.098
Neg logit contributions
 Are-0.056
Wow-0.055
 don-0.050
Grab-0.049
 girl-0.048
Token,"
Token ID553
Feature activation-7.12e-01

Pos logit contributions
ogo+0.102
eds+0.101
ules+0.098
ades+0.095
us+0.094
Neg logit contributions
 don-0.070
 Are-0.064
 blame-0.061
 girl-0.056
 boy-0.054
Token Lily
Token ID20037
Feature activation-2.74e-01

Pos logit contributions
 don+0.030
 blame+0.026
 Are+0.026
 want+0.023
 forth+0.022
Neg logit contributions
Andrew-0.055
ogo-0.052
ules-0.052
ades-0.052
wen-0.051
Token.
Token ID13
Feature activation+4.74e-01

Pos logit contributions
ogo+0.072
ades+0.068
ules+0.068
eds+0.068
wen+0.066
Neg logit contributions
 don-0.040
 Are-0.036
 blame-0.035
 worry-0.031
 girl-0.030
Token He
Token ID679
Feature activation+9.02e-02

Pos logit contributions
ogo+0.034
ules+0.032
ades+0.032
wen+0.031
eds+0.031
Neg logit contributions
 don-0.027
 blame-0.023
 Are-0.023
 worry-0.022
 girl-0.021
Token drops
Token ID10532
Feature activation+1.73e-01

Pos logit contributions
ogo+0.007
ades+0.007
eds+0.007
ules+0.007
wen+0.007
Neg logit contributions
 don-0.004
 Are-0.004
 blame-0.004
 worry-0.003
 girl-0.003
Token the
Token ID262
Feature activation+1.13e-01

Pos logit contributions
ogo+0.014
ades+0.013
ules+0.013
eds+0.013
wen+0.013
Neg logit contributions
 don-0.008
 Are-0.007
 blame-0.007
 worry-0.006
 girl-0.006
Token ash
Token ID12530
Feature activation-3.53e-01

Pos logit contributions
ogo+0.008
ades+0.007
eds+0.007
ules+0.007
wen+0.007
Neg logit contributions
 don-0.007
 Are-0.006
 blame-0.006
 worry-0.005
 Sorry-0.005
Tokentr
Token ID2213
Feature activation-6.50e+00

Pos logit contributions
 don+0.016
 Are+0.014
 blame+0.014
 worry+0.012
 girl+0.012
Neg logit contributions
ogo-0.029
ades-0.028
eds-0.028
ules-0.027
wen-0.027
Tokenay
Token ID323
Feature activation+2.23e-01

Pos logit contributions
 don+0.464
 Are+0.427
 blame+0.411
 worry+0.384
 girl+0.381
Neg logit contributions
ogo-0.370
ules-0.346
ades-0.340
wen-0.339
eds-0.334
Token.
Token ID13
Feature activation+2.27e-01

Pos logit contributions
ogo+0.018
ades+0.017
eds+0.017
wen+0.017
ules+0.017
Neg logit contributions
 don-0.011
 Are-0.010
 blame-0.009
 worry-0.008
 girl-0.008
Token It
Token ID632
Feature activation-1.60e+00

Pos logit contributions
ogo+0.016
ades+0.015
ules+0.015
wen+0.015
Andrew+0.015
Neg logit contributions
 don-0.013
 blame-0.012
 Are-0.011
 worry-0.011
 girl-0.010
Token breaks
Token ID9457
Feature activation+8.96e-02

Pos logit contributions
 don+0.080
 Are+0.072
 blame+0.070
 worry+0.066
 girl+0.062
Neg logit contributions
ogo-0.122
eds-0.116
ades-0.115
wen-0.110
ules-0.108
Token.
Token ID13
Feature activation+2.72e-01

Pos logit contributions
ogo+0.008
ades+0.008
ules+0.008
wen+0.008
lor+0.008
Neg logit contributions
 don-0.004
 Are-0.003
 blame-0.003
 Lily-0.003
 worry-0.002
Token said
Token ID531
Feature activation-1.03e+00

Pos logit contributions
ogo+0.025
ules+0.025
ades+0.024
us+0.024
wen+0.024
Neg logit contributions
 don-0.013
 Are-0.012
 blame-0.010
 worry-0.010
 Sorry-0.009
Token.
Token ID13
Feature activation+4.15e-02

Pos logit contributions
 don+0.063
 Are+0.061
 blame+0.056
Wow+0.052
 Sorry+0.050
Neg logit contributions
ades-0.061
us-0.060
ules-0.059
ogo-0.058
vern-0.055
Token\n
Token ID198
Feature activation+6.02e-01

Pos logit contributions
ogo+0.004
eds+0.003
wen+0.003
ades+0.003
ules+0.003
Neg logit contributions
 don-0.002
 Are-0.002
 blame-0.002
 Lily-0.002
 boy-0.002
Token
Token ID198
Feature activation+5.96e-01

Pos logit contributions
ogo+0.056
ules+0.055
lor+0.054
wen+0.054
opl+0.053
Neg logit contributions
 don-0.022
 Lily-0.019
 Are-0.019
 do-0.016
 boy-0.016
TokenM
Token ID44
Feature activation+2.16e+00

Pos logit contributions
ogo+0.040
ules+0.037
wen+0.037
ades+0.037
eds+0.036
Neg logit contributions
 don-0.039
 Are-0.036
 blame-0.034
 girl-0.033
 boy-0.032
Tokenia
Token ID544
Feature activation+3.77e+00

Pos logit contributions
eds+0.098
ogo+0.097
lor+0.096
Andrew+0.092
ades+0.090
Neg logit contributions
 don-0.170
 blame-0.154
 girl-0.149
 do-0.148
 worry-0.146
Token yawn
Token ID40908
Feature activation-6.78e-01

Pos logit contributions
ogo+0.335
ades+0.320
ules+0.316
wen+0.314
eds+0.313
Neg logit contributions
 don-0.153
 Are-0.133
 blame-0.127
 worry-0.108
 girl-0.104
Tokened
Token ID276
Feature activation-1.52e+00

Pos logit contributions
 don+0.036
 blame+0.032
 Are+0.032
 worry+0.029
 weep+0.027
Neg logit contributions
ogo-0.051
ades-0.049
ules-0.048
wen-0.048
us-0.047
Token.
Token ID13
Feature activation+5.38e-02

Pos logit contributions
 don+0.098
 Are+0.093
 blame+0.093
 Sorry+0.088
 do+0.077
Neg logit contributions
ades-0.079
ogo-0.078
Andrew-0.076
ules-0.075
us-0.074
Token She
Token ID1375
Feature activation-7.09e-02

Pos logit contributions
ogo+0.005
ades+0.005
ules+0.005
wen+0.005
eds+0.005
Neg logit contributions
 don-0.002
 Are-0.002
 blame-0.002
 worry-0.001
 girl-0.001
Token did
Token ID750
Feature activation+5.47e-01

Pos logit contributions
 don+0.003
 Are+0.003
 blame+0.002
 worry+0.002
 girl+0.002
Neg logit contributions
ogo-0.006
ades-0.006
wen-0.006
ules-0.006
eds-0.006
Token.
Token ID13
Feature activation+3.38e-02

Pos logit contributions
ogo+0.177
ades+0.174
ules+0.171
eds+0.170
us+0.162
Neg logit contributions
 don-0.122
 Are-0.111
 blame-0.110
 worry-0.096
 weep-0.094
Token She
Token ID1375
Feature activation-1.23e-01

Pos logit contributions
ogo+0.003
ades+0.003
eds+0.003
wen+0.003
ules+0.003
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 Lily-0.001
 girl-0.001
Token watched
Token ID7342
Feature activation+1.26e+00

Pos logit contributions
 don+0.005
 Are+0.004
 blame+0.004
 worry+0.003
 girl+0.003
Neg logit contributions
ogo-0.011
ades-0.011
ules-0.010
eds-0.010
wen-0.010
Token his
Token ID465
Feature activation+8.26e-02

Pos logit contributions
ogo+0.101
ades+0.100
ules+0.099
eds+0.095
wen+0.094
Neg logit contributions
 don-0.055
 Are-0.052
 blame-0.048
 worry-0.041
 weep-0.040
Token moves
Token ID6100
Feature activation+1.34e+00

Pos logit contributions
ogo+0.006
ades+0.006
eds+0.006
ules+0.006
Andrew+0.006
Neg logit contributions
 don-0.004
 Are-0.004
 blame-0.004
ter-0.003
 Mind-0.003
Token carefully
Token ID7773
Feature activation+3.62e+00

Pos logit contributions
ogo+0.114
ades+0.110
ules+0.108
eds+0.107
wen+0.106
Neg logit contributions
 don-0.058
 Are-0.053
 blame-0.050
 worry-0.042
 girl-0.040
Token and
Token ID290
Feature activation-5.26e-02

Pos logit contributions
ogo+0.288
ades+0.281
ules+0.275
eds+0.273
wen+0.265
Neg logit contributions
 don-0.162
 Are-0.150
 blame-0.144
 worry-0.122
 girl-0.121
Token learned
Token ID4499
Feature activation+2.81e+00

Pos logit contributions
 don+0.002
 blame+0.002
 Are+0.002
 worry+0.002
 weep+0.002
Neg logit contributions
Andrew-0.004
ules-0.003
ogo-0.003
asper-0.003
wen-0.003
Token a
Token ID257
Feature activation-7.23e-01

Pos logit contributions
ogo+0.237
ades+0.224
ules+0.223
eds+0.222
wen+0.222
Neg logit contributions
 don-0.131
 Are-0.112
 blame-0.108
 worry-0.096
 do-0.094
Token lot
Token ID1256
Feature activation+1.33e+00

Pos logit contributions
 don+0.045
 Are+0.037
 do+0.037
 girl+0.035
 feel+0.035
Neg logit contributions
ogo-0.058
ules-0.054
wen-0.054
eds-0.053
lor-0.051
Token.
Token ID13
Feature activation-7.74e-03

Pos logit contributions
ogo+0.117
eds+0.110
wen+0.110
ades+0.109
ules+0.109
Neg logit contributions
 don-0.058
 Are-0.047
 blame-0.045
 do-0.043
 Lily-0.041
Token.
Token ID13
Feature activation+2.28e-03

Pos logit contributions
 don+0.016
 Are+0.015
 blame+0.014
 worry+0.012
 girl+0.012
Neg logit contributions
ogo-0.026
ades-0.025
ules-0.024
wen-0.024
eds-0.024
Token She
Token ID1375
Feature activation-1.15e-01

Pos logit contributions
ogo+0.000
ades+0.000
eds+0.000
wen+0.000
ules+0.000
Neg logit contributions
 don-0.000
 Are-0.000
 blame-0.000
 Lily-0.000
 girl-0.000
Token wrapped
Token ID12908
Feature activation-1.34e+00

Pos logit contributions
 don+0.005
 Are+0.004
 blame+0.004
 worry+0.004
 girl+0.004
Neg logit contributions
ogo-0.010
ades-0.010
ules-0.009
wen-0.009
eds-0.009
Token the
Token ID262
Feature activation-1.02e-01

Pos logit contributions
 don+0.060
 blame+0.058
 Are+0.057
 worry+0.050
 weep+0.048
Neg logit contributions
ogo-0.109
ades-0.104
eds-0.103
wen-0.103
ules-0.100
Token yarn
Token ID21181
Feature activation+3.56e-01

Pos logit contributions
 don+0.006
 Are+0.005
 blame+0.005
 Sorry+0.005
 worry+0.004
Neg logit contributions
ogo-0.007
ades-0.007
eds-0.007
wen-0.007
lor-0.007
Token carefully
Token ID7773
Feature activation+3.57e+00

Pos logit contributions
ogo+0.030
ades+0.029
eds+0.029
wen+0.028
ules+0.028
Neg logit contributions
 don-0.015
 Are-0.014
 blame-0.014
 worry-0.011
 Sorry-0.010
Token around
Token ID1088
Feature activation-1.03e+00

Pos logit contributions
ogo+0.314
ades+0.301
ules+0.295
eds+0.295
wen+0.294
Neg logit contributions
 don-0.146
 Are-0.129
 blame-0.124
 worry-0.105
 girl-0.101
Token each
Token ID1123
Feature activation+3.15e-01

Pos logit contributions
 don+0.039
 Are+0.034
 blame+0.033
 worry+0.027
 girl+0.026
Neg logit contributions
ogo-0.093
ades-0.089
eds-0.088
ules-0.088
wen-0.088
Token thread
Token ID4704
Feature activation-2.01e+00

Pos logit contributions
ogo+0.024
ades+0.023
ules+0.023
wen+0.023
eds+0.023
Neg logit contributions
 don-0.017
 Are-0.015
 blame-0.014
 worry-0.013
 girl-0.013
Token of
Token ID286
Feature activation-6.71e-02

Pos logit contributions
 don+0.077
 Are+0.068
 blame+0.064
 worry+0.054
 girl+0.052
Neg logit contributions
ogo-0.182
ades-0.174
ules-0.172
eds-0.171
wen-0.171
Token the
Token ID262
Feature activation-8.82e-02

Pos logit contributions
 don+0.002
 Are+0.002
 blame+0.002
 worry+0.002
 girl+0.002
Neg logit contributions
ogo-0.006
ades-0.006
ules-0.006
wen-0.006
eds-0.006
Token proud
Token ID6613
Feature activation+2.74e+00

Pos logit contributions
ules+0.030
ades+0.029
wen+0.029
vern+0.029
ogo+0.029
Neg logit contributions
 Are-0.025
 don-0.025
ter-0.023
 blame-0.023
Wow-0.021
Token and
Token ID290
Feature activation-8.55e-02

Pos logit contributions
ogo+0.214
ades+0.203
ules+0.202
eds+0.199
wen+0.199
Neg logit contributions
 don-0.141
 Are-0.126
 blame-0.123
 worry-0.109
 weep-0.105
Token excited
Token ID6568
Feature activation-7.06e-02

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
ter+0.003
 weep+0.003
Neg logit contributions
ules-0.006
wen-0.006
ogo-0.006
ades-0.005
Andrew-0.005
Token to
Token ID284
Feature activation+4.59e-01

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
 weep+0.003
 worry+0.003
Neg logit contributions
ogo-0.005
ules-0.005
ades-0.005
wen-0.005
eds-0.004
Token be
Token ID307
Feature activation-2.91e+00

Pos logit contributions
ogo+0.043
ades+0.041
ules+0.041
wen+0.041
eds+0.041
Neg logit contributions
 don-0.016
 Are-0.014
 blame-0.012
 girl-0.010
 worry-0.010
Token able
Token ID1498
Feature activation+3.79e+00

Pos logit contributions
 don+0.145
 Are+0.129
 blame+0.124
 worry+0.109
 girl+0.106
Neg logit contributions
ogo-0.233
ades-0.222
ules-0.219
eds-0.218
wen-0.217
Token to
Token ID284
Feature activation+4.69e-01

Pos logit contributions
ogo+0.325
wen+0.302
ades+0.302
ules+0.299
lor+0.297
Neg logit contributions
 don-0.181
 blame-0.149
 Are-0.148
 worry-0.143
 do-0.135
Token play
Token ID711
Feature activation-1.04e+00

Pos logit contributions
ogo+0.045
ades+0.042
ules+0.042
wen+0.042
eds+0.042
Neg logit contributions
 don-0.016
 Are-0.014
 blame-0.012
 girl-0.011
 worry-0.010
Token all
Token ID477
Feature activation+1.71e-02

Pos logit contributions
 don+0.050
 Are+0.049
 blame+0.048
 worry+0.044
 Sorry+0.043
Neg logit contributions
ades-0.075
ogo-0.072
eds-0.071
ules-0.067
lor-0.066
Token by
Token ID416
Feature activation+9.98e-01

Pos logit contributions
ogo+0.001
eds+0.001
ades+0.001
ules+0.001
wen+0.001
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 worry-0.001
 cry-0.001
Token herself
Token ID5223
Feature activation-2.16e-01

Pos logit contributions
ogo+0.089
ades+0.085
ules+0.084
wen+0.083
eds+0.083
Neg logit contributions
 don-0.042
 Are-0.036
 blame-0.034
 worry-0.030
 girl-0.029
Token It
Token ID632
Feature activation-1.87e+00

Pos logit contributions
 don+0.002
 worry+0.002
 blame+0.002
 weep+0.002
 cry+0.002
Neg logit contributions
ogo-0.002
ules-0.002
Andrew-0.002
lor-0.002
ades-0.002
Token is
Token ID318
Feature activation+1.13e+00

Pos logit contributions
 don+0.095
 Are+0.085
 blame+0.084
 worry+0.076
 weep+0.072
Neg logit contributions
ogo-0.145
ades-0.138
eds-0.135
wen-0.134
ules-0.134
Token a
Token ID257
Feature activation-5.99e-01

Pos logit contributions
ogo+0.102
ades+0.098
ules+0.097
lor+0.096
eds+0.096
Neg logit contributions
 don-0.048
 Are-0.040
 Lily-0.037
 do-0.035
 blame-0.035
Token very
Token ID845
Feature activation+1.85e+00

Pos logit contributions
 don+0.035
 Are+0.032
 blame+0.031
 worry+0.028
 girl+0.028
Neg logit contributions
ogo-0.043
ades-0.040
ules-0.039
wen-0.039
eds-0.039
Token distant
Token ID12899
Feature activation-8.28e-01

Pos logit contributions
ogo+0.137
wen+0.127
ules+0.126
eds+0.126
lor+0.125
Neg logit contributions
 don-0.109
 Are-0.097
 blame-0.094
 girl-0.088
 do-0.088
Token building
Token ID2615
Feature activation+3.63e+00

Pos logit contributions
 don+0.057
 blame+0.052
 Are+0.051
 worry+0.047
 weep+0.045
Neg logit contributions
ogo-0.046
ades-0.045
eds-0.044
ules-0.044
wen-0.043
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
ades+0.286
ogo+0.286
eds+0.273
ules+0.272
wen+0.266
Neg logit contributions
 don-0.170
 Are-0.156
 blame-0.152
 worry-0.134
 weep-0.127
Token It
Token ID632
Feature activation-1.85e+00

Pos logit contributions
ogo+0.001
ules+0.001
Andrew+0.001
wen+0.001
ades+0.001
Neg logit contributions
 don-0.000
 blame-0.000
 worry-0.000
 weep-0.000
 cry-0.000
Token is
Token ID318
Feature activation+1.14e+00

Pos logit contributions
 don+0.092
 Are+0.085
 blame+0.084
 worry+0.078
 weep+0.073
Neg logit contributions
ogo-0.139
ades-0.133
eds-0.130
wen-0.128
ules-0.126
Token in
Token ID287
Feature activation-1.28e+00

Pos logit contributions
ogo+0.098
ades+0.093
ules+0.092
eds+0.091
wen+0.091
Neg logit contributions
 don-0.052
 Are-0.045
 blame-0.043
 worry-0.038
 do-0.038
Token the
Token ID262
Feature activation+3.70e-02

Pos logit contributions
 don+0.045
 Are+0.037
 blame+0.034
 worry+0.030
 do+0.029
Neg logit contributions
ogo-0.125
ades-0.118
ules-0.118
wen-0.116
eds-0.115
TokenI
Token ID40
Feature activation-1.66e+00

Pos logit contributions
ogo+0.027
ades+0.027
ules+0.026
wen+0.026
Andrew+0.026
Neg logit contributions
 don-0.017
 blame-0.015
 Are-0.014
 worry-0.013
 weep-0.013
Token think
Token ID892
Feature activation-1.31e-01

Pos logit contributions
 forth+0.042
 Are+0.042
ter+0.041
 blame+0.041
 Sorry+0.033
Neg logit contributions
ogo-0.147
us-0.143
wen-0.142
Andrew-0.141
ades-0.140
Token I
Token ID314
Feature activation-2.02e+00

Pos logit contributions
 don+0.005
 blame+0.005
 Are+0.004
 worry+0.004
 weep+0.004
Neg logit contributions
ogo-0.012
ades-0.011
ules-0.011
wen-0.011
eds-0.011
Token know
Token ID760
Feature activation+1.20e-01

Pos logit contributions
 don+0.073
 Are+0.070
 blame+0.068
 girl+0.059
 worry+0.057
Neg logit contributions
ogo-0.175
ades-0.165
wen-0.165
ules-0.165
eds-0.163
Token which
Token ID543
Feature activation+2.33e-01

Pos logit contributions
ogo+0.010
ades+0.010
ules+0.009
wen+0.009
eds+0.009
Neg logit contributions
 don-0.005
 blame-0.005
 Are-0.005
 weep-0.004
 worry-0.004
Token book
Token ID1492
Feature activation+8.53e-01

Pos logit contributions
ogo+0.016
ades+0.015
ules+0.015
wen+0.015
eds+0.015
Neg logit contributions
 don-0.014
 Are-0.013
 blame-0.013
 worry-0.012
 weep-0.011
Token you
Token ID345
Feature activation-1.42e+00

Pos logit contributions
ogo+0.067
ades+0.066
eds+0.061
wen+0.061
lor+0.061
Neg logit contributions
 don-0.039
 Are-0.037
 blame-0.037
 worry-0.033
 weep-0.031
Token mean
Token ID1612
Feature activation-1.51e+00

Pos logit contributions
 Are+0.051
 blame+0.049
 don+0.048
 girl+0.041
 worry+0.040
Neg logit contributions
ogo-0.122
ades-0.116
ules-0.115
eds-0.113
wen-0.112
Token,"
Token ID553
Feature activation-9.66e-01

Pos logit contributions
 don+0.064
 Are+0.061
 blame+0.060
 worry+0.053
 weep+0.051
Neg logit contributions
ogo-0.121
ades-0.117
eds-0.116
ules-0.113
us-0.111
Token the
Token ID262
Feature activation+1.41e-02

Pos logit contributions
 don+0.055
 Are+0.052
 blame+0.047
 want+0.046
ter+0.045
Neg logit contributions
Andrew-0.056
ogo-0.053
Steven-0.050
wen-0.050
Karl-0.050
Token park
Token ID3952
Feature activation+1.53e+00

Pos logit contributions
ogo+0.001
ades+0.001
ules+0.001
eds+0.001
wen+0.001
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 worry-0.001
 weep-0.001
Token
Token ID198
Feature activation+7.62e-01

Pos logit contributions
 don+0.065
 blame+0.059
 Are+0.058
 worry+0.054
 weep+0.053
Neg logit contributions
ogo-0.063
ades-0.061
ules-0.060
wen-0.059
Andrew-0.059
Token
Token ID198
Feature activation+7.57e-01

Pos logit contributions
ogo+0.069
ules+0.066
eds+0.065
wen+0.064
ades+0.063
Neg logit contributions
 don-0.031
 Are-0.027
 Lily-0.025
 blame-0.025
 do-0.022
Token"
Token ID1
Feature activation+3.05e-01

Pos logit contributions
ogo+0.050
ules+0.046
ades+0.046
wen+0.045
eds+0.045
Neg logit contributions
 don-0.050
 Are-0.046
 blame-0.044
 worry-0.041
 girl-0.041
TokenWe
Token ID1135
Feature activation-2.14e+00

Pos logit contributions
ades+0.023
ogo+0.022
ules+0.022
Andrew+0.022
wen+0.021
Neg logit contributions
 don-0.015
 blame-0.013
 Are-0.013
 worry-0.012
 cry-0.012
Token are
Token ID389
Feature activation-6.52e-01

Pos logit contributions
 Are+0.078
 blame+0.066
 girl+0.062
 don+0.060
 boy+0.057
Neg logit contributions
ogo-0.179
ades-0.170
ules-0.170
eds-0.166
asper-0.165
Token digging
Token ID18894
Feature activation+3.14e+00

Pos logit contributions
 don+0.035
 Are+0.033
 blame+0.031
 worry+0.028
 girl+0.028
Neg logit contributions
ogo-0.048
ules-0.045
ades-0.044
wen-0.044
eds-0.043
Token for
Token ID329
Feature activation-5.31e-01

Pos logit contributions
ades+0.248
ogo+0.243
eds+0.240
us+0.228
Andrew+0.218
Neg logit contributions
 blame-0.138
 Are-0.135
 don-0.135
 worry-0.110
 weep-0.109
Token minerals
Token ID21622
Feature activation+8.22e-01

Pos logit contributions
 don+0.025
 Are+0.022
 blame+0.021
 girl+0.018
 worry+0.018
Neg logit contributions
ogo-0.043
ades-0.042
eds-0.041
ules-0.041
wen-0.040
Token,"
Token ID553
Feature activation-1.11e+00

Pos logit contributions
ogo+0.077
ades+0.075
ules+0.073
eds+0.072
us+0.071
Neg logit contributions
 don-0.028
 Are-0.026
 blame-0.025
 worry-0.019
 weep-0.018
Token Tom
Token ID4186
Feature activation+1.91e+00

Pos logit contributions
 don+0.058
 Are+0.057
 blame+0.051
 forth+0.051
cy+0.050
Neg logit contributions
Andrew-0.064
ules-0.059
ades-0.058
David-0.057
Kevin-0.056
Token said
Token ID531
Feature activation-1.02e+00

Pos logit contributions
ogo+0.165
ules+0.160
wen+0.155
ades+0.153
Andrew+0.153
Neg logit contributions
 don-0.075
 Are-0.067
 blame-0.063
 Sorry-0.056
 worry-0.053
Token first
Token ID717
Feature activation+9.45e-01

Pos logit contributions
 don+0.056
 Are+0.043
 blame+0.039
 do+0.038
 worry+0.038
Neg logit contributions
ogo-0.128
ades-0.123
wen-0.123
eds-0.120
ules-0.119
Token."
Token ID526
Feature activation+7.89e-01

Pos logit contributions
ogo+0.073
ades+0.070
wen+0.068
ules+0.068
eds+0.068
Neg logit contributions
 don-0.051
 Are-0.044
 blame-0.042
 worry-0.039
 do-0.039
Token
Token ID198
Feature activation+6.05e-01

Pos logit contributions
ogo+0.066
ades+0.063
eds+0.063
wen+0.062
ules+0.061
Neg logit contributions
 don-0.039
 Are-0.034
 blame-0.032
 Lily-0.030
 girl-0.029
Token
Token ID198
Feature activation+6.07e-01

Pos logit contributions
ogo+0.059
ules+0.057
lor+0.057
icians+0.057
eds+0.057
Neg logit contributions
 don-0.015
 Lily-0.015
 Are-0.012
 Tim-0.012
 do-0.011
TokenThey
Token ID2990
Feature activation-2.95e-01

Pos logit contributions
ogo+0.048
ules+0.045
wen+0.044
eds+0.044
opl+0.044
Neg logit contributions
 don-0.031
 Are-0.027
 do-0.027
 boy-0.026
 girl-0.025
Token ran
Token ID4966
Feature activation+2.58e-01

Pos logit contributions
 don+0.013
 Are+0.012
 blame+0.011
 worry+0.010
 girl+0.009
Neg logit contributions
ogo-0.025
ules-0.023
ades-0.023
wen-0.023
Andrew-0.023
Token to
Token ID284
Feature activation+5.89e-01

Pos logit contributions
ogo+0.019
ades+0.017
eds+0.017
ules+0.017
wen+0.017
Neg logit contributions
 don-0.015
 Are-0.014
 blame-0.013
 worry-0.012
 weep-0.011
Token the
Token ID262
Feature activation+1.19e-01

Pos logit contributions
ogo+0.049
Andrew+0.046
wen+0.046
ades+0.046
ules+0.046
Neg logit contributions
 don-0.024
 Are-0.024
 blame-0.020
 girl-0.017
 Sorry-0.017
Token kitchen
Token ID9592
Feature activation+1.78e+00

Pos logit contributions
ogo+0.010
ules+0.009
wen+0.009
ades+0.009
lor+0.009
Neg logit contributions
 don-0.006
 blame-0.006
 Are-0.006
 boy-0.005
 girl-0.005
Token where
Token ID810
Feature activation+1.59e+00

Pos logit contributions
ogo+0.157
ades+0.148
ules+0.146
wen+0.146
eds+0.146
Neg logit contributions
 don-0.077
 Are-0.066
 blame-0.063
 worry-0.056
 girl-0.055
Token Mom
Token ID11254
Feature activation-1.36e+00

Pos logit contributions
ogo+0.129
ades+0.128
wen+0.125
ules+0.124
eds+0.122
Neg logit contributions
 don-0.068
 Are-0.065
 blame-0.062
 weep-0.051
 worry-0.050
Token falls
Token ID8953
Feature activation-8.22e-01

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.004
 worry+0.003
 girl+0.003
Neg logit contributions
ogo-0.007
ades-0.007
ules-0.007
wen-0.007
eds-0.007
Token on
Token ID319
Feature activation-7.40e-01

Pos logit contributions
 don+0.036
 Are+0.031
 blame+0.030
 worry+0.026
 girl+0.025
Neg logit contributions
ogo-0.071
ades-0.068
eds-0.067
ules-0.067
wen-0.066
Token the
Token ID262
Feature activation+1.32e-01

Pos logit contributions
 don+0.021
 Are+0.015
 Lily+0.014
 blame+0.014
 worry+0.012
Neg logit contributions
ogo-0.077
ules-0.074
wen-0.074
ades-0.073
eds-0.073
Token grass
Token ID8701
Feature activation+2.00e-01

Pos logit contributions
ogo+0.011
lor+0.011
ules+0.011
wen+0.011
ades+0.010
Neg logit contributions
 don-0.006
 Are-0.005
 blame-0.005
 do-0.005
 cry-0.005
Token and
Token ID290
Feature activation+1.32e-01

Pos logit contributions
ogo+0.019
wen+0.018
ules+0.018
ades+0.018
eds+0.018
Neg logit contributions
 don-0.007
 Are-0.006
 blame-0.005
 do-0.005
 boy-0.005
Token gets
Token ID3011
Feature activation+2.48e-01

Pos logit contributions
ogo+0.008
eds+0.008
opl+0.007
ades+0.007
Andrew+0.007
Neg logit contributions
 don-0.007
 Are-0.006
 blame-0.006
 worry-0.006
ter-0.006
Token dirty
Token ID11841
Feature activation-1.56e+00

Pos logit contributions
ades+0.018
ogo+0.018
eds+0.017
us+0.017
ules+0.016
Neg logit contributions
 don-0.012
 Are-0.012
 blame-0.011
 girl-0.010
 Sorry-0.009
Token.
Token ID13
Feature activation+3.37e-02

Pos logit contributions
 don+0.093
 Are+0.087
 blame+0.086
 worry+0.073
 Sorry+0.073
Neg logit contributions
ogo-0.103
ades-0.102
eds-0.100
us-0.096
ules-0.095
Token He
Token ID679
Feature activation-9.79e-02

Pos logit contributions
ogo+0.003
ades+0.003
wen+0.003
eds+0.003
ules+0.003
Neg logit contributions
 don-0.002
 Are-0.001
 blame-0.001
 Lily-0.001
 girl-0.001
Token cries
Token ID24691
Feature activation-1.04e+00

Pos logit contributions
 don+0.005
 Are+0.004
 blame+0.004
 worry+0.004
 girl+0.004
Neg logit contributions
ogo-0.008
ades-0.007
eds-0.007
ules-0.007
wen-0.007
Token and
Token ID290
Feature activation+1.20e-01

Pos logit contributions
 don+0.050
 Are+0.045
 blame+0.045
 worry+0.038
 weep+0.037
Neg logit contributions
ogo-0.083
ades-0.079
ules-0.078
eds-0.077
us-0.075
Token story
Token ID1621
Feature activation+1.70e+00

Pos logit contributions
ogo+0.005
ades+0.005
ules+0.005
eds+0.005
wen+0.005
Neg logit contributions
 don-0.004
 Are-0.004
 blame-0.003
 do-0.003
 girl-0.003
Token is
Token ID318
Feature activation+1.43e+00

Pos logit contributions
ogo+0.081
ades+0.075
wen+0.072
ules+0.070
eds+0.070
Neg logit contributions
 don-0.138
 blame-0.128
 Are-0.126
 worry-0.122
 weep-0.117
Token about
Token ID546
Feature activation+7.92e-01

Pos logit contributions
ogo+0.130
ades+0.124
eds+0.122
ules+0.121
wen+0.121
Neg logit contributions
 don-0.058
 Are-0.049
 blame-0.046
 worry-0.041
 girl-0.040
Token Jane
Token ID12091
Feature activation+2.41e+00

Pos logit contributions
ogo+0.067
ades+0.065
ules+0.064
wen+0.063
eds+0.063
Neg logit contributions
 don-0.032
 Are-0.029
 blame-0.029
 worry-0.023
 weep-0.023
Token and
Token ID290
Feature activation+1.86e-01

Pos logit contributions
ogo+0.229
ades+0.223
wen+0.212
ules+0.211
eds+0.209
Neg logit contributions
 don-0.091
 Are-0.072
 girl-0.066
 blame-0.065
 worry-0.063
Token her
Token ID607
Feature activation-9.72e-02

Pos logit contributions
ogo+0.014
ules+0.014
wen+0.013
Andrew+0.013
ades+0.013
Neg logit contributions
 don-0.009
 blame-0.008
 Are-0.008
 girl-0.007
 worry-0.007
Token friend
Token ID1545
Feature activation+2.09e-01

Pos logit contributions
 don+0.005
 Are+0.005
 blame+0.004
 worry+0.004
 weep+0.004
Neg logit contributions
ogo-0.007
ades-0.007
ules-0.007
wen-0.007
eds-0.007
Token Bob
Token ID5811
Feature activation+9.45e-01

Pos logit contributions
ogo+0.018
ules+0.018
ades+0.018
wen+0.017
eds+0.017
Neg logit contributions
 don-0.008
 Are-0.007
 blame-0.006
 worry-0.005
 weep-0.005
Token.
Token ID13
Feature activation+7.88e-02

Pos logit contributions
ogo+0.080
ades+0.076
ules+0.075
wen+0.075
eds+0.075
Neg logit contributions
 don-0.043
 Are-0.038
 blame-0.036
 worry-0.032
 girl-0.031
Token Jane
Token ID12091
Feature activation+2.31e+00

Pos logit contributions
ogo+0.007
ades+0.007
ules+0.006
eds+0.006
wen+0.006
Neg logit contributions
 don-0.003
 Are-0.003
 blame-0.003
 worry-0.002
 girl-0.002
Token and
Token ID290
Feature activation+8.69e-02

Pos logit contributions
ogo+0.193
ades+0.185
ules+0.183
wen+0.181
eds+0.181
Neg logit contributions
 don-0.101
 Are-0.090
 blame-0.087
 worry-0.076
 girl-0.074
Token'm
Token ID1101
Feature activation-5.40e-01

Pos logit contributions
 Are+0.081
 don+0.079
 blame+0.076
 girl+0.069
ter+0.068
Neg logit contributions
ogo-0.147
ades-0.139
eds-0.138
us-0.137
wen-0.136
Token here
Token ID994
Feature activation-1.17e+00

Pos logit contributions
 don+0.018
 Are+0.015
 blame+0.014
 worry+0.012
 do+0.011
Neg logit contributions
ogo-0.053
ades-0.051
ules-0.050
eds-0.050
wen-0.050
Token.
Token ID13
Feature activation+3.88e-02

Pos logit contributions
 don+0.067
 blame+0.064
 Are+0.063
 girl+0.055
 weep+0.055
Neg logit contributions
ogo-0.077
eds-0.073
ades-0.073
ules-0.072
wen-0.071
Token But
Token ID887
Feature activation-1.16e+00

Pos logit contributions
ogo+0.003
lor+0.002
opl+0.002
wen+0.002
eds+0.002
Neg logit contributions
 don-0.002
 blame-0.002
 worry-0.002
 forth-0.002
 girl-0.001
Token you
Token ID345
Feature activation-1.25e+00

Pos logit contributions
 blame+0.059
 weep+0.053
 whine+0.051
tail+0.049
Wow+0.049
Neg logit contributions
ogo-0.069
wen-0.059
iflower-0.055
lor-0.055
eds-0.054
Token were
Token ID547
Feature activation-9.38e-01

Pos logit contributions
 Are+0.038
 don+0.037
 blame+0.034
 girl+0.030
 boy+0.026
Neg logit contributions
ogo-0.117
ades-0.109
eds-0.109
ules-0.107
wen-0.106
Token foolish
Token ID19538
Feature activation+1.24e+00

Pos logit contributions
 don+0.042
 Are+0.035
 blame+0.033
 do+0.031
 Lily+0.030
Neg logit contributions
ogo-0.083
ades-0.081
ules-0.080
wen-0.078
lor-0.077
Token to
Token ID284
Feature activation+6.13e-01

Pos logit contributions
ogo+0.094
eds+0.090
ules+0.089
us+0.086
ades+0.086
Neg logit contributions
 don-0.062
 blame-0.058
 Are-0.057
 Sorry-0.050
 worry-0.048
Token go
Token ID467
Feature activation-5.30e-02

Pos logit contributions
ogo+0.060
eds+0.056
wen+0.056
ades+0.056
ules+0.055
Neg logit contributions
 don-0.020
 Are-0.018
 blame-0.016
 girl-0.013
 worry-0.012
Token to
Token ID284
Feature activation+6.25e-01

Pos logit contributions
 don+0.003
 Are+0.002
 blame+0.002
 worry+0.002
 girl+0.002
Neg logit contributions
ogo-0.004
ades-0.004
eds-0.004
ules-0.004
wen-0.004
Token the
Token ID262
Feature activation+1.86e-01

Pos logit contributions
ogo+0.052
ades+0.050
ules+0.049
eds+0.049
wen+0.049
Neg logit contributions
 don-0.029
 Are-0.025
 blame-0.024
 worry-0.021
 girl-0.021
TokenFollow
Token ID7155
Feature activation-1.92e+00

Pos logit contributions
ogo+0.005
ades+0.005
ules+0.005
wen+0.005
eds+0.005
Neg logit contributions
 don-0.003
 Are-0.002
 blame-0.002
 worry-0.002
 girl-0.002
Token your
Token ID534
Feature activation-3.25e-02

Pos logit contributions
 don+0.107
 Are+0.093
 blame+0.090
 worry+0.084
 do+0.083
Neg logit contributions
ogo-0.144
ades-0.138
ules-0.135
wen-0.133
eds-0.131
Token dreams
Token ID10625
Feature activation-2.16e+00

Pos logit contributions
 don+0.002
 Are+0.002
 blame+0.001
 worry+0.001
 girl+0.001
Neg logit contributions
ogo-0.003
ades-0.002
ules-0.002
wen-0.002
eds-0.002
Token -
Token ID532
Feature activation+8.05e-01

Pos logit contributions
 don+0.101
 Are+0.084
 blame+0.079
 do+0.073
 worry+0.072
Neg logit contributions
ogo-0.183
ades-0.175
wen-0.171
ules-0.171
eds-0.171
Token the
Token ID262
Feature activation-1.06e-01

Pos logit contributions
ogo+0.066
ules+0.062
ades+0.062
wen+0.061
lor+0.061
Neg logit contributions
 blame-0.031
 don-0.030
 Are-0.029
 weep-0.026
 cry-0.024
Token true
Token ID2081
Feature activation-8.24e-01

Pos logit contributions
 don+0.006
 blame+0.006
 Are+0.006
 Sorry+0.005
 cry+0.005
Neg logit contributions
ogo-0.006
ades-0.006
Andrew-0.006
lor-0.006
eds-0.006
Token treasure
Token ID14068
Feature activation+6.75e-01

Pos logit contributions
 don+0.052
 blame+0.048
 Are+0.048
 cry+0.042
 worry+0.042
Neg logit contributions
ogo-0.054
ades-0.050
eds-0.049
wen-0.048
ules-0.048
Token lies
Token ID7363
Feature activation+2.98e-02

Pos logit contributions
ogo+0.056
ades+0.054
ules+0.053
wen+0.052
eds+0.052
Neg logit contributions
 don-0.032
 Are-0.027
 blame-0.026
 girl-0.023
 do-0.023
Token within
Token ID1626
Feature activation-1.83e+00

Pos logit contributions
ogo+0.003
ades+0.002
ules+0.002
wen+0.002
eds+0.002
Neg logit contributions
 don-0.001
 Are-0.001
 blame-0.001
 worry-0.001
 girl-0.001
Token.
Token ID13
Feature activation-1.98e-01

Pos logit contributions
 don+0.092
 Are+0.079
 blame+0.075
 worry+0.069
 do+0.068
Neg logit contributions
ogo-0.150
ades-0.143
ules-0.141
wen-0.139
eds-0.136
Token The
Token ID383
Feature activation-2.06e-01

Pos logit contributions
 don+0.009
 blame+0.008
 worry+0.008
 Are+0.007
 weep+0.007
Neg logit contributions
ogo-0.016
lor-0.015
ules-0.015
wen-0.014
Andrew-0.014
Token box
Token ID3091
Feature activation-2.85e-01

Pos logit contributions
ogo+0.103
eds+0.103
ades+0.103
ules+0.095
Andrew+0.095
Neg logit contributions
 don-0.118
 Are-0.107
 blame-0.101
ter-0.099
 Sorry-0.096
Token of
Token ID286
Feature activation-3.98e-02

Pos logit contributions
 don+0.014
 Are+0.013
 blame+0.012
 Sorry+0.010
 worry+0.010
Neg logit contributions
ogo-0.023
ades-0.022
ules-0.022
eds-0.021
wen-0.021
Token candy
Token ID18550
Feature activation+8.24e-02

Pos logit contributions
 don+0.003
 Are+0.003
 Sorry+0.002
ter+0.002
 blame+0.002
Neg logit contributions
ades-0.002
ogo-0.002
eds-0.002
ules-0.002
us-0.002
Token.
Token ID13
Feature activation-1.11e-02

Pos logit contributions
ogo+0.007
ades+0.006
ules+0.006
eds+0.006
wen+0.006
Neg logit contributions
 don-0.004
 Are-0.004
 blame-0.003
 worry-0.003
 weep-0.003
Token He
Token ID679
Feature activation-1.75e-01

Pos logit contributions
 don+0.000
 blame+0.000
 Are+0.000
 worry+0.000
 do+0.000
Neg logit contributions
ogo-0.001
ules-0.001
ades-0.001
wen-0.001
eds-0.001
Token said
Token ID531
Feature activation-1.15e+00

Pos logit contributions
 don+0.008
 Are+0.007
 blame+0.007
 worry+0.006
 Sorry+0.006
Neg logit contributions
ogo-0.014
ades-0.013
wen-0.013
ules-0.013
Andrew-0.013
Token to
Token ID284
Feature activation+5.24e-01

Pos logit contributions
 blame+0.049
 don+0.048
 Are+0.045
 weep+0.041
 forth+0.040
Neg logit contributions
ades-0.084
ogo-0.083
ules-0.083
us-0.081
lor-0.078
Token her
Token ID607
Feature activation+2.01e-01

Pos logit contributions
ogo+0.045
wen+0.042
ules+0.042
eds+0.042
Andrew+0.041
Neg logit contributions
 don-0.021
 Are-0.019
 blame-0.017
 Sorry-0.015
 worry-0.015
Token that
Token ID326
Feature activation-1.42e-01

Pos logit contributions
ogo+0.017
ules+0.016
wen+0.016
ades+0.016
lor+0.015
Neg logit contributions
 Are-0.008
 don-0.008
 blame-0.007
 Sorry-0.007
Wow-0.006
Token the
Token ID262
Feature activation+5.35e-02

Pos logit contributions
 blame+0.006
 don+0.005
 Are+0.005
 Sorry+0.005
 weep+0.004
Neg logit contributions
ogo-0.010
ules-0.010
wen-0.010
Andrew-0.010
ades-0.010
Token scary
Token ID14343
Feature activation-1.24e+00

Pos logit contributions
ades+0.003
ogo+0.003
lor+0.003
 lair+0.003
vern+0.003
Neg logit contributions
 don-0.003
 Are-0.003
 blame-0.002
 forth-0.002
 weep-0.002
Token
Token ID198
Feature activation+6.24e-01

Pos logit contributions
ogo+0.161
eds+0.156
wen+0.155
ules+0.155
Andrew+0.153
Neg logit contributions
 don-0.071
 Lily-0.058
 Are-0.057
 with-0.045
 blame-0.045
Token
Token ID198
Feature activation+6.08e-01

Pos logit contributions
ogo+0.061
eds+0.060
wen+0.058
ules+0.058
ades+0.058
Neg logit contributions
 don-0.019
 Lily-0.018
 Are-0.017
 Tim-0.012
 with-0.012
TokenAfter
Token ID3260
Feature activation-3.66e-02

Pos logit contributions
ogo+0.045
eds+0.042
ules+0.042
ades+0.041
wen+0.041
Neg logit contributions
 don-0.035
 Are-0.032
 blame-0.030
 Lily-0.027
 do-0.027
Token dinner
Token ID8073
Feature activation+5.11e-01

Pos logit contributions
 don+0.002
 Are+0.002
 blame+0.002
 girl+0.001
 boy+0.001
Neg logit contributions
ogo-0.003
ades-0.003
ules-0.003
us-0.003
Andrew-0.003
Token,
Token ID11
Feature activation-1.03e+00

Pos logit contributions
ogo+0.045
ades+0.044
us+0.043
ules+0.042
wen+0.042
Neg logit contributions
 don-0.019
 blame-0.018
 Are-0.016
 worry-0.014
 weep-0.014
Token Lily
Token ID20037
Feature activation-5.28e-01

Pos logit contributions
 blame+0.036
 don+0.033
 Are+0.032
 weep+0.028
 cry+0.027
Neg logit contributions
ogo-0.087
Andrew-0.086
wen-0.084
ules-0.083
Karl-0.082
Token's
Token ID338
Feature activation+3.26e-01

Pos logit contributions
 don+0.021
 Are+0.018
 blame+0.016
 girl+0.015
 do+0.015
Neg logit contributions
ogo-0.048
ades-0.047
ules-0.045
wen-0.045
eds-0.045
Token dad
Token ID9955
Feature activation+6.36e-02

Pos logit contributions
ogo+0.024
ades+0.023
eds+0.022
wen+0.021
ules+0.021
Neg logit contributions
 don-0.017
 Are-0.016
 blame-0.015
 Sorry-0.013
 Mind-0.012
Token came
Token ID1625
Feature activation+6.70e-01

Pos logit contributions
ogo+0.006
ades+0.006
eds+0.006
ules+0.006
wen+0.006
Neg logit contributions
 don-0.002
 Are-0.002
 blame-0.002
 worry-0.002
 girl-0.002
Token home
Token ID1363
Feature activation-2.48e-01

Pos logit contributions
ogo+0.061
ades+0.059
wen+0.058
lor+0.057
ules+0.057
Neg logit contributions
 don-0.026
 blame-0.022
 Are-0.022
 girl-0.021
 Lily-0.020
Token with
Token ID351
Feature activation-4.39e-01

Pos logit contributions
 don+0.011
 Are+0.009
 blame+0.009
 worry+0.007
 girl+0.007
Neg logit contributions
ogo-0.021
ades-0.020
eds-0.020
ules-0.020
wen-0.020
Tokenweb
Token ID12384
Feature activation-1.14e+00

Pos logit contributions
 don+0.053
 Are+0.048
 blame+0.048
 worry+0.044
 cry+0.042
Neg logit contributions
ogo-0.047
ades-0.044
eds-0.044
wen-0.043
ules-0.043
Token and
Token ID290
Feature activation-2.29e-03

Pos logit contributions
 don+0.054
 Are+0.051
 blame+0.049
 worry+0.041
 Sorry+0.041
Neg logit contributions
ogo-0.089
ades-0.087
eds-0.085
ules-0.084
wen-0.083
Token wished
Token ID16555
Feature activation-2.99e+00

Pos logit contributions
 blame+0.000
 don+0.000
 Are+0.000
 weep+0.000
 cry+0.000
Neg logit contributions
Andrew-0.000
ules-0.000
ogo-0.000
wen-0.000
Karl-0.000
Token he
Token ID339
Feature activation+1.95e-01

Pos logit contributions
 don+0.125
 Are+0.109
 blame+0.105
 worry+0.090
 girl+0.087
Neg logit contributions
ogo-0.263
ades-0.251
ules-0.248
wen-0.246
eds-0.246
Token could
Token ID714
Feature activation-1.00e-01

Pos logit contributions
ogo+0.015
wen+0.014
ades+0.014
eds+0.013
ules+0.013
Neg logit contributions
 don-0.010
 blame-0.010
 Are-0.009
 worry-0.008
 cry-0.008
Token be
Token ID307
Feature activation-2.79e+00

Pos logit contributions
 don+0.004
 Are+0.004
 blame+0.003
 girl+0.003
 boy+0.003
Neg logit contributions
ogo-0.009
ades-0.008
eds-0.008
wen-0.008
lor-0.008
Token mighty
Token ID18680
Feature activation+1.74e+00

Pos logit contributions
 don+0.147
 Are+0.135
 blame+0.131
 worry+0.113
 girl+0.112
Neg logit contributions
ogo-0.211
ades-0.199
eds-0.199
ules-0.195
wen-0.194
Token like
Token ID588
Feature activation-1.90e+00

Pos logit contributions
ogo+0.133
ades+0.126
ules+0.125
eds+0.124
wen+0.119
Neg logit contributions
 don-0.087
 Are-0.080
 blame-0.078
 Sorry-0.068
 weep-0.066
Token the
Token ID262
Feature activation-1.42e-01

Pos logit contributions
 don+0.070
 Are+0.058
 blame+0.054
 worry+0.046
 do+0.045
Neg logit contributions
ogo-0.180
ades-0.172
wen-0.170
ules-0.169
eds-0.169
Token cob
Token ID22843
Feature activation-7.46e-01

Pos logit contributions
 don+0.008
 Are+0.008
 blame+0.007
 worry+0.006
 Sorry+0.006
Neg logit contributions
ogo-0.010
wen-0.009
eds-0.009
ades-0.009
ules-0.009
Tokenweb
Token ID12384
Feature activation-1.09e+00

Pos logit contributions
 don+0.049
 Are+0.045
 blame+0.044
 worry+0.040
 girl+0.040
Neg logit contributions
ogo-0.048
ades-0.045
ules-0.044
wen-0.044
eds-0.044
Token.
Token ID13
Feature activation+1.63e-02

Pos logit contributions
 don+0.075
 blame+0.070
 Are+0.068
 worry+0.059
 cry+0.058
Neg logit contributions
ogo-0.104
ades-0.102
eds-0.101
ules-0.099
wen-0.098
Token It
Token ID632
Feature activation-1.83e+00

Pos logit contributions
ogo+0.001
ules+0.001
ades+0.001
wen+0.001
Andrew+0.001
Neg logit contributions
 don-0.001
 blame-0.001
 Are-0.001
 worry-0.001
 girl-0.001
Token sh
Token ID427
Feature activation-3.46e+00

Pos logit contributions
 don+0.080
 blame+0.073
 Are+0.070
 worry+0.063
 girl+0.057
Neg logit contributions
ogo-0.150
eds-0.144
ades-0.143
wen-0.142
ules-0.139
Tokenined
Token ID1389
Feature activation-6.02e-01

Pos logit contributions
 don+0.235
 Are+0.215
 blame+0.209
 worry+0.194
 weep+0.191
Neg logit contributions
ogo-0.213
ades-0.197
ules-0.195
wen-0.191
Andrew-0.189
Token very
Token ID845
Feature activation+1.88e+00

Pos logit contributions
 blame+0.034
 don+0.032
 Are+0.030
 weep+0.028
 cry+0.028
Neg logit contributions
ogo-0.041
ades-0.041
eds-0.039
ules-0.039
us-0.037
Token brightly
Token ID35254
Feature activation-4.37e+00

Pos logit contributions
ogo+0.133
ades+0.126
ules+0.123
eds+0.122
wen+0.122
Neg logit contributions
 don-0.110
 Are-0.101
 blame-0.098
 worry-0.088
 girl-0.086
Token as
Token ID355
Feature activation-3.30e-01

Pos logit contributions
 don+0.202
 blame+0.198
 Are+0.190
 worry+0.160
 cry+0.160
Neg logit contributions
ogo-0.353
ades-0.337
wen-0.333
eds-0.332
ules-0.329
Token it
Token ID340
Feature activation-3.97e-01

Pos logit contributions
 don+0.012
 Are+0.010
 blame+0.010
 worry+0.008
 girl+0.008
Neg logit contributions
ogo-0.031
ades-0.029
ules-0.029
wen-0.029
eds-0.029
Token passed
Token ID3804
Feature activation+5.44e-01

Pos logit contributions
 don+0.018
 Are+0.017
 blame+0.017
 worry+0.015
 Sorry+0.014
Neg logit contributions
ogo-0.032
eds-0.030
ades-0.030
wen-0.030
ules-0.030
Token over
Token ID625
Feature activation+1.76e-02

Pos logit contributions
ogo+0.053
ades+0.050
wen+0.050
ules+0.050
eds+0.049
Neg logit contributions
 don-0.019
 Are-0.015
 blame-0.013
 girl-0.012
 worry-0.012
Token the
Token ID262
Feature activation-1.01e-01

Pos logit contributions
ogo+0.002
wen+0.002
lor+0.002
ades+0.002
ules+0.002
Neg logit contributions
 don-0.001
 Are-0.000
 worry-0.000
 Lily-0.000
 girl-0.000
Token."
Token ID526
Feature activation+6.85e-01

Pos logit contributions
ogo+0.122
ules+0.113
eds+0.112
us+0.109
wen+0.109
Neg logit contributions
 blame-0.084
 don-0.084
 Are-0.082
 worry-0.072
 weep-0.072
Token
Token ID198
Feature activation+6.96e-01

Pos logit contributions
ogo+0.041
ades+0.041
ules+0.041
Andrew+0.040
wen+0.039
Neg logit contributions
 don-0.042
 blame-0.039
 worry-0.036
 Are-0.036
 weep-0.034
Token
Token ID198
Feature activation+6.88e-01

Pos logit contributions
ogo+0.066
ules+0.062
eds+0.062
wen+0.060
ades+0.060
Neg logit contributions
 don-0.026
 Are-0.023
 Lily-0.020
 do-0.020
 blame-0.019
TokenSur
Token ID14214
Feature activation-3.37e+00

Pos logit contributions
ogo+0.048
ules+0.043
eds+0.043
ades+0.043
wen+0.043
Neg logit contributions
 don-0.044
 Are-0.041
 blame-0.039
 do-0.036
 girl-0.036
Tokenpr
Token ID1050
Feature activation-2.27e+00

Pos logit contributions
 don+0.228
 Are+0.211
 blame+0.201
 cry+0.191
 worry+0.187
Neg logit contributions
ades-0.208
ogo-0.207
wen-0.195
eds-0.192
lor-0.192
Tokenised
Token ID1417
Feature activation-4.73e+00

Pos logit contributions
 don+0.111
 Are+0.097
 blame+0.091
 worry+0.082
 girl+0.080
Neg logit contributions
ogo-0.186
ades-0.181
eds-0.175
ules-0.174
wen-0.174
Token,
Token ID11
Feature activation-9.02e-01

Pos logit contributions
 don+0.167
 Are+0.148
 blame+0.141
 Sorry+0.111
 worry+0.108
Neg logit contributions
ogo-0.430
ules-0.414
eds-0.411
ades-0.410
us-0.406
Token the
Token ID262
Feature activation-1.11e-01

Pos logit contributions
 blame+0.050
 don+0.046
 weep+0.045
 Are+0.044
 want+0.042
Neg logit contributions
Andrew-0.054
Karl-0.051
ules-0.051
ades-0.049
ogo-0.048
Token little
Token ID1310
Feature activation-4.67e-02

Pos logit contributions
 don+0.006
 Are+0.006
 blame+0.005
 Sorry+0.005
 give+0.005
Neg logit contributions
ades-0.007
wen-0.007
ogo-0.006
Andrew-0.006
eds-0.006
Token boy
Token ID2933
Feature activation-2.74e+00

Pos logit contributions
 don+0.003
 blame+0.003
nam+0.002
 blush+0.002
 Sorry+0.002
Neg logit contributions
Andrew-0.002
uan-0.002
ades-0.002
Kevin-0.002
asper-0.002
Token asked
Token ID1965
Feature activation-2.42e+00

Pos logit contributions
 don+0.135
 Are+0.124
 blame+0.123
 Sorry+0.111
 worry+0.103
Neg logit contributions
ogo-0.196
ules-0.194
ades-0.190
eds-0.189
wen-0.187
Token cold
Token ID4692
Feature activation-3.34e+00

Pos logit contributions
 don+0.041
 Are+0.037
 blame+0.036
 worry+0.032
 girl+0.032
Neg logit contributions
ogo-0.054
ades-0.051
ules-0.050
eds-0.050
wen-0.050
Token."
Token ID526
Feature activation+7.59e-01

Pos logit contributions
 don+0.195
 blame+0.179
 Are+0.175
 Mind+0.152
 worry+0.142
Neg logit contributions
eds-0.211
ogo-0.205
ades-0.200
lor-0.196
ules-0.192
Token
Token ID198
Feature activation+6.67e-01

Pos logit contributions
ogo+0.063
ades+0.060
ules+0.059
wen+0.058
eds+0.058
Neg logit contributions
 don-0.036
 Are-0.032
 blame-0.031
 worry-0.027
 girl-0.026
Token
Token ID198
Feature activation+6.55e-01

Pos logit contributions
ogo+0.064
ules+0.061
wen+0.061
eds+0.061
lor+0.060
Neg logit contributions
 don-0.024
 Lily-0.020
 Are-0.020
 do-0.017
 blame-0.016
Token"
Token ID1
Feature activation+2.42e-01

Pos logit contributions
ogo+0.055
wen+0.052
ules+0.052
eds+0.051
ades+0.051
Neg logit contributions
 don-0.032
 Are-0.029
 blame-0.026
 girl-0.025
 do-0.024
TokenHi
Token ID17250
Feature activation-4.81e+00

Pos logit contributions
ades+0.015
Andrew+0.015
Kat+0.014
lor+0.014
Kevin+0.014
Neg logit contributions
 don-0.014
 blame-0.012
 cry-0.012
 worry-0.012
 weep-0.012
Token,
Token ID11
Feature activation-8.75e-01

Pos logit contributions
 blame+0.193
 don+0.172
 Are+0.163
 weep+0.157
 cry+0.142
Neg logit contributions
ogo-0.398
ades-0.378
ules-0.368
us-0.367
eds-0.357
Token Lily
Token ID20037
Feature activation-2.80e-01

Pos logit contributions
 blame+0.028
 don+0.027
 Are+0.026
 weep+0.025
 worry+0.022
Neg logit contributions
ogo-0.078
ules-0.071
ades-0.069
lor-0.066
wen-0.066
Token,"
Token ID553
Feature activation-1.26e+00

Pos logit contributions
 don+0.014
 blame+0.013
 Are+0.013
 Sorry+0.011
 worry+0.011
Neg logit contributions
ogo-0.021
eds-0.020
wen-0.020
us-0.019
ules-0.019
Token Ben
Token ID3932
Feature activation+2.21e+00

Pos logit contributions
 don+0.071
 blame+0.069
 Are+0.068
ter+0.066
 Mind+0.064
Neg logit contributions
Andrew-0.069
ules-0.066
ogo-0.066
asper-0.065
Kevin-0.064
Token said
Token ID531
Feature activation-1.14e+00

Pos logit contributions
ogo+0.179
ades+0.174
ules+0.173
Andrew+0.170
asper+0.166
Neg logit contributions
 don-0.094
 Are-0.083
 blame-0.077
 worry-0.068
 Sorry-0.065
Token surprised
Token ID6655
Feature activation+9.98e-01

Pos logit contributions
ogo+0.025
ades+0.023
ules+0.023
wen+0.023
eds+0.023
Neg logit contributions
 don-0.014
 Are-0.013
 blame-0.012
 worry-0.011
 do-0.011
Token and
Token ID290
Feature activation-1.67e-01

Pos logit contributions
ogo+0.086
ades+0.082
ules+0.081
eds+0.080
wen+0.080
Neg logit contributions
 don-0.044
 Are-0.039
 blame-0.037
 worry-0.032
 girl-0.031
Token said
Token ID531
Feature activation-1.26e+00

Pos logit contributions
 don+0.009
 blame+0.008
 Are+0.008
ter+0.007
 girl+0.007
Neg logit contributions
ogo-0.010
ules-0.010
eds-0.009
Andrew-0.009
ades-0.009
Token,
Token ID11
Feature activation-9.08e-01

Pos logit contributions
 don+0.046
 blame+0.044
 Are+0.041
 weep+0.036
 worry+0.034
Neg logit contributions
ogo-0.107
ades-0.107
ules-0.103
lor-0.100
us-0.099
Token "
Token ID366
Feature activation-3.86e-02

Pos logit contributions
 don+0.052
 Are+0.047
 blame+0.046
 worry+0.042
 girl+0.041
Neg logit contributions
ogo-0.064
ades-0.061
ules-0.060
wen-0.059
eds-0.059
TokenWow
Token ID22017
Feature activation-4.34e+00

Pos logit contributions
 don+0.002
 blame+0.001
 Are+0.001
 worry+0.001
 cry+0.001
Neg logit contributions
ogo-0.003
ades-0.003
ules-0.003
wen-0.003
Andrew-0.003
Token,
Token ID11
Feature activation-8.98e-01

Pos logit contributions
 don+0.154
 blame+0.145
 Are+0.142
 weep+0.113
 girl+0.108
Neg logit contributions
ogo-0.399
ades-0.377
eds-0.368
ules-0.367
wen-0.365
Token Mr
Token ID1770
Feature activation-9.68e-02

Pos logit contributions
 blame+0.041
 Are+0.040
 don+0.039
cy+0.038
 weep+0.037
Neg logit contributions
ogo-0.058
Andrew-0.052
lor-0.052
wen-0.051
ules-0.050
Token.
Token ID13
Feature activation-3.37e-02

Pos logit contributions
 don+0.005
 blame+0.005
 Are+0.005
 worry+0.004
 weep+0.004
Neg logit contributions
ogo-0.007
ades-0.007
eds-0.007
us-0.006
wen-0.006
Token Cable
Token ID20249
Feature activation+1.26e+00

Pos logit contributions
 don+0.002
 blame+0.002
 Are+0.002
 worry+0.001
 weep+0.001
Neg logit contributions
ogo-0.002
wen-0.002
ules-0.002
ades-0.002
lor-0.002
Token is
Token ID318
Feature activation+1.15e+00

Pos logit contributions
ogo+0.089
ades+0.084
eds+0.082
ules+0.082
wen+0.082
Neg logit contributions
 don-0.072
 Are-0.066
 blame-0.066
 worry-0.059
 weep-0.057
Token straight
Token ID3892
Feature activation+3.12e+00

Pos logit contributions
ogo+0.087
ades+0.083
ules+0.083
wen+0.082
eds+0.081
Neg logit contributions
 don-0.055
 Are-0.047
 blame-0.046
 worry-0.042
 do-0.041
Token.
Token ID13
Feature activation-1.59e-01

Pos logit contributions
ogo+0.231
eds+0.219
ades+0.216
lor+0.207
wen+0.207
Neg logit contributions
 don-0.164
 Are-0.153
 blame-0.152
 Sorry-0.130
 worry-0.129
Token
Token ID198
Feature activation+5.91e-01

Pos logit contributions
 don+0.007
 Are+0.006
 blame+0.006
 Lily+0.005
 girl+0.005
Neg logit contributions
ogo-0.014
eds-0.013
ades-0.013
wen-0.013
ules-0.013
Token
Token ID198
Feature activation+5.89e-01

Pos logit contributions
ogo+0.061
ules+0.059
wen+0.058
eds+0.058
lor+0.057
Neg logit contributions
 don-0.017
 Lily-0.016
 Are-0.014
 Tim-0.011
 with-0.011
Token"
Token ID1
Feature activation+2.11e-01

Pos logit contributions
ogo+0.048
ules+0.045
ades+0.045
wen+0.045
eds+0.044
Neg logit contributions
 don-0.030
 Are-0.027
 blame-0.025
 girl-0.023
 Lily-0.022
TokenWow
Token ID22017
Feature activation-4.24e+00

Pos logit contributions
ogo+0.016
Andrew+0.016
ades+0.015
wen+0.015
lor+0.015
Neg logit contributions
 don-0.011
 blame-0.009
 worry-0.009
 Are-0.009
 weep-0.008
Token,
Token ID11
Feature activation-8.45e-01

Pos logit contributions
 blame+0.157
 don+0.147
 Are+0.146
 weep+0.125
 worry+0.114
Neg logit contributions
ogo-0.378
ades-0.349
eds-0.340
ules-0.337
wen-0.337
Token you
Token ID345
Feature activation-1.33e+00

Pos logit contributions
 blame+0.036
 don+0.036
 Are+0.035
ter+0.034
cy+0.034
Neg logit contributions
ogo-0.056
lor-0.050
Andrew-0.050
wen-0.048
vern-0.048
Token are
Token ID389
Feature activation-7.95e-01

Pos logit contributions
 blame+0.049
 don+0.048
 Are+0.047
 girl+0.040
 weep+0.039
Neg logit contributions
ogo-0.110
ades-0.106
wen-0.102
lor-0.102
eds-0.101
Token good
Token ID922
Feature activation+1.67e+00

Pos logit contributions
 don+0.045
 blame+0.041
 Are+0.041
 girl+0.036
 weep+0.036
Neg logit contributions
ogo-0.056
ades-0.051
wen-0.051
eds-0.050
ules-0.050
Token at
Token ID379
Feature activation-2.61e+00

Pos logit contributions
ogo+0.118
ules+0.115
us+0.113
ades+0.109
eds+0.106
Neg logit contributions
 don-0.088
 Are-0.085
 blame-0.082
 weep-0.072
 worry-0.070