TOP ACTIVATIONS
MAX = 0.151

fell off the stove and spilled soup on the floor .
cup , and the coffee spilled all over him . Max
tipped over . The soup spilled out . It spl ashed
The oct opus picked up the shell and thought
the pastry , so she picked it up . She was
cat !" Spot 's owner picked up Spot 's ball and
the book store and Lily picked a new book to read
slowly poured out . It spilled onto the ground and made
prick ly . Tim my picked it up and showed it
with the lion . He picks up the card and gives
udd ly toy . She picked it up and shouted with
in her garden . She dug a hole and planted the
that the apple she had picked was now on the ground
, Lily 's friend accidentally spilled some sugar on her dress
, Lily 's little brother spilled juice on the same bright
't give up . He picked up the ball and threw
choose from ! Tim my picked a red k ite with
special symbol . The daddy picked an original symbol for the
Mom sigh s and picks up the cars . She
in the ground . She dug it up and saw that

MIN ACTIVATIONS
MIN = -0.139

his parents about the mysterious Gor illa . His parents were
were probably celebrating their own birth days . Just when he
. You worked with such delic acy . You have a
. He kept throwing a tant rum . His mom was
She cried and threw a tant rum . Her mom tried
listen . She threw a tant rum in the store .
, ready to have more pic n ics in the future
her feet and threw a tant rum . She grabbed the
to cry and threw a tant rum . She didn 't
regretted getting close to the Gor illa and not listening to
the cake . Un cle Bob was very shocked
and so she threw a tant rum when her parents said
sunshine . Molly liked the delic ately pet alled yellow ones
him and said , " Un free ze !" Tim smiled
anas available for feeding the gor illas ." She said to
soon nothing but a small spe ck in the sky .
great cook ." Un cle Tom took a bite
a great time using their imag inations . <|endoftext|> Once upon
big cage with lots of gor illas inside !"
. They had used their imag inations and their net to
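
The MAX and MIN lists above show the highest- and lowest-activating contexts for this feature. Below is a minimal sketch of how such lists could be assembled. It assumes per-token activations are available as one list per tokenized story. The function name `extreme_contexts` and the window sizes `left`/`right` are hypothetical. The dump does not say how its roughly ten-token contexts were cut, or which token in each context produced the extreme value.

```python
import heapq
from typing import List, Tuple


def extreme_contexts(
    tokens: List[List[str]],
    acts: List[List[float]],
    k: int = 3,
    left: int = 4,
    right: int = 5,
) -> Tuple[List[Tuple[float, str]], List[Tuple[float, str]]]:
    """Return the k highest- and k lowest-activating token positions, each
    paired with a window of surrounding tokens joined by spaces.

    Assumption: tokens[s][p] is the p-th token of story s and acts[s][p]
    is the feature activation at that position.
    """
    # Flatten to (activation, story index, position) triples.
    flat = [(a, s, p) for s, row in enumerate(acts) for p, a in enumerate(row)]

    def window(s: int, p: int) -> str:
        # Hypothetical context window: `left` tokens before, `right` after.
        return " ".join(tokens[s][max(0, p - left): p + right + 1])

    top = [(a, window(s, p)) for a, s, p in heapq.nlargest(k, flat)]
    bottom = [(a, window(s, p)) for a, s, p in heapq.nsmallest(k, flat)]
    return top, bottom
```

With toy inputs (not data from this dump), `extreme_contexts(tokens, acts, k=1, left=1, right=1)` returns one context per list. The windows are joined with spaces so that token boundaries stay visible, as in the lists above.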

INTERVAL 0.078 - 0.151
CONTAINS 0.239%

me find my object . It will remain my favorite toy
had a new toy . It was a monster . He
morning , the little bear found that some of the treats
flowers on them . She picked out a stamp with a
something in the park . It was a big tree with

INTERVAL 0.006 - 0.078
CONTAINS 42.592%

y ummy fruit and the pretty sunset . <|endoftext|> Once upon
ink led with joy and he thanked L ila for serving
played on the swings and the slide . Then , she
hugged Teddy and said , " I 'm glad we 're
she saw a big bird flying really high in the sky

INTERVAL -0.067 - 0.006
CONTAINS 55.892%

to bed with a smile on his face . <|endoftext|> Once
way out ." The mouse led Liz to
. " Do you want to go for a ride ?"
was always shining brightly . Bob was very
. At the end of the day , Henry

INTERVAL -0.139 - -0.067
CONTAINS 1.277%

. He wanted to act too . Tim asked
him up and they were very angry at the old bar
knew she could eat the y ummy , sweet pastry .
- cream . Spot was very happy and ate it all
girl , Sarah , was extra excited . She kept saying

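
The four INTERVAL ranges above are consistent with splitting [MIN, MAX] = [-0.139, 0.151] into four equal-width bins of about 0.0725. That gives edges at -0.139, -0.0665, 0.006, 0.0785 and 0.151, and the four CONTAINS percentages sum to 100% (0.239 + 42.592 + 55.892 + 1.277). This reading is inferred from the numbers, not stated in the dump. A minimal sketch of that bucketing follows. The function name `activation_intervals` is hypothetical.

```python
from typing import List, Tuple


def activation_intervals(
    acts: List[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Split [min(acts), max(acts)] into n_bins equal-width intervals and
    return (low, high, percent of activations inside) for each interval,
    highest interval first (the order used by the dump)."""
    lo, hi = min(acts), max(acts)
    if hi == lo:
        # Degenerate case: every activation is identical.
        return [(lo, hi, 100.0)]
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    counts = [0] * n_bins
    for a in acts:
        # Clamp so the maximum value lands in the top bin, not past it.
        counts[min(int((a - lo) / width), n_bins - 1)] += 1
    rows = [
        (edges[i], edges[i + 1], 100.0 * counts[i] / len(acts))
        for i in range(n_bins)
    ]
    return rows[::-1]
```

Equal-width bins make the edges depend only on the two extremes. This is why a single rare value near MAX or MIN can leave an outer interval holding only a fraction of a percent of all activations, as the 0.239% and 1.277% intervals do here.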
Token fell
Token ID3214
Feature activation+8.63e-02

Pos logit contributions
ting+0.053
ing+0.052
ter+0.051
ning+0.049
it+0.045
Neg logit contributions
 whine-0.030
 convincing-0.028
 know-0.027
uth-0.026
 reason-0.026
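
Each token record lists the token, its ID, the feature activation at that position, and five positive and five negative logit contributions. Below is a minimal sketch that produces a record in the same layout. It assumes the two lists are the top-5 and bottom-5 entries of a per-vocabulary contribution vector for that position. The dump does not say how that vector is computed, so the `contrib` mapping is a hypothetical input, and `top_logit_contributions` and `format_record` are illustrative names.

```python
import heapq
from typing import Dict, List, Tuple


def top_logit_contributions(
    contrib: Dict[str, float], k: int = 5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative entries of a
    token -> logit-contribution mapping, each ordered by magnitude."""
    pos = heapq.nlargest(k, contrib.items(), key=lambda kv: kv[1])
    neg = heapq.nsmallest(k, contrib.items(), key=lambda kv: kv[1])
    return pos, neg


def format_record(
    token: str, token_id: int, activation: float,
    contrib: Dict[str, float], k: int = 5,
) -> str:
    """Render one record in the dump's layout. The token string keeps its
    leading space, so " fell" prints as "Token fell"."""
    pos, neg = top_logit_contributions(contrib, k)
    lines = [
        f"Token{token}",
        f"Token ID{token_id}",
        f"Feature activation{activation:+.2e}",
        "",
        "Pos logit contributions",
    ]
    lines += [f"{t}{v:+.3f}" for t, v in pos]
    lines.append("Neg logit contributions")
    lines += [f"{t}{v:+.3f}" for t, v in neg]
    return "\n".join(lines)
```

Because the token string keeps its leading space, the record shows which vocabulary entries start a new word (" whine") and which continue one ("ting").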
Token off
Token ID572
Feature activation+9.21e-03

Pos logit contributions
ing+0.119
ting+0.118
ter+0.115
ning+0.114
ini+0.106
Neg logit contributions
 whine-0.158
 argue-0.141
 reason-0.140
 convincing-0.140
 know-0.140
Token the
Token ID262
Feature activation+8.37e-02

Pos logit contributions
ing+0.012
ting+0.012
ter+0.012
ning+0.011
 up+0.010
Neg logit contributions
 whine-0.018
 convincing-0.017
uth-0.017
 month-0.017
 reason-0.017
Token stove
Token ID27635
Feature activation+6.49e-02

Pos logit contributions
ing+0.127
ting+0.127
ter+0.121
ning+0.120
it+0.112
Neg logit contributions
 whine-0.149
 reason-0.138
 month-0.137
 hesitation-0.134
 convincing-0.133
Token and
Token ID290
Feature activation+4.20e-02

Pos logit contributions
ter+0.097
ting+0.096
ing+0.093
ning+0.091
bled+0.087
Neg logit contributions
 whine-0.113
 month-0.103
 argue-0.100
 reason-0.100
 convincing-0.099
Token spilled
Token ID31896
Feature activation+1.14e-01

Pos logit contributions
ing+0.072
ting+0.069
ning+0.069
ter+0.066
ancing+0.065
Neg logit contributions
 whine-0.068
 know-0.059
 Megan-0.058
 convincing-0.058
 argue-0.057
Token soup
Token ID17141
Feature activation+4.57e-02

Pos logit contributions
ting+0.161
ning+0.160
ter+0.158
ing+0.158
ancing+0.150
Neg logit contributions
 whine-0.197
 reason-0.179
 month-0.179
 Megan-0.179
 breath-0.176
Token on
Token ID319
Feature activation-1.41e-02

Pos logit contributions
ting+0.066
ter+0.064
ning+0.063
ing+0.060
 Good+0.059
Neg logit contributions
 whine-0.076
 month-0.072
 reason-0.070
 argue-0.069
 convincing-0.068
Token the
Token ID262
Feature activation+6.67e-02

Pos logit contributions
 whine+0.025
 month+0.023
 convincing+0.023
 reason+0.023
uth+0.023
Neg logit contributions
ing-0.021
ting-0.021
ter-0.021
ning-0.020
it-0.018
Token floor
Token ID4314
Feature activation+3.95e-02

Pos logit contributions
ing+0.104
ting+0.102
ning+0.099
ter+0.099
it+0.093
Neg logit contributions
 whine-0.114
 month-0.109
 reason-0.106
 hour-0.101
 hesitation-0.099
Token.
Token ID13
Feature activation+3.59e-02

Pos logit contributions
ting+0.063
ter+0.061
ing+0.059
ning+0.058
 Good+0.051
Neg logit contributions
 whine-0.066
 month-0.061
 convincing-0.061
 reason-0.060
 hesitation-0.059
Token cup
Token ID6508
Feature activation+8.67e-02

Pos logit contributions
 whine+0.003
uth+0.003
 convincing+0.003
 month+0.003
 argue+0.003
Neg logit contributions
ting-0.004
ter-0.004
ning-0.004
ing-0.004
 nearest-0.004
Token,
Token ID11
Feature activation+7.11e-03

Pos logit contributions
ting+0.116
ter+0.108
ning+0.106
ing+0.103
 pointer+0.096
Neg logit contributions
 whine-0.157
 month-0.153
 reason-0.147
 Megan-0.145
 convincing-0.145
Token and
Token ID290
Feature activation+2.11e-02

Pos logit contributions
ing+0.010
ting+0.010
ter+0.010
ning+0.009
it+0.008
Neg logit contributions
 whine-0.013
 convincing-0.013
 month-0.012
 reason-0.012
uth-0.012
Token the
Token ID262
Feature activation+6.49e-02

Pos logit contributions
ing+0.031
ting+0.031
ter+0.030
ning+0.029
it+0.025
Neg logit contributions
 whine-0.038
 convincing-0.036
 month-0.035
 reason-0.035
uth-0.035
Token coffee
Token ID6891
Feature activation+4.23e-02

Pos logit contributions
ing+0.094
ting+0.093
ter+0.092
ning+0.087
it+0.077
Neg logit contributions
 whine-0.117
 convincing-0.112
 month-0.110
uth-0.109
 reason-0.109
Token spilled
Token ID31896
Feature activation+1.14e-01

Pos logit contributions
ting+0.055
ter+0.054
ing+0.053
ning+0.052
it+0.044
Neg logit contributions
 whine-0.083
 month-0.078
 reason-0.076
 convincing-0.076
 Megan-0.074
Token all
Token ID477
Feature activation+2.00e-02

Pos logit contributions
ting+0.151
ter+0.149
ing+0.148
ning+0.146
ancing+0.132
Neg logit contributions
 whine-0.210
 month-0.201
 reason-0.196
 convincing-0.193
 argue-0.189
Token over
Token ID625
Feature activation-2.65e-03

Pos logit contributions
ting+0.026
ter+0.026
ing+0.024
ning+0.024
 pointer+0.021
Neg logit contributions
 whine-0.039
 month-0.038
 convincing-0.037
 reason-0.036
 Megan-0.036
Token him
Token ID683
Feature activation+1.55e-02

Pos logit contributions
 whine+0.005
 convincing+0.005
uth+0.005
 month+0.005
 hesitation+0.005
Neg logit contributions
ing-0.003
ting-0.003
ter-0.003
ning-0.003
 up-0.003
Token.
Token ID13
Feature activation+3.88e-02

Pos logit contributions
ting+0.021
ing+0.021
ter+0.021
ning+0.020
it+0.017
Neg logit contributions
 whine-0.029
 convincing-0.028
 month-0.027
 reason-0.027
 hesitation-0.027
Token Max
Token ID5436
Feature activation+1.93e-02

Pos logit contributions
ting+0.047
ing+0.047
ter+0.046
ning+0.043
it+0.038
Neg logit contributions
 whine-0.079
 convincing-0.075
 reason-0.075
 hesitation-0.074
 month-0.074
Token tipped
Token ID28395
Feature activation+9.38e-02

Pos logit contributions
ting+0.151
ning+0.140
ter+0.140
ing+0.136
 Good+0.127
Neg logit contributions
 whine-0.162
 month-0.159
 know-0.156
 reason-0.155
 convincing-0.154
Token over
Token ID625
Feature activation+1.55e-02

Pos logit contributions
ting+0.120
ing+0.116
ter+0.115
ning+0.111
ancing+0.107
Neg logit contributions
 whine-0.172
 month-0.163
 reason-0.162
 Megan-0.157
 know-0.156
Token.
Token ID13
Feature activation+4.49e-02

Pos logit contributions
ing+0.018
ting+0.018
ter+0.018
ning+0.016
 up+0.015
Neg logit contributions
 whine-0.032
 convincing-0.031
uth-0.031
 month-0.030
 hesitation-0.030
Token The
Token ID383
Feature activation+6.75e-02

Pos logit contributions
ing+0.055
ting+0.055
ter+0.054
ning+0.049
it+0.044
Neg logit contributions
 whine-0.090
 convincing-0.086
 reason-0.085
 month-0.085
 hesitation-0.085
Token soup
Token ID17141
Feature activation+3.90e-02

Pos logit contributions
ting+0.097
ing+0.096
ter+0.092
ning+0.090
it+0.081
Neg logit contributions
 whine-0.124
 convincing-0.117
 reason-0.117
 month-0.116
 hesitation-0.115
Token spilled
Token ID31896
Feature activation+1.11e-01

Pos logit contributions
ting+0.054
ter+0.051
ing+0.050
ning+0.050
ancing+0.043
Neg logit contributions
 whine-0.076
 convincing-0.068
 argue-0.067
 reason-0.067
 know-0.067
Token out
Token ID503
Feature activation-5.74e-03

Pos logit contributions
ting+0.153
ning+0.153
ter+0.151
ing+0.148
ini+0.142
Neg logit contributions
 whine-0.193
 month-0.184
 convincing-0.180
 reason-0.180
 tant-0.176
Token.
Token ID13
Feature activation+3.62e-02

Pos logit contributions
 whine+0.010
 convincing+0.009
 month+0.009
 hesitation+0.009
 reason+0.009
Neg logit contributions
ting-0.009
ter-0.009
ing-0.009
ning-0.008
ancing-0.007
Token It
Token ID632
Feature activation+8.95e-02

Pos logit contributions
ting+0.045
ing+0.045
ter+0.044
ning+0.040
it+0.036
Neg logit contributions
 whine-0.073
 convincing-0.070
 reason-0.069
 hesitation-0.069
 month-0.068
Token spl
Token ID4328
Feature activation+3.66e-02

Pos logit contributions
ting+0.128
ing+0.127
ter+0.123
ning+0.116
ancing+0.104
Neg logit contributions
 whine-0.171
 convincing-0.157
 reason-0.156
 month-0.154
 argue-0.154
Tokenashed
Token ID5263
Feature activation+6.85e-02

Pos logit contributions
ning+0.072
ting+0.072
ter+0.072
 nearest+0.067
stone+0.066
Neg logit contributions
 whine-0.041
bs-0.038
 tant-0.038
aney-0.037
uth-0.036
Token\n
Token ID198
Feature activation+1.70e-02

Pos logit contributions
 whine+0.148
uth+0.142
 reason+0.136
 argue+0.135
 convincing+0.134
Neg logit contributions
ing-0.105
ting-0.104
ter-0.100
ning-0.089
 floating-0.085
Token\n
Token ID198
Feature activation-5.83e-05

Pos logit contributions
ing+0.021
ter+0.021
ting+0.020
it+0.020
ning+0.018
Neg logit contributions
 whine-0.034
 hesitation-0.032
 convincing-0.031
 argue-0.030
 reason-0.030
TokenThe
Token ID464
Feature activation+6.76e-02

Pos logit contributions
 whine+0.000
 hesitation+0.000
 convincing+0.000
 argue+0.000
 allowance+0.000
Neg logit contributions
ing-0.000
ter-0.000
it-0.000
ting-0.000
ning-0.000
Token oct
Token ID19318
Feature activation+1.52e-02

Pos logit contributions
ing+0.107
ting+0.107
ter+0.106
ning+0.100
it+0.090
Neg logit contributions
 whine-0.113
 convincing-0.107
 month-0.105
uth-0.104
 reason-0.104
Tokenopus
Token ID25790
Feature activation+4.61e-02

Pos logit contributions
ting+0.020
ning+0.019
ter+0.019
ing+0.018
 up+0.017
Neg logit contributions
uth-0.027
 convincing-0.027
 age-0.027
 whine-0.026
 oblig-0.026
Token picked
Token ID6497
Feature activation+1.11e-01

Pos logit contributions
ting+0.070
ing+0.069
ter+0.068
ning+0.064
it+0.057
Neg logit contributions
 whine-0.081
 convincing-0.078
uth-0.075
 reason-0.074
 month-0.074
Token up
Token ID510
Feature activation+1.26e-02

Pos logit contributions
pipe+0.104
ter+0.102
auc+0.102
ini+0.100
we+0.098
Neg logit contributions
 Megan-0.213
 Clara-0.208
 know-0.207
 excitement-0.203
uth-0.203
Token the
Token ID262
Feature activation+6.03e-02

Pos logit contributions
ing+0.017
ting+0.017
ter+0.017
ning+0.016
it+0.014
Neg logit contributions
 whine-0.024
 convincing-0.023
uth-0.022
 month-0.022
 reason-0.022
Token shell
Token ID7582
Feature activation+7.74e-02

Pos logit contributions
ing+0.084
ting+0.083
ter+0.079
ning+0.078
it+0.071
Neg logit contributions
 whine-0.111
 reason-0.108
 hesitation-0.106
 month-0.105
 convincing-0.104
Token and
Token ID290
Feature activation+2.31e-02

Pos logit contributions
ting+0.129
 Good+0.120
ter+0.117
ning+0.115
ing+0.111
Neg logit contributions
 whine-0.113
 hesitation-0.112
 convincing-0.110
 month-0.105
 allowance-0.104
Token thought
Token ID1807
Feature activation+3.77e-02

Pos logit contributions
ing+0.041
ting+0.039
ter+0.038
ning+0.037
it+0.035
Neg logit contributions
 whine-0.035
 convincing-0.032
 know-0.032
 argue-0.031
 hesitation-0.031
Token the
Token ID262
Feature activation+6.04e-02

Pos logit contributions
ing+0.070
ting+0.070
ter+0.069
ning+0.064
it+0.056
Neg logit contributions
 whine-0.105
 convincing-0.101
 month-0.099
uth-0.099
 reason-0.098
Token pastry
Token ID41738
Feature activation+7.55e-03

Pos logit contributions
ing+0.090
ting+0.090
ter+0.087
ning+0.082
it+0.075
Neg logit contributions
 whine-0.108
 convincing-0.103
 reason-0.102
 month-0.101
 hesitation-0.101
Token,
Token ID11
Feature activation-8.91e-03

Pos logit contributions
ing+0.010
ting+0.010
ter+0.010
ning+0.010
it+0.008
Neg logit contributions
 whine-0.014
 convincing-0.014
 month-0.013
 reason-0.013
uth-0.013
Token so
Token ID523
Feature activation-5.63e-02

Pos logit contributions
 whine+0.017
 convincing+0.016
 month+0.016
uth+0.016
 reason+0.016
Neg logit contributions
ing-0.012
ting-0.012
ter-0.012
ning-0.011
it-0.010
Token she
Token ID673
Feature activation+2.77e-02

Pos logit contributions
 whine+0.118
 reason+0.111
 convincing+0.111
 month+0.111
uth+0.110
Neg logit contributions
ing-0.066
ter-0.065
ting-0.062
ning-0.057
 up-0.054
Token picked
Token ID6497
Feature activation+1.10e-01

Pos logit contributions
ing+0.045
ter+0.045
ting+0.044
ning+0.042
 up+0.038
Neg logit contributions
 whine-0.045
 convincing-0.043
 month-0.042
 hesitation-0.042
uth-0.042
Token it
Token ID340
Feature activation+6.48e-02

Pos logit contributions
ter+0.163
ting+0.158
ing+0.153
ning+0.152
auc+0.146
Neg logit contributions
 know-0.171
 Megan-0.169
 whine-0.165
 month-0.165
 reason-0.164
Token up
Token ID510
Feature activation+1.39e-02

Pos logit contributions
ting+0.060
ter+0.053
 Good+0.048
ing+0.046
ning+0.045
Neg logit contributions
 whine-0.142
 reason-0.140
 convincing-0.137
 allowance-0.137
 know-0.137
Token.
Token ID13
Feature activation+3.65e-02

Pos logit contributions
ing+0.016
ter+0.016
ting+0.016
ning+0.015
 up+0.013
Neg logit contributions
 whine-0.029
 convincing-0.028
 month-0.027
uth-0.027
 reason-0.027
Token She
Token ID1375
Feature activation+3.61e-02

Pos logit contributions
ing+0.048
ting+0.047
ter+0.047
ning+0.044
it+0.039
Neg logit contributions
 whine-0.071
 convincing-0.068
 month-0.067
 reason-0.067
 hesitation-0.066
Token was
Token ID373
Feature activation+2.81e-02

Pos logit contributions
ing+0.055
ter+0.054
ting+0.054
ning+0.051
 up+0.046
Neg logit contributions
 whine-0.063
 convincing-0.060
 month-0.059
 hesitation-0.059
 reason-0.058
Token cat
Token ID3797
Feature activation+2.75e-02

Pos logit contributions
ting+0.044
ing+0.043
ter+0.042
ning+0.041
bled+0.037
Neg logit contributions
 whine-0.053
 month-0.050
 reason-0.050
 convincing-0.050
 know-0.049
Token!"
Token ID2474
Feature activation+5.06e-03

Pos logit contributions
ting+0.036
ing+0.035
ter+0.035
ning+0.033
bled+0.030
Neg logit contributions
 whine-0.053
 convincing-0.050
 know-0.049
 month-0.049
uth-0.048
Token Spot
Token ID15899
Feature activation+7.40e-03

Pos logit contributions
ting+0.007
ter+0.007
ing+0.006
ning+0.006
 that+0.006
Neg logit contributions
 whine-0.010
 month-0.009
 hesitation-0.009
 reason-0.009
uth-0.009
Token's
Token ID338
Feature activation+8.20e-03

Pos logit contributions
ing+0.011
ting+0.011
ter+0.011
ning+0.010
 up+0.010
Neg logit contributions
 month-0.013
 whine-0.013
 convincing-0.012
 hesitation-0.012
 reason-0.012
Token owner
Token ID4870
Feature activation-1.37e-02

Pos logit contributions
ing+0.016
ter+0.015
ting+0.015
ning+0.015
 up+0.014
Neg logit contributions
 whine-0.011
 month-0.011
 convincing-0.010
uth-0.010
 hesitation-0.010
Token picked
Token ID6497
Feature activation+1.10e-01

Pos logit contributions
 whine+0.022
 convincing+0.021
 reason+0.020
 hesitation+0.020
uth+0.020
Neg logit contributions
ter-0.023
ting-0.023
ing-0.023
ning-0.022
it-0.020
Token up
Token ID510
Feature activation+1.14e-02

Pos logit contributions
ter+0.101
ini+0.096
auc+0.096
we+0.093
tub+0.091
Neg logit contributions
 whine-0.226
 know-0.220
 Megan-0.220
 convincing-0.216
 reason-0.216
Token Spot
Token ID15899
Feature activation+1.17e-02

Pos logit contributions
ting+0.015
ing+0.015
ter+0.014
ning+0.013
 up+0.012
Neg logit contributions
 whine-0.023
 convincing-0.022
 month-0.022
uth-0.022
 reason-0.022
Token's
Token ID338
Feature activation+9.83e-03

Pos logit contributions
auc+0.017
ter+0.017
ini+0.016
ing+0.016
bled+0.016
Neg logit contributions
 whine-0.020
 argue-0.018
 convincing-0.017
 hesitation-0.017
 know-0.017
Token ball
Token ID2613
Feature activation+5.61e-02

Pos logit contributions
ing+0.018
ting+0.018
ter+0.017
ning+0.017
it+0.015
Neg logit contributions
 whine-0.015
 convincing-0.014
 reason-0.013
 hesitation-0.013
 month-0.013
Token and
Token ID290
Feature activation+2.71e-02

Pos logit contributions
 Good+0.099
ting+0.096
ini+0.095
ning+0.094
ter+0.092
Neg logit contributions
 whine-0.066
 hour-0.060
 month-0.059
 convincing-0.058
 argue-0.058
Token the
Token ID262
Feature activation+7.67e-02

Pos logit contributions
 whine+0.036
 convincing+0.035
uth+0.034
 month+0.034
 reason+0.034
Neg logit contributions
ting-0.024
ing-0.024
ter-0.023
ning-0.022
 up-0.019
Token book
Token ID1492
Feature activation+4.63e-02

Pos logit contributions
ing+0.113
ting+0.112
ter+0.111
ning+0.104
it+0.093
Neg logit contributions
 whine-0.137
 convincing-0.131
 month-0.128
 reason-0.127
uth-0.127
Token store
Token ID3650
Feature activation+6.00e-02

Pos logit contributions
ting+0.066
ter+0.064
ing+0.064
ning+0.060
bled+0.052
Neg logit contributions
 whine-0.085
 month-0.081
 convincing-0.081
 reason-0.079
uth-0.078
Token and
Token ID290
Feature activation+2.46e-02

Pos logit contributions
ting+0.090
ing+0.087
ter+0.086
ning+0.082
 pointer+0.075
Neg logit contributions
 whine-0.104
 month-0.099
 convincing-0.099
 reason-0.098
uth-0.095
Token Lily
Token ID20037
Feature activation+3.66e-02

Pos logit contributions
ing+0.037
ting+0.037
ter+0.037
ning+0.035
it+0.031
Neg logit contributions
 whine-0.043
 convincing-0.041
 month-0.040
uth-0.040
 reason-0.040
Token picked
Token ID6497
Feature activation+1.10e-01

Pos logit contributions
ing+0.049
ting+0.049
ter+0.048
ning+0.045
it+0.039
Neg logit contributions
 whine-0.071
 convincing-0.067
 month-0.066
uth-0.066
 reason-0.066
Token a
Token ID257
Feature activation+2.60e-02

Pos logit contributions
ter+0.130
ting+0.123
ing+0.118
ning+0.115
we+0.109
Neg logit contributions
 month-0.204
 whine-0.203
 know-0.202
 Megan-0.200
uth-0.199
Token new
Token ID649
Feature activation+3.11e-03

Pos logit contributions
ing+0.038
ting+0.037
ter+0.036
ning+0.035
 up+0.033
Neg logit contributions
 whine-0.046
 convincing-0.045
uth-0.044
 hesitation-0.043
 month-0.042
Token book
Token ID1492
Feature activation+1.29e-02

Pos logit contributions
ing+0.005
ting+0.005
ter+0.005
ning+0.005
it+0.004
Neg logit contributions
 whine-0.005
 convincing-0.005
 month-0.005
 reason-0.005
uth-0.005
Token to
Token ID284
Feature activation-1.38e-02

Pos logit contributions
ting+0.018
ter+0.017
ing+0.017
ning+0.016
it+0.014
Neg logit contributions
 whine-0.024
 month-0.023
 convincing-0.023
 reason-0.022
uth-0.022
Token read
Token ID1100
Feature activation+2.68e-02

Pos logit contributions
 whine+0.023
 convincing+0.022
uth+0.021
 hesitation+0.021
 reason+0.020
Neg logit contributions
ting-0.022
ter-0.021
ing-0.020
ning-0.019
 up-0.019
Token slowly
Token ID6364
Feature activation-8.68e-03

Pos logit contributions
ting+0.066
ter+0.064
ing+0.063
ning+0.062
 Good+0.059
Neg logit contributions
 whine-0.091
 convincing-0.083
 month-0.083
 hesitation-0.083
 reason-0.083
Token poured
Token ID19036
Feature activation+6.22e-02

Pos logit contributions
 whine+0.016
 convincing+0.015
 hesitation+0.015
 month+0.015
 reason+0.014
Neg logit contributions
ing-0.012
ter-0.012
ting-0.012
ning-0.012
it-0.011
Token out
Token ID503
Feature activation-1.09e-02

Pos logit contributions
ting+0.086
ter+0.085
ning+0.082
ing+0.082
 pointer+0.075
Neg logit contributions
 whine-0.108
 month-0.104
 reason-0.102
 convincing-0.102
 hesitation-0.101
Token.
Token ID13
Feature activation+3.12e-02

Pos logit contributions
 whine+0.019
 convincing+0.018
 month+0.018
 hesitation+0.018
 reason+0.018
Neg logit contributions
ter-0.016
ting-0.016
ing-0.015
ning-0.015
it-0.013
Token It
Token ID632
Feature activation+8.78e-02

Pos logit contributions
ing+0.040
ting+0.040
ter+0.038
ning+0.036
it+0.033
Neg logit contributions
 whine-0.062
 convincing-0.058
 reason-0.058
 hesitation-0.058
 argue-0.057
Token spilled
Token ID31896
Feature activation+1.09e-01

Pos logit contributions
ting+0.102
ing+0.098
ter+0.097
ning+0.087
orting+0.075
Neg logit contributions
 whine-0.188
 convincing-0.180
 reason-0.176
 know-0.175
 month-0.175
Token onto
Token ID4291
Feature activation+7.33e-03

Pos logit contributions
ting+0.150
ing+0.148
ter+0.148
ning+0.147
ancing+0.130
Neg logit contributions
 whine-0.195
 month-0.186
 reason-0.185
 convincing-0.182
 hesitation-0.179
Token the
Token ID262
Feature activation+5.92e-02

Pos logit contributions
ing+0.011
ting+0.011
ter+0.011
ning+0.010
it+0.009
Neg logit contributions
 whine-0.013
 convincing-0.012
 month-0.012
 reason-0.012
uth-0.012
Token ground
Token ID2323
Feature activation+5.03e-02

Pos logit contributions
ing+0.083
ting+0.083
ter+0.080
ning+0.079
it+0.071
Neg logit contributions
 whine-0.107
 reason-0.103
 month-0.102
 convincing-0.098
 hesitation-0.097
Token and
Token ID290
Feature activation+1.47e-02

Pos logit contributions
ting+0.083
ter+0.081
ning+0.077
ing+0.076
 Good+0.072
Neg logit contributions
 whine-0.076
 month-0.073
 convincing-0.070
 reason-0.069
 hesitation-0.069
Token made
Token ID925
Feature activation+5.03e-02

Pos logit contributions
ing+0.021
ting+0.021
ter+0.021
ning+0.020
it+0.017
Neg logit contributions
 whine-0.027
 convincing-0.025
 month-0.024
 reason-0.024
 hesitation-0.024
Token prick
Token ID41409
Feature activation+3.89e-02

Pos logit contributions
ing+0.065
ting+0.065
ter+0.064
ning+0.060
 up+0.054
Neg logit contributions
 whine-0.072
 convincing-0.069
uth-0.067
 month-0.067
 reason-0.067
Tokenly
Token ID306
Feature activation+3.95e-02

Pos logit contributions
ning+0.048
ter+0.048
ing+0.046
ting+0.046
 nearest+0.043
Neg logit contributions
 whine-0.069
 know-0.068
 reason-0.067
 hesitation-0.065
 convincing-0.065
Token.
Token ID13
Feature activation+3.94e-02

Pos logit contributions
ing+0.057
ting+0.057
ter+0.056
ning+0.053
it+0.047
Neg logit contributions
 whine-0.072
 convincing-0.068
 reason-0.066
 hesitation-0.066
 month-0.066
Token Tim
Token ID5045
Feature activation+2.97e-02

Pos logit contributions
ting+0.052
ter+0.052
ing+0.052
ning+0.048
it+0.043
Neg logit contributions
 whine-0.075
 convincing-0.072
 month-0.071
 reason-0.071
 hesitation-0.070
Tokenmy
Token ID1820
Feature activation+1.95e-02

Pos logit contributions
ting+0.038
ing+0.038
ter+0.038
ning+0.035
it+0.031
Neg logit contributions
 whine-0.059
 convincing-0.055
 know-0.055
 reason-0.054
uth-0.054
Token picked
Token ID6497
Feature activation+1.09e-01

Pos logit contributions
ting+0.026
ing+0.026
ter+0.026
ning+0.024
it+0.021
Neg logit contributions
 whine-0.038
 convincing-0.036
uth-0.035
 reason-0.035
 know-0.034
Token it
Token ID340
Feature activation+7.27e-02

Pos logit contributions
ter+0.116
we+0.115
orting+0.113
ting+0.112
uddled+0.110
Neg logit contributions
 know-0.205
 Megan-0.200
 whine-0.192
 reason-0.191
 Clara-0.191
Token up
Token ID510
Feature activation+1.55e-02

Pos logit contributions
ting+0.087
we+0.080
lor+0.080
 Good+0.078
ini+0.077
Neg logit contributions
 reason-0.127
 breath-0.124
 know-0.124
 mind-0.121
 whine-0.119
Token and
Token ID290
Feature activation+2.32e-02

Pos logit contributions
ing+0.018
ting+0.018
ter+0.018
ning+0.017
it+0.014
Neg logit contributions
 whine-0.033
 convincing-0.031
 month-0.031
uth-0.031
 reason-0.031
Token showed
Token ID3751
Feature activation+5.40e-02

Pos logit contributions
ing+0.045
ting+0.043
ter+0.042
ning+0.041
ancing+0.040
Neg logit contributions
 whine-0.033
 know-0.030
 convincing-0.028
 argue-0.027
 begged-0.026
Token it
Token ID340
Feature activation+6.31e-02

Pos logit contributions
ting+0.080
ter+0.077
ing+0.076
ning+0.073
 floating+0.066
Neg logit contributions
 know-0.094
 whine-0.090
 reason-0.088
 Megan-0.085
 convincing-0.084
Token with
Token ID351
Feature activation+9.06e-03

Pos logit contributions
ting+0.115
ing+0.111
ter+0.109
ning+0.104
it+0.090
Neg logit contributions
 whine-0.147
 month-0.140
 convincing-0.139
 reason-0.137
 hesitation-0.136
Token the
Token ID262
Feature activation+5.51e-02

Pos logit contributions
ing+0.012
ting+0.012
ter+0.012
ning+0.011
it+0.010
Neg logit contributions
 whine-0.017
 convincing-0.017
 month-0.016
uth-0.016
 reason-0.016
Token lion
Token ID18744
Feature activation+2.47e-02

Pos logit contributions
ing+0.084
ting+0.083
ter+0.082
ning+0.077
it+0.069
Neg logit contributions
 whine-0.096
 convincing-0.091
 month-0.090
 reason-0.089
uth-0.089
Token.
Token ID13
Feature activation+4.38e-02

Pos logit contributions
ting+0.033
ing+0.033
ter+0.033
ning+0.031
it+0.027
Neg logit contributions
 whine-0.047
 convincing-0.044
 month-0.044
 hesitation-0.043
uth-0.043
Token He
Token ID679
Feature activation+5.15e-02

Pos logit contributions
ing+0.055
ting+0.055
ter+0.055
ning+0.051
it+0.045
Neg logit contributions
 whine-0.087
 convincing-0.083
 month-0.082
 reason-0.082
uth-0.081
Token picks
Token ID11103
Feature activation+1.08e-01

Pos logit contributions
ing+0.082
ting+0.081
ter+0.081
ning+0.075
it+0.069
Neg logit contributions
 whine-0.089
 convincing-0.083
uth-0.081
 know-0.081
 reason-0.080
Token up
Token ID510
Feature activation+9.97e-03

Pos logit contributions
ter+0.141
ting+0.130
pipe+0.125
ini+0.125
ing+0.125
Neg logit contributions
 know-0.188
 Megan-0.188
 whine-0.181
 reason-0.181
uth-0.180
Token the
Token ID262
Feature activation+5.79e-02

Pos logit contributions
ing+0.013
ting+0.013
ter+0.013
ning+0.012
it+0.011
Neg logit contributions
 whine-0.019
 convincing-0.018
 month-0.018
uth-0.018
 reason-0.018
Token card
Token ID2657
Feature activation+5.25e-02

Pos logit contributions
ing+0.092
ting+0.092
ter+0.089
ning+0.086
it+0.078
Neg logit contributions
 whine-0.095
 month-0.091
 reason-0.090
 hesitation-0.088
 convincing-0.088
Token and
Token ID290
Feature activation+2.69e-02

Pos logit contributions
ting+0.078
ter+0.073
ing+0.072
ning+0.070
 Good+0.063
Neg logit contributions
 whine-0.092
 month-0.086
 convincing-0.085
 Megan-0.085
 hesitation-0.084
Token gives
Token ID3607
Feature activation+2.94e-02

Pos logit contributions
ing+0.052
ter+0.050
ting+0.048
it+0.046
ning+0.046
Neg logit contributions
 whine-0.040
 know-0.036
 argue-0.036
 convincing-0.034
 common-0.033
Tokenudd
Token ID4185
Feature activation+7.67e-02

Pos logit contributions
ting+0.053
ning+0.049
ter+0.048
 Good+0.044
ing+0.044
Neg logit contributions
 tant-0.079
uth-0.078
 whine-0.076
 age-0.074
 allowance-0.073
Tokenly
Token ID306
Feature activation+6.02e-02

Pos logit contributions
ning+0.111
ter+0.106
ting+0.104
ing+0.101
 floating+0.101
Neg logit contributions
 tant-0.120
 reason-0.119
 convincing-0.119
 month-0.118
 whine-0.117
Token toy
Token ID13373
Feature activation+5.77e-02

Pos logit contributions
ing+0.084
ting+0.083
ter+0.083
ning+0.078
it+0.069
Neg logit contributions
 whine-0.112
 convincing-0.107
 month-0.105
 reason-0.104
 hesitation-0.103
Token.
Token ID13
Feature activation+4.25e-02

Pos logit contributions
ting+0.083
ing+0.081
ter+0.078
ning+0.077
it+0.071
Neg logit contributions
 whine-0.102
 age-0.096
 reason-0.095
 month-0.095
 convincing-0.095
Token She
Token ID1375
Feature activation+5.07e-02

Pos logit contributions
ting+0.057
ing+0.057
ter+0.056
ning+0.052
it+0.046
Neg logit contributions
 whine-0.082
 convincing-0.078
 month-0.076
 reason-0.076
 hesitation-0.076
Token picked
Token ID6497
Feature activation+1.08e-01

Pos logit contributions
ing+0.074
ting+0.074
ter+0.073
ning+0.069
it+0.061
Neg logit contributions
 whine-0.092
 convincing-0.088
 month-0.085
uth-0.085
 reason-0.085
Token it
Token ID340
Feature activation+7.49e-02

Pos logit contributions
ter+0.150
pipe+0.144
ning+0.141
auc+0.139
ini+0.139
Neg logit contributions
 know-0.168
 Megan-0.165
 Clara-0.165
 Debbie-0.161
 allowance-0.158
Token up
Token ID510
Feature activation+1.57e-02

Pos logit contributions
ting+0.078
ter+0.066
 Good+0.063
we+0.062
ning+0.062
Neg logit contributions
 reason-0.148
 allowance-0.146
 whine-0.145
 know-0.142
 breath-0.141
Token and
Token ID290
Feature activation+2.51e-02

Pos logit contributions
ing+0.015
ter+0.015
ting+0.014
ning+0.012
 up+0.012
Neg logit contributions
 whine-0.037
 convincing-0.035
 month-0.035
uth-0.034
 reason-0.034
Token shouted
Token ID17293
Feature activation+2.04e-02

Pos logit contributions
ing+0.047
ter+0.045
ting+0.044
ning+0.043
it+0.041
Neg logit contributions
 whine-0.037
 know-0.033
 convincing-0.032
 argue-0.031
 allowance-0.030
Token with
Token ID351
Feature activation-1.34e-02

Pos logit contributions
ting+0.034
ter+0.034
ing+0.033
ning+0.032
 nearest+0.028
Neg logit contributions
 whine-0.032
 convincing-0.030
uth-0.029
 hesitation-0.029
 reason-0.029
Token in
Token ID287
Feature activation-2.01e-02

Pos logit contributions
ting+0.022
ing+0.021
ter+0.021
ning+0.021
 pointer+0.019
Neg logit contributions
 whine-0.026
 convincing-0.025
 month-0.025
 hesitation-0.024
uth-0.024
Token her
Token ID607
Feature activation-4.57e-03

Pos logit contributions
 whine+0.038
 convincing+0.037
uth+0.036
 month+0.035
 reason+0.035
Neg logit contributions
ing-0.027
ting-0.027
ter-0.027
ning-0.025
it-0.022
Token garden
Token ID11376
Feature activation+4.49e-02

Pos logit contributions
 whine+0.008
 convincing+0.008
 month+0.008
 reason+0.008
uth+0.008
Neg logit contributions
ing-0.007
ter-0.006
ting-0.006
ning-0.006
 up-0.005
Token.
Token ID13
Feature activation+3.02e-02

Pos logit contributions
ting+0.070
ter+0.068
ing+0.067
ning+0.064
 pointer+0.058
Neg logit contributions
 whine-0.075
 month-0.072
 convincing-0.071
 age-0.070
 hesitation-0.070
Token She
Token ID1375
Feature activation+3.92e-02

Pos logit contributions
ting+0.040
ing+0.040
ter+0.039
ning+0.036
it+0.033
Neg logit contributions
 whine-0.058
 convincing-0.055
 reason-0.054
 hesitation-0.054
 month-0.054
Token dug
Token ID18735
Feature activation+1.08e-01

Pos logit contributions
ing+0.051
ting+0.051
ter+0.050
ning+0.047
it+0.041
Neg logit contributions
 whine-0.077
 convincing-0.074
 month-0.072
uth-0.072
 reason-0.071
Token a
Token ID257
Feature activation+2.95e-02

Pos logit contributions
ting+0.144
ter+0.142
ning+0.137
ing+0.133
bike+0.115
Neg logit contributions
 month-0.194
 reason-0.194
 Megan-0.191
 whine-0.191
uth-0.191
Token hole
Token ID7604
Feature activation+5.14e-02

Pos logit contributions
ing+0.045
ter+0.044
ting+0.044
ning+0.041
 up+0.037
Neg logit contributions
 whine-0.052
 convincing-0.050
uth-0.048
 month-0.048
 hesitation-0.048
Token and
Token ID290
Feature activation+2.51e-02

Pos logit contributions
ting+0.081
ter+0.075
ning+0.073
ing+0.070
 Good+0.070
Neg logit contributions
 whine-0.082
 convincing-0.080
 Megan-0.078
 hesitation-0.078
 month-0.078
Token planted
Token ID18969
Feature activation+6.37e-02

Pos logit contributions
ing+0.042
ting+0.041
ning+0.040
ter+0.039
ancing+0.035
Neg logit contributions
 whine-0.040
 know-0.039
 convincing-0.038
 reason-0.038
 month-0.037
Token the
Token ID262
Feature activation+5.19e-02

Pos logit contributions
ting+0.099
ning+0.096
ing+0.096
 pointer+0.094
ter+0.094
Neg logit contributions
 Megan-0.094
 month-0.092
 hesitation-0.091
 reason-0.090
 know-0.090
Token that
Token ID326
Feature activation+8.72e-02

Pos logit contributions
 whine+0.005
 convincing+0.005
 month+0.004
 hesitation+0.004
 reason+0.004
Neg logit contributions
ting-0.006
ter-0.006
ing-0.006
ning-0.006
it-0.005
Token the
Token ID262
Feature activation+6.83e-02

Pos logit contributions
ting+0.113
ing+0.113
ter+0.110
ning+0.103
it+0.090
Neg logit contributions
 whine-0.172
 convincing-0.165
 month-0.162
 reason-0.161
 hesitation-0.159
Token apple
Token ID17180
Feature activation+4.50e-02

Pos logit contributions
ing+0.102
ting+0.102
ter+0.100
ning+0.095
it+0.085
Neg logit contributions
 whine-0.120
 convincing-0.115
 month-0.112
 reason-0.112
uth-0.111
Token she
Token ID673
Feature activation+5.44e-02

Pos logit contributions
ting+0.059
ing+0.056
ter+0.053
ning+0.052
ancing+0.048
Neg logit contributions
 whine-0.087
 month-0.082
 reason-0.082
 convincing-0.081
 know-0.080
Token had
Token ID550
Feature activation+7.44e-02

Pos logit contributions
ing+0.078
ting+0.078
ter+0.077
ning+0.072
it+0.064
Neg logit contributions
 whine-0.099
 convincing-0.095
 month-0.092
uth-0.092
 reason-0.092
Token picked
Token ID6497
Feature activation+1.07e-01

Pos logit contributions
ing+0.108
ting+0.107
ter+0.106
ning+0.100
it+0.089
Neg logit contributions
 whine-0.135
 convincing-0.129
 month-0.126
uth-0.125
 reason-0.125
Token was
Token ID373
Feature activation+2.83e-02

Pos logit contributions
ter+0.125
ting+0.118
ning+0.114
ing+0.113
auc+0.109
Neg logit contributions
 month-0.199
 reason-0.197
 whine-0.194
 Megan-0.193
uth-0.192
Token now
Token ID783
Feature activation-1.29e-02

Pos logit contributions
ing+0.040
ting+0.040
ter+0.039
ning+0.036
it+0.033
Neg logit contributions
 whine-0.052
 convincing-0.050
 hesitation-0.049
 month-0.049
 reason-0.049
Token on
Token ID319
Feature activation-2.66e-02

Pos logit contributions
 whine+0.022
 convincing+0.022
 hesitation+0.022
 reason+0.021
 month+0.021
Neg logit contributions
ing-0.020
ting-0.020
ter-0.019
ning-0.018
it-0.016
Token the
Token ID262
Feature activation+5.61e-02

Pos logit contributions
 whine+0.050
 convincing+0.048
uth+0.047
 month+0.046
 reason+0.046
Neg logit contributions
ing-0.037
ting-0.037
ter-0.036
ning-0.034
 up-0.030
Token ground
Token ID2323
Feature activation+4.51e-02

Pos logit contributions
ing+0.084
ting+0.083
ter+0.080
ning+0.076
it+0.070
Neg logit contributions
 whine-0.100
 reason-0.096
 month-0.096
 convincing-0.095
 hesitation-0.093
Token,
Token ID11
Feature activation+1.76e-02

Pos logit contributions
 whine+0.125
uth+0.115
 convincing+0.111
 argue+0.111
 reason+0.111
Neg logit contributions
ing-0.114
 up-0.098
ter-0.094
 that-0.092
ting-0.089
Token Lily
Token ID20037
Feature activation+4.66e-02

Pos logit contributions
ing+0.025
ting+0.025
ter+0.024
ning+0.023
it+0.020
Neg logit contributions
 whine-0.033
 convincing-0.032
 month-0.031
 reason-0.031
uth-0.030
Token's
Token ID338
Feature activation+3.32e-02

Pos logit contributions
ting+0.070
ing+0.070
ter+0.069
ning+0.065
it+0.058
Neg logit contributions
 whine-0.082
 convincing-0.077
 month-0.075
 know-0.075
 reason-0.074
Token friend
Token ID1545
Feature activation+1.46e-02

Pos logit contributions
ter+0.064
ing+0.063
ting+0.062
ning+0.060
 up+0.056
Neg logit contributions
 whine-0.044
 convincing-0.042
 month-0.040
uth-0.040
 hesitation-0.039
Token accidentally
Token ID14716
Feature activation+6.70e-02

Pos logit contributions
ting+0.023
ing+0.023
ter+0.023
ning+0.022
it+0.020
Neg logit contributions
 whine-0.024
 convincing-0.023
 month-0.022
 reason-0.022
 hesitation-0.022
Token spilled
Token ID31896
Feature activation+1.07e-01

Pos logit contributions
ing+0.119
ting+0.117
ter+0.111
ning+0.108
orting+0.105
Neg logit contributions
 whine-0.105
 know-0.097
 argue-0.095
 reason-0.095
 convincing-0.089
Token some
Token ID617
Feature activation+6.79e-03

Pos logit contributions
ing+0.152
ning+0.152
ting+0.150
ter+0.147
ancing+0.139
Neg logit contributions
 whine-0.190
 reason-0.178
 convincing-0.175
 month-0.175
 tant-0.170
Token sugar
Token ID7543
Feature activation+2.14e-02

Pos logit contributions
ting+0.010
ning+0.010
ter+0.010
ing+0.010
ancing+0.009
Neg logit contributions
 whine-0.012
 reason-0.011
 convincing-0.011
 month-0.011
 hesitation-0.010
Token on
Token ID319
Feature activation-1.85e-02

Pos logit contributions
ting+0.030
ning+0.029
ter+0.029
 Good+0.029
ing+0.028
Neg logit contributions
 whine-0.034
 month-0.033
 reason-0.032
 argue-0.032
 hour-0.031
Token her
Token ID607
Feature activation-5.38e-03

Pos logit contributions
 whine+0.031
 month+0.029
 reason+0.029
 convincing+0.029
uth+0.028
Neg logit contributions
ing-0.030
ting-0.029
ter-0.029
ning-0.028
it-0.024
Token dress
Token ID6576
Feature activation+3.70e-02

Pos logit contributions
 whine+0.009
 convincing+0.008
 reason+0.008
 month+0.008
uth+0.008
Neg logit contributions
ing-0.009
ting-0.009
ter-0.009
ning-0.008
it-0.008
Token,
Token ID11
Feature activation-1.62e-02

Pos logit contributions
 whine+0.028
 convincing+0.026
uth+0.026
 argue+0.026
 hesitation+0.025
Neg logit contributions
ing-0.023
ting-0.022
ter-0.022
ning-0.019
 up-0.019
Token Lily
Token ID20037
Feature activation+3.91e-02

Pos logit contributions
 whine+0.030
 convincing+0.028
 hesitation+0.027
 month+0.027
uth+0.027
Neg logit contributions
ter-0.024
ting-0.024
ing-0.023
ning-0.021
 up-0.020
Token's
Token ID338
Feature activation+2.54e-02

Pos logit contributions
ter+0.057
ting+0.056
ing+0.055
ning+0.052
bled+0.050
Neg logit contributions
 whine-0.070
 know-0.067
 month-0.065
 age-0.064
 convincing-0.064
Token little
Token ID1310
Feature activation+2.22e-02

Pos logit contributions
ing+0.056
ting+0.056
ter+0.055
ning+0.053
it+0.050
Neg logit contributions
 whine-0.027
 convincing-0.025
 month-0.024
uth-0.024
 reason-0.023
Token brother
Token ID3956
Feature activation+1.38e-02

Pos logit contributions
ing+0.043
ter+0.043
ting+0.042
ning+0.040
 up+0.037
Neg logit contributions
 whine-0.030
 convincing-0.029
uth-0.028
 month-0.028
 reason-0.027
Token spilled
Token ID31896
Feature activation+1.07e-01

Pos logit contributions
ing+0.022
ter+0.022
ting+0.021
ning+0.021
it+0.019
Neg logit contributions
 whine-0.023
 convincing-0.022
 know-0.021
 reason-0.021
 hesitation-0.021
Token juice
Token ID13135
Feature activation+5.39e-02

Pos logit contributions
ing+0.153
ting+0.152
ning+0.152
ter+0.149
ancing+0.143
Neg logit contributions
 whine-0.189
 month-0.174
 reason-0.174
 convincing-0.174
 tant-0.171
Token on
Token ID319
Feature activation-1.79e-02

Pos logit contributions
ning+0.083
ting+0.082
ini+0.081
ter+0.080
 Good+0.078
Neg logit contributions
 month-0.079
 whine-0.076
 hour-0.074
 tant-0.074
 reason-0.072
Token the
Token ID262
Feature activation+6.61e-02

Pos logit contributions
 whine+0.028
 convincing+0.027
 reason+0.026
 month+0.026
uth+0.025
Neg logit contributions
ing-0.030
ting-0.030
ter-0.029
ning-0.028
it-0.025
Token same
Token ID976
Feature activation+2.51e-02

Pos logit contributions
ing+0.106
ting+0.103
ning+0.101
ter+0.098
it+0.094
Neg logit contributions
 whine-0.111
 reason-0.102
 month-0.102
 age-0.098
 convincing-0.096
Token bright
Token ID6016
Feature activation+1.26e-02

Pos logit contributions
ing+0.039
ting+0.039
ter+0.038
ning+0.036
it+0.033
Neg logit contributions
 whine-0.043
 reason-0.041
 convincing-0.041
 month-0.041
 hesitation-0.039
Token't
Token ID470
Feature activation+6.91e-03

Pos logit contributions
uth+0.024
 whine+0.023
 common+0.022
 reason+0.022
 hesitation+0.022
Neg logit contributions
 that-0.013
ting-0.012
ing-0.012
ter-0.011
 was-0.011
Token give
Token ID1577
Feature activation+3.77e-02

Pos logit contributions
ter+0.013
ting+0.012
ing+0.012
 up+0.011
 that+0.011
Neg logit contributions
uth-0.010
 whine-0.009
 month-0.009
 convincing-0.009
ait-0.009
Token up
Token ID510
Feature activation+1.43e-02

Pos logit contributions
ing+0.046
ting+0.046
ter+0.045
ning+0.042
it+0.036
Neg logit contributions
 whine-0.077
 convincing-0.074
 month-0.072
uth-0.072
 reason-0.072
Token.
Token ID13
Feature activation+4.61e-02

Pos logit contributions
ing+0.018
ting+0.018
ter+0.018
ning+0.016
 up+0.015
Neg logit contributions
 whine-0.028
 convincing-0.027
uth-0.027
 month-0.027
 reason-0.026
Token He
Token ID679
Feature activation+6.21e-02

Pos logit contributions
ting+0.054
ing+0.054
ter+0.054
ning+0.049
it+0.045
Neg logit contributions
 whine-0.094
 convincing-0.090
 hesitation-0.090
 reason-0.089
 month-0.089
Token picked
Token ID6497
Feature activation+1.07e-01

Pos logit contributions
ing+0.096
ting+0.095
ter+0.094
ning+0.089
it+0.080
Neg logit contributions
 whine-0.109
 convincing-0.103
uth-0.100
 reason-0.099
 month-0.099
Token up
Token ID510
Feature activation+1.43e-02

Pos logit contributions
ter+0.112
ting+0.105
ini+0.103
we+0.102
ning+0.102
Neg logit contributions
 Megan-0.209
 know-0.208
 reason-0.200
 Clara-0.200
 tant-0.198
Token the
Token ID262
Feature activation+6.41e-02

Pos logit contributions
ing+0.019
ting+0.019
ter+0.019
ning+0.018
it+0.016
Neg logit contributions
 whine-0.027
 convincing-0.026
 month-0.025
uth-0.025
 reason-0.025
Token ball
Token ID2613
Feature activation+5.68e-02

Pos logit contributions
ing+0.093
ting+0.092
ter+0.088
ning+0.086
it+0.079
Neg logit contributions
 whine-0.116
 reason-0.111
 hesitation-0.109
 convincing-0.108
 month-0.108
Token and
Token ID290
Feature activation+2.39e-02

Pos logit contributions
ting+0.097
ning+0.094
 Good+0.094
ini+0.093
ter+0.089
Neg logit contributions
 whine-0.073
 Megan-0.069
 month-0.068
 argue-0.068
 reason-0.068
Token threw
Token ID9617
Feature activation+8.51e-02

Pos logit contributions
ing+0.043
ter+0.041
ting+0.040
ning+0.039
it+0.037
Neg logit contributions
 whine-0.038
 convincing-0.033
 argue-0.033
 know-0.033
 reason-0.032
Token choose
Token ID3853
Feature activation+1.56e-02

Pos logit contributions
 whine+0.015
 convincing+0.014
uth+0.014
 month+0.014
 hesitation+0.014
Neg logit contributions
ting-0.014
ter-0.014
ing-0.014
ning-0.013
 up-0.012
Token from
Token ID422
Feature activation-6.74e-03

Pos logit contributions
ing+0.020
ting+0.020
ter+0.020
ning+0.019
it+0.016
Neg logit contributions
 whine-0.030
 convincing-0.029
 month-0.028
uth-0.028
 reason-0.028
Token!
Token ID0
Feature activation+4.99e-02

Pos logit contributions
 whine+0.013
 convincing+0.013
uth+0.012
 month+0.012
 reason+0.012
Neg logit contributions
ting-0.009
ing-0.009
ter-0.009
ning-0.008
 up-0.007
Token Tim
Token ID5045
Feature activation+2.45e-02

Pos logit contributions
ting+0.065
ing+0.064
ter+0.064
ning+0.059
it+0.054
Neg logit contributions
 whine-0.097
 convincing-0.092
 hesitation-0.091
 reason-0.091
 argue-0.089
Tokenmy
Token ID1820
Feature activation+1.50e-02

Pos logit contributions
ting+0.029
ing+0.029
ter+0.029
ning+0.027
it+0.023
Neg logit contributions
 whine-0.050
 convincing-0.048
uth-0.047
 month-0.047
 know-0.047
Token picked
Token ID6497
Feature activation+1.07e-01

Pos logit contributions
ting+0.015
ing+0.015
ter+0.015
ning+0.013
it+0.011
Neg logit contributions
 whine-0.034
 convincing-0.033
uth-0.032
 month-0.032
 reason-0.032
Token a
Token ID257
Feature activation+2.46e-02

Pos logit contributions
ter+0.114
ing+0.111
ting+0.111
we+0.103
ning+0.099
Neg logit contributions
 whine-0.216
 convincing-0.211
 hesitation-0.209
 month-0.208
 reason-0.208
Token red
Token ID2266
Feature activation+3.22e-02

Pos logit contributions
ing+0.036
ter+0.036
ting+0.035
ning+0.034
 up+0.031
Neg logit contributions
 whine-0.044
 convincing-0.043
uth-0.041
 hesitation-0.041
 reason-0.040
Token k
Token ID479
Feature activation-1.04e-02

Pos logit contributions
ting+0.058
ing+0.057
ter+0.053
ning+0.053
we+0.052
Neg logit contributions
 whine-0.045
 reason-0.043
 month-0.042
 argue-0.041
 allowance-0.040
Tokenite
Token ID578
Feature activation+6.13e-02

Pos logit contributions
 whine+0.017
 convincing+0.016
 month+0.016
uth+0.016
 reason+0.015
Neg logit contributions
ting-0.017
ing-0.017
ter-0.017
ning-0.016
 up-0.014
Token with
Token ID351
Feature activation+4.13e-03

Pos logit contributions
ting+0.104
ing+0.092
ning+0.092
bled+0.091
ter+0.090
Neg logit contributions
 whine-0.085
 month-0.080
 age-0.079
 argue-0.079
 Megan-0.079
Token special
Token ID2041
Feature activation+3.12e-04

Pos logit contributions
ing+0.015
ter+0.014
ting+0.014
ning+0.014
 up+0.013
Neg logit contributions
 whine-0.014
 convincing-0.014
uth-0.013
 hesitation-0.013
 month-0.013
Token symbol
Token ID6194
Feature activation+5.23e-02

Pos logit contributions
ting+0.001
ing+0.001
ter+0.001
ning+0.001
it+0.000
Neg logit contributions
 whine-0.000
 convincing-0.000
 month-0.000
 reason-0.000
 hesitation-0.000
Token.
Token ID13
Feature activation+3.58e-02

Pos logit contributions
ting+0.078
ter+0.072
ing+0.071
ning+0.070
bled+0.065
Neg logit contributions
 month-0.086
 whine-0.083
 know-0.083
 convincing-0.082
 reason-0.081
Token The
Token ID383
Feature activation+6.53e-02

Pos logit contributions
ting+0.046
ter+0.046
ing+0.046
ning+0.042
it+0.039
Neg logit contributions
 whine-0.070
 convincing-0.066
 hesitation-0.066
 reason-0.066
 month-0.064
Token daddy
Token ID35822
Feature activation+3.40e-03

Pos logit contributions
ing+0.103
ting+0.103
ter+0.101
ning+0.097
it+0.087
Neg logit contributions
 whine-0.110
 convincing-0.105
 month-0.103
 reason-0.102
uth-0.101
Token picked
Token ID6497
Feature activation+1.07e-01

Pos logit contributions
ting+0.005
ing+0.004
ter+0.004
ning+0.004
it+0.004
Neg logit contributions
 whine-0.006
 convincing-0.006
 know-0.006
uth-0.006
 month-0.006
Token an
Token ID281
Feature activation+3.44e-03

Pos logit contributions
ter+0.126
ting+0.113
ning+0.111
pipe+0.110
ancing+0.108
Neg logit contributions
 know-0.198
 month-0.192
 Megan-0.189
 reason-0.187
 Clara-0.187
Token original
Token ID2656
Feature activation+1.38e-02

Pos logit contributions
ing+0.005
ting+0.005
ter+0.005
ning+0.005
it+0.005
Neg logit contributions
 whine-0.006
 convincing-0.006
 month-0.005
 reason-0.005
uth-0.005
Token symbol
Token ID6194
Feature activation+5.04e-02

Pos logit contributions
ting+0.022
ing+0.022
ter+0.021
ning+0.020
it+0.018
Neg logit contributions
 whine-0.023
 reason-0.022
 convincing-0.022
 month-0.022
 hesitation-0.022
Token for
Token ID329
Feature activation-5.87e-03

Pos logit contributions
ting+0.078
ter+0.071
ning+0.071
ing+0.070
 Good+0.068
Neg logit contributions
 month-0.079
 convincing-0.075
 hesitation-0.075
 know-0.075
 whine-0.075
Token the
Token ID262
Feature activation+4.95e-02

Pos logit contributions
 whine+0.011
 convincing+0.011
uth+0.011
 month+0.011
 reason+0.010
Neg logit contributions
ting-0.008
ing-0.008
ter-0.008
ning-0.007
it-0.006
Token\n
Token ID198
Feature activation-5.83e-05

Pos logit contributions
ing+0.021
ter+0.021
ting+0.020
it+0.020
ning+0.018
Neg logit contributions
 whine-0.034
 hesitation-0.032
 convincing-0.031
 argue-0.030
 reason-0.030
TokenMom
Token ID29252
Feature activation-6.51e-03

Pos logit contributions
 whine+0.000
 hesitation+0.000
 convincing+0.000
 argue+0.000
 allowance+0.000
Neg logit contributions
ing-0.000
ter-0.000
it-0.000
ting-0.000
ning-0.000
Token sigh
Token ID19680
Feature activation+3.32e-02

Pos logit contributions
 whine+0.010
 know+0.009
 convincing+0.009
uth+0.009
 hesitation+0.009
Neg logit contributions
ter-0.011
ting-0.011
ing-0.010
ning-0.010
it-0.009
Tokens
Token ID82
Feature activation+1.80e-02

Pos logit contributions
ter+0.044
ting+0.043
ning+0.040
ing+0.039
 floating+0.035
Neg logit contributions
 whine-0.064
 convincing-0.061
 hesitation-0.058
 month-0.057
uth-0.057
Token and
Token ID290
Feature activation+1.22e-02

Pos logit contributions
ting+0.028
ing+0.027
ter+0.027
ning+0.026
it+0.023
Neg logit contributions
 whine-0.033
 convincing-0.030
 hesitation-0.029
 reason-0.029
 month-0.029
Token picks
Token ID11103
Feature activation+1.07e-01

Pos logit contributions
ing+0.023
ting+0.023
ter+0.023
ning+0.022
it+0.020
Neg logit contributions
 whine-0.018
 convincing-0.016
 hesitation-0.016
 know-0.015
 reason-0.015
Token up
Token ID510
Feature activation+4.39e-03

Pos logit contributions
ter+0.119
ting+0.107
ini+0.105
ning+0.105
ing+0.104
Neg logit contributions
 Megan-0.203
 Clara-0.201
 whine-0.197
 reason-0.195
 month-0.195
Token the
Token ID262
Feature activation+5.58e-02

Pos logit contributions
ing+0.006
ting+0.006
ter+0.006
ning+0.005
 up+0.005
Neg logit contributions
 whine-0.008
 convincing-0.008
 month-0.008
 reason-0.008
uth-0.008
Token cars
Token ID5006
Feature activation+2.45e-02

Pos logit contributions
ing+0.081
ting+0.080
ter+0.078
ning+0.075
it+0.067
Neg logit contributions
 whine-0.102
 convincing-0.096
 reason-0.095
 month-0.095
 hesitation-0.095
Token.
Token ID13
Feature activation+5.32e-02

Pos logit contributions
ting+0.037
ter+0.037
ning+0.036
ing+0.036
 Good+0.034
Neg logit contributions
 whine-0.040
 argue-0.038
 month-0.037
 convincing-0.036
 agree-0.036
Token She
Token ID1375
Feature activation+4.20e-02

Pos logit contributions
ing+0.075
ter+0.075
ting+0.075
ning+0.070
bled+0.061
Neg logit contributions
 whine-0.099
 convincing-0.095
uth-0.093
 month-0.092
 hesitation-0.090
Token in
Token ID287
Feature activation-1.85e-02

Pos logit contributions
ting+0.065
ing+0.063
ning+0.062
ter+0.061
 Good+0.060
Neg logit contributions
 hesitation-0.078
 whine-0.076
 reason-0.075
 month-0.074
 convincing-0.072
Token the
Token ID262
Feature activation+4.68e-02

Pos logit contributions
 whine+0.034
 convincing+0.032
 month+0.032
uth+0.032
 reason+0.032
Neg logit contributions
ing-0.027
ting-0.026
ter-0.026
ning-0.024
it-0.022
Token ground
Token ID2323
Feature activation+5.29e-02

Pos logit contributions
ing+0.073
ting+0.070
ning+0.070
ter+0.067
bled+0.066
Neg logit contributions
 month-0.079
 whine-0.077
 reason-0.073
 hesitation-0.072
 mind-0.070
Token.
Token ID13
Feature activation+2.85e-02

Pos logit contributions
ting+0.093
ter+0.088
ning+0.087
 Good+0.087
ing+0.086
Neg logit contributions
 whine-0.070
 hesitation-0.067
 convincing-0.066
 month-0.066
 age-0.065
Token She
Token ID1375
Feature activation+3.72e-02

Pos logit contributions
ing+0.035
ting+0.034
ter+0.034
ning+0.032
it+0.026
Neg logit contributions
 whine-0.058
 convincing-0.056
uth-0.055
 month-0.055
 reason-0.054
Token dug
Token ID18735
Feature activation+1.07e-01

Pos logit contributions
ing+0.057
ting+0.056
ter+0.056
ning+0.052
it+0.047
Neg logit contributions
 whine-0.065
 convincing-0.062
 month-0.060
uth-0.060
 reason-0.060
Token it
Token ID340
Feature activation+7.66e-02

Pos logit contributions
ning+0.160
ting+0.159
ter+0.155
ing+0.150
bike+0.143
Neg logit contributions
 Megan-0.173
 reason-0.172
 know-0.172
 common-0.166
 month-0.165
Token up
Token ID510
Feature activation+2.06e-02

Pos logit contributions
ting+0.100
ter+0.088
ning+0.082
 Good+0.081
ini+0.077
Neg logit contributions
 know-0.149
 breath-0.146
 reason-0.143
 whine-0.139
 argue-0.138
Token and
Token ID290
Feature activation+2.51e-02

Pos logit contributions
ting+0.030
ing+0.030
ter+0.029
ning+0.028
 Good+0.025
Neg logit contributions
 whine-0.036
 convincing-0.035
 hesitation-0.035
 reason-0.035
 month-0.034
Token saw
Token ID2497
Feature activation+7.07e-02

Pos logit contributions
ing+0.044
ting+0.042
ning+0.041
ter+0.041
orting+0.040
Neg logit contributions
 whine-0.036
 know-0.035
 notice-0.032
 hesitation-0.032
 convincing-0.032
Token that
Token ID326
Feature activation+4.25e-02

Pos logit contributions
ting+0.094
ning+0.092
ing+0.089
orting+0.087
ter+0.085
Neg logit contributions
 know-0.125
 hesitation-0.124
 Megan-0.119
 excitement-0.116
 blonde-0.114
Token his
Token ID465
Feature activation-1.93e-02

Pos logit contributions
ing+0.013
ting+0.013
ter+0.013
ning+0.012
it+0.011
Neg logit contributions
 whine-0.018
 convincing-0.017
 reason-0.017
 month-0.017
 hesitation-0.017
Token parents
Token ID3397
Feature activation-4.48e-02

Pos logit contributions
 whine+0.028
 convincing+0.026
 month+0.026
 argue+0.025
 common+0.025
Neg logit contributions
ing-0.033
 up-0.031
ter-0.031
 that-0.031
ning-0.030
Token about
Token ID546
Feature activation-1.16e-02

Pos logit contributions
 whine+0.080
 convincing+0.078
 hesitation+0.078
 know+0.077
 month+0.074
Neg logit contributions
ter-0.065
ning-0.064
ting-0.063
ing-0.059
 pointer-0.057
Token the
Token ID262
Feature activation+2.31e-02

Pos logit contributions
 whine+0.019
 convincing+0.019
 hesitation+0.018
 reason+0.018
 month+0.017
Neg logit contributions
ing-0.020
ter-0.019
ting-0.019
ning-0.018
it-0.017
Token mysterious
Token ID11428
Feature activation-4.75e-03

Pos logit contributions
ting+0.034
ing+0.034
ter+0.034
ning+0.033
 up+0.031
Neg logit contributions
 whine-0.041
uth-0.038
 convincing-0.038
 hesitation-0.037
 argue-0.036
Token Gor
Token ID19097
Feature activation-1.39e-01

Pos logit contributions
 whine+0.009
 convincing+0.008
 argue+0.008
uth+0.008
 hesitation+0.008
Neg logit contributions
ing-0.007
ter-0.007
ting-0.007
ning-0.007
 up-0.007
Tokenilla
Token ID5049
Feature activation-5.71e-02

Pos logit contributions
 whine+0.283
 convincing+0.270
 reason+0.257
 hesitation+0.255
 month+0.253
Neg logit contributions
ing-0.199
ter-0.190
ting-0.188
ning-0.183
we-0.159
Token.
Token ID13
Feature activation+1.61e-02

Pos logit contributions
 whine+0.108
 convincing+0.105
 hesitation+0.104
 month+0.102
 reason+0.100
Neg logit contributions
ter-0.074
ting-0.073
ing-0.072
ning-0.070
it-0.063
Token His
Token ID2399
Feature activation-1.12e-02

Pos logit contributions
ing+0.025
ting+0.025
ter+0.024
ning+0.023
it+0.021
Neg logit contributions
 whine-0.028
 convincing-0.027
 month-0.026
uth-0.026
 reason-0.026
Token parents
Token ID3397
Feature activation-4.54e-02

Pos logit contributions
 whine+0.017
 month+0.016
aney+0.016
 argue+0.016
 convincing+0.016
Neg logit contributions
ing-0.018
 up-0.018
ter-0.017
ting-0.017
 that-0.017
Token were
Token ID547
Feature activation+6.64e-04

Pos logit contributions
 whine+0.071
 convincing+0.068
 hesitation+0.065
 reason+0.065
 month+0.065
Neg logit contributions
ing-0.077
ting-0.077
ter-0.076
ning-0.073
it-0.066
Token were
Token ID547
Feature activation-1.33e-02

Pos logit contributions
 whine+0.022
 convincing+0.021
 month+0.021
 reason+0.020
 hesitation+0.020
Neg logit contributions
ing-0.018
ting-0.017
ter-0.017
ning-0.016
it-0.014
Token probably
Token ID2192
Feature activation+5.10e-04

Pos logit contributions
 whine+0.025
 convincing+0.024
 month+0.023
 reason+0.023
uth+0.023
Neg logit contributions
ing-0.018
ting-0.018
ter-0.018
ning-0.017
it-0.015
Token celebrating
Token ID17499
Feature activation-2.68e-02

Pos logit contributions
ing+0.001
ting+0.001
ter+0.001
ning+0.001
it+0.001
Neg logit contributions
 whine-0.001
 convincing-0.001
 month-0.001
 reason-0.001
 hesitation-0.001
Token their
Token ID511
Feature activation-2.30e-02

Pos logit contributions
 whine+0.051
 convincing+0.049
 month+0.048
 reason+0.048
 hesitation+0.047
Neg logit contributions
ing-0.036
ting-0.036
ter-0.036
ning-0.033
it-0.029
Token own
Token ID898
Feature activation-3.90e-02

Pos logit contributions
 whine+0.036
uth+0.034
 convincing+0.034
 hesitation+0.033
 know+0.032
Neg logit contributions
ing-0.037
 up-0.035
ting-0.035
ter-0.035
ning-0.034
Token birth
Token ID4082
Feature activation-1.38e-01

Pos logit contributions
 whine+0.066
 convincing+0.063
uth+0.061
 hesitation+0.061
 month+0.060
Neg logit contributions
ing-0.061
ting-0.058
ter-0.058
ning-0.056
 up-0.053
Tokendays
Token ID12545
Feature activation-4.85e-02

Pos logit contributions
 whine+0.112
 convincing+0.099
uth+0.097
 reason+0.095
 hesitation+0.093
Neg logit contributions
ing-0.351
ter-0.338
ting-0.337
ning-0.323
it-0.315
Token.
Token ID13
Feature activation+7.77e-03

Pos logit contributions
 whine+0.085
 month+0.082
 convincing+0.082
 reason+0.079
 age+0.078
Neg logit contributions
ter-0.070
ting-0.070
ing-0.069
ning-0.066
bled-0.058
Token Just
Token ID2329
Feature activation-3.21e-02

Pos logit contributions
ing+0.009
ting+0.008
ter+0.008
ning+0.008
 up+0.007
Neg logit contributions
 whine-0.017
 hesitation-0.016
 convincing-0.016
uth-0.016
 reason-0.015
Token when
Token ID618
Feature activation-1.62e-02

Pos logit contributions
 whine+0.063
 convincing+0.060
 month+0.059
uth+0.059
 reason+0.058
Neg logit contributions
ter-0.042
ting-0.042
ing-0.041
ning-0.038
 up-0.034
Token he
Token ID339
Feature activation-4.62e-03

Pos logit contributions
 whine+0.036
uth+0.036
 hesitation+0.034
 convincing+0.034
 reason+0.034
Neg logit contributions
ing-0.015
ter-0.015
ting-0.015
 up-0.013
ning-0.013
Token.
Token ID13
Feature activation+1.93e-02

Pos logit contributions
 whine+0.083
 convincing+0.081
 month+0.079
 reason+0.078
uth+0.077
Neg logit contributions
ting-0.062
ing-0.060
ter-0.060
ning-0.057
 pointer-0.050
Token You
Token ID921
Feature activation-7.72e-03

Pos logit contributions
ing+0.022
ting+0.021
ter+0.020
ning+0.020
 pointer+0.017
Neg logit contributions
 whine-0.041
 convincing-0.040
uth-0.039
 month-0.039
 know-0.039
Token worked
Token ID3111
Feature activation+3.58e-02

Pos logit contributions
 whine+0.014
 convincing+0.013
uth+0.013
 month+0.013
 hesitation+0.013
Neg logit contributions
ter-0.011
ing-0.011
ting-0.011
ning-0.010
it-0.009
Token with
Token ID351
Feature activation-2.39e-02

Pos logit contributions
ter+0.058
ting+0.058
ing+0.056
ning+0.054
it+0.047
Neg logit contributions
 whine-0.058
 convincing-0.056
 month-0.055
 reason-0.054
uth-0.053
Token such
Token ID884
Feature activation-4.96e-02

Pos logit contributions
 whine+0.047
 convincing+0.045
 month+0.044
uth+0.044
 reason+0.044
Neg logit contributions
ing-0.031
ting-0.031
ter-0.031
ning-0.029
it-0.025
Token delic
Token ID8675
Feature activation-1.37e-01

Pos logit contributions
 whine+0.110
 convincing+0.106
 month+0.104
uth+0.104
 reason+0.103
Neg logit contributions
ing-0.052
ter-0.051
ting-0.051
ning-0.046
it-0.039
Tokenacy
Token ID1590
Feature activation-6.68e-02

Pos logit contributions
 whine+0.232
 month+0.217
 convincing+0.212
 reason+0.212
uth+0.206
Neg logit contributions
ing-0.232
ting-0.226
ter-0.222
ning-0.206
ancing-0.187
Token.
Token ID13
Feature activation+1.90e-02

Pos logit contributions
 whine+0.127
 convincing+0.122
 month+0.120
uth+0.119
 reason+0.119
Neg logit contributions
ting-0.090
ing-0.090
ter-0.089
ning-0.083
it-0.072
Token You
Token ID921
Feature activation-8.86e-03

Pos logit contributions
ing+0.021
ting+0.020
ter+0.020
ning+0.019
 pointer+0.016
Neg logit contributions
 whine-0.041
 convincing-0.040
 month-0.039
uth-0.039
 know-0.038
Token have
Token ID423
Feature activation+3.80e-02

Pos logit contributions
 whine+0.016
 convincing+0.016
uth+0.015
 month+0.015
 hesitation+0.015
Neg logit contributions
ter-0.012
ing-0.012
ting-0.012
ning-0.011
it-0.010
Token a
Token ID257
Feature activation-1.68e-02

Pos logit contributions
ing+0.054
ting+0.052
ter+0.052
ning+0.049
it+0.044
Neg logit contributions
 whine-0.071
 convincing-0.067
uth-0.066
 month-0.065
 reason-0.065
Token.
Token ID13
Feature activation+3.70e-02

Pos logit contributions
ing+0.001
ting+0.001
ter+0.001
ning+0.001
 up+0.001
Neg logit contributions
 whine-0.003
uth-0.002
 convincing-0.002
 month-0.002
 hesitation-0.002
Token He
Token ID679
Feature activation+3.22e-02

Pos logit contributions
ing+0.035
ting+0.035
ter+0.034
ning+0.031
it+0.026
Neg logit contributions
 whine-0.086
 convincing-0.083
 month-0.082
uth-0.082
 reason-0.081
Token kept
Token ID4030
Feature activation+4.17e-03

Pos logit contributions
ing+0.050
ting+0.049
ter+0.048
ning+0.046
it+0.041
Neg logit contributions
 whine-0.057
 convincing-0.054
uth-0.053
 reason-0.052
 month-0.051
Token throwing
Token ID9644
Feature activation+3.84e-02

Pos logit contributions
ing+0.006
ting+0.006
ter+0.006
ning+0.005
it+0.005
Neg logit contributions
 whine-0.008
 convincing-0.007
 month-0.007
uth-0.007
 reason-0.007
Token a
Token ID257
Feature activation-1.55e-02

Pos logit contributions
ing+0.050
ting+0.050
ter+0.049
ning+0.045
 up+0.041
Neg logit contributions
 whine-0.076
 convincing-0.073
uth-0.072
 month-0.071
 hesitation-0.070
Token tant
Token ID24246
Feature activation-1.36e-01

Pos logit contributions
 whine+0.029
 convincing+0.029
uth+0.028
 hesitation+0.028
 month+0.027
Neg logit contributions
ing-0.022
ting-0.021
ter-0.021
ning-0.019
it-0.018
Tokenrum
Token ID6582
Feature activation-3.31e-02

Pos logit contributions
 whine+0.197
 month+0.186
 age+0.184
 convincing+0.182
uth+0.181
Neg logit contributions
ing-0.266
ting-0.246
ter-0.241
ning-0.236
it-0.221
Token.
Token ID13
Feature activation+3.27e-02

Pos logit contributions
 whine+0.041
 Megan+0.038
 age+0.036
 convincing+0.035
 argue+0.033
Neg logit contributions
ning-0.059
ting-0.059
planes-0.057
ini-0.057
 Good-0.055
Token His
Token ID2399
Feature activation+1.73e-03

Pos logit contributions
ing+0.028
ting+0.028
ter+0.027
ning+0.024
it+0.021
Neg logit contributions
 whine-0.078
 convincing-0.075
 hesitation-0.074
uth-0.074
 reason-0.074
Token mom
Token ID1995
Feature activation-3.08e-02

Pos logit contributions
ter+0.003
ing+0.003
 up+0.003
 gun+0.003
 pipe+0.003
Neg logit contributions
 convincing-0.002
uth-0.002
 month-0.002
 hesitation-0.002
 oblig-0.002
Token was
Token ID373
Feature activation+7.04e-03

Pos logit contributions
 whine+0.054
 convincing+0.051
 reason+0.049
 know+0.047
 hesitation+0.047
Neg logit contributions
ter-0.047
ting-0.046
ing-0.046
ning-0.044
ancing-0.040
Token She
Token ID1375
Feature activation+1.81e-02

Pos logit contributions
ter+0.021
ting+0.021
ing+0.021
ning+0.019
it+0.017
Neg logit contributions
 whine-0.025
 convincing-0.024
 month-0.024
uth-0.023
 reason-0.023
Token cried
Token ID16896
Feature activation-8.19e-03

Pos logit contributions
ing+0.026
ter+0.026
ting+0.026
ning+0.024
 up+0.023
Neg logit contributions
 whine-0.032
 convincing-0.031
 hesitation-0.031
 month-0.031
uth-0.030
Token and
Token ID290
Feature activation-1.43e-02

Pos logit contributions
 whine+0.016
 convincing+0.015
 month+0.015
 reason+0.015
 hesitation+0.015
Neg logit contributions
ter-0.010
ting-0.010
ing-0.010
ning-0.010
it-0.008
Token threw
Token ID9617
Feature activation+4.72e-02

Pos logit contributions
uth+0.023
 month+0.023
 reason+0.022
 whine+0.022
aney+0.022
Neg logit contributions
ter-0.023
ting-0.023
ing-0.022
ning-0.021
 that-0.020
Token a
Token ID257
Feature activation-1.90e-02

Pos logit contributions
ting+0.065
ing+0.062
ter+0.060
 up+0.060
ning+0.056
Neg logit contributions
uth-0.091
 convincing-0.089
 whine-0.088
 age-0.085
 oblig-0.084
Token tant
Token ID24246
Feature activation-1.34e-01

Pos logit contributions
 convincing+0.024
uth+0.024
 whine+0.024
 hesitation+0.022
 Megan+0.022
Neg logit contributions
ing-0.038
ting-0.036
ter-0.035
ning-0.034
 pointer-0.034
Tokenrum
Token ID6582
Feature activation-3.22e-02

Pos logit contributions
 whine+0.195
 month+0.184
uth+0.182
 convincing+0.181
 age+0.180
Neg logit contributions
ing-0.261
ting-0.247
ter-0.240
ning-0.232
it-0.214
Token.
Token ID13
Feature activation+1.22e-02

Pos logit contributions
 whine+0.043
 age+0.039
 Megan+0.038
 month+0.037
uth+0.036
Neg logit contributions
ting-0.053
ning-0.053
ter-0.053
ini-0.052
auc-0.051
Token Her
Token ID2332
Feature activation-1.56e-02

Pos logit contributions
ing+0.021
ting+0.021
ter+0.021
ning+0.020
it+0.018
Neg logit contributions
 whine-0.019
 convincing-0.018
 month-0.017
uth-0.017
 reason-0.017
Token mom
Token ID1995
Feature activation-3.97e-02

Pos logit contributions
uth+0.026
 convincing+0.026
ait+0.026
Eva+0.025
 Debbie+0.025
Neg logit contributions
 laser-0.022
 gun-0.022
ter-0.022
 up-0.022
 pipe-0.022
Token tried
Token ID3088
Feature activation+3.99e-02

Pos logit contributions
 whine+0.070
 convincing+0.067
 reason+0.064
 know+0.063
 month+0.063
Neg logit contributions
ter-0.059
ting-0.059
ing-0.059
ning-0.055
it-0.050
Token listen
Token ID6004
Feature activation-1.25e-03

Pos logit contributions
uth+0.019
 Megan+0.018
 allowance+0.018
 Debbie+0.017
azel+0.017
Neg logit contributions
ter-0.023
ing-0.023
ting-0.023
 that-0.021
 up-0.020
Token.
Token ID13
Feature activation+3.63e-02

Pos logit contributions
 whine+0.003
 convincing+0.003
uth+0.003
 month+0.002
 hesitation+0.002
Neg logit contributions
ing-0.002
ting-0.001
ter-0.001
ning-0.001
 up-0.001
Token She
Token ID1375
Feature activation+2.56e-02

Pos logit contributions
ter+0.059
ting+0.059
ing+0.058
ning+0.057
bled+0.050
Neg logit contributions
 whine-0.060
 convincing-0.057
 age-0.056
 month-0.056
 know-0.055
Token threw
Token ID9617
Feature activation+5.33e-02

Pos logit contributions
ing+0.043
ting+0.043
ter+0.042
ning+0.040
it+0.036
Neg logit contributions
 whine-0.042
 convincing-0.039
uth-0.038
 reason-0.038
 month-0.038
Token a
Token ID257
Feature activation-1.10e-02

Pos logit contributions
ting+0.072
ing+0.072
ter+0.070
ning+0.065
 up+0.059
Neg logit contributions
 whine-0.103
 convincing-0.099
uth-0.097
 month-0.096
 reason-0.095
Token tant
Token ID24246
Feature activation-1.32e-01

Pos logit contributions
 whine+0.012
 convincing+0.012
uth+0.012
 hesitation+0.011
 month+0.011
Neg logit contributions
ing-0.024
ting-0.023
ter-0.023
ning-0.021
 pointer-0.021
Tokenrum
Token ID6582
Feature activation-2.98e-02

Pos logit contributions
 whine+0.192
 month+0.185
 age+0.179
 convincing+0.179
uth+0.177
Neg logit contributions
ing-0.260
ting-0.244
ter-0.239
ning-0.229
it-0.214
Token in
Token ID287
Feature activation-6.07e-02

Pos logit contributions
 whine+0.039
 Megan+0.037
 age+0.036
uth+0.034
 Clara+0.033
Neg logit contributions
ning-0.049
ini-0.049
planes-0.047
auc-0.047
ting-0.046
Token the
Token ID262
Feature activation+1.72e-02

Pos logit contributions
uth+0.127
 whine+0.123
 convincing+0.121
 oblig+0.121
azel+0.116
Neg logit contributions
ting-0.072
ing-0.069
ter-0.066
 up-0.061
ning-0.060
Token store
Token ID3650
Feature activation-5.15e-03

Pos logit contributions
ter+0.027
ing+0.027
ting+0.027
ning+0.026
it+0.025
Neg logit contributions
 whine-0.031
 reason-0.029
 convincing-0.028
 common-0.028
 month-0.027
Token.
Token ID13
Feature activation+3.49e-02

Pos logit contributions
 whine+0.010
 convincing+0.009
 month+0.009
 reason+0.009
 hesitation+0.009
Neg logit contributions
ing-0.007
ting-0.007
ter-0.007
ning-0.006
it-0.006
Token,
Token ID11
Feature activation-1.37e-02

Pos logit contributions
 whine+0.095
 convincing+0.091
 hesitation+0.090
uth+0.090
 reason+0.090
Neg logit contributions
ing-0.044
ter-0.039
ting-0.039
 up-0.038
 that-0.037
Token ready
Token ID3492
Feature activation-2.76e-02

Pos logit contributions
 whine+0.020
 convincing+0.018
uth+0.018
 hesitation+0.018
 reason+0.018
Neg logit contributions
ting-0.025
ter-0.025
ing-0.025
ning-0.024
 up-0.022
Token to
Token ID284
Feature activation-3.36e-02

Pos logit contributions
 whine+0.048
 month+0.047
 convincing+0.047
 reason+0.045
 know+0.045
Neg logit contributions
ting-0.041
ter-0.041
ing-0.039
ning-0.038
rooms-0.034
Token have
Token ID423
Feature activation+3.60e-02

Pos logit contributions
 whine+0.048
 convincing+0.045
 month+0.044
 know+0.044
 reason+0.044
Neg logit contributions
ing-0.064
ting-0.063
ter-0.062
ning-0.060
it-0.054
Token more
Token ID517
Feature activation-7.14e-02

Pos logit contributions
ing+0.050
ting+0.049
ter+0.049
ning+0.045
it+0.040
Neg logit contributions
 whine-0.068
 convincing-0.066
 month-0.065
uth-0.064
 reason-0.064
Token pic
Token ID8301
Feature activation-1.29e-01

Pos logit contributions
 whine+0.139
 argue+0.127
uth+0.124
 Debbie+0.122
 hesitation+0.121
Neg logit contributions
ing-0.093
 up-0.092
 that-0.087
 floating-0.081
ning-0.079
Tokenn
Token ID77
Feature activation-3.84e-02

Pos logit contributions
 whine+0.217
 convincing+0.201
uth+0.197
 month+0.195
 hesitation+0.194
Neg logit contributions
ing-0.205
ting-0.203
ter-0.201
ning-0.188
it-0.172
Tokenics
Token ID873
Feature activation-9.23e-03

Pos logit contributions
 whine+0.060
 convincing+0.059
 month+0.059
 reason+0.057
uth+0.057
Neg logit contributions
ter-0.062
ting-0.062
ing-0.059
ning-0.056
 pointer-0.051
Token in
Token ID287
Feature activation-5.92e-02

Pos logit contributions
 whine+0.020
 convincing+0.019
uth+0.018
 hesitation+0.018
 month+0.018
Neg logit contributions
ing-0.011
ting-0.011
ter-0.011
ning-0.010
 up-0.009
Token the
Token ID262
Feature activation+1.93e-02

Pos logit contributions
 whine+0.137
uth+0.134
 convincing+0.131
 hesitation+0.127
 reason+0.125
Neg logit contributions
ter-0.050
 that-0.048
 up-0.047
ting-0.047
 floating-0.044
Token future
Token ID2003
Feature activation-3.50e-02

Pos logit contributions
ting+0.034
ter+0.034
ing+0.034
we+0.031
bled+0.031
Neg logit contributions
 month-0.032
 whine-0.029
 reason-0.028
 convincing-0.028
 mind-0.027
Token her
Token ID607
Feature activation-1.81e-02

Pos logit contributions
ting+0.022
 up+0.021
ing+0.020
ter+0.019
ning+0.019
Neg logit contributions
 convincing-0.032
uth-0.031
 month-0.031
 whine-0.031
 oblig-0.030
Token feet
Token ID3625
Feature activation-8.88e-03

Pos logit contributions
 convincing+0.029
 oblig+0.029
 month+0.028
 know+0.027
uth+0.027
Neg logit contributions
 up-0.030
ing-0.029
ter-0.029
ting-0.028
 that-0.028
Token and
Token ID290
Feature activation+1.50e-04

Pos logit contributions
 whine+0.018
 convincing+0.018
 month+0.017
 reason+0.017
 hesitation+0.017
Neg logit contributions
ting-0.011
ing-0.011
ter-0.011
ning-0.010
it-0.008
Token threw
Token ID9617
Feature activation+5.42e-02

Pos logit contributions
ting+0.000
ter+0.000
ing+0.000
ning+0.000
 up+0.000
Neg logit contributions
 whine-0.000
 month-0.000
uth-0.000
 convincing-0.000
 reason-0.000
Token a
Token ID257
Feature activation-1.33e-02

Pos logit contributions
ting+0.071
ing+0.068
ter+0.068
ning+0.062
 up+0.061
Neg logit contributions
 whine-0.107
 convincing-0.106
uth-0.105
 month-0.102
 age-0.100
Token tant
Token ID24246
Feature activation-1.29e-01

Pos logit contributions
 convincing+0.012
uth+0.012
 whine+0.012
 hesitation+0.011
 Megan+0.010
Neg logit contributions
ing-0.032
ting-0.030
ter-0.030
 pointer-0.028
 gun-0.028
Tokenrum
Token ID6582
Feature activation-2.91e-02

Pos logit contributions
 whine+0.195
 month+0.187
 age+0.184
 convincing+0.184
 allowance+0.182
Neg logit contributions
ing-0.242
ting-0.228
ter-0.223
ning-0.213
we-0.199
Token.
Token ID13
Feature activation+1.88e-02

Pos logit contributions
 whine+0.041
 Megan+0.038
 age+0.037
uth+0.036
 hesitation+0.035
Neg logit contributions
ting-0.046
ning-0.045
ter-0.045
orting-0.044
ini-0.044
Token She
Token ID1375
Feature activation+2.11e-02

Pos logit contributions
ing+0.028
ting+0.027
ter+0.027
ning+0.025
 up+0.023
Neg logit contributions
 whine-0.034
 convincing-0.032
 reason-0.032
 month-0.032
 hesitation-0.031
Token grabbed
Token ID13176
Feature activation+4.29e-02

Pos logit contributions
ting+0.034
ing+0.034
ter+0.034
ning+0.032
 up+0.029
Neg logit contributions
 whine-0.033
 convincing-0.033
 month-0.033
 hesitation-0.032
uth-0.032
Token the
Token ID262
Feature activation+2.55e-02

Pos logit contributions
ting+0.056
ing+0.054
ter+0.054
ning+0.049
 up+0.047
Neg logit contributions
 whine-0.086
 convincing-0.083
uth-0.082
 month-0.082
 reason-0.080
Token to
Token ID284
Feature activation-1.94e-02

Pos logit contributions
ing+0.028
ting+0.028
ning+0.026
ter+0.025
 up+0.024
Neg logit contributions
uth-0.055
 whine-0.054
 convincing-0.053
 month-0.053
 hesitation-0.052
Token cry
Token ID3960
Feature activation-6.27e-05

Pos logit contributions
uth+0.025
 month+0.024
 convincing+0.023
 hesitation+0.022
 hour+0.021
Neg logit contributions
ter-0.037
ting-0.036
 up-0.035
ing-0.035
ning-0.035
Token and
Token ID290
Feature activation-1.16e-02

Pos logit contributions
 whine+0.000
 convincing+0.000
 reason+0.000
 month+0.000
 know+0.000
Neg logit contributions
ting-0.000
ter-0.000
ing-0.000
ning-0.000
rooms-0.000
Token threw
Token ID9617
Feature activation+4.93e-02

Pos logit contributions
uth+0.018
 whine+0.018
 month+0.017
 convincing+0.017
 hesitation+0.017
Neg logit contributions
ting-0.019
ter-0.019
ing-0.019
ning-0.018
 up-0.017
Token a
Token ID257
Feature activation-1.62e-02

Pos logit contributions
ting+0.066
ing+0.063
 up+0.061
ter+0.061
ning+0.055
Neg logit contributions
uth-0.101
 convincing-0.098
 whine-0.098
 age-0.094
 month-0.092
Token tant
Token ID24246
Feature activation-1.28e-01

Pos logit contributions
 whine+0.023
 convincing+0.022
uth+0.022
 month+0.021
 hesitation+0.021
Neg logit contributions
ing-0.030
ting-0.029
ter-0.029
ning-0.028
it-0.026
Tokenrum
Token ID6582
Feature activation-2.86e-02

Pos logit contributions
 whine+0.186
 month+0.175
 convincing+0.174
 age+0.173
uth+0.173
Neg logit contributions
ing-0.251
ting-0.237
ter-0.232
ning-0.223
it-0.207
Token.
Token ID13
Feature activation+1.21e-02

Pos logit contributions
 whine+0.040
 age+0.035
 Megan+0.034
 month+0.034
uth+0.033
Neg logit contributions
ting-0.047
ning-0.047
ter-0.046
auc-0.046
rooms-0.046
Token She
Token ID1375
Feature activation+1.88e-02

Pos logit contributions
ter+0.021
ting+0.021
ing+0.021
ning+0.020
it+0.018
Neg logit contributions
 whine-0.019
 convincing-0.017
 month-0.017
uth-0.017
 reason-0.016
Token didn
Token ID1422
Feature activation+9.96e-03

Pos logit contributions
ing+0.029
ting+0.029
ter+0.028
ning+0.027
 up+0.024
Neg logit contributions
 whine-0.032
 convincing-0.031
 month-0.031
uth-0.030
 hesitation-0.030
Token't
Token ID470
Feature activation-1.43e-02

Pos logit contributions
ter+0.020
ting+0.019
ing+0.019
ning+0.019
axe+0.016
Neg logit contributions
 whine-0.012
 convincing-0.012
 know-0.012
 month-0.012
 reason-0.012
Token regretted
Token ID44652
Feature activation-1.86e-02

Pos logit contributions
ting+0.039
ing+0.039
ter+0.038
ning+0.036
 up+0.032
Neg logit contributions
 whine-0.051
 convincing-0.050
 month-0.049
uth-0.048
 hesitation-0.048
Token getting
Token ID1972
Feature activation-9.24e-05

Pos logit contributions
 whine+0.031
 convincing+0.030
 hesitation+0.030
 age+0.028
 argue+0.028
Neg logit contributions
ing-0.030
ter-0.028
ting-0.028
ning-0.027
auc-0.027
Token close
Token ID1969
Feature activation-4.89e-02

Pos logit contributions
 whine+0.000
 convincing+0.000
 month+0.000
 hesitation+0.000
 reason+0.000
Neg logit contributions
ing-0.000
ting-0.000
ter-0.000
ning-0.000
it-0.000
Token to
Token ID284
Feature activation-4.00e-02

Pos logit contributions
 whine+0.095
 convincing+0.091
 month+0.089
uth+0.088
 hesitation+0.088
Neg logit contributions
ing-0.065
ting-0.065
ter-0.064
ning-0.059
it-0.052
Token the
Token ID262
Feature activation+2.75e-02

Pos logit contributions
 whine+0.078
 convincing+0.074
uth+0.073
 month+0.072
 reason+0.072
Neg logit contributions
ting-0.054
ing-0.052
ter-0.052
ning-0.048
 up-0.044
Token Gor
Token ID19097
Feature activation-1.27e-01

Pos logit contributions
ting+0.044
ter+0.043
ing+0.043
ning+0.040
 up+0.039
Neg logit contributions
 whine-0.047
 convincing-0.044
uth-0.043
 hesitation-0.042
 argue-0.042
Tokenilla
Token ID5049
Feature activation-5.35e-02

Pos logit contributions
 whine+0.250
 convincing+0.244
 hesitation+0.231
 reason+0.231
 month+0.228
Neg logit contributions
ing-0.182
ting-0.174
ter-0.173
ning-0.169
 up-0.141
Token and
Token ID290
Feature activation-1.78e-02

Pos logit contributions
 whine+0.097
 hesitation+0.092
 convincing+0.092
 month+0.091
 reason+0.089
Neg logit contributions
ter-0.075
ting-0.073
ing-0.072
ning-0.069
bled-0.064
Token not
Token ID407
Feature activation-3.48e-02

Pos logit contributions
 whine+0.031
 convincing+0.028
 know+0.028
 reason+0.028
 hesitation+0.027
Neg logit contributions
ing-0.029
ting-0.028
ter-0.027
ning-0.027
it-0.025
Token listening
Token ID8680
Feature activation+5.55e-03

Pos logit contributions
 whine+0.059
uth+0.057
 convincing+0.056
 month+0.056
 reason+0.055
Neg logit contributions
ting-0.054
ing-0.054
ter-0.053
ning-0.050
 up-0.045
Token to
Token ID284
Feature activation-4.32e-02

Pos logit contributions
ing+0.007
ting+0.007
ter+0.007
ning+0.007
it+0.006
Neg logit contributions
 whine-0.011
 convincing-0.010
 month-0.010
uth-0.010
 reason-0.010
Token the
Token ID262
Feature activation+2.63e-02

Pos logit contributions
ing+0.027
ting+0.026
ter+0.026
ning+0.025
it+0.022
Neg logit contributions
 whine-0.037
 convincing-0.035
 month-0.035
 reason-0.034
uth-0.034
Token cake
Token ID12187
Feature activation-2.19e-02

Pos logit contributions
ing+0.037
ting+0.036
ter+0.036
ning+0.033
 up+0.032
Neg logit contributions
 whine-0.050
 convincing-0.048
uth-0.047
 hesitation-0.046
 reason-0.045
Token.
Token ID13
Feature activation+2.86e-02

Pos logit contributions
 whine+0.041
 convincing+0.039
 month+0.039
 reason+0.038
uth+0.038
Neg logit contributions
ting-0.030
ing-0.030
ter-0.030
ning-0.028
it-0.024
Token
Token ID198
Feature activation-4.27e-02

Pos logit contributions
ting+0.032
ter+0.031
ing+0.031
ning+0.028
 That+0.026
Neg logit contributions
 whine-0.061
 hesitation-0.057
 reason-0.057
 convincing-0.057
 argue-0.056
Token
Token ID198
Feature activation-3.85e-02

Pos logit contributions
 whine+0.093
 hesitation+0.087
 convincing+0.085
 age+0.085
 reason+0.085
Neg logit contributions
ing-0.048
ting-0.045
ter-0.044
ning-0.040
 up-0.040
TokenUn
Token ID3118
Feature activation-1.27e-01

Pos logit contributions
 whine+0.063
 hesitation+0.058
 convincing+0.058
 reason+0.057
 month+0.056
Neg logit contributions
ing-0.065
ting-0.065
ter-0.063
ning-0.059
it-0.057
Tokencle
Token ID2375
Feature activation-6.92e-02

Pos logit contributions
 whine+0.133
 convincing+0.121
 hesitation+0.116
 reason+0.113
 month+0.110
Neg logit contributions
ing-0.289
ting-0.282
ter-0.281
ning-0.270
it-0.255
Token Bob
Token ID5811
Feature activation-4.87e-02

Pos logit contributions
 whine+0.090
uth+0.088
 convincing+0.085
 Megan+0.083
 hesitation+0.081
Neg logit contributions
ting-0.128
ing-0.128
ter-0.125
ning-0.121
 pointer-0.113
Token was
Token ID373
Feature activation+1.64e-04

Pos logit contributions
 whine+0.093
 convincing+0.088
uth+0.087
 month+0.085
 reason+0.085
Neg logit contributions
ing-0.066
ting-0.065
ter-0.064
ning-0.060
it-0.053
Token very
Token ID845
Feature activation-8.11e-02

Pos logit contributions
ter+0.000
ting+0.000
 up+0.000
ing+0.000
ning+0.000
Neg logit contributions
 whine-0.000
uth-0.000
 month-0.000
 reason-0.000
 allowance-0.000
Token shocked
Token ID11472
Feature activation-5.45e-03

Pos logit contributions
 allowance+0.143
 month+0.142
uth+0.142
kids+0.140
 hesitation+0.139
Neg logit contributions
ing-0.115
 up-0.113
ter-0.106
 that-0.103
it-0.098
Token and
Token ID290
Feature activation-2.53e-02

Pos logit contributions
 whine+0.051
 convincing+0.049
 month+0.048
uth+0.048
 reason+0.048
Neg logit contributions
ing-0.032
ting-0.031
ter-0.031
ning-0.029
it-0.025
Token so
Token ID523
Feature activation-8.48e-02

Pos logit contributions
uth+0.047
 whine+0.046
 reason+0.046
 hesitation+0.046
 month+0.045
Neg logit contributions
ting-0.033
ter-0.033
ing-0.033
ning-0.029
 up-0.028
Token she
Token ID673
Feature activation+2.16e-03

Pos logit contributions
 whine+0.174
 reason+0.170
 hesitation+0.168
 convincing+0.167
 argue+0.166
Neg logit contributions
ing-0.109
ting-0.097
ter-0.097
 that-0.089
ning-0.089
Token threw
Token ID9617
Feature activation+5.56e-02

Pos logit contributions
ter+0.003
ting+0.003
ing+0.003
ning+0.003
 up+0.002
Neg logit contributions
 hesitation-0.004
 whine-0.004
 convincing-0.004
 month-0.004
uth-0.004
Token a
Token ID257
Feature activation-1.57e-02

Pos logit contributions
ting+0.075
ing+0.073
ter+0.072
 up+0.068
ning+0.065
Neg logit contributions
 whine-0.110
uth-0.108
 convincing-0.106
 age-0.103
 month-0.103
Token tant
Token ID24246
Feature activation-1.27e-01

Pos logit contributions
 whine+0.029
 convincing+0.028
uth+0.027
 month+0.027
 hesitation+0.027
Neg logit contributions
ing-0.023
ting-0.022
ter-0.022
ning-0.020
it-0.018
Tokenrum
Token ID6582
Feature activation-2.93e-02

Pos logit contributions
 whine+0.186
 month+0.173
 convincing+0.170
uth+0.169
 age+0.169
Neg logit contributions
ing-0.248
ting-0.233
ter-0.228
ning-0.221
we-0.203
Token when
Token ID618
Feature activation-2.05e-02

Pos logit contributions
 whine+0.042
 age+0.039
 Megan+0.039
 month+0.037
uth+0.037
Neg logit contributions
ting-0.046
ini-0.045
ning-0.044
orting-0.044
ter-0.044
Token her
Token ID607
Feature activation-3.22e-02

Pos logit contributions
 whine+0.039
 convincing+0.037
uth+0.037
 reason+0.036
 month+0.036
Neg logit contributions
ing-0.028
ting-0.027
ter-0.027
ning-0.025
 up-0.023
Token parents
Token ID3397
Feature activation-4.20e-02

Pos logit contributions
 whine+0.039
 convincing+0.038
 month+0.037
uth+0.037
 know+0.035
Neg logit contributions
ing-0.065
ter-0.064
ting-0.062
ning-0.061
 up-0.058
Token said
Token ID531
Feature activation+1.28e-02

Pos logit contributions
 whine+0.071
 convincing+0.068
uth+0.066
 month+0.066
 reason+0.066
Neg logit contributions
ing-0.066
ting-0.066
ter-0.065
ning-0.061
it-0.055
Token sunshine
Token ID34488
Feature activation+5.90e-03

Pos logit contributions
ing+0.044
ter+0.043
ting+0.043
bled+0.042
we+0.039
Neg logit contributions
 month-0.045
 whine-0.042
 reason-0.039
uth-0.039
 blonde-0.039
Token.
Token ID13
Feature activation+8.02e-03

Pos logit contributions
ting+0.008
ing+0.008
ter+0.008
ning+0.008
it+0.007
Neg logit contributions
 whine-0.011
 month-0.010
 convincing-0.010
 reason-0.010
uth-0.010
Token Molly
Token ID30236
Feature activation-2.06e-02

Pos logit contributions
ing+0.014
ting+0.014
ter+0.014
ning+0.013
it+0.012
Neg logit contributions
 whine-0.012
 convincing-0.012
 reason-0.011
 hesitation-0.011
uth-0.011
Token liked
Token ID8288
Feature activation+5.46e-03

Pos logit contributions
 whine+0.044
 convincing+0.043
 hesitation+0.042
 reason+0.042
 month+0.042
Neg logit contributions
ing-0.023
ting-0.022
ter-0.022
ning-0.020
 up-0.018
Token the
Token ID262
Feature activation+3.49e-02

Pos logit contributions
ing+0.008
ter+0.008
ting+0.008
ning+0.007
it+0.007
Neg logit contributions
 whine-0.009
 convincing-0.009
 month-0.009
 hesitation-0.009
 reason-0.009
Token delic
Token ID8675
Feature activation-1.26e-01

Pos logit contributions
ing+0.048
ting+0.048
ter+0.048
ning+0.045
it+0.039
Neg logit contributions
 whine-0.066
 convincing-0.063
uth-0.061
 month-0.061
 reason-0.061
Tokenately
Token ID1286
Feature activation-1.77e-02

Pos logit contributions
 whine+0.216
 convincing+0.201
 month+0.196
 reason+0.193
 hesitation+0.190
Neg logit contributions
ing-0.206
ter-0.206
ting-0.204
ning-0.189
it-0.168
Token pet
Token ID4273
Feature activation-1.05e-02

Pos logit contributions
 whine+0.031
 convincing+0.030
 month+0.030
 reason+0.029
uth+0.029
Neg logit contributions
ter-0.027
ting-0.026
ing-0.026
ning-0.024
it-0.022
Tokenalled
Token ID4262
Feature activation+2.07e-02

Pos logit contributions
 whine+0.018
 month+0.018
uth+0.018
 reason+0.017
 convincing+0.017
Neg logit contributions
ing-0.014
ter-0.014
ting-0.013
ning-0.013
bled-0.013
Token yellow
Token ID7872
Feature activation+4.86e-03

Pos logit contributions
ter+0.028
ting+0.027
ing+0.027
ning+0.026
bled+0.023
Neg logit contributions
 whine-0.038
 convincing-0.037
 hesitation-0.036
 month-0.036
 reason-0.036
Token ones
Token ID3392
Feature activation-4.22e-02

Pos logit contributions
ing+0.008
ting+0.007
ter+0.007
ning+0.007
it+0.006
Neg logit contributions
 whine-0.008
 convincing-0.008
 month-0.008
uth-0.008
 reason-0.008
Token him
Token ID683
Feature activation-6.66e-04

Pos logit contributions
ing+0.046
ter+0.046
ning+0.045
ting+0.044
it+0.043
Neg logit contributions
 whine-0.046
 convincing-0.045
 hesitation-0.044
 reason-0.043
 know-0.042
Token and
Token ID290
Feature activation-8.30e-03

Pos logit contributions
 whine+0.001
 convincing+0.001
 argue+0.001
 reason+0.001
 know+0.001
Neg logit contributions
ter-0.001
ting-0.001
ing-0.001
ning-0.001
auc-0.001
Token said
Token ID531
Feature activation+1.59e-02

Pos logit contributions
 whine+0.011
 convincing+0.011
 hesitation+0.010
uth+0.010
 reason+0.010
Neg logit contributions
ing-0.015
ting-0.015
ter-0.015
ning-0.014
it-0.013
Token,
Token ID11
Feature activation-2.93e-02

Pos logit contributions
ting+0.022
ter+0.021
ning+0.021
ing+0.020
 nearest+0.019
Neg logit contributions
 convincing-0.028
 whine-0.028
 hesitation-0.027
uth-0.026
 know-0.026
Token "
Token ID366
Feature activation+4.01e-02

Pos logit contributions
 convincing+0.058
 whine+0.058
 hesitation+0.055
 month+0.054
uth+0.054
Neg logit contributions
ing-0.036
ter-0.035
ting-0.034
ning-0.033
auc-0.030
TokenUn
Token ID3118
Feature activation-1.26e-01

Pos logit contributions
ing+0.051
ter+0.050
ting+0.048
ning+0.048
 up+0.044
Neg logit contributions
 whine-0.078
 convincing-0.077
 reason-0.075
 month-0.074
uth-0.073
Tokenfree
Token ID5787
Feature activation+8.29e-03

Pos logit contributions
 whine+0.188
 convincing+0.178
 month+0.173
uth+0.172
 reason+0.171
Neg logit contributions
ing-0.222
ting-0.221
ter-0.219
ning-0.208
it-0.189
Tokenze
Token ID2736
Feature activation+8.54e-04

Pos logit contributions
ting+0.010
ter+0.010
ing+0.010
ning+0.009
pipe+0.009
Neg logit contributions
 whine-0.016
 reason-0.016
uth-0.015
 month-0.015
 convincing-0.015
Token!"
Token ID2474
Feature activation-1.32e-02

Pos logit contributions
ter+0.001
ting+0.001
ning+0.001
ing+0.001
 nearest+0.001
Neg logit contributions
 whine-0.001
 convincing-0.001
 reason-0.001
 know-0.001
 month-0.001
Token Tim
Token ID5045
Feature activation-7.31e-03

Pos logit contributions
 convincing+0.021
 whine+0.021
 age+0.020
 know+0.019
 month+0.019
Neg logit contributions
ing-0.021
ter-0.021
ning-0.021
ting-0.020
 pointer-0.018
Token smiled
Token ID13541
Feature activation-1.06e-02

Pos logit contributions
 whine+0.013
 convincing+0.012
uth+0.012
 know+0.012
 reason+0.012
Neg logit contributions
ting-0.011
ter-0.011
ing-0.011
ning-0.010
it-0.009
Tokenanas
Token ID15991
Feature activation-4.38e-02

Pos logit contributions
 whine+0.139
 hesitation+0.134
 convincing+0.131
 argue+0.123
 reason+0.120
Neg logit contributions
ing-0.150
ter-0.138
ting-0.137
ning-0.131
it-0.128
Token available
Token ID1695
Feature activation-9.35e-03

Pos logit contributions
 whine+0.075
 convincing+0.072
 month+0.071
 know+0.070
 reason+0.069
Neg logit contributions
ting-0.064
ter-0.064
ing-0.063
ning-0.060
 pointer-0.053
Token for
Token ID329
Feature activation-3.54e-02

Pos logit contributions
 whine+0.017
 month+0.016
 convincing+0.016
 reason+0.016
uth+0.016
Neg logit contributions
ting-0.013
ing-0.013
ter-0.013
ning-0.012
 pointer-0.011
Token feeding
Token ID13017
Feature activation-8.13e-03

Pos logit contributions
 whine+0.072
 convincing+0.066
uth+0.065
 hesitation+0.065
 month+0.064
Neg logit contributions
ting-0.045
ter-0.045
ing-0.043
ning-0.041
 that-0.036
Token the
Token ID262
Feature activation+2.45e-02

Pos logit contributions
 whine+0.015
 convincing+0.015
uth+0.014
 month+0.014
 hesitation+0.014
Neg logit contributions
ing-0.011
ting-0.011
ter-0.011
ning-0.010
it-0.009
Token gor
Token ID30344
Feature activation-1.25e-01

Pos logit contributions
ting+0.038
ter+0.036
ing+0.035
ning+0.034
 up+0.033
Neg logit contributions
 whine-0.046
 hesitation-0.041
 argue-0.040
 convincing-0.040
 oblig-0.039
Tokenillas
Token ID25314
Feature activation-2.16e-02

Pos logit contributions
 whine+0.222
 convincing+0.208
 reason+0.205
 hesitation+0.196
 common+0.196
Neg logit contributions
ing-0.219
ting-0.215
ter-0.205
ning-0.198
we-0.182
Token."
Token ID526
Feature activation-1.56e-02

Pos logit contributions
 whine+0.040
 convincing+0.039
uth+0.037
 hesitation+0.037
 reason+0.037
Neg logit contributions
ing-0.031
ting-0.031
ter-0.030
ning-0.028
 up-0.025
Token She
Token ID1375
Feature activation+2.00e-02

Pos logit contributions
 whine+0.025
 convincing+0.024
 know+0.023
 reason+0.023
 month+0.022
Neg logit contributions
ing-0.027
ting-0.026
ter-0.025
ning-0.025
bled-0.023
Token said
Token ID531
Feature activation+1.73e-02

Pos logit contributions
ing+0.032
ting+0.032
ter+0.031
ning+0.030
it+0.027
Neg logit contributions
 whine-0.033
 convincing-0.032
uth-0.031
 month-0.031
 reason-0.031
Token to
Token ID284
Feature activation-4.13e-02

Pos logit contributions
ting+0.026
ning+0.025
ing+0.024
ter+0.024
ini+0.022
Neg logit contributions
 whine-0.030
 convincing-0.030
 hesitation-0.028
 Megan-0.028
 month-0.027
Token soon
Token ID2582
Feature activation-3.18e-02

Pos logit contributions
ing+0.011
ter+0.011
ting+0.011
ning+0.010
 up+0.009
Neg logit contributions
 whine-0.016
 convincing-0.015
uth-0.015
 month-0.015
 reason-0.015
Token nothing
Token ID2147
Feature activation-1.51e-02

Pos logit contributions
 whine+0.060
 convincing+0.058
 hesitation+0.058
 month+0.057
 reason+0.056
Neg logit contributions
ting-0.043
ing-0.042
ter-0.042
ning-0.039
it-0.035
Token but
Token ID475
Feature activation-2.63e-02

Pos logit contributions
 whine+0.026
 convincing+0.025
 reason+0.025
 month+0.024
 hesitation+0.024
Neg logit contributions
ting-0.023
ing-0.023
ter-0.022
ning-0.022
it-0.019
Token a
Token ID257
Feature activation-2.48e-02

Pos logit contributions
 whine+0.045
 convincing+0.045
 month+0.043
 hesitation+0.043
 reason+0.043
Neg logit contributions
ing-0.040
ting-0.040
ter-0.039
ning-0.038
it-0.033
Token small
Token ID1402
Feature activation-1.95e-02

Pos logit contributions
 whine+0.041
 convincing+0.039
uth+0.038
 month+0.037
 argue+0.037
Neg logit contributions
ter-0.039
ing-0.039
ting-0.038
ning-0.036
 floating-0.033
Token spe
Token ID693
Feature activation-1.25e-01

Pos logit contributions
 whine+0.035
 convincing+0.034
uth+0.033
 month+0.033
 hesitation+0.032
Neg logit contributions
ter-0.028
ing-0.028
ting-0.028
ning-0.025
 up-0.023
Tokenck
Token ID694
Feature activation+3.25e-03

Pos logit contributions
 whine+0.191
 age+0.182
 convincing+0.179
 customers+0.178
 hesitation+0.175
Neg logit contributions
ing-0.205
ter-0.204
ting-0.196
ning-0.195
it-0.179
Token in
Token ID287
Feature activation-5.57e-02

Pos logit contributions
ning+0.004
 brand+0.004
ting+0.004
ing+0.004
 Good+0.004
Neg logit contributions
 hesitation-0.006
 reason-0.005
 month-0.005
 whine-0.005
 convincing-0.005
Token the
Token ID262
Feature activation+2.05e-02

Pos logit contributions
 whine+0.117
 convincing+0.113
uth+0.112
 age+0.109
 allowance+0.107
Neg logit contributions
ting-0.063
ter-0.062
ing-0.058
ning-0.056
 up-0.051
Token sky
Token ID6766
Feature activation+8.42e-04

Pos logit contributions
ing+0.031
ting+0.029
ning+0.029
ter+0.028
it+0.025
Neg logit contributions
 whine-0.037
 month-0.036
 reason-0.035
 hesitation-0.034
 convincing-0.034
Token.
Token ID13
Feature activation+1.64e-02

Pos logit contributions
ting+0.001
ing+0.001
ter+0.001
ning+0.001
bled+0.001
Neg logit contributions
 whine-0.001
 convincing-0.001
 month-0.001
 hesitation-0.001
 reason-0.001
Token great
Token ID1049
Feature activation-4.15e-02

Pos logit contributions
 whine+0.015
 convincing+0.014
uth+0.014
 hesitation+0.014
 argue+0.013
Neg logit contributions
ter-0.015
ing-0.015
ting-0.015
ning-0.014
 up-0.013
Token cook
Token ID4255
Feature activation+1.10e-02

Pos logit contributions
 whine+0.068
 convincing+0.065
 hesitation+0.063
uth+0.063
 month+0.060
Neg logit contributions
ing-0.068
ter-0.067
ting-0.065
ning-0.062
 up-0.058
Token."
Token ID526
Feature activation-8.38e-03

Pos logit contributions
ing+0.015
ting+0.015
ter+0.015
ning+0.014
it+0.012
Neg logit contributions
 whine-0.021
 convincing-0.020
uth-0.019
 hesitation-0.019
 month-0.019
Token
Token ID198
Feature activation-3.92e-02

Pos logit contributions
 whine+0.016
 convincing+0.015
 reason+0.015
 month+0.015
 hesitation+0.015
Neg logit contributions
ing-0.012
ting-0.012
ter-0.011
ning-0.010
 up-0.010
Token
Token ID198
Feature activation-3.39e-02

Pos logit contributions
 whine+0.071
 convincing+0.068
 month+0.066
 hesitation+0.066
 reason+0.066
Neg logit contributions
ing-0.058
ting-0.057
ter-0.056
ning-0.053
it-0.048
TokenUn
Token ID3118
Feature activation-1.24e-01

Pos logit contributions
 whine+0.051
 convincing+0.049
 hesitation+0.047
 month+0.047
 reason+0.047
Neg logit contributions
ing-0.060
ting-0.060
ter-0.059
ning-0.056
it-0.052
Tokencle
Token ID2375
Feature activation-6.75e-02

Pos logit contributions
 whine+0.127
 convincing+0.117
 reason+0.112
 hesitation+0.111
uth+0.105
Neg logit contributions
ing-0.286
ting-0.278
ter-0.276
ning-0.266
it-0.252
Token Tom
Token ID4186
Feature activation-1.59e-02

Pos logit contributions
 whine+0.076
uth+0.072
 Megan+0.070
 convincing+0.068
 know+0.066
Neg logit contributions
ting-0.140
ing-0.138
ter-0.133
ning-0.133
bled-0.124
Token took
Token ID1718
Feature activation+4.14e-02

Pos logit contributions
 whine+0.030
 convincing+0.029
uth+0.028
 know+0.028
 reason+0.028
Neg logit contributions
ing-0.021
ting-0.021
ter-0.021
ning-0.020
it-0.017
Token a
Token ID257
Feature activation-1.43e-02

Pos logit contributions
ing+0.048
ting+0.048
ter+0.046
ning+0.043
 up+0.040
Neg logit contributions
 whine-0.088
 convincing-0.085
uth-0.083
 hesitation-0.083
 month-0.082
Token bite
Token ID13197
Feature activation-5.29e-03

Pos logit contributions
uth+0.026
 whine+0.025
 convincing+0.025
 hesitation+0.024
 age+0.023
Neg logit contributions
ing-0.021
 up-0.019
ter-0.018
ting-0.018
 pointer-0.018
Token a
Token ID257
Feature activation-2.06e-02

Pos logit contributions
ing+0.057
ting+0.056
ter+0.056
ning+0.052
it+0.046
Neg logit contributions
 whine-0.083
 convincing-0.079
 month-0.077
uth-0.077
 reason-0.077
Token great
Token ID1049
Feature activation-5.54e-02

Pos logit contributions
 whine+0.033
 convincing+0.031
uth+0.030
 Megan+0.029
 hesitation+0.029
Neg logit contributions
ing-0.033
ter-0.032
ting-0.031
ning-0.031
 pointer-0.029
Token time
Token ID640
Feature activation-6.23e-02

Pos logit contributions
 whine+0.064
 convincing+0.058
 hesitation+0.055
uth+0.055
 Megan+0.053
Neg logit contributions
ing-0.117
ter-0.113
ting-0.111
ning-0.109
 up-0.104
Token using
Token ID1262
Feature activation+5.78e-03

Pos logit contributions
 whine+0.118
 convincing+0.115
 month+0.112
uth+0.111
 hesitation+0.111
Neg logit contributions
ing-0.083
ting-0.083
ter-0.083
ning-0.077
it-0.068
Token their
Token ID511
Feature activation-3.55e-02

Pos logit contributions
ing+0.008
ting+0.008
ter+0.008
ning+0.007
it+0.007
Neg logit contributions
 whine-0.011
 convincing-0.010
 month-0.010
 reason-0.010
 hesitation-0.010
Token imag
Token ID3590
Feature activation-1.24e-01

Pos logit contributions
 whine+0.055
 Megan+0.051
 convincing+0.050
uth+0.050
 month+0.048
Neg logit contributions
ing-0.059
ting-0.056
ter-0.056
ning-0.055
 up-0.052
Tokeninations
Token ID7352
Feature activation-8.90e-03

Pos logit contributions
 whine+0.157
uth+0.132
 reason+0.129
 argue+0.127
 hesitation+0.127
Neg logit contributions
ing-0.263
ting-0.235
ter-0.232
ning-0.232
it-0.229
Token.
Token ID13
Feature activation+2.49e-02

Pos logit contributions
 whine+0.018
 convincing+0.017
uth+0.016
 month+0.016
 reason+0.016
Neg logit contributions
ing-0.012
ting-0.011
ter-0.011
ning-0.010
it-0.009
Token<|endoftext|>
Token ID50256
Feature activation-6.53e-03

Pos logit contributions
ting+0.042
ing+0.042
ter+0.042
ning+0.039
 pointer+0.035
Neg logit contributions
 whine-0.039
 convincing-0.038
 month-0.037
uth-0.037
 reason-0.036
TokenOnce
Token ID7454
Feature activation-7.14e-02

Pos logit contributions
 whine+0.010
 month+0.009
 convincing+0.009
uth+0.009
 reason+0.009
Neg logit contributions
ter-0.011
ting-0.011
ing-0.011
ning-0.011
 pointer-0.010
Token upon
Token ID2402
Feature activation+3.62e-02

Pos logit contributions
 whine+0.129
uth+0.120
 convincing+0.117
 reason+0.116
 hesitation+0.115
Neg logit contributions
ing-0.119
 up-0.102
ter-0.098
 that-0.098
ting-0.093
Token big
Token ID1263
Feature activation-1.06e-02

Pos logit contributions
 whine+0.014
 convincing+0.013
 hesitation+0.013
uth+0.012
 argue+0.012
Neg logit contributions
ter-0.009
ing-0.009
ning-0.009
 up-0.009
ting-0.009
Token cage
Token ID17186
Feature activation+2.41e-03

Pos logit contributions
 whine+0.020
 convincing+0.020
uth+0.019
 hesitation+0.018
 oblig+0.018
Neg logit contributions
ter-0.014
ing-0.013
ting-0.013
ning-0.012
 up-0.012
Token with
Token ID351
Feature activation-2.31e-02

Pos logit contributions
ting+0.004
ing+0.003
ter+0.003
ning+0.003
 pointer+0.003
Neg logit contributions
 whine-0.004
 month-0.004
 convincing-0.004
 reason-0.004
 know-0.004
Token lots
Token ID6041
Feature activation-6.06e-02

Pos logit contributions
 whine+0.044
 month+0.043
 convincing+0.043
 reason+0.042
uth+0.042
Neg logit contributions
ing-0.030
ting-0.029
ter-0.029
ning-0.028
it-0.024
Token of
Token ID286
Feature activation-3.19e-02

Pos logit contributions
 whine+0.116
 convincing+0.107
 argue+0.107
uth+0.105
 reason+0.104
Neg logit contributions
ing-0.089
ting-0.087
ter-0.083
ning-0.076
 up-0.071
Token gor
Token ID30344
Feature activation-1.24e-01

Pos logit contributions
 whine+0.069
 argue+0.065
 convincing+0.065
 oblig+0.063
 protests+0.063
Neg logit contributions
ting-0.035
ter-0.034
 up-0.033
ing-0.033
 that-0.032
Tokenillas
Token ID25314
Feature activation-2.30e-02

Pos logit contributions
 whine+0.227
 reason+0.212
 convincing+0.208
 tant+0.200
 common+0.199
Neg logit contributions
ing-0.210
ting-0.203
ter-0.197
ning-0.192
we-0.176
Token inside
Token ID2641
Feature activation-4.63e-02

Pos logit contributions
 whine+0.043
 convincing+0.041
uth+0.040
 month+0.040
 reason+0.039
Neg logit contributions
ing-0.033
ting-0.032
ter-0.032
ning-0.030
it-0.027
Token!"
Token ID2474
Feature activation-1.28e-02

Pos logit contributions
 whine+0.074
 month+0.073
 convincing+0.071
 reason+0.069
uth+0.069
Neg logit contributions
ter-0.073
ing-0.073
ting-0.072
ning-0.069
 pointer-0.063
Token
Token ID198
Feature activation-3.59e-02

Pos logit contributions
 whine+0.023
 convincing+0.022
 reason+0.021
 hesitation+0.021
uth+0.021
Neg logit contributions
ting-0.020
ing-0.019
ter-0.019
ning-0.017
it-0.016
Token
Token ID198
Feature activation-3.01e-02

Pos logit contributions
 whine+0.084
 hesitation+0.080
 convincing+0.078
 argue+0.076
 protests+0.076
Neg logit contributions
ing-0.036
ting-0.032
it-0.031
 that-0.030
ter-0.029
Token.
Token ID13
Feature activation+1.80e-02

Pos logit contributions
 whine+0.031
 convincing+0.030
 month+0.029
 reason+0.029
 hesitation+0.029
Neg logit contributions
ting-0.026
ter-0.025
ing-0.025
ning-0.024
 pointer-0.021
Token They
Token ID1119
Feature activation+2.08e-02

Pos logit contributions
ing+0.025
ting+0.024
ter+0.023
ning+0.022
it+0.021
Neg logit contributions
 whine-0.034
 convincing-0.032
 hesitation-0.032
 reason-0.032
 argue-0.031
Token had
Token ID550
Feature activation+3.66e-02

Pos logit contributions
ting+0.032
ter+0.032
ing+0.031
ning+0.029
it+0.026
Neg logit contributions
 whine-0.036
 convincing-0.035
 month-0.034
uth-0.034
 reason-0.034
Token used
Token ID973
Feature activation+2.03e-02

Pos logit contributions
ing+0.055
ting+0.055
ter+0.054
ning+0.050
it+0.045
Neg logit contributions
 whine-0.065
 convincing-0.063
 month-0.062
uth-0.061
 reason-0.061
Token their
Token ID511
Feature activation-2.73e-02

Pos logit contributions
ing+0.029
ting+0.028
ter+0.028
ning+0.026
 up+0.023
Neg logit contributions
 whine-0.038
 convincing-0.036
 month-0.035
uth-0.035
 reason-0.035
Token imag
Token ID3590
Feature activation-1.23e-01

Pos logit contributions
 whine+0.056
 convincing+0.053
uth+0.052
 month+0.052
 reason+0.051
Neg logit contributions
ing-0.033
ting-0.032
ter-0.032
ning-0.029
 up-0.027
Tokeninations
Token ID7352
Feature activation-9.62e-03

Pos logit contributions
 whine+0.156
uth+0.132
 reason+0.128
 argue+0.127
 month+0.127
Neg logit contributions
ing-0.259
ting-0.231
ning-0.226
it-0.226
ter-0.224
Token and
Token ID290
Feature activation-1.66e-02

Pos logit contributions
 whine+0.018
 convincing+0.018
 month+0.017
 reason+0.017
 hesitation+0.017
Neg logit contributions
ting-0.013
ter-0.013
ing-0.013
ning-0.012
it-0.011
Token their
Token ID511
Feature activation-2.18e-02

Pos logit contributions
 whine+0.029
 convincing+0.028
 month+0.027
 reason+0.027
uth+0.027
Neg logit contributions
ing-0.025
ting-0.025
ter-0.025
ning-0.023
it-0.021
Token net
Token ID2010
Feature activation+3.31e-02

Pos logit contributions
 whine+0.039
 convincing+0.037
uth+0.036
 month+0.036
 reason+0.036
Neg logit contributions
ing-0.032
ting-0.032
ter-0.031
ning-0.029
 up-0.026
Token to
Token ID284
Feature activation-4.77e-02

Pos logit contributions
ting+0.052
ter+0.050
ing+0.049
ning+0.047
 Good+0.042
Neg logit contributions
 whine-0.055
 convincing-0.054
 reason-0.053
 know-0.053
 hesitation-0.053
Token me
Token ID502
Feature activation-3.27e-02

Pos logit contributions
 whine+0.039
 convincing+0.038
 hesitation+0.037
uth+0.037
 month+0.037
Neg logit contributions
ting-0.017
ing-0.016
ter-0.016
ning-0.015
 up-0.012
Token find
Token ID1064
Feature activation+3.77e-02

Pos logit contributions
 whine+0.050
 know+0.049
 reason+0.047
 convincing+0.046
 argue+0.046
Neg logit contributions
ting-0.056
ing-0.056
ter-0.054
ning-0.053
it-0.048
Token my
Token ID616
Feature activation-8.25e-03

Pos logit contributions
ing+0.055
ter+0.055
ting+0.054
ning+0.052
it+0.047
Neg logit contributions
 whine-0.064
 convincing-0.062
 reason-0.062
uth-0.062
 month-0.061
Token object
Token ID2134
Feature activation+1.97e-02

Pos logit contributions
 whine+0.016
 convincing+0.015
uth+0.015
 oblig+0.015
 month+0.015
Neg logit contributions
ter-0.011
ting-0.010
ing-0.010
ning-0.010
 up-0.010
Token.
Token ID13
Feature activation+4.13e-02

Pos logit contributions
ting+0.033
ing+0.032
ning+0.031
ter+0.031
bled+0.028
Neg logit contributions
 whine-0.030
 month-0.028
 reason-0.028
 age-0.028
 convincing-0.028
Token It
Token ID632
Feature activation+7.90e-02

Pos logit contributions
ing+0.054
ting+0.050
ning+0.049
ter+0.049
bled+0.041
Neg logit contributions
 whine-0.083
 convincing-0.079
 know-0.079
uth-0.078
 month-0.078
Token will
Token ID481
Feature activation+9.70e-03

Pos logit contributions
ting+0.122
ing+0.111
bled+0.108
ter+0.105
pped+0.103
Neg logit contributions
 know-0.134
 reason-0.121
 whine-0.119
 worry-0.119
 convincing-0.116
Token remain
Token ID3520
Feature activation+6.21e-03

Pos logit contributions
ing+0.017
ting+0.017
ter+0.016
ning+0.016
it+0.014
Neg logit contributions
 whine-0.015
 convincing-0.014
 reason-0.014
 month-0.014
 know-0.014
Token my
Token ID616
Feature activation-1.23e-02

Pos logit contributions
ing+0.009
ting+0.008
ter+0.008
ning+0.008
it+0.007
Neg logit contributions
 whine-0.012
 convincing-0.011
 month-0.011
uth-0.011
 reason-0.011
Token favorite
Token ID4004
Feature activation-4.02e-02

Pos logit contributions
 whine+0.018
 convincing+0.017
 month+0.016
uth+0.016
 hesitation+0.016
Neg logit contributions
ter-0.022
ing-0.022
ting-0.022
ning-0.020
 up-0.019
Token toy
Token ID13373
Feature activation+1.78e-02

Pos logit contributions
 whine+0.081
 convincing+0.078
uth+0.075
 hesitation+0.075
 oblig+0.073
Neg logit contributions
ter-0.053
ting-0.052
ing-0.051
ning-0.048
 up-0.045
Token had
Token ID550
Feature activation+5.88e-02

Pos logit contributions
ting+0.050
ing+0.050
ter+0.050
ning+0.046
it+0.041
Neg logit contributions
 whine-0.080
 convincing-0.075
uth-0.074
 know-0.073
 month-0.073
Token a
Token ID257
Feature activation+1.93e-02

Pos logit contributions
ting+0.075
ing+0.074
ter+0.074
ning+0.068
it+0.057
Neg logit contributions
 whine-0.118
 convincing-0.113
 month-0.112
uth-0.111
 reason-0.110
Token new
Token ID649
Feature activation+1.72e-02

Pos logit contributions
ing+0.029
ter+0.029
ting+0.027
ning+0.027
 up+0.027
Neg logit contributions
 whine-0.031
 convincing-0.030
uth-0.029
 hesitation-0.029
 month-0.028
Token toy
Token ID13373
Feature activation+4.81e-02

Pos logit contributions
ing+0.028
ter+0.028
ting+0.027
ning+0.026
 up+0.024
Neg logit contributions
 whine-0.028
 convincing-0.027
uth-0.026
 hesitation-0.026
 month-0.026
Token.
Token ID13
Feature activation+2.40e-02

Pos logit contributions
ting+0.066
ing+0.065
ning+0.062
it+0.058
bled+0.057
Neg logit contributions
 whine-0.086
 age-0.084
 month-0.080
 reason-0.078
 allowance-0.077
Token It
Token ID632
Feature activation+8.60e-02

Pos logit contributions
ing+0.028
ting+0.027
ter+0.026
ning+0.024
bled+0.021
Neg logit contributions
 whine-0.052
 convincing-0.050
 month-0.049
 know-0.049
uth-0.048
Token was
Token ID373
Feature activation+3.59e-02

Pos logit contributions
ting+0.121
ing+0.116
ter+0.106
ning+0.103
ancing+0.094
Neg logit contributions
 whine-0.169
 know-0.161
 convincing-0.158
 month-0.158
 Megan-0.157
Token a
Token ID257
Feature activation+1.80e-02

Pos logit contributions
ting+0.051
ing+0.050
ter+0.048
ning+0.045
it+0.041
Neg logit contributions
 whine-0.068
 convincing-0.065
 month-0.064
 reason-0.063
 hesitation-0.063
Token monster
Token ID9234
Feature activation+2.10e-02

Pos logit contributions
ting+0.031
ing+0.031
ter+0.030
ning+0.029
it+0.026
Neg logit contributions
 whine-0.029
 month-0.027
 convincing-0.027
 reason-0.026
uth-0.026
Token.
Token ID13
Feature activation+2.78e-02

Pos logit contributions
ing+0.031
ting+0.031
bled+0.031
ter+0.030
ning+0.029
Neg logit contributions
 whine-0.035
 age-0.033
 month-0.033
 convincing-0.032
 reason-0.032
Token He
Token ID679
Feature activation+3.82e-02

Pos logit contributions
ting+0.029
ing+0.029
ter+0.028
ning+0.026
it+0.022
Neg logit contributions
 whine-0.062
 convincing-0.060
 month-0.059
uth-0.059
 reason-0.058
Token morning
Token ID3329
Feature activation-1.02e-02

Pos logit contributions
 whine+0.004
 convincing+0.004
 hesitation+0.004
uth+0.004
 reason+0.003
Neg logit contributions
ing-0.006
ter-0.005
ting-0.005
ning-0.005
it-0.005
Token,
Token ID11
Feature activation-1.75e-02

Pos logit contributions
 whine+0.017
 convincing+0.016
 month+0.015
uth+0.015
 reason+0.015
Neg logit contributions
ing-0.017
ting-0.017
ter-0.017
ning-0.016
it-0.014
Token the
Token ID262
Feature activation+5.49e-02

Pos logit contributions
 whine+0.029
 convincing+0.028
 reason+0.027
 hesitation+0.027
 month+0.026
Neg logit contributions
ing-0.028
ting-0.027
ning-0.026
ter-0.026
auc-0.023
Token little
Token ID1310
Feature activation+1.89e-02

Pos logit contributions
ing+0.070
ting+0.069
ter+0.066
ning+0.064
it+0.057
Neg logit contributions
 whine-0.109
 convincing-0.105
 reason-0.103
 month-0.103
 hesitation-0.102
Token bear
Token ID6842
Feature activation+9.72e-03

Pos logit contributions
ting+0.024
ing+0.024
ning+0.022
ter+0.022
it+0.021
Neg logit contributions
 whine-0.035
 convincing-0.034
 reason-0.034
 month-0.033
 hesitation-0.033
Token found
Token ID1043
Feature activation+7.84e-02

Pos logit contributions
ing+0.015
ting+0.015
ter+0.015
ning+0.014
it+0.013
Neg logit contributions
 whine-0.017
 convincing-0.016
 know-0.015
 reason-0.015
 month-0.015
Token that
Token ID326
Feature activation+4.99e-02

Pos logit contributions
ting+0.117
ning+0.117
ing+0.117
we+0.117
auc+0.117
Neg logit contributions
 hesitation-0.113
 know-0.103
 excitement-0.100
 convincing-0.099
 reason-0.098
Token some
Token ID617
Feature activation-6.09e-03

Pos logit contributions
ting+0.068
ing+0.068
ter+0.064
ning+0.062
it+0.055
Neg logit contributions
 whine-0.095
 convincing-0.092
 month-0.091
 reason-0.090
 hesitation-0.089
Token of
Token ID286
Feature activation+9.16e-03

Pos logit contributions
 whine+0.010
 convincing+0.010
 month+0.010
 reason+0.010
 hesitation+0.010
Neg logit contributions
ting-0.009
ing-0.009
ter-0.009
ning-0.009
it-0.008
Token the
Token ID262
Feature activation+4.46e-02

Pos logit contributions
ting+0.012
ing+0.012
ter+0.012
ning+0.011
 up+0.010
Neg logit contributions
 whine-0.018
 convincing-0.017
uth-0.017
 month-0.017
 reason-0.017
Token treats
Token ID18432
Feature activation-2.46e-02

Pos logit contributions
ter+0.061
ing+0.061
ting+0.061
ning+0.056
 up+0.050
Neg logit contributions
 whine-0.084
 convincing-0.080
uth-0.079
 month-0.078
 hesitation-0.077
Token flowers
Token ID12734
Feature activation+1.71e-02

Pos logit contributions
ting+0.010
ing+0.010
ter+0.010
ning+0.009
 up+0.008
Neg logit contributions
 whine-0.012
 convincing-0.011
uth-0.011
 month-0.011
 reason-0.011
Token on
Token ID319
Feature activation-3.01e-02

Pos logit contributions
ting+0.026
ing+0.025
ter+0.025
ning+0.024
 Good+0.022
Neg logit contributions
 whine-0.028
 convincing-0.028
 month-0.027
 hesitation-0.027
 know-0.026
Token them
Token ID606
Feature activation+3.01e-02

Pos logit contributions
 whine+0.062
 convincing+0.059
uth+0.057
 hesitation+0.057
 reason+0.056
Neg logit contributions
ting-0.036
ing-0.036
ter-0.035
ning-0.033
 up-0.031
Token.
Token ID13
Feature activation+2.53e-02

Pos logit contributions
ing+0.035
ting+0.034
ter+0.034
ning+0.031
 up+0.029
Neg logit contributions
 whine-0.063
 convincing-0.061
uth-0.061
 reason-0.059
 month-0.059
Token She
Token ID1375
Feature activation+3.60e-02

Pos logit contributions
ting+0.033
ing+0.033
ter+0.033
ning+0.031
it+0.027
Neg logit contributions
 whine-0.049
 convincing-0.047
 reason-0.046
 month-0.046
 hesitation-0.046
Token picked
Token ID6497
Feature activation+9.89e-02

Pos logit contributions
ing+0.049
ting+0.048
ter+0.048
ning+0.045
 up+0.040
Neg logit contributions
 whine-0.068
 convincing-0.065
 month-0.064
 hesitation-0.064
 reason-0.063
Token out
Token ID503
Feature activation-1.67e-02

Pos logit contributions
ter+0.142
ting+0.140
ing+0.139
ning+0.133
it+0.126
Neg logit contributions
uth-0.156
 month-0.156
 whine-0.155
 convincing-0.155
 hesitation-0.154
Token a
Token ID257
Feature activation+1.05e-02

Pos logit contributions
 whine+0.030
 convincing+0.029
uth+0.029
 month+0.028
 hesitation+0.028
Neg logit contributions
ting-0.024
ter-0.024
ing-0.023
ning-0.022
it-0.020
Token stamp
Token ID17977
Feature activation+4.26e-02

Pos logit contributions
ing+0.015
ter+0.015
ting+0.015
ning+0.014
 up+0.013
Neg logit contributions
 whine-0.019
 convincing-0.018
uth-0.018
 hesitation-0.017
 month-0.017
Token with
Token ID351
Feature activation-2.94e-03

Pos logit contributions
ting+0.059
ter+0.054
ing+0.054
ning+0.052
 Good+0.051
Neg logit contributions
 whine-0.072
 month-0.071
uth-0.070
 hesitation-0.069
 couple-0.069
Token a
Token ID257
Feature activation+1.32e-03

Pos logit contributions
 whine+0.006
 convincing+0.005
 month+0.005
uth+0.005
 hesitation+0.005
Neg logit contributions
ing-0.004
ting-0.004
ter-0.004
ning-0.004
it-0.003
Token something
Token ID1223
Feature activation+2.14e-02

Pos logit contributions
ting+0.090
ing+0.089
ter+0.085
ning+0.083
ancing+0.078
Neg logit contributions
 Megan-0.114
 hesitation-0.114
 whine-0.113
 convincing-0.112
uth-0.110
Token in
Token ID287
Feature activation-2.99e-02

Pos logit contributions
ting+0.038
ter+0.037
ning+0.035
ing+0.035
ancing+0.033
Neg logit contributions
 convincing-0.032
 whine-0.032
 hesitation-0.031
 reason-0.031
 common-0.029
Token the
Token ID262
Feature activation+4.03e-02

Pos logit contributions
 whine+0.060
 convincing+0.058
uth+0.057
 month+0.056
 reason+0.055
Neg logit contributions
ting-0.037
ter-0.037
ing-0.036
ning-0.034
 up-0.031
Token park
Token ID3952
Feature activation+8.99e-03

Pos logit contributions
ing+0.065
ter+0.062
ning+0.060
ting+0.060
bled+0.058
Neg logit contributions
 whine-0.069
 month-0.068
 reason-0.064
 hesitation-0.062
 customers-0.061
Token.
Token ID13
Feature activation+3.21e-02

Pos logit contributions
ting+0.016
ing+0.016
ter+0.015
bled+0.015
ning+0.015
Neg logit contributions
 whine-0.012
 month-0.012
 convincing-0.011
 hesitation-0.011
 reason-0.011
Token It
Token ID632
Feature activation+7.97e-02

Pos logit contributions
ing+0.033
ting+0.032
ter+0.032
ning+0.029
it+0.024
Neg logit contributions
 whine-0.072
 convincing-0.070
 month-0.069
uth-0.068
 reason-0.068
Token was
Token ID373
Feature activation+3.40e-02

Pos logit contributions
ting+0.110
ing+0.105
ter+0.102
ancing+0.096
orting+0.095
Neg logit contributions
 whine-0.151
 convincing-0.147
 Megan-0.144
 reason-0.143
 know-0.142
Token a
Token ID257
Feature activation+6.00e-03

Pos logit contributions
ing+0.046
ting+0.046
ter+0.043
ning+0.041
it+0.037
Neg logit contributions
 whine-0.067
 convincing-0.064
 month-0.063
 hesitation-0.063
 reason-0.063
Token big
Token ID1263
Feature activation+9.25e-03

Pos logit contributions
ting+0.010
ing+0.010
ter+0.009
ning+0.009
it+0.008
Neg logit contributions
 whine-0.010
 convincing-0.010
 month-0.010
 reason-0.010
uth-0.009
Token tree
Token ID5509
Feature activation+6.18e-03

Pos logit contributions
ing+0.014
ting+0.014
ter+0.014
ning+0.013
it+0.012
Neg logit contributions
 whine-0.016
 convincing-0.015
 reason-0.015
 month-0.015
 hesitation-0.015
Token with
Token ID351
Feature activation-2.91e-03

Pos logit contributions
ting+0.010
bled+0.010
ter+0.010
ing+0.009
ning+0.009
Neg logit contributions
 whine-0.010
 month-0.009
 age-0.009
 convincing-0.009
 argue-0.008
Token y
Token ID331
Feature activation-4.57e-02

Pos logit contributions
ing+0.085
ting+0.084
ter+0.082
ning+0.078
it+0.071
Neg logit contributions
 whine-0.099
 convincing-0.095
 month-0.093
 reason-0.093
 hesitation-0.092
Tokenummy
Token ID13513
Feature activation-6.02e-04

Pos logit contributions
 whine+0.083
 month+0.079
 convincing+0.078
uth+0.078
 reason+0.077
Neg logit contributions
ting-0.065
ter-0.064
ing-0.062
ning-0.060
it-0.054
Token fruit
Token ID8234
Feature activation+2.48e-02

Pos logit contributions
 whine+0.001
 convincing+0.001
uth+0.001
 month+0.001
 hesitation+0.001
Neg logit contributions
ing-0.001
ting-0.001
ter-0.001
ning-0.001
 up-0.001
Token and
Token ID290
Feature activation+7.70e-03

Pos logit contributions
ting+0.036
ing+0.035
ter+0.034
ning+0.034
bled+0.030
Neg logit contributions
 month-0.041
 whine-0.041
 convincing-0.041
 Megan-0.039
 reason-0.039
Token the
Token ID262
Feature activation+6.26e-02

Pos logit contributions
ing+0.012
ting+0.012
ter+0.011
ning+0.011
it+0.010
Neg logit contributions
 whine-0.014
 convincing-0.013
 month-0.013
uth-0.013
 reason-0.012
Token pretty
Token ID2495
Feature activation+2.12e-02

Pos logit contributions
ing+0.098
ter+0.098
ting+0.098
ning+0.091
 up+0.082
Neg logit contributions
 whine-0.105
 convincing-0.101
uth-0.098
 month-0.097
 reason-0.096
Token sunset
Token ID26428
Feature activation+1.66e-02

Pos logit contributions
ting+0.032
ing+0.032
ter+0.031
ning+0.029
it+0.026
Neg logit contributions
 whine-0.037
 convincing-0.035
 month-0.035
 reason-0.035
uth-0.035
Token.
Token ID13
Feature activation+3.13e-02

Pos logit contributions
ting+0.025
ter+0.024
ing+0.023
ning+0.022
bled+0.020
Neg logit contributions
 month-0.028
 whine-0.027
 convincing-0.027
uth-0.026
 Megan-0.026
Token<|endoftext|>
Token ID50256
Feature activation+1.52e-02

Pos logit contributions
ting+0.043
ing+0.043
ter+0.043
ning+0.040
it+0.036
Neg logit contributions
 whine-0.058
 convincing-0.056
 reason-0.055
 hesitation-0.054
 month-0.054
TokenOnce
Token ID7454
Feature activation-5.74e-02

Pos logit contributions
ing+0.022
ting+0.022
ter+0.022
ning+0.021
it+0.018
Neg logit contributions
 whine-0.027
 convincing-0.026
 month-0.025
uth-0.025
 reason-0.025
Token upon
Token ID2402
Feature activation+4.94e-02

Pos logit contributions
 whine+0.108
 reason+0.098
uth+0.097
 argue+0.097
 convincing+0.096
Neg logit contributions
ing-0.093
ter-0.081
 up-0.080
it-0.075
 that-0.075
Tokenink
Token ID676
Feature activation-5.91e-05

Pos logit contributions
uth+0.051
 whine+0.051
 convincing+0.050
 common+0.047
 hesitation+0.047
Neg logit contributions
ting-0.050
ter-0.048
ing-0.047
ning-0.044
 Good-0.040
Tokenled
Token ID992
Feature activation+3.26e-02

Pos logit contributions
 hesitation+0.000
 convincing+0.000
 couple+0.000
 whine+0.000
uth+0.000
Neg logit contributions
 Good-0.000
auc-0.000
 How-0.000
ter-0.000
 That-0.000
Token with
Token ID351
Feature activation-1.88e-02

Pos logit contributions
ting+0.047
ing+0.046
ter+0.046
ning+0.046
ray+0.043
Neg logit contributions
 whine-0.054
 convincing-0.052
 hesitation-0.052
 reason-0.051
 month-0.049
Token joy
Token ID8716
Feature activation-4.01e-02

Pos logit contributions
 whine+0.024
 convincing+0.022
uth+0.022
 month+0.022
 reason+0.021
Neg logit contributions
ting-0.037
ing-0.037
ter-0.037
ning-0.035
 up-0.033
Token and
Token ID290
Feature activation-1.39e-02

Pos logit contributions
 hesitation+0.061
 whine+0.061
 convincing+0.059
 age+0.057
 month+0.056
Neg logit contributions
ter-0.058
ting-0.058
ning-0.056
 Good-0.056
auc-0.056
Token he
Token ID339
Feature activation+2.20e-02

Pos logit contributions
 whine+0.022
 convincing+0.021
 month+0.021
uth+0.021
 reason+0.020
Neg logit contributions
ing-0.023
ting-0.023
ter-0.023
ning-0.021
it-0.019
Token thanked
Token ID26280
Feature activation-4.14e-03

Pos logit contributions
ing+0.034
ting+0.034
ter+0.034
ning+0.032
it+0.028
Neg logit contributions
 whine-0.038
 convincing-0.036
 month-0.035
uth-0.035
 reason-0.035
Token L
Token ID406
Feature activation-5.53e-02

Pos logit contributions
 whine+0.007
 convincing+0.007
 month+0.007
uth+0.007
 reason+0.007
Neg logit contributions
ting-0.006
ing-0.006
ter-0.006
ning-0.006
it-0.005
Tokenila
Token ID10102
Feature activation-6.11e-03

Pos logit contributions
uth+0.035
 Megan+0.033
 convincing+0.033
 whine+0.032
 reason+0.031
Neg logit contributions
ting-0.138
ning-0.136
ter-0.135
ing-0.130
 pointer-0.128
Token for
Token ID329
Feature activation-2.67e-02

Pos logit contributions
 whine+0.012
 convincing+0.011
 hesitation+0.011
 month+0.011
uth+0.011
Neg logit contributions
ter-0.008
ting-0.008
ing-0.008
ning-0.007
bled-0.006
Token serving
Token ID7351
Feature activation+2.68e-03

Pos logit contributions
 whine+0.046
 convincing+0.045
 hesitation+0.044
 month+0.042
 reason+0.042
Neg logit contributions
ing-0.043
ting-0.040
ter-0.039
ning-0.039
it-0.035
Token played
Token ID2826
Feature activation+5.25e-02

Pos logit contributions
ing+0.046
ter+0.045
ting+0.045
ning+0.042
 up+0.038
Neg logit contributions
 whine-0.061
 convincing-0.058
 hesitation-0.057
 month-0.057
 reason-0.057
Token on
Token ID319
Feature activation-1.44e-02

Pos logit contributions
ing+0.074
ting+0.073
ter+0.072
ning+0.070
ray+0.066
Neg logit contributions
 convincing-0.094
 whine-0.091
 reason-0.088
 month-0.087
 know-0.085
Token the
Token ID262
Feature activation+6.63e-02

Pos logit contributions
 whine+0.023
 convincing+0.022
 month+0.022
 reason+0.022
uth+0.022
Neg logit contributions
ing-0.024
ter-0.023
ting-0.023
ning-0.022
it-0.020
Token swings
Token ID26728
Feature activation+3.70e-02

Pos logit contributions
ing+0.118
ting+0.110
ter+0.110
ning+0.105
ays+0.104
Neg logit contributions
 month-0.093
 reason-0.092
 whine-0.091
 couple-0.089
 customers-0.086
Token and
Token ID290
Feature activation+1.92e-02

Pos logit contributions
ting+0.055
ter+0.054
ing+0.054
ning+0.051
it+0.046
Neg logit contributions
 whine-0.063
 month-0.062
 convincing-0.060
uth-0.059
 reason-0.058
Token the
Token ID262
Feature activation+6.23e-02

Pos logit contributions
ing+0.028
ter+0.027
ting+0.027
ning+0.026
it+0.023
Neg logit contributions
 whine-0.035
 convincing-0.033
 month-0.033
 reason-0.032
uth-0.032
Token slide
Token ID10649
Feature activation+2.91e-02

Pos logit contributions
ing+0.092
ting+0.092
ter+0.090
ning+0.089
ancing+0.079
Neg logit contributions
 whine-0.107
 reason-0.101
 month-0.101
 convincing-0.099
uth-0.097
Token.
Token ID13
Feature activation+3.21e-02

Pos logit contributions
ter+0.048
ting+0.047
ing+0.045
ning+0.044
 Good+0.042
Neg logit contributions
 month-0.043
 whine-0.043
 convincing-0.040
 Megan-0.040
 age-0.039
Token Then
Token ID3244
Feature activation-2.84e-02

Pos logit contributions
ing+0.055
ter+0.054
ting+0.053
ning+0.051
it+0.046
Neg logit contributions
 whine-0.053
 convincing-0.049
 month-0.048
 age-0.047
uth-0.047
Token,
Token ID11
Feature activation-2.50e-02

Pos logit contributions
 whine+0.053
 convincing+0.051
uth+0.050
 month+0.050
 reason+0.049
Neg logit contributions
ing-0.041
ter-0.040
ting-0.040
ning-0.036
 up-0.035
Token she
Token ID673
Feature activation+2.62e-02

Pos logit contributions
 whine+0.043
 convincing+0.040
 month+0.039
uth+0.039
 reason+0.039
Neg logit contributions
ing-0.039
ting-0.039
ter-0.039
ning-0.037
it-0.033
Token hugged
Token ID41130
Feature activation+3.66e-03

Pos logit contributions
ting+0.008
ing+0.008
ter+0.008
ning+0.008
it+0.007
Neg logit contributions
 whine-0.008
 convincing-0.008
uth-0.007
 month-0.007
 reason-0.007
Token Teddy
Token ID29345
Feature activation-3.01e-02

Pos logit contributions
ing+0.004
ter+0.004
ting+0.004
ning+0.004
it+0.004
Neg logit contributions
 whine-0.008
 convincing-0.007
 reason-0.007
 month-0.007
uth-0.007
Token and
Token ID290
Feature activation-1.68e-02

Pos logit contributions
 whine+0.056
 convincing+0.053
 hesitation+0.052
 month+0.052
 know+0.051
Neg logit contributions
ter-0.040
ting-0.040
ing-0.039
ning-0.039
auc-0.035
Token said
Token ID531
Feature activation+7.26e-03

Pos logit contributions
 whine+0.022
 convincing+0.021
 month+0.020
uth+0.020
 reason+0.020
Neg logit contributions
ing-0.033
ting-0.033
ter-0.032
ning-0.031
it-0.029
Token,
Token ID11
Feature activation-3.02e-02

Pos logit contributions
ting+0.014
ter+0.013
ing+0.013
ning+0.013
pipe+0.012
Neg logit contributions
 whine-0.010
 convincing-0.010
 hesitation-0.009
 month-0.009
uth-0.009
Token "
Token ID366
Feature activation+4.49e-02

Pos logit contributions
 convincing+0.054
 whine+0.053
 age+0.048
 hesitation+0.048
 month+0.048
Neg logit contributions
ing-0.042
ter-0.040
ting-0.040
ning-0.040
auc-0.039
TokenI
Token ID40
Feature activation-1.71e-02

Pos logit contributions
ing+0.057
ting+0.055
ter+0.054
ning+0.053
auc+0.049
Neg logit contributions
 convincing-0.084
 month-0.084
 whine-0.084
 reason-0.082
uth-0.081
Token'm
Token ID1101
Feature activation-2.90e-02

Pos logit contributions
 hesitation+0.025
uth+0.024
 convincing+0.024
 whine+0.024
 month+0.023
Neg logit contributions
ing-0.029
ter-0.027
 up-0.026
 Good-0.026
ting-0.026
Token glad
Token ID9675
Feature activation-1.67e-02

Pos logit contributions
 whine+0.043
 convincing+0.040
 month+0.039
uth+0.039
 reason+0.039
Neg logit contributions
ter-0.052
ing-0.052
ting-0.052
ning-0.049
it-0.045
Token we
Token ID356
Feature activation-1.16e-02

Pos logit contributions
 whine+0.027
 convincing+0.027
 reason+0.025
 hesitation+0.024
 age+0.024
Neg logit contributions
ting-0.025
ning-0.025
ing-0.025
ter-0.025
axe-0.024
Token're
Token ID821
Feature activation-2.91e-02

Pos logit contributions
 whine+0.021
 convincing+0.020
uth+0.020
 month+0.019
 hesitation+0.019
Neg logit contributions
ting-0.018
ing-0.017
ter-0.017
ning-0.016
 up-0.014
Token she
Token ID673
Feature activation+2.21e-02

Pos logit contributions
 whine+0.045
 convincing+0.043
uth+0.042
 month+0.042
 reason+0.041
Neg logit contributions
ing-0.039
ting-0.039
ter-0.039
ning-0.036
it-0.033
Token saw
Token ID2497
Feature activation+7.02e-02

Pos logit contributions
ting+0.038
ing+0.037
ter+0.037
ning+0.035
ancing+0.032
Neg logit contributions
 whine-0.036
 reason-0.033
 month-0.033
 convincing-0.033
 know-0.032
Token a
Token ID257
Feature activation+9.50e-03

Pos logit contributions
ting+0.120
ancing+0.117
ing+0.117
ning+0.112
ter+0.111
Neg logit contributions
 know-0.098
 hesitation-0.093
 whine-0.093
 Megan-0.091
 convincing-0.090
Token big
Token ID1263
Feature activation+9.57e-03

Pos logit contributions
ting+0.017
ing+0.016
ter+0.016
ning+0.015
bled+0.014
Neg logit contributions
 whine-0.014
 reason-0.014
 month-0.014
 convincing-0.014
uth-0.013
Token bird
Token ID6512
Feature activation+2.71e-02

Pos logit contributions
ting+0.015
ing+0.015
ter+0.015
ning+0.014
bled+0.014
Neg logit contributions
 whine-0.016
 month-0.015
 convincing-0.015
 reason-0.014
 hesitation-0.014
Token flying
Token ID7348
Feature activation+5.34e-02

Pos logit contributions
bled+0.039
boards+0.038
ing+0.037
ancing+0.037
ting+0.037
Neg logit contributions
 whine-0.044
 know-0.042
 argue-0.041
 worry-0.040
 convincing-0.040
Token really
Token ID1107
Feature activation-4.75e-02

Pos logit contributions
ting+0.083
ing+0.079
 Good+0.078
ter+0.075
ancing+0.074
Neg logit contributions
 whine-0.085
 month-0.081
 breath-0.079
 argue-0.077
 reason-0.076
Token high
Token ID1029
Feature activation-3.75e-03

Pos logit contributions
 whine+0.049
 convincing+0.045
 know+0.044
 common+0.044
uth+0.044
Neg logit contributions
ting-0.104
ter-0.100
ning-0.099
ing-0.097
 Good-0.091
Token in
Token ID287
Feature activation-3.23e-02

Pos logit contributions
 whine+0.005
 month+0.005
 breath+0.005
 common+0.005
 argue+0.005
Neg logit contributions
 Good-0.006
auc-0.005
bar-0.005
ancing-0.005
ini-0.005
Token the
Token ID262
Feature activation+3.78e-02

Pos logit contributions
 whine+0.052
 month+0.050
 reason+0.049
 convincing+0.049
 know+0.048
Neg logit contributions
ing-0.054
ting-0.052
ter-0.051
ning-0.049
bled-0.045
Token sky
Token ID6766
Feature activation+1.81e-02

Pos logit contributions
ting+0.055
ing+0.055
ter+0.054
ning+0.051
bled+0.050
Neg logit contributions
 month-0.067
 whine-0.065
 reason-0.062
 know-0.062
uth-0.060
Token to
Token ID284
Feature activation-1.88e-02

Pos logit contributions
ter+0.050
ting+0.045
ing+0.044
ning+0.044
it+0.042
Neg logit contributions
 convincing-0.049
 whine-0.049
 hesitation-0.047
 month-0.047
 oblig-0.046
Token bed
Token ID3996
Feature activation-1.56e-02

Pos logit contributions
 whine+0.027
 convincing+0.026
 month+0.025
 know+0.025
 reason+0.025
Neg logit contributions
ing-0.036
ting-0.035
ter-0.035
ning-0.033
it-0.031
Token with
Token ID351
Feature activation-2.13e-02

Pos logit contributions
 whine+0.027
 convincing+0.026
 month+0.025
 hesitation+0.025
 reason+0.025
Neg logit contributions
ting-0.024
ter-0.024
ing-0.023
ning-0.022
it-0.020
Token a
Token ID257
Feature activation-2.10e-02

Pos logit contributions
 whine+0.044
uth+0.042
 convincing+0.042
 month+0.041
 reason+0.041
Neg logit contributions
ing-0.025
ting-0.025
ter-0.024
ning-0.022
 up-0.020
Token smile
Token ID8212
Feature activation-2.74e-02

Pos logit contributions
 whine+0.031
uth+0.031
 convincing+0.030
 month+0.029
 Megan+0.028
Neg logit contributions
ing-0.034
ter-0.032
ting-0.032
ning-0.031
 brand-0.031
Token on
Token ID319
Feature activation-6.20e-02

Pos logit contributions
 whine+0.052
 hesitation+0.051
 month+0.051
 convincing+0.050
 reason+0.048
Neg logit contributions
ting-0.033
ter-0.033
ing-0.032
ning-0.031
rooms-0.029
Token his
Token ID465
Feature activation-4.31e-02

Pos logit contributions
 whine+0.124
 convincing+0.119
uth+0.117
 month+0.117
 reason+0.116
Neg logit contributions
ing-0.078
ting-0.078
ter-0.076
ning-0.071
 up-0.063
Token face
Token ID1986
Feature activation-1.22e-02

Pos logit contributions
 whine+0.069
 convincing+0.066
 month+0.063
uth+0.063
 reason+0.063
Neg logit contributions
ing-0.072
ting-0.072
ter-0.071
ning-0.068
it-0.061
Token.
Token ID13
Feature activation+2.11e-02

Pos logit contributions
 whine+0.021
 convincing+0.021
 hesitation+0.020
 month+0.020
 reason+0.019
Neg logit contributions
ting-0.018
ter-0.018
ing-0.018
ning-0.017
rooms-0.015
Token<|endoftext|>
Token ID50256
Feature activation-4.09e-03

Pos logit contributions
ing+0.034
ting+0.034
ter+0.034
ning+0.032
it+0.028
Neg logit contributions
 whine-0.035
 convincing-0.034
 month-0.033
uth-0.033
 reason-0.032
TokenOnce
Token ID7454
Feature activation-6.62e-02

Pos logit contributions
 month+0.006
 convincing+0.006
 whine+0.006
 Megan+0.006
uth+0.006
Neg logit contributions
ter-0.007
ting-0.007
ing-0.006
ning-0.006
 pointer-0.006
Token way
Token ID835
Feature activation-1.35e-02

Pos logit contributions
ter+0.006
ing+0.006
ting+0.006
ning+0.006
 up+0.005
Neg logit contributions
 whine-0.006
 convincing-0.006
uth-0.006
 month-0.005
 oblig-0.005
Token out
Token ID503
Feature activation-4.06e-02

Pos logit contributions
 whine+0.023
 convincing+0.022
 month+0.022
 reason+0.022
 hesitation+0.022
Neg logit contributions
ting-0.020
ing-0.020
ter-0.020
ning-0.019
it-0.017
Token."
Token ID526
Feature activation+4.55e-04

Pos logit contributions
 whine+0.069
 convincing+0.066
 month+0.065
 reason+0.064
uth+0.064
Neg logit contributions
ing-0.063
ting-0.062
ter-0.062
ning-0.059
it-0.052
Token
Token ID220
Feature activation-3.38e-02

Pos logit contributions
ing+0.001
ting+0.001
ter+0.001
ning+0.001
it+0.001
Neg logit contributions
 whine-0.001
 convincing-0.001
 hesitation-0.001
 reason-0.001
 month-0.001
Token
Token ID198
Feature activation-2.04e-02

Pos logit contributions
 whine+0.058
 convincing+0.056
 month+0.054
uth+0.054
 reason+0.054
Neg logit contributions
ing-0.052
ting-0.052
ter-0.051
ning-0.048
it-0.043
Token
Token ID198
Feature activation-1.64e-02

Pos logit contributions
 whine+0.041
 hesitation+0.040
 convincing+0.039
 month+0.038
 reason+0.038
Neg logit contributions
ing-0.027
ting-0.025
ter-0.025
ning-0.023
it-0.023
TokenThe
Token ID464
Feature activation+4.87e-02

Pos logit contributions
 whine+0.031
 convincing+0.030
 hesitation+0.030
 month+0.029
 reason+0.029
Neg logit contributions
ing-0.022
ting-0.022
ter-0.022
ning-0.020
it-0.019
Token mouse
Token ID10211
Feature activation+1.70e-02

Pos logit contributions
ter+0.083
ing+0.083
ting+0.082
ning+0.077
 up+0.070
Neg logit contributions
 whine-0.076
 convincing-0.072
uth-0.070
 month-0.070
 hesitation-0.069
Token led
Token ID2957
Feature activation+3.92e-02

Pos logit contributions
ing+0.025
ting+0.025
ter+0.024
ning+0.023
it+0.020
Neg logit contributions
 whine-0.031
 convincing-0.029
uth-0.029
 month-0.029
 reason-0.028
Token Liz
Token ID28313
Feature activation+1.26e-02

Pos logit contributions
ing+0.061
ting+0.061
ter+0.060
ning+0.056
it+0.051
Neg logit contributions
 whine-0.067
 convincing-0.064
 month-0.062
uth-0.062
 reason-0.062
Token to
Token ID284
Feature activation-1.71e-02

Pos logit contributions
ter+0.017
ting+0.016
ing+0.016
ning+0.015
rooms+0.014
Neg logit contributions
 whine-0.023
 hesitation-0.022
 convincing-0.022
 know-0.022
uth-0.022
Token.
Token ID13
Feature activation+1.67e-02

Pos logit contributions
 whine+0.069
 convincing+0.065
uth+0.064
 month+0.064
 hesitation+0.063
Neg logit contributions
ing-0.058
ting-0.057
ter-0.056
ning-0.053
 up-0.048
Token "
Token ID366
Feature activation+3.91e-02

Pos logit contributions
ing+0.020
ting+0.020
ter+0.020
ning+0.018
it+0.015
Neg logit contributions
 whine-0.035
 convincing-0.034
 know-0.032
 month-0.032
uth-0.032
TokenDo
Token ID5211
Feature activation+1.27e-02

Pos logit contributions
ing+0.053
ning+0.050
ter+0.048
ting+0.047
 pointer+0.042
Neg logit contributions
 whine-0.076
 month-0.074
 convincing-0.073
 reason-0.072
 know-0.070
Token you
Token ID345
Feature activation-2.83e-02

Pos logit contributions
ting+0.018
ing+0.018
ter+0.017
ning+0.017
pipe+0.015
Neg logit contributions
 whine-0.023
 convincing-0.023
 month-0.022
 reason-0.022
 know-0.022
Token want
Token ID765
Feature activation+1.16e-02

Pos logit contributions
uth+0.036
 whine+0.034
 convincing+0.032
 month+0.032
 Megan+0.031
Neg logit contributions
ter-0.056
ting-0.055
ing-0.054
 up-0.049
ning-0.049
Token to
Token ID284
Feature activation-2.84e-02

Pos logit contributions
ing+0.016
ting+0.016
ter+0.016
ning+0.015
it+0.013
Neg logit contributions
 whine-0.022
 convincing-0.021
 month-0.021
 reason-0.020
uth-0.020
Token go
Token ID467
Feature activation-2.87e-03

Pos logit contributions
 whine+0.052
 convincing+0.050
uth+0.049
 month+0.049
 hesitation+0.048
Neg logit contributions
ter-0.040
ting-0.040
ing-0.040
ning-0.038
 up-0.034
Token for
Token ID329
Feature activation-2.63e-02

Pos logit contributions
 whine+0.005
 convincing+0.005
 month+0.005
 know+0.005
 reason+0.005
Neg logit contributions
ing-0.004
ting-0.004
ter-0.004
ning-0.004
it-0.003
Token a
Token ID257
Feature activation-1.39e-02

Pos logit contributions
 whine+0.060
 convincing+0.058
uth+0.057
 month+0.057
 reason+0.056
Neg logit contributions
ting-0.026
ing-0.025
ter-0.025
ning-0.023
 up-0.019
Token ride
Token ID6594
Feature activation-4.26e-03

Pos logit contributions
 whine+0.025
uth+0.024
 convincing+0.023
 hesitation+0.023
 month+0.022
Neg logit contributions
ing-0.020
ter-0.019
ting-0.019
ning-0.018
 up-0.017
Token?"
Token ID1701
Feature activation-2.84e-02

Pos logit contributions
 whine+0.008
 convincing+0.007
 month+0.007
 reason+0.007
uth+0.007
Neg logit contributions
ting-0.006
ing-0.006
ter-0.006
ning-0.006
it-0.005
Token was
Token ID373
Feature activation+1.91e-02

Pos logit contributions
ting+0.069
ter+0.067
ing+0.066
ning+0.061
ancing+0.055
Neg logit contributions
 whine-0.102
 month-0.098
 convincing-0.097
 know-0.096
 reason-0.094
Token always
Token ID1464
Feature activation-6.71e-03

Pos logit contributions
ing+0.026
ter+0.026
ting+0.026
ning+0.025
 up+0.022
Neg logit contributions
 whine-0.036
 convincing-0.034
uth-0.034
 reason-0.033
 month-0.033
Token shining
Token ID22751
Feature activation+2.17e-02

Pos logit contributions
 whine+0.012
 convincing+0.011
 month+0.011
 hesitation+0.011
 reason+0.011
Neg logit contributions
ting-0.010
ing-0.010
ter-0.010
ning-0.009
it-0.008
Token brightly
Token ID35254
Feature activation-4.90e-02

Pos logit contributions
ting+0.034
ter+0.033
ing+0.032
ning+0.031
bled+0.030
Neg logit contributions
 whine-0.033
 month-0.033
 reason-0.032
 convincing-0.032
 hesitation-0.031
Token.
Token ID13
Feature activation+8.07e-03

Pos logit contributions
 whine+0.082
 month+0.079
 convincing+0.079
uth+0.077
 reason+0.077
Neg logit contributions
ting-0.072
ing-0.072
ter-0.071
ning-0.068
it-0.062
Token
Token ID220
Feature activation-3.99e-02

Pos logit contributions
ing+0.011
ting+0.011
ter+0.011
ning+0.011
 up+0.010
Neg logit contributions
 whine-0.015
 convincing-0.014
 hesitation-0.014
 reason-0.014
 argue-0.014
Token
Token ID198
Feature activation-3.43e-02

Pos logit contributions
 whine+0.071
 convincing+0.067
 month+0.065
uth+0.065
 reason+0.065
Neg logit contributions
ing-0.060
ting-0.059
ter-0.059
ning-0.055
it-0.050
Token
Token ID198
Feature activation-2.98e-02

Pos logit contributions
 whine+0.079
 hesitation+0.074
 argue+0.071
 age+0.071
 convincing+0.071
Neg logit contributions
ing-0.036
ting-0.030
ning-0.029
ter-0.029
it-0.029
TokenBob
Token ID18861
Feature activation-2.56e-02

Pos logit contributions
 whine+0.045
 hesitation+0.041
 convincing+0.040
 age+0.039
 reason+0.039
Neg logit contributions
ing-0.054
ting-0.052
ter-0.052
ning-0.050
it-0.047
Token was
Token ID373
Feature activation+7.30e-03

Pos logit contributions
 whine+0.052
 convincing+0.050
 month+0.048
uth+0.048
 reason+0.048
Neg logit contributions
ing-0.032
ting-0.032
ter-0.031
ning-0.029
it-0.025
Token very
Token ID845
Feature activation-7.31e-02

Pos logit contributions
ing+0.011
ter+0.011
ting+0.011
ning+0.010
 up+0.010
Neg logit contributions
 whine-0.013
 convincing-0.012
 reason-0.012
uth-0.012
 month-0.012
Token.
Token ID13
Feature activation+2.58e-02

Pos logit contributions
 whine+0.014
 convincing+0.014
 hesitation+0.013
 month+0.013
 reason+0.013
Neg logit contributions
ter-0.012
ing-0.012
ting-0.012
ning-0.011
it-0.010
Token
Token ID198
Feature activation-1.67e-02

Pos logit contributions
ing+0.036
ting+0.036
ter+0.036
ning+0.033
 pointer+0.029
Neg logit contributions
 whine-0.048
 convincing-0.047
 month-0.046
uth-0.046
 hesitation-0.044
Token
Token ID198
Feature activation-1.30e-02

Pos logit contributions
 whine+0.030
 convincing+0.028
 reason+0.027
 hesitation+0.027
 month+0.027
Neg logit contributions
ing-0.025
ting-0.024
ter-0.024
ning-0.023
it-0.021
TokenAt
Token ID2953
Feature activation+1.02e-04

Pos logit contributions
 whine+0.023
 convincing+0.022
 month+0.021
uth+0.021
 reason+0.021
Neg logit contributions
ing-0.020
ting-0.020
ter-0.019
ning-0.018
it-0.016
Token the
Token ID262
Feature activation+6.38e-02

Pos logit contributions
ting+0.000
ing+0.000
ter+0.000
ning+0.000
it+0.000
Neg logit contributions
 whine-0.000
 convincing-0.000
 month-0.000
 hesitation-0.000
 reason-0.000
Token end
Token ID886
Feature activation-7.41e-03

Pos logit contributions
ing+0.119
ting+0.117
ter+0.113
ning+0.109
bled+0.103
Neg logit contributions
 month-0.098
 reason-0.092
 whine-0.091
 age-0.086
 hesitation-0.086
Token of
Token ID286
Feature activation-1.23e-02

Pos logit contributions
 whine+0.014
 argue+0.013
uth+0.013
 convincing+0.013
 reason+0.013
Neg logit contributions
ing-0.011
ting-0.010
ter-0.010
ning-0.009
 up-0.009
Token the
Token ID262
Feature activation+4.19e-02

Pos logit contributions
 whine+0.025
uth+0.024
 reason+0.024
 convincing+0.023
 age+0.023
Neg logit contributions
ting-0.015
ter-0.015
ing-0.015
ning-0.013
 up-0.013
Token day
Token ID1110
Feature activation-3.18e-02

Pos logit contributions
ing+0.076
ting+0.075
ter+0.075
ning+0.071
 up+0.066
Neg logit contributions
 whine-0.060
 convincing-0.056
uth-0.055
 hesitation-0.053
 know-0.053
Token,
Token ID11
Feature activation-2.37e-02

Pos logit contributions
 whine+0.055
 convincing+0.051
uth+0.050
 reason+0.050
 argue+0.049
Neg logit contributions
ing-0.051
ting-0.050
ter-0.049
ning-0.046
 up-0.042
Token Henry
Token ID8616
Feature activation-1.61e-02

Pos logit contributions
 whine+0.037
 convincing+0.035
uth+0.034
 month+0.034
 reason+0.034
Neg logit contributions
ing-0.041
ting-0.041
ter-0.040
ning-0.037
 up-0.035
Token.
Token ID13
Feature activation+1.76e-02

Pos logit contributions
 whine+0.031
 convincing+0.030
 month+0.030
 reason+0.030
 age+0.029
Neg logit contributions
ter-0.031
ting-0.030
ning-0.030
ing-0.029
bled-0.028
Token He
Token ID679
Feature activation+3.31e-02

Pos logit contributions
ing+0.020
ting+0.020
ter+0.020
ning+0.018
it+0.016
Neg logit contributions
 whine-0.037
 convincing-0.036
 month-0.035
 reason-0.035
 hesitation-0.035
Token wanted
Token ID2227
Feature activation+3.20e-02

Pos logit contributions
ter+0.052
ing+0.052
ting+0.051
ning+0.048
 up+0.045
Neg logit contributions
 whine-0.054
 month-0.053
 hesitation-0.052
 convincing-0.052
uth-0.051
Token to
Token ID284
Feature activation-1.84e-02

Pos logit contributions
ting+0.029
ing+0.029
ter+0.027
 up+0.025
 that+0.025
Neg logit contributions
 whine-0.074
uth-0.074
 convincing-0.071
azel-0.069
 argue-0.069
Token act
Token ID719
Feature activation+1.68e-02

Pos logit contributions
uth+0.038
 hesitation+0.036
 convincing+0.036
 month+0.035
 whine+0.035
Neg logit contributions
ter-0.021
 up-0.019
 that-0.019
ting-0.018
 laser-0.017
Token too
Token ID1165
Feature activation-7.46e-02

Pos logit contributions
ing+0.025
ting+0.025
ter+0.025
ning+0.023
 up+0.021
Neg logit contributions
 whine-0.030
 convincing-0.028
uth-0.028
 month-0.028
 reason-0.028
Token.
Token ID13
Feature activation+1.71e-02

Pos logit contributions
 whine+0.169
 convincing+0.160
uth+0.159
 oblig+0.159
 hesitation+0.157
Neg logit contributions
ing-0.081
ting-0.070
ter-0.069
 that-0.064
 up-0.064
Token
Token ID198
Feature activation-2.81e-02

Pos logit contributions
ing+0.024
ting+0.022
ter+0.022
ning+0.021
it+0.019
Neg logit contributions
 whine-0.033
 hesitation-0.032
 reason-0.032
 month-0.032
 convincing-0.031
Token
Token ID198
Feature activation-2.33e-02

Pos logit contributions
 whine+0.066
 hesitation+0.063
 argue+0.061
 age+0.061
 convincing+0.060
Neg logit contributions
ing-0.029
 up-0.023
it-0.023
ting-0.023
 that-0.022
TokenTim
Token ID14967
Feature activation+1.93e-02

Pos logit contributions
 whine+0.047
 hesitation+0.046
 convincing+0.045
 reason+0.044
 age+0.044
Neg logit contributions
ing-0.030
ting-0.028
ter-0.027
ning-0.026
it-0.025
Token asked
Token ID1965
Feature activation-5.01e-03

Pos logit contributions
ing+0.033
ting+0.033
ter+0.032
ning+0.031
 up+0.028
Neg logit contributions
 whine-0.030
 convincing-0.029
 month-0.028
uth-0.028
 reason-0.028
Token him
Token ID683
Feature activation+6.29e-03

Pos logit contributions
ter+0.099
we+0.091
ning+0.090
ini+0.088
ancing+0.088
Neg logit contributions
 whine-0.118
 know-0.117
 Megan-0.116
 customers-0.116
 tant-0.116
Token up
Token ID510
Feature activation-1.62e-02

Pos logit contributions
ter+0.004
ting+0.004
ing+0.004
ning+0.004
we+0.003
Neg logit contributions
 whine-0.015
 argue-0.015
 know-0.015
 convincing-0.014
 reason-0.014
Token and
Token ID290
Feature activation+7.57e-03

Pos logit contributions
 whine+0.036
 convincing+0.035
uth+0.034
 month+0.034
 reason+0.034
Neg logit contributions
ing-0.017
ting-0.017
ter-0.016
ning-0.015
 up-0.014
Token they
Token ID484
Feature activation+2.49e-02

Pos logit contributions
ing+0.015
ting+0.014
ter+0.014
ning+0.014
ancing+0.013
Neg logit contributions
 whine-0.011
 convincing-0.010
 know-0.009
 argue-0.009
 reason-0.009
Token were
Token ID547
Feature activation+1.94e-02

Pos logit contributions
ing+0.038
ting+0.037
ter+0.037
ning+0.035
it+0.031
Neg logit contributions
 whine-0.044
 convincing-0.042
 month-0.040
 reason-0.040
 hesitation-0.040
Token very
Token ID845
Feature activation-7.07e-02

Pos logit contributions
ter+0.026
ing+0.025
ting+0.025
ning+0.024
 up+0.023
Neg logit contributions
 whine-0.037
 month-0.036
uth-0.036
 reason-0.035
 common-0.035
Token angry
Token ID7954
Feature activation-2.15e-02

Pos logit contributions
 month+0.114
uth+0.110
 hesitation+0.104
kids+0.102
 Debbie+0.101
Neg logit contributions
ing-0.118
ter-0.113
 up-0.111
 that-0.103
bled-0.102
Token at
Token ID379
Feature activation-1.82e-02

Pos logit contributions
 convincing+0.028
 whine+0.027
 age+0.026
 reason+0.026
 hesitation+0.025
Neg logit contributions
ning-0.037
ter-0.037
orn-0.036
ini-0.036
ting-0.036
Token the
Token ID262
Feature activation+4.54e-02

Pos logit contributions
 whine+0.032
 argue+0.031
 convincing+0.031
 reason+0.030
 know+0.029
Neg logit contributions
ing-0.028
ning-0.027
ting-0.026
ter-0.026
auc-0.024
Token old
Token ID1468
Feature activation+2.08e-02

Pos logit contributions
ing+0.064
ting+0.063
ter+0.063
 up+0.061
ning+0.058
Neg logit contributions
uth-0.081
 whine-0.081
 convincing-0.078
 oblig-0.077
 hesitation-0.077
Token bar
Token ID2318
Feature activation-6.86e-02

Pos logit contributions
ing+0.029
ter+0.028
ting+0.027
 up+0.027
ning+0.026
Neg logit contributions
uth-0.038
 whine-0.038
 convincing-0.037
 hesitation-0.037
aney-0.036
Token knew
Token ID2993
Feature activation-5.35e-03

Pos logit contributions
ing+0.003
ting+0.003
ter+0.003
ning+0.003
it+0.002
Neg logit contributions
 whine-0.004
 convincing-0.004
 month-0.004
uth-0.004
 reason-0.004
Token she
Token ID673
Feature activation+1.26e-02

Pos logit contributions
 whine+0.010
 convincing+0.010
 hesitation+0.010
uth+0.009
 month+0.009
Neg logit contributions
ing-0.007
ting-0.007
ter-0.007
ning-0.007
it-0.006
Token could
Token ID714
Feature activation-1.38e-02

Pos logit contributions
ting+0.020
ing+0.020
ter+0.020
ning+0.019
ancing+0.017
Neg logit contributions
 whine-0.021
 know-0.020
 convincing-0.020
 month-0.020
 reason-0.020
Token eat
Token ID4483
Feature activation+1.28e-02

Pos logit contributions
 whine+0.021
 convincing+0.021
 hesitation+0.020
uth+0.020
 month+0.020
Neg logit contributions
ter-0.023
ting-0.023
ing-0.022
ning-0.021
 up-0.020
Token the
Token ID262
Feature activation+2.74e-02

Pos logit contributions
ing+0.016
ting+0.015
ter+0.015
ning+0.014
it+0.012
Neg logit contributions
 whine-0.026
 convincing-0.025
 month-0.025
uth-0.025
 reason-0.024
Token y
Token ID331
Feature activation-8.85e-02

Pos logit contributions
ing+0.043
 up+0.042
ter+0.041
ting+0.040
ning+0.038
Neg logit contributions
 whine-0.048
 convincing-0.043
uth-0.043
 hesitation-0.043
 oblig-0.043
Tokenummy
Token ID13513
Feature activation-3.37e-02

Pos logit contributions
 whine+0.192
 convincing+0.184
 hesitation+0.180
uth+0.178
 month+0.178
Neg logit contributions
ing-0.102
ting-0.096
ter-0.094
ning-0.088
 up-0.077
Token,
Token ID11
Feature activation-3.73e-02

Pos logit contributions
 whine+0.061
 hesitation+0.057
uth+0.056
 convincing+0.056
 oblig+0.055
Neg logit contributions
ing-0.051
 up-0.049
ting-0.047
ter-0.047
ning-0.044
Token sweet
Token ID6029
Feature activation-4.26e-02

Pos logit contributions
 whine+0.068
uth+0.065
 convincing+0.064
 month+0.063
 reason+0.062
Neg logit contributions
ting-0.052
ter-0.052
ing-0.051
ning-0.047
 up-0.045
Token pastry
Token ID41738
Feature activation-2.97e-02

Pos logit contributions
 whine+0.080
 convincing+0.075
uth+0.074
 hesitation+0.073
 argue+0.073
Neg logit contributions
ing-0.063
ting-0.059
ter-0.058
ning-0.055
 up-0.053
Token.
Token ID13
Feature activation+8.81e-03

Pos logit contributions
 whine+0.056
 convincing+0.053
 month+0.053
 reason+0.052
uth+0.052
Neg logit contributions
ting-0.040
ing-0.040
ter-0.039
ning-0.037
it-0.032
Token-
Token ID12
Feature activation-2.18e-02

Pos logit contributions
 whine+0.021
 month+0.020
 reason+0.019
 convincing+0.019
uth+0.019
Neg logit contributions
ting-0.020
ter-0.018
ning-0.018
ing-0.018
 Good-0.016
Tokencream
Token ID36277
Feature activation+6.82e-03

Pos logit contributions
 month+0.045
 whine+0.045
 know+0.045
uth+0.044
 allowance+0.044
Neg logit contributions
ting-0.021
ter-0.020
ning-0.019
ing-0.018
ancing-0.016
Token.
Token ID13
Feature activation+1.70e-02

Pos logit contributions
ting+0.010
ning+0.010
ter+0.010
ing+0.010
rooms+0.009
Neg logit contributions
 whine-0.011
 month-0.011
 allowance-0.010
 convincing-0.010
 reason-0.010
Token Spot
Token ID15899
Feature activation-9.23e-03

Pos logit contributions
ing+0.025
ting+0.025
ter+0.024
ning+0.023
it+0.020
Neg logit contributions
 whine-0.031
 convincing-0.029
 month-0.029
uth-0.029
 reason-0.028
Token was
Token ID373
Feature activation+8.45e-03

Pos logit contributions
 whine+0.020
 convincing+0.019
 month+0.019
 hesitation+0.019
uth+0.019
Neg logit contributions
ing-0.010
ter-0.010
ting-0.010
ning-0.009
 up-0.008
Token very
Token ID845
Feature activation-7.12e-02

Pos logit contributions
ter+0.011
ting+0.010
ing+0.010
ning+0.010
 up+0.010
Neg logit contributions
 whine-0.016
uth-0.016
 month-0.016
 reason-0.016
 hesitation-0.015
Token happy
Token ID3772
Feature activation-2.22e-02

Pos logit contributions
 hesitation+0.116
uth+0.111
 whine+0.111
 month+0.108
 reason+0.107
Neg logit contributions
ing-0.110
 up-0.106
ter-0.104
 floating-0.098
it-0.095
Token and
Token ID290
Feature activation-1.14e-02

Pos logit contributions
 whine+0.034
 convincing+0.034
 reason+0.032
 age+0.032
 month+0.031
Neg logit contributions
ting-0.036
ter-0.035
ning-0.034
ing-0.034
bled-0.033
Token ate
Token ID15063
Feature activation+3.18e-02

Pos logit contributions
 whine+0.017
 convincing+0.017
 know+0.016
 month+0.015
uth+0.015
Neg logit contributions
ing-0.020
ting-0.020
ter-0.020
ning-0.019
it-0.018
Token it
Token ID340
Feature activation+4.23e-02

Pos logit contributions
ing+0.047
ting+0.045
ter+0.045
ning+0.045
ini+0.040
Neg logit contributions
 whine-0.056
 convincing-0.055
 reason-0.052
 month-0.052
 hesitation-0.052
Token all
Token ID477
Feature activation-1.40e-02

Pos logit contributions
ting+0.059
ter+0.059
ing+0.056
ning+0.055
auc+0.049
Neg logit contributions
 whine-0.076
 convincing-0.075
 month-0.075
 reason-0.072
 argue-0.070
Token girl
Token ID2576
Feature activation-9.14e-03

Pos logit contributions
 whine+0.007
 month+0.007
 reason+0.007
 convincing+0.007
uth+0.007
Neg logit contributions
ting-0.005
ing-0.005
ter-0.005
ning-0.005
it-0.004
Token,
Token ID11
Feature activation-3.70e-02

Pos logit contributions
 whine+0.019
 convincing+0.018
 hesitation+0.018
 reason+0.018
 month+0.018
Neg logit contributions
ing-0.011
ting-0.011
ter-0.011
ning-0.010
 up-0.009
Token Sarah
Token ID10490
Feature activation-1.39e-02

Pos logit contributions
 whine+0.070
 convincing+0.066
uth+0.065
 month+0.065
 reason+0.064
Neg logit contributions
ing-0.052
ting-0.052
ter-0.051
ning-0.047
it-0.042
Token,
Token ID11
Feature activation-3.44e-02

Pos logit contributions
 whine+0.028
 convincing+0.027
 month+0.026
uth+0.026
 reason+0.026
Neg logit contributions
ing-0.018
ting-0.017
ter-0.017
ning-0.016
it-0.014
Token was
Token ID373
Feature activation+3.06e-03

Pos logit contributions
 whine+0.062
 convincing+0.059
 month+0.058
uth+0.057
 reason+0.057
Neg logit contributions
ing-0.050
ting-0.049
ter-0.049
ning-0.046
it-0.041
Token extra
Token ID3131
Feature activation-7.13e-02

Pos logit contributions
ing+0.005
ting+0.005
ter+0.004
ning+0.004
 up+0.004
Neg logit contributions
 whine-0.005
 convincing-0.005
 reason-0.005
uth-0.005
 month-0.005
Token excited
Token ID6568
Feature activation-2.91e-02

Pos logit contributions
 whine+0.111
 argue+0.103
 month+0.102
uth+0.101
 reason+0.101
Neg logit contributions
ing-0.123
ter-0.115
ting-0.114
 up-0.110
ning-0.104
Token.
Token ID13
Feature activation+7.79e-03

Pos logit contributions
 month+0.043
 convincing+0.043
 whine+0.043
 age+0.042
 reason+0.042
Neg logit contributions
ter-0.049
ting-0.048
bled-0.046
ning-0.045
ing-0.044
Token She
Token ID1375
Feature activation+1.60e-02

Pos logit contributions
ing+0.011
ting+0.011
ter+0.010
ning+0.010
it+0.009
Neg logit contributions
 whine-0.015
 convincing-0.014
 month-0.014
uth-0.014
 reason-0.014
Token kept
Token ID4030
Feature activation+8.90e-04

Pos logit contributions
ing+0.025
ting+0.025
ter+0.024
ning+0.023
it+0.021
Neg logit contributions
 whine-0.027
 convincing-0.026
 month-0.025
uth-0.025
 hesitation-0.025
Token saying
Token ID2282
Feature activation+8.60e-03

Pos logit contributions
ing+0.001
ter+0.001
ting+0.001
ning+0.001
it+0.001
Neg logit contributions
 whine-0.002
 convincing-0.002
 month-0.002
 hesitation-0.002
 reason-0.002