TOP ACTIVATIONS
MAX = 0.126

mom my was mad . She grabbed a brush and started
She was so angry . She said , " You must
inside that barn . If she did , the owners would
feeling very boss y . She wanted to do her own
mom had an idea . She got an avocado out of
and started to panic . She loved that v ase very
was worried about him . She took a therm ometer to
a special plant . If she took good care of it
was feeling very excited . She was going to open a
was feeling very ashamed . She was ashamed because her wings
's mom was surprised . She said , " But why
She was so excited . She showed it to her three
His mom looked concerned . She said , " I 'm
Pat was so excited . She had a new jar and
, felt so excited . She was wearing her new pink
ummy was so excited ! She had just got a letter
Mom my was excited . She picked up the orange phone
olly was feeling nervous . She had to feed her stomach
. The mom disagreed . She was worried because the home
Her mom was worried . She gave Lily a pill to

MIN ACTIVATIONS
MIN = -0.174

Sammy would often share av oc ados and other y ummy
they could grow more av oc ados and share them with
began flying around making aer opl ane noises . Jack soon
could see lots of av oc ados that the supplier had
learn more about rh in oc er oses and respect them
sailed for days , exploring unch arted waters until he came
I will buy more av oc ados tomorrow . You can
Lily loved to eat av oc ados , they were her
and forgot to buy av oc ados . Lily was sad
sky to spot more aer opl anes . <|endoftext|> Once upon
She loved to eat av oc ados . One day ,
She loved to eat av oc ados because they were green
She loved to eat av oc ados because they were y
fun . I like aer opl anes ." Mom
didn 't like wet av oc ados , so he decided
my loved to eat av oc ados , they were his
that it produced more av oc ados than ever before .
He had heard about aer opl anes before ; they were
ongo loved to eat av oc ados . One day ,
s pretend to be aer opl anes , ” Jack

INTERVAL 0.051 - 0.126
CONTAINS 11.425%

the park every day . They had a lot of fun
, John was so excited he could barely contain himself .
became a happy place where everyone learned and played together .
After that , Andrew learnt his lesson and never
named Lily . She had a furry t eddy bear that

INTERVAL -0.024 - 0.051
CONTAINS 56.063%

flexible - it could move and bend , and Sam was
They were not hurt , but they were sad .
and laughed . They forgot about their snow man . They
of food . Every day , the musician would eat y
a time , there was a little girl named Lily .

INTERVAL -0.099 - -0.024
CONTAINS 32.351%

knew it was made with love . <|endoftext|> Once upon a
her bear a big hug . He was so soft and
his grandmother 's house . <|endoftext|> Once upon a time ,
shouts at the raven . " Sh oo ,
toys and make up fun games . One day , he

INTERVAL -0.174 - -0.099
CONTAINS 0.160%

played and n apped and nib bled and hid together .
It was covered in sticky cob we bs and had to
to pet the rh in oc eros . It looked so
to the mail man tomorrow ." They said ,
oc ados because they were y ummy and green . One
Token mom
Token ID1995
Feature activation+7.61e-02

Pos logit contributions
 forgot+0.061
 punished+0.055
 hoped+0.045
 managed+0.045
wy+0.044
Neg logit contributions
aps-0.093
apping-0.090
uttering-0.084
apped-0.078
 Today-0.077
Tokenmy
Token ID1820
Feature activation+8.76e-02

Pos logit contributions
 secure+0.113
 colder+0.112
 forgot+0.111
ts+0.109
wy+0.106
Neg logit contributions
apping-0.156
aps-0.147
apped-0.142
elt-0.118
uts-0.116
Token was
Token ID373
Feature activation+4.26e-02

Pos logit contributions
 colder+0.137
 secure+0.135
 matter+0.131
wy+0.128
sie+0.125
Neg logit contributions
apping-0.165
apped-0.158
aps-0.147
uts-0.143
unch-0.129
Token mad
Token ID8805
Feature activation-4.76e-02

Pos logit contributions
 forgot+0.080
 cleaned+0.067
 punished+0.063
 sighed+0.060
ty+0.060
Neg logit contributions
apping-0.099
aps-0.098
uttering-0.088
apped-0.084
 This-0.083
Token.
Token ID13
Feature activation-5.16e-02

Pos logit contributions
aps+0.110
apping+0.101
unch+0.097
uttering+0.097
 This+0.094
Neg logit contributions
 forgot-0.092
 cleaned-0.077
 punished-0.076
 realized-0.074
 felt-0.073
Token She
Token ID1375
Feature activation+1.26e-01

Pos logit contributions
aps+0.101
apping+0.100
uttering+0.098
unch+0.088
ectar+0.085
Neg logit contributions
 forgot-0.112
 saw-0.106
 cleaned-0.106
 felt-0.103
 looked-0.103
Token grabbed
Token ID13176
Feature activation-9.51e-03

Pos logit contributions
 colder+0.178
wy+0.172
sie+0.170
 distance+0.165
 rubble+0.159
Neg logit contributions
apped-0.233
apping-0.232
 This-0.220
aps-0.213
 That-0.203
Token a
Token ID257
Feature activation+4.53e-02

Pos logit contributions
aps+0.019
apping+0.018
 This+0.017
apped+0.016
uttering+0.016
Neg logit contributions
 forgot-0.022
 punished-0.018
 cleaned-0.018
 regretted-0.016
'd-0.016
Token brush
Token ID14093
Feature activation-3.52e-02

Pos logit contributions
 forgot+0.078
 punished+0.066
 cleaned+0.060
 hoped+0.058
 realized+0.056
Neg logit contributions
aps-0.113
apping-0.108
uttering-0.100
 This-0.100
uts-0.094
Token and
Token ID290
Feature activation+4.71e-02

Pos logit contributions
aps+0.081
apping+0.080
uttering+0.072
uts+0.071
 This+0.071
Neg logit contributions
 forgot-0.068
 cleaned-0.060
 punished-0.060
 put-0.052
 realized-0.051
Token started
Token ID2067
Feature activation-1.99e-02

Pos logit contributions
 forgot+0.062
sie+0.056
wy+0.055
'd+0.055
 punished+0.053
Neg logit contributions
aps-0.105
apped-0.099
apping-0.096
 This-0.096
apy-0.096
Token She
Token ID1375
Feature activation+1.18e-01

Pos logit contributions
aps+0.090
apping+0.088
uttering+0.083
unch+0.072
uts+0.070
Neg logit contributions
 forgot-0.148
 cleaned-0.143
 punished-0.136
 put-0.134
 felt-0.132
Token was
Token ID373
Feature activation+3.55e-02

Pos logit contributions
 forgot+0.174
wy+0.172
 colder+0.171
 distance+0.160
 ger+0.151
Neg logit contributions
apped-0.228
apping-0.226
aps-0.221
 This-0.204
uts-0.200
Token so
Token ID523
Feature activation+2.87e-02

Pos logit contributions
 forgot+0.065
 cleaned+0.055
 punished+0.052
 realized+0.048
 sighed+0.047
Neg logit contributions
aps-0.084
apping-0.084
uttering-0.074
 This-0.073
uts-0.073
Token angry
Token ID7954
Feature activation-6.91e-02

Pos logit contributions
 forgot+0.045
 cleaned+0.038
 punished+0.038
 put+0.031
 realized+0.031
Neg logit contributions
aps-0.077
apping-0.076
uttering-0.069
uts-0.069
 This-0.068
Token.
Token ID13
Feature activation-5.63e-02

Pos logit contributions
aps+0.178
apping+0.174
uttering+0.166
apped+0.159
unch+0.155
Neg logit contributions
 forgot-0.111
 punished-0.095
 cleaned-0.094
 put-0.088
 realized-0.086
Token She
Token ID1375
Feature activation+1.25e-01

Pos logit contributions
aps+0.086
apping+0.084
uttering+0.077
unch+0.067
uts+0.065
Neg logit contributions
 forgot-0.153
 cleaned-0.142
 punished-0.138
 put-0.134
 felt-0.133
Token said
Token ID531
Feature activation-1.43e-02

Pos logit contributions
wy+0.179
 colder+0.176
 forgot+0.173
sie+0.155
 happiest+0.154
Neg logit contributions
apping-0.239
 This-0.230
apped-0.228
aps-0.224
uts-0.222
Token,
Token ID11
Feature activation+8.90e-03

Pos logit contributions
aps+0.029
apping+0.028
uttering+0.025
uts+0.025
 This+0.024
Neg logit contributions
 forgot-0.032
 cleaned-0.029
 punished-0.029
 realized-0.026
 put-0.026
Token "
Token ID366
Feature activation-2.03e-02

Pos logit contributions
 forgot+0.021
 cleaned+0.020
 punished+0.020
 put+0.019
 sighed+0.019
Neg logit contributions
aps-0.017
apping-0.015
uttering-0.014
uts-0.013
apped-0.013
TokenYou
Token ID1639
Feature activation+6.94e-02

Pos logit contributions
aps+0.037
apping+0.037
 This+0.033
uts+0.032
uttering+0.032
Neg logit contributions
 forgot-0.047
 cleaned-0.042
 punished-0.041
 locked-0.037
 repaired-0.037
Token must
Token ID1276
Feature activation+2.68e-02

Pos logit contributions
 forgot+0.113
 sighed+0.104
 cleaned+0.097
ty+0.097
 colder+0.096
Neg logit contributions
 This-0.146
aps-0.142
apping-0.140
uts-0.136
 These-0.132
Token inside
Token ID2641
Feature activation-2.84e-02

Pos logit contributions
aps+0.057
apping+0.056
uttering+0.050
 This+0.049
uts+0.049
Neg logit contributions
 forgot-0.064
 cleaned-0.058
 punished-0.057
 sighed-0.051
 realized-0.051
Token that
Token ID326
Feature activation+1.55e-02

Pos logit contributions
aps+0.057
apping+0.056
 This+0.049
uttering+0.049
uts+0.049
Neg logit contributions
 forgot-0.063
 punished-0.056
 cleaned-0.055
 sighed-0.049
 checked-0.049
Token barn
Token ID25203
Feature activation-5.12e-02

Pos logit contributions
 forgot+0.032
 punished+0.028
 cleaned+0.027
 sighed+0.025
ts+0.024
Neg logit contributions
 Today-0.029
 This-0.029
aps-0.028
apping-0.027
uttering-0.026
Token.
Token ID13
Feature activation-6.56e-02

Pos logit contributions
aps+0.093
apping+0.092
 This+0.082
uts+0.082
uttering+0.078
Neg logit contributions
 forgot-0.118
 punished-0.106
 cleaned-0.104
 realized-0.093
 sighed-0.093
Token If
Token ID1002
Feature activation-1.09e-02

Pos logit contributions
apping+0.117
uttering+0.116
aps+0.114
uts+0.101
utes+0.098
Neg logit contributions
 forgot-0.153
 cleaned-0.140
 punished-0.134
 hoped-0.133
 learned-0.130
Token she
Token ID673
Feature activation+1.25e-01

Pos logit contributions
aps+0.021
apping+0.021
uts+0.018
 This+0.018
uttering+0.018
Neg logit contributions
 forgot-0.025
 cleaned-0.023
 punished-0.023
 sighed-0.020
 put-0.020
Token did
Token ID750
Feature activation-1.44e-02

Pos logit contributions
ts+0.141
 forgot+0.135
 mess+0.133
 punished+0.129
 colder+0.127
Neg logit contributions
uts-0.282
apping-0.280
 This-0.278
aps-0.274
apped-0.262
Token,
Token ID11
Feature activation+1.17e-02

Pos logit contributions
aps+0.029
apping+0.028
 This+0.026
uts+0.026
uttering+0.025
Neg logit contributions
 forgot-0.031
 cleaned-0.028
 punished-0.028
 sighed-0.025
 hoped-0.024
Token the
Token ID262
Feature activation+2.92e-02

Pos logit contributions
 forgot+0.028
 punished+0.026
 cleaned+0.025
 sighed+0.022
 checked+0.022
Neg logit contributions
aps-0.020
apping-0.020
 This-0.018
uts-0.017
uttering-0.017
Token owners
Token ID4393
Feature activation+5.21e-02

Pos logit contributions
 forgot+0.062
 cleaned+0.052
 punished+0.051
 sighed+0.048
 replaced+0.046
Neg logit contributions
 Today-0.055
aps-0.053
 This-0.052
apping-0.051
uttering-0.051
Token would
Token ID561
Feature activation+2.09e-02

Pos logit contributions
 forgot+0.064
wy+0.052
 punished+0.052
 colder+0.049
 cleaned+0.048
Neg logit contributions
apping-0.132
aps-0.128
apped-0.123
 This-0.117
uts-0.117
Token feeling
Token ID4203
Feature activation+2.46e-02

Pos logit contributions
 forgot+0.115
 cleaned+0.107
 punished+0.106
 put+0.097
 locked+0.094
Neg logit contributions
aps-0.088
apping-0.086
uts-0.079
 This-0.074
uttering-0.074
Token very
Token ID845
Feature activation+4.01e-02

Pos logit contributions
 forgot+0.046
 cleaned+0.039
 punished+0.038
 sighed+0.034
 checked+0.034
Neg logit contributions
aps-0.058
apping-0.057
 This-0.053
uttering-0.051
apped-0.049
Token boss
Token ID6478
Feature activation-3.05e-02

Pos logit contributions
 forgot+0.066
 punished+0.056
 cleaned+0.056
 put+0.047
 realized+0.047
Neg logit contributions
aps-0.105
apping-0.103
uts-0.095
uttering-0.094
 This-0.092
Tokeny
Token ID88
Feature activation+1.69e-02

Pos logit contributions
aps+0.071
apping+0.066
uttering+0.062
 This+0.062
unch+0.059
Neg logit contributions
 forgot-0.059
 punished-0.052
 cleaned-0.052
 felt-0.047
 regretted-0.046
Token.
Token ID13
Feature activation-4.72e-02

Pos logit contributions
 forgot+0.034
 punished+0.031
 cleaned+0.031
 put+0.028
 bit+0.027
Neg logit contributions
aps-0.036
apping-0.035
uttering-0.033
uts-0.031
unch-0.031
Token She
Token ID1375
Feature activation+1.25e-01

Pos logit contributions
apping+0.084
aps+0.083
uttering+0.078
unch+0.072
apped+0.070
Neg logit contributions
 forgot-0.111
 cleaned-0.107
 put-0.097
 felt-0.097
 locked-0.095
Token wanted
Token ID2227
Feature activation+1.01e-02

Pos logit contributions
 forgot+0.164
 punished+0.156
 cleaned+0.147
 colder+0.143
 nurses+0.139
Neg logit contributions
 This-0.285
aps-0.285
apping-0.284
apped-0.267
uts-0.265
Token to
Token ID284
Feature activation+3.82e-02

Pos logit contributions
 forgot+0.026
 punished+0.023
 cleaned+0.023
 put+0.021
 realized+0.021
Neg logit contributions
aps-0.017
apping-0.017
uttering-0.014
uts-0.014
 This-0.014
Token do
Token ID466
Feature activation-1.82e-02

Pos logit contributions
 forgot+0.062
 punished+0.053
 cleaned+0.052
 sighed+0.045
 realized+0.044
Neg logit contributions
aps-0.100
apping-0.097
 This-0.089
uts-0.089
uttering-0.087
Token her
Token ID607
Feature activation+3.44e-02

Pos logit contributions
aps+0.036
apping+0.034
 This+0.032
uts+0.031
uttering+0.029
Neg logit contributions
 forgot-0.040
 cleaned-0.036
 punished-0.035
 sighed-0.032
 checked-0.031
Token own
Token ID898
Feature activation+4.00e-02

Pos logit contributions
 forgot+0.058
 punished+0.051
 cleaned+0.050
 put+0.045
 locked+0.042
Neg logit contributions
apping-0.088
aps-0.088
uttering-0.080
apped-0.079
uts-0.078
Token mom
Token ID1995
Feature activation+6.62e-02

Pos logit contributions
 forgot+0.088
 punished+0.079
 cleaned+0.078
 realized+0.067
 checked+0.066
Neg logit contributions
aps-0.104
apping-0.101
uttering-0.090
uts-0.089
 This-0.089
Token had
Token ID550
Feature activation+4.58e-02

Pos logit contributions
ts+0.091
 rubble+0.091
 forgot+0.090
 wounds+0.084
 colder+0.084
Neg logit contributions
apping-0.135
aps-0.129
apped-0.126
 This-0.124
elt-0.119
Token an
Token ID281
Feature activation+4.44e-02

Pos logit contributions
 forgot+0.094
 punished+0.079
 cleaned+0.074
ts+0.073
sie+0.072
Neg logit contributions
aps-0.085
 This-0.081
apping-0.079
uts-0.075
uttering-0.072
Token idea
Token ID2126
Feature activation-2.58e-03

Pos logit contributions
 forgot+0.101
 punished+0.078
 cleaned+0.077
 sighed+0.072
 realized+0.071
Neg logit contributions
aps-0.090
apping-0.089
 This-0.082
uttering-0.082
 Today-0.081
Token.
Token ID13
Feature activation-5.29e-02

Pos logit contributions
aps+0.005
apping+0.005
uttering+0.005
apped+0.005
uts+0.004
Neg logit contributions
 forgot-0.006
 cleaned-0.005
 punished-0.005
 put-0.005
 locked-0.005
Token She
Token ID1375
Feature activation+1.25e-01

Pos logit contributions
uttering+0.099
unch+0.099
apping+0.097
uts+0.096
aps+0.092
Neg logit contributions
 forgot-0.105
 saw-0.102
 felt-0.101
 looked-0.099
 realized-0.095
Token got
Token ID1392
Feature activation+5.81e-04

Pos logit contributions
ts+0.164
 punished+0.163
 forgot+0.157
eling+0.156
 rubble+0.151
Neg logit contributions
apped-0.263
apping-0.263
aps-0.263
uts-0.228
uttered-0.226
Token an
Token ID281
Feature activation+3.90e-02

Pos logit contributions
 forgot+0.001
 cleaned+0.001
 punished+0.001
 realized+0.001
 sighed+0.001
Neg logit contributions
aps-0.001
apping-0.001
 This-0.001
uttering-0.001
apped-0.001
Token avocado
Token ID40377
Feature activation-4.62e-03

Pos logit contributions
 forgot+0.082
 punished+0.062
 cleaned+0.061
 hoped+0.056
 sighed+0.056
Neg logit contributions
aps-0.087
apping-0.087
uttering-0.079
 This-0.078
 Today-0.077
Token out
Token ID503
Feature activation+2.24e-03

Pos logit contributions
aps+0.010
apping+0.010
uts+0.009
uttering+0.009
 This+0.009
Neg logit contributions
 forgot-0.010
 punished-0.009
 cleaned-0.008
 sighed-0.007
 hoped-0.007
Token of
Token ID286
Feature activation+1.96e-02

Pos logit contributions
 forgot+0.006
 punished+0.005
 cleaned+0.005
 put+0.005
 realized+0.005
Neg logit contributions
aps-0.004
apping-0.004
uttering-0.003
uts-0.003
 This-0.003
Token and
Token ID290
Feature activation+4.28e-02

Pos logit contributions
aps+0.167
apping+0.159
uttering+0.155
apped+0.149
 This+0.147
Neg logit contributions
 forgot-0.079
 cleaned-0.070
 punished-0.067
 put-0.061
 mess-0.057
Token started
Token ID2067
Feature activation-1.88e-02

Pos logit contributions
wy+0.059
ode+0.053
 forgot+0.053
oscope+0.050
ns+0.049
Neg logit contributions
apping-0.087
 This-0.085
apped-0.085
aps-0.082
 Sorry-0.077
Token to
Token ID284
Feature activation+4.82e-02

Pos logit contributions
aps+0.037
apping+0.036
uttering+0.032
uts+0.032
unch+0.031
Neg logit contributions
 forgot-0.043
 cleaned-0.041
 punished-0.040
 put-0.037
 washed-0.036
Token panic
Token ID13619
Feature activation-5.25e-02

Pos logit contributions
 forgot+0.087
wy+0.079
'd+0.075
ns+0.072
 cleaned+0.071
Neg logit contributions
aps-0.098
 This-0.094
apping-0.092
 Sorry-0.080
uts-0.080
Token.
Token ID13
Feature activation-5.85e-02

Pos logit contributions
aps+0.104
apping+0.101
uttering+0.090
uts+0.088
 This+0.087
Neg logit contributions
 forgot-0.118
 punished-0.109
 cleaned-0.109
 put-0.098
 realized-0.095
Token She
Token ID1375
Feature activation+1.25e-01

Pos logit contributions
aps+0.112
apping+0.112
uttering+0.103
unch+0.094
uts+0.089
Neg logit contributions
 forgot-0.144
 cleaned-0.129
 felt-0.121
 saw-0.121
 punished-0.120
Token loved
Token ID6151
Feature activation-3.53e-02

Pos logit contributions
 distance+0.172
wy+0.168
 colder+0.163
sie+0.160
 richest+0.157
Neg logit contributions
apped-0.224
 This-0.219
apping-0.216
aps-0.209
 That-0.207
Token that
Token ID326
Feature activation+1.79e-02

Pos logit contributions
aps+0.065
apping+0.063
 This+0.058
uts+0.058
uttering+0.055
Neg logit contributions
 forgot-0.080
 punished-0.073
 cleaned-0.071
 sighed-0.064
 locked-0.063
Token v
Token ID410
Feature activation-1.07e-01

Pos logit contributions
 forgot+0.031
 punished+0.027
 cleaned+0.026
 sighed+0.024
wy+0.024
Neg logit contributions
aps-0.044
apping-0.042
 This-0.040
uts-0.038
 Today-0.037
Tokenase
Token ID589
Feature activation-2.69e-02

Pos logit contributions
aps+0.173
apping+0.170
 This+0.151
uttering+0.151
 Today+0.147
Neg logit contributions
 forgot-0.263
 cleaned-0.247
 punished-0.246
 felt-0.215
 washed-0.214
Token very
Token ID845
Feature activation-4.90e-03

Pos logit contributions
aps+0.052
apping+0.052
 This+0.051
uts+0.051
apped+0.045
Neg logit contributions
 forgot-0.048
 punished-0.042
 wished-0.040
 cleaned-0.040
 sighed-0.039
Token was
Token ID373
Feature activation+3.87e-02

Pos logit contributions
 forgot+0.051
 punished+0.046
 cleaned+0.038
ts+0.037
 mess+0.035
Neg logit contributions
aps-0.091
apping-0.090
 This-0.088
uts-0.083
unch-0.083
Token worried
Token ID7960
Feature activation-3.76e-02

Pos logit contributions
 forgot+0.085
 cleaned+0.074
 punished+0.074
 sighed+0.067
 realized+0.067
Neg logit contributions
aps-0.079
apping-0.078
uttering-0.069
 This-0.068
uts-0.066
Token about
Token ID546
Feature activation-1.06e-02

Pos logit contributions
aps+0.085
apping+0.081
uttering+0.076
apped+0.072
uts+0.070
Neg logit contributions
 forgot-0.074
 cleaned-0.069
 punished-0.065
 put-0.062
 felt-0.060
Token him
Token ID683
Feature activation+4.20e-02

Pos logit contributions
aps+0.024
apping+0.023
 This+0.021
uts+0.021
uttering+0.021
Neg logit contributions
 forgot-0.021
 punished-0.018
 cleaned-0.018
 sighed-0.016
 realized-0.016
Token.
Token ID13
Feature activation-5.25e-02

Pos logit contributions
 forgot+0.091
 cleaned+0.083
 punished+0.083
 put+0.074
 locked+0.072
Neg logit contributions
aps-0.088
apping-0.085
uttering-0.076
uts-0.075
apped-0.074
Token She
Token ID1375
Feature activation+1.25e-01

Pos logit contributions
apping+0.112
aps+0.108
uttering+0.108
utes+0.097
ute+0.093
Neg logit contributions
 forgot-0.107
 saw-0.103
 looked-0.102
 cleaned-0.100
 had-0.098
Token took
Token ID1718
Feature activation-1.30e-03

Pos logit contributions
ts+0.178
 rubble+0.173
 distance+0.171
 punished+0.169
 colder+0.166
Neg logit contributions
apping-0.236
apped-0.231
aps-0.223
 This-0.216
 Today-0.214
Token a
Token ID257
Feature activation+4.37e-02

Pos logit contributions
aps+0.003
apping+0.002
 This+0.002
uts+0.002
uttering+0.002
Neg logit contributions
 forgot-0.003
 punished-0.003
 cleaned-0.003
 realized-0.002
 sighed-0.002
Token therm
Token ID21969
Feature activation-5.10e-02

Pos logit contributions
 forgot+0.081
 punished+0.069
 cleaned+0.063
 hoped+0.061
 realized+0.060
Neg logit contributions
aps-0.103
apping-0.099
uttering-0.094
 This-0.093
uts-0.088
Tokenometer
Token ID15635
Feature activation-1.06e-02

Pos logit contributions
aps+0.074
apping+0.072
uttering+0.060
 This+0.060
uts+0.059
Neg logit contributions
 forgot-0.139
 cleaned-0.128
 punished-0.128
 put-0.116
 locked-0.116
Token to
Token ID284
Feature activation+3.39e-02

Pos logit contributions
aps+0.022
apping+0.021
uttering+0.019
uts+0.019
 This+0.018
Neg logit contributions
 forgot-0.023
 cleaned-0.021
 punished-0.021
 put-0.019
 locked-0.018
Token a
Token ID257
Feature activation+4.16e-02

Pos logit contributions
 forgot+0.021
 punished+0.018
 cleaned+0.018
 sighed+0.017
 realized+0.016
Neg logit contributions
aps-0.020
apping-0.020
uttering-0.019
 This-0.018
uts-0.018
Token special
Token ID2041
Feature activation+2.34e-02

Pos logit contributions
 forgot+0.072
 punished+0.062
 cleaned+0.061
 sighed+0.054
 hoped+0.053
Neg logit contributions
aps-0.100
apping-0.098
 This-0.091
uttering-0.091
uts-0.087
Token plant
Token ID4618
Feature activation-1.15e-02

Pos logit contributions
 forgot+0.041
 punished+0.037
 cleaned+0.035
 sighed+0.030
 hoped+0.030
Neg logit contributions
aps-0.058
apping-0.057
uttering-0.052
 This-0.051
uts-0.050
Token.
Token ID13
Feature activation-6.77e-02

Pos logit contributions
aps+0.022
apping+0.022
uts+0.019
 This+0.019
uttering+0.019
Neg logit contributions
 forgot-0.026
 punished-0.023
 cleaned-0.023
 sighed-0.021
 hoped-0.020
Token If
Token ID1002
Feature activation-9.21e-03

Pos logit contributions
aps+0.142
uttering+0.140
apping+0.139
unch+0.136
uts+0.129
Neg logit contributions
 forgot-0.131
 cleaned-0.124
 realized-0.112
 put-0.112
 punished-0.111
Token she
Token ID673
Feature activation+1.25e-01

Pos logit contributions
aps+0.018
apping+0.018
 This+0.016
uttering+0.016
uts+0.016
Neg logit contributions
 forgot-0.020
 punished-0.019
 cleaned-0.018
 checked-0.016
 sighed-0.016
Token took
Token ID1718
Feature activation-1.11e-02

Pos logit contributions
ts+0.139
 forgot+0.133
 colder+0.132
 mess+0.132
 nurses+0.130
Neg logit contributions
apping-0.278
 This-0.275
uts-0.274
aps-0.273
apped-0.262
Token good
Token ID922
Feature activation-2.38e-02

Pos logit contributions
aps+0.023
apping+0.022
 This+0.021
uttering+0.020
uts+0.019
Neg logit contributions
 forgot-0.024
 punished-0.021
 cleaned-0.021
 sighed-0.019
 regretted-0.018
Token care
Token ID1337
Feature activation-1.95e-02

Pos logit contributions
aps+0.070
apping+0.069
uts+0.063
uttering+0.063
 This+0.063
Neg logit contributions
 forgot-0.031
 cleaned-0.026
 punished-0.026
 put-0.020
 realized-0.020
Token of
Token ID286
Feature activation-3.19e-03

Pos logit contributions
aps+0.033
apping+0.032
uts+0.027
uttering+0.027
 This+0.027
Neg logit contributions
 forgot-0.050
 punished-0.046
 cleaned-0.045
 sighed-0.041
 put-0.041
Token it
Token ID340
Feature activation+3.82e-02

Pos logit contributions
aps+0.006
apping+0.006
 This+0.006
uttering+0.006
uts+0.006
Neg logit contributions
 forgot-0.007
 punished-0.006
 cleaned-0.006
 sighed-0.005
 regretted-0.005
Token was
Token ID373
Feature activation+4.82e-02

Pos logit contributions
 forgot+0.119
 punished+0.112
 cleaned+0.095
 nurses+0.092
ts+0.086
Neg logit contributions
aps-0.188
apping-0.180
 This-0.179
uts-0.167
apped-0.166
Token feeling
Token ID4203
Feature activation+2.03e-02

Pos logit contributions
 forgot+0.099
 cleaned+0.093
 punished+0.093
 put+0.084
 locked+0.080
Neg logit contributions
aps-0.101
apping-0.098
uts-0.094
 This-0.087
uttering-0.086
Token very
Token ID845
Feature activation+3.70e-02

Pos logit contributions
 forgot+0.039
 punished+0.033
 cleaned+0.033
 sighed+0.030
 checked+0.030
Neg logit contributions
aps-0.046
apping-0.046
uttering-0.042
 This-0.041
apped-0.040
Token excited
Token ID6568
Feature activation-3.77e-02

Pos logit contributions
 forgot+0.063
 cleaned+0.055
 punished+0.055
 put+0.047
 realized+0.046
Neg logit contributions
aps-0.094
apping-0.092
uts-0.084
uttering-0.083
 This-0.083
Token.
Token ID13
Feature activation-4.87e-02

Pos logit contributions
aps+0.092
apping+0.085
uttering+0.078
apped+0.077
unch+0.076
Neg logit contributions
 forgot-0.068
 cleaned-0.061
 put-0.059
ty-0.056
 punished-0.055
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
apping+0.092
aps+0.091
uttering+0.089
unch+0.084
utes+0.077
Neg logit contributions
 forgot-0.105
 cleaned-0.105
 put-0.097
 looked-0.096
 felt-0.095
Token was
Token ID373
Feature activation+4.27e-02

Pos logit contributions
 punished+0.177
 forgot+0.164
 nurses+0.153
 colder+0.150
 worse+0.143
Neg logit contributions
aps-0.287
apping-0.282
 This-0.270
apped-0.267
uts-0.260
Token going
Token ID1016
Feature activation-9.22e-03

Pos logit contributions
 forgot+0.090
 cleaned+0.081
 punished+0.081
 put+0.072
 locked+0.070
Neg logit contributions
aps-0.092
apping-0.089
uts-0.080
uttering-0.079
 This-0.078
Token to
Token ID284
Feature activation+5.04e-02

Pos logit contributions
aps+0.017
apping+0.017
uts+0.015
uttering+0.014
unch+0.014
Neg logit contributions
 forgot-0.022
 cleaned-0.020
 punished-0.020
 put-0.018
 locked-0.018
Token open
Token ID1280
Feature activation-1.07e-02

Pos logit contributions
 forgot+0.081
 cleaned+0.071
 punished+0.070
 put+0.062
 locked+0.058
Neg logit contributions
aps-0.132
apping-0.131
uttering-0.120
uts-0.118
apped-0.116
Token a
Token ID257
Feature activation+5.55e-02

Pos logit contributions
aps+0.026
apping+0.026
uttering+0.023
uts+0.023
 This+0.023
Neg logit contributions
 forgot-0.019
 punished-0.017
 cleaned-0.017
 put-0.014
 realized-0.014
Token was
Token ID373
Feature activation+3.83e-02

Pos logit contributions
 forgot+0.051
 punished+0.044
 cleaned+0.043
 sighed+0.033
 put+0.033
Neg logit contributions
aps-0.115
apping-0.113
uttering-0.104
 This-0.104
uts-0.104
Token feeling
Token ID4203
Feature activation+1.67e-02

Pos logit contributions
 forgot+0.074
 cleaned+0.067
 punished+0.065
 put+0.061
 locked+0.058
Neg logit contributions
aps-0.087
apping-0.084
uts-0.077
 This-0.075
unch-0.074
Token very
Token ID845
Feature activation+3.04e-02

Pos logit contributions
 forgot+0.031
 punished+0.027
 cleaned+0.026
 sighed+0.024
 checked+0.023
Neg logit contributions
aps-0.039
apping-0.039
uttering-0.035
 This-0.035
uts-0.034
Token ashamed
Token ID22461
Feature activation-5.13e-02

Pos logit contributions
 forgot+0.050
 cleaned+0.044
 punished+0.044
 put+0.036
 realized+0.036
Neg logit contributions
aps-0.078
apping-0.077
uts-0.070
uttering-0.070
 This-0.069
Token.
Token ID13
Feature activation-4.52e-02

Pos logit contributions
aps+0.119
apping+0.111
uttering+0.103
unch+0.102
uts+0.101
Neg logit contributions
 forgot-0.094
 cleaned-0.086
 punished-0.081
 put-0.080
 bit-0.075
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
apping+0.098
uttering+0.097
aps+0.094
iked+0.085
utes+0.084
Neg logit contributions
 looked-0.079
 forgot-0.079
 felt-0.076
 had-0.075
 put-0.075
Token was
Token ID373
Feature activation+4.15e-02

Pos logit contributions
 punished+0.181
 forgot+0.174
 cleaned+0.163
 nurses+0.157
 colder+0.146
Neg logit contributions
aps-0.282
apping-0.273
apped-0.266
uts-0.261
 This-0.259
Token ashamed
Token ID22461
Feature activation-4.55e-02

Pos logit contributions
 forgot+0.071
 cleaned+0.061
 punished+0.061
 sighed+0.053
 realized+0.051
Neg logit contributions
aps-0.103
apping-0.103
uttering-0.094
 This-0.091
uts-0.090
Token because
Token ID780
Feature activation+4.60e-02

Pos logit contributions
aps+0.102
apping+0.099
uttering+0.089
uts+0.088
 This+0.086
Neg logit contributions
 forgot-0.092
 cleaned-0.081
 punished-0.079
 put-0.073
 realized-0.072
Token her
Token ID607
Feature activation+4.63e-02

Pos logit contributions
 forgot+0.095
 cleaned+0.084
 punished+0.083
 put+0.074
 realized+0.074
Neg logit contributions
aps-0.100
apping-0.098
uttering-0.087
uts-0.087
 This-0.086
Token wings
Token ID12098
Feature activation-4.29e-03

Pos logit contributions
 forgot+0.085
 punished+0.076
 cleaned+0.075
 sighed+0.064
 put+0.063
Neg logit contributions
aps-0.110
apping-0.108
uttering-0.098
uts-0.097
 This-0.096
Token's
Token ID338
Feature activation+4.39e-02

Pos logit contributions
 forgot+0.118
 punished+0.109
 nurses+0.097
 cleaned+0.095
ts+0.091
Neg logit contributions
apping-0.216
aps-0.211
apped-0.199
 This-0.193
uts-0.186
Token mom
Token ID1995
Feature activation+6.52e-02

Pos logit contributions
 forgot+0.086
 punished+0.077
 cleaned+0.076
 realized+0.065
 checked+0.065
Neg logit contributions
aps-0.102
apping-0.099
uttering-0.087
apped-0.086
 This-0.086
Token was
Token ID373
Feature activation+4.29e-02

Pos logit contributions
 forgot+0.100
 punished+0.087
 ger+0.082
ts+0.078
 damage+0.077
Neg logit contributions
apping-0.157
aps-0.148
apped-0.144
 This-0.132
unch-0.131
Token surprised
Token ID6655
Feature activation-4.76e-02

Pos logit contributions
 forgot+0.080
 cleaned+0.071
 punished+0.071
 put+0.060
 realized+0.060
Neg logit contributions
aps-0.101
apping-0.100
uttering-0.089
uts-0.089
 This-0.088
Token.
Token ID13
Feature activation-5.67e-02

Pos logit contributions
aps+0.111
apping+0.105
uts+0.098
uttering+0.097
apped+0.095
Neg logit contributions
 forgot-0.091
 cleaned-0.082
 punished-0.080
 put-0.075
 sighed-0.072
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
aps+0.105
apping+0.100
uttering+0.094
ectar+0.088
unch+0.086
Neg logit contributions
 forgot-0.133
 cleaned-0.123
 saw-0.119
 put-0.118
 felt-0.117
Token said
Token ID531
Feature activation-1.60e-02

Pos logit contributions
 forgot+0.186
 punished+0.181
wy+0.177
ts+0.170
 colder+0.170
Neg logit contributions
apping-0.266
aps-0.254
apped-0.245
 This-0.235
uts-0.233
Token,
Token ID11
Feature activation+7.47e-03

Pos logit contributions
aps+0.035
apping+0.033
uttering+0.031
unch+0.030
uts+0.029
Neg logit contributions
 forgot-0.033
 cleaned-0.030
 punished-0.029
 put-0.027
 realized-0.027
Token "
Token ID366
Feature activation-2.00e-02

Pos logit contributions
 forgot+0.016
 cleaned+0.015
 put+0.014
 sighed+0.014
 felt+0.014
Neg logit contributions
aps-0.016
apping-0.015
uttering-0.014
ectar-0.013
unch-0.013
TokenBut
Token ID1537
Feature activation-8.50e-03

Pos logit contributions
aps+0.043
apping+0.042
uttering+0.037
uts+0.037
 This+0.036
Neg logit contributions
 forgot-0.042
 cleaned-0.038
 punished-0.038
 put-0.033
 sighed-0.033
Token why
Token ID1521
Feature activation+3.01e-03

Pos logit contributions
aps+0.017
apping+0.017
 This+0.016
uts+0.014
 Today+0.014
Neg logit contributions
 forgot-0.018
 punished-0.016
 cleaned-0.016
 regretted-0.014
 realized-0.014
Token She
Token ID1375
Feature activation+1.20e-01

Pos logit contributions
aps+0.112
apping+0.110
uttering+0.106
uts+0.099
unch+0.092
Neg logit contributions
 cleaned-0.106
 forgot-0.105
 put-0.096
 washed-0.092
 saw-0.091
Token was
Token ID373
Feature activation+4.00e-02

Pos logit contributions
 punished+0.164
 forgot+0.159
 colder+0.144
 nurses+0.137
ty+0.135
Neg logit contributions
aps-0.305
apping-0.287
apped-0.282
 This-0.268
uts-0.264
Token so
Token ID523
Feature activation+3.18e-02

Pos logit contributions
 forgot+0.065
 cleaned+0.059
 punished+0.058
 put+0.052
 locked+0.050
Neg logit contributions
aps-0.104
apping-0.102
uts-0.094
uttering-0.093
unch-0.091
Token excited
Token ID6568
Feature activation-4.61e-02

Pos logit contributions
 forgot+0.059
 cleaned+0.052
 punished+0.050
 put+0.047
 hoped+0.044
Neg logit contributions
aps-0.076
apping-0.075
 This-0.068
uts-0.068
uttering-0.068
Token.
Token ID13
Feature activation-5.41e-02

Pos logit contributions
aps+0.107
apping+0.102
uttering+0.092
apped+0.092
unch+0.088
Neg logit contributions
 forgot-0.090
 cleaned-0.080
 put-0.076
 punished-0.074
 saw-0.072
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
aps+0.114
apping+0.114
uttering+0.111
uts+0.102
ectar+0.095
Neg logit contributions
 cleaned-0.107
 forgot-0.104
 put-0.101
 saw-0.094
 washed-0.093
Token showed
Token ID3751
Feature activation-4.50e-02

Pos logit contributions
 punished+0.152
 forgot+0.143
 colder+0.141
 worse+0.133
 nurses+0.132
Neg logit contributions
aps-0.310
apping-0.301
apped-0.295
 This-0.280
uts-0.278
Token it
Token ID340
Feature activation+3.88e-02

Pos logit contributions
aps+0.098
apping+0.096
uttering+0.085
uts+0.085
 This+0.085
Neg logit contributions
 forgot-0.093
 punished-0.083
 cleaned-0.082
 realized-0.071
 sighed-0.071
Token to
Token ID284
Feature activation+2.23e-02

Pos logit contributions
 forgot+0.076
 cleaned+0.072
 put+0.068
 punished+0.066
 locked+0.064
Neg logit contributions
apping-0.088
aps-0.085
unch-0.080
uttering-0.077
apped-0.072
Token her
Token ID607
Feature activation+4.04e-02

Pos logit contributions
 forgot+0.043
 cleaned+0.040
 punished+0.038
 put+0.036
 sighed+0.033
Neg logit contributions
aps-0.052
apping-0.052
uttering-0.047
uts-0.047
unch-0.044
Token three
Token ID1115
Feature activation-5.12e-03

Pos logit contributions
 forgot+0.073
 cleaned+0.064
 punished+0.064
 put+0.055
 realized+0.054
Neg logit contributions
aps-0.099
apping-0.098
uttering-0.088
uts-0.088
 This-0.087
Token His
Token ID2399
Feature activation+5.50e-02

Pos logit contributions
uttering+0.144
aps+0.140
apping+0.138
ectar+0.130
unch+0.129
Neg logit contributions
 forgot-0.120
 cleaned-0.119
 put-0.105
 felt-0.105
 realized-0.103
Token mom
Token ID1995
Feature activation+7.06e-02

Pos logit contributions
 forgot+0.099
 punished+0.087
 cleaned+0.087
 realized+0.073
 put+0.073
Neg logit contributions
aps-0.135
apping-0.132
uts-0.119
uttering-0.118
 This-0.118
Token looked
Token ID3114
Feature activation+3.16e-02

Pos logit contributions
 forgot+0.107
 colder+0.102
 ger+0.101
 punished+0.100
wy+0.098
Neg logit contributions
apping-0.164
aps-0.160
apped-0.157
uts-0.135
unch-0.135
Token concerned
Token ID5213
Feature activation-3.46e-02

Pos logit contributions
 forgot+0.090
 punished+0.080
 cleaned+0.079
ty+0.074
ts+0.074
Neg logit contributions
aps-0.040
apping-0.040
uttering-0.034
 This-0.033
apped-0.033
Token.
Token ID13
Feature activation-5.93e-02

Pos logit contributions
aps+0.083
apping+0.080
uttering+0.073
uts+0.072
unch+0.070
Neg logit contributions
 forgot-0.064
 cleaned-0.057
 punished-0.056
 realized-0.050
 put-0.050
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
aps+0.119
apping+0.109
uttering+0.107
uts+0.101
unch+0.100
Neg logit contributions
 forgot-0.127
 cleaned-0.120
 saw-0.117
 realized-0.112
 felt-0.111
Token said
Token ID531
Feature activation-1.43e-02

Pos logit contributions
ts+0.206
 colder+0.203
 ger+0.203
 rubble+0.202
wy+0.200
Neg logit contributions
apped-0.225
apping-0.222
aps-0.197
uttered-0.193
 This-0.186
Token,
Token ID11
Feature activation+8.55e-03

Pos logit contributions
aps+0.032
apping+0.030
uttering+0.029
unch+0.027
uts+0.027
Neg logit contributions
 forgot-0.029
 cleaned-0.026
 punished-0.025
 realized-0.024
 put-0.023
Token "
Token ID366
Feature activation-2.12e-02

Pos logit contributions
 forgot+0.017
 cleaned+0.017
 put+0.016
 sighed+0.015
 felt+0.015
Neg logit contributions
aps-0.020
apping-0.017
uttering-0.017
unch-0.016
ectar-0.015
TokenI
Token ID40
Feature activation+6.87e-02

Pos logit contributions
aps+0.045
apping+0.044
uts+0.040
uttering+0.039
 This+0.039
Neg logit contributions
 forgot-0.045
 punished-0.040
 cleaned-0.040
 put-0.035
 checked-0.035
Token'm
Token ID1101
Feature activation+2.27e-02

Pos logit contributions
ts+0.119
 forgot+0.114
 punished+0.111
wy+0.109
ty+0.107
Neg logit contributions
aps-0.137
apping-0.134
 This-0.128
uts-0.126
 Today-0.121
Token Pat
Token ID3208
Feature activation+7.26e-02

Pos logit contributions
aps+0.099
apping+0.095
uttering+0.093
uts+0.087
apped+0.080
Neg logit contributions
 forgot-0.097
 cleaned-0.094
 put-0.089
 saw-0.083
 checked-0.082
Token was
Token ID373
Feature activation+4.15e-02

Pos logit contributions
 forgot+0.111
 punished+0.109
 cleaned+0.095
 nurses+0.090
 repaired+0.080
Neg logit contributions
apping-0.171
aps-0.170
 This-0.159
uts-0.158
uttering-0.151
Token so
Token ID523
Feature activation+3.47e-02

Pos logit contributions
 forgot+0.076
 cleaned+0.070
 punished+0.068
 put+0.063
 felt+0.060
Neg logit contributions
aps-0.098
apping-0.094
uts-0.090
 This-0.085
uttering-0.084
Token excited
Token ID6568
Feature activation-4.43e-02

Pos logit contributions
 forgot+0.060
 cleaned+0.052
 punished+0.049
 put+0.048
 felt+0.045
Neg logit contributions
aps-0.087
apping-0.083
uts-0.077
 This-0.075
uttering-0.075
Token.
Token ID13
Feature activation-4.88e-02

Pos logit contributions
aps+0.105
apping+0.098
uttering+0.090
apped+0.089
unch+0.086
Neg logit contributions
 forgot-0.085
 cleaned-0.074
 put-0.072
 punished-0.070
 saw-0.068
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
apping+0.113
uttering+0.111
aps+0.108
uts+0.100
unch+0.098
Neg logit contributions
 forgot-0.085
 put-0.074
 felt-0.074
 saw-0.073
 cleaned-0.072
Token had
Token ID550
Feature activation+4.59e-02

Pos logit contributions
 punished+0.163
 forgot+0.155
 cleaned+0.135
 nurses+0.132
wy+0.125
Neg logit contributions
aps-0.314
apping-0.312
 This-0.289
apped-0.287
uts-0.287
Token a
Token ID257
Feature activation+5.03e-02

Pos logit contributions
 forgot+0.061
 punished+0.055
 cleaned+0.051
 sighed+0.042
 nurses+0.042
Neg logit contributions
aps-0.124
apping-0.121
 This-0.116
uts-0.113
uttering-0.112
Token new
Token ID649
Feature activation+3.51e-03

Pos logit contributions
 forgot+0.085
 punished+0.074
 cleaned+0.073
 sighed+0.062
 realized+0.061
Neg logit contributions
aps-0.127
apping-0.125
uttering-0.114
 This-0.112
uts-0.111
Token jar
Token ID17379
Feature activation-1.37e-02

Pos logit contributions
 forgot+0.006
 cleaned+0.005
 punished+0.005
 put+0.004
 locked+0.004
Neg logit contributions
aps-0.009
apping-0.009
uts-0.008
uttering-0.008
 This-0.008
Token and
Token ID290
Feature activation+4.60e-02

Pos logit contributions
aps+0.025
apping+0.025
uttering+0.021
uts+0.021
 This+0.021
Neg logit contributions
 forgot-0.032
 cleaned-0.030
 punished-0.029
 put-0.026
 locked-0.026
Token,
Token ID11
Feature activation+2.20e-02

Pos logit contributions
 forgot+0.158
 punished+0.152
 cleaned+0.138
 nurses+0.130
ts+0.123
Neg logit contributions
aps-0.191
apping-0.185
 This-0.172
apped-0.164
uts-0.157
Token felt
Token ID2936
Feature activation+3.07e-02

Pos logit contributions
 forgot+0.032
 punished+0.029
 cleaned+0.027
 sighed+0.022
 mess+0.022
Neg logit contributions
apping-0.059
aps-0.059
apped-0.053
uttering-0.053
 This-0.052
Token so
Token ID523
Feature activation+3.66e-02

Pos logit contributions
 forgot+0.059
 cleaned+0.051
 punished+0.050
 checked+0.045
 sighed+0.044
Neg logit contributions
aps-0.070
apping-0.070
uttering-0.063
 This-0.063
apped-0.060
Token excited
Token ID6568
Feature activation-3.91e-02

Pos logit contributions
 forgot+0.066
 punished+0.059
 cleaned+0.058
 checked+0.050
 sighed+0.050
Neg logit contributions
aps-0.088
apping-0.086
uttering-0.078
 This-0.078
uts-0.077
Token.
Token ID13
Feature activation-5.16e-02

Pos logit contributions
aps+0.089
apping+0.083
uttering+0.073
apped+0.073
unch+0.072
Neg logit contributions
 forgot-0.079
 cleaned-0.070
 punished-0.066
 put-0.066
 felt-0.064
Token She
Token ID1375
Feature activation+1.24e-01

Pos logit contributions
apping+0.098
aps+0.096
uttering+0.086
unch+0.085
uts+0.085
Neg logit contributions
 forgot-0.107
 cleaned-0.097
 put-0.097
 felt-0.096
 saw-0.096
Token was
Token ID373
Feature activation+4.17e-02

Pos logit contributions
 punished+0.163
 forgot+0.144
 nurses+0.143
wy+0.134
 colder+0.133
Neg logit contributions
aps-0.310
apping-0.301
apped-0.285
 This-0.283
uts-0.276
Token wearing
Token ID5762
Feature activation-3.88e-03

Pos logit contributions
 forgot+0.087
 punished+0.078
 cleaned+0.078
 put+0.068
 realized+0.068
Neg logit contributions
aps-0.090
apping-0.088
uttering-0.078
uts-0.078
 This-0.077
Token her
Token ID607
Feature activation+5.20e-02

Pos logit contributions
aps+0.008
apping+0.008
 This+0.007
uttering+0.007
uts+0.007
Neg logit contributions
 forgot-0.008
 punished-0.008
 cleaned-0.007
 sighed-0.007
 realized-0.007
Token new
Token ID649
Feature activation-9.69e-03

Pos logit contributions
 forgot+0.096
 punished+0.085
 cleaned+0.085
 put+0.072
 realized+0.072
Neg logit contributions
aps-0.124
apping-0.122
uttering-0.109
uts-0.109
 This-0.108
Token pink
Token ID11398
Feature activation-2.83e-02

Pos logit contributions
aps+0.025
apping+0.024
uttering+0.022
uts+0.022
 This+0.022
Neg logit contributions
 forgot-0.016
 punished-0.014
 cleaned-0.014
 put-0.012
 realized-0.012
Tokenummy
Token ID13513
Feature activation+5.92e-02

Pos logit contributions
 forgot+0.018
 cleaned+0.017
 punished+0.016
 put+0.015
 realized+0.015
Neg logit contributions
aps-0.013
apping-0.013
uts-0.011
uttering-0.011
 This-0.010
Token was
Token ID373
Feature activation+4.01e-02

Pos logit contributions
 forgot+0.110
 punished+0.101
 cleaned+0.093
 sighed+0.083
ts+0.082
Neg logit contributions
aps-0.131
apping-0.128
 This-0.121
uts-0.116
apped-0.115
Token so
Token ID523
Feature activation+3.10e-02

Pos logit contributions
 cleaned+0.078
 punished+0.077
 forgot+0.076
 put+0.070
 washed+0.067
Neg logit contributions
aps-0.086
uts-0.083
apping-0.079
 Nice-0.076
 This-0.072
Token excited
Token ID6568
Feature activation-4.05e-02

Pos logit contributions
 forgot+0.053
 cleaned+0.048
 punished+0.045
 put+0.043
 saw+0.040
Neg logit contributions
aps-0.078
apping-0.074
uts-0.071
 This-0.068
 Nice-0.067
Token!
Token ID0
Feature activation-4.55e-02

Pos logit contributions
aps+0.096
apping+0.089
uttering+0.081
apped+0.080
unch+0.077
Neg logit contributions
 forgot-0.078
 cleaned-0.068
 put-0.066
ty-0.064
 saw-0.063
Token She
Token ID1375
Feature activation+1.23e-01

Pos logit contributions
aps+0.090
uttering+0.088
apping+0.087
uts+0.080
utes+0.080
Neg logit contributions
 cleaned-0.083
 forgot-0.080
 put-0.079
 washed-0.074
 looked-0.071
Token had
Token ID550
Feature activation+4.73e-02

Pos logit contributions
 forgot+0.144
 punished+0.142
 cleaned+0.115
 nurses+0.109
 worse+0.101
Neg logit contributions
aps-0.335
apping-0.332
 This-0.312
uts-0.309
apped-0.308
Token just
Token ID655
Feature activation+3.33e-02

Pos logit contributions
 forgot+0.062
 punished+0.054
 cleaned+0.050
 sighed+0.043
 regretted+0.041
Neg logit contributions
aps-0.128
apping-0.124
 This-0.121
uts-0.117
uttering-0.115
Token got
Token ID1392
Feature activation-4.30e-03

Pos logit contributions
 forgot+0.042
 punished+0.039
 cleaned+0.033
ty+0.030
 nurses+0.029
Neg logit contributions
aps-0.088
apping-0.087
 This-0.082
apped-0.082
uttering-0.082
Token a
Token ID257
Feature activation+4.71e-02

Pos logit contributions
aps+0.009
apping+0.009
uttering+0.008
uts+0.008
 This+0.008
Neg logit contributions
 forgot-0.009
 cleaned-0.008
 punished-0.008
 sighed-0.007
 realized-0.007
Token letter
Token ID3850
Feature activation-3.32e-02

Pos logit contributions
 forgot+0.082
 punished+0.071
 cleaned+0.071
 sighed+0.060
 realized+0.059
Neg logit contributions
aps-0.117
apping-0.115
uttering-0.105
 This-0.103
uts-0.103
Token Mom
Token ID11254
Feature activation+5.80e-02

Pos logit contributions
aps+0.129
uttering+0.122
apping+0.121
uts+0.107
ectar+0.107
Neg logit contributions
 forgot-0.139
 cleaned-0.129
 put-0.126
 punished-0.121
 saw-0.120
Tokenmy
Token ID1820
Feature activation+7.81e-02

Pos logit contributions
 forgot+0.082
 punished+0.070
 cleaned+0.067
 realized+0.055
 hoped+0.055
Neg logit contributions
apping-0.157
aps-0.157
 This-0.142
uts-0.141
apped-0.140
Token was
Token ID373
Feature activation+4.24e-02

Pos logit contributions
 forgot+0.107
 punished+0.095
 cleaned+0.086
 mess+0.075
wy+0.074
Neg logit contributions
apping-0.203
aps-0.198
uts-0.183
apped-0.182
 This-0.181
Token excited
Token ID6568
Feature activation-2.89e-02

Pos logit contributions
 forgot+0.085
 cleaned+0.075
 punished+0.075
 put+0.065
 realized+0.065
Neg logit contributions
aps-0.095
apping-0.093
uttering-0.083
uts-0.083
 This-0.082
Token.
Token ID13
Feature activation-5.13e-02

Pos logit contributions
aps+0.063
apping+0.059
uttering+0.053
apped+0.052
uts+0.052
Neg logit contributions
 forgot-0.062
 cleaned-0.056
 punished-0.054
 put-0.052
 felt-0.050
Token She
Token ID1375
Feature activation+1.23e-01

Pos logit contributions
aps+0.093
uttering+0.090
apping+0.090
uts+0.084
unch+0.080
Neg logit contributions
 forgot-0.115
 cleaned-0.111
 put-0.108
 saw-0.104
 felt-0.103
Token picked
Token ID6497
Feature activation-2.70e-02

Pos logit contributions
 punished+0.139
 forgot+0.138
wy+0.122
 colder+0.117
 worse+0.117
Neg logit contributions
apping-0.317
aps-0.312
apped-0.295
 This-0.286
uts-0.281
Token up
Token ID510
Feature activation-2.26e-02

Pos logit contributions
aps+0.063
apping+0.062
uttering+0.055
uts+0.055
unch+0.053
Neg logit contributions
 forgot-0.051
 cleaned-0.047
 punished-0.046
 put-0.041
 realized-0.040
Token the
Token ID262
Feature activation+5.18e-02

Pos logit contributions
aps+0.045
apping+0.042
 This+0.040
uts+0.038
apped+0.038
Neg logit contributions
 forgot-0.051
 punished-0.044
 cleaned-0.042
 regretted-0.038
 remembered-0.038
Token orange
Token ID10912
Feature activation-3.19e-02

Pos logit contributions
 forgot+0.102
 cleaned+0.085
 punished+0.085
 realized+0.075
 hoped+0.074
Neg logit contributions
aps-0.116
apping-0.114
 This-0.107
uttering-0.106
uts-0.102
Token phone
Token ID3072
Feature activation-1.89e-02

Pos logit contributions
aps+0.085
apping+0.084
uttering+0.077
uts+0.074
unch+0.073
Neg logit contributions
 forgot-0.049
 cleaned-0.044
 punished-0.042
 put-0.038
 felt-0.037
Tokenolly
Token ID5098
Feature activation+5.56e-02

Pos logit contributions
 forgot+0.014
 cleaned+0.013
 punished+0.013
 put+0.011
 realized+0.011
Neg logit contributions
aps-0.010
apping-0.009
uttering-0.008
uts-0.008
 This-0.008
Token was
Token ID373
Feature activation+4.25e-02

Pos logit contributions
 forgot+0.098
 punished+0.092
 cleaned+0.085
 nurses+0.075
ts+0.075
Neg logit contributions
aps-0.126
apping-0.123
 This-0.116
uts-0.112
apped-0.111
Token feeling
Token ID4203
Feature activation+1.86e-02

Pos logit contributions
 forgot+0.088
 cleaned+0.082
 punished+0.082
 put+0.074
 washed+0.071
Neg logit contributions
aps-0.089
apping-0.086
uts-0.083
 This-0.076
uttering-0.076
Token nervous
Token ID10927
Feature activation-4.03e-02

Pos logit contributions
 forgot+0.037
 cleaned+0.031
 punished+0.031
ty+0.028
 checked+0.028
Neg logit contributions
aps-0.042
apping-0.041
 This-0.037
uttering-0.036
apped-0.035
Token.
Token ID13
Feature activation-4.84e-02

Pos logit contributions
aps+0.084
apping+0.079
uttering+0.072
uts+0.072
unch+0.071
Neg logit contributions
 forgot-0.087
 cleaned-0.078
 punished-0.077
 put-0.071
 felt-0.069
Token She
Token ID1375
Feature activation+1.23e-01

Pos logit contributions
uttering+0.086
apping+0.083
aps+0.080
unch+0.079
uts+0.078
Neg logit contributions
 forgot-0.106
 felt-0.095
 cleaned-0.094
 put-0.091
 saw-0.089
Token had
Token ID550
Feature activation+4.89e-02

Pos logit contributions
 forgot+0.142
 punished+0.142
 cleaned+0.130
wy+0.114
 repaired+0.113
Neg logit contributions
aps-0.312
apping-0.310
 This-0.293
 Today-0.285
uts-0.283
Token to
Token ID284
Feature activation+5.19e-02

Pos logit contributions
 forgot+0.068
 punished+0.060
 cleaned+0.058
wy+0.053
ts+0.051
Neg logit contributions
aps-0.124
apping-0.121
uts-0.114
 This-0.114
uttering-0.112
Token feed
Token ID3745
Feature activation-6.99e-02

Pos logit contributions
 forgot+0.089
 cleaned+0.075
 punished+0.075
 sighed+0.066
 realized+0.066
Neg logit contributions
aps-0.129
apping-0.125
 This-0.115
uts-0.111
uttering-0.111
Token her
Token ID607
Feature activation+3.78e-02

Pos logit contributions
aps+0.175
apping+0.172
uttering+0.155
uts+0.155
 This+0.153
Neg logit contributions
 forgot-0.121
 cleaned-0.106
 punished-0.106
 put-0.090
 realized-0.088
Token stomach
Token ID11384
Feature activation-5.55e-02

Pos logit contributions
 forgot+0.068
 cleaned+0.060
 punished+0.060
 put+0.053
 felt+0.051
Neg logit contributions
aps-0.092
apping-0.091
uttering-0.082
uts-0.081
 This-0.080
Token.
Token ID13
Feature activation-5.84e-02

Pos logit contributions
aps+0.097
apping+0.095
uts+0.090
uttering+0.089
unch+0.087
Neg logit contributions
 forgot-0.073
 cleaned-0.069
 punished-0.065
 put-0.064
 saw-0.060
Token The
Token ID383
Feature activation+3.65e-02

Pos logit contributions
aps+0.116
apping+0.116
uttering+0.110
uts+0.101
unch+0.098
Neg logit contributions
 forgot-0.129
 cleaned-0.120
 put-0.115
 punished-0.113
 felt-0.110
Token mom
Token ID1995
Feature activation+7.20e-02

Pos logit contributions
 forgot+0.077
 punished+0.071
 cleaned+0.068
ts+0.062
ty+0.061
Neg logit contributions
apping-0.071
aps-0.071
uttering-0.063
 This-0.062
apped-0.062
Token disagreed
Token ID27884
Feature activation-7.88e-02

Pos logit contributions
 forgot+0.119
ts+0.113
 ger+0.111
 rubble+0.109
 punished+0.109
Neg logit contributions
apping-0.140
aps-0.132
apped-0.129
unch-0.118
uts-0.117
Token.
Token ID13
Feature activation-5.70e-02

Pos logit contributions
aps+0.189
apping+0.182
uttering+0.167
uts+0.163
unch+0.161
Neg logit contributions
 forgot-0.147
 cleaned-0.136
 punished-0.131
 put-0.118
 sighed-0.116
Token She
Token ID1375
Feature activation+1.23e-01

Pos logit contributions
apping+0.114
aps+0.111
uttering+0.111
ectar+0.100
unch+0.097
Neg logit contributions
 forgot-0.116
 cleaned-0.115
 put-0.110
 saw-0.109
 felt-0.109
Token was
Token ID373
Feature activation+3.59e-02

Pos logit contributions
 distance+0.207
 rubble+0.201
 colder+0.192
ts+0.190
 forgot+0.187
Neg logit contributions
apped-0.203
apping-0.198
oo-0.178
aps-0.177
 This-0.176
Token worried
Token ID7960
Feature activation-3.91e-02

Pos logit contributions
 forgot+0.063
 cleaned+0.054
 punished+0.053
 realized+0.047
 sighed+0.046
Neg logit contributions
aps-0.088
apping-0.088
uttering-0.079
 This-0.078
uts-0.077
Token because
Token ID780
Feature activation+3.92e-02

Pos logit contributions
aps+0.082
apping+0.079
uttering+0.071
uts+0.070
 This+0.068
Neg logit contributions
 forgot-0.085
 cleaned-0.076
 punished-0.075
 put-0.068
 sighed-0.067
Token the
Token ID262
Feature activation+4.13e-02

Pos logit contributions
 forgot+0.087
 punished+0.078
 cleaned+0.077
 realized+0.067
 sighed+0.067
Neg logit contributions
aps-0.080
apping-0.078
uts-0.069
uttering-0.068
 This-0.068
Token home
Token ID1363
Feature activation-1.39e-02

Pos logit contributions
 forgot+0.082
 punished+0.072
 cleaned+0.070
ts+0.064
ty+0.064
Neg logit contributions
aps-0.085
apping-0.083
 This-0.077
uttering-0.076
uts-0.076
Token Her
Token ID2332
Feature activation+4.80e-02

Pos logit contributions
apping+0.117
uttering+0.116
aps+0.114
uts+0.108
unch+0.106
Neg logit contributions
 forgot-0.133
 cleaned-0.127
 felt-0.117
 put-0.116
 punished-0.115
Token mom
Token ID1995
Feature activation+6.16e-02

Pos logit contributions
 forgot+0.092
 punished+0.083
 cleaned+0.081
 realized+0.070
 sighed+0.070
Neg logit contributions
aps-0.111
apping-0.106
uts-0.094
uttering-0.094
 This-0.093
Token was
Token ID373
Feature activation+3.64e-02

Pos logit contributions
 rubble+0.087
 colder+0.085
ts+0.085
 ger+0.080
 forgot+0.080
Neg logit contributions
apping-0.124
apped-0.114
aps-0.113
 This-0.107
unch-0.101
Token worried
Token ID7960
Feature activation-3.94e-02

Pos logit contributions
 forgot+0.072
 cleaned+0.061
 punished+0.059
 realized+0.055
 sighed+0.054
Neg logit contributions
apping-0.082
aps-0.082
uttering-0.073
 This-0.070
uts-0.068
Token.
Token ID13
Feature activation-5.29e-02

Pos logit contributions
aps+0.082
apping+0.080
uttering+0.071
uts+0.071
 This+0.070
Neg logit contributions
 forgot-0.085
 cleaned-0.077
 punished-0.076
 put-0.067
 sighed-0.067
Token She
Token ID1375
Feature activation+1.23e-01

Pos logit contributions
aps+0.095
apping+0.094
uttering+0.090
unch+0.084
uts+0.084
Neg logit contributions
 forgot-0.128
 cleaned-0.121
 felt-0.111
 punished-0.111
 saw-0.111
Token gave
Token ID2921
Feature activation-3.66e-02

Pos logit contributions
ts+0.191
wy+0.177
 distance+0.176
 rubble+0.175
 colder+0.171
Neg logit contributions
apped-0.190
 Today-0.188
apping-0.187
 This-0.186
aps-0.183
Token Lily
Token ID20037
Feature activation+6.29e-02

Pos logit contributions
aps+0.055
 This+0.051
apping+0.050
apped+0.046
uttering+0.042
Neg logit contributions
 forgot-0.098
 punished-0.084
 cleaned-0.081
 regretted-0.077
 hoped-0.077
Token a
Token ID257
Feature activation+4.71e-02

Pos logit contributions
 forgot+0.162
 punished+0.138
 regretted+0.129
 cleaned+0.129
 realized+0.127
Neg logit contributions
aps-0.105
 This-0.095
apping-0.094
uts-0.082
apped-0.080
Token pill
Token ID9582
Feature activation-3.85e-02

Pos logit contributions
 forgot+0.086
 punished+0.073
 cleaned+0.067
ts+0.065
 hoped+0.063
Neg logit contributions
aps-0.114
apping-0.107
 This-0.100
uttering-0.099
uts-0.094
Token to
Token ID284
Feature activation+2.66e-02

Pos logit contributions
aps+0.074
apping+0.072
uts+0.064
uttering+0.063
 This+0.062
Neg logit contributions
 forgot-0.089
 punished-0.081
 cleaned-0.079
 realized-0.071
 sighed-0.071
Token Sammy
Token ID44067
Feature activation+6.60e-02

Pos logit contributions
 forgot+0.092
 punished+0.084
 cleaned+0.084
 sighed+0.074
 hoped+0.073
Neg logit contributions
aps-0.075
apping-0.073
 This-0.066
uts-0.064
uttering-0.062
Token would
Token ID561
Feature activation+1.17e-02

Pos logit contributions
 punished+0.100
 forgot+0.097
 cleaned+0.088
 nurses+0.088
 colder+0.088
Neg logit contributions
apping-0.137
aps-0.134
 This-0.133
apped-0.132
uts-0.130
Token often
Token ID1690
Feature activation+3.57e-02

Pos logit contributions
 forgot+0.019
 punished+0.017
 cleaned+0.017
 sighed+0.014
 realized+0.014
Neg logit contributions
aps-0.030
apping-0.029
 This-0.027
uts-0.027
uttering-0.026
Token share
Token ID2648
Feature activation-4.05e-02

Pos logit contributions
 forgot+0.061
 punished+0.054
 cleaned+0.053
 sighed+0.046
 realized+0.045
Neg logit contributions
aps-0.088
apping-0.086
 This-0.078
uts-0.077
uttering-0.076
Token av
Token ID1196
Feature activation-9.15e-02

Pos logit contributions
aps+0.093
apping+0.092
uttering+0.082
uts+0.082
 This+0.081
Neg logit contributions
 forgot-0.078
 cleaned-0.069
 punished-0.069
 put-0.060
 realized-0.059
Tokenoc
Token ID420
Feature activation-1.70e-01

Pos logit contributions
aps+0.208
apping+0.206
uttering+0.185
uts+0.181
 This+0.180
Neg logit contributions
 forgot-0.177
 cleaned-0.157
 punished-0.157
 put-0.136
 realized-0.135
Tokenados
Token ID22484
Feature activation-4.11e-02

Pos logit contributions
apping+0.241
aps+0.241
uttering+0.211
 Today+0.202
 This+0.182
Neg logit contributions
 forgot-0.454
 punished-0.448
 cleaned-0.429
 repaired-0.392
 realized-0.392
Token and
Token ID290
Feature activation+6.61e-03

Pos logit contributions
aps+0.067
apping+0.066
uttering+0.058
 This+0.056
apped+0.055
Neg logit contributions
 forgot-0.105
 cleaned-0.096
 punished-0.095
 put-0.087
 realized-0.085
Token other
Token ID584
Feature activation-1.30e-02

Pos logit contributions
 forgot+0.011
 punished+0.011
 cleaned+0.010
 sighed+0.009
 locked+0.009
Neg logit contributions
aps-0.015
apping-0.014
 This-0.014
uts-0.013
apped-0.013
Token y
Token ID331
Feature activation-1.18e-01

Pos logit contributions
aps+0.025
apping+0.025
 This+0.022
uts+0.021
apped+0.021
Neg logit contributions
 forgot-0.028
 cleaned-0.025
 punished-0.025
 locked-0.023
 sighed-0.022
Tokenummy
Token ID13513
Feature activation+8.21e-03

Pos logit contributions
aps+0.124
apping+0.118
 Nice+0.110
 This+0.106
 Yes+0.102
Neg logit contributions
 forgot-0.330
 cleaned-0.320
 punished-0.318
 put-0.293
 mess-0.289
Token they
Token ID484
Feature activation+7.73e-02

Pos logit contributions
 forgot+0.056
 cleaned+0.050
 punished+0.050
 put+0.044
 realized+0.043
Neg logit contributions
aps-0.064
apping-0.063
uttering-0.056
uts-0.056
 This-0.055
Token could
Token ID714
Feature activation-2.67e-03

Pos logit contributions
 punished+0.107
 nurses+0.104
 colder+0.103
 mess+0.100
nie+0.100
Neg logit contributions
aps-0.140
uts-0.137
apped-0.133
owers-0.129
apping-0.129
Token grow
Token ID1663
Feature activation-3.31e-02

Pos logit contributions
aps+0.007
apping+0.007
 This+0.006
uts+0.006
uttering+0.006
Neg logit contributions
 forgot-0.004
 punished-0.004
 cleaned-0.004
 sighed-0.003
 regretted-0.003
Token more
Token ID517
Feature activation-8.99e-03

Pos logit contributions
aps+0.082
apping+0.079
uttering+0.072
uts+0.070
 This+0.069
Neg logit contributions
 forgot-0.060
 punished-0.053
 cleaned-0.051
 sighed-0.045
 checked-0.044
Token av
Token ID1196
Feature activation-9.18e-02

Pos logit contributions
aps+0.023
apping+0.022
uttering+0.020
uts+0.020
 This+0.020
Neg logit contributions
 forgot-0.016
 punished-0.014
 cleaned-0.014
 put-0.011
 sighed-0.011
Tokenoc
Token ID420
Feature activation-1.68e-01

Pos logit contributions
aps+0.197
apping+0.193
uts+0.171
uttering+0.171
 This+0.169
Neg logit contributions
 forgot-0.191
 punished-0.171
 cleaned-0.171
 sighed-0.149
 put-0.148
Tokenados
Token ID22484
Feature activation-3.31e-02

Pos logit contributions
apping+0.246
aps+0.242
uttering+0.211
 Today+0.209
 Nice+0.187
Neg logit contributions
 forgot-0.426
 punished-0.420
 cleaned-0.411
ts-0.379
 repaired-0.375
Token and
Token ID290
Feature activation+9.71e-03

Pos logit contributions
aps+0.067
apping+0.066
uts+0.059
uttering+0.058
 This+0.057
Neg logit contributions
 forgot-0.072
 punished-0.065
 cleaned-0.065
 sighed-0.058
 realized-0.057
Token share
Token ID2648
Feature activation-5.13e-02

Pos logit contributions
 forgot+0.016
 punished+0.014
 cleaned+0.014
 sighed+0.012
ty+0.012
Neg logit contributions
aps-0.024
apping-0.023
 This-0.021
uts-0.021
uttering-0.020
Token them
Token ID606
Feature activation+1.43e-02

Pos logit contributions
apping+0.123
aps+0.123
uttering+0.112
uts+0.108
apped+0.108
Neg logit contributions
 forgot-0.092
 cleaned-0.083
 punished-0.081
 put-0.074
 realized-0.070
Token with
Token ID351
Feature activation-1.36e-02

Pos logit contributions
 forgot+0.034
 cleaned+0.032
 punished+0.031
 put+0.029
 felt+0.028
Neg logit contributions
apping-0.025
aps-0.024
uttering-0.022
 This-0.021
unch-0.021
Token began
Token ID2540
Feature activation-4.22e-02

Pos logit contributions
 forgot+0.044
 punished+0.039
 cleaned+0.039
 sighed+0.032
 realized+0.032
Neg logit contributions
aps-0.067
apping-0.066
uts-0.060
uttering-0.059
 This-0.059
Token flying
Token ID7348
Feature activation-7.11e-02

Pos logit contributions
apping+0.086
aps+0.085
unch+0.078
uttering+0.076
apped+0.076
Neg logit contributions
 forgot-0.086
 cleaned-0.078
 punished-0.077
 put-0.077
 felt-0.074
Token around
Token ID1088
Feature activation-3.00e-02

Pos logit contributions
aps+0.151
apping+0.150
unch+0.134
uts+0.133
uttering+0.132
Neg logit contributions
 forgot-0.145
 cleaned-0.131
 punished-0.129
 put-0.118
 felt-0.117
Token making
Token ID1642
Feature activation-2.77e-02

Pos logit contributions
aps+0.076
apping+0.076
uts+0.069
unch+0.069
uttering+0.069
Neg logit contributions
 forgot-0.047
 cleaned-0.043
 punished-0.041
 put-0.039
 saw-0.038
Token aer
Token ID9551
Feature activation-8.88e-02

Pos logit contributions
aps+0.069
apping+0.068
uttering+0.061
uts+0.061
 This+0.060
Neg logit contributions
 forgot-0.048
 punished-0.042
 cleaned-0.042
 put-0.036
 realized-0.036
Tokenopl
Token ID20106
Feature activation-1.66e-01

Pos logit contributions
aps+0.134
apping+0.132
uttering+0.115
apped+0.110
 This+0.109
Neg logit contributions
 forgot-0.232
 cleaned-0.214
 punished-0.214
 realized-0.194
 repaired-0.193
Tokenane
Token ID1531
Feature activation-4.21e-02

Pos logit contributions
 Today+0.235
 Nice+0.233
 Yes+0.229
 This+0.224
aps+0.221
Neg logit contributions
 forgot-0.334
 cleaned-0.308
ts-0.306
 punished-0.304
ty-0.291
Token noises
Token ID26782
Feature activation-4.08e-02

Pos logit contributions
aps+0.102
apping+0.101
uttering+0.093
 This+0.091
uts+0.090
Neg logit contributions
 forgot-0.073
 punished-0.063
 cleaned-0.063
 put-0.056
 felt-0.056
Token.
Token ID13
Feature activation-7.19e-02

Pos logit contributions
aps+0.086
apping+0.081
uttering+0.073
 This+0.073
apped+0.072
Neg logit contributions
 forgot-0.086
 cleaned-0.079
 punished-0.077
 put-0.073
 realized-0.069
Token Jack
Token ID3619
Feature activation+7.10e-02

Pos logit contributions
uttering+0.144
apping+0.142
aps+0.140
apped+0.123
uts+0.122
Neg logit contributions
 forgot-0.135
 cleaned-0.129
 put-0.120
 checked-0.115
 washed-0.114
Token soon
Token ID2582
Feature activation+6.17e-02

Pos logit contributions
 forgot+0.095
 punished+0.090
 cleaned+0.084
 nurses+0.071
 mess+0.068
Neg logit contributions
apping-0.188
aps-0.187
uts-0.172
apped-0.170
 This-0.165
Token could
Token ID714
Feature activation+9.14e-03

Pos logit contributions
aps+0.073
apping+0.072
uts+0.066
 This+0.066
apped+0.065
Neg logit contributions
 forgot-0.040
 punished-0.036
 cleaned-0.034
 nurses-0.028
 sighed-0.028
Token see
Token ID766
Feature activation+4.36e-03

Pos logit contributions
 forgot+0.015
 punished+0.013
 cleaned+0.012
 sighed+0.011
 regretted+0.011
Neg logit contributions
aps-0.024
apping-0.022
 This-0.021
uts-0.021
unch-0.021
Token lots
Token ID6041
Feature activation+4.83e-03

Pos logit contributions
 forgot+0.008
 cleaned+0.008
 punished+0.008
 put+0.006
 realized+0.006
Neg logit contributions
aps-0.010
apping-0.010
uttering-0.009
uts-0.009
 This-0.009
Token of
Token ID286
Feature activation+9.01e-03

Pos logit contributions
 forgot+0.012
 cleaned+0.011
 punished+0.011
 put+0.010
 realized+0.010
Neg logit contributions
aps-0.009
apping-0.008
uttering-0.007
 This-0.007
uts-0.007
Token av
Token ID1196
Feature activation-8.70e-02

Pos logit contributions
 forgot+0.018
 punished+0.015
 cleaned+0.015
 sighed+0.013
 hoped+0.013
Neg logit contributions
aps-0.020
apping-0.020
 This-0.018
uts-0.018
uttering-0.018
Tokenoc
Token ID420
Feature activation-1.64e-01

Pos logit contributions
aps+0.195
apping+0.192
uttering+0.171
uts+0.169
 This+0.169
Neg logit contributions
 forgot-0.172
 punished-0.153
 cleaned-0.153
 put-0.133
 realized-0.132
Tokenados
Token ID22484
Feature activation-2.32e-02

Pos logit contributions
apping+0.244
aps+0.243
uttering+0.209
 Today+0.197
apped+0.186
Neg logit contributions
 forgot-0.420
 punished-0.411
 cleaned-0.395
 repaired-0.368
 realized-0.361
Token that
Token ID326
Feature activation+2.51e-02

Pos logit contributions
aps+0.046
apping+0.045
uttering+0.039
 This+0.039
apped+0.038
Neg logit contributions
 forgot-0.053
 punished-0.048
 cleaned-0.047
 put-0.043
 realized-0.042
Token the
Token ID262
Feature activation+2.83e-02

Pos logit contributions
 forgot+0.035
 punished+0.033
 cleaned+0.031
 sighed+0.025
 nurses+0.025
Neg logit contributions
aps-0.063
apping-0.063
 This-0.062
uts-0.060
apped-0.059
Token supplier
Token ID22693
Feature activation+7.73e-03

Pos logit contributions
 forgot+0.052
 cleaned+0.046
 punished+0.046
 sighed+0.039
 checked+0.039
Neg logit contributions
aps-0.065
apping-0.064
 This-0.059
uttering-0.058
uts-0.057
Token had
Token ID550
Feature activation+2.80e-02

Pos logit contributions
 forgot+0.010
 punished+0.009
 cleaned+0.009
 nurses+0.007
 sighed+0.007
Neg logit contributions
aps-0.022
apping-0.021
uts-0.020
 This-0.019
apped-0.019
Token learn
Token ID2193
Feature activation-3.57e-02

Pos logit contributions
 forgot+0.028
 punished+0.024
 cleaned+0.023
 sighed+0.021
 realized+0.021
Neg logit contributions
aps-0.041
apping-0.040
 This-0.036
uts-0.035
uttering-0.035
Token more
Token ID517
Feature activation-2.17e-02

Pos logit contributions
aps+0.073
apping+0.071
uts+0.063
uttering+0.062
 This+0.062
Neg logit contributions
 forgot-0.079
 punished-0.070
 cleaned-0.070
 sighed-0.062
 realized-0.062
Token about
Token ID546
Feature activation-2.05e-02

Pos logit contributions
apping+0.045
aps+0.045
uttering+0.041
apped+0.041
unch+0.039
Neg logit contributions
 forgot-0.045
 cleaned-0.041
 punished-0.040
 put-0.037
 locked-0.036
Token rh
Token ID9529
Feature activation-9.85e-02

Pos logit contributions
aps+0.046
apping+0.045
 This+0.041
uts+0.041
uttering+0.040
Neg logit contributions
 forgot-0.040
 cleaned-0.035
 punished-0.035
 sighed-0.030
 realized-0.030
Tokenin
Token ID259
Feature activation-3.04e-02

Pos logit contributions
aps+0.210
apping+0.206
uttering+0.179
uts+0.179
unch+0.172
Neg logit contributions
 forgot-0.214
 cleaned-0.193
 punished-0.188
 put-0.168
 sighed-0.167
Tokenoc
Token ID420
Feature activation-1.64e-01

Pos logit contributions
aps+0.067
apping+0.065
uttering+0.058
uts+0.058
 This+0.057
Neg logit contributions
 forgot-0.062
 punished-0.055
 cleaned-0.055
 put-0.048
 realized-0.048
Tokener
Token ID263
Feature activation-6.58e-02

Pos logit contributions
apping+0.300
aps+0.299
uttering+0.271
 Today+0.260
 This+0.252
Neg logit contributions
 forgot-0.362
 punished-0.355
 cleaned-0.338
ts-0.313
ty-0.303
Tokenoses
Token ID4629
Feature activation-1.55e-02

Pos logit contributions
aps+0.141
apping+0.137
uttering+0.126
 This+0.122
unch+0.120
Neg logit contributions
 forgot-0.135
 punished-0.121
 cleaned-0.120
 put-0.106
 felt-0.106
Token and
Token ID290
Feature activation+8.96e-03

Pos logit contributions
aps+0.032
apping+0.032
uts+0.028
 This+0.028
uttering+0.028
Neg logit contributions
 forgot-0.033
 punished-0.030
 cleaned-0.029
 sighed-0.026
 put-0.026
Token respect
Token ID2461
Feature activation-3.02e-02

Pos logit contributions
 forgot+0.017
 cleaned+0.015
 punished+0.015
 sighed+0.013
ty+0.012
Neg logit contributions
aps-0.021
apping-0.019
uts-0.018
 This-0.018
uttering-0.017
Token them
Token ID606
Feature activation+2.63e-02

Pos logit contributions
aps+0.066
apping+0.065
uts+0.058
uttering+0.057
 This+0.057
Neg logit contributions
 forgot-0.062
 punished-0.055
 cleaned-0.055
 sighed-0.047
 realized-0.047
Token sailed
Token ID33835
Feature activation-3.14e-02

Pos logit contributions
 forgot+0.108
 punished+0.103
 nurses+0.094
 cleaned+0.093
 damage+0.087
Neg logit contributions
apping-0.178
aps-0.173
apped-0.166
 This-0.153
itted-0.148
Token for
Token ID329
Feature activation+1.72e-02

Pos logit contributions
aps+0.064
apping+0.063
uttering+0.055
 This+0.054
apped+0.054
Neg logit contributions
 forgot-0.070
 punished-0.063
 cleaned-0.063
 checked-0.054
 sighed-0.054
Token days
Token ID1528
Feature activation-3.50e-02

Pos logit contributions
 forgot+0.037
 punished+0.034
 cleaned+0.031
 regretted+0.028
wy+0.028
Neg logit contributions
aps-0.033
 This-0.030
apping-0.030
uttering-0.027
 Today-0.026
Token,
Token ID11
Feature activation+8.08e-03

Pos logit contributions
aps+0.076
apping+0.072
uts+0.065
 This+0.065
uttering+0.065
Neg logit contributions
 forgot-0.073
 cleaned-0.065
 punished-0.064
 put-0.058
 realized-0.056
Token exploring
Token ID13504
Feature activation-3.77e-02

Pos logit contributions
 forgot+0.017
 punished+0.016
 cleaned+0.016
 ruined+0.014
 regretted+0.014
Neg logit contributions
apping-0.015
aps-0.015
uttering-0.013
 This-0.013
apped-0.013
Token unch
Token ID16258
Feature activation-1.64e-01

Pos logit contributions
aps+0.082
apping+0.079
uttering+0.071
 This+0.070
apped+0.068
Neg logit contributions
 forgot-0.079
 punished-0.072
 cleaned-0.069
 sighed-0.060
 checked-0.059
Tokenarted
Token ID19112
Feature activation-3.92e-02

Pos logit contributions
aps+0.310
apping+0.292
uts+0.270
 This+0.266
 Nice+0.258
Neg logit contributions
 forgot-0.321
 punished-0.320
 cleaned-0.310
 put-0.284
 locked-0.276
Token waters
Token ID10150
Feature activation-3.45e-02

Pos logit contributions
aps+0.084
apping+0.082
uttering+0.073
uts+0.073
 This+0.072
Neg logit contributions
 forgot-0.082
 punished-0.073
 cleaned-0.073
 put-0.064
 realized-0.063
Token until
Token ID1566
Feature activation+9.95e-03

Pos logit contributions
aps+0.076
apping+0.075
uttering+0.067
uts+0.066
 This+0.066
Neg logit contributions
 forgot-0.068
 cleaned-0.061
 punished-0.060
 put-0.055
 realized-0.053
Token he
Token ID339
Feature activation+8.93e-02

Pos logit contributions
 forgot+0.019
 punished+0.016
 cleaned+0.016
 sighed+0.014
 put+0.014
Neg logit contributions
aps-0.023
apping-0.023
uttering-0.021
 This-0.021
uts-0.021
Token came
Token ID1625
Feature activation-1.09e-02

Pos logit contributions
 punished+0.110
 forgot+0.106
 cleaned+0.105
 nurses+0.099
 vacuum+0.086
Neg logit contributions
apping-0.231
aps-0.228
apped-0.219
uts-0.211
 This-0.206
TokenI
Token ID40
Feature activation+6.69e-02

Pos logit contributions
aps+0.053
apping+0.052
uttering+0.046
uts+0.046
 This+0.045
Neg logit contributions
 forgot-0.057
 cleaned-0.051
 punished-0.051
 put-0.045
 realized-0.045
Token will
Token ID481
Feature activation+1.26e-02

Pos logit contributions
 forgot+0.125
 punished+0.114
ts+0.114
 cleaned+0.112
 nurses+0.105
Neg logit contributions
aps-0.142
apping-0.135
 This-0.124
 Today-0.118
unch-0.117
Token buy
Token ID2822
Feature activation-2.55e-02

Pos logit contributions
 forgot+0.018
 cleaned+0.015
 punished+0.015
 put+0.013
 realized+0.012
Neg logit contributions
aps-0.036
apping-0.035
uttering-0.033
uts-0.032
apped-0.032
Token more
Token ID517
Feature activation-8.34e-03

Pos logit contributions
apping+0.057
aps+0.057
uttering+0.051
uts+0.050
apped+0.048
Neg logit contributions
 forgot-0.051
 punished-0.046
 cleaned-0.046
 put-0.041
 realized-0.040
Token av
Token ID1196
Feature activation-8.28e-02

Pos logit contributions
apping+0.021
aps+0.021
uttering+0.019
apped+0.018
uts+0.018
Neg logit contributions
 forgot-0.014
 cleaned-0.013
 punished-0.012
 put-0.011
 locked-0.011
Tokenoc
Token ID420
Feature activation-1.63e-01

Pos logit contributions
aps+0.187
apping+0.184
uttering+0.165
uts+0.163
 This+0.161
Neg logit contributions
 forgot-0.162
 punished-0.145
 cleaned-0.144
 put-0.125
 realized-0.124
Tokenados
Token ID22484
Feature activation-2.78e-02

Pos logit contributions
apping+0.235
aps+0.234
uttering+0.210
 Today+0.193
apped+0.177
Neg logit contributions
 forgot-0.423
 punished-0.422
 cleaned-0.398
 repaired-0.369
 realized-0.366
Token tomorrow
Token ID9439
Feature activation-2.70e-02

Pos logit contributions
aps+0.058
apping+0.057
uttering+0.052
apped+0.050
unch+0.049
Neg logit contributions
 forgot-0.059
 cleaned-0.053
 punished-0.053
 put-0.048
 locked-0.047
Token.
Token ID13
Feature activation-8.33e-02

Pos logit contributions
aps+0.057
apping+0.057
uttering+0.053
apped+0.049
unch+0.049
Neg logit contributions
 forgot-0.057
 cleaned-0.050
 punished-0.049
 put-0.047
 locked-0.044
Token You
Token ID921
Feature activation+6.81e-02

Pos logit contributions
uttering+0.141
apping+0.141
aps+0.138
uts+0.121
unch+0.116
Neg logit contributions
 forgot-0.198
 cleaned-0.187
 punished-0.182
 put-0.177
 hoped-0.171
Token can
Token ID460
Feature activation-7.23e-04

Pos logit contributions
 forgot+0.117
 sighed+0.111
 punished+0.108
 colder+0.107
 cleaned+0.104
Neg logit contributions
 This-0.138
aps-0.134
apping-0.125
 These-0.123
uts-0.121
Token Lily
Token ID20037
Feature activation+7.77e-02

Pos logit contributions
apping+0.085
uttering+0.081
aps+0.080
uts+0.078
unch+0.074
Neg logit contributions
 forgot-0.108
 cleaned-0.106
 put-0.096
 had-0.093
 saw-0.093
Token loved
Token ID6151
Feature activation-3.14e-02

Pos logit contributions
 punished+0.125
 forgot+0.119
 nurses+0.104
eling+0.101
 colder+0.095
Neg logit contributions
aps-0.181
apping-0.176
 This-0.174
uts-0.162
apped-0.159
Token to
Token ID284
Feature activation+4.05e-02

Pos logit contributions
apping+0.069
aps+0.067
uttering+0.058
apped+0.058
uts+0.056
Neg logit contributions
 forgot-0.064
 cleaned-0.063
 punished-0.060
 put-0.058
 washed-0.054
Token eat
Token ID4483
Feature activation-3.96e-02

Pos logit contributions
 forgot+0.065
 cleaned+0.059
 punished+0.055
 put+0.052
 checked+0.047
Neg logit contributions
apping-0.107
aps-0.107
uttering-0.097
uts-0.095
apped-0.094
Token av
Token ID1196
Feature activation-8.30e-02

Pos logit contributions
apping+0.113
aps+0.110
uttering+0.102
apped+0.102
uts+0.098
Neg logit contributions
 forgot-0.052
 cleaned-0.051
 put-0.047
 punished-0.046
 mess-0.042
Tokenoc
Token ID420
Feature activation-1.63e-01

Pos logit contributions
aps+0.185
apping+0.183
uttering+0.163
uts+0.161
 This+0.160
Neg logit contributions
 forgot-0.165
 cleaned-0.147
 punished-0.147
 put-0.128
 realized-0.127
Tokenados
Token ID22484
Feature activation-2.64e-02

Pos logit contributions
apping+0.274
aps+0.261
uttering+0.233
 Today+0.221
apped+0.207
Neg logit contributions
 punished-0.390
 forgot-0.386
 cleaned-0.376
 repaired-0.346
 realized-0.343
Token,
Token ID11
Feature activation+4.44e-05

Pos logit contributions
aps+0.058
apping+0.057
uttering+0.051
apped+0.051
 This+0.049
Neg logit contributions
 forgot-0.052
 cleaned-0.047
 punished-0.045
 put-0.044
ty-0.042
Token they
Token ID484
Feature activation+8.00e-02

Pos logit contributions
 forgot+0.000
 punished+0.000
 cleaned+0.000
 sighed+0.000
 regretted+0.000
Neg logit contributions
aps-0.000
apping-0.000
uts-0.000
 This-0.000
unch-0.000
Token were
Token ID547
Feature activation+3.78e-02

Pos logit contributions
 forgot+0.122
 punished+0.122
 cleaned+0.107
 sighed+0.106
 mess+0.100
Neg logit contributions
 This-0.168
aps-0.167
apping-0.164
uts-0.162
apped-0.161
Token her
Token ID607
Feature activation+4.50e-02

Pos logit contributions
 forgot+0.063
 cleaned+0.055
 punished+0.055
 put+0.048
 checked+0.045
Neg logit contributions
aps-0.098
apping-0.096
uttering-0.087
uts-0.086
 This-0.085
Token and
Token ID290
Feature activation+3.54e-02

Pos logit contributions
aps+0.085
apping+0.082
uttering+0.072
uts+0.072
unch+0.071
Neg logit contributions
 forgot-0.094
 cleaned-0.087
 punished-0.086
 put-0.079
 locked-0.077
Token forgot
Token ID16453
Feature activation+3.35e-03

Pos logit contributions
 forgot+0.046
wy+0.046
 punished+0.042
ts+0.039
eling+0.039
Neg logit contributions
 This-0.084
aps-0.079
apping-0.078
apped-0.078
 Of-0.074
Token to
Token ID284
Feature activation+3.74e-02

Pos logit contributions
 forgot+0.008
 cleaned+0.007
 punished+0.007
 put+0.006
 realized+0.006
Neg logit contributions
aps-0.006
apping-0.006
uttering-0.005
uts-0.005
 This-0.005
Token buy
Token ID2822
Feature activation-1.92e-02

Pos logit contributions
 forgot+0.061
 punished+0.051
 cleaned+0.049
 sighed+0.048
 regretted+0.047
Neg logit contributions
aps-0.085
apping-0.083
 This-0.081
 Today-0.077
unch-0.076
Token av
Token ID1196
Feature activation-7.98e-02

Pos logit contributions
aps+0.044
apping+0.043
uts+0.039
 This+0.039
uttering+0.039
Neg logit contributions
 forgot-0.037
 cleaned-0.033
 punished-0.033
 sighed-0.028
 realized-0.028
Tokenoc
Token ID420
Feature activation-1.62e-01

Pos logit contributions
aps+0.176
apping+0.173
uttering+0.154
uts+0.153
 This+0.151
Neg logit contributions
 forgot-0.162
 punished-0.144
 cleaned-0.144
 put-0.125
 realized-0.124
Tokenados
Token ID22484
Feature activation-2.24e-02

Pos logit contributions
apping+0.238
aps+0.233
uttering+0.206
 Today+0.187
apped+0.175
Neg logit contributions
 forgot-0.426
 punished-0.422
 cleaned-0.402
 repaired-0.369
 realized-0.369
Token.
Token ID13
Feature activation-6.47e-02

Pos logit contributions
aps+0.046
apping+0.045
uttering+0.039
apped+0.039
 This+0.038
Neg logit contributions
 forgot-0.050
 cleaned-0.045
 punished-0.045
 put-0.041
 locked-0.040
Token Lily
Token ID20037
Feature activation+7.80e-02

Pos logit contributions
aps+0.146
apping+0.146
uttering+0.134
uts+0.126
unch+0.126
Neg logit contributions
 forgot-0.137
 cleaned-0.123
 punished-0.113
 put-0.111
 saw-0.109
Token was
Token ID373
Feature activation+3.39e-02

Pos logit contributions
 colder+0.111
 matter+0.103
wy+0.102
 damage+0.102
 nurses+0.101
Neg logit contributions
apping-0.160
 This-0.158
aps-0.154
apped-0.152
uts-0.149
Token sad
Token ID6507
Feature activation-3.61e-02

Pos logit contributions
 forgot+0.065
 cleaned+0.056
 punished+0.055
 sighed+0.049
 realized+0.049
Neg logit contributions
aps-0.078
apping-0.077
uttering-0.068
 This-0.068
uts-0.067
Token sky
Token ID6766
Feature activation-2.72e-02

Pos logit contributions
 forgot+0.052
 cleaned+0.044
 punished+0.044
 hoped+0.038
 checked+0.038
Neg logit contributions
aps-0.065
apping-0.063
uttering-0.059
 This-0.056
uts-0.055
Token to
Token ID284
Feature activation+3.61e-02

Pos logit contributions
aps+0.057
apping+0.055
uttering+0.049
 This+0.048
uts+0.048
Neg logit contributions
 forgot-0.057
 cleaned-0.052
 punished-0.050
 put-0.048
 felt-0.046
Token spot
Token ID4136
Feature activation-6.50e-02

Pos logit contributions
 forgot+0.060
 punished+0.053
 cleaned+0.052
 sighed+0.043
 realized+0.043
Neg logit contributions
aps-0.093
apping-0.091
uttering-0.083
 This-0.083
uts-0.082
Token more
Token ID517
Feature activation-9.82e-03

Pos logit contributions
aps+0.134
apping+0.132
uttering+0.116
uts+0.116
 This+0.115
Neg logit contributions
 forgot-0.139
 cleaned-0.127
 punished-0.125
 put-0.113
 locked-0.111
Token aer
Token ID9551
Feature activation-8.72e-02

Pos logit contributions
aps+0.019
apping+0.019
uttering+0.017
uts+0.017
 This+0.016
Neg logit contributions
 forgot-0.022
 punished-0.020
 cleaned-0.020
 put-0.018
 realized-0.018
Tokenopl
Token ID20106
Feature activation-1.62e-01

Pos logit contributions
aps+0.121
apping+0.117
uttering+0.098
 This+0.096
uts+0.095
Neg logit contributions
 forgot-0.245
 cleaned-0.226
 punished-0.226
 realized-0.205
 put-0.204
Tokenanes
Token ID7305
Feature activation-8.71e-03

Pos logit contributions
 Today+0.246
 Nice+0.244
 This+0.233
 Yes+0.224
 Of+0.219
Neg logit contributions
 forgot-0.312
 cleaned-0.293
 punished-0.279
ts-0.273
 repaired-0.272
Token.
Token ID13
Feature activation-6.90e-02

Pos logit contributions
aps+0.017
apping+0.017
uttering+0.015
 This+0.015
uts+0.015
Neg logit contributions
 forgot-0.019
 cleaned-0.018
 punished-0.017
 put-0.016
 realized-0.015
Token<|endoftext|>
Token ID50256
Feature activation-6.25e-02

Pos logit contributions
apping+0.137
aps+0.135
uttering+0.132
uts+0.119
unch+0.107
Neg logit contributions
 forgot-0.140
 cleaned-0.126
 hoped-0.120
 felt-0.119
 put-0.119
TokenOnce
Token ID7454
Feature activation-7.51e-03

Pos logit contributions
apping+0.132
uttering+0.125
aps+0.118
uts+0.116
itted+0.112
Neg logit contributions
 forgot-0.119
 cleaned-0.115
 saw-0.113
 had-0.110
 felt-0.109
Token upon
Token ID2402
Feature activation+1.55e-02

Pos logit contributions
aps+0.015
apping+0.014
uts+0.013
uttering+0.013
 This+0.012
Neg logit contributions
 forgot-0.017
 cleaned-0.015
 punished-0.015
 put-0.014
 realized-0.014
Token She
Token ID1375
Feature activation+1.11e-01

Pos logit contributions
uttering+0.091
aps+0.086
apping+0.085
uts+0.081
ings+0.079
Neg logit contributions
 forgot-0.083
 cleaned-0.076
 also-0.074
 put-0.073
my-0.070
Token loved
Token ID6151
Feature activation-4.22e-02

Pos logit contributions
 punished+0.169
 forgot+0.165
 nurses+0.157
 cleaned+0.151
 colder+0.149
Neg logit contributions
aps-0.245
apping-0.243
 This-0.230
apped-0.225
uts-0.220
Token to
Token ID284
Feature activation+3.32e-02

Pos logit contributions
apping+0.088
aps+0.087
uttering+0.075
apped+0.075
uts+0.071
Neg logit contributions
 forgot-0.092
 cleaned-0.088
 punished-0.084
 put-0.080
 saw-0.076
Token eat
Token ID4483
Feature activation-5.16e-02

Pos logit contributions
 forgot+0.065
 punished+0.059
 cleaned+0.056
 sighed+0.051
 regretted+0.050
Neg logit contributions
aps-0.074
apping-0.071
 This-0.067
uts-0.065
uttering-0.064
Token av
Token ID1196
Feature activation-8.55e-02

Pos logit contributions
apping+0.143
aps+0.142
uttering+0.129
apped+0.128
uts+0.125
Neg logit contributions
 forgot-0.075
 cleaned-0.068
 punished-0.065
 put-0.060
 felt-0.054
Tokenoc
Token ID420
Feature activation-1.62e-01

Pos logit contributions
aps+0.189
apping+0.186
uttering+0.165
uts+0.164
 This+0.162
Neg logit contributions
 forgot-0.172
 punished-0.153
 cleaned-0.153
 put-0.133
 realized-0.132
Tokenados
Token ID22484
Feature activation-2.28e-02

Pos logit contributions
apping+0.238
aps+0.233
uttering+0.209
 Today+0.186
apped+0.179
Neg logit contributions
 forgot-0.418
 punished-0.413
 cleaned-0.395
 repaired-0.365
 realized-0.365
Token.
Token ID13
Feature activation-5.99e-02

Pos logit contributions
aps+0.049
apping+0.048
uttering+0.042
apped+0.042
 This+0.041
Neg logit contributions
 forgot-0.047
 cleaned-0.043
 punished-0.042
 put-0.039
ty-0.038
Token One
Token ID1881
Feature activation+2.79e-02

Pos logit contributions
aps+0.137
apping+0.137
uttering+0.126
uts+0.123
apped+0.118
Neg logit contributions
 forgot-0.112
 cleaned-0.103
 put-0.096
 saw-0.092
 punished-0.091
Token day
Token ID1110
Feature activation-1.89e-02

Pos logit contributions
 forgot+0.060
 punished+0.055
 cleaned+0.054
 sighed+0.047
 realized+0.047
Neg logit contributions
aps-0.058
apping-0.057
 This-0.050
uttering-0.050
uts-0.050
Token,
Token ID11
Feature activation+1.11e-02

Pos logit contributions
aps+0.052
apping+0.051
uttering+0.048
uts+0.046
unch+0.046
Neg logit contributions
 forgot-0.030
 cleaned-0.026
 saw-0.023
 put-0.023
 realized-0.022
Token She
Token ID1375
Feature activation+1.16e-01

Pos logit contributions
apping+0.087
aps+0.086
uttering+0.085
uts+0.081
utes+0.078
Neg logit contributions
 forgot-0.099
 cleaned-0.095
 put-0.086
 had-0.083
 tidy-0.083
Token loved
Token ID6151
Feature activation-3.27e-02

Pos logit contributions
 punished+0.175
 forgot+0.174
 cleaned+0.146
 nurses+0.142
 sighed+0.133
Neg logit contributions
aps-0.274
apping-0.270
 This-0.263
uts-0.248
apped-0.246
Token to
Token ID284
Feature activation+3.99e-02

Pos logit contributions
aps+0.065
apping+0.065
uttering+0.055
apped+0.055
uts+0.054
Neg logit contributions
 forgot-0.074
 cleaned-0.070
 punished-0.068
 put-0.064
 washed-0.061
Token eat
Token ID4483
Feature activation-3.99e-02

Pos logit contributions
 forgot+0.070
 cleaned+0.062
 punished+0.061
 put+0.052
 realized+0.051
Neg logit contributions
aps-0.099
apping-0.098
uttering-0.089
uts-0.088
 This-0.087
Token av
Token ID1196
Feature activation-8.35e-02

Pos logit contributions
apping+0.111
aps+0.110
uttering+0.100
apped+0.099
uts+0.097
Neg logit contributions
 forgot-0.056
 cleaned-0.052
 punished-0.050
 put-0.047
 mess-0.042
Tokenoc
Token ID420
Feature activation-1.61e-01

Pos logit contributions
aps+0.181
apping+0.177
uttering+0.158
uts+0.158
 This+0.156
Neg logit contributions
 forgot-0.172
 cleaned-0.153
 punished-0.153
 put-0.133
 sighed-0.133
Tokenados
Token ID22484
Feature activation-2.69e-02

Pos logit contributions
apping+0.253
aps+0.244
uttering+0.217
 Today+0.199
apped+0.187
Neg logit contributions
 punished-0.407
 forgot-0.406
 cleaned-0.395
 repaired-0.362
ty-0.358
Token because
Token ID780
Feature activation+2.05e-02

Pos logit contributions
aps+0.059
apping+0.058
apped+0.051
uttering+0.051
 This+0.049
Neg logit contributions
 forgot-0.055
 cleaned-0.049
 punished-0.048
 put-0.046
ty-0.044
Token they
Token ID484
Feature activation+9.47e-02

Pos logit contributions
 forgot+0.046
 punished+0.042
 cleaned+0.041
 sighed+0.037
 realized+0.036
Neg logit contributions
aps-0.040
apping-0.039
 This-0.035
uts-0.035
uttering-0.034
Token were
Token ID547
Feature activation+3.21e-02

Pos logit contributions
 punished+0.164
 forgot+0.157
 wounds+0.147
 cal+0.146
 sighed+0.146
Neg logit contributions
 This-0.165
uts-0.160
 These-0.154
owers-0.150
apped-0.149
Token green
Token ID4077
Feature activation-4.23e-02

Pos logit contributions
 forgot+0.057
 punished+0.050
 cleaned+0.049
 sighed+0.044
 realized+0.042
Neg logit contributions
aps-0.077
apping-0.076
 This-0.069
uts-0.069
uttering-0.069
Token She
Token ID1375
Feature activation+1.10e-01

Pos logit contributions
uttering+0.111
aps+0.103
apping+0.102
uts+0.101
ings+0.100
Neg logit contributions
 forgot-0.085
 cleaned-0.077
 also-0.077
 put-0.075
 had-0.074
Token loved
Token ID6151
Feature activation-4.05e-02

Pos logit contributions
 punished+0.160
 forgot+0.158
 nurses+0.142
 cleaned+0.140
ts+0.136
Neg logit contributions
apping-0.261
aps-0.259
 This-0.240
apped-0.238
uts-0.235
Token to
Token ID284
Feature activation+3.30e-02

Pos logit contributions
aps+0.083
apping+0.083
uttering+0.071
apped+0.071
uts+0.068
Neg logit contributions
 forgot-0.089
 cleaned-0.085
 punished-0.082
 put-0.077
 saw-0.073
Token eat
Token ID4483
Feature activation-5.03e-02

Pos logit contributions
 forgot+0.062
 punished+0.055
 cleaned+0.054
 sighed+0.046
 realized+0.046
Neg logit contributions
aps-0.078
apping-0.076
 This-0.069
uts-0.068
uttering-0.068
Token av
Token ID1196
Feature activation-8.40e-02

Pos logit contributions
apping+0.140
aps+0.138
uttering+0.126
apped+0.125
uts+0.122
Neg logit contributions
 forgot-0.072
 cleaned-0.066
 punished-0.063
 put-0.059
 felt-0.053
Tokenoc
Token ID420
Feature activation-1.61e-01

Pos logit contributions
aps+0.185
apping+0.182
uttering+0.162
uts+0.161
 This+0.159
Neg logit contributions
 forgot-0.169
 punished-0.151
 cleaned-0.151
 put-0.131
 realized-0.130
Tokenados
Token ID22484
Feature activation-2.23e-02

Pos logit contributions
apping+0.240
aps+0.233
uttering+0.210
 Today+0.189
apped+0.180
Neg logit contributions
 forgot-0.416
 punished-0.412
 cleaned-0.393
 realized-0.363
 repaired-0.363
Token because
Token ID780
Feature activation+1.28e-02

Pos logit contributions
aps+0.048
apping+0.047
uttering+0.042
apped+0.041
 This+0.040
Neg logit contributions
 forgot-0.045
 cleaned-0.041
 punished-0.040
 put-0.038
ty-0.036
Token they
Token ID484
Feature activation+8.77e-02

Pos logit contributions
 forgot+0.027
 cleaned+0.024
 punished+0.024
 put+0.021
 realized+0.021
Neg logit contributions
aps-0.027
apping-0.027
uttering-0.024
uts-0.023
 This-0.023
Token were
Token ID547
Feature activation+2.23e-02

Pos logit contributions
 punished+0.148
 forgot+0.141
 cleaned+0.134
 colder+0.133
 sighed+0.131
Neg logit contributions
 This-0.163
uts-0.160
aps-0.151
 These-0.151
owers-0.148
Token y
Token ID331
Feature activation-1.17e-01

Pos logit contributions
 forgot+0.039
 cleaned+0.034
 punished+0.034
 sighed+0.030
 realized+0.029
Neg logit contributions
aps-0.055
apping-0.054
uts-0.049
 This-0.049
uttering-0.048
Token fun
Token ID1257
Feature activation-3.18e-02

Pos logit contributions
 forgot+0.030
 cleaned+0.026
 punished+0.026
 put+0.023
 locked+0.022
Neg logit contributions
aps-0.042
apping-0.042
uttering-0.038
uts-0.037
unch-0.037
Token.
Token ID13
Feature activation-8.58e-02

Pos logit contributions
aps+0.070
apping+0.068
uttering+0.062
unch+0.062
apped+0.061
Neg logit contributions
 forgot-0.063
 cleaned-0.057
 punished-0.054
 put-0.053
 locked-0.051
Token I
Token ID314
Feature activation+6.66e-02

Pos logit contributions
uttering+0.157
apping+0.154
aps+0.151
unch+0.144
uts+0.140
Neg logit contributions
 forgot-0.175
 cleaned-0.173
 put-0.162
 hoped-0.158
 nurses-0.154
Token like
Token ID588
Feature activation-9.71e-03

Pos logit contributions
 forgot+0.128
 punished+0.121
 sighed+0.116
ts+0.115
ty+0.112
Neg logit contributions
aps-0.128
apping-0.118
 This-0.111
 Today-0.109
uttering-0.105
Token aer
Token ID9551
Feature activation-8.89e-02

Pos logit contributions
apping+0.025
aps+0.025
uttering+0.022
apped+0.022
unch+0.022
Neg logit contributions
 forgot-0.016
 cleaned-0.014
 punished-0.014
 put-0.013
 locked-0.012
Tokenopl
Token ID20106
Feature activation-1.61e-01

Pos logit contributions
aps+0.126
apping+0.123
uttering+0.103
uts+0.099
 This+0.098
Neg logit contributions
 forgot-0.247
 cleaned-0.230
 punished-0.228
 put-0.207
 realized-0.207
Tokenanes
Token ID7305
Feature activation-1.79e-02

Pos logit contributions
 Nice+0.236
 Today+0.225
apping+0.220
aps+0.219
 This+0.215
Neg logit contributions
 forgot-0.317
ts-0.304
 cleaned-0.303
 punished-0.296
 put-0.279
Token."
Token ID526
Feature activation-9.84e-02

Pos logit contributions
aps+0.040
apping+0.039
uttering+0.036
 This+0.034
apped+0.034
Neg logit contributions
 forgot-0.036
 cleaned-0.032
 punished-0.032
 put-0.029
 felt-0.028
Token\n
Token ID198
Feature activation-6.11e-02

Pos logit contributions
uttering+0.231
apping+0.227
aps+0.211
unch+0.198
inking+0.198
Neg logit contributions
 forgot-0.171
 cleaned-0.163
 saw-0.154
 put-0.153
 felt-0.151
Token\n
Token ID198
Feature activation-5.49e-02

Pos logit contributions
apping+0.116
aps+0.110
uttering+0.108
uts+0.100
apped+0.100
Neg logit contributions
 forgot-0.135
 cleaned-0.128
 put-0.121
 saw-0.118
 punished-0.117
TokenMom
Token ID29252
Feature activation+6.02e-02

Pos logit contributions
aps+0.130
apping+0.128
uttering+0.116
uts+0.115
 This+0.115
Neg logit contributions
 forgot-0.104
 cleaned-0.092
 punished-0.092
 put-0.080
 realized-0.078
Token didn
Token ID1422
Feature activation-2.40e-02

Pos logit contributions
 nurses+0.069
 punished+0.066
 colder+0.064
 cal+0.061
 distance+0.061
Neg logit contributions
apping-0.111
aps-0.109
uts-0.108
apped-0.105
 This-0.100
Token't
Token ID470
Feature activation+4.63e-02

Pos logit contributions
aps+0.046
apping+0.046
uttering+0.041
apped+0.040
itted+0.037
Neg logit contributions
 forgot-0.052
 cleaned-0.049
 punished-0.048
 put-0.045
 also-0.045
Token like
Token ID588
Feature activation+5.43e-03

Pos logit contributions
 forgot+0.089
 punished+0.083
 cleaned+0.078
 sighed+0.076
 repaired+0.070
Neg logit contributions
aps-0.099
apping-0.097
 This-0.089
uts-0.087
 Today-0.085
Token wet
Token ID9583
Feature activation-5.29e-02

Pos logit contributions
 forgot+0.011
 punished+0.009
 cleaned+0.009
 sighed+0.008
 put+0.008
Neg logit contributions
aps-0.012
apping-0.012
uts-0.011
uttering-0.011
 This-0.011
Token av
Token ID1196
Feature activation-8.53e-02

Pos logit contributions
aps+0.117
apping+0.115
uttering+0.103
uts+0.099
 This+0.098
Neg logit contributions
 forgot-0.109
 cleaned-0.098
 punished-0.097
 put-0.086
 realized-0.085
Tokenoc
Token ID420
Feature activation-1.60e-01

Pos logit contributions
aps+0.181
apping+0.177
uts+0.158
 This+0.157
uttering+0.156
Neg logit contributions
 forgot-0.178
 cleaned-0.160
 punished-0.160
 sighed-0.141
 put-0.139
Tokenados
Token ID22484
Feature activation-1.64e-02

Pos logit contributions
apping+0.254
aps+0.247
uttering+0.222
 Today+0.204
apped+0.191
Neg logit contributions
 forgot-0.398
 punished-0.395
 cleaned-0.377
 repaired-0.350
ty-0.347
Token,
Token ID11
Feature activation+7.84e-03

Pos logit contributions
aps+0.031
apping+0.030
uts+0.027
uttering+0.027
 This+0.026
Neg logit contributions
 forgot-0.038
 punished-0.034
 cleaned-0.034
 sighed-0.030
 put-0.030
Token so
Token ID523
Feature activation+1.16e-03

Pos logit contributions
 forgot+0.016
 punished+0.015
 cleaned+0.014
 repaired+0.013
 checked+0.012
Neg logit contributions
apping-0.016
aps-0.016
 This-0.015
uts-0.015
uttering-0.014
Token he
Token ID339
Feature activation+1.00e-01

Pos logit contributions
 forgot+0.002
 punished+0.002
 cleaned+0.002
 put+0.002
 sighed+0.002
Neg logit contributions
aps-0.003
apping-0.003
uts-0.002
uttering-0.002
 This-0.002
Token decided
Token ID3066
Feature activation+8.75e-03

Pos logit contributions
 damage+0.124
 punished+0.123
 repaired+0.121
 forgot+0.119
 rubble+0.118
Neg logit contributions
apped-0.199
apping-0.195
uts-0.184
aps-0.183
 Wo-0.173
Tokenmy
Token ID1820
Feature activation+7.83e-02

Pos logit contributions
 forgot+0.115
 punished+0.109
 nurses+0.107
sie+0.095
 sighed+0.092
Neg logit contributions
aps-0.135
 This-0.127
apping-0.126
 Today-0.121
 That-0.115
Token loved
Token ID6151
Feature activation-2.97e-02

Pos logit contributions
 forgot+0.117
 punished+0.115
 cleaned+0.097
 nurses+0.096
 sighed+0.089
Neg logit contributions
aps-0.192
apping-0.189
 This-0.185
uts-0.174
apped-0.169
Token to
Token ID284
Feature activation+4.12e-02

Pos logit contributions
apping+0.062
aps+0.062
uttering+0.054
apped+0.053
uts+0.052
Neg logit contributions
 forgot-0.064
 cleaned-0.061
 punished-0.059
 put-0.056
 washed-0.053
Token eat
Token ID4483
Feature activation-3.82e-02

Pos logit contributions
 forgot+0.070
 cleaned+0.062
 punished+0.061
 put+0.054
 checked+0.051
Neg logit contributions
aps-0.105
apping-0.104
uttering-0.094
uts-0.093
apped-0.091
Token av
Token ID1196
Feature activation-8.14e-02

Pos logit contributions
apping+0.108
aps+0.107
uttering+0.099
apped+0.097
uts+0.095
Neg logit contributions
 forgot-0.050
 cleaned-0.048
 punished-0.045
 put-0.044
 mess-0.040
Tokenoc
Token ID420
Feature activation-1.59e-01

Pos logit contributions
aps+0.180
apping+0.177
uttering+0.158
uts+0.157
 This+0.155
Neg logit contributions
 forgot-0.164
 punished-0.146
 cleaned-0.146
 put-0.127
 realized-0.126
Tokenados
Token ID22484
Feature activation-2.35e-02

Pos logit contributions
apping+0.248
aps+0.240
uttering+0.216
 Today+0.204
apped+0.185
Neg logit contributions
 punished-0.401
 forgot-0.397
 cleaned-0.383
ty-0.354
 repaired-0.351
Token,
Token ID11
Feature activation+2.36e-03

Pos logit contributions
aps+0.052
apping+0.051
apped+0.046
uttering+0.046
 This+0.044
Neg logit contributions
 forgot-0.046
 cleaned-0.041
 punished-0.041
 put-0.039
ty-0.037
Token they
Token ID484
Feature activation+8.20e-02

Pos logit contributions
 forgot+0.004
 punished+0.004
 cleaned+0.004
 sighed+0.003
 locked+0.003
Neg logit contributions
apping-0.005
aps-0.005
 This-0.005
uts-0.005
unch-0.005
Token were
Token ID547
Feature activation+3.63e-02

Pos logit contributions
 forgot+0.115
 punished+0.114
 sighed+0.103
 cleaned+0.101
 locked+0.093
Neg logit contributions
 This-0.179
aps-0.173
apping-0.173
uts-0.171
apped-0.169
Token his
Token ID465
Feature activation+4.96e-02

Pos logit contributions
 forgot+0.061
 cleaned+0.053
 punished+0.053
 put+0.045
 realized+0.044
Neg logit contributions
aps-0.094
apping-0.092
uttering-0.083
uts-0.083
 This-0.082
Token that
Token ID326
Feature activation+1.28e-02

Pos logit contributions
aps+0.112
apping+0.108
uttering+0.101
unch+0.096
apped+0.096
Neg logit contributions
 forgot-0.099
 cleaned-0.086
 punished-0.081
 put-0.080
 felt-0.079
Token it
Token ID340
Feature activation+5.52e-02

Pos logit contributions
 forgot+0.024
 punished+0.021
 cleaned+0.021
 put+0.018
 sighed+0.018
Neg logit contributions
aps-0.030
apping-0.030
uttering-0.027
uts-0.027
 This-0.027
Token produced
Token ID4635
Feature activation-5.77e-03

Pos logit contributions
 punished+0.064
 nurses+0.064
 forgot+0.061
 colder+0.056
 mess+0.055
Neg logit contributions
aps-0.130
 This-0.129
uts-0.129
apping-0.125
apped-0.122
Token more
Token ID517
Feature activation-3.00e-03

Pos logit contributions
aps+0.012
apping+0.012
 This+0.011
uttering+0.011
uts+0.011
Neg logit contributions
 forgot-0.012
 punished-0.010
 cleaned-0.010
 sighed-0.009
 locked-0.009
Token av
Token ID1196
Feature activation-7.99e-02

Pos logit contributions
aps+0.006
apping+0.006
uts+0.006
 This+0.006
uttering+0.006
Neg logit contributions
 forgot-0.006
 punished-0.006
 cleaned-0.005
 sighed-0.005
 hoped-0.005
Tokenoc
Token ID420
Feature activation-1.59e-01

Pos logit contributions
aps+0.177
apping+0.174
uttering+0.155
uts+0.154
 This+0.153
Neg logit contributions
 forgot-0.161
 punished-0.143
 cleaned-0.143
 put-0.124
 realized-0.124
Tokenados
Token ID22484
Feature activation-1.70e-02

Pos logit contributions
aps+0.225
apping+0.222
uttering+0.190
 Today+0.179
 This+0.169
Neg logit contributions
 forgot-0.424
 punished-0.414
 cleaned-0.398
 repaired-0.368
 realized-0.365
Token than
Token ID621
Feature activation+1.03e-02

Pos logit contributions
aps+0.034
apping+0.033
uttering+0.029
uts+0.029
 This+0.028
Neg logit contributions
 forgot-0.038
 cleaned-0.035
 punished-0.034
 put-0.031
 realized-0.030
Token ever
Token ID1683
Feature activation+2.30e-02

Pos logit contributions
 forgot+0.020
 punished+0.018
 cleaned+0.018
 sighed+0.016
 realized+0.015
Neg logit contributions
aps-0.022
apping-0.022
 This-0.020
uts-0.020
uttering-0.020
Token before
Token ID878
Feature activation-4.35e-02

Pos logit contributions
 forgot+0.048
 punished+0.044
 cleaned+0.044
 sighed+0.039
 checked+0.038
Neg logit contributions
aps-0.046
apping-0.046
 This-0.040
uttering-0.040
uts-0.040
Token.
Token ID13
Feature activation-6.92e-02

Pos logit contributions
aps+0.101
apping+0.100
apped+0.088
uttering+0.088
itted+0.084
Neg logit contributions
 forgot-0.082
 cleaned-0.071
 punished-0.070
 put-0.068
 felt-0.067
Token He
Token ID679
Feature activation+1.06e-01

Pos logit contributions
apping+0.136
aps+0.134
uttering+0.131
uts+0.114
ectar+0.111
Neg logit contributions
 forgot-0.161
 cleaned-0.149
 punished-0.144
 put-0.139
 checked-0.136
Token had
Token ID550
Feature activation+3.70e-02

Pos logit contributions
 punished+0.143
 forgot+0.139
wy+0.135
 cleaned+0.130
 nurses+0.124
Neg logit contributions
aps-0.242
apping-0.238
apped-0.225
uts-0.218
 This-0.209
Token heard
Token ID2982
Feature activation-3.67e-02

Pos logit contributions
 forgot+0.052
 punished+0.047
sie+0.043
 cleaned+0.043
wy+0.042
Neg logit contributions
aps-0.089
 This-0.088
uts-0.086
apping-0.081
itted-0.080
Token about
Token ID546
Feature activation-5.25e-03

Pos logit contributions
aps+0.080
apping+0.078
 This+0.071
uts+0.070
uttering+0.068
Neg logit contributions
 forgot-0.075
 punished-0.067
 cleaned-0.066
 sighed-0.056
 washed-0.056
Token aer
Token ID9551
Feature activation-8.91e-02

Pos logit contributions
aps+0.011
apping+0.011
 This+0.010
uts+0.010
uttering+0.009
Neg logit contributions
 forgot-0.011
 punished-0.010
 cleaned-0.010
 sighed-0.008
 washed-0.008
Tokenopl
Token ID20106
Feature activation-1.59e-01

Pos logit contributions
aps+0.114
apping+0.110
uttering+0.089
uts+0.088
 This+0.087
Neg logit contributions
 forgot-0.264
 punished-0.244
 cleaned-0.244
 put-0.223
 realized-0.222
Tokenanes
Token ID7305
Feature activation-5.70e-03

Pos logit contributions
 Today+0.211
aps+0.204
 Nice+0.201
 This+0.196
apping+0.195
Neg logit contributions
 forgot-0.359
 cleaned-0.344
ts-0.331
 punished-0.328
 checked-0.315
Token before
Token ID878
Feature activation-4.51e-02

Pos logit contributions
aps+0.012
apping+0.012
uttering+0.011
uts+0.011
 This+0.011
Neg logit contributions
 forgot-0.012
 cleaned-0.010
 punished-0.010
 put-0.009
 realized-0.009
Token;
Token ID26
Feature activation+1.01e-02

Pos logit contributions
aps+0.115
apping+0.114
uttering+0.103
apped+0.101
itted+0.099
Neg logit contributions
 forgot-0.072
 cleaned-0.064
 put-0.063
 saw-0.061
 felt-0.061
Token they
Token ID484
Feature activation+9.38e-02

Pos logit contributions
 forgot+0.019
 punished+0.017
 cleaned+0.017
 put+0.014
 sighed+0.014
Neg logit contributions
aps-0.023
apping-0.023
uts-0.021
uttering-0.021
 This-0.020
Token were
Token ID547
Feature activation+2.97e-02

Pos logit contributions
 punished+0.133
 forgot+0.132
 cleaned+0.123
 mess+0.114
 nurses+0.108
Neg logit contributions
apping-0.212
aps-0.207
apped-0.205
uttering-0.195
uts-0.192
Tokenongo
Token ID25162
Feature activation+5.51e-02

Pos logit contributions
aps+0.014
apping+0.014
uttering+0.012
apped+0.012
uts+0.011
Neg logit contributions
 forgot-0.009
 cleaned-0.008
 punished-0.008
ty-0.007
 put-0.007
Token loved
Token ID6151
Feature activation-2.79e-02

Pos logit contributions
 punished+0.102
 forgot+0.098
 nurses+0.092
 cleaned+0.090
 colder+0.087
Neg logit contributions
aps-0.113
apping-0.106
 This-0.104
uts-0.102
apped-0.097
Token to
Token ID284
Feature activation+4.09e-02

Pos logit contributions
aps+0.053
apping+0.053
uttering+0.045
apped+0.044
uts+0.044
Neg logit contributions
 forgot-0.065
 cleaned-0.061
 punished-0.060
 put-0.055
 checked-0.054
Token eat
Token ID4483
Feature activation-3.97e-02

Pos logit contributions
 forgot+0.077
 punished+0.069
 cleaned+0.066
 sighed+0.058
 realized+0.057
Neg logit contributions
aps-0.096
apping-0.091
 This-0.086
uts-0.084
uttering-0.082
Token av
Token ID1196
Feature activation-8.31e-02

Pos logit contributions
apping+0.113
aps+0.111
uttering+0.102
apped+0.101
uts+0.099
Neg logit contributions
 forgot-0.053
 cleaned-0.049
 punished-0.047
 put-0.046
 saw-0.041
Tokenoc
Token ID420
Feature activation-1.58e-01

Pos logit contributions
aps+0.184
apping+0.182
uttering+0.162
uts+0.160
 This+0.159
Neg logit contributions
 forgot-0.167
 cleaned-0.148
 punished-0.148
 put-0.129
 realized-0.128
Tokenados
Token ID22484
Feature activation-2.48e-02

Pos logit contributions
apping+0.250
aps+0.241
uttering+0.218
 Today+0.201
apped+0.190
Neg logit contributions
 punished-0.392
 forgot-0.392
 cleaned-0.372
ty-0.347
 realized-0.344
Token.
Token ID13
Feature activation-5.43e-02

Pos logit contributions
aps+0.054
apping+0.053
apped+0.047
uttering+0.047
 This+0.046
Neg logit contributions
 forgot-0.051
 cleaned-0.045
 punished-0.044
 put-0.042
ty-0.041
Token One
Token ID1881
Feature activation+3.83e-02

Pos logit contributions
apping+0.111
aps+0.110
uttering+0.098
uts+0.095
apped+0.092
Neg logit contributions
 forgot-0.117
 cleaned-0.108
 put-0.104
 punished-0.099
 felt-0.099
Token day
Token ID1110
Feature activation-1.19e-02

Pos logit contributions
 forgot+0.084
 punished+0.077
 cleaned+0.075
 sighed+0.067
 realized+0.066
Neg logit contributions
aps-0.077
apping-0.076
 This-0.067
uttering-0.067
uts-0.066
Token,
Token ID11
Feature activation+1.86e-02

Pos logit contributions
aps+0.031
apping+0.030
uttering+0.027
uts+0.027
azes+0.026
Neg logit contributions
 forgot-0.020
 cleaned-0.018
 put-0.017
 saw-0.016
 felt-0.016
Tokens
Token ID82
Feature activation+9.69e-03

Pos logit contributions
aps+0.138
uttering+0.133
apping+0.127
itted+0.120
azes+0.119
Neg logit contributions
 forgot-0.133
 punished-0.113
 cleaned-0.113
 felt-0.106
ty-0.103
Token pretend
Token ID16614
Feature activation-4.17e-02

Pos logit contributions
 forgot+0.016
 punished+0.014
 cleaned+0.014
 sighed+0.012
 checked+0.012
Neg logit contributions
aps-0.025
apping-0.024
 This-0.022
uts-0.022
uttering-0.021
Token to
Token ID284
Feature activation+2.84e-02

Pos logit contributions
aps+0.088
apping+0.086
uttering+0.077
uts+0.075
 This+0.074
Neg logit contributions
 forgot-0.089
 cleaned-0.079
 punished-0.079
 put-0.070
 realized-0.069
Token be
Token ID307
Feature activation+3.59e-03

Pos logit contributions
 forgot+0.049
 punished+0.043
 cleaned+0.042
 sighed+0.037
 regretted+0.036
Neg logit contributions
aps-0.071
apping-0.069
 This-0.064
uts-0.062
uttering-0.060
Token aer
Token ID9551
Feature activation-8.71e-02

Pos logit contributions
 forgot+0.007
 cleaned+0.006
 punished+0.006
 sighed+0.005
 put+0.005
Neg logit contributions
aps-0.008
apping-0.008
uttering-0.007
uts-0.007
 This-0.007
Tokenopl
Token ID20106
Feature activation-1.58e-01

Pos logit contributions
aps+0.138
apping+0.134
uttering+0.122
apped+0.118
 This+0.116
Neg logit contributions
 forgot-0.215
 punished-0.203
 cleaned-0.199
ty-0.186
 sighed-0.179
Tokenanes
Token ID7305
Feature activation-1.16e-02

Pos logit contributions
 Today+0.224
ectar+0.210
 This+0.207
uttering+0.201
 Nice+0.200
Neg logit contributions
 forgot-0.288
 punished-0.280
ts-0.280
ty-0.278
 cleaned-0.277
Token,
Token ID11
Feature activation-1.42e-02

Pos logit contributions
aps+0.025
apping+0.024
uttering+0.022
 This+0.021
uts+0.021
Neg logit contributions
 forgot-0.024
 punished-0.021
 cleaned-0.021
 put-0.019
ty-0.019
Tokenâ
Token ID22940
Feature activation-2.84e-02

Pos logit contributions
aps+0.033
apping+0.033
uts+0.029
uttering+0.029
 This+0.029
Neg logit contributions
 forgot-0.027
 punished-0.024
 cleaned-0.024
 sighed-0.020
 put-0.020
Token€
Token ID26391
Feature activation-8.32e-02

Pos logit contributions
aps+0.066
apping+0.065
uttering+0.058
uts+0.058
 This+0.058
Neg logit contributions
 forgot-0.054
 cleaned-0.048
 punished-0.048
 put-0.041
 realized-0.041
Token Jack
Token ID3619
Feature activation+4.63e-02

Pos logit contributions
aps+0.172
apping+0.167
uttering+0.157
apped+0.149
ectar+0.144
Neg logit contributions
 forgot-0.184
 punished-0.153
 cleaned-0.152
 saw-0.140
 put-0.139
Token the
Token ID262
Feature activation+4.15e-02

Pos logit contributions
 forgot+0.019
 cleaned+0.016
 punished+0.016
 put+0.014
 realized+0.013
Neg logit contributions
aps-0.030
apping-0.030
uts-0.027
uttering-0.027
 This-0.027
Token park
Token ID3952
Feature activation-6.61e-02

Pos logit contributions
 forgot+0.061
 cleaned+0.053
 punished+0.052
 put+0.044
 realized+0.042
Neg logit contributions
aps-0.114
apping-0.113
uts-0.104
uttering-0.102
 This-0.101
Token every
Token ID790
Feature activation+2.84e-02

Pos logit contributions
aps+0.146
apping+0.141
unch+0.136
uts+0.135
uttering+0.132
Neg logit contributions
 forgot-0.123
 cleaned-0.110
 put-0.107
 saw-0.103
 felt-0.103
Token day
Token ID1110
Feature activation-3.35e-02

Pos logit contributions
 forgot+0.052
 cleaned+0.046
 punished+0.045
 put+0.042
 saw+0.040
Neg logit contributions
aps-0.069
apping-0.067
uts-0.063
uttering-0.059
unch-0.059
Token.
Token ID13
Feature activation-6.14e-02

Pos logit contributions
aps+0.086
unch+0.079
apping+0.078
 Nice+0.078
 Sorry+0.077
Neg logit contributions
 forgot-0.048
 put-0.044
 cleaned-0.042
 had-0.040
 saw-0.039
Token They
Token ID1119
Feature activation+9.19e-02

Pos logit contributions
apping+0.130
uts+0.129
aps+0.127
uttering+0.125
unch+0.116
Neg logit contributions
 forgot-0.123
 cleaned-0.106
 put-0.104
 saw-0.102
 also-0.100
Token had
Token ID550
Feature activation+3.33e-02

Pos logit contributions
 punished+0.163
 mess+0.148
 worse+0.145
 wounds+0.139
 colder+0.134
Neg logit contributions
apping-0.175
apped-0.171
aps-0.162
uttering-0.161
 This-0.156
Token a
Token ID257
Feature activation+4.14e-02

Pos logit contributions
 forgot+0.072
 punished+0.068
 cleaned+0.065
 sighed+0.058
 checked+0.057
Neg logit contributions
aps-0.066
apping-0.064
 This-0.058
uttering-0.057
uts-0.056
Token lot
Token ID1256
Feature activation+1.28e-02

Pos logit contributions
 forgot+0.075
 punished+0.068
 cleaned+0.067
 sighed+0.058
 realized+0.056
Neg logit contributions
aps-0.098
apping-0.096
uttering-0.088
 This-0.085
uts-0.083
Token of
Token ID286
Feature activation+1.88e-02

Pos logit contributions
 forgot+0.027
 cleaned+0.026
 punished+0.024
 put+0.023
 locked+0.022
Neg logit contributions
aps-0.025
apping-0.024
uttering-0.022
 This-0.022
unch-0.022
Token fun
Token ID1257
Feature activation-4.17e-02

Pos logit contributions
 forgot+0.031
 punished+0.027
 cleaned+0.027
 realized+0.022
 put+0.022
Neg logit contributions
aps-0.049
apping-0.048
uttering-0.044
uts-0.043
 This-0.043
Token,
Token ID11
Feature activation+1.38e-02

Pos logit contributions
aps+0.026
apping+0.025
uts+0.024
 This+0.024
unch+0.023
Neg logit contributions
 forgot-0.020
 cleaned-0.018
 punished-0.017
 put-0.016
 saw-0.016
Token John
Token ID1757
Feature activation+7.09e-02

Pos logit contributions
 forgot+0.029
 punished+0.026
 cleaned+0.025
 realized+0.022
 sighed+0.022
Neg logit contributions
aps-0.029
apping-0.029
 This-0.025
uttering-0.025
uts-0.025
Token was
Token ID373
Feature activation+3.68e-02

Pos logit contributions
 forgot+0.088
 punished+0.077
 cleaned+0.073
ts+0.058
 nurses+0.057
Neg logit contributions
aps-0.198
apping-0.195
 This-0.179
uts-0.177
apped-0.175
Token so
Token ID523
Feature activation+3.18e-02

Pos logit contributions
 forgot+0.072
 punished+0.062
 cleaned+0.062
 realized+0.054
 sighed+0.054
Neg logit contributions
aps-0.085
apping-0.084
uttering-0.074
 This-0.073
uts-0.073
Token excited
Token ID6568
Feature activation-4.66e-02

Pos logit contributions
 forgot+0.068
 cleaned+0.061
 punished+0.061
 put+0.053
 realized+0.053
Neg logit contributions
aps-0.067
apping-0.066
uttering-0.058
uts-0.058
 This-0.058
Token he
Token ID339
Feature activation+9.61e-02

Pos logit contributions
aps+0.100
apping+0.096
uttering+0.086
apped+0.084
uts+0.082
Neg logit contributions
 forgot-0.100
 cleaned-0.089
 punished-0.087
 put-0.083
 felt-0.079
Token could
Token ID714
Feature activation+1.78e-02

Pos logit contributions
 forgot+0.118
 punished+0.117
 cleaned+0.109
 nurses+0.101
wy+0.097
Neg logit contributions
aps-0.241
apping-0.240
uts-0.231
 This-0.222
apped-0.222
Token barely
Token ID8523
Feature activation+3.31e-02

Pos logit contributions
 forgot+0.031
 cleaned+0.027
 punished+0.027
 put+0.023
 realized+0.023
Neg logit contributions
aps-0.044
apping-0.044
uttering-0.040
uts-0.039
 This-0.039
Token contain
Token ID3994
Feature activation-6.76e-02

Pos logit contributions
 forgot+0.060
 punished+0.057
 cleaned+0.054
 mess+0.048
 regretted+0.047
Neg logit contributions
aps-0.072
apping-0.070
 This-0.065
uts-0.064
uttering-0.064
Token himself
Token ID2241
Feature activation+4.76e-02

Pos logit contributions
apping+0.172
aps+0.172
uttering+0.157
uts+0.155
apped+0.153
Neg logit contributions
 forgot-0.109
 cleaned-0.098
 punished-0.097
 put-0.089
 realized-0.084
Token.
Token ID13
Feature activation-5.99e-02

Pos logit contributions
 forgot+0.097
 punished+0.087
 cleaned+0.085
 put+0.082
 felt+0.079
Neg logit contributions
aps-0.100
apping-0.097
uttering-0.090
apped-0.088
 This-0.086
Token became
Token ID2627
Feature activation-4.98e-03

Pos logit contributions
 forgot+0.010
 punished+0.009
 cleaned+0.009
wy+0.008
ty+0.008
Neg logit contributions
apping-0.018
aps-0.018
apped-0.017
uts-0.017
 This-0.016
Token a
Token ID257
Feature activation+3.38e-02

Pos logit contributions
aps+0.012
apping+0.011
uttering+0.010
apped+0.010
 This+0.010
Neg logit contributions
 forgot-0.009
 cleaned-0.008
 punished-0.008
 sighed-0.007
 checked-0.007
Token happy
Token ID3772
Feature activation-5.76e-03

Pos logit contributions
 forgot+0.060
 cleaned+0.055
 punished+0.055
wy+0.053
ty+0.052
Neg logit contributions
aps-0.074
apping-0.074
 This-0.065
uttering-0.064
uts-0.062
Token place
Token ID1295
Feature activation-2.89e-02

Pos logit contributions
aps+0.014
apping+0.014
uttering+0.012
uts+0.012
 This+0.012
Neg logit contributions
 forgot-0.011
 punished-0.009
 cleaned-0.009
 sighed-0.008
 put-0.008
Token where
Token ID810
Feature activation-3.54e-03

Pos logit contributions
aps+0.054
apping+0.052
uttering+0.045
uts+0.045
 This+0.045
Neg logit contributions
 forgot-0.069
 cleaned-0.062
 punished-0.062
 put-0.055
 realized-0.055
Token everyone
Token ID2506
Feature activation+5.52e-02

Pos logit contributions
aps+0.007
apping+0.007
 This+0.006
uts+0.006
uttering+0.006
Neg logit contributions
 forgot-0.008
 punished-0.007
 cleaned-0.007
 sighed-0.006
 checked-0.006
Token learned
Token ID4499
Feature activation-2.58e-02

Pos logit contributions
 forgot+0.072
 punished+0.068
 cleaned+0.063
 nurses+0.054
ty+0.053
Neg logit contributions
aps-0.147
apping-0.146
uts-0.134
 This-0.133
apped-0.130
Token and
Token ID290
Feature activation+1.53e-02

Pos logit contributions
aps+0.055
apping+0.054
uttering+0.048
uts+0.048
 This+0.047
Neg logit contributions
 forgot-0.054
 cleaned-0.049
 punished-0.049
 put-0.043
 realized-0.042
Token played
Token ID2826
Feature activation-6.50e-02

Pos logit contributions
 forgot+0.020
 punished+0.018
 cleaned+0.017
wy+0.014
ty+0.014
Neg logit contributions
aps-0.042
apping-0.041
apped-0.038
 This-0.038
uts-0.037
Token together
Token ID1978
Feature activation-2.57e-03

Pos logit contributions
aps+0.117
apping+0.115
uts+0.100
uttering+0.100
 This+0.098
Neg logit contributions
 forgot-0.157
 cleaned-0.143
 punished-0.143
 put-0.128
 realized-0.127
Token.
Token ID13
Feature activation-6.93e-02

Pos logit contributions
aps+0.006
apping+0.005
uttering+0.005
unch+0.005
uts+0.005
Neg logit contributions
 forgot-0.005
 cleaned-0.004
 punished-0.004
 put-0.004
 felt-0.004
Token
Token ID198
Feature activation-4.50e-02

Pos logit contributions
uttering+0.137
apping+0.137
aps+0.131
ectar+0.121
 Nice+0.120
Neg logit contributions
 forgot-0.157
 cleaned-0.144
 punished-0.141
 put-0.132
 fixed-0.130
Token
Token ID198
Feature activation-4.49e-02

Pos logit contributions
aps+0.071
apping+0.071
uttering+0.062
uts+0.059
apped+0.058
Neg logit contributions
 forgot-0.116
 cleaned-0.106
 punished-0.105
 realized-0.100
 hoped-0.098
TokenAfter
Token ID3260
Feature activation-1.66e-02

Pos logit contributions
aps+0.103
apping+0.102
uttering+0.091
uts+0.091
 This+0.090
Neg logit contributions
 forgot-0.087
 cleaned-0.077
 punished-0.077
 put-0.066
 sighed-0.065
Token that
Token ID326
Feature activation+2.25e-02

Pos logit contributions
aps+0.040
apping+0.039
uttering+0.036
uts+0.035
unch+0.034
Neg logit contributions
 forgot-0.031
 cleaned-0.028
 punished-0.028
 put-0.025
 realized-0.024
Token,
Token ID11
Feature activation+6.81e-03

Pos logit contributions
 forgot+0.051
 cleaned+0.046
 punished+0.045
 put+0.041
 realized+0.040
Neg logit contributions
aps-0.045
apping-0.044
uttering-0.039
uts-0.039
unch-0.037
Token Andrew
Token ID6858
Feature activation+6.86e-02

Pos logit contributions
 forgot+0.014
 cleaned+0.013
 punished+0.013
 put+0.011
 sighed+0.011
Neg logit contributions
aps-0.015
apping-0.014
 This-0.013
uttering-0.013
uts-0.013
Token learnt
Token ID26338
Feature activation-1.76e-02

Pos logit contributions
 forgot+0.091
 nurses+0.088
 colder+0.087
 matter+0.087
 punished+0.087
Neg logit contributions
aps-0.151
apping-0.146
uts-0.135
apped-0.134
 This-0.128
Token his
Token ID465
Feature activation+2.33e-02

Pos logit contributions
aps+0.040
apping+0.039
uttering+0.035
uts+0.035
 This+0.034
Neg logit contributions
 forgot-0.035
 punished-0.031
 cleaned-0.031
 put-0.027
 realized-0.027
Token lesson
Token ID11483
Feature activation-3.54e-02

Pos logit contributions
 forgot+0.050
 punished+0.044
 cleaned+0.044
 checked+0.041
 replaced+0.040
Neg logit contributions
aps-0.048
apping-0.046
uttering-0.040
uts-0.040
 This-0.039
Token and
Token ID290
Feature activation+1.38e-02

Pos logit contributions
aps+0.072
apping+0.071
uttering+0.063
uts+0.062
 This+0.061
Neg logit contributions
 forgot-0.078
 cleaned-0.070
 punished-0.070
 put-0.061
 realized-0.061
Token never
Token ID1239
Feature activation+4.12e-02

Pos logit contributions
 forgot+0.018
wy+0.016
 punished+0.016
 cleaned+0.015
ns+0.014
Neg logit contributions
aps-0.035
apping-0.034
apped-0.032
 This-0.031
uts-0.031
Token named
Token ID3706
Feature activation-4.10e-02

Pos logit contributions
 forgot+0.079
 cleaned+0.070
 punished+0.068
 put+0.059
 locked+0.057
Neg logit contributions
aps-0.124
apping-0.122
uts-0.112
uttering-0.111
 This-0.109
Token Lily
Token ID20037
Feature activation+6.59e-02

Pos logit contributions
aps+0.111
apping+0.109
uts+0.100
uttering+0.100
unch+0.098
Neg logit contributions
 forgot-0.068
 cleaned-0.061
 punished-0.054
 put-0.052
 locked-0.049
Token.
Token ID13
Feature activation-4.46e-02

Pos logit contributions
 forgot+0.160
 punished+0.146
 cleaned+0.145
 sighed+0.129
 realized+0.129
Neg logit contributions
aps-0.119
apping-0.117
uttering-0.101
uts-0.100
 This-0.100
Token She
Token ID1375
Feature activation+1.14e-01

Pos logit contributions
apping+0.086
aps+0.084
uttering+0.083
uts+0.080
utes+0.077
Neg logit contributions
 forgot-0.100
 cleaned-0.095
 put-0.086
 felt-0.084
 had-0.084
Token had
Token ID550
Feature activation+4.80e-02

Pos logit contributions
 punished+0.163
 forgot+0.162
 cleaned+0.137
 nurses+0.133
 sighed+0.121
Neg logit contributions
aps-0.280
apping-0.278
 This-0.268
uts-0.257
apped-0.251
Token a
Token ID257
Feature activation+5.44e-02

Pos logit contributions
 forgot+0.092
 cleaned+0.082
 punished+0.081
 put+0.071
 realized+0.070
Neg logit contributions
aps-0.112
apping-0.110
uttering-0.098
uts-0.098
 This-0.096
Token furry
Token ID40857
Feature activation+1.07e-02

Pos logit contributions
 forgot+0.100
 punished+0.089
 cleaned+0.087
 sighed+0.076
 realized+0.075
Neg logit contributions
aps-0.126
apping-0.124
uttering-0.114
 This-0.112
uts-0.108
Token t
Token ID256
Feature activation-4.44e-02

Pos logit contributions
 forgot+0.018
 cleaned+0.016
 punished+0.016
 put+0.014
 realized+0.013
Neg logit contributions
aps-0.027
apping-0.026
uttering-0.024
uts-0.024
 This-0.023
Tokeneddy
Token ID21874
Feature activation+1.82e-02

Pos logit contributions
aps+0.059
apping+0.058
 This+0.049
uts+0.047
uttering+0.047
Neg logit contributions
 forgot-0.124
 cleaned-0.115
 punished-0.114
 put-0.105
 realized-0.104
Token bear
Token ID6842
Feature activation+5.47e-03

Pos logit contributions
 forgot+0.025
 punished+0.021
 cleaned+0.020
ts+0.015
 sighed+0.015
Neg logit contributions
aps-0.050
apping-0.048
 This-0.045
apped-0.044
 Nice-0.043
Token that
Token ID326
Feature activation+4.45e-02

Pos logit contributions
 forgot+0.009
 cleaned+0.008
 punished+0.008
 put+0.007
 locked+0.007
Neg logit contributions
aps-0.014
apping-0.014
uttering-0.013
uts-0.013
unch-0.012
Token flexible
Token ID12846
Feature activation-3.87e-02

Pos logit contributions
 forgot+0.021
 punished+0.018
 cleaned+0.018
 sighed+0.015
 realized+0.015
Neg logit contributions
aps-0.037
apping-0.036
uttering-0.033
uts-0.033
 This-0.033
Token -
Token ID532
Feature activation+2.06e-05

Pos logit contributions
aps+0.074
apping+0.072
uts+0.064
 This+0.063
uttering+0.062
Neg logit contributions
 forgot-0.089
 punished-0.081
 cleaned-0.079
 sighed-0.072
 realized-0.071
Token it
Token ID340
Feature activation+5.85e-02

Pos logit contributions
 forgot+0.000
 cleaned+0.000
 punished+0.000
 put+0.000
 realized+0.000
Neg logit contributions
aps-0.000
apping-0.000
uttering-0.000
uts-0.000
 This-0.000
Token could
Token ID714
Feature activation+6.47e-03

Pos logit contributions
 punished+0.083
 forgot+0.079
 nurses+0.077
 cleaned+0.069
wy+0.068
Neg logit contributions
 This-0.129
aps-0.128
uts-0.127
apping-0.126
apped-0.115
Token move
Token ID1445
Feature activation-4.44e-02

Pos logit contributions
 forgot+0.010
 punished+0.009
 cleaned+0.009
 regretted+0.008
 sighed+0.008
Neg logit contributions
aps-0.016
apping-0.015
 This-0.015
uts-0.015
uttering-0.014
Token and
Token ID290
Feature activation+2.70e-02

Pos logit contributions
aps+0.090
apping+0.087
 This+0.078
uts+0.077
uttering+0.076
Neg logit contributions
 forgot-0.098
 punished-0.088
 cleaned-0.087
 sighed-0.077
 checked-0.076
Token bend
Token ID19396
Feature activation-4.57e-02

Pos logit contributions
 forgot+0.044
 punished+0.039
 cleaned+0.038
 sighed+0.032
 realized+0.032
Neg logit contributions
aps-0.068
apping-0.066
 This-0.062
uts-0.060
uttering-0.059
Token,
Token ID11
Feature activation+7.36e-03

Pos logit contributions
aps+0.096
apping+0.094
uttering+0.083
uts+0.082
unch+0.082
Neg logit contributions
 forgot-0.097
 cleaned-0.088
 punished-0.087
 put-0.077
 felt-0.076
Token and
Token ID290
Feature activation+3.01e-02

Pos logit contributions
 forgot+0.014
 punished+0.013
 cleaned+0.013
 sighed+0.011
 checked+0.011
Neg logit contributions
aps-0.016
apping-0.016
 This-0.015
uts-0.015
uttering-0.014
Token Sam
Token ID3409
Feature activation+7.08e-02

Pos logit contributions
 forgot+0.053
 punished+0.048
 cleaned+0.047
 sighed+0.040
 regretted+0.039
Neg logit contributions
aps-0.071
apping-0.068
 This-0.064
uts-0.062
uttering-0.061
Token was
Token ID373
Feature activation+3.17e-02

Pos logit contributions
 forgot+0.090
 punished+0.089
 nurses+0.083
 cleaned+0.081
 mess+0.072
Neg logit contributions
aps-0.182
apping-0.179
uts-0.173
apped-0.166
 This-0.162
Token
Token ID198
Feature activation-4.19e-02

Pos logit contributions
apping+0.079
aps+0.075
uttering+0.073
apped+0.066
uts+0.065
Neg logit contributions
 forgot-0.109
 cleaned-0.104
 punished-0.098
 put-0.095
 realized-0.093
TokenThey
Token ID2990
Feature activation+9.26e-02

Pos logit contributions
aps+0.069
apping+0.067
uts+0.057
 This+0.056
uttering+0.055
Neg logit contributions
 forgot-0.108
 punished-0.098
 cleaned-0.097
 checked-0.088
 hoped-0.088
Token were
Token ID547
Feature activation+3.46e-02

Pos logit contributions
nie+0.148
wy+0.144
rated+0.142
 matter+0.137
 favor+0.134
Neg logit contributions
apped-0.165
aps-0.157
apping-0.148
 Wo-0.140
ucked-0.137
Token not
Token ID407
Feature activation+4.86e-02

Pos logit contributions
 forgot+0.061
 cleaned+0.052
 punished+0.050
'd+0.049
 would+0.048
Neg logit contributions
aps-0.084
apping-0.082
uts-0.072
apped-0.071
uttering-0.071
Token hurt
Token ID5938
Feature activation-5.91e-02

Pos logit contributions
 forgot+0.084
 punished+0.073
 cleaned+0.073
 sighed+0.066
 realized+0.065
Neg logit contributions
aps-0.117
apping-0.117
uts-0.104
uttering-0.103
apped-0.098
Token,
Token ID11
Feature activation+8.75e-03

Pos logit contributions
aps+0.128
apping+0.123
uttering+0.115
 This+0.115
unch+0.114
Neg logit contributions
 forgot-0.118
 punished-0.106
 cleaned-0.105
 put-0.095
 locked-0.094
Token but
Token ID475
Feature activation+2.71e-02

Pos logit contributions
 forgot+0.018
 punished+0.017
 cleaned+0.017
 repaired+0.015
 checked+0.015
Neg logit contributions
apping-0.017
aps-0.016
uts-0.015
 This-0.014
uttering-0.014
Token they
Token ID484
Feature activation+9.42e-02

Pos logit contributions
 forgot+0.070
 punished+0.068
 cleaned+0.065
wy+0.061
 regretted+0.060
Neg logit contributions
aps-0.038
apping-0.038
apped-0.031
 This-0.029
uts-0.029
Token were
Token ID547
Feature activation+2.03e-02

Pos logit contributions
 nurses+0.172
nie+0.170
 colder+0.169
 punished+0.169
 mess+0.167
Neg logit contributions
apped-0.147
apping-0.147
aps-0.143
uts-0.137
unch-0.119
Token sad
Token ID6507
Feature activation-5.23e-02

Pos logit contributions
 forgot+0.038
 cleaned+0.032
 punished+0.031
 realized+0.031
 sighed+0.030
Neg logit contributions
apping-0.046
aps-0.046
uttering-0.040
uts-0.039
apped-0.037
Token.
Token ID13
Feature activation-7.97e-02

Pos logit contributions
aps+0.109
apping+0.109
uttering+0.100
 This+0.098
unch+0.095
Neg logit contributions
 forgot-0.110
 cleaned-0.099
 punished-0.096
 put-0.090
 felt-0.088
Token and
Token ID290
Feature activation+1.76e-02

Pos logit contributions
aps+0.134
apping+0.131
unch+0.121
uttering+0.119
uts+0.116
Neg logit contributions
 forgot-0.133
 cleaned-0.123
 punished-0.117
 put-0.111
 felt-0.107
Token laughed
Token ID13818
Feature activation-9.16e-02

Pos logit contributions
 forgot+0.023
 punished+0.022
 cleaned+0.020
 realized+0.016
 regretted+0.016
Neg logit contributions
aps-0.047
apping-0.046
apped-0.042
uts-0.042
 This-0.042
Token.
Token ID13
Feature activation-8.69e-02

Pos logit contributions
aps+0.230
unch+0.223
apping+0.223
imb+0.219
azes+0.218
Neg logit contributions
 forgot-0.129
 cleaned-0.123
 put-0.121
 even-0.111
 felt-0.109
Token They
Token ID1119
Feature activation+8.72e-02

Pos logit contributions
uttering+0.187
apping+0.186
unch+0.173
aps+0.171
iked+0.161
Neg logit contributions
 forgot-0.167
 cleaned-0.156
 put-0.145
 hoped-0.140
 washed-0.137
Token forgot
Token ID16453
Feature activation-5.44e-03

Pos logit contributions
 punished+0.123
nie+0.119
ty+0.114
 nurses+0.111
sie+0.110
Neg logit contributions
apped-0.173
aps-0.171
uts-0.163
apping-0.162
 Wo-0.148
Token about
Token ID546
Feature activation-8.23e-03

Pos logit contributions
aps+0.010
apping+0.010
uttering+0.008
uts+0.008
 This+0.008
Neg logit contributions
 forgot-0.013
 cleaned-0.012
 punished-0.012
 put-0.011
 realized-0.011
Token their
Token ID511
Feature activation+3.59e-02

Pos logit contributions
aps+0.020
apping+0.019
uttering+0.017
uts+0.017
 This+0.017
Neg logit contributions
 forgot-0.015
 punished-0.014
 cleaned-0.014
 put-0.012
 realized-0.011
Token snow
Token ID6729
Feature activation-6.24e-02

Pos logit contributions
 forgot+0.063
 punished+0.054
 cleaned+0.050
 realized+0.046
 managed+0.043
Neg logit contributions
aps-0.087
apping-0.082
uttering-0.077
 Today-0.075
 This-0.074
Tokenman
Token ID805
Feature activation-5.57e-03

Pos logit contributions
aps+0.143
apping+0.139
uts+0.126
uttering+0.124
 This+0.122
Neg logit contributions
 forgot-0.122
 punished-0.109
 cleaned-0.107
 realized-0.093
 sighed-0.092
Token.
Token ID13
Feature activation-8.50e-02

Pos logit contributions
aps+0.010
apping+0.010
uts+0.009
uttering+0.009
 This+0.009
Neg logit contributions
 forgot-0.013
 punished-0.012
 cleaned-0.012
 put-0.011
 realized-0.011
Token They
Token ID1119
Feature activation+8.65e-02

Pos logit contributions
uttering+0.196
apping+0.188
iked+0.177
unch+0.169
aps+0.169
Neg logit contributions
 forgot-0.147
 cleaned-0.141
 hoped-0.129
 put-0.126
 also-0.124
Token of
Token ID286
Feature activation+1.83e-02

Pos logit contributions
aps+0.008
apping+0.008
uttering+0.008
apped+0.007
unch+0.007
Neg logit contributions
 forgot-0.009
 cleaned-0.008
 punished-0.008
 put-0.007
ty-0.007
Token food
Token ID2057
Feature activation-2.32e-02

Pos logit contributions
 forgot+0.036
 punished+0.031
 cleaned+0.030
 hoped+0.027
 realized+0.026
Neg logit contributions
aps-0.042
apping-0.039
 This-0.037
uts-0.035
uttering-0.035
Token.
Token ID13
Feature activation-5.84e-02

Pos logit contributions
aps+0.047
apping+0.046
uttering+0.040
 This+0.039
uts+0.039
Neg logit contributions
 forgot-0.051
 cleaned-0.047
 punished-0.046
 put-0.042
 locked-0.041
Token Every
Token ID3887
Feature activation+4.24e-03

Pos logit contributions
apping+0.129
uttering+0.127
aps+0.124
uts+0.113
apped+0.109
Neg logit contributions
 forgot-0.112
 cleaned-0.105
 put-0.099
 punished-0.096
 checked-0.096
Token day
Token ID1110
Feature activation-1.50e-02

Pos logit contributions
 forgot+0.009
 punished+0.008
 cleaned+0.008
 put+0.007
 realized+0.007
Neg logit contributions
aps-0.009
apping-0.009
uttering-0.008
uts-0.008
 This-0.008
Token,
Token ID11
Feature activation+9.92e-03

Pos logit contributions
aps+0.034
apping+0.033
uttering+0.030
uts+0.029
 This+0.029
Neg logit contributions
 forgot-0.031
 cleaned-0.027
 punished-0.027
 put-0.024
 realized-0.024
Token the
Token ID262
Feature activation+4.39e-02

Pos logit contributions
 forgot+0.019
 punished+0.017
 cleaned+0.017
 sighed+0.014
 realized+0.014
Neg logit contributions
aps-0.023
apping-0.022
 This-0.020
uts-0.020
uttering-0.020
Token musician
Token ID21623
Feature activation+5.52e-03

Pos logit contributions
 forgot+0.095
 punished+0.086
 cleaned+0.086
 checked+0.077
 locked+0.077
Neg logit contributions
apping-0.084
aps-0.080
uttering-0.073
uts-0.072
 This-0.070
Token would
Token ID561
Feature activation+1.19e-02

Pos logit contributions
 forgot+0.006
 punished+0.006
 colder+0.005
 cleaned+0.005
 mess+0.005
Neg logit contributions
apping-0.014
aps-0.013
 This-0.013
uts-0.013
apped-0.013
Token eat
Token ID4483
Feature activation-4.42e-02

Pos logit contributions
 forgot+0.021
 punished+0.018
 cleaned+0.018
 sighed+0.016
 regretted+0.016
Neg logit contributions
aps-0.027
apping-0.026
 This-0.025
uts-0.024
unch-0.023
Token y
Token ID331
Feature activation-1.21e-01

Pos logit contributions
aps+0.103
apping+0.100
 This+0.091
uts+0.090
uttering+0.090
Neg logit contributions
 forgot-0.084
 punished-0.073
 cleaned-0.072
 sighed-0.062
 realized-0.062
Token a
Token ID257
Feature activation+2.62e-02

Pos logit contributions
 forgot+0.031
 cleaned+0.027
 punished+0.026
 put+0.024
 felt+0.023
Neg logit contributions
aps-0.039
apping-0.039
uttering-0.035
uts-0.034
 This-0.034
Token time
Token ID640
Feature activation-5.51e-03

Pos logit contributions
 forgot+0.047
 punished+0.042
 cleaned+0.041
 realized+0.035
 sighed+0.035
Neg logit contributions
aps-0.063
apping-0.063
uttering-0.056
 This-0.056
uts-0.055
Token,
Token ID11
Feature activation+1.17e-02

Pos logit contributions
aps+0.014
apping+0.014
uttering+0.014
itted+0.013
anks+0.013
Neg logit contributions
 forgot-0.008
 saw-0.007
 cleaned-0.006
 felt-0.006
 found-0.006
Token there
Token ID612
Feature activation+3.69e-02

Pos logit contributions
 forgot+0.021
 cleaned+0.018
 punished+0.017
 put+0.016
 saw+0.016
Neg logit contributions
aps-0.029
apping-0.028
uts-0.026
uttering-0.026
unch-0.025
Token was
Token ID373
Feature activation+2.82e-02

Pos logit contributions
 forgot+0.044
 punished+0.036
 cleaned+0.036
 put+0.027
 realized+0.027
Neg logit contributions
aps-0.112
apping-0.111
uttering-0.102
uts-0.102
 This-0.101
Token a
Token ID257
Feature activation+4.96e-02

Pos logit contributions
 forgot+0.040
 cleaned+0.039
 punished+0.033
 put+0.033
 saw+0.031
Neg logit contributions
aps-0.077
apping-0.076
uts-0.071
uttering-0.068
unch-0.066
Token little
Token ID1310
Feature activation+1.81e-02

Pos logit contributions
 forgot+0.079
 punished+0.072
 cleaned+0.069
 sighed+0.058
 checked+0.057
Neg logit contributions
apping-0.126
aps-0.125
uttering-0.113
 This-0.110
unch-0.109
Token girl
Token ID2576
Feature activation+4.90e-02

Pos logit contributions
 forgot+0.028
 punished+0.024
 cleaned+0.023
 hoped+0.019
 sighed+0.019
Neg logit contributions
aps-0.049
apping-0.049
uttering-0.044
 This-0.043
apped-0.043
Token named
Token ID3706
Feature activation-3.83e-02

Pos logit contributions
 forgot+0.087
 punished+0.076
 cleaned+0.076
 put+0.064
 realized+0.064
Neg logit contributions
aps-0.120
apping-0.118
uttering-0.106
uts-0.106
 This-0.106
Token Lily
Token ID20037
Feature activation+6.57e-02

Pos logit contributions
aps+0.099
apping+0.097
uts+0.089
uttering+0.088
unch+0.087
Neg logit contributions
 forgot-0.067
 cleaned-0.060
 punished-0.054
 put-0.052
 locked-0.049
Token.
Token ID13
Feature activation-4.57e-02

Pos logit contributions
 forgot+0.169
 punished+0.156
 cleaned+0.152
 sighed+0.139
 realized+0.138
Neg logit contributions
aps-0.108
apping-0.105
 This-0.089
uts-0.088
uttering-0.088
Token knew
Token ID2993
Feature activation-1.83e-02

Pos logit contributions
 forgot+0.061
 punished+0.055
 cleaned+0.054
 sighed+0.047
 checked+0.047
Neg logit contributions
aps-0.064
apping-0.063
 This-0.057
uttering-0.054
uts-0.054
Token it
Token ID340
Feature activation+3.87e-02

Pos logit contributions
aps+0.041
apping+0.041
uttering+0.036
uts+0.036
 This+0.036
Neg logit contributions
 forgot-0.036
 cleaned-0.032
 punished-0.032
 put-0.028
 realized-0.027
Token was
Token ID373
Feature activation+7.80e-03

Pos logit contributions
 forgot+0.051
 punished+0.046
 cleaned+0.043
 sighed+0.035
 nurses+0.035
Neg logit contributions
aps-0.103
apping-0.102
 This-0.098
uts-0.097
uttering-0.092
Token made
Token ID925
Feature activation-1.04e-02

Pos logit contributions
 forgot+0.014
 punished+0.012
 cleaned+0.012
 sighed+0.011
 realized+0.011
Neg logit contributions
aps-0.017
apping-0.017
 This-0.016
uttering-0.016
uts-0.015
Token with
Token ID351
Feature activation-4.10e-03

Pos logit contributions
aps+0.021
apping+0.021
uttering+0.019
uts+0.018
 This+0.018
Neg logit contributions
 forgot-0.022
 punished-0.020
 cleaned-0.020
 put-0.018
 realized-0.018
Token love
Token ID1842
Feature activation-4.40e-02

Pos logit contributions
aps+0.009
apping+0.009
 This+0.008
uttering+0.008
uts+0.008
Neg logit contributions
 forgot-0.008
 punished-0.007
 cleaned-0.007
 sighed-0.006
 checked-0.006
Token.
Token ID13
Feature activation-6.87e-02

Pos logit contributions
aps+0.091
apping+0.089
uttering+0.080
uts+0.078
unch+0.078
Neg logit contributions
 forgot-0.094
 cleaned-0.086
 punished-0.084
 put-0.076
 realized-0.074
Token<|endoftext|>
Token ID50256
Feature activation-5.36e-02

Pos logit contributions
uttering+0.122
apping+0.118
unch+0.109
aps+0.109
uts+0.105
Neg logit contributions
 forgot-0.160
 cleaned-0.149
 felt-0.138
 hoped-0.137
 learned-0.136
TokenOnce
Token ID7454
Feature activation-4.08e-03

Pos logit contributions
apping+0.112
uttering+0.105
aps+0.105
uts+0.102
itted+0.095
Neg logit contributions
 forgot-0.104
 cleaned-0.098
 saw-0.098
 had-0.095
 felt-0.095
Token upon
Token ID2402
Feature activation+1.90e-02

Pos logit contributions
aps+0.008
apping+0.008
uts+0.007
uttering+0.007
 This+0.007
Neg logit contributions
 forgot-0.009
 cleaned-0.008
 punished-0.008
 put-0.007
 realized-0.007
Token a
Token ID257
Feature activation+3.31e-02

Pos logit contributions
 forgot+0.037
 cleaned+0.033
 punished+0.033
 put+0.028
 realized+0.028
Neg logit contributions
aps-0.043
apping-0.043
uttering-0.038
uts-0.038
 This-0.038
Token her
Token ID607
Feature activation+4.54e-02

Pos logit contributions
 punished+0.083
 nurses+0.080
sie+0.080
wy+0.080
 forgot+0.076
Neg logit contributions
apping-0.130
aps-0.127
apped-0.120
uts-0.119
 This-0.112
Token bear
Token ID6842
Feature activation-1.31e-02

Pos logit contributions
 forgot+0.085
 punished+0.075
 cleaned+0.075
 realized+0.064
 put+0.064
Neg logit contributions
aps-0.107
apping-0.104
uttering-0.094
uts-0.093
 This-0.093
Token a
Token ID257
Feature activation+4.66e-02

Pos logit contributions
aps+0.030
apping+0.029
uts+0.026
 This+0.026
uttering+0.026
Neg logit contributions
 forgot-0.025
 punished-0.022
 cleaned-0.022
 sighed-0.019
 put-0.019
Token big
Token ID1263
Feature activation+1.22e-02

Pos logit contributions
 forgot+0.078
 punished+0.068
 cleaned+0.067
 hoped+0.057
 sighed+0.057
Neg logit contributions
aps-0.115
apping-0.113
 This-0.102
uttering-0.102
uts-0.099
Token hug
Token ID16225
Feature activation-6.68e-02

Pos logit contributions
 forgot+0.023
 punished+0.020
 cleaned+0.020
 hoped+0.017
 sighed+0.017
Neg logit contributions
aps-0.029
apping-0.028
 This-0.025
 Today-0.025
uttering-0.024
Token.
Token ID13
Feature activation-6.87e-02

Pos logit contributions
aps+0.145
apping+0.140
uttering+0.127
uts+0.122
 This+0.120
Neg logit contributions
 forgot-0.139
 cleaned-0.126
 punished-0.124
 put-0.113
 realized-0.110
Token He
Token ID679
Feature activation+1.06e-01

Pos logit contributions
uttering+0.139
apping+0.132
aps+0.131
uts+0.115
ings+0.111
Neg logit contributions
 forgot-0.142
 cleaned-0.130
 put-0.126
 hoped-0.119
 saw-0.119
Token was
Token ID373
Feature activation+3.23e-02

Pos logit contributions
 forgot+0.135
 punished+0.129
 cleaned+0.124
ts+0.112
wy+0.110
Neg logit contributions
aps-0.262
apping-0.238
 This-0.236
uts-0.226
apped-0.226
Token so
Token ID523
Feature activation+2.67e-02

Pos logit contributions
 forgot+0.061
 cleaned+0.053
 punished+0.052
 sighed+0.046
 realized+0.046
Neg logit contributions
aps-0.075
apping-0.074
uttering-0.065
 This-0.065
uts-0.064
Token soft
Token ID2705
Feature activation-4.10e-02

Pos logit contributions
 forgot+0.046
 punished+0.040
 cleaned+0.040
 put+0.034
 realized+0.033
Neg logit contributions
aps-0.068
apping-0.066
uttering-0.060
uts-0.060
 This-0.060
Token and
Token ID290
Feature activation+3.38e-02

Pos logit contributions
aps+0.088
apping+0.087
uts+0.078
 This+0.077
uttering+0.076
Neg logit contributions
 forgot-0.085
 punished-0.076
 cleaned-0.075
 sighed-0.064
 realized-0.064
Token his
Token ID465
Feature activation+4.24e-02

Pos logit contributions
 forgot+0.067
 punished+0.058
 cleaned+0.056
 sighed+0.049
 regretted+0.048
Neg logit contributions
aps-0.099
apping-0.096
 This-0.089
uttering-0.086
uts-0.084
Token grandmother
Token ID18410
Feature activation+2.60e-02

Pos logit contributions
 forgot+0.072
 cleaned+0.063
 punished+0.062
 checked+0.052
 sighed+0.052
Neg logit contributions
aps-0.108
apping-0.105
uttering-0.095
 This-0.095
uts-0.094
Token's
Token ID338
Feature activation+2.90e-02

Pos logit contributions
 forgot+0.040
 cleaned+0.035
 punished+0.035
 put+0.029
 realized+0.028
Neg logit contributions
aps-0.070
apping-0.068
uttering-0.063
uts-0.063
 This-0.061
Token house
Token ID2156
Feature activation-2.00e-02

Pos logit contributions
 forgot+0.054
 punished+0.047
 cleaned+0.047
 realized+0.040
 checked+0.040
Neg logit contributions
aps-0.070
apping-0.068
uttering-0.061
 This-0.061
uts-0.060
Token.
Token ID13
Feature activation-6.50e-02

Pos logit contributions
aps+0.035
apping+0.034
uts+0.030
 This+0.029
uttering+0.029
Neg logit contributions
 forgot-0.049
 punished-0.044
 cleaned-0.044
 sighed-0.039
 realized-0.039
Token<|endoftext|>
Token ID50256
Feature activation-5.70e-02

Pos logit contributions
uttering+0.129
apping+0.127
aps+0.119
uts+0.109
utes+0.105
Neg logit contributions
 forgot-0.131
 cleaned-0.126
 checked-0.115
 punished-0.114
 put-0.113
TokenOnce
Token ID7454
Feature activation-4.56e-03

Pos logit contributions
apping+0.122
uttering+0.117
uts+0.112
aps+0.112
itted+0.104
Neg logit contributions
 forgot-0.104
 cleaned-0.101
 saw-0.100
 had-0.098
 felt-0.097
Token upon
Token ID2402
Feature activation+1.87e-02

Pos logit contributions
aps+0.009
apping+0.009
uts+0.008
uttering+0.008
 This+0.008
Neg logit contributions
 forgot-0.010
 cleaned-0.009
 punished-0.009
 put-0.008
 realized-0.008
Token a
Token ID257
Feature activation+3.32e-02

Pos logit contributions
 forgot+0.036
 cleaned+0.032
 punished+0.032
 put+0.028
 realized+0.027
Neg logit contributions
aps-0.043
apping-0.042
uttering-0.038
uts-0.038
 This-0.037
Token time
Token ID640
Feature activation-3.29e-03

Pos logit contributions
 forgot+0.059
 punished+0.053
 cleaned+0.052
 realized+0.045
 sighed+0.044
Neg logit contributions
aps-0.080
apping-0.079
uttering-0.072
 This-0.071
uts-0.070
Token,
Token ID11
Feature activation+1.60e-02

Pos logit contributions
aps+0.008
apping+0.008
itted+0.008
uttering+0.008
uts+0.008
Neg logit contributions
 forgot-0.005
 saw-0.004
 cleaned-0.004
 felt-0.004
 had-0.004
Token shouts
Token ID34757
Feature activation-4.41e-02

Pos logit contributions
 forgot+0.207
 cleaned+0.198
 punished+0.188
 repaired+0.169
ty+0.166
Neg logit contributions
aps-0.242
apping-0.204
 This-0.196
uts-0.195
apped-0.184
Token at
Token ID379
Feature activation-2.72e-02

Pos logit contributions
unch+0.091
apping+0.090
aps+0.090
uttering+0.087
imb+0.083
Neg logit contributions
 forgot-0.086
 cleaned-0.078
 realized-0.075
 put-0.075
 saw-0.074
Token the
Token ID262
Feature activation+2.08e-02

Pos logit contributions
aps+0.059
apping+0.059
uts+0.052
uttering+0.052
 This+0.050
Neg logit contributions
 forgot-0.056
 cleaned-0.050
 punished-0.050
 put-0.045
 realized-0.044
Token raven
Token ID37735
Feature activation-2.95e-02

Pos logit contributions
 forgot+0.038
 cleaned+0.034
 punished+0.033
 checked+0.028
 sighed+0.028
Neg logit contributions
aps-0.051
apping-0.049
uttering-0.045
uts-0.044
 This-0.044
Token.
Token ID13
Feature activation-9.22e-02

Pos logit contributions
apping+0.067
uttering+0.065
aps+0.065
unch+0.063
apped+0.061
Neg logit contributions
 forgot-0.053
 bit-0.045
 punished-0.044
 cleaned-0.044
 put-0.044
Token\n
Token ID198
Feature activation-5.65e-02

Pos logit contributions
apping+0.201
uttering+0.193
aps+0.177
ummy+0.170
ugs+0.168
Neg logit contributions
 hoped-0.158
 forgot-0.156
 saw-0.154
 felt-0.153
 put-0.148
Token\n
Token ID198
Feature activation-5.23e-02

Pos logit contributions
apping+0.116
uttering+0.109
apped+0.104
aps+0.103
itted+0.099
Neg logit contributions
 forgot-0.111
 cleaned-0.102
 put-0.101
 saw-0.099
 also-0.099
Token"
Token ID1
Feature activation-7.03e-03

Pos logit contributions
apping+0.123
aps+0.120
uttering+0.114
apped+0.111
uts+0.109
Neg logit contributions
 forgot-0.095
 punished-0.084
 cleaned-0.081
 put-0.076
 realized-0.075
TokenSh
Token ID2484
Feature activation-6.36e-02

Pos logit contributions
aps+0.015
apping+0.014
uttering+0.013
uts+0.013
 This+0.013
Neg logit contributions
 forgot-0.015
 cleaned-0.014
 punished-0.014
 put-0.012
 locked-0.012
Tokenoo
Token ID2238
Feature activation-3.79e-02

Pos logit contributions
aps+0.126
apped+0.121
apping+0.118
uttering+0.117
 This+0.115
Neg logit contributions
 forgot-0.110
ty-0.103
 cleaned-0.102
 punished-0.102
 put-0.090
Token,
Token ID11
Feature activation-1.20e-03

Pos logit contributions
aps+0.075
apping+0.073
uttering+0.064
uts+0.064
 This+0.063
Neg logit contributions
 forgot-0.086
 cleaned-0.077
 punished-0.077
 put-0.068
 realized-0.068
Token toys
Token ID14958
Feature activation-8.14e-03

Pos logit contributions
 forgot+0.094
 cleaned+0.083
 punished+0.081
 put+0.070
 locked+0.068
Neg logit contributions
aps-0.139
apping-0.138
uts-0.124
uttering-0.124
 This-0.123
Token and
Token ID290
Feature activation+1.90e-02

Pos logit contributions
aps+0.015
apping+0.015
uttering+0.013
uts+0.013
 This+0.013
Neg logit contributions
 forgot-0.019
 punished-0.017
 cleaned-0.017
 sighed-0.015
 realized-0.015
Token make
Token ID787
Feature activation-1.16e-02

Pos logit contributions
 punished+0.034
 forgot+0.033
 cleaned+0.031
 sighed+0.029
 regretted+0.029
Neg logit contributions
aps-0.036
 This-0.036
apping-0.034
 Yes-0.033
 Today-0.032
Token up
Token ID510
Feature activation-1.09e-02

Pos logit contributions
aps+0.028
apping+0.027
uttering+0.024
uts+0.024
 This+0.024
Neg logit contributions
 forgot-0.021
 cleaned-0.019
 punished-0.019
 sighed-0.016
 realized-0.016
Token fun
Token ID1257
Feature activation-4.22e-02

Pos logit contributions
aps+0.026
apping+0.026
uttering+0.023
uts+0.023
apped+0.023
Neg logit contributions
 forgot-0.020
 punished-0.018
 cleaned-0.018
 put-0.016
 locked-0.015
Token games
Token ID1830
Feature activation-5.88e-02

Pos logit contributions
aps+0.099
apping+0.099
uttering+0.092
unch+0.089
uts+0.088
Neg logit contributions
 forgot-0.077
 cleaned-0.069
 punished-0.067
 put-0.061
 mess-0.059
Token.
Token ID13
Feature activation-5.86e-02

Pos logit contributions
aps+0.118
apping+0.118
uttering+0.106
unch+0.104
 This+0.102
Neg logit contributions
 forgot-0.129
 cleaned-0.118
 punished-0.114
 put-0.108
 locked-0.103
Token One
Token ID1881
Feature activation+3.92e-02

Pos logit contributions
apping+0.130
aps+0.126
uttering+0.125
uts+0.115
apped+0.109
Neg logit contributions
 forgot-0.113
 cleaned-0.106
 put-0.102
 saw-0.099
 would-0.098
Token day
Token ID1110
Feature activation-1.29e-02

Pos logit contributions
 forgot+0.087
 punished+0.080
 cleaned+0.077
 realized+0.069
 sighed+0.069
Neg logit contributions
aps-0.078
apping-0.077
 This-0.068
uttering-0.068
uts-0.066
Token,
Token ID11
Feature activation+1.89e-02

Pos logit contributions
aps+0.036
apping+0.034
uttering+0.032
uts+0.032
ectar+0.031
Neg logit contributions
 forgot-0.019
 cleaned-0.017
 saw-0.016
 put-0.015
 checked-0.015
Token he
Token ID339
Feature activation+9.85e-02

Pos logit contributions
 forgot+0.032
 cleaned+0.029
 put+0.027
 saw+0.026
 punished+0.026
Neg logit contributions
aps-0.051
apping-0.050
uts-0.044
uttering-0.044
itted-0.042
Token played
Token ID2826
Feature activation-6.70e-02

Pos logit contributions
 punished+0.127
 mess+0.123
 damage+0.122
 unpredictable+0.118
 situation+0.118
Neg logit contributions
apped-0.181
apping-0.165
aps-0.160
iked-0.146
ucked-0.144
Token and
Token ID290
Feature activation+1.93e-02

Pos logit contributions
aps+0.133
apping+0.132
uts+0.121
uttering+0.118
 This+0.116
Neg logit contributions
 forgot-0.145
 cleaned-0.133
 punished-0.130
 put-0.120
 felt-0.116
Token n
Token ID299
Feature activation-1.25e-01

Pos logit contributions
 punished+0.024
 forgot+0.020
 cleaned+0.019
 nurses+0.017
 repaired+0.016
Neg logit contributions
aps-0.050
apping-0.049
apped-0.048
 This-0.046
uts-0.045
Tokenapped
Token ID6320
Feature activation-1.01e-01

Pos logit contributions
 This+0.092
 Nice+0.077
uts+0.074
aps+0.067
 Today+0.065
Neg logit contributions
 forgot-0.397
 cleaned-0.389
 punished-0.366
 hoped-0.353
 washed-0.352
Token and
Token ID290
Feature activation+1.54e-02

Pos logit contributions
aps+0.225
apping+0.221
uts+0.200
uttering+0.198
 This+0.197
Neg logit contributions
 forgot-0.201
 cleaned-0.179
 punished-0.176
 put-0.157
 realized-0.155
Token nib
Token ID33272
Feature activation-1.08e-01

Pos logit contributions
 punished+0.016
 forgot+0.015
 cleaned+0.014
 nurses+0.011
 regretted+0.010
Neg logit contributions
aps-0.045
apping-0.044
apped-0.041
 This-0.041
uts-0.041
Tokenbled
Token ID9342
Feature activation-5.17e-02

Pos logit contributions
aps+0.224
 This+0.223
apping+0.218
 Nice+0.204
 Today+0.203
Neg logit contributions
 forgot-0.204
 cleaned-0.189
 punished-0.182
ts-0.175
ty-0.175
Token and
Token ID290
Feature activation+1.97e-02

Pos logit contributions
aps+0.113
apping+0.113
uttering+0.101
uts+0.100
 This+0.098
Neg logit contributions
 forgot-0.103
 cleaned-0.095
 punished-0.092
 put-0.084
 felt-0.082
Token hid
Token ID24519
Feature activation-2.78e-02

Pos logit contributions
 forgot+0.027
 punished+0.024
 cleaned+0.023
 realized+0.019
 sighed+0.018
Neg logit contributions
aps-0.055
apping-0.054
uts-0.049
 This-0.049
uttering-0.049
Token together
Token ID1978
Feature activation-8.87e-04

Pos logit contributions
aps+0.050
apping+0.049
uttering+0.042
uts+0.042
apped+0.041
Neg logit contributions
 forgot-0.068
 punished-0.062
 cleaned-0.061
 sighed-0.054
 realized-0.054
Token.
Token ID13
Feature activation-8.43e-02

Pos logit contributions
aps+0.002
apping+0.002
unch+0.002
uttering+0.002
uts+0.002
Neg logit contributions
 forgot-0.002
 cleaned-0.001
 put-0.001
 punished-0.001
 bit-0.001
Token It
Token ID632
Feature activation+7.48e-02

Pos logit contributions
uttering+0.133
apping+0.127
aps+0.127
uts+0.111
unch+0.108
Neg logit contributions
 forgot-0.152
 cleaned-0.140
 put-0.133
 punished-0.132
 saw-0.129
Token was
Token ID373
Feature activation+1.62e-02

Pos logit contributions
 forgot+0.101
 punished+0.100
wy+0.094
 nurses+0.092
 cleaned+0.091
Neg logit contributions
aps-0.172
 This-0.156
apping-0.152
uts-0.147
apped-0.143
Token covered
Token ID5017
Feature activation-5.78e-02

Pos logit contributions
 forgot+0.030
 punished+0.026
 cleaned+0.025
 sighed+0.023
 realized+0.022
Neg logit contributions
aps-0.037
apping-0.036
uttering-0.033
 This-0.033
uts-0.032
Token in
Token ID287
Feature activation-6.09e-03

Pos logit contributions
aps+0.112
apping+0.110
uttering+0.096
uts+0.096
 This+0.095
Neg logit contributions
 forgot-0.133
 punished-0.119
 cleaned-0.118
 realized-0.105
 sighed-0.105
Token sticky
Token ID23408
Feature activation-3.86e-02

Pos logit contributions
aps+0.014
apping+0.013
 This+0.013
uttering+0.012
uts+0.012
Neg logit contributions
 forgot-0.012
 punished-0.010
 cleaned-0.009
 sighed-0.008
 regretted-0.008
Token cob
Token ID22843
Feature activation-1.13e-01

Pos logit contributions
aps+0.093
apping+0.089
uts+0.083
 This+0.083
uttering+0.079
Neg logit contributions
 forgot-0.071
 punished-0.062
 cleaned-0.060
 hoped-0.053
 sighed-0.052
Tokenwe
Token ID732
Feature activation+3.46e-02

Pos logit contributions
aps+0.211
apping+0.203
uts+0.181
uttering+0.175
 This+0.167
Neg logit contributions
 forgot-0.276
 punished-0.250
 cleaned-0.248
 put-0.222
 sighed-0.222
Tokenbs
Token ID1443
Feature activation-1.49e-02

Pos logit contributions
 forgot+0.083
 punished+0.077
 cleaned+0.073
 locked+0.069
 realized+0.069
Neg logit contributions
aps-0.056
apping-0.055
uts-0.049
 This-0.048
apped-0.047
Token and
Token ID290
Feature activation+2.16e-02

Pos logit contributions
aps+0.029
apping+0.028
uttering+0.024
uts+0.024
 This+0.024
Neg logit contributions
 forgot-0.034
 cleaned-0.031
 punished-0.031
 put-0.028
 realized-0.027
Token had
Token ID550
Feature activation+2.00e-02

Pos logit contributions
 forgot+0.036
 punished+0.034
 cleaned+0.032
ty+0.028
 repaired+0.028
Neg logit contributions
aps-0.049
apping-0.047
uts-0.044
 This-0.044
apped-0.042
Token to
Token ID284
Feature activation+4.08e-02

Pos logit contributions
 forgot+0.040
 punished+0.036
 cleaned+0.034
 sighed+0.032
ty+0.032
Neg logit contributions
aps-0.039
apping-0.036
uts-0.035
 This-0.035
uttering-0.033
Token to
Token ID284
Feature activation+2.85e-02

Pos logit contributions
aps+0.022
apping+0.022
uttering+0.019
uts+0.019
 This+0.019
Neg logit contributions
 forgot-0.030
 cleaned-0.028
 punished-0.028
 put-0.025
 realized-0.025
Token pet
Token ID4273
Feature activation-7.99e-02

Pos logit contributions
 forgot+0.049
 cleaned+0.042
 punished+0.042
 sighed+0.039
 regretted+0.037
Neg logit contributions
aps-0.068
apping-0.064
 This-0.062
uts-0.061
unch-0.058
Token the
Token ID262
Feature activation+2.72e-02

Pos logit contributions
aps+0.198
apping+0.197
uttering+0.179
uts+0.174
apped+0.171
Neg logit contributions
 forgot-0.138
 punished-0.124
 cleaned-0.122
 put-0.108
 realized-0.105
Token rh
Token ID9529
Feature activation-6.89e-02

Pos logit contributions
 forgot+0.058
 cleaned+0.052
 punished+0.051
 sighed+0.045
 realized+0.044
Neg logit contributions
aps-0.057
apping-0.054
 This-0.050
uts-0.049
uttering-0.049
Tokenin
Token ID259
Feature activation-6.45e-04

Pos logit contributions
aps+0.168
apping+0.165
uts+0.148
uttering+0.148
 This+0.145
Neg logit contributions
 forgot-0.126
 cleaned-0.111
 punished-0.110
 put-0.093
 realized-0.093
Tokenoc
Token ID420
Feature activation-1.40e-01

Pos logit contributions
aps+0.002
apping+0.002
uttering+0.002
uts+0.001
apped+0.001
Neg logit contributions
 forgot-0.001
 punished-0.001
 cleaned-0.001
 bit-0.001
 put-0.001
Tokeneros
Token ID27498
Feature activation-2.28e-02

Pos logit contributions
apping+0.254
aps+0.252
uttering+0.226
uts+0.212
 This+0.204
Neg logit contributions
 forgot-0.314
 punished-0.305
 cleaned-0.295
 sighed-0.266
ty-0.265
Token.
Token ID13
Feature activation-8.53e-02

Pos logit contributions
aps+0.047
apping+0.046
uttering+0.042
uts+0.040
apped+0.039
Neg logit contributions
 forgot-0.049
 punished-0.045
 cleaned-0.045
 put-0.040
 realized-0.039
Token It
Token ID632
Feature activation+6.33e-02

Pos logit contributions
uttering+0.164
apping+0.161
aps+0.153
iked+0.136
unch+0.135
Neg logit contributions
 forgot-0.185
 cleaned-0.174
 punished-0.169
 put-0.164
 hoped-0.164
Token looked
Token ID3114
Feature activation+7.74e-03

Pos logit contributions
 forgot+0.092
 cleaned+0.087
 punished+0.083
ts+0.081
 nurses+0.077
Neg logit contributions
aps-0.138
 This-0.138
uts-0.131
apping-0.128
 That-0.126
Token so
Token ID523
Feature activation+7.71e-03

Pos logit contributions
 forgot+0.013
 cleaned+0.012
 punished+0.011
 sighed+0.010
 checked+0.010
Neg logit contributions
aps-0.018
apping-0.018
 This-0.017
uttering-0.016
uts-0.016
Token to
Token ID284
Feature activation-7.55e-03

Pos logit contributions
uttering+0.183
apping+0.182
aps+0.181
unch+0.170
apped+0.165
Neg logit contributions
 forgot-0.153
 cleaned-0.135
 put-0.130
 punished-0.126
 would-0.123
Token the
Token ID262
Feature activation+1.99e-02

Pos logit contributions
aps+0.015
apping+0.014
 This+0.013
uts+0.013
uttering+0.012
Neg logit contributions
 forgot-0.017
 cleaned-0.015
 punished-0.015
 sighed-0.013
 checked-0.013
Token mail
Token ID6920
Feature activation-4.89e-02

Pos logit contributions
 forgot+0.039
 cleaned+0.034
 punished+0.034
 sighed+0.030
 realized+0.029
Neg logit contributions
aps-0.045
apping-0.044
uttering-0.040
 This-0.039
uts-0.039
Tokenman
Token ID805
Feature activation-8.82e-03

Pos logit contributions
aps+0.121
apping+0.119
uttering+0.108
uts+0.108
 This+0.106
Neg logit contributions
 forgot-0.086
 cleaned-0.075
 punished-0.075
 put-0.064
 realized-0.063
Token tomorrow
Token ID9439
Feature activation-2.48e-02

Pos logit contributions
aps+0.022
apping+0.021
uttering+0.020
uts+0.019
unch+0.019
Neg logit contributions
 forgot-0.015
 cleaned-0.013
 punished-0.013
 put-0.012
 hoped-0.011
Token."
Token ID526
Feature activation-1.03e-01

Pos logit contributions
aps+0.053
apping+0.052
uttering+0.049
unch+0.046
apped+0.045
Neg logit contributions
 forgot-0.052
 cleaned-0.046
 punished-0.045
 put-0.043
 locked-0.040
Token\n
Token ID198
Feature activation-6.84e-02

Pos logit contributions
uttering+0.250
apping+0.239
iked+0.227
oga+0.223
uttered+0.221
Neg logit contributions
 forgot-0.156
 cleaned-0.145
 saw-0.141
 put-0.134
 felt-0.134
Token\n
Token ID198
Feature activation-6.27e-02

Pos logit contributions
apping+0.136
uttering+0.132
aps+0.124
iked+0.123
oga+0.123
Neg logit contributions
 cleaned-0.129
 forgot-0.128
 put-0.120
 comfort-0.116
 saw-0.114
TokenThey
Token ID2990
Feature activation+8.31e-02

Pos logit contributions
apping+0.144
aps+0.144
 This+0.139
uttering+0.137
uts+0.136
Neg logit contributions
 forgot-0.108
 cleaned-0.102
 punished-0.095
 put-0.089
 sighed-0.085
Token said
Token ID531
Feature activation-3.58e-02

Pos logit contributions
 matter+0.122
 mess+0.118
 damage+0.115
 unpredictable+0.114
 situation+0.113
Neg logit contributions
apped-0.182
apping-0.159
aps-0.155
uttered-0.132
iked-0.131
Token,
Token ID11
Feature activation-2.24e-03

Pos logit contributions
uttering+0.083
aps+0.082
apping+0.080
uts+0.078
unch+0.077
Neg logit contributions
 forgot-0.060
 cleaned-0.057
 punished-0.051
 put-0.049
 sighed-0.049
Tokenoc
Token ID420
Feature activation-1.61e-01

Pos logit contributions
aps+0.185
apping+0.182
uttering+0.162
uts+0.161
 This+0.159
Neg logit contributions
 forgot-0.169
 punished-0.151
 cleaned-0.151
 put-0.131
 realized-0.130
Tokenados
Token ID22484
Feature activation-2.23e-02

Pos logit contributions
apping+0.240
aps+0.233
uttering+0.210
 Today+0.189
apped+0.180
Neg logit contributions
 forgot-0.416
 punished-0.412
 cleaned-0.393
 realized-0.363
 repaired-0.363
Token because
Token ID780
Feature activation+1.28e-02

Pos logit contributions
aps+0.048
apping+0.047
uttering+0.042
apped+0.041
 This+0.040
Neg logit contributions
 forgot-0.045
 cleaned-0.041
 punished-0.040
 put-0.038
ty-0.036
Token they
Token ID484
Feature activation+8.77e-02

Pos logit contributions
 forgot+0.027
 cleaned+0.024
 punished+0.024
 put+0.021
 realized+0.021
Neg logit contributions
aps-0.027
apping-0.027
uttering-0.024
uts-0.023
 This-0.023
Token were
Token ID547
Feature activation+2.23e-02

Pos logit contributions
 punished+0.148
 forgot+0.141
 cleaned+0.134
 colder+0.133
 sighed+0.131
Neg logit contributions
 This-0.163
uts-0.160
aps-0.151
 These-0.151
owers-0.148
Token y
Token ID331
Feature activation-1.17e-01

Pos logit contributions
 forgot+0.039
 cleaned+0.034
 punished+0.034
 sighed+0.030
 realized+0.029
Neg logit contributions
aps-0.055
apping-0.054
uts-0.049
 This-0.049
uttering-0.048
Tokenummy
Token ID13513
Feature activation+2.56e-03

Pos logit contributions
aps+0.132
apping+0.129
apped+0.111
 Nice+0.106
uttering+0.105
Neg logit contributions
 forgot-0.308
 cleaned-0.302
 punished-0.300
 put-0.280
 mess-0.278
Token and
Token ID290
Feature activation+1.85e-02

Pos logit contributions
 forgot+0.006
 cleaned+0.005
 punished+0.005
 put+0.004
 realized+0.004
Neg logit contributions
aps-0.005
apping-0.005
uttering-0.004
uts-0.004
 This-0.004
Token green
Token ID4077
Feature activation-5.20e-02

Pos logit contributions
 forgot+0.031
 punished+0.029
 cleaned+0.028
 sighed+0.025
 locked+0.024
Neg logit contributions
aps-0.043
apping-0.041
 This-0.040
uts-0.039
uttering-0.037
Token.
Token ID13
Feature activation-6.59e-02

Pos logit contributions
aps+0.112
apping+0.112
uttering+0.100
uts+0.097
 This+0.096
Neg logit contributions
 forgot-0.107
 cleaned-0.096
 punished-0.095
 put-0.085
 realized-0.084
Token One
Token ID1881
Feature activation+2.91e-02

Pos logit contributions
apping+0.155
aps+0.153
uttering+0.150
uts+0.140
apped+0.129
Neg logit contributions
 forgot-0.119
 cleaned-0.103
 put-0.100
 saw-0.099
 would-0.099