TOP ACTIVATIONS
MAX = 0.311

I am the best swim mer !" M ia
Max is a good swim mer , but he likes to
to send the fastest swim mer to a new pond .
became a very good swim mer , just like the powerful
Ben has a blue swim suit . It has sharks .
the light was getting dim mer . She asked her mom
You are a good swim mer . You can try again
two will feel much cal mer ." P aco
was a very fast swim mer , but he was starting
to be the fastest swim mer in her class . She
he was a snake char mer . He liked to play
frog became the best swim mer in the pond . All
Anna has a new swim suit . It is pink and
You are a good swim mer too . You are not
He was an enthusiastic swim mer and would swim all day
be the most successful swim mer in the pond . He
was the most successful swim mer in the pond ! He
and Dad . " Ham mer !" yelled Zoe proudly .
You are a good swim mer . Do you want to
was not a good swim mer . He started to panic

MIN ACTIVATIONS
MIN = -0.390

felt like it was her fault , like she had done
ending was no one 's fault , but it left the
that it was his own fault - if he had written
knew it was her own fault . The girl
awn ’ s fault . The
realised it wasn 't her fault after all . <|endoftext|> Once
and said it was their fault . Mom said they were
them it was all their fault . His friends were so
it was all Joe 's fault . Joe tried to talk
" It 's your own fault , you should 've been
knew it was his own fault . He would have to
realized it was his own fault for not val uing her
bird . You admit your fault and ask for mercy .
. It is not your fault . It is not anyone
It is not anyone 's fault ." L ila
saying that it was his fault that he forgot the folder
that it wasn 't her fault . Finally , the family
. It 's not your fault . The rope was old
that it was not her fault . He said they would
and it 's all my fault ," he said , weeping

INTERVAL 0.136 - 0.311
CONTAINS 1.618%

. Suddenly , a big wave came and hit the boat
three years old and was very happy . One day ,
to cry . " O w , ow ,
, there was a little girl named Lily . She loved
Suddenly , the shadow started to speak . " Hello

INTERVAL -0.039 - 0.136
CONTAINS 75.785%

. " By e , dolphin !" Ben and
. He sees blood on his finger . He starts to
thinks for a moment . She still does not like Tim
colorful and fun . Do you want to play with blocks
her knee . Tom ran to help her .

INTERVAL -0.215 - -0.039
CONTAINS 22.532%

. They cut another slice of pie and ate
wide tail and bright blue eyes . One day
saw a spark ly egg that was far bigger than the
Lily went to the park to play . There , she
separate their eyes . The bear 's eye went far away

INTERVAL -0.390 - -0.215
CONTAINS 0.066%

could barely contain his impat ience as he opened the ch
they didn 't buy the candy . Her mom explained to
L ily and Spark y became friends and they
Jack loved to go on adventures with his friends . One
had hurt the bird 's feelings . The park was never
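
NOTE ON THE INTERVALS
The four INTERVAL blocks above appear to split the range from MIN (-0.390) to MAX (0.311) into equal-width bins and report the share of activations in each. The sketch below reproduces that binning under this assumption. The function names (interval_edges, interval_shares) and the short sample list are hypothetical illustrations, not part of the dashboard; the sample values are hand-picked from the per-token records and are not representative of the full distribution. Edges are returned from lowest to highest, the reverse of the listing order above.

```python
def interval_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def interval_shares(activations, edges):
    """Percentage of activations in each interval.

    Each interval is half-open [edge_i, edge_{i+1}), except the last,
    which also includes its upper edge so the maximum is counted.
    Values outside [edges[0], edges[-1]] are not counted in any bin.
    """
    counts = [0] * (len(edges) - 1)
    for a in activations:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= a < edges[i + 1] or (is_last and a == edges[-1]):
                counts[i] += 1
                break
    total = len(activations)
    return [100.0 * c / total for c in counts]


# MIN and MAX as printed in the dashboard (already rounded to 3 decimals).
edges = interval_edges(-0.390, 0.311, 4)

# Hypothetical sample: a few activations copied from the token records.
sample = [0.278, 0.0734, -0.0579, 0.0239, -0.153, 0.128]
shares = interval_shares(sample, edges)

for lower, upper, share in zip(edges[:-1], edges[1:], shares):
    print(f"INTERVAL {lower:.3f} - {upper:.3f}  CONTAINS {share:.3f}%")
```

Because MIN and MAX are themselves rounded, the computed edges can differ from the printed ones in the last decimal place.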
TokenI
Token ID40
Feature activation+6.56e-02

Pos logit contributions
he+0.155
ang+0.152
mer+0.142
imp+0.129
enced+0.124
Neg logit contributions
 fault-0.233
 kindness-0.219
 vitamins-0.211
 generosity-0.211
 honesty-0.209
Token am
Token ID716
Feature activation+6.49e-02

Pos logit contributions
he+0.119
ang+0.111
mer+0.106
tted+0.102
amed+0.098
Neg logit contributions
 fault-0.139
 kindness-0.123
 generosity-0.121
 cand-0.117
 forgiveness-0.115
Token the
Token ID262
Feature activation+2.39e-02

Pos logit contributions
he+0.126
ang+0.116
mer+0.113
tted+0.109
amed+0.104
Neg logit contributions
 fault-0.130
 kindness-0.116
 generosity-0.113
 cand-0.109
 forgiveness-0.107
Token best
Token ID1266
Feature activation-5.79e-02

Pos logit contributions
he+0.053
ang+0.049
mer+0.046
tted+0.046
amed+0.045
Neg logit contributions
 fault-0.043
 kindness-0.038
 generosity-0.036
 forgiveness-0.034
 cand-0.034
Token swim
Token ID9422
Feature activation+7.34e-02

Pos logit contributions
 fault+0.116
 kindness+0.102
 generosity+0.097
 forgiveness+0.093
 dreams+0.093
Neg logit contributions
he-0.113
ang-0.102
amed-0.101
tted-0.099
mer-0.095
Tokenmer
Token ID647
Feature activation+2.78e-01

Pos logit contributions
he+0.089
ang+0.069
amed+0.068
aming+0.067
Uh+0.066
Neg logit contributions
 fault-0.200
 kindness-0.181
 generosity-0.175
 powers-0.169
 cand-0.168
Token!"
Token ID2474
Feature activation+1.50e-02

Pos logit contributions
he+0.610
ang+0.562
tted+0.546
amed+0.522
mer+0.512
Neg logit contributions
 fault-0.487
 kindness-0.429
 generosity-0.413
 cand-0.399
 vitamins-0.388
Token\n
Token ID198
Feature activation+2.23e-02

Pos logit contributions
he+0.027
ang+0.026
mer+0.025
tted+0.024
amed+0.023
Neg logit contributions
 fault-0.030
 kindness-0.028
 generosity-0.027
 cand-0.026
 dreams-0.026
Token\n
Token ID198
Feature activation+2.09e-02

Pos logit contributions
mer+0.048
he+0.047
ang+0.046
tted+0.043
umbling+0.042
Neg logit contributions
 fault-0.033
 kindness-0.030
 forgiveness-0.028
 honesty-0.028
 generosity-0.027
TokenM
Token ID44
Feature activation+6.25e-02

Pos logit contributions
he+0.040
ang+0.038
mer+0.036
imp+0.033
ivated+0.033
Neg logit contributions
 vitamins-0.030
 fiber-0.028
 kindness-0.028
 fault-0.027
 generosity-0.027
Tokenia
Token ID544
Feature activation+1.28e-01

Pos logit contributions
he+0.124
ang+0.115
mer+0.112
tted+0.108
amed+0.105
Neg logit contributions
 fault-0.124
 kindness-0.107
 generosity-0.105
 cand-0.102
 forgiveness-0.101
Token Max
Token ID5436
Feature activation-6.05e-02

Pos logit contributions
he+0.031
ang+0.029
mer+0.028
tted+0.027
amed+0.025
Neg logit contributions
 fault-0.042
 kindness-0.038
 generosity-0.037
 cand-0.037
 diamonds-0.035
Token is
Token ID318
Feature activation+2.06e-02

Pos logit contributions
 fault+0.134
 generosity+0.120
 kindness+0.118
 diamonds+0.118
 forgiveness+0.116
Neg logit contributions
he-0.099
mer-0.094
ang-0.091
tted-0.085
umpy-0.078
Token a
Token ID257
Feature activation-3.73e-02

Pos logit contributions
he+0.039
ang+0.038
mer+0.036
tted+0.034
amed+0.033
Neg logit contributions
 fault-0.039
 generosity-0.035
 cand-0.035
 kindness-0.034
 forgiveness-0.033
Token good
Token ID922
Feature activation-1.60e-02

Pos logit contributions
 fault+0.069
 kindness+0.061
 generosity+0.060
 cand+0.059
 unic+0.057
Neg logit contributions
he-0.075
ang-0.071
mer-0.070
tted-0.066
imp-0.063
Token swim
Token ID9422
Feature activation+7.25e-02

Pos logit contributions
 fault+0.035
 kindness+0.031
 generosity+0.031
 cand+0.030
 unic+0.029
Neg logit contributions
he-0.028
ang-0.026
mer-0.025
tted-0.024
imp-0.023
Tokenmer
Token ID647
Feature activation+2.78e-01

Pos logit contributions
he+0.074
ang+0.062
amed+0.059
tted+0.055
aming+0.054
Neg logit contributions
 fault-0.207
 kindness-0.190
 generosity-0.184
 cand-0.183
 forgiveness-0.177
Token,
Token ID11
Feature activation+1.43e-02

Pos logit contributions
he+0.597
ang+0.567
tted+0.537
mer+0.531
amed+0.515
Neg logit contributions
 fault-0.494
 kindness-0.429
 generosity-0.415
 cand-0.406
 forgiveness-0.395
Token but
Token ID475
Feature activation-1.47e-02

Pos logit contributions
he+0.025
ang+0.023
mer+0.022
tted+0.021
imp+0.020
Neg logit contributions
 fault-0.030
 generosity-0.026
 cand-0.026
 kindness-0.026
 unic-0.025
Token he
Token ID339
Feature activation+1.17e-03

Pos logit contributions
 fault+0.032
urger+0.027
 cand+0.027
 generosity+0.026
 kindness+0.026
Neg logit contributions
he-0.025
ang-0.024
mer-0.023
tted-0.020
imp-0.020
Token likes
Token ID7832
Feature activation+1.21e-03

Pos logit contributions
he+0.002
ang+0.002
mer+0.002
tted+0.002
imp+0.002
Neg logit contributions
 fault-0.002
 generosity-0.002
 kindness-0.002
 cand-0.002
 diamonds-0.002
Token to
Token ID284
Feature activation-4.91e-02

Pos logit contributions
he+0.002
ang+0.002
mer+0.002
tted+0.002
imp+0.002
Neg logit contributions
 fault-0.002
 generosity-0.002
urger-0.002
 cand-0.002
 kindness-0.002
Token to
Token ID284
Feature activation-5.49e-02

Pos logit contributions
he+0.081
ang+0.075
mer+0.074
tted+0.069
imp+0.067
Neg logit contributions
 fault-0.074
 kindness-0.065
 generosity-0.063
 cand-0.060
 honesty-0.060
Token send
Token ID3758
Feature activation-5.80e-02

Pos logit contributions
 fault+0.109
 kindness+0.094
 generosity+0.093
 cand+0.092
 forgiveness+0.089
Neg logit contributions
he-0.106
ang-0.100
mer-0.097
tted-0.093
amed-0.088
Token the
Token ID262
Feature activation+2.02e-02

Pos logit contributions
 fault+0.111
 generosity+0.093
 kindness+0.093
 cand+0.092
 unic+0.090
Neg logit contributions
he-0.115
ang-0.109
mer-0.106
tted-0.104
amed-0.100
Token fastest
Token ID14162
Feature activation+2.76e-02

Pos logit contributions
he+0.039
ang+0.037
mer+0.035
tted+0.033
amed+0.033
Neg logit contributions
 fault-0.041
 kindness-0.036
 generosity-0.035
 cand-0.034
 vitamins-0.033
Token swim
Token ID9422
Feature activation+7.12e-02

Pos logit contributions
he+0.060
amed+0.054
aming+0.052
ang+0.052
tted+0.051
Neg logit contributions
 fault-0.050
 kindness-0.044
 generosity-0.041
 diamonds-0.041
 dreams-0.040
Tokenmer
Token ID647
Feature activation+2.77e-01

Pos logit contributions
he+0.166
Uh+0.154
ang+0.147
aming+0.145
amed+0.145
Neg logit contributions
 fault-0.112
 kindness-0.095
 powers-0.087
 generosity-0.086
 feelings-0.084
Token to
Token ID284
Feature activation-5.50e-02

Pos logit contributions
he+0.608
ang+0.541
tted+0.508
amed+0.499
aming+0.487
Neg logit contributions
 fault-0.506
 kindness-0.451
 generosity-0.431
 cand-0.410
 vitamins-0.408
Token a
Token ID257
Feature activation-3.68e-02

Pos logit contributions
 fault+0.109
 kindness+0.099
 generosity+0.095
 honesty+0.091
 forgiveness+0.091
Neg logit contributions
he-0.106
ang-0.101
mer-0.096
tted-0.093
amed-0.092
Token new
Token ID649
Feature activation-8.40e-02

Pos logit contributions
 fault+0.068
 kindness+0.062
 generosity+0.059
 cand+0.056
 unic+0.056
Neg logit contributions
he-0.076
ang-0.070
amed-0.067
tted-0.066
mer-0.064
Token pond
Token ID16723
Feature activation+1.05e-01

Pos logit contributions
 fault+0.162
 kindness+0.145
 generosity+0.139
 dreams+0.132
 honesty+0.131
Neg logit contributions
he-0.171
ang-0.153
tted-0.150
amed-0.148
aming-0.143
Token.
Token ID13
Feature activation+8.83e-03

Pos logit contributions
he+0.228
ang+0.214
tted+0.202
amed+0.200
mer+0.199
Neg logit contributions
 fault-0.181
 kindness-0.160
 generosity-0.152
 cand-0.147
 honesty-0.144
Token became
Token ID2627
Feature activation+2.31e-02

Pos logit contributions
 fault+0.012
 generosity+0.010
 kindness+0.010
 forgiveness+0.010
 cand+0.010
Neg logit contributions
he-0.009
mer-0.009
ang-0.008
tted-0.008
amed-0.007
Token a
Token ID257
Feature activation-3.76e-02

Pos logit contributions
he+0.044
ang+0.041
mer+0.040
tted+0.038
amed+0.037
Neg logit contributions
 fault-0.046
 kindness-0.041
 generosity-0.040
 cand-0.039
 forgiveness-0.038
Token very
Token ID845
Feature activation+1.36e-01

Pos logit contributions
 fault+0.075
 kindness+0.067
 generosity+0.065
 cand+0.064
 forgiveness+0.063
Neg logit contributions
he-0.071
ang-0.067
mer-0.065
tted-0.062
amed-0.059
Token good
Token ID922
Feature activation-2.24e-02

Pos logit contributions
he+0.284
ang+0.255
amed+0.237
mer+0.237
tted+0.236
Neg logit contributions
 fault-0.255
 kindness-0.235
 generosity-0.226
 honesty-0.212
 vitamins-0.207
Token swim
Token ID9422
Feature activation+7.13e-02

Pos logit contributions
 fault+0.043
 kindness+0.037
 generosity+0.036
 cand+0.035
 forgiveness+0.034
Neg logit contributions
he-0.046
ang-0.043
tted-0.041
mer-0.041
amed-0.040
Tokenmer
Token ID647
Feature activation+2.76e-01

Pos logit contributions
he+0.133
ang+0.120
Uh+0.117
amed+0.114
tted+0.109
Neg logit contributions
 fault-0.143
 kindness-0.130
 generosity-0.125
 cand-0.118
urger-0.116
Token,
Token ID11
Feature activation+3.60e-03

Pos logit contributions
he+0.517
ang+0.485
mer+0.469
tted+0.446
amed+0.426
Neg logit contributions
 fault-0.562
 kindness-0.499
 generosity-0.490
 cand-0.479
 forgiveness-0.468
Token just
Token ID655
Feature activation+3.32e-03

Pos logit contributions
he+0.007
ang+0.006
mer+0.006
tted+0.006
amed+0.005
Neg logit contributions
 fault-0.007
 kindness-0.006
 generosity-0.006
 cand-0.006
 forgiveness-0.006
Token like
Token ID588
Feature activation+5.27e-02

Pos logit contributions
he+0.004
ang+0.004
mer+0.004
tted+0.003
amed+0.003
Neg logit contributions
 fault-0.008
 cand-0.008
 generosity-0.008
 vitamins-0.008
 kindness-0.007
Token the
Token ID262
Feature activation+1.47e-02

Pos logit contributions
he+0.082
ang+0.079
mer+0.077
tted+0.068
room+0.067
Neg logit contributions
 fault-0.116
 cand-0.103
urger-0.101
 generosity-0.101
 kindness-0.099
Token powerful
Token ID3665
Feature activation-7.26e-03

Pos logit contributions
he+0.027
mer+0.026
ang+0.025
tted+0.023
amed+0.022
Neg logit contributions
 fault-0.030
 cand-0.026
 kindness-0.026
 generosity-0.026
 unic-0.025
Token Ben
Token ID3932
Feature activation+9.69e-02

Pos logit contributions
he+0.085
mer+0.081
ang+0.079
tted+0.077
imp+0.076
Neg logit contributions
 fault-0.074
 kindness-0.064
 vitamins-0.062
 cand-0.061
 dreams-0.061
Token has
Token ID468
Feature activation-2.14e-03

Pos logit contributions
he+0.192
ang+0.177
mer+0.171
tted+0.166
amed+0.159
Neg logit contributions
 fault-0.190
 kindness-0.168
 generosity-0.164
 cand-0.160
 vitamins-0.156
Token a
Token ID257
Feature activation-1.57e-02

Pos logit contributions
 fault+0.004
 generosity+0.004
 kindness+0.004
 forgiveness+0.004
 cand+0.004
Neg logit contributions
he-0.004
ang-0.004
mer-0.003
tted-0.003
amed-0.003
Token blue
Token ID4171
Feature activation-1.49e-02

Pos logit contributions
 fault+0.031
 kindness+0.027
 generosity+0.026
 cand+0.026
 forgiveness+0.025
Neg logit contributions
he-0.031
ang-0.029
mer-0.028
tted-0.027
amed-0.026
Token swim
Token ID9422
Feature activation+7.70e-02

Pos logit contributions
 fault+0.027
 kindness+0.024
 generosity+0.023
 cand+0.023
 forgiveness+0.022
Neg logit contributions
he-0.031
ang-0.029
mer-0.028
tted-0.028
amed-0.026
Tokensuit
Token ID6063
Feature activation+2.75e-01

Pos logit contributions
he+0.103
Uh+0.087
amed+0.079
aming+0.079
ang+0.076
Neg logit contributions
 fault-0.192
 cand-0.176
 kindness-0.175
urger-0.174
 powers-0.172
Token.
Token ID13
Feature activation+4.19e-02

Pos logit contributions
he+0.515
ang+0.484
mer+0.466
tted+0.449
amed+0.430
Neg logit contributions
 fault-0.559
 kindness-0.496
 generosity-0.485
 cand-0.479
 forgiveness-0.462
Token It
Token ID632
Feature activation-1.75e-04

Pos logit contributions
he+0.089
mer+0.085
ang+0.085
amed+0.083
aming+0.082
Neg logit contributions
 fault-0.067
 vitamins-0.059
 kindness-0.059
 cand-0.058
 dreams-0.056
Token has
Token ID468
Feature activation-2.38e-03

Pos logit contributions
 fault+0.000
 kindness+0.000
 generosity+0.000
 forgiveness+0.000
 cand+0.000
Neg logit contributions
he-0.000
ang-0.000
mer-0.000
tted-0.000
amed-0.000
Token sharks
Token ID27476
Feature activation+4.09e-02

Pos logit contributions
 fault+0.005
 generosity+0.004
 kindness+0.004
 forgiveness+0.004
 honesty+0.004
Neg logit contributions
he-0.004
ang-0.004
mer-0.004
tted-0.004
amed-0.004
Token.
Token ID13
Feature activation+4.16e-02

Pos logit contributions
he+0.069
ang+0.064
mer+0.062
tted+0.059
amed+0.056
Neg logit contributions
 fault-0.091
 kindness-0.081
 generosity-0.080
 cand-0.078
 forgiveness-0.076
Token the
Token ID262
Feature activation+1.65e-02

Pos logit contributions
 fault+0.021
 kindness+0.018
 generosity+0.018
 cand+0.018
 forgiveness+0.017
Neg logit contributions
he-0.017
ang-0.016
mer-0.015
tted-0.015
amed-0.014
Token light
Token ID1657
Feature activation+1.28e-02

Pos logit contributions
he+0.031
ang+0.029
mer+0.028
tted+0.027
amed+0.026
Neg logit contributions
 fault-0.034
 kindness-0.030
 generosity-0.030
 cand-0.029
 forgiveness-0.028
Token was
Token ID373
Feature activation+3.92e-02

Pos logit contributions
he+0.024
ang+0.023
mer+0.021
tted+0.021
imp+0.020
Neg logit contributions
 fault-0.026
 kindness-0.023
 generosity-0.022
 cand-0.022
 unic-0.022
Token getting
Token ID1972
Feature activation+8.00e-02

Pos logit contributions
he+0.077
ang+0.073
mer+0.071
tted+0.068
imp+0.064
Neg logit contributions
 fault-0.077
 kindness-0.067
 generosity-0.066
 cand-0.064
 unic-0.063
Token dim
Token ID5391
Feature activation+3.13e-02

Pos logit contributions
he+0.153
ang+0.148
mer+0.143
tted+0.136
imp+0.129
Neg logit contributions
 fault-0.155
 kindness-0.140
 cand-0.136
 generosity-0.135
 diamonds-0.134
Tokenmer
Token ID647
Feature activation+2.75e-01

Pos logit contributions
ang+0.069
he+0.068
imp+0.065
Uh+0.064
tted+0.064
Neg logit contributions
 fault-0.038
 cand-0.036
 diamonds-0.036
 unic-0.035
 kindness-0.034
Token.
Token ID13
Feature activation+8.95e-03

Pos logit contributions
he+0.579
ang+0.548
mer+0.512
tted+0.509
amed+0.486
Neg logit contributions
 fault-0.508
 kindness-0.446
 generosity-0.431
 cand-0.423
 honesty-0.405
Token She
Token ID1375
Feature activation-6.05e-03

Pos logit contributions
he+0.016
ang+0.016
mer+0.015
imp+0.014
tted+0.013
Neg logit contributions
 fault-0.019
 kindness-0.017
 cand-0.017
 unic-0.016
 dreams-0.016
Token asked
Token ID1965
Feature activation-2.80e-02

Pos logit contributions
 fault+0.013
 kindness+0.012
 generosity+0.012
 cand+0.011
 unic+0.011
Neg logit contributions
he-0.011
ang-0.010
mer-0.009
tted-0.009
amed-0.009
Token her
Token ID607
Feature activation-4.68e-03

Pos logit contributions
 fault+0.054
 kindness+0.048
 generosity+0.047
 cand+0.046
 forgiveness+0.045
Neg logit contributions
he-0.057
ang-0.054
mer-0.052
tted-0.050
amed-0.047
Token mom
Token ID1995
Feature activation+3.20e-02

Pos logit contributions
 fault+0.009
 vitamins+0.008
 diamonds+0.008
 generosity+0.008
 weaknesses+0.007
Neg logit contributions
he-0.009
ang-0.008
tted-0.007
imp-0.007
amed-0.007
Token You
Token ID921
Feature activation+1.15e-03

Pos logit contributions
he+0.021
ang+0.020
mer+0.018
tted+0.018
amed+0.017
Neg logit contributions
 fault-0.035
 cand-0.031
 generosity-0.031
 kindness-0.031
 diamonds-0.030
Token are
Token ID389
Feature activation+3.49e-02

Pos logit contributions
he+0.002
tted+0.002
mer+0.002
ang+0.002
amed+0.002
Neg logit contributions
 fault-0.003
 generosity-0.002
 vitamins-0.002
 honesty-0.002
 cand-0.002
Token a
Token ID257
Feature activation-3.61e-02

Pos logit contributions
he+0.063
ang+0.058
mer+0.055
tted+0.054
amed+0.053
Neg logit contributions
 fault-0.075
 generosity-0.065
 kindness-0.065
 cand-0.064
 vitamins-0.061
Token good
Token ID922
Feature activation-1.69e-02

Pos logit contributions
 fault+0.066
 kindness+0.058
 generosity+0.057
 cand+0.055
 forgiveness+0.054
Neg logit contributions
he-0.076
ang-0.071
mer-0.069
tted-0.066
amed-0.064
Token swim
Token ID9422
Feature activation+7.07e-02

Pos logit contributions
 fault+0.036
 kindness+0.031
 generosity+0.030
 cand+0.029
 forgiveness+0.029
Neg logit contributions
he-0.032
ang-0.030
tted-0.028
mer-0.028
amed-0.027
Tokenmer
Token ID647
Feature activation+2.75e-01

Pos logit contributions
he+0.110
ang+0.101
amed+0.094
Uh+0.092
tted+0.091
Neg logit contributions
 fault-0.165
 kindness-0.148
 generosity-0.144
 cand-0.139
 forgiveness-0.138
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
he+0.579
ang+0.551
tted+0.523
mer+0.505
amed+0.499
Neg logit contributions
 fault-0.504
 kindness-0.439
 generosity-0.424
 cand-0.408
 forgiveness-0.406
Token You
Token ID921
Feature activation+2.99e-03

Pos logit contributions
he+0.034
ang+0.032
mer+0.030
tted+0.029
amed+0.027
Neg logit contributions
 fault-0.049
 kindness-0.044
 generosity-0.044
 cand-0.043
 forgiveness-0.042
Token can
Token ID460
Feature activation-1.76e-02

Pos logit contributions
he+0.005
tted+0.004
mer+0.004
ang+0.004
amed+0.004
Neg logit contributions
 fault-0.007
 generosity-0.006
 cand-0.006
 honesty-0.006
 vitamins-0.006
Token try
Token ID1949
Feature activation-1.40e-02

Pos logit contributions
 fault+0.039
 generosity+0.035
 cand+0.035
 vitamins+0.035
 kindness+0.034
Neg logit contributions
he-0.033
ang-0.028
tted-0.027
mer-0.026
imp-0.024
Token again
Token ID757
Feature activation-1.53e-01

Pos logit contributions
 generosity+0.031
urger+0.029
 fault+0.028
 cand+0.028
 hospitality+0.027
Neg logit contributions
ang-0.018
 across-0.018
 outside-0.017
 far-0.016
he-0.016
Token two
Token ID734
Feature activation-8.96e-02

Pos logit contributions
 fault+0.004
 kindness+0.003
 generosity+0.003
 cand+0.003
 forgiveness+0.003
Neg logit contributions
he-0.003
ang-0.003
mer-0.003
tted-0.003
amed-0.003
Token will
Token ID481
Feature activation-4.51e-02

Pos logit contributions
 fault+0.188
 unic+0.158
 forgiveness+0.154
 generosity+0.151
 kindness+0.150
Neg logit contributions
he-0.157
mer-0.145
ang-0.138
tted-0.136
umpy-0.127
Token feel
Token ID1254
Feature activation+1.57e-02

Pos logit contributions
 fault+0.090
 generosity+0.078
 kindness+0.078
 cand+0.078
 diamonds+0.077
Neg logit contributions
he-0.086
mer-0.077
tted-0.076
ang-0.076
imp-0.070
Token much
Token ID881
Feature activation-5.78e-02

Pos logit contributions
he+0.034
mer+0.029
ang+0.029
tted+0.029
umpy+0.028
Neg logit contributions
 fault-0.027
urger-0.023
 unic-0.023
 generosity-0.022
 cand-0.022
Token cal
Token ID2386
Feature activation+4.03e-03

Pos logit contributions
 fault+0.109
 unic+0.095
urger+0.089
 diamonds+0.085
 generosity+0.085
Neg logit contributions
he-0.118
mer-0.104
tted-0.101
 forward-0.097
umpy-0.096
Tokenmer
Token ID647
Feature activation+2.75e-01

Pos logit contributions
he+0.002
ang+0.001
amed+0.001
tted+0.001
room+0.001
Neg logit contributions
 fault-0.013
 kindness-0.012
 cand-0.012
 generosity-0.012
 vitamins-0.012
Token."
Token ID526
Feature activation-3.68e-02

Pos logit contributions
he+0.607
ang+0.574
mer+0.553
tted+0.538
amed+0.516
Neg logit contributions
 fault-0.470
 kindness-0.408
 generosity-0.397
 cand-0.385
 forgiveness-0.374
Token\n
Token ID198
Feature activation+1.32e-02

Pos logit contributions
 fault+0.071
 kindness+0.063
 generosity+0.061
 cand+0.060
 forgiveness+0.058
Neg logit contributions
he-0.074
ang-0.069
mer-0.067
tted-0.065
amed-0.062
Token\n
Token ID198
Feature activation+1.20e-02

Pos logit contributions
he+0.023
mer+0.022
ang+0.022
tted+0.020
amed+0.020
Neg logit contributions
 fault-0.027
 kindness-0.024
 generosity-0.024
 cand-0.023
 forgiveness-0.023
TokenP
Token ID47
Feature activation+1.10e-01

Pos logit contributions
he+0.019
mer+0.017
ang+0.017
tted+0.016
amed+0.015
Neg logit contributions
 fault-0.026
 kindness-0.025
 generosity-0.024
 cand-0.024
 forgiveness-0.023
Tokenaco
Token ID10602
Feature activation-4.09e-02

Pos logit contributions
he+0.235
mer+0.210
ang+0.209
amed+0.200
tted+0.198
Neg logit contributions
 fault-0.197
 kindness-0.171
 generosity-0.168
 cand-0.158
urger-0.156
Token was
Token ID373
Feature activation+5.38e-02

Pos logit contributions
he+0.069
ang+0.063
mer+0.060
tted+0.058
amed+0.056
Neg logit contributions
 fault-0.079
 kindness-0.071
 generosity-0.069
 cand-0.069
 unic-0.068
Token a
Token ID257
Feature activation-2.26e-02

Pos logit contributions
he+0.083
ang+0.076
mer+0.073
tted+0.069
amed+0.065
Neg logit contributions
 fault-0.129
 kindness-0.117
 generosity-0.115
 cand-0.112
 forgiveness-0.110
Token very
Token ID845
Feature activation+1.49e-01

Pos logit contributions
 fault+0.040
 kindness+0.038
 generosity+0.036
 unic+0.034
 honesty+0.034
Neg logit contributions
he-0.046
amed-0.041
tted-0.041
ang-0.040
umbling-0.039
Token fast
Token ID3049
Feature activation+8.00e-02

Pos logit contributions
he+0.320
ang+0.275
amed+0.275
tted+0.270
room+0.268
Neg logit contributions
 fault-0.260
 kindness-0.243
 generosity-0.230
 honesty-0.220
 vitamins-0.218
Token swim
Token ID9422
Feature activation+7.06e-02

Pos logit contributions
he+0.189
amed+0.182
tted+0.180
ang+0.173
aming+0.170
Neg logit contributions
 fault-0.118
 kindness-0.104
 generosity-0.094
 dreams-0.094
 unic-0.090
Tokenmer
Token ID647
Feature activation+2.75e-01

Pos logit contributions
Uh+0.105
he+0.103
amed+0.093
aming+0.091
imp+0.085
Neg logit contributions
 fault-0.144
 kindness-0.136
 dreams-0.128
 minerals-0.128
 powers-0.127
Token,
Token ID11
Feature activation+1.49e-02

Pos logit contributions
he+0.571
tted+0.513
ang+0.512
amed+0.493
Uh+0.470
Neg logit contributions
 fault-0.511
 kindness-0.453
 generosity-0.438
 cand-0.423
 honesty-0.413
Token but
Token ID475
Feature activation-8.80e-03

Pos logit contributions
he+0.029
ang+0.027
mer+0.026
tted+0.025
amed+0.024
Neg logit contributions
 fault-0.029
 kindness-0.026
 generosity-0.025
 cand-0.025
 forgiveness-0.024
Token he
Token ID339
Feature activation+1.04e-02

Pos logit contributions
 fault+0.016
 kindness+0.013
 generosity+0.013
 forgiveness+0.013
 cand+0.013
Neg logit contributions
he-0.019
ang-0.018
mer-0.018
imp-0.016
tted-0.015
Token was
Token ID373
Feature activation+5.30e-02

Pos logit contributions
he+0.021
ang+0.019
mer+0.018
tted+0.018
amed+0.018
Neg logit contributions
 fault-0.019
 kindness-0.017
 generosity-0.016
 cand-0.016
 unic-0.016
Token starting
Token ID3599
Feature activation+1.14e-01

Pos logit contributions
he+0.090
ang+0.080
mer+0.079
tted+0.075
amed+0.070
Neg logit contributions
 fault-0.120
 kindness-0.108
 generosity-0.105
 cand-0.102
 unic-0.102
Token to
Token ID284
Feature activation-5.24e-02

Pos logit contributions
he+0.041
ang+0.037
mer+0.036
tted+0.035
room+0.034
Neg logit contributions
 fault-0.041
 generosity-0.037
urger-0.037
 kindness-0.036
 cand-0.035
Token be
Token ID307
Feature activation+2.50e-02

Pos logit contributions
 fault+0.108
 kindness+0.098
 generosity+0.094
 forgiveness+0.091
 cand+0.090
Neg logit contributions
he-0.095
ang-0.093
mer-0.089
amed-0.084
tted-0.083
Token the
Token ID262
Feature activation+2.26e-02

Pos logit contributions
he+0.057
ang+0.052
mer+0.050
umbling+0.049
tted+0.049
Neg logit contributions
 fault-0.041
 kindness-0.038
 generosity-0.036
 forgiveness-0.034
 feelings-0.033
Token fastest
Token ID14162
Feature activation+2.58e-02

Pos logit contributions
he+0.051
ang+0.048
tted+0.046
mer+0.045
amed+0.045
Neg logit contributions
 fault-0.038
 kindness-0.034
 generosity-0.032
 forgiveness-0.030
 dreams-0.030
Token swim
Token ID9422
Feature activation+7.10e-02

Pos logit contributions
he+0.056
tted+0.053
amed+0.053
aming+0.050
ang+0.050
Neg logit contributions
 fault-0.044
 kindness-0.037
 dreams-0.036
 cand-0.035
 feelings-0.035
Tokenmer
Token ID647
Feature activation+2.74e-01

Pos logit contributions
he+0.138
Uh+0.131
amed+0.127
aming+0.126
ang+0.124
Neg logit contributions
 fault-0.131
 kindness-0.113
 generosity-0.105
 powers-0.105
 dreams-0.103
Token in
Token ID287
Feature activation+1.18e-01

Pos logit contributions
he+0.571
ang+0.534
tted+0.516
amed+0.494
mer+0.480
Neg logit contributions
 fault-0.510
 kindness-0.445
 generosity-0.426
 cand-0.421
 dreams-0.409
Token her
Token ID607
Feature activation-2.27e-03

Pos logit contributions
ang+0.272
tted+0.272
he+0.264
mer+0.263
amed+0.259
Neg logit contributions
 fault-0.174
 kindness-0.158
 unic-0.149
 dreams-0.144
 forgiveness-0.142
Token class
Token ID1398
Feature activation+3.87e-02

Pos logit contributions
 fault+0.004
 kindness+0.004
 dreams+0.004
 generosity+0.004
 honesty+0.003
Neg logit contributions
he-0.005
amed-0.004
ang-0.004
tted-0.004
Uh-0.004
Token.
Token ID13
Feature activation+1.71e-02

Pos logit contributions
he+0.084
ang+0.083
tted+0.079
amed+0.078
mer+0.077
Neg logit contributions
 fault-0.062
 kindness-0.054
 generosity-0.051
 cand-0.051
 dreams-0.050
Token She
Token ID1375
Feature activation-1.33e-03

Pos logit contributions
he+0.030
ang+0.030
mer+0.028
tted+0.027
amed+0.026
Neg logit contributions
 fault-0.037
 kindness-0.033
 cand-0.032
 generosity-0.031
 dreams-0.031
Token he
Token ID339
Feature activation+1.71e-03

Pos logit contributions
he+0.022
ang+0.021
mer+0.020
tted+0.019
imp+0.018
Neg logit contributions
 fault-0.023
 generosity-0.020
 kindness-0.020
 cand-0.020
 forgiveness-0.019
Token was
Token ID373
Feature activation+3.35e-02

Pos logit contributions
he+0.003
ang+0.003
mer+0.003
tted+0.003
amed+0.003
Neg logit contributions
 fault-0.003
 kindness-0.003
 generosity-0.003
 cand-0.003
 unic-0.003
Token a
Token ID257
Feature activation-3.85e-02

Pos logit contributions
he+0.058
ang+0.055
mer+0.052
tted+0.050
amed+0.049
Neg logit contributions
 fault-0.072
 kindness-0.064
 generosity-0.064
 cand-0.062
 forgiveness-0.061
Token snake
Token ID17522
Feature activation+2.10e-02

Pos logit contributions
 fault+0.073
 kindness+0.064
 generosity+0.063
 cand+0.061
 forgiveness+0.059
Neg logit contributions
he-0.077
ang-0.073
mer-0.071
tted-0.068
imp-0.064
Token char
Token ID1149
Feature activation+1.72e-02

Pos logit contributions
he+0.040
ang+0.037
mer+0.036
tted+0.034
imp+0.032
Neg logit contributions
 fault-0.042
 kindness-0.038
 generosity-0.037
 cand-0.036
 forgiveness-0.035
Tokenmer
Token ID647
Feature activation+2.74e-01

Pos logit contributions
he+0.027
Uh+0.027
ang+0.025
pour+0.024
suit+0.023
Neg logit contributions
 fault-0.031
 kindness-0.029
 powers-0.028
 unic-0.028
 feelings-0.027
Token.
Token ID13
Feature activation+1.20e-02

Pos logit contributions
he+0.565
ang+0.531
tted+0.496
mer+0.494
amed+0.475
Neg logit contributions
 fault-0.518
 kindness-0.454
 generosity-0.441
 cand-0.428
 forgiveness-0.419
Token He
Token ID679
Feature activation+3.31e-02

Pos logit contributions
he+0.024
ang+0.022
mer+0.022
tted+0.020
amed+0.020
Neg logit contributions
 fault-0.024
 kindness-0.021
 generosity-0.020
 honesty-0.020
 vitamins-0.020
Token liked
Token ID8288
Feature activation+3.23e-02

Pos logit contributions
he+0.059
ang+0.055
mer+0.053
tted+0.051
amed+0.048
Neg logit contributions
 fault-0.071
 kindness-0.063
 generosity-0.062
 cand-0.061
 forgiveness-0.060
Token to
Token ID284
Feature activation-5.27e-02

Pos logit contributions
he+0.047
ang+0.047
mer+0.046
room+0.042
imp+0.041
Neg logit contributions
 fault-0.067
 generosity-0.060
urger-0.059
 cand-0.058
 kindness-0.058
Token play
Token ID711
Feature activation-4.81e-02

Pos logit contributions
 fault+0.114
 cand+0.098
urger+0.097
 generosity+0.097
 diamonds+0.096
Neg logit contributions
he-0.089
ang-0.082
mer-0.081
tted-0.079
imp-0.075
Token frog
Token ID21264
Feature activation+5.88e-02

Pos logit contributions
 fault+0.076
 kindness+0.071
 generosity+0.068
 cand+0.066
 forgiveness+0.065
Neg logit contributions
he-0.076
ang-0.071
mer-0.067
tted-0.066
amed-0.064
Token became
Token ID2627
Feature activation+2.09e-02

Pos logit contributions
he+0.099
ang+0.094
mer+0.091
tted+0.085
amed+0.081
Neg logit contributions
 fault-0.133
 kindness-0.117
 generosity-0.115
 cand-0.113
 forgiveness-0.112
Token the
Token ID262
Feature activation+1.36e-02

Pos logit contributions
he+0.049
ang+0.043
mer+0.043
tted+0.043
umbling+0.040
Neg logit contributions
 fault-0.035
 kindness-0.032
 generosity-0.030
 vitamins-0.028
 honesty-0.028
Token best
Token ID1266
Feature activation-6.34e-02

Pos logit contributions
he+0.034
ang+0.032
mer+0.031
tted+0.031
amed+0.030
Neg logit contributions
 fault-0.020
 kindness-0.016
 generosity-0.016
 cand-0.015
 forgiveness-0.015
Token swim
Token ID9422
Feature activation+6.86e-02

Pos logit contributions
 fault+0.135
 kindness+0.122
 generosity+0.117
 dreams+0.116
 cand+0.112
Neg logit contributions
he-0.112
amed-0.103
tted-0.099
ang-0.095
aming-0.093
Tokenmer
Token ID647
Feature activation+2.74e-01

Pos logit contributions
he+0.085
ang+0.067
Uh+0.067
imp+0.061
amed+0.061
Neg logit contributions
 fault-0.183
 kindness-0.171
 generosity-0.166
 cand-0.158
 dreams-0.157
Token in
Token ID287
Feature activation+1.11e-01

Pos logit contributions
he+0.549
ang+0.503
tted+0.477
mer+0.475
amed+0.456
Neg logit contributions
 fault-0.534
 kindness-0.476
 generosity-0.464
 cand-0.449
 honesty-0.438
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
he+0.215
ang+0.201
mer+0.195
tted+0.190
amed+0.182
Neg logit contributions
 fault-0.221
 kindness-0.199
 generosity-0.193
 forgiveness-0.184
 cand-0.184
Token pond
Token ID16723
Feature activation+1.01e-01

Pos logit contributions
he+0.024
ang+0.022
tted+0.021
amed+0.021
mer+0.020
Neg logit contributions
 fault-0.026
 kindness-0.025
 dreams-0.024
 generosity-0.023
 cand-0.023
Token.
Token ID13
Feature activation+3.44e-03

Pos logit contributions
he+0.193
ang+0.181
mer+0.175
tted+0.167
amed+0.160
Neg logit contributions
 fault-0.201
 kindness-0.178
 generosity-0.174
 cand-0.170
 forgiveness-0.166
Token All
Token ID1439
Feature activation-8.98e-02

Pos logit contributions
he+0.006
ang+0.005
mer+0.005
tted+0.005
amed+0.005
Neg logit contributions
 fault-0.008
 kindness-0.007
 generosity-0.007
 cand-0.007
 forgiveness-0.007
Token Anna
Token ID11735
Feature activation+8.06e-02

Pos logit contributions
he+0.066
ang+0.065
mer+0.063
imp+0.062
amed+0.061
Neg logit contributions
 fault-0.043
 kindness-0.037
 cand-0.036
 vitamins-0.036
 dreams-0.035
Token has
Token ID468
Feature activation-4.34e-03

Pos logit contributions
he+0.169
ang+0.152
mer+0.147
tted+0.147
amed+0.144
Neg logit contributions
 fault-0.149
 kindness-0.129
 generosity-0.123
 vitamins-0.123
 dreams-0.121
Token a
Token ID257
Feature activation-1.59e-02

Pos logit contributions
 fault+0.008
 kindness+0.007
 generosity+0.006
 cand+0.006
 vitamins+0.006
Neg logit contributions
he-0.010
ang-0.009
mer-0.009
tted-0.008
imp-0.008
Token new
Token ID649
Feature activation-7.39e-02

Pos logit contributions
 fault+0.032
 kindness+0.028
 generosity+0.027
 cand+0.027
 forgiveness+0.026
Neg logit contributions
he-0.031
ang-0.029
mer-0.028
tted-0.027
amed-0.026
Token swim
Token ID9422
Feature activation+7.53e-02

Pos logit contributions
 fault+0.146
 kindness+0.129
 generosity+0.126
 cand+0.123
 forgiveness+0.120
Neg logit contributions
he-0.144
ang-0.135
mer-0.131
tted-0.125
amed-0.120
Tokensuit
Token ID6063
Feature activation+2.73e-01

Pos logit contributions
he+0.108
Uh+0.094
amed+0.087
aming+0.086
ang+0.083
Neg logit contributions
 fault-0.182
 kindness-0.165
 cand-0.159
urger-0.159
 powers-0.159
Token.
Token ID13
Feature activation+3.81e-02

Pos logit contributions
he+0.530
ang+0.500
mer+0.481
tted+0.464
amed+0.445
Neg logit contributions
 fault-0.538
 kindness-0.475
 generosity-0.464
 cand-0.458
 forgiveness-0.441
Token It
Token ID632
Feature activation-2.72e-03

Pos logit contributions
mer+0.081
he+0.080
ang+0.077
amed+0.074
imp+0.074
Neg logit contributions
 fault-0.064
 kindness-0.055
 vitamins-0.054
 dreams-0.053
 cand-0.053
Token is
Token ID318
Feature activation+3.83e-02

Pos logit contributions
 fault+0.005
 kindness+0.005
 generosity+0.004
 cand+0.004
 unic+0.004
Neg logit contributions
he-0.006
ang-0.005
mer-0.005
tted-0.005
amed-0.005
Token pink
Token ID11398
Feature activation-3.00e-02

Pos logit contributions
he+0.081
ang+0.075
mer+0.074
tted+0.071
imp+0.068
Neg logit contributions
 fault-0.071
 kindness-0.062
 generosity-0.060
 cand-0.060
 vitamins-0.059
Token and
Token ID290
Feature activation+1.47e-02

Pos logit contributions
 fault+0.059
 kindness+0.052
 cand+0.051
 generosity+0.050
 diamonds+0.049
Neg logit contributions
he-0.058
ang-0.054
mer-0.052
tted-0.052
imp-0.049
Token You
Token ID921
Feature activation-4.24e-04

Pos logit contributions
he+0.040
ang+0.038
mer+0.037
tted+0.034
imp+0.031
Neg logit contributions
 fault-0.059
 kindness-0.053
 generosity-0.053
 cand-0.053
urger-0.051
Token are
Token ID389
Feature activation+2.91e-02

Pos logit contributions
 fault+0.001
 generosity+0.001
urger+0.001
 kindness+0.001
 honesty+0.001
Neg logit contributions
he-0.001
tted-0.001
mer-0.000
ang-0.000
amed-0.000
Token a
Token ID257
Feature activation-4.13e-02

Pos logit contributions
he+0.056
ang+0.053
mer+0.050
tted+0.049
amed+0.049
Neg logit contributions
 fault-0.057
 generosity-0.049
 kindness-0.049
 cand-0.048
urger-0.047
Token good
Token ID922
Feature activation-1.51e-02

Pos logit contributions
 fault+0.074
 kindness+0.064
 generosity+0.062
 cand+0.060
 forgiveness+0.059
Neg logit contributions
he-0.090
ang-0.084
mer-0.081
tted-0.079
amed-0.076
Token swim
Token ID9422
Feature activation+7.08e-02

Pos logit contributions
 fault+0.024
 kindness+0.020
 generosity+0.020
 cand+0.019
 forgiveness+0.018
Neg logit contributions
he-0.036
ang-0.034
mer-0.033
tted-0.032
amed-0.031
Tokenmer
Token ID647
Feature activation+2.73e-01

Pos logit contributions
he+0.102
ang+0.091
amed+0.086
tted+0.083
aming+0.082
Neg logit contributions
 fault-0.174
 kindness-0.158
 generosity-0.154
 cand-0.150
 forgiveness-0.147
Token too
Token ID1165
Feature activation+4.12e-02

Pos logit contributions
he+0.577
ang+0.541
tted+0.517
mer+0.507
amed+0.493
Neg logit contributions
 fault-0.501
 kindness-0.441
 generosity-0.427
 cand-0.417
 forgiveness-0.409
Token.
Token ID13
Feature activation+2.73e-02

Pos logit contributions
he+0.080
ang+0.074
mer+0.072
tted+0.068
amed+0.065
Neg logit contributions
 fault-0.082
 kindness-0.073
 generosity-0.072
 cand-0.070
 forgiveness-0.068
Token You
Token ID921
Feature activation+5.94e-04

Pos logit contributions
he+0.045
ang+0.043
mer+0.041
tted+0.038
amed+0.036
Neg logit contributions
 fault-0.061
 kindness-0.054
 generosity-0.054
 cand-0.054
 diamonds-0.052
Token are
Token ID389
Feature activation+3.07e-02

Pos logit contributions
he+0.001
tted+0.001
mer+0.001
ang+0.001
amed+0.001
Neg logit contributions
 fault-0.001
urger-0.001
 generosity-0.001
 kindness-0.001
 honesty-0.001
Token not
Token ID407
Feature activation-5.45e-02

Pos logit contributions
he+0.059
ang+0.056
mer+0.053
tted+0.052
amed+0.051
Neg logit contributions
 fault-0.060
 generosity-0.052
 kindness-0.051
 cand-0.051
urger-0.049
Token He
Token ID679
Feature activation+3.24e-02

Pos logit contributions
he+0.027
ang+0.025
mer+0.024
tted+0.023
amed+0.021
Neg logit contributions
 fault-0.035
 kindness-0.031
 generosity-0.031
 cand-0.030
 forgiveness-0.030
Token was
Token ID373
Feature activation+4.50e-02

Pos logit contributions
he+0.062
ang+0.059
mer+0.057
tted+0.054
amed+0.052
Neg logit contributions
 fault-0.065
 kindness-0.057
 generosity-0.056
 cand-0.055
 forgiveness-0.054
Token an
Token ID281
Feature activation-6.53e-02

Pos logit contributions
he+0.071
ang+0.065
mer+0.063
tted+0.060
amed+0.056
Neg logit contributions
 fault-0.106
 kindness-0.096
 generosity-0.094
 cand-0.092
 forgiveness-0.090
Token enthusiastic
Token ID21848
Feature activation+8.82e-02

Pos logit contributions
 fault+0.112
 kindness+0.104
 generosity+0.100
 cand+0.094
 vitamins+0.094
Neg logit contributions
he-0.138
tted-0.127
ang-0.126
mer-0.125
amed-0.118
Token swim
Token ID9422
Feature activation+6.73e-02

Pos logit contributions
he+0.204
ang+0.191
tted+0.185
amed+0.180
mer+0.179
Neg logit contributions
 fault-0.143
 kindness-0.126
 generosity-0.121
 cand-0.116
 diamonds-0.114
Tokenmer
Token ID647
Feature activation+2.73e-01

Pos logit contributions
he+0.104
Uh+0.102
aming+0.090
amed+0.089
ang+0.085
Neg logit contributions
 fault-0.138
 kindness-0.132
 generosity-0.125
 stories-0.125
 powers-0.125
Token and
Token ID290
Feature activation-8.28e-03

Pos logit contributions
he+0.551
ang+0.493
tted+0.481
amed+0.460
mer+0.458
Neg logit contributions
 fault-0.531
 kindness-0.475
 generosity-0.461
 cand-0.446
 vitamins-0.434
Token would
Token ID561
Feature activation-4.86e-02

Pos logit contributions
 fault+0.019
 kindness+0.017
 generosity+0.016
 cand+0.016
 forgiveness+0.016
Neg logit contributions
he-0.014
ang-0.013
mer-0.012
tted-0.012
amed-0.011
Token swim
Token ID9422
Feature activation+6.83e-02

Pos logit contributions
 fault+0.107
 kindness+0.094
 generosity+0.094
 cand+0.093
 forgiveness+0.092
Neg logit contributions
he-0.084
ang-0.077
mer-0.075
tted-0.071
imp-0.067
Token all
Token ID477
Feature activation-1.64e-02

Pos logit contributions
he+0.119
ang+0.116
Uh+0.116
imp+0.103
aming+0.103
Neg logit contributions
 fault-0.126
 kindness-0.118
 stories-0.111
 dreams-0.110
 minerals-0.107
Token day
Token ID1110
Feature activation+9.00e-03

Pos logit contributions
 fault+0.025
 kindness+0.024
 unic+0.022
 generosity+0.021
 dreams+0.021
Neg logit contributions
he-0.034
tted-0.034
ang-0.032
Uh-0.032
mer-0.031
Token be
Token ID307
Feature activation+2.60e-02

Pos logit contributions
 fault+0.108
 kindness+0.096
 generosity+0.094
 cand+0.092
 forgiveness+0.090
Neg logit contributions
he-0.104
ang-0.097
mer-0.094
tted-0.090
amed-0.086
Token the
Token ID262
Feature activation+2.04e-02

Pos logit contributions
he+0.061
ang+0.056
mer+0.054
tted+0.052
umbling+0.050
Neg logit contributions
 fault-0.042
 kindness-0.037
 generosity-0.035
 forgiveness-0.034
 cand-0.034
Token most
Token ID749
Feature activation-3.29e-02

Pos logit contributions
he+0.049
ang+0.046
mer+0.044
tted+0.044
amed+0.042
Neg logit contributions
 fault-0.032
 kindness-0.027
 generosity-0.026
 cand-0.025
 forgiveness-0.024
Token successful
Token ID4388
Feature activation+1.03e-02

Pos logit contributions
 fault+0.061
 kindness+0.055
 generosity+0.052
 cand+0.051
 vitamins+0.050
Neg logit contributions
he-0.068
ang-0.061
tted-0.061
amed-0.060
mer-0.059
Token swim
Token ID9422
Feature activation+6.81e-02

Pos logit contributions
he+0.024
amed+0.022
tted+0.022
ang+0.022
aming+0.021
Neg logit contributions
 fault-0.016
 kindness-0.014
 dreams-0.014
 diamonds-0.014
 generosity-0.014
Tokenmer
Token ID647
Feature activation+2.73e-01

Pos logit contributions
he+0.162
Uh+0.152
amed+0.149
ang+0.145
aming+0.143
Neg logit contributions
 fault-0.098
 kindness-0.088
 generosity-0.081
 dreams-0.077
 vitamins-0.077
Token in
Token ID287
Feature activation+1.13e-01

Pos logit contributions
he+0.577
ang+0.519
tted+0.506
amed+0.489
mer+0.477
Neg logit contributions
 fault-0.503
 kindness-0.450
 generosity-0.433
 cand-0.420
 vitamins-0.414
Token the
Token ID262
Feature activation+1.78e-02

Pos logit contributions
he+0.218
ang+0.204
mer+0.197
tted+0.192
amed+0.184
Neg logit contributions
 fault-0.226
 kindness-0.204
 generosity-0.198
 cand-0.190
 forgiveness-0.190
Token pond
Token ID16723
Feature activation+1.02e-01

Pos logit contributions
he+0.033
ang+0.031
amed+0.029
tted+0.029
mer+0.028
Neg logit contributions
 fault-0.033
 kindness-0.032
 dreams-0.031
 memories-0.029
 cand-0.029
Token.
Token ID13
Feature activation+7.24e-03

Pos logit contributions
he+0.206
ang+0.193
mer+0.186
tted+0.181
amed+0.173
Neg logit contributions
 fault-0.195
 kindness-0.172
 generosity-0.168
 cand-0.163
 forgiveness-0.159
Token He
Token ID679
Feature activation+3.00e-02

Pos logit contributions
he+0.014
ang+0.014
mer+0.013
amed+0.012
aming+0.012
Neg logit contributions
 fault-0.014
 kindness-0.013
 generosity-0.012
 cand-0.012
 vitamins-0.012
Token was
Token ID373
Feature activation+4.04e-02

Pos logit contributions
he+0.002
mer+0.002
ang+0.002
tted+0.001
amed+0.001
Neg logit contributions
 fault-0.002
 kindness-0.002
 generosity-0.002
 cand-0.002
 forgiveness-0.002
Token the
Token ID262
Feature activation+1.86e-02

Pos logit contributions
he+0.073
ang+0.068
mer+0.066
tted+0.063
amed+0.060
Neg logit contributions
 fault-0.085
 kindness-0.076
 generosity-0.075
 cand-0.073
 forgiveness-0.071
Token most
Token ID749
Feature activation-3.47e-02

Pos logit contributions
he+0.039
ang+0.037
mer+0.036
tted+0.034
amed+0.033
Neg logit contributions
 fault-0.034
 kindness-0.030
 generosity-0.029
 cand-0.029
 forgiveness-0.028
Token successful
Token ID4388
Feature activation+9.30e-03

Pos logit contributions
 fault+0.062
 kindness+0.054
 generosity+0.053
 cand+0.050
 honesty+0.049
Neg logit contributions
he-0.075
ang-0.070
mer-0.067
tted-0.066
amed-0.065
Token swim
Token ID9422
Feature activation+6.70e-02

Pos logit contributions
he+0.022
ang+0.021
tted+0.020
amed+0.020
mer+0.019
Neg logit contributions
 fault-0.014
 kindness-0.012
 generosity-0.012
 cand-0.011
 vitamins-0.011
Tokenmer
Token ID647
Feature activation+2.73e-01

Pos logit contributions
he+0.073
ang+0.060
amed+0.053
imp+0.050
tted+0.050
Neg logit contributions
 fault-0.191
 kindness-0.177
 generosity-0.173
 cand-0.168
 honesty-0.165
Token in
Token ID287
Feature activation+1.12e-01

Pos logit contributions
he+0.525
ang+0.486
tted+0.455
mer+0.451
amed+0.437
Neg logit contributions
 fault-0.549
 kindness-0.490
 generosity-0.479
 cand-0.464
 forgiveness-0.454
Token the
Token ID262
Feature activation+1.63e-02

Pos logit contributions
he+0.207
ang+0.193
mer+0.186
tted+0.179
amed+0.171
Neg logit contributions
 fault-0.233
 kindness-0.208
 generosity-0.204
 cand-0.198
 forgiveness-0.195
Token pond
Token ID16723
Feature activation+1.02e-01

Pos logit contributions
he+0.029
ang+0.028
tted+0.026
amed+0.025
mer+0.025
Neg logit contributions
 fault-0.033
 kindness-0.032
 generosity-0.030
 vitamins-0.029
 cand-0.029
Token!
Token ID0
Feature activation+3.65e-02

Pos logit contributions
he+0.186
ang+0.173
mer+0.171
tted+0.158
amed+0.149
Neg logit contributions
 fault-0.212
 kindness-0.187
 generosity-0.186
 cand-0.182
 forgiveness-0.179
Token He
Token ID679
Feature activation+2.75e-02

Pos logit contributions
he+0.063
ang+0.059
mer+0.056
tted+0.054
amed+0.051
Neg logit contributions
 fault-0.080
 kindness-0.072
 generosity-0.071
 cand-0.069
 forgiveness-0.068
Token and
Token ID290
Feature activation+5.87e-03

Pos logit contributions
he+0.157
ang+0.134
mer+0.115
room+0.109
amed+0.107
Neg logit contributions
 fault-0.235
 generosity-0.204
 kindness-0.202
 weaknesses-0.202
 vitamins-0.201
Token Dad
Token ID17415
Feature activation+1.15e-01

Pos logit contributions
he+0.011
ang+0.011
mer+0.010
tted+0.010
imp+0.009
Neg logit contributions
 fault-0.012
 kindness-0.010
 generosity-0.010
 cand-0.010
 forgiveness-0.010
Token.
Token ID13
Feature activation+3.26e-02

Pos logit contributions
he+0.200
ang+0.174
mer+0.165
tted+0.155
amed+0.151
Neg logit contributions
 fault-0.259
 unic-0.219
 kindness-0.219
 generosity-0.217
 vitamins-0.217
Token "
Token ID366
Feature activation+1.14e-01

Pos logit contributions
he+0.071
mer+0.068
ang+0.068
tted+0.063
aming+0.062
Neg logit contributions
 fault-0.055
 kindness-0.045
 vitamins-0.044
 cand-0.044
 generosity-0.043
TokenHam
Token ID21281
Feature activation-4.93e-02

Pos logit contributions
mer+0.165
he+0.165
ang+0.164
tted+0.149
imp+0.143
Neg logit contributions
 fault-0.241
 cand-0.226
 vitamins-0.225
 kindness-0.222
 diamonds-0.220
Tokenmer
Token ID647
Feature activation+2.72e-01

Pos logit contributions
 fault+0.109
 kindness+0.099
 generosity+0.098
 cand+0.096
 forgiveness+0.094
Neg logit contributions
he-0.081
ang-0.074
tted-0.070
mer-0.069
amed-0.067
Token!"
Token ID2474
Feature activation+1.82e-02

Pos logit contributions
he+0.564
ang+0.532
mer+0.511
tted+0.496
amed+0.475
Neg logit contributions
 fault-0.503
 kindness-0.442
 generosity-0.432
 cand-0.421
 forgiveness-0.408
Token yelled
Token ID22760
Feature activation+6.21e-02

Pos logit contributions
he+0.041
ang+0.038
mer+0.038
tted+0.037
amed+0.035
Neg logit contributions
 fault-0.028
 kindness-0.026
 cand-0.025
 generosity-0.025
 dreams-0.023
Token Zoe
Token ID32280
Feature activation-1.33e-02

Pos logit contributions
he+0.134
ang+0.125
tted+0.118
mer+0.115
amed+0.115
Neg logit contributions
 fault-0.110
 kindness-0.096
 generosity-0.091
 cand-0.091
 diamonds-0.086
Token proudly
Token ID23017
Feature activation+7.17e-02

Pos logit contributions
 fault+0.026
 kindness+0.023
 generosity+0.022
 cand+0.021
 honesty+0.021
Neg logit contributions
he-0.026
ang-0.024
mer-0.024
tted-0.023
amed-0.022
Token.
Token ID13
Feature activation+2.93e-02

Pos logit contributions
he+0.121
ang+0.111
mer+0.108
tted+0.100
amed+0.097
Neg logit contributions
 fault-0.161
 kindness-0.141
 generosity-0.140
 vitamins-0.138
 unic-0.137
Token You
Token ID921
Feature activation-7.61e-03

Pos logit contributions
he+0.042
ang+0.038
mer+0.037
tted+0.035
amed+0.033
Neg logit contributions
 fault-0.054
 kindness-0.048
 generosity-0.047
 cand-0.046
 forgiveness-0.045
Token are
Token ID389
Feature activation+2.24e-02

Pos logit contributions
 fault+0.017
 kindness+0.015
 generosity+0.015
 cand+0.014
 honesty+0.014
Neg logit contributions
he-0.013
ang-0.012
mer-0.012
tted-0.011
amed-0.011
Token a
Token ID257
Feature activation-4.55e-02

Pos logit contributions
he+0.043
ang+0.040
mer+0.038
tted+0.037
amed+0.035
Neg logit contributions
 fault-0.045
 kindness-0.040
 generosity-0.040
 cand-0.039
 forgiveness-0.038
Token good
Token ID922
Feature activation-2.28e-02

Pos logit contributions
 fault+0.086
 kindness+0.076
 generosity+0.074
 cand+0.072
 forgiveness+0.071
Neg logit contributions
he-0.093
ang-0.087
mer-0.084
tted-0.081
amed-0.078
Token swim
Token ID9422
Feature activation+6.86e-02

Pos logit contributions
 fault+0.043
 kindness+0.038
 generosity+0.038
 cand+0.037
 unic+0.036
Neg logit contributions
he-0.044
ang-0.043
mer-0.042
tted-0.039
imp-0.038
Tokenmer
Token ID647
Feature activation+2.72e-01

Pos logit contributions
he+0.096
ang+0.083
Uh+0.080
amed+0.080
tted+0.077
Neg logit contributions
 fault-0.171
 kindness-0.154
 generosity-0.149
urger-0.144
 cand-0.143
Token.
Token ID13
Feature activation+6.05e-03

Pos logit contributions
he+0.551
ang+0.511
tted+0.488
mer+0.478
amed+0.464
Neg logit contributions
 fault-0.522
 kindness-0.459
 generosity-0.446
 cand-0.431
 forgiveness-0.422
Token Do
Token ID2141
Feature activation-1.46e-01

Pos logit contributions
he+0.010
ang+0.010
mer+0.009
tted+0.009
amed+0.008
Neg logit contributions
 fault-0.013
 kindness-0.012
 generosity-0.012
 cand-0.011
 forgiveness-0.011
Token you
Token ID345
Feature activation-3.32e-03

Pos logit contributions
 fault+0.385
 generosity+0.319
urger+0.317
 kindness+0.315
 diamonds+0.301
Neg logit contributions
he-0.197
ang-0.171
room-0.145
tted-0.133
amed-0.133
Token want
Token ID765
Feature activation+1.13e-01

Pos logit contributions
 fault+0.008
 kindness+0.007
 generosity+0.007
 cand+0.007
 forgiveness+0.007
Neg logit contributions
he-0.005
ang-0.005
mer-0.005
tted-0.004
amed-0.004
Token to
Token ID284
Feature activation-5.45e-02

Pos logit contributions
he+0.133
 nearby+0.118
ang+0.115
room+0.115
 across+0.113
Neg logit contributions
 fault-0.282
urger-0.242
ccoli-0.237
 generosity-0.235
 Majesty-0.233
Token was
Token ID373
Feature activation+3.01e-02

Pos logit contributions
 fault+0.011
 kindness+0.009
 cand+0.009
 generosity+0.009
 worries+0.008
Neg logit contributions
he-0.014
ang-0.013
mer-0.013
tted-0.012
imp-0.012
Token not
Token ID407
Feature activation-6.06e-02

Pos logit contributions
he+0.061
ang+0.057
mer+0.056
tted+0.053
amed+0.051
Neg logit contributions
 fault-0.058
 kindness-0.051
 generosity-0.049
 cand-0.048
 forgiveness-0.047
Token a
Token ID257
Feature activation-4.44e-02

Pos logit contributions
 fault+0.126
 kindness+0.112
 generosity+0.109
 cand+0.105
 forgiveness+0.104
Neg logit contributions
he-0.113
ang-0.104
mer-0.101
tted-0.097
amed-0.092
Token good
Token ID922
Feature activation-2.45e-02

Pos logit contributions
 fault+0.083
 kindness+0.073
 generosity+0.071
 cand+0.069
 forgiveness+0.067
Neg logit contributions
he-0.092
ang-0.086
mer-0.083
tted-0.080
amed-0.077
Token swim
Token ID9422
Feature activation+6.97e-02

Pos logit contributions
 fault+0.049
 kindness+0.044
 generosity+0.044
 cand+0.043
 unic+0.042
Neg logit contributions
he-0.045
ang-0.043
mer-0.042
tted-0.039
imp-0.037
Tokenmer
Token ID647
Feature activation+2.72e-01

Pos logit contributions
he+0.125
amed+0.115
ang+0.114
Uh+0.108
aming+0.108
Neg logit contributions
 fault-0.151
 kindness-0.128
 generosity-0.121
 cand-0.121
 feelings-0.117
Token.
Token ID13
Feature activation-1.10e-04

Pos logit contributions
he+0.553
ang+0.519
mer+0.491
tted+0.485
amed+0.469
Neg logit contributions
 fault-0.523
 kindness-0.456
 generosity-0.444
 cand-0.434
 forgiveness-0.423
Token He
Token ID679
Feature activation+1.89e-02

Pos logit contributions
 fault+0.000
 kindness+0.000
 generosity+0.000
 cand+0.000
 forgiveness+0.000
Neg logit contributions
he-0.000
ang-0.000
mer-0.000
tted-0.000
amed-0.000
Token started
Token ID2067
Feature activation+1.28e-01

Pos logit contributions
he+0.035
ang+0.032
mer+0.031
tted+0.030
amed+0.029
Neg logit contributions
 fault-0.039
 kindness-0.035
 generosity-0.034
 cand-0.034
 forgiveness-0.033
Token to
Token ID284
Feature activation-5.76e-02

Pos logit contributions
he+0.252
ang+0.248
mer+0.237
tted+0.225
imp+0.217
Neg logit contributions
 fault-0.235
 generosity-0.210
 kindness-0.209
 cand-0.204
 diamonds-0.198
Token panic
Token ID13619
Feature activation+3.55e-02

Pos logit contributions
 fault+0.118
 kindness+0.105
 generosity+0.104
 cand+0.102
 forgiveness+0.099
Neg logit contributions
he-0.107
ang-0.100
mer-0.097
tted-0.092
imp-0.087
Token felt
Token ID2936
Feature activation-8.41e-03

Pos logit contributions
 fault+0.065
 kindness+0.057
 generosity+0.056
 cand+0.055
 forgiveness+0.054
Neg logit contributions
he-0.054
ang-0.050
mer-0.048
tted-0.046
amed-0.044
Token like
Token ID588
Feature activation+3.44e-02

Pos logit contributions
 fault+0.019
 kindness+0.017
 generosity+0.016
 cand+0.016
 forgiveness+0.016
Neg logit contributions
he-0.015
ang-0.013
mer-0.013
tted-0.012
amed-0.012
Token it
Token ID340
Feature activation-2.07e-02

Pos logit contributions
he+0.062
ang+0.055
mer+0.054
amed+0.051
tted+0.050
Neg logit contributions
 fault-0.068
urger-0.061
 cand-0.058
 generosity-0.058
 kindness-0.056
Token was
Token ID373
Feature activation+2.93e-02

Pos logit contributions
 fault+0.045
 kindness+0.040
 generosity+0.040
 cand+0.040
 forgiveness+0.038
Neg logit contributions
he-0.036
ang-0.033
mer-0.032
tted-0.030
amed-0.029
Token her
Token ID607
Feature activation-2.17e-02

Pos logit contributions
he+0.052
mer+0.045
ang+0.045
amed+0.044
aming+0.043
Neg logit contributions
 fault-0.058
 generosity-0.053
 cand-0.053
urger-0.052
 kindness-0.051
Token fault
Token ID8046
Feature activation-3.90e-01

Pos logit contributions
 fault+0.034
 cand+0.033
 vitamins+0.031
urger+0.031
 unic+0.031
Neg logit contributions
he-0.044
mer-0.042
ang-0.042
 forward-0.037
tted-0.037
Token,
Token ID11
Feature activation-2.05e-03

Pos logit contributions
 fault+0.806
 generosity+0.738
 kindness+0.732
 cand+0.722
 vitamins+0.714
Neg logit contributions
he-0.706
ang-0.656
mer-0.622
tted-0.564
umbling-0.553
Token like
Token ID588
Feature activation+3.47e-02

Pos logit contributions
 fault+0.004
 generosity+0.004
 cand+0.004
urger+0.004
 vitamins+0.004
Neg logit contributions
he-0.003
ang-0.003
mer-0.003
tted-0.003
amed-0.003
Token she
Token ID673
Feature activation-2.67e-02

Pos logit contributions
he+0.060
ang+0.054
mer+0.052
tted+0.049
amed+0.049
Neg logit contributions
 fault-0.066
urger-0.061
 generosity-0.057
 cand-0.056
 kindness-0.055
Token had
Token ID550
Feature activation-3.85e-02

Pos logit contributions
 fault+0.059
 kindness+0.052
 generosity+0.052
 cand+0.051
 forgiveness+0.049
Neg logit contributions
he-0.047
ang-0.043
mer-0.042
tted-0.039
amed-0.037
Token done
Token ID1760
Feature activation-3.87e-02

Pos logit contributions
 fault+0.074
 generosity+0.068
 kindness+0.066
urger+0.065
 cand+0.064
Neg logit contributions
he-0.069
mer-0.062
ang-0.062
tted-0.062
amed-0.060
Token ending
Token ID7464
Feature activation-8.06e-02

Pos logit contributions
 fault+0.081
 cand+0.070
 generosity+0.069
 kindness+0.068
urger+0.067
Neg logit contributions
he-0.084
mer-0.078
ang-0.077
tted-0.071
imp-0.070
Token was
Token ID373
Feature activation+1.54e-02

Pos logit contributions
 fault+0.171
 kindness+0.153
 generosity+0.149
 cand+0.145
 forgiveness+0.143
Neg logit contributions
he-0.145
ang-0.136
mer-0.130
tted-0.124
amed-0.118
Token no
Token ID645
Feature activation-1.07e-01

Pos logit contributions
he+0.030
mer+0.027
ang+0.027
tted+0.026
amed+0.026
Neg logit contributions
 fault-0.030
 cand-0.027
 vitamins-0.026
 generosity-0.026
 kindness-0.026
Token one
Token ID530
Feature activation-1.09e-01

Pos logit contributions
 fault+0.207
urger+0.187
 vitamins+0.183
 generosity+0.181
 cand+0.181
Neg logit contributions
he-0.204
mer-0.195
ang-0.189
tted-0.170
imp-0.163
Token's
Token ID338
Feature activation-1.20e-01

Pos logit contributions
 fault+0.235
 kindness+0.211
 generosity+0.206
 cand+0.200
 forgiveness+0.197
Neg logit contributions
he-0.192
ang-0.179
mer-0.171
tted-0.165
amed-0.158
Token fault
Token ID8046
Feature activation-3.88e-01

Pos logit contributions
 fault+0.208
 cand+0.198
 vitamins+0.187
urger+0.187
 generosity+0.181
Neg logit contributions
he-0.228
mer-0.216
ang-0.208
tted-0.203
amed-0.192
Token,
Token ID11
Feature activation-1.96e-02

Pos logit contributions
 fault+0.780
 kindness+0.695
 generosity+0.684
 cand+0.670
 forgiveness+0.653
Neg logit contributions
he-0.732
ang-0.691
mer-0.665
tted-0.629
amed-0.600
Token but
Token ID475
Feature activation-4.42e-02

Pos logit contributions
 fault+0.038
 cand+0.033
 generosity+0.033
 kindness+0.033
urger+0.032
Neg logit contributions
he-0.037
ang-0.034
mer-0.034
tted-0.031
imp-0.030
Token it
Token ID340
Feature activation-2.61e-02

Pos logit contributions
 fault+0.086
urger+0.075
 cand+0.074
 vitamins+0.072
 generosity+0.071
Neg logit contributions
he-0.086
ang-0.080
mer-0.078
tted-0.072
imp-0.070
Token left
Token ID1364
Feature activation+7.49e-02

Pos logit contributions
 fault+0.058
 cand+0.052
 kindness+0.052
 generosity+0.052
 forgiveness+0.050
Neg logit contributions
he-0.044
mer-0.040
ang-0.039
tted-0.037
amed-0.035
Token the
Token ID262
Feature activation-8.33e-03

Pos logit contributions
he+0.145
ang+0.134
mer+0.131
tted+0.125
amed+0.120
Neg logit contributions
 fault-0.148
 kindness-0.130
 generosity-0.129
 cand-0.126
 forgiveness-0.122
Token that
Token ID326
Feature activation-1.13e-01

Pos logit contributions
 fault+0.037
urger+0.030
ccoli+0.028
 cand+0.028
 fiber+0.028
Neg logit contributions
he-0.030
ang-0.027
mer-0.026
pour-0.025
 corner-0.024
Token it
Token ID340
Feature activation-2.34e-02

Pos logit contributions
 fault+0.233
urger+0.194
 cand+0.178
 fiber+0.175
 generosity+0.173
Neg logit contributions
he-0.208
mer-0.179
ang-0.179
pour-0.165
 corner-0.164
Token was
Token ID373
Feature activation+2.17e-02

Pos logit contributions
 fault+0.060
 generosity+0.052
urger+0.052
 cand+0.052
 kindness+0.051
Neg logit contributions
he-0.032
mer-0.030
tted-0.027
ang-0.027
amed-0.024
Token his
Token ID465
Feature activation-9.16e-03

Pos logit contributions
he+0.035
amed+0.032
mer+0.031
aming+0.031
ang+0.030
Neg logit contributions
 fault-0.045
 cand-0.040
 generosity-0.039
urger-0.039
 fiber-0.038
Token own
Token ID898
Feature activation-5.48e-02

Pos logit contributions
 fault+0.014
 vitamins+0.013
 cand+0.013
ridge+0.013
 unic+0.013
Neg logit contributions
he-0.017
mer-0.016
ang-0.016
 corner-0.015
 far-0.014
Token fault
Token ID8046
Feature activation-3.88e-01

Pos logit contributions
 fault+0.086
 cand+0.082
 unic+0.081
ridge+0.078
 fiber+0.077
Neg logit contributions
he-0.109
mer-0.097
 corner-0.095
amed-0.091
tted-0.090
Token -
Token ID532
Feature activation-2.53e-02

Pos logit contributions
 fault+0.821
 generosity+0.751
 kindness+0.745
 vitamins+0.737
 cand+0.735
Neg logit contributions
he-0.673
ang-0.618
mer-0.588
tted-0.525
umbling-0.517
Token if
Token ID611
Feature activation-7.88e-02

Pos logit contributions
 fault+0.048
urger+0.042
 cand+0.040
 generosity+0.038
 vitamins+0.038
Neg logit contributions
he-0.047
ang-0.043
mer-0.041
 forward-0.040
tted-0.039
Token he
Token ID339
Feature activation-2.31e-02

Pos logit contributions
 fault+0.178
urger+0.158
ccoli+0.142
 cand+0.139
 vitamins+0.136
Neg logit contributions
he-0.111
mer-0.108
ang-0.102
 far-0.097
Hey-0.091
Token had
Token ID550
Feature activation-4.82e-02

Pos logit contributions
 fault+0.053
 kindness+0.045
 generosity+0.045
 vitamins+0.045
 cand+0.045
Neg logit contributions
he-0.039
mer-0.035
ang-0.034
tted-0.033
imp-0.029
Token written
Token ID3194
Feature activation-1.15e-01

Pos logit contributions
 fault+0.105
mm+0.094
ccoli+0.093
 generosity+0.092
 unic+0.091
Neg logit contributions
he-0.061
amed-0.061
tted-0.059
 far-0.056
aming-0.055
Token knew
Token ID2993
Feature activation+2.52e-02

Pos logit contributions
he+0.014
ang+0.013
mer+0.012
tted+0.012
amed+0.011
Neg logit contributions
 fault-0.015
 kindness-0.014
 generosity-0.013
 cand-0.013
 unic-0.013
Token it
Token ID340
Feature activation+1.23e-03

Pos logit contributions
he+0.046
ang+0.040
mer+0.039
tted+0.038
imp+0.036
Neg logit contributions
 fault-0.050
urger-0.043
 vitamins-0.043
 cand-0.043
 generosity-0.042
Token was
Token ID373
Feature activation+5.29e-02

Pos logit contributions
he+0.002
mer+0.002
ang+0.002
tted+0.002
amed+0.002
Neg logit contributions
 fault-0.003
 generosity-0.003
 kindness-0.003
 cand-0.002
urger-0.002
Token her
Token ID607
Feature activation-6.69e-03

Pos logit contributions
he+0.094
mer+0.082
amed+0.080
ang+0.079
aming+0.079
Neg logit contributions
 fault-0.107
 cand-0.094
 generosity-0.093
urger-0.091
 kindness-0.091
Token own
Token ID898
Feature activation-3.98e-02

Pos logit contributions
 fault+0.011
 cand+0.010
 vitamins+0.010
urger+0.010
 unic+0.010
Neg logit contributions
he-0.013
mer-0.012
ang-0.012
tted-0.011
 corner-0.011
Token fault
Token ID8046
Feature activation-3.86e-01

Pos logit contributions
 fault+0.066
 cand+0.058
 generosity+0.057
 unic+0.056
 vitamins+0.056
Neg logit contributions
he-0.082
mer-0.075
ang-0.074
tted-0.070
 corner-0.067
Token.
Token ID13
Feature activation+6.62e-03

Pos logit contributions
 fault+0.815
 generosity+0.764
 kindness+0.755
 vitamins+0.741
 cand+0.740
Neg logit contributions
he-0.662
ang-0.579
mer-0.562
umbling-0.494
tted-0.494
Token
Token ID198
Feature activation-1.97e-03

Pos logit contributions
he+0.011
ang+0.010
mer+0.010
tted+0.009
amed+0.009
Neg logit contributions
 fault-0.015
 kindness-0.013
 generosity-0.013
 cand-0.013
 forgiveness-0.012
Token
Token ID198
Feature activation-1.91e-03

Pos logit contributions
 fault+0.004
 kindness+0.004
 generosity+0.003
 forgiveness+0.003
 honesty+0.003
Neg logit contributions
he-0.004
mer-0.003
ang-0.003
tted-0.003
amed-0.003
TokenThe
Token ID464
Feature activation-5.65e-02

Pos logit contributions
 fault+0.004
 kindness+0.004
 generosity+0.004
 forgiveness+0.004
 honesty+0.004
Neg logit contributions
mer-0.003
he-0.003
ang-0.003
tted-0.003
umbling-0.003
Token girl
Token ID2576
Feature activation+1.20e-01

Pos logit contributions
 fault+0.111
 kindness+0.095
 generosity+0.094
 cand+0.092
 vitamins+0.090
Neg logit contributions
he-0.110
ang-0.103
mer-0.100
tted-0.096
amed-0.092
Tokenawn
Token ID3832
Feature activation+1.03e-01

Pos logit contributions
 fault+0.108
 kindness+0.095
 generosity+0.093
 honesty+0.089
 forgiveness+0.088
Neg logit contributions
he-0.075
ang-0.068
mer-0.064
tted-0.061
amed-0.060
Tokenâ
Token ID22940
Feature activation+2.54e-02

Pos logit contributions
he+0.163
ang+0.146
tted+0.137
mer+0.136
amed+0.130
Neg logit contributions
 fault-0.236
 kindness-0.215
 generosity-0.210
 cand-0.207
 unic-0.202
Token
Token ID26391
Feature activation+5.50e-02

Pos logit contributions
he+0.039
ang+0.038
mer+0.033
amed+0.033
imp+0.033
Neg logit contributions
 fault-0.055
 kindness-0.051
 generosity-0.049
 honesty-0.047
 forgiveness-0.046
Token
Token ID8151
Feature activation-6.52e-02

Pos logit contributions
he+0.101
ang+0.092
mer+0.090
tted+0.085
amed+0.084
Neg logit contributions
 fault-0.112
 kindness-0.097
 generosity-0.095
 unic-0.092
 cand-0.092
Tokens
Token ID82
Feature activation-4.29e-02

Pos logit contributions
 fault+0.133
 kindness+0.110
 generosity+0.104
 cand+0.102
 honesty+0.101
Neg logit contributions
mer-0.105
Uh-0.105
ang-0.103
he-0.101
imp-0.097
Token fault
Token ID8046
Feature activation-3.85e-01

Pos logit contributions
 fault+0.076
 cand+0.065
 kindness+0.065
 generosity+0.065
 vitamins+0.063
Neg logit contributions
he-0.090
ang-0.082
mer-0.081
tted-0.080
amed-0.076
Token.
Token ID13
Feature activation-7.89e-03

Pos logit contributions
 fault+0.761
 kindness+0.676
 generosity+0.669
 cand+0.656
 forgiveness+0.641
Neg logit contributions
he-0.736
ang-0.693
mer-0.666
tted-0.631
amed-0.606
Token
Token ID220
Feature activation-7.35e-02

Pos logit contributions
 fault+0.017
 kindness+0.015
 generosity+0.015
 cand+0.014
 forgiveness+0.014
Neg logit contributions
he-0.014
ang-0.013
mer-0.013
tted-0.012
amed-0.012
Token
Token ID198
Feature activation-1.49e-02

Pos logit contributions
 fault+0.148
 kindness+0.126
 generosity+0.121
 honesty+0.118
 forgiveness+0.117
Neg logit contributions
he-0.136
mer-0.133
ang-0.131
imp-0.116
tted-0.116
Token
Token ID198
Feature activation-1.49e-02

Pos logit contributions
 fault+0.028
 kindness+0.026
 generosity+0.025
 unic+0.024
 honesty+0.024
Neg logit contributions
he-0.027
mer-0.026
ang-0.025
tted-0.023
amed-0.023
TokenThe
Token ID464
Feature activation-6.70e-02

Pos logit contributions
 fault+0.032
 kindness+0.030
 generosity+0.029
 cand+0.029
 unic+0.029
Neg logit contributions
he-0.023
mer-0.021
ang-0.020
tted-0.019
amed-0.018
Token realised
Token ID19169
Feature activation-8.80e-02

Pos logit contributions
 fault+0.051
 kindness+0.044
 generosity+0.044
 forgiveness+0.042
 cand+0.042
Neg logit contributions
he-0.047
ang-0.041
mer-0.041
tted-0.039
amed-0.038
Token it
Token ID340
Feature activation-2.37e-02

Pos logit contributions
 fault+0.170
urger+0.149
 generosity+0.140
 vitamins+0.140
 cand+0.138
Neg logit contributions
he-0.175
ang-0.157
mer-0.145
imp-0.138
pour-0.137
Token wasn
Token ID2492
Feature activation-5.19e-02

Pos logit contributions
 fault+0.056
 generosity+0.050
 kindness+0.050
 cand+0.050
urger+0.048
Neg logit contributions
he-0.037
mer-0.033
ang-0.032
tted-0.031
amed-0.029
Token't
Token ID470
Feature activation-7.42e-02

Pos logit contributions
 fault+0.101
 kindness+0.090
 generosity+0.087
 cand+0.084
 unic+0.084
Neg logit contributions
he-0.100
ang-0.096
mer-0.093
tted-0.087
amed-0.085
Token her
Token ID607
Feature activation-3.11e-02

Pos logit contributions
 fault+0.153
 generosity+0.137
 kindness+0.136
 cand+0.133
 forgiveness+0.131
Neg logit contributions
he-0.133
ang-0.119
mer-0.118
tted-0.111
amed-0.110
Token fault
Token ID8046
Feature activation-3.85e-01

Pos logit contributions
 cand+0.051
 vitamins+0.050
 fiber+0.048
urger+0.048
 fault+0.047
Neg logit contributions
he-0.057
mer-0.057
ang-0.053
 forward-0.051
 corner-0.049
Token after
Token ID706
Feature activation+2.84e-02

Pos logit contributions
 fault+0.772
 kindness+0.695
 generosity+0.689
 cand+0.676
 vitamins+0.664
Neg logit contributions
he-0.726
ang-0.678
mer-0.652
tted-0.608
amed-0.578
Token all
Token ID477
Feature activation-3.51e-02

Pos logit contributions
he+0.054
ang+0.045
mer+0.043
tted+0.042
imp+0.041
Neg logit contributions
 fault-0.059
urger-0.051
 generosity-0.051
 kindness-0.050
 cand-0.049
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
 fault+0.077
urger+0.070
 cand+0.068
 vitamins+0.067
 generosity+0.067
Neg logit contributions
he-0.058
ang-0.050
mer-0.050
tted-0.044
imp-0.044
Token<|endoftext|>
Token ID50256
Feature activation+4.22e-02

Pos logit contributions
 fault+0.031
 kindness+0.028
 generosity+0.028
 cand+0.028
urger+0.027
Neg logit contributions
he-0.018
ang-0.016
mer-0.015
tted-0.015
amed-0.013
TokenOnce
Token ID7454
Feature activation-8.34e-02

Pos logit contributions
ang+0.074
mer+0.069
imp+0.067
he+0.067
amed+0.066
Neg logit contributions
 fault-0.080
 kindness-0.070
 unic-0.070
 dreams-0.069
 Brown-0.068
Token and
Token ID290
Feature activation+1.39e-03

Pos logit contributions
 fault+0.099
 kindness+0.087
 generosity+0.087
 cand+0.085
 unic+0.084
Neg logit contributions
he-0.086
ang-0.080
mer-0.078
tted-0.073
amed-0.070
Token said
Token ID531
Feature activation-2.38e-02

Pos logit contributions
he+0.003
ang+0.002
mer+0.002
tted+0.002
amed+0.002
Neg logit contributions
 fault-0.003
 kindness-0.003
 generosity-0.003
 cand-0.002
 forgiveness-0.002
Token it
Token ID340
Feature activation-1.19e-02

Pos logit contributions
 fault+0.051
 generosity+0.044
 kindness+0.044
 cand+0.043
urger+0.043
Neg logit contributions
he-0.041
ang-0.038
mer-0.037
tted-0.035
amed-0.034
Token was
Token ID373
Feature activation+3.01e-02

Pos logit contributions
 fault+0.029
 generosity+0.026
 kindness+0.026
 cand+0.025
 diamonds+0.025
Neg logit contributions
he-0.017
mer-0.016
ang-0.015
tted-0.015
amed-0.014
Token their
Token ID511
Feature activation+2.47e-02

Pos logit contributions
he+0.053
ang+0.050
mer+0.049
amed+0.046
tted+0.046
Neg logit contributions
 fault-0.060
 cand-0.053
 generosity-0.053
 kindness-0.052
 forgiveness-0.050
Token fault
Token ID8046
Feature activation-3.80e-01

Pos logit contributions
he+0.051
ang+0.049
mer+0.048
tted+0.044
imp+0.044
Neg logit contributions
 fault-0.038
 cand-0.035
 unic-0.034
 generosity-0.034
 kindness-0.033
Token.
Token ID13
Feature activation+1.87e-02

Pos logit contributions
 fault+0.791
 generosity+0.721
 kindness+0.718
 cand+0.712
 vitamins+0.704
Neg logit contributions
he-0.666
ang-0.632
mer-0.595
tted-0.538
apping-0.536
Token Mom
Token ID11254
Feature activation+5.63e-02

Pos logit contributions
he+0.034
ang+0.031
mer+0.031
tted+0.029
amed+0.028
Neg logit contributions
 fault-0.040
 kindness-0.035
 generosity-0.035
 cand-0.034
 honesty-0.033
Token said
Token ID531
Feature activation-2.98e-02

Pos logit contributions
he+0.113
ang+0.108
mer+0.105
tted+0.102
amed+0.096
Neg logit contributions
 fault-0.104
 kindness-0.093
 generosity-0.090
 cand-0.088
 forgiveness-0.086
Token they
Token ID484
Feature activation-6.81e-03

Pos logit contributions
 fault+0.059
urger+0.051
 generosity+0.051
 vitamins+0.050
 unic+0.049
Neg logit contributions
he-0.055
ang-0.050
mer-0.048
tted-0.044
apping-0.044
Token were
Token ID547
Feature activation+1.41e-02

Pos logit contributions
 fault+0.016
 generosity+0.014
 kindness+0.014
 cand+0.014
 diamonds+0.014
Neg logit contributions
he-0.011
ang-0.010
mer-0.009
tted-0.009
amed-0.009
Token them
Token ID606
Feature activation-5.43e-03

Pos logit contributions
 fault+0.331
 cand+0.275
 generosity+0.273
 vitamins+0.271
 unic+0.269
Neg logit contributions
he-0.335
mer-0.283
ang-0.281
tted-0.275
amed-0.272
Token it
Token ID340
Feature activation-1.92e-02

Pos logit contributions
 fault+0.010
 vitamins+0.009
 cand+0.009
urger+0.009
 fiber+0.009
Neg logit contributions
he-0.010
 forward-0.008
tted-0.008
ang-0.008
mer-0.008
Token was
Token ID373
Feature activation+2.51e-02

Pos logit contributions
 fault+0.045
 cand+0.043
 diamonds+0.042
 vitamins+0.042
 generosity+0.041
Neg logit contributions
he-0.025
mer-0.023
tted-0.022
ang-0.020
amed-0.020
Token all
Token ID477
Feature activation-2.75e-02

Pos logit contributions
he+0.048
mer+0.045
ang+0.044
amed+0.043
tted+0.043
Neg logit contributions
 fault-0.045
 cand-0.042
 generosity-0.041
 kindness-0.040
 vitamins-0.039
Token their
Token ID511
Feature activation+1.11e-02

Pos logit contributions
 fault+0.053
 cand+0.049
urger+0.049
 generosity+0.049
 kindness+0.047
Neg logit contributions
he-0.050
mer-0.044
ang-0.044
amed-0.043
tted-0.043
Token fault
Token ID8046
Feature activation-3.79e-01

Pos logit contributions
he+0.022
mer+0.021
ang+0.021
imp+0.019
umpy+0.019
Neg logit contributions
 fault-0.016
ccoli-0.015
urger-0.015
chief-0.015
 cand-0.014
Token.
Token ID13
Feature activation-1.06e-02

Pos logit contributions
 fault+0.790
 generosity+0.701
 kindness+0.700
 vitamins+0.695
 cand+0.694
Neg logit contributions
he-0.686
ang-0.638
mer-0.608
tted-0.546
umbling-0.542
Token His
Token ID2399
Feature activation-8.24e-02

Pos logit contributions
 fault+0.021
 kindness+0.019
 generosity+0.019
 cand+0.018
 forgiveness+0.018
Neg logit contributions
he-0.020
ang-0.019
mer-0.018
tted-0.017
amed-0.016
Token friends
Token ID2460
Feature activation-1.69e-02

Pos logit contributions
 fault+0.143
 vitamins+0.123
 diamonds+0.123
 cand+0.122
 unic+0.121
Neg logit contributions
he-0.152
ang-0.149
mer-0.143
tted-0.140
amed-0.138
Token were
Token ID547
Feature activation+1.16e-02

Pos logit contributions
 fault+0.036
 kindness+0.032
 generosity+0.031
 cand+0.031
 forgiveness+0.030
Neg logit contributions
he-0.030
ang-0.028
mer-0.027
tted-0.026
amed-0.025
Token so
Token ID523
Feature activation+2.07e-02

Pos logit contributions
he+0.020
ang+0.019
mer+0.019
tted+0.017
room+0.016
Neg logit contributions
 fault-0.025
 kindness-0.023
 generosity-0.022
 forgiveness-0.021
 cand-0.021
Token it
Token ID340
Feature activation-1.77e-02

Pos logit contributions
 fault+0.038
 cand+0.034
urger+0.033
 vitamins+0.033
 fiber+0.032
Neg logit contributions
he-0.033
ang-0.032
mer-0.027
pour-0.027
imp-0.026
Token was
Token ID373
Feature activation+2.10e-02

Pos logit contributions
 fault+0.042
 generosity+0.038
 cand+0.038
 kindness+0.037
urger+0.037
Neg logit contributions
he-0.027
mer-0.024
tted-0.023
ang-0.022
amed-0.021
Token all
Token ID477
Feature activation-2.41e-02

Pos logit contributions
he+0.033
mer+0.030
 forward+0.029
aming+0.028
amed+0.028
Neg logit contributions
 fault-0.041
 cand-0.040
urger-0.038
 unic-0.038
 generosity-0.038
Token Joe
Token ID5689
Feature activation+3.27e-02

Pos logit contributions
 fault+0.048
urger+0.045
 cand+0.045
 generosity+0.044
 kindness+0.043
Neg logit contributions
he-0.043
mer-0.037
ang-0.036
tted-0.036
amed-0.035
Token's
Token ID338
Feature activation-1.13e-01

Pos logit contributions
he+0.056
mer+0.054
ang+0.052
tted+0.048
umpy+0.045
Neg logit contributions
 fault-0.068
 unic-0.062
 cand-0.060
 diamonds-0.059
 generosity-0.058
Token fault
Token ID8046
Feature activation-3.79e-01

Pos logit contributions
 cand+0.173
 unic+0.170
 fault+0.164
urger+0.161
 diamonds+0.156
Neg logit contributions
he-0.213
mer-0.194
ang-0.193
tted-0.190
 corner-0.181
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
 fault+0.769
 kindness+0.684
 generosity+0.684
 cand+0.675
 vitamins+0.665
Neg logit contributions
he-0.703
ang-0.664
mer-0.631
tted-0.574
umbling-0.559
Token Joe
Token ID5689
Feature activation+3.32e-02

Pos logit contributions
 fault+0.027
 kindness+0.024
 generosity+0.023
 honesty+0.022
 forgiveness+0.022
Neg logit contributions
he-0.024
ang-0.022
mer-0.021
tted-0.020
amed-0.019
Token tried
Token ID3088
Feature activation-2.70e-02

Pos logit contributions
he+0.059
ang+0.055
mer+0.054
tted+0.051
amed+0.049
Neg logit contributions
 fault-0.072
 kindness-0.062
 generosity-0.062
 cand-0.060
 vitamins-0.060
Token to
Token ID284
Feature activation-6.41e-02

Pos logit contributions
mm+0.059
 unic+0.059
urger+0.058
 vitamins+0.058
 diamonds+0.057
Neg logit contributions
he-0.036
 across-0.032
 corner-0.031
 forward-0.031
 nearby-0.029
Token talk
Token ID1561
Feature activation-4.04e-02

Pos logit contributions
 fault+0.125
 generosity+0.105
 cand+0.105
 kindness+0.104
 vitamins+0.104
Neg logit contributions
he-0.132
ang-0.117
mer-0.115
tted-0.114
imp-0.107
Token "
Token ID366
Feature activation+1.12e-01

Pos logit contributions
he+0.001
ang+0.001
mer+0.001
tted+0.001
amed+0.001
Neg logit contributions
 fault-0.001
 kindness-0.001
 generosity-0.001
 cand-0.001
 vitamins-0.001
TokenIt
Token ID1026
Feature activation+2.81e-03

Pos logit contributions
he+0.147
ang+0.134
mer+0.126
tted+0.120
amed+0.112
Neg logit contributions
 fault-0.287
 kindness-0.264
 generosity-0.260
 cand-0.253
 forgiveness-0.251
Token's
Token ID338
Feature activation-9.07e-02

Pos logit contributions
he+0.005
mer+0.005
ang+0.005
tted+0.005
amed+0.005
Neg logit contributions
 fault-0.006
 generosity-0.005
 cand-0.005
 kindness-0.005
 forgiveness-0.005
Token your
Token ID534
Feature activation-3.55e-02

Pos logit contributions
 fault+0.162
 cand+0.144
 generosity+0.142
 kindness+0.137
 unic+0.135
Neg logit contributions
he-0.177
ang-0.163
mer-0.158
amed-0.156
tted-0.153
Token own
Token ID898
Feature activation-4.06e-02

Pos logit contributions
 fault+0.056
 cand+0.051
 unic+0.050
 generosity+0.050
 kindness+0.049
Neg logit contributions
he-0.071
ang-0.068
mer-0.066
tted-0.062
imp-0.061
Token fault
Token ID8046
Feature activation-3.78e-01

Pos logit contributions
 fault+0.071
 cand+0.062
 generosity+0.062
 kindness+0.061
 diamonds+0.060
Neg logit contributions
he-0.082
ang-0.075
mer-0.075
tted-0.071
imp-0.068
Token,
Token ID11
Feature activation-1.80e-03

Pos logit contributions
 fault+0.724
 kindness+0.656
 generosity+0.654
 cand+0.645
 forgiveness+0.631
Neg logit contributions
he-0.717
ang-0.663
mer-0.649
tted-0.591
amed-0.577
Token you
Token ID345
Feature activation-9.58e-03

Pos logit contributions
 fault+0.003
 generosity+0.003
 kindness+0.003
 cand+0.003
 vitamins+0.003
Neg logit contributions
he-0.003
ang-0.003
mer-0.003
tted-0.003
imp-0.003
Token should
Token ID815
Feature activation-1.22e-01

Pos logit contributions
 fault+0.019
 generosity+0.017
 kindness+0.017
 cand+0.017
 forgiveness+0.016
Neg logit contributions
he-0.018
ang-0.017
mer-0.016
tted-0.016
imp-0.015
Token've
Token ID1053
Feature activation-7.59e-02

Pos logit contributions
 fault+0.237
 generosity+0.213
 cand+0.211
 diamonds+0.211
 unic+0.203
Neg logit contributions
he-0.233
ang-0.192
tted-0.188
imp-0.182
mer-0.179
Token been
Token ID587
Feature activation-7.68e-02

Pos logit contributions
 fault+0.142
 cand+0.127
urger+0.125
 generosity+0.125
 diamonds+0.123
Neg logit contributions
he-0.144
tted-0.132
amed-0.125
mer-0.124
ang-0.124
Token knew
Token ID2993
Feature activation-1.99e-02

Pos logit contributions
 fault+0.023
 kindness+0.020
 generosity+0.019
 cand+0.019
 diamonds+0.019
Neg logit contributions
he-0.021
ang-0.019
mer-0.019
tted-0.018
amed-0.017
Token it
Token ID340
Feature activation-1.86e-02

Pos logit contributions
 fault+0.038
 cand+0.035
urger+0.034
 unic+0.032
 fiber+0.032
Neg logit contributions
he-0.036
ang-0.034
 forward-0.031
mer-0.030
tted-0.029
Token was
Token ID373
Feature activation+2.50e-02

Pos logit contributions
 fault+0.043
 cand+0.038
 generosity+0.038
 kindness+0.038
 diamonds+0.037
Neg logit contributions
he-0.030
mer-0.026
ang-0.025
tted-0.025
amed-0.023
Token his
Token ID465
Feature activation+1.54e-03

Pos logit contributions
he+0.047
mer+0.042
 forward+0.041
tted+0.041
amed+0.041
Neg logit contributions
 fault-0.046
 cand-0.043
 unic-0.041
 diamonds-0.039
urger-0.039
Token own
Token ID898
Feature activation-5.10e-02

Pos logit contributions
he+0.003
mer+0.003
ang+0.003
tted+0.003
 corner+0.003
Neg logit contributions
 cand-0.002
 fault-0.002
 diamonds-0.002
 unic-0.002
urger-0.002
Token fault
Token ID8046
Feature activation-3.76e-01

Pos logit contributions
 fault+0.087
 cand+0.080
 diamonds+0.077
urger+0.077
 unic+0.076
Neg logit contributions
he-0.103
mer-0.091
tted-0.091
ang-0.089
 corner-0.085
Token.
Token ID13
Feature activation-9.24e-03

Pos logit contributions
 fault+0.807
 vitamins+0.728
 cand+0.727
 generosity+0.726
 kindness+0.722
Neg logit contributions
he-0.664
ang-0.593
mer-0.566
tted-0.505
apping-0.500
Token He
Token ID679
Feature activation+1.43e-02

Pos logit contributions
 fault+0.021
 kindness+0.018
 generosity+0.018
 cand+0.018
 unic+0.017
Neg logit contributions
he-0.016
ang-0.015
mer-0.014
tted-0.013
amed-0.013
Token would
Token ID561
Feature activation-6.07e-02

Pos logit contributions
he+0.024
ang+0.023
mer+0.022
tted+0.021
amed+0.021
Neg logit contributions
 fault-0.032
 kindness-0.027
 cand-0.027
 generosity-0.027
 diamonds-0.026
Token have
Token ID423
Feature activation-5.46e-02

Pos logit contributions
 fault+0.118
 cand+0.103
 vitamins+0.100
 diamonds+0.099
 generosity+0.098
Neg logit contributions
he-0.124
ang-0.103
tted-0.102
mer-0.098
imp-0.097
Token to
Token ID284
Feature activation-6.17e-02

Pos logit contributions
 unic+0.105
urger+0.102
 fault+0.102
 cand+0.101
 diamonds+0.101
Neg logit contributions
he-0.076
amed-0.074
tted-0.074
 across-0.066
aming-0.066
Token realized
Token ID6939
Feature activation-1.02e-01

Pos logit contributions
he+0.021
ang+0.019
mer+0.019
tted+0.018
amed+0.017
Neg logit contributions
 fault-0.028
 kindness-0.024
 generosity-0.024
 vitamins-0.024
 cand-0.023
Token it
Token ID340
Feature activation-2.16e-02

Pos logit contributions
 fault+0.204
urger+0.177
 vitamins+0.175
 generosity+0.172
 kindness+0.168
Neg logit contributions
he-0.198
ang-0.177
mer-0.154
imp-0.152
pour-0.151
Token was
Token ID373
Feature activation+2.26e-02

Pos logit contributions
 fault+0.050
 generosity+0.045
 kindness+0.045
 cand+0.044
 diamonds+0.043
Neg logit contributions
he-0.035
mer-0.030
ang-0.030
tted-0.029
amed-0.027
Token his
Token ID465
Feature activation-4.18e-03

Pos logit contributions
he+0.037
mer+0.033
amed+0.032
aming+0.032
 forward+0.032
Neg logit contributions
 fault-0.044
 cand-0.041
urger-0.040
 generosity-0.040
 Majesty-0.039
Token own
Token ID898
Feature activation-5.30e-02

Pos logit contributions
 vitamins+0.004
 cand+0.004
urger+0.004
ridge+0.004
 diamonds+0.004
Neg logit contributions
he-0.010
mer-0.009
ang-0.009
tted-0.009
umpy-0.009
Token fault
Token ID8046
Feature activation-3.75e-01

Pos logit contributions
 fault+0.075
 cand+0.072
urger+0.070
 diamonds+0.069
 unic+0.068
Neg logit contributions
he-0.114
mer-0.104
tted-0.104
ang-0.100
 corner-0.097
Token for
Token ID329
Feature activation+2.75e-02

Pos logit contributions
 fault+0.785
 kindness+0.698
 generosity+0.698
 cand+0.686
 vitamins+0.683
Neg logit contributions
he-0.689
ang-0.626
mer-0.594
tted-0.550
amed-0.529
Token not
Token ID407
Feature activation-6.74e-02

Pos logit contributions
he+0.046
ang+0.038
imp+0.037
room+0.037
aming+0.036
Neg logit contributions
 fault-0.058
ccoli-0.048
urger-0.048
 cand-0.046
 vitamins-0.046
Token val
Token ID1188
Feature activation-1.94e-01

Pos logit contributions
 fault+0.132
urger+0.116
 cand+0.115
mm+0.114
 unic+0.113
Neg logit contributions
he-0.113
tted-0.105
mer-0.101
ang-0.095
umpy-0.094
Tokenuing
Token ID4250
Feature activation-7.19e-02

Pos logit contributions
 fault+0.322
 kindness+0.285
 generosity+0.279
 cand+0.273
 forgiveness+0.264
Neg logit contributions
he-0.435
ang-0.409
mer-0.401
tted-0.385
imp-0.370
Token her
Token ID607
Feature activation-2.34e-02

Pos logit contributions
 fault+0.132
 generosity+0.116
urger+0.113
 unic+0.112
 cand+0.111
Neg logit contributions
he-0.139
ang-0.128
mer-0.125
tted-0.118
imp-0.114
Token bird
Token ID6512
Feature activation-5.60e-02

Pos logit contributions
he+0.091
ang+0.091
mer+0.089
tted+0.074
imp+0.074
Neg logit contributions
 fault-0.159
 cand-0.140
 kindness-0.139
 generosity-0.139
 unic-0.136
Token.
Token ID13
Feature activation+9.83e-03

Pos logit contributions
 fault+0.115
 kindness+0.101
 generosity+0.100
 cand+0.098
 vitamins+0.095
Neg logit contributions
he-0.104
ang-0.099
mer-0.096
tted-0.090
imp-0.085
Token You
Token ID921
Feature activation-6.33e-03

Pos logit contributions
he+0.016
ang+0.015
mer+0.014
tted+0.014
amed+0.013
Neg logit contributions
 fault-0.022
 kindness-0.020
 generosity-0.020
 cand-0.020
 forgiveness-0.019
Token admit
Token ID9159
Feature activation-1.87e-02

Pos logit contributions
 fault+0.015
 generosity+0.013
 kindness+0.012
 cand+0.012
 honesty+0.012
Neg logit contributions
he-0.011
tted-0.009
ang-0.009
mer-0.009
amed-0.008
Token your
Token ID534
Feature activation-4.75e-02

Pos logit contributions
 fault+0.037
 generosity+0.032
 kindness+0.031
 cand+0.031
 vitamins+0.030
Neg logit contributions
he-0.036
ang-0.033
mer-0.032
imp-0.029
tted-0.029
Token fault
Token ID8046
Feature activation-3.75e-01

Pos logit contributions
 cand+0.072
 fault+0.069
 diamonds+0.067
 vitamins+0.066
urger+0.063
Neg logit contributions
he-0.092
ang-0.091
mer-0.087
imp-0.083
umpy-0.082
Token and
Token ID290
Feature activation-1.26e-02

Pos logit contributions
 fault+0.783
 generosity+0.706
 kindness+0.703
 cand+0.689
 unic+0.676
Neg logit contributions
he-0.668
ang-0.619
mer-0.592
tted-0.548
amed-0.531
Token ask
Token ID1265
Feature activation-2.95e-02

Pos logit contributions
 fault+0.025
 cand+0.020
 generosity+0.020
 kindness+0.020
urger+0.020
Neg logit contributions
he-0.024
ang-0.023
mer-0.021
tted-0.020
amed-0.020
Token for
Token ID329
Feature activation+3.56e-02

Pos logit contributions
 fault+0.064
 generosity+0.054
 kindness+0.054
 cand+0.054
 diamonds+0.053
Neg logit contributions
he-0.052
ang-0.048
mer-0.047
tted-0.043
imp-0.042
Token mercy
Token ID17703
Feature activation-1.06e-01

Pos logit contributions
he+0.061
ang+0.056
imp+0.054
tted+0.053
amed+0.053
Neg logit contributions
 fault-0.074
urger-0.057
ccoli-0.057
 cand-0.056
 unic-0.054
Token.
Token ID13
Feature activation+1.60e-02

Pos logit contributions
 fault+0.216
 kindness+0.191
 generosity+0.189
 cand+0.186
 unic+0.181
Neg logit contributions
he-0.197
ang-0.184
mer-0.179
tted-0.166
imp-0.160
Token.
Token ID13
Feature activation+6.69e-03

Pos logit contributions
 fault+0.165
 generosity+0.143
urger+0.142
 kindness+0.141
 unic+0.139
Neg logit contributions
he-0.115
mer-0.107
ang-0.101
tted-0.092
umpy-0.086
Token It
Token ID632
Feature activation-2.80e-02

Pos logit contributions
he+0.011
ang+0.010
mer+0.010
tted+0.009
imp+0.009
Neg logit contributions
 fault-0.015
 kindness-0.013
 generosity-0.013
 cand-0.013
 diamonds-0.013
Token is
Token ID318
Feature activation+1.30e-02

Pos logit contributions
 fault+0.062
urger+0.055
 generosity+0.055
 cand+0.053
 forgiveness+0.053
Neg logit contributions
he-0.045
mer-0.042
ang-0.041
tted-0.039
imp-0.037
Token not
Token ID407
Feature activation-6.45e-02

Pos logit contributions
he+0.020
amed+0.019
ang+0.019
aming+0.018
mer+0.018
Neg logit contributions
 fault-0.027
 cand-0.025
urger-0.024
 generosity-0.024
 fiber-0.023
Token your
Token ID534
Feature activation-4.99e-02

Pos logit contributions
 fault+0.137
urger+0.127
 generosity+0.126
chief+0.126
 cand+0.125
Neg logit contributions
he-0.087
mer-0.080
 alone-0.079
ang-0.077
amed-0.076
Token fault
Token ID8046
Feature activation-3.75e-01

Pos logit contributions
ridge+0.073
 cand+0.072
 vitamins+0.072
urger+0.071
ccoli+0.067
Neg logit contributions
mer-0.092
he-0.090
ang-0.089
 corner-0.084
 cross-0.083
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
 fault+0.761
 generosity+0.720
 kindness+0.706
 vitamins+0.689
 cand+0.686
Neg logit contributions
he-0.668
ang-0.619
mer-0.600
umpy-0.537
umbling-0.531
Token It
Token ID632
Feature activation-2.69e-02

Pos logit contributions
he+0.019
ang+0.018
mer+0.017
tted+0.016
imp+0.015
Neg logit contributions
 fault-0.027
 kindness-0.024
 generosity-0.024
 cand-0.024
 diamonds-0.023
Token is
Token ID318
Feature activation+1.38e-02

Pos logit contributions
 fault+0.060
 generosity+0.056
urger+0.054
 kindness+0.053
 forgiveness+0.053
Neg logit contributions
he-0.043
mer-0.040
ang-0.039
tted-0.036
umpy-0.034
Token not
Token ID407
Feature activation-6.40e-02

Pos logit contributions
he+0.021
ang+0.020
mer+0.020
amed+0.019
aming+0.019
Neg logit contributions
 fault-0.027
 cand-0.026
 generosity-0.025
urger-0.025
 forgiveness-0.024
Token anyone
Token ID2687
Feature activation-7.00e-02

Pos logit contributions
 fault+0.133
 generosity+0.123
 cand+0.122
urger+0.122
chief+0.121
Neg logit contributions
he-0.090
mer-0.082
ang-0.082
 alone-0.080
amed-0.080
Token It
Token ID632
Feature activation-2.69e-02

Pos logit contributions
he+0.019
ang+0.018
mer+0.017
tted+0.016
imp+0.015
Neg logit contributions
 fault-0.027
 kindness-0.024
 generosity-0.024
 cand-0.024
 diamonds-0.023
Token is
Token ID318
Feature activation+1.38e-02

Pos logit contributions
 fault+0.060
 generosity+0.056
urger+0.054
 kindness+0.053
 forgiveness+0.053
Neg logit contributions
he-0.043
mer-0.040
ang-0.039
tted-0.036
umpy-0.034
Token not
Token ID407
Feature activation-6.40e-02

Pos logit contributions
he+0.021
ang+0.020
mer+0.020
amed+0.019
aming+0.019
Neg logit contributions
 fault-0.027
 cand-0.026
 generosity-0.025
urger-0.025
 forgiveness-0.024
Token anyone
Token ID2687
Feature activation-7.00e-02

Pos logit contributions
 fault+0.133
 generosity+0.123
 cand+0.122
urger+0.122
chief+0.121
Neg logit contributions
he-0.090
mer-0.082
ang-0.082
 alone-0.080
amed-0.080
Token's
Token ID338
Feature activation-1.07e-01

Pos logit contributions
 fault+0.157
 generosity+0.148
urger+0.145
 vitamins+0.143
 cand+0.141
Neg logit contributions
he-0.104
ang-0.099
mer-0.097
umpy-0.085
tted-0.079
Token fault
Token ID8046
Feature activation-3.74e-01

Pos logit contributions
ridge+0.175
 cand+0.173
urger+0.172
mm+0.169
chief+0.162
Neg logit contributions
he-0.177
mer-0.176
 alone-0.164
ang-0.164
 corner-0.163
Token."
Token ID526
Feature activation-4.22e-02

Pos logit contributions
 fault+0.735
 generosity+0.694
 kindness+0.683
 cand+0.661
 forgiveness+0.655
Neg logit contributions
he-0.688
ang-0.645
mer-0.628
umpy-0.562
umbling-0.554
Token
Token ID198
Feature activation+2.84e-03

Pos logit contributions
 fault+0.089
 kindness+0.079
 generosity+0.077
 cand+0.076
 forgiveness+0.074
Neg logit contributions
he-0.076
ang-0.070
mer-0.070
tted-0.066
amed-0.063
Token
Token ID198
Feature activation+1.60e-03

Pos logit contributions
he+0.005
mer+0.005
ang+0.005
tted+0.005
amed+0.004
Neg logit contributions
 fault-0.006
 kindness-0.005
 generosity-0.005
 unic-0.005
 cand-0.005
TokenL
Token ID43
Feature activation+8.22e-02

Pos logit contributions
he+0.002
mer+0.002
tted+0.002
ang+0.002
amed+0.002
Neg logit contributions
 fault-0.004
 kindness-0.004
 unic-0.004
 cand-0.004
 forgiveness-0.004
Tokenila
Token ID10102
Feature activation+1.73e-01

Pos logit contributions
mer+0.103
he+0.101
tted+0.094
ang+0.093
umbling+0.080
Neg logit contributions
 fault-0.212
 unic-0.182
 kindness-0.182
 cand-0.179
 dreams-0.179
Token saying
Token ID2282
Feature activation-9.78e-02

Pos logit contributions
 fault+0.005
 kindness+0.005
 generosity+0.004
 cand+0.004
 forgiveness+0.004
Neg logit contributions
he-0.005
ang-0.004
mer-0.004
tted-0.004
amed-0.004
Token that
Token ID326
Feature activation-1.04e-01

Pos logit contributions
 fault+0.213
 generosity+0.183
 kindness+0.181
 cand+0.179
 vitamins+0.177
Neg logit contributions
he-0.166
ang-0.152
mer-0.148
tted-0.139
imp-0.136
Token it
Token ID340
Feature activation-1.54e-02

Pos logit contributions
 fault+0.190
urger+0.159
 generosity+0.157
 kindness+0.156
 cand+0.153
Neg logit contributions
he-0.211
ang-0.192
mer-0.185
room-0.175
tted-0.175
Token was
Token ID373
Feature activation+2.86e-02

Pos logit contributions
 fault+0.035
 generosity+0.032
 kindness+0.031
 diamonds+0.031
 cand+0.031
Neg logit contributions
he-0.024
mer-0.022
ang-0.021
tted-0.021
amed-0.019
Token his
Token ID465
Feature activation+1.80e-03

Pos logit contributions
he+0.052
ang+0.049
mer+0.048
tted+0.046
amed+0.045
Neg logit contributions
 fault-0.053
 cand-0.047
 generosity-0.047
 kindness-0.046
 unic-0.045
Token fault
Token ID8046
Feature activation-3.74e-01

Pos logit contributions
he+0.004
mer+0.004
ang+0.004
tted+0.003
imp+0.003
Neg logit contributions
 fault-0.003
 cand-0.003
 generosity-0.002
 kindness-0.002
urger-0.002
Token that
Token ID326
Feature activation-1.07e-01

Pos logit contributions
 fault+0.754
 kindness+0.676
 generosity+0.672
 cand+0.660
 vitamins+0.649
Neg logit contributions
he-0.698
ang-0.658
mer-0.627
tted-0.581
amed-0.557
Token he
Token ID339
Feature activation-1.09e-02

Pos logit contributions
 fault+0.197
 kindness+0.168
 generosity+0.168
 cand+0.165
urger+0.163
Neg logit contributions
he-0.216
ang-0.198
mer-0.198
tted-0.187
amed-0.176
Token forgot
Token ID16453
Feature activation-6.53e-02

Pos logit contributions
 fault+0.022
 kindness+0.020
 generosity+0.019
 cand+0.019
 forgiveness+0.018
Neg logit contributions
he-0.021
ang-0.019
mer-0.019
tted-0.018
amed-0.017
Token the
Token ID262
Feature activation+4.01e-03

Pos logit contributions
 fault+0.122
 generosity+0.109
 cand+0.106
 kindness+0.105
 unic+0.103
Neg logit contributions
he-0.124
mer-0.109
ang-0.109
tted-0.107
amed-0.101
Token folder
Token ID9483
Feature activation-5.24e-03

Pos logit contributions
he+0.008
mer+0.007
tted+0.007
ang+0.007
amed+0.007
Neg logit contributions
 fault-0.007
urger-0.006
 cand-0.006
 generosity-0.006
 unic-0.006
Token that
Token ID326
Feature activation-1.11e-01

Pos logit contributions
 fault+0.310
urger+0.266
 cand+0.260
 vitamins+0.257
 generosity+0.247
Neg logit contributions
he-0.307
ang-0.266
mer-0.247
pour-0.240
tted-0.237
Token it
Token ID340
Feature activation-2.00e-02

Pos logit contributions
 fault+0.231
urger+0.197
 cand+0.196
 generosity+0.192
 kindness+0.190
Neg logit contributions
he-0.205
ang-0.178
mer-0.176
tted-0.163
amed-0.158
Token wasn
Token ID2492
Feature activation-4.62e-02

Pos logit contributions
 fault+0.046
 kindness+0.040
 generosity+0.040
 cand+0.040
 diamonds+0.038
Neg logit contributions
he-0.033
ang-0.030
mer-0.030
tted-0.029
amed-0.027
Token't
Token ID470
Feature activation-6.50e-02

Pos logit contributions
 fault+0.088
 kindness+0.078
 generosity+0.075
 cand+0.072
 forgiveness+0.072
Neg logit contributions
he-0.091
ang-0.088
mer-0.086
tted-0.079
amed-0.078
Token her
Token ID607
Feature activation-1.92e-02

Pos logit contributions
 fault+0.123
 generosity+0.109
 kindness+0.109
 cand+0.108
urger+0.104
Neg logit contributions
he-0.127
ang-0.114
mer-0.113
tted-0.109
amed-0.106
Token fault
Token ID8046
Feature activation-3.74e-01

Pos logit contributions
 cand+0.032
 unic+0.031
 fault+0.031
 vitamins+0.031
urger+0.030
Neg logit contributions
he-0.035
mer-0.035
ang-0.032
 forward-0.031
tted-0.030
Token.
Token ID13
Feature activation-9.09e-03

Pos logit contributions
 fault+0.770
 kindness+0.687
 generosity+0.678
 cand+0.663
 forgiveness+0.647
Neg logit contributions
he-0.695
ang-0.645
mer-0.623
tted-0.588
amed-0.559
Token Finally
Token ID9461
Feature activation-4.17e-02

Pos logit contributions
 fault+0.023
 kindness+0.021
 generosity+0.020
 cand+0.020
 forgiveness+0.019
Neg logit contributions
he-0.013
ang-0.012
mer-0.012
tted-0.011
amed-0.010
Token,
Token ID11
Feature activation-1.00e-02

Pos logit contributions
 fault+0.098
 kindness+0.082
 generosity+0.082
 vitamins+0.081
urger+0.080
Neg logit contributions
he-0.071
ang-0.061
mer-0.059
tted-0.056
amed-0.054
Token the
Token ID262
Feature activation+3.11e-03

Pos logit contributions
 fault+0.018
 kindness+0.016
 generosity+0.016
 cand+0.015
 forgiveness+0.015
Neg logit contributions
he-0.021
ang-0.020
mer-0.020
tted-0.019
amed-0.018
Token family
Token ID1641
Feature activation-3.41e-02

Pos logit contributions
he+0.006
ang+0.005
mer+0.005
tted+0.005
amed+0.005
Neg logit contributions
 fault-0.006
 kindness-0.005
 cand-0.005
 generosity-0.005
 unic-0.005
Token.
Token ID13
Feature activation+1.63e-02

Pos logit contributions
 fault+0.227
 generosity+0.206
 kindness+0.203
 cand+0.198
 diamonds+0.197
Neg logit contributions
he-0.173
ang-0.158
mer-0.154
room-0.139
tted-0.138
Token It
Token ID632
Feature activation-2.15e-02

Pos logit contributions
he+0.023
ang+0.023
mer+0.021
tted+0.020
room+0.018
Neg logit contributions
 fault-0.038
 cand-0.035
 generosity-0.034
 kindness-0.034
 diamonds-0.034
Token's
Token ID338
Feature activation-9.85e-02

Pos logit contributions
 fault+0.047
 generosity+0.043
 diamonds+0.043
urger+0.042
 cand+0.042
Neg logit contributions
mer-0.032
he-0.032
ang-0.031
tted-0.029
umpy-0.026
Token not
Token ID407
Feature activation-5.96e-02

Pos logit contributions
 fault+0.171
 generosity+0.171
 cand+0.168
urger+0.159
 unic+0.157
Neg logit contributions
ang-0.165
he-0.163
aming-0.149
mer-0.146
amed-0.146
Token your
Token ID534
Feature activation-4.23e-02

Pos logit contributions
 fault+0.125
 generosity+0.113
 cand+0.113
 unic+0.112
chief+0.112
Neg logit contributions
he-0.085
ang-0.078
mer-0.076
umpy-0.071
 alone-0.071
Token fault
Token ID8046
Feature activation-3.74e-01

Pos logit contributions
 cand+0.048
 unic+0.047
 diamonds+0.047
 generosity+0.046
 vitamins+0.045
Neg logit contributions
he-0.091
mer-0.090
ang-0.090
 corner-0.083
room-0.083
Token.
Token ID13
Feature activation+1.24e-02

Pos logit contributions
 fault+0.768
 generosity+0.750
 kindness+0.727
 cand+0.723
 unic+0.721
Neg logit contributions
he-0.614
ang-0.556
mer-0.539
umpy-0.491
apping-0.484
Token The
Token ID383
Feature activation-9.84e-02

Pos logit contributions
he+0.019
ang+0.018
mer+0.017
tted+0.016
room+0.015
Neg logit contributions
 fault-0.029
 generosity-0.026
 cand-0.026
 kindness-0.026
 diamonds-0.025
Token rope
Token ID17182
Feature activation+4.52e-02

Pos logit contributions
 fault+0.187
 kindness+0.162
 generosity+0.162
 cand+0.158
urger+0.154
Neg logit contributions
he-0.191
ang-0.179
mer-0.174
tted-0.165
imp-0.158
Token was
Token ID373
Feature activation+2.82e-02

Pos logit contributions
he+0.084
ang+0.078
mer+0.076
tted+0.073
amed+0.068
Neg logit contributions
 fault-0.092
 kindness-0.082
 generosity-0.081
 cand-0.078
 unic-0.077
Token old
Token ID1468
Feature activation+5.67e-02

Pos logit contributions
he+0.048
ang+0.047
mer+0.044
tted+0.042
amed+0.042
Neg logit contributions
 fault-0.057
 generosity-0.053
 kindness-0.051
 cand-0.051
 forgiveness-0.050
Token that
Token ID326
Feature activation-1.02e-01

Pos logit contributions
 fault+0.023
urger+0.020
 diamonds+0.019
 generosity+0.019
 unic+0.018
Neg logit contributions
he-0.018
ang-0.017
tted-0.015
mer-0.015
amed-0.015
Token it
Token ID340
Feature activation-1.58e-02

Pos logit contributions
 fault+0.224
urger+0.212
 Majesty+0.191
 generosity+0.189
 fiber+0.183
Neg logit contributions
he-0.154
ang-0.138
room-0.121
 alone-0.119
 corner-0.119
Token was
Token ID373
Feature activation+2.83e-02

Pos logit contributions
 fault+0.041
urger+0.037
 generosity+0.036
 diamonds+0.036
 kindness+0.036
Neg logit contributions
he-0.022
mer-0.018
tted-0.017
ang-0.017
amed-0.016
Token not
Token ID407
Feature activation-6.12e-02

Pos logit contributions
he+0.041
ang+0.037
aming+0.036
amed+0.036
 far+0.035
Neg logit contributions
 fault-0.058
 cand-0.054
urger-0.054
 generosity-0.054
 weaknesses-0.051
Token her
Token ID607
Feature activation-1.22e-02

Pos logit contributions
 fault+0.127
 generosity+0.116
urger+0.113
 kindness+0.111
 honesty+0.110
Neg logit contributions
he-0.097
ang-0.085
amed-0.085
mer-0.083
tted-0.080
Token fault
Token ID8046
Feature activation-3.73e-01

Pos logit contributions
 cand+0.018
 vitamins+0.018
 unic+0.018
 fiber+0.018
chief+0.017
Neg logit contributions
he-0.023
mer-0.023
ang-0.022
umpy-0.020
 forward-0.020
Token.
Token ID13
Feature activation+9.32e-03

Pos logit contributions
 fault+0.815
 generosity+0.750
 kindness+0.739
 vitamins+0.731
 cand+0.721
Neg logit contributions
he-0.648
ang-0.583
mer-0.552
tted-0.494
umbling-0.493
Token He
Token ID679
Feature activation+1.99e-02

Pos logit contributions
he+0.013
ang+0.012
mer+0.011
tted+0.011
imp+0.010
Neg logit contributions
 fault-0.023
 kindness-0.021
 generosity-0.021
 cand-0.021
 diamonds-0.020
Token said
Token ID531
Feature activation-2.72e-02

Pos logit contributions
he+0.027
ang+0.024
amed+0.022
tted+0.021
mer+0.021
Neg logit contributions
 fault-0.051
 diamonds-0.044
 generosity-0.043
 kindness-0.042
urger-0.042
Token they
Token ID484
Feature activation-9.42e-03

Pos logit contributions
 fault+0.063
urger+0.058
 former+0.054
 vitamins+0.053
 generosity+0.053
Neg logit contributions
he-0.039
ang-0.033
 forward-0.030
imp-0.030
crow-0.030
Token would
Token ID561
Feature activation-5.60e-02

Pos logit contributions
 fault+0.024
 generosity+0.022
 diamonds+0.022
 Majesty+0.022
urger+0.022
Neg logit contributions
he-0.014
tted-0.011
amed-0.009
ang-0.009
mer-0.009
Token and
Token ID290
Feature activation-1.49e-02

Pos logit contributions
 fault+0.007
 kindness+0.006
 generosity+0.006
 cand+0.005
 forgiveness+0.005
Neg logit contributions
he-0.006
ang-0.006
mer-0.006
tted-0.005
imp-0.005
Token it
Token ID340
Feature activation-1.22e-02

Pos logit contributions
 fault+0.029
 kindness+0.025
 generosity+0.025
 cand+0.024
 forgiveness+0.023
Neg logit contributions
he-0.029
ang-0.028
mer-0.027
tted-0.025
imp-0.024
Token's
Token ID338
Feature activation-1.01e-01

Pos logit contributions
 fault+0.023
 generosity+0.020
 kindness+0.020
 cand+0.020
 forgiveness+0.019
Neg logit contributions
he-0.024
ang-0.023
mer-0.022
tted-0.021
amed-0.020
Token all
Token ID477
Feature activation-2.11e-02

Pos logit contributions
 fault+0.203
 generosity+0.180
 kindness+0.179
 cand+0.178
 forgiveness+0.172
Neg logit contributions
he-0.187
ang-0.175
mer-0.170
tted-0.162
amed-0.157
Token my
Token ID616
Feature activation-4.21e-02

Pos logit contributions
 fault+0.047
 kindness+0.042
 generosity+0.041
 cand+0.040
 forgiveness+0.039
Neg logit contributions
he-0.036
ang-0.034
mer-0.032
tted-0.031
amed-0.029
Token fault
Token ID8046
Feature activation-3.73e-01

Pos logit contributions
 fault+0.062
 cand+0.055
 unic+0.053
 forgiveness+0.053
 generosity+0.052
Neg logit contributions
ang-0.086
he-0.086
mer-0.085
tted-0.078
umbling-0.076
Token,"
Token ID553
Feature activation-9.02e-02

Pos logit contributions
 fault+0.720
 kindness+0.645
 generosity+0.640
 cand+0.625
 forgiveness+0.617
Neg logit contributions
he-0.714
ang-0.669
mer-0.658
tted-0.605
amed-0.588
Token he
Token ID339
Feature activation-1.70e-02

Pos logit contributions
 fault+0.179
 kindness+0.148
 generosity+0.146
 cand+0.142
urger+0.141
Neg logit contributions
he-0.182
ang-0.171
mer-0.165
tted-0.155
imp-0.149
Token said
Token ID531
Feature activation-3.47e-02

Pos logit contributions
 fault+0.034
 kindness+0.029
 generosity+0.029
 cand+0.028
 unic+0.027
Neg logit contributions
he-0.033
ang-0.031
mer-0.030
tted-0.029
amed-0.028
Token,
Token ID11
Feature activation-7.17e-03

Pos logit contributions
 fault+0.070
 kindness+0.062
 generosity+0.061
 cand+0.060
 unic+0.058
Neg logit contributions
he-0.066
ang-0.061
mer-0.059
tted-0.057
amed-0.054
Token weeping
Token ID45705
Feature activation-1.74e-01

Pos logit contributions
 fault+0.015
 kindness+0.013
 generosity+0.013
 cand+0.013
 forgiveness+0.012
Neg logit contributions
he-0.013
ang-0.012
mer-0.012
tted-0.011
amed-0.011
Token.
Token ID13
Feature activation-6.73e-03

Pos logit contributions
he+0.163
ang+0.152
mer+0.148
tted+0.141
amed+0.134
Neg logit contributions
 fault-0.165
 kindness-0.146
 generosity-0.144
 cand-0.140
 forgiveness-0.137
Token Suddenly
Token ID24975
Feature activation-4.73e-02

Pos logit contributions
 fault+0.015
 kindness+0.013
 generosity+0.013
 cand+0.013
 forgiveness+0.012
Neg logit contributions
he-0.012
ang-0.011
mer-0.010
tted-0.010
amed-0.010
Token,
Token ID11
Feature activation-8.72e-03

Pos logit contributions
 fault+0.118
 generosity+0.105
urger+0.102
 kindness+0.102
 vitamins+0.101
Neg logit contributions
he-0.065
ang-0.059
mer-0.058
tted-0.053
amed-0.051
Token a
Token ID257
Feature activation-4.59e-02

Pos logit contributions
 fault+0.019
 generosity+0.017
 kindness+0.017
 cand+0.016
urger+0.016
Neg logit contributions
he-0.015
ang-0.014
mer-0.013
tted-0.012
amed-0.012
Token big
Token ID1263
Feature activation+8.94e-02

Pos logit contributions
 fault+0.090
 kindness+0.086
 generosity+0.083
 forgiveness+0.080
 cand+0.079
Neg logit contributions
he-0.087
ang-0.081
mer-0.077
tted-0.074
amed-0.072
Token wave
Token ID6769
Feature activation+1.71e-01

Pos logit contributions
he+0.172
ang+0.160
tted+0.155
mer+0.155
amed+0.148
Neg logit contributions
 fault-0.176
 kindness-0.163
 generosity-0.156
 cand-0.151
 forgiveness-0.150
Token came
Token ID1625
Feature activation+7.49e-02

Pos logit contributions
he+0.329
ang+0.316
tted+0.290
mer+0.287
amed+0.283
Neg logit contributions
 fault-0.304
 kindness-0.286
 unic-0.276
 generosity-0.270
 diamonds-0.270
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
he+0.143
ang+0.132
mer+0.129
tted+0.124
imp+0.116
Neg logit contributions
 fault-0.152
 kindness-0.135
 generosity-0.129
 diamonds-0.125
 unic-0.125
Token hit
Token ID2277
Feature activation+2.15e-02

Pos logit contributions
 fault+0.029
 kindness+0.026
 generosity+0.025
 cand+0.025
 forgiveness+0.024
Neg logit contributions
he-0.025
ang-0.023
mer-0.022
tted-0.021
amed-0.020
Token the
Token ID262
Feature activation+8.90e-03

Pos logit contributions
he+0.044
ang+0.041
mer+0.040
tted+0.039
imp+0.037
Neg logit contributions
 fault-0.041
 kindness-0.037
 generosity-0.034
 cand-0.034
 forgiveness-0.033
Token boat
Token ID8848
Feature activation+6.11e-02

Pos logit contributions
he+0.019
ang+0.018
mer+0.017
tted+0.017
amed+0.016
Neg logit contributions
 fault-0.016
 kindness-0.014
 generosity-0.014
 cand-0.013
 diamonds-0.013
Token three
Token ID1115
Feature activation+7.82e-02

Pos logit contributions
he+0.085
mer+0.082
ang+0.079
tted+0.077
imp+0.071
Neg logit contributions
 fault-0.087
 kindness-0.079
 generosity-0.075
 cand-0.075
 unic-0.075
Token years
Token ID812
Feature activation+6.61e-02

Pos logit contributions
he+0.143
ang+0.131
mer+0.128
tted+0.126
amed+0.118
Neg logit contributions
 fault-0.160
 kindness-0.146
 generosity-0.143
 cand-0.142
 unic-0.136
Token old
Token ID1468
Feature activation+5.89e-02

Pos logit contributions
he+0.138
ang+0.130
mer+0.128
tted+0.125
amed+0.120
Neg logit contributions
 fault-0.119
 kindness-0.107
 generosity-0.103
 cand-0.102
 unic-0.101
Token and
Token ID290
Feature activation-1.65e-02

Pos logit contributions
he+0.101
ang+0.094
mer+0.090
tted+0.082
amed+0.080
Neg logit contributions
 fault-0.132
 kindness-0.116
 generosity-0.115
 vitamins-0.111
 forgiveness-0.110
Token was
Token ID373
Feature activation+4.41e-02

Pos logit contributions
 fault+0.032
 kindness+0.030
 generosity+0.029
 cand+0.028
 honesty+0.028
Neg logit contributions
he-0.031
mer-0.029
ang-0.029
tted-0.028
imp-0.027
Token very
Token ID845
Feature activation+1.40e-01

Pos logit contributions
he+0.073
mer+0.067
ang+0.066
tted+0.064
imp+0.058
Neg logit contributions
 fault-0.101
 kindness-0.092
 generosity-0.089
 unic-0.087
 cand-0.087
Token happy
Token ID3772
Feature activation+7.05e-03

Pos logit contributions
he+0.283
mer+0.260
tted+0.257
ang+0.253
amed+0.251
Neg logit contributions
 fault-0.243
 kindness-0.237
 generosity-0.224
 unic-0.217
 honesty-0.216
Token.
Token ID13
Feature activation+1.04e-02

Pos logit contributions
he+0.010
ang+0.009
mer+0.008
apping+0.008
room+0.008
Neg logit contributions
 fault-0.017
 generosity-0.015
urger-0.015
 kindness-0.015
 vitamins-0.015
Token One
Token ID1881
Feature activation-1.68e-01

Pos logit contributions
he+0.017
ang+0.015
mer+0.014
tted+0.014
amed+0.013
Neg logit contributions
 fault-0.024
 kindness-0.022
 generosity-0.022
 cand-0.022
 forgiveness-0.021
Token day
Token ID1110
Feature activation+6.56e-03

Pos logit contributions
 fault+0.270
 kindness+0.264
 generosity+0.246
 dreams+0.241
 honesty+0.239
Neg logit contributions
he-0.344
ang-0.342
mer-0.338
tted-0.334
imp-0.319
Token,
Token ID11
Feature activation+2.54e-03

Pos logit contributions
he+0.014
ang+0.013
mer+0.013
tted+0.012
amed+0.012
Neg logit contributions
 fault-0.012
 kindness-0.011
 generosity-0.010
 cand-0.010
 forgiveness-0.010
Token to
Token ID284
Feature activation-3.82e-02

Pos logit contributions
he+0.272
ang+0.262
mer+0.249
tted+0.243
umbling+0.234
Neg logit contributions
 fault-0.291
 generosity-0.264
 cand-0.260
 kindness-0.258
 vitamins-0.255
Token cry
Token ID3960
Feature activation-4.89e-02

Pos logit contributions
 fault+0.077
 generosity+0.068
 kindness+0.067
 cand+0.067
 vitamins+0.066
Neg logit contributions
he-0.073
ang-0.065
mer-0.063
tted-0.061
imp-0.058
Token.
Token ID13
Feature activation+3.37e-02

Pos logit contributions
 fault+0.108
 generosity+0.096
 kindness+0.095
 cand+0.094
 vitamins+0.094
Neg logit contributions
he-0.083
ang-0.077
mer-0.074
tted-0.068
umbling-0.065
Token
Token ID198
Feature activation+2.04e-02

Pos logit contributions
he+0.058
ang+0.054
mer+0.051
tted+0.049
amed+0.046
Neg logit contributions
 fault-0.074
 kindness-0.066
 generosity-0.065
 cand-0.064
 forgiveness-0.062
Token
Token ID198
Feature activation+1.70e-02

Pos logit contributions
he+0.040
mer+0.038
ang+0.037
tted+0.034
amed+0.034
Neg logit contributions
 fault-0.038
 kindness-0.034
 forgiveness-0.033
 generosity-0.033
 honesty-0.032
Token"
Token ID1
Feature activation+1.44e-01

Pos logit contributions
he+0.029
mer+0.027
ang+0.024
amed+0.023
tted+0.023
Neg logit contributions
 kindness-0.032
 fault-0.031
 forgiveness-0.031
 honesty-0.030
 generosity-0.030
TokenO
Token ID46
Feature activation-3.52e-02

Pos logit contributions
he+0.190
ang+0.177
mer+0.166
tted+0.155
amed+0.146
Neg logit contributions
 fault-0.362
 kindness-0.336
 forgiveness-0.329
 generosity-0.327
 cand-0.323
Tokenw
Token ID86
Feature activation-5.15e-02

Pos logit contributions
 fault+0.082
 kindness+0.075
 generosity+0.074
 cand+0.072
 unic+0.071
Neg logit contributions
he-0.057
ang-0.053
mer-0.050
tted-0.047
amed-0.044
Token,
Token ID11
Feature activation+1.68e-02

Pos logit contributions
 fault+0.111
 generosity+0.101
 kindness+0.100
 vitamins+0.096
 cand+0.096
Neg logit contributions
he-0.090
ang-0.085
mer-0.078
amed-0.074
tted-0.073
Token ow
Token ID12334
Feature activation-9.41e-02

Pos logit contributions
he+0.041
ang+0.039
mer+0.038
tted+0.037
amed+0.036
Neg logit contributions
 fault-0.024
 kindness-0.021
 generosity-0.020
 cand-0.019
 forgiveness-0.019
Token,
Token ID11
Feature activation+1.66e-02

Pos logit contributions
 fault+0.195
 kindness+0.173
 generosity+0.172
 cand+0.166
 forgiveness+0.163
Neg logit contributions
he-0.175
ang-0.166
mer-0.157
tted-0.147
imp-0.141
Token,
Token ID11
Feature activation+8.61e-03

Pos logit contributions
he+0.091
ang+0.086
mer+0.085
tted+0.082
amed+0.079
Neg logit contributions
 fault-0.080
 kindness-0.070
 generosity-0.067
 cand-0.066
 forgiveness-0.065
Token there
Token ID612
Feature activation-2.40e-03

Pos logit contributions
mer+0.017
ang+0.017
he+0.017
tted+0.016
umbling+0.016
Neg logit contributions
 fault-0.014
 kindness-0.012
 unic-0.012
 diamonds-0.011
 honesty-0.011
Token was
Token ID373
Feature activation+4.25e-02

Pos logit contributions
 fault+0.004
 kindness+0.004
 cand+0.003
 dreams+0.003
 unic+0.003
Neg logit contributions
ang-0.005
he-0.005
mer-0.005
tted-0.005
imp-0.005
Token a
Token ID257
Feature activation-3.04e-02

Pos logit contributions
he+0.091
ang+0.087
mer+0.086
tted+0.081
imp+0.078
Neg logit contributions
 fault-0.075
 kindness-0.065
 generosity-0.063
 cand-0.062
 unic-0.060
Token little
Token ID1310
Feature activation+8.16e-02

Pos logit contributions
 fault+0.048
 kindness+0.046
 cand+0.044
 generosity+0.044
 unic+0.044
Neg logit contributions
he-0.066
tted-0.061
amed-0.061
ang-0.059
mer-0.059
Token girl
Token ID2576
Feature activation+1.40e-01

Pos logit contributions
he+0.139
tted+0.131
amed+0.125
ang+0.120
mer+0.118
Neg logit contributions
 fault-0.170
 kindness-0.163
 cand-0.160
 generosity-0.157
 diamonds-0.154
Token named
Token ID3706
Feature activation+4.94e-02

Pos logit contributions
he+0.236
tted+0.211
mer+0.210
ang+0.208
amed+0.195
Neg logit contributions
 fault-0.298
 kindness-0.281
 unic-0.273
 generosity-0.270
 cand-0.270
Token Lily
Token ID20037
Feature activation-9.74e-03

Pos logit contributions
he+0.104
ang+0.101
mer+0.098
tted+0.093
amed+0.089
Neg logit contributions
 fault-0.089
 kindness-0.078
 cand-0.075
 generosity-0.074
 unic-0.073
Token.
Token ID13
Feature activation+1.23e-02

Pos logit contributions
 fault+0.019
 kindness+0.017
 generosity+0.017
 cand+0.016
 forgiveness+0.016
Neg logit contributions
he-0.019
ang-0.018
mer-0.017
tted-0.017
amed-0.016
Token She
Token ID1375
Feature activation-1.24e-02

Pos logit contributions
he+0.019
ang+0.018
mer+0.017
tted+0.016
imp+0.015
Neg logit contributions
 fault-0.029
 kindness-0.026
 cand-0.026
 generosity-0.026
 unic-0.025
Token loved
Token ID6151
Feature activation-1.62e-02

Pos logit contributions
 fault+0.022
 unic+0.021
 kindness+0.020
 cand+0.020
 generosity+0.019
Neg logit contributions
he-0.025
ang-0.024
mer-0.023
tted-0.022
imp-0.022
Token
Token ID198
Feature activation+1.80e-02

Pos logit contributions
mer+0.037
ang+0.036
he+0.036
tted+0.034
amed+0.033
Neg logit contributions
 fault-0.032
 kindness-0.029
 unic-0.028
 generosity-0.027
 honesty-0.026
TokenSuddenly
Token ID38582
Feature activation-3.78e-02

Pos logit contributions
he+0.031
mer+0.031
ang+0.030
tted+0.027
amed+0.026
Neg logit contributions
 unic-0.031
 kindness-0.029
 diamonds-0.029
 cand-0.028
 fault-0.028
Token,
Token ID11
Feature activation+3.67e-03

Pos logit contributions
 fault+0.083
 kindness+0.072
 generosity+0.072
 cand+0.070
 forgiveness+0.069
Neg logit contributions
he-0.066
ang-0.062
mer-0.060
tted-0.056
amed-0.054
Token the
Token ID262
Feature activation+1.95e-02

Pos logit contributions
he+0.007
ang+0.007
mer+0.007
tted+0.006
amed+0.006
Neg logit contributions
 fault-0.007
 kindness-0.006
 generosity-0.006
 cand-0.006
 forgiveness-0.006
Token shadow
Token ID9082
Feature activation+2.49e-02

Pos logit contributions
he+0.041
ang+0.039
mer+0.038
tted+0.037
imp+0.035
Neg logit contributions
 fault-0.034
 kindness-0.031
 generosity-0.029
 dreams-0.029
 unic-0.029
Token started
Token ID2067
Feature activation+1.37e-01

Pos logit contributions
he+0.045
ang+0.042
mer+0.040
tted+0.038
imp+0.037
Neg logit contributions
 fault-0.046
 cand-0.044
 kindness-0.044
 unic-0.044
 worries-0.043
Token to
Token ID284
Feature activation-5.11e-02

Pos logit contributions
he+0.276
ang+0.266
mer+0.256
tted+0.246
amed+0.239
Neg logit contributions
 fault-0.252
 kindness-0.221
 generosity-0.221
 cand-0.216
 vitamins-0.212
Token speak
Token ID2740
Feature activation-5.50e-02

Pos logit contributions
 fault+0.101
 kindness+0.092
 generosity+0.089
 cand+0.086
 unic+0.086
Neg logit contributions
he-0.097
ang-0.094
mer-0.089
tted-0.086
amed-0.082
Token.
Token ID13
Feature activation+1.02e-02

Pos logit contributions
 fault+0.113
 kindness+0.100
 generosity+0.098
 cand+0.096
 forgiveness+0.094
Neg logit contributions
he-0.103
ang-0.096
mer-0.093
tted-0.089
amed-0.085
Token "
Token ID366
Feature activation+1.01e-01

Pos logit contributions
he+0.020
ang+0.019
mer+0.018
tted+0.017
amed+0.017
Neg logit contributions
 fault-0.020
 kindness-0.018
 generosity-0.017
 cand-0.017
 unic-0.017
TokenHello
Token ID15496
Feature activation-1.48e-01

Pos logit contributions
he+0.157
ang+0.147
mer+0.138
tted+0.135
amed+0.128
Neg logit contributions
 fault-0.207
 kindness-0.206
 generosity-0.200
 unic-0.198
 forgiveness-0.197
Token.
Token ID13
Feature activation+1.12e-02

Pos logit contributions
 fault+0.388
 generosity+0.338
 kindness+0.336
 vitamins+0.332
 cand+0.331
Neg logit contributions
he-0.245
ang-0.220
mer-0.219
tted-0.178
imp-0.176
Token
Token ID198
Feature activation+6.09e-03

Pos logit contributions
he+0.020
ang+0.019
mer+0.018
tted+0.017
amed+0.017
Neg logit contributions
 fault-0.024
 kindness-0.021
 generosity-0.020
 honesty-0.020
 vitamins-0.020
Token
Token ID198
Feature activation+4.83e-03

Pos logit contributions
he+0.011
mer+0.010
ang+0.010
tted+0.010
amed+0.009
Neg logit contributions
 fault-0.012
 kindness-0.011
 generosity-0.011
 honesty-0.011
 forgiveness-0.011
Token"
Token ID1
Feature activation+1.26e-01

Pos logit contributions
he+0.008
mer+0.008
ang+0.007
tted+0.007
umbling+0.007
Neg logit contributions
 kindness-0.009
 honesty-0.008
 generosity-0.008
 forgiveness-0.008
 fault-0.008
TokenBy
Token ID3886
Feature activation-4.54e-03

Pos logit contributions
he+0.150
ang+0.132
mer+0.128
tted+0.118
amed+0.109
Neg logit contributions
 fault-0.354
 kindness-0.319
 generosity-0.314
 cand-0.308
 unic-0.303
Tokene
Token ID68
Feature activation+9.79e-02

Pos logit contributions
 kindness+0.010
 forgiveness+0.009
 generosity+0.009
 fault+0.009
 honesty+0.009
Neg logit contributions
he-0.007
ang-0.006
mer-0.006
tted-0.006
umbling-0.006
Token,
Token ID11
Feature activation+5.29e-03

Pos logit contributions
he+0.193
ang+0.180
mer+0.173
tted+0.169
amed+0.161
Neg logit contributions
 fault-0.190
 kindness-0.169
 generosity-0.165
 cand-0.162
 forgiveness-0.156
Token dolphin
Token ID44136
Feature activation+7.84e-02

Pos logit contributions
he+0.011
ang+0.010
tted+0.010
mer+0.010
amed+0.010
Neg logit contributions
 fault-0.010
 kindness-0.009
 generosity-0.008
 honesty-0.008
 cand-0.008
Token!"
Token ID2474
Feature activation+4.82e-04

Pos logit contributions
he+0.163
ang+0.153
mer+0.147
tted+0.143
amed+0.138
Neg logit contributions
 fault-0.143
 kindness-0.126
 generosity-0.123
 cand-0.120
 forgiveness-0.117
Token Ben
Token ID3932
Feature activation+6.81e-02

Pos logit contributions
he+0.001
ang+0.001
mer+0.001
tted+0.001
amed+0.001
Neg logit contributions
 fault-0.001
 kindness-0.001
 generosity-0.001
 forgiveness-0.001
 cand-0.001
Token and
Token ID290
Feature activation-6.48e-03

Pos logit contributions
he+0.119
ang+0.112
mer+0.108
tted+0.101
amed+0.098
Neg logit contributions
 fault-0.149
 kindness-0.131
 generosity-0.129
 cand-0.126
 forgiveness-0.125
Token.
Token ID13
Feature activation+4.15e-02

Pos logit contributions
 fault+0.049
 kindness+0.042
 cand+0.040
 generosity+0.040
 forgiveness+0.038
Neg logit contributions
he-0.051
ang-0.049
mer-0.047
tted-0.045
amed-0.044
Token He
Token ID679
Feature activation+5.10e-02

Pos logit contributions
he+0.076
ang+0.072
mer+0.071
amed+0.065
tted+0.065
Neg logit contributions
 fault-0.087
 kindness-0.077
 generosity-0.074
 forgiveness-0.073
 honesty-0.072
Token sees
Token ID7224
Feature activation+7.17e-02

Pos logit contributions
he+0.105
ang+0.099
mer+0.095
tted+0.092
amed+0.088
Neg logit contributions
 fault-0.095
 kindness-0.083
 generosity-0.082
 cand-0.080
 forgiveness-0.078
Token blood
Token ID2910
Feature activation-1.16e-01

Pos logit contributions
he+0.135
ang+0.124
mer+0.117
tted+0.114
amed+0.110
Neg logit contributions
 fault-0.147
 generosity-0.129
 kindness-0.129
 cand-0.124
 vitamins-0.121
Token on
Token ID319
Feature activation+1.49e-02

Pos logit contributions
 fault+0.239
 kindness+0.215
 cand+0.205
 generosity+0.204
 unic+0.201
Neg logit contributions
he-0.205
ang-0.197
mer-0.190
tted-0.185
amed-0.177
Token his
Token ID465
Feature activation+1.97e-02

Pos logit contributions
he+0.032
ang+0.030
mer+0.029
tted+0.028
amed+0.027
Neg logit contributions
 fault-0.026
 kindness-0.023
 generosity-0.022
 cand-0.022
 forgiveness-0.021
Token finger
Token ID7660
Feature activation+2.81e-02

Pos logit contributions
he+0.038
ang+0.035
mer+0.034
tted+0.033
amed+0.031
Neg logit contributions
 fault-0.039
 kindness-0.035
 generosity-0.034
 cand-0.033
 forgiveness-0.033
Token.
Token ID13
Feature activation+3.93e-02

Pos logit contributions
he+0.055
ang+0.052
mer+0.050
tted+0.048
amed+0.046
Neg logit contributions
 fault-0.055
 kindness-0.048
 generosity-0.047
 cand-0.046
 forgiveness-0.045
Token He
Token ID679
Feature activation+4.91e-02

Pos logit contributions
he+0.072
ang+0.069
mer+0.068
amed+0.062
tted+0.062
Neg logit contributions
 fault-0.082
 kindness-0.072
 generosity-0.069
 forgiveness-0.069
 honesty-0.068
Token starts
Token ID4940
Feature activation+1.44e-01

Pos logit contributions
he+0.103
ang+0.097
mer+0.095
tted+0.091
amed+0.087
Neg logit contributions
 fault-0.090
 kindness-0.079
 generosity-0.076
 cand-0.075
 unic-0.073
Token to
Token ID284
Feature activation-4.53e-02

Pos logit contributions
he+0.297
ang+0.278
mer+0.266
tted+0.259
imp+0.248
Neg logit contributions
 fault-0.261
 generosity-0.229
 kindness-0.228
 cand-0.221
 unic-0.216
Token thinks
Token ID6834
Feature activation-9.53e-02

Pos logit contributions
he+0.147
ang+0.137
mer+0.133
tted+0.128
amed+0.123
Neg logit contributions
 fault-0.136
 kindness-0.118
 generosity-0.117
 cand-0.114
 forgiveness-0.112
Token for
Token ID329
Feature activation+3.92e-02

Pos logit contributions
 fault+0.188
 generosity+0.171
urger+0.170
 cand+0.168
 vitamins+0.167
Neg logit contributions
he-0.179
ang-0.150
 forward-0.135
room-0.134
apping-0.132
Token a
Token ID257
Feature activation-4.02e-02

Pos logit contributions
he+0.054
room+0.038
 corner+0.036
aming+0.035
amed+0.034
Neg logit contributions
 fault-0.106
urger-0.091
 Majesty-0.087
 generosity-0.086
 instincts-0.086
Token moment
Token ID2589
Feature activation-8.97e-02

Pos logit contributions
 fault+0.083
 vitamins+0.071
 cand+0.070
 unic+0.070
 generosity+0.070
Neg logit contributions
he-0.072
mer-0.067
ang-0.063
tted-0.060
 corner-0.059
Token.
Token ID13
Feature activation+2.23e-02

Pos logit contributions
 fault+0.181
 kindness+0.160
 generosity+0.157
 cand+0.153
 forgiveness+0.150
Neg logit contributions
he-0.171
ang-0.160
mer-0.154
tted-0.148
amed-0.141
Token She
Token ID1375
Feature activation+3.93e-03

Pos logit contributions
he+0.035
ang+0.034
mer+0.033
tted+0.030
imp+0.030
Neg logit contributions
 fault-0.052
 kindness-0.048
 generosity-0.046
 dreams-0.045
 honesty-0.045
Token still
Token ID991
Feature activation+2.85e-02

Pos logit contributions
he+0.007
ang+0.006
mer+0.006
tted+0.006
amed+0.006
Neg logit contributions
 fault-0.008
 generosity-0.007
 kindness-0.007
 cand-0.007
 forgiveness-0.007
Token does
Token ID857
Feature activation-9.39e-03

Pos logit contributions
he+0.048
ang+0.044
mer+0.042
tted+0.040
imp+0.038
Neg logit contributions
 fault-0.062
 cand-0.056
 generosity-0.055
 kindness-0.055
 diamonds-0.055
Token not
Token ID407
Feature activation-5.26e-02

Pos logit contributions
 fault+0.017
 cand+0.015
 unic+0.015
 diamonds+0.015
urger+0.014
Neg logit contributions
he-0.019
ang-0.017
imp-0.017
mer-0.016
 forward-0.016
Token like
Token ID588
Feature activation+5.53e-02

Pos logit contributions
 fault+0.122
 generosity+0.107
 cand+0.107
 kindness+0.106
 unic+0.105
Neg logit contributions
he-0.085
ang-0.074
mer-0.074
tted-0.068
imp-0.066
Token Tim
Token ID5045
Feature activation-1.06e-02

Pos logit contributions
he+0.093
ang+0.083
mer+0.081
room+0.076
imp+0.076
Neg logit contributions
 fault-0.107
 generosity-0.097
urger-0.096
 kindness-0.094
ccoli-0.092
Token colorful
Token ID20239
Feature activation-2.89e-02

Pos logit contributions
he+0.060
ang+0.058
mer+0.054
tted+0.051
amed+0.051
Neg logit contributions
 fault-0.074
 generosity-0.067
 kindness-0.065
 forgiveness-0.064
urger-0.063
Token and
Token ID290
Feature activation+1.28e-02

Pos logit contributions
 fault+0.068
 generosity+0.061
 kindness+0.061
 honesty+0.059
 forgiveness+0.059
Neg logit contributions
he-0.045
ang-0.040
mer-0.040
tted-0.035
room-0.034
Token fun
Token ID1257
Feature activation-8.01e-02

Pos logit contributions
he+0.024
ang+0.022
mer+0.022
tted+0.021
amed+0.019
Neg logit contributions
 fault-0.027
urger-0.022
 generosity-0.022
 kindness-0.021
 forgiveness-0.021
Token.
Token ID13
Feature activation+3.08e-02

Pos logit contributions
 fault+0.163
 generosity+0.154
 kindness+0.149
urger+0.147
 forgiveness+0.145
Neg logit contributions
he-0.144
mer-0.127
ang-0.126
room-0.113
umpy-0.113
Token Do
Token ID2141
Feature activation-1.36e-01

Pos logit contributions
he+0.050
ang+0.047
mer+0.045
tted+0.043
amed+0.040
Neg logit contributions
 fault-0.070
 kindness-0.062
 generosity-0.062
 cand-0.060
 forgiveness-0.059
Token you
Token ID345
Feature activation+2.82e-03

Pos logit contributions
 fault+0.360
urger+0.313
 generosity+0.311
 kindness+0.300
 diamonds+0.298
Neg logit contributions
he-0.170
ang-0.154
room-0.128
Hey-0.119
Stop-0.116
Token want
Token ID765
Feature activation+1.22e-01

Pos logit contributions
he+0.004
ang+0.004
mer+0.004
tted+0.003
amed+0.003
Neg logit contributions
 fault-0.007
 kindness-0.006
 generosity-0.006
 cand-0.006
 forgiveness-0.006
Token to
Token ID284
Feature activation-4.67e-02

Pos logit contributions
he+0.188
room+0.164
ang+0.152
 corner+0.151
amed+0.148
Neg logit contributions
 fault-0.277
 generosity-0.250
urger-0.243
 Majesty-0.238
 instincts-0.235
Token play
Token ID711
Feature activation-3.90e-02

Pos logit contributions
 fault+0.115
 generosity+0.107
 vitamins+0.105
 kindness+0.103
 honesty+0.102
Neg logit contributions
he-0.070
ang-0.057
room-0.056
tted-0.053
imp-0.053
Token with
Token ID351
Feature activation+5.31e-02

Pos logit contributions
 fault+0.089
 kindness+0.082
 cand+0.080
 generosity+0.079
 forgiveness+0.079
Neg logit contributions
he-0.060
ang-0.057
mer-0.055
tted-0.052
umbling-0.049
Token blocks
Token ID7021
Feature activation-6.37e-02

Pos logit contributions
he+0.100
ang+0.093
mer+0.090
tted+0.083
room+0.082
Neg logit contributions
 fault-0.110
 generosity-0.097
 kindness-0.096
 honesty-0.090
 cand-0.089
Token her
Token ID607
Feature activation+2.53e-04

Pos logit contributions
 fault+0.124
 kindness+0.111
 cand+0.104
 forgiveness+0.103
 generosity+0.100
Neg logit contributions
he-0.173
ang-0.169
mer-0.167
tted-0.162
imp-0.152
Token knee
Token ID10329
Feature activation+4.83e-02

Pos logit contributions
he+0.000
ang+0.000
mer+0.000
tted+0.000
amed+0.000
Neg logit contributions
 fault-0.001
 generosity-0.000
 kindness-0.000
 vitamins-0.000
 unic-0.000
Token.
Token ID13
Feature activation+2.12e-02

Pos logit contributions
he+0.094
ang+0.087
mer+0.084
tted+0.081
amed+0.077
Neg logit contributions
 fault-0.096
 kindness-0.085
 generosity-0.084
 cand-0.081
 forgiveness-0.080
Token
Token ID198
Feature activation+2.21e-02

Pos logit contributions
he+0.034
ang+0.033
mer+0.033
tted+0.030
amed+0.029
Neg logit contributions
 fault-0.048
 kindness-0.043
 generosity-0.041
 cand-0.041
 forgiveness-0.041
Token
Token ID198
Feature activation+2.01e-02

Pos logit contributions
mer+0.041
he+0.041
ang+0.040
tted+0.037
amed+0.036
Neg logit contributions
 fault-0.043
 kindness-0.039
 generosity-0.037
 forgiveness-0.036
 honesty-0.036
TokenTom
Token ID13787
Feature activation+5.90e-02

Pos logit contributions
mer+0.043
ang+0.034
he+0.034
suit+0.033
enced+0.032
Neg logit contributions
 cand-0.032
 forgiveness-0.031
 kindness-0.031
 honesty-0.030
 candy-0.029
Token ran
Token ID4966
Feature activation+1.10e-01

Pos logit contributions
he+0.122
ang+0.120
mer+0.120
imp+0.112
tted+0.111
Neg logit contributions
 fault-0.094
 cand-0.089
 scarf-0.086
 kindness-0.084
 worries-0.083
Token to
Token ID284
Feature activation-4.89e-02

Pos logit contributions
ang+0.328
he+0.308
mer+0.300
aming+0.294
umbling+0.289
Neg logit contributions
 fault-0.104
 kindness-0.088
 cand-0.075
 candy-0.073
 Brown-0.064
Token help
Token ID1037
Feature activation-6.03e-02

Pos logit contributions
 fault+0.076
 kindness+0.074
 forgiveness+0.070
 generosity+0.068
 honesty+0.066
Neg logit contributions
ang-0.110
mer-0.107
he-0.107
amed-0.101
umbling-0.101
Token her
Token ID607
Feature activation-5.83e-04

Pos logit contributions
 fault+0.074
 kindness+0.073
 forgiveness+0.070
 cand+0.068
 generosity+0.064
Neg logit contributions
ang-0.153
mer-0.152
he-0.141
amed-0.141
tted-0.137
Token.
Token ID13
Feature activation+2.17e-02

Pos logit contributions
 fault+0.001
 kindness+0.001
 generosity+0.001
 cand+0.001
 forgiveness+0.001
Neg logit contributions
he-0.001
ang-0.001
mer-0.001
tted-0.001
amed-0.001
Token.
Token ID13
Feature activation+2.01e-02

Pos logit contributions
 fault+0.036
 kindness+0.032
 generosity+0.031
 cand+0.031
 forgiveness+0.030
Neg logit contributions
he-0.037
ang-0.034
mer-0.033
tted-0.032
amed-0.030
Token
Token ID198
Feature activation+2.03e-02

Pos logit contributions
he+0.041
ang+0.039
mer+0.038
amed+0.036
imp+0.035
Neg logit contributions
 fault-0.036
 kindness-0.033
 generosity-0.031
 cand-0.031
 forgiveness-0.030
Token
Token ID198
Feature activation+1.93e-02

Pos logit contributions
he+0.039
mer+0.038
ang+0.037
tted+0.035
amed+0.034
Neg logit contributions
 fault-0.038
 kindness-0.034
 generosity-0.032
 forgiveness-0.032
 honesty-0.032
TokenThey
Token ID2990
Feature activation+4.51e-02

Pos logit contributions
he+0.036
ang+0.032
mer+0.031
imp+0.029
enced+0.029
Neg logit contributions
 kindness-0.034
 fault-0.034
 cand-0.034
 generosity-0.033
 vitamins-0.032
Token cut
Token ID2005
Feature activation-4.72e-02

Pos logit contributions
ang+0.091
he+0.090
mer+0.086
umbling+0.082
imp+0.082
Neg logit contributions
 fault-0.074
 cand-0.073
 kindness-0.071
 generosity-0.069
 diamonds-0.065
Token another
Token ID1194
Feature activation-9.38e-02

Pos logit contributions
 fault+0.083
 kindness+0.075
 cand+0.071
 generosity+0.069
 forgiveness+0.067
Neg logit contributions
he-0.098
ang-0.094
mer-0.092
tted-0.088
imp-0.087
Token slice
Token ID16416
Feature activation-5.12e-02

Pos logit contributions
 fault+0.173
 kindness+0.150
 generosity+0.148
 cand+0.143
 forgiveness+0.140
Neg logit contributions
he-0.196
ang-0.184
mer-0.179
tted-0.172
amed-0.165
Token of
Token ID286
Feature activation+1.09e-01

Pos logit contributions
 fault+0.088
 kindness+0.082
 cand+0.080
 generosity+0.076
 honesty+0.073
Neg logit contributions
ang-0.102
mer-0.095
he-0.095
imp-0.095
tted-0.092
Token pie
Token ID2508
Feature activation-9.53e-02

Pos logit contributions
he+0.214
ang+0.185
mer+0.182
 corner+0.169
imp+0.168
Neg logit contributions
 fault-0.215
 unic-0.174
 generosity-0.172
 nightmares-0.172
urger-0.170
Token and
Token ID290
Feature activation-6.22e-04

Pos logit contributions
 fault+0.171
 cand+0.151
 kindness+0.149
 generosity+0.144
 diamonds+0.137
Neg logit contributions
he-0.192
ang-0.191
mer-0.178
tted-0.176
imp-0.175
Token ate
Token ID15063
Feature activation-1.05e-01

Pos logit contributions
 fault+0.001
 kindness+0.001
 generosity+0.001
 unic+0.001
 forgiveness+0.001
Neg logit contributions
he-0.001
ang-0.001
mer-0.001
tted-0.001
amed-0.001
Token wide
Token ID3094
Feature activation+6.70e-02

Pos logit contributions
 fault+0.063
 generosity+0.058
 kindness+0.057
 diamonds+0.056
 forgiveness+0.055
Neg logit contributions
he-0.045
ang-0.043
mer-0.040
tted-0.038
imp-0.035
Token tail
Token ID7894
Feature activation+2.05e-02

Pos logit contributions
he+0.132
ang+0.124
mer+0.118
tted+0.113
amed+0.109
Neg logit contributions
 fault-0.127
 generosity-0.110
 kindness-0.110
 forgiveness-0.106
 cand-0.105
Token and
Token ID290
Feature activation-1.35e-02

Pos logit contributions
he+0.042
ang+0.039
mer+0.038
tted+0.036
amed+0.035
Neg logit contributions
 fault-0.039
 kindness-0.034
 generosity-0.034
 cand-0.032
 forgiveness-0.032
Token bright
Token ID6016
Feature activation-1.04e-02

Pos logit contributions
 fault+0.033
 generosity+0.027
 kindness+0.027
urger+0.027
 forgiveness+0.026
Neg logit contributions
he-0.019
ang-0.019
amed-0.017
tted-0.016
mer-0.016
Token blue
Token ID4171
Feature activation-2.70e-02

Pos logit contributions
 fault+0.019
 generosity+0.017
 forgiveness+0.016
 kindness+0.016
 honesty+0.016
Neg logit contributions
he-0.020
ang-0.020
mer-0.018
amed-0.018
room-0.017
Token eyes
Token ID2951
Feature activation-5.72e-02

Pos logit contributions
 fault+0.046
 generosity+0.040
 kindness+0.039
 honesty+0.037
 forgiveness+0.037
Neg logit contributions
he-0.058
ang-0.056
mer-0.052
room-0.051
tted-0.050
Token.
Token ID13
Feature activation+1.20e-02

Pos logit contributions
 fault+0.147
 generosity+0.129
 kindness+0.127
 forgiveness+0.125
 vitamins+0.123
Neg logit contributions
he-0.093
mer-0.072
ang-0.071
tted-0.069
room-0.064
Token\n
Token ID198
Feature activation+2.25e-02

Pos logit contributions
he+0.020
ang+0.017
mer+0.016
tted+0.016
room+0.015
Neg logit contributions
 fault-0.028
 kindness-0.025
 generosity-0.025
 cand-0.024
 forgiveness-0.024
Token\n
Token ID198
Feature activation+2.17e-02

Pos logit contributions
he+0.039
ang+0.038
mer+0.037
tted+0.035
amed+0.034
Neg logit contributions
 fault-0.046
 kindness-0.041
 generosity-0.040
 cand-0.039
 forgiveness-0.039
TokenOne
Token ID3198
Feature activation-1.37e-01

Pos logit contributions
ang+0.047
he+0.047
mer+0.046
tted+0.043
amed+0.042
Neg logit contributions
 fault-0.033
 kindness-0.030
 generosity-0.028
 unic-0.028
 cand-0.028
Token day
Token ID1110
Feature activation+1.10e-02

Pos logit contributions
 kindness+0.205
 fault+0.205
 dreams+0.186
 generosity+0.186
 honesty+0.182
Neg logit contributions
he-0.290
mer-0.284
tted-0.284
ang-0.280
amed-0.277
Token saw
Token ID2497
Feature activation+9.17e-02

Pos logit contributions
 fault+0.026
 kindness+0.023
 generosity+0.023
 cand+0.023
 unic+0.022
Neg logit contributions
he-0.023
ang-0.022
mer-0.021
tted-0.020
imp-0.019
Token a
Token ID257
Feature activation-3.30e-02

Pos logit contributions
he+0.175
ang+0.158
mer+0.149
amed+0.147
tted+0.145
Neg logit contributions
 fault-0.185
 generosity-0.160
 kindness-0.158
urger-0.153
 vitamins-0.152
Token spark
Token ID9009
Feature activation-4.83e-02

Pos logit contributions
 fault+0.064
 kindness+0.057
 generosity+0.056
 cand+0.055
 unic+0.053
Neg logit contributions
he-0.065
ang-0.061
mer-0.058
tted-0.057
amed-0.054
Tokenly
Token ID306
Feature activation+5.38e-02

Pos logit contributions
 fault+0.103
 kindness+0.093
 generosity+0.091
 cand+0.091
 diamonds+0.088
Neg logit contributions
he-0.083
ang-0.078
mer-0.074
tted-0.074
amed-0.070
Token egg
Token ID5935
Feature activation-9.17e-02

Pos logit contributions
he+0.125
ang+0.117
mer+0.116
tted+0.109
amed+0.105
Neg logit contributions
 fault-0.087
 kindness-0.072
 generosity-0.072
 forgiveness-0.067
 honesty-0.067
Token that
Token ID326
Feature activation-9.89e-02

Pos logit contributions
 fault+0.225
 kindness+0.204
 generosity+0.202
 cand+0.195
 forgiveness+0.194
Neg logit contributions
he-0.139
ang-0.126
mer-0.120
tted-0.110
amed-0.106
Token was
Token ID373
Feature activation+3.98e-02

Pos logit contributions
 fault+0.211
 kindness+0.183
 generosity+0.181
 cand+0.174
 forgiveness+0.171
Neg logit contributions
he-0.183
ang-0.167
mer-0.162
tted-0.155
amed-0.150
Token far
Token ID1290
Feature activation+1.46e-01

Pos logit contributions
he+0.071
ang+0.065
mer+0.062
tted+0.060
amed+0.060
Neg logit contributions
 fault-0.085
 kindness-0.076
 generosity-0.075
 cand-0.072
 forgiveness-0.072
Token bigger
Token ID5749
Feature activation+1.19e-02

Pos logit contributions
he+0.226
ang+0.223
tted+0.209
amed+0.199
room+0.195
Neg logit contributions
 fault-0.325
 kindness-0.299
 cand-0.287
 diamonds-0.285
 generosity-0.282
Token than
Token ID621
Feature activation+3.26e-02

Pos logit contributions
he+0.019
ang+0.018
mer+0.017
tted+0.015
amed+0.015
Neg logit contributions
 fault-0.027
 kindness-0.025
 generosity-0.024
 forgiveness-0.024
 cand-0.024
Token the
Token ID262
Feature activation+1.16e-02

Pos logit contributions
he+0.064
ang+0.057
mer+0.056
tted+0.053
amed+0.052
Neg logit contributions
 fault-0.061
 generosity-0.053
urger-0.052
 kindness-0.052
 forgiveness-0.050
Token Lily
Token ID20037
Feature activation-9.59e-03

Pos logit contributions
 fault+0.001
 kindness+0.001
 cand+0.001
 generosity+0.001
 forgiveness+0.001
Neg logit contributions
he-0.001
ang-0.001
mer-0.001
tted-0.001
umbling-0.001
Token went
Token ID1816
Feature activation+1.04e-01

Pos logit contributions
 fault+0.017
 cand+0.015
 kindness+0.015
 dreams+0.014
 unic+0.014
Neg logit contributions
he-0.020
ang-0.020
mer-0.019
tted-0.018
imp-0.018
Token to
Token ID284
Feature activation-5.55e-02

Pos logit contributions
he+0.209
ang+0.195
mer+0.185
tted+0.181
amed+0.175
Neg logit contributions
 fault-0.202
 kindness-0.175
 generosity-0.175
 cand-0.167
 forgiveness-0.164
Token the
Token ID262
Feature activation+1.63e-02

Pos logit contributions
 fault+0.121
 kindness+0.106
 generosity+0.106
 cand+0.104
 vitamins+0.102
Neg logit contributions
he-0.098
ang-0.089
mer-0.084
tted-0.081
imp-0.078
Token park
Token ID3952
Feature activation+6.71e-03

Pos logit contributions
he+0.026
ang+0.024
mer+0.023
tted+0.022
amed+0.021
Neg logit contributions
 fault-0.038
 kindness-0.034
 generosity-0.033
 cand-0.033
 forgiveness-0.032
Token to
Token ID284
Feature activation-5.53e-02

Pos logit contributions
he+0.014
ang+0.014
mer+0.013
tted+0.013
amed+0.013
Neg logit contributions
 fault-0.011
 kindness-0.010
 generosity-0.010
 cand-0.010
 honesty-0.009
Token play
Token ID711
Feature activation-5.74e-02

Pos logit contributions
 fault+0.101
 kindness+0.093
 generosity+0.089
 cand+0.086
 forgiveness+0.086
Neg logit contributions
he-0.112
ang-0.109
mer-0.106
tted-0.099
amed-0.098
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
 kindness+0.103
 fault+0.101
 cand+0.101
 forgiveness+0.099
 dreams+0.094
Neg logit contributions
mer-0.098
ang-0.097
he-0.095
umbling-0.093
amed-0.092
Token There
Token ID1318
Feature activation-4.80e-02

Pos logit contributions
mer+0.023
ang+0.022
he+0.021
imp+0.020
aming+0.019
Neg logit contributions
 fault-0.022
 kindness-0.019
 Brown-0.019
 cand-0.018
 unic-0.018
Token,
Token ID11
Feature activation+3.10e-03

Pos logit contributions
 fault+0.082
 kindness+0.072
 cand+0.070
 dreams+0.069
 unic+0.068
Neg logit contributions
ang-0.101
mer-0.100
he-0.098
tted-0.096
imp-0.093
Token she
Token ID673
Feature activation-5.63e-03

Pos logit contributions
mer+0.008
ang+0.008
he+0.007
tted+0.007
imp+0.007
Neg logit contributions
 fault-0.004
 cand-0.004
 diamonds-0.003
 kindness-0.003
 unic-0.003
Token separate
Token ID4553
Feature activation-2.88e-02

Pos logit contributions
 fault+0.122
 kindness+0.106
 generosity+0.105
 cand+0.103
 honesty+0.099
Neg logit contributions
he-0.121
ang-0.110
mer-0.107
tted-0.104
imp-0.098
Token their
Token ID511
Feature activation+1.78e-02

Pos logit contributions
 fault+0.057
 kindness+0.051
 generosity+0.049
 cand+0.048
 forgiveness+0.047
Neg logit contributions
he-0.056
ang-0.053
mer-0.051
tted-0.049
amed-0.047
Token eyes
Token ID2951
Feature activation-7.24e-02

Pos logit contributions
he+0.037
mer+0.036
ang+0.036
tted+0.034
imp+0.032
Neg logit contributions
 fault-0.030
 generosity-0.025
 kindness-0.025
 unic-0.024
 forgiveness-0.024
Token.
Token ID13
Feature activation-5.72e-03

Pos logit contributions
 fault+0.152
 kindness+0.136
 generosity+0.131
 cand+0.131
 diamonds+0.125
Neg logit contributions
he-0.125
ang-0.122
mer-0.115
tted-0.109
imp-0.107
Token The
Token ID383
Feature activation-1.01e-01

Pos logit contributions
 fault+0.011
 kindness+0.009
 generosity+0.009
 cand+0.009
 honesty+0.008
Neg logit contributions
he-0.012
ang-0.011
mer-0.010
amed-0.010
aming-0.010
Token bear
Token ID6842
Feature activation-4.21e-02

Pos logit contributions
 fault+0.196
 kindness+0.153
 generosity+0.153
 cand+0.152
 vitamins+0.152
Neg logit contributions
he-0.207
mer-0.195
ang-0.193
tted-0.182
imp-0.172
Token's
Token ID338
Feature activation-1.07e-01

Pos logit contributions
 fault+0.088
 kindness+0.079
 generosity+0.077
 cand+0.076
 forgiveness+0.074
Neg logit contributions
he-0.076
ang-0.071
mer-0.068
tted-0.066
amed-0.062
Token eye
Token ID4151
Feature activation+1.08e-01

Pos logit contributions
 fault+0.168
 kindness+0.142
 generosity+0.140
 cand+0.140
 unic+0.136
Neg logit contributions
he-0.235
ang-0.225
mer-0.221
tted-0.213
amed-0.206
Token went
Token ID1816
Feature activation+9.90e-02

Pos logit contributions
he+0.181
ang+0.167
mer+0.164
tted+0.157
amed+0.150
Neg logit contributions
 fault-0.244
 kindness-0.216
 generosity-0.213
 cand-0.205
 forgiveness-0.204
Token far
Token ID1290
Feature activation+1.36e-01

Pos logit contributions
he+0.198
ang+0.193
mer+0.181
tted+0.173
umbling+0.168
Neg logit contributions
 fault-0.182
 kindness-0.166
 generosity-0.157
 cand-0.155
 forgiveness-0.151
Token away
Token ID1497
Feature activation+1.05e-01

Pos logit contributions
he+0.251
ang+0.245
tted+0.241
amed+0.225
room+0.220
Neg logit contributions
 fault-0.263
 kindness-0.236
 cand-0.222
 generosity-0.220
 diamonds-0.219
Token could
Token ID714
Feature activation+2.16e-02

Pos logit contributions
he+0.087
ang+0.080
mer+0.078
tted+0.075
amed+0.071
Neg logit contributions
 fault-0.103
 kindness-0.091
 generosity-0.090
 cand-0.087
 forgiveness-0.086
Token barely
Token ID8523
Feature activation+1.35e-02

Pos logit contributions
he+0.046
ang+0.041
tted+0.040
mer+0.040
imp+0.038
Neg logit contributions
 fault-0.040
 kindness-0.034
 generosity-0.034
 cand-0.033
 vitamins-0.033
Token contain
Token ID3994
Feature activation+1.11e-01

Pos logit contributions
he+0.025
ang+0.023
mer+0.022
tted+0.021
amed+0.020
Neg logit contributions
 fault-0.028
 kindness-0.025
 generosity-0.025
 cand-0.024
 forgiveness-0.024
Token his
Token ID465
Feature activation+7.62e-03

Pos logit contributions
he+0.219
ang+0.200
mer+0.191
tted+0.183
amed+0.182
Neg logit contributions
 fault-0.218
 cand-0.185
 generosity-0.183
 kindness-0.182
 unic-0.175
Token impat
Token ID28189
Feature activation-6.02e-02

Pos logit contributions
he+0.015
ang+0.015
mer+0.014
tted+0.014
amed+0.013
Neg logit contributions
 fault-0.014
 cand-0.012
 unic-0.012
 kindness-0.012
 generosity-0.012
Tokenience
Token ID1240
Feature activation-2.76e-01

Pos logit contributions
 fault+0.078
 kindness+0.060
 vitamins+0.060
 generosity+0.060
 cand+0.060
Neg logit contributions
he-0.159
ang-0.157
mer-0.156
tted-0.141
imp-0.140
Token as
Token ID355
Feature activation+7.43e-02

Pos logit contributions
 fault+0.580
 kindness+0.510
 generosity+0.508
 cand+0.497
 forgiveness+0.495
Neg logit contributions
he-0.495
mer-0.447
ang-0.436
tted-0.412
amed-0.383
Token he
Token ID339
Feature activation-1.12e-02

Pos logit contributions
he+0.162
ang+0.152
mer+0.149
tted+0.143
imp+0.136
Neg logit contributions
 fault-0.128
 kindness-0.108
 cand-0.107
 generosity-0.106
 vitamins-0.101
Token opened
Token ID4721
Feature activation-6.03e-03

Pos logit contributions
 fault+0.023
 kindness+0.020
 generosity+0.020
 cand+0.019
 forgiveness+0.019
Neg logit contributions
he-0.021
ang-0.019
mer-0.019
tted-0.018
amed-0.017
Token the
Token ID262
Feature activation+6.95e-03

Pos logit contributions
 fault+0.013
 generosity+0.011
 kindness+0.011
 cand+0.011
 forgiveness+0.011
Neg logit contributions
he-0.011
ang-0.010
mer-0.010
tted-0.009
imp-0.009
Token ch
Token ID442
Feature activation-8.21e-03

Pos logit contributions
he+0.011
ang+0.010
mer+0.010
tted+0.009
amed+0.009
Neg logit contributions
 fault-0.016
 kindness-0.014
 generosity-0.014
 cand-0.014
 forgiveness-0.014
Token they
Token ID484
Feature activation-1.40e-02

Pos logit contributions
 fault+0.143
 kindness+0.126
 generosity+0.124
 cand+0.120
 forgiveness+0.117
Neg logit contributions
he-0.154
ang-0.143
mer-0.137
tted-0.132
imp-0.126
Token didn
Token ID1422
Feature activation-1.12e-03

Pos logit contributions
 fault+0.031
 kindness+0.027
 generosity+0.027
 cand+0.026
 forgiveness+0.026
Neg logit contributions
he-0.024
ang-0.023
mer-0.022
tted-0.021
amed-0.020
Token't
Token ID470
Feature activation-7.04e-02

Pos logit contributions
 fault+0.002
 kindness+0.002
 generosity+0.002
 cand+0.002
 forgiveness+0.002
Neg logit contributions
he-0.002
ang-0.002
mer-0.002
tted-0.002
amed-0.002
Token buy
Token ID2822
Feature activation-9.30e-02

Pos logit contributions
 fault+0.144
 kindness+0.123
 generosity+0.123
 unic+0.119
 cand+0.119
Neg logit contributions
he-0.140
ang-0.124
mer-0.120
tted-0.118
imp-0.112
Token the
Token ID262
Feature activation-2.43e-03

Pos logit contributions
 fault+0.197
 generosity+0.180
ccoli+0.177
urger+0.174
 weaknesses+0.174
Neg logit contributions
he-0.158
mer-0.123
ang-0.123
 forward-0.117
room-0.110
Token candy
Token ID18550
Feature activation-2.24e-01

Pos logit contributions
 fault+0.003
 generosity+0.002
 unic+0.002
urger+0.002
 kindness+0.002
Neg logit contributions
he-0.007
mer-0.006
tted-0.006
 corner-0.006
ang-0.006
Token.
Token ID13
Feature activation-5.60e-04

Pos logit contributions
 fault+0.415
 kindness+0.364
 cand+0.357
 generosity+0.355
 forgiveness+0.338
Neg logit contributions
he-0.448
ang-0.439
mer-0.414
tted-0.403
amed-0.393
Token Her
Token ID2332
Feature activation-1.53e-01

Pos logit contributions
 fault+0.001
 kindness+0.001
 generosity+0.001
 cand+0.001
 forgiveness+0.001
Neg logit contributions
he-0.001
ang-0.001
mer-0.001
tted-0.001
amed-0.001
Token mom
Token ID1995
Feature activation+2.26e-02

Pos logit contributions
 fault+0.290
 vitamins+0.276
 diamonds+0.259
 generosity+0.256
 kindness+0.254
Neg logit contributions
he-0.275
ang-0.247
amed-0.229
tted-0.225
mer-0.221
Token explained
Token ID4893
Feature activation-1.20e-01

Pos logit contributions
he+0.045
ang+0.041
mer+0.038
amed+0.037
tted+0.036
Neg logit contributions
 fault-0.047
 vitamins-0.039
 diamonds-0.039
 kindness-0.038
 generosity-0.038
Token to
Token ID284
Feature activation-6.47e-02

Pos logit contributions
 fault+0.246
urger+0.212
 generosity+0.208
 vitamins+0.205
 diamonds+0.196
Neg logit contributions
he-0.216
ang-0.184
mer-0.181
 nearby-0.172
imp-0.170
Token\n
Token ID198
Feature activation+2.10e-02

Pos logit contributions
he+0.026
ang+0.025
mer+0.024
tted+0.022
amed+0.021
Neg logit contributions
 fault-0.035
 kindness-0.032
 generosity-0.031
 cand-0.030
 forgiveness-0.030
Token\n
Token ID198
Feature activation+1.94e-02

Pos logit contributions
ang+0.040
mer+0.040
he+0.039
tted+0.037
amed+0.036
Neg logit contributions
 fault-0.038
 kindness-0.034
 unic-0.032
 generosity-0.032
 cand-0.031
TokenL
Token ID43
Feature activation+8.82e-02

Pos logit contributions
he+0.028
ang+0.027
mer+0.026
umbling+0.025
pour+0.023
Neg logit contributions
 unic-0.041
 cand-0.040
 fault-0.039
 kindness-0.039
 brace-0.037
Tokenily
Token ID813
Feature activation+1.02e-01

Pos logit contributions
ang+0.119
mer+0.117
he+0.110
umbling+0.100
tted+0.099
Neg logit contributions
 fault-0.216
 kindness-0.196
 unic-0.194
 cand-0.193
 generosity-0.187
Token and
Token ID290
Feature activation-9.53e-03

Pos logit contributions
he+0.178
ang+0.169
mer+0.162
tted+0.153
amed+0.150
Neg logit contributions
 fault-0.225
 kindness-0.195
 generosity-0.193
 forgiveness-0.186
 cand-0.186
Token Spark
Token ID17732
Feature activation-2.36e-01

Pos logit contributions
 fault+0.017
 kindness+0.014
 generosity+0.014
 vitamins+0.013
urger+0.013
Neg logit contributions
he-0.021
ang-0.020
mer-0.019
tted-0.018
amed-0.017
Tokeny
Token ID88
Feature activation+1.28e-01

Pos logit contributions
 fault+0.628
 kindness+0.576
 cand+0.568
 generosity+0.563
 unic+0.557
Neg logit contributions
he-0.278
ang-0.256
tted-0.229
mer-0.229
imp-0.208
Token became
Token ID2627
Feature activation+2.85e-02

Pos logit contributions
he+0.162
mer+0.147
ang+0.144
amed+0.133
tted+0.126
Neg logit contributions
 fault-0.340
 vitamins-0.291
 generosity-0.287
 kindness-0.287
 honesty-0.281
Token friends
Token ID2460
Feature activation-2.70e-03

Pos logit contributions
he+0.078
ang+0.071
mer+0.071
tted+0.070
umbling+0.068
Neg logit contributions
 fault-0.035
 kindness-0.032
 generosity-0.030
 cand-0.028
 unic-0.027
Token and
Token ID290
Feature activation-7.74e-03

Pos logit contributions
 fault+0.006
 kindness+0.005
 generosity+0.005
 cand+0.005
 vitamins+0.005
Neg logit contributions
he-0.005
ang-0.005
mer-0.004
tted-0.004
amed-0.004
Token they
Token ID484
Feature activation-2.39e-03

Pos logit contributions
 fault+0.019
urger+0.016
 vitamins+0.016
 generosity+0.015
 kindness+0.015
Neg logit contributions
he-0.011
ang-0.010
mer-0.010
amed-0.009
tted-0.009
Token Jack
Token ID3619
Feature activation+1.28e-02

Pos logit contributions
ang+0.038
he+0.038
mer+0.035
tted+0.034
amed+0.033
Neg logit contributions
 fault-0.031
 kindness-0.027
 generosity-0.027
 vitamins-0.026
 cand-0.025
Token loved
Token ID6151
Feature activation-1.16e-02

Pos logit contributions
he+0.022
ang+0.020
mer+0.019
tted+0.018
amed+0.017
Neg logit contributions
 fault-0.028
 kindness-0.025
 generosity-0.025
 cand-0.024
 forgiveness-0.024
Token to
Token ID284
Feature activation-4.92e-02

Pos logit contributions
 fault+0.027
 generosity+0.023
 kindness+0.022
 forgiveness+0.022
ccoli+0.022
Neg logit contributions
ang-0.016
he-0.016
mer-0.015
 far-0.014
 nearby-0.014
Token go
Token ID467
Feature activation+1.38e-01

Pos logit contributions
 fault+0.117
 generosity+0.095
 kindness+0.095
 cand+0.095
 honesty+0.093
Neg logit contributions
he-0.081
tted-0.070
mer-0.070
ang-0.069
room-0.068
Token on
Token ID319
Feature activation+1.52e-02

Pos logit contributions
he+0.223
ang+0.208
mer+0.198
tted+0.191
imp+0.179
Neg logit contributions
 fault-0.327
 generosity-0.281
 kindness-0.280
 forgiveness-0.274
 cand-0.270
Token adventures
Token ID17545
Feature activation-2.15e-01

Pos logit contributions
he+0.026
 far+0.024
mer+0.023
ang+0.022
 nearby+0.022
Neg logit contributions
 fault-0.031
 generosity-0.027
 forgiveness-0.027
ccoli-0.026
 kindness-0.026
Token with
Token ID351
Feature activation+4.46e-02

Pos logit contributions
 fault+0.570
 kindness+0.514
 generosity+0.509
 cand+0.494
 forgiveness+0.492
Neg logit contributions
he-0.285
ang-0.254
mer-0.245
tted-0.224
amed-0.206
Token his
Token ID465
Feature activation+2.08e-02

Pos logit contributions
ang+0.046
he+0.045
 far+0.045
 deep+0.040
room+0.040
Neg logit contributions
 fault-0.127
ccoli-0.101
 generosity-0.100
 forgiveness-0.096
 kindness-0.095
Token friends
Token ID2460
Feature activation+4.23e-03

Pos logit contributions
mer+0.038
ang+0.037
he+0.035
tted+0.032
suit+0.032
Neg logit contributions
 fault-0.040
 kindness-0.033
 forgiveness-0.033
 generosity-0.033
 cand-0.033
Token.
Token ID13
Feature activation+1.73e-02

Pos logit contributions
he+0.008
ang+0.007
mer+0.007
tted+0.006
amed+0.006
Neg logit contributions
 fault-0.009
 kindness-0.008
 generosity-0.008
 cand-0.008
 forgiveness-0.008
Token One
Token ID1881
Feature activation-1.58e-01

Pos logit contributions
he+0.035
ang+0.032
mer+0.031
tted+0.030
room+0.028
Neg logit contributions
 fault-0.033
 kindness-0.030
 generosity-0.029
 cand-0.029
 forgiveness-0.028
Token had
Token ID550
Feature activation-4.59e-02

Pos logit contributions
 fault+0.046
 kindness+0.041
 generosity+0.041
 cand+0.040
 forgiveness+0.039
Neg logit contributions
he-0.036
ang-0.033
mer-0.032
tted-0.031
amed-0.029
Token hurt
Token ID5938
Feature activation-1.04e-01

Pos logit contributions
 fault+0.092
urger+0.080
 generosity+0.080
 unic+0.080
 kindness+0.079
Neg logit contributions
he-0.075
tted-0.072
mer-0.070
amed-0.070
ang-0.067
Token the
Token ID262
Feature activation-9.44e-03

Pos logit contributions
 fault+0.200
 kindness+0.176
 generosity+0.173
 cand+0.168
 forgiveness+0.164
Neg logit contributions
he-0.209
ang-0.196
mer-0.190
tted-0.182
amed-0.175
Token bird
Token ID6512
Feature activation-6.03e-02

Pos logit contributions
 fault+0.019
 generosity+0.016
 kindness+0.016
 cand+0.016
urger+0.016
Neg logit contributions
he-0.018
ang-0.016
mer-0.016
tted-0.015
imp-0.014
Token's
Token ID338
Feature activation-1.16e-01

Pos logit contributions
 fault+0.129
 generosity+0.114
 vitamins+0.114
 cand+0.112
 kindness+0.112
Neg logit contributions
he-0.108
ang-0.099
mer-0.097
tted-0.089
imp-0.082
Token feelings
Token ID7666
Feature activation-2.42e-01

Pos logit contributions
 fault+0.175
 cand+0.161
 unic+0.156
 diamonds+0.154
 vitamins+0.153
Neg logit contributions
he-0.229
mer-0.223
ang-0.214
tted-0.213
 corner-0.210
Token.
Token ID13
Feature activation-8.32e-03

Pos logit contributions
 fault+0.499
 kindness+0.444
 generosity+0.439
 cand+0.428
 forgiveness+0.418
Neg logit contributions
he-0.450
ang-0.410
mer-0.400
tted-0.384
amed-0.363
Token The
Token ID383
Feature activation-1.15e-01

Pos logit contributions
 fault+0.018
 kindness+0.017
 generosity+0.016
 cand+0.016
 forgiveness+0.016
Neg logit contributions
he-0.014
ang-0.013
mer-0.013
tted-0.012
amed-0.011
Token park
Token ID3952
Feature activation-3.61e-03

Pos logit contributions
 fault+0.240
 generosity+0.200
 kindness+0.199
 vitamins+0.198
 cand+0.198
Neg logit contributions
he-0.210
ang-0.199
mer-0.188
tted-0.180
imp-0.172
Token was
Token ID373
Feature activation+1.76e-02

Pos logit contributions
 fault+0.008
 kindness+0.007
 generosity+0.007
 cand+0.007
 forgiveness+0.006
Neg logit contributions
he-0.006
ang-0.006
mer-0.006
tted-0.006
amed-0.005
Token never
Token ID1239
Feature activation-1.10e-01

Pos logit contributions
he+0.030
ang+0.027
tted+0.026
amed+0.026
mer+0.026
Neg logit contributions
 fault-0.038
 generosity-0.034
 cand-0.034
 kindness-0.034
 vitamins-0.033