TOP ACTIVATIONS
MAX = 0.126

ally , you know you shouldn 't keep the gum .
Hey little boy , you shouldn 't be picking up that
" L ily , we shouldn 't pick up things from
He 's very small and shouldn 't be feared . He
but his mother said he shouldn 't because it was not
told Sarah , " You shouldn 't climb the tree ,
" Me gan , you shouldn 't take things that are
's very rude ! You shouldn 't eat things without asking
No , NO ! You shouldn 't mix them !"
you do that ? You shouldn 't disturb things that don
Billy and told him he shouldn 't have it .
" Fred , maybe you shouldn 't be blowing that loud
he said , " You shouldn 't have taken it !
and said , " You shouldn 't take things that don
Sorry John , but you shouldn 't have taken the soup
, be serious . You shouldn 't play with the door
and said , " You shouldn 't be here ! This
my told her that she shouldn 't touch the newspaper because
, Tim my ! We shouldn 't catch the butterfly .
's not nice . You shouldn 't throw mud at the

MIN ACTIVATIONS
MIN = -0.124

and clean again . <|endoftext|> Once upon a time , in
passed the bridge . <|endoftext|> Once upon a time , there
happily ever after . <|endoftext|> Once upon a time , there
fun time outside . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , a
visit that day . <|endoftext|> Once upon a time there was
. The end . <|endoftext|> Once upon a time , there
was happy again ! <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time there lived
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time there was
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
. The end . <|endoftext|> Once upon a time , there
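The TOP and MIN lists rank example contexts by the feature's activation. The real tool may also deduplicate or window the contexts, and that is not assumed here. A minimal sketch of the ranking step, assuming a flat list of (activation, context) pairs. The helper `top_and_bottom` and the toy records are hypothetical and are not taken from the dashboard.

```python
import heapq
from typing import List, Tuple

# (activation, context) pair; the records used below are toy values.
Record = Tuple[float, str]

def top_and_bottom(records: List[Record], k: int = 20) -> Tuple[List[Record], List[Record]]:
    """Return the k highest- and the k lowest-activating records."""
    top = heapq.nlargest(k, records, key=lambda r: r[0])      # descending by activation
    bottom = heapq.nsmallest(k, records, key=lambda r: r[0])  # ascending by activation
    return top, bottom

if __name__ == "__main__":
    records = [
        (0.120, "toy context A"),
        (-0.110, "toy context B"),
        (0.010, "toy context C"),
        (-0.020, "toy context D"),
    ]
    top, bottom = top_and_bottom(records, k=2)
    print("MAX =", top[0][0])     # MAX = 0.12
    print("MIN =", bottom[0][0])  # MIN = -0.11
```

The first entries of the two lists give the MAX and MIN values reported in the section headers.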

INTERVAL 0.063 to 0.126
CONTAINS 0.766%

my 's mom called them for a snack .
all my lettuce . I need a lawyer to help me
reach it , but they cannot . They ask their mom
was confused . He didn 't know what was happening .
so happy that he wanted to fly too , so he

INTERVAL 0.001 to 0.063
CONTAINS 50.664%

. " You are nice kids . You can play with
excited and couldn 't wait to go . They
your mom and dad . They know what is best for
see a wide , white sock . She was so happy
and cried . She didn 't want Boo to leave .

INTERVAL -0.062 to 0.001
CONTAINS 44.050%

thinking about the little bird 's words .
I want you to be happy . But I can not
, there was a girl named Lily . She was very
, Anna goes to the park with her mom . She
Tom started to skip and play while his mom my

INTERVAL -0.124 to -0.062
CONTAINS 4.520%

. <|endoftext|> Once upon a time there was a small creature
<|endoftext|> Once upon a time there was a man who was
very frustrated and this time , he was angry too .
some pine cones on the top to make a point y
They were chocolate chip cookies , Anna 's favorite . "
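The four CONTAINS percentages sum to 100% (0.766 + 50.664 + 44.050 + 4.520). Each one therefore reads as the share of recorded activations that fall in that interval. A minimal sketch of that bucketing, assuming a flat list of activation values and caller-supplied interval edges. The function name `interval_shares` is hypothetical, and the dashboard's exact edge handling is not known. The sketch returns the lowest interval first, which is the reverse of the display order above.

```python
from bisect import bisect_right
from typing import List, Sequence

def interval_shares(acts: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Percentage of activations in each interval [edges[i], edges[i+1]).

    The last interval is closed on the right so the maximum is counted.
    Values outside [edges[0], edges[-1]] are skipped, but they still count
    toward the total, so the shares then sum to less than 100.
    """
    counts = [0] * (len(edges) - 1)
    for a in acts:
        if a < edges[0] or a > edges[-1]:
            continue  # outside the histogram range
        # bisect_right finds the first edge strictly greater than a.
        i = min(bisect_right(edges, a) - 1, len(counts) - 1)
        counts[i] += 1
    total = len(acts)
    return [100.0 * c / total for c in counts]

if __name__ == "__main__":
    edges = [-0.124, -0.062, 0.001, 0.063, 0.126]  # boundaries from the headers above
    toy_acts = [-0.1, -0.05, 0.0, 0.01, 0.1, 0.126]  # toy values, not dashboard data
    print(interval_shares(toy_acts, edges))
```

Using `bisect_right` keeps each interval half-open, so a value that sits exactly on an interior edge, such as 0.063, falls into the higher interval.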
Tokenally
Token ID453
Feature activation+3.67e-02

Pos logit contributions
 exchange+0.005
give+0.005
oc+0.005
 harm+0.005
 Amen+0.004
Neg logit contributions
tub-0.005
ashing-0.005
angers-0.005
 Mil-0.005
gan-0.005
Token,
Token ID11
Feature activation-3.22e-02

Pos logit contributions
oc+0.058
 exchange+0.056
 harm+0.055
e+0.054
give+0.054
Neg logit contributions
ashing-0.074
tub-0.069
angers-0.065
cast-0.060
gan-0.060
Token you
Token ID345
Feature activation+3.01e-02

Pos logit contributions
ashing+0.070
tub+0.067
angers+0.063
gan+0.060
pit+0.060
Neg logit contributions
 exchange-0.044
give-0.043
 harm-0.042
oc-0.042
 respect-0.042
Token know
Token ID760
Feature activation+6.96e-02

Pos logit contributions
oc+0.039
 harm+0.039
 Amen+0.039
 exchange+0.039
give+0.038
Neg logit contributions
ashing-0.066
tub-0.063
angers-0.058
gan-0.058
 Mil-0.056
Token you
Token ID345
Feature activation+3.07e-02

Pos logit contributions
 respect+0.078
 harm+0.076
give+0.075
 letter+0.072
 exchange+0.072
Neg logit contributions
ashing-0.173
tub-0.165
angers-0.153
ashed-0.151
pit-0.146
Token shouldn
Token ID6584
Feature activation+1.26e-01

Pos logit contributions
oc+0.044
 harm+0.044
 Amen+0.044
 exchange+0.044
give+0.043
Neg logit contributions
ashing-0.062
tub-0.060
gan-0.054
angers-0.054
 Mil-0.053
Token't
Token ID470
Feature activation+5.71e-02

Pos logit contributions
oc+0.270
rum+0.251
 Amen+0.249
 approval+0.248
 envelop+0.242
Neg logit contributions
tub-0.161
 Mil-0.135
 O-0.132
 upon-0.131
 Sandy-0.130
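Each per-token block lists five positive and five negative logit contributions. Given a full token-to-contribution mapping, building those lists is a top-k selection on each sign. A minimal sketch, assuming such a mapping is available. How the dashboard computes the contributions is not stated here, so the function name `top_logit_contributions` and its input shape are hypothetical. The usage values are copied from the `'t` (ID 470) block above, and each key keeps its leading space the same way the dashboard does (for example " Amen" versus "oc").

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_logit_contributions(contrib: Dict[str, float], k: int = 5) -> Tuple[List[Pair], List[Pair]]:
    """Return the k largest positive and the k most negative (token, value) pairs."""
    positives = [(tok, v) for tok, v in contrib.items() if v > 0]
    negatives = [(tok, v) for tok, v in contrib.items() if v < 0]
    pos = heapq.nlargest(k, positives, key=lambda tv: tv[1])   # largest first
    neg = heapq.nsmallest(k, negatives, key=lambda tv: tv[1])  # most negative first
    return pos, neg

if __name__ == "__main__":
    contrib = {"oc": 0.270, "rum": 0.251, " Amen": 0.249, " approval": 0.248, " envelop": 0.242,
               "tub": -0.161, " Mil": -0.135, " O": -0.132, " upon": -0.131, " Sandy": -0.130}
    pos, neg = top_logit_contributions(contrib, k=2)
    print(pos)  # [('oc', 0.27), ('rum', 0.251)]
    print(neg)  # [('tub', -0.161), (' Mil', -0.135)]
```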
Token keep
Token ID1394
Feature activation+2.59e-02

Pos logit contributions
 Amen+0.107
oc+0.095
izer+0.085
ez+0.084
',"+0.081
Neg logit contributions
ashing-0.106
 thirsty-0.094
tub-0.091
 upon-0.090
 raining-0.086
Token the
Token ID262
Feature activation-9.18e-03

Pos logit contributions
oc+0.055
 Amen+0.054
 exchange+0.047
 owe+0.047
rum+0.047
Neg logit contributions
ashing-0.036
 bored-0.034
 thirsty-0.034
tub-0.033
angers-0.031
Token gum
Token ID27142
Feature activation+3.07e-02

Pos logit contributions
tub+0.017
ashing+0.017
 thirsty+0.016
 bored+0.015
 upon+0.015
Neg logit contributions
 Amen-0.016
oc-0.014
rum-0.014
 harm-0.013
e-0.013
Token.
Token ID13
Feature activation+2.91e-02

Pos logit contributions
 harm+0.044
 exchange+0.043
oc+0.043
 Amen+0.042
give+0.042
Neg logit contributions
ashing-0.063
tub-0.059
angers-0.054
gan-0.054
 Mil-0.053
TokenHey
Token ID10814
Feature activation+2.41e-02

Pos logit contributions
 harm+0.045
 Amen+0.045
oc+0.044
rum+0.043
 exchange+0.041
Neg logit contributions
ashing-0.061
tub-0.055
 upon-0.054
gan-0.054
pit-0.051
Token little
Token ID1310
Feature activation+2.40e-02

Pos logit contributions
 Amen+0.043
oc+0.043
 harm+0.043
 exchange+0.043
give+0.043
Neg logit contributions
ashing-0.040
tub-0.038
gan-0.034
angers-0.033
 Mil-0.032
Token boy
Token ID2933
Feature activation+1.48e-02

Pos logit contributions
 harm+0.038
oc+0.038
 Amen+0.038
 exchange+0.038
give+0.037
Neg logit contributions
ashing-0.046
tub-0.044
angers-0.039
gan-0.039
 Mil-0.038
Token,
Token ID11
Feature activation-4.90e-02

Pos logit contributions
 Amen+0.031
 harm+0.030
give+0.030
 exchange+0.030
oc+0.030
Neg logit contributions
ashing-0.020
tub-0.019
gan-0.017
 Mil-0.016
angers-0.015
Token you
Token ID345
Feature activation+3.08e-02

Pos logit contributions
ashing+0.095
tub+0.090
angers+0.082
gan+0.081
 Mil+0.078
Neg logit contributions
oc-0.076
 harm-0.076
 Amen-0.075
 exchange-0.075
give-0.074
Token shouldn
Token ID6584
Feature activation+1.22e-01

Pos logit contributions
oc+0.044
 Amen+0.043
 harm+0.043
give+0.042
 exchange+0.042
Neg logit contributions
ashing-0.063
tub-0.060
gan-0.055
angers-0.054
 Mil-0.053
Token't
Token ID470
Feature activation+5.57e-02

Pos logit contributions
oc+0.265
rum+0.250
eness+0.241
 approval+0.239
 Amen+0.237
Neg logit contributions
tub-0.133
 upon-0.128
 O-0.124
 Sandy-0.123
 Mil-0.122
Token be
Token ID307
Feature activation+3.53e-03

Pos logit contributions
 Amen+0.108
oc+0.100
izer+0.090
ez+0.090
rum+0.083
Neg logit contributions
ashing-0.096
 upon-0.084
 thirsty-0.083
gan-0.080
 Suz-0.080
Token picking
Token ID10868
Feature activation+8.35e-03

Pos logit contributions
oc+0.007
 Amen+0.007
 exchange+0.007
 harm+0.007
give+0.007
Neg logit contributions
ashing-0.005
tub-0.005
gan-0.005
 thirsty-0.004
angers-0.004
Token up
Token ID510
Feature activation-2.93e-02

Pos logit contributions
 Amen+0.015
oc+0.014
 harm+0.014
give+0.014
 exchange+0.014
Neg logit contributions
ashing-0.014
tub-0.014
gan-0.013
angers-0.012
 Mil-0.012
Token that
Token ID326
Feature activation-6.57e-03

Pos logit contributions
ashing+0.057
tub+0.055
gan+0.050
angers+0.049
 Mil+0.048
Neg logit contributions
oc-0.044
 Amen-0.044
 harm-0.044
 exchange-0.043
give-0.043
Token "
Token ID366
Feature activation+4.30e-02

Pos logit contributions
ashing+0.055
tub+0.052
pit+0.047
cast+0.046
angers+0.045
Neg logit contributions
 Amen-0.061
give-0.059
 deliver-0.059
 exchange-0.056
 letter-0.056
TokenL
Token ID43
Feature activation+8.03e-03

Pos logit contributions
oc+0.059
 harm+0.058
 Amen+0.058
give+0.058
 exchange+0.058
Neg logit contributions
ashing-0.091
tub-0.087
gan-0.079
angers-0.079
 Mil-0.077
Tokenily
Token ID813
Feature activation+1.93e-02

Pos logit contributions
give+0.015
 harm+0.015
oc+0.015
 exchange+0.014
e+0.014
Neg logit contributions
ashing-0.012
tub-0.012
cast-0.011
gan-0.011
 thirsty-0.011
Token,
Token ID11
Feature activation-3.51e-02

Pos logit contributions
 harm+0.030
oc+0.030
give+0.030
 exchange+0.030
 respect+0.029
Neg logit contributions
ashing-0.037
tub-0.037
angers-0.034
cast-0.032
gan-0.032
Token we
Token ID356
Feature activation+4.95e-02

Pos logit contributions
ashing+0.077
tub+0.072
angers+0.068
pit+0.066
gan+0.065
Neg logit contributions
give-0.047
 exchange-0.046
 harm-0.046
 deliver-0.045
 respect-0.045
Token shouldn
Token ID6584
Feature activation+1.21e-01

Pos logit contributions
oc+0.062
 Amen+0.062
 harm+0.061
give+0.060
 exchange+0.060
Neg logit contributions
ashing-0.110
tub-0.105
gan-0.096
 thirsty-0.094
angers-0.093
Token't
Token ID470
Feature activation+6.06e-02

Pos logit contributions
oc+0.215
 exchange+0.199
rum+0.196
 harm+0.194
 Amen+0.194
Neg logit contributions
tub-0.199
ashing-0.184
 Mil-0.176
gan-0.173
 upon-0.173
Token pick
Token ID2298
Feature activation+3.39e-02

Pos logit contributions
 Amen+0.118
oc+0.107
izer+0.102
ez+0.096
',"+0.095
Neg logit contributions
ashing-0.101
 thirsty-0.087
 Jill-0.086
 O-0.085
gan-0.084
Token up
Token ID510
Feature activation-2.06e-02

Pos logit contributions
 Amen+0.061
oc+0.058
 harm+0.055
 exchange+0.055
give+0.054
Neg logit contributions
ashing-0.058
tub-0.055
gan-0.051
 bored-0.050
 Mil-0.049
Token things
Token ID1243
Feature activation+1.43e-02

Pos logit contributions
ashing+0.038
tub+0.036
gan+0.033
angers+0.032
 Mil+0.032
Neg logit contributions
 Amen-0.034
oc-0.033
 harm-0.032
 exchange-0.032
give-0.032
Token from
Token ID422
Feature activation+1.79e-02

Pos logit contributions
oc+0.022
 Amen+0.022
 harm+0.021
 exchange+0.021
give+0.021
Neg logit contributions
ashing-0.028
tub-0.027
gan-0.024
angers-0.024
 Mil-0.023
Token He
Token ID679
Feature activation+1.85e-02

Pos logit contributions
oc+0.044
 Amen+0.044
give+0.041
 exchange+0.040
 deliver+0.040
Neg logit contributions
ashing-0.062
tub-0.057
cast-0.053
pit-0.052
angers-0.051
Token's
Token ID338
Feature activation+1.33e-02

Pos logit contributions
oc+0.025
 harm+0.025
 exchange+0.024
give+0.024
 deliver+0.024
Neg logit contributions
ashing-0.040
tub-0.039
angers-0.036
gan-0.035
pit-0.034
Token very
Token ID845
Feature activation+2.42e-02

Pos logit contributions
oc+0.022
 Amen+0.022
 harm+0.022
 exchange+0.022
give+0.021
Neg logit contributions
ashing-0.025
tub-0.023
gan-0.021
angers-0.021
 Mil-0.020
Token small
Token ID1402
Feature activation+3.03e-02

Pos logit contributions
 harm+0.038
oc+0.038
 exchange+0.037
 Amen+0.037
give+0.036
Neg logit contributions
ashing-0.046
tub-0.045
angers-0.040
gan-0.039
pit-0.039
Token and
Token ID290
Feature activation-6.79e-03

Pos logit contributions
 harm+0.049
oc+0.048
 exchange+0.047
give+0.046
 Amen+0.045
Neg logit contributions
ashing-0.058
tub-0.054
gan-0.050
angers-0.050
cast-0.048
Token shouldn
Token ID6584
Feature activation+1.20e-01

Pos logit contributions
ashing+0.013
tub+0.012
angers+0.011
gan+0.011
pit+0.010
Neg logit contributions
oc-0.011
 harm-0.011
 Amen-0.011
 exchange-0.011
give-0.011
Token't
Token ID470
Feature activation+5.68e-02

Pos logit contributions
oc+0.230
rum+0.221
eness+0.217
 approval+0.216
 envelop+0.210
Neg logit contributions
tub-0.159
 Mil-0.156
 upon-0.147
gan-0.142
 Madd-0.142
Token be
Token ID307
Feature activation+9.67e-04

Pos logit contributions
oc+0.089
 Amen+0.087
izer+0.082
rum+0.078
give+0.078
Neg logit contributions
ashing-0.107
tub-0.100
 thirsty-0.097
gan-0.097
 upon-0.095
Token feared
Token ID15240
Feature activation+2.48e-02

Pos logit contributions
 harm+0.002
 exchange+0.002
give+0.002
oc+0.002
 Amen+0.002
Neg logit contributions
ashing-0.002
tub-0.002
angers-0.002
gan-0.001
pit-0.001
Token.
Token ID13
Feature activation+2.33e-02

Pos logit contributions
 harm+0.035
oc+0.031
give+0.031
 exchange+0.030
 deliver+0.030
Neg logit contributions
ashing-0.055
tub-0.052
gan-0.048
angers-0.048
ashed-0.047
Token He
Token ID679
Feature activation+1.64e-02

Pos logit contributions
oc+0.040
 Amen+0.040
 exchange+0.038
give+0.037
 deliver+0.037
Neg logit contributions
ashing-0.043
tub-0.040
angers-0.036
cast-0.036
ashed-0.035
Token but
Token ID475
Feature activation+6.11e-03

Pos logit contributions
ashing+0.098
tub+0.094
angers+0.085
gan+0.082
pit+0.080
Neg logit contributions
 exchange-0.107
 harm-0.106
 deliver-0.106
 shave-0.105
oc-0.103
Token his
Token ID465
Feature activation-5.63e-03

Pos logit contributions
 exchange+0.009
 Amen+0.008
 letter+0.008
 harm+0.008
 deliver+0.008
Neg logit contributions
ashing-0.014
tub-0.012
angers-0.012
cast-0.011
ashed-0.011
Token mother
Token ID2802
Feature activation+5.11e-02

Pos logit contributions
ashing+0.009
tub+0.009
gan+0.008
angers+0.008
 Mil+0.007
Neg logit contributions
oc-0.010
 Amen-0.010
 harm-0.010
 exchange-0.010
give-0.010
Token said
Token ID531
Feature activation+3.51e-02

Pos logit contributions
oc+0.090
 harm+0.089
 Amen+0.089
 exchange+0.088
give+0.088
Neg logit contributions
ashing-0.087
tub-0.083
gan-0.073
angers-0.073
 Mil-0.071
Token he
Token ID339
Feature activation+1.50e-02

Pos logit contributions
 harm+0.051
give+0.050
 exchange+0.049
oc+0.048
 Amen+0.048
Neg logit contributions
ashing-0.074
tub-0.070
angers-0.064
pit-0.063
gan-0.061
Token shouldn
Token ID6584
Feature activation+1.20e-01

Pos logit contributions
oc+0.029
 Amen+0.029
 harm+0.028
 exchange+0.028
give+0.028
Neg logit contributions
ashing-0.023
tub-0.022
gan-0.020
angers-0.019
 Mil-0.019
Token't
Token ID470
Feature activation+5.85e-02

Pos logit contributions
oc+0.217
 approval+0.205
 harm+0.204
 Amen+0.202
rum+0.202
Neg logit contributions
tub-0.189
 upon-0.173
ashing-0.167
 Mil-0.161
gan-0.154
Token because
Token ID780
Feature activation+2.35e-02

Pos logit contributions
 Amen+0.100
oc+0.097
izer+0.091
give+0.087
ez+0.083
Neg logit contributions
ashing-0.112
gan-0.097
tub-0.097
 thirsty-0.096
 upon-0.093
Token it
Token ID340
Feature activation-3.29e-02

Pos logit contributions
oc+0.041
 Amen+0.041
 harm+0.040
 exchange+0.040
give+0.040
Neg logit contributions
ashing-0.041
tub-0.039
gan-0.035
angers-0.034
 Mil-0.033
Token was
Token ID373
Feature activation+3.69e-03

Pos logit contributions
ashing+0.069
tub+0.065
angers+0.061
gan+0.059
pit+0.058
Neg logit contributions
 harm-0.048
oc-0.047
 exchange-0.046
give-0.045
 Amen-0.044
Token not
Token ID407
Feature activation+5.12e-02

Pos logit contributions
oc+0.006
 Amen+0.006
 harm+0.006
 exchange+0.006
give+0.006
Neg logit contributions
ashing-0.006
tub-0.006
gan-0.005
angers-0.005
 Mil-0.005
Token told
Token ID1297
Feature activation+5.08e-02

Pos logit contributions
tub+0.013
ashing+0.012
angers+0.012
pit+0.011
cast+0.011
Neg logit contributions
 harm-0.007
 steal-0.007
 deliver-0.006
give-0.006
 Amen-0.006
Token Sarah
Token ID10490
Feature activation+4.87e-02

Pos logit contributions
 harm+0.098
give+0.095
 exchange+0.094
 Amen+0.094
oc+0.093
Neg logit contributions
ashing-0.082
tub-0.078
gan-0.067
angers-0.066
pit-0.065
Token,
Token ID11
Feature activation-4.72e-02

Pos logit contributions
 harm+0.065
 respect+0.061
 exchange+0.061
 letter+0.060
give+0.059
Neg logit contributions
ashing-0.106
tub-0.105
angers-0.096
pit-0.094
cast-0.093
Token "
Token ID366
Feature activation+3.35e-02

Pos logit contributions
ashing+0.078
tub+0.076
pit+0.067
angers+0.066
cast+0.066
Neg logit contributions
 Amen-0.087
give-0.086
 harm-0.084
oc-0.084
 deliver-0.084
TokenYou
Token ID1639
Feature activation+4.38e-02

Pos logit contributions
oc+0.042
 harm+0.042
 Amen+0.042
give+0.042
 exchange+0.041
Neg logit contributions
ashing-0.074
tub-0.072
gan-0.065
angers-0.065
 Mil-0.064
Token shouldn
Token ID6584
Feature activation+1.19e-01

Pos logit contributions
oc+0.054
 harm+0.053
 exchange+0.053
give+0.051
 Amen+0.051
Neg logit contributions
ashing-0.102
tub-0.096
angers-0.089
pit-0.088
gan-0.087
Token't
Token ID470
Feature activation+5.34e-02

Pos logit contributions
oc+0.215
 exchange+0.209
 approval+0.204
 envelop+0.202
 Amen+0.202
Neg logit contributions
tub-0.175
ashing-0.161
 upon-0.159
gan-0.159
 Mil-0.152
Token climb
Token ID12080
Feature activation+3.91e-03

Pos logit contributions
 Amen+0.096
oc+0.091
izer+0.090
give+0.088
 exchange+0.087
Neg logit contributions
ashing-0.088
tub-0.079
gan-0.078
 upon-0.076
 thirsty-0.073
Token the
Token ID262
Feature activation-5.81e-02

Pos logit contributions
 harm+0.006
give+0.006
oc+0.006
 exchange+0.005
 deliver+0.005
Neg logit contributions
ashing-0.008
tub-0.008
gan-0.007
angers-0.007
pit-0.007
Token tree
Token ID5509
Feature activation-4.24e-02

Pos logit contributions
ashing+0.116
tub+0.110
gan+0.101
angers+0.099
 Mil+0.097
Neg logit contributions
 harm-0.088
oc-0.088
 exchange-0.086
 Amen-0.085
give-0.085
Token,
Token ID11
Feature activation-4.83e-02

Pos logit contributions
ashing+0.090
tub+0.085
angers+0.080
 Mil+0.079
gan+0.079
Neg logit contributions
oc-0.059
 harm-0.057
 exchange-0.056
give-0.053
 deliver-0.052
Token "
Token ID366
Feature activation+3.64e-02

Pos logit contributions
ashing+0.067
tub+0.064
pit+0.056
angers+0.056
cast+0.055
Neg logit contributions
 Amen-0.077
give-0.077
oc-0.075
 exchange-0.074
 deliver-0.074
TokenMe
Token ID5308
Feature activation+2.47e-03

Pos logit contributions
oc+0.044
give+0.043
 Amen+0.043
 harm+0.043
 exchange+0.043
Neg logit contributions
ashing-0.083
tub-0.080
angers-0.073
gan-0.072
pit-0.070
Tokengan
Token ID1030
Feature activation-6.98e-03

Pos logit contributions
give+0.004
 harm+0.004
 exchange+0.004
e+0.004
 steal+0.004
Neg logit contributions
ashing-0.004
tub-0.004
angers-0.004
cast-0.003
pit-0.003
Token,
Token ID11
Feature activation-3.15e-02

Pos logit contributions
ashing+0.013
tub+0.012
angers+0.011
cast+0.011
gan+0.010
Neg logit contributions
oc-0.012
 harm-0.012
 exchange-0.012
give-0.011
 Amen-0.011
Token you
Token ID345
Feature activation+2.77e-02

Pos logit contributions
ashing+0.065
tub+0.064
angers+0.062
ashed+0.055
cast+0.053
Neg logit contributions
 respect-0.047
 harm-0.046
give-0.046
 exchange-0.044
 fright-0.044
Token shouldn
Token ID6584
Feature activation+1.19e-01

Pos logit contributions
oc+0.033
 Amen+0.033
 harm+0.032
 exchange+0.032
give+0.032
Neg logit contributions
ashing-0.063
tub-0.061
gan-0.056
angers-0.055
 Mil-0.054
Token't
Token ID470
Feature activation+5.30e-02

Pos logit contributions
oc+0.241
rum+0.218
 approval+0.215
 Amen+0.215
 letter+0.213
Neg logit contributions
tub-0.166
gan-0.146
 Mil-0.145
 Madd-0.142
 upon-0.142
Token take
Token ID1011
Feature activation+6.68e-02

Pos logit contributions
 Amen+0.092
oc+0.090
izer+0.081
ez+0.079
 envelop+0.075
Neg logit contributions
ashing-0.096
gan-0.086
tub-0.085
 thirsty-0.083
 Jill-0.082
Token things
Token ID1243
Feature activation+1.26e-02

Pos logit contributions
 Amen+0.109
oc+0.109
 exchange+0.103
 harm+0.102
give+0.101
Neg logit contributions
ashing-0.123
tub-0.116
gan-0.107
 Mil-0.105
angers-0.105
Token that
Token ID326
Feature activation-1.94e-02

Pos logit contributions
 Amen+0.020
oc+0.020
 harm+0.019
give+0.019
 exchange+0.019
Neg logit contributions
ashing-0.024
tub-0.023
gan-0.021
angers-0.020
 thirsty-0.020
Token are
Token ID389
Feature activation+2.88e-03

Pos logit contributions
ashing+0.046
angers+0.043
tub+0.042
pit+0.040
gan+0.040
Neg logit contributions
 harm-0.024
 exchange-0.024
 steal-0.022
 prove-0.022
 respect-0.022
Token's
Token ID338
Feature activation+1.36e-02

Pos logit contributions
 Amen+0.015
 harm+0.015
give+0.014
 exchange+0.014
oc+0.014
Neg logit contributions
tub-0.019
ashing-0.019
gan-0.017
 Mil-0.017
 thirsty-0.017
Token very
Token ID845
Feature activation+2.82e-02

Pos logit contributions
oc+0.024
 Amen+0.024
 harm+0.023
 exchange+0.023
give+0.023
Neg logit contributions
ashing-0.024
tub-0.023
gan-0.020
angers-0.020
 Mil-0.020
Token rude
Token ID22066
Feature activation+3.44e-02

Pos logit contributions
oc+0.044
 harm+0.044
give+0.043
 exchange+0.043
 Amen+0.043
Neg logit contributions
ashing-0.054
tub-0.052
angers-0.046
gan-0.046
pit-0.045
Token!
Token ID0
Feature activation+3.13e-02

Pos logit contributions
 harm+0.045
oc+0.045
 exchange+0.044
give+0.044
 Amen+0.043
Neg logit contributions
ashing-0.074
tub-0.071
angers-0.066
gan-0.065
 Mil-0.063
Token You
Token ID921
Feature activation+4.00e-02

Pos logit contributions
 Amen+0.048
oc+0.048
give+0.047
 exchange+0.045
 harm+0.044
Neg logit contributions
ashing-0.063
tub-0.060
cast-0.056
pit-0.055
angers-0.053
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
 Amen+0.052
oc+0.052
 harm+0.051
 exchange+0.050
give+0.050
Neg logit contributions
ashing-0.086
tub-0.083
gan-0.076
angers-0.074
 Mil-0.074
Token't
Token ID470
Feature activation+5.30e-02

Pos logit contributions
oc+0.237
eness+0.227
 approval+0.227
rum+0.223
 Amen+0.222
Neg logit contributions
tub-0.149
 Mil-0.144
 Sandy-0.136
ulu-0.136
 O-0.135
Token eat
Token ID4483
Feature activation+4.26e-02

Pos logit contributions
 Amen+0.092
oc+0.086
izer+0.083
eness+0.075
 handcuffs+0.074
Neg logit contributions
ashing-0.093
 thirsty-0.084
 Jill-0.082
gan-0.080
 upon-0.079
Token things
Token ID1243
Feature activation+1.31e-02

Pos logit contributions
oc+0.064
 harm+0.064
 exchange+0.063
 Amen+0.063
give+0.062
Neg logit contributions
ashing-0.084
tub-0.080
gan-0.073
angers-0.072
 Mil-0.070
Token without
Token ID1231
Feature activation+1.86e-03

Pos logit contributions
oc+0.020
 Amen+0.020
 harm+0.020
 exchange+0.020
give+0.020
Neg logit contributions
ashing-0.025
tub-0.024
gan-0.022
angers-0.022
 Mil-0.021
Token asking
Token ID4737
Feature activation+2.82e-02

Pos logit contributions
oc+0.003
 Amen+0.003
 exchange+0.003
give+0.003
 harm+0.003
Neg logit contributions
ashing-0.003
tub-0.003
gan-0.003
 Mil-0.003
 thirsty-0.003
TokenNo
Token ID2949
Feature activation+5.05e-02

Pos logit contributions
 Amen+0.058
oc+0.056
 harm+0.055
rum+0.053
 letter+0.052
Neg logit contributions
ashing-0.087
tub-0.082
gan-0.077
 upon-0.074
 Mil-0.074
Token,
Token ID11
Feature activation-2.64e-02

Pos logit contributions
oc+0.093
 Amen+0.093
 harm+0.092
 exchange+0.091
give+0.091
Neg logit contributions
ashing-0.083
tub-0.078
gan-0.070
angers-0.068
 Mil-0.067
Token NO
Token ID8005
Feature activation-1.64e-03

Pos logit contributions
ashing+0.043
tub+0.040
angers+0.036
gan+0.035
pit+0.034
Neg logit contributions
 harm-0.049
oc-0.049
 exchange-0.049
 Amen-0.049
give-0.048
Token!
Token ID0
Feature activation+2.70e-02

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
gan+0.003
 Mil+0.003
Neg logit contributions
give-0.002
 harm-0.002
 exchange-0.002
oc-0.002
e-0.002
Token You
Token ID921
Feature activation+3.79e-02

Pos logit contributions
give+0.045
oc+0.045
 Amen+0.044
 exchange+0.044
 harm+0.043
Neg logit contributions
ashing-0.049
tub-0.046
cast-0.041
pit-0.041
angers-0.041
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
oc+0.055
 harm+0.054
 Amen+0.054
 exchange+0.054
give+0.053
Neg logit contributions
ashing-0.077
tub-0.073
gan-0.066
angers-0.066
 Mil-0.064
Token't
Token ID470
Feature activation+5.43e-02

Pos logit contributions
oc+0.251
 Amen+0.227
rum+0.222
 approval+0.214
 letter+0.214
Neg logit contributions
tub-0.162
 Mil-0.144
gan-0.138
 Madd-0.135
 Jill-0.134
Token mix
Token ID5022
Feature activation+6.13e-03

Pos logit contributions
 Amen+0.091
oc+0.087
izer+0.082
ez+0.074
 harm+0.074
Neg logit contributions
ashing-0.101
tub-0.096
 Jill-0.089
gan-0.088
 Suz-0.085
Token them
Token ID606
Feature activation-1.42e-02

Pos logit contributions
 harm+0.009
 exchange+0.009
give+0.009
oc+0.009
 respect+0.008
Neg logit contributions
ashing-0.013
tub-0.012
angers-0.011
gan-0.011
ashed-0.010
Token!"
Token ID2474
Feature activation-3.93e-03

Pos logit contributions
ashing+0.031
tub+0.029
angers+0.027
 Mil+0.027
gan+0.027
Neg logit contributions
 exchange-0.018
 harm-0.018
oc-0.018
give-0.017
 deliver-0.017
Token\n
Token ID198
Feature activation+4.26e-02

Pos logit contributions
ashing+0.007
tub+0.006
angers+0.006
gan+0.006
pit+0.006
Neg logit contributions
 exchange-0.007
give-0.007
oc-0.007
 harm-0.006
 Amen-0.006
Token you
Token ID345
Feature activation+3.11e-02

Pos logit contributions
 harm+0.046
 Amen+0.046
 exchange+0.046
give+0.046
oc+0.045
Neg logit contributions
ashing-0.068
tub-0.064
angers-0.058
pit-0.057
gan-0.056
Token do
Token ID466
Feature activation+5.83e-02

Pos logit contributions
 Amen+0.047
oc+0.044
give+0.040
rum+0.039
 exchange+0.039
Neg logit contributions
ashing-0.062
tub-0.060
 thirsty-0.058
 upon-0.054
gan-0.053
Token that
Token ID326
Feature activation+2.93e-02

Pos logit contributions
oc+0.106
 Amen+0.105
 exchange+0.099
 harm+0.097
rum+0.096
Neg logit contributions
tub-0.093
ashing-0.092
 thirsty-0.082
 Mil-0.080
 bored-0.080
Token?
Token ID30
Feature activation+2.33e-02

Pos logit contributions
oc+0.050
 Amen+0.049
 exchange+0.048
give+0.047
e+0.046
Neg logit contributions
tub-0.051
ashing-0.049
 thirsty-0.045
 bored-0.044
gan-0.043
Token You
Token ID921
Feature activation+3.75e-02

Pos logit contributions
 harm+0.042
oc+0.042
 Amen+0.042
 exchange+0.041
give+0.041
Neg logit contributions
ashing-0.038
tub-0.037
gan-0.033
angers-0.032
 Mil-0.031
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
 Amen+0.059
oc+0.058
 harm+0.057
give+0.056
 exchange+0.056
Neg logit contributions
ashing-0.071
tub-0.068
gan-0.062
 thirsty-0.059
 upon-0.059
Token't
Token ID470
Feature activation+5.39e-02

Pos logit contributions
 approval+0.235
oc+0.233
eness+0.232
rum+0.226
 respect+0.221
Neg logit contributions
tub-0.147
 Mil-0.140
 O-0.131
 thirsty-0.123
 L-0.120
Token disturb
Token ID17037
Feature activation+3.31e-02

Pos logit contributions
 Amen+0.090
oc+0.087
izer+0.083
 handcuffs+0.081
 envelop+0.080
Neg logit contributions
ashing-0.093
tub-0.086
 thirsty-0.082
 upon-0.079
 O-0.078
Token things
Token ID1243
Feature activation+8.97e-03

Pos logit contributions
 Amen+0.061
oc+0.060
 exchange+0.056
izer+0.054
 deliver+0.054
Neg logit contributions
ashing-0.052
tub-0.051
gan-0.048
 Mil-0.047
 bored-0.045
Token that
Token ID326
Feature activation-1.96e-02

Pos logit contributions
 Amen+0.014
oc+0.014
give+0.013
 harm+0.013
 exchange+0.013
Neg logit contributions
ashing-0.017
tub-0.017
gan-0.015
angers-0.015
 thirsty-0.014
Token don
Token ID836
Feature activation+3.67e-02

Pos logit contributions
ashing+0.042
tub+0.039
angers+0.037
gan+0.036
pit+0.035
Neg logit contributions
 harm-0.028
oc-0.027
 exchange-0.027
give-0.027
 Amen-0.026
Token Billy
Token ID15890
Feature activation+6.59e-02

Pos logit contributions
oc+0.090
 Amen+0.088
 harm+0.083
 owe+0.081
give+0.081
Neg logit contributions
ashing-0.119
tub-0.111
gan-0.108
 Mil-0.104
angers-0.103
Token and
Token ID290
Feature activation+5.07e-03

Pos logit contributions
 exchange+0.100
oc+0.100
 harm+0.098
 deliver+0.094
give+0.093
Neg logit contributions
ashing-0.128
tub-0.125
angers-0.114
gan-0.111
 Mil-0.111
Token told
Token ID1297
Feature activation+6.08e-02

Pos logit contributions
oc+0.009
 Amen+0.009
 harm+0.008
give+0.008
 exchange+0.008
Neg logit contributions
ashing-0.009
tub-0.008
gan-0.008
angers-0.007
 Mil-0.007
Token him
Token ID683
Feature activation+2.85e-02

Pos logit contributions
oc+0.087
 Amen+0.084
 exchange+0.078
give+0.077
 harm+0.075
Neg logit contributions
ashing-0.123
tub-0.115
gan-0.108
angers-0.107
cast-0.107
Token he
Token ID339
Feature activation+1.05e-02

Pos logit contributions
 harm+0.040
 respect+0.039
 exchange+0.038
give+0.038
 defeat+0.038
Neg logit contributions
ashing-0.060
tub-0.057
angers-0.052
pit-0.052
gan-0.051
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
 harm+0.018
oc+0.018
 Amen+0.018
give+0.018
 exchange+0.018
Neg logit contributions
ashing-0.019
tub-0.018
angers-0.016
gan-0.016
 Mil-0.015
Token't
Token ID470
Feature activation+5.42e-02

Pos logit contributions
oc+0.233
rum+0.212
 letter+0.211
 harm+0.205
 approval+0.204
Neg logit contributions
tub-0.173
 upon-0.158
ashing-0.143
 Mil-0.143
ulu-0.137
Token have
Token ID423
Feature activation+5.15e-02

Pos logit contributions
 Amen+0.090
oc+0.090
izer+0.085
',"+0.079
give+0.078
Neg logit contributions
ashing-0.099
tub-0.092
 upon-0.084
 Jill-0.084
 thirsty-0.084
Token it
Token ID340
Feature activation-2.86e-02

Pos logit contributions
oc+0.083
 Amen+0.082
 harm+0.078
 exchange+0.077
give+0.077
Neg logit contributions
ashing-0.098
tub-0.092
gan-0.085
angers-0.083
 Mil-0.082
Token.
Token ID13
Feature activation+1.77e-02

Pos logit contributions
ashing+0.060
tub+0.058
 Mil+0.054
angers+0.054
gan+0.053
Neg logit contributions
 harm-0.039
 exchange-0.037
give-0.037
oc-0.035
 deliver-0.035
Token
Token ID220
Feature activation-3.44e-02

Pos logit contributions
oc+0.029
 Amen+0.029
 exchange+0.029
give+0.028
 harm+0.028
Neg logit contributions
ashing-0.033
tub-0.031
gan-0.028
angers-0.028
 Mil-0.027
Token "
Token ID366
Feature activation+4.11e-02

Pos logit contributions
ashing+0.063
tub+0.060
angers+0.054
pit+0.053
gan+0.051
Neg logit contributions
 Amen-0.067
give-0.064
oc-0.064
 deliver-0.062
 exchange-0.062
TokenFred
Token ID30847
Feature activation+1.34e-02

Pos logit contributions
 Amen+0.062
 harm+0.060
 exchange+0.059
 deliver+0.059
oc+0.058
Neg logit contributions
ashing-0.081
tub-0.074
gan-0.072
 Mil-0.069
pit-0.068
Token,
Token ID11
Feature activation-3.79e-02

Pos logit contributions
oc+0.022
 harm+0.021
 exchange+0.021
 Amen+0.020
give+0.020
Neg logit contributions
ashing-0.026
tub-0.025
angers-0.023
gan-0.021
 Mil-0.021
Token maybe
Token ID3863
Feature activation+2.82e-02

Pos logit contributions
ashing+0.084
tub+0.080
angers+0.077
gan+0.072
pit+0.071
Neg logit contributions
 harm-0.050
give-0.049
oc-0.049
 exchange-0.048
 respect-0.048
Token you
Token ID345
Feature activation+2.28e-02

Pos logit contributions
 harm+0.046
oc+0.045
 exchange+0.045
give+0.045
 Amen+0.045
Neg logit contributions
ashing-0.054
tub-0.050
angers-0.045
gan-0.044
pit-0.043
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
 Amen+0.037
oc+0.036
give+0.035
 exchange+0.035
 harm+0.035
Neg logit contributions
ashing-0.042
tub-0.040
gan-0.036
 upon-0.035
 thirsty-0.035
Token't
Token ID470
Feature activation+5.50e-02

Pos logit contributions
oc+0.236
 envelop+0.227
 exchange+0.219
rum+0.218
 Amen+0.216
Neg logit contributions
tub-0.170
 Mil-0.145
ashing-0.137
 O-0.137
 upon-0.134
Token be
Token ID307
Feature activation-4.83e-03

Pos logit contributions
 Amen+0.093
oc+0.085
 envelop+0.081
 handcuffs+0.078
 exchange+0.078
Neg logit contributions
ashing-0.109
tub-0.100
 thirsty-0.090
 bored-0.087
 upon-0.087
Token blowing
Token ID19280
Feature activation+4.15e-02

Pos logit contributions
ashing+0.009
tub+0.009
angers+0.008
gan+0.008
pit+0.008
Neg logit contributions
 harm-0.008
give-0.007
oc-0.007
 exchange-0.007
 Amen-0.007
Token that
Token ID326
Feature activation+8.85e-03

Pos logit contributions
 Amen+0.068
oc+0.067
 exchange+0.066
 harm+0.065
give+0.065
Neg logit contributions
ashing-0.076
tub-0.073
gan-0.065
angers-0.064
 Mil-0.063
Token loud
Token ID7812
Feature activation+4.14e-02

Pos logit contributions
give+0.012
 Amen+0.012
 exchange+0.012
oc+0.011
 harm+0.011
Neg logit contributions
ashing-0.018
tub-0.018
gan-0.016
 thirsty-0.016
 Mil-0.016
Token he
Token ID339
Feature activation-7.45e-03

Pos logit contributions
ashing+0.027
tub+0.024
angers+0.023
gan+0.021
ancer+0.021
Neg logit contributions
 exchange-0.026
oc-0.025
 Amen-0.025
 deliver-0.025
 shave-0.025
Token said
Token ID531
Feature activation+6.21e-03

Pos logit contributions
ashing+0.016
tub+0.015
angers+0.014
pit+0.014
cast+0.013
Neg logit contributions
oc-0.010
 Amen-0.010
 exchange-0.010
give-0.010
 harm-0.010
Token,
Token ID11
Feature activation-4.94e-02

Pos logit contributions
 Amen+0.007
 harm+0.007
oc+0.007
 deliver+0.007
 exchange+0.007
Neg logit contributions
ashing-0.015
tub-0.014
angers-0.013
pit-0.013
cast-0.013
Token "
Token ID366
Feature activation+2.65e-02

Pos logit contributions
ashing+0.083
tub+0.077
angers+0.073
pit+0.067
cast+0.067
Neg logit contributions
 Amen-0.090
give-0.089
 deliver-0.088
 exchange-0.085
 harm-0.084
TokenYou
Token ID1639
Feature activation+3.59e-02

Pos logit contributions
oc+0.032
give+0.032
 harm+0.032
 Amen+0.031
 exchange+0.031
Neg logit contributions
ashing-0.060
tub-0.058
angers-0.053
gan-0.053
pit-0.051
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
oc+0.045
 harm+0.045
 exchange+0.044
 Amen+0.044
give+0.044
Neg logit contributions
ashing-0.081
tub-0.077
angers-0.071
gan-0.070
pit-0.069
Token't
Token ID470
Feature activation+5.23e-02

Pos logit contributions
oc+0.228
rum+0.211
 exchange+0.203
 approval+0.202
 Amen+0.201
Neg logit contributions
tub-0.179
 Mil-0.160
 Madd-0.156
gan-0.152
 upon-0.151
Token have
Token ID423
Feature activation+5.43e-02

Pos logit contributions
 Amen+0.084
oc+0.081
izer+0.079
ez+0.072
give+0.072
Neg logit contributions
ashing-0.097
tub-0.090
 thirsty-0.087
 upon-0.087
gan-0.086
Token taken
Token ID2077
Feature activation+4.99e-02

Pos logit contributions
oc+0.088
 Amen+0.088
 harm+0.080
 exchange+0.080
give+0.079
Neg logit contributions
ashing-0.102
tub-0.096
gan-0.090
 thirsty-0.087
 Mil-0.087
Token it
Token ID340
Feature activation-3.80e-02

Pos logit contributions
 Amen+0.081
oc+0.078
 exchange+0.072
 harm+0.072
 deliver+0.071
Neg logit contributions
ashing-0.092
tub-0.087
gan-0.082
 Mil-0.081
 upon-0.080
Token!
Token ID0
Feature activation+2.57e-02

Pos logit contributions
ashing+0.081
tub+0.077
angers+0.071
 Mil+0.071
gan+0.071
Neg logit contributions
 harm-0.051
 exchange-0.050
give-0.049
oc-0.049
 deliver-0.047
Token and
Token ID290
Feature activation+5.08e-03

Pos logit contributions
oc+0.095
 exchange+0.094
 harm+0.093
 deliver+0.091
 defeat+0.089
Neg logit contributions
ashing-0.102
tub-0.098
angers-0.090
cast-0.086
gan-0.086
Token said
Token ID531
Feature activation+3.41e-02

Pos logit contributions
 Amen+0.009
oc+0.009
 harm+0.009
give+0.009
 exchange+0.009
Neg logit contributions
ashing-0.008
tub-0.008
gan-0.007
 Mil-0.007
angers-0.007
Token,
Token ID11
Feature activation-3.05e-02

Pos logit contributions
 Amen+0.057
oc+0.055
 harm+0.055
give+0.055
 respect+0.055
Neg logit contributions
ashing-0.063
tub-0.059
pit-0.054
angers-0.053
gan-0.053
Token "
Token ID366
Feature activation+4.46e-02

Pos logit contributions
ashing+0.049
tub+0.046
pit+0.041
gan+0.041
angers+0.040
Neg logit contributions
 Amen-0.057
oc-0.057
give-0.056
 harm-0.055
 deliver-0.055
TokenYou
Token ID1639
Feature activation+4.14e-02

Pos logit contributions
oc+0.068
 Amen+0.067
 harm+0.065
 exchange+0.065
 deliver+0.064
Neg logit contributions
ashing-0.083
tub-0.079
gan-0.072
 upon-0.071
 Mil-0.069
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
oc+0.058
 Amen+0.058
 harm+0.057
 exchange+0.056
give+0.056
Neg logit contributions
ashing-0.085
tub-0.082
gan-0.075
angers-0.073
 thirsty-0.073
Token't
Token ID470
Feature activation+5.18e-02

Pos logit contributions
oc+0.222
rum+0.209
 approval+0.207
 letter+0.203
 exchange+0.202
Neg logit contributions
tub-0.173
 Madd-0.148
 Mil-0.147
 upon-0.145
gan-0.144
Token take
Token ID1011
Feature activation+6.40e-02

Pos logit contributions
 Amen+0.081
oc+0.079
izer+0.079
 document+0.072
',"+0.071
Neg logit contributions
ashing-0.092
tub-0.087
 thirsty-0.081
 upon-0.080
 Jill-0.079
Token things
Token ID1243
Feature activation+1.33e-02

Pos logit contributions
 Amen+0.116
oc+0.114
 exchange+0.104
rum+0.103
 shave+0.102
Neg logit contributions
ashing-0.103
tub-0.094
 bored-0.091
gan-0.089
 thirsty-0.088
Token that
Token ID326
Feature activation-2.06e-02

Pos logit contributions
 Amen+0.022
oc+0.022
give+0.021
 harm+0.021
 exchange+0.021
Neg logit contributions
ashing-0.025
tub-0.023
gan-0.021
angers-0.020
 thirsty-0.020
Token don
Token ID836
Feature activation+3.35e-02

Pos logit contributions
ashing+0.046
tub+0.042
angers+0.041
gan+0.040
pit+0.039
Neg logit contributions
 exchange-0.028
 harm-0.028
oc-0.027
 deliver-0.026
give-0.026
TokenSorry
Token ID14385
Feature activation+3.43e-02

Pos logit contributions
 harm+0.057
oc+0.057
 Amen+0.056
 exchange+0.055
give+0.054
Neg logit contributions
ashing-0.089
tub-0.084
gan-0.079
 Mil-0.077
angers-0.075
Token John
Token ID1757
Feature activation+3.01e-02

Pos logit contributions
 respect+0.052
 harm+0.051
give+0.050
 exchange+0.049
oc+0.049
Neg logit contributions
ashing-0.069
tub-0.067
angers-0.062
ashed-0.059
 thirsty-0.056
Token,
Token ID11
Feature activation-2.70e-02

Pos logit contributions
oc+0.050
 harm+0.049
 exchange+0.048
 Amen+0.048
give+0.048
Neg logit contributions
ashing-0.056
tub-0.053
angers-0.049
gan-0.046
cast-0.046
Token but
Token ID475
Feature activation+2.38e-02

Pos logit contributions
ashing+0.049
tub+0.046
angers+0.042
gan+0.040
ashed+0.040
Neg logit contributions
 harm-0.046
oc-0.046
 exchange-0.046
give-0.046
 Amen-0.045
Token you
Token ID345
Feature activation+2.52e-02

Pos logit contributions
 harm+0.036
 exchange+0.036
give+0.035
 respect+0.034
 deliver+0.034
Neg logit contributions
ashing-0.051
tub-0.047
angers-0.045
ashed-0.042
pit-0.041
Token shouldn
Token ID6584
Feature activation+1.18e-01

Pos logit contributions
oc+0.030
 Amen+0.029
 harm+0.029
 exchange+0.029
give+0.029
Neg logit contributions
ashing-0.058
tub-0.056
gan-0.051
angers-0.051
 Mil-0.050
Token't
Token ID470
Feature activation+5.38e-02

Pos logit contributions
oc+0.217
rum+0.201
 harm+0.198
 approval+0.197
 letter+0.196
Neg logit contributions
tub-0.182
gan-0.161
ashing-0.160
 Mil-0.159
 upon-0.154
Token have
Token ID423
Feature activation+5.74e-02

Pos logit contributions
 Amen+0.084
izer+0.081
oc+0.080
 envelop+0.074
ez+0.073
Neg logit contributions
ashing-0.104
tub-0.093
gan-0.092
 thirsty-0.090
 upon-0.086
Token taken
Token ID2077
Feature activation+5.05e-02

Pos logit contributions
oc+0.107
 Amen+0.105
izer+0.095
rum+0.094
 harm+0.093
Neg logit contributions
ashing-0.097
tub-0.088
gan-0.087
 thirsty-0.084
 Mil-0.081
Token the
Token ID262
Feature activation-1.98e-02

Pos logit contributions
 Amen+0.086
oc+0.084
 exchange+0.077
 harm+0.076
 deliver+0.076
Neg logit contributions
ashing-0.090
tub-0.084
gan-0.080
 Mil-0.077
 upon-0.076
Token soup
Token ID17141
Feature activation+4.08e-04

Pos logit contributions
ashing+0.041
tub+0.037
angers+0.036
gan+0.035
 Mil+0.034
Neg logit contributions
 harm-0.030
 Amen-0.029
oc-0.029
 exchange-0.028
give-0.028
Token,
Token ID11
Feature activation-3.54e-02

Pos logit contributions
oc+0.082
 exchange+0.081
 harm+0.081
give+0.078
 letter+0.077
Neg logit contributions
ashing-0.087
tub-0.085
angers-0.080
gan-0.074
pit-0.073
Token be
Token ID307
Feature activation+7.55e-03

Pos logit contributions
ashing+0.080
tub+0.075
angers+0.071
gan+0.069
pit+0.068
Neg logit contributions
 exchange-0.046
 harm-0.045
 deliver-0.045
give-0.045
 respect-0.045
Token serious
Token ID2726
Feature activation+3.60e-02

Pos logit contributions
oc+0.016
 harm+0.016
 Amen+0.016
 exchange+0.016
give+0.016
Neg logit contributions
ashing-0.010
tub-0.009
gan-0.008
angers-0.008
 Mil-0.008
Token.
Token ID13
Feature activation+3.22e-02

Pos logit contributions
 harm+0.042
give+0.040
rum+0.039
oc+0.039
 Amen+0.038
Neg logit contributions
ashing-0.079
tub-0.075
 Mil-0.073
gan-0.072
angers-0.071
Token You
Token ID921
Feature activation+3.96e-02

Pos logit contributions
oc+0.048
 Amen+0.048
give+0.044
 deliver+0.043
 exchange+0.041
Neg logit contributions
ashing-0.068
cast-0.064
tub-0.063
pit-0.062
ashed-0.058
Token shouldn
Token ID6584
Feature activation+1.17e-01

Pos logit contributions
oc+0.055
 harm+0.054
 Amen+0.054
 exchange+0.054
give+0.053
Neg logit contributions
ashing-0.083
tub-0.079
gan-0.072
angers-0.072
 Mil-0.070
Token't
Token ID470
Feature activation+5.35e-02

Pos logit contributions
oc+0.220
rum+0.206
 Amen+0.203
 approval+0.201
 exchange+0.199
Neg logit contributions
tub-0.182
 Mil-0.161
ashing-0.157
 upon-0.155
gan-0.153
Token play
Token ID711
Feature activation+1.45e-02

Pos logit contributions
 Amen+0.092
oc+0.089
izer+0.082
give+0.078
rum+0.078
Neg logit contributions
ashing-0.098
tub-0.092
 upon-0.084
 Jill-0.083
gan-0.082
Token with
Token ID351
Feature activation+2.93e-02

Pos logit contributions
 Amen+0.024
 exchange+0.023
oc+0.023
 harm+0.023
give+0.023
Neg logit contributions
ashing-0.026
tub-0.026
angers-0.023
gan-0.022
 Mil-0.022
Token the
Token ID262
Feature activation-1.77e-02

Pos logit contributions
 Amen+0.046
oc+0.046
 harm+0.045
 exchange+0.045
give+0.044
Neg logit contributions
ashing-0.056
tub-0.053
gan-0.048
angers-0.048
 Mil-0.047
Token door
Token ID3420
Feature activation-1.21e-02

Pos logit contributions
ashing+0.037
tub+0.035
gan+0.032
angers+0.032
 Mil+0.031
Neg logit contributions
oc-0.025
 harm-0.025
 Amen-0.024
 exchange-0.024
give-0.024
Token and
Token ID290
Feature activation-3.20e-02

Pos logit contributions
ashing+0.050
tub+0.049
 upon+0.045
 Mil+0.044
gan+0.044
Neg logit contributions
 Amen-0.040
oc-0.040
give-0.039
 harm-0.038
 exchange-0.038
Token said
Token ID531
Feature activation+1.20e-02

Pos logit contributions
ashing+0.064
tub+0.060
angers+0.055
gan+0.053
pit+0.053
Neg logit contributions
 exchange-0.050
oc-0.050
 harm-0.049
 Amen-0.048
 deliver-0.048
Token,
Token ID11
Feature activation-4.68e-02

Pos logit contributions
oc+0.020
 Amen+0.020
give+0.019
 defeat+0.019
 exchange+0.019
Neg logit contributions
ashing-0.024
tub-0.021
angers-0.019
pit-0.019
gan-0.018
Token "
Token ID366
Feature activation+3.35e-02

Pos logit contributions
ashing+0.076
tub+0.070
angers+0.062
gan+0.060
pit+0.060
Neg logit contributions
 Amen-0.092
oc-0.089
give-0.088
 exchange-0.087
 deliver-0.087
TokenYou
Token ID1639
Feature activation+3.69e-02

Pos logit contributions
oc+0.046
 harm+0.046
 Amen+0.045
 exchange+0.043
rum+0.042
Neg logit contributions
ashing-0.068
tub-0.065
gan-0.062
 Mil-0.060
 upon-0.060
Token shouldn
Token ID6584
Feature activation+1.17e-01

Pos logit contributions
 exchange+0.050
 harm+0.049
give+0.049
oc+0.049
 Amen+0.049
Neg logit contributions
ashing-0.080
tub-0.075
angers-0.070
gan-0.069
pit-0.068
Token't
Token ID470
Feature activation+5.05e-02

Pos logit contributions
oc+0.238
rum+0.205
 Amen+0.199
 exchange+0.196
ez+0.193
Neg logit contributions
tub-0.177
 Mil-0.161
gan-0.159
 Madd-0.158
 upon-0.154
Token be
Token ID307
Feature activation-2.90e-03

Pos logit contributions
oc+0.076
 Amen+0.073
izer+0.067
rum+0.062
give+0.062
Neg logit contributions
ashing-0.104
tub-0.096
gan-0.091
 upon-0.090
 thirsty-0.089
Token here
Token ID994
Feature activation-7.14e-03

Pos logit contributions
ashing+0.005
tub+0.005
angers+0.004
pit+0.004
cast+0.004
Neg logit contributions
 harm-0.005
give-0.005
 exchange-0.005
e-0.005
 deliver-0.005
Token!
Token ID0
Feature activation+2.68e-02

Pos logit contributions
ashing+0.015
tub+0.014
angers+0.013
gan+0.013
 Mil+0.012
Neg logit contributions
 harm-0.010
 exchange-0.010
oc-0.010
 Amen-0.010
give-0.010
Token This
Token ID770
Feature activation+4.94e-03

Pos logit contributions
 Amen+0.045
oc+0.045
 harm+0.044
give+0.044
 exchange+0.044
Neg logit contributions
ashing-0.049
tub-0.046
gan-0.041
angers-0.041
pit-0.040
Tokenmy
Token ID1820
Feature activation+4.85e-02

Pos logit contributions
 Amen+0.114
oc+0.111
 harm+0.109
give+0.109
 exchange+0.108
Neg logit contributions
ashing-0.075
tub-0.067
 upon-0.062
gan-0.061
pit-0.058
Token told
Token ID1297
Feature activation+6.74e-02

Pos logit contributions
 Amen+0.101
 harm+0.093
oc+0.092
 exchange+0.091
 sentence+0.091
Neg logit contributions
ashing-0.070
tub-0.065
 upon-0.058
ashed-0.057
 thirsty-0.057
Token her
Token ID607
Feature activation+2.55e-02

Pos logit contributions
 harm+0.095
 Amen+0.094
give+0.093
 exchange+0.093
oc+0.093
Neg logit contributions
ashing-0.141
tub-0.135
gan-0.122
angers-0.121
pit-0.119
Token that
Token ID326
Feature activation+2.34e-02

Pos logit contributions
 harm+0.035
give+0.034
 Amen+0.034
 letter+0.034
 exchange+0.033
Neg logit contributions
ashing-0.055
tub-0.051
pit-0.047
angers-0.047
gan-0.046
Token she
Token ID673
Feature activation+9.77e-03

Pos logit contributions
 harm+0.040
 exchange+0.039
give+0.039
oc+0.038
 Amen+0.037
Neg logit contributions
ashing-0.045
tub-0.040
gan-0.038
angers-0.037
pit-0.036
Token shouldn
Token ID6584
Feature activation+1.17e-01

Pos logit contributions
 Amen+0.020
oc+0.020
 exchange+0.019
 harm+0.019
give+0.019
Neg logit contributions
ashing-0.014
tub-0.013
gan-0.012
 thirsty-0.011
angers-0.011
Token't
Token ID470
Feature activation+6.01e-02

Pos logit contributions
oc+0.221
rum+0.206
 Amen+0.205
 letter+0.202
 exchange+0.200
Neg logit contributions
tub-0.182
 upon-0.164
ashing-0.158
 Mil-0.151
angers-0.147
Token touch
Token ID3638
Feature activation+4.18e-02

Pos logit contributions
 Amen+0.109
oc+0.105
izer+0.099
ez+0.089
rum+0.089
Neg logit contributions
ashing-0.105
 Jill-0.090
tub-0.090
 upon-0.088
 raining-0.088
Token the
Token ID262
Feature activation-2.02e-02

Pos logit contributions
 Amen+0.081
oc+0.078
give+0.074
 exchange+0.073
 harm+0.072
Neg logit contributions
tub-0.060
ashing-0.060
 bored-0.056
 Mil-0.054
gan-0.053
Token newspaper
Token ID7533
Feature activation+3.78e-02

Pos logit contributions
ashing+0.038
 bored+0.036
tub+0.036
 upon+0.036
 thirsty+0.034
Neg logit contributions
 Amen-0.032
rum-0.030
oc-0.028
give-0.026
 shave-0.026
Token because
Token ID780
Feature activation+1.41e-02

Pos logit contributions
oc+0.056
 harm+0.056
 Amen+0.055
 exchange+0.055
give+0.054
Neg logit contributions
ashing-0.075
tub-0.071
gan-0.065
angers-0.064
 Mil-0.063
Token,
Token ID11
Feature activation-5.75e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.019
Neg logit contributions
ashing-0.019
tub-0.018
gan-0.016
angers-0.016
 Mil-0.015
Token Tim
Token ID5045
Feature activation+8.78e-03

Pos logit contributions
ashing+0.092
tub+0.080
angers+0.076
cast+0.067
pit+0.067
Neg logit contributions
 exchange-0.114
oc-0.114
 harm-0.113
 deliver-0.112
 steal-0.112
Tokenmy
Token ID1820
Feature activation+3.17e-02

Pos logit contributions
give+0.017
 Amen+0.017
oc+0.017
 harm+0.016
 exchange+0.016
Neg logit contributions
ashing-0.014
tub-0.013
gan-0.011
 upon-0.011
pit-0.011
Token!
Token ID0
Feature activation+3.02e-02

Pos logit contributions
 Amen+0.039
give+0.038
 exchange+0.038
 harm+0.038
oc+0.038
Neg logit contributions
ashing-0.071
tub-0.067
gan-0.062
angers-0.061
 Mil-0.060
Token We
Token ID775
Feature activation+4.91e-02

Pos logit contributions
oc+0.045
 Amen+0.044
give+0.042
 exchange+0.042
 deliver+0.040
Neg logit contributions
ashing-0.062
tub-0.059
cast-0.057
angers-0.054
pit-0.054
Token shouldn
Token ID6584
Feature activation+1.17e-01

Pos logit contributions
 Amen+0.084
oc+0.082
 harm+0.080
give+0.080
 exchange+0.078
Neg logit contributions
ashing-0.087
tub-0.082
 upon-0.073
gan-0.072
 thirsty-0.072
Token't
Token ID470
Feature activation+5.50e-02

Pos logit contributions
oc+0.223
rum+0.219
 shave+0.213
 letter+0.210
 exchange+0.210
Neg logit contributions
tub-0.167
ashing-0.151
 upon-0.149
 Mil-0.145
 thirsty-0.137
Token catch
Token ID4929
Feature activation+3.12e-02

Pos logit contributions
 Amen+0.103
oc+0.092
izer+0.089
ez+0.085
give+0.085
Neg logit contributions
ashing-0.102
tub-0.085
 upon-0.083
 thirsty-0.083
gan-0.081
Token the
Token ID262
Feature activation-1.96e-02

Pos logit contributions
 Amen+0.057
oc+0.054
 exchange+0.052
 shave+0.051
rum+0.051
Neg logit contributions
ashing-0.051
tub-0.048
 bored-0.045
 thirsty-0.045
 Mil-0.044
Token butterfly
Token ID35113
Feature activation+1.81e-02

Pos logit contributions
ashing+0.042
tub+0.041
gan+0.037
angers+0.036
 thirsty+0.036
Neg logit contributions
oc-0.027
 Amen-0.026
 exchange-0.025
 harm-0.025
give-0.025
Token.
Token ID13
Feature activation+2.52e-02

Pos logit contributions
oc+0.022
 harm+0.022
 Amen+0.022
 exchange+0.021
give+0.021
Neg logit contributions
ashing-0.041
tub-0.039
gan-0.036
angers-0.036
 Mil-0.035
Token's
Token ID338
Feature activation+1.28e-02

Pos logit contributions
oc+0.013
 harm+0.013
 Amen+0.013
 exchange+0.013
give+0.013
Neg logit contributions
ashing-0.021
tub-0.020
angers-0.019
gan-0.019
 Mil-0.018
Token not
Token ID407
Feature activation+6.72e-02

Pos logit contributions
 harm+0.023
oc+0.023
 Amen+0.023
 exchange+0.023
give+0.023
Neg logit contributions
ashing-0.021
tub-0.020
angers-0.018
gan-0.018
 Mil-0.017
Token nice
Token ID3621
Feature activation+3.06e-02

Pos logit contributions
 harm+0.108
 exchange+0.105
give+0.103
 deliver+0.101
oc+0.100
Neg logit contributions
ashing-0.127
tub-0.122
angers-0.115
gan-0.110
pit-0.110
Token.
Token ID13
Feature activation+3.26e-02

Pos logit contributions
oc+0.045
 Amen+0.044
 harm+0.044
 exchange+0.044
give+0.043
Neg logit contributions
ashing-0.062
tub-0.059
gan-0.054
angers-0.053
 Mil-0.052
Token You
Token ID921
Feature activation+4.16e-02

Pos logit contributions
oc+0.056
 Amen+0.056
 harm+0.055
give+0.054
 exchange+0.054
Neg logit contributions
ashing-0.058
tub-0.055
gan-0.049
angers-0.049
pit-0.048
Token shouldn
Token ID6584
Feature activation+1.17e-01

Pos logit contributions
oc+0.061
 harm+0.061
 Amen+0.060
 exchange+0.060
give+0.059
Neg logit contributions
ashing-0.084
tub-0.080
gan-0.072
angers-0.072
 Mil-0.070
Token't
Token ID470
Feature activation+5.22e-02

Pos logit contributions
oc+0.244
rum+0.227
 envelop+0.225
 letter+0.225
 approval+0.222
Neg logit contributions
tub-0.159
ashing-0.143
 Madd-0.133
 Mil-0.132
 upon-0.127
Token throw
Token ID3714
Feature activation+3.02e-02

Pos logit contributions
 Amen+0.088
oc+0.087
izer+0.081
 envelop+0.078
 letter+0.078
Neg logit contributions
ashing-0.101
tub-0.089
 thirsty-0.081
 upon-0.080
ashed-0.079
Token mud
Token ID17492
Feature activation-1.93e-02

Pos logit contributions
 Amen+0.057
oc+0.052
 exchange+0.050
 deliver+0.050
give+0.050
Neg logit contributions
ashing-0.052
tub-0.048
 thirsty-0.042
gan-0.042
 bored-0.042
Token at
Token ID379
Feature activation+9.68e-03

Pos logit contributions
ashing+0.038
tub+0.036
gan+0.033
angers+0.032
 Mil+0.032
Neg logit contributions
oc-0.029
 harm-0.029
 Amen-0.028
give-0.028
 exchange-0.028
Token the
Token ID262
Feature activation-4.17e-02

Pos logit contributions
 Amen+0.015
oc+0.015
 harm+0.014
 exchange+0.014
give+0.014
Neg logit contributions
ashing-0.019
tub-0.018
gan-0.016
angers-0.016
 thirsty-0.016
Token and
Token ID290
Feature activation-1.29e-02

Pos logit contributions
ashing+0.036
tub+0.034
angers+0.032
gan+0.032
 Mil+0.031
Neg logit contributions
 harm-0.021
 Amen-0.021
oc-0.021
 exchange-0.021
give-0.020
Token clean
Token ID3424
Feature activation-1.51e-02

Pos logit contributions
ashing+0.023
tub+0.021
angers+0.020
pit+0.019
gan+0.019
Neg logit contributions
 Amen-0.023
 exchange-0.022
 deliver-0.022
 harm-0.022
oc-0.021
Token again
Token ID757
Feature activation-6.81e-02

Pos logit contributions
ashing+0.031
tub+0.030
gan+0.027
angers+0.027
 Mil+0.026
Neg logit contributions
oc-0.021
 Amen-0.021
 harm-0.021
 exchange-0.021
give-0.021
Token.
Token ID13
Feature activation-1.41e-02

Pos logit contributions
ashing+0.142
tub+0.136
gan+0.123
angers+0.121
 Mil+0.121
Neg logit contributions
 Amen-0.093
oc-0.091
give-0.090
 harm-0.090
 exchange-0.089
Token<|endoftext|>
Token ID50256
Feature activation-3.80e-02

Pos logit contributions
ashing+0.029
tub+0.027
angers+0.026
cast+0.025
pit+0.025
Neg logit contributions
 Amen-0.022
oc-0.022
 exchange-0.021
give-0.019
 deliver-0.019
TokenOnce
Token ID7454
Feature activation-1.23e-01

Pos logit contributions
ashing+0.069
tub+0.065
gan+0.059
angers+0.058
 Mil+0.056
Neg logit contributions
oc-0.063
 Amen-0.063
 harm-0.062
 exchange-0.062
give-0.062
Token upon
Token ID2402
Feature activation-3.68e-02

Pos logit contributions
ashing+0.150
angers+0.149
drops+0.147
tub+0.136
pit+0.125
Neg logit contributions
 exchange-0.254
 harm-0.239
 letter-0.239
 Amen-0.233
 return-0.227
Token a
Token ID257
Feature activation+1.20e-02

Pos logit contributions
ashing+0.067
tub+0.065
gan+0.058
 Mil+0.056
angers+0.056
Neg logit contributions
oc-0.060
 harm-0.059
 Amen-0.059
 exchange-0.058
give-0.058
Token time
Token ID640
Feature activation-9.08e-02

Pos logit contributions
oc+0.016
 exchange+0.015
 harm+0.015
 Amen+0.015
give+0.015
Neg logit contributions
ashing-0.025
tub-0.025
angers-0.023
gan-0.023
 Mil-0.022
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.223
angers+0.216
tub+0.212
ashed+0.199
 prett+0.197
Neg logit contributions
e-0.083
oc-0.077
 exchange-0.074
rum-0.065
 return-0.064
Token in
Token ID287
Feature activation-2.54e-02

Pos logit contributions
ashing+0.175
angers+0.173
tub+0.163
cast+0.153
ways+0.150
Neg logit contributions
e-0.143
 exchange-0.142
 letter-0.138
 Amen-0.137
oc-0.133
Token passed
Token ID3804
Feature activation-1.15e-02

Pos logit contributions
oc+0.036
 Amen+0.036
 harm+0.035
 exchange+0.035
give+0.034
Neg logit contributions
ashing-0.044
tub-0.042
gan-0.038
angers-0.037
 Mil-0.036
Token the
Token ID262
Feature activation-5.07e-02

Pos logit contributions
ashing+0.023
tub+0.022
gan+0.020
angers+0.020
 Mil+0.019
Neg logit contributions
 harm-0.017
oc-0.017
 exchange-0.017
give-0.017
 Amen-0.016
Token bridge
Token ID7696
Feature activation-5.48e-02

Pos logit contributions
ashing+0.100
tub+0.092
angers+0.086
gan+0.086
 Mil+0.083
Neg logit contributions
 harm-0.080
oc-0.077
 exchange-0.077
give-0.076
 Amen-0.076
Token.
Token ID13
Feature activation-1.34e-02

Pos logit contributions
ashing+0.117
tub+0.111
gan+0.105
angers+0.104
 Mil+0.103
Neg logit contributions
 harm-0.073
 exchange-0.072
oc-0.072
give-0.069
 deliver-0.068
Token<|endoftext|>
Token ID50256
Feature activation-3.63e-02

Pos logit contributions
ashing+0.027
tub+0.026
angers+0.023
cast+0.023
pit+0.023
Neg logit contributions
 Amen-0.021
oc-0.021
 exchange-0.020
give-0.019
 deliver-0.018
TokenOnce
Token ID7454
Feature activation-1.23e-01

Pos logit contributions
ashing+0.065
tub+0.060
gan+0.055
angers+0.054
 Mil+0.053
Neg logit contributions
oc-0.062
 exchange-0.061
 Amen-0.061
 harm-0.060
give-0.060
Token upon
Token ID2402
Feature activation-3.69e-02

Pos logit contributions
ashing+0.154
angers+0.149
drops+0.143
tub+0.140
pit+0.125
Neg logit contributions
 exchange-0.257
 letter-0.240
 harm-0.239
 Amen-0.236
oc-0.232
Token a
Token ID257
Feature activation+1.02e-02

Pos logit contributions
ashing+0.064
tub+0.062
gan+0.056
 upon+0.054
 thirsty+0.054
Neg logit contributions
oc-0.063
 harm-0.062
 Amen-0.062
 exchange-0.061
give-0.061
Token time
Token ID640
Feature activation-9.09e-02

Pos logit contributions
oc+0.013
 exchange+0.013
 harm+0.013
 Amen+0.013
give+0.012
Neg logit contributions
tub-0.022
ashing-0.022
angers-0.020
gan-0.019
 Mil-0.019
Token,
Token ID11
Feature activation-9.67e-02

Pos logit contributions
ashing+0.229
angers+0.219
tub+0.213
ashed+0.208
 prett+0.200
Neg logit contributions
e-0.080
oc-0.077
 exchange-0.072
rum-0.065
 Amen-0.063
Token there
Token ID612
Feature activation-6.17e-02

Pos logit contributions
ashing+0.176
angers+0.173
tub+0.163
cast+0.155
ways+0.151
Neg logit contributions
e-0.145
 exchange-0.141
 Amen-0.140
 letter-0.136
oc-0.133
Token happily
Token ID18177
Feature activation-3.16e-02

Pos logit contributions
ashing+0.071
tub+0.067
angers+0.065
gan+0.063
pit+0.062
Neg logit contributions
 harm-0.051
 Amen-0.051
give-0.051
 exchange-0.050
e-0.050
Token ever
Token ID1683
Feature activation-1.62e-02

Pos logit contributions
ashing+0.070
tub+0.068
angers+0.063
gan+0.062
 Mil+0.061
Neg logit contributions
 harm-0.041
oc-0.040
give-0.039
 exchange-0.039
 Amen-0.039
Token after
Token ID706
Feature activation-2.09e-02

Pos logit contributions
tub+0.031
ashing+0.030
angers+0.029
gan+0.027
 Mil+0.027
Neg logit contributions
 harm-0.027
 exchange-0.025
give-0.025
oc-0.025
 Amen-0.024
Token.
Token ID13
Feature activation-1.14e-02

Pos logit contributions
ashing+0.050
tub+0.048
gan+0.045
angers+0.045
pit+0.043
Neg logit contributions
 harm-0.024
give-0.023
oc-0.023
 exchange-0.022
 Amen-0.022
Token<|endoftext|>
Token ID50256
Feature activation-3.61e-02

Pos logit contributions
ashing+0.023
tub+0.022
angers+0.020
pit+0.020
cast+0.020
Neg logit contributions
 Amen-0.017
oc-0.017
 exchange-0.017
give-0.016
 harm-0.016
TokenOnce
Token ID7454
Feature activation-1.23e-01

Pos logit contributions
ashing+0.063
tub+0.060
gan+0.053
angers+0.053
 Mil+0.052
Neg logit contributions
oc-0.062
 harm-0.062
 Amen-0.061
 exchange-0.060
give-0.060
Token upon
Token ID2402
Feature activation-3.62e-02

Pos logit contributions
ashing+0.141
angers+0.137
drops+0.129
tub+0.126
pit+0.117
Neg logit contributions
 exchange-0.265
 harm-0.253
 letter-0.252
 Amen-0.247
 return-0.238
Token a
Token ID257
Feature activation+1.27e-02

Pos logit contributions
ashing+0.063
tub+0.061
gan+0.055
 upon+0.053
 thirsty+0.053
Neg logit contributions
oc-0.062
 harm-0.061
 Amen-0.060
 exchange-0.060
give-0.059
Token time
Token ID640
Feature activation-8.88e-02

Pos logit contributions
oc+0.018
 Amen+0.018
 harm+0.018
 exchange+0.018
give+0.017
Neg logit contributions
ashing-0.026
tub-0.025
angers-0.023
gan-0.023
 Mil-0.022
Token,
Token ID11
Feature activation-9.43e-02

Pos logit contributions
ashing+0.220
angers+0.213
tub+0.208
ashed+0.198
cast+0.193
Neg logit contributions
e-0.079
oc-0.074
 exchange-0.071
 return-0.064
 bite-0.062
Token there
Token ID612
Feature activation-6.30e-02

Pos logit contributions
ashing+0.169
angers+0.164
tub+0.158
cast+0.149
 Mil+0.143
Neg logit contributions
e-0.145
 exchange-0.144
 Amen-0.142
 letter-0.140
oc-0.136
Token fun
Token ID1257
Feature activation-7.58e-03

Pos logit contributions
oc+0.077
 Amen+0.076
 harm+0.076
give+0.076
 exchange+0.075
Neg logit contributions
ashing-0.075
tub-0.072
gan-0.063
angers-0.062
 thirsty-0.061
Token time
Token ID640
Feature activation-5.29e-02

Pos logit contributions
ashing+0.012
tub+0.011
gan+0.010
angers+0.010
pit+0.010
Neg logit contributions
oc-0.014
 Amen-0.014
 harm-0.014
give-0.014
 exchange-0.014
Token outside
Token ID2354
Feature activation-8.57e-02

Pos logit contributions
tub+0.098
ashing+0.097
gan+0.093
 Mil+0.089
angers+0.088
Neg logit contributions
 harm-0.083
 exchange-0.081
give-0.080
oc-0.080
e-0.078
Token.
Token ID13
Feature activation-7.51e-03

Pos logit contributions
ashing+0.193
tub+0.180
gan+0.173
angers+0.172
 Mil+0.163
Neg logit contributions
 harm-0.106
 exchange-0.106
oc-0.101
give-0.100
 Amen-0.100
Token<|endoftext|>
Token ID50256
Feature activation-3.19e-02

Pos logit contributions
ashing+0.015
tub+0.014
angers+0.013
cast+0.013
pit+0.013
Neg logit contributions
 Amen-0.012
oc-0.011
 exchange-0.011
'.-0.010
give-0.010
TokenOnce
Token ID7454
Feature activation-1.23e-01

Pos logit contributions
ashing+0.052
tub+0.051
gan+0.044
 upon+0.044
angers+0.044
Neg logit contributions
 Amen-0.057
 harm-0.057
oc-0.056
give-0.055
 exchange-0.055
Token upon
Token ID2402
Feature activation-3.68e-02

Pos logit contributions
ashing+0.128
tub+0.120
angers+0.118
drops+0.107
pit+0.102
Neg logit contributions
 exchange-0.287
 harm-0.275
 letter-0.269
 Amen-0.266
oc-0.262
Token a
Token ID257
Feature activation+1.37e-02

Pos logit contributions
ashing+0.063
tub+0.061
gan+0.055
 upon+0.054
 thirsty+0.054
Neg logit contributions
oc-0.064
 harm-0.063
 Amen-0.062
 exchange-0.061
give-0.061
Token time
Token ID640
Feature activation-8.76e-02

Pos logit contributions
oc+0.018
 exchange+0.018
 Amen+0.017
 harm+0.017
give+0.017
Neg logit contributions
tub-0.029
ashing-0.029
angers-0.027
gan-0.026
 Mil-0.025
Token,
Token ID11
Feature activation-9.47e-02

Pos logit contributions
ashing+0.208
angers+0.200
tub+0.198
ashed+0.184
 prett+0.180
Neg logit contributions
e-0.091
oc-0.089
 exchange-0.082
 Amen-0.075
rum-0.075
Token there
Token ID612
Feature activation-6.35e-02

Pos logit contributions
ashing+0.166
angers+0.162
tub+0.154
cast+0.141
 Mil+0.138
Neg logit contributions
 exchange-0.151
 Amen-0.148
e-0.146
 letter-0.145
oc-0.142
Token.
Token ID13
Feature activation-1.12e-02

Pos logit contributions
ashing+0.061
tub+0.058
angers+0.054
gan+0.053
 Mil+0.052
Neg logit contributions
 harm-0.042
 exchange-0.041
oc-0.041
 Amen-0.041
give-0.041
Token The
Token ID383
Feature activation-9.47e-03

Pos logit contributions
ashing+0.021
tub+0.020
angers+0.018
cast+0.018
pit+0.017
Neg logit contributions
 Amen-0.019
oc-0.019
 exchange-0.018
give-0.017
 deliver-0.017
Token end
Token ID886
Feature activation-4.04e-02

Pos logit contributions
ashing+0.019
tub+0.018
gan+0.017
angers+0.017
 Mil+0.016
Neg logit contributions
 harm-0.014
oc-0.014
 Amen-0.014
 exchange-0.013
give-0.013
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.081
angers+0.080
 Mil+0.079
Neg logit contributions
oc-0.049
 Amen-0.049
 harm-0.049
 exchange-0.048
give-0.048
Token<|endoftext|>
Token ID50256
Feature activation-3.65e-02

Pos logit contributions
ashing+0.022
tub+0.020
angers+0.019
cast+0.019
pit+0.018
Neg logit contributions
oc-0.015
 Amen-0.015
 exchange-0.015
give-0.014
 harm-0.014
TokenOnce
Token ID7454
Feature activation-1.23e-01

Pos logit contributions
ashing+0.061
tub+0.059
gan+0.052
angers+0.051
 Mil+0.050
Neg logit contributions
 harm-0.064
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.69e-02

Pos logit contributions
ashing+0.146
angers+0.135
drops+0.130
tub+0.129
pit+0.115
Neg logit contributions
 exchange-0.263
 harm-0.248
 letter-0.248
oc-0.240
 Amen-0.238
Token a
Token ID257
Feature activation+1.25e-02

Pos logit contributions
ashing+0.067
tub+0.064
gan+0.058
 Mil+0.056
 upon+0.056
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.059
 exchange-0.059
give-0.058
Token time
Token ID640
Feature activation-8.83e-02

Pos logit contributions
oc+0.018
 Amen+0.017
 harm+0.017
 exchange+0.017
give+0.017
Neg logit contributions
ashing-0.026
tub-0.025
angers-0.023
gan-0.023
 Mil-0.022
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.205
tub+0.201
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.083
oc-0.079
 exchange-0.076
rum-0.070
 return-0.068
Token there
Token ID612
Feature activation-6.28e-02

Pos logit contributions
ashing+0.165
angers+0.160
tub+0.153
cast+0.142
 Mil+0.141
Neg logit contributions
 exchange-0.156
e-0.154
 Amen-0.152
 letter-0.150
oc-0.147
Token.
Token ID13
Feature activation-2.68e-03

Pos logit contributions
ashing+0.053
tub+0.050
angers+0.047
gan+0.046
 Mil+0.045
Neg logit contributions
 harm-0.025
oc-0.023
 Amen-0.023
 exchange-0.023
give-0.023
Token The
Token ID383
Feature activation-8.63e-03

Pos logit contributions
ashing+0.005
tub+0.005
angers+0.005
cast+0.005
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
'.-0.004
Token end
Token ID886
Feature activation-3.77e-02

Pos logit contributions
ashing+0.018
tub+0.016
gan+0.015
angers+0.015
 Mil+0.015
Neg logit contributions
 harm-0.013
 Amen-0.013
oc-0.012
 exchange-0.012
give-0.012
Token.
Token ID13
Feature activation-8.14e-03

Pos logit contributions
ashing+0.081
tub+0.078
gan+0.070
angers+0.070
 Mil+0.069
Neg logit contributions
 exchange-0.049
oc-0.049
give-0.049
 Amen-0.049
 harm-0.049
Token<|endoftext|>
Token ID50256
Feature activation-3.47e-02

Pos logit contributions
ashing+0.017
tub+0.016
angers+0.015
cast+0.014
pit+0.014
Neg logit contributions
 Amen-0.012
oc-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.056
tub+0.055
gan+0.048
angers+0.047
 Mil+0.047
Neg logit contributions
 harm-0.062
 Amen-0.062
oc-0.062
 exchange-0.061
give-0.061
Token upon
Token ID2402
Feature activation-3.66e-02

Pos logit contributions
ashing+0.145
angers+0.134
drops+0.130
tub+0.127
pit+0.114
Neg logit contributions
 exchange-0.268
 harm-0.253
 letter-0.251
oc-0.244
 Amen-0.243
Token a
Token ID257
Feature activation+1.21e-02

Pos logit contributions
ashing+0.066
tub+0.064
gan+0.058
 Mil+0.056
angers+0.056
Neg logit contributions
oc-0.060
 harm-0.059
 Amen-0.059
 exchange-0.058
give-0.058
Token time
Token ID640
Feature activation-8.79e-02

Pos logit contributions
oc+0.017
 Amen+0.017
 harm+0.017
 exchange+0.017
give+0.017
Neg logit contributions
ashing-0.025
tub-0.024
angers-0.022
gan-0.022
 Mil-0.021
Token,
Token ID11
Feature activation-9.66e-02

Pos logit contributions
ashing+0.208
angers+0.198
tub+0.196
ashed+0.185
cast+0.181
Neg logit contributions
e-0.091
oc-0.089
 exchange-0.084
rum-0.077
 Amen-0.076
Token there
Token ID612
Feature activation-6.33e-02

Pos logit contributions
ashing+0.166
angers+0.162
tub+0.152
cast+0.142
 Mil+0.139
Neg logit contributions
 exchange-0.158
e-0.156
 Amen-0.154
oc-0.153
 letter-0.151
Token.
Token ID13
Feature activation-1.25e-02

Pos logit contributions
ashing+0.115
gan+0.104
tub+0.101
 Mil+0.100
angers+0.098
Neg logit contributions
 exchange-0.040
 harm-0.040
 Amen-0.040
 deliver-0.039
give-0.038
Token The
Token ID383
Feature activation-7.07e-03

Pos logit contributions
ashing+0.024
tub+0.021
cast+0.020
angers+0.020
pit+0.019
Neg logit contributions
 Amen-0.022
oc-0.021
 exchange-0.020
give-0.019
 deliver-0.019
Token end
Token ID886
Feature activation-3.93e-02

Pos logit contributions
ashing+0.014
tub+0.013
gan+0.012
angers+0.012
 Mil+0.012
Neg logit contributions
 Amen-0.011
 harm-0.011
oc-0.010
give-0.010
 exchange-0.010
Token.
Token ID13
Feature activation-9.42e-03

Pos logit contributions
ashing+0.095
tub+0.090
gan+0.085
angers+0.084
pit+0.083
Neg logit contributions
oc-0.043
 Amen-0.043
 harm-0.043
 exchange-0.040
give-0.040
Token<|endoftext|>
Token ID50256
Feature activation-3.45e-02

Pos logit contributions
ashing+0.019
tub+0.018
angers+0.016
cast+0.016
pit+0.016
Neg logit contributions
oc-0.014
 Amen-0.014
 exchange-0.014
give-0.014
 harm-0.013
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.056
tub+0.055
gan+0.048
angers+0.047
 Mil+0.047
Neg logit contributions
 harm-0.062
 Amen-0.062
oc-0.061
 exchange-0.060
give-0.060
Token upon
Token ID2402
Feature activation-3.80e-02

Pos logit contributions
ashing+0.147
angers+0.132
drops+0.130
tub+0.129
pit+0.115
Neg logit contributions
 exchange-0.264
 letter-0.248
 harm-0.247
oc-0.240
 Amen-0.238
Token a
Token ID257
Feature activation+1.30e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.062
 harm-0.061
 Amen-0.061
 exchange-0.060
give-0.060
Token time
Token ID640
Feature activation-8.84e-02

Pos logit contributions
oc+0.019
 Amen+0.018
 harm+0.018
 exchange+0.018
give+0.018
Neg logit contributions
ashing-0.026
tub-0.026
angers-0.024
gan-0.023
 Mil-0.023
Token,
Token ID11
Feature activation-9.65e-02

Pos logit contributions
ashing+0.216
angers+0.205
tub+0.201
ashed+0.192
 prett+0.190
Neg logit contributions
e-0.082
oc-0.078
 exchange-0.076
rum-0.070
 return-0.068
Token a
Token ID257
Feature activation+3.63e-02

Pos logit contributions
ashing+0.166
angers+0.162
tub+0.154
cast+0.143
 Mil+0.143
Neg logit contributions
 exchange-0.155
e-0.153
 Amen-0.152
 letter-0.150
oc-0.146
Token visit
Token ID3187
Feature activation+9.32e-03

Pos logit contributions
ashing+0.045
tub+0.042
angers+0.039
gan+0.039
 Mil+0.038
Neg logit contributions
 harm-0.031
 exchange-0.030
oc-0.029
give-0.029
 deliver-0.029
Token that
Token ID326
Feature activation-2.35e-02

Pos logit contributions
 harm+0.013
 exchange+0.013
give+0.013
oc+0.013
 Amen+0.012
Neg logit contributions
ashing-0.020
tub-0.019
angers-0.017
gan-0.017
 Mil-0.016
Token day
Token ID1110
Feature activation-6.80e-02

Pos logit contributions
ashing+0.043
angers+0.038
gan+0.036
tub+0.036
ashed+0.034
Neg logit contributions
 harm-0.044
 exchange-0.043
 deliver-0.041
 letter-0.041
 Amen-0.041
Token.
Token ID13
Feature activation-1.80e-03

Pos logit contributions
ashing+0.158
tub+0.154
angers+0.144
gan+0.143
 Mil+0.141
Neg logit contributions
 exchange-0.077
 harm-0.076
oc-0.073
give-0.073
 deliver-0.069
Token<|endoftext|>
Token ID50256
Feature activation-3.38e-02

Pos logit contributions
ashing+0.004
tub+0.003
angers+0.003
cast+0.003
pit+0.003
Neg logit contributions
 Amen-0.003
oc-0.003
 exchange-0.003
give-0.003
 harm-0.003
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.062
tub+0.059
gan+0.053
angers+0.053
 Mil+0.051
Neg logit contributions
oc-0.055
 Amen-0.054
 harm-0.054
 exchange-0.054
give-0.053
Token upon
Token ID2402
Feature activation-4.05e-02

Pos logit contributions
ashing+0.132
tub+0.124
angers+0.123
drops+0.108
pit+0.102
Neg logit contributions
 exchange-0.280
 harm-0.264
 letter-0.259
 Amen-0.257
oc-0.256
Token a
Token ID257
Feature activation+1.26e-02

Pos logit contributions
ashing+0.065
tub+0.064
 upon+0.057
gan+0.057
 thirsty+0.056
Neg logit contributions
oc-0.074
 harm-0.072
 Amen-0.071
 exchange-0.071
give-0.071
Token time
Token ID640
Feature activation-8.89e-02

Pos logit contributions
oc+0.018
 Amen+0.017
 harm+0.017
 exchange+0.017
give+0.017
Neg logit contributions
ashing-0.026
tub-0.025
angers-0.023
gan-0.023
 Mil-0.022
Token there
Token ID612
Feature activation-6.65e-02

Pos logit contributions
ashing+0.223
angers+0.217
tub+0.209
ashed+0.202
 prett+0.195
Neg logit contributions
oc-0.078
e-0.075
 exchange-0.072
rum-0.065
 fright-0.065
Token was
Token ID373
Feature activation-6.68e-03

Pos logit contributions
ashing+0.171
angers+0.156
tub+0.151
gan+0.143
 spine+0.141
Neg logit contributions
oc-0.073
 exchange-0.070
 harm-0.066
e-0.066
give-0.063
Token.
Token ID13
Feature activation-7.38e-06

Pos logit contributions
 harm+0.018
 exchange+0.017
 respect+0.017
give+0.017
 Amen+0.017
Neg logit contributions
tub-0.033
ashing-0.033
angers-0.031
 Mil-0.029
gan-0.029
Token The
Token ID383
Feature activation-1.78e-04

Feature not active on prev token;
no predictions were affected.
Token end
Token ID886
Feature activation-3.88e-02

Pos logit contributions
ashing+0.000
tub+0.000
angers+0.000
gan+0.000
 Mil+0.000
Neg logit contributions
 harm-0.000
 Amen-0.000
oc-0.000
give-0.000
 exchange-0.000
Token.
Token ID13
Feature activation-5.43e-03

Pos logit contributions
ashing+0.089
tub+0.085
gan+0.079
angers+0.078
 Mil+0.077
Neg logit contributions
oc-0.046
 Amen-0.046
 harm-0.046
 exchange-0.045
give-0.044
Token<|endoftext|>
Token ID50256
Feature activation-3.34e-02

Pos logit contributions
ashing+0.011
tub+0.010
angers+0.010
cast+0.010
gan+0.009
Neg logit contributions
oc-0.008
 Amen-0.008
 exchange-0.008
give-0.008
 harm-0.007
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.057
tub+0.055
gan+0.048
angers+0.048
 Mil+0.047
Neg logit contributions
 harm-0.058
 Amen-0.058
oc-0.058
 exchange-0.056
give-0.056
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.133
drops+0.130
tub+0.127
pit+0.114
Neg logit contributions
 exchange-0.262
 harm-0.247
 letter-0.245
oc-0.241
 Amen-0.238
Token a
Token ID257
Feature activation+1.30e-02

Pos logit contributions
ashing+0.068
tub+0.066
gan+0.059
 Mil+0.057
 upon+0.057
Neg logit contributions
oc-0.062
 harm-0.061
 Amen-0.061
 exchange-0.060
give-0.060
Token time
Token ID640
Feature activation-8.75e-02

Pos logit contributions
oc+0.018
 Amen+0.018
 harm+0.018
 exchange+0.018
give+0.017
Neg logit contributions
ashing-0.027
tub-0.026
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.61e-02

Pos logit contributions
ashing+0.211
angers+0.201
tub+0.198
ashed+0.187
 prett+0.185
Neg logit contributions
e-0.083
oc-0.079
 exchange-0.077
rum-0.071
 Amen-0.068
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.164
angers+0.159
tub+0.152
 Mil+0.141
cast+0.140
Neg logit contributions
 exchange-0.156
e-0.153
 Amen-0.152
 letter-0.150
oc-0.147
Token was
Token ID373
Feature activation-1.01e-02

Pos logit contributions
ashing+0.024
tub+0.022
angers+0.021
gan+0.020
pit+0.019
Neg logit contributions
oc-0.019
 Amen-0.019
 exchange-0.019
 harm-0.018
give-0.018
Token happy
Token ID3772
Feature activation-1.09e-02

Pos logit contributions
ashing+0.018
tub+0.018
angers+0.016
cast+0.016
pit+0.015
Neg logit contributions
 Amen-0.018
 harm-0.017
give-0.017
 exchange-0.017
e-0.017
Token again
Token ID757
Feature activation-4.63e-02

Pos logit contributions
ashing+0.025
tub+0.024
angers+0.023
gan+0.022
ashed+0.021
Neg logit contributions
 harm-0.013
 Amen-0.013
 exchange-0.013
give-0.013
oc-0.012
Token!
Token ID0
Feature activation-7.72e-03

Pos logit contributions
ashing+0.103
tub+0.099
angers+0.091
gan+0.091
 Mil+0.088
Neg logit contributions
oc-0.059
 harm-0.058
 exchange-0.057
 Amen-0.057
give-0.056
Token<|endoftext|>
Token ID50256
Feature activation-4.03e-02

Pos logit contributions
ashing+0.016
tub+0.015
angers+0.013
cast+0.013
ashed+0.013
Neg logit contributions
 Amen-0.012
 exchange-0.012
oc-0.012
give-0.011
 deliver-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.071
tub+0.065
gan+0.059
angers+0.058
 Mil+0.056
Neg logit contributions
oc-0.071
 exchange-0.070
 Amen-0.070
give-0.069
 harm-0.069
Token upon
Token ID2402
Feature activation-4.19e-02

Pos logit contributions
ashing+0.153
angers+0.146
tub+0.142
drops+0.140
pit+0.121
Neg logit contributions
 exchange-0.252
 harm-0.234
 letter-0.233
 Amen-0.232
 return-0.227
Token a
Token ID257
Feature activation+1.50e-02

Pos logit contributions
ashing+0.074
tub+0.072
gan+0.065
 upon+0.063
 Mil+0.063
Neg logit contributions
oc-0.070
 harm-0.069
 Amen-0.069
 exchange-0.068
give-0.068
Token time
Token ID640
Feature activation-8.54e-02

Pos logit contributions
oc+0.020
 Amen+0.020
 harm+0.020
 exchange+0.020
give+0.019
Neg logit contributions
ashing-0.031
tub-0.031
angers-0.029
gan-0.028
 Mil-0.028
Token,
Token ID11
Feature activation-9.28e-02

Pos logit contributions
ashing+0.223
angers+0.215
tub+0.209
ashed+0.203
 prett+0.198
Neg logit contributions
e-0.064
oc-0.057
 exchange-0.052
"-0.052
 return-0.050
Token there
Token ID612
Feature activation-6.04e-02

Pos logit contributions
ashing+0.166
angers+0.164
tub+0.156
cast+0.147
unes+0.140
Neg logit contributions
e-0.143
 Amen-0.142
 exchange-0.139
oc-0.132
 letter-0.131
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
ashing+0.158
tub+0.152
gan+0.150
 Mil+0.135
pit+0.135
Neg logit contributions
 deliver-0.090
 exchange-0.090
 Amen-0.088
oc-0.084
 harm-0.084
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
oc+0.034
 exchange+0.034
 Amen+0.033
 harm+0.033
give+0.032
Neg logit contributions
ashing-0.039
tub-0.037
angers-0.034
gan-0.033
 Mil-0.032
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token there
Token ID612
Feature activation-6.53e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token lived
Token ID5615
Feature activation-4.04e-02

Pos logit contributions
ashing+0.172
angers+0.156
tub+0.148
 spine+0.141
gan+0.141
Neg logit contributions
oc-0.069
 exchange-0.067
e-0.063
 harm-0.060
give-0.058
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
ashing+0.097
tub+0.089
angers+0.082
gan+0.081
 upon+0.080
Neg logit contributions
 harm-0.092
oc-0.091
 Amen-0.089
give-0.087
 exchange-0.087
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
 harm+0.048
 Amen+0.048
give+0.048
 exchange+0.047
oc+0.047
Neg logit contributions
ashing-0.058
tub-0.055
gan-0.049
angers-0.048
 thirsty-0.047
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
 exchange+0.031
oc+0.031
 Amen+0.030
give+0.028
 deliver+0.027
Neg logit contributions
ashing-0.041
tub-0.039
angers-0.036
pit-0.035
cast-0.035
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
 harm+0.063
give+0.062
 Amen+0.061
 exchange+0.061
oc+0.059
Neg logit contributions
ashing-0.102
tub-0.097
angers-0.087
ashed-0.086
gan-0.085
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
ashing+0.034
tub+0.033
gan+0.028
angers+0.028
pit+0.027
Neg logit contributions
 harm-0.045
 exchange-0.045
oc-0.045
 Amen-0.044
give-0.044
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token there
Token ID612
Feature activation-6.53e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token was
Token ID373
Feature activation-7.85e-03

Pos logit contributions
ashing+0.172
angers+0.156
tub+0.148
 spine+0.141
gan+0.141
Neg logit contributions
oc-0.069
 exchange-0.067
e-0.063
 harm-0.060
give-0.058
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
 exchange+0.008
oc+0.008
 Amen+0.008
 deliver+0.008
 harm+0.008
Neg logit contributions
ashing-0.009
tub-0.009
angers-0.008
pit-0.008
gan-0.008
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
 exchange+0.028
 Amen+0.027
 mail+0.025
e+0.025
oc+0.025
Neg logit contributions
ashing-0.034
cast-0.033
tub-0.032
angers-0.031
gan-0.030
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
 Amen+0.019
oc+0.019
 harm+0.018
 exchange+0.018
give+0.018
Neg logit contributions
ashing-0.028
tub-0.027
gan-0.024
angers-0.024
 upon-0.024
Token The
Token ID383
Feature activation-6.41e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
cast+0.004
ashed+0.004
Neg logit contributions
 Amen-0.004
oc-0.004
 exchange-0.004
give-0.004
 deliver-0.004
Token end
Token ID886
Feature activation-4.02e-02

Pos logit contributions
ashing+0.013
tub+0.012
gan+0.011
angers+0.011
 Mil+0.011
Neg logit contributions
oc-0.009
 harm-0.009
 Amen-0.009
 exchange-0.009
give-0.009
Token.
Token ID13
Feature activation-8.02e-03

Pos logit contributions
ashing+0.092
tub+0.088
gan+0.082
angers+0.081
 Mil+0.079
Neg logit contributions
oc-0.048
 Amen-0.048
 harm-0.048
 exchange-0.046
give-0.046
Token<|endoftext|>
Token ID50256
Feature activation-3.64e-02

Pos logit contributions
ashing+0.017
tub+0.016
cast+0.014
angers+0.014
pit+0.014
Neg logit contributions
oc-0.012
 Amen-0.012
 exchange-0.012
give-0.011
 harm-0.011
TokenOnce
Token ID7454
Feature activation-1.22e-01

Pos logit contributions
ashing+0.060
tub+0.059
gan+0.051
angers+0.050
 Mil+0.050
Neg logit contributions
 harm-0.065
 Amen-0.064
oc-0.064
 exchange-0.063
give-0.063
Token upon
Token ID2402
Feature activation-3.77e-02

Pos logit contributions
ashing+0.145
angers+0.130
tub+0.127
drops+0.124
pit+0.112
Neg logit contributions
 exchange-0.267
 harm-0.252
 letter-0.250
oc-0.247
 Amen-0.242
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
ashing+0.069
tub+0.067
gan+0.060
 Mil+0.058
angers+0.058
Neg logit contributions
oc-0.061
 harm-0.060
 Amen-0.060
 exchange-0.059
give-0.059
Token time
Token ID640
Feature activation-8.74e-02

Pos logit contributions
oc+0.019
 harm+0.019
 Amen+0.019
 exchange+0.019
give+0.018
Neg logit contributions
ashing-0.027
tub-0.027
angers-0.024
gan-0.024
 Mil-0.023
Token,
Token ID11
Feature activation-9.62e-02

Pos logit contributions
ashing+0.215
angers+0.203
tub+0.200
ashed+0.191
 prett+0.188
Neg logit contributions
e-0.079
oc-0.076
 exchange-0.074
rum-0.069
 return-0.066
Token there
Token ID612
Feature activation-6.25e-02

Pos logit contributions
ashing+0.165
angers+0.161
tub+0.151
cast+0.141
 Mil+0.140
Neg logit contributions
 exchange-0.156
e-0.155
 Amen-0.153
 letter-0.150
oc-0.149
Tokenmy
Token ID1820
Feature activation+1.92e-02

Pos logit contributions
ashing+0.006
tub+0.006
angers+0.005
gan+0.005
 Mil+0.005
Neg logit contributions
oc-0.005
 harm-0.005
 exchange-0.005
give-0.005
 Amen-0.005
Token's
Token ID338
Feature activation-1.69e-03

Pos logit contributions
oc+0.027
 exchange+0.026
 harm+0.026
give+0.025
 deliver+0.025
Neg logit contributions
ashing-0.039
tub-0.037
angers-0.036
gan-0.035
 Mil-0.034
Token mom
Token ID1995
Feature activation+4.39e-02

Pos logit contributions
ashing+0.003
tub+0.003
angers+0.003
gan+0.003
 Mil+0.003
Neg logit contributions
 harm-0.003
 exchange-0.002
oc-0.002
 Amen-0.002
give-0.002
Token called
Token ID1444
Feature activation+2.75e-02

Pos logit contributions
 Amen+0.080
oc+0.079
 harm+0.079
give+0.077
 exchange+0.076
Neg logit contributions
ashing-0.073
tub-0.069
gan-0.062
 upon-0.060
angers-0.059
Token them
Token ID606
Feature activation-3.49e-03

Pos logit contributions
oc+0.039
 Amen+0.038
 harm+0.038
 exchange+0.037
give+0.037
Neg logit contributions
ashing-0.057
tub-0.055
gan-0.050
angers-0.049
 Mil-0.048
Token for
Token ID329
Feature activation+6.51e-02

Pos logit contributions
ashing+0.006
tub+0.006
gan+0.005
angers+0.005
 Mil+0.005
Neg logit contributions
 Amen-0.006
 harm-0.006
 exchange-0.006
oc-0.006
give-0.006
Token a
Token ID257
Feature activation+5.71e-02

Pos logit contributions
oc+0.128
 Amen+0.122
 harm+0.120
 exchange+0.118
give+0.117
Neg logit contributions
ashing-0.100
tub-0.096
gan-0.085
 thirsty-0.084
angers-0.084
Token snack
Token ID26906
Feature activation+3.41e-02

Pos logit contributions
oc+0.086
 Amen+0.085
 harm+0.084
give+0.083
 exchange+0.083
Neg logit contributions
ashing-0.113
tub-0.108
gan-0.097
angers-0.095
 Mil-0.094
Token.
Token ID13
Feature activation+1.30e-02

Pos logit contributions
 Amen+0.051
oc+0.050
 harm+0.048
give+0.048
 exchange+0.046
Neg logit contributions
ashing-0.070
tub-0.069
gan-0.061
angers-0.060
pit-0.059
Token
Token ID198
Feature activation+3.60e-02

Pos logit contributions
 exchange+0.017
 mail+0.016
oc+0.014
 Amen+0.014
 deliver+0.014
Neg logit contributions
ashing-0.027
pit-0.026
cast-0.025
gan-0.025
tub-0.025
Token
Token ID198
Feature activation+3.75e-02

Pos logit contributions
 harm+0.069
 Amen+0.068
give+0.068
oc+0.067
 exchange+0.066
Neg logit contributions
ashing-0.056
tub-0.054
 upon-0.047
gan-0.045
 Mil-0.044
Token all
Token ID477
Feature activation-3.48e-02

Pos logit contributions
ashing+0.024
tub+0.023
gan+0.020
angers+0.020
 Mil+0.020
Neg logit contributions
 Amen-0.019
oc-0.019
 harm-0.019
give-0.018
 exchange-0.018
Token my
Token ID616
Feature activation-2.83e-02

Pos logit contributions
ashing+0.073
tub+0.069
angers+0.064
gan+0.063
 Mil+0.061
Neg logit contributions
 harm-0.049
oc-0.048
 exchange-0.048
 Amen-0.047
give-0.047
Token lettuce
Token ID39406
Feature activation+1.81e-02

Pos logit contributions
ashing+0.059
tub+0.057
gan+0.052
angers+0.051
 Mil+0.050
Neg logit contributions
oc-0.038
 Amen-0.038
 harm-0.038
 exchange-0.038
give-0.037
Token.
Token ID13
Feature activation-1.59e-03

Pos logit contributions
oc+0.021
 exchange+0.021
 harm+0.021
give+0.020
 Amen+0.020
Neg logit contributions
ashing-0.042
tub-0.039
gan-0.037
angers-0.036
 Mil-0.036
Token I
Token ID314
Feature activation+3.49e-02

Pos logit contributions
ashing+0.003
tub+0.003
gan+0.002
angers+0.002
pit+0.002
Neg logit contributions
oc-0.003
 Amen-0.003
 exchange-0.003
 harm-0.003
give-0.003
Token need
Token ID761
Feature activation+8.22e-02

Pos logit contributions
 Amen+0.056
 harm+0.055
oc+0.055
give+0.054
 exchange+0.054
Neg logit contributions
ashing-0.065
tub-0.061
gan-0.055
 upon-0.054
 thirsty-0.054
Token a
Token ID257
Feature activation+4.88e-02

Pos logit contributions
oc+0.171
 Amen+0.164
izer+0.155
rum+0.148
 harm+0.146
Neg logit contributions
tub-0.108
ashing-0.107
 upon-0.099
 thirsty-0.097
gan-0.097
Token lawyer
Token ID6853
Feature activation+4.07e-02

Pos logit contributions
 Amen+0.075
oc+0.074
give+0.074
 harm+0.072
rum+0.070
Neg logit contributions
ashing-0.094
tub-0.091
 thirsty-0.081
 upon-0.080
gan-0.080
Token to
Token ID284
Feature activation+4.94e-02

Pos logit contributions
 Amen+0.067
 harm+0.066
oc+0.066
give+0.066
 exchange+0.065
Neg logit contributions
ashing-0.073
tub-0.069
gan-0.062
angers-0.060
 Mil-0.059
Token help
Token ID1037
Feature activation+4.56e-02

Pos logit contributions
oc+0.064
 Amen+0.063
rum+0.057
give+0.056
ez+0.055
Neg logit contributions
ashing-0.097
tub-0.097
 thirsty-0.092
 Mil-0.090
gan-0.090
Token me
Token ID502
Feature activation-2.34e-02

Pos logit contributions
 Amen+0.082
oc+0.079
give+0.075
izer+0.075
 exchange+0.074
Neg logit contributions
ashing-0.070
tub-0.068
 Mil-0.063
gan-0.063
 upon-0.061
Token reach
Token ID3151
Feature activation-1.68e-02

Pos logit contributions
 Amen+0.085
oc+0.083
rum+0.076
ez+0.076
give+0.072
Neg logit contributions
gan-0.073
 thirsty-0.071
 bored-0.070
ashing-0.069
 Mil-0.069
Token it
Token ID340
Feature activation-6.13e-02

Pos logit contributions
ashing+0.027
tub+0.026
 Mil+0.024
angers+0.024
gan+0.023
Neg logit contributions
 Amen-0.031
oc-0.030
give-0.030
 harm-0.029
 exchange-0.028
Token,
Token ID11
Feature activation-7.83e-02

Pos logit contributions
ashing+0.119
tub+0.113
gan+0.102
angers+0.101
 Mil+0.099
Neg logit contributions
oc-0.094
 harm-0.094
 Amen-0.093
 exchange-0.093
give-0.092
Token but
Token ID475
Feature activation+3.94e-03

Pos logit contributions
ashing+0.111
tub+0.106
angers+0.089
gan+0.087
ashed+0.085
Neg logit contributions
 harm-0.164
 exchange-0.164
 deliver-0.160
 shave-0.158
 respect-0.156
Token they
Token ID484
Feature activation+1.60e-03

Pos logit contributions
 exchange+0.005
 respect+0.005
 harm+0.005
 prove+0.005
 return+0.005
Neg logit contributions
ashing-0.009
angers-0.008
tub-0.008
cast-0.007
udd-0.007
Token cannot
Token ID2314
Feature activation+6.72e-02

Pos logit contributions
 Amen+0.003
oc+0.003
 harm+0.003
give+0.002
 exchange+0.002
Neg logit contributions
ashing-0.003
tub-0.003
gan-0.003
angers-0.003
 Mil-0.002
Token.
Token ID13
Feature activation+4.17e-03

Pos logit contributions
 Amen+0.126
oc+0.120
izer+0.118
give+0.118
ez+0.111
Neg logit contributions
gan-0.102
ashing-0.101
 bored-0.096
 Mil-0.095
 thirsty-0.093
Token They
Token ID1119
Feature activation+1.81e-02

Pos logit contributions
 exchange+0.006
give+0.006
oc+0.006
 mail+0.006
Will+0.005
Neg logit contributions
ashing-0.008
cast-0.007
tub-0.007
pit-0.007
 spine-0.007
Token ask
Token ID1265
Feature activation+2.89e-02

Pos logit contributions
oc+0.027
 Amen+0.027
give+0.026
 harm+0.026
 exchange+0.026
Neg logit contributions
ashing-0.035
tub-0.033
gan-0.031
 Mil-0.031
 thirsty-0.030
Token their
Token ID511
Feature activation+1.53e-03

Pos logit contributions
 harm+0.049
 Amen+0.048
 exchange+0.048
oc+0.048
give+0.048
Neg logit contributions
ashing-0.052
tub-0.049
gan-0.043
angers-0.043
pit-0.042
Token mom
Token ID1995
Feature activation+5.15e-02

Pos logit contributions
 harm+0.002
 exchange+0.002
oc+0.002
 Amen+0.002
give+0.002
Neg logit contributions
ashing-0.003
tub-0.003
angers-0.002
gan-0.002
cast-0.002
Token was
Token ID373
Feature activation+1.04e-02

Pos logit contributions
oc+0.022
 harm+0.022
give+0.021
 exchange+0.021
e+0.021
Neg logit contributions
ashing-0.028
tub-0.027
angers-0.025
gan-0.024
pit-0.023
Token confused
Token ID10416
Feature activation+1.34e-02

Pos logit contributions
 harm+0.018
give+0.017
e+0.016
 deliver+0.016
 defeat+0.016
Neg logit contributions
ashing-0.019
tub-0.018
angers-0.018
pit-0.017
cast-0.017
Token.
Token ID13
Feature activation+1.69e-02

Pos logit contributions
 harm+0.018
oc+0.017
 exchange+0.017
 shave+0.016
give+0.016
Neg logit contributions
ashing-0.028
tub-0.028
angers-0.026
gan-0.025
 Mil-0.025
Token He
Token ID679
Feature activation+1.93e-02

Pos logit contributions
oc+0.018
give+0.015
 exchange+0.015
 word+0.015
 Amen+0.015
Neg logit contributions
ashing-0.042
ancer-0.037
angers-0.036
tub-0.036
ashed-0.036
Token didn
Token ID1422
Feature activation+3.66e-02

Pos logit contributions
oc+0.036
 Amen+0.035
 harm+0.035
 exchange+0.035
give+0.035
Neg logit contributions
ashing-0.032
tub-0.030
gan-0.026
angers-0.026
 Mil-0.025
Token't
Token ID470
Feature activation+6.44e-02

Pos logit contributions
oc+0.073
 Amen+0.068
 exchange+0.066
 envelop+0.065
 harm+0.064
Neg logit contributions
tub-0.055
ashing-0.051
angers-0.047
gan-0.047
 Mil-0.047
Token know
Token ID760
Feature activation+6.31e-02

Pos logit contributions
oc+0.082
 Amen+0.082
give+0.081
 exchange+0.079
 harm+0.073
Neg logit contributions
ashing-0.144
tub-0.133
 bored-0.122
 thirsty-0.122
 Mil-0.121
Token what
Token ID644
Feature activation+2.61e-02

Pos logit contributions
 harm+0.076
oc+0.071
 defeat+0.066
 deliver+0.066
 respect+0.066
Neg logit contributions
ashing-0.155
tub-0.146
ashed-0.136
angers-0.134
gan-0.132
Token was
Token ID373
Feature activation+9.54e-03

Pos logit contributions
oc+0.045
 Amen+0.045
 harm+0.044
 exchange+0.044
give+0.043
Neg logit contributions
ashing-0.046
tub-0.044
gan-0.039
angers-0.039
 Mil-0.038
Token happening
Token ID5836
Feature activation-1.59e-02

Pos logit contributions
 harm+0.017
give+0.015
 exchange+0.015
 letter+0.015
 deliver+0.015
Neg logit contributions
ashing-0.017
tub-0.016
angers-0.015
gan-0.015
 Mil-0.015
Token.
Token ID13
Feature activation+9.00e-03

Pos logit contributions
ashing+0.035
tub+0.033
gan+0.031
angers+0.031
 Mil+0.030
Neg logit contributions
 harm-0.021
 exchange-0.020
oc-0.020
 Amen-0.020
give-0.020
Token so
Token ID523
Feature activation-1.91e-03

Pos logit contributions
 harm+0.019
 exchange+0.018
 Amen+0.018
give+0.018
oc+0.018
Neg logit contributions
ashing-0.017
tub-0.016
angers-0.015
gan-0.014
pit-0.014
Token happy
Token ID3772
Feature activation-2.36e-02

Pos logit contributions
ashing+0.003
tub+0.003
angers+0.003
pit+0.003
gan+0.003
Neg logit contributions
oc-0.003
 exchange-0.003
 harm-0.003
 Amen-0.003
give-0.003
Token that
Token ID326
Feature activation+1.09e-03

Pos logit contributions
ashing+0.058
tub+0.057
angers+0.054
gan+0.054
ancer+0.052
Neg logit contributions
oc-0.024
 harm-0.023
give-0.021
 defeat-0.020
 exchange-0.020
Token he
Token ID339
Feature activation-8.28e-03

Pos logit contributions
 Amen+0.002
oc+0.002
 harm+0.002
 exchange+0.002
 deliver+0.002
Neg logit contributions
ashing-0.002
tub-0.002
angers-0.002
gan-0.002
pit-0.002
Token wanted
Token ID2227
Feature activation+5.46e-02

Pos logit contributions
ashing+0.015
tub+0.015
angers+0.014
pit+0.013
gan+0.013
Neg logit contributions
oc-0.013
give-0.013
 harm-0.013
 Amen-0.013
 exchange-0.012
Token to
Token ID284
Feature activation+6.66e-02

Pos logit contributions
 Amen+0.107
oc+0.106
 exchange+0.101
give+0.099
 harm+0.099
Neg logit contributions
ashing-0.085
tub-0.080
gan-0.072
 Mil-0.070
 thirsty-0.070
Token fly
Token ID6129
Feature activation-6.83e-03

Pos logit contributions
oc+0.124
 Amen+0.122
 letter+0.111
rum+0.110
 envelop+0.109
Neg logit contributions
ashing-0.105
 Mil-0.098
tub-0.097
 Madd-0.096
 thirsty-0.095
Token too
Token ID1165
Feature activation+1.41e-03

Pos logit contributions
ashing+0.014
tub+0.013
angers+0.012
gan+0.012
pit+0.012
Neg logit contributions
 harm-0.010
oc-0.010
 exchange-0.010
 Amen-0.010
give-0.009
Token,
Token ID11
Feature activation-7.16e-02

Pos logit contributions
 Amen+0.002
oc+0.002
 harm+0.002
 exchange+0.002
give+0.002
Neg logit contributions
ashing-0.003
tub-0.003
gan-0.002
angers-0.002
 Mil-0.002
Token so
Token ID523
Feature activation-3.67e-03

Pos logit contributions
ashing+0.134
tub+0.131
angers+0.122
gan+0.114
pit+0.114
Neg logit contributions
 harm-0.112
 deliver-0.112
 fright-0.111
 steal-0.110
 prove-0.109
Token he
Token ID339
Feature activation-2.02e-02

Pos logit contributions
ashing+0.006
tub+0.006
angers+0.005
pit+0.005
gan+0.005
Neg logit contributions
oc-0.007
 harm-0.007
 exchange-0.007
 Amen-0.007
give-0.007
Token.
Token ID13
Feature activation-9.99e-03

Pos logit contributions
ashing+0.016
tub+0.016
angers+0.015
gan+0.014
ashed+0.014
Neg logit contributions
 harm-0.006
 Amen-0.006
give-0.006
 exchange-0.006
oc-0.006
Token "
Token ID366
Feature activation+2.26e-02

Pos logit contributions
ashing+0.019
tub+0.018
gan+0.016
angers+0.016
pit+0.015
Neg logit contributions
 Amen-0.016
oc-0.016
 exchange-0.016
 harm-0.016
give-0.016
TokenYou
Token ID1639
Feature activation+3.62e-02

Pos logit contributions
 harm+0.038
 exchange+0.035
 letter+0.034
 defeat+0.034
 message+0.034
Neg logit contributions
ashing-0.039
tub-0.038
gan-0.037
 Mil-0.036
angers-0.032
Token are
Token ID389
Feature activation+1.73e-02

Pos logit contributions
oc+0.054
 Amen+0.053
 harm+0.053
 exchange+0.053
give+0.052
Neg logit contributions
ashing-0.073
tub-0.069
gan-0.063
angers-0.062
 Mil-0.061
Token nice
Token ID3621
Feature activation+2.37e-02

Pos logit contributions
 Amen+0.027
 harm+0.027
 exchange+0.027
oc+0.027
give+0.027
Neg logit contributions
ashing-0.033
tub-0.032
angers-0.028
gan-0.028
pit-0.027
Token kids
Token ID3988
Feature activation+1.77e-02

Pos logit contributions
oc+0.031
 Amen+0.031
 harm+0.030
 exchange+0.030
give+0.030
Neg logit contributions
ashing-0.052
tub-0.049
gan-0.045
angers-0.045
 Mil-0.044
Token.
Token ID13
Feature activation+8.01e-03

Pos logit contributions
 harm+0.024
give+0.024
oc+0.024
 Amen+0.024
 exchange+0.024
Neg logit contributions
ashing-0.037
tub-0.035
gan-0.032
 thirsty-0.031
angers-0.031
Token You
Token ID921
Feature activation+3.53e-02

Pos logit contributions
oc+0.015
 Amen+0.015
 harm+0.014
 exchange+0.014
give+0.014
Neg logit contributions
ashing-0.013
tub-0.013
gan-0.011
angers-0.011
 Mil-0.011
Token can
Token ID460
Feature activation+5.26e-02

Pos logit contributions
oc+0.054
 Amen+0.054
 harm+0.053
 exchange+0.052
give+0.052
Neg logit contributions
ashing-0.069
tub-0.066
gan-0.059
angers-0.058
 Mil-0.057
Token play
Token ID711
Feature activation+7.66e-03

Pos logit contributions
oc+0.089
rum+0.089
 Amen+0.082
 letter+0.080
izer+0.080
Neg logit contributions
ashing-0.091
tub-0.087
gan-0.082
 Mil-0.079
 thirsty-0.079
Token with
Token ID351
Feature activation+2.96e-02

Pos logit contributions
 Amen+0.013
 harm+0.013
oc+0.013
 exchange+0.013
rum+0.013
Neg logit contributions
ashing-0.013
tub-0.013
 thirsty-0.011
angers-0.011
pit-0.011
Token excited
Token ID6568
Feature activation-2.05e-03

Pos logit contributions
pit+0.037
ashing+0.035
 spine+0.034
angers+0.034
tub+0.034
Neg logit contributions
 deliver-0.033
 Amen-0.033
 mail-0.032
 letter-0.031
 value-0.031
Token and
Token ID290
Feature activation-9.88e-03

Pos logit contributions
tub+0.004
ashing+0.004
pit+0.004
gan+0.004
 Mil+0.004
Neg logit contributions
 harm-0.002
give-0.002
 letter-0.002
oc-0.002
 deliver-0.002
Token couldn
Token ID3521
Feature activation+3.82e-02

Pos logit contributions
ashing+0.018
angers+0.018
tub+0.018
pit+0.018
gan+0.016
Neg logit contributions
 deliver-0.016
 Amen-0.015
 mail-0.015
 exchange-0.014
 respect-0.014
Token't
Token ID470
Feature activation+4.25e-02

Pos logit contributions
oc+0.052
 harm+0.051
 Amen+0.050
 exchange+0.050
give+0.048
Neg logit contributions
ashing-0.078
tub-0.078
gan-0.070
angers-0.069
 Mil-0.068
Token wait
Token ID4043
Feature activation-3.62e-02

Pos logit contributions
oc+0.088
give+0.085
 exchange+0.084
rum+0.083
 Amen+0.083
Neg logit contributions
ashing-0.059
tub-0.055
gan-0.050
 Mil-0.049
 thirsty-0.048
Token to
Token ID284
Feature activation+3.38e-02

Pos logit contributions
ashing+0.072
tub+0.071
gan+0.065
pit+0.063
angers+0.063
Neg logit contributions
 harm-0.051
 exchange-0.051
 deliver-0.050
 Amen-0.050
give-0.049
Token go
Token ID467
Feature activation-2.34e-02

Pos logit contributions
oc+0.063
ez+0.059
rum+0.057
give+0.056
 Amen+0.054
Neg logit contributions
tub-0.052
 upon-0.051
ashing-0.051
 thirsty-0.050
 Mil-0.048
Token.
Token ID13
Feature activation+1.00e-02

Pos logit contributions
ashing+0.042
tub+0.040
angers+0.035
gan+0.035
 upon+0.034
Neg logit contributions
oc-0.040
 Amen-0.039
 harm-0.039
give-0.039
 exchange-0.038
Token
Token ID198
Feature activation+2.72e-02

Pos logit contributions
 exchange+0.013
 Amen+0.012
oc+0.012
 deliver+0.011
 mail+0.011
Neg logit contributions
ashing-0.023
cast-0.021
tub-0.021
angers-0.020
pit-0.020
Token
Token ID198
Feature activation+3.03e-02

Pos logit contributions
 harm+0.052
give+0.051
 Amen+0.051
oc+0.050
 exchange+0.050
Neg logit contributions
ashing-0.042
tub-0.040
 upon-0.035
gan-0.034
 Mil-0.034
TokenThey
Token ID2990
Feature activation+1.76e-02

Pos logit contributions
 harm+0.047
 Amen+0.046
oc+0.045
 exchange+0.044
give+0.044
Neg logit contributions
ashing-0.060
tub-0.057
gan-0.051
 upon-0.050
angers-0.050
Token your
Token ID534
Feature activation-1.00e-02

Pos logit contributions
oc+0.064
 Amen+0.062
 exchange+0.059
give+0.058
 owe+0.057
Neg logit contributions
ashing-0.065
tub-0.059
 Mil-0.058
gan-0.057
angers-0.054
Token mom
Token ID1995
Feature activation+3.12e-02

Pos logit contributions
ashing+0.017
tub+0.016
gan+0.014
angers+0.014
 Mil+0.013
Neg logit contributions
oc-0.018
 Amen-0.018
 harm-0.018
 exchange-0.018
give-0.018
Token and
Token ID290
Feature activation+1.19e-02

Pos logit contributions
 Amen+0.051
give+0.049
 harm+0.048
oc+0.048
 exchange+0.047
Neg logit contributions
ashing-0.057
tub-0.054
gan-0.049
 upon-0.048
 thirsty-0.047
Token dad
Token ID9955
Feature activation+5.03e-02

Pos logit contributions
oc+0.022
 Amen+0.022
 harm+0.022
 exchange+0.022
give+0.022
Neg logit contributions
ashing-0.019
tub-0.018
gan-0.016
angers-0.016
 Mil-0.015
Token.
Token ID13
Feature activation-2.35e-04

Pos logit contributions
 Amen+0.073
give+0.071
oc+0.071
 harm+0.071
 exchange+0.069
Neg logit contributions
ashing-0.101
tub-0.095
gan-0.088
 upon-0.085
 thirsty-0.084
Token They
Token ID1119
Feature activation+1.85e-02

Pos logit contributions
ashing+0.000
tub+0.000
gan+0.000
angers+0.000
pit+0.000
Neg logit contributions
oc-0.000
 Amen-0.000
 harm-0.000
 exchange-0.000
give-0.000
Token know
Token ID760
Feature activation+3.86e-02

Pos logit contributions
 Amen+0.026
oc+0.026
izer+0.026
give+0.025
 exchange+0.025
Neg logit contributions
ashing-0.036
tub-0.034
 thirsty-0.031
 upon-0.031
gan-0.031
Token what
Token ID644
Feature activation+1.19e-02

Pos logit contributions
 harm+0.056
give+0.054
 respect+0.054
 exchange+0.053
 deliver+0.052
Neg logit contributions
ashing-0.083
tub-0.078
angers-0.071
pit-0.070
ashed-0.069
Token is
Token ID318
Feature activation-4.53e-03

Pos logit contributions
oc+0.021
 Amen+0.021
 exchange+0.021
 harm+0.021
give+0.020
Neg logit contributions
ashing-0.020
tub-0.019
gan-0.017
 Mil-0.016
angers-0.016
Token best
Token ID1266
Feature activation-1.91e-02

Pos logit contributions
ashing+0.008
tub+0.007
gan+0.006
angers+0.006
 Mil+0.006
Neg logit contributions
 harm-0.008
 Amen-0.008
oc-0.008
 exchange-0.008
give-0.008
Token for
Token ID329
Feature activation+4.75e-02

Pos logit contributions
ashing+0.025
tub+0.024
gan+0.020
angers+0.020
 upon+0.019
Neg logit contributions
oc-0.041
 Amen-0.041
 exchange-0.041
give-0.040
 harm-0.040
Token see
Token ID766
Feature activation-1.30e-02

Pos logit contributions
 harm+0.036
 exchange+0.036
 deliver+0.035
 Amen+0.035
give+0.035
Neg logit contributions
ashing-0.066
tub-0.062
angers-0.057
gan-0.057
pit-0.055
Token a
Token ID257
Feature activation+2.83e-02

Pos logit contributions
ashing+0.028
tub+0.026
gan+0.024
angers+0.024
ashed+0.023
Neg logit contributions
oc-0.018
 harm-0.018
 exchange-0.017
give-0.017
 Amen-0.017
Token wide
Token ID3094
Feature activation+8.79e-03

Pos logit contributions
oc+0.046
 Amen+0.045
 harm+0.045
 exchange+0.045
give+0.044
Neg logit contributions
ashing-0.052
tub-0.050
gan-0.045
angers-0.044
 Mil-0.043
Token,
Token ID11
Feature activation-5.78e-02

Pos logit contributions
oc+0.015
 exchange+0.014
 Amen+0.014
 harm+0.014
give+0.014
Neg logit contributions
ashing-0.016
tub-0.015
gan-0.014
angers-0.014
ashed-0.013
Token white
Token ID2330
Feature activation+2.72e-02

Pos logit contributions
ashing+0.104
tub+0.097
angers+0.090
gan+0.089
pit+0.084
Neg logit contributions
oc-0.098
 Amen-0.097
 harm-0.097
 exchange-0.096
 deliver-0.095
Token sock
Token ID32263
Feature activation+1.59e-02

Pos logit contributions
oc+0.040
 Amen+0.040
 harm+0.040
 exchange+0.039
give+0.039
Neg logit contributions
ashing-0.054
tub-0.052
gan-0.047
angers-0.046
 Mil-0.045
Token.
Token ID13
Feature activation+7.00e-03

Pos logit contributions
oc+0.019
 harm+0.018
 exchange+0.018
give+0.017
 deliver+0.017
Neg logit contributions
ashing-0.037
tub-0.034
gan-0.032
angers-0.031
 upon-0.031
Token She
Token ID1375
Feature activation-1.02e-03

Pos logit contributions
 mail+0.009
oc+0.009
 exchange+0.009
 deliver+0.008
'.+0.008
Neg logit contributions
ashing-0.016
cast-0.015
tub-0.015
pit-0.014
angers-0.013
Token was
Token ID373
Feature activation+5.99e-03

Pos logit contributions
ashing+0.002
tub+0.002
angers+0.002
ancer+0.002
 spine+0.002
Neg logit contributions
oc-0.001
 exchange-0.001
e-0.001
give-0.001
 deliver-0.001
Token so
Token ID523
Feature activation-2.34e-03

Pos logit contributions
e+0.009
 deliver+0.008
 mail+0.008
 harm+0.008
 Amen+0.008
Neg logit contributions
ashing-0.013
tub-0.012
angers-0.011
ancer-0.011
cast-0.011
Token happy
Token ID3772
Feature activation-1.77e-02

Pos logit contributions
angers+0.005
tub+0.005
ashing+0.005
pit+0.005
ancer+0.005
Neg logit contributions
e-0.003
 mail-0.003
 value-0.003
 respect-0.003
 deliver-0.003
Token and
Token ID290
Feature activation-1.65e-02

Pos logit contributions
tub+0.030
angers+0.029
ashing+0.029
gan+0.028
pit+0.028
Neg logit contributions
 harm-0.015
 respect-0.014
 control-0.012
 return-0.012
 reply-0.012
Token cried
Token ID16896
Feature activation+8.87e-03

Pos logit contributions
angers+0.037
ashing+0.037
tub+0.037
pit+0.035
ancer+0.033
Neg logit contributions
 Amen-0.022
 harm-0.021
 deliver-0.020
 comfort-0.020
 steal-0.020
Token.
Token ID13
Feature activation+8.57e-03

Pos logit contributions
 harm+0.012
oc+0.012
 defeat+0.011
 respect+0.011
 deliver+0.011
Neg logit contributions
ashing-0.018
tub-0.018
pit-0.017
angers-0.016
gan-0.016
Token She
Token ID1375
Feature activation+4.45e-03

Pos logit contributions
oc+0.012
give+0.012
 Amen+0.011
 word+0.011
Will+0.011
Neg logit contributions
ashing-0.018
tub-0.017
 thirsty-0.016
ancer-0.015
cast-0.015
Token didn
Token ID1422
Feature activation+3.16e-02

Pos logit contributions
oc+0.008
 Amen+0.008
 harm+0.008
 exchange+0.008
give+0.008
Neg logit contributions
ashing-0.008
tub-0.007
gan-0.006
angers-0.006
 Mil-0.006
Token't
Token ID470
Feature activation+5.66e-02

Pos logit contributions
oc+0.065
 exchange+0.057
 harm+0.056
 Amen+0.056
 envelop+0.056
Neg logit contributions
tub-0.046
ashing-0.043
gan-0.042
angers-0.040
 Mil-0.040
Token want
Token ID765
Feature activation+5.08e-02

Pos logit contributions
oc+0.080
 Amen+0.078
give+0.077
 exchange+0.077
 harm+0.077
Neg logit contributions
ashing-0.118
tub-0.112
gan-0.102
angers-0.100
 Mil-0.099
Token Boo
Token ID21458
Feature activation+1.19e-02

Pos logit contributions
 harm+0.075
give+0.069
 exchange+0.068
 respect+0.067
 Amen+0.065
Neg logit contributions
ashing-0.109
tub-0.105
pit-0.094
angers-0.094
gan-0.092
Token to
Token ID284
Feature activation+6.05e-02

Pos logit contributions
 harm+0.017
 exchange+0.016
oc+0.016
e+0.016
 respect+0.016
Neg logit contributions
tub-0.024
ashing-0.024
angers-0.022
pit-0.021
 Mil-0.021
Token leave
Token ID2666
Feature activation-2.53e-02

Pos logit contributions
oc+0.083
 Amen+0.081
give+0.079
 exchange+0.079
 harm+0.078
Neg logit contributions
ashing-0.128
tub-0.121
gan-0.113
 Mil-0.109
 thirsty-0.108
Token.
Token ID13
Feature activation+4.44e-04

Pos logit contributions
ashing+0.054
tub+0.050
angers+0.046
gan+0.045
ashed+0.044
Neg logit contributions
 harm-0.038
 exchange-0.035
oc-0.035
give-0.035
 Amen-0.034
Token thinking
Token ID3612
Feature activation+8.84e-04

Pos logit contributions
ashing+0.030
tub+0.028
 thirsty+0.025
 Mil+0.025
cast+0.025
Neg logit contributions
oc-0.038
 Amen-0.037
 harm-0.036
 exchange-0.036
give-0.036
Token about
Token ID546
Feature activation+4.81e-03

Pos logit contributions
 harm+0.001
oc+0.001
 exchange+0.001
give+0.001
 defeat+0.001
Neg logit contributions
ashing-0.002
tub-0.002
gan-0.002
angers-0.002
pit-0.002
Token the
Token ID262
Feature activation-4.31e-02

Pos logit contributions
oc+0.007
 harm+0.007
 Amen+0.007
 exchange+0.007
give+0.007
Neg logit contributions
ashing-0.009
tub-0.009
gan-0.008
angers-0.008
 Mil-0.008
Token little
Token ID1310
Feature activation-1.31e-02

Pos logit contributions
ashing+0.085
tub+0.081
gan+0.074
angers+0.073
 Mil+0.071
Neg logit contributions
oc-0.065
 Amen-0.064
 harm-0.064
 exchange-0.063
give-0.063
Token bird
Token ID6512
Feature activation-4.96e-03

Pos logit contributions
ashing+0.028
tub+0.026
angers+0.024
gan+0.024
 Mil+0.023
Neg logit contributions
 harm-0.018
oc-0.018
 exchange-0.018
 Amen-0.018
give-0.017
Token's
Token ID338
Feature activation-2.02e-02

Pos logit contributions
ashing+0.009
tub+0.009
angers+0.008
gan+0.008
 Mil+0.008
Neg logit contributions
oc-0.008
 harm-0.008
 exchange-0.008
 Amen-0.007
 deliver-0.007
Token words
Token ID2456
Feature activation+5.50e-02

Pos logit contributions
ashing+0.035
tub+0.034
 upon+0.030
 thirsty+0.030
cast+0.029
Neg logit contributions
 Amen-0.035
oc-0.035
give-0.034
 exchange-0.033
 harm-0.033
Token.
Token ID13
Feature activation-6.83e-03

Pos logit contributions
 harm+0.085
oc+0.085
 exchange+0.084
 Amen+0.083
give+0.082
Neg logit contributions
ashing-0.106
tub-0.100
gan-0.091
angers-0.091
 Mil-0.088
Token
Token ID220
Feature activation-4.29e-02

Pos logit contributions
ashing+0.015
tub+0.014
pit+0.013
angers+0.013
ashed+0.013
Neg logit contributions
oc-0.010
 Amen-0.009
 exchange-0.009
give-0.008
 deliver-0.008
Token
Token ID198
Feature activation+2.49e-02

Pos logit contributions
ashing+0.099
tub+0.092
pit+0.088
ashed+0.087
 thirsty+0.085
Neg logit contributions
 exchange-0.050
oc-0.049
give-0.048
 harm-0.045
rum-0.045
Token
Token ID198
Feature activation+2.76e-02

Pos logit contributions
 harm+0.042
 Amen+0.042
oc+0.041
 exchange+0.041
give+0.041
Neg logit contributions
ashing-0.044
tub-0.042
gan-0.038
angers-0.037
 Mil-0.037
Token I
Token ID314
Feature activation+3.49e-02

Pos logit contributions
ashing+0.000
tub+0.000
gan+0.000
angers+0.000
 Mil+0.000
Neg logit contributions
 harm-0.000
oc-0.000
 exchange-0.000
 Amen-0.000
give-0.000
Token want
Token ID765
Feature activation+6.00e-02

Pos logit contributions
 Amen+0.058
 harm+0.056
 exchange+0.055
give+0.055
izer+0.055
Neg logit contributions
ashing-0.063
tub-0.058
 upon-0.052
 thirsty-0.052
gan-0.052
Token you
Token ID345
Feature activation+2.52e-02

Pos logit contributions
oc+0.111
 Amen+0.110
 harm+0.104
 exchange+0.104
give+0.102
Neg logit contributions
ashing-0.098
tub-0.095
gan-0.084
angers-0.083
 Mil-0.082
Token to
Token ID284
Feature activation+5.91e-02

Pos logit contributions
 harm+0.036
 exchange+0.036
 respect+0.035
give+0.035
 deliver+0.034
Neg logit contributions
ashing-0.051
tub-0.049
angers-0.046
 Mil-0.045
pit-0.044
Token be
Token ID307
Feature activation-1.01e-02

Pos logit contributions
oc+0.095
 Amen+0.092
ez+0.085
rum+0.085
give+0.083
Neg logit contributions
tub-0.100
ashing-0.099
gan-0.093
 Mil-0.093
 thirsty-0.091
Token happy
Token ID3772
Feature activation-8.82e-04

Pos logit contributions
ashing+0.019
tub+0.018
angers+0.016
gan+0.016
 Mil+0.016
Neg logit contributions
 Amen-0.017
 harm-0.016
oc-0.016
 exchange-0.016
give-0.016
Token.
Token ID13
Feature activation+3.62e-03

Pos logit contributions
ashing+0.002
tub+0.002
gan+0.002
angers+0.002
 Mil+0.002
Neg logit contributions
 harm-0.001
 Amen-0.001
oc-0.001
give-0.001
 exchange-0.001
Token But
Token ID887
Feature activation-2.59e-03

Pos logit contributions
 harm+0.007
oc+0.007
 exchange+0.007
give+0.007
 defeat+0.006
Neg logit contributions
ashing-0.005
tub-0.005
gan-0.005
 Mil-0.005
angers-0.005
Token I
Token ID314
Feature activation+3.72e-02

Pos logit contributions
ashing+0.006
tub+0.005
angers+0.005
ashed+0.005
cast+0.005
Neg logit contributions
 exchange-0.004
give-0.003
 harm-0.003
 deliver-0.003
 Amen-0.003
Token can
Token ID460
Feature activation+5.15e-02

Pos logit contributions
 Amen+0.059
oc+0.058
 harm+0.058
 exchange+0.057
give+0.057
Neg logit contributions
ashing-0.071
tub-0.067
gan-0.060
 upon-0.058
 Mil-0.058
Token not
Token ID407
Feature activation+5.35e-02

Pos logit contributions
oc+0.093
izer+0.082
rum+0.082
ez+0.081
 Amen+0.079
Neg logit contributions
tub-0.082
ashing-0.082
gan-0.079
 upon-0.074
 Mil-0.074
Token,
Token ID11
Feature activation-8.77e-02

Pos logit contributions
ashing+0.159
tub+0.149
angers+0.146
ashed+0.133
cast+0.133
Neg logit contributions
oc-0.098
 exchange-0.093
e-0.092
 Amen-0.089
rum-0.089
Token there
Token ID612
Feature activation-5.74e-02

Pos logit contributions
ashing+0.144
angers+0.136
tub+0.129
cast+0.121
pit+0.116
Neg logit contributions
 exchange-0.167
 Amen-0.165
oc-0.161
e-0.158
give-0.157
Token was
Token ID373
Feature activation+3.22e-03

Pos logit contributions
ashing+0.122
tub+0.114
angers+0.108
gan+0.104
cast+0.101
Neg logit contributions
oc-0.081
 exchange-0.079
 harm-0.078
give-0.077
 Amen-0.076
Token a
Token ID257
Feature activation+5.26e-02

Pos logit contributions
oc+0.005
 harm+0.005
 Amen+0.005
 exchange+0.005
give+0.005
Neg logit contributions
ashing-0.006
tub-0.006
gan-0.005
angers-0.005
 Mil-0.005
Token girl
Token ID2576
Feature activation+1.30e-02

Pos logit contributions
oc+0.089
 Amen+0.088
 harm+0.086
 exchange+0.086
give+0.084
Neg logit contributions
ashing-0.094
tub-0.089
angers-0.081
gan-0.081
pit-0.077
Token named
Token ID3706
Feature activation-2.59e-02

Pos logit contributions
give+0.024
 harm+0.024
 prove+0.024
 shave+0.024
rum+0.024
Neg logit contributions
ashing-0.021
tub-0.018
 thirsty-0.017
 Mil-0.016
gan-0.016
Token Lily
Token ID20037
Feature activation+1.74e-02

Pos logit contributions
ashing+0.038
tub+0.038
cast+0.037
pit+0.035
angers+0.033
Neg logit contributions
 exchange-0.050
 mail-0.048
give-0.048
 Amen-0.048
 deliver-0.047
Token.
Token ID13
Feature activation+1.82e-02

Pos logit contributions
give+0.033
 prove+0.032
 snatch+0.032
 sentence+0.031
 Amen+0.031
Neg logit contributions
ashing-0.028
tub-0.024
 upon-0.023
 thirsty-0.022
 bored-0.021
Token She
Token ID1375
Feature activation+1.12e-02

Pos logit contributions
 exchange+0.029
 Amen+0.028
 mail+0.026
oc+0.026
give+0.026
Neg logit contributions
ashing-0.032
cast-0.031
tub-0.030
gan-0.029
angers-0.028
Token was
Token ID373
Feature activation+1.69e-02

Pos logit contributions
oc+0.020
 Amen+0.020
 harm+0.020
give+0.020
 exchange+0.020
Neg logit contributions
ashing-0.019
tub-0.018
angers-0.016
gan-0.016
pit-0.015
Token very
Token ID845
Feature activation+8.01e-03

Pos logit contributions
 deliver+0.025
 mail+0.025
 Amen+0.025
e+0.024
 harm+0.024
Neg logit contributions
ashing-0.033
tub-0.033
pit-0.032
cast-0.032
angers-0.030
Token,
Token ID11
Feature activation-9.53e-02

Pos logit contributions
ashing+0.198
tub+0.182
cast+0.168
udd+0.163
pit+0.162
Neg logit contributions
e-0.073
 exchange-0.072
 respect-0.071
 mail-0.067
give-0.065
Token Anna
Token ID11735
Feature activation-1.61e-02

Pos logit contributions
ashing+0.207
tub+0.177
udd+0.158
ashed+0.155
cast+0.155
Neg logit contributions
 exchange-0.150
 mail-0.143
 respect-0.142
 deliver-0.141
 stamp-0.140
Token goes
Token ID2925
Feature activation-3.21e-02

Pos logit contributions
ashing+0.034
tub+0.032
gan+0.030
angers+0.030
pit+0.029
Neg logit contributions
oc-0.022
 harm-0.022
 exchange-0.022
give-0.021
 Amen-0.021
Token to
Token ID284
Feature activation+2.45e-02

Pos logit contributions
ashing+0.070
tub+0.069
gan+0.066
 Mil+0.064
pit+0.063
Neg logit contributions
give-0.037
 exchange-0.037
 harm-0.037
 deliver-0.037
 mail-0.034
Token the
Token ID262
Feature activation-4.65e-02

Pos logit contributions
oc+0.044
 Amen+0.043
give+0.042
 harm+0.042
 exchange+0.041
Neg logit contributions
ashing-0.040
tub-0.039
 Mil-0.035
gan-0.035
 upon-0.034
Token park
Token ID3952
Feature activation-5.77e-02

Pos logit contributions
ashing+0.082
tub+0.076
gan+0.070
angers+0.069
 Mil+0.067
Neg logit contributions
 harm-0.079
 exchange-0.079
oc-0.079
 Amen-0.079
give-0.078
Token with
Token ID351
Feature activation+7.80e-03

Pos logit contributions
ashing+0.136
tub+0.132
angers+0.125
gan+0.123
cast+0.120
Neg logit contributions
 exchange-0.062
oc-0.059
 deliver-0.059
 harm-0.057
give-0.056
Token her
Token ID607
Feature activation-2.23e-02

Pos logit contributions
oc+0.012
 Amen+0.012
 harm+0.012
 exchange+0.012
give+0.012
Neg logit contributions
ashing-0.015
tub-0.015
gan-0.013
angers-0.013
 Mil-0.013
Token mom
Token ID1995
Feature activation+2.92e-02

Pos logit contributions
ashing+0.039
tub+0.036
gan+0.033
angers+0.032
 Mil+0.031
Neg logit contributions
 harm-0.039
oc-0.038
 Amen-0.038
 exchange-0.038
give-0.038
Token.
Token ID13
Feature activation+9.11e-03

Pos logit contributions
oc+0.039
 Amen+0.039
 harm+0.039
give+0.038
 exchange+0.037
Neg logit contributions
ashing-0.063
tub-0.060
gan-0.055
angers-0.054
 Mil-0.053
Token She
Token ID1375
Feature activation-5.46e-03

Pos logit contributions
 exchange+0.015
 Amen+0.015
 deliver+0.014
 mail+0.014
 sentence+0.014
Neg logit contributions
ashing-0.016
tub-0.015
cast-0.014
pit-0.014
gan-0.013
Token\n
Token ID198
Feature activation+3.44e-02

Pos logit contributions
 harm+0.068
 Amen+0.067
give+0.066
 snatch+0.064
oc+0.064
Neg logit contributions
ashing-0.044
tub-0.042
 upon-0.038
 Mil-0.034
gan-0.034
TokenTom
Token ID13787
Feature activation+1.21e-02

Pos logit contributions
 harm+0.061
 Amen+0.059
oc+0.055
 exchange+0.055
give+0.055
Neg logit contributions
ashing-0.063
tub-0.058
gan-0.053
 upon-0.052
angers-0.050
Token started
Token ID2067
Feature activation-3.38e-02

Pos logit contributions
oc+0.015
 exchange+0.014
e+0.014
 mail+0.013
 deliver+0.013
Neg logit contributions
tub-0.028
ashing-0.027
angers-0.027
ancer-0.025
 Mil-0.024
Token to
Token ID284
Feature activation+3.60e-02

Pos logit contributions
ashing+0.075
tub+0.074
gan+0.068
angers+0.067
ancer+0.066
Neg logit contributions
 harm-0.042
 exchange-0.040
 deliver-0.039
 shave-0.039
give-0.039
Token skip
Token ID14267
Feature activation-2.15e-02

Pos logit contributions
 exchange+0.047
 harm+0.047
 deliver+0.046
 shave+0.046
 Amen+0.044
Neg logit contributions
ashing-0.081
tub-0.077
angers-0.073
gan-0.069
ashed-0.069
Token and
Token ID290
Feature activation-2.98e-02

Pos logit contributions
ashing+0.049
angers+0.045
gan+0.045
ancer+0.045
tub+0.044
Neg logit contributions
 exchange-0.023
 control-0.022
 fright-0.021
 harm-0.021
 defeat-0.020
Token play
Token ID711
Feature activation-1.59e-02

Pos logit contributions
ashing+0.071
tub+0.068
angers+0.067
gan+0.061
ancer+0.060
Neg logit contributions
 exchange-0.033
e-0.033
 deliver-0.032
 steal-0.032
 harm-0.032
Token while
Token ID981
Feature activation-5.39e-02

Pos logit contributions
ashing+0.034
tub+0.031
gan+0.031
angers+0.029
ancer+0.028
Neg logit contributions
oc-0.022
 harm-0.021
e-0.020
give-0.020
 deliver-0.020
Token his
Token ID465
Feature activation-3.95e-02

Pos logit contributions
ashing+0.103
tub+0.091
ashed+0.084
udd+0.083
gan+0.081
Neg logit contributions
 harm-0.090
 exchange-0.087
 deliver-0.086
e-0.084
give-0.084
Token mom
Token ID1995
Feature activation+2.33e-02

Pos logit contributions
ashing+0.081
tub+0.076
angers+0.069
cast+0.069
 Mil+0.068
Neg logit contributions
 harm-0.057
 letter-0.054
give-0.053
 message-0.053
 exchange-0.052
Tokenmy
Token ID1820
Feature activation+1.86e-02

Pos logit contributions
 harm+0.032
e+0.031
oc+0.031
 exchange+0.031
 deliver+0.030
Neg logit contributions
tub-0.047
ashing-0.045
angers-0.041
cast-0.040
 Mil-0.040
Token.
Token ID13
Feature activation-1.21e-02

Pos logit contributions
ashing+0.031
tub+0.029
gan+0.027
angers+0.027
 Mil+0.026
Neg logit contributions
 exchange-0.018
 harm-0.018
oc-0.018
 Amen-0.018
give-0.018
Token<|endoftext|>
Token ID50256
Feature activation-2.52e-02

Pos logit contributions
ashing+0.027
tub+0.025
cast+0.023
pit+0.022
angers+0.022
Neg logit contributions
 Amen-0.017
oc-0.017
 exchange-0.017
give-0.015
'.-0.014
TokenOnce
Token ID7454
Feature activation-1.16e-01

Pos logit contributions
ashing+0.041
tub+0.039
gan+0.035
angers+0.034
 Mil+0.033
Neg logit contributions
oc-0.046
 Amen-0.046
 harm-0.046
 exchange-0.045
give-0.045
Token upon
Token ID2402
Feature activation-3.82e-02

Pos logit contributions
ashing+0.137
angers+0.127
tub+0.125
drops+0.114
pit+0.105
Neg logit contributions
 exchange-0.253
 harm-0.238
 letter-0.235
 Amen-0.233
oc-0.227
Token a
Token ID257
Feature activation+1.56e-02

Pos logit contributions
ashing+0.058
tub+0.058
 upon+0.053
gan+0.052
 thirsty+0.051
Neg logit contributions
oc-0.072
 harm-0.071
 Amen-0.069
give-0.069
 exchange-0.069
Token time
Token ID640
Feature activation-8.35e-02

Pos logit contributions
oc+0.022
 Amen+0.022
 harm+0.022
 exchange+0.022
give+0.021
Neg logit contributions
ashing-0.032
tub-0.031
angers-0.028
gan-0.028
 Mil-0.027
Token there
Token ID612
Feature activation-6.39e-02

Pos logit contributions
ashing+0.211
angers+0.203
tub+0.196
ashed+0.188
 prett+0.183
Neg logit contributions
oc-0.078
e-0.075
 exchange-0.071
 fright-0.065
rum-0.065
Token was
Token ID373
Feature activation-5.07e-03

Pos logit contributions
ashing+0.158
angers+0.143
tub+0.140
gan+0.132
 spine+0.129
Neg logit contributions
oc-0.076
 exchange-0.073
 harm-0.070
e-0.068
give-0.067
Token a
Token ID257
Feature activation+4.72e-02

Pos logit contributions
ashing+0.011
tub+0.011
angers+0.010
pit+0.010
cast+0.010
Neg logit contributions
 Amen-0.006
 exchange-0.006
e-0.006
give-0.006
oc-0.006
Token small
Token ID1402
Feature activation+2.88e-02

Pos logit contributions
oc+0.077
 Amen+0.075
 exchange+0.072
 harm+0.072
 deliver+0.070
Neg logit contributions
ashing-0.088
tub-0.084
angers-0.078
gan-0.076
pit-0.074
Token creature
Token ID7185
Feature activation+1.32e-02

Pos logit contributions
oc+0.042
 Amen+0.039
 letter+0.039
 harm+0.038
 exchange+0.038
Neg logit contributions
ashing-0.060
tub-0.056
angers-0.054
gan-0.053
cast-0.052
Token<|endoftext|>
Token ID50256
Feature activation-2.34e-02

Pos logit contributions
ashing+0.013
tub+0.013
angers+0.012
cast+0.012
pit+0.012
Neg logit contributions
 exchange-0.009
oc-0.008
 Amen-0.008
give-0.008
'.-0.008
TokenOnce
Token ID7454
Feature activation-1.16e-01

Pos logit contributions
ashing+0.041
tub+0.038
gan+0.035
angers+0.034
 Mil+0.033
Neg logit contributions
oc-0.041
 exchange-0.040
 Amen-0.040
give-0.040
 harm-0.040
Token upon
Token ID2402
Feature activation-4.46e-02

Pos logit contributions
ashing+0.135
tub+0.126
angers+0.124
drops+0.106
pit+0.103
Neg logit contributions
 exchange-0.258
 harm-0.249
 Amen-0.235
 letter-0.234
oc-0.234
Token a
Token ID257
Feature activation+1.52e-02

Pos logit contributions
ashing+0.071
tub+0.070
 upon+0.063
gan+0.062
 thirsty+0.061
Neg logit contributions
oc-0.081
 harm-0.080
 Amen-0.078
give-0.078
 exchange-0.078
Token time
Token ID640
Feature activation-7.87e-02

Pos logit contributions
oc+0.022
 Amen+0.022
 harm+0.022
 exchange+0.022
give+0.022
Neg logit contributions
ashing-0.030
tub-0.029
gan-0.026
angers-0.026
 Mil-0.025
Token there
Token ID612
Feature activation-6.69e-02

Pos logit contributions
ashing+0.206
angers+0.196
tub+0.188
 prett+0.183
ashed+0.183
Neg logit contributions
oc-0.071
e-0.066
 exchange-0.062
 fright-0.061
�-0.059
Token was
Token ID373
Feature activation-3.89e-03

Pos logit contributions
ashing+0.163
angers+0.148
tub+0.146
gan+0.136
cast+0.133
Neg logit contributions
oc-0.082
 exchange-0.078
 harm-0.075
e-0.074
give-0.073
Token a
Token ID257
Feature activation+4.77e-02

Pos logit contributions
ashing+0.008
tub+0.008
angers+0.007
gan+0.007
pit+0.007
Neg logit contributions
 Amen-0.005
 exchange-0.005
oc-0.005
 harm-0.005
give-0.005
Token man
Token ID582
Feature activation+2.63e-02

Pos logit contributions
oc+0.079
 Amen+0.078
 harm+0.075
 exchange+0.075
give+0.073
Neg logit contributions
ashing-0.087
tub-0.083
angers-0.077
gan-0.075
pit-0.073
Token who
Token ID508
Feature activation-2.76e-03

Pos logit contributions
give+0.047
 Amen+0.046
 harm+0.045
 owe+0.044
 exchange+0.044
Neg logit contributions
ashing-0.045
tub-0.040
 thirsty-0.037
 Mil-0.035
gan-0.035
Token was
Token ID373
Feature activation+1.64e-02

Pos logit contributions
ashing+0.005
tub+0.004
angers+0.004
gan+0.004
pit+0.004
Neg logit contributions
oc-0.005
 Amen-0.005
 harm-0.005
 exchange-0.005
give-0.005
Token very
Token ID845
Feature activation-1.24e-03

Pos logit contributions
 harm+0.014
 Amen+0.014
 exchange+0.014
give+0.014
oc+0.014
Neg logit contributions
ashing-0.013
tub-0.013
angers-0.012
gan-0.011
pit-0.011
Token frustrated
Token ID14718
Feature activation-1.54e-03

Pos logit contributions
ashing+0.002
tub+0.002
angers+0.002
pit+0.002
gan+0.002
Neg logit contributions
 harm-0.002
 exchange-0.002
oc-0.002
 deliver-0.002
 Amen-0.002
Token and
Token ID290
Feature activation-1.60e-02

Pos logit contributions
ashing+0.004
tub+0.003
gan+0.003
angers+0.003
 Mil+0.003
Neg logit contributions
 harm-0.002
oc-0.002
 exchange-0.002
give-0.002
 Amen-0.002
Token this
Token ID428
Feature activation-1.65e-02

Pos logit contributions
ashing+0.030
tub+0.029
angers+0.028
pit+0.027
gan+0.025
Neg logit contributions
 harm-0.027
 Amen-0.026
 steal-0.026
 deliver-0.026
 exchange-0.025
Token time
Token ID640
Feature activation-4.38e-02

Pos logit contributions
ashing+0.028
tub+0.027
gan+0.024
 upon+0.023
 Mil+0.023
Neg logit contributions
 Amen-0.030
oc-0.029
give-0.028
 harm-0.028
 exchange-0.028
Token,
Token ID11
Feature activation-8.25e-02

Pos logit contributions
ashing+0.090
tub+0.081
angers+0.077
gan+0.075
pit+0.074
Neg logit contributions
 exchange-0.065
 harm-0.065
oc-0.063
give-0.063
 deliver-0.062
Token he
Token ID339
Feature activation-1.65e-02

Pos logit contributions
ashing+0.165
tub+0.152
angers+0.144
gan+0.143
ashed+0.136
Neg logit contributions
 harm-0.129
 exchange-0.129
oc-0.128
give-0.123
 deliver-0.122
Token was
Token ID373
Feature activation+2.11e-03

Pos logit contributions
ashing+0.028
tub+0.027
angers+0.024
gan+0.024
 Mil+0.023
Neg logit contributions
oc-0.029
 harm-0.029
 Amen-0.029
 exchange-0.029
give-0.029
Token angry
Token ID7954
Feature activation+8.72e-03

Pos logit contributions
 harm+0.004
give+0.003
 Amen+0.003
 exchange+0.003
 deliver+0.003
Neg logit contributions
ashing-0.004
tub-0.004
angers-0.003
cast-0.003
pit-0.003
Token too
Token ID1165
Feature activation+5.87e-03

Pos logit contributions
 harm+0.009
oc+0.009
give+0.009
 exchange+0.008
 Amen+0.008
Neg logit contributions
ashing-0.021
tub-0.020
gan-0.019
angers-0.019
 thirsty-0.018
Token.
Token ID13
Feature activation+3.41e-04

Pos logit contributions
 Amen+0.009
give+0.008
oc+0.008
 exchange+0.008
 harm+0.008
Neg logit contributions
ashing-0.012
tub-0.011
gan-0.010
 thirsty-0.010
 bored-0.010
Token some
Token ID617
Feature activation+2.92e-02

Pos logit contributions
ashing+0.010
tub+0.009
angers+0.009
gan+0.009
pit+0.008
Neg logit contributions
 exchange-0.006
 harm-0.006
give-0.006
oc-0.006
 Amen-0.006
Token pine
Token ID20161
Feature activation-1.30e-02

Pos logit contributions
oc+0.051
 harm+0.051
 Amen+0.050
 exchange+0.050
give+0.050
Neg logit contributions
ashing-0.050
tub-0.048
gan-0.043
angers-0.042
 Mil-0.041
Token cones
Token ID47314
Feature activation+4.18e-02

Pos logit contributions
ashing+0.029
tub+0.026
gan+0.025
 thirsty+0.024
 upon+0.024
Neg logit contributions
oc-0.018
 exchange-0.017
give-0.016
 harm-0.016
 deliver-0.016
Token on
Token ID319
Feature activation+2.31e-02

Pos logit contributions
 exchange+0.054
oc+0.052
 deliver+0.050
 harm+0.050
give+0.049
Neg logit contributions
ashing-0.089
tub-0.085
angers-0.078
 Mil-0.078
gan-0.077
Token the
Token ID262
Feature activation-6.33e-02

Pos logit contributions
 Amen+0.047
oc+0.045
 harm+0.044
rum+0.043
 owe+0.042
Neg logit contributions
tub-0.034
ashing-0.031
 Mil-0.031
gan-0.030
 Jill-0.028
Token top
Token ID1353
Feature activation-7.18e-02

Pos logit contributions
ashing+0.113
tub+0.107
gan+0.095
angers+0.094
 Mil+0.092
Neg logit contributions
oc-0.107
 harm-0.107
 Amen-0.106
 exchange-0.106
give-0.105
Token to
Token ID284
Feature activation+1.96e-02

Pos logit contributions
ashing+0.150
tub+0.146
gan+0.132
angers+0.130
pit+0.129
Neg logit contributions
 Amen-0.101
 harm-0.097
give-0.097
oc-0.096
 exchange-0.093
Token make
Token ID787
Feature activation+3.17e-02

Pos logit contributions
 Amen+0.031
give+0.029
oc+0.028
 harm+0.028
rum+0.028
Neg logit contributions
tub-0.035
ashing-0.035
 Mil-0.035
gan-0.034
 Jill-0.032
Token a
Token ID257
Feature activation+3.08e-02

Pos logit contributions
oc+0.044
 harm+0.044
 exchange+0.044
give+0.043
 Amen+0.043
Neg logit contributions
ashing-0.067
tub-0.063
gan-0.057
angers-0.057
pit-0.055
Token point
Token ID966
Feature activation-8.09e-03

Pos logit contributions
 harm+0.053
 Amen+0.053
give+0.052
oc+0.052
 exchange+0.052
Neg logit contributions
ashing-0.054
tub-0.051
gan-0.045
angers-0.044
 Mil-0.044
Tokeny
Token ID88
Feature activation+3.37e-04

Pos logit contributions
ashing+0.017
tub+0.017
gan+0.015
angers+0.015
 Mil+0.015
Neg logit contributions
 harm-0.011
oc-0.011
 exchange-0.010
 Amen-0.010
give-0.010
Token They
Token ID1119
Feature activation+6.92e-03

Pos logit contributions
ashing+0.016
tub+0.014
angers+0.013
cast+0.013
pit+0.013
Neg logit contributions
 exchange-0.014
 Amen-0.014
oc-0.014
give-0.014
 deliver-0.013
Token were
Token ID547
Feature activation-1.09e-03

Pos logit contributions
oc+0.011
 exchange+0.011
 Amen+0.011
 harm+0.011
give+0.011
Neg logit contributions
ashing-0.013
tub-0.012
angers-0.011
gan-0.011
 Mil-0.010
Token chocolate
Token ID11311
Feature activation+2.03e-02

Pos logit contributions
ashing+0.002
tub+0.002
angers+0.002
gan+0.002
pit+0.002
Neg logit contributions
 Amen-0.002
 exchange-0.002
 harm-0.002
 deliver-0.002
oc-0.002
Token chip
Token ID11594
Feature activation+2.84e-02

Pos logit contributions
oc+0.029
 exchange+0.028
 harm+0.028
 Amen+0.028
give+0.028
Neg logit contributions
ashing-0.042
tub-0.040
angers-0.036
gan-0.036
 Mil-0.035
Token cookies
Token ID14746
Feature activation+4.29e-02

Pos logit contributions
oc+0.044
 harm+0.043
 exchange+0.043
give+0.042
 Amen+0.042
Neg logit contributions
ashing-0.055
tub-0.053
gan-0.048
angers-0.048
 Mil-0.046
Token,
Token ID11
Feature activation-8.51e-02

Pos logit contributions
oc+0.063
 harm+0.062
 Amen+0.062
 exchange+0.062
give+0.061
Neg logit contributions
ashing-0.085
tub-0.081
gan-0.074
angers-0.073
 Mil-0.072
Token Anna
Token ID11735
Feature activation-2.05e-03

Pos logit contributions
ashing+0.178
tub+0.170
gan+0.154
angers+0.153
 Mil+0.149
Neg logit contributions
oc-0.118
 Amen-0.117
 harm-0.117
 exchange-0.116
give-0.115
Token's
Token ID338
Feature activation+4.79e-03

Pos logit contributions
ashing+0.004
tub+0.004
angers+0.004
gan+0.004
 Mil+0.004
Neg logit contributions
oc-0.003
 exchange-0.003
 harm-0.003
 Amen-0.003
e-0.003
Token favorite
Token ID4004
Feature activation+3.93e-02

Pos logit contributions
oc+0.008
 Amen+0.008
 harm+0.008
 exchange+0.008
give+0.008
Neg logit contributions
ashing-0.008
tub-0.008
gan-0.007
angers-0.007
 Mil-0.007
Token.
Token ID13
Feature activation-3.05e-03

Pos logit contributions
 Amen+0.063
 harm+0.061
give+0.060
oc+0.060
izer+0.059
Neg logit contributions
ashing-0.071
tub-0.071
pit-0.064
gan-0.064
 Mil-0.061
Token "
Token ID366
Feature activation+2.73e-02

Pos logit contributions
ashing+0.005
tub+0.005
angers+0.004
cast+0.004
pit+0.004
Neg logit contributions
 exchange-0.006
 Amen-0.006
oc-0.006
give-0.005
 deliver-0.005
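Each record in this dump follows the same line layout:

- a `Token` line;
- a `Token ID` line;
- a `Feature activation` line;
- a `Pos logit contributions` list and a `Neg logit contributions` list, each entry written as `<token><signed value>` with no separator.

If you want to post-process the dump, a parser for that layout could look like the minimal Python sketch below. It assumes the layout holds throughout, as it does in this excerpt. It treats the text after the `Token` label verbatim, so a leading space is part of the token (` can` versus `Once`). It silently skips a record cut off at the start of the excerpt. A token whose own text begins with ` ID` would be misread. `TokenRecord`, `parse_dump`, `extremes`, and `SAMPLE` are hypothetical names for illustration, not part of any dashboard tool's API.

```python
import re
from dataclasses import dataclass, field

# A logit-contribution line is "<token text><signed decimal>", e.g. " Amen+0.092".
# The non-greedy group keeps any leading space that belongs to the token.
_LOGIT_RE = re.compile(r"^(.*?)([+-]\d*\.\d+(?:e[+-]?\d+)?)$")


@dataclass
class TokenRecord:
    """One per-token block of the dump (hypothetical container)."""
    token: str
    token_id: int = -1
    activation: float = 0.0
    pos_logits: list = field(default_factory=list)  # [(token, value), ...]
    neg_logits: list = field(default_factory=list)


def parse_dump(text: str) -> list:
    """Parse the Token / Token ID / Feature activation / logit layout.

    Lines before the first "Token" line (a record truncated by the excerpt)
    are ignored.
    """
    records = []
    section = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator lines carry no data
        if line.startswith("Token ID"):
            if records:
                records[-1].token_id = int(line[len("Token ID"):])
        elif line.startswith("Token"):
            # Everything after the "Token" label is the token text, verbatim.
            records.append(TokenRecord(token=line[len("Token"):]))
            section = None
        elif line.startswith("Feature activation"):
            if records:
                records[-1].activation = float(line[len("Feature activation"):])
        elif line.startswith("Pos logit contributions"):
            section = "pos"
        elif line.startswith("Neg logit contributions"):
            section = "neg"
        elif records and section is not None:
            m = _LOGIT_RE.match(line)
            if m:
                rec = records[-1]
                target = rec.pos_logits if section == "pos" else rec.neg_logits
                target.append((m.group(1), float(m.group(2))))
    return records


def extremes(records, k=3):
    """Return (k most positive, k most negative) records by feature activation."""
    ordered = sorted(records, key=lambda r: r.activation)
    return ordered[::-1][:k], ordered[:k]


# Two records copied from the dump above, used as a small usage example.
SAMPLE = """Token can
Token ID460
Feature activation+5.15e-02

Pos logit contributions
 Amen+0.059
oc+0.058
Neg logit contributions
ashing-0.071
tub-0.067
TokenOnce
Token ID7454
Feature activation-1.16e-01

Pos logit contributions
ashing+0.041
Neg logit contributions
oc-0.046
"""

if __name__ == "__main__":
    recs = parse_dump(SAMPLE)
    top, bottom = extremes(recs, k=1)
    print(repr(top[0].token), top[0].activation)        # ' can' 0.0515
    print(repr(bottom[0].token), bottom[0].activation)  # 'Once' -0.116
```

Keeping the token text verbatim, rather than stripping it, matters for this tokenizer: ` can` (ID 460) and `Once` (ID 7454, no leading space after `<|endoftext|>`) would otherwise be indistinguishable from their space-variant counterparts.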