TOP ACTIVATIONS
MAX = 9.587

is very pretty and tw irl s around in it .
. She walks and tw irl s and sings .
it did a few tw irl s in the air .
. Now she could tw irl and dance whenever she wanted
you can spin and tw irl with me forever ."
grass and watching them tw irl around each other . Annie
. She liked to tw irl and spin in circles .
leaves . With soft sn ores , Lucy began to drift
hands and watched them tw irl around and around .
old lady like a wh irl ig ig . His dancing
. She started to tw irl and spin until she landed
would go outside and tw irl and spin while the sun
couldn 't help but tw irl around shouting , " Look
delight , watching it tw irl across the floor .
started to dance and tw irl around in the sunshine .
skirt and loved to tw irl around in it . One
. She began to tw irl around , spinning the da
loved to dance and tw irl in her pretty skirt .
the red dress and tw irl s around .
day . She would tw irl and dance around each flower

MIN ACTIVATIONS
MIN = -10.821

grandma 's house for safe keeping . When
it to me for safe keeping . He told me to
. It was Mrs . Smith , a nice lady from
His name was Mr . Smith . One day
tall trees and ch ir ping birds . She walked around
bright blue birds ch ir ping in the trees and big
birds inside , ch ir ping loudly ! El isa smiled
when her neighbor , Mrs Smith , came over to talk
she saw birds ch ir ping , butterflies fl uttering and
My name is Mr . Smith . Would you like me
the sky , ch ir ping loudly . The
books and paper . Mrs Smith helped L ola to sit
her neighbor , Mrs . Smith . Mrs
y neighbor , Mrs . Smith , came over to say
that says " Mr . Smith ". He looks bored and
His teacher , Mrs . Smith , was very nice and
Hel ping others is the kind thing
said the teacher , Miss Smith . She smiled and waved
who was called Dr . Smith . At
, and birds ch ir ping . John

INTERVAL 4.485 - 9.587
CONTAINS 1.036%

The children climbed up , higher and higher until they could
, she found a box of matches . She smiled and
" We are le op ards , too . We
a bit shy , she nodded and he helped her put
rest and drink a lot of water . L

INTERVAL -0.617 - 4.485
CONTAINS 76.131%

the oldest kid spoke . " It 's time to go
little boy replied , " Yes !" and the game went
" Let 's go and see !" Tom said
she had done the task . Her Mom smiled
pressed them with her paws . K itty 's

INTERVAL -5.719 - -0.617
CONTAINS 22.647%

he was so excited to learn about soil .
. It was the best day ever !
back to the way they were before . He hoped that
He says , " Thank you , Lily and Tom .
warm sun . He liked playing in the backyard with his

INTERVAL -10.821 - -5.719
CONTAINS 0.186%

if we could ride that mushroom up to the clouds !"
travelled the tall vine brave ly ," he said .
were too boring and gir ly . He liked cars better
there was a girl named Claire . Claire was playing alone
puppy and said , " Hello , little puppy ! Do
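
NOTE ON THE INTERVALS
The four INTERVAL blocks above appear to be equal-width bins between MIN and MAX. The sketch below checks that reading against the listed edges and confirms that the CONTAINS percentages sum to 100. The equal-width assumption and the names (interval_edges, MIN_ACT, MAX_ACT, N_BINS, shares) are illustrative and are not taken from the dashboard itself.

```python
# Assumption: the dashboard splits [MIN, MAX] into four equal-width bins.
MIN_ACT = -10.821   # "MIN = -10.821" from the MIN ACTIVATIONS block
MAX_ACT = 9.587     # "MAX = 9.587" from the TOP ACTIVATIONS block
N_BINS = 4          # number of INTERVAL blocks listed


def interval_edges(lo, hi, n_bins):
    """Return the n_bins + 1 edges of equal-width bins, rounded to 3 decimals."""
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


edges = interval_edges(MIN_ACT, MAX_ACT, N_BINS)

# CONTAINS percentages, ordered from the lowest interval to the highest.
shares = [0.186, 22.647, 76.131, 1.036]
total = round(sum(shares), 3)

if __name__ == "__main__":
    for lo, hi, share in zip(edges[:-1], edges[1:], shares):
        print(f"INTERVAL {lo} - {hi}  CONTAINS {share}%")
    print("total:", total)
```

Under that assumption each bin is 5.102 wide, which reproduces the edges -5.719, -0.617 and 4.485 listed above.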
Token is
Token ID318
Feature activation+1.65e+00

Pos logit contributions
int+0.004
ores+0.004
ite+0.004
ours+0.004
osh+0.004
Neg logit contributions
keeping-0.008
 heads-0.006
ping-0.006
makers-0.006
 arrest-0.006
Token very
Token ID845
Feature activation-2.34e-01

Pos logit contributions
int+0.117
ite+0.115
osh+0.114
ours+0.111
ig+0.110
Neg logit contributions
keeping-0.207
ping-0.151
 heads-0.146
 arrest-0.141
makers-0.140
Token pretty
Token ID2495
Feature activation-3.24e-01

Pos logit contributions
keeping+0.025
ping+0.017
 heads+0.017
cious+0.016
makers+0.016
Neg logit contributions
int-0.020
osh-0.020
ite-0.020
ours-0.020
ver-0.020
Token and
Token ID290
Feature activation+3.66e-03

Pos logit contributions
keeping+0.042
ping+0.030
 arrest+0.030
 heads+0.029
makers+0.029
Neg logit contributions
ite-0.021
osh-0.021
int-0.021
ours-0.021
ig-0.020
Token tw
Token ID665
Feature activation+5.57e-01

Pos logit contributions
int+0.000
ig+0.000
ite+0.000
umber+0.000
ver+0.000
Neg logit contributions
keeping-0.000
 heads-0.000
ping-0.000
cious-0.000
 arrest-0.000
Tokenirl
Token ID1901
Feature activation+9.56e+00

Pos logit contributions
osh+0.024
ver+0.024
ite+0.023
int+0.022
ores+0.020
Neg logit contributions
keeping-0.081
ping-0.065
 heads-0.064
 arrest-0.061
cious-0.060
Tokens
Token ID82
Feature activation-1.20e+00

Pos logit contributions
int+0.695
ours+0.673
osh+0.655
ite+0.652
ig+0.652
Neg logit contributions
keeping-1.161
ping-0.860
 heads-0.824
makers-0.798
 arrest-0.771
Token around
Token ID1088
Feature activation+1.16e+00

Pos logit contributions
keeping+0.155
ping+0.111
 arrest+0.110
 heads+0.107
makers+0.106
Neg logit contributions
osh-0.080
ite-0.079
ig-0.078
int-0.077
ver-0.077
Token in
Token ID287
Feature activation-1.34e+00

Pos logit contributions
 of+0.049
 at+0.043
ores+0.039
 exactly+0.036
?+0.036
Neg logit contributions
keeping-0.181
 arrest-0.140
 mant-0.136
cious-0.136
ping-0.135
Token it
Token ID340
Feature activation+9.80e-02

Pos logit contributions
keeping+0.192
ping+0.146
makers+0.144
edly+0.141
cious+0.139
Neg logit contributions
 of-0.068
osh-0.063
ila-0.060
 exactly-0.058
ores-0.058
Token.
Token ID13
Feature activation+8.18e-03

Pos logit contributions
osh+0.007
ite+0.007
int+0.006
ig+0.006
ours+0.006
Neg logit contributions
keeping-0.013
ping-0.009
 arrest-0.009
 heads-0.009
makers-0.009
Token.
Token ID13
Feature activation-1.07e-03

Pos logit contributions
keeping+0.369
ping+0.266
 heads+0.259
 arrest+0.254
 mant+0.250
Neg logit contributions
ite-0.197
osh-0.195
ig-0.194
int-0.192
ores-0.189
Token She
Token ID1375
Feature activation+2.55e-01

Pos logit contributions
keeping+0.000
 heads+0.000
ping+0.000
 arrest+0.000
makers+0.000
Neg logit contributions
ig-0.000
int-0.000
osh-0.000
ite-0.000
ver-0.000
Token walks
Token ID11114
Feature activation-1.14e+00

Pos logit contributions
ite+0.020
ver+0.020
ores+0.020
int+0.020
itting+0.020
Neg logit contributions
keeping-0.027
 heads-0.021
 arrest-0.020
ping-0.019
 mant-0.018
Token and
Token ID290
Feature activation+1.56e-02

Pos logit contributions
keeping+0.144
ping+0.106
 arrest+0.103
 heads+0.101
 mant+0.101
Neg logit contributions
osh-0.076
ite-0.075
ores-0.073
ver-0.072
int-0.071
Token tw
Token ID665
Feature activation+5.23e-01

Pos logit contributions
int+0.001
ver+0.001
ite+0.001
ores+0.001
itting+0.001
Neg logit contributions
keeping-0.002
 heads-0.001
 arrest-0.001
ping-0.001
 mant-0.001
Tokenirl
Token ID1901
Feature activation+9.53e+00

Pos logit contributions
ver+0.027
ite+0.025
itting+0.023
 of+0.023
auc+0.023
Neg logit contributions
keeping-0.067
 heads-0.056
 necks-0.056
ping-0.055
 arrest-0.054
Tokens
Token ID82
Feature activation-1.06e+00

Pos logit contributions
int+0.840
ours+0.803
umber+0.770
ver+0.769
ores+0.764
Neg logit contributions
keeping-1.010
ping-0.712
 heads-0.678
makers-0.668
 arrest-0.629
Token and
Token ID290
Feature activation+1.07e-02

Pos logit contributions
keeping+0.129
ping+0.092
 heads+0.089
 arrest+0.087
makers+0.085
Neg logit contributions
int-0.079
ours-0.078
ite-0.078
osh-0.078
ores-0.077
Token sings
Token ID33041
Feature activation+5.45e-01

Pos logit contributions
osh+0.001
ite+0.001
int+0.001
ours+0.001
ores+0.001
Neg logit contributions
keeping-0.001
ping-0.001
 heads-0.001
 arrest-0.001
makers-0.001
Token.
Token ID13
Feature activation-5.08e-03

Pos logit contributions
osh+0.038
int+0.038
ite+0.037
ores+0.037
ours+0.037
Neg logit contributions
keeping-0.068
ping-0.048
 heads-0.047
 arrest-0.047
makers-0.045
Token \n
Token ID198
Feature activation-6.20e-02

Pos logit contributions
keeping+0.001
ping+0.000
 heads+0.000
 arrest+0.000
makers+0.000
Neg logit contributions
int-0.000
ig-0.000
ite-0.000
osh-0.000
ours-0.000
Token it
Token ID340
Feature activation+2.60e-01

Pos logit contributions
keeping+0.061
ping+0.041
 heads+0.040
 generation+0.039
makers+0.039
Neg logit contributions
int-0.052
ours-0.048
ig-0.047
ite-0.046
uddle-0.045
Token did
Token ID750
Feature activation+1.10e+00

Pos logit contributions
int+0.020
ours+0.020
ite+0.020
ig+0.019
osh+0.019
Neg logit contributions
keeping-0.031
ping-0.022
 heads-0.021
 arrest-0.021
makers-0.020
Token a
Token ID257
Feature activation+9.61e-02

Pos logit contributions
int+0.082
osh+0.082
ite+0.082
ours+0.081
ores+0.080
Neg logit contributions
keeping-0.132
ping-0.094
 heads-0.090
 arrest-0.089
makers-0.087
Token few
Token ID1178
Feature activation-1.35e+00

Pos logit contributions
osh+0.005
ig+0.005
ite+0.005
 exactly+0.005
ver+0.004
Neg logit contributions
keeping-0.014
ping-0.011
 heads-0.010
 arrest-0.010
makers-0.010
Token tw
Token ID665
Feature activation+5.96e-01

Pos logit contributions
keeping+0.156
ping+0.107
 heads+0.103
 arrest+0.102
makers+0.100
Neg logit contributions
int-0.109
osh-0.109
ite-0.108
ours-0.106
ig-0.106
Tokenirl
Token ID1901
Feature activation+9.52e+00

Pos logit contributions
ver+0.036
osh+0.034
ours+0.033
ores+0.031
ite+0.030
Neg logit contributions
keeping-0.077
ping-0.062
 arrest-0.059
 heads-0.056
cious-0.056
Tokens
Token ID82
Feature activation-8.46e-01

Pos logit contributions
int+0.749
ours+0.748
osh+0.728
ores+0.726
ite+0.722
Neg logit contributions
keeping-1.097
ping-0.788
 heads-0.742
makers-0.723
 arrest-0.712
Token in
Token ID287
Feature activation-1.42e+00

Pos logit contributions
keeping+0.105
ping+0.075
 arrest+0.072
 heads+0.072
makers+0.071
Neg logit contributions
osh-0.060
int-0.059
ite-0.059
ours-0.058
ig-0.058
Token the
Token ID262
Feature activation+4.59e-01

Pos logit contributions
keeping+0.204
ping+0.161
makers+0.158
cious+0.156
edly+0.154
Neg logit contributions
osh-0.063
 of-0.063
 exactly-0.061
ores-0.061
?-0.059
Token air
Token ID1633
Feature activation+1.56e+00

Pos logit contributions
ver+0.040
ours+0.040
int+0.039
ores+0.039
osh+0.039
Neg logit contributions
keeping-0.050
ping-0.034
 heads-0.033
 arrest-0.032
 mant-0.031
Token.
Token ID13
Feature activation-1.76e-01

Pos logit contributions
ite+0.094
osh+0.094
ores+0.092
int+0.091
ig+0.091
Neg logit contributions
keeping-0.205
 arrest-0.149
ping-0.147
 heads-0.145
makers-0.143
Token.
Token ID13
Feature activation+4.12e-02

Pos logit contributions
osh+0.159
ite+0.158
int+0.156
ig+0.155
ores+0.152
Neg logit contributions
keeping-0.288
ping-0.207
 heads-0.199
 arrest-0.197
makers-0.194
Token Now
Token ID2735
Feature activation+2.53e-01

Pos logit contributions
int+0.003
osh+0.003
ite+0.003
ig+0.003
ours+0.003
Neg logit contributions
keeping-0.005
ping-0.004
 heads-0.003
 arrest-0.003
makers-0.003
Token she
Token ID673
Feature activation+1.58e+00

Pos logit contributions
int+0.016
ite+0.015
osh+0.015
ours+0.015
ores+0.015
Neg logit contributions
keeping-0.034
ping-0.025
 heads-0.024
 arrest-0.024
makers-0.024
Token could
Token ID714
Feature activation+9.08e-01

Pos logit contributions
ores+0.132
int+0.127
ours+0.123
umber+0.119
itting+0.117
Neg logit contributions
keeping-0.164
 heads-0.128
 arrest-0.119
ping-0.118
makers-0.111
Token tw
Token ID665
Feature activation+6.80e-01

Pos logit contributions
int+0.087
ours+0.086
ores+0.080
itting+0.079
umber+0.079
Neg logit contributions
keeping-0.079
 heads-0.062
 arrest-0.060
 surface-0.058
 mant-0.054
Tokenirl
Token ID1901
Feature activation+9.51e+00

Pos logit contributions
ver+0.035
auc+0.035
ite+0.035
ours+0.034
run+0.033
Neg logit contributions
keeping-0.074
ping-0.066
 arrest-0.065
 heads-0.064
 surface-0.063
Token and
Token ID290
Feature activation+1.01e-01

Pos logit contributions
int+0.852
ours+0.850
umber+0.794
ores+0.784
osh+0.782
Neg logit contributions
keeping-0.975
ping-0.716
 heads-0.660
makers-0.653
 surface-0.636
Token dance
Token ID9280
Feature activation+1.07e+00

Pos logit contributions
int+0.007
ours+0.007
ores+0.007
ite+0.007
osh+0.007
Neg logit contributions
keeping-0.012
ping-0.009
 heads-0.009
 arrest-0.009
makers-0.008
Token whenever
Token ID8797
Feature activation+1.39e+00

Pos logit contributions
ours+0.088
int+0.088
ite+0.084
osh+0.083
ores+0.081
Neg logit contributions
keeping-0.122
ping-0.087
 heads-0.082
makers-0.081
edly-0.078
Token she
Token ID673
Feature activation+1.57e+00

Pos logit contributions
osh+0.076
int+0.075
ite+0.075
ours+0.073
ores+0.073
Neg logit contributions
keeping-0.196
ping-0.147
 heads-0.142
 arrest-0.141
makers-0.139
Token wanted
Token ID2227
Feature activation-2.14e+00

Pos logit contributions
ores+0.128
ours+0.120
 exactly+0.118
ver+0.117
int+0.116
Neg logit contributions
keeping-0.172
 heads-0.129
ping-0.123
 arrest-0.121
makers-0.111
Token you
Token ID345
Feature activation-1.64e+00

Pos logit contributions
keeping+0.003
ping+0.002
 heads+0.002
 arrest+0.002
makers+0.002
Neg logit contributions
int-0.002
osh-0.002
ours-0.002
ite-0.002
ores-0.002
Token can
Token ID460
Feature activation+2.71e+00

Pos logit contributions
keeping+0.233
 heads+0.175
ping+0.175
 arrest+0.173
makers+0.171
Neg logit contributions
int-0.083
ours-0.082
ores-0.081
osh-0.078
ite-0.078
Token spin
Token ID7906
Feature activation+4.11e+00

Pos logit contributions
int+0.212
osh+0.210
ours+0.210
ite+0.206
ores+0.204
Neg logit contributions
keeping-0.315
ping-0.218
 arrest-0.216
 heads-0.214
makers-0.207
Token and
Token ID290
Feature activation-3.46e-02

Pos logit contributions
int+0.309
osh+0.303
ite+0.301
ours+0.299
ores+0.293
Neg logit contributions
keeping-0.495
ping-0.351
 heads-0.338
 arrest-0.334
makers-0.330
Token tw
Token ID665
Feature activation+6.34e-01

Pos logit contributions
keeping+0.004
ping+0.003
 heads+0.003
 arrest+0.003
makers+0.003
Neg logit contributions
int-0.003
osh-0.003
ite-0.003
ours-0.002
ores-0.002
Tokenirl
Token ID1901
Feature activation+9.51e+00

Pos logit contributions
ver+0.036
ite+0.034
osh+0.034
 of+0.033
 swift+0.032
Neg logit contributions
keeping-0.075
 arrest-0.061
ping-0.060
 heads-0.060
 necks-0.057
Token with
Token ID351
Feature activation+4.10e-01

Pos logit contributions
int+0.725
osh+0.717
ite+0.712
ours+0.710
ores+0.699
Neg logit contributions
keeping-1.129
ping-0.799
 heads-0.766
 arrest-0.753
makers-0.748
Token me
Token ID502
Feature activation-1.33e+00

Pos logit contributions
osh+0.035
ite+0.035
int+0.035
ours+0.034
ores+0.034
Neg logit contributions
keeping-0.045
ping-0.031
 heads-0.029
 arrest-0.029
makers-0.028
Token forever
Token ID8097
Feature activation-1.03e+00

Pos logit contributions
keeping+0.128
ping+0.082
 heads+0.077
makers+0.072
 arrest+0.072
Neg logit contributions
ours-0.130
int-0.129
osh-0.128
ores-0.128
ite-0.126
Token."
Token ID526
Feature activation-1.80e+00

Pos logit contributions
keeping+0.118
ping+0.083
 arrest+0.080
 heads+0.080
makers+0.077
Neg logit contributions
ite-0.081
ig-0.080
ores-0.079
ver-0.078
osh-0.077
Token \n
Token ID198
Feature activation+6.01e-02

Pos logit contributions
keeping+0.225
ping+0.156
 arrest+0.154
 heads+0.153
 mant+0.146
Neg logit contributions
int-0.125
osh-0.124
ours-0.124
ores-0.123
ig-0.120
Token grass
Token ID8701
Feature activation-2.61e+00

Pos logit contributions
ite+0.022
ig+0.021
uddle+0.021
ade+0.021
osh+0.020
Neg logit contributions
keeping-0.033
ping-0.023
 arrest-0.023
iously-0.022
edly-0.022
Token and
Token ID290
Feature activation-1.15e-02

Pos logit contributions
keeping+0.336
ping+0.240
 arrest+0.237
 heads+0.233
makers+0.231
Neg logit contributions
ite-0.173
osh-0.172
ores-0.167
int-0.166
ig-0.166
Token watching
Token ID4964
Feature activation-8.96e-01

Pos logit contributions
keeping+0.001
 doorstep+0.001
 arrest+0.001
ping+0.001
 generation+0.001
Neg logit contributions
osh-0.001
ours-0.001
ig-0.001
 of-0.001
 exactly-0.001
Token them
Token ID606
Feature activation-1.20e+00

Pos logit contributions
keeping+0.116
 arrest+0.080
ping+0.079
makers+0.077
 mant+0.076
Neg logit contributions
ores-0.060
osh-0.059
ours-0.059
 of-0.057
 exactly-0.055
Token tw
Token ID665
Feature activation+5.38e-01

Pos logit contributions
keeping+0.159
ping+0.112
 arrest+0.110
 heads+0.109
makers+0.107
Neg logit contributions
osh-0.077
ite-0.077
ores-0.075
ours-0.074
ig-0.073
Tokenirl
Token ID1901
Feature activation+9.50e+00

Pos logit contributions
ver+0.021
osh+0.019
int+0.019
ite+0.018
ores+0.018
Neg logit contributions
keeping-0.081
ping-0.066
 heads-0.064
 arrest-0.063
 surface-0.062
Token around
Token ID1088
Feature activation+1.16e+00

Pos logit contributions
osh+0.656
int+0.654
ite+0.651
ours+0.642
ores+0.635
Neg logit contributions
keeping-1.194
ping-0.861
 heads-0.828
 arrest-0.822
makers-0.806
Token each
Token ID1123
Feature activation-2.04e+00

Pos logit contributions
 of+0.049
ores+0.040
 at+0.038
?+0.037
 exactly+0.036
Neg logit contributions
keeping-0.176
 arrest-0.139
 mant-0.137
ping-0.134
 heads-0.133
Token other
Token ID584
Feature activation-4.17e+00

Pos logit contributions
keeping+0.213
ping+0.138
 arrest+0.132
 heads+0.131
makers+0.129
Neg logit contributions
osh-0.189
ite-0.186
int-0.186
ours-0.183
ig-0.183
Token.
Token ID13
Feature activation+8.26e-03

Pos logit contributions
keeping+0.582
 arrest+0.460
ping+0.429
makers+0.427
 mant+0.423
Neg logit contributions
ite-0.213
 of-0.212
ig-0.212
ap-0.200
adow-0.199
Token Annie
Token ID25860
Feature activation-2.42e+00

Pos logit contributions
ours+0.000
osh+0.000
int+0.000
ores+0.000
ite+0.000
Neg logit contributions
keeping-0.001
ping-0.001
 heads-0.001
 arrest-0.001
makers-0.001
Token.
Token ID13
Feature activation-9.49e-02

Pos logit contributions
ours+0.093
int+0.093
osh+0.091
ver+0.089
ite+0.089
Neg logit contributions
keeping-0.112
ping-0.079
 heads-0.075
edly-0.071
makers-0.070
Token She
Token ID1375
Feature activation+4.07e-01

Pos logit contributions
keeping+0.011
ping+0.008
 heads+0.008
 generation+0.007
 arrest+0.007
Neg logit contributions
int-0.007
ig-0.007
osh-0.007
ite-0.007
ours-0.007
Token liked
Token ID8288
Feature activation+7.84e-01

Pos logit contributions
int+0.033
ores+0.032
ours+0.032
ite+0.031
ver+0.031
Neg logit contributions
keeping-0.045
ping-0.032
 heads-0.032
 arrest-0.030
makers-0.030
Token to
Token ID284
Feature activation-3.19e-01

Pos logit contributions
osh+0.053
ite+0.052
ours+0.051
ores+0.050
ig+0.050
Neg logit contributions
keeping-0.100
ping-0.072
 heads-0.070
 arrest-0.069
makers-0.068
Token tw
Token ID665
Feature activation+6.89e-01

Pos logit contributions
keeping+0.035
ping+0.025
 heads+0.025
 arrest+0.024
makers+0.023
Neg logit contributions
int-0.026
ite-0.026
ours-0.026
ver-0.025
ores-0.025
Tokenirl
Token ID1901
Feature activation+9.50e+00

Pos logit contributions
ver+0.029
osh+0.027
ite+0.027
run+0.027
int+0.025
Neg logit contributions
keeping-0.093
ping-0.080
 heads-0.078
 arrest-0.077
 surface-0.076
Token and
Token ID290
Feature activation-2.95e-02

Pos logit contributions
ours+0.804
int+0.792
ores+0.767
osh+0.764
ver+0.749
Neg logit contributions
keeping-1.010
ping-0.745
 heads-0.699
makers-0.668
 surface-0.659
Token spin
Token ID7906
Feature activation+4.16e+00

Pos logit contributions
keeping+0.004
ping+0.003
 arrest+0.003
 heads+0.003
makers+0.003
Neg logit contributions
osh-0.002
ite-0.002
int-0.002
ig-0.002
ours-0.002
Token in
Token ID287
Feature activation-1.34e+00

Pos logit contributions
int+0.327
ours+0.326
osh+0.322
ite+0.321
ores+0.313
Neg logit contributions
keeping-0.482
ping-0.347
 heads-0.331
makers-0.320
 arrest-0.317
Token circles
Token ID13332
Feature activation+1.74e+00

Pos logit contributions
keeping+0.170
ping+0.123
 heads+0.118
 arrest+0.118
makers+0.116
Neg logit contributions
osh-0.090
ite-0.089
int-0.088
ig-0.087
ours-0.087
Token.
Token ID13
Feature activation-1.00e-01

Pos logit contributions
int+0.136
ours+0.136
ite+0.135
osh+0.135
ores+0.132
Neg logit contributions
keeping-0.200
ping-0.144
 heads-0.136
 arrest-0.132
makers-0.132
Token leaves
Token ID5667
Feature activation-4.56e-01

Pos logit contributions
ite+0.033
ig+0.032
int+0.032
osh+0.032
ores+0.032
Neg logit contributions
keeping-0.053
ping-0.037
 heads-0.035
 arrest-0.035
makers-0.035
Token.
Token ID13
Feature activation-6.37e-02

Pos logit contributions
keeping+0.055
ping+0.040
 heads+0.038
 arrest+0.037
makers+0.036
Neg logit contributions
int-0.034
osh-0.033
ours-0.033
ite-0.033
ores-0.032
Token With
Token ID2080
Feature activation+1.61e+00

Pos logit contributions
keeping+0.009
ping+0.007
 arrest+0.007
 heads+0.007
makers+0.007
Neg logit contributions
int-0.003
ours-0.003
ores-0.003
osh-0.003
ver-0.003
Token soft
Token ID2705
Feature activation-5.30e-01

Pos logit contributions
int+0.129
ite+0.127
ig+0.123
ours+0.123
osh+0.122
Neg logit contributions
keeping-0.182
ping-0.131
 heads-0.128
 arrest-0.126
makers-0.119
Token sn
Token ID3013
Feature activation+1.53e+00

Pos logit contributions
keeping+0.064
ping+0.045
 heads+0.043
 arrest+0.043
makers+0.042
Neg logit contributions
int-0.040
osh-0.040
ite-0.039
ours-0.039
ores-0.039
Tokenores
Token ID2850
Feature activation+9.50e+00

Pos logit contributions
osh+0.094
ver+0.093
 exactly+0.087
ours+0.083
ila+0.077
Neg logit contributions
keeping-0.190
ping-0.146
edly-0.137
 surface-0.136
 arrest-0.133
Token,
Token ID11
Feature activation-3.13e-02

Pos logit contributions
osh+0.574
int+0.573
ite+0.570
ours+0.561
ores+0.554
Neg logit contributions
keeping-1.271
ping-0.939
 heads-0.906
 arrest-0.899
makers-0.885
Token Lucy
Token ID22162
Feature activation-1.47e+00

Pos logit contributions
keeping+0.004
ping+0.003
 heads+0.003
 arrest+0.003
makers+0.003
Neg logit contributions
int-0.002
ite-0.002
osh-0.002
ig-0.002
ours-0.002
Token began
Token ID2540
Feature activation+1.31e+00

Pos logit contributions
keeping+0.176
ping+0.127
 heads+0.123
 arrest+0.119
makers+0.116
Neg logit contributions
ite-0.108
ours-0.106
int-0.106
osh-0.105
ores-0.105
Token to
Token ID284
Feature activation-2.94e-01

Pos logit contributions
osh+0.088
ores+0.087
ours+0.087
int+0.087
ig+0.086
Neg logit contributions
keeping-0.166
ping-0.118
 heads-0.115
makers-0.114
 arrest-0.114
Token drift
Token ID24260
Feature activation+8.58e-01

Pos logit contributions
keeping+0.036
ping+0.026
 heads+0.025
 arrest+0.025
makers+0.024
Neg logit contributions
osh-0.021
int-0.021
ite-0.021
ours-0.020
ores-0.020
Token hands
Token ID2832
Feature activation+1.73e-01

Pos logit contributions
int+0.087
osh+0.087
ig+0.085
ite+0.085
ours+0.084
Neg logit contributions
keeping-0.126
ping-0.086
 arrest-0.082
makers-0.081
 heads-0.080
Token and
Token ID290
Feature activation-4.03e-01

Pos logit contributions
osh+0.010
int+0.010
ite+0.010
ig+0.010
ver+0.010
Neg logit contributions
keeping-0.024
ping-0.017
 arrest-0.017
 heads-0.016
makers-0.016
Token watched
Token ID7342
Feature activation+2.83e-01

Pos logit contributions
keeping+0.049
ping+0.035
 heads+0.034
 arrest+0.033
makers+0.032
Neg logit contributions
int-0.030
ite-0.030
ours-0.029
ver-0.029
ores-0.029
Token them
Token ID606
Feature activation-1.25e+00

Pos logit contributions
osh+0.019
int+0.019
ite+0.019
ours+0.019
ores+0.019
Neg logit contributions
keeping-0.036
ping-0.026
 heads-0.025
 arrest-0.025
makers-0.024
Token tw
Token ID665
Feature activation+6.70e-01

Pos logit contributions
keeping+0.153
ping+0.110
 heads+0.105
 arrest+0.104
makers+0.103
Neg logit contributions
int-0.090
osh-0.089
ite-0.089
ours-0.088
ores-0.087
Tokenirl
Token ID1901
Feature activation+9.49e+00

Pos logit contributions
ver+0.043
ite+0.043
osh+0.041
ours+0.039
int+0.038
Neg logit contributions
keeping-0.080
ping-0.065
 heads-0.062
 arrest-0.059
 surface-0.059
Token around
Token ID1088
Feature activation+1.08e+00

Pos logit contributions
int+0.715
ours+0.711
ite+0.699
osh+0.698
ores+0.684
Neg logit contributions
keeping-1.125
ping-0.815
 heads-0.775
makers-0.753
 arrest-0.744
Token and
Token ID290
Feature activation-1.54e-01

Pos logit contributions
ores+0.058
osh+0.057
 of+0.057
ig+0.053
 exactly+0.052
Neg logit contributions
keeping-0.151
 arrest-0.110
ping-0.110
 heads-0.107
cious-0.106
Token around
Token ID1088
Feature activation+1.09e+00

Pos logit contributions
keeping+0.019
ping+0.013
 arrest+0.013
 heads+0.013
makers+0.013
Neg logit contributions
osh-0.011
ite-0.011
ores-0.011
ours-0.011
ig-0.011
Token.
Token ID13
Feature activation+2.26e-02

Pos logit contributions
osh+0.066
ores+0.065
ig+0.063
 of+0.060
 exactly+0.060
Neg logit contributions
keeping-0.149
ping-0.106
 arrest-0.105
 heads-0.103
makers-0.101
Token
Token ID220
Feature activation+6.17e-01

Pos logit contributions
int+0.002
ite+0.002
ig+0.002
osh+0.002
ours+0.001
Neg logit contributions
keeping-0.003
ping-0.002
 heads-0.002
 arrest-0.002
makers-0.002
Token old
Token ID1468
Feature activation-1.52e+00

Pos logit contributions
ite+0.035
ig+0.035
osh+0.034
ours+0.034
ores+0.034
Neg logit contributions
keeping-0.051
ping-0.035
 arrest-0.033
makers-0.033
 heads-0.031
Token lady
Token ID10846
Feature activation+8.65e-01

Pos logit contributions
keeping+0.199
ping+0.149
 arrest+0.142
makers+0.141
 mant+0.138
Neg logit contributions
ite-0.095
ig-0.090
osh-0.088
ours-0.087
ila-0.086
Token like
Token ID588
Feature activation+1.74e+00

Pos logit contributions
osh+0.061
ite+0.061
ig+0.061
int+0.061
ours+0.059
Neg logit contributions
keeping-0.108
ping-0.076
 arrest-0.074
 heads-0.073
makers-0.072
Token a
Token ID257
Feature activation+2.23e-01

Pos logit contributions
int+0.114
ite+0.113
osh+0.112
ours+0.111
ig+0.109
Neg logit contributions
keeping-0.225
ping-0.165
 heads-0.158
 arrest-0.158
makers-0.155
Token wh
Token ID348
Feature activation+2.54e+00

Pos logit contributions
osh+0.017
ite+0.017
int+0.017
ig+0.017
ours+0.017
Neg logit contributions
keeping-0.026
ping-0.018
 heads-0.017
 arrest-0.017
makers-0.017
Tokenirl
Token ID1901
Feature activation+9.48e+00

Pos logit contributions
osh+0.167
ours+0.165
ores+0.163
ite+0.162
int+0.160
Neg logit contributions
keeping-0.306
ping-0.231
makers-0.219
 arrest-0.210
 heads-0.207
Tokenig
Token ID328
Feature activation+8.38e+00

Pos logit contributions
ours+0.743
int+0.723
ores+0.702
ite+0.701
ig+0.698
Neg logit contributions
keeping-1.086
ping-0.817
 heads-0.753
makers-0.746
 arrest-0.734
Tokenig
Token ID328
Feature activation+8.39e+00

Pos logit contributions
osh+0.557
ite+0.554
int+0.553
ig+0.546
ores+0.536
Neg logit contributions
keeping-1.076
ping-0.770
 heads-0.749
 arrest-0.746
makers-0.728
Token.
Token ID13
Feature activation+6.98e-02

Pos logit contributions
osh+0.559
ite+0.556
int+0.556
ig+0.546
ores+0.538
Neg logit contributions
keeping-1.074
ping-0.770
 heads-0.748
 arrest-0.744
makers-0.727
Token His
Token ID2399
Feature activation+4.65e-01

Pos logit contributions
ite+0.005
ig+0.005
osh+0.005
int+0.005
ver+0.005
Neg logit contributions
keeping-0.008
ping-0.006
 arrest-0.005
 heads-0.005
 pavement-0.005
Token dancing
Token ID15360
Feature activation-6.97e-01

Pos logit contributions
int+0.037
osh+0.037
ours+0.037
ores+0.037
ite+0.037
Neg logit contributions
keeping-0.053
ping-0.037
 heads-0.037
 arrest-0.035
makers-0.034
Token.
Token ID13
Feature activation+1.32e-01

Pos logit contributions
int+0.109
ours+0.108
osh+0.107
ite+0.107
ig+0.104
Neg logit contributions
keeping-0.190
ping-0.137
 heads-0.130
 arrest-0.130
makers-0.130
Token She
Token ID1375
Feature activation+4.68e-01

Pos logit contributions
int+0.010
osh+0.009
ite+0.009
ig+0.009
ours+0.009
Neg logit contributions
keeping-0.016
ping-0.011
makers-0.011
 arrest-0.011
 heads-0.011
Token started
Token ID2067
Feature activation+3.33e-01

Pos logit contributions
ores+0.039
ours+0.039
int+0.039
itting+0.037
umber+0.036
Neg logit contributions
keeping-0.051
 heads-0.036
ping-0.035
 arrest-0.034
cious-0.032
Token to
Token ID284
Feature activation-3.50e-01

Pos logit contributions
osh+0.026
int+0.026
ite+0.026
ours+0.025
ores+0.025
Neg logit contributions
keeping-0.039
ping-0.027
 heads-0.026
 arrest-0.026
makers-0.025
Token tw
Token ID665
Feature activation+6.45e-01

Pos logit contributions
keeping+0.045
ping+0.032
 heads+0.031
 arrest+0.031
makers+0.031
Neg logit contributions
osh-0.023
ite-0.023
int-0.023
ig-0.023
ours-0.023
Tokenirl
Token ID1901
Feature activation+9.48e+00

Pos logit contributions
ver+0.039
auc+0.038
ours+0.037
run+0.036
osh+0.036
Neg logit contributions
keeping-0.068
ping-0.060
 arrest-0.059
 distances-0.057
 surface-0.057
Token and
Token ID290
Feature activation+1.08e-01

Pos logit contributions
ours+0.820
int+0.803
osh+0.765
ores+0.761
umber+0.752
Neg logit contributions
keeping-1.018
ping-0.741
 heads-0.691
makers-0.679
 surface-0.653
Token spin
Token ID7906
Feature activation+4.12e+00

Pos logit contributions
osh+0.007
int+0.007
ite+0.007
ours+0.007
ores+0.007
Neg logit contributions
keeping-0.013
ping-0.010
 heads-0.009
 arrest-0.009
makers-0.009
Token until
Token ID1566
Feature activation-1.83e+00

Pos logit contributions
ours+0.319
int+0.319
osh+0.312
ite+0.311
ores+0.303
Neg logit contributions
keeping-0.486
ping-0.349
 heads-0.331
makers-0.325
 arrest-0.323
Token she
Token ID673
Feature activation+1.56e+00

Pos logit contributions
keeping+0.226
 arrest+0.159
ping+0.157
 heads+0.153
makers+0.153
Neg logit contributions
int-0.138
ours-0.133
ite-0.127
osh-0.124
ig-0.123
Token landed
Token ID11406
Feature activation-1.64e+00

Pos logit contributions
ours+0.132
ores+0.128
int+0.127
ite+0.119
itting+0.118
Neg logit contributions
keeping-0.164
 arrest-0.115
ping-0.112
 heads-0.112
 surface-0.107
Token would
Token ID561
Feature activation+2.33e+00

Pos logit contributions
int+0.112
ores+0.108
ite+0.107
ours+0.107
 exactly+0.104
Neg logit contributions
keeping-0.163
 heads-0.122
ping-0.119
 arrest-0.113
makers-0.112
Token go
Token ID467
Feature activation+2.52e+00

Pos logit contributions
int+0.195
ite+0.191
ours+0.190
ores+0.187
osh+0.187
Neg logit contributions
keeping-0.253
ping-0.177
 heads-0.175
 arrest-0.169
makers-0.164
Token outside
Token ID2354
Feature activation+2.00e-01

Pos logit contributions
int+0.229
osh+0.221
ours+0.218
ver+0.217
ite+0.215
Neg logit contributions
keeping-0.258
ping-0.178
 heads-0.170
makers-0.166
 arrest-0.163
Token and
Token ID290
Feature activation-5.49e-02

Pos logit contributions
int+0.016
osh+0.016
ite+0.016
ours+0.016
ores+0.016
Neg logit contributions
keeping-0.023
ping-0.016
 heads-0.015
 arrest-0.015
makers-0.014
Token tw
Token ID665
Feature activation+6.36e-01

Pos logit contributions
keeping+0.006
ping+0.004
 heads+0.004
 arrest+0.004
 mant+0.004
Neg logit contributions
int-0.005
ours-0.004
ite-0.004
ores-0.004
ver-0.004
Tokenirl
Token ID1901
Feature activation+9.48e+00

Pos logit contributions
ver+0.033
run+0.029
ite+0.029
osh+0.028
aid+0.028
Neg logit contributions
keeping-0.079
 heads-0.068
ping-0.067
 arrest-0.067
 necks-0.065
Token and
Token ID290
Feature activation-6.93e-02

Pos logit contributions
ours+0.795
int+0.783
osh+0.764
ores+0.748
ite+0.747
Neg logit contributions
keeping-1.039
ping-0.746
 heads-0.709
makers-0.688
edly-0.662
Token spin
Token ID7906
Feature activation+4.09e+00

Pos logit contributions
keeping+0.009
ping+0.006
 arrest+0.006
 heads+0.006
makers+0.006
Neg logit contributions
osh-0.005
ite-0.005
int-0.004
ig-0.004
ores-0.004
Token while
Token ID981
Feature activation-1.33e+00

Pos logit contributions
int+0.297
osh+0.297
ite+0.295
ours+0.293
ores+0.287
Neg logit contributions
keeping-0.501
ping-0.359
 heads-0.345
 arrest-0.339
makers-0.335
Token the
Token ID262
Feature activation+5.17e-01

Pos logit contributions
keeping+0.195
ping+0.147
 heads+0.142
makers+0.140
 mant+0.139
Neg logit contributions
osh-0.067
apy-0.067
ores-0.065
ite-0.064
ig-0.064
Token sun
Token ID4252
Feature activation+6.58e-01

Pos logit contributions
ite+0.038
osh+0.038
int+0.038
ig+0.037
ours+0.036
Neg logit contributions
keeping-0.063
ping-0.045
 arrest-0.043
makers-0.042
 heads-0.042
Token couldn
Token ID3521
Feature activation-2.96e+00

Pos logit contributions
int+0.111
osh+0.110
ite+0.110
ours+0.109
ores+0.108
Neg logit contributions
keeping-0.183
ping-0.130
 heads-0.125
 arrest-0.124
makers-0.121
Token't
Token ID470
Feature activation-3.66e+00

Pos logit contributions
keeping+0.225
 arrest+0.124
ping+0.123
makers+0.117
 heads+0.111
Neg logit contributions
int-0.346
ours-0.341
osh-0.332
ig-0.332
ite-0.332
Token help
Token ID1037
Feature activation+2.88e-01

Pos logit contributions
keeping+0.482
ping+0.352
 heads+0.327
makers+0.312
 arrest+0.309
Neg logit contributions
osh-0.245
 of-0.242
int-0.241
ite-0.238
ores-0.236
Token but
Token ID475
Feature activation+2.01e-01

Pos logit contributions
osh+0.020
int+0.019
ite+0.019
ig+0.019
ores+0.019
Neg logit contributions
keeping-0.036
ping-0.026
 arrest-0.025
 heads-0.025
makers-0.024
Token tw
Token ID665
Feature activation+5.49e-01

Pos logit contributions
int+0.015
ours+0.014
osh+0.014
ig+0.014
ite+0.014
Neg logit contributions
keeping-0.025
ping-0.017
 heads-0.017
 arrest-0.017
 mant-0.017
Tokenirl
Token ID1901
Feature activation+9.46e+00

Pos logit contributions
ite+0.034
osh+0.032
int+0.031
ver+0.031
auc+0.031
Neg logit contributions
keeping-0.066
ping-0.054
 heads-0.051
 arrest-0.050
 necks-0.047
Token around
Token ID1088
Feature activation+9.96e-01

Pos logit contributions
int+0.733
ours+0.720
osh+0.699
ite+0.697
ores+0.691
Neg logit contributions
keeping-1.107
ping-0.805
 heads-0.756
makers-0.748
 arrest-0.725
Token shouting
Token ID19642
Feature activation-4.38e+00

Pos logit contributions
osh+0.054
 of+0.053
ores+0.052
ig+0.052
 exactly+0.051
Neg logit contributions
keeping-0.139
ping-0.101
 arrest-0.101
 heads-0.100
makers-0.097
Token,
Token ID11
Feature activation-7.40e-02

Pos logit contributions
keeping+0.574
ping+0.417
 arrest+0.403
 heads+0.402
makers+0.393
Neg logit contributions
osh-0.279
int-0.278
ite-0.275
ours-0.272
ores-0.271
Token "
Token ID366
Feature activation-9.84e-03

Pos logit contributions
keeping+0.009
ping+0.006
 arrest+0.006
 heads+0.006
makers+0.006
Neg logit contributions
osh-0.005
ite-0.005
ores-0.005
int-0.005
ours-0.005
TokenLook
Token ID8567
Feature activation-3.35e+00

Pos logit contributions
keeping+0.001
ping+0.001
 heads+0.001
 arrest+0.001
makers+0.001
Neg logit contributions
int-0.001
osh-0.001
ite-0.001
ours-0.001
ores-0.001
Token delight
Token ID10974
Feature activation-9.13e-01

Pos logit contributions
keeping+0.153
ping+0.108
makers+0.104
 heads+0.102
 arrest+0.100
Neg logit contributions
osh-0.079
ores-0.077
int-0.076
ours-0.076
ite-0.075
Token,
Token ID11
Feature activation+2.65e-01

Pos logit contributions
keeping+0.123
ping+0.088
 arrest+0.088
makers+0.086
 heads+0.085
Neg logit contributions
int-0.054
ores-0.053
 exactly-0.053
osh-0.052
ig-0.052
Token watching
Token ID4964
Feature activation-1.05e+00

Pos logit contributions
osh+0.020
int+0.020
ite+0.020
ours+0.020
ores+0.019
Neg logit contributions
keeping-0.032
ping-0.022
 heads-0.021
 arrest-0.021
makers-0.021
Token it
Token ID340
Feature activation+5.50e-01

Pos logit contributions
keeping+0.135
ping+0.099
 heads+0.095
 arrest+0.094
makers+0.093
Neg logit contributions
int-0.070
ite-0.070
osh-0.069
ours-0.068
ig-0.067
Token tw
Token ID665
Feature activation+6.03e-01

Pos logit contributions
int+0.042
ours+0.042
ite+0.041
ores+0.041
osh+0.041
Neg logit contributions
keeping-0.063
ping-0.046
 heads-0.044
makers-0.042
 arrest-0.042
Tokenirl
Token ID1901
Feature activation+9.46e+00

Pos logit contributions
ver+0.042
osh+0.040
ite+0.040
ours+0.038
 of+0.038
Neg logit contributions
keeping-0.069
ping-0.055
 heads-0.053
 surface-0.052
edly-0.050
Token across
Token ID1973
Feature activation-9.06e-01

Pos logit contributions
ours+0.719
int+0.718
osh+0.697
ite+0.695
ores+0.693
Neg logit contributions
keeping-1.115
ping-0.809
 heads-0.770
makers-0.747
 arrest-0.735
Token the
Token ID262
Feature activation+6.52e-01

Pos logit contributions
keeping+0.120
ping+0.088
 heads+0.086
 arrest+0.084
makers+0.084
Neg logit contributions
osh-0.056
ores-0.054
ite-0.054
int-0.053
ours-0.053
Token floor
Token ID4314
Feature activation-5.13e+00

Pos logit contributions
ite+0.058
osh+0.056
ig+0.056
int+0.053
ores+0.052
Neg logit contributions
keeping-0.071
ping-0.048
 arrest-0.045
makers-0.045
 heads-0.043
Token.
Token ID13
Feature activation+2.14e-01

Pos logit contributions
keeping+0.608
ping+0.433
 heads+0.414
 arrest+0.407
makers+0.401
Neg logit contributions
int-0.387
osh-0.383
ite-0.382
ours-0.380
ores-0.373
Token
Token ID198
Feature activation+1.16e-01

Pos logit contributions
int+0.017
ig+0.017
ite+0.016
osh+0.016
ours+0.016
Neg logit contributions
keeping-0.024
ping-0.016
 heads-0.016
 arrest-0.016
 Smith-0.016
Token started
Token ID2067
Feature activation+3.13e-01

Pos logit contributions
int+0.063
ite+0.062
ours+0.062
osh+0.062
ig+0.061
Neg logit contributions
keeping-0.102
ping-0.072
 heads-0.069
 arrest-0.068
makers-0.067
Token to
Token ID284
Feature activation-3.34e-01

Pos logit contributions
osh+0.023
ite+0.023
ours+0.023
int+0.022
ig+0.022
Neg logit contributions
keeping-0.038
ping-0.027
 heads-0.026
 arrest-0.025
makers-0.025
Token dance
Token ID9280
Feature activation+9.79e-01

Pos logit contributions
keeping+0.041
ping+0.029
 heads+0.028
 arrest+0.028
makers+0.027
Neg logit contributions
osh-0.024
int-0.024
ite-0.024
ours-0.024
ores-0.024
Token and
Token ID290
Feature activation+2.32e-03

Pos logit contributions
int+0.074
ours+0.072
osh+0.070
ite+0.070
ver+0.067
Neg logit contributions
keeping-0.120
ping-0.085
 heads-0.082
makers-0.080
 arrest-0.079
Token tw
Token ID665
Feature activation+5.99e-01

Pos logit contributions
osh+0.000
ite+0.000
ig+0.000
ores+0.000
ours+0.000
Neg logit contributions
keeping-0.000
ping-0.000
 arrest-0.000
 heads-0.000
makers-0.000
Tokenirl
Token ID1901
Feature activation+9.46e+00

Pos logit contributions
ver+0.032
ite+0.032
osh+0.028
 swift+0.028
run+0.028
Neg logit contributions
keeping-0.077
ping-0.062
 heads-0.061
 arrest-0.060
 necks-0.058
Token around
Token ID1088
Feature activation+1.08e+00

Pos logit contributions
int+0.706
ours+0.698
osh+0.691
ite+0.688
ores+0.674
Neg logit contributions
keeping-1.138
ping-0.818
 heads-0.780
makers-0.763
 arrest-0.758
Token in
Token ID287
Feature activation-1.36e+00

Pos logit contributions
osh+0.054
 of+0.052
ores+0.051
ig+0.051
ap+0.048
Neg logit contributions
keeping-0.157
ping-0.117
 arrest-0.117
 heads-0.112
 mant-0.110
Token the
Token ID262
Feature activation+5.25e-01

Pos logit contributions
keeping+0.185
ping+0.136
makers+0.130
 arrest+0.129
 heads+0.127
Neg logit contributions
osh-0.083
ig-0.079
ores-0.078
ite-0.078
ours-0.076
Token sunshine
Token ID34488
Feature activation+2.23e+00

Pos logit contributions
ours+0.046
int+0.046
ver+0.046
ores+0.045
osh+0.045
Neg logit contributions
keeping-0.055
ping-0.038
 heads-0.038
 mant-0.036
 arrest-0.036
Token.
Token ID13
Feature activation-3.71e-02

Pos logit contributions
ite+0.135
ig+0.135
osh+0.134
ores+0.130
ver+0.127
Neg logit contributions
keeping-0.293
 arrest-0.214
ping-0.213
 heads-0.207
makers-0.203
Token skirt
Token ID23967
Feature activation+1.99e+00

Pos logit contributions
keeping+0.037
ping+0.026
 heads+0.025
 arrest+0.025
makers+0.024
Neg logit contributions
osh-0.025
int-0.025
ite-0.025
ours-0.025
ores-0.025
Token and
Token ID290
Feature activation-3.65e-02

Pos logit contributions
osh+0.154
ours+0.152
int+0.151
ores+0.149
ite+0.146
Neg logit contributions
keeping-0.232
ping-0.166
 heads-0.158
 arrest-0.156
makers-0.152
Token loved
Token ID6151
Feature activation+6.90e-01

Pos logit contributions
keeping+0.004
ping+0.003
 heads+0.003
 arrest+0.003
cious+0.003
Neg logit contributions
int-0.003
ours-0.003
ores-0.003
ite-0.003
osh-0.003
Token to
Token ID284
Feature activation-4.64e-01

Pos logit contributions
ig+0.037
ite+0.037
ores+0.037
ours+0.036
 of+0.036
Neg logit contributions
keeping-0.094
ping-0.070
 heads-0.069
 arrest-0.068
makers-0.068
Token tw
Token ID665
Feature activation+5.39e-01

Pos logit contributions
keeping+0.048
ping+0.033
 heads+0.033
 arrest+0.032
 mant+0.031
Neg logit contributions
ver-0.040
ours-0.040
ite-0.040
int-0.040
ores-0.039
Tokenirl
Token ID1901
Feature activation+9.46e+00

Pos logit contributions
ver+0.021
osh+0.021
ite+0.019
ores+0.018
int+0.018
Neg logit contributions
keeping-0.077
ping-0.064
 heads-0.063
 arrest-0.063
 surface-0.060
Token around
Token ID1088
Feature activation+1.06e+00

Pos logit contributions
int+0.715
ours+0.715
osh+0.700
ores+0.689
ite+0.682
Neg logit contributions
keeping-1.119
ping-0.805
 heads-0.764
makers-0.749
 arrest-0.744
Token in
Token ID287
Feature activation-1.42e+00

Pos logit contributions
 of+0.044
 at+0.040
?+0.034
ig+0.031
irl+0.030
Neg logit contributions
keeping-0.174
cious-0.133
 arrest-0.132
 heads-0.128
ping-0.128
Token it
Token ID340
Feature activation+1.78e-01

Pos logit contributions
keeping+0.190
ping+0.140
makers+0.137
 heads+0.135
 arrest+0.134
Neg logit contributions
osh-0.085
ig-0.083
ite-0.082
ores-0.081
 of-0.080
Token.
Token ID13
Feature activation+3.70e-02

Pos logit contributions
ite+0.012
osh+0.012
ig+0.012
int+0.012
ores+0.011
Neg logit contributions
keeping-0.023
ping-0.017
 arrest-0.016
makers-0.016
 heads-0.016
Token One
Token ID1881
Feature activation-3.27e-01

Pos logit contributions
osh+0.003
int+0.003
ite+0.003
ver+0.003
ores+0.003
Neg logit contributions
keeping-0.004
ping-0.003
 heads-0.003
 mant-0.003
 arrest-0.003
Token.
Token ID13
Feature activation+8.60e-02

Pos logit contributions
int+0.034
osh+0.033
ite+0.033
ours+0.033
ig+0.033
Neg logit contributions
keeping-0.057
ping-0.041
 heads-0.039
 arrest-0.039
makers-0.038
Token She
Token ID1375
Feature activation+4.47e-01

Pos logit contributions
osh+0.006
int+0.006
ite+0.006
ig+0.006
ours+0.006
Neg logit contributions
keeping-0.011
ping-0.007
 arrest-0.007
 heads-0.007
 mant-0.007
Token began
Token ID2540
Feature activation+1.27e+00

Pos logit contributions
ite+0.035
ours+0.035
ores+0.035
int+0.034
itting+0.034
Neg logit contributions
keeping-0.050
 heads-0.036
ping-0.035
 arrest-0.034
edly-0.033
Token to
Token ID284
Feature activation-3.58e-01

Pos logit contributions
osh+0.090
int+0.088
ours+0.088
ite+0.088
ig+0.088
Neg logit contributions
keeping-0.156
ping-0.111
 heads-0.107
 arrest-0.106
makers-0.105
Token tw
Token ID665
Feature activation+6.18e-01

Pos logit contributions
keeping+0.044
ping+0.031
makers+0.030
 arrest+0.030
 heads+0.030
Neg logit contributions
osh-0.026
int-0.025
ite-0.025
ig-0.025
ores-0.024
Tokenirl
Token ID1901
Feature activation+9.45e+00

Pos logit contributions
ver+0.033
ite+0.032
osh+0.032
auc+0.029
ours+0.028
Neg logit contributions
keeping-0.078
ping-0.064
 arrest-0.062
 heads-0.062
 necks-0.061
Token around
Token ID1088
Feature activation+1.08e+00

Pos logit contributions
int+0.783
ours+0.779
osh+0.745
ores+0.740
ite+0.732
Neg logit contributions
keeping-1.050
ping-0.751
 heads-0.709
makers-0.696
 arrest-0.671
Token,
Token ID11
Feature activation+7.69e-03

Pos logit contributions
ig+0.057
osh+0.057
 of+0.056
ores+0.054
 exactly+0.054
Neg logit contributions
keeping-0.154
ping-0.111
 arrest-0.110
 heads-0.109
makers-0.108
Token spinning
Token ID19493
Feature activation+1.55e+00

Pos logit contributions
osh+0.001
int+0.001
ite+0.001
ig+0.001
ours+0.001
Neg logit contributions
keeping-0.001
ping-0.001
 arrest-0.001
 heads-0.001
makers-0.001
Token the
Token ID262
Feature activation+3.71e-01

Pos logit contributions
osh+0.099
ig+0.097
ite+0.095
ores+0.093
int+0.092
Neg logit contributions
keeping-0.205
ping-0.145
 arrest-0.144
 heads-0.141
makers-0.140
Token da
Token ID12379
Feature activation+1.97e+00

Pos logit contributions
ig+0.021
ite+0.018
osh+0.018
ila+0.017
ee+0.016
Neg logit contributions
keeping-0.055
 arrest-0.040
makers-0.039
 generation-0.039
ping-0.039
Token loved
Token ID6151
Feature activation+6.97e-01

Pos logit contributions
osh+0.024
int+0.023
ite+0.023
ig+0.022
ores+0.022
Neg logit contributions
keeping-0.055
ping-0.040
 heads-0.039
 arrest-0.039
makers-0.038
Token to
Token ID284
Feature activation-4.29e-01

Pos logit contributions
ours+0.039
ores+0.038
osh+0.037
ite+0.037
 of+0.036
Neg logit contributions
keeping-0.096
ping-0.071
makers-0.069
 heads-0.069
edly-0.068
Token dance
Token ID9280
Feature activation+9.73e-01

Pos logit contributions
keeping+0.047
ping+0.032
 arrest+0.031
 heads+0.031
 mant+0.029
Neg logit contributions
ite-0.036
int-0.036
ours-0.036
osh-0.036
ver-0.036
Token and
Token ID290
Feature activation+3.61e-02

Pos logit contributions
int+0.085
ours+0.084
osh+0.082
ite+0.081
ver+0.080
Neg logit contributions
keeping-0.102
ping-0.073
 heads-0.071
edly-0.067
 arrest-0.066
Token tw
Token ID665
Feature activation+5.56e-01

Pos logit contributions
osh+0.003
ite+0.003
int+0.003
ours+0.003
ig+0.003
Neg logit contributions
keeping-0.004
ping-0.003
 heads-0.003
 arrest-0.003
makers-0.003
Tokenirl
Token ID1901
Feature activation+9.45e+00

Pos logit contributions
ver+0.023
ite+0.021
osh+0.021
 swift+0.020
run+0.020
Neg logit contributions
keeping-0.078
 heads-0.066
ping-0.066
 arrest-0.063
 necks-0.062
Token in
Token ID287
Feature activation-1.38e+00

Pos logit contributions
ours+0.733
int+0.730
osh+0.716
ite+0.706
ores+0.705
Neg logit contributions
keeping-1.092
ping-0.795
 heads-0.753
makers-0.732
 arrest-0.722
Token her
Token ID607
Feature activation-4.09e-01

Pos logit contributions
keeping+0.183
ping+0.132
makers+0.128
 arrest+0.127
 heads+0.127
Neg logit contributions
osh-0.088
ite-0.086
ig-0.085
ores-0.083
ours-0.083
Token pretty
Token ID2495
Feature activation-3.27e-01

Pos logit contributions
keeping+0.039
 heads+0.031
ping+0.027
 mant+0.027
 surface+0.026
Neg logit contributions
ver-0.040
ores-0.038
umber-0.037
ours-0.036
osh-0.036
Token skirt
Token ID23967
Feature activation+2.03e+00

Pos logit contributions
keeping+0.038
ping+0.027
 heads+0.026
 arrest+0.025
makers+0.025
Neg logit contributions
osh-0.025
int-0.025
ite-0.025
ours-0.025
ores-0.025
Token.
Token ID13
Feature activation+7.89e-02

Pos logit contributions
ours+0.170
osh+0.170
int+0.170
ores+0.167
ite+0.166
Neg logit contributions
keeping-0.220
ping-0.155
 heads-0.149
 arrest-0.141
 mant-0.139
Token the
Token ID262
Feature activation+2.00e-01

Pos logit contributions
ours+0.015
ig+0.015
osh+0.015
ores+0.015
ite+0.014
Neg logit contributions
keeping-0.035
ping-0.026
makers-0.025
 arrest-0.024
 heads-0.024
Token red
Token ID2266
Feature activation+4.70e-01

Pos logit contributions
int+0.014
ig+0.014
osh+0.014
ite+0.014
ours+0.014
Neg logit contributions
keeping-0.025
ping-0.018
 arrest-0.017
makers-0.017
 heads-0.017
Token dress
Token ID6576
Feature activation+1.91e+00

Pos logit contributions
osh+0.038
ver+0.038
int+0.037
ores+0.037
ite+0.037
Neg logit contributions
keeping-0.053
ping-0.037
 heads-0.037
 arrest-0.035
 mant-0.034
Token and
Token ID290
Feature activation-3.74e-01

Pos logit contributions
int+0.147
osh+0.146
ours+0.144
ite+0.143
ores+0.142
Neg logit contributions
keeping-0.225
ping-0.159
 heads-0.153
 arrest-0.148
makers-0.146
Token tw
Token ID665
Feature activation+4.76e-01

Pos logit contributions
keeping+0.040
 heads+0.032
 arrest+0.029
ping+0.028
 stems+0.027
Neg logit contributions
ver-0.031
umber-0.030
int-0.030
itting-0.030
ite-0.029
Tokenirl
Token ID1901
Feature activation+9.45e+00

Pos logit contributions
ver+0.025
ite+0.023
osh+0.021
 of+0.021
 swift+0.020
Neg logit contributions
keeping-0.063
 heads-0.052
 necks-0.051
ping-0.051
 arrest-0.048
Tokens
Token ID82
Feature activation-1.23e+00

Pos logit contributions
int+0.794
ours+0.727
ver+0.716
ores+0.711
ite+0.706
Neg logit contributions
keeping-1.060
ping-0.745
 heads-0.728
makers-0.707
 arrest-0.668
Token around
Token ID1088
Feature activation+1.21e+00

Pos logit contributions
keeping+0.150
ping+0.107
 heads+0.102
 arrest+0.101
makers+0.099
Neg logit contributions
osh-0.091
int-0.090
ite-0.090
ours-0.089
ores-0.088
Token.
Token ID13
Feature activation-1.96e-01

Pos logit contributions
 of+0.056
ores+0.051
ila+0.047
osh+0.047
ig+0.046
Neg logit contributions
keeping-0.179
 arrest-0.139
ping-0.137
cious-0.134
makers-0.133
Token
Token ID198
Feature activation-1.00e-01

Pos logit contributions
keeping+0.021
 heads+0.014
ping+0.014
 arrest+0.014
edly+0.013
Neg logit contributions
int-0.017
osh-0.016
ig-0.016
ite-0.016
ver-0.015
Token
Token ID198
Feature activation-1.02e-01

Pos logit contributions
keeping+0.013
ping+0.009
 arrest+0.009
 heads+0.009
makers+0.009
Neg logit contributions
int-0.007
ite-0.006
osh-0.006
ig-0.006
ours-0.006
Token day
Token ID1110
Feature activation-7.17e-01

Pos logit contributions
osh+0.405
ores+0.401
ours+0.397
int+0.394
ite+0.393
Neg logit contributions
keeping-0.521
ping-0.345
 heads-0.333
 arrest-0.330
makers-0.329
Token.
Token ID13
Feature activation+8.33e-02

Pos logit contributions
keeping+0.101
ping+0.075
 heads+0.074
 arrest+0.073
makers+0.072
Neg logit contributions
 of-0.038
ite-0.037
ores-0.037
int-0.036
ig-0.036
Token She
Token ID1375
Feature activation+3.29e-01

Pos logit contributions
int+0.005
osh+0.005
ours+0.005
ite+0.005
ores+0.005
Neg logit contributions
keeping-0.011
ping-0.008
 arrest-0.008
 heads-0.008
makers-0.008
Token would
Token ID561
Feature activation+2.39e+00

Pos logit contributions
int+0.024
ite+0.024
osh+0.024
ours+0.023
ores+0.023
Neg logit contributions
keeping-0.040
ping-0.028
 heads-0.028
 arrest-0.027
makers-0.026
Token tw
Token ID665
Feature activation+5.57e-01

Pos logit contributions
osh+0.165
int+0.161
ite+0.160
ores+0.159
ours+0.157
Neg logit contributions
keeping-0.306
ping-0.219
makers-0.207
 heads-0.207
 arrest-0.206
Tokenirl
Token ID1901
Feature activation+9.45e+00

Pos logit contributions
ver+0.025
ite+0.024
osh+0.023
auc+0.022
run+0.022
Neg logit contributions
keeping-0.072
ping-0.064
 heads-0.060
 arrest-0.060
 necks-0.059
Token and
Token ID290
Feature activation-1.19e-02

Pos logit contributions
ours+0.762
int+0.761
osh+0.736
ite+0.732
ores+0.728
Neg logit contributions
keeping-1.057
ping-0.767
 heads-0.719
makers-0.707
 arrest-0.687
Token dance
Token ID9280
Feature activation+9.89e-01

Pos logit contributions
keeping+0.001
ping+0.001
 arrest+0.001
 heads+0.001
makers+0.001
Neg logit contributions
osh-0.001
ite-0.001
int-0.001
ig-0.001
ours-0.001
Token around
Token ID1088
Feature activation+1.01e+00

Pos logit contributions
int+0.071
ite+0.071
osh+0.070
ours+0.070
ores+0.068
Neg logit contributions
keeping-0.122
ping-0.088
 heads-0.084
 arrest-0.082
makers-0.082
Token each
Token ID1123
Feature activation-1.82e+00

Pos logit contributions
osh+0.051
 of+0.050
ores+0.050
ig+0.048
 exactly+0.046
Neg logit contributions
keeping-0.151
 arrest-0.108
makers-0.106
ping-0.106
cious-0.105
Token flower
Token ID15061
Feature activation-2.85e+00

Pos logit contributions
keeping+0.179
ping+0.121
 heads+0.113
 arrest+0.111
makers+0.105
Neg logit contributions
int-0.172
ours-0.171
ite-0.170
ver-0.169
ores-0.169
Token grandma
Token ID49890
Feature activation+6.90e-01

Pos logit contributions
ite+0.065
ig+0.064
osh+0.064
ours+0.063
int+0.063
Neg logit contributions
keeping-0.189
ping-0.145
makers-0.136
 arrest-0.136
 heads-0.136
Token's
Token ID338
Feature activation-8.48e-01

Pos logit contributions
ite+0.044
int+0.044
ig+0.044
osh+0.043
ver+0.043
Neg logit contributions
keeping-0.089
ping-0.065
 heads-0.063
 arrest-0.063
makers-0.062
Token house
Token ID2156
Feature activation+9.09e-02

Pos logit contributions
keeping+0.109
ping+0.083
makers+0.076
 arrest+0.075
cious+0.074
Neg logit contributions
ig-0.053
ite-0.052
ours-0.050
 exactly-0.049
osh-0.048
Token for
Token ID329
Feature activation-4.16e-01

Pos logit contributions
int+0.007
osh+0.007
ite+0.007
ours+0.007
ig+0.007
Neg logit contributions
keeping-0.011
ping-0.008
 heads-0.007
 arrest-0.007
makers-0.007
Token safe
Token ID3338
Feature activation+8.11e-01

Pos logit contributions
keeping+0.057
 heads+0.045
ping+0.045
cious+0.044
 mant+0.043
Neg logit contributions
 exactly-0.022
ores-0.021
 of-0.019
run-0.019
apy-0.018
Tokenkeeping
Token ID19934
Feature activation-1.08e+01

Pos logit contributions
ores+0.073
 of+0.073
 exactly+0.072
ig+0.071
ite+0.070
Neg logit contributions
keeping-0.065
ping-0.048
 mant-0.048
cious-0.047
 arrest-0.046
Token.
Token ID13
Feature activation+1.61e-02

Pos logit contributions
keeping+1.435
ping+1.162
 heads+1.154
 mant+1.117
 arrest+1.112
Neg logit contributions
ver-0.564
 of-0.532
adow-0.524
ite-0.510
 exactly-0.506
Token
Token ID220
Feature activation+5.65e-01

Pos logit contributions
osh+0.001
int+0.001
ig+0.001
ite+0.001
umber+0.001
Neg logit contributions
keeping-0.002
 arrest-0.001
ping-0.001
 heads-0.001
makers-0.001
Token
Token ID198
Feature activation+9.73e-02

Pos logit contributions
ig+0.044
int+0.043
osh+0.043
ite+0.042
ours+0.041
Neg logit contributions
keeping-0.067
ping-0.045
makers-0.044
 arrest-0.042
 heads-0.042
Token
Token ID198
Feature activation+1.07e-01

Pos logit contributions
int+0.006
ours+0.006
osh+0.006
ite+0.006
ig+0.006
Neg logit contributions
keeping-0.013
ping-0.009
 arrest-0.009
makers-0.009
 heads-0.009
TokenWhen
Token ID2215
Feature activation+3.95e-01

Pos logit contributions
int+0.008
ours+0.008
osh+0.008
ig+0.008
ite+0.008
Neg logit contributions
keeping-0.013
ping-0.009
 arrest-0.009
makers-0.008
 heads-0.008
Token it
Token ID340
Feature activation+7.49e-02

Pos logit contributions
osh+0.082
ite+0.081
int+0.081
ours+0.080
ores+0.079
Neg logit contributions
keeping-0.139
ping-0.100
 heads-0.095
 arrest-0.095
makers-0.093
Token to
Token ID284
Feature activation-4.22e-01

Pos logit contributions
int+0.006
osh+0.006
ite+0.006
ours+0.006
ig+0.006
Neg logit contributions
keeping-0.009
ping-0.006
 heads-0.006
 arrest-0.006
makers-0.006
Token me
Token ID502
Feature activation-1.35e+00

Pos logit contributions
keeping+0.056
ping+0.041
 heads+0.039
makers+0.039
 arrest+0.039
Neg logit contributions
osh-0.026
ores-0.026
ours-0.025
ite-0.025
int-0.025
Token for
Token ID329
Feature activation-2.70e-01

Pos logit contributions
keeping+0.155
ping+0.109
 heads+0.106
 arrest+0.102
makers+0.101
Neg logit contributions
int-0.105
osh-0.104
ite-0.104
ours-0.103
ig-0.101
Token safe
Token ID3338
Feature activation+1.13e+00

Pos logit contributions
keeping+0.035
ping+0.026
cious+0.026
 heads+0.025
edly+0.025
Neg logit contributions
 exactly-0.016
ores-0.016
osh-0.015
 of-0.014
 please-0.014
Tokenkeeping
Token ID19934
Feature activation-1.04e+01

Pos logit contributions
 of+0.119
?+0.116
ores+0.112
ite+0.108
adow+0.108
Neg logit contributions
keeping-0.062
cious-0.052
 mant-0.051
ping-0.047
uces-0.047
Token.
Token ID13
Feature activation-2.45e-01

Pos logit contributions
keeping+1.391
ping+1.099
 mant+1.080
 heads+1.065
 generation+1.055
Neg logit contributions
ver-0.527
 of-0.505
ite-0.497
adow-0.495
 exactly-0.493
Token He
Token ID679
Feature activation+6.71e-01

Pos logit contributions
keeping+0.028
 arrest+0.018
ping+0.018
makers+0.018
 mant+0.018
Neg logit contributions
ig-0.021
int-0.020
ite-0.020
osh-0.019
itting-0.019
Token told
Token ID1297
Feature activation+5.19e-01

Pos logit contributions
int+0.043
ores+0.042
ite+0.042
itting+0.042
ig+0.042
Neg logit contributions
keeping-0.086
 heads-0.064
 arrest-0.063
ping-0.062
makers-0.060
Token me
Token ID502
Feature activation-1.34e+00

Pos logit contributions
osh+0.037
ores+0.036
ours+0.036
int+0.035
ila+0.035
Neg logit contributions
keeping-0.065
ping-0.046
 arrest-0.044
makers-0.043
 heads-0.043
Token to
Token ID284
Feature activation-4.11e-01

Pos logit contributions
keeping+0.159
ping+0.113
 heads+0.108
makers+0.108
 arrest+0.108
Neg logit contributions
int-0.105
ite-0.096
ours-0.094
ig-0.093
ang-0.093
Token.
Token ID13
Feature activation+2.27e-02

Pos logit contributions
osh+0.152
int+0.152
ite+0.151
ours+0.149
ores+0.147
Neg logit contributions
keeping-0.265
ping-0.190
 heads-0.183
 arrest-0.182
makers-0.178
Token It
Token ID632
Feature activation+8.88e-01

Pos logit contributions
int+0.001
ite+0.001
osh+0.001
ig+0.001
ours+0.001
Neg logit contributions
keeping-0.003
ping-0.002
 heads-0.002
 arrest-0.002
makers-0.002
Token was
Token ID373
Feature activation+4.05e-01

Pos logit contributions
ite+0.067
ours+0.066
int+0.066
ig+0.065
ores+0.064
Neg logit contributions
keeping-0.105
ping-0.078
 heads-0.075
makers-0.073
 arrest-0.071
Token Mrs
Token ID9074
Feature activation-2.91e+00

Pos logit contributions
int+0.028
ite+0.028
osh+0.027
ours+0.027
ig+0.027
Neg logit contributions
keeping-0.052
ping-0.038
 heads-0.036
 arrest-0.036
makers-0.035
Token.
Token ID13
Feature activation+8.85e-02

Pos logit contributions
keeping+0.361
 arrest+0.279
 mant+0.276
ping+0.271
 heads+0.265
Neg logit contributions
ila-0.175
ver-0.175
int-0.174
ig-0.173
ite-0.172
Token Smith
Token ID4176
Feature activation-9.48e+00

Pos logit contributions
ores+0.008
ver+0.008
ite+0.008
int+0.008
osh+0.008
Neg logit contributions
keeping-0.008
ping-0.005
 heads-0.005
 arrest-0.005
makers-0.005
Token,
Token ID11
Feature activation-1.16e-01

Pos logit contributions
keeping+1.169
ping+0.843
 heads+0.807
 arrest+0.800
makers+0.790
Neg logit contributions
int-0.667
osh-0.665
ite-0.663
ours-0.658
ores-0.650
Token a
Token ID257
Feature activation+4.24e-02

Pos logit contributions
keeping+0.014
ping+0.010
 heads+0.010
 arrest+0.010
makers+0.010
Neg logit contributions
osh-0.008
int-0.008
ite-0.008
ours-0.008
ores-0.008
Token nice
Token ID3621
Feature activation+4.42e-01

Pos logit contributions
osh+0.003
ite+0.003
int+0.003
ours+0.003
ores+0.003
Neg logit contributions
keeping-0.005
ping-0.004
 heads-0.003
 arrest-0.003
makers-0.003
Token lady
Token ID10846
Feature activation+7.52e-01

Pos logit contributions
int+0.036
osh+0.036
ite+0.036
ours+0.036
ores+0.035
Neg logit contributions
keeping-0.050
ping-0.035
 heads-0.033
 arrest-0.033
makers-0.032
Token from
Token ID422
Feature activation-2.97e+00

Pos logit contributions
osh+0.049
int+0.049
ite+0.049
ours+0.048
ores+0.048
Neg logit contributions
keeping-0.097
ping-0.071
 heads-0.068
 arrest-0.067
makers-0.066
Token His
Token ID2399
Feature activation+2.02e-01

Pos logit contributions
int+0.012
ig+0.012
osh+0.012
ours+0.012
ite+0.012
Neg logit contributions
keeping-0.020
ping-0.014
 arrest-0.014
 heads-0.013
makers-0.013
Token name
Token ID1438
Feature activation+2.32e+00

Pos logit contributions
int+0.013
osh+0.013
ite+0.013
ores+0.013
ours+0.013
Neg logit contributions
keeping-0.026
ping-0.019
 heads-0.019
 arrest-0.018
makers-0.018
Token was
Token ID373
Feature activation+6.70e-02

Pos logit contributions
int+0.173
ores+0.169
ours+0.166
ite+0.165
ig+0.164
Neg logit contributions
keeping-0.277
ping-0.200
 heads-0.193
 arrest-0.190
makers-0.186
Token Mr
Token ID1770
Feature activation-2.93e+00

Pos logit contributions
int+0.004
ite+0.004
osh+0.004
ours+0.004
ig+0.004
Neg logit contributions
keeping-0.009
ping-0.006
 arrest-0.006
 heads-0.006
makers-0.006
Token.
Token ID13
Feature activation-1.83e-01

Pos logit contributions
keeping+0.387
 mant+0.289
 arrest+0.286
ping+0.285
 heads+0.276
Neg logit contributions
int-0.181
ite-0.179
ver-0.174
osh-0.171
ours-0.170
Token Smith
Token ID4176
Feature activation-9.47e+00

Pos logit contributions
keeping+0.020
ping+0.014
 heads+0.013
 arrest+0.013
makers+0.013
Neg logit contributions
ite-0.016
int-0.015
osh-0.015
ours-0.015
ig-0.015
Token.
Token ID13
Feature activation-9.75e-02

Pos logit contributions
keeping+1.093
ping+0.770
 heads+0.733
 arrest+0.723
makers+0.719
Neg logit contributions
int-0.736
osh-0.727
ours-0.722
ite-0.720
ores-0.719
Token
Token ID198
Feature activation-3.07e-01

Pos logit contributions
keeping+0.012
ping+0.009
 arrest+0.009
 heads+0.009
makers+0.008
Neg logit contributions
int-0.007
osh-0.007
ite-0.006
ig-0.006
ours-0.006
Token
Token ID198
Feature activation-2.48e-01

Pos logit contributions
keeping+0.040
ping+0.028
 arrest+0.028
makers+0.027
 heads+0.027
Neg logit contributions
int-0.020
osh-0.020
ig-0.020
ours-0.020
ite-0.019
TokenOne
Token ID3198
Feature activation-1.65e+00

Pos logit contributions
keeping+0.030
ping+0.021
 heads+0.021
 arrest+0.021
makers+0.020
Neg logit contributions
osh-0.018
int-0.018
ite-0.018
ours-0.018
ores-0.018
Token day
Token ID1110
Feature activation-8.76e-01

Pos logit contributions
keeping+0.155
ping+0.104
 heads+0.093
 generation+0.092
makers+0.091
Neg logit contributions
ours-0.159
ver-0.156
int-0.153
umber-0.151
ig-0.151
Token tall
Token ID7331
Feature activation+4.82e-01

Pos logit contributions
osh+0.086
ite+0.083
ores+0.083
ours+0.083
ig+0.082
Neg logit contributions
keeping-0.147
ping-0.104
 heads-0.101
 arrest-0.100
makers-0.099
Token trees
Token ID7150
Feature activation-2.48e+00

Pos logit contributions
ite+0.039
osh+0.039
ig+0.038
 of+0.038
ours+0.037
Neg logit contributions
keeping-0.053
ping-0.036
makers-0.036
 mant-0.034
 arrest-0.034
Token and
Token ID290
Feature activation-3.01e-01

Pos logit contributions
keeping+0.321
ping+0.231
 heads+0.228
 arrest+0.224
makers+0.221
Neg logit contributions
ite-0.163
osh-0.160
ores-0.157
ig-0.156
int-0.154
Token ch
Token ID442
Feature activation+1.64e-01

Pos logit contributions
keeping+0.035
ping+0.024
 arrest+0.023
makers+0.023
 heads+0.022
Neg logit contributions
osh-0.023
 of-0.022
apy-0.022
ores-0.022
 exactly-0.022
Tokenir
Token ID343
Feature activation-4.18e-01

Pos logit contributions
ver+0.013
osh+0.013
int+0.013
ite+0.013
ours+0.013
Neg logit contributions
keeping-0.019
ping-0.014
 heads-0.013
 arrest-0.013
makers-0.012
Tokenping
Token ID13886
Feature activation-9.44e+00

Pos logit contributions
keeping+0.029
 mant+0.014
 doorstep+0.013
 arrest+0.013
 heads+0.012
Neg logit contributions
int-0.052
osh-0.051
 of-0.050
ig-0.050
ores-0.050
Token birds
Token ID10087
Feature activation-1.58e+00

Pos logit contributions
keeping+1.162
 arrest+0.801
ping+0.789
 heads+0.787
 mant+0.778
Neg logit contributions
osh-0.676
int-0.653
ite-0.651
ores-0.649
ig-0.637
Token.
Token ID13
Feature activation-9.98e-02

Pos logit contributions
keeping+0.211
 heads+0.151
 arrest+0.150
 mant+0.147
 internet+0.144
Neg logit contributions
ite-0.099
osh-0.097
 of-0.097
ig-0.094
adow-0.093
Token She
Token ID1375
Feature activation+1.03e-01

Pos logit contributions
keeping+0.013
ping+0.009
 arrest+0.009
 heads+0.009
makers+0.009
Neg logit contributions
osh-0.007
int-0.007
ite-0.007
ours-0.007
ig-0.007
Token walked
Token ID6807
Feature activation+1.07e+00

Pos logit contributions
ite+0.008
int+0.008
ours+0.007
ores+0.007
osh+0.007
Neg logit contributions
keeping-0.012
ping-0.009
 heads-0.009
 arrest-0.008
makers-0.008
Token around
Token ID1088
Feature activation+8.16e-01

Pos logit contributions
int+0.072
osh+0.072
ite+0.072
ores+0.071
ig+0.070
Neg logit contributions
keeping-0.136
ping-0.097
 arrest-0.095
 heads-0.094
makers-0.091
Token bright
Token ID6016
Feature activation+4.76e-01

Pos logit contributions
keeping+0.082
ping+0.059
 heads+0.055
 arrest+0.055
makers+0.054
Neg logit contributions
int-0.055
ite-0.054
ours-0.053
ig-0.052
osh-0.051
Token blue
Token ID4171
Feature activation+1.10e+00

Pos logit contributions
int+0.039
osh+0.038
ite+0.038
ours+0.038
ores+0.038
Neg logit contributions
keeping-0.054
ping-0.038
 heads-0.036
 arrest-0.036
makers-0.035
Token birds
Token ID10087
Feature activation-1.55e+00

Pos logit contributions
ite+0.082
ig+0.079
osh+0.078
 of+0.077
int+0.077
Neg logit contributions
keeping-0.137
ping-0.094
makers-0.094
 arrest-0.091
 heads-0.090
Token ch
Token ID442
Feature activation+2.38e-01

Pos logit contributions
keeping+0.206
 heads+0.146
ping+0.145
 arrest+0.143
makers+0.140
Neg logit contributions
ite-0.100
osh-0.099
ig-0.097
int-0.094
ores-0.093
Tokenir
Token ID343
Feature activation-4.05e-01

Pos logit contributions
int+0.017
osh+0.017
ite+0.017
ig+0.016
ours+0.016
Neg logit contributions
keeping-0.030
ping-0.021
 heads-0.020
 arrest-0.020
makers-0.020
Tokenping
Token ID13886
Feature activation-9.44e+00

Pos logit contributions
keeping+0.026
 arrest+0.010
 doorstep+0.010
 generation+0.010
 mant+0.010
Neg logit contributions
int-0.052
 of-0.052
osh-0.051
ig-0.050
itting-0.050
Token in
Token ID287
Feature activation-1.64e+00

Pos logit contributions
keeping+1.257
 arrest+0.889
 heads+0.873
ping+0.868
makers+0.860
Neg logit contributions
osh-0.589
ite-0.569
int-0.567
ores-0.559
ig-0.553
Token the
Token ID262
Feature activation+2.48e-01

Pos logit contributions
keeping+0.232
ping+0.174
makers+0.172
 heads+0.170
edly+0.168
Neg logit contributions
osh-0.083
 of-0.081
ores-0.080
 exactly-0.078
ite-0.075
Token trees
Token ID7150
Feature activation-2.12e+00

Pos logit contributions
ite+0.019
osh+0.019
ig+0.018
int+0.017
ores+0.017
Neg logit contributions
keeping-0.030
ping-0.021
makers-0.020
 arrest-0.020
edly-0.019
Token and
Token ID290
Feature activation-7.26e-02

Pos logit contributions
keeping+0.282
ping+0.204
 heads+0.202
 arrest+0.199
makers+0.197
Neg logit contributions
ite-0.131
osh-0.130
int-0.127
ig-0.125
ores-0.125
Token big
Token ID1263
Feature activation+2.22e-01

Pos logit contributions
keeping+0.008
ping+0.006
 arrest+0.006
makers+0.006
 heads+0.006
Neg logit contributions
osh-0.006
ite-0.005
ores-0.005
ours-0.005
apy-0.005
Token birds
Token ID10087
Feature activation-1.64e+00

Pos logit contributions
int+0.373
ite+0.370
osh+0.369
ours+0.363
ig+0.360
Neg logit contributions
keeping-0.510
ping-0.351
 heads-0.337
 arrest-0.334
makers-0.326
Token inside
Token ID2641
Feature activation-3.50e+00

Pos logit contributions
keeping+0.215
 heads+0.154
ping+0.153
 arrest+0.149
 mant+0.148
Neg logit contributions
ite-0.106
osh-0.105
ig-0.102
int-0.099
ores-0.098
Token,
Token ID11
Feature activation-2.70e-01

Pos logit contributions
keeping+0.453
ping+0.331
 heads+0.320
 arrest+0.317
makers+0.312
Neg logit contributions
osh-0.227
ite-0.224
int-0.224
ours-0.221
ores-0.219
Token ch
Token ID442
Feature activation+2.89e-01

Pos logit contributions
keeping+0.031
ping+0.022
 heads+0.021
 arrest+0.021
makers+0.021
Neg logit contributions
osh-0.021
ite-0.021
int-0.020
ores-0.020
ours-0.020
Tokenir
Token ID343
Feature activation-3.84e-01

Pos logit contributions
ite+0.021
ver+0.021
osh+0.021
int+0.021
ours+0.021
Neg logit contributions
keeping-0.035
ping-0.025
 heads-0.024
 arrest-0.024
makers-0.023
Tokenping
Token ID13886
Feature activation-9.41e+00

Pos logit contributions
keeping+0.026
 mant+0.010
 generation+0.010
 arrest+0.010
 doorstep+0.009
Neg logit contributions
int-0.049
osh-0.049
ores-0.048
ig-0.048
 of-0.048
Token loudly
Token ID23112
Feature activation-1.18e+00

Pos logit contributions
keeping+1.209
 arrest+0.845
ping+0.833
 heads+0.830
makers+0.822
Neg logit contributions
osh-0.623
ite-0.595
ores-0.594
int-0.593
ig-0.582
Token!
Token ID0
Feature activation+9.13e-01

Pos logit contributions
keeping+0.159
 mant+0.116
 generation+0.116
 arrest+0.115
makers+0.113
Neg logit contributions
 of-0.061
osh-0.057
ores-0.057
aw-0.053
int-0.053
Token El
Token ID2574
Feature activation+2.59e-01

Pos logit contributions
ite+0.070
int+0.070
ig+0.069
osh+0.068
ours+0.068
Neg logit contributions
keeping-0.106
ping-0.074
 heads-0.072
makers-0.070
 arrest-0.070
Tokenisa
Token ID9160
Feature activation-1.09e+00

Pos logit contributions
ite+0.024
osh+0.024
int+0.024
ours+0.024
ores+0.024
Neg logit contributions
keeping-0.026
ping-0.017
 heads-0.016
 arrest-0.015
makers-0.015
Token smiled
Token ID13541
Feature activation+1.29e-01

Pos logit contributions
keeping+0.143
ping+0.103
 arrest+0.098
 heads+0.098
makers+0.097
Neg logit contributions
osh-0.072
int-0.070
ite-0.069
ig-0.068
ours-0.067
Token when
Token ID618
Feature activation-1.69e-02

Pos logit contributions
ite+0.172
osh+0.171
ores+0.167
int+0.167
ig+0.165
Neg logit contributions
keeping-0.338
ping-0.244
 arrest-0.236
 heads-0.236
makers-0.234
Token her
Token ID607
Feature activation-7.51e-01

Pos logit contributions
keeping+0.002
ping+0.002
 heads+0.002
 arrest+0.002
makers+0.002
Neg logit contributions
osh-0.001
int-0.001
ite-0.001
ours-0.001
ores-0.001
Token neighbor
Token ID4780
Feature activation+1.87e+00

Pos logit contributions
keeping+0.098
ping+0.071
 heads+0.070
 arrest+0.068
makers+0.067
Neg logit contributions
osh-0.049
int-0.048
ours-0.047
ite-0.047
ores-0.047
Token,
Token ID11
Feature activation-3.16e-01

Pos logit contributions
osh+0.119
int+0.119
ite+0.118
ours+0.117
ores+0.115
Neg logit contributions
keeping-0.243
ping-0.178
 heads-0.171
 arrest-0.170
makers-0.167
Token Mrs
Token ID9074
Feature activation-2.76e+00

Pos logit contributions
keeping+0.040
ping+0.029
 arrest+0.028
makers+0.028
 heads+0.027
Neg logit contributions
int-0.023
ite-0.022
ig-0.022
ours-0.021
osh-0.021
Token Smith
Token ID4176
Feature activation-9.38e+00

Pos logit contributions
keeping+0.378
 arrest+0.304
 mant+0.289
ping+0.288
 heads+0.273
Neg logit contributions
ores-0.140
ila-0.140
ig-0.139
ite-0.137
?-0.137
Token,
Token ID11
Feature activation-2.70e-01

Pos logit contributions
keeping+1.185
ping+0.856
 heads+0.826
 arrest+0.815
makers+0.797
Neg logit contributions
ite-0.643
osh-0.643
int-0.638
ours-0.627
ig-0.626
Token came
Token ID1625
Feature activation+1.88e+00

Pos logit contributions
keeping+0.034
ping+0.025
 heads+0.024
 arrest+0.024
makers+0.023
Neg logit contributions
osh-0.018
ite-0.018
int-0.018
ours-0.018
ores-0.017
Token over
Token ID625
Feature activation-1.16e-01

Pos logit contributions
 of+0.108
osh+0.107
run+0.106
ite+0.106
int+0.106
Neg logit contributions
keeping-0.243
 arrest-0.181
ping-0.177
 heads-0.175
 mant-0.174
Token to
Token ID284
Feature activation-5.02e-01

Pos logit contributions
keeping+0.015
ping+0.011
 heads+0.011
 arrest+0.010
 mant+0.010
Neg logit contributions
osh-0.008
ores-0.008
ite-0.007
ours-0.007
int-0.007
Token talk
Token ID1561
Feature activation-1.60e+00

Pos logit contributions
keeping+0.066
ping+0.050
 heads+0.047
edly+0.047
makers+0.046
Neg logit contributions
osh-0.031
ores-0.029
int-0.029
ours-0.028
ila-0.028
Token she
Token ID673
Feature activation+1.37e+00

Pos logit contributions
keeping+0.035
ping+0.026
 heads+0.025
 arrest+0.025
makers+0.025
Neg logit contributions
int-0.018
ite-0.017
ig-0.016
ours-0.016
osh-0.016
Token saw
Token ID2497
Feature activation-3.29e-01

Pos logit contributions
ores+0.123
int+0.120
ite+0.118
ours+0.117
ver+0.115
Neg logit contributions
keeping-0.129
 heads-0.094
ping-0.088
cious-0.082
 arrest-0.082
Token birds
Token ID10087
Feature activation-1.49e+00

Pos logit contributions
keeping+0.042
ping+0.030
 heads+0.029
 arrest+0.029
makers+0.028
Neg logit contributions
int-0.023
ite-0.022
osh-0.022
ours-0.022
ig-0.022
Token ch
Token ID442
Feature activation+2.26e-01

Pos logit contributions
keeping+0.193
 arrest+0.131
 heads+0.128
makers+0.126
 mant+0.126
Neg logit contributions
ite-0.106
ig-0.103
osh-0.103
int-0.100
ours-0.098
Tokenir
Token ID343
Feature activation-4.71e-01

Pos logit contributions
int+0.016
ite+0.016
osh+0.016
ig+0.015
ours+0.015
Neg logit contributions
keeping-0.029
ping-0.020
 heads-0.020
 arrest-0.020
makers-0.019
Tokenping
Token ID13886
Feature activation-9.37e+00

Pos logit contributions
keeping+0.032
 doorstep+0.016
 generation+0.016
 mant+0.016
 arrest+0.014
Neg logit contributions
int-0.060
aw-0.059
 of-0.057
osh-0.056
adow-0.055
Token,
Token ID11
Feature activation-2.31e-01

Pos logit contributions
keeping+1.252
 arrest+0.899
 heads+0.866
 mant+0.865
makers+0.860
Neg logit contributions
osh-0.583
ite-0.573
int-0.570
ores-0.556
 of-0.553
Token butterflies
Token ID44571
Feature activation-4.69e+00

Pos logit contributions
keeping+0.028
 arrest+0.018
ping+0.018
 heads+0.018
makers+0.018
Neg logit contributions
osh-0.018
ite-0.017
ores-0.017
ours-0.017
int-0.017
Token fl
Token ID781
Feature activation-5.59e-01

Pos logit contributions
keeping+0.691
 arrest+0.496
 heads+0.490
 mant+0.487
ping+0.479
Neg logit contributions
ite-0.252
ig-0.249
osh-0.241
 of-0.238
ver-0.226
Tokenuttering
Token ID33598
Feature activation+1.72e+00

Pos logit contributions
keeping+0.073
ping+0.058
 heads+0.054
makers+0.052
edly+0.052
Neg logit contributions
ours-0.032
osh-0.032
ores-0.031
ver-0.030
ig-0.030
Token and
Token ID290
Feature activation-1.14e-01

Pos logit contributions
osh+0.117
ite+0.116
int+0.115
ores+0.113
ours+0.112
Neg logit contributions
keeping-0.218
ping-0.155
 arrest-0.153
 heads-0.151
makers-0.148
Token My
Token ID2011
Feature activation-3.56e-03

Pos logit contributions
keeping+0.063
ping+0.046
 heads+0.045
 arrest+0.044
 mant+0.043
Neg logit contributions
ite-0.030
osh-0.029
ours-0.029
int-0.029
ores-0.029
Token name
Token ID1438
Feature activation+2.35e+00

Pos logit contributions
keeping+0.000
ping+0.000
 heads+0.000
 arrest+0.000
makers+0.000
Neg logit contributions
osh-0.000
int-0.000
ite-0.000
ours-0.000
ores-0.000
Token is
Token ID318
Feature activation+1.50e+00

Pos logit contributions
int+0.133
ores+0.129
ours+0.127
ite+0.126
ig+0.126
Neg logit contributions
keeping-0.319
ping-0.243
 heads-0.231
makers-0.229
 arrest-0.228
Token Mr
Token ID1770
Feature activation-2.71e+00

Pos logit contributions
int+0.110
ite+0.107
osh+0.106
ig+0.104
ours+0.103
Neg logit contributions
keeping-0.183
ping-0.133
 heads-0.127
makers-0.123
 arrest-0.123
Token.
Token ID13
Feature activation-2.51e-01

Pos logit contributions
keeping+0.344
 mant+0.250
 arrest+0.250
ping+0.249
 heads+0.244
Neg logit contributions
ite-0.183
int-0.183
osh-0.176
ver-0.174
ours-0.173
Token Smith
Token ID4176
Feature activation-9.37e+00

Pos logit contributions
keeping+0.028
ping+0.019
 heads+0.018
 arrest+0.018
makers+0.017
Neg logit contributions
osh-0.021
int-0.021
ite-0.021
ours-0.021
ores-0.021
Token.
Token ID13
Feature activation-1.53e-01

Pos logit contributions
keeping+1.126
ping+0.809
 heads+0.783
 arrest+0.773
makers+0.766
Neg logit contributions
int-0.678
ores-0.657
osh-0.649
ours-0.645
ite-0.645
Token Would
Token ID10928
Feature activation-8.55e-01

Pos logit contributions
keeping+0.018
ping+0.012
 arrest+0.012
 heads+0.012
makers+0.012
Neg logit contributions
int-0.012
osh-0.012
ig-0.012
ores-0.011
ite-0.011
Token you
Token ID345
Feature activation-1.65e+00

Pos logit contributions
keeping+0.107
ping+0.075
 heads+0.074
 arrest+0.071
 mant+0.070
Neg logit contributions
ite-0.063
osh-0.062
ours-0.061
ver-0.060
int-0.060
Token like
Token ID588
Feature activation+1.82e+00

Pos logit contributions
keeping+0.169
ping+0.120
 heads+0.111
makers+0.110
 arrest+0.109
Neg logit contributions
ours-0.138
apy-0.134
ig-0.132
osh-0.131
int-0.129
Token me
Token ID502
Feature activation-1.35e+00

Pos logit contributions
osh+0.121
ite+0.120
ours+0.119
int+0.118
ores+0.117
Neg logit contributions
keeping-0.235
ping-0.169
 heads-0.163
 arrest-0.161
makers-0.160
Token the
Token ID262
Feature activation+2.76e-01

Pos logit contributions
keeping+0.206
ping+0.154
makers+0.147
 heads+0.147
 arrest+0.145
Neg logit contributions
osh-0.090
ores-0.086
ours-0.086
ite-0.086
 exactly-0.085
Token sky
Token ID6766
Feature activation-1.03e+00

Pos logit contributions
ite+0.022
osh+0.022
ig+0.022
int+0.021
ours+0.021
Neg logit contributions
keeping-0.033
ping-0.023
makers-0.022
 arrest-0.021
 heads-0.021
Token,
Token ID11
Feature activation-2.04e-01

Pos logit contributions
keeping+0.131
ping+0.096
 heads+0.092
 arrest+0.090
makers+0.089
Neg logit contributions
int-0.068
osh-0.067
ite-0.067
ours-0.067
ores-0.065
Token ch
Token ID442
Feature activation+3.64e-01

Pos logit contributions
keeping+0.024
ping+0.017
 heads+0.016
 arrest+0.016
makers+0.016
Neg logit contributions
osh-0.015
int-0.015
ite-0.015
ours-0.015
ores-0.015
Tokenir
Token ID343
Feature activation-2.62e-01

Pos logit contributions
ver+0.030
int+0.029
ite+0.029
ours+0.029
osh+0.028
Neg logit contributions
keeping-0.040
ping-0.030
 heads-0.028
 arrest-0.027
makers-0.026
Tokenping
Token ID13886
Feature activation-9.36e+00

Pos logit contributions
keeping+0.018
 mant+0.007
 arrest+0.007
 generation+0.006
 doorstep+0.006
Neg logit contributions
osh-0.034
int-0.034
ores-0.033
 of-0.033
ig-0.033
Token loudly
Token ID23112
Feature activation-1.05e+00

Pos logit contributions
keeping+1.243
 arrest+0.871
ping+0.855
 heads+0.848
makers+0.846
Neg logit contributions
osh-0.610
int-0.580
ite-0.578
ores-0.573
ig-0.564
Token.
Token ID13
Feature activation-1.20e-02

Pos logit contributions
keeping+0.151
 mant+0.111
 generation+0.109
 arrest+0.108
 doorstep+0.106
Neg logit contributions
osh-0.052
 of-0.050
aw-0.048
ores-0.047
ig-0.043
Token
Token ID198
Feature activation+4.71e-02

Pos logit contributions
keeping+0.002
ping+0.001
 heads+0.001
 arrest+0.001
makers+0.001
Neg logit contributions
int-0.001
ite-0.001
osh-0.001
ig-0.001
ours-0.001
Token
Token ID198
Feature activation+7.09e-02

Pos logit contributions
int+0.003
ours+0.003
ite+0.003
ig+0.003
osh+0.003
Neg logit contributions
keeping-0.006
ping-0.004
makers-0.004
 arrest-0.004
 heads-0.004
TokenThe
Token ID464
Feature activation+3.20e-01

Pos logit contributions
osh+0.005
int+0.005
ite+0.005
ores+0.005
ours+0.005
Neg logit contributions
keeping-0.009
ping-0.006
 heads-0.006
 arrest-0.006
makers-0.006
Token books
Token ID3835
Feature activation-1.21e+00

Pos logit contributions
int+0.053
ig+0.052
ite+0.051
osh+0.050
ours+0.050
Neg logit contributions
keeping-0.081
ping-0.060
makers-0.056
 heads-0.055
 arrest-0.055
Token and
Token ID290
Feature activation-5.01e-01

Pos logit contributions
keeping+0.152
ping+0.112
 heads+0.109
 arrest+0.108
makers+0.105
Neg logit contributions
ite-0.080
osh-0.079
int-0.077
ver-0.077
ores-0.076
Token paper
Token ID3348
Feature activation-2.21e+00

Pos logit contributions
keeping+0.059
ping+0.042
 heads+0.040
 arrest+0.040
makers+0.039
Neg logit contributions
osh-0.038
int-0.037
ite-0.037
ours-0.037
ores-0.037
Token.
Token ID13
Feature activation-1.91e-01

Pos logit contributions
keeping+0.273
ping+0.196
 arrest+0.187
 heads+0.187
makers+0.183
Neg logit contributions
osh-0.155
ite-0.155
int-0.153
ours-0.153
ig-0.151
Token Mrs
Token ID9074
Feature activation-2.68e+00

Pos logit contributions
keeping+0.023
ping+0.016
 arrest+0.016
 heads+0.015
makers+0.015
Neg logit contributions
int-0.014
ig-0.014
osh-0.014
ite-0.014
ours-0.013
Token Smith
Token ID4176
Feature activation-9.36e+00

Pos logit contributions
keeping+0.371
 mant+0.296
 arrest+0.290
ping+0.279
 internet+0.277
Neg logit contributions
int-0.135
ila-0.132
?-0.130
ite-0.128
ours-0.127
Token helped
Token ID4193
Feature activation+1.70e+00

Pos logit contributions
keeping+1.207
ping+0.884
 heads+0.859
 arrest+0.849
makers+0.832
Neg logit contributions
int-0.597
ite-0.593
osh-0.593
ours-0.583
ores-0.583
Token L
Token ID406
Feature activation+3.52e+00

Pos logit contributions
 of+0.094
 exactly+0.089
ores+0.088
ours+0.082
 please+0.081
Neg logit contributions
keeping-0.227
ping-0.175
 mant-0.169
cious-0.168
 arrest-0.167
Tokenola
Token ID5708
Feature activation+2.21e+00

Pos logit contributions
osh+0.173
ite+0.166
ores+0.162
int+0.162
apy+0.161
Neg logit contributions
keeping-0.510
ping-0.388
makers-0.375
 arrest-0.368
 heads-0.362
Token to
Token ID284
Feature activation-5.86e-01

Pos logit contributions
ite+0.173
int+0.172
osh+0.170
ig+0.168
ours+0.167
Neg logit contributions
keeping-0.252
ping-0.175
 heads-0.172
 arrest-0.170
makers-0.163
Token sit
Token ID1650
Feature activation+2.73e+00

Pos logit contributions
keeping+0.068
ping+0.048
 heads+0.046
 arrest+0.046
makers+0.044
Neg logit contributions
int-0.046
osh-0.046
ite-0.046
ours-0.045
ig-0.045
Token her
Token ID607
Feature activation-5.62e-01

Pos logit contributions
ite+0.031
int+0.031
ig+0.030
ours+0.030
osh+0.029
Neg logit contributions
keeping-0.048
ping-0.036
 heads-0.033
 arrest-0.032
makers-0.032
Token neighbor
Token ID4780
Feature activation+2.01e+00

Pos logit contributions
keeping+0.070
 heads+0.051
ping+0.051
 arrest+0.049
makers+0.048
Neg logit contributions
int-0.039
ours-0.038
osh-0.037
ite-0.037
ores-0.037
Token,
Token ID11
Feature activation-2.98e-01

Pos logit contributions
int+0.139
osh+0.138
ite+0.137
ours+0.137
ores+0.135
Neg logit contributions
keeping-0.252
ping-0.183
 heads-0.176
 arrest-0.173
makers-0.171
Token Mrs
Token ID9074
Feature activation-2.85e+00

Pos logit contributions
keeping+0.036
ping+0.025
 heads+0.024
 arrest+0.024
makers+0.024
Neg logit contributions
int-0.023
ite-0.022
ours-0.022
osh-0.022
ig-0.022
Token.
Token ID13
Feature activation-7.18e-02

Pos logit contributions
keeping+0.381
 arrest+0.312
 mant+0.306
ping+0.291
 internet+0.285
Neg logit contributions
?-0.137
ver-0.137
ores-0.137
ila-0.136
int-0.135
Token Smith
Token ID4176
Feature activation-9.35e+00

Pos logit contributions
keeping+0.007
ping+0.004
 heads+0.004
makers+0.004
 mant+0.004
Neg logit contributions
ver-0.007
ores-0.007
ite-0.007
int-0.007
ours-0.007
Token.
Token ID13
Feature activation-3.19e-02

Pos logit contributions
keeping+1.190
ping+0.869
 heads+0.836
 arrest+0.827
makers+0.818
Neg logit contributions
int-0.620
osh-0.616
ite-0.614
ours-0.608
ores-0.602
Token
Token ID220
Feature activation+4.90e-01

Pos logit contributions
keeping+0.004
ping+0.003
 heads+0.003
 arrest+0.003
makers+0.003
Neg logit contributions
ores-0.002
ver-0.002
ours-0.002
int-0.002
ite-0.002
Token
Token ID198
Feature activation+1.04e-01

Pos logit contributions
ig+0.036
int+0.036
osh+0.035
ite+0.035
ours+0.033
Neg logit contributions
keeping-0.060
ping-0.042
makers-0.041
 heads-0.039
 arrest-0.038
Token
Token ID198
Feature activation+1.23e-01

Pos logit contributions
ver+0.005
ores+0.005
ite+0.005
int+0.004
ila+0.004
Neg logit contributions
keeping-0.015
 heads-0.012
ping-0.012
 arrest-0.011
 mant-0.011
TokenMrs
Token ID27034
Feature activation-4.12e+00

Pos logit contributions
int+0.009
ite+0.008
ores+0.008
ver+0.008
 of+0.008
Neg logit contributions
keeping-0.015
ping-0.011
 heads-0.011
 mant-0.010
 arrest-0.010
Tokeny
Token ID88
Feature activation-2.93e+00

Pos logit contributions
osh+0.035
int+0.035
ite+0.034
ours+0.034
ores+0.034
Neg logit contributions
keeping-0.048
ping-0.033
 heads-0.032
 arrest-0.032
makers-0.031
Token neighbor
Token ID4780
Feature activation+1.96e+00

Pos logit contributions
keeping+0.375
ping+0.274
 arrest+0.258
makers+0.257
 mant+0.255
Neg logit contributions
ite-0.193
int-0.193
osh-0.190
ig-0.188
ours-0.184
Token,
Token ID11
Feature activation-3.98e-01

Pos logit contributions
osh+0.126
int+0.125
ite+0.125
ours+0.123
ores+0.122
Neg logit contributions
keeping-0.254
ping-0.186
 heads-0.179
 arrest-0.178
makers-0.175
Token Mrs
Token ID9074
Feature activation-2.85e+00

Pos logit contributions
keeping+0.051
ping+0.036
 arrest+0.035
 heads+0.035
makers+0.034
Neg logit contributions
int-0.028
ite-0.027
osh-0.027
ig-0.027
ours-0.027
Token.
Token ID13
Feature activation-1.35e-01

Pos logit contributions
keeping+0.410
 arrest+0.336
 mant+0.332
ping+0.312
 internet+0.310
Neg logit contributions
?-0.120
ores-0.119
ila-0.118
ours-0.110
ite-0.109
Token Smith
Token ID4176
Feature activation-9.35e+00

Pos logit contributions
keeping+0.013
ping+0.009
 mant+0.009
makers+0.009
 heads+0.009
Neg logit contributions
ver-0.012
ite-0.012
ores-0.012
int-0.012
ila-0.011
Token,
Token ID11
Feature activation-2.43e-01

Pos logit contributions
keeping+1.167
ping+0.846
 heads+0.812
 arrest+0.805
makers+0.798
Neg logit contributions
int-0.642
osh-0.638
ite-0.634
ores-0.630
ours-0.628
Token came
Token ID1625
Feature activation+1.96e+00

Pos logit contributions
keeping+0.032
ping+0.023
 heads+0.022
 arrest+0.022
makers+0.022
Neg logit contributions
osh-0.016
int-0.015
ite-0.015
ours-0.015
ores-0.015
Token over
Token ID625
Feature activation-7.91e-02

Pos logit contributions
int+0.125
osh+0.124
ite+0.124
ores+0.123
ours+0.120
Neg logit contributions
keeping-0.251
ping-0.181
 arrest-0.180
 heads-0.178
 mant-0.173
Token to
Token ID284
Feature activation-4.81e-01

Pos logit contributions
keeping+0.010
ping+0.008
 heads+0.008
 mant+0.007
 arrest+0.007
Neg logit contributions
ores-0.005
osh-0.005
 of-0.005
ours-0.005
ite-0.005
Token say
Token ID910
Feature activation+8.61e-01

Pos logit contributions
keeping+0.063
ping+0.050
edly+0.048
 heads+0.047
cious+0.046
Neg logit contributions
ores-0.027
?-0.027
osh-0.026
 please-0.026
 of-0.026
Token that
Token ID326
Feature activation-5.47e-01

Pos logit contributions
keeping+0.011
ping+0.008
 heads+0.007
 arrest+0.007
makers+0.007
Neg logit contributions
int-0.007
osh-0.006
ours-0.006
ig-0.006
ite-0.006
Token says
Token ID1139
Feature activation+9.07e-01

Pos logit contributions
keeping+0.071
 heads+0.052
ping+0.051
 arrest+0.050
makers+0.049
Neg logit contributions
int-0.036
ig-0.035
ite-0.033
ours-0.033
itting-0.033
Token "
Token ID366
Feature activation-5.03e-01

Pos logit contributions
int+0.067
ite+0.066
osh+0.065
ours+0.065
ig+0.065
Neg logit contributions
keeping-0.109
ping-0.078
 heads-0.075
 arrest-0.074
makers-0.074
TokenMr
Token ID5246
Feature activation-6.44e+00

Pos logit contributions
keeping+0.058
ping+0.041
 heads+0.040
 arrest+0.039
 mant+0.039
Neg logit contributions
ite-0.040
osh-0.040
int-0.039
ver-0.038
ores-0.037
Token.
Token ID13
Feature activation-1.42e-01

Pos logit contributions
keeping+0.787
ping+0.571
 mant+0.555
 arrest+0.550
 heads+0.546
Neg logit contributions
ite-0.467
osh-0.464
int-0.460
ver-0.446
ores-0.443
Token Smith
Token ID4176
Feature activation-9.35e+00

Pos logit contributions
keeping+0.016
ping+0.010
 heads+0.010
 arrest+0.010
makers+0.010
Neg logit contributions
int-0.012
osh-0.012
ite-0.012
ours-0.012
ores-0.012
Token".
Token ID1911
Feature activation-1.52e+00

Pos logit contributions
keeping+1.155
ping+0.831
 heads+0.800
 arrest+0.798
makers+0.784
Neg logit contributions
int-0.656
osh-0.641
ite-0.639
ours-0.633
ores-0.632
Token He
Token ID679
Feature activation+5.33e-01

Pos logit contributions
keeping+0.211
ping+0.159
 heads+0.154
 arrest+0.152
makers+0.150
Neg logit contributions
osh-0.084
int-0.083
ite-0.083
ours-0.082
ores-0.081
Token looks
Token ID3073
Feature activation+2.36e+00

Pos logit contributions
ite+0.041
int+0.040
ig+0.040
ores+0.039
itting+0.039
Neg logit contributions
keeping-0.058
 heads-0.046
 arrest-0.044
ping-0.042
makers-0.040
Token bored
Token ID17533
Feature activation+1.08e+00

Pos logit contributions
ite+0.152
ig+0.143
osh+0.139
ver+0.139
umber+0.138
Neg logit contributions
keeping-0.295
ping-0.238
 heads-0.236
 mant-0.222
makers-0.222
Token and
Token ID290
Feature activation-3.03e-01

Pos logit contributions
int+0.081
osh+0.080
ite+0.079
ours+0.079
ig+0.078
Neg logit contributions
keeping-0.131
ping-0.093
 heads-0.089
 arrest-0.088
makers-0.087
Token His
Token ID2399
Feature activation+1.21e-01

Pos logit contributions
keeping+0.013
ping+0.009
 heads+0.009
 mant+0.009
 Smith+0.009
Neg logit contributions
ig-0.008
osh-0.008
int-0.008
ite-0.008
ours-0.008
Token teacher
Token ID4701
Feature activation+1.03e+00

Pos logit contributions
int+0.009
ores+0.009
osh+0.009
ours+0.009
ite+0.009
Neg logit contributions
keeping-0.014
 heads-0.011
ping-0.010
 arrest-0.010
 mant-0.009
Token,
Token ID11
Feature activation-3.77e-01

Pos logit contributions
ores+0.075
int+0.075
osh+0.073
ite+0.073
ours+0.073
Neg logit contributions
keeping-0.123
ping-0.088
 heads-0.087
makers-0.086
 arrest-0.085
Token Mrs
Token ID9074
Feature activation-2.88e+00

Pos logit contributions
keeping+0.044
ping+0.030
 arrest+0.030
makers+0.029
 heads+0.029
Neg logit contributions
int-0.031
ig-0.031
ite-0.030
ours-0.029
osh-0.028
Token.
Token ID13
Feature activation-9.70e-02

Pos logit contributions
keeping+0.390
 arrest+0.302
ping+0.296
 mant+0.295
 heads+0.283
Neg logit contributions
ores-0.154
ite-0.153
ig-0.152
ila-0.152
int-0.150
Token Smith
Token ID4176
Feature activation-9.35e+00

Pos logit contributions
keeping+0.009
ping+0.006
 heads+0.006
edly+0.005
 mant+0.005
Neg logit contributions
ver-0.009
ores-0.009
ite-0.009
 Liz-0.009
ours-0.009
Token,
Token ID11
Feature activation-3.17e-01

Pos logit contributions
keeping+1.183
ping+0.868
 heads+0.852
makers+0.849
 arrest+0.839
Neg logit contributions
int-0.616
ores-0.595
osh-0.585
ite-0.578
ours-0.572
Token was
Token ID373
Feature activation+3.80e-01

Pos logit contributions
keeping+0.040
ping+0.029
 arrest+0.028
 heads+0.028
makers+0.027
Neg logit contributions
int-0.022
ite-0.022
ig-0.022
osh-0.022
ours-0.021
Token very
Token ID845
Feature activation-3.86e-01

Pos logit contributions
int+0.038
ite+0.037
ig+0.037
osh+0.036
ours+0.035
Neg logit contributions
keeping-0.036
ping-0.024
makers-0.024
 arrest-0.023
 heads-0.022
Token nice
Token ID3621
Feature activation+3.88e-01

Pos logit contributions
keeping+0.044
makers+0.032
 heads+0.031
ping+0.031
 arrest+0.030
Neg logit contributions
ig-0.030
int-0.030
ite-0.029
osh-0.029
ours-0.028
Token and
Token ID290
Feature activation-9.26e-02

Pos logit contributions
osh+0.028
int+0.028
ite+0.028
ours+0.027
ores+0.027
Neg logit contributions
keeping-0.048
ping-0.034
 heads-0.033
 arrest-0.033
makers-0.032
Token�
Token ID95
Feature activation+3.78e+00

Pos logit contributions
osh+0.214
ours+0.209
ite+0.208
ver+0.206
int+0.205
Neg logit contributions
keeping-0.362
ping-0.261
 arrest-0.256
makers-0.250
 heads-0.249
Token€
Token ID26391
Feature activation+5.07e-01

Pos logit contributions
int+0.324
osh+0.322
ite+0.321
ours+0.316
ores+0.316
Neg logit contributions
keeping-0.414
ping-0.281
 heads-0.270
 arrest-0.266
makers-0.260
Token�
Token ID129
Feature activation+2.32e+00

Pos logit contributions
int+0.039
ours+0.038
osh+0.038
ores+0.037
ig+0.037
Neg logit contributions
keeping-0.060
 arrest-0.042
ping-0.041
cious-0.039
 heads-0.039
Token�
Token ID241
Feature activation+2.19e-01

Pos logit contributions
ores+0.200
int+0.195
osh+0.186
ours+0.186
ig+0.182
Neg logit contributions
keeping-0.263
ping-0.164
 arrest-0.162
 heads-0.161
makers-0.161
TokenHel
Token ID12621
Feature activation-1.24e+00

Pos logit contributions
int+0.017
ite+0.017
osh+0.017
ig+0.017
ours+0.017
Neg logit contributions
keeping-0.026
ping-0.018
 arrest-0.017
makers-0.017
 heads-0.017
Tokenping
Token ID13886
Feature activation-9.34e+00

Pos logit contributions
keeping+0.119
 heads+0.070
 arrest+0.069
makers+0.066
 mant+0.066
Neg logit contributions
int-0.128
ig-0.125
osh-0.124
ours-0.121
ite-0.120
Token others
Token ID1854
Feature activation-1.67e+00

Pos logit contributions
keeping+1.148
ping+0.843
 heads+0.798
 arrest+0.782
makers+0.772
Neg logit contributions
ours-0.662
int-0.661
ite-0.660
osh-0.653
ores-0.646
Token is
Token ID318
Feature activation+1.80e+00

Pos logit contributions
keeping+0.213
ping+0.156
 heads+0.150
 arrest+0.148
makers+0.147
Neg logit contributions
int-0.111
ite-0.109
osh-0.109
ours-0.108
ores-0.107
Token the
Token ID262
Feature activation+6.06e-01

Pos logit contributions
ite+0.135
int+0.134
ig+0.131
osh+0.129
ours+0.128
Neg logit contributions
keeping-0.216
ping-0.160
 heads-0.148
makers-0.147
 arrest-0.146
Token kind
Token ID1611
Feature activation+2.55e+00

Pos logit contributions
osh+0.035
ite+0.035
int+0.035
ig+0.034
ores+0.034
Neg logit contributions
keeping-0.083
ping-0.061
 arrest-0.059
 heads-0.059
makers-0.058
Token thing
Token ID1517
Feature activation+8.37e-01

Pos logit contributions
ours+0.071
osh+0.068
int+0.067
ores+0.066
ver+0.061
Neg logit contributions
keeping-0.431
ping-0.344
 heads-0.330
 arrest-0.329
makers-0.327
Token said
Token ID531
Feature activation+3.10e-01

Pos logit contributions
keeping+0.411
 heads+0.305
ping+0.280
edly+0.276
cious+0.260
Neg logit contributions
ite-0.313
ver-0.312
itting-0.301
ores-0.301
osh-0.300
Token the
Token ID262
Feature activation+4.04e-01

Pos logit contributions
ite+0.027
ours+0.027
udd+0.026
uddle+0.025
int+0.025
Neg logit contributions
keeping-0.033
 heads-0.022
ping-0.022
 mant-0.021
edly-0.021
Token teacher
Token ID4701
Feature activation+1.02e+00

Pos logit contributions
ite+0.027
osh+0.027
int+0.027
ours+0.026
ig+0.026
Neg logit contributions
keeping-0.051
ping-0.038
 heads-0.035
 arrest-0.035
makers-0.035
Token,
Token ID11
Feature activation-1.65e-01

Pos logit contributions
osh+0.078
int+0.077
ite+0.077
ours+0.076
ores+0.076
Neg logit contributions
keeping-0.120
ping-0.085
 heads-0.081
makers-0.079
 arrest-0.079
Token Miss
Token ID4544
Feature activation+2.72e-01

Pos logit contributions
keeping+0.020
ping+0.014
 heads+0.014
 arrest+0.014
makers+0.013
Neg logit contributions
osh-0.012
int-0.012
ite-0.012
ours-0.012
ores-0.012
Token Smith
Token ID4176
Feature activation-9.34e+00

Pos logit contributions
int+0.017
ours+0.017
osh+0.017
ite+0.016
ver+0.016
Neg logit contributions
keeping-0.036
ping-0.026
 arrest-0.026
 heads-0.025
 mant-0.025
Token.
Token ID13
Feature activation+1.86e-02

Pos logit contributions
keeping+1.229
ping+0.894
 arrest+0.866
 heads+0.863
makers+0.834
Neg logit contributions
osh-0.596
int-0.593
ite-0.593
ig-0.583
ours-0.580
Token She
Token ID1375
Feature activation+2.71e-01

Pos logit contributions
osh+0.001
int+0.001
ite+0.001
ours+0.001
ores+0.001
Neg logit contributions
keeping-0.003
ping-0.002
 heads-0.002
 arrest-0.002
makers-0.002
Token smiled
Token ID13541
Feature activation+1.20e-01

Pos logit contributions
ores+0.022
int+0.022
ite+0.021
ours+0.021
ig+0.021
Neg logit contributions
keeping-0.029
 heads-0.021
ping-0.020
 arrest-0.020
makers-0.019
Token and
Token ID290
Feature activation-3.21e-02

Pos logit contributions
osh+0.010
ours+0.010
ite+0.010
ig+0.010
int+0.010
Neg logit contributions
keeping-0.013
ping-0.010
 heads-0.009
edly-0.009
makers-0.009
Token waved
Token ID26834
Feature activation+1.33e+00

Pos logit contributions
keeping+0.004
 heads+0.003
ping+0.003
 arrest+0.002
cious+0.002
Neg logit contributions
int-0.002
ig-0.002
ite-0.002
ours-0.002
umber-0.002
Token who
Token ID508
Feature activation+5.51e-01

Pos logit contributions
keeping+0.030
ping+0.021
 heads+0.021
 arrest+0.020
makers+0.020
Neg logit contributions
osh-0.016
int-0.016
ores-0.016
ite-0.016
ours-0.016
Token was
Token ID373
Feature activation+4.79e-01

Pos logit contributions
int+0.042
ig+0.041
ours+0.040
ores+0.040
ite+0.040
Neg logit contributions
keeping-0.066
ping-0.046
 arrest-0.045
 heads-0.045
makers-0.044
Token called
Token ID1444
Feature activation+2.41e+00

Pos logit contributions
ores+0.033
osh+0.033
ours+0.033
int+0.033
ite+0.033
Neg logit contributions
keeping-0.060
ping-0.042
 heads-0.041
 arrest-0.041
makers-0.039
Token Dr
Token ID1583
Feature activation+1.86e+00

Pos logit contributions
osh+0.168
ours+0.166
ores+0.164
ite+0.164
ver+0.164
Neg logit contributions
keeping-0.300
ping-0.209
 heads-0.205
 arrest-0.203
 mant-0.195
Token.
Token ID13
Feature activation-8.57e-02

Pos logit contributions
ver+0.086
ores+0.081
?+0.080
ila+0.080
adow+0.079
Neg logit contributions
keeping-0.263
 arrest-0.216
 mant-0.211
 heads-0.201
ping-0.198
Token Smith
Token ID4176
Feature activation-9.33e+00

Pos logit contributions
keeping+0.010
ping+0.007
 heads+0.007
 arrest+0.007
 mant+0.007
Neg logit contributions
ver-0.006
ite-0.006
ores-0.006
ours-0.006
osh-0.006
Token.
Token ID13
Feature activation-7.28e-02

Pos logit contributions
keeping+1.227
ping+0.899
 heads+0.867
 arrest+0.860
makers+0.844
Neg logit contributions
osh-0.590
int-0.587
ite-0.586
ours-0.577
ig-0.569
Token
Token ID220
Feature activation+4.47e-01

Pos logit contributions
keeping+0.010
ping+0.008
 heads+0.007
 arrest+0.007
makers+0.007
Neg logit contributions
ver-0.004
ores-0.004
ite-0.004
ours-0.004
osh-0.004
Token
Token ID198
Feature activation-1.40e-01

Pos logit contributions
ig+0.035
int+0.035
osh+0.034
ite+0.033
ours+0.032
Neg logit contributions
keeping-0.053
ping-0.036
makers-0.036
 Smith-0.033
 heads-0.033
Token
Token ID198
Feature activation-1.22e-01

Pos logit contributions
keeping+0.022
 heads+0.018
ping+0.018
 arrest+0.017
 mant+0.017
Neg logit contributions
ver-0.005
osh-0.005
ores-0.005
 of-0.005
ite-0.004
TokenAt
Token ID2953
Feature activation+5.95e-01

Pos logit contributions
keeping+0.018
 heads+0.013
ping+0.013
 mant+0.012
 harming+0.012
Neg logit contributions
 of-0.006
ver-0.006
ores-0.006
Because-0.006
int-0.006
Token,
Token ID11
Feature activation-1.41e-01

Pos logit contributions
keeping+0.260
ping+0.190
 heads+0.184
 arrest+0.182
makers+0.179
Neg logit contributions
osh-0.124
ite-0.124
int-0.124
ours-0.121
ig-0.120
Token and
Token ID290
Feature activation-5.07e-02

Pos logit contributions
keeping+0.015
ping+0.010
 heads+0.009
 arrest+0.009
makers+0.009
Neg logit contributions
osh-0.013
ite-0.012
int-0.012
ours-0.012
ores-0.012
Token birds
Token ID10087
Feature activation-1.44e+00

Pos logit contributions
keeping+0.006
ping+0.004
 heads+0.004
 arrest+0.004
makers+0.004
Neg logit contributions
osh-0.004
ite-0.004
int-0.004
ours-0.004
ores-0.004
Token ch
Token ID442
Feature activation+4.42e-01

Pos logit contributions
keeping+0.183
 arrest+0.128
 heads+0.127
ping+0.127
 mant+0.125
Neg logit contributions
ite-0.099
osh-0.096
ig-0.095
int-0.095
ours-0.092
Tokenir
Token ID343
Feature activation-2.59e-01

Pos logit contributions
int+0.030
osh+0.030
ite+0.030
ours+0.029
ig+0.029
Neg logit contributions
keeping-0.057
ping-0.040
 heads-0.039
 arrest-0.039
makers-0.039
Tokenping
Token ID13886
Feature activation-9.33e+00

Pos logit contributions
keeping+0.016
 mant+0.007
 generation+0.006
 doorstep+0.006
 arrest+0.006
Neg logit contributions
int-0.034
aw-0.033
osh-0.033
 of-0.033
adow-0.032
Token.
Token ID13
Feature activation-1.39e-01

Pos logit contributions
keeping+1.182
 arrest+0.844
 mant+0.818
makers+0.810
 heads+0.807
Neg logit contributions
osh-0.612
ite-0.604
int-0.599
 of-0.592
ores-0.587
Token
Token ID220
Feature activation+6.33e-01

Pos logit contributions
keeping+0.018
ping+0.013
 heads+0.013
 arrest+0.013
makers+0.012
Neg logit contributions
osh-0.009
int-0.009
ite-0.009
ours-0.009
ores-0.009
Token
Token ID198
Feature activation+1.64e-02

Pos logit contributions
ig+0.057
int+0.057
ite+0.053
osh+0.052
ours+0.051
Neg logit contributions
keeping-0.066
makers-0.042
ping-0.041
 Smith-0.039
 heads-0.038
Token
Token ID198
Feature activation+3.17e-02

Pos logit contributions
int+0.001
osh+0.001
ite+0.001
ours+0.001
ig+0.001
Neg logit contributions
keeping-0.002
ping-0.002
 heads-0.002
 arrest-0.002
makers-0.002
TokenJohn
Token ID7554
Feature activation-3.88e-01

Pos logit contributions
osh+0.002
int+0.002
ite+0.002
ours+0.002
ores+0.002
Neg logit contributions
keeping-0.004
ping-0.003
 heads-0.003
 arrest-0.003
makers-0.003
TokenThe
Token ID464
Feature activation+4.19e-01

Pos logit contributions
int+0.012
ours+0.011
ite+0.011
ig+0.011
osh+0.011
Neg logit contributions
keeping-0.019
ping-0.013
 arrest-0.013
makers-0.013
 heads-0.013
Token children
Token ID1751
Feature activation-4.01e+00

Pos logit contributions
ite+0.032
osh+0.032
ig+0.031
ours+0.031
int+0.031
Neg logit contributions
keeping-0.052
ping-0.036
 arrest-0.034
 heads-0.034
makers-0.033
Token climbed
Token ID19952
Feature activation+2.57e+00

Pos logit contributions
keeping+0.468
ping+0.336
 heads+0.328
 arrest+0.319
makers+0.318
Neg logit contributions
ours-0.300
ores-0.299
int-0.297
ite-0.287
ig-0.281
Token up
Token ID510
Feature activation+9.03e-01

Pos logit contributions
int+0.166
osh+0.161
ours+0.161
ite+0.160
ores+0.155
Neg logit contributions
keeping-0.336
ping-0.248
 heads-0.240
 arrest-0.234
makers-0.232
Token,
Token ID11
Feature activation-5.90e-02

Pos logit contributions
int+0.066
ite+0.064
osh+0.064
ours+0.064
ores+0.062
Neg logit contributions
keeping-0.111
ping-0.079
 heads-0.077
 arrest-0.076
makers-0.074
Token higher
Token ID2440
Feature activation+5.54e+00

Pos logit contributions
keeping+0.007
ping+0.005
 heads+0.005
 arrest+0.005
makers+0.005
Neg logit contributions
osh-0.004
int-0.004
ite-0.004
ours-0.004
ores-0.004
Token and
Token ID290
Feature activation+2.19e-02

Pos logit contributions
osh+0.353
ig+0.351
ite+0.347
int+0.342
ours+0.338
Neg logit contributions
keeping-0.734
ping-0.519
 arrest-0.514
makers-0.508
 mant-0.494
Token higher
Token ID2440
Feature activation+5.55e+00

Pos logit contributions
int+0.001
ite+0.001
ours+0.001
osh+0.001
ores+0.001
Neg logit contributions
keeping-0.003
ping-0.003
 heads-0.002
 arrest-0.002
makers-0.002
Token until
Token ID1566
Feature activation-1.91e+00

Pos logit contributions
osh+0.366
ig+0.362
ite+0.355
int+0.350
ores+0.349
Neg logit contributions
keeping-0.723
 arrest-0.507
ping-0.503
makers-0.489
 mant-0.484
Token they
Token ID484
Feature activation+7.44e-01

Pos logit contributions
keeping+0.245
ping+0.176
 arrest+0.173
 heads+0.173
makers+0.170
Neg logit contributions
int-0.130
ite-0.124
ours-0.124
osh-0.122
ig-0.121
Token could
Token ID714
Feature activation+9.38e-01

Pos logit contributions
ours+0.051
int+0.050
ite+0.050
ig+0.049
ores+0.048
Neg logit contributions
keeping-0.092
ping-0.067
 arrest-0.065
 heads-0.065
makers-0.063
Token,
Token ID11
Feature activation-1.87e-01

Pos logit contributions
keeping+0.147
 Smith+0.095
 arrest+0.093
 heads+0.091
makers+0.088
Neg logit contributions
int-0.159
aid-0.140
ores-0.139
aver-0.134
ours-0.133
Token she
Token ID673
Feature activation+1.29e+00

Pos logit contributions
keeping+0.020
 heads+0.014
makers+0.013
 arrest+0.013
 Smith+0.013
Neg logit contributions
int-0.017
ores-0.015
ours-0.014
aver-0.014
umber-0.014
Token found
Token ID1043
Feature activation-5.16e-01

Pos logit contributions
ores+0.119
ite+0.119
itting+0.116
int+0.114
ours+0.113
Neg logit contributions
keeping-0.125
 heads-0.082
 arrest-0.078
 doorstep-0.077
 mant-0.076
Token a
Token ID257
Feature activation+9.81e-02

Pos logit contributions
keeping+0.061
ping+0.043
 heads+0.041
 arrest+0.041
makers+0.040
Neg logit contributions
int-0.041
ite-0.040
osh-0.039
ig-0.038
ours-0.038
Token box
Token ID3091
Feature activation+7.87e-01

Pos logit contributions
int+0.007
osh+0.007
ite+0.007
ours+0.007
ores+0.007
Neg logit contributions
keeping-0.012
ping-0.009
 heads-0.008
 arrest-0.008
makers-0.008
Token of
Token ID286
Feature activation+4.49e+00

Pos logit contributions
int+0.043
osh+0.042
ours+0.041
ite+0.041
ores+0.041
Neg logit contributions
keeping-0.111
ping-0.083
 heads-0.080
 arrest-0.079
makers-0.079
Token matches
Token ID7466
Feature activation+2.51e-01

Pos logit contributions
osh+0.381
int+0.381
ite+0.379
ours+0.375
ores+0.372
Neg logit contributions
keeping-0.493
ping-0.335
 heads-0.320
 arrest-0.317
makers-0.310
Token.
Token ID13
Feature activation-6.62e-02

Pos logit contributions
ite+0.016
osh+0.016
ours+0.016
ores+0.016
ig+0.015
Neg logit contributions
keeping-0.032
ping-0.024
 heads-0.023
 arrest-0.023
 mant-0.022
Token She
Token ID1375
Feature activation+2.81e-01

Pos logit contributions
keeping+0.008
 heads+0.006
 arrest+0.005
ping+0.005
makers+0.005
Neg logit contributions
int-0.005
ig-0.005
ite-0.005
ores-0.005
ours-0.005
Token smiled
Token ID13541
Feature activation+9.03e-02

Pos logit contributions
ite+0.022
ores+0.021
int+0.021
osh+0.021
ours+0.021
Neg logit contributions
keeping-0.032
 heads-0.023
ping-0.023
 arrest-0.022
makers-0.021
Token and
Token ID290
Feature activation-4.86e-02

Pos logit contributions
osh+0.007
ite+0.007
int+0.007
ours+0.007
ig+0.007
Neg logit contributions
keeping-0.010
ping-0.007
 heads-0.007
 arrest-0.007
edly-0.007
Token
Token ID198
Feature activation-3.56e-01

Pos logit contributions
keeping+0.047
ping+0.033
 arrest+0.033
makers+0.032
 heads+0.032
Neg logit contributions
int-0.025
osh-0.024
ours-0.024
ite-0.024
ig-0.023
Token"
Token ID1
Feature activation-5.01e-01

Pos logit contributions
keeping+0.047
ping+0.034
 heads+0.033
 arrest+0.033
makers+0.032
Neg logit contributions
osh-0.022
int-0.022
ite-0.022
ours-0.022
ores-0.022
TokenWe
Token ID1135
Feature activation+2.94e-01

Pos logit contributions
keeping+0.060
ping+0.042
 heads+0.040
 arrest+0.040
makers+0.039
Neg logit contributions
osh-0.039
ite-0.039
int-0.038
ours-0.038
ores-0.037
Token are
Token ID389
Feature activation-3.24e-01

Pos logit contributions
int+0.019
ours+0.019
ite+0.018
osh+0.018
ores+0.018
Neg logit contributions
keeping-0.038
ping-0.028
 arrest-0.027
 heads-0.027
makers-0.027
Token le
Token ID443
Feature activation-3.61e-01

Pos logit contributions
keeping+0.041
ping+0.029
 heads+0.028
edly+0.028
 arrest+0.027
Neg logit contributions
 exactly-0.023
ores-0.023
ours-0.022
osh-0.022
ila-0.021
Tokenop
Token ID404
Feature activation+5.46e+00

Pos logit contributions
keeping+0.042
ping+0.030
 heads+0.028
 arrest+0.028
makers+0.028
Neg logit contributions
osh-0.028
int-0.028
ite-0.028
ours-0.027
ores-0.027
Tokenards
Token ID1371
Feature activation+1.75e-01

Pos logit contributions
osh+0.429
ite+0.428
int+0.426
ours+0.422
ig+0.419
Neg logit contributions
keeping-0.630
ping-0.437
 heads-0.418
 arrest-0.417
makers-0.405
Token,
Token ID11
Feature activation-3.33e-01

Pos logit contributions
ite+0.011
osh+0.011
int+0.011
ig+0.011
ours+0.011
Neg logit contributions
keeping-0.023
ping-0.016
 arrest-0.016
 heads-0.016
makers-0.015
Token too
Token ID1165
Feature activation+1.98e+00

Pos logit contributions
keeping+0.040
ping+0.029
 heads+0.029
 arrest+0.029
makers+0.028
Neg logit contributions
int-0.024
osh-0.023
ite-0.023
ver-0.022
ig-0.022
Token.
Token ID13
Feature activation-4.12e-01

Pos logit contributions
int+0.167
osh+0.167
ite+0.165
ours+0.163
ig+0.162
Neg logit contributions
keeping-0.218
ping-0.149
 heads-0.142
 arrest-0.141
makers-0.139
Token We
Token ID775
Feature activation-6.50e-02

Pos logit contributions
keeping+0.044
ping+0.029
 arrest+0.029
 heads+0.029
makers+0.029
Neg logit contributions
int-0.035
ig-0.035
osh-0.035
ite-0.034
ver-0.034
Token a
Token ID257
Feature activation+1.57e-01

Pos logit contributions
ite+0.066
ig+0.064
ours+0.064
int+0.063
osh+0.061
Neg logit contributions
keeping-0.105
ping-0.079
 heads-0.074
makers-0.074
edly-0.072
Token bit
Token ID1643
Feature activation-5.68e-01

Pos logit contributions
ours+0.015
ores+0.014
int+0.014
osh+0.014
ite+0.014
Neg logit contributions
keeping-0.015
 heads-0.010
ping-0.010
 arrest-0.010
makers-0.010
Token shy
Token ID15800
Feature activation+1.00e-01

Pos logit contributions
keeping+0.057
ping+0.048
makers+0.047
 heads+0.047
edly+0.044
Neg logit contributions
osh-0.041
ite-0.039
ig-0.039
ours-0.038
iz-0.037
Token,
Token ID11
Feature activation+1.46e-01

Pos logit contributions
osh+0.008
ite+0.008
ours+0.007
int+0.007
ores+0.007
Neg logit contributions
keeping-0.012
ping-0.009
 heads-0.008
 arrest-0.008
makers-0.008
Token she
Token ID673
Feature activation+1.45e+00

Pos logit contributions
int+0.011
ite+0.011
ours+0.011
osh+0.011
ig+0.011
Neg logit contributions
keeping-0.018
ping-0.012
 arrest-0.012
 heads-0.012
makers-0.012
Token nodded
Token ID14464
Feature activation+4.50e+00

Pos logit contributions
ite+0.114
ores+0.113
ours+0.112
int+0.109
ig+0.108
Neg logit contributions
keeping-0.160
 heads-0.118
ping-0.114
 arrest-0.109
makers-0.106
Token and
Token ID290
Feature activation+1.75e-01

Pos logit contributions
osh+0.326
int+0.326
ite+0.325
ours+0.320
ores+0.317
Neg logit contributions
keeping-0.550
ping-0.393
 heads-0.377
 arrest-0.373
makers-0.366
Token he
Token ID339
Feature activation+1.47e+00

Pos logit contributions
ite+0.013
ours+0.013
int+0.012
ores+0.012
umber+0.012
Neg logit contributions
keeping-0.021
 heads-0.015
ping-0.015
 arrest-0.014
edly-0.014
Token helped
Token ID4193
Feature activation+1.92e+00

Pos logit contributions
ite+0.123
ours+0.122
ores+0.122
itting+0.119
umber+0.114
Neg logit contributions
keeping-0.148
 heads-0.113
edly-0.104
ping-0.103
cious-0.101
Token her
Token ID607
Feature activation-4.21e-01

Pos logit contributions
osh+0.132
int+0.130
ite+0.128
ores+0.128
ours+0.127
Neg logit contributions
keeping-0.240
ping-0.175
 heads-0.166
 arrest-0.166
makers-0.164
Token put
Token ID1234
Feature activation+4.14e-01

Pos logit contributions
keeping+0.049
 heads+0.034
ping+0.032
 mant+0.031
 arrest+0.031
Neg logit contributions
ite-0.034
osh-0.034
int-0.034
ours-0.033
itting-0.033
Token rest
Token ID1334
Feature activation-1.18e+00

Pos logit contributions
keeping+0.069
ping+0.048
 heads+0.047
 arrest+0.046
makers+0.045
Neg logit contributions
osh-0.043
int-0.043
ite-0.043
ours-0.043
ores-0.042
Token and
Token ID290
Feature activation-2.62e-01

Pos logit contributions
keeping+0.146
ping+0.104
 heads+0.100
 arrest+0.098
makers+0.097
Neg logit contributions
int-0.085
osh-0.084
ite-0.084
ours-0.083
ig-0.081
Token drink
Token ID4144
Feature activation-6.74e-01

Pos logit contributions
keeping+0.032
ping+0.023
 heads+0.022
 arrest+0.022
makers+0.022
Neg logit contributions
osh-0.018
int-0.018
ite-0.018
ours-0.018
ores-0.018
Token a
Token ID257
Feature activation-3.58e-02

Pos logit contributions
keeping+0.080
ping+0.057
 heads+0.055
makers+0.053
 arrest+0.052
Neg logit contributions
int-0.052
ite-0.051
osh-0.051
ig-0.050
ver-0.049
Token lot
Token ID1256
Feature activation+1.11e+00

Pos logit contributions
keeping+0.004
ping+0.003
 mant+0.003
 heads+0.003
 surface+0.003
Neg logit contributions
ver-0.003
osh-0.003
ours-0.003
int-0.003
ite-0.003
Token of
Token ID286
Feature activation+4.58e+00

Pos logit contributions
int+0.034
osh+0.033
ite+0.032
ig+0.032
ours+0.032
Neg logit contributions
keeping-0.182
ping-0.143
makers-0.138
 heads-0.136
 arrest-0.136
Token water
Token ID1660
Feature activation+5.52e-01

Pos logit contributions
osh+0.354
int+0.353
ite+0.353
ours+0.347
ig+0.343
Neg logit contributions
keeping-0.538
ping-0.378
 heads-0.362
 arrest-0.358
makers-0.351
Token.
Token ID13
Feature activation-2.16e-01

Pos logit contributions
ite+0.038
osh+0.038
int+0.037
ores+0.037
ours+0.037
Neg logit contributions
keeping-0.069
ping-0.050
 arrest-0.048
 heads-0.048
makers-0.047
Token
Token ID198
Feature activation-2.64e-01

Pos logit contributions
keeping+0.024
 Smith+0.015
 arrest+0.015
ping+0.015
edly+0.015
Neg logit contributions
ig-0.019
osh-0.018
ite-0.018
int-0.018
itting-0.017
Token
Token ID198
Feature activation-2.61e-01

Pos logit contributions
keeping+0.031
makers+0.021
ping+0.021
 arrest+0.021
 heads+0.020
Neg logit contributions
int-0.021
ig-0.020
ite-0.020
osh-0.019
ours-0.019
TokenL
Token ID43
Feature activation+1.33e+00

Pos logit contributions
keeping+0.031
 arrest+0.022
makers+0.022
ping+0.021
 heads+0.021
Neg logit contributions
int-0.020
ours-0.019
osh-0.019
ig-0.019
ite-0.018
Token the
Token ID262
Feature activation+7.10e-01

Pos logit contributions
keeping+0.244
ping+0.175
 heads+0.169
 arrest+0.166
makers+0.163
Neg logit contributions
int-0.137
ite-0.136
osh-0.136
ours-0.134
ig-0.131
Token oldest
Token ID13325
Feature activation-9.12e-01

Pos logit contributions
osh+0.054
int+0.054
ite+0.054
ours+0.053
ores+0.053
Neg logit contributions
keeping-0.085
ping-0.060
 heads-0.057
 arrest-0.057
makers-0.056
Token kid
Token ID5141
Feature activation+2.31e+00

Pos logit contributions
keeping+0.100
 heads+0.068
ping+0.066
 arrest+0.066
makers+0.061
Neg logit contributions
ours-0.076
ores-0.076
int-0.075
osh-0.075
itting-0.074
Token spoke
Token ID5158
Feature activation-1.70e+00

Pos logit contributions
ours+0.163
int+0.163
ite+0.162
osh+0.162
ores+0.161
Neg logit contributions
keeping-0.285
ping-0.202
 heads-0.197
 arrest-0.194
makers-0.189
Token.
Token ID13
Feature activation-5.54e-03

Pos logit contributions
keeping+0.202
ping+0.144
 heads+0.137
makers+0.131
edly+0.129
Neg logit contributions
ite-0.131
int-0.130
ours-0.130
osh-0.129
ig-0.127
Token "
Token ID366
Feature activation+1.59e-02

Pos logit contributions
keeping+0.001
ping+0.000
 heads+0.000
 arrest+0.000
makers+0.000
Neg logit contributions
ite-0.000
int-0.000
osh-0.000
ig-0.000
ours-0.000
TokenIt
Token ID1026
Feature activation+1.66e+00

Pos logit contributions
osh+0.001
int+0.001
ite+0.001
ours+0.001
ores+0.001
Neg logit contributions
keeping-0.002
ping-0.001
 heads-0.001
 arrest-0.001
makers-0.001
Token's
Token ID338
Feature activation-6.73e-01

Pos logit contributions
ite+0.140
int+0.138
ores+0.138
ours+0.136
ig+0.135
Neg logit contributions
keeping-0.176
ping-0.126
 heads-0.123
makers-0.117
 arrest-0.114
Token time
Token ID640
Feature activation+1.38e+00

Pos logit contributions
keeping+0.085
ping+0.063
 heads+0.060
 arrest+0.058
makers+0.058
Neg logit contributions
ite-0.047
int-0.046
osh-0.045
ours-0.045
ig-0.044
Token to
Token ID284
Feature activation-3.03e-01

Pos logit contributions
osh+0.082
ores+0.080
ours+0.080
ite+0.080
int+0.079
Neg logit contributions
keeping-0.186
ping-0.139
 arrest-0.134
 heads-0.133
makers-0.128
Token go
Token ID467
Feature activation+2.57e+00

Pos logit contributions
keeping+0.038
ping+0.028
 heads+0.027
 arrest+0.026
makers+0.026
Neg logit contributions
osh-0.021
int-0.021
ite-0.020
ores-0.020
ig-0.020
Token little
Token ID1310
Feature activation-2.96e+00

Pos logit contributions
osh+0.025
int+0.025
ite+0.024
ours+0.024
ig+0.024
Neg logit contributions
keeping-0.037
ping-0.026
 heads-0.024
 arrest-0.024
makers-0.024
Token boy
Token ID2933
Feature activation+2.34e-01

Pos logit contributions
keeping+0.273
 heads+0.205
 arrest+0.171
ping+0.169
 Smith+0.165
Neg logit contributions
ores-0.304
ver-0.296
apy-0.282
irl-0.280
ours-0.278
Token replied
Token ID8712
Feature activation+1.37e+00

Pos logit contributions
ores+0.018
ours+0.018
ite+0.018
itting+0.018
int+0.017
Neg logit contributions
keeping-0.026
 heads-0.019
ping-0.018
 arrest-0.018
edly-0.018
Token,
Token ID11
Feature activation-1.64e-01

Pos logit contributions
ite+0.108
int+0.102
osh+0.101
ours+0.100
ig+0.099
Neg logit contributions
keeping-0.156
ping-0.116
edly-0.111
makers-0.110
 heads-0.109
Token "
Token ID366
Feature activation-1.66e-01

Pos logit contributions
keeping+0.020
ping+0.015
 heads+0.014
 arrest+0.014
makers+0.014
Neg logit contributions
ite-0.012
osh-0.012
int-0.012
ours-0.011
ig-0.011
TokenYes
Token ID5297
Feature activation+2.71e+00

Pos logit contributions
keeping+0.019
ping+0.013
 generation+0.012
edly+0.012
makers+0.012
Neg logit contributions
ours-0.014
ig-0.014
ores-0.013
uddle-0.013
osh-0.012
Token!"
Token ID2474
Feature activation-1.76e+00

Pos logit contributions
ite+0.194
ours+0.190
int+0.189
osh+0.183
ores+0.178
Neg logit contributions
keeping-0.312
ping-0.232
 heads-0.229
makers-0.228
 arrest-0.224
Token and
Token ID290
Feature activation-5.40e-02

Pos logit contributions
keeping+0.205
edly+0.144
ping+0.143
 heads+0.143
 arrest+0.133
Neg logit contributions
ig-0.136
ours-0.136
ite-0.135
osh-0.134
irl-0.131
Token the
Token ID262
Feature activation+3.63e-01

Pos logit contributions
keeping+0.006
ping+0.005
 heads+0.005
 arrest+0.004
edly+0.004
Neg logit contributions
int-0.004
ite-0.004
ours-0.004
itting-0.004
ig-0.004
Token game
Token ID983
Feature activation-8.18e-01

Pos logit contributions
osh+0.024
int+0.024
ite+0.024
ours+0.023
ores+0.023
Neg logit contributions
keeping-0.047
ping-0.034
 heads-0.033
 arrest-0.032
makers-0.032
Token went
Token ID1816
Feature activation+8.20e-01

Pos logit contributions
keeping+0.094
 heads+0.066
ping+0.066
makers+0.064
 arrest+0.062
Neg logit contributions
int-0.065
ours-0.064
ores-0.061
ig-0.061
ite-0.060
Token
Token ID198
Feature activation-1.33e-01

Pos logit contributions
keeping+0.014
 heads+0.010
ping+0.010
makers+0.010
 arrest+0.009
Neg logit contributions
int-0.009
ig-0.009
osh-0.009
ite-0.008
ores-0.008
Token
Token ID198
Feature activation-1.16e-01

Pos logit contributions
keeping+0.013
makers+0.009
 generations+0.008
ping+0.008
 Smith+0.008
Neg logit contributions
int-0.012
ig-0.012
ours-0.011
ite-0.011
osh-0.011
Token"
Token ID1
Feature activation-4.86e-01

Pos logit contributions
keeping+0.012
makers+0.009
ping+0.008
cious+0.008
 heads+0.008
Neg logit contributions
ig-0.010
int-0.009
ours-0.009
osh-0.009
ores-0.009
TokenLet
Token ID5756
Feature activation-6.65e-01

Pos logit contributions
keeping+0.055
ping+0.040
 arrest+0.037
 heads+0.037
makers+0.036
Neg logit contributions
osh-0.040
ite-0.039
int-0.038
ours-0.038
ver-0.038
Token's
Token ID338
Feature activation-9.40e-01

Pos logit contributions
keeping+0.082
ping+0.058
 heads+0.055
 arrest+0.055
makers+0.054
Neg logit contributions
osh-0.049
int-0.048
ite-0.048
ours-0.047
ig-0.047
Token go
Token ID467
Feature activation+2.32e+00

Pos logit contributions
keeping+0.111
ping+0.077
 heads+0.077
 arrest+0.076
 mant+0.072
Neg logit contributions
int-0.072
ours-0.071
ite-0.071
osh-0.071
ores-0.069
Token and
Token ID290
Feature activation-6.77e-02

Pos logit contributions
int+0.190
osh+0.188
ours+0.188
ores+0.183
apy+0.181
Neg logit contributions
keeping-0.262
ping-0.176
makers-0.171
 arrest-0.171
 heads-0.169
Token see
Token ID766
Feature activation+1.14e+00

Pos logit contributions
keeping+0.008
ping+0.006
 heads+0.006
 arrest+0.006
makers+0.006
Neg logit contributions
osh-0.005
int-0.005
ite-0.005
ours-0.005
ores-0.005
Token!"
Token ID2474
Feature activation-1.88e+00

Pos logit contributions
int+0.076
ite+0.076
osh+0.075
ores+0.073
ours+0.073
Neg logit contributions
keeping-0.145
ping-0.106
 heads-0.102
 arrest-0.101
makers-0.099
Token Tom
Token ID4186
Feature activation+3.71e-01

Pos logit contributions
keeping+0.218
 heads+0.157
 arrest+0.154
makers+0.146
ping+0.145
Neg logit contributions
ig-0.151
int-0.142
ite-0.141
osh-0.138
itting-0.137
Token said
Token ID531
Feature activation+1.93e-01

Pos logit contributions
ores+0.026
ours+0.026
ite+0.025
osh+0.025
ig+0.024
Neg logit contributions
keeping-0.047
ping-0.035
 heads-0.034
makers-0.032
edly-0.031
Token she
Token ID673
Feature activation+1.29e+00

Pos logit contributions
int+0.103
ite+0.100
osh+0.099
ig+0.097
ours+0.096
Neg logit contributions
keeping-0.204
ping-0.152
 heads-0.144
makers-0.142
 arrest-0.139
Token had
Token ID550
Feature activation-4.15e+00

Pos logit contributions
ite+0.112
ig+0.109
int+0.108
ours+0.108
ores+0.108
Neg logit contributions
keeping-0.132
 heads-0.090
ping-0.090
makers-0.087
cious-0.085
Token done
Token ID1760
Feature activation-1.95e-01

Pos logit contributions
keeping+0.511
ping+0.365
 heads+0.351
 arrest+0.348
makers+0.341
Neg logit contributions
osh-0.299
int-0.298
ite-0.296
ours-0.293
ores-0.290
Token the
Token ID262
Feature activation+6.09e-02

Pos logit contributions
keeping+0.026
ping+0.019
 heads+0.019
 arrest+0.018
makers+0.018
Neg logit contributions
int-0.012
ite-0.012
osh-0.012
ours-0.012
ig-0.011
Token task
Token ID4876
Feature activation+1.04e+00

Pos logit contributions
osh+0.005
int+0.005
ite+0.005
ours+0.004
ores+0.004
Neg logit contributions
keeping-0.007
ping-0.005
 heads-0.005
 arrest-0.005
makers-0.005
Token.
Token ID13
Feature activation-2.63e-01

Pos logit contributions
int+0.075
osh+0.074
ours+0.073
ite+0.073
ores+0.072
Neg logit contributions
keeping-0.128
ping-0.091
 heads-0.087
 arrest-0.086
makers-0.086
Token
Token ID198
Feature activation-2.88e-01

Pos logit contributions
keeping+0.033
ping+0.023
 heads+0.022
 arrest+0.022
makers+0.022
Neg logit contributions
int-0.019
osh-0.019
ite-0.018
ig-0.018
ours-0.017
Token
Token ID198
Feature activation-2.88e-01

Pos logit contributions
keeping+0.038
ping+0.027
 arrest+0.026
makers+0.026
 heads+0.026
Neg logit contributions
int-0.019
ite-0.018
osh-0.018
ours-0.018
ig-0.018
TokenHer
Token ID9360
Feature activation-1.55e+00

Pos logit contributions
keeping+0.035
ping+0.024
 arrest+0.024
makers+0.023
 heads+0.023
Neg logit contributions
int-0.021
osh-0.021
ours-0.021
ite-0.021
ig-0.021
Token Mom
Token ID11254
Feature activation+1.53e+00

Pos logit contributions
keeping+0.185
 heads+0.138
ping+0.127
 mant+0.125
 arrest+0.124
Neg logit contributions
osh-0.116
int-0.114
ite-0.111
ores-0.111
ver-0.109
Token smiled
Token ID13541
Feature activation-3.45e-02

Pos logit contributions
ite+0.122
ores+0.120
ours+0.117
itting+0.115
apy+0.113
Neg logit contributions
keeping-0.163
 heads-0.123
ping-0.122
makers-0.110
 arrest-0.109
Token pressed
Token ID12070
Feature activation+2.61e+00

Pos logit contributions
keeping+0.027
ping+0.019
 arrest+0.018
makers+0.018
 heads+0.018
Neg logit contributions
osh-0.013
int-0.013
ite-0.013
ours-0.013
ig-0.013
Token them
Token ID606
Feature activation-1.04e+00

Pos logit contributions
int+0.198
ite+0.192
osh+0.189
ours+0.189
ig+0.186
Neg logit contributions
keeping-0.315
ping-0.225
 heads-0.218
 arrest-0.212
makers-0.210
Token with
Token ID351
Feature activation+2.16e-01

Pos logit contributions
keeping+0.126
ping+0.090
 heads+0.086
 arrest+0.086
makers+0.084
Neg logit contributions
osh-0.075
int-0.075
ite-0.074
ours-0.073
ores-0.073
Token her
Token ID607
Feature activation-4.47e-01

Pos logit contributions
osh+0.015
int+0.015
ite+0.015
ours+0.015
ores+0.015
Neg logit contributions
keeping-0.027
ping-0.019
 heads-0.019
 arrest-0.018
makers-0.018
Token paws
Token ID49193
Feature activation-6.16e-01

Pos logit contributions
keeping+0.054
ping+0.038
 arrest+0.036
makers+0.035
 heads+0.034
Neg logit contributions
ite-0.034
osh-0.034
ig-0.034
int-0.033
ours-0.032
Token.
Token ID13
Feature activation-1.35e-01

Pos logit contributions
keeping+0.076
ping+0.054
 arrest+0.052
 heads+0.052
makers+0.051
Neg logit contributions
osh-0.044
ite-0.043
int-0.043
ours-0.042
ores-0.042
Token
Token ID198
Feature activation-1.66e-01

Pos logit contributions
keeping+0.017
ping+0.013
 heads+0.012
 arrest+0.012
makers+0.012
Neg logit contributions
int-0.009
osh-0.009
ite-0.009
ig-0.009
ours-0.009
Token
Token ID198
Feature activation-1.56e-01

Pos logit contributions
keeping+0.019
ping+0.014
makers+0.013
 arrest+0.013
 heads+0.013
Neg logit contributions
int-0.012
ours-0.012
ite-0.012
osh-0.012
ig-0.012
TokenK
Token ID42
Feature activation+3.50e-01

Pos logit contributions
keeping+0.018
ping+0.013
 heads+0.012
 arrest+0.012
makers+0.012
Neg logit contributions
osh-0.012
int-0.012
ite-0.012
ours-0.012
ores-0.012
Tokenitty
Token ID9760
Feature activation+5.39e+00

Pos logit contributions
ores+0.012
osh+0.012
ours+0.012
int+0.012
ite+0.011
Neg logit contributions
keeping-0.055
ping-0.043
 arrest-0.042
 heads-0.041
makers-0.041
Token's
Token ID338
Feature activation-6.89e-01

Pos logit contributions
osh+0.341
int+0.334
ite+0.332
ig+0.332
ours+0.319
Neg logit contributions
keeping-0.726
ping-0.524
 arrest-0.499
makers-0.497
 mant-0.495
Token he
Token ID339
Feature activation+1.49e+00

Pos logit contributions
int+0.010
ours+0.009
ig+0.009
itting+0.009
ite+0.009
Neg logit contributions
keeping-0.016
 arrest-0.011
ping-0.011
makers-0.011
 heads-0.011
Token was
Token ID373
Feature activation+5.73e-01

Pos logit contributions
osh+0.111
int+0.108
ite+0.108
ig+0.106
ours+0.106
Neg logit contributions
keeping-0.182
ping-0.130
 heads-0.124
 arrest-0.122
makers-0.120
Token so
Token ID523
Feature activation-2.21e-01

Pos logit contributions
osh+0.047
int+0.047
ite+0.047
ours+0.046
ores+0.046
Neg logit contributions
keeping-0.064
ping-0.044
 heads-0.042
 arrest-0.042
makers-0.041
Token excited
Token ID6568
Feature activation+3.12e-01

Pos logit contributions
keeping+0.027
ping+0.020
makers+0.020
 heads+0.020
 arrest+0.019
Neg logit contributions
int-0.015
ig-0.015
ite-0.015
osh-0.015
ours-0.015
Token to
Token ID284
Feature activation-3.88e-01

Pos logit contributions
osh+0.021
int+0.021
ite+0.021
ores+0.021
ours+0.021
Neg logit contributions
keeping-0.039
ping-0.028
 heads-0.027
 arrest-0.027
makers-0.027
Token learn
Token ID2193
Feature activation-9.42e-01

Pos logit contributions
keeping+0.045
ping+0.031
 arrest+0.030
 heads+0.030
makers+0.029
Neg logit contributions
ite-0.031
int-0.031
osh-0.031
ours-0.031
ver-0.030
Token about
Token ID546
Feature activation+2.54e+00

Pos logit contributions
keeping+0.107
ping+0.073
 surface+0.067
makers+0.067
 mant+0.067
Neg logit contributions
int-0.084
ite-0.078
osh-0.077
ours-0.076
ig-0.074
Token soil
Token ID9260
Feature activation-2.78e-01

Pos logit contributions
osh+0.146
ours+0.145
ores+0.144
ig+0.142
ite+0.141
Neg logit contributions
keeping-0.344
ping-0.255
 heads-0.247
 arrest-0.244
makers-0.243
Token.
Token ID13
Feature activation-8.56e-02

Pos logit contributions
keeping+0.035
ping+0.025
 heads+0.025
 arrest+0.024
makers+0.024
Neg logit contributions
osh-0.019
ite-0.018
ores-0.018
ig-0.018
int-0.018
Token
Token ID198
Feature activation+1.08e-01

Pos logit contributions
keeping+0.011
 arrest+0.007
ping+0.007
 Smith+0.007
 heads+0.007
Neg logit contributions
ig-0.006
osh-0.006
int-0.006
ite-0.006
irl-0.006
Token
Token ID198
Feature activation+1.04e-01

Pos logit contributions
int+0.008
ours+0.008
ite+0.008
osh+0.008
ig+0.008
Neg logit contributions
keeping-0.013
makers-0.009
ping-0.008
 arrest-0.008
 generations-0.008
Token.
Token ID13
Feature activation-9.50e-02

Pos logit contributions
keeping+0.482
 arrest+0.345
 heads+0.337
ping+0.337
makers+0.328
Neg logit contributions
osh-0.251
ores-0.244
ite-0.241
int-0.238
ig-0.234
Token It
Token ID632
Feature activation+1.07e+00

Pos logit contributions
keeping+0.012
ping+0.009
 heads+0.009
 arrest+0.009
makers+0.008
Neg logit contributions
int-0.006
osh-0.006
ite-0.006
ig-0.006
ours-0.006
Token was
Token ID373
Feature activation+6.03e-01

Pos logit contributions
ite+0.076
int+0.075
ours+0.075
ores+0.074
osh+0.073
Neg logit contributions
keeping-0.129
ping-0.096
 heads-0.093
makers-0.088
 arrest-0.088
Token the
Token ID262
Feature activation+3.08e-01

Pos logit contributions
int+0.042
ite+0.041
osh+0.041
ours+0.041
ig+0.040
Neg logit contributions
keeping-0.076
ping-0.055
 heads-0.053
 arrest-0.052
makers-0.051
Token best
Token ID1266
Feature activation+3.78e+00

Pos logit contributions
ite+0.014
osh+0.014
 exactly+0.013
ig+0.013
int+0.013
Neg logit contributions
keeping-0.046
ping-0.035
 arrest-0.034
makers-0.033
 heads-0.032
Token day
Token ID1110
Feature activation-7.62e-01

Pos logit contributions
osh+0.290
ite+0.290
int+0.287
ig+0.284
ores+0.283
Neg logit contributions
keeping-0.447
ping-0.314
 heads-0.301
 arrest-0.297
makers-0.292
Token ever
Token ID1683
Feature activation-5.35e+00

Pos logit contributions
keeping+0.079
ping+0.052
 heads+0.050
 arrest+0.049
makers+0.048
Neg logit contributions
osh-0.070
ite-0.069
int-0.069
ours-0.068
ores-0.068
Token!
Token ID0
Feature activation+9.39e-01

Pos logit contributions
keeping+0.741
 heads+0.558
 arrest+0.548
 mant+0.541
ping+0.539
Neg logit contributions
 of-0.304
osh-0.298
ite-0.296
ig-0.294
adow-0.290
Token
Token ID220
Feature activation+6.54e-01

Pos logit contributions
int+0.070
osh+0.068
ours+0.068
ig+0.066
ores+0.065
Neg logit contributions
keeping-0.117
 arrest-0.077
ping-0.077
makers-0.076
 generation-0.074
Token
Token ID198
Feature activation+8.09e-02

Pos logit contributions
int+0.049
ig+0.048
osh+0.048
ite+0.047
ours+0.045
Neg logit contributions
keeping-0.079
ping-0.053
makers-0.053
 arrest-0.052
 heads-0.050
Token
Token ID198
Feature activation+8.95e-02

Pos logit contributions
int+0.005
ours+0.005
ite+0.005
osh+0.005
ig+0.005
Neg logit contributions
keeping-0.011
ping-0.008
makers-0.007
 arrest-0.007
 heads-0.007
Token back
Token ID736
Feature activation-1.46e+00

Pos logit contributions
osh+0.115
int+0.115
ite+0.114
ours+0.113
ores+0.112
Neg logit contributions
keeping-0.214
ping-0.155
 heads-0.149
 arrest-0.148
makers-0.145
Token to
Token ID284
Feature activation-5.03e-01

Pos logit contributions
keeping+0.181
ping+0.130
 heads+0.125
 arrest+0.124
makers+0.122
Neg logit contributions
osh-0.104
int-0.103
ite-0.103
ours-0.102
ores-0.100
Token the
Token ID262
Feature activation+2.38e-01

Pos logit contributions
keeping+0.067
ping+0.049
 heads+0.047
makers+0.047
 arrest+0.047
Neg logit contributions
osh-0.031
ores-0.030
ig-0.029
ite-0.028
ours-0.028
Token way
Token ID835
Feature activation-3.73e-01

Pos logit contributions
ite+0.015
?+0.015
uddle+0.015
 swift+0.015
osh+0.014
Neg logit contributions
keeping-0.028
ping-0.021
iously-0.019
red-0.019
edly-0.019
Token they
Token ID484
Feature activation+5.46e-01

Pos logit contributions
keeping+0.045
ping+0.032
 heads+0.031
 arrest+0.031
makers+0.030
Neg logit contributions
int-0.027
osh-0.027
ite-0.027
ours-0.027
ores-0.026
Token were
Token ID547
Feature activation-1.92e+00

Pos logit contributions
osh+0.037
ite+0.036
int+0.036
ours+0.035
ores+0.035
Neg logit contributions
keeping-0.070
ping-0.051
 heads-0.049
 arrest-0.049
makers-0.048
Token before
Token ID878
Feature activation-3.39e+00

Pos logit contributions
keeping+0.234
ping+0.166
 heads+0.161
 arrest+0.158
makers+0.155
Neg logit contributions
osh-0.139
ite-0.138
int-0.138
ores-0.136
ours-0.136
Token.
Token ID13
Feature activation-3.07e-01

Pos logit contributions
keeping+0.460
ping+0.346
 heads+0.338
 mant+0.333
makers+0.327
Neg logit contributions
ores-0.192
osh-0.184
 exactly-0.183
apy-0.181
 of-0.180
Token He
Token ID679
Feature activation+6.05e-01

Pos logit contributions
keeping+0.038
 arrest+0.025
ping+0.024
 Smith+0.024
 heads+0.024
Neg logit contributions
ig-0.025
int-0.024
ite-0.023
osh-0.023
ours-0.022
Token hoped
Token ID10719
Feature activation+2.40e+00

Pos logit contributions
itting+0.051
ores+0.051
int+0.051
ours+0.051
ite+0.050
Neg logit contributions
keeping-0.065
 heads-0.048
 arrest-0.045
ping-0.043
edly-0.042
Token that
Token ID326
Feature activation-5.86e-01

Pos logit contributions
int+0.177
ite+0.173
ours+0.169
osh+0.168
ig+0.164
Neg logit contributions
keeping-0.303
ping-0.213
 arrest-0.210
 heads-0.205
makers-0.203
Token He
Token ID679
Feature activation+5.89e-01

Pos logit contributions
keeping+0.041
 heads+0.026
 arrest+0.026
makers+0.026
ping+0.026
Neg logit contributions
ig-0.034
int-0.034
itting-0.031
osh-0.031
ite-0.031
Token says
Token ID1139
Feature activation+1.00e+00

Pos logit contributions
int+0.036
ite+0.036
ores+0.035
itting+0.035
ver+0.035
Neg logit contributions
keeping-0.076
 heads-0.058
 arrest-0.058
ping-0.057
makers-0.054
Token,
Token ID11
Feature activation-3.32e-01

Pos logit contributions
int+0.086
ite+0.078
itting+0.077
osh+0.075
ig+0.075
Neg logit contributions
keeping-0.111
makers-0.079
ping-0.078
cious-0.075
 arrest-0.074
Token "
Token ID366
Feature activation-3.24e-01

Pos logit contributions
keeping+0.045
ping+0.033
 heads+0.032
 arrest+0.032
makers+0.031
Neg logit contributions
osh-0.020
ours-0.020
ores-0.020
int-0.020
ite-0.020
TokenThank
Token ID10449
Feature activation-1.33e+00

Pos logit contributions
keeping+0.038
ping+0.027
 heads+0.026
 mant+0.026
 arrest+0.025
Neg logit contributions
ite-0.026
osh-0.026
int-0.024
ila-0.024
umber-0.023
Token you
Token ID345
Feature activation-1.63e+00

Pos logit contributions
keeping+0.143
ping+0.096
 heads+0.091
 arrest+0.090
makers+0.088
Neg logit contributions
osh-0.117
int-0.116
ite-0.116
ours-0.115
ores-0.114
Token,
Token ID11
Feature activation-3.02e-01

Pos logit contributions
keeping+0.226
ping+0.169
 heads+0.163
 arrest+0.162
makers+0.160
Neg logit contributions
osh-0.092
int-0.092
ite-0.092
ours-0.090
ores-0.089
Token Lily
Token ID20037
Feature activation-6.85e-01

Pos logit contributions
keeping+0.033
ping+0.022
 heads+0.021
 arrest+0.021
makers+0.020
Neg logit contributions
osh-0.026
ite-0.026
int-0.026
ours-0.026
ores-0.025
Token and
Token ID290
Feature activation-3.48e-01

Pos logit contributions
keeping+0.083
ping+0.059
 heads+0.057
 arrest+0.056
makers+0.055
Neg logit contributions
osh-0.050
int-0.050
ite-0.050
ours-0.049
ores-0.049
Token Tom
Token ID4186
Feature activation+1.10e-01

Pos logit contributions
keeping+0.055
ping+0.042
makers+0.040
 heads+0.040
 mant+0.040
Neg logit contributions
osh-0.014
ite-0.014
ours-0.014
ores-0.013
int-0.013
Token.
Token ID13
Feature activation-3.24e-01

Pos logit contributions
osh+0.008
ite+0.008
int+0.008
ours+0.007
ores+0.007
Neg logit contributions
keeping-0.014
ping-0.010
 arrest-0.009
 heads-0.009
makers-0.009
Token warm
Token ID5814
Feature activation+1.29e+00

Pos logit contributions
osh+0.008
int+0.008
ite+0.008
ours+0.008
ores+0.008
Neg logit contributions
keeping-0.014
ping-0.010
 heads-0.010
 arrest-0.010
makers-0.010
Token sun
Token ID4252
Feature activation+4.10e-01

Pos logit contributions
ite+0.097
ig+0.095
int+0.095
osh+0.094
ores+0.092
Neg logit contributions
keeping-0.154
ping-0.108
makers-0.104
 arrest-0.103
 heads-0.102
Token.
Token ID13
Feature activation-1.08e-01

Pos logit contributions
ite+0.024
ig+0.023
int+0.022
osh+0.021
 of+0.021
Neg logit contributions
keeping-0.057
makers-0.042
 arrest-0.041
ping-0.041
 heads-0.040
Token He
Token ID679
Feature activation+6.37e-01

Pos logit contributions
keeping+0.013
ping+0.009
 heads+0.009
 mant+0.009
 arrest+0.009
Neg logit contributions
ig-0.008
int-0.008
ite-0.008
osh-0.008
ours-0.007
Token liked
Token ID8288
Feature activation+6.40e-01

Pos logit contributions
int+0.048
ite+0.048
ours+0.048
ores+0.047
osh+0.047
Neg logit contributions
keeping-0.074
ping-0.052
 heads-0.052
 arrest-0.050
makers-0.048
Token playing
Token ID2712
Feature activation-1.20e+00

Pos logit contributions
osh+0.040
ores+0.039
ours+0.039
ite+0.039
ig+0.037
Neg logit contributions
keeping-0.085
ping-0.061
 heads-0.060
makers-0.059
 arrest-0.058
Token in
Token ID287
Feature activation-1.53e+00

Pos logit contributions
keeping+0.149
ping+0.106
 heads+0.102
 arrest+0.101
makers+0.100
Neg logit contributions
osh-0.085
int-0.085
ite-0.084
ours-0.083
ores-0.083
Token the
Token ID262
Feature activation+2.82e-01

Pos logit contributions
keeping+0.209
ping+0.157
 heads+0.150
makers+0.149
 arrest+0.147
Neg logit contributions
osh-0.089
ite-0.086
ores-0.085
ours-0.085
ig-0.084
Token backyard
Token ID24296
Feature activation+2.22e+00

Pos logit contributions
int+0.024
osh+0.024
ite+0.024
ours+0.024
ver+0.024
Neg logit contributions
keeping-0.031
ping-0.021
 heads-0.020
 arrest-0.020
makers-0.019
Token with
Token ID351
Feature activation+1.61e-01

Pos logit contributions
int+0.157
osh+0.155
ite+0.154
ours+0.152
ores+0.150
Neg logit contributions
keeping-0.276
ping-0.200
 heads-0.192
 arrest-0.190
makers-0.186
Token his
Token ID465
Feature activation+1.16e+00

Pos logit contributions
int+0.013
ver+0.012
itting+0.012
ite+0.012
ig+0.012
Neg logit contributions
keeping-0.019
ping-0.014
 heads-0.014
 arrest-0.014
 mant-0.013
Token if
Token ID611
Feature activation+2.58e+00

Pos logit contributions
keeping+0.141
ping+0.104
 heads+0.101
 arrest+0.100
makers+0.098
Neg logit contributions
osh-0.061
int-0.060
ite-0.060
ours-0.059
ores-0.059
Token we
Token ID356
Feature activation-1.29e+00

Pos logit contributions
int+0.200
osh+0.199
ite+0.198
ours+0.196
ores+0.194
Neg logit contributions
keeping-0.301
ping-0.210
 heads-0.202
 arrest-0.200
makers-0.196
Token could
Token ID714
Feature activation+6.97e-01

Pos logit contributions
keeping+0.181
ping+0.137
 heads+0.130
 arrest+0.129
makers+0.128
Neg logit contributions
osh-0.073
ite-0.072
int-0.070
ver-0.069
ours-0.069
Token ride
Token ID6594
Feature activation+1.77e+00

Pos logit contributions
osh+0.052
int+0.052
ite+0.052
ours+0.051
ores+0.051
Neg logit contributions
keeping-0.084
ping-0.059
 heads-0.057
 arrest-0.056
makers-0.055
Token that
Token ID326
Feature activation-4.03e-01

Pos logit contributions
int+0.126
osh+0.124
ours+0.124
ite+0.124
ores+0.122
Neg logit contributions
keeping-0.219
ping-0.156
 heads-0.151
 arrest-0.149
makers-0.147
Token mushroom
Token ID28520
Feature activation-5.98e+00

Pos logit contributions
keeping+0.044
ping+0.031
 heads+0.030
 arrest+0.029
 mant+0.029
Neg logit contributions
int-0.035
ours-0.033
ores-0.033
ila-0.031
ig-0.031
Token up
Token ID510
Feature activation+8.11e-01

Pos logit contributions
keeping+0.709
ping+0.497
 heads+0.479
 arrest+0.477
makers+0.468
Neg logit contributions
int-0.452
ours-0.442
osh-0.441
ores-0.435
ite-0.434
Token to
Token ID284
Feature activation-5.04e-01

Pos logit contributions
int+0.057
osh+0.056
ours+0.055
ite+0.055
ig+0.054
Neg logit contributions
keeping-0.101
ping-0.073
 heads-0.070
 arrest-0.070
makers-0.068
Token the
Token ID262
Feature activation+4.00e-01

Pos logit contributions
keeping+0.070
ping+0.053
edly+0.053
 heads+0.050
makers+0.050
Neg logit contributions
osh-0.029
ores-0.028
?-0.026
 of-0.026
ap-0.025
Token clouds
Token ID15114
Feature activation-8.87e-01

Pos logit contributions
ite+0.032
osh+0.030
ig+0.030
?+0.029
 of+0.029
Neg logit contributions
keeping-0.046
ping-0.032
 arrest-0.030
makers-0.030
edly-0.029
Token!"
Token ID2474
Feature activation-1.31e+00

Pos logit contributions
keeping+0.103
ping+0.073
 heads+0.069
 arrest+0.068
makers+0.068
Neg logit contributions
int-0.069
ours-0.067
osh-0.067
ite-0.066
ores-0.066
Token travelled
Token ID21650
Feature activation+5.74e-01

Pos logit contributions
keeping+0.026
 arrest+0.018
 heads+0.018
ping+0.018
makers+0.018
Neg logit contributions
ours-0.016
ores-0.015
ite-0.015
ig-0.015
ang-0.014
Token the
Token ID262
Feature activation+5.32e-01

Pos logit contributions
int+0.045
ours+0.045
osh+0.044
ite+0.043
ores+0.043
Neg logit contributions
keeping-0.068
ping-0.047
 heads-0.045
 arrest-0.044
makers-0.044
Token tall
Token ID7331
Feature activation+8.77e-01

Pos logit contributions
ite+0.038
osh+0.038
ig+0.037
 exactly+0.036
int+0.036
Neg logit contributions
keeping-0.068
ping-0.046
edly-0.045
 arrest-0.044
makers-0.043
Token vine
Token ID17793
Feature activation-8.97e-01

Pos logit contributions
ite+0.074
osh+0.073
ig+0.073
int+0.071
ours+0.069
Neg logit contributions
keeping-0.098
ping-0.066
makers-0.064
 arrest-0.062
 heads-0.062
Token brave
Token ID14802
Feature activation+2.47e+00

Pos logit contributions
keeping+0.111
ping+0.080
 heads+0.077
 arrest+0.076
makers+0.074
Neg logit contributions
int-0.063
osh-0.063
ours-0.062
ite-0.062
ores-0.061
Tokenly
Token ID306
Feature activation-7.85e+00

Pos logit contributions
ite+0.173
osh+0.172
int+0.169
ig+0.165
ver+0.163
Neg logit contributions
keeping-0.302
ping-0.218
 arrest-0.212
 heads-0.210
 mant-0.209
Token,"
Token ID553
Feature activation-3.63e+00

Pos logit contributions
keeping+0.985
ping+0.727
 heads+0.696
 arrest+0.685
 mant+0.683
Neg logit contributions
ite-0.536
osh-0.519
ig-0.514
int-0.510
ver-0.498
Token he
Token ID339
Feature activation+1.57e+00

Pos logit contributions
keeping+0.435
 heads+0.295
ping+0.294
 arrest+0.290
edly+0.284
Neg logit contributions
itting-0.278
ours-0.275
ores-0.270
int-0.270
ig-0.264
Token said
Token ID531
Feature activation+3.99e-01

Pos logit contributions
ours+0.122
ores+0.117
ite+0.116
itting+0.112
ig+0.110
Neg logit contributions
keeping-0.188
ping-0.137
 heads-0.135
edly-0.130
cious-0.127
Token.
Token ID13
Feature activation+6.07e-02

Pos logit contributions
ours+0.033
ite+0.033
osh+0.033
int+0.032
ig+0.032
Neg logit contributions
keeping-0.046
ping-0.032
 heads-0.030
makers-0.029
edly-0.029
Token\n
Token ID198
Feature activation+2.44e-01

Pos logit contributions
osh+0.005
int+0.005
ours+0.005
ite+0.005
ig+0.005
Neg logit contributions
keeping-0.007
ping-0.005
edly-0.005
 arrest-0.005
 mant-0.005
Token were
Token ID547
Feature activation-2.12e+00

Pos logit contributions
keeping+0.188
 heads+0.143
ping+0.136
 arrest+0.132
makers+0.128
Neg logit contributions
ite-0.115
ours-0.114
int-0.114
ores-0.113
ig-0.113
Token too
Token ID1165
Feature activation+2.01e+00

Pos logit contributions
keeping+0.264
ping+0.194
 heads+0.187
 arrest+0.182
makers+0.180
Neg logit contributions
ite-0.149
int-0.149
osh-0.148
ours-0.144
ig-0.144
Token boring
Token ID14262
Feature activation-1.53e-02

Pos logit contributions
ig+0.160
int+0.158
ours+0.154
osh+0.154
ite+0.153
Neg logit contributions
keeping-0.221
ping-0.164
 heads-0.161
 arrest-0.155
makers-0.154
Token and
Token ID290
Feature activation-4.30e-01

Pos logit contributions
keeping+0.002
 mant+0.002
ping+0.002
 heads+0.002
 arrest+0.001
Neg logit contributions
 of-0.001
ores-0.001
ite-0.001
osh-0.001
ver-0.001
Token gir
Token ID37370
Feature activation-3.14e+00

Pos logit contributions
keeping+0.051
 heads+0.037
ping+0.037
 arrest+0.035
makers+0.034
Neg logit contributions
int-0.034
ig-0.033
ite-0.032
umber-0.031
ours-0.031
Tokenly
Token ID306
Feature activation-7.68e+00

Pos logit contributions
keeping+0.349
ping+0.241
 heads+0.229
 arrest+0.228
makers+0.222
Neg logit contributions
osh-0.261
int-0.261
ite-0.259
ours-0.258
ores-0.255
Token.
Token ID13
Feature activation-4.32e-01

Pos logit contributions
keeping+0.988
ping+0.710
 mant+0.693
 heads+0.688
 arrest+0.683
Neg logit contributions
osh-0.493
ite-0.480
ours-0.472
ores-0.470
int-0.466
Token He
Token ID679
Feature activation+5.43e-01

Pos logit contributions
keeping+0.044
 heads+0.028
 arrest+0.027
ping+0.027
makers+0.026
Neg logit contributions
int-0.042
ig-0.041
osh-0.039
ite-0.039
ours-0.038
Token liked
Token ID8288
Feature activation+3.70e-01

Pos logit contributions
ite+0.043
ores+0.042
int+0.042
ours+0.042
ig+0.042
Neg logit contributions
keeping-0.061
 heads-0.045
ping-0.043
 arrest-0.042
makers-0.041
Token cars
Token ID5006
Feature activation-9.13e-02

Pos logit contributions
osh+0.023
ores+0.023
ours+0.022
ite+0.022
ila+0.021
Neg logit contributions
keeping-0.049
ping-0.036
makers-0.034
 heads-0.034
 mant-0.034
Token better
Token ID1365
Feature activation+1.70e+00

Pos logit contributions
keeping+0.010
ping+0.007
 heads+0.007
makers+0.006
 arrest+0.006
Neg logit contributions
int-0.008
ours-0.008
ig-0.008
osh-0.008
ores-0.008
Token there
Token ID612
Feature activation+1.76e+00

Pos logit contributions
keeping+0.016
ping+0.012
makers+0.012
 heads+0.011
 arrest+0.011
Neg logit contributions
int-0.007
ite-0.006
ours-0.006
ores-0.006
osh-0.006
Token was
Token ID373
Feature activation+6.23e-01

Pos logit contributions
int+0.159
ite+0.149
ores+0.148
ours+0.145
aid+0.141
Neg logit contributions
keeping-0.185
ping-0.126
 heads-0.122
makers-0.119
 arrest-0.118
Token a
Token ID257
Feature activation+1.80e-01

Pos logit contributions
osh+0.034
ores+0.033
int+0.033
ite+0.032
ours+0.032
Neg logit contributions
keeping-0.089
ping-0.066
 heads-0.065
 arrest-0.063
makers-0.063
Token girl
Token ID2576
Feature activation-3.43e-01

Pos logit contributions
int+0.015
osh+0.015
ours+0.015
ite+0.015
ores+0.015
Neg logit contributions
keeping-0.019
ping-0.013
 heads-0.013
 arrest-0.012
makers-0.012
Token named
Token ID3706
Feature activation+3.90e+00

Pos logit contributions
keeping+0.052
ping+0.039
 heads+0.038
 arrest+0.038
makers+0.037
Neg logit contributions
osh-0.015
int-0.015
ite-0.015
ours-0.015
ores-0.015
Token Claire
Token ID20758
Feature activation-5.77e+00

Pos logit contributions
ila+0.133
ores+0.130
 of+0.129
 Liz+0.129
ours+0.124
Neg logit contributions
keeping-0.637
ping-0.510
 arrest-0.486
 heads-0.481
edly-0.474
Token.
Token ID13
Feature activation-9.83e-02

Pos logit contributions
keeping+0.756
ping+0.551
 heads+0.533
 arrest+0.532
makers+0.517
Neg logit contributions
ite-0.376
osh-0.373
int-0.369
ig-0.363
ours-0.358
Token Claire
Token ID20758
Feature activation-5.78e+00

Pos logit contributions
keeping+0.013
 arrest+0.010
ping+0.010
 heads+0.010
makers+0.010
Neg logit contributions
ours-0.006
ver-0.006
 of-0.006
ores-0.005
int-0.005
Token was
Token ID373
Feature activation+5.93e-01

Pos logit contributions
keeping+0.757
ping+0.555
 arrest+0.531
 heads+0.531
makers+0.519
Neg logit contributions
osh-0.374
ite-0.371
int-0.369
ours-0.361
ig-0.359
Token playing
Token ID2712
Feature activation-1.15e+00

Pos logit contributions
int+0.049
ite+0.048
osh+0.048
ig+0.047
ours+0.046
Neg logit contributions
keeping-0.065
ping-0.047
 arrest-0.045
 heads-0.045
makers-0.044
Token alone
Token ID3436
Feature activation-1.13e+00

Pos logit contributions
keeping+0.142
ping+0.102
 heads+0.097
 arrest+0.097
makers+0.095
Neg logit contributions
int-0.082
osh-0.082
ite-0.082
ours-0.081
ores-0.080
Token puppy
Token ID26188
Feature activation-3.26e+00

Pos logit contributions
ite+0.027
 of+0.025
uddle+0.025
 helmet+0.024
udd+0.021
Neg logit contributions
keeping-0.044
iously-0.031
makers-0.031
ping-0.031
 generations-0.030
Token and
Token ID290
Feature activation-7.80e-02

Pos logit contributions
keeping+0.464
ping+0.345
makers+0.335
 heads+0.329
 arrest+0.326
Neg logit contributions
ite-0.184
ig-0.181
int-0.174
ours-0.169
osh-0.165
Token said
Token ID531
Feature activation+2.77e-01

Pos logit contributions
keeping+0.010
ping+0.007
 heads+0.007
 arrest+0.007
makers+0.007
Neg logit contributions
int-0.005
osh-0.005
ite-0.005
ours-0.005
ores-0.005
Token,
Token ID11
Feature activation-1.42e-01

Pos logit contributions
int+0.017
ite+0.017
ores+0.017
osh+0.017
ours+0.016
Neg logit contributions
keeping-0.036
ping-0.027
 heads-0.026
makers-0.026
 arrest-0.026
Token "
Token ID366
Feature activation-1.20e-01

Pos logit contributions
keeping+0.022
 arrest+0.016
ping+0.016
 heads+0.016
 generation+0.015
Neg logit contributions
ver-0.006
ores-0.006
ours-0.006
osh-0.006
int-0.006
TokenHello
Token ID15496
Feature activation-6.06e+00

Pos logit contributions
keeping+0.015
ping+0.010
makers+0.010
 heads+0.010
 mant+0.010
Neg logit contributions
ite-0.009
osh-0.008
int-0.008
Because-0.008
aw-0.008
Token,
Token ID11
Feature activation-1.16e-01

Pos logit contributions
keeping+0.691
ping+0.500
 arrest+0.485
 heads+0.478
makers+0.463
Neg logit contributions
ores-0.464
int-0.462
osh-0.461
ours-0.448
ig-0.431
Token little
Token ID1310
Feature activation-2.88e+00

Pos logit contributions
keeping+0.012
ping+0.008
 arrest+0.008
 heads+0.008
makers+0.007
Neg logit contributions
int-0.011
ores-0.010
osh-0.010
ours-0.010
ig-0.010
Token puppy
Token ID26188
Feature activation-3.24e+00

Pos logit contributions
keeping+0.285
ping+0.183
makers+0.167
 mant+0.167
 arrest+0.164
Neg logit contributions
ite-0.289
ig-0.272
osh-0.271
int-0.268
ours-0.266
Token!
Token ID0
Feature activation+9.62e-01

Pos logit contributions
keeping+0.375
ping+0.262
 heads+0.251
 arrest+0.251
makers+0.246
Neg logit contributions
int-0.256
osh-0.253
ours-0.251
ite-0.249
ores-0.249
Token Do
Token ID2141
Feature activation-3.36e+00

Pos logit contributions
int+0.086
ig+0.085
osh+0.084
ores+0.084
ours+0.084
Neg logit contributions
keeping-0.100
ping-0.067
 arrest-0.064
makers-0.063
 heads-0.062