TOP ACTIVATIONS
MAX = 11.705

grandma 's house for safe keeping . When
it to me for safe keeping . He told me to
the birds were ch ir ping . Together , they laughed
her mum . " Hel ping others can be a delicate
He heard birds ch ir ping and saw butterflies fl uttering
she saw birds ch ir ping , butterflies fl uttering and
the small birds ch ir ping in the distance . He
, the birds ch ir ping , her dog barking and
and birds were ch ir ping . Alice noticed
bright blue birds ch ir ping in the trees and big
the birds were ch ir ping . Little Peter was in
make sounds like ch ir ping and singing . Ben and
and heard birds ch ir ping . But she also saw
heard the bird ch ir ping and went to investigate .
to the birds ch ir ping , when she heard a
do . He was gas ping and pant ing , almost
. It was ch ir ping and hopping around ! The
The bird was ch ir ping , and Lily wanted to
the birds stopped ch ir ping to listen .
heard a loud ch ir ping . She looked around and

MIN ACTIVATIONS
MIN = -10.485

loved to dance and tw irl around in her pretty pink
every day she would tw irl around and spin .
by doing a graceful tw irl in the air . From
a different dress and tw irl around in front of the
loved to dance and tw irl around in circles . One
. Now she could tw irl and dance whenever she wanted
. She walks and tw irl s and sings .
She did a little tw irl and looked in the mirror
. He started to tw irl and spin around in circles
loved to dance and tw irl in her pretty skirt .
watching her twist and tw irl . The white lady smiled
started to dance and tw irl around in the sunshine .
He would dance and tw irl around his room every time
is very pretty and tw irl s around in it .
old lady like a wh irl ig ig . His dancing
loved to dance and tw irl around in her room .
. She loved to tw irl around in it and feel
dish washer began to wh irl and swirl the water around
on pretty dresses and tw irl around in them . One
loved to dance and tw irl around in her pretty dress

INTERVAL 6.157 - 11.705
CONTAINS 0.096%

an available stick on the ground . Tom managed to grab
's family . They introduced themselves . They showed them the
, so one twin brave ly reached out and touched the
happy ending ! <|endoftext|> Da isy was a little girl and
, people were chatting and eating lunch . Tom smiled and

INTERVAL 0.610 - 6.157
CONTAINS 24.311%

were sleeping and sn ugg ling with their mom .
not play with was ps anymore . We will only play
he loves juice . Let me help you clean up ,"
carpet and pretend they are in a cave .
Finally , when her friends arrived , Sarah served them the

INTERVAL -4.937 - 0.610
CONTAINS 75.267%

the tree and get it . Her mom said she couldn
Zoe . Zoe wanted to chat with her mom my ,
Ol iver was so happy . He loved
away . He stretched and stretched , but he could not
. She thinks Sara is kind and smart . She hugs

INTERVAL -10.485 - -4.937
CONTAINS 0.325%

did not understand why L ila and Tom were so mean
she saw a beautiful fl ute in the window of a
the duck float and qu ack . " Look
" The p uddle is not the same everywhere
learned an important lesson . Sometimes , we need to take
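
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. The sketch below is one way to check that reading. It assumes the bins are equal-width, which the dump does not state. The helper name equal_width_edges is hypothetical, and the shares are copied from the CONTAINS lines.

```python
# Hypothetical helper (not from the dump): split [lo, hi] into n equal-width bins.
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# MIN and MAX as reported in this dump.
MIN_ACT, MAX_ACT = -10.485, 11.705

# Assumption: the dashboard uses four equal-width bins.
edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)

# Shares copied from the CONTAINS lines, lowest interval first.
shares = [0.325, 75.267, 24.311, 0.096]

# Pair each bin's (left, right) edges with its reported share.
for (left, right), share in zip(zip(edges, edges[1:]), shares):
    print(f"{left:8.3f} to {right:8.3f}: {share:6.3f}%")

print(f"total: {sum(shares):.3f}%")
```

Under that assumption the computed edges agree with the listed boundaries to within about 0.001, and the four shares sum to roughly 100%.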

Token grandma
Token ID49890
Feature activation-6.21e-01

Pos logit contributions
ours+0.082
osh+0.074
ores+0.074
ila+0.073
?+0.072
Neg logit contributions
keeping-0.224
ping-0.185
 heads-0.165
 arrest-0.164
cious-0.160
Token's
Token ID338
Feature activation+5.74e-01

Pos logit contributions
ours+0.040
?+0.039
ig+0.037
ores+0.037
osh+0.037
Neg logit contributions
keeping-0.086
ping-0.069
 heads-0.063
 arrest-0.062
edly-0.060
Token house
Token ID2156
Feature activation-5.96e-01

Pos logit contributions
keeping+0.079
ping+0.066
cious+0.056
edly+0.056
 arrest+0.055
Neg logit contributions
ours-0.036
?-0.035
ig-0.034
 exactly-0.033
uddle-0.032
Token for
Token ID329
Feature activation+6.63e-01

Pos logit contributions
ours+0.044
osh+0.041
ores+0.041
ila+0.040
?+0.039
Neg logit contributions
keeping-0.078
ping-0.062
 heads-0.056
 arrest-0.055
cious-0.053
Token safe
Token ID3338
Feature activation-1.25e-01

Pos logit contributions
keeping+0.097
ping+0.083
 heads+0.078
cious+0.077
 mant+0.074
Neg logit contributions
 exactly-0.034
?-0.034
ores-0.032
 of-0.030
irl-0.029
Tokenkeeping
Token ID19934
Feature activation+1.17e+01

Pos logit contributions
?+0.012
ores+0.011
 of+0.011
 exactly+0.011
ours+0.011
Neg logit contributions
keeping-0.011
ping-0.009
cious-0.009
 mant-0.008
 arrest-0.008
Token.
Token ID13
Feature activation+1.30e-01

Pos logit contributions
keeping+1.682
ping+1.502
 heads+1.402
 mant+1.351
 arrest+1.347
Neg logit contributions
?-0.567
 of-0.534
ver-0.530
adow-0.505
 exactly-0.498
Token (space)
Token ID220
Feature activation-4.54e-02

Pos logit contributions
keeping+0.017
ping+0.012
 arrest+0.011
 heads+0.011
 pavement+0.011
Neg logit contributions
ours-0.010
irl-0.010
osh-0.010
ila-0.010
ig-0.010
Token (newline)
Token ID198
Feature activation+4.43e-02

Pos logit contributions
ours+0.003
osh+0.003
ig+0.003
irl+0.003
int+0.003
Neg logit contributions
keeping-0.006
ping-0.005
makers-0.004
 arrest-0.004
 heads-0.004
Token (newline)
Token ID198
Feature activation+4.86e-02

Pos logit contributions
keeping+0.006
ping+0.005
 arrest+0.004
 heads+0.004
makers+0.004
Neg logit contributions
ours-0.003
osh-0.003
ig-0.003
ores-0.003
int-0.003
TokenWhen
Token ID2215
Feature activation-3.04e-01

Pos logit contributions
keeping+0.006
ping+0.005
 arrest+0.004
 heads+0.004
makers+0.004
Neg logit contributions
ours-0.004
osh-0.004
ores-0.004
ig-0.004
int-0.004
Token it
Token ID340
Feature activation-3.87e-01

Pos logit contributions
ours+0.082
osh+0.077
ila+0.076
ores+0.076
?+0.074
Neg logit contributions
keeping-0.154
ping-0.124
 heads-0.110
 arrest-0.109
cious-0.105
Token to
Token ID284
Feature activation+3.56e-01

Pos logit contributions
ours+0.030
osh+0.028
ores+0.028
ila+0.027
?+0.027
Neg logit contributions
keeping-0.050
ping-0.039
 heads-0.035
 arrest-0.034
cious-0.033
Token me
Token ID502
Feature activation+1.38e+00

Pos logit contributions
keeping+0.050
ping+0.041
 heads+0.036
cious+0.036
 arrest+0.035
Neg logit contributions
ours-0.023
ores-0.022
ila-0.022
?-0.022
osh-0.021
Token for
Token ID329
Feature activation+5.72e-01

Pos logit contributions
keeping+0.170
ping+0.134
 heads+0.120
 arrest+0.115
edly+0.112
Neg logit contributions
ours-0.110
osh-0.102
ores-0.101
ila-0.100
irl-0.100
Token safe
Token ID3338
Feature activation-2.17e-01

Pos logit contributions
keeping+0.079
ping+0.065
cious+0.061
 heads+0.058
edly+0.058
Neg logit contributions
?-0.035
 exactly-0.033
ores-0.033
irl-0.032
ila-0.031
Tokenkeeping
Token ID19934
Feature activation+1.16e+01

Pos logit contributions
?+0.025
 of+0.024
 if+0.023
ores+0.022
ila+0.022
Neg logit contributions
keeping-0.013
ping-0.012
cious-0.011
 mant-0.011
ulent-0.010
Token.
Token ID13
Feature activation+5.30e-02

Pos logit contributions
keeping+1.678
ping+1.462
 mant+1.343
 heads+1.329
 arrest+1.297
Neg logit contributions
?-0.594
ours-0.522
 of-0.516
ver-0.502
 exactly-0.499
Token He
Token ID679
Feature activation-9.02e-01

Pos logit contributions
keeping+0.006
ping+0.005
 arrest+0.004
 heads+0.004
 mant+0.004
Neg logit contributions
ig-0.005
ours-0.004
irl-0.004
uddle-0.004
ade-0.004
Token told
Token ID1297
Feature activation-1.34e+00

Pos logit contributions
ours+0.060
ores+0.059
ig+0.055
int+0.054
osh+0.054
Neg logit contributions
keeping-0.120
ping-0.096
 heads-0.092
 arrest-0.090
 necks-0.084
Token me
Token ID502
Feature activation+1.37e+00

Pos logit contributions
ours+0.100
ila+0.095
ores+0.094
osh+0.093
?+0.093
Neg logit contributions
keeping-0.177
ping-0.140
 arrest-0.123
 heads-0.122
cious-0.121
Token to
Token ID284
Feature activation+3.44e-01

Pos logit contributions
keeping+0.173
ping+0.137
 heads+0.121
 arrest+0.120
makers+0.117
Neg logit contributions
ours-0.102
int-0.102
irl-0.096
umber-0.095
ig-0.092
Token the
Token ID262
Feature activation-3.54e-01

Pos logit contributions
keeping+0.019
ping+0.015
 heads+0.014
 arrest+0.013
cious+0.013
Neg logit contributions
ours-0.010
osh-0.010
ores-0.010
ila-0.010
?-0.009
Token birds
Token ID10087
Feature activation+1.76e+00

Pos logit contributions
ours+0.029
ila+0.028
?+0.028
ores+0.028
osh+0.028
Neg logit contributions
keeping-0.042
ping-0.032
 arrest-0.028
edly-0.027
makers-0.027
Token were
Token ID547
Feature activation+2.14e+00

Pos logit contributions
keeping+0.237
ping+0.187
 heads+0.167
 arrest+0.167
 mant+0.159
Neg logit contributions
ours-0.125
osh-0.119
ores-0.117
ila-0.115
?-0.113
Token ch
Token ID442
Feature activation-1.98e-01

Pos logit contributions
keeping+0.257
ping+0.190
 arrest+0.174
 heads+0.170
cious+0.163
Neg logit contributions
ours-0.182
ores-0.177
osh-0.173
 exactly-0.170
?-0.170
Tokenir
Token ID343
Feature activation+4.74e-01

Pos logit contributions
ours+0.016
?+0.016
ver+0.016
osh+0.015
ila+0.015
Neg logit contributions
keeping-0.024
ping-0.020
 heads-0.017
 arrest-0.016
makers-0.015
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.028
 arrest+0.009
 mant+0.009
cious+0.008
 internet+0.008
Neg logit contributions
int-0.068
 of-0.068
osh-0.066
ores-0.066
adow-0.065
Token.
Token ID13
Feature activation+1.86e-01

Pos logit contributions
keeping+1.545
ping+1.172
 arrest+1.142
 heads+1.112
 mant+1.091
Neg logit contributions
ores-0.617
osh-0.603
 of-0.601
ours-0.593
?-0.587
Token Together
Token ID17083
Feature activation-4.03e+00

Pos logit contributions
keeping+0.024
ping+0.019
 heads+0.016
 arrest+0.016
cious+0.015
Neg logit contributions
ours-0.015
osh-0.014
ores-0.014
?-0.014
ila-0.013
Token,
Token ID11
Feature activation+2.14e-01

Pos logit contributions
ours+0.202
osh+0.186
ores+0.183
?+0.178
ila+0.174
Neg logit contributions
keeping-0.629
ping-0.519
 heads-0.466
 arrest-0.464
 mant-0.452
Token they
Token ID484
Feature activation-4.02e-01

Pos logit contributions
keeping+0.033
ping+0.027
 heads+0.024
 arrest+0.024
cious+0.023
Neg logit contributions
ours-0.011
osh-0.011
ores-0.010
?-0.010
ver-0.010
Token laughed
Token ID13818
Feature activation+8.34e-01

Pos logit contributions
ours+0.033
ores+0.030
ila+0.030
?+0.029
irl+0.029
Neg logit contributions
keeping-0.049
ping-0.039
 heads-0.035
 arrest-0.034
cious-0.033
Token her
Token ID607
Feature activation+8.24e-01

Pos logit contributions
keeping+0.005
ping+0.004
 heads+0.003
 arrest+0.003
cious+0.003
Neg logit contributions
ours-0.003
osh-0.003
ores-0.003
irl-0.003
ila-0.003
Token mum
Token ID25682
Feature activation-2.65e+00

Pos logit contributions
keeping+0.115
ping+0.092
 heads+0.085
 arrest+0.081
cious+0.079
Neg logit contributions
ours-0.056
osh-0.052
ores-0.052
ila-0.049
int-0.048
Token.
Token ID13
Feature activation+1.88e-01

Pos logit contributions
ours+0.177
osh+0.166
ores+0.165
int+0.163
?+0.163
Neg logit contributions
keeping-0.361
ping-0.287
 arrest-0.261
 heads-0.260
 mant-0.246
Token "
Token ID366
Feature activation+6.19e-01

Pos logit contributions
keeping+0.026
ping+0.021
 heads+0.018
 arrest+0.018
cious+0.017
Neg logit contributions
ours-0.013
osh-0.012
ores-0.012
ila-0.012
?-0.012
TokenHel
Token ID12621
Feature activation+8.82e-01

Pos logit contributions
keeping+0.076
ping+0.060
 heads+0.054
 arrest+0.053
cious+0.050
Neg logit contributions
ours-0.050
osh-0.048
ores-0.048
ila-0.047
?-0.046
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.091
ping+0.058
 heads+0.056
 arrest+0.056
cious+0.054
Neg logit contributions
ours-0.089
osh-0.088
?-0.088
ila-0.088
ig-0.088
Token others
Token ID1854
Feature activation+1.76e+00

Pos logit contributions
keeping+1.412
ping+1.109
 heads+0.993
 arrest+0.992
cious+0.956
Neg logit contributions
ours-0.763
osh-0.729
ores-0.711
?-0.705
ila-0.705
Token can
Token ID460
Feature activation-2.76e+00

Pos logit contributions
keeping+0.255
ping+0.203
 heads+0.183
 arrest+0.182
cious+0.177
Neg logit contributions
ours-0.106
osh-0.103
?-0.099
ores-0.097
ila-0.094
Token be
Token ID307
Feature activation+2.50e+00

Pos logit contributions
?+0.204
osh+0.199
irl+0.199
ours+0.199
ores+0.195
Neg logit contributions
keeping-0.365
ping-0.279
 heads-0.246
cious-0.241
 arrest-0.239
Token a
Token ID257
Feature activation+7.66e-02

Pos logit contributions
keeping+0.334
ping+0.271
 heads+0.238
 arrest+0.238
cious+0.226
Neg logit contributions
ours-0.183
osh-0.167
ores-0.165
ila-0.165
int-0.161
Token delicate
Token ID19217
Feature activation+1.12e+00

Pos logit contributions
keeping+0.011
ping+0.009
 heads+0.008
 arrest+0.008
cious+0.008
Neg logit contributions
ours-0.005
osh-0.004
ores-0.004
ila-0.004
?-0.004
TokenHe
Token ID1544
Feature activation-9.90e-01

Pos logit contributions
keeping+0.013
ping+0.009
 arrest+0.009
cious+0.008
makers+0.008
Neg logit contributions
ours-0.010
ig-0.009
int-0.009
osh-0.009
ores-0.009
Token heard
Token ID2982
Feature activation+1.16e+00

Pos logit contributions
ours+0.083
ores+0.078
ila+0.075
umber+0.074
int+0.073
Neg logit contributions
keeping-0.117
ping-0.093
 heads-0.085
 arrest-0.082
cious-0.079
Token birds
Token ID10087
Feature activation+1.75e+00

Pos logit contributions
keeping+0.155
ping+0.126
 heads+0.111
 arrest+0.110
 mant+0.105
Neg logit contributions
ours-0.085
ores-0.076
ila-0.076
osh-0.076
ig-0.076
Token ch
Token ID442
Feature activation-1.98e-01

Pos logit contributions
keeping+0.206
ping+0.141
 arrest+0.131
 heads+0.130
 mant+0.124
Neg logit contributions
ours-0.155
osh-0.154
?-0.150
ores-0.150
 of-0.148
Tokenir
Token ID343
Feature activation+4.74e-01

Pos logit contributions
ours+0.016
ores+0.015
osh+0.015
ila+0.015
?+0.015
Neg logit contributions
keeping-0.024
ping-0.020
 heads-0.017
 arrest-0.017
cious-0.016
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.030
 mant+0.011
cious+0.010
 arrest+0.010
 doorstep+0.009
Neg logit contributions
int-0.066
osh-0.066
 of-0.066
ack-0.064
ores-0.063
Token and
Token ID290
Feature activation+1.37e-01

Pos logit contributions
keeping+1.570
ping+1.174
 arrest+1.144
 heads+1.106
 mant+1.094
Neg logit contributions
osh-0.608
ores-0.596
 of-0.586
ours-0.584
?-0.571
Token saw
Token ID2497
Feature activation+6.80e-01

Pos logit contributions
keeping+0.018
ping+0.013
 arrest+0.012
 heads+0.012
 mant+0.012
Neg logit contributions
osh-0.010
ours-0.010
ores-0.010
ila-0.009
?-0.009
Token butterflies
Token ID44571
Feature activation+4.39e+00

Pos logit contributions
keeping+0.086
ping+0.070
 heads+0.060
 arrest+0.060
cious+0.057
Neg logit contributions
ours-0.053
int-0.049
ores-0.049
ig-0.049
irl-0.049
Token fl
Token ID781
Feature activation+9.93e-01

Pos logit contributions
keeping+0.588
ping+0.433
 heads+0.409
 arrest+0.406
 mant+0.390
Neg logit contributions
ours-0.311
?-0.310
osh-0.307
 of-0.302
ores-0.300
Tokenuttering
Token ID33598
Feature activation-1.20e+00

Pos logit contributions
keeping+0.133
ping+0.115
 heads+0.099
edly+0.097
cious+0.095
Neg logit contributions
ours-0.065
?-0.060
ila-0.058
irl-0.057
ores-0.056
Token she
Token ID673
Feature activation-1.05e+00

Pos logit contributions
keeping+0.030
ping+0.024
 heads+0.022
 arrest+0.021
cious+0.021
Neg logit contributions
ours-0.014
int-0.014
ig-0.013
ores-0.013
osh-0.013
Token saw
Token ID2497
Feature activation+6.74e-01

Pos logit contributions
ores+0.097
ours+0.096
int+0.090
ver+0.087
ite+0.087
Neg logit contributions
keeping-0.104
ping-0.082
 heads-0.079
 necks-0.071
cious-0.070
Token birds
Token ID10087
Feature activation+1.76e+00

Pos logit contributions
keeping+0.091
ping+0.073
 heads+0.065
 arrest+0.064
cious+0.062
Neg logit contributions
ours-0.048
ores-0.044
osh-0.044
int-0.043
ig-0.043
Token ch
Token ID442
Feature activation-1.98e-01

Pos logit contributions
keeping+0.241
ping+0.173
 arrest+0.167
 heads+0.164
 mant+0.159
Neg logit contributions
ours-0.125
?-0.122
osh-0.119
 of-0.119
ig-0.119
Tokenir
Token ID343
Feature activation+4.86e-01

Pos logit contributions
ours+0.015
osh+0.014
ila+0.014
ores+0.014
ig+0.013
Neg logit contributions
keeping-0.026
ping-0.020
 heads-0.018
 arrest-0.018
 mant-0.018
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.032
 mant+0.016
 doorstep+0.016
 arrest+0.014
cious+0.013
Neg logit contributions
aw-0.064
 of-0.064
int-0.064
ack-0.061
osh-0.061
Token,
Token ID11
Feature activation+2.22e-01

Pos logit contributions
keeping+1.527
ping+1.153
 arrest+1.126
 heads+1.086
 mant+1.080
Neg logit contributions
osh-0.623
ours-0.621
 of-0.618
ores-0.610
?-0.609
Token butterflies
Token ID44571
Feature activation+4.40e+00

Pos logit contributions
keeping+0.028
ping+0.021
 arrest+0.019
 heads+0.019
 mant+0.018
Neg logit contributions
ours-0.018
osh-0.017
ila-0.017
ores-0.017
?-0.017
Token fl
Token ID781
Feature activation+9.87e-01

Pos logit contributions
keeping+0.696
ping+0.527
 arrest+0.510
 heads+0.505
 mant+0.499
Neg logit contributions
?-0.224
 of-0.219
ig-0.215
ours-0.213
osh-0.207
Tokenuttering
Token ID33598
Feature activation-1.18e+00

Pos logit contributions
keeping+0.134
ping+0.113
 heads+0.099
edly+0.096
cious+0.095
Neg logit contributions
ours-0.065
?-0.061
ores-0.059
osh-0.058
irl-0.057
Token and
Token ID290
Feature activation+1.37e-01

Pos logit contributions
ours+0.078
osh+0.076
ores+0.075
ila+0.073
?+0.073
Neg logit contributions
keeping-0.161
ping-0.125
 arrest-0.118
 heads-0.114
 mant-0.111
Token the
Token ID262
Feature activation-3.53e-01

Pos logit contributions
keeping+0.019
ping+0.015
 heads+0.014
 arrest+0.014
cious+0.013
Neg logit contributions
ours-0.009
osh-0.008
ores-0.008
ila-0.008
?-0.008
Token small
Token ID1402
Feature activation+1.84e+00

Pos logit contributions
ours+0.031
?+0.031
ila+0.030
osh+0.030
ig+0.030
Neg logit contributions
keeping-0.039
ping-0.029
 arrest-0.026
makers-0.024
cious-0.024
Token birds
Token ID10087
Feature activation+1.76e+00

Pos logit contributions
keeping+0.209
ping+0.157
 arrest+0.137
cious+0.130
makers+0.130
Neg logit contributions
ours-0.166
?-0.161
osh-0.158
ig-0.157
ila-0.155
Token ch
Token ID442
Feature activation-1.98e-01

Pos logit contributions
keeping+0.241
ping+0.182
 heads+0.170
 arrest+0.170
 mant+0.164
Neg logit contributions
ours-0.119
osh-0.117
ores-0.113
?-0.112
ig-0.111
Tokenir
Token ID343
Feature activation+4.86e-01

Pos logit contributions
ours+0.016
ores+0.015
osh+0.015
ila+0.015
?+0.015
Neg logit contributions
keeping-0.025
ping-0.020
 heads-0.017
 arrest-0.017
cious-0.016
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.034
 mant+0.017
cious+0.013
 doorstep+0.013
 arrest+0.013
Neg logit contributions
 of-0.064
int-0.063
osh-0.063
ila-0.062
ig-0.061
Token in
Token ID287
Feature activation+1.20e+00

Pos logit contributions
keeping+1.498
ping+1.135
 arrest+1.089
 heads+1.062
 mant+1.054
Neg logit contributions
osh-0.651
ours-0.640
ores-0.638
 of-0.631
?-0.617
Token the
Token ID262
Feature activation-3.54e-01

Pos logit contributions
keeping+0.175
ping+0.143
 heads+0.129
cious+0.127
 arrest+0.126
Neg logit contributions
ours-0.066
osh-0.065
?-0.065
ores-0.065
ila-0.062
Token distance
Token ID5253
Feature activation+2.33e+00

Pos logit contributions
?+0.029
osh+0.028
ig+0.028
ite+0.027
ores+0.027
Neg logit contributions
keeping-0.042
ping-0.031
 arrest-0.028
makers-0.027
cious-0.027
Token.
Token ID13
Feature activation+1.99e-01

Pos logit contributions
keeping+0.323
ping+0.255
 arrest+0.235
 heads+0.232
 mant+0.223
Neg logit contributions
ours-0.152
osh-0.144
ores-0.144
?-0.143
ila-0.140
Token He
Token ID679
Feature activation-7.47e-01

Pos logit contributions
keeping+0.027
ping+0.021
 arrest+0.019
 heads+0.019
cious+0.018
Neg logit contributions
ours-0.014
osh-0.013
irl-0.013
ig-0.013
ila-0.013
Token,
Token ID11
Feature activation+1.73e-01

Pos logit contributions
keeping+0.547
ping+0.433
 arrest+0.388
 heads+0.387
cious+0.369
Neg logit contributions
ours-0.289
osh-0.276
ores-0.273
ila-0.267
?-0.266
Token the
Token ID262
Feature activation-3.75e-01

Pos logit contributions
keeping+0.021
ping+0.016
 heads+0.014
 arrest+0.014
cious+0.014
Neg logit contributions
ours-0.014
osh-0.014
ores-0.013
ila-0.013
?-0.013
Token birds
Token ID10087
Feature activation+1.74e+00

Pos logit contributions
ours+0.030
?+0.030
osh+0.030
ila+0.029
ig+0.028
Neg logit contributions
keeping-0.045
ping-0.035
 arrest-0.031
cious-0.029
 heads-0.029
Token ch
Token ID442
Feature activation-2.00e-01

Pos logit contributions
keeping+0.247
ping+0.197
 heads+0.178
 arrest+0.177
 mant+0.170
Neg logit contributions
ours-0.111
osh-0.105
ores-0.103
ila-0.101
?-0.100
Tokenir
Token ID343
Feature activation+4.85e-01

Pos logit contributions
ours+0.016
osh+0.015
ores+0.015
ila+0.015
ig+0.014
Neg logit contributions
keeping-0.026
ping-0.020
 heads-0.018
 arrest-0.018
cious-0.017
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.040
 mant+0.020
 arrest+0.019
 doorstep+0.019
cious+0.019
Neg logit contributions
osh-0.060
 of-0.059
int-0.058
ig-0.058
ila-0.057
Token,
Token ID11
Feature activation+2.18e-01

Pos logit contributions
keeping+1.536
ping+1.180
 arrest+1.119
 heads+1.081
 mant+1.068
Neg logit contributions
osh-0.631
ours-0.616
ores-0.605
?-0.602
 of-0.587
Token her
Token ID607
Feature activation+8.11e-01

Pos logit contributions
keeping+0.029
ping+0.021
 arrest+0.019
makers+0.019
 heads+0.019
Neg logit contributions
ours-0.017
osh-0.016
ores-0.016
?-0.015
irl-0.015
Token dog
Token ID3290
Feature activation+3.51e-01

Pos logit contributions
keeping+0.099
ping+0.074
 arrest+0.066
makers+0.064
cious+0.061
Neg logit contributions
osh-0.066
ours-0.065
?-0.065
ig-0.064
 exactly-0.062
Token barking
Token ID48961
Feature activation+4.27e+00

Pos logit contributions
keeping+0.052
ping+0.042
 heads+0.038
 arrest+0.038
 mant+0.037
Neg logit contributions
ours-0.020
osh-0.019
ores-0.018
ila-0.018
?-0.018
Token and
Token ID290
Feature activation+1.40e-01

Pos logit contributions
keeping+0.634
ping+0.486
 arrest+0.471
 mant+0.453
 heads+0.446
Neg logit contributions
ours-0.227
osh-0.225
ores-0.225
?-0.223
 exactly-0.216
Token and
Token ID290
Feature activation+1.38e-01

Pos logit contributions
keeping+0.310
ping+0.249
 heads+0.222
 arrest+0.219
cious+0.210
Neg logit contributions
ours-0.163
osh-0.152
ores-0.150
ila-0.149
?-0.145
Token birds
Token ID10087
Feature activation+1.75e+00

Pos logit contributions
keeping+0.020
ping+0.016
 arrest+0.014
 heads+0.014
cious+0.014
Neg logit contributions
ours-0.009
osh-0.008
ores-0.008
ila-0.008
?-0.008
Token were
Token ID547
Feature activation+2.14e+00

Pos logit contributions
keeping+0.244
ping+0.178
 arrest+0.174
 heads+0.169
 mant+0.165
Neg logit contributions
osh-0.119
ours-0.113
?-0.110
ores-0.109
int-0.108
Token ch
Token ID442
Feature activation-2.08e-01

Pos logit contributions
keeping+0.244
ping+0.173
 arrest+0.162
 heads+0.158
edly+0.151
Neg logit contributions
ours-0.190
ores-0.188
 exactly-0.186
osh-0.184
?-0.183
Tokenir
Token ID343
Feature activation+4.77e-01

Pos logit contributions
ours+0.017
osh+0.016
ores+0.016
ver+0.016
?+0.016
Neg logit contributions
keeping-0.025
ping-0.020
 heads-0.018
 arrest-0.017
makers-0.016
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.028
 arrest+0.009
 mant+0.008
cious+0.008
 internet+0.007
Neg logit contributions
int-0.069
 of-0.069
ores-0.066
osh-0.066
adow-0.066
Token.
Token ID13
Feature activation+1.85e-01

Pos logit contributions
keeping+1.541
ping+1.168
 arrest+1.137
 heads+1.115
 mant+1.085
Neg logit contributions
ores-0.612
 of-0.605
osh-0.598
?-0.589
ours-0.586
Token (newline)
Token ID198
Feature activation+1.10e-01

Pos logit contributions
keeping+0.025
ping+0.020
 heads+0.018
 arrest+0.018
cious+0.017
Neg logit contributions
ours-0.013
osh-0.012
ores-0.012
ila-0.011
ig-0.011
Token (newline)
Token ID198
Feature activation+1.09e-01

Pos logit contributions
keeping+0.015
ping+0.011
 arrest+0.010
makers+0.010
cious+0.010
Neg logit contributions
ours-0.008
int-0.007
osh-0.007
ig-0.007
ores-0.007
TokenAlice
Token ID44484
Feature activation+1.62e+00

Pos logit contributions
keeping+0.014
ping+0.011
 arrest+0.010
 heads+0.010
cious+0.009
Neg logit contributions
ours-0.009
osh-0.008
ores-0.008
ig-0.008
int-0.008
Token noticed
Token ID6810
Feature activation-6.93e-01

Pos logit contributions
keeping+0.224
ping+0.178
 arrest+0.159
 heads+0.158
 mant+0.152
Neg logit contributions
ours-0.109
osh-0.104
ila-0.100
ores-0.100
?-0.099
Token bright
Token ID6016
Feature activation-3.25e-01

Pos logit contributions
keeping+0.113
ping+0.090
 heads+0.078
 arrest+0.078
cious+0.075
Neg logit contributions
ours-0.069
irl-0.064
int-0.064
ores-0.063
osh-0.063
Token blue
Token ID4171
Feature activation-6.55e-01

Pos logit contributions
ours+0.028
osh+0.027
ores+0.027
ila+0.026
int+0.026
Neg logit contributions
keeping-0.038
ping-0.030
 heads-0.026
 arrest-0.025
 mant-0.024
Token birds
Token ID10087
Feature activation+1.76e+00

Pos logit contributions
ours+0.049
?+0.048
 of+0.047
ig+0.046
osh+0.046
Neg logit contributions
keeping-0.086
ping-0.066
 arrest-0.058
makers-0.058
 heads-0.058
Token ch
Token ID442
Feature activation-2.05e-01

Pos logit contributions
keeping+0.252
ping+0.194
 heads+0.184
 arrest+0.179
 mant+0.174
Neg logit contributions
ours-0.107
osh-0.105
ig-0.102
?-0.102
 of-0.101
Tokenir
Token ID343
Feature activation+4.87e-01

Pos logit contributions
ours+0.015
osh+0.014
ores+0.014
ila+0.014
ig+0.014
Neg logit contributions
keeping-0.027
ping-0.021
 heads-0.019
 arrest-0.019
 mant-0.018
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.030
 arrest+0.011
 mant+0.011
 doorstep+0.010
cious+0.010
Neg logit contributions
 of-0.068
int-0.065
osh-0.065
adow-0.064
ores-0.064
Token in
Token ID287
Feature activation+1.20e+00

Pos logit contributions
keeping+1.534
ping+1.164
 arrest+1.119
 heads+1.097
 mant+1.075
Neg logit contributions
osh-0.616
 of-0.611
ores-0.596
ours-0.590
 exactly-0.583
Token the
Token ID262
Feature activation-3.47e-01

Pos logit contributions
keeping+0.183
ping+0.150
 heads+0.138
cious+0.138
edly+0.137
Neg logit contributions
?-0.057
 of-0.056
osh-0.055
ores-0.055
 exactly-0.054
Token trees
Token ID7150
Feature activation+2.06e+00

Pos logit contributions
osh+0.026
?+0.026
ite+0.026
ig+0.025
ila+0.025
Neg logit contributions
keeping-0.044
ping-0.033
 arrest-0.030
edly-0.029
makers-0.029
Token and
Token ID290
Feature activation+1.43e-01

Pos logit contributions
keeping+0.300
ping+0.238
 heads+0.223
 arrest+0.219
 mant+0.211
Neg logit contributions
ours-0.115
osh-0.113
?-0.113
ores-0.112
 of-0.109
Token big
Token ID1263
Feature activation-1.26e-01

Pos logit contributions
keeping+0.018
ping+0.013
 arrest+0.012
 heads+0.012
makers+0.011
Neg logit contributions
ours-0.011
osh-0.011
ores-0.011
ila-0.010
 exactly-0.010
Token the
Token ID262
Feature activation-3.64e-01

Pos logit contributions
keeping+0.019
ping+0.015
 arrest+0.013
 heads+0.013
cious+0.013
Neg logit contributions
ours-0.010
ig-0.009
int-0.009
umber-0.009
ores-0.009
Token birds
Token ID10087
Feature activation+1.75e+00

Pos logit contributions
ila+0.026
osh+0.026
ours+0.026
?+0.026
ite+0.025
Neg logit contributions
keeping-0.047
ping-0.037
 arrest-0.033
 heads-0.031
makers-0.031
Token were
Token ID547
Feature activation+2.15e+00

Pos logit contributions
keeping+0.239
ping+0.189
 heads+0.171
 arrest+0.170
 mant+0.164
Neg logit contributions
ours-0.119
osh-0.115
ores-0.111
ila-0.110
?-0.110
Token ch
Token ID442
Feature activation-2.05e-01

Pos logit contributions
keeping+0.276
ping+0.207
 arrest+0.192
 heads+0.188
edly+0.185
Neg logit contributions
ours-0.167
osh-0.157
ores-0.156
irl-0.155
?-0.155
Tokenir
Token ID343
Feature activation+4.87e-01

Pos logit contributions
ours+0.016
osh+0.015
ores+0.015
ver+0.015
?+0.015
Neg logit contributions
keeping-0.025
ping-0.021
 heads-0.018
 arrest-0.018
cious-0.017
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.029
 mant+0.012
cious+0.011
 arrest+0.011
 internet+0.010
Neg logit contributions
 of-0.068
int-0.066
aw-0.065
ack-0.065
osh-0.065
Token.
Token ID13
Feature activation+1.92e-01

Pos logit contributions
keeping+1.559
ping+1.178
 arrest+1.148
 heads+1.122
 mant+1.102
Neg logit contributions
 of-0.601
osh-0.595
?-0.593
ores-0.589
ours-0.581
Token Little
Token ID7703
Feature activation-3.72e+00

Pos logit contributions
keeping+0.026
ping+0.021
 heads+0.019
 arrest+0.019
 mant+0.018
Neg logit contributions
ours-0.013
ores-0.012
osh-0.012
ila-0.012
?-0.012
Token Peter
Token ID5613
Feature activation+2.23e+00

Pos logit contributions
ours+0.245
osh+0.231
ores+0.227
?+0.226
ila+0.225
Neg logit contributions
keeping-0.513
ping-0.412
 arrest-0.371
 heads-0.369
 mant-0.356
Token was
Token ID373
Feature activation-5.44e-01

Pos logit contributions
keeping+0.287
ping+0.229
 heads+0.206
 arrest+0.200
cious+0.195
Neg logit contributions
ours-0.168
ores-0.156
ila-0.153
osh-0.152
int-0.149
Token in
Token ID287
Feature activation+1.19e+00

Pos logit contributions
ours+0.048
osh+0.045
ila+0.044
ores+0.044
int+0.044
Neg logit contributions
keeping-0.063
ping-0.051
 heads-0.043
 arrest-0.042
cious-0.041
Token make
Token ID787
Feature activation-1.14e-01

Pos logit contributions
keeping+0.146
ping+0.115
 heads+0.106
 arrest+0.104
cious+0.097
Neg logit contributions
ours-0.086
ores-0.079
osh-0.077
ig-0.077
int-0.077
Token sounds
Token ID5238
Feature activation+9.73e-01

Pos logit contributions
ours+0.009
osh+0.008
ores+0.008
ila+0.008
int+0.008
Neg logit contributions
keeping-0.014
ping-0.011
 heads-0.010
 arrest-0.010
makers-0.009
Token like
Token ID588
Feature activation-1.10e+00

Pos logit contributions
keeping+0.116
ping+0.096
 heads+0.083
 arrest+0.079
makers+0.078
Neg logit contributions
ours-0.080
int-0.076
osh-0.074
ig-0.073
ila-0.073
Token ch
Token ID442
Feature activation-2.11e-01

Pos logit contributions
ours+0.090
osh+0.085
ores+0.085
ila+0.083
?+0.082
Neg logit contributions
keeping-0.135
ping-0.106
 heads-0.093
 arrest-0.092
cious-0.088
Tokenir
Token ID343
Feature activation+4.84e-01

Pos logit contributions
ours+0.016
osh+0.014
ores+0.014
ila+0.014
?+0.014
Neg logit contributions
keeping-0.028
ping-0.022
 heads-0.020
 arrest-0.020
cious-0.019
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.043
 mant+0.025
 doorstep+0.024
cious+0.022
 arrest+0.021
Neg logit contributions
osh-0.057
 of-0.056
ila-0.055
ores-0.055
ack-0.055
Token and
Token ID290
Feature activation+1.45e-01

Pos logit contributions
keeping+1.516
ping+1.145
 arrest+1.087
 mant+1.084
 heads+1.066
Neg logit contributions
osh-0.653
ores-0.640
?-0.628
ours-0.626
 of-0.615
Token singing
Token ID13777
Feature activation+4.16e+00

Pos logit contributions
keeping+0.017
ping+0.012
cious+0.011
edly+0.011
 doorstep+0.011
Neg logit contributions
ours-0.012
ores-0.012
?-0.012
osh-0.012
irl-0.011
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
keeping+0.579
ping+0.448
 arrest+0.424
 heads+0.411
 mant+0.408
Neg logit contributions
osh-0.265
ores-0.264
ours-0.258
?-0.252
ila-0.245
Token Ben
Token ID3932
Feature activation-2.97e-01

Pos logit contributions
keeping+0.023
ping+0.018
 heads+0.016
 arrest+0.016
makers+0.015
Neg logit contributions
ours-0.014
int-0.013
ig-0.013
osh-0.013
irl-0.012
Token and
Token ID290
Feature activation+1.35e-01

Pos logit contributions
ours+0.022
ores+0.021
osh+0.020
ila+0.020
int+0.020
Neg logit contributions
keeping-0.038
ping-0.031
 heads-0.028
 arrest-0.027
cious-0.026
Token and
Token ID290
Feature activation+7.11e-02

Pos logit contributions
?+0.047
osh+0.042
 of+0.041
ores+0.041
irl+0.039
Neg logit contributions
keeping-0.133
ping-0.108
 arrest-0.101
cious-0.101
 heads-0.099
Token heard
Token ID2982
Feature activation+1.10e+00

Pos logit contributions
keeping+0.009
ping+0.007
 arrest+0.007
edly+0.006
 heads+0.006
Neg logit contributions
osh-0.005
ours-0.005
?-0.005
ores-0.005
irl-0.005
Token birds
Token ID10087
Feature activation+1.74e+00

Pos logit contributions
keeping+0.155
ping+0.121
 heads+0.112
 arrest+0.111
 mant+0.107
Neg logit contributions
ours-0.069
osh-0.068
ores-0.067
?-0.065
ila-0.064
Token ch
Token ID442
Feature activation-2.11e-01

Pos logit contributions
keeping+0.201
ping+0.139
 heads+0.131
 arrest+0.131
edly+0.125
Neg logit contributions
ours-0.152
?-0.152
osh-0.149
ores-0.149
 of-0.147
Tokenir
Token ID343
Feature activation+4.84e-01

Pos logit contributions
ours+0.015
osh+0.014
ila+0.014
ig+0.014
ores+0.014
Neg logit contributions
keeping-0.029
ping-0.022
 mant-0.020
 heads-0.020
 arrest-0.020
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.030
 mant+0.016
 doorstep+0.014
cious+0.013
uces+0.012
Neg logit contributions
 of-0.064
int-0.063
adow-0.062
ila-0.062
osh-0.062
Token.
Token ID13
Feature activation+1.94e-01

Pos logit contributions
keeping+1.522
ping+1.131
 arrest+1.125
 heads+1.088
 mant+1.086
Neg logit contributions
 of-0.619
ores-0.612
?-0.609
osh-0.608
ours-0.584
Token But
Token ID887
Feature activation-4.41e-01

Pos logit contributions
keeping+0.026
ping+0.021
 heads+0.019
 arrest+0.019
cious+0.018
Neg logit contributions
ours-0.013
osh-0.012
ores-0.012
ila-0.012
ig-0.012
Token she
Token ID673
Feature activation-1.07e+00

Pos logit contributions
ours+0.031
osh+0.029
ores+0.029
ila+0.028
int+0.028
Neg logit contributions
keeping-0.059
ping-0.048
 heads-0.043
 arrest-0.042
cious-0.040
Token also
Token ID635
Feature activation-1.50e+00

Pos logit contributions
ours+0.091
ores+0.083
ite+0.079
osh+0.079
irl+0.078
Neg logit contributions
keeping-0.126
ping-0.101
 heads-0.092
 arrest-0.088
cious-0.088
Token saw
Token ID2497
Feature activation+6.73e-01

Pos logit contributions
ours+0.121
ores+0.111
ig+0.109
int+0.108
osh+0.107
Neg logit contributions
keeping-0.187
ping-0.150
 heads-0.132
 arrest-0.131
cious-0.128
Token heard
Token ID2982
Feature activation+1.05e+00

Pos logit contributions
keeping+0.162
ping+0.133
 heads+0.120
 arrest+0.116
edly+0.109
Neg logit contributions
ours-0.097
ores-0.096
irl-0.086
ila-0.085
int-0.084
Token the
Token ID262
Feature activation-4.27e-01

Pos logit contributions
keeping+0.146
ping+0.116
 heads+0.106
 arrest+0.103
cious+0.101
Neg logit contributions
ours-0.069
ores-0.066
osh-0.065
ila-0.065
?-0.064
Token bird
Token ID6512
Feature activation-1.49e-01

Pos logit contributions
?+0.032
osh+0.031
ours+0.030
ila+0.030
 of+0.030
Neg logit contributions
keeping-0.053
ping-0.040
cious-0.036
 arrest-0.035
edly-0.034
Token ch
Token ID442
Feature activation-2.19e-01

Pos logit contributions
ours+0.012
osh+0.011
ores+0.011
?+0.011
ila+0.011
Neg logit contributions
keeping-0.019
ping-0.014
 heads-0.013
 arrest-0.013
cious-0.012
Tokenir
Token ID343
Feature activation+4.73e-01

Pos logit contributions
ours+0.021
ver+0.021
ores+0.020
?+0.020
irl+0.019
Neg logit contributions
keeping-0.023
ping-0.019
 arrest-0.016
 heads-0.015
edly-0.014
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.034
 mant+0.016
cious+0.016
 arrest+0.014
 internet+0.014
Neg logit contributions
int-0.062
 of-0.062
osh-0.062
ila-0.060
?-0.059
Token and
Token ID290
Feature activation+1.43e-01

Pos logit contributions
keeping+1.471
ping+1.132
 arrest+1.054
 heads+1.042
cious+1.012
Neg logit contributions
ours-0.694
osh-0.677
ores-0.675
?-0.659
int-0.645
Token went
Token ID1816
Feature activation-8.49e-01

Pos logit contributions
keeping+0.019
ping+0.016
 heads+0.014
 arrest+0.014
cious+0.013
Neg logit contributions
ours-0.010
osh-0.009
ores-0.009
ila-0.009
?-0.009
Token to
Token ID284
Feature activation+4.87e-01

Pos logit contributions
ours+0.066
osh+0.061
ores+0.061
ila+0.060
?+0.059
Neg logit contributions
keeping-0.109
ping-0.086
 heads-0.076
 arrest-0.076
cious-0.072
Token investigate
Token ID9161
Feature activation-3.14e-02

Pos logit contributions
keeping+0.066
ping+0.054
 heads+0.048
cious+0.047
edly+0.047
Neg logit contributions
osh-0.031
ours-0.031
?-0.031
ores-0.031
ila-0.030
Token.
Token ID13
Feature activation+1.87e-01

Pos logit contributions
ours+0.002
osh+0.002
ores+0.002
ila+0.002
?+0.002
Neg logit contributions
keeping-0.004
ping-0.003
 heads-0.003
 arrest-0.003
cious-0.003
Token to
Token ID284
Feature activation+4.97e-01

Pos logit contributions
keeping+0.494
ping+0.392
 heads+0.350
 arrest+0.350
cious+0.334
Neg logit contributions
ours-0.262
osh-0.246
ores-0.245
ila-0.238
?-0.237
Token the
Token ID262
Feature activation-3.63e-01

Pos logit contributions
keeping+0.067
ping+0.054
 heads+0.048
 arrest+0.048
cious+0.046
Neg logit contributions
ours-0.034
osh-0.032
ores-0.032
ila-0.032
?-0.031
Token birds
Token ID10087
Feature activation+1.74e+00

Pos logit contributions
ours+0.029
?+0.029
osh+0.028
ores+0.028
ila+0.027
Neg logit contributions
keeping-0.044
ping-0.034
 arrest-0.030
cious-0.029
 heads-0.028
Token ch
Token ID442
Feature activation-2.04e-01

Pos logit contributions
keeping+0.228
ping+0.180
 heads+0.161
 arrest+0.159
cious+0.152
Neg logit contributions
ours-0.128
osh-0.122
ores-0.121
ila-0.118
?-0.118
Tokenir
Token ID343
Feature activation+4.77e-01

Pos logit contributions
ours+0.017
ver+0.016
ores+0.016
?+0.016
 exactly+0.016
Neg logit contributions
keeping-0.024
ping-0.020
 heads-0.017
 arrest-0.017
edly-0.015
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.037
 mant+0.018
cious+0.018
 arrest+0.018
ping+0.017
Neg logit contributions
osh-0.059
?-0.058
ores-0.058
ila-0.058
int-0.057
Token,
Token ID11
Feature activation+2.17e-01

Pos logit contributions
keeping+1.431
ping+1.125
 arrest+1.018
 heads+1.012
cious+0.978
Neg logit contributions
ours-0.728
osh-0.705
ores-0.697
?-0.680
ila-0.670
Token when
Token ID618
Feature activation+1.99e-01

Pos logit contributions
keeping+0.027
ping+0.020
 arrest+0.018
 heads+0.018
cious+0.017
Neg logit contributions
ours-0.018
ores-0.017
osh-0.017
?-0.016
ila-0.016
Token she
Token ID673
Feature activation-1.05e+00

Pos logit contributions
keeping+0.027
ping+0.022
 heads+0.020
 arrest+0.019
cious+0.019
Neg logit contributions
ours-0.014
osh-0.013
ores-0.012
int-0.012
ila-0.012
Token heard
Token ID2982
Feature activation+1.15e+00

Pos logit contributions
ours+0.092
ores+0.086
irl+0.077
ila+0.077
ite+0.077
Neg logit contributions
keeping-0.119
ping-0.094
 arrest-0.086
 heads-0.085
cious-0.081
Token a
Token ID257
Feature activation+8.29e-02

Pos logit contributions
keeping+0.158
ping+0.126
 heads+0.114
 arrest+0.112
cious+0.108
Neg logit contributions
ours-0.078
osh-0.073
ores-0.072
ila-0.071
?-0.070
Token do
Token ID466
Feature activation-2.99e-01

Pos logit contributions
keeping+0.081
ping+0.068
 arrest+0.061
edly+0.061
 heads+0.061
Neg logit contributions
?-0.021
ours-0.019
osh-0.019
ila-0.018
irl-0.017
Token.
Token ID13
Feature activation+1.83e-01

Pos logit contributions
ours+0.020
?+0.019
ores+0.019
osh+0.019
ila+0.019
Neg logit contributions
keeping-0.041
ping-0.033
 heads-0.029
 arrest-0.029
cious-0.028
Token He
Token ID679
Feature activation-7.53e-01

Pos logit contributions
keeping+0.025
ping+0.019
 heads+0.017
 arrest+0.017
cious+0.017
Neg logit contributions
ours-0.013
osh-0.012
irl-0.012
ores-0.012
ig-0.012
Token was
Token ID373
Feature activation-5.50e-01

Pos logit contributions
ours+0.063
ores+0.059
osh+0.057
ila+0.057
ite+0.056
Neg logit contributions
keeping-0.091
ping-0.072
 heads-0.067
 arrest-0.063
cious-0.061
Token gas
Token ID3623
Feature activation-3.21e+00

Pos logit contributions
ours+0.042
osh+0.039
ila+0.039
ite+0.038
ig+0.038
Neg logit contributions
keeping-0.071
ping-0.058
 heads-0.050
 arrest-0.050
makers-0.047
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
ours+0.231
osh+0.219
ores+0.218
?+0.213
ila+0.210
Neg logit contributions
keeping-0.421
ping-0.333
 arrest-0.297
 heads-0.293
 mant-0.286
Token and
Token ID290
Feature activation+1.34e-01

Pos logit contributions
keeping+1.453
ping+1.141
 heads+1.033
 arrest+1.030
cious+0.991
Neg logit contributions
ours-0.714
osh-0.687
ores-0.683
?-0.660
ila-0.651
Token pant
Token ID15857
Feature activation-7.10e-01

Pos logit contributions
keeping+0.018
ping+0.014
 heads+0.013
 arrest+0.012
cious+0.012
Neg logit contributions
ours-0.010
osh-0.009
ores-0.009
ila-0.009
?-0.009
Tokening
Token ID278
Feature activation-1.58e+00

Pos logit contributions
ours+0.038
osh+0.035
irl+0.034
ores+0.034
ila+0.034
Neg logit contributions
keeping-0.108
ping-0.089
 heads-0.081
 arrest-0.078
makers-0.077
Token,
Token ID11
Feature activation+2.20e-01

Pos logit contributions
ours+0.104
osh+0.100
ores+0.100
?+0.097
 exactly+0.094
Neg logit contributions
keeping-0.215
ping-0.169
 arrest-0.154
 heads-0.152
cious-0.149
Token almost
Token ID2048
Feature activation-1.23e-01

Pos logit contributions
keeping+0.028
ping+0.022
 heads+0.019
 arrest+0.019
cious+0.018
Neg logit contributions
ours-0.017
ores-0.016
osh-0.016
ila-0.016
?-0.016
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
keeping+0.081
ping+0.064
 heads+0.058
 arrest+0.058
 mant+0.056
Neg logit contributions
ours-0.040
osh-0.038
?-0.038
ores-0.037
ig-0.037
Token It
Token ID632
Feature activation-8.05e-01

Pos logit contributions
keeping+0.023
ping+0.018
 arrest+0.016
 heads+0.016
cious+0.016
Neg logit contributions
ours-0.014
irl-0.013
ig-0.013
osh-0.013
ores-0.013
Token was
Token ID373
Feature activation-5.50e-01

Pos logit contributions
ours+0.049
osh+0.046
ores+0.045
ila+0.044
?+0.044
Neg logit contributions
keeping-0.117
ping-0.094
 heads-0.085
 arrest-0.084
 mant-0.081
Token ch
Token ID442
Feature activation-1.99e-01

Pos logit contributions
ours+0.037
 exactly+0.036
ores+0.036
?+0.036
osh+0.035
Neg logit contributions
keeping-0.076
ping-0.057
 heads-0.053
 arrest-0.053
cious-0.051
Tokenir
Token ID343
Feature activation+4.73e-01

Pos logit contributions
ours+0.018
ver+0.017
osh+0.017
ores+0.017
?+0.016
Neg logit contributions
keeping-0.021
ping-0.019
 heads-0.015
 arrest-0.015
makers-0.014
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.026
 arrest+0.007
cious+0.007
 mant+0.006
 heads+0.005
Neg logit contributions
int-0.071
 of-0.069
osh-0.069
ig-0.068
ores-0.068
Token and
Token ID290
Feature activation+1.34e-01

Pos logit contributions
keeping+1.441
ping+1.137
 heads+1.027
 arrest+1.020
cious+0.976
Neg logit contributions
ours-0.738
osh-0.700
ores-0.696
ila-0.676
?-0.675
Token hopping
Token ID47153
Feature activation+1.78e+00

Pos logit contributions
keeping+0.017
ping+0.013
 arrest+0.012
 heads+0.012
edly+0.011
Neg logit contributions
ours-0.010
osh-0.010
?-0.010
 exactly-0.010
ores-0.010
Token around
Token ID1088
Feature activation-7.88e-01

Pos logit contributions
keeping+0.246
ping+0.190
 arrest+0.180
 heads+0.175
 mant+0.172
Neg logit contributions
osh-0.115
ours-0.111
ores-0.111
 of-0.108
?-0.107
Token!
Token ID0
Feature activation-7.55e-01

Pos logit contributions
?+0.037
 of+0.035
 exactly+0.033
osh+0.033
ores+0.033
Neg logit contributions
keeping-0.124
ping-0.098
 heads-0.093
 arrest-0.092
cious-0.091
Token The
Token ID383
Feature activation+4.72e-01

Pos logit contributions
ours+0.056
irl+0.052
ig+0.051
osh+0.051
ores+0.051
Neg logit contributions
keeping-0.101
ping-0.077
 arrest-0.071
 heads-0.069
cious-0.067
Token The
Token ID383
Feature activation+4.91e-01

Pos logit contributions
keeping+0.023
ping+0.018
 arrest+0.016
 heads+0.016
 mant+0.016
Neg logit contributions
ours-0.013
osh-0.013
ig-0.012
irl-0.012
ores-0.012
Token bird
Token ID6512
Feature activation-9.56e-02

Pos logit contributions
keeping+0.065
ping+0.051
 heads+0.044
 arrest+0.044
cious+0.044
Neg logit contributions
ours-0.035
ila-0.034
?-0.034
osh-0.034
ig-0.033
Token was
Token ID373
Feature activation-5.42e-01

Pos logit contributions
ours+0.006
osh+0.006
?+0.005
ig+0.005
ila+0.005
Neg logit contributions
keeping-0.014
ping-0.011
 arrest-0.010
 heads-0.010
 mant-0.009
Token ch
Token ID442
Feature activation-2.07e-01

Pos logit contributions
ours+0.037
ores+0.036
?+0.035
 exactly+0.035
osh+0.035
Neg logit contributions
keeping-0.074
ping-0.056
 heads-0.052
 arrest-0.052
cious-0.050
Tokenir
Token ID343
Feature activation+4.73e-01

Pos logit contributions
ours+0.018
ver+0.018
ores+0.018
osh+0.017
?+0.017
Neg logit contributions
keeping-0.021
ping-0.019
 arrest-0.016
 heads-0.016
edly-0.014
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.029
cious+0.010
 mant+0.008
 arrest+0.008
 internet+0.008
Neg logit contributions
int-0.069
 of-0.068
ig-0.066
osh-0.065
ila-0.065
Token,
Token ID11
Feature activation+2.18e-01

Pos logit contributions
keeping+1.474
ping+1.150
 heads+1.056
 arrest+1.052
cious+1.007
Neg logit contributions
ours-0.698
osh-0.671
ores-0.669
?-0.655
int-0.640
Token and
Token ID290
Feature activation+1.40e-01

Pos logit contributions
keeping+0.030
ping+0.022
 arrest+0.021
 heads+0.020
makers+0.020
Neg logit contributions
ours-0.015
?-0.015
 exactly-0.014
osh-0.014
ores-0.014
Token Lily
Token ID20037
Feature activation+4.08e-01

Pos logit contributions
keeping+0.020
ping+0.016
 arrest+0.015
 heads+0.015
cious+0.014
Neg logit contributions
ours-0.009
osh-0.008
ores-0.008
ila-0.008
?-0.008
Token wanted
Token ID2227
Feature activation+2.22e+00

Pos logit contributions
keeping+0.058
ping+0.046
 arrest+0.041
 heads+0.041
 mant+0.040
Neg logit contributions
ours-0.026
osh-0.025
?-0.024
ila-0.024
ig-0.023
Token to
Token ID284
Feature activation+4.97e-01

Pos logit contributions
keeping+0.319
ping+0.263
 heads+0.237
 arrest+0.232
edly+0.225
Neg logit contributions
ours-0.130
?-0.127
ores-0.126
osh-0.125
ila-0.122
Token the
Token ID262
Feature activation-3.47e-01

Pos logit contributions
keeping+0.120
ping+0.095
 heads+0.086
 arrest+0.082
cious+0.081
Neg logit contributions
ours-0.072
int-0.070
umber-0.066
ig-0.066
ila-0.065
Token birds
Token ID10087
Feature activation+1.75e+00

Pos logit contributions
osh+0.030
?+0.030
ours+0.030
irl+0.029
ila+0.029
Neg logit contributions
keeping-0.041
ping-0.030
 arrest-0.027
cious-0.025
makers-0.025
Token stopped
Token ID5025
Feature activation+2.38e-02

Pos logit contributions
keeping+0.228
ping+0.180
 heads+0.160
 arrest+0.159
cious+0.152
Neg logit contributions
ours-0.131
osh-0.123
ores-0.122
ila-0.120
?-0.118
Token ch
Token ID442
Feature activation-2.10e-01

Pos logit contributions
keeping+0.003
ping+0.002
 arrest+0.002
cious+0.002
 heads+0.002
Neg logit contributions
ours-0.002
?-0.002
ores-0.002
osh-0.002
 exactly-0.002
Tokenir
Token ID343
Feature activation+4.93e-01

Pos logit contributions
ours+0.017
ores+0.016
osh+0.016
ila+0.016
?+0.016
Neg logit contributions
keeping-0.026
ping-0.021
 heads-0.018
 arrest-0.018
cious-0.017
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.038
 arrest+0.018
 mant+0.017
cious+0.017
 doorstep+0.015
Neg logit contributions
osh-0.063
 of-0.063
int-0.063
?-0.062
ores-0.061
Token to
Token ID284
Feature activation+4.95e-01

Pos logit contributions
keeping+1.522
ping+1.174
 arrest+1.112
 heads+1.081
cious+1.064
Neg logit contributions
osh-0.637
ours-0.632
ores-0.627
?-0.621
 of-0.597
Token listen
Token ID6004
Feature activation+1.26e-01

Pos logit contributions
keeping+0.069
ping+0.052
cious+0.047
 arrest+0.047
edly+0.047
Neg logit contributions
osh-0.033
?-0.033
ores-0.033
ila-0.031
irl-0.031
Token.
Token ID13
Feature activation+1.87e-01

Pos logit contributions
keeping+0.018
ping+0.014
 arrest+0.013
 heads+0.013
 mant+0.012
Neg logit contributions
ours-0.008
ores-0.008
?-0.008
osh-0.007
 exactly-0.007
Token\n
Token ID198
Feature activation+9.99e-02

Pos logit contributions
keeping+0.025
ping+0.019
 heads+0.017
 arrest+0.017
cious+0.017
Neg logit contributions
ours-0.013
ig-0.013
ila-0.012
osh-0.012
int-0.012
Token\n
Token ID198
Feature activation+9.70e-02

Pos logit contributions
keeping+0.013
ping+0.010
makers+0.009
 arrest+0.009
cious+0.009
Neg logit contributions
ours-0.007
int-0.007
ig-0.007
ite-0.007
osh-0.007
Token heard
Token ID2982
Feature activation+1.16e+00

Pos logit contributions
ours+0.082
ores+0.077
osh+0.075
ila+0.074
irl+0.072
Neg logit contributions
keeping-0.134
ping-0.107
 heads-0.095
 arrest-0.094
cious-0.090
Token a
Token ID257
Feature activation+7.79e-02

Pos logit contributions
keeping+0.157
ping+0.128
 heads+0.112
 arrest+0.112
cious+0.108
Neg logit contributions
ours-0.083
osh-0.074
ores-0.074
ig-0.073
ila-0.073
Token loud
Token ID7812
Feature activation+2.36e-01

Pos logit contributions
keeping+0.010
ping+0.008
 arrest+0.007
 heads+0.007
edly+0.007
Neg logit contributions
ours-0.006
osh-0.005
?-0.005
ores-0.005
ila-0.005
Token ch
Token ID442
Feature activation-2.10e-01

Pos logit contributions
keeping+0.031
ping+0.021
 arrest+0.020
cious+0.020
 heads+0.020
Neg logit contributions
?-0.018
osh-0.018
ores-0.017
ours-0.017
 of-0.017
Tokenir
Token ID343
Feature activation+4.93e-01

Pos logit contributions
ours+0.021
ver+0.020
 exactly+0.019
ores+0.018
?+0.018
Neg logit contributions
keeping-0.021
ping-0.019
 arrest-0.015
 heads-0.015
edly-0.013
Tokenping
Token ID13886
Feature activation+1.06e+01

Pos logit contributions
keeping+0.044
cious+0.022
 arrest+0.022
 wrists+0.022
 mant+0.022
Neg logit contributions
 of-0.060
int-0.057
osh-0.057
?-0.057
ores-0.056
Token.
Token ID13
Feature activation+1.92e-01

Pos logit contributions
keeping+1.234
ping+0.944
 heads+0.824
 arrest+0.814
cious+0.773
Neg logit contributions
ours-0.949
osh-0.900
ores-0.897
ila-0.886
?-0.873
Token She
Token ID1375
Feature activation-1.63e-01

Pos logit contributions
keeping+0.027
ping+0.021
 heads+0.019
 arrest+0.019
cious+0.018
Neg logit contributions
ours-0.013
osh-0.012
ores-0.012
ila-0.012
?-0.011
Token looked
Token ID3114
Feature activation-5.35e-01

Pos logit contributions
ours+0.013
ores+0.012
ila+0.012
osh+0.011
ite+0.011
Neg logit contributions
keeping-0.020
ping-0.017
 heads-0.015
 arrest-0.014
cious-0.014
Token around
Token ID1088
Feature activation-7.84e-01

Pos logit contributions
ours+0.047
umber+0.044
ila+0.044
osh+0.044
ite+0.043
Neg logit contributions
keeping-0.060
ping-0.051
 heads-0.045
 mant-0.040
makers-0.040
Token and
Token ID290
Feature activation+1.31e-01

Pos logit contributions
ours+0.048
?+0.046
osh+0.046
ores+0.046
ila+0.045
Neg logit contributions
keeping-0.112
ping-0.089
 arrest-0.080
 heads-0.080
cious-0.078
Token loved
Token ID6151
Feature activation-3.68e-01

Pos logit contributions
?+0.018
osh+0.018
ours+0.018
ila+0.018
irl+0.017
Neg logit contributions
keeping-0.047
ping-0.037
 arrest-0.034
 heads-0.034
 mant-0.033
Token to
Token ID284
Feature activation+3.44e-01

Pos logit contributions
ours+0.020
ores+0.018
?+0.018
 of+0.017
osh+0.016
Neg logit contributions
keeping-0.055
ping-0.045
edly-0.042
 heads-0.041
cious-0.041
Token dance
Token ID9280
Feature activation-1.57e+00

Pos logit contributions
keeping+0.041
ping+0.031
 arrest+0.028
 heads+0.028
cious+0.026
Neg logit contributions
ours-0.030
osh-0.028
ores-0.028
ila-0.027
ig-0.027
Token and
Token ID290
Feature activation-1.94e-02

Pos logit contributions
ours+0.147
int+0.133
osh+0.132
umber+0.131
ores+0.129
Neg logit contributions
keeping-0.171
ping-0.139
 heads-0.123
edly-0.118
 arrest-0.112
Token tw
Token ID665
Feature activation-1.09e+00

Pos logit contributions
ours+0.002
osh+0.001
ores+0.001
ila+0.001
int+0.001
Neg logit contributions
keeping-0.002
ping-0.002
 heads-0.002
 arrest-0.002
cious-0.002
Tokenirl
Token ID1901
Feature activation-9.85e+00

Pos logit contributions
ver+0.040
osh+0.037
ours+0.037
ores+0.035
old+0.034
Neg logit contributions
keeping-0.164
ping-0.149
 heads-0.141
 necks-0.136
 arrest-0.135
Token around
Token ID1088
Feature activation-9.44e-01

Pos logit contributions
ours+0.821
ores+0.747
osh+0.745
ila+0.737
int+0.727
Neg logit contributions
keeping-1.206
ping-0.971
 heads-0.850
 arrest-0.812
edly-0.805
Token in
Token ID287
Feature activation+1.04e+00

Pos logit contributions
?+0.048
 of+0.043
 at+0.042
irl+0.039
ores+0.035
Neg logit contributions
keeping-0.157
ping-0.123
cious-0.120
 arrest-0.117
 heads-0.113
Token her
Token ID607
Feature activation+6.58e-01

Pos logit contributions
keeping+0.139
ping+0.111
 heads+0.099
 arrest+0.098
cious+0.095
Neg logit contributions
ours-0.074
osh-0.070
ila-0.069
ores-0.069
?-0.069
Token pretty
Token ID2495
Feature activation+4.09e-01

Pos logit contributions
keeping+0.071
ping+0.055
 heads+0.053
cious+0.047
 necks+0.047
Neg logit contributions
ours-0.062
ver-0.061
ores-0.060
osh-0.058
umber-0.057
Token pink
Token ID11398
Feature activation+4.01e-01

Pos logit contributions
keeping+0.050
ping+0.039
 heads+0.034
 arrest+0.034
makers+0.033
Neg logit contributions
ours-0.033
osh-0.032
ores-0.031
ila-0.031
?-0.031
Token every
Token ID790
Feature activation+8.88e-01

Pos logit contributions
keeping+0.000
ping+0.000
 heads+0.000
 arrest+0.000
cious+0.000
Neg logit contributions
ours-0.000
ores-0.000
osh-0.000
ila-0.000
int-0.000
Token day
Token ID1110
Feature activation+8.62e-01

Pos logit contributions
keeping+0.093
ping+0.073
 heads+0.064
 arrest+0.059
edly+0.059
Neg logit contributions
ours-0.086
 exactly-0.079
ores-0.078
umber-0.078
ver-0.077
Token she
Token ID673
Feature activation-1.22e+00

Pos logit contributions
keeping+0.124
ping+0.101
 heads+0.091
 arrest+0.090
cious+0.087
Neg logit contributions
ours-0.053
osh-0.049
ores-0.048
ila-0.048
int-0.046
Token would
Token ID561
Feature activation-1.83e+00

Pos logit contributions
ours+0.106
ores+0.103
 exactly+0.102
int+0.100
umber+0.097
Neg logit contributions
keeping-0.129
ping-0.105
 heads-0.099
cious-0.090
 arrest-0.088
Token tw
Token ID665
Feature activation-1.09e+00

Pos logit contributions
ours+0.151
ores+0.140
osh+0.138
ila+0.138
int+0.138
Neg logit contributions
keeping-0.221
ping-0.175
 heads-0.157
 arrest-0.153
 mant-0.146
Tokenirl
Token ID1901
Feature activation-9.84e+00

Pos logit contributions
ours+0.044
ver+0.044
ite+0.042
old+0.041
osh+0.040
Neg logit contributions
keeping-0.146
ping-0.137
 heads-0.127
 arrest-0.125
 necks-0.123
Token around
Token ID1088
Feature activation-9.19e-01

Pos logit contributions
ours+0.791
osh+0.723
ores+0.717
ila+0.710
int+0.704
Neg logit contributions
keeping-1.235
ping-0.989
 heads-0.870
 arrest-0.843
makers-0.822
Token and
Token ID290
Feature activation-8.37e-03

Pos logit contributions
?+0.048
 of+0.042
irl+0.041
ores+0.041
 at+0.039
Neg logit contributions
keeping-0.145
ping-0.116
cious-0.109
 arrest-0.108
 heads-0.106
Token spin
Token ID7906
Feature activation-4.16e+00

Pos logit contributions
ours+0.001
osh+0.001
ores+0.001
ila+0.001
?+0.001
Neg logit contributions
keeping-0.001
ping-0.001
 heads-0.001
 arrest-0.001
cious-0.001
Token.
Token ID13
Feature activation+2.90e-02

Pos logit contributions
ours+0.305
osh+0.286
ores+0.286
ila+0.282
?+0.277
Neg logit contributions
keeping-0.550
ping-0.436
 heads-0.388
 arrest-0.386
cious-0.369
Token␣
Token ID220
Feature activation-1.51e-01

Pos logit contributions
keeping+0.004
ping+0.003
 heads+0.003
 arrest+0.003
 mant+0.003
Neg logit contributions
ours-0.002
osh-0.002
ores-0.002
ila-0.002
int-0.002
Token by
Token ID416
Feature activation-1.77e+00

Pos logit contributions
keeping+0.270
ping+0.216
 heads+0.193
 arrest+0.192
cious+0.183
Neg logit contributions
ours-0.142
osh-0.133
ores-0.133
ila-0.131
?-0.129
Token doing
Token ID1804
Feature activation+3.53e+00

Pos logit contributions
 exactly+0.137
?+0.131
 of+0.127
ila+0.125
ores+0.125
Neg logit contributions
keeping-0.223
cious-0.163
ping-0.162
 heads-0.159
edly-0.154
Token a
Token ID257
Feature activation-6.27e-02

Pos logit contributions
keeping+0.445
ping+0.342
 arrest+0.306
 heads+0.300
cious+0.285
Neg logit contributions
ours-0.289
int-0.281
osh-0.268
ite-0.267
ila-0.264
Token graceful
Token ID44363
Feature activation-1.09e+00

Pos logit contributions
ours+0.004
ila+0.004
ores+0.004
osh+0.004
int+0.004
Neg logit contributions
keeping-0.009
ping-0.007
 arrest-0.006
 heads-0.006
 mant-0.006
Token tw
Token ID665
Feature activation-1.09e+00

Pos logit contributions
osh+0.093
ig+0.093
?+0.092
ours+0.092
 of+0.089
Neg logit contributions
keeping-0.126
ping-0.096
 heads-0.084
 arrest-0.078
cious-0.078
Tokenirl
Token ID1901
Feature activation-9.84e+00

Pos logit contributions
ours+0.077
ores+0.068
ver+0.065
osh+0.064
ila+0.063
Neg logit contributions
keeping-0.144
ping-0.121
 arrest-0.112
cious-0.105
 heads-0.103
Token in
Token ID287
Feature activation+1.04e+00

Pos logit contributions
ours+0.726
osh+0.673
ores+0.672
ila+0.663
int+0.647
Neg logit contributions
keeping-1.293
ping-1.034
 heads-0.918
 arrest-0.905
cious-0.868
Token the
Token ID262
Feature activation-5.22e-01

Pos logit contributions
keeping+0.153
ping+0.128
 heads+0.117
cious+0.115
makers+0.111
Neg logit contributions
?-0.055
ores-0.053
 exactly-0.053
ours-0.052
ila-0.051
Token air
Token ID1633
Feature activation-7.16e-01

Pos logit contributions
ours+0.046
osh+0.044
ores+0.044
?+0.043
ila+0.043
Neg logit contributions
keeping-0.061
ping-0.047
 heads-0.041
 arrest-0.040
makers-0.038
Token.
Token ID13
Feature activation+2.51e-02

Pos logit contributions
?+0.043
ours+0.042
ores+0.041
 of+0.041
osh+0.041
Neg logit contributions
keeping-0.101
ping-0.080
 arrest-0.075
 heads-0.074
cious-0.071
Token From
Token ID3574
Feature activation+1.05e+00

Pos logit contributions
keeping+0.003
ping+0.003
 heads+0.002
 arrest+0.002
 mant+0.002
Neg logit contributions
ours-0.002
osh-0.002
ores-0.002
ila-0.002
irl-0.002
Token a
Token ID257
Feature activation-8.04e-02

Pos logit contributions
ours+0.030
irl+0.029
ila+0.028
ig+0.028
?+0.028
Neg logit contributions
keeping-0.066
ping-0.053
makers-0.046
 arrest-0.046
edly-0.046
Token different
Token ID1180
Feature activation-3.00e-01

Pos logit contributions
ours+0.007
ores+0.006
osh+0.006
ila+0.006
ver+0.006
Neg logit contributions
keeping-0.010
ping-0.008
 heads-0.007
 arrest-0.007
cious-0.006
Token dress
Token ID6576
Feature activation-1.34e+00

Pos logit contributions
ours+0.023
ig+0.022
?+0.022
irl+0.022
osh+0.022
Neg logit contributions
keeping-0.038
ping-0.030
 arrest-0.027
 heads-0.026
makers-0.025
Token and
Token ID290
Feature activation-2.16e-02

Pos logit contributions
ours+0.104
osh+0.097
ores+0.096
ila+0.095
int+0.092
Neg logit contributions
keeping-0.171
ping-0.136
 heads-0.120
 arrest-0.119
cious-0.114
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
ours+0.002
ores+0.002
osh+0.002
ila+0.002
int+0.002
Neg logit contributions
keeping-0.003
ping-0.002
 heads-0.002
 arrest-0.002
cious-0.002
Tokenirl
Token ID1901
Feature activation-9.83e+00

Pos logit contributions
ver+0.039
osh+0.037
ours+0.034
ores+0.034
?+0.032
Neg logit contributions
keeping-0.166
ping-0.147
 heads-0.138
 necks-0.136
 arrest-0.131
Token around
Token ID1088
Feature activation-9.37e-01

Pos logit contributions
ours+0.714
osh+0.668
ores+0.663
ila+0.655
?+0.640
Neg logit contributions
keeping-1.309
ping-1.044
 heads-0.930
 arrest-0.921
cious-0.881
Token in
Token ID287
Feature activation+1.04e+00

Pos logit contributions
 of+0.041
?+0.040
 at+0.038
irl+0.036
ores+0.035
Neg logit contributions
keeping-0.151
ping-0.122
 arrest-0.118
cious-0.116
 heads-0.112
Token front
Token ID2166
Feature activation+3.57e+00

Pos logit contributions
keeping+0.148
ping+0.120
 arrest+0.107
 heads+0.106
edly+0.106
Neg logit contributions
ours-0.064
?-0.063
ila-0.061
osh-0.060
ores-0.059
Token of
Token ID286
Feature activation-4.38e+00

Pos logit contributions
keeping+0.589
ping+0.504
 heads+0.465
 arrest+0.455
 mant+0.452
Neg logit contributions
ver-0.125
ours-0.120
ores-0.113
int-0.109
umber-0.109
Token the
Token ID262
Feature activation-5.23e-01

Pos logit contributions
ours+0.245
ores+0.242
?+0.241
ila+0.237
 exactly+0.231
Neg logit contributions
keeping-0.633
ping-0.512
edly-0.474
cious-0.471
 arrest-0.468
Token loved
Token ID6151
Feature activation-3.69e-01

Pos logit contributions
?+0.017
ours+0.017
osh+0.017
ila+0.017
irl+0.017
Neg logit contributions
keeping-0.044
ping-0.035
 arrest-0.032
 heads-0.032
 mant-0.030
Token to
Token ID284
Feature activation+3.63e-01

Pos logit contributions
ours+0.020
ores+0.018
?+0.018
 of+0.018
osh+0.016
Neg logit contributions
keeping-0.055
ping-0.045
edly-0.042
 heads-0.041
cious-0.041
Token dance
Token ID9280
Feature activation-1.54e+00

Pos logit contributions
keeping+0.044
ping+0.035
 heads+0.031
 arrest+0.030
cious+0.029
Neg logit contributions
ours-0.030
osh-0.028
ores-0.028
ila-0.028
ig-0.027
Token and
Token ID290
Feature activation+3.79e-03

Pos logit contributions
ours+0.139
osh+0.126
int+0.125
ores+0.123
ila+0.123
Neg logit contributions
keeping-0.176
ping-0.143
 heads-0.126
edly-0.121
 arrest-0.117
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.000
ping+0.000
 arrest+0.000
 heads+0.000
cious+0.000
Neg logit contributions
ours-0.000
osh-0.000
ores-0.000
ila-0.000
?-0.000
Tokenirl
Token ID1901
Feature activation-9.83e+00

Pos logit contributions
ver+0.038
ours+0.036
osh+0.035
ores+0.033
old+0.033
Neg logit contributions
keeping-0.161
ping-0.147
 heads-0.139
 necks-0.134
 arrest-0.133
Token around
Token ID1088
Feature activation-9.19e-01

Pos logit contributions
ours+0.809
ores+0.737
osh+0.736
ila+0.729
int+0.714
Neg logit contributions
keeping-1.213
ping-0.977
 heads-0.855
 arrest-0.822
edly-0.809
Token in
Token ID287
Feature activation+1.04e+00

Pos logit contributions
?+0.042
 at+0.040
 of+0.038
irl+0.033
ap+0.029
Neg logit contributions
keeping-0.157
ping-0.122
cious-0.121
 arrest-0.118
 heads-0.112
Token circles
Token ID13332
Feature activation-1.53e+00

Pos logit contributions
keeping+0.143
ping+0.115
 heads+0.102
 arrest+0.102
cious+0.100
Neg logit contributions
ours-0.069
?-0.066
ila-0.066
osh-0.066
ores-0.065
Token.
Token ID13
Feature activation+3.11e-02

Pos logit contributions
ours+0.105
osh+0.099
ores+0.099
?+0.098
ila+0.097
Neg logit contributions
keeping-0.209
ping-0.164
 arrest-0.149
 heads-0.148
cious-0.142
Token One
Token ID1881
Feature activation-3.14e-01

Pos logit contributions
keeping+0.004
ping+0.003
 heads+0.003
 arrest+0.003
 mant+0.003
Neg logit contributions
ours-0.002
osh-0.002
ores-0.002
ila-0.002
int-0.002
Token.
Token ID13
Feature activation+6.67e-02

Pos logit contributions
ours+0.181
osh+0.172
ores+0.169
ila+0.168
?+0.166
Neg logit contributions
keeping-0.342
ping-0.272
 heads-0.244
 arrest-0.240
cious-0.231
Token Now
Token ID2735
Feature activation-9.78e-03

Pos logit contributions
keeping+0.009
ping+0.007
 heads+0.006
 arrest+0.006
edly+0.006
Neg logit contributions
ours-0.005
osh-0.005
irl-0.005
ores-0.005
ila-0.005
Token she
Token ID673
Feature activation-1.19e+00

Pos logit contributions
ours+0.001
osh+0.001
ores+0.001
ila+0.001
int+0.001
Neg logit contributions
keeping-0.001
ping-0.001
 heads-0.001
 arrest-0.001
cious-0.001
Token could
Token ID714
Feature activation-8.80e-01

Pos logit contributions
ores+0.104
ours+0.102
int+0.094
umber+0.093
 exactly+0.088
Neg logit contributions
keeping-0.127
ping-0.103
 heads-0.103
 arrest-0.095
 necks-0.087
Token tw
Token ID665
Feature activation-1.06e+00

Pos logit contributions
ours+0.090
int+0.083
ores+0.080
umber+0.079
old+0.078
Neg logit contributions
keeping-0.080
 heads-0.064
ping-0.063
 arrest-0.062
 sidewalk-0.057
Tokenirl
Token ID1901
Feature activation-9.83e+00

Pos logit contributions
ours+0.054
auc+0.053
ver+0.048
ores+0.046
old+0.046
Neg logit contributions
keeping-0.127
ping-0.123
 arrest-0.113
 heads-0.112
 necks-0.111
Token and
Token ID290
Feature activation-2.58e-03

Pos logit contributions
ours+0.918
int+0.829
umber+0.811
ores+0.806
osh+0.788
Neg logit contributions
keeping-1.088
ping-0.900
 heads-0.763
makers-0.722
 arrest-0.714
Token dance
Token ID9280
Feature activation-1.54e+00

Pos logit contributions
ours+0.000
ores+0.000
int+0.000
ila+0.000
osh+0.000
Neg logit contributions
keeping-0.000
ping-0.000
 heads-0.000
 arrest-0.000
 mant-0.000
Token whenever
Token ID8797
Feature activation-8.10e-01

Pos logit contributions
ours+0.127
int+0.113
osh+0.113
ores+0.112
ila+0.112
Neg logit contributions
keeping-0.191
ping-0.153
 heads-0.134
 arrest-0.130
edly-0.128
Token she
Token ID673
Feature activation-1.19e+00

Pos logit contributions
ours+0.054
ores+0.049
osh+0.049
int+0.048
ila+0.047
Neg logit contributions
keeping-0.113
ping-0.090
 heads-0.082
 arrest-0.082
cious-0.078
Token wanted
Token ID2227
Feature activation+2.08e+00

Pos logit contributions
ores+0.104
ours+0.101
 exactly+0.096
ver+0.091
umber+0.090
Neg logit contributions
keeping-0.132
ping-0.106
 heads-0.103
 arrest-0.095
 necks-0.089
Token.
Token ID13
Feature activation+5.51e-02

Pos logit contributions
keeping+0.378
ping+0.303
 heads+0.276
 arrest+0.269
 mant+0.266
Neg logit contributions
ours-0.178
?-0.173
ila-0.171
osh-0.170
ores-0.170
Token She
Token ID1375
Feature activation-2.88e-01

Pos logit contributions
keeping+0.006
ping+0.005
 heads+0.004
 arrest+0.004
edly+0.004
Neg logit contributions
ig-0.005
ours-0.005
irl-0.005
osh-0.005
int-0.005
Token walks
Token ID11114
Feature activation+1.15e+00

Pos logit contributions
ores+0.024
umber+0.023
ver+0.023
ours+0.023
itting+0.023
Neg logit contributions
keeping-0.031
 heads-0.026
ping-0.025
 arrest-0.024
 necks-0.023
Token and
Token ID290
Feature activation+1.10e-02

Pos logit contributions
keeping+0.155
ping+0.126
 arrest+0.114
 heads+0.112
 mant+0.111
Neg logit contributions
ours-0.075
osh-0.074
ores-0.072
ila-0.072
?-0.069
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.001
ping+0.001
 heads+0.001
 arrest+0.001
 necks+0.001
Neg logit contributions
ours-0.001
umber-0.001
ver-0.001
ores-0.001
int-0.001
Tokenirl
Token ID1901
Feature activation-9.83e+00

Pos logit contributions
ver+0.051
auc+0.048
 of+0.047
washer+0.044
umber+0.044
Neg logit contributions
keeping-0.147
ping-0.132
 necks-0.128
 heads-0.125
 arrest-0.120
Tokens
Token ID82
Feature activation+8.53e-01

Pos logit contributions
ours+0.849
int+0.794
umber+0.765
ores+0.764
iz+0.763
Neg logit contributions
keeping-1.141
ping-0.915
 heads-0.801
makers-0.757
edly-0.753
Token and
Token ID290
Feature activation+4.20e-03

Pos logit contributions
keeping+0.114
ping+0.090
 arrest+0.080
 heads+0.080
cious+0.077
Neg logit contributions
ours-0.062
osh-0.058
ores-0.057
ila-0.057
ig-0.056
Token sings
Token ID33041
Feature activation-7.58e-01

Pos logit contributions
keeping+0.001
ping+0.000
 heads+0.000
 arrest+0.000
cious+0.000
Neg logit contributions
ours-0.000
osh-0.000
ores-0.000
ila-0.000
?-0.000
Token.
Token ID13
Feature activation+4.87e-02

Pos logit contributions
ours+0.051
ores+0.051
osh+0.050
ila+0.048
?+0.048
Neg logit contributions
keeping-0.103
ping-0.081
 arrest-0.074
 heads-0.074
 mant-0.070
Token\n
Token ID198
Feature activation-3.78e-02

Pos logit contributions
keeping+0.006
ping+0.005
 heads+0.004
 arrest+0.004
cious+0.004
Neg logit contributions
ours-0.004
ig-0.004
int-0.004
osh-0.004
irl-0.004
Token She
Token ID1375
Feature activation-2.84e-01

Pos logit contributions
irl+0.052
ig+0.052
osh+0.052
ite+0.051
ours+0.051
Neg logit contributions
keeping-0.121
ping-0.098
 heads-0.093
edly-0.089
 arrest-0.087
Token did
Token ID750
Feature activation-1.14e+00

Pos logit contributions
ours+0.027
ores+0.026
itting+0.025
int+0.025
ite+0.024
Neg logit contributions
keeping-0.027
 heads-0.020
ping-0.019
 arrest-0.017
 necks-0.017
Token a
Token ID257
Feature activation-5.79e-02

Pos logit contributions
ours+0.093
int+0.089
ite+0.089
osh+0.088
umber+0.085
Neg logit contributions
keeping-0.137
ping-0.108
 arrest-0.097
 heads-0.097
cious-0.092
Token little
Token ID1310
Feature activation+2.90e+00

Pos logit contributions
ours+0.005
ores+0.005
int+0.005
ite+0.004
ver+0.004
Neg logit contributions
keeping-0.006
ping-0.005
 heads-0.005
 arrest-0.005
 mant-0.005
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.355
ping+0.277
 heads+0.252
 arrest+0.250
makers+0.236
Neg logit contributions
ours-0.236
ores-0.221
osh-0.219
ila-0.219
int-0.216
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ver+0.051
ours+0.049
ores+0.047
?+0.047
 shall+0.046
Neg logit contributions
keeping-0.143
ping-0.127
 heads-0.117
 arrest-0.116
 necks-0.115
Token and
Token ID290
Feature activation-1.87e-02

Pos logit contributions
ours+0.806
ores+0.737
int+0.728
ila+0.709
iz+0.704
Neg logit contributions
keeping-1.186
ping-0.973
 heads-0.846
 arrest-0.814
makers-0.802
Token looked
Token ID3114
Feature activation-6.69e-01

Pos logit contributions
ours+0.002
int+0.001
umber+0.001
ores+0.001
ila+0.001
Neg logit contributions
keeping-0.002
ping-0.002
 heads-0.002
 arrest-0.002
cious-0.002
Token in
Token ID287
Feature activation+1.04e+00

Pos logit contributions
umber+0.056
ours+0.053
ila+0.053
ver+0.053
osh+0.052
Neg logit contributions
keeping-0.071
ping-0.063
 heads-0.057
 mant-0.053
 dodging-0.051
Token the
Token ID262
Feature activation-5.20e-01

Pos logit contributions
keeping+0.148
ping+0.120
 heads+0.106
 arrest+0.106
cious+0.103
Neg logit contributions
ours-0.067
osh-0.062
ores-0.061
ila-0.061
?-0.061
Token mirror
Token ID10162
Feature activation-3.63e+00

Pos logit contributions
ver+0.048
ores+0.045
ours+0.044
osh+0.042
 exactly+0.040
Neg logit contributions
keeping-0.057
ping-0.046
 heads-0.045
 mant-0.042
 pavement-0.042
Token.
Token ID13
Feature activation+5.93e-02

Pos logit contributions
ours+0.036
?+0.035
osh+0.034
ores+0.034
ila+0.034
Neg logit contributions
keeping-0.077
ping-0.061
 arrest-0.055
 heads-0.055
 mant-0.053
Token He
Token ID679
Feature activation-8.80e-01

Pos logit contributions
keeping+0.008
ping+0.006
 heads+0.005
 arrest+0.005
 pavement+0.005
Neg logit contributions
irl-0.004
ours-0.004
ig-0.004
osh-0.004
int-0.004
Token started
Token ID2067
Feature activation-1.38e-01

Pos logit contributions
ours+0.076
ores+0.072
umber+0.068
ila+0.068
int+0.067
Neg logit contributions
keeping-0.101
ping-0.080
 heads-0.074
 arrest-0.069
cious-0.069
Token to
Token ID284
Feature activation+3.52e-01

Pos logit contributions
ours+0.012
osh+0.011
ores+0.011
ila+0.011
?+0.011
Neg logit contributions
keeping-0.017
ping-0.013
 heads-0.011
 arrest-0.011
cious-0.011
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.044
ping+0.035
 heads+0.031
 arrest+0.030
cious+0.029
Neg logit contributions
ours-0.028
ores-0.026
osh-0.026
ila-0.026
int-0.025
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ver+0.056
auc+0.053
ours+0.051
umber+0.049
osh+0.047
Neg logit contributions
keeping-0.137
ping-0.129
 necks-0.123
 heads-0.120
 arrest-0.117
Token and
Token ID290
Feature activation-1.87e-02

Pos logit contributions
ours+0.931
int+0.824
ores+0.818
ila+0.816
umber+0.813
Neg logit contributions
keeping-1.080
ping-0.887
 heads-0.757
edly-0.716
makers-0.702
Token spin
Token ID7906
Feature activation-4.13e+00

Pos logit contributions
ours+0.001
osh+0.001
ores+0.001
ila+0.001
?+0.001
Neg logit contributions
keeping-0.002
ping-0.002
 heads-0.002
 arrest-0.002
cious-0.002
Token around
Token ID1088
Feature activation-9.36e-01

Pos logit contributions
ours+0.346
osh+0.314
ores+0.311
ila+0.310
int+0.308
Neg logit contributions
keeping-0.509
ping-0.407
 heads-0.356
 arrest-0.344
edly-0.336
Token in
Token ID287
Feature activation+1.04e+00

Pos logit contributions
osh+0.051
?+0.051
ores+0.051
ours+0.051
ila+0.049
Neg logit contributions
keeping-0.137
ping-0.109
 heads-0.100
 arrest-0.099
cious-0.097
Token circles
Token ID13332
Feature activation-1.53e+00

Pos logit contributions
keeping+0.136
ping+0.108
 heads+0.096
 arrest+0.095
cious+0.091
Neg logit contributions
ours-0.077
osh-0.073
ores-0.072
ila-0.071
?-0.070
Token loved
Token ID6151
Feature activation-3.56e-01

Pos logit contributions
osh+0.017
ours+0.017
?+0.017
ila+0.017
irl+0.017
Neg logit contributions
keeping-0.044
ping-0.035
 heads-0.031
 arrest-0.031
 mant-0.030
Token to
Token ID284
Feature activation+3.56e-01

Pos logit contributions
ours+0.021
ores+0.019
?+0.019
osh+0.018
 of+0.018
Neg logit contributions
keeping-0.052
ping-0.042
 heads-0.038
edly-0.038
cious-0.038
Token dance
Token ID9280
Feature activation-1.54e+00

Pos logit contributions
keeping+0.042
ping+0.032
 arrest+0.029
 heads+0.029
cious+0.027
Neg logit contributions
ours-0.032
ores-0.029
osh-0.029
ila-0.029
ig-0.029
Token and
Token ID290
Feature activation+3.33e-04

Pos logit contributions
ours+0.139
osh+0.126
int+0.126
ila+0.124
ores+0.123
Neg logit contributions
keeping-0.175
ping-0.142
 heads-0.125
edly-0.119
 arrest-0.117
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.000
ping+0.000
 heads+0.000
 arrest+0.000
cious+0.000
Neg logit contributions
ours-0.000
osh-0.000
ores-0.000
ila-0.000
?-0.000
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ver+0.038
osh+0.036
ours+0.035
ores+0.034
old+0.034
Neg logit contributions
keeping-0.161
ping-0.147
 heads-0.139
 necks-0.135
 arrest-0.133
Token in
Token ID287
Feature activation+1.04e+00

Pos logit contributions
ours+0.790
ores+0.723
osh+0.720
ila+0.712
int+0.701
Neg logit contributions
keeping-1.228
ping-0.989
 heads-0.869
 arrest-0.841
edly-0.819
Token her
Token ID607
Feature activation+6.84e-01

Pos logit contributions
keeping+0.146
ping+0.117
 heads+0.104
 arrest+0.104
cious+0.102
Neg logit contributions
ours-0.068
?-0.064
osh-0.064
ila-0.064
ores-0.063
Token pretty
Token ID2495
Feature activation+4.13e-01

Pos logit contributions
keeping+0.070
 heads+0.056
ping+0.055
 mant+0.048
 necks+0.048
Neg logit contributions
ver-0.066
ours-0.065
ores-0.063
umber-0.061
osh-0.060
Token skirt
Token ID23967
Feature activation-1.95e+00

Pos logit contributions
keeping+0.051
ping+0.039
 heads+0.034
 arrest+0.034
cious+0.033
Neg logit contributions
ours-0.034
osh-0.032
ores-0.032
ila-0.032
?-0.031
Token.
Token ID13
Feature activation+4.74e-02

Pos logit contributions
ours+0.159
osh+0.148
ores+0.148
ila+0.147
int+0.142
Neg logit contributions
keeping-0.239
ping-0.190
 heads-0.168
 arrest-0.163
edly-0.158
Token watching
Token ID4964
Feature activation+1.29e+00

Pos logit contributions
keeping+0.011
ping+0.008
 arrest+0.007
 heads+0.007
cious+0.007
Neg logit contributions
ours-0.006
 exactly-0.006
?-0.006
osh-0.006
ores-0.006
Token her
Token ID607
Feature activation+6.84e-01

Pos logit contributions
keeping+0.201
ping+0.151
 arrest+0.150
cious+0.147
 heads+0.144
Neg logit contributions
ores-0.065
 of-0.064
?-0.061
osh-0.060
 exactly-0.059
Token twist
Token ID14528
Feature activation-1.87e+00

Pos logit contributions
keeping+0.091
ping+0.071
 arrest+0.064
 heads+0.062
makers+0.062
Neg logit contributions
ours-0.049
osh-0.047
?-0.047
ores-0.046
ig-0.046
Token and
Token ID290
Feature activation+3.33e-04

Pos logit contributions
ours+0.151
osh+0.143
ores+0.141
ila+0.140
?+0.138
Neg logit contributions
keeping-0.233
ping-0.182
 heads-0.161
 arrest-0.159
cious-0.152
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.000
ping+0.000
 arrest+0.000
 heads+0.000
cious+0.000
Neg logit contributions
ours-0.000
osh-0.000
ores-0.000
?-0.000
ila-0.000
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ours+0.072
ver+0.069
 of+0.068
?+0.067
ite+0.067
Neg logit contributions
keeping-0.132
ping-0.119
 heads-0.102
 arrest-0.101
 necks-0.101
Token.
Token ID13
Feature activation+2.97e-02

Pos logit contributions
ours+0.748
osh+0.695
ores+0.692
ila+0.684
?+0.669
Neg logit contributions
keeping-1.273
ping-1.015
 heads-0.898
 arrest-0.883
cious-0.848
Token The
Token ID383
Feature activation+3.54e-01

Pos logit contributions
keeping+0.004
ping+0.004
 heads+0.003
 arrest+0.003
makers+0.003
Neg logit contributions
ours-0.002
osh-0.002
ores-0.002
ila-0.002
?-0.002
Token white
Token ID2330
Feature activation+1.53e+00

Pos logit contributions
keeping+0.049
ping+0.037
 arrest+0.033
makers+0.033
 heads+0.032
Neg logit contributions
ours-0.026
osh-0.025
ila-0.024
?-0.024
ores-0.023
Token lady
Token ID10846
Feature activation-3.28e-01

Pos logit contributions
keeping+0.194
ping+0.149
 arrest+0.130
 heads+0.130
makers+0.127
Neg logit contributions
ours-0.123
osh-0.117
ila-0.115
ores-0.113
int-0.112
Token smiled
Token ID13541
Feature activation-6.17e-02

Pos logit contributions
ours+0.028
ores+0.028
ila+0.025
umber+0.024
irl+0.024
Neg logit contributions
keeping-0.037
ping-0.031
 heads-0.028
edly-0.026
 necks-0.025
Token started
Token ID2067
Feature activation-1.45e-01

Pos logit contributions
ours+0.041
osh+0.038
ores+0.038
ila+0.037
?+0.037
Neg logit contributions
keeping-0.068
ping-0.054
 heads-0.048
 arrest-0.047
cious-0.045
Token to
Token ID284
Feature activation+3.49e-01

Pos logit contributions
ours+0.011
osh+0.011
ores+0.011
ila+0.010
?+0.010
Neg logit contributions
keeping-0.018
ping-0.015
 heads-0.013
 arrest-0.013
cious-0.012
Token dance
Token ID9280
Feature activation-1.54e+00

Pos logit contributions
keeping+0.045
ping+0.036
 heads+0.032
 arrest+0.031
cious+0.030
Neg logit contributions
ours-0.027
osh-0.025
ores-0.025
ila-0.025
?-0.024
Token and
Token ID290
Feature activation+4.20e-03

Pos logit contributions
ours+0.117
ila+0.105
int+0.105
osh+0.104
ores+0.102
Neg logit contributions
keeping-0.202
ping-0.161
 heads-0.143
 arrest-0.139
edly-0.137
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.001
ping+0.000
 arrest+0.000
 heads+0.000
cious+0.000
Neg logit contributions
ours-0.000
osh-0.000
ores-0.000
irl-0.000
ila-0.000
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ver+0.055
?+0.054
ite+0.052
ila+0.052
ours+0.051
Neg logit contributions
keeping-0.145
ping-0.127
 heads-0.116
 necks-0.115
 arrest-0.114
Token around
Token ID1088
Feature activation-9.22e-01

Pos logit contributions
ours+0.749
osh+0.693
ores+0.690
ila+0.685
?+0.670
Neg logit contributions
keeping-1.274
ping-1.015
 heads-0.899
 arrest-0.883
cious-0.849
Token in
Token ID287
Feature activation+1.06e+00

Pos logit contributions
 of+0.044
osh+0.043
ores+0.043
?+0.043
ig+0.040
Neg logit contributions
keeping-0.141
ping-0.115
 arrest-0.107
 heads-0.103
cious-0.102
Token the
Token ID262
Feature activation-5.17e-01

Pos logit contributions
keeping+0.153
ping+0.124
 arrest+0.109
cious+0.109
edly+0.108
Neg logit contributions
ours-0.064
osh-0.062
?-0.061
ores-0.061
ila-0.059
Token sunshine
Token ID34488
Feature activation-1.67e+00

Pos logit contributions
ours+0.049
ores+0.045
ila+0.045
ver+0.045
osh+0.044
Neg logit contributions
keeping-0.058
ping-0.045
 heads-0.040
 arrest-0.038
 mant-0.038
Token.
Token ID13
Feature activation+3.11e-02

Pos logit contributions
 of+0.089
ig+0.088
osh+0.088
ores+0.088
ours+0.088
Neg logit contributions
keeping-0.242
ping-0.195
 arrest-0.185
 heads-0.178
 mant-0.173
Token He
Token ID679
Feature activation-8.87e-01

Pos logit contributions
keeping+0.006
ping+0.004
 sidewalk+0.004
 pavement+0.004
 Smith+0.004
Neg logit contributions
ig-0.006
ila-0.005
ours-0.005
int-0.005
irl-0.005
Token would
Token ID561
Feature activation-1.84e+00

Pos logit contributions
ours+0.080
ores+0.078
umber+0.075
int+0.074
 exactly+0.072
Neg logit contributions
keeping-0.089
ping-0.076
 heads-0.072
 necks-0.065
cious-0.064
Token dance
Token ID9280
Feature activation-1.54e+00

Pos logit contributions
ours+0.152
ores+0.141
osh+0.140
ila+0.139
int+0.138
Neg logit contributions
keeping-0.222
ping-0.176
 heads-0.158
 arrest-0.154
 mant-0.146
Token and
Token ID290
Feature activation+4.20e-03

Pos logit contributions
ours+0.144
int+0.131
umber+0.129
ila+0.127
ite+0.126
Neg logit contributions
keeping-0.166
ping-0.138
 heads-0.119
edly-0.113
makers-0.111
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.001
ping+0.000
 heads+0.000
 arrest+0.000
 mant+0.000
Neg logit contributions
ours-0.000
ores-0.000
int-0.000
ila-0.000
osh-0.000
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ver+0.045
umber+0.043
auc+0.043
ours+0.043
old+0.041
Neg logit contributions
keeping-0.145
ping-0.137
 necks-0.128
 heads-0.127
opes-0.120
Token around
Token ID1088
Feature activation-9.22e-01

Pos logit contributions
ours+0.847
ila+0.758
ores+0.752
int+0.748
umber+0.735
Neg logit contributions
keeping-1.169
ping-0.954
 heads-0.827
 arrest-0.788
makers-0.783
Token his
Token ID465
Feature activation-1.58e+00

Pos logit contributions
?+0.048
osh+0.045
ores+0.044
 of+0.043
irl+0.042
Neg logit contributions
keeping-0.145
ping-0.113
cious-0.107
 arrest-0.104
 heads-0.104
Token room
Token ID2119
Feature activation-1.71e+00

Pos logit contributions
ours+0.104
ig+0.102
?+0.100
irl+0.098
uddle+0.097
Neg logit contributions
keeping-0.223
ping-0.176
 arrest-0.157
makers-0.156
cious-0.150
Token every
Token ID790
Feature activation+8.96e-01

Pos logit contributions
ours+0.115
osh+0.108
ores+0.107
?+0.105
ila+0.104
Neg logit contributions
keeping-0.236
ping-0.188
 heads-0.168
 arrest-0.168
cious-0.161
Token time
Token ID640
Feature activation-1.07e+00

Pos logit contributions
keeping+0.107
ping+0.084
 heads+0.074
 arrest+0.072
 mant+0.069
Neg logit contributions
ours-0.077
ores-0.071
ila-0.071
int-0.070
osh-0.070
Token is
Token ID318
Feature activation-1.90e+00

Pos logit contributions
ores+0.024
ours+0.024
ila+0.024
int+0.023
umber+0.022
Neg logit contributions
keeping-0.051
ping-0.043
 heads-0.040
cious-0.036
 necks-0.036
Token very
Token ID845
Feature activation+6.90e-02

Pos logit contributions
ours+0.138
osh+0.130
int+0.126
ores+0.126
ila+0.124
Neg logit contributions
keeping-0.252
ping-0.203
 heads-0.181
 arrest-0.175
cious-0.172
Token pretty
Token ID2495
Feature activation+4.28e-01

Pos logit contributions
keeping+0.008
ping+0.006
 heads+0.005
cious+0.005
 arrest+0.005
Neg logit contributions
ours-0.006
osh-0.006
ver-0.006
int-0.006
ig-0.006
Token and
Token ID290
Feature activation+1.09e-02

Pos logit contributions
keeping+0.058
ping+0.047
 arrest+0.042
 heads+0.042
 mant+0.040
Neg logit contributions
ours-0.029
?-0.028
ores-0.027
osh-0.027
ila-0.027
Token tw
Token ID665
Feature activation-1.07e+00

Pos logit contributions
keeping+0.001
ping+0.001
 heads+0.001
cious+0.001
 arrest+0.001
Neg logit contributions
ours-0.001
int-0.001
umber-0.001
ig-0.001
ores-0.001
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
osh+0.044
ver+0.043
ours+0.043
ores+0.039
 of+0.039
Neg logit contributions
keeping-0.163
ping-0.142
 heads-0.131
 necks-0.126
cious-0.124
Tokens
Token ID82
Feature activation+8.43e-01

Pos logit contributions
ours+0.717
ores+0.650
int+0.647
ila+0.647
osh+0.642
Neg logit contributions
keeping-1.289
ping-1.056
 heads-0.940
 arrest-0.888
makers-0.881
Token around
Token ID1088
Feature activation-9.22e-01

Pos logit contributions
keeping+0.118
ping+0.094
 arrest+0.087
 heads+0.084
 mant+0.081
Neg logit contributions
ours-0.053
osh-0.052
?-0.051
ila-0.050
ig-0.050
Token in
Token ID287
Feature activation+1.06e+00

Pos logit contributions
 of+0.038
?+0.038
 at+0.036
ores+0.031
 if+0.029
Neg logit contributions
keeping-0.152
ping-0.122
 arrest-0.118
cious-0.116
 mant-0.115
Token it
Token ID340
Feature activation-4.10e-01

Pos logit contributions
keeping+0.160
ping+0.133
edly+0.121
cious+0.120
makers+0.119
Neg logit contributions
?-0.055
 of-0.053
ila-0.050
osh-0.047
ours-0.046
Token.
Token ID13
Feature activation+5.30e-02

Pos logit contributions
ours+0.028
osh+0.026
?+0.026
ores+0.026
ila+0.025
Neg logit contributions
keeping-0.056
ping-0.045
 arrest-0.040
 heads-0.039
cious-0.038
Token old
Token ID1468
Feature activation+1.75e+00

Pos logit contributions
ours+0.041
?+0.038
ila+0.038
ores+0.038
ig+0.038
Neg logit contributions
keeping-0.058
ping-0.045
 arrest-0.039
 heads-0.037
makers-0.037
Token lady
Token ID10846
Feature activation-2.89e-01

Pos logit contributions
keeping+0.235
ping+0.193
 arrest+0.171
 mant+0.165
makers+0.164
Neg logit contributions
?-0.114
ours-0.113
ila-0.111
ite-0.106
ig-0.106
Token like
Token ID588
Feature activation-1.21e+00

Pos logit contributions
ours+0.020
osh+0.019
ig+0.019
?+0.018
ores+0.018
Neg logit contributions
keeping-0.039
ping-0.031
 arrest-0.028
 heads-0.028
makers-0.026
Token a
Token ID257
Feature activation-3.90e-02

Pos logit contributions
ours+0.080
osh+0.073
ores+0.073
ila+0.072
int+0.070
Neg logit contributions
keeping-0.168
ping-0.136
 heads-0.122
 arrest-0.121
cious-0.116
Token wh
Token ID348
Feature activation-2.53e+00

Pos logit contributions
ours+0.003
osh+0.003
ores+0.003
ila+0.003
?+0.003
Neg logit contributions
keeping-0.005
ping-0.004
 heads-0.003
 arrest-0.003
edly-0.003
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ours+0.187
ores+0.174
osh+0.172
 exactly+0.167
 shall+0.158
Neg logit contributions
keeping-0.307
ping-0.257
makers-0.217
 arrest-0.213
 heads-0.209
Tokenig
Token ID328
Feature activation-8.59e+00

Pos logit contributions
ours+0.774
ores+0.702
ila+0.699
osh+0.691
irl+0.680
Neg logit contributions
keeping-1.240
ping-1.013
 heads-0.882
 arrest-0.865
makers-0.836
Tokenig
Token ID328
Feature activation-8.59e+00

Pos logit contributions
ig+0.547
osh+0.543
ours+0.536
ores+0.527
?+0.522
Neg logit contributions
keeping-1.192
ping-0.916
 arrest-0.847
 heads-0.845
 mant-0.814
Token.
Token ID13
Feature activation+6.67e-02

Pos logit contributions
ig+0.550
osh+0.547
ours+0.544
ores+0.532
ver+0.527
Neg logit contributions
keeping-1.184
ping-0.911
 arrest-0.843
 heads-0.840
 mant-0.811
Token His
Token ID2399
Feature activation-4.63e-01

Pos logit contributions
keeping+0.008
ping+0.006
 pavement+0.006
 arrest+0.005
 heads+0.005
Neg logit contributions
ours-0.005
osh-0.005
ig-0.005
irl-0.005
ite-0.005
Token dancing
Token ID15360
Feature activation+4.38e-01

Pos logit contributions
ours+0.040
ores+0.038
osh+0.037
ila+0.037
int+0.036
Neg logit contributions
keeping-0.055
ping-0.043
 heads-0.039
 arrest-0.037
 mant-0.035
Token loved
Token ID6151
Feature activation-3.52e-01

Pos logit contributions
keeping+0.041
ping+0.033
 arrest+0.030
 heads+0.029
 mant+0.028
Neg logit contributions
ours-0.018
?-0.018
osh-0.018
ila-0.017
ores-0.017
Token to
Token ID284
Feature activation+3.70e-01

Pos logit contributions
ours+0.020
?+0.018
ores+0.018
 of+0.017
irl+0.016
Neg logit contributions
keeping-0.052
ping-0.043
edly-0.039
 heads-0.039
cious-0.038
Token dance
Token ID9280
Feature activation-1.53e+00

Pos logit contributions
keeping+0.042
ping+0.032
 arrest+0.029
 heads+0.028
cious+0.027
Neg logit contributions
ours-0.034
osh-0.031
ores-0.031
ila-0.031
ver-0.031
Token and
Token ID290
Feature activation+2.27e-02

Pos logit contributions
ours+0.139
osh+0.127
ila+0.124
int+0.123
ores+0.123
Neg logit contributions
keeping-0.176
ping-0.141
 heads-0.125
edly-0.120
 arrest-0.116
Token tw
Token ID665
Feature activation-1.06e+00

Pos logit contributions
keeping+0.003
ping+0.002
 heads+0.002
 arrest+0.002
cious+0.002
Neg logit contributions
ours-0.002
osh-0.002
ores-0.002
ila-0.002
?-0.002
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ver+0.037
osh+0.036
ours+0.034
ores+0.032
old+0.031
Neg logit contributions
keeping-0.162
ping-0.147
 heads-0.138
 necks-0.134
 arrest-0.132
Token around
Token ID1088
Feature activation-9.10e-01

Pos logit contributions
ours+0.786
osh+0.721
ores+0.720
ila+0.712
int+0.693
Neg logit contributions
keeping-1.235
ping-0.992
 heads-0.872
 arrest-0.845
edly-0.824
Token in
Token ID287
Feature activation+1.07e+00

Pos logit contributions
?+0.042
 at+0.038
 of+0.037
irl+0.030
ap+0.026
Neg logit contributions
keeping-0.157
ping-0.123
cious-0.123
 arrest-0.118
 heads-0.114
Token her
Token ID607
Feature activation+6.98e-01

Pos logit contributions
keeping+0.147
ping+0.118
 heads+0.105
 arrest+0.104
cious+0.103
Neg logit contributions
ours-0.070
?-0.068
ila-0.067
osh-0.067
ores-0.066
Token room
Token ID2119
Feature activation-1.69e+00

Pos logit contributions
keeping+0.079
ping+0.062
 heads+0.058
 arrest+0.052
cious+0.052
Neg logit contributions
ours-0.063
ver-0.061
ores-0.060
osh-0.059
umber-0.057
Token.
Token ID13
Feature activation+5.50e-02

Pos logit contributions
ours+0.105
?+0.100
ores+0.096
 exactly+0.094
osh+0.094
Neg logit contributions
keeping-0.245
ping-0.194
 arrest-0.176
 heads-0.172
makers-0.168
Token.
Token ID13
Feature activation+6.44e-02

Pos logit contributions
keeping+0.070
ping+0.059
 arrest+0.057
 heads+0.053
 mant+0.052
Neg logit contributions
?-0.021
 of-0.020
ours-0.018
 exactly-0.017
ig-0.017
Token She
Token ID1375
Feature activation-2.72e-01

Pos logit contributions
keeping+0.008
ping+0.007
 heads+0.006
 arrest+0.006
 mant+0.006
Neg logit contributions
ours-0.005
osh-0.005
ores-0.004
ila-0.004
int-0.004
Token loved
Token ID6151
Feature activation-3.60e-01

Pos logit contributions
ours+0.022
ores+0.021
osh+0.020
ila+0.020
int+0.020
Neg logit contributions
keeping-0.033
ping-0.026
 heads-0.024
 arrest-0.023
cious-0.023
Token to
Token ID284
Feature activation+3.56e-01

Pos logit contributions
ours+0.021
?+0.019
 exactly+0.019
 of+0.019
ila+0.019
Neg logit contributions
keeping-0.052
ping-0.043
 heads-0.039
edly-0.039
cious-0.038
Token tw
Token ID665
Feature activation-1.06e+00

Pos logit contributions
keeping+0.041
ping+0.032
 heads+0.030
 arrest+0.029
 mant+0.028
Neg logit contributions
ours-0.031
ores-0.029
ver-0.028
osh-0.028
umber-0.028
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
ver+0.034
osh+0.033
ours+0.033
ores+0.032
auc+0.028
Neg logit contributions
keeping-0.165
ping-0.148
 heads-0.139
 necks-0.135
 arrest-0.134
Token around
Token ID1088
Feature activation-9.00e-01

Pos logit contributions
ours+0.742
ores+0.686
osh+0.684
ila+0.673
int+0.663
Neg logit contributions
keeping-1.277
ping-1.022
 heads-0.906
 arrest-0.889
cious-0.854
Token in
Token ID287
Feature activation+1.07e+00

Pos logit contributions
 of+0.037
 at+0.036
?+0.036
irl+0.028
ig+0.028
Neg logit contributions
keeping-0.157
ping-0.123
cious-0.118
 arrest-0.117
 heads-0.114
Token it
Token ID340
Feature activation-3.84e-01

Pos logit contributions
keeping+0.158
ping+0.128
edly+0.116
 arrest+0.115
cious+0.114
Neg logit contributions
?-0.060
ila-0.058
 of-0.058
osh-0.058
ours-0.057
Token and
Token ID290
Feature activation+1.04e-02

Pos logit contributions
?+0.023
ours+0.022
osh+0.022
ig+0.022
ores+0.021
Neg logit contributions
keeping-0.057
ping-0.045
 arrest-0.042
 heads-0.040
makers-0.040
Token feel
Token ID1254
Feature activation-1.81e+00

Pos logit contributions
keeping+0.001
ping+0.001
 heads+0.001
 arrest+0.001
cious+0.001
Neg logit contributions
ours-0.001
osh-0.001
ores-0.001
ila-0.001
?-0.001
Token dish
Token ID9433
Feature activation+4.75e-01

Pos logit contributions
ours+0.039
ores+0.038
?+0.036
osh+0.036
ila+0.035
Neg logit contributions
keeping-0.059
ping-0.046
 arrest-0.040
cious-0.039
 heads-0.038
Tokenwasher
Token ID45146
Feature activation-4.84e+00

Pos logit contributions
keeping+0.062
ping+0.051
 heads+0.048
edly+0.044
 mant+0.042
Neg logit contributions
ours-0.032
ila-0.031
irl-0.030
int-0.029
ig-0.028
Token began
Token ID2540
Feature activation-1.09e+00

Pos logit contributions
ours+0.388
ores+0.356
ila+0.355
int+0.348
osh+0.345
Neg logit contributions
keeping-0.592
ping-0.477
 heads-0.431
 arrest-0.411
 mant-0.395
Token to
Token ID284
Feature activation+3.58e-01

Pos logit contributions
ours+0.088
osh+0.083
ila+0.082
ores+0.081
int+0.080
Neg logit contributions
keeping-0.135
ping-0.107
 heads-0.094
 arrest-0.092
cious-0.088
Token wh
Token ID348
Feature activation-2.52e+00

Pos logit contributions
keeping+0.045
ping+0.035
 heads+0.031
 arrest+0.031
 mant+0.029
Neg logit contributions
ours-0.029
osh-0.027
ores-0.027
ila-0.026
ig-0.026
Tokenirl
Token ID1901
Feature activation-9.82e+00

Pos logit contributions
osh+0.132
ours+0.126
 exactly+0.125
 shall+0.121
ver+0.117
Neg logit contributions
keeping-0.331
ping-0.286
makers-0.249
 arrest-0.241
 heads-0.236
Token and
Token ID290
Feature activation+6.57e-03

Pos logit contributions
ours+0.797
ila+0.741
osh+0.728
ores+0.720
int+0.714
Neg logit contributions
keeping-1.210
ping-0.979
 heads-0.864
 arrest-0.817
edly-0.802
Token swirl
Token ID42835
Feature activation-9.63e-01

Pos logit contributions
keeping+0.001
ping+0.001
 arrest+0.001
 heads+0.001
cious+0.001
Neg logit contributions
ours-0.000
osh-0.000
ores-0.000
ila-0.000
?-0.000
Token the
Token ID262
Feature activation-4.87e-01

Pos logit contributions
ours+0.074
ila+0.069
?+0.068
irl+0.067
int+0.067
Neg logit contributions
keeping-0.119
ping-0.100
 heads-0.087
makers-0.082
edly-0.081
Token water
Token ID1660
Feature activation-5.64e-01

Pos logit contributions
ours+0.039
?+0.038
ores+0.038
osh+0.037
int+0.036
Neg logit contributions
keeping-0.060
ping-0.045
 arrest-0.042
cious-0.039
edly-0.038
Token around
Token ID1088
Feature activation-9.21e-01

Pos logit contributions
ours+0.043
osh+0.039
ila+0.039
ores+0.039
int+0.038
Neg logit contributions
keeping-0.073
ping-0.058
 heads-0.052
 arrest-0.050
 mant-0.048
Token on
Token ID319
Feature activation-4.57e-01

Pos logit contributions
ours+0.032
ver+0.032
osh+0.031
ores+0.029
umber+0.029
Neg logit contributions
keeping-0.046
ping-0.037
 heads-0.032
 arrest-0.030
 necks-0.030
Token pretty
Token ID2495
Feature activation+4.37e-01

Pos logit contributions
ours+0.035
osh+0.032
?+0.032
ila+0.032
ores+0.032
Neg logit contributions
keeping-0.059
ping-0.047
 heads-0.041
 arrest-0.041
makers-0.040
Token dresses
Token ID27309
Feature activation+4.08e-02

Pos logit contributions
keeping+0.046
ping+0.035
 heads+0.030
 arrest+0.029
cious+0.028
Neg logit contributions
ours-0.043
osh-0.041
ores-0.041
ila-0.040
ver-0.040
Token and
Token ID290
Feature activation+6.73e-03

Pos logit contributions
keeping+0.005
ping+0.004
 heads+0.004
 arrest+0.004
cious+0.004
Neg logit contributions
ours-0.003
osh-0.003
ores-0.003
ila-0.003
int-0.003
Token tw
Token ID665
Feature activation-1.06e+00

Pos logit contributions
keeping+0.001
ping+0.001
 arrest+0.001
 heads+0.001
makers+0.001
Neg logit contributions
ours-0.001
osh-0.001
ila-0.001
irl-0.000
?-0.000
Tokenirl
Token ID1901
Feature activation-9.81e+00

Pos logit contributions
osh+0.036
ver+0.035
ours+0.034
ores+0.032
?+0.030
Neg logit contributions
keeping-0.174
ping-0.150
 heads-0.140
 necks-0.136
 arrest-0.135
Token around
Token ID1088
Feature activation-9.07e-01

Pos logit contributions
ours+0.672
osh+0.629
ores+0.624
ila+0.617
ig+0.602
Neg logit contributions
keeping-1.348
ping-1.078
 heads-0.965
 arrest-0.964
 mant-0.921
Token in
Token ID287
Feature activation+1.06e+00

Pos logit contributions
?+0.040
 of+0.038
 at+0.037
irl+0.032
ores+0.028
Neg logit contributions
keeping-0.148
ping-0.123
 arrest-0.119
cious-0.116
iously-0.112
Token them
Token ID606
Feature activation+1.45e+00

Pos logit contributions
keeping+0.144
ping+0.116
 arrest+0.103
 heads+0.102
edly+0.100
Neg logit contributions
ours-0.072
ila-0.069
?-0.068
osh-0.067
ores-0.067
Token.
Token ID13
Feature activation+5.93e-02

Pos logit contributions
keeping+0.210
ping+0.167
 arrest+0.159
 heads+0.148
makers+0.145
Neg logit contributions
ours-0.088
?-0.085
osh-0.084
ig-0.084
ores-0.083
Token One
Token ID1881
Feature activation-2.88e-01

Pos logit contributions
keeping+0.007
ping+0.006
 heads+0.005
 mant+0.005
 arrest+0.005
Neg logit contributions
ours-0.004
osh-0.004
ores-0.004
ila-0.004
int-0.004
Token loved
Token ID6151
Feature activation-3.58e-01

Pos logit contributions
?+0.017
ours+0.016
osh+0.016
ila+0.016
irl+0.016
Neg logit contributions
keeping-0.043
ping-0.034
 arrest-0.031
 heads-0.031
 mant-0.030
Token to
Token ID284
Feature activation+3.58e-01

Pos logit contributions
ours+0.021
?+0.019
ores+0.019
 of+0.018
osh+0.018
Neg logit contributions
keeping-0.052
ping-0.042
edly-0.039
 heads-0.039
cious-0.038
Token dance
Token ID9280
Feature activation-1.53e+00

Pos logit contributions
keeping+0.042
ping+0.032
 arrest+0.029
 heads+0.029
 mant+0.027
Neg logit contributions
ours-0.032
osh-0.030
ores-0.029
ila-0.029
ver-0.029
Token and
Token ID290
Feature activation+6.73e-03

Pos logit contributions
ours+0.139
osh+0.125
int+0.124
ila+0.123
umber+0.122
Neg logit contributions
keeping-0.173
ping-0.141
 heads-0.124
edly-0.118
 arrest-0.116
Token tw
Token ID665
Feature activation-1.06e+00

Pos logit contributions
keeping+0.001
ping+0.001
 heads+0.001
 arrest+0.001
cious+0.001
Neg logit contributions
ours-0.001
osh-0.000
ores-0.000
ila-0.000
?-0.000
Tokenirl
Token ID1901
Feature activation-9.81e+00

Pos logit contributions
ver+0.037
osh+0.036
ours+0.035
ores+0.033
old+0.033
Neg logit contributions
keeping-0.160
ping-0.146
 heads-0.137
 necks-0.133
 arrest-0.132
Token around
Token ID1088
Feature activation-9.07e-01

Pos logit contributions
ours+0.791
ores+0.722
osh+0.721
ila+0.714
int+0.697
Neg logit contributions
keeping-1.227
ping-0.988
 heads-0.867
 arrest-0.839
edly-0.818
Token in
Token ID287
Feature activation+1.06e+00

Pos logit contributions
?+0.041
 at+0.038
 of+0.037
irl+0.031
ap+0.027
Neg logit contributions
keeping-0.157
ping-0.121
cious-0.121
 arrest-0.117
 heads-0.112
Token her
Token ID607
Feature activation+6.77e-01

Pos logit contributions
keeping+0.147
ping+0.118
 heads+0.105
 arrest+0.105
edly+0.103
Neg logit contributions
ours-0.068
?-0.066
osh-0.065
ila-0.065
ores-0.064
Token pretty
Token ID2495
Feature activation+4.42e-01

Pos logit contributions
keeping+0.074
ping+0.058
 heads+0.055
cious+0.049
 mant+0.049
Neg logit contributions
ours-0.063
ver-0.062
ores-0.061
osh-0.059
umber-0.058
Token dress
Token ID6576
Feature activation-1.32e+00

Pos logit contributions
keeping+0.055
ping+0.043
 arrest+0.038
 heads+0.037
makers+0.036
Neg logit contributions
ours-0.035
osh-0.033
?-0.033
ig-0.033
ila-0.033
Token an
Token ID281
Feature activation-2.79e+00

Pos logit contributions
keeping+0.069
ping+0.056
 heads+0.048
 arrest+0.048
 ground+0.044
Neg logit contributions
int-0.053
ours-0.052
ig-0.051
irl-0.051
umber-0.049
Token available
Token ID1695
Feature activation-3.19e-01

Pos logit contributions
ours+0.206
?+0.198
ila+0.195
ores+0.194
ig+0.193
Neg logit contributions
keeping-0.365
ping-0.278
 arrest-0.250
 heads-0.247
cious-0.246
Token stick
Token ID4859
Feature activation-7.53e-01

Pos logit contributions
ours+0.029
ores+0.028
osh+0.028
ila+0.028
int+0.027
Neg logit contributions
keeping-0.036
ping-0.027
 heads-0.024
 arrest-0.024
 mant-0.023
Token on
Token ID319
Feature activation-3.93e-01

Pos logit contributions
ours+0.055
osh+0.052
ores+0.051
ila+0.051
?+0.050
Neg logit contributions
keeping-0.099
ping-0.079
 heads-0.070
 arrest-0.069
cious-0.067
Token the
Token ID262
Feature activation-4.27e-01

Pos logit contributions
ours+0.021
osh+0.020
?+0.020
ores+0.020
ila+0.019
Neg logit contributions
keeping-0.059
ping-0.049
cious-0.044
 heads-0.044
makers-0.043
Token ground
Token ID2323
Feature activation+6.16e+00

Pos logit contributions
 cross+0.036
 of+0.035
ite+0.035
?+0.034
 best+0.033
Neg logit contributions
keeping-0.050
ping-0.036
makers-0.033
red-0.032
edly-0.032
Token.
Token ID13
Feature activation+1.14e-01

Pos logit contributions
keeping+0.786
ping+0.622
 heads+0.550
 arrest+0.540
cious+0.517
Neg logit contributions
ours-0.479
osh-0.447
ores-0.445
ila-0.440
int-0.432
Token Tom
Token ID4186
Feature activation+2.26e-01

Pos logit contributions
keeping+0.014
ping+0.011
 arrest+0.010
 heads+0.010
 sidewalk+0.010
Neg logit contributions
ig-0.009
ours-0.009
irl-0.008
umber-0.008
int-0.008
Token managed
Token ID5257
Feature activation-2.38e+00

Pos logit contributions
keeping+0.029
ping+0.024
 heads+0.021
 arrest+0.021
cious+0.020
Neg logit contributions
ours-0.017
ores-0.016
osh-0.016
ila-0.015
?-0.015
Token to
Token ID284
Feature activation+4.32e-01

Pos logit contributions
ours+0.188
osh+0.176
ores+0.176
ila+0.174
int+0.170
Neg logit contributions
keeping-0.301
ping-0.238
 heads-0.210
 arrest-0.208
cious-0.198
Token grab
Token ID5552
Feature activation+1.39e+00

Pos logit contributions
keeping+0.057
ping+0.045
 heads+0.040
 arrest+0.039
cious+0.038
Neg logit contributions
ours-0.032
osh-0.031
ores-0.030
ila-0.030
?-0.030
Token's
Token ID338
Feature activation+6.14e-01

Pos logit contributions
ours+0.166
ores+0.153
osh+0.151
?+0.150
ig+0.148
Neg logit contributions
keeping-0.352
ping-0.279
 arrest-0.254
 heads-0.247
makers-0.241
Token family
Token ID1641
Feature activation+1.02e+00

Pos logit contributions
keeping+0.080
ping+0.064
 arrest+0.056
makers+0.055
cious+0.054
Neg logit contributions
ours-0.045
?-0.041
ores-0.041
osh-0.041
 of-0.040
Token.
Token ID13
Feature activation+1.80e-01

Pos logit contributions
keeping+0.145
ping+0.118
 arrest+0.108
 heads+0.104
makers+0.102
Neg logit contributions
ours-0.062
ores-0.058
?-0.058
 of-0.057
osh-0.057
Token They
Token ID1119
Feature activation+1.22e+00

Pos logit contributions
keeping+0.021
ping+0.015
 heads+0.014
 arrest+0.013
cious+0.013
Neg logit contributions
ours-0.016
ig-0.016
osh-0.015
int-0.015
irl-0.015
Token introduced
Token ID5495
Feature activation-1.68e-02

Pos logit contributions
keeping+0.152
ping+0.119
 heads+0.108
 arrest+0.105
cious+0.100
Neg logit contributions
ours-0.097
ores-0.090
osh-0.089
ig-0.088
int-0.088
Token themselves
Token ID2405
Feature activation+6.58e+00

Pos logit contributions
ours+0.001
ila+0.001
ores+0.001
osh+0.001
 exactly+0.001
Neg logit contributions
keeping-0.002
ping-0.002
 arrest-0.001
 mant-0.001
 pavement-0.001
Token.
Token ID13
Feature activation+1.69e-01

Pos logit contributions
keeping+0.912
ping+0.737
 arrest+0.662
 heads+0.648
cious+0.628
Neg logit contributions
ours-0.440
ores-0.412
?-0.408
osh-0.397
 exactly-0.390
Token They
Token ID1119
Feature activation+1.21e+00

Pos logit contributions
keeping+0.021
ping+0.016
 heads+0.014
 arrest+0.014
cious+0.013
Neg logit contributions
ours-0.014
ig-0.013
osh-0.013
int-0.013
irl-0.013
Token showed
Token ID3751
Feature activation-9.25e-01

Pos logit contributions
keeping+0.154
ping+0.121
 heads+0.109
 arrest+0.107
cious+0.102
Neg logit contributions
ours-0.094
ores-0.087
osh-0.087
ila-0.085
int-0.085
Token them
Token ID606
Feature activation+1.57e+00

Pos logit contributions
ours+0.075
osh+0.070
ores+0.070
ila+0.069
int+0.069
Neg logit contributions
keeping-0.115
ping-0.090
 heads-0.080
 arrest-0.078
cious-0.075
Token the
Token ID262
Feature activation-3.77e-01

Pos logit contributions
keeping+0.210
ping+0.168
 heads+0.153
 arrest+0.149
cious+0.144
Neg logit contributions
ours-0.110
osh-0.103
int-0.102
ila-0.102
ores-0.101
Token,
Token ID11
Feature activation+1.26e-01

Pos logit contributions
?+0.024
ours+0.022
ila+0.022
 of+0.021
ver+0.021
Neg logit contributions
keeping-0.058
ping-0.049
 arrest-0.044
 heads-0.044
 mant-0.043
Token so
Token ID523
Feature activation+5.71e-01

Pos logit contributions
keeping+0.016
ping+0.012
 heads+0.011
 arrest+0.011
cious+0.010
Neg logit contributions
ours-0.010
osh-0.009
ores-0.009
ila-0.009
?-0.009
Token one
Token ID530
Feature activation-6.63e-01

Pos logit contributions
keeping+0.072
ping+0.057
 heads+0.053
 arrest+0.052
cious+0.050
Neg logit contributions
ours-0.044
int-0.042
osh-0.040
ig-0.040
umber-0.039
Token twin
Token ID15203
Feature activation-2.42e+00

Pos logit contributions
ours+0.044
irl+0.040
ores+0.038
umber+0.036
uddle+0.035
Neg logit contributions
keeping-0.086
ping-0.075
 heads-0.066
cious-0.064
red-0.064
Token brave
Token ID14802
Feature activation-2.52e+00

Pos logit contributions
ours+0.176
ores+0.166
irl+0.162
osh+0.162
int+0.157
Neg logit contributions
keeping-0.318
ping-0.256
 heads-0.231
 arrest-0.221
makers-0.218
Tokenly
Token ID306
Feature activation+8.45e+00

Pos logit contributions
ours+0.323
osh+0.311
ores+0.309
ila+0.308
?+0.305
Neg logit contributions
keeping-0.194
ping-0.126
 heads-0.096
 arrest-0.095
 mant-0.086
Token reached
Token ID4251
Feature activation-1.71e+00

Pos logit contributions
keeping+1.145
ping+0.916
 heads+0.831
 arrest+0.813
cious+0.795
Neg logit contributions
ours-0.588
ores-0.544
osh-0.531
irl-0.523
ila-0.519
Token out
Token ID503
Feature activation-1.53e+00

Pos logit contributions
ours+0.135
int+0.133
Because+0.130
umber+0.130
ite+0.129
Neg logit contributions
keeping-0.210
ping-0.166
 heads-0.151
 arrest-0.141
 mant-0.138
Token and
Token ID290
Feature activation+3.60e-02

Pos logit contributions
ours+0.132
osh+0.123
irl+0.120
ite+0.120
int+0.120
Neg logit contributions
keeping-0.186
ping-0.147
 heads-0.127
 arrest-0.124
makers-0.121
Token touched
Token ID12615
Feature activation-2.72e+00

Pos logit contributions
keeping+0.004
ping+0.003
 heads+0.003
 arrest+0.003
 mant+0.003
Neg logit contributions
ours-0.003
irl-0.003
ig-0.003
ila-0.003
umber-0.003
Token the
Token ID262
Feature activation-4.61e-01

Pos logit contributions
ours+0.227
int+0.223
irl+0.215
ver+0.214
ig+0.212
Neg logit contributions
keeping-0.326
ping-0.256
 heads-0.228
 arrest-0.211
makers-0.206
Token happy
Token ID3772
Feature activation+7.99e-01

Pos logit contributions
keeping+0.015
ping+0.012
 heads+0.011
 arrest+0.011
cious+0.011
Neg logit contributions
ours-0.010
osh-0.009
ver-0.009
ig-0.009
umber-0.009
Token ending
Token ID7464
Feature activation+3.41e+00

Pos logit contributions
keeping+0.099
ping+0.079
 heads+0.069
 arrest+0.068
cious+0.065
Neg logit contributions
ours-0.062
?-0.060
ores-0.060
osh-0.060
ila-0.059
Token!
Token ID0
Feature activation-8.36e-01

Pos logit contributions
keeping+0.430
ping+0.340
 heads+0.294
 arrest+0.288
makers+0.283
Neg logit contributions
ours-0.277
irl-0.253
int-0.250
osh-0.247
ila-0.247
Token<|endoftext|>
Token ID50256
Feature activation-2.56e-01

Pos logit contributions
ours+0.060
int+0.055
irl+0.054
ig+0.054
osh+0.052
Neg logit contributions
keeping-0.114
ping-0.088
 arrest-0.081
 heads-0.076
makers-0.076
TokenDa
Token ID26531
Feature activation+2.44e+00

Pos logit contributions
ours+0.015
ores+0.013
osh+0.013
int+0.013
ila+0.013
Neg logit contributions
keeping-0.038
ping-0.031
 heads-0.027
 arrest-0.027
makers-0.026
Tokenisy
Token ID13560
Feature activation+7.05e+00

Pos logit contributions
keeping+0.299
ping+0.224
 heads+0.212
 arrest+0.203
 pavement+0.197
Neg logit contributions
ig-0.193
?-0.191
ila-0.190
 exactly-0.190
 of-0.188
Token was
Token ID373
Feature activation-6.33e-01

Pos logit contributions
keeping+0.972
ping+0.783
 heads+0.702
 arrest+0.694
cious+0.667
Neg logit contributions
ours-0.471
osh-0.437
ores-0.434
ila-0.428
?-0.418
Token a
Token ID257
Feature activation-1.65e-02

Pos logit contributions
ours+0.054
osh+0.050
ite+0.049
umber+0.049
ila+0.048
Neg logit contributions
keeping-0.072
ping-0.060
 arrest-0.050
 heads-0.050
 mant-0.049
Token little
Token ID1310
Feature activation+2.95e+00

Pos logit contributions
ours+0.002
ores+0.001
osh+0.001
ver+0.001
ila+0.001
Neg logit contributions
keeping-0.002
ping-0.001
 heads-0.001
cious-0.001
 arrest-0.001
Token girl
Token ID2576
Feature activation+4.80e-01

Pos logit contributions
keeping+0.306
ping+0.242
 heads+0.224
 arrest+0.199
makers+0.195
Neg logit contributions
ours-0.269
ores-0.260
osh-0.258
ver-0.258
irl-0.257
Token and
Token ID290
Feature activation+3.97e-02

Pos logit contributions
keeping+0.067
ping+0.054
 heads+0.048
 arrest+0.048
 mant+0.046
Neg logit contributions
ours-0.032
osh-0.029
ores-0.029
ila-0.029
?-0.029
Token,
Token ID11
Feature activation+1.23e-01

Pos logit contributions
keeping+0.205
ping+0.163
 heads+0.145
 arrest+0.145
 mant+0.138
Neg logit contributions
ours-0.106
osh-0.101
ores-0.098
ila-0.097
?-0.096
Token people
Token ID661
Feature activation+1.47e+00

Pos logit contributions
keeping+0.015
ping+0.012
 heads+0.011
 arrest+0.011
cious+0.010
Neg logit contributions
ours-0.010
int-0.009
ig-0.009
ores-0.009
osh-0.009
Token were
Token ID547
Feature activation+2.05e+00

Pos logit contributions
keeping+0.198
ping+0.158
 heads+0.141
 arrest+0.140
 mant+0.134
Neg logit contributions
ours-0.104
osh-0.098
ores-0.096
ila-0.096
?-0.094
Token chatting
Token ID32046
Feature activation+4.08e+00

Pos logit contributions
keeping+0.239
ping+0.188
 heads+0.162
 arrest+0.158
cious+0.153
Neg logit contributions
ours-0.181
osh-0.171
int-0.168
ores-0.168
ila-0.167
Token and
Token ID290
Feature activation+3.62e-02

Pos logit contributions
keeping+0.558
ping+0.448
 arrest+0.400
 heads+0.400
 mant+0.381
Neg logit contributions
ours-0.273
ores-0.258
osh-0.258
?-0.252
ila-0.249
Token eating
Token ID6600
Feature activation+6.42e+00

Pos logit contributions
keeping+0.004
ping+0.003
 arrest+0.003
 heads+0.003
edly+0.003
Neg logit contributions
ours-0.003
osh-0.003
ores-0.003
ila-0.003
?-0.003
Token lunch
Token ID9965
Feature activation+3.31e+00

Pos logit contributions
keeping+0.791
ping+0.615
 heads+0.552
 arrest+0.527
 necks+0.511
Neg logit contributions
ours-0.515
int-0.497
osh-0.481
ila-0.480
ver-0.478
Token.
Token ID13
Feature activation+8.76e-02

Pos logit contributions
keeping+0.439
ping+0.351
 heads+0.313
 arrest+0.312
 mant+0.297
Neg logit contributions
ours-0.237
ores-0.222
osh-0.222
ila-0.218
?-0.216
Token Tom
Token ID4186
Feature activation+2.03e-01

Pos logit contributions
keeping+0.012
ping+0.009
 heads+0.009
 arrest+0.008
 pavement+0.008
Neg logit contributions
ours-0.006
ig-0.006
osh-0.006
int-0.006
ila-0.006
Token smiled
Token ID13541
Feature activation-2.08e-02

Pos logit contributions
keeping+0.026
ping+0.021
 heads+0.019
 arrest+0.019
cious+0.018
Neg logit contributions
ours-0.015
ores-0.014
osh-0.014
ila-0.014
int-0.013
Token and
Token ID290
Feature activation+4.29e-02

Pos logit contributions
ours+0.002
ila+0.002
osh+0.002
irl+0.002
umber+0.002
Neg logit contributions
keeping-0.002
ping-0.002
 heads-0.002
edly-0.002
cious-0.001
Token were
Token ID547
Feature activation+2.14e+00

Pos logit contributions
keeping+0.155
ping+0.122
 heads+0.109
 arrest+0.107
cious+0.103
Neg logit contributions
ours-0.095
ores-0.089
osh-0.088
ila-0.088
ig-0.086
Token sleeping
Token ID11029
Feature activation+4.73e+00

Pos logit contributions
keeping+0.279
ping+0.216
 arrest+0.195
 heads+0.192
edly+0.184
Neg logit contributions
ours-0.163
ores-0.153
osh-0.152
 exactly-0.148
ila-0.148
Token and
Token ID290
Feature activation+1.35e-01

Pos logit contributions
keeping+0.651
ping+0.517
 arrest+0.473
 heads+0.463
 mant+0.448
Neg logit contributions
ours-0.311
osh-0.297
ores-0.296
?-0.295
ila-0.286
Token sn
Token ID3013
Feature activation-1.48e+00

Pos logit contributions
keeping+0.017
ping+0.014
 arrest+0.012
 heads+0.012
 mant+0.011
Neg logit contributions
ours-0.010
osh-0.010
ores-0.010
?-0.010
ila-0.009
Tokenugg
Token ID6837
Feature activation-2.73e-01

Pos logit contributions
ours+0.113
osh+0.112
ila+0.110
?+0.107
irl+0.105
Neg logit contributions
keeping-0.189
ping-0.148
 heads-0.129
edly-0.127
cious-0.124
Tokenling
Token ID1359
Feature activation+1.94e+00

Pos logit contributions
ours+0.020
?+0.019
ila+0.019
osh+0.018
ores+0.018
Neg logit contributions
keeping-0.035
ping-0.027
 arrest-0.027
 mant-0.025
cious-0.024
Token with
Token ID351
Feature activation-1.80e-01

Pos logit contributions
keeping+0.277
ping+0.214
 arrest+0.206
 mant+0.191
 heads+0.191
Neg logit contributions
ours-0.116
osh-0.115
ores-0.111
?-0.110
 of-0.108
Token their
Token ID511
Feature activation+7.76e-03

Pos logit contributions
ours+0.013
osh+0.012
ores+0.012
ila+0.012
?+0.012
Neg logit contributions
keeping-0.024
ping-0.019
 heads-0.017
 arrest-0.017
cious-0.016
Token mom
Token ID1995
Feature activation-8.49e-01

Pos logit contributions
keeping+0.001
ping+0.001
 arrest+0.001
makers+0.001
cious+0.001
Neg logit contributions
ours-0.001
ig-0.001
osh-0.001
ite-0.001
?-0.001
Token.
Token ID13
Feature activation+1.88e-01

Pos logit contributions
ours+0.055
?+0.051
ores+0.050
osh+0.050
ila+0.049
Neg logit contributions
keeping-0.119
ping-0.095
 arrest-0.087
 heads-0.086
 mant-0.083
Token
Token ID198
Feature activation+1.09e-01

Pos logit contributions
keeping+0.024
ping+0.019
 heads+0.017
 arrest+0.017
cious+0.016
Neg logit contributions
ours-0.014
osh-0.014
ig-0.013
int-0.013
irl-0.013
Token not
Token ID407
Feature activation+6.50e-01

Pos logit contributions
?+0.266
irl+0.238
ores+0.233
ap+0.226
 exactly+0.224
Neg logit contributions
keeping-0.536
ping-0.443
edly-0.379
 heads-0.367
cious-0.365
Token play
Token ID711
Feature activation-5.34e-01

Pos logit contributions
keeping+0.089
ping+0.072
edly+0.062
 heads+0.062
 mant+0.062
Neg logit contributions
?-0.046
ours-0.041
ores-0.041
osh-0.041
ila-0.041
Token with
Token ID351
Feature activation-2.65e-01

Pos logit contributions
ours+0.033
?+0.032
ores+0.032
osh+0.031
ila+0.031
Neg logit contributions
keeping-0.076
ping-0.061
 heads-0.056
 arrest-0.054
 mant-0.053
Token was
Token ID373
Feature activation-6.29e-01

Pos logit contributions
ours+0.020
?+0.019
ila+0.019
osh+0.019
ores+0.018
Neg logit contributions
keeping-0.035
ping-0.028
 heads-0.024
edly-0.024
 arrest-0.024
Tokenps
Token ID862
Feature activation+6.55e+00

Pos logit contributions
ours+0.045
osh+0.041
ores+0.041
ila+0.041
int+0.040
Neg logit contributions
keeping-0.084
ping-0.068
 heads-0.060
 arrest-0.060
cious-0.057
Token anymore
Token ID7471
Feature activation+2.00e+00

Pos logit contributions
keeping+0.882
ping+0.702
 heads+0.630
 arrest+0.623
cious+0.601
Neg logit contributions
ours-0.461
ores-0.432
osh-0.432
ila-0.424
?-0.423
Token.
Token ID13
Feature activation+1.09e-01

Pos logit contributions
keeping+0.283
ping+0.231
 mant+0.216
cious+0.211
 heads+0.210
Neg logit contributions
?-0.137
 of-0.122
ours-0.110
ila-0.109
ores-0.109
Token We
Token ID775
Feature activation+1.19e-01

Pos logit contributions
keeping+0.014
ping+0.010
 heads+0.010
 arrest+0.010
 mant+0.009
Neg logit contributions
ours-0.008
ig-0.008
irl-0.008
osh-0.008
ila-0.008
Token will
Token ID481
Feature activation-3.85e+00

Pos logit contributions
keeping+0.018
ping+0.014
 heads+0.013
 arrest+0.013
makers+0.013
Neg logit contributions
ours-0.007
ores-0.006
ila-0.006
osh-0.006
int-0.006
Token only
Token ID691
Feature activation-1.18e+00

Pos logit contributions
?+0.269
irl+0.246
ores+0.243
 exactly+0.235
ila+0.232
Neg logit contributions
keeping-0.525
ping-0.433
edly-0.370
cious-0.363
 heads-0.362
Token play
Token ID711
Feature activation-5.18e-01

Pos logit contributions
ours+0.092
osh+0.086
ores+0.086
ila+0.085
ig+0.083
Neg logit contributions
keeping-0.150
ping-0.118
 heads-0.105
 arrest-0.104
cious-0.099
Token he
Token ID339
Feature activation-1.54e+00

Pos logit contributions
keeping+0.021
ping+0.017
 heads+0.016
 arrest+0.015
makers+0.015
Neg logit contributions
ours-0.011
int-0.010
ores-0.010
ig-0.010
irl-0.010
Token loves
Token ID10408
Feature activation-2.89e-01

Pos logit contributions
ours+0.112
ores+0.105
ila+0.099
int+0.098
irl+0.098
Neg logit contributions
keeping-0.193
ping-0.159
 heads-0.146
 arrest-0.146
makers-0.137
Token juice
Token ID13135
Feature activation+1.76e+00

Pos logit contributions
?+0.022
ours+0.022
ila+0.022
ores+0.021
osh+0.021
Neg logit contributions
keeping-0.036
ping-0.029
 heads-0.025
 mant-0.025
cious-0.025
Token.
Token ID13
Feature activation+4.55e-02

Pos logit contributions
keeping+0.230
ping+0.183
 heads+0.163
 arrest+0.162
 mant+0.155
Neg logit contributions
ours-0.130
ores-0.122
osh-0.122
ila-0.120
?-0.120
Token Let
Token ID3914
Feature activation-4.77e-02

Pos logit contributions
keeping+0.006
ping+0.004
 arrest+0.004
 heads+0.004
 mant+0.004
Neg logit contributions
ours-0.004
irl-0.004
ig-0.004
osh-0.004
ade-0.003
Token me
Token ID502
Feature activation+1.39e+00

Pos logit contributions
ours+0.004
osh+0.004
ores+0.004
ila+0.004
int+0.004
Neg logit contributions
keeping-0.006
ping-0.005
 heads-0.004
 arrest-0.004
cious-0.004
Token help
Token ID1037
Feature activation+1.67e-01

Pos logit contributions
keeping+0.151
ping+0.116
 arrest+0.109
 heads+0.103
makers+0.097
Neg logit contributions
ours-0.127
ila-0.116
osh-0.113
ores-0.110
umber-0.108
Token you
Token ID345
Feature activation+1.69e+00

Pos logit contributions
keeping+0.022
ping+0.017
 heads+0.015
cious+0.015
 arrest+0.015
Neg logit contributions
?-0.013
ours-0.013
ila-0.012
osh-0.012
ores-0.012
Token clean
Token ID3424
Feature activation-3.31e+00

Pos logit contributions
keeping+0.212
ping+0.167
 heads+0.147
 arrest+0.146
cious+0.140
Neg logit contributions
ours-0.135
ores-0.127
osh-0.126
?-0.125
ila-0.124
Token up
Token ID510
Feature activation-9.86e-01

Pos logit contributions
ours+0.206
?+0.192
ores+0.191
ila+0.186
osh+0.183
Neg logit contributions
keeping-0.470
ping-0.381
 heads-0.341
 arrest-0.340
cious-0.338
Token,"
Token ID553
Feature activation+3.71e+00

Pos logit contributions
ours+0.070
?+0.067
ores+0.066
osh+0.065
ila+0.065
Neg logit contributions
keeping-0.131
ping-0.106
 heads-0.093
 arrest-0.092
cious-0.090
Token carpet
Token ID20710
Feature activation+1.82e+00

Pos logit contributions
 of+0.037
?+0.036
uddle+0.034
ig+0.034
auc+0.033
Neg logit contributions
keeping-0.048
ping-0.038
cious-0.033
 arrest-0.033
edly-0.032
Token and
Token ID290
Feature activation+6.65e-02

Pos logit contributions
keeping+0.263
ping+0.217
 arrest+0.193
 heads+0.189
cious+0.189
Neg logit contributions
ours-0.105
?-0.104
ores-0.101
 of-0.100
osh-0.100
Token pretend
Token ID16614
Feature activation+1.44e+00

Pos logit contributions
keeping+0.009
ping+0.007
 arrest+0.006
cious+0.006
 heads+0.006
Neg logit contributions
ours-0.005
osh-0.005
ores-0.004
?-0.004
ila-0.004
Token they
Token ID484
Feature activation-4.65e-01

Pos logit contributions
keeping+0.199
ping+0.161
 heads+0.143
 arrest+0.142
cious+0.137
Neg logit contributions
ours-0.097
osh-0.090
ores-0.089
ila-0.088
?-0.087
Token are
Token ID389
Feature activation+4.71e-01

Pos logit contributions
ours+0.026
osh+0.025
ores+0.024
ila+0.024
?+0.023
Neg logit contributions
keeping-0.069
ping-0.057
 heads-0.051
 arrest-0.050
cious-0.049
Token in
Token ID287
Feature activation+1.13e+00

Pos logit contributions
keeping+0.060
ping+0.046
 heads+0.042
 arrest+0.041
edly+0.040
Neg logit contributions
ores-0.036
ours-0.035
?-0.035
ila-0.034
 exactly-0.033
Token a
Token ID257
Feature activation+9.09e-03

Pos logit contributions
keeping+0.171
ping+0.146
cious+0.127
 heads+0.126
edly+0.126
Neg logit contributions
?-0.057
ours-0.056
ores-0.055
ila-0.054
irl-0.053
Token cave
Token ID11527
Feature activation+1.17e-01

Pos logit contributions
keeping+0.001
ping+0.001
 heads+0.001
 arrest+0.001
edly+0.001
Neg logit contributions
ours-0.001
osh-0.001
ig-0.001
?-0.001
ila-0.001
Token.
Token ID13
Feature activation+1.14e-01

Pos logit contributions
keeping+0.016
ping+0.013
 heads+0.011
 arrest+0.011
cious+0.011
Neg logit contributions
ours-0.008
ores-0.008
?-0.008
osh-0.007
ila-0.007
Token
Token ID198
Feature activation+3.12e-02

Pos logit contributions
keeping+0.015
ping+0.012
 heads+0.011
 arrest+0.011
cious+0.011
Neg logit contributions
ours-0.008
ig-0.007
int-0.007
osh-0.007
irl-0.007
Token
Token ID198
Feature activation+3.59e-02

Pos logit contributions
keeping+0.004
ping+0.003
makers+0.003
 arrest+0.003
 heads+0.003
Neg logit contributions
ours-0.002
int-0.002
ig-0.002
osh-0.002
ores-0.002
TokenFinally
Token ID11158
Feature activation+1.13e+00

Pos logit contributions
keeping+0.013
ping+0.010
 arrest+0.009
 heads+0.009
makers+0.009
Neg logit contributions
ours-0.009
osh-0.008
int-0.008
ores-0.008
ig-0.008
Token,
Token ID11
Feature activation+2.33e-01

Pos logit contributions
keeping+0.153
ping+0.121
 arrest+0.113
 heads+0.111
cious+0.105
Neg logit contributions
ours-0.077
osh-0.073
int-0.072
ores-0.070
ite-0.070
Token when
Token ID618
Feature activation+2.04e-01

Pos logit contributions
keeping+0.032
ping+0.025
 arrest+0.023
 heads+0.022
cious+0.021
Neg logit contributions
ours-0.017
osh-0.015
int-0.015
ores-0.015
ig-0.015
Token her
Token ID607
Feature activation+8.23e-01

Pos logit contributions
keeping+0.028
ping+0.022
 arrest+0.020
 heads+0.020
makers+0.019
Neg logit contributions
ours-0.014
osh-0.013
int-0.013
ores-0.013
ila-0.013
Token friends
Token ID2460
Feature activation-3.24e+00

Pos logit contributions
keeping+0.118
ping+0.097
 heads+0.086
 arrest+0.085
cious+0.082
Neg logit contributions
ours-0.051
osh-0.047
ores-0.047
ila-0.046
?-0.045
Token arrived
Token ID5284
Feature activation+2.43e+00

Pos logit contributions
ours+0.257
ores+0.240
int+0.231
osh+0.230
ila+0.230
Neg logit contributions
keeping-0.409
ping-0.317
 arrest-0.289
 heads-0.288
makers-0.267
Token,
Token ID11
Feature activation+2.20e-01

Pos logit contributions
keeping+0.361
ping+0.292
 heads+0.269
 arrest+0.264
cious+0.255
Neg logit contributions
ours-0.137
osh-0.133
ores-0.125
int-0.125
irl-0.121
Token Sarah
Token ID10490
Feature activation-2.43e+00

Pos logit contributions
keeping+0.028
ping+0.021
 arrest+0.019
 heads+0.019
cious+0.018
Neg logit contributions
int-0.018
ours-0.018
ig-0.017
osh-0.016
ite-0.016
Token served
Token ID4983
Feature activation+1.13e+00

Pos logit contributions
ours+0.165
osh+0.154
ores+0.151
ila+0.150
?+0.150
Neg logit contributions
keeping-0.334
ping-0.268
 heads-0.239
 arrest-0.236
 mant-0.228
Token them
Token ID606
Feature activation+1.59e+00

Pos logit contributions
keeping+0.153
ping+0.123
 arrest+0.111
 heads+0.109
cious+0.106
Neg logit contributions
ours-0.079
ores-0.074
osh-0.073
ila-0.072
?-0.072
Token the
Token ID262
Feature activation-3.64e-01

Pos logit contributions
keeping+0.211
ping+0.168
 heads+0.150
 arrest+0.148
 mant+0.142
Neg logit contributions
ours-0.115
osh-0.109
ila-0.107
ores-0.105
ig-0.104
Token the
Token ID262
Feature activation-4.33e-01

Pos logit contributions
ours+0.057
osh+0.054
ores+0.053
ila+0.053
?+0.053
Neg logit contributions
keeping-0.126
ping-0.103
 heads-0.092
 arrest-0.091
cious-0.089
Token tree
Token ID5509
Feature activation+5.17e-01

Pos logit contributions
ite+0.029
ig+0.028
?+0.027
osh+0.026
 of+0.026
Neg logit contributions
keeping-0.063
ping-0.051
edly-0.045
ured-0.044
cious-0.044
Token and
Token ID290
Feature activation+7.45e-02

Pos logit contributions
keeping+0.068
ping+0.054
 heads+0.048
 arrest+0.047
cious+0.045
Neg logit contributions
ours-0.038
osh-0.036
ores-0.035
ila-0.035
int-0.034
Token get
Token ID651
Feature activation+1.12e+00

Pos logit contributions
keeping+0.009
ping+0.007
 arrest+0.006
 heads+0.006
 mant+0.006
Neg logit contributions
ours-0.006
ores-0.006
int-0.006
ig-0.006
ila-0.006
Token it
Token ID340
Feature activation-3.23e-01

Pos logit contributions
keeping+0.133
ping+0.099
 heads+0.090
 arrest+0.087
 mant+0.086
Neg logit contributions
ours-0.093
ver-0.090
uddle-0.088
ack-0.088
ig-0.088
Token.
Token ID13
Feature activation+1.21e-01

Pos logit contributions
ours+0.026
osh+0.025
ores+0.025
ila+0.024
irl+0.024
Neg logit contributions
keeping-0.040
ping-0.032
 heads-0.028
 arrest-0.027
cious-0.026
Token Her
Token ID2332
Feature activation+1.55e+00

Pos logit contributions
keeping+0.017
ping+0.014
 arrest+0.012
 heads+0.012
makers+0.011
Neg logit contributions
ours-0.008
osh-0.008
ores-0.007
?-0.007
ila-0.007
Token mom
Token ID1995
Feature activation-9.30e-01

Pos logit contributions
keeping+0.194
ping+0.148
 heads+0.148
 arrest+0.134
 mant+0.131
Neg logit contributions
ores-0.121
ours-0.121
osh-0.118
int-0.112
ila-0.111
Token said
Token ID531
Feature activation-3.89e-02

Pos logit contributions
ores+0.078
ours+0.076
irl+0.071
ila+0.066
osh+0.065
Neg logit contributions
keeping-0.103
ping-0.087
 heads-0.082
red-0.073
makers-0.073
Token she
Token ID673
Feature activation-1.13e+00

Pos logit contributions
ours+0.003
ores+0.002
osh+0.002
irl+0.002
int+0.002
Neg logit contributions
keeping-0.005
ping-0.004
 heads-0.004
cious-0.004
 arrest-0.004
Token couldn
Token ID3521
Feature activation+2.77e+00

Pos logit contributions
ours+0.085
ores+0.085
irl+0.082
ila+0.080
osh+0.079
Neg logit contributions
keeping-0.142
ping-0.108
 heads-0.103
cious-0.100
makers-0.096
Token Zoe
Token ID32280
Feature activation-3.47e-01

Pos logit contributions
ours+0.160
?+0.154
ila+0.151
ores+0.151
 of+0.138
Neg logit contributions
keeping-0.557
ping-0.468
 arrest-0.429
 heads-0.418
edly-0.417
Token.
Token ID13
Feature activation+8.82e-02

Pos logit contributions
ours+0.028
ores+0.027
osh+0.026
irl+0.025
ila+0.025
Neg logit contributions
keeping-0.043
ping-0.034
 heads-0.030
 arrest-0.029
cious-0.029
Token Zoe
Token ID32280
Feature activation-3.47e-01

Pos logit contributions
keeping+0.012
ping+0.009
 heads+0.008
 arrest+0.008
cious+0.008
Neg logit contributions
ours-0.006
osh-0.006
ores-0.006
ila-0.006
int-0.006
Token wanted
Token ID2227
Feature activation+2.12e+00

Pos logit contributions
ours+0.027
ores+0.025
osh+0.025
ila+0.024
ver+0.024
Neg logit contributions
keeping-0.044
ping-0.035
 heads-0.031
 arrest-0.030
cious-0.030
Token to
Token ID284
Feature activation+3.86e-01

Pos logit contributions
keeping+0.287
ping+0.232
 heads+0.207
 arrest+0.203
makers+0.195
Neg logit contributions
ours-0.149
ores-0.138
osh-0.137
?-0.136
ila-0.136
Token chat
Token ID8537
Feature activation-1.68e-01

Pos logit contributions
keeping+0.047
ping+0.035
 arrest+0.033
 heads+0.032
 mant+0.031
Neg logit contributions
ours-0.034
ores-0.030
osh-0.030
ig-0.030
int-0.030
Token with
Token ID351
Feature activation-2.89e-01

Pos logit contributions
ours+0.011
osh+0.010
ores+0.010
ila+0.010
ig+0.009
Neg logit contributions
keeping-0.023
ping-0.019
 heads-0.017
 arrest-0.017
cious-0.016
Token her
Token ID607
Feature activation+7.18e-01

Pos logit contributions
ours+0.026
int+0.025
osh+0.025
ver+0.024
umber+0.024
Neg logit contributions
keeping-0.033
ping-0.026
 heads-0.023
 arrest-0.023
 necks-0.021
Token mom
Token ID1995
Feature activation-9.61e-01

Pos logit contributions
keeping+0.101
ping+0.082
 heads+0.073
 arrest+0.073
cious+0.070
Neg logit contributions
ours-0.047
osh-0.043
ores-0.043
ila-0.042
?-0.041
Tokenmy
Token ID1820
Feature activation-7.11e-01

Pos logit contributions
ours+0.068
osh+0.063
ores+0.063
ila+0.062
irl+0.060
Neg logit contributions
keeping-0.129
ping-0.104
 heads-0.093
 arrest-0.091
cious-0.088
Token,
Token ID11
Feature activation+1.19e-01

Pos logit contributions
ours+0.047
?+0.045
ores+0.043
osh+0.043
ver+0.043
Neg logit contributions
keeping-0.098
ping-0.079
 arrest-0.073
 heads-0.071
 mant-0.068
Token
Token ID220
Feature activation-1.05e-01

Pos logit contributions
keeping+0.231
ping+0.187
 heads+0.171
 arrest+0.158
edly+0.158
Neg logit contributions
ours-0.155
osh-0.147
ite-0.144
ver-0.144
irl-0.143
Token
Token ID198
Feature activation-1.29e-02

Pos logit contributions
osh+0.009
ite+0.009
ig+0.009
ours+0.009
int+0.008
Neg logit contributions
keeping-0.011
ping-0.008
 arrest-0.007
makers-0.007
 heads-0.007
Token
Token ID198
Feature activation-1.21e-02

Pos logit contributions
ours+0.001
ores+0.001
osh+0.001
int+0.001
 exactly+0.001
Neg logit contributions
keeping-0.002
ping-0.001
 arrest-0.001
 heads-0.001
cious-0.001
TokenOl
Token ID30098
Feature activation-1.16e+00

Pos logit contributions
ours+0.001
ores+0.001
osh+0.001
int+0.001
ila+0.001
Neg logit contributions
keeping-0.001
ping-0.001
 arrest-0.001
 heads-0.001
cious-0.001
Tokeniver
Token ID1428
Feature activation-3.05e+00

Pos logit contributions
ours+0.073
osh+0.068
ores+0.068
int+0.068
ila+0.067
Neg logit contributions
keeping-0.164
ping-0.131
 heads-0.120
 arrest-0.118
 mant-0.115
Token was
Token ID373
Feature activation-6.69e-01

Pos logit contributions
ours+0.218
osh+0.199
ores+0.199
ila+0.196
ig+0.193
Neg logit contributions
keeping-0.406
ping-0.332
 heads-0.296
 arrest-0.281
edly-0.280
Token so
Token ID523
Feature activation+5.51e-01

Pos logit contributions
ours+0.050
osh+0.048
ig+0.048
ite+0.048
ila+0.047
Neg logit contributions
keeping-0.086
ping-0.071
 heads-0.062
 arrest-0.059
cious-0.058
Token happy
Token ID3772
Feature activation+7.64e-01

Pos logit contributions
keeping+0.051
ping+0.042
 heads+0.038
edly+0.034
cious+0.033
Neg logit contributions
ig-0.057
ours-0.051
ite-0.051
umber-0.050
int-0.049
Token.
Token ID13
Feature activation+6.62e-02

Pos logit contributions
keeping+0.105
ping+0.084
 heads+0.076
 arrest+0.075
 mant+0.073
Neg logit contributions
ours-0.051
ores-0.048
?-0.048
osh-0.047
ila-0.047
Token He
Token ID679
Feature activation-8.68e-01

Pos logit contributions
keeping+0.009
ping+0.007
 heads+0.007
 mant+0.006
 arrest+0.006
Neg logit contributions
osh-0.005
ours-0.005
irl-0.004
ila-0.004
ite-0.004
Token loved
Token ID6151
Feature activation-3.44e-01

Pos logit contributions
ours+0.068
ores+0.063
ila+0.062
osh+0.061
ig+0.061
Neg logit contributions
keeping-0.109
ping-0.087
 heads-0.079
 arrest-0.077
cious-0.075
Token away
Token ID1497
Feature activation+1.50e+00

Pos logit contributions
?+0.123
ours+0.110
 of+0.109
ores+0.109
 exactly+0.108
Neg logit contributions
keeping-0.251
ping-0.192
cious-0.186
 arrest-0.181
 heads-0.174
Token.
Token ID13
Feature activation+8.72e-02

Pos logit contributions
keeping+0.204
ping+0.164
 arrest+0.149
 heads+0.148
cious+0.142
Neg logit contributions
ours-0.099
?-0.096
ores-0.095
osh-0.093
 exactly-0.092
Token He
Token ID679
Feature activation-8.53e-01

Pos logit contributions
keeping+0.011
ping+0.009
 heads+0.008
 arrest+0.008
 pavement+0.008
Neg logit contributions
ours-0.007
ig-0.007
irl-0.006
osh-0.006
ores-0.006
Token stretched
Token ID19110
Feature activation-2.09e+00

Pos logit contributions
ours+0.068
ores+0.065
osh+0.061
ila+0.061
?+0.061
Neg logit contributions
keeping-0.107
ping-0.085
 heads-0.078
 arrest-0.074
 necks-0.071
Token and
Token ID290
Feature activation+3.67e-02

Pos logit contributions
ours+0.191
int+0.172
ite+0.172
uddle+0.166
umber+0.165
Neg logit contributions
keeping-0.232
ping-0.197
 heads-0.182
 surface-0.156
 necks-0.155
Token stretched
Token ID19110
Feature activation-2.09e+00

Pos logit contributions
keeping+0.005
ping+0.004
 heads+0.003
 arrest+0.003
cious+0.003
Neg logit contributions
ours-0.003
ores-0.003
ila-0.003
ite-0.003
osh-0.003
Token,
Token ID11
Feature activation+1.13e-01

Pos logit contributions
ours+0.192
int+0.176
ite+0.173
ores+0.169
osh+0.167
Neg logit contributions
keeping-0.232
ping-0.189
 heads-0.171
 mant-0.151
 necks-0.150
Token but
Token ID475
Feature activation+1.82e-01

Pos logit contributions
keeping+0.017
ping+0.013
 arrest+0.012
makers+0.012
 heads+0.012
Neg logit contributions
 exactly-0.006
ours-0.006
osh-0.006
ila-0.006
ver-0.006
Token he
Token ID339
Feature activation-1.51e+00

Pos logit contributions
keeping+0.022
ping+0.018
 heads+0.016
 arrest+0.015
 ground+0.015
Neg logit contributions
ours-0.015
int-0.014
ores-0.013
umber-0.013
irl-0.013
Token could
Token ID714
Feature activation-8.63e-01

Pos logit contributions
ours+0.129
ores+0.121
?+0.111
int+0.111
ite+0.110
Neg logit contributions
keeping-0.174
ping-0.140
 heads-0.129
 arrest-0.121
cious-0.119
Token not
Token ID407
Feature activation+6.19e-01

Pos logit contributions
ours+0.081
int+0.074
osh+0.074
ig+0.074
ores+0.073
Neg logit contributions
keeping-0.095
ping-0.074
 heads-0.067
 arrest-0.067
 mant-0.060
Token.
Token ID13
Feature activation+1.38e-01

Pos logit contributions
ours+0.208
osh+0.194
ores+0.193
ila+0.192
?+0.190
Neg logit contributions
keeping-0.386
ping-0.309
 heads-0.275
 arrest-0.272
 mant-0.262
Token She
Token ID1375
Feature activation-1.72e-01

Pos logit contributions
keeping+0.017
ping+0.012
 heads+0.011
cious+0.011
 arrest+0.011
Neg logit contributions
irl-0.011
osh-0.011
ours-0.011
ig-0.011
int-0.011
Token thinks
Token ID6834
Feature activation+2.25e+00

Pos logit contributions
ores+0.013
ver+0.013
ite+0.013
itting+0.013
umber+0.013
Neg logit contributions
keeping-0.018
 heads-0.016
 arrest-0.015
ping-0.015
 stems-0.014
Token Sara
Token ID24799
Feature activation-5.38e-01

Pos logit contributions
keeping+0.282
ping+0.225
 heads+0.205
 arrest+0.203
makers+0.193
Neg logit contributions
int-0.161
ours-0.157
ite-0.155
osh-0.154
ig-0.150
Token is
Token ID318
Feature activation-1.77e+00

Pos logit contributions
ours+0.031
ores+0.029
osh+0.027
irl+0.027
ila+0.027
Neg logit contributions
keeping-0.078
ping-0.065
 heads-0.060
 arrest-0.058
cious-0.056
Token kind
Token ID1611
Feature activation-2.24e+00

Pos logit contributions
ours+0.110
ores+0.104
ila+0.103
?+0.102
osh+0.102
Neg logit contributions
keeping-0.254
ping-0.204
 heads-0.184
 arrest-0.183
edly-0.176
Token and
Token ID290
Feature activation+1.35e-01

Pos logit contributions
ours+0.146
?+0.137
ila+0.137
osh+0.134
ores+0.134
Neg logit contributions
keeping-0.307
ping-0.250
 arrest-0.225
 heads-0.223
 mant-0.222
Token smart
Token ID4451
Feature activation-1.03e+00

Pos logit contributions
keeping+0.017
ping+0.014
 heads+0.013
cious+0.013
 arrest+0.012
Neg logit contributions
ours-0.009
ig-0.009
ores-0.009
int-0.009
osh-0.009
Token.
Token ID13
Feature activation+1.87e-01

Pos logit contributions
ours+0.063
?+0.060
ila+0.059
 of+0.058
ores+0.056
Neg logit contributions
keeping-0.145
ping-0.119
 mant-0.108
 arrest-0.108
 heads-0.105
Token She
Token ID1375
Feature activation-1.59e-01

Pos logit contributions
keeping+0.023
ping+0.017
 heads+0.016
 arrest+0.015
cious+0.015
Neg logit contributions
irl-0.015
osh-0.015
ours-0.015
ig-0.014
int-0.014
Token hugs
Token ID40657
Feature activation+7.29e-01

Pos logit contributions
ores+0.012
ver+0.012
ite+0.012
ours+0.011
ig+0.011
Neg logit contributions
keeping-0.017
 heads-0.015
ping-0.014
 arrest-0.014
 necks-0.013
Token did
Token ID750
Feature activation-1.01e+00

Pos logit contributions
ours+0.061
ores+0.057
osh+0.056
ig+0.055
ila+0.055
Neg logit contributions
keeping-0.092
ping-0.073
 heads-0.065
 arrest-0.064
cious-0.060
Token not
Token ID407
Feature activation+7.33e-01

Pos logit contributions
ours+0.088
osh+0.083
int+0.081
ores+0.080
ig+0.080
Neg logit contributions
keeping-0.121
ping-0.095
 arrest-0.083
 heads-0.083
makers-0.078
Token understand
Token ID1833
Feature activation-1.12e+00

Pos logit contributions
keeping+0.102
ping+0.083
 heads+0.074
 arrest+0.073
makers+0.071
Neg logit contributions
ours-0.050
osh-0.045
ores-0.044
ig-0.044
ila-0.043
Token why
Token ID1521
Feature activation-4.90e+00

Pos logit contributions
ours+0.064
ite+0.064
osh+0.063
int+0.063
ver+0.061
Neg logit contributions
keeping-0.163
ping-0.137
 heads-0.124
 arrest-0.120
makers-0.116
Token L
Token ID406
Feature activation-3.79e+00

Pos logit contributions
int+0.350
ours+0.340
ig+0.314
ite+0.307
osh+0.304
Neg logit contributions
keeping-0.660
ping-0.522
 arrest-0.475
 heads-0.468
makers-0.459
Tokenila
Token ID10102
Feature activation-5.59e+00

Pos logit contributions
ours+0.081
 exactly+0.078
osh+0.069
ores+0.064
irl+0.062
Neg logit contributions
keeping-0.656
ping-0.578
 arrest-0.524
makers-0.508
 heads-0.507
Token and
Token ID290
Feature activation+1.45e-01

Pos logit contributions
ours+0.346
osh+0.322
ila+0.316
?+0.314
ores+0.314
Neg logit contributions
keeping-0.803
ping-0.647
 heads-0.580
 arrest-0.579
 mant-0.562
Token Tom
Token ID4186
Feature activation+3.04e-01

Pos logit contributions
keeping+0.021
ping+0.018
 heads+0.016
 arrest+0.016
cious+0.015
Neg logit contributions
ours-0.008
osh-0.008
ores-0.008
ila-0.007
int-0.007
Token were
Token ID547
Feature activation+2.14e+00

Pos logit contributions
keeping+0.043
ping+0.034
 arrest+0.031
 heads+0.031
 mant+0.030
Neg logit contributions
ours-0.020
osh-0.018
ila-0.018
ores-0.018
?-0.018
Token so
Token ID523
Feature activation+6.84e-01

Pos logit contributions
keeping+0.312
ping+0.249
 heads+0.228
 arrest+0.227
 mant+0.219
Neg logit contributions
ours-0.130
ores-0.121
?-0.120
ila-0.119
osh-0.118
Token mean
Token ID1612
Feature activation-7.46e-01

Pos logit contributions
keeping+0.079
ping+0.067
 heads+0.058
makers+0.057
 arrest+0.056
Neg logit contributions
ours-0.056
ig-0.056
osh-0.053
irl-0.052
ite-0.051
Token she
Token ID673
Feature activation-1.20e+00

Pos logit contributions
keeping+0.011
ping+0.009
 heads+0.008
 arrest+0.008
cious+0.008
Neg logit contributions
ours-0.006
ig-0.006
int-0.005
ores-0.005
irl-0.005
Token saw
Token ID2497
Feature activation+5.37e-01

Pos logit contributions
ours+0.097
ores+0.092
 exactly+0.089
ver+0.088
osh+0.088
Neg logit contributions
keeping-0.149
ping-0.116
 heads-0.105
 arrest-0.105
cious-0.098
Token a
Token ID257
Feature activation-8.04e-02

Pos logit contributions
keeping+0.071
ping+0.057
 arrest+0.050
 heads+0.050
cious+0.048
Neg logit contributions
ours-0.039
osh-0.036
ores-0.036
ig-0.036
int-0.035
Token beautiful
Token ID4950
Feature activation-3.46e-01

Pos logit contributions
ours+0.006
ores+0.006
osh+0.006
ila+0.006
int+0.006
Neg logit contributions
keeping-0.010
ping-0.008
 heads-0.007
 arrest-0.007
 mant-0.007
Token fl
Token ID781
Feature activation+8.38e-01

Pos logit contributions
ours+0.030
?+0.029
osh+0.029
ig+0.029
ila+0.028
Neg logit contributions
keeping-0.041
ping-0.031
 heads-0.027
 arrest-0.027
makers-0.026
Tokenute
Token ID1133
Feature activation-6.00e+00

Pos logit contributions
keeping+0.113
ping+0.094
 arrest+0.088
edly+0.087
 heads+0.085
Neg logit contributions
ours-0.050
ores-0.048
old-0.042
ila-0.041
osh-0.041
Token in
Token ID287
Feature activation+1.06e+00

Pos logit contributions
?+0.363
ours+0.361
ig+0.332
 of+0.328
ores+0.328
Neg logit contributions
keeping-0.873
ping-0.687
 heads-0.638
 arrest-0.627
cious-0.604
Token the
Token ID262
Feature activation-5.06e-01

Pos logit contributions
keeping+0.161
ping+0.136
cious+0.125
edly+0.124
 heads+0.124
Neg logit contributions
?-0.052
ila-0.050
ores-0.049
ours-0.047
osh-0.046
Token window
Token ID4324
Feature activation+9.51e-01

Pos logit contributions
?+0.036
ours+0.036
osh+0.036
ila+0.035
ores+0.034
Neg logit contributions
keeping-0.066
ping-0.052
 heads-0.046
makers-0.046
 arrest-0.046
Token of
Token ID286
Feature activation-4.38e+00

Pos logit contributions
keeping+0.114
ping+0.091
 heads+0.077
 arrest+0.075
 mant+0.071
Neg logit contributions
ours-0.081
osh-0.076
irl-0.075
ver-0.075
ig-0.075
Token a
Token ID257
Feature activation-8.54e-02

Pos logit contributions
ores+0.178
ila+0.176
?+0.173
 exactly+0.157
osh+0.155
Neg logit contributions
keeping-0.681
ping-0.581
 heads-0.547
cious-0.544
makers-0.536
Token the
Token ID262
Feature activation-4.32e-01

Pos logit contributions
ours+0.127
osh+0.118
ores+0.117
ila+0.116
?+0.113
Neg logit contributions
keeping-0.250
ping-0.201
 heads-0.180
 arrest-0.178
cious-0.170
Token duck
Token ID22045
Feature activation-9.53e-01

Pos logit contributions
ours+0.028
?+0.028
uddle+0.028
osh+0.027
irl+0.027
Neg logit contributions
keeping-0.060
ping-0.048
edly-0.042
cious-0.042
 arrest-0.042
Token float
Token ID12178
Feature activation-1.71e+00

Pos logit contributions
ours+0.068
osh+0.063
ores+0.062
ila+0.062
?+0.060
Neg logit contributions
keeping-0.128
ping-0.103
 heads-0.092
 arrest-0.090
cious-0.087
Token and
Token ID290
Feature activation+6.00e-02

Pos logit contributions
ours+0.134
osh+0.124
ores+0.124
ila+0.123
int+0.121
Neg logit contributions
keeping-0.219
ping-0.174
 heads-0.155
 arrest-0.149
cious-0.143
Token qu
Token ID627
Feature activation+1.70e+00

Pos logit contributions
keeping+0.008
ping+0.006
 arrest+0.006
 heads+0.005
makers+0.005
Neg logit contributions
ours-0.004
osh-0.004
ores-0.004
ila-0.004
irl-0.004
Tokenack
Token ID441
Feature activation-6.81e+00

Pos logit contributions
keeping+0.286
ping+0.243
 heads+0.222
 arrest+0.218
cious+0.212
Neg logit contributions
ours-0.063
osh-0.054
ila-0.053
ores-0.053
?-0.050
Token.
Token ID13
Feature activation+1.10e-01

Pos logit contributions
osh+0.396
ours+0.387
ores+0.386
 of+0.377
?+0.359
Neg logit contributions
keeping-1.005
ping-0.745
 arrest-0.730
 mant-0.717
 heads-0.703
Token\n
Token ID198
Feature activation+2.99e-02

Pos logit contributions
keeping+0.014
ping+0.010
 arrest+0.009
 heads+0.009
edly+0.009
Neg logit contributions
ours-0.009
int-0.009
ig-0.009
osh-0.009
 exactly-0.008
Token\n
Token ID198
Feature activation+2.19e-02

Pos logit contributions
keeping+0.004
ping+0.004
 heads+0.003
 arrest+0.003
makers+0.003
Neg logit contributions
ours-0.002
osh-0.002
ores-0.002
int-0.002
ila-0.002
Token"
Token ID1
Feature activation+5.09e-01

Pos logit contributions
keeping+0.003
ping+0.003
 heads+0.002
 arrest+0.002
 mant+0.002
Neg logit contributions
ours-0.001
ila-0.001
ores-0.001
osh-0.001
irl-0.001
TokenLook
Token ID8567
Feature activation+3.68e+00

Pos logit contributions
keeping+0.069
ping+0.053
 heads+0.048
iously+0.048
 doorstep+0.046
Neg logit contributions
ila-0.038
osh-0.037
ores-0.035
umber-0.035
irl-0.034
Token\n
Token ID198
Feature activation+1.93e-02

Pos logit contributions
keeping+0.013
ping+0.010
 heads+0.009
 arrest+0.009
cious+0.009
Neg logit contributions
ig-0.009
ours-0.009
int-0.009
osh-0.009
irl-0.009
Token\n
Token ID198
Feature activation+2.27e-02

Pos logit contributions
keeping+0.003
ping+0.002
 arrest+0.002
 heads+0.002
makers+0.002
Neg logit contributions
ours-0.001
int-0.001
ig-0.001
osh-0.001
 exactly-0.001
Token"
Token ID1
Feature activation+5.07e-01

Pos logit contributions
keeping+0.003
ping+0.003
 heads+0.002
 arrest+0.002
 mant+0.002
Neg logit contributions
ours-0.002
osh-0.001
ores-0.001
ila-0.001
?-0.001
TokenThe
Token ID464
Feature activation-1.32e-01

Pos logit contributions
keeping+0.063
ping+0.049
 heads+0.042
 arrest+0.042
 mant+0.041
Neg logit contributions
ours-0.042
osh-0.041
ila-0.041
ores-0.040
irl-0.040
Token p
Token ID279
Feature activation-2.74e+00

Pos logit contributions
ours+0.010
osh+0.009
ila+0.009
ores+0.009
?+0.009
Neg logit contributions
keeping-0.018
ping-0.014
 heads-0.012
 arrest-0.012
cious-0.012
Tokenuddle
Token ID24500
Feature activation-7.17e+00

Pos logit contributions
ver+0.133
osh+0.103
 exactly+0.102
ores+0.101
int+0.101
Neg logit contributions
keeping-0.392
 heads-0.340
ping-0.324
 necks-0.324
 arrest-0.313
Token is
Token ID318
Feature activation-1.86e+00

Pos logit contributions
ours+0.431
ores+0.395
int+0.395
osh+0.393
ila+0.389
Neg logit contributions
keeping-1.024
ping-0.838
 heads-0.767
 arrest-0.744
makers-0.724
Token not
Token ID407
Feature activation+6.54e-01

Pos logit contributions
ours+0.131
ores+0.123
osh+0.122
ila+0.119
?+0.118
Neg logit contributions
keeping-0.253
ping-0.201
 heads-0.179
 arrest-0.179
cious-0.172
Token the
Token ID262
Feature activation-4.46e-01

Pos logit contributions
keeping+0.089
ping+0.071
 heads+0.064
 arrest+0.063
cious+0.060
Neg logit contributions
ours-0.046
osh-0.042
ores-0.042
ila-0.042
?-0.041
Token same
Token ID976
Feature activation+2.50e+00

Pos logit contributions
ours+0.035
osh+0.033
?+0.033
ila+0.033
ores+0.033
Neg logit contributions
keeping-0.056
ping-0.044
 arrest-0.038
 heads-0.038
cious-0.037
Token everywhere
Token ID8347
Feature activation+5.30e+00

Pos logit contributions
keeping+0.324
ping+0.256
 heads+0.227
 arrest+0.225
cious+0.216
Neg logit contributions
ours-0.190
osh-0.177
ores-0.176
ila-0.175
?-0.173
Token learned
Token ID4499
Feature activation+3.83e-01

Pos logit contributions
keeping+0.508
ping+0.396
 heads+0.348
 arrest+0.344
cious+0.328
Neg logit contributions
ours-0.346
osh-0.326
ores-0.324
ila-0.321
?-0.315
Token an
Token ID281
Feature activation-2.80e+00

Pos logit contributions
keeping+0.048
ping+0.037
 heads+0.032
 arrest+0.032
 mant+0.031
Neg logit contributions
ours-0.033
int-0.031
osh-0.030
ila-0.030
umber-0.030
Token important
Token ID1593
Feature activation-9.64e-01

Pos logit contributions
ours+0.183
osh+0.170
ores+0.167
ila+0.164
irl+0.159
Neg logit contributions
keeping-0.393
ping-0.321
 heads-0.287
 arrest-0.282
 mant-0.273
Token lesson
Token ID11483
Feature activation+5.47e-01

Pos logit contributions
ours+0.066
osh+0.062
ores+0.061
?+0.061
ila+0.060
Neg logit contributions
keeping-0.131
ping-0.106
 heads-0.094
 arrest-0.093
cious-0.089
Token.
Token ID13
Feature activation+1.23e-01

Pos logit contributions
keeping+0.073
ping+0.058
 heads+0.052
 arrest+0.051
 mant+0.049
Neg logit contributions
ours-0.039
osh-0.037
ores-0.037
ila-0.036
?-0.036
Token Sometimes
Token ID8975
Feature activation-4.96e+00

Pos logit contributions
keeping+0.016
ping+0.012
 arrest+0.011
 heads+0.011
 mant+0.010
Neg logit contributions
ours-0.009
irl-0.009
osh-0.009
ores-0.009
ila-0.009
Token,
Token ID11
Feature activation+1.55e-01

Pos logit contributions
ours+0.289
osh+0.265
ores+0.263
ila+0.259
?+0.251
Neg logit contributions
keeping-0.730
ping-0.597
 heads-0.540
 arrest-0.535
cious-0.515
Token we
Token ID356
Feature activation+1.21e+00

Pos logit contributions
keeping+0.020
ping+0.015
 heads+0.014
 arrest+0.014
 mant+0.013
Neg logit contributions
ours-0.012
ores-0.012
osh-0.012
ila-0.011
int-0.011
Token need
Token ID761
Feature activation-9.44e-01

Pos logit contributions
keeping+0.183
ping+0.151
 heads+0.140
 arrest+0.138
makers+0.133
Neg logit contributions
ours-0.065
ores-0.059
ila-0.058
osh-0.056
int-0.054
Token to
Token ID284
Feature activation+4.18e-01

Pos logit contributions
ours+0.060
?+0.058
osh+0.057
ores+0.056
 exactly+0.055
Neg logit contributions
keeping-0.131
ping-0.108
 heads-0.096
cious-0.094
 arrest-0.091
Token take
Token ID1011
Feature activation+1.71e+00

Pos logit contributions
keeping+0.056
ping+0.045
cious+0.039
edly+0.039
 heads+0.038
Neg logit contributions
?-0.029
irl-0.028
ours-0.028
osh-0.027
ig-0.027