TOP ACTIVATIONS
MAX = 0.213

pink tut u and a ti ara . She put them
crown , a spark ly ti ara , a fancy cape
. She would wear her ti ara and dress every day
with her - a magical ti ara , a golden dress
says , putting on a ti ara and a necklace .
. She put on her ti ara and danced around her
, and a spark ly ti ara .
a pink dress , a ti ara and a necklace .
loo and pretended to be Esk im os . After a
<|endoftext|> L ily loved sun fl owers . She liked to
garden . Lily is the seeker and Ben is the h
garden and admire his sun fl owers . They also see
soon nothing but a small spe ck in the sky .
for some funny little sun fl owers . Her m ama
back with lots of sun fl owers to fill the prince
stretch out and enormous sun fl owers would show their bright
da is ies , sun fl owers , and tul ips
. He wrote z ig z ags on walls , floors
of the frog . He spe ared the lazy frog and
house . Lily was the seeker and Ben was the h

MIN ACTIVATIONS
MIN = -0.150

castle in the distance beck oning him to explore .
Lily and Tom . " Grab the rope ! I will
Tom . " Grab the line , boy !"
hand . " Grab my hand , Ben .
ged and boxed and cart oned and packaged and labeled and
!" Mia shouts . " Grab your car and follow me
into the hole . " Grab the rope and I 'll
Sam . " Grab the rope !" Ben shouts
appreciate your honesty and your apology . I hope you have
sorry . I accept your apology . Next time , be
ily and Ben accepted his apology . They said they forg
appreciate your honesty and your apology . You can keep the
! You owe me an apology ! You can 't just
very grateful and accepted his apology . The z
Lily . We accept your apology . We hope you learned
his bear watched from the sidelines . Tim my was happy
cheered him on from the sidelines . One day , Tim
. When he heard their apology , John 's heart filled
to the squirrel as an apology . The squirrel forg ave
Jack noticed Sam was z oned in on the barrel .
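The TOP and MIN lists above pair each extreme activation with a short window of surrounding tokens. Judging from the examples, the strongest token (for instance ` ti` in the tiara lines) sits about five tokens from either edge of its window. Below is a minimal sketch of how such lists could be assembled, assuming you already have one activation value per token. `top_contexts` is a hypothetical helper, not the dashboard's actual code, and it does not de-duplicate overlapping windows.

```python
from typing import List, Sequence, Tuple


def top_contexts(
    tokens: Sequence[str],
    acts: Sequence[float],
    k: int = 3,
    before: int = 5,
    after: int = 5,
    largest: bool = True,
) -> List[Tuple[float, str]]:
    """Return the k most extreme activations with a window of surrounding tokens.

    largest=True gives a TOP-style list (highest first); largest=False gives a
    MIN-style list (most negative first). Overlapping windows are not merged.
    """
    # Rank token positions by activation (descending for TOP, ascending for MIN).
    order = sorted(range(len(acts)), key=lambda i: acts[i], reverse=largest)
    results = []
    for i in order[:k]:
        # Clip the window to the sequence bounds.
        start = max(0, i - before)
        stop = min(len(tokens), i + after + 1)
        results.append((acts[i], " ".join(tokens[start:stop])))
    return results


# Six consecutive tokens and activations copied from the per-token records below
# (" a", " spark", "ly", " ti", "ara", ","), shown without their leading spaces.
tokens = ["a", "spark", "ly", "ti", "ara", ","]
acts = [-1.02e-02, 6.98e-02, -6.49e-02, 2.03e-01, 3.08e-02, 2.99e-03]
print(top_contexts(tokens, acts, k=1, before=2, after=1))                 # [(0.203, 'spark ly ti ara')]
print(top_contexts(tokens, acts, k=1, before=2, after=1, largest=False))  # [(-0.0649, 'a spark ly ti')]
```

A real dashboard would run this over activations gathered from many documents. The window sizes here are only an assumption based on the examples shown.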

INTERVAL 0.122 to 0.213
CONTAINS 0.010%

wanted to go on an aer opl ane . He asked
stayed with the other sun fl owers in the big field
glowing wand and having the sill iest time of his life
bear who had the co z iest bedroom . Every night
the child wore the ancient crown with pride , and all

INTERVAL 0.032 to 0.122
CONTAINS 9.283%

bed and this time , he fell asleep straight away .
her mum . She was smiling and Jane knew that she
not to worry and that she had a useful tool that
had a red cap that she loved to wear . One
were able to enjoy the cave . <|endoftext|> Once upon a

INTERVAL -0.059 to 0.032
CONTAINS 88.604%

So Sam followed the man and soon they reached the village
they see a problem . There is one toy left that
high in the sky . One day , he went to
skipped along the sidewalk , holding hands and singing songs .
like eggs and toast . He liked cereal with milk and

INTERVAL -0.150 to -0.059
CONTAINS 2.103%

learned that it doesn 't matter what others say , as
She tried to w iggle . She tried to slide
wrong . The little girl told the old man
in the hole with no way out . As
car was red and the other car was blue . They
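The four INTERVAL rows form a coarse histogram of the feature's activations. Up to rounding, their edges match a split of [MIN, MAX] = [-0.150, 0.213] into four equal-width bins of about 0.091, and the CONTAINS shares sum to 100% (0.010 + 9.283 + 88.604 + 2.103). Below is a minimal sketch under that equal-width assumption. `interval_shares` is a hypothetical helper. The half-open bin convention, with the maximum placed in the top bin, is also an assumption; the export does not state it.

```python
from typing import List, Sequence, Tuple


def interval_shares(acts: Sequence[float], n_bins: int = 4) -> List[Tuple[float, float, float]]:
    """Split [min, max] into equal-width bins and return (left, right, percent) rows.

    Rows are ordered from the highest interval to the lowest, matching the
    order of the INTERVAL rows in the export. Assumes `acts` is non-empty.
    """
    lo, hi = min(acts), max(acts)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in acts:
        # Half-open bins [left, right); the maximum value is folded into the top bin.
        idx = min(int((a - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    rows = []
    for i in reversed(range(n_bins)):
        left, right = lo + i * width, lo + (i + 1) * width
        rows.append((left, right, 100.0 * counts[i] / len(acts)))
    return rows


# Usage with made-up activations (NOT the data behind this dashboard).
for left, right, pct in interval_shares([-0.150, -0.02, 0.0, 0.01, 0.05, 0.213]):
    print(f"{left:.3f} .. {right:.3f}: {pct:.3f}%")
```

Because most tokens sit near zero, almost all of the mass falls in the interval that straddles 0 (88.604% above). The highest interval holds only the rare, strongly activating tokens.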
Token pink
Token ID11398
Feature activation+8.43e-02

Pos logit contributions
arding+0.011
 conclusions+0.011
 tighter+0.011
 sidelines+0.011
 unnecessary+0.011
Neg logit contributions
te-0.010
 owned-0.008
 ti-0.008
 once-0.008
 lived-0.008
Token tut
Token ID9732
Feature activation+8.46e-02

Pos logit contributions
te+0.147
 owned+0.125
 once+0.123
 Sp+0.123
 lived+0.122
Neg logit contributions
 conclusions-0.108
arding-0.107
 tighter-0.107
 sidelines-0.105
 unnecessary-0.102
Tokenu
Token ID84
Feature activation+7.19e-02

Pos logit contributions
te+0.141
 Sp+0.127
 lived+0.126
 Esk+0.124
 once+0.124
Neg logit contributions
arding-0.109
ising-0.106
 sidelines-0.105
 tighter-0.102
 trigger-0.102
Token and
Token ID290
Feature activation+4.62e-03

Pos logit contributions
te+0.097
 Sp+0.085
omo+0.080
 Ske+0.079
 owned+0.078
Neg logit contributions
arding-0.114
ising-0.112
 tighter-0.109
 sidelines-0.109
 conclusions-0.107
Token a
Token ID257
Feature activation-4.56e-03

Pos logit contributions
te+0.007
 once+0.005
z+0.005
 Sp+0.005
rd+0.005
Neg logit contributions
 tighter-0.007
 sidelines-0.007
 weeds-0.007
 conclusions-0.007
arding-0.007
Token ti
Token ID46668
Feature activation+2.13e-01

Pos logit contributions
 tighter+0.007
 sidelines+0.007
 conclusions+0.007
arding+0.007
ising+0.006
Neg logit contributions
te-0.007
 Sp-0.006
 owned-0.006
 once-0.006
 lived-0.006
Tokenara
Token ID3301
Feature activation+4.27e-02

Pos logit contributions
te+0.358
 Sp+0.323
 lived+0.299
 Esk+0.296
 ruled+0.296
Neg logit contributions
arding-0.276
 tighter-0.261
ising-0.260
 conclusions-0.259
oves-0.250
Token.
Token ID13
Feature activation+1.99e-02

Pos logit contributions
te+0.061
 Sp+0.050
 Ske+0.048
 owned+0.048
omo+0.046
Neg logit contributions
 tighter-0.067
arding-0.065
 conclusions-0.063
ising-0.062
 sidelines-0.062
Token She
Token ID1375
Feature activation+2.01e-02

Pos logit contributions
te+0.028
z+0.023
gh+0.022
 ever+0.022
 owned+0.021
Neg logit contributions
 conclusions-0.033
 sidelines-0.032
arding-0.031
 tighter-0.031
 weeds-0.030
Token put
Token ID1234
Feature activation+3.19e-02

Pos logit contributions
te+0.032
z+0.026
gh+0.025
man+0.025
 Esk+0.025
Neg logit contributions
 tighter-0.028
ising-0.027
 faces-0.026
 strings-0.026
 lam-0.026
Token them
Token ID606
Feature activation+1.43e-02

Pos logit contributions
te+0.042
rd+0.033
 ruled+0.033
 lived+0.032
 owned+0.032
Neg logit contributions
 tighter-0.056
 weeds-0.053
arding-0.053
 sidelines-0.052
 conclusions-0.052
Token crown
Token ID12389
Feature activation+1.19e-01

Pos logit contributions
te+0.086
 owned+0.071
 Sp+0.071
 once+0.071
 lived+0.070
Neg logit contributions
 conclusions-0.072
arding-0.072
 tighter-0.072
 sidelines-0.071
ising-0.068
Token,
Token ID11
Feature activation-7.67e-04

Pos logit contributions
te+0.165
 Sp+0.148
 owned+0.137
 Esk+0.134
 Ske+0.132
Neg logit contributions
arding-0.182
ising-0.176
 tighter-0.174
 officers-0.169
 conclusions-0.168
Token a
Token ID257
Feature activation-1.02e-02

Pos logit contributions
 tighter+0.001
 conclusions+0.001
 weeds+0.001
arding+0.001
 clues+0.001
Neg logit contributions
te-0.001
z-0.001
 Sp-0.001
 owned-0.001
 once-0.001
Token spark
Token ID9009
Feature activation+6.98e-02

Pos logit contributions
 conclusions+0.015
 tighter+0.015
 sidelines+0.014
arding+0.014
 weeds+0.014
Neg logit contributions
te-0.016
 Sp-0.013
 once-0.013
 owned-0.013
z-0.013
Tokenly
Token ID306
Feature activation-6.49e-02

Pos logit contributions
te+0.118
 Sp+0.101
 owned+0.099
 once+0.098
 lived+0.097
Neg logit contributions
arding-0.092
 tighter-0.092
 conclusions-0.091
 sidelines-0.089
ising-0.088
Token ti
Token ID46668
Feature activation+2.03e-01

Pos logit contributions
 conclusions+0.084
 tighter+0.083
arding+0.082
 sidelines+0.081
 weeds+0.080
Neg logit contributions
te-0.110
 Sp-0.096
 owned-0.091
 once-0.090
 Ske-0.090
Tokenara
Token ID3301
Feature activation+3.08e-02

Pos logit contributions
te+0.349
 Sp+0.311
 Esk+0.300
 lived+0.294
 owned+0.291
Neg logit contributions
arding-0.253
 conclusions-0.244
ising-0.237
 tighter-0.231
oves-0.226
Token,
Token ID11
Feature activation+2.99e-03

Pos logit contributions
te+0.045
 Sp+0.037
 Ske+0.036
 Esk+0.036
 owned+0.035
Neg logit contributions
arding-0.046
 tighter-0.046
 conclusions-0.044
ising-0.044
oves-0.042
Token a
Token ID257
Feature activation-8.37e-03

Pos logit contributions
te+0.005
z+0.004
 Sp+0.004
omo+0.004
 owned+0.004
Neg logit contributions
 tighter-0.004
 conclusions-0.004
arding-0.004
 weeds-0.004
 sidelines-0.004
Token fancy
Token ID14996
Feature activation+3.82e-02

Pos logit contributions
 conclusions+0.012
 tighter+0.012
 sidelines+0.012
arding+0.012
 trigger+0.011
Neg logit contributions
te-0.013
 Sp-0.011
 owned-0.011
 once-0.011
z-0.011
Token cape
Token ID36023
Feature activation+6.54e-02

Pos logit contributions
te+0.066
 Sp+0.058
 owned+0.055
z+0.055
 Ske+0.054
Neg logit contributions
 conclusions-0.047
 sidelines-0.047
 tighter-0.047
arding-0.045
 trigger-0.044
Token.
Token ID13
Feature activation+8.78e-03

Pos logit contributions
te+0.010
 Driver+0.009
omo+0.009
 Ske+0.009
 Sp+0.009
Neg logit contributions
 tighter-0.010
ising-0.010
 conclusions-0.010
 weeds-0.009
 instructions-0.009
Token She
Token ID1375
Feature activation+1.02e-02

Pos logit contributions
te+0.012
z+0.010
gh+0.009
 ever+0.009
 owned+0.009
Neg logit contributions
 conclusions-0.014
 tighter-0.014
 sidelines-0.014
arding-0.014
 bullies-0.013
Token would
Token ID561
Feature activation+4.95e-02

Pos logit contributions
te+0.015
rd+0.012
z+0.012
gh+0.012
 seeker+0.012
Neg logit contributions
 tighter-0.016
ising-0.014
 conclusions-0.014
 lam-0.014
ck-0.014
Token wear
Token ID5806
Feature activation+1.54e-02

Pos logit contributions
te+0.075
z+0.064
 Sp+0.062
rd+0.061
 owned+0.061
Neg logit contributions
 conclusions-0.073
 tighter-0.072
 sidelines-0.071
arding-0.069
ising-0.068
Token her
Token ID607
Feature activation+2.34e-02

Pos logit contributions
te+0.026
omo+0.026
tis+0.024
 Esk+0.024
z+0.024
Neg logit contributions
 tighter-0.019
 faces-0.018
 clues-0.018
 strings-0.018
 unnecessary-0.017
Token ti
Token ID46668
Feature activation+1.95e-01

Pos logit contributions
te+0.035
Matt+0.031
tis+0.031
 Esk+0.031
tto+0.030
Neg logit contributions
 faces-0.033
 tighter-0.032
 strings-0.031
 clues-0.031
 instructions-0.030
Tokenara
Token ID3301
Feature activation+2.21e-02

Pos logit contributions
te+0.343
 Sp+0.294
 lived+0.289
 ruled+0.288
 owned+0.287
Neg logit contributions
arding-0.248
 tighter-0.242
 conclusions-0.237
ising-0.235
 sidelines-0.232
Token and
Token ID290
Feature activation-7.75e-03

Pos logit contributions
te+0.031
 Esk+0.025
omo+0.025
 Ske+0.025
 Sp+0.025
Neg logit contributions
 tighter-0.036
arding-0.034
 conclusions-0.032
 unnecessary-0.032
ising-0.032
Token dress
Token ID6576
Feature activation+6.53e-02

Pos logit contributions
 tighter+0.012
 conclusions+0.011
arding+0.011
 unnecessary+0.011
 weeds+0.011
Neg logit contributions
te-0.012
z-0.010
 once-0.010
rd-0.010
 Sp-0.010
Token every
Token ID790
Feature activation+5.58e-02

Pos logit contributions
te+0.094
 owned+0.087
omo+0.087
 Esk+0.087
 ruled+0.084
Neg logit contributions
 tighter-0.099
ly-0.092
 instructions-0.090
 faces-0.089
arding-0.089
Token day
Token ID1110
Feature activation+1.16e-02

Pos logit contributions
te+0.061
 Sp+0.048
z+0.045
 owned+0.045
 Ske+0.044
Neg logit contributions
 tighter-0.107
 conclusions-0.105
 sidelines-0.103
arding-0.102
 unnecessary-0.101
Token with
Token ID351
Feature activation+1.77e-02

Pos logit contributions
te+0.016
 Sp+0.014
 ti+0.014
 owned+0.013
omo+0.013
Neg logit contributions
arding-0.018
 tighter-0.018
 conclusions-0.018
ising-0.017
 unnecessary-0.017
Token her
Token ID607
Feature activation+1.82e-02

Pos logit contributions
te+0.026
z+0.021
 owned+0.021
 Sp+0.021
omo+0.021
Neg logit contributions
 tighter-0.027
 conclusions-0.026
arding-0.026
 sidelines-0.026
 weeds-0.026
Token -
Token ID532
Feature activation-4.02e-03

Pos logit contributions
te+0.024
 Sp+0.019
 owned+0.019
omo+0.019
z+0.019
Neg logit contributions
arding-0.030
 tighter-0.030
 conclusions-0.030
ising-0.029
 instructions-0.029
Token a
Token ID257
Feature activation-1.22e-02

Pos logit contributions
 tighter+0.006
 conclusions+0.006
 weeds+0.006
 instructions+0.006
 further+0.006
Neg logit contributions
te-0.006
z-0.005
omo-0.005
gh-0.005
 owned-0.005
Token magical
Token ID10883
Feature activation+8.66e-03

Pos logit contributions
 conclusions+0.018
 sidelines+0.018
 tighter+0.018
arding+0.018
 unnecessary+0.017
Neg logit contributions
te-0.018
 Sp-0.015
 owned-0.015
 once-0.015
z-0.015
Token ti
Token ID46668
Feature activation+1.92e-01

Pos logit contributions
te+0.014
 Sp+0.012
z+0.012
 owned+0.012
 once+0.012
Neg logit contributions
 conclusions-0.011
 sidelines-0.011
 tighter-0.011
ising-0.011
arding-0.011
Tokenara
Token ID3301
Feature activation+1.81e-02

Pos logit contributions
te+0.343
 Sp+0.308
 lived+0.298
 Ske+0.295
 Esk+0.293
Neg logit contributions
arding-0.226
 conclusions-0.224
ising-0.218
 tighter-0.211
 sidelines-0.208
Token,
Token ID11
Feature activation-1.60e-03

Pos logit contributions
te+0.023
 Sp+0.019
 Ske+0.018
omo+0.018
 owned+0.018
Neg logit contributions
arding-0.030
 tighter-0.030
 conclusions-0.029
ising-0.029
 sidelines-0.028
Token a
Token ID257
Feature activation-1.17e-02

Pos logit contributions
 tighter+0.003
 conclusions+0.003
arding+0.002
 sidelines+0.002
 weeds+0.002
Neg logit contributions
te-0.002
z-0.002
 Sp-0.002
omo-0.002
 Ske-0.002
Token golden
Token ID10861
Feature activation+6.08e-02

Pos logit contributions
 conclusions+0.017
 tighter+0.017
 sidelines+0.017
arding+0.017
 unnecessary+0.016
Neg logit contributions
te-0.018
 owned-0.015
 Sp-0.015
 once-0.015
z-0.015
Token dress
Token ID6576
Feature activation+6.22e-02

Pos logit contributions
te+0.106
 Sp+0.091
 owned+0.089
 once+0.088
z+0.088
Neg logit contributions
 tighter-0.076
 conclusions-0.076
 sidelines-0.075
arding-0.075
ising-0.072
Token says
Token ID1139
Feature activation-2.40e-02

Pos logit contributions
te+0.052
 once+0.045
 owned+0.045
 Sp+0.045
 lived+0.044
Neg logit contributions
 conclusions-0.026
arding-0.026
 sidelines-0.026
 tighter-0.025
 unnecessary-0.024
Token,
Token ID11
Feature activation-4.08e-03

Pos logit contributions
 conclusions+0.054
 sidelines+0.053
 unnecessary+0.052
 weeds+0.051
arding+0.051
Neg logit contributions
te-0.023
 once-0.017
 Sp-0.015
 ever-0.015
z-0.014
Token putting
Token ID5137
Feature activation+9.22e-03

Pos logit contributions
 conclusions+0.007
arding+0.007
 unnecessary+0.007
 tighter+0.006
 sidelines+0.006
Neg logit contributions
te-0.006
 owned-0.005
 once-0.005
 Sp-0.005
 ti-0.005
Token on
Token ID319
Feature activation+1.32e-02

Pos logit contributions
te+0.012
 owned+0.009
 lived+0.009
 Sp+0.009
 ruled+0.009
Neg logit contributions
 tighter-0.016
 conclusions-0.016
arding-0.016
 sidelines-0.016
 weeds-0.015
Token a
Token ID257
Feature activation-1.04e-02

Pos logit contributions
te+0.019
 owned+0.016
 lived+0.015
 once+0.015
 Sp+0.015
Neg logit contributions
 conclusions-0.021
 tighter-0.021
arding-0.021
 sidelines-0.021
 unnecessary-0.020
Token ti
Token ID46668
Feature activation+1.88e-01

Pos logit contributions
 tighter+0.015
 conclusions+0.015
 sidelines+0.015
arding+0.014
 weeds+0.014
Neg logit contributions
te-0.017
 Sp-0.014
 owned-0.014
 once-0.014
 lived-0.013
Tokenara
Token ID3301
Feature activation+1.38e-02

Pos logit contributions
te+0.314
 Sp+0.264
 owned+0.263
 lived+0.263
 once+0.256
Neg logit contributions
 conclusions-0.256
arding-0.253
 tighter-0.253
 sidelines-0.246
ising-0.239
Token and
Token ID290
Feature activation-5.49e-03

Pos logit contributions
te+0.020
 owned+0.016
 Sp+0.016
 lived+0.016
 ruled+0.016
Neg logit contributions
 tighter-0.021
arding-0.021
 conclusions-0.021
 sidelines-0.021
 weeds-0.020
Token a
Token ID257
Feature activation-1.04e-02

Pos logit contributions
 tighter+0.008
 conclusions+0.008
 sidelines+0.008
 weeds+0.008
arding+0.008
Neg logit contributions
te-0.008
z-0.007
 owned-0.007
 ruled-0.006
 Sp-0.006
Token necklace
Token ID38987
Feature activation+4.53e-02

Pos logit contributions
 tighter+0.014
 sidelines+0.014
 conclusions+0.014
arding+0.013
 weeds+0.013
Neg logit contributions
te-0.017
 Sp-0.014
 once-0.014
 owned-0.014
 lived-0.014
Token.
Token ID13
Feature activation+8.09e-03

Pos logit contributions
te+0.057
 Sp+0.046
 owned+0.046
 lived+0.044
 once+0.044
Neg logit contributions
 tighter-0.079
arding-0.077
 conclusions-0.076
 sidelines-0.076
ising-0.075
Token.
Token ID13
Feature activation+2.93e-03

Pos logit contributions
te+0.135
 owned+0.109
 Sp+0.109
 lived+0.106
 once+0.103
Neg logit contributions
 tighter-0.191
arding-0.189
 conclusions-0.185
 sidelines-0.183
ising-0.182
Token She
Token ID1375
Feature activation+4.63e-03

Pos logit contributions
te+0.003
 once+0.002
 owned+0.002
 Sp+0.002
 lived+0.002
Neg logit contributions
 conclusions-0.006
arding-0.006
 tighter-0.006
 sidelines-0.006
 unnecessary-0.006
Token put
Token ID1234
Feature activation+2.01e-02

Pos logit contributions
te+0.007
rd+0.006
gh+0.006
 Ske+0.006
 Esk+0.006
Neg logit contributions
 tighter-0.006
 lam-0.006
 weeds-0.006
ck-0.005
 strings-0.005
Token on
Token ID319
Feature activation+8.81e-03

Pos logit contributions
te+0.028
rd+0.025
omo+0.025
 ruled+0.023
 Esk+0.023
Neg logit contributions
 tighter-0.033
 weeds-0.031
 further-0.029
 instructions-0.029
 bread-0.029
Token her
Token ID607
Feature activation+1.66e-02

Pos logit contributions
te+0.015
 owned+0.013
 lived+0.013
omo+0.013
 ruled+0.013
Neg logit contributions
 tighter-0.011
 weeds-0.011
 conclusions-0.011
 sidelines-0.011
 instructions-0.010
Token ti
Token ID46668
Feature activation+1.88e-01

Pos logit contributions
te+0.030
 Sp+0.026
o+0.025
 Ske+0.025
 ruled+0.025
Neg logit contributions
 faces-0.020
 tighter-0.019
 weeds-0.018
 bread-0.018
 sidelines-0.018
Tokenara
Token ID3301
Feature activation+1.33e-02

Pos logit contributions
te+0.325
 Sp+0.275
 lived+0.268
 owned+0.268
 ruled+0.265
Neg logit contributions
arding-0.243
 conclusions-0.240
 tighter-0.239
 sidelines-0.232
ising-0.228
Token and
Token ID290
Feature activation-1.23e-02

Pos logit contributions
te+0.019
 owned+0.015
 Sp+0.015
 once+0.015
 lived+0.015
Neg logit contributions
 tighter-0.021
arding-0.021
 conclusions-0.021
 sidelines-0.020
 unnecessary-0.020
Token danced
Token ID39480
Feature activation+3.55e-02

Pos logit contributions
 tighter+0.018
 conclusions+0.017
 weeds+0.017
 sidelines+0.017
arding+0.017
Neg logit contributions
te-0.019
 once-0.016
z-0.016
 Sp-0.016
 owned-0.016
Token around
Token ID1088
Feature activation+1.42e-02

Pos logit contributions
omo+0.061
tto+0.055
scar+0.055
tis+0.054
 Driver+0.053
Neg logit contributions
 tighter-0.047
 further-0.046
ly-0.045
 faces-0.043
 instructions-0.042
Token her
Token ID607
Feature activation+1.79e-02

Pos logit contributions
omo+0.024
te+0.023
tto+0.022
 Driver+0.021
scar+0.021
Neg logit contributions
 conclusions-0.018
 sidelines-0.018
 faces-0.017
 tighter-0.017
 weeds-0.016
Token,
Token ID11
Feature activation-9.55e-03

Pos logit contributions
te+0.079
 Sp+0.069
 owned+0.066
 Esk+0.063
 lived+0.063
Neg logit contributions
arding-0.096
 tighter-0.095
ising-0.094
 sidelines-0.094
 officers-0.090
Token and
Token ID290
Feature activation-1.47e-02

Pos logit contributions
 tighter+0.015
arding+0.015
 conclusions+0.014
 sidelines+0.014
 weeds+0.014
Neg logit contributions
te-0.014
 Sp-0.011
z-0.011
 owned-0.011
 Ske-0.011
Token a
Token ID257
Feature activation-1.82e-02

Pos logit contributions
 tighter+0.025
 conclusions+0.024
arding+0.024
 sidelines+0.024
 weeds+0.023
Neg logit contributions
te-0.020
 owned-0.015
 Sp-0.015
 once-0.015
z-0.015
Token spark
Token ID9009
Feature activation+5.88e-02

Pos logit contributions
 conclusions+0.028
arding+0.028
 tighter+0.028
 sidelines+0.028
 unnecessary+0.027
Neg logit contributions
te-0.027
 owned-0.022
 once-0.022
 Sp-0.022
 lived-0.021
Tokenly
Token ID306
Feature activation-7.00e-02

Pos logit contributions
te+0.142
 once+0.126
 owned+0.125
 ti+0.125
z+0.123
Neg logit contributions
 conclusions-0.043
arding-0.040
 sidelines-0.040
 unnecessary-0.039
 bullies-0.037
Token ti
Token ID46668
Feature activation+1.86e-01

Pos logit contributions
 tighter+0.091
 conclusions+0.089
 sidelines+0.089
arding+0.088
 trigger+0.086
Neg logit contributions
te-0.118
 Sp-0.103
 owned-0.099
 once-0.098
 Ske-0.097
Tokenara
Token ID3301
Feature activation+1.18e-02

Pos logit contributions
te+0.334
 Sp+0.294
 lived+0.286
 ruled+0.281
 owned+0.280
Neg logit contributions
arding-0.222
 conclusions-0.216
 tighter-0.215
ising-0.205
 sidelines-0.203
Token.
Token ID13
Feature activation+1.76e-03

Pos logit contributions
te+0.016
 Sp+0.013
 owned+0.013
 Ske+0.012
 ruled+0.012
Neg logit contributions
arding-0.019
 tighter-0.019
 conclusions-0.019
 sidelines-0.018
ising-0.018
Token (space)
Token ID220
Feature activation-1.60e-02

Pos logit contributions
te+0.002
z+0.002
gh+0.002
 owned+0.002
 ti+0.002
Neg logit contributions
 sidelines-0.003
 tighter-0.003
 conclusions-0.003
arding-0.003
 weeds-0.003
Token (newline)
Token ID198
Feature activation+9.40e-03

Pos logit contributions
arding+0.027
 tighter+0.027
 conclusions+0.027
oning+0.026
 instructions+0.026
Neg logit contributions
te-0.019
 ti-0.016
 owned-0.015
 ruled-0.014
 seeker-0.014
Token (newline)
Token ID198
Feature activation+9.35e-03

Pos logit contributions
te+0.008
 owned+0.006
 ti+0.006
 Sp+0.006
 ruled+0.005
Neg logit contributions
 conclusions-0.020
arding-0.020
 tighter-0.020
 sidelines-0.019
ising-0.019
Token a
Token ID257
Feature activation-1.61e-02

Pos logit contributions
te+0.012
 owned+0.009
 Sp+0.009
 lived+0.009
 once+0.009
Neg logit contributions
 conclusions-0.012
 tighter-0.012
arding-0.012
 sidelines-0.012
 unnecessary-0.012
Token pink
Token ID11398
Feature activation+6.68e-02

Pos logit contributions
 tighter+0.025
 conclusions+0.024
 sidelines+0.024
arding+0.024
 weeds+0.023
Neg logit contributions
te-0.024
 Sp-0.020
 once-0.020
 owned-0.020
 lived-0.019
Token dress
Token ID6576
Feature activation+6.05e-02

Pos logit contributions
te+0.121
 owned+0.102
 Esk+0.102
 Sp+0.101
 once+0.099
Neg logit contributions
 tighter-0.078
 conclusions-0.077
 sidelines-0.077
 trigger-0.073
 weeds-0.072
Token,
Token ID11
Feature activation-1.15e-02

Pos logit contributions
te+0.085
 Sp+0.072
 owned+0.072
 lived+0.070
 ruled+0.068
Neg logit contributions
 tighter-0.097
arding-0.093
 conclusions-0.091
 sidelines-0.090
ising-0.090
Token a
Token ID257
Feature activation-1.72e-02

Pos logit contributions
 tighter+0.019
 conclusions+0.018
arding+0.018
 sidelines+0.018
 weeds+0.018
Neg logit contributions
te-0.017
 Sp-0.013
 owned-0.013
z-0.013
 lived-0.013
Token ti
Token ID46668
Feature activation+1.85e-01

Pos logit contributions
 tighter+0.026
 sidelines+0.026
 conclusions+0.025
arding+0.025
 weeds+0.024
Neg logit contributions
te-0.026
 Sp-0.021
 once-0.021
 owned-0.021
 lived-0.021
Tokenara
Token ID3301
Feature activation+1.48e-02

Pos logit contributions
te+0.324
 Sp+0.275
 owned+0.274
 lived+0.273
 ruled+0.273
Neg logit contributions
 conclusions-0.234
arding-0.234
 tighter-0.232
 sidelines-0.222
ising-0.218
Token and
Token ID290
Feature activation-1.16e-02

Pos logit contributions
te+0.020
 ruled+0.016
 owned+0.016
 Sp+0.016
 Ske+0.016
Neg logit contributions
 tighter-0.024
arding-0.023
 conclusions-0.022
ising-0.022
 weeds-0.022
Token a
Token ID257
Feature activation-1.57e-02

Pos logit contributions
 tighter+0.020
 weeds+0.018
 conclusions+0.018
 sidelines+0.018
 unnecessary+0.018
Neg logit contributions
te-0.016
z-0.013
rd-0.012
gh-0.012
 once-0.012
Token necklace
Token ID38987
Feature activation+3.85e-02

Pos logit contributions
 tighter+0.023
 sidelines+0.023
 conclusions+0.023
arding+0.023
 trigger+0.022
Neg logit contributions
te-0.024
 Sp-0.020
 owned-0.020
 once-0.020
 lived-0.019
Token.
Token ID13
Feature activation+4.14e-04

Pos logit contributions
te+0.051
 Sp+0.043
 owned+0.043
 lived+0.040
 ruled+0.040
Neg logit contributions
 tighter-0.064
arding-0.061
ising-0.059
 conclusions-0.059
 trigger-0.059
Tokenloo
Token ID29680
Feature activation+3.12e-02

Pos logit contributions
te+0.148
 ti+0.140
 owned+0.127
 Happy+0.123
 Sweet+0.121
Neg logit contributions
ans-0.158
rows-0.158
oves-0.155
ales-0.153
ising-0.152
Token and
Token ID290
Feature activation-1.22e-02

Pos logit contributions
te+0.041
 owned+0.034
 ruled+0.032
 Sp+0.032
 lived+0.032
Neg logit contributions
 tighter-0.053
 sidelines-0.051
arding-0.050
 conclusions-0.050
ising-0.050
Token pretended
Token ID36766
Feature activation+7.25e-02

Pos logit contributions
 tighter+0.020
 sidelines+0.020
 weeds+0.020
arding+0.019
 faces+0.019
Neg logit contributions
te-0.017
omo-0.013
rd-0.013
 ti-0.013
z-0.012
Token to
Token ID284
Feature activation+2.28e-04

Pos logit contributions
te+0.116
omo+0.098
 ti+0.097
z+0.095
 seeker+0.094
Neg logit contributions
arding-0.098
 faces-0.098
 weeds-0.098
 bullies-0.097
 tighter-0.097
Token be
Token ID307
Feature activation-2.77e-02

Pos logit contributions
te+0.000
z+0.000
tto+0.000
 owned+0.000
 ti+0.000
Neg logit contributions
 conclusions-0.000
 sidelines-0.000
 tighter-0.000
 weeds-0.000
 rules-0.000
Token Esk
Token ID47534
Feature activation+1.77e-01

Pos logit contributions
 weeds+0.036
 bullies+0.034
 clues+0.033
 goats+0.033
 unnecessary+0.032
Neg logit contributions
te-0.045
tto-0.043
z-0.041
gh-0.041
omo-0.040
Tokenim
Token ID320
Feature activation+3.62e-02

Pos logit contributions
 ti+0.247
te+0.226
 Sp+0.216
 Happy+0.209
 Sweet+0.206
Neg logit contributions
ans-0.279
arding-0.260
ales-0.258
ishment-0.251
oves-0.248
Tokenos
Token ID418
Feature activation+5.22e-02

Pos logit contributions
te+0.048
 ti+0.046
 seeker+0.041
omo+0.040
 owned+0.040
Neg logit contributions
arding-0.060
ising-0.056
 tighter-0.055
oves-0.055
 instructions-0.054
Token.
Token ID13
Feature activation+9.87e-04

Pos logit contributions
te+0.071
 owned+0.060
 ti+0.059
omo+0.058
 ruled+0.056
Neg logit contributions
 tighter-0.083
arding-0.082
 conclusions-0.082
 sidelines-0.081
ising-0.080
Token After
Token ID2293
Feature activation-6.31e-02

Pos logit contributions
te+0.001
z+0.001
 owned+0.001
 ti+0.001
 ruled+0.001
Neg logit contributions
 conclusions-0.002
arding-0.002
 sidelines-0.002
 tighter-0.002
 bullies-0.002
Token a
Token ID257
Feature activation-1.73e-02

Pos logit contributions
 sidelines+0.098
 conclusions+0.096
arding+0.093
 weeds+0.093
 tighter+0.092
Neg logit contributions
te-0.092
 owned-0.072
z-0.072
 ti-0.072
omo-0.070
Token<|endoftext|>
Token ID50256
Feature activation-3.11e-02

Pos logit contributions
te+0.032
z+0.026
gh+0.025
 ti+0.024
 Esk+0.024
Neg logit contributions
 conclusions-0.043
 sidelines-0.042
 bullies-0.042
arding-0.041
 tighter-0.041
TokenL
Token ID43
Feature activation+1.56e-02

Pos logit contributions
 bushes+0.038
 conclusions+0.033
 bullies+0.032
 further+0.031
osis+0.030
Neg logit contributions
te-0.049
 barely-0.045
 hardly-0.045
 ti-0.044
belt-0.043
Tokenily
Token ID813
Feature activation-1.46e-02

Pos logit contributions
te+0.024
 owned+0.019
 once+0.019
 Sp+0.019
 lived+0.019
Neg logit contributions
 conclusions-0.024
arding-0.024
 tighter-0.024
 sidelines-0.023
 unnecessary-0.023
Token loved
Token ID6151
Feature activation+3.00e-02

Pos logit contributions
 tighter+0.022
 conclusions+0.021
arding+0.021
 sidelines+0.020
ising+0.020
Neg logit contributions
te-0.022
 Sp-0.018
z-0.018
 Ske-0.018
 ti-0.017
Token sun
Token ID4252
Feature activation+5.08e-02

Pos logit contributions
te+0.040
 owned+0.032
 once+0.031
 Sp+0.031
 lived+0.031
Neg logit contributions
 conclusions-0.051
arding-0.050
 sidelines-0.050
 tighter-0.050
 unnecessary-0.048
Tokenfl
Token ID2704
Feature activation+1.59e-01

Pos logit contributions
te+0.080
 owned+0.068
 Sp+0.065
rd+0.065
 lived+0.065
Neg logit contributions
 tighter-0.074
arding-0.071
ising-0.070
 lam-0.070
 faces-0.070
Tokenowers
Token ID3618
Feature activation+3.36e-03

Pos logit contributions
te+0.305
 owned+0.264
 Sp+0.262
 ti+0.256
 ruled+0.255
Neg logit contributions
ising-0.167
arding-0.165
 conclusions-0.161
 sidelines-0.154
 tighter-0.154
Token.
Token ID13
Feature activation+1.89e-02

Pos logit contributions
te+0.004
 Sp+0.003
 owned+0.003
z+0.003
 once+0.003
Neg logit contributions
 tighter-0.006
arding-0.006
 conclusions-0.006
ising-0.006
 sidelines-0.006
Token She
Token ID1375
Feature activation+1.80e-02

Pos logit contributions
te+0.030
z+0.028
gh+0.028
 ti+0.026
tto+0.026
Neg logit contributions
 conclusions-0.024
weet-0.022
arding-0.021
 weeds-0.021
oning-0.021
Token liked
Token ID8288
Feature activation+4.64e-02

Pos logit contributions
te+0.020
 Ske+0.017
 Sp+0.016
 seeker+0.016
 Esk+0.016
Neg logit contributions
 lam-0.032
 weeds-0.030
 tighter-0.030
 goats-0.030
 strings-0.030
Token to
Token ID284
Feature activation+6.84e-03

Pos logit contributions
te+0.055
 owned+0.042
 Sp+0.041
 once+0.040
z+0.040
Neg logit contributions
 conclusions-0.085
arding-0.084
 tighter-0.084
 sidelines-0.083
 weeds-0.083
Token garden
Token ID11376
Feature activation+3.11e-02

Pos logit contributions
te+0.000
 owned+0.000
 once+0.000
 Sp+0.000
 lived+0.000
Neg logit contributions
 conclusions-0.000
arding-0.000
 tighter-0.000
 sidelines-0.000
 unnecessary-0.000
Token.
Token ID13
Feature activation+2.92e-02

Pos logit contributions
te+0.045
 Sp+0.036
 Ske+0.035
z+0.034
omo+0.034
Neg logit contributions
 tighter-0.048
 weeds-0.047
 conclusions-0.047
ising-0.047
 sidelines-0.047
Token Lily
Token ID20037
Feature activation-2.96e-03

Pos logit contributions
te+0.038
z+0.031
gh+0.030
 ti+0.028
 Esk+0.028
Neg logit contributions
 conclusions-0.048
 sidelines-0.048
 bullies-0.046
arding-0.046
 weeds-0.046
Token is
Token ID318
Feature activation+2.79e-02

Pos logit contributions
 tighter+0.005
ising+0.005
 conclusions+0.005
arding+0.005
 weeds+0.005
Neg logit contributions
te-0.004
 Sp-0.003
 Ske-0.003
z-0.003
gh-0.003
Token the
Token ID262
Feature activation+2.05e-02

Pos logit contributions
te+0.044
z+0.038
 Sp+0.037
rd+0.036
 lived+0.036
Neg logit contributions
 tighter-0.039
 weeds-0.037
 bullies-0.037
 sidelines-0.037
 unnecessary-0.036
Token seeker
Token ID45993
Feature activation+1.57e-01

Pos logit contributions
te+0.031
 owned+0.026
 once+0.025
 Sp+0.025
 lived+0.025
Neg logit contributions
 conclusions-0.031
arding-0.031
 tighter-0.031
 sidelines-0.030
 unnecessary-0.029
Token and
Token ID290
Feature activation-1.36e-03

Pos logit contributions
te+0.235
 Sp+0.211
 Ske+0.204
 owned+0.203
 ruled+0.202
Neg logit contributions
arding-0.220
 sidelines-0.212
 tighter-0.210
 conclusions-0.210
 weeds-0.210
Token Ben
Token ID3932
Feature activation-6.03e-03

Pos logit contributions
 tighter+0.002
 sidelines+0.002
 bullies+0.002
 weeds+0.002
 clues+0.002
Neg logit contributions
te-0.002
z-0.002
 once-0.002
omo-0.002
icer-0.001
Token is
Token ID318
Feature activation+1.97e-02

Pos logit contributions
 tighter+0.009
 weeds+0.009
ising+0.009
 agree+0.009
 conclusions+0.008
Neg logit contributions
te-0.009
omo-0.007
 Sp-0.007
z-0.007
gh-0.007
Token the
Token ID262
Feature activation+1.52e-02

Pos logit contributions
mm+0.031
te+0.031
omo+0.030
 ti+0.028
ests+0.028
Neg logit contributions
 calling-0.026
 tighter-0.025
 further-0.024
 bullies-0.024
 weeds-0.024
Token h
Token ID289
Feature activation+6.32e-02

Pos logit contributions
te+0.025
 owned+0.021
 once+0.021
 lived+0.020
 Sp+0.020
Neg logit contributions
 conclusions-0.021
 tighter-0.021
 sidelines-0.021
arding-0.021
 unnecessary-0.020
Token garden
Token ID11376
Feature activation+1.52e-02

Pos logit contributions
 tighter+0.042
 conclusions+0.041
 sidelines+0.041
arding+0.041
 unnecessary+0.040
Neg logit contributions
te-0.048
 owned-0.040
 Sp-0.039
 ruled-0.039
z-0.038
Token and
Token ID290
Feature activation+2.51e-03

Pos logit contributions
te+0.022
 Sp+0.018
 Ske+0.017
 owned+0.017
z+0.017
Neg logit contributions
 tighter-0.024
 conclusions-0.023
ising-0.023
 sidelines-0.023
 weeds-0.023
Token admire
Token ID21099
Feature activation+4.86e-02

Pos logit contributions
te+0.004
z+0.003
 Sp+0.003
 once+0.003
gh+0.003
Neg logit contributions
 tighter-0.004
 conclusions-0.004
 sidelines-0.004
 weeds-0.004
 bullies-0.004
Token his
Token ID465
Feature activation+2.42e-02

Pos logit contributions
te+0.077
z+0.068
rd+0.062
 seeker+0.060
 owned+0.060
Neg logit contributions
 conclusions-0.073
 tighter-0.069
 weeds-0.068
ising-0.065
 faces-0.065
Token sun
Token ID4252
Feature activation+4.07e-02

Pos logit contributions
te+0.039
 once+0.033
 owned+0.033
 lived+0.032
 Sp+0.032
Neg logit contributions
arding-0.035
 tighter-0.034
 conclusions-0.034
 sidelines-0.033
 unnecessary-0.032
Tokenfl
Token ID2704
Feature activation+1.57e-01

Pos logit contributions
te+0.059
 Sp+0.049
 owned+0.048
 Ske+0.046
 ti+0.046
Neg logit contributions
 tighter-0.063
arding-0.063
ising-0.062
 conclusions-0.060
 sidelines-0.059
Tokenowers
Token ID3618
Feature activation-1.73e-03

Pos logit contributions
te+0.306
 owned+0.264
 Sp+0.262
 ti+0.258
 Ske+0.255
Neg logit contributions
ising-0.172
arding-0.169
 conclusions-0.166
 sidelines-0.165
 tighter-0.158
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
arding+0.003
 conclusions+0.003
 tighter+0.003
 sidelines+0.003
ising+0.003
Neg logit contributions
te-0.002
 owned-0.002
 Sp-0.002
 once-0.002
z-0.002
Token They
Token ID1119
Feature activation-3.27e-02

Pos logit contributions
te+0.016
z+0.014
gh+0.013
 ever+0.013
 ti+0.012
Neg logit contributions
 conclusions-0.019
 sidelines-0.018
 tighter-0.018
 bullies-0.018
 goats-0.017
Token also
Token ID635
Feature activation-4.30e-03

Pos logit contributions
 conclusions+0.048
 tighter+0.047
arding+0.047
 sidelines+0.046
 weeds+0.045
Neg logit contributions
te-0.051
z-0.043
 Sp-0.042
 once-0.042
 owned-0.042
Token see
Token ID766
Feature activation+1.45e-02

Pos logit contributions
 conclusions+0.006
 tighter+0.006
 sidelines+0.006
arding+0.006
 weeds+0.006
Neg logit contributions
te-0.007
z-0.006
 Sp-0.006
 Ske-0.006
 once-0.006
Token soon
Token ID2582
Feature activation+6.81e-02

Pos logit contributions
te+0.047
 Sp+0.042
 Ske+0.040
rd+0.040
z+0.039
Neg logit contributions
 tighter-0.043
 further-0.038
 weeds-0.038
 calling-0.038
 unnecessary-0.037
Token nothing
Token ID2147
Feature activation+2.18e-02

Pos logit contributions
te+0.099
 owned+0.081
 once+0.081
 Sp+0.079
 lived+0.079
Neg logit contributions
 conclusions-0.108
arding-0.107
 tighter-0.106
 sidelines-0.105
 unnecessary-0.103
Token but
Token ID475
Feature activation+1.21e-02

Pos logit contributions
te+0.028
omo+0.023
 Sp+0.022
z+0.022
 ruled+0.021
Neg logit contributions
 tighter-0.038
 unnecessary-0.036
arding-0.036
 conclusions-0.036
 further-0.035
Token a
Token ID257
Feature activation-4.00e-03

Pos logit contributions
te+0.016
z+0.013
 Sp+0.013
 ti+0.013
 owned+0.013
Neg logit contributions
 tighter-0.021
 conclusions-0.020
arding-0.020
 unnecessary-0.019
 sidelines-0.019
Token small
Token ID1402
Feature activation-2.66e-02

Pos logit contributions
arding+0.006
 conclusions+0.006
 tighter+0.006
 sidelines+0.006
 unnecessary+0.006
Neg logit contributions
te-0.006
 owned-0.005
 once-0.005
 Sp-0.005
 lived-0.005
Token spe
Token ID693
Feature activation+1.56e-01

Pos logit contributions
arding+0.042
 conclusions+0.040
 sidelines+0.040
 unnecessary+0.040
 tighter+0.040
Neg logit contributions
te-0.040
 once-0.035
 owned-0.034
 lived-0.033
 ti-0.033
Tokenck
Token ID694
Feature activation-5.55e-03

Pos logit contributions
te+0.389
 owned+0.342
z+0.341
 once+0.338
 lived+0.333
Neg logit contributions
 sidelines-0.098
 conclusions-0.096
arding-0.094
 weeds-0.090
 bullies-0.090
Token in
Token ID287
Feature activation-7.77e-03

Pos logit contributions
 conclusions+0.009
arding+0.009
 tighter+0.009
 sidelines+0.009
 unnecessary+0.009
Neg logit contributions
te-0.008
 owned-0.006
 once-0.006
 lived-0.006
 Sp-0.006
Token the
Token ID262
Feature activation+1.50e-02

Pos logit contributions
 conclusions+0.013
 tighter+0.013
arding+0.013
 sidelines+0.013
 weeds+0.012
Neg logit contributions
te-0.011
 Sp-0.009
 owned-0.009
z-0.008
 once-0.008
Token sky
Token ID6766
Feature activation+1.25e-02

Pos logit contributions
te+0.018
 once+0.017
 ti+0.016
 owned+0.016
 lived+0.016
Neg logit contributions
arding-0.028
ares-0.026
 unnecessary-0.025
irs-0.025
ising-0.025
Token.
Token ID13
Feature activation+1.41e-02

Pos logit contributions
te+0.017
 Sp+0.014
 Ske+0.013
omo+0.013
z+0.013
Neg logit contributions
 tighter-0.020
arding-0.020
 conclusions-0.020
ising-0.020
 sidelines-0.019
Token for
Token ID329
Feature activation+5.44e-03

Pos logit contributions
te+0.021
 owned+0.017
 ruled+0.017
z+0.017
 ti+0.017
Neg logit contributions
 tighter-0.028
arding-0.027
 conclusions-0.026
 instructions-0.026
 weeds-0.025
Token some
Token ID617
Feature activation-1.90e-02

Pos logit contributions
te+0.007
 Sp+0.005
 once+0.005
 owned+0.005
 lived+0.005
Neg logit contributions
arding-0.010
 conclusions-0.010
 tighter-0.010
 sidelines-0.010
 unnecessary-0.009
Token funny
Token ID8258
Feature activation-3.59e-03

Pos logit contributions
arding+0.027
 conclusions+0.027
 sidelines+0.026
 tighter+0.026
 unnecessary+0.025
Neg logit contributions
te-0.031
 once-0.026
 owned-0.026
 Sp-0.026
 lived-0.025
Token little
Token ID1310
Feature activation-6.29e-02

Pos logit contributions
 conclusions+0.005
 tighter+0.005
 weeds+0.005
arding+0.005
 instructions+0.005
Neg logit contributions
te-0.006
 owned-0.005
 ruled-0.005
 Sp-0.005
 lived-0.005
Token sun
Token ID4252
Feature activation+4.03e-02

Pos logit contributions
 conclusions+0.094
arding+0.094
 sidelines+0.092
 tighter+0.092
 unnecessary+0.089
Neg logit contributions
te-0.096
 once-0.081
 owned-0.079
 Sp-0.079
 lived-0.077
Tokenfl
Token ID2704
Feature activation+1.56e-01

Pos logit contributions
te+0.059
 Sp+0.049
 ruled+0.049
 owned+0.049
 Ske+0.048
Neg logit contributions
 tighter-0.062
arding-0.058
ising-0.055
 officers-0.055
 instructions-0.055
Tokenowers
Token ID3618
Feature activation-1.32e-03

Pos logit contributions
te+0.285
 owned+0.247
 ruled+0.239
 Sp+0.238
 ti+0.237
Neg logit contributions
arding-0.185
ising-0.182
 tighter-0.181
 conclusions-0.179
 sidelines-0.177
Token.
Token ID13
Feature activation+1.67e-02

Pos logit contributions
 tighter+0.002
arding+0.002
 conclusions+0.002
 sidelines+0.002
ising+0.002
Neg logit contributions
te-0.002
 Sp-0.001
 owned-0.001
z-0.001
 ruled-0.001
Token Her
Token ID2332
Feature activation-3.03e-02

Pos logit contributions
te+0.021
z+0.017
 owned+0.016
 ruled+0.016
gh+0.016
Neg logit contributions
 conclusions-0.028
 tighter-0.028
 sidelines-0.028
arding-0.028
 weeds-0.027
Token m
Token ID285
Feature activation+3.83e-02

Pos logit contributions
 tighter+0.042
 instructions+0.040
ising+0.040
arding+0.040
 weeds+0.039
Neg logit contributions
te-0.050
 ruled-0.042
 owned-0.042
z-0.040
 Sp-0.039
Tokenama
Token ID1689
Feature activation-5.01e-03

Pos logit contributions
te+0.059
 owned+0.054
 ti+0.054
 ruled+0.052
 seeker+0.051
Neg logit contributions
arding-0.052
ising-0.052
oves-0.051
 sidelines-0.048
 tighter-0.048
Token back
Token ID736
Feature activation+1.45e-02

Pos logit contributions
te+0.027
 owned+0.021
 Sp+0.021
 once+0.021
 lived+0.021
Neg logit contributions
 conclusions-0.031
arding-0.031
 tighter-0.031
 sidelines-0.031
 unnecessary-0.030
Token with
Token ID351
Feature activation+2.62e-02

Pos logit contributions
te+0.021
 Sp+0.017
 ti+0.017
 lived+0.017
 owned+0.017
Neg logit contributions
 tighter-0.023
 conclusions-0.022
ising-0.021
 weeds-0.021
arding-0.021
Token lots
Token ID6041
Feature activation-4.93e-03

Pos logit contributions
te+0.032
z+0.024
 Sp+0.024
 owned+0.024
 lived+0.023
Neg logit contributions
 tighter-0.048
 conclusions-0.048
 sidelines-0.047
arding-0.047
 weeds-0.046
Token of
Token ID286
Feature activation+1.43e-02

Pos logit contributions
arding+0.008
 conclusions+0.008
 sidelines+0.008
 tighter+0.008
 unnecessary+0.008
Neg logit contributions
te-0.006
 once-0.005
 owned-0.005
 Sp-0.005
 lived-0.005
Token sun
Token ID4252
Feature activation+4.14e-02

Pos logit contributions
te+0.023
 once+0.020
 owned+0.020
 lived+0.019
 Sp+0.019
Neg logit contributions
arding-0.021
 tighter-0.020
 conclusions-0.020
 sidelines-0.020
 trigger-0.019
Tokenfl
Token ID2704
Feature activation+1.56e-01

Pos logit contributions
te+0.059
 Sp+0.048
 owned+0.048
 Ske+0.046
 ti+0.045
Neg logit contributions
 tighter-0.067
arding-0.066
ising-0.064
 weeds-0.064
 conclusions-0.063
Tokenowers
Token ID3618
Feature activation-9.96e-04

Pos logit contributions
te+0.273
 owned+0.233
 Sp+0.230
 ti+0.225
 lived+0.222
Neg logit contributions
arding-0.198
 conclusions-0.196
 tighter-0.195
 sidelines-0.192
ising-0.190
Token to
Token ID284
Feature activation+6.52e-03

Pos logit contributions
 conclusions+0.002
arding+0.002
 tighter+0.002
 sidelines+0.002
 unnecessary+0.002
Neg logit contributions
te-0.001
 owned-0.001
 Sp-0.001
 once-0.001
 lived-0.001
Token fill
Token ID6070
Feature activation+3.74e-02

Pos logit contributions
te+0.010
 owned+0.009
 once+0.009
 lived+0.009
 Sp+0.009
Neg logit contributions
arding-0.009
 tighter-0.009
 conclusions-0.009
 sidelines-0.009
 unnecessary-0.009
Token the
Token ID262
Feature activation+1.56e-02

Pos logit contributions
te+0.053
 owned+0.043
 Sp+0.043
z+0.042
 ti+0.042
Neg logit contributions
 conclusions-0.060
 tighter-0.060
 weeds-0.057
ising-0.057
arding-0.056
Token prince
Token ID19716
Feature activation+8.43e-02

Pos logit contributions
te+0.027
 owned+0.023
 Sp+0.022
 once+0.022
 lived+0.022
Neg logit contributions
 conclusions-0.021
 tighter-0.020
arding-0.020
 sidelines-0.020
 weeds-0.019
Token stretch
Token ID7539
Feature activation-6.94e-03

Pos logit contributions
te+0.102
 owned+0.086
 Sp+0.086
 once+0.085
 lived+0.084
Neg logit contributions
 conclusions-0.077
arding-0.076
 tighter-0.076
 sidelines-0.075
 unnecessary-0.072
Token out
Token ID503
Feature activation-4.86e-02

Pos logit contributions
 tighter+0.011
 conclusions+0.010
 weeds+0.010
arding+0.010
 further+0.010
Neg logit contributions
te-0.010
 owned-0.009
omo-0.009
 ruled-0.009
 Esk-0.009
Token and
Token ID290
Feature activation-2.40e-03

Pos logit contributions
 tighter+0.075
 conclusions+0.073
 weeds+0.071
arding+0.071
 sidelines+0.069
Neg logit contributions
te-0.073
 owned-0.061
 Sp-0.060
 lived-0.060
omo-0.059
Token enormous
Token ID9812
Feature activation+2.56e-02

Pos logit contributions
arding+0.003
 conclusions+0.003
 tighter+0.003
 sidelines+0.003
 unnecessary+0.003
Neg logit contributions
te-0.004
 owned-0.004
 once-0.004
 Sp-0.003
 lived-0.003
Token sun
Token ID4252
Feature activation+4.17e-02

Pos logit contributions
te+0.043
 owned+0.036
 once+0.036
 Sp+0.036
 lived+0.036
Neg logit contributions
 conclusions-0.034
arding-0.034
 tighter-0.034
 sidelines-0.033
 unnecessary-0.032
Tokenfl
Token ID2704
Feature activation+1.56e-01

Pos logit contributions
te+0.053
 owned+0.044
 Sp+0.043
 Ske+0.041
 ruled+0.041
Neg logit contributions
 tighter-0.070
arding-0.070
 weeds-0.068
ising-0.068
 conclusions-0.067
Tokenowers
Token ID3618
Feature activation-3.28e-04

Pos logit contributions
te+0.286
 owned+0.250
 Sp+0.245
 ti+0.236
 ruled+0.236
Neg logit contributions
arding-0.179
 conclusions-0.176
 tighter-0.175
ising-0.175
 sidelines-0.173
Token would
Token ID561
Feature activation+5.75e-02

Pos logit contributions
arding+0.001
 tighter+0.001
 conclusions+0.001
 sidelines+0.001
ising+0.001
Neg logit contributions
te-0.000
 Sp-0.000
 owned-0.000
 once-0.000
omo-0.000
Token show
Token ID905
Feature activation-8.06e-03

Pos logit contributions
te+0.100
 Sp+0.084
 owned+0.084
z+0.082
 once+0.082
Neg logit contributions
 conclusions-0.075
 tighter-0.073
arding-0.072
 sidelines-0.072
 weeds-0.069
Token their
Token ID511
Feature activation-8.88e-03

Pos logit contributions
 conclusions+0.011
 tighter+0.011
arding+0.011
 unnecessary+0.011
 instructions+0.011
Neg logit contributions
te-0.013
 owned-0.011
 Sp-0.011
omo-0.011
z-0.011
Token bright
Token ID6016
Feature activation+4.82e-02

Pos logit contributions
 conclusions+0.013
arding+0.013
 tighter+0.012
 sidelines+0.012
 unnecessary+0.012
Neg logit contributions
te-0.014
 owned-0.012
 Sp-0.012
 once-0.012
 lived-0.012
Token da
Token ID12379
Feature activation+9.60e-02

Pos logit contributions
 conclusions+0.008
 tighter+0.008
arding+0.008
 sidelines+0.008
 weeds+0.008
Neg logit contributions
te-0.009
 owned-0.007
 once-0.007
 Sp-0.007
z-0.007
Tokenis
Token ID271
Feature activation-1.08e-02

Pos logit contributions
te+0.222
 owned+0.196
 once+0.195
 Sp+0.194
 lived+0.193
Neg logit contributions
 conclusions-0.069
arding-0.068
 tighter-0.067
 sidelines-0.066
 unnecessary-0.062
Tokenies
Token ID444
Feature activation+2.06e-02

Pos logit contributions
 conclusions+0.007
arding+0.007
 tighter+0.007
 sidelines+0.007
 unnecessary+0.006
Neg logit contributions
te-0.026
 owned-0.023
 once-0.023
 Sp-0.023
 lived-0.023
Token,
Token ID11
Feature activation-8.20e-03

Pos logit contributions
te+0.021
 owned+0.015
 Sp+0.015
 once+0.015
z+0.014
Neg logit contributions
arding-0.041
 conclusions-0.041
 sidelines-0.040
 tighter-0.040
ising-0.039
Token sun
Token ID4252
Feature activation+4.02e-02

Pos logit contributions
 conclusions+0.012
arding+0.012
 tighter+0.012
 sidelines+0.011
 weeds+0.011
Neg logit contributions
te-0.013
 owned-0.011
 once-0.011
 Sp-0.011
z-0.011
Tokenfl
Token ID2704
Feature activation+1.54e-01

Pos logit contributions
te+0.044
 owned+0.034
 Sp+0.033
 once+0.031
 ruled+0.031
Neg logit contributions
arding-0.077
 tighter-0.076
 conclusions-0.076
 sidelines-0.074
ising-0.074
Tokenowers
Token ID3618
Feature activation-2.51e-03

Pos logit contributions
te+0.286
 owned+0.247
 Sp+0.242
 ti+0.236
 ruled+0.235
Neg logit contributions
arding-0.178
 conclusions-0.174
ising-0.172
 sidelines-0.168
 tighter-0.167
Token,
Token ID11
Feature activation-6.73e-03

Pos logit contributions
 conclusions+0.005
 tighter+0.005
 sidelines+0.005
arding+0.005
 unnecessary+0.005
Neg logit contributions
te-0.002
 once-0.002
 owned-0.002
 lived-0.002
 Sp-0.002
Token and
Token ID290
Feature activation-7.76e-03

Pos logit contributions
 conclusions+0.010
 tighter+0.010
arding+0.010
 sidelines+0.010
 weeds+0.010
Neg logit contributions
te-0.010
 owned-0.008
 Sp-0.008
 once-0.008
z-0.008
Token tul
Token ID48373
Feature activation+1.03e-01

Pos logit contributions
arding+0.012
 conclusions+0.012
 sidelines+0.012
 tighter+0.012
 unnecessary+0.012
Neg logit contributions
te-0.011
 owned-0.009
 once-0.009
 Sp-0.009
 lived-0.009
Tokenips
Token ID2419
Feature activation+3.74e-03

Pos logit contributions
te+0.177
 owned+0.150
 once+0.148
 Sp+0.147
 lived+0.146
Neg logit contributions
 conclusions-0.136
arding-0.136
 tighter-0.134
 sidelines-0.132
 unnecessary-0.128
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
ising+0.060
 tighter+0.054
 instructions+0.054
 further+0.053
 calling+0.053
Neg logit contributions
te-0.071
omo-0.070
 ti-0.066
tto-0.065
rd-0.065
Token He
Token ID679
Feature activation+2.34e-02

Pos logit contributions
te+0.029
z+0.026
gh+0.022
 ever+0.022
 ti+0.022
Neg logit contributions
 conclusions-0.035
 sidelines-0.035
 rules-0.034
 tighter-0.034
 agree-0.032
Token wrote
Token ID2630
Feature activation+3.71e-02

Pos logit contributions
te+0.031
rd+0.027
omo+0.026
 Esk+0.026
 Sp+0.025
Neg logit contributions
 tighter-0.033
 strings-0.033
 decide-0.032
 agree-0.032
ising-0.032
Token z
Token ID1976
Feature activation+6.65e-02

Pos logit contributions
te+0.052
omo+0.047
 Sp+0.045
 seeker+0.045
 Esk+0.044
Neg logit contributions
 tighter-0.057
 instructions-0.057
 clues-0.054
 faces-0.054
arding-0.054
Tokenig
Token ID328
Feature activation+3.59e-02

Pos logit contributions
te+0.135
 owned+0.127
 He+0.124
 lived+0.124
 Gor+0.123
Neg logit contributions
ising-0.070
arding-0.059
ares-0.057
oves-0.056
 unnecessary-0.053
Tokenz
Token ID89
Feature activation+1.54e-01

Pos logit contributions
te+0.047
 seeker+0.044
omo+0.043
 Esk+0.042
 owned+0.042
Neg logit contributions
 instructions-0.057
ising-0.056
arding-0.055
oves-0.054
 tighter-0.053
Tokenags
Token ID3775
Feature activation-4.55e-02

Pos logit contributions
te+0.300
 Sp+0.277
 owned+0.268
 ruled+0.260
 Ske+0.259
Neg logit contributions
ising-0.167
 tighter-0.161
arding-0.159
 sidelines-0.153
 unnecessary-0.149
Token on
Token ID319
Feature activation+2.49e-02

Pos logit contributions
ising+0.059
 tighter+0.058
 further+0.056
 instructions+0.056
 calling+0.056
Neg logit contributions
omo-0.065
te-0.063
tto-0.062
rd-0.062
 Sp-0.062
Token walls
Token ID7714
Feature activation+4.98e-02

Pos logit contributions
te+0.028
z+0.022
omo+0.021
 owned+0.021
 lived+0.021
Neg logit contributions
 conclusions-0.048
 sidelines-0.046
 tighter-0.045
 instructions-0.045
 rules-0.045
Token,
Token ID11
Feature activation+5.86e-03

Pos logit contributions
te+0.080
 Sp+0.075
 lived+0.075
 owned+0.074
omo+0.072
Neg logit contributions
ising-0.066
 tighter-0.063
 further-0.060
 instructions-0.059
 rules-0.057
Token floors
Token ID18570
Feature activation+6.09e-02

Pos logit contributions
te+0.010
tto+0.008
gh+0.008
rd+0.008
omo+0.008
Neg logit contributions
 tighter-0.008
 sidelines-0.008
 further-0.007
arding-0.007
 faces-0.007
Token of
Token ID286
Feature activation+1.27e-02

Pos logit contributions
 tighter+0.110
arding+0.106
 conclusions+0.105
 instructions+0.102
 weeds+0.102
Neg logit contributions
te-0.097
 owned-0.081
 lived-0.080
 ruled-0.080
 Sp-0.079
Token the
Token ID262
Feature activation+1.23e-02

Pos logit contributions
te+0.013
 once+0.010
 owned+0.010
 Sp+0.009
 lived+0.009
Neg logit contributions
arding-0.026
 conclusions-0.025
 tighter-0.025
 sidelines-0.025
 unnecessary-0.025
Token frog
Token ID21264
Feature activation-5.85e-03

Pos logit contributions
te+0.019
 once+0.017
 owned+0.016
 ti+0.016
 lived+0.015
Neg logit contributions
arding-0.018
 conclusions-0.017
ares-0.017
 unnecessary-0.016
 tighter-0.016
Token.
Token ID13
Feature activation+4.56e-03

Pos logit contributions
 conclusions+0.010
arding+0.010
 sidelines+0.010
 tighter+0.010
 unnecessary+0.010
Neg logit contributions
te-0.008
 once-0.006
 owned-0.006
z-0.006
 Sp-0.006
Token He
Token ID679
Feature activation+2.18e-02

Pos logit contributions
te+0.005
z+0.004
 ti+0.004
gh+0.004
 ruled+0.004
Neg logit contributions
 sidelines-0.009
 conclusions-0.008
 tighter-0.008
arding-0.008
 bullies-0.008
Token spe
Token ID693
Feature activation+1.54e-01

Pos logit contributions
te+0.030
rd+0.024
 Sp+0.023
 seeker+0.023
 ti+0.023
Neg logit contributions
 tighter-0.037
ising-0.034
arding-0.033
 weeds-0.033
 conclusions-0.033
Tokenared
Token ID1144
Feature activation-5.39e-02

Pos logit contributions
te+0.370
z+0.342
 owned+0.325
 once+0.324
omo+0.320
Neg logit contributions
 sidelines-0.095
arding-0.095
 conclusions-0.093
 bullies-0.090
 weeds-0.086
Token the
Token ID262
Feature activation+8.85e-03

Pos logit contributions
 tighter+0.087
arding+0.081
 weeds+0.081
 conclusions+0.080
 further+0.080
Neg logit contributions
te-0.080
 Sp-0.066
omo-0.065
 ti-0.064
 ruled-0.063
Token lazy
Token ID16931
Feature activation+1.09e-02

Pos logit contributions
te+0.013
 once+0.012
 owned+0.011
 ti+0.011
 lived+0.011
Neg logit contributions
arding-0.013
 conclusions-0.013
 unnecessary-0.012
 tighter-0.012
ares-0.012
Token frog
Token ID21264
Feature activation-1.14e-02

Pos logit contributions
te+0.017
 once+0.014
 owned+0.014
 Sp+0.014
 lived+0.014
Neg logit contributions
arding-0.016
 conclusions-0.016
 tighter-0.016
 unnecessary-0.015
 sidelines-0.015
Token and
Token ID290
Feature activation-1.14e-02

Pos logit contributions
 conclusions+0.018
arding+0.018
 tighter+0.018
 sidelines+0.018
 unnecessary+0.018
Neg logit contributions
te-0.016
 owned-0.013
 Sp-0.013
 once-0.013
 lived-0.013
Token house
Token ID2156
Feature activation+2.15e-02

Pos logit contributions
te+0.020
 owned+0.017
 Sp+0.017
 once+0.017
z+0.016
Neg logit contributions
 conclusions-0.016
 tighter-0.016
arding-0.016
 sidelines-0.016
 weeds-0.015
Token.
Token ID13
Feature activation+1.76e-02

Pos logit contributions
te+0.032
 Sp+0.027
 Ske+0.027
omo+0.026
z+0.025
Neg logit contributions
 tighter-0.032
ising-0.031
 instructions-0.031
 weeds-0.031
 sidelines-0.031
Token Lily
Token ID20037
Feature activation-1.30e-02

Pos logit contributions
te+0.021
z+0.017
gh+0.016
 ti+0.016
 owned+0.016
Neg logit contributions
 sidelines-0.031
 conclusions-0.031
 bullies-0.030
 tighter-0.030
arding-0.030
Token was
Token ID373
Feature activation+2.99e-02

Pos logit contributions
 tighter+0.021
ising+0.021
 bullies+0.020
 weeds+0.020
 sidelines+0.020
Neg logit contributions
te-0.017
gh-0.013
 ti-0.013
 Esk-0.013
z-0.013
Token the
Token ID262
Feature activation+1.78e-02

Pos logit contributions
te+0.049
z+0.045
omo+0.044
 ti+0.044
 Sp+0.043
Neg logit contributions
 tighter-0.038
 calling-0.037
 bullies-0.036
 sidelines-0.036
 further-0.036
Token seeker
Token ID45993
Feature activation+1.54e-01

Pos logit contributions
te+0.024
 owned+0.019
 lived+0.019
 Sp+0.019
z+0.019
Neg logit contributions
 tighter-0.030
 sidelines-0.030
 conclusions-0.030
arding-0.029
 unnecessary-0.029
Token and
Token ID290
Feature activation-2.73e-03

Pos logit contributions
te+0.228
 Sp+0.205
 Ske+0.200
 ruled+0.199
z+0.199
Neg logit contributions
arding-0.215
 sidelines-0.206
 tighter-0.204
 weeds-0.203
 bullies-0.203
Token Ben
Token ID3932
Feature activation-7.48e-03

Pos logit contributions
 tighter+0.004
 sidelines+0.004
 bullies+0.004
 weeds+0.004
 clues+0.004
Neg logit contributions
te-0.003
z-0.003
omo-0.003
 Esk-0.003
 once-0.003
Token was
Token ID373
Feature activation+2.58e-02

Pos logit contributions
 tighter+0.011
 agree+0.011
ising+0.011
 weeds+0.011
 bullies+0.010
Neg logit contributions
te-0.010
 Amen-0.009
omo-0.008
z-0.008
 Sp-0.008
Token the
Token ID262
Feature activation+1.53e-02

Pos logit contributions
omo+0.044
mm+0.044
te+0.043
 ti+0.043
ests+0.043
Neg logit contributions
 calling-0.031
 bullies-0.030
 further-0.028
 unnecessary-0.027
 tighter-0.027
Token h
Token ID289
Feature activation+6.31e-02

Pos logit contributions
te+0.024
 owned+0.020
 lived+0.020
z+0.020
 ruled+0.020
Neg logit contributions
 tighter-0.022
 conclusions-0.022
 sidelines-0.022
 unnecessary-0.021
 bullies-0.021
Token castle
Token ID16669
Feature activation+7.33e-02

Pos logit contributions
arding+0.019
 tighter+0.019
 conclusions+0.019
 unnecessary+0.018
 sidelines+0.018
Neg logit contributions
te-0.012
 owned-0.010
 ti-0.010
 Sp-0.009
 once-0.009
Token in
Token ID287
Feature activation-1.32e-02

Pos logit contributions
te+0.102
 Sp+0.085
omo+0.082
 owned+0.079
 ti+0.079
Neg logit contributions
 tighter-0.118
 sidelines-0.117
ising-0.117
arding-0.115
 conclusions-0.114
Token the
Token ID262
Feature activation+9.56e-03

Pos logit contributions
 conclusions+0.019
 clues+0.019
 tighter+0.019
 further+0.019
 weeds+0.018
Neg logit contributions
te-0.022
gh-0.019
z-0.019
omo-0.018
tto-0.018
Token distance
Token ID5253
Feature activation-4.36e-02

Pos logit contributions
te+0.017
z+0.014
 Sp+0.014
 owned+0.014
omo+0.014
Neg logit contributions
 conclusions-0.013
 sidelines-0.012
 tighter-0.012
 weeds-0.012
 further-0.012
Token beck
Token ID47673
Feature activation+3.71e-02

Pos logit contributions
 tighter+0.074
 conclusions+0.073
arding+0.072
 sidelines+0.071
ising+0.070
Neg logit contributions
te-0.059
 owned-0.048
 Sp-0.047
z-0.046
 lived-0.046
Tokenoning
Token ID12484
Feature activation-1.50e-01

Pos logit contributions
te+0.083
 once+0.071
 Sp+0.070
z+0.069
 owned+0.069
Neg logit contributions
 tighter-0.035
 unnecessary-0.034
 conclusions-0.032
 weeds-0.030
 clues-0.030
Token him
Token ID683
Feature activation+1.63e-02

Pos logit contributions
 conclusions+0.279
arding+0.278
 tighter+0.278
 sidelines+0.274
 unnecessary+0.269
Neg logit contributions
te-0.175
 owned-0.135
 once-0.132
 Sp-0.131
 lived-0.129
Token to
Token ID284
Feature activation+4.66e-03

Pos logit contributions
te+0.023
omo+0.019
 owned+0.018
 ti+0.018
tto+0.018
Neg logit contributions
 tighter-0.027
 instructions-0.026
arding-0.026
 unnecessary-0.026
 clues-0.026
Token explore
Token ID7301
Feature activation-3.80e-03

Pos logit contributions
te+0.006
 owned+0.005
 once+0.005
 Sp+0.005
 lived+0.005
Neg logit contributions
arding-0.008
 tighter-0.008
 conclusions-0.008
 sidelines-0.008
 unnecessary-0.008
Token.
Token ID13
Feature activation+9.74e-03

Pos logit contributions
 conclusions+0.006
 tighter+0.006
 further+0.006
 clues+0.006
 weeds+0.005
Neg logit contributions
te-0.006
omo-0.005
z-0.005
 owned-0.005
tto-0.005
Token
Token ID198
Feature activation+1.68e-02

Pos logit contributions
te+0.013
gh+0.010
z+0.010
 ti+0.010
 ever+0.009
Neg logit contributions
 tighter-0.017
 sidelines-0.017
 conclusions-0.016
 weeds-0.016
 instructions-0.016
Token Lily
Token ID20037
Feature activation-3.75e-02

Pos logit contributions
arding+0.014
 unnecessary+0.014
ising+0.014
 tighter+0.014
 conclusions+0.014
Neg logit contributions
te-0.007
 Sp-0.005
 once-0.005
 owned-0.005
 Ske-0.005
Token and
Token ID290
Feature activation-1.92e-02

Pos logit contributions
 conclusions+0.077
 sidelines+0.076
 unnecessary+0.075
arding+0.075
 bullies+0.073
Neg logit contributions
te-0.040
 once-0.033
z-0.030
 owned-0.029
 Sp-0.029
Token Tom
Token ID4186
Feature activation-8.65e-03

Pos logit contributions
arding+0.032
 conclusions+0.032
 unnecessary+0.030
 sidelines+0.030
ising+0.030
Neg logit contributions
te-0.026
 Sp-0.023
 once-0.023
 owned-0.022
 ti-0.022
Token.
Token ID13
Feature activation-1.26e-02

Pos logit contributions
 conclusions+0.017
 sidelines+0.017
arding+0.017
 unnecessary+0.016
 tighter+0.016
Neg logit contributions
te-0.010
 once-0.008
 owned-0.007
 Sp-0.007
z-0.007
Token "
Token ID366
Feature activation-4.09e-02

Pos logit contributions
 tighter+0.025
 sidelines+0.025
 conclusions+0.025
 bullies+0.024
 trigger+0.024
Neg logit contributions
te-0.013
z-0.010
rd-0.010
 owned-0.009
 ruled-0.009
TokenGrab
Token ID48400
Feature activation-1.47e-01

Pos logit contributions
arding+0.073
 tighter+0.069
 sidelines+0.069
 conclusions+0.069
 unnecessary+0.068
Neg logit contributions
te-0.053
 owned-0.042
 Sp-0.042
 once-0.042
z-0.041
Token the
Token ID262
Feature activation-1.78e-03

Pos logit contributions
 conclusions+0.279
arding+0.277
 sidelines+0.272
 tighter+0.270
 unnecessary+0.264
Neg logit contributions
te-0.166
 Sp-0.132
 once-0.130
 owned-0.128
z-0.127
Token rope
Token ID17182
Feature activation-4.87e-02

Pos logit contributions
arding+0.003
ishment+0.003
 conclusions+0.003
itionally+0.003
ising+0.002
Neg logit contributions
te-0.002
 no-0.002
 ti-0.002
 seeker-0.002
z-0.002
Token!
Token ID0
Feature activation-1.63e-02

Pos logit contributions
 tighter+0.086
arding+0.083
 conclusions+0.082
 sidelines+0.082
ising+0.081
Neg logit contributions
te-0.063
 owned-0.052
 lived-0.050
 ruled-0.049
 Sp-0.049
Token I
Token ID314
Feature activation-1.18e-02

Pos logit contributions
 conclusions+0.028
arding+0.028
 tighter+0.027
 sidelines+0.027
 unnecessary+0.026
Neg logit contributions
te-0.021
z-0.017
 owned-0.017
 once-0.017
 ruled-0.017
Token will
Token ID481
Feature activation+4.91e-02

Pos logit contributions
 tighter+0.017
arding+0.016
ising+0.016
 further+0.016
 conclusions+0.015
Neg logit contributions
te-0.018
 ti-0.015
 Sp-0.015
 Jacob-0.015
rd-0.014
Token Tom
Token ID4186
Feature activation-9.45e-03

Pos logit contributions
 conclusions+0.013
 tighter+0.012
arding+0.012
 sidelines+0.012
 unnecessary+0.012
Neg logit contributions
te-0.007
 owned-0.006
 once-0.006
 Sp-0.006
 lived-0.005
Token.
Token ID13
Feature activation-7.49e-03

Pos logit contributions
 conclusions+0.019
 sidelines+0.019
arding+0.019
 unnecessary+0.019
 bullies+0.018
Neg logit contributions
te-0.010
 once-0.008
z-0.007
 owned-0.007
 Sp-0.007
Token
Token ID198
Feature activation-3.25e-03

Pos logit contributions
 sidelines+0.013
 tighter+0.013
 instructions+0.013
 rules+0.013
 weeds+0.013
Neg logit contributions
te-0.010
z-0.008
rd-0.007
 ti-0.007
gh-0.007
Token
Token ID198
Feature activation-2.92e-03

Pos logit contributions
 conclusions+0.006
 tighter+0.006
arding+0.006
 instructions+0.006
 rules+0.006
Neg logit contributions
te-0.004
 ti-0.003
 owned-0.003
 ruled-0.003
 lived-0.003
Token"
Token ID1
Feature activation-4.13e-02

Pos logit contributions
 tighter+0.004
 sidelines+0.004
 conclusions+0.004
 instructions+0.004
arding+0.004
Neg logit contributions
te-0.005
 once-0.004
 ti-0.004
 owned-0.004
 lived-0.004
TokenGrab
Token ID48400
Feature activation-1.47e-01

Pos logit contributions
 conclusions+0.058
 tighter+0.058
 sidelines+0.057
arding+0.056
 bullies+0.054
Neg logit contributions
te-0.068
 owned-0.057
 once-0.057
 lived-0.056
 Sp-0.056
Token the
Token ID262
Feature activation-1.71e-03

Pos logit contributions
 conclusions+0.269
arding+0.268
 tighter+0.264
 sidelines+0.263
 unnecessary+0.256
Neg logit contributions
te-0.176
 once-0.138
 Sp-0.137
 owned-0.137
z-0.133
Token line
Token ID1627
Feature activation-2.53e-02

Pos logit contributions
 conclusions+0.003
ishment+0.003
arding+0.002
osis+0.002
irs+0.002
Neg logit contributions
man-0.002
 He-0.002
 once-0.002
 no-0.002
te-0.002
Token,
Token ID11
Feature activation-1.89e-02

Pos logit contributions
 tighter+0.046
arding+0.046
 conclusions+0.045
 sidelines+0.045
ising+0.044
Neg logit contributions
te-0.030
 owned-0.024
 Sp-0.024
 lived-0.023
 once-0.023
Token boy
Token ID2933
Feature activation-1.13e-02

Pos logit contributions
 conclusions+0.033
 tighter+0.033
arding+0.033
 sidelines+0.033
 unnecessary+0.032
Neg logit contributions
te-0.024
 owned-0.019
 once-0.019
 lived-0.019
 Sp-0.019
Token!"
Token ID2474
Feature activation-2.09e-02

Pos logit contributions
 tighter+0.018
ising+0.017
arding+0.017
 conclusions+0.016
 trigger+0.016
Neg logit contributions
te-0.016
 owned-0.013
 Sp-0.013
 ti-0.013
 once-0.013
Token hand
Token ID1021
Feature activation-6.72e-02

Pos logit contributions
 tighter+0.065
 instructions+0.062
 conclusions+0.061
 sidelines+0.060
 rules+0.059
Neg logit contributions
te-0.086
 owned-0.075
 ruled-0.075
 lived-0.074
 Sp-0.070
Token.
Token ID13
Feature activation-3.15e-03

Pos logit contributions
 tighter+0.106
arding+0.096
 instructions+0.094
 trigger+0.091
oves+0.091
Neg logit contributions
te-0.102
 owned-0.090
 lived-0.089
 ruled-0.088
 Sp-0.083
Token
Token ID198
Feature activation-8.95e-05

Pos logit contributions
 tighter+0.005
 conclusions+0.005
 sidelines+0.005
arding+0.005
 weeds+0.005
Neg logit contributions
te-0.004
 ruled-0.003
 owned-0.003
rd-0.003
z-0.003
Token
Token ID198
Feature activation+3.56e-04

Pos logit contributions
 conclusions+0.000
 tighter+0.000
arding+0.000
 sidelines+0.000
 bullies+0.000
Neg logit contributions
te-0.000
 owned-0.000
 ti-0.000
 ruled-0.000
 lived-0.000
Token"
Token ID1
Feature activation-3.58e-02

Pos logit contributions
te+0.000
 Sp+0.000
 owned+0.000
z+0.000
 lived+0.000
Neg logit contributions
arding-0.001
 conclusions-0.001
 tighter-0.001
 unnecessary-0.001
 sidelines-0.001
TokenGrab
Token ID48400
Feature activation-1.45e-01

Pos logit contributions
 conclusions+0.046
 tighter+0.046
 sidelines+0.044
 trigger+0.042
arding+0.042
Neg logit contributions
te-0.065
 owned-0.055
 ruled-0.055
 once-0.055
 lived-0.054
Token my
Token ID616
Feature activation+9.13e-04

Pos logit contributions
 conclusions+0.257
arding+0.256
 tighter+0.255
 sidelines+0.252
 unnecessary+0.246
Neg logit contributions
te-0.185
 owned-0.146
 once-0.144
 Sp-0.143
 lived-0.141
Token hand
Token ID1021
Feature activation-6.99e-02

Pos logit contributions
te+0.002
 once+0.001
 Sp+0.001
 owned+0.001
z+0.001
Neg logit contributions
 conclusions-0.001
arding-0.001
 sidelines-0.001
 tighter-0.001
ising-0.001
Token,
Token ID11
Feature activation-1.82e-02

Pos logit contributions
 tighter+0.123
arding+0.118
ising+0.115
 conclusions+0.115
 sidelines+0.112
Neg logit contributions
te-0.091
 owned-0.078
 lived-0.074
 ruled-0.074
 Sp-0.072
Token Ben
Token ID3932
Feature activation-2.55e-02

Pos logit contributions
 conclusions+0.029
 tighter+0.028
 sidelines+0.028
arding+0.028
 weeds+0.027
Neg logit contributions
te-0.027
 owned-0.022
z-0.022
 ruled-0.021
 lived-0.021
Token.
Token ID13
Feature activation-8.55e-03

Pos logit contributions
 tighter+0.042
arding+0.041
ising+0.041
 weeds+0.039
 conclusions+0.039
Neg logit contributions
te-0.036
 owned-0.028
 Sp-0.027
 ruled-0.026
 ti-0.026
Tokenged
Token ID2004
Feature activation-7.34e-02

Pos logit contributions
te+0.031
omo+0.029
 Ske+0.028
 seeker+0.028
 Sp+0.028
Neg logit contributions
 tighter-0.034
 instructions-0.031
 further-0.030
arding-0.029
ly-0.029
Token and
Token ID290
Feature activation-1.24e-02

Pos logit contributions
 tighter+0.115
arding+0.112
 conclusions+0.110
 unnecessary+0.108
 weeds+0.108
Neg logit contributions
te-0.108
 owned-0.088
 Sp-0.088
 ruled-0.088
 once-0.087
Token boxed
Token ID46069
Feature activation+1.56e-02

Pos logit contributions
 tighter+0.020
 sidelines+0.019
 weeds+0.019
arding+0.019
 conclusions+0.018
Neg logit contributions
te-0.018
 Sp-0.014
z-0.014
 owned-0.014
 once-0.014
Token and
Token ID290
Feature activation-1.23e-02

Pos logit contributions
te+0.023
omo+0.019
 owned+0.019
 ruled+0.019
 Sp+0.018
Neg logit contributions
 tighter-0.025
 instructions-0.024
 weeds-0.023
arding-0.023
ising-0.022
Token cart
Token ID6383
Feature activation+2.95e-02

Pos logit contributions
 tighter+0.018
 sidelines+0.018
arding+0.018
 conclusions+0.018
 weeds+0.017
Neg logit contributions
te-0.019
 owned-0.016
 once-0.016
 Sp-0.016
z-0.015
Tokenoned
Token ID12004
Feature activation-1.44e-01

Pos logit contributions
te+0.037
omo+0.031
 Sp+0.031
 ti+0.031
 Ske+0.030
Neg logit contributions
 tighter-0.050
arding-0.048
 instructions-0.047
ising-0.046
 sidelines-0.046
Token and
Token ID290
Feature activation-1.23e-02

Pos logit contributions
 tighter+0.228
 conclusions+0.219
arding+0.218
 weeds+0.216
 unnecessary+0.213
Neg logit contributions
te-0.212
z-0.171
 owned-0.170
 Sp-0.170
 ruled-0.169
Token packaged
Token ID25555
Feature activation+1.87e-02

Pos logit contributions
 tighter+0.018
 conclusions+0.018
arding+0.018
 sidelines+0.018
 weeds+0.017
Neg logit contributions
te-0.019
 owned-0.016
 Sp-0.016
 once-0.016
z-0.015
Token and
Token ID290
Feature activation-1.21e-02

Pos logit contributions
te+0.030
 owned+0.025
 Sp+0.025
 ruled+0.025
 lived+0.024
Neg logit contributions
 tighter-0.028
 conclusions-0.026
arding-0.026
 weeds-0.025
ising-0.025
Token labeled
Token ID15494
Feature activation+2.57e-02

Pos logit contributions
 conclusions+0.017
arding+0.017
 tighter+0.017
 sidelines+0.017
 unnecessary+0.016
Neg logit contributions
te-0.019
 owned-0.016
 once-0.016
 Sp-0.016
 lived-0.016
Token and
Token ID290
Feature activation-1.17e-02

Pos logit contributions
te+0.036
 owned+0.029
 ruled+0.029
z+0.028
 Sp+0.028
Neg logit contributions
 tighter-0.042
arding-0.041
 conclusions-0.041
 instructions-0.040
 weeds-0.040
Token!"
Token ID2474
Feature activation-2.01e-02

Pos logit contributions
 conclusions+0.111
arding+0.111
 sidelines+0.109
 tighter+0.109
 unnecessary+0.106
Neg logit contributions
te-0.071
 once-0.056
 owned-0.054
 Sp-0.054
z-0.053
Token Mia
Token ID32189
Feature activation-7.97e-03

Pos logit contributions
 conclusions+0.034
 tighter+0.033
 sidelines+0.033
 bullies+0.032
 weeds+0.032
Neg logit contributions
te-0.027
z-0.023
 owned-0.022
gh-0.021
 ti-0.021
Token shouts
Token ID34757
Feature activation-5.58e-02

Pos logit contributions
arding+0.008
 conclusions+0.008
 sidelines+0.007
 tighter+0.007
 unnecessary+0.007
Neg logit contributions
te-0.017
 once-0.015
 Sp-0.015
 lived-0.014
 owned-0.014
Token.
Token ID13
Feature activation-9.26e-03

Pos logit contributions
 sidelines+0.120
 conclusions+0.119
arding+0.119
ising+0.116
 unnecessary+0.115
Neg logit contributions
te-0.052
 once-0.041
 Sp-0.036
 He-0.034
 ever-0.033
Token "
Token ID366
Feature activation-3.77e-02

Pos logit contributions
 conclusions+0.019
arding+0.019
 sidelines+0.019
 tighter+0.019
 bullies+0.018
Neg logit contributions
te-0.009
 owned-0.007
 ti-0.006
 ruled-0.006
 lived-0.006
TokenGrab
Token ID48400
Feature activation-1.44e-01

Pos logit contributions
arding+0.063
 tighter+0.058
 conclusions+0.058
 sidelines+0.057
ishment+0.057
Neg logit contributions
te-0.052
z-0.044
 once-0.043
 Sp-0.042
 He-0.042
Token your
Token ID534
Feature activation-1.18e-02

Pos logit contributions
 conclusions+0.256
arding+0.255
 sidelines+0.247
 tighter+0.243
 unnecessary+0.238
Neg logit contributions
te-0.180
 Sp-0.147
 once-0.147
z-0.145
 owned-0.142
Token car
Token ID1097
Feature activation+3.45e-03

Pos logit contributions
arding+0.017
 tighter+0.017
 conclusions+0.017
 sidelines+0.017
 unnecessary+0.016
Neg logit contributions
te-0.019
 owned-0.016
 lived-0.015
 once-0.015
 Sp-0.015
Token and
Token ID290
Feature activation-1.88e-02

Pos logit contributions
te+0.004
 owned+0.004
 lived+0.003
 ruled+0.003
 ti+0.003
Neg logit contributions
 tighter-0.006
ising-0.006
arding-0.006
 sidelines-0.006
 trigger-0.006
Token follow
Token ID1061
Feature activation-4.28e-02

Pos logit contributions
 conclusions+0.020
arding+0.020
 tighter+0.020
 sidelines+0.019
 unnecessary+0.019
Neg logit contributions
te-0.037
 owned-0.032
 Sp-0.032
 once-0.032
 lived-0.031
Token me
Token ID502
Feature activation-1.31e-02

Pos logit contributions
arding+0.067
 conclusions+0.067
 tighter+0.066
 sidelines+0.066
 unnecessary+0.064
Neg logit contributions
te-0.063
 once-0.051
 Sp-0.051
 owned-0.051
 lived-0.050
Token into
Token ID656
Feature activation-3.14e-02

Pos logit contributions
 tighter+0.043
arding+0.041
 conclusions+0.041
 sidelines+0.040
 weeds+0.040
Neg logit contributions
te-0.035
 owned-0.028
 Sp-0.027
 lived-0.027
 ruled-0.027
Token the
Token ID262
Feature activation+5.29e-04

Pos logit contributions
arding+0.071
 unnecessary+0.069
ising+0.067
 sidelines+0.067
 conclusions+0.067
Neg logit contributions
te-0.026
 Sp-0.022
 once-0.020
 Ske-0.018
 owned-0.018
Token hole
Token ID7604
Feature activation-4.47e-02

Pos logit contributions
 lived+0.001
te+0.001
 once+0.001
 owned+0.001
 He+0.001
Neg logit contributions
arding-0.001
ares-0.001
ising-0.001
ales-0.001
oves-0.001
Token.
Token ID13
Feature activation-7.90e-03

Pos logit contributions
 tighter+0.075
arding+0.072
 conclusions+0.071
ising+0.070
 weeds+0.070
Neg logit contributions
te-0.060
 Sp-0.050
 owned-0.050
 lived-0.049
 Ske-0.048
Token "
Token ID366
Feature activation-3.74e-02

Pos logit contributions
 conclusions+0.014
 tighter+0.014
 sidelines+0.014
arding+0.014
 weeds+0.014
Neg logit contributions
te-0.010
z-0.007
 owned-0.007
rd-0.007
 ruled-0.007
TokenGrab
Token ID48400
Feature activation-1.43e-01

Pos logit contributions
arding+0.063
 conclusions+0.062
 tighter+0.062
 sidelines+0.061
 unnecessary+0.060
Neg logit contributions
te-0.051
 owned-0.041
 once-0.041
 Sp-0.041
 lived-0.040
Token the
Token ID262
Feature activation+1.16e-03

Pos logit contributions
 conclusions+0.275
arding+0.272
 sidelines+0.267
 tighter+0.265
 unnecessary+0.258
Neg logit contributions
te-0.158
 Sp-0.126
 once-0.125
 owned-0.123
z-0.123
Token rope
Token ID17182
Feature activation-4.36e-02

Pos logit contributions
 no+0.001
te+0.001
 seeker+0.001
 once+0.001
 ever+0.001
Neg logit contributions
ishment-0.002
arding-0.002
ares-0.002
itionally-0.002
 conclusions-0.002
Token and
Token ID290
Feature activation-1.66e-02

Pos logit contributions
 tighter+0.076
arding+0.073
 conclusions+0.072
ising+0.072
 sidelines+0.072
Neg logit contributions
te-0.057
 owned-0.047
 lived-0.045
 ruled-0.045
 Sp-0.044
Token I
Token ID314
Feature activation-8.56e-03

Pos logit contributions
 conclusions+0.020
 tighter+0.020
arding+0.020
 sidelines+0.020
 unnecessary+0.019
Neg logit contributions
te-0.030
 owned-0.026
 once-0.026
 Sp-0.025
 lived-0.025
Token'll
Token ID1183
Feature activation+3.48e-02

Pos logit contributions
 tighter+0.013
arding+0.013
 further+0.012
 conclusions+0.012
 weeds+0.012
Neg logit contributions
te-0.012
 ti-0.010
rd-0.010
gh-0.010
 ruled-0.010
Token Sam
Token ID3409
Feature activation-2.41e-02

Pos logit contributions
 conclusions+0.005
arding+0.005
 tighter+0.005
 sidelines+0.005
 unnecessary+0.004
Neg logit contributions
te-0.003
 owned-0.002
 once-0.002
 Sp-0.002
 lived-0.002
Token.
Token ID13
Feature activation-7.70e-03

Pos logit contributions
 sidelines+0.049
 conclusions+0.049
arding+0.048
 unnecessary+0.047
 bullies+0.046
Neg logit contributions
te-0.025
 once-0.020
 owned-0.019
 Sp-0.018
z-0.018
Token\n
Token ID198
Feature activation-4.98e-03

Pos logit contributions
 conclusions+0.013
 bullies+0.013
 tighter+0.013
 sidelines+0.013
 instructions+0.013
Neg logit contributions
te-0.010
rd-0.008
z-0.008
 owned-0.008
 ti-0.008
Token\n
Token ID198
Feature activation-4.11e-03

Pos logit contributions
 conclusions+0.009
 tighter+0.009
arding+0.009
 sidelines+0.009
 instructions+0.009
Neg logit contributions
te-0.006
 ti-0.004
 owned-0.004
 ruled-0.004
z-0.004
Token"
Token ID1
Feature activation-3.63e-02

Pos logit contributions
 tighter+0.007
 conclusions+0.006
arding+0.006
 sidelines+0.006
 weeds+0.006
Neg logit contributions
te-0.006
 once-0.005
 owned-0.005
 ti-0.005
 lived-0.005
TokenGrab
Token ID48400
Feature activation-1.40e-01

Pos logit contributions
 conclusions+0.050
 tighter+0.050
arding+0.049
 sidelines+0.049
 unnecessary+0.047
Neg logit contributions
te-0.061
 owned-0.051
 once-0.051
 Sp-0.050
 lived-0.050
Token the
Token ID262
Feature activation+1.90e-03

Pos logit contributions
 conclusions+0.268
arding+0.265
 sidelines+0.261
 tighter+0.259
 unnecessary+0.253
Neg logit contributions
te-0.155
 once-0.123
 Sp-0.123
 owned-0.120
z-0.119
Token rope
Token ID17182
Feature activation-4.25e-02

Pos logit contributions
te+0.003
 once+0.003
 no+0.002
z+0.002
 He+0.002
Neg logit contributions
arding-0.003
ishment-0.003
 conclusions-0.003
 officers-0.003
 sidelines-0.002
Token!"
Token ID2474
Feature activation-1.68e-02

Pos logit contributions
 tighter+0.076
arding+0.074
 conclusions+0.073
ising+0.072
 sidelines+0.072
Neg logit contributions
te-0.053
 owned-0.044
 lived-0.042
 Sp-0.041
 ruled-0.041
Token Ben
Token ID3932
Feature activation-2.26e-02

Pos logit contributions
 conclusions+0.030
 sidelines+0.030
 tighter+0.030
 bullies+0.029
 weeds+0.029
Neg logit contributions
te-0.022
z-0.017
 owned-0.017
 once-0.016
 lived-0.016
Token shouts
Token ID34757
Feature activation-5.08e-02

Pos logit contributions
arding+0.029
 conclusions+0.029
 sidelines+0.028
 unnecessary+0.027
 tighter+0.025
Neg logit contributions
te-0.040
 Sp-0.037
 once-0.036
 He-0.036
 Ske-0.036
Token appreciate
Token ID9144
Feature activation+1.55e-02

Pos logit contributions
 agree+0.017
 tighter+0.016
 conclusions+0.016
ising+0.016
arding+0.015
Neg logit contributions
te-0.015
z-0.014
 ti-0.013
bridge-0.013
gh-0.013
Token your
Token ID534
Feature activation-1.61e-02

Pos logit contributions
te+0.026
z+0.022
 owned+0.022
 once+0.021
 ti+0.021
Neg logit contributions
 conclusions-0.021
arding-0.020
 tighter-0.020
 sidelines-0.020
 unnecessary-0.020
Token honesty
Token ID24345
Feature activation-3.93e-02

Pos logit contributions
 conclusions+0.020
arding+0.020
 tighter+0.020
 sidelines+0.019
 unnecessary+0.019
Neg logit contributions
te-0.029
 owned-0.025
 Sp-0.024
 once-0.024
z-0.024
Token and
Token ID290
Feature activation-2.26e-02

Pos logit contributions
 tighter+0.063
 conclusions+0.061
arding+0.061
ising+0.060
 sidelines+0.059
Neg logit contributions
te-0.055
 Sp-0.046
 owned-0.046
z-0.046
 ruled-0.044
Token your
Token ID534
Feature activation-1.71e-02

Pos logit contributions
arding+0.032
 conclusions+0.032
 sidelines+0.031
 tighter+0.030
 trigger+0.030
Neg logit contributions
te-0.036
 owned-0.032
 lived-0.031
 Sp-0.031
 once-0.031
Token apology
Token ID17029
Feature activation-1.39e-01

Pos logit contributions
 conclusions+0.022
 tighter+0.021
arding+0.021
 sidelines+0.021
 unnecessary+0.021
Neg logit contributions
te-0.030
 owned-0.025
z-0.025
 Sp-0.025
 once-0.025
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
arding+0.233
 tighter+0.233
 conclusions+0.229
ising+0.222
 unnecessary+0.220
Neg logit contributions
te-0.184
 Sp-0.150
 owned-0.147
z-0.146
 ti-0.144
Token I
Token ID314
Feature activation-1.52e-02

Pos logit contributions
 conclusions+0.019
 tighter+0.019
 unnecessary+0.017
 supervision+0.017
 ask+0.017
Neg logit contributions
te-0.019
z-0.018
gh-0.017
 owned-0.015
 Esk-0.015
Token hope
Token ID2911
Feature activation+1.00e-01

Pos logit contributions
 agree+0.022
 tighter+0.022
ising+0.021
 conclusions+0.021
arding+0.020
Neg logit contributions
te-0.022
z-0.020
 Sp-0.020
gh-0.019
 ti-0.019
Token you
Token ID345
Feature activation-3.72e-02

Pos logit contributions
te+0.196
z+0.170
 owned+0.168
 ti+0.166
 Sp+0.165
Neg logit contributions
 conclusions-0.110
arding-0.109
 tighter-0.107
 sidelines-0.106
 unnecessary-0.102
Token have
Token ID423
Feature activation-5.17e-02

Pos logit contributions
 conclusions+0.059
 tighter+0.059
arding+0.059
ising+0.056
 unnecessary+0.056
Neg logit contributions
te-0.051
z-0.043
 Sp-0.043
omo-0.042
 Ske-0.042
Token sorry
Token ID7926
Feature activation-4.07e-02

Pos logit contributions
arding+0.060
 sidelines+0.059
 conclusions+0.059
 tighter+0.058
 bullies+0.056
Neg logit contributions
te-0.074
 owned-0.063
z-0.062
 once-0.062
 lived-0.061
Token.
Token ID13
Feature activation-1.06e-02

Pos logit contributions
 conclusions+0.083
 sidelines+0.083
arding+0.083
 tighter+0.082
 unnecessary+0.080
Neg logit contributions
te-0.041
 once-0.030
 Sp-0.029
 owned-0.029
 lived-0.028
Token I
Token ID314
Feature activation-1.40e-02

Pos logit contributions
 conclusions+0.016
arding+0.016
 tighter+0.016
 sidelines+0.016
 bullies+0.015
Neg logit contributions
te-0.015
z-0.014
gh-0.014
 once-0.013
 owned-0.012
Token accept
Token ID2453
Feature activation-5.49e-02

Pos logit contributions
 agree+0.021
 tighter+0.021
 conclusions+0.020
ising+0.020
 damage+0.019
Neg logit contributions
te-0.018
z-0.017
 ti-0.017
 Sp-0.016
 Jo-0.016
Token your
Token ID534
Feature activation-1.47e-02

Pos logit contributions
 conclusions+0.073
 tighter+0.073
arding+0.072
 sidelines+0.071
 bullies+0.069
Neg logit contributions
te-0.093
 owned-0.079
z-0.078
 lived-0.077
 once-0.077
Token apology
Token ID17029
Feature activation-1.38e-01

Pos logit contributions
arding+0.018
 conclusions+0.018
 tighter+0.018
 sidelines+0.018
 unnecessary+0.017
Neg logit contributions
te-0.026
 once-0.022
 owned-0.022
 ti-0.022
 Sp-0.022
Token.
Token ID13
Feature activation-1.18e-02

Pos logit contributions
 tighter+0.227
arding+0.225
 conclusions+0.218
ising+0.213
 instructions+0.211
Neg logit contributions
te-0.186
 Sp-0.155
 owned-0.153
z-0.153
 Ske-0.149
Token Next
Token ID7406
Feature activation-5.05e-02

Pos logit contributions
 conclusions+0.017
 tighter+0.017
 bullies+0.016
 sidelines+0.016
 ask+0.016
Neg logit contributions
z-0.018
te-0.017
gh-0.016
 once-0.015
 owned-0.014
Token time
Token ID640
Feature activation-5.98e-02

Pos logit contributions
 trigger+0.123
arding+0.122
 sidelines+0.121
 conclusions+0.121
 tighter+0.117
Neg logit contributions
te-0.033
 once-0.027
 ever-0.024
 owned-0.023
 no-0.021
Token,
Token ID11
Feature activation-1.59e-02

Pos logit contributions
arding+0.127
 tighter+0.126
 conclusions+0.125
 sidelines+0.124
 unnecessary+0.122
Neg logit contributions
te-0.055
 owned-0.038
 Sp-0.037
 once-0.037
z-0.036
Token be
Token ID307
Feature activation-3.54e-02

Pos logit contributions
 conclusions+0.021
arding+0.021
 tighter+0.021
 sidelines+0.021
 unnecessary+0.020
Neg logit contributions
te-0.027
 owned-0.023
 once-0.023
 Sp-0.023
 lived-0.022
Tokenily
Token ID813
Feature activation-3.87e-02

Pos logit contributions
 conclusions+0.028
arding+0.027
 tighter+0.027
 sidelines+0.026
ising+0.026
Neg logit contributions
te-0.026
 owned-0.021
 Sp-0.021
 once-0.021
 ti-0.021
Token and
Token ID290
Feature activation-1.91e-02

Pos logit contributions
 tighter+0.067
arding+0.063
 conclusions+0.063
ising+0.063
 sidelines+0.062
Neg logit contributions
te-0.051
 Sp-0.041
 owned-0.040
z-0.039
 ti-0.039
Token Ben
Token ID3932
Feature activation-2.05e-02

Pos logit contributions
 tighter+0.032
 conclusions+0.031
arding+0.031
 sidelines+0.031
 bullies+0.030
Neg logit contributions
te-0.027
 owned-0.021
z-0.021
 once-0.021
 ti-0.021
Token accepted
Token ID6292
Feature activation-1.28e-02

Pos logit contributions
 conclusions+0.043
 sidelines+0.042
arding+0.042
 unnecessary+0.042
 tighter+0.040
Neg logit contributions
te-0.021
 lived-0.017
 owned-0.016
 once-0.016
 ti-0.015
Token his
Token ID465
Feature activation+7.62e-03

Pos logit contributions
arding+0.023
 conclusions+0.022
 sidelines+0.022
 tighter+0.022
 unnecessary+0.022
Neg logit contributions
te-0.017
 once-0.013
 Sp-0.013
 owned-0.013
 lived-0.013
Token apology
Token ID17029
Feature activation-1.37e-01

Pos logit contributions
te+0.016
 once+0.014
 owned+0.014
 Sp+0.013
 lived+0.013
Neg logit contributions
arding-0.008
 conclusions-0.007
 tighter-0.007
 sidelines-0.007
 unnecessary-0.007
Token.
Token ID13
Feature activation-5.30e-03

Pos logit contributions
 sidelines+0.296
 conclusions+0.295
arding+0.286
 unnecessary+0.283
 weeds+0.280
Neg logit contributions
te-0.132
 once-0.109
 ever-0.093
 owned-0.087
 lived-0.087
Token They
Token ID1119
Feature activation-4.59e-02

Pos logit contributions
 conclusions+0.009
arding+0.009
 tighter+0.009
 sidelines+0.008
 unnecessary+0.008
Neg logit contributions
te-0.008
 owned-0.006
 once-0.006
 Sp-0.006
 lived-0.006
Token said
Token ID531
Feature activation-2.16e-02

Pos logit contributions
 conclusions+0.075
arding+0.074
 sidelines+0.073
 tighter+0.073
 unnecessary+0.072
Neg logit contributions
te-0.066
 owned-0.054
 lived-0.053
 once-0.053
 Sp-0.052
Token they
Token ID484
Feature activation-1.64e-02

Pos logit contributions
arding+0.025
 faces+0.025
 tighter+0.025
 bullies+0.024
 conclusions+0.023
Neg logit contributions
te-0.039
z-0.036
 owned-0.034
rd-0.034
tto-0.033
Token forg
Token ID5784
Feature activation+1.96e-02

Pos logit contributions
 tighter+0.028
arding+0.027
 agree+0.026
 conclusions+0.026
 officers+0.026
Neg logit contributions
te-0.020
z-0.018
 Sp-0.017
 Ske-0.017
rd-0.016
Token appreciate
Token ID9144
Feature activation+1.83e-02

Pos logit contributions
 agree+0.017
 tighter+0.015
ising+0.015
 decide+0.015
 conclusions+0.014
Neg logit contributions
te-0.015
 Sp-0.014
z-0.014
 Tues-0.014
 Jacob-0.014
Token your
Token ID534
Feature activation-1.27e-02

Pos logit contributions
te+0.028
z+0.023
 owned+0.023
 once+0.023
 Sp+0.023
Neg logit contributions
 conclusions-0.028
 tighter-0.027
arding-0.027
 sidelines-0.027
 unnecessary-0.026
Token honesty
Token ID24345
Feature activation-3.74e-02

Pos logit contributions
 conclusions+0.016
arding+0.016
 tighter+0.016
 sidelines+0.016
 unnecessary+0.015
Neg logit contributions
te-0.023
 owned-0.019
 once-0.019
 Sp-0.019
 lived-0.019
Token and
Token ID290
Feature activation-2.08e-02

Pos logit contributions
 tighter+0.060
arding+0.057
 conclusions+0.057
ising+0.057
 instructions+0.056
Neg logit contributions
te-0.052
 Sp-0.044
 owned-0.043
z-0.043
 ruled-0.042
Token your
Token ID534
Feature activation-1.32e-02

Pos logit contributions
 conclusions+0.028
arding+0.028
 sidelines+0.027
 tighter+0.027
 unnecessary+0.026
Neg logit contributions
te-0.035
 owned-0.030
 Sp-0.030
 lived-0.030
 once-0.030
Token apology
Token ID17029
Feature activation-1.36e-01

Pos logit contributions
 conclusions+0.017
 tighter+0.017
arding+0.017
 sidelines+0.016
 unnecessary+0.016
Neg logit contributions
te-0.023
 owned-0.020
 Sp-0.020
 once-0.020
z-0.019
Token.
Token ID13
Feature activation-1.17e-02

Pos logit contributions
 tighter+0.223
arding+0.223
 conclusions+0.216
ising+0.212
 instructions+0.210
Neg logit contributions
te-0.183
 Sp-0.151
z-0.148
 owned-0.147
 Ske-0.143
Token You
Token ID921
Feature activation-3.31e-02

Pos logit contributions
 tighter+0.015
 lam+0.015
 conclusions+0.015
 ask+0.015
 supervision+0.014
Neg logit contributions
te-0.018
z-0.018
gh-0.018
castle-0.015
cast-0.015
Token can
Token ID460
Feature activation+1.08e-02

Pos logit contributions
arding+0.054
 conclusions+0.053
 tighter+0.053
 sidelines+0.052
 unnecessary+0.051
Neg logit contributions
te-0.047
 owned-0.038
 once-0.037
 Sp-0.037
z-0.037
Token keep
Token ID1394
Feature activation-5.27e-03

Pos logit contributions
te+0.017
 owned+0.014
 once+0.014
 Sp+0.014
 lived+0.014
Neg logit contributions
 conclusions-0.016
arding-0.016
 tighter-0.016
 sidelines-0.016
 unnecessary-0.015
Token the
Token ID262
Feature activation-3.49e-03

Pos logit contributions
 conclusions+0.009
arding+0.009
 tighter+0.009
 sidelines+0.009
 trigger+0.009
Neg logit contributions
te-0.007
 once-0.006
 owned-0.005
 Sp-0.005
 ever-0.005
Token!
Token ID0
Feature activation-1.24e-02

Pos logit contributions
te+0.003
 ti+0.002
 Sp+0.002
z+0.002
 owned+0.002
Neg logit contributions
 tighter-0.003
arding-0.003
 conclusions-0.003
ising-0.003
 further-0.003
Token You
Token ID921
Feature activation-2.91e-02

Pos logit contributions
arding+0.018
 conclusions+0.018
 bullies+0.017
 sidelines+0.016
 unnecessary+0.016
Neg logit contributions
te-0.019
z-0.017
 ti-0.017
gh-0.016
 once-0.015
Token owe
Token ID19059
Feature activation-7.75e-02

Pos logit contributions
arding+0.046
 conclusions+0.045
 tighter+0.045
 bullies+0.044
 sidelines+0.044
Neg logit contributions
te-0.042
z-0.034
 Sp-0.034
 ti-0.034
 once-0.033
Token me
Token ID502
Feature activation-1.22e-02

Pos logit contributions
arding+0.113
 tighter+0.112
 conclusions+0.112
 sidelines+0.111
ising+0.111
Neg logit contributions
te-0.117
 once-0.108
 Sp-0.107
 no-0.101
 ever-0.100
Token an
Token ID281
Feature activation+7.89e-03

Pos logit contributions
 conclusions+0.021
arding+0.020
 tighter+0.020
 sidelines+0.020
 unnecessary+0.020
Neg logit contributions
te-0.016
 owned-0.013
 once-0.013
 Sp-0.013
 lived-0.013
Token apology
Token ID17029
Feature activation-1.35e-01

Pos logit contributions
te+0.013
 once+0.012
 owned+0.011
 lived+0.011
 ti+0.011
Neg logit contributions
arding-0.011
 conclusions-0.011
 tighter-0.010
 sidelines-0.010
 unnecessary-0.010
Token!
Token ID0
Feature activation-1.51e-02

Pos logit contributions
 tighter+0.233
arding+0.233
 conclusions+0.231
 sidelines+0.225
ising+0.224
Neg logit contributions
te-0.168
 Sp-0.139
 owned-0.137
z-0.132
 lived-0.131
Token You
Token ID921
Feature activation-3.09e-02

Pos logit contributions
arding+0.023
 bullies+0.023
 conclusions+0.022
 officers+0.021
 sidelines+0.021
Neg logit contributions
te-0.021
z-0.019
 ti-0.018
gh-0.018
 once-0.017
Token can
Token ID460
Feature activation+1.02e-02

Pos logit contributions
arding+0.050
 tighter+0.048
 conclusions+0.048
 bullies+0.047
ising+0.047
Neg logit contributions
te-0.043
z-0.036
 Sp-0.035
 ti-0.035
 once-0.034
Token't
Token ID470
Feature activation-4.95e-03

Pos logit contributions
te+0.015
 owned+0.012
 Sp+0.012
 once+0.012
z+0.012
Neg logit contributions
 conclusions-0.016
arding-0.016
 sidelines-0.016
 tighter-0.016
ising-0.015
Token just
Token ID655
Feature activation+2.38e-02

Pos logit contributions
 tighter+0.007
arding+0.007
 sidelines+0.007
 conclusions+0.007
 unnecessary+0.007
Neg logit contributions
te-0.008
 once-0.007
 owned-0.007
 lived-0.007
 ever-0.007
Token very
Token ID845
Feature activation-1.23e-02

Pos logit contributions
 tighter+0.007
 sidelines+0.007
 bullies+0.007
arding+0.007
 unnecessary+0.007
Neg logit contributions
te-0.007
 Sp-0.006
 ti-0.006
z-0.006
 Ske-0.006
Token grateful
Token ID14066
Feature activation+1.22e-02

Pos logit contributions
 conclusions+0.019
arding+0.019
 tighter+0.019
 sidelines+0.019
 unnecessary+0.018
Neg logit contributions
te-0.019
 owned-0.015
 once-0.015
 lived-0.015
 Sp-0.015
Token and
Token ID290
Feature activation-1.74e-02

Pos logit contributions
te+0.019
 owned+0.016
z+0.016
 Sp+0.015
 ti+0.015
Neg logit contributions
 tighter-0.019
ising-0.018
arding-0.018
 conclusions-0.018
 instructions-0.017
Token accepted
Token ID6292
Feature activation-1.28e-02

Pos logit contributions
 tighter+0.028
 conclusions+0.028
arding+0.028
 sidelines+0.027
 unnecessary+0.026
Neg logit contributions
te-0.025
 owned-0.020
 once-0.020
 Sp-0.020
z-0.020
Token his
Token ID465
Feature activation+6.79e-03

Pos logit contributions
 conclusions+0.022
arding+0.022
 tighter+0.022
 sidelines+0.022
 unnecessary+0.021
Neg logit contributions
te-0.017
 owned-0.013
 once-0.013
 Sp-0.013
 lived-0.013
Token apology
Token ID17029
Feature activation-1.35e-01

Pos logit contributions
te+0.012
 once+0.010
 owned+0.010
 Sp+0.010
 lived+0.010
Neg logit contributions
arding-0.009
 conclusions-0.009
 tighter-0.009
 sidelines-0.009
 unnecessary-0.008
Token.
Token ID13
Feature activation-3.78e-03

Pos logit contributions
 conclusions+0.241
arding+0.241
 tighter+0.241
 sidelines+0.236
 unnecessary+0.231
Neg logit contributions
te-0.168
 owned-0.132
 Sp-0.130
 once-0.129
 lived-0.127
Token\n
Token ID198
Feature activation+4.10e-03

Pos logit contributions
 tighter+0.007
 conclusions+0.007
 sidelines+0.007
arding+0.007
 bullies+0.007
Neg logit contributions
te-0.004
 ti-0.004
z-0.003
gh-0.003
 owned-0.003
Token\n
Token ID198
Feature activation+3.81e-03

Pos logit contributions
te+0.005
 ti+0.004
 owned+0.003
 seeker+0.003
 ruled+0.003
Neg logit contributions
 conclusions-0.008
 sidelines-0.007
 tighter-0.007
arding-0.007
 bullies-0.007
TokenThe
Token ID464
Feature activation-2.73e-03

Pos logit contributions
te+0.006
 owned+0.005
 once+0.005
 Sp+0.005
 ti+0.005
Neg logit contributions
 conclusions-0.005
 tighter-0.005
 sidelines-0.005
arding-0.005
ising-0.005
Token z
Token ID1976
Feature activation+4.63e-02

Pos logit contributions
 conclusions+0.004
arding+0.004
 tighter+0.004
 sidelines+0.004
 unnecessary+0.003
Neg logit contributions
te-0.005
 owned-0.004
 once-0.004
 Sp-0.004
 lived-0.004
Token Lily
Token ID20037
Feature activation-2.74e-02

Pos logit contributions
arding+0.027
 conclusions+0.027
 tighter+0.027
 sidelines+0.027
 bullies+0.026
Neg logit contributions
te-0.024
z-0.019
 owned-0.019
 ruled-0.019
 lived-0.019
Token.
Token ID13
Feature activation-9.66e-03

Pos logit contributions
 tighter+0.037
arding+0.036
ising+0.035
 instructions+0.032
 calling+0.032
Neg logit contributions
te-0.045
 ti-0.039
 Ske-0.039
 owned-0.038
 ruled-0.038
Token We
Token ID775
Feature activation-2.03e-02

Pos logit contributions
arding+0.013
 conclusions+0.013
 trigger+0.011
 supervision+0.011
 tighter+0.011
Neg logit contributions
te-0.016
z-0.015
gh-0.015
 ruled-0.014
cast-0.014
Token accept
Token ID2453
Feature activation-5.12e-02

Pos logit contributions
arding+0.034
 conclusions+0.034
 tighter+0.033
 sidelines+0.032
 unnecessary+0.032
Neg logit contributions
te-0.028
z-0.022
 owned-0.022
 Sp-0.022
 ti-0.022
Token your
Token ID534
Feature activation-1.01e-02

Pos logit contributions
 conclusions+0.086
arding+0.086
 tighter+0.085
 sidelines+0.084
 unnecessary+0.082
Neg logit contributions
te-0.070
 owned-0.056
 once-0.055
 Sp-0.055
 lived-0.054
Token apology
Token ID17029
Feature activation-1.35e-01

Pos logit contributions
arding+0.014
 conclusions+0.013
 tighter+0.013
 sidelines+0.013
 unnecessary+0.013
Neg logit contributions
te-0.017
 once-0.015
 owned-0.014
 Sp-0.014
 lived-0.014
Token.
Token ID13
Feature activation-1.43e-02

Pos logit contributions
arding+0.226
 tighter+0.225
 conclusions+0.216
ising+0.211
 unnecessary+0.210
Neg logit contributions
te-0.177
 owned-0.147
 Sp-0.146
z-0.145
 Ske-0.143
Token We
Token ID775
Feature activation-2.24e-02

Pos logit contributions
arding+0.020
 conclusions+0.020
 tighter+0.019
 trigger+0.018
 unnecessary+0.018
Neg logit contributions
te-0.022
z-0.021
gh-0.020
 ruled-0.019
 owned-0.018
Token hope
Token ID2911
Feature activation+1.06e-01

Pos logit contributions
 conclusions+0.039
arding+0.039
 tighter+0.038
 sidelines+0.038
 unnecessary+0.037
Neg logit contributions
te-0.029
 owned-0.023
 Sp-0.023
 once-0.023
z-0.022
Token you
Token ID345
Feature activation-3.09e-02

Pos logit contributions
te+0.179
 owned+0.150
 once+0.148
 Sp+0.148
 lived+0.147
Neg logit contributions
 conclusions-0.143
arding-0.142
 tighter-0.141
 sidelines-0.139
 unnecessary-0.135
Token learned
Token ID4499
Feature activation-4.04e-02

Pos logit contributions
 conclusions+0.048
 tighter+0.048
arding+0.048
 unnecessary+0.045
ising+0.045
Neg logit contributions
te-0.044
 Sp-0.037
z-0.037
 Ske-0.036
omo-0.036
Token his
Token ID465
Feature activation+6.15e-03

Pos logit contributions
 tighter+0.024
 weeds+0.023
 sidelines+0.022
 bullies+0.022
arding+0.022
Neg logit contributions
te-0.019
z-0.015
 ti-0.015
rd-0.015
omo-0.014
Token bear
Token ID6842
Feature activation+1.51e-02

Pos logit contributions
te+0.010
 ruled+0.008
 Sp+0.008
z+0.008
o+0.008
Neg logit contributions
 tighter-0.009
 sidelines-0.009
 faces-0.009
 bullies-0.009
 instructions-0.009
Token watched
Token ID7342
Feature activation+3.20e-02

Pos logit contributions
te+0.019
 ti+0.015
omo+0.015
 Sp+0.014
 Ske+0.014
Neg logit contributions
 tighter-0.025
ising-0.025
arding-0.024
 faces-0.024
 instructions-0.024
Token from
Token ID422
Feature activation+1.70e-02

Pos logit contributions
te+0.048
 owned+0.040
 Sp+0.039
 lived+0.039
 ti+0.039
Neg logit contributions
arding-0.048
 tighter-0.048
 conclusions-0.048
 sidelines-0.047
ising-0.046
Token the
Token ID262
Feature activation-2.65e-04

Pos logit contributions
te+0.022
 owned+0.018
 once+0.017
 Sp+0.017
 lived+0.017
Neg logit contributions
 conclusions-0.029
 tighter-0.029
arding-0.029
 sidelines-0.029
 unnecessary-0.028
Token sidelines
Token ID29394
Feature activation-1.35e-01

Pos logit contributions
arding+0.000
 tighter+0.000
 conclusions+0.000
ising+0.000
 unnecessary+0.000
Neg logit contributions
te-0.000
 ti-0.000
 once-0.000
 owned-0.000
 lived-0.000
Token.
Token ID13
Feature activation-1.07e-03

Pos logit contributions
 conclusions+0.232
 tighter+0.231
 sidelines+0.228
arding+0.228
ising+0.221
Neg logit contributions
te-0.172
 owned-0.138
 Sp-0.135
 ti-0.132
 once-0.131
Token Tim
Token ID5045
Feature activation-1.32e-02

Pos logit contributions
 sidelines+0.002
 bullies+0.002
arding+0.002
 weeds+0.002
 faces+0.002
Neg logit contributions
te-0.002
 ti-0.001
z-0.001
gh-0.001
 seeker-0.001
Tokenmy
Token ID1820
Feature activation-6.74e-03

Pos logit contributions
 tighter+0.018
arding+0.017
ising+0.016
 sidelines+0.015
 conclusions+0.015
Neg logit contributions
te-0.022
 ti-0.019
 Sp-0.019
 ruled-0.018
 Ske-0.018
Token was
Token ID373
Feature activation+1.67e-02

Pos logit contributions
 tighter+0.011
ising+0.010
arding+0.010
 faces+0.009
 bullies+0.009
Neg logit contributions
te-0.009
 Sp-0.008
 Ske-0.008
 ti-0.008
 Esk-0.008
Token happy
Token ID3772
Feature activation+8.42e-03

Pos logit contributions
te+0.031
ests+0.030
 Sp+0.030
z+0.029
 ti+0.029
Neg logit contributions
 tighter-0.017
 calling-0.016
 sidelines-0.015
 bullies-0.014
 further-0.014
Token cheered
Token ID34767
Feature activation+1.04e-02

Pos logit contributions
te+0.069
 owned+0.057
 once+0.056
 Sp+0.056
 lived+0.056
Neg logit contributions
 conclusions-0.071
arding-0.070
 tighter-0.070
 sidelines-0.069
 unnecessary-0.067
Token him
Token ID683
Feature activation+8.63e-03

Pos logit contributions
te+0.016
 ti+0.014
 owned+0.014
 lived+0.014
 ruled+0.013
Neg logit contributions
arding-0.015
 tighter-0.015
ising-0.015
 bullies-0.014
 instructions-0.014
Token on
Token ID319
Feature activation+6.72e-03

Pos logit contributions
te+0.010
 Sp+0.009
 owned+0.009
omo+0.009
 ti+0.008
Neg logit contributions
 tighter-0.015
arding-0.015
 unnecessary-0.014
 instructions-0.014
 conclusions-0.014
Token from
Token ID422
Feature activation+1.73e-02

Pos logit contributions
te+0.007
 once+0.005
 owned+0.005
 Sp+0.005
 lived+0.005
Neg logit contributions
arding-0.013
 tighter-0.013
 conclusions-0.013
 sidelines-0.013
 unnecessary-0.013
Token the
Token ID262
Feature activation+1.24e-03

Pos logit contributions
te+0.016
 once+0.014
 owned+0.014
 Sp+0.014
 ever+0.012
Neg logit contributions
arding-0.036
 tighter-0.035
 conclusions-0.034
ising-0.034
oves-0.034
Token sidelines
Token ID29394
Feature activation-1.34e-01

Pos logit contributions
te+0.002
 once+0.002
 ti+0.002
 owned+0.001
 He+0.001
Neg logit contributions
arding-0.002
oves-0.002
 conclusions-0.002
 unnecessary-0.002
 tighter-0.002
Token.
Token ID13
Feature activation+3.82e-03

Pos logit contributions
 conclusions+0.221
 tighter+0.220
arding+0.216
 sidelines+0.214
ising+0.211
Neg logit contributions
te-0.177
 owned-0.147
 ti-0.143
 Sp-0.143
z-0.142
Token One
Token ID1881
Feature activation-1.69e-03

Pos logit contributions
te+0.005
 ti+0.005
z+0.004
gh+0.004
 ever+0.004
Neg logit contributions
 conclusions-0.006
 sidelines-0.006
arding-0.006
 bullies-0.006
oning-0.006
Token day
Token ID1110
Feature activation+1.73e-03

Pos logit contributions
 tighter+0.003
 conclusions+0.003
arding+0.003
 sidelines+0.003
ising+0.003
Neg logit contributions
te-0.002
 Sp-0.001
 ti-0.001
z-0.001
 owned-0.001
Token,
Token ID11
Feature activation-6.33e-03

Pos logit contributions
te+0.002
 ti+0.002
 ruled+0.002
 owned+0.001
z+0.001
Neg logit contributions
arding-0.003
 tighter-0.003
 weeds-0.003
 conclusions-0.003
 sidelines-0.003
Token Tim
Token ID5045
Feature activation-1.49e-02

Pos logit contributions
 tighter+0.013
 sidelines+0.013
 conclusions+0.013
arding+0.013
 weeds+0.013
Neg logit contributions
te-0.006
 ti-0.004
z-0.004
 ruled-0.004
 owned-0.004
Token.
Token ID13
Feature activation+1.95e-03

Pos logit contributions
te+0.016
 ti+0.013
 Sp+0.013
 owned+0.013
z+0.013
Neg logit contributions
 tighter-0.018
 conclusions-0.017
arding-0.017
ising-0.017
 unnecessary-0.016
Token When
Token ID1649
Feature activation-4.49e-02

Pos logit contributions
te+0.002
 ti+0.002
z+0.002
 seeker+0.002
 owned+0.002
Neg logit contributions
 sidelines-0.004
 conclusions-0.003
 tighter-0.003
arding-0.003
 instructions-0.003
Token he
Token ID339
Feature activation+3.80e-02

Pos logit contributions
 tighter+0.061
 sidelines+0.060
 conclusions+0.059
 instructions+0.059
 bullies+0.059
Neg logit contributions
te-0.072
z-0.064
 ti-0.063
 Sp-0.060
 owned-0.059
Token heard
Token ID2982
Feature activation-8.03e-03

Pos logit contributions
te+0.051
z+0.042
 Sp+0.042
 ti+0.041
rd+0.039
Neg logit contributions
 tighter-0.061
arding-0.059
 bullies-0.058
 sidelines-0.057
 conclusions-0.056
Token their
Token ID511
Feature activation-2.16e-02

Pos logit contributions
 conclusions+0.014
 sidelines+0.014
arding+0.014
 tighter+0.014
 bullies+0.014
Neg logit contributions
te-0.009
z-0.008
 ti-0.007
 Sp-0.007
 owned-0.007
Token apology
Token ID17029
Feature activation-1.34e-01

Pos logit contributions
 conclusions+0.035
arding+0.035
 tighter+0.034
 sidelines+0.033
 unnecessary+0.032
Neg logit contributions
te-0.032
 once-0.028
 Sp-0.026
 ever-0.025
 Ske-0.025
Token,
Token ID11
Feature activation-1.01e-02

Pos logit contributions
arding+0.263
 tighter+0.261
 conclusions+0.261
 sidelines+0.258
 unnecessary+0.252
Neg logit contributions
te-0.143
 owned-0.107
 Sp-0.106
 once-0.104
 lived-0.102
Token John
Token ID1757
Feature activation-5.22e-03

Pos logit contributions
 tighter+0.018
 bullies+0.018
 conclusions+0.017
 sidelines+0.017
 weeds+0.017
Neg logit contributions
te-0.012
z-0.010
 ti-0.010
 Sp-0.009
omo-0.009
Token's
Token ID338
Feature activation-4.83e-02

Pos logit contributions
 tighter+0.008
arding+0.008
 conclusions+0.008
 sidelines+0.008
ising+0.008
Neg logit contributions
te-0.007
 Sp-0.006
 owned-0.006
 once-0.006
 ti-0.006
Token heart
Token ID2612
Feature activation-2.18e-02

Pos logit contributions
 tighter+0.065
 conclusions+0.065
 sidelines+0.064
arding+0.063
 bullies+0.063
Neg logit contributions
te-0.081
 owned-0.068
z-0.068
 Sp-0.068
 ruled-0.067
Token filled
Token ID5901
Feature activation+6.54e-02

Pos logit contributions
 tighter+0.036
arding+0.035
ising+0.034
 sidelines+0.033
 conclusions+0.033
Neg logit contributions
te-0.030
 Sp-0.025
 Ske-0.025
 owned-0.024
 once-0.024
Token to
Token ID284
Feature activation-4.05e-03

Pos logit contributions
te+0.013
 owned+0.010
 ruled+0.010
 Sp+0.010
 lived+0.010
Neg logit contributions
 tighter-0.015
 conclusions-0.014
arding-0.014
ising-0.014
 weeds-0.014
Token the
Token ID262
Feature activation+1.80e-03

Pos logit contributions
 conclusions+0.008
arding+0.008
 tighter+0.008
 sidelines+0.008
 unnecessary+0.008
Neg logit contributions
te-0.004
 owned-0.003
 once-0.003
 Sp-0.003
 lived-0.003
Token squirrel
Token ID33039
Feature activation-1.23e-02

Pos logit contributions
te+0.003
 once+0.003
 Ske+0.002
 lived+0.002
 owned+0.002
Neg logit contributions
arding-0.003
 tighter-0.003
oves-0.003
ares-0.003
 conclusions-0.003
Token as
Token ID355
Feature activation-3.96e-02

Pos logit contributions
 conclusions+0.024
arding+0.024
 sidelines+0.024
 unnecessary+0.024
 tighter+0.023
Neg logit contributions
te-0.014
 once-0.011
 owned-0.010
 Sp-0.010
 lived-0.010
Token an
Token ID281
Feature activation+9.60e-03

Pos logit contributions
 tighter+0.071
 conclusions+0.069
arding+0.069
 unnecessary+0.069
 bullies+0.067
Neg logit contributions
te-0.048
z-0.038
 owned-0.036
 ti-0.036
 Sp-0.036
Token apology
Token ID17029
Feature activation-1.34e-01

Pos logit contributions
te+0.018
 once+0.016
 ever+0.015
 owned+0.015
 Ske+0.015
Neg logit contributions
arding-0.012
 sidelines-0.011
 conclusions-0.011
 unnecessary-0.011
oves-0.011
Token.
Token ID13
Feature activation-5.89e-04

Pos logit contributions
arding+0.228
 tighter+0.227
 conclusions+0.225
 sidelines+0.219
ising+0.216
Neg logit contributions
te-0.175
 Sp-0.141
 owned-0.137
z-0.137
 ti-0.133
Token The
Token ID383
Feature activation-6.84e-03

Pos logit contributions
 tighter+0.001
 sidelines+0.001
 conclusions+0.001
arding+0.001
 bullies+0.001
Neg logit contributions
te-0.001
 ti-0.000
z-0.000
 owned-0.000
 ruled-0.000
Token squirrel
Token ID33039
Feature activation-1.28e-02

Pos logit contributions
arding+0.010
 conclusions+0.010
 tighter+0.009
 unnecessary+0.009
 sidelines+0.009
Neg logit contributions
te-0.011
 once-0.010
 owned-0.009
 Ske-0.009
 lived-0.009
Token forg
Token ID5784
Feature activation+2.31e-02

Pos logit contributions
 conclusions+0.024
arding+0.024
 tighter+0.024
 sidelines+0.024
 unnecessary+0.023
Neg logit contributions
te-0.015
 owned-0.011
 once-0.011
 Sp-0.011
 lived-0.011
Tokenave
Token ID1015
Feature activation-1.67e-02

Pos logit contributions
te+0.030
 owned+0.024
 lived+0.024
 once+0.023
 ruled+0.023
Neg logit contributions
 conclusions-0.042
 sidelines-0.041
arding-0.040
 tighter-0.040
 weeds-0.040
Token Jack
Token ID3619
Feature activation-2.89e-02

Pos logit contributions
te+0.039
 ti+0.035
 Sp+0.033
 owned+0.033
 Ske+0.033
Neg logit contributions
arding-0.036
 tighter-0.034
ising-0.034
oves-0.033
 sidelines-0.033
Token noticed
Token ID6810
Feature activation+2.11e-02

Pos logit contributions
 conclusions+0.047
arding+0.045
 sidelines+0.045
 unnecessary+0.045
 tighter+0.045
Neg logit contributions
te-0.043
 owned-0.036
 once-0.035
 lived-0.035
 Sp-0.035
Token Sam
Token ID3409
Feature activation-1.55e-02

Pos logit contributions
te+0.030
 owned+0.024
z+0.023
 once+0.023
 ti+0.023
Neg logit contributions
 conclusions-0.035
 tighter-0.034
arding-0.034
 sidelines-0.034
ising-0.033
Token was
Token ID373
Feature activation+2.22e-02

Pos logit contributions
 conclusions+0.029
 sidelines+0.028
 unnecessary+0.028
arding+0.028
 tighter+0.027
Neg logit contributions
te-0.019
 once-0.016
 owned-0.015
 lived-0.015
 Sp-0.014
Token z
Token ID1976
Feature activation+5.81e-02

Pos logit contributions
te+0.033
z+0.028
 Sp+0.027
 lived+0.026
 ti+0.026
Neg logit contributions
 tighter-0.034
arding-0.033
 conclusions-0.033
 bullies-0.032
 sidelines-0.032
Tokenoned
Token ID12004
Feature activation-1.34e-01

Pos logit contributions
te+0.118
 lived+0.110
 owned+0.107
 Sp+0.105
 ruled+0.104
Neg logit contributions
arding-0.059
ising-0.057
 tighter-0.056
 sidelines-0.054
 conclusions-0.054
Token in
Token ID287
Feature activation-1.42e-02

Pos logit contributions
 tighter+0.228
arding+0.223
 conclusions+0.222
 sidelines+0.217
ising+0.217
Neg logit contributions
te-0.180
 owned-0.142
 lived-0.140
 Sp-0.139
 ti-0.139
Token on
Token ID319
Feature activation+1.45e-02

Pos logit contributions
 tighter+0.025
 conclusions+0.025
 weeds+0.024
 clues+0.024
 sidelines+0.024
Neg logit contributions
te-0.019
z-0.015
 lived-0.015
 Sp-0.014
 ruled-0.014
Token the
Token ID262
Feature activation+1.02e-02

Pos logit contributions
te+0.020
z+0.017
omo+0.016
 lived+0.016
 ti+0.015
Neg logit contributions
 conclusions-0.025
 tighter-0.024
 weeds-0.024
 clues-0.023
 sidelines-0.023
Token barrel
Token ID9036
Feature activation-1.86e-03

Pos logit contributions
te+0.019
 once+0.017
 owned+0.016
 lived+0.016
 Sp+0.016
Neg logit contributions
arding-0.012
 conclusions-0.012
 tighter-0.011
ising-0.011
 unnecessary-0.011
Token.
Token ID13
Feature activation+7.71e-03

Pos logit contributions
 unnecessary+0.004
 conclusions+0.004
arding+0.004
 sidelines+0.004
 bullies+0.004
Neg logit contributions
te-0.002
 once-0.002
 owned-0.001
 He-0.001
 ever-0.001
Token wanted
Token ID2227
Feature activation+9.29e-03

Pos logit contributions
 tighter+0.051
 agree+0.051
 faces+0.050
ly+0.050
 conclusions+0.049
Neg logit contributions
te-0.034
 Sp-0.030
z-0.029
 once-0.029
 Jo-0.029
Token to
Token ID284
Feature activation+4.00e-03

Pos logit contributions
te+0.010
 owned+0.007
 once+0.007
 Sp+0.007
 lived+0.007
Neg logit contributions
 conclusions-0.019
arding-0.019
 tighter-0.019
 sidelines-0.018
 unnecessary-0.018
Token go
Token ID467
Feature activation-1.68e-02

Pos logit contributions
te+0.007
 owned+0.006
 once+0.006
 lived+0.006
 Sp+0.006
Neg logit contributions
arding-0.005
 tighter-0.005
 conclusions-0.005
 unnecessary-0.005
 sidelines-0.005
Token on
Token ID319
Feature activation+1.47e-02

Pos logit contributions
 conclusions+0.034
arding+0.034
 tighter+0.033
 sidelines+0.033
 unnecessary+0.032
Neg logit contributions
te-0.017
 once-0.013
 owned-0.013
 Sp-0.013
 lived-0.012
Token an
Token ID281
Feature activation+2.18e-02

Pos logit contributions
te+0.023
z+0.018
man+0.018
 lived+0.017
 owned+0.017
Neg logit contributions
 unnecessary-0.023
 conclusions-0.022
 clues-0.022
 bullies-0.022
 tighter-0.022
Token aer
Token ID9551
Feature activation+1.23e-01

Pos logit contributions
te+0.023
 once+0.019
 owned+0.018
 lived+0.017
 ruled+0.017
Neg logit contributions
arding-0.045
 conclusions-0.043
 trigger-0.042
 unnecessary-0.042
 tighter-0.041
Tokenopl
Token ID20106
Feature activation+5.24e-02

Pos logit contributions
te+0.128
 owned+0.099
 Sp+0.098
 lived+0.096
 ti+0.095
Neg logit contributions
 sidelines-0.242
 conclusions-0.241
 tighter-0.237
arding-0.236
ising-0.233
Tokenane
Token ID1531
Feature activation+5.39e-02

Pos logit contributions
te+0.070
 Sp+0.067
 owned+0.065
 ti+0.064
 lived+0.064
Neg logit contributions
arding-0.078
ising-0.076
 sidelines-0.074
ares-0.073
 conclusions-0.072
Token.
Token ID13
Feature activation+1.41e-02

Pos logit contributions
te+0.063
 once+0.050
 owned+0.049
 Sp+0.047
 lived+0.047
Neg logit contributions
arding-0.102
 conclusions-0.102
 tighter-0.100
 sidelines-0.100
 unnecessary-0.098
Token He
Token ID679
Feature activation+1.32e-02

Pos logit contributions
te+0.016
z+0.012
 ruled+0.011
 ti+0.011
 owned+0.010
Neg logit contributions
 tighter-0.028
 sidelines-0.028
 conclusions-0.027
arding-0.027
 instructions-0.027
Token asked
Token ID1965
Feature activation-1.52e-02

Pos logit contributions
te+0.020
 Sp+0.017
 once+0.016
rd+0.016
 Ske+0.016
Neg logit contributions
 tighter-0.020
 lam-0.018
 weeds-0.017
arding-0.017
 officers-0.017
Token stayed
Token ID9658
Feature activation+1.95e-02

Pos logit contributions
 tighter+0.030
ising+0.028
 weeds+0.027
 conclusions+0.027
arding+0.027
Neg logit contributions
te-0.032
 Sp-0.027
z-0.026
gh-0.026
omo-0.026
Token with
Token ID351
Feature activation+1.83e-02

Pos logit contributions
te+0.031
 Sp+0.029
 Ske+0.028
omo+0.026
 Jacob+0.026
Neg logit contributions
 tighter-0.029
 further-0.028
ising-0.027
 weeds-0.026
 unnecessary-0.025
Token the
Token ID262
Feature activation+9.19e-03

Pos logit contributions
te+0.025
 owned+0.020
 Sp+0.020
z+0.020
 ti+0.019
Neg logit contributions
 conclusions-0.031
 tighter-0.031
arding-0.030
 sidelines-0.030
 unnecessary-0.029
Token other
Token ID584
Feature activation-7.86e-02

Pos logit contributions
te+0.013
 once+0.011
 owned+0.010
 lived+0.010
 Sp+0.010
Neg logit contributions
arding-0.015
 conclusions-0.015
 tighter-0.015
 sidelines-0.015
 unnecessary-0.015
Token sun
Token ID4252
Feature activation+2.81e-02

Pos logit contributions
 conclusions+0.107
arding+0.107
 tighter+0.106
 sidelines+0.105
 unnecessary+0.102
Neg logit contributions
te-0.131
 owned-0.111
 once-0.111
 Sp-0.109
 lived-0.109
Tokenfl
Token ID2704
Feature activation+1.33e-01

Pos logit contributions
te+0.042
 Sp+0.037
 Ske+0.035
 owned+0.035
 ti+0.035
Neg logit contributions
 tighter-0.042
arding-0.041
ising-0.040
 weeds-0.039
oves-0.038
Tokenowers
Token ID3618
Feature activation-1.33e-02

Pos logit contributions
te+0.253
 Sp+0.217
 owned+0.217
 ti+0.215
 ruled+0.208
Neg logit contributions
arding-0.151
ising-0.146
 conclusions-0.146
 sidelines-0.145
 weeds-0.142
Token in
Token ID287
Feature activation-1.48e-02

Pos logit contributions
 tighter+0.023
arding+0.023
 conclusions+0.023
 sidelines+0.023
ising+0.022
Neg logit contributions
te-0.017
 owned-0.013
 Sp-0.013
 once-0.013
 lived-0.013
Token the
Token ID262
Feature activation+9.28e-03

Pos logit contributions
 conclusions+0.022
 tighter+0.021
 weeds+0.021
 sidelines+0.020
 further+0.020
Neg logit contributions
te-0.023
z-0.020
 Sp-0.019
omo-0.019
 Ske-0.019
Token big
Token ID1263
Feature activation+1.01e-02

Pos logit contributions
te+0.014
 once+0.012
 owned+0.012
 lived+0.012
 ruled+0.011
Neg logit contributions
arding-0.015
 conclusions-0.014
 tighter-0.014
 unnecessary-0.013
ising-0.013
Token field
Token ID2214
Feature activation-7.59e-03

Pos logit contributions
te+0.015
 owned+0.012
 Sp+0.012
 once+0.012
 lived+0.012
Neg logit contributions
 conclusions-0.016
 tighter-0.016
arding-0.016
 sidelines-0.015
 unnecessary-0.015
Token glowing
Token ID21377
Feature activation+2.57e-02

Pos logit contributions
te+0.030
 owned+0.026
 once+0.026
 lived+0.025
 Sp+0.025
Neg logit contributions
 conclusions-0.024
arding-0.023
 sidelines-0.023
 tighter-0.023
 unnecessary-0.022
Token wand
Token ID11569
Feature activation-8.45e-03

Pos logit contributions
te+0.049
 Sp+0.042
 owned+0.042
 once+0.041
 lived+0.041
Neg logit contributions
 conclusions-0.029
arding-0.028
 tighter-0.028
 sidelines-0.028
ising-0.028
Token and
Token ID290
Feature activation-3.29e-03

Pos logit contributions
 tighter+0.014
arding+0.014
ising+0.013
 conclusions+0.013
 sidelines+0.013
Neg logit contributions
te-0.011
 Sp-0.009
 owned-0.009
 lived-0.009
omo-0.009
Token having
Token ID1719
Feature activation-1.83e-03

Pos logit contributions
 tighter+0.005
 sidelines+0.005
 weeds+0.005
 conclusions+0.005
 further+0.005
Neg logit contributions
te-0.005
z-0.004
omo-0.004
rd-0.004
gh-0.004
Token the
Token ID262
Feature activation+1.39e-02

Pos logit contributions
 conclusions+0.002
 unnecessary+0.002
 weeds+0.002
 clues+0.002
 tighter+0.002
Neg logit contributions
te-0.003
z-0.003
omo-0.002
man-0.002
 Driver-0.002
Token sill
Token ID49276
Feature activation+1.49e-01

Pos logit contributions
te+0.017
 owned+0.013
 Sp+0.013
z+0.012
 lived+0.012
Neg logit contributions
 conclusions-0.026
 tighter-0.026
arding-0.025
 sidelines-0.025
 unnecessary-0.025
Tokeniest
Token ID6386
Feature activation+1.12e-01

Pos logit contributions
 Sp+0.188
 Ske+0.172
te+0.165
 seeker+0.155
 Esk+0.153
Neg logit contributions
ly-0.237
arding-0.235
 conclusions-0.230
 tighter-0.228
ising-0.223
Token time
Token ID640
Feature activation-4.14e-02

Pos logit contributions
te+0.177
 Sp+0.148
 owned+0.146
z+0.144
 lived+0.142
Neg logit contributions
 conclusions-0.165
 tighter-0.161
 sidelines-0.159
arding-0.159
ising-0.155
Token of
Token ID286
Feature activation+1.69e-02

Pos logit contributions
arding+0.069
 conclusions+0.067
 tighter+0.066
 unnecessary+0.066
 weeds+0.065
Neg logit contributions
te-0.057
omo-0.044
z-0.044
 seeker-0.043
 ti-0.042
Token his
Token ID465
Feature activation+1.92e-02

Pos logit contributions
te+0.024
 owned+0.019
z+0.019
 Sp+0.019
 once+0.018
Neg logit contributions
 conclusions-0.028
 tighter-0.028
arding-0.027
 sidelines-0.027
 unnecessary-0.027
Token life
Token ID1204
Feature activation+7.46e-03

Pos logit contributions
te+0.030
 owned+0.025
z+0.024
 Sp+0.024
 ruled+0.024
Neg logit contributions
 conclusions-0.028
 tighter-0.027
 sidelines-0.027
arding-0.027
 unnecessary-0.027
Token bear
Token ID6842
Feature activation+2.90e-02

Pos logit contributions
te+0.011
 owned+0.009
 Sp+0.009
 lived+0.009
 once+0.009
Neg logit contributions
 conclusions-0.009
arding-0.009
 unnecessary-0.009
 tighter-0.009
 sidelines-0.009
Token who
Token ID508
Feature activation+2.76e-02

Pos logit contributions
te+0.042
 ti+0.034
z+0.034
 once+0.034
omo+0.033
Neg logit contributions
 conclusions-0.042
 tighter-0.041
arding-0.040
ising-0.040
 bullies-0.040
Token had
Token ID550
Feature activation+1.56e-02

Pos logit contributions
te+0.033
z+0.028
 Jacob+0.028
 Matt+0.027
 Jo+0.027
Neg logit contributions
 tighter-0.044
 ask-0.043
 trash-0.042
 waste-0.041
 goats-0.041
Token the
Token ID262
Feature activation+1.19e-02

Pos logit contributions
te+0.023
z+0.018
omo+0.017
 Sp+0.017
 owned+0.016
Neg logit contributions
 tighter-0.024
 weeds-0.024
 bullies-0.024
 unnecessary-0.024
 goats-0.023
Token co
Token ID763
Feature activation+3.46e-02

Pos logit contributions
te+0.015
 owned+0.012
 once+0.012
 Sp+0.012
 lived+0.012
Neg logit contributions
 conclusions-0.021
arding-0.021
 tighter-0.021
 sidelines-0.020
 unnecessary-0.020
Tokenz
Token ID89
Feature activation+1.48e-01

Pos logit contributions
te+0.038
 owned+0.035
 lived+0.033
 once+0.033
 ruled+0.032
Neg logit contributions
arding-0.063
 tighter-0.061
oves-0.061
ising-0.060
 conclusions-0.058
Tokeniest
Token ID6386
Feature activation+1.14e-01

Pos logit contributions
te+0.243
 Sp+0.225
 owned+0.216
 lived+0.211
 ti+0.210
Neg logit contributions
 tighter-0.192
arding-0.191
ising-0.188
 trigger-0.184
 conclusions-0.180
Token bedroom
Token ID14043
Feature activation+2.20e-02

Pos logit contributions
te+0.176
 owned+0.146
 Sp+0.144
 lived+0.141
 once+0.141
Neg logit contributions
 conclusions-0.168
 tighter-0.168
arding-0.164
 sidelines-0.164
ising-0.161
Token.
Token ID13
Feature activation+2.46e-02

Pos logit contributions
te+0.031
omo+0.027
 Sp+0.026
 Ske+0.025
tto+0.025
Neg logit contributions
 tighter-0.033
 instructions-0.031
 sidelines-0.031
arding-0.031
ising-0.031
Token Every
Token ID3887
Feature activation+3.47e-02

Pos logit contributions
te+0.030
z+0.024
gh+0.023
 ever+0.023
rd+0.020
Neg logit contributions
 sidelines-0.045
 conclusions-0.044
 tighter-0.044
 weeds-0.043
 bullies-0.042
Token night
Token ID1755
Feature activation+4.33e-02

Pos logit contributions
te+0.043
 Sp+0.034
 ti+0.033
z+0.033
 owned+0.032
Neg logit contributions
 tighter-0.062
 conclusions-0.059
 sidelines-0.057
 further-0.057
 weeds-0.056
Token the
Token ID262
Feature activation+2.07e-02

Pos logit contributions
te+0.005
z+0.004
gh+0.004
 owned+0.004
rd+0.004
Neg logit contributions
 tighter-0.005
 conclusions-0.005
 weeds-0.004
 sidelines-0.004
 bullies-0.004
Token child
Token ID1200
Feature activation+1.86e-02

Pos logit contributions
te+0.034
 owned+0.029
 once+0.028
 Sp+0.028
 lived+0.028
Neg logit contributions
 conclusions-0.028
arding-0.028
 tighter-0.028
 sidelines-0.028
 unnecessary-0.027
Token wore
Token ID12408
Feature activation+6.19e-02

Pos logit contributions
te+0.024
 Sp+0.021
 Esk+0.020
 seeker+0.020
 Ske+0.020
Neg logit contributions
 tighter-0.029
ising-0.029
arding-0.028
 officers-0.027
 sidelines-0.027
Token the
Token ID262
Feature activation+2.10e-02

Pos logit contributions
te+0.089
omo+0.075
 Sp+0.073
 Esk+0.073
o+0.071
Neg logit contributions
 tighter-0.100
 weeds-0.094
 strings-0.093
 further-0.092
 unnecessary-0.092
Token ancient
Token ID6156
Feature activation+2.17e-02

Pos logit contributions
te+0.035
 Sp+0.029
 owned+0.029
 once+0.029
 lived+0.029
Neg logit contributions
 tighter-0.028
 conclusions-0.028
 sidelines-0.028
arding-0.028
 unnecessary-0.027
Token crown
Token ID12389
Feature activation+1.25e-01

Pos logit contributions
te+0.037
 Sp+0.031
z+0.031
 owned+0.031
 once+0.031
Neg logit contributions
 conclusions-0.028
 tighter-0.028
arding-0.028
 sidelines-0.027
 weeds-0.026
Token with
Token ID351
Feature activation+2.56e-02

Pos logit contributions
te+0.162
 Sp+0.141
 owned+0.133
 Esk+0.132
omo+0.131
Neg logit contributions
 tighter-0.212
arding-0.201
ising-0.195
 conclusions-0.194
 sidelines-0.193
Token pride
Token ID11293
Feature activation+1.97e-02

Pos logit contributions
te+0.036
 owned+0.029
 Sp+0.028
z+0.028
 once+0.028
Neg logit contributions
 conclusions-0.041
 tighter-0.041
arding-0.041
 sidelines-0.040
 weeds-0.039
Token,
Token ID11
Feature activation-4.55e-05

Pos logit contributions
te+0.027
 Sp+0.025
 Ske+0.023
omo+0.023
z+0.023
Neg logit contributions
arding-0.031
 tighter-0.030
ising-0.029
 instructions-0.028
 conclusions-0.028
Token and
Token ID290
Feature activation-3.86e-03

Pos logit contributions
 tighter+0.000
 sidelines+0.000
ising+0.000
 conclusions+0.000
arding+0.000
Neg logit contributions
te-0.000
 Sp-0.000
rd-0.000
 Esk-0.000
z-0.000
Token all
Token ID477
Feature activation+9.81e-03

Pos logit contributions
 tighter+0.006
 conclusions+0.006
arding+0.006
 sidelines+0.006
 weeds+0.006
Neg logit contributions
te-0.005
 Sp-0.004
z-0.004
 owned-0.004
omo-0.004
Token bed
Token ID3996
Feature activation-1.66e-02

Pos logit contributions
 conclusions+0.001
 sidelines+0.001
 tighter+0.001
 bullies+0.001
 weeds+0.001
Neg logit contributions
te-0.000
z-0.000
 ti-0.000
omo-0.000
 owned-0.000
Token and
Token ID290
Feature activation-1.08e-02

Pos logit contributions
 tighter+0.026
ising+0.025
 sidelines+0.025
 instructions+0.024
arding+0.024
Neg logit contributions
te-0.023
 Sp-0.020
 owned-0.020
 Ske-0.019
omo-0.019
Token this
Token ID428
Feature activation+7.19e-03

Pos logit contributions
 tighter+0.018
 conclusions+0.016
 sidelines+0.016
 further+0.016
 weeds+0.016
Neg logit contributions
te-0.016
z-0.012
omo-0.012
gh-0.012
 Sp-0.011
Token time
Token ID640
Feature activation-4.92e-02

Pos logit contributions
te+0.010
 Sp+0.010
 Ske+0.009
z+0.009
 Esk+0.008
Neg logit contributions
 tighter-0.011
 weeds-0.010
 further-0.010
ising-0.010
 sidelines-0.010
Token,
Token ID11
Feature activation-6.63e-03

Pos logit contributions
 tighter+0.083
 further+0.083
arding+0.080
 weeds+0.079
ly+0.077
Neg logit contributions
te-0.061
omo-0.054
z-0.051
rd-0.050
 Amen-0.049
Token he
Token ID339
Feature activation+4.03e-02

Pos logit contributions
 tighter+0.012
 further+0.011
 sidelines+0.011
 weeds+0.011
 conclusions+0.011
Neg logit contributions
te-0.009
z-0.007
rd-0.007
omo-0.007
gh-0.007
Token fell
Token ID3214
Feature activation+4.79e-03

Pos logit contributions
te+0.058
 Sp+0.047
 Ske+0.046
z+0.046
rd+0.046
Neg logit contributions
 tighter-0.064
 further-0.057
 lam-0.056
 weeds-0.056
 bullies-0.055
Token asleep
Token ID16039
Feature activation+7.27e-02

Pos logit contributions
te+0.007
rd+0.006
 owned+0.006
 lived+0.006
 Sp+0.006
Neg logit contributions
 tighter-0.008
 further-0.007
 weeds-0.007
 sidelines-0.007
 lam-0.007
Token straight
Token ID3892
Feature activation-2.19e-02

Pos logit contributions
te+0.100
omo+0.096
 Sp+0.086
 owned+0.084
o+0.082
Neg logit contributions
 tighter-0.115
ising-0.111
 sidelines-0.107
 further-0.107
 instructions-0.101
Token away
Token ID1497
Feature activation-8.33e-03

Pos logit contributions
 tighter+0.025
 further+0.023
 weeds+0.022
ly+0.022
 instructions+0.022
Neg logit contributions
te-0.042
 Sp-0.039
omo-0.039
 lived-0.036
 Ske-0.036
Token.
Token ID13
Feature activation+3.09e-03

Pos logit contributions
 tighter+0.013
ising+0.013
 conclusions+0.012
 sidelines+0.012
 instructions+0.012
Neg logit contributions
te-0.012
 Sp-0.010
 owned-0.010
omo-0.010
 lived-0.010
Token her
Token ID607
Feature activation+1.75e-02

Pos logit contributions
te+0.004
 owned+0.003
 once+0.003
 Sp+0.003
 lived+0.003
Neg logit contributions
 conclusions-0.006
arding-0.006
 tighter-0.006
 sidelines-0.006
 unnecessary-0.006
Token mum
Token ID25682
Feature activation+9.11e-03

Pos logit contributions
te+0.028
 owned+0.023
 Sp+0.023
 lived+0.022
z+0.022
Neg logit contributions
 conclusions-0.025
 tighter-0.025
arding-0.025
 sidelines-0.025
ising-0.024
Token.
Token ID13
Feature activation-3.60e-03

Pos logit contributions
te+0.012
 once+0.010
 owned+0.010
 Sp+0.009
 lived+0.009
Neg logit contributions
 conclusions-0.016
arding-0.016
 tighter-0.015
 sidelines-0.015
 unnecessary-0.015
Token She
Token ID1375
Feature activation+2.04e-03

Pos logit contributions
 conclusions+0.006
 weeds+0.006
 sidelines+0.006
arding+0.006
 tighter+0.006
Neg logit contributions
te-0.005
z-0.004
 ti-0.004
 owned-0.004
gh-0.004
Token was
Token ID373
Feature activation+2.08e-02

Pos logit contributions
te+0.003
 Sp+0.002
z+0.002
 once+0.002
 ti+0.002
Neg logit contributions
 tighter-0.003
arding-0.003
 sidelines-0.003
 conclusions-0.003
ising-0.003
Token smiling
Token ID16755
Feature activation+3.73e-02

Pos logit contributions
te+0.031
 Sp+0.026
z+0.026
 lived+0.026
 ti+0.025
Neg logit contributions
 tighter-0.030
 sidelines-0.030
arding-0.030
 bullies-0.029
 conclusions-0.029
Token and
Token ID290
Feature activation-1.32e-02

Pos logit contributions
te+0.057
z+0.050
 Sp+0.050
rd+0.049
gh+0.049
Neg logit contributions
 tighter-0.055
ising-0.052
 faces-0.050
arding-0.049
ly-0.049
Token Jane
Token ID12091
Feature activation+5.68e-03

Pos logit contributions
 tighter+0.019
arding+0.019
 sidelines+0.019
 conclusions+0.018
ising+0.018
Neg logit contributions
te-0.021
 Sp-0.017
 owned-0.017
 once-0.017
z-0.017
Token knew
Token ID2993
Feature activation-1.00e-02

Pos logit contributions
te+0.007
 Sp+0.006
 ti+0.005
 owned+0.005
 Ske+0.005
Neg logit contributions
 tighter-0.010
arding-0.010
 conclusions-0.010
ising-0.010
 sidelines-0.009
Token that
Token ID326
Feature activation+1.56e-02

Pos logit contributions
 tighter+0.015
 conclusions+0.015
arding+0.014
 weeds+0.014
 unnecessary+0.014
Neg logit contributions
te-0.016
z-0.013
 owned-0.013
 Sp-0.013
 ti-0.013
Token she
Token ID673
Feature activation+3.59e-02

Pos logit contributions
te+0.024
z+0.020
omo+0.019
 ti+0.019
 owned+0.019
Neg logit contributions
 conclusions-0.024
 tighter-0.024
 weeds-0.023
arding-0.023
 bullies-0.022
Token not
Token ID407
Feature activation+4.41e-02

Pos logit contributions
te+0.023
z+0.021
 ti+0.020
omo+0.020
 Esk+0.019
Neg logit contributions
arding-0.023
 tighter-0.022
 instructions-0.021
 bullies-0.021
 unnecessary-0.021
Token to
Token ID284
Feature activation-3.73e-04

Pos logit contributions
te+0.020
 ever+0.016
 once+0.014
 owned+0.014
 no+0.012
Neg logit contributions
 sidelines-0.116
 conclusions-0.116
 trigger-0.115
 weeds-0.110
 lam-0.108
Token worry
Token ID5490
Feature activation+2.93e-05

Pos logit contributions
 sidelines+0.001
 weeds+0.001
 unnecessary+0.001
awks+0.001
ans+0.001
Neg logit contributions
te-0.000
 no-0.000
 ever-0.000
 once-0.000
 He-0.000
Token and
Token ID290
Feature activation-1.11e-02

Pos logit contributions
te+0.000
 owned+0.000
 once+0.000
 Sp+0.000
 lived+0.000
Neg logit contributions
arding-0.000
 conclusions-0.000
 tighter-0.000
 sidelines-0.000
 unnecessary-0.000
Token that
Token ID326
Feature activation+1.52e-02

Pos logit contributions
 conclusions+0.022
arding+0.021
 sidelines+0.021
 tighter+0.021
 unnecessary+0.021
Neg logit contributions
te-0.012
 owned-0.010
 lived-0.010
 once-0.010
 Sp-0.009
Token she
Token ID673
Feature activation+3.42e-02

Pos logit contributions
te+0.020
z+0.015
 ti+0.015
 owned+0.014
 ruled+0.014
Neg logit contributions
 tighter-0.028
 conclusions-0.027
arding-0.027
 bullies-0.026
 instructions-0.026
Token had
Token ID550
Feature activation+8.58e-03

Pos logit contributions
te+0.038
 ti+0.032
 Sp+0.030
 Ske+0.030
z+0.030
Neg logit contributions
 tighter-0.064
arding-0.061
ising-0.060
 unnecessary-0.059
 bullies-0.059
Token a
Token ID257
Feature activation-1.67e-02

Pos logit contributions
te+0.010
 Sp+0.007
 owned+0.007
z+0.007
 once+0.007
Neg logit contributions
 tighter-0.016
arding-0.016
 conclusions-0.016
 sidelines-0.016
 unnecessary-0.016
Token useful
Token ID4465
Feature activation-2.82e-02

Pos logit contributions
 conclusions+0.025
arding+0.025
 sidelines+0.025
ising+0.024
 tighter+0.024
Neg logit contributions
te-0.025
 owned-0.022
 once-0.021
 ti-0.021
 lived-0.021
Token tool
Token ID2891
Feature activation-5.21e-02

Pos logit contributions
 tighter+0.038
 conclusions+0.038
arding+0.037
 sidelines+0.037
 unnecessary+0.036
Neg logit contributions
te-0.047
 Sp-0.039
 owned-0.039
 once-0.039
 lived-0.038
Token that
Token ID326
Feature activation+1.47e-02

Pos logit contributions
 tighter+0.085
arding+0.082
 instructions+0.080
 officers+0.079
 sidelines+0.079
Neg logit contributions
te-0.072
 Sp-0.058
 lived-0.056
 owned-0.056
 Ske-0.056
Token had
Token ID550
Feature activation+1.31e-02

Pos logit contributions
te+0.014
joy+0.013
 Matt+0.013
pter+0.012
 Jo+0.012
Neg logit contributions
 lam-0.013
 bullies-0.013
 faces-0.012
 trash-0.012
ck-0.012
Token a
Token ID257
Feature activation-9.15e-03

Pos logit contributions
te+0.018
z+0.014
 owned+0.014
 Sp+0.014
 ruled+0.013
Neg logit contributions
 tighter-0.021
 unnecessary-0.021
 weeds-0.021
 bullies-0.021
 lam-0.020
Token red
Token ID2266
Feature activation+5.65e-02

Pos logit contributions
 conclusions+0.015
arding+0.015
 tighter+0.015
 sidelines+0.015
 unnecessary+0.015
Neg logit contributions
te-0.012
 owned-0.010
 once-0.010
 Sp-0.010
 lived-0.010
Token cap
Token ID1451
Feature activation+1.71e-02

Pos logit contributions
te+0.085
 once+0.072
 ti+0.071
 lived+0.071
 owned+0.070
Neg logit contributions
arding-0.088
 unnecessary-0.087
 conclusions-0.087
 tighter-0.085
 sidelines-0.083
Token that
Token ID326
Feature activation+1.42e-02

Pos logit contributions
te+0.024
 Sp+0.021
 owned+0.020
 Esk+0.020
 Jo+0.020
Neg logit contributions
 tighter-0.026
arding-0.025
ising-0.025
 sidelines-0.024
 trigger-0.024
Token she
Token ID673
Feature activation+3.86e-02

Pos logit contributions
te+0.033
rd+0.032
joy+0.031
 Esk+0.031
z+0.031
Neg logit contributions
 poppy-0.006
nam-0.005
 trash-0.005
 tighter-0.005
 other-0.004
Token loved
Token ID6151
Feature activation+9.50e-03

Pos logit contributions
te+0.050
rd+0.043
 Matt+0.043
joy+0.042
 once+0.042
Neg logit contributions
 tighter-0.057
ck-0.054
ppa-0.051
ly-0.051
 strings-0.051
Token to
Token ID284
Feature activation+2.00e-03

Pos logit contributions
te+0.014
omo+0.012
z+0.012
rd+0.011
 Esk+0.011
Neg logit contributions
 tighter-0.013
 rules-0.013
 instructions-0.013
 conclusions-0.013
 lam-0.013
Token wear
Token ID5806
Feature activation+1.49e-02

Pos logit contributions
te+0.004
 once+0.003
z+0.003
 Jacob+0.003
rd+0.003
Neg logit contributions
 conclusions-0.002
 sidelines-0.002
 tighter-0.002
 faces-0.002
arding-0.002
Token.
Token ID13
Feature activation+9.48e-03

Pos logit contributions
te+0.020
omo+0.017
 Sp+0.017
z+0.017
 Jacob+0.017
Neg logit contributions
 tighter-0.023
ising-0.023
arding-0.023
 conclusions-0.022
 sidelines-0.022
Token One
Token ID1881
Feature activation+3.16e-03

Pos logit contributions
te+0.014
z+0.012
 ever+0.011
gh+0.011
 ruled+0.011
Neg logit contributions
 conclusions-0.015
 tighter-0.014
arding-0.014
 sidelines-0.014
 weeds-0.014
Token were
Token ID547
Feature activation-2.72e-03

Pos logit contributions
 tighter+0.008
arding+0.008
 conclusions+0.008
 sidelines+0.008
ising+0.007
Neg logit contributions
te-0.006
 Sp-0.005
z-0.005
 once-0.005
 owned-0.005
Token able
Token ID1498
Feature activation+7.63e-03

Pos logit contributions
 tighter+0.004
arding+0.003
 sidelines+0.003
 bullies+0.003
 weeds+0.003
Neg logit contributions
te-0.004
 Sp-0.004
 ti-0.004
 Ske-0.004
z-0.004
Token to
Token ID284
Feature activation-2.60e-03

Pos logit contributions
te+0.008
 owned+0.006
 once+0.006
 Sp+0.006
 lived+0.006
Neg logit contributions
 conclusions-0.015
arding-0.015
 tighter-0.015
 sidelines-0.015
 unnecessary-0.015
Token enjoy
Token ID2883
Feature activation+1.56e-02

Pos logit contributions
arding+0.004
 tighter+0.004
 conclusions+0.003
 sidelines+0.003
 unnecessary+0.003
Neg logit contributions
te-0.004
 owned-0.004
 once-0.004
 lived-0.004
 ruled-0.004
Token the
Token ID262
Feature activation+3.37e-03

Pos logit contributions
te+0.024
z+0.019
omo+0.018
 ti+0.018
 Sp+0.018
Neg logit contributions
 conclusions-0.025
 weeds-0.024
 tighter-0.024
 sidelines-0.024
 unnecessary-0.023
Token cave
Token ID11527
Feature activation+3.17e-02

Pos logit contributions
te+0.006
 Sp+0.005
 owned+0.005
z+0.005
 lived+0.005
Neg logit contributions
 conclusions-0.005
 tighter-0.005
 sidelines-0.005
ising-0.005
 unnecessary-0.005
Token.
Token ID13
Feature activation-1.72e-03

Pos logit contributions
te+0.044
 Sp+0.036
 owned+0.036
omo+0.036
 ti+0.035
Neg logit contributions
 tighter-0.053
ising-0.050
 conclusions-0.049
arding-0.049
 further-0.049
Token<|endoftext|>
Token ID50256
Feature activation-7.48e-02

Pos logit contributions
 conclusions+0.003
arding+0.003
 tighter+0.003
 sidelines+0.003
 bullies+0.003
Neg logit contributions
te-0.002
 ti-0.001
 owned-0.001
z-0.001
 ruled-0.001
TokenOnce
Token ID7454
Feature activation+4.76e-02

Pos logit contributions
 bushes+0.098
 further+0.088
 conclusions+0.084
 bullies+0.082
 weeds+0.081
Neg logit contributions
te-0.116
omo-0.107
 ti-0.104
 hardly-0.102
 barely-0.101
Token upon
Token ID2402
Feature activation+6.37e-03

Pos logit contributions
te+0.075
 ti+0.066
z+0.060
 ruled+0.060
rd+0.059
Neg logit contributions
arding-0.068
 tighter-0.066
 further-0.063
 conclusions-0.060
 weeds-0.060
Token a
Token ID257
Feature activation-1.37e-02

Pos logit contributions
te+0.008
z+0.007
 owned+0.006
 Sp+0.006
 ti+0.006
Neg logit contributions
 conclusions-0.011
 tighter-0.011
 unnecessary-0.010
 clues-0.010
 bullies-0.010
TokenSo
Token ID2396
Feature activation-2.71e-02

Pos logit contributions
te+0.004
 owned+0.004
 Sp+0.004
 once+0.003
z+0.003
Neg logit contributions
arding-0.006
 conclusions-0.006
 tighter-0.006
 sidelines-0.006
 unnecessary-0.006
Token Sam
Token ID3409
Feature activation-2.07e-02

Pos logit contributions
 tighter+0.050
arding+0.048
 conclusions+0.047
ising+0.047
 weeds+0.047
Neg logit contributions
te-0.034
 owned-0.026
 ti-0.026
z-0.026
 ruled-0.025
Token followed
Token ID3940
Feature activation-2.78e-02

Pos logit contributions
 tighter+0.032
ising+0.030
arding+0.030
 sidelines+0.029
 instructions+0.028
Neg logit contributions
te-0.031
 Sp-0.027
 ti-0.025
 owned-0.025
omo-0.025
Token the
Token ID262
Feature activation+8.46e-04

Pos logit contributions
arding+0.060
 conclusions+0.059
 sidelines+0.058
 tighter+0.058
 unnecessary+0.057
Neg logit contributions
te-0.026
 once-0.020
 Sp-0.019
 Ske-0.017
 ever-0.016
Token man
Token ID582
Feature activation+7.47e-03

Pos logit contributions
te+0.001
 He+0.001
 once+0.001
 owned+0.001
 ruled+0.001
Neg logit contributions
 conclusions-0.002
arding-0.002
oves-0.001
ares-0.001
 unnecessary-0.001
Token and
Token ID290
Feature activation-1.40e-02

Pos logit contributions
te+0.008
 once+0.008
z+0.007
 ever+0.006
 He+0.006
Neg logit contributions
 conclusions-0.015
 sidelines-0.015
arding-0.015
 trigger-0.015
 bullies-0.014
Token soon
Token ID2582
Feature activation+5.50e-02

Pos logit contributions
 conclusions+0.024
arding+0.024
 tighter+0.024
 sidelines+0.024
 unnecessary+0.023
Neg logit contributions
te-0.018
 owned-0.015
 once-0.015
 Sp-0.014
 lived-0.014
Token they
Token ID484
Feature activation-1.10e-02

Pos logit contributions
te+0.073
 Sp+0.057
z+0.057
 ti+0.057
omo+0.056
Neg logit contributions
 tighter-0.094
arding-0.092
 weeds-0.091
 bullies-0.090
 conclusions-0.089
Token reached
Token ID4251
Feature activation-1.30e-02

Pos logit contributions
 tighter+0.018
arding+0.018
 sidelines+0.017
 weeds+0.017
 conclusions+0.017
Neg logit contributions
te-0.015
 Sp-0.012
omo-0.012
 ti-0.012
rd-0.012
Token the
Token ID262
Feature activation+1.13e-03

Pos logit contributions
 tighter+0.022
 conclusions+0.022
 weeds+0.021
 unnecessary+0.021
 sidelines+0.021
Neg logit contributions
te-0.018
 owned-0.014
 ti-0.014
z-0.014
 Sp-0.013
Token village
Token ID7404
Feature activation-2.94e-02

Pos logit contributions
 once+0.001
te+0.001
 no+0.001
 seeker+0.001
 He+0.001
Neg logit contributions
arding-0.002
ck-0.002
ares-0.002
oves-0.002
irs-0.002
Token they
Token ID484
Feature activation-5.98e-03

Pos logit contributions
te+0.006
z+0.005
 owned+0.004
 ruled+0.004
imm+0.004
Neg logit contributions
 tighter-0.006
 weeds-0.006
 clues-0.006
 bullies-0.006
arding-0.006
Token see
Token ID766
Feature activation+1.28e-02

Pos logit contributions
 conclusions+0.009
 tighter+0.009
arding+0.009
 sidelines+0.009
 unnecessary+0.009
Neg logit contributions
te-0.009
 owned-0.007
 once-0.007
 Sp-0.007
 lived-0.007
Token a
Token ID257
Feature activation-9.52e-03

Pos logit contributions
te+0.016
 owned+0.012
z+0.012
 once+0.012
 Sp+0.012
Neg logit contributions
 conclusions-0.023
arding-0.023
 tighter-0.023
 sidelines-0.022
 unnecessary-0.022
Token problem
Token ID1917
Feature activation-5.20e-02

Pos logit contributions
 conclusions+0.017
ales+0.017
 tighter+0.017
arding+0.016
 unnecessary+0.016
Neg logit contributions
te-0.012
 owned-0.010
 ti-0.010
 Sp-0.010
 lived-0.009
Token.
Token ID13
Feature activation+5.63e-03

Pos logit contributions
 conclusions+0.097
arding+0.097
 tighter+0.096
 sidelines+0.095
 unnecessary+0.094
Neg logit contributions
te-0.061
 owned-0.047
 once-0.047
 Sp-0.046
 lived-0.046
Token There
Token ID1318
Feature activation-1.91e-02

Pos logit contributions
te+0.006
 owned+0.004
 Sp+0.004
 once+0.004
 lived+0.004
Neg logit contributions
 conclusions-0.012
arding-0.012
 tighter-0.012
 sidelines-0.011
 unnecessary-0.011
Token is
Token ID318
Feature activation+2.14e-02

Pos logit contributions
arding+0.034
 conclusions+0.033
 weeds+0.032
 tighter+0.032
 bullies+0.032
Neg logit contributions
te-0.024
 ti-0.020
z-0.019
 owned-0.018
omo-0.018
Token one
Token ID530
Feature activation+1.91e-02

Pos logit contributions
te+0.028
z+0.022
 owned+0.021
 ti+0.021
 ruled+0.021
Neg logit contributions
 tighter-0.037
 conclusions-0.036
 weeds-0.036
arding-0.035
 bullies-0.035
Token toy
Token ID13373
Feature activation-3.96e-03

Pos logit contributions
te+0.026
 once+0.022
 owned+0.022
 lived+0.021
 Sp+0.021
Neg logit contributions
 conclusions-0.032
arding-0.032
 sidelines-0.032
 unnecessary-0.032
 tighter-0.032
Token left
Token ID1364
Feature activation+4.57e-02

Pos logit contributions
 conclusions+0.006
 tighter+0.006
arding+0.006
 sidelines+0.006
 unnecessary+0.006
Neg logit contributions
te-0.006
 owned-0.005
 once-0.005
 Sp-0.005
 lived-0.005
Token that
Token ID326
Feature activation+2.02e-02

Pos logit contributions
te+0.063
 Sp+0.050
 owned+0.050
 once+0.049
z+0.049
Neg logit contributions
 conclusions-0.075
 tighter-0.075
arding-0.075
 sidelines-0.074
 weeds-0.071
Token high
Token ID1029
Feature activation+4.18e-02

Pos logit contributions
te+0.048
 Sp+0.038
 owned+0.038
 ti+0.036
 Ske+0.036
Neg logit contributions
 tighter-0.066
 conclusions-0.064
arding-0.063
ising-0.062
 sidelines-0.062
Token in
Token ID287
Feature activation-9.20e-03

Pos logit contributions
te+0.059
 Sp+0.049
 owned+0.048
omo+0.048
 lived+0.046
Neg logit contributions
 tighter-0.069
ising-0.066
 weeds-0.064
 further-0.064
 conclusions-0.063
Token the
Token ID262
Feature activation+1.29e-02

Pos logit contributions
 conclusions+0.015
 weeds+0.014
 tighter+0.013
 further+0.013
 rules+0.013
Neg logit contributions
te-0.014
z-0.012
omo-0.012
 ti-0.012
tto-0.011
Token sky
Token ID6766
Feature activation+1.51e-02

Pos logit contributions
te+0.017
 owned+0.014
 once+0.014
 lived+0.014
 Sp+0.014
Neg logit contributions
arding-0.022
 conclusions-0.022
 tighter-0.022
 sidelines-0.021
 unnecessary-0.021
Token.
Token ID13
Feature activation+6.63e-03

Pos logit contributions
te+0.025
 Sp+0.021
 owned+0.021
 Ske+0.021
omo+0.021
Neg logit contributions
 tighter-0.020
ising-0.020
 conclusions-0.019
 further-0.019
 instructions-0.018
Token One
Token ID1881
Feature activation+3.28e-03

Pos logit contributions
te+0.008
z+0.006
gh+0.006
 ti+0.006
 ever+0.006
Neg logit contributions
 conclusions-0.012
 sidelines-0.012
 tighter-0.012
 weeds-0.011
 bullies-0.011
Token day
Token ID1110
Feature activation+9.23e-03

Pos logit contributions
te+0.004
 ti+0.003
 Sp+0.003
z+0.003
 owned+0.003
Neg logit contributions
 tighter-0.006
 conclusions-0.006
arding-0.006
 weeds-0.006
ising-0.006
Token,
Token ID11
Feature activation-1.88e-03

Pos logit contributions
te+0.013
omo+0.011
z+0.011
 owned+0.010
rd+0.010
Neg logit contributions
 weeds-0.014
 further-0.014
arding-0.014
 bullies-0.014
 tighter-0.013
Token he
Token ID339
Feature activation+4.51e-02

Pos logit contributions
 tighter+0.004
 conclusions+0.004
 sidelines+0.004
 weeds+0.004
 bullies+0.004
Neg logit contributions
te-0.002
z-0.001
 owned-0.001
 ti-0.001
rd-0.001
Token went
Token ID1816
Feature activation-6.44e-03

Pos logit contributions
te+0.064
 Sp+0.053
 Ske+0.050
 ti+0.049
 once+0.049
Neg logit contributions
 tighter-0.071
arding-0.070
 conclusions-0.069
 bullies-0.068
 weeds-0.068
Token to
Token ID284
Feature activation+3.64e-03

Pos logit contributions
 conclusions+0.014
arding+0.014
 sidelines+0.014
 unnecessary+0.014
 trigger+0.014
Neg logit contributions
te-0.005
 once-0.004
 owned-0.004
 ruled-0.004
 Sp-0.004
Token skipped
Token ID26684
Feature activation-3.41e-03

Pos logit contributions
 conclusions+0.024
 unnecessary+0.023
arding+0.023
 sidelines+0.022
 trigger+0.022
Neg logit contributions
te-0.015
 lived-0.013
 owned-0.013
 once-0.012
 ti-0.012
Token along
Token ID1863
Feature activation+2.44e-02

Pos logit contributions
 tighter+0.006
 conclusions+0.006
arding+0.006
 sidelines+0.005
 unnecessary+0.005
Neg logit contributions
te-0.005
 owned-0.004
 Sp-0.004
 once-0.004
 lived-0.004
Token the
Token ID262
Feature activation+1.31e-02

Pos logit contributions
te+0.023
 once+0.018
 Sp+0.016
 owned+0.016
 ever+0.015
Neg logit contributions
arding-0.054
 conclusions-0.053
 unnecessary-0.052
 trigger-0.051
 sidelines-0.051
Token sidewalk
Token ID24058
Feature activation-6.14e-02

Pos logit contributions
 once+0.016
z+0.016
te+0.015
 was+0.015
 lived+0.015
Neg logit contributions
arding-0.024
ares-0.023
oves-0.021
 conclusions-0.021
irs-0.021
Token,
Token ID11
Feature activation+1.69e-03

Pos logit contributions
 conclusions+0.111
arding+0.111
 tighter+0.110
 sidelines+0.109
 unnecessary+0.106
Neg logit contributions
te-0.075
 owned-0.059
 Sp-0.057
 once-0.057
 lived-0.056
Token holding
Token ID4769
Feature activation+1.74e-02

Pos logit contributions
te+0.002
 once+0.002
 owned+0.002
 Sp+0.002
 lived+0.002
Neg logit contributions
 conclusions-0.003
arding-0.003
 unnecessary-0.003
 tighter-0.003
 sidelines-0.003
Token hands
Token ID2832
Feature activation-4.61e-02

Pos logit contributions
te+0.031
 owned+0.026
 once+0.026
 Sp+0.026
 lived+0.025
Neg logit contributions
 conclusions-0.022
 tighter-0.022
arding-0.022
 sidelines-0.022
 unnecessary-0.021
Token and
Token ID290
Feature activation-1.23e-03

Pos logit contributions
 conclusions+0.082
arding+0.082
 tighter+0.081
 sidelines+0.080
 unnecessary+0.078
Neg logit contributions
te-0.058
 owned-0.046
 Sp-0.045
 once-0.045
 lived-0.044
Token singing
Token ID13777
Feature activation+1.35e-02

Pos logit contributions
 tighter+0.002
arding+0.002
 conclusions+0.002
 sidelines+0.002
 weeds+0.002
Neg logit contributions
te-0.002
 owned-0.001
 Sp-0.001
 once-0.001
 lived-0.001
Token songs
Token ID7259
Feature activation+5.93e-02

Pos logit contributions
te+0.019
 owned+0.017
omo+0.017
 ruled+0.016
z+0.016
Neg logit contributions
 tighter-0.020
 conclusions-0.020
 faces-0.019
 clues-0.019
 weeds-0.019
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
te+0.073
 owned+0.058
 Sp+0.057
 once+0.056
 lived+0.056
Neg logit contributions
 tighter-0.105
arding-0.105
 conclusions-0.105
 sidelines-0.103
ising-0.101
Token like
Token ID588
Feature activation+6.36e-02

Pos logit contributions
te+0.052
 ever+0.042
 owned+0.040
 once+0.040
 ruled+0.038
Neg logit contributions
 sidelines-0.068
arding-0.067
 conclusions-0.067
 unnecessary-0.065
 tighter-0.065
Token eggs
Token ID9653
Feature activation+8.20e-02

Pos logit contributions
te+0.104
z+0.086
 lived+0.084
 owned+0.083
 ruled+0.082
Neg logit contributions
 conclusions-0.091
 weeds-0.090
 tighter-0.090
arding-0.089
 sidelines-0.088
Token and
Token ID290
Feature activation-1.98e-02

Pos logit contributions
te+0.095
 owned+0.073
 Sp+0.072
 once+0.070
 lived+0.069
Neg logit contributions
arding-0.154
 conclusions-0.153
 tighter-0.153
 sidelines-0.150
ising-0.147
Token toast
Token ID27805
Feature activation+4.91e-02

Pos logit contributions
 weeds+0.034
 conclusions+0.034
 tighter+0.034
 sidelines+0.033
 bullies+0.033
Neg logit contributions
te-0.027
z-0.021
 once-0.020
 owned-0.019
 ruled-0.019
Token.
Token ID13
Feature activation-1.63e-03

Pos logit contributions
te+0.073
 Sp+0.061
 ruled+0.060
z+0.060
 owned+0.060
Neg logit contributions
 tighter-0.073
arding-0.068
ising-0.068
 rules-0.067
 instructions-0.067
Token He
Token ID679
Feature activation+4.83e-03

Pos logit contributions
 conclusions+0.003
 tighter+0.003
arding+0.003
 sidelines+0.003
 bullies+0.003
Neg logit contributions
te-0.002
 ti-0.001
 owned-0.001
 ruled-0.001
z-0.001
Token liked
Token ID8288
Feature activation+3.03e-02

Pos logit contributions
te+0.007
z+0.005
 once+0.005
 Sp+0.005
 ti+0.005
Neg logit contributions
 tighter-0.008
 conclusions-0.007
 bullies-0.007
 weeds-0.007
 sidelines-0.007
Token cereal
Token ID33158
Feature activation+4.66e-02

Pos logit contributions
te+0.046
z+0.037
 lived+0.036
 ruled+0.036
 ti+0.036
Neg logit contributions
 tighter-0.047
 weeds-0.046
 conclusions-0.046
 bullies-0.045
 instructions-0.045
Token with
Token ID351
Feature activation+1.08e-02

Pos logit contributions
te+0.068
 owned+0.057
z+0.056
 ruled+0.055
 ti+0.055
Neg logit contributions
 tighter-0.075
ising-0.070
 instructions-0.067
 conclusions-0.066
 weeds-0.066
Token milk
Token ID7545
Feature activation+6.27e-03

Pos logit contributions
te+0.016
z+0.013
 ruled+0.013
 owned+0.013
 seeker+0.012
Neg logit contributions
 weeds-0.017
 tighter-0.017
 conclusions-0.016
 sidelines-0.016
 faces-0.016
Token and
Token ID290
Feature activation-2.06e-02

Pos logit contributions
te+0.009
 owned+0.007
 Sp+0.007
 ruled+0.007
 lived+0.007
Neg logit contributions
 tighter-0.010
arding-0.010
 conclusions-0.010
ising-0.010
 weeds-0.010
Token learned
Token ID4499
Feature activation-2.90e-02

Pos logit contributions
 tighter+0.047
 sidelines+0.043
 officers+0.043
 conclusions+0.042
arding+0.042
Neg logit contributions
te-0.041
z-0.035
 Sp-0.035
gh-0.034
 ti-0.033
Token that
Token ID326
Feature activation+2.20e-02

Pos logit contributions
 tighter+0.046
 conclusions+0.046
arding+0.046
 sidelines+0.045
 unnecessary+0.044
Neg logit contributions
te-0.042
 owned-0.034
 Sp-0.034
z-0.033
 once-0.033
Token it
Token ID340
Feature activation-1.64e-02

Pos logit contributions
te+0.030
 owned+0.024
 once+0.024
 Sp+0.023
 lived+0.023
Neg logit contributions
 conclusions-0.037
arding-0.037
 tighter-0.037
 sidelines-0.037
 unnecessary-0.036
Token doesn
Token ID1595
Feature activation+2.92e-02

Pos logit contributions
arding+0.023
 tighter+0.023
ising+0.022
 bullies+0.022
 unnecessary+0.021
Neg logit contributions
te-0.027
 ti-0.023
omo-0.022
z-0.022
 Sp-0.021
Token't
Token ID470
Feature activation+1.08e-02

Pos logit contributions
te+0.029
 ti+0.023
 owned+0.023
 lived+0.021
 Sp+0.021
Neg logit contributions
arding-0.059
 conclusions-0.058
 sidelines-0.058
 bullies-0.057
ising-0.057
Token matter
Token ID2300
Feature activation-6.16e-02

Pos logit contributions
te+0.015
 ever+0.014
 owned+0.014
 once+0.013
 lived+0.013
Neg logit contributions
 conclusions-0.017
 sidelines-0.017
arding-0.017
 register-0.017
 tighter-0.016
Token what
Token ID644
Feature activation+9.14e-03

Pos logit contributions
 sidelines+0.099
 tighter+0.095
 trigger+0.093
 conclusions+0.092
 weeds+0.091
Neg logit contributions
te-0.094
 once-0.080
 lived-0.080
 Sp-0.079
 owned-0.079
Token others
Token ID1854
Feature activation-3.04e-02

Pos logit contributions
te+0.014
 ti+0.011
z+0.011
 Sp+0.011
 owned+0.011
Neg logit contributions
arding-0.014
 conclusions-0.014
 unnecessary-0.014
 bullies-0.014
 tighter-0.014
Token say
Token ID910
Feature activation-1.08e-02

Pos logit contributions
 tighter+0.052
 conclusions+0.051
 sidelines+0.051
arding+0.049
 unnecessary+0.048
Neg logit contributions
te-0.041
 owned-0.034
 lived-0.034
 once-0.034
 ever-0.032
Token,
Token ID11
Feature activation+4.83e-03

Pos logit contributions
arding+0.018
 conclusions+0.017
 unnecessary+0.017
 bullies+0.017
ising+0.016
Neg logit contributions
te-0.015
z-0.013
omo-0.013
 Sp-0.012
tto-0.012
Token as
Token ID355
Feature activation-2.75e-02

Pos logit contributions
te+0.007
 owned+0.006
z+0.006
 ti+0.005
 Sp+0.005
Neg logit contributions
arding-0.008
 conclusions-0.008
 tighter-0.008
 sidelines-0.008
 unnecessary-0.007
Token
Token ID198
Feature activation-1.11e-02

Pos logit contributions
arding+0.030
 conclusions+0.030
 sidelines+0.030
 tighter+0.030
 unnecessary+0.029
Neg logit contributions
te-0.007
 once-0.004
 Sp-0.004
 lived-0.004
 owned-0.004
TokenShe
Token ID3347
Feature activation-9.51e-03

Pos logit contributions
 unnecessary+0.022
 trigger+0.022
arding+0.022
 weeds+0.022
 conclusions+0.021
Neg logit contributions
te-0.012
z-0.009
 He-0.009
 Sp-0.008
o-0.008
Token tried
Token ID3088
Feature activation-9.46e-02

Pos logit contributions
 conclusions+0.017
arding+0.017
 unnecessary+0.016
 sidelines+0.016
ising+0.015
Neg logit contributions
te-0.013
 owned-0.011
 lived-0.010
 once-0.010
 Sp-0.010
Token to
Token ID284
Feature activation-8.44e-03

Pos logit contributions
arding+0.208
 sidelines+0.206
 conclusions+0.206
ising+0.203
 unnecessary+0.201
Neg logit contributions
te-0.082
 once-0.062
 Sp-0.058
 owned-0.054
 Ske-0.050
Token w
Token ID266
Feature activation+3.33e-02

Pos logit contributions
arding+0.015
 unnecessary+0.014
leness+0.014
 register+0.013
ented+0.013
Neg logit contributions
te-0.011
 his-0.011
 ever-0.010
 once-0.010
 now-0.010
Tokeniggle
Token ID24082
Feature activation-6.04e-02

Pos logit contributions
te+0.045
gh+0.041
 once+0.037
 lived+0.037
 owned+0.034
Neg logit contributions
 lam-0.058
 lamb-0.058
 tighter-0.056
 sidelines-0.055
 officers-0.054
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 tighter+0.088
 conclusions+0.086
arding+0.082
 unnecessary+0.080
 weeds+0.080
Neg logit contributions
te-0.097
 owned-0.083
 ruled-0.082
 lived-0.081
z-0.081
Token She
Token ID1375
Feature activation-9.14e-03

Pos logit contributions
 conclusions+0.020
arding+0.020
 tighter+0.020
 sidelines+0.020
 unnecessary+0.019
Neg logit contributions
te-0.013
 owned-0.010
z-0.010
 lived-0.009
 ruled-0.009
Token tried
Token ID3088
Feature activation-9.44e-02

Pos logit contributions
 tighter+0.014
arding+0.012
 bullies+0.012
 conclusions+0.012
 weeds+0.012
Neg logit contributions
te-0.014
 once-0.012
z-0.012
rd-0.011
 Sp-0.011
Token to
Token ID284
Feature activation-8.18e-03

Pos logit contributions
arding+0.198
 conclusions+0.197
 sidelines+0.197
 tighter+0.192
ising+0.192
Neg logit contributions
te-0.089
 once-0.067
 Sp-0.065
 owned-0.062
z-0.058
Token slide
Token ID10649
Feature activation+6.11e-03

Pos logit contributions
arding+0.014
 unnecessary+0.013
 register+0.013
leness+0.013
 goats+0.012
Neg logit contributions
te-0.011
 his-0.010
 ever-0.010
 once-0.010
 no-0.010
Token wrong
Token ID2642
Feature activation-3.48e-02

Pos logit contributions
te+0.038
 Sp+0.032
 owned+0.032
z+0.032
 once+0.031
Neg logit contributions
arding-0.031
 conclusions-0.030
 tighter-0.030
 sidelines-0.030
 unnecessary-0.030
Token.
Token ID13
Feature activation+6.61e-03

Pos logit contributions
 tighter+0.049
ising+0.048
arding+0.047
 instructions+0.047
 rules+0.046
Neg logit contributions
te-0.053
 owned-0.046
z-0.046
 ti-0.045
 Sp-0.044
Token
Token ID198
Feature activation+1.29e-02

Pos logit contributions
te+0.006
 ti+0.004
z+0.004
gh+0.004
 owned+0.004
Neg logit contributions
 tighter-0.014
 sidelines-0.014
 conclusions-0.014
arding-0.013
 bullies-0.013
Token
Token ID198
Feature activation+1.30e-02

Pos logit contributions
te+0.012
 ti+0.009
 owned+0.009
 ruled+0.009
 lived+0.009
Neg logit contributions
 tighter-0.026
 conclusions-0.026
arding-0.025
 sidelines-0.025
 instructions-0.025
TokenThe
Token ID464
Feature activation+6.83e-03

Pos logit contributions
te+0.018
 owned+0.015
 Sp+0.014
 once+0.014
 lived+0.014
Neg logit contributions
 conclusions-0.022
arding-0.022
 tighter-0.021
 sidelines-0.021
 unnecessary-0.021
Token little
Token ID1310
Feature activation-7.25e-02

Pos logit contributions
te+0.011
 owned+0.009
 once+0.009
 lived+0.009
 ruled+0.008
Neg logit contributions
 conclusions-0.011
arding-0.011
 unnecessary-0.011
oves-0.011
 sidelines-0.010
Token girl
Token ID2576
Feature activation-1.47e-02

Pos logit contributions
arding+0.089
 sidelines+0.088
 conclusions+0.088
 unnecessary+0.086
oves+0.080
Neg logit contributions
te-0.134
 owned-0.121
 once-0.121
 Sp-0.116
 ti-0.116
Token told
Token ID1297
Feature activation-9.27e-03

Pos logit contributions
 conclusions+0.023
arding+0.023
 sidelines+0.023
 tighter+0.023
 unnecessary+0.022
Neg logit contributions
te-0.022
 owned-0.018
 once-0.018
 lived-0.018
 Sp-0.018
Token the
Token ID262
Feature activation+9.23e-03

Pos logit contributions
arding+0.019
 conclusions+0.019
 unnecessary+0.018
 tighter+0.018
 sidelines+0.018
Neg logit contributions
te-0.009
 Sp-0.008
 once-0.008
 Ske-0.007
 owned-0.007
Token old
Token ID1468
Feature activation-1.79e-02

Pos logit contributions
te+0.014
 once+0.012
 owned+0.012
 lived+0.011
 ruled+0.011
Neg logit contributions
 conclusions-0.015
arding-0.015
 unnecessary-0.015
 tighter-0.015
 sidelines-0.015
Token man
Token ID582
Feature activation+1.25e-02

Pos logit contributions
 tighter+0.022
 conclusions+0.022
arding+0.021
 sidelines+0.021
 bullies+0.020
Neg logit contributions
te-0.032
 Sp-0.027
 owned-0.027
z-0.027
 once-0.027
Token in
Token ID287
Feature activation-1.89e-02

Pos logit contributions
te+0.025
 lived+0.021
 Sp+0.021
 ruled+0.021
 owned+0.021
Neg logit contributions
 tighter-0.027
 further-0.026
ising-0.025
arding-0.025
 weeds-0.024
Token the
Token ID262
Feature activation+5.14e-03

Pos logit contributions
arding+0.038
 sidelines+0.037
 conclusions+0.037
 tighter+0.036
 unnecessary+0.036
Neg logit contributions
te-0.020
 once-0.016
 Sp-0.015
 owned-0.015
 lived-0.014
Token hole
Token ID7604
Feature activation-4.06e-02

Pos logit contributions
te+0.008
 once+0.008
 no+0.008
 lived+0.008
 owned+0.008
Neg logit contributions
arding-0.007
ares-0.007
ales-0.006
irs-0.006
oves-0.006
Token with
Token ID351
Feature activation+1.55e-02

Pos logit contributions
 tighter+0.066
arding+0.063
ising+0.062
 weeds+0.061
 conclusions+0.061
Neg logit contributions
te-0.057
 Sp-0.048
 owned-0.048
 ti-0.047
 Ske-0.047
Token no
Token ID645
Feature activation+7.52e-02

Pos logit contributions
te+0.016
 once+0.013
 owned+0.013
 Sp+0.013
 lived+0.013
Neg logit contributions
arding-0.030
 conclusions-0.030
 sidelines-0.030
ising-0.030
 tighter-0.029
Token way
Token ID835
Feature activation-6.67e-02

Pos logit contributions
te+0.111
 once+0.106
 ti+0.103
 ever+0.099
 owned+0.097
Neg logit contributions
arding-0.115
 sidelines-0.109
 conclusions-0.108
 bullies-0.104
oves-0.103
Token out
Token ID503
Feature activation-6.15e-02

Pos logit contributions
 conclusions+0.105
arding+0.105
 tighter+0.104
 sidelines+0.103
 unnecessary+0.100
Neg logit contributions
te-0.098
 owned-0.080
 Sp-0.079
 once-0.078
 lived-0.077
Token.
Token ID13
Feature activation+2.19e-03

Pos logit contributions
arding+0.091
 conclusions+0.090
 tighter+0.090
 weeds+0.087
 unnecessary+0.085
Neg logit contributions
te-0.094
 ti-0.078
 Sp-0.077
 Ske-0.076
z-0.075
Token
Token ID198
Feature activation+1.10e-02

Pos logit contributions
te+0.002
 once+0.001
 owned+0.001
 Sp+0.001
 lived+0.001
Neg logit contributions
 conclusions-0.005
arding-0.005
 tighter-0.005
 sidelines-0.005
 unnecessary-0.005
Token
Token ID198
Feature activation+1.08e-02

Pos logit contributions
te+0.009
 owned+0.006
 ti+0.006
 Sp+0.006
 once+0.006
Neg logit contributions
 conclusions-0.024
arding-0.024
 tighter-0.024
 sidelines-0.023
 unnecessary-0.023
TokenAs
Token ID1722
Feature activation-6.95e-02

Pos logit contributions
te+0.014
 lived+0.011
 owned+0.011
omo+0.011
z+0.011
Neg logit contributions
 sidelines-0.019
 unnecessary-0.019
 conclusions-0.019
 trigger-0.019
arding-0.019
Token car
Token ID1097
Feature activation+5.12e-03

Pos logit contributions
te+0.000
 ti+0.000
 Sp+0.000
z+0.000
 Ske+0.000
Neg logit contributions
 tighter-0.001
arding-0.001
 bullies-0.001
 conclusions-0.001
 sidelines-0.001
Token was
Token ID373
Feature activation+1.61e-02

Pos logit contributions
te+0.005
 ti+0.005
 Sp+0.004
 Esk+0.004
omo+0.004
Neg logit contributions
 officers-0.010
arding-0.009
ising-0.009
 bullies-0.009
 tighter-0.009
Token red
Token ID2266
Feature activation+5.20e-02

Pos logit contributions
te+0.023
 once+0.019
 Sp+0.019
 ti+0.019
 owned+0.019
Neg logit contributions
 tighter-0.025
 sidelines-0.024
arding-0.024
 bullies-0.024
 unnecessary-0.024
Token and
Token ID290
Feature activation-1.22e-02

Pos logit contributions
te+0.047
 once+0.040
 He+0.037
 Sp+0.035
 Ske+0.035
Neg logit contributions
 conclusions-0.114
 unnecessary-0.112
arding-0.110
 sidelines-0.108
 tighter-0.106
Token the
Token ID262
Feature activation+4.07e-03

Pos logit contributions
 tighter+0.019
 sidelines+0.018
arding+0.017
 weeds+0.017
 officers+0.017
Neg logit contributions
te-0.019
rd-0.016
 once-0.015
 ti-0.015
gh-0.014
Token other
Token ID584
Feature activation-8.54e-02

Pos logit contributions
te+0.008
 owned+0.007
 once+0.007
 Sp+0.007
 lived+0.007
Neg logit contributions
 conclusions-0.004
arding-0.004
 tighter-0.004
 sidelines-0.004
 unnecessary-0.004
Token car
Token ID1097
Feature activation+4.08e-03

Pos logit contributions
 tighter+0.159
arding+0.159
 conclusions+0.157
 sidelines+0.155
ising+0.153
Neg logit contributions
te-0.096
 ti-0.075
 Sp-0.071
 owned-0.069
 Ske-0.069
Token was
Token ID373
Feature activation+1.52e-02

Pos logit contributions
te+0.004
 ti+0.003
 Sp+0.003
rd+0.003
omo+0.003
Neg logit contributions
 officers-0.008
ising-0.007
 tighter-0.007
arding-0.007
 bullies-0.007
Token blue
Token ID4171
Feature activation+7.35e-02

Pos logit contributions
te+0.024
 Sp+0.020
rd+0.020
 ti+0.020
 lived+0.019
Neg logit contributions
 tighter-0.022
 sidelines-0.020
arding-0.020
 bullies-0.020
 further-0.019
Token.
Token ID13
Feature activation+7.17e-03

Pos logit contributions
te+0.093
 owned+0.074
 Sp+0.072
 lived+0.071
 once+0.070
Neg logit contributions
 tighter-0.129
 conclusions-0.129
arding-0.129
 sidelines-0.127
ising-0.124
Token They
Token ID1119
Feature activation-3.79e-02

Pos logit contributions
te+0.007
z+0.005
 owned+0.005
 ti+0.004
 once+0.004
Neg logit contributions
 sidelines-0.015
 conclusions-0.015
arding-0.015
 tighter-0.015
 bullies-0.014
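Every record in this dump uses the same fixed layout:

- a Token line, where the label is fused directly with the token text and any leading space is kept;
- a Token ID line;
- a Feature activation line;
- a "Pos logit contributions" list and a "Neg logit contributions" list, with each entry written as the token followed by a signed value.

Below is a minimal Python sketch for loading such a dump so records can be sorted or filtered. It assumes the export is plain text with exactly this line layout. The names TokenRecord and parse_dump are hypothetical helpers, not part of any dashboard tool. Newline tokens (ID 198) lose their text in this export, so they come back as an empty string.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# A logit entry is "<token text><signed decimal>", e.g. " tighter-0.057" or "te+0.033".
LOGIT_RE = re.compile(r"^(.*)([+-]\d+\.\d+)$")


@dataclass
class TokenRecord:  # hypothetical container for one dump record
    token: str
    token_id: int
    activation: float = 0.0
    pos: list = field(default_factory=list)  # [(token, contribution), ...]
    neg: list = field(default_factory=list)


def parse_dump(text: str) -> list[TokenRecord]:
    """Parse the plain-text dump; lines before the first full record are skipped."""
    lines = text.splitlines()
    records: list[TokenRecord] = []
    section = None  # "pos", "neg", or None
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A record starts with "Token<text>" immediately followed by "Token ID<n>".
        if line.startswith("Token") and nxt.startswith("Token ID"):
            records.append(TokenRecord(token=line[len("Token"):],
                                       token_id=int(nxt[len("Token ID"):])))
            section = None
            i += 2
            continue
        if line.startswith("Feature activation") and records:
            records[-1].activation = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            section = "pos"
        elif line == "Neg logit contributions":
            section = "neg"
        elif section and records:
            m = LOGIT_RE.match(line)
            if m:
                getattr(records[-1], section).append((m.group(1), float(m.group(2))))
        i += 1
    return records
```

For example, sorted(parse_dump(text), key=lambda r: r.activation, reverse=True)[:5] lists the most strongly activating tokens in a chunk. Reading the dump sequentially, instead of matching by line prefix alone, avoids confusing a Token line with the Token ID line that follows it.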