TOP ACTIVATIONS
MAX = 0.101

just like in a fair yt ale ! <|endoftext|> Once upon
Bang ! You 're under arrest !" Their mom
and it shattered into a million pieces . Lily was so
The harbor was forever known as the Sharing Harbor -
o had died from his injuries . His friends were very
giant ! You are under arrest ! You can 't take
big farm a long time ago . Lily listened as the
many more hunts and became known as the smartest , most
, " You are under arrest for harassing and stealing from
f ortunate y , her injuries were so severe , she
robber ! You are under arrest !" Lily shouts , holding
pocket and broke into a million pieces . Tim my 's
" I feel like a million bucks !" And just like
was walking through a fair yt ale . She fell asleep
best friend a long time ago . He was my dog
, the ancient task was known to be solved by the
rot was the happiest he had ever been . <|endoftext|> Once
" and " la la la ". Mom is
and it broke into a million pieces . Lily was very
it away a long time ago . It is not good

MIN ACTIVATIONS
MIN = -0.103

and some soap . Lily dipped a cloth in the water
and a wand . She dips the wand in the soap
paint . First , she dipped the brush into a jar
decided to cheat . He dipped his claws in ink and
" Yes please !" Dad dipped a brush into the paint
make one too . He dips his brush in the red
a big brush . She dipped the brush in the yellow
for making bubbles . She dipped the wand into the bottle
a nearby pond . She dipped her be ak in the
special black bowl . She dipped her brush in and started
M um gave me a big cardboard box and
ran to the shore and dipped their feet in the cool
lake to soak . He dipped his paws in the water
up a carrot stick and dips it in a white sauce
He took the teaspoon and dipped it in red paint .
to the stream . She dips her hand in the water
was so happy ! He dipped his brush into the bucket
and a brush . She dips the brush in the water
wet ." They dipped their fingers in the water
a big paper . He dipped it in blue paint and

INTERVAL 0.050 - 0.101
CONTAINS 0.792%

there was a little girl named Lily . She loved Christmas
giant didn 't know what to do to make her feel
dragon and flew away . From that day on , whenever
there was a little girl named Lily . She loved to
would heal him . Tom never felt sad for long because

INTERVAL -0.001 - 0.050
CONTAINS 51.823%

J enny nodded but she didn 't know what Mr
!" They both stretched their arms up high and touched their
was a tall bear . He liked to slide down hills
there was a country far , far away . Everyone in
wild place with many trees and flowers . Every day ,

INTERVAL -0.052 - -0.001
CONTAINS 46.928%

Mum . She poured the tea and put the cup on
play with my doll ?" Em ma agreed ,
. She asked , " Can we see the big planes
and so he decided to stay and talk to the monster
You are always compassionate to us and to others ."

INTERVAL -0.103 - -0.052
CONTAINS 0.458%

." They hug each other . They are sorry
Jack pushed the button and something amazing happened . A beautiful
from Ben 's hand and put it back on the ground
, I am the Guardian of
, That was so y ummy
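
The four INTERVAL blocks above split the range from MIN (-0.103) to MAX (0.101) into equal-width bins of 0.051 and report the share of token activations in each. A minimal sketch of how such shares could be computed is shown here. It assumes the activations are available as a flat list of floats, which the dump does not state. The function name `interval_shares` and the sample values are hypothetical and not taken from the dashboard.

```python
from typing import List, Tuple


def interval_shares(
    activations: List[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Return (low, high, percent) for equal-width bins spanning min..max."""
    lo, hi = min(activations), max(activations)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp the index so the maximum value lands in the last bin;
        # if all values are equal (width == 0), everything goes in bin 0.
        idx = min(int((a - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    total = len(activations)
    return [
        (lo + i * width, lo + (i + 1) * width, 100.0 * counts[i] / total)
        for i in range(n_bins)
    ]


# Hypothetical activations, chosen only so that min/max match the dump.
sample = [-0.103, -0.030, -0.010, 0.005, 0.020, 0.040, 0.101, 0.060]
for low, high, pct in interval_shares(sample):
    print(f"INTERVAL {low:.3f} - {high:.3f}  CONTAINS {pct:.3f}%")
```

With real data the two middle bins would dominate, as in the dump (51.823% and 46.928%), because most activations sit near zero.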
Token just
Token ID655
Feature activation+3.65e-02

Pos logit contributions
 loser+0.036
 arrest+0.035
 fourteen+0.033
 eighteen+0.033
OO+0.033
Neg logit contributions
 up-0.020
 around-0.018
 outside-0.017
 else-0.016
 closer-0.016
Token like
Token ID588
Feature activation-1.20e-02

Pos logit contributions
 arrest+0.069
 loser+0.065
 fourteen+0.060
 eighteen+0.059
 sixteen+0.058
Neg logit contributions
 up-0.051
 else-0.046
 around-0.045
 outside-0.045
uggle-0.044
Token in
Token ID287
Feature activation+1.24e-02

Pos logit contributions
 up+0.014
 else+0.013
 outside+0.012
 around+0.012
 carefully+0.011
Neg logit contributions
 loser-0.025
 arrest-0.024
 eighteen-0.023
 fourteen-0.023
 sixteen-0.023
Token a
Token ID257
Feature activation+9.66e-03

Pos logit contributions
EEP+0.027
 loser+0.026
 arrest+0.026
 eighteen+0.026
OO+0.025
Neg logit contributions
 up-0.013
 else-0.012
 outside-0.012
 around-0.011
 special-0.010
Token fair
Token ID3148
Feature activation+1.64e-02

Pos logit contributions
 arrest+0.019
 loser+0.017
 fourteen+0.017
 no+0.017
 eighteen+0.016
Neg logit contributions
 up-0.012
 else-0.012
uggle-0.011
 outside-0.011
 closer-0.011
Tokenyt
Token ID20760
Feature activation+1.01e-01

Pos logit contributions
 arrest+0.032
 loser+0.032
 fourteen+0.032
 eighteen+0.030
 sixteen+0.030
Neg logit contributions
 up-0.019
 outside-0.018
 else-0.018
 around-0.017
fish-0.017
Tokenale
Token ID1000
Feature activation+3.37e-02

Pos logit contributions
 loser+0.214
 arrest+0.207
 no+0.204
 Leopard+0.189
 letting+0.188
Neg logit contributions
 up-0.110
 closely-0.102
uggle-0.100
 around-0.099
 carefully-0.094
Token!
Token ID0
Feature activation+2.15e-03

Pos logit contributions
 arrest+0.067
 loser+0.065
 fourteen+0.064
 eighteen+0.062
 sixteen+0.061
Neg logit contributions
 up-0.040
 else-0.039
 outside-0.038
 around-0.037
in-0.034
Token<|endoftext|>
Token ID50256
Feature activation-3.01e-02

Pos logit contributions
 arrest+0.005
 loser+0.005
 fourteen+0.005
 no+0.005
 eighteen+0.005
Neg logit contributions
 up-0.002
uggle-0.001
 else-0.001
 around-0.001
 compliments-0.001
TokenOnce
Token ID7454
Feature activation-2.40e-02

Pos logit contributions
 up+0.029
 else+0.028
uggle+0.027
 compliments+0.026
fish+0.024
Neg logit contributions
 arrest-0.064
 loser-0.061
 no-0.060
 fourteen-0.058
 enough-0.057
Token upon
Token ID2402
Feature activation+2.16e-02

Pos logit contributions
 up+0.029
uggle+0.029
 else+0.028
 compliments+0.027
Hop+0.026
Neg logit contributions
 arrest-0.044
 no-0.042
 loser-0.042
 fourteen-0.040
 enough-0.039
Token Bang
Token ID9801
Feature activation-6.45e-03

Pos logit contributions
 arrest+0.047
ross+0.047
 enough+0.045
 fourteen+0.044
 eighteen+0.044
Neg logit contributions
 up-0.047
uggle-0.044
 else-0.043
 closer-0.040
 outside-0.040
Token!
Token ID0
Feature activation+2.82e-02

Pos logit contributions
 up+0.009
 else+0.008
in+0.008
 outside+0.008
 around+0.007
Neg logit contributions
 arrest-0.012
 loser-0.011
 no-0.011
 eighteen-0.010
 fourteen-0.010
Token You
Token ID921
Feature activation-4.69e-03

Pos logit contributions
 arrest+0.040
ross+0.040
 enough+0.039
 fourteen+0.037
 eighteen+0.037
Neg logit contributions
 up-0.043
uggle-0.042
 else-0.041
 closer-0.038
Sure-0.037
Token're
Token ID821
Feature activation+1.21e-02

Pos logit contributions
 else+0.007
 up+0.006
 around+0.006
 outside+0.006
 closer+0.006
Neg logit contributions
 arrest-0.008
 loser-0.008
ross-0.007
 eighteen-0.007
 fourteen-0.007
Token under
Token ID739
Feature activation+1.56e-02

Pos logit contributions
 arrest+0.022
 loser+0.020
 fourteen+0.020
ross+0.019
 eighteen+0.019
Neg logit contributions
 up-0.018
 else-0.016
 outside-0.015
 around-0.015
 closer-0.014
Token arrest
Token ID3251
Feature activation+9.82e-02

Pos logit contributions
 loser+0.015
ross+0.013
 nineteen+0.013
 seventeen+0.013
 thirteen+0.013
Neg logit contributions
 up-0.035
 outside-0.034
 around-0.032
 else-0.031
 bubbles-0.030
Token!"
Token ID2474
Feature activation-2.62e-02

Pos logit contributions
 loser+0.185
 thirteen+0.173
 ago+0.170
 Have+0.168
 vain+0.168
Neg logit contributions
 up-0.138
 around-0.121
 outside-0.118
 else-0.107
le-0.097
Token\n
Token ID198
Feature activation-9.12e-03

Pos logit contributions
 up+0.025
 else+0.023
 around+0.021
uggle+0.020
 outside+0.020
Neg logit contributions
 arrest-0.060
 loser-0.058
 fourteen-0.054
 eighteen-0.054
 no-0.053
Token\n
Token ID198
Feature activation+3.69e-03

Pos logit contributions
 up+0.011
uggle+0.011
 compliments+0.010
 around+0.010
fish+0.010
Neg logit contributions
 arrest-0.017
 no-0.016
 loser-0.016
 fourteen-0.015
 sixteen-0.015
TokenTheir
Token ID14574
Feature activation+2.15e-02

Pos logit contributions
 arrest+0.006
 loser+0.006
 no+0.006
 fourteen+0.005
 eighteen+0.005
Neg logit contributions
 up-0.006
uggle-0.005
 else-0.005
 around-0.005
 compliments-0.005
Token mom
Token ID1995
Feature activation+1.27e-02

Pos logit contributions
 arrest+0.044
 loser+0.041
 no+0.040
 fourteen+0.039
 eighteen+0.038
Neg logit contributions
 up-0.024
uggle-0.024
 else-0.023
 around-0.022
 compliments-0.021
Token and
Token ID290
Feature activation-6.81e-03

Pos logit contributions
 arrest+0.039
 loser+0.038
 eighteen+0.037
 fourteen+0.036
 nineteen+0.035
Neg logit contributions
 up-0.020
 around-0.018
 outside-0.017
 else-0.017
 carefully-0.015
Token it
Token ID340
Feature activation-7.45e-03

Pos logit contributions
 up+0.009
 else+0.008
 around+0.008
 outside+0.007
uggle+0.007
Neg logit contributions
 loser-0.013
 arrest-0.013
 eighteen-0.013
 fourteen-0.012
 sixteen-0.012
Token shattered
Token ID23273
Feature activation+3.85e-02

Pos logit contributions
 up+0.011
 else+0.010
 around+0.010
 outside+0.010
 closer+0.009
Neg logit contributions
 arrest-0.013
 loser-0.013
 eighteen-0.012
 fourteen-0.012
 sixteen-0.012
Token into
Token ID656
Feature activation+2.15e-02

Pos logit contributions
 arrest+0.098
 loser+0.096
 eighteen+0.093
 nineteen+0.089
 fourteen+0.089
Neg logit contributions
 up-0.038
 around-0.031
 outside-0.029
 inside-0.022
 carefully-0.022
Token a
Token ID257
Feature activation+3.61e-03

Pos logit contributions
 arrest+0.041
 loser+0.040
 eighteen+0.039
 fourteen+0.038
 nineteen+0.038
Neg logit contributions
 up-0.025
 else-0.025
 outside-0.022
 around-0.022
 compliments-0.022
Token million
Token ID1510
Feature activation+9.70e-02

Pos logit contributions
 arrest+0.004
 loser+0.004
 no+0.004
 fourteen+0.004
 eighteen+0.003
Neg logit contributions
 up-0.007
 else-0.007
uggle-0.007
 around-0.007
 compliments-0.007
Token pieces
Token ID5207
Feature activation+3.05e-02

Pos logit contributions
 arrest+0.171
 loser+0.166
 fourteen+0.158
 eighteen+0.158
 no+0.154
Neg logit contributions
 up-0.130
 else-0.130
 compliments-0.119
 around-0.118
uggle-0.116
Token.
Token ID13
Feature activation-9.53e-03

Pos logit contributions
 arrest+0.066
 loser+0.066
 eighteen+0.062
 fourteen+0.060
 sixteen+0.059
Neg logit contributions
 up-0.032
 else-0.032
 around-0.031
 outside-0.030
 closer-0.026
Token Lily
Token ID20037
Feature activation-1.51e-02

Pos logit contributions
 up+0.011
 else+0.010
uggle+0.010
 around+0.009
 compliments+0.009
Neg logit contributions
 arrest-0.019
 loser-0.018
 no-0.018
 fourteen-0.018
 eighteen-0.017
Token was
Token ID373
Feature activation-8.66e-03

Pos logit contributions
 up+0.015
 else+0.014
 around+0.013
uggle+0.013
 outside+0.012
Neg logit contributions
 arrest-0.033
 loser-0.032
 fourteen-0.031
 eighteen-0.030
 sixteen-0.030
Token so
Token ID523
Feature activation-4.19e-02

Pos logit contributions
 up+0.012
 else+0.011
uggle+0.011
 around+0.010
 compliments+0.010
Neg logit contributions
 arrest-0.016
 loser-0.015
 no-0.014
 fourteen-0.014
 eighteen-0.014
Token\n
Token ID198
Feature activation-1.97e-04

Pos logit contributions
 up+0.012
 else+0.011
uggle+0.011
 compliments+0.010
fish+0.010
Neg logit contributions
 arrest-0.028
 loser-0.027
 no-0.026
 fourteen-0.026
 eighteen-0.025
TokenThe
Token ID464
Feature activation+1.65e-02

Pos logit contributions
 up+0.000
uggle+0.000
 closer+0.000
 compliments+0.000
uddle+0.000
Neg logit contributions
 arrest-0.000
 no-0.000
 fourteen-0.000
 eighteen-0.000
 eight-0.000
Token harbor
Token ID25451
Feature activation+3.23e-02

Pos logit contributions
 arrest+0.029
 loser+0.028
 no+0.027
 fourteen+0.026
 eighteen+0.026
Neg logit contributions
uggle-0.023
 else-0.022
 up-0.022
 outside-0.020
 around-0.020
Token was
Token ID373
Feature activation+1.51e-02

Pos logit contributions
 arrest+0.072
 loser+0.069
 fourteen+0.066
 no+0.066
 eighteen+0.065
Neg logit contributions
 up-0.031
 else-0.028
uggle-0.027
 around-0.026
 compliments-0.025
Token forever
Token ID8097
Feature activation+5.80e-02

Pos logit contributions
 arrest+0.030
 loser+0.028
 fourteen+0.026
EEP+0.025
 eighteen+0.025
Neg logit contributions
 up-0.021
 around-0.019
 else-0.018
 outside-0.018
 closer-0.017
Token known
Token ID1900
Feature activation+9.66e-02

Pos logit contributions
 loser+0.111
 arrest+0.109
 fourteen+0.104
 eighteen+0.102
 eleven+0.101
Neg logit contributions
 up-0.074
 else-0.069
 outside-0.066
 around-0.066
 compliments-0.063
Token as
Token ID355
Feature activation+1.48e-02

Pos logit contributions
 arrest+0.181
 loser+0.167
 fourteen+0.159
 eighteen+0.155
 sixteen+0.155
Neg logit contributions
 up-0.139
 else-0.126
 around-0.124
 closer-0.120
 outside-0.119
Token the
Token ID262
Feature activation+4.15e-02

Pos logit contributions
 arrest+0.031
 loser+0.030
 fourteen+0.028
 eighteen+0.028
 sixteen+0.027
Neg logit contributions
 up-0.018
 else-0.016
 around-0.015
 closer-0.015
 compliments-0.014
Token Sharing
Token ID36644
Feature activation-2.13e-02

Pos logit contributions
 arrest+0.082
 loser+0.078
 no+0.076
 fourteen+0.074
 eighteen+0.072
Neg logit contributions
 up-0.049
 else-0.047
uggle-0.047
 around-0.043
 compliments-0.043
Token Harbor
Token ID21146
Feature activation-5.57e-03

Pos logit contributions
 up+0.026
 else+0.024
 around+0.023
uggle+0.023
 outside+0.022
Neg logit contributions
 arrest-0.043
 loser-0.040
 fourteen-0.038
 eighteen-0.038
 no-0.038
Token -
Token ID532
Feature activation+1.94e-02

Pos logit contributions
 up+0.007
 else+0.006
 around+0.006
fish+0.006
uggle+0.006
Neg logit contributions
 arrest-0.011
 loser-0.011
 fourteen-0.010
 no-0.010
 eighteen-0.010
Tokeno
Token ID78
Feature activation-3.41e-02

Pos logit contributions
 up+0.052
 else+0.046
uggle+0.046
 around+0.045
 outside+0.043
Neg logit contributions
 arrest-0.089
 loser-0.085
 no-0.081
 fourteen-0.081
 eighteen-0.080
Token had
Token ID550
Feature activation+3.38e-02

Pos logit contributions
 up+0.041
 else+0.037
 around+0.036
 outside+0.033
 quietly+0.033
Neg logit contributions
 arrest-0.069
 loser-0.068
 eighteen-0.063
 fourteen-0.063
 eleven-0.063
Token died
Token ID3724
Feature activation+3.69e-02

Pos logit contributions
 loser+0.064
 nineteen+0.063
 arrest+0.062
ross+0.060
 eighteen+0.060
Neg logit contributions
 up-0.041
 else-0.040
 outside-0.038
 around-0.038
 compliments-0.034
Token from
Token ID422
Feature activation+3.55e-02

Pos logit contributions
 arrest+0.080
 loser+0.075
 fourteen+0.071
 eighteen+0.071
 sixteen+0.071
Neg logit contributions
 up-0.044
 around-0.037
 outside-0.037
 else-0.036
uggle-0.035
Token his
Token ID465
Feature activation+3.34e-02

Pos logit contributions
 arrest+0.064
 loser+0.060
 no+0.058
 fourteen+0.057
 eighteen+0.055
Neg logit contributions
 up-0.048
 else-0.046
uggle-0.046
 around-0.043
 compliments-0.043
Token injuries
Token ID6821
Feature activation+9.62e-02

Pos logit contributions
 arrest+0.065
 loser+0.061
 no+0.059
 fourteen+0.059
 eighteen+0.057
Neg logit contributions
 up-0.041
 else-0.039
uggle-0.038
 around-0.035
 outside-0.035
Token.
Token ID13
Feature activation+6.04e-03

Pos logit contributions
 arrest+0.231
 loser+0.225
 eighteen+0.220
 fourteen+0.217
 sixteen+0.215
Neg logit contributions
 up-0.081
 around-0.077
 outside-0.077
 else-0.069
uggle-0.061
Token His
Token ID2399
Feature activation+1.33e-02

Pos logit contributions
 arrest+0.010
 loser+0.010
 fourteen+0.009
 eighteen+0.009
 eleven+0.009
Neg logit contributions
 up-0.009
uggle-0.008
 else-0.008
 around-0.008
 outside-0.008
Token friends
Token ID2460
Feature activation+1.93e-02

Pos logit contributions
 arrest+0.026
 loser+0.025
 no+0.024
 fourteen+0.023
 eighteen+0.023
Neg logit contributions
 up-0.016
uggle-0.015
 else-0.015
 around-0.014
 compliments-0.014
Token were
Token ID547
Feature activation+4.98e-03

Pos logit contributions
 arrest+0.038
 loser+0.037
 fourteen+0.036
 eighteen+0.035
 sixteen+0.035
Neg logit contributions
 up-0.024
 else-0.023
 around-0.022
 outside-0.021
 quietly-0.020
Token very
Token ID845
Feature activation-1.36e-02

Pos logit contributions
 arrest+0.009
 loser+0.009
 fourteen+0.008
 no+0.008
 eighteen+0.008
Neg logit contributions
 up-0.007
 else-0.006
uggle-0.006
 around-0.006
 outside-0.006
Token giant
Token ID6175
Feature activation+2.66e-02

Pos logit contributions
 arrest+0.051
 sixteen+0.048
tern+0.047
 eighteen+0.047
 fourteen+0.047
Neg logit contributions
 else-0.043
 up-0.042
 closer-0.039
 outside-0.037
in-0.036
Token!
Token ID0
Feature activation+3.05e-02

Pos logit contributions
 sixteen+0.057
 eighteen+0.057
 loser+0.056
 nineteen+0.055
 seventeen+0.055
Neg logit contributions
 up-0.025
 else-0.025
 outside-0.023
in-0.023
 around-0.021
Token You
Token ID921
Feature activation+7.15e-04

Pos logit contributions
 eighteen+0.042
 enough+0.041
ross+0.041
 nineteen+0.040
 sixteen+0.040
Neg logit contributions
uggle-0.047
 up-0.046
ore-0.045
 else-0.044
in-0.044
Token are
Token ID389
Feature activation+4.32e-03

Pos logit contributions
 eighteen+0.001
ross+0.001
 loser+0.001
 sixteen+0.001
 fourteen+0.001
Neg logit contributions
 else-0.001
 up-0.001
 around-0.001
uggle-0.001
 outside-0.001
Token under
Token ID739
Feature activation+1.24e-02

Pos logit contributions
 arrest+0.007
 loser+0.007
 fourteen+0.007
 paragraph+0.007
 eighteen+0.007
Neg logit contributions
 up-0.007
 else-0.006
 outside-0.006
 around-0.006
 closer-0.005
Token arrest
Token ID3251
Feature activation+9.61e-02

Pos logit contributions
 loser+0.010
 nineteen+0.010
 seventeen+0.010
 eighteen+0.010
 thirteen+0.009
Neg logit contributions
 outside-0.029
 up-0.029
 around-0.027
 else-0.026
 inside-0.025
Token!
Token ID0
Feature activation+2.15e-02

Pos logit contributions
 loser+0.204
 seventeen+0.197
 eighteen+0.196
 thirteen+0.196
 sixteen+0.194
Neg logit contributions
 up-0.121
 around-0.102
 outside-0.099
 else-0.077
 inside-0.072
Token You
Token ID921
Feature activation-1.43e-03

Pos logit contributions
 eighteen+0.033
 enough+0.032
 nineteen+0.031
 sixteen+0.030
 fourteen+0.030
Neg logit contributions
uggle-0.030
ore-0.030
 up-0.029
 else-0.027
 outside-0.026
Token can
Token ID460
Feature activation-1.05e-02

Pos logit contributions
 else+0.002
 up+0.002
 around+0.002
 outside+0.002
uggle+0.002
Neg logit contributions
 eighteen-0.003
ross-0.002
 loser-0.002
 fourteen-0.002
 sixteen-0.002
Token't
Token ID470
Feature activation+2.21e-02

Pos logit contributions
 up+0.020
 else+0.019
 compliments+0.019
uggle+0.019
 around+0.019
Neg logit contributions
 arrest-0.014
 loser-0.012
 no-0.012
 fourteen-0.011
 eighteen-0.010
Token take
Token ID1011
Feature activation+2.07e-02

Pos logit contributions
 arrest+0.041
 loser+0.038
 no+0.037
 fourteen+0.035
 eighteen+0.034
Neg logit contributions
 up-0.030
uggle-0.028
 else-0.028
 compliments-0.027
 around-0.026
Token big
Token ID1263
Feature activation-2.03e-03

Pos logit contributions
 arrest+0.002
 loser+0.002
 fourteen+0.002
 no+0.002
 eighteen+0.002
Neg logit contributions
 up-0.001
 else-0.001
uggle-0.001
 compliments-0.001
 around-0.001
Token farm
Token ID5318
Feature activation+3.17e-02

Pos logit contributions
 up+0.002
 else+0.002
 outside+0.002
 closer+0.002
uggle+0.002
Neg logit contributions
 arrest-0.004
 fourteen-0.004
 loser-0.004
ross-0.004
 eighteen-0.004
Token a
Token ID257
Feature activation+1.67e-02

Pos logit contributions
 arrest+0.065
 loser+0.061
 fourteen+0.059
 eighteen+0.058
 no+0.057
Neg logit contributions
 up-0.037
 else-0.034
uggle-0.033
 around-0.033
 outside-0.032
Token long
Token ID890
Feature activation+5.09e-02

Pos logit contributions
 arrest+0.035
 loser+0.033
 no+0.032
 fourteen+0.032
 eighteen+0.031
Neg logit contributions
 up-0.017
 else-0.017
uggle-0.016
 around-0.015
 outside-0.015
Token time
Token ID640
Feature activation+1.83e-02

Pos logit contributions
 arrest+0.095
 fourteen+0.093
 loser+0.092
 sixteen+0.090
 eighteen+0.088
Neg logit contributions
 up-0.060
 outside-0.059
 around-0.058
uggle-0.052
 outdoors-0.052
Token ago
Token ID2084
Feature activation+9.61e-02

Pos logit contributions
 arrest+0.028
 loser+0.027
 eighteen+0.025
 fourteen+0.025
 sixteen+0.024
Neg logit contributions
 up-0.030
 outside-0.029
 around-0.029
 else-0.029
 quietly-0.027
Token.
Token ID13
Feature activation-1.60e-03

Pos logit contributions
 arrest+0.234
 loser+0.233
 fourteen+0.229
 thirteen+0.227
 eighteen+0.226
Neg logit contributions
 up-0.070
 outside-0.062
 around-0.059
 else-0.046
fish-0.044
Token Lily
Token ID20037
Feature activation-1.79e-02

Pos logit contributions
 up+0.002
 else+0.002
uggle+0.002
 around+0.002
 compliments+0.002
Neg logit contributions
 arrest-0.003
 loser-0.003
 fourteen-0.003
 no-0.003
 eighteen-0.003
Token listened
Token ID16399
Feature activation-1.33e-02

Pos logit contributions
 up+0.021
uggle+0.021
 else+0.020
 outside+0.018
 around+0.018
Neg logit contributions
 arrest-0.035
 no-0.033
 loser-0.033
 fourteen-0.031
 enough-0.030
Token as
Token ID355
Feature activation-1.95e-02

Pos logit contributions
uggle+0.016
 up+0.016
 else+0.016
 compliments+0.015
fish+0.014
Neg logit contributions
 arrest-0.025
 no-0.023
 loser-0.023
 fourteen-0.023
 enough-0.022
Token the
Token ID262
Feature activation+1.71e-02

Pos logit contributions
 up+0.022
 else+0.020
uggle+0.019
 around+0.019
 outside+0.018
Neg logit contributions
 arrest-0.041
 loser-0.039
 eighteen-0.038
 fourteen-0.038
 sixteen-0.037
Token many
Token ID867
Feature activation+4.06e-02

Pos logit contributions
 arrest+0.018
 loser+0.017
 no+0.016
 fourteen+0.016
 eighteen+0.015
Neg logit contributions
 up-0.012
 else-0.011
uggle-0.011
 around-0.011
 compliments-0.010
Token more
Token ID517
Feature activation+8.08e-03

Pos logit contributions
 arrest+0.080
 loser+0.075
 no+0.074
 fourteen+0.072
 eighteen+0.070
Neg logit contributions
 up-0.047
uggle-0.045
 else-0.044
 compliments-0.041
 around-0.041
Token hunts
Token ID42984
Feature activation+2.08e-02

Pos logit contributions
 arrest+0.017
 loser+0.016
 no+0.016
 fourteen+0.016
 eighteen+0.015
Neg logit contributions
 up-0.008
 else-0.008
uggle-0.008
 around-0.007
 compliments-0.007
Token and
Token ID290
Feature activation+2.01e-02

Pos logit contributions
 arrest+0.042
 loser+0.041
 fourteen+0.039
 eighteen+0.038
 sixteen+0.038
Neg logit contributions
 up-0.025
 else-0.023
 around-0.023
 outside-0.022
uggle-0.022
Token became
Token ID2627
Feature activation+3.16e-02

Pos logit contributions
 arrest+0.037
 loser+0.035
 fourteen+0.034
 sixteen+0.032
 eighteen+0.032
Neg logit contributions
 up-0.028
 else-0.027
 around-0.025
uggle-0.025
 compliments-0.025
Token known
Token ID1900
Feature activation+9.49e-02

Pos logit contributions
 arrest+0.064
 loser+0.059
 fourteen+0.058
 eighteen+0.056
 sixteen+0.056
Neg logit contributions
 up-0.039
 else-0.036
 around-0.034
uggle-0.033
 closer-0.033
Token as
Token ID355
Feature activation+8.30e-03

Pos logit contributions
 arrest+0.176
 loser+0.164
 fourteen+0.158
 eighteen+0.153
 sixteen+0.152
Neg logit contributions
 up-0.136
 around-0.123
 else-0.123
 outside-0.117
 closer-0.117
Token the
Token ID262
Feature activation+4.45e-02

Pos logit contributions
 arrest+0.017
 loser+0.017
 fourteen+0.016
 eighteen+0.016
 sixteen+0.015
Neg logit contributions
 up-0.010
 else-0.009
 around-0.008
 closer-0.008
 outside-0.008
Token smartest
Token ID44730
Feature activation+3.44e-02

Pos logit contributions
 arrest+0.079
 loser+0.075
 no+0.072
 fourteen+0.070
 eighteen+0.069
Neg logit contributions
 up-0.062
 else-0.059
uggle-0.059
 around-0.055
 compliments-0.055
Token,
Token ID11
Feature activation+3.98e-02

Pos logit contributions
 arrest+0.070
 eighteen+0.068
 fourteen+0.068
 sixteen+0.068
 loser+0.067
Neg logit contributions
 up-0.040
 around-0.037
 else-0.037
 outside-0.035
in-0.035
Token most
Token ID749
Feature activation+2.98e-02

Pos logit contributions
 arrest+0.090
 loser+0.089
 fourteen+0.087
 eighteen+0.084
 nineteen+0.083
Neg logit contributions
 up-0.036
 else-0.034
 around-0.033
 outside-0.031
 compliments-0.029
Token,
Token ID11
Feature activation+6.85e-03

Pos logit contributions
 else+0.008
 up+0.008
 around+0.007
 closer+0.007
 quietly+0.007
Neg logit contributions
 arrest-0.014
 loser-0.013
ACK-0.012
 fourteen-0.012
 sixteen-0.012
Token "
Token ID366
Feature activation-1.41e-03

Pos logit contributions
 arrest+0.014
 loser+0.013
ross+0.012
 fourteen+0.011
 eighteen+0.011
Neg logit contributions
 up-0.009
 else-0.008
 quietly-0.008
 around-0.008
 outside-0.008
TokenYou
Token ID1639
Feature activation-1.61e-02

Pos logit contributions
 else+0.002
 up+0.002
Hop+0.002
in+0.002
uggle+0.001
Neg logit contributions
 arrest-0.003
 loser-0.003
 no-0.003
 sixteen-0.003
 eighteen-0.002
Token are
Token ID389
Feature activation+2.38e-03

Pos logit contributions
 else+0.024
 up+0.022
 around+0.020
 compliments+0.019
 closer+0.019
Neg logit contributions
 arrest-0.027
 loser-0.027
 eighteen-0.027
 fourteen-0.026
ross-0.026
Token under
Token ID739
Feature activation+1.14e-02

Pos logit contributions
 arrest+0.005
 loser+0.004
 fourteen+0.004
 paragraph+0.004
 eighteen+0.004
Neg logit contributions
 up-0.003
 else-0.003
 outside-0.003
 around-0.003
 closer-0.003
Token arrest
Token ID3251
Feature activation+9.42e-02

Pos logit contributions
 nineteen+0.013
 loser+0.013
 eighteen+0.012
 seventeen+0.012
 thirteen+0.012
Neg logit contributions
 up-0.024
 outside-0.024
 around-0.022
 else-0.021
 comfort-0.020
Token for
Token ID329
Feature activation+1.53e-02

Pos logit contributions
 loser+0.199
 sixteen+0.193
 eighteen+0.191
 seventeen+0.191
 thirteen+0.191
Neg logit contributions
 up-0.114
 around-0.099
 outside-0.094
 else-0.081
le-0.072
Token harassing
Token ID34110
Feature activation+3.57e-02

Pos logit contributions
 loser+0.032
tern+0.030
ross+0.030
 fourteen+0.030
EEP+0.029
Neg logit contributions
 else-0.015
 up-0.015
 outside-0.015
 comfort-0.014
 closer-0.013
Token and
Token ID290
Feature activation+2.05e-02

Pos logit contributions
 arrest+0.070
 nineteen+0.067
 fourteen+0.066
 eighteen+0.065
 sixteen+0.065
Neg logit contributions
 up-0.043
 around-0.040
 outside-0.039
 else-0.036
 quietly-0.035
Token stealing
Token ID16568
Feature activation+1.07e-02

Pos logit contributions
 arrest+0.033
 loser+0.032
 fourteen+0.031
 sixteen+0.030
 eighteen+0.030
Neg logit contributions
 up-0.032
 else-0.030
 outside-0.028
 around-0.028
 closer-0.027
Token from
Token ID422
Feature activation+2.51e-02

Pos logit contributions
 arrest+0.020
 loser+0.019
 eighteen+0.018
 fourteen+0.018
 sixteen+0.018
Neg logit contributions
 up-0.014
 else-0.013
 around-0.012
 outside-0.012
uggle-0.012
Tokenf
Token ID69
Feature activation+2.69e-02

Pos logit contributions
 arrest+0.024
 loser+0.022
 fourteen+0.021
 no+0.021
 eighteen+0.020
Neg logit contributions
 else-0.020
 up-0.020
uggle-0.020
in-0.019
 closer-0.018
Tokenortunate
Token ID13651
Feature activation+1.31e-02

Pos logit contributions
 arrest+0.053
 loser+0.049
 eighteen+0.048
 nineteen+0.048
 no+0.047
Neg logit contributions
 else-0.031
 up-0.031
in-0.031
 around-0.030
 outside-0.029
Tokeny
Token ID88
Feature activation+1.45e-02

Pos logit contributions
 arrest+0.028
 eighteen+0.027
 loser+0.027
 fourteen+0.027
 nineteen+0.027
Neg logit contributions
 up-0.016
 around-0.015
 outside-0.014
 else-0.013
in-0.013
Token,
Token ID11
Feature activation+2.50e-02

Pos logit contributions
 eighteen+0.032
 fourteen+0.031
 sixteen+0.031
 seventeen+0.030
 nineteen+0.030
Neg logit contributions
 up-0.017
 around-0.016
 outside-0.016
uggle-0.015
 else-0.014
Token her
Token ID607
Feature activation+2.69e-02

Pos logit contributions
 arrest+0.050
 loser+0.049
 eighteen+0.048
 fourteen+0.047
 sixteen+0.047
Neg logit contributions
 up-0.030
 else-0.029
 around-0.028
 outside-0.028
 closer-0.025
Token injuries
Token ID6821
Feature activation+9.41e-02

Pos logit contributions
 arrest+0.050
 loser+0.047
 fourteen+0.046
 eighteen+0.046
 sixteen+0.045
Neg logit contributions
 up-0.036
 else-0.034
uggle-0.033
 around-0.032
 outside-0.032
Token were
Token ID547
Feature activation+3.99e-02

Pos logit contributions
 arrest+0.199
 loser+0.190
 eighteen+0.187
 fourteen+0.182
 sixteen+0.181
Neg logit contributions
 up-0.102
 around-0.100
 else-0.099
 outside-0.096
uggle-0.086
Token so
Token ID523
Feature activation-1.96e-02

Pos logit contributions
 arrest+0.071
 loser+0.066
 fourteen+0.061
 eighteen+0.060
EEP+0.059
Neg logit contributions
 up-0.063
 else-0.060
 around-0.058
 outside-0.056
 closer-0.055
Token severe
Token ID6049
Feature activation+4.73e-02

Pos logit contributions
 up+0.024
 else+0.023
uggle+0.022
 around+0.021
 compliments+0.021
Neg logit contributions
 arrest-0.038
 loser-0.036
 fourteen-0.034
 no-0.034
 eighteen-0.034
Token,
Token ID11
Feature activation+1.12e-02

Pos logit contributions
 arrest+0.096
 loser+0.096
 eighteen+0.092
 fourteen+0.092
 sixteen+0.091
Neg logit contributions
 up-0.056
 around-0.049
 outside-0.049
 else-0.046
 quietly-0.045
Token she
Token ID673
Feature activation+1.80e-02

Pos logit contributions
 arrest+0.026
 loser+0.025
 eighteen+0.024
 sixteen+0.024
 fourteen+0.024
Neg logit contributions
 up-0.011
 else-0.010
 around-0.009
 outside-0.009
 closer-0.008
Token robber
Token ID29979
Feature activation+5.82e-02

Pos logit contributions
 arrest+0.038
tern+0.036
 ago+0.036
 sixteen+0.035
 eighteen+0.035
Neg logit contributions
 else-0.047
 up-0.045
 closer-0.043
Sure-0.042
 around-0.042
Token!
Token ID0
Feature activation+4.03e-02

Pos logit contributions
 loser+0.106
 sixteen+0.102
 eighteen+0.101
tern+0.101
 arrest+0.100
Neg logit contributions
 up-0.077
 else-0.074
 outside-0.070
 around-0.069
 closer-0.063
Token You
Token ID921
Feature activation-2.78e-03

Pos logit contributions
 arrest+0.059
 eighteen+0.058
 sixteen+0.057
 enough+0.057
 fourteen+0.057
Neg logit contributions
 up-0.062
uggle-0.059
 else-0.055
 outside-0.055
 around-0.054
Token are
Token ID389
Feature activation+2.16e-03

Pos logit contributions
 else+0.004
 up+0.004
uggle+0.004
 around+0.003
 outside+0.003
Neg logit contributions
 eighteen-0.005
 fourteen-0.005
 sixteen-0.005
ross-0.005
 arrest-0.005
Token under
Token ID739
Feature activation+1.14e-02

Pos logit contributions
 arrest+0.003
 loser+0.003
 fourteen+0.003
 eighteen+0.003
 paragraph+0.003
Neg logit contributions
 up-0.004
 else-0.003
 outside-0.003
 around-0.003
 closer-0.003
Token arrest
Token ID3251
Feature activation+9.39e-02

Pos logit contributions
 loser+0.014
 nineteen+0.014
 seventeen+0.014
 eighteen+0.013
 thirteen+0.013
Neg logit contributions
 outside-0.023
 up-0.023
 around-0.021
 else-0.021
 new-0.019
Token!"
Token ID2474
Feature activation-5.26e-04

Pos logit contributions
 loser+0.188
tern+0.171
 eighteen+0.171
 thirteen+0.171
 ago+0.170
Neg logit contributions
 up-0.136
 around-0.116
 outside-0.113
 else-0.098
uggle-0.086
Token Lily
Token ID20037
Feature activation-4.93e-03

Pos logit contributions
 up+0.001
 else+0.001
 around+0.001
uggle+0.001
 outside+0.001
Neg logit contributions
 arrest-0.001
 loser-0.001
 fourteen-0.001
 eighteen-0.001
 no-0.001
Token shouts
Token ID34757
Feature activation+1.23e-02

Pos logit contributions
 up+0.007
 else+0.006
uggle+0.006
 compliments+0.006
 around+0.006
Neg logit contributions
 arrest-0.009
 loser-0.009
 no-0.008
 fourteen-0.008
 eighteen-0.008
Token,
Token ID11
Feature activation+1.43e-02

Pos logit contributions
 arrest+0.023
 loser+0.022
 no+0.021
 fourteen+0.020
 enough+0.020
Neg logit contributions
uggle-0.015
 up-0.015
 else-0.015
 compliments-0.014
fish-0.014
Token holding
Token ID4769
Feature activation-4.27e-02

Pos logit contributions
 arrest+0.025
 loser+0.025
 fourteen+0.023
 eighteen+0.023
 sixteen+0.022
Neg logit contributions
 up-0.021
 outside-0.019
 around-0.019
 else-0.018
 quietly-0.018
Token pocket
Token ID10000
Feature activation+2.90e-02

Pos logit contributions
 fourteen+0.000
 arrest+0.000
 loser+0.000
 eighteen+0.000
ross+0.000
Neg logit contributions
 up-0.000
 outside-0.000
 around-0.000
 else-0.000
uggle-0.000
Token and
Token ID290
Feature activation+5.49e-03

Pos logit contributions
 loser+0.069
 eighteen+0.068
 fourteen+0.067
 arrest+0.067
 nineteen+0.065
Neg logit contributions
 up-0.029
 else-0.025
 outside-0.024
 around-0.023
 carefully-0.023
Token broke
Token ID6265
Feature activation+1.55e-02

Pos logit contributions
 loser+0.012
 arrest+0.011
 fourteen+0.011
 eighteen+0.011
 thirteen+0.011
Neg logit contributions
 up-0.007
 else-0.006
 around-0.006
 outside-0.006
 closer-0.006
Token into
Token ID656
Feature activation+1.27e-02

Pos logit contributions
 arrest+0.036
 loser+0.036
 fourteen+0.034
 eighteen+0.034
 nineteen+0.033
Neg logit contributions
 up-0.018
 around-0.013
 outside-0.013
 closer-0.012
 else-0.011
Token a
Token ID257
Feature activation+6.66e-03

Pos logit contributions
 loser+0.024
 eighteen+0.023
 fourteen+0.023
 arrest+0.023
ross+0.023
Neg logit contributions
 up-0.016
 else-0.015
 outside-0.013
 around-0.013
in-0.013
Token million
Token ID1510
Feature activation+9.37e-02

Pos logit contributions
 arrest+0.012
 fourteen+0.011
 loser+0.011
 no+0.010
 eighteen+0.010
Neg logit contributions
 else-0.009
 up-0.009
 closer-0.008
uggle-0.008
 outside-0.008
Token pieces
Token ID5207
Feature activation+2.86e-02

Pos logit contributions
 loser+0.171
 arrest+0.169
 fourteen+0.166
 eighteen+0.165
ross+0.159
Neg logit contributions
 else-0.121
 up-0.119
 around-0.107
 compliments-0.106
in-0.105
Token.
Token ID13
Feature activation-3.02e-03

Pos logit contributions
 arrest+0.064
 loser+0.063
 eighteen+0.060
 fourteen+0.059
 sixteen+0.058
Neg logit contributions
 up-0.028
 else-0.027
 around-0.027
 outside-0.026
 closer-0.022
Token Tim
Token ID5045
Feature activation-1.16e-02

Pos logit contributions
 up+0.005
 else+0.004
uggle+0.004
 outside+0.004
 around+0.004
Neg logit contributions
 arrest-0.005
 loser-0.005
 fourteen-0.004
 eighteen-0.004
 eleven-0.004
Tokenmy
Token ID1820
Feature activation-2.74e-02

Pos logit contributions
 up+0.012
 else+0.011
in+0.011
 around+0.010
 outside+0.010
Neg logit contributions
 arrest-0.027
 loser-0.025
 fourteen-0.025
 eighteen-0.024
 sixteen-0.024
Token's
Token ID338
Feature activation-4.12e-03

Pos logit contributions
 up+0.025
 else+0.025
 around+0.021
uggle+0.021
 outside+0.020
Neg logit contributions
 arrest-0.062
 loser-0.060
 fourteen-0.058
 eighteen-0.057
 eleven-0.057
Token "
Token ID366
Feature activation+1.10e-02

Pos logit contributions
 arrest+0.024
 loser+0.023
 fourteen+0.020
 no+0.020
 eighteen+0.020
Neg logit contributions
 up-0.019
 else-0.017
 around-0.017
 quietly-0.017
 outside-0.017
TokenI
Token ID40
Feature activation-1.64e-02

Pos logit contributions
 arrest+0.022
 loser+0.021
 no+0.020
 enough+0.020
 fourteen+0.019
Neg logit contributions
 up-0.012
Hop-0.011
uggle-0.011
 compliments-0.011
 else-0.011
Token feel
Token ID1254
Feature activation-1.79e-02

Pos logit contributions
 up+0.022
 else+0.020
uggle+0.020
 around+0.019
 compliments+0.019
Neg logit contributions
 arrest-0.030
 loser-0.029
 no-0.028
 fourteen-0.027
 eighteen-0.027
Token like
Token ID588
Feature activation-2.89e-02

Pos logit contributions
 up+0.028
uggle+0.027
 else+0.027
fish+0.025
 compliments+0.025
Neg logit contributions
 arrest-0.028
 no-0.026
 loser-0.026
 fourteen-0.024
 enough-0.024
Token a
Token ID257
Feature activation-3.96e-04

Pos logit contributions
 up+0.038
 else+0.036
 outside+0.034
 around+0.033
uggle+0.033
Neg logit contributions
 arrest-0.054
 loser-0.053
 fourteen-0.050
 eighteen-0.049
 sixteen-0.049
Token million
Token ID1510
Feature activation+9.35e-02

Pos logit contributions
 up+0.000
 else+0.000
uggle+0.000
 around+0.000
 compliments+0.000
Neg logit contributions
 arrest-0.001
 loser-0.001
 no-0.001
 fourteen-0.001
 eighteen-0.001
Token bucks
Token ID24780
Feature activation+3.72e-02

Pos logit contributions
 arrest+0.185
 loser+0.177
 fourteen+0.170
 no+0.169
 eighteen+0.167
Neg logit contributions
 up-0.109
 else-0.105
uggle-0.098
 compliments-0.097
 around-0.096
Token!"
Token ID2474
Feature activation-2.95e-02

Pos logit contributions
 arrest+0.073
 nineteen+0.063
 fourteen+0.063
 eighteen+0.063
 loser+0.063
Neg logit contributions
 up-0.052
 around-0.048
 outside-0.046
fish-0.044
in-0.043
Token And
Token ID843
Feature activation+2.07e-02

Pos logit contributions
 up+0.036
uggle+0.034
 else+0.034
 around+0.033
 outside+0.032
Neg logit contributions
 arrest-0.058
 loser-0.056
 no-0.052
 fourteen-0.052
 eighteen-0.052
Token just
Token ID655
Feature activation+2.67e-02

Pos logit contributions
 arrest+0.043
 loser+0.041
 fourteen+0.039
 eighteen+0.038
 sixteen+0.038
Neg logit contributions
 up-0.025
 else-0.024
 around-0.023
uggle-0.021
 outside-0.021
Token like
Token ID588
Feature activation-2.11e-02

Pos logit contributions
 arrest+0.050
 loser+0.047
 fourteen+0.044
 no+0.043
 eighteen+0.043
Neg logit contributions
 up-0.037
 else-0.035
 around-0.033
 outside-0.032
uggle-0.032
Token was
Token ID373
Feature activation+1.63e-02

Pos logit contributions
 arrest+0.077
 loser+0.072
 fourteen+0.069
 no+0.069
 eighteen+0.068
Neg logit contributions
 up-0.053
 else-0.051
uggle-0.049
 around-0.047
 outside-0.046
Token walking
Token ID6155
Feature activation+1.45e-02

Pos logit contributions
 arrest+0.032
 loser+0.030
 fourteen+0.029
 eighteen+0.028
 sixteen+0.027
Neg logit contributions
 up-0.022
 else-0.020
 around-0.020
 outside-0.020
uggle-0.018
Token through
Token ID832
Feature activation+1.02e-02

Pos logit contributions
 arrest+0.035
 loser+0.033
 fourteen+0.033
 eighteen+0.032
 sixteen+0.032
Neg logit contributions
 up-0.013
 around-0.011
 outside-0.011
uggle-0.010
 else-0.010
Token a
Token ID257
Feature activation+1.32e-03

Pos logit contributions
 arrest+0.018
 loser+0.018
 fourteen+0.017
 eighteen+0.016
 no+0.016
Neg logit contributions
 up-0.014
 else-0.013
uggle-0.013
 around-0.013
 outside-0.013
Token fair
Token ID3148
Feature activation+1.24e-02

Pos logit contributions
 arrest+0.003
 loser+0.002
 no+0.002
 eight+0.002
 eighteen+0.002
Neg logit contributions
 up-0.002
uggle-0.002
 compliments-0.001
fish-0.001
 around-0.001
Tokenyt
Token ID20760
Feature activation+9.27e-02

Pos logit contributions
 arrest+0.023
 loser+0.021
 no+0.021
 fourteen+0.020
 eighteen+0.020
Neg logit contributions
 up-0.017
 else-0.016
uggle-0.016
 compliments-0.015
 around-0.015
Tokenale
Token ID1000
Feature activation+2.75e-02

Pos logit contributions
 arrest+0.187
 loser+0.187
 no+0.181
 Leopard+0.173
 fifteen+0.168
Neg logit contributions
 up-0.100
uggle-0.097
 around-0.094
 closely-0.090
 outside-0.087
Token.
Token ID13
Feature activation+1.90e-03

Pos logit contributions
 arrest+0.054
 loser+0.052
 fourteen+0.050
 no+0.050
 eighteen+0.049
Neg logit contributions
 up-0.032
 else-0.031
uggle-0.030
 around-0.029
 outside-0.029
Token She
Token ID1375
Feature activation+6.89e-03

Pos logit contributions
 arrest+0.004
 loser+0.003
 eighteen+0.003
 fourteen+0.003
 no+0.003
Neg logit contributions
 up-0.003
uggle-0.002
 around-0.002
 else-0.002
 outside-0.002
Token fell
Token ID3214
Feature activation+3.11e-02

Pos logit contributions
 arrest+0.012
 no+0.012
 loser+0.011
 fourteen+0.011
 enough+0.010
Neg logit contributions
 up-0.009
uggle-0.009
 else-0.009
 compliments-0.008
 outside-0.008
Token asleep
Token ID16039
Feature activation+1.48e-02

Pos logit contributions
 arrest+0.050
 loser+0.047
 no+0.045
 fourteen+0.044
 eighteen+0.043
Neg logit contributions
 up-0.049
uggle-0.046
 else-0.046
 around-0.045
 outside-0.044
Token best
Token ID1266
Feature activation+3.87e-02

Pos logit contributions
 up+0.012
 else+0.012
uggle+0.012
 around+0.011
 compliments+0.011
Neg logit contributions
 arrest-0.020
 loser-0.019
 no-0.018
 fourteen-0.018
 eighteen-0.018
Token friend
Token ID1545
Feature activation+4.55e-02

Pos logit contributions
 arrest+0.081
 fourteen+0.077
 loser+0.075
 sixteen+0.075
 thirteen+0.075
Neg logit contributions
 up-0.043
 else-0.038
 outside-0.038
uggle-0.037
 around-0.037
Token a
Token ID257
Feature activation+4.94e-03

Pos logit contributions
 arrest+0.108
 fourteen+0.099
 eighteen+0.099
 sixteen+0.098
 loser+0.098
Neg logit contributions
 up-0.043
 else-0.042
 around-0.040
 outside-0.039
uggle-0.033
Token long
Token ID890
Feature activation+4.60e-02

Pos logit contributions
 arrest+0.010
 loser+0.009
 no+0.009
 fourteen+0.009
 eighteen+0.008
Neg logit contributions
 up-0.006
 else-0.006
uggle-0.006
 around-0.005
 compliments-0.005
Token time
Token ID640
Feature activation+1.25e-02

Pos logit contributions
 arrest+0.081
 fourteen+0.078
 sixteen+0.076
 fifteen+0.076
 thirteen+0.074
Neg logit contributions
 up-0.062
 around-0.058
 outside-0.058
fish-0.055
 else-0.055
Token ago
Token ID2084
Feature activation+9.21e-02

Pos logit contributions
 arrest+0.026
 loser+0.023
 fourteen+0.023
 eighteen+0.023
 sixteen+0.023
Neg logit contributions
 up-0.015
 else-0.014
 around-0.014
 outside-0.014
 closer-0.013
Token.
Token ID13
Feature activation+2.31e-02

Pos logit contributions
 arrest+0.234
 fourteen+0.231
 thirteen+0.226
 eighteen+0.226
 sixteen+0.225
Neg logit contributions
 up-0.061
 outside-0.053
 around-0.052
 else-0.040
fish-0.035
Token He
Token ID679
Feature activation+3.37e-03

Pos logit contributions
 arrest+0.033
 loser+0.029
 fourteen+0.028
 eighteen+0.028
 sixteen+0.027
Neg logit contributions
 up-0.040
uggle-0.039
 else-0.039
 around-0.037
 outside-0.036
Token was
Token ID373
Feature activation-3.70e-03

Pos logit contributions
 arrest+0.006
 loser+0.005
 no+0.005
 fourteen+0.005
 eighteen+0.005
Neg logit contributions
 up-0.005
uggle-0.005
 else-0.005
 compliments-0.004
 around-0.004
Token my
Token ID616
Feature activation+8.98e-05

Pos logit contributions
 up+0.005
 else+0.005
 outside+0.005
 around+0.005
 closer+0.005
Neg logit contributions
 arrest-0.007
 loser-0.006
 fourteen-0.006
 eighteen-0.006
 sixteen-0.006
Token dog
Token ID3290
Feature activation+2.68e-02

Pos logit contributions
 arrest+0.000
 loser+0.000
 no+0.000
 fourteen+0.000
 eighteen+0.000
Neg logit contributions
 up-0.000
 else-0.000
uggle-0.000
 around-0.000
 compliments-0.000
Token,
Token ID11
Feature activation+1.16e-02

Pos logit contributions
 arrest+0.054
 loser+0.053
 eighteen+0.051
 sixteen+0.051
ACK+0.050
Neg logit contributions
 up-0.025
 else-0.022
 around-0.021
 outside-0.020
 carefully-0.019
Token the
Token ID262
Feature activation+2.02e-02

Pos logit contributions
 arrest+0.023
 loser+0.023
 sixteen+0.023
 eighteen+0.022
 fourteen+0.022
Neg logit contributions
 up-0.013
 else-0.012
 around-0.011
uggle-0.011
 outside-0.011
Token ancient
Token ID6156
Feature activation-1.24e-02

Pos logit contributions
 arrest+0.038
 loser+0.037
 no+0.036
 fourteen+0.034
 eighteen+0.033
Neg logit contributions
 up-0.025
uggle-0.024
 else-0.024
 around-0.022
 compliments-0.022
Token task
Token ID4876
Feature activation+1.31e-02

Pos logit contributions
 up+0.015
 else+0.014
uggle+0.014
 compliments+0.013
 around+0.013
Neg logit contributions
 arrest-0.024
 loser-0.023
 no-0.022
 fourteen-0.022
 eighteen-0.021
Token was
Token ID373
Feature activation+2.57e-02

Pos logit contributions
 arrest+0.029
 loser+0.027
 fourteen+0.026
 eighteen+0.026
 sixteen+0.026
Neg logit contributions
 up-0.014
 else-0.013
 around-0.012
uggle-0.012
 outside-0.011
Token known
Token ID1900
Feature activation+9.20e-02

Pos logit contributions
 arrest+0.053
 loser+0.050
 fourteen+0.045
 eighteen+0.044
EEP+0.044
Neg logit contributions
 up-0.036
 else-0.032
 around-0.031
 outside-0.031
 closer-0.030
Token to
Token ID284
Feature activation+3.15e-02

Pos logit contributions
 arrest+0.183
 loser+0.172
 fourteen+0.161
tern+0.159
 eighteen+0.159
Neg logit contributions
 up-0.129
 around-0.114
 else-0.110
 outside-0.108
 closer-0.108
Token be
Token ID307
Feature activation+3.17e-02

Pos logit contributions
 arrest+0.064
 loser+0.063
 fourteen+0.062
 eighteen+0.061
tern+0.061
Neg logit contributions
 up-0.037
 else-0.035
 around-0.035
 outside-0.034
 carefully-0.031
Token solved
Token ID16019
Feature activation+6.39e-02

Pos logit contributions
 arrest+0.070
 loser+0.065
 fourteen+0.065
EEP+0.062
 eighteen+0.061
Neg logit contributions
 up-0.038
 around-0.035
 outside-0.034
 else-0.033
 closer-0.031
Token by
Token ID416
Feature activation+1.16e-02

Pos logit contributions
 arrest+0.139
 loser+0.136
 eighteen+0.128
 fourteen+0.128
 nineteen+0.127
Neg logit contributions
 up-0.079
 around-0.068
 outside-0.064
 else-0.062
 carefully-0.059
Token the
Token ID262
Feature activation+2.16e-02

Pos logit contributions
 loser+0.022
 arrest+0.021
 eighteen+0.021
 fourteen+0.020
 nineteen+0.020
Neg logit contributions
 up-0.014
 else-0.013
 carefully-0.012
 outside-0.012
 around-0.012
Tokenrot
Token ID10599
Feature activation+1.83e-02

Pos logit contributions
 arrest+0.028
 loser+0.027
 no+0.026
 fourteen+0.026
 enough+0.025
Neg logit contributions
 up-0.025
 else-0.024
uggle-0.024
in-0.023
fish-0.023
Token was
Token ID373
Feature activation+9.65e-03

Pos logit contributions
 arrest+0.039
 loser+0.038
 fourteen+0.036
 eighteen+0.036
 sixteen+0.036
Neg logit contributions
 up-0.019
 else-0.019
 around-0.018
uggle-0.016
 outside-0.016
Token the
Token ID262
Feature activation+8.84e-03

Pos logit contributions
 arrest+0.018
 loser+0.017
 fourteen+0.016
EEP+0.016
ACK+0.016
Neg logit contributions
 up-0.015
 around-0.013
 else-0.013
 outside-0.013
uggle-0.012
Token happiest
Token ID49414
Feature activation+2.43e-02

Pos logit contributions
 arrest+0.018
 fourteen+0.016
 loser+0.016
 sixteen+0.016
 eighteen+0.016
Neg logit contributions
 up-0.011
 else-0.010
uggle-0.009
 compliments-0.009
 closer-0.009
Token he
Token ID339
Feature activation+5.95e-02

Pos logit contributions
 arrest+0.051
 fourteen+0.050
 sixteen+0.050
 eighteen+0.050
 loser+0.048
Neg logit contributions
 up-0.025
 else-0.024
 around-0.023
fish-0.023
 outside-0.023
Token had
Token ID550
Feature activation+9.16e-02

Pos logit contributions
 arrest+0.101
 loser+0.099
 fourteen+0.097
 sixteen+0.095
 eighteen+0.094
Neg logit contributions
 else-0.090
 up-0.090
 around-0.088
 outside-0.085
fish-0.082
Token ever
Token ID1683
Feature activation+5.98e-02

Pos logit contributions
 loser+0.142
 thirteen+0.141
 arrest+0.141
 eighteen+0.140
 fourteen+0.139
Neg logit contributions
 up-0.148
 around-0.143
 outside-0.143
 else-0.140
fish-0.131
Token been
Token ID587
Feature activation+4.82e-02

Pos logit contributions
 loser+0.088
 arrest+0.086
 fourteen+0.081
 eighteen+0.079
 sixteen+0.078
Neg logit contributions
 else-0.100
 up-0.100
 around-0.096
 outside-0.093
 compliments-0.091
Token.
Token ID13
Feature activation-8.07e-03

Pos logit contributions
 fourteen+0.102
 arrest+0.101
 loser+0.100
 thirteen+0.098
 eighteen+0.098
Neg logit contributions
 up-0.051
 else-0.050
 outside-0.049
 around-0.049
in-0.043
Token<|endoftext|>
Token ID50256
Feature activation-3.12e-02

Pos logit contributions
uggle+0.005
 up+0.005
 else+0.005
 around+0.004
 compliments+0.004
Neg logit contributions
 arrest-0.020
 loser-0.020
 fourteen-0.018
 eighteen-0.018
 eleven-0.018
TokenOnce
Token ID7454
Feature activation-2.36e-02

Pos logit contributions
 up+0.031
 else+0.031
uggle+0.029
 compliments+0.028
fish+0.026
Neg logit contributions
 arrest-0.065
 loser-0.062
 no-0.061
 fourteen-0.059
 enough-0.058
Token"
Token ID1
Feature activation+2.82e-02

Pos logit contributions
 arrest+0.071
 loser+0.068
 no+0.066
 fourteen+0.063
 eleven+0.062
Neg logit contributions
 up-0.054
 around-0.049
 outside-0.047
 else-0.046
uggle-0.046
Token and
Token ID290
Feature activation-1.27e-03

Pos logit contributions
 arrest+0.062
 loser+0.055
 fourteen+0.054
 eighteen+0.053
 no+0.052
Neg logit contributions
 up-0.034
 around-0.029
 else-0.029
 outside-0.029
uggle-0.028
Token "
Token ID366
Feature activation-1.28e-02

Pos logit contributions
 up+0.002
 else+0.001
 around+0.001
 quietly+0.001
 carefully+0.001
Neg logit contributions
 arrest-0.002
 loser-0.002
ross-0.002
 enough-0.002
OOOO-0.002
Tokenla
Token ID5031
Feature activation+1.15e-02

Pos logit contributions
 up+0.017
 else+0.016
uggle+0.016
in+0.015
 around+0.015
Neg logit contributions
 arrest-0.024
 no-0.021
 loser-0.021
 fourteen-0.021
 enough-0.021
Token la
Token ID8591
Feature activation+8.64e-02

Pos logit contributions
 arrest+0.020
 loser+0.019
ross+0.018
 fourteen+0.018
 no+0.018
Neg logit contributions
 else-0.016
 up-0.016
 around-0.015
uggle-0.014
 outside-0.014
Token la
Token ID8591
Feature activation+9.14e-02

Pos logit contributions
 arrest+0.134
 loser+0.126
 no+0.124
 thirteen+0.120
ross+0.120
Neg logit contributions
uggle-0.122
 up-0.121
 else-0.121
 around-0.119
 outside-0.119
Token".
Token ID1911
Feature activation-1.80e-02

Pos logit contributions
 arrest+0.139
 loser+0.132
 no+0.130
 thirteen+0.125
 vain+0.124
Neg logit contributions
uggle-0.130
 up-0.130
 around-0.128
 else-0.128
 outside-0.127
Token\n
Token ID198
Feature activation-1.42e-02

Pos logit contributions
 up+0.026
 else+0.023
 around+0.023
 outside+0.023
uggle+0.022
Neg logit contributions
 arrest-0.034
 loser-0.031
 fourteen-0.029
 eighteen-0.029
 thirteen-0.029
Token\n
Token ID198
Feature activation+3.06e-05

Pos logit contributions
 up+0.017
uggle+0.017
fish+0.017
 compliments+0.017
 findings+0.016
Neg logit contributions
 no-0.024
 arrest-0.024
 loser-0.024
 fourteen-0.023
 No-0.022
TokenMom
Token ID29252
Feature activation-1.72e-02

Pos logit contributions
 arrest+0.000
 loser+0.000
 no+0.000
 fourteen+0.000
 eighteen+0.000
Neg logit contributions
 up-0.000
uggle-0.000
 else-0.000
 compliments-0.000
 around-0.000
Token is
Token ID318
Feature activation-3.71e-02

Pos logit contributions
 up+0.023
 else+0.022
uggle+0.021
 around+0.020
 compliments+0.020
Neg logit contributions
 arrest-0.032
 loser-0.031
 no-0.029
 fourteen-0.028
 eighteen-0.028
Token and
Token ID290
Feature activation-3.39e-03

Pos logit contributions
 up+0.005
 outside+0.004
 around+0.004
 else+0.004
 carefully+0.004
Neg logit contributions
 loser-0.009
 arrest-0.009
 eighteen-0.009
 nineteen-0.009
 fourteen-0.009
Token it
Token ID340
Feature activation-6.01e-03

Pos logit contributions
 up+0.005
 else+0.004
uggle+0.004
 around+0.004
 outside+0.004
Neg logit contributions
 arrest-0.006
 loser-0.006
 fourteen-0.006
 eighteen-0.006
 no-0.006
Token broke
Token ID6265
Feature activation+3.09e-02

Pos logit contributions
 up+0.010
 around+0.009
 else+0.009
 outside+0.009
 closer+0.008
Neg logit contributions
 arrest-0.011
 loser-0.010
 eighteen-0.009
 fourteen-0.009
 sixteen-0.009
Token into
Token ID656
Feature activation+2.77e-02

Pos logit contributions
 arrest+0.078
 loser+0.075
 eighteen+0.072
 fourteen+0.071
 Have+0.070
Neg logit contributions
 up-0.033
 outside-0.026
 around-0.024
 closer-0.021
 carefully-0.021
Token a
Token ID257
Feature activation+9.09e-03

Pos logit contributions
 loser+0.053
 arrest+0.052
 eighteen+0.051
 fourteen+0.050
 nineteen+0.050
Neg logit contributions
 else-0.032
 up-0.032
 outside-0.029
 carefully-0.027
in-0.027
Token million
Token ID1510
Feature activation+9.14e-02

Pos logit contributions
 arrest+0.012
 fourteen+0.011
 loser+0.010
 eighteen+0.010
 eleven+0.010
Neg logit contributions
 else-0.017
 up-0.017
 outside-0.016
 closer-0.016
 around-0.016
Token pieces
Token ID5207
Feature activation+3.13e-02

Pos logit contributions
 arrest+0.171
 loser+0.171
 eighteen+0.166
 fourteen+0.164
ross+0.159
Neg logit contributions
 else-0.112
 up-0.111
 outside-0.100
 around-0.099
in-0.098
Token.
Token ID13
Feature activation-8.92e-03

Pos logit contributions
 loser+0.071
 arrest+0.070
 eighteen+0.068
 fourteen+0.066
 sixteen+0.065
Neg logit contributions
 up-0.030
 around-0.029
 else-0.029
 outside-0.029
 carefully-0.023
Token Lily
Token ID20037
Feature activation-1.89e-02

Pos logit contributions
 up+0.012
 else+0.012
uggle+0.011
 around+0.011
 outside+0.011
Neg logit contributions
 arrest-0.016
 loser-0.015
 no-0.015
 fourteen-0.014
 eighteen-0.014
Token was
Token ID373
Feature activation-1.21e-02

Pos logit contributions
 up+0.019
 else+0.018
uggle+0.017
 around+0.016
 outside+0.015
Neg logit contributions
 arrest-0.041
 loser-0.040
 fourteen-0.038
 no-0.038
 eighteen-0.037
Token very
Token ID845
Feature activation-2.01e-02

Pos logit contributions
 up+0.015
uggle+0.015
 else+0.015
 compliments+0.014
fish+0.014
Neg logit contributions
 arrest-0.023
 loser-0.021
 no-0.021
 fourteen-0.020
 eighteen-0.020
Token it
Token ID340
Feature activation-1.79e-02

Pos logit contributions
 up+0.008
 else+0.007
uggle+0.007
 around+0.007
 compliments+0.007
Neg logit contributions
 arrest-0.013
 loser-0.012
 no-0.012
 fourteen-0.012
 eighteen-0.011
Token away
Token ID1497
Feature activation+3.39e-02

Pos logit contributions
 up+0.021
 around+0.019
 else+0.018
 outside+0.018
 closer+0.017
Neg logit contributions
 arrest-0.039
 loser-0.036
 eighteen-0.035
 fourteen-0.035
 sixteen-0.034
Token a
Token ID257
Feature activation+3.17e-03

Pos logit contributions
 arrest+0.072
 loser+0.066
 eighteen+0.063
 sixteen+0.062
 nineteen+0.062
Neg logit contributions
 up-0.042
 around-0.037
 else-0.037
 outside-0.036
 quietly-0.033
Token long
Token ID890
Feature activation+4.18e-02

Pos logit contributions
 arrest+0.006
 loser+0.005
 no+0.005
 fourteen+0.005
 eighteen+0.005
Neg logit contributions
 up-0.004
uggle-0.004
 else-0.004
 around-0.004
 compliments-0.004
Token time
Token ID640
Feature activation+1.80e-02

Pos logit contributions
 arrest+0.075
 fourteen+0.072
 fifteen+0.071
 eleven+0.070
 sixteen+0.069
Neg logit contributions
 up-0.052
 around-0.050
 outside-0.050
 else-0.047
fish-0.047
Token ago
Token ID2084
Feature activation+9.14e-02

Pos logit contributions
 arrest+0.034
 loser+0.030
 eighteen+0.029
 fourteen+0.029
 sixteen+0.029
Neg logit contributions
 up-0.024
 around-0.023
 outside-0.023
 else-0.022
 closer-0.020
Token.
Token ID13
Feature activation+1.85e-02

Pos logit contributions
 arrest+0.226
 loser+0.219
 thirteen+0.218
 fourteen+0.217
 eighteen+0.216
Neg logit contributions
 up-0.066
 outside-0.057
 around-0.055
 else-0.039
 inside-0.038
Token It
Token ID632
Feature activation-2.35e-02

Pos logit contributions
 arrest+0.027
 loser+0.023
 fourteen+0.023
 eighteen+0.022
 enough+0.022
Neg logit contributions
 up-0.031
uggle-0.031
 else-0.030
 compliments-0.028
 outside-0.028
Token is
Token ID318
Feature activation+3.02e-03

Pos logit contributions
 up+0.031
 else+0.030
uggle+0.029
 around+0.028
 outside+0.027
Neg logit contributions
 arrest-0.044
 loser-0.042
 no-0.039
 fourteen-0.039
 eighteen-0.039
Token not
Token ID407
Feature activation+5.49e-02

Pos logit contributions
 arrest+0.006
 loser+0.005
 fourteen+0.005
 eighteen+0.005
 sixteen+0.005
Neg logit contributions
 up-0.004
 else-0.004
 outside-0.004
 around-0.004
 closer-0.004
Token good
Token ID922
Feature activation+2.34e-02

Pos logit contributions
 arrest+0.134
 loser+0.120
 fourteen+0.118
ACK+0.112
 eighteen+0.112
Neg logit contributions
 up-0.060
 else-0.053
 outside-0.052
 special-0.051
 around-0.049
Token and
Token ID290
Feature activation-1.73e-02

Pos logit contributions
 up+0.047
 around+0.041
 outside+0.041
uggle+0.040
 else+0.040
Neg logit contributions
 arrest-0.081
 loser-0.078
 eighteen-0.073
 fourteen-0.073
 no-0.072
Token some
Token ID617
Feature activation-1.22e-02

Pos logit contributions
uggle+0.023
 up+0.021
 else+0.020
in+0.020
Hop+0.020
Neg logit contributions
 arrest-0.031
 no-0.030
 enough-0.028
 fourteen-0.027
 loser-0.027
Token soap
Token ID19533
Feature activation-4.75e-02

Pos logit contributions
uggle+0.015
 up+0.014
Hop+0.013
rig+0.012
 else+0.012
Neg logit contributions
 no-0.023
 arrest-0.023
 loser-0.022
 enough-0.021
 fourteen-0.021
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
 up+0.056
 else+0.052
uggle+0.051
 around+0.049
 outside+0.048
Neg logit contributions
 arrest-0.095
 loser-0.091
 no-0.087
 fourteen-0.086
 eighteen-0.084
Token Lily
Token ID20037
Feature activation-1.98e-02

Pos logit contributions
 up+0.017
 else+0.016
uggle+0.016
 around+0.015
 outside+0.015
Neg logit contributions
 arrest-0.016
 loser-0.015
 no-0.015
 fourteen-0.014
 eighteen-0.014
Token dipped
Token ID35300
Feature activation-9.76e-02

Pos logit contributions
uggle+0.029
 up+0.026
 closer+0.026
 compliments+0.025
 else+0.024
Neg logit contributions
 no-0.033
 arrest-0.033
 enough-0.029
 eight-0.028
 loser-0.028
Token a
Token ID257
Feature activation-5.69e-02

Pos logit contributions
uggle+0.129
 else+0.126
 up+0.122
 compliments+0.118
fish+0.115
Neg logit contributions
 no-0.167
 arrest-0.166
 loser-0.161
 enough-0.154
 fourteen-0.154
Token cloth
Token ID16270
Feature activation-4.05e-02

Pos logit contributions
uggle+0.083
 up+0.078
 compliments+0.074
Hop+0.073
 else+0.071
Neg logit contributions
 loser-0.094
 no-0.092
 eight-0.090
 arrest-0.089
 sixteen-0.085
Token in
Token ID287
Feature activation-2.53e-02

Pos logit contributions
 up+0.043
 else+0.040
uggle+0.039
 around+0.037
 outside+0.036
Neg logit contributions
 arrest-0.086
 loser-0.082
 no-0.079
 fourteen-0.078
 eighteen-0.076
Token the
Token ID262
Feature activation+1.01e-02

Pos logit contributions
 up+0.035
 else+0.032
uggle+0.032
 around+0.030
 outside+0.030
Neg logit contributions
 arrest-0.046
 loser-0.044
 no-0.041
 fourteen-0.041
 eighteen-0.040
Token water
Token ID1660
Feature activation-4.03e-02

Pos logit contributions
 arrest+0.021
 loser+0.020
 no+0.020
 fourteen+0.019
 eighteen+0.019
Neg logit contributions
 up-0.010
 else-0.010
uggle-0.010
 compliments-0.009
 around-0.009
Token and
Token ID290
Feature activation+5.15e-03

Pos logit contributions
 up+0.034
 around+0.031
 else+0.030
 outside+0.030
fish+0.029
Neg logit contributions
 arrest-0.064
 loser-0.060
 eighteen-0.057
 fourteen-0.057
 sixteen-0.056
Token a
Token ID257
Feature activation-5.59e-03

Pos logit contributions
 arrest+0.011
 loser+0.010
 fourteen+0.010
 no+0.010
 eighteen+0.010
Neg logit contributions
 up-0.006
 else-0.005
uggle-0.005
 around-0.005
 compliments-0.005
Token wand
Token ID11569
Feature activation+8.06e-03

Pos logit contributions
uggle+0.007
 up+0.007
 else+0.006
 compliments+0.006
 around+0.006
Neg logit contributions
 loser-0.010
 arrest-0.010
 no-0.010
 eight-0.009
 eighteen-0.009
Token.
Token ID13
Feature activation+7.39e-03

Pos logit contributions
 arrest+0.018
 loser+0.017
 eighteen+0.017
 fourteen+0.016
 sixteen+0.016
Neg logit contributions
 up-0.008
 around-0.007
 else-0.007
 outside-0.007
uggle-0.007
Token She
Token ID1375
Feature activation-4.05e-03

Pos logit contributions
 arrest+0.010
 loser+0.009
 no+0.009
 fourteen+0.009
 sixteen+0.008
Neg logit contributions
 up-0.013
 else-0.013
uggle-0.013
fish-0.012
 compliments-0.012
Token dips
Token ID48904
Feature activation-9.72e-02

Pos logit contributions
uggle+0.007
 up+0.006
 else+0.006
 closer+0.006
 compliments+0.006
Neg logit contributions
 arrest-0.006
 no-0.006
 fourteen-0.006
 loser-0.005
 eight-0.005
Token the
Token ID262
Feature activation-2.44e-02

Pos logit contributions
uggle+0.139
 else+0.132
rig+0.128
 up+0.127
 findings+0.124
Neg logit contributions
 no-0.161
 arrest-0.149
 loser-0.148
 enough-0.146
 fourteen-0.144
Token wand
Token ID11569
Feature activation-3.13e-02

Pos logit contributions
uggle+0.028
 else+0.027
 up+0.026
 curiously+0.024
 compliments+0.024
Neg logit contributions
 loser-0.048
 arrest-0.046
 no-0.045
 eight-0.043
 fourteen-0.043
Token in
Token ID287
Feature activation-2.21e-02

Pos logit contributions
 up+0.032
uggle+0.032
 else+0.032
 compliments+0.030
Hop+0.028
Neg logit contributions
 arrest-0.064
 loser-0.060
 no-0.060
 fourteen-0.058
 enough-0.057
Token the
Token ID262
Feature activation+8.40e-03

Pos logit contributions
 up+0.028
 else+0.025
 around+0.024
 outside+0.024
uggle+0.024
Neg logit contributions
 arrest-0.043
 loser-0.041
 fourteen-0.039
 eighteen-0.038
 no-0.038
Token soap
Token ID19533
Feature activation-4.67e-02

Pos logit contributions
 arrest+0.017
 loser+0.016
 no+0.016
 fourteen+0.015
 sixteen+0.015
Neg logit contributions
 else-0.009
 up-0.009
uggle-0.009
 compliments-0.008
Hop-0.008
Token paint
Token ID7521
Feature activation-2.15e-02

Pos logit contributions
 up+0.015
uggle+0.014
 else+0.014
 around+0.013
 compliments+0.013
Neg logit contributions
 arrest-0.024
 loser-0.022
 no-0.022
 fourteen-0.022
 eighteen-0.021
Token.
Token ID13
Feature activation-1.55e-02

Pos logit contributions
 up+0.028
 else+0.026
uggle+0.025
 around+0.024
 outside+0.024
Neg logit contributions
 arrest-0.041
 loser-0.039
 no-0.037
 fourteen-0.036
 eighteen-0.036
Token First
Token ID3274
Feature activation-4.58e-02

Pos logit contributions
 up+0.024
 outside+0.022
 around+0.022
uggle+0.021
 else+0.021
Neg logit contributions
 arrest-0.028
 loser-0.027
 eighteen-0.024
 fourteen-0.024
 no-0.024
Token,
Token ID11
Feature activation-2.02e-02

Pos logit contributions
 up+0.048
 around+0.039
 outside+0.038
 carefully+0.037
 else+0.037
Neg logit contributions
 loser-0.108
 arrest-0.107
 eighteen-0.104
 sixteen-0.101
 fourteen-0.101
Token she
Token ID673
Feature activation+1.23e-02

Pos logit contributions
 up+0.022
 else+0.020
 outside+0.018
 closer+0.018
 around+0.018
Neg logit contributions
 arrest-0.044
 loser-0.043
 eighteen-0.040
 fourteen-0.040
 sixteen-0.039
Token dipped
Token ID35300
Feature activation-9.60e-02

Pos logit contributions
 arrest+0.022
 loser+0.020
 no+0.020
 fourteen+0.019
 eighteen+0.019
Neg logit contributions
 up-0.017
uggle-0.016
 else-0.016
 compliments-0.015
 around-0.015
Token the
Token ID262
Feature activation-2.35e-02

Pos logit contributions
 up+0.117
uggle+0.115
 else+0.115
 compliments+0.105
fish+0.103
Neg logit contributions
 arrest-0.179
 loser-0.170
 no-0.169
 fourteen-0.162
 enough-0.158
Token brush
Token ID14093
Feature activation-1.24e-02

Pos logit contributions
 up+0.027
 else+0.025
uggle+0.025
 around+0.024
 outside+0.023
Neg logit contributions
 arrest-0.048
 loser-0.045
 no-0.043
 fourteen-0.043
 eighteen-0.042
Token into
Token ID656
Feature activation-2.19e-02

Pos logit contributions
 up+0.013
 else+0.012
 around+0.012
uggle+0.011
 outside+0.011
Neg logit contributions
 arrest-0.027
 loser-0.026
 fourteen-0.024
 no-0.024
 eighteen-0.024
Token a
Token ID257
Feature activation-3.03e-02

Pos logit contributions
 up+0.028
 else+0.025
 outside+0.024
 around+0.024
 closer+0.023
Neg logit contributions
 arrest-0.043
 loser-0.042
 fourteen-0.038
 eighteen-0.038
 no-0.038
Token jar
Token ID17379
Feature activation+1.50e-02

Pos logit contributions
 up+0.036
uggle+0.036
 else+0.034
 compliments+0.032
Hop+0.031
Neg logit contributions
 arrest-0.057
 loser-0.055
 no-0.055
 fourteen-0.052
 sixteen-0.052
Token decided
Token ID3066
Feature activation+2.75e-02

Pos logit contributions
 arrest+0.019
 loser+0.018
 no+0.017
 fourteen+0.017
 eighteen+0.017
Neg logit contributions
 up-0.015
 else-0.014
uggle-0.014
 around-0.013
 outside-0.013
Token to
Token ID284
Feature activation+2.57e-03

Pos logit contributions
 loser+0.064
 arrest+0.062
 sixteen+0.059
 fourteen+0.058
 eighteen+0.058
Neg logit contributions
 up-0.031
 else-0.026
 around-0.025
 closer-0.024
 outside-0.023
Token cheat
Token ID22705
Feature activation-9.77e-04

Pos logit contributions
 arrest+0.005
 loser+0.005
 fourteen+0.004
 no+0.004
 eighteen+0.004
Neg logit contributions
 up-0.003
 else-0.003
uggle-0.003
 around-0.003
 compliments-0.003
Token.
Token ID13
Feature activation-7.80e-03

Pos logit contributions
 up+0.001
 else+0.001
 around+0.001
 outside+0.001
 quietly+0.001
Neg logit contributions
 arrest-0.002
 loser-0.002
 fourteen-0.002
 eighteen-0.002
 sixteen-0.002
Token He
Token ID679
Feature activation-7.85e-03

Pos logit contributions
 up+0.011
 around+0.011
 else+0.010
uggle+0.010
 outside+0.010
Neg logit contributions
 loser-0.014
 arrest-0.013
 eighteen-0.012
 fourteen-0.012
 no-0.012
Token dipped
Token ID35300
Feature activation-9.51e-02

Pos logit contributions
 up+0.011
 else+0.010
 around+0.009
uggle+0.009
 compliments+0.009
Neg logit contributions
 arrest-0.014
 loser-0.014
 no-0.013
 fourteen-0.013
 eighteen-0.013
Token his
Token ID465
Feature activation-1.44e-02

Pos logit contributions
 up+0.130
 else+0.118
 around+0.116
uggle+0.115
 closer+0.111
Neg logit contributions
 arrest-0.181
 loser-0.168
 fourteen-0.159
 no-0.157
 eighteen-0.156
Token claws
Token ID28421
Feature activation-2.33e-03

Pos logit contributions
 up+0.026
 else+0.024
 around+0.023
 outside+0.023
uggle+0.023
Neg logit contributions
 arrest-0.021
 loser-0.019
 fourteen-0.018
 no-0.018
 eighteen-0.018
Token in
Token ID287
Feature activation-2.67e-02

Pos logit contributions
 up+0.003
 around+0.002
 else+0.002
 outside+0.002
 closer+0.002
Neg logit contributions
 arrest-0.005
 loser-0.005
 fourteen-0.005
 eighteen-0.005
 no-0.005
Token ink
Token ID16882
Feature activation-1.45e-02

Pos logit contributions
 up+0.031
 closer+0.026
 carefully+0.025
 around+0.025
 closely+0.024
Neg logit contributions
 arrest-0.055
 loser-0.054
EEP-0.054
ross-0.054
tern-0.053
Token and
Token ID290
Feature activation-1.06e-02

Pos logit contributions
 up+0.017
 around+0.015
 outside+0.015
 else+0.014
 closely+0.014
Neg logit contributions
 arrest-0.030
 loser-0.029
 eighteen-0.027
 fourteen-0.027
 no-0.027
Token "
Token ID366
Feature activation+1.34e-02

Pos logit contributions
 arrest+0.019
 loser+0.018
 no+0.017
 fourteen+0.017
 eighteen+0.016
Neg logit contributions
 up-0.023
uggle-0.022
 else-0.022
 compliments-0.021
 around-0.021
TokenYes
Token ID5297
Feature activation-6.04e-03

Pos logit contributions
 arrest+0.023
 loser+0.022
 no+0.022
 fourteen+0.021
 eighteen+0.021
Neg logit contributions
 up-0.019
uggle-0.018
 else-0.017
 around-0.016
fish-0.016
Token please
Token ID3387
Feature activation+1.67e-02

Pos logit contributions
 up+0.007
 else+0.006
 closer+0.005
fish+0.005
 carefully+0.005
Neg logit contributions
 arrest-0.014
 loser-0.012
 fourteen-0.012
 eighteen-0.012
 sixteen-0.012
Token!"
Token ID2474
Feature activation-1.50e-02

Pos logit contributions
 arrest+0.029
 loser+0.026
 fourteen+0.024
 eighteen+0.024
 no+0.024
Neg logit contributions
 up-0.026
 outside-0.024
 else-0.024
 around-0.023
uggle-0.023
Token Dad
Token ID17415
Feature activation-2.60e-02

Pos logit contributions
 up+0.021
 around+0.019
 else+0.019
uggle+0.019
 quietly+0.018
Neg logit contributions
 arrest-0.028
 loser-0.026
 fourteen-0.025
 eighteen-0.024
 eleven-0.024
Token dipped
Token ID35300
Feature activation-9.45e-02

Pos logit contributions
uggle+0.034
 up+0.032
 else+0.031
 compliments+0.029
Hop+0.029
Neg logit contributions
 arrest-0.045
 no-0.045
 fourteen-0.042
 loser-0.042
 enough-0.042
Token a
Token ID257
Feature activation-6.21e-02

Pos logit contributions
uggle+0.140
 else+0.139
 up+0.138
 compliments+0.129
Hop+0.126
Neg logit contributions
 arrest-0.147
 no-0.143
 loser-0.141
 fourteen-0.133
 enough-0.133
Token brush
Token ID14093
Feature activation-6.52e-03

Pos logit contributions
 up+0.078
uggle+0.078
 else+0.073
Hop+0.070
 compliments+0.069
Neg logit contributions
 arrest-0.113
 loser-0.108
 no-0.108
 sixteen-0.102
 eighteen-0.102
Token into
Token ID656
Feature activation-2.42e-02

Pos logit contributions
 up+0.007
 around+0.006
 outside+0.006
 else+0.006
uggle+0.006
Neg logit contributions
 arrest-0.014
 loser-0.014
 fourteen-0.013
 eighteen-0.013
 no-0.013
Token the
Token ID262
Feature activation+4.64e-03

Pos logit contributions
 up+0.034
 else+0.032
uggle+0.032
 around+0.030
 compliments+0.030
Neg logit contributions
 arrest-0.042
 loser-0.040
 no-0.038
 fourteen-0.038
 eighteen-0.037
Token paint
Token ID7521
Feature activation+2.75e-03

Pos logit contributions
 arrest+0.009
 loser+0.009
 no+0.008
 fourteen+0.008
 eighteen+0.008
Neg logit contributions
 up-0.005
 else-0.005
uggle-0.005
 around-0.005
 compliments-0.005
Token make
Token ID787
Feature activation-2.15e-02

Pos logit contributions
 up+0.012
 else+0.011
uggle+0.011
 around+0.010
 compliments+0.010
Neg logit contributions
 arrest-0.021
 loser-0.020
 no-0.019
 fourteen-0.019
 eighteen-0.018
Token one
Token ID530
Feature activation-2.96e-02

Pos logit contributions
uggle+0.031
Hop+0.029
 else+0.029
 up+0.028
fish+0.028
Neg logit contributions
 arrest-0.035
 no-0.034
 loser-0.033
 enough-0.032
 eleven-0.031
Token too
Token ID1165
Feature activation-1.73e-02

Pos logit contributions
uggle+0.038
 up+0.036
Hop+0.035
 compliments+0.035
 else+0.035
Neg logit contributions
 loser-0.052
 no-0.051
 arrest-0.051
 eleven-0.048
 fourteen-0.048
Token.
Token ID13
Feature activation+7.65e-03

Pos logit contributions
 up+0.017
 else+0.016
 closer+0.014
 outside+0.014
 around+0.014
Neg logit contributions
 arrest-0.040
 loser-0.037
 fourteen-0.036
 eighteen-0.035
 no-0.034
Token He
Token ID679
Feature activation-6.04e-03

Pos logit contributions
 arrest+0.014
 loser+0.013
 no+0.013
 fourteen+0.013
 eighteen+0.012
Neg logit contributions
 up-0.010
uggle-0.010
 else-0.009
 compliments-0.009
fish-0.009
Token dips
Token ID48904
Feature activation-9.38e-02

Pos logit contributions
uggle+0.010
 else+0.009
 up+0.009
 outside+0.009
fish+0.009
Neg logit contributions
 arrest-0.009
 fourteen-0.008
 no-0.008
 eleven-0.008
 loser-0.008
Token his
Token ID465
Feature activation-9.85e-03

Pos logit contributions
uggle+0.138
 else+0.131
 up+0.131
 compliments+0.123
Hop+0.123
Neg logit contributions
 no-0.148
 arrest-0.146
 loser-0.144
 fourteen-0.136
 enough-0.135
Token brush
Token ID14093
Feature activation-3.67e-03

Pos logit contributions
 up+0.014
 else+0.013
 around+0.012
 outside+0.012
uggle+0.012
Neg logit contributions
 arrest-0.018
 loser-0.017
 fourteen-0.016
 no-0.016
 eighteen-0.015
Token in
Token ID287
Feature activation-2.48e-02

Pos logit contributions
 up+0.004
 else+0.004
uggle+0.003
 around+0.003
 outside+0.003
Neg logit contributions
 arrest-0.008
 loser-0.007
 no-0.007
 fourteen-0.007
 eighteen-0.007
Token the
Token ID262
Feature activation+8.04e-03

Pos logit contributions
 up+0.031
 around+0.026
 closer+0.026
 else+0.026
 outside+0.026
Neg logit contributions
 arrest-0.051
 loser-0.048
ross-0.045
 fourteen-0.044
 eighteen-0.044
Token red
Token ID2266
Feature activation-3.85e-02

Pos logit contributions
 arrest+0.017
 loser+0.015
 fourteen+0.015
 no+0.014
ross+0.014
Neg logit contributions
 up-0.010
 outside-0.009
 around-0.009
 else-0.009
uggle-0.008
Token a
Token ID257
Feature activation-1.63e-02

Pos logit contributions
 up+0.008
 else+0.008
 around+0.007
 outside+0.007
 compliments+0.007
Neg logit contributions
 arrest-0.013
 loser-0.013
 fourteen-0.012
 no-0.012
 eighteen-0.012
Token big
Token ID1263
Feature activation-1.17e-02

Pos logit contributions
 up+0.020
 else+0.019
uggle+0.019
 around+0.018
 compliments+0.018
Neg logit contributions
 arrest-0.031
 loser-0.029
 no-0.028
 fourteen-0.028
 eighteen-0.027
Token brush
Token ID14093
Feature activation+1.13e-02

Pos logit contributions
 up+0.013
 outside+0.012
 closer+0.012
 around+0.011
 else+0.011
Neg logit contributions
 arrest-0.024
ross-0.023
 loser-0.023
 fourteen-0.022
 no-0.021
Token.
Token ID13
Feature activation+6.21e-03

Pos logit contributions
 arrest+0.029
 loser+0.027
 eighteen+0.025
 fourteen+0.025
 losers+0.024
Neg logit contributions
 up-0.011
 around-0.009
 outside-0.009
 carefully-0.008
 else-0.007
Token She
Token ID1375
Feature activation-3.64e-03

Pos logit contributions
 arrest+0.008
 loser+0.007
 no+0.007
 fourteen+0.007
 sixteen+0.007
Neg logit contributions
 up-0.011
 else-0.011
uggle-0.011
 compliments-0.011
fish-0.011
Token dipped
Token ID35300
Feature activation-9.33e-02

Pos logit contributions
uggle+0.005
 up+0.005
 else+0.005
 compliments+0.005
Hop+0.005
Neg logit contributions
 arrest-0.006
 no-0.006
 loser-0.005
 fourteen-0.005
 sixteen-0.005
Token the
Token ID262
Feature activation-1.93e-02

Pos logit contributions
 up+0.116
 else+0.112
uggle+0.112
 compliments+0.103
 around+0.102
Neg logit contributions
 arrest-0.175
 loser-0.166
 no-0.163
 fourteen-0.158
 eighteen-0.155
Token brush
Token ID14093
Feature activation-8.44e-03

Pos logit contributions
 up+0.024
 else+0.022
uggle+0.021
 around+0.021
 outside+0.021
Neg logit contributions
 arrest-0.038
 loser-0.036
 no-0.035
 fourteen-0.034
 eighteen-0.033
Token in
Token ID287
Feature activation-2.28e-02

Pos logit contributions
 up+0.009
 around+0.008
 else+0.008
uggle+0.007
 outside+0.007
Neg logit contributions
 arrest-0.019
 loser-0.018
 fourteen-0.017
 no-0.017
 eighteen-0.017
Token the
Token ID262
Feature activation+1.24e-02

Pos logit contributions
 up+0.028
 outside+0.023
 around+0.023
 carefully+0.023
 closer+0.023
Neg logit contributions
 arrest-0.048
 loser-0.046
ross-0.043
 eighteen-0.041
 fourteen-0.041
Token yellow
Token ID7872
Feature activation-4.68e-02

Pos logit contributions
 arrest+0.027
 loser+0.025
 fourteen+0.024
 no+0.024
 enough+0.023
Neg logit contributions
 up-0.014
 outside-0.013
 around-0.012
 else-0.012
uggle-0.012
Token for
Token ID329
Feature activation+1.54e-02

Pos logit contributions
 arrest+0.013
 loser+0.012
 fourteen+0.012
 no+0.011
 eighteen+0.011
Neg logit contributions
 up-0.008
 else-0.007
uggle-0.007
 around-0.007
 outside-0.007
Token making
Token ID1642
Feature activation-1.58e-02

Pos logit contributions
 arrest+0.029
 loser+0.028
 no+0.027
 fourteen+0.026
 eighteen+0.026
Neg logit contributions
 up-0.019
 else-0.018
uggle-0.018
 around-0.017
 compliments-0.017
Token bubbles
Token ID25037
Feature activation-2.57e-02

Pos logit contributions
uggle+0.021
 up+0.021
 else+0.020
fish+0.019
 compliments+0.019
Neg logit contributions
 arrest-0.028
 no-0.026
 loser-0.026
 enough-0.025
 fourteen-0.025
Token.
Token ID13
Feature activation-4.76e-03

Pos logit contributions
 up+0.031
uggle+0.030
 else+0.029
 compliments+0.028
 around+0.026
Neg logit contributions
 arrest-0.049
 loser-0.047
 no-0.046
 fourteen-0.044
 enough-0.043
Token She
Token ID1375
Feature activation-6.30e-03

Pos logit contributions
fish+0.007
in+0.007
 else+0.007
 compliments+0.007
 up+0.007
Neg logit contributions
 arrest-0.007
 no-0.007
 No-0.007
 Kim-0.007
 eight-0.006
Token dipped
Token ID35300
Feature activation-9.32e-02

Pos logit contributions
uggle+0.012
 up+0.010
ore+0.010
 compliments+0.010
 programmes+0.009
Neg logit contributions
 no-0.008
 arrest-0.008
 eight-0.008
 enough-0.007
 loser-0.007
Token the
Token ID262
Feature activation-2.45e-02

Pos logit contributions
uggle+0.118
 up+0.117
 else+0.116
 compliments+0.108
fish+0.104
Neg logit contributions
 arrest-0.166
 no-0.161
 loser-0.159
 fourteen-0.152
 enough-0.150
Token wand
Token ID11569
Feature activation-3.54e-02

Pos logit contributions
 up+0.027
uggle+0.026
 else+0.026
 compliments+0.024
 around+0.023
Neg logit contributions
 arrest-0.050
 loser-0.048
 no-0.047
 fourteen-0.045
 eighteen-0.044
Token into
Token ID656
Feature activation-1.58e-02

Pos logit contributions
uggle+0.038
 else+0.037
 up+0.037
 compliments+0.036
Hop+0.034
Neg logit contributions
 arrest-0.069
 no-0.067
 loser-0.066
 fourteen-0.064
 enough-0.063
Token the
Token ID262
Feature activation+6.57e-03

Pos logit contributions
 up+0.020
 else+0.019
uggle+0.018
 around+0.018
 outside+0.018
Neg logit contributions
 arrest-0.030
 loser-0.029
 fourteen-0.027
 no-0.027
 eighteen-0.027
Token bottle
Token ID9294
Feature activation-2.24e-02

Pos logit contributions
 arrest+0.013
 loser+0.012
 no+0.012
 fourteen+0.011
 sixteen+0.011
Neg logit contributions
 up-0.008
 else-0.007
uggle-0.007
 compliments-0.007
in-0.007
Token a
Token ID257
Feature activation-2.47e-02

Pos logit contributions
 up+0.015
 else+0.015
uggle+0.015
 around+0.014
fish+0.014
Neg logit contributions
 arrest-0.018
 no-0.017
 loser-0.017
 fourteen-0.016
 eighteen-0.016
Token nearby
Token ID6716
Feature activation+6.20e-03

Pos logit contributions
 up+0.029
uggle+0.028
 else+0.026
 compliments+0.026
 around+0.026
Neg logit contributions
 arrest-0.048
 loser-0.045
 no-0.045
 eighteen-0.042
 enough-0.041
Token pond
Token ID16723
Feature activation-1.06e-02

Pos logit contributions
 arrest+0.013
 loser+0.012
 no+0.012
 fourteen+0.012
 eighteen+0.012
Neg logit contributions
 up-0.007
 else-0.006
uggle-0.006
 around-0.006
 outside-0.006
Token.
Token ID13
Feature activation+1.13e-02

Pos logit contributions
 up+0.012
 else+0.011
uggle+0.010
 around+0.010
 outside+0.010
Neg logit contributions
 arrest-0.022
 loser-0.021
 fourteen-0.020
 no-0.020
 eighteen-0.020
Token She
Token ID1375
Feature activation+6.80e-04

Pos logit contributions
 arrest+0.018
 loser+0.017
 fourteen+0.016
 no+0.016
 eighteen+0.016
Neg logit contributions
 up-0.018
uggle-0.017
 else-0.017
 outside-0.016
 around-0.016
Token dipped
Token ID35300
Feature activation-9.23e-02

Pos logit contributions
 no+0.001
 arrest+0.001
 enough+0.001
 fourteen+0.001
 loser+0.001
Neg logit contributions
uggle-0.001
 up-0.001
 programmes-0.001
 compliments-0.001
ore-0.001
Token her
Token ID607
Feature activation-2.12e-02

Pos logit contributions
uggle+0.115
 else+0.115
 up+0.115
 compliments+0.106
fish+0.103
Neg logit contributions
 arrest-0.164
 no-0.161
 loser-0.158
 fourteen-0.151
 enough-0.149
Token be
Token ID307
Feature activation+1.91e-03

Pos logit contributions
 up+0.034
uggle+0.033
 else+0.033
 compliments+0.031
 around+0.030
Neg logit contributions
 arrest-0.033
 loser-0.030
 no-0.030
 fourteen-0.028
 eighteen-0.028
Tokenak
Token ID461
Feature activation-1.40e-02

Pos logit contributions
 arrest+0.003
 loser+0.002
 no+0.002
 fourteen+0.002
 eighteen+0.002
Neg logit contributions
uggle-0.003
 up-0.003
 else-0.003
 compliments-0.003
 curiously-0.003
Token in
Token ID287
Feature activation-1.53e-02

Pos logit contributions
 else+0.016
uggle+0.016
 up+0.016
 compliments+0.015
fish+0.014
Neg logit contributions
 arrest-0.027
 no-0.026
 loser-0.025
 fourteen-0.025
 enough-0.024
Token the
Token ID262
Feature activation+2.02e-02

Pos logit contributions
 up+0.020
 closer+0.017
 else+0.017
 around+0.017
 outside+0.017
Neg logit contributions
 arrest-0.030
 loser-0.029
 eighteen-0.027
 fourteen-0.027
ross-0.026
Token special
Token ID2041
Feature activation-3.12e-02

Pos logit contributions
 arrest+0.011
 loser+0.011
 no+0.010
 fourteen+0.010
 eighteen+0.010
Neg logit contributions
 up-0.007
 else-0.006
uggle-0.006
 compliments-0.006
 around-0.006
Token black
Token ID2042
Feature activation-2.70e-02

Pos logit contributions
 up+0.037
 else+0.036
uggle+0.035
 around+0.032
 compliments+0.032
Neg logit contributions
 arrest-0.061
 loser-0.057
 no-0.056
 fourteen-0.054
 eighteen-0.053
Token bowl
Token ID9396
Feature activation-4.45e-03

Pos logit contributions
 up+0.031
uggle+0.028
 else+0.028
 outside+0.027
 around+0.027
Neg logit contributions
 arrest-0.055
 loser-0.051
 no-0.050
 fourteen-0.049
 eighteen-0.048
Token.
Token ID13
Feature activation-1.46e-02

Pos logit contributions
 up+0.004
 around+0.004
 else+0.004
 outside+0.004
uggle+0.003
Neg logit contributions
 arrest-0.011
 loser-0.010
 fourteen-0.009
 eighteen-0.009
 sixteen-0.009
Token She
Token ID1375
Feature activation-1.04e-02

Pos logit contributions
 up+0.019
 else+0.018
uggle+0.018
 around+0.017
 compliments+0.016
Neg logit contributions
 arrest-0.027
 loser-0.026
 no-0.025
 fourteen-0.025
 eighteen-0.024
Token dipped
Token ID35300
Feature activation-9.21e-02

Pos logit contributions
uggle+0.014
 up+0.013
 else+0.012
 outside+0.012
Hop+0.011
Neg logit contributions
 arrest-0.019
 no-0.019
 fourteen-0.018
 sixteen-0.017
 loser-0.017
Token her
Token ID607
Feature activation-2.41e-02

Pos logit contributions
 up+0.110
 else+0.106
uggle+0.106
 compliments+0.097
 around+0.096
Neg logit contributions
 arrest-0.177
 loser-0.168
 no-0.165
 fourteen-0.161
 eighteen-0.157
Token brush
Token ID14093
Feature activation-4.20e-03

Pos logit contributions
uggle+0.036
 up+0.036
 else+0.035
 compliments+0.034
Hop+0.033
Neg logit contributions
 arrest-0.038
 no-0.036
 loser-0.036
 fourteen-0.034
 eighteen-0.033
Token in
Token ID287
Feature activation-2.05e-02

Pos logit contributions
 up+0.004
 else+0.004
uggle+0.004
 around+0.004
 outside+0.004
Neg logit contributions
 arrest-0.009
 loser-0.009
 no-0.008
 fourteen-0.008
 eighteen-0.008
Token and
Token ID290
Feature activation-9.75e-03

Pos logit contributions
 up+0.026
 around+0.022
 outside+0.022
 closer+0.022
 else+0.022
Neg logit contributions
 arrest-0.042
 loser-0.040
ross-0.036
 eighteen-0.036
 fourteen-0.036
Token started
Token ID2067
Feature activation-7.55e-03

Pos logit contributions
uggle+0.014
Hop+0.012
 else+0.012
 up+0.012
rig+0.012
Neg logit contributions
 no-0.017
 arrest-0.016
 fourteen-0.016
 sixteen-0.015
 enough-0.015
Token\n
Token ID198
Feature activation-1.01e-02

Pos logit contributions
 up+0.013
 else+0.013
uggle+0.012
 compliments+0.012
fish+0.012
Neg logit contributions
 arrest-0.018
 loser-0.017
 no-0.017
 fourteen-0.016
 eighteen-0.016
Token\n
Token ID198
Feature activation+6.75e-03

Pos logit contributions
 up+0.015
 compliments+0.014
uggle+0.014
 findings+0.013
 recordings+0.013
Neg logit contributions
 no-0.015
 arrest-0.015
 eight-0.014
 loser-0.014
 No-0.014
TokenM
Token ID44
Feature activation-1.18e-02

Pos logit contributions
 arrest+0.011
 loser+0.011
 no+0.011
 fourteen+0.010
 eighteen+0.010
Neg logit contributions
 up-0.010
uggle-0.009
 compliments-0.009
 around-0.009
 else-0.009
Tokenum
Token ID388
Feature activation-2.50e-02

Pos logit contributions
 up+0.013
 else+0.012
uggle+0.012
 around+0.011
 compliments+0.011
Neg logit contributions
 arrest-0.024
 loser-0.023
 no-0.023
 fourteen-0.022
 eighteen-0.022
Token gave
Token ID2921
Feature activation-4.07e-02

Pos logit contributions
 up+0.029
 else+0.028
uggle+0.027
 around+0.026
 outside+0.025
Neg logit contributions
 arrest-0.050
 loser-0.048
 no-0.046
 fourteen-0.045
 eighteen-0.044
Token me
Token ID502
Feature activation-9.21e-02

Pos logit contributions
 up+0.058
 else+0.058
uggle+0.057
 compliments+0.053
fish+0.053
Neg logit contributions
 arrest-0.068
 loser-0.063
 no-0.063
 fourteen-0.061
 enough-0.059
Token a
Token ID257
Feature activation-4.82e-02

Pos logit contributions
 up+0.122
 else+0.116
uggle+0.115
 around+0.107
 compliments+0.107
Neg logit contributions
 arrest-0.169
 loser-0.159
 no-0.154
 fourteen-0.151
 eighteen-0.147
Token big
Token ID1263
Feature activation-3.45e-02

Pos logit contributions
 up+0.059
uggle+0.059
 closely+0.055
 else+0.055
 around+0.055
Neg logit contributions
 loser-0.082
 arrest-0.080
 no-0.080
 eight-0.078
 sixteen-0.075
Token cardboard
Token ID26819
Feature activation-3.77e-04

Pos logit contributions
 up+0.041
 else+0.040
uggle+0.039
 around+0.036
 compliments+0.036
Neg logit contributions
 arrest-0.066
 loser-0.063
 no-0.061
 fourteen-0.060
 eighteen-0.059
Token box
Token ID3091
Feature activation+1.08e-02

Pos logit contributions
 up+0.000
 else+0.000
uggle+0.000
 around+0.000
 outside+0.000
Neg logit contributions
 arrest-0.001
 loser-0.001
 no-0.001
 fourteen-0.001
 eighteen-0.001
Token and
Token ID290
Feature activation-2.22e-02

Pos logit contributions
 arrest+0.024
 loser+0.023
 fourteen+0.022
 eighteen+0.022
 sixteen+0.021
Neg logit contributions
 up-0.012
 around-0.010
 outside-0.010
 else-0.010
uggle-0.010
Token ran
Token ID4966
Feature activation+2.79e-02

Pos logit contributions
 up+0.027
uggle+0.027
 else+0.026
 compliments+0.024
 outside+0.024
Neg logit contributions
 arrest-0.034
 no-0.032
 loser-0.032
 fourteen-0.031
 eighteen-0.030
Token to
Token ID284
Feature activation-1.26e-02

Pos logit contributions
 arrest+0.043
 no+0.042
 enough+0.042
 fourteen+0.039
 sixteen+0.038
Neg logit contributions
uggle-0.041
 else-0.041
fish-0.040
in-0.039
 compliments-0.038
Token the
Token ID262
Feature activation+1.08e-02

Pos logit contributions
fish+0.024
uggle+0.023
in+0.023
Hop+0.023
actus+0.023
Neg logit contributions
 arrest-0.016
 no-0.015
 enough-0.014
 Kim-0.013
 fourteen-0.012
Token shore
Token ID15191
Feature activation-2.71e-02

Pos logit contributions
 no+0.018
 arrest+0.017
 loser+0.016
 enough+0.016
 eight+0.016
Neg logit contributions
uggle-0.015
 compliments-0.013
le-0.013
Hop-0.013
 findings-0.013
Token and
Token ID290
Feature activation-6.52e-03

Pos logit contributions
 up+0.034
uggle+0.033
 else+0.033
 compliments+0.031
Hop+0.030
Neg logit contributions
 arrest-0.050
 loser-0.047
 no-0.047
 fourteen-0.045
 enough-0.044
Token dipped
Token ID35300
Feature activation-9.15e-02

Pos logit contributions
uggle+0.011
in+0.010
 compliments+0.009
actus+0.009
rig+0.009
Neg logit contributions
 no-0.010
 arrest-0.010
 enough-0.009
 fourteen-0.008
 Kim-0.008
Token their
Token ID511
Feature activation-2.04e-03

Pos logit contributions
uggle+0.131
 else+0.128
Hop+0.122
 up+0.120
 compliments+0.118
Neg logit contributions
 no-0.148
 arrest-0.138
 enough-0.138
 loser-0.135
 fourteen-0.131
Token feet
Token ID3625
Feature activation+5.44e-04

Pos logit contributions
uggle+0.004
 else+0.003
 up+0.003
Hop+0.003
 compliments+0.003
Neg logit contributions
 arrest-0.003
 no-0.003
 loser-0.002
 eight-0.002
 fourteen-0.002
Token in
Token ID287
Feature activation-1.46e-02

Pos logit contributions
 no+0.001
 arrest+0.001
 loser+0.001
 fourteen+0.001
 enough+0.001
Neg logit contributions
uggle-0.001
 else-0.001
 compliments-0.001
 up-0.001
Hop-0.001
Token the
Token ID262
Feature activation+2.19e-02

Pos logit contributions
 up+0.020
 else+0.019
uggle+0.018
 around+0.017
 outside+0.017
Neg logit contributions
 arrest-0.026
 loser-0.025
 no-0.024
 fourteen-0.024
 eighteen-0.023
Token cool
Token ID3608
Feature activation-2.43e-02

Pos logit contributions
 arrest+0.047
 no+0.045
 loser+0.045
 fourteen+0.043
 eighteen+0.042
Neg logit contributions
uggle-0.021
 else-0.020
 up-0.019
 compliments-0.018
Hop-0.017
Token lake
Token ID13546
Feature activation-4.34e-02

Pos logit contributions
 arrest+0.012
 loser+0.011
 no+0.011
 fourteen+0.011
 eighteen+0.010
Neg logit contributions
 up-0.009
 else-0.008
uggle-0.008
 outside-0.008
 around-0.008
Token to
Token ID284
Feature activation+1.01e-02

Pos logit contributions
 else+0.057
 up+0.056
uggle+0.056
 compliments+0.054
Hop+0.051
Neg logit contributions
 arrest-0.076
 no-0.072
 loser-0.070
 enough-0.067
 fourteen-0.066
Token soak
Token ID33182
Feature activation-5.80e-02

Pos logit contributions
 arrest+0.020
 no+0.019
 loser+0.018
 fourteen+0.017
 sixteen+0.017
Neg logit contributions
 else-0.012
in-0.011
 up-0.011
uggle-0.011
ixie-0.011
Token.
Token ID13
Feature activation-2.27e-03

Pos logit contributions
uggle+0.096
 compliments+0.095
ixie+0.095
 else+0.094
rig+0.092
Neg logit contributions
 no-0.086
 enough-0.075
 loser-0.068
 arrest-0.067
 eleven-0.065
Token He
Token ID679
Feature activation-3.11e-03

Pos logit contributions
 up+0.003
 else+0.003
uggle+0.003
 around+0.003
 compliments+0.003
Neg logit contributions
 arrest-0.004
 loser-0.004
 no-0.004
 fourteen-0.003
 eighteen-0.003
Token dipped
Token ID35300
Feature activation-9.13e-02

Pos logit contributions
uggle+0.005
 compliments+0.005
 up+0.005
 programmes+0.005
 findings+0.004
Neg logit contributions
 no-0.005
 arrest-0.004
 loser-0.004
 eight-0.004
 enough-0.004
Token his
Token ID465
Feature activation-1.28e-02

Pos logit contributions
uggle+0.130
 else+0.128
 up+0.128
 compliments+0.121
fish+0.116
Neg logit contributions
 arrest-0.148
 no-0.144
 loser-0.141
 fourteen-0.133
 enough-0.133
Token paws
Token ID49193
Feature activation-8.89e-03

Pos logit contributions
uggle+0.023
 else+0.022
 up+0.022
 compliments+0.021
 closely+0.020
Neg logit contributions
 arrest-0.017
 no-0.016
 loser-0.016
 eight-0.015
 fourteen-0.014
Token in
Token ID287
Feature activation-2.22e-02

Pos logit contributions
uggle+0.010
 else+0.010
 compliments+0.009
 up+0.009
 findings+0.009
Neg logit contributions
 arrest-0.017
 no-0.017
 loser-0.016
 fourteen-0.016
 enough-0.016
Token the
Token ID262
Feature activation+9.53e-03

Pos logit contributions
 up+0.029
 else+0.026
 around+0.025
 outside+0.025
 closer+0.025
Neg logit contributions
 arrest-0.043
 loser-0.041
 fourteen-0.038
 eighteen-0.038
 no-0.037
Token water
Token ID1660
Feature activation-3.61e-02

Pos logit contributions
 arrest+0.021
 loser+0.020
 no+0.020
 fourteen+0.019
 eighteen+0.019
Neg logit contributions
 else-0.009
uggle-0.009
 up-0.009
 compliments-0.008
 findings-0.007
Token up
Token ID510
Feature activation-4.76e-02

Pos logit contributions
uggle+0.038
 else+0.038
fish+0.036
rig+0.034
Hop+0.034
Neg logit contributions
 enough-0.072
 arrest-0.072
 no-0.072
 fourteen-0.071
 eleven-0.069
Token a
Token ID257
Feature activation-4.50e-02

Pos logit contributions
 up+0.067
 else+0.063
uggle+0.062
 around+0.059
 compliments+0.059
Neg logit contributions
 arrest-0.084
 loser-0.079
 no-0.076
 fourteen-0.075
 eighteen-0.073
Token carrot
Token ID40562
Feature activation-2.43e-02

Pos logit contributions
uggle+0.064
 up+0.064
 else+0.060
 compliments+0.058
 around+0.058
Neg logit contributions
 arrest-0.074
 loser-0.074
 no-0.071
 fourteen-0.068
 eighteen-0.068
Token stick
Token ID4859
Feature activation-2.61e-04

Pos logit contributions
 up+0.030
 else+0.027
uggle+0.026
 around+0.026
 outside+0.025
Neg logit contributions
 arrest-0.049
 loser-0.046
 fourteen-0.044
 no-0.044
 eighteen-0.043
Token and
Token ID290
Feature activation-7.95e-03

Pos logit contributions
 up+0.000
 around+0.000
 outside+0.000
 else+0.000
 closer+0.000
Neg logit contributions
 arrest-0.001
 loser-0.001
 eighteen-0.001
 fourteen-0.001
 sixteen-0.000
Token dips
Token ID48904
Feature activation-9.12e-02

Pos logit contributions
uggle+0.012
 up+0.011
 else+0.011
 around+0.010
in+0.010
Neg logit contributions
 arrest-0.013
 no-0.012
 fourteen-0.012
 loser-0.012
 eighteen-0.012
Token it
Token ID340
Feature activation-3.73e-02

Pos logit contributions
uggle+0.124
 up+0.118
 else+0.117
 compliments+0.110
rig+0.109
Neg logit contributions
 no-0.155
 arrest-0.152
 loser-0.150
 fourteen-0.145
 enough-0.143
Token in
Token ID287
Feature activation-2.47e-02

Pos logit contributions
uggle+0.042
 compliments+0.038
 else+0.038
 up+0.037
Hop+0.036
Neg logit contributions
 no-0.072
 arrest-0.071
 loser-0.069
 fourteen-0.068
 eleven-0.068
Token a
Token ID257
Feature activation-2.42e-02

Pos logit contributions
 up+0.030
 else+0.027
 closer+0.026
 carefully+0.026
 closely+0.025
Neg logit contributions
 arrest-0.049
 loser-0.047
ross-0.043
 fourteen-0.043
 eighteen-0.042
Token white
Token ID2330
Feature activation-4.39e-02

Pos logit contributions
 up+0.032
uggle+0.031
 else+0.030
 compliments+0.029
 around+0.028
Neg logit contributions
 arrest-0.044
 loser-0.042
 no-0.040
 eighteen-0.039
 fourteen-0.039
Token sauce
Token ID10746
Feature activation-2.06e-02

Pos logit contributions
 up+0.055
 else+0.052
uggle+0.052
 around+0.048
 compliments+0.048
Neg logit contributions
 arrest-0.084
 loser-0.079
 no-0.077
 fourteen-0.075
 eighteen-0.073
Token He
Token ID679
Feature activation-1.06e-02

Pos logit contributions
 arrest+0.016
 loser+0.015
 fourteen+0.014
 eighteen+0.014
tern+0.014
Neg logit contributions
 up-0.013
 else-0.012
 around-0.012
 outside-0.011
 quietly-0.011
Token took
Token ID1718
Feature activation+1.66e-03

Pos logit contributions
uggle+0.017
 up+0.016
 else+0.015
 compliments+0.015
 programmes+0.014
Neg logit contributions
 arrest-0.016
 no-0.015
 loser-0.015
 enough-0.014
 eight-0.014
Token the
Token ID262
Feature activation-2.82e-02

Pos logit contributions
 arrest+0.003
 no+0.003
 loser+0.003
 enough+0.002
 fourteen+0.002
Neg logit contributions
uggle-0.002
 else-0.002
fish-0.002
 up-0.002
rig-0.002
Token teaspoon
Token ID22326
Feature activation-4.61e-02

Pos logit contributions
uggle+0.034
rig+0.032
Hop+0.031
 compliments+0.030
 up+0.030
Neg logit contributions
 loser-0.053
 no-0.053
 arrest-0.050
 eight-0.049
 enough-0.048
Token and
Token ID290
Feature activation-1.47e-02

Pos logit contributions
uggle+0.064
 else+0.063
Hop+0.063
 compliments+0.062
 programmes+0.060
Neg logit contributions
 arrest-0.075
 no-0.072
 enough-0.070
 loser-0.068
 eleven-0.065
Token dipped
Token ID35300
Feature activation-9.09e-02

Pos logit contributions
 up+0.020
uggle+0.019
 else+0.019
 compliments+0.017
 around+0.017
Neg logit contributions
 arrest-0.027
 loser-0.025
 no-0.024
 fourteen-0.024
 eighteen-0.023
Token it
Token ID340
Feature activation-3.49e-02

Pos logit contributions
 up+0.107
 else+0.098
uggle+0.096
 around+0.092
 outside+0.090
Neg logit contributions
 arrest-0.186
 loser-0.175
 no-0.167
 fourteen-0.166
 eighteen-0.163
Token in
Token ID287
Feature activation-1.95e-02

Pos logit contributions
uggle+0.044
 compliments+0.041
 else+0.040
 programmes+0.039
Hop+0.039
Neg logit contributions
 no-0.064
 enough-0.060
 arrest-0.059
 fourteen-0.059
 eleven-0.059
Token red
Token ID2266
Feature activation-3.79e-02

Pos logit contributions
 up+0.025
 outside+0.021
 closer+0.021
 else+0.021
 around+0.020
Neg logit contributions
 arrest-0.039
 loser-0.038
ross-0.035
 fourteen-0.034
 eighteen-0.034
Token paint
Token ID7521
Feature activation-2.04e-03

Pos logit contributions
 up+0.044
 else+0.040
uggle+0.040
 around+0.038
 outside+0.038
Neg logit contributions
 arrest-0.077
 loser-0.074
 no-0.070
 fourteen-0.068
 eighteen-0.068
Token.
Token ID13
Feature activation+6.34e-05

Pos logit contributions
 up+0.002
 around+0.002
 else+0.002
 outside+0.002
uggle+0.002
Neg logit contributions
 arrest-0.005
 loser-0.004
 no-0.004
 fourteen-0.004
 eighteen-0.004
Token to
Token ID284
Feature activation-1.10e-02

Pos logit contributions
 arrest+0.024
 no+0.022
 loser+0.022
 enough+0.021
 eleven+0.020
Neg logit contributions
 else-0.027
fish-0.027
uggle-0.026
 compliments-0.026
 findings-0.026
Token the
Token ID262
Feature activation+5.28e-03

Pos logit contributions
fish+0.022
actus+0.022
in+0.022
rig+0.022
ixie+0.021
Neg logit contributions
 arrest-0.013
 no-0.012
 Kim-0.010
 letting-0.010
 eleven-0.010
Token stream
Token ID4269
Feature activation-5.97e-03

Pos logit contributions
 arrest+0.009
 loser+0.009
 no+0.009
 eight+0.008
 enough+0.008
Neg logit contributions
 compliments-0.007
 else-0.007
uggle-0.007
rig-0.007
 programmes-0.007
Token.
Token ID13
Feature activation+1.94e-02

Pos logit contributions
 up+0.006
 around+0.006
 else+0.005
 outside+0.005
uggle+0.005
Neg logit contributions
 arrest-0.013
 loser-0.013
 fourteen-0.012
 eighteen-0.012
 sixteen-0.012
Token She
Token ID1375
Feature activation+1.78e-03

Pos logit contributions
 arrest+0.030
 loser+0.028
 no+0.027
 fourteen+0.026
 eighteen+0.025
Neg logit contributions
 up-0.032
 else-0.030
uggle-0.030
 compliments-0.029
 around-0.029
Token dips
Token ID48904
Feature activation-9.05e-02

Pos logit contributions
 arrest+0.003
 no+0.002
 fourteen+0.002
 eleven+0.002
 loser+0.002
Neg logit contributions
uggle-0.003
 else-0.003
 up-0.003
 compliments-0.003
Hop-0.003
Token her
Token ID607
Feature activation-1.72e-02

Pos logit contributions
uggle+0.125
 else+0.120
 up+0.118
rig+0.114
 compliments+0.113
Neg logit contributions
 no-0.152
 arrest-0.145
 loser-0.144
 enough-0.139
 fourteen-0.137
Token hand
Token ID1021
Feature activation+7.08e-03

Pos logit contributions
 up+0.022
uggle+0.021
 else+0.021
 compliments+0.020
 around+0.019
Neg logit contributions
 arrest-0.032
 loser-0.030
 no-0.030
 fourteen-0.029
 eighteen-0.028
Token in
Token ID287
Feature activation-1.72e-02

Pos logit contributions
 no+0.013
 arrest+0.013
 loser+0.012
 enough+0.012
 fourteen+0.012
Neg logit contributions
uggle-0.008
 else-0.008
 compliments-0.008
 findings-0.008
 programmes-0.008
Token the
Token ID262
Feature activation+1.48e-02

Pos logit contributions
 up+0.022
 else+0.019
 outside+0.019
 closer+0.019
 around+0.019
Neg logit contributions
 arrest-0.034
 loser-0.033
 fourteen-0.030
 eighteen-0.030
ross-0.030
Token water
Token ID1660
Feature activation-2.89e-02

Pos logit contributions
 arrest+0.034
 loser+0.033
 no+0.032
 fourteen+0.031
 eighteen+0.031
Neg logit contributions
 up-0.012
 else-0.012
uggle-0.011
 compliments-0.010
 around-0.010
Token was
Token ID373
Feature activation-1.49e-02

Pos logit contributions
uggle+0.047
 up+0.045
 closer+0.041
 else+0.041
Hop+0.041
Neg logit contributions
 arrest-0.068
 no-0.066
 loser-0.061
 fourteen-0.060
 enough-0.059
Token so
Token ID523
Feature activation-4.85e-02

Pos logit contributions
 up+0.016
 else+0.014
uggle+0.014
 around+0.013
 outside+0.013
Neg logit contributions
 arrest-0.033
 loser-0.031
 fourteen-0.029
 no-0.029
 eighteen-0.029
Token happy
Token ID3772
Feature activation+6.77e-03

Pos logit contributions
uggle+0.046
 up+0.044
Hop+0.044
fish+0.042
 compliments+0.041
Neg logit contributions
 no-0.101
 arrest-0.099
 loser-0.097
 sixteen-0.093
 eight-0.093
Token!
Token ID0
Feature activation-5.38e-03

Pos logit contributions
 arrest+0.013
 loser+0.012
 no+0.012
 fourteen+0.012
 eighteen+0.011
Neg logit contributions
 up-0.009
 else-0.008
uggle-0.008
 outside-0.008
 around-0.008
Token He
Token ID679
Feature activation-8.24e-03

Pos logit contributions
 up+0.007
 else+0.007
uggle+0.007
 compliments+0.007
fish+0.007
Neg logit contributions
 arrest-0.009
 loser-0.009
 no-0.008
 fourteen-0.008
 eighteen-0.008
Token dipped
Token ID35300
Feature activation-9.03e-02

Pos logit contributions
 up+0.011
uggle+0.011
 else+0.010
 compliments+0.010
 around+0.010
Neg logit contributions
 arrest-0.015
 no-0.014
 loser-0.014
 fourteen-0.013
 eighteen-0.013
Token his
Token ID465
Feature activation-1.81e-02

Pos logit contributions
 up+0.125
 else+0.118
uggle+0.117
 around+0.111
 compliments+0.110
Neg logit contributions
 arrest-0.162
 loser-0.152
 no-0.147
 fourteen-0.144
 eighteen-0.141
Token brush
Token ID14093
Feature activation-1.25e-04

Pos logit contributions
 up+0.023
uggle+0.022
 else+0.022
 compliments+0.020
 around+0.020
Neg logit contributions
 arrest-0.034
 loser-0.032
 no-0.031
 fourteen-0.030
 eighteen-0.030
Token into
Token ID656
Feature activation-1.32e-02

Pos logit contributions
 up+0.000
 else+0.000
uggle+0.000
 around+0.000
 outside+0.000
Neg logit contributions
 arrest-0.000
 loser-0.000
 no-0.000
 fourteen-0.000
 eighteen-0.000
Token the
Token ID262
Feature activation+1.16e-02

Pos logit contributions
 up+0.016
 else+0.014
 outside+0.013
 carefully+0.013
 around+0.013
Neg logit contributions
 arrest-0.027
 loser-0.026
ross-0.025
 fourteen-0.024
 eighteen-0.024
Token bucket
Token ID19236
Feature activation+1.30e-03

Pos logit contributions
 arrest+0.023
 loser+0.022
 no+0.021
 fourteen+0.021
 eighteen+0.020
Neg logit contributions
 up-0.013
 else-0.013
uggle-0.013
 compliments-0.012
 around-0.011
Token and
Token ID290
Feature activation-5.73e-03

Pos logit contributions
 up+0.039
 else+0.036
uggle+0.036
 around+0.034
 compliments+0.034
Neg logit contributions
 arrest-0.057
 loser-0.054
 no-0.052
 fourteen-0.051
 eighteen-0.050
Token a
Token ID257
Feature activation-1.47e-02

Pos logit contributions
 up+0.007
 else+0.007
uggle+0.007
 around+0.006
 compliments+0.006
Neg logit contributions
 arrest-0.011
 loser-0.010
 no-0.010
 fourteen-0.010
 eighteen-0.009
Token brush
Token ID14093
Feature activation+1.34e-02

Pos logit contributions
 up+0.019
uggle+0.018
 else+0.018
 compliments+0.017
 around+0.017
Neg logit contributions
 arrest-0.027
 loser-0.026
 no-0.025
 fourteen-0.024
 eighteen-0.024
Token.
Token ID13
Feature activation+1.49e-02

Pos logit contributions
 arrest+0.032
 loser+0.031
 eighteen+0.029
 fourteen+0.028
 sixteen+0.027
Neg logit contributions
 up-0.014
 around-0.012
 outside-0.012
 carefully-0.011
 closer-0.011
Token She
Token ID1375
Feature activation+1.13e-03

Pos logit contributions
 arrest+0.025
 loser+0.023
 no+0.021
 eighteen+0.021
 fourteen+0.021
Neg logit contributions
 up-0.023
uggle-0.022
 else-0.021
 around-0.021
 outside-0.021
Token dips
Token ID48904
Feature activation-9.00e-02

Pos logit contributions
 arrest+0.002
 fourteen+0.002
 sixteen+0.002
 enough+0.002
 no+0.002
Neg logit contributions
uggle-0.002
Hop-0.002
 else-0.002
 up-0.002
 outdoors-0.002
Token the
Token ID262
Feature activation-1.57e-02

Pos logit contributions
uggle+0.130
 else+0.123
 up+0.122
rig+0.115
 compliments+0.115
Neg logit contributions
 no-0.146
 arrest-0.140
 loser-0.138
 fourteen-0.135
 enough-0.134
Token brush
Token ID14093
Feature activation-2.98e-03

Pos logit contributions
 up+0.020
 else+0.019
uggle+0.018
 around+0.017
 compliments+0.017
Neg logit contributions
 arrest-0.030
 loser-0.028
 no-0.027
 fourteen-0.027
 eighteen-0.026
Token in
Token ID287
Feature activation-1.82e-02

Pos logit contributions
 up+0.003
 else+0.003
uggle+0.003
 around+0.003
 outside+0.003
Neg logit contributions
 arrest-0.006
 loser-0.006
 fourteen-0.006
 no-0.006
 eighteen-0.006
Token the
Token ID262
Feature activation+1.14e-02

Pos logit contributions
 up+0.022
 outside+0.019
 closer+0.019
 carefully+0.019
 around+0.018
Neg logit contributions
 arrest-0.037
 loser-0.036
ross-0.033
 fourteen-0.033
EEP-0.032
Token water
Token ID1660
Feature activation-3.38e-02

Pos logit contributions
 arrest+0.022
 loser+0.021
 no+0.020
 fourteen+0.020
 enough+0.019
Neg logit contributions
 up-0.014
 else-0.013
uggle-0.013
 around-0.013
 outside-0.012
Token wet
Token ID9583
Feature activation-1.84e-02

Pos logit contributions
uggle+0.006
 findings+0.006
rig+0.006
 programmes+0.006
 compliments+0.006
Neg logit contributions
 no-0.009
 arrest-0.009
 loser-0.008
 enough-0.008
 fourteen-0.008
Token."
Token ID526
Feature activation-2.62e-02

Pos logit contributions
 up+0.029
 around+0.027
 outside+0.027
 else+0.025
 closer+0.025
Neg logit contributions
 arrest-0.031
 loser-0.029
 eighteen-0.027
 no-0.027
 fourteen-0.027
Token\n
Token ID198
Feature activation-1.14e-02

Pos logit contributions
 up+0.020
 else+0.018
uggle+0.017
 around+0.016
 compliments+0.015
Neg logit contributions
 arrest-0.064
 loser-0.061
 no-0.059
 fourteen-0.059
 eighteen-0.058
Token\n
Token ID198
Feature activation+1.32e-03

Pos logit contributions
 up+0.013
uggle+0.013
 compliments+0.013
fish+0.012
 else+0.012
Neg logit contributions
 arrest-0.021
 no-0.020
 loser-0.019
 fourteen-0.019
 sixteen-0.018
TokenThey
Token ID2990
Feature activation-1.38e-03

Pos logit contributions
 arrest+0.003
 fourteen+0.003
 loser+0.003
tern+0.003
 no+0.003
Neg logit contributions
 else-0.001
 up-0.001
 quietly-0.001
 carefully-0.001
in-0.001
Token dipped
Token ID35300
Feature activation-9.00e-02

Pos logit contributions
 up+0.002
 else+0.002
uggle+0.002
 around+0.002
 compliments+0.002
Neg logit contributions
 arrest-0.002
 loser-0.002
 no-0.002
 fourteen-0.002
 eighteen-0.002
Token their
Token ID511
Feature activation-4.07e-03

Pos logit contributions
 else+0.122
uggle+0.122
 up+0.122
 compliments+0.113
Hop+0.111
Neg logit contributions
 arrest-0.150
 no-0.146
 loser-0.144
 fourteen-0.137
 enough-0.135
Token fingers
Token ID9353
Feature activation-1.93e-02

Pos logit contributions
uggle+0.008
 up+0.008
 else+0.008
 compliments+0.007
in+0.007
Neg logit contributions
 arrest-0.005
 no-0.004
 loser-0.004
 eight-0.004
 jaws-0.004
Token in
Token ID287
Feature activation-2.32e-02

Pos logit contributions
uggle+0.026
 compliments+0.026
 programmes+0.025
 findings+0.024
 else+0.024
Neg logit contributions
 no-0.034
 enough-0.031
 fourteen-0.030
 arrest-0.030
 eleven-0.029
Token the
Token ID262
Feature activation+1.10e-02

Pos logit contributions
 up+0.032
 else+0.030
uggle+0.030
 around+0.029
 outside+0.028
Neg logit contributions
 arrest-0.042
 loser-0.039
 no-0.037
 fourteen-0.037
 eighteen-0.036
Token water
Token ID1660
Feature activation-3.31e-02

Pos logit contributions
 arrest+0.024
 loser+0.023
 no+0.023
 fourteen+0.022
 eighteen+0.022
Neg logit contributions
 else-0.010
uggle-0.010
 up-0.010
 compliments-0.009
Hop-0.008
Token a
Token ID257
Feature activation-9.57e-03

Pos logit contributions
 loser+0.034
 arrest+0.034
ross+0.032
EEP+0.031
 Leave+0.031
Neg logit contributions
 up-0.019
 closely-0.015
 carefully-0.015
 outside-0.015
 around-0.015
Token big
Token ID1263
Feature activation-1.62e-02

Pos logit contributions
 up+0.012
uggle+0.011
 else+0.011
 compliments+0.011
 around+0.010
Neg logit contributions
 arrest-0.018
 loser-0.017
 no-0.017
 fourteen-0.016
 eighteen-0.016
Token paper
Token ID3348
Feature activation-1.51e-02

Pos logit contributions
 up+0.019
 else+0.017
 outside+0.017
 around+0.017
uggle+0.016
Neg logit contributions
 arrest-0.033
 loser-0.031
 fourteen-0.030
 no-0.029
ross-0.029
Token.
Token ID13
Feature activation+3.50e-03

Pos logit contributions
 up+0.016
 around+0.014
 else+0.014
 outside+0.014
fish+0.013
Neg logit contributions
 arrest-0.033
 loser-0.031
 fourteen-0.030
 no-0.029
 eighteen-0.029
Token He
Token ID679
Feature activation-6.32e-03

Pos logit contributions
 arrest+0.006
 loser+0.006
 no+0.005
 fourteen+0.005
 eighteen+0.005
Neg logit contributions
 up-0.005
 else-0.005
uggle-0.005
 around-0.005
 outside-0.005
Token dipped
Token ID35300
Feature activation-8.98e-02

Pos logit contributions
uggle+0.011
 up+0.010
Hop+0.009
 compliments+0.009
 closer+0.009
Neg logit contributions
 no-0.010
 arrest-0.009
 fourteen-0.009
 sixteen-0.009
 eleven-0.009
Token it
Token ID340
Feature activation-3.77e-02

Pos logit contributions
 up+0.119
 else+0.112
uggle+0.111
 around+0.105
 compliments+0.104
Neg logit contributions
 arrest-0.167
 loser-0.157
 no-0.151
 fourteen-0.148
 eighteen-0.145
Token in
Token ID287
Feature activation-1.79e-02

Pos logit contributions
uggle+0.047
 else+0.045
 compliments+0.044
Hop+0.042
 programmes+0.041
Neg logit contributions
 no-0.068
 arrest-0.064
 enough-0.064
 fourteen-0.063
 loser-0.063
Token blue
Token ID4171
Feature activation-4.65e-02

Pos logit contributions
 up+0.022
 outside+0.018
 closer+0.018
 carefully+0.018
 around+0.018
Neg logit contributions
 arrest-0.037
 loser-0.036
ross-0.034
EEP-0.033
OO-0.032
Token paint
Token ID7521
Feature activation-4.04e-04

Pos logit contributions
 up+0.059
uggle+0.053
 else+0.053
 around+0.051
 outside+0.051
Neg logit contributions
 arrest-0.091
 loser-0.086
 no-0.082
 fourteen-0.080
 eighteen-0.079
Token and
Token ID290
Feature activation-5.97e-03

Pos logit contributions
 up+0.000
 around+0.000
 outside+0.000
 else+0.000
uggle+0.000
Neg logit contributions
 arrest-0.001
 loser-0.001
 no-0.001
 fourteen-0.001
 eighteen-0.001
Token there
Token ID612
Feature activation-3.11e-04

Pos logit contributions
 arrest+0.030
 loser+0.028
 no+0.027
 fourteen+0.027
 eighteen+0.026
Neg logit contributions
 up-0.018
 else-0.017
uggle-0.017
 around-0.016
 outside-0.016
Token was
Token ID373
Feature activation+1.27e-02

Pos logit contributions
 up+0.000
 compliments+0.000
uggle+0.000
 else+0.000
 closer+0.000
Neg logit contributions
 no-0.001
 arrest-0.001
 loser-0.001
 enough-0.001
 fourteen-0.001
Token a
Token ID257
Feature activation-8.88e-04

Pos logit contributions
 arrest+0.021
 no+0.020
 loser+0.020
 fourteen+0.019
 enough+0.019
Neg logit contributions
 up-0.018
 else-0.018
uggle-0.018
 compliments-0.017
fish-0.017
Token little
Token ID1310
Feature activation+1.35e-04

Pos logit contributions
uggle+0.001
 up+0.001
 compliments+0.001
 around+0.001
 else+0.001
Neg logit contributions
 loser-0.001
 arrest-0.001
 no-0.001
 eight-0.001
 eighteen-0.001
Token girl
Token ID2576
Feature activation+3.05e-02

Pos logit contributions
 arrest+0.000
 loser+0.000
 no+0.000
 enough+0.000
 eight+0.000
Neg logit contributions
 up-0.000
 else-0.000
uggle-0.000
 compliments-0.000
 around-0.000
Token named
Token ID3706
Feature activation+7.32e-02

Pos logit contributions
 arrest+0.042
 no+0.039
 loser+0.038
 fourteen+0.035
 eleven+0.034
Neg logit contributions
 up-0.054
uggle-0.052
 else-0.051
 compliments-0.050
 closer-0.048
Token Lily
Token ID20037
Feature activation-3.55e-03

Pos logit contributions
 arrest+0.116
 loser+0.108
 no+0.103
 fourteen+0.101
 eighteen+0.099
Neg logit contributions
 up-0.117
 else-0.112
uggle-0.111
 around-0.106
 outside-0.105
Token.
Token ID13
Feature activation+2.01e-02

Pos logit contributions
 up+0.005
uggle+0.004
 else+0.004
 compliments+0.004
fish+0.004
Neg logit contributions
 arrest-0.006
 no-0.006
 loser-0.006
 fourteen-0.006
 enough-0.006
Token She
Token ID1375
Feature activation+5.19e-03

Pos logit contributions
 arrest+0.032
 loser+0.029
 fourteen+0.027
 no+0.027
 eighteen+0.027
Neg logit contributions
 up-0.033
uggle-0.032
 else-0.031
 around-0.030
 outside-0.030
Token loved
Token ID6151
Feature activation-2.67e-02

Pos logit contributions
 arrest+0.010
 no+0.010
 loser+0.009
 eight+0.008
 letting+0.008
Neg logit contributions
uggle-0.006
 compliments-0.006
 up-0.006
 closer-0.006
rig-0.006
Token Christmas
Token ID6786
Feature activation+5.34e-03

Pos logit contributions
 up+0.040
 else+0.039
uggle+0.038
fish+0.036
 compliments+0.036
Neg logit contributions
 arrest-0.045
 no-0.040
 loser-0.039
 fourteen-0.038
 enough-0.037
Token giant
Token ID6175
Feature activation-4.66e-03

Pos logit contributions
 arrest+0.014
 no+0.014
 loser+0.014
 fourteen+0.013
 eight+0.013
Neg logit contributions
uggle-0.017
 up-0.015
Hop-0.015
 else-0.015
 compliments-0.014
Token didn
Token ID1422
Feature activation+1.57e-02

Pos logit contributions
 up+0.006
 else+0.006
 around+0.006
 outside+0.005
 closer+0.005
Neg logit contributions
 arrest-0.009
 loser-0.008
 fourteen-0.008
 eighteen-0.008
 sixteen-0.008
Token't
Token ID470
Feature activation+2.31e-02

Pos logit contributions
 arrest+0.020
 loser+0.018
 no+0.018
 fourteen+0.017
 eighteen+0.016
Neg logit contributions
 up-0.029
 else-0.029
uggle-0.028
 compliments-0.027
 around-0.027
Token know
Token ID760
Feature activation+2.46e-02

Pos logit contributions
 arrest+0.044
 loser+0.041
 no+0.040
 fourteen+0.039
 eighteen+0.038
Neg logit contributions
 up-0.030
 else-0.028
uggle-0.028
 around-0.026
 compliments-0.026
Token what
Token ID644
Feature activation-2.52e-02

Pos logit contributions
 arrest+0.053
 loser+0.051
 fourteen+0.048
 no+0.048
 eighteen+0.048
Neg logit contributions
 up-0.026
 else-0.025
uggle-0.023
 around-0.022
 outside-0.022
Token to
Token ID284
Feature activation+5.06e-02

Pos logit contributions
uggle+0.039
 up+0.038
 compliments+0.036
 else+0.034
 around+0.034
Neg logit contributions
 arrest-0.040
 no-0.039
 loser-0.037
 fourteen-0.035
 eighteen-0.034
Token do
Token ID466
Feature activation-1.18e-02

Pos logit contributions
 arrest+0.084
 no+0.079
 loser+0.079
 enough+0.072
 fourteen+0.072
Neg logit contributions
uggle-0.071
 up-0.069
Hop-0.066
 else-0.066
 compliments-0.065
Token to
Token ID284
Feature activation+3.86e-02

Pos logit contributions
 up+0.014
 else+0.013
 around+0.012
uggle+0.012
 outside+0.012
Neg logit contributions
 arrest-0.024
 loser-0.023
 fourteen-0.022
 eighteen-0.022
 no-0.021
Token make
Token ID787
Feature activation-1.26e-02

Pos logit contributions
 arrest+0.068
 no+0.063
 loser+0.062
 fourteen+0.058
 eighteen+0.056
Neg logit contributions
uggle-0.053
 up-0.052
 else-0.050
fish-0.049
 compliments-0.049
Token her
Token ID607
Feature activation-1.98e-02

Pos logit contributions
 up+0.018
 else+0.016
uggle+0.016
 around+0.015
 outside+0.015
Neg logit contributions
 arrest-0.023
 loser-0.022
 no-0.021
 fourteen-0.021
 eighteen-0.020
Token feel
Token ID1254
Feature activation-2.20e-02

Pos logit contributions
 up+0.025
 else+0.023
uggle+0.022
 around+0.021
 outside+0.021
Neg logit contributions
 arrest-0.039
 loser-0.037
 fourteen-0.036
 eighteen-0.035
 sixteen-0.035
Token dragon
Token ID10441
Feature activation-8.61e-03

Pos logit contributions
uggle+0.017
 up+0.015
 else+0.015
 compliments+0.015
rig+0.014
Neg logit contributions
 arrest-0.013
 no-0.012
 loser-0.012
 eight-0.011
 fourteen-0.011
Token and
Token ID290
Feature activation-5.92e-03

Pos logit contributions
 up+0.012
 else+0.012
uggle+0.012
 compliments+0.011
 around+0.011
Neg logit contributions
 arrest-0.014
 loser-0.014
 no-0.013
 fourteen-0.013
 eighteen-0.013
Token flew
Token ID13112
Feature activation+1.35e-02

Pos logit contributions
 up+0.008
 else+0.007
 around+0.007
 closer+0.007
 quietly+0.007
Neg logit contributions
 arrest-0.011
 loser-0.011
 fourteen-0.010
 eighteen-0.010
 no-0.010
Token away
Token ID1497
Feature activation+1.67e-02

Pos logit contributions
 arrest+0.027
 loser+0.025
 no+0.024
 fourteen+0.024
 eighteen+0.023
Neg logit contributions
 up-0.016
 else-0.015
uggle-0.015
 around-0.014
 outside-0.014
Token.
Token ID13
Feature activation-5.39e-04

Pos logit contributions
 arrest+0.034
 loser+0.033
 fourteen+0.031
 no+0.031
 eighteen+0.030
Neg logit contributions
 up-0.020
 else-0.018
 around-0.017
uggle-0.017
 outside-0.017
Token From
Token ID3574
Feature activation+5.75e-02

Pos logit contributions
 up+0.001
 around+0.001
uggle+0.001
 quietly+0.001
 curiously+0.001
Neg logit contributions
 loser-0.001
 arrest-0.001
 no-0.001
 eighteen-0.001
 fourteen-0.001
Token that
Token ID326
Feature activation-3.49e-02

Pos logit contributions
 no+0.079
 arrest+0.077
 enough+0.075
 fourteen+0.072
 eleven+0.069
Neg logit contributions
uggle-0.099
 compliments-0.094
 programmes-0.091
 else-0.089
 findings-0.087
Token day
Token ID1110
Feature activation-1.44e-03

Pos logit contributions
 up+0.036
 else+0.034
uggle+0.033
 around+0.031
 outside+0.030
Neg logit contributions
 arrest-0.075
 loser-0.071
 no-0.068
 fourteen-0.068
 eighteen-0.067
Token on
Token ID319
Feature activation+2.07e-02

Pos logit contributions
 up+0.001
 around+0.001
 outside+0.001
uggle+0.001
 closer+0.001
Neg logit contributions
 arrest-0.004
 fourteen-0.003
 eighteen-0.003
 fault-0.003
 loser-0.003
Token,
Token ID11
Feature activation+7.37e-03

Pos logit contributions
 arrest+0.050
 loser+0.050
 sixteen+0.048
 eighteen+0.048
 fourteen+0.047
Neg logit contributions
 up-0.019
 around-0.016
 outside-0.014
 else-0.014
fish-0.013
Token whenever
Token ID8797
Feature activation+1.59e-03

Pos logit contributions
 arrest+0.015
 loser+0.015
 eighteen+0.015
 fourteen+0.014
 sixteen+0.014
Neg logit contributions
 up-0.009
 around-0.007
 else-0.007
uggle-0.007
 outside-0.007
Token there
Token ID612
Feature activation+3.81e-04

Pos logit contributions
 arrest+0.031
 loser+0.028
 fourteen+0.027
 eighteen+0.027
 sixteen+0.026
Neg logit contributions
 up-0.017
 else-0.016
 around-0.015
uggle-0.014
 outside-0.014
Token was
Token ID373
Feature activation+1.27e-02

Pos logit contributions
 arrest+0.001
 no+0.001
 loser+0.001
 enough+0.001
 fourteen+0.001
Neg logit contributions
 up-0.000
uggle-0.000
 compliments-0.000
 else-0.000
 closer-0.000
Token a
Token ID257
Feature activation+2.90e-04

Pos logit contributions
 arrest+0.022
 loser+0.020
 no+0.020
 fourteen+0.019
 enough+0.019
Neg logit contributions
 up-0.018
 else-0.017
uggle-0.017
 compliments-0.016
fish-0.016
Token little
Token ID1310
Feature activation+3.30e-04

Pos logit contributions
 loser+0.000
 arrest+0.000
 no+0.000
 eight+0.000
 eighteen+0.000
Neg logit contributions
uggle-0.000
 up-0.000
 compliments-0.000
 around-0.000
 else-0.000
Token girl
Token ID2576
Feature activation+3.08e-02

Pos logit contributions
 arrest+0.001
 loser+0.001
 no+0.001
 enough+0.000
 fourteen+0.000
Neg logit contributions
 up-0.000
 else-0.000
uggle-0.000
 compliments-0.000
 around-0.000
Token named
Token ID3706
Feature activation+7.16e-02

Pos logit contributions
 arrest+0.043
 no+0.039
 loser+0.039
 fourteen+0.035
 eleven+0.035
Neg logit contributions
 up-0.054
uggle-0.052
 else-0.051
 compliments-0.050
 closer-0.048
Token Lily
Token ID20037
Feature activation-4.33e-03

Pos logit contributions
 arrest+0.120
 loser+0.113
 fourteen+0.106
 no+0.105
 eighteen+0.103
Neg logit contributions
 up-0.110
 else-0.105
uggle-0.103
 outside-0.099
 around-0.099
Token.
Token ID13
Feature activation+2.14e-02

Pos logit contributions
 up+0.006
uggle+0.005
 else+0.005
 compliments+0.005
fish+0.005
Neg logit contributions
 arrest-0.008
 no-0.007
 loser-0.007
 fourteen-0.007
 enough-0.007
Token She
Token ID1375
Feature activation+5.60e-03

Pos logit contributions
 arrest+0.035
 loser+0.032
 fourteen+0.031
 no+0.030
 eighteen+0.030
Neg logit contributions
 up-0.034
uggle-0.032
 else-0.032
 around-0.031
 outside-0.031
Token loved
Token ID6151
Feature activation-2.55e-02

Pos logit contributions
 arrest+0.010
 no+0.010
 loser+0.010
 eight+0.009
 letting+0.009
Neg logit contributions
uggle-0.007
 up-0.007
 compliments-0.007
 closer-0.006
rig-0.006
Token to
Token ID284
Feature activation+1.10e-02

Pos logit contributions
 up+0.037
 else+0.036
uggle+0.036
fish+0.034
 compliments+0.033
Neg logit contributions
 arrest-0.043
 no-0.039
 loser-0.038
 fourteen-0.037
 enough-0.036
Token would
Token ID561
Feature activation+2.96e-02

Pos logit contributions
uggle+0.034
 up+0.032
 compliments+0.032
rig+0.031
Hop+0.031
Neg logit contributions
 no-0.033
 arrest-0.030
 loser-0.029
 enough-0.028
 fourteen-0.026
Token heal
Token ID12035
Feature activation-1.03e-03

Pos logit contributions
 arrest+0.055
 loser+0.052
 no+0.052
 fourteen+0.050
 eighteen+0.049
Neg logit contributions
 up-0.037
uggle-0.035
 else-0.035
 compliments-0.033
 around-0.033
Token him
Token ID683
Feature activation-1.67e-03

Pos logit contributions
 up+0.001
 else+0.001
 around+0.001
 outside+0.001
 quietly+0.001
Neg logit contributions
 arrest-0.002
 loser-0.002
 fourteen-0.002
 sixteen-0.002
 eighteen-0.002
Token.
Token ID13
Feature activation-4.21e-03

Pos logit contributions
 up+0.002
 else+0.002
uggle+0.002
 around+0.002
 outside+0.002
Neg logit contributions
 arrest-0.003
 loser-0.003
 fourteen-0.003
 eighteen-0.003
 no-0.003
Token Tom
Token ID4186
Feature activation-8.55e-03

Pos logit contributions
 up+0.004
 else+0.004
uggle+0.004
 around+0.003
 compliments+0.003
Neg logit contributions
 arrest-0.009
 loser-0.009
 no-0.009
 fourteen-0.008
 eighteen-0.008
Token never
Token ID1239
Feature activation+5.75e-02

Pos logit contributions
uggle+0.013
 up+0.011
rig+0.011
 findings+0.011
 compliments+0.011
Neg logit contributions
 no-0.014
 arrest-0.013
 enough-0.013
 loser-0.012
 fourteen-0.012
Token felt
Token ID2936
Feature activation+2.74e-02

Pos logit contributions
 arrest+0.102
 loser+0.096
 no+0.091
 fourteen+0.091
 eighteen+0.089
Neg logit contributions
 up-0.081
 else-0.078
uggle-0.075
 around-0.073
 compliments-0.072
Token sad
Token ID6507
Feature activation+2.55e-02

Pos logit contributions
 arrest+0.048
 no+0.047
 loser+0.046
 enough+0.044
 fourteen+0.043
Neg logit contributions
uggle-0.036
 up-0.035
 else-0.033
 compliments-0.032
fish-0.032
Token for
Token ID329
Feature activation+1.91e-02

Pos logit contributions
 arrest+0.048
 eighteen+0.043
 loser+0.042
 sixteen+0.042
 fourteen+0.042
Neg logit contributions
 up-0.037
 else-0.034
 around-0.034
 outside-0.034
 closely-0.031
Token long
Token ID890
Feature activation+1.18e-02

Pos logit contributions
 arrest+0.037
 loser+0.035
 fourteen+0.034
 eighteen+0.033
 sixteen+0.033
Neg logit contributions
 up-0.024
 else-0.023
 outside-0.021
 around-0.021
 closer-0.020
Token because
Token ID780
Feature activation-1.59e-02

Pos logit contributions
 arrest+0.021
 loser+0.021
 eighteen+0.020
 fourteen+0.020
 sixteen+0.020
Neg logit contributions
 up-0.016
 around-0.015
 else-0.015
 outside-0.015
 compliments-0.014
Token\n
Token ID198
Feature activation+3.47e-03

Pos logit contributions
uggle+0.012
 up+0.012
fish+0.012
 findings+0.012
 else+0.011
Neg logit contributions
 arrest-0.016
 no-0.015
 loser-0.015
 Kim-0.014
 No-0.014
TokenJ
Token ID41
Feature activation+1.16e-03

Pos logit contributions
 arrest+0.007
 loser+0.006
 no+0.006
 fourteen+0.006
 eighteen+0.006
Neg logit contributions
 up-0.004
 else-0.004
uggle-0.004
 around-0.004
 compliments-0.004
Tokenenny
Token ID11870
Feature activation-1.79e-02

Pos logit contributions
 arrest+0.003
 no+0.003
 loser+0.003
 eighteen+0.002
 fourteen+0.002
Neg logit contributions
 up-0.001
in-0.001
 else-0.001
uggle-0.001
 outside-0.001
Token nodded
Token ID14464
Feature activation-9.11e-03

Pos logit contributions
uggle+0.022
 up+0.021
 else+0.020
 compliments+0.019
 around+0.019
Neg logit contributions
 arrest-0.034
 no-0.032
 loser-0.031
 fourteen-0.031
 eighteen-0.030
Token but
Token ID475
Feature activation-1.39e-02

Pos logit contributions
uggle+0.015
 else+0.013
ixie+0.013
uddle+0.013
actus+0.013
Neg logit contributions
 no-0.014
 arrest-0.013
 enough-0.013
 loser-0.012
 letting-0.012
Token she
Token ID673
Feature activation+2.38e-02

Pos logit contributions
 up+0.018
 else+0.016
uggle+0.015
 around+0.015
 outside+0.015
Neg logit contributions
 arrest-0.028
 loser-0.026
 fourteen-0.025
 eighteen-0.024
 sixteen-0.024
Token didn
Token ID1422
Feature activation+5.58e-03

Pos logit contributions
 no+0.037
 arrest+0.035
 loser+0.034
 enough+0.032
 fourteen+0.031
Neg logit contributions
uggle-0.039
 up-0.037
 compliments-0.035
 around-0.033
 outside-0.033
Token't
Token ID470
Feature activation+1.81e-02

Pos logit contributions
 arrest+0.007
 loser+0.007
 no+0.006
 fourteen+0.006
 eighteen+0.006
Neg logit contributions
 up-0.010
 else-0.010
uggle-0.010
 compliments-0.010
 around-0.010
Token know
Token ID760
Feature activation+1.93e-02

Pos logit contributions
 arrest+0.031
 loser+0.030
 no+0.029
 fourteen+0.028
 eighteen+0.027
Neg logit contributions
 up-0.026
uggle-0.025
 else-0.024
 compliments-0.023
 around-0.022
Token what
Token ID644
Feature activation-2.62e-02

Pos logit contributions
 arrest+0.040
 loser+0.038
 no+0.036
 fourteen+0.036
 eighteen+0.035
Neg logit contributions
 up-0.022
 else-0.021
uggle-0.020
 around-0.019
 outside-0.019
Token Mr
Token ID1770
Feature activation+1.48e-02

Pos logit contributions
 up+0.036
 else+0.034
uggle+0.034
 around+0.032
 compliments+0.032
Neg logit contributions
 arrest-0.047
 loser-0.045
 no-0.043
 fourteen-0.042
 eighteen-0.041
Token!"
Token ID2474
Feature activation-3.38e-02

Pos logit contributions
 arrest+0.055
 loser+0.049
 fourteen+0.045
 eighteen+0.045
 no+0.045
Neg logit contributions
 up-0.051
 else-0.048
 around-0.046
 outside-0.046
uggle-0.045
Token They
Token ID1119
Feature activation-2.39e-03

Pos logit contributions
 up+0.045
uggle+0.043
 else+0.042
 around+0.040
 compliments+0.039
Neg logit contributions
 arrest-0.063
 loser-0.060
 no-0.057
 fourteen-0.056
 eighteen-0.055
Token both
Token ID1111
Feature activation-1.81e-02

Pos logit contributions
 up+0.003
uggle+0.003
 else+0.003
 compliments+0.003
 closer+0.003
Neg logit contributions
 arrest-0.004
 no-0.004
 loser-0.004
 fourteen-0.004
 sixteen-0.004
Token stretched
Token ID19110
Feature activation-2.15e-02

Pos logit contributions
 up+0.024
uggle+0.023
 else+0.022
 compliments+0.021
 outside+0.020
Neg logit contributions
 arrest-0.032
 loser-0.031
 no-0.031
 fourteen-0.030
 sixteen-0.029
Token their
Token ID511
Feature activation-3.57e-03

Pos logit contributions
 else+0.028
uggle+0.028
 up+0.028
 compliments+0.026
fish+0.025
Neg logit contributions
 arrest-0.038
 no-0.036
 loser-0.036
 fourteen-0.034
 enough-0.034
Token arms
Token ID5101
Feature activation+1.80e-02

Pos logit contributions
 up+0.006
uggle+0.005
 else+0.005
 compliments+0.005
 closer+0.005
Neg logit contributions
 arrest-0.006
 loser-0.005
 no-0.005
 fourteen-0.005
 eighteen-0.005
Token up
Token ID510
Feature activation-2.53e-02

Pos logit contributions
 arrest+0.034
 loser+0.032
 no+0.031
 fourteen+0.031
 eighteen+0.030
Neg logit contributions
 up-0.022
 else-0.021
uggle-0.021
 compliments-0.020
 around-0.020
Token high
Token ID1029
Feature activation+4.57e-03

Pos logit contributions
 else+0.032
 up+0.032
uggle+0.031
 compliments+0.029
fish+0.028
Neg logit contributions
 arrest-0.046
 loser-0.043
 no-0.043
 fourteen-0.042
 enough-0.041
Token and
Token ID290
Feature activation+8.18e-03

Pos logit contributions
 arrest+0.010
 loser+0.010
 no+0.009
 fourteen+0.009
 eighteen+0.009
Neg logit contributions
 up-0.005
 around-0.004
 outside-0.004
uggle-0.004
 else-0.004
Token touched
Token ID12615
Feature activation-4.38e-02

Pos logit contributions
 arrest+0.015
 no+0.015
 loser+0.014
 fourteen+0.014
 eighteen+0.014
Neg logit contributions
uggle-0.010
 else-0.009
 up-0.009
in-0.009
 compliments-0.009
Token their
Token ID511
Feature activation+4.23e-03

Pos logit contributions
uggle+0.062
 else+0.062
 up+0.062
 compliments+0.058
fish+0.058
Neg logit contributions
 arrest-0.071
 no-0.068
 loser-0.067
 fourteen-0.065
 enough-0.064
Token was
Token ID373
Feature activation+1.25e-02

Pos logit contributions
uggle+0.002
 up+0.002
 compliments+0.002
 else+0.002
fish+0.002
Neg logit contributions
 no-0.005
 arrest-0.005
 loser-0.004
 fourteen-0.004
 enough-0.004
Token a
Token ID257
Feature activation+2.08e-03

Pos logit contributions
 arrest+0.020
 no+0.019
 loser+0.018
 enough+0.018
 fourteen+0.017
Neg logit contributions
 else-0.018
 up-0.018
uggle-0.018
 compliments-0.018
fish-0.018
Token tall
Token ID7331
Feature activation+2.29e-02

Pos logit contributions
 loser+0.003
 arrest+0.003
 no+0.003
 eight+0.003
 eighteen+0.002
Neg logit contributions
 up-0.004
uggle-0.003
 compliments-0.003
 around-0.003
 else-0.003
Token bear
Token ID6842
Feature activation+8.12e-03

Pos logit contributions
 arrest+0.043
 loser+0.040
 no+0.039
 fourteen+0.039
 eighteen+0.038
Neg logit contributions
 up-0.030
 outside-0.027
 around-0.027
 else-0.027
fish-0.027
Token.
Token ID13
Feature activation+2.29e-02

Pos logit contributions
 arrest+0.015
 loser+0.014
 fourteen+0.014
 no+0.013
 eighteen+0.013
Neg logit contributions
 up-0.011
 else-0.010
uggle-0.010
 outside-0.010
 around-0.010
Token He
Token ID679
Feature activation+7.60e-03

Pos logit contributions
 arrest+0.038
 loser+0.035
ross+0.033
 fourteen+0.033
 enough+0.033
Neg logit contributions
 up-0.034
uggle-0.033
 around-0.033
 outside-0.033
 else-0.032
Token liked
Token ID8288
Feature activation-1.08e-02

Pos logit contributions
 arrest+0.016
 no+0.015
 loser+0.015
 fourteen+0.014
 eighteen+0.014
Neg logit contributions
 up-0.008
uggle-0.008
 else-0.008
 compliments-0.008
 closer-0.007
Token to
Token ID284
Feature activation+1.96e-02

Pos logit contributions
 up+0.014
 else+0.013
 outside+0.012
uggle+0.012
 around+0.012
Neg logit contributions
 arrest-0.020
 loser-0.020
 fourteen-0.019
 no-0.019
 eighteen-0.018
Token slide
Token ID10649
Feature activation+4.58e-03

Pos logit contributions
 arrest+0.040
 loser+0.037
 no+0.036
 fourteen+0.036
 eighteen+0.035
Neg logit contributions
 up-0.022
 else-0.021
uggle-0.021
 around-0.019
 compliments-0.019
Token down
Token ID866
Feature activation+1.68e-02

Pos logit contributions
 arrest+0.009
 loser+0.009
 no+0.008
 fourteen+0.008
 eighteen+0.008
Neg logit contributions
 up-0.005
 else-0.005
uggle-0.005
 compliments-0.005
 around-0.005
Token hills
Token ID18639
Feature activation+6.31e-03

Pos logit contributions
 arrest+0.033
 loser+0.033
 eighteen+0.030
tern+0.030
 no+0.030
Neg logit contributions
 up-0.022
 around-0.020
 outside-0.020
 else-0.018
 closer-0.018
Token there
Token ID612
Feature activation+7.48e-03

Pos logit contributions
 up+0.015
 compliments+0.015
uggle+0.015
 else+0.015
fish+0.014
Neg logit contributions
 arrest-0.014
 no-0.014
 loser-0.013
 fourteen-0.012
 enough-0.012
Token was
Token ID373
Feature activation+1.67e-02

Pos logit contributions
 no+0.014
 arrest+0.013
 loser+0.012
 No+0.012
 enough+0.012
Neg logit contributions
 compliments-0.009
uggle-0.009
 up-0.009
rig-0.008
 closer-0.008
Token a
Token ID257
Feature activation+4.31e-03

Pos logit contributions
 arrest+0.026
 no+0.025
 loser+0.023
 enough+0.022
 fourteen+0.022
Neg logit contributions
 else-0.026
uggle-0.025
 compliments-0.025
 up-0.025
fish-0.025
Token country
Token ID1499
Feature activation+4.96e-02

Pos logit contributions
 loser+0.006
 arrest+0.006
 no+0.006
 eight+0.006
 eighteen+0.005
Neg logit contributions
uggle-0.007
 up-0.007
 compliments-0.007
 around-0.007
 else-0.006
Token far
Token ID1290
Feature activation+3.84e-02

Pos logit contributions
 arrest+0.101
 no+0.097
 loser+0.095
 fourteen+0.089
 enough+0.089
Neg logit contributions
 up-0.050
uggle-0.049
 compliments-0.048
 else-0.048
 findings-0.045
Token,
Token ID11
Feature activation+2.62e-02

Pos logit contributions
 arrest+0.083
 loser+0.079
 no+0.077
 fourteen+0.076
 eighteen+0.076
Neg logit contributions
 up-0.044
 around-0.039
 outside-0.037
uggle-0.037
 else-0.036
Token far
Token ID1290
Feature activation+3.89e-02

Pos logit contributions
 arrest+0.055
 loser+0.052
 fourteen+0.051
 eighteen+0.050
 sixteen+0.048
Neg logit contributions
 up-0.031
 else-0.027
uggle-0.026
 outside-0.026
 around-0.026
Token away
Token ID1497
Feature activation+4.84e-02

Pos logit contributions
 arrest+0.083
 loser+0.081
 no+0.079
 eighteen+0.077
 fourteen+0.077
Neg logit contributions
 up-0.045
 around-0.039
 outside-0.037
uggle-0.037
 else-0.036
Token.
Token ID13
Feature activation+2.13e-02

Pos logit contributions
 arrest+0.108
 loser+0.106
 eighteen+0.102
 fourteen+0.101
 sixteen+0.099
Neg logit contributions
 up-0.052
 around-0.043
 outside-0.042
 else-0.040
uggle-0.040
Token Everyone
Token ID11075
Feature activation-2.34e-02

Pos logit contributions
 arrest+0.037
 loser+0.035
 fourteen+0.034
 eighteen+0.034
 enough+0.033
Neg logit contributions
 outside-0.029
 up-0.028
uggle-0.028
 around-0.027
 outdoors-0.027
Token in
Token ID287
Feature activation-6.62e-03

Pos logit contributions
uggle+0.032
 compliments+0.029
 findings+0.029
ixie+0.028
 up+0.028
Neg logit contributions
 no-0.038
 arrest-0.037
 enough-0.034
 loser-0.034
 letting-0.033
Token wild
Token ID4295
Feature activation-5.73e-03

Pos logit contributions
 arrest+0.012
 loser+0.012
 no+0.012
 eight+0.011
 eleven+0.010
Neg logit contributions
 compliments-0.008
 up-0.008
uggle-0.008
 findings-0.007
 programmes-0.007
Token place
Token ID1295
Feature activation-4.67e-03

Pos logit contributions
 up+0.007
 else+0.007
uggle+0.006
 compliments+0.006
 around+0.006
Neg logit contributions
 arrest-0.011
 loser-0.011
 no-0.010
 fourteen-0.010
 eighteen-0.010
Token with
Token ID351
Feature activation-3.00e-03

Pos logit contributions
 up+0.005
uggle+0.005
 else+0.005
 compliments+0.005
 around+0.005
Neg logit contributions
 arrest-0.009
 loser-0.009
 no-0.009
 fourteen-0.008
 eighteen-0.008
Token many
Token ID867
Feature activation+3.56e-02

Pos logit contributions
 up+0.004
 else+0.004
uggle+0.004
 around+0.004
 compliments+0.004
Neg logit contributions
 arrest-0.005
 loser-0.005
 no-0.005
 fourteen-0.005
 eighteen-0.005
Token trees
Token ID7150
Feature activation+2.14e-02

Pos logit contributions
 arrest+0.061
 no+0.057
 loser+0.056
 eight+0.055
 enough+0.054
Neg logit contributions
uggle-0.053
 up-0.047
Hop-0.043
rig-0.043
Sure-0.042
Token and
Token ID290
Feature activation+1.35e-02

Pos logit contributions
 arrest+0.039
 no+0.036
 loser+0.035
 enough+0.034
 fourteen+0.034
Neg logit contributions
uggle-0.027
 up-0.027
 else-0.026
 compliments-0.026
 findings-0.024
Token flowers
Token ID12734
Feature activation+1.06e-02

Pos logit contributions
 arrest+0.028
 loser+0.026
 no+0.026
 fourteen+0.025
 eighteen+0.025
Neg logit contributions
 up-0.014
uggle-0.014
 else-0.014
 compliments-0.013
 around-0.012
Token.
Token ID13
Feature activation+3.29e-03

Pos logit contributions
 arrest+0.019
 no+0.018
 loser+0.017
 enough+0.017
 fourteen+0.016
Neg logit contributions
uggle-0.014
 up-0.014
 else-0.013
 compliments-0.013
 programmes-0.012
Token Every
Token ID3887
Feature activation-2.35e-02

Pos logit contributions
 arrest+0.005
 loser+0.005
 no+0.005
 fourteen+0.005
 sixteen+0.004
Neg logit contributions
 up-0.005
 else-0.005
uggle-0.005
 compliments-0.005
fish-0.005
Token day
Token ID1110
Feature activation-1.80e-02

Pos logit contributions
 up+0.029
 else+0.028
uggle+0.027
 compliments+0.027
 around+0.025
Neg logit contributions
 arrest-0.043
 no-0.042
 loser-0.042
 enough-0.039
 fourteen-0.039
Token,
Token ID11
Feature activation-1.37e-02

Pos logit contributions
 up+0.023
 else+0.022
uggle+0.021
 around+0.021
 outside+0.020
Neg logit contributions
 arrest-0.035
 loser-0.032
 fourteen-0.031
 eighteen-0.030
 sixteen-0.030
Token Mum
Token ID38367
Feature activation-1.34e-02

Pos logit contributions
 up+0.007
uggle+0.007
 else+0.007
 around+0.006
 compliments+0.006
Neg logit contributions
 arrest-0.011
 loser-0.010
 no-0.010
 fourteen-0.010
 eighteen-0.010
Token.
Token ID13
Feature activation-2.01e-03

Pos logit contributions
 up+0.015
 else+0.014
uggle+0.013
 around+0.013
 outside+0.013
Neg logit contributions
 arrest-0.028
 loser-0.026
 no-0.025
 fourteen-0.025
 eighteen-0.025
Token She
Token ID1375
Feature activation-4.81e-03

Pos logit contributions
 up+0.003
 else+0.003
uggle+0.003
 around+0.003
 compliments+0.003
Neg logit contributions
 arrest-0.003
 loser-0.003
 no-0.003
 fourteen-0.003
 eighteen-0.003
Token poured
Token ID19036
Feature activation-5.95e-02

Pos logit contributions
uggle+0.007
 up+0.007
 else+0.006
 compliments+0.006
 closer+0.006
Neg logit contributions
 arrest-0.008
 no-0.008
 loser-0.007
 fourteen-0.007
 enough-0.007
Token the
Token ID262
Feature activation-3.58e-02

Pos logit contributions
uggle+0.082
 else+0.080
rig+0.077
Hop+0.076
fish+0.075
Neg logit contributions
 no-0.099
 enough-0.094
 fourteen-0.093
 arrest-0.092
 sixteen-0.089
Token tea
Token ID8887
Feature activation-1.70e-02

Pos logit contributions
uggle+0.044
Hop+0.039
 else+0.039
in+0.038
 up+0.037
Neg logit contributions
 no-0.064
 arrest-0.063
 loser-0.063
 fourteen-0.060
 sixteen-0.059
Token and
Token ID290
Feature activation-1.47e-02

Pos logit contributions
 else+0.019
uggle+0.019
 up+0.019
 compliments+0.018
Hop+0.017
Neg logit contributions
 arrest-0.032
 no-0.030
 loser-0.030
 fourteen-0.030
 eighteen-0.029
Token put
Token ID1234
Feature activation-4.70e-02

Pos logit contributions
 up+0.018
uggle+0.018
 else+0.018
 compliments+0.016
 around+0.016
Neg logit contributions
 arrest-0.028
 loser-0.026
 no-0.026
 fourteen-0.025
 eighteen-0.024
Token the
Token ID262
Feature activation-2.49e-02

Pos logit contributions
uggle+0.059
 else+0.057
 up+0.055
rig+0.053
 compliments+0.052
Neg logit contributions
 arrest-0.083
 no-0.083
 loser-0.079
 fourteen-0.079
 enough-0.078
Token cup
Token ID6508
Feature activation-3.86e-02

Pos logit contributions
uggle+0.026
 up+0.025
 else+0.024
Hop+0.023
in+0.022
Neg logit contributions
 arrest-0.050
 loser-0.049
 no-0.049
 fourteen-0.046
 eighteen-0.045
Token on
Token ID319
Feature activation+1.09e-02

Pos logit contributions
uggle+0.042
 else+0.041
 compliments+0.041
 programmes+0.041
 findings+0.040
Neg logit contributions
 no-0.071
 arrest-0.068
 loser-0.068
 fourteen-0.067
 enough-0.065
Token play
Token ID711
Feature activation-1.69e-02

Pos logit contributions
 up+0.015
 else+0.014
uggle+0.014
 around+0.013
 compliments+0.013
Neg logit contributions
 arrest-0.018
 loser-0.017
 no-0.016
 fourteen-0.016
 eighteen-0.016
Token with
Token ID351
Feature activation-1.99e-02

Pos logit contributions
uggle+0.016
 up+0.016
 compliments+0.015
 else+0.015
 findings+0.014
Neg logit contributions
 arrest-0.035
 loser-0.034
 no-0.034
 fourteen-0.031
 sixteen-0.031
Token my
Token ID616
Feature activation-1.82e-02

Pos logit contributions
 up+0.026
uggle+0.025
 else+0.025
 compliments+0.023
 around+0.023
Neg logit contributions
 arrest-0.037
 loser-0.034
 no-0.034
 fourteen-0.033
 eighteen-0.032
Token doll
Token ID3654
Feature activation+2.27e-02

Pos logit contributions
 up+0.026
 else+0.025
uggle+0.024
 around+0.023
 outside+0.023
Neg logit contributions
 arrest-0.032
 loser-0.030
 no-0.029
 fourteen-0.029
 eighteen-0.027
Token?"
Token ID1701
Feature activation-3.64e-02

Pos logit contributions
 arrest+0.049
 loser+0.045
 fourteen+0.044
 eighteen+0.044
 thirteen+0.043
Neg logit contributions
 up-0.026
 else-0.024
 outside-0.023
 around-0.023
uggle-0.022
Token
Token ID198
Feature activation-1.26e-02

Pos logit contributions
 up+0.037
 else+0.035
uggle+0.034
 around+0.032
 compliments+0.031
Neg logit contributions
 arrest-0.078
 loser-0.076
 no-0.072
 fourteen-0.072
 eighteen-0.071
Token
Token ID198
Feature activation+1.04e-03

Pos logit contributions
 up+0.015
uggle+0.014
 else+0.013
 around+0.013
fish+0.013
Neg logit contributions
 arrest-0.024
 no-0.022
 loser-0.022
 fourteen-0.021
 eighteen-0.021
TokenEm
Token ID10161
Feature activation-7.64e-04

Pos logit contributions
 arrest+0.002
 loser+0.002
 no+0.002
 fourteen+0.002
 eighteen+0.002
Neg logit contributions
 up-0.002
uggle-0.001
 else-0.001
 around-0.001
 outside-0.001
Tokenma
Token ID2611
Feature activation-3.39e-03

Pos logit contributions
in+0.001
 up+0.001
 else+0.000
 compliments+0.000
uggle+0.000
Neg logit contributions
 arrest-0.002
 loser-0.002
 fourteen-0.002
 no-0.002
 eighteen-0.002
Token agreed
Token ID4987
Feature activation-1.10e-03

Pos logit contributions
uggle+0.004
 else+0.004
 up+0.004
fish+0.003
 compliments+0.003
Neg logit contributions
 arrest-0.006
 no-0.006
 fourteen-0.006
 loser-0.006
 enough-0.006
Token,
Token ID11
Feature activation+1.31e-02

Pos logit contributions
 up+0.001
 else+0.001
uggle+0.001
 around+0.001
 outside+0.001
Neg logit contributions
 arrest-0.002
 loser-0.002
 fourteen-0.002
 eighteen-0.002
 no-0.002
Token.
Token ID13
Feature activation+7.34e-04

Pos logit contributions
 up+0.021
 else+0.019
 outside+0.018
 around+0.018
uggle+0.018
Neg logit contributions
 arrest-0.034
 loser-0.033
 eighteen-0.031
 no-0.030
 fourteen-0.030
Token She
Token ID1375
Feature activation-6.77e-03

Pos logit contributions
 arrest+0.001
 loser+0.001
 no+0.001
 fourteen+0.001
 eighteen+0.001
Neg logit contributions
 up-0.001
 else-0.001
uggle-0.001
 around-0.001
 compliments-0.001
Token asked
Token ID1965
Feature activation-4.07e-02

Pos logit contributions
 up+0.009
 else+0.009
uggle+0.009
 around+0.008
 compliments+0.008
Neg logit contributions
 arrest-0.012
 loser-0.012
 no-0.011
 fourteen-0.011
 eighteen-0.011
Token,
Token ID11
Feature activation-2.66e-02

Pos logit contributions
 up+0.050
uggle+0.048
 else+0.048
 compliments+0.045
fish+0.044
Neg logit contributions
 arrest-0.077
 loser-0.074
 no-0.074
 fourteen-0.070
 enough-0.069
Token "
Token ID366
Feature activation-6.58e-03

Pos logit contributions
 up+0.043
 else+0.041
uggle+0.039
 outside+0.039
 around+0.039
Neg logit contributions
 arrest-0.043
 loser-0.039
 fourteen-0.037
 no-0.036
 eighteen-0.036
TokenCan
Token ID6090
Feature activation-1.94e-02

Pos logit contributions
 up+0.009
 else+0.009
uggle+0.009
 around+0.008
 compliments+0.008
Neg logit contributions
 arrest-0.011
 loser-0.011
 no-0.010
 fourteen-0.010
 eighteen-0.010
Token we
Token ID356
Feature activation-5.08e-03

Pos logit contributions
 up+0.034
 else+0.034
uggle+0.033
 outside+0.031
 around+0.031
Neg logit contributions
 arrest-0.028
 loser-0.026
 fourteen-0.025
 eighteen-0.024
 sixteen-0.024
Token see
Token ID766
Feature activation-8.25e-03

Pos logit contributions
 up+0.006
 else+0.006
uggle+0.006
 outside+0.006
 around+0.005
Neg logit contributions
 arrest-0.010
 loser-0.009
 fourteen-0.009
 eighteen-0.009
 no-0.009
Token the
Token ID262
Feature activation-4.81e-03

Pos logit contributions
 up+0.012
 else+0.011
uggle+0.011
 around+0.010
 outside+0.010
Neg logit contributions
 arrest-0.014
 loser-0.014
 no-0.013
 fourteen-0.013
 eighteen-0.013
Token big
Token ID1263
Feature activation-3.10e-02

Pos logit contributions
 up+0.006
 else+0.006
 outside+0.006
 closer+0.006
 special+0.006
Neg logit contributions
 arrest-0.009
 loser-0.008
 fourteen-0.008
ross-0.008
 eighteen-0.008
Token planes
Token ID13016
Feature activation+2.56e-02

Pos logit contributions
 up+0.042
 outside+0.039
 around+0.038
 else+0.037
 closer+0.036
Neg logit contributions
 arrest-0.058
ross-0.057
 fourteen-0.055
 loser-0.055
 eighteen-0.053
Token and
Token ID290
Feature activation-7.60e-03

Pos logit contributions
 arrest+0.011
 loser+0.011
 fourteen+0.010
 eighteen+0.010
 sixteen+0.010
Neg logit contributions
 up-0.006
 else-0.006
 around-0.005
 outside-0.005
 closer-0.005
Token so
Token ID523
Feature activation-2.98e-02

Pos logit contributions
 up+0.009
 else+0.009
 around+0.008
 closer+0.008
 quietly+0.008
Neg logit contributions
 arrest-0.015
 loser-0.014
 fourteen-0.014
 eighteen-0.013
 sixteen-0.013
Token he
Token ID339
Feature activation+5.40e-03

Pos logit contributions
 up+0.034
 else+0.033
 around+0.030
uggle+0.030
 closer+0.029
Neg logit contributions
 arrest-0.062
 loser-0.058
 fourteen-0.056
 eighteen-0.055
 no-0.054
Token decided
Token ID3066
Feature activation+1.81e-02

Pos logit contributions
 arrest+0.009
 no+0.009
 loser+0.009
 fourteen+0.008
 enough+0.008
Neg logit contributions
uggle-0.008
 up-0.008
 else-0.007
 compliments-0.007
 outside-0.007
Token to
Token ID284
Feature activation-1.99e-03

Pos logit contributions
 loser+0.037
 arrest+0.036
 fourteen+0.034
 eighteen+0.034
 sixteen+0.033
Neg logit contributions
 up-0.023
 else-0.020
 around-0.020
 closer-0.019
 outside-0.019
Token stay
Token ID2652
Feature activation-2.45e-02

Pos logit contributions
 up+0.003
 else+0.003
uggle+0.003
in+0.002
 around+0.002
Neg logit contributions
 arrest-0.004
 loser-0.003
 no-0.003
 fourteen-0.003
 eighteen-0.003
Token and
Token ID290
Feature activation-9.76e-03

Pos logit contributions
uggle+0.028
 else+0.027
fish+0.025
 compliments+0.025
 up+0.024
Neg logit contributions
 arrest-0.047
 no-0.046
 enough-0.044
 loser-0.044
 eleven-0.040
Token talk
Token ID1561
Feature activation+1.41e-02

Pos logit contributions
uggle+0.011
in+0.011
 up+0.011
 else+0.011
fish+0.010
Neg logit contributions
 arrest-0.019
 no-0.018
 loser-0.017
 enough-0.016
 fourteen-0.016
Token to
Token ID284
Feature activation-9.83e-03

Pos logit contributions
 arrest+0.028
 loser+0.026
 fourteen+0.025
 eighteen+0.025
 sixteen+0.024
Neg logit contributions
 up-0.018
 else-0.017
 around-0.017
 outside-0.016
uggle-0.016
Token the
Token ID262
Feature activation-5.85e-03

Pos logit contributions
 up+0.013
uggle+0.013
 else+0.013
 compliments+0.012
 around+0.012
Neg logit contributions
 arrest-0.018
 loser-0.017
 no-0.016
 fourteen-0.016
 eighteen-0.015
Token monster
Token ID9234
Feature activation-1.69e-02

Pos logit contributions
uggle+0.012
irl+0.010
 compliments+0.010
 else+0.010
 up+0.010
Neg logit contributions
 loser-0.008
 arrest-0.007
 no-0.007
 eight-0.006
 enemy-0.006
Token You
Token ID921
Feature activation-3.29e-04

Pos logit contributions
 arrest+0.029
 loser+0.026
 fourteen+0.024
 eighteen+0.024
 no+0.024
Neg logit contributions
uggle-0.041
 up-0.040
 else-0.038
 around-0.037
 compliments-0.037
Token are
Token ID389
Feature activation+9.41e-03

Pos logit contributions
 up+0.001
uggle+0.001
 else+0.001
 outside+0.001
 around+0.001
Neg logit contributions
 arrest-0.000
 loser-0.000
 no-0.000
 fourteen-0.000
 eighteen-0.000
Token always
Token ID1464
Feature activation+1.75e-02

Pos logit contributions
 arrest+0.017
 no+0.016
 loser+0.016
 fourteen+0.015
 eighteen+0.015
Neg logit contributions
 up-0.012
uggle-0.012
 else-0.012
 compliments-0.011
fish-0.011
Token compassionate
Token ID32533
Feature activation+2.10e-02

Pos logit contributions
 arrest+0.037
 loser+0.034
 fourteen+0.033
 no+0.032
 eighteen+0.032
Neg logit contributions
 up-0.021
 else-0.019
 around-0.018
 outside-0.018
uggle-0.017
Token to
Token ID284
Feature activation+7.75e-03

Pos logit contributions
 arrest+0.043
 eighteen+0.041
 fourteen+0.040
 sixteen+0.040
 loser+0.039
Neg logit contributions
 up-0.025
 around-0.023
 outside-0.022
 else-0.022
uggle-0.021
Token us
Token ID514
Feature activation-2.68e-02

Pos logit contributions
 arrest+0.014
 loser+0.013
 no+0.013
 fourteen+0.012
 eighteen+0.012
Neg logit contributions
 up-0.010
uggle-0.010
 else-0.010
 around-0.009
 closer-0.009
Token and
Token ID290
Feature activation+1.20e-02

Pos logit contributions
 up+0.040
 else+0.038
uggle+0.038
 around+0.035
 compliments+0.035
Neg logit contributions
 arrest-0.045
 loser-0.042
 no-0.041
 fourteen-0.040
 eighteen-0.039
Token to
Token ID284
Feature activation+1.86e-02

Pos logit contributions
 arrest+0.023
 loser+0.022
 fourteen+0.021
 eighteen+0.020
 sixteen+0.020
Neg logit contributions
 up-0.015
 else-0.015
uggle-0.014
 compliments-0.014
 around-0.014
Token others
Token ID1854
Feature activation-1.76e-03

Pos logit contributions
 arrest+0.034
 loser+0.031
 no+0.031
 fourteen+0.030
 eighteen+0.029
Neg logit contributions
 up-0.025
uggle-0.024
 else-0.024
 around-0.022
 closer-0.022
Token."
Token ID526
Feature activation-2.21e-02

Pos logit contributions
 up+0.003
uggle+0.003
 else+0.002
 compliments+0.002
 closer+0.002
Neg logit contributions
 arrest-0.003
 loser-0.003
 no-0.003
 enough-0.002
 fourteen-0.002
Token
Token ID198
Feature activation-7.55e-03

Pos logit contributions
 up+0.028
 else+0.027
uggle+0.026
fish+0.025
 compliments+0.025
Neg logit contributions
 arrest-0.041
 no-0.039
 loser-0.038
 fourteen-0.037
 eighteen-0.036
Token."
Token ID526
Feature activation-1.73e-02

Pos logit contributions
 up+0.004
 else+0.004
 around+0.004
 outside+0.003
uggle+0.003
Neg logit contributions
 arrest-0.007
 loser-0.006
 fourteen-0.006
 eighteen-0.006
 sixteen-0.006
Token
Token ID198
Feature activation-8.86e-03

Pos logit contributions
 up+0.022
 else+0.022
fish+0.021
uggle+0.021
 findings+0.021
Neg logit contributions
 arrest-0.031
 no-0.030
 fourteen-0.028
 loser-0.028
 No-0.027
Token
Token ID198
Feature activation+3.63e-03

Pos logit contributions
 compliments+0.013
 programmes+0.012
 up+0.012
 findings+0.012
 seafood+0.012
Neg logit contributions
 arrest-0.014
 no-0.013
 Kim-0.012
 eight-0.011
 No-0.011
TokenThey
Token ID2990
Feature activation+5.32e-03

Pos logit contributions
 arrest+0.005
 loser+0.005
 no+0.005
 fourteen+0.005
 eight+0.005
Neg logit contributions
 up-0.006
 around-0.006
 compliments-0.006
uggle-0.006
 findings-0.006
Token hug
Token ID16225
Feature activation+2.28e-02

Pos logit contributions
 arrest+0.009
 no+0.009
 loser+0.008
 fourteen+0.008
 eighteen+0.008
Neg logit contributions
 up-0.007
uggle-0.007
 else-0.007
in-0.007
fish-0.007
Token each
Token ID1123
Feature activation-5.69e-02

Pos logit contributions
 arrest+0.035
 no+0.035
 loser+0.034
 enough+0.032
 fourteen+0.032
Neg logit contributions
 else-0.034
uggle-0.033
 up-0.032
 compliments-0.032
 findings-0.031
Token other
Token ID584
Feature activation+2.03e-02

Pos logit contributions
 up+0.067
uggle+0.064
 else+0.064
 around+0.058
 compliments+0.058
Neg logit contributions
 arrest-0.112
 loser-0.106
 no-0.104
 fourteen-0.101
 eighteen-0.098
Token.
Token ID13
Feature activation+2.68e-02

Pos logit contributions
 arrest+0.033
 no+0.032
 loser+0.031
 enough+0.030
 fourteen+0.029
Neg logit contributions
uggle-0.028
 else-0.028
 up-0.027
Hop-0.027
fish-0.027
Token They
Token ID1119
Feature activation+1.38e-02

Pos logit contributions
 arrest+0.036
 loser+0.034
 no+0.033
 fourteen+0.032
 eighteen+0.031
Neg logit contributions
 up-0.048
 else-0.047
uggle-0.046
 compliments-0.044
 around-0.044
Token are
Token ID389
Feature activation-4.02e-03

Pos logit contributions
 arrest+0.023
 no+0.022
 loser+0.021
 fourteen+0.020
 eighteen+0.019
Neg logit contributions
 up-0.020
uggle-0.020
 else-0.019
in-0.019
fish-0.019
Token sorry
Token ID7926
Feature activation+2.92e-02

Pos logit contributions
 up+0.004
 else+0.004
uggle+0.004
 around+0.003
 compliments+0.003
Neg logit contributions
 arrest-0.009
 loser-0.008
 no-0.008
 fourteen-0.008
 eighteen-0.008
TokenJack
Token ID14295
Feature activation-3.06e-02

Pos logit contributions
 arrest+0.017
 loser+0.015
 no+0.015
 fourteen+0.015
 sixteen+0.015
Neg logit contributions
uggle-0.010
 up-0.010
 compliments-0.009
fish-0.008
 else-0.008
Token pushed
Token ID7121
Feature activation-3.22e-02

Pos logit contributions
uggle+0.039
 up+0.039
 else+0.037
 compliments+0.035
 around+0.034
Neg logit contributions
 arrest-0.057
 no-0.052
 loser-0.051
 fourteen-0.050
 enough-0.049
Token the
Token ID262
Feature activation-1.63e-02

Pos logit contributions
uggle+0.050
 else+0.048
 up+0.047
fish+0.046
 compliments+0.045
Neg logit contributions
 arrest-0.051
 loser-0.048
 no-0.048
 fourteen-0.045
 enough-0.044
Token button
Token ID4936
Feature activation-3.92e-02

Pos logit contributions
uggle+0.023
 else+0.022
Hop+0.022
 compliments+0.020
fish+0.020
Neg logit contributions
 loser-0.027
 arrest-0.027
 no-0.027
 eight-0.024
ACK-0.024
Token and
Token ID290
Feature activation-7.20e-03

Pos logit contributions
 up+0.044
 else+0.038
 around+0.038
 outside+0.037
 closer+0.036
Neg logit contributions
 arrest-0.086
 loser-0.081
 fourteen-0.077
 eighteen-0.076
 no-0.075
Token something
Token ID1223
Feature activation-5.30e-02

Pos logit contributions
 up+0.009
uggle+0.009
 else+0.009
 compliments+0.008
 around+0.008
Neg logit contributions
 arrest-0.013
 loser-0.012
 no-0.012
 fourteen-0.012
 eighteen-0.011
Token amazing
Token ID4998
Feature activation-1.25e-02

Pos logit contributions
uggle+0.083
Hop+0.067
actus+0.066
 compliments+0.065
 programmes+0.065
Neg logit contributions
 no-0.093
 enough-0.084
 arrest-0.078
 fallen-0.077
 fifteen-0.075
Token happened
Token ID3022
Feature activation+6.42e-03

Pos logit contributions
 up+0.021
uggle+0.021
 else+0.020
 compliments+0.019
 around+0.019
Neg logit contributions
 arrest-0.018
 loser-0.017
 no-0.016
 fourteen-0.016
 eighteen-0.015
Token.
Token ID13
Feature activation+9.68e-03

Pos logit contributions
 arrest+0.013
 loser+0.013
 no+0.012
 fourteen+0.012
 eighteen+0.012
Neg logit contributions
 up-0.007
 else-0.007
uggle-0.006
 around-0.006
 outside-0.006
Token A
Token ID317
Feature activation-1.46e-02

Pos logit contributions
 arrest+0.014
 loser+0.013
 no+0.012
 fourteen+0.012
 eighteen+0.012
Neg logit contributions
 up-0.017
 else-0.016
uggle-0.016
 around-0.016
 outside-0.015
Token beautiful
Token ID4950
Feature activation-2.16e-02

Pos logit contributions
uggle+0.019
 up+0.019
 around+0.017
 else+0.017
 compliments+0.017
Neg logit contributions
 loser-0.026
 arrest-0.026
 no-0.025
 eight-0.024
 eighteen-0.023
Token from
Token ID422
Feature activation-2.96e-03

Pos logit contributions
 else+0.046
Hop+0.045
 compliments+0.045
uggle+0.045
 programmes+0.044
Neg logit contributions
 arrest-0.054
 no-0.050
 loser-0.048
 enough-0.047
 fourteen-0.044
Token Ben
Token ID3932
Feature activation-1.73e-02

Pos logit contributions
 up+0.005
 else+0.004
uggle+0.004
 outside+0.004
 around+0.004
Neg logit contributions
 arrest-0.005
 loser-0.005
 no-0.004
 fourteen-0.004
 eighteen-0.004
Token's
Token ID338
Feature activation-9.35e-03

Pos logit contributions
 up+0.020
 else+0.018
 around+0.018
 outside+0.018
uggle+0.017
Neg logit contributions
 arrest-0.035
 loser-0.035
 fourteen-0.033
 eighteen-0.033
 sixteen-0.032
Token hand
Token ID1021
Feature activation+1.84e-02

Pos logit contributions
 up+0.014
 else+0.013
 outside+0.013
 around+0.012
uggle+0.012
Neg logit contributions
 arrest-0.016
 loser-0.016
 fourteen-0.016
 eighteen-0.015
 sixteen-0.015
Token and
Token ID290
Feature activation-1.25e-02

Pos logit contributions
 arrest+0.036
 loser+0.034
 fourteen+0.032
 no+0.032
 eighteen+0.032
Neg logit contributions
 up-0.023
 else-0.021
uggle-0.021
 around-0.020
 outside-0.020
Token put
Token ID1234
Feature activation-5.57e-02

Pos logit contributions
uggle+0.018
 up+0.017
 else+0.016
in+0.016
Hop+0.016
Neg logit contributions
 arrest-0.022
 no-0.020
 loser-0.019
 fourteen-0.019
 enough-0.019
Token it
Token ID340
Feature activation-3.27e-02

Pos logit contributions
uggle+0.077
 else+0.074
rig+0.073
 up+0.070
fish+0.068
Neg logit contributions
 no-0.095
 arrest-0.094
 loser-0.089
 enough-0.089
 fourteen-0.088
Token back
Token ID736
Feature activation+3.00e-05

Pos logit contributions
uggle+0.043
rig+0.041
 compliments+0.040
 programmes+0.040
 else+0.040
Neg logit contributions
 no-0.058
 arrest-0.054
 loser-0.054
 enough-0.053
 eleven-0.052
Token on
Token ID319
Feature activation+1.59e-02

Pos logit contributions
 arrest+0.000
 no+0.000
 loser+0.000
 fourteen+0.000
 enough+0.000
Neg logit contributions
uggle-0.000
 else-0.000
 compliments-0.000
 up-0.000
Hop-0.000
Token the
Token ID262
Feature activation+1.23e-02

Pos logit contributions
 arrest+0.031
 loser+0.030
 fourteen+0.028
 eighteen+0.028
 no+0.027
Neg logit contributions
 up-0.021
 else-0.018
 outside-0.018
 around-0.018
 quietly-0.017
Token ground
Token ID2323
Feature activation-7.20e-03

Pos logit contributions
 arrest+0.020
 loser+0.019
 no+0.018
 fourteen+0.018
 eighteen+0.017
Neg logit contributions
 up-0.018
 else-0.018
uggle-0.018
 compliments-0.017
 around-0.017
Token,
Token ID11
Feature activation+1.22e-02

Pos logit contributions
 up+0.009
 else+0.008
uggle+0.008
 around+0.008
 outside+0.007
Neg logit contributions
 arrest-0.011
 loser-0.011
 fourteen-0.010
 no-0.010
 eighteen-0.010
Token
Token ID6184
Feature activation+1.34e-02

Pos logit contributions
 arrest+0.023
 loser+0.021
 fourteen+0.019
 eighteen+0.018
 no+0.018
Neg logit contributions
 up-0.018
 around-0.016
 else-0.016
 closer-0.016
 quietly-0.016
Token
Token ID95
Feature activation+1.52e-02

Pos logit contributions
 arrest+0.022
 loser+0.020
 no+0.019
 fourteen+0.019
 eighteen+0.019
Neg logit contributions
 up-0.020
 else-0.020
uggle-0.019
 compliments-0.019
 around-0.018
Token
Token ID26391
Feature activation-2.37e-02

Pos logit contributions
 arrest+0.023
 loser+0.022
 no+0.022
 fourteen+0.021
 enough+0.020
Neg logit contributions
 up-0.025
uggle-0.024
 outside-0.023
 else-0.023
 around-0.022
Token
Token ID129
Feature activation+4.56e-02

Pos logit contributions
 up+0.033
uggle+0.033
 else+0.032
 around+0.031
 outside+0.031
Neg logit contributions
 loser-0.039
 fourteen-0.038
 arrest-0.038
 thirteen-0.036
 eighteen-0.036
Token
Token ID241
Feature activation-5.26e-02

Pos logit contributions
 arrest+0.073
 loser+0.069
 no+0.066
 fourteen+0.065
 eighteen+0.063
Neg logit contributions
 up-0.071
 else-0.067
uggle-0.067
 around-0.064
 outside-0.063
TokenI
Token ID40
Feature activation-1.51e-02

Pos logit contributions
 else+0.058
 up+0.058
uggle+0.054
 around+0.052
Hop+0.051
Neg logit contributions
 arrest-0.107
 loser-0.098
 fourteen-0.096
 eighteen-0.095
 no-0.095
Token am
Token ID716
Feature activation-1.41e-02

Pos logit contributions
 up+0.021
uggle+0.020
 else+0.019
 compliments+0.019
 around+0.018
Neg logit contributions
 arrest-0.026
 loser-0.025
 no-0.025
 fourteen-0.023
 eighteen-0.022
Token the
Token ID262
Feature activation+2.75e-03

Pos logit contributions
 up+0.020
 else+0.018
 outside+0.017
 around+0.017
 closer+0.016
Neg logit contributions
 arrest-0.027
 loser-0.025
 fourteen-0.024
 eighteen-0.024
 nineteen-0.023
Token Guardian
Token ID8283
Feature activation-1.40e-02

Pos logit contributions
 arrest+0.005
 fourteen+0.005
 loser+0.005
 eighteen+0.005
 sixteen+0.005
Neg logit contributions
 up-0.004
 else-0.003
 closer-0.003
uggle-0.003
 outside-0.003
Token of
Token ID286
Feature activation-1.14e-03

Pos logit contributions
 else+0.017
 up+0.017
 around+0.016
in+0.016
fish+0.015
Neg logit contributions
 arrest-0.029
 loser-0.026
 fourteen-0.026
 eighteen-0.026
ACK-0.025
Token,
Token ID11
Feature activation+1.00e-02

Pos logit contributions
 arrest+0.004
 loser+0.004
 fourteen+0.004
 sixteen+0.003
 eighteen+0.003
Neg logit contributions
 up-0.003
 else-0.003
 around-0.002
 closer-0.002
 outside-0.002
Token
Token ID6184
Feature activation+1.49e-02

Pos logit contributions
 arrest+0.016
 loser+0.015
 no+0.014
 fourteen+0.014
 eighteen+0.014
Neg logit contributions
 up-0.016
 else-0.015
uggle-0.015
 around-0.014
 outside-0.014
Token
Token ID95
Feature activation+4.05e-03

Pos logit contributions
 arrest+0.024
 loser+0.023
 no+0.022
 fourteen+0.022
 eighteen+0.021
Neg logit contributions
 up-0.023
 else-0.022
 compliments-0.021
uggle-0.021
 around-0.020
Token
Token ID26391
Feature activation-2.69e-02

Pos logit contributions
 arrest+0.006
 loser+0.006
 no+0.006
 fourteen+0.006
 eighteen+0.005
Neg logit contributions
 up-0.006
 else-0.006
uggle-0.006
 around-0.006
 compliments-0.006
Token
Token ID129
Feature activation+4.10e-02

Pos logit contributions
 up+0.023
 else+0.023
uggle+0.023
 around+0.020
 outside+0.019
Neg logit contributions
 arrest-0.059
 loser-0.058
 fourteen-0.056
 enough-0.055
 eleven-0.055
Token
Token ID241
Feature activation-5.61e-02

Pos logit contributions
 arrest+0.068
 loser+0.066
 no+0.063
 fourteen+0.060
 enough+0.059
Neg logit contributions
 up-0.062
uggle-0.058
 else-0.057
 around-0.055
 outside-0.055
TokenThat
Token ID2504
Feature activation-3.78e-02

Pos logit contributions
 else+0.058
 up+0.057
uggle+0.053
 around+0.052
 outside+0.051
Neg logit contributions
 arrest-0.119
 loser-0.108
 fourteen-0.106
 no-0.105
 eighteen-0.104
Token was
Token ID373
Feature activation+1.90e-02

Pos logit contributions
 up+0.046
 else+0.044
uggle+0.042
 around+0.040
 compliments+0.040
Neg logit contributions
 arrest-0.074
 loser-0.070
 no-0.067
 fourteen-0.067
 eighteen-0.066
Token so
Token ID523
Feature activation-1.73e-02

Pos logit contributions
 arrest+0.040
 loser+0.037
 fourteen+0.034
 eighteen+0.033
 nineteen+0.033
Neg logit contributions
 up-0.024
 else-0.022
 around-0.020
 outside-0.020
 closer-0.019
Token y
Token ID331
Feature activation+7.32e-03

Pos logit contributions
 up+0.021
 else+0.021
uggle+0.019
 around+0.019
 compliments+0.018
Neg logit contributions
 arrest-0.034
 loser-0.032
 fourteen-0.031
 no-0.030
 eighteen-0.030
Tokenummy
Token ID13513
Feature activation-1.68e-02

Pos logit contributions
 arrest+0.011
 loser+0.011
 no+0.010
 fourteen+0.010
 enough+0.010
Neg logit contributions
 up-0.011
 else-0.011
uggle-0.011
 compliments-0.011
 closer-0.010