TOP ACTIVATIONS
MAX = 8.367

cup . But as they touched the lid ,
happy . But as they ate , the sky
. But as she did , the stone
. But as he wiped it , a
is . But as they are about to land
safety . But as he was helping his friends
playground . But as she was having fun ,
himself . But as he was hiding , he
now . But when the waiter came , he
one . But as she was about to grab
behind . But as she ran , she did
away . But as he bit into the peach
. But as he was walking back ,
line . But as they crossed the finish line
. But as the mouse was wrapping the
out . But as she was taking pictures ,
inside . But as she looked around , she
." But as they were walking back ,
honey . But as they were eating , a
branch . But as the muff in sat on

MIN ACTIVATIONS
MIN = -10.437

. He jumped in and sw am down and saw a
. She sw am and sw am , until she found
They sw am and sw am until they were tired
, he leaned over and sw atted the mosquito with his
He sw am and sw am , passing all kinds
. They sw am and sw am until they found the
. She sw am and sw am until she saw a
and Mia grabbed Leo and sw am to the edge of
. It has flowers and rib bons on it . She
sailed for days , exploring unch arted waters until he came
. The ducks and fish sw am away . The water
She cautiously sw am around and saw lots
and he and Be cca sw am together . They had
, she jumped in and sw am to the bag .
. He grabbed it and sw am back to shore .
He sw am and sw am until he found a
. He sw am and sw am until he found a
Joe sw am and sw am , but he couldn
. He sw am and sw am until he saw a
. He jumped in and sw am all around the river

INTERVAL 3.666 - 8.367
CONTAINS 5.568%

weak . But soon , the sky began to
cat the other . Finally , the pile separated , and
all day long . One day , Tim my 's mom
<|endoftext|> Once upon a time , there was a horse named
<|endoftext|> Once upon a time , there was a gentle bunny

INTERVAL -1.035 - 3.666
CONTAINS 62.206%

arms and grabbed Tom 's leg . It pulled him into
But when she opened the box , something made her feel
very messy . " Your feathers are all over the place
touch it . " Let 's seat on the
knew that they had a smart friend in the turkey who

INTERVAL -5.736 - -1.035
CONTAINS 32.105%

?" the voice asked . B uster looked
: to respect other people 's things and to be careful
with her mom and dad . They had a lot of
cl apped her hands with joy and started running along the
is Anna , my daughter . She likes to come with

INTERVAL -10.437 - -5.736
CONTAINS 0.120%

asis so badly . He stood there for a moment ,
replied , “ it is a rocking
happy . She smiled and cl apped her hands . She
. Tom my stepped inside , closed his eyes
Mitt ens and Spark y sh ivered and looked around .
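
NOTE ON INTERVAL BOUNDARIES
The four INTERVAL blocks above appear to split the range from MIN (-10.437) to MAX (8.367) into four equal-width bins, each about 4.701 wide. The sketch below recomputes the edges under that assumption. The helper name equal_width_edges is hypothetical and is not taken from whatever tool produced this dump; the CONTAINS percentages are copied from the blocks above, listed from the lowest interval to the highest.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi, rounded to 3 decimals."""
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


# Values copied from the MIN = ... and MAX = ... lines of this dump.
MIN_ACT = -10.437
MAX_ACT = 8.367

# Assumption: four equal-width bins, matching the four INTERVAL blocks.
edges = equal_width_edges(MIN_ACT, MAX_ACT, 4)
intervals = list(zip(edges[:-1], edges[1:]))

# CONTAINS percentages, lowest interval first.
shares = [0.120, 32.105, 62.206, 5.568]
total_share = round(sum(shares), 3)

for (lo, hi), share in zip(intervals, shares):
    print(f"INTERVAL {lo:.3f} - {hi:.3f}  CONTAINS {share:.3f}%")
print(f"TOTAL {total_share:.3f}%")
```

Under this assumption the recomputed edges match the listed boundaries (-10.437, -5.736, -1.035, 3.666, 8.367), and the listed percentages sum to 99.999%, which is consistent with rounding of the individual shares.
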
Token cup
Token ID6508
Feature activation+6.46e-01

Pos logit contributions
 heads+0.008
 circles+0.007
 requests+0.007
stick+0.007
 mistakes+0.007
Neg logit contributions
str-0.008
ww-0.007
prise-0.007
Not-0.007
arest-0.007
Token.
Token ID13
Feature activation-1.58e+00

Pos logit contributions
str+0.015
prise+0.014
ww+0.013
arest+0.013
 slice+0.013
Neg logit contributions
stick-0.014
lee-0.013
en-0.013
arted-0.012
 circles-0.012
Token
Token ID198
Feature activation+1.75e+00

Pos logit contributions
stick+0.027
lee+0.026
en+0.026
ags+0.025
 heads+0.025
Neg logit contributions
str-0.040
Not-0.038
 slice-0.037
ww-0.037
aster-0.036
Token
Token ID198
Feature activation+1.46e+00

Pos logit contributions
str+0.035
ww+0.032
 slice+0.032
iest+0.031
prise+0.031
Neg logit contributions
lee-0.040
stick-0.040
 requests-0.038
en-0.038
arted-0.038
TokenBut
Token ID1537
Feature activation+5.09e+00

Pos logit contributions
str+0.025
iest+0.023
 slice+0.022
 happened+0.022
ipop+0.022
Neg logit contributions
lee-0.037
stick-0.036
 requests-0.036
lement-0.035
aled-0.035
Token as
Token ID355
Feature activation+8.37e+00

Pos logit contributions
str+0.120
arest+0.112
iest+0.111
ww+0.109
 slice+0.103
Neg logit contributions
stick-0.103
lee-0.099
en-0.099
 heads-0.098
ishment-0.097
Token they
Token ID484
Feature activation+7.35e-01

Pos logit contributions
str+0.178
 slice+0.160
arest+0.157
prise+0.142
ipop+0.142
Neg logit contributions
 circles-0.172
en-0.167
lee-0.166
 houses-0.165
 requests-0.165
Token touched
Token ID12615
Feature activation-2.55e+00

Pos logit contributions
ww+0.016
str+0.015
iest+0.015
arest+0.015
 possible+0.014
Neg logit contributions
lee-0.016
 heads-0.016
arted-0.016
en-0.015
am-0.015
Token the
Token ID262
Feature activation+2.17e-01

Pos logit contributions
en+0.055
lee+0.055
stick+0.054
 heads+0.052
 circles+0.051
Neg logit contributions
str-0.056
ww-0.052
prise-0.052
arest-0.051
 slice-0.051
Token lid
Token ID19789
Feature activation+1.09e+00

Pos logit contributions
str+0.005
ww+0.005
arest+0.004
 happened+0.004
prise+0.004
Neg logit contributions
 heads-0.005
 circles-0.004
lee-0.004
stick-0.004
 backs-0.004
Token,
Token ID11
Feature activation+5.56e+00

Pos logit contributions
str+0.021
ww+0.020
prise+0.019
 slice+0.019
aster+0.018
Neg logit contributions
stick-0.026
lee-0.025
en-0.025
arted-0.024
 heads-0.024
Token happy
Token ID3772
Feature activation+2.30e+00

Pos logit contributions
str+0.024
 slice+0.022
ww+0.022
iest+0.022
ha+0.022
Neg logit contributions
lee-0.024
 requests-0.023
 heads-0.022
stick-0.022
arted-0.022
Token.
Token ID13
Feature activation-1.37e+00

Pos logit contributions
str+0.059
ww+0.053
arest+0.053
prise+0.051
iest+0.051
Neg logit contributions
lee-0.043
stick-0.042
en-0.041
 requests-0.040
 circles-0.039
Token
Token ID198
Feature activation+2.53e+00

Pos logit contributions
stick+0.029
lee+0.029
en+0.028
 requests+0.027
arted+0.027
Neg logit contributions
str-0.030
ww-0.027
 slice-0.027
Not-0.027
prise-0.026
Token
Token ID198
Feature activation+2.01e+00

Pos logit contributions
str+0.048
ww+0.045
iest+0.045
 slice+0.044
ipop+0.043
Neg logit contributions
lee-0.059
 requests-0.057
stick-0.057
 circles-0.055
ags-0.055
TokenBut
Token ID1537
Feature activation+5.11e+00

Pos logit contributions
str+0.041
ipop+0.040
iest+0.040
 happened+0.039
 slice+0.038
Neg logit contributions
 requests-0.042
lement-0.041
 circles-0.041
lee-0.041
stick-0.041
Token as
Token ID355
Feature activation+8.35e+00

Pos logit contributions
str+0.111
arest+0.110
iest+0.108
ww+0.101
prise+0.100
Neg logit contributions
stick-0.104
en-0.104
ishment-0.104
lee-0.102
 heads-0.102
Token they
Token ID484
Feature activation+6.88e-01

Pos logit contributions
str+0.164
arest+0.149
 slice+0.146
ipop+0.138
prise+0.135
Neg logit contributions
 houses-0.183
 circles-0.182
lee-0.176
en-0.176
 requests-0.174
Token ate
Token ID15063
Feature activation-1.61e+00

Pos logit contributions
str+0.015
arest+0.015
iest+0.014
ww+0.014
 possible+0.013
Neg logit contributions
 heads-0.014
en-0.014
lee-0.014
am-0.014
arted-0.014
Token,
Token ID11
Feature activation+5.85e+00

Pos logit contributions
ags+0.031
 circles+0.030
en+0.030
 houses+0.030
 heads+0.029
Neg logit contributions
str-0.040
arest-0.038
prise-0.036
ww-0.036
ha-0.036
Token the
Token ID262
Feature activation+4.97e-01

Pos logit contributions
str+0.105
ww+0.094
 slice+0.093
arest+0.093
prise+0.090
Neg logit contributions
lee-0.147
stick-0.146
en-0.140
 requests-0.139
 circles-0.137
Token sky
Token ID6766
Feature activation+1.86e+00

Pos logit contributions
str+0.011
ww+0.011
prise+0.011
arest+0.011
 slice+0.010
Neg logit contributions
lee-0.010
 heads-0.010
stick-0.009
 requests-0.009
 circles-0.009
Token.
Token ID13
Feature activation-1.62e+00

Pos logit contributions
lee+0.033
stick+0.032
 requests+0.031
arted+0.031
en+0.031
Neg logit contributions
str-0.029
 slice-0.027
ww-0.027
iest-0.026
Not-0.026
Token
Token ID220
Feature activation-2.57e+00

Pos logit contributions
 heads+0.034
stick+0.034
lee+0.033
en+0.033
ags+0.033
Neg logit contributions
str-0.035
Not-0.032
ww-0.032
 slice-0.031
aster-0.031
Token
Token ID198
Feature activation+1.28e+00

Pos logit contributions
stick+0.065
lee+0.064
en+0.064
ishment+0.063
 heads+0.063
Neg logit contributions
str-0.045
 slice-0.039
Not-0.039
ww-0.039
 happened-0.038
Token
Token ID198
Feature activation+9.74e-01

Pos logit contributions
str+0.025
ww+0.022
 slice+0.022
Not+0.022
iest+0.022
Neg logit contributions
lee-0.030
stick-0.030
en-0.029
 requests-0.029
arted-0.028
TokenBut
Token ID1537
Feature activation+4.92e+00

Pos logit contributions
str+0.017
iest+0.016
 slice+0.016
 happened+0.015
ipop+0.015
Neg logit contributions
lee-0.024
stick-0.024
 requests-0.023
arted-0.023
en-0.023
Token as
Token ID355
Feature activation+8.32e+00

Pos logit contributions
str+0.105
arest+0.102
iest+0.098
ww+0.096
 slice+0.095
Neg logit contributions
stick-0.108
en-0.105
lee-0.102
ishment-0.098
arted-0.096
Token she
Token ID673
Feature activation+4.56e-01

Pos logit contributions
str+0.183
 slice+0.177
arest+0.166
 coconut+0.159
ipop+0.157
Neg logit contributions
en-0.157
lee-0.156
 circles-0.154
 requests-0.153
lement-0.147
Token did
Token ID750
Feature activation-1.23e+00

Pos logit contributions
str+0.010
 slice+0.009
arest+0.009
prise+0.009
ww+0.009
Neg logit contributions
en-0.009
lee-0.009
 requests-0.009
stick-0.009
am-0.009
Token,
Token ID11
Feature activation+5.26e+00

Pos logit contributions
stick+0.032
lee+0.032
 requests+0.031
en+0.031
arted+0.030
Neg logit contributions
str-0.020
 slice-0.019
ww-0.018
prise-0.018
iest-0.018
Token the
Token ID262
Feature activation+4.51e-01

Pos logit contributions
str+0.099
 slice+0.092
arest+0.090
ww+0.090
prise+0.088
Neg logit contributions
stick-0.128
lee-0.127
 requests-0.122
en-0.121
 circles-0.118
Token stone
Token ID7815
Feature activation+1.36e+00

Pos logit contributions
str+0.010
ww+0.009
prise+0.009
 slice+0.009
arest+0.009
Neg logit contributions
stick-0.009
 heads-0.009
lee-0.009
 requests-0.009
 circles-0.009
Token.
Token ID13
Feature activation-2.00e+00

Pos logit contributions
str+0.041
ww+0.039
Not+0.037
prise+0.037
 slice+0.037
Neg logit contributions
stick-0.032
lee-0.032
en-0.030
 requests-0.029
arted-0.028
Token
Token ID220
Feature activation-2.39e+00

Pos logit contributions
stick+0.045
en+0.045
 heads+0.045
lee+0.044
ags+0.043
Neg logit contributions
str-0.040
Not-0.036
 slice-0.035
 mel-0.035
ipop-0.035
Token
Token ID198
Feature activation+1.94e+00

Pos logit contributions
stick+0.059
lee+0.059
en+0.058
 heads+0.057
 requests+0.056
Neg logit contributions
str-0.043
 slice-0.038
ww-0.038
Not-0.037
prise-0.036
Token
Token ID198
Feature activation+1.11e+00

Pos logit contributions
str+0.039
 slice+0.036
ww+0.036
iest+0.035
prise+0.035
Neg logit contributions
lee-0.043
stick-0.043
 requests-0.041
en-0.041
arted-0.041
TokenBut
Token ID1537
Feature activation+4.48e+00

Pos logit contributions
str+0.021
 happened+0.019
 slice+0.019
iest+0.019
ipop+0.019
Neg logit contributions
stick-0.026
lee-0.026
 requests-0.025
aled-0.025
 circles-0.025
Token as
Token ID355
Feature activation+8.23e+00

Pos logit contributions
str+0.094
arest+0.091
ww+0.090
prise+0.089
iest+0.086
Neg logit contributions
stick-0.098
en-0.097
lee-0.093
ishment-0.093
 circles-0.090
Token he
Token ID339
Feature activation-1.23e-01

Pos logit contributions
str+0.175
 slice+0.169
arest+0.159
prise+0.150
ipop+0.150
Neg logit contributions
 circles-0.160
en-0.156
lee-0.152
sey-0.151
 arms-0.146
Token wiped
Token ID21122
Feature activation-3.95e+00

Pos logit contributions
en+0.003
lee+0.002
stick+0.002
 requests+0.002
am+0.002
Neg logit contributions
str-0.003
arest-0.003
ww-0.003
prise-0.002
iest-0.002
Token it
Token ID340
Feature activation-1.34e+00

Pos logit contributions
lee+0.093
stick+0.092
 requests+0.089
arted+0.088
ags+0.087
Neg logit contributions
str-0.074
ww-0.068
 slice-0.068
iest-0.067
Not-0.066
Token,
Token ID11
Feature activation+5.05e+00

Pos logit contributions
lee+0.028
stick+0.028
 requests+0.026
arted+0.026
ishment+0.026
Neg logit contributions
str-0.029
iest-0.027
ww-0.027
 slice-0.027
Not-0.026
Token a
Token ID257
Feature activation+1.38e+00

Pos logit contributions
str+0.099
ww+0.092
prise+0.091
arest+0.091
 slice+0.090
Neg logit contributions
lee-0.117
stick-0.117
 circles-0.114
 requests-0.112
en-0.112
Token is
Token ID318
Feature activation-2.57e-01

Pos logit contributions
str+0.025
ww+0.023
 slice+0.023
prise+0.023
Not+0.023
Neg logit contributions
lee-0.027
stick-0.027
en-0.026
am-0.025
arted-0.025
Token.
Token ID13
Feature activation-1.66e+00

Pos logit contributions
lee+0.005
stick+0.005
 circles+0.005
en+0.005
 requests+0.005
Neg logit contributions
str-0.005
ww-0.005
arest-0.005
prise-0.005
Not-0.005
Token
Token ID198
Feature activation+1.61e+00

Pos logit contributions
stick+0.029
lee+0.029
en+0.028
 heads+0.028
ishment+0.027
Neg logit contributions
str-0.041
ww-0.039
 slice-0.039
Not-0.038
aster-0.038
Token
Token ID198
Feature activation+1.33e+00

Pos logit contributions
str+0.031
ww+0.028
 slice+0.028
iest+0.027
prise+0.027
Neg logit contributions
lee-0.039
stick-0.038
 requests-0.037
en-0.037
arted-0.036
TokenBut
Token ID1537
Feature activation+4.81e+00

Pos logit contributions
str+0.022
iest+0.020
 slice+0.020
 happened+0.020
ipop+0.019
Neg logit contributions
lee-0.034
stick-0.034
 requests-0.033
aled-0.033
lement-0.033
Token as
Token ID355
Feature activation+8.22e+00

Pos logit contributions
str+0.110
arest+0.105
iest+0.104
ww+0.100
prise+0.096
Neg logit contributions
stick-0.099
en-0.096
 heads-0.095
ishment-0.094
 circles-0.093
Token they
Token ID484
Feature activation+5.38e-01

Pos logit contributions
str+0.178
 slice+0.164
arest+0.160
ipop+0.150
prise+0.150
Neg logit contributions
 circles-0.161
en-0.160
 houses-0.157
 requests-0.155
lee-0.154
Token are
Token ID389
Feature activation-8.08e-01

Pos logit contributions
str+0.011
ww+0.011
iest+0.011
arest+0.010
Not+0.010
Neg logit contributions
 heads-0.012
en-0.012
aled-0.011
 backs-0.011
 requests-0.011
Token about
Token ID546
Feature activation-1.96e-01

Pos logit contributions
stick+0.018
lee+0.018
en+0.017
 requests+0.017
aled+0.017
Neg logit contributions
str-0.017
 slice-0.015
ww-0.015
 happened-0.015
prise-0.015
Token to
Token ID284
Feature activation-1.94e+00

Pos logit contributions
lee+0.005
stick+0.005
 requests+0.005
en+0.005
arted+0.004
Neg logit contributions
str-0.004
ww-0.003
 slice-0.003
iest-0.003
prise-0.003
Token land
Token ID1956
Feature activation-6.83e-02

Pos logit contributions
lee+0.045
stick+0.044
en+0.043
 requests+0.043
 heads+0.042
Neg logit contributions
str-0.039
ww-0.036
 slice-0.035
iest-0.034
Not-0.034
Token safety
Token ID3747
Feature activation+6.21e-02

Pos logit contributions
lee+0.052
stick+0.051
en+0.049
 requests+0.049
arted+0.049
Neg logit contributions
str-0.043
 slice-0.039
ww-0.039
prise-0.037
iest-0.037
Token.
Token ID13
Feature activation-2.09e+00

Pos logit contributions
str+0.001
iest+0.001
ww+0.001
 happened+0.001
 slice+0.001
Neg logit contributions
lee-0.001
stick-0.001
 requests-0.001
aled-0.001
ags-0.001
Token
Token ID198
Feature activation+1.56e+00

Pos logit contributions
en+0.053
lee+0.052
 heads+0.052
stick+0.051
ishment+0.050
Neg logit contributions
str-0.036
Not-0.032
ww-0.031
aster-0.031
ha-0.031
Token
Token ID198
Feature activation+8.96e-01

Pos logit contributions
str+0.032
ww+0.030
 slice+0.030
iest+0.029
prise+0.029
Neg logit contributions
lee-0.034
stick-0.034
 requests-0.033
en-0.032
arted-0.032
TokenBut
Token ID1537
Feature activation+4.56e+00

Pos logit contributions
str+0.017
iest+0.015
 slice+0.015
ipop+0.015
 happened+0.015
Neg logit contributions
stick-0.022
lee-0.021
 requests-0.021
aled-0.021
 circles-0.020
Token as
Token ID355
Feature activation+8.21e+00

Pos logit contributions
str+0.099
arest+0.092
prise+0.092
ww+0.090
iest+0.090
Neg logit contributions
stick-0.102
en-0.095
lee-0.093
arted-0.092
 circles-0.091
Token he
Token ID339
Feature activation+3.52e-01

Pos logit contributions
str+0.173
 slice+0.160
arest+0.150
prise+0.147
 coconut+0.146
Neg logit contributions
 circles-0.172
en-0.167
 houses-0.165
stick-0.164
 requests-0.161
Token was
Token ID373
Feature activation-1.46e+00

Pos logit contributions
str+0.008
 slice+0.007
prise+0.007
arest+0.007
ww+0.007
Neg logit contributions
en-0.007
stick-0.007
am-0.007
lee-0.007
aled-0.007
Token helping
Token ID5742
Feature activation+6.12e-01

Pos logit contributions
lee+0.029
stick+0.028
 requests+0.027
en+0.027
arted+0.027
Neg logit contributions
str-0.034
 slice-0.032
ww-0.032
prise-0.031
iest-0.031
Token his
Token ID465
Feature activation-7.95e-01

Pos logit contributions
str+0.012
 slice+0.011
ww+0.011
prise+0.011
arest+0.011
Neg logit contributions
stick-0.014
lee-0.014
en-0.013
 requests-0.013
 heads-0.013
Token friends
Token ID2460
Feature activation+2.49e-01

Pos logit contributions
lee+0.013
stick+0.013
en+0.013
arted+0.012
 requests+0.012
Neg logit contributions
str-0.021
 slice-0.020
ww-0.019
iest-0.019
prise-0.019
Token playground
Token ID24817
Feature activation-6.53e-01

Pos logit contributions
lee+0.017
stick+0.017
en+0.016
 requests+0.016
arted+0.016
Neg logit contributions
str-0.022
ww-0.020
 slice-0.020
iest-0.020
prise-0.020
Token.
Token ID13
Feature activation-1.96e+00

Pos logit contributions
lee+0.016
ags+0.015
stick+0.015
lement+0.015
 requests+0.015
Neg logit contributions
str-0.012
 slice-0.011
iest-0.011
 happened-0.011
Not-0.010
Token
Token ID198
Feature activation+1.71e+00

Pos logit contributions
en+0.039
 heads+0.039
stick+0.038
ishment+0.037
lee+0.037
Neg logit contributions
str-0.045
Not-0.042
ww-0.040
ha-0.040
aster-0.039
Token
Token ID198
Feature activation+1.22e+00

Pos logit contributions
str+0.035
 slice+0.032
ww+0.032
iest+0.031
prise+0.031
Neg logit contributions
lee-0.039
stick-0.038
 requests-0.037
en-0.036
arted-0.036
TokenBut
Token ID1537
Feature activation+4.81e+00

Pos logit contributions
str+0.021
 slice+0.020
iest+0.020
ipop+0.020
 happened+0.019
Neg logit contributions
lee-0.030
stick-0.029
 requests-0.029
arted-0.029
lement-0.028
Token as
Token ID355
Feature activation+8.21e+00

Pos logit contributions
str+0.100
arest+0.098
iest+0.095
 slice+0.095
prise+0.093
Neg logit contributions
stick-0.106
en-0.102
lee-0.099
arted-0.098
ishment-0.097
Token she
Token ID673
Feature activation+1.73e-01

Pos logit contributions
 slice+0.169
str+0.166
arest+0.151
 coconut+0.149
ipop+0.147
Neg logit contributions
lee-0.171
en-0.169
 circles-0.168
 requests-0.162
lement-0.162
Token was
Token ID373
Feature activation-1.87e+00

Pos logit contributions
str+0.003
 slice+0.003
ww+0.003
arest+0.003
prise+0.003
Neg logit contributions
lee-0.004
en-0.004
stick-0.004
 requests-0.004
am-0.004
Token having
Token ID1719
Feature activation-1.96e+00

Pos logit contributions
stick+0.036
lee+0.036
 requests+0.034
en+0.034
 heads+0.034
Neg logit contributions
str-0.045
 slice-0.040
ww-0.040
Not-0.040
 happened-0.040
Token fun
Token ID1257
Feature activation+1.80e+00

Pos logit contributions
stick+0.048
lee+0.047
ishment+0.047
en+0.047
 heads+0.047
Neg logit contributions
str-0.034
 happened-0.032
ww-0.031
Not-0.031
ha-0.030
Token,
Token ID11
Feature activation+5.45e+00

Pos logit contributions
str+0.029
ww+0.026
 slice+0.026
iest+0.025
Not+0.025
Neg logit contributions
lee-0.048
stick-0.048
 requests-0.046
en-0.046
arted-0.046
Token himself
Token ID2241
Feature activation+1.82e-01

Pos logit contributions
stick+0.041
ishment+0.041
lee+0.040
aters+0.039
arted+0.039
Neg logit contributions
str-0.031
aster-0.030
iest-0.028
 slice-0.028
Not-0.028
Token.
Token ID13
Feature activation-1.83e+00

Pos logit contributions
str+0.004
 slice+0.003
ww+0.003
iest+0.003
Not+0.003
Neg logit contributions
lee-0.004
stick-0.004
en-0.004
 requests-0.004
arted-0.004
Token
Token ID198
Feature activation+1.70e+00

Pos logit contributions
en+0.041
 heads+0.041
stick+0.041
ags+0.041
lee+0.040
Neg logit contributions
str-0.035
Not-0.033
ha-0.031
ipop-0.031
ww-0.031
Token
Token ID198
Feature activation+9.07e-01

Pos logit contributions
str+0.034
 slice+0.031
ww+0.031
iest+0.030
prise+0.030
Neg logit contributions
lee-0.039
stick-0.038
 requests-0.037
en-0.037
arted-0.037
TokenBut
Token ID1537
Feature activation+4.78e+00

Pos logit contributions
str+0.016
 slice+0.014
ww+0.014
iest+0.014
 happened+0.014
Neg logit contributions
lee-0.023
stick-0.023
 requests-0.022
en-0.022
arted-0.022
Token as
Token ID355
Feature activation+8.20e+00

Pos logit contributions
str+0.097
arest+0.092
ww+0.090
prise+0.089
iest+0.088
Neg logit contributions
stick-0.111
en-0.107
lee-0.106
ishment-0.103
arted-0.102
Token he
Token ID339
Feature activation+3.70e-01

Pos logit contributions
str+0.158
 slice+0.156
arest+0.143
prise+0.133
ipop+0.132
Neg logit contributions
en-0.178
 circles-0.178
lee-0.175
 requests-0.171
sey-0.169
Token was
Token ID373
Feature activation-1.71e+00

Pos logit contributions
str+0.008
arest+0.008
 slice+0.007
ww+0.007
prise+0.007
Neg logit contributions
en-0.008
lee-0.008
stick-0.007
am-0.007
arted-0.007
Token hiding
Token ID11816
Feature activation-2.08e+00

Pos logit contributions
lee+0.035
stick+0.034
 requests+0.033
en+0.033
arted+0.033
Neg logit contributions
str-0.039
ww-0.036
 slice-0.036
iest-0.035
prise-0.035
Token,
Token ID11
Feature activation+5.96e+00

Pos logit contributions
lee+0.048
stick+0.047
en+0.047
arted+0.044
 heads+0.044
Neg logit contributions
str-0.044
ww-0.040
 slice-0.040
prise-0.039
arest-0.038
Token he
Token ID339
Feature activation-4.44e-01

Pos logit contributions
str+0.106
 slice+0.098
arest+0.095
ww+0.095
prise+0.094
Neg logit contributions
lee-0.150
stick-0.149
 circles-0.144
en-0.143
 requests-0.142
Token now
Token ID783
Feature activation+2.18e+00

Pos logit contributions
str+0.022
ww+0.021
arest+0.021
ha+0.020
 happened+0.019
Neg logit contributions
stick-0.012
en-0.011
 heads-0.011
 houses-0.011
ags-0.010
Token.
Token ID13
Feature activation-2.24e+00

Pos logit contributions
str+0.058
ww+0.053
prise+0.052
Not+0.051
arest+0.051
Neg logit contributions
lee-0.037
stick-0.037
en-0.036
 requests-0.036
 heads-0.035
Token
Token ID198
Feature activation+2.74e+00

Pos logit contributions
lee+0.028
stick+0.027
 requests+0.027
ags+0.026
 heads+0.026
Neg logit contributions
str-0.068
iest-0.064
ww-0.064
 slice-0.064
prise-0.063
Token
Token ID198
Feature activation+2.68e+00

Pos logit contributions
iest+0.065
ww+0.065
str+0.064
ipop+0.062
 slice+0.061
Neg logit contributions
lee-0.051
 requests-0.049
am-0.048
 circles-0.047
ags-0.047
TokenBut
Token ID1537
Feature activation+4.78e+00

Pos logit contributions
ipop+0.084
iest+0.078
 happened+0.073
str+0.072
 reward+0.069
Neg logit contributions
 houses-0.033
 circles-0.032
acking-0.030
am-0.030
 commands-0.029
Token when
Token ID618
Feature activation+8.19e+00

Pos logit contributions
str+0.119
iest+0.118
arest+0.116
ww+0.114
prise+0.109
Neg logit contributions
 houses-0.090
 heads-0.083
am-0.082
 circles-0.081
ishment-0.081
Token the
Token ID262
Feature activation+1.23e+00

Pos logit contributions
str+0.163
arest+0.157
ipop+0.147
iest+0.147
prise+0.147
Neg logit contributions
 houses-0.199
 circles-0.178
 heads-0.177
am-0.172
 voices-0.172
Token waiter
Token ID46612
Feature activation+2.14e+00

Pos logit contributions
str+0.031
ww+0.028
w+0.028
ha+0.027
arest+0.027
Neg logit contributions
 requests-0.024
 houses-0.023
 heads-0.023
 voices-0.023
ags-0.022
Token came
Token ID1625
Feature activation-4.10e+00

Pos logit contributions
ww+0.049
arest+0.048
str+0.048
Not+0.044
iest+0.043
Neg logit contributions
en-0.042
lee-0.041
am-0.041
stick-0.040
 requests-0.040
Token,
Token ID11
Feature activation+5.74e+00

Pos logit contributions
lee+0.090
stick+0.089
en+0.086
 requests+0.085
arted+0.084
Neg logit contributions
str-0.089
ww-0.081
 slice-0.080
iest-0.078
Not-0.077
Token he
Token ID339
Feature activation-1.25e-01

Pos logit contributions
arest+0.128
str+0.127
ww+0.122
w+0.117
prise+0.115
Neg logit contributions
 houses-0.131
 circles-0.116
 requests-0.116
 voices-0.114
 heads-0.111
Token one
Token ID530
Feature activation+3.79e+00

Pos logit contributions
lee+0.057
stick+0.057
ishment+0.055
 heads+0.054
aled+0.054
Neg logit contributions
str-0.055
iest-0.054
ha-0.051
ww-0.050
 mel-0.050
Token.
Token ID13
Feature activation-2.03e+00

Pos logit contributions
str+0.094
ww+0.087
 happened+0.086
prise+0.085
aster+0.083
Neg logit contributions
en-0.073
 circles-0.072
stick-0.070
 requests-0.066
lee-0.065
Token
Token ID198
Feature activation+2.06e+00

Pos logit contributions
en+0.048
stick+0.047
 heads+0.046
ags+0.046
ishment+0.045
Neg logit contributions
str-0.040
Not-0.037
ipop-0.035
ww-0.034
 mel-0.034
Token
Token ID198
Feature activation+1.36e+00

Pos logit contributions
str+0.041
 slice+0.038
ww+0.038
iest+0.037
prise+0.036
Neg logit contributions
lee-0.047
stick-0.046
 requests-0.045
en-0.044
arted-0.044
TokenBut
Token ID1537
Feature activation+4.68e+00

Pos logit contributions
str+0.026
iest+0.024
 slice+0.024
 happened+0.024
ipop+0.023
Neg logit contributions
stick-0.032
lee-0.031
 requests-0.031
lement-0.030
arted-0.030
Token as
Token ID355
Feature activation+8.19e+00

Pos logit contributions
str+0.095
arest+0.092
iest+0.090
ww+0.089
prise+0.088
Neg logit contributions
stick-0.106
lee-0.102
en-0.102
arted-0.099
ishment-0.099
Token she
Token ID673
Feature activation+1.11e-01

Pos logit contributions
str+0.173
 slice+0.169
arest+0.160
 coconut+0.150
prise+0.147
Neg logit contributions
 circles-0.162
lee-0.159
en-0.156
 requests-0.155
 houses-0.153
Token was
Token ID373
Feature activation-1.74e+00

Pos logit contributions
str+0.002
arest+0.002
ww+0.002
prise+0.002
iest+0.002
Neg logit contributions
en-0.002
lee-0.002
am-0.002
stick-0.002
 requests-0.002
Token about
Token ID546
Feature activation-1.04e-01

Pos logit contributions
lee+0.036
stick+0.035
 requests+0.034
en+0.034
arted+0.034
Neg logit contributions
str-0.039
ww-0.036
 slice-0.036
iest-0.036
Not-0.035
Token to
Token ID284
Feature activation-2.24e+00

Pos logit contributions
lee+0.002
stick+0.002
 requests+0.002
en+0.002
arted+0.002
Neg logit contributions
str-0.002
ww-0.002
 slice-0.002
iest-0.002
arest-0.002
Token grab
Token ID5552
Feature activation-2.55e+00

Pos logit contributions
lee+0.053
stick+0.053
en+0.051
 requests+0.050
 heads+0.049
Neg logit contributions
str-0.044
ww-0.040
 slice-0.039
iest-0.039
Not-0.039
Token behind
Token ID2157
Feature activation+9.72e-01

Pos logit contributions
lee+0.010
stick+0.010
 requests+0.009
en+0.009
arted+0.009
Neg logit contributions
str-0.007
 slice-0.007
ww-0.007
prise-0.007
iest-0.007
Token.
Token ID13
Feature activation-2.06e+00

Pos logit contributions
str+0.018
 slice+0.016
ww+0.016
iest+0.016
Not+0.016
Neg logit contributions
lee-0.024
stick-0.023
 requests-0.022
en-0.022
arted-0.022
Token
Token ID198
Feature activation+2.01e+00

Pos logit contributions
en+0.045
stick+0.044
lee+0.043
ags+0.043
 heads+0.042
Neg logit contributions
str-0.043
Not-0.042
ww-0.038
 mel-0.038
 happened-0.037
Token
Token ID198
Feature activation+1.45e+00

Pos logit contributions
str+0.039
ww+0.035
 slice+0.035
Not+0.034
prise+0.034
Neg logit contributions
lee-0.048
stick-0.047
en-0.046
 requests-0.045
arted-0.045
TokenBut
Token ID1537
Feature activation+4.96e+00

Pos logit contributions
str+0.023
 slice+0.021
iest+0.021
ww+0.020
 happened+0.020
Neg logit contributions
lee-0.039
stick-0.039
 requests-0.038
en-0.037
arted-0.037
Token as
Token ID355
Feature activation+8.19e+00

Pos logit contributions
str+0.121
ww+0.112
iest+0.111
arest+0.110
 slice+0.110
Neg logit contributions
stick-0.095
lee-0.095
arted-0.089
en-0.089
 circles-0.087
Token she
Token ID673
Feature activation+2.96e-01

Pos logit contributions
str+0.170
 slice+0.160
arest+0.150
ipop+0.144
prise+0.142
Neg logit contributions
 circles-0.178
lee-0.176
stick-0.170
 requests-0.170
en-0.165
Token ran
Token ID4966
Feature activation-3.99e+00

Pos logit contributions
str+0.006
ww+0.005
 slice+0.005
arest+0.005
iest+0.005
Neg logit contributions
lee-0.007
stick-0.007
en-0.007
 requests-0.006
 heads-0.006
Token,
Token ID11
Feature activation+5.30e+00

Pos logit contributions
stick+0.087
arted+0.083
lee+0.082
en+0.081
aled+0.080
Neg logit contributions
str-0.089
 slice-0.083
ww-0.079
prise-0.079
arest-0.077
Token she
Token ID673
Feature activation-9.23e-01

Pos logit contributions
str+0.092
 slice+0.084
ww+0.083
arest+0.081
prise+0.080
Neg logit contributions
stick-0.137
lee-0.137
 requests-0.131
arted-0.129
 circles-0.129
Token did
Token ID750
Feature activation-2.32e+00

Pos logit contributions
lee+0.025
stick+0.025
en+0.024
 requests+0.024
arted+0.024
Neg logit contributions
str-0.015
ww-0.013
 slice-0.013
iest-0.013
prise-0.013
Token away
Token ID1497
Feature activation+1.25e+00

Pos logit contributions
en+0.027
stick+0.026
lee+0.025
ags+0.024
 requests+0.024
Neg logit contributions
str-0.036
ww-0.032
prise-0.032
arest-0.032
Not-0.030
Token.
Token ID13
Feature activation-9.06e-01

Pos logit contributions
str+0.032
ww+0.029
prise+0.029
arest+0.029
Not+0.028
Neg logit contributions
stick-0.023
en-0.023
lee-0.023
 requests-0.021
 circles-0.021
Token
Token ID198
Feature activation+1.67e+00

Pos logit contributions
stick+0.019
lee+0.019
en+0.019
ags+0.019
 heads+0.019
Neg logit contributions
str-0.019
ww-0.018
Not-0.017
 slice-0.017
aster-0.017
Token
Token ID198
Feature activation+1.09e+00

Pos logit contributions
str+0.033
 slice+0.031
ww+0.030
iest+0.030
prise+0.029
Neg logit contributions
lee-0.038
stick-0.037
 requests-0.036
arted-0.036
en-0.035
TokenBut
Token ID1537
Feature activation+4.77e+00

Pos logit contributions
str+0.019
iest+0.017
 slice+0.017
 happened+0.017
ipop+0.017
Neg logit contributions
lee-0.027
stick-0.027
 requests-0.026
arted-0.026
aled-0.026
Token as
Token ID355
Feature activation+8.19e+00

Pos logit contributions
str+0.098
arest+0.094
ww+0.091
prise+0.090
iest+0.090
Neg logit contributions
stick-0.108
en-0.105
lee-0.104
ishment-0.100
arted-0.099
Token he
Token ID339
Feature activation+5.19e-01

Pos logit contributions
str+0.167
 slice+0.162
arest+0.153
prise+0.143
ipop+0.140
Neg logit contributions
en-0.169
 circles-0.164
 requests-0.163
lee-0.162
sey-0.158
Token bit
Token ID1643
Feature activation+1.14e+00

Pos logit contributions
str+0.012
arest+0.011
iest+0.011
prise+0.011
ww+0.011
Neg logit contributions
en-0.010
 requests-0.010
lee-0.010
am-0.010
arted-0.009
Token into
Token ID656
Feature activation-2.71e-01

Pos logit contributions
str+0.027
ww+0.024
 slice+0.024
prise+0.024
iest+0.024
Neg logit contributions
lee-0.023
stick-0.023
en-0.022
 requests-0.022
arted-0.022
Token the
Token ID262
Feature activation-1.24e+00

Pos logit contributions
lee+0.006
stick+0.006
 requests+0.005
en+0.005
 heads+0.005
Neg logit contributions
str-0.006
ww-0.006
prise-0.006
arest-0.005
Not-0.005
Token peach
Token ID47565
Feature activation+1.54e+00

Pos logit contributions
stick+0.026
 heads+0.026
 requests+0.026
lee+0.025
 circles+0.025
Neg logit contributions
str-0.027
iest-0.024
arest-0.024
ww-0.024
ha-0.024
Token.
Token ID13
Feature activation-1.87e+00

Pos logit contributions
str+0.001
 slice+0.001
ww+0.001
iest+0.001
prise+0.001
Neg logit contributions
lee-0.001
stick-0.001
arted-0.001
aled-0.001
ags-0.001
Token
Token ID220
Feature activation-2.23e+00

Pos logit contributions
stick+0.038
lee+0.038
en+0.037
 requests+0.036
ags+0.035
Neg logit contributions
str-0.043
ww-0.039
 slice-0.039
Not-0.039
aster-0.038
Token
Token ID198
Feature activation+1.85e+00

Pos logit contributions
lee+0.054
stick+0.054
en+0.052
 requests+0.052
 heads+0.051
Neg logit contributions
str-0.042
 slice-0.037
ww-0.037
Not-0.036
prise-0.035
Token
Token ID198
Feature activation+1.41e+00

Pos logit contributions
str+0.040
ww+0.038
 slice+0.038
iest+0.038
prise+0.037
Neg logit contributions
lee-0.038
stick-0.037
 requests-0.036
arted-0.035
ags-0.034
TokenBut
Token ID1537
Feature activation+4.41e+00

Pos logit contributions
ipop+0.029
 happened+0.029
str+0.028
iest+0.028
 slice+0.028
Neg logit contributions
stick-0.028
 commands-0.027
 requests-0.027
aled-0.027
 circles-0.027
Token as
Token ID355
Feature activation+8.19e+00

Pos logit contributions
arest+0.100
prise+0.099
ww+0.094
iest+0.094
str+0.093
Neg logit contributions
stick-0.087
en-0.085
ishment-0.082
lee-0.081
am-0.080
Token he
Token ID339
Feature activation+2.69e-02

Pos logit contributions
 slice+0.188
arest+0.180
str+0.178
prise+0.173
ipop+0.173
Neg logit contributions
en-0.128
 houses-0.127
 circles-0.124
 shadows-0.120
 requests-0.119
Token was
Token ID373
Feature activation-1.51e+00

Pos logit contributions
str+0.001
arest+0.001
iest+0.001
prise+0.001
 slice+0.001
Neg logit contributions
en-0.001
lee-0.001
am-0.001
 requests-0.001
stick-0.001
Token walking
Token ID6155
Feature activation-3.61e+00

Pos logit contributions
lee+0.031
stick+0.030
 requests+0.029
en+0.029
arted+0.029
Neg logit contributions
str-0.034
ww-0.032
 slice-0.032
iest-0.032
Not-0.031
Token back
Token ID736
Feature activation-5.31e-01

Pos logit contributions
stick+0.067
lee+0.062
arted+0.062
en+0.061
ags+0.061
Neg logit contributions
str-0.092
ww-0.085
 slice-0.085
prise-0.084
 happened-0.083
Token,
Token ID11
Feature activation+5.72e+00

Pos logit contributions
lee+0.010
stick+0.010
en+0.009
 requests+0.009
arted+0.009
Neg logit contributions
str-0.013
 slice-0.012
ww-0.012
iest-0.012
prise-0.012
Token line
Token ID1627
Feature activation-1.67e+00

Pos logit contributions
lee+0.029
stick+0.029
arted+0.028
ags+0.027
 requests+0.027
Neg logit contributions
str-0.026
 slice-0.024
iest-0.023
ww-0.023
aster-0.023
Token.
Token ID13
Feature activation-2.07e+00

Pos logit contributions
lee+0.037
stick+0.036
 requests+0.035
en+0.035
arted+0.035
Neg logit contributions
str-0.034
 slice-0.031
ww-0.031
iest-0.031
Not-0.030
Token
Token ID198
Feature activation+1.42e+00

Pos logit contributions
lee+0.045
stick+0.045
en+0.044
 requests+0.043
 heads+0.043
Neg logit contributions
str-0.044
ww-0.039
 slice-0.039
Not-0.039
 happened-0.038
Token
Token ID198
Feature activation+1.10e+00

Pos logit contributions
str+0.032
ww+0.031
iest+0.031
 slice+0.031
ipop+0.030
Neg logit contributions
lee-0.028
stick-0.027
 requests-0.026
 circles-0.026
en-0.025
TokenBut
Token ID1537
Feature activation+4.69e+00

Pos logit contributions
ipop+0.023
iest+0.022
str+0.021
 slice+0.021
 happened+0.020
Neg logit contributions
stick-0.023
lee-0.023
 requests-0.023
 commands-0.022
 circles-0.022
Token as
Token ID355
Feature activation+8.17e+00

Pos logit contributions
str+0.106
prise+0.105
arest+0.104
iest+0.102
 slice+0.101
Neg logit contributions
stick-0.094
en-0.092
lee-0.092
ishment-0.089
arted-0.087
Token they
Token ID484
Feature activation+6.72e-01

Pos logit contributions
ipop+0.164
str+0.164
 slice+0.160
arest+0.152
prise+0.150
Neg logit contributions
lee-0.166
en-0.161
 houses-0.161
 circles-0.160
ishment-0.152
Token crossed
Token ID12606
Feature activation-2.96e+00

Pos logit contributions
str+0.013
ww+0.013
 slice+0.013
prise+0.012
arest+0.012
Neg logit contributions
lee-0.015
arted-0.015
aled-0.015
en-0.014
am-0.014
Token the
Token ID262
Feature activation-2.27e+00

Pos logit contributions
en+0.065
lee+0.065
stick+0.065
 circles+0.062
arted+0.062
Neg logit contributions
str-0.062
ww-0.059
 slice-0.058
prise-0.057
arest-0.056
Token finish
Token ID5461
Feature activation-6.58e-01

Pos logit contributions
lee+0.054
stick+0.053
en+0.051
 requests+0.051
ishment+0.050
Neg logit contributions
str-0.045
ww-0.041
 slice-0.041
Not-0.040
aster-0.039
Token line
Token ID1627
Feature activation-1.36e+00

Pos logit contributions
stick+0.015
lee+0.015
arted+0.014
 requests+0.014
ags+0.014
Neg logit contributions
str-0.013
 slice-0.012
iest-0.012
ww-0.012
aster-0.012
Token.
Token ID13
Feature activation-1.86e+00

Pos logit contributions
stick+0.017
lee+0.017
en+0.016
 requests+0.016
arted+0.016
Neg logit contributions
str-0.016
ww-0.015
 slice-0.015
prise-0.014
Not-0.014
Token
Token ID220
Feature activation-2.14e+00

Pos logit contributions
ags+0.038
en+0.038
stick+0.038
lee+0.037
 heads+0.037
Neg logit contributions
str-0.041
Not-0.038
 mel-0.037
ww-0.036
 slice-0.036
Token
Token ID198
Feature activation+1.91e+00

Pos logit contributions
lee+0.053
stick+0.053
en+0.051
 heads+0.050
 requests+0.050
Neg logit contributions
str-0.039
 slice-0.034
ww-0.034
Not-0.033
prise-0.033
Token
Token ID198
Feature activation+1.07e+00

Pos logit contributions
str+0.039
 slice+0.036
ww+0.036
iest+0.035
prise+0.035
Neg logit contributions
lee-0.043
stick-0.042
 requests-0.041
en-0.040
arted-0.040
TokenBut
Token ID1537
Feature activation+4.57e+00

Pos logit contributions
str+0.019
 slice+0.018
 happened+0.018
iest+0.018
ww+0.017
Neg logit contributions
lee-0.026
stick-0.026
 requests-0.025
arted-0.025
en-0.025
Token as
Token ID355
Feature activation+8.16e+00

Pos logit contributions
str+0.094
ww+0.087
arest+0.086
prise+0.086
iest+0.084
Neg logit contributions
stick-0.106
en-0.103
lee-0.103
ishment-0.101
 heads-0.099
Token the
Token ID262
Feature activation+1.56e+00

Pos logit contributions
str+0.174
 slice+0.166
arest+0.158
ipop+0.151
prise+0.150
Neg logit contributions
 circles-0.161
en-0.158
lee-0.152
sey-0.152
 requests-0.151
Token mouse
Token ID10211
Feature activation+1.49e-01

Pos logit contributions
str+0.035
ww+0.030
Not+0.029
iest+0.029
prise+0.029
Neg logit contributions
 requests-0.033
 heads-0.033
stick-0.033
lee-0.032
 circles-0.032
Token was
Token ID373
Feature activation-1.52e+00

Pos logit contributions
str+0.003
ww+0.003
 slice+0.003
prise+0.003
arest+0.003
Neg logit contributions
lee-0.003
stick-0.003
en-0.003
 requests-0.003
arted-0.003
Token wrapping
Token ID27074
Feature activation-2.62e+00

Pos logit contributions
lee+0.031
stick+0.030
 requests+0.029
en+0.029
arted+0.029
Neg logit contributions
str-0.035
ww-0.032
 slice-0.032
iest-0.031
Not-0.031
Token the
Token ID262
Feature activation-1.20e+00

Pos logit contributions
stick+0.058
lee+0.058
en+0.056
 heads+0.055
 requests+0.055
Neg logit contributions
str-0.055
ww-0.051
 slice-0.050
prise-0.049
Not-0.048
Token out
Token ID503
Feature activation+1.02e+00

Pos logit contributions
lee+0.049
stick+0.049
en+0.047
 requests+0.046
arted+0.045
Neg logit contributions
str-0.059
ww-0.054
 slice-0.053
prise-0.052
Not-0.051
Token.
Token ID13
Feature activation-1.74e+00

Pos logit contributions
str+0.022
ww+0.020
 slice+0.020
prise+0.019
arest+0.019
Neg logit contributions
lee-0.023
stick-0.022
en-0.022
 requests-0.021
arted-0.021
Token
Token ID198
Feature activation+1.56e+00

Pos logit contributions
en+0.035
stick+0.035
 heads+0.034
ags+0.034
lee+0.034
Neg logit contributions
str-0.039
Not-0.037
ww-0.035
 slice-0.035
ipop-0.034
Token
Token ID198
Feature activation+1.05e+00

Pos logit contributions
str+0.032
 slice+0.030
ww+0.030
iest+0.029
prise+0.028
Neg logit contributions
lee-0.035
stick-0.034
 requests-0.033
en-0.033
arted-0.033
TokenBut
Token ID1537
Feature activation+4.92e+00

Pos logit contributions
str+0.019
 slice+0.018
 happened+0.018
iest+0.018
ipop+0.017
Neg logit contributions
lee-0.025
stick-0.025
 requests-0.024
arted-0.024
lement-0.024
Token as
Token ID355
Feature activation+8.11e+00

Pos logit contributions
str+0.106
arest+0.101
ww+0.097
iest+0.097
prise+0.094
Neg logit contributions
stick-0.110
en-0.105
lee-0.104
ishment-0.101
 heads-0.100
Token she
Token ID673
Feature activation+4.36e-01

Pos logit contributions
str+0.177
 slice+0.168
arest+0.161
 coconut+0.153
prise+0.146
Neg logit contributions
en-0.156
 circles-0.154
lee-0.153
 requests-0.151
lement-0.144
Token was
Token ID373
Feature activation-1.56e+00

Pos logit contributions
str+0.009
prise+0.009
ww+0.009
 slice+0.008
arest+0.008
Neg logit contributions
lee-0.009
en-0.009
am-0.009
 requests-0.009
stick-0.009
Token taking
Token ID2263
Feature activation-3.39e+00

Pos logit contributions
lee+0.030
stick+0.029
 requests+0.028
en+0.028
arted+0.028
Neg logit contributions
str-0.037
ww-0.035
 slice-0.035
iest-0.034
prise-0.034
Token pictures
Token ID5986
Feature activation+1.32e+00

Pos logit contributions
lee+0.086
stick+0.085
en+0.082
 requests+0.082
arted+0.081
Neg logit contributions
str-0.060
ww-0.054
 slice-0.054
iest-0.051
prise-0.051
Token,
Token ID11
Feature activation+5.35e+00

Pos logit contributions
str+0.021
ww+0.019
 slice+0.019
prise+0.018
aster+0.017
Neg logit contributions
stick-0.037
lee-0.036
en-0.035
 requests-0.034
arted-0.034
Token inside
Token ID2641
Feature activation+1.06e+00

Pos logit contributions
lee+0.043
stick+0.042
 requests+0.041
en+0.040
arted+0.040
Neg logit contributions
str-0.043
ww-0.040
 slice-0.039
iest-0.038
prise-0.038
Token.
Token ID13
Feature activation-7.18e-01

Pos logit contributions
str+0.028
ww+0.026
arest+0.025
ha+0.025
w+0.024
Neg logit contributions
en-0.018
lee-0.018
stick-0.018
 requests-0.018
 circles-0.018
Token
Token ID198
Feature activation+1.60e+00

Pos logit contributions
stick+0.012
lee+0.012
en+0.011
ags+0.011
 heads+0.011
Neg logit contributions
str-0.019
ww-0.017
 slice-0.017
Not-0.017
aster-0.017
Token
Token ID198
Feature activation+1.31e+00

Pos logit contributions
str+0.031
 slice+0.029
ww+0.028
iest+0.028
prise+0.027
Neg logit contributions
lee-0.037
stick-0.037
 requests-0.036
en-0.035
arted-0.035
TokenBut
Token ID1537
Feature activation+4.81e+00

Pos logit contributions
str+0.022
iest+0.020
 slice+0.020
 happened+0.020
ipop+0.019
Neg logit contributions
lee-0.034
stick-0.033
 requests-0.033
arted-0.032
 circles-0.032
Token as
Token ID355
Feature activation+8.10e+00

Pos logit contributions
str+0.104
arest+0.097
ww+0.094
iest+0.094
prise+0.090
Neg logit contributions
stick-0.108
en-0.104
lee-0.102
ishment-0.101
 heads-0.100
Token she
Token ID673
Feature activation+2.15e-01

Pos logit contributions
str+0.172
 slice+0.158
arest+0.153
ipop+0.143
prise+0.142
Neg logit contributions
 circles-0.165
en-0.165
 requests-0.161
lee-0.159
stick-0.158
Token looked
Token ID3114
Feature activation-2.39e+00

Pos logit contributions
str+0.005
arest+0.005
ww+0.005
iest+0.005
prise+0.004
Neg logit contributions
lee-0.004
en-0.004
 requests-0.004
stick-0.004
 heads-0.004
Token around
Token ID1088
Feature activation+2.92e-01

Pos logit contributions
lee+0.028
en+0.026
arted+0.025
 requests+0.025
stick+0.024
Neg logit contributions
str-0.079
ww-0.074
 slice-0.073
prise-0.072
arest-0.071
Token,
Token ID11
Feature activation+5.03e+00

Pos logit contributions
str+0.006
ww+0.006
arest+0.005
 slice+0.005
prise+0.005
Neg logit contributions
lee-0.007
en-0.007
stick-0.006
 requests-0.006
 circles-0.006
Token she
Token ID673
Feature activation-1.21e+00

Pos logit contributions
str+0.090
 slice+0.083
ww+0.081
arest+0.080
prise+0.078
Neg logit contributions
lee-0.127
stick-0.126
 requests-0.122
en-0.121
 circles-0.120
Token."
Token ID526
Feature activation-2.21e+00

Pos logit contributions
str+0.017
ww+0.016
 slice+0.015
iest+0.015
ha+0.015
Neg logit contributions
lee-0.032
stick-0.031
arted-0.030
 requests-0.030
aled-0.030
Token
Token ID220
Feature activation-1.89e+00

Pos logit contributions
en+0.051
lee+0.050
stick+0.049
 heads+0.049
arted+0.049
Neg logit contributions
str-0.045
 mel-0.039
 slice-0.039
Not-0.039
 happened-0.038
Token
Token ID198
Feature activation+1.95e+00

Pos logit contributions
lee+0.045
stick+0.045
en+0.044
 requests+0.043
 heads+0.043
Neg logit contributions
str-0.036
 slice-0.032
ww-0.032
Not-0.031
iest-0.030
Token
Token ID198
Feature activation+1.50e+00

Pos logit contributions
str+0.042
 slice+0.039
ww+0.039
iest+0.039
prise+0.038
Neg logit contributions
lee-0.041
stick-0.041
 requests-0.039
en-0.038
arted-0.038
TokenBut
Token ID1537
Feature activation+4.26e+00

Pos logit contributions
str+0.027
ipop+0.027
iest+0.026
 slice+0.026
prise+0.025
Neg logit contributions
stick-0.036
lee-0.035
 requests-0.034
en-0.033
am-0.033
Token as
Token ID355
Feature activation+8.08e+00

Pos logit contributions
str+0.093
arest+0.091
prise+0.090
ww+0.088
iest+0.087
Neg logit contributions
stick-0.092
lee-0.088
en-0.085
ishment-0.084
arted-0.082
Token they
Token ID484
Feature activation+4.11e-01

Pos logit contributions
 slice+0.175
str+0.170
arest+0.166
prise+0.159
ipop+0.157
Neg logit contributions
lee-0.149
en-0.143
 houses-0.141
 circles-0.138
am-0.138
Token were
Token ID547
Feature activation-1.76e+00

Pos logit contributions
str+0.009
ww+0.008
 slice+0.008
iest+0.008
arest+0.008
Neg logit contributions
am-0.009
arted-0.009
aled-0.009
lee-0.009
en-0.009
Token walking
Token ID6155
Feature activation-3.57e+00

Pos logit contributions
lee+0.035
stick+0.035
 requests+0.034
en+0.033
arted+0.033
Neg logit contributions
str-0.041
 slice-0.038
ww-0.037
iest-0.037
prise-0.036
Token back
Token ID736
Feature activation-4.00e-01

Pos logit contributions
stick+0.079
lee+0.077
arted+0.076
ags+0.076
en+0.075
Neg logit contributions
str-0.076
 slice-0.070
ww-0.068
prise-0.068
arest-0.066
Token,
Token ID11
Feature activation+6.45e+00

Pos logit contributions
lee+0.007
stick+0.007
en+0.007
arted+0.007
 requests+0.007
Neg logit contributions
str-0.010
 slice-0.009
ww-0.009
arest-0.009
iest-0.009
Token honey
Token ID12498
Feature activation+1.45e+00

Pos logit contributions
 requests+0.006
lee+0.006
 circles+0.006
stick+0.006
 houses+0.006
Neg logit contributions
str-0.010
ww-0.009
prise-0.009
arest-0.009
Not-0.008
Token.
Token ID13
Feature activation-1.95e+00

Pos logit contributions
str+0.035
ww+0.032
Not+0.031
prise+0.031
 slice+0.031
Neg logit contributions
stick-0.029
lee-0.028
en-0.027
 requests-0.026
 heads-0.026
Token
Token ID198
Feature activation+1.84e+00

Pos logit contributions
en+0.042
stick+0.041
lee+0.041
 heads+0.040
ags+0.040
Neg logit contributions
str-0.042
Not-0.037
ww-0.037
 slice-0.037
 mel-0.036
Token
Token ID198
Feature activation+1.17e+00

Pos logit contributions
str+0.039
 slice+0.037
ww+0.037
iest+0.037
prise+0.036
Neg logit contributions
lee-0.040
stick-0.038
 requests-0.037
arted-0.036
en-0.036
TokenBut
Token ID1537
Feature activation+4.65e+00

Pos logit contributions
ipop+0.023
iest+0.023
str+0.022
 slice+0.022
prise+0.021
Neg logit contributions
stick-0.026
lee-0.026
arted-0.025
 requests-0.025
lement-0.024
Token as
Token ID355
Feature activation+8.07e+00

Pos logit contributions
str+0.098
prise+0.094
iest+0.093
ww+0.093
arest+0.093
Neg logit contributions
stick-0.103
lee-0.100
en-0.098
ishment-0.097
arted-0.096
Token they
Token ID484
Feature activation+5.83e-01

Pos logit contributions
 slice+0.154
str+0.153
ipop+0.148
prise+0.145
arest+0.144
Neg logit contributions
lee-0.177
en-0.172
 houses-0.169
 circles-0.168
ishment-0.165
Token were
Token ID547
Feature activation-1.45e+00

Pos logit contributions
str+0.012
ww+0.012
arest+0.012
iest+0.011
prise+0.011
Neg logit contributions
lee-0.012
en-0.012
arted-0.012
am-0.012
 heads-0.012
Token eating
Token ID6600
Feature activation-9.32e-01

Pos logit contributions
lee+0.030
stick+0.029
 requests+0.028
en+0.028
arted+0.028
Neg logit contributions
str-0.033
ww-0.031
 slice-0.030
iest-0.030
Not-0.030
Token,
Token ID11
Feature activation+6.17e+00

Pos logit contributions
 houses+0.017
 circles+0.017
 heads+0.017
en+0.017
ags+0.017
Neg logit contributions
str-0.023
arest-0.022
ww-0.020
prise-0.020
ha-0.020
Token a
Token ID257
Feature activation+2.20e+00

Pos logit contributions
str+0.125
 slice+0.116
prise+0.115
ww+0.114
arest+0.114
Neg logit contributions
lee-0.140
stick-0.138
en-0.132
 requests-0.131
 houses-0.130
Token branch
Token ID8478
Feature activation+2.13e-01

Pos logit contributions
stick+0.004
lee+0.004
en+0.004
 requests+0.004
arted+0.004
Neg logit contributions
str-0.004
ww-0.004
 slice-0.004
iest-0.004
prise-0.004
Token.
Token ID13
Feature activation-1.59e+00

Pos logit contributions
str+0.005
ww+0.005
prise+0.005
 slice+0.004
arest+0.004
Neg logit contributions
stick-0.004
lee-0.004
en-0.004
arted-0.004
 requests-0.004
Token
Token ID198
Feature activation+1.47e+00

Pos logit contributions
 heads+0.031
lee+0.030
stick+0.030
en+0.030
ags+0.030
Neg logit contributions
str-0.037
Not-0.035
ww-0.034
 slice-0.034
aster-0.033
Token
Token ID198
Feature activation+9.20e-01

Pos logit contributions
str+0.030
 slice+0.027
ww+0.027
iest+0.027
prise+0.026
Neg logit contributions
lee-0.033
stick-0.032
 requests-0.031
en-0.031
arted-0.031
TokenBut
Token ID1537
Feature activation+4.51e+00

Pos logit contributions
str+0.016
 happened+0.015
 slice+0.015
ipop+0.015
iest+0.015
Neg logit contributions
stick-0.023
lee-0.022
 requests-0.022
 circles-0.021
arted-0.021
Token as
Token ID355
Feature activation+8.07e+00

Pos logit contributions
str+0.097
arest+0.091
ww+0.090
iest+0.089
prise+0.087
Neg logit contributions
stick-0.099
en-0.097
lee-0.093
ishment-0.092
 heads-0.091
Token the
Token ID262
Feature activation+1.31e+00

Pos logit contributions
str+0.182
 slice+0.176
ipop+0.168
arest+0.164
prise+0.157
Neg logit contributions
 circles-0.149
en-0.146
lee-0.140
 requests-0.138
 houses-0.137
Token muff
Token ID27563
Feature activation+1.54e+00

Pos logit contributions
str+0.028
prise+0.026
 slice+0.026
arest+0.025
ww+0.025
Neg logit contributions
 requests-0.026
 heads-0.026
stick-0.026
lee-0.026
 circles-0.025
Tokenin
Token ID259
Feature activation-5.05e-01

Pos logit contributions
str+0.029
Not+0.027
 reward+0.027
 slice+0.027
ww+0.026
Neg logit contributions
am-0.034
lee-0.034
en-0.034
ags-0.033
stick-0.032
Token sat
Token ID3332
Feature activation-2.95e+00

Pos logit contributions
stick+0.012
lee+0.011
en+0.011
 requests+0.011
am+0.011
Neg logit contributions
str-0.010
prise-0.009
arest-0.009
 slice-0.009
ww-0.009
Token on
Token ID319
Feature activation+7.79e-01

Pos logit contributions
lee+0.063
stick+0.063
en+0.060
 requests+0.059
arted+0.059
Neg logit contributions
str-0.064
 slice-0.059
ww-0.058
prise-0.057
Not-0.056
Token.
Token ID13
Feature activation-2.77e+00

Pos logit contributions
str+0.021
ww+0.019
 slice+0.019
prise+0.019
Not+0.018
Neg logit contributions
lee-0.018
stick-0.017
en-0.017
 requests-0.016
arted-0.016
Token He
Token ID679
Feature activation-1.35e+00

Pos logit contributions
stick+0.051
en+0.050
lee+0.049
ags+0.047
lement+0.047
Neg logit contributions
str-0.065
 slice-0.064
Not-0.063
 happened-0.062
ipop-0.061
Token jumped
Token ID11687
Feature activation-5.87e+00

Pos logit contributions
lee+0.032
stick+0.032
 requests+0.030
en+0.030
arted+0.030
Neg logit contributions
str-0.026
ww-0.024
 slice-0.024
iest-0.023
prise-0.023
Token in
Token ID287
Feature activation-9.48e-01

Pos logit contributions
lee+0.106
stick+0.105
en+0.101
arted+0.100
 requests+0.098
Neg logit contributions
str-0.150
ww-0.137
 slice-0.136
prise-0.135
iest-0.133
Token and
Token ID290
Feature activation-1.44e+00

Pos logit contributions
lee+0.019
stick+0.019
en+0.018
 requests+0.018
arted+0.018
Neg logit contributions
str-0.023
ww-0.021
 slice-0.020
iest-0.020
arest-0.020
Token sw
Token ID1509
Feature activation-9.36e+00

Pos logit contributions
lee+0.033
stick+0.033
arted+0.031
en+0.031
 heads+0.030
Neg logit contributions
str-0.029
iest-0.027
ww-0.027
 slice-0.027
prise-0.027
Tokenam
Token ID321
Feature activation-4.72e+00

Pos logit contributions
stick+0.096
 robots+0.093
 circles+0.092
 requests+0.087
 languages+0.085
Neg logit contributions
w-0.283
str-0.278
 coconut-0.273
 slice-0.269
ww-0.267
Token down
Token ID866
Feature activation-3.40e-01

Pos logit contributions
lee+0.113
stick+0.112
 requests+0.108
en+0.108
arted+0.106
Neg logit contributions
str-0.091
ww-0.082
 slice-0.082
iest-0.079
Not-0.079
Token and
Token ID290
Feature activation-1.31e+00

Pos logit contributions
lee+0.008
stick+0.008
en+0.008
 requests+0.008
arted+0.008
Neg logit contributions
str-0.007
ww-0.006
 slice-0.006
prise-0.006
iest-0.006
Token saw
Token ID2497
Feature activation+5.66e-01

Pos logit contributions
lee+0.031
stick+0.031
en+0.030
 requests+0.030
arted+0.029
Neg logit contributions
str-0.025
ww-0.023
 slice-0.023
iest-0.022
prise-0.022
Token a
Token ID257
Feature activation+1.04e+00

Pos logit contributions
str+0.010
ww+0.009
 slice+0.008
prise+0.008
Not+0.008
Neg logit contributions
lee-0.015
stick-0.015
en-0.014
 requests-0.014
arted-0.014
Token.
Token ID13
Feature activation-2.61e+00

Pos logit contributions
stick+0.045
lee+0.044
en+0.044
 requests+0.042
 heads+0.042
Neg logit contributions
str-0.050
ww-0.046
 slice-0.046
aster-0.044
prise-0.044
Token She
Token ID1375
Feature activation-1.25e+00

Pos logit contributions
en+0.051
ags+0.051
ishment+0.050
lee+0.049
 heads+0.049
Neg logit contributions
Not-0.060
str-0.053
 mel-0.050
t-0.049
aster-0.049
Token sw
Token ID1509
Feature activation-8.33e+00

Pos logit contributions
stick+0.031
lee+0.031
 requests+0.030
ags+0.030
en+0.030
Neg logit contributions
str-0.022
ww-0.020
 slice-0.020
iest-0.019
 happened-0.019
Tokenam
Token ID321
Feature activation-5.24e+00

Pos logit contributions
stick+0.104
lement+0.103
 circles+0.101
 requests+0.096
 mistakes+0.089
Neg logit contributions
 bait-0.214
w-0.212
 coconut-0.208
str-0.207
ww-0.202
Token and
Token ID290
Feature activation-1.85e+00

Pos logit contributions
lee+0.096
stick+0.096
en+0.091
 requests+0.090
arted+0.089
Neg logit contributions
str-0.130
ww-0.120
 slice-0.120
prise-0.117
iest-0.116
Token sw
Token ID1509
Feature activation-9.19e+00

Pos logit contributions
stick+0.044
ags+0.044
lee+0.043
 requests+0.043
lement+0.042
Neg logit contributions
str-0.034
ww-0.032
ha-0.031
 slice-0.031
Not-0.031
Tokenam
Token ID321
Feature activation-5.13e+00

Pos logit contributions
lement+0.101
 requests+0.100
stick+0.098
 circles+0.098
 mistakes+0.095
Neg logit contributions
w-0.256
 bait-0.253
ww-0.253
str-0.251
iest-0.243
Token,
Token ID11
Feature activation+1.03e+00

Pos logit contributions
lee+0.121
stick+0.120
 requests+0.115
en+0.114
arted+0.113
Neg logit contributions
str-0.099
ww-0.090
 slice-0.089
iest-0.089
Not-0.089
Token until
Token ID1566
Feature activation+4.48e+00

Pos logit contributions
str+0.018
Not+0.016
 slice+0.015
 mel+0.015
ww+0.014
Neg logit contributions
ags-0.026
lee-0.026
lement-0.026
stick-0.026
 requests-0.025
Token she
Token ID673
Feature activation+1.07e-01

Pos logit contributions
str+0.078
 slice+0.076
prise+0.073
ww+0.073
arest+0.070
Neg logit contributions
lee-0.113
stick-0.109
arted-0.108
en-0.107
aled-0.106
Token found
Token ID1043
Feature activation-1.65e+00

Pos logit contributions
str+0.002
ww+0.002
 slice+0.002
prise+0.002
arest+0.002
Neg logit contributions
lee-0.003
en-0.003
am-0.003
stick-0.003
 requests-0.003
Token
Token ID198
Feature activation+1.31e+00

Pos logit contributions
str+0.053
 slice+0.052
ww+0.050
iest+0.050
prise+0.050
Neg logit contributions
lee-0.045
stick-0.044
 requests-0.044
en-0.043
lement-0.042
TokenThey
Token ID2990
Feature activation+1.95e-01

Pos logit contributions
str+0.020
ww+0.017
 slice+0.017
Not+0.017
prise+0.016
Neg logit contributions
lee-0.037
stick-0.037
en-0.036
 requests-0.036
arted-0.035
Token sw
Token ID1509
Feature activation-8.65e+00

Pos logit contributions
str+0.004
ww+0.004
iest+0.004
 slice+0.004
arest+0.004
Neg logit contributions
lee-0.004
stick-0.004
arted-0.004
en-0.004
 heads-0.004
Tokenam
Token ID321
Feature activation-5.23e+00

Pos logit contributions
 circles+0.103
stick+0.091
 requests+0.090
lement+0.086
 stages+0.085
Neg logit contributions
w-0.259
str-0.254
ww-0.246
 slice-0.237
 bait-0.236
Token and
Token ID290
Feature activation-1.40e+00

Pos logit contributions
stick+0.095
lee+0.094
arted+0.091
en+0.090
 heads+0.090
Neg logit contributions
str-0.131
ww-0.121
 slice-0.120
prise-0.119
aster-0.118
Token sw
Token ID1509
Feature activation-9.18e+00

Pos logit contributions
lee+0.031
stick+0.031
 requests+0.029
en+0.029
ags+0.029
Neg logit contributions
str-0.029
ww-0.027
 slice-0.027
Not-0.026
ha-0.026
Tokenam
Token ID321
Feature activation-5.07e+00

Pos logit contributions
 circles+0.084
 requests+0.082
stick+0.079
lee+0.078
lement+0.076
Neg logit contributions
str-0.293
w-0.292
ww-0.289
 slice-0.276
 bait-0.273
Token until
Token ID1566
Feature activation+4.55e+00

Pos logit contributions
lee+0.115
stick+0.114
en+0.110
 requests+0.109
arted+0.108
Neg logit contributions
str-0.104
ww-0.094
 slice-0.094
prise-0.091
iest-0.090
Token they
Token ID484
Feature activation+6.57e-01

Pos logit contributions
prise+0.072
str+0.072
 slice+0.070
arest+0.069
ww+0.068
Neg logit contributions
lee-0.120
arted-0.114
stick-0.112
en-0.111
 houses-0.110
Token were
Token ID547
Feature activation-2.09e+00

Pos logit contributions
ww+0.014
prise+0.013
arest+0.013
iest+0.013
 slice+0.013
Neg logit contributions
lee-0.014
am-0.014
arted-0.013
en-0.013
aled-0.012
Token tired
Token ID10032
Feature activation+1.80e+00

Pos logit contributions
stick+0.046
lee+0.046
en+0.044
 requests+0.044
 heads+0.043
Neg logit contributions
str-0.044
 slice-0.040
ww-0.040
Not-0.038
prise-0.038
Token,
Token ID11
Feature activation+5.22e+00

Pos logit contributions
arest+0.134
str+0.131
prise+0.127
iest+0.121
 slice+0.119
Neg logit contributions
lee-0.095
am-0.093
stick-0.091
 houses-0.091
en-0.090
Token he
Token ID339
Feature activation-7.15e-01

Pos logit contributions
arest+0.154
str+0.152
prise+0.149
ww+0.142
ha+0.141
Neg logit contributions
 houses-0.072
stick-0.069
lee-0.063
 circles-0.063
am-0.060
Token leaned
Token ID23831
Feature activation-4.92e+00

Pos logit contributions
stick+0.017
lee+0.017
ags+0.017
en+0.017
 requests+0.016
Neg logit contributions
str-0.013
ww-0.012
 slice-0.012
 happened-0.012
ha-0.012
Token over
Token ID625
Feature activation-1.68e-01

Pos logit contributions
stick+0.091
lee+0.090
arted+0.086
en+0.086
ags+0.084
Neg logit contributions
str-0.122
ww-0.113
 slice-0.111
prise-0.110
iest-0.109
Token and
Token ID290
Feature activation-1.42e+00

Pos logit contributions
lee+0.003
stick+0.003
en+0.003
 requests+0.003
 circles+0.003
Neg logit contributions
str-0.004
ww-0.003
arest-0.003
prise-0.003
iest-0.003
Token sw
Token ID1509
Feature activation-9.17e+00

Pos logit contributions
lee+0.031
stick+0.031
 requests+0.030
en+0.030
arted+0.029
Neg logit contributions
str-0.030
iest-0.028
ww-0.027
 slice-0.027
prise-0.027
Tokenatted
Token ID16898
Feature activation-3.30e+00

Pos logit contributions
 circles+0.085
stick+0.080
lee+0.077
 requests+0.076
 languages+0.074
Neg logit contributions
w-0.295
str-0.285
ww-0.282
 slice-0.280
ipop-0.280
Token the
Token ID262
Feature activation+3.27e-02

Pos logit contributions
lee+0.077
stick+0.077
 requests+0.074
en+0.073
arted+0.073
Neg logit contributions
str-0.065
ww-0.059
 slice-0.059
iest-0.057
Not-0.057
Token mosquito
Token ID35446
Feature activation-1.35e+00

Pos logit contributions
str+0.000
prise+0.000
ww+0.000
 slice+0.000
iest+0.000
Neg logit contributions
 heads-0.001
stick-0.001
 requests-0.001
ags-0.001
 circles-0.001
Token with
Token ID351
Feature activation+2.08e-01

Pos logit contributions
lee+0.033
ags+0.030
lement+0.030
ishment+0.030
arted+0.029
Neg logit contributions
 slice-0.025
str-0.025
 bite-0.024
Not-0.023
ww-0.023
Token his
Token ID465
Feature activation-7.88e-01

Pos logit contributions
str+0.005
 slice+0.004
ww+0.004
iest+0.004
Not+0.004
Neg logit contributions
lee-0.004
stick-0.004
en-0.004
 requests-0.004
arted-0.004
Token
Token ID198
Feature activation+1.07e+00

Pos logit contributions
str+0.039
ww+0.037
 slice+0.036
prise+0.035
iest+0.035
Neg logit contributions
lee-0.038
stick-0.037
 requests-0.037
en-0.036
 circles-0.035
TokenHe
Token ID1544
Feature activation-7.71e-01

Pos logit contributions
str+0.018
ww+0.016
 slice+0.016
iest+0.015
prise+0.015
Neg logit contributions
lee-0.028
stick-0.028
 requests-0.027
en-0.027
arted-0.027
Token sw
Token ID1509
Feature activation-8.30e+00

Pos logit contributions
stick+0.021
lee+0.021
ags+0.021
arted+0.020
 requests+0.020
Neg logit contributions
str-0.012
ww-0.011
 happened-0.010
 slice-0.010
ha-0.010
Tokenam
Token ID321
Feature activation-4.79e+00

Pos logit contributions
stick+0.089
 requests+0.084
 circles+0.084
lement+0.078
 commands+0.075
Neg logit contributions
w-0.241
 coconut-0.239
str-0.238
 bait-0.232
 slice-0.231
Token and
Token ID290
Feature activation-1.49e+00

Pos logit contributions
lee+0.092
stick+0.092
en+0.088
 requests+0.087
arted+0.086
Neg logit contributions
str-0.115
ww-0.106
 slice-0.105
prise-0.103
iest-0.102
Token sw
Token ID1509
Feature activation-9.16e+00

Pos logit contributions
ags+0.038
lee+0.037
 mistakes+0.037
 requests+0.037
stick+0.036
Neg logit contributions
str-0.026
 happened-0.024
ww-0.023
ha-0.023
 slice-0.023
Tokenam
Token ID321
Feature activation-4.92e+00

Pos logit contributions
 requests+0.088
stick+0.086
 circles+0.082
lement+0.080
 mistakes+0.080
Neg logit contributions
w-0.278
str-0.278
 coconut-0.272
 bait-0.270
ww-0.268
Token,
Token ID11
Feature activation+1.62e+00

Pos logit contributions
lee+0.117
stick+0.116
 requests+0.112
en+0.110
arted+0.110
Neg logit contributions
str-0.094
ww-0.085
 slice-0.085
iest-0.084
Not-0.083
Token passing
Token ID6427
Feature activation-2.00e+00

Pos logit contributions
str+0.028
Not+0.024
 slice+0.023
 mel+0.023
 happened+0.023
Neg logit contributions
lee-0.042
ags-0.041
stick-0.041
 requests-0.040
aters-0.040
Token all
Token ID477
Feature activation+4.73e-01

Pos logit contributions
stick+0.053
lee+0.049
ishment+0.047
anks+0.047
ags+0.046
Neg logit contributions
iest-0.033
 mel-0.033
str-0.033
 coconut-0.032
ha-0.031
Token kinds
Token ID6982
Feature activation+1.38e+00

Pos logit contributions
str+0.007
iest+0.007
ww+0.006
aster+0.006
Not+0.006
Neg logit contributions
stick-0.013
lee-0.013
 requests-0.013
ishment-0.012
 heads-0.012
Token.
Token ID13
Feature activation-2.01e+00

Pos logit contributions
str+0.036
ww+0.034
 slice+0.033
prise+0.032
Not+0.032
Neg logit contributions
stick-0.028
lee-0.028
 requests-0.027
en-0.027
arted-0.026
Token They
Token ID1119
Feature activation-5.63e-02

Pos logit contributions
lee+0.033
stick+0.033
en+0.031
 requests+0.031
arted+0.030
Neg logit contributions
str-0.053
 slice-0.049
ww-0.049
Not-0.048
iest-0.048
Token sw
Token ID1509
Feature activation-8.67e+00

Pos logit contributions
lee+0.001
stick+0.001
arted+0.001
 heads+0.001
en+0.001
Neg logit contributions
str-0.001
ww-0.001
 slice-0.001
arest-0.001
aster-0.001
Tokenam
Token ID321
Feature activation-5.26e+00

Pos logit contributions
stick+0.095
 circles+0.085
 requests+0.085
lement+0.084
 commands+0.076
Neg logit contributions
w-0.264
str-0.256
ww-0.254
 bait-0.253
 coconut-0.251
Token and
Token ID290
Feature activation-9.48e-01

Pos logit contributions
lee+0.100
stick+0.099
en+0.095
arted+0.094
 requests+0.094
Neg logit contributions
str-0.127
ww-0.117
 slice-0.117
prise-0.115
aster-0.114
Token sw
Token ID1509
Feature activation-9.12e+00

Pos logit contributions
lee+0.021
stick+0.021
 requests+0.020
en+0.020
arted+0.020
Neg logit contributions
str-0.020
ww-0.018
 slice-0.018
iest-0.018
Not-0.018
Tokenam
Token ID321
Feature activation-4.99e+00

Pos logit contributions
stick+0.080
 requests+0.078
lement+0.075
 circles+0.074
lee+0.069
Neg logit contributions
str-0.291
w-0.290
ww-0.289
 bait-0.279
 slice-0.278
Token until
Token ID1566
Feature activation+4.43e+00

Pos logit contributions
lee+0.118
stick+0.117
 requests+0.112
en+0.112
arted+0.111
Neg logit contributions
str-0.097
ww-0.088
 slice-0.088
iest-0.085
Not-0.085
Token they
Token ID484
Feature activation+7.40e-01

Pos logit contributions
prise+0.091
arest+0.082
str+0.081
 slice+0.081
ww+0.077
Neg logit contributions
am-0.098
 houses-0.098
arted-0.097
lee-0.094
aled-0.093
Token found
Token ID1043
Feature activation-1.68e+00

Pos logit contributions
ww+0.015
prise+0.015
 slice+0.014
str+0.014
arest+0.014
Neg logit contributions
lee-0.016
am-0.016
arted-0.016
en-0.015
aled-0.015
Token the
Token ID262
Feature activation-5.29e-01

Pos logit contributions
am+0.037
lee+0.036
 circles+0.035
arted+0.035
ags+0.034
Neg logit contributions
str-0.036
ww-0.034
prise-0.033
ipop-0.033
 slice-0.032
Token.
Token ID13
Feature activation-2.38e+00

Pos logit contributions
str+0.010
ww+0.010
prise+0.009
 slice+0.009
ha+0.009
Neg logit contributions
stick-0.009
lee-0.009
en-0.009
 circles-0.009
 requests-0.009
Token She
Token ID1375
Feature activation-1.18e+00

Pos logit contributions
ags+0.052
ishment+0.051
 mistakes+0.050
stick+0.049
 heads+0.048
Neg logit contributions
Not-0.051
str-0.048
 coconut-0.043
ipop-0.042
t-0.041
Token sw
Token ID1509
Feature activation-8.16e+00

Pos logit contributions
stick+0.033
ags+0.032
lee+0.032
ishment+0.031
arted+0.031
Neg logit contributions
str-0.018
 happened-0.016
ww-0.016
 slice-0.016
ha-0.015
Tokenam
Token ID321
Feature activation-4.88e+00

Pos logit contributions
stick+0.095
lement+0.088
 circles+0.087
 requests+0.086
 commands+0.080
Neg logit contributions
w-0.232
str-0.228
 coconut-0.227
 bait-0.226
iest-0.221
Token and
Token ID290
Feature activation-1.34e+00

Pos logit contributions
lee+0.089
stick+0.089
en+0.085
 requests+0.084
arted+0.083
Neg logit contributions
str-0.122
ww-0.112
 slice-0.112
prise-0.110
iest-0.108
Token sw
Token ID1509
Feature activation-9.07e+00

Pos logit contributions
ags+0.035
stick+0.034
 requests+0.033
lee+0.033
lement+0.032
Neg logit contributions
str-0.023
ww-0.021
 happened-0.021
ha-0.020
w-0.020
Tokenam
Token ID321
Feature activation-4.93e+00

Pos logit contributions
stick+0.091
 requests+0.089
lement+0.087
 circles+0.084
 mistakes+0.080
Neg logit contributions
w-0.271
str-0.268
ww-0.265
 bait-0.262
iest-0.261
Token until
Token ID1566
Feature activation+3.88e+00

Pos logit contributions
lee+0.114
stick+0.113
 requests+0.108
en+0.107
arted+0.106
Neg logit contributions
str-0.097
ww-0.089
 slice-0.088
iest-0.088
Not-0.087
Token she
Token ID673
Feature activation+2.31e-01

Pos logit contributions
str+0.060
 slice+0.058
ww+0.056
prise+0.056
arest+0.053
Neg logit contributions
lee-0.105
stick-0.103
en-0.101
arted-0.101
aled-0.098
Token saw
Token ID2497
Feature activation-1.51e-01

Pos logit contributions
str+0.004
ww+0.004
 slice+0.004
prise+0.004
arest+0.004
Neg logit contributions
lee-0.006
en-0.006
stick-0.005
am-0.005
arted-0.005
Token a
Token ID257
Feature activation+1.05e+00

Pos logit contributions
lee+0.004
stick+0.004
 requests+0.004
en+0.004
arted+0.004
Neg logit contributions
str-0.003
ww-0.002
 slice-0.002
prise-0.002
arest-0.002
Token and
Token ID290
Feature activation+2.62e+00

Pos logit contributions
stick+0.057
lee+0.057
ags+0.055
 requests+0.055
ishment+0.054
Neg logit contributions
str-0.045
 slice-0.040
ww-0.040
 happened-0.039
ha-0.039
Token Mia
Token ID32189
Feature activation-4.64e-02

Pos logit contributions
str+0.062
ww+0.057
arest+0.057
 slice+0.057
iest+0.056
Neg logit contributions
lee-0.052
stick-0.052
 requests-0.049
arted-0.049
en-0.048
Token grabbed
Token ID13176
Feature activation-2.76e+00

Pos logit contributions
lee+0.001
stick+0.001
 heads+0.001
en+0.001
 requests+0.001
Neg logit contributions
str-0.001
ww-0.001
 slice-0.001
iest-0.001
prise-0.001
Token Leo
Token ID19632
Feature activation-6.22e-01

Pos logit contributions
lee+0.068
stick+0.068
en+0.065
 requests+0.065
arted+0.064
Neg logit contributions
str-0.051
 slice-0.046
ww-0.046
Not-0.044
prise-0.044
Token and
Token ID290
Feature activation+6.14e-01

Pos logit contributions
stick+0.010
lee+0.010
en+0.010
 requests+0.010
 heads+0.010
Neg logit contributions
str-0.017
ww-0.016
iest-0.015
arest-0.015
Not-0.015
Token sw
Token ID1509
Feature activation-9.07e+00

Pos logit contributions
str+0.013
iest+0.013
arest+0.013
ww+0.012
prise+0.012
Neg logit contributions
lee-0.014
stick-0.013
 heads-0.013
 requests-0.013
 backs-0.013
Tokenam
Token ID321
Feature activation-4.39e+00

Pos logit contributions
 Playing+0.086
 requests+0.086
 circles+0.086
stick+0.086
lement+0.085
Neg logit contributions
w-0.280
 bait-0.263
 slice-0.262
str-0.259
 coconut-0.259
Token to
Token ID284
Feature activation-8.40e-01

Pos logit contributions
lee+0.090
stick+0.089
 requests+0.085
en+0.085
arted+0.084
Neg logit contributions
str-0.100
ww-0.092
 slice-0.092
iest-0.089
prise-0.089
Token the
Token ID262
Feature activation-1.20e+00

Pos logit contributions
lee+0.021
stick+0.020
en+0.020
 requests+0.020
arted+0.019
Neg logit contributions
str-0.016
 slice-0.014
ww-0.014
Not-0.014
prise-0.014
Token edge
Token ID5743
Feature activation+7.44e-01

Pos logit contributions
stick+0.022
lee+0.022
 heads+0.022
 requests+0.022
en+0.021
Neg logit contributions
str-0.029
ww-0.027
 slice-0.026
prise-0.026
iest-0.026
Token of
Token ID286
Feature activation+1.29e-01

Pos logit contributions
str+0.011
ww+0.010
prise+0.009
arest+0.009
 slice+0.009
Neg logit contributions
lee-0.021
stick-0.021
en-0.021
 circles-0.021
arted-0.020
Token.
Token ID13
Feature activation-1.51e+00

Pos logit contributions
lee+0.045
stick+0.043
arted+0.042
 requests+0.042
en+0.042
Neg logit contributions
str-0.042
 slice-0.039
ww-0.038
iest-0.038
Not-0.037
Token It
Token ID632
Feature activation+5.05e-01

Pos logit contributions
 heads+0.031
ishment+0.030
am+0.030
ags+0.029
lee+0.029
Neg logit contributions
Not-0.032
aster-0.032
 slice-0.030
 mel-0.028
 please-0.028
Token has
Token ID468
Feature activation+5.67e-01

Pos logit contributions
str+0.007
ww+0.007
 happened+0.006
Not+0.006
iest+0.006
Neg logit contributions
ishment-0.014
lee-0.014
arted-0.014
ags-0.014
stick-0.014
Token flowers
Token ID12734
Feature activation+6.97e-01

Pos logit contributions
str+0.009
ww+0.008
 slice+0.008
iest+0.008
Not+0.008
Neg logit contributions
lee-0.015
stick-0.015
en-0.015
ishment-0.015
arted-0.015
Token and
Token ID290
Feature activation+1.82e+00

Pos logit contributions
str+0.014
ww+0.013
 slice+0.013
prise+0.012
Not+0.012
Neg logit contributions
stick-0.016
lee-0.016
en-0.015
 requests-0.015
ags-0.015
Token rib
Token ID12183
Feature activation-9.05e+00

Pos logit contributions
str+0.047
ww+0.044
 slice+0.044
iest+0.043
Not+0.043
Neg logit contributions
lee-0.031
stick-0.031
 requests-0.029
en-0.029
arted-0.029
Tokenbons
Token ID23461
Feature activation-1.11e+00

Pos logit contributions
 requests+0.099
ishment+0.098
lee+0.097
stick+0.095
arted+0.095
Neg logit contributions
str-0.269
ww-0.268
 slice-0.265
ha-0.257
w-0.257
Token on
Token ID319
Feature activation-1.10e+00

Pos logit contributions
lee+0.024
stick+0.024
en+0.023
 requests+0.023
arted+0.022
Neg logit contributions
str-0.024
ww-0.022
 slice-0.022
prise-0.021
Not-0.021
Token it
Token ID340
Feature activation+9.60e-01

Pos logit contributions
lee+0.027
stick+0.027
arted+0.027
 requests+0.027
ishment+0.027
Neg logit contributions
str-0.019
 slice-0.018
ww-0.017
aster-0.017
iest-0.017
Token.
Token ID13
Feature activation-2.26e+00

Pos logit contributions
str+0.019
 slice+0.017
ww+0.017
iest+0.017
 happened+0.017
Neg logit contributions
lee-0.022
stick-0.021
arted-0.021
en-0.021
 requests-0.021
Token She
Token ID1375
Feature activation-1.02e+00

Pos logit contributions
 heads+0.054
stick+0.053
am+0.052
ping+0.052
arted+0.052
Neg logit contributions
Not-0.041
aster-0.039
 slice-0.037
 sugar-0.037
 mel-0.036
Token sailed
Token ID33835
Feature activation-3.98e+00

Pos logit contributions
ags+0.063
stick+0.062
lee+0.061
 requests+0.060
ishment+0.059
Neg logit contributions
str-0.049
ww-0.043
 slice-0.043
ha-0.043
iest-0.042
Token for
Token ID329
Feature activation+6.18e-01

Pos logit contributions
lee+0.082
stick+0.081
en+0.078
 requests+0.077
arted+0.077
Neg logit contributions
str-0.090
ww-0.082
 slice-0.082
prise-0.080
Not-0.079
Token days
Token ID1528
Feature activation+1.17e+00

Pos logit contributions
str+0.010
ww+0.009
 slice+0.009
iest+0.008
prise+0.008
Neg logit contributions
lee-0.017
stick-0.017
 requests-0.016
en-0.016
arted-0.016
Token,
Token ID11
Feature activation+2.26e+00

Pos logit contributions
str+0.022
ww+0.020
 slice+0.020
iest+0.019
Not+0.019
Neg logit contributions
lee-0.028
stick-0.028
 requests-0.027
en-0.027
ags-0.027
Token exploring
Token ID13504
Feature activation-1.30e+00

Pos logit contributions
str+0.046
 slice+0.041
ww+0.041
Not+0.040
iest+0.040
Neg logit contributions
lee-0.051
stick-0.050
ags-0.049
 requests-0.049
en-0.048
Token unch
Token ID16258
Feature activation-9.04e+00

Pos logit contributions
lee+0.029
stick+0.029
en+0.028
 requests+0.028
arted+0.028
Neg logit contributions
str-0.027
ww-0.024
 slice-0.024
prise-0.023
Not-0.023
Tokenarted
Token ID19112
Feature activation-1.75e+00

Pos logit contributions
 shoulders+0.114
 backs+0.107
 voices+0.106
 dances+0.106
 requests+0.103
Neg logit contributions
str-0.240
 coconut-0.237
aster-0.237
 mel-0.234
ww-0.222
Token waters
Token ID10150
Feature activation-3.06e-01

Pos logit contributions
lee+0.036
 requests+0.035
ags+0.035
stick+0.035
ishment+0.034
Neg logit contributions
str-0.038
 happened-0.035
ww-0.034
ha-0.034
iest-0.034
Token until
Token ID1566
Feature activation+4.35e+00

Pos logit contributions
lee+0.007
stick+0.007
 requests+0.007
en+0.007
arted+0.007
Neg logit contributions
str-0.006
 slice-0.005
ww-0.005
iest-0.005
Not-0.005
Token he
Token ID339
Feature activation+1.54e-01

Pos logit contributions
str+0.077
 slice+0.071
ww+0.070
prise+0.070
arest+0.068
Neg logit contributions
lee-0.111
stick-0.109
arted-0.105
en-0.104
aled-0.103
Token came
Token ID1625
Feature activation-4.35e+00

Pos logit contributions
str+0.003
 slice+0.003
ww+0.003
prise+0.003
Not+0.003
Neg logit contributions
lee-0.004
en-0.004
stick-0.004
am-0.003
aled-0.003
Token.
Token ID13
Feature activation-2.51e+00

Pos logit contributions
str+0.002
ww+0.002
 slice+0.002
iest+0.002
Not+0.002
Neg logit contributions
lee-0.002
stick-0.002
 requests-0.002
en-0.002
arted-0.002
Token The
Token ID383
Feature activation+2.39e-01

Pos logit contributions
stick+0.051
lee+0.049
en+0.048
 heads+0.047
arted+0.047
Neg logit contributions
str-0.055
 slice-0.053
aster-0.053
Not-0.052
ww-0.051
Token ducks
Token ID39694
Feature activation-2.97e-01

Pos logit contributions
str+0.006
prise+0.005
ww+0.005
arest+0.005
aster+0.005
Neg logit contributions
 heads-0.005
ags-0.004
stick-0.004
 backs-0.004
 houses-0.004
Token and
Token ID290
Feature activation+1.24e+00

Pos logit contributions
stick+0.007
lee+0.007
en+0.006
arted+0.006
 heads+0.006
Neg logit contributions
str-0.006
ww-0.006
prise-0.006
arest-0.006
Not-0.005
Token fish
Token ID5916
Feature activation-7.01e-01

Pos logit contributions
str+0.032
arest+0.031
prise+0.030
ww+0.029
iest+0.029
Neg logit contributions
 heads-0.024
lee-0.023
stick-0.023
 houses-0.022
arted-0.022
Token sw
Token ID1509
Feature activation-9.03e+00

Pos logit contributions
stick+0.016
lee+0.016
am+0.015
en+0.015
 heads+0.015
Neg logit contributions
str-0.014
ww-0.014
prise-0.013
arest-0.013
Not-0.013
Tokenam
Token ID321
Feature activation-4.80e+00

Pos logit contributions
stick+0.081
 requests+0.078
lement+0.073
 circles+0.071
lee+0.066
Neg logit contributions
w-0.291
str-0.281
ww-0.279
 slice-0.279
 bait-0.278
Token away
Token ID1497
Feature activation+9.93e-01

Pos logit contributions
stick+0.097
lee+0.097
en+0.093
arted+0.093
 heads+0.092
Neg logit contributions
str-0.110
ww-0.101
 slice-0.100
prise-0.098
aster-0.098
Token.
Token ID13
Feature activation-1.98e+00

Pos logit contributions
str+0.025
ww+0.023
prise+0.022
aster+0.022
 slice+0.022
Neg logit contributions
stick-0.018
lee-0.018
en-0.018
 heads-0.018
 requests-0.017
Token The
Token ID383
Feature activation+5.77e-01

Pos logit contributions
stick+0.045
lee+0.043
en+0.042
 heads+0.041
arted+0.041
Neg logit contributions
str-0.040
 slice-0.038
Not-0.037
aster-0.037
ww-0.037
Token water
Token ID1660
Feature activation+2.38e-01

Pos logit contributions
str+0.014
prise+0.013
ww+0.013
arest+0.013
iest+0.013
Neg logit contributions
 heads-0.011
ags-0.011
 houses-0.011
 requests-0.010
 backs-0.010
Token
Token ID220
Feature activation-2.88e+00

Pos logit contributions
ishment+0.056
en+0.053
lee+0.053
 heads+0.050
ags+0.049
Neg logit contributions
Not-0.060
str-0.053
aster-0.052
ipop-0.051
 slice-0.051
Token
Token ID198
Feature activation+9.80e-01

Pos logit contributions
lee+0.073
stick+0.073
en+0.070
ishment+0.069
 heads+0.069
Neg logit contributions
str-0.051
Not-0.045
 slice-0.045
ww-0.044
 happened-0.044
Token
Token ID198
Feature activation+6.03e-01

Pos logit contributions
str+0.019
ww+0.017
 slice+0.017
Not+0.016
iest+0.016
Neg logit contributions
lee-0.024
stick-0.023
en-0.022
 requests-0.022
arted-0.022
TokenShe
Token ID3347
Feature activation-1.11e+00

Pos logit contributions
str+0.009
ww+0.008
Not+0.008
 slice+0.008
aster+0.008
Neg logit contributions
lee-0.017
stick-0.017
en-0.016
ishment-0.016
arted-0.016
Token cautiously
Token ID39640
Feature activation-8.54e-01

Pos logit contributions
lee+0.028
stick+0.027
 requests+0.026
en+0.026
arted+0.026
Neg logit contributions
str-0.020
ww-0.018
 slice-0.018
iest-0.018
Not-0.017
Token sw
Token ID1509
Feature activation-9.02e+00

Pos logit contributions
lee+0.019
stick+0.019
en+0.018
 requests+0.018
arted+0.018
Neg logit contributions
str-0.018
ww-0.017
 slice-0.016
prise-0.016
Not-0.016
Tokenam
Token ID321
Feature activation-4.87e+00

Pos logit contributions
stick+0.108
 mistakes+0.106
 requests+0.102
 circles+0.099
 robots+0.094
Neg logit contributions
w-0.255
str-0.242
 bait-0.241
ww-0.237
 coconut-0.232
Token around
Token ID1088
Feature activation-1.59e+00

Pos logit contributions
lee+0.100
stick+0.098
 requests+0.094
ags+0.092
ishment+0.092
Neg logit contributions
str-0.109
Not-0.101
iest-0.100
ww-0.100
 slice-0.099
Token and
Token ID290
Feature activation-8.88e-01

Pos logit contributions
lee+0.042
stick+0.042
arted+0.040
aled+0.040
ags+0.040
Neg logit contributions
str-0.025
 slice-0.023
aster-0.023
 happened-0.022
Not-0.022
Token saw
Token ID2497
Feature activation+2.94e-01

Pos logit contributions
lee+0.023
ags+0.023
stick+0.023
 requests+0.022
aters+0.022
Neg logit contributions
str-0.015
ww-0.013
Not-0.013
 happened-0.013
 slice-0.013
Token lots
Token ID6041
Feature activation+2.22e+00

Pos logit contributions
str+0.005
ww+0.004
 slice+0.004
iest+0.004
Not+0.004
Neg logit contributions
lee-0.008
stick-0.008
en-0.008
 requests-0.008
arted-0.007
Token and
Token ID290
Feature activation+4.50e-01

Pos logit contributions
stick+0.016
lee+0.015
en+0.015
arted+0.015
aled+0.014
Neg logit contributions
str-0.015
ww-0.014
prise-0.013
 slice-0.013
arest-0.013
Token he
Token ID339
Feature activation-1.10e+00

Pos logit contributions
str+0.009
ww+0.007
 slice+0.007
arest+0.007
prise+0.007
Neg logit contributions
stick-0.011
lee-0.011
en-0.011
arted-0.011
 requests-0.011
Token and
Token ID290
Feature activation+6.37e-01

Pos logit contributions
lee+0.026
stick+0.026
 requests+0.025
en+0.025
ags+0.024
Neg logit contributions
str-0.021
ww-0.019
 slice-0.019
iest-0.018
Not-0.018
Token Be
Token ID1355
Feature activation-1.88e+00

Pos logit contributions
str+0.014
ww+0.013
 slice+0.013
iest+0.013
Not+0.013
Neg logit contributions
lee-0.013
stick-0.013
 requests-0.012
en-0.012
ags-0.012
Tokencca
Token ID13227
Feature activation-1.15e+00

Pos logit contributions
lee+0.041
stick+0.041
 requests+0.039
en+0.039
arted+0.039
Neg logit contributions
str-0.039
 slice-0.036
ww-0.036
iest-0.035
prise-0.035
Token sw
Token ID1509
Feature activation-8.98e+00

Pos logit contributions
lee+0.027
stick+0.025
en+0.025
arted+0.024
am+0.024
Neg logit contributions
str-0.024
ww-0.022
 slice-0.022
prise-0.021
aster-0.021
Tokenam
Token ID321
Feature activation-4.77e+00

Pos logit contributions
 requests+0.091
 circles+0.088
stick+0.085
 stages+0.083
 houses+0.083
Neg logit contributions
w-0.266
str-0.263
ipop-0.258
ww-0.256
less-0.255
Token together
Token ID1978
Feature activation+1.02e+00

Pos logit contributions
lee+0.097
stick+0.096
 requests+0.092
en+0.090
lement+0.090
Neg logit contributions
str-0.104
ww-0.097
Not-0.097
 slice-0.097
iest-0.097
Token.
Token ID13
Feature activation-2.25e+00

Pos logit contributions
str+0.020
 slice+0.018
ww+0.018
iest+0.018
Not+0.018
Neg logit contributions
lee-0.023
stick-0.023
 requests-0.022
en-0.022
arted-0.022
Token They
Token ID1119
Feature activation-2.78e-01

Pos logit contributions
stick+0.053
lee+0.051
 heads+0.051
ishment+0.050
en+0.050
Neg logit contributions
str-0.044
Not-0.040
 slice-0.038
ww-0.038
 coconut-0.037
Token had
Token ID550
Feature activation-9.33e-01

Pos logit contributions
lee+0.006
stick+0.006
en+0.006
arted+0.006
 requests+0.006
Neg logit contributions
str-0.006
ww-0.005
 slice-0.005
prise-0.005
iest-0.005
Token,
Token ID11
Feature activation+3.06e+00

Pos logit contributions
str+0.062
ww+0.056
 slice+0.055
prise+0.055
arest+0.053
Neg logit contributions
lee-0.075
stick-0.071
 requests-0.071
en-0.071
 circles-0.069
Token she
Token ID673
Feature activation-9.71e-01

Pos logit contributions
str+0.044
ww+0.038
 slice+0.038
iest+0.036
prise+0.036
Neg logit contributions
lee-0.088
stick-0.088
 requests-0.085
en-0.085
arted-0.084
Token jumped
Token ID11687
Feature activation-5.76e+00

Pos logit contributions
lee+0.023
stick+0.023
en+0.022
 requests+0.022
arted+0.022
Neg logit contributions
str-0.019
ww-0.017
 slice-0.017
prise-0.016
iest-0.016
Token in
Token ID287
Feature activation-1.51e+00

Pos logit contributions
lee+0.125
stick+0.124
 requests+0.120
lement+0.120
ishment+0.119
Neg logit contributions
str-0.113
Not-0.110
 slice-0.109
iest-0.109
ww-0.106
Token and
Token ID290
Feature activation-1.21e+00

Pos logit contributions
stick+0.038
lee+0.038
aled+0.036
arted+0.036
en+0.036
Neg logit contributions
str-0.026
 slice-0.025
Not-0.024
aster-0.023
ww-0.023
Token sw
Token ID1509
Feature activation-8.95e+00

Pos logit contributions
lee+0.029
stick+0.028
 requests+0.028
ags+0.027
en+0.027
Neg logit contributions
str-0.023
ww-0.021
 slice-0.021
Not-0.020
iest-0.020
Tokenam
Token ID321
Feature activation-4.53e+00

Pos logit contributions
stick+0.091
 robots+0.087
 requests+0.082
lement+0.081
 channels+0.080
Neg logit contributions
w-0.267
str-0.262
 slice-0.262
ww-0.260
ipop-0.256
Token to
Token ID284
Feature activation-1.94e+00

Pos logit contributions
lee+0.108
stick+0.106
 requests+0.104
ishment+0.104
lement+0.103
Neg logit contributions
str-0.080
iest-0.079
Not-0.078
 slice-0.075
ha-0.074
Token the
Token ID262
Feature activation-1.11e+00

Pos logit contributions
stick+0.060
lee+0.059
aters+0.059
am+0.057
arted+0.056
Neg logit contributions
aster-0.023
 slice-0.020
 reward-0.020
 mel-0.020
 believe-0.019
Token bag
Token ID6131
Feature activation-1.47e+00

Pos logit contributions
lee+0.023
stick+0.023
en+0.022
 requests+0.022
arted+0.022
Neg logit contributions
str-0.025
 slice-0.023
ww-0.023
Not-0.022
iest-0.022
Token.
Token ID13
Feature activation-2.67e+00

Pos logit contributions
lee+0.035
stick+0.034
 requests+0.034
en+0.033
ishment+0.033
Neg logit contributions
str-0.028
 slice-0.026
iest-0.025
Not-0.025
ww-0.024
Token.
Token ID13
Feature activation-2.67e+00

Pos logit contributions
str+0.010
ww+0.009
 slice+0.009
prise+0.009
iest+0.009
Neg logit contributions
lee-0.010
stick-0.010
en-0.010
 requests-0.010
arted-0.009
Token He
Token ID679
Feature activation-1.22e+00

Pos logit contributions
ags+0.068
ishment+0.065
en+0.065
aters+0.062
stick+0.058
Neg logit contributions
str-0.047
Not-0.045
 he-0.045
 coconut-0.044
t-0.041
Token grabbed
Token ID13176
Feature activation-3.18e+00

Pos logit contributions
stick+0.030
lee+0.029
ags+0.029
 requests+0.028
en+0.028
Neg logit contributions
str-0.022
ww-0.020
 slice-0.020
 happened-0.019
ha-0.019
Token it
Token ID340
Feature activation-5.70e-01

Pos logit contributions
lee+0.074
stick+0.073
en+0.071
arted+0.071
 requests+0.071
Neg logit contributions
str-0.064
ww-0.058
 slice-0.056
prise-0.056
iest-0.055
Token and
Token ID290
Feature activation-1.65e+00

Pos logit contributions
lee+0.013
stick+0.013
ags+0.013
 requests+0.013
lement+0.012
Neg logit contributions
str-0.011
 slice-0.010
Not-0.010
ww-0.010
 mel-0.010
Token sw
Token ID1509
Feature activation-8.94e+00

Pos logit contributions
lee+0.037
stick+0.036
en+0.035
 requests+0.035
arted+0.034
Neg logit contributions
str-0.035
ww-0.032
 slice-0.031
prise-0.031
iest-0.031
Tokenam
Token ID321
Feature activation-4.40e+00

Pos logit contributions
stick+0.075
 requests+0.072
 circles+0.072
lement+0.066
 languages+0.063
Neg logit contributions
w-0.289
str-0.286
ww-0.281
 slice-0.278
 coconut-0.278
Token back
Token ID736
Feature activation-2.13e+00

Pos logit contributions
lee+0.093
stick+0.092
 requests+0.088
en+0.087
arted+0.086
Neg logit contributions
str-0.096
 slice-0.088
ww-0.088
Not-0.086
iest-0.086
Token to
Token ID284
Feature activation-1.65e+00

Pos logit contributions
lee+0.057
stick+0.056
 requests+0.056
lement+0.055
ishment+0.055
Neg logit contributions
str-0.033
Not-0.029
ww-0.028
ipop-0.027
iest-0.027
Token shore
Token ID15191
Feature activation-2.33e+00

Pos logit contributions
aters+0.047
lee+0.045
stick+0.045
lement+0.043
ishment+0.043
Neg logit contributions
aster-0.025
 coconut-0.023
str-0.023
 slice-0.023
 bite-0.022
Token.
Token ID13
Feature activation-2.53e+00

Pos logit contributions
lee+0.055
stick+0.054
aled+0.053
ags+0.053
lement+0.053
Neg logit contributions
str-0.044
 slice-0.040
iest-0.039
Not-0.038
 mel-0.038
Token
Token ID198
Feature activation+1.28e+00

Pos logit contributions
str+0.045
ww+0.043
 slice+0.043
prise+0.041
iest+0.041
Neg logit contributions
lee-0.044
stick-0.043
 requests-0.042
en-0.042
arted-0.041
TokenHe
Token ID1544
Feature activation-6.73e-01

Pos logit contributions
str+0.021
ww+0.019
 slice+0.018
Not+0.018
iest+0.018
Neg logit contributions
lee-0.035
stick-0.035
en-0.033
 requests-0.033
arted-0.033
Token sw
Token ID1509
Feature activation-8.00e+00

Pos logit contributions
stick+0.018
lee+0.018
ags+0.018
arted+0.017
 requests+0.017
Neg logit contributions
str-0.011
ww-0.009
ha-0.009
 happened-0.009
 slice-0.009
Tokenam
Token ID321
Feature activation-4.71e+00

Pos logit contributions
stick+0.100
 circles+0.098
lement+0.096
 requests+0.095
 channels+0.090
Neg logit contributions
w-0.218
 bait-0.211
 coconut-0.207
str-0.203
ww-0.198
Token and
Token ID290
Feature activation-1.64e+00

Pos logit contributions
lee+0.091
stick+0.091
 requests+0.086
en+0.086
arted+0.085
Neg logit contributions
str-0.112
ww-0.103
 slice-0.103
iest-0.100
Not-0.100
Token sw
Token ID1509
Feature activation-8.91e+00

Pos logit contributions
ags+0.042
stick+0.041
lee+0.041
 requests+0.041
 mistakes+0.040
Neg logit contributions
str-0.027
ha-0.025
 happened-0.025
ww-0.025
 slice-0.024
Tokenam
Token ID321
Feature activation-4.82e+00

Pos logit contributions
 requests+0.094
 circles+0.093
stick+0.093
lement+0.091
lee+0.087
Neg logit contributions
w-0.265
 bait-0.255
ww-0.254
str-0.253
 coconut-0.247
Token until
Token ID1566
Feature activation+4.07e+00

Pos logit contributions
lee+0.117
stick+0.116
 requests+0.112
en+0.111
ishment+0.110
Neg logit contributions
str-0.089
ww-0.081
 slice-0.080
iest-0.079
Not-0.079
Token he
Token ID339
Feature activation+5.04e-02

Pos logit contributions
prise+0.065
str+0.064
 slice+0.063
ww+0.060
arest+0.058
Neg logit contributions
lee-0.109
am-0.104
arted-0.104
aled-0.103
en-0.103
Token found
Token ID1043
Feature activation-1.61e+00

Pos logit contributions
str+0.001
ww+0.001
 slice+0.001
prise+0.001
arest+0.001
Neg logit contributions
lee-0.001
am-0.001
en-0.001
stick-0.001
arted-0.001
Token a
Token ID257
Feature activation+9.22e-01

Pos logit contributions
lee+0.037
stick+0.036
am+0.036
arted+0.035
aled+0.035
Neg logit contributions
str-0.033
ww-0.031
 slice-0.030
prise-0.029
arest-0.029
Token.
Token ID13
Feature activation-2.51e+00

Pos logit contributions
prise+0.068
str+0.068
aster+0.068
ww+0.067
w+0.065
Neg logit contributions
 circles-0.053
lee-0.051
 backs-0.049
 requests-0.049
stick-0.049
Token He
Token ID679
Feature activation-1.12e+00

Pos logit contributions
lee+0.042
en+0.042
stick+0.041
ishment+0.040
ags+0.040
Neg logit contributions
str-0.065
 slice-0.061
Not-0.060
ww-0.060
 mel-0.060
Token sw
Token ID1509
Feature activation-8.19e+00

Pos logit contributions
lee+0.026
stick+0.025
en+0.024
 requests+0.024
am+0.024
Neg logit contributions
str-0.023
ww-0.021
iest-0.021
prise-0.021
 slice-0.021
Tokenam
Token ID321
Feature activation-4.87e+00

Pos logit contributions
stick+0.077
 requests+0.075
 circles+0.072
lement+0.068
 languages+0.068
Neg logit contributions
w-0.256
str-0.253
 coconut-0.251
 bait-0.247
ww-0.241
Token and
Token ID290
Feature activation-1.88e+00

Pos logit contributions
stick+0.092
lee+0.092
arted+0.089
en+0.088
 heads+0.086
Neg logit contributions
str-0.120
ww-0.112
 slice-0.110
prise-0.109
arest-0.107
Token sw
Token ID1509
Feature activation-8.91e+00

Pos logit contributions
lee+0.042
stick+0.042
en+0.040
 requests+0.040
arted+0.040
Neg logit contributions
str-0.039
ww-0.035
 slice-0.035
iest-0.034
prise-0.034
Tokenam
Token ID321
Feature activation-4.82e+00

Pos logit contributions
 requests+0.075
stick+0.074
lement+0.069
 circles+0.067
 languages+0.063
Neg logit contributions
str-0.288
w-0.285
ww-0.279
 coconut-0.279
 bait-0.277
Token until
Token ID1566
Feature activation+4.10e+00

Pos logit contributions
lee+0.110
stick+0.109
en+0.105
 requests+0.104
arted+0.104
Neg logit contributions
str-0.098
ww-0.089
 slice-0.089
prise-0.086
iest-0.085
Token he
Token ID339
Feature activation-3.12e-01

Pos logit contributions
str+0.070
prise+0.068
ww+0.067
 slice+0.066
arest+0.061
Neg logit contributions
lee-0.104
arted-0.102
aled-0.100
am-0.100
stick-0.100
Token found
Token ID1043
Feature activation-1.84e+00

Pos logit contributions
lee+0.007
am+0.007
en+0.007
stick+0.007
arted+0.007
Neg logit contributions
ww-0.006
str-0.006
prise-0.006
 slice-0.006
arest-0.006
Token a
Token ID257
Feature activation+7.36e-01

Pos logit contributions
stick+0.041
lee+0.040
 circles+0.039
en+0.039
arted+0.038
Neg logit contributions
str-0.040
ww-0.039
prise-0.036
 slice-0.036
arest-0.035
Token
Token ID198
Feature activation+8.60e-01

Pos logit contributions
str+0.029
 slice+0.027
ww+0.027
iest+0.026
prise+0.026
Neg logit contributions
lee-0.030
stick-0.029
 requests-0.029
arted-0.028
en-0.028
TokenJoe
Token ID19585
Feature activation-2.22e+00

Pos logit contributions
str+0.016
 slice+0.016
ipop+0.015
 happened+0.015
ww+0.015
Neg logit contributions
lee-0.020
stick-0.020
 requests-0.020
arted-0.020
 circles-0.019
Token sw
Token ID1509
Feature activation-8.02e+00

Pos logit contributions
stick+0.053
lee+0.053
ags+0.052
 requests+0.051
ishment+0.051
Neg logit contributions
str-0.042
ww-0.037
 slice-0.037
iest-0.036
ha-0.036
Tokenam
Token ID321
Feature activation-4.41e+00

Pos logit contributions
 circles+0.095
 channels+0.093
 robots+0.091
 requests+0.090
 Playing+0.089
Neg logit contributions
str-0.223
w-0.213
 coconut-0.207
ww-0.202
ipop-0.201
Token and
Token ID290
Feature activation-1.40e+00

Pos logit contributions
lee+0.084
stick+0.083
 requests+0.079
en+0.078
ags+0.077
Neg logit contributions
str-0.105
ww-0.097
 slice-0.097
iest-0.096
Not-0.096
Token sw
Token ID1509
Feature activation-8.91e+00

Pos logit contributions
lee+0.032
stick+0.032
ags+0.031
 requests+0.031
en+0.030
Neg logit contributions
str-0.028
ww-0.025
 slice-0.025
Not-0.025
ha-0.024
Tokenam
Token ID321
Feature activation-4.58e+00

Pos logit contributions
 requests+0.090
lement+0.082
 circles+0.082
ishment+0.081
stick+0.080
Neg logit contributions
str-0.268
ww-0.262
w-0.261
 bait-0.254
ha-0.254
Token,
Token ID11
Feature activation+1.46e+00

Pos logit contributions
lee+0.107
stick+0.106
 requests+0.102
en+0.101
ishment+0.100
Neg logit contributions
str-0.090
ww-0.081
 slice-0.081
iest-0.080
Not-0.080
Token but
Token ID475
Feature activation+3.45e+00

Pos logit contributions
str+0.030
Not+0.026
 slice+0.026
ww+0.026
 mel+0.025
Neg logit contributions
lee-0.033
ags-0.032
stick-0.032
 requests-0.032
lement-0.031
Token he
Token ID339
Feature activation+2.56e-01

Pos logit contributions
ww+0.061
str+0.060
 slice+0.060
prise+0.060
arest+0.057
Neg logit contributions
lee-0.085
stick-0.084
arted-0.084
en-0.083
am-0.082
Token couldn
Token ID3521
Feature activation-8.19e-01

Pos logit contributions
str+0.005
ww+0.005
 slice+0.005
prise+0.005
arest+0.005
Neg logit contributions
lee-0.006
stick-0.006
en-0.006
arted-0.006
am-0.005
Token.
Token ID13
Feature activation-2.44e+00

Pos logit contributions
stick+0.025
lee+0.025
en+0.024
 requests+0.024
 heads+0.023
Neg logit contributions
str-0.029
ww-0.026
 slice-0.026
prise-0.025
arest-0.025
Token He
Token ID679
Feature activation-1.07e+00

Pos logit contributions
stick+0.060
en+0.059
lement+0.058
lee+0.057
ishment+0.057
Neg logit contributions
str-0.045
Not-0.043
 coconut-0.041
 happened-0.040
ipop-0.039
Token sw
Token ID1509
Feature activation-7.98e+00

Pos logit contributions
ags+0.032
lee+0.031
stick+0.031
aters+0.031
arted+0.030
Neg logit contributions
str-0.014
 happened-0.013
 else-0.012
ha-0.012
w-0.012
Tokenam
Token ID321
Feature activation-5.02e+00

Pos logit contributions
stick+0.083
 circles+0.082
 requests+0.079
lement+0.079
 commands+0.073
Neg logit contributions
w-0.246
str-0.231
 coconut-0.228
ww-0.227
iest-0.225
Token and
Token ID290
Feature activation-2.11e+00

Pos logit contributions
stick+0.093
lee+0.092
arted+0.089
en+0.089
 heads+0.087
Neg logit contributions
str-0.124
ww-0.115
 slice-0.114
prise-0.113
aster-0.111
Token sw
Token ID1509
Feature activation-8.90e+00

Pos logit contributions
ags+0.052
lee+0.051
en+0.051
stick+0.051
lement+0.050
Neg logit contributions
str-0.034
 happened-0.033
w-0.032
 mel-0.032
ww-0.032
Tokenam
Token ID321
Feature activation-5.07e+00

Pos logit contributions
 requests+0.083
stick+0.081
 circles+0.080
lement+0.080
lee+0.075
Neg logit contributions
w-0.283
ww-0.271
str-0.271
iest-0.263
 bait-0.262
Token until
Token ID1566
Feature activation+3.84e+00

Pos logit contributions
lee+0.123
stick+0.122
 requests+0.117
en+0.117
arted+0.115
Neg logit contributions
str-0.095
ww-0.086
 slice-0.086
iest-0.083
Not-0.083
Token he
Token ID339
Feature activation-1.49e-01

Pos logit contributions
str+0.057
 slice+0.054
prise+0.054
ww+0.052
arest+0.050
Neg logit contributions
lee-0.106
stick-0.103
am-0.102
aled-0.101
arted-0.101
Token saw
Token ID2497
Feature activation-1.21e-01

Pos logit contributions
lee+0.004
stick+0.004
en+0.004
am+0.004
 requests+0.004
Neg logit contributions
str-0.003
ww-0.002
 slice-0.002
prise-0.002
arest-0.002
Token a
Token ID257
Feature activation+1.29e+00

Pos logit contributions
lee+0.003
stick+0.003
 requests+0.003
en+0.003
arted+0.003
Neg logit contributions
str-0.002
ww-0.002
 slice-0.002
prise-0.002
arest-0.002
Token.
Token ID13
Feature activation-2.21e+00

Pos logit contributions
ags+0.040
stick+0.039
lee+0.038
aled+0.038
 requests+0.038
Neg logit contributions
str-0.029
Not-0.026
aster-0.025
iest-0.025
 happened-0.025
Token He
Token ID679
Feature activation-7.85e-01

Pos logit contributions
en+0.052
lee+0.051
ishment+0.050
stick+0.050
ags+0.050
Neg logit contributions
str-0.044
Not-0.041
 mel-0.038
 coconut-0.037
 slice-0.036
Token jumped
Token ID11687
Feature activation-5.66e+00

Pos logit contributions
lee+0.020
stick+0.019
en+0.019
 requests+0.019
arted+0.019
Neg logit contributions
str-0.014
ww-0.013
 slice-0.013
prise-0.012
iest-0.012
Token in
Token ID287
Feature activation-7.63e-01

Pos logit contributions
lee+0.115
stick+0.114
 requests+0.109
en+0.109
arted+0.107
Neg logit contributions
str-0.129
ww-0.119
 slice-0.118
iest-0.115
Not-0.115
Token and
Token ID290
Feature activation-1.25e+00

Pos logit contributions
stick+0.018
lee+0.018
 requests+0.017
arted+0.017
aled+0.017
Neg logit contributions
str-0.014
 slice-0.014
ww-0.013
Not-0.013
 mel-0.013
Token sw
Token ID1509
Feature activation-8.89e+00

Pos logit contributions
lee+0.028
stick+0.028
en+0.026
arted+0.026
 requests+0.026
Neg logit contributions
str-0.026
 slice-0.024
ww-0.024
prise-0.024
iest-0.023
Tokenam
Token ID321
Feature activation-4.34e+00

Pos logit contributions
stick+0.089
 circles+0.084
 requests+0.082
lement+0.079
 robots+0.077
Neg logit contributions
w-0.271
str-0.265
ww-0.262
 slice-0.257
 bait-0.256
Token all
Token ID477
Feature activation-8.05e-01

Pos logit contributions
lee+0.100
stick+0.099
 requests+0.096
en+0.094
ishment+0.093
Neg logit contributions
str-0.086
ww-0.078
 slice-0.078
Not-0.077
iest-0.077
Token around
Token ID1088
Feature activation-6.73e-01

Pos logit contributions
stick+0.021
lee+0.021
 requests+0.020
ishment+0.020
lement+0.020
Neg logit contributions
str-0.014
iest-0.013
Not-0.012
ww-0.012
 slice-0.011
Token the
Token ID262
Feature activation-1.21e+00

Pos logit contributions
lee+0.018
stick+0.018
aled+0.017
ishment+0.017
 requests+0.017
Neg logit contributions
str-0.011
iest-0.010
 slice-0.010
aster-0.010
 happened-0.010
Token river
Token ID7850
Feature activation-8.21e-01

Pos logit contributions
stick+0.030
aled+0.028
lee+0.028
arted+0.027
lement+0.027
Neg logit contributions
str-0.025
 mel-0.022
iest-0.021
 coconut-0.021
ww-0.021
Token weak
Token ID4939
Feature activation-2.54e-01

Pos logit contributions
stick+0.025
lee+0.025
 requests+0.024
en+0.023
 heads+0.023
Neg logit contributions
str-0.025
ww-0.023
prise-0.022
 slice-0.022
aster-0.022
Token.
Token ID13
Feature activation-2.30e+00

Pos logit contributions
lee+0.006
stick+0.006
 requests+0.005
en+0.005
arted+0.005
Neg logit contributions
str-0.005
 slice-0.005
ww-0.005
iest-0.005
 happened-0.005
Token\n
Token ID198
Feature activation+8.47e-01

Pos logit contributions
stick+0.051
lee+0.050
en+0.049
 requests+0.048
ags+0.047
Neg logit contributions
str-0.049
 slice-0.044
ww-0.044
Not-0.043
 happened-0.042
Token\n
Token ID198
Feature activation+6.38e-01

Pos logit contributions
str+0.019
iest+0.018
ww+0.018
 slice+0.017
ipop+0.017
Neg logit contributions
lee-0.017
stick-0.017
 requests-0.016
ags-0.016
 circles-0.016
TokenBut
Token ID1537
Feature activation+4.77e+00

Pos logit contributions
str+0.012
 happened+0.011
iest+0.011
ipop+0.011
w+0.011
Neg logit contributions
lee-0.015
 requests-0.015
stick-0.014
 circles-0.014
 commands-0.014
Token soon
Token ID2582
Feature activation+5.46e+00

Pos logit contributions
str+0.111
arest+0.104
ww+0.103
iest+0.100
prise+0.099
Neg logit contributions
en-0.096
lee-0.096
ishment-0.093
 heads-0.091
stick-0.090
Token,
Token ID11
Feature activation+5.56e+00

Pos logit contributions
str+0.125
arest+0.117
prise+0.115
w+0.112
 slice+0.111
Neg logit contributions
lee-0.119
en-0.114
am-0.105
 circles-0.104
 requests-0.104
Token the
Token ID262
Feature activation+1.18e+00

Pos logit contributions
str+0.153
ww+0.141
arest+0.141
ha+0.139
prise+0.139
Neg logit contributions
 heads-0.083
en-0.082
lee-0.082
 houses-0.082
 requests-0.082
Token sky
Token ID6766
Feature activation+6.84e-01

Pos logit contributions
str+0.028
ww+0.027
prise+0.026
ipop+0.025
ha+0.025
Neg logit contributions
 heads-0.024
 circles-0.023
 requests-0.023
 houses-0.022
 backs-0.022
Token began
Token ID2540
Feature activation-2.13e+00

Pos logit contributions
str+0.014
prise+0.014
ww+0.014
 slice+0.013
ipop+0.013
Neg logit contributions
lee-0.015
en-0.015
stick-0.015
 heads-0.014
 requests-0.014
Token to
Token ID284
Feature activation-7.70e-01

Pos logit contributions
lee+0.056
stick+0.055
ags+0.055
 requests+0.054
aled+0.054
Neg logit contributions
str-0.035
 slice-0.031
Not-0.030
 happened-0.030
ww-0.029
Token cat
Token ID3797
Feature activation-1.97e+00

Pos logit contributions
str+0.006
ww+0.006
prise+0.006
 slice+0.006
arest+0.005
Neg logit contributions
stick-0.005
lee-0.005
 heads-0.005
 requests-0.005
en-0.005
Token the
Token ID262
Feature activation-7.21e-01

Pos logit contributions
lee+0.049
stick+0.049
 requests+0.047
en+0.047
arted+0.046
Neg logit contributions
str-0.035
ww-0.032
 slice-0.032
iest-0.031
prise-0.030
Token other
Token ID584
Feature activation+2.01e-01

Pos logit contributions
stick+0.014
 heads+0.014
lee+0.014
 requests+0.014
arted+0.013
Neg logit contributions
str-0.016
prise-0.015
ww-0.015
arest-0.015
 slice-0.015
Token.
Token ID13
Feature activation-2.88e+00

Pos logit contributions
str+0.005
 slice+0.005
ww+0.005
iest+0.005
Not+0.005
Neg logit contributions
lee-0.004
stick-0.004
 requests-0.003
ags-0.003
en-0.003
Token Finally
Token ID9461
Feature activation+5.50e+00

Pos logit contributions
en+0.057
ishment+0.057
lement+0.055
lee+0.055
ags+0.054
Neg logit contributions
Not-0.063
str-0.063
ww-0.056
 mel-0.056
 coconut-0.056
Token,
Token ID11
Feature activation+5.11e+00

Pos logit contributions
str+0.134
arest+0.125
prise+0.120
 slice+0.120
w+0.117
Neg logit contributions
lee-0.107
en-0.099
 circles-0.097
am-0.096
stick-0.092
Token the
Token ID262
Feature activation+6.94e-01

Pos logit contributions
arest+0.108
prise+0.107
str+0.107
iest+0.104
ww+0.101
Neg logit contributions
 houses-0.106
stick-0.104
 circles-0.102
arted-0.102
en-0.100
Token pile
Token ID14540
Feature activation+9.54e-01

Pos logit contributions
str+0.015
prise+0.014
ww+0.014
arest+0.014
aster+0.014
Neg logit contributions
stick-0.015
 requests-0.014
 heads-0.014
lee-0.014
 circles-0.014
Token separated
Token ID11266
Feature activation-8.97e-01

Pos logit contributions
str+0.016
ww+0.014
 slice+0.014
prise+0.014
arest+0.013
Neg logit contributions
stick-0.025
lee-0.025
en-0.024
arted-0.024
 requests-0.024
Token,
Token ID11
Feature activation+3.37e+00

Pos logit contributions
lee+0.021
stick+0.021
 requests+0.020
ags+0.020
aled+0.020
Neg logit contributions
str-0.017
ww-0.015
 slice-0.015
iest-0.015
Not-0.015
Token and
Token ID290
Feature activation+2.26e+00

Pos logit contributions
str+0.078
ww+0.071
 slice+0.071
prise+0.069
iest+0.069
Neg logit contributions
lee-0.068
stick-0.067
en-0.064
 requests-0.064
arted-0.063
Token all
Token ID477
Feature activation-3.23e-01

Pos logit contributions
str+0.012
 slice+0.011
ww+0.011
iest+0.011
Not+0.011
Neg logit contributions
lee-0.012
stick-0.011
 requests-0.011
en-0.011
arted-0.011
Token day
Token ID1110
Feature activation+2.00e+00

Pos logit contributions
stick+0.009
lee+0.009
en+0.009
ags+0.009
 requests+0.009
Neg logit contributions
str-0.005
 slice-0.004
prise-0.004
ww-0.004
arest-0.004
Token long
Token ID890
Feature activation+1.41e+00

Pos logit contributions
str+0.027
ww+0.024
 slice+0.024
arest+0.022
iest+0.022
Neg logit contributions
stick-0.059
lee-0.058
arted-0.057
aled-0.057
en-0.057
Token.
Token ID13
Feature activation-1.57e+00

Pos logit contributions
str+0.028
 slice+0.026
 happened+0.026
iest+0.025
ww+0.025
Neg logit contributions
lee-0.032
stick-0.031
arted-0.030
ishment-0.030
 requests-0.030
Token One
Token ID1881
Feature activation+4.27e+00

Pos logit contributions
stick+0.032
lee+0.032
arted+0.032
 heads+0.031
pered+0.030
Neg logit contributions
str-0.035
 slice-0.034
Not-0.033
aster-0.033
 mel-0.032
Token day
Token ID1110
Feature activation+5.58e+00

Pos logit contributions
prise+0.096
arest+0.095
str+0.092
keys+0.091
ha+0.087
Neg logit contributions
en-0.074
 requests-0.074
 circles-0.072
lee-0.070
am-0.069
Token,
Token ID11
Feature activation+5.56e+00

Pos logit contributions
prise+0.152
str+0.152
arest+0.152
comb+0.147
 slice+0.144
Neg logit contributions
am-0.077
lee-0.071
en-0.068
bed-0.063
 Playing-0.061
Token Tim
Token ID5045
Feature activation-2.12e+00

Pos logit contributions
prise+0.166
arest+0.166
str+0.165
ww+0.156
ha+0.151
Neg logit contributions
 houses-0.072
stick-0.064
bed-0.061
 circles-0.061
lee-0.058
Tokenmy
Token ID1820
Feature activation-1.79e+00

Pos logit contributions
stick+0.043
lee+0.043
arted+0.041
ags+0.041
 requests+0.041
Neg logit contributions
str-0.047
 slice-0.044
ww-0.044
 happened-0.043
w-0.042
Token's
Token ID338
Feature activation-1.57e+00

Pos logit contributions
lee+0.029
stick+0.028
en+0.027
 requests+0.027
am+0.026
Neg logit contributions
str-0.049
 slice-0.045
ww-0.045
arest-0.045
prise-0.045
Token mom
Token ID1995
Feature activation+4.73e-01

Pos logit contributions
stick+0.033
lee+0.032
 requests+0.032
 heads+0.031
am+0.030
Neg logit contributions
str-0.036
ww-0.033
ha-0.032
aster-0.032
arest-0.032
Token<|endoftext|>
Token ID50256
Feature activation-4.80e-01

Pos logit contributions
lee+0.013
stick+0.013
en+0.012
ags+0.012
 heads+0.012
Neg logit contributions
str-0.018
ww-0.017
 slice-0.017
Not-0.017
aster-0.016
TokenOnce
Token ID7454
Feature activation+3.37e+00

Pos logit contributions
lee+0.011
stick+0.011
en+0.011
 requests+0.011
arted+0.011
Neg logit contributions
str-0.009
 slice-0.008
ww-0.008
Not-0.008
iest-0.008
Token upon
Token ID2402
Feature activation+6.05e-01

Pos logit contributions
ww+0.064
str+0.062
 slice+0.062
prise+0.060
arest+0.059
Neg logit contributions
lee-0.082
am-0.073
arted-0.073
en-0.073
 requests-0.071
Token a
Token ID257
Feature activation+2.87e+00

Pos logit contributions
str+0.012
ww+0.010
 slice+0.010
iest+0.010
ipop+0.010
Neg logit contributions
stick-0.014
lee-0.014
 requests-0.014
 circles-0.014
ags-0.014
Token time
Token ID640
Feature activation+2.47e+00

Pos logit contributions
str+0.047
iest+0.044
Not+0.042
prise+0.042
 slice+0.041
Neg logit contributions
lee-0.074
 requests-0.072
ags-0.072
en-0.071
 heads-0.070
Token,
Token ID11
Feature activation+4.11e+00

Pos logit contributions
 slice+0.053
aster+0.052
str+0.052
ww+0.049
arest+0.049
Neg logit contributions
lee-0.051
am-0.048
stick-0.047
 circles-0.047
en-0.046
Token there
Token ID612
Feature activation+9.53e-01

Pos logit contributions
str+0.074
 reward+0.073
aster+0.072
iest+0.070
prise+0.069
Neg logit contributions
lee-0.093
sey-0.093
en-0.092
stick-0.091
arted-0.090
Token was
Token ID373
Feature activation+4.37e-02

Pos logit contributions
ww+0.021
str+0.021
prise+0.021
 slice+0.021
arest+0.021
Neg logit contributions
en-0.019
stick-0.018
lee-0.018
am-0.018
 circles-0.018
Token a
Token ID257
Feature activation+1.95e+00

Pos logit contributions
str+0.001
iest+0.001
ipop+0.001
 slice+0.001
Not+0.001
Neg logit contributions
lee-0.001
ags-0.001
stick-0.001
 circles-0.001
 requests-0.001
Token horse
Token ID8223
Feature activation+1.11e+00

Pos logit contributions
str+0.037
iest+0.035
w+0.034
arest+0.034
prise+0.033
Neg logit contributions
lee-0.045
 requests-0.043
 heads-0.042
stick-0.041
en-0.041
Token named
Token ID3706
Feature activation-1.37e+00

Pos logit contributions
str+0.022
ww+0.021
 slice+0.020
prise+0.020
Not+0.020
Neg logit contributions
stick-0.026
lee-0.025
en-0.024
 requests-0.024
am-0.023
Token<|endoftext|>
Token ID50256
Feature activation-2.11e-01

Pos logit contributions
lee+0.046
stick+0.046
en+0.044
 requests+0.044
arted+0.043
Neg logit contributions
str-0.045
ww-0.041
 slice-0.041
iest-0.040
Not-0.040
TokenOnce
Token ID7454
Feature activation+3.75e+00

Pos logit contributions
lee+0.005
stick+0.005
 requests+0.005
en+0.005
arted+0.005
Neg logit contributions
str-0.004
 slice-0.003
ww-0.003
iest-0.003
 happened-0.003
Token upon
Token ID2402
Feature activation+8.69e-01

Pos logit contributions
ww+0.077
prise+0.073
 slice+0.072
str+0.072
arest+0.072
Neg logit contributions
lee-0.086
am-0.079
arted-0.075
 requests-0.074
ags-0.074
Token a
Token ID257
Feature activation+3.15e+00

Pos logit contributions
str+0.019
ww+0.016
ipop+0.016
iest+0.016
ha+0.015
Neg logit contributions
 requests-0.019
 circles-0.019
ags-0.019
stick-0.019
lee-0.018
Token time
Token ID640
Feature activation+3.23e+00

Pos logit contributions
iest+0.059
str+0.059
ipop+0.054
prise+0.054
Not+0.053
Neg logit contributions
 requests-0.071
ags-0.070
lee-0.068
am-0.067
en-0.066
Token,
Token ID11
Feature activation+4.62e+00

Pos logit contributions
 slice+0.073
aster+0.072
str+0.071
ww+0.071
arest+0.070
Neg logit contributions
am-0.059
lee-0.057
azes-0.055
 circles-0.054
 requests-0.053
Token there
Token ID612
Feature activation+1.35e+00

Pos logit contributions
str+0.105
ww+0.103
 reward+0.101
aster+0.101
 mel+0.098
Neg logit contributions
am-0.083
ags-0.083
sey-0.081
pered-0.080
arted-0.076
Token was
Token ID373
Feature activation-1.85e-01

Pos logit contributions
ww+0.032
arest+0.032
str+0.032
prise+0.032
 slice+0.031
Neg logit contributions
en-0.024
am-0.023
lee-0.023
 circles-0.023
 requests-0.022
Token a
Token ID257
Feature activation+1.68e+00

Pos logit contributions
ags+0.004
lee+0.004
 circles+0.004
 requests+0.004
stick+0.004
Neg logit contributions
str-0.004
iest-0.003
ww-0.003
ipop-0.003
arest-0.003
Token gentle
Token ID10296
Feature activation-7.80e-01

Pos logit contributions
str+0.032
iest+0.031
arest+0.030
w+0.030
prise+0.030
Neg logit contributions
lee-0.037
 requests-0.035
 heads-0.035
stick-0.034
en-0.034
Token bunny
Token ID44915
Feature activation+4.64e-01

Pos logit contributions
stick+0.017
lee+0.017
en+0.016
 requests+0.016
 heads+0.016
Neg logit contributions
str-0.016
prise-0.015
arest-0.015
w-0.015
Not-0.015
Token arms
Token ID5101
Feature activation-1.47e+00

Pos logit contributions
lee+0.043
stick+0.042
en+0.040
 requests+0.040
arted+0.039
Neg logit contributions
str-0.090
 slice-0.086
ww-0.085
iest-0.083
Not-0.082
Token and
Token ID290
Feature activation-7.36e-02

Pos logit contributions
lee+0.033
stick+0.032
 requests+0.032
ishment+0.031
lement+0.031
Neg logit contributions
str-0.029
Not-0.027
 slice-0.027
iest-0.026
 happened-0.026
Token grabbed
Token ID13176
Feature activation-2.68e+00

Pos logit contributions
stick+0.002
lee+0.002
 requests+0.001
en+0.001
 heads+0.001
Neg logit contributions
str-0.002
ww-0.001
 slice-0.001
prise-0.001
iest-0.001
Token Tom
Token ID4186
Feature activation-1.44e+00

Pos logit contributions
stick+0.062
lee+0.062
 requests+0.059
en+0.059
aled+0.059
Neg logit contributions
str-0.055
ww-0.049
 slice-0.048
iest-0.047
prise-0.047
Token's
Token ID338
Feature activation-2.34e+00

Pos logit contributions
lee+0.027
stick+0.026
arted+0.025
 requests+0.024
en+0.024
Neg logit contributions
str-0.035
 slice-0.033
ww-0.033
Not-0.032
prise-0.032
Token leg
Token ID1232
Feature activation-5.70e-01

Pos logit contributions
stick+0.040
lee+0.039
 heads+0.039
 requests+0.038
ags+0.037
Neg logit contributions
str-0.063
ww-0.059
prise-0.057
ha-0.056
Not-0.056
Token.
Token ID13
Feature activation-2.85e+00

Pos logit contributions
lee+0.014
en+0.013
stick+0.012
arted+0.012
 requests+0.012
Neg logit contributions
str-0.011
 slice-0.010
Not-0.010
 mel-0.010
ww-0.010
Token It
Token ID632
Feature activation-9.34e-02

Pos logit contributions
lee+0.058
en+0.057
stick+0.056
 heads+0.054
ishment+0.053
Neg logit contributions
str-0.065
Not-0.062
 happened-0.060
 slice-0.060
aster-0.060
Token pulled
Token ID5954
Feature activation-2.72e+00

Pos logit contributions
stick+0.002
lee+0.002
 heads+0.002
 requests+0.002
aled+0.002
Neg logit contributions
str-0.002
prise-0.002
ww-0.002
arest-0.002
 slice-0.002
Token him
Token ID683
Feature activation-1.29e+00

Pos logit contributions
lee+0.065
stick+0.063
 requests+0.061
ishment+0.060
en+0.059
Neg logit contributions
str-0.050
ww-0.047
 slice-0.047
Not-0.047
ha-0.046
Token into
Token ID656
Feature activation-9.63e-01

Pos logit contributions
lee+0.028
stick+0.027
 requests+0.026
en+0.026
arted+0.025
Neg logit contributions
str-0.028
 slice-0.026
ww-0.026
Not-0.025
iest-0.025
Token But
Token ID887
Feature activation+4.36e+00

Pos logit contributions
 heads+0.047
en+0.047
stick+0.047
ishment+0.045
ags+0.044
Neg logit contributions
str-0.041
Not-0.040
 slice-0.038
ha-0.038
ww-0.038
Token when
Token ID618
Feature activation+7.02e+00

Pos logit contributions
str+0.093
ww+0.086
arest+0.085
 slice+0.082
iest+0.081
Neg logit contributions
stick-0.098
en-0.095
lee-0.093
 requests-0.091
 heads-0.091
Token she
Token ID673
Feature activation-5.71e-02

Pos logit contributions
str+0.127
 slice+0.115
arest+0.114
ww+0.109
iest+0.107
Neg logit contributions
stick-0.174
lee-0.173
en-0.169
 requests-0.168
 circles-0.165
Token opened
Token ID4721
Feature activation-4.43e+00

Pos logit contributions
lee+0.001
en+0.001
stick+0.001
 requests+0.001
am+0.001
Neg logit contributions
str-0.001
ww-0.001
arest-0.001
iest-0.001
Not-0.001
Token the
Token ID262
Feature activation-5.85e-01

Pos logit contributions
lee+0.107
stick+0.106
 requests+0.103
arted+0.101
ags+0.101
Neg logit contributions
str-0.082
 slice-0.074
ww-0.073
iest-0.072
Not-0.071
Token box
Token ID3091
Feature activation-8.00e-01

Pos logit contributions
 circles+0.011
stick+0.010
 heads+0.010
lee+0.010
 requests+0.010
Neg logit contributions
str-0.015
ww-0.014
aster-0.014
w-0.013
Not-0.013
Token,
Token ID11
Feature activation+4.89e+00

Pos logit contributions
lee+0.021
stick+0.021
en+0.020
 requests+0.020
arted+0.020
Neg logit contributions
str-0.014
ww-0.012
 slice-0.012
prise-0.012
Not-0.011
Token something
Token ID1223
Feature activation+3.56e+00

Pos logit contributions
str+0.093
ww+0.087
 slice+0.085
arest+0.083
prise+0.078
Neg logit contributions
stick-0.117
lee-0.117
 requests-0.114
en-0.111
 houses-0.111
Token made
Token ID925
Feature activation+3.22e-01

Pos logit contributions
prise+0.070
str+0.069
 slice+0.069
ww+0.069
aster+0.068
Neg logit contributions
lee-0.081
 requests-0.076
 circles-0.073
en-0.073
stick-0.072
Token her
Token ID607
Feature activation-1.30e+00

Pos logit contributions
str+0.009
ww+0.008
 slice+0.008
arest+0.008
Not+0.008
Neg logit contributions
lee-0.005
stick-0.005
en-0.005
 requests-0.005
arted-0.005
Token feel
Token ID1254
Feature activation-5.81e-01

Pos logit contributions
stick+0.027
lee+0.027
 requests+0.025
en+0.025
 heads+0.025
Neg logit contributions
str-0.030
ww-0.028
 slice-0.027
iest-0.027
Not-0.026
Token very
Token ID845
Feature activation+2.24e-01

Pos logit contributions
lee+0.034
stick+0.034
 requests+0.033
en+0.033
arted+0.032
Neg logit contributions
str-0.034
ww-0.032
 slice-0.031
Not-0.031
prise-0.030
Token messy
Token ID24097
Feature activation-5.65e-01

Pos logit contributions
str+0.004
 slice+0.004
ww+0.004
Not+0.004
iest+0.004
Neg logit contributions
lee-0.005
 requests-0.005
stick-0.005
arted-0.005
en-0.005
Token.
Token ID13
Feature activation-2.09e+00

Pos logit contributions
lee+0.012
stick+0.012
 requests+0.011
en+0.011
arted+0.011
Neg logit contributions
str-0.012
ww-0.011
 slice-0.011
iest-0.011
Not-0.011
Token "
Token ID366
Feature activation+4.11e-01

Pos logit contributions
ags+0.038
en+0.038
 heads+0.037
stick+0.037
aled+0.035
Neg logit contributions
str-0.051
Not-0.047
ww-0.047
 slice-0.046
ha-0.045
TokenYour
Token ID7120
Feature activation-5.69e-01

Pos logit contributions
str+0.007
 slice+0.006
iest+0.006
ww+0.006
ipop+0.006
Neg logit contributions
stick-0.011
 requests-0.010
lee-0.010
en-0.010
arted-0.010
Token feathers
Token ID27400
Feature activation+9.61e-01

Pos logit contributions
lee+0.012
stick+0.012
 heads+0.012
 requests+0.012
en+0.011
Neg logit contributions
str-0.012
arest-0.011
ww-0.011
Not-0.011
iest-0.011
Token are
Token ID389
Feature activation+1.47e-01

Pos logit contributions
str+0.017
 slice+0.015
aster+0.015
prise+0.015
ww+0.015
Neg logit contributions
stick-0.025
lee-0.024
en-0.024
 heads-0.023
 requests-0.023
Token all
Token ID477
Feature activation+4.36e-01

Pos logit contributions
str+0.003
ipop+0.003
ww+0.003
Not+0.003
aster+0.003
Neg logit contributions
lee-0.003
stick-0.003
 heads-0.003
en-0.003
ags-0.003
Token over
Token ID625
Feature activation+1.68e+00

Pos logit contributions
str+0.010
aster+0.009
ww+0.009
 slice+0.009
ipop+0.009
Neg logit contributions
stick-0.009
lee-0.009
en-0.009
arted-0.008
ags-0.008
Token the
Token ID262
Feature activation-1.38e+00

Pos logit contributions
str+0.037
ipop+0.034
ww+0.033
arest+0.032
 slice+0.032
Neg logit contributions
en-0.037
lee-0.036
stick-0.034
 circles-0.034
arted-0.034
Token place
Token ID1295
Feature activation+2.50e+00

Pos logit contributions
 heads+0.024
stick+0.023
lee+0.023
en+0.022
 houses+0.022
Neg logit contributions
str-0.034
ipop-0.034
prise-0.033
Not-0.033
arest-0.033
Token touch
Token ID3638
Feature activation-1.87e+00

Pos logit contributions
lee+0.020
stick+0.020
en+0.020
 requests+0.020
 heads+0.020
Neg logit contributions
str-0.015
ww-0.014
prise-0.013
iest-0.013
 slice-0.013
Token it
Token ID340
Feature activation+1.06e+00

Pos logit contributions
lee+0.049
stick+0.049
 requests+0.047
en+0.047
arted+0.047
Neg logit contributions
str-0.031
 slice-0.028
ww-0.028
iest-0.027
Not-0.026
Token.
Token ID13
Feature activation-2.45e+00

Pos logit contributions
str+0.022
 slice+0.021
ww+0.020
iest+0.020
Not+0.020
Neg logit contributions
lee-0.023
stick-0.023
 requests-0.022
arted-0.022
en-0.022
Token\n
Token ID198
Feature activation+1.99e+00

Pos logit contributions
stick+0.058
lee+0.056
en+0.056
aled+0.054
 heads+0.053
Neg logit contributions
str-0.047
Not-0.046
 slice-0.044
ww-0.042
 happened-0.041
Token\n
Token ID198
Feature activation+1.86e+00

Pos logit contributions
str+0.043
ww+0.041
 slice+0.041
iest+0.040
prise+0.039
Neg logit contributions
lee-0.042
stick-0.040
 requests-0.040
en-0.038
arted-0.038
Token"
Token ID1
Feature activation+2.47e+00

Pos logit contributions
str+0.034
iest+0.031
ipop+0.031
 happened+0.031
 slice+0.031
Neg logit contributions
lee-0.044
 requests-0.043
 heads-0.043
am-0.043
stick-0.042
TokenLet
Token ID5756
Feature activation+1.11e+00

Pos logit contributions
str+0.051
ipop+0.049
iest+0.048
prise+0.047
 slice+0.045
Neg logit contributions
 heads-0.051
lement-0.050
 requests-0.050
 circles-0.049
arted-0.049
Token's
Token ID338
Feature activation-3.12e+00

Pos logit contributions
str+0.036
ipop+0.035
prise+0.035
 slice+0.035
iest+0.034
Neg logit contributions
 requests-0.011
lee-0.011
en-0.011
stick-0.010
 heads-0.010
Token seat
Token ID5852
Feature activation-2.46e+00

Pos logit contributions
stick+0.070
lee+0.067
en+0.066
 requests+0.066
 heads+0.066
Neg logit contributions
str-0.065
iest-0.061
ww-0.060
arest-0.058
ipop-0.058
Token on
Token ID319
Feature activation+9.37e-02

Pos logit contributions
stick+0.047
lee+0.044
 heads+0.044
en+0.044
ags+0.043
Neg logit contributions
str-0.060
ww-0.055
prise-0.054
arest-0.054
 happened-0.054
Token the
Token ID262
Feature activation-2.32e+00

Pos logit contributions
str+0.002
prise+0.002
iest+0.002
ipop+0.002
ww+0.002
Neg logit contributions
stick-0.002
lee-0.002
en-0.002
 circles-0.002
ags-0.002
Token knew
Token ID2993
Feature activation+3.89e+00

Pos logit contributions
str+0.005
ww+0.005
arest+0.005
Not+0.005
iest+0.005
Neg logit contributions
lee-0.007
stick-0.007
en-0.006
 heads-0.006
arted-0.006
Token that
Token ID326
Feature activation+3.49e+00

Pos logit contributions
str+0.060
 slice+0.056
arest+0.055
ww+0.055
prise+0.054
Neg logit contributions
lee-0.106
stick-0.103
en-0.102
 requests-0.101
am-0.099
Token they
Token ID484
Feature activation+6.05e-01

Pos logit contributions
str+0.069
arest+0.063
ww+0.063
ipop+0.061
prise+0.060
Neg logit contributions
 requests-0.081
stick-0.079
en-0.079
 circles-0.078
lee-0.078
Token had
Token ID550
Feature activation-3.49e-01

Pos logit contributions
str+0.010
ww+0.010
aster+0.010
 slice+0.010
prise+0.009
Neg logit contributions
lee-0.015
stick-0.015
en-0.015
 heads-0.015
 requests-0.015
Token a
Token ID257
Feature activation+1.51e-01

Pos logit contributions
lee+0.009
stick+0.009
 requests+0.008
en+0.008
 heads+0.008
Neg logit contributions
str-0.006
ww-0.006
iest-0.006
 slice-0.006
arest-0.005
Token smart
Token ID4451
Feature activation-4.48e-01

Pos logit contributions
iest+0.003
w+0.003
ipop+0.003
str+0.003
aster+0.003
Neg logit contributions
 requests-0.003
lee-0.003
 circles-0.003
 commands-0.003
 houses-0.003
Token friend
Token ID1545
Feature activation+1.07e+00

Pos logit contributions
 requests+0.009
en+0.009
stick+0.009
lee+0.009
 heads+0.009
Neg logit contributions
str-0.010
arest-0.009
w-0.009
prise-0.009
ww-0.009
Token in
Token ID287
Feature activation-8.51e-02

Pos logit contributions
str+0.025
ww+0.024
arest+0.023
 coconut+0.023
 slice+0.023
Neg logit contributions
stick-0.021
lee-0.020
en-0.020
 requests-0.019
 circles-0.019
Token the
Token ID262
Feature activation-1.52e+00

Pos logit contributions
stick+0.002
en+0.002
lee+0.002
 circles+0.002
 requests+0.002
Neg logit contributions
str-0.002
ww-0.002
ipop-0.002
arest-0.002
w-0.002
Token turkey
Token ID26876
Feature activation+1.02e+00

Pos logit contributions
lee+0.030
stick+0.030
en+0.029
 requests+0.029
arted+0.028
Neg logit contributions
str-0.035
ww-0.032
 slice-0.032
iest-0.031
Not-0.031
Token who
Token ID508
Feature activation+1.85e+00

Pos logit contributions
str+0.024
ww+0.023
Not+0.022
 slice+0.022
arest+0.022
Neg logit contributions
stick-0.020
lee-0.019
en-0.018
 requests-0.018
 circles-0.018
Token?"
Token ID1701
Feature activation-2.57e-01

Pos logit contributions
stick+0.077
lee+0.076
en+0.074
 requests+0.073
arted+0.072
Neg logit contributions
str-0.042
 slice-0.037
ww-0.037
prise-0.035
Not-0.035
Token the
Token ID262
Feature activation+3.32e-02

Pos logit contributions
lee+0.006
stick+0.006
ags+0.006
en+0.006
arted+0.006
Neg logit contributions
str-0.005
ww-0.004
 happened-0.004
Not-0.004
ha-0.004
Token voice
Token ID3809
Feature activation-7.84e-01

Pos logit contributions
str+0.001
iest+0.001
ww+0.001
Not+0.001
 happened+0.001
Neg logit contributions
lee-0.001
stick-0.001
arted-0.001
en-0.001
ishment-0.001
Token asked
Token ID1965
Feature activation+5.46e-01

Pos logit contributions
lee+0.021
stick+0.021
 requests+0.020
en+0.020
arted+0.020
Neg logit contributions
str-0.013
 slice-0.011
ww-0.011
prise-0.011
aster-0.011
Token.
Token ID13
Feature activation-1.65e+00

Pos logit contributions
str+0.011
 happened+0.010
ww+0.010
iest+0.010
Not+0.009
Neg logit contributions
lee-0.012
stick-0.012
en-0.012
 requests-0.012
ags-0.012
Token
Token ID220
Feature activation-2.09e+00

Pos logit contributions
lee+0.037
en+0.036
stick+0.036
ishment+0.036
ags+0.036
Neg logit contributions
str-0.034
 happened-0.030
ww-0.030
Not-0.030
ha-0.030
Token\n
Token ID198
Feature activation+1.36e+00

Pos logit contributions
stick+0.053
lee+0.053
en+0.052
ishment+0.051
 heads+0.051
Neg logit contributions
str-0.037
 happened-0.031
ww-0.031
Not-0.031
 slice-0.031
Token\n
Token ID198
Feature activation+8.74e-01

Pos logit contributions
str+0.026
ww+0.024
 slice+0.023
Not+0.023
iest+0.023
Neg logit contributions
lee-0.032
stick-0.032
en-0.031
 requests-0.031
arted-0.030
TokenB
Token ID33
Feature activation-2.65e+00

Pos logit contributions
str+0.015
 slice+0.013
ww+0.013
iest+0.013
prise+0.013
Neg logit contributions
lee-0.023
stick-0.022
 requests-0.022
en-0.022
arted-0.021
Tokenuster
Token ID5819
Feature activation-2.33e+00

Pos logit contributions
lee+0.061
stick+0.060
 circles+0.060
 requests+0.060
ags+0.059
Neg logit contributions
str-0.053
ww-0.050
ha-0.049
Not-0.047
iest-0.046
Token looked
Token ID3114
Feature activation-3.07e+00

Pos logit contributions
ags+0.062
stick+0.061
lee+0.061
 requests+0.060
aled+0.059
Neg logit contributions
str-0.039
 slice-0.032
ww-0.032
 happened-0.032
ha-0.031
Token:
Token ID25
Feature activation+8.24e-01

Pos logit contributions
str+0.085
ww+0.078
 slice+0.077
prise+0.075
iest+0.074
Neg logit contributions
lee-0.084
stick-0.083
 requests-0.080
en-0.080
arted-0.078
Token to
Token ID284
Feature activation-1.67e+00

Pos logit contributions
str+0.014
Not+0.013
ww+0.013
 slice+0.013
 happened+0.012
Neg logit contributions
lee-0.021
stick-0.021
ags-0.020
arted-0.020
en-0.020
Token respect
Token ID2461
Feature activation+1.58e+00

Pos logit contributions
lee+0.042
stick+0.041
arted+0.040
 requests+0.039
en+0.039
Neg logit contributions
str-0.030
 slice-0.027
ww-0.027
aster-0.026
Not-0.026
Token other
Token ID584
Feature activation+1.23e+00

Pos logit contributions
str+0.039
ww+0.036
 slice+0.036
arest+0.036
prise+0.035
Neg logit contributions
stick-0.029
en-0.029
lee-0.029
 requests-0.028
 heads-0.028
Token people
Token ID661
Feature activation+1.90e+00

Pos logit contributions
str+0.030
ww+0.028
 slice+0.028
iest+0.027
Not+0.027
Neg logit contributions
lee-0.022
stick-0.022
arted-0.021
ags-0.021
aled-0.021
Token's
Token ID338
Feature activation-2.03e+00

Pos logit contributions
str+0.049
ww+0.045
 slice+0.045
prise+0.044
iest+0.044
Neg logit contributions
lee-0.033
stick-0.033
 requests-0.031
en-0.031
 heads-0.030
Token things
Token ID1243
Feature activation+3.39e+00

Pos logit contributions
lee+0.041
stick+0.041
arted+0.040
aled+0.039
en+0.038
Neg logit contributions
str-0.044
 slice-0.041
 happened-0.040
prise-0.040
 mel-0.040
Token and
Token ID290
Feature activation+7.89e-01

Pos logit contributions
str+0.080
ww+0.074
 slice+0.073
prise+0.072
aster+0.070
Neg logit contributions
lee-0.066
stick-0.065
en-0.064
 requests-0.063
 heads-0.062
Token to
Token ID284
Feature activation-1.99e+00

Pos logit contributions
str+0.017
ww+0.016
 slice+0.016
iest+0.015
ha+0.015
Neg logit contributions
stick-0.017
lee-0.017
 requests-0.016
 heads-0.016
en-0.016
Token be
Token ID307
Feature activation-2.06e+00

Pos logit contributions
lee+0.052
stick+0.051
arted+0.050
aled+0.049
en+0.049
Neg logit contributions
str-0.034
 slice-0.030
ww-0.030
Not-0.029
prise-0.029
Token careful
Token ID8161
Feature activation+5.38e-01

Pos logit contributions
lee+0.053
stick+0.052
en+0.051
arted+0.050
ishment+0.050
Neg logit contributions
str-0.036
 slice-0.032
ww-0.032
prise-0.031
Not-0.031
Token with
Token ID351
Feature activation-2.51e-01

Pos logit contributions
lee+0.038
stick+0.037
ags+0.036
en+0.036
 requests+0.035
Neg logit contributions
str-0.029
 slice-0.027
Not-0.027
ww-0.025
aster-0.025
Token her
Token ID607
Feature activation+3.37e-01

Pos logit contributions
lee+0.004
stick+0.004
 requests+0.003
ags+0.003
en+0.003
Neg logit contributions
str-0.007
ww-0.007
 slice-0.007
iest-0.007
arest-0.007
Token mom
Token ID1995
Feature activation-9.26e-02

Pos logit contributions
str+0.006
ww+0.006
 slice+0.006
Not+0.005
iest+0.005
Neg logit contributions
lee-0.008
stick-0.008
en-0.008
 requests-0.008
arted-0.008
Token and
Token ID290
Feature activation+1.34e+00

Pos logit contributions
lee+0.002
stick+0.002
en+0.002
 requests+0.002
arted+0.002
Neg logit contributions
str-0.002
ww-0.002
 slice-0.002
iest-0.002
prise-0.002
Token dad
Token ID9955
Feature activation-8.87e-02

Pos logit contributions
iest+0.029
arest+0.028
ww+0.027
str+0.027
 coconut+0.025
Neg logit contributions
 heads-0.029
lee-0.029
stick-0.028
 requests-0.027
 houses-0.027
Token.
Token ID13
Feature activation-2.17e+00

Pos logit contributions
lee+0.002
stick+0.002
 requests+0.002
en+0.002
arted+0.002
Neg logit contributions
str-0.002
ww-0.002
 slice-0.002
iest-0.002
prise-0.002
Token They
Token ID1119
Feature activation+1.69e-01

Pos logit contributions
lee+0.034
stick+0.034
en+0.032
 requests+0.031
arted+0.031
Neg logit contributions
str-0.059
 slice-0.055
ww-0.055
Not-0.054
iest-0.053
Token had
Token ID550
Feature activation-1.03e+00

Pos logit contributions
str+0.003
ww+0.003
 slice+0.003
iest+0.003
arest+0.003
Neg logit contributions
lee-0.004
stick-0.004
arted-0.004
en-0.004
 requests-0.004
Token a
Token ID257
Feature activation+1.12e-01

Pos logit contributions
lee+0.028
stick+0.028
en+0.027
 requests+0.027
ishment+0.027
Neg logit contributions
str-0.016
ww-0.014
 slice-0.014
Not-0.014
 happened-0.014
Token lot
Token ID1256
Feature activation+1.20e+00

Pos logit contributions
iest+0.002
aster+0.002
str+0.002
arest+0.002
ipop+0.002
Neg logit contributions
 requests-0.002
 heads-0.002
lee-0.002
 houses-0.002
 circles-0.002
Token of
Token ID286
Feature activation-1.17e+00

Pos logit contributions
str+0.018
 slice+0.016
ww+0.016
prise+0.015
aster+0.015
Neg logit contributions
lee-0.034
ags-0.033
stick-0.033
 circles-0.033
arted-0.032
Token cl
Token ID537
Feature activation-6.67e+00

Pos logit contributions
ags+0.033
stick+0.032
lee+0.031
ishment+0.031
 requests+0.031
Neg logit contributions
str-0.020
 slice-0.018
 happened-0.018
ha-0.017
ww-0.017
Tokenapped
Token ID6320
Feature activation-3.67e+00

Pos logit contributions
 requests+0.125
 commands+0.124
 backs+0.124
 heads+0.124
 houses+0.116
Neg logit contributions
prise-0.151
ww-0.151
 slice-0.148
w-0.147
ipop-0.142
Token her
Token ID607
Feature activation-1.94e+00

Pos logit contributions
stick+0.079
lee+0.078
lement+0.077
aled+0.077
ags+0.076
Neg logit contributions
 slice-0.069
Not-0.066
ww-0.065
iest-0.065
 happened-0.065
Token hands
Token ID2832
Feature activation-1.95e+00

Pos logit contributions
stick+0.027
lee+0.026
en+0.025
lement+0.024
 requests+0.024
Neg logit contributions
str-0.055
 slice-0.054
ww-0.053
ipop-0.051
 happened-0.051
Token with
Token ID351
Feature activation-2.48e-01

Pos logit contributions
lee+0.050
 requests+0.049
lement+0.049
ags+0.049
ishment+0.048
Neg logit contributions
 slice-0.031
str-0.029
Not-0.028
 mel-0.027
less-0.027
Token joy
Token ID8716
Feature activation-1.07e+00

Pos logit contributions
stick+0.005
ishment+0.005
lee+0.005
lement+0.005
am+0.005
Neg logit contributions
 slice-0.005
 happened-0.004
 mel-0.004
iest-0.004
 coconut-0.004
Token and
Token ID290
Feature activation-6.35e-02

Pos logit contributions
lee+0.024
stick+0.024
 requests+0.023
aled+0.023
ags+0.023
Neg logit contributions
str-0.021
 slice-0.020
ww-0.019
iest-0.019
 happened-0.019
Token started
Token ID2067
Feature activation-2.21e+00

Pos logit contributions
ags+0.002
stick+0.002
lee+0.001
arted+0.001
ishment+0.001
Neg logit contributions
str-0.001
 happened-0.001
ww-0.001
 slice-0.001
iest-0.001
Token running
Token ID2491
Feature activation-4.61e+00

Pos logit contributions
ags+0.072
ishment+0.070
lement+0.068
aled+0.066
 heads+0.065
Neg logit contributions
 happened-0.023
str-0.022
 mel-0.021
 they-0.020
less-0.019
Token along
Token ID1863
Feature activation-5.85e-01

Pos logit contributions
 requests+0.118
ishment+0.118
lement+0.117
lee+0.116
ags+0.115
Neg logit contributions
 while-0.073
ha-0.067
less-0.065
 bite-0.063
Not-0.063
Token the
Token ID262
Feature activation-1.81e+00

Pos logit contributions
ags+0.018
lee+0.018
aled+0.017
aters+0.017
am+0.017
Neg logit contributions
 while-0.006
 else-0.006
 that-0.006
str-0.006
 slice-0.006
Token is
Token ID318
Feature activation-1.12e-01

Pos logit contributions
str+0.059
ww+0.054
 slice+0.053
prise+0.052
Not+0.052
Neg logit contributions
stick-0.056
lee-0.056
 requests-0.054
en-0.053
 heads-0.052
Token Anna
Token ID11735
Feature activation+1.74e+00

Pos logit contributions
lee+0.002
stick+0.002
 requests+0.002
en+0.002
 circles+0.002
Neg logit contributions
str-0.003
arest-0.002
 slice-0.002
iest-0.002
prise-0.002
Token,
Token ID11
Feature activation+3.13e+00

Pos logit contributions
str+0.041
ww+0.038
 slice+0.038
Not+0.037
prise+0.037
Neg logit contributions
lee-0.033
stick-0.033
 requests-0.032
en-0.031
 heads-0.031
Token my
Token ID616
Feature activation-3.86e-01

Pos logit contributions
str+0.068
ww+0.060
arest+0.058
iest+0.058
w+0.057
Neg logit contributions
 requests-0.068
lee-0.068
en-0.068
stick-0.067
 heads-0.066
Token daughter
Token ID4957
Feature activation+1.57e+00

Pos logit contributions
lee+0.009
stick+0.009
 requests+0.008
en+0.008
 heads+0.008
Neg logit contributions
str-0.008
ww-0.007
 slice-0.007
Not-0.007
iest-0.007
Token.
Token ID13
Feature activation-1.63e+00

Pos logit contributions
str+0.033
ww+0.030
 slice+0.030
Not+0.029
iest+0.029
Neg logit contributions
lee-0.034
stick-0.034
 requests-0.033
en-0.033
 heads-0.032
Token She
Token ID1375
Feature activation+5.23e-01

Pos logit contributions
lee+0.032
ags+0.032
arted+0.031
ishment+0.031
 heads+0.030
Neg logit contributions
Not-0.035
str-0.034
ww-0.034
ipop-0.032
prise-0.032
Token likes
Token ID7832
Feature activation+1.12e-01

Pos logit contributions
str+0.010
ww+0.009
 slice+0.009
 happened+0.009
prise+0.009
Neg logit contributions
lee-0.013
stick-0.013
arted-0.012
ags-0.012
en-0.012
Token to
Token ID284
Feature activation-3.22e+00

Pos logit contributions
str+0.002
ww+0.002
 slice+0.002
iest+0.002
prise+0.002
Neg logit contributions
lee-0.003
stick-0.003
 requests-0.003
en-0.003
arted-0.002
Token come
Token ID1282
Feature activation-4.32e+00

Pos logit contributions
lee+0.071
stick+0.070
 requests+0.067
en+0.067
arted+0.067
Neg logit contributions
str-0.067
 slice-0.061
ww-0.061
iest-0.059
prise-0.059
Token with
Token ID351
Feature activation+8.56e-01

Pos logit contributions
lee+0.109
arted+0.103
ishment+0.102
 requests+0.101
aled+0.101
Neg logit contributions
str-0.072
 slice-0.069
prise-0.067
Not-0.067
ww-0.065
Tokenasis
Token ID17765
Feature activation+1.36e+00

Pos logit contributions
str+0.003
 slice+0.003
ww+0.003
 happened+0.003
Not+0.003
Neg logit contributions
lee-0.005
stick-0.005
am-0.005
en-0.005
ags-0.005
Token so
Token ID523
Feature activation+2.92e+00

Pos logit contributions
str+0.029
ww+0.027
prise+0.026
 slice+0.026
arest+0.025
Neg logit contributions
stick-0.030
lee-0.029
en-0.028
 requests-0.028
 heads-0.027
Token badly
Token ID11234
Feature activation+2.21e+00

Pos logit contributions
str+0.043
 slice+0.043
prise+0.041
ww+0.038
arest+0.036
Neg logit contributions
lee-0.082
am-0.078
stick-0.077
arted-0.077
 requests-0.076
Token.
Token ID13
Feature activation-2.57e+00

Pos logit contributions
str+0.048
 slice+0.047
prise+0.046
ww+0.045
arest+0.043
Neg logit contributions
stick-0.047
en-0.046
lee-0.045
 requests-0.043
arted-0.043
Token He
Token ID679
Feature activation-8.34e-01

Pos logit contributions
stick+0.051
en+0.051
ags+0.049
 heads+0.048
lee+0.047
Neg logit contributions
str-0.056
Not-0.054
aster-0.052
 mel-0.052
 happened-0.049
Token stood
Token ID6204
Feature activation-5.85e+00

Pos logit contributions
stick+0.022
lee+0.022
ags+0.021
 requests+0.021
en+0.021
Neg logit contributions
str-0.014
 slice-0.012
ww-0.012
ha-0.012
 happened-0.012
Token there
Token ID612
Feature activation-6.57e-01

Pos logit contributions
lee+0.130
stick+0.129
 requests+0.124
en+0.124
arted+0.122
Neg logit contributions
str-0.121
 slice-0.111
ww-0.110
Not-0.107
iest-0.107
Token for
Token ID329
Feature activation+8.87e-01

Pos logit contributions
stick+0.014
lee+0.014
en+0.014
 requests+0.014
arted+0.013
Neg logit contributions
str-0.014
ww-0.013
 slice-0.013
prise-0.012
iest-0.012
Token a
Token ID257
Feature activation+1.28e+00

Pos logit contributions
str+0.014
ww+0.012
 slice+0.012
iest+0.012
Not+0.012
Neg logit contributions
lee-0.024
stick-0.024
 requests-0.023
en-0.023
arted-0.023
Token moment
Token ID2589
Feature activation+2.17e+00

Pos logit contributions
w+0.028
iest+0.027
 happened+0.026
mmmm+0.025
ipop+0.024
Neg logit contributions
 manner-0.021
 commands-0.021
 requests-0.021
 arms-0.020
 circles-0.020
Token,
Token ID11
Feature activation+2.11e+00

Pos logit contributions
str+0.056
ww+0.053
prise+0.052
w+0.050
 coconut+0.049
Neg logit contributions
en-0.039
stick-0.038
lee-0.038
 circles-0.037
 requests-0.034
Token replied
Token ID8712
Feature activation+2.79e-02

Pos logit contributions
lee+0.006
stick+0.006
en+0.006
 requests+0.006
am+0.006
Neg logit contributions
str-0.005
 slice-0.005
arest-0.004
ww-0.004
aster-0.004
Token,
Token ID11
Feature activation+2.82e+00

Pos logit contributions
str+0.001
 slice+0.000
prise+0.000
ww+0.000
arest+0.000
Neg logit contributions
lee-0.001
stick-0.001
en-0.001
 requests-0.001
arted-0.001
Token �
Token ID6184
Feature activation-1.60e+00

Pos logit contributions
str+0.066
ww+0.060
 slice+0.060
Not+0.059
iest+0.059
Neg logit contributions
lee-0.055
stick-0.055
en-0.052
 requests-0.052
arted-0.051
Token�
Token ID95
Feature activation-4.26e+00

Pos logit contributions
stick+0.039
lee+0.038
 requests+0.037
en+0.037
arted+0.037
Neg logit contributions
str-0.032
ww-0.029
arest-0.027
ha-0.027
Not-0.027
Token€
Token ID26391
Feature activation-1.42e+00

Pos logit contributions
lee+0.096
 heads+0.096
arted+0.094
 requests+0.092
stick+0.092
Neg logit contributions
str-0.095
ww-0.082
aster-0.082
ha-0.080
arest-0.078
Token�
Token ID129
Feature activation-6.08e+00

Pos logit contributions
stick+0.034
lee+0.034
en+0.033
arted+0.033
 requests+0.033
Neg logit contributions
str-0.027
ww-0.024
 slice-0.024
prise-0.023
 happened-0.023
Token�
Token ID241
Feature activation+1.04e+00

Pos logit contributions
stick+0.125
 spirits+0.122
 houses+0.121
aters+0.120
ucks+0.117
Neg logit contributions
str-0.143
aster-0.117
ww-0.112
arest-0.107
ha-0.107
Tokenit
Token ID270
Feature activation-6.52e-01

Pos logit contributions
str+0.019
 slice+0.018
prise+0.018
ww+0.017
iest+0.017
Neg logit contributions
lee-0.024
stick-0.024
 requests-0.023
en-0.023
arted-0.023
Token is
Token ID318
Feature activation-1.07e+00

Pos logit contributions
stick+0.015
lee+0.015
 requests+0.014
en+0.014
 heads+0.014
Neg logit contributions
str-0.013
 slice-0.012
prise-0.012
Not-0.011
ww-0.011
Token a
Token ID257
Feature activation-2.47e-01

Pos logit contributions
lee+0.025
stick+0.025
en+0.024
 requests+0.024
 circles+0.023
Neg logit contributions
str-0.021
 slice-0.020
prise-0.019
ww-0.019
arest-0.019
Token rocking
Token ID38059
Feature activation-6.28e-01

Pos logit contributions
stick+0.005
lee+0.005
 circles+0.005
 requests+0.005
 heads+0.005
Neg logit contributions
str-0.005
prise-0.005
 slice-0.005
 happened-0.005
ipop-0.004
Token happy
Token ID3772
Feature activation+2.07e+00

Pos logit contributions
stick+0.046
lee+0.046
en+0.045
ishment+0.044
arted+0.044
Neg logit contributions
str-0.031
 slice-0.028
ww-0.026
 happened-0.026
prise-0.026
Token.
Token ID13
Feature activation-3.04e+00

Pos logit contributions
str+0.039
 slice+0.037
ww+0.035
Not+0.035
 happened+0.035
Neg logit contributions
stick-0.049
lee-0.049
aled-0.047
ags-0.047
arted-0.047
Token She
Token ID1375
Feature activation-9.24e-01

Pos logit contributions
en+0.071
am+0.071
stick+0.070
charged+0.069
lement+0.068
Neg logit contributions
Not-0.054
 happened-0.044
 thank-0.043
 that-0.043
 mel-0.042
Token smiled
Token ID13541
Feature activation-2.88e+00

Pos logit contributions
lee+0.023
stick+0.022
 requests+0.021
en+0.021
 heads+0.021
Neg logit contributions
str-0.017
ww-0.016
 slice-0.015
iest-0.015
prise-0.015
Token and
Token ID290
Feature activation-1.07e-01

Pos logit contributions
stick+0.069
lee+0.068
aled+0.066
lement+0.065
arted+0.065
Neg logit contributions
str-0.050
 slice-0.050
Not-0.049
iest-0.048
ww-0.048
Token cl
Token ID537
Feature activation-5.88e+00

Pos logit contributions
stick+0.003
lee+0.003
en+0.003
 requests+0.003
arted+0.003
Neg logit contributions
str-0.002
ww-0.002
 slice-0.002
 happened-0.002
Not-0.002
Tokenapped
Token ID6320
Feature activation-2.66e+00

Pos logit contributions
 backs+0.101
 requests+0.100
 heads+0.099
stick+0.099
 commands+0.097
Neg logit contributions
prise-0.143
 slice-0.143
ww-0.141
str-0.140
iest-0.140
Token her
Token ID607
Feature activation-1.07e+00

Pos logit contributions
stick+0.061
lee+0.060
aled+0.059
arted+0.058
lement+0.058
Neg logit contributions
 slice-0.049
Not-0.047
iest-0.045
str-0.045
ww-0.045
Token hands
Token ID2832
Feature activation-4.15e-01

Pos logit contributions
stick+0.017
lee+0.016
arted+0.016
en+0.016
lement+0.015
Neg logit contributions
 slice-0.029
str-0.029
ww-0.028
 reward-0.027
ipop-0.027
Token.
Token ID13
Feature activation-2.77e+00

Pos logit contributions
lee+0.011
lement+0.011
aled+0.011
arted+0.011
ishment+0.010
Neg logit contributions
 slice-0.007
str-0.006
Not-0.006
less-0.006
iest-0.006
Token She
Token ID1375
Feature activation-1.04e+00

Pos logit contributions
en+0.058
stick+0.058
ishment+0.055
am+0.055
 heads+0.054
Neg logit contributions
Not-0.057
 happened-0.052
 mel-0.050
 thank-0.049
 they-0.048
Token.
Token ID13
Feature activation-1.98e+00

Pos logit contributions
lee+0.006
stick+0.006
en+0.006
 requests+0.006
arted+0.006
Neg logit contributions
str-0.007
ww-0.006
 slice-0.006
prise-0.006
Not-0.006
Token\n
Token ID198
Feature activation+2.07e+00

Pos logit contributions
stick+0.049
lee+0.047
en+0.047
ags+0.046
 heads+0.046
Neg logit contributions
str-0.038
Not-0.035
ww-0.034
 slice-0.033
 happened-0.033
Token\n
Token ID198
Feature activation+1.56e+00

Pos logit contributions
str+0.045
ww+0.042
 slice+0.042
iest+0.041
prise+0.041
Neg logit contributions
lee-0.043
stick-0.042
 requests-0.041
en-0.041
arted-0.040
TokenTom
Token ID13787
Feature activation-2.70e+00

Pos logit contributions
str+0.027
 slice+0.025
prise+0.024
iest+0.024
ipop+0.023
Neg logit contributions
lee-0.039
stick-0.039
 requests-0.038
en-0.038
aled-0.038
Tokenmy
Token ID1820
Feature activation-2.07e+00

Pos logit contributions
ags+0.068
lee+0.065
stick+0.065
ishment+0.064
 commands+0.064
Neg logit contributions
str-0.048
ha-0.044
 slice-0.043
 happened-0.041
ww-0.041
Token stepped
Token ID10764
Feature activation-6.09e+00

Pos logit contributions
ags+0.057
stick+0.054
lee+0.054
ishment+0.053
arted+0.052
Neg logit contributions
str-0.035
ha-0.030
 happened-0.029
 slice-0.029
ww-0.029
Token inside
Token ID2641
Feature activation+5.09e-02

Pos logit contributions
stick+0.153
lee+0.151
ishment+0.146
en+0.145
 requests+0.144
Neg logit contributions
Not-0.101
str-0.100
 slice-0.097
prise-0.095
iest-0.094
Token,
Token ID11
Feature activation+2.19e+00

Pos logit contributions
str+0.001
ww+0.001
arest+0.001
prise+0.001
 slice+0.001
Neg logit contributions
lee-0.001
stick-0.001
 requests-0.001
en-0.001
 circles-0.001
Token closed
Token ID4838
Feature activation-2.57e+00

Pos logit contributions
str+0.047
ww+0.043
 slice+0.043
iest+0.042
prise+0.042
Neg logit contributions
lee-0.047
stick-0.046
 requests-0.044
en-0.044
arted-0.044
Token his
Token ID465
Feature activation-1.61e+00

Pos logit contributions
stick+0.057
lee+0.056
 requests+0.055
arted+0.054
en+0.053
Neg logit contributions
str-0.052
 slice-0.048
iest-0.047
ww-0.047
aster-0.046
Token eyes
Token ID2951
Feature activation-6.35e-01

Pos logit contributions
stick+0.019
arted+0.018
en+0.018
aled+0.017
lee+0.017
Neg logit contributions
 slice-0.049
str-0.049
 mel-0.048
 reward-0.047
ipop-0.047
Token Mitt
Token ID16627
Feature activation+4.79e-01

Pos logit contributions
en+0.043
ags+0.040
ishment+0.040
lee+0.040
lement+0.039
Neg logit contributions
str-0.034
ww-0.033
Not-0.032
 happened-0.032
ha-0.032
Tokenens
Token ID641
Feature activation-2.45e+00

Pos logit contributions
str+0.007
ww+0.006
 happened+0.006
ha+0.006
Not+0.006
Neg logit contributions
stick-0.014
lee-0.014
 requests-0.013
aled-0.013
ags-0.013
Token and
Token ID290
Feature activation+1.19e+00

Pos logit contributions
lee+0.060
stick+0.060
ags+0.058
 requests+0.058
en+0.058
Neg logit contributions
str-0.045
ww-0.040
 slice-0.040
Not-0.038
 happened-0.038
Token Spark
Token ID17732
Feature activation-1.96e+00

Pos logit contributions
str+0.023
ww+0.021
 mel+0.021
 happened+0.021
Not+0.020
Neg logit contributions
ags-0.028
lee-0.027
 requests-0.026
stick-0.026
en-0.026
Tokeny
Token ID88
Feature activation-1.47e+00

Pos logit contributions
stick+0.048
ags+0.048
 requests+0.048
lee+0.047
 commands+0.047
Neg logit contributions
str-0.034
ww-0.032
ha-0.032
w-0.032
 happened-0.031
Token sh
Token ID427
Feature activation-5.96e+00

Pos logit contributions
lee+0.038
stick+0.038
ags+0.037
 requests+0.037
en+0.037
Neg logit contributions
str-0.025
 slice-0.022
ww-0.021
 happened-0.021
Not-0.021
Tokenivered
Token ID6396
Feature activation-2.23e+00

Pos logit contributions
 houses+0.137
 circles+0.136
 requests+0.136
 dances+0.136
 goals+0.135
Neg logit contributions
w-0.115
str-0.115
ww-0.104
iest-0.098
uch-0.097
Token and
Token ID290
Feature activation-8.23e-01

Pos logit contributions
stick+0.045
lee+0.045
en+0.043
 requests+0.042
arted+0.042
Neg logit contributions
str-0.052
ww-0.048
prise-0.047
 slice-0.047
arest-0.046
Token looked
Token ID3114
Feature activation-2.56e+00

Pos logit contributions
lee+0.019
stick+0.019
en+0.018
ags+0.018
 requests+0.018
Neg logit contributions
str-0.016
ww-0.015
 slice-0.015
 happened-0.014
Not-0.014
Token around
Token ID1088
Feature activation-5.65e-01

Pos logit contributions
lee+0.051
stick+0.050
 requests+0.048
en+0.048
arted+0.047
Neg logit contributions
str-0.059
ww-0.055
 slice-0.055
iest-0.053
Not-0.053
Token.
Token ID13
Feature activation-2.55e+00

Pos logit contributions
lee+0.013
stick+0.013
aled+0.013
ags+0.012
 requests+0.012
Neg logit contributions
str-0.011
 slice-0.010
Not-0.010
 happened-0.010
ww-0.009