TOP ACTIVATIONS
MAX = 0.146

It was a big j igsaw puzzle and she wanted to
box and saw a j igsaw puzzle inside . He paused
would take out a j igsaw puzzle and try to solve
was a bunny called B oun cy . B oun cy
was very proud of the work he had done . All
little boy getting a buzz cut and a lady getting a
But then Susan felt a tug on her waist . She
her mom my gave the cash ier some money to pay
mom was proud of the work she had done !
another way to get the work done . <|endoftext|> Once there
He felt proud of the work he had just done ,
, Tim my felt a tug on his fishing rod .
. The girl had a work to do , it was
, but suddenly felt a tug . He looked back and
quickly checked for a fire extingu isher , but there wasn
happened . He felt a tug on his fishing line and
enough , he felt a tug on the net . He
. They also saw a tug boat which was to wing
She was proud of the work she was doing and it
he needed to give the cash ier some money . Tim

MIN ACTIVATIONS
MIN = -0.114

" Ho ho ho ! Merry Christmas , Anna and Ben
tw igs and to c aw and fly . He also
tw igs and to c aw and fly . He also
and dragons and knights and princess es . She wished she
" Ho ho ho ! Merry Christmas , Lily and Ben
, c aw , c aw ". <|endoftext|> Anna and Ben
d uk k ah and har issa and ber b ere
se ag ull s and pel icans . They heard horns
the stars and planets in outer space . So one day
towers , and knights and princess es ! Sus ie spent
" wow " and " aw w ". They are calm
We were playing knights and princess es . Tom was a
" C aw , c aw !" They had
pretended to be knights and princess es . They had a
" B rav o , brav o , you are wonderful
They felt like princes and princess es . They played with
their map and sword and hat . They have fun .
animals , cars , and princess es . Sometimes they bring
pretend to be princes and princess es . They have a
" c aw , c aw , c aw ". <|endoftext|>

INTERVAL 0.081 - 0.146
CONTAINS 0.014%

the vine she had che wed on earlier . She realized
impressed with Jerry 's hard work . From that day on
did , he felt a tug on the line ! He
so proud of the hard work she had done to make
was very proud of his work and his rake . <|endoftext|>

INTERVAL 0.016 - 0.081
CONTAINS 18.241%

Suddenly , they saw a door in the middle of the
walking outside , she saw a fuzzy inch worm ! She
” Dad smiled at her and said “
he had made a very special picture . <|endoftext|> Once upon
to his bedroom to learn how to use it .

INTERVAL -0.049 - 0.016
CONTAINS 81.091%

The man felt happy and his worries were gone . <|endoftext|>
. They feel sorry . They say , " Oh no
want a reward !" The other animals heard his yell and
to see that no one was watching . He quickly put
. One day , she invited Lisa to her house to

INTERVAL -0.114 - -0.049
CONTAINS 0.654%

like playing with little girls named Lily ."
. Let 's go and see !" Lily said .
our friend . And our notebook is ruined ."
her car , and her bell . They make loud noises
that they would always be best friends . <|endoftext|> One day
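
NOTE ON INTERVALS
The four INTERVAL blocks above look consistent with splitting the range from MIN to MAX into four equal-width bins, with the CONTAINS percentages covering all tokens. This is inferred from the printed numbers only; the dump does not state how the bins were built. A minimal sketch of that check, where `interval_edges` is a hypothetical helper and the bin count of 4 is an assumption:

```python
# Values copied from the MIN / MAX lines of the dump.
MIN_ACT = -0.114
MAX_ACT = 0.146

# Percentages copied from the CONTAINS lines, highest interval first.
CONTAINS = [0.014, 18.241, 81.091, 0.654]


def interval_edges(lo, hi, n_bins=4):
    """Hypothetical helper: equal-width bin edges from lo to hi, rounded to 3 places."""
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


edges = interval_edges(MIN_ACT, MAX_ACT)
total = sum(CONTAINS)

print(edges)            # bin boundaries, lowest to highest
print(round(total, 3))  # combined share of all four intervals, in percent
```

Under these assumptions the computed edges match the printed boundaries (-0.114, -0.049, 0.016, 0.081, 0.146) and the four shares add up to 100%.
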
Token It
Token ID632
Feature activation+1.67e-02

Pos logit contributions
sie+0.017
unny+0.017
 stool+0.017
aughter+0.016
otte+0.016
Neg logit contributions
aw-0.013
 flashes-0.010
 anything-0.010
 its-0.010
 wishes-0.010
Token was
Token ID373
Feature activation+7.48e-03

Pos logit contributions
sie+0.031
 stool+0.031
unny+0.031
ll+0.031
hetically+0.030
Neg logit contributions
aw-0.028
 flashes-0.021
 anything-0.021
 wishes-0.020
 its-0.020
Token a
Token ID257
Feature activation+3.55e-02

Pos logit contributions
 stool+0.014
sie+0.013
otte+0.012
nie+0.012
 aisle+0.012
Neg logit contributions
aw-0.013
 Leave-0.010
 wishes-0.010
 flashes-0.010
 War-0.009
Token big
Token ID1263
Feature activation+9.48e-03

Pos logit contributions
unny+0.070
sie+0.070
 stool+0.070
ll+0.068
ve+0.066
Neg logit contributions
aw-0.057
 its-0.044
 flashes-0.043
 anything-0.043
 such-0.042
Token j
Token ID474
Feature activation+4.48e-02

Pos logit contributions
unny+0.020
sie+0.019
ll+0.019
ve+0.019
otte+0.018
Neg logit contributions
aw-0.015
 its-0.012
 flashes-0.011
 anything-0.011
 wishes-0.011
Tokenigsaw
Token ID45636
Feature activation+1.43e-01

Pos logit contributions
hetically+0.041
unny+0.040
ll+0.039
 stool+0.038
stack+0.038
Neg logit contributions
aw-0.124
 its-0.100
 about-0.098
 flashes-0.098
 It-0.097
Token puzzle
Token ID15027
Feature activation+6.42e-02

Pos logit contributions
unny+0.315
ylan+0.279
ll+0.266
hetically+0.264
erek+0.263
Neg logit contributions
aw-0.209
 about-0.191
 such-0.173
 its-0.171
 pictures-0.171
Token and
Token ID290
Feature activation+1.81e-02

Pos logit contributions
unny+0.158
sie+0.154
ll+0.152
hetically+0.150
 stool+0.149
Neg logit contributions
aw-0.077
 its-0.055
 such-0.052
 anything-0.052
 flashes-0.049
Token she
Token ID673
Feature activation-2.52e-02

Pos logit contributions
 stool+0.034
sie+0.033
otte+0.033
beans+0.032
ll+0.032
Neg logit contributions
aw-0.031
 flashes-0.024
 Leave-0.023
 wishes-0.022
 anything-0.022
Token wanted
Token ID2227
Feature activation+1.77e-02

Pos logit contributions
aw+0.039
 its+0.031
 anything+0.030
 flashes+0.030
 such+0.029
Neg logit contributions
sie-0.050
unny-0.050
 stool-0.049
ll-0.049
ve-0.048
Token to
Token ID284
Feature activation+7.22e-03

Pos logit contributions
 stool+0.025
 first+0.024
 aisle+0.023
otte+0.022
sie+0.022
Neg logit contributions
aw-0.038
 flashes-0.032
 Leave-0.031
 wishes-0.028
cks-0.028
Token box
Token ID3091
Feature activation+2.72e-02

Pos logit contributions
ll+0.024
unny+0.024
sie+0.023
otte+0.022
hetically+0.021
Neg logit contributions
aw-0.037
 its-0.034
 wishes-0.034
 such-0.033
 anything-0.033
Token and
Token ID290
Feature activation+5.31e-03

Pos logit contributions
unny+0.061
sie+0.061
 stool+0.059
ll+0.059
otte+0.058
Neg logit contributions
aw-0.037
 its-0.029
 flashes-0.027
 such-0.026
 anything-0.026
Token saw
Token ID2497
Feature activation-3.83e-02

Pos logit contributions
unny+0.011
sie+0.011
 stool+0.011
ll+0.011
hetically+0.011
Neg logit contributions
aw-0.008
 its-0.006
 flashes-0.006
 anything-0.006
 such-0.006
Token a
Token ID257
Feature activation+1.75e-02

Pos logit contributions
aw+0.053
 Leave+0.043
 flashes+0.040
 aids+0.037
 anything+0.037
Neg logit contributions
 stool-0.078
 miner-0.076
sie-0.075
aughter-0.075
hetically-0.075
Token j
Token ID474
Feature activation+2.69e-02

Pos logit contributions
unny+0.032
sie+0.032
ll+0.031
 stool+0.031
hetically+0.030
Neg logit contributions
aw-0.030
 its-0.025
 flashes-0.024
 anything-0.024
 wishes-0.023
Tokenigsaw
Token ID45636
Feature activation+1.34e-01

Pos logit contributions
hetically+0.028
unny+0.027
 stool+0.027
comb+0.027
stack+0.027
Neg logit contributions
aw-0.071
 its-0.057
 wishes-0.056
 about-0.056
 It-0.055
Token puzzle
Token ID15027
Feature activation+5.33e-02

Pos logit contributions
unny+0.315
ylan+0.270
otte+0.267
sie+0.263
ll+0.262
Neg logit contributions
 about-0.166
aw-0.161
 its-0.158
 pictures-0.153
 it-0.151
Token inside
Token ID2641
Feature activation+4.16e-02

Pos logit contributions
unny+0.132
sie+0.127
otte+0.123
ll+0.123
hetically+0.121
Neg logit contributions
aw-0.068
 its-0.052
 such-0.048
 anything-0.048
 about-0.047
Token.
Token ID13
Feature activation+4.75e-03

Pos logit contributions
 stool+0.071
 aisle+0.068
ve+0.067
 miner+0.066
ll+0.066
Neg logit contributions
aw-0.071
 Leave-0.061
 flashes-0.058
 aids-0.055
 wishes-0.054
Token He
Token ID679
Feature activation+1.27e-02

Pos logit contributions
 stool+0.010
unny+0.010
sie+0.010
ll+0.010
aughter+0.010
Neg logit contributions
aw-0.007
 its-0.005
 flashes-0.005
 anything-0.005
 wishes-0.005
Token paused
Token ID24487
Feature activation+1.15e-02

Pos logit contributions
unny+0.027
sie+0.027
 stool+0.027
ll+0.026
aughter+0.026
Neg logit contributions
aw-0.019
 its-0.014
 flashes-0.014
 wishes-0.013
 such-0.013
Token would
Token ID561
Feature activation-3.15e-02

Pos logit contributions
aw+0.028
 its+0.022
 wishes+0.022
 drawn+0.021
 anything+0.021
Neg logit contributions
unny-0.037
sie-0.036
aughter-0.035
hetically-0.035
 stool-0.035
Token take
Token ID1011
Feature activation+6.37e-03

Pos logit contributions
aw+0.043
 its+0.039
 anything+0.038
 wishes+0.038
 such+0.035
Neg logit contributions
unny-0.068
ll-0.066
sie-0.065
aughter-0.064
ve-0.063
Token out
Token ID503
Feature activation-2.30e-02

Pos logit contributions
sie+0.013
 stool+0.013
ve+0.012
 miner+0.012
unny+0.012
Neg logit contributions
aw-0.010
 flashes-0.007
 Leave-0.007
 its-0.006
 wishes-0.006
Token a
Token ID257
Feature activation-1.75e-03

Pos logit contributions
aw+0.030
 its+0.021
 flashes+0.021
 anything+0.020
 wishes+0.020
Neg logit contributions
sie-0.052
 stool-0.052
unny-0.052
ll-0.050
hetically-0.050
Token j
Token ID474
Feature activation+2.77e-02

Pos logit contributions
aw+0.003
 anything+0.002
 its+0.002
 wishes+0.002
 about+0.002
Neg logit contributions
unny-0.003
hetically-0.003
ll-0.003
sie-0.003
otte-0.003
Tokenigsaw
Token ID45636
Feature activation+1.32e-01

Pos logit contributions
stack+0.029
unny+0.028
hetically+0.028
comb+0.028
 stool+0.027
Neg logit contributions
aw-0.072
 about-0.058
 its-0.058
 wishes-0.057
 anything-0.057
Token puzzle
Token ID15027
Feature activation+5.25e-02

Pos logit contributions
unny+0.280
stack+0.250
ylan+0.250
otte+0.238
hetically+0.237
Neg logit contributions
aw-0.186
 about-0.185
 pictures-0.180
 more-0.164
 its-0.159
Token and
Token ID290
Feature activation-2.16e-03

Pos logit contributions
unny+0.134
sie+0.130
hetically+0.128
otte+0.127
ll+0.127
Neg logit contributions
aw-0.060
 its-0.045
 about-0.043
 such-0.042
 anything-0.042
Token try
Token ID1949
Feature activation-2.01e-02

Pos logit contributions
aw+0.004
 flashes+0.003
 its+0.003
 anything+0.003
 such+0.003
Neg logit contributions
 stool-0.004
sie-0.004
unny-0.004
ll-0.004
hetically-0.004
Token to
Token ID284
Feature activation-5.67e-03

Pos logit contributions
aw+0.034
 Leave+0.027
 flashes+0.027
 wishes+0.025
 aids+0.023
Neg logit contributions
 stool-0.040
ve-0.037
ll-0.036
oris-0.035
sie-0.035
Token solve
Token ID8494
Feature activation+2.20e-02

Pos logit contributions
aw+0.010
 its+0.008
 anything+0.008
 flashes+0.008
 such+0.008
Neg logit contributions
unny-0.010
sie-0.010
 stool-0.010
ll-0.010
aughter-0.010
Token was
Token ID373
Feature activation-2.48e-02

Pos logit contributions
unny+0.021
ll+0.021
 stool+0.020
sie+0.020
ve+0.020
Neg logit contributions
aw-0.023
 flashes-0.018
 anything-0.017
 Leave-0.017
 aids-0.017
Token a
Token ID257
Feature activation+1.65e-02

Pos logit contributions
aw+0.033
 flashes+0.024
 anything+0.024
 its+0.023
 wishes+0.023
Neg logit contributions
sie-0.054
unny-0.054
 stool-0.054
otte-0.052
ve-0.052
Token bunny
Token ID44915
Feature activation+9.67e-03

Pos logit contributions
hetically+0.034
ll+0.033
unny+0.033
sie+0.033
aughter+0.031
Neg logit contributions
aw-0.026
 its-0.021
 flashes-0.019
 wishes-0.019
 anything-0.018
Token called
Token ID1444
Feature activation-2.97e-02

Pos logit contributions
unny+0.024
sie+0.024
 stool+0.024
ll+0.023
otte+0.023
Neg logit contributions
aw-0.010
 flashes-0.007
 anything-0.007
 such-0.007
 its-0.006
Token B
Token ID347
Feature activation+6.29e-02

Pos logit contributions
aw+0.049
 its+0.043
 anything+0.041
 such+0.040
 It+0.038
Neg logit contributions
sie-0.055
ll-0.055
 stool-0.054
hetically-0.053
ve-0.053
Tokenoun
Token ID977
Feature activation+1.18e-01

Pos logit contributions
 stool+0.093
hetically+0.086
 boulder+0.084
sie+0.084
stack+0.082
Neg logit contributions
aw-0.135
 its-0.113
 It-0.112
 Its-0.108
 about-0.104
Tokency
Token ID948
Feature activation+3.90e-02

Pos logit contributions
 stool+0.166
igsaw+0.136
hetically+0.134
 boulder+0.131
 grocer+0.128
Neg logit contributions
aw-0.275
e-0.253
 its-0.250
 such-0.238
 anything-0.221
Token.
Token ID13
Feature activation+2.16e-03

Pos logit contributions
unny+0.090
sie+0.090
 stool+0.090
ll+0.088
ve+0.086
Neg logit contributions
aw-0.049
 its-0.036
 flashes-0.034
 anything-0.034
 such-0.033
Token B
Token ID347
Feature activation+3.41e-02

Pos logit contributions
unny+0.004
 stool+0.004
sie+0.004
ll+0.004
ve+0.004
Neg logit contributions
aw-0.004
 its-0.003
 flashes-0.003
 anything-0.003
 wishes-0.003
Tokenoun
Token ID977
Feature activation+9.23e-02

Pos logit contributions
 stool+0.037
comb+0.031
sie+0.029
stack+0.027
hetically+0.027
Neg logit contributions
aw-0.093
 It-0.083
 Leave-0.079
 its-0.078
 such-0.078
Tokency
Token ID948
Feature activation+1.06e-02

Pos logit contributions
 stool+0.114
hetically+0.086
 boulder+0.082
 grocer+0.082
ipop+0.082
Neg logit contributions
aw-0.236
 its-0.213
e-0.208
 such-0.205
 It-0.196
Token was
Token ID373
Feature activation-2.11e-03

Pos logit contributions
aw+0.022
 flashes+0.018
 anything+0.018
 its+0.018
 such+0.017
Neg logit contributions
sie-0.023
 stool-0.023
ll-0.023
ve-0.022
unny-0.022
Token very
Token ID845
Feature activation+4.55e-03

Pos logit contributions
aw+0.004
 flashes+0.003
 Leave+0.003
 wishes+0.003
 its+0.003
Neg logit contributions
 stool-0.004
sie-0.004
ve-0.004
ll-0.004
otte-0.004
Token proud
Token ID6613
Feature activation+3.07e-02

Pos logit contributions
sie+0.008
 stool+0.008
ll+0.008
ve+0.008
otte+0.008
Neg logit contributions
aw-0.008
 flashes-0.007
 its-0.006
 anything-0.006
 Leave-0.006
Token of
Token ID286
Feature activation+1.81e-02

Pos logit contributions
 stool+0.064
ll+0.063
sie+0.063
unny+0.061
ve+0.061
Neg logit contributions
aw-0.044
 flashes-0.033
 its-0.033
 anything-0.032
 Leave-0.031
Token the
Token ID262
Feature activation+3.71e-02

Pos logit contributions
sie+0.033
 stool+0.033
ll+0.032
ve+0.032
otte+0.032
Neg logit contributions
aw-0.032
 flashes-0.026
 Leave-0.023
 anything-0.023
 its-0.023
Token work
Token ID670
Feature activation+1.10e-01

Pos logit contributions
unny+0.080
sie+0.074
ll+0.074
otte+0.073
hetically+0.072
Neg logit contributions
aw-0.056
 anything-0.047
 wishes-0.045
 such-0.043
 flashes-0.043
Token he
Token ID339
Feature activation+1.24e-02

Pos logit contributions
unny+0.245
sie+0.241
 stool+0.237
ll+0.234
otte+0.231
Neg logit contributions
aw-0.154
 its-0.118
 anything-0.112
 such-0.111
 flashes-0.110
Token had
Token ID550
Feature activation+6.14e-03

Pos logit contributions
sie+0.020
ve+0.020
ll+0.020
 stool+0.020
 miner+0.019
Neg logit contributions
aw-0.024
 flashes-0.020
 its-0.020
 anything-0.019
 Leave-0.019
Token done
Token ID1760
Feature activation+3.37e-02

Pos logit contributions
ve+0.010
ll+0.010
sie+0.010
 stool+0.010
 miner+0.009
Neg logit contributions
aw-0.011
 flashes-0.010
 Leave-0.009
 wishes-0.009
 its-0.009
Token.
Token ID13
Feature activation+2.00e-02

Pos logit contributions
 stool+0.064
ve+0.062
aughter+0.059
ll+0.059
ylan+0.059
Neg logit contributions
aw-0.054
 flashes-0.046
 Leave-0.042
 wishes-0.039
 its-0.037
Token All
Token ID1439
Feature activation+5.46e-03

Pos logit contributions
unny+0.041
 stool+0.041
sie+0.040
ll+0.040
aughter+0.040
Neg logit contributions
aw-0.030
 flashes-0.023
 anything-0.023
 its-0.022
 wishes-0.021
Token little
Token ID1310
Feature activation+2.58e-02

Pos logit contributions
 stool+0.044
unny+0.044
sie+0.044
ll+0.042
ve+0.042
Neg logit contributions
aw-0.047
 its-0.038
 flashes-0.037
 anything-0.037
 such-0.036
Token boy
Token ID2933
Feature activation+2.43e-02

Pos logit contributions
 stool+0.033
sie+0.033
unny+0.032
ll+0.030
hetically+0.030
Neg logit contributions
aw-0.059
 its-0.050
 anything-0.049
 flashes-0.049
 such-0.048
Token getting
Token ID1972
Feature activation+3.46e-02

Pos logit contributions
 stool+0.039
ll+0.039
ve+0.038
hetically+0.038
 miner+0.037
Neg logit contributions
aw-0.040
 flashes-0.036
 anything-0.035
 its-0.033
 wishes-0.033
Token a
Token ID257
Feature activation+4.22e-02

Pos logit contributions
sie+0.064
 stool+0.064
otte+0.062
unny+0.062
hetically+0.061
Neg logit contributions
aw-0.056
 wishes-0.042
 flashes-0.042
 its-0.042
 Leave-0.041
Token buzz
Token ID14713
Feature activation+2.01e-02

Pos logit contributions
unny+0.072
sie+0.071
ll+0.069
 stool+0.069
hetically+0.067
Neg logit contributions
aw-0.079
 its-0.065
 anything-0.064
 flashes-0.063
 such-0.062
Tokencut
Token ID8968
Feature activation+1.10e-01

Pos logit contributions
unny+0.040
sie+0.040
 stool+0.040
ll+0.039
ve+0.038
Neg logit contributions
aw-0.032
 its-0.025
 flashes-0.024
 anything-0.024
 such-0.023
Token and
Token ID290
Feature activation+1.24e-02

Pos logit contributions
unny+0.178
sie+0.178
 stool+0.171
oris+0.166
ll+0.166
Neg logit contributions
aw-0.228
 its-0.190
 such-0.182
 anything-0.180
 flashes-0.172
Token a
Token ID257
Feature activation+1.25e-02

Pos logit contributions
 stool+0.022
unny+0.022
sie+0.021
hetically+0.021
ll+0.021
Neg logit contributions
aw-0.023
 anything-0.017
 flashes-0.017
 wishes-0.017
 such-0.017
Token lady
Token ID10846
Feature activation+1.99e-02

Pos logit contributions
unny+0.020
sie+0.020
 stool+0.020
ll+0.019
otte+0.019
Neg logit contributions
aw-0.024
 its-0.020
 flashes-0.020
 anything-0.020
 such-0.019
Token getting
Token ID1972
Feature activation+1.94e-02

Pos logit contributions
ve+0.035
hetically+0.034
ll+0.033
sie+0.033
ley+0.033
Neg logit contributions
aw-0.033
 anything-0.029
 flashes-0.027
 such-0.027
 wishes-0.026
Token a
Token ID257
Feature activation+3.51e-02

Pos logit contributions
sie+0.036
 stool+0.036
unny+0.035
hetically+0.034
otte+0.034
Neg logit contributions
aw-0.032
 flashes-0.024
 wishes-0.024
 Leave-0.024
 its-0.023
TokenBut
Token ID1537
Feature activation-7.26e-03

Pos logit contributions
aw+0.003
 its+0.002
 flashes+0.002
 anything+0.002
 such+0.002
Neg logit contributions
sie-0.003
unny-0.003
 stool-0.003
ll-0.003
hetically-0.003
Token then
Token ID788
Feature activation-4.14e-03

Pos logit contributions
aw+0.010
 flashes+0.008
 wishes+0.007
 its+0.007
 anything+0.007
Neg logit contributions
 stool-0.016
sie-0.015
unny-0.015
hetically-0.015
ll-0.015
Token Susan
Token ID14919
Feature activation+1.09e-02

Pos logit contributions
aw+0.007
 Leave+0.006
 flashes+0.006
 wishes+0.005
 aids+0.005
Neg logit contributions
 stool-0.007
sie-0.007
comb-0.007
 miner-0.007
ve-0.007
Token felt
Token ID2936
Feature activation-7.10e-03

Pos logit contributions
sie+0.020
unny+0.020
ll+0.020
ve+0.020
 stool+0.020
Neg logit contributions
aw-0.017
 flashes-0.015
 anything-0.015
 such-0.014
 its-0.014
Token a
Token ID257
Feature activation+3.18e-02

Pos logit contributions
aw+0.011
 flashes+0.008
 Leave+0.008
 its+0.008
 wishes+0.007
Neg logit contributions
sie-0.014
 stool-0.014
unny-0.013
hetically-0.013
 grocer-0.013
Token tug
Token ID27762
Feature activation+1.09e-01

Pos logit contributions
ll+0.069
aughter+0.066
stack+0.065
otte+0.065
sie+0.065
Neg logit contributions
aw-0.045
 its-0.039
 wishes-0.038
 anything-0.036
 flashes-0.036
Token on
Token ID319
Feature activation+8.57e-03

Pos logit contributions
unny+0.232
sie+0.228
 stool+0.225
ll+0.224
aughter+0.218
Neg logit contributions
aw-0.162
 its-0.127
 anything-0.120
 such-0.118
 flashes-0.116
Token her
Token ID607
Feature activation-1.72e-03

Pos logit contributions
 stool+0.018
hetically+0.016
sie+0.016
 boulder+0.016
comb+0.016
Neg logit contributions
aw-0.014
 Leave-0.010
 flashes-0.009
 drawn-0.009
 wishes-0.008
Token waist
Token ID16139
Feature activation+6.81e-03

Pos logit contributions
aw+0.003
 its+0.003
 anything+0.003
 wishes+0.002
 such+0.002
Neg logit contributions
sie-0.003
unny-0.003
ll-0.003
otte-0.003
aughter-0.003
Token.
Token ID13
Feature activation+1.55e-02

Pos logit contributions
sie+0.015
unny+0.015
 stool+0.015
ll+0.015
otte+0.015
Neg logit contributions
aw-0.009
 its-0.007
 anything-0.006
 flashes-0.006
 such-0.006
Token She
Token ID1375
Feature activation+8.59e-03

Pos logit contributions
 stool+0.034
unny+0.034
sie+0.033
hetically+0.032
aughter+0.032
Neg logit contributions
aw-0.021
 anything-0.015
 flashes-0.015
 its-0.015
 wishes-0.015
Token her
Token ID607
Feature activation-3.86e-03

Pos logit contributions
 stool+0.026
sie+0.024
otte+0.024
unny+0.023
aughter+0.022
Neg logit contributions
aw-0.024
 wishes-0.018
 Leave-0.018
 flashes-0.017
 drawn-0.016
Token mom
Token ID1995
Feature activation+5.60e-05

Pos logit contributions
aw+0.007
 its+0.006
 flashes+0.005
 anything+0.005
 such+0.005
Neg logit contributions
 stool-0.007
unny-0.007
sie-0.007
ll-0.007
ve-0.007
Tokenmy
Token ID1820
Feature activation+9.27e-03

Pos logit contributions
unny+0.000
 stool+0.000
sie+0.000
ll+0.000
hetically+0.000
Neg logit contributions
aw-0.000
 its-0.000
 anything-0.000
 flashes-0.000
 wishes-0.000
Token gave
Token ID2921
Feature activation+2.11e-02

Pos logit contributions
unny+0.020
sie+0.020
 stool+0.020
aughter+0.019
otte+0.019
Neg logit contributions
aw-0.013
 its-0.010
 such-0.009
 flashes-0.009
 about-0.009
Token the
Token ID262
Feature activation+2.23e-02

Pos logit contributions
sie+0.044
aughter+0.042
otte+0.041
unny+0.041
 stool+0.041
Neg logit contributions
aw-0.031
 flashes-0.025
 Leave-0.022
 aids-0.022
 drawn-0.021
Token cash
Token ID5003
Feature activation+1.08e-01

Pos logit contributions
unny+0.040
sie+0.040
 stool+0.040
ll+0.039
hetically+0.038
Neg logit contributions
aw-0.040
 its-0.032
 anything-0.031
 flashes-0.031
 such-0.031
Tokenier
Token ID959
Feature activation+1.10e-02

Pos logit contributions
unny+0.174
 stool+0.171
sie+0.165
stack+0.163
 slope+0.162
Neg logit contributions
aw-0.230
 such-0.175
 about-0.174
 anything-0.173
 its-0.172
Token some
Token ID617
Feature activation+3.19e-02

Pos logit contributions
unny+0.024
sie+0.024
 stool+0.023
ll+0.023
ve+0.022
Neg logit contributions
aw-0.016
 its-0.012
 flashes-0.012
 anything-0.011
 such-0.011
Token money
Token ID1637
Feature activation+7.00e-02

Pos logit contributions
 stool+0.055
sie+0.055
unny+0.053
ve+0.053
ll+0.052
Neg logit contributions
aw-0.056
 flashes-0.047
 its-0.047
 anything-0.045
 Leave-0.044
Token to
Token ID284
Feature activation-1.63e-04

Pos logit contributions
unny+0.174
sie+0.173
 stool+0.167
otte+0.165
hetically+0.164
Neg logit contributions
aw-0.082
 its-0.060
 such-0.056
 anything-0.055
 about-0.053
Token pay
Token ID1414
Feature activation+3.45e-02

Pos logit contributions
aw+0.000
 flashes+0.000
 Leave+0.000
 wishes+0.000
 Its+0.000
Neg logit contributions
 stool-0.000
sie-0.000
otte-0.000
ll-0.000
 miner-0.000
Token mom
Token ID1995
Feature activation-6.29e-03

Pos logit contributions
aw+0.023
 its+0.018
 anything+0.017
 flashes+0.017
 such+0.017
Neg logit contributions
 stool-0.021
sie-0.019
unny-0.019
ll-0.019
ve-0.019
Token was
Token ID373
Feature activation+1.08e-02

Pos logit contributions
aw+0.011
 anything+0.009
 flashes+0.009
 its+0.009
 wishes+0.008
Neg logit contributions
ll-0.012
sie-0.011
unny-0.011
 stool-0.011
ve-0.011
Token proud
Token ID6613
Feature activation+2.65e-02

Pos logit contributions
 stool+0.019
otte+0.019
ve+0.018
sie+0.018
ll+0.018
Neg logit contributions
aw-0.019
 flashes-0.015
 Leave-0.015
 wishes-0.014
 its-0.013
Token of
Token ID286
Feature activation+1.37e-02

Pos logit contributions
 stool+0.054
ll+0.054
sie+0.053
unny+0.052
otte+0.051
Neg logit contributions
aw-0.040
 flashes-0.030
 its-0.029
 Leave-0.028
 anything-0.028
Token the
Token ID262
Feature activation+2.62e-02

Pos logit contributions
sie+0.030
otte+0.029
 stool+0.029
unny+0.028
hetically+0.028
Neg logit contributions
aw-0.020
 flashes-0.015
 Leave-0.013
 drawn-0.013
 wishes-0.013
Token work
Token ID670
Feature activation+1.08e-01

Pos logit contributions
unny+0.051
ll+0.048
stack+0.046
sie+0.046
hetically+0.045
Neg logit contributions
aw-0.044
 wishes-0.039
 anything-0.037
 flashes-0.037
 such-0.035
Token she
Token ID673
Feature activation+1.65e-03

Pos logit contributions
unny+0.236
sie+0.232
 stool+0.228
ll+0.226
otte+0.221
Neg logit contributions
aw-0.151
 its-0.117
 anything-0.111
 flashes-0.110
 such-0.110
Token had
Token ID550
Feature activation+6.17e-03

Pos logit contributions
ll+0.003
ve+0.003
sie+0.003
 stool+0.003
otte+0.003
Neg logit contributions
aw-0.003
 flashes-0.003
 its-0.003
 anything-0.002
 wishes-0.002
Token done
Token ID1760
Feature activation+2.73e-02

Pos logit contributions
ve+0.010
ll+0.010
sie+0.009
otte+0.009
 stool+0.009
Neg logit contributions
aw-0.012
 flashes-0.011
 Leave-0.010
 wishes-0.009
 aids-0.009
Token!
Token ID0
Feature activation-2.96e-03

Pos logit contributions
 stool+0.054
ve+0.051
otte+0.049
aughter+0.049
sie+0.048
Neg logit contributions
aw-0.041
 flashes-0.036
 Leave-0.035
 Their-0.030
 wishes-0.029
Token
Token ID220
Feature activation-8.94e-03

Pos logit contributions
aw+0.004
 anything+0.003
 flashes+0.003
 its+0.003
 wishes+0.003
Neg logit contributions
unny-0.006
 stool-0.006
sie-0.006
aughter-0.006
otte-0.006
Token another
Token ID1194
Feature activation+2.86e-02

Pos logit contributions
aw+0.014
 flashes+0.010
 Leave+0.010
 drawn+0.010
 aids+0.009
Neg logit contributions
 stool-0.017
 miner-0.017
sie-0.017
aughter-0.017
ylan-0.017
Token way
Token ID835
Feature activation-7.99e-03

Pos logit contributions
unny+0.054
sie+0.054
 stool+0.052
ll+0.052
otte+0.051
Neg logit contributions
aw-0.048
 its-0.040
 flashes-0.038
 anything-0.038
 such-0.038
Token to
Token ID284
Feature activation+5.22e-03

Pos logit contributions
aw+0.012
 flashes+0.009
 its+0.009
 anything+0.009
 wishes+0.009
Neg logit contributions
unny-0.016
 stool-0.016
sie-0.016
ll-0.016
ve-0.016
Token get
Token ID651
Feature activation+2.29e-02

Pos logit contributions
 stool+0.008
sie+0.008
hetically+0.008
unny+0.008
ll+0.008
Neg logit contributions
aw-0.011
 flashes-0.009
 drawn-0.008
 Leave-0.008
 anything-0.008
Token the
Token ID262
Feature activation+3.29e-02

Pos logit contributions
sie+0.046
 stool+0.044
 miner+0.044
ve+0.044
ylan+0.044
Neg logit contributions
aw-0.036
 flashes-0.026
 Leave-0.024
 aids-0.024
 wishes-0.023
Token work
Token ID670
Feature activation+1.07e-01

Pos logit contributions
unny+0.068
sie+0.067
comb+0.066
ll+0.065
otte+0.064
Neg logit contributions
aw-0.050
 flashes-0.042
 its-0.042
 anything-0.042
 such-0.041
Token done
Token ID1760
Feature activation+2.44e-02

Pos logit contributions
unny+0.258
sie+0.254
 stool+0.249
ll+0.248
otte+0.244
Neg logit contributions
aw-0.129
 its-0.097
 anything-0.093
 such-0.090
 flashes-0.087
Token.
Token ID13
Feature activation+1.37e-02

Pos logit contributions
 stool+0.047
ve+0.046
sie+0.045
ll+0.045
hetically+0.044
Neg logit contributions
aw-0.040
 flashes-0.032
 Leave-0.029
 wishes-0.028
 its-0.027
Token<|endoftext|>
Token ID50256
Feature activation+2.90e-02

Pos logit contributions
unny+0.029
 stool+0.028
sie+0.027
ll+0.027
hetically+0.027
Neg logit contributions
aw-0.021
 flashes-0.016
 anything-0.016
 its-0.015
 wishes-0.015
TokenOnce
Token ID7454
Feature activation-1.54e-02

Pos logit contributions
sie+0.067
unny+0.067
 stool+0.066
ll+0.065
hetically+0.064
Neg logit contributions
aw-0.037
 its-0.027
 anything-0.026
 flashes-0.025
 such-0.025
Token there
Token ID612
Feature activation+8.73e-03

Pos logit contributions
aw+0.033
 its+0.030
 anything+0.028
 about+0.028
 such+0.028
Neg logit contributions
sie-0.021
unny-0.021
 stool-0.020
hetically-0.020
comb-0.019
Token He
Token ID679
Feature activation+1.15e-02

Pos logit contributions
unny+0.016
 stool+0.016
sie+0.016
ll+0.016
hetically+0.015
Neg logit contributions
aw-0.010
 its-0.008
 flashes-0.008
 anything-0.008
 wishes-0.007
Token felt
Token ID2936
Feature activation-1.61e-02

Pos logit contributions
unny+0.024
sie+0.024
 stool+0.023
ll+0.023
ve+0.022
Neg logit contributions
aw-0.018
 its-0.014
 flashes-0.013
 anything-0.013
 such-0.013
Token proud
Token ID6613
Feature activation+1.71e-02

Pos logit contributions
aw+0.025
 flashes+0.020
 Leave+0.018
 aids+0.018
 voices+0.017
Neg logit contributions
sie-0.029
 stool-0.029
unny-0.028
 boulder-0.028
 miner-0.028
Token of
Token ID286
Feature activation+9.24e-03

Pos logit contributions
ll+0.033
 stool+0.033
unny+0.032
sie+0.032
ve+0.031
Neg logit contributions
aw-0.026
 flashes-0.021
 its-0.020
 anything-0.020
 Leave-0.020
Token the
Token ID262
Feature activation+2.29e-02

Pos logit contributions
 stool+0.015
sie+0.014
hetically+0.014
ll+0.014
unny+0.014
Neg logit contributions
aw-0.018
 flashes-0.015
 Leave-0.014
 drawn-0.014
 its-0.014
Token work
Token ID670
Feature activation+1.07e-01

Pos logit contributions
otte+0.048
unny+0.046
sie+0.044
ll+0.044
abby+0.043
Neg logit contributions
aw-0.034
 flashes-0.032
 wishes-0.031
 anything-0.030
 its-0.028
Token he
Token ID339
Feature activation+4.97e-03

Pos logit contributions
unny+0.248
sie+0.243
ll+0.234
otte+0.234
 stool+0.233
Neg logit contributions
aw-0.144
 its-0.114
 anything-0.104
 such-0.104
 flashes-0.101
Token had
Token ID550
Feature activation+2.30e-03

Pos logit contributions
sie+0.008
ve+0.008
ll+0.008
 stool+0.008
unny+0.008
Neg logit contributions
aw-0.009
 flashes-0.008
 its-0.008
 anything-0.007
 wishes-0.007
Token just
Token ID655
Feature activation+4.71e-03

Pos logit contributions
ll+0.004
ve+0.004
 stool+0.004
sie+0.004
 miner+0.003
Neg logit contributions
aw-0.004
 flashes-0.004
 Leave-0.004
 wishes-0.003
 aids-0.003
Token done
Token ID1760
Feature activation+2.12e-02

Pos logit contributions
sie+0.008
ll+0.008
 stool+0.008
 miner+0.008
hetically+0.008
Neg logit contributions
aw-0.008
 flashes-0.006
 Leave-0.006
 its-0.006
 wishes-0.006
Token,
Token ID11
Feature activation+5.91e-03

Pos logit contributions
 stool+0.042
ve+0.040
beans+0.038
 miner+0.038
 boulder+0.038
Neg logit contributions
aw-0.032
 flashes-0.028
 Leave-0.026
 wishes-0.024
 flash-0.023
Token,
Token ID11
Feature activation+1.27e-02

Pos logit contributions
aw+0.022
 flashes+0.017
 wishes+0.017
 its+0.016
 such+0.016
Neg logit contributions
unny-0.025
 stool-0.025
sie-0.024
ve-0.024
ll-0.023
Token Tim
Token ID5045
Feature activation+3.99e-02

Pos logit contributions
 stool+0.029
unny+0.029
sie+0.029
ll+0.028
ve+0.028
Neg logit contributions
aw-0.016
 flashes-0.011
 its-0.011
 wishes-0.011
 anything-0.011
Tokenmy
Token ID1820
Feature activation+5.50e-03

Pos logit contributions
unny+0.063
sie+0.063
 stool+0.063
ll+0.060
ve+0.059
Neg logit contributions
aw-0.080
 its-0.066
 flashes-0.064
 anything-0.064
 such-0.063
Token felt
Token ID2936
Feature activation+1.26e-03

Pos logit contributions
ll+0.010
ve+0.010
unny+0.010
 miner+0.010
sie+0.010
Neg logit contributions
aw-0.008
 its-0.007
 such-0.007
 anything-0.007
 flashes-0.007
Token a
Token ID257
Feature activation+3.00e-02

Pos logit contributions
 stool+0.002
sie+0.002
 grocer+0.002
 boulder+0.002
 miner+0.002
Neg logit contributions
aw-0.002
 Leave-0.001
 flashes-0.001
 wishes-0.001
 aids-0.001
Token tug
Token ID27762
Feature activation+1.07e-01

Pos logit contributions
ll+0.063
sie+0.062
unny+0.061
aughter+0.061
otte+0.060
Neg logit contributions
aw-0.045
 its-0.038
 wishes-0.035
 flashes-0.035
 anything-0.035
Token on
Token ID319
Feature activation+8.29e-03

Pos logit contributions
unny+0.217
sie+0.215
 stool+0.214
ll+0.209
otte+0.204
Neg logit contributions
aw-0.168
 its-0.132
 anything-0.126
 flashes-0.125
 such-0.124
Token his
Token ID465
Feature activation+6.41e-03

Pos logit contributions
 stool+0.017
 boulder+0.016
hetically+0.016
 miner+0.016
sie+0.016
Neg logit contributions
aw-0.012
 Leave-0.009
 flashes-0.008
 drawn-0.008
 wishes-0.008
Token fishing
Token ID12478
Feature activation+1.38e-02

Pos logit contributions
sie+0.010
unny+0.009
otte+0.009
 stool+0.009
oris+0.009
Neg logit contributions
aw-0.013
 its-0.012
 anything-0.011
 wishes-0.011
 flashes-0.011
Token rod
Token ID15299
Feature activation+1.37e-02

Pos logit contributions
sie+0.016
otte+0.015
unny+0.015
ll+0.014
ve+0.014
Neg logit contributions
aw-0.035
 its-0.030
 anything-0.029
 wishes-0.029
 flashes-0.028
Token.
Token ID13
Feature activation+1.61e-02

Pos logit contributions
sie+0.032
unny+0.032
 stool+0.031
ll+0.031
otte+0.031
Neg logit contributions
aw-0.018
 its-0.013
 anything-0.012
 flashes-0.012
 such-0.012
Token.
Token ID13
Feature activation+8.76e-04

Pos logit contributions
ll+0.020
ve+0.020
unny+0.020
sie+0.020
ley+0.020
Neg logit contributions
aw-0.016
 anything-0.015
 flashes-0.014
 aids-0.014
 such-0.013
Token The
Token ID383
Feature activation+1.03e-02

Pos logit contributions
unny+0.002
 stool+0.002
sie+0.002
ll+0.002
hetically+0.002
Neg logit contributions
aw-0.001
 anything-0.001
 wishes-0.001
 flashes-0.001
 such-0.001
Token girl
Token ID2576
Feature activation-1.41e-02

Pos logit contributions
ve+0.017
unny+0.017
sie+0.016
otte+0.016
hetically+0.016
Neg logit contributions
aw-0.020
 wishes-0.017
 flashes-0.017
 its-0.016
 hair-0.016
Token had
Token ID550
Feature activation-2.49e-03

Pos logit contributions
aw+0.025
 anything+0.022
 flashes+0.021
 such+0.021
 its+0.020
Neg logit contributions
ll-0.024
sie-0.023
unny-0.023
ve-0.022
 stool-0.022
Token a
Token ID257
Feature activation+1.75e-02

Pos logit contributions
aw+0.004
 flashes+0.004
 aids+0.004
 wishes+0.004
 Leave+0.003
Neg logit contributions
 stool-0.003
unny-0.003
 toddler-0.003
otte-0.003
ve-0.003
Token work
Token ID670
Feature activation+1.06e-01

Pos logit contributions
sie+0.035
unny+0.035
hetically+0.035
ll+0.034
ve+0.033
Neg logit contributions
aw-0.026
 its-0.022
 wishes-0.021
 flashes-0.021
 anything-0.021
Token to
Token ID284
Feature activation+1.48e-02

Pos logit contributions
unny+0.229
sie+0.227
 stool+0.223
ll+0.220
otte+0.216
Neg logit contributions
aw-0.154
 its-0.120
 anything-0.113
 flashes-0.113
 such-0.110
Token do
Token ID466
Feature activation-7.89e-03

Pos logit contributions
 stool+0.027
sie+0.027
unny+0.027
ll+0.026
hetically+0.025
Neg logit contributions
aw-0.026
 its-0.020
 flashes-0.020
 anything-0.020
 such-0.019
Token,
Token ID11
Feature activation+2.06e-03

Pos logit contributions
aw+0.012
 flashes+0.010
 Leave+0.010
 Its+0.008
 See+0.008
Neg logit contributions
ve-0.015
 stool-0.015
ll-0.014
sie-0.014
otte-0.014
Token it
Token ID340
Feature activation-1.47e-03

Pos logit contributions
 stool+0.003
stack+0.003
ll+0.003
sie+0.003
 hill+0.003
Neg logit contributions
aw-0.004
 flashes-0.003
 wishes-0.003
 Leave-0.003
 aids-0.003
Token was
Token ID373
Feature activation-3.36e-03

Pos logit contributions
aw+0.002
 its+0.002
 anything+0.002
 flashes+0.002
 such+0.002
Neg logit contributions
ll-0.003
sie-0.003
 stool-0.003
unny-0.003
hetically-0.003
Token,
Token ID11
Feature activation+2.72e-02

Pos logit contributions
 stool+0.016
ll+0.015
sie+0.015
unny+0.015
ve+0.015
Neg logit contributions
aw-0.013
 flashes-0.011
 its-0.010
 anything-0.010
 Leave-0.010
Token but
Token ID475
Feature activation-6.89e-03

Pos logit contributions
 stool+0.051
unny+0.049
ll+0.049
sie+0.049
ve+0.047
Neg logit contributions
aw-0.045
 flashes-0.036
 anything-0.035
 wishes-0.035
 its-0.033
Token suddenly
Token ID6451
Feature activation-4.57e-02

Pos logit contributions
aw+0.011
 its+0.009
 flashes+0.009
 anything+0.009
 such+0.008
Neg logit contributions
unny-0.013
sie-0.013
 stool-0.013
ll-0.013
otte-0.013
Token felt
Token ID2936
Feature activation-7.78e-03

Pos logit contributions
aw+0.079
 flashes+0.065
 Leave+0.065
 aids+0.063
 wishes+0.062
Neg logit contributions
sie-0.077
 miner-0.076
 stool-0.075
unny-0.075
 boulder-0.074
Token a
Token ID257
Feature activation+3.16e-02

Pos logit contributions
aw+0.011
 Leave+0.009
 flashes+0.009
 wishes+0.009
 aids+0.008
Neg logit contributions
sie-0.015
 stool-0.014
 miner-0.014
 boulder-0.014
 grocer-0.014
Token tug
Token ID27762
Feature activation+1.06e-01

Pos logit contributions
ll+0.058
sie+0.057
unny+0.057
 stool+0.056
aughter+0.056
Neg logit contributions
aw-0.055
 its-0.046
 flashes-0.044
 anything-0.044
 wishes-0.043
Token.
Token ID13
Feature activation+1.49e-02

Pos logit contributions
unny+0.212
sie+0.209
 stool+0.209
ll+0.204
aughter+0.200
Neg logit contributions
aw-0.171
 its-0.135
 anything-0.129
 flashes-0.128
 such-0.127
Token He
Token ID679
Feature activation+2.25e-02

Pos logit contributions
unny+0.032
 stool+0.032
sie+0.031
hetically+0.031
oris+0.030
Neg logit contributions
aw-0.021
 flashes-0.015
 wishes-0.015
 its-0.015
 anything-0.015
Token looked
Token ID3114
Feature activation-8.14e-03

Pos logit contributions
sie+0.038
unny+0.037
ll+0.036
ve+0.036
 miner+0.036
Neg logit contributions
aw-0.040
 anything-0.035
 such-0.034
 flashes-0.034
 its-0.033
Token back
Token ID736
Feature activation+1.17e-02

Pos logit contributions
aw+0.014
 flashes+0.011
 its+0.011
 anything+0.011
 wishes+0.011
Neg logit contributions
sie-0.015
 stool-0.015
unny-0.015
ll-0.014
ve-0.014
Token and
Token ID290
Feature activation+3.27e-03

Pos logit contributions
unny+0.025
 stool+0.025
sie+0.025
ll+0.025
ve+0.024
Neg logit contributions
aw-0.016
 flashes-0.012
 its-0.012
 anything-0.012
 wishes-0.011
Token quickly
Token ID2952
Feature activation-3.65e-03

Pos logit contributions
aw+0.021
 anything+0.017
 its+0.017
 flashes+0.017
 such+0.016
Neg logit contributions
unny-0.021
sie-0.021
 stool-0.021
ll-0.020
ve-0.020
Token checked
Token ID10667
Feature activation+1.71e-02

Pos logit contributions
aw+0.007
 anything+0.005
 its+0.005
 flashes+0.005
 such+0.005
Neg logit contributions
sie-0.006
unny-0.006
 stool-0.006
ll-0.006
ve-0.006
Token for
Token ID329
Feature activation-5.01e-04

Pos logit contributions
unny+0.043
sie+0.043
 stool+0.042
ll+0.042
ve+0.041
Neg logit contributions
aw-0.018
 its-0.014
 anything-0.013
 such-0.012
 flashes-0.012
Token a
Token ID257
Feature activation+8.46e-03

Pos logit contributions
aw+0.001
 flashes+0.001
 Leave+0.001
 wishes+0.001
 drawn+0.000
Neg logit contributions
 stool-0.001
sie-0.001
unny-0.001
 miner-0.001
hetically-0.001
Token fire
Token ID2046
Feature activation+3.08e-02

Pos logit contributions
sie+0.015
ll+0.015
unny+0.015
hetically+0.015
otte+0.015
Neg logit contributions
aw-0.014
 its-0.012
 anything-0.012
 flashes-0.011
 wishes-0.011
Token extingu
Token ID24995
Feature activation+1.06e-01

Pos logit contributions
sie+0.060
otte+0.058
unny+0.058
 stool+0.057
ve+0.056
Neg logit contributions
aw-0.051
 its-0.041
 flashes-0.040
 anything-0.040
 such-0.039
Tokenisher
Token ID4828
Feature activation+4.38e-02

Pos logit contributions
unny+0.118
 stool+0.114
sie+0.108
otte+0.107
comb+0.106
Neg logit contributions
aw-0.269
 flashes-0.225
 anything-0.223
 its-0.222
 such-0.221
Token,
Token ID11
Feature activation+4.37e-03

Pos logit contributions
sie+0.107
unny+0.107
 stool+0.104
otte+0.103
ll+0.103
Neg logit contributions
aw-0.050
 its-0.036
 anything-0.034
 such-0.033
 flashes-0.032
Token but
Token ID475
Feature activation-1.86e-02

Pos logit contributions
 stool+0.007
unny+0.007
sie+0.007
hetically+0.007
 miner+0.007
Neg logit contributions
aw-0.009
 flashes-0.007
 wishes-0.007
 anything-0.006
 its-0.006
Token there
Token ID612
Feature activation+2.50e-02

Pos logit contributions
aw+0.029
 flashes+0.022
 Leave+0.021
 wishes+0.021
 its+0.020
Neg logit contributions
 stool-0.038
 miner-0.036
sie-0.036
unny-0.036
hetically-0.036
Token wasn
Token ID2492
Feature activation-4.74e-03

Pos logit contributions
 stool+0.046
 miner+0.045
hetically+0.044
jack+0.044
unny+0.043
Neg logit contributions
aw-0.042
 Leave-0.032
 flashes-0.032
 wishes-0.031
 anything-0.031
Token happened
Token ID3022
Feature activation+1.76e-02

Pos logit contributions
aw+0.039
 anything+0.031
 Leave+0.031
 its+0.031
 flashes+0.030
Neg logit contributions
sie-0.050
 miner-0.050
unny-0.049
ve-0.049
 stool-0.049
Token.
Token ID13
Feature activation+1.50e-02

Pos logit contributions
unny+0.039
 stool+0.039
sie+0.039
ll+0.038
ve+0.038
Neg logit contributions
aw-0.023
 flashes-0.017
 its-0.017
 anything-0.016
 wishes-0.016
Token He
Token ID679
Feature activation+2.29e-02

Pos logit contributions
 stool+0.036
unny+0.036
sie+0.035
 miner+0.035
aughter+0.034
Neg logit contributions
aw-0.018
 flashes-0.012
 its-0.012
 anything-0.012
 wishes-0.011
Token felt
Token ID2936
Feature activation-6.83e-03

Pos logit contributions
sie+0.042
 miner+0.041
unny+0.041
ll+0.041
ve+0.041
Neg logit contributions
aw-0.038
 anything-0.031
 such-0.030
 its-0.030
 flashes-0.030
Token a
Token ID257
Feature activation+3.05e-02

Pos logit contributions
aw+0.012
 flashes+0.009
 Leave+0.009
c+0.008
 aids+0.008
Neg logit contributions
 miner-0.011
sie-0.011
 boulder-0.011
 grocer-0.010
 stool-0.010
Token tug
Token ID27762
Feature activation+1.06e-01

Pos logit contributions
ll+0.060
sie+0.060
unny+0.060
aughter+0.059
 stool+0.058
Neg logit contributions
aw-0.047
 its-0.040
 wishes-0.039
 flashes-0.038
 anything-0.037
Token on
Token ID319
Feature activation+9.05e-03

Pos logit contributions
unny+0.221
sie+0.218
 stool+0.217
ll+0.213
aughter+0.209
Neg logit contributions
aw-0.161
 its-0.127
 anything-0.120
 such-0.118
 flashes-0.116
Token his
Token ID465
Feature activation+7.67e-03

Pos logit contributions
 stool+0.020
 miner+0.019
sie+0.019
hetically+0.018
 boulder+0.018
Neg logit contributions
aw-0.013
 flashes-0.009
 Leave-0.009
 drawn-0.008
 wishes-0.008
Token fishing
Token ID12478
Feature activation+1.40e-02

Pos logit contributions
sie+0.013
unny+0.013
 stool+0.013
ll+0.013
otte+0.013
Neg logit contributions
aw-0.014
 its-0.012
 wishes-0.011
 anything-0.011
 flashes-0.011
Token line
Token ID1627
Feature activation+1.21e-02

Pos logit contributions
sie+0.017
otte+0.017
unny+0.017
ll+0.016
ve+0.016
Neg logit contributions
aw-0.034
 its-0.029
 anything-0.028
 wishes-0.028
 flashes-0.028
Token and
Token ID290
Feature activation-5.86e-03

Pos logit contributions
unny+0.027
sie+0.027
 stool+0.026
ll+0.026
otte+0.026
Neg logit contributions
aw-0.017
 its-0.013
 anything-0.012
 flashes-0.012
 such-0.012
Token enough
Token ID1576
Feature activation+2.05e-02

Pos logit contributions
unny+0.005
 stool+0.005
sie+0.005
 miner+0.005
ll+0.005
Neg logit contributions
aw-0.005
 flashes-0.004
 aids-0.004
 its-0.004
 wishes-0.004
Token,
Token ID11
Feature activation+2.40e-02

Pos logit contributions
unny+0.044
sie+0.044
 stool+0.043
ll+0.043
ve+0.042
Neg logit contributions
aw-0.030
 its-0.023
 anything-0.022
 flashes-0.022
 such-0.022
Token he
Token ID339
Feature activation-2.79e-04

Pos logit contributions
 stool+0.049
sie+0.048
unny+0.048
ll+0.047
otte+0.046
Neg logit contributions
aw-0.037
 flashes-0.028
 its-0.027
 wishes-0.027
 Leave-0.027
Token felt
Token ID2936
Feature activation-3.14e-03

Pos logit contributions
aw+0.000
 its+0.000
 flashes+0.000
 anything+0.000
 such+0.000
Neg logit contributions
sie-0.001
unny-0.001
ll-0.001
 stool-0.001
ve-0.001
Token a
Token ID257
Feature activation+2.80e-02

Pos logit contributions
aw+0.005
 flashes+0.004
 Leave+0.004
 aids+0.004
 wishes+0.003
Neg logit contributions
 stool-0.006
sie-0.006
unny-0.005
 miner-0.005
 grocer-0.005
Token tug
Token ID27762
Feature activation+1.06e-01

Pos logit contributions
ll+0.059
sie+0.058
aughter+0.058
unny+0.058
otte+0.057
Neg logit contributions
aw-0.040
 its-0.033
 wishes-0.032
 anything-0.031
 flashes-0.031
Token on
Token ID319
Feature activation+6.17e-03

Pos logit contributions
unny+0.226
sie+0.223
 stool+0.221
ll+0.217
otte+0.213
Neg logit contributions
aw-0.158
 its-0.122
 anything-0.116
 such-0.113
 flashes-0.112
Token the
Token ID262
Feature activation+1.08e-02

Pos logit contributions
 stool+0.014
 boulder+0.012
hetically+0.012
sie+0.012
 miner+0.012
Neg logit contributions
aw-0.009
 Leave-0.006
 flashes-0.006
 drawn-0.006
 wishes-0.006
Token net
Token ID2010
Feature activation+1.48e-02

Pos logit contributions
otte+0.018
sie+0.018
ll+0.018
unny+0.018
oris+0.017
Neg logit contributions
aw-0.022
 its-0.019
 anything-0.018
 flashes-0.018
 wishes-0.018
Token.
Token ID13
Feature activation+7.94e-03

Pos logit contributions
 stool+0.033
unny+0.032
sie+0.032
ll+0.032
hetically+0.031
Neg logit contributions
aw-0.020
 its-0.015
 flashes-0.015
 anything-0.014
 wishes-0.014
Token He
Token ID679
Feature activation+1.96e-02

Pos logit contributions
 stool+0.016
unny+0.016
ylan+0.015
 grocer+0.015
aughter+0.015
Neg logit contributions
aw-0.012
 flashes-0.009
 its-0.009
 anything-0.009
 such-0.009
Token.
Token ID13
Feature activation+4.24e-03

Pos logit contributions
 stool+0.013
unny+0.013
ll+0.013
sie+0.013
ve+0.013
Neg logit contributions
aw-0.010
 flashes-0.007
 anything-0.007
 its-0.007
 wishes-0.007
Token They
Token ID1119
Feature activation+2.04e-02

Pos logit contributions
unny+0.009
 stool+0.009
sie+0.009
aughter+0.009
ylan+0.009
Neg logit contributions
aw-0.006
 flashes-0.004
 anything-0.004
 its-0.004
 such-0.004
Token also
Token ID635
Feature activation-1.47e-03

Pos logit contributions
unny+0.047
 stool+0.046
aughter+0.045
sie+0.044
hetically+0.044
Neg logit contributions
aw-0.027
 its-0.022
 flashes-0.020
 wishes-0.019
 such-0.019
Token saw
Token ID2497
Feature activation-2.50e-02

Pos logit contributions
aw+0.002
 its+0.002
 flashes+0.002
 anything+0.002
 such+0.002
Neg logit contributions
unny-0.003
sie-0.003
 stool-0.003
ll-0.003
ve-0.003
Token a
Token ID257
Feature activation+2.70e-02

Pos logit contributions
aw+0.037
 Leave+0.032
 flashes+0.030
 drawn+0.030
 aids+0.030
Neg logit contributions
 miner-0.042
sie-0.042
 toddler-0.042
 stool-0.041
ylan-0.041
Token tug
Token ID27762
Feature activation+1.05e-01

Pos logit contributions
unny+0.051
sie+0.051
 stool+0.050
ll+0.049
hetically+0.049
Neg logit contributions
aw-0.045
 its-0.036
 flashes-0.035
 anything-0.035
 such-0.034
Tokenboat
Token ID24482
Feature activation+2.82e-02

Pos logit contributions
unny+0.207
sie+0.206
 stool+0.205
ll+0.200
hetically+0.196
Neg logit contributions
aw-0.170
 its-0.134
 anything-0.129
 flashes-0.129
 such-0.126
Token which
Token ID543
Feature activation+6.32e-03

Pos logit contributions
 stool+0.052
unny+0.051
ll+0.051
 miner+0.050
sie+0.050
Neg logit contributions
aw-0.047
 flashes-0.037
 wishes-0.037
 its-0.036
 anything-0.035
Token was
Token ID373
Feature activation+1.19e-02

Pos logit contributions
 stool+0.012
sie+0.012
unny+0.012
ll+0.012
 miner+0.012
Neg logit contributions
aw-0.010
 its-0.008
 flashes-0.007
 anything-0.007
 wishes-0.007
Token to
Token ID284
Feature activation+3.02e-02

Pos logit contributions
 stool+0.021
ll+0.020
 miner+0.020
sie+0.020
unny+0.019
Neg logit contributions
aw-0.022
 Leave-0.017
 wishes-0.017
 flashes-0.016
 anything-0.015
Tokenwing
Token ID5469
Feature activation+3.14e-02

Pos logit contributions
 stool+0.050
sie+0.049
unny+0.049
 miner+0.047
ll+0.047
Neg logit contributions
aw-0.059
 flashes-0.043
 Leave-0.043
 wishes-0.042
 anything-0.040
Token She
Token ID1375
Feature activation+3.50e-03

Pos logit contributions
unny+0.019
sie+0.019
 stool+0.019
ll+0.018
ve+0.018
Neg logit contributions
aw-0.010
 its-0.007
 flashes-0.007
 anything-0.007
 such-0.007
Token was
Token ID373
Feature activation-3.12e-03

Pos logit contributions
unny+0.007
 stool+0.007
sie+0.007
ll+0.007
aughter+0.007
Neg logit contributions
aw-0.005
 its-0.004
 flashes-0.004
 wishes-0.004
 such-0.004
Token proud
Token ID6613
Feature activation+1.68e-02

Pos logit contributions
aw+0.005
 flashes+0.004
 Leave+0.004
 anything+0.004
 wishes+0.004
Neg logit contributions
sie-0.006
 stool-0.006
ve-0.006
otte-0.006
comb-0.006
Token of
Token ID286
Feature activation+9.08e-03

Pos logit contributions
sie+0.036
 stool+0.036
ll+0.036
unny+0.036
ve+0.035
Neg logit contributions
aw-0.023
 flashes-0.017
 its-0.017
 anything-0.017
 wishes-0.016
Token the
Token ID262
Feature activation+1.84e-02

Pos logit contributions
sie+0.020
 stool+0.020
unny+0.020
ll+0.019
otte+0.019
Neg logit contributions
aw-0.013
 flashes-0.009
 its-0.009
 anything-0.009
 wishes-0.009
Token work
Token ID670
Feature activation+1.05e-01

Pos logit contributions
unny+0.040
ll+0.037
oris+0.036
otte+0.036
oun+0.035
Neg logit contributions
aw-0.027
 wishes-0.024
 anything-0.024
 such-0.022
 flashes-0.022
Token she
Token ID673
Feature activation-2.01e-03

Pos logit contributions
unny+0.234
sie+0.226
ll+0.222
 stool+0.220
otte+0.217
Neg logit contributions
aw-0.147
 its-0.118
 such-0.111
 anything-0.111
 flashes-0.106
Token was
Token ID373
Feature activation+1.23e-02

Pos logit contributions
aw+0.004
 flashes+0.003
 its+0.003
 anything+0.003
 wishes+0.003
Neg logit contributions
sie-0.003
ll-0.003
 stool-0.003
ve-0.003
unny-0.003
Token doing
Token ID1804
Feature activation+7.82e-03

Pos logit contributions
sie+0.022
 stool+0.022
ve+0.022
ll+0.021
otte+0.021
Neg logit contributions
aw-0.022
 flashes-0.018
 wishes-0.017
 Leave-0.017
 anything-0.016
Token and
Token ID290
Feature activation+7.43e-03

Pos logit contributions
 stool+0.018
sie+0.017
ve+0.017
aughter+0.017
otte+0.016
Neg logit contributions
aw-0.010
 flashes-0.008
 Leave-0.007
 wishes-0.006
 its-0.006
Token it
Token ID340
Feature activation-1.06e-02

Pos logit contributions
ll+0.013
sie+0.013
hetically+0.013
 stool+0.013
otte+0.012
Neg logit contributions
aw-0.014
 flashes-0.011
 anything-0.010
 Leave-0.010
 wishes-0.009
Token he
Token ID339
Feature activation-1.03e-02

Pos logit contributions
 stool+0.007
sie+0.006
ll+0.006
unny+0.006
 miner+0.006
Neg logit contributions
aw-0.004
 flashes-0.003
 its-0.003
 Leave-0.003
 wishes-0.003
Token needed
Token ID2622
Feature activation+1.17e-03

Pos logit contributions
aw+0.016
 its+0.013
 anything+0.012
 flashes+0.012
 such+0.012
Neg logit contributions
unny-0.021
sie-0.020
 stool-0.020
ll-0.020
hetically-0.019
Token to
Token ID284
Feature activation+7.73e-03

Pos logit contributions
 stool+0.002
sie+0.002
unny+0.002
ve+0.002
ll+0.002
Neg logit contributions
aw-0.002
 flashes-0.001
 Leave-0.001
 wishes-0.001
 its-0.001
Token give
Token ID1577
Feature activation+1.49e-02

Pos logit contributions
sie+0.013
unny+0.013
 stool+0.013
ll+0.013
otte+0.013
Neg logit contributions
aw-0.014
 its-0.011
 flashes-0.011
 anything-0.011
 such-0.011
Token the
Token ID262
Feature activation+1.87e-02

Pos logit contributions
sie+0.035
 stool+0.035
unny+0.035
aughter+0.034
ve+0.034
Neg logit contributions
aw-0.017
 flashes-0.013
 Leave-0.011
 wishes-0.010
 its-0.010
Token cash
Token ID5003
Feature activation+1.05e-01

Pos logit contributions
unny+0.040
ll+0.039
sie+0.038
hetically+0.037
comb+0.036
Neg logit contributions
aw-0.027
 anything-0.024
 wishes-0.024
 such-0.023
 its-0.023
Tokenier
Token ID959
Feature activation+7.12e-03

Pos logit contributions
unny+0.164
 stool+0.163
stack+0.161
sie+0.158
 slope+0.155
Neg logit contributions
aw-0.220
 anything-0.172
 such-0.172
 about-0.166
 its-0.163
Token some
Token ID617
Feature activation+2.96e-02

Pos logit contributions
sie+0.015
 stool+0.015
unny+0.015
ll+0.014
aughter+0.014
Neg logit contributions
aw-0.011
 flashes-0.008
 its-0.008
 anything-0.008
 wishes-0.007
Token money
Token ID1637
Feature activation+6.97e-02

Pos logit contributions
sie+0.044
 stool+0.044
unny+0.044
ll+0.042
ve+0.042
Neg logit contributions
aw-0.061
 its-0.051
 flashes-0.050
 anything-0.049
 such-0.048
Token.
Token ID13
Feature activation+1.19e-02

Pos logit contributions
unny+0.168
sie+0.166
 stool+0.163
hetically+0.161
oris+0.159
Neg logit contributions
aw-0.082
 its-0.062
 anything-0.060
 such-0.060
 about-0.055
Token Tim
Token ID5045
Feature activation+3.38e-02

Pos logit contributions
unny+0.025
 stool+0.024
ll+0.024
aughter+0.024
oun+0.024
Neg logit contributions
aw-0.018
 anything-0.014
 flashes-0.014
 wishes-0.014
 its-0.013
Token"
Token ID1
Feature activation+2.08e-02

Pos logit contributions
aw+0.015
 its+0.013
 anything+0.013
 flashes+0.013
 such+0.012
Neg logit contributions
unny-0.011
sie-0.011
 stool-0.011
ll-0.010
otte-0.010
TokenHo
Token ID28900
Feature activation-1.49e-02

Pos logit contributions
unny+0.041
sie+0.041
 stool+0.041
ll+0.040
otte+0.039
Neg logit contributions
aw-0.034
 its-0.026
 flashes-0.026
 anything-0.026
 such-0.025
Token ho
Token ID8169
Feature activation-5.31e-02

Pos logit contributions
aw+0.032
 its+0.027
 flashes+0.026
 anything+0.026
 such+0.026
Neg logit contributions
unny-0.021
sie-0.021
 stool-0.021
ll-0.020
hetically-0.020
Token ho
Token ID8169
Feature activation-4.99e-02

Pos logit contributions
aw+0.109
 its+0.090
 anything+0.088
 flashes+0.087
 such+0.086
Neg logit contributions
sie-0.082
 stool-0.080
unny-0.079
ll-0.076
hetically-0.076
Token!
Token ID0
Feature activation-4.51e-02

Pos logit contributions
aw+0.105
 its+0.088
 anything+0.085
 flashes+0.083
 such+0.083
Neg logit contributions
sie-0.075
 stool-0.073
unny-0.072
hetically-0.069
ll-0.069
Token Merry
Token ID39001
Feature activation-1.14e-01

Pos logit contributions
aw+0.074
 drawn+0.056
 flashes+0.056
 wishes+0.054
 its+0.053
Neg logit contributions
unny-0.092
 stool-0.085
sie-0.084
ylan-0.083
aughter-0.082
Token Christmas
Token ID6786
Feature activation-1.72e-02

Pos logit contributions
aw+0.207
 flashes+0.170
 its+0.170
 anything+0.167
 wishes+0.166
Neg logit contributions
unny-0.186
sie-0.182
ll-0.178
otte-0.178
ve-0.178
Token,
Token ID11
Feature activation-2.50e-02

Pos logit contributions
aw+0.027
 its+0.021
 flashes+0.020
 anything+0.020
 such+0.020
Neg logit contributions
unny-0.035
sie-0.035
 stool-0.034
ll-0.034
hetically-0.033
Token Anna
Token ID11735
Feature activation-2.42e-02

Pos logit contributions
aw+0.037
 its+0.028
 flashes+0.028
 anything+0.027
 such+0.026
Neg logit contributions
sie-0.052
unny-0.052
 stool-0.052
ll-0.050
otte-0.050
Token and
Token ID290
Feature activation-1.32e-02

Pos logit contributions
aw+0.037
 its+0.029
 flashes+0.028
 anything+0.028
 such+0.027
Neg logit contributions
unny-0.049
sie-0.049
 stool-0.049
ll-0.048
ve-0.047
Token Ben
Token ID3932
Feature activation-3.39e-02

Pos logit contributions
aw+0.019
 its+0.015
 anything+0.014
 flashes+0.014
 such+0.014
Neg logit contributions
unny-0.028
sie-0.028
 stool-0.028
ll-0.027
ve-0.027
Token tw
Token ID665
Feature activation-2.87e-02

Pos logit contributions
sie+0.006
 stool+0.006
unny+0.006
 miner+0.006
hetically+0.006
Neg logit contributions
aw-0.004
 Leave-0.003
 flashes-0.003
 drawn-0.003
 wishes-0.003
Tokenigs
Token ID9235
Feature activation-4.86e-03

Pos logit contributions
aw+0.063
 flashes+0.053
 its+0.053
 anything+0.052
 such+0.052
Neg logit contributions
sie-0.039
unny-0.039
ll-0.039
ve-0.038
 stool-0.038
Token and
Token ID290
Feature activation-2.58e-02

Pos logit contributions
aw+0.007
 its+0.005
 anything+0.005
 flashes+0.005
 such+0.005
Neg logit contributions
unny-0.010
sie-0.010
 stool-0.010
ll-0.010
otte-0.010
Token to
Token ID284
Feature activation-2.44e-02

Pos logit contributions
aw+0.052
 flashes+0.040
 Leave+0.040
 such+0.040
 its+0.039
Neg logit contributions
 stool-0.040
sie-0.039
unny-0.039
ll-0.038
hetically-0.037
Token c
Token ID269
Feature activation-7.14e-02

Pos logit contributions
aw+0.044
 flashes+0.033
 Leave+0.032
 wishes+0.031
 such+0.031
Neg logit contributions
sie-0.044
 stool-0.044
unny-0.043
ll-0.043
stack-0.042
Tokenaw
Token ID707
Feature activation-1.14e-01

Pos logit contributions
aw+0.142
 anything+0.140
 such+0.138
 its+0.138
 flashes+0.137
Neg logit contributions
ll-0.100
unny-0.094
ve-0.092
sie-0.087
oun-0.083
Token and
Token ID290
Feature activation-3.28e-02

Pos logit contributions
aw+0.170
 flashes+0.151
 wishes+0.148
 Leave+0.141
 such+0.135
Neg logit contributions
unny-0.223
ll-0.222
 stool-0.215
comb-0.212
hetically-0.212
Token fly
Token ID6129
Feature activation-6.69e-02

Pos logit contributions
aw+0.060
 flashes+0.045
 Leave+0.043
 such+0.043
 its+0.043
Neg logit contributions
 stool-0.059
sie-0.057
unny-0.056
ll-0.056
comb-0.054
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
aw+0.110
 flashes+0.091
 wishes+0.088
 Leave+0.087
 such+0.082
Neg logit contributions
unny-0.125
sie-0.123
 stool-0.119
ll-0.118
ve-0.116
Token He
Token ID679
Feature activation-6.13e-04

Pos logit contributions
aw+0.026
 flashes+0.021
 anything+0.020
 its+0.020
 such+0.019
Neg logit contributions
unny-0.029
 stool-0.028
sie-0.027
oris-0.027
 grocer-0.027
Token also
Token ID635
Feature activation-1.12e-02

Pos logit contributions
aw+0.001
 its+0.001
 anything+0.001
 flashes+0.001
 such+0.001
Neg logit contributions
sie-0.001
unny-0.001
 stool-0.001
ll-0.001
ve-0.001
Token tw
Token ID665
Feature activation-2.72e-02

Pos logit contributions
sie+0.009
 stool+0.009
unny+0.009
 miner+0.009
hetically+0.009
Neg logit contributions
aw-0.007
 Leave-0.006
 flashes-0.005
 drawn-0.005
 wishes-0.005
Tokenigs
Token ID9235
Feature activation-4.98e-03

Pos logit contributions
aw+0.060
 flashes+0.051
 its+0.050
 anything+0.050
 such+0.050
Neg logit contributions
sie-0.037
unny-0.037
ll-0.036
 stool-0.035
ve-0.035
Token and
Token ID290
Feature activation-2.20e-02

Pos logit contributions
aw+0.007
 its+0.006
 flashes+0.005
 anything+0.005
 such+0.005
Neg logit contributions
unny-0.010
sie-0.010
 stool-0.010
ll-0.010
otte-0.010
Token to
Token ID284
Feature activation-1.80e-02

Pos logit contributions
aw+0.044
 flashes+0.034
 Leave+0.034
 such+0.033
 its+0.033
Neg logit contributions
 stool-0.034
sie-0.033
unny-0.033
ll-0.032
hetically-0.031
Token c
Token ID269
Feature activation-6.90e-02

Pos logit contributions
aw+0.033
 flashes+0.025
 Leave+0.024
 wishes+0.023
 such+0.023
Neg logit contributions
sie-0.032
 stool-0.032
unny-0.031
ll-0.031
stack-0.030
Tokenaw
Token ID707
Feature activation-1.13e-01

Pos logit contributions
 anything+0.140
aw+0.140
 flashes+0.138
 such+0.137
 its+0.137
Neg logit contributions
ll-0.094
unny-0.088
ve-0.086
sie-0.081
oun-0.077
Token and
Token ID290
Feature activation-2.66e-02

Pos logit contributions
aw+0.170
 flashes+0.153
 wishes+0.148
 Leave+0.143
 such+0.132
Neg logit contributions
unny-0.217
ll-0.217
 stool-0.209
comb-0.208
hetically-0.206
Token fly
Token ID6129
Feature activation-6.23e-02

Pos logit contributions
aw+0.047
 flashes+0.036
 its+0.034
 Leave+0.033
 such+0.033
Neg logit contributions
 stool-0.048
sie-0.047
ll-0.047
unny-0.047
stack-0.045
Token.
Token ID13
Feature activation-5.72e-03

Pos logit contributions
aw+0.104
 flashes+0.086
 wishes+0.082
 Leave+0.082
 such+0.077
Neg logit contributions
unny-0.115
sie-0.113
 stool-0.110
ll-0.109
ve-0.107
Token He
Token ID679
Feature activation+6.67e-03

Pos logit contributions
aw+0.008
 flashes+0.006
 its+0.006
 anything+0.006
 such+0.006
Neg logit contributions
unny-0.012
 stool-0.012
sie-0.012
ll-0.012
hetically-0.012
Token also
Token ID635
Feature activation-6.83e-03

Pos logit contributions
sie+0.013
unny+0.013
 stool+0.013
ll+0.013
ve+0.013
Neg logit contributions
aw-0.010
 its-0.008
 anything-0.008
 flashes-0.008
 such-0.008
Token and
Token ID290
Feature activation-2.84e-02

Pos logit contributions
unny+0.027
sie+0.027
 stool+0.027
ll+0.026
ve+0.026
Neg logit contributions
aw-0.020
 its-0.016
 anything-0.015
 flashes-0.015
 such-0.015
Token dragons
Token ID20308
Feature activation-5.94e-02

Pos logit contributions
aw+0.047
 flashes+0.035
 anything+0.034
 such+0.034
 its+0.033
Neg logit contributions
 stool-0.054
sie-0.053
unny-0.051
ll-0.051
hetically-0.050
Token and
Token ID290
Feature activation-4.23e-02

Pos logit contributions
aw+0.093
 flashes+0.072
 its+0.069
 anything+0.069
 such+0.067
Neg logit contributions
ll-0.117
sie-0.114
unny-0.114
 stool-0.112
ve-0.112
Token knights
Token ID27536
Feature activation-6.40e-02

Pos logit contributions
aw+0.072
 flashes+0.053
 such+0.051
 Leave+0.051
 its+0.050
Neg logit contributions
sie-0.079
 stool-0.077
unny-0.075
 miner-0.074
hetically-0.074
Token and
Token ID290
Feature activation-5.45e-02

Pos logit contributions
aw+0.104
 flashes+0.084
 anything+0.078
 such+0.077
 its+0.077
Neg logit contributions
ll-0.121
sie-0.116
unny-0.116
ve-0.115
 stool-0.114
Token princess
Token ID21752
Feature activation-1.12e-01

Pos logit contributions
aw+0.094
 flashes+0.070
 such+0.068
 its+0.068
 Leave+0.066
Neg logit contributions
sie-0.103
unny-0.100
 stool-0.099
ll-0.096
 miner-0.096
Tokenes
Token ID274
Feature activation-6.48e-02

Pos logit contributions
aw+0.187
 flashes+0.160
 anything+0.150
 such+0.149
 its+0.148
Neg logit contributions
ll-0.197
sie-0.194
unny-0.193
ve-0.191
beans-0.184
Token.
Token ID13
Feature activation-2.45e-02

Pos logit contributions
aw+0.103
 flashes+0.081
 wishes+0.075
 anything+0.074
 its+0.074
Neg logit contributions
sie-0.124
ll-0.123
unny-0.123
 stool-0.120
ve-0.118
Token She
Token ID1375
Feature activation-1.23e-02

Pos logit contributions
aw+0.037
 flashes+0.028
 anything+0.027
 wishes+0.027
 such+0.026
Neg logit contributions
unny-0.052
sie-0.050
 stool-0.049
otte-0.049
ll-0.048
Token wished
Token ID16555
Feature activation-1.53e-02

Pos logit contributions
aw+0.019
 its+0.014
 anything+0.014
 flashes+0.014
 such+0.014
Neg logit contributions
sie-0.025
unny-0.024
 stool-0.024
ll-0.024
ve-0.023
Token she
Token ID673
Feature activation-1.11e-02

Pos logit contributions
aw+0.029
 flashes+0.023
 wishes+0.022
 Leave+0.022
 its+0.021
Neg logit contributions
 stool-0.026
sie-0.025
ll-0.025
unny-0.025
hetically-0.025
Token"
Token ID1
Feature activation+1.98e-02

Pos logit contributions
aw+0.011
 its+0.009
 flashes+0.009
 anything+0.009
 such+0.009
Neg logit contributions
unny-0.008
sie-0.008
 stool-0.008
ll-0.008
ve-0.008
TokenHo
Token ID28900
Feature activation-1.56e-02

Pos logit contributions
unny+0.038
sie+0.038
 stool+0.038
ll+0.037
ve+0.036
Neg logit contributions
aw-0.033
 its-0.026
 flashes-0.026
 anything-0.026
 such-0.025
Token ho
Token ID8169
Feature activation-5.19e-02

Pos logit contributions
aw+0.033
 its+0.028
 anything+0.027
 flashes+0.026
 such+0.026
Neg logit contributions
sie-0.023
unny-0.023
 stool-0.023
ll-0.022
hetically-0.022
Token ho
Token ID8169
Feature activation-4.78e-02

Pos logit contributions
aw+0.098
 its+0.079
 anything+0.078
 such+0.075
 flashes+0.073
Neg logit contributions
sie-0.089
 stool-0.088
unny-0.084
hetically-0.083
otte-0.083
Token!
Token ID0
Feature activation-3.92e-02

Pos logit contributions
aw+0.092
 its+0.075
 anything+0.073
 such+0.070
 Leave+0.068
Neg logit contributions
sie-0.081
 stool-0.079
unny-0.075
 grocer-0.075
hetically-0.075
Token Merry
Token ID39001
Feature activation-1.09e-01

Pos logit contributions
aw+0.062
 flashes+0.047
 drawn+0.046
 wishes+0.046
 its+0.045
Neg logit contributions
unny-0.083
 stool-0.076
sie-0.075
ylan-0.075
aughter-0.074
Token Christmas
Token ID6786
Feature activation-1.39e-02

Pos logit contributions
aw+0.179
 flashes+0.145
 its+0.144
 wishes+0.142
 anything+0.142
Neg logit contributions
unny-0.202
sie-0.195
 stool-0.194
ll-0.192
ve-0.191
Token,
Token ID11
Feature activation-2.02e-02

Pos logit contributions
aw+0.022
 its+0.016
 flashes+0.016
 such+0.016
 anything+0.016
Neg logit contributions
unny-0.029
sie-0.028
 stool-0.028
otte-0.027
hetically-0.027
Token Lily
Token ID20037
Feature activation-1.14e-02

Pos logit contributions
aw+0.029
 its+0.022
 flashes+0.021
 anything+0.021
 such+0.020
Neg logit contributions
unny-0.043
sie-0.043
 stool-0.043
ll-0.042
ve-0.041
Token and
Token ID290
Feature activation-3.05e-04

Pos logit contributions
aw+0.017
 its+0.013
 flashes+0.012
 such+0.012
 anything+0.012
Neg logit contributions
sie-0.024
unny-0.024
 stool-0.024
hetically-0.023
ll-0.023
Token Ben
Token ID3932
Feature activation-2.90e-02

Pos logit contributions
aw+0.000
 flashes+0.000
 its+0.000
 anything+0.000
 such+0.000
Neg logit contributions
sie-0.001
unny-0.001
 stool-0.001
ll-0.001
otte-0.001
Token,
Token ID11
Feature activation-1.55e-02

Pos logit contributions
aw+0.156
 its+0.125
 flashes+0.125
 wishes+0.122
 anything+0.121
Neg logit contributions
unny-0.182
ll-0.179
 stool-0.178
sie-0.176
ve-0.173
Token c
Token ID269
Feature activation-5.25e-02

Pos logit contributions
aw+0.029
 flashes+0.023
 its+0.021
 such+0.021
 drawn+0.021
Neg logit contributions
 stool-0.027
unny-0.026
sie-0.025
 miner-0.025
ll-0.025
Tokenaw
Token ID707
Feature activation-1.04e-01

Pos logit contributions
aw+0.096
 its+0.094
 anything+0.094
 flashes+0.092
 such+0.092
Neg logit contributions
ll-0.083
ve-0.081
unny-0.078
sie-0.075
aughter-0.074
Token,
Token ID11
Feature activation-2.41e-02

Pos logit contributions
aw+0.167
 flashes+0.137
 its+0.135
 wishes+0.134
 anything+0.131
Neg logit contributions
unny-0.200
ll-0.198
 stool-0.197
sie-0.192
ve-0.190
Token c
Token ID269
Feature activation-5.20e-02

Pos logit contributions
aw+0.044
 flashes+0.035
 its+0.033
 such+0.033
 anything+0.032
Neg logit contributions
 stool-0.042
unny-0.041
sie-0.041
ll-0.040
 miner-0.039
Tokenaw
Token ID707
Feature activation-1.07e-01

Pos logit contributions
aw+0.089
 its+0.085
 anything+0.084
 flashes+0.083
 such+0.083
Neg logit contributions
ll-0.088
ve-0.087
unny-0.085
sie-0.082
 stool-0.081
Token".
Token ID1911
Feature activation-5.64e-03

Pos logit contributions
aw+0.170
 flashes+0.139
 its+0.137
 wishes+0.136
 anything+0.134
Neg logit contributions
unny-0.207
ll-0.204
 stool-0.204
sie-0.198
ve-0.196
Token<|endoftext|>
Token ID50256
Feature activation+1.56e-02

Pos logit contributions
aw+0.010
 flashes+0.008
 anything+0.008
 such+0.007
 wishes+0.007
Neg logit contributions
 stool-0.010
unny-0.010
oun-0.009
 aisle-0.009
aughter-0.009
TokenAnna
Token ID31160
Feature activation-2.91e-02

Pos logit contributions
sie+0.028
 stool+0.028
unny+0.028
hetically+0.027
ll+0.027
Neg logit contributions
aw-0.028
 its-0.023
 anything-0.022
 flashes-0.022
 such-0.022
Token and
Token ID290
Feature activation-1.37e-02

Pos logit contributions
aw+0.045
 anything+0.035
 flashes+0.035
 its+0.035
 such+0.034
Neg logit contributions
unny-0.058
sie-0.058
 stool-0.057
ll-0.057
ve-0.055
Token Ben
Token ID3932
Feature activation-2.77e-02

Pos logit contributions
aw+0.019
 flashes+0.014
 anything+0.013
 its+0.013
 wishes+0.013
Neg logit contributions
sie-0.029
unny-0.029
 stool-0.029
otte-0.028
aughter-0.028
Token d
Token ID288
Feature activation-2.08e-02

Pos logit contributions
 stool+0.008
sie+0.008
unny+0.008
ll+0.008
ve+0.008
Neg logit contributions
aw-0.009
 flashes-0.007
 Leave-0.006
 anything-0.006
 its-0.006
Tokenuk
Token ID2724
Feature activation+1.45e-02

Pos logit contributions
aw+0.041
 its+0.035
 anything+0.035
 flashes+0.035
 wishes+0.034
Neg logit contributions
unny-0.032
sie-0.032
 stool-0.031
ll-0.031
ve-0.031
Tokenk
Token ID74
Feature activation-1.89e-02

Pos logit contributions
unny+0.024
sie+0.024
 stool+0.024
ll+0.023
ve+0.023
Neg logit contributions
aw-0.027
 its-0.023
 flashes-0.022
 anything-0.022
 such-0.022
Tokenah
Token ID993
Feature activation-2.74e-02

Pos logit contributions
aw+0.030
 flashes+0.026
 wishes+0.024
 anything+0.023
 Leave+0.023
Neg logit contributions
unny-0.035
sie-0.034
ve-0.034
 stool-0.034
ll-0.033
Token and
Token ID290
Feature activation-4.81e-03

Pos logit contributions
aw+0.043
 flashes+0.039
 anything+0.037
 its+0.037
 such+0.036
Neg logit contributions
ve-0.051
unny-0.049
ll-0.048
sie-0.047
 stool-0.047
Token har
Token ID3971
Feature activation-1.07e-01

Pos logit contributions
aw+0.009
 flashes+0.007
 anything+0.007
 its+0.007
 Leave+0.007
Neg logit contributions
 stool-0.008
unny-0.008
sie-0.008
hetically-0.008
ll-0.008
Tokenissa
Token ID13808
Feature activation-2.94e-02

Pos logit contributions
aw+0.180
 flashes+0.165
 anything+0.154
 such+0.147
 flash+0.147
Neg logit contributions
ve-0.177
unny-0.174
ley-0.169
sie-0.167
hetically-0.162
Token and
Token ID290
Feature activation-9.28e-03

Pos logit contributions
aw+0.048
 flashes+0.039
 anything+0.036
 its+0.035
 such+0.035
Neg logit contributions
unny-0.057
sie-0.057
ve-0.055
ll-0.054
 stool-0.054
Token ber
Token ID18157
Feature activation-1.59e-02

Pos logit contributions
aw+0.016
 flashes+0.013
 its+0.012
 anything+0.012
 Leave+0.012
Neg logit contributions
 stool-0.016
sie-0.016
unny-0.016
hetically-0.015
 miner-0.015
Tokenb
Token ID65
Feature activation-2.84e-02

Pos logit contributions
aw+0.034
 its+0.028
 flashes+0.028
 anything+0.028
 such+0.027
Neg logit contributions
unny-0.023
sie-0.023
 stool-0.023
ll-0.022
ve-0.022
Tokenere
Token ID567
Feature activation+2.72e-02

Pos logit contributions
aw+0.062
 its+0.053
 flashes+0.052
 anything+0.052
 such+0.051
Neg logit contributions
unny-0.039
sie-0.038
 stool-0.037
ll-0.037
ve-0.036
Token se
Token ID384
Feature activation-2.47e-02

Pos logit contributions
aw+0.054
 Leave+0.053
 such+0.048
 aids+0.048
e+0.047
Neg logit contributions
 stool-0.052
 toddler-0.052
 fisherman-0.052
guard-0.051
sie-0.051
Tokenag
Token ID363
Feature activation-1.98e-03

Pos logit contributions
aw+0.036
 anything+0.034
 flashes+0.034
 such+0.033
 its+0.033
Neg logit contributions
sie-0.047
unny-0.046
ll-0.046
ve-0.046
aughter-0.044
Tokenull
Token ID724
Feature activation-4.68e-02

Pos logit contributions
aw+0.003
 Leave+0.003
 wishes+0.003
 anything+0.003
 such+0.003
Neg logit contributions
unny-0.004
sie-0.003
ve-0.003
ll-0.003
ert-0.003
Tokens
Token ID82
Feature activation-1.62e-02

Pos logit contributions
aw+0.098
 anything+0.087
 such+0.087
 wishes+0.084
 its+0.083
Neg logit contributions
ll-0.068
unny-0.066
ve-0.062
sie-0.061
hetically-0.060
Token and
Token ID290
Feature activation-2.37e-02

Pos logit contributions
aw+0.029
 wishes+0.024
 anything+0.024
 such+0.024
 Leave+0.023
Neg logit contributions
 stool-0.027
ll-0.025
hetically-0.025
unny-0.025
ve-0.024
Token pel
Token ID16176
Feature activation-1.06e-01

Pos logit contributions
aw+0.045
 Leave+0.035
 such+0.035
 wishes+0.033
 aids+0.033
Neg logit contributions
 stool-0.039
beans-0.036
sie-0.035
 miner-0.035
unny-0.034
Tokenicans
Token ID22398
Feature activation+2.36e-02

Pos logit contributions
aw+0.187
 its+0.156
 flashes+0.155
 anything+0.153
 wishes+0.152
Neg logit contributions
sie-0.182
unny-0.181
ll-0.179
 stool-0.177
ve-0.173
Token.
Token ID13
Feature activation-2.09e-02

Pos logit contributions
ll+0.043
ve+0.043
hetically+0.041
comb+0.040
unny+0.040
Neg logit contributions
aw-0.038
 Leave-0.031
 such-0.029
 anything-0.029
 flashes-0.028
Token They
Token ID1119
Feature activation+5.59e-03

Pos logit contributions
aw+0.036
eping+0.030
 wishes+0.029
 anything+0.029
 drawn+0.029
Neg logit contributions
unny-0.036
 stool-0.034
oun-0.032
 toddler-0.032
 grocer-0.031
Token heard
Token ID2982
Feature activation-5.58e-02

Pos logit contributions
sie+0.010
unny+0.010
 stool+0.010
ll+0.010
ve+0.009
Neg logit contributions
aw-0.010
 such-0.008
 its-0.008
 anything-0.008
 wishes-0.007
Token horns
Token ID26771
Feature activation-5.55e-02

Pos logit contributions
 Leave+0.080
aw+0.075
 drawn+0.073
 Winning+0.071
house+0.070
Neg logit contributions
 fisherman-0.071
 toddler-0.070
 donkey-0.068
guard-0.066
 bunny-0.065
Token the
Token ID262
Feature activation+6.75e-03

Pos logit contributions
aw+0.012
 Leave+0.008
 flashes+0.008
 drawn+0.008
 aids+0.007
Neg logit contributions
 stool-0.016
sie-0.015
 hill-0.015
 boulder-0.015
otte-0.015
Token stars
Token ID5788
Feature activation+1.32e-02

Pos logit contributions
unny+0.014
sie+0.013
 stool+0.013
ll+0.013
hetically+0.013
Neg logit contributions
aw-0.011
 anything-0.009
 its-0.009
 flashes-0.009
 wishes-0.009
Token and
Token ID290
Feature activation-1.02e-02

Pos logit contributions
 stool+0.026
sie+0.026
unny+0.025
ll+0.025
hetically+0.024
Neg logit contributions
aw-0.021
 flashes-0.016
 its-0.016
 wishes-0.016
 anything-0.015
Token planets
Token ID14705
Feature activation-4.18e-02

Pos logit contributions
aw+0.017
 Leave+0.013
 flashes+0.012
 aids+0.012
 such+0.012
Neg logit contributions
 stool-0.019
sie-0.018
ll-0.018
otte-0.018
 aisle-0.018
Token in
Token ID287
Feature activation-1.04e-02

Pos logit contributions
aw+0.063
 flashes+0.047
 its+0.046
 wishes+0.045
 Leave+0.045
Neg logit contributions
sie-0.084
 stool-0.084
ll-0.082
unny-0.081
ve-0.080
Token outer
Token ID12076
Feature activation-1.06e-01

Pos logit contributions
aw+0.017
 Leave+0.012
 flashes+0.011
 drawn+0.011
 aids+0.011
Neg logit contributions
 stool-0.021
sie-0.021
 aisle-0.020
otte-0.020
 hill-0.019
Token space
Token ID2272
Feature activation-3.56e-02

Pos logit contributions
aw+0.175
 Leave+0.131
 flashes+0.127
 anything+0.127
 such+0.122
Neg logit contributions
sie-0.201
 stool-0.199
unny-0.193
oris-0.189
otte-0.186
Token.
Token ID13
Feature activation-8.48e-03

Pos logit contributions
aw+0.058
 Leave+0.048
 flashes+0.046
 Its+0.043
 It+0.041
Neg logit contributions
 stool-0.068
sie-0.067
 aisle-0.063
ve-0.063
ll-0.062
Token So
Token ID1406
Feature activation+1.37e-02

Pos logit contributions
aw+0.012
 flashes+0.009
 its+0.009
 anything+0.009
 wishes+0.009
Neg logit contributions
 stool-0.018
unny-0.018
ll-0.018
sie-0.018
aughter-0.017
Token one
Token ID530
Feature activation-6.40e-03

Pos logit contributions
sie+0.025
 stool+0.024
unny+0.023
ll+0.023
aughter+0.023
Neg logit contributions
aw-0.025
 flashes-0.020
 Leave-0.019
 its-0.018
 wishes-0.018
Token day
Token ID1110
Feature activation-1.05e-02

Pos logit contributions
aw+0.015
 its+0.012
 flashes+0.012
 anything+0.012
 such+0.012
Neg logit contributions
unny-0.008
sie-0.008
 stool-0.008
ll-0.008
otte-0.008
Token towers
Token ID18028
Feature activation-5.48e-03

Pos logit contributions
sie+0.006
 stool+0.005
 miner+0.005
ve+0.005
hetically+0.005
Neg logit contributions
aw-0.005
 Leave-0.004
 flashes-0.004
 its-0.004
 War-0.004
Token,
Token ID11
Feature activation-1.84e-03

Pos logit contributions
aw+0.009
 its+0.007
 wishes+0.007
 flashes+0.007
 anything+0.007
Neg logit contributions
 stool-0.010
ll-0.010
hetically-0.010
ve-0.010
sie-0.010
Token and
Token ID290
Feature activation-6.11e-03

Pos logit contributions
aw+0.003
 Leave+0.002
 wishes+0.002
 its+0.002
 flashes+0.002
Neg logit contributions
 stool-0.003
 miner-0.003
sie-0.003
hetically-0.003
beans-0.003
Token knights
Token ID27536
Feature activation-5.63e-02

Pos logit contributions
aw+0.011
 Leave+0.008
 its+0.008
 flashes+0.008
 Their+0.007
Neg logit contributions
 stool-0.011
sie-0.011
 miner-0.010
beans-0.010
 aisle-0.010
Token and
Token ID290
Feature activation-3.49e-02

Pos logit contributions
aw+0.092
 flashes+0.072
 anything+0.070
 its+0.070
 such+0.067
Neg logit contributions
ll-0.105
 stool-0.104
ve-0.104
unny-0.104
sie-0.103
Token princess
Token ID21752
Feature activation-1.06e-01

Pos logit contributions
aw+0.065
 flashes+0.051
 its+0.049
 Leave+0.049
 such+0.048
Neg logit contributions
sie-0.060
 stool-0.057
unny-0.056
 miner-0.056
otte-0.055
Tokenes
Token ID274
Feature activation-5.96e-02

Pos logit contributions
aw+0.191
 flashes+0.164
 anything+0.157
 its+0.155
 Leave+0.155
Neg logit contributions
ve-0.175
ll-0.168
sie-0.168
unny-0.164
 miner-0.164
Token!
Token ID0
Feature activation-2.59e-02

Pos logit contributions
aw+0.096
 flashes+0.075
 Leave+0.072
 wishes+0.072
 anything+0.071
Neg logit contributions
sie-0.111
ll-0.110
 stool-0.110
ve-0.110
unny-0.108
Token Sus
Token ID8932
Feature activation+6.23e-03

Pos logit contributions
aw+0.038
 flashes+0.029
 anything+0.028
 wishes+0.028
 its+0.028
Neg logit contributions
unny-0.056
sie-0.054
 stool-0.053
aughter-0.053
oris-0.052
Tokenie
Token ID494
Feature activation-4.78e-02

Pos logit contributions
unny+0.010
 stool+0.010
sie+0.010
ll+0.009
hetically+0.009
Neg logit contributions
aw-0.013
 its-0.011
 flashes-0.010
 anything-0.010
 such-0.010
Token spent
Token ID3377
Feature activation+2.09e-03

Pos logit contributions
aw+0.070
 anything+0.054
 its+0.054
 flashes+0.053
 such+0.052
Neg logit contributions
sie-0.098
unny-0.098
 stool-0.098
ll-0.097
ve-0.096
Token "
Token ID366
Feature activation-4.93e-03

Pos logit contributions
aw+0.073
 flashes+0.059
 drawn+0.058
 such+0.056
 Leave+0.055
Neg logit contributions
 stool-0.071
unny-0.069
ipop-0.066
sie-0.066
ve-0.066
Tokenwow
Token ID42773
Feature activation+8.67e-03

Pos logit contributions
aw+0.010
 its+0.008
 flashes+0.008
 anything+0.008
 wishes+0.008
Neg logit contributions
unny-0.008
ve-0.007
ll-0.007
sie-0.007
 stool-0.007
Token"
Token ID1
Feature activation+6.10e-02

Pos logit contributions
sie+0.015
unny+0.015
 stool+0.015
ll+0.015
hetically+0.015
Neg logit contributions
aw-0.016
 its-0.012
 anything-0.012
 flashes-0.012
 such-0.012
Token and
Token ID290
Feature activation-2.68e-03

Pos logit contributions
unny+0.111
ll+0.108
 stool+0.107
ve+0.104
aughter+0.104
Neg logit contributions
aw-0.102
 flashes-0.088
 wishes-0.086
 its-0.084
 anything-0.082
Token "
Token ID366
Feature activation-1.40e-02

Pos logit contributions
aw+0.005
 its+0.004
 flashes+0.004
 anything+0.003
 such+0.003
Neg logit contributions
 stool-0.005
unny-0.005
sie-0.005
ll-0.005
otte-0.005
Tokenaw
Token ID707
Feature activation-1.06e-01

Pos logit contributions
aw+0.027
 its+0.024
 flashes+0.024
 anything+0.023
 wishes+0.023
Neg logit contributions
unny-0.021
ve-0.021
ll-0.021
sie-0.020
 stool-0.019
Tokenw
Token ID86
Feature activation-2.44e-02

Pos logit contributions
aw+0.181
 flashes+0.151
 its+0.149
 wishes+0.148
 anything+0.144
Neg logit contributions
unny-0.191
 stool-0.188
ll-0.184
sie-0.182
ve-0.180
Token".
Token ID1911
Feature activation-1.77e-02

Pos logit contributions
aw+0.042
 flashes+0.034
 its+0.034
 wishes+0.033
 anything+0.033
Neg logit contributions
unny-0.045
 stool-0.043
sie-0.043
ll-0.043
ve-0.042
Token They
Token ID1119
Feature activation-3.99e-03

Pos logit contributions
aw+0.029
 flashes+0.024
 wishes+0.023
 such+0.022
 drawn+0.022
Neg logit contributions
 stool-0.032
unny-0.031
�-0.029
aughter-0.029
ylan-0.029
Token are
Token ID389
Feature activation-7.14e-03

Pos logit contributions
aw+0.006
 its+0.005
 flashes+0.004
 anything+0.004
 such+0.004
Neg logit contributions
unny-0.008
sie-0.008
 stool-0.008
ll-0.008
aughter-0.008
Token calm
Token ID9480
Feature activation-2.77e-02

Pos logit contributions
aw+0.013
 Leave+0.010
 wishes+0.009
 Its+0.009
 flashes+0.009
Neg logit contributions
 stool-0.012
ll-0.011
sie-0.011
otte-0.011
nie-0.011
Token We
Token ID775
Feature activation-7.23e-03

Pos logit contributions
aw+0.033
 anything+0.024
 its+0.024
 drawn+0.023
 flashes+0.023
Neg logit contributions
unny-0.037
 stool-0.035
stack-0.034
aughter-0.033
comb-0.033
Token were
Token ID547
Feature activation-5.95e-04

Pos logit contributions
aw+0.012
 its+0.010
 flashes+0.009
 anything+0.009
 such+0.009
Neg logit contributions
unny-0.014
sie-0.014
ll-0.013
 stool-0.013
ve-0.013
Token playing
Token ID2712
Feature activation+3.56e-03

Pos logit contributions
aw+0.001
 flashes+0.001
 its+0.001
 Leave+0.001
 anything+0.001
Neg logit contributions
ll-0.001
ve-0.001
 stool-0.001
sie-0.001
aughter-0.001
Token knights
Token ID27536
Feature activation-3.30e-02

Pos logit contributions
 stool+0.007
sie+0.007
unny+0.007
ll+0.007
ve+0.007
Neg logit contributions
aw-0.005
 flashes-0.004
 its-0.004
 anything-0.004
 such-0.004
Token and
Token ID290
Feature activation-3.42e-02

Pos logit contributions
aw+0.046
 its+0.035
 anything+0.034
 flashes+0.033
 such+0.033
Neg logit contributions
unny-0.072
sie-0.071
 stool-0.071
ll-0.069
hetically-0.069
Token princess
Token ID21752
Feature activation-1.06e-01

Pos logit contributions
aw+0.062
 its+0.047
 flashes+0.047
 anything+0.046
 wishes+0.045
Neg logit contributions
sie-0.061
 stool-0.061
unny-0.061
ll-0.059
otte-0.057
Tokenes
Token ID274
Feature activation-5.85e-02

Pos logit contributions
aw+0.171
 flashes+0.137
 its+0.136
 anything+0.133
 such+0.130
Neg logit contributions
sie-0.201
unny-0.199
 stool-0.198
ll-0.194
ve-0.190
Token.
Token ID13
Feature activation-3.39e-02

Pos logit contributions
aw+0.090
 flashes+0.071
 its+0.068
 anything+0.067
 wishes+0.067
Neg logit contributions
sie-0.115
unny-0.115
 stool-0.115
ll-0.112
ve-0.109
Token Tom
Token ID4186
Feature activation-5.00e-03

Pos logit contributions
aw+0.062
 anything+0.047
 drawn+0.046
 flashes+0.045
 its+0.045
Neg logit contributions
unny-0.061
stack-0.056
 stool-0.055
 miner-0.052
oun-0.052
Token was
Token ID373
Feature activation+8.49e-04

Pos logit contributions
aw+0.007
 its+0.005
 flashes+0.005
 anything+0.005
 wishes+0.005
Neg logit contributions
unny-0.011
sie-0.011
 stool-0.011
otte-0.011
ll-0.010
Token a
Token ID257
Feature activation+2.37e-02

Pos logit contributions
 stool+0.001
sie+0.001
 miner+0.001
otte+0.001
ve+0.001
Neg logit contributions
aw-0.001
 Leave-0.001
 flashes-0.001
 wishes-0.001
 Its-0.001
Token "
Token ID366
Feature activation-3.29e-02

Pos logit contributions
aw+0.021
 wishes+0.016
 such+0.016
 flashes+0.016
 anything+0.016
Neg logit contributions
 stool-0.024
unny-0.023
ylan-0.022
aughter-0.022
oris-0.022
TokenC
Token ID34
Feature activation-2.29e-02

Pos logit contributions
aw+0.058
 its+0.047
 flashes+0.046
 anything+0.046
 such+0.045
Neg logit contributions
unny-0.059
sie-0.059
 stool-0.058
ll-0.057
ve-0.055
Tokenaw
Token ID707
Feature activation-1.04e-01

Pos logit contributions
aw+0.043
 its+0.036
 flashes+0.035
 anything+0.035
 such+0.034
Neg logit contributions
unny-0.038
sie-0.038
 stool-0.038
ll-0.037
ve-0.036
Token,
Token ID11
Feature activation-2.15e-02

Pos logit contributions
aw+0.165
 its+0.131
 flashes+0.131
 wishes+0.128
 such+0.126
Neg logit contributions
unny-0.201
 stool-0.200
ll-0.198
sie-0.194
ve-0.191
Token c
Token ID269
Feature activation-5.14e-02

Pos logit contributions
aw+0.036
 its+0.028
 flashes+0.028
 anything+0.027
 such+0.027
Neg logit contributions
 stool-0.041
unny-0.040
sie-0.039
ll-0.038
ve-0.037
Tokenaw
Token ID707
Feature activation-1.05e-01

Pos logit contributions
aw+0.063
 its+0.060
 anything+0.058
 such+0.056
 flashes+0.055
Neg logit contributions
ll-0.108
ve-0.108
unny-0.107
 stool-0.105
aughter-0.104
Token!"
Token ID2474
Feature activation+4.63e-03

Pos logit contributions
aw+0.166
 its+0.131
 flashes+0.129
 anything+0.126
 wishes+0.125
Neg logit contributions
unny-0.209
 stool-0.208
sie-0.206
ll-0.203
ve-0.198
Token\n
Token ID198
Feature activation-7.24e-03

Pos logit contributions
 stool+0.008
unny+0.007
sie+0.007
aughter+0.007
ylan+0.007
Neg logit contributions
aw-0.009
 flashes-0.007
 anything-0.007
 such-0.007
 its-0.007
Token\n
Token ID198
Feature activation-1.23e-03

Pos logit contributions
aw+0.009
 its+0.006
 flashes+0.006
 anything+0.006
 such+0.006
Neg logit contributions
sie-0.017
unny-0.017
 stool-0.017
ll-0.017
hetically-0.017
TokenThey
Token ID2990
Feature activation+2.56e-02

Pos logit contributions
aw+0.002
 its+0.002
 flashes+0.002
 anything+0.002
 wishes+0.001
Neg logit contributions
sie-0.002
hetically-0.002
ipop-0.002
 stool-0.002
unny-0.002
Token had
Token ID550
Feature activation-1.96e-02

Pos logit contributions
unny+0.054
 stool+0.053
sie+0.052
ll+0.052
aughter+0.052
Neg logit contributions
aw-0.039
 its-0.031
 flashes-0.029
 anything-0.028
 wishes-0.027
Token pretended
Token ID36766
Feature activation-2.19e-02

Pos logit contributions
aw+0.051
 flashes+0.040
 anything+0.038
 Leave+0.036
 such+0.036
Neg logit contributions
 stool-0.054
sie-0.054
ll-0.052
otte-0.051
 miner-0.051
Token to
Token ID284
Feature activation-1.01e-02

Pos logit contributions
aw+0.042
 flashes+0.035
 Leave+0.034
 wishes+0.034
 Its+0.032
Neg logit contributions
 stool-0.033
guard-0.032
ll-0.031
ipop-0.030
 aisle-0.030
Token be
Token ID307
Feature activation-2.17e-02

Pos logit contributions
aw+0.021
 flashes+0.016
 such+0.014
 Its+0.014
 Leave+0.014
Neg logit contributions
 hill-0.015
 stool-0.015
 bud-0.015
stack-0.015
comb-0.014
Token knights
Token ID27536
Feature activation-3.08e-02

Pos logit contributions
aw+0.035
 wishes+0.031
 Leave+0.031
 flashes+0.030
 Its+0.029
Neg logit contributions
sie-0.036
 miner-0.034
otte-0.033
unny-0.033
 grocer-0.032
Token and
Token ID290
Feature activation-3.39e-02

Pos logit contributions
aw+0.048
 flashes+0.038
 its+0.037
 anything+0.036
 wishes+0.035
Neg logit contributions
unny-0.060
sie-0.060
 stool-0.060
ll-0.059
hetically-0.057
Token princess
Token ID21752
Feature activation-1.05e-01

Pos logit contributions
aw+0.065
 flashes+0.048
 Leave+0.046
 such+0.046
 its+0.046
Neg logit contributions
sie-0.058
 stool-0.056
unny-0.055
ll-0.055
 miner-0.054
Tokenes
Token ID274
Feature activation-5.59e-02

Pos logit contributions
aw+0.175
 flashes+0.150
 its+0.142
 anything+0.142
 Leave+0.138
Neg logit contributions
sie-0.184
unny-0.184
ll-0.181
beans-0.176
ve-0.176
Token.
Token ID13
Feature activation-1.69e-02

Pos logit contributions
aw+0.089
 flashes+0.072
 wishes+0.070
 Leave+0.069
 anything+0.065
Neg logit contributions
sie-0.102
unny-0.101
ll-0.101
 stool-0.100
hetically-0.097
Token They
Token ID1119
Feature activation+2.90e-03

Pos logit contributions
aw+0.032
 anything+0.024
 wishes+0.023
 flashes+0.023
eping+0.023
Neg logit contributions
unny-0.029
 stool-0.027
 aisle-0.026
sie-0.026
oris-0.026
Token had
Token ID550
Feature activation-2.58e-02

Pos logit contributions
unny+0.006
 stool+0.006
sie+0.006
aughter+0.006
ve+0.006
Neg logit contributions
aw-0.004
 its-0.003
 flashes-0.003
 wishes-0.003
 anything-0.003
Token a
Token ID257
Feature activation-5.32e-03

Pos logit contributions
aw+0.039
 flashes+0.033
 Leave+0.032
 aids+0.031
 wishes+0.030
Neg logit contributions
sie-0.045
ll-0.044
 stool-0.044
unny-0.043
beans-0.042
Token"
Token ID1
Feature activation+1.98e-02

Pos logit contributions
aw+0.019
 its+0.015
 Leave+0.013
 flashes+0.013
 It+0.013
Neg logit contributions
sie-0.021
 miner-0.021
hetically-0.021
 stool-0.020
comb-0.020
TokenB
Token ID33
Feature activation+1.96e-02

Pos logit contributions
unny+0.041
 stool+0.040
sie+0.040
ll+0.039
hetically+0.039
Neg logit contributions
aw-0.032
 flashes-0.024
 its-0.024
 anything-0.023
 such-0.023
Tokenrav
Token ID4108
Feature activation-2.28e-02

Pos logit contributions
 miner+0.029
 stool+0.029
sie+0.028
hetically+0.027
 grocer+0.026
Neg logit contributions
aw-0.043
 Leave-0.034
 anything-0.033
 its-0.033
 such-0.033
Tokeno
Token ID78
Feature activation-1.08e-02

Pos logit contributions
aw+0.054
 anything+0.053
 wishes+0.053
 flashes+0.051
 its+0.049
Neg logit contributions
unny-0.025
ll-0.023
otte-0.020
ve-0.020
aughter-0.020
Token,
Token ID11
Feature activation-7.37e-03

Pos logit contributions
aw+0.014
 its+0.010
 such+0.009
 anything+0.009
 flashes+0.008
Neg logit contributions
sie-0.026
 stool-0.025
unny-0.025
otte-0.025
hetically-0.025
Token brav
Token ID49025
Feature activation-1.05e-01

Pos logit contributions
aw+0.011
 its+0.008
 flashes+0.008
 anything+0.008
 such+0.008
Neg logit contributions
sie-0.015
unny-0.015
 stool-0.015
ll-0.015
otte-0.015
Tokeno
Token ID78
Feature activation-1.74e-02

Pos logit contributions
aw+0.232
 anything+0.203
 flashes+0.201
 its+0.201
 such+0.200
Neg logit contributions
unny-0.135
sie-0.129
ley-0.121
ll-0.121
 stool-0.120
Token,
Token ID11
Feature activation-1.17e-02

Pos logit contributions
aw+0.023
 its+0.017
 anything+0.015
 such+0.015
 flashes+0.015
Neg logit contributions
sie-0.040
 stool-0.039
unny-0.039
hetically-0.038
otte-0.038
Token you
Token ID345
Feature activation-1.00e-02

Pos logit contributions
aw+0.017
 flashes+0.013
 its+0.013
 anything+0.012
 wishes+0.012
Neg logit contributions
 stool-0.025
sie-0.024
unny-0.024
ll-0.024
otte-0.023
Token are
Token ID389
Feature activation-8.00e-04

Pos logit contributions
aw+0.019
 its+0.016
 flashes+0.014
 Its+0.014
 aids+0.014
Neg logit contributions
ve-0.017
sie-0.016
unny-0.016
ll-0.016
otte-0.015
Token wonderful
Token ID7932
Feature activation-3.13e-02

Pos logit contributions
aw+0.001
 flashes+0.001
 its+0.001
 wishes+0.001
 Leave+0.001
Neg logit contributions
sie-0.002
 stool-0.002
unny-0.002
otte-0.002
ll-0.002
Token They
Token ID1119
Feature activation+4.11e-04

Pos logit contributions
aw+0.044
 wishes+0.036
e+0.033
 flashes+0.033
 such+0.033
Neg logit contributions
 stool-0.045
unny-0.045
 grocer-0.040
beans-0.039
suits-0.039
Token felt
Token ID2936
Feature activation-2.72e-02

Pos logit contributions
sie+0.001
unny+0.001
 stool+0.001
ll+0.001
ve+0.001
Neg logit contributions
aw-0.001
 its-0.001
 anything-0.001
 flashes-0.001
 such-0.001
Token like
Token ID588
Feature activation-2.05e-03

Pos logit contributions
aw+0.051
 Leave+0.042
 aids+0.039
c+0.038
 flashes+0.037
Neg logit contributions
sie-0.039
mit-0.038
 stool-0.036
 toddler-0.036
ylan-0.035
Token princes
Token ID42676
Feature activation-4.96e-02

Pos logit contributions
aw+0.003
 Leave+0.003
 flashes+0.002
 wishes+0.002
 aids+0.002
Neg logit contributions
 stool-0.004
sie-0.004
 miner-0.004
beans-0.004
unny-0.004
Token and
Token ID290
Feature activation-3.65e-02

Pos logit contributions
aw+0.083
 flashes+0.066
 its+0.066
 anything+0.064
 wishes+0.062
Neg logit contributions
ll-0.091
 stool-0.089
sie-0.089
unny-0.088
ve-0.087
Token princess
Token ID21752
Feature activation-1.05e-01

Pos logit contributions
aw+0.075
 Leave+0.055
 flashes+0.055
 such+0.055
 It+0.053
Neg logit contributions
sie-0.058
 stool-0.056
beans-0.053
unny-0.053
ll-0.052
Tokenes
Token ID274
Feature activation-5.22e-02

Pos logit contributions
aw+0.175
 flashes+0.156
 Its+0.148
 its+0.147
 Leave+0.147
Neg logit contributions
ve-0.167
sie-0.167
ll-0.166
beans-0.164
unny-0.161
Token.
Token ID13
Feature activation-2.68e-02

Pos logit contributions
aw+0.091
 flashes+0.075
 Leave+0.074
 wishes+0.070
 Its+0.069
Neg logit contributions
ll-0.084
sie-0.083
 stool-0.082
unny-0.081
comb-0.081
Token They
Token ID1119
Feature activation+2.11e-03

Pos logit contributions
aw+0.052
 wishes+0.041
 flashes+0.040
eping+0.039
 such+0.038
Neg logit contributions
unny-0.046
 stool-0.041
 grocer-0.038
 opener-0.038
oris-0.038
Token played
Token ID2826
Feature activation+1.10e-02

Pos logit contributions
sie+0.004
unny+0.004
 stool+0.004
ll+0.004
ve+0.004
Neg logit contributions
aw-0.004
 its-0.003
 flashes-0.003
 anything-0.003
 such-0.003
Token with
Token ID351
Feature activation-3.63e-03

Pos logit contributions
 stool+0.020
beans+0.019
sie+0.019
ve+0.019
unny+0.018
Neg logit contributions
aw-0.018
 flashes-0.015
 Leave-0.015
 wishes-0.013
 drawn-0.013
Token their
Token ID511
Feature activation-2.34e-02

Pos logit contributions
aw+0.005
 Leave+0.004
 flashes+0.004
 Nobody+0.004
 Ghosts+0.004
Neg logit contributions
 stool-0.006
beans-0.005
 aisle-0.005
 seller-0.005
ipop-0.005
Token map
Token ID3975
Feature activation-3.20e-02

Pos logit contributions
aw+0.046
 its+0.037
 flashes+0.036
 anything+0.036
 such+0.036
Neg logit contributions
 stool-0.039
sie-0.038
unny-0.038
ll-0.037
ve-0.036
Token and
Token ID290
Feature activation-4.40e-02

Pos logit contributions
aw+0.041
 its+0.030
 flashes+0.028
 anything+0.028
 wishes+0.027
Neg logit contributions
unny-0.074
sie-0.074
 stool-0.073
ll-0.072
ve-0.071
Token sword
Token ID8429
Feature activation-9.27e-02

Pos logit contributions
aw+0.069
 flashes+0.050
 its+0.049
 anything+0.048
 such+0.047
Neg logit contributions
 stool-0.090
sie-0.089
unny-0.087
ll-0.086
hetically-0.084
Token and
Token ID290
Feature activation-5.31e-02

Pos logit contributions
aw+0.131
 flashes+0.100
 anything+0.098
 its+0.096
 such+0.096
Neg logit contributions
unny-0.195
 stool-0.194
ll-0.193
sie-0.191
ve-0.186
Token hat
Token ID6877
Feature activation-1.04e-01

Pos logit contributions
aw+0.090
 flashes+0.066
 its+0.064
 anything+0.064
 such+0.063
Neg logit contributions
 stool-0.102
sie-0.099
unny-0.096
ll-0.095
hetically-0.094
Token.
Token ID13
Feature activation-3.32e-02

Pos logit contributions
aw+0.157
 flashes+0.133
 anything+0.125
 wishes+0.125
 Leave+0.124
Neg logit contributions
ll-0.197
 stool-0.196
unny-0.194
hetically-0.191
ve-0.187
Token They
Token ID1119
Feature activation-5.06e-03

Pos logit contributions
aw+0.067
eping+0.058
 wishes+0.054
 anything+0.053
c+0.053
Neg logit contributions
unny-0.044
 stool-0.043
oun-0.043
stack-0.039
�-0.039
Token have
Token ID423
Feature activation-1.34e-02

Pos logit contributions
aw+0.007
 its+0.005
 anything+0.005
 wishes+0.005
 flashes+0.005
Neg logit contributions
unny-0.012
sie-0.011
 stool-0.011
aughter-0.011
ve-0.011
Token fun
Token ID1257
Feature activation-1.45e-02

Pos logit contributions
aw+0.021
 flashes+0.017
 Leave+0.017
 aids+0.017
 War+0.016
Neg logit contributions
 stool-0.024
ll-0.023
sie-0.023
beans-0.022
 aisle-0.022
Token.
Token ID13
Feature activation-2.98e-02

Pos logit contributions
aw+0.024
 flashes+0.020
 Leave+0.019
 Its+0.019
 aids+0.018
Neg logit contributions
ll-0.026
ve-0.026
 stool-0.026
sie-0.025
beans-0.024
Token animals
Token ID4695
Feature activation-5.26e-02

Pos logit contributions
aw+0.004
 Leave+0.004
 flashes+0.003
 Nobody+0.003
 Instead+0.003
Neg logit contributions
sie-0.005
 aisle-0.005
 stool-0.005
 forks-0.005
beans-0.005
Token,
Token ID11
Feature activation-3.31e-02

Pos logit contributions
aw+0.087
 flashes+0.069
 its+0.067
 wishes+0.067
 anything+0.066
Neg logit contributions
 stool-0.100
unny-0.099
ll-0.098
sie-0.097
ve-0.093
Token cars
Token ID5006
Feature activation-3.23e-02

Pos logit contributions
aw+0.063
 wishes+0.050
 its+0.048
 Leave+0.047
 flashes+0.045
Neg logit contributions
 stool-0.057
sie-0.050
beans-0.049
 aisle-0.048
unny-0.048
Token,
Token ID11
Feature activation-3.79e-02

Pos logit contributions
aw+0.052
 its+0.041
 anything+0.040
 flashes+0.040
 wishes+0.040
Neg logit contributions
 stool-0.063
unny-0.062
ll-0.061
sie-0.061
ve-0.059
Token and
Token ID290
Feature activation-2.90e-02

Pos logit contributions
aw+0.066
 wishes+0.053
 its+0.053
 such+0.050
 Leave+0.049
Neg logit contributions
 stool-0.069
sie-0.064
ll-0.062
unny-0.062
 aisle-0.061
Token princess
Token ID21752
Feature activation-1.04e-01

Pos logit contributions
aw+0.053
 its+0.040
 Leave+0.037
 Its+0.037
 flashes+0.037
Neg logit contributions
 stool-0.049
sie-0.048
 toddler-0.045
beans-0.045
 aisle-0.044
Tokenes
Token ID274
Feature activation-5.79e-02

Pos logit contributions
aw+0.191
 flashes+0.164
 its+0.162
 wishes+0.158
 anything+0.157
Neg logit contributions
unny-0.174
ll-0.166
sie-0.166
ve-0.165
 stool-0.165
Token.
Token ID13
Feature activation-1.76e-02

Pos logit contributions
aw+0.093
 flashes+0.075
 wishes+0.074
 Leave+0.071
 its+0.068
Neg logit contributions
 stool-0.109
unny-0.108
sie-0.106
ll-0.106
 aisle-0.102
Token Sometimes
Token ID8975
Feature activation-8.39e-03

Pos logit contributions
aw+0.030
 wishes+0.025
 anything+0.024
 its+0.024
 such+0.023
Neg logit contributions
 stool-0.033
 aisle-0.031
oun-0.029
aughter-0.029
unny-0.029
Token they
Token ID484
Feature activation+8.23e-03

Pos logit contributions
aw+0.018
 Leave+0.015
 drawn+0.015
 Its+0.014
 flashes+0.014
Neg logit contributions
 stool-0.011
sie-0.011
 aisle-0.011
otte-0.010
guard-0.010
Token bring
Token ID2222
Feature activation-1.23e-02

Pos logit contributions
sie+0.015
ll+0.015
unny+0.014
 stool+0.014
otte+0.014
Neg logit contributions
aw-0.015
 its-0.011
 such-0.011
 Its-0.011
 flashes-0.011
Token pretend
Token ID16614
Feature activation-4.33e-02

Pos logit contributions
aw+0.014
 Leave+0.011
 Ach+0.010
 Its+0.010
 flashes+0.010
Neg logit contributions
 stool-0.012
hetically-0.011
sie-0.011
comb-0.011
 bud-0.011
Token to
Token ID284
Feature activation-1.03e-02

Pos logit contributions
aw+0.088
 Leave+0.073
 flashes+0.068
 Its+0.066
 Ach+0.065
Neg logit contributions
 stool-0.062
guard-0.057
beans-0.057
 aisle-0.054
balls-0.054
Token be
Token ID307
Feature activation-2.58e-02

Pos logit contributions
aw+0.021
 flashes+0.016
 Leave+0.015
 such+0.015
 aids+0.014
Neg logit contributions
 bud-0.016
comb-0.015
 stool-0.015
sie-0.015
 hill-0.015
Token princes
Token ID42676
Feature activation-4.78e-02

Pos logit contributions
aw+0.043
 Leave+0.040
 Ach+0.036
 flashes+0.035
 Its+0.034
Neg logit contributions
sie-0.038
 miner-0.037
mit-0.035
beans-0.035
 seller-0.035
Token and
Token ID290
Feature activation-3.33e-02

Pos logit contributions
aw+0.077
 flashes+0.061
 its+0.058
 Leave+0.057
 anything+0.055
Neg logit contributions
ll-0.091
 stool-0.090
sie-0.089
unny-0.089
comb-0.089
Token princess
Token ID21752
Feature activation-1.04e-01

Pos logit contributions
aw+0.069
 flashes+0.054
 Leave+0.052
 such+0.051
 its+0.051
Neg logit contributions
sie-0.052
 stool-0.048
unny-0.047
ll-0.046
beans-0.046
Tokenes
Token ID274
Feature activation-5.64e-02

Pos logit contributions
aw+0.185
 flashes+0.168
 Leave+0.156
 its+0.152
 Its+0.151
Neg logit contributions
sie-0.162
ve-0.157
unny-0.156
beans-0.156
ll-0.156
Token.
Token ID13
Feature activation-1.76e-02

Pos logit contributions
aw+0.097
 flashes+0.078
 Leave+0.077
 wishes+0.073
 aids+0.069
Neg logit contributions
sie-0.095
unny-0.094
 stool-0.092
ll-0.091
ley-0.090
Token They
Token ID1119
Feature activation+6.07e-03

Pos logit contributions
aw+0.033
 wishes+0.028
 such+0.026
e+0.026
 anything+0.026
Neg logit contributions
unny-0.028
 stool-0.028
 opener-0.025
suits-0.025
oris-0.025
Token have
Token ID423
Feature activation-5.99e-03

Pos logit contributions
unny+0.013
sie+0.013
 stool+0.013
ll+0.012
ve+0.012
Neg logit contributions
aw-0.009
 its-0.007
 flashes-0.007
 anything-0.007
 wishes-0.006
Token a
Token ID257
Feature activation+3.14e-04

Pos logit contributions
aw+0.010
 Leave+0.008
 flashes+0.008
 aids+0.007
 War+0.007
Neg logit contributions
sie-0.010
 stool-0.010
ll-0.010
beans-0.009
 first-0.009
Token "
Token ID366
Feature activation-3.89e-03

Pos logit contributions
aw+0.008
 flashes+0.006
 such+0.006
 drawn+0.006
 Leave+0.006
Neg logit contributions
 stool-0.006
unny-0.006
ipop-0.006
sie-0.006
ll-0.006
Tokenc
Token ID66
Feature activation-9.14e-02

Pos logit contributions
aw+0.008
 its+0.007
 anything+0.007
 flashes+0.007
 about+0.007
Neg logit contributions
unny-0.006
sie-0.005
ll-0.005
ve-0.005
 stool-0.005
Tokenaw
Token ID707
Feature activation-9.50e-02

Pos logit contributions
aw+0.136
 anything+0.114
 flashes+0.113
 its+0.111
 such+0.110
Neg logit contributions
ll-0.183
ve-0.178
unny-0.176
hetically-0.167
sie-0.164
Token,
Token ID11
Feature activation-1.55e-02

Pos logit contributions
aw+0.156
 its+0.125
 flashes+0.125
 wishes+0.122
 anything+0.121
Neg logit contributions
unny-0.182
ll-0.179
 stool-0.178
sie-0.176
ve-0.173
Token c
Token ID269
Feature activation-5.25e-02

Pos logit contributions
aw+0.029
 flashes+0.023
 its+0.021
 such+0.021
 drawn+0.021
Neg logit contributions
 stool-0.027
unny-0.026
sie-0.025
 miner-0.025
ll-0.025
Tokenaw
Token ID707
Feature activation-1.04e-01

Pos logit contributions
aw+0.096
 its+0.094
 anything+0.094
 flashes+0.092
 such+0.092
Neg logit contributions
ll-0.083
ve-0.081
unny-0.078
sie-0.075
aughter-0.074
Token,
Token ID11
Feature activation-2.41e-02

Pos logit contributions
aw+0.167
 flashes+0.137
 its+0.135
 wishes+0.134
 anything+0.131
Neg logit contributions
unny-0.200
ll-0.198
 stool-0.197
sie-0.192
ve-0.190
Token c
Token ID269
Feature activation-5.20e-02

Pos logit contributions
aw+0.044
 flashes+0.035
 its+0.033
 such+0.033
 anything+0.032
Neg logit contributions
 stool-0.042
unny-0.041
sie-0.041
ll-0.040
 miner-0.039
Tokenaw
Token ID707
Feature activation-1.07e-01

Pos logit contributions
aw+0.089
 its+0.085
 anything+0.084
 flashes+0.083
 such+0.083
Neg logit contributions
ll-0.088
ve-0.087
unny-0.085
sie-0.082
 stool-0.081
Token".
Token ID1911
Feature activation-5.64e-03

Pos logit contributions
aw+0.170
 flashes+0.139
 its+0.137
 wishes+0.136
 anything+0.134
Neg logit contributions
unny-0.207
ll-0.204
 stool-0.204
sie-0.198
ve-0.196
Token<|endoftext|>
Token ID50256
Feature activation+1.56e-02

Pos logit contributions
aw+0.010
 flashes+0.008
 anything+0.008
 such+0.007
 wishes+0.007
Neg logit contributions
 stool-0.010
unny-0.010
oun-0.009
 aisle-0.009
aughter-0.009
Token the
Token ID262
Feature activation+1.72e-02

Pos logit contributions
 stool+0.027
sie+0.026
unny+0.025
 miner+0.025
aughter+0.024
Neg logit contributions
aw-0.032
 flashes-0.025
 Leave-0.024
 anything-0.024
 its-0.024
Token vine
Token ID17793
Feature activation+1.59e-02

Pos logit contributions
unny+0.029
sie+0.029
ll+0.029
otte+0.027
ve+0.027
Neg logit contributions
aw-0.033
 its-0.027
 anything-0.027
 wishes-0.026
 flashes-0.026
Token she
Token ID673
Feature activation+2.57e-04

Pos logit contributions
 stool+0.035
sie+0.034
unny+0.034
ll+0.034
hetically+0.033
Neg logit contributions
aw-0.022
 its-0.016
 flashes-0.016
 anything-0.016
 wishes-0.015
Token had
Token ID550
Feature activation+4.05e-03

Pos logit contributions
sie+0.000
ve+0.000
unny+0.000
 stool+0.000
ll+0.000
Neg logit contributions
aw-0.000
 flashes-0.000
 its-0.000
 anything-0.000
 such-0.000
Token che
Token ID1125
Feature activation+4.63e-02

Pos logit contributions
ve+0.008
ll+0.007
 stool+0.007
sie+0.007
unny+0.007
Neg logit contributions
aw-0.007
 flashes-0.006
 its-0.005
 Leave-0.005
 anything-0.005
Tokenwed
Token ID19103
Feature activation+8.18e-02

Pos logit contributions
 stool+0.086
unny+0.086
sie+0.082
otte+0.082
ll+0.081
Neg logit contributions
aw-0.085
 its-0.065
 anything-0.064
 such-0.062
 wishes-0.061
Token on
Token ID319
Feature activation-1.53e-02

Pos logit contributions
unny+0.182
sie+0.181
 stool+0.179
ll+0.176
otte+0.173
Neg logit contributions
aw-0.113
 its-0.087
 anything-0.083
 such-0.081
 flashes-0.077
Token earlier
Token ID2961
Feature activation+1.09e-02

Pos logit contributions
aw+0.020
 flashes+0.015
 Leave+0.014
 wishes+0.014
 its+0.013
Neg logit contributions
 stool-0.034
sie-0.033
unny-0.033
ve-0.032
hetically-0.032
Token.
Token ID13
Feature activation+7.94e-03

Pos logit contributions
 stool+0.021
sie+0.020
unny+0.020
comb+0.020
ve+0.020
Neg logit contributions
aw-0.016
 flashes-0.014
 wishes-0.013
 Leave-0.012
 anything-0.012
Token She
Token ID1375
Feature activation+4.21e-03

Pos logit contributions
unny+0.016
 stool+0.016
sie+0.016
otte+0.016
aughter+0.016
Neg logit contributions
aw-0.012
 flashes-0.009
 anything-0.009
 wishes-0.009
 its-0.009
Token realized
Token ID6939
Feature activation-1.68e-02

Pos logit contributions
sie+0.008
 stool+0.008
unny+0.008
ll+0.008
ve+0.008
Neg logit contributions
aw-0.007
 anything-0.005
 its-0.005
 flashes-0.005
 such-0.005
Token impressed
Token ID12617
Feature activation-5.69e-03

Pos logit contributions
aw+0.025
 anything+0.020
 such+0.020
 flashes+0.020
 Leave+0.019
Neg logit contributions
ll-0.031
unny-0.031
sie-0.031
ve-0.030
 stool-0.030
Token with
Token ID351
Feature activation+2.37e-02

Pos logit contributions
aw+0.008
 flashes+0.007
 its+0.007
 anything+0.006
 wishes+0.006
Neg logit contributions
ll-0.011
unny-0.011
sie-0.011
ve-0.011
 stool-0.011
Token Jerry
Token ID13075
Feature activation+3.40e-02

Pos logit contributions
sie+0.042
hetically+0.041
 stool+0.041
 miner+0.041
aughter+0.040
Neg logit contributions
aw-0.039
 flashes-0.031
 wishes-0.029
 anything-0.028
 Leave-0.028
Token's
Token ID338
Feature activation+1.70e-03

Pos logit contributions
 stool+0.086
unny+0.085
sie+0.085
otte+0.081
aughter+0.081
Neg logit contributions
aw-0.039
 its-0.028
 such-0.026
 anything-0.025
 about-0.024
Token hard
Token ID1327
Feature activation+2.75e-03

Pos logit contributions
unny+0.003
 stool+0.003
sie+0.003
ll+0.003
ve+0.003
Neg logit contributions
aw-0.003
 its-0.002
 flashes-0.002
 anything-0.002
 such-0.002
Token work
Token ID670
Feature activation+9.16e-02

Pos logit contributions
unny+0.004
sie+0.004
hetically+0.004
ll+0.004
otte+0.004
Neg logit contributions
aw-0.006
 anything-0.005
 its-0.005
 wishes-0.005
 such-0.005
Token.
Token ID13
Feature activation+1.36e-02

Pos logit contributions
unny+0.208
sie+0.207
 stool+0.206
ll+0.201
otte+0.198
Neg logit contributions
aw-0.120
 its-0.089
 anything-0.085
 flashes-0.085
 such-0.083
Token From
Token ID3574
Feature activation+3.16e-02

Pos logit contributions
unny+0.026
ll+0.026
 stool+0.026
hetically+0.025
aughter+0.025
Neg logit contributions
aw-0.021
 anything-0.017
 flashes-0.017
 its-0.016
 such-0.016
Token that
Token ID326
Feature activation-8.69e-03

Pos logit contributions
sie+0.071
unny+0.070
 stool+0.070
ll+0.070
otte+0.068
Neg logit contributions
aw-0.040
 flashes-0.029
 anything-0.029
 its-0.029
 wishes-0.028
Token day
Token ID1110
Feature activation-2.22e-02

Pos logit contributions
 its+0.012
aw+0.011
 about+0.010
 such+0.010
 anything+0.010
Neg logit contributions
sie-0.019
unny-0.018
ylan-0.018
beans-0.018
ve-0.018
Token on
Token ID319
Feature activation-2.40e-02

Pos logit contributions
aw+0.032
 its+0.025
 anything+0.023
 such+0.023
 flashes+0.022
Neg logit contributions
sie-0.049
unny-0.048
 stool-0.048
ll-0.046
otte-0.046
Token did
Token ID750
Feature activation+1.59e-03

Pos logit contributions
aw+0.020
 flashes+0.016
 its+0.016
 anything+0.016
 wishes+0.015
Neg logit contributions
sie-0.022
unny-0.022
ll-0.022
 stool-0.021
ve-0.021
Token,
Token ID11
Feature activation+8.83e-03

Pos logit contributions
 stool+0.003
sie+0.003
unny+0.003
ll+0.003
ve+0.003
Neg logit contributions
aw-0.002
 flashes-0.002
 its-0.002
 wishes-0.002
 anything-0.002
Token he
Token ID339
Feature activation-1.70e-02

Pos logit contributions
 stool+0.018
sie+0.018
unny+0.018
ll+0.017
hetically+0.017
Neg logit contributions
aw-0.014
 flashes-0.010
 its-0.010
 wishes-0.010
 anything-0.009
Token felt
Token ID2936
Feature activation-1.52e-02

Pos logit contributions
aw+0.027
 its+0.021
 flashes+0.020
 anything+0.020
 such+0.020
Neg logit contributions
sie-0.034
unny-0.034
 stool-0.033
ll-0.033
ve-0.032
Token a
Token ID257
Feature activation+1.91e-02

Pos logit contributions
aw+0.024
 flashes+0.020
 Leave+0.018
 voices+0.017
 aids+0.017
Neg logit contributions
sie-0.028
 stool-0.027
ylan-0.026
 grocer-0.026
 boulder-0.026
Token tug
Token ID27762
Feature activation+9.63e-02

Pos logit contributions
unny+0.037
sie+0.036
ll+0.036
 stool+0.036
otte+0.035
Neg logit contributions
aw-0.032
 its-0.026
 anything-0.025
 wishes-0.025
 flashes-0.025
Token on
Token ID319
Feature activation+1.21e-03

Pos logit contributions
unny+0.192
sie+0.189
 stool+0.188
ll+0.184
otte+0.180
Neg logit contributions
aw-0.156
 its-0.123
 anything-0.118
 such-0.116
 flashes-0.116
Token the
Token ID262
Feature activation+1.02e-02

Pos logit contributions
 stool+0.003
sie+0.003
hetically+0.003
unny+0.003
 boulder+0.003
Neg logit contributions
aw-0.002
 flashes-0.001
 Leave-0.001
 wishes-0.001
 drawn-0.001
Token line
Token ID1627
Feature activation+2.13e-02

Pos logit contributions
sie+0.017
unny+0.017
otte+0.017
ll+0.017
oris+0.016
Neg logit contributions
aw-0.020
 anything-0.016
 its-0.016
 flashes-0.016
 such-0.015
Token!
Token ID0
Feature activation-3.71e-03

Pos logit contributions
unny+0.048
sie+0.048
 stool+0.047
ll+0.046
otte+0.046
Neg logit contributions
aw-0.029
 its-0.022
 anything-0.020
 such-0.020
 flashes-0.020
Token He
Token ID679
Feature activation+1.60e-02

Pos logit contributions
aw+0.005
 its+0.004
 flashes+0.004
 anything+0.004
 wishes+0.004
Neg logit contributions
unny-0.008
 stool-0.008
sie-0.008
ll-0.008
aughter-0.008
Token so
Token ID523
Feature activation+9.11e-03

Pos logit contributions
sie+0.003
 stool+0.003
ll+0.003
otte+0.003
ve+0.003
Neg logit contributions
aw-0.003
 flashes-0.002
 Leave-0.002
 wishes-0.002
 voices-0.002
Token proud
Token ID6613
Feature activation+2.26e-02

Pos logit contributions
sie+0.017
unny+0.017
 stool+0.017
ll+0.017
ve+0.017
Neg logit contributions
aw-0.015
 flashes-0.012
 its-0.012
 anything-0.012
 such-0.011
Token of
Token ID286
Feature activation+1.32e-02

Pos logit contributions
ll+0.047
 stool+0.046
sie+0.045
unny+0.045
ve+0.044
Neg logit contributions
aw-0.032
 flashes-0.025
 anything-0.025
 its-0.024
 wishes-0.023
Token the
Token ID262
Feature activation+2.33e-02

Pos logit contributions
 stool+0.023
sie+0.023
otte+0.023
ll+0.023
hetically+0.022
Neg logit contributions
aw-0.024
 flashes-0.019
 Leave-0.018
 anything-0.017
 wishes-0.017
Token hard
Token ID1327
Feature activation+2.21e-02

Pos logit contributions
unny+0.053
sie+0.051
ll+0.050
 stool+0.049
otte+0.049
Neg logit contributions
aw-0.032
 flashes-0.025
 wishes-0.025
 its-0.025
 anything-0.025
Token work
Token ID670
Feature activation+1.02e-01

Pos logit contributions
unny+0.038
ll+0.036
sie+0.034
stack+0.033
 miner+0.033
Neg logit contributions
aw-0.040
 anything-0.039
 its-0.039
 wishes-0.038
 flashes-0.037
Token she
Token ID673
Feature activation-3.47e-03

Pos logit contributions
unny+0.229
sie+0.224
 stool+0.220
ll+0.218
otte+0.214
Neg logit contributions
aw-0.141
 its-0.111
 anything-0.104
 such-0.103
 flashes-0.102
Token had
Token ID550
Feature activation+2.90e-03

Pos logit contributions
aw+0.006
 flashes+0.005
 its+0.005
 anything+0.005
 wishes+0.005
Neg logit contributions
sie-0.006
ll-0.006
ve-0.006
 stool-0.006
unny-0.006
Token done
Token ID1760
Feature activation+2.55e-02

Pos logit contributions
ll+0.005
ve+0.005
sie+0.004
 stool+0.004
otte+0.004
Neg logit contributions
aw-0.006
 flashes-0.005
 Leave-0.004
 aids-0.004
 wishes-0.004
Token to
Token ID284
Feature activation+1.73e-02

Pos logit contributions
 stool+0.051
ve+0.049
otte+0.047
aughter+0.046
beans+0.046
Neg logit contributions
aw-0.037
 flashes-0.033
 Leave-0.030
 Their-0.026
 wishes-0.025
Token make
Token ID787
Feature activation-6.39e-03

Pos logit contributions
sie+0.031
 stool+0.031
unny+0.031
ll+0.030
otte+0.030
Neg logit contributions
aw-0.031
 flashes-0.024
 its-0.024
 anything-0.023
 such-0.022
Token was
Token ID373
Feature activation-5.47e-03

Pos logit contributions
unny+0.025
 stool+0.024
sie+0.024
aughter+0.023
otte+0.023
Neg logit contributions
aw-0.016
 its-0.012
 wishes-0.012
 flashes-0.011
 such-0.011
Token very
Token ID845
Feature activation-1.92e-04

Pos logit contributions
aw+0.009
 flashes+0.007
 anything+0.007
 Leave+0.006
 wishes+0.006
Neg logit contributions
 stool-0.010
sie-0.010
ve-0.010
unny-0.010
ll-0.010
Token proud
Token ID6613
Feature activation+2.42e-02

Pos logit contributions
aw+0.000
 flashes+0.000
 its+0.000
 anything+0.000
 such+0.000
Neg logit contributions
sie-0.000
unny-0.000
 stool-0.000
ll-0.000
ve-0.000
Token of
Token ID286
Feature activation+9.29e-03

Pos logit contributions
ll+0.051
 stool+0.051
unny+0.050
sie+0.050
ve+0.048
Neg logit contributions
aw-0.034
 flashes-0.026
 its-0.025
 anything-0.025
 wishes-0.024
Token his
Token ID465
Feature activation+8.24e-03

Pos logit contributions
sie+0.019
 stool+0.019
unny+0.019
ll+0.019
hetically+0.018
Neg logit contributions
aw-0.014
 flashes-0.011
 its-0.010
 anything-0.010
 wishes-0.010
Token work
Token ID670
Feature activation+9.05e-02

Pos logit contributions
sie+0.015
unny+0.015
otte+0.014
ll+0.014
abby+0.014
Neg logit contributions
aw-0.013
 wishes-0.013
 its-0.012
 hair-0.012
 anything-0.012
Token and
Token ID290
Feature activation+9.17e-03

Pos logit contributions
unny+0.209
sie+0.205
 stool+0.199
otte+0.197
ll+0.196
Neg logit contributions
aw-0.120
 its-0.094
 anything-0.088
 such-0.087
 flashes-0.084
Token his
Token ID465
Feature activation-1.42e-02

Pos logit contributions
 stool+0.016
ll+0.016
sie+0.016
 miner+0.016
hetically+0.016
Neg logit contributions
aw-0.016
 flashes-0.012
 anything-0.012
 Leave-0.012
 its-0.011
Token rake
Token ID47885
Feature activation-2.00e-02

Pos logit contributions
aw+0.024
 its+0.021
 wishes+0.020
 anything+0.020
 flashes+0.020
Neg logit contributions
sie-0.026
unny-0.026
ll-0.025
otte-0.025
ve-0.024
Token.
Token ID13
Feature activation+1.15e-03

Pos logit contributions
aw+0.031
 flashes+0.023
 its+0.023
 anything+0.022
 wishes+0.022
Neg logit contributions
 stool-0.040
sie-0.039
unny-0.039
ll-0.039
hetically-0.039
Token<|endoftext|>
Token ID50256
Feature activation+1.68e-02

Pos logit contributions
unny+0.003
 stool+0.002
sie+0.002
ll+0.002
aughter+0.002
Neg logit contributions
aw-0.002
 flashes-0.001
 anything-0.001
 its-0.001
 wishes-0.001
Token Suddenly
Token ID24975
Feature activation-1.88e-02

Pos logit contributions
 stool+0.023
unny+0.023
sie+0.022
aughter+0.022
ylan+0.022
Neg logit contributions
aw-0.018
 flashes-0.013
 anything-0.013
 its-0.013
 wishes-0.013
Token,
Token ID11
Feature activation+1.39e-02

Pos logit contributions
aw+0.032
 flashes+0.024
 its+0.024
 wishes+0.024
 such+0.023
Neg logit contributions
sie-0.035
 stool-0.035
unny-0.034
 miner-0.033
ll-0.033
Token they
Token ID484
Feature activation+1.99e-02

Pos logit contributions
sie+0.026
 stool+0.026
unny+0.026
ll+0.025
aughter+0.025
Neg logit contributions
aw-0.024
 its-0.018
 flashes-0.018
 wishes-0.018
 anything-0.017
Token saw
Token ID2497
Feature activation-2.53e-02

Pos logit contributions
sie+0.041
unny+0.040
ll+0.040
 stool+0.040
ve+0.039
Neg logit contributions
aw-0.029
 its-0.023
 flashes-0.022
 wishes-0.022
 such-0.022
Token a
Token ID257
Feature activation+2.89e-02

Pos logit contributions
aw+0.035
 Leave+0.032
 wishes+0.027
 flashes+0.027
 its+0.027
Neg logit contributions
 miner-0.046
 toddler-0.045
sie-0.044
ylan-0.044
aughter-0.044
Token door
Token ID3420
Feature activation+2.13e-02

Pos logit contributions
sie+0.059
unny+0.058
ll+0.058
hetically+0.057
aughter+0.056
Neg logit contributions
aw-0.045
 its-0.036
 flashes-0.035
 anything-0.034
 wishes-0.033
Token in
Token ID287
Feature activation+2.41e-02

Pos logit contributions
 stool+0.041
ve+0.038
hetically+0.038
 aisle+0.038
ll+0.037
Neg logit contributions
aw-0.035
 anything-0.028
 wishes-0.027
 its-0.027
 such-0.026
Token the
Token ID262
Feature activation-2.10e-02

Pos logit contributions
unny+0.059
sie+0.058
 stool+0.058
ll+0.057
ve+0.056
Neg logit contributions
aw-0.028
 its-0.020
 flashes-0.019
 anything-0.019
 such-0.018
Token middle
Token ID3504
Feature activation-4.66e-02

Pos logit contributions
aw+0.042
 its+0.035
 flashes+0.034
 anything+0.034
 such+0.033
Neg logit contributions
 stool-0.033
sie-0.033
unny-0.033
ll-0.031
hetically-0.030
Token of
Token ID286
Feature activation-2.11e-02

Pos logit contributions
aw+0.077
 Leave+0.070
 anything+0.067
 wishes+0.067
 its+0.065
Neg logit contributions
 stool-0.081
 aisle-0.077
ll-0.075
 hill-0.073
aughter-0.072
Token the
Token ID262
Feature activation-8.07e-03

Pos logit contributions
aw+0.023
 its+0.015
 flashes+0.015
 anything+0.015
 such+0.014
Neg logit contributions
unny-0.053
sie-0.053
 stool-0.052
ll-0.051
ve-0.050
Token walking
Token ID6155
Feature activation+4.48e-03

Pos logit contributions
aw+0.003
 its+0.002
 flashes+0.002
 anything+0.002
 wishes+0.002
Neg logit contributions
 stool-0.004
sie-0.004
unny-0.004
ll-0.004
ve-0.004
Token outside
Token ID2354
Feature activation+1.67e-02

Pos logit contributions
sie+0.010
aughter+0.010
 stool+0.010
otte+0.010
unny+0.010
Neg logit contributions
aw-0.006
 its-0.005
 anything-0.004
 such-0.004
 about-0.004
Token,
Token ID11
Feature activation+4.48e-03

Pos logit contributions
 stool+0.029
sie+0.027
unny+0.027
ll+0.026
 aisle+0.026
Neg logit contributions
aw-0.033
 flashes-0.025
 aids-0.024
 wishes-0.024
 drawn-0.024
Token she
Token ID673
Feature activation-2.52e-02

Pos logit contributions
 stool+0.008
unny+0.008
sie+0.008
ll+0.007
otte+0.007
Neg logit contributions
aw-0.008
 flashes-0.007
 its-0.007
 wishes-0.006
 anything-0.006
Token saw
Token ID2497
Feature activation-2.68e-02

Pos logit contributions
aw+0.047
 flashes+0.043
 such+0.043
 aids+0.043
 its+0.042
Neg logit contributions
ll-0.038
sie-0.037
unny-0.036
ve-0.036
 stool-0.035
Token a
Token ID257
Feature activation+2.65e-02

Pos logit contributions
aw+0.040
 Leave+0.034
 aids+0.032
 wishes+0.032
 drawn+0.030
Neg logit contributions
 stool-0.049
unny-0.046
sie-0.045
hetically-0.045
aughter-0.045
Token fuzzy
Token ID34669
Feature activation+7.57e-03

Pos logit contributions
sie+0.054
unny+0.053
ll+0.053
hetically+0.053
aughter+0.051
Neg logit contributions
aw-0.041
 its-0.033
 flashes-0.032
 anything-0.031
 wishes-0.030
Token inch
Token ID11111
Feature activation+5.18e-02

Pos logit contributions
ll+0.013
unny+0.012
sie+0.012
hetically+0.012
 stool+0.012
Neg logit contributions
aw-0.015
 its-0.013
 flashes-0.012
 anything-0.012
 wishes-0.012
Tokenworm
Token ID25323
Feature activation+1.35e-02

Pos logit contributions
unny+0.095
hetically+0.093
sie+0.093
ll+0.092
otte+0.092
Neg logit contributions
aw-0.098
 its-0.079
 flashes-0.076
 anything-0.074
 wishes-0.072
Token!
Token ID0
Feature activation-1.39e-03

Pos logit contributions
 stool+0.026
sie+0.026
unny+0.026
ll+0.025
ve+0.025
Neg logit contributions
aw-0.022
 flashes-0.017
 its-0.017
 anything-0.016
 such-0.016
Token She
Token ID1375
Feature activation+1.32e-02

Pos logit contributions
aw+0.002
 its+0.001
 flashes+0.001
 anything+0.001
 such+0.001
Neg logit contributions
sie-0.003
unny-0.003
 stool-0.003
ll-0.003
ve-0.003
Token€
Token ID26391
Feature activation-2.54e-02

Pos logit contributions
unny+0.027
sie+0.026
 stool+0.026
ll+0.025
otte+0.025
Neg logit contributions
aw-0.030
 its-0.024
 flashes-0.024
 such-0.023
 anything-0.023
Token
Token ID198
Feature activation-9.50e-04

Pos logit contributions
aw+0.042
 its+0.035
 flashes+0.034
 anything+0.033
 such+0.033
Neg logit contributions
 stool-0.048
unny-0.048
ll-0.047
ve-0.046
sie-0.045
Token
Token ID198
Feature activation+8.43e-03

Pos logit contributions
aw+0.001
 flashes+0.001
 anything+0.001
 its+0.001
 wishes+0.001
Neg logit contributions
unny-0.002
ll-0.002
 stool-0.002
sie-0.002
ve-0.002
TokenDad
Token ID46270
Feature activation+8.58e-03

Pos logit contributions
unny+0.015
 stool+0.015
sie+0.015
ll+0.015
ve+0.014
Neg logit contributions
aw-0.015
 flashes-0.012
 anything-0.012
 its-0.012
 wishes-0.012
Token smiled
Token ID13541
Feature activation+2.91e-02

Pos logit contributions
unny+0.019
sie+0.019
 stool+0.019
comb+0.018
otte+0.018
Neg logit contributions
aw-0.012
 its-0.008
 about-0.008
 It-0.008
 flashes-0.008
Token at
Token ID379
Feature activation+1.83e-02

Pos logit contributions
sie+0.058
unny+0.058
 stool+0.057
ll+0.056
hetically+0.055
Neg logit contributions
aw-0.045
 flashes-0.035
 anything-0.035
 its-0.034
 wishes-0.033
Token her
Token ID607
Feature activation-2.78e-03

Pos logit contributions
sie+0.040
aughter+0.038
 stool+0.038
otte+0.038
unny+0.037
Neg logit contributions
aw-0.025
 flashes-0.020
 wishes-0.018
 anything-0.017
 Leave-0.017
Token and
Token ID290
Feature activation+5.52e-03

Pos logit contributions
aw+0.004
 its+0.003
 anything+0.003
 such+0.003
 flashes+0.003
Neg logit contributions
unny-0.006
sie-0.006
 stool-0.006
ll-0.006
otte-0.006
Token said
Token ID531
Feature activation+6.99e-03

Pos logit contributions
 stool+0.010
sie+0.010
unny+0.010
ll+0.010
otte+0.009
Neg logit contributions
aw-0.010
 anything-0.008
 flashes-0.008
 its-0.008
 wishes-0.007
Token �
Token ID6184
Feature activation+3.20e-02

Pos logit contributions
unny+0.014
sie+0.014
 stool+0.014
ll+0.013
ve+0.013
Neg logit contributions
aw-0.011
 flashes-0.008
 its-0.008
 anything-0.008
 such-0.008
Token�
Token ID95
Feature activation+2.04e-02

Pos logit contributions
unny+0.049
sie+0.049
 stool+0.048
ll+0.047
hetically+0.046
Neg logit contributions
aw-0.067
 its-0.053
 anything-0.052
 flashes-0.052
 such-0.052
Token he
Token ID339
Feature activation-1.20e-02

Pos logit contributions
 stool+0.038
ll+0.035
 miner+0.035
sie+0.035
hetically+0.035
Neg logit contributions
aw-0.027
 flashes-0.019
 wishes-0.018
 Leave-0.018
 anything-0.018
Token had
Token ID550
Feature activation-9.12e-03

Pos logit contributions
aw+0.022
 flashes+0.018
 anything+0.018
 its+0.017
 Leave+0.017
Neg logit contributions
ll-0.020
sie-0.020
ve-0.020
 stool-0.019
unny-0.019
Token made
Token ID925
Feature activation-2.63e-03

Pos logit contributions
aw+0.015
 Leave+0.013
 flashes+0.013
 aids+0.012
 War+0.012
Neg logit contributions
ll-0.015
sie-0.014
ve-0.014
 stool-0.014
ylan-0.014
Token a
Token ID257
Feature activation+1.85e-02

Pos logit contributions
 Leave+0.004
aw+0.004
 flashes+0.003
 Their+0.003
 blamed+0.003
Neg logit contributions
ert-0.004
 stool-0.004
ve-0.004
aughter-0.004
ylan-0.004
Token very
Token ID845
Feature activation+7.06e-03

Pos logit contributions
 stool+0.036
sie+0.034
ve+0.033
ll+0.033
aughter+0.033
Neg logit contributions
aw-0.031
 Leave-0.024
 flashes-0.023
 its-0.022
 such-0.022
Token special
Token ID2041
Feature activation+1.68e-02

Pos logit contributions
 stool+0.013
sie+0.013
ve+0.013
ll+0.013
unny+0.012
Neg logit contributions
aw-0.013
 flashes-0.010
 Leave-0.010
 such-0.010
 its-0.010
Token picture
Token ID4286
Feature activation+2.69e-02

Pos logit contributions
 stool+0.030
sie+0.029
unny+0.029
ll+0.028
aughter+0.028
Neg logit contributions
aw-0.031
 its-0.024
 flashes-0.023
 anything-0.023
 such-0.023
Token.
Token ID13
Feature activation+1.74e-02

Pos logit contributions
 stool+0.053
ll+0.051
unny+0.050
sie+0.050
ve+0.050
Neg logit contributions
aw-0.041
 flashes-0.032
 Leave-0.031
 wishes-0.030
 its-0.030
Token<|endoftext|>
Token ID50256
Feature activation+2.91e-02

Pos logit contributions
unny+0.039
 stool+0.038
sie+0.037
ll+0.037
aughter+0.037
Neg logit contributions
aw-0.024
 flashes-0.017
 anything-0.017
 its-0.017
 wishes-0.016
TokenOnce
Token ID7454
Feature activation-1.62e-02

Pos logit contributions
sie+0.056
unny+0.056
 stool+0.056
ll+0.054
hetically+0.053
Neg logit contributions
aw-0.048
 its-0.038
 flashes-0.036
 anything-0.036
 such-0.036
Token upon
Token ID2402
Feature activation+7.81e-03

Pos logit contributions
aw+0.037
 its+0.033
 anything+0.032
 flashes+0.031
 such+0.031
Neg logit contributions
unny-0.020
sie-0.020
 stool-0.020
hetically-0.019
otte-0.018
Token to
Token ID284
Feature activation+3.93e-03

Pos logit contributions
unny+0.019
sie+0.018
 stool+0.018
ll+0.018
hetically+0.018
Neg logit contributions
aw-0.011
 its-0.009
 such-0.008
 anything-0.008
 flashes-0.008
Token his
Token ID465
Feature activation-1.32e-02

Pos logit contributions
 stool+0.008
sie+0.008
otte+0.008
aughter+0.008
unny+0.008
Neg logit contributions
aw-0.006
 flashes-0.004
 Leave-0.004
 drawn-0.004
 wishes-0.004
Token bedroom
Token ID14043
Feature activation-2.60e-02

Pos logit contributions
aw+0.023
 its+0.019
 flashes+0.018
 anything+0.018
 such+0.018
Neg logit contributions
 stool-0.024
unny-0.024
sie-0.024
ll-0.023
otte-0.023
Token to
Token ID284
Feature activation-1.08e-02

Pos logit contributions
aw+0.035
 its+0.027
 anything+0.025
 such+0.025
 flashes+0.025
Neg logit contributions
unny-0.059
sie-0.058
ll-0.056
 stool-0.055
ve-0.055
Token learn
Token ID2193
Feature activation+2.88e-03

Pos logit contributions
aw+0.023
 flashes+0.017
 Leave+0.016
 anything+0.015
 drawn+0.015
Neg logit contributions
 stool-0.017
sie-0.016
otte-0.016
 aisle-0.015
stack-0.015
Token how
Token ID703
Feature activation+3.14e-02

Pos logit contributions
 stool+0.006
guard+0.005
 aisle+0.005
 boulder+0.005
aughter+0.005
Neg logit contributions
aw-0.004
 Leave-0.003
 drawn-0.003
 flashes-0.003
 wishes-0.003
Token to
Token ID284
Feature activation-7.29e-03

Pos logit contributions
 stool+0.051
ll+0.049
ve+0.049
unny+0.048
sie+0.047
Neg logit contributions
aw-0.059
 flashes-0.047
 Leave-0.045
 wishes-0.043
 drawn-0.042
Token use
Token ID779
Feature activation+4.94e-03

Pos logit contributions
aw+0.014
 anything+0.011
 such+0.010
 Leave+0.010
 its+0.010
Neg logit contributions
 stool-0.012
unny-0.012
ll-0.012
hetically-0.012
sie-0.012
Token it
Token ID340
Feature activation-1.51e-03

Pos logit contributions
 stool+0.010
 ruler+0.009
 boulder+0.009
 scale+0.009
 teaspoon+0.009
Neg logit contributions
aw-0.008
 Leave-0.006
 drawn-0.005
 flashes-0.005
 aids-0.005
Token.
Token ID13
Feature activation+4.12e-03

Pos logit contributions
aw+0.002
 flashes+0.002
 Leave+0.002
 its+0.002
 wishes+0.002
Neg logit contributions
 stool-0.003
unny-0.003
ll-0.003
sie-0.003
ve-0.003
Token
Token ID198
Feature activation-1.31e-02

Pos logit contributions
unny+0.009
 stool+0.009
sie+0.009
ll+0.009
aughter+0.008
Neg logit contributions
aw-0.006
 flashes-0.004
 anything-0.004
 its-0.004
 wishes-0.004
Token The
Token ID383
Feature activation+8.32e-03

Pos logit contributions
unny+0.036
 stool+0.036
ylan+0.035
ll+0.035
oun+0.034
Neg logit contributions
aw-0.025
 anything-0.019
 flashes-0.019
 wishes-0.019
 its-0.018
Token man
Token ID582
Feature activation+3.20e-03

Pos logit contributions
unny+0.013
sie+0.013
ll+0.013
otte+0.012
comb+0.012
Neg logit contributions
aw-0.018
 flashes-0.015
 anything-0.014
 wishes-0.014
 its-0.014
Token felt
Token ID2936
Feature activation-6.42e-03

Pos logit contributions
sie+0.006
ll+0.006
unny+0.006
 stool+0.006
hetically+0.005
Neg logit contributions
aw-0.006
 anything-0.005
 its-0.005
 flashes-0.004
 such-0.004
Token happy
Token ID3772
Feature activation-4.13e-03

Pos logit contributions
aw+0.011
 flashes+0.010
 Leave+0.009
 York+0.009
 voices+0.008
Neg logit contributions
sie-0.010
 toddler-0.009
ylan-0.009
 grocer-0.009
 miner-0.009
Token and
Token ID290
Feature activation+2.38e-02

Pos logit contributions
aw+0.007
 flashes+0.005
 its+0.005
 anything+0.005
 Leave+0.005
Neg logit contributions
 stool-0.008
ll-0.008
sie-0.008
unny-0.008
hetically-0.008
Token his
Token ID465
Feature activation-4.84e-04

Pos logit contributions
ll+0.039
sie+0.037
 stool+0.037
unny+0.037
hetically+0.037
Neg logit contributions
aw-0.044
 flashes-0.037
 anything-0.036
 Leave-0.034
 aids-0.034
Token worries
Token ID18572
Feature activation-1.25e-02

Pos logit contributions
aw+0.001
 its+0.001
 flashes+0.001
 anything+0.001
 wishes+0.001
Neg logit contributions
unny-0.001
sie-0.001
 stool-0.001
ll-0.001
otte-0.001
Token were
Token ID547
Feature activation+1.39e-02

Pos logit contributions
aw+0.020
 its+0.016
 flashes+0.015
 anything+0.015
 such+0.015
Neg logit contributions
sie-0.024
unny-0.024
 stool-0.024
ll-0.024
ve-0.023
Token gone
Token ID3750
Feature activation+1.53e-02

Pos logit contributions
ll+0.023
 miner+0.023
sie+0.023
 stool+0.023
ve+0.023
Neg logit contributions
aw-0.025
 flashes-0.020
 Leave-0.020
 wishes-0.019
 anything-0.017
Token.
Token ID13
Feature activation+1.08e-02

Pos logit contributions
 stool+0.031
sie+0.031
unny+0.031
ll+0.031
ve+0.030
Neg logit contributions
aw-0.023
 flashes-0.018
 its-0.016
 Leave-0.016
 anything-0.016
Token<|endoftext|>
Token ID50256
Feature activation+2.44e-02

Pos logit contributions
unny+0.022
 stool+0.022
ll+0.021
ylan+0.021
sie+0.021
Neg logit contributions
aw-0.016
 flashes-0.012
 anything-0.012
 wishes-0.012
 its-0.012
Token.
Token ID13
Feature activation-2.68e-02

Pos logit contributions
aw+0.071
 flashes+0.057
 Leave+0.054
 Its+0.052
 such+0.051
Neg logit contributions
 stool-0.076
ll-0.076
unny-0.073
ve-0.073
sie-0.072
Token They
Token ID1119
Feature activation+1.62e-03

Pos logit contributions
aw+0.052
 such+0.040
 anything+0.040
 wishes+0.040
eping+0.039
Neg logit contributions
unny-0.043
ylan-0.036
oun-0.036
aughter-0.036
 stool-0.036
Token feel
Token ID1254
Feature activation-2.89e-02

Pos logit contributions
sie+0.003
unny+0.003
 stool+0.003
ll+0.003
ve+0.003
Neg logit contributions
aw-0.003
 its-0.002
 flashes-0.002
 anything-0.002
 such-0.002
Token sorry
Token ID7926
Feature activation-2.86e-02

Pos logit contributions
aw+0.048
 Its+0.035
 aids+0.035
c+0.033
clock+0.033
Neg logit contributions
sie-0.045
 stool-0.045
 miner-0.045
hetically-0.044
guard-0.044
Token.
Token ID13
Feature activation-2.33e-02

Pos logit contributions
aw+0.055
 drawn+0.043
 flashes+0.043
 Leave+0.042
 aids+0.041
Neg logit contributions
ll-0.045
 stool-0.044
unny-0.043
ve-0.042
 bud-0.041
Token They
Token ID1119
Feature activation+6.69e-03

Pos logit contributions
aw+0.045
 anything+0.035
 wishes+0.034
 such+0.034
 flashes+0.033
Neg logit contributions
unny-0.040
 stool-0.033
 toddler-0.033
oris-0.033
oun-0.033
Token say
Token ID910
Feature activation-2.96e-02

Pos logit contributions
unny+0.014
sie+0.014
 stool+0.014
ll+0.014
otte+0.013
Neg logit contributions
aw-0.010
 its-0.008
 flashes-0.007
 anything-0.007
 wishes-0.007
Token,
Token ID11
Feature activation+4.09e-03

Pos logit contributions
aw+0.058
 flashes+0.046
 drawn+0.043
 such+0.043
 wishes+0.042
Neg logit contributions
unny-0.047
 stool-0.046
ipop-0.044
ll-0.044
 bud-0.043
Token "
Token ID366
Feature activation-1.28e-02

Pos logit contributions
 stool+0.006
unny+0.006
ll+0.006
stack+0.006
ve+0.006
Neg logit contributions
aw-0.009
 wishes-0.007
 anything-0.007
 flashes-0.007
 such-0.007
TokenOh
Token ID5812
Feature activation-2.05e-02

Pos logit contributions
aw+0.026
 about+0.023
 anything+0.022
 its+0.022
 such+0.021
Neg logit contributions
unny-0.019
ve-0.018
ll-0.018
otte-0.017
 stool-0.017
Token no
Token ID645
Feature activation-5.64e-02

Pos logit contributions
aw+0.029
 flashes+0.021
 its+0.020
 drawn+0.020
 such+0.020
Neg logit contributions
unny-0.043
 stool-0.043
ll-0.042
ve-0.042
sie-0.041
Token want
Token ID765
Feature activation+3.85e-03

Pos logit contributions
unny+0.004
ll+0.004
sie+0.004
ve+0.003
 stool+0.003
Neg logit contributions
aw-0.004
 flashes-0.003
 its-0.003
 anything-0.003
 such-0.003
Token a
Token ID257
Feature activation+1.17e-02

Pos logit contributions
 stool+0.007
unny+0.007
ll+0.006
sie+0.006
 miner+0.006
Neg logit contributions
aw-0.007
 flashes-0.005
 drawn-0.005
 wishes-0.005
 Leave-0.005
Token reward
Token ID6721
Feature activation+5.72e-02

Pos logit contributions
unny+0.019
 stool+0.019
sie+0.019
ll+0.018
ve+0.018
Neg logit contributions
aw-0.023
 its-0.019
 flashes-0.019
 such-0.018
 anything-0.018
Token!"
Token ID2474
Feature activation+1.39e-02

Pos logit contributions
sie+0.114
unny+0.112
 stool+0.112
hetically+0.110
ll+0.109
Neg logit contributions
aw-0.091
 its-0.073
 such-0.071
 anything-0.071
 about-0.065
Token The
Token ID383
Feature activation+1.22e-02

Pos logit contributions
unny+0.030
 stool+0.030
sie+0.029
aughter+0.028
 miner+0.028
Neg logit contributions
aw-0.019
 flashes-0.015
 anything-0.014
 wishes-0.014
 drawn-0.013
Token other
Token ID584
Feature activation-6.83e-04

Pos logit contributions
unny+0.021
sie+0.021
 stool+0.021
ll+0.021
otte+0.020
Neg logit contributions
aw-0.022
 its-0.018
 flashes-0.018
 anything-0.017
 wishes-0.017
Token animals
Token ID4695
Feature activation-1.95e-02

Pos logit contributions
aw+0.001
 its+0.001
 flashes+0.001
 anything+0.001
 such+0.001
Neg logit contributions
unny-0.001
sie-0.001
 stool-0.001
 miner-0.001
ll-0.001
Token heard
Token ID2982
Feature activation-4.60e-02

Pos logit contributions
aw+0.030
 its+0.024
 flashes+0.022
 such+0.022
 about+0.022
Neg logit contributions
 stool-0.041
unny-0.040
sie-0.039
otte-0.038
ll-0.038
Token his
Token ID465
Feature activation+4.57e-03

Pos logit contributions
aw+0.061
 drawn+0.053
 flashes+0.051
 Leave+0.050
 wishes+0.047
Neg logit contributions
hetically-0.082
sie-0.081
aughter-0.079
 toddler-0.078
 stool-0.078
Token yell
Token ID28946
Feature activation+1.52e-02

Pos logit contributions
unny+0.008
 stool+0.008
ll+0.008
ve+0.008
hetically+0.008
Neg logit contributions
aw-0.008
 its-0.007
 anything-0.007
 flashes-0.007
 such-0.007
Token and
Token ID290
Feature activation-3.36e-03

Pos logit contributions
 stool+0.027
unny+0.027
hetically+0.027
sie+0.026
ve+0.026
Neg logit contributions
aw-0.026
 flashes-0.021
 wishes-0.021
 anything-0.020
 its-0.020
Token to
Token ID284
Feature activation+2.58e-02

Pos logit contributions
aw+0.014
 flashes+0.011
 its+0.010
 anything+0.010
 wishes+0.010
Neg logit contributions
ll-0.017
 stool-0.017
unny-0.016
sie-0.016
ve-0.016
Token see
Token ID766
Feature activation-3.22e-02

Pos logit contributions
sie+0.047
 stool+0.047
unny+0.047
stack+0.045
 miner+0.045
Neg logit contributions
aw-0.047
 flashes-0.036
 anything-0.031
 such-0.030
 drawn-0.030
Token that
Token ID326
Feature activation+2.68e-03

Pos logit contributions
aw+0.043
 flashes+0.034
 drawn+0.031
 Leave+0.030
 wishes+0.029
Neg logit contributions
sie-0.068
guard-0.065
aughter-0.065
 miner-0.064
otte-0.063
Token no
Token ID645
Feature activation-5.62e-02

Pos logit contributions
 stool+0.005
sie+0.005
unny+0.005
 miner+0.005
aughter+0.005
Neg logit contributions
aw-0.004
 flashes-0.003
 Leave-0.003
 wishes-0.003
 its-0.003
Token one
Token ID530
Feature activation+1.17e-02

Pos logit contributions
aw+0.108
 its+0.086
 flashes+0.085
 Leave+0.082
 such+0.082
Neg logit contributions
 stool-0.095
sie-0.094
unny-0.089
 miner-0.085
otte-0.085
Token was
Token ID373
Feature activation+1.34e-02

Pos logit contributions
sie+0.022
 stool+0.022
ll+0.022
unny+0.022
 miner+0.021
Neg logit contributions
aw-0.019
 flashes-0.015
 its-0.015
 anything-0.015
 such-0.014
Token watching
Token ID4964
Feature activation+4.23e-05

Pos logit contributions
sie+0.024
 stool+0.024
ve+0.023
unny+0.023
otte+0.022
Neg logit contributions
aw-0.025
 flashes-0.020
 wishes-0.019
 Leave-0.018
 its-0.018
Token.
Token ID13
Feature activation-2.66e-03

Pos logit contributions
 stool+0.000
sie+0.000
unny+0.000
hetically+0.000
aughter+0.000
Neg logit contributions
aw-0.000
 flashes-0.000
 wishes-0.000
 drawn-0.000
 Leave-0.000
Token He
Token ID679
Feature activation+6.65e-03

Pos logit contributions
aw+0.004
 flashes+0.004
 anything+0.003
 wishes+0.003
 its+0.003
Neg logit contributions
unny-0.005
 stool-0.005
ll-0.005
oris-0.005
ylan-0.005
Token quickly
Token ID2952
Feature activation-1.46e-02

Pos logit contributions
sie+0.013
unny+0.013
 stool+0.013
ll+0.012
ve+0.012
Neg logit contributions
aw-0.011
 anything-0.009
 flashes-0.008
 its-0.008
 such-0.008
Token put
Token ID1234
Feature activation+2.59e-02

Pos logit contributions
aw+0.027
 anything+0.022
 flashes+0.022
 its+0.022
 such+0.021
Neg logit contributions
sie-0.025
 stool-0.025
unny-0.024
ll-0.024
ve-0.024
Token.
Token ID13
Feature activation-1.30e-02

Pos logit contributions
aw+0.040
 flashes+0.032
 its+0.031
 anything+0.029
 such+0.028
Neg logit contributions
unny-0.052
sie-0.051
ll-0.051
 stool-0.051
ve-0.049
Token One
Token ID1881
Feature activation+1.07e-02

Pos logit contributions
aw+0.020
 flashes+0.017
 its+0.015
 anything+0.015
 such+0.015
Neg logit contributions
unny-0.025
 stool-0.025
aughter-0.024
sie-0.024
ll-0.024
Token day
Token ID1110
Feature activation-1.06e-02

Pos logit contributions
unny+0.014
sie+0.014
 stool+0.014
ll+0.013
aughter+0.013
Neg logit contributions
aw-0.025
 its-0.021
 flashes-0.021
 anything-0.020
 such-0.020
Token,
Token ID11
Feature activation-6.26e-03

Pos logit contributions
aw+0.015
 its+0.011
 anything+0.011
 such+0.010
 flashes+0.010
Neg logit contributions
unny-0.023
sie-0.023
 stool-0.023
ll-0.023
hetically-0.023
Token she
Token ID673
Feature activation-3.07e-02

Pos logit contributions
aw+0.010
 flashes+0.008
 wishes+0.008
 its+0.008
 Leave+0.007
Neg logit contributions
 stool-0.012
sie-0.012
unny-0.012
otte-0.012
ll-0.012
Token invited
Token ID9392
Feature activation-2.51e-02

Pos logit contributions
aw+0.049
 its+0.039
 anything+0.038
 flashes+0.038
 such+0.037
Neg logit contributions
sie-0.060
unny-0.059
ll-0.059
 stool-0.059
otte-0.057
Token Lisa
Token ID15378
Feature activation-1.54e-02

Pos logit contributions
aw+0.036
 flashes+0.028
 Leave+0.025
 wishes+0.025
 drawn+0.024
Neg logit contributions
sie-0.054
 stool-0.052
aughter-0.052
otte-0.052
unny-0.051
Token to
Token ID284
Feature activation+5.12e-03

Pos logit contributions
aw+0.024
 flashes+0.019
 its+0.018
 wishes+0.018
 anything+0.018
Neg logit contributions
sie-0.031
unny-0.031
 stool-0.030
ll-0.030
otte-0.029
Token her
Token ID607
Feature activation-3.14e-02

Pos logit contributions
sie+0.011
 stool+0.011
unny+0.010
otte+0.010
ll+0.010
Neg logit contributions
aw-0.008
 flashes-0.006
 wishes-0.005
 its-0.005
 anything-0.005
Token house
Token ID2156
Feature activation-2.65e-02

Pos logit contributions
aw+0.060
 its+0.048
 flashes+0.048
 anything+0.047
 such+0.045
Neg logit contributions
 stool-0.053
sie-0.053
unny-0.052
ll-0.051
otte-0.050
Token to
Token ID284
Feature activation-1.23e-02

Pos logit contributions
aw+0.042
 flashes+0.032
 its+0.031
 wishes+0.031
 anything+0.031
Neg logit contributions
 stool-0.053
sie-0.052
unny-0.052
ll-0.051
aughter-0.050
Token like
Token ID588
Feature activation-7.89e-03

Pos logit contributions
sie+0.030
unny+0.030
 stool+0.029
ll+0.028
ve+0.028
Neg logit contributions
aw-0.025
 flashes-0.020
 its-0.019
 wishes-0.018
 anything-0.018
Token playing
Token ID2712
Feature activation-4.15e-03

Pos logit contributions
aw+0.013
 flashes+0.009
 Leave+0.009
 drawn+0.009
 wishes+0.009
Neg logit contributions
 stool-0.015
 teaspoon-0.014
sie-0.014
 hill-0.013
ll-0.013
Token with
Token ID351
Feature activation-7.53e-03

Pos logit contributions
aw+0.006
 flashes+0.005
 wishes+0.005
 its+0.005
 drawn+0.005
Neg logit contributions
 stool-0.008
unny-0.008
sie-0.008
ve-0.008
beans-0.008
Token little
Token ID1310
Feature activation-2.49e-02

Pos logit contributions
aw+0.011
 flashes+0.008
 drawn+0.008
 Leave+0.008
 wishes+0.008
Neg logit contributions
 stool-0.015
sie-0.015
unny-0.014
otte-0.014
 teaspoon-0.014
Token girls
Token ID4813
Feature activation-4.38e-04

Pos logit contributions
aw+0.048
 its+0.039
 flashes+0.038
 anything+0.037
 such+0.037
Neg logit contributions
sie-0.041
 stool-0.040
unny-0.040
ll-0.038
ve-0.038
Token named
Token ID3706
Feature activation-5.21e-02

Pos logit contributions
aw+0.001
 its+0.001
 flashes+0.000
 wishes+0.000
 anything+0.000
Neg logit contributions
sie-0.001
 stool-0.001
unny-0.001
ll-0.001
otte-0.001
Token Lily
Token ID20037
Feature activation+1.79e-02

Pos logit contributions
aw+0.080
 flashes+0.062
 its+0.059
 wishes+0.058
 drawn+0.058
Neg logit contributions
sie-0.108
unny-0.105
 stool-0.102
otte-0.101
ll-0.100
Token."
Token ID526
Feature activation+2.46e-02

Pos logit contributions
unny+0.039
sie+0.039
 stool+0.039
ll+0.038
otte+0.037
Neg logit contributions
aw-0.025
 its-0.019
 flashes-0.018
 anything-0.018
 wishes-0.017
Token (space)
Token ID220
Feature activation-4.33e-03

Pos logit contributions
unny+0.046
sie+0.044
 stool+0.044
ll+0.043
aughter+0.043
Neg logit contributions
aw-0.041
 flashes-0.036
 anything-0.034
 wishes-0.033
 drawn-0.031
Token \n
Token ID198
Feature activation-1.26e-03

Pos logit contributions
aw+0.006
 flashes+0.005
 anything+0.005
 its+0.005
 wishes+0.005
Neg logit contributions
unny-0.009
ll-0.009
 stool-0.009
ve-0.009
sie-0.009
Token \n
Token ID198
Feature activation+6.51e-03

Pos logit contributions
aw+0.002
 flashes+0.002
 anything+0.002
 wishes+0.001
 its+0.001
Neg logit contributions
unny-0.003
ll-0.003
sie-0.003
 stool-0.002
ve-0.002
Token.
Token ID13
Feature activation-2.56e-02

Pos logit contributions
aw+0.005
 flashes+0.004
 Leave+0.004
 its+0.004
 such+0.004
Neg logit contributions
 stool-0.006
unny-0.006
sie-0.006
ll-0.006
comb-0.006
Token Let
Token ID3914
Feature activation+5.24e-03

Pos logit contributions
aw+0.050
eping+0.040
 flashes+0.040
 such+0.039
 drawn+0.039
Neg logit contributions
unny-0.042
comb-0.037
 stool-0.037
stack-0.036
 miner-0.034
Token's
Token ID338
Feature activation-4.69e-02

Pos logit contributions
sie+0.010
 stool+0.010
ley+0.010
unny+0.010
aughter+0.010
Neg logit contributions
aw-0.008
 flashes-0.006
 drawn-0.006
 Winning-0.005
 its-0.005
Token go
Token ID467
Feature activation-2.58e-02

Pos logit contributions
aw+0.086
 flashes+0.065
 its+0.063
 such+0.062
 Its+0.061
Neg logit contributions
sie-0.080
 stool-0.079
unny-0.079
 miner-0.078
ll-0.076
Token and
Token ID290
Feature activation-1.43e-02

Pos logit contributions
aw+0.041
 its+0.033
 anything+0.031
 flashes+0.031
 such+0.031
Neg logit contributions
unny-0.051
sie-0.051
 stool-0.051
hetically-0.049
ll-0.049
Token see
Token ID766
Feature activation-5.96e-02

Pos logit contributions
aw+0.026
 flashes+0.019
 its+0.018
 such+0.018
 anything+0.017
Neg logit contributions
 stool-0.027
sie-0.027
unny-0.026
ll-0.025
 miner-0.025
Token!"
Token ID2474
Feature activation-2.81e-02

Pos logit contributions
aw+0.109
 Leave+0.102
 aids+0.099
 flashes+0.098
 drawn+0.097
Neg logit contributions
 stool-0.086
ll-0.083
sie-0.081
ve-0.078
aughter-0.077
Token Lily
Token ID20037
Feature activation-3.47e-03

Pos logit contributions
aw+0.052
 flashes+0.039
 drawn+0.039
eping+0.038
 wishes+0.038
Neg logit contributions
unny-0.047
 stool-0.047
sie-0.046
ipop-0.046
 toddler-0.044
Token said
Token ID531
Feature activation+1.14e-03

Pos logit contributions
aw+0.005
 its+0.004
 flashes+0.004
 anything+0.004
 such+0.004
Neg logit contributions
sie-0.007
unny-0.007
 stool-0.007
ll-0.007
hetically-0.007
Token.
Token ID13
Feature activation-2.49e-03

Pos logit contributions
unny+0.002
sie+0.002
 stool+0.002
ll+0.002
ve+0.002
Neg logit contributions
aw-0.002
 its-0.001
 flashes-0.001
 anything-0.001
 such-0.001
Token \n
Token ID198
Feature activation-1.38e-02

Pos logit contributions
aw+0.003
 its+0.002
 flashes+0.002
 anything+0.002
 such+0.002
Neg logit contributions
unny-0.006
 stool-0.006
sie-0.006
ll-0.005
otte-0.005
Token our
Token ID674
Feature activation+1.30e-02

Pos logit contributions
aw+0.025
 flashes+0.019
 Leave+0.018
 wishes+0.018
 its+0.017
Neg logit contributions
 stool-0.025
sie-0.024
ve-0.023
 miner-0.023
otte-0.023
Token friend
Token ID1545
Feature activation+7.12e-04

Pos logit contributions
unny+0.022
sie+0.022
 stool+0.022
ll+0.021
hetically+0.021
Neg logit contributions
aw-0.025
 its-0.020
 anything-0.020
 flashes-0.019
 wishes-0.019
Token.
Token ID13
Feature activation-1.40e-02

Pos logit contributions
unny+0.001
sie+0.001
 stool+0.001
ll+0.001
ve+0.001
Neg logit contributions
aw-0.001
 flashes-0.001
 its-0.001
 anything-0.001
 wishes-0.001
Token And
Token ID843
Feature activation-9.60e-03

Pos logit contributions
aw+0.026
 flashes+0.020
 drawn+0.020
 anything+0.020
 its+0.019
Neg logit contributions
unny-0.027
 stool-0.023
oun-0.023
 miner-0.023
stack-0.023
Token our
Token ID674
Feature activation-1.34e-02

Pos logit contributions
aw+0.017
 flashes+0.014
 its+0.013
 drawn+0.012
 anything+0.012
Neg logit contributions
 stool-0.017
sie-0.017
unny-0.017
ll-0.016
 miner-0.016
Token notebook
Token ID20922
Feature activation-5.29e-02

Pos logit contributions
aw+0.026
 its+0.021
 flashes+0.021
 anything+0.021
 such+0.021
Neg logit contributions
 stool-0.022
sie-0.022
unny-0.021
ll-0.021
ve-0.020
Token is
Token ID318
Feature activation-1.22e-02

Pos logit contributions
aw+0.074
 its+0.056
 flashes+0.055
 anything+0.053
 such+0.052
Neg logit contributions
 stool-0.114
unny-0.113
sie-0.113
ll-0.112
ve-0.109
Token ruined
Token ID20484
Feature activation-4.59e-04

Pos logit contributions
aw+0.022
 flashes+0.017
 Leave+0.017
 wishes+0.015
 Its+0.015
Neg logit contributions
 stool-0.022
sie-0.022
ll-0.021
 miner-0.021
ve-0.020
Token."
Token ID526
Feature activation+2.38e-03

Pos logit contributions
aw+0.001
 its+0.000
 anything+0.000
 flashes+0.000
 such+0.000
Neg logit contributions
unny-0.001
sie-0.001
 stool-0.001
ll-0.001
hetically-0.001
Token \n
Token ID198
Feature activation-1.02e-02

Pos logit contributions
unny+0.005
 stool+0.005
ll+0.005
otte+0.004
stack+0.004
Neg logit contributions
aw-0.004
 flashes-0.003
 anything-0.003
 such-0.003
 wishes-0.003
Token \n
Token ID198
Feature activation-4.18e-03

Pos logit contributions
aw+0.014
 anything+0.011
 flashes+0.011
 such+0.010
 its+0.010
Neg logit contributions
unny-0.023
ll-0.022
 stool-0.022
sie-0.021
ve-0.021
Token her
Token ID607
Feature activation-1.64e-02

Pos logit contributions
aw+0.031
 flashes+0.023
 its+0.023
 anything+0.023
 wishes+0.023
Neg logit contributions
 stool-0.039
unny-0.037
sie-0.037
ll-0.036
hetically-0.036
Token car
Token ID1097
Feature activation-2.90e-02

Pos logit contributions
aw+0.028
 its+0.024
 anything+0.024
 wishes+0.023
 flashes+0.023
Neg logit contributions
unny-0.031
sie-0.030
ll-0.029
otte-0.029
ve-0.028
Token,
Token ID11
Feature activation-1.84e-02

Pos logit contributions
aw+0.042
 its+0.032
 flashes+0.031
 anything+0.031
 such+0.030
Neg logit contributions
 stool-0.061
unny-0.061
sie-0.061
ll-0.059
hetically-0.058
Token and
Token ID290
Feature activation-1.25e-02

Pos logit contributions
aw+0.029
 flashes+0.022
 its+0.022
 anything+0.021
 wishes+0.021
Neg logit contributions
 stool-0.037
unny-0.036
sie-0.036
ll-0.035
hetically-0.035
Token her
Token ID607
Feature activation-3.28e-02

Pos logit contributions
aw+0.020
 flashes+0.015
 Leave+0.014
 anything+0.014
 its+0.014
Neg logit contributions
 stool-0.025
sie-0.024
hetically-0.023
unny-0.023
beans-0.023
Token bell
Token ID8966
Feature activation-6.51e-02

Pos logit contributions
aw+0.060
 its+0.052
 anything+0.051
 wishes+0.050
 flashes+0.049
Neg logit contributions
unny-0.058
sie-0.057
ll-0.055
otte-0.054
ve-0.053
Token.
Token ID13
Feature activation-5.92e-03

Pos logit contributions
aw+0.103
 Leave+0.078
 flashes+0.077
 such+0.075
 wishes+0.075
Neg logit contributions
 stool-0.129
ll-0.126
unny-0.125
hetically-0.125
ley-0.120
Token They
Token ID1119
Feature activation+1.26e-02

Pos logit contributions
aw+0.009
 anything+0.008
 wishes+0.008
 such+0.008
 flashes+0.007
Neg logit contributions
 stool-0.011
unny-0.011
aughter-0.010
 grocer-0.010
stack-0.010
Token make
Token ID787
Feature activation-6.35e-03

Pos logit contributions
unny+0.031
aughter+0.029
 stool+0.029
sie+0.029
ll+0.029
Neg logit contributions
aw-0.015
 wishes-0.012
 its-0.012
 anything-0.012
 flashes-0.012
Token loud
Token ID7812
Feature activation-4.04e-02

Pos logit contributions
aw+0.010
 Leave+0.009
 flashes+0.007
 drawn+0.007
 Nobody+0.007
Neg logit contributions
 stool-0.012
hetically-0.011
aughter-0.010
ve-0.010
 miner-0.010
Token noises
Token ID26782
Feature activation+2.34e-03

Pos logit contributions
aw+0.052
 Leave+0.039
 drawn+0.039
 such+0.039
 its+0.038
Neg logit contributions
hetically-0.082
 stool-0.081
 hill-0.076
 boulder-0.076
ve-0.075
Token that
Token ID326
Feature activation+1.29e-02

Pos logit contributions
unny+0.011
sie+0.011
 stool+0.010
ll+0.010
ve+0.010
Neg logit contributions
aw-0.007
 its-0.005
 flashes-0.005
 anything-0.005
 such-0.005
Token they
Token ID484
Feature activation+2.16e-02

Pos logit contributions
 stool+0.023
sie+0.023
ll+0.022
otte+0.022
unny+0.022
Neg logit contributions
aw-0.024
 flashes-0.018
 Leave-0.017
 wishes-0.017
 its-0.016
Token would
Token ID561
Feature activation-2.62e-02

Pos logit contributions
 stool+0.041
unny+0.041
sie+0.040
aughter+0.038
hetically+0.038
Neg logit contributions
aw-0.036
 its-0.030
 anything-0.029
 such-0.028
 flashes-0.028
Token always
Token ID1464
Feature activation-1.87e-02

Pos logit contributions
aw+0.040
 its+0.030
 flashes+0.030
 anything+0.029
 such+0.028
Neg logit contributions
unny-0.054
sie-0.054
 stool-0.054
ll-0.053
ve-0.052
Token be
Token ID307
Feature activation+5.91e-03

Pos logit contributions
aw+0.033
 flashes+0.025
 its+0.024
 anything+0.024
 such+0.023
Neg logit contributions
ll-0.035
sie-0.035
 stool-0.034
unny-0.034
ve-0.033
Token best
Token ID1266
Feature activation-5.23e-02

Pos logit contributions
sie+0.010
otte+0.009
ve+0.009
unny+0.009
ll+0.009
Neg logit contributions
aw-0.011
 flashes-0.010
 aids-0.009
 wishes-0.009
 Leave-0.009
Token friends
Token ID2460
Feature activation+1.06e-02

Pos logit contributions
aw+0.083
 flashes+0.064
 its+0.063
 anything+0.062
 such+0.059
Neg logit contributions
unny-0.105
sie-0.104
 stool-0.103
ll-0.103
ve-0.101
Token.
Token ID13
Feature activation+7.98e-03

Pos logit contributions
unny+0.023
sie+0.023
 stool+0.023
ll+0.022
hetically+0.022
Neg logit contributions
aw-0.015
 its-0.011
 anything-0.011
 flashes-0.011
 such-0.011
Token<|endoftext|>
Token ID50256
Feature activation+2.45e-02

Pos logit contributions
unny+0.016
aughter+0.015
ll+0.015
oris+0.015
 stool+0.015
Neg logit contributions
aw-0.013
 anything-0.010
 flashes-0.010
 wishes-0.009
 its-0.009
TokenOne
Token ID3198
Feature activation+2.95e-03

Pos logit contributions
unny+0.051
sie+0.051
 stool+0.051
ll+0.050
ve+0.049
Neg logit contributions
aw-0.036
 its-0.028
 flashes-0.027
 anything-0.027
 such-0.026
Token day
Token ID1110
Feature activation-2.54e-02

Pos logit contributions
sie+0.003
 stool+0.003
unny+0.003
ll+0.003
aughter+0.003
Neg logit contributions
aw-0.007
 flashes-0.006
 anything-0.006
 its-0.006
 wishes-0.006