TOP ACTIVATIONS
MAX = 10.680

. " You ruined my gir affe !" "
a big net around the oct opus . It was stuck
he waved to the r asp berries and said , "
not mean to hurt the gir affe . You are right
't I play with the oct opus , too ?"
water and wave at the oct opus . The oct opus
it with you ?" the par rot said .
wanted to play with the oct opus , but we broke
Don 't eat the r asp berries , they are too
. The flaming o nodded its head and
. The par rot got scared and flew
!" The par rot was so pleased and
being too curious . N ont he less , it showed
as they drank the mil ksh akes .
need . Even the t ini est creature needs our help
. Then the gir affe ripped out the monkey
. The par rot was soon able to
you and we love the oct opus ." Mom
said . The par rot looked at Tim .
ksh akes . The mil ksh akes were blue and pink

MIN ACTIVATIONS
MIN = -8.939

and asked if he had seen their orange ball , and
and asked if she had seen a little boy . Lily
out a needle and carefully inserted it into the toy car
just right , then carefully inserted it into the drum to
a big bush and waited for Ben to find her .
all lined up and waited for someone to say " go
asked her if she had seen the wall . She hadn
a big tree and waits for Sara to find him .
a big tree and waited for Anna to find him .
eddy bear when she accidentally grabbed the cord of her lamp
them before , but never seen one . She was so
started to cry and reached for their bottle to feel safe
to the window and waited for her mom my to arrive
get damaged , so she grabbed the books and held them
escape . As the hunter grabbed him and c aged him
to ten and then looked for Max . She saw a
to ten and then looks for Ben . She
water the garden . Mama grabbed a long hose and Baby
everywhere . Mom heard the noise and came to
One day , Lily accidentally dropped the ball and it rolled

INTERVAL 5.775 - 10.680
CONTAINS 0.825%

she needed help finding her t eddy bear . The little
. That is the real treasure ." Tim my
are hard ." So , Sp ide y tried again and
time , there lived a dragon . He was a very
play peek - a - bo o instead ." So they

INTERVAL 0.871 - 5.775
CONTAINS 43.975%

sees a red button on the oven . He thinks it
coming from the freezer . Dad said , " Quick ly
need to panic anymore because he had found a safe place
fun together . One day , Tim and Rod
! ” John peek ed behind the bush

INTERVAL -4.034 - 0.871
CONTAINS 45.695%

She is not happy . She says , " Sh apes
find a dry cloth and wrap it around the doll 's
. He didn 't want to make any more mistakes .
They say thank you and goodbye to the man . They
. They sn ugg led close and soon fell fast asleep

INTERVAL -8.939 - -4.034
CONTAINS 9.505%

on , Tim grew up to be a kind and caring
, but he couldn 't find it . He was so
wished they had not gone on vacation . They wished they
L ily looked up at them with a big smile
know why . She looked around the bathroom , but nothing
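Note on the interval boundaries: the four INTERVAL ranges above appear to be equal-width bins spanning MIN = -8.939 to MAX = 10.680. The sketch below reproduces the edges under that assumption. The function name equal_width_edges and the choice of four bins are illustrative, not taken from the tool that produced this dump.

```python
def equal_width_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 evenly spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as reported in the dump.
edges = equal_width_edges(-8.939, 10.680)

# CONTAINS percentages from the dump, ordered from lowest interval to highest.
shares = [9.505, 45.695, 43.975, 0.825]

for (left, right), share in zip(zip(edges, edges[1:]), shares):
    print(f"{left:.3f} to {right:.3f}: {share}%")
```

The computed edges agree with the listed boundaries to three decimal places, and the four CONTAINS shares add up to 100%.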
Token.
Token ID13
Feature activation+2.69e+00

Pos logit contributions
ouse+0.175
anda+0.175
umpy+0.173
int+0.159
uer+0.158
Neg logit contributions
 the-0.108
 it-0.097
 them-0.088
 her-0.087
 their-0.086
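Note on the record layout: each per-token record in this dump fuses its label and value on one line ("Token ID13", "Feature activation+2.69e+00", "ouse+0.175"). A minimal parsing sketch follows, assuming that layout holds for every record. The names parse_record and CONTRIB are hypothetical helpers, not part of any existing tool.

```python
import re

# A contribution line is a token string immediately followed by a signed decimal.
CONTRIB = re.compile(r"^(?P<tok>.*?)(?P<val>[+-]\d+\.\d+)$")


def parse_record(lines):
    """Parse one per-token record (a list of lines) into a dict."""
    record = {
        # Everything after the label is the token, including a leading space.
        "token": lines[0][len("Token"):],
        "token_id": int(lines[1][len("Token ID"):]),
        "activation": float(lines[2][len("Feature activation"):]),
        "pos": [],
        "neg": [],
    }
    bucket = None
    for line in lines[3:]:
        if line.startswith("Pos logit contributions"):
            bucket = "pos"
        elif line.startswith("Neg logit contributions"):
            bucket = "neg"
        elif bucket and (m := CONTRIB.match(line)):
            record[bucket].append((m["tok"], float(m["val"])))
    return record


# Sample lines copied from the " gir" record in this dump (shortened).
sample = [
    "Token gir",
    "Token ID37370",
    "Feature activation+9.95e+00",
    "",
    "Pos logit contributions",
    "ouse+0.082",
    "umpy+0.082",
    "Neg logit contributions",
    " the-0.082",
    " it-0.076",
]
rec = parse_record(sample)
```

Leading spaces are kept because they are part of the token text (" the" and "the" are different tokens).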
Token "
Token ID366
Feature activation+1.30e+00

Pos logit contributions
anda+0.087
ouse+0.085
umpy+0.082
int+0.079
uer+0.079
Neg logit contributions
 the-0.061
 it-0.057
 her-0.052
 them-0.052
 their-0.050
TokenYou
Token ID1639
Feature activation+9.81e-01

Pos logit contributions
anda+0.049
ouse+0.047
uddle+0.045
umpy+0.044
ables+0.043
Neg logit contributions
 the-0.020
 it-0.019
 them-0.018
 her-0.017
 their-0.016
Token ruined
Token ID20484
Feature activation-4.28e+00

Pos logit contributions
anda+0.035
ouse+0.034
umpy+0.033
oc+0.031
uddle+0.031
Neg logit contributions
 the-0.018
 it-0.016
 them-0.015
 their-0.014
 her-0.014
Token my
Token ID616
Feature activation+3.07e+00

Pos logit contributions
 the+0.040
 it+0.037
 them+0.034
 their+0.032
wered+0.029
Neg logit contributions
anda-0.167
ouse-0.159
umpy-0.155
oc-0.152
ables-0.147
Token gir
Token ID37370
Feature activation+9.95e+00

Pos logit contributions
ouse+0.082
umpy+0.082
anda+0.079
int+0.074
ables+0.070
Neg logit contributions
 the-0.082
 it-0.076
 them-0.071
 their-0.069
 her-0.068
Tokenaffe
Token ID21223
Feature activation+6.67e+00

Pos logit contributions
ouse+0.141
anda+0.119
umpy+0.106
uddle+0.105
int+0.104
Neg logit contributions
 the-0.360
 it-0.346
 them-0.327
 this-0.322
 help-0.322
Token!"
Token ID2474
Feature activation+2.63e+00

Pos logit contributions
anda+0.226
ouse+0.221
umpy+0.216
int+0.204
ari+0.202
Neg logit contributions
 the-0.135
 it-0.124
 her-0.106
 them-0.102
 this-0.102
Token
Token ID198
Feature activation+1.88e+00

Pos logit contributions
anda+0.098
ouse+0.095
umpy+0.095
uer+0.089
oc+0.089
Neg logit contributions
 the-0.049
 it-0.045
 her-0.040
 them-0.039
 their-0.038
Token
Token ID198
Feature activation+1.76e+00

Pos logit contributions
anda+0.059
ouse+0.057
umpy+0.055
uer+0.053
int+0.052
Neg logit contributions
 the-0.043
 it-0.040
 them-0.038
 her-0.036
 their-0.036
Token"
Token ID1
Feature activation+6.53e-01

Pos logit contributions
anda+0.055
ouse+0.054
umpy+0.052
oc+0.049
int+0.049
Neg logit contributions
 the-0.038
 it-0.036
 them-0.033
 their-0.032
 her-0.032
Token a
Token ID257
Feature activation+1.73e+00

Pos logit contributions
 the+0.022
 it+0.019
 them+0.014
 her+0.012
 their+0.012
Neg logit contributions
anda-0.159
ouse-0.154
umpy-0.152
oc-0.147
ney-0.143
Token big
Token ID1263
Feature activation+2.00e+00

Pos logit contributions
anda+0.049
ouse+0.048
umpy+0.046
oc+0.044
uer+0.042
Neg logit contributions
 the-0.041
 it-0.039
 them-0.037
 their-0.036
 her-0.035
Token net
Token ID2010
Feature activation+4.46e+00

Pos logit contributions
anda+0.051
ouse+0.049
umpy+0.047
oc+0.046
uddle+0.043
Neg logit contributions
 the-0.050
 it-0.048
 them-0.048
 their-0.045
 her-0.045
Token around
Token ID1088
Feature activation-2.55e+00

Pos logit contributions
anda+0.133
ouse+0.131
umpy+0.130
oc+0.120
ney+0.119
Neg logit contributions
 the-0.088
 it-0.084
 them-0.081
 their-0.078
 anything-0.076
Token the
Token ID262
Feature activation+3.86e+00

Pos logit contributions
 permission+0.012
 anything+0.012
 such+0.011
 respect+0.011
wered+0.009
Neg logit contributions
oc-0.092
anda-0.089
umpy-0.088
ouse-0.087
ney-0.087
Token oct
Token ID19318
Feature activation+9.92e+00

Pos logit contributions
anda+0.095
ouse+0.094
umpy+0.093
ney+0.085
oc+0.085
Neg logit contributions
 the-0.096
 it-0.094
 them-0.092
 her-0.090
 their-0.088
Tokenopus
Token ID25790
Feature activation+6.23e+00

Pos logit contributions
anda+0.131
ouse+0.108
umpy+0.108
ney+0.102
jo+0.099
Neg logit contributions
 it-0.336
 the-0.330
 them-0.316
 this-0.308
 him-0.308
Token.
Token ID13
Feature activation+1.33e+00

Pos logit contributions
anda+0.209
ouse+0.204
umpy+0.202
uer+0.188
oc+0.188
Neg logit contributions
 the-0.123
 it-0.117
 them-0.105
 their-0.101
 her-0.096
Token It
Token ID632
Feature activation+2.08e+00

Pos logit contributions
anda+0.041
ouse+0.040
umpy+0.038
oc+0.036
uer+0.036
Neg logit contributions
 the-0.030
 it-0.028
 them-0.027
 their-0.026
 her-0.025
Token was
Token ID373
Feature activation-3.21e+00

Pos logit contributions
ouse+0.068
anda+0.067
umpy+0.066
ari+0.063
uer+0.063
Neg logit contributions
 the-0.047
 it-0.045
 them-0.038
 their-0.037
 a-0.036
Token stuck
Token ID7819
Feature activation+9.80e-01

Pos logit contributions
 them+0.032
 it+0.032
 the+0.032
 respect+0.031
 anything+0.030
Neg logit contributions
anda-0.107
umpy-0.104
ouse-0.100
ney-0.095
oc-0.094
Token he
Token ID339
Feature activation+3.73e+00

Pos logit contributions
 the+0.009
 it+0.008
 them+0.008
 their+0.008
 her+0.007
Neg logit contributions
anda-0.015
ouse-0.015
umpy-0.014
oc-0.013
ney-0.013
Token waved
Token ID26834
Feature activation-1.32e+00

Pos logit contributions
anda+0.115
umpy+0.113
ouse+0.112
oc+0.106
ari+0.105
Neg logit contributions
 the-0.083
 it-0.080
 them-0.073
 their-0.069
 her-0.068
Token to
Token ID284
Feature activation-3.97e+00

Pos logit contributions
 the+0.022
 it+0.020
 them+0.020
 their+0.019
 anything+0.018
Neg logit contributions
anda-0.044
ouse-0.044
umpy-0.043
oc-0.040
uer-0.039
Token the
Token ID262
Feature activation+2.90e+00

Pos logit contributions
 the+0.012
 it+0.009
 their+0.007
 them+0.006
 such+0.005
Neg logit contributions
anda-0.165
ouse-0.165
oc-0.161
umpy-0.160
affe-0.154
Token r
Token ID374
Feature activation+5.35e+00

Pos logit contributions
anda+0.086
ouse+0.085
umpy+0.083
oc+0.079
int+0.077
Neg logit contributions
 the-0.067
 it-0.063
 them-0.060
 their-0.057
 her-0.056
Tokenasp
Token ID5126
Feature activation+9.91e+00

Pos logit contributions
oc+0.120
ouse+0.115
anda+0.108
uer+0.108
jo+0.102
Neg logit contributions
 them-0.146
 the-0.145
 their-0.131
 her-0.129
 it-0.128
Tokenberries
Token ID20853
Feature activation+4.43e+00

Pos logit contributions
anda+0.230
uer+0.229
ouse+0.220
umpy+0.208
ari+0.190
Neg logit contributions
 the-0.302
 it-0.286
 them-0.277
 her-0.252
 their-0.248
Token and
Token ID290
Feature activation-1.07e+00

Pos logit contributions
anda+0.142
ouse+0.139
umpy+0.136
oc+0.129
uer+0.125
Neg logit contributions
 the-0.090
 it-0.084
 them-0.079
 their-0.076
 her-0.074
Token said
Token ID531
Feature activation-1.66e+00

Pos logit contributions
 the+0.022
 it+0.021
 them+0.020
 their+0.019
 her+0.018
Neg logit contributions
anda-0.033
ouse-0.033
umpy-0.031
oc-0.030
uer-0.029
Token,
Token ID11
Feature activation+2.59e-01

Pos logit contributions
 the+0.036
 it+0.034
 them+0.032
 their+0.031
 her+0.030
Neg logit contributions
anda-0.049
ouse-0.049
umpy-0.047
oc-0.044
uer-0.043
Token "
Token ID366
Feature activation+8.34e-01

Pos logit contributions
anda+0.008
ouse+0.008
umpy+0.008
oc+0.008
ney+0.007
Neg logit contributions
 the-0.005
 it-0.005
 them-0.005
 their-0.005
 her-0.004
Token not
Token ID407
Feature activation-2.10e+00

Pos logit contributions
wered+0.038
 them+0.037
 the+0.034
 their+0.033
 permission+0.032
Neg logit contributions
ouse-0.078
anda-0.074
umpy-0.070
uer-0.068
oc-0.068
Token mean
Token ID1612
Feature activation-6.85e-01

Pos logit contributions
 the+0.041
 it+0.038
 them+0.036
 their+0.035
 her+0.034
Neg logit contributions
anda-0.068
ouse-0.067
umpy-0.066
oc-0.061
uer-0.060
Token to
Token ID284
Feature activation-3.07e+00

Pos logit contributions
 the+0.010
 it+0.009
 them+0.008
 their+0.008
 her+0.008
Neg logit contributions
anda-0.025
ouse-0.024
umpy-0.024
oc-0.023
ables-0.023
Token hurt
Token ID5938
Feature activation-3.52e+00

Pos logit contributions
 the+0.050
 it+0.047
 them+0.045
 their+0.043
 her+0.041
Neg logit contributions
anda-0.105
ouse-0.101
umpy-0.101
oc-0.094
uer-0.092
Token the
Token ID262
Feature activation+2.78e+00

Pos logit contributions
 the+0.042
 it+0.039
 their+0.037
 them+0.036
wered+0.034
Neg logit contributions
anda-0.127
umpy-0.125
ouse-0.119
oc-0.117
affe-0.113
Token gir
Token ID37370
Feature activation+9.86e+00

Pos logit contributions
anda+0.077
ouse+0.075
umpy+0.073
oc+0.068
uddle+0.066
Neg logit contributions
 the-0.065
 it-0.062
 them-0.061
 their-0.060
 her-0.058
Tokenaffe
Token ID21223
Feature activation+6.14e+00

Pos logit contributions
ouse+0.133
anda+0.110
int+0.101
uddle+0.092
aj+0.087
Neg logit contributions
 the-0.388
 it-0.372
 them-0.355
 this-0.341
 him-0.336
Token.
Token ID13
Feature activation+1.28e+00

Pos logit contributions
anda+0.213
ouse+0.212
umpy+0.199
int+0.195
uer+0.192
Neg logit contributions
 the-0.120
 it-0.111
 them-0.099
 her-0.094
 their-0.093
Token You
Token ID921
Feature activation+9.17e-01

Pos logit contributions
anda+0.043
ouse+0.042
umpy+0.039
uddle+0.039
uer+0.038
Neg logit contributions
 the-0.026
 it-0.025
 their-0.021
 her-0.021
 them-0.021
Token are
Token ID389
Feature activation-1.53e+00

Pos logit contributions
anda+0.030
ouse+0.030
umpy+0.029
oc+0.027
uer+0.027
Neg logit contributions
 the-0.018
 it-0.017
 them-0.016
 their-0.015
 her-0.015
Token right
Token ID826
Feature activation+1.22e+00

Pos logit contributions
 them+0.026
 it+0.026
 the+0.025
wered+0.023
 their+0.023
Neg logit contributions
anda-0.047
umpy-0.045
ouse-0.043
ney-0.040
uer-0.039
Token't
Token ID470
Feature activation-3.20e+00

Pos logit contributions
 the+0.073
 it+0.068
 them+0.067
 their+0.065
 her+0.062
Neg logit contributions
anda-0.138
ouse-0.135
umpy-0.131
oc-0.123
uer-0.120
Token I
Token ID314
Feature activation-1.58e+00

Pos logit contributions
 the+0.054
 them+0.051
 their+0.049
 it+0.049
 her+0.045
Neg logit contributions
ouse-0.100
anda-0.100
umpy-0.099
oc-0.093
affe-0.090
Token play
Token ID711
Feature activation-8.35e-01

Pos logit contributions
 the+0.025
 it+0.024
 them+0.022
 their+0.021
 her+0.020
Neg logit contributions
anda-0.057
ouse-0.056
umpy-0.055
oc-0.052
uer-0.050
Token with
Token ID351
Feature activation-3.13e+00

Pos logit contributions
 the+0.017
 it+0.016
 them+0.016
 their+0.015
 her+0.015
Neg logit contributions
anda-0.025
ouse-0.025
umpy-0.024
oc-0.023
uer-0.022
Token the
Token ID262
Feature activation+2.57e+00

Pos logit contributions
 the+0.029
 their+0.029
wered+0.028
 it+0.027
 them+0.027
Neg logit contributions
anda-0.114
ouse-0.109
umpy-0.106
oc-0.103
uddle-0.102
Token oct
Token ID19318
Feature activation+9.77e+00

Pos logit contributions
ouse+0.076
anda+0.076
umpy+0.072
int+0.070
oc+0.070
Neg logit contributions
 the-0.060
 it-0.056
 them-0.052
 their-0.050
 her-0.048
Tokenopus
Token ID25790
Feature activation+6.15e+00

Pos logit contributions
anda+0.141
ney+0.124
umpy+0.124
ouse+0.120
int+0.120
Neg logit contributions
 it-0.310
 the-0.299
 them-0.290
 this-0.281
 him-0.280
Token,
Token ID11
Feature activation-1.47e+00

Pos logit contributions
anda+0.222
ouse+0.214
umpy+0.211
uer+0.204
int+0.203
Neg logit contributions
 the-0.110
 it-0.097
 them-0.085
 their-0.083
 her-0.082
Token too
Token ID1165
Feature activation+2.54e-01

Pos logit contributions
 the+0.026
 it+0.024
 them+0.023
 their+0.022
 her+0.021
Neg logit contributions
anda-0.050
ouse-0.049
umpy-0.048
oc-0.045
ney-0.044
Token?"
Token ID1701
Feature activation+2.20e+00

Pos logit contributions
anda+0.008
ouse+0.008
umpy+0.008
oc+0.007
ney+0.007
Neg logit contributions
 the-0.005
 it-0.005
 them-0.005
 their-0.004
 her-0.004
Token
Token ID198
Feature activation+1.46e+00

Pos logit contributions
anda+0.073
ouse+0.070
umpy+0.069
uer+0.064
uddle+0.063
Neg logit contributions
 the-0.050
 it-0.047
 them-0.042
 their-0.041
 her-0.041
Token water
Token ID1660
Feature activation+3.39e+00

Pos logit contributions
anda+0.103
ouse+0.101
umpy+0.098
oc+0.092
uer+0.089
Neg logit contributions
 the-0.087
 it-0.082
 them-0.078
 their-0.075
 her-0.074
Token and
Token ID290
Feature activation-1.01e+00

Pos logit contributions
anda+0.103
ouse+0.101
umpy+0.098
oc+0.092
ney+0.090
Neg logit contributions
 the-0.073
 it-0.069
 them-0.065
 their-0.063
 her-0.061
Token wave
Token ID6769
Feature activation+1.77e-01

Pos logit contributions
 the+0.015
 them+0.014
 it+0.014
 such+0.013
 anything+0.013
Neg logit contributions
anda-0.033
ouse-0.033
umpy-0.031
oc-0.030
ables-0.029
Token at
Token ID379
Feature activation-3.91e+00

Pos logit contributions
anda+0.006
ouse+0.006
umpy+0.006
oc+0.006
uer+0.005
Neg logit contributions
 the-0.003
 it-0.003
 them-0.003
 their-0.003
 her-0.002
Token the
Token ID262
Feature activation+3.31e+00

Pos logit contributions
 the+0.011
 it+0.007
 such+0.004
 anything+0.004
 them+0.003
Neg logit contributions
anda-0.172
ouse-0.164
umpy-0.160
oc-0.159
jo-0.154
Token oct
Token ID19318
Feature activation+9.69e+00

Pos logit contributions
anda+0.100
ouse+0.098
umpy+0.096
oc+0.090
uer+0.087
Neg logit contributions
 the-0.072
 it-0.068
 them-0.065
 their-0.062
 her-0.061
Tokenopus
Token ID25790
Feature activation+5.67e+00

Pos logit contributions
anda+0.129
ney+0.114
umpy+0.113
ouse+0.105
jo+0.103
Neg logit contributions
 it-0.319
 the-0.306
 them-0.297
 him-0.292
 this-0.286
Token.
Token ID13
Feature activation+2.26e+00

Pos logit contributions
anda+0.198
ouse+0.195
umpy+0.194
int+0.181
uer+0.180
Neg logit contributions
 the-0.106
 it-0.099
 them-0.089
 their-0.086
 her-0.081
Token The
Token ID383
Feature activation+2.87e+00

Pos logit contributions
anda+0.070
ouse+0.069
umpy+0.066
uer+0.063
int+0.062
Neg logit contributions
 the-0.054
 it-0.051
 them-0.046
 their-0.045
 her-0.044
Token oct
Token ID19318
Feature activation+9.01e+00

Pos logit contributions
anda+0.084
ouse+0.083
umpy+0.080
int+0.076
uer+0.076
Neg logit contributions
 the-0.068
 it-0.063
 them-0.059
 their-0.056
 her-0.054
Tokenopus
Token ID25790
Feature activation+6.56e+00

Pos logit contributions
anda+0.125
umpy+0.113
ney+0.109
ouse+0.106
jo+0.104
Neg logit contributions
 it-0.302
 the-0.289
 this-0.271
 them-0.271
 him-0.270
Token it
Token ID340
Feature activation+1.89e+00

Pos logit contributions
 the+0.014
 their+0.009
wered+0.008
 it+0.008
 them+0.008
Neg logit contributions
anda-0.143
ouse-0.139
umpy-0.135
oc-0.132
ang-0.128
Token with
Token ID351
Feature activation-3.57e+00

Pos logit contributions
anda+0.069
ouse+0.067
umpy+0.066
oc+0.062
uer+0.061
Neg logit contributions
 the-0.033
 it-0.030
 them-0.028
 their-0.027
 her-0.026
Token you
Token ID345
Feature activation+1.87e+00

Pos logit contributions
wered+0.046
 the+0.045
 their+0.041
 such+0.040
 anything+0.039
Neg logit contributions
anda-0.112
ouse-0.107
oc-0.100
umpy-0.100
affe-0.099
Token?"
Token ID1701
Feature activation+1.44e+00

Pos logit contributions
anda+0.056
ouse+0.055
umpy+0.053
oc+0.050
int+0.049
Neg logit contributions
 the-0.039
 it-0.037
 them-0.036
 their-0.035
 her-0.034
Token the
Token ID262
Feature activation+3.57e+00

Pos logit contributions
anda+0.046
ouse+0.045
umpy+0.044
oc+0.041
uer+0.041
Neg logit contributions
 the-0.032
 it-0.030
 them-0.028
 their-0.027
 her-0.027
Token par
Token ID1582
Feature activation+9.58e+00

Pos logit contributions
ouse+0.092
anda+0.091
umpy+0.089
int+0.082
oc+0.081
Neg logit contributions
 the-0.098
 it-0.091
 them-0.088
 their-0.086
 her-0.083
Tokenrot
Token ID10599
Feature activation+6.47e+00

Pos logit contributions
umpy+0.166
anda+0.146
ables+0.141
int+0.140
uddle+0.138
Neg logit contributions
 it-0.294
 them-0.282
 the-0.281
 this-0.253
 a-0.247
Token said
Token ID531
Feature activation+7.59e-01

Pos logit contributions
umpy+0.209
ouse+0.200
anda+0.198
ini+0.197
ables+0.192
Neg logit contributions
 the-0.160
 it-0.150
 her-0.140
 them-0.133
 their-0.125
Token.
Token ID13
Feature activation+1.59e+00

Pos logit contributions
anda+0.024
ouse+0.024
umpy+0.023
oc+0.022
uer+0.021
Neg logit contributions
 the-0.016
 it-0.015
 them-0.014
 their-0.013
 her-0.013
Token
Token ID198
Feature activation+1.33e+00

Pos logit contributions
anda+0.052
ouse+0.051
umpy+0.049
uer+0.046
uddle+0.046
Neg logit contributions
 the-0.036
 it-0.034
 them-0.030
 their-0.030
 her-0.029
Token
Token ID198
Feature activation+1.30e+00

Pos logit contributions
anda+0.040
ouse+0.039
umpy+0.037
uer+0.035
oc+0.035
Neg logit contributions
 the-0.032
 it-0.030
 them-0.028
 their-0.027
 her-0.027
Token wanted
Token ID2227
Feature activation-5.46e+00

Pos logit contributions
anda+0.028
ouse+0.027
umpy+0.026
oc+0.025
int+0.024
Neg logit contributions
 the-0.020
 it-0.019
 them-0.018
 their-0.017
 her-0.017
Token to
Token ID284
Feature activation-3.64e+00

Pos logit contributions
wered+0.105
 the+0.080
 her+0.077
 risking+0.076
 them+0.075
Neg logit contributions
anda-0.146
ouse-0.130
umpy-0.125
ables-0.123
uddle-0.123
Token play
Token ID711
Feature activation-1.18e+00

Pos logit contributions
 the+0.059
 it+0.057
 them+0.055
 anything+0.052
 their+0.050
Neg logit contributions
ouse-0.113
anda-0.112
umpy-0.106
oc-0.103
ney-0.101
Token with
Token ID351
Feature activation-3.94e+00

Pos logit contributions
 the+0.021
 it+0.020
 them+0.019
 her+0.018
 their+0.018
Neg logit contributions
ouse-0.036
anda-0.036
umpy-0.035
oc-0.035
uddle-0.033
Token the
Token ID262
Feature activation+1.95e+00

Pos logit contributions
wered+0.038
 it+0.028
 the+0.028
 going+0.026
 her+0.026
Neg logit contributions
anda-0.145
uddle-0.133
ouse-0.129
oc-0.127
umpy-0.126
Token oct
Token ID19318
Feature activation+9.58e+00

Pos logit contributions
anda+0.057
ouse+0.055
umpy+0.054
oc+0.051
uer+0.049
Neg logit contributions
 the-0.046
 it-0.043
 them-0.041
 their-0.040
 her-0.039
Tokenopus
Token ID25790
Feature activation+6.09e+00

Pos logit contributions
anda+0.134
int+0.123
est+0.115
umpy+0.114
mo+0.114
Neg logit contributions
 it-0.298
 the-0.286
 them-0.276
 him-0.271
 this-0.267
Token,
Token ID11
Feature activation-1.68e+00

Pos logit contributions
ouse+0.208
anda+0.207
umpy+0.203
int+0.195
uer+0.190
Neg logit contributions
 the-0.118
 it-0.107
 their-0.095
 them-0.095
 her-0.091
Token but
Token ID475
Feature activation-2.63e+00

Pos logit contributions
 the+0.033
 it+0.030
 them+0.029
 their+0.028
 her+0.027
Neg logit contributions
anda-0.054
ouse-0.051
umpy-0.051
oc-0.048
ney-0.046
Token we
Token ID356
Feature activation+8.12e-01

Pos logit contributions
wered+0.032
 the+0.032
 them+0.032
 her+0.030
 such+0.030
Neg logit contributions
anda-0.089
ouse-0.082
oc-0.078
umpy-0.078
ney-0.076
Token broke
Token ID6265
Feature activation-5.35e+00

Pos logit contributions
anda+0.024
ouse+0.023
umpy+0.023
oc+0.021
ney+0.020
Neg logit contributions
 the-0.018
 it-0.017
 them-0.017
 their-0.016
 her-0.016
Token Don
Token ID2094
Feature activation-1.15e+00

Pos logit contributions
 the+0.002
 it+0.002
 them+0.001
 their+0.001
 her+0.001
Neg logit contributions
anda-0.003
ouse-0.003
umpy-0.002
uddle-0.002
uer-0.002
Token't
Token ID470
Feature activation-2.79e+00

Pos logit contributions
 the+0.019
 it+0.017
 them+0.016
 their+0.015
 her+0.015
Neg logit contributions
anda-0.042
ouse-0.041
umpy-0.040
oc-0.038
uer-0.038
Token eat
Token ID4483
Feature activation-3.00e+00

Pos logit contributions
 the+0.047
 it+0.044
 them+0.042
 their+0.041
 her+0.038
Neg logit contributions
anda-0.096
umpy-0.092
ouse-0.092
oc-0.087
uer-0.084
Token the
Token ID262
Feature activation+1.85e+00

Pos logit contributions
 the+0.027
 it+0.023
 their+0.022
 her+0.020
 them+0.020
Neg logit contributions
anda-0.120
ouse-0.113
umpy-0.113
oc-0.108
uddle-0.107
Token r
Token ID374
Feature activation+4.68e+00

Pos logit contributions
anda+0.058
ouse+0.057
umpy+0.055
oc+0.052
uer+0.051
Neg logit contributions
 the-0.041
 it-0.038
 them-0.036
 their-0.034
 her-0.034
Tokenasp
Token ID5126
Feature activation+9.53e+00

Pos logit contributions
ouse+0.105
oc+0.104
anda+0.099
uer+0.097
jo+0.092
Neg logit contributions
 the-0.133
 them-0.130
 it-0.118
 their-0.117
 her-0.116
Tokenberries
Token ID20853
Feature activation+4.73e+00

Pos logit contributions
anda+0.224
uer+0.220
ouse+0.218
umpy+0.192
ari+0.190
Neg logit contributions
 the-0.288
 it-0.273
 them-0.263
 her-0.235
 their-0.230
Token,
Token ID11
Feature activation-1.19e+00

Pos logit contributions
anda+0.148
ouse+0.145
umpy+0.141
oc+0.134
uer+0.130
Neg logit contributions
 the-0.099
 it-0.093
 them-0.088
 their-0.085
 her-0.083
Token they
Token ID484
Feature activation+6.29e-01

Pos logit contributions
 the+0.018
 it+0.017
 them+0.016
 their+0.015
 her+0.015
Neg logit contributions
anda-0.042
ouse-0.041
umpy-0.041
oc-0.039
uer-0.037
Token are
Token ID389
Feature activation-2.02e+00

Pos logit contributions
anda+0.022
ouse+0.021
umpy+0.021
oc+0.020
uer+0.019
Neg logit contributions
 the-0.011
 it-0.010
 them-0.010
 their-0.009
 her-0.009
Token too
Token ID1165
Feature activation-1.03e+00

Pos logit contributions
 the+0.036
 it+0.034
 them+0.032
 their+0.031
 her+0.030
Neg logit contributions
anda-0.065
umpy-0.061
ouse-0.061
oc-0.056
ney-0.056
Token.
Token ID13
Feature activation+1.36e+00

Pos logit contributions
 the+0.020
 it+0.019
 them+0.018
 their+0.017
 her+0.016
Neg logit contributions
anda-0.030
ouse-0.029
umpy-0.028
oc-0.027
uer-0.026
Token
Token ID220
Feature activation+1.45e+00

Pos logit contributions
anda+0.041
ouse+0.040
umpy+0.038
uer+0.036
uddle+0.036
Neg logit contributions
 the-0.032
 it-0.031
 them-0.028
 their-0.027
 her-0.026
Token
Token ID198
Feature activation+1.33e+00

Pos logit contributions
anda+0.045
ouse+0.044
umpy+0.042
oc+0.041
uer+0.041
Neg logit contributions
 the-0.032
 it-0.030
 them-0.029
 their-0.027
 her-0.026
Token
Token ID198
Feature activation+1.10e+00

Pos logit contributions
anda+0.041
ouse+0.040
umpy+0.038
uer+0.036
oc+0.036
Neg logit contributions
 the-0.030
 it-0.029
 them-0.027
 their-0.026
 her-0.025
TokenThe
Token ID464
Feature activation+2.41e+00

Pos logit contributions
anda+0.034
ouse+0.033
umpy+0.032
oc+0.030
uer+0.030
Neg logit contributions
 the-0.024
 it-0.023
 them-0.022
 their-0.021
 her-0.020
Token flaming
Token ID39518
Feature activation+9.51e+00

Pos logit contributions
ouse+0.078
ables+0.074
umpy+0.072
uer+0.069
int+0.068
Neg logit contributions
 the-0.058
 it-0.052
 them-0.050
 her-0.047
 their-0.046
Tokeno
Token ID78
Feature activation+6.50e+00

Pos logit contributions
uer+0.198
ouse+0.186
umpy+0.181
uddle+0.166
ables+0.164
Neg logit contributions
 the-0.271
 it-0.246
 her-0.242
 them-0.242
 what-0.234
Token nodded
Token ID14464
Feature activation-1.74e+00

Pos logit contributions
umpy+0.210
ouse+0.207
anda+0.197
uer+0.187
ables+0.186
Neg logit contributions
 the-0.147
 it-0.135
 them-0.120
 their-0.113
 her-0.113
Token its
Token ID663
Feature activation+2.33e+00

Pos logit contributions
 the+0.036
 it+0.034
 them+0.032
 their+0.030
 her+0.030
Neg logit contributions
anda-0.054
ouse-0.054
umpy-0.052
oc-0.049
uer-0.048
Token head
Token ID1182
Feature activation+3.34e+00

Pos logit contributions
ouse+0.065
anda+0.065
umpy+0.063
ari+0.062
oke+0.061
Neg logit contributions
 the-0.058
 it-0.052
 her-0.051
 their-0.050
 a-0.050
Token and
Token ID290
Feature activation-1.30e+00

Pos logit contributions
anda+0.110
ouse+0.105
umpy+0.103
oc+0.098
uddle+0.096
Neg logit contributions
 the-0.071
 it-0.067
 them-0.061
 her-0.060
 their-0.060
Token.
Token ID13
Feature activation+1.83e+00

Pos logit contributions
anda+0.111
ouse+0.107
umpy+0.106
oc+0.102
ari+0.100
Neg logit contributions
 the-0.066
 it-0.062
 them-0.057
 her-0.055
 their-0.055
Token
Token ID220
Feature activation+3.58e+00

Pos logit contributions
anda+0.058
ouse+0.056
umpy+0.055
oc+0.052
uer+0.052
Neg logit contributions
 the-0.042
 it-0.039
 them-0.036
 her-0.035
 their-0.034
Token
Token ID198
Feature activation+2.12e+00

Pos logit contributions
anda+0.118
ouse+0.115
umpy+0.111
uer+0.110
oc+0.109
Neg logit contributions
 the-0.074
 it-0.070
 them-0.067
 her-0.062
 their-0.061
Token
Token ID198
Feature activation+1.73e+00

Pos logit contributions
anda+0.069
ouse+0.067
umpy+0.063
uer+0.061
int+0.061
Neg logit contributions
 the-0.046
 it-0.043
 them-0.041
 her-0.039
 their-0.038
TokenThe
Token ID464
Feature activation+2.87e+00

Pos logit contributions
anda+0.055
ouse+0.054
umpy+0.051
oc+0.049
uddle+0.048
Neg logit contributions
 the-0.036
 it-0.034
 them-0.032
 their-0.031
 her-0.031
Token par
Token ID1582
Feature activation+9.50e+00

Pos logit contributions
anda+0.076
ouse+0.075
umpy+0.074
ables+0.069
int+0.069
Neg logit contributions
 the-0.076
 it-0.072
 them-0.065
 her-0.064
 a-0.063
Tokenrot
Token ID10599
Feature activation+6.37e+00

Pos logit contributions
umpy+0.172
anda+0.150
int+0.148
ouse+0.147
uddle+0.144
Neg logit contributions
 it-0.287
 the-0.282
 them-0.270
 this-0.255
 such-0.249
Token got
Token ID1392
Feature activation-2.73e+00

Pos logit contributions
umpy+0.220
ouse+0.212
anda+0.209
ables+0.199
ari+0.196
Neg logit contributions
 the-0.139
 it-0.131
 her-0.110
 them-0.108
 a-0.104
Token scared
Token ID12008
Feature activation-8.21e-01

Pos logit contributions
 the+0.062
 it+0.059
 them+0.056
 their+0.054
 her+0.052
Neg logit contributions
anda-0.078
ouse-0.075
umpy-0.074
oc-0.068
uer-0.066
Token and
Token ID290
Feature activation-4.42e-01

Pos logit contributions
 the+0.019
 it+0.018
 them+0.018
 their+0.017
 her+0.017
Neg logit contributions
anda-0.023
ouse-0.022
umpy-0.022
oc-0.021
uer-0.020
Token flew
Token ID13112
Feature activation-1.94e+00

Pos logit contributions
 the+0.010
 it+0.009
 them+0.009
 their+0.009
 her+0.008
Neg logit contributions
anda-0.013
ouse-0.012
umpy-0.012
oc-0.011
uer-0.011
Token!"
Token ID2474
Feature activation+2.49e+00

Pos logit contributions
anda+0.040
ouse+0.039
umpy+0.038
oc+0.036
int+0.036
Neg logit contributions
 the-0.024
 it-0.023
 them-0.021
 their-0.020
 her-0.020
Token
Token ID220
Feature activation+2.58e+00

Pos logit contributions
anda+0.080
ouse+0.077
umpy+0.076
uer+0.071
uddle+0.070
Neg logit contributions
 the-0.058
 it-0.054
 them-0.050
 their-0.049
 her-0.047
Token
Token ID198
Feature activation+1.51e+00

Pos logit contributions
anda+0.083
ouse+0.081
umpy+0.078
oc+0.074
uer+0.074
Neg logit contributions
 the-0.056
 it-0.052
 them-0.050
 their-0.047
 her-0.045
Token
Token ID198
Feature activation+1.39e+00

Pos logit contributions
anda+0.049
ouse+0.047
umpy+0.046
int+0.043
uer+0.043
Neg logit contributions
 the-0.034
 it-0.031
 them-0.030
 their-0.028
 her-0.027
TokenThe
Token ID464
Feature activation+2.87e+00

Pos logit contributions
anda+0.046
ouse+0.045
umpy+0.044
oc+0.041
int+0.040
Neg logit contributions
 the-0.027
 it-0.025
 them-0.024
 their-0.023
 her-0.022
Token par
Token ID1582
Feature activation+9.50e+00

Pos logit contributions
ouse+0.085
anda+0.081
umpy+0.079
ables+0.077
int+0.075
Neg logit contributions
 the-0.073
 it-0.068
 them-0.065
 their-0.061
 her-0.061
Tokenrot
Token ID10599
Feature activation+6.54e+00

Pos logit contributions
umpy+0.161
ouse+0.159
uddle+0.149
ables+0.146
anda+0.145
Neg logit contributions
 it-0.291
 them-0.288
 the-0.286
 her-0.257
 this-0.254
Token was
Token ID373
Feature activation-1.53e+00

Pos logit contributions
umpy+0.219
anda+0.210
ouse+0.205
ables+0.196
ini+0.193
Neg logit contributions
 the-0.152
 it-0.138
 her-0.121
 them-0.116
 a-0.115
Token so
Token ID523
Feature activation-8.72e-01

Pos logit contributions
 the+0.033
 it+0.030
 them+0.028
 their+0.028
 her+0.027
Neg logit contributions
anda-0.050
ouse-0.049
umpy-0.047
oc-0.046
uddle-0.044
Token pleased
Token ID10607
Feature activation+3.64e-01

Pos logit contributions
 the+0.020
 it+0.018
 their+0.017
 her+0.017
 a+0.016
Neg logit contributions
anda-0.029
oc-0.027
ouse-0.027
ables-0.026
uddle-0.026
Token and
Token ID290
Feature activation-1.01e-01

Pos logit contributions
anda+0.012
ouse+0.012
umpy+0.012
oc+0.011
uer+0.011
Neg logit contributions
 the-0.007
 it-0.007
 them-0.006
 their-0.006
 her-0.006
Token being
Token ID852
Feature activation-3.80e-01

Pos logit contributions
 the+0.014
 it+0.012
 them+0.010
 their+0.008
 her+0.006
Neg logit contributions
anda-0.149
ouse-0.144
umpy-0.142
oc-0.141
ables-0.135
Token too
Token ID1165
Feature activation+5.06e-01

Pos logit contributions
 the+0.006
 it+0.006
 them+0.005
 their+0.005
 her+0.005
Neg logit contributions
anda-0.013
ouse-0.013
umpy-0.013
oc-0.012
uer-0.012
Token curious
Token ID11040
Feature activation+2.37e+00

Pos logit contributions
anda+0.015
ouse+0.015
umpy+0.015
oc+0.014
ney+0.013
Neg logit contributions
 the-0.011
 it-0.010
 them-0.010
 their-0.009
 her-0.009
Token.
Token ID13
Feature activation+2.87e+00

Pos logit contributions
anda+0.080
ouse+0.079
umpy+0.076
oc+0.072
uer+0.071
Neg logit contributions
 the-0.045
 it-0.042
 them-0.039
 their-0.037
 her-0.037
Token N
Token ID399
Feature activation+5.79e+00

Pos logit contributions
anda+0.089
ouse+0.089
umpy+0.083
uddle+0.080
uer+0.080
Neg logit contributions
 the-0.068
 it-0.065
 them-0.057
 her-0.056
 their-0.055
Tokenont
Token ID756
Feature activation+9.50e+00

Pos logit contributions
anda+0.123
oc+0.123
ouse+0.115
umpy+0.110
uer+0.109
Neg logit contributions
 the-0.167
 it-0.161
 them-0.149
 her-0.146
 their-0.144
Tokenhe
Token ID258
Feature activation+4.05e+00

Pos logit contributions
umpy+0.190
oc+0.183
ouse+0.166
ari+0.166
uer+0.162
Neg logit contributions
 the-0.291
 it-0.287
 them-0.268
 her-0.261
 what-0.250
Tokenless
Token ID1203
Feature activation+5.15e-02

Pos logit contributions
anda+0.127
umpy+0.123
ouse+0.120
oc+0.115
ari+0.115
Neg logit contributions
 the-0.090
 it-0.088
 them-0.077
 her-0.075
 their-0.071
Token,
Token ID11
Feature activation+7.36e-02

Pos logit contributions
anda+0.002
ouse+0.002
umpy+0.002
oc+0.002
int+0.002
Neg logit contributions
 the-0.001
 it-0.001
 them-0.001
 their-0.001
 her-0.001
Token it
Token ID340
Feature activation+3.10e+00

Pos logit contributions
anda+0.002
ouse+0.002
umpy+0.002
oc+0.002
uer+0.002
Neg logit contributions
 the-0.002
 it-0.001
 them-0.001
 her-0.001
 their-0.001
Token showed
Token ID3751
Feature activation-3.40e+00

Pos logit contributions
ouse+0.094
anda+0.093
umpy+0.092
oc+0.085
ari+0.084
Neg logit contributions
 the-0.073
 it-0.068
 them-0.061
 her-0.061
 their-0.059
Token as
Token ID355
Feature activation-2.56e+00

Pos logit contributions
anda+0.017
ouse+0.017
umpy+0.017
oc+0.016
uer+0.015
Neg logit contributions
 the-0.009
 it-0.009
 them-0.008
 their-0.008
 her-0.008
Token they
Token ID484
Feature activation+2.04e+00

Pos logit contributions
 the+0.027
 them+0.025
 it+0.024
wered+0.023
 her+0.022
Neg logit contributions
anda-0.094
ouse-0.091
umpy-0.089
oc-0.086
ables-0.083
Token drank
Token ID24070
Feature activation-3.41e+00

Pos logit contributions
anda+0.059
ouse+0.057
umpy+0.055
oc+0.053
ney+0.051
Neg logit contributions
 the-0.047
 it-0.045
 them-0.043
 their-0.041
 her-0.040
Token the
Token ID262
Feature activation+2.58e+00

Pos logit contributions
 the+0.041
 it+0.038
wered+0.036
 them+0.034
 anything+0.034
Neg logit contributions
anda-0.117
ouse-0.115
umpy-0.113
oc-0.111
uddle-0.107
Token mil
Token ID1465
Feature activation+5.97e+00

Pos logit contributions
anda+0.087
ouse+0.086
umpy+0.084
oc+0.080
uer+0.080
Neg logit contributions
 the-0.050
 it-0.046
 them-0.044
 their-0.041
 her-0.040
Tokenksh
Token ID50133
Feature activation+9.50e+00

Pos logit contributions
ouse+0.125
umpy+0.122
anda+0.122
ari+0.112
ney+0.104
Neg logit contributions
 them-0.167
 the-0.163
 it-0.154
 their-0.152
 him-0.144
Tokenakes
Token ID1124
Feature activation+3.01e+00

Pos logit contributions
anda+0.193
ouse+0.182
umpy+0.178
uer+0.164
ari+0.159
Neg logit contributions
 the-0.271
 them-0.269
 their-0.263
 it-0.261
 him-0.248
Token.
Token ID13
Feature activation+2.10e+00

Pos logit contributions
anda+0.096
ouse+0.094
umpy+0.092
oc+0.087
uer+0.084
Neg logit contributions
 the-0.060
 it-0.056
 them-0.053
 their-0.051
 her-0.050
Token " "
Token ID220
Feature activation+2.73e+00

Pos logit contributions
anda+0.066
ouse+0.065
umpy+0.062
uer+0.059
int+0.058
Neg logit contributions
 the-0.049
 it-0.046
 them-0.043
 their-0.042
 her-0.039
Token "\n"
Token ID198
Feature activation+1.77e+00

Pos logit contributions
anda+0.088
ouse+0.084
umpy+0.081
oc+0.079
uer+0.079
Neg logit contributions
 the-0.059
 it-0.055
 them-0.054
 their-0.051
 her-0.047
Token "\n"
Token ID198
Feature activation+1.44e+00

Pos logit contributions
anda+0.056
ouse+0.054
umpy+0.051
int+0.049
uer+0.049
Neg logit contributions
 the-0.040
 it-0.037
 them-0.036
 their-0.034
 her-0.033
Token need
Token ID761
Feature activation-5.32e+00

Pos logit contributions
 it+0.011
 them+0.011
 the+0.010
 anything+0.009
wered+0.007
Neg logit contributions
anda-0.086
umpy-0.084
ouse-0.084
ney-0.083
oc-0.081
Token.
Token ID13
Feature activation+2.00e+00

Pos logit contributions
wered+0.066
 the+0.047
 their+0.044
 it+0.043
 them+0.043
Neg logit contributions
anda-0.155
umpy-0.148
ouse-0.148
ables-0.144
uddle-0.137
Token Even
Token ID3412
Feature activation-3.26e+00

Pos logit contributions
anda+0.061
ouse+0.061
umpy+0.057
uer+0.055
uddle+0.055
Neg logit contributions
 the-0.048
 it-0.045
 them-0.041
 their-0.040
 her-0.040
Token the
Token ID262
Feature activation+2.64e+00

Pos logit contributions
 the+0.042
 it+0.041
 them+0.039
 their+0.036
 her+0.033
Neg logit contributions
anda-0.118
umpy-0.114
ouse-0.113
oc-0.107
ney-0.104
Token t
Token ID256
Feature activation+7.11e+00

Pos logit contributions
anda+0.078
ouse+0.076
umpy+0.074
oc+0.070
uer+0.068
Neg logit contributions
 the-0.061
 it-0.057
 them-0.054
 their-0.052
 her-0.051
Tokenini
Token ID5362
Feature activation+9.48e+00

Pos logit contributions
umpy+0.126
anda+0.121
ouse+0.120
int+0.117
oc+0.117
Neg logit contributions
 it-0.206
 the-0.199
 them-0.194
 her-0.191
 him-0.189
Tokenest
Token ID395
Feature activation+3.14e+00

Pos logit contributions
anda+0.152
ouse+0.140
umpy+0.128
uer+0.127
uddle+0.125
Neg logit contributions
 the-0.365
 it-0.349
 her-0.331
 them-0.330
 him-0.318
Token creature
Token ID7185
Feature activation+4.57e+00

Pos logit contributions
anda+0.092
ouse+0.090
umpy+0.089
oc+0.083
uer+0.081
Neg logit contributions
 the-0.073
 it-0.070
 them-0.065
 their-0.063
 her-0.062
Token needs
Token ID2476
Feature activation-3.24e+00

Pos logit contributions
anda+0.135
ouse+0.134
umpy+0.132
oc+0.124
int+0.120
Neg logit contributions
 the-0.106
 it-0.101
 them-0.093
 their-0.090
 her-0.090
Token our
Token ID674
Feature activation+2.25e+00

Pos logit contributions
 the+0.040
wered+0.040
 it+0.037
 them+0.036
 their+0.036
Neg logit contributions
anda-0.107
ouse-0.104
umpy-0.103
oc-0.098
ables-0.097
Token help
Token ID1037
Feature activation+1.31e+00

Pos logit contributions
anda+0.067
ouse+0.066
umpy+0.064
oc+0.060
uer+0.059
Neg logit contributions
 the-0.051
 it-0.048
 them-0.046
 their-0.044
 her-0.043
Token.
Token ID13
Feature activation+2.58e+00

Pos logit contributions
anda+0.095
ouse+0.093
umpy+0.091
oc+0.087
uer+0.085
Neg logit contributions
 the-0.053
 it-0.049
 them-0.045
 her-0.044
 their-0.043
Token "\n"
Token ID198
Feature activation+1.76e+00

Pos logit contributions
anda+0.075
ouse+0.075
umpy+0.069
uer+0.066
uddle+0.065
Neg logit contributions
 the-0.065
 it-0.061
 them-0.057
 their-0.054
 her-0.053
Token "\n"
Token ID198
Feature activation+1.51e+00

Pos logit contributions
anda+0.053
ouse+0.052
umpy+0.049
uer+0.047
oc+0.046
Neg logit contributions
 the-0.041
 it-0.039
 them-0.037
 their-0.034
 her-0.034
TokenThen
Token ID6423
Feature activation-2.16e+00

Pos logit contributions
anda+0.047
ouse+0.047
umpy+0.044
oc+0.042
uer+0.042
Neg logit contributions
 the-0.032
 it-0.030
 them-0.029
 their-0.027
 her-0.027
Token the
Token ID262
Feature activation+2.96e+00

Pos logit contributions
 the+0.035
 them+0.034
 it+0.033
 their+0.031
 her+0.029
Neg logit contributions
anda-0.069
ouse-0.067
umpy-0.063
oc-0.063
int-0.060
Token gir
Token ID37370
Feature activation+9.45e+00

Pos logit contributions
ouse+0.084
umpy+0.075
anda+0.074
ables+0.073
uer+0.070
Neg logit contributions
 the-0.085
 a-0.073
 help-0.073
 it-0.073
 her-0.068
Tokenaffe
Token ID21223
Feature activation+5.86e+00

Pos logit contributions
ouse+0.186
aj+0.141
uddle+0.139
ouses+0.134
ables+0.131
Neg logit contributions
 the-0.321
 it-0.288
 them-0.273
 this-0.271
 her-0.268
Token ripped
Token ID19551
Feature activation-2.74e+00

Pos logit contributions
ouse+0.196
umpy+0.192
anda+0.189
uer+0.177
uddle+0.174
Neg logit contributions
 the-0.128
 it-0.114
 her-0.099
 them-0.099
 a-0.096
Token out
Token ID503
Feature activation-6.04e-01

Pos logit contributions
 the+0.041
 it+0.038
 them+0.036
 their+0.034
 her+0.032
Neg logit contributions
anda-0.099
ouse-0.097
umpy-0.094
oc-0.091
ney-0.088
Token the
Token ID262
Feature activation+3.52e+00

Pos logit contributions
 the+0.015
 it+0.014
 them+0.013
 their+0.012
 her+0.012
Neg logit contributions
anda-0.018
ouse-0.018
umpy-0.017
oc-0.016
uer-0.016
Token monkey
Token ID21657
Feature activation+5.81e+00

Pos logit contributions
ouse+0.100
anda+0.097
umpy+0.096
int+0.090
uer+0.089
Neg logit contributions
 the-0.087
 it-0.078
 them-0.073
 her-0.072
 their-0.071
Token.
Token ID13
Feature activation+2.37e+00

Pos logit contributions
anda+0.099
ouse+0.097
umpy+0.095
oc+0.090
uer+0.088
Neg logit contributions
 the-0.060
 it-0.056
 them-0.053
 their-0.050
 her-0.049
Token " "
Token ID220
Feature activation+3.54e+00

Pos logit contributions
anda+0.074
ouse+0.072
umpy+0.068
uddle+0.066
uer+0.066
Neg logit contributions
 the-0.057
 it-0.054
 them-0.049
 her-0.048
 their-0.047
Token "\n"
Token ID198
Feature activation+2.69e+00

Pos logit contributions
anda+0.116
ouse+0.114
umpy+0.110
uer+0.109
oc+0.109
Neg logit contributions
 the-0.074
 it-0.070
 them-0.068
 her-0.061
 their-0.061
Token "\n"
Token ID198
Feature activation+2.28e+00

Pos logit contributions
anda+0.087
ouse+0.087
umpy+0.082
uer+0.080
uddle+0.078
Neg logit contributions
 the-0.058
 it-0.054
 them-0.052
 her-0.048
 their-0.048
TokenThe
Token ID464
Feature activation+3.42e+00

Pos logit contributions
anda+0.074
ouse+0.072
umpy+0.068
uddle+0.065
int+0.065
Neg logit contributions
 the-0.047
 it-0.044
 them-0.042
 her-0.040
 their-0.040
Token par
Token ID1582
Feature activation+9.45e+00

Pos logit contributions
ouse+0.096
anda+0.096
umpy+0.093
oc+0.087
ables+0.086
Neg logit contributions
 the-0.084
 it-0.078
 them-0.073
 their-0.070
 her-0.070
Tokenrot
Token ID10599
Feature activation+6.89e+00

Pos logit contributions
umpy+0.163
ouse+0.146
anda+0.142
uddle+0.139
oodle+0.134
Neg logit contributions
 it-0.297
 the-0.296
 them-0.284
 this-0.266
 such-0.259
Token was
Token ID373
Feature activation-1.34e+00

Pos logit contributions
umpy+0.226
ouse+0.221
anda+0.218
uddle+0.202
ables+0.201
Neg logit contributions
 the-0.157
 it-0.147
 her-0.125
 them-0.124
 a-0.118
Token soon
Token ID2582
Feature activation-1.02e+00

Pos logit contributions
 the+0.028
 it+0.026
 them+0.024
 their+0.023
 her+0.023
Neg logit contributions
anda-0.044
ouse-0.043
umpy-0.041
oc-0.040
uddle-0.039
Token able
Token ID1498
Feature activation-2.39e+00

Pos logit contributions
 the+0.021
 it+0.020
 them+0.019
 their+0.019
 her+0.018
Neg logit contributions
anda-0.031
ouse-0.031
umpy-0.030
oc-0.028
uer-0.028
Token to
Token ID284
Feature activation-2.82e+00

Pos logit contributions
 the+0.036
 them+0.034
 it+0.033
wered+0.032
 their+0.031
Neg logit contributions
anda-0.078
ouse-0.076
umpy-0.075
ables-0.070
uer-0.070
Token you
Token ID345
Feature activation+2.17e+00

Pos logit contributions
wered+0.058
 the+0.050
 anything+0.049
 it+0.048
 such+0.046
Neg logit contributions
anda-0.127
ouse-0.114
oc-0.112
umpy-0.110
s-0.106
Token and
Token ID290
Feature activation-2.61e+00

Pos logit contributions
anda+0.068
ouse+0.066
umpy+0.064
oc+0.061
uer+0.059
Neg logit contributions
 the-0.047
 it-0.044
 them-0.042
 their-0.040
 her-0.039
Token we
Token ID356
Feature activation+1.78e+00

Pos logit contributions
 the+0.042
 them+0.042
 it+0.040
wered+0.037
 anything+0.036
Neg logit contributions
anda-0.086
ouse-0.085
umpy-0.080
oc-0.077
ables-0.076
Token love
Token ID1842
Feature activation-3.72e+00

Pos logit contributions
anda+0.064
ouse+0.063
umpy+0.062
oc+0.059
uer+0.057
Neg logit contributions
 the-0.029
 it-0.027
 them-0.025
 their-0.024
 her-0.023
Token the
Token ID262
Feature activation+2.33e+00

Pos logit contributions
wered+0.045
 the+0.039
 it+0.034
 such+0.033
 anything+0.033
Neg logit contributions
anda-0.127
ouse-0.119
oc-0.119
umpy-0.115
ables-0.111
Token oct
Token ID19318
Feature activation+9.43e+00

Pos logit contributions
anda+0.065
ouse+0.065
umpy+0.062
oc+0.058
uer+0.058
Neg logit contributions
 the-0.057
 it-0.054
 them-0.052
 their-0.050
 her-0.048
Tokenopus
Token ID25790
Feature activation+6.60e+00

Pos logit contributions
anda+0.133
int+0.123
ouse+0.120
ney+0.118
umpy+0.113
Neg logit contributions
 it-0.295
 them-0.287
 the-0.286
 him-0.275
 this-0.264
Token."
Token ID526
Feature activation+1.78e+00

Pos logit contributions
anda+0.234
ouse+0.232
int+0.223
umpy+0.222
uer+0.215
Neg logit contributions
 the-0.124
 it-0.113
 their-0.100
 them-0.100
 her-0.094
Token "\n"
Token ID198
Feature activation+7.83e-01

Pos logit contributions
anda+0.062
ouse+0.060
umpy+0.060
uer+0.057
int+0.056
Neg logit contributions
 the-0.038
 it-0.035
 their-0.034
 them-0.033
 her-0.031
Token "\n"
Token ID198
Feature activation+7.24e-01

Pos logit contributions
anda+0.023
ouse+0.022
umpy+0.022
oc+0.020
uer+0.020
Neg logit contributions
 the-0.019
 it-0.018
 them-0.017
 their-0.016
 her-0.016
TokenMom
Token ID29252
Feature activation+3.43e+00

Pos logit contributions
anda+0.022
ouse+0.022
umpy+0.021
oc+0.020
uer+0.019
Neg logit contributions
 the-0.016
 it-0.015
 them-0.014
 their-0.014
 her-0.013
Token said
Token ID531
Feature activation+1.67e-01

Pos logit contributions
anda+0.197
ouse+0.196
umpy+0.192
uer+0.191
ini+0.187
Neg logit contributions
 the-0.126
 it-0.109
 her-0.105
 a-0.097
 their-0.088
Token.
Token ID13
Feature activation+2.09e+00

Pos logit contributions
anda+0.005
ouse+0.005
umpy+0.005
oc+0.005
ney+0.005
Neg logit contributions
 the-0.003
 it-0.003
 them-0.003
 their-0.003
 her-0.003
Token "\n"
Token ID198
Feature activation+1.46e+00

Pos logit contributions
anda+0.062
ouse+0.060
umpy+0.057
uer+0.054
int+0.054
Neg logit contributions
 the-0.055
 it-0.052
 them-0.047
 their-0.047
 her-0.045
Token "\n"
Token ID198
Feature activation+1.26e+00

Pos logit contributions
anda+0.043
ouse+0.042
umpy+0.040
oc+0.038
uer+0.037
Neg logit contributions
 the-0.035
 it-0.033
 them-0.032
 their-0.030
 her-0.030
TokenThe
Token ID464
Feature activation+2.71e+00

Pos logit contributions
anda+0.036
ouse+0.036
umpy+0.034
oc+0.032
int+0.031
Neg logit contributions
 the-0.031
 it-0.029
 them-0.027
 their-0.027
 her-0.026
Token par
Token ID1582
Feature activation+9.43e+00

Pos logit contributions
anda+0.073
ouse+0.072
umpy+0.071
int+0.069
ables+0.069
Neg logit contributions
 the-0.073
 it-0.068
 them-0.062
 her-0.061
 their-0.060
Tokenrot
Token ID10599
Feature activation+6.39e+00

Pos logit contributions
umpy+0.164
int+0.147
anda+0.144
jo+0.142
ouse+0.142
Neg logit contributions
 it-0.293
 the-0.290
 them-0.272
 this-0.259
 her-0.252
Token looked
Token ID3114
Feature activation-2.22e+00

Pos logit contributions
umpy+0.234
ouse+0.215
anda+0.215
ables+0.215
ini+0.207
Neg logit contributions
 the-0.138
 it-0.131
 her-0.107
 a-0.105
 them-0.101
Token at
Token ID379
Feature activation-4.52e+00

Pos logit contributions
 the+0.053
 it+0.050
 them+0.046
 their+0.045
 her+0.045
Neg logit contributions
anda-0.068
ouse-0.068
umpy-0.065
ney-0.063
oc-0.062
Token Tim
Token ID5045
Feature activation+4.50e+00

Pos logit contributions
 permission+0.046
 anything+0.044
 help+0.043
 respect+0.043
 such+0.043
Neg logit contributions
anda-0.148
oc-0.139
umpy-0.136
an-0.135
affe-0.134
Token.
Token ID13
Feature activation+6.75e-01

Pos logit contributions
anda+0.129
ouse+0.127
umpy+0.124
oc+0.114
uer+0.114
Neg logit contributions
 the-0.111
 it-0.105
 them-0.098
 her-0.095
 their-0.095
Tokenksh
Token ID50133
Feature activation+9.18e+00

Pos logit contributions
ouse+0.122
umpy+0.119
anda+0.116
ney+0.104
oc+0.101
Neg logit contributions
 them-0.159
 the-0.156
 their-0.147
 it-0.147
 such-0.138
Tokenakes
Token ID1124
Feature activation+4.05e+00

Pos logit contributions
anda+0.179
ouse+0.164
umpy+0.163
uer+0.149
ari+0.143
Neg logit contributions
 the-0.262
 them-0.262
 their-0.257
 it-0.251
 him-0.250
Token.
Token ID13
Feature activation+1.69e+00

Pos logit contributions
anda+0.130
ouse+0.127
umpy+0.124
oc+0.117
uer+0.114
Neg logit contributions
 the-0.081
 it-0.076
 them-0.072
 their-0.069
 her-0.067
Token The
Token ID383
Feature activation+2.73e+00

Pos logit contributions
anda+0.054
ouse+0.053
umpy+0.051
uer+0.048
int+0.048
Neg logit contributions
 the-0.037
 it-0.035
 them-0.033
 their-0.032
 her-0.031
Token mil
Token ID1465
Feature activation+5.85e+00

Pos logit contributions
ouse+0.080
anda+0.079
umpy+0.077
oc+0.072
int+0.071
Neg logit contributions
 the-0.065
 it-0.060
 them-0.058
 their-0.056
 her-0.054
Tokenksh
Token ID50133
Feature activation+9.43e+00

Pos logit contributions
ouse+0.121
umpy+0.115
anda+0.114
ari+0.106
uer+0.099
Neg logit contributions
 them-0.173
 the-0.167
 it-0.159
 their-0.158
 him-0.150
Tokenakes
Token ID1124
Feature activation+2.74e+00

Pos logit contributions
anda+0.186
ouse+0.173
umpy+0.169
ari+0.155
uer+0.154
Neg logit contributions
 the-0.278
 them-0.274
 it-0.270
 their-0.268
 him-0.256
Token were
Token ID547
Feature activation-2.21e+00

Pos logit contributions
anda+0.085
ouse+0.083
umpy+0.081
oc+0.077
uer+0.074
Neg logit contributions
 the-0.056
 it-0.053
 them-0.050
 their-0.048
 her-0.047
Token blue
Token ID4171
Feature activation+4.16e+00

Pos logit contributions
 the+0.044
 it+0.042
 them+0.040
 their+0.038
 her+0.037
Neg logit contributions
anda-0.068
ouse-0.065
umpy-0.064
oc-0.059
ney-0.058
Token and
Token ID290
Feature activation-2.49e+00

Pos logit contributions
anda+0.135
ouse+0.134
umpy+0.128
uer+0.124
oc+0.121
Neg logit contributions
 the-0.089
 it-0.081
 them-0.079
 their-0.075
 her-0.074
Token pink
Token ID11398
Feature activation+3.24e+00

Pos logit contributions
 the+0.043
 it+0.040
 them+0.039
 their+0.036
 her+0.036
Neg logit contributions
anda-0.081
ouse-0.078
umpy-0.076
oc-0.073
ney-0.071
Token and
Token ID290
Feature activation-2.66e+00

Pos logit contributions
 the+0.038
 it+0.036
 them+0.035
 anything+0.034
wered+0.033
Neg logit contributions
anda-0.067
ouse-0.066
umpy-0.064
oc-0.061
ables-0.059
Token asked
Token ID1965
Feature activation-4.01e+00

Pos logit contributions
 it+0.063
 the+0.063
 them+0.062
 anything+0.060
wered+0.057
Neg logit contributions
anda-0.063
ouse-0.060
umpy-0.056
oc-0.056
int-0.054
Token if
Token ID611
Feature activation-5.21e+00

Pos logit contributions
 the+0.043
wered+0.042
 it+0.042
 such+0.042
 anything+0.041
Neg logit contributions
anda-0.140
oc-0.126
ouse-0.125
umpy-0.123
jo-0.123
Token he
Token ID339
Feature activation-1.26e-01

Pos logit contributions
wered+0.088
 them+0.083
 the+0.082
 such+0.081
 respect+0.080
Neg logit contributions
anda-0.142
ouse-0.126
ari-0.119
oc-0.118
uer-0.116
Token had
Token ID550
Feature activation-5.43e+00

Pos logit contributions
 the+0.003
 it+0.003
 them+0.003
 her+0.003
 their+0.003
Neg logit contributions
anda-0.004
ouse-0.004
umpy-0.004
oc-0.003
ables-0.003
Token seen
Token ID1775
Feature activation-8.94e+00

Pos logit contributions
 them+0.081
wered+0.080
 it+0.078
 the+0.078
 their+0.069
Neg logit contributions
ouse-0.169
anda-0.158
umpy-0.155
int-0.149
ables-0.146
Token their
Token ID511
Feature activation+8.83e-01

Pos logit contributions
wered+0.050
 the+0.034
 respect+0.034
 it+0.027
 achieving+0.026
Neg logit contributions
anda-0.331
ouse-0.300
oc-0.300
affe-0.296
jo-0.295
Token orange
Token ID10912
Feature activation+3.92e+00

Pos logit contributions
anda+0.025
ouse+0.024
umpy+0.023
oc+0.022
uer+0.021
Neg logit contributions
 the-0.021
 it-0.020
 them-0.020
 their-0.019
 her-0.018
Token ball
Token ID2613
Feature activation+3.04e+00

Pos logit contributions
anda+0.115
umpy+0.110
ouse+0.109
oc+0.104
int+0.101
Neg logit contributions
 the-0.096
 it-0.090
 them-0.083
 their-0.083
 her-0.082
Token,
Token ID11
Feature activation-2.05e+00

Pos logit contributions
anda+0.099
ouse+0.097
umpy+0.095
oc+0.090
uer+0.087
Neg logit contributions
 the-0.060
 it-0.056
 them-0.053
 their-0.051
 her-0.050
Token and
Token ID290
Feature activation-3.08e+00

Pos logit contributions
 the+0.043
 them+0.041
 it+0.041
 their+0.038
 her+0.038
Neg logit contributions
anda-0.056
ouse-0.055
umpy-0.053
oc-0.050
ney-0.049
Token and
Token ID290
Feature activation-1.80e+00

Pos logit contributions
ouse+0.043
anda+0.043
umpy+0.042
int+0.040
ney+0.039
Neg logit contributions
 the-0.025
 it-0.022
 them-0.021
 their-0.021
 her-0.020
Token asked
Token ID1965
Feature activation-4.08e+00

Pos logit contributions
 anything+0.040
 them+0.039
 the+0.039
 it+0.039
wered+0.038
Neg logit contributions
ouse-0.044
anda-0.043
oc-0.041
uer-0.038
ari-0.038
Token if
Token ID611
Feature activation-5.22e+00

Pos logit contributions
wered+0.038
 such+0.034
 it+0.032
 the+0.032
 anything+0.029
Neg logit contributions
anda-0.142
oc-0.135
umpy-0.129
ouse-0.129
ari-0.126
Token she
Token ID673
Feature activation-7.00e-01

Pos logit contributions
wered+0.098
 such+0.095
 respect+0.092
 the+0.090
 them+0.086
Neg logit contributions
anda-0.127
ouse-0.117
ari-0.111
uer-0.107
umpy-0.107
Token had
Token ID550
Feature activation-5.56e+00

Pos logit contributions
 the+0.016
 it+0.016
 them+0.014
 her+0.014
 their+0.013
Neg logit contributions
anda-0.021
ouse-0.021
umpy-0.021
ables-0.019
uddle-0.019
Token seen
Token ID1775
Feature activation-8.88e+00

Pos logit contributions
wered+0.089
 the+0.087
 it+0.086
 them+0.085
 their+0.078
Neg logit contributions
ouse-0.162
anda-0.157
int-0.144
umpy-0.143
ables-0.138
Token a
Token ID257
Feature activation+8.32e-01

Pos logit contributions
 the+0.059
wered+0.052
 it+0.052
 respect+0.043
 their+0.040
Neg logit contributions
anda-0.343
ouse-0.312
oc-0.303
affe-0.296
umpy-0.296
Token little
Token ID1310
Feature activation+1.40e+00

Pos logit contributions
anda+0.024
ouse+0.023
umpy+0.023
oc+0.021
int+0.021
Neg logit contributions
 the-0.020
 it-0.019
 them-0.018
 their-0.017
 her-0.017
Token boy
Token ID2933
Feature activation+1.69e+00

Pos logit contributions
anda+0.038
ouse+0.038
umpy+0.036
oc+0.034
int+0.033
Neg logit contributions
 the-0.035
 it-0.034
 them-0.032
 their-0.031
 her-0.030
Token.
Token ID13
Feature activation+2.42e+00

Pos logit contributions
anda+0.056
ouse+0.055
umpy+0.054
oc+0.051
ney+0.050
Neg logit contributions
 the-0.033
 it-0.031
 them-0.029
 their-0.028
 her-0.027
Token Lily
Token ID20037
Feature activation+3.85e+00

Pos logit contributions
anda+0.081
ouse+0.079
umpy+0.078
uer+0.072
int+0.072
Neg logit contributions
 the-0.052
 it-0.047
 them-0.045
 their-0.044
 her-0.043
Token out
Token ID503
Feature activation-3.44e+00

Pos logit contributions
 the+0.037
 it+0.035
wered+0.032
 them+0.031
 their+0.028
Neg logit contributions
anda-0.215
ouse-0.213
umpy-0.205
oc-0.203
int-0.197
Token a
Token ID257
Feature activation+1.25e+00

Pos logit contributions
 it+0.026
 the+0.021
 them+0.021
 anything+0.020
 happen+0.019
Neg logit contributions
anda-0.133
ouse-0.131
umpy-0.126
oc-0.124
ari-0.122
Token needle
Token ID17598
Feature activation+3.80e+00

Pos logit contributions
anda+0.037
ouse+0.036
umpy+0.035
oc+0.033
uer+0.032
Neg logit contributions
 the-0.028
 it-0.027
 them-0.025
 their-0.024
 her-0.024
Token and
Token ID290
Feature activation-2.03e+00

Pos logit contributions
anda+0.102
ouse+0.100
umpy+0.096
oc+0.091
ney+0.088
Neg logit contributions
 the-0.094
 it-0.090
 them-0.087
 their-0.084
 her-0.082
Token carefully
Token ID7773
Feature activation-1.98e+00

Pos logit contributions
 them+0.036
 the+0.035
 it+0.034
 anything+0.032
 their+0.032
Neg logit contributions
ouse-0.062
anda-0.061
oc-0.058
int-0.055
umpy-0.055
Token inserted
Token ID18846
Feature activation-8.85e+00

Pos logit contributions
 the+0.038
 it+0.037
 them+0.036
 anything+0.035
 their+0.034
Neg logit contributions
ouse-0.058
anda-0.058
oc-0.054
int-0.053
umpy-0.053
Token it
Token ID340
Feature activation+1.10e+00

Pos logit contributions
wered+0.041
 listen+0.034
 respect+0.028
 happen+0.024
 ruining+0.020
Neg logit contributions
anda-0.336
oc-0.315
ouse-0.315
ip-0.302
ish-0.300
Token into
Token ID656
Feature activation-7.27e+00

Pos logit contributions
anda+0.037
ouse+0.037
umpy+0.036
oc+0.034
uer+0.033
Neg logit contributions
 the-0.020
 it-0.019
 them-0.018
 their-0.017
 her-0.017
Token the
Token ID262
Feature activation+2.02e+00

Pos logit contributions
 respect+0.041
 listen+0.039
wered+0.037
 happen+0.031
 anything+0.029
Neg logit contributions
oc-0.258
ouse-0.257
anda-0.256
ney-0.235
zag-0.234
Token toy
Token ID13373
Feature activation+2.99e+00

Pos logit contributions
anda+0.049
ouse+0.048
umpy+0.046
oc+0.043
ney+0.042
Neg logit contributions
 the-0.055
 it-0.053
 them-0.052
 their-0.050
 her-0.049
Token car
Token ID1097
Feature activation+3.31e+00

Pos logit contributions
anda+0.070
ouse+0.068
umpy+0.065
oc+0.062
ney+0.059
Neg logit contributions
 the-0.085
 it-0.081
 them-0.079
 their-0.076
 her-0.075
Token just
Token ID655
Feature activation-4.62e+00

Pos logit contributions
 respect+0.085
 it+0.083
 the+0.082
 them+0.081
 permission+0.078
Neg logit contributions
ouse-0.112
umpy-0.107
anda-0.106
zag-0.095
int-0.094
Token right
Token ID826
Feature activation-2.17e+00

Pos logit contributions
 them+0.052
 it+0.052
 respect+0.048
 the+0.047
 help+0.046
Neg logit contributions
anda-0.144
umpy-0.141
ouse-0.141
int-0.135
est-0.128
Token,
Token ID11
Feature activation-2.09e+00

Pos logit contributions
 the+0.045
 it+0.044
wered+0.043
 them+0.042
 help+0.041
Neg logit contributions
anda-0.056
ouse-0.055
umpy-0.052
int-0.049
uddle-0.049
Token then
Token ID788
Feature activation-3.17e+00

Pos logit contributions
 the+0.043
 them+0.041
 their+0.039
 it+0.039
wered+0.037
Neg logit contributions
ouse-0.054
anda-0.054
umpy-0.052
oc-0.049
uddle-0.047
Token carefully
Token ID7773
Feature activation-1.65e+00

Pos logit contributions
 them+0.050
 the+0.049
wered+0.048
 such+0.045
 their+0.045
Neg logit contributions
ouse-0.101
anda-0.097
uddle-0.089
uer-0.087
umpy-0.087
Token inserted
Token ID18846
Feature activation-8.75e+00

Pos logit contributions
 the+0.029
 it+0.027
 them+0.027
 their+0.025
 anything+0.025
Neg logit contributions
ouse-0.052
anda-0.051
umpy-0.048
oc-0.047
int-0.046
Token it
Token ID340
Feature activation+9.07e-01

Pos logit contributions
wered+0.049
 listen+0.039
 help+0.031
 behave+0.031
 save+0.029
Neg logit contributions
anda-0.316
ouse-0.294
s-0.287
umpy-0.284
c-0.284
Token into
Token ID656
Feature activation-7.13e+00

Pos logit contributions
anda+0.031
ouse+0.030
umpy+0.029
oc+0.028
uer+0.027
Neg logit contributions
 the-0.017
 it-0.016
 them-0.015
 their-0.014
 her-0.014
Token the
Token ID262
Feature activation+1.29e+00

Pos logit contributions
 listen+0.054
wered+0.052
 respect+0.049
 risking+0.049
 happen+0.048
Neg logit contributions
ouse-0.217
anda-0.211
oc-0.210
c-0.204
ney-0.204
Token drum
Token ID13026
Feature activation+3.53e+00

Pos logit contributions
anda+0.037
ouse+0.036
umpy+0.035
oc+0.033
ney+0.032
Neg logit contributions
 the-0.030
 it-0.028
 them-0.027
 their-0.026
 her-0.026
Token to
Token ID284
Feature activation-4.74e+00

Pos logit contributions
anda+0.121
ouse+0.116
umpy+0.116
oc+0.109
int+0.106
Neg logit contributions
 the-0.068
 it-0.063
 them-0.057
 their-0.055
 her-0.055
Token a
Token ID257
Feature activation+1.64e+00

Pos logit contributions
 happen+0.041
 respect+0.040
wered+0.039
 yours+0.037
 prevent+0.037
Neg logit contributions
anda-0.177
uddle-0.166
umpy-0.164
ouse-0.163
ables-0.161
Token big
Token ID1263
Feature activation+9.61e-01

Pos logit contributions
anda+0.044
ouse+0.042
umpy+0.041
oc+0.039
uddle+0.037
Neg logit contributions
 the-0.040
 it-0.038
 them-0.038
 their-0.037
 her-0.036
Token bush
Token ID24484
Feature activation+2.36e+00

Pos logit contributions
anda+0.024
ouse+0.022
umpy+0.022
oc+0.021
ney+0.020
Neg logit contributions
 the-0.024
 them-0.023
 it-0.023
 their-0.022
 such-0.022
Token and
Token ID290
Feature activation-1.67e+00

Pos logit contributions
anda+0.062
ouse+0.060
umpy+0.060
oc+0.056
ney+0.055
Neg logit contributions
 the-0.056
 it-0.054
 them-0.054
 their-0.052
 anything-0.050
Token waited
Token ID13488
Feature activation-3.12e+00

Pos logit contributions
 them+0.037
 the+0.037
 it+0.035
 their+0.035
 anything+0.035
Neg logit contributions
anda-0.044
ouse-0.043
umpy-0.039
uer-0.037
oc-0.037
Token for
Token ID329
Feature activation-8.59e+00

Pos logit contributions
 such+0.066
 the+0.066
wered+0.065
 anything+0.065
 it+0.064
Neg logit contributions
anda-0.068
ouse-0.068
umpy-0.063
uddle-0.060
int-0.060
Token Ben
Token ID3932
Feature activation+3.24e+00

Pos logit contributions
 risking+0.118
 preventing+0.112
 happening+0.105
 happen+0.101
 buying+0.101
Neg logit contributions
olly-0.182
anda-0.170
uddle-0.169
ittens-0.166
s-0.164
Token to
Token ID284
Feature activation-4.69e+00

Pos logit contributions
anda+0.121
ouse+0.119
umpy+0.116
oc+0.111
uer+0.108
Neg logit contributions
 the-0.049
 it-0.045
 them-0.041
 their-0.039
 her-0.038
Token find
Token ID1064
Feature activation-6.59e+00

Pos logit contributions
 it+0.086
 them+0.084
 their+0.083
 risking+0.082
 the+0.075
Neg logit contributions
anda-0.097
umpy-0.094
ouse-0.093
uer-0.092
ney-0.090
Token her
Token ID607
Feature activation+2.12e-01

Pos logit contributions
wered+0.064
 the+0.060
 it+0.058
 buying+0.055
 happen+0.055
Neg logit contributions
anda-0.220
oc-0.215
ouse-0.206
int-0.201
affe-0.197
Token.
Token ID13
Feature activation+1.46e+00

Pos logit contributions
anda+0.007
umpy+0.007
ouse+0.007
oc+0.007
int+0.007
Neg logit contributions
 the-0.004
 it-0.004
 her-0.003
 their-0.003
 them-0.003
Token all
Token ID477
Feature activation-4.47e-01

Pos logit contributions
 the+0.002
 it+0.002
 them+0.002
 their+0.002
 her+0.002
Neg logit contributions
anda-0.003
ouse-0.003
umpy-0.003
oc-0.003
uer-0.002
Token lined
Token ID16566
Feature activation-5.97e+00

Pos logit contributions
 the+0.010
 it+0.010
 them+0.009
 their+0.009
 her+0.009
Neg logit contributions
anda-0.013
ouse-0.013
umpy-0.013
oc-0.012
uer-0.012
Token up
Token ID510
Feature activation-3.06e+00

Pos logit contributions
 the+0.122
 it+0.114
 anything+0.113
 such+0.112
 her+0.111
Neg logit contributions
anda-0.161
ouse-0.151
umpy-0.140
oc-0.139
ang-0.137
Token and
Token ID290
Feature activation-2.76e+00

Pos logit contributions
 the+0.058
 it+0.056
 them+0.053
 their+0.049
 her+0.049
Neg logit contributions
anda-0.098
ouse-0.095
umpy-0.093
oc-0.088
ney-0.086
Token waited
Token ID13488
Feature activation-3.24e+00

Pos logit contributions
 the+0.064
 them+0.063
 it+0.062
 anything+0.059
 such+0.059
Neg logit contributions
ouse-0.071
anda-0.071
umpy-0.065
oc-0.062
ables-0.061
Token for
Token ID329
Feature activation-8.51e+00

Pos logit contributions
 the+0.066
 it+0.065
 such+0.063
 them+0.063
wered+0.062
Neg logit contributions
ouse-0.078
anda-0.077
umpy-0.073
int-0.069
uddle-0.069
Token someone
Token ID2130
Feature activation+1.09e+00

Pos logit contributions
 risking+0.070
 wasting+0.065
 yourselves+0.061
 happen+0.061
 grant+0.060
Neg logit contributions
anda-0.242
oc-0.236
uddle-0.218
affe-0.216
ari-0.215
Token to
Token ID284
Feature activation-4.63e+00

Pos logit contributions
anda+0.043
ouse+0.042
umpy+0.042
oc+0.040
uer+0.039
Neg logit contributions
 the-0.014
 it-0.013
 them-0.012
 their-0.011
 her-0.011
Token say
Token ID910
Feature activation-3.31e+00

Pos logit contributions
 it+0.077
 anything+0.066
 them+0.065
 the+0.064
 her+0.064
Neg logit contributions
umpy-0.120
ari-0.113
anda-0.110
ouse-0.109
uer-0.109
Token "
Token ID366
Feature activation-8.06e-02

Pos logit contributions
 the+0.065
 it+0.061
 them+0.058
 their+0.056
 her+0.055
Neg logit contributions
anda-0.104
ouse-0.102
umpy-0.099
oc-0.093
uer-0.091
Tokengo
Token ID2188
Feature activation+8.83e-01

Pos logit contributions
 the+0.002
 it+0.002
 them+0.002
 her+0.002
 him+0.002
Neg logit contributions
anda-0.002
ouse-0.002
umpy-0.002
uddle-0.002
ables-0.002
Token asked
Token ID1965
Feature activation-4.17e+00

Pos logit contributions
umpy+0.101
anda+0.098
ouse+0.098
uddle+0.091
uer+0.090
Neg logit contributions
 the-0.074
 it-0.069
 them-0.062
 her-0.061
 their-0.058
Token her
Token ID607
Feature activation-6.09e-01

Pos logit contributions
 the+0.043
 it+0.039
 their+0.033
 them+0.033
 such+0.031
Neg logit contributions
anda-0.162
ouse-0.154
umpy-0.151
oc-0.147
int-0.143
Token if
Token ID611
Feature activation-5.27e+00

Pos logit contributions
 the+0.007
 it+0.006
 them+0.006
 their+0.005
 her+0.005
Neg logit contributions
anda-0.025
ouse-0.024
umpy-0.023
oc-0.023
uer-0.022
Token she
Token ID673
Feature activation-4.50e-01

Pos logit contributions
 the+0.097
 them+0.093
wered+0.089
 their+0.089
 it+0.088
Neg logit contributions
anda-0.148
ouse-0.136
umpy-0.127
ari-0.122
int-0.119
Token had
Token ID550
Feature activation-5.63e+00

Pos logit contributions
 the+0.011
 it+0.010
 her+0.009
 them+0.009
 a+0.009
Neg logit contributions
anda-0.014
ouse-0.013
umpy-0.013
uddle-0.013
ables-0.012
Token seen
Token ID1775
Feature activation-8.51e+00

Pos logit contributions
 the+0.078
 them+0.077
 it+0.074
wered+0.072
 their+0.070
Neg logit contributions
ouse-0.187
anda-0.181
umpy-0.173
int-0.167
ables-0.166
Token the
Token ID262
Feature activation+8.97e-01

Pos logit contributions
 the+0.038
wered+0.034
 it+0.028
 them+0.028
 their+0.027
Neg logit contributions
anda-0.340
ouse-0.322
oc-0.311
jo-0.308
umpy-0.303
Token wall
Token ID3355
Feature activation+2.32e+00

Pos logit contributions
anda+0.023
ouse+0.022
umpy+0.021
oc+0.020
uer+0.019
Neg logit contributions
 the-0.025
 it-0.024
 them-0.022
 her-0.022
 their-0.022
Token.
Token ID13
Feature activation+1.36e+00

Pos logit contributions
anda+0.078
ouse+0.076
umpy+0.075
oc+0.072
int+0.070
Neg logit contributions
 the-0.046
 it-0.042
 them-0.039
 their-0.038
 her-0.038
Token She
Token ID1375
Feature activation+9.09e-01

Pos logit contributions
anda+0.052
ouse+0.052
umpy+0.050
uer+0.049
oc+0.048
Neg logit contributions
 the-0.023
 it-0.021
 them-0.018
 her-0.017
 their-0.017
Token hadn
Token ID8020
Feature activation-2.23e+00

Pos logit contributions
umpy+0.027
anda+0.026
ouse+0.026
oc+0.024
ables+0.024
Neg logit contributions
 the-0.022
 it-0.021
 them-0.019
 her-0.019
 their-0.019
Token a
Token ID257
Feature activation+1.42e+00

Pos logit contributions
 respect+0.029
wered+0.027
 happen+0.027
 it+0.026
 prevent+0.022
Neg logit contributions
anda-0.208
umpy-0.191
oc-0.190
ouse-0.189
uddle-0.186
Token big
Token ID1263
Feature activation+9.12e-01

Pos logit contributions
anda+0.039
ouse+0.038
umpy+0.037
oc+0.034
ney+0.033
Neg logit contributions
 the-0.035
 it-0.033
 them-0.032
 their-0.031
 her-0.030
Token tree
Token ID5509
Feature activation+2.26e+00

Pos logit contributions
anda+0.023
ouse+0.021
umpy+0.021
oc+0.020
ney+0.019
Neg logit contributions
 the-0.023
 them-0.022
 it-0.022
 their-0.021
 such-0.021
Token and
Token ID290
Feature activation-1.80e+00

Pos logit contributions
anda+0.067
ouse+0.066
umpy+0.064
oc+0.060
uer+0.058
Neg logit contributions
 the-0.052
 it-0.049
 them-0.046
 their-0.045
 her-0.044
Token waits
Token ID28364
Feature activation-2.96e+00

Pos logit contributions
 the+0.043
 it+0.041
 them+0.040
 their+0.038
 her+0.037
Neg logit contributions
anda-0.050
ouse-0.048
umpy-0.046
oc-0.043
ables-0.042
Token for
Token ID329
Feature activation-8.50e+00

Pos logit contributions
 them+0.062
wered+0.061
 the+0.061
 it+0.061
 their+0.060
Neg logit contributions
anda-0.063
umpy-0.062
ouse-0.060
int-0.055
uddle-0.054
Token Sara
Token ID24799
Feature activation+3.11e+00

Pos logit contributions
 preventing+0.116
 risking+0.111
 happen+0.107
 happening+0.104
 buying+0.104
Neg logit contributions
anda-0.199
oodle-0.179
olly-0.175
uddle-0.170
s-0.167
Token to
Token ID284
Feature activation-4.50e+00

Pos logit contributions
anda+0.120
ouse+0.119
umpy+0.117
oc+0.112
uer+0.110
Neg logit contributions
 the-0.044
 it-0.040
 them-0.036
 their-0.034
 her-0.034
Token find
Token ID1064
Feature activation-6.73e+00

Pos logit contributions
 their+0.079
 it+0.079
 them+0.078
 the+0.074
 such+0.067
Neg logit contributions
anda-0.114
ouse-0.111
umpy-0.110
uer-0.107
oc-0.105
Token him
Token ID683
Feature activation+8.90e-01

Pos logit contributions
wered+0.059
 the+0.054
 it+0.051
 them+0.050
 buying+0.049
Neg logit contributions
anda-0.239
oc-0.227
ouse-0.218
int-0.210
uddle-0.208
Token.
Token ID13
Feature activation+1.45e+00

Pos logit contributions
anda+0.029
ouse+0.029
umpy+0.028
oc+0.027
uer+0.026
Neg logit contributions
 the-0.018
 it-0.016
 them-0.015
 their-0.015
 her-0.015
Token a
Token ID257
Feature activation+1.59e+00

Pos logit contributions
wered+0.028
 respect+0.027
 happen+0.026
 it+0.026
 yours+0.022
Neg logit contributions
anda-0.199
umpy-0.188
oc-0.185
ouse-0.184
ables-0.184
Token big
Token ID1263
Feature activation+8.73e-01

Pos logit contributions
anda+0.043
ouse+0.042
umpy+0.041
oc+0.038
ney+0.037
Neg logit contributions
 the-0.039
 it-0.037
 them-0.036
 their-0.035
 her-0.034
Token tree
Token ID5509
Feature activation+2.28e+00

Pos logit contributions
anda+0.022
ouse+0.020
umpy+0.020
oc+0.019
ney+0.018
Neg logit contributions
 the-0.022
 them-0.021
 it-0.021
 their-0.020
 such-0.020
Token and
Token ID290
Feature activation-1.68e+00

Pos logit contributions
anda+0.067
ouse+0.066
umpy+0.063
oc+0.060
uer+0.058
Neg logit contributions
 the-0.053
 it-0.050
 them-0.047
 their-0.046
 her-0.045
Token waited
Token ID13488
Feature activation-3.12e+00

Pos logit contributions
 the+0.037
 them+0.035
 it+0.035
 their+0.033
 her+0.031
Neg logit contributions
anda-0.048
ouse-0.047
umpy-0.044
oc-0.042
ney-0.041
Token for
Token ID329
Feature activation-8.43e+00

Pos logit contributions
 the+0.065
 such+0.064
wered+0.063
 it+0.063
 anything+0.062
Neg logit contributions
anda-0.071
ouse-0.070
umpy-0.066
uddle-0.062
int-0.062
Token Anna
Token ID11735
Feature activation+2.68e+00

Pos logit contributions
 preventing+0.121
 risking+0.116
 happening+0.113
 buying+0.113
 happen+0.112
Neg logit contributions
olly-0.175
anda-0.173
s-0.161
oodle-0.159
uddle-0.159
Token to
Token ID284
Feature activation-4.68e+00

Pos logit contributions
anda+0.104
ouse+0.103
umpy+0.101
oc+0.096
uer+0.095
Neg logit contributions
 the-0.037
 it-0.034
 them-0.031
 their-0.029
 her-0.028
Token find
Token ID1064
Feature activation-6.59e+00

Pos logit contributions
 it+0.082
 their+0.082
 them+0.082
 risking+0.075
 the+0.073
Neg logit contributions
anda-0.105
ouse-0.103
umpy-0.101
uer-0.101
oc-0.099
Token him
Token ID683
Feature activation+7.52e-01

Pos logit contributions
wered+0.055
 the+0.049
 it+0.047
 going+0.044
 buying+0.044
Neg logit contributions
anda-0.233
oc-0.223
ouse-0.214
affe-0.207
int-0.206
Token.
Token ID13
Feature activation+1.68e+00

Pos logit contributions
anda+0.025
ouse+0.024
umpy+0.024
oc+0.022
uer+0.022
Neg logit contributions
 the-0.015
 it-0.014
 them-0.013
 their-0.013
 her-0.012
Tokeneddy
Token ID21874
Feature activation+4.20e+00

Pos logit contributions
umpy+0.093
anda+0.087
uer+0.086
oke+0.086
ouse+0.084
Neg logit contributions
 it-0.139
 the-0.138
 them-0.135
 her-0.132
 him-0.131
Token bear
Token ID6842
Feature activation+3.88e+00

Pos logit contributions
anda+0.093
ouse+0.091
umpy+0.091
uer+0.080
ari+0.078
Neg logit contributions
 the-0.134
 it-0.126
 them-0.120
 her-0.117
 their-0.116
Token when
Token ID618
Feature activation-2.71e+00

Pos logit contributions
anda+0.140
ouse+0.137
umpy+0.134
oc+0.125
uer+0.125
Neg logit contributions
 the-0.068
 it-0.062
 them-0.056
 her-0.054
 their-0.054
Token she
Token ID673
Feature activation+1.11e+00

Pos logit contributions
 respect+0.054
 permission+0.053
 them+0.051
 such+0.050
wered+0.049
Neg logit contributions
ouse-0.063
anda-0.062
uddle-0.056
uffy-0.054
s-0.054
Token accidentally
Token ID14716
Feature activation-4.83e+00

Pos logit contributions
anda+0.036
umpy+0.035
ouse+0.035
oc+0.033
uer+0.033
Neg logit contributions
 the-0.024
 it-0.022
 them-0.020
 her-0.019
 their-0.019
Token grabbed
Token ID13176
Feature activation-8.41e+00

Pos logit contributions
 them+0.111
 the+0.111
wered+0.107
 their+0.105
 such+0.102
Neg logit contributions
ouse-0.122
anda-0.115
oc-0.099
zag-0.097
int-0.096
Token the
Token ID262
Feature activation+1.22e+00

Pos logit contributions
wered+0.042
 the+0.034
 respect+0.033
 them+0.029
 letting+0.028
Neg logit contributions
anda-0.323
ouse-0.308
oc-0.305
affe-0.286
lers-0.286
Token cord
Token ID15050
Feature activation+3.33e+00

Pos logit contributions
anda+0.039
ouse+0.039
umpy+0.038
oc+0.036
ables+0.036
Neg logit contributions
 the-0.025
 it-0.023
 them-0.021
 her-0.021
 their-0.020
Token of
Token ID286
Feature activation-5.72e+00

Pos logit contributions
anda+0.110
ouse+0.107
umpy+0.105
oc+0.099
uer+0.097
Neg logit contributions
 the-0.067
 it-0.062
 them-0.057
 their-0.056
 her-0.055
Token her
Token ID607
Feature activation+2.64e+00

Pos logit contributions
 listen+0.040
 respect+0.039
wered+0.037
 happen+0.035
 anything+0.035
Neg logit contributions
anda-0.182
ouse-0.170
oc-0.170
ang-0.165
affe-0.160
Token lamp
Token ID20450
Feature activation+4.67e+00

Pos logit contributions
umpy+0.079
anda+0.074
ouse+0.073
int+0.071
ari+0.071
Neg logit contributions
 the-0.065
 it-0.058
 a-0.055
 her-0.055
 their-0.054
Token them
Token ID606
Feature activation+2.04e+00

Pos logit contributions
wered+0.039
 them+0.035
 their+0.031
pped+0.026
 the+0.024
Neg logit contributions
anda-0.210
oc-0.202
ouse-0.197
jo-0.191
ney-0.190
Token before
Token ID878
Feature activation-3.06e+00

Pos logit contributions
anda+0.061
ouse+0.060
umpy+0.058
oc+0.055
uer+0.053
Neg logit contributions
 the-0.045
 it-0.042
 them-0.040
 their-0.039
 her-0.038
Token,
Token ID11
Feature activation-1.17e+00

Pos logit contributions
 the+0.052
 them+0.050
 respect+0.050
 it+0.048
 anything+0.048
Neg logit contributions
ouse-0.084
anda-0.084
umpy-0.081
uddle-0.076
jo-0.076
Token but
Token ID475
Feature activation-2.89e+00

Pos logit contributions
 the+0.026
 them+0.025
 it+0.025
 their+0.025
 anything+0.023
Neg logit contributions
anda-0.030
ouse-0.029
umpy-0.028
oc-0.026
ables-0.025
Token never
Token ID1239
Feature activation-2.11e+00

Pos logit contributions
 them+0.036
 respect+0.031
 the+0.029
 help+0.029
wered+0.029
Neg logit contributions
anda-0.090
ouse-0.088
umpy-0.082
ney-0.081
uer-0.078
Token seen
Token ID1775
Feature activation-8.38e+00

Pos logit contributions
 the+0.049
 them+0.049
 it+0.048
 their+0.046
 anything+0.043
Neg logit contributions
anda-0.054
ouse-0.054
umpy-0.051
ari-0.047
oc-0.047
Token one
Token ID530
Feature activation+1.66e+00

Pos logit contributions
 the+0.052
 their+0.040
wered+0.040
 it+0.038
 them+0.035
Neg logit contributions
anda-0.329
ouse-0.301
jo-0.292
affe-0.291
umpy-0.288
Token.
Token ID13
Feature activation+1.39e+00

Pos logit contributions
anda+0.056
ouse+0.055
umpy+0.054
oc+0.051
uer+0.050
Neg logit contributions
 the-0.031
 it-0.029
 them-0.027
 their-0.026
 her-0.025
Token She
Token ID1375
Feature activation+1.71e+00

Pos logit contributions
anda+0.041
ouse+0.041
umpy+0.038
oc+0.037
uer+0.036
Neg logit contributions
 the-0.034
 it-0.032
 them-0.029
 her-0.029
 their-0.029
Token was
Token ID373
Feature activation-2.98e+00

Pos logit contributions
ouse+0.054
anda+0.053
umpy+0.053
ables+0.049
oc+0.048
Neg logit contributions
 the-0.037
 it-0.035
 them-0.032
 her-0.031
 their-0.030
Token so
Token ID523
Feature activation-2.04e+00

Pos logit contributions
 the+0.070
 it+0.069
 them+0.067
 their+0.064
 anything+0.061
Neg logit contributions
anda-0.079
ouse-0.076
umpy-0.076
ney-0.068
uer-0.068
Token started
Token ID2067
Feature activation-3.64e+00

Pos logit contributions
ouse+0.156
umpy+0.146
uer+0.144
anda+0.143
ables+0.140
Neg logit contributions
 the-0.119
 it-0.109
 her-0.097
 a-0.089
 them-0.084
Token to
Token ID284
Feature activation-4.06e+00

Pos logit contributions
 the+0.053
 it+0.052
 them+0.051
 their+0.047
 anything+0.044
Neg logit contributions
anda-0.126
ouse-0.121
umpy-0.121
oc-0.113
ables-0.112
Token cry
Token ID3960
Feature activation-1.15e-01

Pos logit contributions
 them+0.091
 it+0.088
 the+0.086
 their+0.084
 such+0.080
Neg logit contributions
anda-0.106
umpy-0.102
ouse-0.101
oc-0.096
uddle-0.092
Token and
Token ID290
Feature activation-2.09e+00

Pos logit contributions
 the+0.003
 it+0.002
 them+0.002
 their+0.002
 her+0.002
Neg logit contributions
anda-0.004
ouse-0.004
umpy-0.003
oc-0.003
ari-0.003
Token reached
Token ID4251
Feature activation-5.55e+00

Pos logit contributions
 the+0.042
 them+0.042
 it+0.040
 such+0.039
 anything+0.038
Neg logit contributions
anda-0.058
ouse-0.054
oc-0.051
umpy-0.051
uddle-0.049
Token for
Token ID329
Feature activation-8.33e+00

Pos logit contributions
 them+0.105
 it+0.096
 anything+0.092
 respect+0.091
 such+0.089
Neg logit contributions
anda-0.146
ouse-0.135
zag-0.135
ney-0.131
umpy-0.130
Token their
Token ID511
Feature activation+8.13e-01

Pos logit contributions
 yours+0.072
reating+0.069
ework+0.065
 yourselves+0.065
Stop+0.065
Neg logit contributions
oc-0.195
anda-0.187
ittens-0.184
 paws-0.183
 cub-0.182
Token bottle
Token ID9294
Feature activation+2.62e+00

Pos logit contributions
anda+0.023
ouse+0.023
umpy+0.023
oc+0.021
uer+0.021
Neg logit contributions
 the-0.019
 it-0.018
 them-0.017
 their-0.017
 her-0.017
Token to
Token ID284
Feature activation-4.60e+00

Pos logit contributions
anda+0.080
ouse+0.079
umpy+0.077
oc+0.070
uddle+0.070
Neg logit contributions
 the-0.055
 it-0.052
 them-0.050
 their-0.048
 her-0.046
Token feel
Token ID1254
Feature activation-4.74e+00

Pos logit contributions
 them+0.067
 their+0.065
 it+0.065
 such+0.063
 these+0.062
Neg logit contributions
oc-0.125
anda-0.122
ouse-0.121
umpy-0.118
an-0.116
Token safe
Token ID3338
Feature activation+2.95e-01

Pos logit contributions
 them+0.118
 the+0.115
 their+0.111
 it+0.110
wered+0.109
Neg logit contributions
umpy-0.102
anda-0.097
ouse-0.093
oc-0.088
uddle-0.084
Token to
Token ID284
Feature activation-4.99e+00

Pos logit contributions
 the+0.053
 it+0.049
 them+0.047
 their+0.045
 anything+0.044
Neg logit contributions
anda-0.112
ouse-0.110
umpy-0.107
oc-0.100
ables-0.100
Token the
Token ID262
Feature activation+2.10e+00

Pos logit contributions
 anything+0.062
 risking+0.058
 miles+0.049
ts+0.048
pped+0.046
Neg logit contributions
ouse-0.127
jo-0.120
anda-0.119
zag-0.118
oc-0.118
Token window
Token ID4324
Feature activation+1.96e+00

Pos logit contributions
anda+0.057
ouse+0.056
umpy+0.054
oc+0.050
ney+0.050
Neg logit contributions
 the-0.049
 it-0.049
 them-0.047
 their-0.044
 anything-0.043
Token and
Token ID290
Feature activation-1.45e+00

Pos logit contributions
anda+0.063
ouse+0.061
umpy+0.059
oc+0.056
uer+0.055
Neg logit contributions
 the-0.043
 it-0.040
 them-0.037
 her-0.036
 their-0.036
Token waited
Token ID13488
Feature activation-2.86e+00

Pos logit contributions
 them+0.034
 the+0.033
 anything+0.032
 it+0.032
 such+0.031
Neg logit contributions
anda-0.036
ouse-0.036
umpy-0.030
oc-0.030
ney-0.030
Token for
Token ID329
Feature activation-8.28e+00

Pos logit contributions
 the+0.059
 such+0.058
 it+0.057
 them+0.057
 anything+0.057
Neg logit contributions
ouse-0.068
anda-0.065
umpy-0.062
int-0.060
uddle-0.058
Token her
Token ID607
Feature activation+3.08e-01

Pos logit contributions
 risking+0.095
Put+0.083
 respect+0.079
 yourselves+0.078
 wasting+0.077
Neg logit contributions
ittens-0.174
ouse-0.169
ama-0.165
anda-0.164
int-0.160
Token mom
Token ID1995
Feature activation+2.68e+00

Pos logit contributions
anda+0.010
ouse+0.010
umpy+0.010
oc+0.009
uer+0.009
Neg logit contributions
 the-0.006
 it-0.006
 them-0.005
 their-0.005
 her-0.005
Tokenmy
Token ID1820
Feature activation+2.70e+00

Pos logit contributions
anda+0.061
ouse+0.060
umpy+0.059
uer+0.055
oc+0.055
Neg logit contributions
 the-0.082
 it-0.077
 them-0.075
 her-0.074
 their-0.073
Token to
Token ID284
Feature activation-4.58e+00

Pos logit contributions
anda+0.101
ouse+0.099
umpy+0.097
oc+0.092
uer+0.090
Neg logit contributions
 the-0.041
 it-0.037
 them-0.034
 their-0.032
 her-0.031
Token arrive
Token ID9240
Feature activation-1.35e+00

Pos logit contributions
 it+0.076
 them+0.074
 their+0.072
 the+0.068
 these+0.064
Neg logit contributions
ouse-0.122
umpy-0.116
int-0.116
ney-0.115
anda-0.115
Token get
Token ID651
Feature activation-4.36e+00

Pos logit contributions
 them+0.063
 the+0.063
 such+0.060
 it+0.060
 their+0.059
Neg logit contributions
umpy-0.117
anda-0.114
ouse-0.112
zag-0.108
oc-0.106
Token damaged
Token ID9694
Feature activation-2.51e+00

Pos logit contributions
 the+0.080
 it+0.078
 them+0.073
wered+0.072
 their+0.071
Neg logit contributions
anda-0.129
ouse-0.123
umpy-0.121
ney-0.110
oc-0.108
Token,
Token ID11
Feature activation-6.29e-01

Pos logit contributions
 the+0.055
 it+0.053
 them+0.051
 their+0.049
wered+0.048
Neg logit contributions
anda-0.068
ouse-0.066
umpy-0.066
oc-0.061
ney-0.061
Token so
Token ID523
Feature activation-2.81e+00

Pos logit contributions
 the+0.014
 it+0.014
 them+0.013
 their+0.013
 anything+0.012
Neg logit contributions
anda-0.016
umpy-0.016
ouse-0.016
oc-0.015
ney-0.014
Token she
Token ID673
Feature activation+1.41e+00

Pos logit contributions
 them+0.040
 the+0.039
 such+0.037
 anything+0.036
 it+0.036
Neg logit contributions
anda-0.091
ouse-0.088
umpy-0.086
oc-0.082
uddle-0.080
Token grabbed
Token ID13176
Feature activation-8.28e+00

Pos logit contributions
anda+0.043
ouse+0.042
umpy+0.041
oc+0.039
uer+0.038
Neg logit contributions
 the-0.031
 it-0.030
 them-0.028
 their-0.027
 her-0.026
Token the
Token ID262
Feature activation+1.60e+00

Pos logit contributions
 respect+0.030
 happen+0.029
wered+0.029
 such+0.026
 risking+0.023
Neg logit contributions
anda-0.314
oc-0.295
ouse-0.292
int-0.287
umpy-0.283
Token books
Token ID3835
Feature activation+2.16e+00

Pos logit contributions
anda+0.053
ouse+0.052
umpy+0.051
oc+0.048
uer+0.047
Neg logit contributions
 the-0.031
 it-0.029
 them-0.027
 their-0.026
 her-0.026
Token and
Token ID290
Feature activation-2.33e+00

Pos logit contributions
anda+0.054
ouse+0.053
umpy+0.052
oc+0.048
ney+0.047
Neg logit contributions
 the-0.056
 it-0.053
 them-0.052
 their-0.050
 her-0.049
Token held
Token ID2714
Feature activation-6.23e+00

Pos logit contributions
 them+0.052
wered+0.049
 anything+0.048
 it+0.048
 the+0.047
Neg logit contributions
ouse-0.054
anda-0.054
oc-0.053
ney-0.048
ables-0.047
Token them
Token ID606
Feature activation+1.72e-01

Pos logit contributions
 the+0.034
 such+0.027
 it+0.026
 anything+0.024
wered+0.022
Neg logit contributions
anda-0.256
ouse-0.247
oc-0.235
int-0.234
umpy-0.233
Token escape
Token ID6654
Feature activation-1.48e+00

Pos logit contributions
 the+0.059
 her+0.059
 it+0.057
wered+0.057
 their+0.055
Neg logit contributions
anda-0.163
oc-0.159
umpy-0.152
affe-0.149
ables-0.145
Token.
Token ID13
Feature activation+1.40e+00

Pos logit contributions
 the+0.030
 it+0.028
 them+0.027
 their+0.026
 her+0.026
Neg logit contributions
anda-0.045
ouse-0.043
umpy-0.043
oc-0.040
ables-0.039
Token As
Token ID1081
Feature activation-4.67e+00

Pos logit contributions
anda+0.034
ouse+0.034
umpy+0.030
uddle+0.028
uer+0.028
Neg logit contributions
 the-0.043
 it-0.041
 them-0.037
 their-0.037
 an-0.036
Token the
Token ID262
Feature activation+1.31e+00

Pos logit contributions
 them+0.081
 the+0.077
 her+0.076
wered+0.074
 such+0.074
Neg logit contributions
oc-0.132
anda-0.132
umpy-0.127
ouse-0.126
uer-0.118
Token hunter
Token ID19177
Feature activation+4.08e+00

Pos logit contributions
anda+0.034
ouse+0.033
umpy+0.032
ables+0.029
ney+0.029
Neg logit contributions
 the-0.037
 it-0.035
 them-0.033
 their-0.032
 her-0.030
Token grabbed
Token ID13176
Feature activation-8.28e+00

Pos logit contributions
anda+0.131
ouse+0.130
umpy+0.126
ables+0.116
uddle+0.115
Neg logit contributions
 the-0.090
 it-0.083
 them-0.075
 their-0.071
 her-0.069
Token him
Token ID683
Feature activation+1.42e+00

Pos logit contributions
 yourselves+0.049
wered+0.048
 happen+0.048
 it+0.044
 respect+0.044
Neg logit contributions
oc-0.291
anda-0.281
affe-0.267
int-0.261
ang-0.261
Token and
Token ID290
Feature activation-1.58e+00

Pos logit contributions
anda+0.044
ouse+0.044
umpy+0.043
oc+0.040
uer+0.039
Neg logit contributions
 the-0.031
 it-0.029
 them-0.027
 their-0.026
 her-0.026
Token c
Token ID269
Feature activation+2.68e+00

Pos logit contributions
 the+0.035
 them+0.034
 it+0.034
 anything+0.032
 their+0.032
Neg logit contributions
anda-0.043
ouse-0.041
oc-0.040
umpy-0.038
uer-0.036
Tokenaged
Token ID1886
Feature activation-4.75e+00

Pos logit contributions
anda+0.049
umpy+0.048
ouse+0.047
oc+0.046
ney+0.043
Neg logit contributions
 the-0.085
 it-0.084
 them-0.083
 her-0.079
 their-0.078
Token him
Token ID683
Feature activation+8.34e-01

Pos logit contributions
 the+0.039
 it+0.035
wered+0.033
 them+0.032
 such+0.030
Neg logit contributions
anda-0.184
oc-0.178
ouse-0.174
umpy-0.174
ables-0.166
Token to
Token ID284
Feature activation-5.63e+00

Pos logit contributions
 the+0.049
 it+0.046
 them+0.043
 their+0.042
 such+0.038
Neg logit contributions
anda-0.141
ouse-0.134
umpy-0.129
ney-0.124
ables-0.123
Token ten
Token ID3478
Feature activation-5.17e-01

Pos logit contributions
 it+0.073
 them+0.069
 their+0.067
 the+0.065
 risking+0.060
Neg logit contributions
anda-0.163
oc-0.157
umpy-0.156
uer-0.155
ney-0.153
Token and
Token ID290
Feature activation-3.58e+00

Pos logit contributions
 the+0.012
 it+0.011
 them+0.010
 their+0.010
 her+0.010
Neg logit contributions
ouse-0.016
anda-0.016
umpy-0.015
oc-0.014
uer-0.014
Token then
Token ID788
Feature activation-2.54e+00

Pos logit contributions
 the+0.060
 them+0.058
 it+0.057
 their+0.054
wered+0.051
Neg logit contributions
anda-0.114
ouse-0.110
umpy-0.102
oc-0.102
uer-0.101
Token looked
Token ID3114
Feature activation-4.33e+00

Pos logit contributions
 the+0.052
 it+0.049
 them+0.046
 their+0.045
 her+0.043
Neg logit contributions
anda-0.081
ouse-0.079
umpy-0.077
oc-0.073
uer-0.071
Token for
Token ID329
Feature activation-8.26e+00

Pos logit contributions
 the+0.089
 it+0.087
 them+0.084
 their+0.081
 such+0.077
Neg logit contributions
anda-0.128
ouse-0.122
umpy-0.117
oc-0.110
uddle-0.109
Token Max
Token ID5436
Feature activation+2.84e+00

Pos logit contributions
wered+0.097
 happen+0.087
 preventing+0.087
 respect+0.084
Make+0.083
Neg logit contributions
anda-0.215
affe-0.197
uddle-0.192
oodle-0.189
oc-0.186
Token.
Token ID13
Feature activation+1.44e+00

Pos logit contributions
anda+0.095
ouse+0.095
umpy+0.093
oc+0.088
int+0.086
Neg logit contributions
 the-0.055
 it-0.051
 them-0.047
 her-0.046
 their-0.046
Token She
Token ID1375
Feature activation+9.09e-01

Pos logit contributions
anda+0.047
ouse+0.046
umpy+0.045
oc+0.042
int+0.042
Neg logit contributions
 the-0.032
 it-0.030
 them-0.028
 their-0.027
 her-0.026
Token saw
Token ID2497
Feature activation-6.21e+00

Pos logit contributions
umpy+0.030
ouse+0.030
anda+0.028
oc+0.027
ables+0.027
Neg logit contributions
 the-0.020
 it-0.019
 her-0.016
 them-0.016
 a-0.015
Token a
Token ID257
Feature activation+7.50e-01

Pos logit contributions
 the+0.075
 it+0.071
wered+0.069
 them+0.068
 their+0.064
Neg logit contributions
anda-0.204
ouse-0.188
affe-0.186
oc-0.184
umpy-0.184
Token to
Token ID284
Feature activation-5.49e+00

Pos logit contributions
 the+0.050
 it+0.048
 them+0.045
 their+0.044
wered+0.042
Neg logit contributions
anda-0.129
ouse-0.120
umpy-0.115
ari-0.112
ables-0.111
Token ten
Token ID3478
Feature activation-2.50e-01

Pos logit contributions
 it+0.075
 them+0.071
 their+0.069
 the+0.067
 risking+0.063
Neg logit contributions
anda-0.158
ney-0.149
umpy-0.148
ouse-0.147
oc-0.147
Token and
Token ID290
Feature activation-3.31e+00

Pos logit contributions
 the+0.006
 it+0.005
 them+0.005
 her+0.005
 their+0.005
Neg logit contributions
ouse-0.008
anda-0.008
umpy-0.007
uer-0.007
oc-0.007
Token then
Token ID788
Feature activation-2.36e+00

Pos logit contributions
 the+0.048
 it+0.044
 them+0.043
 their+0.041
 her+0.037
Neg logit contributions
anda-0.119
ouse-0.116
umpy-0.110
oc-0.106
int-0.105
Token looks
Token ID3073
Feature activation-4.60e+00

Pos logit contributions
 the+0.052
 it+0.049
 them+0.046
 their+0.044
 her+0.044
Neg logit contributions
anda-0.073
ouse-0.072
umpy-0.070
oc-0.066
ney-0.064
Token for
Token ID329
Feature activation-8.25e+00

Pos logit contributions
 the+0.089
 it+0.086
 them+0.083
 their+0.080
 such+0.076
Neg logit contributions
anda-0.140
ouse-0.134
umpy-0.129
oc-0.122
uddle-0.121
Token Ben
Token ID3932
Feature activation+3.16e+00

Pos logit contributions
wered+0.096
 happen+0.092
 preventing+0.091
 risking+0.091
 respect+0.090
Neg logit contributions
anda-0.224
uddle-0.202
oodle-0.191
s-0.190
ari-0.189
Token.
Token ID13
Feature activation+1.13e+00

Pos logit contributions
anda+0.104
ouse+0.103
umpy+0.101
oc+0.095
int+0.093
Neg logit contributions
 the-0.063
 it-0.059
 them-0.055
 their-0.053
 her-0.053
Token\n
Token ID198
Feature activation+5.53e-01

Pos logit contributions
anda+0.037
ouse+0.036
umpy+0.035
oc+0.034
uer+0.033
Neg logit contributions
 the-0.025
 it-0.024
 them-0.022
 their-0.021
 her-0.021
Token\n
Token ID198
Feature activation+2.42e-01

Pos logit contributions
anda+0.017
ouse+0.017
umpy+0.016
oc+0.016
uer+0.015
Neg logit contributions
 the-0.012
 it-0.012
 them-0.011
 their-0.011
 her-0.010
TokenShe
Token ID3347
Feature activation+1.04e+00

Pos logit contributions
anda+0.007
ouse+0.007
umpy+0.007
oc+0.006
ney+0.006
Neg logit contributions
 the-0.006
 it-0.005
 them-0.005
 their-0.005
 her-0.005
Token water
Token ID1660
Feature activation-3.13e+00

Pos logit contributions
 the+0.070
 them+0.070
 it+0.069
 anything+0.064
 their+0.063
Neg logit contributions
ouse-0.146
anda-0.139
umpy-0.136
oc-0.135
ney-0.130
Token the
Token ID262
Feature activation+9.72e-01

Pos logit contributions
 the+0.032
 it+0.031
 respect+0.029
 anything+0.027
wered+0.027
Neg logit contributions
anda-0.108
ouse-0.106
uddle-0.101
umpy-0.100
oc-0.097
Token garden
Token ID11376
Feature activation+1.48e+00

Pos logit contributions
anda+0.029
ouse+0.028
umpy+0.027
oc+0.026
ney+0.025
Neg logit contributions
 the-0.022
 it-0.021
 them-0.020
 their-0.019
 her-0.019
Token.
Token ID13
Feature activation+1.49e+00

Pos logit contributions
anda+0.044
ouse+0.043
umpy+0.041
oc+0.039
ney+0.038
Neg logit contributions
 the-0.031
 it-0.030
 them-0.029
 their-0.027
 her-0.027
Token Mama
Token ID40084
Feature activation+3.52e+00

Pos logit contributions
anda+0.044
ouse+0.043
umpy+0.042
oc+0.039
int+0.038
Neg logit contributions
 the-0.033
 it-0.031
 them-0.030
 their-0.028
 her-0.028
Token grabbed
Token ID13176
Feature activation-8.25e+00

Pos logit contributions
ouse+0.109
anda+0.107
umpy+0.106
uer+0.104
oc+0.099
Neg logit contributions
 the-0.083
 it-0.076
 them-0.071
 their-0.067
 her-0.067
Token a
Token ID257
Feature activation+6.45e-01

Pos logit contributions
 achieving+0.068
 letting+0.065
 anything+0.063
 happen+0.060
 respect+0.059
Neg logit contributions
anda-0.278
int-0.272
oc-0.260
ouse-0.259
ish-0.252
Token long
Token ID890
Feature activation+1.47e+00

Pos logit contributions
anda+0.019
ouse+0.019
umpy+0.018
oc+0.017
uddle+0.017
Neg logit contributions
 the-0.014
 it-0.013
 them-0.013
 their-0.012
 her-0.012
Token hose
Token ID31489
Feature activation+2.24e+00

Pos logit contributions
ouse+0.041
anda+0.041
umpy+0.038
oc+0.036
uddle+0.036
Neg logit contributions
 the-0.032
 it-0.031
 them-0.031
 such-0.030
 anything-0.030
Token and
Token ID290
Feature activation-2.26e+00

Pos logit contributions
anda+0.058
ouse+0.057
umpy+0.055
oc+0.051
ney+0.051
Neg logit contributions
 the-0.055
 it-0.053
 them-0.052
 their-0.049
 anything-0.049
Token Baby
Token ID14801
Feature activation+3.72e+00

Pos logit contributions
 anything+0.050
 the+0.047
 them+0.047
wered+0.046
 such+0.045
Neg logit contributions
ouse-0.058
anda-0.057
int-0.053
oc-0.052
umpy-0.050
Token everywhere
Token ID8347
Feature activation-6.21e-01

Pos logit contributions
 the+0.048
 it+0.047
 them+0.043
 her+0.042
 their+0.042
Neg logit contributions
anda-0.078
umpy-0.075
ouse-0.073
ables-0.068
ney-0.067
Token.
Token ID13
Feature activation+3.47e-01

Pos logit contributions
 the+0.014
 them+0.013
 it+0.013
 anything+0.012
 such+0.012
Neg logit contributions
anda-0.016
ouse-0.016
umpy-0.016
oc-0.014
ney-0.014
Token\n
Token ID198
Feature activation-3.33e-01

Pos logit contributions
anda+0.010
ouse+0.010
umpy+0.009
oc+0.009
uer+0.009
Neg logit contributions
 the-0.008
 it-0.008
 them-0.008
 their-0.007
 her-0.007
Token\n
Token ID198
Feature activation-5.53e-01

Pos logit contributions
 the+0.008
 it+0.008
 them+0.007
 their+0.007
 her+0.007
Neg logit contributions
anda-0.009
ouse-0.009
umpy-0.009
oc-0.008
uer-0.008
TokenMom
Token ID29252
Feature activation+2.30e+00

Pos logit contributions
 the+0.014
 it+0.013
 them+0.013
 her+0.012
 their+0.012
Neg logit contributions
anda-0.015
ouse-0.014
umpy-0.014
oc-0.013
ables-0.013
Token heard
Token ID2982
Feature activation-8.22e+00

Pos logit contributions
ouse+0.075
anda+0.074
uer+0.072
umpy+0.072
oc+0.069
Neg logit contributions
 the-0.050
 it-0.045
 them-0.043
 her-0.040
 their-0.039
Token the
Token ID262
Feature activation+9.58e-01

Pos logit contributions
 anything+0.037
 yours+0.035
 it+0.034
 achieving+0.033
 such+0.033
Neg logit contributions
anda-0.311
umpy-0.282
oodle-0.278
affe-0.277
oc-0.274
Token noise
Token ID7838
Feature activation+4.84e-01

Pos logit contributions
anda+0.033
ouse+0.033
umpy+0.031
oc+0.031
int+0.030
Neg logit contributions
 the-0.019
 it-0.017
 them-0.016
 her-0.015
 their-0.015
Token and
Token ID290
Feature activation-1.97e+00

Pos logit contributions
anda+0.014
ouse+0.014
umpy+0.013
oc+0.012
uer+0.012
Neg logit contributions
 the-0.012
 it-0.011
 them-0.011
 their-0.010
 her-0.010
Token came
Token ID1625
Feature activation-2.78e+00

Pos logit contributions
 such+0.046
 the+0.045
 anything+0.045
wered+0.044
 it+0.044
Neg logit contributions
anda-0.048
ouse-0.043
umpy-0.041
oc-0.038
int-0.038
Token to
Token ID284
Feature activation-6.05e+00

Pos logit contributions
 the+0.045
 it+0.045
 anything+0.043
wered+0.043
 such+0.040
Neg logit contributions
anda-0.087
umpy-0.083
ouse-0.081
ney-0.078
ables-0.075
Token One
Token ID1881
Feature activation+8.38e-01

Pos logit contributions
anda+0.076
ouse+0.073
umpy+0.071
uer+0.070
oc+0.068
Neg logit contributions
 the-0.043
 it-0.040
 them-0.037
 her-0.036
 their-0.034
Token day
Token ID1110
Feature activation-1.53e+00

Pos logit contributions
anda+0.026
ouse+0.025
uer+0.025
umpy+0.024
ables+0.023
Neg logit contributions
 the-0.020
 it-0.018
 their-0.017
 her-0.017
 a-0.016
Token,
Token ID11
Feature activation-2.68e+00

Pos logit contributions
 the+0.039
 them+0.039
wered+0.038
 it+0.037
 such+0.037
Neg logit contributions
anda-0.034
ouse-0.034
umpy-0.031
oc-0.030
uddle-0.029
Token Lily
Token ID20037
Feature activation+3.28e+00

Pos logit contributions
 permission+0.050
 respect+0.048
wered+0.048
 them+0.046
 such+0.044
Neg logit contributions
ouse-0.070
anda-0.065
uddle-0.062
s-0.061
umpy-0.058
Token accidentally
Token ID14716
Feature activation-4.58e+00

Pos logit contributions
anda+0.094
ouse+0.088
umpy+0.087
uer+0.086
ari+0.083
Neg logit contributions
 the-0.085
 it-0.080
 her-0.072
 their-0.070
 them-0.070
Token dropped
Token ID5710
Feature activation-8.22e+00

Pos logit contributions
wered+0.115
 them+0.113
 the+0.110
 such+0.108
 their+0.106
Neg logit contributions
ouse-0.106
anda-0.096
int-0.079
uddle-0.078
umpy-0.078
Token the
Token ID262
Feature activation+1.24e+00

Pos logit contributions
wered+0.068
 respect+0.055
 earn+0.050
 happen+0.049
ts+0.046
Neg logit contributions
anda-0.266
s-0.265
umpy-0.261
ouse-0.253
oc-0.248
Token ball
Token ID2613
Feature activation+2.51e+00

Pos logit contributions
anda+0.031
ouse+0.030
umpy+0.029
oc+0.028
uer+0.026
Neg logit contributions
 the-0.033
 it-0.032
 them-0.031
 their-0.030
 her-0.029
Token and
Token ID290
Feature activation-2.27e+00

Pos logit contributions
anda+0.078
ouse+0.076
umpy+0.074
oc+0.070
ney+0.068
Neg logit contributions
 the-0.052
 it-0.049
 them-0.047
 their-0.045
 her-0.044
Token it
Token ID340
Feature activation+2.25e-01

Pos logit contributions
wered+0.018
 them+0.017
 respect+0.014
 permission+0.013
 such+0.013
Neg logit contributions
ouse-0.088
anda-0.084
s-0.080
oc-0.079
uddle-0.079
Token rolled
Token ID11686
Feature activation-3.92e+00

Pos logit contributions
anda+0.006
ouse+0.006
umpy+0.006
oc+0.006
ney+0.006
Neg logit contributions
 the-0.005
 it-0.005
 them-0.005
 their-0.005
 her-0.004
Token she
Token ID673
Feature activation+1.22e+00

Pos logit contributions
wered+0.093
 them+0.089
 respect+0.085
 such+0.084
 the+0.082
Neg logit contributions
anda-0.082
ouse-0.077
oc-0.065
uer-0.064
uffy-0.063
Token needed
Token ID2622
Feature activation-5.67e+00

Pos logit contributions
anda+0.036
ouse+0.036
umpy+0.035
ables+0.033
uddle+0.033
Neg logit contributions
 the-0.029
 it-0.028
 her-0.024
 them-0.023
 a-0.023
Token help
Token ID1037
Feature activation-1.66e+00

Pos logit contributions
wered+0.091
 wasting+0.071
 ruining+0.069
 achieving+0.066
oted+0.066
Neg logit contributions
anda-0.134
ouse-0.125
oc-0.124
age-0.119
s-0.118
Token finding
Token ID4917
Feature activation-5.42e+00

Pos logit contributions
wered+0.039
 the+0.037
 them+0.036
 it+0.036
 their+0.035
Neg logit contributions
ouse-0.037
anda-0.036
oc-0.036
umpy-0.035
ney-0.032
Token her
Token ID607
Feature activation+1.18e+00

Pos logit contributions
wered+0.053
 letting+0.038
 ruining+0.035
 achieving+0.034
 inconvenience+0.033
Neg logit contributions
anda-0.194
oc-0.186
ouse-0.181
ish-0.174
est-0.173
Token t
Token ID256
Feature activation+6.71e+00

Pos logit contributions
anda+0.036
ouse+0.035
umpy+0.035
oc+0.033
int+0.032
Neg logit contributions
 the-0.027
 it-0.025
 them-0.023
 their-0.023
 her-0.023
Tokeneddy
Token ID21874
Feature activation+5.45e+00

Pos logit contributions
umpy+0.133
anda+0.125
oke+0.122
ouse+0.120
uer+0.119
Neg logit contributions
 it-0.189
 the-0.188
 them-0.182
 him-0.181
 her-0.181
Token bear
Token ID6842
Feature activation+4.49e+00

Pos logit contributions
umpy+0.129
anda+0.128
ouse+0.125
uer+0.113
int+0.112
Neg logit contributions
 the-0.168
 it-0.158
 them-0.149
 her-0.148
 their-0.143
Token.
Token ID13
Feature activation+3.32e+00

Pos logit contributions
anda+0.144
ouse+0.141
umpy+0.137
oc+0.130
uer+0.126
Neg logit contributions
 the-0.091
 it-0.085
 them-0.081
 their-0.078
 her-0.076
Token The
Token ID383
Feature activation+1.90e+00

Pos logit contributions
anda+0.122
umpy+0.120
ouse+0.119
uer+0.112
int+0.111
Neg logit contributions
 the-0.062
 it-0.057
 her-0.052
 their-0.051
 them-0.050
Token little
Token ID1310
Feature activation+1.38e+00

Pos logit contributions
anda+0.053
ouse+0.052
umpy+0.050
oc+0.047
uer+0.045
Neg logit contributions
 the-0.047
 it-0.045
 them-0.043
 their-0.041
 her-0.040
Token.
Token ID13
Feature activation+1.81e+00

Pos logit contributions
anda+0.027
ouse+0.027
umpy+0.026
oc+0.025
uer+0.024
Neg logit contributions
 the-0.018
 it-0.017
 them-0.016
 their-0.016
 her-0.015
Token That
Token ID1320
Feature activation+1.28e+00

Pos logit contributions
anda+0.057
ouse+0.057
umpy+0.053
uer+0.052
uddle+0.052
Neg logit contributions
 the-0.041
 it-0.039
 their-0.035
 them-0.035
 her-0.034
Token is
Token ID318
Feature activation-1.83e+00

Pos logit contributions
anda+0.033
ouse+0.032
umpy+0.031
oc+0.029
uer+0.028
Neg logit contributions
 the-0.035
 it-0.033
 them-0.031
 their-0.031
 her-0.030
Token the
Token ID262
Feature activation+2.75e+00

Pos logit contributions
 them+0.021
 it+0.021
 the+0.020
wered+0.019
 him+0.017
Neg logit contributions
anda-0.064
ouse-0.060
umpy-0.059
oc-0.056
ney-0.055
Token real
Token ID1103
Feature activation+3.47e+00

Pos logit contributions
anda+0.059
ouse+0.058
umpy+0.055
oc+0.051
uer+0.049
Neg logit contributions
 the-0.084
 it-0.080
 them-0.078
 their-0.076
 her-0.074
Token treasure
Token ID14068
Feature activation+6.26e+00

Pos logit contributions
anda+0.101
ouse+0.099
umpy+0.097
oc+0.090
int+0.090
Neg logit contributions
 the-0.084
 it-0.079
 them-0.073
 their-0.072
 her-0.072
Token."
Token ID526
Feature activation+2.28e+00

Pos logit contributions
anda+0.209
ouse+0.206
umpy+0.204
int+0.193
oc+0.189
Neg logit contributions
 the-0.124
 it-0.119
 them-0.107
 their-0.103
 her-0.100
Token
Token ID198
Feature activation+1.71e+00

Pos logit contributions
anda+0.074
ouse+0.073
umpy+0.070
uer+0.067
int+0.066
Neg logit contributions
 the-0.052
 it-0.048
 them-0.044
 their-0.044
 her-0.043
Token
Token ID198
Feature activation+1.90e+00

Pos logit contributions
anda+0.050
ouse+0.049
umpy+0.047
oc+0.045
uer+0.044
Neg logit contributions
 the-0.040
 it-0.038
 them-0.036
 their-0.035
 her-0.034
TokenTim
Token ID14967
Feature activation+5.21e+00

Pos logit contributions
anda+0.058
ouse+0.056
umpy+0.054
oc+0.051
uer+0.050
Neg logit contributions
 the-0.043
 it-0.040
 them-0.038
 their-0.037
 her-0.036
Tokenmy
Token ID1820
Feature activation+4.43e+00

Pos logit contributions
ouse+0.159
anda+0.159
uer+0.153
umpy+0.151
oc+0.143
Neg logit contributions
 the-0.128
 it-0.110
 her-0.106
 a-0.099
 them-0.099
Token are
Token ID389
Feature activation-1.87e+00

Pos logit contributions
anda+0.023
ouse+0.022
umpy+0.022
ables+0.020
oc+0.020
Neg logit contributions
 the-0.011
 it-0.010
 them-0.010
 their-0.010
 her-0.009
Token hard
Token ID1327
Feature activation+1.65e+00

Pos logit contributions
 the+0.036
 it+0.035
 them+0.033
 their+0.032
 her+0.031
Neg logit contributions
anda-0.058
umpy-0.055
ouse-0.055
ney-0.049
ables-0.049
Token."
Token ID526
Feature activation+2.82e+00

Pos logit contributions
anda+0.051
ouse+0.050
umpy+0.049
oc+0.046
uer+0.044
Neg logit contributions
 the-0.035
 it-0.033
 them-0.031
 their-0.030
 her-0.029
Token So
Token ID1406
Feature activation-1.41e+00

Pos logit contributions
anda+0.087
ouse+0.087
umpy+0.086
uer+0.078
int+0.077
Neg logit contributions
 the-0.070
 it-0.064
 them-0.059
 her-0.058
 their-0.057
Token,
Token ID11
Feature activation-1.65e+00

Pos logit contributions
 the+0.027
 it+0.025
 them+0.022
 their+0.022
 her+0.021
Neg logit contributions
anda-0.049
ouse-0.049
umpy-0.048
int-0.044
oc-0.044
Token Sp
Token ID1338
Feature activation+7.87e+00

Pos logit contributions
 the+0.029
 it+0.027
 them+0.024
 their+0.023
 her+0.023
Neg logit contributions
anda-0.059
ouse-0.058
umpy-0.057
oc-0.053
uer-0.053
Tokenide
Token ID485
Feature activation+5.35e+00

Pos logit contributions
ouse+0.230
anda+0.217
ables+0.204
ooter+0.202
uddle+0.201
Neg logit contributions
 the-0.148
 them-0.144
 it-0.134
 their-0.130
 her-0.123
Tokeny
Token ID88
Feature activation+3.72e+00

Pos logit contributions
ouse+0.095
anda+0.091
umpy+0.089
uer+0.086
uddle+0.086
Neg logit contributions
 the-0.184
 it-0.172
 them-0.168
 her-0.165
 their-0.165
Token tried
Token ID3088
Feature activation-4.58e+00

Pos logit contributions
anda+0.118
ouse+0.114
umpy+0.113
oc+0.104
uddle+0.104
Neg logit contributions
 the-0.083
 it-0.078
 them-0.069
 her-0.067
 their-0.066
Token again
Token ID757
Feature activation+1.16e-01

Pos logit contributions
 the+0.065
 it+0.061
wered+0.060
 them+0.060
 their+0.055
Neg logit contributions
ouse-0.143
anda-0.142
umpy-0.136
ables-0.133
oc-0.130
Token and
Token ID290
Feature activation-1.29e+00

Pos logit contributions
anda+0.004
ouse+0.003
umpy+0.003
oc+0.003
int+0.003
Neg logit contributions
 the-0.003
 it-0.003
 them-0.002
 their-0.002
 her-0.002
Token time
Token ID640
Feature activation+1.96e+00

Pos logit contributions
ouse+0.155
anda+0.153
umpy+0.144
uer+0.141
ari+0.139
Neg logit contributions
 the-0.050
 it-0.041
 their-0.039
 them-0.038
 her-0.036
Token,
Token ID11
Feature activation+6.39e-01

Pos logit contributions
ouse+0.063
anda+0.063
umpy+0.061
oc+0.058
uer+0.057
Neg logit contributions
 the-0.041
 it-0.038
 them-0.035
 their-0.035
 her-0.034
Token there
Token ID612
Feature activation+2.39e+00

Pos logit contributions
anda+0.026
ouse+0.026
umpy+0.025
oc+0.024
uer+0.024
Neg logit contributions
 the-0.008
 it-0.007
 them-0.006
 their-0.006
 her-0.006
Token lived
Token ID5615
Feature activation-2.23e+00

Pos logit contributions
anda+0.061
ouse+0.059
umpy+0.057
oc+0.054
ney+0.052
Neg logit contributions
 the-0.062
 it-0.059
 them-0.057
 their-0.055
 her-0.054
Token a
Token ID257
Feature activation+2.03e+00

Pos logit contributions
 it+0.023
 happen+0.020
 anything+0.020
 them+0.019
 the+0.017
Neg logit contributions
anda-0.080
ouse-0.080
umpy-0.080
ney-0.075
int-0.074
Token dragon
Token ID10441
Feature activation+6.66e+00

Pos logit contributions
anda+0.061
ouse+0.061
umpy+0.059
ables+0.057
int+0.056
Neg logit contributions
 the-0.048
 it-0.044
 their-0.040
 them-0.038
 her-0.038
Token.
Token ID13
Feature activation+4.42e-01

Pos logit contributions
anda+0.223
ouse+0.220
umpy+0.220
int+0.213
uer+0.212
Neg logit contributions
 the-0.136
 it-0.121
 her-0.102
 their-0.102
 a-0.100
Token He
Token ID679
Feature activation+2.09e+00

Pos logit contributions
anda+0.013
ouse+0.013
umpy+0.012
oc+0.011
uer+0.011
Neg logit contributions
 the-0.010
 it-0.010
 them-0.009
 their-0.009
 her-0.009
Token was
Token ID373
Feature activation-2.32e+00

Pos logit contributions
anda+0.066
ouse+0.065
umpy+0.064
uddle+0.061
uer+0.061
Neg logit contributions
 the-0.048
 it-0.045
 her-0.038
 a-0.038
 them-0.038
Token a
Token ID257
Feature activation+2.43e+00

Pos logit contributions
 the+0.051
 it+0.045
 her+0.041
 their+0.041
 them+0.041
Neg logit contributions
ouse-0.077
anda-0.077
umpy-0.073
oc-0.071
uddle-0.071
Token very
Token ID845
Feature activation+1.03e+00

Pos logit contributions
anda+0.080
ouse+0.077
umpy+0.075
int+0.074
ables+0.073
Neg logit contributions
 the-0.053
 it-0.048
 a-0.042
 their-0.042
 her-0.040
Token play
Token ID711
Feature activation-1.00e+00

Pos logit contributions
 the+0.043
 it+0.041
 them+0.041
 their+0.038
 her+0.036
Neg logit contributions
anda-0.089
ouse-0.089
umpy-0.084
oc-0.082
ney-0.080
Token peek
Token ID27185
Feature activation+7.27e-01

Pos logit contributions
 the+0.016
 it+0.015
 them+0.014
 their+0.013
 her+0.013
Neg logit contributions
anda-0.035
ouse-0.035
umpy-0.034
oc-0.033
uer-0.032
Token-
Token ID12
Feature activation+1.11e+00

Pos logit contributions
anda+0.024
ouse+0.024
umpy+0.023
oc+0.022
ney+0.022
Neg logit contributions
 the-0.015
 it-0.013
 them-0.013
 their-0.012
 her-0.012
Tokena
Token ID64
Feature activation+4.12e+00

Pos logit contributions
anda+0.024
ouse+0.023
umpy+0.022
oc+0.020
uer+0.020
Neg logit contributions
 the-0.034
 it-0.033
 them-0.032
 their-0.030
 her-0.030
Token-
Token ID12
Feature activation+1.62e+00

Pos logit contributions
umpy+0.121
anda+0.121
ouse+0.119
ney+0.110
uer+0.108
Neg logit contributions
 the-0.096
 it-0.090
 them-0.087
 their-0.081
 her-0.079
Tokenbo
Token ID2127
Feature activation+7.27e+00

Pos logit contributions
anda+0.036
ouse+0.035
umpy+0.034
uddle+0.031
oc+0.030
Neg logit contributions
 the-0.049
 it-0.047
 them-0.046
 her-0.044
 their-0.043
Tokeno
Token ID78
Feature activation+5.07e+00

Pos logit contributions
uer+0.122
anda+0.111
umpy+0.107
ouse+0.105
ini+0.102
Neg logit contributions
 the-0.248
 it-0.236
 them-0.229
 their-0.221
 a-0.217
Token instead
Token ID2427
Feature activation+2.96e+00

Pos logit contributions
anda+0.156
umpy+0.152
ouse+0.149
uer+0.141
ney+0.140
Neg logit contributions
 the-0.112
 it-0.106
 them-0.103
 her-0.092
 their-0.092
Token."
Token ID526
Feature activation+2.82e+00

Pos logit contributions
anda+0.094
ouse+0.092
umpy+0.086
oc+0.085
uer+0.083
Neg logit contributions
 the-0.064
 it-0.061
 them-0.057
 their-0.053
 her-0.053
Token So
Token ID1406
Feature activation-1.59e+00

Pos logit contributions
anda+0.095
umpy+0.091
ouse+0.090
uer+0.085
ney+0.084
Neg logit contributions
 the-0.062
 it-0.057
 her-0.053
 them-0.053
 their-0.052
Token they
Token ID484
Feature activation+4.35e-01

Pos logit contributions
 the+0.033
 it+0.031
 them+0.030
 their+0.028
 her+0.027
Neg logit contributions
anda-0.049
ouse-0.048
umpy-0.046
oc-0.044
uer-0.043
Token sees
Token ID7224
Feature activation-5.94e+00

Pos logit contributions
ouse+0.036
umpy+0.034
anda+0.034
uer+0.032
oc+0.032
Neg logit contributions
 the-0.033
 it-0.031
 them-0.029
 their-0.028
 her-0.028
Token a
Token ID257
Feature activation+1.03e+00

Pos logit contributions
 the+0.056
 it+0.051
 them+0.044
 their+0.040
 her+0.038
Neg logit contributions
anda-0.245
ouse-0.238
umpy-0.234
oc-0.225
uddle-0.219
Token red
Token ID2266
Feature activation+2.25e+00

Pos logit contributions
anda+0.032
umpy+0.031
ouse+0.031
int+0.029
oc+0.029
Neg logit contributions
 the-0.023
 it-0.021
 their-0.019
 them-0.019
 her-0.018
Token button
Token ID4936
Feature activation+1.89e+00

Pos logit contributions
anda+0.067
ouse+0.067
umpy+0.063
oc+0.062
ney+0.059
Neg logit contributions
 the-0.048
 it-0.045
 them-0.044
 their-0.042
 her-0.041
Token on
Token ID319
Feature activation-4.76e+00

Pos logit contributions
anda+0.066
ouse+0.065
umpy+0.064
oc+0.062
uer+0.060
Neg logit contributions
 the-0.035
 it-0.033
 them-0.030
 their-0.029
 her-0.028
Token the
Token ID262
Feature activation+1.74e+00

Pos logit contributions
 permission+0.024
 risking+0.022
 respect+0.020
 help+0.020
 happen+0.017
Neg logit contributions
oc-0.179
anda-0.171
affe-0.166
umpy-0.166
ouse-0.165
Token oven
Token ID14361
Feature activation+2.42e+00

Pos logit contributions
anda+0.049
ouse+0.047
umpy+0.046
oc+0.043
ney+0.043
Neg logit contributions
 the-0.040
 it-0.039
 them-0.038
 their-0.036
 her-0.035
Token.
Token ID13
Feature activation+6.11e-01

Pos logit contributions
anda+0.079
ouse+0.077
umpy+0.075
oc+0.071
uer+0.070
Neg logit contributions
 the-0.049
 it-0.046
 them-0.043
 their-0.042
 her-0.041
Token He
Token ID679
Feature activation+9.63e-01

Pos logit contributions
anda+0.019
ouse+0.018
umpy+0.018
uer+0.017
oc+0.017
Neg logit contributions
 the-0.015
 it-0.014
 them-0.013
 their-0.012
 her-0.012
Token thinks
Token ID6834
Feature activation-4.91e+00

Pos logit contributions
ouse+0.028
umpy+0.027
anda+0.027
uer+0.025
oc+0.025
Neg logit contributions
 the-0.024
 it-0.023
 them-0.022
 their-0.021
 her-0.021
Token it
Token ID340
Feature activation+1.71e+00

Pos logit contributions
 the+0.021
 them+0.019
wered+0.018
 such+0.017
 her+0.014
Neg logit contributions
anda-0.204
ouse-0.203
umpy-0.193
oke-0.187
affe-0.186
Token coming
Token ID2406
Feature activation-2.14e+00

Pos logit contributions
anda+0.182
ouse+0.179
umpy+0.170
oc+0.163
uer+0.160
Neg logit contributions
 the-0.118
 it-0.108
 them-0.103
 their-0.099
 her-0.097
Token from
Token ID422
Feature activation-4.22e+00

Pos logit contributions
 the+0.035
 it+0.034
 them+0.031
 anything+0.031
 her+0.030
Neg logit contributions
anda-0.069
umpy-0.068
ouse-0.066
oc-0.065
ney-0.063
Token the
Token ID262
Feature activation+2.82e+00

Pos logit contributions
 respect+0.030
wered+0.030
 anything+0.028
 such+0.026
 yours+0.026
Neg logit contributions
oc-0.138
ney-0.137
anda-0.134
umpy-0.128
mer-0.127
Token freezer
Token ID30967
Feature activation+4.62e+00

Pos logit contributions
anda+0.072
ouse+0.070
umpy+0.068
oc+0.064
ney+0.061
Neg logit contributions
 the-0.075
 it-0.072
 them-0.069
 their-0.067
 her-0.066
Token.
Token ID13
Feature activation+1.95e+00

Pos logit contributions
anda+0.150
ouse+0.147
umpy+0.143
oc+0.135
int+0.133
Neg logit contributions
 the-0.095
 it-0.089
 them-0.083
 their-0.080
 her-0.077
Token Dad
Token ID17415
Feature activation+3.73e+00

Pos logit contributions
anda+0.055
ouse+0.053
umpy+0.052
oc+0.049
uer+0.047
Neg logit contributions
 the-0.048
 it-0.045
 them-0.043
 their-0.042
 her-0.040
Token said
Token ID531
Feature activation-2.38e+00

Pos logit contributions
ouse+0.122
umpy+0.119
anda+0.118
uer+0.111
ables+0.109
Neg logit contributions
 the-0.082
 it-0.076
 them-0.069
 their-0.064
 her-0.063
Token,
Token ID11
Feature activation-1.87e-01

Pos logit contributions
 the+0.047
wered+0.046
 them+0.045
 her+0.043
 such+0.043
Neg logit contributions
anda-0.066
ouse-0.064
umpy-0.059
oc-0.058
uer-0.055
Token "
Token ID366
Feature activation+1.47e-01

Pos logit contributions
 the+0.004
 it+0.004
 them+0.004
 their+0.004
 her+0.004
Neg logit contributions
anda-0.005
ouse-0.005
umpy-0.005
oc-0.005
uer-0.005
TokenQuick
Token ID21063
Feature activation-1.13e+00

Pos logit contributions
anda+0.005
ouse+0.005
umpy+0.005
oc+0.004
int+0.004
Neg logit contributions
 the-0.003
 it-0.003
 them-0.003
 their-0.003
 her-0.002
Tokenly
Token ID306
Feature activation+6.95e-01

Pos logit contributions
 the+0.023
 it+0.022
 them+0.021
 their+0.021
 her+0.020
Neg logit contributions
anda-0.034
umpy-0.032
ouse-0.032
oc-0.030
ari-0.029
Token need
Token ID761
Feature activation-5.92e+00

Pos logit contributions
 the+0.041
 it+0.038
 them+0.038
 their+0.037
 her+0.034
Neg logit contributions
umpy-0.075
anda-0.073
ouse-0.071
oc-0.068
ney-0.065
Token to
Token ID284
Feature activation-4.07e+00

Pos logit contributions
wered+0.077
 the+0.069
 it+0.067
 their+0.064
 them+0.061
Neg logit contributions
anda-0.186
umpy-0.171
ouse-0.163
ables-0.161
affe-0.160
Token panic
Token ID13619
Feature activation-1.09e+00

Pos logit contributions
 the+0.066
 it+0.064
 them+0.062
 their+0.060
 such+0.055
Neg logit contributions
anda-0.128
ouse-0.125
umpy-0.124
oc-0.119
uer-0.113
Token anymore
Token ID7471
Feature activation+8.07e-01

Pos logit contributions
 the+0.022
 it+0.021
 them+0.020
 their+0.019
 her+0.019
Neg logit contributions
anda-0.034
ouse-0.033
umpy-0.033
oc-0.031
uer-0.030
Token because
Token ID780
Feature activation-3.01e+00

Pos logit contributions
anda+0.026
ouse+0.026
umpy+0.025
oc+0.023
uer+0.023
Neg logit contributions
 the-0.017
 it-0.016
 them-0.015
 their-0.014
 her-0.014
Token he
Token ID339
Feature activation+2.49e+00

Pos logit contributions
 the+0.051
 them+0.048
 it+0.046
 their+0.046
 her+0.043
Neg logit contributions
anda-0.092
umpy-0.088
ouse-0.088
oc-0.084
ney-0.081
Token had
Token ID550
Feature activation-3.01e+00

Pos logit contributions
ouse+0.076
anda+0.076
umpy+0.072
ari+0.069
uer+0.069
Neg logit contributions
 the-0.057
 it-0.055
 them-0.048
 her-0.047
 their-0.045
Token found
Token ID1043
Feature activation-5.02e+00

Pos logit contributions
wered+0.045
 them+0.045
 the+0.044
 it+0.043
 their+0.041
Neg logit contributions
anda-0.091
ouse-0.091
umpy-0.089
ables-0.085
ney-0.081
Token a
Token ID257
Feature activation+1.91e+00

Pos logit contributions
 the+0.039
 it+0.036
 them+0.031
wered+0.030
 their+0.030
Neg logit contributions
anda-0.199
ouse-0.189
umpy-0.189
oc-0.187
ables-0.180
Token safe
Token ID3338
Feature activation+1.87e+00

Pos logit contributions
anda+0.058
ouse+0.056
umpy+0.055
oc+0.052
int+0.050
Neg logit contributions
 the-0.042
 it-0.040
 them-0.038
 their-0.037
 her-0.036
Token place
Token ID1295
Feature activation+1.82e+00

Pos logit contributions
anda+0.057
ouse+0.056
umpy+0.054
oc+0.051
ney+0.050
Neg logit contributions
 the-0.039
 it-0.037
 them-0.035
 their-0.034
 her-0.033
Token fun
Token ID1257
Feature activation-1.74e+00

Pos logit contributions
 the+0.037
 it+0.035
 them+0.033
 their+0.032
 her+0.031
Neg logit contributions
anda-0.064
ouse-0.062
umpy-0.061
oc-0.058
uddle-0.056
Token together
Token ID1978
Feature activation-1.77e+00

Pos logit contributions
 the+0.035
 it+0.032
 them+0.031
 their+0.029
 her+0.029
Neg logit contributions
anda-0.055
ouse-0.054
umpy-0.053
oc-0.051
uer-0.048
Token.
Token ID13
Feature activation+1.73e+00

Pos logit contributions
 the+0.038
 it+0.037
 them+0.035
 her+0.033
 anything+0.033
Neg logit contributions
anda-0.048
ouse-0.047
umpy-0.045
oc-0.044
ney-0.041
Token
Token ID198
Feature activation+1.23e+00

Pos logit contributions
anda+0.057
ouse+0.057
umpy+0.054
uer+0.053
int+0.050
Neg logit contributions
 the-0.038
 it-0.036
 them-0.033
 their-0.032
 her-0.029
Token
Token ID198
Feature activation+1.19e+00

Pos logit contributions
anda+0.037
ouse+0.036
umpy+0.035
uer+0.033
oc+0.033
Neg logit contributions
 the-0.029
 it-0.027
 them-0.026
 their-0.025
 her-0.024
TokenOne
Token ID3198
Feature activation+9.41e-01

Pos logit contributions
anda+0.039
ouse+0.038
umpy+0.036
uer+0.034
int+0.034
Neg logit contributions
 the-0.025
 it-0.023
 them-0.022
 their-0.021
 her-0.020
Token day
Token ID1110
Feature activation-1.42e+00

Pos logit contributions
anda+0.027
ouse+0.026
umpy+0.025
oc+0.024
uer+0.024
Neg logit contributions
 the-0.024
 it-0.022
 their-0.021
 them-0.021
 her-0.020
Token,
Token ID11
Feature activation-2.38e+00

Pos logit contributions
 the+0.035
 them+0.034
 it+0.033
 her+0.032
 such+0.032
Neg logit contributions
anda-0.035
ouse-0.033
umpy-0.032
oc-0.031
uddle-0.029
Token Tim
Token ID5045
Feature activation+3.43e+00

Pos logit contributions
 them+0.035
 permission+0.034
 the+0.033
 her+0.032
 respect+0.032
Neg logit contributions
anda-0.071
ouse-0.069
oc-0.067
umpy-0.066
affe-0.065
Token and
Token ID290
Feature activation-4.09e+00

Pos logit contributions
anda+0.111
ouse+0.107
uer+0.104
umpy+0.101
oc+0.100
Neg logit contributions
 the-0.080
 it-0.071
 her-0.064
 them-0.063
 their-0.063
Token Rod
Token ID6882
Feature activation+3.09e+00

Pos logit contributions
 the+0.092
 it+0.088
 them+0.087
 their+0.082
 her+0.079
Neg logit contributions
anda-0.114
ouse-0.112
umpy-0.107
oc-0.104
ables-0.097
Token!
Token ID0
Feature activation+1.31e+00

Pos logit contributions
anda+0.151
ouse+0.148
umpy+0.144
oc+0.136
uddle+0.134
Neg logit contributions
 the-0.095
 it-0.090
 them-0.080
 their-0.078
 her-0.077
Tokenâ
Token ID22940
Feature activation+3.60e+00

Pos logit contributions
anda+0.040
ouse+0.039
umpy+0.036
uddle+0.035
uer+0.034
Neg logit contributions
 the-0.031
 it-0.030
 them-0.027
 their-0.026
 her-0.026
Token€
Token ID26391
Feature activation+4.27e+00

Pos logit contributions
anda+0.095
ouse+0.090
umpy+0.086
oc+0.081
uddle+0.080
Neg logit contributions
 the-0.096
 it-0.091
 them-0.085
 their-0.083
 her-0.082
Token
Token ID198
Feature activation+4.99e-01

Pos logit contributions
anda+0.124
ouse+0.117
umpy+0.116
int+0.108
uer+0.105
Neg logit contributions
 the-0.105
 it-0.101
 them-0.089
 their-0.088
 her-0.085
Token
Token ID198
Feature activation+3.71e-01

Pos logit contributions
anda+0.016
ouse+0.015
umpy+0.015
oc+0.014
ney+0.014
Neg logit contributions
 the-0.010
 it-0.010
 them-0.009
 their-0.009
 her-0.009
TokenJohn
Token ID7554
Feature activation+3.92e+00

Pos logit contributions
anda+0.011
ouse+0.011
umpy+0.010
oc+0.010
ney+0.010
Neg logit contributions
 the-0.008
 it-0.008
 them-0.008
 their-0.007
 her-0.007
Token peek
Token ID27185
Feature activation-1.36e+00

Pos logit contributions
ouse+0.128
umpy+0.126
anda+0.124
uer+0.120
uddle+0.120
Neg logit contributions
 the-0.084
 it-0.076
 her-0.064
 a-0.063
 them-0.061
Tokened
Token ID276
Feature activation-4.26e+00

Pos logit contributions
 the+0.050
 it+0.049
 them+0.047
 their+0.046
 her+0.045
Neg logit contributions
anda-0.022
ouse-0.021
umpy-0.020
oc-0.018
ney-0.017
Token behind
Token ID2157
Feature activation-5.52e+00

Pos logit contributions
 the+0.077
 it+0.073
 them+0.071
 their+0.068
 anything+0.064
Neg logit contributions
anda-0.134
ouse-0.129
umpy-0.126
oc-0.123
int-0.117
Token the
Token ID262
Feature activation+2.20e+00

Pos logit contributions
wered+0.028
 anything+0.024
 their+0.023
 the+0.022
 it+0.022
Neg logit contributions
anda-0.195
oc-0.193
umpy-0.190
affe-0.189
jo-0.187
Token bush
Token ID24484
Feature activation+3.06e+00

Pos logit contributions
anda+0.054
ouse+0.051
umpy+0.051
oc+0.046
uddle+0.046
Neg logit contributions
 the-0.058
 them-0.057
 it-0.056
 their-0.056
 anything-0.052
Token She
Token ID1375
Feature activation+6.31e-01

Pos logit contributions
umpy+0.004
anda+0.004
ouse+0.004
uddle+0.004
ables+0.004
Neg logit contributions
 the-0.005
 it-0.005
 them-0.005
 anything-0.005
 such-0.005
Token is
Token ID318
Feature activation-3.17e+00

Pos logit contributions
ouse+0.017
anda+0.017
umpy+0.016
oc+0.015
uer+0.015
Neg logit contributions
 the-0.016
 it-0.016
 them-0.015
 their-0.014
 her-0.014
Token not
Token ID407
Feature activation-3.50e+00

Pos logit contributions
 it+0.075
 the+0.073
 them+0.072
 anything+0.068
 him+0.066
Neg logit contributions
anda-0.081
umpy-0.080
ouse-0.077
ney-0.067
int-0.066
Token happy
Token ID3772
Feature activation-8.63e-01

Pos logit contributions
 the+0.088
 it+0.083
 them+0.079
 their+0.077
 her+0.076
Neg logit contributions
anda-0.097
ouse-0.094
umpy-0.091
oc-0.086
uer-0.083
Token.
Token ID13
Feature activation+5.84e-01

Pos logit contributions
 the+0.018
 it+0.017
 them+0.016
 their+0.016
 her+0.015
Neg logit contributions
anda-0.026
ouse-0.026
umpy-0.025
oc-0.023
ney-0.023
Token She
Token ID1375
Feature activation+7.88e-01

Pos logit contributions
anda+0.014
umpy+0.013
ouse+0.013
oc+0.012
ney+0.011
Neg logit contributions
 the-0.016
 it-0.015
 them-0.015
 their-0.014
 her-0.014
Token says
Token ID1139
Feature activation-3.19e+00

Pos logit contributions
ouse+0.021
anda+0.021
umpy+0.021
oc+0.019
uer+0.019
Neg logit contributions
 the-0.020
 it-0.019
 them-0.018
 their-0.018
 her-0.017
Token,
Token ID11
Feature activation-1.23e+00

Pos logit contributions
 the+0.046
 them+0.043
wered+0.042
 it+0.042
 such+0.041
Neg logit contributions
anda-0.106
ouse-0.101
umpy-0.099
affe-0.092
oke-0.092
Token "
Token ID366
Feature activation-5.27e-01

Pos logit contributions
 the+0.028
 it+0.026
 them+0.026
 their+0.025
 such+0.024
Neg logit contributions
anda-0.034
ouse-0.033
umpy-0.032
oc-0.030
ney-0.028
TokenSh
Token ID2484
Feature activation+2.66e+00

Pos logit contributions
 the+0.010
 it+0.009
 them+0.009
 their+0.008
 her+0.008
Neg logit contributions
anda-0.018
ouse-0.018
umpy-0.017
oc-0.016
uer-0.016
Tokenapes
Token ID7916
Feature activation+2.56e+00

Pos logit contributions
anda+0.063
ouse+0.063
umpy+0.056
oc+0.055
uddle+0.055
Neg logit contributions
 the-0.074
 it-0.071
 them-0.069
 their-0.066
 her-0.065
Token find
Token ID1064
Feature activation-4.85e+00

Pos logit contributions
anda+0.010
ouse+0.010
umpy+0.010
oc+0.009
ney+0.009
Neg logit contributions
 the-0.005
 it-0.005
 them-0.005
 their-0.004
 such-0.004
Token a
Token ID257
Feature activation+8.66e-01

Pos logit contributions
 it+0.049
 the+0.047
wered+0.041
 them+0.038
 anything+0.035
Neg logit contributions
anda-0.180
ouse-0.169
umpy-0.166
uddle-0.162
oc-0.160
Token dry
Token ID5894
Feature activation+2.17e+00

Pos logit contributions
ouse+0.023
anda+0.023
umpy+0.021
uddle+0.020
oc+0.020
Neg logit contributions
 the-0.021
 it-0.020
 them-0.019
 her-0.018
 their-0.018
Token cloth
Token ID16270
Feature activation+4.37e+00

Pos logit contributions
ouse+0.057
anda+0.057
umpy+0.053
oc+0.052
ney+0.050
Neg logit contributions
 the-0.051
 it-0.049
 them-0.048
 their-0.046
 anything-0.046
Token and
Token ID290
Feature activation-2.57e+00

Pos logit contributions
anda+0.130
ouse+0.128
umpy+0.125
oc+0.119
int+0.116
Neg logit contributions
 the-0.100
 it-0.094
 them-0.089
 their-0.087
 her-0.084
Token wrap
Token ID14441
Feature activation-2.57e+00

Pos logit contributions
 them+0.032
wered+0.031
 it+0.027
 such+0.027
 the+0.026
Neg logit contributions
ouse-0.081
anda-0.073
ney-0.071
oc-0.071
uddle-0.070
Token it
Token ID340
Feature activation+2.04e-01

Pos logit contributions
wered+0.014
 the+0.013
 it+0.009
 anything+0.008
 her+0.008
Neg logit contributions
anda-0.108
ouse-0.105
umpy-0.099
ney-0.098
ables-0.097
Token around
Token ID1088
Feature activation-4.73e+00

Pos logit contributions
anda+0.007
ouse+0.007
umpy+0.007
oc+0.006
uer+0.006
Neg logit contributions
 the-0.004
 it-0.004
 them-0.003
 their-0.003
 her-0.003
Token the
Token ID262
Feature activation+2.70e+00

Pos logit contributions
 the+0.025
wered+0.024
 it+0.023
 such+0.019
 anything+0.016
Neg logit contributions
anda-0.186
ouse-0.183
oc-0.178
umpy-0.175
ney-0.170
Token doll
Token ID3654
Feature activation+5.14e+00

Pos logit contributions
anda+0.072
ouse+0.071
umpy+0.068
oc+0.062
ney+0.062
Neg logit contributions
 the-0.067
 it-0.065
 them-0.063
 their-0.061
 her-0.059
Token's
Token ID338
Feature activation+1.31e+00

Pos logit contributions
umpy+0.167
anda+0.162
ouse+0.155
int+0.153
uer+0.148
Neg logit contributions
 the-0.113
 it-0.103
 her-0.093
 their-0.093
 them-0.091
Token.
Token ID13
Feature activation+1.93e+00

Pos logit contributions
anda+0.066
ouse+0.064
umpy+0.062
oc+0.059
uer+0.057
Neg logit contributions
 the-0.046
 it-0.043
 them-0.040
 their-0.039
 her-0.038
Token He
Token ID679
Feature activation+2.62e+00

Pos logit contributions
anda+0.053
ouse+0.049
oc+0.045
umpy+0.045
uer+0.045
Neg logit contributions
 the-0.055
 it-0.052
 her-0.047
 them-0.045
 an-0.045
Token didn
Token ID1422
Feature activation-1.96e+00

Pos logit contributions
anda+0.080
ouse+0.079
umpy+0.078
ari+0.072
oc+0.072
Neg logit contributions
 the-0.060
 it-0.057
 them-0.051
 her-0.050
 their-0.049
Token't
Token ID470
Feature activation-1.93e+00

Pos logit contributions
 the+0.038
 it+0.035
 them+0.035
 their+0.034
 her+0.032
Neg logit contributions
anda-0.059
umpy-0.059
ouse-0.058
oc-0.053
zag-0.052
Token want
Token ID765
Feature activation-5.40e+00

Pos logit contributions
 the+0.040
 it+0.038
 them+0.036
 their+0.035
 her+0.033
Neg logit contributions
anda-0.059
ouse-0.058
umpy-0.058
oc-0.054
int-0.052
Token to
Token ID284
Feature activation-3.82e+00

Pos logit contributions
wered+0.082
 the+0.074
 their+0.073
 them+0.072
 it+0.070
Neg logit contributions
anda-0.155
umpy-0.146
ouse-0.145
ables-0.139
jo-0.133
Token make
Token ID787
Feature activation-5.26e+00

Pos logit contributions
 the+0.060
 them+0.057
 it+0.056
 their+0.054
 such+0.049
Neg logit contributions
anda-0.125
ouse-0.122
umpy-0.121
oc-0.114
uer-0.109
Token any
Token ID597
Feature activation+2.18e-01

Pos logit contributions
 the+0.041
 it+0.034
 them+0.031
 their+0.028
 her+0.024
Neg logit contributions
anda-0.224
ouse-0.219
umpy-0.215
oc-0.208
uddle-0.203
Token more
Token ID517
Feature activation-1.31e-01

Pos logit contributions
anda+0.007
ouse+0.007
umpy+0.006
oc+0.006
uer+0.006
Neg logit contributions
 the-0.005
 it-0.004
 them-0.004
 their-0.004
 her-0.004
Token mistakes
Token ID10135
Feature activation+1.98e+00

Pos logit contributions
 the+0.003
 it+0.003
 them+0.002
 their+0.002
 her+0.002
Neg logit contributions
anda-0.004
ouse-0.004
umpy-0.004
oc-0.004
uer-0.004
Token.
Token ID13
Feature activation+2.17e+00

Pos logit contributions
anda+0.056
ouse+0.055
umpy+0.054
oc+0.050
ables+0.049
Neg logit contributions
 the-0.042
 it-0.040
 them-0.039
 their-0.038
 her-0.036
Token They
Token ID1119
Feature activation-1.15e-01

Pos logit contributions
anda+0.012
ouse+0.012
umpy+0.011
oc+0.011
uer+0.010
Neg logit contributions
 the-0.013
 it-0.012
 them-0.012
 their-0.011
 her-0.011
Token say
Token ID910
Feature activation-2.69e+00

Pos logit contributions
 it+0.002
 the+0.002
 them+0.002
 anything+0.002
 such+0.002
Neg logit contributions
anda-0.004
ouse-0.004
oc-0.004
umpy-0.004
ari-0.003
Token thank
Token ID5875
Feature activation-2.95e+00

Pos logit contributions
 the+0.051
 it+0.048
 them+0.047
 her+0.044
 such+0.044
Neg logit contributions
anda-0.081
ouse-0.078
umpy-0.075
oc-0.072
affe-0.071
Token you
Token ID345
Feature activation+3.10e-01

Pos logit contributions
 the+0.037
 it+0.036
 such+0.032
 anything+0.032
 their+0.032
Neg logit contributions
anda-0.099
oc-0.093
ouse-0.093
umpy-0.092
int-0.090
Token and
Token ID290
Feature activation-2.74e+00

Pos logit contributions
anda+0.011
ouse+0.011
umpy+0.010
oc+0.010
int+0.010
Neg logit contributions
 the-0.006
 it-0.005
 them-0.005
 their-0.005
 her-0.004
Token goodbye
Token ID24829
Feature activation+7.09e-01

Pos logit contributions
 them+0.046
 it+0.045
 the+0.044
 anything+0.043
wered+0.042
Neg logit contributions
anda-0.083
ouse-0.081
oc-0.076
umpy-0.076
ari-0.074
Token to
Token ID284
Feature activation-5.53e+00

Pos logit contributions
anda+0.024
ouse+0.024
umpy+0.023
oc+0.022
int+0.022
Neg logit contributions
 the-0.013
 it-0.013
 them-0.011
 their-0.011
 her-0.011
Token the
Token ID262
Feature activation+2.75e+00

Pos logit contributions
pped+0.024
 such+0.024
 anything+0.022
 risking+0.021
 it+0.020
Neg logit contributions
anda-0.199
oc-0.199
affe-0.180
ang-0.177
jo-0.175
Token man
Token ID582
Feature activation+4.30e+00

Pos logit contributions
anda+0.058
ouse+0.057
umpy+0.055
oc+0.050
uer+0.047
Neg logit contributions
 the-0.085
 it-0.082
 them-0.079
 their-0.077
 her-0.076
Token.
Token ID13
Feature activation+8.23e-01

Pos logit contributions
anda+0.153
ouse+0.153
umpy+0.149
uddle+0.141
uer+0.140
Neg logit contributions
 the-0.081
 it-0.073
 them-0.065
 their-0.064
 her-0.061
Token They
Token ID1119
Feature activation+5.93e-02

Pos logit contributions
anda+0.022
ouse+0.022
umpy+0.021
oc+0.020
uer+0.019
Neg logit contributions
 the-0.021
 it-0.020
 them-0.019
 their-0.019
 her-0.018
Token.
Token ID13
Feature activation+1.96e+00

Pos logit contributions
anda+0.061
ouse+0.059
umpy+0.058
oc+0.055
int+0.053
Neg logit contributions
 the-0.039
 it-0.036
 them-0.034
 their-0.033
 her-0.032
Token They
Token ID1119
Feature activation-6.28e-02

Pos logit contributions
anda+0.055
ouse+0.054
umpy+0.052
oc+0.049
uer+0.048
Neg logit contributions
 the-0.049
 it-0.047
 them-0.044
 their-0.044
 her-0.042
Token sn
Token ID3013
Feature activation+8.65e-01

Pos logit contributions
 the+0.001
 it+0.001
 them+0.001
 their+0.001
 anything+0.001
Neg logit contributions
anda-0.002
ouse-0.002
umpy-0.002
oc-0.001
ari-0.001
Tokenugg
Token ID6837
Feature activation+6.23e-01

Pos logit contributions
anda+0.017
ouse+0.016
umpy+0.016
oc+0.014
uer+0.014
Neg logit contributions
 the-0.029
 it-0.028
 them-0.027
 their-0.026
 her-0.025
Tokenled
Token ID992
Feature activation-4.06e+00

Pos logit contributions
anda+0.010
ouse+0.010
umpy+0.009
oc+0.009
uer+0.008
Neg logit contributions
 the-0.023
 it-0.022
 them-0.021
 their-0.021
 her-0.021
Token close
Token ID1969
Feature activation-3.11e+00

Pos logit contributions
 the+0.084
 it+0.079
 them+0.075
 their+0.071
 her+0.070
Neg logit contributions
anda-0.123
ouse-0.120
umpy-0.115
oc-0.110
ables-0.107
Token and
Token ID290
Feature activation-3.09e+00

Pos logit contributions
 the+0.047
 it+0.045
wered+0.045
 them+0.043
 anything+0.043
Neg logit contributions
anda-0.102
ouse-0.094
umpy-0.092
ables-0.089
ney-0.088
Token soon
Token ID2582
Feature activation-3.41e+00

Pos logit contributions
 the+0.063
 them+0.062
 it+0.060
 their+0.057
 anything+0.057
Neg logit contributions
ouse-0.086
anda-0.085
umpy-0.079
oc-0.077
ney-0.075
Token fell
Token ID3214
Feature activation-2.07e+00

Pos logit contributions
 the+0.047
 them+0.043
 it+0.043
wered+0.043
 such+0.040
Neg logit contributions
anda-0.117
ouse-0.115
umpy-0.108
ables-0.104
uddle-0.104
Token fast
Token ID3049
Feature activation-5.29e-01

Pos logit contributions
 the+0.049
 it+0.047
 them+0.044
 their+0.043
 her+0.042
Neg logit contributions
anda-0.060
ouse-0.058
umpy-0.056
oc-0.054
uer-0.052
Token asleep
Token ID16039
Feature activation+2.13e-01

Pos logit contributions
 the+0.012
 it+0.011
 them+0.011
 their+0.010
 her+0.010
Neg logit contributions
anda-0.015
ouse-0.015
umpy-0.015
oc-0.014
uer-0.013
Token on
Token ID319
Feature activation-2.40e+00

Pos logit contributions
 the+0.030
 it+0.029
 them+0.028
 their+0.025
 anything+0.025
Neg logit contributions
anda-0.056
ouse-0.055
umpy-0.053
ari-0.050
oc-0.050
Token,
Token ID11
Feature activation-2.36e+00

Pos logit contributions
 it+0.046
 the+0.046
 them+0.046
 anything+0.041
wered+0.041
Neg logit contributions
anda-0.066
ouse-0.063
oc-0.060
umpy-0.060
ables-0.057
Token Tim
Token ID5045
Feature activation+4.76e+00

Pos logit contributions
 the+0.031
 them+0.029
 it+0.029
 her+0.026
 their+0.025
Neg logit contributions
anda-0.081
ouse-0.079
umpy-0.077
oc-0.076
uddle-0.072
Token grew
Token ID6348
Feature activation-2.08e+00

Pos logit contributions
anda+0.148
ouse+0.142
umpy+0.140
uer+0.135
uddle+0.133
Neg logit contributions
 the-0.116
 it-0.105
 her-0.094
 a-0.089
 them-0.088
Token up
Token ID510
Feature activation-1.31e+00

Pos logit contributions
 the+0.045
 it+0.043
 them+0.041
 their+0.039
 her+0.038
Neg logit contributions
anda-0.061
ouse-0.061
umpy-0.059
oc-0.055
ney-0.054
Token to
Token ID284
Feature activation-5.19e+00

Pos logit contributions
 the+0.027
 it+0.025
 them+0.024
 their+0.023
 her+0.023
Neg logit contributions
anda-0.042
ouse-0.041
umpy-0.040
oc-0.038
uer-0.037
Token be
Token ID307
Feature activation-1.88e+00

Pos logit contributions
 it+0.052
 them+0.052
 the+0.048
 such+0.043
 their+0.041
Neg logit contributions
oc-0.173
ouse-0.169
anda-0.166
jo-0.160
umpy-0.160
Token a
Token ID257
Feature activation+2.64e+00

Pos logit contributions
 the+0.035
 it+0.031
 them+0.029
 their+0.029
 her+0.028
Neg logit contributions
anda-0.066
ouse-0.064
umpy-0.062
oc-0.060
uddle-0.058
Token kind
Token ID1611
Feature activation+3.25e+00

Pos logit contributions
anda+0.080
ouse+0.077
umpy+0.077
int+0.072
ables+0.069
Neg logit contributions
 the-0.062
 it-0.058
 their-0.052
 them-0.051
 her-0.049
Token and
Token ID290
Feature activation-1.75e+00

Pos logit contributions
anda+0.089
ouse+0.088
umpy+0.082
int+0.078
uer+0.077
Neg logit contributions
 the-0.085
 it-0.079
 their-0.073
 them-0.073
 her-0.071
Token caring
Token ID18088
Feature activation+2.96e+00

Pos logit contributions
 the+0.040
 them+0.039
 it+0.038
 their+0.035
 her+0.034
Neg logit contributions
ouse-0.047
anda-0.046
umpy-0.045
oc-0.043
ney-0.041
Token,
Token ID11
Feature activation-1.74e+00

Pos logit contributions
anda+0.023
ouse+0.022
umpy+0.022
oc+0.021
ari+0.020
Neg logit contributions
 the-0.011
 it-0.010
 them-0.009
 their-0.009
 her-0.009
Token but
Token ID475
Feature activation-2.86e+00

Pos logit contributions
 the+0.036
 it+0.035
 them+0.034
 their+0.032
 her+0.032
Neg logit contributions
anda-0.051
ouse-0.049
umpy-0.048
oc-0.046
ari-0.044
Token he
Token ID339
Feature activation+4.94e-01

Pos logit contributions
 the+0.038
 them+0.037
 it+0.036
 such+0.035
 her+0.034
Neg logit contributions
anda-0.091
ouse-0.088
oc-0.084
umpy-0.082
uer-0.081
Token couldn
Token ID3521
Feature activation-2.86e+00

Pos logit contributions
ouse+0.015
umpy+0.015
anda+0.015
oc+0.014
ables+0.014
Neg logit contributions
 the-0.011
 it-0.011
 them-0.009
 her-0.009
 their-0.009
Token't
Token ID470
Feature activation-2.33e+00

Pos logit contributions
 the+0.048
 it+0.044
 them+0.041
 their+0.039
 her+0.038
Neg logit contributions
anda-0.101
ouse-0.100
umpy-0.097
oc-0.093
uer-0.090
Token find
Token ID1064
Feature activation-6.81e+00

Pos logit contributions
 the+0.036
 it+0.034
 them+0.031
 their+0.030
 her+0.028
Neg logit contributions
anda-0.083
ouse-0.082
umpy-0.081
oc-0.077
uer-0.075
Token it
Token ID340
Feature activation-5.16e-01

Pos logit contributions
wered+0.040
 the+0.034
 their+0.026
 respect+0.024
 it+0.024
Neg logit contributions
anda-0.272
oc-0.254
ouse-0.245
umpy-0.237
affe-0.235
Token.
Token ID13
Feature activation+1.78e+00

Pos logit contributions
 the+0.011
 it+0.010
 them+0.010
 their+0.009
 her+0.009
Neg logit contributions
anda-0.015
ouse-0.015
umpy-0.015
oc-0.014
ney-0.013
Token He
Token ID679
Feature activation+1.25e+00

Pos logit contributions
anda+0.052
ouse+0.051
umpy+0.047
uddle+0.044
int+0.044
Neg logit contributions
 the-0.047
 it-0.043
 them-0.041
 their-0.040
 an-0.039
Token was
Token ID373
Feature activation-3.26e+00

Pos logit contributions
ouse+0.040
umpy+0.040
anda+0.039
ables+0.037
oc+0.036
Neg logit contributions
 the-0.027
 it-0.026
 them-0.023
 her-0.023
 their-0.022
Token so
Token ID523
Feature activation-2.37e+00

Pos logit contributions
 the+0.077
 it+0.076
 them+0.073
 their+0.070
 her+0.068
Neg logit contributions
anda-0.088
umpy-0.083
ouse-0.083
ney-0.074
oc-0.074
Token wished
Token ID16555
Feature activation-4.65e+00

Pos logit contributions
anda+0.041
ouse+0.040
umpy+0.039
oc+0.037
ney+0.035
Neg logit contributions
 the-0.032
 it-0.031
 them-0.030
 their-0.028
 her-0.028
Token they
Token ID484
Feature activation-1.22e+00

Pos logit contributions
 the+0.076
 it+0.071
wered+0.070
 such+0.069
 anything+0.066
Neg logit contributions
anda-0.135
ouse-0.130
umpy-0.123
ari-0.121
jo-0.120
Token had
Token ID550
Feature activation-4.91e+00

Pos logit contributions
 the+0.032
 them+0.031
 it+0.030
 their+0.029
wered+0.029
Neg logit contributions
anda-0.028
ouse-0.028
umpy-0.027
ney-0.024
oc-0.024
Token not
Token ID407
Feature activation-3.16e+00

Pos logit contributions
wered+0.077
 it+0.072
 the+0.069
 them+0.066
 outer+0.059
Neg logit contributions
ouse-0.146
anda-0.144
ney-0.128
uer-0.127
umpy-0.126
Token gone
Token ID3750
Feature activation-2.41e+00

Pos logit contributions
 it+0.067
 the+0.066
 them+0.065
wered+0.058
 their+0.058
Neg logit contributions
anda-0.089
ouse-0.086
umpy-0.084
ney-0.077
ari-0.075
Token on
Token ID319
Feature activation-4.79e+00

Pos logit contributions
 the+0.043
 it+0.041
 them+0.038
 their+0.035
 her+0.035
Neg logit contributions
anda-0.078
ouse-0.076
umpy-0.074
oc-0.069
ney-0.069
Token vacation
Token ID14600
Feature activation+2.58e+00

Pos logit contributions
wered+0.022
 it+0.018
 respect+0.017
 them+0.014
 help+0.013
Neg logit contributions
anda-0.175
oc-0.174
ouse-0.173
umpy-0.173
ables-0.170
Token.
Token ID13
Feature activation+1.14e+00

Pos logit contributions
anda+0.084
ouse+0.082
umpy+0.079
oc+0.076
int+0.074
Neg logit contributions
 the-0.052
 it-0.049
 them-0.046
 their-0.044
 her-0.043
Token They
Token ID1119
Feature activation+1.30e+00

Pos logit contributions
anda+0.031
ouse+0.030
umpy+0.029
oc+0.027
uer+0.027
Neg logit contributions
 the-0.029
 it-0.028
 them-0.026
 their-0.026
 her-0.025
Token wished
Token ID16555
Feature activation-4.77e+00

Pos logit contributions
anda+0.038
ouse+0.037
umpy+0.036
oc+0.034
ney+0.033
Neg logit contributions
 the-0.029
 it-0.028
 them-0.027
 their-0.026
 her-0.025
Token they
Token ID484
Feature activation-1.26e+00

Pos logit contributions
 the+0.078
wered+0.073
 it+0.073
 such+0.072
 anything+0.068
Neg logit contributions
anda-0.137
ouse-0.132
umpy-0.124
ari-0.123
jo-0.121
Token\n
Token ID198
Feature activation+1.43e+00

Pos logit contributions
anda+0.045
ouse+0.044
umpy+0.042
oc+0.040
uer+0.040
Neg logit contributions
 the-0.037
 it-0.035
 them-0.033
 their-0.032
 her-0.031
TokenL
Token ID43
Feature activation+3.33e+00

Pos logit contributions
anda+0.042
ouse+0.041
umpy+0.039
oc+0.037
int+0.036
Neg logit contributions
 the-0.034
 it-0.032
 them-0.031
 their-0.030
 her-0.029
Tokenily
Token ID813
Feature activation+5.71e+00

Pos logit contributions
anda+0.106
ouse+0.104
umpy+0.100
oc+0.097
uer+0.094
Neg logit contributions
 the-0.068
 it-0.064
 them-0.061
 their-0.058
 her-0.058
Token looked
Token ID3114
Feature activation-1.48e+00

Pos logit contributions
anda+0.178
umpy+0.176
ouse+0.168
oc+0.162
uer+0.160
Neg logit contributions
 the-0.131
 it-0.120
 her-0.110
 them-0.106
 their-0.104
Token up
Token ID510
Feature activation-5.27e-01

Pos logit contributions
 the+0.034
 it+0.032
 them+0.030
 their+0.029
 her+0.029
Neg logit contributions
anda-0.045
ouse-0.045
umpy-0.044
oc-0.042
ney-0.041
Token at
Token ID379
Feature activation-4.51e+00

Pos logit contributions
 the+0.013
 it+0.013
 them+0.012
 their+0.012
 anything+0.012
Neg logit contributions
anda-0.013
ouse-0.013
umpy-0.013
oc-0.012
int-0.012
Token them
Token ID606
Feature activation+2.85e+00

Pos logit contributions
 respect+0.015
 it+0.014
 such+0.013
 anything+0.013
 risking+0.013
Neg logit contributions
anda-0.171
ouse-0.158
umpy-0.157
jo-0.154
ang-0.152
Token with
Token ID351
Feature activation-3.19e+00

Pos logit contributions
anda+0.085
ouse+0.083
umpy+0.081
oc+0.076
uer+0.074
Neg logit contributions
 the-0.066
 it-0.062
 them-0.058
 their-0.057
 her-0.056
Token a
Token ID257
Feature activation+1.81e+00

Pos logit contributions
 it+0.044
 them+0.040
 the+0.040
 your+0.039
 anything+0.039
Neg logit contributions
ouse-0.085
anda-0.085
umpy-0.083
oc-0.080
affe-0.079
Token big
Token ID1263
Feature activation+1.60e+00

Pos logit contributions
anda+0.055
umpy+0.052
ouse+0.052
oc+0.049
ari+0.048
Neg logit contributions
 the-0.041
 it-0.039
 them-0.036
 their-0.036
 her-0.035
Token smile
Token ID8212
Feature activation+4.01e+00

Pos logit contributions
anda+0.042
ouse+0.041
umpy+0.039
oc+0.037
uer+0.035
Neg logit contributions
 the-0.041
 it-0.039
 them-0.038
 their-0.036
 her-0.036
Token know
Token ID760
Feature activation-4.67e+00

Pos logit contributions
 the+0.048
 it+0.044
 them+0.042
 their+0.040
 her+0.039
Neg logit contributions
anda-0.080
ouse-0.078
umpy-0.077
oc-0.072
uer-0.070
Token why
Token ID1521
Feature activation-3.74e+00

Pos logit contributions
 the+0.041
wered+0.039
 their+0.038
 them+0.037
 it+0.036
Neg logit contributions
anda-0.180
ouse-0.158
umpy-0.154
uddle-0.150
oc-0.148
Token.
Token ID13
Feature activation+1.39e+00

Pos logit contributions
 the+0.062
 them+0.060
 it+0.057
 their+0.056
 such+0.055
Neg logit contributions
anda-0.115
ouse-0.112
umpy-0.108
uddle-0.099
oc-0.099
Token She
Token ID1375
Feature activation+1.47e+00

Pos logit contributions
anda+0.039
ouse+0.039
umpy+0.036
oc+0.036
uer+0.034
Neg logit contributions
 the-0.035
 it-0.034
 them-0.032
 her-0.031
 their-0.030
Token looked
Token ID3114
Feature activation-3.71e+00

Pos logit contributions
ouse+0.046
umpy+0.045
anda+0.045
ari+0.042
oc+0.042
Neg logit contributions
 the-0.033
 it-0.031
 them-0.029
 her-0.028
 their-0.027
Token around
Token ID1088
Feature activation-4.15e+00

Pos logit contributions
 the+0.083
 it+0.079
 them+0.076
 their+0.073
 her+0.070
Neg logit contributions
anda-0.107
ouse-0.102
umpy-0.101
oc-0.093
uddle-0.092
Token the
Token ID262
Feature activation+2.27e+00

Pos logit contributions
 it+0.089
 anything+0.089
wered+0.088
 them+0.087
 yours+0.087
Neg logit contributions
anda-0.087
umpy-0.084
ouse-0.080
oc-0.078
uddle-0.076
Token bathroom
Token ID12436
Feature activation+3.26e+00

Pos logit contributions
anda+0.069
ouse+0.068
umpy+0.066
oc+0.061
uddle+0.061
Neg logit contributions
 the-0.049
 it-0.047
 them-0.044
 their-0.043
 her-0.042
Token,
Token ID11
Feature activation-1.02e+00

Pos logit contributions
anda+0.098
ouse+0.096
umpy+0.093
oc+0.088
uer+0.085
Neg logit contributions
 the-0.074
 it-0.069
 them-0.066
 their-0.063
 her-0.062
Token but
Token ID475
Feature activation-3.07e+00

Pos logit contributions
 the+0.021
 it+0.021
 them+0.021
 anything+0.020
 their+0.020
Neg logit contributions
anda-0.026
ouse-0.025
umpy-0.024
int-0.023
ney-0.022
Token nothing
Token ID2147
Feature activation-9.21e-03

Pos logit contributions
 them+0.046
 respect+0.043
 their+0.043
 the+0.043
 such+0.042
Neg logit contributions
anda-0.091
ouse-0.085
umpy-0.080
uddle-0.079
int-0.077