TOP ACTIVATIONS
MAX = 7.372

the ball and the dog fet ches it . They have
throw the ball and he fet ches it . They take
the ball and the dog fet ches it . They push
a ball and the dog fet ches it . They hug
dust and Lily was able to skip around the house once
was careful and was able to find her way out of
and the tourist was able to finish his tour happily .
, and Lily was able to get her toy back .
They took off the ash tr ay and the book .
. Jack put the ash tr ay on the kitchen table
eventually , they were able to fix the pattern . Sammy
his mom put the ash tr ay away , but Little
tire and he was able to join the race again .
together and they were able to get Jeff out of jail
, and they were able to get on the aer opl
fun together . They managed to not fight after that and
girl , they were able to get his old k ite
perfectly and they were able to unlock the gate .
time , make sure you can hear each other , okay
." She hugged Sarah and fet ched the key from the

MIN ACTIVATIONS
MIN = -7.305

<|endoftext|> One day , a funny hipp o went to play
<|endoftext|> One day , a funny cat named Kitty found a
<|endoftext|> One day , a funny duck named Bob went to
, there was a j olly man . His name was
time there was a j olly old woman who lived in
, there was a j olly little girl named Lily .
, there was a j olly little boy named Max .
. <|endoftext|> One day , a Steve and a Steve 's
, there was a j olly boy named Tim my .
<|endoftext|> One day , a funny cat and a little dog
upon a time , a shy little girl named Lily went
time there was a j olly girl . She was gigg
, there was a j olly girl named Joy . She
, there was a j olly boy named Tom . He
. <|endoftext|> One day , a small girl called Lucy went
. <|endoftext|> One day , a small boy wanted to eat
. <|endoftext|> One day , a little girl named Lily went
a time , a j olly angel flew down from the
upon a time , a silly infant was playing outside in
. <|endoftext|> One day , a little girl went outside to

INTERVAL 3.703 - 7.372
CONTAINS 1.525%

. The dragon took the treasure and flew away . Jack
that she didn 't forget to smile anymore when taking pictures
, you don 't want to get hurt !"
my got frustrated because he didn 't have any money .
it was the perfect way to make their pretend marriage even

INTERVAL 0.034 - 3.703
CONTAINS 49.974%

to help others and to never give up on your dreams
found some in the river nearby . He put them in
that day on , Tim promised to be more careful when
did you do ?" Mom says . " You broke the
. Where had the creature gone ? He had to find

INTERVAL -3.636 - 0.034
CONTAINS 45.181%

to go on the slide , but Ben is scared .
<|endoftext|> Once upon a time , there were two friends
dig in the sand and find things . Tom found a
shy . He nodded and said yes . He said that
clumsy girl named Lily . She always dropped things and tri

INTERVAL -7.305 - -3.636
CONTAINS 3.320%

new friend . <|endoftext|> Once upon a time , there was
cheeks . " But why ?" he asked in
. He had an idea - he would paint !
, there was a big oct opus . The oct opus
time , there was a little girl named Lily . She
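
The four INTERVAL ranges above appear to be equal-width bins spanning MIN to MAX: (7.372 - (-7.305)) / 4 is about 3.669, which reproduces the listed boundaries to within rounding. The sketch below checks this under that assumption. The helper name `interval_edges` and the variable names are hypothetical and are not taken from the tool that produced this dump.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as reported in the dump.
MIN_ACT, MAX_ACT = -7.305, 7.372
edges = interval_edges(MIN_ACT, MAX_ACT)

# CONTAINS percentages from the dump, ordered from lowest to highest interval.
shares = [3.320, 45.181, 49.974, 1.525]

for (a, b), share in zip(zip(edges, edges[1:]), shares):
    print(f"{a:+.3f} .. {b:+.3f}  contains {share:.3f}%")
```

Under this reading, about 95% of activations fall in the two middle bins, and the TOP and MIN example lists sample only the thin outer tails.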
Token the
Token ID262
Feature activation+2.90e+00

Pos logit contributions
ips+0.079
aned+0.073
aled+0.073
wed+0.066
ge+0.065
Neg logit contributions
 Can-0.070
 oh-0.067
 insect-0.065
 mean-0.062
 Please-0.062
Token ball
Token ID2613
Feature activation+4.00e+00

Pos logit contributions
ips+0.092
aned+0.086
aled+0.085
ags+0.081
ge+0.079
Neg logit contributions
 Can-0.046
 oh-0.043
 insect-0.040
 Please-0.039
 I-0.038
Token and
Token ID290
Feature activation+3.07e+00

Pos logit contributions
aned+0.117
ips+0.117
aled+0.110
ulk+0.107
ge+0.105
Neg logit contributions
 Can-0.069
 insect-0.068
 mean-0.068
 oh-0.064
 I-0.062
Token the
Token ID262
Feature activation+2.53e+00

Pos logit contributions
ips+0.090
aned+0.089
aled+0.082
ge+0.079
wed+0.078
Neg logit contributions
 Can-0.059
 oh-0.055
 insect-0.054
 I-0.054
 mean-0.054
Token dog
Token ID3290
Feature activation+2.37e+00

Pos logit contributions
ips+0.073
aned+0.068
aled+0.067
ags+0.062
ge+0.062
Neg logit contributions
 Can-0.048
 oh-0.046
 insect-0.044
 Please-0.042
 witch-0.041
Token fet
Token ID11351
Feature activation+7.37e+00

Pos logit contributions
aned+0.075
ips+0.073
aled+0.068
ulk+0.067
ippy+0.066
Neg logit contributions
 Can-0.038
 mean-0.037
 hello-0.036
 I-0.034
 insect-0.034
Tokenches
Token ID2052
Feature activation+4.47e+00

Pos logit contributions
ips+0.162
aned+0.159
ge+0.146
keeping+0.140
ulk+0.138
Neg logit contributions
 Can-0.174
 please-0.170
 mean-0.165
 I-0.165
 hello-0.164
Token it
Token ID340
Feature activation+3.99e+00

Pos logit contributions
ips+0.166
aned+0.161
aled+0.153
wed+0.149
ge+0.148
Neg logit contributions
 Can-0.045
 insect-0.044
 mean-0.042
 I-0.040
 oh-0.040
Token.
Token ID13
Feature activation+1.39e+00

Pos logit contributions
ips+0.144
aned+0.141
aled+0.133
ulk+0.131
 ease+0.126
Neg logit contributions
 Can-0.046
 mean-0.044
 hello-0.039
 I-0.038
 insect-0.038
Token They
Token ID1119
Feature activation+2.81e+00

Pos logit contributions
ips+0.042
aned+0.041
aled+0.039
ge+0.037
wed+0.036
Neg logit contributions
 Can-0.026
 oh-0.024
 insect-0.023
 I-0.022
 mean-0.022
Token have
Token ID423
Feature activation+9.82e-01

Pos logit contributions
ips+0.089
aned+0.088
aled+0.080
ge+0.078
ulk+0.078
Neg logit contributions
 mean-0.048
 Can-0.045
 I-0.044
 hello-0.043
 insect-0.042
Token throw
Token ID3714
Feature activation+2.85e+00

Pos logit contributions
ips+0.069
aned+0.068
aled+0.061
ge+0.061
ulk+0.061
Neg logit contributions
 mean-0.038
 Can-0.037
 hello-0.036
 I-0.035
 insect-0.035
Token the
Token ID262
Feature activation+2.52e+00

Pos logit contributions
ips+0.076
aned+0.074
aled+0.070
ge+0.066
wed+0.064
Neg logit contributions
 Can-0.061
 oh-0.058
 insect-0.057
 mean-0.055
 hello-0.053
Token ball
Token ID2613
Feature activation+3.68e+00

Pos logit contributions
ips+0.068
aned+0.064
aled+0.062
ge+0.057
wed+0.057
Neg logit contributions
 Can-0.054
 oh-0.052
 insect-0.050
 mean-0.048
 Please-0.047
Token and
Token ID290
Feature activation+2.77e+00

Pos logit contributions
ips+0.111
aned+0.109
aled+0.103
ge+0.098
ulk+0.098
Neg logit contributions
 Can-0.065
 insect-0.061
 mean-0.059
 oh-0.059
 I-0.056
Token he
Token ID339
Feature activation+2.85e+00

Pos logit contributions
ips+0.081
aned+0.080
aled+0.074
ge+0.071
ippy+0.070
Neg logit contributions
 Can-0.054
 insect-0.049
 oh-0.049
 I-0.049
 mean-0.049
Token fet
Token ID11351
Feature activation+7.17e+00

Pos logit contributions
aned+0.074
ips+0.073
ge+0.066
aled+0.066
ulk+0.064
Neg logit contributions
 Can-0.062
 mean-0.058
 I-0.057
 insect-0.056
 hello-0.055
Tokenches
Token ID2052
Feature activation+3.96e+00

Pos logit contributions
aned+0.164
ips+0.161
ge+0.149
keeping+0.142
ulk+0.141
Neg logit contributions
 Can-0.162
 please-0.156
 I-0.152
 mean-0.151
 hello-0.151
Token it
Token ID340
Feature activation+3.70e+00

Pos logit contributions
ips+0.149
aned+0.145
aled+0.137
wed+0.134
ulk+0.133
Neg logit contributions
 Can-0.038
 insect-0.035
 mean-0.035
 I-0.033
 oh-0.032
Token.
Token ID13
Feature activation+6.85e-01

Pos logit contributions
ips+0.129
aned+0.128
aled+0.118
ulk+0.117
ge+0.115
Neg logit contributions
 Can-0.048
 mean-0.044
 insect-0.040
 hello-0.039
 I-0.039
Token They
Token ID1119
Feature activation+2.79e+00

Pos logit contributions
ips+0.022
aned+0.022
aled+0.021
ge+0.021
wed+0.020
Neg logit contributions
 Can-0.011
 oh-0.010
 I-0.009
 Please-0.009
 hello-0.009
Token take
Token ID1011
Feature activation+3.14e+00

Pos logit contributions
aned+0.089
ips+0.089
ulk+0.078
aled+0.078
ge+0.078
Neg logit contributions
 mean-0.048
 Can-0.045
 I-0.044
 hello-0.043
 insect-0.042
Token the
Token ID262
Feature activation+2.43e+00

Pos logit contributions
ips+0.070
aned+0.067
aled+0.064
ge+0.060
wed+0.059
Neg logit contributions
 Can-0.061
 oh-0.058
 insect-0.057
 mean-0.055
 Please-0.053
Token ball
Token ID2613
Feature activation+3.46e+00

Pos logit contributions
ips+0.067
aned+0.062
aled+0.061
ags+0.057
wed+0.057
Neg logit contributions
 Can-0.049
 oh-0.047
 insect-0.044
 Please-0.043
 I-0.042
Token and
Token ID290
Feature activation+2.57e+00

Pos logit contributions
ips+0.100
aned+0.099
aled+0.093
ge+0.089
ulk+0.089
Neg logit contributions
 Can-0.064
 insect-0.061
 mean-0.060
 oh-0.059
 I-0.056
Token the
Token ID262
Feature activation+2.13e+00

Pos logit contributions
ips+0.073
aned+0.072
aled+0.067
ge+0.065
wed+0.063
Neg logit contributions
 Can-0.051
 insect-0.047
 oh-0.047
 I-0.047
 mean-0.046
Token dog
Token ID3290
Feature activation+1.71e+00

Pos logit contributions
ips+0.059
aned+0.054
aled+0.054
ags+0.049
ge+0.049
Neg logit contributions
 Can-0.043
 oh-0.042
 insect-0.040
 Please-0.038
 mean-0.037
Token fet
Token ID11351
Feature activation+7.07e+00

Pos logit contributions
aned+0.051
ips+0.051
aled+0.047
ulk+0.046
ge+0.046
Neg logit contributions
 Can-0.030
 mean-0.029
 hello-0.028
 I-0.027
 insect-0.027
Tokenches
Token ID2052
Feature activation+4.12e+00

Pos logit contributions
ips+0.155
aned+0.153
ge+0.142
ulk+0.136
keeping+0.135
Neg logit contributions
 Can-0.166
 please-0.162
 mean-0.159
 I-0.159
 hello-0.156
Token it
Token ID340
Feature activation+3.52e+00

Pos logit contributions
ips+0.149
aned+0.143
aled+0.137
wed+0.132
ge+0.132
Neg logit contributions
 Can-0.046
 insect-0.045
 mean-0.043
 oh-0.041
 I-0.041
Token.
Token ID13
Feature activation+1.05e+00

Pos logit contributions
ips+0.123
aned+0.119
aled+0.113
ulk+0.112
ge+0.109
Neg logit contributions
 Can-0.046
 mean-0.042
 hello-0.039
 insect-0.039
 I-0.039
Token They
Token ID1119
Feature activation+2.57e+00

Pos logit contributions
ips+0.032
aned+0.031
aled+0.029
ge+0.028
wed+0.028
Neg logit contributions
 Can-0.020
 oh-0.018
 I-0.017
 insect-0.017
 Please-0.017
Token push
Token ID4574
Feature activation+2.98e+00

Pos logit contributions
ips+0.082
aned+0.081
aled+0.073
ge+0.072
ulk+0.072
Neg logit contributions
 mean-0.044
 Can-0.041
 I-0.040
 hello-0.039
 insect-0.038
Token a
Token ID257
Feature activation+2.30e+00

Pos logit contributions
ips+0.078
aned+0.072
aled+0.072
wed+0.065
ge+0.065
Neg logit contributions
 Can-0.062
 oh-0.059
 insect-0.057
 mean-0.054
 Please-0.054
Token ball
Token ID2613
Feature activation+2.49e+00

Pos logit contributions
ips+0.060
aned+0.056
aled+0.055
ge+0.051
wed+0.050
Neg logit contributions
 Can-0.050
 oh-0.048
 insect-0.046
 mean-0.044
 Please-0.044
Token and
Token ID290
Feature activation+1.77e+00

Pos logit contributions
ips+0.077
aned+0.076
aled+0.073
ge+0.069
ulk+0.068
Neg logit contributions
 Can-0.041
 insect-0.038
 oh-0.038
 mean-0.037
 I-0.035
Token the
Token ID262
Feature activation+1.69e+00

Pos logit contributions
ips+0.047
aned+0.044
aled+0.043
wed+0.039
ge+0.039
Neg logit contributions
 Can-0.038
 oh-0.037
 insect-0.035
 mean-0.033
 Please-0.033
Token dog
Token ID3290
Feature activation+1.45e+00

Pos logit contributions
ips+0.048
aned+0.044
aled+0.043
ags+0.041
 ease+0.041
Neg logit contributions
 Can-0.032
 oh-0.030
 insect-0.029
 Please-0.028
 I-0.027
Token fet
Token ID11351
Feature activation+7.02e+00

Pos logit contributions
aned+0.046
ips+0.045
aled+0.042
ulk+0.041
ge+0.041
Neg logit contributions
 Can-0.024
 mean-0.023
 I-0.022
 hello-0.021
 insect-0.021
Tokenches
Token ID2052
Feature activation+4.07e+00

Pos logit contributions
aned+0.154
ips+0.152
ge+0.143
keeping+0.138
ulk+0.133
Neg logit contributions
 Can-0.166
 please-0.160
 mean-0.157
 I-0.157
 hello-0.154
Token it
Token ID340
Feature activation+3.74e+00

Pos logit contributions
ips+0.150
aned+0.144
aled+0.138
wed+0.134
ge+0.134
Neg logit contributions
 Can-0.043
 insect-0.041
 mean-0.039
 I-0.038
 oh-0.038
Token.
Token ID13
Feature activation+1.12e+00

Pos logit contributions
ips+0.135
aned+0.133
aled+0.125
ulk+0.124
ge+0.121
Neg logit contributions
 Can-0.043
 mean-0.040
 I-0.036
 hello-0.035
 insect-0.034
Token They
Token ID1119
Feature activation+2.74e+00

Pos logit contributions
ips+0.033
aned+0.031
aled+0.030
ge+0.029
wed+0.028
Neg logit contributions
 Can-0.022
 oh-0.020
 insect-0.020
 mean-0.019
 I-0.019
Token hug
Token ID16225
Feature activation+5.12e-01

Pos logit contributions
ips+0.088
aned+0.088
aled+0.078
ulk+0.078
ge+0.078
Neg logit contributions
 mean-0.047
 Can-0.044
 I-0.043
 hello-0.041
 insect-0.040
Token dust
Token ID8977
Feature activation+3.57e+00

Pos logit contributions
ips+0.108
aned+0.102
aled+0.102
ge+0.090
ulk+0.090
Neg logit contributions
 Can-0.077
 oh-0.075
 mean-0.075
 insect-0.075
 witch-0.070
Token and
Token ID290
Feature activation+1.28e+00

Pos logit contributions
ips+0.116
aned+0.112
aled+0.112
wed+0.104
ws+0.103
Neg logit contributions
 Can-0.050
 oh-0.049
 insect-0.048
 mean-0.045
 I-0.045
Token Lily
Token ID20037
Feature activation+2.47e+00

Pos logit contributions
ips+0.038
aned+0.037
aled+0.035
ippy+0.033
ags+0.033
Neg logit contributions
 oh-0.024
 Can-0.023
 insect-0.022
 mean-0.022
 I-0.021
Token was
Token ID373
Feature activation+1.11e+00

Pos logit contributions
ips+0.086
ags+0.085
aled+0.084
aned+0.081
ulk+0.080
Neg logit contributions
 curious-0.027
 oh-0.027
 mean-0.026
 wanted-0.025
 happened-0.025
Token able
Token ID1498
Feature activation+7.99e-01

Pos logit contributions
ips+0.035
aled+0.033
aned+0.033
ws+0.032
ippy+0.032
Neg logit contributions
 oh-0.019
 curious-0.018
 mean-0.018
 kind-0.016
 nos-0.016
Token to
Token ID284
Feature activation+6.93e+00

Pos logit contributions
ips+0.027
aned+0.027
aled+0.027
ags+0.024
ws+0.024
Neg logit contributions
 Can-0.011
 oh-0.011
 mean-0.010
 I-0.010
 insect-0.010
Token skip
Token ID14267
Feature activation+3.03e+00

Pos logit contributions
ips+0.219
aned+0.211
aled+0.206
ws+0.192
wed+0.191
Neg logit contributions
 please-0.114
 mean-0.113
 I-0.112
 insect-0.109
 curious-0.104
Token around
Token ID1088
Feature activation+1.11e+00

Pos logit contributions
aned+0.087
ips+0.085
aled+0.080
wed+0.076
ge+0.074
Neg logit contributions
 Can-0.059
 oh-0.055
 mean-0.054
 I-0.054
 insect-0.054
Token the
Token ID262
Feature activation+2.17e+00

Pos logit contributions
ips+0.033
aned+0.032
aled+0.031
wed+0.029
ge+0.028
Neg logit contributions
 Can-0.021
 oh-0.020
 insect-0.019
 mean-0.018
 I-0.018
Token house
Token ID2156
Feature activation+1.93e+00

Pos logit contributions
ips+0.054
aned+0.051
aled+0.049
ge+0.045
wed+0.045
Neg logit contributions
 Can-0.051
 oh-0.049
 insect-0.048
 mean-0.046
 Please-0.045
Token once
Token ID1752
Feature activation+2.72e+00

Pos logit contributions
ips+0.062
aned+0.060
aled+0.060
ags+0.056
wed+0.055
Neg logit contributions
 Can-0.030
 oh-0.029
 insect-0.027
 I-0.027
 mean-0.027
Token was
Token ID373
Feature activation+1.10e+00

Pos logit contributions
ips+0.054
aned+0.049
aled+0.048
ags+0.047
ge+0.046
Neg logit contributions
 Can-0.027
 oh-0.026
 insect-0.025
 mean-0.024
 I-0.024
Token careful
Token ID8161
Feature activation+4.42e-01

Pos logit contributions
ips+0.034
aned+0.032
aled+0.031
ge+0.030
wed+0.029
Neg logit contributions
 Can-0.019
 oh-0.019
 insect-0.018
 mean-0.018
 I-0.017
Token and
Token ID290
Feature activation+1.11e+00

Pos logit contributions
ips+0.012
aned+0.012
aled+0.011
 safely+0.011
ge+0.011
Neg logit contributions
 Can-0.009
 oh-0.008
 insect-0.008
 witch-0.008
 mean-0.008
Token was
Token ID373
Feature activation+1.20e+00

Pos logit contributions
ips+0.031
aned+0.030
aled+0.029
 safely+0.028
ge+0.027
Neg logit contributions
 Can-0.022
 insect-0.020
 oh-0.020
 mean-0.018
 Please-0.018
Token able
Token ID1498
Feature activation+5.01e-01

Pos logit contributions
ips+0.036
aned+0.035
aled+0.034
ge+0.031
wed+0.031
Neg logit contributions
 Can-0.022
 oh-0.021
 insect-0.020
 mean-0.020
 I-0.019
Token to
Token ID284
Feature activation+6.87e+00

Pos logit contributions
ips+0.016
aned+0.015
aled+0.015
wed+0.014
ge+0.014
Neg logit contributions
 Can-0.009
 oh-0.008
 insect-0.008
 mean-0.008
 I-0.008
Token find
Token ID1064
Feature activation+2.29e+00

Pos logit contributions
ips+0.221
aned+0.200
aled+0.191
ushes+0.178
ags+0.176
Neg logit contributions
 mean-0.128
 oh-0.122
 insect-0.121
 please-0.119
 I-0.119
Token her
Token ID607
Feature activation+2.75e+00

Pos logit contributions
ips+0.066
aned+0.063
aled+0.062
ge+0.057
wed+0.057
Neg logit contributions
 Can-0.044
 oh-0.041
 insect-0.040
 mean-0.039
 Please-0.038
Token way
Token ID835
Feature activation+3.02e+00

Pos logit contributions
ips+0.069
aned+0.066
aled+0.064
ulk+0.057
wed+0.056
Neg logit contributions
 Can-0.065
 oh-0.064
 insect-0.063
 mean-0.061
 I-0.060
Token out
Token ID503
Feature activation+1.81e+00

Pos logit contributions
ips+0.058
aned+0.054
aled+0.052
wed+0.046
ws+0.044
Neg logit contributions
 Can-0.088
 oh-0.085
 insect-0.084
 I-0.083
 mean-0.082
Token of
Token ID286
Feature activation+2.78e+00

Pos logit contributions
ips+0.055
aned+0.053
aled+0.052
wed+0.048
ge+0.047
Neg logit contributions
 Can-0.032
 oh-0.030
 insect-0.029
 mean-0.028
 I-0.027
Token and
Token ID290
Feature activation-4.31e-01

Pos logit contributions
 Can+0.019
 oh+0.018
 insect+0.017
 Please+0.016
 mean+0.016
Neg logit contributions
ips-0.029
aned-0.027
aled-0.027
 safely-0.026
ge-0.025
Token the
Token ID262
Feature activation+2.75e-01

Pos logit contributions
 Can+0.007
 oh+0.007
 insect+0.007
 mean+0.006
 Please+0.006
Neg logit contributions
ips-0.013
aned-0.013
aled-0.013
ge-0.012
wed-0.012
Token tourist
Token ID18473
Feature activation+3.13e-01

Pos logit contributions
ips+0.008
aled+0.008
aned+0.008
 safely+0.007
 ease+0.007
Neg logit contributions
 Can-0.005
 oh-0.005
 insect-0.004
 Please-0.004
 please-0.004
Token was
Token ID373
Feature activation+1.05e+00

Pos logit contributions
ips+0.009
aned+0.009
aled+0.009
ge+0.008
wed+0.008
Neg logit contributions
 Can-0.006
 oh-0.005
 insect-0.005
 mean-0.005
 I-0.005
Token able
Token ID1498
Feature activation+4.11e-01

Pos logit contributions
ips+0.032
aned+0.031
aled+0.030
ge+0.028
wed+0.027
Neg logit contributions
 oh-0.018
 Can-0.018
 mean-0.018
 insect-0.017
 I-0.016
Token to
Token ID284
Feature activation+6.83e+00

Pos logit contributions
ips+0.013
aned+0.012
aled+0.012
ge+0.011
wed+0.011
Neg logit contributions
 Can-0.007
 oh-0.007
 insect-0.007
 mean-0.006
 I-0.006
Token finish
Token ID5461
Feature activation+2.44e+00

Pos logit contributions
ips+0.231
aned+0.213
aled+0.201
ulk+0.189
ushes+0.184
Neg logit contributions
 mean-0.116
 please-0.111
 named-0.102
 I-0.100
 oh-0.099
Token his
Token ID465
Feature activation+3.68e+00

Pos logit contributions
aned+0.075
ips+0.073
aled+0.070
wed+0.066
ippy+0.064
Neg logit contributions
 oh-0.042
 Can-0.041
 insect-0.038
 mean-0.038
 I-0.037
Token tour
Token ID4205
Feature activation+3.85e+00

Pos logit contributions
ips+0.101
aned+0.096
aled+0.092
ge+0.086
wed+0.085
Neg logit contributions
 Can-0.077
 oh-0.074
 insect-0.073
 mean-0.071
 witch-0.068
Token happily
Token ID18177
Feature activation+3.62e+00

Pos logit contributions
ips+0.121
aned+0.120
aled+0.113
wed+0.109
ws+0.108
Neg logit contributions
 Can-0.060
 I-0.058
 oh-0.056
 mean-0.056
 insect-0.053
Token.
Token ID13
Feature activation+2.38e+00

Pos logit contributions
aned+0.121
ips+0.121
aled+0.118
ulk+0.109
ags+0.109
Neg logit contributions
 I-0.047
 Can-0.047
 oh-0.045
 insect-0.044
 mean-0.042
Token,
Token ID11
Feature activation+1.66e-01

Pos logit contributions
ips+0.056
aled+0.053
aned+0.053
wed+0.051
ags+0.050
Neg logit contributions
 Can-0.018
 oh-0.017
 insect-0.016
 mean-0.016
 curious-0.015
Token and
Token ID290
Feature activation-2.29e-01

Pos logit contributions
ips+0.005
aned+0.005
aled+0.005
wed+0.005
ge+0.005
Neg logit contributions
 Can-0.003
 oh-0.002
 insect-0.002
 mean-0.002
 I-0.002
Token Lily
Token ID20037
Feature activation+1.90e+00

Pos logit contributions
 Can+0.003
 oh+0.003
 insect+0.002
 mean+0.002
 I+0.002
Neg logit contributions
ips-0.008
aned-0.008
aled-0.008
ags-0.007
ge-0.007
Token was
Token ID373
Feature activation+1.25e+00

Pos logit contributions
ips+0.070
ags+0.065
aled+0.064
aned+0.062
ws+0.061
Neg logit contributions
 oh-0.022
 mean-0.022
 Can-0.021
 curious-0.021
 named-0.020
Token able
Token ID1498
Feature activation+1.46e+00

Pos logit contributions
ips+0.042
aned+0.039
aled+0.038
wed+0.037
ge+0.036
Neg logit contributions
 mean-0.019
 oh-0.019
 Can-0.018
 curious-0.018
 insect-0.017
Token to
Token ID284
Feature activation+6.80e+00

Pos logit contributions
ips+0.048
aned+0.047
aled+0.046
wed+0.042
ags+0.042
Neg logit contributions
 Can-0.022
 oh-0.022
 mean-0.020
 insect-0.020
 I-0.020
Token get
Token ID651
Feature activation+3.13e+00

Pos logit contributions
ips+0.228
aned+0.204
aled+0.196
ushes+0.192
ulk+0.189
Neg logit contributions
 mean-0.118
 please-0.108
 insect-0.108
 hello-0.107
 oh-0.105
Token her
Token ID607
Feature activation+3.40e+00

Pos logit contributions
ips+0.074
aned+0.069
aled+0.067
ge+0.061
wed+0.061
Neg logit contributions
 Can-0.077
 oh-0.074
 insect-0.073
 mean-0.069
 Please-0.069
Token toy
Token ID13373
Feature activation+3.47e+00

Pos logit contributions
ips+0.092
aned+0.087
aled+0.085
ge+0.078
wed+0.077
Neg logit contributions
 Can-0.073
 oh-0.070
 insect-0.068
 mean-0.066
 I-0.064
Token back
Token ID736
Feature activation+4.18e+00

Pos logit contributions
ips+0.076
aned+0.072
aled+0.069
wed+0.062
ge+0.062
Neg logit contributions
 Can-0.091
 oh-0.088
 insect-0.087
 mean-0.084
 I-0.083
Token.
Token ID13
Feature activation+2.31e+00

Pos logit contributions
aned+0.142
ips+0.142
aled+0.138
ws+0.129
wed+0.129
Neg logit contributions
 I-0.054
 Can-0.053
 insect-0.049
 oh-0.048
 mean-0.047
Token They
Token ID1119
Feature activation+2.11e+00

Pos logit contributions
 Can+0.004
 oh+0.004
 insect+0.004
 I+0.003
 Please+0.003
Neg logit contributions
ips-0.006
aned-0.006
aled-0.006
ge-0.005
wed-0.005
Token took
Token ID1718
Feature activation+3.80e+00

Pos logit contributions
ips+0.067
aned+0.063
aled+0.062
ge+0.059
ulk+0.057
Neg logit contributions
 Can-0.035
 oh-0.034
 insect-0.032
 mean-0.032
 I-0.030
Token off
Token ID572
Feature activation+1.88e+00

Pos logit contributions
aned+0.116
ips+0.113
ge+0.108
aled+0.104
ulk+0.099
Neg logit contributions
 oh-0.064
 Can-0.063
 insect-0.061
 mean-0.060
 I-0.057
Token the
Token ID262
Feature activation+3.12e+00

Pos logit contributions
aned+0.060
ips+0.059
aled+0.056
ge+0.053
ws+0.052
Neg logit contributions
 Can-0.030
 oh-0.030
 insect-0.029
 mean-0.028
 hello-0.026
Token ash
Token ID12530
Feature activation+3.61e+00

Pos logit contributions
ips+0.092
aned+0.091
aled+0.088
ps+0.085
ags+0.084
Neg logit contributions
 Can-0.053
 Please-0.047
 oh-0.046
 insect-0.046
 I-0.044
Tokentr
Token ID2213
Feature activation+6.80e+00

Pos logit contributions
ips+0.103
aned+0.096
aled+0.094
ge+0.090
 ease+0.088
Neg logit contributions
 Can-0.071
 oh-0.068
 insect-0.066
 mean-0.064
 witch-0.062
Tokenay
Token ID323
Feature activation+4.63e+00

Pos logit contributions
ips+0.171
aned+0.171
aled+0.168
ge+0.162
wed+0.156
Neg logit contributions
 Can-0.135
 oh-0.127
 I-0.122
 please-0.122
 Please-0.117
Token and
Token ID290
Feature activation+2.36e+00

Pos logit contributions
ips+0.152
aned+0.145
aled+0.143
ulk+0.134
ge+0.132
Neg logit contributions
 Can-0.066
 insect-0.063
 oh-0.063
 mean-0.059
 I-0.058
Token the
Token ID262
Feature activation+1.29e+00

Pos logit contributions
ips+0.059
aned+0.056
aled+0.055
wed+0.051
ge+0.050
Neg logit contributions
 Can-0.054
 oh-0.051
 insect-0.050
 mean-0.048
 Please-0.047
Token book
Token ID1492
Feature activation+3.18e+00

Pos logit contributions
ips+0.034
aned+0.031
aled+0.031
ps+0.030
 ease+0.030
Neg logit contributions
 Can-0.026
 oh-0.024
 insect-0.023
 Please-0.023
 I-0.022
Token.
Token ID13
Feature activation+2.20e-01

Pos logit contributions
ips+0.106
aned+0.102
aled+0.099
ippy+0.096
ulk+0.094
Neg logit contributions
 mean-0.044
 Can-0.042
 oh-0.039
 I-0.039
 hello-0.039
Token.
Token ID13
Feature activation+1.29e+00

Pos logit contributions
ips+0.015
aned+0.014
aled+0.014
ge+0.013
wed+0.013
Neg logit contributions
 Can-0.010
 oh-0.010
 insect-0.009
 mean-0.009
 Please-0.009
Token Jack
Token ID3619
Feature activation+5.15e-01

Pos logit contributions
ips+0.038
aned+0.036
aled+0.036
ge+0.033
wed+0.033
Neg logit contributions
 Can-0.024
 oh-0.023
 insect-0.022
 mean-0.021
 Please-0.020
Token put
Token ID1234
Feature activation+3.53e+00

Pos logit contributions
ips+0.017
aned+0.016
aled+0.016
ags+0.015
ge+0.015
Neg logit contributions
 Can-0.008
 oh-0.007
 insect-0.007
 mean-0.007
 I-0.007
Token the
Token ID262
Feature activation+2.52e+00

Pos logit contributions
ips+0.100
aned+0.098
aled+0.093
ge+0.088
wed+0.086
Neg logit contributions
 Can-0.069
 oh-0.068
 insect-0.064
 mean-0.062
 I-0.061
Token ash
Token ID12530
Feature activation+3.50e+00

Pos logit contributions
ips+0.082
aned+0.080
aled+0.078
wed+0.074
ge+0.073
Neg logit contributions
 Can-0.037
 oh-0.033
 insect-0.032
 Please-0.031
 I-0.030
Tokentr
Token ID2213
Feature activation+6.79e+00

Pos logit contributions
ips+0.099
aned+0.091
aled+0.088
ulk+0.084
ge+0.084
Neg logit contributions
 Can-0.071
 oh-0.069
 insect-0.064
 mean-0.064
 I-0.063
Tokenay
Token ID323
Feature activation+4.88e+00

Pos logit contributions
ips+0.177
aned+0.175
aled+0.171
ge+0.165
ws+0.161
Neg logit contributions
 Can-0.134
 oh-0.125
 I-0.122
 please-0.118
 Please-0.117
Token on
Token ID319
Feature activation+1.32e+00

Pos logit contributions
ips+0.130
aned+0.122
aled+0.117
ulk+0.112
ge+0.110
Neg logit contributions
 Can-0.101
 oh-0.100
 mean-0.093
 I-0.093
 insect-0.093
Token the
Token ID262
Feature activation+2.01e+00

Pos logit contributions
ips+0.037
aned+0.035
aled+0.034
ge+0.032
wed+0.031
Neg logit contributions
 Can-0.027
 oh-0.025
 insect-0.025
 mean-0.023
 Please-0.023
Token kitchen
Token ID9592
Feature activation+1.76e+00

Pos logit contributions
ips+0.053
aned+0.049
aled+0.047
ge+0.043
wed+0.043
Neg logit contributions
 Can-0.047
 oh-0.045
 insect-0.044
 mean-0.042
 witch-0.041
Token table
Token ID3084
Feature activation+2.34e+00

Pos logit contributions
ips+0.055
aned+0.053
aled+0.052
ge+0.048
wed+0.047
Neg logit contributions
 Can-0.029
 oh-0.028
 insect-0.027
 mean-0.026
 I-0.025
Token eventually
Token ID4191
Feature activation+2.74e-01

Pos logit contributions
ips+0.002
aned+0.002
aled+0.002
ge+0.002
wed+0.002
Neg logit contributions
 Can-0.001
 oh-0.001
 insect-0.001
 mean-0.001
 I-0.001
Token,
Token ID11
Feature activation-4.03e-01

Pos logit contributions
ips+0.009
aned+0.008
aled+0.008
ags+0.008
ge+0.008
Neg logit contributions
 Can-0.004
 oh-0.004
 insect-0.004
 mean-0.004
 I-0.004
Token they
Token ID484
Feature activation+2.05e+00

Pos logit contributions
 Can+0.007
 oh+0.006
 insect+0.006
 mean+0.006
 I+0.006
Neg logit contributions
ips-0.013
aned-0.012
aled-0.012
ge-0.011
wed-0.011
Token were
Token ID547
Feature activation+1.35e+00

Pos logit contributions
ips+0.069
aned+0.063
ags+0.061
aled+0.061
ulk+0.060
Neg logit contributions
 mean-0.031
 insect-0.029
 Can-0.028
 oh-0.028
 happened-0.028
Token able
Token ID1498
Feature activation+1.18e+00

Pos logit contributions
ips+0.036
aned+0.034
aled+0.033
ippy+0.031
ws+0.031
Neg logit contributions
 mean-0.028
 curious-0.027
 oh-0.027
 Can-0.025
 nos-0.025
Token to
Token ID284
Feature activation+6.76e+00

Pos logit contributions
ips+0.040
aned+0.039
aled+0.038
ags+0.035
ws+0.035
Neg logit contributions
 Can-0.017
 oh-0.016
 mean-0.016
 insect-0.015
 I-0.015
Token fix
Token ID4259
Feature activation+2.18e+00

Pos logit contributions
ips+0.217
aned+0.195
aled+0.187
ulk+0.172
ippy+0.172
Neg logit contributions
 mean-0.122
 please-0.117
 I-0.114
 oh-0.111
 insect-0.109
Token the
Token ID262
Feature activation+2.31e+00

Pos logit contributions
ips+0.068
aned+0.068
aled+0.064
wed+0.060
ge+0.060
Neg logit contributions
 oh-0.035
 Can-0.035
 insect-0.034
 mean-0.032
 I-0.030
Token pattern
Token ID3912
Feature activation+2.94e+00

Pos logit contributions
ips+0.062
aned+0.060
aled+0.057
wed+0.054
ge+0.053
Neg logit contributions
 Can-0.050
 insect-0.050
 mean-0.048
 oh-0.048
 I-0.046
Token.
Token ID13
Feature activation+2.28e+00

Pos logit contributions
ips+0.088
aned+0.085
aled+0.083
ags+0.076
wed+0.076
Neg logit contributions
 Can-0.051
 oh-0.050
 insect-0.049
 mean-0.047
 I-0.047
Token Sammy
Token ID44067
Feature activation+6.86e-01

Pos logit contributions
ips+0.070
aned+0.068
aled+0.065
ge+0.064
wed+0.062
Neg logit contributions
 Can-0.040
 oh-0.036
 insect-0.036
 I-0.034
 Please-0.033
Token his
Token ID465
Feature activation+1.66e+00

Pos logit contributions
 Can+0.015
 insect+0.014
 oh+0.013
 mean+0.013
 witch+0.012
Neg logit contributions
ips-0.032
aned-0.031
aled-0.030
 safely-0.029
ge-0.029
Token mom
Token ID1995
Feature activation+5.97e-01

Pos logit contributions
ips+0.048
aned+0.046
aled+0.045
wed+0.042
ge+0.042
Neg logit contributions
 Can-0.032
 oh-0.030
 insect-0.028
 mean-0.027
 Please-0.027
Token put
Token ID1234
Feature activation+2.98e+00

Pos logit contributions
ips+0.018
aned+0.017
aled+0.017
ge+0.015
wed+0.015
Neg logit contributions
 Can-0.011
 oh-0.010
 insect-0.010
 mean-0.009
 Please-0.009
Token the
Token ID262
Feature activation+2.64e+00

Pos logit contributions
ips+0.084
aned+0.081
aled+0.078
ge+0.074
wed+0.072
Neg logit contributions
 Can-0.059
 oh-0.057
 insect-0.055
 mean-0.053
 I-0.051
Token ash
Token ID12530
Feature activation+3.14e+00

Pos logit contributions
ips+0.082
aned+0.080
aled+0.077
wed+0.074
ps+0.074
Neg logit contributions
 Can-0.042
 oh-0.037
 Please-0.036
 insect-0.036
 I-0.035
Tokentr
Token ID2213
Feature activation+6.76e+00

Pos logit contributions
ips+0.085
aned+0.080
aled+0.077
ge+0.074
ulk+0.072
Neg logit contributions
 Can-0.066
 oh-0.064
 insect-0.061
 mean-0.060
 I-0.058
Tokenay
Token ID323
Feature activation+4.42e+00

Pos logit contributions
ips+0.177
aned+0.173
aled+0.170
ge+0.166
wed+0.161
Neg logit contributions
 Can-0.132
 oh-0.124
 I-0.121
 please-0.118
 Please-0.116
Token away
Token ID1497
Feature activation+1.57e+00

Pos logit contributions
ips+0.116
aned+0.111
aled+0.106
ge+0.100
ulk+0.100
Neg logit contributions
 Can-0.093
 oh-0.091
 insect-0.086
 mean-0.085
 I-0.084
Token,
Token ID11
Feature activation-5.24e-01

Pos logit contributions
ips+0.048
aned+0.046
aled+0.045
ge+0.042
wed+0.042
Neg logit contributions
 Can-0.027
 oh-0.026
 insect-0.025
 mean-0.024
 I-0.023
Token but
Token ID475
Feature activation-8.24e-01

Pos logit contributions
 Can+0.009
 oh+0.009
 insect+0.009
 mean+0.008
 witch+0.008
Neg logit contributions
ips-0.015
aned-0.015
aled-0.014
 safely-0.013
ge-0.013
Token Little
Token ID7703
Feature activation-4.02e+00

Pos logit contributions
 Can+0.014
 insect+0.013
 oh+0.012
 witch+0.012
 mean+0.012
Neg logit contributions
ips-0.025
aned-0.024
aled-0.023
 safely-0.023
 securely-0.022
Token tire
Token ID15867
Feature activation+3.82e+00

Pos logit contributions
ips+0.096
aned+0.091
aled+0.087
wed+0.080
ulk+0.080
Neg logit contributions
 mean-0.070
 Can-0.069
 oh-0.068
 insect-0.067
 curious-0.066
Token and
Token ID290
Feature activation+6.46e-01

Pos logit contributions
ips+0.128
aned+0.123
aled+0.118
ags+0.116
ws+0.113
Neg logit contributions
 oh-0.051
 Can-0.047
 mean-0.046
 insect-0.045
 I-0.044
Token he
Token ID339
Feature activation+2.04e+00

Pos logit contributions
ips+0.020
aned+0.019
aled+0.018
ags+0.018
ippy+0.017
Neg logit contributions
 oh-0.012
 Can-0.012
 mean-0.011
 insect-0.010
 I-0.010
Token was
Token ID373
Feature activation+1.03e+00

Pos logit contributions
ips+0.070
ags+0.067
aned+0.066
ulk+0.063
aled+0.063
Neg logit contributions
 mean-0.026
 named-0.024
 oh-0.024
 curious-0.023
 happened-0.023
Token able
Token ID1498
Feature activation+7.37e-01

Pos logit contributions
ips+0.031
aned+0.030
aled+0.029
ge+0.028
ws+0.027
Neg logit contributions
 oh-0.019
 mean-0.018
 curious-0.017
 Can-0.017
 I-0.016
Token to
Token ID284
Feature activation+6.75e+00

Pos logit contributions
ips+0.025
aned+0.024
aled+0.024
ags+0.022
ws+0.022
Neg logit contributions
 Can-0.011
 oh-0.010
 mean-0.010
 I-0.010
 insect-0.009
Token join
Token ID4654
Feature activation+9.84e-01

Pos logit contributions
ips+0.225
aned+0.215
ushes+0.195
aled+0.194
wed+0.190
Neg logit contributions
 mean-0.111
 please-0.103
 I-0.101
 oh-0.095
 Can-0.092
Token the
Token ID262
Feature activation+1.19e+00

Pos logit contributions
ips+0.030
aned+0.029
aled+0.027
wed+0.026
ws+0.026
Neg logit contributions
 Can-0.018
 oh-0.018
 I-0.016
 mean-0.016
 hello-0.016
Token race
Token ID3234
Feature activation+1.85e+00

Pos logit contributions
ips+0.035
aned+0.034
aled+0.032
wed+0.030
ags+0.030
Neg logit contributions
 Can-0.024
 oh-0.023
 mean-0.022
 insect-0.022
 I-0.021
Token again
Token ID757
Feature activation+1.12e+00

Pos logit contributions
aned+0.057
ips+0.057
aled+0.054
wed+0.052
ws+0.051
Neg logit contributions
 Can-0.031
 oh-0.030
 insect-0.028
 I-0.028
 mean-0.028
Token.
Token ID13
Feature activation+2.10e+00

Pos logit contributions
aned+0.038
ips+0.037
aled+0.036
wed+0.034
ags+0.033
Neg logit contributions
 Can-0.017
 oh-0.016
 insect-0.015
 I-0.015
 mean-0.015
Token together
Token ID1978
Feature activation+2.12e+00

Pos logit contributions
ips+0.108
aned+0.101
aled+0.101
ags+0.093
ulk+0.093
Neg logit contributions
 mean-0.064
 oh-0.063
 Can-0.059
 curious-0.058
 happened-0.057
Token and
Token ID290
Feature activation+1.85e+00

Pos logit contributions
ips+0.074
aned+0.072
aled+0.071
ags+0.066
ulk+0.066
Neg logit contributions
 oh-0.026
 Can-0.026
 insect-0.025
 mean-0.024
 curious-0.023
Token they
Token ID484
Feature activation+2.83e+00

Pos logit contributions
ips+0.052
aned+0.049
aled+0.047
ags+0.045
ge+0.043
Neg logit contributions
 Can-0.038
 oh-0.037
 insect-0.035
 mean-0.034
 I-0.034
Token were
Token ID547
Feature activation+1.37e+00

Pos logit contributions
ips+0.094
aned+0.088
ags+0.086
aled+0.083
ulk+0.081
Neg logit contributions
 named-0.044
 mean-0.044
 oh-0.042
 curious-0.040
 insect-0.040
Token able
Token ID1498
Feature activation+8.64e-01

Pos logit contributions
ips+0.046
aned+0.044
aled+0.043
ge+0.041
ulk+0.041
Neg logit contributions
 mean-0.020
 oh-0.020
 curious-0.020
 Can-0.018
 kind-0.017
Token to
Token ID284
Feature activation+6.73e+00

Pos logit contributions
ips+0.028
aned+0.027
aled+0.027
ags+0.024
wed+0.024
Neg logit contributions
 Can-0.014
 oh-0.013
 insect-0.012
 mean-0.012
 I-0.012
Token get
Token ID651
Feature activation+2.78e+00

Pos logit contributions
ips+0.218
aned+0.201
aled+0.196
ulk+0.175
ags+0.174
Neg logit contributions
 please-0.120
 mean-0.118
 I-0.115
 Can-0.112
 oh-0.111
Token Jeff
Token ID5502
Feature activation+1.18e+00

Pos logit contributions
ips+0.079
aned+0.076
aled+0.072
ge+0.067
ags+0.066
Neg logit contributions
 Can-0.056
 oh-0.055
 mean-0.052
 I-0.050
 insect-0.050
Token out
Token ID503
Feature activation+2.52e+00

Pos logit contributions
ips+0.033
aned+0.032
aled+0.031
ulk+0.028
ags+0.028
Neg logit contributions
 oh-0.023
 Can-0.023
 insect-0.022
 mean-0.022
 I-0.021
Token of
Token ID286
Feature activation+3.32e+00

Pos logit contributions
ips+0.082
aned+0.079
aled+0.077
wed+0.071
ags+0.071
Neg logit contributions
 Can-0.039
 mean-0.036
 oh-0.035
 insect-0.035
 I-0.034
Token jail
Token ID7356
Feature activation+1.37e+00

Pos logit contributions
aned+0.112
ips+0.110
aled+0.104
wed+0.097
ge+0.095
Neg logit contributions
 Can-0.051
 oh-0.047
 insect-0.046
 mean-0.045
 I-0.044
Token,
Token ID11
Feature activation-2.10e+00

Pos logit contributions
 Can+0.055
 mean+0.051
 oh+0.049
 insect+0.049
 witch+0.047
Neg logit contributions
ips-0.062
 safely-0.059
aned-0.058
 off-0.057
ge-0.055
Token and
Token ID290
Feature activation-5.36e-01

Pos logit contributions
 Can+0.034
 oh+0.031
 insect+0.030
 guy+0.029
 Please+0.029
Neg logit contributions
ips-0.061
 safely-0.059
aned-0.057
 off-0.057
 successfully-0.055
Token they
Token ID484
Feature activation+1.94e+00

Pos logit contributions
 Can+0.009
 insect+0.007
 oh+0.007
 mean+0.007
 Please+0.007
Neg logit contributions
ips-0.017
aned-0.016
 safely-0.016
aled-0.016
 off-0.015
Token were
Token ID547
Feature activation+8.38e-01

Pos logit contributions
ips+0.059
aned+0.053
ags+0.051
aled+0.051
ulk+0.050
Neg logit contributions
 mean-0.034
 named-0.033
 insect-0.033
 oh-0.033
 pigeon-0.033
Token able
Token ID1498
Feature activation+1.24e+00

Pos logit contributions
ips+0.026
aned+0.024
aled+0.024
ge+0.023
ws+0.023
Neg logit contributions
 oh-0.014
 mean-0.014
 curious-0.014
 Can-0.013
 insect-0.013
Token to
Token ID284
Feature activation+6.72e+00

Pos logit contributions
ips+0.041
aned+0.040
aled+0.039
ws+0.036
ags+0.036
Neg logit contributions
 Can-0.018
 oh-0.018
 insect-0.017
 mean-0.017
 I-0.017
Token get
Token ID651
Feature activation+2.91e+00

Pos logit contributions
ips+0.226
aned+0.210
aled+0.194
ulk+0.187
wed+0.184
Neg logit contributions
 please-0.110
 mean-0.107
 I-0.104
 insect-0.102
 hello-0.102
Token on
Token ID319
Feature activation+2.05e+00

Pos logit contributions
aned+0.083
ips+0.082
aled+0.076
ge+0.071
ippy+0.071
Neg logit contributions
 oh-0.056
 Can-0.056
 mean-0.052
 I-0.051
 insect-0.051
Token the
Token ID262
Feature activation+2.72e+00

Pos logit contributions
aned+0.068
ips+0.066
aled+0.063
wed+0.061
ws+0.059
Neg logit contributions
 oh-0.030
 Can-0.030
 hello-0.028
 insect-0.028
 mean-0.028
Token aer
Token ID9551
Feature activation+3.28e+00

Pos logit contributions
ips+0.085
aned+0.080
aled+0.078
wed+0.073
ge+0.073
Neg logit contributions
 Can-0.047
 oh-0.047
 insect-0.045
 mean-0.044
 witch-0.042
Tokenopl
Token ID20106
Feature activation+2.74e+00

Pos logit contributions
ips+0.068
 off+0.062
wed+0.061
 ease+0.061
ws+0.061
Neg logit contributions
 Can-0.073
 witch-0.072
 insect-0.072
 mean-0.069
 I-0.068
Token fun
Token ID1257
Feature activation+4.97e-01

Pos logit contributions
aned+0.053
ips+0.053
aled+0.051
ge+0.050
ippy+0.048
Neg logit contributions
 oh-0.019
 Can-0.019
 mean-0.019
 insect-0.017
 hello-0.016
Token together
Token ID1978
Feature activation+1.11e+00

Pos logit contributions
ips+0.014
aned+0.014
aled+0.013
wed+0.012
ge+0.012
Neg logit contributions
 Can-0.010
 oh-0.009
 insect-0.009
 mean-0.009
 I-0.008
Token.
Token ID13
Feature activation+2.78e+00

Pos logit contributions
aned+0.037
ips+0.036
aled+0.036
ippy+0.033
ags+0.032
Neg logit contributions
 Can-0.016
 oh-0.015
 insect-0.015
 mean-0.015
 I-0.014
Token They
Token ID1119
Feature activation+2.60e+00

Pos logit contributions
ips+0.082
aned+0.081
ge+0.081
aled+0.077
wed+0.073
Neg logit contributions
 Can-0.051
 I-0.046
 insect-0.046
 oh-0.043
 Are-0.042
Token managed
Token ID5257
Feature activation+3.46e+00

Pos logit contributions
ips+0.086
aned+0.082
ge+0.076
ags+0.076
ippy+0.075
Neg logit contributions
 mean-0.043
 named-0.042
 pigeon-0.038
 insect-0.037
 wanted-0.037
Token to
Token ID284
Feature activation+6.69e+00

Pos logit contributions
ips+0.125
aned+0.122
aled+0.117
ags+0.112
ippy+0.110
Neg logit contributions
 mean-0.043
 oh-0.040
 Can-0.039
 insect-0.039
 happened-0.037
Token not
Token ID407
Feature activation+2.58e+00

Pos logit contributions
ips+0.227
aned+0.207
aled+0.194
ippy+0.189
ushes+0.184
Neg logit contributions
 mean-0.121
 please-0.103
 oh-0.101
 kind-0.100
 curious-0.099
Token fight
Token ID1907
Feature activation+1.90e+00

Pos logit contributions
ips+0.079
aned+0.074
aled+0.072
ags+0.068
ippy+0.067
Neg logit contributions
 Can-0.048
 mean-0.048
 oh-0.044
 I-0.044
 insect-0.044
Token after
Token ID706
Feature activation-9.12e-01

Pos logit contributions
ips+0.054
aned+0.052
aled+0.050
wed+0.047
ge+0.046
Neg logit contributions
 Can-0.037
 oh-0.036
 mean-0.036
 insect-0.035
 I-0.033
Token that
Token ID326
Feature activation+5.44e-01

Pos logit contributions
 oh+0.017
 Can+0.017
 insect+0.017
 mean+0.015
 I+0.015
Neg logit contributions
ips-0.026
aned-0.026
aled-0.025
wed-0.023
ge-0.023
Token and
Token ID290
Feature activation+1.10e+00

Pos logit contributions
ips+0.014
aned+0.013
aled+0.013
ge+0.012
wed+0.012
Neg logit contributions
 Can-0.012
 oh-0.011
 insect-0.011
 Please-0.011
 mean-0.011
Token girl
Token ID2576
Feature activation+3.18e+00

Pos logit contributions
 Can+0.011
 oh+0.010
 insect+0.009
 mean+0.008
 Please+0.008
Neg logit contributions
ips-0.030
aled-0.028
ge-0.028
 safely-0.028
aned-0.027
Token,
Token ID11
Feature activation-1.15e+00

Pos logit contributions
ips+0.095
aned+0.092
aled+0.088
ags+0.087
ulk+0.085
Neg logit contributions
 curious-0.051
 Can-0.051
 insect-0.049
 I-0.049
 mean-0.049
Token they
Token ID484
Feature activation+2.70e+00

Pos logit contributions
 Can+0.019
 oh+0.018
 insect+0.017
 mean+0.016
 I+0.016
Neg logit contributions
ips-0.036
aned-0.035
aled-0.034
ge-0.032
wed-0.031
Token were
Token ID547
Feature activation+9.90e-01

Pos logit contributions
ips+0.080
aned+0.076
ags+0.070
ulk+0.070
aled+0.070
Neg logit contributions
 mean-0.047
 insect-0.046
 named-0.046
 Can-0.045
 curious-0.045
Token able
Token ID1498
Feature activation+1.67e+00

Pos logit contributions
ips+0.028
aned+0.027
aled+0.027
ge+0.025
ws+0.024
Neg logit contributions
 oh-0.019
 mean-0.018
 curious-0.018
 Can-0.018
 insect-0.017
Token to
Token ID284
Feature activation+6.69e+00

Pos logit contributions
ips+0.059
aned+0.058
aled+0.058
ws+0.053
ags+0.053
Neg logit contributions
 Can-0.021
 oh-0.020
 I-0.019
 mean-0.019
 curious-0.019
Token get
Token ID651
Feature activation+2.85e+00

Pos logit contributions
ips+0.216
aned+0.202
aled+0.191
ulk+0.181
wed+0.176
Neg logit contributions
 mean-0.113
 please-0.109
 insect-0.108
 I-0.106
 hello-0.104
Token his
Token ID465
Feature activation+3.56e+00

Pos logit contributions
ips+0.082
aned+0.082
aled+0.077
ge+0.071
wed+0.070
Neg logit contributions
 Can-0.054
 oh-0.053
 insect-0.049
 mean-0.049
 I-0.048
Token old
Token ID1468
Feature activation+2.69e+00

Pos logit contributions
ips+0.106
aned+0.105
aled+0.100
ulk+0.097
wed+0.093
Neg logit contributions
 Can-0.067
 insect-0.063
 oh-0.062
 mean-0.060
 I-0.060
Token k
Token ID479
Feature activation+2.95e+00

Pos logit contributions
ips+0.076
aned+0.073
aled+0.071
wed+0.065
ge+0.064
Neg logit contributions
 Can-0.055
 oh-0.052
 insect-0.052
 mean-0.049
 I-0.048
Tokenite
Token ID578
Feature activation+4.53e+00

Pos logit contributions
aned+0.088
aled+0.085
ips+0.083
wed+0.080
ws+0.079
Neg logit contributions
 Can-0.053
 oh-0.049
 insect-0.048
 I-0.048
 pigeon-0.047
Token perfectly
Token ID7138
Feature activation+2.18e+00

Pos logit contributions
ips+0.019
aned+0.018
aled+0.018
ge+0.017
wed+0.016
Neg logit contributions
 Can-0.016
 oh-0.015
 insect-0.015
 mean-0.014
 Please-0.014
Token and
Token ID290
Feature activation-8.02e-01

Pos logit contributions
ips+0.076
aled+0.073
aned+0.071
ags+0.069
ulk+0.069
Neg logit contributions
 Can-0.027
 oh-0.027
 mean-0.026
 curious-0.025
 I-0.024
Token they
Token ID484
Feature activation+1.94e+00

Pos logit contributions
 Can+0.014
 oh+0.013
 insect+0.013
 mean+0.012
 Please+0.012
Neg logit contributions
ips-0.024
aned-0.023
aled-0.022
ge-0.021
 safely-0.021
Token were
Token ID547
Feature activation+3.46e-01

Pos logit contributions
ips+0.064
ags+0.058
aned+0.057
aled+0.055
ulk+0.055
Neg logit contributions
 oh-0.031
 curious-0.030
 mean-0.030
 Can-0.029
 insect-0.029
Token able
Token ID1498
Feature activation+6.56e-01

Pos logit contributions
ips+0.011
aned+0.010
aled+0.010
ulk+0.010
ags+0.009
Neg logit contributions
 curious-0.006
 oh-0.006
 Can-0.006
 mean-0.006
 insect-0.005
Token to
Token ID284
Feature activation+6.67e+00

Pos logit contributions
ips+0.021
aned+0.021
aled+0.020
ags+0.019
wed+0.019
Neg logit contributions
 Can-0.010
 oh-0.010
 insect-0.009
 mean-0.009
 I-0.009
Token unlock
Token ID12116
Feature activation+2.49e+00

Pos logit contributions
ips+0.233
aned+0.210
aled+0.197
ulk+0.192
ags+0.187
Neg logit contributions
 please-0.106
 mean-0.102
 oh-0.101
 I-0.100
 curious-0.100
Token the
Token ID262
Feature activation+2.90e+00

Pos logit contributions
ips+0.085
aned+0.085
aled+0.080
ulk+0.075
wed+0.075
Neg logit contributions
 Can-0.033
 oh-0.033
 insect-0.030
 I-0.029
 hello-0.028
Token gate
Token ID8946
Feature activation+3.26e+00

Pos logit contributions
ips+0.074
aned+0.069
aled+0.067
ge+0.062
wed+0.061
Neg logit contributions
 Can-0.067
 oh-0.064
 insect-0.062
 mean-0.060
 Please-0.059
Token.
Token ID13
Feature activation+1.45e+00

Pos logit contributions
ips+0.106
aned+0.101
aled+0.099
ags+0.094
wed+0.094
Neg logit contributions
 Can-0.050
 I-0.048
 oh-0.047
 insect-0.044
 mean-0.043
Token
Token ID220
Feature activation+3.77e+00

Pos logit contributions
ips+0.035
aned+0.033
aled+0.032
ge+0.029
wed+0.029
Neg logit contributions
 Can-0.036
 oh-0.034
 insect-0.033
 I-0.032
 Please-0.032
Token time
Token ID640
Feature activation-8.23e-01

Pos logit contributions
 Can+0.021
 oh+0.020
 insect+0.020
 Please+0.018
 witch+0.018
Neg logit contributions
ips-0.028
aned-0.026
aled-0.026
wed-0.024
ge-0.023
Token,
Token ID11
Feature activation-2.73e+00

Pos logit contributions
 Can+0.009
 oh+0.008
 insect+0.008
 mean+0.007
 witch+0.007
Neg logit contributions
ips-0.030
aned-0.029
aled-0.029
ge-0.027
wed-0.027
Token make
Token ID787
Feature activation+2.16e+00

Pos logit contributions
 insect+0.046
 Can+0.046
 witch+0.042
 goat+0.042
 oh+0.041
Neg logit contributions
ips-0.076
aled-0.070
 safely-0.070
 off-0.069
aned-0.068
Token sure
Token ID1654
Feature activation-1.12e+00

Pos logit contributions
ips+0.065
aned+0.062
aled+0.060
ge+0.058
wed+0.055
Neg logit contributions
 oh-0.039
 Can-0.038
 insect-0.038
 mean-0.036
 I-0.034
Token you
Token ID345
Feature activation+3.21e+00

Pos logit contributions
 Can+0.015
 oh+0.014
 insect+0.014
 mean+0.013
 Please+0.012
Neg logit contributions
ips-0.039
aned-0.037
aled-0.036
ge-0.034
wed-0.034
Token can
Token ID460
Feature activation+6.67e+00

Pos logit contributions
ips+0.097
aned+0.090
aled+0.088
ulk+0.083
ge+0.081
Neg logit contributions
 Can-0.057
 mean-0.056
 insect-0.055
 oh-0.054
 I-0.052
Token hear
Token ID3285
Feature activation-1.34e+00

Pos logit contributions
ips+0.212
aned+0.198
aled+0.197
ulk+0.183
wed+0.174
Neg logit contributions
 mean-0.114
 I-0.112
 please-0.110
 like-0.100
 named-0.099
Token each
Token ID1123
Feature activation+3.12e+00

Pos logit contributions
 Can+0.024
 oh+0.023
 insect+0.022
 mean+0.021
 Please+0.020
Neg logit contributions
ips-0.040
aned-0.038
aled-0.037
ge-0.035
wed-0.035
Token other
Token ID584
Feature activation+1.61e+00

Pos logit contributions
ips+0.070
aned+0.067
aled+0.063
ge+0.057
wed+0.057
Neg logit contributions
 Can-0.081
 oh-0.080
 insect-0.079
 mean-0.077
 witch-0.075
Token,
Token ID11
Feature activation-2.88e+00

Pos logit contributions
ips+0.049
aned+0.047
aled+0.046
wed+0.042
ulk+0.042
Neg logit contributions
 Can-0.028
 mean-0.027
 insect-0.026
 oh-0.026
 I-0.025
Token okay
Token ID8788
Feature activation-2.82e+00

Pos logit contributions
 Can+0.044
 insect+0.042
 oh+0.040
 happened+0.039
 witch+0.038
Neg logit contributions
ips-0.085
 safely-0.081
aled-0.079
aned-0.077
 off-0.076
Token."
Token ID526
Feature activation+1.80e+00

Pos logit contributions
ips+0.034
aned+0.033
aled+0.033
ulk+0.030
ge+0.029
Neg logit contributions
 Can-0.022
 oh-0.020
 I-0.020
 insect-0.019
 Please-0.019
Token She
Token ID1375
Feature activation+2.03e+00

Pos logit contributions
aned+0.054
ips+0.054
ge+0.053
aled+0.052
wed+0.050
Neg logit contributions
 Can-0.032
 I-0.031
 hello-0.028
 Are-0.028
 Please-0.027
Token hugged
Token ID41130
Feature activation+8.96e-01

Pos logit contributions
ips+0.062
aned+0.057
aled+0.056
ge+0.054
ags+0.054
Neg logit contributions
 oh-0.036
 Can-0.036
 insect-0.034
 mean-0.033
 witch-0.032
Token Sarah
Token ID10490
Feature activation+1.14e+00

Pos logit contributions
ips+0.027
aned+0.026
aled+0.025
ge+0.023
wed+0.023
Neg logit contributions
 Can-0.017
 oh-0.016
 insect-0.015
 hello-0.015
 I-0.015
Token and
Token ID290
Feature activation+1.50e+00

Pos logit contributions
ips+0.040
aned+0.039
aled+0.038
ulk+0.037
ags+0.035
Neg logit contributions
 Can-0.014
 curious-0.013
 I-0.013
 insect-0.013
 oh-0.013
Token fet
Token ID11351
Feature activation+6.66e+00

Pos logit contributions
ips+0.045
aned+0.043
aled+0.042
ge+0.039
wed+0.039
Neg logit contributions
 Can-0.027
 oh-0.026
 insect-0.025
 mean-0.024
 I-0.023
Tokenched
Token ID1740
Feature activation+2.93e+00

Pos logit contributions
ips+0.156
aned+0.145
ge+0.134
ulk+0.129
ags+0.128
Neg logit contributions
 happened-0.132
 please-0.127
 named-0.127
 Can-0.125
 hello-0.125
Token the
Token ID262
Feature activation+1.49e+00

Pos logit contributions
ips+0.086
aned+0.084
aled+0.080
ge+0.076
wed+0.074
Neg logit contributions
 oh-0.054
 Can-0.052
 insect-0.050
 mean-0.048
 hello-0.048
Token key
Token ID1994
Feature activation+4.34e+00

Pos logit contributions
ips+0.041
aned+0.039
aled+0.038
ge+0.036
 safely+0.036
Neg logit contributions
 Can-0.030
 oh-0.028
 insect-0.026
 Please-0.025
 mean-0.025
Token from
Token ID422
Feature activation+1.18e+00

Pos logit contributions
ips+0.140
aned+0.133
aled+0.130
ulk+0.123
ags+0.121
Neg logit contributions
 Can-0.068
 insect-0.064
 oh-0.063
 mean-0.062
 I-0.062
Token the
Token ID262
Feature activation+1.74e+00

Pos logit contributions
ips+0.035
aned+0.034
aled+0.032
ge+0.030
wed+0.030
Neg logit contributions
 Can-0.022
 oh-0.022
 insect-0.021
 mean-0.020
 witch-0.019
Token<|endoftext|>
Token ID50256
Feature activation-6.66e-01

Pos logit contributions
ips+0.066
aned+0.064
aled+0.061
ge+0.058
wed+0.058
Neg logit contributions
 Can-0.046
 oh-0.042
 insect-0.041
 I-0.040
 Please-0.040
TokenOne
Token ID3198
Feature activation-2.05e+00

Pos logit contributions
 Can+0.012
 I+0.010
 oh+0.010
 insect+0.009
 mean+0.009
Neg logit contributions
aned-0.021
ips-0.020
aled-0.020
wed-0.019
ge-0.019
Token day
Token ID1110
Feature activation-2.51e+00

Pos logit contributions
 Can+0.047
 oh+0.044
 Please+0.041
 insect+0.040
 Are+0.040
Neg logit contributions
ips-0.051
aled-0.046
aned-0.045
 safely-0.044
wed-0.044
Token,
Token ID11
Feature activation-4.70e+00

Pos logit contributions
 Can+0.047
 insect+0.046
 witch+0.046
 oh+0.042
 writer+0.041
Neg logit contributions
ips-0.072
aned-0.065
 off-0.063
ippy-0.061
 safely-0.061
Token a
Token ID257
Feature activation-6.59e+00

Pos logit contributions
 witch+0.118
 writer+0.113
 insect+0.105
 crow+0.103
 photographer+0.103
Neg logit contributions
ips-0.097
 safely-0.094
 securely-0.093
rolled-0.091
aled-0.090
Token funny
Token ID8258
Feature activation-7.30e+00

Pos logit contributions
 Can+0.134
 oh+0.133
 unlike+0.131
heart+0.127
 Please+0.124
Neg logit contributions
ippy-0.114
 safely-0.110
ips-0.107
 finish-0.103
 securely-0.102
Token hipp
Token ID18568
Feature activation-2.07e+00

Pos logit contributions
 Can+0.136
 Do+0.131
 oh+0.128
 Please+0.123
 Did+0.122
Neg logit contributions
ippy-0.150
 safely-0.131
 home-0.130
 happily-0.125
ips-0.124
Tokeno
Token ID78
Feature activation-4.34e+00

Pos logit contributions
 Can+0.048
 oh+0.047
 insect+0.044
 Please+0.043
 hello+0.042
Neg logit contributions
ips-0.052
aled-0.047
aned-0.047
wed-0.043
ge-0.043
Token went
Token ID1816
Feature activation+6.99e-01

Pos logit contributions
 oh+0.074
 Can+0.073
 dear+0.068
 Please+0.065
 insect+0.062
Neg logit contributions
aled-0.124
ips-0.120
wed-0.118
 safely-0.113
rolled-0.112
Token to
Token ID284
Feature activation-6.42e-01

Pos logit contributions
ips+0.020
aned+0.019
aled+0.018
ge+0.017
wed+0.017
Neg logit contributions
 Can-0.014
 oh-0.013
 insect-0.013
 mean-0.012
 Please-0.012
Token play
Token ID711
Feature activation-1.83e-01

Pos logit contributions
 Can+0.015
 oh+0.015
 insect+0.014
 Please+0.014
 mean+0.013
Neg logit contributions
ips-0.015
aled-0.014
aned-0.014
ge-0.013
 ease-0.013
Token<|endoftext|>
Token ID50256
Feature activation-8.31e-01

Pos logit contributions
ips+0.038
aned+0.038
aled+0.036
ge+0.035
wed+0.034
Neg logit contributions
 Can-0.022
 oh-0.019
 I-0.019
 Please-0.019
 insect-0.018
TokenOne
Token ID3198
Feature activation-1.96e+00

Pos logit contributions
 Can+0.014
 I+0.012
 mean+0.012
 oh+0.012
 insect+0.012
Neg logit contributions
aned-0.026
ips-0.026
aled-0.025
ge-0.024
wed-0.024
Token day
Token ID1110
Feature activation-2.66e+00

Pos logit contributions
 Can+0.041
 oh+0.038
 Please+0.036
 insect+0.035
 hello+0.034
Neg logit contributions
ips-0.053
aned-0.049
aled-0.048
wed-0.046
 safely-0.046
Token,
Token ID11
Feature activation-4.86e+00

Pos logit contributions
 Can+0.050
 witch+0.049
 insect+0.048
 oh+0.045
 goat+0.044
Neg logit contributions
ips-0.077
aned-0.070
ippy-0.066
 off-0.065
ge-0.064
Token a
Token ID257
Feature activation-6.74e+00

Pos logit contributions
 witch+0.126
 writer+0.120
 photographer+0.114
 insect+0.114
 peac+0.112
Neg logit contributions
ips-0.095
 safely-0.090
 securely-0.088
aled-0.085
 ease-0.084
Token funny
Token ID8258
Feature activation-7.24e+00

Pos logit contributions
 unlike+0.132
 Can+0.131
 oh+0.131
heart+0.129
 Please+0.123
Neg logit contributions
ippy-0.116
 safely-0.114
ips-0.112
 finish-0.110
 securely-0.103
Token cat
Token ID3797
Feature activation-4.82e+00

Pos logit contributions
 Can+0.131
 Do+0.124
 oh+0.124
 Please+0.117
 Did+0.114
Neg logit contributions
ippy-0.157
ips-0.136
 home-0.134
 safely-0.134
 happily-0.125
Token named
Token ID3706
Feature activation-3.09e+00

Pos logit contributions
 oh+0.082
 dear+0.078
 Can+0.073
 please+0.069
 Please+0.068
Neg logit contributions
ips-0.133
aled-0.133
wed-0.128
aned-0.123
 safely-0.121
Token Kitty
Token ID21168
Feature activation-3.34e+00

Pos logit contributions
 Can+0.058
 oh+0.055
 insect+0.053
 writer+0.052
 witch+0.051
Neg logit contributions
ips-0.078
aled-0.077
aned-0.074
 off-0.070
ippy-0.070
Token found
Token ID1043
Feature activation-1.97e+00

Pos logit contributions
 Can+0.062
 oh+0.061
 insect+0.056
 Please+0.055
 hello+0.052
Neg logit contributions
ips-0.095
aled-0.091
aned-0.091
wed-0.087
 safely-0.084
Token a
Token ID257
Feature activation-4.39e+00

Pos logit contributions
 Can+0.034
 insect+0.029
 Please+0.029
 oh+0.029
 I+0.027
Neg logit contributions
ips-0.061
aled-0.057
aned-0.057
ags-0.053
 safely-0.053
Token<|endoftext|>
Token ID50256
Feature activation-6.28e-01

Pos logit contributions
aned+0.075
ips+0.075
ge+0.074
aled+0.070
wed+0.067
Neg logit contributions
 Can-0.041
 I-0.036
 oh-0.034
 mean-0.034
 insect-0.034
TokenOne
Token ID3198
Feature activation-2.18e+00

Pos logit contributions
 Can+0.010
 I+0.008
 mean+0.008
 oh+0.008
 happened+0.008
Neg logit contributions
aned-0.021
ips-0.020
aled-0.020
wed-0.019
ge-0.019
Token day
Token ID1110
Feature activation-2.89e+00

Pos logit contributions
 Can+0.045
 oh+0.042
 Please+0.039
 insect+0.039
 hello+0.038
Neg logit contributions
ips-0.060
aned-0.055
aled-0.054
wed-0.052
 safely-0.051
Token,
Token ID11
Feature activation-4.96e+00

Pos logit contributions
 Can+0.050
 insect+0.049
 witch+0.049
 oh+0.045
 crow+0.043
Neg logit contributions
ips-0.089
aned-0.081
ippy-0.076
ge-0.076
aled-0.074
Token a
Token ID257
Feature activation-6.69e+00

Pos logit contributions
 witch+0.121
 insect+0.113
 writer+0.111
 Can+0.109
 crow+0.106
Neg logit contributions
ips-0.104
 safely-0.095
 securely-0.094
aled-0.093
ippy-0.093
Token funny
Token ID8258
Feature activation-7.18e+00

Pos logit contributions
 Can+0.134
 oh+0.129
 unlike+0.127
heart+0.127
 Please+0.122
Neg logit contributions
ips-0.123
ippy-0.122
 safely-0.118
 finish-0.111
 securely-0.109
Token duck
Token ID22045
Feature activation-3.72e+00

Pos logit contributions
 Can+0.136
 Do+0.130
 oh+0.124
 Please+0.121
 Does+0.119
Neg logit contributions
ippy-0.152
ips-0.132
 safely-0.130
 home-0.129
 happily-0.124
Token named
Token ID3706
Feature activation-3.38e+00

Pos logit contributions
 oh+0.057
 Can+0.056
 Please+0.052
 dear+0.051
 insect+0.046
Neg logit contributions
ips-0.116
aled-0.109
wed-0.109
aned-0.104
eds-0.103
Token Bob
Token ID5811
Feature activation-3.17e+00

Pos logit contributions
 insect+0.066
 witch+0.066
 oh+0.066
 writer+0.064
 Can+0.064
Neg logit contributions
aled-0.082
ips-0.081
aned-0.075
 off-0.075
wed-0.074
Token went
Token ID1816
Feature activation+8.09e-01

Pos logit contributions
 oh+0.063
 Can+0.060
 insect+0.058
 Please+0.055
 dear+0.054
Neg logit contributions
ips-0.088
aled-0.083
wed-0.083
aned-0.080
ws-0.079
Token to
Token ID284
Feature activation-4.73e-01

Pos logit contributions
ips+0.024
aned+0.023
aled+0.022
ge+0.020
wed+0.020
Neg logit contributions
 Can-0.015
 oh-0.015
 insect-0.014
 mean-0.014
 I-0.013
Token,
Token ID11
Feature activation-5.10e+00

Pos logit contributions
 Can+0.058
 insect+0.056
 witch+0.053
 crow+0.050
 writer+0.049
Neg logit contributions
ips-0.086
aned-0.081
 off-0.076
ge-0.076
 safely-0.075
Token there
Token ID612
Feature activation-8.98e-01

Pos logit contributions
 witch+0.118
 writer+0.114
 insect+0.111
 Can+0.109
 crow+0.105
Neg logit contributions
ips-0.110
 safely-0.108
 securely-0.106
 ease-0.105
rolled-0.104
Token was
Token ID373
Feature activation-4.48e+00

Pos logit contributions
 Can+0.016
 oh+0.015
 insect+0.015
 mean+0.014
 Please+0.014
Neg logit contributions
ips-0.027
aned-0.026
aled-0.025
ge-0.023
wed-0.023
Token a
Token ID257
Feature activation-5.99e+00

Pos logit contributions
 Can+0.106
 Please+0.096
 hello+0.094
 insect+0.094
 witch+0.093
Neg logit contributions
ips-0.098
 ease-0.095
 safely-0.091
aled-0.091
ags-0.090
Token j
Token ID474
Feature activation-2.85e+00

Pos logit contributions
 Can+0.122
 happened+0.120
 oh+0.116
 unlike+0.113
 Do+0.112
Neg logit contributions
 safely-0.104
 checkout-0.103
 finish-0.102
 successfully-0.095
ippy-0.095
Tokenolly
Token ID5098
Feature activation-7.10e+00

Pos logit contributions
 oh+0.064
 Can+0.064
 insect+0.057
 Please+0.056
 hello+0.055
Neg logit contributions
ips-0.075
aned-0.068
aled-0.067
ippy-0.062
ags-0.061
Token man
Token ID582
Feature activation-4.27e+00

Pos logit contributions
 Can+0.132
 Do+0.125
 oh+0.122
 Are+0.120
 Did+0.119
Neg logit contributions
ippy-0.151
 safely-0.145
ips-0.130
 securely-0.127
 home-0.127
Token.
Token ID13
Feature activation-1.69e+00

Pos logit contributions
 Can+0.068
 oh+0.065
 dear+0.061
 hello+0.058
 Please+0.057
Neg logit contributions
ips-0.126
ge-0.125
 safely-0.119
aled-0.118
wed-0.116
Token His
Token ID2399
Feature activation-7.07e-01

Pos logit contributions
 oh+0.038
 insect+0.035
 Can+0.034
 witch+0.033
 mean+0.033
Neg logit contributions
ips-0.042
aled-0.040
aned-0.038
 safely-0.035
ags-0.035
Token name
Token ID1438
Feature activation-2.17e+00

Pos logit contributions
 Can+0.013
 oh+0.012
 insect+0.012
 Please+0.011
 I+0.011
Neg logit contributions
ips-0.021
aned-0.019
aled-0.019
ags-0.018
ippy-0.018
Token was
Token ID373
Feature activation-3.68e+00

Pos logit contributions
 Can+0.047
 oh+0.045
 insect+0.045
 Please+0.042
 hello+0.041
Neg logit contributions
ips-0.054
aned-0.052
aled-0.050
 safely-0.048
wed-0.046
Token time
Token ID640
Feature activation-3.23e+00

Pos logit contributions
 Can+0.099
 oh+0.095
 insect+0.090
 witch+0.086
 hello+0.085
Neg logit contributions
ips-0.090
aned-0.085
aled-0.080
ippy-0.079
wed-0.077
Token there
Token ID612
Feature activation-1.23e+00

Pos logit contributions
 goat+0.070
 insect+0.070
 Can+0.068
 witch+0.068
 pigeon+0.067
Neg logit contributions
ips-0.078
 safely-0.069
ge-0.068
ippy-0.067
aned-0.065
Token was
Token ID373
Feature activation-4.97e+00

Pos logit contributions
 Can+0.029
 insect+0.027
 oh+0.027
 Please+0.027
 witch+0.026
Neg logit contributions
ips-0.028
aned-0.026
 ease-0.025
 safely-0.025
aled-0.025
Token a
Token ID257
Feature activation-6.16e+00

Pos logit contributions
 Can+0.119
 Please+0.112
 Are+0.105
 witch+0.105
 oh+0.104
Neg logit contributions
ips-0.111
 ease-0.108
aled-0.102
 safely-0.100
ifts-0.098
Token j
Token ID474
Feature activation-3.03e+00

Pos logit contributions
 oh+0.125
 Can+0.122
 happened+0.121
 Please+0.113
 unlike+0.113
Neg logit contributions
 safely-0.116
ippy-0.112
ips-0.107
 finish-0.105
 securely-0.105
Tokenolly
Token ID5098
Feature activation-7.09e+00

Pos logit contributions
 Can+0.067
 oh+0.067
 insect+0.060
 Please+0.059
 mean+0.057
Neg logit contributions
ips-0.080
aned-0.073
aled-0.072
ippy-0.067
ge-0.066
Token old
Token ID1468
Feature activation-4.33e+00

Pos logit contributions
 Can+0.139
 oh+0.133
 Do+0.129
 Are+0.126
 Please+0.125
Neg logit contributions
ippy-0.149
 safely-0.143
 securely-0.125
ge-0.125
ips-0.123
Token woman
Token ID2415
Feature activation-4.24e+00

Pos logit contributions
 Can+0.081
 oh+0.079
 Please+0.072
 Are+0.069
 happened+0.069
Neg logit contributions
ips-0.106
ippy-0.103
 safely-0.098
aned-0.093
ge-0.092
Token who
Token ID508
Feature activation-2.18e+00

Pos logit contributions
 oh+0.084
 Can+0.083
 Please+0.073
 insect+0.072
 mean+0.072
Neg logit contributions
ips-0.110
ge-0.103
 safely-0.102
aled-0.100
aned-0.098
Token lived
Token ID5615
Feature activation-1.04e+00

Pos logit contributions
 Can+0.037
 oh+0.037
 insect+0.035
 Please+0.032
 hello+0.030
Neg logit contributions
ips-0.063
aned-0.061
aled-0.061
wed-0.057
 securely-0.057
Token in
Token ID287
Feature activation-1.87e+00

Pos logit contributions
 oh+0.013
 curious+0.013
 Can+0.012
 mean+0.012
 insect+0.012
Neg logit contributions
aned-0.037
aled-0.036
ips-0.035
ws-0.033
wed-0.033
Token,
Token ID11
Feature activation-4.46e+00

Pos logit contributions
 insect+0.060
 witch+0.059
 Can+0.058
 goat+0.057
 oh+0.055
Neg logit contributions
ips-0.084
aned-0.070
 safely-0.070
ippy-0.069
ge-0.068
Token there
Token ID612
Feature activation-7.06e-01

Pos logit contributions
 witch+0.097
 insect+0.093
 writer+0.092
 photographer+0.092
 goat+0.089
Neg logit contributions
ips-0.105
 safely-0.096
 securely-0.091
wed-0.090
 off-0.090
Token was
Token ID373
Feature activation-4.84e+00

Pos logit contributions
 Can+0.014
 insect+0.014
 oh+0.014
 Please+0.013
 witch+0.013
Neg logit contributions
ips-0.019
aned-0.018
aled-0.017
 ease-0.016
 safely-0.016
Token a
Token ID257
Feature activation-6.04e+00

Pos logit contributions
 Can+0.108
 Please+0.100
 witch+0.098
 Are+0.097
 hello+0.096
Neg logit contributions
ips-0.115
ags-0.105
aled-0.105
 ease-0.105
 safely-0.100
Token j
Token ID474
Feature activation-3.23e+00

Pos logit contributions
 Can+0.124
 oh+0.124
 happened+0.121
 unlike+0.116
 Please+0.114
Neg logit contributions
 safely-0.108
ips-0.102
ippy-0.101
 finish-0.100
 securely-0.095
Tokenolly
Token ID5098
Feature activation-7.08e+00

Pos logit contributions
 Can+0.070
 oh+0.070
 insect+0.063
 Please+0.062
 hello+0.060
Neg logit contributions
ips-0.086
aned-0.079
aled-0.077
ippy-0.072
ge-0.071
Token little
Token ID1310
Feature activation-5.84e+00

Pos logit contributions
 Can+0.132
 oh+0.124
 Are+0.123
 Do+0.120
 happened+0.118
Neg logit contributions
ippy-0.157
 safely-0.150
ips-0.138
 securely-0.131
 bank-0.130
Token girl
Token ID2576
Feature activation-5.29e+00

Pos logit contributions
 oh+0.103
 happened+0.097
 unlike+0.095
 Can+0.094
 Do+0.087
Neg logit contributions
ips-0.148
ippy-0.140
 halves-0.129
 safely-0.126
 off-0.120
Token named
Token ID3706
Feature activation-3.47e+00

Pos logit contributions
 oh+0.090
 Can+0.086
 hello+0.081
 guy+0.080
 dear+0.077
Neg logit contributions
ips-0.146
wed-0.132
eds-0.129
 safely-0.129
aled-0.128
Token Lily
Token ID20037
Feature activation-3.79e+00

Pos logit contributions
 Can+0.056
 oh+0.054
 insect+0.053
 mean+0.049
 witch+0.047
Neg logit contributions
ips-0.107
aned-0.100
aled-0.099
wed-0.092
 safely-0.091
Token.
Token ID13
Feature activation-8.25e-01

Pos logit contributions
 Can+0.088
 oh+0.086
 insect+0.081
 hello+0.078
 Please+0.078
Neg logit contributions
ips-0.091
aned-0.084
aled-0.083
ge-0.080
wed-0.080
Token,
Token ID11
Feature activation-4.74e+00

Pos logit contributions
 goat+0.069
 insect+0.069
 witch+0.067
 Can+0.067
 crow+0.066
Neg logit contributions
ips-0.081
ge-0.069
 safely-0.069
ippy-0.068
aned-0.065
Token there
Token ID612
Feature activation-7.68e-01

Pos logit contributions
 witch+0.099
 writer+0.097
 insect+0.097
 photographer+0.096
 goat+0.096
Neg logit contributions
ips-0.109
 safely-0.105
ge-0.099
 securely-0.098
 off-0.095
Token was
Token ID373
Feature activation-4.94e+00

Pos logit contributions
 Can+0.017
 insect+0.016
 oh+0.016
 Please+0.015
 witch+0.015
Neg logit contributions
ips-0.020
aned-0.018
aled-0.017
 ease-0.017
 safely-0.017
Token a
Token ID257
Feature activation-6.08e+00

Pos logit contributions
 Can+0.116
 Please+0.109
 oh+0.104
 guy+0.103
 Are+0.103
Neg logit contributions
ips-0.110
 ease-0.105
aled-0.102
ags-0.099
 safely-0.099
Token j
Token ID474
Feature activation-3.20e+00

Pos logit contributions
 oh+0.123
 Can+0.121
 happened+0.121
 Please+0.112
 unlike+0.110
Neg logit contributions
 safely-0.114
ippy-0.111
ips-0.105
 finish-0.101
ge-0.101
Tokenolly
Token ID5098
Feature activation-7.08e+00

Pos logit contributions
 Can+0.071
 oh+0.071
 insect+0.063
 Please+0.062
 Are+0.060
Neg logit contributions
ips-0.085
aned-0.077
aled-0.076
ippy-0.071
ge-0.070
Token little
Token ID1310
Feature activation-5.95e+00

Pos logit contributions
 Can+0.136
 oh+0.129
 Do+0.125
 Are+0.123
 happened+0.121
Neg logit contributions
ippy-0.157
 safely-0.147
ips-0.130
 securely-0.128
ge-0.128
Token boy
Token ID2933
Feature activation-5.03e+00

Pos logit contributions
 oh+0.107
 happened+0.106
 unlike+0.099
 Can+0.096
 Please+0.090
Neg logit contributions
ips-0.143
ippy-0.139
 halves-0.127
 safely-0.124
 home-0.118
Token named
Token ID3706
Feature activation-3.24e+00

Pos logit contributions
 Can+0.088
 oh+0.081
 dear+0.080
 hello+0.079
 Please+0.079
Neg logit contributions
ips-0.136
wed-0.127
ge-0.127
eds-0.124
ippy-0.124
Token Max
Token ID5436
Feature activation-3.88e+00

Pos logit contributions
 Can+0.045
 oh+0.043
 insect+0.043
 witch+0.038
 mean+0.038
Neg logit contributions
ips-0.108
aned-0.102
aled-0.102
ge-0.097
 ease-0.095
Token.
Token ID13
Feature activation-2.75e-02

Pos logit contributions
 Can+0.105
 oh+0.102
 Please+0.096
 insect+0.096
 hello+0.092
Neg logit contributions
ips-0.071
ge-0.070
wed-0.069
aled-0.069
aned-0.067
Token.
Token ID13
Feature activation+1.79e+00

Pos logit contributions
 Can+0.021
 oh+0.020
 insect+0.019
 mean+0.018
 Please+0.018
Neg logit contributions
ips-0.034
aned-0.032
aled-0.031
ge-0.029
wed-0.029
Token<|endoftext|>
Token ID50256
Feature activation-1.44e+00

Pos logit contributions
ips+0.051
aned+0.048
aled+0.047
ge+0.044
wed+0.043
Neg logit contributions
 Can-0.036
 oh-0.034
 insect-0.033
 mean-0.031
 Please-0.031
TokenOne
Token ID3198
Feature activation-2.34e+00

Pos logit contributions
 Can+0.031
 oh+0.028
 insect+0.028
 I+0.027
 mean+0.027
Neg logit contributions
ips-0.038
aned-0.038
aled-0.036
ge-0.034
wed-0.033
Token day
Token ID1110
Feature activation-3.49e+00

Pos logit contributions
 Can+0.054
 oh+0.051
 Please+0.048
 witch+0.046
 hello+0.046
Neg logit contributions
ips-0.057
aled-0.051
aned-0.051
wed-0.049
 safely-0.049
Token,
Token ID11
Feature activation-5.31e+00

Pos logit contributions
 Can+0.072
 insect+0.067
 witch+0.065
 writer+0.064
 crow+0.063
Neg logit contributions
ips-0.090
aned-0.082
 off-0.081
 ease-0.077
 safely-0.077
Token a
Token ID257
Feature activation-7.03e+00

Pos logit contributions
 writer+0.122
 witch+0.121
 Can+0.118
 whale+0.117
 insect+0.113
Neg logit contributions
ips-0.111
 ease-0.110
 safely-0.107
 securely-0.106
rolled-0.101
Token Steve
Token ID6542
Feature activation-3.94e+00

Pos logit contributions
 unlike+0.139
 Can+0.138
 oh+0.136
heart+0.132
 Please+0.130
Neg logit contributions
 safely-0.113
 finish-0.111
ippy-0.110
 securely-0.103
 neatly-0.101
Token and
Token ID290
Feature activation-4.31e+00

Pos logit contributions
 oh+0.104
 Can+0.103
 Please+0.097
 dear+0.096
 please+0.094
Neg logit contributions
aled-0.080
ips-0.078
wed-0.076
ge-0.071
aned-0.070
Token a
Token ID257
Feature activation-6.32e+00

Pos logit contributions
 Can+0.103
 insect+0.102
 whale+0.098
 crow+0.096
 chick+0.095
Neg logit contributions
aled-0.083
 off-0.077
 safely-0.075
 securely-0.075
 ease-0.075
Token Steve
Token ID6542
Feature activation-3.47e+00

Pos logit contributions
 Can+0.115
heart+0.110
 Please+0.109
 unlike+0.106
 happened+0.104
Neg logit contributions
 safely-0.117
 neatly-0.109
 ease-0.106
 finish-0.104
 securely-0.103
Token's
Token ID338
Feature activation-1.23e+00

Pos logit contributions
 oh+0.085
 Can+0.085
 insect+0.079
 Please+0.078
 dear+0.075
Neg logit contributions
ips-0.080
aled-0.077
wed-0.071
aned-0.071
 ease-0.068
Token,
Token ID11
Feature activation-4.54e+00

Pos logit contributions
 insect+0.058
 Can+0.058
 witch+0.056
 goat+0.056
 oh+0.054
Neg logit contributions
ips-0.085
ge-0.075
 safely-0.073
aned-0.072
ippy-0.072
Token there
Token ID612
Feature activation-1.11e+00

Pos logit contributions
 insect+0.089
 witch+0.087
 Can+0.084
 writer+0.082
 goat+0.081
Neg logit contributions
ips-0.118
 safely-0.109
ge-0.107
 securely-0.105
aled-0.104
Token was
Token ID373
Feature activation-4.80e+00

Pos logit contributions
 Can+0.023
 insect+0.022
 oh+0.022
 Please+0.021
 witch+0.021
Neg logit contributions
ips-0.029
aned-0.027
aled-0.026
 ease-0.025
ge-0.025
Token a
Token ID257
Feature activation-5.88e+00

Pos logit contributions
 Can+0.107
 Please+0.098
 oh+0.096
 witch+0.095
 insect+0.094
Neg logit contributions
ips-0.116
 ease-0.108
aled-0.108
ags-0.104
 safely-0.102
Token j
Token ID474
Feature activation-3.28e+00

Pos logit contributions
 oh+0.115
 Can+0.115
 happened+0.110
 Please+0.103
 Do+0.102
Neg logit contributions
 safely-0.117
ips-0.115
ippy-0.114
ge-0.107
 securely-0.106
Tokenolly
Token ID5098
Feature activation-7.02e+00

Pos logit contributions
 Can+0.071
 oh+0.070
 insect+0.064
 Please+0.062
 mean+0.061
Neg logit contributions
ips-0.089
aned-0.082
aled-0.080
ge-0.074
ippy-0.074
Token boy
Token ID2933
Feature activation-5.12e+00

Pos logit contributions
 Can+0.132
 oh+0.125
 Do+0.120
 Are+0.120
 happened+0.119
Neg logit contributions
ippy-0.160
 safely-0.152
ips-0.140
ge-0.138
 securely-0.134
Token named
Token ID3706
Feature activation-3.30e+00

Pos logit contributions
 Can+0.083
 oh+0.076
 hello+0.073
 dear+0.072
 Please+0.071
Neg logit contributions
ips-0.152
ge-0.139
wed-0.139
ippy-0.138
aled-0.137
Token Tim
Token ID5045
Feature activation-2.23e+00

Pos logit contributions
 Can+0.043
 oh+0.040
 insect+0.039
 mean+0.035
 witch+0.035
Neg logit contributions
ips-0.115
aned-0.110
aled-0.109
ge-0.103
wed-0.102
Tokenmy
Token ID1820
Feature activation-4.01e+00

Pos logit contributions
 Can+0.041
 insect+0.040
 mean+0.040
 oh+0.039
 I+0.036
Neg logit contributions
ips-0.066
aned-0.063
aled-0.060
ulk-0.055
 securely-0.054
Token.
Token ID13
Feature activation-7.27e-01

Pos logit contributions
 Can+0.087
 oh+0.084
 insect+0.080
 Please+0.077
 mean+0.076
Neg logit contributions
ips-0.103
aned-0.098
aled-0.096
ge-0.091
wed-0.089
Token<|endoftext|>
Token ID50256
Feature activation+5.86e-02

Pos logit contributions
ips+0.077
aned+0.074
aled+0.071
ge+0.070
wed+0.066
Neg logit contributions
 Can-0.051
 oh-0.047
 insect-0.045
 I-0.045
 mean-0.043
TokenOne
Token ID3198
Feature activation-1.80e+00

Pos logit contributions
aned+0.002
ips+0.002
aled+0.002
ge+0.002
wed+0.002
Neg logit contributions
 Can-0.001
 I-0.001
 mean-0.001
 happened-0.001
 oh-0.001
Token day
Token ID1110
Feature activation-2.22e+00

Pos logit contributions
 Can+0.038
 oh+0.035
 Please+0.033
 insect+0.032
 witch+0.031
Neg logit contributions
ips-0.049
aled-0.045
aned-0.045
wed-0.042
 safely-0.042
Token,
Token ID11
Feature activation-4.20e+00

Pos logit contributions
 Can+0.039
 insect+0.037
 witch+0.037
 oh+0.034
 writer+0.033
Neg logit contributions
ips-0.069
aned-0.063
aled-0.058
ippy-0.058
 off-0.058
Token a
Token ID257
Feature activation-6.02e+00

Pos logit contributions
 witch+0.089
 insect+0.086
 Can+0.085
 writer+0.082
 crow+0.078
Neg logit contributions
ips-0.106
aned-0.096
aled-0.095
 safely-0.094
 securely-0.093
Token funny
Token ID8258
Feature activation-7.01e+00

Pos logit contributions
 Can+0.127
 oh+0.119
 unlike+0.116
heart+0.115
 Please+0.113
Neg logit contributions
ippy-0.112
ips-0.108
 safely-0.105
 securely-0.097
ge-0.096
Token cat
Token ID3797
Feature activation-4.51e+00

Pos logit contributions
 Can+0.136
 Do+0.130
 oh+0.122
 Please+0.119
 Does+0.118
Neg logit contributions
ippy-0.155
 safely-0.127
ips-0.127
 home-0.123
 happily-0.123
Token and
Token ID290
Feature activation-4.55e+00

Pos logit contributions
 oh+0.084
 dear+0.081
 Can+0.077
 Please+0.073
 please+0.070
Neg logit contributions
aled-0.113
ips-0.113
wed-0.107
 safely-0.104
eds-0.102
Token a
Token ID257
Feature activation-5.59e+00

Pos logit contributions
 Can+0.127
 Please+0.119
 insect+0.118
 writer+0.114
 whale+0.113
Neg logit contributions
 off-0.047
 ease-0.042
aled-0.040
 herself-0.038
 himself-0.035
Token little
Token ID1310
Feature activation-4.43e+00

Pos logit contributions
heart+0.115
 Can+0.108
 unlike+0.103
 Judy+0.101
 Please+0.100
Neg logit contributions
 safely-0.092
 ease-0.090
 finish-0.082
 securely-0.082
 top-0.081
Token dog
Token ID3290
Feature activation-3.23e+00

Pos logit contributions
 oh+0.078
 Can+0.071
 unlike+0.066
 happened+0.062
 Please+0.062
Neg logit contributions
ips-0.116
ippy-0.099
ags-0.099
 off-0.097
 halves-0.097
Token upon
Token ID2402
Feature activation-3.56e+00

Pos logit contributions
ips+0.002
aned+0.002
aled+0.002
ge+0.001
wed+0.001
Neg logit contributions
 Can-0.001
 oh-0.001
 insect-0.001
 mean-0.001
 I-0.001
Token a
Token ID257
Feature activation-3.39e+00

Pos logit contributions
 Can+0.066
 insect+0.061
 oh+0.060
 Please+0.056
 goat+0.056
Neg logit contributions
ips-0.099
aled-0.092
aned-0.092
 safely-0.090
 ease-0.088
Token time
Token ID640
Feature activation-3.19e+00

Pos logit contributions
 Can+0.078
 oh+0.074
 insect+0.071
 witch+0.067
 Please+0.067
Neg logit contributions
ips-0.085
aned-0.080
aled-0.076
wed-0.074
ippy-0.072
Token,
Token ID11
Feature activation-4.72e+00

Pos logit contributions
 Can+0.061
 insect+0.061
 goat+0.059
 witch+0.058
 oh+0.058
Neg logit contributions
ips-0.092
aned-0.077
 safely-0.076
ippy-0.076
ge-0.074
Token a
Token ID257
Feature activation-5.80e+00

Pos logit contributions
 insect+0.094
 goat+0.093
 witch+0.093
 Can+0.091
 writer+0.089
Neg logit contributions
ips-0.117
 safely-0.104
 securely-0.100
wed-0.099
aled-0.098
Token shy
Token ID15800
Feature activation-7.01e+00

Pos logit contributions
 Can+0.119
 oh+0.118
 Please+0.103
 Are+0.101
 Do+0.100
Neg logit contributions
ips-0.134
ippy-0.128
ags-0.116
 safely-0.116
ge-0.109
Token little
Token ID1310
Feature activation-5.56e+00

Pos logit contributions
 oh+0.129
 Can+0.112
 Please+0.104
 Do+0.100
 Are+0.100
Neg logit contributions
ippy-0.179
ips-0.177
aled-0.164
 safely-0.161
lo-0.155
Token girl
Token ID2576
Feature activation-4.86e+00

Pos logit contributions
 oh+0.097
 Can+0.092
 unlike+0.090
 Please+0.081
 Are+0.080
Neg logit contributions
ips-0.150
ippy-0.142
ags-0.124
 halves-0.124
ifts-0.119
Token named
Token ID3706
Feature activation-2.96e+00

Pos logit contributions
 oh+0.085
 Can+0.077
 dear+0.072
 Please+0.071
 hello+0.068
Neg logit contributions
ips-0.141
wed-0.130
aled-0.127
eds-0.119
ws-0.116
Token Lily
Token ID20037
Feature activation-3.93e+00

Pos logit contributions
 Can+0.047
 oh+0.044
 insect+0.043
 mean+0.040
 Please+0.039
Neg logit contributions
ips-0.093
aned-0.088
aled-0.088
wed-0.081
 safely-0.080
Token went
Token ID1816
Feature activation+6.55e-01

Pos logit contributions
 oh+0.092
 Can+0.090
 Please+0.082
 please+0.081
 insect+0.081
Neg logit contributions
ips-0.092
wed-0.089
aled-0.088
ws-0.083
aned-0.082
Token time
Token ID640
Feature activation-3.30e+00

Pos logit contributions
 Can+0.115
 oh+0.107
 insect+0.101
 I+0.099
 Please+0.098
Neg logit contributions
ips-0.086
aned-0.079
ippy-0.078
wed-0.075
aled-0.073
Token there
Token ID612
Feature activation-1.15e+00

Pos logit contributions
 Can+0.071
 goat+0.071
 insect+0.069
 witch+0.069
 writer+0.068
Neg logit contributions
ips-0.081
ippy-0.070
 safely-0.069
ge-0.069
aned-0.065
Token was
Token ID373
Feature activation-5.07e+00

Pos logit contributions
 Can+0.026
 insect+0.024
 oh+0.024
 Please+0.024
 witch+0.023
Neg logit contributions
ips-0.028
aned-0.026
 ease-0.025
aled-0.025
 safely-0.024
Token a
Token ID257
Feature activation-6.04e+00

Pos logit contributions
 Can+0.122
 Please+0.113
 Are+0.108
 guy+0.107
 witch+0.106
Neg logit contributions
ips-0.115
 ease-0.109
ags-0.105
aled-0.104
 safely-0.102
Token j
Token ID474
Feature activation-2.97e+00

Pos logit contributions
 Can+0.123
 oh+0.121
 happened+0.119
 Please+0.111
 unlike+0.110
Neg logit contributions
ippy-0.115
 safely-0.114
ips-0.111
 securely-0.105
 finish-0.104
Tokenolly
Token ID5098
Feature activation-7.00e+00

Pos logit contributions
 Can+0.063
 oh+0.062
 insect+0.057
 Please+0.055
 mean+0.054
Neg logit contributions
ips-0.081
aned-0.075
aled-0.074
ge-0.068
ippy-0.067
Token girl
Token ID2576
Feature activation-5.23e+00

Pos logit contributions
 Can+0.139
 oh+0.130
 Do+0.129
 Are+0.127
 happened+0.124
Neg logit contributions
ippy-0.149
 safely-0.142
ail-0.125
ge-0.125
ips-0.124
Token.
Token ID13
Feature activation-6.25e-01

Pos logit contributions
 Can+0.088
 oh+0.086
 guy+0.083
 hello+0.080
 happened+0.078
Neg logit contributions
ips-0.141
 safely-0.128
wed-0.125
ippy-0.123
eds-0.123
Token She
Token ID1375
Feature activation-2.79e+00

Pos logit contributions
 oh+0.011
 Can+0.011
 insect+0.010
 mean+0.010
 witch+0.009
Neg logit contributions
ips-0.019
aled-0.017
aned-0.017
ags-0.016
 ease-0.016
Token was
Token ID373
Feature activation-2.60e+00

Pos logit contributions
 Can+0.053
 oh+0.052
 insect+0.051
 hello+0.047
 Please+0.046
Neg logit contributions
ips-0.074
aned-0.072
aled-0.072
wed-0.068
 securely-0.066
Token gigg
Token ID30442
Feature activation-1.77e+00

Pos logit contributions
 Can+0.058
 insect+0.055
 guy+0.052
 Please+0.051
 whale+0.051
Neg logit contributions
ips-0.064
aled-0.057
aned-0.053
 securely-0.053
 safely-0.052
Token,
Token ID11
Feature activation-4.62e+00

Pos logit contributions
 goat+0.070
 Can+0.068
 insect+0.067
 pigeon+0.066
 witch+0.066
Neg logit contributions
ips-0.079
ge-0.067
 safely-0.067
ippy-0.066
aned-0.063
Token there
Token ID612
Feature activation-9.78e-01

Pos logit contributions
 witch+0.096
 goat+0.096
 photographer+0.095
 writer+0.095
 insect+0.094
Neg logit contributions
ips-0.106
 safely-0.101
ge-0.094
 securely-0.094
 off-0.092
Token was
Token ID373
Feature activation-4.74e+00

Pos logit contributions
 Can+0.022
 insect+0.020
 oh+0.020
 Please+0.020
 witch+0.019
Neg logit contributions
ips-0.025
aned-0.022
aled-0.022
 ease-0.021
 safely-0.021
Token a
Token ID257
Feature activation-6.08e+00

Pos logit contributions
 Can+0.114
 Please+0.106
 Are+0.102
 oh+0.099
 guy+0.099
Neg logit contributions
ips-0.106
 ease-0.100
aled-0.098
ags-0.095
 safely-0.094
Token j
Token ID474
Feature activation-2.98e+00

Pos logit contributions
 happened+0.127
 Can+0.125
 oh+0.124
 unlike+0.118
 Please+0.116
Neg logit contributions
 safely-0.104
ippy-0.099
 finish-0.098
 checkout-0.094
ips-0.094
Tokenolly
Token ID5098
Feature activation-6.98e+00

Pos logit contributions
 Can+0.066
 oh+0.065
 insect+0.059
 Please+0.057
 Are+0.056
Neg logit contributions
ips-0.079
aned-0.072
aled-0.071
ippy-0.066
ags-0.065
Token girl
Token ID2576
Feature activation-5.25e+00

Pos logit contributions
 Can+0.134
 oh+0.125
 Do+0.124
 Are+0.123
 happened+0.122
Neg logit contributions
ippy-0.150
 safely-0.141
ips-0.125
 securely-0.122
ail-0.122
Token named
Token ID3706
Feature activation-3.09e+00

Pos logit contributions
 Can+0.092
 oh+0.092
 guy+0.089
 hello+0.086
 happened+0.084
Neg logit contributions
ips-0.136
 safely-0.121
wed-0.118
ippy-0.117
eds-0.117
Token Joy
Token ID14087
Feature activation-3.97e+00

Pos logit contributions
 Can+0.045
 oh+0.042
 insect+0.041
 mean+0.038
 Please+0.037
Neg logit contributions
ips-0.104
aned-0.099
aled-0.097
ge-0.091
wed-0.091
Token.
Token ID13
Feature activation+1.57e-01

Pos logit contributions
 Can+0.108
 oh+0.108
 Please+0.099
 insect+0.096
 guy+0.096
Neg logit contributions
ips-0.073
ge-0.066
wed-0.065
 safely-0.064
ippy-0.061
Token She
Token ID1375
Feature activation-2.81e+00

Pos logit contributions
ips+0.005
aned+0.004
aled+0.004
wed+0.004
ags+0.004
Neg logit contributions
 oh-0.003
 Can-0.003
 insect-0.003
 mean-0.002
 witch-0.002
Token,
Token ID11
Feature activation-4.51e+00

Pos logit contributions
 Can+0.068
 insect+0.067
 goat+0.067
 witch+0.066
 writer+0.065
Neg logit contributions
ips-0.084
 safely-0.070
ippy-0.069
ge-0.068
aned-0.067
Token there
Token ID612
Feature activation-1.07e+00

Pos logit contributions
 witch+0.092
 insect+0.090
 Can+0.089
 writer+0.089
 goat+0.088
Neg logit contributions
ips-0.114
 safely-0.103
 securely-0.097
aled-0.096
 off-0.094
Token was
Token ID373
Feature activation-4.84e+00

Pos logit contributions
 Can+0.022
 insect+0.021
 oh+0.021
 Please+0.020
 witch+0.020
Neg logit contributions
ips-0.029
aned-0.027
aled-0.026
 ease-0.025
 safely-0.025
Token a
Token ID257
Feature activation-5.99e+00

Pos logit contributions
 Can+0.116
 Please+0.107
 Are+0.104
 oh+0.102
 guy+0.102
Neg logit contributions
ips-0.109
 ease-0.101
ags-0.100
aled-0.099
 safely-0.097
Token j
Token ID474
Feature activation-3.17e+00

Pos logit contributions
 Can+0.120
 oh+0.119
 happened+0.117
 Please+0.108
 unlike+0.107
Neg logit contributions
 safely-0.117
ippy-0.114
ips-0.113
 securely-0.105
ge-0.104
Tokenolly
Token ID5098
Feature activation-6.97e+00

Pos logit contributions
 Can+0.068
 oh+0.067
 insect+0.062
 Please+0.059
 mean+0.058
Neg logit contributions
ips-0.086
aned-0.079
aled-0.078
ge-0.071
ippy-0.071
Token boy
Token ID2933
Feature activation-4.99e+00

Pos logit contributions
 Can+0.131
 oh+0.123
 Are+0.119
 Do+0.118
 happened+0.117
Neg logit contributions
ippy-0.160
 safely-0.152
ips-0.141
 securely-0.134
ge-0.133
Token named
Token ID3706
Feature activation-3.33e+00

Pos logit contributions
 Can+0.084
 oh+0.076
 hello+0.072
 Please+0.071
 dear+0.071
Neg logit contributions
ips-0.147
wed-0.133
ippy-0.132
ge-0.131
aled-0.131
Token Tom
Token ID4186
Feature activation-3.31e+00

Pos logit contributions
 Can+0.048
 oh+0.046
 insect+0.046
 witch+0.042
 mean+0.041
Neg logit contributions
ips-0.109
aled-0.103
aned-0.102
ge-0.097
 ease-0.096
Token.
Token ID13
Feature activation-3.71e-01

Pos logit contributions
 Can+0.079
 oh+0.076
 insect+0.072
 Please+0.071
 hello+0.068
Neg logit contributions
ips-0.079
aned-0.073
aled-0.072
ge-0.071
wed-0.070
Token He
Token ID679
Feature activation-3.10e+00

Pos logit contributions
 Can+0.005
 oh+0.005
 insect+0.005
 mean+0.005
 I+0.004
Neg logit contributions
ips-0.012
aned-0.012
aled-0.012
ge-0.011
wed-0.011
Token.
Token ID13
Feature activation+1.01e+00

Pos logit contributions
ips+0.030
aned+0.030
aled+0.029
ags+0.027
wed+0.027
Neg logit contributions
 Can-0.016
 oh-0.015
 insect-0.014
 I-0.014
 mean-0.014
Token<|endoftext|>
Token ID50256
Feature activation-1.34e+00

Pos logit contributions
ips+0.029
aned+0.028
aled+0.027
wed+0.026
ge+0.025
Neg logit contributions
 Can-0.021
 oh-0.019
 I-0.018
 Please-0.018
 insect-0.017
TokenOne
Token ID3198
Feature activation-2.71e+00

Pos logit contributions
 Can+0.030
 oh+0.028
 insect+0.027
 mean+0.026
 I+0.026
Neg logit contributions
ips-0.034
aned-0.033
aled-0.032
ge-0.030
wed-0.030
Token day
Token ID1110
Feature activation-3.52e+00

Pos logit contributions
 Can+0.065
 oh+0.061
 Please+0.057
 witch+0.056
 hello+0.055
Neg logit contributions
ips-0.063
aled-0.056
aned-0.054
 safely-0.054
 ease-0.053
Token,
Token ID11
Feature activation-5.33e+00

Pos logit contributions
 Can+0.065
 insect+0.065
 witch+0.064
 crow+0.058
 writer+0.058
Neg logit contributions
ips-0.096
aned-0.089
 off-0.087
rolled-0.085
ge-0.084
Token a
Token ID257
Feature activation-6.97e+00

Pos logit contributions
 witch+0.126
 writer+0.120
 Can+0.117
 insect+0.115
 crow+0.109
Neg logit contributions
 ease-0.111
 safely-0.109
ips-0.109
 securely-0.107
rolled-0.106
Token small
Token ID1402
Feature activation-5.48e+00

Pos logit contributions
 Can+0.139
 unlike+0.139
 oh+0.136
 happened+0.129
 Please+0.129
Neg logit contributions
ippy-0.112
 safely-0.112
 finish-0.110
 securely-0.105
 checkout-0.102
Token girl
Token ID2576
Feature activation-5.12e+00

Pos logit contributions
 oh+0.103
 Can+0.090
 Do+0.085
 Please+0.080
 hello+0.079
Neg logit contributions
ips-0.130
ippy-0.128
 safely-0.119
 home-0.117
ifts-0.113
Token called
Token ID1444
Feature activation-4.27e+00

Pos logit contributions
 oh+0.094
 dear+0.091
 Can+0.084
 sir+0.084
 Please+0.084
Neg logit contributions
ips-0.130
wed-0.120
aled-0.120
 safely-0.115
eds-0.109
Token Lucy
Token ID22162
Feature activation-3.71e+00

Pos logit contributions
 Can+0.081
 oh+0.077
 insect+0.075
 crow+0.072
 Please+0.072
Neg logit contributions
ips-0.114
aled-0.106
aned-0.101
 ease-0.100
 off-0.100
Token went
Token ID1816
Feature activation+1.04e+00

Pos logit contributions
 oh+0.087
 Can+0.086
 insect+0.082
 Please+0.079
 mean+0.077
Neg logit contributions
ips-0.083
aled-0.079
wed-0.077
 safely-0.075
aned-0.075
Token.
Token ID13
Feature activation+1.76e+00

Pos logit contributions
 Can+0.015
 oh+0.014
 insect+0.014
 mean+0.013
 I+0.013
Neg logit contributions
ips-0.024
aned-0.023
aled-0.022
ge-0.021
wed-0.020
Token<|endoftext|>
Token ID50256
Feature activation-9.66e-01

Pos logit contributions
ips+0.049
aned+0.047
aled+0.045
ge+0.042
wed+0.042
Neg logit contributions
 Can-0.036
 oh-0.034
 insect-0.033
 mean-0.032
 Please-0.031
TokenOne
Token ID3198
Feature activation-2.15e+00

Pos logit contributions
 Can+0.017
 I+0.015
 oh+0.015
 insect+0.014
 mean+0.014
Neg logit contributions
aned-0.030
ips-0.029
aled-0.028
wed-0.027
ge-0.027
Token day
Token ID1110
Feature activation-3.20e+00

Pos logit contributions
 Can+0.049
 oh+0.046
 Please+0.043
 insect+0.042
 witch+0.042
Neg logit contributions
ips-0.053
aled-0.049
aned-0.048
wed-0.046
 safely-0.046
Token,
Token ID11
Feature activation-5.20e+00

Pos logit contributions
 Can+0.064
 insect+0.063
 witch+0.061
 crow+0.059
 goat+0.057
Neg logit contributions
ips-0.085
aned-0.078
 off-0.076
ippy-0.074
ge-0.073
Token a
Token ID257
Feature activation-6.96e+00

Pos logit contributions
 witch+0.131
 writer+0.130
 whale+0.125
 insect+0.123
 crow+0.121
Neg logit contributions
 safely-0.094
 securely-0.092
ips-0.091
rolled-0.091
 ease-0.091
Token small
Token ID1402
Feature activation-5.47e+00

Pos logit contributions
 unlike+0.138
 Can+0.135
 oh+0.135
heart+0.133
 Please+0.126
Neg logit contributions
 safely-0.116
 finish-0.114
ippy-0.113
 securely-0.106
ips-0.103
Token boy
Token ID2933
Feature activation-4.80e+00

Pos logit contributions
 oh+0.103
 Can+0.088
 Do+0.081
 Please+0.077
 hello+0.077
Neg logit contributions
ips-0.137
ippy-0.133
 safely-0.122
 home-0.119
ifts-0.116
Token wanted
Token ID2227
Feature activation-2.12e+00

Pos logit contributions
 dear+0.088
 oh+0.084
 Please+0.076
 Can+0.074
 hello+0.074
Neg logit contributions
ips-0.130
wed-0.125
aled-0.124
 safely-0.111
ws-0.109
Token to
Token ID284
Feature activation+5.89e-01

Pos logit contributions
 Can+0.052
 oh+0.049
 insect+0.048
 Are+0.048
 witch+0.048
Neg logit contributions
ips-0.044
aled-0.040
 safely-0.039
aned-0.039
 off-0.038
Token eat
Token ID4483
Feature activation-1.57e+00

Pos logit contributions
ips+0.015
aned+0.014
aled+0.014
ge+0.013
wed+0.012
Neg logit contributions
 Can-0.013
 oh-0.013
 insect-0.013
 mean-0.012
 I-0.012
Token.
Token ID13
Feature activation+1.75e+00

Pos logit contributions
aned+0.063
ips+0.063
aled+0.061
ulk+0.057
wed+0.055
Neg logit contributions
 Can-0.030
 oh-0.028
 I-0.027
 mean-0.027
 insect-0.027
Token<|endoftext|>
Token ID50256
Feature activation-6.08e-01

Pos logit contributions
ips+0.059
aned+0.057
ge+0.055
aled+0.053
wed+0.052
Neg logit contributions
 Can-0.028
 I-0.024
 Please-0.023
 oh-0.022
 hello-0.021
TokenOne
Token ID3198
Feature activation-2.03e+00

Pos logit contributions
 Can+0.010
 I+0.009
 mean+0.008
 oh+0.008
 happened+0.008
Neg logit contributions
aned-0.020
ips-0.019
aled-0.019
wed-0.018
ge-0.018
Token day
Token ID1110
Feature activation-2.81e+00

Pos logit contributions
 Can+0.040
 oh+0.038
 insect+0.036
 Please+0.035
 hello+0.034
Neg logit contributions
ips-0.057
aned-0.053
aled-0.052
wed-0.049
ge-0.049
Token,
Token ID11
Feature activation-4.99e+00

Pos logit contributions
 insect+0.045
 Can+0.044
 witch+0.043
 oh+0.041
 crow+0.038
Neg logit contributions
ips-0.090
aned-0.083
aled-0.078
ge-0.077
ippy-0.077
Token a
Token ID257
Feature activation-6.96e+00

Pos logit contributions
 witch+0.120
 writer+0.113
 insect+0.112
 photographer+0.106
 peac+0.105
Neg logit contributions
ips-0.109
 safely-0.100
 securely-0.099
aled-0.098
 ease-0.095
Token little
Token ID1310
Feature activation-6.28e+00

Pos logit contributions
 unlike+0.134
 oh+0.134
 Can+0.132
heart+0.125
 Please+0.123
Neg logit contributions
ippy-0.126
 safely-0.122
ips-0.122
 finish-0.116
 securely-0.112
Token girl
Token ID2576
Feature activation-5.27e+00

Pos logit contributions
 oh+0.108
 Can+0.103
 unlike+0.102
 hello+0.097
 happened+0.093
Neg logit contributions
ips-0.161
ippy-0.147
 halves-0.136
ags-0.134
 safely-0.134
Token named
Token ID3706
Feature activation-3.22e+00

Pos logit contributions
 oh+0.089
 dear+0.082
 Can+0.078
 hello+0.076
 Please+0.074
Neg logit contributions
ips-0.149
wed-0.141
aled-0.138
 safely-0.125
ws-0.124
Token Lily
Token ID20037
Feature activation-4.19e+00

Pos logit contributions
 Can+0.052
 oh+0.051
 insect+0.050
 goat+0.046
 crow+0.045
Neg logit contributions
ips-0.098
aned-0.093
aled-0.092
wed-0.086
ags-0.084
Token went
Token ID1816
Feature activation+5.27e-01

Pos logit contributions
 oh+0.102
 Can+0.099
 hello+0.094
 dear+0.094
 please+0.093
Neg logit contributions
ips-0.093
wed-0.088
aled-0.088
ws-0.082
aned-0.081
Token a
Token ID257
Feature activation-4.04e+00

Pos logit contributions
 Can+0.077
 insect+0.071
 oh+0.070
 Please+0.067
 witch+0.066
Neg logit contributions
ips-0.099
aled-0.093
 safely-0.093
 ease-0.092
aned-0.091
Token time
Token ID640
Feature activation-3.20e+00

Pos logit contributions
 Can+0.105
 oh+0.101
 insect+0.095
 hello+0.091
 Please+0.091
Neg logit contributions
ips-0.086
aned-0.079
ippy-0.077
aled-0.076
wed-0.074
Token,
Token ID11
Feature activation-4.51e+00

Pos logit contributions
 insect+0.071
 goat+0.069
 witch+0.069
 Can+0.069
 crow+0.068
Neg logit contributions
ips-0.078
 safely-0.066
ippy-0.066
ge-0.064
aned-0.063
Token a
Token ID257
Feature activation-6.57e+00

Pos logit contributions
 witch+0.102
 insect+0.098
 goat+0.097
 crow+0.097
 writer+0.096
Neg logit contributions
ips-0.098
 safely-0.098
ge-0.093
 securely-0.090
 off-0.088
Token j
Token ID474
Feature activation-2.74e+00

Pos logit contributions
 oh+0.143
 Can+0.142
 unlike+0.132
 Please+0.131
 Do+0.127
Neg logit contributions
ippy-0.115
 safely-0.109
ips-0.103
 securely-0.097
 finish-0.094
Tokenolly
Token ID5098
Feature activation-6.95e+00

Pos logit contributions
 Can+0.056
 oh+0.054
 insect+0.052
 mean+0.049
 Please+0.049
Neg logit contributions
ips-0.076
aned-0.072
aled-0.070
ge-0.065
wed-0.064
Token angel
Token ID18304
Feature activation-4.12e+00

Pos logit contributions
 oh+0.141
 Can+0.141
 Do+0.127
 Please+0.127
 Does+0.124
Neg logit contributions
ippy-0.150
 safely-0.136
ips-0.130
ail-0.128
sey-0.126
Token flew
Token ID13112
Feature activation+6.68e-01

Pos logit contributions
 oh+0.056
 Can+0.052
 mean+0.043
 Please+0.043
 insect+0.041
Neg logit contributions
ips-0.138
aled-0.129
aned-0.127
wed-0.123
 safely-0.122
Token down
Token ID866
Feature activation+6.74e-01

Pos logit contributions
ips+0.018
aned+0.017
aled+0.016
ge+0.015
wed+0.015
Neg logit contributions
 Can-0.015
 insect-0.013
 oh-0.013
 mean-0.013
 hello-0.013
Token from
Token ID422
Feature activation-1.45e+00

Pos logit contributions
ips+0.020
aned+0.020
aled+0.019
ge+0.018
wed+0.018
Neg logit contributions
 Can-0.013
 insect-0.012
 oh-0.012
 I-0.011
 mean-0.011
Token the
Token ID262
Feature activation-1.27e+00

Pos logit contributions
 Can+0.030
 oh+0.029
 insect+0.028
 mean+0.027
 Please+0.027
Neg logit contributions
ips-0.039
aned-0.037
aled-0.036
ge-0.033
wed-0.033
Token upon
Token ID2402
Feature activation-3.87e+00

Pos logit contributions
ips+0.009
aned+0.008
aled+0.008
ge+0.007
wed+0.007
Neg logit contributions
 Can-0.007
 oh-0.007
 insect-0.007
 mean-0.006
 Please-0.006
Token a
Token ID257
Feature activation-3.40e+00

Pos logit contributions
 Can+0.075
 insect+0.070
 oh+0.068
 goat+0.065
 Please+0.065
Neg logit contributions
ips-0.101
 safely-0.095
aled-0.095
 ease-0.095
aned-0.094
Token time
Token ID640
Feature activation-2.88e+00

Pos logit contributions
 Can+0.082
 oh+0.079
 insect+0.075
 witch+0.071
 Please+0.071
Neg logit contributions
ips-0.081
aned-0.075
aled-0.070
ippy-0.070
wed-0.069
Token,
Token ID11
Feature activation-4.26e+00

Pos logit contributions
 insect+0.061
 goat+0.061
 witch+0.059
 crow+0.058
 pigeon+0.057
Neg logit contributions
ips-0.073
 safely-0.063
ge-0.062
 ease-0.060
ippy-0.059
Token a
Token ID257
Feature activation-5.87e+00

Pos logit contributions
 witch+0.093
 photographer+0.092
 writer+0.091
 insect+0.090
 goat+0.089
Neg logit contributions
ips-0.093
 safely-0.091
 securely-0.086
ge-0.082
 off-0.082
Token silly
Token ID14397
Feature activation-6.95e+00

Pos logit contributions
 oh+0.117
 Can+0.114
 Please+0.101
 Are+0.097
 Do+0.097
Neg logit contributions
ips-0.135
ippy-0.132
 safely-0.120
ge-0.117
 securely-0.114
Token infant
Token ID11212
Feature activation-5.01e+00

Pos logit contributions
 oh+0.118
 Can+0.115
 Please+0.105
 Do+0.104
 Are+0.102
Neg logit contributions
ippy-0.158
ips-0.154
 safely-0.150
aled-0.143
 destination-0.138
Token was
Token ID373
Feature activation-2.90e+00

Pos logit contributions
 oh+0.086
 please+0.076
 Can+0.073
 hello+0.071
 dear+0.070
Neg logit contributions
ips-0.132
aled-0.123
sey-0.121
 safely-0.118
wed-0.117
Token playing
Token ID2712
Feature activation-4.60e-01

Pos logit contributions
 Can+0.060
 oh+0.054
 insect+0.054
 Please+0.050
 hello+0.049
Neg logit contributions
ips-0.083
aned-0.077
aled-0.076
ags-0.069
 safely-0.069
Token outside
Token ID2354
Feature activation-2.23e+00

Pos logit contributions
 Can+0.010
 insect+0.009
 oh+0.009
 witch+0.008
 mean+0.008
Neg logit contributions
ips-0.012
aled-0.011
aned-0.011
ags-0.011
 safely-0.010
Token in
Token ID287
Feature activation-1.91e+00

Pos logit contributions
 Can+0.048
 oh+0.047
 insect+0.046
 witch+0.045
 pigeon+0.045
Neg logit contributions
ips-0.057
aled-0.051
aned-0.050
 safely-0.048
ulk-0.047
Token.
Token ID13
Feature activation+2.26e+00

Pos logit contributions
 Can+0.054
 oh+0.052
 insect+0.052
 mean+0.050
 pigeon+0.048
Neg logit contributions
ips-0.056
aned-0.049
 safely-0.049
aled-0.048
ge-0.047
Token<|endoftext|>
Token ID50256
Feature activation-9.21e-01

Pos logit contributions
ips+0.066
aned+0.064
aled+0.061
ge+0.058
wed+0.057
Neg logit contributions
 Can-0.044
 oh-0.040
 insect-0.040
 I-0.038
 mean-0.038
TokenOne
Token ID3198
Feature activation-2.30e+00

Pos logit contributions
 Can+0.015
 I+0.013
 oh+0.012
 Are+0.012
 happened+0.012
Neg logit contributions
aned-0.030
ips-0.029
aled-0.029
wed-0.028
ge-0.027
Token day
Token ID1110
Feature activation-3.61e+00

Pos logit contributions
 Can+0.052
 oh+0.049
 Please+0.046
 insect+0.045
 hello+0.044
Neg logit contributions
ips-0.057
aled-0.052
aned-0.051
 safely-0.050
wed-0.049
Token,
Token ID11
Feature activation-5.36e+00

Pos logit contributions
 insect+0.072
 witch+0.071
 Can+0.071
 crow+0.066
 writer+0.063
Neg logit contributions
ips-0.095
aned-0.087
 off-0.085
ge-0.083
rolled-0.083
Token a
Token ID257
Feature activation-6.95e+00

Pos logit contributions
 witch+0.132
 writer+0.123
 insect+0.123
 Can+0.117
 crow+0.116
Neg logit contributions
 safely-0.108
 ease-0.107
ips-0.106
 securely-0.106
rolled-0.105
Token little
Token ID1310
Feature activation-6.24e+00

Pos logit contributions
 Can+0.136
 oh+0.135
 unlike+0.135
heart+0.129
 Please+0.125
Neg logit contributions
 safely-0.119
ippy-0.117
 finish-0.115
 securely-0.110
ips-0.108
Token girl
Token ID2576
Feature activation-5.04e+00

Pos logit contributions
 oh+0.115
 unlike+0.108
 Can+0.105
 happened+0.105
 hello+0.102
Neg logit contributions
ips-0.138
ippy-0.130
 home-0.124
 safely-0.121
 halves-0.115
Token went
Token ID1816
Feature activation+4.73e-01

Pos logit contributions
 oh+0.088
 dear+0.085
 Can+0.080
 hello+0.078
 Please+0.078
Neg logit contributions
ips-0.133
wed-0.124
aled-0.124
 safely-0.115
eds-0.112
Token outside
Token ID2354
Feature activation-1.44e+00

Pos logit contributions
ips+0.014
aned+0.013
aled+0.013
ge+0.012
wed+0.012
Neg logit contributions
 Can-0.009
 oh-0.008
 insect-0.008
 mean-0.008
 I-0.008
Token to
Token ID284
Feature activation+1.28e-01

Pos logit contributions
 Can+0.030
 oh+0.029
 insect+0.028
 mean+0.027
 witch+0.026
Neg logit contributions
ips-0.039
aled-0.035
aned-0.035
ge-0.033
ulk-0.032
Token.
Token ID13
Feature activation+7.28e-02

Pos logit contributions
 Can+0.028
 oh+0.027
 insect+0.026
 mean+0.025
 I+0.024
Neg logit contributions
ips-0.044
aned-0.042
aled-0.041
wed-0.038
ge-0.037
Token The
Token ID383
Feature activation+2.40e-02

Pos logit contributions
ips+0.002
aned+0.002
aled+0.002
ge+0.002
wed+0.002
Neg logit contributions
 Can-0.002
 oh-0.001
 insect-0.001
 mean-0.001
 Please-0.001
Token dragon
Token ID10441
Feature activation-2.33e-01

Pos logit contributions
ips+0.001
aned+0.001
aled+0.001
 ease+0.001
 safely+0.001
Neg logit contributions
 Can-0.000
 oh-0.000
 insect-0.000
 I-0.000
 Please-0.000
Token took
Token ID1718
Feature activation+2.49e+00

Pos logit contributions
 Can+0.005
 oh+0.005
 insect+0.004
 Do+0.004
 I+0.004
Neg logit contributions
aned-0.006
ips-0.006
aled-0.006
 safely-0.006
ge-0.005
Token the
Token ID262
Feature activation+1.83e+00

Pos logit contributions
ips+0.071
aned+0.068
aled+0.066
ge+0.061
wed+0.061
Neg logit contributions
 Can-0.049
 oh-0.047
 insect-0.045
 mean-0.043
 I-0.042
Token treasure
Token ID14068
Feature activation+4.34e+00

Pos logit contributions
ips+0.050
aned+0.049
aled+0.048
ps+0.047
 ease+0.047
Neg logit contributions
 Can-0.033
 oh-0.030
 Please-0.028
 Are-0.028
 insect-0.028
Token and
Token ID290
Feature activation+5.83e-01

Pos logit contributions
ips+0.129
aned+0.120
aled+0.117
ulk+0.110
wed+0.108
Neg logit contributions
 Can-0.079
 insect-0.079
 oh-0.077
 mean-0.076
 I-0.073
Token flew
Token ID13112
Feature activation+3.03e+00

Pos logit contributions
ips+0.016
aned+0.016
aled+0.015
ge+0.014
wed+0.014
Neg logit contributions
 Can-0.012
 oh-0.011
 insect-0.011
 mean-0.010
 Please-0.010
Token away
Token ID1497
Feature activation+1.52e+00

Pos logit contributions
ips+0.055
aned+0.051
aled+0.046
ge+0.042
wed+0.042
Neg logit contributions
 Can-0.092
 oh-0.089
 mean-0.086
 insect-0.086
 hello-0.086
Token.
Token ID13
Feature activation+9.93e-01

Pos logit contributions
ips+0.054
aled+0.051
aned+0.050
wed+0.048
ags+0.048
Neg logit contributions
 oh-0.020
 Can-0.020
 mean-0.019
 curious-0.018
 I-0.018
Token Jack
Token ID3619
Feature activation+3.01e-01

Pos logit contributions
ips+0.029
aned+0.027
aled+0.026
ge+0.025
wed+0.025
Neg logit contributions
 Can-0.020
 oh-0.018
 insect-0.018
 I-0.017
 Please-0.017
Token that
Token ID326
Feature activation-1.92e+00

Pos logit contributions
 Can+0.050
 mean+0.047
 oh+0.047
 insect+0.046
 Please+0.043
Neg logit contributions
ips-0.062
aned-0.056
 safely-0.055
ge-0.054
ulk-0.054
Token she
Token ID673
Feature activation+5.36e-01

Pos logit contributions
 Can+0.024
 insect+0.023
 oh+0.022
 Please+0.019
 hello+0.019
Neg logit contributions
ips-0.067
aled-0.062
aned-0.061
 ease-0.061
 safely-0.060
Token didn
Token ID1422
Feature activation+4.32e+00

Pos logit contributions
ips+0.015
aned+0.015
aled+0.014
ge+0.013
wed+0.013
Neg logit contributions
 Can-0.010
 oh-0.010
 insect-0.010
 mean-0.009
 witch-0.009
Token't
Token ID470
Feature activation+4.52e-01

Pos logit contributions
ips+0.130
wed+0.126
aned+0.124
aled+0.121
ws+0.121
Neg logit contributions
 I-0.076
 Can-0.075
 mean-0.071
 please-0.069
 Please-0.065
Token forget
Token ID6044
Feature activation-1.49e+00

Pos logit contributions
ips+0.011
aned+0.011
aled+0.011
ge+0.010
wed+0.010
Neg logit contributions
 Can-0.010
 oh-0.010
 insect-0.009
 witch-0.008
 Please-0.008
Token to
Token ID284
Feature activation+4.11e+00

Pos logit contributions
 Can+0.028
 insect+0.026
 oh+0.026
 Please+0.025
 mean+0.024
Neg logit contributions
ips-0.041
aned-0.039
aled-0.037
 safely-0.037
 off-0.036
Token smile
Token ID8212
Feature activation+2.58e-01

Pos logit contributions
ips+0.119
aned+0.113
aled+0.107
ge+0.099
ulk+0.098
Neg logit contributions
 Can-0.080
 mean-0.079
 insect-0.078
 oh-0.078
 I-0.075
Token anymore
Token ID7471
Feature activation-9.56e-01

Pos logit contributions
ips+0.007
aned+0.007
aled+0.007
ge+0.006
wed+0.006
Neg logit contributions
 Can-0.005
 oh-0.005
 insect-0.005
 mean-0.004
 Please-0.004
Token when
Token ID618
Feature activation-3.20e+00

Pos logit contributions
 Can+0.018
 oh+0.017
 insect+0.016
 mean+0.015
 I+0.015
Neg logit contributions
ips-0.029
aned-0.027
aled-0.027
wed-0.025
ge-0.025
Token taking
Token ID2263
Feature activation+2.86e+00

Pos logit contributions
 Can+0.044
 insect+0.041
 oh+0.040
 mean+0.038
 Please+0.036
Neg logit contributions
ips-0.108
aned-0.098
aled-0.098
 ease-0.096
 safely-0.095
Token pictures
Token ID5986
Feature activation+1.61e+00

Pos logit contributions
ips+0.081
aned+0.079
aled+0.075
ge+0.072
wed+0.070
Neg logit contributions
 Can-0.057
 oh-0.054
 insect-0.052
 mean-0.052
 hello-0.050
Token,
Token ID11
Feature activation-3.76e+00

Pos logit contributions
 Can+0.012
 oh+0.011
 insect+0.011
 mean+0.010
 Please+0.010
Neg logit contributions
ips-0.024
aned-0.023
aled-0.022
ge-0.021
wed-0.021
Token you
Token ID345
Feature activation+1.32e+00

Pos logit contributions
 Can+0.060
 writer+0.056
 guy+0.055
 oh+0.053
 insect+0.053
Neg logit contributions
ips-0.098
 safely-0.097
 off-0.094
aned-0.093
aled-0.089
Token don
Token ID836
Feature activation+3.49e+00

Pos logit contributions
ips+0.043
aned+0.041
aled+0.040
ge+0.038
ags+0.037
Neg logit contributions
 Can-0.021
 oh-0.019
 insect-0.019
 mean-0.019
 I-0.018
Token't
Token ID470
Feature activation+1.90e+00

Pos logit contributions
wed+0.106
ips+0.105
aled+0.104
ws+0.102
ge+0.100
Neg logit contributions
 I-0.059
 mean-0.058
 Can-0.058
 Please-0.052
 please-0.051
Token want
Token ID765
Feature activation+1.78e-02

Pos logit contributions
ips+0.059
aned+0.056
aled+0.055
wed+0.051
ags+0.051
Neg logit contributions
 Can-0.033
 oh-0.031
 mean-0.031
 insect-0.031
 I-0.029
Token to
Token ID284
Feature activation+3.94e+00

Pos logit contributions
ips+0.000
aned+0.000
aled+0.000
ge+0.000
 safely+0.000
Neg logit contributions
 Can-0.000
 oh-0.000
 insect-0.000
 mean-0.000
 witch-0.000
Token get
Token ID651
Feature activation+2.51e+00

Pos logit contributions
ips+0.117
aned+0.112
aled+0.111
wed+0.102
ags+0.101
Neg logit contributions
 mean-0.075
 Can-0.074
 I-0.073
 oh-0.071
 insect-0.070
Token hurt
Token ID5938
Feature activation+4.27e-01

Pos logit contributions
ips+0.071
aned+0.068
aled+0.066
ge+0.061
wed+0.061
Neg logit contributions
 Can-0.050
 oh-0.048
 insect-0.046
 mean-0.046
 I-0.044
Token!"
Token ID2474
Feature activation+1.90e+00

Pos logit contributions
ips+0.011
aned+0.011
aled+0.010
ge+0.010
 safely+0.009
Neg logit contributions
 Can-0.009
 oh-0.009
 insect-0.009
 mean-0.008
 witch-0.008
Token
Token ID220
Feature activation+3.82e+00

Pos logit contributions
ips+0.052
aned+0.051
aled+0.048
ge+0.048
wed+0.046
Neg logit contributions
 Can-0.040
 I-0.037
 oh-0.037
 insect-0.036
 hello-0.035
Token
Token ID198
Feature activation+3.77e+00

Pos logit contributions
aled+0.129
ips+0.128
aned+0.127
ge+0.126
wed+0.126
Neg logit contributions
 I-0.049
 Can-0.045
 oh-0.041
 Please-0.039
 Are-0.038
Tokenmy
Token ID1820
Feature activation+6.95e-01

Pos logit contributions
ips+0.052
aled+0.048
aned+0.046
ulk+0.046
ags+0.045
Neg logit contributions
 mean-0.022
 insect-0.022
 Can-0.021
 oh-0.021
 curious-0.019
Token got
Token ID1392
Feature activation+1.66e+00

Pos logit contributions
ips+0.024
aled+0.022
aned+0.022
ags+0.021
ge+0.021
Neg logit contributions
 oh-0.009
 Can-0.009
 insect-0.009
 mean-0.009
 curious-0.008
Token frustrated
Token ID14718
Feature activation-5.25e-01

Pos logit contributions
ips+0.050
aned+0.048
aled+0.046
ge+0.044
wed+0.043
Neg logit contributions
 Can-0.030
 oh-0.029
 insect-0.027
 mean-0.027
 Please-0.026
Token because
Token ID780
Feature activation-2.39e+00

Pos logit contributions
 Can+0.008
 oh+0.007
 insect+0.007
 mean+0.007
 I+0.006
Neg logit contributions
ips-0.018
aned-0.017
aled-0.017
wed-0.015
ge-0.015
Token he
Token ID339
Feature activation+1.17e+00

Pos logit contributions
 Can+0.032
 insect+0.029
 oh+0.029
 hello+0.026
 Please+0.026
Neg logit contributions
ips-0.081
aned-0.076
aled-0.075
ge-0.073
 safely-0.073
Token didn
Token ID1422
Feature activation+4.04e+00

Pos logit contributions
ips+0.043
aned+0.040
aled+0.040
ge+0.038
ags+0.038
Neg logit contributions
 Can-0.014
 oh-0.013
 insect-0.012
 mean-0.012
 Please-0.011
Token't
Token ID470
Feature activation+8.52e-01

Pos logit contributions
wed+0.131
ips+0.129
aled+0.125
ws+0.125
aned+0.121
Neg logit contributions
 I-0.061
 Can-0.055
 mean-0.055
 please-0.053
 Please-0.050
Token have
Token ID423
Feature activation+8.92e-01

Pos logit contributions
ips+0.029
aned+0.028
aled+0.027
ags+0.025
ulk+0.024
Neg logit contributions
 Can-0.013
 mean-0.012
 oh-0.012
 insect-0.011
 I-0.011
Token any
Token ID597
Feature activation+1.76e+00

Pos logit contributions
ips+0.028
aned+0.028
aled+0.025
ags+0.024
ippy+0.024
Neg logit contributions
 mean-0.016
 oh-0.015
 Can-0.015
 I-0.014
 happened-0.014
Token money
Token ID1637
Feature activation+9.26e-01

Pos logit contributions
ips+0.035
aned+0.034
aled+0.032
ulk+0.027
ge+0.027
Neg logit contributions
 Can-0.050
 oh-0.049
 mean-0.048
 insect-0.047
 hello-0.046
Token.
Token ID13
Feature activation+1.12e+00

Pos logit contributions
ips+0.029
aned+0.028
aled+0.027
ulk+0.025
ags+0.025
Neg logit contributions
 Can-0.016
 oh-0.015
 mean-0.014
 insect-0.014
 I-0.014
Token it
Token ID340
Feature activation+1.27e+00

Pos logit contributions
ips+0.001
aned+0.001
aled+0.001
ge+0.001
 safely+0.001
Neg logit contributions
 Can-0.000
 oh-0.000
 insect-0.000
 Please-0.000
 witch-0.000
Token was
Token ID373
Feature activation-1.79e+00

Pos logit contributions
ips+0.038
aned+0.037
aled+0.036
ge+0.033
wed+0.033
Neg logit contributions
 Can-0.023
 oh-0.021
 insect-0.021
 Please-0.019
 mean-0.019
Token the
Token ID262
Feature activation+3.10e-01

Pos logit contributions
 Can+0.037
 insect+0.036
 oh+0.035
 pigeon+0.034
 Please+0.034
Neg logit contributions
ips-0.043
aled-0.041
 ease-0.041
 safely-0.041
 securely-0.040
Token perfect
Token ID2818
Feature activation-1.48e-01

Pos logit contributions
ips+0.007
aled+0.006
 ease+0.006
aned+0.006
 safely+0.006
Neg logit contributions
 Can-0.007
 oh-0.007
 insect-0.007
 Are-0.007
 Please-0.006
Token way
Token ID835
Feature activation+6.20e-01

Pos logit contributions
 Can+0.003
 oh+0.003
 insect+0.003
 mean+0.003
 Please+0.003
Neg logit contributions
ips-0.004
aned-0.004
aled-0.004
 safely-0.003
ge-0.003
Token to
Token ID284
Feature activation+4.08e+00

Pos logit contributions
ips+0.017
aned+0.016
aled+0.016
ge+0.015
 safely+0.015
Neg logit contributions
 Can-0.012
 oh-0.012
 insect-0.011
 Please-0.011
 mean-0.011
Token make
Token ID787
Feature activation+1.31e+00

Pos logit contributions
ips+0.124
aned+0.122
aled+0.112
ippy+0.104
wed+0.103
Neg logit contributions
 mean-0.073
 I-0.073
 Can-0.072
 insect-0.071
 oh-0.071
Token their
Token ID511
Feature activation+1.62e+00

Pos logit contributions
ips+0.037
aned+0.035
aled+0.034
ge+0.031
wed+0.031
Neg logit contributions
 Can-0.026
 oh-0.025
 insect-0.024
 mean-0.023
 Please-0.023
Token pretend
Token ID16614
Feature activation-5.35e-01

Pos logit contributions
ips+0.042
aned+0.040
aled+0.039
ge+0.036
wed+0.035
Neg logit contributions
 Can-0.036
 oh-0.034
 insect-0.033
 mean-0.032
 Please-0.031
Token marriage
Token ID4845
Feature activation+8.37e-01

Pos logit contributions
 Can+0.011
 oh+0.010
 insect+0.010
 mean+0.009
 Please+0.009
Neg logit contributions
ips-0.015
aned-0.014
aled-0.013
ge-0.013
 safely-0.013
Token even
Token ID772
Feature activation+9.23e-01

Pos logit contributions
ips+0.025
aned+0.024
aled+0.023
ge+0.021
wed+0.021
Neg logit contributions
 Can-0.016
 oh-0.015
 insect-0.014
 mean-0.014
 I-0.014
Token to
Token ID284
Feature activation+3.68e+00

Pos logit contributions
ips+0.013
aned+0.012
aled+0.012
wed+0.011
ge+0.011
Neg logit contributions
 Can-0.009
 oh-0.009
 insect-0.009
 mean-0.008
 witch-0.008
Token help
Token ID1037
Feature activation+2.19e+00

Pos logit contributions
ips+0.122
aned+0.116
aled+0.111
ushes+0.103
ulk+0.103
Neg logit contributions
 mean-0.060
 oh-0.056
 Can-0.056
 I-0.056
 insect-0.054
Token others
Token ID1854
Feature activation+5.73e-01

Pos logit contributions
ips+0.063
aned+0.060
aled+0.058
ge+0.054
wed+0.053
Neg logit contributions
 Can-0.043
 oh-0.042
 insect-0.040
 mean-0.039
 I-0.038
Token and
Token ID290
Feature activation+1.11e+00

Pos logit contributions
ips+0.019
aned+0.018
aled+0.018
ge+0.017
wed+0.017
Neg logit contributions
 Can-0.008
 oh-0.008
 insect-0.008
 mean-0.007
 I-0.007
Token to
Token ID284
Feature activation+3.91e+00

Pos logit contributions
ips+0.030
aned+0.028
aled+0.027
ge+0.026
wed+0.025
Neg logit contributions
 Can-0.023
 oh-0.022
 insect-0.021
 mean-0.020
 witch-0.020
Token never
Token ID1239
Feature activation+2.87e+00

Pos logit contributions
ips+0.123
aned+0.118
aled+0.114
wed+0.103
ulk+0.103
Neg logit contributions
 mean-0.067
 Can-0.066
 oh-0.066
 I-0.064
 insect-0.063
Token give
Token ID1577
Feature activation+2.50e+00

Pos logit contributions
ips+0.080
aned+0.076
aled+0.073
wed+0.067
ags+0.066
Neg logit contributions
 Can-0.060
 oh-0.056
 mean-0.056
 insect-0.055
 I-0.054
Token up
Token ID510
Feature activation+1.93e-01

Pos logit contributions
ips+0.052
aned+0.048
aled+0.047
ge+0.044
 safely+0.042
Neg logit contributions
 Can-0.067
 oh-0.065
 insect-0.064
 witch-0.060
 mean-0.060
Token on
Token ID319
Feature activation+4.12e-01

Pos logit contributions
ips+0.006
aned+0.005
aled+0.005
ge+0.005
wed+0.005
Neg logit contributions
 Can-0.004
 oh-0.003
 insect-0.003
 mean-0.003
 witch-0.003
Token your
Token ID534
Feature activation+3.39e+00

Pos logit contributions
ips+0.012
aned+0.011
aled+0.011
ge+0.010
wed+0.010
Neg logit contributions
 Can-0.008
 oh-0.008
 insect-0.007
 mean-0.007
 Please-0.007
Token dreams
Token ID10625
Feature activation+2.47e+00

Pos logit contributions
ips+0.087
aned+0.083
aled+0.080
ge+0.074
wed+0.073
Neg logit contributions
 Can-0.078
 oh-0.074
 insect-0.073
 mean-0.072
 Please-0.069
Token found
Token ID1043
Feature activation+9.69e-01

Pos logit contributions
ips+0.013
aned+0.012
aled+0.012
ge+0.011
wed+0.011
Neg logit contributions
 Can-0.011
 oh-0.010
 insect-0.010
 mean-0.010
 Please-0.009
Token some
Token ID617
Feature activation+8.31e-01

Pos logit contributions
ips+0.027
aned+0.026
aled+0.025
ge+0.024
wed+0.023
Neg logit contributions
 Can-0.019
 oh-0.018
 insect-0.018
 mean-0.017
 Please-0.017
Token in
Token ID287
Feature activation+1.17e+00

Pos logit contributions
ips+0.019
aned+0.018
aled+0.017
ge+0.016
wed+0.016
Neg logit contributions
 Can-0.021
 oh-0.020
 insect-0.020
 Please-0.019
 mean-0.019
Token the
Token ID262
Feature activation+2.78e+00

Pos logit contributions
ips+0.038
aned+0.038
aled+0.036
wed+0.034
ags+0.033
Neg logit contributions
 Can-0.019
 oh-0.018
 insect-0.018
 mean-0.017
 I-0.016
Token river
Token ID7850
Feature activation+1.37e+00

Pos logit contributions
ips+0.067
aned+0.064
aled+0.062
ge+0.056
wed+0.056
Neg logit contributions
 Can-0.068
 oh-0.065
 insect-0.064
 mean-0.062
 I-0.060
Token nearby
Token ID6716
Feature activation+1.12e+00

Pos logit contributions
ips+0.042
aned+0.040
aled+0.039
ags+0.036
wed+0.036
Neg logit contributions
 Can-0.024
 oh-0.023
 insect-0.023
 mean-0.022
 I-0.021
Token.
Token ID13
Feature activation+2.25e+00

Pos logit contributions
ips+0.027
aned+0.025
aled+0.025
ge+0.023
wed+0.022
Neg logit contributions
 Can-0.027
 oh-0.026
 insect-0.025
 mean-0.024
 Please-0.024
Token He
Token ID679
Feature activation+1.32e+00

Pos logit contributions
ips+0.068
aned+0.067
ge+0.063
aled+0.062
wed+0.061
Neg logit contributions
 Can-0.042
 I-0.036
 Please-0.036
 insect-0.035
 oh-0.035
Token put
Token ID1234
Feature activation+3.27e+00

Pos logit contributions
ips+0.042
aned+0.039
aled+0.038
ags+0.037
ulk+0.036
Neg logit contributions
 oh-0.022
 Can-0.022
 insect-0.021
 mean-0.020
 curious-0.020
Token them
Token ID606
Feature activation+3.71e+00

Pos logit contributions
ips+0.084
aned+0.082
aled+0.076
ge+0.073
ulk+0.072
Neg logit contributions
 Can-0.071
 oh-0.071
 insect-0.067
 mean-0.066
 hello-0.064
Token in
Token ID287
Feature activation+1.56e+00

Pos logit contributions
ips+0.097
aned+0.097
aled+0.089
ulk+0.088
ge+0.081
Neg logit contributions
 oh-0.076
 mean-0.072
 curious-0.072
 Can-0.071
 hello-0.071
Token that
Token ID326
Feature activation+1.06e+00

Pos logit contributions
 Can+0.006
 oh+0.005
 Please+0.005
 Are+0.005
 insect+0.005
Neg logit contributions
ips-0.009
aled-0.008
aned-0.008
 safely-0.008
ags-0.008
Token day
Token ID1110
Feature activation+1.64e+00

Pos logit contributions
ips+0.029
aned+0.027
aled+0.027
ge+0.025
 ease+0.025
Neg logit contributions
 Can-0.021
 oh-0.020
 insect-0.020
 Please-0.019
 hello-0.018
Token on
Token ID319
Feature activation+3.70e-01

Pos logit contributions
ips+0.042
aned+0.042
aled+0.040
ge+0.038
wed+0.037
Neg logit contributions
 Can-0.035
 oh-0.034
 mean-0.032
 insect-0.031
 I-0.031
Token,
Token ID11
Feature activation-1.79e+00

Pos logit contributions
ips+0.011
aned+0.010
 safely+0.010
 off+0.010
aled+0.010
Neg logit contributions
 Can-0.006
 insect-0.006
 oh-0.006
 witch-0.006
 Please-0.006
Token Tim
Token ID5045
Feature activation+1.44e+00

Pos logit contributions
 Can+0.028
 insect+0.027
 oh+0.025
 Please+0.025
 witch+0.025
Neg logit contributions
ips-0.056
aled-0.052
aned-0.052
 ease-0.052
 safely-0.051
Token promised
Token ID8072
Feature activation+2.81e-01

Pos logit contributions
ips+0.049
aned+0.047
aled+0.046
ge+0.045
ags+0.045
Neg logit contributions
 Can-0.019
 mean-0.018
 insect-0.018
 oh-0.017
 witch-0.017
Token to
Token ID284
Feature activation+4.47e+00

Pos logit contributions
ips+0.010
aned+0.009
aled+0.009
wed+0.009
ws+0.009
Neg logit contributions
 oh-0.004
 Can-0.004
 insect-0.004
 I-0.004
 mean-0.003
Token be
Token ID307
Feature activation+1.54e+00

Pos logit contributions
ips+0.162
aned+0.157
aled+0.149
ge+0.142
ushes+0.138
Neg logit contributions
 I-0.059
 please-0.056
 mean-0.056
 Can-0.054
 insect-0.051
Token more
Token ID517
Feature activation+1.46e+00

Pos logit contributions
ips+0.039
aled+0.036
aned+0.036
 safely+0.033
wed+0.033
Neg logit contributions
 Can-0.035
 insect-0.032
 oh-0.032
 Please-0.030
 witch-0.030
Token careful
Token ID8161
Feature activation+8.77e-02

Pos logit contributions
ips+0.038
aned+0.036
aled+0.035
 safely+0.033
ge+0.033
Neg logit contributions
 Can-0.032
 oh-0.029
 insect-0.029
 I-0.027
 Please-0.027
Token when
Token ID618
Feature activation-2.55e+00

Pos logit contributions
ips+0.002
aned+0.002
aled+0.002
 safely+0.002
ge+0.002
Neg logit contributions
 Can-0.002
 insect-0.002
 oh-0.002
 witch-0.002
 hello-0.002
Token did
Token ID750
Feature activation-4.62e+00

Pos logit contributions
 Can+0.027
 oh+0.026
 insect+0.024
 goat+0.024
 Please+0.023
Neg logit contributions
ips-0.032
wed-0.027
aled-0.027
ws-0.026
aned-0.026
Token you
Token ID345
Feature activation+3.48e-01

Pos logit contributions
 Can+0.097
 insect+0.091
 writer+0.088
 chick+0.087
 witch+0.086
Neg logit contributions
ips-0.100
 off-0.094
 safely-0.092
ulk-0.090
 instructions-0.088
Token do
Token ID466
Feature activation-1.99e+00

Pos logit contributions
ips+0.011
aned+0.011
aled+0.011
wed+0.010
ge+0.010
Neg logit contributions
 Can-0.005
 oh-0.005
 insect-0.005
 Please-0.004
 witch-0.004
Token?"
Token ID1701
Feature activation-6.41e-01

Pos logit contributions
 Can+0.032
 insect+0.032
 hello+0.029
 goat+0.029
 pigeon+0.029
Neg logit contributions
ips-0.059
aned-0.053
 safely-0.053
ulk-0.051
aled-0.050
Token Mom
Token ID11254
Feature activation+1.12e+00

Pos logit contributions
 Can+0.015
 goat+0.015
 pigeon+0.015
 photographer+0.015
 whale+0.014
Neg logit contributions
ips-0.012
ulk-0.011
 safely-0.011
 off-0.011
 ease-0.011
Token says
Token ID1139
Feature activation+1.20e+00

Pos logit contributions
ips+0.031
aned+0.030
aled+0.029
ulk+0.026
ge+0.026
Neg logit contributions
 oh-0.023
 Can-0.023
 insect-0.021
 pigeon-0.020
 witch-0.019
Token.
Token ID13
Feature activation-1.01e+00

Pos logit contributions
ips+0.033
aned+0.031
aled+0.030
ge+0.028
 safely+0.027
Neg logit contributions
 Can-0.025
 oh-0.024
 insect-0.024
 mean-0.022
 witch-0.022
Token "
Token ID366
Feature activation-8.21e-01

Pos logit contributions
 oh+0.017
 insect+0.016
 Can+0.015
 pigeon+0.015
 guy+0.015
Neg logit contributions
ips-0.030
aned-0.028
aled-0.026
 safely-0.025
ulk-0.025
TokenYou
Token ID1639
Feature activation+8.37e-01

Pos logit contributions
 Can+0.015
 oh+0.014
 insect+0.014
 mean+0.013
 I+0.013
Neg logit contributions
ips-0.025
aned-0.024
aled-0.023
wed-0.022
ge-0.022
Token broke
Token ID6265
Feature activation+1.69e+00

Pos logit contributions
ips+0.024
aned+0.023
aled+0.022
ge+0.021
wed+0.021
Neg logit contributions
 Can-0.016
 oh-0.015
 insect-0.015
 mean-0.014
 I-0.014
Token the
Token ID262
Feature activation+2.93e+00

Pos logit contributions
ips+0.053
aned+0.051
aled+0.050
wed+0.047
ge+0.046
Neg logit contributions
 Can-0.029
 oh-0.028
 insect-0.027
 mean-0.026
 I-0.026
Token.
Token ID13
Feature activation-4.23e-01

Pos logit contributions
 Can+0.049
 oh+0.045
 insect+0.045
 Please+0.043
 witch+0.042
Neg logit contributions
ips-0.057
aned-0.055
aled-0.052
ge-0.051
 safely-0.051
Token Where
Token ID6350
Feature activation+5.82e-02

Pos logit contributions
 Can+0.008
 oh+0.008
 insect+0.007
 mean+0.007
 witch+0.007
Neg logit contributions
ips-0.012
aned-0.012
aled-0.012
 safely-0.011
 securely-0.010
Token had
Token ID550
Feature activation-2.24e+00

Pos logit contributions
ips+0.002
aned+0.001
aled+0.001
 safely+0.001
 securely+0.001
Neg logit contributions
 Can-0.001
 insect-0.001
 goat-0.001
 oh-0.001
 pigeon-0.001
Token the
Token ID262
Feature activation+2.43e+00

Pos logit contributions
 Can+0.035
 insect+0.032
 oh+0.031
 mean+0.028
 witch+0.028
Neg logit contributions
ips-0.072
aled-0.069
aned-0.068
 ease-0.064
wed-0.064
Token creature
Token ID7185
Feature activation-7.94e-01

Pos logit contributions
ips+0.062
aned+0.061
aled+0.058
 safely+0.057
 securely+0.057
Neg logit contributions
 Can-0.049
 oh-0.045
 insect-0.041
 Please-0.041
 I-0.041
Token gone
Token ID3750
Feature activation+1.25e+00

Pos logit contributions
 Can+0.017
 oh+0.016
 insect+0.015
 Please+0.014
 hello+0.014
Neg logit contributions
ips-0.020
aned-0.019
aled-0.019
 safely-0.019
ge-0.018
Token?
Token ID30
Feature activation+4.32e-01

Pos logit contributions
ips+0.045
aned+0.043
aled+0.042
ge+0.040
 safely+0.040
Neg logit contributions
 Can-0.015
 oh-0.014
 insect-0.013
 mean-0.012
 Please-0.012
Token He
Token ID679
Feature activation+1.08e-01

Pos logit contributions
ips+0.012
aled+0.012
aned+0.011
 safely+0.011
 securely+0.011
Neg logit contributions
 insect-0.008
 oh-0.008
 mean-0.007
 Can-0.007
 witch-0.007
Token had
Token ID550
Feature activation+4.19e-01

Pos logit contributions
aned+0.003
ips+0.003
aled+0.003
 safely+0.003
wed+0.003
Neg logit contributions
 Can-0.002
 oh-0.002
 insect-0.002
 hello-0.002
 Please-0.002
Token to
Token ID284
Feature activation+2.76e+00

Pos logit contributions
ips+0.013
aned+0.012
aled+0.012
ags+0.011
ge+0.011
Neg logit contributions
 Can-0.007
 oh-0.007
 mean-0.007
 insect-0.007
 happened-0.007
Token find
Token ID1064
Feature activation+4.96e-01

Pos logit contributions
ips+0.073
aned+0.068
aled+0.065
ge+0.061
wed+0.060
Neg logit contributions
 Can-0.060
 oh-0.059
 insect-0.058
 mean-0.056
 I-0.055
Token to
Token ID284
Feature activation+2.77e+00

Pos logit contributions
 Can+0.021
 oh+0.020
 insect+0.019
 mean+0.018
 I+0.018
Neg logit contributions
ips-0.034
aned-0.032
aled-0.031
ge-0.029
wed-0.029
Token go
Token ID467
Feature activation+2.09e+00

Pos logit contributions
ips+0.087
aned+0.086
aled+0.078
wed+0.077
ge+0.076
Neg logit contributions
 Can-0.048
 I-0.047
 oh-0.045
 mean-0.045
 please-0.044
Token on
Token ID319
Feature activation+3.36e-02

Pos logit contributions
aned+0.056
ips+0.056
aled+0.053
wed+0.051
ws+0.049
Neg logit contributions
 Can-0.044
 oh-0.041
 I-0.041
 mean-0.041
 insect-0.039
Token the
Token ID262
Feature activation+1.03e+00

Pos logit contributions
aned+0.001
ips+0.001
aled+0.001
wed+0.001
ws+0.001
Neg logit contributions
 Can-0.000
 oh-0.000
 insect-0.000
 I-0.000
 mean-0.000
Token slide
Token ID10649
Feature activation+6.71e-01

Pos logit contributions
ips+0.029
aned+0.028
aled+0.027
wed+0.026
ge+0.025
Neg logit contributions
 Can-0.021
 mean-0.021
 I-0.021
 oh-0.021
 insect-0.020
Token,
Token ID11
Feature activation-9.63e-01

Pos logit contributions
ips+0.021
aned+0.021
aled+0.020
wed+0.019
ws+0.019
Neg logit contributions
 Can-0.011
 oh-0.010
 insect-0.010
 I-0.010
 mean-0.010
Token but
Token ID475
Feature activation-2.13e+00

Pos logit contributions
 Can+0.012
 insect+0.012
 oh+0.011
 witch+0.011
 writer+0.010
Neg logit contributions
ips-0.032
aned-0.030
aled-0.030
 safely-0.029
ge-0.028
Token Ben
Token ID3932
Feature activation-1.31e+00

Pos logit contributions
 Can+0.032
 insect+0.029
 witch+0.029
 oh+0.028
 goat+0.027
Neg logit contributions
ips-0.070
aned-0.063
aled-0.063
 securely-0.060
 safely-0.059
Token is
Token ID318
Feature activation-1.30e+00

Pos logit contributions
 Can+0.023
 oh+0.023
 insect+0.022
 witch+0.020
 Please+0.019
Neg logit contributions
ips-0.039
aned-0.037
aled-0.036
wed-0.034
ge-0.033
Token scared
Token ID12008
Feature activation-2.46e+00

Pos logit contributions
 Can+0.019
 oh+0.017
 insect+0.017
 witch+0.015
 Please+0.015
Neg logit contributions
ips-0.044
aned-0.042
aled-0.041
ge-0.039
wed-0.038
Token.
Token ID13
Feature activation-2.80e-01

Pos logit contributions
 oh+0.063
 Can+0.062
 insect+0.062
 witch+0.060
 pigeon+0.059
Neg logit contributions
ips-0.053
aned-0.045
 off-0.045
aled-0.043
 safely-0.043
Token€
Token ID26391
Feature activation-1.05e+00

Pos logit contributions
 Can+0.039
 oh+0.037
 insect+0.036
 mean+0.035
 Please+0.035
Neg logit contributions
ips-0.041
aned-0.038
aled-0.037
ge-0.034
wed-0.034
Token<|endoftext|>
Token ID50256
Feature activation-1.81e+00

Pos logit contributions
 Can+0.022
 oh+0.022
 insect+0.022
 mean+0.020
 witch+0.020
Neg logit contributions
ips-0.028
aned-0.025
aled-0.025
ge-0.023
ulk-0.023
TokenOnce
Token ID7454
Feature activation-2.53e-01

Pos logit contributions
 Can+0.042
 oh+0.041
 insect+0.040
 mean+0.038
 Please+0.037
Neg logit contributions
ips-0.044
aned-0.042
aled-0.041
ge-0.037
wed-0.037
Token upon
Token ID2402
Feature activation-3.74e+00

Pos logit contributions
 Can+0.006
 oh+0.006
 insect+0.005
 Please+0.005
 mean+0.005
Neg logit contributions
ips-0.006
aned-0.006
aled-0.006
ge-0.005
wed-0.005
Token a
Token ID257
Feature activation-4.43e+00

Pos logit contributions
 Can+0.080
 insect+0.073
 goat+0.070
 Please+0.069
 oh+0.069
Neg logit contributions
ips-0.089
 safely-0.086
aled-0.085
 ease-0.085
aned-0.082
Token time
Token ID640
Feature activation-3.58e+00

Pos logit contributions
 Can+0.120
 oh+0.116
 insect+0.111
 alien+0.107
 witch+0.106
Neg logit contributions
ips-0.087
aned-0.078
ippy-0.076
wed-0.073
aled-0.073
Token,
Token ID11
Feature activation-4.43e+00

Pos logit contributions
 goat+0.078
 insect+0.077
 Can+0.077
 witch+0.076
 pigeon+0.076
Neg logit contributions
ips-0.088
 safely-0.075
ippy-0.074
ge-0.074
aned-0.072
Token there
Token ID612
Feature activation-9.55e-01

Pos logit contributions
 witch+0.094
 photographer+0.091
 writer+0.091
 goat+0.091
 pigeon+0.090
Neg logit contributions
ips-0.103
 safely-0.100
aled-0.092
 securely-0.092
 off-0.092
Token were
Token ID547
Feature activation-3.76e+00

Pos logit contributions
 Can+0.021
 insect+0.019
 oh+0.019
 Please+0.019
 witch+0.019
Neg logit contributions
ips-0.024
aned-0.023
aled-0.022
 ease-0.021
 safely-0.021
Token two
Token ID734
Feature activation-4.58e+00

Pos logit contributions
 Can+0.104
 oh+0.101
 insect+0.097
 witch+0.096
 Please+0.094
Neg logit contributions
ips-0.076
aled-0.066
aned-0.064
 ease-0.060
ags-0.058
Token friends
Token ID2460
Feature activation-3.87e+00

Pos logit contributions
 Can+0.129
 oh+0.118
 Please+0.116
 Are+0.111
 I+0.110
Neg logit contributions
ips-0.097
aned-0.088
aled-0.080
 ease-0.078
ippy-0.078
Token dig
Token ID3100
Feature activation+4.65e-01

Pos logit contributions
ips+0.046
aled+0.043
aned+0.043
 ease+0.039
ge+0.039
Neg logit contributions
 Can-0.046
 oh-0.043
 insect-0.041
 Please-0.041
 witch-0.040
Token in
Token ID287
Feature activation-4.38e-01

Pos logit contributions
ips+0.011
aled+0.010
aned+0.010
wed+0.010
 safely+0.010
Neg logit contributions
 Can-0.011
 oh-0.011
 Please-0.010
 insect-0.010
 witch-0.010
Token the
Token ID262
Feature activation+1.70e-02

Pos logit contributions
 Can+0.008
 insect+0.007
 oh+0.007
 mean+0.007
 I+0.007
Neg logit contributions
aned-0.013
ips-0.013
aled-0.013
wed-0.012
ge-0.011
Token sand
Token ID6450
Feature activation+7.93e-01

Pos logit contributions
ips+0.000
aled+0.000
aned+0.000
ags+0.000
 ease+0.000
Neg logit contributions
 Can-0.000
 oh-0.000
 insect-0.000
 Please-0.000
 hello-0.000
Token and
Token ID290
Feature activation-1.25e-01

Pos logit contributions
ips+0.025
aned+0.025
aled+0.024
ge+0.022
ulk+0.022
Neg logit contributions
 Can-0.012
 insect-0.012
 oh-0.012
 mean-0.011
 I-0.011
Token find
Token ID1064
Feature activation-9.88e-02

Pos logit contributions
 Can+0.003
 oh+0.003
 insect+0.003
 Please+0.003
 witch+0.003
Neg logit contributions
ips-0.003
aled-0.003
aned-0.003
 ease-0.003
 safely-0.002
Token things
Token ID1243
Feature activation+3.33e-01

Pos logit contributions
 Can+0.003
 Please+0.002
 oh+0.002
 insect+0.002
 I+0.002
Neg logit contributions
ips-0.002
aled-0.002
aned-0.002
wed-0.002
ags-0.002
Token.
Token ID13
Feature activation+8.02e-02

Pos logit contributions
ips+0.009
aned+0.008
aled+0.008
ge+0.008
 safely+0.008
Neg logit contributions
 Can-0.007
 oh-0.007
 insect-0.007
 witch-0.006
 Please-0.006
Token Tom
Token ID4186
Feature activation-2.20e+00

Pos logit contributions
ips+0.002
aned+0.002
aled+0.002
ge+0.002
wed+0.002
Neg logit contributions
 Can-0.002
 oh-0.001
 insect-0.001
 I-0.001
 mean-0.001
Token found
Token ID1043
Feature activation-6.91e-01

Pos logit contributions
 Can+0.044
 oh+0.041
 insect+0.039
 hello+0.039
 Please+0.039
Neg logit contributions
ips-0.059
aned-0.058
wed-0.056
aled-0.055
ge-0.054
Token a
Token ID257
Feature activation-2.82e+00

Pos logit contributions
 Can+0.015
 Please+0.013
 insect+0.012
 oh+0.012
 I+0.012
Neg logit contributions
ips-0.019
aled-0.018
 ease-0.017
 safely-0.017
aned-0.017
Token shy
Token ID15800
Feature activation-1.66e+00

Pos logit contributions
aned+0.008
aled+0.008
ips+0.007
 safely+0.007
wed+0.007
Neg logit contributions
 insect-0.006
 Can-0.006
 oh-0.005
 goat-0.005
 witch-0.005
Token.
Token ID13
Feature activation-3.74e-01

Pos logit contributions
 oh+0.038
 Can+0.037
 insect+0.037
 witch+0.034
 crow+0.034
Neg logit contributions
ips-0.040
aned-0.037
aled-0.036
 safely-0.034
ge-0.033
Token He
Token ID679
Feature activation+9.26e-01

Pos logit contributions
 Can+0.006
 oh+0.005
 insect+0.005
 mean+0.005
 I+0.005
Neg logit contributions
ips-0.012
aned-0.012
aled-0.011
ge-0.011
wed-0.011
Token nodded
Token ID14464
Feature activation+5.74e-01

Pos logit contributions
aned+0.026
ips+0.026
aled+0.026
wed+0.024
 safely+0.024
Neg logit contributions
 Can-0.017
 oh-0.016
 insect-0.015
 I-0.014
 Please-0.013
Token and
Token ID290
Feature activation+2.05e+00

Pos logit contributions
ips+0.014
aned+0.014
aled+0.013
 safely+0.013
ge+0.012
Neg logit contributions
 Can-0.012
 insect-0.012
 oh-0.012
 pigeon-0.011
 witch-0.011
Token said
Token ID531
Feature activation-7.74e-01

Pos logit contributions
ips+0.059
aned+0.057
aled+0.057
wed+0.053
 safely+0.053
Neg logit contributions
 Can-0.038
 insect-0.035
 oh-0.034
 Please-0.032
 mean-0.031
Token yes
Token ID3763
Feature activation+2.67e-01

Pos logit contributions
 insect+0.012
 Can+0.011
 goat+0.011
 oh+0.011
 witch+0.010
Neg logit contributions
ips-0.023
aled-0.022
 safely-0.022
aned-0.022
 off-0.021
Token.
Token ID13
Feature activation+4.62e-01

Pos logit contributions
ips+0.008
aned+0.007
aled+0.007
ge+0.007
wed+0.007
Neg logit contributions
 Can-0.005
 oh-0.005
 insect-0.005
 mean-0.005
 I-0.004
Token He
Token ID679
Feature activation+1.37e+00

Pos logit contributions
ips+0.016
aned+0.015
aled+0.015
ge+0.014
wed+0.014
Neg logit contributions
 Can-0.007
 I-0.006
 oh-0.006
 Please-0.006
 Are-0.006
Token said
Token ID531
Feature activation-8.37e-01

Pos logit contributions
ips+0.044
aned+0.042
aled+0.041
ge+0.038
wed+0.038
Neg logit contributions
 Can-0.022
 oh-0.021
 insect-0.020
 mean-0.019
 I-0.019
Token that
Token ID326
Feature activation-1.36e+00

Pos logit contributions
 insect+0.014
 Can+0.013
 oh+0.012
 goat+0.012
 witch+0.011
Neg logit contributions
ips-0.024
aled-0.023
 off-0.023
 safely-0.023
aned-0.022
Token clumsy
Token ID39120
Feature activation-5.83e+00

Pos logit contributions
 Can+0.123
 oh+0.121
 happened+0.119
 unlike+0.112
 Please+0.111
Neg logit contributions
 safely-0.104
ippy-0.101
ips-0.098
 finish-0.095
 securely-0.093
Token girl
Token ID2576
Feature activation-5.09e+00

Pos logit contributions
 Can+0.096
 oh+0.094
 Do+0.085
 Please+0.084
 Are+0.083
Neg logit contributions
ips-0.154
ippy-0.142
aned-0.141
aled-0.140
 safely-0.138
Token named
Token ID3706
Feature activation-3.03e+00

Pos logit contributions
 oh+0.091
 Can+0.090
 guy+0.082
 hello+0.082
 dear+0.080
Neg logit contributions
ips-0.134
wed-0.122
aled-0.119
 safely-0.116
eds-0.115
Token Lily
Token ID20037
Feature activation-3.98e+00

Pos logit contributions
 Can+0.047
 oh+0.044
 insect+0.043
 mean+0.040
 Please+0.038
Neg logit contributions
ips-0.097
aned-0.092
aled-0.091
wed-0.085
ge-0.083
Token.
Token ID13
Feature activation-8.21e-01

Pos logit contributions
 Can+0.099
 oh+0.096
 insect+0.089
 hello+0.088
 Please+0.088
Neg logit contributions
ips-0.087
aned-0.081
aled-0.081
ge-0.079
wed-0.079
Token She
Token ID1375
Feature activation-2.78e+00

Pos logit contributions
 Can+0.013
 oh+0.013
 insect+0.013
 mean+0.012
 witch+0.011
Neg logit contributions
ips-0.026
aned-0.025
aled-0.024
wed-0.022
ge-0.022
Token always
Token ID1464
Feature activation-8.76e-01

Pos logit contributions
 Can+0.051
 insect+0.048
 oh+0.048
 hello+0.045
 Please+0.042
Neg logit contributions
aned-0.078
ips-0.077
aled-0.075
wed-0.072
 securely-0.068
Token dropped
Token ID5710
Feature activation-9.45e-02

Pos logit contributions
 Can+0.013
 oh+0.012
 insect+0.012
 mean+0.011
 Please+0.011
Neg logit contributions
ips-0.029
aned-0.028
aled-0.027
wed-0.025
ge-0.025
Token things
Token ID1243
Feature activation+5.81e-01

Pos logit contributions
 Can+0.002
 oh+0.002
 insect+0.002
 mean+0.002
 I+0.002
Neg logit contributions
ips-0.003
aned-0.003
aled-0.003
ge-0.002
wed-0.002
Token and
Token ID290
Feature activation-6.96e-01

Pos logit contributions
ips+0.017
aned+0.017
aled+0.016
ulk+0.015
ge+0.015
Neg logit contributions
 Can-0.011
 oh-0.010
 mean-0.010
 insect-0.010
 I-0.010
Token tri
Token ID1333
Feature activation+9.77e-01

Pos logit contributions
 Can+0.014
 oh+0.013
 insect+0.012
 mean+0.012
 Please+0.012
Neg logit contributions
ips-0.020
aned-0.019
aled-0.019
wed-0.017
ge-0.017
Token new
Token ID649
Feature activation-1.18e+00

Pos logit contributions
 Can+0.036
 oh+0.034
 Please+0.031
 I+0.031
 insect+0.031
Neg logit contributions
ips-0.037
 safely-0.035
aned-0.035
ge-0.035
 securely-0.035
Token friend
Token ID1545
Feature activation-2.03e+00

Pos logit contributions
 Can+0.026
 oh+0.024
 Please+0.023
 mean+0.023
 Are+0.022
Neg logit contributions
ips-0.028
 safely-0.026
aned-0.025
ge-0.025
 securely-0.025
Token.
Token ID13
Feature activation-3.68e-01

Pos logit contributions
 Can+0.041
 oh+0.040
 insect+0.038
 mean+0.036
 witch+0.036
Neg logit contributions
ips-0.055
aned-0.051
aled-0.050
ge-0.048
 safely-0.047
Token<|endoftext|>
Token ID50256
Feature activation-4.35e-01

Pos logit contributions
 oh+0.008
 Can+0.008
 insect+0.008
 mean+0.007
 witch+0.007
Neg logit contributions
ips-0.010
aled-0.009
aned-0.009
 safely-0.008
ge-0.008
TokenOnce
Token ID7454
Feature activation-4.92e-01

Pos logit contributions
 Can+0.007
 I+0.006
 insect+0.006
 oh+0.006
 Please+0.006
Neg logit contributions
aned-0.014
ips-0.013
aled-0.013
wed-0.013
ge-0.013
Token upon
Token ID2402
Feature activation-3.82e+00

Pos logit contributions
 Can+0.011
 oh+0.010
 insect+0.010
 mean+0.009
 Please+0.009
Neg logit contributions
ips-0.013
aned-0.012
aled-0.012
ge-0.011
wed-0.011
Token a
Token ID257
Feature activation-3.55e+00

Pos logit contributions
 Can+0.075
 insect+0.067
 oh+0.067
 Please+0.064
 goat+0.063
Neg logit contributions
ips-0.101
aled-0.096
 safely-0.094
 ease-0.093
aned-0.093
Token time
Token ID640
Feature activation-3.06e+00

Pos logit contributions
 Can+0.085
 oh+0.080
 insect+0.074
 hello+0.073
 Please+0.073
Neg logit contributions
ips-0.087
aned-0.080
aled-0.077
wed-0.075
ippy-0.075
Token,
Token ID11
Feature activation-3.73e+00

Pos logit contributions
 Can+0.055
 insect+0.053
 witch+0.051
 oh+0.050
 goat+0.050
Neg logit contributions
ips-0.092
aned-0.081
ippy-0.079
 safely-0.079
ge-0.078
Token there
Token ID612
Feature activation-6.79e-01

Pos logit contributions
 Can+0.056
 oh+0.050
 insect+0.050
 witch+0.046
 Please+0.045
Neg logit contributions
ips-0.124
aned-0.117
aled-0.114
ge-0.110
 safely-0.108
Token was
Token ID373
Feature activation-4.28e+00

Pos logit contributions
 Can+0.013
 oh+0.012
 insect+0.012
 Please+0.011
 mean+0.011
Neg logit contributions
ips-0.020
aned-0.019
aled-0.018
ge-0.017
wed-0.017
Token cheeks
Token ID25839
Feature activation+1.99e+00

Pos logit contributions
ips+0.047
aned+0.045
aled+0.043
ge+0.040
wed+0.039
Neg logit contributions
 Can-0.045
 oh-0.043
 insect-0.042
 mean-0.040
 I-0.040
Token.
Token ID13
Feature activation+1.28e-01

Pos logit contributions
ips+0.068
aled+0.064
aned+0.063
ags+0.060
ulk+0.060
Neg logit contributions
 mean-0.027
 Can-0.027
 oh-0.025
 insect-0.025
 I-0.023
Token
Token ID198
Feature activation+2.43e+00

Pos logit contributions
ips+0.004
aned+0.004
aled+0.003
 safely+0.003
 securely+0.003
Neg logit contributions
 insect-0.002
 Can-0.002
 oh-0.002
 mean-0.002
 witch-0.002
Token
Token ID198
Feature activation+5.73e-01

Pos logit contributions
ips+0.083
aled+0.078
aned+0.078
wed+0.075
ps+0.073
Neg logit contributions
 Can-0.037
 I-0.032
 oh-0.032
 Please-0.030
 hello-0.030
Token"
Token ID1
Feature activation-1.89e+00

Pos logit contributions
ips+0.017
aned+0.016
aled+0.016
ge+0.015
wed+0.014
Neg logit contributions
 Can-0.011
 oh-0.010
 insect-0.010
 mean-0.010
 Please-0.009
TokenBut
Token ID1537
Feature activation-4.25e+00

Pos logit contributions
 Can+0.047
 insect+0.047
 witch+0.045
 hello+0.045
 dear+0.045
Neg logit contributions
ips-0.038
aned-0.036
aled-0.034
 ease-0.032
ge-0.032
Token why
Token ID1521
Feature activation-1.44e+00

Pos logit contributions
 insect+0.074
 pigeon+0.073
 photographer+0.070
 witch+0.069
 Can+0.067
Neg logit contributions
 safely-0.106
 off-0.105
 securely-0.098
ips-0.096
 destination-0.093
Token?"
Token ID1701
Feature activation-1.49e+00

Pos logit contributions
 Can+0.024
 insect+0.023
 oh+0.022
 witch+0.020
 pigeon+0.020
Neg logit contributions
ips-0.043
aned-0.042
aled-0.039
ge-0.038
 safely-0.038
Token he
Token ID339
Feature activation-5.49e-01

Pos logit contributions
 insect+0.037
 Can+0.033
 witch+0.033
 worm+0.033
 pigeon+0.033
Neg logit contributions
aned-0.031
 safely-0.028
aled-0.028
 off-0.028
 securely-0.026
Token asked
Token ID1965
Feature activation-2.80e-01

Pos logit contributions
 Can+0.012
 oh+0.011
 insect+0.011
 hello+0.010
 witch+0.010
Neg logit contributions
aned-0.014
ips-0.013
aled-0.012
ge-0.012
 securely-0.012
Token in
Token ID287
Feature activation-3.53e-01

Pos logit contributions
 Can+0.007
 insect+0.006
 oh+0.006
 witch+0.006
 Please+0.006
Neg logit contributions
ips-0.007
aned-0.006
aled-0.006
ge-0.006
 safely-0.006
Token.
Token ID13
Feature activation+9.43e-02

Pos logit contributions
 Can+0.005
 insect+0.005
 oh+0.005
 witch+0.005
 Please+0.004
Neg logit contributions
ips-0.009
aned-0.008
aled-0.008
ge-0.008
 safely-0.008
Token He
Token ID679
Feature activation-1.10e+00

Pos logit contributions
ips+0.003
aned+0.003
aled+0.003
wed+0.003
ge+0.003
Neg logit contributions
 Can-0.001
 oh-0.001
 I-0.001
 Please-0.001
 insect-0.001
Token had
Token ID550
Feature activation-1.52e+00

Pos logit contributions
 Can+0.019
 oh+0.018
 insect+0.018
 mean+0.016
 hello+0.016
Neg logit contributions
ips-0.033
aned-0.032
aled-0.031
wed-0.029
ge-0.029
Token an
Token ID281
Feature activation-7.29e-01

Pos logit contributions
 Can+0.023
 oh+0.022
 mean+0.021
 insect+0.020
 I+0.019
Neg logit contributions
ips-0.051
aned-0.049
aled-0.048
ge-0.045
ags-0.045
Token idea
Token ID2126
Feature activation-1.78e+00

Pos logit contributions
 Can+0.014
 oh+0.013
 Please+0.012
 insect+0.012
 I+0.012
Neg logit contributions
ips-0.021
aned-0.020
aled-0.020
ags-0.018
wed-0.018
Token -
Token ID532
Feature activation-3.89e+00

Pos logit contributions
 Can+0.038
 insect+0.035
 oh+0.035
 Please+0.034
 mean+0.034
Neg logit contributions
ips-0.045
aned-0.043
aled-0.041
ge-0.041
 safely-0.040
Token he
Token ID339
Feature activation-2.28e+00

Pos logit contributions
 Can+0.064
 insect+0.064
 pigeon+0.062
 happened+0.062
 witch+0.062
Neg logit contributions
 safely-0.098
 off-0.096
 securely-0.093
aled-0.092
 successfully-0.089
Token would
Token ID561
Feature activation+1.49e+00

Pos logit contributions
 oh+0.039
 Can+0.037
 insect+0.037
 hello+0.035
 pigeon+0.033
Neg logit contributions
ips-0.063
aled-0.063
aned-0.063
wed-0.061
 safely-0.059
Token paint
Token ID7521
Feature activation-8.68e-02

Pos logit contributions
ips+0.036
aned+0.034
aled+0.034
ge+0.031
wed+0.030
Neg logit contributions
 Can-0.035
 oh-0.034
 insect-0.033
 mean-0.031
 Please-0.031
Token!
Token ID0
Feature activation+7.93e-01

Pos logit contributions
 Can+0.002
 oh+0.002
 insect+0.002
 Please+0.001
 witch+0.001
Neg logit contributions
ips-0.002
aled-0.002
aned-0.002
 safely-0.002
 ease-0.002
Token
Token ID198
Feature activation+2.10e+00

Pos logit contributions
ips+0.027
aned+0.026
aled+0.025
wed+0.024
ge+0.024
Neg logit contributions
 Can-0.013
 oh-0.011
 I-0.011
 Please-0.010
 insect-0.010
Token,
Token ID11
Feature activation-4.49e+00

Pos logit contributions
 Can+0.066
 insect+0.066
 goat+0.065
 witch+0.064
 pigeon+0.062
Neg logit contributions
ips-0.080
 safely-0.069
ippy-0.068
ge-0.067
aned-0.065
Token there
Token ID612
Feature activation-9.64e-01

Pos logit contributions
 witch+0.088
 insect+0.088
 Can+0.086
 goat+0.085
 writer+0.084
Neg logit contributions
ips-0.113
 safely-0.104
 securely-0.098
ge-0.097
aled-0.097
Token was
Token ID373
Feature activation-4.75e+00

Pos logit contributions
 Can+0.021
 insect+0.020
 oh+0.019
 Please+0.019
 witch+0.018
Neg logit contributions
ips-0.025
aned-0.023
aled-0.022
 ease-0.022
 safely-0.021
Token a
Token ID257
Feature activation-5.84e+00

Pos logit contributions
 Can+0.112
 Please+0.104
 Are+0.100
 oh+0.098
 hello+0.097
Neg logit contributions
ips-0.110
 ease-0.103
aled-0.101
ags-0.099
 safely-0.096
Token big
Token ID1263
Feature activation-4.18e+00

Pos logit contributions
 Can+0.118
 oh+0.117
 happened+0.114
 Please+0.107
 unlike+0.107
Neg logit contributions
 safely-0.109
ippy-0.108
ips-0.105
ge-0.100
 finish-0.099
Token oct
Token ID19318
Feature activation-5.53e+00

Pos logit contributions
 oh+0.088
 Can+0.081
 happened+0.077
 Please+0.076
 Are+0.073
Neg logit contributions
ips-0.097
 destination-0.085
ippy-0.085
 safely-0.084
ags-0.084
Tokenopus
Token ID25790
Feature activation-5.34e+00

Pos logit contributions
 oh+0.111
 Can+0.109
 mean+0.096
 happened+0.094
 Are+0.092
Neg logit contributions
ips-0.147
ippy-0.143
aned-0.128
ge-0.126
lo-0.122
Token.
Token ID13
Feature activation-1.75e+00

Pos logit contributions
 Can+0.094
 oh+0.092
 dear+0.083
 hello+0.080
 Please+0.080
Neg logit contributions
ips-0.149
aled-0.137
aned-0.136
 safely-0.135
ge-0.134
Token The
Token ID383
Feature activation-1.59e+00

Pos logit contributions
 oh+0.043
 insect+0.039
 witch+0.038
 mean+0.037
 Can+0.037
Neg logit contributions
ips-0.040
aled-0.039
aned-0.035
ags-0.033
 safely-0.033
Token oct
Token ID19318
Feature activation-3.89e+00

Pos logit contributions
 Can+0.021
 oh+0.020
 Please+0.018
 hello+0.017
 insect+0.017
Neg logit contributions
ips-0.053
aled-0.050
aned-0.050
 ease-0.047
ge-0.046
Tokenopus
Token ID25790
Feature activation-3.45e+00

Pos logit contributions
 oh+0.069
 Can+0.069
 mean+0.062
 hello+0.061
 insect+0.059
Neg logit contributions
ips-0.117
aned-0.107
ippy-0.104
aled-0.101
ge-0.098
Token time
Token ID640
Feature activation-2.59e+00

Pos logit contributions
 Can+0.066
 oh+0.063
 insect+0.060
 witch+0.057
 Please+0.057
Neg logit contributions
ips-0.071
aned-0.067
aled-0.064
wed-0.062
ippy-0.060
Token,
Token ID11
Feature activation-3.74e+00

Pos logit contributions
 insect+0.050
 goat+0.049
 witch+0.048
 Can+0.048
 crow+0.047
Neg logit contributions
ips-0.072
ge-0.062
 safely-0.062
aned-0.061
ippy-0.060
Token there
Token ID612
Feature activation-3.22e-01

Pos logit contributions
 insect+0.062
 Can+0.061
 witch+0.059
 goat+0.058
 oh+0.057
Neg logit contributions
ips-0.110
 safely-0.100
aled-0.099
ge-0.099
aned-0.098
Token was
Token ID373
Feature activation-4.34e+00

Pos logit contributions
 Can+0.006
 oh+0.006
 insect+0.006
 Please+0.006
 mean+0.006
Neg logit contributions
ips-0.009
aned-0.008
aled-0.008
 ease-0.008
 safely-0.008
Token a
Token ID257
Feature activation-5.28e+00

Pos logit contributions
 Can+0.097
 Please+0.090
 oh+0.087
 Are+0.086
 witch+0.084
Neg logit contributions
ips-0.105
 ease-0.099
aled-0.098
ags-0.094
 safely-0.092
Token little
Token ID1310
Feature activation-5.50e+00

Pos logit contributions
 oh+0.107
 Can+0.107
 happened+0.106
 Please+0.099
 unlike+0.098
Neg logit contributions
 safely-0.095
ippy-0.095
ips-0.089
ge-0.086
 successfully-0.085
Token girl
Token ID2576
Feature activation-4.85e+00

Pos logit contributions
 happened+0.107
 oh+0.105
 unlike+0.102
 Can+0.099
 Do+0.094
Neg logit contributions
ips-0.118
ippy-0.116
 halves-0.110
 safely-0.102
 home-0.100
Token named
Token ID3706
Feature activation-2.58e+00

Pos logit contributions
 Can+0.087
 oh+0.086
 hello+0.082
 guy+0.080
 happened+0.077
Neg logit contributions
ips-0.121
eds-0.111
 safely-0.111
wed-0.109
ippy-0.109
Token Lily
Token ID20037
Feature activation-2.99e+00

Pos logit contributions
 Can+0.039
 oh+0.037
 insect+0.036
 mean+0.033
 Please+0.032
Neg logit contributions
ips-0.085
aned-0.081
aled-0.079
ge-0.074
wed-0.074
Token.
Token ID13
Feature activation-1.21e+00

Pos logit contributions
 Can+0.070
 oh+0.068
 insect+0.064
 Please+0.062
 hello+0.062
Neg logit contributions
ips-0.071
aned-0.066
aled-0.066
ge-0.064
wed-0.063
Token She
Token ID1375
Feature activation-2.89e+00

Pos logit contributions
 oh+0.021
 insect+0.021
 Can+0.021
 mean+0.019
 witch+0.019
Neg logit contributions
ips-0.037
aned-0.034
aled-0.034
ags-0.031
wed-0.031