TOP ACTIVATIONS
MAX = 0.123

time , in a small garden , there lived a little
time , in a big garden , there lived a little
time , in a small garden , there was a little
time , in a green garden , there lived a restless
time , in a big garden , there was a caul
time , in a big garden , there was a wealthy
time , in a big classroom , there was a thick
time , in a small room , there was a humble
time , in a big forest , there was a log
time , in a big forest , there was a wise
time , in a big forest , there was an unusual
time , in a small garden , there was a fragile
time , in a big forest , there was a brave
time , in a big forest , lived a small bunny
time , in a big forest , there was a little
time , in a big forest , there was a poor
ily and Sam went on a saf ari with their mom
time , in a big forest , there was a little
time , in a small village , there was a little
time , in a big forest , there was a little

MIN ACTIVATIONS
MIN = -0.090

new friend just by sharing her smile . <|endoftext|> One day
that it produced more av oc ados than ever before .
better than selfish ness and rud eness . <|endoftext|> Tom and
and thanked them for sharing their water . They took the
to each other and share their toys . Tom
like picking flowers or sharing your food ." The rabbit nodded
someone else happy by sharing her sweets . From that day
to our friends and respect their style . We should say
with your heart and share your voice . <|endoftext|> M ia
They stopped sharing their toys and began to snatch
and a way to share your ideas and feelings . If
should always respect and share their things and not hurt each
learn more about rh in oc er oses and respect them
Both of them put their anx ieties aside . From then
We have different likes and disl ikes . We can be
and apologized for not sharing her toys . The little girl
kind to animals and respect their lives . He and Sam
a good friend and sharing her treasures . <|endoftext|> Once upon
a helping hand and sharing his truth . <|endoftext|> Once ,
don 't like to share their honey ." <|endoftext|> S ara

INTERVAL 0.070 - 0.123
CONTAINS 2.972%

Once upon a time there was a little girl named Lucy
a time , there was a little girl named Lily .
upon a time , there was a big tower . It
, there was a little girl named Lily . She had
had a bad ending . <|endoftext|> Once upon a time ,

INTERVAL 0.017 - 0.070
CONTAINS 20.484%

in the garage . They opened it and saw many old
her . Down the trail , she thought she
He found a nice , soft bed under a big tree
ground . She asked her friends , " What is in
playing with them . He was especially delighted to find a

INTERVAL -0.037 - 0.017
CONTAINS 70.474%

shapes and soon the puzzle was transformed into a beautiful masterpiece
something fun . He started running around the ball . It
would come back and they could play together . <|endoftext|> Harry
. She felt frustrated and sad . She wanted to whistle
a bush !". Jack chuckled and said , " You are

INTERVAL -0.090 - -0.037
CONTAINS 6.070%

” J ill understood why Steve was
kill him . He is my friend . He came to
my how they made their wrap and loop with paper and
that these views were made just for him and he belonged
always ask before you try something new . There are many
Token time
Token ID640
Feature activation+9.91e-02

Pos logit contributions
 named+0.130
 saw+0.128
 there+0.122
 abandoned+0.112
 nearby+0.110
Neg logit contributions
 Deal-0.177
 apology-0.176
oor-0.172
 accomplishment-0.164
Less-0.161
Token,
Token ID11
Feature activation+1.06e-01

Pos logit contributions
 saw+0.117
 named+0.115
 abandoned+0.108
 nearby+0.105
 sheep+0.094
Neg logit contributions
oor-0.232
 Deal-0.219
 apology-0.216
ently-0.209
anks-0.206
Token in
Token ID287
Feature activation+7.47e-02

Pos logit contributions
 cloak+0.103
 witch+0.101
 hall+0.100
 abandoned+0.100
 sheep+0.092
Neg logit contributions
 Deal-0.224
 apology-0.222
Less-0.222
raw-0.221
 itself-0.218
Token a
Token ID257
Feature activation+1.08e-01

Pos logit contributions
 saw+0.113
 named+0.112
 hall+0.108
 appeared+0.105
 abandoned+0.103
Neg logit contributions
 Deal-0.130
 accomplishment-0.129
 apology-0.128
 myself-0.119
Less-0.117
Token small
Token ID1402
Feature activation+9.09e-02

Pos logit contributions
 appeared+0.165
 named+0.156
 telephone+0.134
itty+0.132
 called+0.132
Neg logit contributions
Less-0.161
 Deal-0.160
 accomplishment-0.158
 myself-0.157
 mistakes-0.154
Token garden
Token ID11376
Feature activation+1.23e-01

Pos logit contributions
 appeared+0.160
aver+0.157
 named+0.151
eh+0.144
 lurking+0.138
Neg logit contributions
 decisions-0.106
 mistakes-0.101
 halves-0.096
 myself-0.094
 instructions-0.093
Token,
Token ID11
Feature activation+1.04e-01

Pos logit contributions
 knight+0.186
 abandoned+0.184
 detective+0.178
 cowboy+0.173
 guy+0.172
Neg logit contributions
ently-0.165
um-0.157
 myself-0.154
 itself-0.151
oor-0.149
Token there
Token ID612
Feature activation+9.21e-02

Pos logit contributions
 uniform+0.095
 cloak+0.091
 witch+0.088
 station+0.087
 laboratory+0.085
Neg logit contributions
 itself-0.223
 themselves-0.219
 myself-0.216
 ourselves-0.214
Less-0.206
Token lived
Token ID5615
Feature activation+9.26e-02

Pos logit contributions
 abandoned+0.143
 witch+0.141
 sheep+0.140
 named+0.140
eh+0.136
Neg logit contributions
 itself-0.137
mm-0.132
oor-0.131
Less-0.130
 themselves-0.129
Token a
Token ID257
Feature activation+8.18e-02

Pos logit contributions
 saw+0.123
 named+0.121
 abandoned+0.114
 hall+0.108
 appeared+0.103
Neg logit contributions
Less-0.184
 Deal-0.183
 myself-0.182
 apology-0.176
oor-0.174
Token little
Token ID1310
Feature activation+5.50e-02

Pos logit contributions
 appeared+0.133
 seen+0.123
 there+0.113
 heard+0.113
c+0.112
Neg logit contributions
 apology-0.120
 accomplishment-0.113
 lesson-0.105
 Deal-0.103
Less-0.103
Token time
Token ID640
Feature activation+1.03e-01

Pos logit contributions
 named+0.120
 saw+0.119
 there+0.113
 abandoned+0.101
 nearby+0.100
Neg logit contributions
 apology-0.179
 Deal-0.177
oor-0.175
 accomplishment-0.166
der-0.165
Token,
Token ID11
Feature activation+1.04e-01

Pos logit contributions
 saw+0.110
 named+0.108
 abandoned+0.098
 nearby+0.095
 hall+0.084
Neg logit contributions
oor-0.251
 Deal-0.242
 apology-0.239
anks-0.232
ently-0.229
Token in
Token ID287
Feature activation+7.46e-02

Pos logit contributions
 hall+0.088
 abandoned+0.087
 named+0.081
 witch+0.080
 cloak+0.079
Neg logit contributions
 Deal-0.235
 apology-0.232
Less-0.231
 myself-0.225
raw-0.223
Token a
Token ID257
Feature activation+1.05e-01

Pos logit contributions
 saw+0.108
 named+0.107
 hall+0.100
 abandoned+0.098
 appeared+0.098
Neg logit contributions
 Deal-0.137
 apology-0.135
 accomplishment-0.133
 myself-0.128
Less-0.126
Token big
Token ID1263
Feature activation+7.94e-02

Pos logit contributions
 appeared+0.154
 named+0.146
 abandoned+0.123
 called+0.123
 telephone+0.122
Neg logit contributions
 Deal-0.160
Less-0.159
 myself-0.157
 accomplishment-0.155
 apology-0.153
Token garden
Token ID11376
Feature activation+1.22e-01

Pos logit contributions
 appeared+0.106
 named+0.099
aver+0.099
 disturbed+0.095
 there+0.090
Neg logit contributions
 accomplishment-0.134
 mistakes-0.122
 decisions-0.121
 apology-0.121
 myself-0.120
Token,
Token ID11
Feature activation+1.01e-01

Pos logit contributions
 abandoned+0.181
 knight+0.179
 detective+0.172
 guy+0.169
 cowboy+0.168
Neg logit contributions
 myself-0.170
ently-0.167
oor-0.161
 ourselves-0.155
 itself-0.150
Token there
Token ID612
Feature activation+9.05e-02

Pos logit contributions
 uniform+0.078
 cloak+0.070
 witch+0.067
 abandoned+0.067
 mask+0.065
Neg logit contributions
 myself-0.223
 itself-0.218
 ourselves-0.217
 apology-0.216
 themselves-0.215
Token lived
Token ID5615
Feature activation+9.04e-02

Pos logit contributions
 abandoned+0.135
 named+0.131
 witch+0.128
 nearby+0.127
 sheep+0.126
Neg logit contributions
oor-0.140
mm-0.140
 itself-0.137
ently-0.136
Less-0.136
Token a
Token ID257
Feature activation+8.01e-02

Pos logit contributions
 saw+0.112
 named+0.111
 abandoned+0.102
 hall+0.095
 appeared+0.093
Neg logit contributions
Less-0.188
 Deal-0.188
 myself-0.185
 apology-0.183
oor-0.180
Token little
Token ID1310
Feature activation+5.41e-02

Pos logit contributions
 appeared+0.129
 seen+0.118
 there+0.110
 abandoned+0.110
 walking+0.109
Neg logit contributions
 apology-0.119
 accomplishment-0.111
 lesson-0.105
 Deal-0.102
Less-0.102
Token time
Token ID640
Feature activation+9.86e-02

Pos logit contributions
 named+0.119
 saw+0.118
 there+0.111
 nearby+0.098
 abandoned+0.097
Neg logit contributions
 apology-0.185
 Deal-0.183
oor-0.181
 accomplishment-0.172
der-0.172
Token,
Token ID11
Feature activation+1.07e-01

Pos logit contributions
 saw+0.110
 named+0.110
 abandoned+0.098
 nearby+0.097
 hall+0.086
Neg logit contributions
oor-0.237
 Deal-0.226
 apology-0.224
ently-0.215
anks-0.213
Token in
Token ID287
Feature activation+7.23e-02

Pos logit contributions
 cloak+0.096
 hall+0.093
 witch+0.092
 abandoned+0.091
 laboratory+0.086
Neg logit contributions
 apology-0.232
Less-0.231
 Deal-0.231
raw-0.228
 itself-0.221
Token a
Token ID257
Feature activation+1.06e-01

Pos logit contributions
 saw+0.109
 named+0.104
 hall+0.102
 appeared+0.101
 abandoned+0.096
Neg logit contributions
 apology-0.131
 accomplishment-0.129
 Deal-0.128
Less-0.117
 myself-0.116
Token small
Token ID1402
Feature activation+8.79e-02

Pos logit contributions
 appeared+0.154
 named+0.148
 there+0.128
 called+0.126
 abandoned+0.126
Neg logit contributions
 Deal-0.168
Less-0.167
 accomplishment-0.165
 apology-0.162
 mistakes-0.161
Token garden
Token ID11376
Feature activation+1.21e-01

Pos logit contributions
 appeared+0.144
aver+0.138
 named+0.136
 lurking+0.125
 disturbed+0.123
Neg logit contributions
 decisions-0.115
 mistakes-0.115
 halves-0.109
 instructions-0.103
 myself-0.103
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 abandoned+0.169
 knight+0.156
 flashlight+0.155
 detective+0.153
 guy+0.151
Neg logit contributions
ently-0.188
 myself-0.178
oor-0.178
 ourselves-0.166
mm-0.164
Token there
Token ID612
Feature activation+9.06e-02

Pos logit contributions
 uniform+0.098
 cloak+0.095
 laboratory+0.090
 station+0.088
 suit+0.088
Neg logit contributions
 myself-0.215
 itself-0.215
 themselves-0.210
 ourselves-0.208
Less-0.207
Token was
Token ID373
Feature activation+9.23e-02

Pos logit contributions
 abandoned+0.130
 named+0.130
 nearby+0.124
 witch+0.122
eh+0.121
Neg logit contributions
oor-0.146
mm-0.145
Less-0.143
 itself-0.141
ently-0.140
Token a
Token ID257
Feature activation+7.92e-02

Pos logit contributions
 saw+0.149
itty+0.141
 named+0.140
 cell+0.139
 hall+0.136
Neg logit contributions
Less-0.138
 myself-0.130
 herself-0.122
 Deal-0.122
 itself-0.122
Token little
Token ID1310
Feature activation+5.34e-02

Pos logit contributions
 appeared+0.131
 seen+0.124
 there+0.120
 named+0.116
 saw+0.114
Neg logit contributions
 apology-0.120
 accomplishment-0.114
 Deal-0.106
 lesson-0.103
 achievement-0.098
Token time
Token ID640
Feature activation+9.65e-02

Pos logit contributions
 named+0.122
 saw+0.121
 there+0.115
 abandoned+0.103
 nearby+0.102
Neg logit contributions
 Deal-0.174
 apology-0.174
oor-0.170
 accomplishment-0.161
der-0.159
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 saw+0.112
 named+0.109
 abandoned+0.101
 nearby+0.099
 sheep+0.090
Neg logit contributions
oor-0.226
 Deal-0.217
 apology-0.213
ently-0.205
anks-0.204
Token in
Token ID287
Feature activation+7.32e-02

Pos logit contributions
 cloak+0.096
 hall+0.093
 abandoned+0.093
 witch+0.090
 sheep+0.088
Neg logit contributions
Less-0.224
 Deal-0.220
 apology-0.218
raw-0.217
 itself-0.215
Token a
Token ID257
Feature activation+1.06e-01

Pos logit contributions
 saw+0.112
 named+0.110
 hall+0.105
 appeared+0.104
 abandoned+0.100
Neg logit contributions
 Deal-0.127
 apology-0.124
 accomplishment-0.123
 myself-0.117
Less-0.115
Token green
Token ID4077
Feature activation+8.75e-02

Pos logit contributions
 appeared+0.157
 named+0.155
 called+0.131
itty+0.129
 telephone+0.126
Neg logit contributions
Less-0.163
 Deal-0.160
 myself-0.158
 accomplishment-0.158
 apology-0.154
Token garden
Token ID11376
Feature activation+1.19e-01

Pos logit contributions
 afraid+0.117
 disturbed+0.112
 appeared+0.111
 abandoned+0.110
 saw+0.110
Neg logit contributions
 myself-0.131
der-0.129
 apology-0.124
 halves-0.124
oral-0.121
Token,
Token ID11
Feature activation+1.00e-01

Pos logit contributions
 abandoned+0.175
 knight+0.174
 sheep+0.170
 detective+0.167
 guy+0.165
Neg logit contributions
ently-0.164
 myself-0.160
oor-0.154
mm-0.153
 itself-0.153
Token there
Token ID612
Feature activation+8.76e-02

Pos logit contributions
 uniform+0.083
 cloak+0.081
 witch+0.081
 laboratory+0.078
 station+0.076
Neg logit contributions
 itself-0.212
 themselves-0.210
 myself-0.206
Less-0.204
 ourselves-0.203
Token lived
Token ID5615
Feature activation+8.93e-02

Pos logit contributions
 abandoned+0.129
 named+0.127
 sheep+0.127
 nearby+0.124
 witch+0.123
Neg logit contributions
mm-0.137
oor-0.134
 itself-0.134
Less-0.133
ently-0.129
Token a
Token ID257
Feature activation+7.75e-02

Pos logit contributions
 saw+0.118
 named+0.115
 abandoned+0.109
 hall+0.104
 appeared+0.099
Neg logit contributions
Less-0.178
 Deal-0.176
 myself-0.176
 apology-0.170
oor-0.168
Token restless
Token ID42537
Feature activation+4.25e-02

Pos logit contributions
 appeared+0.119
 seen+0.110
 there+0.103
 abandoned+0.103
 heard+0.102
Neg logit contributions
 apology-0.123
 accomplishment-0.115
Less-0.108
 lesson-0.107
 Deal-0.106
Token time
Token ID640
Feature activation+9.48e-02

Pos logit contributions
 named+0.108
 saw+0.108
 there+0.101
 nearby+0.089
 abandoned+0.088
Neg logit contributions
 apology-0.171
 Deal-0.168
oor-0.166
 accomplishment-0.158
der-0.158
Token,
Token ID11
Feature activation+9.87e-02

Pos logit contributions
 saw+0.098
 named+0.096
 nearby+0.080
 abandoned+0.079
 hall+0.071
Neg logit contributions
oor-0.240
 apology-0.236
 Deal-0.234
anks-0.225
ently-0.222
Token in
Token ID287
Feature activation+7.03e-02

Pos logit contributions
 hall+0.080
 abandoned+0.077
 cloak+0.076
 witch+0.076
 named+0.074
Neg logit contributions
 Deal-0.227
 apology-0.225
raw-0.224
Less-0.223
oor-0.213
Token a
Token ID257
Feature activation+1.02e-01

Pos logit contributions
 saw+0.101
 named+0.096
 hall+0.091
 appeared+0.090
 abandoned+0.088
Neg logit contributions
 apology-0.138
 Deal-0.137
 accomplishment-0.134
Less-0.125
oor-0.123
Token big
Token ID1263
Feature activation+7.81e-02

Pos logit contributions
 appeared+0.152
 named+0.146
 called+0.126
 abandoned+0.122
 there+0.121
Neg logit contributions
 Deal-0.162
Less-0.160
 accomplishment-0.157
 apology-0.157
 mistakes-0.152
Token garden
Token ID11376
Feature activation+1.19e-01

Pos logit contributions
 appeared+0.102
 named+0.098
 disturbed+0.096
aver+0.094
 heard+0.088
Neg logit contributions
 accomplishment-0.139
 apology-0.126
 decisions-0.125
 mistakes-0.124
 goals-0.121
Token,
Token ID11
Feature activation+9.96e-02

Pos logit contributions
 abandoned+0.161
 knight+0.154
 flashlight+0.150
 sheep+0.150
 mean+0.150
Neg logit contributions
oor-0.188
ently-0.185
 myself-0.180
mm-0.169
 ourselves-0.168
Token there
Token ID612
Feature activation+8.76e-02

Pos logit contributions
 uniform+0.076
 cloak+0.071
 mask+0.068
 witch+0.066
 appeared+0.065
Neg logit contributions
 myself-0.212
 itself-0.211
 ourselves-0.208
 themselves-0.207
oor-0.207
Token was
Token ID373
Feature activation+9.15e-02

Pos logit contributions
 named+0.122
 abandoned+0.120
 nearby+0.117
 sheep+0.115
 witch+0.114
Neg logit contributions
oor-0.148
mm-0.145
Less-0.144
ently-0.141
 itself-0.141
Token a
Token ID257
Feature activation+7.70e-02

Pos logit contributions
 saw+0.140
itty+0.135
 named+0.134
 cell+0.133
 appeared+0.128
Neg logit contributions
Less-0.142
 myself-0.136
 Deal-0.127
 themselves-0.127
 lesson-0.127
Token caul
Token ID46087
Feature activation+3.92e-02

Pos logit contributions
 appeared+0.125
 seen+0.119
 there+0.113
 named+0.110
 saw+0.110
Neg logit contributions
 apology-0.121
 accomplishment-0.113
 Deal-0.106
 lesson-0.104
 moral-0.100
Token time
Token ID640
Feature activation+9.69e-02

Pos logit contributions
 named+0.114
 saw+0.114
 there+0.107
 nearby+0.095
 abandoned+0.095
Neg logit contributions
 apology-0.172
 Deal-0.170
oor-0.168
 accomplishment-0.160
der-0.158
Token,
Token ID11
Feature activation+1.00e-01

Pos logit contributions
 saw+0.108
 named+0.108
 abandoned+0.095
 nearby+0.094
 hall+0.084
Neg logit contributions
oor-0.235
 apology-0.223
 Deal-0.222
ently-0.212
anks-0.211
Token in
Token ID287
Feature activation+6.98e-02

Pos logit contributions
 hall+0.080
 witch+0.080
 abandoned+0.078
 cloak+0.076
 named+0.075
Neg logit contributions
 apology-0.231
 Deal-0.230
Less-0.229
raw-0.225
oor-0.221
Token a
Token ID257
Feature activation+1.01e-01

Pos logit contributions
 saw+0.101
 named+0.100
 hall+0.093
 appeared+0.092
 abandoned+0.090
Neg logit contributions
 Deal-0.132
 apology-0.131
 accomplishment-0.130
 myself-0.121
Less-0.121
Token big
Token ID1263
Feature activation+7.82e-02

Pos logit contributions
 appeared+0.149
 named+0.146
 called+0.124
 there+0.122
 abandoned+0.122
Neg logit contributions
Less-0.159
 Deal-0.158
 accomplishment-0.155
 apology-0.152
 mistakes-0.152
Token garden
Token ID11376
Feature activation+1.18e-01

Pos logit contributions
 appeared+0.100
aver+0.098
 named+0.097
 disturbed+0.092
 lurking+0.086
Neg logit contributions
 accomplishment-0.134
 decisions-0.120
 mistakes-0.120
 apology-0.117
 goals-0.117
Token,
Token ID11
Feature activation+9.77e-02

Pos logit contributions
 knight+0.173
 abandoned+0.171
 detective+0.167
 flashlight+0.165
 sheep+0.164
Neg logit contributions
 myself-0.163
ently-0.161
oor-0.160
 ourselves-0.150
mm-0.149
Token there
Token ID612
Feature activation+8.54e-02

Pos logit contributions
 uniform+0.080
 cloak+0.078
 witch+0.075
 laboratory+0.072
 visitor+0.072
Neg logit contributions
 itself-0.209
 myself-0.207
 themselves-0.206
 ourselves-0.205
Less-0.203
Token was
Token ID373
Feature activation+8.80e-02

Pos logit contributions
 named+0.127
 abandoned+0.125
 witch+0.122
 nearby+0.122
 sheep+0.121
Neg logit contributions
mm-0.133
oor-0.133
 itself-0.130
Less-0.130
ently-0.128
Token a
Token ID257
Feature activation+7.48e-02

Pos logit contributions
 saw+0.135
 named+0.129
itty+0.126
 abandoned+0.122
 cell+0.121
Neg logit contributions
Less-0.143
 myself-0.136
oor-0.129
 themselves-0.128
 Deal-0.127
Token wealthy
Token ID11574
Feature activation+5.86e-02

Pos logit contributions
 appeared+0.118
 seen+0.112
 there+0.111
 named+0.109
 saw+0.107
Neg logit contributions
 apology-0.118
 accomplishment-0.112
 Deal-0.104
 lesson-0.103
Less-0.099
Token time
Token ID640
Feature activation+9.65e-02

Pos logit contributions
 named+0.102
 saw+0.101
 there+0.094
 nearby+0.081
 abandoned+0.078
Neg logit contributions
 apology-0.179
 Deal-0.175
oor-0.174
anks-0.170
der-0.168
Token,
Token ID11
Feature activation+9.92e-02

Pos logit contributions
 named+0.096
 saw+0.095
 nearby+0.078
 abandoned+0.077
 there+0.070
Neg logit contributions
oor-0.249
 Deal-0.241
 apology-0.241
anks-0.233
Less-0.230
Token in
Token ID287
Feature activation+6.94e-02

Pos logit contributions
 abandoned+0.074
 named+0.071
 hall+0.069
 cloak+0.066
 sheep+0.063
Neg logit contributions
Less-0.234
 Deal-0.233
 apology-0.230
raw-0.224
 myself-0.221
Token a
Token ID257
Feature activation+1.00e-01

Pos logit contributions
 named+0.094
 saw+0.093
 appeared+0.080
 abandoned+0.080
 hall+0.080
Neg logit contributions
 accomplishment-0.142
 Deal-0.142
 apology-0.141
Less-0.133
oor-0.130
Token big
Token ID1263
Feature activation+7.69e-02

Pos logit contributions
 named+0.137
 appeared+0.135
 there+0.116
 called+0.115
 abandoned+0.113
Neg logit contributions
 Deal-0.170
Less-0.169
 accomplishment-0.167
 apology-0.163
 myself-0.159
Token classroom
Token ID15806
Feature activation+1.17e-01

Pos logit contributions
 appeared+0.094
 named+0.093
 disturbed+0.092
aver+0.089
 there+0.086
Neg logit contributions
 accomplishment-0.141
 apology-0.126
 decisions-0.125
 mistakes-0.124
 goals-0.123
Token,
Token ID11
Feature activation+9.54e-02

Pos logit contributions
 sheep+0.159
 nearby+0.147
 abandoned+0.146
 deer+0.145
 flashlight+0.142
Neg logit contributions
oor-0.213
ently-0.192
mm-0.189
der-0.185
anks-0.180
Token there
Token ID612
Feature activation+8.45e-02

Pos logit contributions
 cloak+0.069
 sheep+0.066
 desert+0.064
 mask+0.063
 witch+0.063
Neg logit contributions
 apology-0.206
Less-0.204
 myself-0.203
oor-0.201
 Deal-0.200
Token was
Token ID373
Feature activation+8.12e-02

Pos logit contributions
 named+0.130
 nearby+0.121
 abandoned+0.121
 sheep+0.121
eh+0.119
Neg logit contributions
oor-0.140
Less-0.136
ently-0.135
mm-0.135
 apology-0.133
Token a
Token ID257
Feature activation+6.88e-02

Pos logit contributions
 saw+0.114
 named+0.109
 cell+0.098
 abandoned+0.097
 sheep+0.097
Neg logit contributions
Less-0.151
oor-0.143
 apology-0.143
 myself-0.141
 Deal-0.140
Token thick
Token ID6546
Feature activation+3.56e-02

Pos logit contributions
 there+0.102
 named+0.100
 appeared+0.100
 lurking+0.093
 saw+0.093
Neg logit contributions
 apology-0.119
 accomplishment-0.115
 Deal-0.106
 lesson-0.104
oor-0.101
Token time
Token ID640
Feature activation+1.05e-01

Pos logit contributions
 named+0.124
 saw+0.123
 there+0.116
 abandoned+0.104
 nearby+0.104
Neg logit contributions
 apology-0.184
 Deal-0.184
oor-0.180
 accomplishment-0.171
Less-0.170
Token,
Token ID11
Feature activation+1.03e-01

Pos logit contributions
 saw+0.117
 named+0.114
 abandoned+0.106
 nearby+0.103
 sheep+0.091
Neg logit contributions
oor-0.252
 Deal-0.240
 apology-0.236
ently-0.229
anks-0.227
Token in
Token ID287
Feature activation+7.44e-02

Pos logit contributions
 abandoned+0.089
 hall+0.085
 named+0.082
 witch+0.079
 cloak+0.077
Neg logit contributions
 Deal-0.238
 apology-0.234
Less-0.232
raw-0.229
oor-0.225
Token a
Token ID257
Feature activation+1.06e-01

Pos logit contributions
 named+0.106
 saw+0.105
 hall+0.102
 appeared+0.097
 abandoned+0.096
Neg logit contributions
 Deal-0.137
 apology-0.137
 accomplishment-0.134
Less-0.127
 themselves-0.123
Token small
Token ID1402
Feature activation+8.57e-02

Pos logit contributions
 appeared+0.155
 named+0.145
 abandoned+0.128
 telephone+0.126
 called+0.125
Neg logit contributions
 Deal-0.165
Less-0.165
 accomplishment-0.161
 myself-0.158
 apology-0.157
Token room
Token ID2119
Feature activation+1.16e-01

Pos logit contributions
 appeared+0.146
aver+0.141
 named+0.138
eh+0.128
 lurking+0.128
Neg logit contributions
 decisions-0.105
 mistakes-0.103
 halves-0.099
 myself-0.095
 goals-0.093
Token,
Token ID11
Feature activation+9.68e-02

Pos logit contributions
 knight+0.152
 abandoned+0.150
 mean+0.146
 nearby+0.143
 flashlight+0.142
Neg logit contributions
oor-0.187
ently-0.178
mm-0.176
um-0.166
omm-0.159
Token there
Token ID612
Feature activation+9.02e-02

Pos logit contributions
 witch+0.079
 cloak+0.077
 laboratory+0.075
 uniform+0.073
 desert+0.072
Neg logit contributions
 themselves-0.201
 itself-0.197
 myself-0.196
 ourselves-0.196
 apology-0.196
Token was
Token ID373
Feature activation+8.86e-02

Pos logit contributions
 named+0.132
 abandoned+0.127
 nearby+0.124
 witch+0.121
eh+0.121
Neg logit contributions
oor-0.149
Less-0.147
mm-0.146
ently-0.144
 apology-0.144
Token a
Token ID257
Feature activation+7.46e-02

Pos logit contributions
 saw+0.136
 named+0.129
itty+0.124
 sheep+0.123
 abandoned+0.121
Neg logit contributions
Less-0.145
 Deal-0.134
 myself-0.132
 apology-0.130
 lesson-0.129
Token humble
Token ID19695
Feature activation+4.53e-02

Pos logit contributions
 appeared+0.126
 seen+0.121
 there+0.117
 named+0.115
 lurking+0.113
Neg logit contributions
 apology-0.114
 accomplishment-0.109
 Deal-0.101
 lesson-0.095
 achievement-0.093
Token time
Token ID640
Feature activation+1.01e-01

Pos logit contributions
 named+0.130
 saw+0.129
 there+0.122
 abandoned+0.109
 nearby+0.109
Neg logit contributions
 apology-0.191
 Deal-0.189
oor-0.187
 accomplishment-0.178
Less-0.176
Token,
Token ID11
Feature activation+1.08e-01

Pos logit contributions
 saw+0.112
 named+0.112
 abandoned+0.100
 nearby+0.097
 hall+0.086
Neg logit contributions
oor-0.243
 Deal-0.233
 apology-0.232
ently-0.223
anks-0.222
Token in
Token ID287
Feature activation+7.58e-02

Pos logit contributions
 hall+0.102
 cloak+0.102
 abandoned+0.101
 witch+0.099
 named+0.099
Neg logit contributions
 apology-0.229
 Deal-0.227
Less-0.225
raw-0.223
ently-0.217
Token a
Token ID257
Feature activation+1.10e-01

Pos logit contributions
 saw+0.114
 named+0.111
 hall+0.107
 appeared+0.106
 abandoned+0.103
Neg logit contributions
 Deal-0.135
 apology-0.135
 accomplishment-0.133
Less-0.123
 myself-0.122
Token big
Token ID1263
Feature activation+8.46e-02

Pos logit contributions
 appeared+0.171
 named+0.163
 abandoned+0.141
 called+0.140
 telephone+0.137
Neg logit contributions
 Deal-0.166
Less-0.164
 accomplishment-0.162
 myself-0.157
 apology-0.157
Token forest
Token ID8222
Feature activation+1.15e-01

Pos logit contributions
 appeared+0.114
 named+0.107
aver+0.106
 disturbed+0.103
 lurking+0.096
Neg logit contributions
 accomplishment-0.148
 mistakes-0.132
 decisions-0.132
 goals-0.131
 apology-0.130
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 abandoned+0.157
 closet+0.156
 cloak+0.150
 sheep+0.149
 saw+0.146
Neg logit contributions
ently-0.191
oor-0.187
iously-0.176
 myself-0.175
 itself-0.169
Token there
Token ID612
Feature activation+9.11e-02

Pos logit contributions
 uniform+0.099
 cloak+0.094
 suit+0.088
 costume+0.086
 robe+0.084
Neg logit contributions
 itself-0.221
 myself-0.218
 ourselves-0.214
 themselves-0.214
Less-0.208
Token was
Token ID373
Feature activation+9.73e-02

Pos logit contributions
 named+0.131
 abandoned+0.127
 witch+0.119
 nearby+0.119
 sheep+0.117
Neg logit contributions
oor-0.156
Less-0.150
ently-0.150
mm-0.147
 itself-0.147
Token a
Token ID257
Feature activation+8.71e-02

Pos logit contributions
 saw+0.151
 cell+0.147
 named+0.143
 hall+0.141
 appeared+0.140
Neg logit contributions
Less-0.151
 myself-0.142
 Deal-0.135
 lesson-0.134
 itself-0.133
Token log
Token ID2604
Feature activation+7.76e-02

Pos logit contributions
 appeared+0.140
 seen+0.132
 there+0.125
 saw+0.121
 named+0.121
Neg logit contributions
 apology-0.132
 accomplishment-0.126
 Deal-0.119
 lesson-0.116
Less-0.116
Token time
Token ID640
Feature activation+1.04e-01

Pos logit contributions
 named+0.121
 saw+0.120
 there+0.113
 abandoned+0.100
 nearby+0.100
Neg logit contributions
 apology-0.183
 Deal-0.181
oor-0.180
 accomplishment-0.171
der-0.169
Token,
Token ID11
Feature activation+1.06e-01

Pos logit contributions
 saw+0.118
 named+0.117
 abandoned+0.107
 nearby+0.103
 sheep+0.092
Neg logit contributions
oor-0.246
 Deal-0.233
 apology-0.231
ently-0.221
anks-0.219
Token in
Token ID287
Feature activation+7.61e-02

Pos logit contributions
 abandoned+0.098
 cloak+0.096
 hall+0.094
 witch+0.092
 named+0.089
Neg logit contributions
 Deal-0.225
Less-0.223
 apology-0.222
raw-0.218
 itself-0.215
Token a
Token ID257
Feature activation+1.08e-01

Pos logit contributions
 saw+0.111
 named+0.109
 hall+0.103
 appeared+0.102
 abandoned+0.100
Neg logit contributions
 apology-0.141
 Deal-0.140
 accomplishment-0.139
Less-0.128
 myself-0.127
Token big
Token ID1263
Feature activation+8.33e-02

Pos logit contributions
 appeared+0.165
 named+0.151
 called+0.132
 abandoned+0.131
 there+0.129
Neg logit contributions
 Deal-0.165
Less-0.164
 accomplishment-0.160
 myself-0.158
 apology-0.158
Token forest
Token ID8222
Feature activation+1.15e-01

Pos logit contributions
 appeared+0.109
 named+0.102
 disturbed+0.101
aver+0.100
 there+0.094
Neg logit contributions
 accomplishment-0.147
 apology-0.131
 mistakes-0.130
 decisions-0.130
 goals-0.129
Token,
Token ID11
Feature activation+1.03e-01

Pos logit contributions
 closet+0.158
 abandoned+0.157
 sheep+0.152
 cloak+0.152
 lab+0.147
Neg logit contributions
ently-0.182
oor-0.178
 myself-0.166
iously-0.165
 itself-0.164
Token there
Token ID612
Feature activation+9.14e-02

Pos logit contributions
 uniform+0.092
 cloak+0.086
 witch+0.079
 suit+0.079
 abandoned+0.077
Neg logit contributions
 itself-0.221
 themselves-0.216
 myself-0.214
 ourselves-0.214
Less-0.206
Token was
Token ID373
Feature activation+9.57e-02

Pos logit contributions
 abandoned+0.136
 named+0.133
 witch+0.129
 nearby+0.127
 sheep+0.126
Neg logit contributions
oor-0.142
 itself-0.141
mm-0.138
ently-0.137
Less-0.136
Token a
Token ID257
Feature activation+8.35e-02

Pos logit contributions
 saw+0.154
 cell+0.151
itty+0.150
 suit+0.148
 hall+0.146
Neg logit contributions
Less-0.135
 myself-0.129
 itself-0.124
 themselves-0.120
 lesson-0.119
Token wise
Token ID10787
Feature activation+4.53e-02

Pos logit contributions
 appeared+0.131
 seen+0.127
 there+0.123
 saw+0.115
 heard+0.114
Neg logit contributions
 apology-0.128
 accomplishment-0.120
 Deal-0.114
 lesson-0.111
Less-0.110
Token time
Token ID640
Feature activation+9.94e-02

Pos logit contributions
 named+0.127
 saw+0.127
 there+0.120
 abandoned+0.109
 nearby+0.108
Neg logit contributions
 apology-0.178
 Deal-0.176
oor-0.175
 accomplishment-0.165
Less-0.163
Token,
Token ID11
Feature activation+1.04e-01

Pos logit contributions
 saw+0.117
 named+0.115
 abandoned+0.107
 nearby+0.105
 sheep+0.094
Neg logit contributions
oor-0.232
 Deal-0.218
 apology-0.216
ently-0.208
anks-0.205
Token in
Token ID287
Feature activation+7.20e-02

Pos logit contributions
 cloak+0.098
 witch+0.097
 hall+0.093
 abandoned+0.091
 laboratory+0.088
Neg logit contributions
 apology-0.218
Less-0.217
 Deal-0.216
raw-0.213
 itself-0.211
Token a
Token ID257
Feature activation+1.08e-01

Pos logit contributions
 saw+0.106
 named+0.105
 hall+0.098
 appeared+0.097
 abandoned+0.095
Neg logit contributions
 Deal-0.131
 apology-0.130
 accomplishment-0.130
 myself-0.122
Less-0.120
Token big
Token ID1263
Feature activation+8.45e-02

Pos logit contributions
 appeared+0.164
 named+0.158
 called+0.134
 abandoned+0.133
itty+0.132
Neg logit contributions
Less-0.156
 Deal-0.153
 myself-0.152
 accomplishment-0.151
 mistakes-0.151
Token forest
Token ID8222
Feature activation+1.14e-01

Pos logit contributions
 appeared+0.112
aver+0.110
 named+0.107
 disturbed+0.105
 lurking+0.099
Neg logit contributions
 accomplishment-0.140
 mistakes-0.126
 decisions-0.126
 myself-0.123
 goals-0.122
Token,
Token ID11
Feature activation+1.01e-01

Pos logit contributions
 closet+0.172
 cloak+0.172
 sheep+0.171
 abandoned+0.170
 detective+0.165
Neg logit contributions
ently-0.161
oor-0.158
 myself-0.150
 itself-0.149
iously-0.146
Token there
Token ID612
Feature activation+9.14e-02

Pos logit contributions
 cloak+0.092
 uniform+0.091
 witch+0.086
 cell+0.082
 robe+0.082
Neg logit contributions
 itself-0.212
 myself-0.209
 ourselves-0.208
 themselves-0.208
 apology-0.197
Token was
Token ID373
Feature activation+9.31e-02

Pos logit contributions
 named+0.128
 abandoned+0.127
 witch+0.122
 nearby+0.122
 sheep+0.119
Neg logit contributions
oor-0.154
mm-0.151
 itself-0.150
Less-0.148
ently-0.147
Token an
Token ID281
Feature activation+4.80e-02

Pos logit contributions
 saw+0.145
 cell+0.137
 named+0.136
itty+0.136
 abandoned+0.132
Neg logit contributions
Less-0.145
 myself-0.140
 lesson-0.131
 itself-0.131
oor-0.130
Token unusual
Token ID8468
Feature activation+3.17e-02

Pos logit contributions
 saw+0.076
 there+0.069
 named+0.068
 heard+0.063
 appeared+0.063
Neg logit contributions
 apology-0.100
iously-0.092
 accomplishment-0.092
anks-0.091
oor-0.090
Token time
Token ID640
Feature activation+9.20e-02

Pos logit contributions
 named+0.101
 saw+0.101
 there+0.094
 nearby+0.082
 abandoned+0.081
Neg logit contributions
 apology-0.167
 Deal-0.164
oor-0.162
anks-0.156
der-0.156
Token,
Token ID11
Feature activation+9.64e-02

Pos logit contributions
 named+0.102
 saw+0.101
 nearby+0.087
 abandoned+0.087
 sheep+0.076
Neg logit contributions
oor-0.225
 apology-0.216
 Deal-0.216
anks-0.205
ently-0.205
Token in
Token ID287
Feature activation+6.77e-02

Pos logit contributions
 witch+0.087
 abandoned+0.087
 cloak+0.087
 hall+0.084
 laboratory+0.081
Neg logit contributions
 Deal-0.210
raw-0.205
 apology-0.203
Less-0.200
 itself-0.198
Token a
Token ID257
Feature activation+1.00e-01

Pos logit contributions
 saw+0.099
 named+0.096
 appeared+0.087
 abandoned+0.087
 hall+0.086
Neg logit contributions
 Deal-0.132
 accomplishment-0.130
 apology-0.129
Less-0.119
 myself-0.117
Token small
Token ID1402
Feature activation+8.25e-02

Pos logit contributions
 appeared+0.155
 named+0.146
 called+0.128
 disturbed+0.125
 there+0.125
Neg logit contributions
 Deal-0.157
Less-0.152
 accomplishment-0.151
 apology-0.148
 mistakes-0.145
Token garden
Token ID11376
Feature activation+1.14e-01

Pos logit contributions
 appeared+0.142
aver+0.137
 named+0.130
 disturbed+0.126
eh+0.125
Neg logit contributions
 decisions-0.103
 mistakes-0.099
 halves-0.094
 goals-0.092
 myself-0.092
Token,
Token ID11
Feature activation+9.39e-02

Pos logit contributions
 abandoned+0.160
 knight+0.155
 mean+0.148
 flashlight+0.147
 detective+0.146
Neg logit contributions
ently-0.171
 myself-0.167
oor-0.164
mm-0.160
 itself-0.153
Token there
Token ID612
Feature activation+8.59e-02

Pos logit contributions
 witch+0.083
 cloak+0.082
 uniform+0.080
 laboratory+0.079
 appeared+0.078
Neg logit contributions
 itself-0.194
 myself-0.193
 themselves-0.191
 ourselves-0.186
 apology-0.178
Token was
Token ID373
Feature activation+8.78e-02

Pos logit contributions
 named+0.129
 abandoned+0.127
eh+0.123
 nearby+0.122
 sheep+0.121
Neg logit contributions
mm-0.133
oor-0.133
Less-0.131
 itself-0.131
ently-0.128
Token a
Token ID257
Feature activation+7.57e-02

Pos logit contributions
 saw+0.144
itty+0.140
 named+0.136
 cell+0.136
 appeared+0.132
Neg logit contributions
Less-0.128
 myself-0.122
 themselves-0.117
 herself-0.116
 Deal-0.115
Token fragile
Token ID21049
Feature activation+2.81e-02

Pos logit contributions
 appeared+0.125
 seen+0.119
 there+0.115
 named+0.110
 saw+0.110
Neg logit contributions
 apology-0.117
 accomplishment-0.109
 Deal-0.101
 lesson-0.100
 moral-0.095
Token time
Token ID640
Feature activation+9.34e-02

Pos logit contributions
 named+0.126
 saw+0.125
 there+0.117
 abandoned+0.107
 nearby+0.105
Neg logit contributions
 apology-0.177
 Deal-0.175
oor-0.173
 accomplishment-0.164
Less-0.163
Token,
Token ID11
Feature activation+1.04e-01

Pos logit contributions
 saw+0.109
 named+0.104
 abandoned+0.097
 nearby+0.094
 hall+0.086
Neg logit contributions
oor-0.222
 Deal-0.211
 apology-0.210
ently-0.203
anks-0.200
Token in
Token ID287
Feature activation+7.14e-02

Pos logit contributions
 hall+0.105
 witch+0.102
 cloak+0.099
 abandoned+0.094
 closet+0.094
Neg logit contributions
 apology-0.224
raw-0.218
Less-0.217
 Deal-0.216
 itself-0.213
Token a
Token ID257
Feature activation+1.09e-01

Pos logit contributions
 saw+0.108
 named+0.107
 hall+0.102
 appeared+0.099
 abandoned+0.097
Neg logit contributions
 Deal-0.129
 apology-0.127
 accomplishment-0.124
 myself-0.121
Less-0.117
Token big
Token ID1263
Feature activation+8.57e-02

Pos logit contributions
 appeared+0.174
 named+0.162
 telephone+0.142
 abandoned+0.139
 called+0.138
Neg logit contributions
Less-0.157
 myself-0.155
 Deal-0.154
 accomplishment-0.149
 mistakes-0.148
Token forest
Token ID8222
Feature activation+1.14e-01

Pos logit contributions
 appeared+0.114
 named+0.110
aver+0.109
 disturbed+0.103
 heard+0.096
Neg logit contributions
 accomplishment-0.144
 decisions-0.135
 myself-0.133
 goals-0.132
 mistakes-0.131
Token,
Token ID11
Feature activation+1.00e-01

Pos logit contributions
 closet+0.173
 sheep+0.172
 cloak+0.170
 detective+0.169
 abandoned+0.168
Neg logit contributions
ently-0.163
oor-0.161
 itself-0.158
 myself-0.152
am-0.150
Token there
Token ID612
Feature activation+9.18e-02

Pos logit contributions
 uniform+0.092
 cloak+0.087
 witch+0.083
 suit+0.080
 hall+0.078
Neg logit contributions
 itself-0.220
 ourselves-0.216
 myself-0.215
 themselves-0.213
 apology-0.207
Token was
Token ID373
Feature activation+9.58e-02

Pos logit contributions
 abandoned+0.133
 witch+0.132
 named+0.132
 sheep+0.129
 nearby+0.125
Neg logit contributions
oor-0.148
 itself-0.147
mm-0.140
Less-0.140
 themselves-0.140
Token a
Token ID257
Feature activation+8.57e-02

Pos logit contributions
 saw+0.151
 cell+0.147
 hall+0.144
itty+0.143
 suit+0.142
Neg logit contributions
Less-0.145
 myself-0.140
 itself-0.133
 Deal-0.128
 lesson-0.127
Token brave
Token ID14802
Feature activation+3.99e-02

Pos logit contributions
 appeared+0.134
 seen+0.126
 there+0.120
 saw+0.118
 heard+0.117
Neg logit contributions
 apology-0.133
 accomplishment-0.126
 Deal-0.121
Less-0.119
 lesson-0.116
Token time
Token ID640
Feature activation+9.10e-02

Pos logit contributions
 named+0.120
 saw+0.119
 there+0.112
 abandoned+0.101
 nearby+0.100
Neg logit contributions
 apology-0.172
 Deal-0.171
oor-0.168
 accomplishment-0.160
Less-0.158
Token,
Token ID11
Feature activation+1.04e-01

Pos logit contributions
 saw+0.100
 named+0.099
 nearby+0.085
 abandoned+0.085
 hall+0.077
Neg logit contributions
oor-0.222
 apology-0.213
 Deal-0.212
ently-0.205
anks-0.204
Token in
Token ID287
Feature activation+7.07e-02

Pos logit contributions
 hall+0.090
 witch+0.086
 cloak+0.086
 named+0.082
 closet+0.080
Neg logit contributions
 apology-0.234
Less-0.229
 Deal-0.227
raw-0.226
 itself-0.223
Token a
Token ID257
Feature activation+1.04e-01

Pos logit contributions
 named+0.102
 saw+0.100
 appeared+0.092
 hall+0.092
 abandoned+0.090
Neg logit contributions
 apology-0.132
 Deal-0.131
 accomplishment-0.129
 myself-0.123
Less-0.122
Token big
Token ID1263
Feature activation+8.49e-02

Pos logit contributions
 named+0.153
 appeared+0.150
itty+0.125
 called+0.125
 abandoned+0.122
Neg logit contributions
Less-0.161
 Deal-0.160
 accomplishment-0.158
 myself-0.155
 apology-0.154
Token forest
Token ID8222
Feature activation+1.14e-01

Pos logit contributions
aver+0.114
 appeared+0.113
 named+0.112
 disturbed+0.102
a+0.101
Neg logit contributions
 accomplishment-0.139
 decisions-0.125
 mistakes-0.123
 myself-0.121
 apology-0.120
Token,
Token ID11
Feature activation+1.01e-01

Pos logit contributions
 sheep+0.167
 closet+0.165
 abandoned+0.164
 cloak+0.163
 uniform+0.160
Neg logit contributions
ently-0.170
oor-0.168
iously-0.156
 itself-0.155
 myself-0.153
Token lived
Token ID5615
Feature activation+9.08e-02

Pos logit contributions
 uniform+0.085
 cloak+0.080
 witch+0.076
 suit+0.074
 costume+0.073
Neg logit contributions
 itself-0.222
 myself-0.217
 ourselves-0.217
 themselves-0.214
 apology-0.213
Token a
Token ID257
Feature activation+8.31e-02

Pos logit contributions
 saw+0.110
 named+0.109
 abandoned+0.093
 hall+0.091
 appeared+0.090
Neg logit contributions
 Deal-0.196
Less-0.195
 myself-0.194
oor-0.193
anks-0.191
Token small
Token ID1402
Feature activation+5.05e-02

Pos logit contributions
 appeared+0.126
 seen+0.115
 there+0.111
 walking+0.110
 named+0.108
Neg logit contributions
 apology-0.118
 accomplishment-0.115
Less-0.108
 Deal-0.107
 lesson-0.107
Token bunny
Token ID44915
Feature activation+8.34e-02

Pos logit contributions
 appeared+0.063
 saw+0.062
 there+0.059
eh+0.058
 heard+0.057
Neg logit contributions
der-0.094
 apology-0.089
 decisions-0.084
iously-0.082
 accomplishment-0.082
Token time
Token ID640
Feature activation+1.02e-01

Pos logit contributions
 named+0.125
 saw+0.124
 there+0.117
 nearby+0.103
 abandoned+0.102
Neg logit contributions
 apology-0.191
 Deal-0.190
oor-0.187
 accomplishment-0.178
Less-0.177
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 saw+0.118
 named+0.118
 nearby+0.105
 abandoned+0.105
 sheep+0.092
Neg logit contributions
oor-0.243
 Deal-0.228
 apology-0.226
iously-0.218
ently-0.218
Token in
Token ID287
Feature activation+7.30e-02

Pos logit contributions
 witch+0.096
 cloak+0.095
 abandoned+0.094
 hall+0.089
 laboratory+0.088
Neg logit contributions
 Deal-0.225
Less-0.225
raw-0.224
 itself-0.220
 themselves-0.219
Token a
Token ID257
Feature activation+1.07e-01

Pos logit contributions
 saw+0.107
 named+0.105
 hall+0.100
 appeared+0.098
 abandoned+0.096
Neg logit contributions
 Deal-0.134
 accomplishment-0.134
 apology-0.132
Less-0.121
 myself-0.119
Token big
Token ID1263
Feature activation+8.11e-02

Pos logit contributions
 appeared+0.154
 named+0.144
 there+0.126
 called+0.124
 telephone+0.121
Neg logit contributions
 Deal-0.176
Less-0.174
 accomplishment-0.169
 mistakes-0.165
 myself-0.164
Token forest
Token ID8222
Feature activation+1.14e-01

Pos logit contributions
 appeared+0.102
aver+0.099
 disturbed+0.098
 named+0.096
 there+0.090
Neg logit contributions
 accomplishment-0.144
 mistakes-0.129
 decisions-0.128
 apology-0.126
 delight-0.123
Token,
Token ID11
Feature activation+1.02e-01

Pos logit contributions
 closet+0.162
 abandoned+0.161
 sheep+0.160
 cloak+0.159
 detective+0.155
Neg logit contributions
ently-0.168
oor-0.165
iously-0.157
 itself-0.154
 myself-0.150
Token there
Token ID612
Feature activation+8.94e-02

Pos logit contributions
 uniform+0.105
 cloak+0.103
 suit+0.094
 robe+0.092
 witch+0.092
Neg logit contributions
 itself-0.216
 themselves-0.213
 ourselves-0.208
 myself-0.205
oor-0.193
Token was
Token ID373
Feature activation+9.54e-02

Pos logit contributions
 abandoned+0.135
 named+0.133
 witch+0.130
 nearby+0.129
 sheep+0.125
Neg logit contributions
 itself-0.139
oor-0.137
 themselves-0.134
ently-0.133
mm-0.133
Token a
Token ID257
Feature activation+8.17e-02

Pos logit contributions
 saw+0.154
itty+0.150
 cell+0.148
 named+0.146
 suit+0.145
Neg logit contributions
Less-0.136
 myself-0.128
 itself-0.126
 themselves-0.120
 lesson-0.120
Token little
Token ID1310
Feature activation+5.41e-02

Pos logit contributions
 appeared+0.125
 seen+0.121
 there+0.121
 saw+0.113
 heard+0.110
Neg logit contributions
 apology-0.126
 accomplishment-0.120
 Deal-0.114
Less-0.112
 lesson-0.108
Token time
Token ID640
Feature activation+1.03e-01

Pos logit contributions
 named+0.128
 saw+0.127
 there+0.120
 abandoned+0.110
 nearby+0.108
Neg logit contributions
 apology-0.179
 Deal-0.177
oor-0.176
 accomplishment-0.166
Less-0.164
Token,
Token ID11
Feature activation+1.06e-01

Pos logit contributions
 saw+0.120
 named+0.118
 abandoned+0.113
 nearby+0.106
 hall+0.098
Neg logit contributions
oor-0.240
 Deal-0.228
 apology-0.226
ently-0.216
anks-0.215
Token in
Token ID287
Feature activation+7.16e-02

Pos logit contributions
 abandoned+0.106
 hall+0.104
 cloak+0.103
 closet+0.097
 witch+0.096
Neg logit contributions
 apology-0.218
 Deal-0.214
Less-0.214
raw-0.213
 itself-0.206
Token a
Token ID257
Feature activation+1.07e-01

Pos logit contributions
 named+0.105
 saw+0.105
 abandoned+0.098
 hall+0.098
 appeared+0.094
Neg logit contributions
 Deal-0.132
 apology-0.130
 accomplishment-0.127
 myself-0.122
Less-0.121
Token big
Token ID1263
Feature activation+8.48e-02

Pos logit contributions
 appeared+0.161
 named+0.158
 abandoned+0.136
itty+0.132
 called+0.132
Neg logit contributions
Less-0.158
 Deal-0.156
 myself-0.155
 accomplishment-0.152
 mistakes-0.149
Token forest
Token ID8222
Feature activation+1.14e-01

Pos logit contributions
 appeared+0.111
 named+0.106
aver+0.105
 disturbed+0.102
 lurking+0.097
Neg logit contributions
 accomplishment-0.143
 decisions-0.128
 myself-0.127
 mistakes-0.127
 apology-0.126
Token,
Token ID11
Feature activation+9.94e-02

Pos logit contributions
 abandoned+0.163
 closet+0.162
 sheep+0.159
 cloak+0.155
 uniform+0.154
Neg logit contributions
ently-0.175
oor-0.173
 myself-0.163
 itself-0.162
iously-0.157
Token there
Token ID612
Feature activation+9.00e-02

Pos logit contributions
 uniform+0.086
 cloak+0.081
 suit+0.076
 abandoned+0.073
 hall+0.072
Neg logit contributions
 itself-0.218
 myself-0.213
 ourselves-0.212
 apology-0.210
 themselves-0.207
Token was
Token ID373
Feature activation+9.03e-02

Pos logit contributions
 named+0.125
 abandoned+0.124
 nearby+0.115
 sheep+0.114
 witch+0.112
Neg logit contributions
oor-0.158
ently-0.152
mm-0.152
 itself-0.151
Less-0.149
Token a
Token ID257
Feature activation+8.01e-02

Pos logit contributions
 saw+0.135
 named+0.128
 cell+0.126
itty+0.124
 abandoned+0.123
Neg logit contributions
Less-0.150
 myself-0.143
 lesson-0.136
 apology-0.135
 Deal-0.135
Token poor
Token ID3595
Feature activation+3.69e-02

Pos logit contributions
 appeared+0.122
 seen+0.114
 there+0.114
 saw+0.111
 named+0.110
Neg logit contributions
 apology-0.130
 accomplishment-0.124
 Deal-0.117
 lesson-0.114
Less-0.114
Tokenily
Token ID813
Feature activation+5.92e-02

Pos logit contributions
 named+0.046
 saw+0.046
 there+0.043
 nearby+0.037
 heard+0.037
Neg logit contributions
 apology-0.054
anks-0.054
 Deal-0.053
der-0.053
oor-0.053
Token and
Token ID290
Feature activation+5.34e-02

Pos logit contributions
 named+0.086
 saw+0.084
 there+0.084
 nearby+0.074
 hall+0.073
Neg logit contributions
 apology-0.120
der-0.114
oor-0.113
anks-0.112
 Deal-0.112
Token Sam
Token ID3409
Feature activation+6.32e-02

Pos logit contributions
 named+0.084
 saw+0.084
 hall+0.079
 abandoned+0.077
 nearby+0.077
Neg logit contributions
 apology-0.097
 Deal-0.093
Less-0.091
oor-0.090
 myself-0.088
Token went
Token ID1816
Feature activation+5.44e-02

Pos logit contributions
 named+0.083
 saw+0.078
 there+0.076
 nearby+0.067
 hall+0.066
Neg logit contributions
 apology-0.142
oor-0.138
 Deal-0.135
der-0.134
anks-0.134
Token on
Token ID319
Feature activation+6.89e-02

Pos logit contributions
 saw+0.084
 named+0.080
 there+0.073
 nearby+0.068
 abandoned+0.068
Neg logit contributions
 apology-0.112
anks-0.112
oor-0.109
 Deal-0.108
der-0.105
Token a
Token ID257
Feature activation+1.14e-01

Pos logit contributions
 saw+0.090
 named+0.088
 nearby+0.072
 there+0.071
 appeared+0.071
Neg logit contributions
oor-0.143
Less-0.142
 Deal-0.141
 accomplishment-0.141
 apology-0.141
Token saf
Token ID1932
Feature activation+6.22e-02

Pos logit contributions
 named+0.160
 appeared+0.156
 there+0.144
 saw+0.137
 sitting+0.135
Neg logit contributions
 accomplishment-0.188
Less-0.188
 apology-0.179
 lesson-0.177
oor-0.177
Tokenari
Token ID2743
Feature activation+8.59e-02

Pos logit contributions
 saw+0.113
 named+0.110
 there+0.106
 abandoned+0.095
 heard+0.094
Neg logit contributions
anks-0.112
oor-0.112
 apology-0.109
der-0.102
 Deal-0.102
Token with
Token ID351
Feature activation+6.11e-02

Pos logit contributions
 saw+0.148
 mean+0.131
 named+0.131
 appeared+0.128
 there+0.126
Neg logit contributions
anks-0.137
oor-0.131
iously-0.127
 apology-0.126
 themselves-0.125
Token their
Token ID511
Feature activation+3.55e-02

Pos logit contributions
 named+0.100
 saw+0.099
 there+0.091
 nearby+0.085
 hall+0.081
Neg logit contributions
 apology-0.122
 Deal-0.117
anks-0.116
oor-0.115
Less-0.113
Token mom
Token ID1995
Feature activation+4.41e-02

Pos logit contributions
 named+0.072
 saw+0.070
 there+0.067
 abandoned+0.062
 heard+0.062
Neg logit contributions
 apology-0.055
 Deal-0.052
Less-0.050
 accomplishment-0.050
anks-0.049
Token time
Token ID640
Feature activation+9.90e-02

Pos logit contributions
 named+0.125
 saw+0.124
 there+0.116
 abandoned+0.104
 nearby+0.103
Neg logit contributions
 apology-0.185
 Deal-0.184
oor-0.180
 accomplishment-0.171
Less-0.171
Token,
Token ID11
Feature activation+1.03e-01

Pos logit contributions
 saw+0.112
 named+0.106
 abandoned+0.097
 nearby+0.094
 heard+0.084
Neg logit contributions
oor-0.239
 Deal-0.233
 apology-0.230
anks-0.220
ently-0.220
Token in
Token ID287
Feature activation+7.65e-02

Pos logit contributions
 abandoned+0.101
 hall+0.095
 cloak+0.093
 witch+0.092
 sheep+0.089
Neg logit contributions
 Deal-0.216
 apology-0.216
Less-0.213
raw-0.212
 itself-0.205
Token a
Token ID257
Feature activation+1.10e-01

Pos logit contributions
 named+0.119
 saw+0.119
 hall+0.114
 appeared+0.111
 abandoned+0.109
Neg logit contributions
 Deal-0.132
 apology-0.126
 accomplishment-0.125
 myself-0.122
Less-0.119
Token big
Token ID1263
Feature activation+8.38e-02

Pos logit contributions
 appeared+0.166
 named+0.155
 called+0.135
 abandoned+0.133
 telephone+0.131
Neg logit contributions
 Deal-0.168
 myself-0.166
Less-0.166
 accomplishment-0.159
 apology-0.157
Token forest
Token ID8222
Feature activation+1.13e-01

Pos logit contributions
 appeared+0.114
aver+0.108
 named+0.108
 disturbed+0.105
 heard+0.099
Neg logit contributions
 accomplishment-0.139
 myself-0.129
 decisions-0.128
 mistakes-0.127
 goals-0.125
Token,
Token ID11
Feature activation+9.77e-02

Pos logit contributions
 sheep+0.151
 abandoned+0.149
 closet+0.148
 saw+0.147
 cloak+0.141
Neg logit contributions
ently-0.193
oor-0.187
 myself-0.180
 itself-0.176
iously-0.174
Token there
Token ID612
Feature activation+9.20e-02

Pos logit contributions
 uniform+0.070
 witch+0.069
 abandoned+0.067
 cloak+0.065
 sheep+0.065
Neg logit contributions
 itself-0.222
 myself-0.216
 themselves-0.216
 ourselves-0.215
 apology-0.213
Token was
Token ID373
Feature activation+9.75e-02

Pos logit contributions
 named+0.122
 abandoned+0.122
 sheep+0.117
 nearby+0.116
 witch+0.115
Neg logit contributions
oor-0.162
mm-0.158
 itself-0.157
ently-0.155
Less-0.154
Token a
Token ID257
Feature activation+8.45e-02

Pos logit contributions
 saw+0.153
 cell+0.146
itty+0.145
 abandoned+0.144
 hall+0.144
Neg logit contributions
Less-0.148
 myself-0.142
 itself-0.138
 Deal-0.131
 themselves-0.129
Token little
Token ID1310
Feature activation+5.45e-02

Pos logit contributions
 appeared+0.130
 seen+0.127
 there+0.117
 heard+0.114
 saw+0.114
Neg logit contributions
 apology-0.129
 accomplishment-0.123
 Deal-0.118
Less-0.117
 lesson-0.114
Token time
Token ID640
Feature activation+9.60e-02

Pos logit contributions
 named+0.125
 saw+0.124
 there+0.117
 abandoned+0.103
 nearby+0.103
Neg logit contributions
 apology-0.187
 Deal-0.185
oor-0.182
 accomplishment-0.173
Less-0.172
Token,
Token ID11
Feature activation+1.09e-01

Pos logit contributions
 saw+0.106
 named+0.103
 abandoned+0.090
 nearby+0.090
 hall+0.080
Neg logit contributions
oor-0.233
 Deal-0.227
 apology-0.225
ently-0.216
anks-0.215
Token in
Token ID287
Feature activation+7.27e-02

Pos logit contributions
 cloak+0.094
 hall+0.090
 named+0.087
 witch+0.085
 desert+0.083
Neg logit contributions
 apology-0.245
raw-0.241
Less-0.238
 Deal-0.235
 itself-0.233
Token a
Token ID257
Feature activation+1.06e-01

Pos logit contributions
 saw+0.105
 named+0.104
 appeared+0.095
 abandoned+0.095
 hall+0.094
Neg logit contributions
 Deal-0.134
 apology-0.134
 accomplishment-0.132
 myself-0.125
Less-0.123
Token small
Token ID1402
Feature activation+9.14e-02

Pos logit contributions
 named+0.154
 appeared+0.152
 called+0.127
 abandoned+0.126
 telephone+0.123
Neg logit contributions
Less-0.166
 Deal-0.166
 accomplishment-0.162
 myself-0.162
 mistakes-0.160
Token village
Token ID7404
Feature activation+1.13e-01

Pos logit contributions
 appeared+0.154
 named+0.154
aver+0.152
eh+0.140
 sitting+0.136
Neg logit contributions
 decisions-0.117
 mistakes-0.112
 instructions-0.109
 myself-0.106
 halves-0.105
Token,
Token ID11
Feature activation+1.02e-01

Pos logit contributions
 cloak+0.168
 flashlight+0.165
 mask+0.160
 sheep+0.156
 cowboy+0.150
Neg logit contributions
 myself-0.159
ently-0.154
 decisions-0.153
oor-0.149
 ourselves-0.149
Token there
Token ID612
Feature activation+9.30e-02

Pos logit contributions
 cloak+0.091
 uniform+0.083
 suit+0.080
 costume+0.079
 witch+0.079
Neg logit contributions
 apology-0.220
 itself-0.218
 myself-0.215
Less-0.213
 themselves-0.212
Token was
Token ID373
Feature activation+8.97e-02

Pos logit contributions
 named+0.135
 abandoned+0.129
 sheep+0.129
 witch+0.126
eh+0.126
Neg logit contributions
oor-0.156
Less-0.153
 apology-0.150
mm-0.149
ently-0.146
Token a
Token ID257
Feature activation+7.94e-02

Pos logit contributions
 saw+0.137
 cell+0.127
 named+0.126
 appeared+0.121
 hall+0.121
Neg logit contributions
Less-0.152
 apology-0.138
 myself-0.137
 lesson-0.136
oor-0.136
Token little
Token ID1310
Feature activation+5.52e-02

Pos logit contributions
 appeared+0.124
 named+0.116
 there+0.115
 seen+0.114
 saw+0.112
Neg logit contributions
 apology-0.131
 accomplishment-0.124
 Deal-0.113
 lesson-0.113
Less-0.109
Token time
Token ID640
Feature activation+9.62e-02

Pos logit contributions
 named+0.115
 saw+0.114
 there+0.107
 nearby+0.094
 abandoned+0.093
Neg logit contributions
 apology-0.180
 Deal-0.176
oor-0.175
 accomplishment-0.167
der-0.166
Token,
Token ID11
Feature activation+1.05e-01

Pos logit contributions
 named+0.109
 saw+0.108
 abandoned+0.094
 nearby+0.092
 sheep+0.082
Neg logit contributions
oor-0.231
 apology-0.223
 Deal-0.221
ently-0.211
anks-0.209
Token in
Token ID287
Feature activation+7.24e-02

Pos logit contributions
 cloak+0.098
 abandoned+0.094
 witch+0.093
 hall+0.091
 named+0.087
Neg logit contributions
Less-0.222
raw-0.221
 apology-0.221
 Deal-0.218
 itself-0.215
Token a
Token ID257
Feature activation+1.04e-01

Pos logit contributions
 saw+0.107
 named+0.102
 appeared+0.099
 hall+0.096
 abandoned+0.094
Neg logit contributions
 accomplishment-0.135
 apology-0.135
 Deal-0.132
Less-0.121
 themselves-0.119
Token big
Token ID1263
Feature activation+8.19e-02

Pos logit contributions
 appeared+0.155
 named+0.147
 called+0.126
 telephone+0.124
 abandoned+0.123
Neg logit contributions
Less-0.164
 Deal-0.163
 accomplishment-0.162
 myself-0.156
 apology-0.155
Token forest
Token ID8222
Feature activation+1.13e-01

Pos logit contributions
 appeared+0.108
 named+0.103
aver+0.102
 disturbed+0.098
 heard+0.091
Neg logit contributions
 accomplishment-0.144
 decisions-0.131
 mistakes-0.130
 goals-0.128
 apology-0.126
Token,
Token ID11
Feature activation+1.01e-01

Pos logit contributions
 cloak+0.165
 closet+0.164
 abandoned+0.164
 sheep+0.163
 mask+0.157
Neg logit contributions
ently-0.169
oor-0.169
 itself-0.155
iously-0.153
 accomplishment-0.149
Token there
Token ID612
Feature activation+8.98e-02

Pos logit contributions
 uniform+0.092
 cloak+0.088
 witch+0.082
 suit+0.081
 costume+0.078
Neg logit contributions
 itself-0.217
 themselves-0.211
 ourselves-0.207
 myself-0.207
Less-0.201
Token was
Token ID373
Feature activation+9.61e-02

Pos logit contributions
 named+0.135
 abandoned+0.135
 witch+0.132
eh+0.129
 nearby+0.128
Neg logit contributions
 itself-0.140
oor-0.140
ently-0.135
 themselves-0.135
mm-0.135
Token a
Token ID257
Feature activation+8.37e-02

Pos logit contributions
 saw+0.158
 cell+0.156
itty+0.152
 suit+0.152
 abandoned+0.149
Neg logit contributions
Less-0.131
 myself-0.124
 itself-0.123
 themselves-0.120
 lesson-0.116
Token little
Token ID1310
Feature activation+5.58e-02

Pos logit contributions
 appeared+0.138
 seen+0.132
 there+0.123
 heard+0.119
 saw+0.118
Neg logit contributions
 apology-0.118
 accomplishment-0.114
 lesson-0.105
Less-0.105
 Deal-0.105
Token new
Token ID649
Feature activation-3.60e-02

Pos logit contributions
 apology+0.054
 accomplishment+0.050
 Deal+0.049
anks+0.049
oor+0.048
Neg logit contributions
 named-0.052
 saw-0.050
 there-0.048
 abandoned-0.043
 nearby-0.042
Token friend
Token ID1545
Feature activation-2.64e-02

Pos logit contributions
 apology+0.073
 accomplishment+0.070
anks+0.068
 Deal+0.065
 decisions+0.064
Neg logit contributions
 named-0.058
 saw-0.056
 there-0.054
 nearby-0.049
 abandoned-0.048
Token just
Token ID655
Feature activation-4.73e-02

Pos logit contributions
anks+0.051
der+0.049
 apology+0.049
oor+0.048
 Deal+0.047
Neg logit contributions
 saw-0.040
 named-0.037
 hall-0.035
 abandoned-0.035
 there-0.033
Token by
Token ID416
Feature activation-2.18e-02

Pos logit contributions
 apology+0.105
anks+0.102
der+0.101
oor+0.100
iously+0.099
Neg logit contributions
 named-0.066
 saw-0.065
 there-0.058
 hall-0.055
 nearby-0.052
Token sharing
Token ID7373
Feature activation-2.50e-02

Pos logit contributions
 apology+0.044
 Deal+0.043
anks+0.043
oor+0.043
der+0.041
Neg logit contributions
 named-0.036
 saw-0.036
 there-0.034
 nearby-0.030
 looking-0.029
Token her
Token ID607
Feature activation-8.96e-02

Pos logit contributions
 apology+0.055
anks+0.053
 Deal+0.053
oor+0.052
der+0.052
Neg logit contributions
 named-0.037
 saw-0.036
 there-0.033
 nearby-0.029
 abandoned-0.028
Token smile
Token ID8212
Feature activation-6.19e-02

Pos logit contributions
 apology+0.185
anks+0.176
 Deal+0.172
oor+0.170
 accomplishment+0.168
Neg logit contributions
 named-0.149
 saw-0.148
 there-0.134
 nearby-0.121
 abandoned-0.120
Token.
Token ID13
Feature activation-3.11e-02

Pos logit contributions
anks+0.136
 apology+0.135
der+0.134
oor+0.132
 Deal+0.130
Neg logit contributions
 saw-0.089
 named-0.088
 there-0.078
 abandoned-0.070
 nearby-0.069
Token<|endoftext|>
Token ID50256
Feature activation+7.21e-02

Pos logit contributions
aned+0.082
 apology+0.081
iously+0.081
anks+0.078
 accomplishment+0.077
Neg logit contributions
 named-0.042
 saw-0.037
 there-0.034
 living-0.031
 who-0.027
TokenOne
Token ID3198
Feature activation+7.37e-02

Pos logit contributions
 named+0.139
 saw+0.138
 there+0.126
 abandoned+0.121
 nearby+0.118
Neg logit contributions
 apology-0.116
oor-0.114
anks-0.113
 Deal-0.113
Less-0.110
Token day
Token ID1110
Feature activation+7.02e-02

Pos logit contributions
 named+0.102
 saw+0.098
 abandoned+0.089
 there+0.087
 appeared+0.086
Neg logit contributions
 apology-0.153
 accomplishment-0.146
Less-0.145
oor-0.142
raw-0.140
Token that
Token ID326
Feature activation-1.18e-03

Pos logit contributions
 apology+0.039
der+0.038
oor+0.038
anks+0.038
iously+0.037
Neg logit contributions
 named-0.027
 saw-0.027
 there-0.024
 nearby-0.022
 hall-0.021
Token it
Token ID340
Feature activation-9.64e-03

Pos logit contributions
 apology+0.002
 Deal+0.002
oor+0.002
anks+0.002
der+0.002
Neg logit contributions
 named-0.002
 saw-0.002
 there-0.002
 nearby-0.002
 hall-0.001
Token produced
Token ID4635
Feature activation-1.82e-02

Pos logit contributions
 apology+0.025
oor+0.024
anks+0.024
 Deal+0.024
der+0.024
Neg logit contributions
 named-0.010
 saw-0.010
 there-0.009
 nearby-0.007
 hall-0.007
Token more
Token ID517
Feature activation-5.73e-02

Pos logit contributions
 Deal+0.046
 apology+0.045
oor+0.044
aned+0.044
anks+0.044
Neg logit contributions
 saw-0.023
 named-0.023
 there-0.022
 nearby-0.018
 hidden-0.017
Token av
Token ID1196
Feature activation-1.30e-02

Pos logit contributions
 Deal+0.128
oor+0.126
anks+0.125
aned+0.124
 apology+0.124
Neg logit contributions
 saw-0.090
 there-0.087
 named-0.084
 nearby-0.075
 looking-0.072
Tokenoc
Token ID420
Feature activation-8.89e-02

Pos logit contributions
 apology+0.031
 Deal+0.030
anks+0.029
oor+0.029
der+0.029
Neg logit contributions
 saw-0.018
 named-0.018
 there-0.016
 nearby-0.014
 looking-0.013
Tokenados
Token ID22484
Feature activation-1.79e-02

Pos logit contributions
 Deal+0.140
Less+0.123
 apology+0.117
oor+0.107
raw+0.107
Neg logit contributions
 named-0.175
 saw-0.166
 there-0.163
 looking-0.160
 nearby-0.154
Token than
Token ID621
Feature activation-2.61e-02

Pos logit contributions
 apology+0.040
anks+0.039
oor+0.039
 Deal+0.039
der+0.038
Neg logit contributions
 named-0.025
 saw-0.025
 there-0.023
 nearby-0.020
 hall-0.019
Token ever
Token ID1683
Feature activation-2.71e-02

Pos logit contributions
 apology+0.054
 Deal+0.053
anks+0.052
oor+0.051
aned+0.051
Neg logit contributions
 saw-0.044
 named-0.043
 there-0.042
 nearby-0.036
 looking-0.035
Token before
Token ID878
Feature activation-1.11e-02

Pos logit contributions
 apology+0.047
oor+0.045
der+0.045
anks+0.045
iously+0.044
Neg logit contributions
 named-0.052
 saw-0.051
 there-0.047
 hall-0.044
 nearby-0.043
Token.
Token ID13
Feature activation-4.74e-02

Pos logit contributions
 apology+0.028
 Deal+0.027
anks+0.027
oor+0.026
aned+0.026
Neg logit contributions
 saw-0.014
 named-0.013
 there-0.013
 nearby-0.011
 looking-0.010
Token better
Token ID1365
Feature activation-6.19e-02

Pos logit contributions
oor+0.100
anks+0.097
aned+0.097
 Deal+0.097
 apology+0.097
Neg logit contributions
 named-0.078
 saw-0.076
 there-0.073
 nearby-0.063
 called-0.062
Token than
Token ID621
Feature activation-3.76e-02

Pos logit contributions
 Deal+0.120
 apology+0.118
oor+0.118
aned+0.117
anks+0.115
Neg logit contributions
 saw-0.112
 named-0.112
 there-0.107
 looking-0.091
 nearby-0.090
Token selfish
Token ID20363
Feature activation-5.64e-02

Pos logit contributions
 apology+0.078
 Deal+0.076
anks+0.076
oor+0.076
der+0.073
Neg logit contributions
 named-0.060
 saw-0.060
 there-0.056
 nearby-0.049
 looking-0.047
Tokenness
Token ID1108
Feature activation-5.89e-02

Pos logit contributions
 apology+0.123
der+0.119
anks+0.118
 Deal+0.114
oor+0.114
Neg logit contributions
 named-0.087
 saw-0.084
 there-0.077
 nearby-0.070
 hall-0.068
Token and
Token ID290
Feature activation-6.02e-02

Pos logit contributions
 apology+0.129
der+0.124
anks+0.124
oor+0.122
 Deal+0.120
Neg logit contributions
 named-0.087
 saw-0.087
 there-0.077
 nearby-0.069
 hall-0.068
Token rud
Token ID28906
Feature activation-8.79e-02

Pos logit contributions
 apology+0.124
 Deal+0.118
anks+0.118
oor+0.117
der+0.116
Neg logit contributions
 named-0.098
 saw-0.097
 there-0.089
 nearby-0.080
 hall-0.078
Tokeneness
Token ID9449
Feature activation-5.83e-02

Pos logit contributions
Less+0.080
 apology+0.077
uct+0.075
 Deal+0.075
 accomplishment+0.073
Neg logit contributions
 named-0.232
 saw-0.231
 there-0.216
 heard-0.206
 outside-0.206
Token.
Token ID13
Feature activation-4.03e-02

Pos logit contributions
 apology+0.143
 Deal+0.141
oor+0.139
anks+0.138
aned+0.137
Neg logit contributions
 saw-0.071
 named-0.071
 there-0.066
 nearby-0.054
 looking-0.051
Token<|endoftext|>
Token ID50256
Feature activation+5.84e-02

Pos logit contributions
aned+0.088
iously+0.087
 accomplishment+0.084
oor+0.081
ently+0.081
Neg logit contributions
 named-0.070
 there-0.061
 saw-0.057
 who-0.051
 called-0.050
TokenTom
Token ID13787
Feature activation+2.23e-02

Pos logit contributions
 named+0.104
 saw+0.103
 abandoned+0.093
 there+0.087
 nearby+0.084
Neg logit contributions
 Deal-0.100
anks-0.100
oor-0.098
 apology-0.097
Less-0.097
Token and
Token ID290
Feature activation+1.28e-02

Pos logit contributions
 named+0.028
 there+0.026
 saw+0.025
 nearby+0.024
 abandoned+0.023
Neg logit contributions
 apology-0.049
anks-0.047
 Deal-0.046
oor-0.046
raw-0.045
Token and
Token ID290
Feature activation-3.56e-02

Pos logit contributions
 saw+0.000
 named+0.000
 there+0.000
 abandoned+0.000
 hall+0.000
Neg logit contributions
 apology-0.000
oor-0.000
anks-0.000
der-0.000
iously-0.000
Token thanked
Token ID26280
Feature activation-1.69e-02

Pos logit contributions
 apology+0.081
anks+0.078
oor+0.077
 Deal+0.077
der+0.076
Neg logit contributions
 named-0.047
 saw-0.047
 there-0.044
 nearby-0.038
 abandoned-0.036
Token them
Token ID606
Feature activation-3.42e-02

Pos logit contributions
oor+0.033
anks+0.032
 Deal+0.032
 apology+0.032
aned+0.032
Neg logit contributions
 named-0.031
 saw-0.030
 there-0.029
 nearby-0.025
 looking-0.025
Token for
Token ID329
Feature activation-1.53e-02

Pos logit contributions
 apology+0.074
 Deal+0.073
oor+0.071
aned+0.071
anks+0.071
Neg logit contributions
 named-0.053
 saw-0.051
 there-0.049
 nearby-0.042
 looking-0.041
Token sharing
Token ID7373
Feature activation-1.63e-02

Pos logit contributions
 apology+0.033
 Deal+0.032
anks+0.032
oor+0.032
der+0.031
Neg logit contributions
 named-0.024
 saw-0.024
 there-0.022
 nearby-0.019
 looking-0.018
Token their
Token ID511
Feature activation-8.78e-02

Pos logit contributions
 apology+0.036
anks+0.034
 Deal+0.034
oor+0.034
der+0.034
Neg logit contributions
 named-0.024
 saw-0.024
 there-0.022
 nearby-0.019
 hall-0.019
Token water
Token ID1660
Feature activation-6.19e-02

Pos logit contributions
 apology+0.175
 Deal+0.170
anks+0.169
oor+0.168
der+0.165
Neg logit contributions
 saw-0.149
 named-0.149
 there-0.139
 nearby-0.123
 heard-0.118
Token.
Token ID13
Feature activation-3.87e-02

Pos logit contributions
 apology+0.139
anks+0.136
oor+0.134
der+0.132
 Deal+0.131
Neg logit contributions
 saw-0.089
 named-0.087
 there-0.079
 nearby-0.068
 abandoned-0.068
Token They
Token ID1119
Feature activation-3.11e-03

Pos logit contributions
 apology+0.073
anks+0.069
oor+0.069
 Deal+0.069
aned+0.068
Neg logit contributions
 named-0.073
 saw-0.071
 there-0.068
 nearby-0.059
 hall-0.057
Token took
Token ID1718
Feature activation-2.69e-02

Pos logit contributions
 apology+0.007
 Deal+0.007
anks+0.007
oor+0.007
der+0.007
Neg logit contributions
 named-0.004
 saw-0.004
 there-0.004
 nearby-0.003
 heard-0.003
Token the
Token ID262
Feature activation-4.00e-02

Pos logit contributions
 apology+0.051
 Deal+0.049
anks+0.049
oor+0.049
der+0.048
Neg logit contributions
 saw-0.048
 named-0.048
 there-0.045
 nearby-0.040
 heard-0.039
Token to
Token ID284
Feature activation-2.73e-02

Pos logit contributions
 apology+0.090
 Deal+0.089
oor+0.088
anks+0.087
Less+0.085
Neg logit contributions
 saw-0.054
 named-0.053
 there-0.049
 nearby-0.042
 heard-0.040
Token each
Token ID1123
Feature activation-2.34e-02

Pos logit contributions
oor+0.053
 Deal+0.052
aned+0.051
anks+0.051
 apology+0.050
Neg logit contributions
 named-0.052
 saw-0.051
 there-0.047
 nearby-0.042
 looking-0.041
Token other
Token ID584
Feature activation-2.92e-02

Pos logit contributions
oor+0.048
anks+0.048
 Deal+0.048
 apology+0.047
iously+0.047
Neg logit contributions
 saw-0.039
 named-0.039
 there-0.036
 nearby-0.033
 heard-0.031
Token and
Token ID290
Feature activation-5.61e-02

Pos logit contributions
 apology+0.067
anks+0.064
 Deal+0.064
oor+0.064
der+0.063
Neg logit contributions
 named-0.040
 saw-0.040
 there-0.037
 nearby-0.032
 abandoned-0.030
Token share
Token ID2648
Feature activation-2.45e-02

Pos logit contributions
oor+0.109
 Deal+0.108
anks+0.107
 apology+0.106
aned+0.106
Neg logit contributions
 saw-0.100
 named-0.100
 there-0.093
 nearby-0.082
 heard-0.079
Token their
Token ID511
Feature activation-8.78e-02

Pos logit contributions
 Deal+0.057
anks+0.057
oor+0.057
 apology+0.057
aned+0.056
Neg logit contributions
 saw-0.033
 named-0.033
 there-0.031
 nearby-0.026
 heard-0.025
Token toys
Token ID14958
Feature activation-3.92e-02

Pos logit contributions
 apology+0.170
 Deal+0.164
anks+0.164
oor+0.163
der+0.159
Neg logit contributions
 named-0.154
 saw-0.154
 there-0.144
 nearby-0.128
 abandoned-0.123
Token.
Token ID13
Feature activation-3.17e-02

Pos logit contributions
 Deal+0.103
anks+0.100
aned+0.100
 apology+0.100
iously+0.099
Neg logit contributions
 named-0.044
 saw-0.043
 there-0.042
 nearby-0.035
 outside-0.030
Token\n
Token ID198
Feature activation-1.55e-02

Pos logit contributions
iously+0.069
 apology+0.069
aned+0.068
oor+0.068
 accomplishment+0.067
Neg logit contributions
 named-0.052
 saw-0.049
 there-0.047
 called-0.038
 nearby-0.037
Token\n
Token ID198
Feature activation-2.40e-02

Pos logit contributions
iously+0.038
anks+0.036
 Deal+0.036
der+0.036
aned+0.036
Neg logit contributions
 named-0.022
 saw-0.022
 there-0.020
 nearby-0.017
 called-0.016
TokenTom
Token ID13787
Feature activation-2.91e-02

Pos logit contributions
 apology+0.052
anks+0.051
 Deal+0.050
iously+0.050
oor+0.050
Neg logit contributions
 named-0.038
 saw-0.038
 there-0.035
 nearby-0.031
 heard-0.029
Token like
Token ID588
Feature activation-1.96e-02

Pos logit contributions
 apology+0.098
 Deal+0.094
anks+0.094
oor+0.093
der+0.091
Neg logit contributions
 named-0.078
 saw-0.078
 there-0.072
 nearby-0.065
 abandoned-0.062
Token picking
Token ID10868
Feature activation-2.83e-02

Pos logit contributions
 apology+0.041
anks+0.040
 Deal+0.040
oor+0.039
der+0.039
Neg logit contributions
 saw-0.030
 named-0.030
 there-0.027
 nearby-0.025
 heard-0.024
Token flowers
Token ID12734
Feature activation-2.50e-02

Pos logit contributions
 apology+0.057
anks+0.055
oor+0.054
 Deal+0.054
der+0.053
Neg logit contributions
 named-0.048
 saw-0.047
 there-0.043
 nearby-0.038
 abandoned-0.037
Token or
Token ID393
Feature activation-3.36e-02

Pos logit contributions
 apology+0.055
 Deal+0.053
anks+0.053
oor+0.052
aned+0.051
Neg logit contributions
 named-0.038
 saw-0.038
 there-0.036
 nearby-0.031
 looking-0.029
Token sharing
Token ID7373
Feature activation-3.61e-02

Pos logit contributions
 apology+0.069
anks+0.066
 Deal+0.066
der+0.065
oor+0.065
Neg logit contributions
 named-0.054
 saw-0.053
 there-0.048
 nearby-0.044
 abandoned-0.042
Token your
Token ID534
Feature activation-8.51e-02

Pos logit contributions
 apology+0.069
 Deal+0.069
oor+0.067
aned+0.067
anks+0.066
Neg logit contributions
 saw-0.066
 named-0.065
 there-0.065
 nearby-0.056
 looking-0.053
Token food
Token ID2057
Feature activation-6.04e-02

Pos logit contributions
 apology+0.159
 Deal+0.151
anks+0.151
oor+0.150
der+0.145
Neg logit contributions
 named-0.156
 saw-0.155
 there-0.146
 nearby-0.131
 abandoned-0.127
Token."
Token ID526
Feature activation-3.54e-02

Pos logit contributions
 apology+0.104
 Deal+0.102
Less+0.097
aned+0.097
oor+0.097
Neg logit contributions
 named-0.121
 there-0.121
 saw-0.120
 nearby-0.107
 outside-0.102
Token The
Token ID383
Feature activation-1.77e-02

Pos logit contributions
 apology+0.084
iously+0.082
aned+0.081
oor+0.081
 Deal+0.081
Neg logit contributions
 named-0.056
 saw-0.050
 there-0.045
 called-0.038
 who-0.036
Token rabbit
Token ID22746
Feature activation+1.33e-02

Pos logit contributions
 apology+0.040
Less+0.036
 Deal+0.036
 accomplishment+0.035
der+0.035
Neg logit contributions
 named-0.026
 saw-0.025
 there-0.025
 hall-0.022
 abandoned-0.021
Token nodded
Token ID14464
Feature activation-1.89e-02

Pos logit contributions
 named+0.020
 saw+0.020
 there+0.018
 nearby+0.016
 abandoned+0.016
Neg logit contributions
 apology-0.029
anks-0.028
 Deal-0.028
oor-0.028
der-0.027
Token someone
Token ID2130
Feature activation-2.93e-02

Pos logit contributions
 apology+0.062
 Deal+0.060
oor+0.060
anks+0.059
der+0.058
Neg logit contributions
 saw-0.043
 named-0.043
 there-0.040
 nearby-0.035
 looking-0.033
Token else
Token ID2073
Feature activation-5.56e-02

Pos logit contributions
 apology+0.060
anks+0.057
 Deal+0.056
oor+0.056
der+0.055
Neg logit contributions
 named-0.048
 saw-0.047
 there-0.043
 nearby-0.038
 hall-0.038
Token happy
Token ID3772
Feature activation-4.51e-02

Pos logit contributions
 apology+0.128
 Deal+0.126
oor+0.125
anks+0.124
aned+0.122
Neg logit contributions
 saw-0.078
 named-0.078
 there-0.073
 nearby-0.062
 looking-0.060
Token by
Token ID416
Feature activation-3.13e-02

Pos logit contributions
 apology+0.104
anks+0.100
 Deal+0.100
oor+0.100
der+0.098
Neg logit contributions
 named-0.062
 saw-0.062
 there-0.057
 nearby-0.049
 abandoned-0.047
Token sharing
Token ID7373
Feature activation-2.56e-02

Pos logit contributions
 apology+0.065
 Deal+0.062
anks+0.062
oor+0.062
der+0.061
Neg logit contributions
 saw-0.050
 named-0.050
 there-0.046
 nearby-0.041
 heard-0.039
Token her
Token ID607
Feature activation-8.51e-02

Pos logit contributions
 apology+0.059
 Deal+0.057
anks+0.057
oor+0.056
der+0.055
Neg logit contributions
 named-0.036
 saw-0.035
 there-0.032
 nearby-0.028
 abandoned-0.027
Token sweets
Token ID42402
Feature activation-3.74e-02

Pos logit contributions
 Deal+0.175
aned+0.171
oor+0.170
 apology+0.168
anks+0.168
Neg logit contributions
 named-0.141
 saw-0.140
 there-0.135
 nearby-0.116
 looking-0.115
Token.
Token ID13
Feature activation-3.58e-02

Pos logit contributions
 apology+0.093
 Deal+0.091
anks+0.089
oor+0.089
aned+0.087
Neg logit contributions
 named-0.046
 saw-0.046
 there-0.043
 nearby-0.036
 looking-0.034
Token From
Token ID3574
Feature activation+7.14e-03

Pos logit contributions
 apology+0.082
aned+0.080
iously+0.080
anks+0.079
ently+0.077
Neg logit contributions
 named-0.061
 saw-0.056
 there-0.052
 called-0.044
 living-0.042
Token that
Token ID326
Feature activation+2.27e-02

Pos logit contributions
 saw+0.012
 named+0.012
 hall+0.011
 appeared+0.011
 there+0.011
Neg logit contributions
 apology-0.012
 accomplishment-0.012
der-0.012
iously-0.012
oor-0.012
Token day
Token ID1110
Feature activation+8.63e-03

Pos logit contributions
 named+0.032
 saw+0.032
 there+0.029
 abandoned+0.027
 appeared+0.026
Neg logit contributions
 apology-0.048
Less-0.047
 accomplishment-0.047
oor-0.046
 Deal-0.045
Token to
Token ID284
Feature activation-3.45e-02

Pos logit contributions
 apology+0.073
 Deal+0.071
anks+0.070
oor+0.070
der+0.068
Neg logit contributions
 saw-0.047
 named-0.047
 there-0.043
 nearby-0.037
 heard-0.036
Token our
Token ID674
Feature activation-6.64e-02

Pos logit contributions
aned+0.075
oor+0.073
anks+0.071
iously+0.070
der+0.068
Neg logit contributions
 named-0.061
 saw-0.057
 there-0.053
 who-0.049
 looking-0.048
Token friends
Token ID2460
Feature activation-2.74e-02

Pos logit contributions
oor+0.113
 Deal+0.111
aned+0.110
anks+0.107
Less+0.107
Neg logit contributions
 saw-0.129
 named-0.128
 there-0.121
 nearby-0.109
 hall-0.109
Token and
Token ID290
Feature activation-5.83e-02

Pos logit contributions
 apology+0.067
 Deal+0.065
anks+0.065
oor+0.064
aned+0.063
Neg logit contributions
 named-0.035
 saw-0.035
 there-0.032
 nearby-0.027
 heard-0.025
Token respect
Token ID2461
Feature activation-3.53e-02

Pos logit contributions
oor+0.103
 apology+0.102
 Deal+0.101
anks+0.101
aned+0.099
Neg logit contributions
 saw-0.113
 named-0.112
 there-0.105
 nearby-0.094
 heard-0.092
Token their
Token ID511
Feature activation-8.51e-02

Pos logit contributions
oor+0.072
 Deal+0.071
aned+0.071
anks+0.071
 apology+0.070
Neg logit contributions
 named-0.061
 saw-0.060
 there-0.057
 nearby-0.050
 called-0.048
Token style
Token ID3918
Feature activation-5.97e-02

Pos logit contributions
oor+0.173
 Deal+0.171
anks+0.168
aned+0.168
 apology+0.167
Neg logit contributions
 saw-0.139
 named-0.137
 there-0.128
 nearby-0.112
 hall-0.110
Token.
Token ID13
Feature activation-4.16e-02

Pos logit contributions
 Deal+0.152
 apology+0.149
anks+0.145
oor+0.145
Less+0.145
Neg logit contributions
 named-0.070
 saw-0.066
 there-0.065
 nearby-0.053
 called-0.048
Token We
Token ID775
Feature activation+4.62e-04

Pos logit contributions
aned+0.087
iously+0.085
 accomplishment+0.084
ently+0.081
 apology+0.080
Neg logit contributions
 named-0.077
 there-0.068
 saw-0.065
 who-0.057
 called-0.057
Token should
Token ID815
Feature activation-2.45e-02

Pos logit contributions
 named+0.001
 saw+0.001
 there+0.001
 nearby+0.001
 heard+0.001
Neg logit contributions
 apology-0.001
 Deal-0.001
anks-0.001
oor-0.001
der-0.001
Token say
Token ID910
Feature activation-3.07e-02

Pos logit contributions
anks+0.046
 apology+0.045
oor+0.045
iously+0.045
 Deal+0.045
Neg logit contributions
 named-0.045
 saw-0.044
 there-0.041
 nearby-0.036
 looking-0.036
Token with
Token ID351
Feature activation-3.36e-02

Pos logit contributions
 apology+0.105
anks+0.101
 Deal+0.101
oor+0.100
der+0.098
Neg logit contributions
 named-0.080
 saw-0.080
 there-0.074
 nearby-0.066
 abandoned-0.063
Token your
Token ID534
Feature activation-7.10e-02

Pos logit contributions
 apology+0.059
anks+0.058
 Deal+0.057
oor+0.057
der+0.055
Neg logit contributions
 named-0.064
 saw-0.064
 there-0.060
 nearby-0.054
 looking-0.053
Token heart
Token ID2612
Feature activation-5.01e-02

Pos logit contributions
oor+0.118
 apology+0.117
 Deal+0.117
anks+0.116
aned+0.115
Neg logit contributions
 saw-0.141
 named-0.140
 there-0.132
 nearby-0.119
 looking-0.117
Token and
Token ID290
Feature activation-5.20e-02

Pos logit contributions
 apology+0.127
 Deal+0.124
anks+0.123
oor+0.123
der+0.121
Neg logit contributions
 named-0.057
 saw-0.057
 there-0.051
 nearby-0.042
 heard-0.039
Token share
Token ID2648
Feature activation-2.73e-02

Pos logit contributions
 apology+0.085
 Deal+0.081
anks+0.081
oor+0.081
der+0.078
Neg logit contributions
 saw-0.107
 named-0.106
 there-0.101
 nearby-0.091
 looking-0.088
Token your
Token ID534
Feature activation-8.50e-02

Pos logit contributions
 apology+0.062
 Deal+0.061
anks+0.060
oor+0.060
aned+0.059
Neg logit contributions
 named-0.039
 saw-0.039
 there-0.036
 nearby-0.031
 heard-0.029
Token voice
Token ID3809
Feature activation-5.22e-02

Pos logit contributions
oor+0.141
anks+0.141
der+0.140
 Deal+0.140
aned+0.140
Neg logit contributions
 saw-0.168
 named-0.168
 there-0.160
 nearby-0.143
 looking-0.141
Token.
Token ID13
Feature activation-4.77e-02

Pos logit contributions
 apology+0.124
anks+0.120
 Deal+0.120
oor+0.120
der+0.117
Neg logit contributions
 saw-0.068
 named-0.068
 there-0.061
 nearby-0.053
 abandoned-0.050
Token<|endoftext|>
Token ID50256
Feature activation+5.91e-02

Pos logit contributions
iously+0.105
aned+0.104
 accomplishment+0.102
 apology+0.098
ently+0.098
Neg logit contributions
 named-0.081
 there-0.070
 saw-0.067
 called-0.059
 who-0.058
TokenM
Token ID44
Feature activation+1.22e-02

Pos logit contributions
 named+0.102
 saw+0.101
 abandoned+0.089
 there+0.086
 nearby+0.082
Neg logit contributions
anks-0.105
 Deal-0.105
 apology-0.104
oor-0.103
Less-0.102
Tokenia
Token ID544
Feature activation+3.15e-02

Pos logit contributions
 named+0.025
 saw+0.025
 there+0.024
 nearby+0.021
 heard+0.021
Neg logit contributions
 apology-0.020
anks-0.019
 Deal-0.019
oor-0.019
der-0.019
Token\n
Token ID198
Feature activation-1.15e-02

Pos logit contributions
 apology+0.049
anks+0.049
oor+0.048
aned+0.048
 Deal+0.047
Neg logit contributions
 named-0.038
 saw-0.037
 there-0.035
 nearby-0.029
 heard-0.029
Token\n
Token ID198
Feature activation-1.67e-02

Pos logit contributions
anks+0.027
iously+0.027
 apology+0.026
oor+0.026
 Deal+0.026
Neg logit contributions
 named-0.017
 saw-0.017
 there-0.016
 nearby-0.014
 heard-0.013
TokenThey
Token ID2990
Feature activation+1.52e-02

Pos logit contributions
 apology+0.027
 Deal+0.026
Less+0.025
oor+0.025
der+0.025
Neg logit contributions
 saw-0.034
 named-0.034
 there-0.032
 abandoned-0.029
 appeared-0.029
Token stopped
Token ID5025
Feature activation-1.20e-02

Pos logit contributions
 named+0.019
 there+0.019
 abandoned+0.018
 saw+0.018
 hall+0.017
Neg logit contributions
 apology-0.032
der-0.031
aned-0.030
Less-0.030
 Deal-0.030
Token sharing
Token ID7373
Feature activation-6.99e-03

Pos logit contributions
 apology+0.030
oor+0.029
 Deal+0.029
anks+0.029
aned+0.028
Neg logit contributions
 named-0.015
 saw-0.015
 there-0.013
 nearby-0.011
 looking-0.011
Token their
Token ID511
Feature activation-8.46e-02

Pos logit contributions
 apology+0.016
 Deal+0.016
anks+0.016
oor+0.015
der+0.015
Neg logit contributions
 named-0.010
 saw-0.010
 there-0.009
 nearby-0.008
 abandoned-0.007
Token toys
Token ID14958
Feature activation-2.29e-02

Pos logit contributions
 apology+0.183
anks+0.175
 Deal+0.175
oor+0.175
der+0.170
Neg logit contributions
 named-0.132
 saw-0.130
 there-0.120
 nearby-0.106
 abandoned-0.101
Token and
Token ID290
Feature activation-3.74e-02

Pos logit contributions
 apology+0.056
 Deal+0.055
anks+0.054
oor+0.054
aned+0.053
Neg logit contributions
 saw-0.029
 named-0.029
 there-0.027
 nearby-0.023
 looking-0.021
Token began
Token ID2540
Feature activation-6.78e-03

Pos logit contributions
 apology+0.076
anks+0.071
der+0.071
 Deal+0.070
oor+0.070
Neg logit contributions
 named-0.060
 saw-0.059
 there-0.056
 nearby-0.051
 abandoned-0.049
Token to
Token ID284
Feature activation-2.12e-02

Pos logit contributions
 apology+0.014
anks+0.013
oor+0.013
 accomplishment+0.013
der+0.013
Neg logit contributions
 named-0.011
 saw-0.011
 there-0.010
 nearby-0.009
 abandoned-0.009
Token snatch
Token ID43388
Feature activation-9.88e-03

Pos logit contributions
 apology+0.040
 Deal+0.036
 accomplishment+0.036
anks+0.035
oor+0.035
Neg logit contributions
 named-0.038
 saw-0.037
 there-0.036
 nearby-0.032
 abandoned-0.032
Token and
Token ID290
Feature activation-5.25e-02

Pos logit contributions
 apology+0.129
anks+0.124
 Deal+0.124
oor+0.123
der+0.121
Neg logit contributions
 named-0.087
 saw-0.087
 there-0.080
 nearby-0.071
 hall-0.067
Token a
Token ID257
Feature activation-2.20e-02

Pos logit contributions
 apology+0.104
oor+0.103
anks+0.102
 Deal+0.102
aned+0.101
Neg logit contributions
 saw-0.089
 named-0.088
 there-0.085
 nearby-0.073
 looking-0.071
Token way
Token ID835
Feature activation-3.88e-02

Pos logit contributions
 apology+0.044
 Deal+0.042
oor+0.041
 accomplishment+0.041
anks+0.040
Neg logit contributions
 named-0.037
 saw-0.037
 there-0.035
 nearby-0.030
 abandoned-0.030
Token to
Token ID284
Feature activation-3.86e-02

Pos logit contributions
 apology+0.090
 Deal+0.088
anks+0.086
oor+0.086
der+0.085
Neg logit contributions
 saw-0.054
 named-0.053
 there-0.050
 nearby-0.042
 heard-0.040
Token share
Token ID2648
Feature activation-2.65e-02

Pos logit contributions
oor+0.075
iously+0.075
 Deal+0.075
aned+0.075
 apology+0.074
Neg logit contributions
 saw-0.067
 named-0.066
 there-0.062
 nearby-0.055
 heard-0.054
Token your
Token ID534
Feature activation-8.43e-02

Pos logit contributions
 apology+0.062
 Deal+0.062
anks+0.061
oor+0.061
aned+0.060
Neg logit contributions
 saw-0.035
 named-0.035
 there-0.034
 nearby-0.028
 looking-0.026
Token ideas
Token ID4213
Feature activation-4.81e-02

Pos logit contributions
 apology+0.150
oor+0.148
 Deal+0.148
der+0.147
anks+0.147
Neg logit contributions
 saw-0.159
 named-0.157
 there-0.149
 nearby-0.133
 looking-0.130
Token and
Token ID290
Feature activation-6.39e-02

Pos logit contributions
 apology+0.129
 Deal+0.127
anks+0.124
aned+0.124
iously+0.124
Neg logit contributions
 there-0.051
 saw-0.049
 named-0.049
 nearby-0.039
 outside-0.036
Token feelings
Token ID7666
Feature activation-5.28e-02

Pos logit contributions
 apology+0.105
oor+0.105
 Deal+0.104
anks+0.103
aned+0.102
Neg logit contributions
 saw-0.130
 named-0.128
 there-0.122
 nearby-0.109
 looking-0.107
Token.
Token ID13
Feature activation-4.13e-02

Pos logit contributions
 apology+0.139
 Deal+0.138
aned+0.136
iously+0.135
oor+0.135
Neg logit contributions
 named-0.056
 saw-0.055
 there-0.055
 nearby-0.043
 living-0.039
Token If
Token ID1002
Feature activation+4.55e-03

Pos logit contributions
iously+0.098
aned+0.098
 accomplishment+0.093
ently+0.091
oor+0.089
Neg logit contributions
 named-0.066
 there-0.057
 saw-0.052
 who-0.048
 called-0.047
Token should
Token ID815
Feature activation-2.60e-02

Pos logit contributions
 apology+0.033
 Deal+0.032
anks+0.032
oor+0.032
der+0.032
Neg logit contributions
 named-0.017
 saw-0.017
 there-0.015
 nearby-0.013
 abandoned-0.012
Token always
Token ID1464
Feature activation-3.38e-02

Pos logit contributions
 apology+0.053
anks+0.052
 Deal+0.051
oor+0.051
der+0.050
Neg logit contributions
 named-0.043
 saw-0.042
 there-0.039
 nearby-0.035
 looking-0.033
Token respect
Token ID2461
Feature activation-3.58e-02

Pos logit contributions
 apology+0.061
oor+0.061
anks+0.061
der+0.060
 Deal+0.059
Neg logit contributions
 named-0.063
 saw-0.063
 there-0.059
 nearby-0.053
 looking-0.052
Token and
Token ID290
Feature activation-6.31e-02

Pos logit contributions
oor+0.078
 apology+0.077
 Deal+0.077
anks+0.076
aned+0.075
Neg logit contributions
 named-0.056
 saw-0.055
 there-0.052
 nearby-0.045
 called-0.043
Token share
Token ID2648
Feature activation-3.41e-02

Pos logit contributions
oor+0.117
 Deal+0.116
 apology+0.116
anks+0.115
aned+0.113
Neg logit contributions
 saw-0.117
 named-0.116
 there-0.110
 nearby-0.097
 looking-0.095
Token their
Token ID511
Feature activation-8.39e-02

Pos logit contributions
 apology+0.078
 Deal+0.077
oor+0.077
anks+0.077
aned+0.076
Neg logit contributions
 saw-0.048
 named-0.048
 there-0.045
 nearby-0.038
 heard-0.036
Token things
Token ID1243
Feature activation-4.89e-02

Pos logit contributions
anks+0.171
oor+0.170
aned+0.169
der+0.168
 Deal+0.166
Neg logit contributions
 saw-0.139
 named-0.135
 there-0.131
 nearby-0.115
 living-0.113
Token and
Token ID290
Feature activation-6.47e-02

Pos logit contributions
 Deal+0.126
 apology+0.126
anks+0.124
aned+0.122
oor+0.122
Neg logit contributions
 named-0.054
 saw-0.053
 there-0.053
 nearby-0.042
 hidden-0.039
Token not
Token ID407
Feature activation-2.75e-02

Pos logit contributions
oor+0.125
 Deal+0.123
aned+0.120
anks+0.119
 apology+0.118
Neg logit contributions
 saw-0.121
 named-0.118
 there-0.113
 nearby-0.098
 looking-0.097
Token hurt
Token ID5938
Feature activation-5.55e-02

Pos logit contributions
 apology+0.057
 Deal+0.055
anks+0.055
oor+0.054
der+0.053
Neg logit contributions
 named-0.045
 saw-0.045
 there-0.041
 nearby-0.037
 hall-0.035
Token each
Token ID1123
Feature activation-4.20e-02

Pos logit contributions
 apology+0.100
 Deal+0.096
anks+0.095
oor+0.095
der+0.092
Neg logit contributions
 named-0.105
 saw-0.105
 there-0.099
 nearby-0.089
 abandoned-0.086
Token learn
Token ID2193
Feature activation-1.24e-02

Pos logit contributions
 apology+0.037
 Deal+0.035
anks+0.035
oor+0.035
der+0.034
Neg logit contributions
 saw-0.041
 named-0.041
 there-0.039
 nearby-0.035
 abandoned-0.034
Token more
Token ID517
Feature activation-5.29e-02

Pos logit contributions
 Deal+0.027
oor+0.027
 apology+0.027
anks+0.027
aned+0.026
Neg logit contributions
 named-0.019
 saw-0.018
 there-0.018
 nearby-0.015
 called-0.014
Token about
Token ID546
Feature activation-3.42e-02

Pos logit contributions
 apology+0.104
 Deal+0.102
anks+0.101
oor+0.101
aned+0.098
Neg logit contributions
 named-0.091
 saw-0.090
 there-0.085
 nearby-0.076
 looking-0.072
Token rh
Token ID9529
Feature activation-3.47e-02

Pos logit contributions
 apology+0.071
anks+0.068
 Deal+0.068
oor+0.067
der+0.067
Neg logit contributions
 saw-0.054
 named-0.054
 there-0.050
 nearby-0.044
 heard-0.043
Tokenin
Token ID259
Feature activation-5.04e-02

Pos logit contributions
 apology+0.061
anks+0.058
oor+0.057
der+0.056
ently+0.056
Neg logit contributions
 named-0.067
 saw-0.066
 there-0.061
 abandoned-0.056
 hall-0.055
Tokenoc
Token ID420
Feature activation-8.39e-02

Pos logit contributions
 apology+0.104
anks+0.100
oor+0.099
der+0.098
iously+0.097
Neg logit contributions
 named-0.080
 saw-0.080
 there-0.073
 abandoned-0.065
 nearby-0.064
Tokener
Token ID263
Feature activation-4.57e-02

Pos logit contributions
 Deal+0.151
Less+0.141
 apology+0.135
uct+0.130
raw+0.129
Neg logit contributions
 named-0.155
 saw-0.145
 there-0.135
 looking-0.130
 nearby-0.129
Tokenoses
Token ID4629
Feature activation-3.49e-02

Pos logit contributions
 apology+0.101
anks+0.100
der+0.098
oor+0.097
iously+0.095
Neg logit contributions
 saw-0.065
 named-0.064
 there-0.057
 nearby-0.051
 abandoned-0.050
Token and
Token ID290
Feature activation-4.92e-02

Pos logit contributions
 apology+0.077
anks+0.074
der+0.074
oor+0.073
iously+0.071
Neg logit contributions
 named-0.049
 saw-0.049
 there-0.044
 abandoned-0.039
 hall-0.039
Token respect
Token ID2461
Feature activation-4.06e-02

Pos logit contributions
 apology+0.099
 Deal+0.095
anks+0.095
oor+0.095
der+0.093
Neg logit contributions
 named-0.082
 saw-0.082
 there-0.076
 nearby-0.068
 abandoned-0.065
Token them
Token ID606
Feature activation-6.46e-02

Pos logit contributions
 apology+0.086
 Deal+0.085
oor+0.084
anks+0.084
aned+0.082
Neg logit contributions
 named-0.064
 saw-0.063
 there-0.059
 nearby-0.052
 heard-0.049
Token Both
Token ID5747
Feature activation-1.73e-02

Pos logit contributions
 apology+0.112
iously+0.111
aned+0.108
der+0.107
oor+0.106
Neg logit contributions
 named-0.052
 saw-0.050
 there-0.049
 living-0.037
 hall-0.036
Token of
Token ID286
Feature activation-9.56e-03

Pos logit contributions
 apology+0.037
anks+0.036
 Deal+0.036
oor+0.036
der+0.035
Neg logit contributions
 saw-0.027
 named-0.027
 there-0.025
 nearby-0.022
 heard-0.021
Token them
Token ID606
Feature activation-2.83e-03

Pos logit contributions
 apology+0.016
anks+0.015
oor+0.015
 Deal+0.015
der+0.014
Neg logit contributions
 named-0.020
 saw-0.020
 there-0.019
 nearby-0.017
 heard-0.016
Token put
Token ID1234
Feature activation-1.52e-02

Pos logit contributions
 apology+0.006
oor+0.006
der+0.006
Less+0.006
iously+0.006
Neg logit contributions
 named-0.004
 there-0.004
 saw-0.004
 abandoned-0.003
 hall-0.003
Token their
Token ID511
Feature activation-6.48e-02

Pos logit contributions
 apology+0.031
 Deal+0.030
anks+0.030
oor+0.030
der+0.029
Neg logit contributions
 named-0.025
 saw-0.025
 there-0.023
 nearby-0.020
 abandoned-0.019
Token anx
Token ID7296
Feature activation-8.38e-02

Pos logit contributions
anks+0.142
der+0.142
oor+0.141
 apology+0.139
 Deal+0.138
Neg logit contributions
 saw-0.102
 named-0.097
 there-0.089
 looking-0.080
 nearby-0.080
Tokenieties
Token ID9545
Feature activation-3.23e-02

Pos logit contributions
Less+0.078
 Deal+0.075
oor+0.065
der+0.063
Ever+0.059
Neg logit contributions
 named-0.229
 saw-0.207
 looking-0.205
 called-0.197
 sitting-0.193
Token aside
Token ID7263
Feature activation-3.02e-02

Pos logit contributions
 apology+0.071
oor+0.069
anks+0.069
 Deal+0.069
der+0.068
Neg logit contributions
 named-0.047
 saw-0.046
 there-0.042
 nearby-0.036
 abandoned-0.035
Token.
Token ID13
Feature activation-4.33e-02

Pos logit contributions
 apology+0.068
oor+0.067
anks+0.066
der+0.066
 Deal+0.065
Neg logit contributions
 named-0.042
 saw-0.041
 there-0.036
 hall-0.032
 nearby-0.032
Token From
Token ID3574
Feature activation+4.50e-03

Pos logit contributions
 apology+0.106
iously+0.106
aned+0.102
der+0.102
anks+0.101
Neg logit contributions
 named-0.059
 there-0.056
 saw-0.055
 living-0.042
 called-0.041
Token then
Token ID788
Feature activation-5.64e-04

Pos logit contributions
 saw+0.008
 named+0.008
 hall+0.007
 there+0.007
 heard+0.007
Neg logit contributions
 apology-0.007
 accomplishment-0.007
iously-0.007
der-0.007
oor-0.007
Token We
Token ID775
Feature activation+1.24e-03

Pos logit contributions
 apology+0.071
oor+0.068
 accomplishment+0.068
aned+0.068
anks+0.067
Neg logit contributions
 named-0.087
 saw-0.082
 there-0.080
 nearby-0.070
 called-0.069
Token have
Token ID423
Feature activation-7.98e-03

Pos logit contributions
 named+0.002
 saw+0.002
 there+0.002
 abandoned+0.002
 nearby+0.002
Neg logit contributions
 apology-0.002
anks-0.002
oor-0.002
 Deal-0.002
der-0.002
Token different
Token ID1180
Feature activation-5.41e-02

Pos logit contributions
 apology+0.018
 Deal+0.018
oor+0.017
anks+0.017
aned+0.017
Neg logit contributions
 saw-0.012
 named-0.012
 there-0.011
 nearby-0.009
 heard-0.009
Token likes
Token ID7832
Feature activation-3.97e-02

Pos logit contributions
 apology+0.104
anks+0.099
 Deal+0.097
der+0.097
oor+0.096
Neg logit contributions
 named-0.096
 saw-0.095
 there-0.089
 nearby-0.081
 abandoned-0.078
Token and
Token ID290
Feature activation-4.47e-02

Pos logit contributions
 apology+0.079
anks+0.076
 Deal+0.075
oor+0.075
der+0.074
Neg logit contributions
 saw-0.066
 named-0.066
 there-0.061
 nearby-0.055
 abandoned-0.053
Token disl
Token ID19621
Feature activation-8.38e-02

Pos logit contributions
 apology+0.081
anks+0.076
 Deal+0.076
oor+0.075
der+0.075
Neg logit contributions
 named-0.084
 saw-0.083
 there-0.078
 nearby-0.072
 abandoned-0.068
Tokenikes
Token ID7938
Feature activation-2.45e-02

Pos logit contributions
 apology+0.132
 Deal+0.127
 accomplishment+0.124
Less+0.123
raw+0.122
Neg logit contributions
 saw-0.167
 there-0.164
 outside-0.156
 named-0.151
 liked-0.149
Token.
Token ID13
Feature activation-4.08e-02

Pos logit contributions
 apology+0.057
 Deal+0.055
anks+0.055
oor+0.055
iously+0.054
Neg logit contributions
 saw-0.033
 named-0.033
 there-0.031
 nearby-0.026
 heard-0.025
Token We
Token ID775
Feature activation+2.74e-03

Pos logit contributions
 apology+0.072
 accomplishment+0.072
aned+0.072
iously+0.071
oor+0.071
Neg logit contributions
 named-0.081
 saw-0.074
 there-0.073
 called-0.064
 who-0.062
Token can
Token ID460
Feature activation-3.76e-02

Pos logit contributions
 named+0.004
 saw+0.004
 there+0.004
 abandoned+0.004
 nearby+0.004
Neg logit contributions
 apology-0.005
anks-0.005
 Deal-0.005
oor-0.005
der-0.005
Token be
Token ID307
Feature activation-2.63e-02

Pos logit contributions
 apology+0.078
oor+0.077
 Deal+0.077
anks+0.076
iously+0.076
Neg logit contributions
 saw-0.061
 named-0.061
 there-0.056
 nearby-0.050
 called-0.047
Token and
Token ID290
Feature activation-3.97e-02

Pos logit contributions
der+0.081
 apology+0.080
anks+0.078
oor+0.076
iously+0.076
Neg logit contributions
 named-0.057
 saw-0.055
 there-0.047
 nearby-0.045
 hall-0.044
Token apologized
Token ID22893
Feature activation-2.70e-02

Pos logit contributions
 apology+0.087
der+0.082
anks+0.082
 Deal+0.080
oor+0.080
Neg logit contributions
 named-0.059
 saw-0.057
 there-0.053
 nearby-0.047
 hall-0.047
Token for
Token ID329
Feature activation-1.32e-02

Pos logit contributions
 apology+0.063
 Deal+0.062
oor+0.061
anks+0.061
aned+0.060
Neg logit contributions
 saw-0.037
 named-0.036
 there-0.035
 nearby-0.029
 heard-0.027
Token not
Token ID407
Feature activation-1.69e-02

Pos logit contributions
 apology+0.026
anks+0.026
 Deal+0.026
oor+0.026
der+0.025
Neg logit contributions
 named-0.022
 saw-0.022
 there-0.021
 nearby-0.018
 looking-0.018
Token sharing
Token ID7373
Feature activation-3.38e-02

Pos logit contributions
 apology+0.034
anks+0.033
 Deal+0.032
oor+0.032
der+0.032
Neg logit contributions
 named-0.028
 saw-0.028
 there-0.026
 nearby-0.023
 hall-0.023
Token her
Token ID607
Feature activation-8.37e-02

Pos logit contributions
 apology+0.081
 Deal+0.080
oor+0.079
anks+0.078
aned+0.077
Neg logit contributions
 saw-0.044
 named-0.044
 there-0.042
 nearby-0.035
 heard-0.033
Token toys
Token ID14958
Feature activation-4.93e-02

Pos logit contributions
 Deal+0.183
oor+0.183
 apology+0.179
aned+0.178
anks+0.177
Neg logit contributions
 saw-0.123
 named-0.122
 there-0.118
 nearby-0.098
 looking-0.096
Token.
Token ID13
Feature activation-3.22e-02

Pos logit contributions
 apology+0.121
 Deal+0.120
anks+0.117
oor+0.116
aned+0.115
Neg logit contributions
 saw-0.061
 named-0.060
 there-0.059
 nearby-0.049
 looking-0.045
Token The
Token ID383
Feature activation-1.07e-02

Pos logit contributions
 apology+0.075
 Deal+0.073
iously+0.072
raw+0.071
oor+0.071
Neg logit contributions
 named-0.049
 saw-0.048
 there-0.042
 called-0.036
 hall-0.036
Token little
Token ID1310
Feature activation-4.43e-03

Pos logit contributions
 apology+0.023
 accomplishment+0.022
 Deal+0.021
Less+0.021
der+0.020
Neg logit contributions
 named-0.017
 saw-0.017
 there-0.015
 appeared-0.014
 abandoned-0.014
Token girl
Token ID2576
Feature activation+1.67e-02

Pos logit contributions
 apology+0.010
 accomplishment+0.009
 decisions+0.009
 mistakes+0.009
 choices+0.009
Neg logit contributions
 named-0.006
 saw-0.006
eh-0.005
 nearby-0.005
 there-0.005
Token kind
Token ID1611
Feature activation-5.50e-02

Pos logit contributions
oor+0.070
anks+0.069
 apology+0.068
iously+0.067
aned+0.067
Neg logit contributions
 named-0.052
 saw-0.051
 there-0.049
 nearby-0.042
 looking-0.039
Token to
Token ID284
Feature activation-3.27e-02

Pos logit contributions
 apology+0.123
anks+0.119
der+0.117
oor+0.117
 Deal+0.116
Neg logit contributions
 named-0.080
 saw-0.079
 there-0.072
 nearby-0.063
 hall-0.062
Token animals
Token ID4695
Feature activation-2.61e-02

Pos logit contributions
oor+0.077
 Deal+0.077
aned+0.075
 apology+0.075
anks+0.075
Neg logit contributions
 named-0.048
 saw-0.048
 there-0.043
 nearby-0.037
 looking-0.036
Token and
Token ID290
Feature activation-5.50e-02

Pos logit contributions
 apology+0.058
anks+0.056
oor+0.056
der+0.056
 Deal+0.055
Neg logit contributions
 saw-0.037
 named-0.037
 there-0.033
 abandoned-0.029
 hall-0.029
Token respect
Token ID2461
Feature activation-3.09e-02

Pos logit contributions
 apology+0.106
 Deal+0.103
oor+0.102
anks+0.102
der+0.099
Neg logit contributions
 saw-0.097
 named-0.097
 there-0.090
 nearby-0.081
 heard-0.077
Token their
Token ID511
Feature activation-8.37e-02

Pos logit contributions
 apology+0.063
 Deal+0.061
anks+0.061
oor+0.061
der+0.059
Neg logit contributions
 named-0.051
 saw-0.051
 there-0.048
 nearby-0.042
 heard-0.040
Token lives
Token ID3160
Feature activation-6.22e-02

Pos logit contributions
 apology+0.159
 Deal+0.155
oor+0.155
anks+0.154
der+0.150
Neg logit contributions
 saw-0.149
 named-0.148
 there-0.138
 nearby-0.125
 looking-0.119
Token.
Token ID13
Feature activation-3.10e-02

Pos logit contributions
 apology+0.134
anks+0.128
oor+0.127
der+0.126
ently+0.124
Neg logit contributions
 saw-0.094
 named-0.094
 there-0.081
 hall-0.074
 abandoned-0.073
Token He
Token ID679
Feature activation-1.13e-02

Pos logit contributions
aned+0.071
iously+0.071
 apology+0.070
oor+0.069
anks+0.069
Neg logit contributions
 named-0.048
 saw-0.046
 there-0.044
 called-0.035
 nearby-0.034
Token and
Token ID290
Feature activation-3.87e-02

Pos logit contributions
 apology+0.025
 Deal+0.025
anks+0.024
oor+0.024
der+0.024
Neg logit contributions
 named-0.017
 saw-0.017
 there-0.015
 nearby-0.013
 heard-0.013
Token Sam
Token ID3409
Feature activation+2.97e-03

Pos logit contributions
 apology+0.085
anks+0.080
 Deal+0.080
oor+0.079
der+0.078
Neg logit contributions
 named-0.057
 saw-0.057
 there-0.054
 nearby-0.047
 abandoned-0.045
Token a
Token ID257
Feature activation-2.78e-02

Pos logit contributions
 apology+0.037
anks+0.035
Less+0.035
 myself+0.035
 Deal+0.034
Neg logit contributions
 saw-0.028
 named-0.027
 hall-0.025
 there-0.024
 abandoned-0.022
Token good
Token ID922
Feature activation-3.23e-02

Pos logit contributions
 apology+0.061
 Deal+0.059
anks+0.059
oor+0.059
der+0.057
Neg logit contributions
 named-0.042
 saw-0.042
 there-0.039
 nearby-0.034
 looking-0.032
Token friend
Token ID1545
Feature activation-2.62e-02

Pos logit contributions
 apology+0.068
 accomplishment+0.062
 decisions+0.062
anks+0.062
der+0.061
Neg logit contributions
 saw-0.050
 named-0.050
 there-0.047
 nearby-0.043
 abandoned-0.043
Token and
Token ID290
Feature activation-3.73e-02

Pos logit contributions
der+0.048
anks+0.047
 apology+0.046
Less+0.046
oor+0.045
Neg logit contributions
 saw-0.040
 named-0.036
 hall-0.036
 abandoned-0.034
 dark-0.033
Token sharing
Token ID7373
Feature activation-2.50e-02

Pos logit contributions
 apology+0.080
 Deal+0.078
anks+0.077
oor+0.077
der+0.075
Neg logit contributions
 named-0.057
 saw-0.057
 there-0.053
 nearby-0.047
 heard-0.044
Token her
Token ID607
Feature activation-8.36e-02

Pos logit contributions
 apology+0.058
 Deal+0.057
oor+0.057
anks+0.056
aned+0.055
Neg logit contributions
 saw-0.034
 named-0.034
 there-0.032
 nearby-0.027
 looking-0.026
Token treasures
Token ID29561
Feature activation-6.01e-02

Pos logit contributions
 Deal+0.171
 apology+0.169
anks+0.167
oor+0.167
aned+0.166
Neg logit contributions
 saw-0.136
 named-0.136
 there-0.130
 nearby-0.113
 looking-0.109
Token.
Token ID13
Feature activation-3.37e-02

Pos logit contributions
 apology+0.152
 Deal+0.150
anks+0.146
oor+0.146
aned+0.144
Neg logit contributions
 named-0.071
 saw-0.070
 there-0.067
 nearby-0.055
 looking-0.051
Token<|endoftext|>
Token ID50256
Feature activation+6.73e-02

Pos logit contributions
aned+0.089
iously+0.087
 apology+0.087
 Deal+0.085
anks+0.082
Neg logit contributions
 named-0.047
 saw-0.042
 there-0.038
 living-0.033
 who-0.030
TokenOnce
Token ID7454
Feature activation+7.48e-02

Pos logit contributions
 named+0.127
 saw+0.127
 there+0.116
 abandoned+0.108
 nearby+0.107
Neg logit contributions
 apology-0.114
oor-0.111
anks-0.111
 Deal-0.111
Less-0.107
Token upon
Token ID2402
Feature activation+9.25e-02

Pos logit contributions
 named+0.091
 saw+0.089
 there+0.079
 nearby+0.070
 abandoned+0.066
Neg logit contributions
 apology-0.183
 Deal-0.177
oor-0.176
anks-0.176
der-0.172
Token a
Token ID257
Feature activation-4.20e-02

Pos logit contributions
 apology+0.035
 Deal+0.035
oor+0.035
anks+0.035
der+0.034
Neg logit contributions
 saw-0.028
 named-0.028
 there-0.026
 nearby-0.023
 heard-0.022
Token helping
Token ID5742
Feature activation-2.29e-02

Pos logit contributions
 apology+0.083
anks+0.083
 Deal+0.082
oor+0.082
aned+0.080
Neg logit contributions
 saw-0.071
 named-0.070
 there-0.066
 nearby-0.059
 heard-0.056
Token hand
Token ID1021
Feature activation-4.84e-02

Pos logit contributions
 apology+0.048
anks+0.046
 Deal+0.046
oor+0.045
der+0.045
Neg logit contributions
 named-0.037
 saw-0.037
 there-0.034
 nearby-0.030
 hall-0.029
Token and
Token ID290
Feature activation-5.09e-02

Pos logit contributions
 apology+0.103
anks+0.099
oor+0.099
 Deal+0.098
der+0.097
Neg logit contributions
 saw-0.075
 named-0.074
 there-0.068
 nearby-0.060
 hall-0.059
Token sharing
Token ID7373
Feature activation-2.98e-02

Pos logit contributions
 apology+0.096
 Deal+0.094
oor+0.093
anks+0.093
aned+0.090
Neg logit contributions
 saw-0.092
 named-0.091
 there-0.086
 nearby-0.076
 looking-0.074
Token his
Token ID465
Feature activation-8.36e-02

Pos logit contributions
 Deal+0.069
oor+0.068
aned+0.068
 apology+0.068
anks+0.068
Neg logit contributions
 saw-0.042
 named-0.040
 there-0.040
 nearby-0.034
 looking-0.032
Token truth
Token ID3872
Feature activation-6.38e-02

Pos logit contributions
 Deal+0.136
oor+0.134
anks+0.134
aned+0.133
 apology+0.131
Neg logit contributions
 saw-0.170
 named-0.167
 there-0.162
 nearby-0.147
 looking-0.144
Token.
Token ID13
Feature activation-5.20e-02

Pos logit contributions
 apology+0.150
 Deal+0.146
anks+0.144
oor+0.144
der+0.141
Neg logit contributions
 named-0.085
 saw-0.085
 there-0.078
 nearby-0.067
 looking-0.063
Token<|endoftext|>
Token ID50256
Feature activation+6.24e-02

Pos logit contributions
aned+0.135
iously+0.131
 apology+0.130
anks+0.127
 accomplishment+0.125
Neg logit contributions
 named-0.067
 there-0.059
 saw-0.056
 who-0.048
 living-0.045
TokenOnce
Token ID7454
Feature activation+7.42e-02

Pos logit contributions
 named+0.112
 saw+0.110
 there+0.098
 abandoned+0.097
 nearby+0.093
Neg logit contributions
oor-0.107
anks-0.106
 apology-0.105
 Deal-0.105
Less-0.104
Token,
Token ID11
Feature activation+8.81e-02

Pos logit contributions
 named+0.095
 saw+0.091
 there+0.079
 nearby+0.074
 abandoned+0.072
Neg logit contributions
 apology-0.175
oor-0.169
 Deal-0.169
anks-0.167
 accomplishment-0.164
Token don
Token ID836
Feature activation-3.11e-02

Pos logit contributions
 apology+0.051
 Deal+0.049
anks+0.048
oor+0.048
der+0.047
Neg logit contributions
 named-0.040
 saw-0.040
 there-0.037
 nearby-0.033
 abandoned-0.032
Token't
Token ID470
Feature activation-1.59e-02

Pos logit contributions
 apology+0.075
iously+0.074
eter+0.071
 accomplishment+0.071
anks+0.070
Neg logit contributions
 there-0.046
 named-0.042
 saw-0.040
 nearby-0.036
 outside-0.035
Token like
Token ID588
Feature activation-4.04e-02

Pos logit contributions
 apology+0.031
 Deal+0.029
anks+0.029
oor+0.028
der+0.028
Neg logit contributions
 named-0.027
 saw-0.027
 there-0.025
 hall-0.023
 nearby-0.022
Token to
Token ID284
Feature activation-1.93e-02

Pos logit contributions
 apology+0.083
anks+0.079
 Deal+0.079
oor+0.078
der+0.078
Neg logit contributions
 named-0.066
 saw-0.065
 there-0.059
 nearby-0.053
 abandoned-0.052
Token share
Token ID2648
Feature activation-3.24e-02

Pos logit contributions
 apology+0.036
 Deal+0.034
anks+0.034
oor+0.033
der+0.032
Neg logit contributions
 saw-0.034
 named-0.034
 there-0.033
 nearby-0.029
 abandoned-0.029
Token their
Token ID511
Feature activation-8.35e-02

Pos logit contributions
 apology+0.068
 Deal+0.066
anks+0.066
oor+0.065
der+0.064
Neg logit contributions
 named-0.051
 saw-0.051
 there-0.047
 nearby-0.042
 abandoned-0.040
Token honey
Token ID12498
Feature activation-5.50e-02

Pos logit contributions
 apology+0.171
 Deal+0.164
anks+0.163
oor+0.162
der+0.159
Neg logit contributions
 named-0.138
 saw-0.138
 there-0.128
 nearby-0.114
 abandoned-0.109
Token."
Token ID526
Feature activation-1.86e-02

Pos logit contributions
 apology+0.110
 Deal+0.106
anks+0.105
oor+0.105
der+0.102
Neg logit contributions
 named-0.093
 saw-0.093
 there-0.087
 nearby-0.077
 heard-0.074
Token<|endoftext|>
Token ID50256
Feature activation+5.20e-02

Pos logit contributions
 apology+0.041
aned+0.041
anks+0.041
iously+0.041
oor+0.041
Neg logit contributions
 named-0.028
 saw-0.028
 there-0.027
 nearby-0.022
 heard-0.022
TokenS
Token ID50
Feature activation-5.60e-03

Pos logit contributions
 named+0.090
 saw+0.088
 abandoned+0.079
 there+0.076
 nearby+0.072
Neg logit contributions
anks-0.093
 Deal-0.093
 apology-0.092
oor-0.090
Less-0.090
Tokenara
Token ID3301
Feature activation+1.73e-02

Pos logit contributions
 apology+0.010
anks+0.009
der+0.009
 Deal+0.009
oor+0.009
Neg logit contributions
 saw-0.011
 named-0.011
 there-0.011
 nearby-0.009
 heard-0.009
TokenOnce
Token ID7454
Feature activation+7.56e-02

Pos logit contributions
 named+0.121
 saw+0.117
 abandoned+0.107
 there+0.102
 mean+0.102
Neg logit contributions
oor-0.095
anks-0.094
 Deal-0.094
Less-0.093
 apology-0.092
Token upon
Token ID2402
Feature activation+9.01e-02

Pos logit contributions
 named+0.097
 saw+0.093
 there+0.081
 nearby+0.077
 abandoned+0.074
Neg logit contributions
 apology-0.177
 Deal-0.171
oor-0.171
anks-0.169
Less-0.166
Token a
Token ID257
Feature activation+8.00e-02

Pos logit contributions
 named+0.120
 saw+0.114
 hall+0.099
 abandoned+0.098
 there+0.093
Neg logit contributions
 apology-0.192
 Deal-0.186
Less-0.184
 myself-0.184
oor-0.182
Token time
Token ID640
Feature activation+9.60e-02

Pos logit contributions
 named+0.123
 saw+0.121
 there+0.116
 abandoned+0.106
 nearby+0.104
Neg logit contributions
 Deal-0.161
 apology-0.157
oor-0.156
 accomplishment-0.149
Less-0.145
Token there
Token ID612
Feature activation+8.80e-02

Pos logit contributions
 named+0.121
 saw+0.119
 abandoned+0.109
 nearby+0.108
 sheep+0.099
Neg logit contributions
oor-0.217
 Deal-0.197
 apology-0.196
mmm-0.193
ently-0.190
Token was
Token ID373
Feature activation+1.00e-01

Pos logit contributions
 named+0.136
 abandoned+0.122
 saw+0.119
eh+0.118
 nearby+0.118
Neg logit contributions
 apology-0.151
Less-0.150
oor-0.147
ently-0.144
raw-0.143
Token a
Token ID257
Feature activation+8.75e-02

Pos logit contributions
 saw+0.156
 named+0.140
 hall+0.133
 appeared+0.129
 abandoned+0.128
Neg logit contributions
Less-0.184
 Deal-0.181
 myself-0.179
 apology-0.177
oor-0.176
Token little
Token ID1310
Feature activation+6.57e-02

Pos logit contributions
 appeared+0.141
 there+0.134
 seen+0.128
 saw+0.126
 abandoned+0.125
Neg logit contributions
 apology-0.142
 accomplishment-0.133
 Deal-0.128
Less-0.122
oor-0.118
Token girl
Token ID2576
Feature activation+8.54e-02

Pos logit contributions
 appeared+0.100
 heard+0.098
 saw+0.098
 met+0.097
 there+0.093
Neg logit contributions
 apology-0.112
der-0.098
 accomplishment-0.097
umm-0.096
rier-0.091
Token named
Token ID3706
Feature activation+5.75e-02

Pos logit contributions
 abandoned+0.065
 fog+0.064
 there+0.064
 barrel+0.064
 appeared+0.064
Neg logit contributions
 apology-0.167
der-0.155
 myself-0.155
Less-0.153
 mistakes-0.153
Token Lucy
Token ID22162
Feature activation+5.38e-02

Pos logit contributions
 saw+0.096
 named+0.094
 there+0.086
 nearby+0.082
 abandoned+0.080
Neg logit contributions
 apology-0.111
 Deal-0.109
oor-0.106
anks-0.103
Less-0.103
Token a
Token ID257
Feature activation+7.87e-02

Pos logit contributions
 named+0.105
 saw+0.098
 hall+0.084
 there+0.083
 abandoned+0.083
Neg logit contributions
 apology-0.199
 Deal-0.193
oor-0.191
Less-0.190
der-0.188
Token time
Token ID640
Feature activation+9.45e-02

Pos logit contributions
 named+0.114
 saw+0.113
 there+0.108
 abandoned+0.097
 nearby+0.096
Neg logit contributions
 apology-0.164
 Deal-0.163
oor-0.160
 accomplishment-0.153
Less-0.152
Token,
Token ID11
Feature activation+9.83e-02

Pos logit contributions
 saw+0.107
 named+0.105
 abandoned+0.098
 nearby+0.095
 hall+0.085
Neg logit contributions
oor-0.225
 apology-0.212
 Deal-0.211
ently-0.204
anks-0.200
Token there
Token ID612
Feature activation+9.16e-02

Pos logit contributions
 hall+0.083
 abandoned+0.082
 witch+0.080
 cloak+0.078
 named+0.078
Neg logit contributions
 apology-0.219
Less-0.216
 Deal-0.215
raw-0.214
 itself-0.211
Token was
Token ID373
Feature activation+9.19e-02

Pos logit contributions
 named+0.124
 saw+0.111
 abandoned+0.108
 nearby+0.105
 there+0.105
Neg logit contributions
 apology-0.186
oor-0.185
Less-0.180
ently-0.177
anks-0.175
Token a
Token ID257
Feature activation+8.18e-02

Pos logit contributions
 saw+0.135
 named+0.124
 hall+0.119
 abandoned+0.115
 appeared+0.113
Neg logit contributions
Less-0.174
 Deal-0.172
 apology-0.171
 myself-0.164
oor-0.163
Token little
Token ID1310
Feature activation+5.85e-02

Pos logit contributions
 appeared+0.135
 there+0.123
 abandoned+0.121
 seen+0.121
 named+0.117
Neg logit contributions
 apology-0.129
 accomplishment-0.120
 Deal-0.114
 lesson-0.107
Less-0.106
Token girl
Token ID2576
Feature activation+8.29e-02

Pos logit contributions
 appeared+0.090
 heard+0.087
 saw+0.087
 met+0.085
 spotted+0.085
Neg logit contributions
 apology-0.096
der-0.082
 accomplishment-0.080
umm-0.079
rier-0.077
Token named
Token ID3706
Feature activation+5.39e-02

Pos logit contributions
 barrel+0.059
 abandoned+0.058
 saw+0.055
 appeared+0.054
 flashlight+0.050
Neg logit contributions
 apology-0.179
 myself-0.171
oor-0.166
Less-0.164
 mistakes-0.162
Token Lily
Token ID20037
Feature activation+6.29e-02

Pos logit contributions
 saw+0.090
 named+0.088
 there+0.081
 nearby+0.079
 abandoned+0.079
Neg logit contributions
 apology-0.101
 Deal-0.099
oor-0.095
anks-0.093
Less-0.092
Token.
Token ID13
Feature activation+2.42e-02

Pos logit contributions
 saw+0.102
 there+0.100
 abandoned+0.093
 hall+0.093
 named+0.092
Neg logit contributions
 apology-0.097
der-0.096
oor-0.095
Less-0.093
anks-0.090
Token upon
Token ID2402
Feature activation+9.21e-02

Pos logit contributions
 named+0.086
 saw+0.086
 there+0.077
 nearby+0.065
 abandoned+0.061
Neg logit contributions
 apology-0.184
 Deal-0.179
anks-0.178
oor-0.178
der-0.174
Token a
Token ID257
Feature activation+7.48e-02

Pos logit contributions
 named+0.113
 saw+0.107
 hall+0.091
 abandoned+0.091
 there+0.089
Neg logit contributions
 apology-0.210
 Deal-0.203
oor-0.202
Less-0.200
der-0.199
Token time
Token ID640
Feature activation+9.17e-02

Pos logit contributions
 named+0.102
 saw+0.101
 there+0.094
 nearby+0.082
 abandoned+0.080
Neg logit contributions
 apology-0.169
 Deal-0.166
oor-0.164
anks-0.158
der-0.158
Token,
Token ID11
Feature activation+9.89e-02

Pos logit contributions
 named+0.107
 saw+0.106
 nearby+0.094
 abandoned+0.094
 hall+0.083
Neg logit contributions
oor-0.219
 Deal-0.203
 apology-0.203
mmm-0.195
ently-0.195
Token there
Token ID612
Feature activation+9.17e-02

Pos logit contributions
 hall+0.080
 witch+0.079
 cloak+0.077
 abandoned+0.077
 laboratory+0.074
Neg logit contributions
Less-0.225
 Deal-0.222
 itself-0.216
 themselves-0.214
raw-0.213
Token was
Token ID373
Feature activation+9.27e-02

Pos logit contributions
 named+0.131
 saw+0.115
 abandoned+0.113
 nearby+0.110
 there+0.108
Neg logit contributions
 apology-0.180
oor-0.176
Less-0.174
ently-0.170
 accomplishment-0.169
Token a
Token ID257
Feature activation+7.89e-02

Pos logit contributions
 saw+0.137
 named+0.124
 hall+0.117
 abandoned+0.114
 appeared+0.112
Neg logit contributions
Less-0.178
 Deal-0.177
 apology-0.174
oor-0.170
 myself-0.170
Token big
Token ID1263
Feature activation+5.01e-02

Pos logit contributions
 appeared+0.128
 there+0.123
 abandoned+0.118
 named+0.117
 saw+0.113
Neg logit contributions
 apology-0.125
 accomplishment-0.116
 Deal-0.111
der-0.104
Less-0.103
Token tower
Token ID10580
Feature activation+8.23e-02

Pos logit contributions
 there+0.060
 appeared+0.060
 saw+0.058
 heard+0.057
 disturbed+0.055
Neg logit contributions
 accomplishment-0.100
 apology-0.099
 achievement-0.088
der-0.087
oor-0.086
Token.
Token ID13
Feature activation+2.18e-02

Pos logit contributions
 saw+0.133
 abandoned+0.125
 sheep+0.119
 there+0.115
 closet+0.113
Neg logit contributions
der-0.138
oor-0.126
anks-0.123
ently-0.122
aned-0.118
Token It
Token ID632
Feature activation+3.75e-02

Pos logit contributions
 saw+0.046
 named+0.045
 there+0.042
 nearby+0.040
 heard+0.038
Neg logit contributions
 apology-0.033
oor-0.032
 Deal-0.032
anks-0.032
der-0.030
Token,
Token ID11
Feature activation+9.81e-02

Pos logit contributions
 saw+0.098
 named+0.098
 nearby+0.082
 abandoned+0.081
 hall+0.072
Neg logit contributions
oor-0.239
 Deal-0.232
 apology-0.231
anks-0.222
der-0.220
Token there
Token ID612
Feature activation+8.81e-02

Pos logit contributions
 hall+0.083
 cloak+0.082
 witch+0.079
 abandoned+0.079
 named+0.077
Neg logit contributions
Less-0.217
 Deal-0.216
raw-0.213
 apology-0.211
 itself-0.208
Token was
Token ID373
Feature activation+9.16e-02

Pos logit contributions
 named+0.120
 saw+0.107
 abandoned+0.101
 there+0.100
 nearby+0.100
Neg logit contributions
 apology-0.181
oor-0.179
Less-0.176
der-0.172
ently-0.171
Token a
Token ID257
Feature activation+7.94e-02

Pos logit contributions
 saw+0.127
 named+0.117
 hall+0.104
 abandoned+0.103
 nearby+0.102
Neg logit contributions
Less-0.187
 Deal-0.185
 apology-0.185
oor-0.181
 myself-0.178
Token little
Token ID1310
Feature activation+5.95e-02

Pos logit contributions
 appeared+0.128
 there+0.123
 named+0.119
 abandoned+0.118
 saw+0.115
Neg logit contributions
 apology-0.129
 accomplishment-0.121
 Deal-0.116
Less-0.108
 lesson-0.106
Token girl
Token ID2576
Feature activation+8.27e-02

Pos logit contributions
 appeared+0.092
 saw+0.089
 heard+0.088
 met+0.088
 there+0.087
Neg logit contributions
 apology-0.101
 accomplishment-0.087
der-0.084
umm-0.082
rier-0.080
Token named
Token ID3706
Feature activation+5.56e-02

Pos logit contributions
 abandoned+0.060
 appeared+0.059
 barrel+0.059
 fog+0.059
 saw+0.057
Neg logit contributions
 apology-0.168
 myself-0.166
Less-0.155
iu-0.154
 mistakes-0.153
Token Lily
Token ID20037
Feature activation+6.16e-02

Pos logit contributions
 saw+0.091
 named+0.090
 there+0.082
 nearby+0.078
 abandoned+0.077
Neg logit contributions
 apology-0.107
 Deal-0.106
oor-0.102
anks-0.100
Less-0.099
Token.
Token ID13
Feature activation+2.34e-02

Pos logit contributions
 there+0.100
 saw+0.099
 hall+0.095
 named+0.093
 abandoned+0.092
Neg logit contributions
 apology-0.092
der-0.091
Less-0.089
oor-0.086
anks-0.085
Token She
Token ID1375
Feature activation+4.64e-02

Pos logit contributions
 saw+0.041
 named+0.040
 there+0.037
 nearby+0.035
 abandoned+0.035
Neg logit contributions
 Deal-0.044
 apology-0.043
anks-0.042
der-0.041
oor-0.041
Token had
Token ID550
Feature activation+6.74e-02

Pos logit contributions
 there+0.058
 nearby+0.053
 abandoned+0.051
 named+0.050
 shed+0.050
Neg logit contributions
aned-0.086
Less-0.086
 apology-0.085
oor-0.083
der-0.082
Token had
Token ID550
Feature activation+6.49e-03

Pos logit contributions
 saw+0.020
 named+0.020
 there+0.019
 nearby+0.016
 heard+0.015
Neg logit contributions
 apology-0.029
 Deal-0.029
anks-0.028
oor-0.028
iously-0.028
Token a
Token ID257
Feature activation-3.42e-04

Pos logit contributions
 saw+0.009
 named+0.009
 there+0.008
 nearby+0.007
 heard+0.006
Neg logit contributions
 Deal-0.016
 apology-0.016
oor-0.015
anks-0.015
aned-0.015
Token bad
Token ID2089
Feature activation-7.93e-03

Pos logit contributions
 apology+0.001
 accomplishment+0.001
 Deal+0.001
anks+0.001
Less+0.001
Neg logit contributions
 named-0.001
 saw-0.001
 there-0.001
 nearby-0.000
 abandoned-0.000
Token ending
Token ID7464
Feature activation-5.38e-03

Pos logit contributions
 apology+0.013
anks+0.012
oor+0.012
der+0.012
 Deal+0.012
Neg logit contributions
 named-0.016
 saw-0.016
 there-0.015
 nearby-0.014
 abandoned-0.014
Token.
Token ID13
Feature activation-3.16e-02

Pos logit contributions
 apology+0.011
oor+0.011
anks+0.011
der+0.011
 Deal+0.011
Neg logit contributions
 saw-0.008
 named-0.008
 there-0.007
 nearby-0.006
 abandoned-0.006
Token<|endoftext|>
Token ID50256
Feature activation+8.46e-02

Pos logit contributions
iously+0.076
 apology+0.076
aned+0.075
 accomplishment+0.074
anks+0.073
Neg logit contributions
 named-0.047
 saw-0.044
 there-0.042
 called-0.033
 who-0.033
TokenOnce
Token ID7454
Feature activation+8.26e-02

Pos logit contributions
 named+0.160
 saw+0.158
 there+0.146
 abandoned+0.135
 nearby+0.134
Neg logit contributions
 apology-0.145
anks-0.140
 Deal-0.140
oor-0.140
Less-0.135
Token upon
Token ID2402
Feature activation+9.82e-02

Pos logit contributions
 named+0.103
 saw+0.099
 there+0.086
 nearby+0.080
 abandoned+0.078
Neg logit contributions
 apology-0.198
 Deal-0.192
oor-0.191
anks-0.190
Less-0.186
Token a
Token ID257
Feature activation+8.31e-02

Pos logit contributions
 named+0.122
 saw+0.113
 abandoned+0.096
 hall+0.094
 there+0.094
Neg logit contributions
 apology-0.224
 Deal-0.216
oor-0.215
Less-0.213
der-0.212
Token time
Token ID640
Feature activation+1.01e-01

Pos logit contributions
 named+0.124
 saw+0.122
 there+0.116
 abandoned+0.106
 nearby+0.105
Neg logit contributions
 apology-0.170
 Deal-0.168
oor-0.167
 accomplishment-0.157
der-0.155
Token,
Token ID11
Feature activation+1.06e-01

Pos logit contributions
 saw+0.114
 named+0.110
 abandoned+0.098
 nearby+0.096
 hall+0.087
Neg logit contributions
oor-0.242
 apology-0.235
 Deal-0.234
anks-0.223
ently-0.222
Token in
Token ID287
Feature activation+3.98e-02

Pos logit contributions
 saw+0.111
 met+0.101
 named+0.101
 witch+0.100
 sheep+0.100
Neg logit contributions
oor-0.107
der-0.105
 accomplishment-0.098
ently-0.097
anks-0.095
Token the
Token ID262
Feature activation+4.42e-02

Pos logit contributions
 named+0.063
 saw+0.062
 there+0.055
 nearby+0.050
 appeared+0.049
Neg logit contributions
 apology-0.082
Less-0.078
 Deal-0.078
oor-0.077
 accomplishment-0.076
Token garage
Token ID15591
Feature activation+6.76e-02

Pos logit contributions
 named+0.055
 there+0.051
 saw+0.048
 appeared+0.041
 sitting+0.040
Neg logit contributions
Less-0.099
 apology-0.094
 Deal-0.092
 instructions-0.092
 Moral-0.089
Token.
Token ID13
Feature activation-4.27e-03

Pos logit contributions
 saw+0.105
 named+0.104
 there+0.096
 abandoned+0.087
 sheep+0.086
Neg logit contributions
der-0.133
anks-0.125
oor-0.125
iously-0.125
 apology-0.123
Token They
Token ID1119
Feature activation+2.40e-02

Pos logit contributions
 apology+0.008
anks+0.008
oor+0.008
 Deal+0.008
der+0.008
Neg logit contributions
 saw-0.008
 named-0.008
 there-0.007
 nearby-0.006
 heard-0.006
Token opened
Token ID4721
Feature activation+4.18e-02

Pos logit contributions
 named+0.029
 saw+0.027
 there+0.026
 nearby+0.022
 hall+0.022
Neg logit contributions
 apology-0.057
oor-0.055
 Deal-0.054
der-0.054
Less-0.054
Token it
Token ID340
Feature activation+1.28e-02

Pos logit contributions
 named+0.071
 saw+0.069
 there+0.062
 nearby+0.058
 hall+0.057
Neg logit contributions
 apology-0.081
oor-0.078
anks-0.077
 Deal-0.076
der-0.076
Token and
Token ID290
Feature activation+1.56e-02

Pos logit contributions
 named+0.021
 saw+0.019
 there+0.017
 abandoned+0.016
 hall+0.016
Neg logit contributions
 apology-0.025
anks-0.025
oor-0.025
der-0.025
iously-0.025
Token saw
Token ID2497
Feature activation+4.15e-02

Pos logit contributions
 named+0.013
 hall+0.009
 nearby+0.009
 forest+0.009
itty+0.009
Neg logit contributions
oor-0.038
 apology-0.038
Less-0.038
der-0.038
 accomplishment-0.037
Token many
Token ID867
Feature activation+2.16e-03

Pos logit contributions
 named+0.067
 saw+0.060
eh+0.058
 hall+0.056
itty+0.056
Neg logit contributions
der-0.081
 apology-0.079
 themselves-0.075
anks-0.074
 accomplishment-0.074
Token old
Token ID1468
Feature activation+1.71e-02

Pos logit contributions
 named+0.004
 saw+0.004
 there+0.003
eh+0.003
 nearby+0.003
Neg logit contributions
 apology-0.004
 accomplishment-0.004
 mistakes-0.004
 decisions-0.003
 choices-0.003
Token her
Token ID607
Feature activation-3.04e-02

Pos logit contributions
 named+0.001
 saw+0.001
 there+0.001
 nearby+0.000
 looking+0.000
Neg logit contributions
 apology-0.001
anks-0.001
 Deal-0.001
oor-0.001
der-0.001
Token.
Token ID13
Feature activation-2.27e-02

Pos logit contributions
 apology+0.057
 Deal+0.056
anks+0.054
aned+0.053
Less+0.053
Neg logit contributions
 saw-0.056
 named-0.054
 there-0.051
 nearby-0.048
 looking-0.047
Token\n
Token ID198
Feature activation-9.86e-03

Pos logit contributions
 apology+0.054
aned+0.051
anks+0.051
iously+0.051
 Deal+0.051
Neg logit contributions
 named-0.034
 saw-0.033
 there-0.031
 hall-0.025
 looking-0.025
Token\n
Token ID198
Feature activation-2.38e-02

Pos logit contributions
 apology+0.023
anks+0.023
iously+0.022
der+0.022
 Deal+0.022
Neg logit contributions
 named-0.015
 saw-0.014
 there-0.014
 nearby-0.012
 heard-0.011
TokenDown
Token ID8048
Feature activation+2.09e-02

Pos logit contributions
 apology+0.038
 Deal+0.037
oor+0.036
anks+0.036
der+0.035
Neg logit contributions
 named-0.049
 saw-0.049
 there-0.047
 nearby-0.042
 abandoned-0.041
Token the
Token ID262
Feature activation+1.81e-02

Pos logit contributions
 named+0.030
 saw+0.028
 there+0.025
 abandoned+0.024
 nearby+0.022
Neg logit contributions
 apology-0.044
oor-0.043
anks-0.042
Less-0.042
ently-0.042
Token trail
Token ID8025
Feature activation+4.83e-02

Pos logit contributions
 named+0.027
 saw+0.027
 there+0.025
 nearby+0.022
 abandoned+0.021
Neg logit contributions
 apology-0.039
 Deal-0.038
anks-0.038
oor-0.038
Less-0.037
Token,
Token ID11
Feature activation+1.70e-02

Pos logit contributions
 named+0.056
 saw+0.048
 there+0.046
 abandoned+0.043
 sheep+0.041
Neg logit contributions
oor-0.112
der-0.109
ently-0.108
 apology-0.107
aned-0.107
Token she
Token ID673
Feature activation+1.27e-02

Pos logit contributions
 named+0.027
 saw+0.023
 abandoned+0.021
 nearby+0.021
aver+0.020
Neg logit contributions
der-0.031
 apology-0.031
 accomplishment-0.031
oor-0.031
Less-0.030
Token thought
Token ID1807
Feature activation-3.68e-03

Pos logit contributions
 named+0.009
 there+0.007
 saw+0.006
 abandoned+0.006
 nearby+0.006
Neg logit contributions
oor-0.033
der-0.033
aned-0.033
anks-0.032
 apology-0.032
Token she
Token ID673
Feature activation+6.01e-03

Pos logit contributions
 apology+0.008
oor+0.007
der+0.007
anks+0.007
 Deal+0.007
Neg logit contributions
 named-0.006
 saw-0.006
 there-0.005
 nearby-0.005
 abandoned-0.005
Token He
Token ID679
Feature activation+3.34e-02

Pos logit contributions
 saw+0.017
 named+0.017
 there+0.016
 nearby+0.014
 looking+0.013
Neg logit contributions
 apology-0.020
oor-0.020
 Deal-0.020
anks-0.019
der-0.019
Token found
Token ID1043
Feature activation+4.62e-02

Pos logit contributions
 named+0.041
 there+0.040
 outdoors+0.037
 hall+0.037
 abandoned+0.036
Neg logit contributions
 apology-0.066
aned-0.065
iously-0.063
der-0.062
Less-0.061
Token a
Token ID257
Feature activation+5.60e-02

Pos logit contributions
 named+0.059
 saw+0.056
 there+0.049
 hall+0.047
 abandoned+0.045
Neg logit contributions
 apology-0.108
der-0.103
anks-0.101
Less-0.101
 Deal-0.101
Token nice
Token ID3621
Feature activation+4.75e-02

Pos logit contributions
 named+0.095
 saw+0.081
 appeared+0.075
 there+0.075
 met+0.072
Neg logit contributions
 apology-0.100
 lesson-0.095
 accomplishment-0.091
 Deal-0.088
 instructions-0.087
Token,
Token ID11
Feature activation+1.72e-02

Pos logit contributions
 named+0.079
 saw+0.076
 there+0.069
 appeared+0.068
 met+0.068
Neg logit contributions
 apology-0.083
 accomplishment-0.080
 lesson-0.075
 achievement-0.070
raw-0.068
Token soft
Token ID2705
Feature activation+4.11e-02

Pos logit contributions
 named+0.027
 saw+0.025
 appeared+0.023
 hall+0.022
itty+0.022
Neg logit contributions
 apology-0.033
 accomplishment-0.030
 Deal-0.029
 myself-0.029
 wounds-0.028
Token bed
Token ID3996
Feature activation+3.85e-02

Pos logit contributions
 named+0.071
 saw+0.070
 there+0.064
 met+0.063
 who+0.061
Neg logit contributions
 apology-0.065
 accomplishment-0.062
der-0.057
 myself-0.056
 lesson-0.056
Token under
Token ID739
Feature activation+2.47e-02

Pos logit contributions
 saw+0.058
 named+0.057
 abandoned+0.050
 sheep+0.050
 there+0.049
Neg logit contributions
der-0.074
oor-0.072
anks-0.071
fulness-0.067
ently-0.065
Token a
Token ID257
Feature activation+7.48e-02

Pos logit contributions
 named+0.032
 saw+0.031
 there+0.028
 nearby+0.025
 abandoned+0.024
Neg logit contributions
 apology-0.057
der-0.055
oor-0.055
 Deal-0.055
anks-0.054
Token big
Token ID1263
Feature activation+6.49e-02

Pos logit contributions
 named+0.121
 saw+0.107
 there+0.101
 met+0.092
 walking+0.090
Neg logit contributions
 apology-0.137
 instructions-0.132
der-0.130
oor-0.129
raw-0.129
Token tree
Token ID5509
Feature activation+7.69e-02

Pos logit contributions
 named+0.106
 saw+0.101
 there+0.095
 met+0.087
 heard+0.084
Neg logit contributions
 apology-0.119
oor-0.114
 accomplishment-0.113
 myself-0.110
raw-0.110
Token ground
Token ID2323
Feature activation+3.23e-02

Pos logit contributions
 named+0.019
 saw+0.018
 there+0.017
 heard+0.015
 nearby+0.015
Neg logit contributions
 apology-0.017
oor-0.017
anks-0.016
 Deal-0.016
der-0.016
Token.
Token ID13
Feature activation+1.29e-04

Pos logit contributions
 named+0.059
 saw+0.058
 sheep+0.051
 abandoned+0.049
 rat+0.049
Neg logit contributions
der-0.054
oor-0.050
anks-0.050
ently-0.048
iously-0.046
Token She
Token ID1375
Feature activation+2.26e-02

Pos logit contributions
 named+0.000
 saw+0.000
 there+0.000
 nearby+0.000
 abandoned+0.000
Neg logit contributions
 apology-0.000
anks-0.000
 Deal-0.000
der-0.000
oor-0.000
Token asked
Token ID1965
Feature activation+3.38e-02

Pos logit contributions
 named+0.025
 saw+0.024
 there+0.023
 nearby+0.019
 hall+0.018
Neg logit contributions
 apology-0.057
 Deal-0.055
oor-0.055
anks-0.055
der-0.054
Token her
Token ID607
Feature activation+4.67e-03

Pos logit contributions
 named+0.052
 saw+0.052
 there+0.047
 nearby+0.044
 abandoned+0.044
Neg logit contributions
 apology-0.071
anks-0.066
 Deal-0.066
oor-0.065
der-0.065
Token friends
Token ID2460
Feature activation+4.39e-02

Pos logit contributions
 saw+0.009
 named+0.009
 abandoned+0.009
 nearby+0.008
 appeared+0.008
Neg logit contributions
 apology-0.007
anks-0.007
 accomplishment-0.006
 Deal-0.006
Less-0.006
Token,
Token ID11
Feature activation+2.13e-02

Pos logit contributions
 named+0.073
 saw+0.072
 abandoned+0.066
eh+0.064
house+0.062
Neg logit contributions
 apology-0.082
der-0.077
anks-0.076
oor-0.074
 Deal-0.074
Token "
Token ID366
Feature activation+9.59e-03

Pos logit contributions
 saw+0.041
 named+0.040
 hall+0.039
house+0.038
 abandoned+0.037
Neg logit contributions
 apology-0.030
 Deal-0.030
Less-0.028
der-0.027
oor-0.026
TokenWhat
Token ID2061
Feature activation+4.50e-02

Pos logit contributions
 named+0.019
 saw+0.018
 there+0.017
 looking+0.016
 abandoned+0.016
Neg logit contributions
Less-0.015
 Deal-0.015
 apology-0.014
oor-0.014
der-0.013
Token is
Token ID318
Feature activation+1.81e-02

Pos logit contributions
 named+0.084
 saw+0.079
 nearby+0.078
eh+0.077
 abandoned+0.076
Neg logit contributions
 apology-0.070
 decisions-0.067
anks-0.066
oor-0.064
 mistakes-0.062
Token in
Token ID287
Feature activation+5.85e-03

Pos logit contributions
 saw+0.026
 named+0.025
 heard+0.023
itty+0.022
 nearby+0.022
Neg logit contributions
oor-0.034
anks-0.034
 myself-0.034
 accomplishment-0.033
 apology-0.033
Token playing
Token ID2712
Feature activation-1.85e-04

Pos logit contributions
 apology+0.029
anks+0.029
oor+0.028
 Deal+0.028
der+0.028
Neg logit contributions
 named-0.017
 saw-0.017
 there-0.015
 nearby-0.013
 hall-0.013
Token with
Token ID351
Feature activation-1.37e-03

Pos logit contributions
anks+0.000
der+0.000
 apology+0.000
 myself+0.000
oor+0.000
Neg logit contributions
 saw-0.000
 named-0.000
 abandoned-0.000
 heard-0.000
 appeared-0.000
Token them
Token ID606
Feature activation-2.63e-02

Pos logit contributions
 apology+0.003
anks+0.002
 Deal+0.002
der+0.002
oor+0.002
Neg logit contributions
 named-0.002
 saw-0.002
 there-0.002
 nearby-0.002
 abandoned-0.002
Token.
Token ID13
Feature activation-2.32e-02

Pos logit contributions
anks+0.050
der+0.050
 apology+0.050
oor+0.049
iously+0.048
Neg logit contributions
 named-0.044
 saw-0.044
 there-0.037
 abandoned-0.035
 hall-0.035
Token He
Token ID679
Feature activation+9.61e-03

Pos logit contributions
 apology+0.049
anks+0.046
oor+0.046
aned+0.046
 accomplishment+0.045
Neg logit contributions
 named-0.040
 saw-0.038
 there-0.037
 nearby-0.031
 called-0.031
Token was
Token ID373
Feature activation+1.73e-02

Pos logit contributions
 named+0.012
 saw+0.012
 there+0.012
 nearby+0.010
 hall+0.010
Neg logit contributions
 apology-0.022
oor-0.021
der-0.021
anks-0.021
 Deal-0.021
Token especially
Token ID2592
Feature activation-2.24e-03

Pos logit contributions
 saw+0.029
 named+0.028
 there+0.025
 hall+0.025
 abandoned+0.024
Neg logit contributions
 accomplishment-0.031
 apology-0.031
 Deal-0.030
aned-0.030
iously-0.030
Token delighted
Token ID20707
Feature activation-1.37e-02

Pos logit contributions
 accomplishment+0.004
 apology+0.004
 mistakes+0.004
 decisions+0.004
der+0.004
Neg logit contributions
 named-0.004
 hall-0.003
 saw-0.003
 appeared-0.003
 box-0.003
Token to
Token ID284
Feature activation-8.34e-03

Pos logit contributions
der+0.027
iously+0.027
anks+0.027
oor+0.027
 apology+0.026
Neg logit contributions
 named-0.022
 saw-0.021
 hall-0.019
 there-0.018
 nearby-0.017
Token find
Token ID1064
Feature activation+9.67e-03

Pos logit contributions
der+0.020
 Deal+0.020
anks+0.019
iously+0.019
aned+0.019
Neg logit contributions
 named-0.012
 saw-0.012
 there-0.011
 looking-0.009
 nearby-0.009
Token a
Token ID257
Feature activation+5.17e-03

Pos logit contributions
 named+0.016
 saw+0.016
 hall+0.014
 abandoned+0.014
 there+0.013
Neg logit contributions
 apology-0.019
 decisions-0.017
anks-0.016
 accomplishment-0.016
der-0.016
Token shapes
Token ID15268
Feature activation+2.89e-03

Pos logit contributions
 apology+0.076
 Deal+0.074
anks+0.073
oor+0.073
der+0.071
Neg logit contributions
 saw-0.044
 named-0.043
 there-0.040
 nearby-0.034
 heard-0.032
Token and
Token ID290
Feature activation-3.07e-02

Pos logit contributions
 named+0.004
 saw+0.004
 there+0.004
 abandoned+0.003
 hall+0.003
Neg logit contributions
 apology-0.006
anks-0.006
der-0.006
oor-0.006
iously-0.006
Token soon
Token ID2582
Feature activation-1.10e-02

Pos logit contributions
 apology+0.056
anks+0.054
der+0.054
oor+0.053
 Deal+0.053
Neg logit contributions
 named-0.056
 saw-0.054
 there-0.052
 nearby-0.047
 abandoned-0.045
Token the
Token ID262
Feature activation-3.28e-02

Pos logit contributions
 apology+0.025
anks+0.024
 Deal+0.024
oor+0.024
der+0.024
Neg logit contributions
 named-0.016
 saw-0.015
 there-0.014
 nearby-0.012
 abandoned-0.012
Token puzzle
Token ID15027
Feature activation+3.37e-03

Pos logit contributions
 apology+0.066
 Deal+0.063
anks+0.063
oor+0.063
 accomplishment+0.062
Neg logit contributions
 named-0.056
 saw-0.054
 there-0.050
 nearby-0.045
 abandoned-0.043
Token was
Token ID373
Feature activation-2.27e-03

Pos logit contributions
 named+0.004
 saw+0.004
 there+0.004
 nearby+0.003
 abandoned+0.003
Neg logit contributions
 apology-0.008
anks-0.008
oor-0.008
 Deal-0.008
der-0.008
Token transformed
Token ID14434
Feature activation-3.15e-02

Pos logit contributions
 apology+0.005
 Deal+0.005
anks+0.005
oor+0.005
der+0.005
Neg logit contributions
 saw-0.003
 named-0.003
 there-0.003
 nearby-0.003
 looking-0.002
Token into
Token ID656
Feature activation-1.77e-02

Pos logit contributions
 apology+0.068
 Deal+0.065
oor+0.063
anks+0.062
Less+0.061
Neg logit contributions
 saw-0.051
 named-0.049
 there-0.049
 nearby-0.042
 looking-0.040
Token a
Token ID257
Feature activation+9.06e-03

Pos logit contributions
 apology+0.042
 Deal+0.041
anks+0.041
oor+0.041
der+0.040
Neg logit contributions
 saw-0.023
 named-0.023
 there-0.021
 nearby-0.018
 heard-0.017
Token beautiful
Token ID4950
Feature activation+5.60e-03

Pos logit contributions
 named+0.014
 saw+0.013
 there+0.012
 abandoned+0.010
 walking+0.010
Neg logit contributions
 apology-0.019
 accomplishment-0.018
oor-0.018
 Deal-0.018
Less-0.018
Token masterpiece
Token ID30669
Feature activation-2.02e-03

Pos logit contributions
 named+0.009
 saw+0.008
 there+0.008
 abandoned+0.007
 nearby+0.007
Neg logit contributions
 accomplishment-0.011
 apology-0.011
oor-0.010
der-0.010
 decisions-0.010
Token something
Token ID1223
Feature activation+8.35e-03

Pos logit contributions
 named+0.006
 saw+0.005
 there+0.005
 hall+0.004
 nearby+0.004
Neg logit contributions
 apology-0.008
der-0.008
 Deal-0.008
oor-0.008
 accomplishment-0.008
Token fun
Token ID1257
Feature activation-1.42e-02

Pos logit contributions
 named+0.014
 saw+0.013
 hall+0.012
 there+0.011
 abandoned+0.011
Neg logit contributions
 apology-0.016
der-0.015
 accomplishment-0.015
Less-0.014
 mistakes-0.014
Token.
Token ID13
Feature activation-2.64e-02

Pos logit contributions
anks+0.027
 apology+0.027
der+0.027
oor+0.026
 accomplishment+0.026
Neg logit contributions
 named-0.023
 saw-0.023
 there-0.020
 nearby-0.019
 abandoned-0.019
Token He
Token ID679
Feature activation+8.30e-03

Pos logit contributions
 apology+0.051
anks+0.049
oor+0.049
 Deal+0.048
aned+0.048
Neg logit contributions
 named-0.047
 saw-0.047
 there-0.044
 nearby-0.039
 looking-0.038
Token started
Token ID2067
Feature activation+7.97e-03

Pos logit contributions
 named+0.010
 there+0.009
 saw+0.009
 hall+0.008
 abandoned+0.008
Neg logit contributions
 apology-0.019
der-0.019
aned-0.018
iously-0.018
oor-0.018
Token running
Token ID2491
Feature activation+1.42e-02

Pos logit contributions
 named+0.013
 saw+0.012
 there+0.011
 abandoned+0.010
 nearby+0.010
Neg logit contributions
 apology-0.015
anks-0.015
 accomplishment-0.015
iously-0.015
der-0.015
Token around
Token ID1088
Feature activation+5.64e-03

Pos logit contributions
 named+0.023
 saw+0.022
 there+0.021
 abandoned+0.018
 nearby+0.018
Neg logit contributions
 apology-0.028
oor-0.028
anks-0.027
der-0.027
iously-0.027
Token the
Token ID262
Feature activation+1.48e-02

Pos logit contributions
 named+0.009
 saw+0.008
 there+0.007
 abandoned+0.007
aver+0.007
Neg logit contributions
 accomplishment-0.011
 apology-0.010
oor-0.010
der-0.010
iously-0.010
Token ball
Token ID2613
Feature activation+2.05e-02

Pos logit contributions
 named+0.021
 there+0.017
 appeared+0.017
 saw+0.016
 came+0.015
Neg logit contributions
 mistakes-0.030
 apology-0.030
 decisions-0.029
 instructions-0.029
 myself-0.028
Token.
Token ID13
Feature activation-3.17e-02

Pos logit contributions
 saw+0.031
 named+0.030
 there+0.028
 abandoned+0.027
 appeared+0.023
Neg logit contributions
der-0.041
oor-0.040
anks-0.040
 apology-0.039
iously-0.039
Token It
Token ID632
Feature activation+1.62e-02

Pos logit contributions
 apology+0.063
anks+0.060
 Deal+0.060
oor+0.059
aned+0.059
Neg logit contributions
 named-0.056
 saw-0.055
 there-0.052
 nearby-0.046
 looking-0.044
Token would
Token ID561
Feature activation-3.20e-02

Pos logit contributions
 apology+0.033
der+0.033
iously+0.032
anks+0.032
aned+0.031
Neg logit contributions
 named-0.019
 saw-0.019
 there-0.018
 abandoned-0.016
 hall-0.016
Token come
Token ID1282
Feature activation-2.81e-02

Pos logit contributions
anks+0.064
 apology+0.064
 Deal+0.064
iously+0.063
aned+0.063
Neg logit contributions
 named-0.055
 saw-0.054
 there-0.050
 nearby-0.047
 looking-0.044
Token back
Token ID736
Feature activation-2.58e-02

Pos logit contributions
 apology+0.047
anks+0.045
 Deal+0.045
oor+0.045
der+0.043
Neg logit contributions
 saw-0.055
 named-0.055
 there-0.051
 nearby-0.047
 hall-0.047
Token and
Token ID290
Feature activation-4.04e-02

Pos logit contributions
 apology+0.053
anks+0.051
 Deal+0.051
oor+0.051
der+0.049
Neg logit contributions
 named-0.042
 saw-0.042
 there-0.039
 nearby-0.034
 abandoned-0.033
Token they
Token ID484
Feature activation-1.89e-02

Pos logit contributions
 apology+0.083
 Deal+0.080
anks+0.080
oor+0.079
der+0.078
Neg logit contributions
 named-0.065
 saw-0.065
 there-0.061
 nearby-0.054
 abandoned-0.052
Token could
Token ID714
Feature activation-3.01e-02

Pos logit contributions
 apology+0.045
anks+0.043
oor+0.043
 Deal+0.043
der+0.042
Neg logit contributions
 named-0.024
 saw-0.024
 there-0.022
 nearby-0.019
 abandoned-0.018
Token play
Token ID711
Feature activation-6.40e-03

Pos logit contributions
 apology+0.068
 Deal+0.066
iously+0.066
aned+0.066
der+0.065
Neg logit contributions
 named-0.048
 saw-0.045
 there-0.042
 nearby-0.037
 called-0.035
Token together
Token ID1978
Feature activation-2.36e-02

Pos logit contributions
 apology+0.011
anks+0.011
oor+0.011
der+0.011
 accomplishment+0.011
Neg logit contributions
 named-0.011
 saw-0.011
 there-0.010
 abandoned-0.010
 hall-0.009
Token.
Token ID13
Feature activation-4.49e-02

Pos logit contributions
 apology+0.045
oor+0.043
anks+0.043
 Deal+0.043
der+0.043
Neg logit contributions
 named-0.041
 saw-0.041
 there-0.038
 nearby-0.034
 abandoned-0.033
Token<|endoftext|>
Token ID50256
Feature activation+7.52e-02

Pos logit contributions
 apology+0.106
aned+0.106
anks+0.100
 Deal+0.100
iously+0.099
Neg logit contributions
 named-0.072
 saw-0.067
 there-0.061
 who-0.054
 living-0.051
TokenHarry
Token ID18308
Feature activation+5.25e-02

Pos logit contributions
 named+0.150
 saw+0.147
 there+0.133
 abandoned+0.131
 nearby+0.127
Neg logit contributions
oor-0.114
 apology-0.114
 Deal-0.112
anks-0.112
Less-0.110
Token.
Token ID13
Feature activation-3.15e-02

Pos logit contributions
der+0.013
anks+0.012
 apology+0.012
oor+0.012
 accomplishment+0.011
Neg logit contributions
 named-0.010
 saw-0.010
 abandoned-0.009
 there-0.009
eh-0.008
Token She
Token ID1375
Feature activation-1.77e-03

Pos logit contributions
 apology+0.077
oor+0.076
anks+0.075
iously+0.075
aned+0.075
Neg logit contributions
 saw-0.040
 there-0.039
 named-0.039
 heard-0.030
 nearby-0.030
Token felt
Token ID2936
Feature activation+1.03e-02

Pos logit contributions
 apology+0.004
der+0.004
aned+0.004
 Deal+0.004
anks+0.004
Neg logit contributions
 named-0.002
 there-0.002
 saw-0.002
 nearby-0.002
 abandoned-0.002
Token frustrated
Token ID14718
Feature activation-6.46e-03

Pos logit contributions
 named+0.014
 saw+0.014
 there+0.013
 hall+0.012
 nearby+0.011
Neg logit contributions
 apology-0.023
der-0.022
 Deal-0.022
anks-0.021
 accomplishment-0.021
Token and
Token ID290
Feature activation-2.32e-02

Pos logit contributions
 apology+0.014
der+0.014
anks+0.014
oor+0.013
 Deal+0.013
Neg logit contributions
 named-0.009
 saw-0.009
 there-0.008
 abandoned-0.007
 nearby-0.007
Token sad
Token ID6507
Feature activation-1.92e-02

Pos logit contributions
 apology+0.050
 Deal+0.048
der+0.047
anks+0.047
aned+0.046
Neg logit contributions
 named-0.034
 saw-0.031
 there-0.029
 hall-0.028
 nearby-0.026
Token.
Token ID13
Feature activation-2.32e-02

Pos logit contributions
 apology+0.046
 Deal+0.044
anks+0.044
oor+0.044
der+0.043
Neg logit contributions
 named-0.025
 saw-0.025
 there-0.023
 nearby-0.020
 heard-0.019
Token She
Token ID1375
Feature activation+7.34e-03

Pos logit contributions
 apology+0.053
oor+0.052
anks+0.052
 accomplishment+0.050
iously+0.050
Neg logit contributions
 saw-0.034
 named-0.033
 there-0.032
 nearby-0.027
 heard-0.026
Token wanted
Token ID2227
Feature activation+1.79e-03

Pos logit contributions
 there+0.009
 named+0.009
 hall+0.008
 saw+0.008
 nearby+0.008
Neg logit contributions
der-0.016
 apology-0.016
aned-0.016
 Deal-0.015
iously-0.015
Token to
Token ID284
Feature activation-2.62e-02

Pos logit contributions
 named+0.003
 saw+0.003
 there+0.003
 nearby+0.002
 abandoned+0.002
Neg logit contributions
 apology-0.004
anks-0.004
 Deal-0.004
oor-0.004
der-0.004
Token whistle
Token ID16121
Feature activation-1.71e-02

Pos logit contributions
der+0.060
oor+0.060
iously+0.059
anks+0.059
 Deal+0.058
Neg logit contributions
 named-0.039
 saw-0.038
 there-0.034
 nearby-0.030
 heard-0.029
Token a
Token ID257
Feature activation-7.65e-03

Pos logit contributions
 apology+0.031
anks+0.030
 Deal+0.030
oor+0.030
der+0.029
Neg logit contributions
 saw-0.019
 named-0.019
 there-0.017
 nearby-0.015
 looking-0.014
Token bush
Token ID24484
Feature activation+4.87e-03

Pos logit contributions
 apology+0.015
 accomplishment+0.013
 Deal+0.013
 instructions+0.013
 lesson+0.012
Neg logit contributions
 named-0.013
 saw-0.012
 there-0.011
 appeared-0.011
 sitting-0.010
Token!".
Token ID48220
Feature activation-3.40e-02

Pos logit contributions
 saw+0.008
 named+0.008
 there+0.007
 heard+0.007
 abandoned+0.007
Neg logit contributions
oor-0.009
 apology-0.009
anks-0.009
der-0.009
 myself-0.008
Token Jack
Token ID3619
Feature activation+1.22e-02

Pos logit contributions
 apology+0.066
anks+0.063
oor+0.062
 Deal+0.062
der+0.061
Neg logit contributions
 saw-0.061
 named-0.061
 there-0.057
 nearby-0.050
 looking-0.049
Token chuckled
Token ID39889
Feature activation-8.94e-03

Pos logit contributions
 named+0.018
 saw+0.016
 there+0.016
 hall+0.015
 abandoned+0.015
Neg logit contributions
iously-0.026
 apology-0.025
aned-0.025
oor-0.024
der-0.024
Token and
Token ID290
Feature activation-2.01e-02

Pos logit contributions
der+0.019
 apology+0.018
iously+0.018
oor+0.018
 accomplishment+0.017
Neg logit contributions
 named-0.014
 saw-0.013
 hall-0.012
 there-0.012
 abandoned-0.011
Token said
Token ID531
Feature activation+1.29e-02

Pos logit contributions
 apology+0.045
der+0.043
oor+0.043
anks+0.043
iously+0.043
Neg logit contributions
 named-0.028
 saw-0.027
 there-0.025
 nearby-0.022
 hall-0.022
Token,
Token ID11
Feature activation-1.95e-02

Pos logit contributions
 saw+0.014
 named+0.014
 hall+0.011
 abandoned+0.011
 there+0.011
Neg logit contributions
 apology-0.033
der-0.033
oor-0.032
iously-0.032
anks-0.032
Token "
Token ID366
Feature activation-2.36e-02

Pos logit contributions
 apology+0.046
anks+0.045
 Deal+0.045
oor+0.043
Less+0.043
Neg logit contributions
 named-0.027
 saw-0.027
 there-0.025
 looking-0.021
 nearby-0.021
TokenYou
Token ID1639
Feature activation+7.52e-04

Pos logit contributions
 apology+0.045
anks+0.045
 Deal+0.043
oor+0.042
der+0.042
Neg logit contributions
 saw-0.043
 named-0.042
 there-0.040
 nearby-0.037
 heard-0.035
Token are
Token ID389
Feature activation-1.68e-02

Pos logit contributions
 named+0.001
 saw+0.001
 nearby+0.001
 there+0.001
 abandoned+0.001
Neg logit contributions
 apology-0.002
 Deal-0.001
oor-0.001
anks-0.001
der-0.001
Tokenâ
Token ID22940
Feature activation-7.20e-03

Pos logit contributions
 apology+0.061
anks+0.059
oor+0.058
der+0.057
aned+0.057
Neg logit contributions
 named-0.055
 saw-0.054
 there-0.052
 nearby-0.045
 looking-0.043
Token€
Token ID26391
Feature activation-1.50e-02

Pos logit contributions
 apology+0.010
anks+0.010
 Deal+0.010
oor+0.010
der+0.009
Neg logit contributions
 named-0.016
 saw-0.016
 there-0.016
 nearby-0.014
 heard-0.014
Token
Token ID220
Feature activation-5.31e-02

Pos logit contributions
 apology+0.027
 Deal+0.026
anks+0.025
oor+0.025
der+0.025
Neg logit contributions
 saw-0.029
 named-0.029
 there-0.027
 nearby-0.024
 hall-0.023
Token
Token ID198
Feature activation-2.21e-02

Pos logit contributions
 apology+0.129
 accomplishment+0.126
iously+0.126
der+0.120
aned+0.120
Neg logit contributions
 there-0.071
 named-0.067
 saw-0.065
 called-0.051
 nearby-0.051
Token
Token ID198
Feature activation-2.58e-02

Pos logit contributions
iously+0.053
der+0.053
 apology+0.052
anks+0.052
 accomplishment+0.050
Neg logit contributions
 named-0.031
 saw-0.031
 there-0.029
 nearby-0.025
 heard-0.023
TokenJ
Token ID41
Feature activation-4.32e-02

Pos logit contributions
 apology+0.050
 Deal+0.049
oor+0.048
anks+0.048
Less+0.047
Neg logit contributions
 named-0.045
 saw-0.044
 there-0.041
 nearby-0.037
 looking-0.036
Tokenill
Token ID359
Feature activation+2.00e-02

Pos logit contributions
 apology+0.070
 Deal+0.067
anks+0.066
oor+0.065
Less+0.064
Neg logit contributions
 saw-0.089
 named-0.089
 there-0.085
 nearby-0.077
 heard-0.075
Token understood
Token ID7247
Feature activation+1.11e-03

Pos logit contributions
 named+0.025
 there+0.025
 saw+0.025
 hall+0.022
 abandoned+0.022
Neg logit contributions
 apology-0.043
aned-0.042
der-0.042
iously-0.042
oor-0.041
Token why
Token ID1521
Feature activation-2.64e-04

Pos logit contributions
 saw+0.002
 named+0.002
 there+0.001
 nearby+0.001
 abandoned+0.001
Neg logit contributions
 apology-0.003
 Deal-0.002
anks-0.002
oor-0.002
der-0.002
Token Steve
Token ID6542
Feature activation+1.94e-02

Pos logit contributions
 apology+0.001
oor+0.001
anks+0.001
 Deal+0.001
der+0.001
Neg logit contributions
 saw-0.000
 named-0.000
 there-0.000
 nearby-0.000
 abandoned-0.000
Token was
Token ID373
Feature activation+1.20e-02

Pos logit contributions
 named+0.024
 saw+0.021
 there+0.021
 hall+0.020
 nearby+0.018
Neg logit contributions
 apology-0.043
der-0.042
iously-0.042
aned-0.042
oor-0.041
Token kill
Token ID1494
Feature activation-2.81e-02

Pos logit contributions
 apology+0.012
anks+0.011
 Deal+0.011
oor+0.011
der+0.011
Neg logit contributions
 saw-0.011
 named-0.011
 there-0.010
 nearby-0.009
 hall-0.009
Token him
Token ID683
Feature activation-4.08e-02

Pos logit contributions
 apology+0.046
anks+0.043
 Deal+0.043
oor+0.043
der+0.042
Neg logit contributions
 named-0.057
 saw-0.056
 there-0.051
 abandoned-0.048
 nearby-0.047
Token.
Token ID13
Feature activation-4.12e-02

Pos logit contributions
 apology+0.071
anks+0.069
oor+0.068
 Deal+0.067
der+0.067
Neg logit contributions
 named-0.078
 saw-0.078
 there-0.073
 nearby-0.066
 abandoned-0.065
Token He
Token ID679
Feature activation-3.12e-02

Pos logit contributions
 apology+0.071
 Deal+0.068
anks+0.067
oor+0.067
der+0.065
Neg logit contributions
 saw-0.081
 named-0.081
 there-0.076
 nearby-0.070
 abandoned-0.067
Token is
Token ID318
Feature activation-1.69e-02

Pos logit contributions
 apology+0.058
anks+0.056
oor+0.055
 Deal+0.054
der+0.054
Neg logit contributions
 named-0.054
 saw-0.052
 there-0.050
 hall-0.046
 nearby-0.046
Token my
Token ID616
Feature activation-4.91e-02

Pos logit contributions
 apology+0.031
 Deal+0.031
 myself+0.030
anks+0.030
 accomplishment+0.030
Neg logit contributions
 saw-0.027
 named-0.026
 hall-0.025
 there-0.023
 nearby-0.022
Token friend
Token ID1545
Feature activation-2.69e-02

Pos logit contributions
 apology+0.111
anks+0.104
 accomplishment+0.104
der+0.103
 Deal+0.102
Neg logit contributions
 named-0.071
 saw-0.068
 there-0.063
 nearby-0.056
 abandoned-0.055
Token.
Token ID13
Feature activation-3.47e-02

Pos logit contributions
 apology+0.038
oor+0.036
anks+0.036
der+0.036
 Deal+0.035
Neg logit contributions
 saw-0.057
 named-0.056
 there-0.053
 abandoned-0.050
 hall-0.050
Token He
Token ID679
Feature activation-2.12e-02

Pos logit contributions
 apology+0.061
anks+0.059
oor+0.059
 Deal+0.058
der+0.057
Neg logit contributions
 named-0.067
 saw-0.066
 there-0.063
 nearby-0.056
 hall-0.054
Token came
Token ID1625
Feature activation-1.34e-02

Pos logit contributions
 apology+0.039
anks+0.036
oor+0.035
der+0.035
 Deal+0.034
Neg logit contributions
 named-0.037
 there-0.035
 saw-0.035
 hall-0.033
 abandoned-0.032
Token to
Token ID284
Feature activation-1.41e-02

Pos logit contributions
 apology+0.030
 Deal+0.029
anks+0.029
oor+0.028
der+0.028
Neg logit contributions
 named-0.020
 saw-0.020
 there-0.018
 nearby-0.016
 abandoned-0.015
Tokenmy
Token ID1820
Feature activation-2.68e-04

Pos logit contributions
 apology+0.002
der+0.002
anks+0.002
 decisions+0.002
 choices+0.002
Neg logit contributions
 named-0.002
 saw-0.002
 abandoned-0.002
 there-0.002
eh-0.002
Token how
Token ID703
Feature activation-1.42e-02

Pos logit contributions
 apology+0.000
der+0.000
anks+0.000
oor+0.000
iously+0.000
Neg logit contributions
 named-0.000
 saw-0.000
 there-0.000
 abandoned-0.000
 nearby-0.000
Token they
Token ID484
Feature activation-3.10e-02

Pos logit contributions
 apology+0.032
 Deal+0.031
anks+0.031
oor+0.031
aned+0.030
Neg logit contributions
 saw-0.021
 named-0.021
 there-0.020
 nearby-0.017
 looking-0.016
Token made
Token ID925
Feature activation-1.60e-02

Pos logit contributions
 apology+0.067
 Deal+0.064
anks+0.064
oor+0.064
der+0.062
Neg logit contributions
 named-0.047
 saw-0.046
 there-0.044
 nearby-0.038
 abandoned-0.037
Token their
Token ID511
Feature activation-6.29e-02

Pos logit contributions
 apology+0.035
 Deal+0.034
anks+0.034
oor+0.034
der+0.033
Neg logit contributions
 saw-0.024
 named-0.024
 there-0.022
 nearby-0.019
 looking-0.019
Token wrap
Token ID14441
Feature activation-3.83e-02

Pos logit contributions
 apology+0.129
anks+0.122
 Deal+0.122
oor+0.120
 accomplishment+0.119
Neg logit contributions
 named-0.105
 saw-0.105
 there-0.097
 nearby-0.087
 abandoned-0.084
Token and
Token ID290
Feature activation-5.76e-02

Pos logit contributions
 apology+0.083
anks+0.080
oor+0.078
der+0.077
 Deal+0.076
Neg logit contributions
 saw-0.058
 named-0.058
 there-0.052
 nearby-0.046
 abandoned-0.046
Token loop
Token ID9052
Feature activation-3.92e-02

Pos logit contributions
 apology+0.113
oor+0.110
anks+0.110
 Deal+0.109
der+0.106
Neg logit contributions
 named-0.100
 saw-0.099
 there-0.093
 nearby-0.082
 looking-0.079
Token with
Token ID351
Feature activation-2.12e-02

Pos logit contributions
 apology+0.078
anks+0.076
oor+0.075
 Deal+0.074
der+0.073
Neg logit contributions
 named-0.066
 saw-0.066
 there-0.061
 nearby-0.054
 abandoned-0.053
Token paper
Token ID3348
Feature activation-3.64e-02

Pos logit contributions
 apology+0.042
 Deal+0.040
anks+0.040
oor+0.040
der+0.039
Neg logit contributions
 named-0.036
 saw-0.036
 there-0.033
 nearby-0.030
 abandoned-0.029
Token and
Token ID290
Feature activation-5.02e-02

Pos logit contributions
 apology+0.082
anks+0.080
oor+0.078
der+0.078
 Deal+0.078
Neg logit contributions
 named-0.052
 saw-0.052
 there-0.047
 nearby-0.041
 abandoned-0.040
Token that
Token ID326
Feature activation-1.04e-02

Pos logit contributions
 named+0.018
 saw+0.018
 there+0.016
 hall+0.014
 nearby+0.014
Neg logit contributions
 apology-0.027
 Deal-0.027
oor-0.026
anks-0.026
der-0.026
Token these
Token ID777
Feature activation-3.65e-02

Pos logit contributions
 apology+0.021
anks+0.020
 Deal+0.020
oor+0.020
der+0.020
Neg logit contributions
 named-0.017
 saw-0.016
 there-0.014
 nearby-0.014
 abandoned-0.013
Token views
Token ID5009
Feature activation-4.04e-03

Pos logit contributions
 apology+0.071
oor+0.065
anks+0.065
 accomplishment+0.065
der+0.065
Neg logit contributions
 named-0.065
 saw-0.062
 there-0.058
 nearby-0.052
 hall-0.052
Token were
Token ID547
Feature activation-1.46e-02

Pos logit contributions
 apology+0.009
oor+0.009
der+0.009
anks+0.009
 accomplishment+0.009
Neg logit contributions
 named-0.006
 saw-0.005
 there-0.005
 hall-0.004
 abandoned-0.004
Token made
Token ID925
Feature activation-2.74e-02

Pos logit contributions
 apology+0.029
 accomplishment+0.029
der+0.029
oor+0.029
anks+0.028
Neg logit contributions
 named-0.023
 saw-0.022
 there-0.018
 hall-0.018
 nearby-0.017
Token just
Token ID655
Feature activation-4.04e-02

Pos logit contributions
 apology+0.054
oor+0.053
anks+0.053
der+0.051
 accomplishment+0.051
Neg logit contributions
 named-0.046
 saw-0.045
 there-0.040
 abandoned-0.037
 nearby-0.037
Token for
Token ID329
Feature activation-1.79e-02

Pos logit contributions
 apology+0.087
der+0.084
anks+0.084
 accomplishment+0.084
iously+0.083
Neg logit contributions
 named-0.061
 saw-0.058
 there-0.051
 hall-0.048
 nearby-0.047
Token him
Token ID683
Feature activation-3.57e-02

Pos logit contributions
 apology+0.037
 Deal+0.036
anks+0.036
oor+0.036
der+0.035
Neg logit contributions
 named-0.029
 saw-0.029
 there-0.027
 nearby-0.023
 looking-0.022
Token and
Token ID290
Feature activation-4.15e-02

Pos logit contributions
 apology+0.074
anks+0.073
oor+0.073
der+0.071
iously+0.071
Neg logit contributions
 named-0.054
 saw-0.053
 there-0.047
 abandoned-0.043
 hall-0.042
Token he
Token ID339
Feature activation-1.53e-02

Pos logit contributions
 apology+0.087
der+0.084
anks+0.083
oor+0.082
 Deal+0.082
Neg logit contributions
 named-0.063
 saw-0.061
 there-0.056
 nearby-0.051
 hall-0.049
Token belonged
Token ID19611
Feature activation-1.84e-02

Pos logit contributions
 apology+0.035
oor+0.034
der+0.034
anks+0.034
 Deal+0.034
Neg logit contributions
 named-0.020
 saw-0.019
 there-0.018
 nearby-0.015
 abandoned-0.015
Token always
Token ID1464
Feature activation-3.85e-02

Pos logit contributions
 apology+0.052
anks+0.051
oor+0.051
 Deal+0.051
der+0.049
Neg logit contributions
 named-0.048
 saw-0.047
 there-0.044
 nearby-0.039
 looking-0.038
Token ask
Token ID1265
Feature activation-3.08e-02

Pos logit contributions
 apology+0.071
oor+0.070
anks+0.070
 Deal+0.068
der+0.068
Neg logit contributions
 saw-0.071
 named-0.071
 there-0.066
 nearby-0.059
 looking-0.058
Token before
Token ID878
Feature activation-6.95e-03

Pos logit contributions
oor+0.062
 apology+0.061
aned+0.061
anks+0.060
 Deal+0.059
Neg logit contributions
 named-0.053
 saw-0.051
 there-0.049
 nearby-0.043
 called-0.043
Token you
Token ID345
Feature activation-3.89e-02

Pos logit contributions
 apology+0.013
 Deal+0.012
anks+0.012
oor+0.012
aned+0.012
Neg logit contributions
 named-0.013
 saw-0.013
 there-0.012
 nearby-0.011
 looking-0.011
Token try
Token ID1949
Feature activation-2.21e-02

Pos logit contributions
 apology+0.081
oor+0.079
 Deal+0.078
anks+0.077
aned+0.076
Neg logit contributions
 saw-0.065
 named-0.063
 there-0.058
 nearby-0.052
 heard-0.051
Token something
Token ID1223
Feature activation-3.85e-02

Pos logit contributions
oor+0.054
 apology+0.053
 Deal+0.052
aned+0.052
iously+0.051
Neg logit contributions
 there-0.029
 saw-0.029
 named-0.028
 nearby-0.023
 looking-0.023
Token new
Token ID649
Feature activation-6.08e-02

Pos logit contributions
oor+0.096
 Deal+0.096
aned+0.095
 apology+0.094
anks+0.094
Neg logit contributions
 there-0.046
 saw-0.046
 named-0.044
 nearby-0.036
 hidden-0.035
Token.
Token ID13
Feature activation-3.89e-02

Pos logit contributions
 Deal+0.147
oor+0.147
 apology+0.147
aned+0.146
Less+0.143
Neg logit contributions
 named-0.076
 saw-0.075
 there-0.073
 nearby-0.059
 looking-0.055
Token There
Token ID1318
Feature activation+9.44e-04

Pos logit contributions
aned+0.072
iously+0.072
oor+0.072
 accomplishment+0.072
 apology+0.071
Neg logit contributions
 named-0.076
 saw-0.067
 there-0.067
 called-0.058
 who-0.057
Token are
Token ID389
Feature activation-8.93e-03

Pos logit contributions
 named+0.002
 saw+0.002
 there+0.001
 nearby+0.001
 abandoned+0.001
Neg logit contributions
 apology-0.002
anks-0.002
 Deal-0.002
oor-0.002
der-0.002
Token many
Token ID867
Feature activation-3.61e-02

Pos logit contributions
 apology+0.018
 Deal+0.017
anks+0.017
der+0.017
oor+0.017
Neg logit contributions
 named-0.015
 saw-0.015
 there-0.014
 nearby-0.012
 hall-0.012