TOP ACTIVATIONS
MAX = 7.792

a rainbow . ” Molly ’ s eyes
the rain . ” Daisy and the frog talked about
one yet ! <|endoftext|> Joe y had a favourite pattern .
The end . <|endoftext|> Joe y was young . He liked
for nothing . <|endoftext|> Joe y was walking along the road
being kind . ” Jenny smiled and said ,
't forget it . Jay den j ogged his way home
never there ! <|endoftext|> Joe y was a little boy with
Until one day , Bob bie saw the o at field
. El isa 's mum was very compassionate
J enny and her mom walked on
. Da isy was delighted and stepped into
. Jon ny smiled at his parents and
each other ! <|endoftext|> Joe y was sitting in his room
give up ! <|endoftext|> Joe y was out in the yard
to other customers . Jo anne and Joey said goodbye as
rely on each other ". Tina hugged her back and knew
moved away . ” Daisy looked sad as she didn
” Jo jo dug into his pocket and
meet you . ” Daisy was surprised and smiled .

MIN ACTIVATIONS
MIN = -5.779

day , she saw a shiny ball high up on a
girl named Sue found a shiny coin on the ground .
rabbit named Tim found a shiny trophy in the forest .
Tim saw a big , shiny motorcycle . It looked different
Lily . Lily found a shiny diamond on the ground .
, they saw a big shiny car parked near the swings
Sally . Sally found a shiny penny on the ground .
playing , she found a shiny coin on the ground .
She saw a small , lost dog near her door .
cat named Tom saw a shiny ornament hanging on a tree
playing , he found a shiny coin on the floor .
day , she found a shiny gold coin on the ground
named Tim went to get a haircut . He was very
day , they found a shiny ring in her drawer .
day , Tom found a shiny copper coin in the garden
. They saw a big shiny crystal on the grass .
park . They find a shiny stone on the ground .
, Tim my found a shiny rock on the ground .
day , she saw a shiny penny on the ground and
day , she saw a shiny pe b ble on the

INTERVAL 4.399 - 7.792
CONTAINS 1.777%

much better now . The girl was very proud of what
discovery . Joe smiled and was really happy
it . Ch loe ran to her mom and
y ummy w affle . Mia was happy she could help
high off the ground . Dave wanted so badly to rest

INTERVAL 1.006 - 4.399
CONTAINS 25.100%

Finally , they arrived at the church .
and textures she saw . She wrote how the bright sunshine
't touch it ". His mom was so proud that he
she was flying ! After a while ,
wanted to help , so she asked the rabbit to follow

INTERVAL -2.386 - 1.006
CONTAINS 66.655%

and has horns . He looks strong and attractive . Anna
stopped and looked up and smiled , because it was beautiful
gems are shiny and pretty . They have red , blue
give them kisses . She did not give them cookies .
Maria . She was three years old and liked to explore

INTERVAL -5.779 - -2.386
CONTAINS 6.468%

Lucy and Jamie had to go get it . When they
t wear my new sweater anymore ! ”
time , there was a little boy named Tim my .
tree . She wanted to climb it ! "
fish looked up with a surprised look and the two of
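
The four INTERVAL ranges above look like an equal-width split of the span from MIN to MAX, and each CONTAINS figure looks like the share of activations that fall inside that range. The sketch below reproduces that arithmetic. It is only an illustration under those assumptions: the function names `equal_width_intervals` and `interval_shares` are hypothetical, the bin count of 4 is read off the listing, and the MIN and MAX values are the rounded ones printed above, so the computed edges agree with the listed ones only to about three decimals.

```python
def equal_width_intervals(lo, hi, n_bins=4):
    """Split [lo, hi] into n_bins equal-width (low, high) pairs, ascending."""
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    edges[-1] = hi  # pin the final edge so the top bin ends exactly at hi
    return list(zip(edges[:-1], edges[1:]))


def interval_shares(activations, intervals):
    """Return the percentage of activations that fall in each interval.

    Bins are half-open [low, high), except the last one, which also
    includes its upper edge so the maximum value is counted.
    """
    counts = [0] * len(intervals)
    last = len(intervals) - 1
    for a in activations:
        for i, (low, high) in enumerate(intervals):
            if low <= a < high or (i == last and a == high):
                counts[i] += 1
                break
    total = len(activations)
    return [100.0 * c / total for c in counts]


# MIN and MAX as printed in the listing (already rounded, an assumption).
intervals = equal_width_intervals(-5.779, 7.792)
for low, high in reversed(intervals):  # the listing shows the top bin first
    print(f"INTERVAL {low:.3f} - {high:.3f}")
```

As a consistency check on the listing itself, the four CONTAINS values (1.777, 25.100, 66.655 and 6.468) sum to 100.000, which is what a partition of all activations into these bins would give.
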
Token a
Token ID257
Feature activation-3.13e-01

Pos logit contributions
ave+0.116
olded+0.109
oped+0.107
 reached+0.104
 thanked+0.100
Neg logit contributions
 things-0.091
 new-0.090
 thing-0.082
 job-0.081
 attention-0.081
Token rainbow
Token ID27223
Feature activation+1.84e+00

Pos logit contributions
 attention+0.006
 cotton+0.006
 new+0.006
 things+0.006
 thing+0.006
Neg logit contributions
olded-0.021
 could-0.021
ave-0.020
 reached-0.020
 agreed-0.020
Token.
Token ID13
Feature activation+3.10e+00

Pos logit contributions
olded+0.116
ave+0.113
 From+0.111
 understood+0.109
 reached+0.109
Neg logit contributions
 things-0.050
 thing-0.047
 cotton-0.047
 new-0.045
 food-0.044
Tokenâ
Token ID22940
Feature activation+1.70e+00

Pos logit contributions
 forgot+0.145
ave+0.140
 couldn+0.139
 stared+0.135
 understood+0.133
Neg logit contributions
 TV-0.126
 new-0.116
 cotton-0.115
 things-0.112
 television-0.111
Token
Token ID26391
Feature activation+2.57e+00

Pos logit contributions
 could+0.078
 couldn+0.077
olded+0.072
 screamed+0.071
 reached+0.071
Neg logit contributions
 attention-0.069
 job-0.068
 thing-0.063
 sunny-0.060
 things-0.060
Token Molly
Token ID30236
Feature activation+7.56e+00

Pos logit contributions
ave+0.118
olded+0.118
 thanked+0.110
 apologized+0.110
 apologised+0.109
Neg logit contributions
 things-0.107
 long-0.102
 lunch-0.102
 new-0.097
 sight-0.095
Tokenâ
Token ID22940
Feature activation+2.04e+00

Pos logit contributions
olded+0.288
ave+0.288
oped+0.269
ver+0.244
via+0.243
Neg logit contributions
 red-0.335
 new-0.328
 things-0.312
 big-0.311
 pink-0.311
Token
Token ID26391
Feature activation+2.77e+00

Pos logit contributions
 could+0.097
 couldn+0.097
olded+0.088
 hadn+0.088
 wondered+0.088
Neg logit contributions
 attention-0.081
 job-0.078
 thing-0.072
 sight-0.068
 sunny-0.068
Token
Token ID8151
Feature activation+3.59e+00

Pos logit contributions
ave+0.135
olded+0.134
 apologized+0.129
 apologised+0.128
 thanked+0.126
Neg logit contributions
 things-0.106
 long-0.101
 lunch-0.096
 new-0.094
 sight-0.094
Tokens
Token ID82
Feature activation+2.37e+00

Pos logit contributions
 wondered+0.143
 stared+0.142
 reached+0.141
 thanked+0.140
 prayed+0.138
Neg logit contributions
 new-0.150
 home-0.143
 attention-0.142
 red-0.132
 long-0.132
Token eyes
Token ID2951
Feature activation+2.27e+00

Pos logit contributions
ave+0.130
olded+0.127
 reached+0.124
 suggested+0.124
 agreed+0.122
Neg logit contributions
 new-0.084
 attention-0.081
 things-0.080
 sight-0.077
 job-0.075
Token the
Token ID262
Feature activation-1.09e+00

Pos logit contributions
 attention+0.001
 cotton+0.001
 things+0.001
 new+0.001
 food+0.001
Neg logit contributions
 could-0.002
ave-0.002
olded-0.002
 reached-0.002
 agreed-0.002
Token rain
Token ID6290
Feature activation+1.15e+00

Pos logit contributions
 attention+0.021
 cotton+0.019
 things+0.018
 new+0.017
 food+0.017
Neg logit contributions
 could-0.074
 reached-0.072
 agreed-0.072
 couldn-0.071
olded-0.071
Token.
Token ID13
Feature activation+2.78e+00

Pos logit contributions
olded+0.070
 hadn+0.069
 could+0.069
 reached+0.068
 couldn+0.068
Neg logit contributions
 attention-0.031
 things-0.030
 cotton-0.030
 new-0.028
 TV-0.028
Tokenâ
Token ID22940
Feature activation+1.80e+00

Pos logit contributions
 forgot+0.123
ave+0.118
 couldn+0.117
 hadn+0.113
 managed+0.113
Neg logit contributions
 TV-0.123
 cotton-0.118
 new-0.116
 attention-0.113
 dessert-0.110
Token
Token ID26391
Feature activation+2.50e+00

Pos logit contributions
 could+0.081
 couldn+0.081
 hadn+0.077
olded+0.074
 wondered+0.074
Neg logit contributions
 attention-0.075
 job-0.073
 thing-0.068
 things-0.066
 sunny-0.064
Token Daisy
Token ID40355
Feature activation+7.55e+00

Pos logit contributions
ave+0.119
olded+0.117
 apologized+0.108
 understood+0.107
 apologised+0.106
Neg logit contributions
 things-0.104
 lunch-0.100
 long-0.100
 view-0.097
 cotton-0.097
Token and
Token ID290
Feature activation+1.72e+00

Pos logit contributions
olded+0.289
ave+0.273
oped+0.273
Well+0.248
ver+0.237
Neg logit contributions
 new-0.342
 red-0.342
 things-0.326
 pink-0.326
 big-0.318
Token the
Token ID262
Feature activation-6.63e-01

Pos logit contributions
olded+0.111
ave+0.109
oped+0.104
 hadn+0.102
 could+0.101
Neg logit contributions
 things-0.050
 attention-0.048
 new-0.046
 lunch-0.045
 food-0.044
Token frog
Token ID21264
Feature activation+5.35e+00

Pos logit contributions
 attention+0.020
 cotton+0.019
 things+0.018
 new+0.018
 food+0.017
Neg logit contributions
 could-0.038
olded-0.038
 reached-0.037
 agreed-0.037
 couldn-0.037
Token talked
Token ID6619
Feature activation+6.78e-02

Pos logit contributions
olded+0.174
 From+0.152
lett+0.151
ave+0.148
Well+0.146
Neg logit contributions
 new-0.291
 things-0.279
 red-0.273
 food-0.268
 home-0.267
Token about
Token ID546
Feature activation-5.78e-05

Pos logit contributions
olded+0.004
 reached+0.004
 could+0.004
 hadn+0.004
ave+0.004
Neg logit contributions
 attention-0.002
 new-0.002
 things-0.002
 cotton-0.002
 food-0.002
Token one
Token ID530
Feature activation+3.26e-01

Pos logit contributions
 attention+0.036
 cotton+0.034
 things+0.030
 groceries+0.030
 new+0.030
Neg logit contributions
 could-0.092
 agreed-0.089
 reached-0.089
 couldn-0.089
 thanked-0.088
Token yet
Token ID1865
Feature activation+2.09e+00

Pos logit contributions
olded+0.018
ave+0.017
 reached+0.017
 thanked+0.017
 From+0.017
Neg logit contributions
 attention-0.011
 new-0.010
 thing-0.010
 things-0.010
 gift-0.010
Token!
Token ID0
Feature activation+3.51e+00

Pos logit contributions
olded+0.129
ave+0.127
oped+0.122
 reached+0.119
 From+0.117
Neg logit contributions
 new-0.062
 things-0.060
 attention-0.060
 thing-0.056
 food-0.055
Token<|endoftext|>
Token ID50256
Feature activation+3.05e+00

Pos logit contributions
 forgot+0.193
ave+0.182
'd+0.175
 refused+0.171
 counted+0.167
Neg logit contributions
 new-0.117
 sight-0.113
 home-0.110
 TV-0.109
 cotton-0.104
TokenJoe
Token ID19585
Feature activation+5.59e+00

Pos logit contributions
olded+0.184
 forgot+0.168
 apologised+0.166
ave+0.163
 apologized+0.161
Neg logit contributions
 things-0.104
 new-0.098
 thing-0.096
 sight-0.096
 view-0.090
Tokeny
Token ID88
Feature activation+7.54e+00

Pos logit contributions
olded+0.307
Well+0.268
 apologised+0.259
ave+0.257
ver+0.255
Neg logit contributions
 red-0.184
 home-0.168
 things-0.166
 new-0.157
 pink-0.157
Token had
Token ID550
Feature activation+1.31e+00

Pos logit contributions
olded+0.354
ver+0.324
oped+0.317
ave+0.315
Well+0.304
Neg logit contributions
 red-0.283
 things-0.275
 new-0.271
 sand-0.265
 big-0.260
Token a
Token ID257
Feature activation-1.87e+00

Pos logit contributions
olded+0.096
oped+0.094
ave+0.093
 could+0.090
 hadn+0.090
Neg logit contributions
 new-0.023
 lunch-0.022
 sand-0.021
 red-0.021
 cotton-0.020
Token favourite
Token ID12507
Feature activation-1.80e+00

Pos logit contributions
 attention+0.050
 cotton+0.042
 things+0.041
 food+0.038
 groceries+0.037
Neg logit contributions
 could-0.116
 couldn-0.113
 agreed-0.112
 thanked-0.111
 hadn-0.111
Token pattern
Token ID3912
Feature activation-1.22e-01

Pos logit contributions
 attention+0.047
 cotton+0.041
 things+0.038
 groceries+0.036
 food+0.036
Neg logit contributions
 could-0.110
 couldn-0.107
 agreed-0.107
 thanked-0.106
 reached-0.105
Token.
Token ID13
Feature activation+7.27e-01

Pos logit contributions
 attention+0.003
 cotton+0.003
 things+0.003
 new+0.003
 food+0.003
Neg logit contributions
 could-0.007
olded-0.007
 reached-0.007
 thanked-0.007
 hadn-0.007
TokenThe
Token ID464
Feature activation-1.64e+00

Pos logit contributions
olded+0.086
ave+0.083
 forgot+0.082
 could+0.080
 couldn+0.080
Neg logit contributions
 new-0.060
 cotton-0.059
 sunny-0.058
 view-0.054
 things-0.054
Token end
Token ID886
Feature activation-4.42e-01

Pos logit contributions
 attention+0.052
 cotton+0.049
 sand+0.044
 things+0.044
 groceries+0.043
Neg logit contributions
 could-0.092
 couldn-0.089
 hadn-0.088
 thanked-0.088
 agreed-0.087
Token.
Token ID13
Feature activation+2.44e+00

Pos logit contributions
 attention+0.011
 new+0.011
 things+0.011
 cotton+0.010
 thing+0.010
Neg logit contributions
olded-0.027
 could-0.027
 reached-0.027
 thanked-0.027
ave-0.027
Token<|endoftext|>
Token ID50256
Feature activation+2.08e+00

Pos logit contributions
ave+0.123
 forgot+0.120
olded+0.120
 stared+0.115
 screamed+0.112
Neg logit contributions
 new-0.088
 sight-0.083
 lunch-0.080
 gift-0.080
 TV-0.080
TokenJoe
Token ID19585
Feature activation+5.12e+00

Pos logit contributions
olded+0.115
 forgot+0.106
 apologised+0.104
ave+0.102
 apologized+0.101
Neg logit contributions
 things-0.077
 new-0.075
 thing-0.072
 sight-0.072
 lunch-0.071
Tokeny
Token ID88
Feature activation+7.51e+00

Pos logit contributions
olded+0.288
Well+0.257
 hadn+0.248
 apologised+0.245
ave+0.242
Neg logit contributions
 home-0.162
 red-0.156
 things-0.148
 sand-0.140
 new-0.139
Token was
Token ID373
Feature activation-4.37e-01

Pos logit contributions
olded+0.368
oped+0.343
Well+0.337
ave+0.335
ver+0.332
Neg logit contributions
 home-0.251
 new-0.244
 big-0.240
 things-0.238
 lunch-0.238
Token young
Token ID1862
Feature activation-8.29e-01

Pos logit contributions
 attention+0.013
 cotton+0.012
 new+0.012
 things+0.012
 dessert+0.011
Neg logit contributions
 could-0.026
olded-0.025
 agreed-0.025
ave-0.024
 reached-0.024
Token.
Token ID13
Feature activation+1.48e+00

Pos logit contributions
 attention+0.030
 cotton+0.028
 things+0.028
 new+0.027
 food+0.026
Neg logit contributions
 could-0.043
 reached-0.041
olded-0.041
 agreed-0.041
 couldn-0.041
Token He
Token ID679
Feature activation+4.22e+00

Pos logit contributions
olded+0.064
ave+0.063
 forgot+0.062
 screamed+0.060
 thanked+0.059
Neg logit contributions
 thing-0.065
 things-0.065
 sight-0.065
 TV-0.064
 sand-0.064
Token liked
Token ID8288
Feature activation-1.38e+00

Pos logit contributions
olded+0.170
ave+0.142
via+0.139
 exclaimed+0.138
ver+0.135
Neg logit contributions
 red-0.203
 things-0.201
 dark-0.200
 big-0.198
 shiny-0.198
Token for
Token ID329
Feature activation-4.73e-01

Pos logit contributions
 could+0.021
olded+0.021
ave+0.021
 reached+0.021
oped+0.021
Neg logit contributions
 attention-0.012
 new-0.011
 things-0.011
 food-0.011
 cotton-0.010
Token nothing
Token ID2147
Feature activation+3.51e-02

Pos logit contributions
 attention+0.014
 things+0.013
 cotton+0.013
 new+0.013
 food+0.012
Neg logit contributions
 could-0.028
olded-0.027
ave-0.027
 reached-0.027
 agreed-0.027
Token.
Token ID13
Feature activation+2.61e+00

Pos logit contributions
olded+0.002
 could+0.002
 reached+0.002
ave+0.002
 thanked+0.002
Neg logit contributions
 attention-0.001
 new-0.001
 things-0.001
 cotton-0.001
 food-0.001
Token<|endoftext|>
Token ID50256
Feature activation+2.05e+00

Pos logit contributions
 forgot+0.132
'd+0.131
ave+0.130
 managed+0.128
 agreed+0.127
Neg logit contributions
 new-0.089
 home-0.083
 sight-0.080
 things-0.077
 purpose-0.076
TokenJoe
Token ID19585
Feature activation+5.37e+00

Pos logit contributions
olded+0.110
 apologised+0.099
 forgot+0.098
ave+0.097
 apologized+0.096
Neg logit contributions
 things-0.081
 new-0.080
 thing-0.077
 sight-0.076
 lunch-0.076
Tokeny
Token ID88
Feature activation+7.51e+00

Pos logit contributions
olded+0.252
Well+0.223
ave+0.210
 hadn+0.205
ver+0.203
Neg logit contributions
 red-0.210
 home-0.208
 things-0.204
 sand-0.196
 new-0.187
Token was
Token ID373
Feature activation-1.40e-01

Pos logit contributions
olded+0.298
ave+0.272
Well+0.267
oped+0.267
ver+0.265
Neg logit contributions
 things-0.314
 sand-0.310
 lunch-0.309
 red-0.308
 new-0.304
Token walking
Token ID6155
Feature activation-1.15e+00

Pos logit contributions
 attention+0.004
 new+0.004
 cotton+0.004
 things+0.004
 red+0.003
Neg logit contributions
 could-0.008
olded-0.008
ave-0.008
 agreed-0.008
 suggested-0.008
Token along
Token ID1863
Feature activation-1.07e+00

Pos logit contributions
 attention+0.035
 cotton+0.033
 things+0.031
 new+0.031
 food+0.030
Neg logit contributions
 could-0.066
 reached-0.063
 agreed-0.063
 couldn-0.063
 thanked-0.063
Token the
Token ID262
Feature activation-1.51e+00

Pos logit contributions
 attention+0.034
 cotton+0.032
 things+0.031
 new+0.030
 food+0.030
Neg logit contributions
 could-0.060
 reached-0.058
 agreed-0.058
 couldn-0.057
 thanked-0.057
Token road
Token ID2975
Feature activation+5.75e-01

Pos logit contributions
 attention+0.046
 cotton+0.043
 gift+0.039
 things+0.039
 dessert+0.039
Neg logit contributions
 could-0.085
 couldn-0.083
 thanked-0.083
 reached-0.082
 agreed-0.082
Token being
Token ID852
Feature activation+1.15e+00

Pos logit contributions
oped+0.107
ave+0.107
 reached+0.105
olded+0.103
 agreed+0.100
Neg logit contributions
 attention-0.073
 things-0.072
 new-0.067
 food-0.067
 gift-0.062
Token kind
Token ID1611
Feature activation+1.96e+00

Pos logit contributions
oped+0.071
 could+0.071
 reached+0.071
 couldn+0.071
ave+0.070
Neg logit contributions
 new-0.029
 attention-0.028
 things-0.028
 food-0.026
 red-0.025
Token.
Token ID13
Feature activation+3.06e+00

Pos logit contributions
 reached+0.132
 stared+0.131
 hadn+0.130
oped+0.129
olded+0.128
Neg logit contributions
 attention-0.043
 things-0.042
 thing-0.038
 new-0.035
 food-0.034
Tokenâ
Token ID22940
Feature activation+1.99e+00

Pos logit contributions
 stared+0.131
ave+0.128
 forgot+0.126
 spoke+0.125
 lived+0.125
Neg logit contributions
 new-0.129
 TV-0.126
 attention-0.124
 things-0.121
 lunch-0.116
Token
Token ID26391
Feature activation+3.09e+00

Pos logit contributions
 could+0.091
 couldn+0.088
 reached+0.084
 screamed+0.083
 hadn+0.082
Neg logit contributions
 attention-0.082
 job-0.081
 thing-0.073
 things-0.072
 temptation-0.068
Token Jenny
Token ID28437
Feature activation+7.50e+00

Pos logit contributions
ave+0.133
olded+0.119
 apologized+0.115
 agreed+0.115
 stared+0.114
Neg logit contributions
 things-0.140
 long-0.132
 job-0.129
 new-0.127
 lunch-0.124
Token smiled
Token ID13541
Feature activation+4.46e-02

Pos logit contributions
oped+0.258
olded+0.242
 Did+0.229
os+0.223
anced+0.216
Neg logit contributions
 new-0.327
 things-0.323
 red-0.319
 cake-0.312
 big-0.309
Token and
Token ID290
Feature activation+3.90e+00

Pos logit contributions
 could+0.002
olded+0.002
 reached+0.002
 hadn+0.002
ave+0.002
Neg logit contributions
 attention-0.002
 new-0.001
 things-0.001
 cotton-0.001
 red-0.001
Token said
Token ID531
Feature activation+4.89e-01

Pos logit contributions
os+0.121
oped+0.118
olded+0.105
ave+0.105
 hadn+0.104
Neg logit contributions
 new-0.216
 red-0.203
 long-0.203
 good-0.201
 shiny-0.199
Token,
Token ID11
Feature activation+2.58e+00

Pos logit contributions
 hadn+0.030
ave+0.029
 reached+0.029
 could+0.029
olded+0.029
Neg logit contributions
 attention-0.013
 new-0.013
 things-0.013
 cotton-0.012
 thing-0.012
Token
Token ID6184
Feature activation+4.63e-01

Pos logit contributions
ave+0.143
 forgot+0.136
olded+0.134
os+0.132
oped+0.129
Neg logit contributions
 new-0.085
 long-0.082
 TV-0.080
 red-0.079
 round-0.079
Token't
Token ID470
Feature activation+7.92e-01

Pos logit contributions
 agreed+0.101
ave+0.099
 nodded+0.099
 stared+0.099
olded+0.098
Neg logit contributions
 TV-0.070
 attention-0.066
 any-0.065
 things-0.064
 new-0.063
Token forget
Token ID6044
Feature activation+6.59e-01

Pos logit contributions
olded+0.038
ave+0.037
oped+0.037
 From+0.036
 could+0.036
Neg logit contributions
 attention-0.033
 things-0.030
 new-0.030
 thing-0.029
 job-0.029
Token it
Token ID340
Feature activation+1.96e+00

Pos logit contributions
ave+0.040
olded+0.039
 reached+0.039
oped+0.039
 agreed+0.038
Neg logit contributions
 things-0.020
 attention-0.019
 new-0.019
 food-0.018
 cotton-0.017
Token.
Token ID13
Feature activation+3.16e+00

Pos logit contributions
olded+0.129
oped+0.123
ave+0.123
 reached+0.121
 stared+0.118
Neg logit contributions
 new-0.050
 attention-0.048
 things-0.047
 food-0.043
 thing-0.041
Token Jay
Token ID9180
Feature activation+3.37e+00

Pos logit contributions
ave+0.172
 forgot+0.160
 managed+0.155
olded+0.154
 argued+0.154
Neg logit contributions
 new-0.121
 attention-0.103
 sight-0.101
 things-0.098
 TV-0.097
Tokenden
Token ID6559
Feature activation+7.48e+00

Pos logit contributions
olded+0.130
 reached+0.120
 apologized+0.120
 forgot+0.120
 succeeded+0.117
Neg logit contributions
 thing-0.141
 attention-0.138
away-0.131
 things-0.131
 new-0.129
Token j
Token ID474
Feature activation+1.16e+00

Pos logit contributions
oped+0.259
olded+0.242
 Did+0.234
ave+0.233
Get+0.225
Neg logit contributions
 new-0.356
 more-0.330
 things-0.328
 home-0.326
 big-0.320
Tokenogged
Token ID42545
Feature activation-3.57e-02

Pos logit contributions
 could+0.056
 hadn+0.055
 couldn+0.053
 reached+0.052
 forgot+0.052
Neg logit contributions
 attention-0.046
 job-0.042
 gift-0.041
 things-0.041
 home-0.041
Token his
Token ID465
Feature activation-1.20e+00

Pos logit contributions
 attention+0.001
 new+0.001
 things+0.001
 cotton+0.001
 sight+0.001
Neg logit contributions
 could-0.002
 hadn-0.002
olded-0.002
 couldn-0.002
 suggested-0.002
Token way
Token ID835
Feature activation-9.91e-01

Pos logit contributions
 attention+0.035
 cotton+0.034
 things+0.032
 food+0.031
 new+0.031
Neg logit contributions
 could-0.070
 couldn-0.068
 reached-0.067
 thanked-0.067
 agreed-0.067
Token home
Token ID1363
Feature activation+1.30e+00

Pos logit contributions
 attention+0.019
 cotton+0.017
 things+0.016
 new+0.016
 food+0.015
Neg logit contributions
 could-0.068
 reached-0.066
olded-0.066
 agreed-0.066
 couldn-0.066
Token never
Token ID1239
Feature activation-6.48e-01

Pos logit contributions
 attention+0.003
 new+0.003
 things+0.003
 cotton+0.003
 sight+0.003
Neg logit contributions
olded-0.005
 could-0.005
ave-0.005
 agreed-0.005
 thanked-0.005
Token there
Token ID612
Feature activation-2.63e-01

Pos logit contributions
 attention+0.024
 cotton+0.022
 new+0.022
 things+0.022
 food+0.021
Neg logit contributions
 could-0.033
olded-0.032
 reached-0.032
 agreed-0.032
 thanked-0.032
Token!
Token ID0
Feature activation+1.55e+00

Pos logit contributions
 attention+0.008
 cotton+0.007
 new+0.007
 things+0.007
 thing+0.007
Neg logit contributions
olded-0.015
 could-0.015
 thanked-0.015
ave-0.015
 reached-0.015
Token<|endoftext|>
Token ID50256
Feature activation+2.11e+00

Pos logit contributions
olded+0.064
 forgot+0.061
 stared+0.060
ave+0.060
 agreed+0.060
Neg logit contributions
 new-0.070
 attention-0.068
 sight-0.068
 things-0.066
 gift-0.065
TokenJoe
Token ID19585
Feature activation+5.48e+00

Pos logit contributions
olded+0.107
 forgot+0.098
 apologised+0.096
ave+0.094
 apologized+0.094
Neg logit contributions
 things-0.086
 new-0.084
 sight-0.082
 thing-0.081
 lunch-0.081
Tokeny
Token ID88
Feature activation+7.47e+00

Pos logit contributions
olded+0.252
Well+0.224
ave+0.209
 apologised+0.207
 Follow+0.205
Neg logit contributions
 red-0.217
 home-0.212
 things-0.199
 sand-0.191
 outside-0.189
Token was
Token ID373
Feature activation-3.86e-01

Pos logit contributions
olded+0.324
ave+0.299
Well+0.280
oped+0.279
via+0.276
Neg logit contributions
 lunch-0.289
 red-0.289
 sand-0.288
 things-0.285
 home-0.283
Token a
Token ID257
Feature activation-1.72e+00

Pos logit contributions
 attention+0.010
 new+0.010
 cotton+0.010
 things+0.009
 dessert+0.009
Neg logit contributions
 could-0.023
olded-0.023
 agreed-0.023
ave-0.023
 thanked-0.022
Token little
Token ID1310
Feature activation-1.88e+00

Pos logit contributions
 attention+0.059
 cotton+0.053
 things+0.050
 food+0.049
 groceries+0.048
Neg logit contributions
 could-0.097
 couldn-0.093
 reached-0.092
 thanked-0.092
 agreed-0.091
Token boy
Token ID2933
Feature activation-3.54e-02

Pos logit contributions
 attention+0.077
 cotton+0.071
 food+0.067
 things+0.066
 lunch+0.066
Neg logit contributions
 could-0.092
 agreed-0.087
 couldn-0.086
 suggested-0.086
 thanked-0.085
Token with
Token ID351
Feature activation-1.01e+00

Pos logit contributions
 attention+0.001
 things+0.001
 new+0.001
 cotton+0.001
 thing+0.001
Neg logit contributions
olded-0.002
 reached-0.002
ave-0.002
 could-0.002
 agreed-0.002
Token Until
Token ID14303
Feature activation+1.64e+00

Pos logit contributions
ave+0.196
 forgot+0.194
 managed+0.181
 reached+0.176
'd+0.175
Neg logit contributions
 TV-0.165
 new-0.160
 things-0.155
 lunch-0.149
 sight-0.148
Token one
Token ID530
Feature activation+2.53e-02

Pos logit contributions
ave+0.102
olded+0.099
 could+0.093
 thanked+0.091
 forgot+0.091
Neg logit contributions
 things-0.045
 job-0.044
 attention-0.044
 thing-0.044
 sight-0.042
Token day
Token ID1110
Feature activation+2.19e+00

Pos logit contributions
olded+0.002
 could+0.002
ave+0.002
 reached+0.002
 thanked+0.002
Neg logit contributions
 attention-0.000
 things-0.000
 new-0.000
 thing-0.000
 cotton-0.000
Token,
Token ID11
Feature activation+1.81e+00

Pos logit contributions
ave+0.139
olded+0.136
 hadn+0.129
oped+0.128
'd+0.128
Neg logit contributions
 things-0.061
 food-0.052
 lunch-0.052
 thing-0.051
 new-0.051
Token Bob
Token ID5811
Feature activation+5.28e+00

Pos logit contributions
ave+0.114
olded+0.109
oped+0.107
 reached+0.103
 agreed+0.103
Neg logit contributions
 things-0.055
 new-0.049
 food-0.049
 attention-0.049
 lunch-0.047
Tokenbie
Token ID12590
Feature activation+7.44e+00

Pos logit contributions
Well+0.171
olded+0.168
Even+0.157
oped+0.154
From+0.144
Neg logit contributions
 home-0.260
 things-0.246
 food-0.242
 new-0.238
 big-0.236
Token saw
Token ID2497
Feature activation+3.03e-01

Pos logit contributions
olded+0.231
oped+0.220
Well+0.219
 From+0.207
Even+0.206
Neg logit contributions
 things-0.343
 home-0.326
 food-0.323
 red-0.320
 thing-0.316
Token the
Token ID262
Feature activation-2.05e+00

Pos logit contributions
olded+0.020
ave+0.020
 could+0.020
 reached+0.020
 agreed+0.019
Neg logit contributions
 things-0.007
 attention-0.007
 new-0.006
 cotton-0.006
 food-0.006
Token o
Token ID267
Feature activation+4.77e-01

Pos logit contributions
 attention+0.049
 cotton+0.045
 things+0.041
 new+0.040
 dessert+0.039
Neg logit contributions
 could-0.126
 couldn-0.122
 hadn-0.121
 thanked-0.120
 agreed-0.119
Tokenat
Token ID265
Feature activation+1.13e+00

Pos logit contributions
 could+0.025
 agreed+0.024
 suggested+0.024
 couldn+0.024
olded+0.024
Neg logit contributions
 attention-0.016
 things-0.016
 cotton-0.015
 sand-0.015
 food-0.015
Token field
Token ID2214
Feature activation+1.91e+00

Pos logit contributions
olded+0.063
ave+0.061
 From+0.060
 agreed+0.059
 suggested+0.058
Neg logit contributions
 attention-0.037
 things-0.036
 cotton-0.036
 food-0.035
 new-0.035
Token.
Token ID13
Feature activation+3.11e+00

Pos logit contributions
olded+0.153
ave+0.153
 reached+0.152
 hadn+0.148
 understood+0.146
Neg logit contributions
 new-0.041
 food-0.039
 things-0.038
 attention-0.038
 lunch-0.036
Token
Token ID220
Feature activation+3.62e+00

Pos logit contributions
 forgot+0.155
ave+0.155
 understood+0.143
 managed+0.142
olded+0.140
Neg logit contributions
 new-0.121
 TV-0.118
 bed-0.110
 home-0.110
 things-0.109
Token
Token ID198
Feature activation+2.60e+00

Pos logit contributions
 stared+0.189
 forgot+0.189
ave+0.181
 apologised+0.180
 hadn+0.180
Neg logit contributions
 new-0.142
 things-0.121
 attention-0.120
 TV-0.114
 home-0.113
Token
Token ID198
Feature activation+1.84e+00

Pos logit contributions
olded+0.136
 forgot+0.136
ave+0.133
 refused+0.130
 apologized+0.128
Neg logit contributions
 new-0.100
 long-0.091
 cotton-0.088
 things-0.084
 big-0.084
TokenEl
Token ID9527
Feature activation+1.60e+00

Pos logit contributions
olded+0.092
 forgot+0.089
ave+0.088
 couldn+0.088
 hadn+0.085
Neg logit contributions
 cotton-0.067
 new-0.063
 sunny-0.062
 things-0.062
 long-0.060
Tokenisa
Token ID9160
Feature activation+7.42e+00

Pos logit contributions
olded+0.073
 could+0.073
 apologized+0.072
 hadn+0.070
 rode+0.069
Neg logit contributions
 attention-0.064
 thing-0.059
 things-0.058
 sight-0.055
 new-0.054
Token's
Token ID338
Feature activation-8.66e-03

Pos logit contributions
olded+0.287
 Did+0.261
Get+0.258
oped+0.257
liv+0.255
Neg logit contributions
 red-0.268
 things-0.255
 more-0.254
 new-0.252
 big-0.251
Token mum
Token ID25682
Feature activation+5.32e+00

Pos logit contributions
 attention+0.000
 new+0.000
 things+0.000
 cotton+0.000
 food+0.000
Neg logit contributions
 could-0.000
 reached-0.000
olded-0.000
 suggested-0.000
 agreed-0.000
Token was
Token ID373
Feature activation+5.58e-01

Pos logit contributions
ave+0.174
olded+0.170
 From+0.156
 Did+0.153
via+0.152
Neg logit contributions
 new-0.238
 red-0.232
 more-0.223
 cake-0.222
 pink-0.219
Token very
Token ID845
Feature activation+8.08e-01

Pos logit contributions
 could+0.034
ave+0.033
 couldn+0.032
 hadn+0.032
olded+0.032
Neg logit contributions
 new-0.015
 attention-0.014
 dessert-0.013
 red-0.013
 cotton-0.013
Token compassionate
Token ID32533
Feature activation-2.70e-01

Pos logit contributions
 could+0.045
ave+0.044
 hadn+0.044
oped+0.043
 reached+0.043
Neg logit contributions
 new-0.027
 attention-0.025
 red-0.023
 nice-0.023
 fake-0.023
Token
Token ID26391
Feature activation+2.69e+00

Pos logit contributions
 could+0.084
 couldn+0.083
 reached+0.079
 screamed+0.078
 hadn+0.078
Neg logit contributions
 attention-0.081
 job-0.078
 thing-0.071
 things-0.071
 temptation-0.066
Token
Token ID220
Feature activation+3.38e+00

Pos logit contributions
ave+0.122
olded+0.113
 apologized+0.108
 stared+0.106
 thanked+0.106
Neg logit contributions
 things-0.123
 job-0.111
 long-0.111
 new-0.110
 lunch-0.110
Token
Token ID198
Feature activation+2.71e+00

Pos logit contributions
 stared+0.153
 thanked+0.148
 reached+0.146
 forgot+0.146
 agreed+0.144
Neg logit contributions
 new-0.157
 attention-0.149
 things-0.144
 TV-0.132
 long-0.130
Token
Token ID198
Feature activation+1.71e+00

Pos logit contributions
 forgot+0.140
 stared+0.136
ave+0.136
 refused+0.132
 stood+0.131
Neg logit contributions
 new-0.105
 long-0.092
 cotton-0.088
 lunch-0.086
 big-0.085
TokenJ
Token ID41
Feature activation+1.92e+00

Pos logit contributions
 stared+0.083
 couldn+0.083
 could+0.083
 missed+0.079
 forgot+0.079
Neg logit contributions
 new-0.061
 sunny-0.060
 cotton-0.060
 rainy-0.058
 things-0.057
Tokenenny
Token ID11870
Feature activation+7.40e+00

Pos logit contributions
 stared+0.070
 thanked+0.068
 reached+0.066
 wondered+0.065
 lived+0.065
Neg logit contributions
 attention-0.094
 things-0.089
 new-0.088
 gift-0.087
 birthday-0.084
Token and
Token ID290
Feature activation+1.21e+00

Pos logit contributions
oped+0.256
 Did+0.245
olded+0.245
os+0.217
 Get+0.214
Neg logit contributions
 red-0.314
 new-0.311
 things-0.301
 more-0.298
 cake-0.297
Token her
Token ID607
Feature activation-5.46e-01

Pos logit contributions
ave+0.071
 could+0.070
olded+0.069
'd+0.069
 hadn+0.068
Neg logit contributions
 attention-0.037
 things-0.035
 new-0.034
 home-0.034
 job-0.032
Token mom
Token ID1995
Feature activation+3.74e+00

Pos logit contributions
 attention+0.016
 cotton+0.015
 new+0.014
 things+0.014
 food+0.013
Neg logit contributions
 could-0.032
 reached-0.032
olded-0.032
 agreed-0.032
 suggested-0.031
Token walked
Token ID6807
Feature activation-8.46e-01

Pos logit contributions
 From+0.102
 hasn+0.101
olded+0.098
 Did+0.096
 hadn+0.092
Neg logit contributions
 cake-0.193
 red-0.187
 home-0.185
 more-0.184
 attention-0.183
Token on
Token ID319
Feature activation-1.91e+00

Pos logit contributions
 attention+0.023
 cotton+0.021
 things+0.020
 new+0.020
 food+0.019
Neg logit contributions
 could-0.052
 reached-0.050
olded-0.050
 couldn-0.050
 agreed-0.050
Token.
Token ID13
Feature activation+3.96e+00

Pos logit contributions
 thanked+0.128
 apologized+0.127
 understood+0.126
 agreed+0.126
ave+0.126
Neg logit contributions
 things-0.113
 long-0.110
 lunch-0.109
 TV-0.106
 sight-0.105
Token
Token ID220
Feature activation+3.85e+00

Pos logit contributions
 forgot+0.185
ave+0.181
 couldn+0.173
 understood+0.168
 managed+0.168
Neg logit contributions
 TV-0.156
 new-0.147
 television-0.142
 attention-0.141
 things-0.141
Token
Token ID198
Feature activation+2.70e+00

Pos logit contributions
 stared+0.184
 forgot+0.181
 thanked+0.178
 reached+0.176
 nodded+0.174
Neg logit contributions
 new-0.160
 things-0.147
 attention-0.147
 TV-0.142
 lunch-0.133
Token
Token ID198
Feature activation+1.74e+00

Pos logit contributions
 forgot+0.136
 couldn+0.134
ave+0.134
 stared+0.133
olded+0.131
Neg logit contributions
 new-0.103
 cotton-0.094
 long-0.091
 sunny-0.091
 big-0.085
TokenDa
Token ID26531
Feature activation+2.68e+00

Pos logit contributions
 couldn+0.086
 could+0.083
olded+0.081
 stared+0.080
 forgot+0.078
Neg logit contributions
 cotton-0.066
 sunny-0.065
 new-0.062
 rainy-0.060
 things-0.059
Tokenisy
Token ID13560
Feature activation+7.39e+00

Pos logit contributions
 reached+0.171
 could+0.168
 crossed+0.162
olded+0.162
oped+0.161
Neg logit contributions
 attention-0.054
 new-0.044
 sunny-0.042
 thing-0.041
 long-0.041
Token was
Token ID373
Feature activation+2.50e+00

Pos logit contributions
ave+0.266
oped+0.261
olded+0.259
 Did+0.246
Well+0.240
Neg logit contributions
 new-0.306
 things-0.305
 red-0.301
 big-0.288
 fire-0.285
Token delighted
Token ID20707
Feature activation+1.71e-01

Pos logit contributions
ave+0.155
oped+0.152
 could+0.148
olded+0.145
Well+0.143
Neg logit contributions
 new-0.066
 fake-0.059
 red-0.058
 home-0.058
 gloomy-0.057
Token and
Token ID290
Feature activation+3.79e+00

Pos logit contributions
 could+0.010
olded+0.010
 hadn+0.010
 reached+0.010
ave+0.010
Neg logit contributions
 attention-0.005
 things-0.005
 new-0.005
 food-0.004
 cotton-0.004
Token stepped
Token ID10764
Feature activation-8.29e-01

Pos logit contributions
oped+0.157
os+0.151
ave+0.151
ver+0.137
oke+0.133
Neg logit contributions
 new-0.193
 long-0.174
 shiny-0.171
 round-0.169
 attention-0.167
Token into
Token ID656
Feature activation-1.43e+00

Pos logit contributions
 attention+0.022
 cotton+0.021
 things+0.020
 new+0.019
 food+0.019
Neg logit contributions
 could-0.051
 reached-0.049
olded-0.049
 agreed-0.049
 couldn-0.049
Token.
Token ID13
Feature activation+2.14e+00

Pos logit contributions
 attention+0.019
 cotton+0.018
 things+0.018
 new+0.017
 food+0.017
Neg logit contributions
 could-0.030
olded-0.029
 reached-0.029
 agreed-0.029
ave-0.029
Token
Token ID220
Feature activation+3.18e+00

Pos logit contributions
ave+0.097
 forgot+0.087
 managed+0.084
olded+0.082
 agreed+0.080
Neg logit contributions
 new-0.099
 sight-0.095
 things-0.093
 attention-0.090
 cotton-0.089
Token
Token ID198
Feature activation+1.81e+00

Pos logit contributions
 forgot+0.158
ave+0.158
 stared+0.154
 agreed+0.152
 reached+0.151
Neg logit contributions
 new-0.130
 attention-0.116
 things-0.112
 TV-0.104
 home-0.104
Token
Token ID198
Feature activation+4.83e-01

Pos logit contributions
ave+0.082
olded+0.080
 forgot+0.076
 agreed+0.075
 could+0.073
Neg logit contributions
 new-0.080
 cotton-0.075
 long-0.072
 things-0.072
 attention-0.070
TokenJon
Token ID18219
Feature activation+1.68e+00

Pos logit contributions
 could+0.018
olded+0.018
ave+0.017
 agreed+0.017
 couldn+0.017
Neg logit contributions
 cotton-0.023
 things-0.023
 new-0.022
 attention-0.022
 sand-0.021
Tokenny
Token ID3281
Feature activation+7.37e+00

Pos logit contributions
olded+0.078
 could+0.077
 couldn+0.075
 blamed+0.075
 hadn+0.075
Neg logit contributions
 attention-0.069
 thing-0.059
 new-0.059
 things-0.058
 home-0.058
Token smiled
Token ID13541
Feature activation-4.99e-01

Pos logit contributions
oped+0.284
olded+0.252
os+0.251
ver+0.251
 Did+0.244
Neg logit contributions
 new-0.338
 home-0.322
 big-0.318
 things-0.315
 red-0.310
Token at
Token ID379
Feature activation-1.02e+00

Pos logit contributions
 attention+0.019
 new+0.018
 cotton+0.018
 things+0.017
 thing+0.016
Neg logit contributions
 could-0.025
olded-0.024
 reached-0.024
 hadn-0.024
 agreed-0.024
Token his
Token ID465
Feature activation-1.83e+00

Pos logit contributions
 attention+0.035
 cotton+0.033
 things+0.032
 new+0.031
 food+0.030
Neg logit contributions
 could-0.055
 reached-0.053
 agreed-0.053
 thanked-0.053
 couldn-0.053
Token parents
Token ID3397
Feature activation+3.80e-01

Pos logit contributions
 cotton+0.076
 attention+0.075
 rainy+0.071
 sand+0.069
 groceries+0.069
Neg logit contributions
 couldn-0.076
 thanked-0.076
 could-0.076
 agreed-0.074
 understood-0.071
Token and
Token ID290
Feature activation+3.10e+00

Pos logit contributions
 could+0.020
 reached+0.020
ave+0.020
olded+0.019
 hadn+0.019
Neg logit contributions
 attention-0.013
 new-0.012
 things-0.012
 cotton-0.012
 TV-0.012
Token each
Token ID1123
Feature activation-1.71e+00

Pos logit contributions
olded+0.036
 could+0.034
ave+0.034
 couldn+0.034
 From+0.034
Neg logit contributions
 things-0.020
 food-0.020
 new-0.020
 lunch-0.019
 cotton-0.019
Token other
Token ID584
Feature activation-2.89e-02

Pos logit contributions
 attention+0.048
 cotton+0.047
 things+0.041
 sight+0.041
 sand+0.040
Neg logit contributions
 could-0.100
 agreed-0.097
 reached-0.096
 thanked-0.095
 couldn-0.095
Token!
Token ID0
Feature activation+2.78e+00

Pos logit contributions
 attention+0.001
 things+0.001
 new+0.001
 cotton+0.001
 food+0.001
Neg logit contributions
 reached-0.002
olded-0.002
 could-0.002
ave-0.002
 hadn-0.002
Token<|endoftext|>
Token ID50256
Feature activation+2.10e+00

Pos logit contributions
ave+0.132
 managed+0.128
 forgot+0.124
 nodded+0.124
 stared+0.123
Neg logit contributions
 TV-0.110
 new-0.106
 birthday-0.103
 sight-0.100
 lunch-0.099
TokenJoe
Token ID19585
Feature activation+5.08e+00

Pos logit contributions
olded+0.112
 apologised+0.102
 forgot+0.101
ave+0.100
 apologized+0.099
Neg logit contributions
 things-0.083
 new-0.081
 thing-0.077
 sight-0.076
 food-0.076
Tokeny
Token ID88
Feature activation+7.36e+00

Pos logit contributions
olded+0.283
Well+0.257
 apologised+0.243
ave+0.238
 hadn+0.237
Neg logit contributions
 red-0.156
 home-0.155
 things-0.144
 big-0.137
 sand-0.137
Token was
Token ID373
Feature activation-3.11e-01

Pos logit contributions
olded+0.334
Well+0.309
ave+0.308
oped+0.307
ver+0.292
Neg logit contributions
 things-0.267
 home-0.266
 new-0.263
 red-0.261
 big-0.260
Token sitting
Token ID5586
Feature activation-1.84e+00

Pos logit contributions
 attention+0.009
 new+0.008
 cotton+0.008
 things+0.008
 red+0.008
Neg logit contributions
 could-0.018
olded-0.018
 agreed-0.018
ave-0.018
 reached-0.018
Token in
Token ID287
Feature activation-1.53e+00

Pos logit contributions
 attention+0.064
 cotton+0.061
 gift+0.055
 purpose+0.054
 TV+0.053
Neg logit contributions
 could-0.098
 agreed-0.093
 thanked-0.093
 reached-0.093
 couldn-0.092
Token his
Token ID465
Feature activation-1.42e+00

Pos logit contributions
 attention+0.056
 cotton+0.051
 things+0.048
 new+0.048
 gift+0.047
Neg logit contributions
 could-0.078
 reached-0.076
 thanked-0.075
 couldn-0.074
 agreed-0.074
Token room
Token ID2119
Feature activation-2.38e-01

Pos logit contributions
 attention+0.040
 cotton+0.037
 things+0.035
 food+0.033
 gift+0.032
Neg logit contributions
 could-0.085
 couldn-0.082
 thanked-0.082
 reached-0.081
 agreed-0.080
Token give
Token ID1577
Feature activation-2.87e-01

Pos logit contributions
olded+0.029
ave+0.028
oped+0.028
 could+0.028
 From+0.028
Neg logit contributions
 attention-0.022
 new-0.020
 things-0.019
 food-0.018
 cotton-0.018
Token up
Token ID510
Feature activation-1.75e-01

Pos logit contributions
 attention+0.007
 things+0.006
 new+0.006
 cotton+0.006
 food+0.006
Neg logit contributions
 could-0.019
 reached-0.018
olded-0.018
 thanked-0.018
 agreed-0.018
Token!
Token ID0
Feature activation+2.51e+00

Pos logit contributions
 attention+0.005
 things+0.005
 new+0.005
 cotton+0.005
 food+0.004
Neg logit contributions
 could-0.010
olded-0.010
 reached-0.010
ave-0.010
 agreed-0.010
Token<|endoftext|>
Token ID50256
Feature activation+2.13e+00

Pos logit contributions
ave+0.118
 forgot+0.118
 stared+0.114
 couldn+0.112
 reached+0.111
Neg logit contributions
 new-0.092
 gift-0.090
 things-0.090
 TV-0.090
 purpose-0.088
TokenJoe
Token ID19585
Feature activation+5.47e+00

Pos logit contributions
olded+0.115
 forgot+0.106
 apologised+0.104
ave+0.103
 apologized+0.101
Neg logit contributions
 things-0.082
 new-0.081
 lunch-0.077
 sight-0.077
 thing-0.077
Tokeny
Token ID88
Feature activation+7.35e+00

Pos logit contributions
olded+0.259
Well+0.221
 exclaimed+0.215
ave+0.214
 apologised+0.213
Neg logit contributions
 things-0.217
 red-0.215
 home-0.206
 new-0.193
 sand-0.192
Token was
Token ID373
Feature activation-2.67e-01

Pos logit contributions
olded+0.322
oped+0.291
ave+0.285
ver+0.282
Well+0.272
Neg logit contributions
 things-0.308
 red-0.291
 new-0.285
 sand-0.278
 lunch-0.275
Token out
Token ID503
Feature activation-1.50e+00

Pos logit contributions
 attention+0.007
 new+0.007
 cotton+0.007
 things+0.007
 dessert+0.006
Neg logit contributions
 could-0.016
olded-0.016
 agreed-0.016
ave-0.016
 hadn-0.015
Token in
Token ID287
Feature activation-1.46e+00

Pos logit contributions
 attention+0.048
 cotton+0.046
 things+0.043
 food+0.041
 new+0.041
Neg logit contributions
 could-0.083
 couldn-0.081
 reached-0.080
olded-0.079
 thanked-0.079
Token the
Token ID262
Feature activation-1.53e+00

Pos logit contributions
 attention+0.051
 cotton+0.048
 things+0.044
 new+0.044
 food+0.043
Neg logit contributions
 could-0.077
 reached-0.074
 couldn-0.074
 thanked-0.074
 hadn-0.073
Token yard
Token ID12699
Feature activation+1.66e+00

Pos logit contributions
 attention+0.049
 cotton+0.044
 things+0.041
 gift+0.040
 food+0.039
Neg logit contributions
 could-0.087
 reached-0.083
 thanked-0.083
 couldn-0.083
 suggested-0.082
Token to
Token ID284
Feature activation-2.24e-01

Pos logit contributions
 could+0.008
olded+0.008
 reached+0.008
ave+0.008
 hadn+0.008
Neg logit contributions
 attention-0.003
 new-0.003
 things-0.003
 cotton-0.003
 food-0.003
Token other
Token ID584
Feature activation-1.59e+00

Pos logit contributions
 attention+0.006
 new+0.006
 things+0.006
 cotton+0.006
 food+0.006
Neg logit contributions
 could-0.014
olded-0.013
ave-0.013
 reached-0.013
 agreed-0.013
Token customers
Token ID4297
Feature activation+1.01e+00

Pos logit contributions
 attention+0.043
 cotton+0.042
 gift+0.036
 sight+0.036
 TV+0.036
Neg logit contributions
 could-0.094
 thanked-0.091
 agreed-0.090
 reached-0.090
olded-0.089
Token.
Token ID13
Feature activation+2.53e+00

Pos logit contributions
 reached+0.068
ave+0.067
olded+0.066
 stared+0.065
 hadn+0.065
Neg logit contributions
 attention-0.020
 things-0.019
 new-0.017
 food-0.017
 home-0.016
Token Jo
Token ID5302
Feature activation+4.65e+00

Pos logit contributions
'd+0.112
ave+0.110
 forgot+0.109
oped+0.109
 managed+0.108
Neg logit contributions
 TV-0.108
 new-0.103
 dessert-0.099
 lunch-0.097
 groceries-0.097
Tokenanne
Token ID21952
Feature activation+7.33e+00

Pos logit contributions
icked+0.153
olded+0.147
 Did+0.141
anced+0.135
Well+0.132
Neg logit contributions
 new-0.190
 home-0.189
 outside-0.187
--0.186
 long-0.186
Token and
Token ID290
Feature activation+1.17e+00

Pos logit contributions
 Did+0.235
olded+0.224
ver+0.222
 Follow+0.219
Kids+0.215
Neg logit contributions
 red-0.301
 new-0.285
 home-0.283
 big-0.283
 toy-0.279
Token Joey
Token ID26154
Feature activation+5.41e+00

Pos logit contributions
ave+0.078
olded+0.076
oped+0.074
 could+0.073
 hadn+0.072
Neg logit contributions
 attention-0.030
 things-0.029
 new-0.028
 home-0.026
 sand-0.024
Token said
Token ID531
Feature activation+6.93e-01

Pos logit contributions
azz+0.184
 Did+0.182
ave+0.180
enny+0.180
olded+0.179
Neg logit contributions
 home-0.219
 big-0.206
 very-0.200
 new-0.199
 back-0.191
Token goodbye
Token ID24829
Feature activation+4.88e-01

Pos logit contributions
ave+0.040
 hadn+0.040
olded+0.040
 reached+0.039
 could+0.039
Neg logit contributions
 new-0.019
 things-0.019
 attention-0.019
 cotton-0.018
 thing-0.017
Token as
Token ID355
Feature activation+2.18e-01

Pos logit contributions
 could+0.030
 hadn+0.029
 reached+0.029
olded+0.029
ave+0.029
Neg logit contributions
 attention-0.013
 new-0.012
 things-0.011
 home-0.011
 cotton-0.011
Token rely
Token ID8814
Feature activation+2.99e-01

Pos logit contributions
 attention+0.005
 things+0.005
 new+0.005
 cotton+0.005
 food+0.004
Neg logit contributions
 could-0.005
olded-0.005
ave-0.005
 reached-0.005
 agreed-0.005
Token on
Token ID319
Feature activation+1.20e+00

Pos logit contributions
 thanked+0.020
olded+0.020
 reached+0.020
 agreed+0.020
 hadn+0.019
Neg logit contributions
 attention-0.006
 things-0.006
 new-0.006
 cotton-0.005
 TV-0.005
Token each
Token ID1123
Feature activation-7.35e-01

Pos logit contributions
 reached+0.080
ave+0.077
olded+0.077
 agreed+0.077
 hadn+0.076
Neg logit contributions
 things-0.030
 attention-0.029
 new-0.028
 TV-0.026
 cotton-0.024
Token other
Token ID584
Feature activation+1.19e+00

Pos logit contributions
 attention+0.018
 cotton+0.016
 things+0.016
 new+0.015
 food+0.015
Neg logit contributions
 could-0.047
olded-0.046
 reached-0.046
 couldn-0.046
 agreed-0.046
Token".
Token ID1911
Feature activation+3.89e+00

Pos logit contributions
 reached+0.071
olded+0.070
 hadn+0.068
oped+0.068
ave+0.067
Neg logit contributions
 attention-0.038
 things-0.037
 new-0.034
 food-0.032
 cotton-0.031
Token Tina
Token ID35903
Feature activation+7.33e+00

Pos logit contributions
ave+0.189
 forgot+0.185
'd+0.182
 managed+0.175
 reached+0.174
Neg logit contributions
 TV-0.158
 lunch-0.147
 dessert-0.146
 things-0.143
 new-0.142
Token hugged
Token ID41130
Feature activation-2.33e-01

Pos logit contributions
oped+0.266
olded+0.219
 Did+0.211
ave+0.209
Get+0.207
Neg logit contributions
 new-0.329
 things-0.309
 old-0.302
 red-0.300
.-0.292
Token her
Token ID607
Feature activation-9.77e-01

Pos logit contributions
 attention+0.008
 things+0.007
 new+0.007
 cotton+0.007
 TV+0.007
Neg logit contributions
 could-0.013
olded-0.013
 reached-0.013
ave-0.013
 hadn-0.013
Token back
Token ID736
Feature activation-3.25e-01

Pos logit contributions
 attention+0.031
 cotton+0.029
 things+0.028
 new+0.027
 food+0.027
Neg logit contributions
 could-0.055
 reached-0.053
olded-0.053
 agreed-0.053
 couldn-0.053
Token and
Token ID290
Feature activation+2.38e+00

Pos logit contributions
 attention+0.011
 new+0.010
 things+0.010
 cotton+0.009
 thing+0.009
Neg logit contributions
 could-0.018
 reached-0.018
olded-0.018
 hadn-0.018
 agreed-0.018
Token knew
Token ID2993
Feature activation+1.14e+00

Pos logit contributions
ave+0.067
'd+0.065
att+0.065
os+0.064
oped+0.063
Neg logit contributions
 new-0.138
 things-0.131
 attention-0.130
 red-0.124
 more-0.124
Token moved
Token ID3888
Feature activation+1.39e+00

Pos logit contributions
ave+0.187
 From+0.184
Well+0.180
olded+0.176
oped+0.172
Neg logit contributions
 things-0.245
 thing-0.241
 new-0.238
 red-0.232
 gift-0.230
Token away
Token ID1497
Feature activation+1.26e+00

Pos logit contributions
ave+0.083
 reached+0.082
 forgot+0.082
 hadn+0.082
olded+0.081
Neg logit contributions
 attention-0.040
 things-0.040
 new-0.037
 thing-0.035
 red-0.034
Token.
Token ID13
Feature activation+3.49e+00

Pos logit contributions
ave+0.078
 hadn+0.076
 reached+0.076
olded+0.076
 could+0.075
Neg logit contributions
 things-0.036
 attention-0.035
 new-0.034
 thing-0.031
 food-0.031
Tokenâ
Token ID22940
Feature activation+2.12e+00

Pos logit contributions
 forgot+0.163
ave+0.159
 couldn+0.149
 understood+0.149
 managed+0.145
Neg logit contributions
 TV-0.146
 new-0.139
 cotton-0.138
 things-0.133
 television-0.132
Token
Token ID26391
Feature activation+2.91e+00

Pos logit contributions
 could+0.095
 couldn+0.093
 screamed+0.087
olded+0.086
 hadn+0.085
Neg logit contributions
 job-0.087
 attention-0.086
 thing-0.080
 earthquake-0.076
 things-0.074
Token Daisy
Token ID40355
Feature activation+7.32e+00

Pos logit contributions
ave+0.129
olded+0.122
 apologized+0.117
 agreed+0.116
 apologised+0.116
Neg logit contributions
 things-0.130
 long-0.118
 job-0.117
 view-0.114
 lunch-0.114
Token looked
Token ID3114
Feature activation+9.29e-02

Pos logit contributions
olded+0.254
ave+0.250
oped+0.241
Well+0.227
via+0.220
Neg logit contributions
 new-0.311
 things-0.310
 big-0.310
 red-0.309
 long-0.293
Token sad
Token ID6507
Feature activation+8.70e-01

Pos logit contributions
 could+0.006
 reached+0.005
olded+0.005
ave+0.005
 hadn+0.005
Neg logit contributions
 attention-0.003
 new-0.002
 things-0.002
 cotton-0.002
 red-0.002
Token as
Token ID355
Feature activation+1.09e+00

Pos logit contributions
 reached+0.051
olded+0.050
ave+0.050
oped+0.049
 hadn+0.049
Neg logit contributions
 attention-0.026
 things-0.025
 new-0.025
 cotton-0.023
 thing-0.022
Token she
Token ID673
Feature activation+4.33e+00

Pos logit contributions
ave+0.059
olded+0.059
oped+0.057
 reached+0.056
 suggested+0.055
Neg logit contributions
 new-0.041
 attention-0.040
 things-0.038
 long-0.036
 gloomy-0.035
Token didn
Token ID1422
Feature activation+4.25e-02

Pos logit contributions
olded+0.131
via+0.120
lett+0.117
ver+0.113
Although+0.104
Neg logit contributions
 new-0.218
 dark-0.216
 red-0.210
 big-0.208
 shiny-0.207
Tokenâ
Token ID22940
Feature activation+1.32e+00

Pos logit contributions
 forgot+0.187
ave+0.174
 understood+0.165
 stared+0.164
 couldn+0.163
Neg logit contributions
 TV-0.153
 things-0.146
 cotton-0.141
 new-0.140
 groceries-0.137
Token
Token ID26391
Feature activation+3.05e+00

Pos logit contributions
 could+0.062
 couldn+0.059
 reached+0.056
 hadn+0.055
 forgot+0.055
Neg logit contributions
 attention-0.055
 job-0.053
 thing-0.050
 earthquake-0.047
 temptation-0.047
Token
Token ID198
Feature activation+2.43e+00

Pos logit contributions
ave+0.139
olded+0.128
 thanked+0.122
 apologized+0.122
 apologised+0.121
Neg logit contributions
 things-0.123
 long-0.120
 job-0.118
 sight-0.113
 view-0.113
Token
Token ID198
Feature activation+1.95e+00

Pos logit contributions
 forgot+0.121
ave+0.117
 stared+0.111
 couldn+0.110
olded+0.110
Neg logit contributions
 new-0.098
 cotton-0.090
 long-0.089
 lunch-0.083
 big-0.083
TokenJo
Token ID9908
Feature activation+4.47e+00

Pos logit contributions
 couldn+0.100
 could+0.096
 forgot+0.095
 stared+0.095
 missed+0.094
Neg logit contributions
 cotton-0.070
 new-0.065
 sunny-0.064
 long-0.062
 things-0.062
Tokenjo
Token ID7639
Feature activation+7.32e+00

Pos logit contributions
 Did+0.157
olded+0.153
oped+0.148
 blamed+0.142
 apologized+0.136
Neg logit contributions
 thing-0.192
 things-0.188
 job-0.181
 attention-0.179
 home-0.175
Token dug
Token ID18735
Feature activation-1.10e+00

Pos logit contributions
oped+0.269
olded+0.249
 Did+0.241
Well+0.233
ave+0.229
Neg logit contributions
 things-0.309
 thing-0.294
 more-0.287
 red-0.281
 new-0.279
Token into
Token ID656
Feature activation-1.68e+00

Pos logit contributions
 attention+0.032
 cotton+0.030
 things+0.029
 new+0.028
 food+0.027
Neg logit contributions
 could-0.064
 reached-0.062
 agreed-0.062
olded-0.062
 thanked-0.062
Token his
Token ID465
Feature activation-1.49e+00

Pos logit contributions
 attention+0.056
 cotton+0.050
 gift+0.048
 things+0.047
 new+0.046
Neg logit contributions
 could-0.091
 reached-0.088
 couldn-0.087
 thanked-0.087
 agreed-0.086
Token pocket
Token ID10000
Feature activation+3.71e-01

Pos logit contributions
 attention+0.046
 cotton+0.044
 things+0.040
 new+0.038
 TV+0.038
Neg logit contributions
 could-0.083
 thanked-0.080
 reached-0.080
 couldn-0.080
 agreed-0.080
Token and
Token ID290
Feature activation+2.37e+00

Pos logit contributions
 could+0.019
 hadn+0.019
 reached+0.019
olded+0.019
ave+0.019
Neg logit contributions
 attention-0.013
 new-0.013
 things-0.012
 cotton-0.012
 food-0.012
Token meet
Token ID1826
Feature activation+1.20e+00

Pos logit contributions
ave+0.077
olded+0.077
 could+0.073
 hadn+0.073
oped+0.073
Neg logit contributions
 new-0.046
 attention-0.046
 things-0.043
 food-0.042
 lunch-0.041
Token you
Token ID345
Feature activation+1.43e+00

Pos logit contributions
 reached+0.069
 forgot+0.069
olded+0.069
 hadn+0.068
ave+0.068
Neg logit contributions
 new-0.038
 things-0.037
 cotton-0.035
 food-0.035
 attention-0.034
Token.
Token ID13
Feature activation+2.97e+00

Pos logit contributions
ave+0.091
 reached+0.091
olded+0.089
 thanked+0.087
 stared+0.087
Neg logit contributions
 new-0.037
 things-0.036
 attention-0.035
 food-0.033
 TV-0.032
Tokenâ
Token ID22940
Feature activation+1.46e+00

Pos logit contributions
 forgot+0.136
ave+0.135
 couldn+0.126
 understood+0.124
 managed+0.122
Neg logit contributions
 TV-0.125
 new-0.123
 cotton-0.122
 things-0.115
 red-0.114
Token
Token ID26391
Feature activation+2.59e+00

Pos logit contributions
 could+0.066
 couldn+0.065
 hadn+0.060
olded+0.060
 reached+0.059
Neg logit contributions
 attention-0.062
 job-0.059
 thing-0.057
 things-0.054
 sunny-0.054
Token Daisy
Token ID40355
Feature activation+7.32e+00

Pos logit contributions
ave+0.115
olded+0.113
 apologised+0.104
 apologized+0.104
 understood+0.102
Neg logit contributions
 things-0.110
 long-0.106
 lunch-0.105
 sight-0.103
 view-0.102
Token was
Token ID373
Feature activation+1.73e+00

Pos logit contributions
olded+0.261
ave+0.261
oped+0.255
Well+0.233
ver+0.230
Neg logit contributions
 new-0.331
 red-0.329
 things-0.312
 big-0.309
 pink-0.307
Token surprised
Token ID6655
Feature activation-4.37e-01

Pos logit contributions
ave+0.111
oped+0.108
olded+0.107
 could+0.104
 suggested+0.101
Neg logit contributions
 new-0.047
 red-0.041
 real-0.039
 fake-0.039
 sight-0.039
Token and
Token ID290
Feature activation+3.32e+00

Pos logit contributions
 attention+0.014
 things+0.013
 new+0.013
 cotton+0.013
 food+0.012
Neg logit contributions
 could-0.024
olded-0.024
 reached-0.024
 hadn-0.024
 couldn-0.023
Token smiled
Token ID13541
Feature activation-7.74e-01

Pos logit contributions
oped+0.143
ave+0.140
os+0.138
olded+0.126
ver+0.126
Neg logit contributions
 new-0.165
 red-0.146
 shiny-0.146
 round-0.146
 long-0.145
Token.
Token ID13
Feature activation+3.38e+00

Pos logit contributions
 attention+0.022
 cotton+0.021
 things+0.020
 new+0.020
 food+0.019
Neg logit contributions
 could-0.046
olded-0.044
 reached-0.044
 agreed-0.044
 couldn-0.044
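
The records in this dump appear to share one fixed layout: a `Token` line, a `Token ID` line, a `Feature activation` line, then a `Pos logit contributions` list and a `Neg logit contributions` list, where each list entry is a token string fused to a signed decimal. The sketch below shows one way to read that layout back into structured data. It assumes the layout holds throughout. `TokenRecord` and `parse_records` are hypothetical names chosen for illustration, not part of whatever tool produced this dump.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TokenRecord:
    """One per-token record (hypothetical container, not from the source tool)."""
    token: str
    token_id: Optional[int] = None
    activation: Optional[float] = None
    pos: List[Tuple[str, float]] = field(default_factory=list)
    neg: List[Tuple[str, float]] = field(default_factory=list)


# A contribution line is a token string fused to a signed decimal,
# e.g. " attention+0.016", "olded-0.032", or "--0.186" (token "-", value -0.186).
CONTRIB = re.compile(r"^(.*)([+-]\d+\.\d+)$")


def parse_records(text: str) -> List[TokenRecord]:
    records: List[TokenRecord] = []
    current: Optional[TokenRecord] = None
    bucket: Optional[List[Tuple[str, float]]] = None
    for line in text.splitlines():
        if line.startswith("Token ID"):
            # Must be checked before the plain "Token" prefix.
            if current is not None:
                current.token_id = int(line[len("Token ID"):])
        elif line.startswith("Token"):
            # Start of a new record; the token text keeps its leading space.
            current = TokenRecord(token=line[len("Token"):])
            records.append(current)
            bucket = None
        elif line.startswith("Feature activation"):
            if current is not None:
                current.activation = float(line[len("Feature activation"):])
        elif line == "Pos logit contributions":
            bucket = current.pos if current is not None else None
        elif line == "Neg logit contributions":
            bucket = current.neg if current is not None else None
        elif line.strip() and bucket is not None:
            match = CONTRIB.match(line)
            if match:
                bucket.append((match.group(1), float(match.group(2))))
    return records
```

Lines that appear before the first `Token` header (for example, the tail of a record cut off at the start of an excerpt) are skipped rather than guessed at. Whitespace-only tokens come back as an empty or blank `token` string, since the dump does not preserve them visibly.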
Token day
Token ID1110
Feature activation-1.72e+00

Pos logit contributions
 attention+0.056
 cotton+0.054
 things+0.049
 groceries+0.048
 TV+0.045
Neg logit contributions
 could-0.115
 couldn-0.112
 reached-0.110
 agreed-0.109
 thanked-0.107
Token,
Token ID11
Feature activation-1.51e+00

Pos logit contributions
 attention+0.073
 cotton+0.067
 groceries+0.063
 sight+0.063
 gift+0.062
Neg logit contributions
 could-0.085
 couldn-0.081
 reached-0.079
 suggested-0.077
 agreed-0.077
Token she
Token ID673
Feature activation+1.36e+00

Pos logit contributions
 attention+0.081
 cotton+0.076
 things+0.073
 food+0.072
 sight+0.072
Neg logit contributions
 could-0.058
 couldn-0.053
 reached-0.052
 agreed-0.051
 thanked-0.050
Token saw
Token ID2497
Feature activation-2.24e+00

Pos logit contributions
olded+0.048
 agreed+0.041
ver+0.040
Well+0.040
oped+0.040
Neg logit contributions
 things-0.072
 red-0.070
 dark-0.070
 food-0.069
 sand-0.069
Token a
Token ID257
Feature activation-4.46e+00

Pos logit contributions
 attention+0.090
 job+0.077
 cotton+0.076
 gift+0.076
rier+0.073
Neg logit contributions
 could-0.112
 reached-0.107
 couldn-0.107
 thanked-0.106
 understood-0.105
Token shiny
Token ID22441
Feature activation-5.78e+00

Pos logit contributions
 attention+0.176
ags+0.151
 groceries+0.124
 any+0.123
 choices+0.122
Neg logit contributions
 could-0.243
 couldn-0.235
 hadn-0.229
 didn-0.222
 said-0.218
Token ball
Token ID2613
Feature activation-2.91e+00

Pos logit contributions
 attention+0.243
 unnecessary+0.206
ags+0.181
 groceries+0.175
 numb+0.168
Neg logit contributions
 replied-0.257
 could-0.252
 was-0.251
 couldn-0.249
 lived-0.248
Token high
Token ID1029
Feature activation-3.06e+00

Pos logit contributions
 attention+0.129
 purpose+0.110
 cotton+0.109
 groceries+0.108
 things+0.103
Neg logit contributions
 could-0.154
 reached-0.145
 couldn-0.141
 stood-0.140
 suggested-0.139
Token up
Token ID510
Feature activation-3.49e+00

Pos logit contributions
 attention+0.142
 cotton+0.124
 gift+0.123
 groceries+0.122
 lunch+0.120
Neg logit contributions
 could-0.140
 reached-0.129
 couldn-0.126
 stood-0.124
 wanted-0.123
Token on
Token ID319
Feature activation-3.37e+00

Pos logit contributions
 cotton+0.162
 attention+0.157
 groceries+0.150
 dessert+0.147
 TV+0.147
Neg logit contributions
 reached-0.151
 could-0.148
 asked-0.139
 suggested-0.138
 stood-0.136
Token a
Token ID257
Feature activation-4.16e+00

Pos logit contributions
 attention+0.187
agging+0.156
rier+0.154
 gift+0.153
 cotton+0.150
Neg logit contributions
 reached-0.111
 thanked-0.109
 asked-0.106
 could-0.104
 From-0.103
Token girl
Token ID2576
Feature activation-7.59e-01

Pos logit contributions
 attention+0.182
 purpose+0.152
 groceries+0.150
 rules+0.149
 cotton+0.148
Neg logit contributions
 could-0.139
 agreed-0.129
 reached-0.127
 wanted-0.127
 nodded-0.127
Token named
Token ID3706
Feature activation-1.48e+00

Pos logit contributions
 attention+0.046
 cotton+0.045
 things+0.044
 new+0.044
 food+0.043
Neg logit contributions
 could-0.020
olded-0.019
 agreed-0.019
 reached-0.019
 couldn-0.019
Token Sue
Token ID26089
Feature activation+7.51e-01

Pos logit contributions
 attention+0.070
 cotton+0.070
 things+0.065
 new+0.063
 sight+0.063
Neg logit contributions
 could-0.062
 couldn-0.058
 reached-0.058
 agreed-0.057
olded-0.056
Token found
Token ID1043
Feature activation-2.87e+00

Pos logit contributions
olded+0.022
 agreed+0.018
 hadn+0.018
ave+0.017
 suggested+0.017
Neg logit contributions
 things-0.044
 food-0.043
 thing-0.043
 red-0.043
 new-0.043
Token a
Token ID257
Feature activation-4.28e+00

Pos logit contributions
 attention+0.127
 cotton+0.118
 TV+0.113
 job+0.112
rier+0.111
Neg logit contributions
 could-0.131
 reached-0.131
 thanked-0.122
 understood-0.122
 stood-0.121
Token shiny
Token ID22441
Feature activation-5.72e+00

Pos logit contributions
 attention+0.168
ags+0.151
 choices+0.126
reating+0.123
agging+0.116
Neg logit contributions
 could-0.237
 couldn-0.234
 hadn-0.227
 didn-0.224
'd-0.221
Token coin
Token ID10752
Feature activation-3.19e+00

Pos logit contributions
 attention+0.234
 unnecessary+0.199
 films+0.175
 emergencies+0.172
 groceries+0.169
Neg logit contributions
 could-0.260
 replied-0.258
 was-0.257
 couldn-0.256
 lived-0.253
Token on
Token ID319
Feature activation-2.51e+00

Pos logit contributions
 attention+0.139
 cotton+0.135
 purpose+0.130
 TV+0.129
 groceries+0.124
Neg logit contributions
 could-0.161
 couldn-0.145
 hadn-0.140
 suggested-0.138
 reached-0.138
Token the
Token ID262
Feature activation-3.47e+00

Pos logit contributions
 attention+0.124
 cotton+0.101
 gift+0.099
 unnecessary+0.097
rier+0.097
Neg logit contributions
 could-0.104
 reached-0.102
 couldn-0.099
 lived-0.098
 thanked-0.098
Token ground
Token ID2323
Feature activation-1.41e+00

Pos logit contributions
 attention+0.141
 cotton+0.128
 unnecessary+0.126
 things+0.125
 sight+0.120
Neg logit contributions
 could-0.146
 couldn-0.144
 thanked-0.141
 reached-0.129
 lived-0.129
Token.
Token ID13
Feature activation-2.04e-01

Pos logit contributions
 attention+0.046
 cotton+0.044
 things+0.040
 new+0.040
 dessert+0.039
Neg logit contributions
 could-0.079
 reached-0.076
 couldn-0.075
 agreed-0.075
 suggested-0.075
Token rabbit
Token ID22746
Feature activation+9.83e-01

Pos logit contributions
 attention+0.200
 purpose+0.172
 clues+0.168
 anything+0.167
 rules+0.167
Neg logit contributions
 could-0.144
 agreed-0.133
 nodded-0.132
 wanted-0.131
 said-0.131
Token named
Token ID3706
Feature activation-1.17e+00

Pos logit contributions
olded+0.042
 agreed+0.036
 hadn+0.036
ave+0.036
 apologised+0.035
Neg logit contributions
 thing-0.045
 things-0.045
 red-0.044
 new-0.043
 food-0.043
Token Tim
Token ID5045
Feature activation+1.22e+00

Pos logit contributions
 attention+0.062
 cotton+0.059
 things+0.058
 new+0.057
 sight+0.056
Neg logit contributions
 could-0.042
 reached-0.040
 couldn-0.039
 agreed-0.039
 thanked-0.038
Token found
Token ID1043
Feature activation-2.47e+00

Pos logit contributions
olded+0.042
 agreed+0.032
 hadn+0.031
ave+0.031
 apologised+0.031
Neg logit contributions
 things-0.069
 food-0.068
 red-0.067
 thing-0.067
 sand-0.066
Token a
Token ID257
Feature activation-3.99e+00

Pos logit contributions
 attention+0.102
 cotton+0.096
 TV+0.094
 rainy+0.089
 birthday+0.089
Neg logit contributions
 could-0.124
 reached-0.119
 couldn-0.113
 thanked-0.113
 suggested-0.112
Token shiny
Token ID22441
Feature activation-5.72e+00

Pos logit contributions
 attention+0.152
ags+0.132
reating+0.118
 choices+0.117
 groceries+0.110
Neg logit contributions
 could-0.228
 couldn-0.224
 hadn-0.213
'd-0.212
 didn-0.210
Token trophy
Token ID16383
Feature activation-3.32e+00

Pos logit contributions
 attention+0.250
 unnecessary+0.211
 emergencies+0.192
 choices+0.190
 films+0.189
Neg logit contributions
 could-0.249
 couldn-0.244
 replied-0.243
 lived-0.240
 was-0.235
Token in
Token ID287
Feature activation-1.33e+00

Pos logit contributions
 attention+0.151
 cotton+0.136
 purpose+0.131
 groceries+0.129
 TV+0.126
Neg logit contributions
 could-0.167
 couldn-0.148
 suggested-0.147
 reached-0.147
 agreed-0.144
Token the
Token ID262
Feature activation-1.75e+00

Pos logit contributions
 attention+0.050
 cotton+0.046
 things+0.043
 new+0.042
 groceries+0.042
Neg logit contributions
 could-0.069
 reached-0.067
 thanked-0.066
 suggested-0.065
 couldn-0.065
Token forest
Token ID8222
Feature activation-1.05e+00

Pos logit contributions
 attention+0.069
 cotton+0.062
 things+0.056
 groceries+0.055
 TV+0.054
Neg logit contributions
 could-0.090
 reached-0.086
 thanked-0.085
 couldn-0.083
 suggested-0.082
Token.
Token ID13
Feature activation-3.66e-01

Pos logit contributions
 attention+0.032
 cotton+0.031
 things+0.029
 new+0.028
 food+0.027
Neg logit contributions
 could-0.061
 reached-0.059
 agreed-0.058
 couldn-0.058
olded-0.058
Token Tim
Token ID5045
Feature activation+1.40e+00

Pos logit contributions
 attention+0.073
 cotton+0.072
 things+0.069
 sight+0.068
 new+0.067
Neg logit contributions
 could-0.046
 reached-0.042
 couldn-0.042
 agreed-0.041
olded-0.040
Token saw
Token ID2497
Feature activation-2.21e+00

Pos logit contributions
olded+0.052
 apologised+0.039
Well+0.039
ave+0.039
 agreed+0.038
Neg logit contributions
 things-0.075
 food-0.074
 thing-0.074
 red-0.074
 sand-0.073
Token a
Token ID257
Feature activation-4.51e+00

Pos logit contributions
 attention+0.096
 cotton+0.081
 gift+0.080
 job+0.077
rier+0.077
Neg logit contributions
 could-0.108
 reached-0.104
 couldn-0.101
 thanked-0.101
 suggested-0.100
Token big
Token ID1263
Feature activation-4.67e+00

Pos logit contributions
 attention+0.194
ags+0.172
 choices+0.145
reating+0.143
 groceries+0.140
Neg logit contributions
 could-0.235
 couldn-0.227
 didn-0.220
 forgot-0.216
 wanted-0.215
Token,
Token ID11
Feature activation-3.78e+00

Pos logit contributions
 attention+0.213
 unnecessary+0.174
rier+0.166
ags+0.165
 clues+0.155
Neg logit contributions
 could-0.199
 couldn-0.198
 said-0.195
 lived-0.194
 replied-0.192
Token shiny
Token ID22441
Feature activation-5.67e+00

Pos logit contributions
 attention+0.128
 purpose+0.104
reating+0.100
 cotton+0.099
 things+0.098
Neg logit contributions
 could-0.214
 hadn-0.209
 wanted-0.206
 reached-0.201
 stood-0.200
Token motorcycle
Token ID18757
Feature activation-1.97e+00

Pos logit contributions
 attention+0.254
 unnecessary+0.219
 choices+0.179
ags+0.179
 numb+0.178
Neg logit contributions
 replied-0.237
 could-0.237
 wanted-0.234
 was-0.233
 lived-0.232
Token.
Token ID13
Feature activation-6.78e-01

Pos logit contributions
 attention+0.074
 cotton+0.067
 groceries+0.063
 things+0.061
 purpose+0.061
Neg logit contributions
 could-0.109
 couldn-0.102
 reached-0.102
 hadn-0.100
 suggested-0.099
Token It
Token ID632
Feature activation+1.09e+00

Pos logit contributions
 attention+0.041
 cotton+0.040
 things+0.039
 new+0.039
 food+0.038
Neg logit contributions
 could-0.018
olded-0.017
 agreed-0.017
 reached-0.017
 thanked-0.017
Token looked
Token ID3114
Feature activation-2.67e+00

Pos logit contributions
olded+0.035
ave+0.030
 agreed+0.027
 apologised+0.027
oped+0.026
Neg logit contributions
 red-0.068
 new-0.067
 shiny-0.066
 round-0.065
 thing-0.063
Token different
Token ID1180
Feature activation-2.57e+00

Pos logit contributions
 attention+0.094
 cotton+0.092
 purpose+0.091
 dessert+0.088
 groceries+0.088
Neg logit contributions
 could-0.150
 couldn-0.143
 reached-0.136
 thanked-0.135
 hadn-0.134
Token Lily
Token ID20037
Feature activation-1.04e+00

Pos logit contributions
 attention+0.047
 cotton+0.045
 things+0.044
 new+0.043
 food+0.042
Neg logit contributions
 could-0.045
 reached-0.043
 agreed-0.043
 couldn-0.043
olded-0.042
Token.
Token ID13
Feature activation-1.15e+00

Pos logit contributions
 attention+0.033
 cotton+0.031
 things+0.029
 new+0.029
 food+0.028
Neg logit contributions
 could-0.060
 reached-0.057
 agreed-0.057
 couldn-0.057
 thanked-0.057
Token Lily
Token ID20037
Feature activation+1.05e+00

Pos logit contributions
 attention+0.071
 cotton+0.069
 things+0.067
 new+0.066
 food+0.066
Neg logit contributions
 could-0.032
 reached-0.029
 agreed-0.028
 couldn-0.028
 thanked-0.028
Token found
Token ID1043
Feature activation-2.67e+00

Pos logit contributions
olded+0.033
oped+0.027
 apologised+0.026
 agreed+0.025
ave+0.025
Neg logit contributions
 things-0.061
 thing-0.061
 red-0.060
 new-0.060
 sunny-0.059
Token a
Token ID257
Feature activation-4.50e+00

Pos logit contributions
 attention+0.114
 cotton+0.101
 job+0.101
 TV+0.100
 rainy+0.099
Neg logit contributions
 could-0.126
 reached-0.120
 thanked-0.115
 said-0.114
 understood-0.114
Token shiny
Token ID22441
Feature activation-5.64e+00

Pos logit contributions
 attention+0.163
ags+0.138
 choices+0.123
reating+0.122
 any+0.118
Neg logit contributions
 could-0.251
 couldn-0.241
 hadn-0.230
 said-0.229
 wanted-0.226
Token diamond
Token ID15291
Feature activation-3.05e+00

Pos logit contributions
 attention+0.222
 unnecessary+0.193
 emergencies+0.166
 films+0.165
 storms+0.163
Neg logit contributions
 could-0.269
 was-0.258
 couldn-0.257
 replied-0.257
 lived-0.256
Token on
Token ID319
Feature activation-2.72e+00

Pos logit contributions
 attention+0.128
 cotton+0.121
 purpose+0.117
 TV+0.117
 dessert+0.114
Neg logit contributions
 could-0.160
 couldn-0.147
 hadn-0.136
 said-0.136
 wanted-0.133
Token the
Token ID262
Feature activation-3.36e+00

Pos logit contributions
 attention+0.139
 gift+0.111
 unnecessary+0.110
 cotton+0.109
 groceries+0.108
Neg logit contributions
 could-0.109
 reached-0.103
 couldn-0.101
 thanked-0.100
 lived-0.099
Token ground
Token ID2323
Feature activation-8.38e-01

Pos logit contributions
 attention+0.137
 cotton+0.125
 unnecessary+0.123
 things+0.120
 sight+0.117
Neg logit contributions
 could-0.144
 couldn-0.141
 thanked-0.139
 lived-0.126
 reached-0.126
Token.
Token ID13
Feature activation-4.18e-01

Pos logit contributions
 attention+0.024
 cotton+0.022
 things+0.022
 new+0.021
 food+0.021
Neg logit contributions
 could-0.049
olded-0.048
 reached-0.048
 agreed-0.048
 couldn-0.048
Token,
Token ID11
Feature activation-1.28e+00

Pos logit contributions
 attention+0.031
 cotton+0.029
 things+0.028
 new+0.028
 food+0.027
Neg logit contributions
 could-0.046
 reached-0.045
olded-0.045
 agreed-0.044
 couldn-0.044
Token they
Token ID484
Feature activation+2.16e+00

Pos logit contributions
 attention+0.068
 cotton+0.065
 things+0.062
 new+0.062
 gift+0.062
Neg logit contributions
 could-0.046
 reached-0.043
 couldn-0.043
 agreed-0.043
 thanked-0.042
Token saw
Token ID2497
Feature activation-1.70e+00

Pos logit contributions
olded+0.097
etta+0.096
lor+0.089
aned+0.088
 From+0.087
Neg logit contributions
 red-0.090
 sand-0.090
 dark-0.089
 big-0.088
 home-0.083
Token a
Token ID257
Feature activation-4.33e+00

Pos logit contributions
 attention+0.057
 cotton+0.050
 gift+0.049
 job+0.048
 TV+0.047
Neg logit contributions
 could-0.094
 reached-0.091
 agreed-0.089
 couldn-0.089
 thanked-0.088
Token big
Token ID1263
Feature activation-4.50e+00

Pos logit contributions
 attention+0.155
ags+0.133
 groceries+0.109
 any+0.107
 choices+0.104
Neg logit contributions
 could-0.250
 couldn-0.236
 hadn-0.232
 said-0.227
 had-0.226
Token shiny
Token ID22441
Feature activation-5.61e+00

Pos logit contributions
 attention+0.191
 unnecessary+0.148
rier+0.146
ags+0.145
 groceries+0.143
Neg logit contributions
 could-0.203
 said-0.202
 replied-0.198
 lived-0.197
 reached-0.196
Token car
Token ID1097
Feature activation-1.52e+00

Pos logit contributions
 attention+0.234
 unnecessary+0.200
ags+0.169
 groceries+0.163
 numb+0.163
Neg logit contributions
 replied-0.250
 lived-0.248
 could-0.247
 was-0.247
 wanted-0.241
Token parked
Token ID19584
Feature activation-3.28e+00

Pos logit contributions
 attention+0.053
 cotton+0.049
 things+0.046
 new+0.045
 food+0.045
Neg logit contributions
 could-0.084
 reached-0.081
 couldn-0.080
 agreed-0.079
 suggested-0.079
Token near
Token ID1474
Feature activation-2.00e+00

Pos logit contributions
 attention+0.141
 cotton+0.127
 purpose+0.123
 unnecessary+0.119
 groceries+0.119
Neg logit contributions
 could-0.150
 reached-0.149
 stood-0.144
 agreed-0.143
 suggested-0.141
Token the
Token ID262
Feature activation-2.51e+00

Pos logit contributions
 attention+0.077
 cotton+0.072
 job+0.066
 rainy+0.066
 gift+0.066
Neg logit contributions
 could-0.095
 reached-0.094
 thanked-0.094
 suggested-0.093
 couldn-0.091
Token swings
Token ID26728
Feature activation-6.90e-01

Pos logit contributions
 attention+0.089
 cotton+0.078
 gift+0.073
 things+0.072
 dessert+0.072
Neg logit contributions
 could-0.132
 thanked-0.123
 reached-0.123
 couldn-0.120
 hadn-0.116
Token Sally
Token ID25737
Feature activation-4.59e-01

Pos logit contributions
 attention+0.027
 cotton+0.026
 new+0.026
 things+0.025
 food+0.025
Neg logit contributions
 could-0.030
olded-0.029
 thanked-0.028
 reached-0.028
 agreed-0.028
Token.
Token ID13
Feature activation-1.01e+00

Pos logit contributions
 attention+0.011
 new+0.010
 things+0.010
 cotton+0.010
 food+0.010
Neg logit contributions
olded-0.028
 could-0.028
 reached-0.028
 agreed-0.027
 couldn-0.027
Token Sally
Token ID25737
Feature activation+2.27e+00

Pos logit contributions
 attention+0.057
 cotton+0.055
 things+0.054
 new+0.053
 food+0.052
Neg logit contributions
 could-0.032
 reached-0.030
 agreed-0.030
 couldn-0.030
 thanked-0.030
Token found
Token ID1043
Feature activation-2.38e+00

Pos logit contributions
olded+0.092
oped+0.081
ave+0.075
 apologised+0.075
alled+0.071
Neg logit contributions
 red-0.119
 things-0.115
 new-0.114
 thing-0.113
 sunny-0.111
Token a
Token ID257
Feature activation-4.40e+00

Pos logit contributions
 attention+0.092
 cotton+0.084
 TV+0.083
 job+0.081
 rainy+0.081
Neg logit contributions
 could-0.119
 reached-0.114
 couldn-0.110
 thanked-0.109
 understood-0.108
Token shiny
Token ID22441
Feature activation-5.61e+00

Pos logit contributions
 attention+0.152
ags+0.125
reating+0.113
 choices+0.112
 any+0.110
Neg logit contributions
 could-0.249
 couldn-0.240
 hadn-0.231
 said-0.227
'd-0.226
Token penny
Token ID26369
Feature activation-2.50e+00

Pos logit contributions
 attention+0.218
 unnecessary+0.188
 films+0.164
 emergencies+0.163
 storms+0.160
Neg logit contributions
 could-0.268
 couldn-0.258
 was-0.258
 lived-0.257
 replied-0.256
Token on
Token ID319
Feature activation-2.55e+00

Pos logit contributions
 attention+0.097
 cotton+0.091
 purpose+0.088
 TV+0.087
 sight+0.086
Neg logit contributions
 could-0.136
 couldn-0.124
 suggested-0.120
 hadn-0.119
 reached-0.118
Token the
Token ID262
Feature activation-3.31e+00

Pos logit contributions
 attention+0.121
 cotton+0.100
 gift+0.099
 groceries+0.095
 job+0.095
Neg logit contributions
 could-0.107
 reached-0.104
 couldn-0.101
 thanked-0.099
 lived-0.097
Token ground
Token ID2323
Feature activation-5.77e-01

Pos logit contributions
 attention+0.133
 cotton+0.126
 things+0.122
 unnecessary+0.120
 sight+0.116
Neg logit contributions
 couldn-0.141
 could-0.140
 thanked-0.136
 reached-0.130
 From-0.123
Token.
Token ID13
Feature activation+1.89e-01

Pos logit contributions
 attention+0.015
 cotton+0.014
 things+0.014
 new+0.014
 food+0.013
Neg logit contributions
 could-0.035
olded-0.034
 reached-0.034
 thanked-0.034
 agreed-0.034
Token playing
Token ID2712
Feature activation-3.21e+00

Pos logit contributions
 attention+0.102
 cotton+0.101
 gift+0.095
 purpose+0.091
 sight+0.091
Neg logit contributions
 couldn-0.119
 could-0.118
 reached-0.117
 stood-0.113
olded-0.111
Token,
Token ID11
Feature activation-9.54e-01

Pos logit contributions
 attention+0.145
 ours+0.124
 cotton+0.122
 sight+0.116
 groceries+0.114
Neg logit contributions
 could-0.155
 reached-0.148
 lived-0.142
 didn-0.140
 stood-0.138
Token she
Token ID673
Feature activation+1.75e+00

Pos logit contributions
 attention+0.046
 cotton+0.044
 things+0.043
 new+0.042
 food+0.042
Neg logit contributions
 could-0.038
 reached-0.036
 agreed-0.036
olded-0.036
 couldn-0.036
Token found
Token ID1043
Feature activation-2.13e+00

Pos logit contributions
olded+0.067
 agreed+0.060
 From+0.060
Well+0.057
ver+0.056
Neg logit contributions
 dark-0.088
 sand-0.088
 things-0.088
 gloomy-0.084
 red-0.082
Token a
Token ID257
Feature activation-4.33e+00

Pos logit contributions
 attention+0.081
 cotton+0.075
 TV+0.073
 job+0.071
 rainy+0.070
Neg logit contributions
 could-0.111
 reached-0.107
 couldn-0.102
 understood-0.101
 thanked-0.101
Token shiny
Token ID22441
Feature activation-5.56e+00

Pos logit contributions
 attention+0.150
ags+0.134
 choices+0.107
 groceries+0.107
 cotton+0.103
Neg logit contributions
 could-0.249
 couldn-0.239
 hadn-0.237
'd-0.228
 didn-0.222
Token coin
Token ID10752
Feature activation-2.73e+00

Pos logit contributions
 attention+0.225
 unnecessary+0.190
 films+0.167
 groceries+0.166
 storms+0.163
Neg logit contributions
 could-0.265
 replied-0.258
 was-0.256
 couldn-0.253
 wanted-0.248
Token on
Token ID319
Feature activation-2.62e+00

Pos logit contributions
 attention+0.107
 cotton+0.104
 TV+0.100
 purpose+0.099
 groceries+0.096
Neg logit contributions
 could-0.146
 couldn-0.132
 reached-0.131
 hadn-0.129
 suggested-0.129
Token the
Token ID262
Feature activation-3.04e+00

Pos logit contributions
 attention+0.130
 gift+0.106
 cotton+0.105
 rainy+0.103
 job+0.102
Neg logit contributions
 could-0.108
 reached-0.108
 couldn-0.104
 thanked-0.102
 asked-0.099
Token ground
Token ID2323
Feature activation-9.85e-01

Pos logit contributions
 attention+0.126
 cotton+0.118
 things+0.114
 sight+0.110
 gift+0.110
Neg logit contributions
 could-0.136
 couldn-0.127
 thanked-0.127
 reached-0.125
 hadn-0.117
Token.
Token ID13
Feature activation+8.96e-01

Pos logit contributions
 attention+0.029
 cotton+0.027
 things+0.026
 new+0.026
 food+0.025
Neg logit contributions
 could-0.058
 reached-0.056
 agreed-0.055
olded-0.055
 couldn-0.055
Token She
Token ID1375
Feature activation+2.03e+00

Pos logit contributions
 attention+0.035
 cotton+0.034
 things+0.033
 new+0.033
 food+0.033
Neg logit contributions
 could-0.015
olded-0.014
 agreed-0.014
 reached-0.014
 thanked-0.014
Token saw
Token ID2497
Feature activation-2.25e+00

Pos logit contributions
olded+0.089
 apologised+0.077
ave+0.071
oped+0.071
 From+0.070
Neg logit contributions
 red-0.101
 new-0.099
 sunny-0.097
 shiny-0.097
 round-0.094
Token a
Token ID257
Feature activation-4.09e+00

Pos logit contributions
 attention+0.086
 job+0.079
 gift+0.074
 purpose+0.074
rier+0.074
Neg logit contributions
 could-0.117
 reached-0.112
 couldn-0.108
 thanked-0.107
 agreed-0.107
Token small
Token ID1402
Feature activation-4.43e+00

Pos logit contributions
 attention+0.156
ags+0.127
 any+0.120
 groceries+0.115
 choices+0.108
Neg logit contributions
 could-0.235
 couldn-0.216
 said-0.212
 had-0.212
 hadn-0.210
Token,
Token ID11
Feature activation-3.88e+00

Pos logit contributions
 attention+0.200
 unnecessary+0.156
ags+0.156
 any+0.151
 groceries+0.150
Neg logit contributions
 could-0.197
 wanted-0.187
 said-0.187
 lived-0.185
 asked-0.181
Token lost
Token ID2626
Feature activation-5.55e+00

Pos logit contributions
 attention+0.118
 purpose+0.095
reating+0.093
 job+0.088
 lunch+0.087
Neg logit contributions
 could-0.230
 hadn-0.223
 wanted-0.220
 had-0.218
 stood-0.215
Token dog
Token ID3290
Feature activation-1.97e+00

Pos logit contributions
 attention+0.249
 unnecessary+0.200
 cotton+0.181
 choices+0.179
ags+0.174
Neg logit contributions
 lived-0.239
 could-0.228
 wanted-0.227
 was-0.219
 had-0.207
Token near
Token ID1474
Feature activation-2.62e+00

Pos logit contributions
 attention+0.088
 cotton+0.079
 groceries+0.073
 sight+0.073
 sand+0.072
Neg logit contributions
 could-0.098
 suggested-0.090
 couldn-0.090
 reached-0.088
 hadn-0.088
Token her
Token ID607
Feature activation-2.84e+00

Pos logit contributions
 attention+0.118
rier+0.112
 cotton+0.108
 job+0.107
 rainy+0.104
Neg logit contributions
 could-0.104
 thanked-0.101
 asked-0.099
 From-0.099
 reached-0.098
Token door
Token ID3420
Feature activation-1.63e+00

Pos logit contributions
 cotton+0.108
 attention+0.102
 rainy+0.090
 dessert+0.090
rier+0.089
Neg logit contributions
 could-0.144
 couldn-0.136
 thanked-0.130
 said-0.126
 From-0.125
Token.
Token ID13
Feature activation+3.39e-01

Pos logit contributions
 attention+0.057
 cotton+0.057
 things+0.052
 dessert+0.052
 groceries+0.051
Neg logit contributions
 could-0.088
 reached-0.083
 agreed-0.083
 suggested-0.083
 couldn-0.082
Token cat
Token ID3797
Feature activation+1.18e+00

Pos logit contributions
 attention+0.134
 cotton+0.112
 groceries+0.110
 things+0.101
 any+0.101
Neg logit contributions
 could-0.142
 lived-0.136
 reached-0.133
 nodded-0.132
 couldn-0.132
Token named
Token ID3706
Feature activation-1.58e+00

Pos logit contributions
olded+0.052
 hadn+0.047
 agreed+0.046
 apologised+0.046
ave+0.045
Neg logit contributions
 thing-0.053
 things-0.051
 food-0.050
 red-0.050
 new-0.048
Token Tom
Token ID4186
Feature activation+1.60e+00

Pos logit contributions
 attention+0.085
 cotton+0.083
 things+0.079
 sight+0.078
 gift+0.077
Neg logit contributions
 could-0.056
 reached-0.053
 couldn-0.051
 agreed-0.050
 suggested-0.049
Token saw
Token ID2497
Feature activation-2.11e+00

Pos logit contributions
olded+0.057
 apologised+0.045
ave+0.044
Well+0.044
 hadn+0.043
Neg logit contributions
 things-0.090
 food-0.089
 thing-0.088
 red-0.086
 sand-0.083
Token a
Token ID257
Feature activation-4.14e+00

Pos logit contributions
 attention+0.089
 cotton+0.077
 gift+0.074
 groceries+0.074
 job+0.073
Neg logit contributions
 could-0.104
 reached-0.101
 lived-0.098
 thanked-0.097
 agreed-0.096
Token shiny
Token ID22441
Feature activation-5.55e+00

Pos logit contributions
 attention+0.182
ags+0.159
 groceries+0.138
 choices+0.137
reating+0.137
Neg logit contributions
 could-0.213
 couldn-0.204
 didn-0.197
 forgot-0.196
 wanted-0.196
Token ornament
Token ID45618
Feature activation-4.34e+00

Pos logit contributions
 attention+0.247
 unnecessary+0.202
 groceries+0.178
 choices+0.176
reating+0.170
Neg logit contributions
 replied-0.235
 lived-0.229
 wanted-0.224
 couldn-0.223
 could-0.222
Token hanging
Token ID10938
Feature activation-2.66e+00

Pos logit contributions
 attention+0.212
 cotton+0.201
 purpose+0.198
 groceries+0.194
 unnecessary+0.191
Neg logit contributions
 could-0.196
 couldn-0.172
 stood-0.166
 suggested-0.166
 wanted-0.165
Token on
Token ID319
Feature activation-2.16e+00

Pos logit contributions
 attention+0.107
 cotton+0.103
 dessert+0.102
 groceries+0.097
 TV+0.097
Neg logit contributions
 could-0.128
 reached-0.118
 couldn-0.116
 stood-0.114
 suggested-0.113
Token a
Token ID257
Feature activation-4.17e+00

Pos logit contributions
 attention+0.088
 gift+0.073
 cotton+0.071
 rainy+0.069
 job+0.068
Neg logit contributions
 could-0.107
 reached-0.103
 thanked-0.100
 From-0.098
 couldn-0.098
Token tree
Token ID5509
Feature activation-1.67e+00

Pos logit contributions
 attention+0.222
 choices+0.182
 luck+0.180
 unnecessary+0.175
 consequences+0.173
Neg logit contributions
 could-0.174
 couldn-0.157
 reached-0.147
 lived-0.144
 was-0.141
Token playing
Token ID2712
Feature activation-3.01e+00

Pos logit contributions
 attention+0.072
 cotton+0.067
 gift+0.065
 new+0.064
 things+0.064
Neg logit contributions
 could-0.063
 reached-0.060
 couldn-0.060
 thanked-0.058
 agreed-0.057
Token,
Token ID11
Feature activation-1.66e+00

Pos logit contributions
 attention+0.156
 cotton+0.139
 sight+0.131
 groceries+0.129
 ours+0.128
Neg logit contributions
 could-0.121
 lived-0.114
 reached-0.113
 agreed-0.107
 was-0.104
Token he
Token ID339
Feature activation+1.54e+00

Pos logit contributions
 attention+0.093
 cotton+0.086
 gift+0.086
 new+0.083
 things+0.083
Neg logit contributions
 could-0.056
 reached-0.053
 couldn-0.052
 agreed-0.050
 thanked-0.049
Token found
Token ID1043
Feature activation-2.19e+00

Pos logit contributions
olded+0.058
 From+0.053
Well+0.052
 agreed+0.049
oped+0.048
Neg logit contributions
 dark-0.078
 things-0.077
 sand-0.076
 red-0.076
 new-0.075
Token a
Token ID257
Feature activation-4.24e+00

Pos logit contributions
 attention+0.088
 cotton+0.082
 rainy+0.081
 birthday+0.079
 TV+0.078
Neg logit contributions
 could-0.109
 reached-0.107
 thanked-0.100
 couldn-0.099
 stood-0.098
Token shiny
Token ID22441
Feature activation-5.55e+00

Pos logit contributions
 attention+0.143
ags+0.129
 groceries+0.108
 choices+0.103
reating+0.103
Neg logit contributions
 could-0.247
 couldn-0.234
 hadn-0.232
'd-0.226
 had-0.220
Token coin
Token ID10752
Feature activation-3.43e+00

Pos logit contributions
 attention+0.226
 unnecessary+0.195
 groceries+0.167
 storms+0.163
 temptation+0.163
Neg logit contributions
 could-0.264
 was-0.256
 replied-0.247
 lived-0.246
 couldn-0.245
Token on
Token ID319
Feature activation-2.77e+00

Pos logit contributions
 attention+0.147
 purpose+0.144
 cotton+0.144
 TV+0.139
 rainy+0.136
Neg logit contributions
 could-0.177
 couldn-0.154
 reached-0.151
 hadn-0.148
 suggested-0.148
Token the
Token ID262
Feature activation-3.21e+00

Pos logit contributions
 attention+0.140
 gift+0.117
 cotton+0.114
 rainy+0.112
 birthday+0.111
Neg logit contributions
 reached-0.112
 could-0.111
 couldn-0.105
 thanked-0.105
 From-0.102
Token floor
Token ID4314
Feature activation-1.15e+00

Pos logit contributions
 attention+0.138
 cotton+0.127
 things+0.123
 gift+0.121
 sight+0.120
Neg logit contributions
 could-0.138
 thanked-0.129
 couldn-0.128
 reached-0.126
 hadn-0.116
Token.
Token ID13
Feature activation+6.49e-01

Pos logit contributions
 attention+0.036
 cotton+0.034
 things+0.032
 new+0.032
 food+0.030
Neg logit contributions
 could-0.066
 reached-0.064
 agreed-0.064
olded-0.063
 couldn-0.063
Token day
Token ID1110
Feature activation-2.25e+00

Pos logit contributions
 attention+0.071
 cotton+0.069
 groceries+0.063
rier+0.062
 things+0.059
Neg logit contributions
 could-0.128
 couldn-0.122
 reached-0.120
 lived-0.120
 agreed-0.120
Token,
Token ID11
Feature activation-2.09e+00

Pos logit contributions
 attention+0.106
 cotton+0.098
 groceries+0.096
 sight+0.094
rier+0.092
Neg logit contributions
 could-0.102
 couldn-0.095
 reached-0.092
 lived-0.089
 agreed-0.088
Token she
Token ID673
Feature activation+1.28e+00

Pos logit contributions
 attention+0.124
 cotton+0.113
 purpose+0.112
 groceries+0.111
 sight+0.110
Neg logit contributions
 could-0.073
 couldn-0.065
 reached-0.062
 agreed-0.059
 didn-0.058
Token found
Token ID1043
Feature activation-2.55e+00

Pos logit contributions
olded+0.045
oped+0.038
Well+0.037
 agreed+0.037
 apologised+0.036
Neg logit contributions
 things-0.069
 red-0.067
 food-0.067
 dark-0.067
 new-0.066
Token a
Token ID257
Feature activation-4.44e+00

Pos logit contributions
 attention+0.106
 cotton+0.097
 job+0.094
 TV+0.094
 rainy+0.091
Neg logit contributions
 could-0.123
 reached-0.119
 thanked-0.113
 understood-0.112
 couldn-0.111
Token shiny
Token ID22441
Feature activation-5.55e+00

Pos logit contributions
 attention+0.164
ags+0.148
 choices+0.125
reating+0.116
 groceries+0.116
Neg logit contributions
 could-0.246
 couldn-0.238
 hadn-0.229
'd-0.226
 didn-0.222
Token gold
Token ID3869
Feature activation-4.31e+00

Pos logit contributions
 attention+0.222
 unnecessary+0.190
 groceries+0.163
 emergencies+0.161
 temptation+0.160
Neg logit contributions
 could-0.258
 replied-0.249
 lived-0.246
 was-0.246
 couldn-0.245
Token coin
Token ID10752
Feature activation-2.65e+00

Pos logit contributions
 attention+0.179
 rainy+0.161
 unnecessary+0.159
 temptation+0.147
 groceries+0.147
Neg logit contributions
 could-0.224
 reached-0.195
 said-0.189
 couldn-0.188
 had-0.188
Token on
Token ID319
Feature activation-2.79e+00

Pos logit contributions
 attention+0.103
 cotton+0.101
 TV+0.097
 purpose+0.097
 dessert+0.093
Neg logit contributions
 could-0.144
 couldn-0.130
 reached-0.127
 hadn-0.125
 suggested-0.124
Token the
Token ID262
Feature activation-3.38e+00

Pos logit contributions
 attention+0.141
 gift+0.114
 cotton+0.113
 unnecessary+0.112
 groceries+0.110
Neg logit contributions
 could-0.111
 reached-0.108
 couldn-0.105
 thanked-0.104
 From-0.103
Token ground
Token ID2323
Feature activation-9.06e-01

Pos logit contributions
 attention+0.143
 cotton+0.127
 things+0.123
 unnecessary+0.121
 sight+0.116
Neg logit contributions
 could-0.148
 couldn-0.142
 thanked-0.140
 reached-0.135
 lived-0.130
Token named
Token ID3706
Feature activation-1.41e+00

Pos logit contributions
 attention+0.060
 cotton+0.058
 things+0.057
 new+0.056
 food+0.055
Neg logit contributions
 could-0.025
 reached-0.023
 agreed-0.023
 couldn-0.022
olded-0.022
Token Tim
Token ID5045
Feature activation+1.70e+00

Pos logit contributions
 attention+0.079
 cotton+0.077
 things+0.073
 sight+0.073
 new+0.072
Neg logit contributions
 could-0.049
 reached-0.045
 couldn-0.044
 agreed-0.044
 thanked-0.042
Token went
Token ID1816
Feature activation-3.56e+00

Pos logit contributions
olded+0.063
 apologised+0.048
Well+0.048
ave+0.048
 hadn+0.047
Neg logit contributions
 things-0.092
 food-0.090
 thing-0.089
 red-0.089
 sand-0.086
Token to
Token ID284
Feature activation-1.74e+00

Pos logit contributions
 purpose+0.169
rier+0.167
 cotton+0.166
 sight+0.165
 avail+0.162
Neg logit contributions
 could-0.135
 reached-0.131
olded-0.124
 rode-0.123
 suggested-0.122
Token get
Token ID651
Feature activation-2.93e+00

Pos logit contributions
 attention+0.073
 cotton+0.068
 things+0.063
 gift+0.063
 thing+0.063
Neg logit contributions
 could-0.081
 reached-0.078
 couldn-0.078
 thanked-0.077
 agreed-0.074
Token a
Token ID257
Feature activation-5.50e+00

Pos logit contributions
 purpose+0.106
 sight+0.104
 cotton+0.103
rier+0.099
 thing+0.099
Neg logit contributions
 could-0.144
 couldn-0.136
 asked-0.135
 thanked-0.133
 suggested-0.133
Token haircut
Token ID45548
Feature activation-2.04e+00

Pos logit contributions
 attention+0.166
ags+0.166
rier+0.155
 choices+0.147
agging+0.138
Neg logit contributions
 could-0.289
 couldn-0.277
 said-0.276
 replied-0.266
 suggested-0.264
Token.
Token ID13
Feature activation-6.79e-01

Pos logit contributions
 attention+0.071
 cotton+0.070
 sight+0.065
 things+0.064
 sand+0.063
Neg logit contributions
 could-0.108
 agreed-0.102
olded-0.101
 couldn-0.101
 suggested-0.100
Token He
Token ID679
Feature activation+2.75e+00

Pos logit contributions
 attention+0.042
 cotton+0.040
 things+0.040
 new+0.040
 food+0.039
Neg logit contributions
 could-0.018
olded-0.017
 agreed-0.016
 reached-0.016
 hadn-0.016
Token was
Token ID373
Feature activation-2.32e+00

Pos logit contributions
olded+0.102
oped+0.097
 apologised+0.097
 congratulated+0.093
 From+0.093
Neg logit contributions
 red-0.144
 shiny-0.136
 new-0.135
 big-0.132
 things-0.130
Token very
Token ID845
Feature activation-1.04e+00

Pos logit contributions
 cotton+0.087
 attention+0.084
 purpose+0.081
 sand+0.078
 groceries+0.077
Neg logit contributions
 could-0.120
 couldn-0.115
 reached-0.114
 asked-0.111
 thanked-0.111
Token day
Token ID1110
Feature activation-1.33e+00

Pos logit contributions
 attention+0.027
 cotton+0.025
 things+0.023
 new+0.022
 food+0.021
Neg logit contributions
 could-0.073
 couldn-0.071
 agreed-0.071
 reached-0.070
 thanked-0.070
Token,
Token ID11
Feature activation-1.41e+00

Pos logit contributions
 attention+0.050
 cotton+0.048
 things+0.045
 new+0.044
 sight+0.044
Neg logit contributions
 could-0.068
 reached-0.065
 agreed-0.065
 couldn-0.065
 suggested-0.064
Token they
Token ID484
Feature activation+2.20e+00

Pos logit contributions
 attention+0.074
 cotton+0.073
 things+0.069
 new+0.068
 gift+0.067
Neg logit contributions
 could-0.052
 reached-0.048
 agreed-0.048
 couldn-0.048
 suggested-0.047
Token found
Token ID1043
Feature activation-1.70e+00

Pos logit contributions
etta+0.079
olded+0.075
lor+0.072
 From+0.068
zed+0.068
Neg logit contributions
 red-0.108
 dark-0.106
 big-0.104
 sand-0.102
 home-0.102
Token a
Token ID257
Feature activation-4.35e+00

Pos logit contributions
 attention+0.056
 cotton+0.053
 things+0.048
 gift+0.048
 TV+0.048
Neg logit contributions
 could-0.093
 reached-0.090
 couldn-0.088
 agreed-0.088
olded-0.087
Token shiny
Token ID22441
Feature activation-5.48e+00

Pos logit contributions
 attention+0.144
ags+0.126
 choices+0.102
 any+0.097
 temptation+0.096
Neg logit contributions
 could-0.250
 couldn-0.244
 hadn-0.240
'd-0.231
 suggested-0.230
Token ring
Token ID5858
Feature activation-2.17e+00

Pos logit contributions
 attention+0.209
 unnecessary+0.181
 groceries+0.148
 temptation+0.146
 numb+0.146
Neg logit contributions
 could-0.262
 lived-0.260
 replied-0.260
 was-0.259
 couldn-0.254
Token in
Token ID287
Feature activation-2.16e+00

Pos logit contributions
 attention+0.078
 cotton+0.071
 groceries+0.067
 things+0.067
 dessert+0.066
Neg logit contributions
 could-0.118
 reached-0.110
 couldn-0.110
 suggested-0.108
 agreed-0.107
Token her
Token ID607
Feature activation-1.36e+00

Pos logit contributions
 attention+0.092
 cotton+0.078
 job+0.078
 groceries+0.075
 rainy+0.075
Neg logit contributions
 thanked-0.096
 could-0.096
 reached-0.095
 suggested-0.094
 agreed-0.091
Token drawer
Token ID33451
Feature activation-4.29e-02

Pos logit contributions
 attention+0.044
 cotton+0.043
 things+0.038
 food+0.037
 rainy+0.037
Neg logit contributions
 could-0.077
 couldn-0.073
 thanked-0.072
 reached-0.072
olded-0.071
Token.
Token ID13
Feature activation+1.26e+00

Pos logit contributions
 attention+0.001
 things+0.001
 cotton+0.001
 new+0.001
 food+0.001
Neg logit contributions
 could-0.003
olded-0.003
 reached-0.003
 thanked-0.003
 hadn-0.003
Token day
Token ID1110
Feature activation-2.25e+00

Pos logit contributions
 attention+0.072
 cotton+0.071
rier+0.065
 groceries+0.064
 things+0.060
Neg logit contributions
 could-0.119
 couldn-0.113
 lived-0.112
 agreed-0.112
 reached-0.111
Token,
Token ID11
Feature activation-1.62e+00

Pos logit contributions
 attention+0.104
 cotton+0.100
 groceries+0.098
 sight+0.096
 gift+0.094
Neg logit contributions
 could-0.099
 reached-0.093
 couldn-0.090
 agreed-0.089
 suggested-0.086
Token Tom
Token ID4186
Feature activation+2.73e+00

Pos logit contributions
 attention+0.086
 cotton+0.081
 gift+0.078
 things+0.077
 groceries+0.076
Neg logit contributions
 could-0.060
 reached-0.058
 agreed-0.056
 couldn-0.055
 thanked-0.054
Token found
Token ID1043
Feature activation-1.73e+00

Pos logit contributions
olded+0.115
Well+0.114
oped+0.109
 congratulated+0.102
Alright+0.100
Neg logit contributions
 food-0.122
 things-0.120
 new-0.119
 home-0.119
 dark-0.117
Token a
Token ID257
Feature activation-4.41e+00

Pos logit contributions
 attention+0.061
 cotton+0.056
 rainy+0.053
 job+0.053
 TV+0.053
Neg logit contributions
 could-0.093
 reached-0.090
 thanked-0.088
 agreed-0.088
olded-0.087
Token shiny
Token ID22441
Feature activation-5.48e+00

Pos logit contributions
 attention+0.146
ags+0.132
 choices+0.109
reating+0.106
 groceries+0.105
Neg logit contributions
 could-0.256
 couldn-0.243
 hadn-0.238
'd-0.233
 had-0.230
Token copper
Token ID15317
Feature activation-3.98e+00

Pos logit contributions
 attention+0.209
 unnecessary+0.172
 groceries+0.153
 films+0.149
 temptation+0.147
Neg logit contributions
 could-0.272
 was-0.260
 lived-0.260
 couldn-0.253
 wanted-0.253
Token coin
Token ID10752
Feature activation-2.67e+00

Pos logit contributions
 attention+0.154
 rainy+0.130
 TV+0.129
 cotton+0.125
 unnecessary+0.121
Neg logit contributions
 could-0.227
 couldn-0.194
 was-0.187
 hadn-0.187
 lived-0.186
Token in
Token ID287
Feature activation-1.96e+00

Pos logit contributions
 attention+0.102
 cotton+0.102
 purpose+0.098
 TV+0.098
 dessert+0.095
Neg logit contributions
 could-0.147
 couldn-0.132
 reached-0.128
 hadn-0.127
 suggested-0.126
Token the
Token ID262
Feature activation-1.98e+00

Pos logit contributions
 attention+0.083
 cotton+0.073
 job+0.071
 rainy+0.070
 gift+0.069
Neg logit contributions
 could-0.089
 reached-0.088
 thanked-0.088
 suggested-0.085
 agreed-0.083
Token garden
Token ID11376
Feature activation-7.92e-01

Pos logit contributions
 attention+0.077
 cotton+0.071
 things+0.062
 breath+0.061
 sight+0.059
Neg logit contributions
 could-0.103
 reached-0.098
 thanked-0.096
 lived-0.093
 couldn-0.092
Token.
Token ID13
Feature activation+3.25e-01

Pos logit contributions
 attention+0.071
 cotton+0.067
 gift+0.062
 dessert+0.060
 groceries+0.060
Neg logit contributions
 could-0.100
 agreed-0.095
 suggested-0.095
 reached-0.094
 thanked-0.094
Token They
Token ID1119
Feature activation+1.90e+00

Pos logit contributions
olded+0.011
ave+0.011
 hadn+0.011
 thanked+0.011
 agreed+0.011
Neg logit contributions
 sand-0.017
 attention-0.016
 new-0.016
 things-0.016
 red-0.016
Token saw
Token ID2497
Feature activation-2.20e+00

Pos logit contributions
olded+0.078
lor+0.070
ave+0.069
 apologised+0.067
 From+0.065
Neg logit contributions
 red-0.099
 round-0.095
 sunny-0.093
 shiny-0.093
 sand-0.093
Token a
Token ID257
Feature activation-4.29e+00

Pos logit contributions
 attention+0.080
 job+0.072
 gift+0.069
 cotton+0.069
 TV+0.068
Neg logit contributions
 could-0.119
 reached-0.112
 couldn-0.109
 agreed-0.109
 thanked-0.108
Token big
Token ID1263
Feature activation-4.50e+00

Pos logit contributions
 attention+0.158
ags+0.135
 any+0.118
 groceries+0.116
 choices+0.106
Neg logit contributions
 could-0.252
 couldn-0.235
 said-0.228
 hadn-0.228
 had-0.226
Token shiny
Token ID22441
Feature activation-5.48e+00

Pos logit contributions
 attention+0.202
 unnecessary+0.162
ags+0.160
 groceries+0.156
rier+0.154
Neg logit contributions
 could-0.201
 said-0.194
 couldn-0.191
 lived-0.190
 replied-0.190
Token crystal
Token ID15121
Feature activation-3.21e+00

Pos logit contributions
 attention+0.229
 unnecessary+0.197
 groceries+0.167
ags+0.166
 numb+0.162
Neg logit contributions
 could-0.252
 was-0.247
 replied-0.246
 lived-0.245
 wanted-0.242
Token on
Token ID319
Feature activation-3.32e+00

Pos logit contributions
 attention+0.140
 unnecessary+0.118
 groceries+0.115
 purpose+0.114
 cotton+0.113
Neg logit contributions
 could-0.167
 couldn-0.151
 reached-0.150
 hadn-0.146
 suggested-0.145
Token the
Token ID262
Feature activation-3.39e+00

Pos logit contributions
 attention+0.183
agging+0.156
 gift+0.155
 unnecessary+0.152
 birthday+0.152
Neg logit contributions
 reached-0.119
 could-0.119
 couldn-0.109
 thanked-0.107
 lived-0.104
Token grass
Token ID8701
Feature activation-1.09e+00

Pos logit contributions
 cotton+0.128
 attention+0.124
 dessert+0.120
 sight+0.119
 gift+0.117
Neg logit contributions
 could-0.152
 couldn-0.145
 thanked-0.139
 reached-0.132
 hadn-0.128
Token.
Token ID13
Feature activation+6.58e-01

Pos logit contributions
 attention+0.035
 cotton+0.033
 things+0.031
 new+0.031
 food+0.030
Neg logit contributions
 could-0.062
 reached-0.059
 agreed-0.059
 couldn-0.059
olded-0.059
Token park
Token ID3952
Feature activation-1.06e+00

Pos logit contributions
 attention+0.134
rier+0.115
 cotton+0.113
 gift+0.103
 groceries+0.101
Neg logit contributions
 could-0.122
 reached-0.115
 thanked-0.115
 lived-0.115
 From-0.113
Token.
Token ID13
Feature activation-2.81e-01

Pos logit contributions
 attention+0.032
 cotton+0.031
 things+0.029
 new+0.029
 food+0.028
Neg logit contributions
 could-0.061
 reached-0.059
 agreed-0.059
olded-0.058
 couldn-0.058
Token They
Token ID1119
Feature activation+1.67e+00

Pos logit contributions
 attention+0.015
 cotton+0.015
 new+0.015
 things+0.015
 food+0.015
Neg logit contributions
olded-0.009
 could-0.009
 couldn-0.008
 hadn-0.008
ave-0.008
Token find
Token ID1064
Feature activation-1.79e+00

Pos logit contributions
olded+0.099
ave+0.092
 apologised+0.089
 suggested+0.088
 hadn+0.088
Neg logit contributions
 red-0.054
 round-0.053
 sand-0.051
 shiny-0.050
 sunny-0.050
Token a
Token ID257
Feature activation-3.88e+00

Pos logit contributions
 attention+0.057
 cotton+0.053
 TV+0.051
 job+0.050
 sight+0.049
Neg logit contributions
 could-0.100
 reached-0.096
 agreed-0.096
 thanked-0.094
olded-0.093
Token shiny
Token ID22441
Feature activation-5.47e+00

Pos logit contributions
 attention+0.135
ags+0.109
rier+0.102
ocol+0.101
 any+0.101
Neg logit contributions
 could-0.218
 forgot-0.202
 said-0.200
 couldn-0.200
 suggested-0.199
Token stone
Token ID7815
Feature activation-2.11e+00

Pos logit contributions
 attention+0.216
 unnecessary+0.193
 temptation+0.163
 numb+0.161
 emergencies+0.161
Neg logit contributions
 could-0.255
 lived-0.254
 was-0.253
 replied-0.244
 wanted-0.239
Token on
Token ID319
Feature activation-2.94e+00

Pos logit contributions
 attention+0.080
 cotton+0.074
 groceries+0.070
 TV+0.069
 dessert+0.069
Neg logit contributions
 could-0.115
 couldn-0.104
 agreed-0.104
 reached-0.104
 suggested-0.103
Token the
Token ID262
Feature activation-3.16e+00

Pos logit contributions
 attention+0.153
 gift+0.133
 job+0.132
rier+0.131
agging+0.127
Neg logit contributions
 could-0.109
 reached-0.106
 thanked-0.097
 couldn-0.097
 lived-0.095
Token ground
Token ID2323
Feature activation-1.04e+00

Pos logit contributions
 attention+0.123
 sight+0.118
 gift+0.118
 dessert+0.117
 cotton+0.116
Neg logit contributions
 could-0.138
 thanked-0.125
 couldn-0.125
 reached-0.119
 hadn-0.116
Token.
Token ID13
Feature activation-3.67e-03

Pos logit contributions
 attention+0.031
 cotton+0.029
 things+0.028
 new+0.027
 food+0.026
Neg logit contributions
 could-0.061
 reached-0.058
 agreed-0.058
olded-0.058
 couldn-0.058
Token,
Token ID11
Feature activation-2.33e+00

Pos logit contributions
 attention+0.130
 cotton+0.119
rier+0.117
 rules+0.114
 gift+0.114
Neg logit contributions
 could-0.111
 couldn-0.105
 reached-0.104
 didn-0.096
 lived-0.096
Token Tim
Token ID5045
Feature activation+2.05e-01

Pos logit contributions
 attention+0.151
 gift+0.138
 cotton+0.137
 purpose+0.133
rier+0.133
Neg logit contributions
 could-0.068
 couldn-0.065
 reached-0.060
 didn-0.057
 agreed-0.054
Tokenmy
Token ID1820
Feature activation+1.70e+00

Pos logit contributions
olded+0.009
 could+0.008
 agreed+0.008
 thanked+0.008
 reached+0.008
Neg logit contributions
 attention-0.009
 things-0.009
 new-0.009
 food-0.009
 red-0.009
Token found
Token ID1043
Feature activation-2.37e+00

Pos logit contributions
olded+0.063
Well+0.057
oped+0.056
 hadn+0.053
 agreed+0.053
Neg logit contributions
 things-0.085
 food-0.084
 sand-0.083
 new-0.083
 red-0.083
Token a
Token ID257
Feature activation-4.10e+00

Pos logit contributions
 attention+0.096
 cotton+0.088
 rainy+0.086
 birthday+0.085
 TV+0.084
Neg logit contributions
 could-0.116
 reached-0.112
 thanked-0.106
 couldn-0.106
 suggested-0.105
Token shiny
Token ID22441
Feature activation-5.46e+00

Pos logit contributions
 attention+0.134
ags+0.112
 choices+0.098
 groceries+0.096
 any+0.093
Neg logit contributions
 could-0.239
 couldn-0.231
 hadn-0.225
'd-0.220
 didn-0.215
Token rock
Token ID3881
Feature activation-2.62e+00

Pos logit contributions
 attention+0.214
 unnecessary+0.181
 films+0.164
 groceries+0.164
 emergencies+0.158
Neg logit contributions
 could-0.262
 replied-0.249
 couldn-0.248
 was-0.247
 lived-0.245
Token on
Token ID319
Feature activation-3.07e+00

Pos logit contributions
 attention+0.108
 cotton+0.099
 purpose+0.095
 dessert+0.094
 groceries+0.093
Neg logit contributions
 could-0.135
 couldn-0.126
 reached-0.125
 suggested-0.121
 stood-0.121
Token the
Token ID262
Feature activation-3.36e+00

Pos logit contributions
 attention+0.164
 gift+0.141
 birthday+0.138
agging+0.133
 unnecessary+0.132
Neg logit contributions
 could-0.113
 reached-0.113
 couldn-0.105
 thanked-0.104
 From-0.102
Token ground
Token ID2323
Feature activation-1.29e+00

Pos logit contributions
 attention+0.133
 cotton+0.131
 gift+0.128
 sight+0.124
 things+0.124
Neg logit contributions
 could-0.152
 couldn-0.141
 thanked-0.133
 reached-0.131
'd-0.124
Token.
Token ID13
Feature activation+3.42e-01

Pos logit contributions
 attention+0.042
 cotton+0.040
 things+0.037
 new+0.037
 gift+0.036
Neg logit contributions
 could-0.073
 reached-0.070
 couldn-0.069
 agreed-0.069
olded-0.068
Token day
Token ID1110
Feature activation-2.58e+00

Pos logit contributions
rier+0.102
 attention+0.100
 cotton+0.096
 clues+0.090
 groceries+0.088
Neg logit contributions
 could-0.134
 couldn-0.129
 lived-0.128
 wanted-0.125
 agreed-0.124
Token,
Token ID11
Feature activation-2.30e+00

Pos logit contributions
 attention+0.130
 cotton+0.119
 groceries+0.117
rier+0.117
 sight+0.116
Neg logit contributions
 could-0.107
 couldn-0.101
 reached-0.098
 suggested-0.091
 lived-0.091
Token she
Token ID673
Feature activation+1.54e+00

Pos logit contributions
 attention+0.138
 cotton+0.123
 purpose+0.122
 rules+0.121
 sight+0.120
Neg logit contributions
 could-0.079
 couldn-0.072
 reached-0.069
 didn-0.066
 said-0.064
Token saw
Token ID2497
Feature activation-2.06e+00

Pos logit contributions
olded+0.054
Well+0.050
oped+0.049
 agreed+0.045
ver+0.045
Neg logit contributions
 red-0.079
 things-0.079
 food-0.078
 new-0.078
 dark-0.077
Token a
Token ID257
Feature activation-4.22e+00

Pos logit contributions
 attention+0.082
 job+0.071
 gift+0.069
 cotton+0.069
 things+0.065
Neg logit contributions
 could-0.104
 reached-0.100
 couldn-0.098
 thanked-0.098
 suggested-0.098
Token shiny
Token ID22441
Feature activation-5.46e+00

Pos logit contributions
 attention+0.166
ags+0.145
 groceries+0.116
 any+0.116
 choices+0.115
Neg logit contributions
 could-0.232
 couldn-0.220
 hadn-0.215
 didn-0.210
 said-0.209
Token penny
Token ID26369
Feature activation-3.43e+00

Pos logit contributions
 attention+0.228
 unnecessary+0.196
 groceries+0.164
ags+0.162
 injuries+0.159
Neg logit contributions
 replied-0.243
 could-0.243
 was-0.242
 lived-0.236
 wanted-0.234
Token on
Token ID319
Feature activation-3.06e+00

Pos logit contributions
 attention+0.153
 purpose+0.148
 cotton+0.136
 sight+0.135
 TV+0.133
Neg logit contributions
 could-0.173
 couldn-0.155
 reached-0.151
 hadn-0.150
 stood-0.150
Token the
Token ID262
Feature activation-3.42e+00

Pos logit contributions
 attention+0.161
 gift+0.131
 cotton+0.126
 job+0.125
agging+0.125
Neg logit contributions
 reached-0.113
 could-0.110
 thanked-0.110
 asked-0.105
 couldn-0.105
Token ground
Token ID2323
Feature activation-1.30e+00

Pos logit contributions
 attention+0.145
 things+0.130
 cotton+0.129
 sight+0.126
 unnecessary+0.125
Neg logit contributions
 could-0.143
 couldn-0.136
 thanked-0.136
 reached-0.134
 forgot-0.122
Token and
Token ID290
Feature activation+1.38e+00

Pos logit contributions
 attention+0.043
 cotton+0.040
 things+0.038
 new+0.037
 food+0.036
Neg logit contributions
 could-0.073
 reached-0.071
 couldn-0.070
 agreed-0.069
 suggested-0.069
Token day
Token ID1110
Feature activation-2.50e+00

Pos logit contributions
 attention+0.067
 cotton+0.067
rier+0.061
 groceries+0.061
 things+0.059
Neg logit contributions
 could-0.123
 couldn-0.118
 reached-0.118
 agreed-0.117
 lived-0.114
Token,
Token ID11
Feature activation-2.33e+00

Pos logit contributions
 attention+0.124
 cotton+0.113
 groceries+0.111
 sight+0.111
rier+0.109
Neg logit contributions
 could-0.108
 couldn-0.101
 reached-0.099
 lived-0.091
 didn-0.091
Token she
Token ID673
Feature activation+1.56e+00

Pos logit contributions
 attention+0.145
 rules+0.129
 purpose+0.129
 cotton+0.129
 gift+0.128
Neg logit contributions
 could-0.074
 couldn-0.068
 reached-0.065
 didn-0.060
 thanked-0.058
Token saw
Token ID2497
Feature activation-2.20e+00

Pos logit contributions
olded+0.063
Well+0.054
oped+0.053
 From+0.053
ver+0.052
Neg logit contributions
 sand-0.075
 things-0.075
 red-0.074
 dark-0.074
 new-0.074
Token a
Token ID257
Feature activation-4.08e+00

Pos logit contributions
 attention+0.088
 job+0.078
 gift+0.075
 cotton+0.074
rier+0.073
Neg logit contributions
 could-0.112
 reached-0.108
 couldn-0.105
 thanked-0.104
 understood-0.103
Token shiny
Token ID22441
Feature activation-5.45e+00

Pos logit contributions
 attention+0.155
ags+0.127
 groceries+0.113
 any+0.111
 choices+0.107
Neg logit contributions
 could-0.230
 couldn-0.218
 hadn-0.212
 didn-0.206
 said-0.206
Token pe
Token ID613
Feature activation-2.40e+00

Pos logit contributions
 attention+0.225
 unnecessary+0.186
 groceries+0.166
 any+0.156
 injuries+0.156
Neg logit contributions
 replied-0.249
 could-0.246
 lived-0.245
 was-0.241
 wanted-0.240
Tokenb
Token ID65
Feature activation-1.67e+00

Pos logit contributions
 TV+0.130
 cotton+0.129
 attention+0.127
 dessert+0.123
 television+0.123
Neg logit contributions
 could-0.081
 reached-0.078
 couldn-0.073
olded-0.073
 stood-0.071
Tokenble
Token ID903
Feature activation-3.09e+00

Pos logit contributions
 attention+0.057
 cotton+0.056
 TV+0.052
 new+0.051
 rainy+0.051
Neg logit contributions
 could-0.091
 reached-0.087
 couldn-0.087
olded-0.084
 agreed-0.084
Token on
Token ID319
Feature activation-3.23e+00

Pos logit contributions
 attention+0.128
 purpose+0.113
 cotton+0.111
 groceries+0.109
 sight+0.104
Neg logit contributions
 could-0.162
 reached-0.153
 couldn-0.148
 stood-0.147
 suggested-0.142
Token the
Token ID262
Feature activation-3.23e+00

Pos logit contributions
 attention+0.170
 gift+0.140
rier+0.134
 job+0.134
agging+0.133
Neg logit contributions
 reached-0.121
 thanked-0.114
 could-0.112
 asked-0.108
 stood-0.108
Token much
Token ID881
Feature activation-1.86e+00

Pos logit contributions
 new+0.000
 attention+0.000
 things+0.000
 cotton+0.000
 food+0.000
Neg logit contributions
olded-0.000
 could-0.000
 reached-0.000
ave-0.000
 hadn-0.000
Token better
Token ID1365
Feature activation-1.05e-02

Pos logit contributions
 attention+0.052
 cotton+0.050
 TV+0.048
 things+0.047
 dessert+0.046
Neg logit contributions
 could-0.111
 thanked-0.109
 agreed-0.109
 suggested-0.107
 reached-0.107
Token now
Token ID783
Feature activation+9.03e-01

Pos logit contributions
 attention+0.000
 things+0.000
 new+0.000
 cotton+0.000
 thing+0.000
Neg logit contributions
olded-0.001
ave-0.001
 reached-0.001
 could-0.001
 thanked-0.001
Token.
Token ID13
Feature activation+1.90e+00

Pos logit contributions
olded+0.061
ave+0.061
oped+0.057
 reached+0.057
 could+0.056
Neg logit contributions
 things-0.020
 new-0.019
 attention-0.019
 thing-0.018
 cotton-0.017
Token The
Token ID383
Feature activation-1.17e+00

Pos logit contributions
 forgot+0.094
'd+0.089
ave+0.086
 screamed+0.085
 managed+0.085
Neg logit contributions
 TV-0.079
 new-0.077
 sight-0.072
 things-0.071
 birthday-0.069
Token girl
Token ID2576
Feature activation+5.63e+00

Pos logit contributions
 attention+0.035
 cotton+0.033
 things+0.030
 new+0.030
 dessert+0.029
Neg logit contributions
 could-0.068
 agreed-0.066
 reached-0.065
 thanked-0.065
 couldn-0.065
Token was
Token ID373
Feature activation+6.67e-01

Pos logit contributions
olded+0.206
oped+0.202
 Did+0.189
Get+0.184
ave+0.182
Neg logit contributions
 new-0.253
 red-0.242
 more-0.240
 home-0.238
 cake-0.235
Token very
Token ID845
Feature activation+3.49e-01

Pos logit contributions
ave+0.041
 could+0.040
olded+0.040
 suggested+0.039
oped+0.039
Neg logit contributions
 new-0.017
 attention-0.016
 red-0.015
 home-0.015
 dessert-0.015
Token proud
Token ID6613
Feature activation-5.98e-01

Pos logit contributions
ave+0.022
 could+0.021
 reached+0.021
oped+0.021
olded+0.021
Neg logit contributions
 new-0.009
 attention-0.009
 red-0.007
 round-0.007
 things-0.007
Token of
Token ID286
Feature activation-1.03e+00

Pos logit contributions
 attention+0.020
 cotton+0.018
 things+0.018
 new+0.018
 food+0.017
Neg logit contributions
 could-0.032
olded-0.032
 reached-0.032
 agreed-0.031
ave-0.031
Token what
Token ID644
Feature activation+1.02e+00

Pos logit contributions
 attention+0.036
 cotton+0.035
 things+0.033
 new+0.033
 food+0.032
Neg logit contributions
 could-0.054
 reached-0.052
 agreed-0.052
 couldn-0.052
olded-0.052
Token discovery
Token ID9412
Feature activation-2.71e-01

Pos logit contributions
 attention+0.077
 cotton+0.076
rier+0.069
 groceries+0.068
 TV+0.067
Neg logit contributions
 couldn-0.140
 thanked-0.139
 hadn-0.135
 promised-0.134
 could-0.133
Token.
Token ID13
Feature activation+3.20e+00

Pos logit contributions
 attention+0.007
 things+0.007
 cotton+0.007
 new+0.007
 food+0.006
Neg logit contributions
 could-0.016
olded-0.016
 reached-0.016
ave-0.016
 agreed-0.016
Token
Token ID220
Feature activation+4.04e+00

Pos logit contributions
ave+0.175
 forgot+0.172
olded+0.160
 rang+0.150
owed+0.148
Neg logit contributions
 sight-0.108
 new-0.107
 things-0.099
 home-0.096
 attention-0.095
Token
Token ID198
Feature activation+2.68e+00

Pos logit contributions
 forgot+0.223
 stared+0.208
ave+0.205
 wondered+0.203
 spoke+0.202
Neg logit contributions
 new-0.143
 attention-0.126
 home-0.120
 things-0.119
 lunch-0.113
Token
Token ID198
Feature activation+1.98e+00

Pos logit contributions
ave+0.140
 forgot+0.140
olded+0.139
 wouldn+0.129
 stared+0.129
Neg logit contributions
 new-0.096
 long-0.084
 big-0.083
 view-0.081
 things-0.080
TokenJoe
Token ID19585
Feature activation+5.53e+00

Pos logit contributions
olded+0.097
 forgot+0.093
ave+0.093
 missed+0.090
 stared+0.086
Neg logit contributions
 sunny-0.069
 view-0.068
 new-0.067
 sand-0.066
 things-0.066
Token smiled
Token ID13541
Feature activation-8.49e-02

Pos logit contributions
 Did+0.204
olded+0.199
 blamed+0.175
oped+0.169
ver+0.165
Neg logit contributions
 home-0.239
 red-0.239
 new-0.239
 things-0.228
 big-0.227
Token and
Token ID290
Feature activation+3.48e+00

Pos logit contributions
 attention+0.003
 new+0.003
 cotton+0.003
 things+0.003
 red+0.003
Neg logit contributions
 could-0.004
olded-0.004
 reached-0.004
 hadn-0.004
ave-0.004
Token was
Token ID373
Feature activation+1.63e+00

Pos logit contributions
os+0.106
oped+0.104
ave+0.102
att+0.100
olded+0.099
Neg logit contributions
 new-0.198
 sand-0.185
 shiny-0.184
 long-0.184
 round-0.181
Token really
Token ID1107
Feature activation+1.47e+00

Pos logit contributions
ave+0.105
oped+0.101
olded+0.101
 could+0.098
 forgot+0.095
Neg logit contributions
 new-0.039
 home-0.036
 attention-0.035
 sunny-0.035
 red-0.034
Token happy
Token ID3772
Feature activation-4.62e-01

Pos logit contributions
ave+0.088
oped+0.088
olded+0.086
 reached+0.085
 could+0.084
Neg logit contributions
 new-0.042
 attention-0.038
 good-0.036
 nice-0.036
 shiny-0.035
Token it
Token ID340
Feature activation+1.30e+00

Pos logit contributions
ave+0.042
olded+0.042
 reached+0.041
 agreed+0.041
 could+0.041
Neg logit contributions
 things-0.019
 attention-0.018
 new-0.018
 cotton-0.018
 food-0.016
Token.
Token ID13
Feature activation+3.29e+00

Pos logit contributions
olded+0.057
 From+0.056
 thanked+0.056
ave+0.056
 agreed+0.054
Neg logit contributions
 attention-0.056
 new-0.056
 things-0.055
 thing-0.052
 cotton-0.052
Token
Token ID198
Feature activation+2.75e+00

Pos logit contributions
 forgot+0.161
ave+0.158
'd+0.154
 managed+0.147
 thanked+0.146
Neg logit contributions
 new-0.124
 TV-0.123
 lunch-0.118
 long-0.116
 sight-0.115
Token
Token ID198
Feature activation+2.05e+00

Pos logit contributions
ave+0.143
 forgot+0.142
olded+0.134
 stared+0.133
 agreed+0.132
Neg logit contributions
 new-0.102
 long-0.092
 things-0.088
 big-0.086
 home-0.083
TokenCh
Token ID1925
Feature activation+1.47e+00

Pos logit contributions
 couldn+0.098
 thanked+0.098
 could+0.097
ave+0.095
 wouldn+0.094
Neg logit contributions
 sunny-0.070
 things-0.069
 long-0.068
 purpose-0.066
 rainy-0.066
Tokenloe
Token ID24617
Feature activation+5.15e+00

Pos logit contributions
 could+0.076
 thanked+0.073
 suggested+0.072
 wondered+0.071
 forgot+0.071
Neg logit contributions
 attention-0.052
 lunch-0.050
 purpose-0.050
 TV-0.049
 dessert-0.047
Token ran
Token ID4966
Feature activation-1.71e+00

Pos logit contributions
 Did+0.169
Well+0.164
olded+0.156
 From+0.155
Even+0.154
Neg logit contributions
 things-0.213
 red-0.207
 more-0.202
 long-0.201
 black-0.201
Token to
Token ID284
Feature activation-6.09e-01

Pos logit contributions
 cotton+0.059
 attention+0.057
 things+0.054
 dessert+0.054
 gift+0.054
Neg logit contributions
 could-0.089
 reached-0.089
 agreed-0.087
olded-0.085
 couldn-0.085
Token her
Token ID607
Feature activation-1.03e+00

Pos logit contributions
 attention+0.019
 cotton+0.017
 new+0.017
 things+0.017
 food+0.016
Neg logit contributions
 could-0.035
olded-0.035
ave-0.034
 agreed-0.034
 hadn-0.034
Token mom
Token ID1995
Feature activation-9.84e-02

Pos logit contributions
 attention+0.031
 cotton+0.029
 things+0.028
 new+0.027
 food+0.026
Neg logit contributions
 could-0.060
 reached-0.058
 couldn-0.057
olded-0.057
 agreed-0.057
Token and
Token ID290
Feature activation+2.40e+00

Pos logit contributions
 attention+0.004
 new+0.003
 things+0.003
 cotton+0.003
 TV+0.003
Neg logit contributions
 could-0.005
ave-0.005
 hadn-0.005
 reached-0.005
olded-0.005
Token y
Token ID331
Feature activation-3.75e+00

Pos logit contributions
 attention+0.020
 cotton+0.018
 things+0.017
 new+0.016
 food+0.016
Neg logit contributions
 could-0.069
 reached-0.067
 agreed-0.067
olded-0.067
 couldn-0.067
Tokenummy
Token ID13513
Feature activation-2.17e+00

Pos logit contributions
 attention+0.147
 cotton+0.139
 things+0.133
 new+0.131
 fake+0.128
Neg logit contributions
 reached-0.159
 agreed-0.158
 couldn-0.155
 could-0.155
 thanked-0.153
Token w
Token ID266
Feature activation-8.31e-01

Pos logit contributions
 attention+0.054
 cotton+0.046
 groceries+0.044
 things+0.043
 rainy+0.043
Neg logit contributions
 could-0.130
 thanked-0.129
 agreed-0.129
 couldn-0.126
 reached-0.126
Tokenaffle
Token ID30697
Feature activation+1.44e-01

Pos logit contributions
 attention+0.031
 cotton+0.029
 things+0.028
 new+0.028
 food+0.027
Neg logit contributions
 could-0.042
 reached-0.041
olded-0.041
 agreed-0.041
 couldn-0.040
Token.
Token ID13
Feature activation+3.13e+00

Pos logit contributions
 could+0.009
 hadn+0.008
olded+0.008
 reached+0.008
 thanked+0.008
Neg logit contributions
 attention-0.004
 cotton-0.004
 things-0.004
 food-0.004
 new-0.004
Token Mia
Token ID32189
Feature activation+4.60e+00

Pos logit contributions
'd+0.188
 hadn+0.183
ave+0.181
 forgot+0.180
 rode+0.179
Neg logit contributions
 lunch-0.082
 new-0.080
 sun-0.075
 TV-0.074
 tea-0.070
Token was
Token ID373
Feature activation-3.32e-02

Pos logit contributions
olded+0.188
oped+0.179
Well+0.170
anced+0.161
 Did+0.157
Neg logit contributions
 cake-0.192
 red-0.186
 big-0.176
 soup-0.175
 new-0.175
Token happy
Token ID3772
Feature activation-7.48e-01

Pos logit contributions
 attention+0.001
 new+0.001
 cotton+0.001
 dessert+0.001
 things+0.001
Neg logit contributions
 could-0.002
ave-0.002
olded-0.002
oped-0.002
 suggested-0.002
Token she
Token ID673
Feature activation+2.72e+00

Pos logit contributions
 attention+0.026
 cotton+0.025
 things+0.024
 new+0.024
 food+0.023
Neg logit contributions
 could-0.040
olded-0.039
 reached-0.038
 agreed-0.038
 hadn-0.038
Token could
Token ID714
Feature activation-5.56e-03

Pos logit contributions
olded+0.081
 From+0.075
 apologised+0.069
 Even+0.069
oped+0.069
Neg logit contributions
 job-0.141
 food-0.137
 thing-0.136
 good-0.135
 outside-0.132
Token help
Token ID1037
Feature activation-1.63e+00

Pos logit contributions
 attention+0.000
 cotton+0.000
 new+0.000
 lunch+0.000
 dessert+0.000
Neg logit contributions
olded-0.000
ave-0.000
 suggested-0.000
 hadn-0.000
 could-0.000
Token high
Token ID1029
Feature activation-1.16e+00

Pos logit contributions
ave+0.106
olded+0.103
oped+0.101
os+0.097
 apologized+0.094
Neg logit contributions
 new-0.052
 long-0.049
 attention-0.049
 things-0.047
 sunny-0.047
Token off
Token ID572
Feature activation+8.44e-02

Pos logit contributions
 attention+0.041
 cotton+0.039
 things+0.038
 new+0.037
 food+0.037
Neg logit contributions
 could-0.061
 reached-0.058
 agreed-0.058
 couldn-0.058
 thanked-0.058
Token the
Token ID262
Feature activation-1.59e+00

Pos logit contributions
olded+0.005
ave+0.005
 could+0.005
 agreed+0.005
 suggested+0.005
Neg logit contributions
 attention-0.003
 cotton-0.002
 things-0.002
 new-0.002
 sand-0.002
Token ground
Token ID2323
Feature activation+7.95e-01

Pos logit contributions
 attention+0.042
 cotton+0.038
 things+0.037
 new+0.035
 gift+0.035
Neg logit contributions
 could-0.096
 couldn-0.092
 thanked-0.092
 reached-0.092
 hadn-0.091
Token.
Token ID13
Feature activation+4.35e+00

Pos logit contributions
olded+0.047
 could+0.046
ave+0.046
 hadn+0.045
 agreed+0.045
Neg logit contributions
 attention-0.023
 new-0.021
 cotton-0.021
 things-0.021
 sand-0.020
Token Dave
Token ID9935
Feature activation+5.26e+00

Pos logit contributions
 forgot+0.208
ave+0.206
 managed+0.193
 understood+0.191
'd+0.189
Neg logit contributions
 TV-0.185
 lunch-0.171
 new-0.164
 long-0.163
 bed-0.162
Token wanted
Token ID2227
Feature activation-6.74e-01

Pos logit contributions
olded+0.218
Well+0.183
 Did+0.183
ave+0.182
 Sadly+0.170
Neg logit contributions
 red-0.211
 big-0.208
 new-0.190
 black-0.189
 shiny-0.189
Token so
Token ID523
Feature activation-1.56e+00

Pos logit contributions
 attention+0.020
 cotton+0.019
 things+0.018
 new+0.018
 food+0.018
Neg logit contributions
 could-0.040
olded-0.039
 agreed-0.038
 hadn-0.038
ave-0.038
Token badly
Token ID11234
Feature activation-1.66e+00

Pos logit contributions
 attention+0.050
 cotton+0.048
 gift+0.046
 TV+0.046
 dessert+0.044
Neg logit contributions
 could-0.086
 reached-0.083
 agreed-0.083
 couldn-0.083
 suggested-0.082
Token to
Token ID284
Feature activation+6.59e-02

Pos logit contributions
 attention+0.063
 cotton+0.062
 TV+0.059
 gift+0.059
 dessert+0.057
Neg logit contributions
 could-0.080
 reached-0.078
 agreed-0.077
 couldn-0.076
 suggested-0.076
Token rest
Token ID1334
Feature activation-5.14e-01

Pos logit contributions
olded+0.004
ave+0.004
 could+0.004
 hadn+0.003
 suggested+0.003
Neg logit contributions
 attention-0.002
 cotton-0.002
 sand-0.002
 things-0.002
 new-0.002
Token
Token ID220
Feature activation+1.91e+00

Pos logit contributions
'd+0.028
ave+0.028
olded+0.027
 managed+0.027
 hadn+0.027
Neg logit contributions
 new-0.037
 things-0.036
 dessert-0.035
 TV-0.035
 lunch-0.035
Token
Token ID198
Feature activation+9.13e-01

Pos logit contributions
 stared+0.076
ave+0.075
 forgot+0.075
 agreed+0.074
 hadn+0.073
Neg logit contributions
 new-0.091
 attention-0.084
 things-0.084
 TV-0.080
 lunch-0.079
Token
Token ID198
Feature activation+4.73e-01

Pos logit contributions
olded+0.033
 hadn+0.032
ave+0.032
 agreed+0.032
 forgot+0.032
Neg logit contributions
 new-0.047
 things-0.043
 attention-0.043
 red-0.042
 cotton-0.041
TokenFinally
Token ID11158
Feature activation-1.34e+00

Pos logit contributions
 could+0.020
 agreed+0.020
olded+0.019
 hadn+0.019
 thanked+0.019
Neg logit contributions
 new-0.020
 thing-0.020
 things-0.019
 red-0.019
 sand-0.019
Token,
Token ID11
Feature activation-1.74e+00

Pos logit contributions
 attention+0.051
 cotton+0.047
 new+0.045
 things+0.044
 food+0.043
Neg logit contributions
 could-0.068
 reached-0.066
 couldn-0.065
 agreed-0.065
 thanked-0.064
Token they
Token ID484
Feature activation+1.40e+00

Pos logit contributions
 attention+0.093
 cotton+0.086
 gift+0.082
 new+0.082
 things+0.082
Neg logit contributions
 could-0.064
 couldn-0.062
 reached-0.061
 agreed-0.058
 thanked-0.058
Token arrived
Token ID5284
Feature activation-7.19e-01

Pos logit contributions
 From+0.048
olded+0.047
 hadn+0.041
ushed+0.040
etta+0.040
Neg logit contributions
 lunch-0.069
 home-0.069
 dark-0.067
 dessert-0.067
 sand-0.066
Token at
Token ID379
Feature activation-2.14e+00

Pos logit contributions
 attention+0.023
 cotton+0.022
 things+0.021
 new+0.021
 food+0.020
Neg logit contributions
 could-0.040
 reached-0.039
 agreed-0.038
olded-0.038
 hadn-0.038
Token the
Token ID262
Feature activation-1.96e+00

Pos logit contributions
 attention+0.082
 cotton+0.076
 gift+0.072
 things+0.071
 thing+0.071
Neg logit contributions
 thanked-0.101
 could-0.099
 reached-0.097
 couldn-0.096
 agreed-0.094
Token church
Token ID4928
Feature activation+2.11e-01

Pos logit contributions
 attention+0.066
 cotton+0.057
 things+0.051
 gift+0.051
 food+0.050
Neg logit contributions
 could-0.108
 reached-0.105
 thanked-0.103
 couldn-0.101
 agreed-0.099
Token.
Token ID13
Feature activation+9.87e-01

Pos logit contributions
 could+0.012
ave+0.012
olded+0.012
 hadn+0.012
 reached+0.012
Neg logit contributions
 attention-0.006
 new-0.006
 things-0.006
 food-0.005
 cotton-0.005
Token and
Token ID290
Feature activation+5.37e-01

Pos logit contributions
olded+0.016
ave+0.015
 could+0.015
 suggested+0.015
 agreed+0.015
Neg logit contributions
 attention-0.010
 cotton-0.010
 things-0.010
 new-0.010
 food-0.009
Token textures
Token ID20028
Feature activation+2.96e-02

Pos logit contributions
olded+0.032
ave+0.031
oped+0.031
 could+0.031
 agreed+0.031
Neg logit contributions
 attention-0.016
 things-0.015
 cotton-0.015
 new-0.015
 food-0.014
Token she
Token ID673
Feature activation+2.87e+00

Pos logit contributions
olded+0.002
 could+0.002
 reached+0.002
ave+0.002
 thanked+0.002
Neg logit contributions
 attention-0.001
 cotton-0.001
 things-0.001
 new-0.001
 food-0.001
Token saw
Token ID2497
Feature activation+1.14e+00

Pos logit contributions
olded+0.075
oped+0.066
 From+0.064
 apologised+0.059
 Sadly+0.058
Neg logit contributions
 new-0.177
 magic-0.173
 things-0.173
 shiny-0.172
 red-0.168
Token.
Token ID13
Feature activation+2.60e+00

Pos logit contributions
olded+0.068
 reached+0.065
 agreed+0.065
 suggested+0.065
ave+0.065
Neg logit contributions
 things-0.033
 attention-0.033
 new-0.032
 cotton-0.030
 food-0.030
Token She
Token ID1375
Feature activation+4.30e+00

Pos logit contributions
 forgot+0.129
'd+0.115
 screamed+0.113
 managed+0.112
 hadn+0.112
Neg logit contributions
 new-0.108
 TV-0.103
 things-0.097
 sun-0.097
 sight-0.096
Token wrote
Token ID2630
Feature activation+6.43e-01

Pos logit contributions
olded+0.166
oped+0.154
etsy+0.141
oke+0.135
ave+0.135
Neg logit contributions
 shiny-0.200
 new-0.197
 red-0.196
 magic-0.195
 more-0.193
Token how
Token ID703
Feature activation+8.55e-01

Pos logit contributions
olded+0.038
 hadn+0.037
 could+0.037
ave+0.037
 reached+0.037
Neg logit contributions
 attention-0.019
 new-0.018
 things-0.018
 cotton-0.017
 food-0.016
Token the
Token ID262
Feature activation-6.15e-01

Pos logit contributions
olded+0.050
 hadn+0.050
oped+0.049
 reached+0.049
ave+0.049
Neg logit contributions
 new-0.025
 things-0.025
 attention-0.024
 red-0.021
 thing-0.021
Token bright
Token ID6016
Feature activation-9.80e-02

Pos logit contributions
 attention+0.015
 things+0.013
 cotton+0.013
 new+0.013
 food+0.012
Neg logit contributions
 could-0.040
olded-0.039
 reached-0.039
 agreed-0.038
 suggested-0.038
Token sunshine
Token ID34488
Feature activation+2.42e+00

Pos logit contributions
 attention+0.002
 things+0.002
 cotton+0.002
 new+0.002
 thing+0.002
Neg logit contributions
olded-0.007
 could-0.007
ave-0.007
 hadn-0.007
 suggested-0.007
Token't
Token ID470
Feature activation-2.58e-01

Pos logit contributions
 attention+0.009
 things+0.009
 cotton+0.009
 new+0.009
 TV+0.009
Neg logit contributions
 could-0.008
ave-0.008
olded-0.008
 agreed-0.008
 thanked-0.008
Token touch
Token ID3638
Feature activation-2.18e+00

Pos logit contributions
 attention+0.009
 things+0.009
 cotton+0.009
 new+0.009
 food+0.008
Neg logit contributions
 could-0.013
olded-0.013
 hadn-0.013
 agreed-0.013
 thanked-0.013
Token it
Token ID340
Feature activation-5.17e-01

Pos logit contributions
 attention+0.086
 cotton+0.081
 gift+0.078
 rainy+0.077
 job+0.076
Neg logit contributions
 could-0.105
 reached-0.097
 couldn-0.096
 thanked-0.095
 agreed-0.095
Token".
Token ID1911
Feature activation+8.69e-01

Pos logit contributions
 attention+0.016
 cotton+0.016
 things+0.015
 new+0.015
 food+0.014
Neg logit contributions
 could-0.029
olded-0.028
 reached-0.028
 thanked-0.028
 hadn-0.028
Token His
Token ID2399
Feature activation-2.01e+00

Pos logit contributions
'd+0.035
 hadn+0.034
 forgot+0.033
 managed+0.031
 could+0.031
Neg logit contributions
 dessert-0.042
 TV-0.041
 new-0.041
 things-0.040
 lunch-0.040
Token mom
Token ID1995
Feature activation+3.48e+00

Pos logit contributions
 cotton+0.084
 attention+0.080
 rainy+0.075
 groceries+0.075
 food+0.072
Neg logit contributions
 couldn-0.093
 could-0.091
 thanked-0.088
 hadn-0.085
 understood-0.085
Token was
Token ID373
Feature activation-2.74e-01

Pos logit contributions
 Did+0.111
 Didn+0.095
 From+0.095
olded+0.094
 Meanwhile+0.093
Neg logit contributions
 more-0.166
 new-0.161
 red-0.156
 cake-0.156
 thing-0.148
Token so
Token ID523
Feature activation+4.66e-01

Pos logit contributions
 attention+0.009
 new+0.009
 cotton+0.009
 things+0.009
 dessert+0.008
Neg logit contributions
 could-0.015
ave-0.014
 couldn-0.014
olded-0.014
 hadn-0.014
Token proud
Token ID6613
Feature activation-3.51e-01

Pos logit contributions
 could+0.028
ave+0.027
 hadn+0.027
 couldn+0.027
oped+0.027
Neg logit contributions
 new-0.013
 attention-0.012
 nice-0.011
 red-0.011
 good-0.011
Token that
Token ID326
Feature activation+3.86e-01

Pos logit contributions
 attention+0.011
 things+0.011
 new+0.011
 cotton+0.011
 food+0.010
Neg logit contributions
 could-0.019
olded-0.019
ave-0.019
 hadn-0.019
 reached-0.019
Token he
Token ID339
Feature activation+3.88e+00

Pos logit contributions
ave+0.021
olded+0.020
 reached+0.018
 agreed+0.018
oped+0.018
Neg logit contributions
 things-0.015
 cotton-0.014
 new-0.014
 attention-0.014
 thing-0.013
Token she
Token ID673
Feature activation+3.08e+00

Pos logit contributions
olded+0.019
ave+0.019
 could+0.018
 reached+0.018
 suggested+0.018
Neg logit contributions
 new-0.013
 cotton-0.013
 attention-0.013
 things-0.013
 food-0.012
Token was
Token ID373
Feature activation+7.86e-01

Pos logit contributions
olded+0.121
apa+0.099
via+0.096
 From+0.093
oped+0.093
Neg logit contributions
 new-0.155
 real-0.145
 things-0.144
 thing-0.144
 shiny-0.138
Token flying
Token ID7348
Feature activation-8.02e-02

Pos logit contributions
 could+0.047
ave+0.046
olded+0.046
 couldn+0.045
 hadn+0.045
Neg logit contributions
 new-0.021
 attention-0.021
 red-0.019
 cotton-0.019
 real-0.019
Token!
Token ID0
Feature activation+2.65e+00

Pos logit contributions
 attention+0.002
 cotton+0.002
 new+0.002
 things+0.002
 thing+0.002
Neg logit contributions
olded-0.005
 could-0.005
 reached-0.004
 hadn-0.004
 couldn-0.004
Token
Token ID220
Feature activation+3.06e+00

Pos logit contributions
 forgot+0.129
ave+0.119
'd+0.119
 stared+0.119
 managed+0.118
Neg logit contributions
 new-0.106
 sight-0.094
 attention-0.093
 TV-0.093
 home-0.092
Token
Token ID198
Feature activation+2.31e+00

Pos logit contributions
 forgot+0.152
 stared+0.148
 wondered+0.143
 agreed+0.142
 thanked+0.139
Neg logit contributions
 new-0.128
 attention-0.116
 things-0.110
 home-0.108
 TV-0.105
Token
Token ID198
Feature activation+1.59e+00

Pos logit contributions
 forgot+0.110
olded+0.109
 stared+0.108
ave+0.107
 refused+0.105
Neg logit contributions
 new-0.095
 long-0.084
 big-0.079
 things-0.078
 home-0.078
TokenAfter
Token ID3260
Feature activation-8.12e-01

Pos logit contributions
 could+0.072
olded+0.072
 stared+0.072
 couldn+0.071
 forgot+0.071
Neg logit contributions
 new-0.058
 things-0.055
 sunny-0.053
 cotton-0.052
 TV-0.052
Token a
Token ID257
Feature activation-2.23e+00

Pos logit contributions
 attention+0.025
 cotton+0.024
 things+0.023
 new+0.023
 food+0.022
Neg logit contributions
 could-0.046
olded-0.044
 reached-0.044
 agreed-0.044
 couldn-0.044
Token while
Token ID981
Feature activation-1.13e+00

Pos logit contributions
 attention+0.071
 cotton+0.063
 things+0.063
 groceries+0.062
 sand+0.060
Neg logit contributions
 could-0.127
 couldn-0.123
 suggested-0.122
 reached-0.122
 thanked-0.121
Token,
Token ID11
Feature activation-5.23e-01

Pos logit contributions
 attention+0.043
 cotton+0.041
 things+0.039
 new+0.039
 food+0.038
Neg logit contributions
 could-0.057
 reached-0.055
 agreed-0.054
 couldn-0.054
 thanked-0.054
Token wanted
Token ID2227
Feature activation-7.17e-01

Pos logit contributions
liv+0.179
Well+0.175
grand+0.173
oped+0.172
ave+0.172
Neg logit contributions
 new-0.203
 more-0.198
 big-0.196
 red-0.194
 very-0.191
Token to
Token ID284
Feature activation+6.30e-01

Pos logit contributions
 attention+0.022
 cotton+0.020
 things+0.020
 new+0.019
 food+0.019
Neg logit contributions
 could-0.042
olded-0.041
 agreed-0.040
 reached-0.040
 couldn-0.040
Token help
Token ID1037
Feature activation-9.81e-01

Pos logit contributions
olded+0.039
ave+0.039
oped+0.038
 could+0.038
 hadn+0.037
Neg logit contributions
 attention-0.017
 new-0.016
 food-0.015
 things-0.015
 gift-0.015
Token,
Token ID11
Feature activation+1.41e+00

Pos logit contributions
 attention+0.034
 cotton+0.032
 things+0.031
 new+0.031
 food+0.030
Neg logit contributions
 could-0.052
 reached-0.050
olded-0.050
 agreed-0.050
 couldn-0.050
Token so
Token ID523
Feature activation+6.39e-01

Pos logit contributions
ave+0.097
oped+0.096
 lived+0.092
olded+0.091
 hadn+0.088
Neg logit contributions
 attention-0.032
 new-0.030
 things-0.029
 food-0.025
 red-0.025
Token she
Token ID673
Feature activation+3.49e+00

Pos logit contributions
ave+0.034
olded+0.032
'd+0.032
 could+0.032
 hadn+0.031
Neg logit contributions
 attention-0.024
 new-0.024
 things-0.024
 long-0.022
 cotton-0.021
Token asked
Token ID1965
Feature activation+1.55e-01

Pos logit contributions
os+0.117
oped+0.115
 Did+0.107
 Doing+0.106
Because+0.104
Neg logit contributions
 food-0.159
 bigger-0.157
 more-0.156
 magic-0.156
 new-0.154
Token the
Token ID262
Feature activation-9.08e-01

Pos logit contributions
ave+0.009
olded+0.009
 could+0.009
 hadn+0.009
 reached+0.009
Neg logit contributions
 attention-0.004
 things-0.004
 new-0.004
 food-0.004
 cotton-0.004
Token rabbit
Token ID22746
Feature activation+6.60e-01

Pos logit contributions
 attention+0.033
 cotton+0.031
 things+0.030
 new+0.030
 food+0.029
Neg logit contributions
 could-0.047
 reached-0.046
olded-0.045
 agreed-0.045
 couldn-0.045
Token to
Token ID284
Feature activation+2.30e-01

Pos logit contributions
 hadn+0.040
 reached+0.040
ave+0.039
 thanked+0.039
olded+0.039
Neg logit contributions
 attention-0.019
 new-0.018
 things-0.017
 food-0.016
 thing-0.015
Token follow
Token ID1061
Feature activation-9.45e-01

Pos logit contributions
 could+0.014
ave+0.014
olded+0.014
 hadn+0.014
 couldn+0.014
Neg logit contributions
 attention-0.006
 new-0.005
 food-0.005
 things-0.005
 gift-0.005
Token and
Token ID290
Feature activation+6.03e-01

Pos logit contributions
 attention+0.108
 cotton+0.101
 new+0.098
 gift+0.097
 rainy+0.097
Neg logit contributions
 could-0.081
 reached-0.072
 agreed-0.072
 couldn-0.072
 stood-0.070
Token has
Token ID468
Feature activation-5.40e-01

Pos logit contributions
ave+0.046
olded+0.045
 agreed+0.043
 suggested+0.042
 reached+0.042
Neg logit contributions
 red-0.009
 new-0.009
 cotton-0.009
 things-0.008
 round-0.008
Token horns
Token ID26771
Feature activation-3.43e-01

Pos logit contributions
 attention+0.012
 cotton+0.012
 things+0.011
 new+0.011
 food+0.010
Neg logit contributions
 could-0.035
olded-0.034
 agreed-0.034
 couldn-0.034
 hadn-0.034
Token.
Token ID13
Feature activation+7.91e-01

Pos logit contributions
 attention+0.010
 cotton+0.010
 things+0.010
 new+0.009
 food+0.009
Neg logit contributions
 could-0.019
 agreed-0.019
olded-0.019
 thanked-0.019
 hadn-0.019
Token He
Token ID679
Feature activation+3.40e+00

Pos logit contributions
 hadn+0.033
'd+0.032
 forgot+0.032
oped+0.031
 couldn+0.031
Neg logit contributions
 TV-0.039
 new-0.038
 red-0.036
 things-0.036
 food-0.036
Token looks
Token ID3073
Feature activation-1.48e+00

Pos logit contributions
ave+0.172
olded+0.168
oped+0.149
 agreed+0.147
 recommended+0.146
Neg logit contributions
 red-0.145
 things-0.140
 new-0.137
 shiny-0.133
 TV-0.132
Token strong
Token ID1913
Feature activation-9.58e-01

Pos logit contributions
 attention+0.050
 cotton+0.049
 things+0.046
 purpose+0.045
 dessert+0.045
Neg logit contributions
 could-0.080
 agreed-0.077
 reached-0.077
 thanked-0.076
 couldn-0.076
Token and
Token ID290
Feature activation+6.76e-01

Pos logit contributions
 attention+0.037
 cotton+0.035
 things+0.034
 new+0.034
 food+0.033
Neg logit contributions
 could-0.047
 reached-0.045
olded-0.045
 agreed-0.045
 couldn-0.045
Token attractive
Token ID10966
Feature activation-1.32e+00

Pos logit contributions
ave+0.046
olded+0.046
oped+0.044
 suggested+0.043
 couldn+0.043
Neg logit contributions
 attention-0.015
 new-0.015
 things-0.014
 red-0.014
 round-0.013
Token.
Token ID13
Feature activation+1.03e+00

Pos logit contributions
 attention+0.041
 cotton+0.039
 new+0.037
 things+0.037
 food+0.035
Neg logit contributions
 could-0.075
 agreed-0.072
 couldn-0.072
 reached-0.071
olded-0.071
Token Anna
Token ID11735
Feature activation+3.37e+00

Pos logit contributions
 hadn+0.046
'd+0.045
oped+0.045
 forgot+0.044
 couldn+0.044
Neg logit contributions
 new-0.048
 TV-0.046
 things-0.045
 red-0.044
 food-0.043
Token stopped
Token ID5025
Feature activation-1.32e+00

Pos logit contributions
 From+0.094
olded+0.090
ave+0.089
Well+0.082
 hadn+0.081
Neg logit contributions
 attention-0.104
 sand-0.102
 things-0.101
 new-0.100
 red-0.099
Token and
Token ID290
Feature activation+2.70e+00

Pos logit contributions
 attention+0.049
 cotton+0.047
 things+0.045
 food+0.044
 dessert+0.044
Neg logit contributions
 could-0.066
 reached-0.064
 couldn-0.064
 thanked-0.063
 agreed-0.063
Token looked
Token ID3114
Feature activation-7.23e-01

Pos logit contributions
ave+0.093
olded+0.090
 From+0.080
oped+0.075
att+0.074
Neg logit contributions
 sand-0.141
 new-0.135
 more-0.130
 attention-0.129
 dark-0.129
Token up
Token ID510
Feature activation-1.81e+00

Pos logit contributions
 attention+0.022
 cotton+0.020
 new+0.019
 things+0.019
 food+0.018
Neg logit contributions
 could-0.042
olded-0.041
 reached-0.041
 couldn-0.040
 agreed-0.040
Token and
Token ID290
Feature activation+1.98e+00

Pos logit contributions
 attention+0.068
 cotton+0.066
 food+0.063
 dessert+0.063
 things+0.063
Neg logit contributions
 could-0.090
 reached-0.087
 thanked-0.086
 agreed-0.086
 couldn-0.085
Token smiled
Token ID13541
Feature activation-1.08e+00

Pos logit contributions
ave+0.074
olded+0.069
 From+0.063
oped+0.058
 hadn+0.056
Neg logit contributions
 new-0.105
 attention-0.103
 round-0.099
 sand-0.099
 dark-0.098
Token,
Token ID11
Feature activation+1.79e+00

Pos logit contributions
 attention+0.033
 cotton+0.031
 things+0.030
 new+0.029
 food+0.029
Neg logit contributions
 could-0.062
 agreed-0.059
 reached-0.059
 couldn-0.059
olded-0.059
Token because
Token ID780
Feature activation+2.78e-01

Pos logit contributions
ave+0.107
olded+0.100
oped+0.096
os+0.096
 could+0.094
Neg logit contributions
 new-0.062
 attention-0.056
 red-0.055
 round-0.055
 sight-0.054
Token it
Token ID340
Feature activation+3.78e+00

Pos logit contributions
ave+0.014
olded+0.013
 reached+0.013
 could+0.013
 agreed+0.013
Neg logit contributions
 things-0.011
 new-0.011
 attention-0.011
 cotton-0.011
 thing-0.010
Token was
Token ID373
Feature activation-2.15e-01

Pos logit contributions
olded+0.168
 From+0.143
ave+0.141
 Did+0.132
oped+0.127
Neg logit contributions
 shiny-0.182
 big-0.173
 new-0.172
 magic-0.165
 beautiful-0.165
Token beautiful
Token ID4950
Feature activation-1.33e+00

Pos logit contributions
 attention+0.006
 new+0.006
 cotton+0.006
 sight+0.006
 things+0.006
Neg logit contributions
olded-0.013
ave-0.013
 could-0.013
 agreed-0.012
 suggested-0.012
Token gems
Token ID23685
Feature activation+8.55e-01

Pos logit contributions
 attention+0.055
 cotton+0.049
 TV+0.046
 dessert+0.045
 rainy+0.043
Neg logit contributions
 could-0.130
 thanked-0.121
 couldn-0.120
 hadn-0.120
 agreed-0.119
Token are
Token ID389
Feature activation-2.07e+00

Pos logit contributions
olded+0.038
ave+0.037
 hadn+0.035
 couldn+0.034
 reached+0.034
Neg logit contributions
 things-0.039
 attention-0.038
 new-0.037
 gift-0.036
 red-0.036
Token shiny
Token ID22441
Feature activation-3.15e+00

Pos logit contributions
 attention+0.043
 cotton+0.039
 dessert+0.036
 groceries+0.036
 TV+0.036
Neg logit contributions
 could-0.142
 couldn-0.135
 thanked-0.134
 reached-0.133
 hadn-0.133
Token and
Token ID290
Feature activation-9.02e-01

Pos logit contributions
 attention+0.174
 rainy+0.161
 purpose+0.159
 job+0.157
 sight+0.155
Neg logit contributions
 could-0.111
 agreed-0.089
 thanked-0.089
 wanted-0.089
 reached-0.086
Token pretty
Token ID2495
Feature activation-2.51e+00

Pos logit contributions
 attention+0.028
 cotton+0.027
 things+0.026
 new+0.025
 food+0.024
Neg logit contributions
 could-0.051
 reached-0.049
olded-0.049
 agreed-0.049
 couldn-0.049
Token.
Token ID13
Feature activation+3.49e-01

Pos logit contributions
 attention+0.098
 rainy+0.090
 cotton+0.089
 job+0.089
 groceries+0.087
Neg logit contributions
 could-0.126
 agreed-0.113
 thanked-0.111
 reached-0.110
 suggested-0.110
Token They
Token ID1119
Feature activation+5.64e-01

Pos logit contributions
ave+0.015
olded+0.014
 hadn+0.014
 couldn+0.014
 forgot+0.014
Neg logit contributions
 new-0.016
 cotton-0.016
 things-0.016
 attention-0.015
 red-0.015
Token have
Token ID423
Feature activation-1.31e+00

Pos logit contributions
olded+0.028
ave+0.028
oped+0.027
 suggested+0.027
 hadn+0.027
Neg logit contributions
 attention-0.021
 things-0.019
 red-0.019
 sand-0.019
 new-0.019
Token red
Token ID2266
Feature activation-2.33e+00

Pos logit contributions
 attention+0.034
 cotton+0.031
 things+0.028
 TV+0.028
 dessert+0.027
Neg logit contributions
 could-0.083
 reached-0.080
 thanked-0.079
 agreed-0.079
 couldn-0.079
Token,
Token ID11
Feature activation-1.65e+00

Pos logit contributions
 attention+0.083
 job+0.075
 rainy+0.074
 dessert+0.073
 TV+0.073
Neg logit contributions
 could-0.124
 thanked-0.114
 couldn-0.113
 reached-0.113
 agreed-0.111
Token blue
Token ID4171
Feature activation-2.20e+00

Pos logit contributions
 attention+0.032
 job+0.025
 cotton+0.025
 things+0.023
 sight+0.023
Neg logit contributions
 could-0.116
 reached-0.112
 thanked-0.111
 agreed-0.111
 hadn-0.110
Token give
Token ID1577
Feature activation-1.64e+00

Pos logit contributions
 attention+0.014
 things+0.013
 cotton+0.012
 food+0.012
 new+0.012
Neg logit contributions
 could-0.014
olded-0.013
 couldn-0.013
 hadn-0.013
 reached-0.013
Token them
Token ID606
Feature activation-1.64e+00

Pos logit contributions
 cotton+0.052
 attention+0.051
 gift+0.046
 dessert+0.046
 rainy+0.046
Neg logit contributions
 could-0.090
 reached-0.086
 couldn-0.085
 agreed-0.084
olded-0.084
Token kisses
Token ID36544
Feature activation-4.37e-01

Pos logit contributions
 cotton+0.041
 attention+0.041
 sight+0.037
 things+0.036
 new+0.036
Neg logit contributions
 could-0.100
 agreed-0.097
olded-0.097
 reached-0.096
 thanked-0.095
Token.
Token ID13
Feature activation+1.18e+00

Pos logit contributions
 attention+0.013
 cotton+0.012
 things+0.012
 new+0.012
 food+0.011
Neg logit contributions
 could-0.025
olded-0.025
 reached-0.025
 hadn-0.025
 couldn-0.024
Token She
Token ID1375
Feature activation+3.62e+00

Pos logit contributions
ave+0.052
olded+0.049
 forgot+0.049
'd+0.048
 agreed+0.048
Neg logit contributions
 new-0.054
 TV-0.053
 things-0.052
 long-0.051
 cotton-0.050
Token did
Token ID750
Feature activation-7.53e-01

Pos logit contributions
oped+0.145
os+0.131
ver+0.126
enny+0.125
rolled+0.123
Neg logit contributions
 things-0.168
 new-0.166
 bad-0.164
 magic-0.160
 food-0.159
Token not
Token ID407
Feature activation-3.70e-01

Pos logit contributions
 attention+0.033
 cotton+0.031
 things+0.031
 new+0.031
 food+0.030
Neg logit contributions
 could-0.034
olded-0.032
 reached-0.032
 agreed-0.032
 thanked-0.032
Token give
Token ID1577
Feature activation-1.43e+00

Pos logit contributions
 attention+0.016
 things+0.015
 cotton+0.015
 food+0.015
 new+0.015
Neg logit contributions
 could-0.017
olded-0.016
 couldn-0.016
 hadn-0.016
 reached-0.016
Token them
Token ID606
Feature activation-1.54e+00

Pos logit contributions
 attention+0.046
 cotton+0.045
 things+0.041
 new+0.041
 gift+0.041
Neg logit contributions
 could-0.078
 reached-0.075
 agreed-0.074
 couldn-0.074
olded-0.074
Token cookies
Token ID14746
Feature activation-4.80e-01

Pos logit contributions
 attention+0.038
 cotton+0.038
 things+0.034
 sight+0.034
 new+0.033
Neg logit contributions
 could-0.095
 agreed-0.092
olded-0.092
 reached-0.091
 thanked-0.091
Token.
Token ID13
Feature activation+9.98e-01

Pos logit contributions
 attention+0.013
 cotton+0.012
 things+0.012
 new+0.012
 food+0.011
Neg logit contributions
 could-0.029
olded-0.028
 reached-0.028
 couldn-0.028
 hadn-0.028
Token Maria
Token ID14200
Feature activation-2.66e-01

Pos logit contributions
 attention+0.032
 cotton+0.030
 things+0.030
 new+0.030
 food+0.029
Neg logit contributions
 could-0.036
olded-0.034
 reached-0.034
 agreed-0.034
 thanked-0.034
Token.
Token ID13
Feature activation+1.87e-01

Pos logit contributions
 attention+0.007
 things+0.006
 new+0.006
 cotton+0.006
 thing+0.006
Neg logit contributions
olded-0.016
 could-0.016
 reached-0.016
 agreed-0.016
 hadn-0.016
Token She
Token ID1375
Feature activation+1.06e+00

Pos logit contributions
olded+0.007
ave+0.006
 thanked+0.006
 could+0.006
 hadn+0.006
Neg logit contributions
 attention-0.010
 things-0.010
 new-0.010
 sight-0.009
 thing-0.009
Token was
Token ID373
Feature activation-2.57e+00

Pos logit contributions
olded+0.041
 agreed+0.034
oped+0.033
 apologised+0.033
 suggested+0.032
Neg logit contributions
 red-0.056
 new-0.056
 things-0.055
 shiny-0.055
 thing-0.054
Token three
Token ID1115
Feature activation-1.68e+00

Pos logit contributions
 attention+0.105
 cotton+0.097
 purpose+0.093
 groceries+0.090
 rules+0.089
Neg logit contributions
 could-0.125
 couldn-0.122
 reached-0.118
 asked-0.116
 stood-0.115
Token years
Token ID812
Feature activation-2.37e+00

Pos logit contributions
 attention+0.072
 cotton+0.068
 sight+0.063
 gift+0.062
 job+0.061
Neg logit contributions
 could-0.078
 thanked-0.075
 agreed-0.075
olded-0.073
 couldn-0.073
Token old
Token ID1468
Feature activation-2.43e+00

Pos logit contributions
 attention+0.061
 cotton+0.059
 things+0.052
 food+0.050
 thing+0.050
Neg logit contributions
 could-0.141
 thanked-0.140
olded-0.136
 couldn-0.135
 suggested-0.134
Token and
Token ID290
Feature activation+4.58e-01

Pos logit contributions
 attention+0.126
 cotton+0.118
 rainy+0.112
 purpose+0.111
 lunch+0.109
Neg logit contributions
 could-0.095
 thanked-0.088
 nodded-0.086
 agreed-0.085
 couldn-0.085
Token liked
Token ID8288
Feature activation-2.32e+00

Pos logit contributions
olded+0.014
ave+0.014
oped+0.013
 reached+0.013
 suggested+0.013
Neg logit contributions
 new-0.026
 things-0.025
 red-0.025
 attention-0.025
 sunny-0.024
Token to
Token ID284
Feature activation-7.36e-01

Pos logit contributions
 attention+0.096
 cotton+0.086
 gift+0.082
 thing+0.081
rier+0.080
Neg logit contributions
 could-0.111
 couldn-0.106
 thanked-0.101
 suggested-0.101
 reached-0.100
Token explore
Token ID7301
Feature activation-3.40e+00

Pos logit contributions
 attention+0.027
 cotton+0.026
 things+0.025
 new+0.024
 food+0.024
Neg logit contributions
 could-0.038
olded-0.037
 reached-0.036
 agreed-0.036
 couldn-0.036
Token Lucy
Token ID22162
Feature activation+4.83e+00

Pos logit contributions
ave+0.059
olded+0.058
oped+0.057
 From+0.053
 reached+0.052
Neg logit contributions
 things-0.040
 attention-0.038
 food-0.038
 new-0.038
 dessert-0.037
Token and
Token ID290
Feature activation+7.14e-01

Pos logit contributions
Well+0.201
olded+0.187
Even+0.179
uct+0.176
 From+0.174
Neg logit contributions
 home-0.181
 red-0.180
 big-0.172
 new-0.170
 long-0.167
Token Jamie
Token ID17826
Feature activation+4.04e+00

Pos logit contributions
olded+0.031
ave+0.031
oped+0.028
 hadn+0.028
 apologized+0.027
Neg logit contributions
 attention-0.034
 things-0.033
 lunch-0.032
 TV-0.032
 sand-0.031
Token had
Token ID550
Feature activation+5.47e-01

Pos logit contributions
Well+0.148
 From+0.144
olded+0.139
 Perhaps+0.129
 Now+0.129
Neg logit contributions
 any-0.178
 home-0.176
 sand-0.173
 red-0.171
 long-0.166
Token to
Token ID284
Feature activation-2.24e-01

Pos logit contributions
olded+0.035
oped+0.035
ave+0.034
 could+0.033
 hadn+0.033
Neg logit contributions
 new-0.013
 things-0.012
 attention-0.012
 sand-0.012
 food-0.012
Token go
Token ID467
Feature activation-2.45e+00

Pos logit contributions
 attention+0.008
 cotton+0.008
 things+0.007
 sand+0.007
 new+0.007
Neg logit contributions
 could-0.012
olded-0.012
ave-0.011
 couldn-0.011
 reached-0.011
Token get
Token ID651
Feature activation-1.63e+00

Pos logit contributions
 cotton+0.071
 attention+0.063
 purpose+0.063
 TV+0.063
 dessert+0.062
Neg logit contributions
 could-0.145
 reached-0.139
 agreed-0.135
 thanked-0.132
 couldn-0.132
Token it
Token ID340
Feature activation-1.81e+00

Pos logit contributions
 cotton+0.056
 attention+0.056
 TV+0.054
 gift+0.052
 dessert+0.050
Neg logit contributions
 could-0.085
 agreed-0.081
 thanked-0.081
 couldn-0.081
 reached-0.080
Token.
Token ID13
Feature activation+1.71e+00

Pos logit contributions
 attention+0.052
 cotton+0.050
 things+0.048
 TV+0.046
 gift+0.045
Neg logit contributions
 could-0.105
 agreed-0.102
 reached-0.101
 couldn-0.100
 suggested-0.099
Token When
Token ID1649
Feature activation-5.63e-01

Pos logit contributions
oped+0.072
ave+0.071
olded+0.069
 apologized+0.068
 managed+0.068
Neg logit contributions
 TV-0.083
 birthday-0.077
 new-0.077
 sight-0.076
 things-0.075
Token they
Token ID484
Feature activation+1.71e+00

Pos logit contributions
 attention+0.026
 cotton+0.025
 things+0.025
 new+0.025
 food+0.024
Neg logit contributions
 could-0.023
olded-0.023
ave-0.023
 agreed-0.022
 reached-0.022
Token
Token ID26391
Feature activation+2.03e+00

Pos logit contributions
 could+0.012
 couldn+0.012
 hadn+0.011
 thanked+0.011
olded+0.011
Neg logit contributions
 attention-0.014
 job-0.013
 cotton-0.013
 thing-0.013
 things-0.012
Token
Token ID8151
Feature activation+8.67e-01

Pos logit contributions
olded+0.083
 apologised+0.081
ave+0.081
 thanked+0.081
 apologized+0.081
Neg logit contributions
 things-0.086
 long-0.084
 lunch-0.083
 food-0.079
 job-0.079
Tokent
Token ID83
Feature activation+6.36e-01

Pos logit contributions
 thanked+0.033
 reached+0.032
 stared+0.032
 could+0.031
 wondered+0.031
Neg logit contributions
 attention-0.041
 new-0.039
 cotton-0.038
 things-0.037
 red-0.037
Token wear
Token ID5806
Feature activation-4.41e-01

Pos logit contributions
olded+0.031
ave+0.031
 could+0.030
 From+0.030
 thanked+0.030
Neg logit contributions
 thing-0.023
 things-0.023
 attention-0.023
 dessert-0.023
 job-0.022
Token my
Token ID616
Feature activation-2.49e+00

Pos logit contributions
 attention+0.013
 things+0.012
 cotton+0.012
 new+0.012
 food+0.011
Neg logit contributions
 could-0.026
 agreed-0.026
olded-0.026
 reached-0.026
 thanked-0.025
Token new
Token ID649
Feature activation-2.45e+00

Pos logit contributions
 attention+0.055
 cotton+0.049
 sand+0.049
 dessert+0.046
reating+0.045
Neg logit contributions
 could-0.160
 couldn-0.152
 hadn-0.150
 reached-0.148
 agreed-0.148
Token sweater
Token ID35685
Feature activation+1.08e+00

Pos logit contributions
 attention+0.063
 sand+0.055
 cotton+0.055
 dessert+0.054
 rainy+0.051
Neg logit contributions
 could-0.144
 agreed-0.140
 couldn-0.138
 suggested-0.138
 reached-0.137
Token anymore
Token ID7471
Feature activation+1.58e+00

Pos logit contributions
 reached+0.064
 stared+0.063
olded+0.063
 thanked+0.063
 hadn+0.062
Neg logit contributions
 attention-0.029
 things-0.027
 new-0.026
 thing-0.026
 cotton-0.026
Token!
Token ID0
Feature activation+2.80e+00

Pos logit contributions
olded+0.105
 reached+0.103
ave+0.103
oped+0.101
 stared+0.101
Neg logit contributions
 things-0.037
 thing-0.033
 food-0.033
 attention-0.032
 lunch-0.032
Tokenâ
Token ID22940
Feature activation+9.00e-01

Pos logit contributions
 forgot+0.139
ave+0.131
 understood+0.127
 couldn+0.125
 managed+0.125
Neg logit contributions
 new-0.095
 TV-0.094
 gift-0.091
 cotton-0.091
 things-0.090
Token
Token ID26391
Feature activation+2.16e+00

Pos logit contributions
 could+0.037
 couldn+0.037
olded+0.034
 reached+0.034
 hadn+0.034
Neg logit contributions
 attention-0.041
 job-0.040
 thing-0.038
 things-0.037
 gift-0.037
Token time
Token ID640
Feature activation-1.31e+00

Pos logit contributions
 attention+0.046
 cotton+0.042
 things+0.040
 groceries+0.037
 food+0.037
Neg logit contributions
 could-0.099
 reached-0.098
 agreed-0.097
 couldn-0.096
 thanked-0.095
Token,
Token ID11
Feature activation-1.44e+00

Pos logit contributions
 attention+0.049
 cotton+0.046
 things+0.044
 food+0.043
 new+0.043
Neg logit contributions
 could-0.067
 couldn-0.065
 reached-0.064
 agreed-0.064
 hadn-0.063
Token there
Token ID612
Feature activation+1.29e+00

Pos logit contributions
 attention+0.055
 cotton+0.052
 things+0.048
 food+0.048
 groceries+0.047
Neg logit contributions
 could-0.075
 reached-0.071
 couldn-0.071
 hadn-0.070
 agreed-0.070
Token was
Token ID373
Feature activation-2.49e+00

Pos logit contributions
olded+0.051
ave+0.045
 apologised+0.040
 agreed+0.038
 apologized+0.038
Neg logit contributions
 thing-0.074
 new-0.073
 things-0.072
 sand-0.068
 lunch-0.068
Token a
Token ID257
Feature activation-3.05e+00

Pos logit contributions
 attention+0.107
 things+0.087
 gift+0.087
 rules+0.086
rier+0.085
Neg logit contributions
 could-0.118
 reached-0.117
 couldn-0.113
 thanked-0.113
 From-0.113
Token little
Token ID1310
Feature activation-3.30e+00

Pos logit contributions
 attention+0.135
 groceries+0.109
 cotton+0.106
 things+0.102
reating+0.101
Neg logit contributions
 could-0.154
 couldn-0.143
 thanked-0.142
 forgot-0.141
 wanted-0.139
Token boy
Token ID2933
Feature activation-1.57e+00

Pos logit contributions
 attention+0.165
 purpose+0.138
 groceries+0.135
 cotton+0.134
 any+0.133
Neg logit contributions
 could-0.131
 agreed-0.121
 nodded-0.120
 said-0.119
 wanted-0.119
Token named
Token ID3706
Feature activation-1.03e+00

Pos logit contributions
 attention+0.111
 cotton+0.107
 groceries+0.103
 sight+0.102
 things+0.102
Neg logit contributions
 could-0.033
 couldn-0.028
 reached-0.028
 suggested-0.028
 thanked-0.027
Token Tim
Token ID5045
Feature activation-8.88e-01

Pos logit contributions
 attention+0.054
 cotton+0.053
 things+0.051
 new+0.050
 gift+0.049
Neg logit contributions
 could-0.038
 reached-0.036
 couldn-0.035
 agreed-0.035
olded-0.035
Tokenmy
Token ID1820
Feature activation-2.00e+00

Pos logit contributions
 attention+0.048
 cotton+0.046
 things+0.045
 new+0.045
 food+0.044
Neg logit contributions
 could-0.031
 reached-0.029
 agreed-0.029
olded-0.029
 couldn-0.028
Token.
Token ID13
Feature activation-9.66e-01

Pos logit contributions
 attention+0.083
 cotton+0.080
 groceries+0.076
 sight+0.071
 things+0.070
Neg logit contributions
 could-0.102
 couldn-0.095
 thanked-0.094
 suggested-0.093
 reached-0.093
Token tree
Token ID5509
Feature activation-1.67e+00

Pos logit contributions
 attention+0.174
 groceries+0.140
 unnecessary+0.133
ags+0.131
 cotton+0.128
Neg logit contributions
 could-0.184
 reached-0.176
 said-0.174
 couldn-0.172
 thanked-0.169
Token.
Token ID13
Feature activation+5.52e-01

Pos logit contributions
 attention+0.063
 cotton+0.059
 groceries+0.055
 things+0.055
 dessert+0.054
Neg logit contributions
 could-0.088
 reached-0.084
 couldn-0.082
 suggested-0.082
 agreed-0.082
Token She
Token ID1375
Feature activation+2.95e+00

Pos logit contributions
olded+0.020
 forgot+0.020
ave+0.020
 agreed+0.019
 suggested+0.019
Neg logit contributions
 TV-0.028
 new-0.027
 things-0.027
 cotton-0.027
 red-0.027
Token wanted
Token ID2227
Feature activation-1.75e+00

Pos logit contributions
olded+0.109
oped+0.090
ives+0.090
ave+0.089
os+0.086
Neg logit contributions
 red-0.161
 new-0.156
 shiny-0.155
 big-0.152
 long-0.151
Token to
Token ID284
Feature activation-3.38e-01

Pos logit contributions
 cotton+0.067
 attention+0.066
 gift+0.061
 job+0.061
 TV+0.060
Neg logit contributions
 could-0.088
 reached-0.085
 couldn-0.081
 agreed-0.079
 suggested-0.079
Token climb
Token ID12080
Feature activation-2.46e+00

Pos logit contributions
 attention+0.012
 things+0.012
 cotton+0.011
 new+0.011
 food+0.011
Neg logit contributions
olded-0.017
 could-0.017
 hadn-0.017
ave-0.017
 suggested-0.016
Token it
Token ID340
Feature activation-1.16e+00

Pos logit contributions
 attention+0.104
 gift+0.103
 birthday+0.103
 dessert+0.102
 purpose+0.097
Neg logit contributions
 could-0.111
 reached-0.105
 couldn-0.101
 stood-0.096
 agreed-0.094
Token!
Token ID0
Feature activation+1.41e+00

Pos logit contributions
 attention+0.042
 cotton+0.040
 things+0.038
 new+0.037
 dessert+0.037
Neg logit contributions
 could-0.060
 reached-0.058
 agreed-0.057
 couldn-0.057
 thanked-0.056
Token
Token ID198
Feature activation+1.59e+00

Pos logit contributions
 forgot+0.062
'd+0.061
 apologized+0.060
 managed+0.060
 apologised+0.059
Neg logit contributions
 new-0.060
 things-0.060
 TV-0.057
 food-0.056
 round-0.055
Token
Token ID198
Feature activation+5.47e-01

Pos logit contributions
olded+0.072
ave+0.069
 hadn+0.068
 apologized+0.067
 apologised+0.067
Neg logit contributions
 new-0.070
 things-0.063
 long-0.063
 attention-0.062
 big-0.060
Token"
Token ID1
Feature activation-2.45e-01

Pos logit contributions
olded+0.022
 couldn+0.021
 could+0.021
 hadn+0.021
 apologized+0.021
Neg logit contributions
 things-0.024
 new-0.023
 thing-0.023
 cotton-0.022
 food-0.022
Token fish
Token ID5916
Feature activation+3.40e+00

Pos logit contributions
 attention+0.091
 earthquake+0.076
 purpose+0.076
 groceries+0.074
 cotton+0.074
Neg logit contributions
 could-0.138
 thanked-0.130
 suggested-0.124
 agreed-0.123
 reached-0.122
Token looked
Token ID3114
Feature activation-9.24e-01

Pos logit contributions
olded+0.118
lett+0.114
oped+0.108
iva+0.106
ave+0.106
Neg logit contributions
 new-0.168
 thing-0.165
 red-0.164
 home-0.160
 things-0.159
Token up
Token ID510
Feature activation-1.72e+00

Pos logit contributions
 attention+0.029
 cotton+0.027
 things+0.026
 new+0.026
 food+0.025
Neg logit contributions
 could-0.052
 reached-0.050
olded-0.050
 agreed-0.050
 couldn-0.050
Token with
Token ID351
Feature activation-1.79e+00

Pos logit contributions
 attention+0.075
 cotton+0.073
 things+0.069
 food+0.069
 TV+0.069
Neg logit contributions
 could-0.078
 reached-0.076
 agreed-0.075
 couldn-0.074
 thanked-0.073
Token a
Token ID257
Feature activation-1.69e+00

Pos logit contributions
 attention+0.061
 cotton+0.057
 groceries+0.057
 things+0.055
 job+0.054
Neg logit contributions
 thanked-0.095
 could-0.095
 agreed-0.094
 reached-0.093
 understood-0.090
Token surprised
Token ID6655
Feature activation-2.91e+00

Pos logit contributions
 attention+0.044
 cotton+0.038
 things+0.037
 groceries+0.037
 food+0.034
Neg logit contributions
 could-0.107
 thanked-0.103
 reached-0.102
 agreed-0.102
 couldn-0.102
Token look
Token ID804
Feature activation-1.24e+00

Pos logit contributions
 groceries+0.105
 attention+0.104
 cotton+0.104
 rainy+0.092
 lunch+0.088
Neg logit contributions
 agreed-0.141
 thanked-0.141
 nodded-0.137
 could-0.132
 reached-0.132
Token and
Token ID290
Feature activation+1.35e+00

Pos logit contributions
 attention+0.043
 cotton+0.040
 things+0.038
 new+0.037
 dessert+0.037
Neg logit contributions
 could-0.067
 agreed-0.064
 reached-0.064
 couldn-0.064
 thanked-0.064
Token the
Token ID262
Feature activation-2.16e+00

Pos logit contributions
ave+0.045
olded+0.045
oped+0.043
os+0.039
Well+0.036
Neg logit contributions
 new-0.081
 red-0.076
 dark-0.075
 shiny-0.075
 round-0.075
Token two
Token ID734
Feature activation-1.03e+00

Pos logit contributions
 attention+0.070
 cotton+0.064
 rainy+0.063
 groceries+0.062
 job+0.062
Neg logit contributions
 could-0.118
 thanked-0.115
 agreed-0.111
 reached-0.109
 suggested-0.108
Token of
Token ID286
Feature activation-1.88e+00

Pos logit contributions
 attention+0.029
 cotton+0.027
 things+0.025
 new+0.025
 food+0.024
Neg logit contributions
 could-0.062
 reached-0.060
 agreed-0.060
 thanked-0.059
olded-0.059