TOP ACTIVATIONS
MAX = 0.187

incredible spirit of the forest envelop ing him .
's how the two ordinary envelop es united and became one
Thank you Mom my for letting me help ," said Lily
thank you to them for letting him see them . He
it . Thank you for letting me paint your car .
you to the park for letting us play here ."
names and put them in envelop es . Then they would
up . Thank you for letting me play with you ."
heart . Thank you for letting me go . I love
it can forgive you for letting it go . That 's
. And thank you for letting us play with Max .
" Thank you for letting us pet him ," Ben
today . Thank you for letting me brush your hair ."
for soaking the crack and letting me out . You are
Mom , thank you for letting me share Rose with them
" Thank you for letting us pet your dog ,"
on . Thank you for letting me borrow it ," Ben
thanked the star fish for letting them touch it and left
different . Thank you for letting me play with it ."
he could serve others by letting them read his story .

MIN ACTIVATIONS
MIN = -0.216

confused , but it felt very refreshing .
. The small shell felt very sad , until it met
It was large and looked very interesting to him . So
was very high and looked very dangerous . Maria
yellow and her wool felt very soft . One
very old and Alice felt very serious when she saw it
It was gray and looked very fun ! One day ,
cry . The hall felt very quiet . Suddenly
very wet . It felt very heavy in her small hands
. The zero was feeling very sad . Every day ,
was very scared and felt very alone . She was so
little girl who was feeling very sleepy . She was so
, a woman was feeling very sleepy . She wanted to
. The avocado was feeling very lonely and did not know
. But it was feeling very lonely , so it called
big , and Tiny felt very small . But Tiny was
The aer opl ane felt very happy too . From that
was , but it felt very excited .
very troubled and was feeling very sad one day . He
was tall and it looked very strong . It was so

INTERVAL 0.086 - 0.187
CONTAINS 4.670%

sometimes , it 's important to mind what your parents say
to me , happy birthday to me , happy birthday dear
, " Maybe it belongs to the snow man . He
store and began to talk to one of the ladies who
He thinks they are fun to play with . One day

INTERVAL -0.015 - 0.086
CONTAINS 61.608%

of special stones . One day he found
Jill said . " I have a surprise for you .
a lot of noise and wake up Mom .
the fire men . They can help us get Spot out
His mom ma was in the kitchen and saw him holding

INTERVAL -0.116 - -0.015
CONTAINS 31.041%

in the ground . It was tall and gray and didn
day , Tom had an idea . He wanted to scare
giant tree in a hidden forest . Every night an owl
large clock . It was big and made a loud tick
ground . The feather flew away in the wind . L

INTERVAL -0.216 - -0.116
CONTAINS 2.680%

that made her feel very uncomfortable . Faith
But she was having so much fun that she didn 't
fly . The fly was very clean . One day it
. They both had a great time , and Tim learned
. " You look so shiny and colorful ."
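The four INTERVAL ranges above appear to be equal-width bins spanning MIN to MAX, and their CONTAINS values sum to 99.999%. The sketch below shows one way such bins and percentages could be computed. It is a minimal illustration under that equal-width assumption, not the dashboard's actual code: the function names and the sample activations are hypothetical, and only the MIN and MAX values come from the listing.

```python
def interval_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges from lo to hi (assumed binning)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def percent_per_interval(values, edges):
    """Percentage of values in each bin [edges[i], edges[i+1]).

    The last bin also includes its upper edge, so the maximum is counted.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [100.0 * c / len(values) for c in counts]


# MIN and MAX as reported above; four bins is an assumption read off the listing.
edges = interval_edges(-0.216, 0.187, 4)
print([round(e, 3) for e in edges])

# Hypothetical activations, only to exercise the function.
sample = [-0.216, -0.2, 0.0, 0.0, 0.187]
print(percent_per_interval(sample, edges))
```

The computed edges agree with the listed boundaries (-0.116, -0.015, 0.086) to within about 0.001, which is consistent with MIN and MAX having been rounded to three decimals for display.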
Token incredible
Token ID8082
Feature activation-1.16e-01

Pos logit contributions
 join+0.027
 meet+0.026
 greet+0.025
talk+0.025
 build+0.024
Neg logit contributions
ummy-0.018
ls-0.018
 adorable-0.017
 important-0.017
ience-0.017
Token spirit
Token ID4437
Feature activation+3.87e-04

Pos logit contributions
ls+0.200
ummy+0.197
 adorable+0.192
 important+0.191
ience+0.190
Neg logit contributions
 join-0.239
 meet-0.237
 greet-0.225
 build-0.218
talk-0.216
Token of
Token ID286
Feature activation+8.04e-02

Pos logit contributions
 join+0.001
 meet+0.001
 greet+0.001
 build+0.001
talk+0.001
Neg logit contributions
ummy-0.001
ls-0.001
 important-0.001
 adorable-0.001
 problem-0.001
Token the
Token ID262
Feature activation+4.28e-02

Pos logit contributions
 meet+0.166
 join+0.166
 greet+0.157
talk+0.157
 build+0.153
Neg logit contributions
ummy-0.136
 adorable-0.135
 important-0.134
ls-0.132
 problem-0.129
Token forest
Token ID8222
Feature activation+5.58e-03

Pos logit contributions
 meet+0.097
 join+0.096
 greet+0.091
talk+0.087
 contain+0.086
Neg logit contributions
ummy-0.066
ls-0.066
ience-0.061
 adorable-0.061
 important-0.061
Token envelop
Token ID16441
Feature activation+1.81e-01

Pos logit contributions
 join+0.014
 meet+0.014
 greet+0.013
talk+0.013
 build+0.013
Neg logit contributions
ummy-0.007
 adorable-0.007
ls-0.007
 important-0.007
 problem-0.007
Tokening
Token ID278
Feature activation+1.25e-02

Pos logit contributions
 meet+0.380
 join+0.379
 greet+0.360
talk+0.353
 build+0.348
Neg logit contributions
ummy-0.306
ls-0.299
 adorable-0.296
 important-0.294
ience-0.285
Token him
Token ID683
Feature activation+3.86e-02

Pos logit contributions
 meet+0.026
 join+0.026
 greet+0.025
talk+0.024
 build+0.024
Neg logit contributions
ummy-0.021
 adorable-0.020
 important-0.020
ls-0.020
 problem-0.020
Token.
Token ID13
Feature activation+8.59e-03

Pos logit contributions
 join+0.087
 meet+0.087
 greet+0.082
 build+0.080
talk+0.080
Neg logit contributions
ummy-0.058
 adorable-0.057
ls-0.056
 important-0.056
 problem-0.054
Token\n
Token ID198
Feature activation+2.91e-02

Pos logit contributions
 join+0.018
 meet+0.017
 greet+0.016
 build+0.016
talk+0.016
Neg logit contributions
ls-0.015
 important-0.015
 adorable-0.015
ummy-0.015
 problem-0.015
Token\n
Token ID198
Feature activation+2.46e-02

Pos logit contributions
 join+0.057
 meet+0.057
 greet+0.054
talk+0.053
 build+0.052
Neg logit contributions
ummy-0.053
 adorable-0.052
ls-0.052
 important-0.051
 problem-0.050
Token's
Token ID338
Feature activation+6.96e-03

Pos logit contributions
 join+0.039
 meet+0.038
 build+0.037
 greet+0.035
 make+0.035
Neg logit contributions
ls-0.025
ummy-0.024
 adorable-0.023
ience-0.022
 importantly-0.022
Token how
Token ID703
Feature activation+1.87e-02

Pos logit contributions
 join+0.015
 meet+0.015
 build+0.014
 greet+0.014
 contain+0.013
Neg logit contributions
ls-0.012
ummy-0.011
ience-0.011
 adorable-0.010
 problem-0.010
Token the
Token ID262
Feature activation+5.17e-02

Pos logit contributions
 join+0.043
 meet+0.042
 build+0.039
 greet+0.039
 contain+0.037
Neg logit contributions
ummy-0.028
ls-0.027
 adorable-0.026
 problem-0.026
ience-0.026
Token two
Token ID734
Feature activation+1.66e-02

Pos logit contributions
 join+0.125
 meet+0.123
 greet+0.116
 build+0.115
 gra+0.111
Neg logit contributions
ls-0.071
ummy-0.070
ience-0.065
 adorable-0.065
 important-0.063
Token ordinary
Token ID8850
Feature activation-8.45e-02

Pos logit contributions
 join+0.043
 meet+0.042
 build+0.041
 greet+0.039
 make+0.038
Neg logit contributions
ummy-0.020
ls-0.019
 adorable-0.018
 important-0.018
ience-0.017
Token envelop
Token ID16441
Feature activation+1.78e-01

Pos logit contributions
ls+0.129
ummy+0.128
 adorable+0.124
 important+0.122
ience+0.121
Neg logit contributions
 join-0.192
 meet-0.189
 greet-0.178
 build-0.177
talk-0.171
Tokenes
Token ID274
Feature activation+6.71e-02

Pos logit contributions
 meet+0.383
 join+0.383
 greet+0.367
talk+0.357
 build+0.352
Neg logit contributions
ummy-0.288
ls-0.283
 adorable-0.278
 important-0.275
ience-0.270
Token united
Token ID16503
Feature activation-5.07e-03

Pos logit contributions
 join+0.151
 meet+0.151
 greet+0.144
talk+0.143
 build+0.140
Neg logit contributions
ummy-0.100
 adorable-0.098
 important-0.098
ls-0.098
 problem-0.094
Token and
Token ID290
Feature activation-4.29e-02

Pos logit contributions
ummy+0.007
 adorable+0.007
ls+0.007
 important+0.007
 problem+0.007
Neg logit contributions
 meet-0.012
 join-0.012
 greet-0.011
talk-0.011
 build-0.011
Token became
Token ID2627
Feature activation-3.51e-02

Pos logit contributions
 important+0.085
 adorable+0.083
 problem+0.080
ummy+0.080
 disgusting+0.079
Neg logit contributions
 meet-0.075
 join-0.075
talk-0.074
 greet-0.071
 contain-0.069
Token one
Token ID530
Feature activation-2.21e-02

Pos logit contributions
ummy+0.054
ls+0.053
 adorable+0.050
ience+0.050
 important+0.050
Neg logit contributions
 join-0.081
 meet-0.079
 greet-0.075
 build-0.074
talk-0.073
TokenThank
Token ID10449
Feature activation+3.01e-02

Pos logit contributions
 meet+0.072
 join+0.072
 greet+0.069
 build+0.068
 contain+0.067
Neg logit contributions
 adorable-0.073
 important-0.073
ummy-0.071
 importantly-0.069
 disgusting-0.068
Token you
Token ID345
Feature activation+6.14e-02

Pos logit contributions
 meet+0.073
 join+0.072
 greet+0.070
talk+0.070
 build+0.068
Neg logit contributions
ummy-0.042
 important-0.040
 adorable-0.040
ls-0.039
 problem-0.037
Token Mom
Token ID11254
Feature activation+6.00e-02

Pos logit contributions
 join+0.147
 meet+0.143
 greet+0.134
 contain+0.131
 build+0.129
Neg logit contributions
ls-0.091
 importantly-0.082
 adorable-0.082
 disgusting-0.080
 important-0.080
Tokenmy
Token ID1820
Feature activation+5.62e-02

Pos logit contributions
 meet+0.133
 join+0.132
 greet+0.127
talk+0.126
 contain+0.124
Neg logit contributions
ummy-0.094
 adorable-0.090
 important-0.090
ls-0.088
 problem-0.087
Token for
Token ID329
Feature activation+1.04e-01

Pos logit contributions
 meet+0.129
 join+0.129
 greet+0.123
talk+0.121
 build+0.119
Neg logit contributions
ummy-0.083
 adorable-0.081
 important-0.081
ls-0.080
 problem-0.077
Token letting
Token ID9616
Feature activation+1.78e-01

Pos logit contributions
 join+0.234
talk+0.234
 meet+0.233
 greet+0.226
 contain+0.222
Neg logit contributions
 important-0.153
ummy-0.153
 adorable-0.152
 problem-0.143
 disgusting-0.143
Token me
Token ID502
Feature activation+8.42e-02

Pos logit contributions
 join+0.472
 meet+0.471
talk+0.462
 greet+0.457
 contain+0.456
Neg logit contributions
ummy-0.197
 important-0.187
 adorable-0.185
ls-0.178
 problem-0.173
Token help
Token ID1037
Feature activation+8.37e-02

Pos logit contributions
talk+0.095
 meet+0.084
 greet+0.083
 contain+0.079
 gra+0.077
Neg logit contributions
ummy-0.211
 adorable-0.210
 important-0.209
ls-0.201
 problem-0.200
Token,"
Token ID553
Feature activation+3.42e-02

Pos logit contributions
 meet+0.219
 join+0.217
talk+0.214
 greet+0.211
 contain+0.207
Neg logit contributions
ummy-0.101
 important-0.095
 adorable-0.093
ls-0.087
 problem-0.086
Token said
Token ID531
Feature activation-4.40e-03

Pos logit contributions
 join+0.084
 meet+0.083
 build+0.079
 greet+0.079
talk+0.076
Neg logit contributions
ls-0.046
ummy-0.045
 important-0.045
 problem-0.045
ience-0.044
Token Lily
Token ID20037
Feature activation+1.28e-02

Pos logit contributions
ummy+0.006
ls+0.006
 important+0.006
 adorable+0.006
ience+0.006
Neg logit contributions
 join-0.011
 meet-0.010
 greet-0.010
 build-0.010
talk-0.010
Token thank
Token ID5875
Feature activation+3.68e-02

Pos logit contributions
 join+0.049
 meet+0.049
 greet+0.046
 build+0.045
talk+0.044
Neg logit contributions
ummy-0.024
 adorable-0.023
 important-0.023
ience-0.022
 importantly-0.022
Token you
Token ID345
Feature activation+5.88e-02

Pos logit contributions
 meet+0.092
talk+0.091
 join+0.090
 greet+0.089
 build+0.088
Neg logit contributions
ummy-0.044
 adorable-0.044
ls-0.043
 important-0.042
 cute-0.038
Token to
Token ID284
Feature activation+1.38e-01

Pos logit contributions
 meet+0.142
 join+0.142
 greet+0.135
talk+0.133
 build+0.131
Neg logit contributions
ummy-0.080
 adorable-0.079
ls-0.078
 important-0.078
 problem-0.075
Token them
Token ID606
Feature activation+9.62e-02

Pos logit contributions
talk+0.313
 meet+0.290
 contain+0.290
 join+0.288
tted+0.286
Neg logit contributions
 adorable-0.201
 important-0.195
ummy-0.192
ls-0.186
 cute-0.184
Token for
Token ID329
Feature activation+1.00e-01

Pos logit contributions
 meet+0.261
 join+0.260
 greet+0.250
talk+0.249
 build+0.244
Neg logit contributions
ummy-0.102
 adorable-0.099
 important-0.098
ls-0.097
 problem-0.092
Token letting
Token ID9616
Feature activation+1.74e-01

Pos logit contributions
talk+0.239
oes+0.221
 join+0.219
 contain+0.218
 meet+0.217
Neg logit contributions
 important-0.146
 adorable-0.144
 cute-0.135
 disgusting-0.135
 accidents-0.128
Token him
Token ID683
Feature activation+6.21e-02

Pos logit contributions
 join+0.313
 contain+0.309
talk+0.308
 meet+0.308
 greet+0.299
Neg logit contributions
 important-0.319
 adorable-0.317
ls-0.317
ummy-0.314
 disgusting-0.300
Token see
Token ID766
Feature activation+8.27e-02

Pos logit contributions
talk+0.077
 greet+0.073
 meet+0.071
 contain+0.069
 join+0.069
Neg logit contributions
 adorable-0.145
 important-0.143
ls-0.141
 cute-0.138
ummy-0.136
Token them
Token ID606
Feature activation+9.41e-02

Pos logit contributions
 meet+0.166
 join+0.165
talk+0.159
 greet+0.157
 contain+0.154
Neg logit contributions
 adorable-0.143
ummy-0.143
ls-0.142
 important-0.142
 disgusting-0.135
Token.
Token ID13
Feature activation+2.44e-02

Pos logit contributions
 meet+0.214
 join+0.213
talk+0.208
 greet+0.206
 contain+0.198
Neg logit contributions
 adorable-0.136
ummy-0.134
 important-0.134
ls-0.133
 problem-0.127
Token He
Token ID679
Feature activation+9.75e-03

Pos logit contributions
 join+0.049
 meet+0.049
 greet+0.046
talk+0.045
 build+0.044
Neg logit contributions
ummy-0.043
 important-0.043
ls-0.043
 problem-0.043
 adorable-0.042
Token it
Token ID340
Feature activation+1.37e-02

Pos logit contributions
 join+0.030
 meet+0.030
 greet+0.028
 build+0.027
talk+0.026
Neg logit contributions
ls-0.024
ience-0.024
ummy-0.024
 adorable-0.023
 important-0.023
Token.
Token ID13
Feature activation+3.34e-02

Pos logit contributions
 join+0.027
 meet+0.026
 build+0.024
 greet+0.024
talk+0.023
Neg logit contributions
ummy-0.025
ls-0.025
ience-0.025
 importantly-0.025
 adorable-0.024
Token Thank
Token ID6952
Feature activation+4.52e-02

Pos logit contributions
 meet+0.075
talk+0.075
 join+0.075
 contain+0.075
 greet+0.073
Neg logit contributions
ummy-0.049
 adorable-0.048
 important-0.046
 importantly-0.045
 cute-0.043
Token you
Token ID345
Feature activation+5.37e-02

Pos logit contributions
 meet+0.117
talk+0.116
 join+0.116
 greet+0.116
 build+0.114
Neg logit contributions
ummy-0.052
 adorable-0.050
ls-0.048
 important-0.047
 importantly-0.046
Token for
Token ID329
Feature activation+1.01e-01

Pos logit contributions
 join+0.122
 meet+0.120
 greet+0.112
 build+0.108
talk+0.108
Neg logit contributions
ls-0.081
 adorable-0.080
 important-0.079
ience-0.078
 importantly-0.078
Token letting
Token ID9616
Feature activation+1.74e-01

Pos logit contributions
talk+0.253
 contain+0.234
 join+0.230
 meet+0.230
 greet+0.228
Neg logit contributions
 important-0.140
 adorable-0.137
 cute-0.128
 disgusting-0.127
ummy-0.122
Token me
Token ID502
Feature activation+7.82e-02

Pos logit contributions
 join+0.509
 meet+0.504
talk+0.495
 greet+0.494
 contain+0.493
Neg logit contributions
ummy-0.141
 important-0.138
 adorable-0.135
ls-0.131
 problem-0.124
Token paint
Token ID7521
Feature activation+7.41e-02

Pos logit contributions
talk+0.106
 gra+0.099
 greet+0.098
 meet+0.093
 contain+0.093
Neg logit contributions
 adorable-0.181
 important-0.180
ummy-0.174
 problem-0.173
 cute-0.170
Token your
Token ID534
Feature activation+4.97e-02

Pos logit contributions
 meet+0.164
 join+0.164
 greet+0.156
talk+0.154
 build+0.151
Neg logit contributions
ummy-0.115
 adorable-0.113
 important-0.113
ls-0.112
 problem-0.108
Token car
Token ID1097
Feature activation+5.69e-02

Pos logit contributions
 meet+0.117
 join+0.117
 greet+0.112
talk+0.111
 build+0.108
Neg logit contributions
ummy-0.070
 adorable-0.069
 important-0.069
ls-0.068
 problem-0.066
Token.
Token ID13
Feature activation+3.48e-02

Pos logit contributions
 meet+0.132
 join+0.130
 greet+0.127
talk+0.124
 contain+0.123
Neg logit contributions
 important-0.082
 adorable-0.081
ummy-0.080
ls-0.078
 problem-0.077
Token you
Token ID345
Feature activation+5.66e-02

Pos logit contributions
 meet+0.104
 join+0.102
 greet+0.101
talk+0.100
 build+0.099
Neg logit contributions
ls-0.048
ummy-0.047
 adorable-0.046
 important-0.045
ience-0.043
Token to
Token ID284
Feature activation+1.35e-01

Pos logit contributions
 meet+0.125
 join+0.125
 greet+0.118
talk+0.117
 build+0.115
Neg logit contributions
ummy-0.089
 adorable-0.087
 important-0.087
ls-0.087
 problem-0.083
Token the
Token ID262
Feature activation+7.58e-02

Pos logit contributions
talk+0.311
 meet+0.295
 contain+0.292
 join+0.290
tted+0.289
Neg logit contributions
 adorable-0.193
ummy-0.189
 important-0.186
ls-0.186
 cute-0.179
Token park
Token ID3952
Feature activation+3.81e-02

Pos logit contributions
 meet+0.177
 join+0.177
 greet+0.169
talk+0.167
 build+0.164
Neg logit contributions
ummy-0.110
 adorable-0.107
 important-0.107
ls-0.106
 problem-0.102
Token for
Token ID329
Feature activation+9.61e-02

Pos logit contributions
 meet+0.086
 join+0.086
 greet+0.082
talk+0.081
 build+0.080
Neg logit contributions
ummy-0.057
 adorable-0.056
 important-0.056
ls-0.056
 problem-0.054
Token letting
Token ID9616
Feature activation+1.71e-01

Pos logit contributions
talk+0.234
 contain+0.216
oes+0.213
 gra+0.211
 join+0.210
Neg logit contributions
 important-0.138
 adorable-0.136
 cute-0.129
 disgusting-0.128
 accidents-0.124
Token us
Token ID514
Feature activation+9.26e-02

Pos logit contributions
 contain+0.423
 join+0.417
talk+0.416
 meet+0.410
 greet+0.406
Neg logit contributions
ls-0.209
 important-0.204
ummy-0.201
 adorable-0.197
 problem-0.192
Token play
Token ID711
Feature activation+8.40e-02

Pos logit contributions
talk+0.125
 gra+0.119
 greet+0.117
 meet+0.103
Make+0.101
Neg logit contributions
 adorable-0.206
 important-0.203
 cute-0.197
 disgusting-0.195
 lazy-0.194
Token here
Token ID994
Feature activation+6.51e-02

Pos logit contributions
 meet+0.195
 join+0.191
talk+0.190
 greet+0.188
 contain+0.184
Neg logit contributions
 adorable-0.119
 important-0.118
ls-0.116
ummy-0.113
 disgusting-0.108
Token."
Token ID526
Feature activation+4.53e-02

Pos logit contributions
 meet+0.142
 join+0.142
 greet+0.135
talk+0.134
 build+0.131
Neg logit contributions
ummy-0.102
 adorable-0.101
 important-0.101
ls-0.100
 problem-0.096
Token\n
Token ID198
Feature activation+4.17e-02

Pos logit contributions
 meet+0.102
 join+0.101
talk+0.099
 greet+0.098
 contain+0.098
Neg logit contributions
ls-0.066
ummy-0.065
 adorable-0.064
 important-0.060
 cute-0.060
Token names
Token ID3891
Feature activation+1.09e-01

Pos logit contributions
 meet+0.073
 join+0.072
 greet+0.067
talk+0.065
 build+0.064
Neg logit contributions
ummy-0.094
ls-0.093
 adorable-0.091
ience-0.090
 importantly-0.090
Token and
Token ID290
Feature activation-1.84e-02

Pos logit contributions
 join+0.241
 meet+0.240
 greet+0.226
 build+0.222
talk+0.222
Neg logit contributions
ummy-0.169
 adorable-0.168
 important-0.167
ls-0.166
ience-0.164
Token put
Token ID1234
Feature activation+6.80e-02

Pos logit contributions
 important+0.035
 adorable+0.035
ummy+0.034
ls+0.033
 problem+0.033
Neg logit contributions
 join-0.033
 meet-0.033
talk-0.032
 greet-0.031
 contain-0.031
Token them
Token ID606
Feature activation+1.07e-01

Pos logit contributions
 meet+0.123
 join+0.122
 greet+0.115
talk+0.113
 build+0.111
Neg logit contributions
ummy-0.133
 adorable-0.132
 important-0.131
ls-0.131
 problem-0.127
Token in
Token ID287
Feature activation+1.29e-01

Pos logit contributions
 meet+0.219
 join+0.219
 greet+0.207
talk+0.205
 build+0.201
Neg logit contributions
ummy-0.184
 adorable-0.180
 important-0.179
ls-0.179
 problem-0.173
Token envelop
Token ID16441
Feature activation+1.71e-01

Pos logit contributions
 meet+0.259
 join+0.259
 greet+0.245
talk+0.241
 build+0.237
Neg logit contributions
ummy-0.227
 adorable-0.222
ls-0.222
 important-0.221
 problem-0.214
Tokenes
Token ID274
Feature activation+6.86e-02

Pos logit contributions
 join+0.366
 meet+0.365
 greet+0.347
talk+0.337
 build+0.335
Neg logit contributions
ummy-0.282
ls-0.277
 adorable-0.274
 important-0.273
 problem-0.263
Token.
Token ID13
Feature activation+2.41e-02

Pos logit contributions
 join+0.155
 meet+0.155
 greet+0.147
talk+0.145
 build+0.142
Neg logit contributions
ummy-0.104
 adorable-0.102
 important-0.101
ls-0.100
 problem-0.097
Token Then
Token ID3244
Feature activation-1.69e-02

Pos logit contributions
 meet+0.048
 join+0.048
 greet+0.045
talk+0.044
 build+0.043
Neg logit contributions
ummy-0.043
ls-0.043
 important-0.043
 adorable-0.043
 problem-0.042
Token they
Token ID484
Feature activation+3.51e-02

Pos logit contributions
ummy+0.020
 adorable+0.020
 important+0.020
ls+0.019
ience+0.018
Neg logit contributions
 join-0.044
 meet-0.044
 greet-0.041
 build-0.041
talk-0.041
Token would
Token ID561
Feature activation+5.65e-02

Pos logit contributions
 join+0.057
 meet+0.056
 greet+0.053
talk+0.053
 contain+0.052
Neg logit contributions
ls-0.069
ience-0.068
 problem-0.067
 adorable-0.067
 important-0.066
Token up
Token ID510
Feature activation+2.13e-02

Pos logit contributions
 join+0.009
 meet+0.009
 greet+0.009
talk+0.009
 build+0.009
Neg logit contributions
ummy-0.006
 adorable-0.006
 important-0.006
ls-0.006
 problem-0.006
Token.
Token ID13
Feature activation+1.05e-03

Pos logit contributions
 join+0.048
 meet+0.048
 greet+0.046
talk+0.045
 build+0.044
Neg logit contributions
ummy-0.032
 important-0.031
 adorable-0.031
ls-0.030
ience-0.030
Token Thank
Token ID6952
Feature activation+3.31e-02

Pos logit contributions
 join+0.002
 meet+0.002
 greet+0.002
 build+0.002
talk+0.002
Neg logit contributions
ls-0.002
 important-0.002
 problem-0.002
ummy-0.002
 adorable-0.002
Token you
Token ID345
Feature activation+5.23e-02

Pos logit contributions
 meet+0.079
 join+0.079
 greet+0.077
talk+0.076
 build+0.075
Neg logit contributions
ummy-0.045
 adorable-0.044
 important-0.043
ls-0.043
 importantly-0.041
Token for
Token ID329
Feature activation+1.02e-01

Pos logit contributions
 join+0.122
 meet+0.121
 greet+0.114
 contain+0.110
talk+0.110
Neg logit contributions
ls-0.076
ummy-0.076
 adorable-0.075
 important-0.075
 problem-0.073
Token letting
Token ID9616
Feature activation+1.71e-01

Pos logit contributions
talk+0.244
 meet+0.237
 join+0.234
 greet+0.228
 contain+0.227
Neg logit contributions
 important-0.139
 adorable-0.138
ummy-0.134
 disgusting-0.129
 importantly-0.129
Token me
Token ID502
Feature activation+8.12e-02

Pos logit contributions
 join+0.420
 meet+0.419
talk+0.411
 greet+0.406
 contain+0.405
Neg logit contributions
ummy-0.216
 important-0.209
 adorable-0.207
ls-0.207
 problem-0.195
Token play
Token ID711
Feature activation+8.68e-02

Pos logit contributions
talk+0.095
 meet+0.082
 greet+0.082
 gra+0.078
 contain+0.075
Neg logit contributions
 adorable-0.202
 important-0.199
ummy-0.199
ls-0.193
 cute-0.190
Token with
Token ID351
Feature activation+7.75e-02

Pos logit contributions
 join+0.191
 meet+0.189
 greet+0.179
talk+0.176
 build+0.175
Neg logit contributions
ummy-0.138
 adorable-0.134
 important-0.134
ls-0.134
 problem-0.130
Token you
Token ID345
Feature activation+6.72e-02

Pos logit contributions
 join+0.167
 meet+0.165
 greet+0.156
 build+0.151
talk+0.147
Neg logit contributions
ummy-0.124
 important-0.123
 adorable-0.122
 problem-0.121
ls-0.121
Token."
Token ID526
Feature activation+4.12e-02

Pos logit contributions
 join+0.157
 meet+0.157
 greet+0.148
 build+0.145
talk+0.145
Neg logit contributions
ummy-0.098
 adorable-0.095
ls-0.095
 important-0.095
 problem-0.091
Token heart
Token ID2612
Feature activation+1.27e-02

Pos logit contributions
 meet+0.068
 join+0.068
 greet+0.064
talk+0.063
 build+0.062
Neg logit contributions
ummy-0.055
ls-0.054
 adorable-0.053
 important-0.053
ience-0.051
Token.
Token ID13
Feature activation+3.32e-02

Pos logit contributions
 join+0.027
 meet+0.027
 greet+0.026
talk+0.025
 build+0.025
Neg logit contributions
ummy-0.021
 adorable-0.020
ls-0.020
 important-0.020
 problem-0.019
Token Thank
Token ID6952
Feature activation+4.07e-02

Pos logit contributions
 meet+0.074
 join+0.074
talk+0.073
 contain+0.072
 greet+0.071
Neg logit contributions
ummy-0.049
 adorable-0.049
 important-0.048
 importantly-0.047
ience-0.045
Token you
Token ID345
Feature activation+5.22e-02

Pos logit contributions
 meet+0.105
 join+0.104
talk+0.103
 greet+0.103
 build+0.102
Neg logit contributions
ummy-0.047
 adorable-0.045
 important-0.044
ls-0.044
 importantly-0.042
Token for
Token ID329
Feature activation+9.77e-02

Pos logit contributions
 join+0.121
 meet+0.119
 greet+0.111
 build+0.108
talk+0.107
Neg logit contributions
ls-0.077
 adorable-0.075
 important-0.073
ummy-0.072
 disgusting-0.072
Token letting
Token ID9616
Feature activation+1.71e-01

Pos logit contributions
talk+0.238
 join+0.226
 meet+0.225
 contain+0.222
 greet+0.219
Neg logit contributions
 important-0.138
 adorable-0.133
 cute-0.124
 disgusting-0.122
 problem-0.121
Token me
Token ID502
Feature activation+7.53e-02

Pos logit contributions
 join+0.491
 meet+0.486
talk+0.482
 contain+0.480
 greet+0.477
Neg logit contributions
ummy-0.143
 important-0.142
 adorable-0.136
ls-0.133
 problem-0.127
Token go
Token ID467
Feature activation+7.85e-02

Pos logit contributions
talk+0.093
 gra+0.085
 greet+0.084
 meet+0.082
 join+0.079
Neg logit contributions
 adorable-0.182
 important-0.180
ummy-0.173
 cute-0.172
 lazy-0.170
Token.
Token ID13
Feature activation+3.90e-02

Pos logit contributions
 join+0.174
 meet+0.173
 greet+0.167
talk+0.167
 gra+0.165
Neg logit contributions
ls-0.117
ummy-0.116
 important-0.115
 adorable-0.115
 problem-0.109
Token I
Token ID314
Feature activation+8.33e-02

Pos logit contributions
 meet+0.087
 join+0.087
talk+0.086
 contain+0.084
 greet+0.084
Neg logit contributions
ummy-0.058
 adorable-0.057
 important-0.056
 importantly-0.055
ience-0.052
Token love
Token ID1842
Feature activation-2.18e-03

Pos logit contributions
 meet+0.123
 join+0.121
talk+0.118
 greet+0.117
 contain+0.116
Neg logit contributions
 important-0.186
 adorable-0.181
 problem-0.177
ience-0.174
 cute-0.173
Token it
Token ID340
Feature activation-1.77e-02

Pos logit contributions
ummy+0.129
 adorable+0.124
ls+0.123
 important+0.121
 importantly+0.118
Neg logit contributions
 join-0.191
 meet-0.190
 greet-0.177
 build-0.175
talk-0.169
Token can
Token ID460
Feature activation+3.16e-02

Pos logit contributions
ummy+0.031
 adorable+0.029
 importantly+0.028
 important+0.028
ls+0.028
Neg logit contributions
 meet-0.037
 join-0.037
 greet-0.034
 build-0.034
talk-0.034
Token forgive
Token ID20927
Feature activation+4.22e-02

Pos logit contributions
talk+0.053
keeping+0.050
 greet+0.047
 meet+0.045
 contain+0.045
Neg logit contributions
 important-0.066
 adorable-0.065
 problem-0.064
 disgusting-0.062
 cute-0.062
Token you
Token ID345
Feature activation+4.96e-02

Pos logit contributions
 join+0.100
 meet+0.100
 greet+0.096
talk+0.095
 build+0.093
Neg logit contributions
ummy-0.059
 adorable-0.058
 important-0.058
ls-0.057
 problem-0.055
Token for
Token ID329
Feature activation+8.97e-02

Pos logit contributions
 meet+0.104
 join+0.104
 greet+0.099
talk+0.099
 build+0.096
Neg logit contributions
ummy-0.083
 adorable-0.082
 important-0.082
ls-0.081
 problem-0.078
Token letting
Token ID9616
Feature activation+1.70e-01

Pos logit contributions
 join+0.203
 meet+0.202
talk+0.194
 greet+0.193
 contain+0.188
Neg logit contributions
 important-0.135
ummy-0.134
 adorable-0.134
ls-0.130
 problem-0.128
Token it
Token ID340
Feature activation+2.79e-02

Pos logit contributions
 join+0.470
talk+0.468
 meet+0.465
 contain+0.463
 greet+0.459
Neg logit contributions
 important-0.165
ummy-0.156
 adorable-0.155
ls-0.149
 problem-0.145
Token go
Token ID467
Feature activation+7.29e-02

Pos logit contributions
talk+0.043
 greet+0.040
 meet+0.039
 contain+0.039
 join+0.038
Neg logit contributions
 adorable-0.061
 important-0.061
ls-0.061
 problem-0.059
 cute-0.058
Token.
Token ID13
Feature activation+1.17e-02

Pos logit contributions
 join+0.151
 meet+0.150
 greet+0.145
talk+0.145
 contain+0.141
Neg logit contributions
 important-0.121
ls-0.121
 adorable-0.121
ummy-0.120
 problem-0.116
Token That
Token ID1320
Feature activation-5.80e-02

Pos logit contributions
 meet+0.024
 join+0.024
talk+0.024
 contain+0.024
 greet+0.024
Neg logit contributions
ummy-0.019
 adorable-0.019
 important-0.019
 importantly-0.018
ience-0.018
Token's
Token ID338
Feature activation-1.68e-02

Pos logit contributions
ls+0.101
ummy+0.101
 importantly+0.097
 adorable+0.092
ience+0.091
Neg logit contributions
 meet-0.122
 join-0.119
 build-0.111
 greet-0.110
 make-0.108
Token.
Token ID13
Feature activation+3.54e-02

Pos logit contributions
 join+0.013
 meet+0.012
talk+0.012
 build+0.012
 greet+0.012
Neg logit contributions
 importantly-0.010
 important-0.010
ience-0.010
 adorable-0.010
ls-0.010
Token And
Token ID843
Feature activation-4.82e-02

Pos logit contributions
 meet+0.077
 join+0.076
talk+0.073
 greet+0.073
 contain+0.072
Neg logit contributions
ummy-0.057
 adorable-0.055
 important-0.054
ls-0.053
 importantly-0.052
Token thank
Token ID5875
Feature activation+3.73e-02

Pos logit contributions
ummy+0.075
 adorable+0.074
 important+0.073
ls+0.073
 problem+0.070
Neg logit contributions
 meet-0.107
 join-0.107
 greet-0.102
talk-0.101
 build-0.098
Token you
Token ID345
Feature activation+5.77e-02

Pos logit contributions
 meet+0.091
 join+0.090
talk+0.089
 greet+0.088
 contain+0.087
Neg logit contributions
ummy-0.048
ls-0.047
 adorable-0.047
 important-0.046
 problem-0.044
Token for
Token ID329
Feature activation+9.90e-02

Pos logit contributions
 join+0.129
 meet+0.129
 greet+0.121
talk+0.120
 build+0.118
Neg logit contributions
ummy-0.089
ls-0.088
 adorable-0.088
 important-0.087
ience-0.084
Token letting
Token ID9616
Feature activation+1.70e-01

Pos logit contributions
talk+0.242
 join+0.220
 contain+0.220
 meet+0.219
 greet+0.215
Neg logit contributions
 important-0.142
 adorable-0.139
 disgusting-0.130
 cute-0.129
ummy-0.128
Token us
Token ID514
Feature activation+9.96e-02

Pos logit contributions
 join+0.468
 contain+0.464
 meet+0.460
talk+0.460
 greet+0.454
Neg logit contributions
 important-0.161
ummy-0.161
 adorable-0.155
ls-0.151
 problem-0.146
Token play
Token ID711
Feature activation+8.57e-02

Pos logit contributions
talk+0.124
 greet+0.113
 gra+0.108
 meet+0.104
 contain+0.102
Neg logit contributions
 adorable-0.231
 important-0.228
ummy-0.225
 cute-0.220
 disgusting-0.218
Token with
Token ID351
Feature activation+7.94e-02

Pos logit contributions
 meet+0.186
 join+0.186
 greet+0.177
talk+0.176
 build+0.171
Neg logit contributions
ummy-0.137
 adorable-0.135
 important-0.134
ls-0.133
 problem-0.128
Token Max
Token ID5436
Feature activation+6.30e-02

Pos logit contributions
 meet+0.165
 join+0.165
 greet+0.154
 build+0.152
talk+0.145
Neg logit contributions
 importantly-0.131
ummy-0.130
 problem-0.129
 adorable-0.129
ls-0.129
Token.
Token ID13
Feature activation+2.66e-02

Pos logit contributions
 join+0.128
 meet+0.127
 build+0.119
talk+0.118
 greet+0.117
Neg logit contributions
ummy-0.109
 importantly-0.106
ls-0.104
 adorable-0.103
 important-0.102
Token\n
Token ID198
Feature activation+2.02e-02

Pos logit contributions
 join+0.050
 meet+0.050
 greet+0.048
talk+0.047
 build+0.047
Neg logit contributions
ummy-0.042
 adorable-0.041
 important-0.041
ls-0.041
 problem-0.039
Token"
Token ID1
Feature activation+8.38e-02

Pos logit contributions
 join+0.041
 meet+0.041
 greet+0.039
talk+0.039
 build+0.038
Neg logit contributions
ummy-0.034
 adorable-0.034
 important-0.034
ls-0.033
 importantly-0.032
TokenThank
Token ID10449
Feature activation+2.81e-02

Pos logit contributions
 meet+0.151
 join+0.149
 greet+0.143
 build+0.141
talk+0.139
Neg logit contributions
 adorable-0.161
 important-0.160
ummy-0.158
ls-0.155
 importantly-0.154
Token you
Token ID345
Feature activation+5.53e-02

Pos logit contributions
 meet+0.068
 join+0.067
 greet+0.065
talk+0.065
 build+0.064
Neg logit contributions
ummy-0.039
 adorable-0.037
 important-0.037
ls-0.037
 problem-0.035
Token for
Token ID329
Feature activation+1.00e-01

Pos logit contributions
 join+0.140
 meet+0.137
 greet+0.131
 contain+0.129
talk+0.127
Neg logit contributions
ls-0.070
 adorable-0.066
 important-0.065
 importantly-0.065
ience-0.064
Token letting
Token ID9616
Feature activation+1.70e-01

Pos logit contributions
 meet+0.222
 join+0.222
 greet+0.211
talk+0.207
 build+0.205
Neg logit contributions
ummy-0.157
ls-0.153
 adorable-0.153
 important-0.152
 problem-0.147
Token us
Token ID514
Feature activation+1.03e-01

Pos logit contributions
 join+0.466
 meet+0.464
talk+0.453
 greet+0.450
 contain+0.446
Neg logit contributions
ummy-0.168
 important-0.163
 adorable-0.162
ls-0.160
 problem-0.150
Token pet
Token ID4273
Feature activation+7.34e-02

Pos logit contributions
talk+0.130
 greet+0.123
 gra+0.119
 meet+0.116
 join+0.108
Neg logit contributions
 adorable-0.237
ummy-0.237
 important-0.235
ls-0.229
 problem-0.228
Token him
Token ID683
Feature activation+6.35e-02

Pos logit contributions
 meet+0.149
 join+0.148
 greet+0.140
talk+0.140
 build+0.138
Neg logit contributions
ls-0.127
ummy-0.127
 adorable-0.124
 important-0.122
 problem-0.118
Token,"
Token ID553
Feature activation+3.59e-02

Pos logit contributions
 join+0.135
 meet+0.135
 greet+0.126
talk+0.124
 build+0.124
Neg logit contributions
ummy-0.104
 adorable-0.102
 important-0.101
 importantly-0.101
ls-0.100
Token Ben
Token ID3932
Feature activation+1.63e-02

Pos logit contributions
 join+0.086
 meet+0.085
 greet+0.081
talk+0.079
 build+0.078
Neg logit contributions
ummy-0.052
 important-0.050
 adorable-0.049
ls-0.049
 problem-0.048
Token today
Token ID1909
Feature activation-3.39e-02

Pos logit contributions
ummy+0.009
ls+0.008
 adorable+0.008
 important+0.008
 problem+0.008
Neg logit contributions
 join-0.011
 meet-0.011
 greet-0.010
talk-0.010
 build-0.010
Token.
Token ID13
Feature activation+2.64e-02

Pos logit contributions
ls+0.060
 adorable+0.059
ummy+0.059
ience+0.058
 important+0.058
Neg logit contributions
 join-0.067
 meet-0.067
 greet-0.062
 build-0.061
talk-0.061
Token Thank
Token ID6952
Feature activation+3.76e-02

Pos logit contributions
 join+0.059
 meet+0.059
 contain+0.059
talk+0.059
 greet+0.057
Neg logit contributions
ummy-0.039
 adorable-0.037
 important-0.036
 importantly-0.036
 cute-0.033
Token you
Token ID345
Feature activation+5.21e-02

Pos logit contributions
 meet+0.095
 join+0.095
talk+0.094
 greet+0.094
 build+0.093
Neg logit contributions
ummy-0.045
 adorable-0.043
 important-0.042
ls-0.041
 importantly-0.039
Token for
Token ID329
Feature activation+9.68e-02

Pos logit contributions
 join+0.115
 meet+0.114
 greet+0.106
talk+0.103
 build+0.103
Neg logit contributions
ls-0.083
 adorable-0.081
 important-0.079
ience-0.078
ummy-0.078
Token letting
Token ID9616
Feature activation+1.70e-01

Pos logit contributions
talk+0.231
 join+0.218
 meet+0.218
 contain+0.216
 greet+0.212
Neg logit contributions
 important-0.140
 adorable-0.137
ummy-0.128
 disgusting-0.127
 cute-0.127
Token me
Token ID502
Feature activation+7.57e-02

Pos logit contributions
 join+0.487
talk+0.481
 meet+0.481
 contain+0.479
 greet+0.473
Neg logit contributions
 important-0.143
ummy-0.141
 adorable-0.137
ls-0.130
 problem-0.124
Token brush
Token ID14093
Feature activation+7.35e-02

Pos logit contributions
talk+0.088
 greet+0.078
 meet+0.077
 gra+0.076
 contain+0.074
Neg logit contributions
 adorable-0.189
 important-0.186
ummy-0.183
 cute-0.179
 problem-0.177
Token your
Token ID534
Feature activation+4.09e-02

Pos logit contributions
 join+0.157
 meet+0.156
 greet+0.148
talk+0.145
 build+0.144
Neg logit contributions
ummy-0.120
 adorable-0.117
 important-0.117
ls-0.117
 problem-0.114
Token hair
Token ID4190
Feature activation+5.60e-02

Pos logit contributions
 meet+0.089
 join+0.089
 greet+0.085
talk+0.084
 build+0.082
Neg logit contributions
ummy-0.065
 adorable-0.064
 important-0.064
ls-0.063
 problem-0.061
Token."
Token ID526
Feature activation+4.34e-02

Pos logit contributions
 meet+0.136
 join+0.136
 greet+0.130
talk+0.129
 build+0.126
Neg logit contributions
ummy-0.075
 adorable-0.073
 important-0.073
ls-0.072
 problem-0.069
Token for
Token ID329
Feature activation+1.05e-01

Pos logit contributions
 join+0.124
 meet+0.124
 greet+0.115
 contain+0.112
talk+0.111
Neg logit contributions
ummy-0.081
ls-0.081
 adorable-0.081
 important-0.080
 problem-0.078
Token soaking
Token ID41714
Feature activation+1.78e-02

Pos logit contributions
talk+0.245
 join+0.234
 meet+0.234
 contain+0.230
 greet+0.227
Neg logit contributions
 important-0.155
 adorable-0.151
ummy-0.145
 disgusting-0.144
 cute-0.140
Token the
Token ID262
Feature activation+6.53e-02

Pos logit contributions
 join+0.036
 meet+0.035
 greet+0.033
 build+0.033
 contain+0.031
Neg logit contributions
ience-0.030
 problem-0.030
 importantly-0.030
ummy-0.030
ls-0.030
Token crack
Token ID8469
Feature activation+2.33e-02

Pos logit contributions
 meet+0.149
 join+0.149
 greet+0.142
 build+0.138
talk+0.138
Neg logit contributions
ummy-0.095
ls-0.094
 adorable-0.092
 important-0.092
 importantly-0.091
Token and
Token ID290
Feature activation-3.28e-02

Pos logit contributions
 join+0.056
 meet+0.055
 greet+0.053
talk+0.053
 build+0.052
Neg logit contributions
ummy-0.032
 adorable-0.031
 important-0.031
ls-0.030
 problem-0.030
Token letting
Token ID9616
Feature activation+1.69e-01

Pos logit contributions
 adorable+0.060
 important+0.060
 disgusting+0.059
ummy+0.059
ls+0.058
Neg logit contributions
talk-0.062
 meet-0.060
 join-0.060
 contain-0.057
 greet-0.057
Token me
Token ID502
Feature activation+8.28e-02

Pos logit contributions
 contain+0.379
 join+0.376
talk+0.374
 meet+0.371
 greet+0.365
Neg logit contributions
 important-0.246
ummy-0.244
ls-0.237
 adorable-0.235
 disgusting-0.228
Token out
Token ID503
Feature activation+8.24e-02

Pos logit contributions
talk+0.101
 meet+0.090
 greet+0.088
 contain+0.084
 join+0.083
Neg logit contributions
 adorable-0.200
ummy-0.198
 important-0.196
ls-0.193
 cute-0.190
Token.
Token ID13
Feature activation+3.61e-02

Pos logit contributions
 meet+0.183
 join+0.182
 greet+0.174
talk+0.173
 build+0.168
Neg logit contributions
ummy-0.127
 adorable-0.125
 important-0.124
ls-0.124
 problem-0.119
Token You
Token ID921
Feature activation+4.27e-02

Pos logit contributions
 meet+0.076
 join+0.075
 greet+0.071
talk+0.070
 build+0.069
Neg logit contributions
ummy-0.061
ls-0.060
 important-0.060
 adorable-0.060
 problem-0.058
Token are
Token ID389
Feature activation-9.12e-03

Pos logit contributions
talk+0.056
 greet+0.054
 join+0.054
 meet+0.053
 gra+0.050
Neg logit contributions
 important-0.100
 adorable-0.099
ience-0.096
 problem-0.096
 cute-0.096
TokenMom
Token ID29252
Feature activation+5.01e-02

Pos logit contributions
 join+0.072
 build+0.072
 contain+0.071
 greet+0.070
 gra+0.069
Neg logit contributions
 adorable-0.048
ummy-0.048
 important-0.045
 importantly-0.045
very-0.045
Token,
Token ID11
Feature activation-5.31e-02

Pos logit contributions
 meet+0.140
 join+0.140
 greet+0.132
 build+0.132
talk+0.130
Neg logit contributions
ls-0.049
 important-0.048
 adorable-0.047
ummy-0.046
ience-0.045
Token thank
Token ID5875
Feature activation+3.30e-02

Pos logit contributions
ummy+0.088
 adorable+0.086
 important+0.086
ls+0.085
 problem+0.083
Neg logit contributions
 meet-0.113
 join-0.113
 greet-0.107
talk-0.106
 build-0.104
Token you
Token ID345
Feature activation+5.20e-02

Pos logit contributions
talk+0.084
 meet+0.083
 join+0.083
 greet+0.082
 build+0.081
Neg logit contributions
ummy-0.040
ls-0.037
 adorable-0.037
 important-0.035
 problem-0.034
Token for
Token ID329
Feature activation+9.24e-02

Pos logit contributions
 join+0.130
 meet+0.128
 greet+0.120
 build+0.118
talk+0.116
Neg logit contributions
ls-0.069
 adorable-0.067
 important-0.066
ummy-0.066
ience-0.065
Token letting
Token ID9616
Feature activation+1.69e-01

Pos logit contributions
talk+0.223
 meet+0.209
 join+0.207
 greet+0.203
 contain+0.203
Neg logit contributions
 important-0.131
 adorable-0.130
 disgusting-0.122
 cute-0.121
ummy-0.120
Token me
Token ID502
Feature activation+6.70e-02

Pos logit contributions
talk+0.473
 join+0.472
 meet+0.469
 contain+0.467
 greet+0.463
Neg logit contributions
 important-0.150
ummy-0.150
 adorable-0.143
ls-0.140
 problem-0.136
Token share
Token ID2648
Feature activation+8.86e-02

Pos logit contributions
talk+0.081
 greet+0.072
 gra+0.070
 meet+0.068
 contain+0.065
Neg logit contributions
 adorable-0.164
ummy-0.161
 important-0.161
 cute-0.157
ls-0.156
Token Rose
Token ID8049
Feature activation+3.05e-02

Pos logit contributions
 join+0.196
 meet+0.194
 greet+0.183
 build+0.175
talk+0.175
Neg logit contributions
ummy-0.141
ls-0.137
 important-0.136
ience-0.136
 importantly-0.136
Token with
Token ID351
Feature activation+6.80e-02

Pos logit contributions
 join+0.071
 meet+0.071
 greet+0.067
talk+0.066
 build+0.065
Neg logit contributions
ummy-0.044
ls-0.043
 important-0.043
 adorable-0.042
 importantly-0.042
Token them
Token ID606
Feature activation+9.04e-02

Pos logit contributions
 join+0.160
 meet+0.159
 greet+0.149
talk+0.144
 build+0.143
Neg logit contributions
ummy-0.098
 important-0.095
 adorable-0.095
ls-0.095
 problem-0.093
Token
Token ID198
Feature activation+3.60e-02

Pos logit contributions
 join+0.084
 meet+0.084
 greet+0.080
talk+0.080
 contain+0.079
Neg logit contributions
ummy-0.064
ls-0.063
 important-0.063
 adorable-0.062
 problem-0.059
Token"
Token ID1
Feature activation+6.77e-02

Pos logit contributions
 meet+0.077
 join+0.077
 greet+0.073
talk+0.072
 build+0.070
Neg logit contributions
ummy-0.060
 adorable-0.059
 important-0.058
ls-0.058
 problem-0.056
TokenThank
Token ID10449
Feature activation+1.69e-02

Pos logit contributions
 meet+0.129
 join+0.128
 greet+0.121
 build+0.121
 contain+0.119
Neg logit contributions
 adorable-0.126
 important-0.123
ls-0.121
ummy-0.120
 cute-0.120
Token you
Token ID345
Feature activation+5.09e-02

Pos logit contributions
 meet+0.043
 join+0.042
 greet+0.042
talk+0.041
 build+0.041
Neg logit contributions
ummy-0.021
ls-0.021
 adorable-0.021
 important-0.020
 importantly-0.019
Token for
Token ID329
Feature activation+9.42e-02

Pos logit contributions
 join+0.134
 meet+0.132
 greet+0.124
 build+0.123
talk+0.121
Neg logit contributions
ls-0.058
 important-0.056
ience-0.056
 adorable-0.055
 importantly-0.054
Token letting
Token ID9616
Feature activation+1.69e-01

Pos logit contributions
talk+0.227
 join+0.213
 contain+0.212
 meet+0.210
 greet+0.208
Neg logit contributions
 important-0.138
 adorable-0.137
 disgusting-0.128
 cute-0.128
ummy-0.124
Token us
Token ID514
Feature activation+9.61e-02

Pos logit contributions
 join+0.504
 contain+0.498
 meet+0.497
talk+0.493
 greet+0.491
Neg logit contributions
ummy-0.123
 important-0.120
 adorable-0.117
ls-0.116
 problem-0.110
Token pet
Token ID4273
Feature activation+6.86e-02

Pos logit contributions
talk+0.127
 greet+0.111
 gra+0.110
 contain+0.107
 build+0.106
Neg logit contributions
 adorable-0.228
 important-0.222
 cute-0.220
ummy-0.218
 disgusting-0.216
Token your
Token ID534
Feature activation+4.96e-02

Pos logit contributions
 meet+0.129
 join+0.129
 greet+0.121
talk+0.120
 build+0.118
Neg logit contributions
ummy-0.129
ls-0.127
 adorable-0.127
 important-0.126
 problem-0.122
Token dog
Token ID3290
Feature activation+6.53e-02

Pos logit contributions
 join+0.108
 meet+0.108
 greet+0.101
talk+0.097
 build+0.096
Neg logit contributions
ummy-0.082
ls-0.080
 important-0.079
ience-0.078
 importantly-0.078
Token,"
Token ID553
Feature activation+3.25e-02

Pos logit contributions
 meet+0.169
 join+0.169
 greet+0.160
talk+0.158
 build+0.156
Neg logit contributions
ummy-0.079
 important-0.076
 adorable-0.075
ls-0.074
 importantly-0.074
Tokenon
Token ID261
Feature activation+2.09e-02

Pos logit contributions
 join+0.055
 meet+0.055
 greet+0.052
talk+0.052
 build+0.051
Neg logit contributions
ummy-0.045
 important-0.044
 adorable-0.044
ls-0.044
 problem-0.042
Token.
Token ID13
Feature activation+4.15e-02

Pos logit contributions
 join+0.045
 meet+0.045
talk+0.043
 greet+0.043
 build+0.042
Neg logit contributions
ummy-0.033
 adorable-0.033
 important-0.032
ls-0.032
 importantly-0.031
Token Thank
Token ID6952
Feature activation+3.93e-02

Pos logit contributions
 meet+0.088
 join+0.087
 greet+0.084
talk+0.084
 contain+0.083
Neg logit contributions
ummy-0.069
 adorable-0.066
 important-0.066
ls-0.064
ience-0.062
Token you
Token ID345
Feature activation+5.09e-02

Pos logit contributions
 meet+0.100
 join+0.099
 greet+0.097
talk+0.096
 gra+0.095
Neg logit contributions
ummy-0.048
 adorable-0.047
ls-0.046
 important-0.045
 importantly-0.042
Token for
Token ID329
Feature activation+9.45e-02

Pos logit contributions
 join+0.109
 meet+0.107
 greet+0.100
talk+0.099
 build+0.098
Neg logit contributions
 adorable-0.082
ls-0.082
ience-0.081
ummy-0.080
 important-0.080
Token letting
Token ID9616
Feature activation+1.68e-01

Pos logit contributions
talk+0.219
 join+0.214
 meet+0.214
 greet+0.208
 contain+0.208
Neg logit contributions
 important-0.138
 adorable-0.136
ummy-0.131
 disgusting-0.128
 problem-0.127
Token me
Token ID502
Feature activation+7.39e-02

Pos logit contributions
 join+0.486
 meet+0.480
 contain+0.479
talk+0.476
 greet+0.474
Neg logit contributions
ummy-0.142
 important-0.136
ls-0.134
 adorable-0.132
 problem-0.125
Token borrow
Token ID8804
Feature activation+9.72e-02

Pos logit contributions
talk+0.097
 meet+0.094
 greet+0.093
 contain+0.093
 gra+0.092
Neg logit contributions
ummy-0.174
 adorable-0.173
 important-0.170
 problem-0.163
ls-0.163
Token it
Token ID340
Feature activation+4.17e-02

Pos logit contributions
 meet+0.227
 join+0.227
 greet+0.216
talk+0.214
 build+0.210
Neg logit contributions
ummy-0.139
 adorable-0.136
 important-0.136
ls-0.135
 problem-0.130
Token,"
Token ID553
Feature activation+3.44e-02

Pos logit contributions
 join+0.088
 meet+0.088
 greet+0.083
talk+0.082
 build+0.081
Neg logit contributions
ummy-0.069
 adorable-0.068
 important-0.067
ls-0.067
ience-0.065
Token Ben
Token ID3932
Feature activation+1.33e-02

Pos logit contributions
 join+0.073
 meet+0.070
 build+0.066
 greet+0.065
 make+0.062
Neg logit contributions
 important-0.061
ummy-0.059
ience-0.058
 problem-0.058
 adorable-0.057
Token thanked
Token ID26280
Feature activation+6.29e-02

Pos logit contributions
 join+0.064
 meet+0.064
 greet+0.061
talk+0.059
 build+0.059
Neg logit contributions
ummy-0.042
 important-0.040
ls-0.040
 adorable-0.039
ience-0.039
Token the
Token ID262
Feature activation+5.73e-02

Pos logit contributions
 meet+0.128
 join+0.127
talk+0.122
 greet+0.120
 build+0.118
Neg logit contributions
ummy-0.108
 adorable-0.106
 important-0.105
ls-0.105
 problem-0.100
Token star
Token ID3491
Feature activation+9.08e-03

Pos logit contributions
 meet+0.140
 join+0.140
 greet+0.133
 build+0.128
talk+0.128
Neg logit contributions
ummy-0.077
ls-0.075
 adorable-0.073
 important-0.073
ience-0.070
Tokenfish
Token ID11084
Feature activation+4.92e-02

Pos logit contributions
 join+0.023
 meet+0.023
 greet+0.022
talk+0.022
 build+0.021
Neg logit contributions
ummy-0.011
 adorable-0.011
 important-0.011
ls-0.011
 problem-0.010
Token for
Token ID329
Feature activation+8.29e-02

Pos logit contributions
 join+0.120
 meet+0.119
 greet+0.114
talk+0.111
 build+0.109
Neg logit contributions
ummy-0.067
 adorable-0.064
 important-0.064
ls-0.064
 problem-0.062
Token letting
Token ID9616
Feature activation+1.68e-01

Pos logit contributions
talk+0.177
 meet+0.177
 join+0.175
 greet+0.168
 contain+0.168
Neg logit contributions
 important-0.131
 adorable-0.129
ummy-0.125
 importantly-0.122
 disgusting-0.120
Token them
Token ID606
Feature activation+1.14e-01

Pos logit contributions
 meet+0.344
talk+0.344
 join+0.342
 contain+0.334
 greet+0.331
Neg logit contributions
 important-0.269
ls-0.268
ummy-0.267
 adorable-0.262
 problem-0.251
Token touch
Token ID3638
Feature activation+7.26e-02

Pos logit contributions
talk+0.155
 greet+0.138
 meet+0.137
 gra+0.129
 join+0.127
Neg logit contributions
 adorable-0.260
 important-0.259
ls-0.249
 cute-0.247
 disgusting-0.245
Token it
Token ID340
Feature activation+4.11e-02

Pos logit contributions
 join+0.147
 meet+0.146
 greet+0.138
talk+0.133
 build+0.132
Neg logit contributions
ummy-0.127
 important-0.124
 adorable-0.124
ls-0.123
 problem-0.121
Token and
Token ID290
Feature activation-3.92e-02

Pos logit contributions
 join+0.093
 meet+0.093
 greet+0.088
talk+0.087
 build+0.086
Neg logit contributions
ummy-0.062
 adorable-0.060
 important-0.060
ls-0.060
 problem-0.058
Token left
Token ID1364
Feature activation+4.29e-02

Pos logit contributions
 adorable+0.065
 important+0.065
ls+0.064
ummy+0.062
 problem+0.061
Neg logit contributions
 meet-0.079
 join-0.079
talk-0.077
 greet-0.075
 contain-0.074
Token different
Token ID1180
Feature activation-7.57e-02

Pos logit contributions
ummy+0.128
ls+0.125
 adorable+0.121
 important+0.120
ience+0.120
Neg logit contributions
 meet-0.152
 join-0.152
 greet-0.143
 build-0.138
talk-0.137
Token.
Token ID13
Feature activation+2.79e-02

Pos logit contributions
ummy+0.127
ls+0.125
 adorable+0.124
 important+0.123
ience+0.122
Neg logit contributions
 join-0.158
 meet-0.158
 greet-0.148
talk-0.146
 build-0.144
Token Thank
Token ID6952
Feature activation+3.15e-02

Pos logit contributions
 meet+0.058
 join+0.057
 contain+0.056
talk+0.056
 greet+0.056
Neg logit contributions
ummy-0.045
 adorable-0.045
 important-0.043
 importantly-0.043
 cute-0.041
Token you
Token ID345
Feature activation+4.77e-02

Pos logit contributions
 meet+0.080
 join+0.079
talk+0.079
 greet+0.079
 build+0.078
Neg logit contributions
ummy-0.036
 adorable-0.036
ls-0.035
 important-0.035
 importantly-0.033
Token for
Token ID329
Feature activation+9.09e-02

Pos logit contributions
 join+0.108
 meet+0.107
 greet+0.100
 build+0.097
talk+0.097
Neg logit contributions
ls-0.072
 adorable-0.071
ummy-0.069
 important-0.069
 importantly-0.068
Token letting
Token ID9616
Feature activation+1.68e-01

Pos logit contributions
talk+0.217
 join+0.205
 meet+0.204
 contain+0.203
 greet+0.199
Neg logit contributions
 important-0.132
 adorable-0.128
 disgusting-0.121
 cute-0.121
 problem-0.119
Token me
Token ID502
Feature activation+7.15e-02

Pos logit contributions
 join+0.463
 meet+0.458
talk+0.456
 contain+0.455
 greet+0.450
Neg logit contributions
ummy-0.160
 important-0.157
ls-0.153
 adorable-0.151
 problem-0.144
Token play
Token ID711
Feature activation+8.29e-02

Pos logit contributions
talk+0.083
 gra+0.076
 greet+0.076
 meet+0.075
 contain+0.072
Neg logit contributions
 adorable-0.177
 important-0.174
ummy-0.172
 cute-0.168
 problem-0.165
Token with
Token ID351
Feature activation+7.89e-02

Pos logit contributions
 meet+0.180
 join+0.179
 greet+0.171
talk+0.170
 contain+0.165
Neg logit contributions
ummy-0.132
 adorable-0.131
 important-0.130
ls-0.128
 problem-0.124
Token it
Token ID340
Feature activation+3.29e-02

Pos logit contributions
 meet+0.181
 join+0.181
 greet+0.170
 build+0.166
talk+0.163
Neg logit contributions
ummy-0.115
ls-0.113
 important-0.112
 importantly-0.112
 adorable-0.112
Token."
Token ID526
Feature activation+3.80e-02

Pos logit contributions
 join+0.060
 meet+0.059
 greet+0.054
 build+0.054
talk+0.053
Neg logit contributions
ummy-0.065
ls-0.064
 adorable-0.063
 importantly-0.063
 important-0.062
Token he
Token ID339
Feature activation-2.48e-02

Pos logit contributions
ummy+0.103
ls+0.102
 adorable+0.099
 important+0.095
ience+0.093
Neg logit contributions
 meet-0.179
 join-0.179
 greet-0.168
 build-0.166
talk-0.163
Token could
Token ID714
Feature activation+8.55e-02

Pos logit contributions
ummy+0.039
ls+0.038
 adorable+0.038
 important+0.037
 problem+0.036
Neg logit contributions
 meet-0.055
 join-0.055
 greet-0.052
talk-0.051
 build-0.051
Token serve
Token ID4691
Feature activation+6.24e-02

Pos logit contributions
talk+0.113
 gra+0.105
 meet+0.105
 greet+0.104
 join+0.099
Neg logit contributions
 important-0.204
ummy-0.203
 adorable-0.200
 problem-0.194
 cute-0.188
Token others
Token ID1854
Feature activation+8.43e-02

Pos logit contributions
 join+0.132
 meet+0.130
 greet+0.123
 build+0.120
talk+0.119
Neg logit contributions
ummy-0.104
ls-0.104
 adorable-0.102
ience-0.101
 important-0.101
Token by
Token ID416
Feature activation+1.09e-01

Pos logit contributions
 join+0.193
 meet+0.189
 greet+0.179
 build+0.174
 contain+0.173
Neg logit contributions
ummy-0.130
ls-0.127
 adorable-0.123
 important-0.122
 problem-0.121
Token letting
Token ID9616
Feature activation+1.68e-01

Pos logit contributions
 join+0.233
 meet+0.232
 build+0.217
 greet+0.217
 contain+0.206
Neg logit contributions
ls-0.180
ummy-0.175
ience-0.172
 adorable-0.172
 important-0.171
Token them
Token ID606
Feature activation+9.76e-02

Pos logit contributions
talk+0.314
 meet+0.313
 join+0.312
 greet+0.301
 contain+0.299
Neg logit contributions
 important-0.307
 adorable-0.298
ummy-0.297
ls-0.288
 problem-0.286
Token read
Token ID1100
Feature activation+9.41e-02

Pos logit contributions
talk+0.148
 meet+0.145
 greet+0.143
 join+0.142
 gra+0.137
Neg logit contributions
 adorable-0.211
 important-0.210
ummy-0.204
ls-0.201
 cute-0.199
Token his
Token ID465
Feature activation+2.51e-02

Pos logit contributions
 join+0.202
 meet+0.199
 greet+0.189
 build+0.185
talk+0.184
Neg logit contributions
ummy-0.154
ls-0.150
 adorable-0.148
 important-0.145
ience-0.144
Token story
Token ID1621
Feature activation+6.32e-02

Pos logit contributions
 meet+0.057
 join+0.057
 greet+0.053
 build+0.052
talk+0.052
Neg logit contributions
ummy-0.038
ls-0.038
 adorable-0.036
ience-0.035
 important-0.035
Token.
Token ID13
Feature activation+4.16e-03

Pos logit contributions
 meet+0.144
 join+0.144
 greet+0.137
talk+0.135
 build+0.133
Neg logit contributions
ummy-0.094
 adorable-0.092
 important-0.092
ls-0.091
 problem-0.088
Token confused
Token ID10416
Feature activation-1.41e-01

Pos logit contributions
ummy+0.165
ls+0.141
 problem+0.139
ience+0.137
ement+0.134
Neg logit contributions
 join-0.220
 meet-0.212
 build-0.204
 greet-0.202
 make-0.198
Token,
Token ID11
Feature activation-1.12e-01

Pos logit contributions
ls+0.270
ummy+0.268
ience+0.268
omo+0.265
opus+0.259
Neg logit contributions
 join-0.259
 meet-0.234
 greet-0.224
 contain-0.219
 make-0.218
Token but
Token ID475
Feature activation-5.31e-02

Pos logit contributions
ummy+0.203
ls+0.194
ience+0.185
 problem+0.182
 important+0.179
Neg logit contributions
 meet-0.231
 join-0.231
 build-0.216
 greet-0.215
 contain-0.211
Token it
Token ID340
Feature activation-4.19e-02

Pos logit contributions
ummy+0.100
ls+0.092
 problem+0.087
ience+0.084
omo+0.084
Neg logit contributions
 join-0.114
 meet-0.113
 greet-0.104
 build-0.103
 contain-0.102
Token felt
Token ID2936
Feature activation-9.15e-02

Pos logit contributions
ummy+0.089
ls+0.075
 importantly+0.073
 problem+0.071
ittens+0.071
Neg logit contributions
 join-0.081
 meet-0.078
 make-0.076
 build-0.074
 greet-0.072
Token very
Token ID845
Feature activation-2.16e-01

Pos logit contributions
ummy+0.137
ls+0.129
ience+0.117
opus+0.115
 problem+0.114
Neg logit contributions
 join-0.230
 meet-0.219
 greet-0.212
 contain-0.206
 build-0.205
Token refreshing
Token ID23056
Feature activation-1.31e-01

Pos logit contributions
ummy+0.283
ience+0.273
opus+0.257
ement+0.253
omo+0.253
Neg logit contributions
 join-0.535
 meet-0.507
 make-0.485
 greet-0.478
 build-0.474
Token.
Token ID13
Feature activation-5.00e-03

Pos logit contributions
 importantly+0.236
ls+0.233
 adorable+0.233
ience+0.233
 problem+0.232
Neg logit contributions
 join-0.243
 meet-0.232
 contain-0.220
 greet-0.218
 build-0.218
Token
Token ID220
Feature activation-5.60e-02

Pos logit contributions
ls+0.010
 important+0.010
 adorable+0.010
 problem+0.010
ience+0.010
Neg logit contributions
 meet-0.009
 join-0.009
 build-0.008
 greet-0.008
talk-0.008
Token
Token ID198
Feature activation+1.93e-02

Pos logit contributions
ummy+0.081
 important+0.075
ls+0.075
 adorable+0.074
 importantly+0.070
Neg logit contributions
 meet-0.131
talk-0.129
 greet-0.129
 join-0.129
 contain-0.128
Token
Token ID198
Feature activation+1.58e-02

Pos logit contributions
 join+0.044
 meet+0.044
talk+0.043
 greet+0.043
 contain+0.042
Neg logit contributions
ummy-0.028
 important-0.027
ls-0.027
 adorable-0.026
 importantly-0.025
Token.
Token ID13
Feature activation-4.44e-03

Pos logit contributions
talk+0.078
 join+0.077
 meet+0.077
 greet+0.076
 contain+0.074
Neg logit contributions
 important-0.039
 adorable-0.039
 problem-0.037
ummy-0.036
 disgusting-0.036
Token The
Token ID383
Feature activation-4.01e-03

Pos logit contributions
ls+0.009
 important+0.009
 adorable+0.009
ummy+0.009
 problem+0.009
Neg logit contributions
 join-0.008
 meet-0.008
talk-0.007
 build-0.007
 greet-0.007
Token small
Token ID1402
Feature activation-8.97e-02

Pos logit contributions
ummy+0.005
ls+0.004
 important+0.004
ience+0.004
 adorable+0.004
Neg logit contributions
 meet-0.011
 join-0.011
 greet-0.010
 build-0.010
 contain-0.010
Token shell
Token ID7582
Feature activation-2.94e-02

Pos logit contributions
ummy+0.140
 importantly+0.138
 important+0.137
ls+0.137
ience+0.137
Neg logit contributions
 meet-0.195
 join-0.195
 build-0.182
 greet-0.182
 contain-0.180
Token felt
Token ID2936
Feature activation-8.77e-02

Pos logit contributions
ummy+0.057
 adorable+0.049
 importantly+0.048
ww+0.047
ls+0.046
Neg logit contributions
 meet-0.057
 join-0.055
 build-0.055
 make-0.051
talk-0.051
Token very
Token ID845
Feature activation-2.16e-01

Pos logit contributions
ummy+0.106
ls+0.095
ience+0.086
omo+0.086
opus+0.082
Neg logit contributions
 join-0.240
 meet-0.239
 greet-0.227
 contain-0.223
 build-0.221
Token sad
Token ID6507
Feature activation-1.62e-01

Pos logit contributions
ummy+0.249
omo+0.231
ip+0.220
ience+0.216
opus+0.216
Neg logit contributions
 meet-0.573
 join-0.564
 contain-0.534
 greet-0.533
 find-0.528
Token,
Token ID11
Feature activation-1.15e-01

Pos logit contributions
ience+0.253
omo+0.239
ls+0.237
opus+0.233
ww+0.232
Neg logit contributions
 join-0.355
 meet-0.334
 build-0.326
 make-0.323
 contain-0.314
Token until
Token ID1566
Feature activation-3.88e-02

Pos logit contributions
ummy+0.187
ls+0.178
ience+0.168
opus+0.162
 importantly+0.162
Neg logit contributions
 meet-0.259
 join-0.253
 build-0.242
 contain-0.239
 greet-0.239
Token it
Token ID340
Feature activation-3.24e-02

Pos logit contributions
ummy+0.069
 adorable+0.066
 important+0.065
ls+0.064
ience+0.064
Neg logit contributions
 join-0.080
 meet-0.080
 greet-0.074
 build-0.073
talk-0.072
Token met
Token ID1138
Feature activation+6.69e-02

Pos logit contributions
ummy+0.060
 adorable+0.057
 important+0.057
ls+0.057
 importantly+0.055
Neg logit contributions
 meet-0.064
 join-0.064
 greet-0.060
talk-0.060
 build-0.058
Token It
Token ID632
Feature activation-1.60e-02

Pos logit contributions
ummy+0.014
ls+0.014
 adorable+0.014
 important+0.014
 problem+0.013
Neg logit contributions
 meet-0.017
 join-0.017
 greet-0.016
talk-0.016
 build-0.016
Token was
Token ID373
Feature activation-7.54e-02

Pos logit contributions
ummy+0.033
 importantly+0.029
ience+0.028
ls+0.027
ement+0.027
Neg logit contributions
 meet-0.031
 join-0.030
 build-0.030
 make-0.029
 greet-0.028
Token large
Token ID1588
Feature activation-1.16e-01

Pos logit contributions
ience+0.131
ls+0.127
ummy+0.123
opus+0.117
ement+0.112
Neg logit contributions
 join-0.165
 meet-0.161
 build-0.161
 greet-0.155
 gra-0.152
Token and
Token ID290
Feature activation-1.12e-01

Pos logit contributions
ls+0.184
ience+0.183
omo+0.174
dd+0.171
ummy+0.169
Neg logit contributions
 join-0.239
 build-0.233
 meet-0.227
 make-0.218
 greet-0.217
Token looked
Token ID3114
Feature activation-1.22e-01

Pos logit contributions
ummy+0.197
olly+0.171
ement+0.170
ls+0.166
ience+0.164
Neg logit contributions
 build-0.241
 make-0.240
 meet-0.232
 join-0.232
 greet-0.221
Token very
Token ID845
Feature activation-2.15e-01

Pos logit contributions
ummy+0.268
opus+0.254
omo+0.240
ittens+0.236
olly+0.235
Neg logit contributions
 join-0.225
 meet-0.217
 greet-0.210
 make-0.207
 contain-0.199
Token interesting
Token ID3499
Feature activation-1.53e-01

Pos logit contributions
ience+0.368
opus+0.352
ummy+0.344
ittens+0.336
ip+0.335
Neg logit contributions
 join-0.447
 greet-0.414
 meet-0.412
 make-0.409
 see-0.408
Token to
Token ID284
Feature activation+6.75e-02

Pos logit contributions
ience+0.326
omo+0.321
ls+0.309
ww+0.309
opus+0.306
Neg logit contributions
 join-0.243
 meet-0.224
 greet-0.213
 build-0.212
 contain-0.212
Token him
Token ID683
Feature activation+7.76e-03

Pos logit contributions
talk+0.104
 join+0.090
 gra+0.087
tted+0.082
 greet+0.082
Neg logit contributions
 adorable-0.154
 important-0.148
 disgusting-0.147
ummy-0.145
 ugly-0.143
Token.
Token ID13
Feature activation-6.51e-03

Pos logit contributions
 join+0.016
 meet+0.015
 build+0.015
 greet+0.014
 contain+0.014
Neg logit contributions
ummy-0.014
ls-0.014
ience-0.013
 importantly-0.013
 adorable-0.013
Token So
Token ID1406
Feature activation-8.07e-02

Pos logit contributions
ls+0.011
ummy+0.011
 adorable+0.010
 important+0.010
 problem+0.010
Neg logit contributions
 meet-0.014
 join-0.014
 greet-0.013
 build-0.013
talk-0.013
Token was
Token ID373
Feature activation-7.71e-02

Pos logit contributions
ummy+0.037
 importantly+0.033
ience+0.031
olly+0.031
ip+0.031
Neg logit contributions
 meet-0.033
 join-0.031
 build-0.031
 make-0.031
 greet-0.029
Token very
Token ID845
Feature activation-2.04e-01

Pos logit contributions
ience+0.142
ls+0.137
ummy+0.130
opus+0.129
ement+0.124
Neg logit contributions
 join-0.164
 meet-0.160
 build-0.155
 make-0.151
 gra-0.151
Token high
Token ID1029
Feature activation-1.28e-01

Pos logit contributions
ience+0.425
ement+0.409
der+0.363
opus+0.362
 importantly+0.361
Neg logit contributions
 join-0.352
 meet-0.345
 enter-0.328
 see-0.327
 gra-0.324
Token and
Token ID290
Feature activation-1.07e-01

Pos logit contributions
 importantly+0.235
omo+0.217
ience+0.208
 sudden+0.208
opus+0.201
Neg logit contributions
 join-0.223
 build-0.222
 meet-0.218
 make-0.218
talk-0.210
Token looked
Token ID3114
Feature activation-1.22e-01

Pos logit contributions
ummy+0.171
ience+0.151
opus+0.150
ls+0.149
 importantly+0.147
Neg logit contributions
 meet-0.241
 join-0.236
 build-0.229
 greet-0.225
 make-0.222
Token very
Token ID845
Feature activation-2.15e-01

Pos logit contributions
ummy+0.255
opus+0.240
omo+0.231
ittens+0.229
olly+0.228
Neg logit contributions
 join-0.233
 meet-0.228
 greet-0.216
 make-0.209
 build-0.204
Token dangerous
Token ID4923
Feature activation-1.28e-01

Pos logit contributions
opus+0.350
ience+0.348
ummy+0.347
ittens+0.347
ip+0.346
Neg logit contributions
 join-0.437
 meet-0.409
 see-0.406
 greet-0.402
 make-0.390
Token.
Token ID13
Feature activation-4.77e-03

Pos logit contributions
ience+0.241
 importantly+0.240
ummy+0.240
omo+0.239
ls+0.236
Neg logit contributions
 join-0.225
 meet-0.222
 greet-0.207
 build-0.204
talk-0.203
Token
Token ID198
Feature activation+2.51e-02

Pos logit contributions
ummy+0.008
 adorable+0.008
 important+0.008
ls+0.008
 problem+0.008
Neg logit contributions
 join-0.010
 meet-0.010
 greet-0.009
talk-0.009
 build-0.009
Token
Token ID198
Feature activation+2.15e-02

Pos logit contributions
 join+0.057
 meet+0.056
 contain+0.055
 greet+0.055
talk+0.055
Neg logit contributions
ummy-0.037
ls-0.035
 adorable-0.035
 important-0.035
 cute-0.033
TokenMaria
Token ID46827
Feature activation-1.81e-02

Pos logit contributions
 meet+0.040
 join+0.040
 greet+0.038
talk+0.037
 build+0.036
Neg logit contributions
ummy-0.041
 adorable-0.040
 important-0.040
ls-0.040
 problem-0.039
Token yellow
Token ID7872
Feature activation-1.01e-01

Pos logit contributions
ience+0.203
ls+0.187
 problem+0.180
opus+0.169
 importantly+0.167
Neg logit contributions
 join-0.234
 greet-0.213
 meet-0.211
 make-0.207
 gra-0.204
Token and
Token ID290
Feature activation-9.62e-02

Pos logit contributions
ience+0.161
ls+0.150
 important+0.148
 problem+0.145
 importantly+0.140
Neg logit contributions
 join-0.234
 meet-0.220
talk-0.207
 contain-0.207
 greet-0.206
Token her
Token ID607
Feature activation-1.48e-02

Pos logit contributions
ummy+0.132
 problem+0.126
ls+0.126
ience+0.121
ement+0.119
Neg logit contributions
 meet-0.239
 join-0.236
 greet-0.223
 build-0.223
 gra-0.222
Token wool
Token ID25749
Feature activation+2.72e-03

Pos logit contributions
ls+0.020
ience+0.019
ummy+0.018
 accidents+0.018
 importantly+0.018
Neg logit contributions
 meet-0.038
 join-0.037
 greet-0.035
 gra-0.034
 build-0.033
Token felt
Token ID2936
Feature activation-9.09e-02

Pos logit contributions
 meet+0.006
 join+0.006
 build+0.005
 greet+0.005
 gra+0.005
Neg logit contributions
ls-0.004
ummy-0.004
 problem-0.004
 important-0.004
 importantly-0.004
Token very
Token ID845
Feature activation-2.15e-01

Pos logit contributions
ummy+0.113
ls+0.112
ience+0.106
 problem+0.106
 important+0.099
Neg logit contributions
 join-0.235
 meet-0.232
 greet-0.223
 contain-0.217
 build-0.216
Token soft
Token ID2705
Feature activation-1.16e-01

Pos logit contributions
ience+0.314
 problem+0.294
ummy+0.291
ement+0.284
ls+0.283
Neg logit contributions
 join-0.502
 meet-0.491
 greet-0.471
 contain-0.465
 make-0.462
Token.
Token ID13
Feature activation-5.64e-03

Pos logit contributions
ls+0.232
ience+0.227
 problem+0.222
 importantly+0.220
ummy+0.220
Neg logit contributions
 join-0.201
 meet-0.196
 greet-0.184
 build-0.181
talk-0.180
Token
Token ID198
Feature activation+2.36e-02

Pos logit contributions
ummy+0.009
 adorable+0.009
 important+0.009
ls+0.009
 problem+0.009
Neg logit contributions
 join-0.012
 meet-0.012
 greet-0.012
talk-0.011
 build-0.011
Token
Token ID198
Feature activation+1.89e-02

Pos logit contributions
 join+0.053
 meet+0.053
talk+0.052
 greet+0.052
 contain+0.051
Neg logit contributions
ummy-0.035
 adorable-0.034
 important-0.033
ls-0.033
 cute-0.032
TokenOne
Token ID3198
Feature activation-1.76e-02

Pos logit contributions
 meet+0.034
 join+0.034
 greet+0.032
talk+0.031
 build+0.031
Neg logit contributions
ummy-0.038
ls-0.038
 important-0.037
 adorable-0.037
 problem-0.037
Token very
Token ID845
Feature activation-2.05e-01

Pos logit contributions
opus+0.210
ummy+0.206
ittens+0.197
imi+0.190
omo+0.188
Neg logit contributions
 join-0.221
 meet-0.212
 make-0.212
 greet-0.212
 see-0.203
Token old
Token ID1468
Feature activation-1.03e-01

Pos logit contributions
opus+0.316
ience+0.309
ittens+0.294
 importantly+0.290
der+0.287
Neg logit contributions
 join-0.431
 greet-0.401
 see-0.399
 meet-0.397
 make-0.396
Token and
Token ID290
Feature activation-9.60e-02

Pos logit contributions
omo+0.174
 importantly+0.166
 important+0.166
ls+0.165
ience+0.164
Neg logit contributions
 join-0.190
 meet-0.182
 build-0.180
 make-0.179
 contain-0.173
Token Alice
Token ID14862
Feature activation-1.71e-02

Pos logit contributions
ummy+0.161
opus+0.140
olly+0.137
ls+0.136
ement+0.134
Neg logit contributions
 meet-0.206
 join-0.204
 make-0.203
 greet-0.200
 build-0.199
Token felt
Token ID2936
Feature activation-8.96e-02

Pos logit contributions
ummy+0.034
 important+0.031
 importantly+0.030
ls+0.030
 problem+0.030
Neg logit contributions
 meet-0.033
 join-0.033
 build-0.031
 greet-0.031
 make-0.030
Token very
Token ID845
Feature activation-2.14e-01

Pos logit contributions
ummy+0.105
ls+0.098
 problem+0.092
ience+0.092
 important+0.089
Neg logit contributions
 join-0.242
 meet-0.240
 greet-0.231
 contain-0.224
 build-0.223
Token serious
Token ID2726
Feature activation-1.52e-01

Pos logit contributions
ummy+0.245
ience+0.240
omo+0.224
ls+0.224
 problem+0.216
Neg logit contributions
 join-0.561
 meet-0.557
 greet-0.532
 contain-0.525
 build-0.518
Token when
Token ID618
Feature activation+2.96e-03

Pos logit contributions
ls+0.269
ience+0.262
ummy+0.260
 adorable+0.253
 problem+0.249
Neg logit contributions
 join-0.307
 meet-0.298
 greet-0.279
 contain-0.277
 build-0.276
Token she
Token ID673
Feature activation-2.69e-02

Pos logit contributions
 meet+0.007
 join+0.007
 greet+0.007
 build+0.007
 contain+0.007
Neg logit contributions
ummy-0.004
ience-0.004
 adorable-0.004
 important-0.003
ls-0.003
Token saw
Token ID2497
Feature activation+2.27e-02

Pos logit contributions
ummy+0.053
 important+0.051
 adorable+0.050
ls+0.049
 importantly+0.048
Neg logit contributions
 meet-0.051
 join-0.050
 greet-0.047
talk-0.047
 build-0.046
Token it
Token ID340
Feature activation-6.93e-03

Pos logit contributions
 meet+0.051
 join+0.051
 greet+0.049
talk+0.047
 build+0.047
Neg logit contributions
ummy-0.035
 important-0.033
ls-0.033
 adorable-0.033
 problem-0.032
Token It
Token ID632
Feature activation-2.26e-02

Pos logit contributions
ummy+0.027
 adorable+0.026
 important+0.025
 importantly+0.024
 problem+0.023
Neg logit contributions
 meet-0.032
 join-0.032
talk-0.031
 greet-0.031
 contain-0.030
Token was
Token ID373
Feature activation-7.82e-02

Pos logit contributions
ummy+0.038
ls+0.035
ience+0.034
 problem+0.033
 importantly+0.033
Neg logit contributions
 meet-0.051
 join-0.050
 build-0.048
 greet-0.047
talk-0.046
Token gray
Token ID12768
Feature activation-1.26e-01

Pos logit contributions
ls+0.121
ummy+0.120
ience+0.118
 importantly+0.106
ement+0.105
Neg logit contributions
 join-0.181
 meet-0.181
 build-0.171
 greet-0.169
 contain-0.164
Token and
Token ID290
Feature activation-1.07e-01

Pos logit contributions
ience+0.205
ls+0.204
 problem+0.175
 sudden+0.172
 importantly+0.168
Neg logit contributions
 join-0.277
 meet-0.268
 build-0.268
talk-0.258
 make-0.248
Token looked
Token ID3114
Feature activation-1.23e-01

Pos logit contributions
ummy+0.187
ls+0.175
ience+0.171
ement+0.166
der+0.162
Neg logit contributions
 meet-0.223
 join-0.218
 build-0.215
 make-0.214
 greet-0.201
Token very
Token ID845
Feature activation-2.14e-01

Pos logit contributions
ummy+0.247
ls+0.235
opus+0.232
omo+0.228
ience+0.228
Neg logit contributions
 join-0.235
 meet-0.230
 greet-0.217
 contain-0.208
 build-0.208
Token fun
Token ID1257
Feature activation-1.32e-01

Pos logit contributions
ummy+0.311
ience+0.306
der+0.291
opus+0.287
ls+0.278
Neg logit contributions
 join-0.502
 meet-0.481
 greet-0.464
 build-0.457
 make-0.455
Token!
Token ID0
Feature activation-1.83e-02

Pos logit contributions
ience+0.248
ls+0.246
ummy+0.242
 adorable+0.233
 importantly+0.233
Neg logit contributions
 meet-0.249
 join-0.249
 build-0.231
 greet-0.226
talk-0.224
Token One
Token ID1881
Feature activation-1.55e-02

Pos logit contributions
ummy+0.031
 adorable+0.030
 important+0.029
ls+0.028
 problem+0.028
Neg logit contributions
 join-0.039
 meet-0.039
 greet-0.037
talk-0.037
 contain-0.036
Token day
Token ID1110
Feature activation-5.87e-02

Pos logit contributions
ls+0.029
ience+0.028
ummy+0.027
 importantly+0.026
ement+0.025
Neg logit contributions
 meet-0.032
 join-0.031
 build-0.029
 greet-0.029
 contain-0.028
Token,
Token ID11
Feature activation-8.89e-02

Pos logit contributions
ience+0.090
 important+0.087
 adorable+0.087
ummy+0.086
 precious+0.084
Neg logit contributions
 join-0.134
 meet-0.127
 build-0.119
 arrive-0.118
 make-0.115
Token cry
Token ID3960
Feature activation-5.49e-02

Pos logit contributions
talk+0.141
 meet+0.129
 join+0.128
five+0.120
 contain+0.119
Neg logit contributions
 adorable-0.179
 disgusting-0.176
 cute-0.175
 important-0.172
 problem-0.169
Token.
Token ID13
Feature activation-6.42e-03

Pos logit contributions
ls+0.103
ience+0.101
ummy+0.101
 importantly+0.098
 adorable+0.097
Neg logit contributions
 join-0.106
 meet-0.098
 greet-0.094
 build-0.092
 make-0.089
Token The
Token ID383
Feature activation-2.45e-03

Pos logit contributions
ls+0.012
ummy+0.012
 adorable+0.011
ience+0.011
 important+0.011
Neg logit contributions
 join-0.013
 meet-0.013
 greet-0.012
 build-0.011
talk-0.011
Token hall
Token ID6899
Feature activation-5.80e-03

Pos logit contributions
ummy+0.004
 adorable+0.004
 important+0.004
ls+0.004
 problem+0.004
Neg logit contributions
 meet-0.005
 join-0.005
 greet-0.005
talk-0.005
 build-0.005
Token felt
Token ID2936
Feature activation-8.95e-02

Pos logit contributions
ummy+0.010
 importantly+0.009
 adorable+0.009
ls+0.009
 important+0.009
Neg logit contributions
 join-0.011
 meet-0.011
talk-0.011
 contain-0.011
 build-0.011
Token very
Token ID845
Feature activation-2.14e-01

Pos logit contributions
ummy+0.113
ls+0.108
opus+0.098
ience+0.098
 problem+0.089
Neg logit contributions
 join-0.237
 meet-0.232
 greet-0.226
 contain-0.218
 make-0.218
Token quiet
Token ID5897
Feature activation-1.18e-01

Pos logit contributions
ummy+0.273
opus+0.270
ip+0.250
ience+0.243
very+0.237
Neg logit contributions
 join-0.515
 meet-0.514
 greet-0.499
 make-0.488
 protect-0.483
Token.
Token ID13
Feature activation-7.09e-03

Pos logit contributions
ls+0.135
ummy+0.134
 adorable+0.130
 important+0.129
ience+0.126
Neg logit contributions
 join-0.307
 meet-0.306
 greet-0.292
talk-0.288
 build-0.286
Token
Token ID198
Feature activation+2.22e-02

Pos logit contributions
ummy+0.012
ls+0.012
 adorable+0.012
 important+0.012
 problem+0.011
Neg logit contributions
 join-0.015
 meet-0.015
 greet-0.014
talk-0.014
 build-0.013
Token
Token ID198
Feature activation+1.90e-02

Pos logit contributions
 join+0.047
 meet+0.046
 greet+0.044
talk+0.044
 contain+0.043
Neg logit contributions
ummy-0.037
 important-0.036
 adorable-0.036
ls-0.036
 problem-0.035
TokenSuddenly
Token ID38582
Feature activation-5.02e-02

Pos logit contributions
 meet+0.041
 join+0.041
 greet+0.039
talk+0.038
 build+0.037
Neg logit contributions
ummy-0.032
ls-0.031
 adorable-0.031
 important-0.030
 problem-0.029
Token very
Token ID845
Feature activation-2.03e-01

Pos logit contributions
ience+0.112
ls+0.107
ummy+0.092
opus+0.091
ement+0.091
Neg logit contributions
 join-0.178
 build-0.176
 meet-0.176
 make-0.169
 arrive-0.165
Token wet
Token ID9583
Feature activation-1.31e-01

Pos logit contributions
ience+0.318
ement+0.276
der+0.275
ls+0.267
 accidents+0.265
Neg logit contributions
 join-0.484
 meet-0.477
 build-0.462
 make-0.455
 buy-0.448
Token.
Token ID13
Feature activation-7.51e-03

Pos logit contributions
ience+0.261
ls+0.253
 importantly+0.245
 problem+0.232
 precious+0.231
Neg logit contributions
 join-0.228
 build-0.224
 meet-0.221
 make-0.210
talk-0.203
Token It
Token ID632
Feature activation-1.76e-02

Pos logit contributions
ummy+0.013
 adorable+0.013
 important+0.013
ls+0.012
 importantly+0.012
Neg logit contributions
 meet-0.015
 join-0.015
 greet-0.015
talk-0.015
 contain-0.014
Token felt
Token ID2936
Feature activation-8.70e-02

Pos logit contributions
ummy+0.031
ls+0.028
 importantly+0.028
ience+0.028
ement+0.027
Neg logit contributions
 join-0.037
 meet-0.037
 build-0.036
 make-0.035
 greet-0.034
Token very
Token ID845
Feature activation-2.14e-01

Pos logit contributions
ummy+0.120
ience+0.118
ls+0.118
opus+0.106
ement+0.105
Neg logit contributions
 join-0.218
 meet-0.209
 make-0.203
 build-0.202
 greet-0.197
Token heavy
Token ID4334
Feature activation-1.44e-01

Pos logit contributions
ience+0.341
ement+0.310
ummy+0.301
opus+0.293
very+0.287
Neg logit contributions
 join-0.470
 meet-0.451
 make-0.450
 build-0.444
 stay-0.441
Token in
Token ID287
Feature activation+6.49e-02

Pos logit contributions
ls+0.279
ience+0.275
opus+0.273
 importantly+0.267
omo+0.267
Neg logit contributions
 join-0.244
 meet-0.242
 build-0.237
 make-0.224
 greet-0.220
Token her
Token ID607
Feature activation-1.10e-02

Pos logit contributions
 join+0.128
 meet+0.128
 greet+0.120
 build+0.118
talk+0.116
Neg logit contributions
ummy-0.117
ls-0.116
 adorable-0.115
 important-0.114
ience-0.113
Token small
Token ID1402
Feature activation-8.95e-02

Pos logit contributions
ls+0.021
ience+0.021
ummy+0.020
 important+0.020
 importantly+0.020
Neg logit contributions
 join-0.021
 meet-0.021
 build-0.020
 greet-0.019
 gra-0.018
Token hands
Token ID2832
Feature activation+2.91e-02

Pos logit contributions
ls+0.156
ience+0.154
 important+0.150
 adorable+0.149
 importantly+0.149
Neg logit contributions
 join-0.182
 meet-0.181
 build-0.174
 greet-0.171
talk-0.165
Token.
Token ID13
Feature activation-4.41e-03

Pos logit contributions
ummy+0.122
 adorable+0.113
ls+0.112
ience+0.111
 important+0.111
Neg logit contributions
 join-0.130
 meet-0.128
 build-0.122
talk-0.118
 greet-0.116
Token The
Token ID383
Feature activation-5.12e-03

Pos logit contributions
ummy+0.007
 adorable+0.007
 important+0.007
 disgusting+0.007
 cute+0.006
Neg logit contributions
 meet-0.009
 join-0.009
talk-0.009
 greet-0.009
 contain-0.009
Token zero
Token ID6632
Feature activation-4.87e-02

Pos logit contributions
ummy+0.008
 adorable+0.008
 important+0.008
ls+0.008
 problem+0.008
Neg logit contributions
 meet-0.011
 join-0.011
 greet-0.011
talk-0.011
 contain-0.010
Token was
Token ID373
Feature activation-6.98e-02

Pos logit contributions
ummy+0.087
ience+0.076
 importantly+0.076
 adorable+0.076
 accidents+0.075
Neg logit contributions
 meet-0.099
 join-0.098
 build-0.096
talk-0.095
 make-0.090
Token feeling
Token ID4203
Feature activation-1.02e-01

Pos logit contributions
ummy+0.089
ls+0.087
ience+0.086
ement+0.076
 importantly+0.075
Neg logit contributions
 join-0.180
 meet-0.179
 build-0.172
 greet-0.167
 make-0.165
Token very
Token ID845
Feature activation-2.14e-01

Pos logit contributions
ummy+0.129
ls+0.124
omo+0.111
ience+0.109
 problem+0.106
Neg logit contributions
 join-0.266
 meet-0.259
 build-0.249
 make-0.246
 greet-0.246
Token sad
Token ID6507
Feature activation-1.58e-01

Pos logit contributions
ummy+0.242
omo+0.224
ience+0.224
ls+0.223
 importantly+0.213
Neg logit contributions
 join-0.570
 meet-0.563
 build-0.533
 greet-0.532
 contain-0.528
Token.
Token ID13
Feature activation-5.89e-03

Pos logit contributions
ience+0.340
ls+0.331
opus+0.313
 sudden+0.304
omo+0.302
Neg logit contributions
 join-0.261
 make-0.237
 meet-0.237
 build-0.235
 contain-0.220
Token Every
Token ID3887
Feature activation+1.01e-03

Pos logit contributions
ummy+0.010
ls+0.010
 adorable+0.010
 important+0.010
 problem+0.010
Neg logit contributions
 meet-0.012
 join-0.012
 greet-0.011
talk-0.011
 build-0.011
Token day
Token ID1110
Feature activation-4.86e-02

Pos logit contributions
 meet+0.002
 join+0.002
 greet+0.002
 build+0.002
talk+0.002
Neg logit contributions
ls-0.002
ummy-0.002
 adorable-0.002
ience-0.002
 importantly-0.002
Token,
Token ID11
Feature activation-8.06e-02

Pos logit contributions
ummy+0.068
ience+0.067
ls+0.066
 adorable+0.065
 important+0.064
Neg logit contributions
 join-0.118
 meet-0.115
 build-0.107
 greet-0.106
 arrive-0.106
Token was
Token ID373
Feature activation-6.17e-02

Pos logit contributions
ummy+0.043
 importantly+0.038
ement+0.038
ls+0.036
 accidents+0.036
Neg logit contributions
 meet-0.034
 build-0.032
 join-0.032
 make-0.031
 greet-0.031
Token very
Token ID845
Feature activation-1.90e-01

Pos logit contributions
ummy+0.066
ls+0.065
ience+0.063
 importantly+0.054
 problem+0.054
Neg logit contributions
 join-0.174
 meet-0.174
 greet-0.165
 build-0.163
 contain-0.159
Token scared
Token ID12008
Feature activation-1.31e-01

Pos logit contributions
ience+0.264
ummy+0.255
ement+0.252
der+0.239
 accidents+0.223
Neg logit contributions
 join-0.474
 meet-0.466
 greet-0.445
 build-0.444
 make-0.428
Token and
Token ID290
Feature activation-8.25e-02

Pos logit contributions
ience+0.265
omo+0.244
opus+0.242
ls+0.233
iflower+0.230
Neg logit contributions
 join-0.247
 meet-0.225
 make-0.210
 greet-0.201
 see-0.200
Token felt
Token ID2936
Feature activation-9.14e-02

Pos logit contributions
ummy+0.135
 importantly+0.115
ls+0.114
 adorable+0.110
 important+0.110
Neg logit contributions
 meet-0.194
 join-0.189
 build-0.181
 greet-0.180
 contain-0.174
Token very
Token ID845
Feature activation-2.14e-01

Pos logit contributions
ummy+0.105
ls+0.089
omo+0.087
ience+0.084
 problem+0.082
Neg logit contributions
 join-0.255
 meet-0.254
 greet-0.244
 build-0.234
 contain-0.233
Token alone
Token ID3436
Feature activation-9.08e-02

Pos logit contributions
ummy+0.266
omo+0.243
ement+0.232
ience+0.232
 importantly+0.224
Neg logit contributions
 meet-0.561
 join-0.555
 greet-0.529
 build-0.516
 save-0.510
Token.
Token ID13
Feature activation-8.56e-03

Pos logit contributions
ummy+0.176
ls+0.169
ience+0.168
 adorable+0.164
 importantly+0.161
Neg logit contributions
 join-0.169
 meet-0.162
 build-0.152
 greet-0.151
talk-0.149
Token She
Token ID1375
Feature activation-2.31e-02

Pos logit contributions
ummy+0.014
 adorable+0.013
 important+0.013
ls+0.013
 disgusting+0.012
Neg logit contributions
 join-0.019
 meet-0.018
 greet-0.018
talk-0.018
 contain-0.018
Token was
Token ID373
Feature activation-6.64e-02

Pos logit contributions
ummy+0.051
 importantly+0.043
ement+0.042
ls+0.041
 adorable+0.040
Neg logit contributions
 meet-0.045
 join-0.042
 greet-0.041
 build-0.041
talk-0.041
Token so
Token ID523
Feature activation-1.49e-01

Pos logit contributions
ummy+0.083
ls+0.076
ience+0.073
opus+0.065
ement+0.064
Neg logit contributions
 meet-0.180
 join-0.180
 greet-0.171
 build-0.167
 contain-0.163
Token little
Token ID1310
Feature activation-1.17e-01

Pos logit contributions
ls+0.056
ement+0.056
ummy+0.054
ience+0.054
der+0.053
Neg logit contributions
 join-0.104
 meet-0.103
 build-0.095
 greet-0.094
 gra-0.094
Token girl
Token ID2576
Feature activation-4.31e-02

Pos logit contributions
ement+0.232
 importantly+0.226
ience+0.224
der+0.224
 important+0.220
Neg logit contributions
 join-0.196
 meet-0.178
 make-0.168
 build-0.167
 greet-0.164
Token who
Token ID508
Feature activation-2.72e-02

Pos logit contributions
ummy+0.064
 adorable+0.064
 important+0.063
ls+0.062
 problem+0.060
Neg logit contributions
 join-0.098
 meet-0.098
 greet-0.093
talk-0.092
 contain-0.091
Token was
Token ID373
Feature activation-7.21e-02

Pos logit contributions
 adorable+0.030
 important+0.030
ummy+0.029
ls+0.028
 problem+0.027
Neg logit contributions
 join-0.073
 meet-0.072
 greet-0.070
talk-0.070
 contain-0.068
Token feeling
Token ID4203
Feature activation-1.03e-01

Pos logit contributions
ls+0.083
ummy+0.079
ience+0.077
 problem+0.070
 important+0.068
Neg logit contributions
 join-0.198
 meet-0.197
 build-0.186
 greet-0.186
 gra-0.181
Token very
Token ID845
Feature activation-2.13e-01

Pos logit contributions
ls+0.101
ummy+0.100
ience+0.090
 problem+0.087
 importantly+0.083
Neg logit contributions
 join-0.295
 meet-0.293
 greet-0.279
 build-0.275
 contain-0.271
Token sleepy
Token ID42023
Feature activation-1.60e-01

Pos logit contributions
ience+0.221
ls+0.217
ummy+0.199
ement+0.198
omo+0.194
Neg logit contributions
 meet-0.605
 join-0.601
 greet-0.560
 build-0.556
 contain-0.545
Token.
Token ID13
Feature activation-1.30e-02

Pos logit contributions
ls+0.284
ience+0.274
ummy+0.269
 adorable+0.264
 important+0.262
Neg logit contributions
 join-0.328
 meet-0.320
 greet-0.302
 build-0.297
talk-0.294
Token She
Token ID1375
Feature activation-2.49e-02

Pos logit contributions
ummy+0.017
 adorable+0.016
 important+0.016
 cute+0.014
 disgusting+0.014
Neg logit contributions
 join-0.032
 meet-0.032
talk-0.032
 greet-0.031
 contain-0.031
Token was
Token ID373
Feature activation-6.40e-02

Pos logit contributions
ummy+0.054
ls+0.050
ement+0.049
 importantly+0.048
 accidents+0.046
Neg logit contributions
 meet-0.046
 join-0.042
 greet-0.041
 make-0.041
 build-0.040
Token so
Token ID523
Feature activation-1.50e-01

Pos logit contributions
ls+0.099
ummy+0.096
ience+0.094
ement+0.087
 problem+0.085
Neg logit contributions
 meet-0.152
 join-0.151
 greet-0.142
 build-0.140
 gra-0.135
Token,
Token ID11
Feature activation-8.39e-02

Pos logit contributions
ience+0.090
 important+0.085
 importantly+0.085
 adorable+0.085
ement+0.084
Neg logit contributions
 join-0.132
 meet-0.127
 arrive-0.118
 build-0.113
 contain-0.111
Token a
Token ID257
Feature activation-3.03e-02

Pos logit contributions
ience+0.164
ummy+0.164
ls+0.160
 problem+0.159
 important+0.156
Neg logit contributions
 join-0.158
 meet-0.156
 greet-0.138
 build-0.138
 contain-0.135
Token woman
Token ID2415
Feature activation+4.86e-03

Pos logit contributions
ls+0.035
ummy+0.033
ement+0.032
ience+0.032
 importantly+0.029
Neg logit contributions
 join-0.082
 meet-0.081
 greet-0.075
 build-0.075
 gra-0.074
Token was
Token ID373
Feature activation-6.98e-02

Pos logit contributions
 join+0.011
 meet+0.010
 greet+0.010
talk+0.010
 contain+0.010
Neg logit contributions
ummy-0.008
 adorable-0.008
ls-0.008
 important-0.008
 problem-0.007
Token feeling
Token ID4203
Feature activation-1.02e-01

Pos logit contributions
ummy+0.101
ls+0.099
 adorable+0.097
 important+0.096
ience+0.096
Neg logit contributions
 meet-0.164
 join-0.164
 greet-0.155
 build-0.152
talk-0.152
Token very
Token ID845
Feature activation-2.13e-01

Pos logit contributions
ummy+0.101
ls+0.098
ience+0.089
 problem+0.086
 adorable+0.084
Neg logit contributions
 join-0.293
 meet-0.291
 greet-0.279
 build-0.274
 contain-0.270
Token sleepy
Token ID42023
Feature activation-1.58e-01

Pos logit contributions
ience+0.221
ummy+0.220
ls+0.216
omo+0.201
 accidents+0.192
Neg logit contributions
 meet-0.604
 join-0.603
 greet-0.570
 contain-0.559
 build-0.554
Token.
Token ID13
Feature activation-9.15e-03

Pos logit contributions
ls+0.265
ience+0.255
ummy+0.253
 adorable+0.252
 important+0.249
Neg logit contributions
 join-0.338
 meet-0.333
 greet-0.315
 build-0.310
talk-0.309
Token She
Token ID1375
Feature activation-2.16e-02

Pos logit contributions
ummy+0.011
 adorable+0.010
 important+0.009
 cute+0.009
 disgusting+0.009
Neg logit contributions
 join-0.024
 meet-0.023
talk-0.023
 greet-0.023
 contain-0.023
Token wanted
Token ID2227
Feature activation+5.45e-04

Pos logit contributions
ummy+0.048
ement+0.045
 importantly+0.044
ls+0.043
 problem+0.042
Neg logit contributions
 meet-0.038
 join-0.036
 greet-0.034
talk-0.033
 build-0.033
Token to
Token ID284
Feature activation+1.02e-01

Pos logit contributions
 meet+0.001
 join+0.001
 greet+0.001
talk+0.001
 build+0.001
Neg logit contributions
ummy-0.001
ls-0.001
 adorable-0.001
 important-0.001
ience-0.001
Token.
Token ID13
Feature activation-4.98e-03

Pos logit contributions
 meet+0.161
 join+0.161
 greet+0.153
talk+0.151
 build+0.148
Neg logit contributions
ummy-0.121
 adorable-0.119
 important-0.119
ls-0.118
 problem-0.114
Token The
Token ID383
Feature activation-5.88e-03

Pos logit contributions
ummy+0.007
 adorable+0.007
 important+0.006
 cute+0.006
 disgusting+0.006
Neg logit contributions
 meet-0.012
 join-0.011
talk-0.011
 greet-0.011
 contain-0.011
Token avocado
Token ID40377
Feature activation-9.24e-03

Pos logit contributions
ience+0.008
ls+0.008
 importantly+0.007
ummy+0.007
 important+0.007
Neg logit contributions
 meet-0.015
 join-0.014
 build-0.014
 greet-0.013
 contain-0.013
Token was
Token ID373
Feature activation-7.06e-02

Pos logit contributions
ummy+0.019
 importantly+0.018
 accidents+0.018
ement+0.018
der+0.017
Neg logit contributions
 build-0.016
 meet-0.016
 join-0.015
 make-0.015
 find-0.014
Token feeling
Token ID4203
Feature activation-1.01e-01

Pos logit contributions
ience+0.109
ls+0.102
opus+0.097
ummy+0.095
ement+0.094
Neg logit contributions
 meet-0.171
 join-0.169
 build-0.166
 greet-0.161
 make-0.158
Token very
Token ID845
Feature activation-2.13e-01

Pos logit contributions
ummy+0.125
ls+0.125
omo+0.115
ience+0.114
opus+0.111
Neg logit contributions
 join-0.266
 meet-0.262
 greet-0.252
 build-0.247
 make-0.245
Token lonely
Token ID21757
Feature activation-1.48e-01

Pos logit contributions
ience+0.263
omo+0.245
ement+0.240
 accidents+0.239
ummy+0.237
Neg logit contributions
 meet-0.554
 join-0.549
 greet-0.520
 contain-0.504
 build-0.503
Token and
Token ID290
Feature activation-9.36e-02

Pos logit contributions
ls+0.254
ience+0.244
omo+0.238
ummy+0.237
ww+0.232
Neg logit contributions
 join-0.306
 meet-0.287
 build-0.286
 make-0.281
 find-0.271
Token did
Token ID750
Feature activation-3.90e-02

Pos logit contributions
ummy+0.130
ls+0.117
 importantly+0.108
ement+0.108
ience+0.105
Neg logit contributions
 meet-0.244
 join-0.236
 greet-0.225
 build-0.225
 contain-0.220
Token not
Token ID407
Feature activation-4.24e-02

Pos logit contributions
 important+0.053
 adorable+0.049
 disgusting+0.047
 cute+0.046
 problem+0.046
Neg logit contributions
talk-0.098
 join-0.096
 meet-0.095
 greet-0.094
 contain-0.091
Token know
Token ID760
Feature activation-1.77e-02

Pos logit contributions
 important+0.099
 adorable+0.097
 cute+0.092
 lazy+0.091
 problem+0.091
Neg logit contributions
talk-0.059
 gra-0.058
 join-0.053
 greet-0.053
 meet-0.052
Token.
Token ID13
Feature activation-8.57e-03

Pos logit contributions
ummy+0.011
ls+0.011
 adorable+0.011
ience+0.010
 important+0.010
Neg logit contributions
 join-0.015
 meet-0.015
 build-0.014
 greet-0.013
talk-0.013
Token But
Token ID887
Feature activation-3.74e-02

Pos logit contributions
ummy+0.014
 adorable+0.013
 important+0.013
 cute+0.012
 importantly+0.012
Neg logit contributions
talk-0.018
 meet-0.018
 join-0.018
 greet-0.018
 contain-0.017
Token it
Token ID340
Feature activation-3.25e-02

Pos logit contributions
ls+0.070
ummy+0.069
ience+0.065
 importantly+0.064
 adorable+0.063
Neg logit contributions
 join-0.074
 meet-0.073
 build-0.071
 greet-0.067
 make-0.065
Token was
Token ID373
Feature activation-7.65e-02

Pos logit contributions
ummy+0.071
 importantly+0.064
ement+0.062
ls+0.062
der+0.058
Neg logit contributions
 meet-0.058
 build-0.058
 make-0.056
 join-0.055
 find-0.050
Token feeling
Token ID4203
Feature activation-1.02e-01

Pos logit contributions
ummy+0.115
ls+0.112
ement+0.109
ience+0.108
opus+0.105
Neg logit contributions
 join-0.180
 meet-0.179
 build-0.174
 make-0.168
 contain-0.164
Token very
Token ID845
Feature activation-2.13e-01

Pos logit contributions
ummy+0.127
ls+0.122
omo+0.112
ience+0.110
ement+0.107
Neg logit contributions
 join-0.269
 meet-0.262
 build-0.251
 make-0.249
 greet-0.248
Token lonely
Token ID21757
Feature activation-1.42e-01

Pos logit contributions
ummy+0.242
ement+0.241
ience+0.241
omo+0.240
ls+0.227
Neg logit contributions
 join-0.542
 meet-0.539
 build-0.508
 contain-0.501
 greet-0.500
Token,
Token ID11
Feature activation-1.09e-01

Pos logit contributions
ls+0.275
ience+0.265
ummy+0.264
omo+0.254
 importantly+0.247
Neg logit contributions
 join-0.264
 meet-0.250
 build-0.240
 make-0.231
 contain-0.230
Token so
Token ID523
Feature activation-1.52e-01

Pos logit contributions
ls+0.170
ience+0.168
ummy+0.162
 problem+0.159
omo+0.147
Neg logit contributions
 join-0.252
 meet-0.252
 build-0.236
 contain-0.229
 greet-0.229
Token it
Token ID340
Feature activation-3.72e-02

Pos logit contributions
ummy+0.228
ls+0.221
ience+0.221
 problem+0.212
ement+0.210
Neg logit contributions
 meet-0.350
 join-0.345
 greet-0.319
 build-0.315
 arrive-0.306
Token called
Token ID1444
Feature activation+1.24e-03

Pos logit contributions
ummy+0.068
 importantly+0.058
ls+0.058
 adorable+0.056
 problem+0.056
Neg logit contributions
 meet-0.083
 join-0.081
 greet-0.077
talk-0.076
 build-0.075
Token big
Token ID1263
Feature activation-9.25e-02

Pos logit contributions
ience+0.349
ement+0.339
der+0.336
 accidents+0.301
opus+0.291
Neg logit contributions
 meet-0.466
 join-0.447
 build-0.426
 greet-0.422
 arrive-0.419
Token,
Token ID11
Feature activation-1.20e-01

Pos logit contributions
ls+0.153
ummy+0.153
 sudden+0.151
omo+0.150
ittens+0.150
Neg logit contributions
 meet-0.175
 join-0.170
 build-0.168
 make-0.166
 greet-0.157
Token and
Token ID290
Feature activation-8.95e-02

Pos logit contributions
ience+0.201
ls+0.194
ummy+0.194
 problem+0.187
ement+0.177
Neg logit contributions
 meet-0.258
 join-0.255
 build-0.255
 make-0.241
 greet-0.237
Token Tiny
Token ID20443
Feature activation+1.04e-02

Pos logit contributions
ummy+0.159
ls+0.147
ement+0.145
 problem+0.140
ience+0.137
Neg logit contributions
 meet-0.186
 join-0.180
 build-0.177
 make-0.174
 greet-0.172
Token felt
Token ID2936
Feature activation-8.79e-02

Pos logit contributions
 meet+0.022
 join+0.020
 build+0.020
 make+0.019
 greet+0.019
Neg logit contributions
ummy-0.020
 importantly-0.019
ls-0.019
 problem-0.018
 accidents-0.018
Token very
Token ID845
Feature activation-2.13e-01

Pos logit contributions
ummy+0.104
ls+0.099
ience+0.091
 problem+0.091
 important+0.086
Neg logit contributions
 join-0.238
 meet-0.237
 greet-0.226
 build-0.221
 contain-0.218
Token small
Token ID1402
Feature activation-1.19e-01

Pos logit contributions
ummy+0.247
ience+0.236
ls+0.224
ement+0.221
 importantly+0.215
Neg logit contributions
 meet-0.578
 join-0.569
 greet-0.540
 build-0.533
 contain-0.521
Token.
Token ID13
Feature activation-1.00e-02

Pos logit contributions
ls+0.220
ummy+0.216
ience+0.207
opus+0.207
omo+0.203
Neg logit contributions
 join-0.230
 meet-0.226
 greet-0.210
 build-0.209
talk-0.205
Token But
Token ID887
Feature activation-3.77e-02

Pos logit contributions
ummy+0.017
 adorable+0.017
 important+0.017
ls+0.017
 problem+0.016
Neg logit contributions
 meet-0.021
 join-0.021
 greet-0.020
talk-0.019
 build-0.019
Token Tiny
Token ID20443
Feature activation+1.70e-02

Pos logit contributions
ummy+0.069
ls+0.066
ement+0.064
 adorable+0.061
ience+0.060
Neg logit contributions
 meet-0.077
 join-0.074
 build-0.073
 make-0.070
 greet-0.069
Token was
Token ID373
Feature activation-6.60e-02

Pos logit contributions
 meet+0.033
 build+0.030
 join+0.029
 make+0.029
 find+0.028
Neg logit contributions
ummy-0.035
 importantly-0.035
ls-0.030
ement-0.029
 accidents-0.029
Token The
Token ID383
Feature activation-2.30e-03

Pos logit contributions
ummy+0.006
ls+0.006
 important+0.006
 adorable+0.006
 problem+0.005
Neg logit contributions
 join-0.005
 meet-0.005
 greet-0.005
talk-0.005
 build-0.004
Token aer
Token ID9551
Feature activation-3.19e-02

Pos logit contributions
ummy+0.003
ls+0.003
ience+0.003
 importantly+0.003
 important+0.003
Neg logit contributions
 meet-0.006
 join-0.006
 greet-0.005
talk-0.005
 build-0.005
Tokenopl
Token ID20106
Feature activation-3.84e-02

Pos logit contributions
ummy+0.056
 adorable+0.055
ls+0.055
 important+0.055
 problem+0.053
Neg logit contributions
 meet-0.064
 join-0.064
 greet-0.061
talk-0.060
 build-0.059
Tokenane
Token ID1531
Feature activation+4.22e-03

Pos logit contributions
ummy+0.061
ls+0.060
 adorable+0.058
 important+0.057
 problem+0.056
Neg logit contributions
 join-0.085
 meet-0.085
 greet-0.083
 build-0.079
 contain-0.079
Token felt
Token ID2936
Feature activation-8.72e-02

Pos logit contributions
 meet+0.009
 join+0.009
talk+0.008
 greet+0.008
 build+0.008
Neg logit contributions
ummy-0.008
 importantly-0.007
 adorable-0.007
 important-0.007
ls-0.007
Token very
Token ID845
Feature activation-2.13e-01

Pos logit contributions
ummy+0.108
ls+0.105
ience+0.097
 problem+0.096
 adorable+0.091
Neg logit contributions
 join-0.230
 meet-0.228
 greet-0.218
 contain-0.211
 build-0.210
Token happy
Token ID3772
Feature activation-1.60e-01

Pos logit contributions
ummy+0.257
ls+0.242
ience+0.239
 problem+0.234
 importantly+0.212
Neg logit contributions
 join-0.557
 meet-0.556
 greet-0.526
 contain-0.515
 build-0.509
Token too
Token ID1165
Feature activation-8.85e-02

Pos logit contributions
ience+0.282
ls+0.273
ummy+0.264
omo+0.257
 problem+0.257
Neg logit contributions
 join-0.342
 meet-0.323
 build-0.299
 greet-0.294
 make-0.293
Token.
Token ID13
Feature activation-2.05e-03

Pos logit contributions
ls+0.163
omo+0.161
 problem+0.158
ience+0.157
 importantly+0.154
Neg logit contributions
 join-0.169
 meet-0.167
 greet-0.149
 make-0.148
 arrive-0.148
Token From
Token ID3574
Feature activation-1.77e-02

Pos logit contributions
ls+0.004
ummy+0.004
 important+0.004
 adorable+0.004
 problem+0.004
Neg logit contributions
 join-0.004
 meet-0.004
 greet-0.004
talk-0.004
 build-0.003
Token that
Token ID326
Feature activation+7.64e-03

Pos logit contributions
ummy+0.029
ience+0.028
 importantly+0.027
 problem+0.027
 important+0.026
Neg logit contributions
 meet-0.036
 join-0.036
 greet-0.033
 contain-0.033
 make-0.032
Token was
Token ID373
Feature activation-6.03e-02

Pos logit contributions
ummy+0.053
 adorable+0.050
ls+0.050
 important+0.049
 importantly+0.049
Neg logit contributions
 join-0.074
 meet-0.074
 greet-0.069
 build-0.069
 contain-0.067
Token,
Token ID11
Feature activation-1.01e-01

Pos logit contributions
ummy+0.065
 adorable+0.063
 important+0.063
ls+0.062
 problem+0.059
Neg logit contributions
 meet-0.162
 join-0.162
 greet-0.155
talk-0.154
 build-0.151
Token but
Token ID475
Feature activation-4.17e-02

Pos logit contributions
ummy+0.134
ls+0.130
 problem+0.121
ience+0.120
 important+0.118
Neg logit contributions
 meet-0.254
 join-0.253
 greet-0.240
 build-0.238
 contain-0.233
Token it
Token ID340
Feature activation-3.20e-02

Pos logit contributions
ummy+0.075
ls+0.071
 adorable+0.069
omo+0.069
 important+0.068
Neg logit contributions
 join-0.086
 meet-0.085
 greet-0.080
 build-0.078
 contain-0.077
Token felt
Token ID2936
Feature activation-8.63e-02

Pos logit contributions
ummy+0.059
ls+0.052
 problem+0.050
 adorable+0.050
 importantly+0.049
Neg logit contributions
 join-0.067
 meet-0.067
 greet-0.064
 build-0.061
 make-0.060
Token very
Token ID845
Feature activation-2.13e-01

Pos logit contributions
ummy+0.131
ls+0.126
 problem+0.115
ience+0.115
omo+0.112
Neg logit contributions
 join-0.205
 meet-0.202
 greet-0.195
 contain-0.187
 build-0.186
Token excited
Token ID6568
Feature activation-1.56e-01

Pos logit contributions
ummy+0.261
omo+0.252
opus+0.236
ls+0.234
ience+0.233
Neg logit contributions
 join-0.540
 meet-0.538
 greet-0.519
 contain-0.497
 make-0.493
Token.
Token ID13
Feature activation-1.55e-03

Pos logit contributions
ls+0.255
ience+0.251
ummy+0.250
 adorable+0.246
 importantly+0.240
Neg logit contributions
 join-0.333
 meet-0.332
 greet-0.309
 contain-0.304
 build-0.303
Token
Token ID220
Feature activation-5.37e-02

Pos logit contributions
ummy+0.003
ls+0.003
 important+0.003
 adorable+0.003
 problem+0.003
Neg logit contributions
 meet-0.003
 join-0.003
 greet-0.003
talk-0.003
 build-0.003
Token
Token ID198
Feature activation+2.41e-02

Pos logit contributions
ummy+0.075
ls+0.070
 important+0.070
 adorable+0.070
 importantly+0.064
Neg logit contributions
 meet-0.126
talk-0.125
 join-0.125
 contain-0.124
 greet-0.124
Token
Token ID198
Feature activation+2.08e-02

Pos logit contributions
 join+0.055
 meet+0.054
talk+0.053
 greet+0.053
 contain+0.052
Neg logit contributions
ummy-0.034
 important-0.033
 adorable-0.033
ls-0.033
 cute-0.031
Token very
Token ID845
Feature activation-1.94e-01

Pos logit contributions
ls+0.102
ience+0.096
ummy+0.092
 problem+0.085
ement+0.083
Neg logit contributions
 join-0.156
 meet-0.155
 build-0.147
 greet-0.144
 gra-0.141
Token troubled
Token ID17840
Feature activation-1.48e-01

Pos logit contributions
ience+0.240
ls+0.234
ement+0.216
 accidents+0.211
ummy+0.210
Neg logit contributions
 join-0.529
 meet-0.523
 greet-0.487
 build-0.486
 make-0.462
Token and
Token ID290
Feature activation-8.85e-02

Pos logit contributions
ls+0.246
ience+0.242
opus+0.215
omo+0.214
ww+0.213
Neg logit contributions
 join-0.315
 build-0.303
 meet-0.298
 make-0.293
 contain-0.279
Token was
Token ID373
Feature activation-7.65e-02

Pos logit contributions
ummy+0.145
ls+0.129
ement+0.120
 importantly+0.116
ience+0.109
Neg logit contributions
 meet-0.210
 build-0.199
 join-0.197
 make-0.189
 greet-0.188
Token feeling
Token ID4203
Feature activation-1.04e-01

Pos logit contributions
ummy+0.126
ls+0.126
ience+0.115
ement+0.111
opus+0.106
Neg logit contributions
 join-0.177
 meet-0.176
 build-0.163
 greet-0.162
 make-0.156
Token very
Token ID845
Feature activation-2.12e-01

Pos logit contributions
ummy+0.123
ls+0.118
omo+0.109
ience+0.101
ement+0.096
Neg logit contributions
 join-0.283
 meet-0.280
 greet-0.265
 build-0.263
 make-0.260
Token sad
Token ID6507
Feature activation-1.59e-01

Pos logit contributions
ummy+0.236
omo+0.235
ement+0.225
ls+0.218
ience+0.217
Neg logit contributions
 meet-0.575
 join-0.568
 greet-0.535
 build-0.527
 make-0.518
Token one
Token ID530
Feature activation-4.11e-02

Pos logit contributions
ience+0.333
ls+0.326
opus+0.311
omo+0.311
ummy+0.309
Neg logit contributions
 join-0.270
 meet-0.247
 build-0.236
 make-0.233
 greet-0.228
Token day
Token ID1110
Feature activation-4.62e-02

Pos logit contributions
ummy+0.062
ls+0.061
 adorable+0.060
 important+0.059
ience+0.058
Neg logit contributions
 join-0.093
 meet-0.093
 greet-0.088
talk-0.087
 build-0.086
Token.
Token ID13
Feature activation-3.83e-03

Pos logit contributions
ls+0.075
ummy+0.073
ience+0.072
 adorable+0.070
 important+0.068
Neg logit contributions
 join-0.100
 meet-0.100
 greet-0.093
 contain-0.093
talk-0.091
Token He
Token ID679
Feature activation-1.80e-02

Pos logit contributions
ummy+0.007
ls+0.007
 adorable+0.007
 important+0.007
ience+0.006
Neg logit contributions
 meet-0.008
 join-0.008
 greet-0.007
talk-0.007
 build-0.007
Token was
Token ID373
Feature activation-8.09e-02

Pos logit contributions
ummy+0.052
 adorable+0.046
 important+0.046
 importantly+0.046
ls+0.044
Neg logit contributions
 join-0.048
 meet-0.046
talk-0.043
 build-0.043
 greet-0.042
Token tall
Token ID7331
Feature activation-1.26e-01

Pos logit contributions
ummy+0.124
ls+0.111
ience+0.111
opus+0.103
ement+0.098
Neg logit contributions
 join-0.200
 meet-0.196
 greet-0.188
 build-0.181
 contain-0.180
Token and
Token ID290
Feature activation-1.08e-01

Pos logit contributions
ummy+0.192
ls+0.178
 importantly+0.173
omo+0.169
ittens+0.167
Neg logit contributions
 join-0.268
 meet-0.259
 build-0.258
 make-0.252
 greet-0.249
Token it
Token ID340
Feature activation-3.25e-02

Pos logit contributions
ummy+0.166
ement+0.134
opus+0.132
ls+0.132
ience+0.128
Neg logit contributions
 join-0.266
 meet-0.261
 build-0.254
 make-0.250
 greet-0.249
Token looked
Token ID3114
Feature activation-1.22e-01

Pos logit contributions
ummy+0.069
 importantly+0.058
ement+0.054
olly+0.053
very+0.052
Neg logit contributions
 join-0.063
 build-0.062
 meet-0.061
 make-0.061
 greet-0.057
Token very
Token ID845
Feature activation-2.12e-01

Pos logit contributions
ummy+0.259
opus+0.241
ittens+0.220
olly+0.218
omo+0.218
Neg logit contributions
 join-0.243
 meet-0.230
 greet-0.226
 make-0.217
 protect-0.213
Token strong
Token ID1913
Feature activation-1.30e-01

Pos logit contributions
ummy+0.337
opus+0.316
ip+0.291
ittens+0.285
omm+0.279
Neg logit contributions
 join-0.484
 meet-0.446
 make-0.444
 greet-0.444
 protect-0.437
Token.
Token ID13
Feature activation-6.02e-03

Pos logit contributions
ummy+0.246
ls+0.240
omo+0.236
 importantly+0.235
 adorable+0.235
Neg logit contributions
 join-0.233
 meet-0.224
talk-0.212
 greet-0.210
 build-0.206
Token It
Token ID632
Feature activation-2.03e-02

Pos logit contributions
ummy+0.011
 important+0.011
ls+0.011
 adorable+0.011
 problem+0.011
Neg logit contributions
 meet-0.012
 join-0.012
 greet-0.011
talk-0.011
 build-0.011
Token was
Token ID373
Feature activation-7.69e-02

Pos logit contributions
ummy+0.044
 importantly+0.036
ement+0.035
olly+0.034
 important+0.034
Neg logit contributions
 join-0.040
 meet-0.040
 build-0.038
 make-0.037
 greet-0.037
Token so
Token ID523
Feature activation-1.62e-01

Pos logit contributions
ummy+0.135
ls+0.129
ience+0.126
opus+0.123
olly+0.114
Neg logit contributions
 join-0.171
 meet-0.165
 greet-0.159
 build-0.155
 make-0.150
Token sometimes
Token ID3360
Feature activation-1.54e-02

Pos logit contributions
 join+0.004
 meet+0.004
 build+0.004
 greet+0.004
 make+0.004
Neg logit contributions
ls-0.003
ummy-0.003
ience-0.003
ement-0.003
 adorable-0.003
Token,
Token ID11
Feature activation-6.78e-02

Pos logit contributions
ummy+0.027
ement+0.025
omo+0.025
olly+0.025
ls+0.024
Neg logit contributions
 join-0.032
 meet-0.032
 build-0.032
 make-0.031
 save-0.031
Token it
Token ID340
Feature activation-1.29e-02

Pos logit contributions
ummy+0.102
ls+0.101
 adorable+0.097
ience+0.097
 important+0.095
Neg logit contributions
 join-0.155
 meet-0.153
 build-0.146
 greet-0.142
talk-0.138
Token's
Token ID338
Feature activation-1.21e-02

Pos logit contributions
ummy+0.023
 adorable+0.022
 importantly+0.022
ls+0.021
 disgusting+0.021
Neg logit contributions
 meet-0.026
 join-0.026
 build-0.024
talk-0.023
 make-0.023
Token important
Token ID1593
Feature activation-9.32e-02

Pos logit contributions
ummy+0.016
ls+0.015
ience+0.014
opus+0.013
ement+0.013
Neg logit contributions
 join-0.031
 meet-0.031
 build-0.029
 greet-0.028
 save-0.028
Token to
Token ID284
Feature activation+1.15e-01

Pos logit contributions
ummy+0.198
ls+0.195
 adorable+0.195
ience+0.188
 importantly+0.185
Neg logit contributions
 join-0.157
 meet-0.157
talk-0.143
 build-0.142
 greet-0.142
Token mind
Token ID2000
Feature activation+2.93e-03

Pos logit contributions
talk+0.164
 contain+0.153
 join+0.151
 greet+0.150
 meet+0.146
Neg logit contributions
 important-0.270
 adorable-0.260
ummy-0.253
 problem-0.248
 cute-0.248
Token what
Token ID644
Feature activation+5.04e-02

Pos logit contributions
 join+0.007
 meet+0.006
 greet+0.006
 build+0.006
talk+0.006
Neg logit contributions
ummy-0.004
 adorable-0.004
ls-0.004
 important-0.004
 problem-0.004
Token your
Token ID534
Feature activation+1.40e-02

Pos logit contributions
 join+0.109
 meet+0.108
 build+0.100
 greet+0.097
 make+0.095
Neg logit contributions
ummy-0.085
 adorable-0.080
ls-0.079
ience-0.076
ement-0.076
Token parents
Token ID3397
Feature activation+8.23e-02

Pos logit contributions
 meet+0.032
 join+0.032
 greet+0.031
talk+0.030
 build+0.030
Neg logit contributions
ummy-0.021
 adorable-0.020
ls-0.020
 important-0.020
 problem-0.020
Token say
Token ID910
Feature activation+4.75e-02

Pos logit contributions
 join+0.140
 meet+0.139
 greet+0.132
talk+0.132
 contain+0.126
Neg logit contributions
 important-0.166
ummy-0.166
 adorable-0.166
ls-0.163
 problem-0.161
Token to
Token ID284
Feature activation+1.11e-01

Pos logit contributions
 important+0.021
ls+0.020
 adorable+0.020
ience+0.019
 importantly+0.019
Neg logit contributions
 join-0.051
 meet-0.051
 greet-0.048
talk-0.047
 build-0.047
Token me
Token ID502
Feature activation+5.09e-02

Pos logit contributions
 join+0.284
 meet+0.283
talk+0.277
 greet+0.274
 contain+0.268
Neg logit contributions
ummy-0.131
 adorable-0.128
 important-0.126
ls-0.122
 problem-0.119
Token,
Token ID11
Feature activation-5.81e-02

Pos logit contributions
 join+0.121
 meet+0.121
 greet+0.115
talk+0.114
 build+0.112
Neg logit contributions
ummy-0.071
 adorable-0.069
 important-0.069
ls-0.069
 problem-0.066
Token happy
Token ID3772
Feature activation-1.17e-01

Pos logit contributions
ls+0.095
ience+0.091
ummy+0.090
 important+0.090
 problem+0.087
Neg logit contributions
 join-0.134
 meet-0.131
 greet-0.121
 build-0.115
 contain-0.113
Token birthday
Token ID10955
Feature activation-2.78e-02

Pos logit contributions
ls+0.180
 important+0.179
 adorable+0.179
ummy+0.178
ience+0.177
Neg logit contributions
 meet-0.264
 join-0.264
 greet-0.246
talk-0.241
 build-0.237
Token to
Token ID284
Feature activation+1.05e-01

Pos logit contributions
ls+0.044
 important+0.044
 adorable+0.043
ience+0.043
ummy+0.043
Neg logit contributions
 join-0.061
 meet-0.060
 greet-0.056
talk-0.055
 build-0.054
Token me
Token ID502
Feature activation+4.95e-02

Pos logit contributions
talk+0.218
 join+0.211
 meet+0.207
 contain+0.205
 greet+0.202
Neg logit contributions
 adorable-0.175
ummy-0.174
 important-0.173
 problem-0.164
 cute-0.162
Token,
Token ID11
Feature activation-6.39e-02

Pos logit contributions
 join+0.114
 meet+0.113
 greet+0.107
talk+0.105
 build+0.105
Neg logit contributions
ummy-0.074
ls-0.072
 adorable-0.072
 important-0.072
 problem-0.069
Token happy
Token ID3772
Feature activation-1.18e-01

Pos logit contributions
ls+0.107
ience+0.103
ummy+0.100
 important+0.099
 problem+0.099
Neg logit contributions
 join-0.146
 meet-0.142
 greet-0.131
 build-0.124
 arrive-0.121
Token birthday
Token ID10955
Feature activation-2.56e-02

Pos logit contributions
ls+0.185
ience+0.184
 important+0.184
 adorable+0.182
ummy+0.181
Neg logit contributions
 join-0.262
 meet-0.261
 greet-0.243
talk-0.237
 build-0.233
Token dear
Token ID13674
Feature activation-4.54e-02

Pos logit contributions
ls+0.042
 important+0.041
ience+0.041
 adorable+0.041
ummy+0.041
Neg logit contributions
 join-0.054
 meet-0.054
 greet-0.050
talk-0.050
 contain-0.048
Token,
Token ID11
Feature activation-6.61e-02

Pos logit contributions
 join+0.028
 meet+0.027
 greet+0.026
 build+0.026
talk+0.026
Neg logit contributions
ummy-0.010
 adorable-0.010
 important-0.010
ls-0.009
ience-0.009
Token "
Token ID366
Feature activation+3.56e-02

Pos logit contributions
ummy+0.092
 important+0.090
ls+0.089
 adorable+0.088
 problem+0.085
Neg logit contributions
 join-0.160
 meet-0.159
 greet-0.150
 build-0.148
talk-0.144
TokenMaybe
Token ID13300
Feature activation-3.34e-02

Pos logit contributions
 meet+0.074
 join+0.073
 greet+0.070
 build+0.069
 contain+0.068
Neg logit contributions
 adorable-0.060
ls-0.058
 important-0.058
ummy-0.057
 disgusting-0.056
Token it
Token ID340
Feature activation-5.94e-03

Pos logit contributions
ummy+0.051
 adorable+0.049
 important+0.049
ls+0.049
ience+0.048
Neg logit contributions
 join-0.076
 meet-0.075
 greet-0.071
 build-0.070
talk-0.068
Token belongs
Token ID14448
Feature activation-1.85e-02

Pos logit contributions
ummy+0.012
 importantly+0.011
 important+0.011
 adorable+0.011
 sudden+0.010
Neg logit contributions
 meet-0.011
 join-0.011
 build-0.010
 greet-0.010
talk-0.010
Token to
Token ID284
Feature activation+1.10e-01

Pos logit contributions
ummy+0.035
 adorable+0.035
 important+0.034
ls+0.034
 problem+0.033
Neg logit contributions
 meet-0.035
 join-0.035
 greet-0.033
talk-0.032
 build-0.031
Token the
Token ID262
Feature activation+6.07e-02

Pos logit contributions
 meet+0.234
 join+0.234
 greet+0.220
talk+0.213
 build+0.213
Neg logit contributions
ummy-0.184
ls-0.181
 adorable-0.179
 important-0.178
ience-0.174
Token snow
Token ID6729
Feature activation+1.05e-02

Pos logit contributions
 meet+0.129
 join+0.128
 greet+0.120
 build+0.118
talk+0.117
Neg logit contributions
ummy-0.101
ls-0.101
 adorable-0.098
 important-0.097
ience-0.096
Tokenman
Token ID805
Feature activation+7.78e-02

Pos logit contributions
 meet+0.018
 join+0.018
 greet+0.017
talk+0.017
 contain+0.017
Neg logit contributions
ummy-0.022
 adorable-0.021
ls-0.021
 important-0.021
 problem-0.020
Token.
Token ID13
Feature activation+2.86e-02

Pos logit contributions
 join+0.171
 meet+0.171
 greet+0.161
talk+0.159
 build+0.157
Neg logit contributions
ummy-0.123
 adorable-0.120
ls-0.120
 important-0.120
 problem-0.115
Token He
Token ID679
Feature activation+1.45e-02

Pos logit contributions
 meet+0.063
 join+0.063
 greet+0.060
 contain+0.060
talk+0.060
Neg logit contributions
ummy-0.043
 adorable-0.043
 important-0.041
ls-0.040
 cute-0.039
Token store
Token ID3650
Feature activation+2.20e-03

Pos logit contributions
ummy+0.013
ls+0.012
ience+0.012
 importantly+0.012
 adorable+0.012
Neg logit contributions
 meet-0.022
 join-0.022
 greet-0.021
 build-0.020
 arrive-0.020
Token and
Token ID290
Feature activation-5.68e-02

Pos logit contributions
 meet+0.005
 join+0.005
 greet+0.005
talk+0.005
 build+0.005
Neg logit contributions
ummy-0.003
 adorable-0.003
 important-0.003
ls-0.003
 problem-0.003
Token began
Token ID2540
Feature activation+7.03e-04

Pos logit contributions
ummy+0.098
ls+0.092
 adorable+0.092
 important+0.092
 importantly+0.089
Neg logit contributions
 meet-0.122
 join-0.121
 greet-0.114
talk-0.114
 build-0.113
Token to
Token ID284
Feature activation+8.57e-02

Pos logit contributions
 meet+0.001
talk+0.001
 join+0.001
 contain+0.001
 greet+0.001
Neg logit contributions
 important-0.001
 adorable-0.001
ummy-0.001
ls-0.001
 problem-0.001
Token talk
Token ID1561
Feature activation-5.78e-03

Pos logit contributions
talk+0.113
tted+0.099
 join+0.095
 meet+0.094
five+0.090
Neg logit contributions
 adorable-0.199
 cute-0.193
 important-0.190
 disgusting-0.189
 problem-0.188
Token to
Token ID284
Feature activation+9.62e-02

Pos logit contributions
ummy+0.010
ls+0.010
 adorable+0.010
 important+0.010
ience+0.010
Neg logit contributions
 join-0.012
 meet-0.012
 greet-0.011
talk-0.011
 build-0.011
Token one
Token ID530
Feature activation-9.65e-03

Pos logit contributions
talk+0.221
 join+0.215
 meet+0.215
 greet+0.209
 contain+0.207
Neg logit contributions
 adorable-0.133
 important-0.132
 disgusting-0.122
 cute-0.122
 problem-0.120
Token of
Token ID286
Feature activation+1.02e-01

Pos logit contributions
ummy+0.015
 adorable+0.015
 important+0.015
ls+0.015
 problem+0.014
Neg logit contributions
 meet-0.021
 join-0.021
 greet-0.020
talk-0.020
 build-0.020
Token the
Token ID262
Feature activation+4.70e-02

Pos logit contributions
talk+0.225
 meet+0.222
 join+0.216
 greet+0.211
 gra+0.209
Neg logit contributions
 adorable-0.153
 important-0.151
 disgusting-0.145
ummy-0.144
 problem-0.141
Token ladies
Token ID17308
Feature activation+8.14e-02

Pos logit contributions
 join+0.101
 meet+0.101
 greet+0.096
talk+0.095
 build+0.093
Neg logit contributions
ummy-0.077
 adorable-0.076
 important-0.075
ls-0.075
 problem-0.072
Token who
Token ID508
Feature activation+1.85e-02

Pos logit contributions
 meet+0.192
 join+0.190
talk+0.186
 greet+0.184
 build+0.179
Neg logit contributions
ummy-0.109
 adorable-0.109
 important-0.107
ls-0.107
 problem-0.101
Token He
Token ID679
Feature activation-2.28e-03

Pos logit contributions
ummy+0.000
 adorable+0.000
 important+0.000
 problem+0.000
 disgusting+0.000
Neg logit contributions
 meet-0.000
 join-0.000
talk-0.000
 greet-0.000
 contain-0.000
Token thinks
Token ID6834
Feature activation-6.88e-02

Pos logit contributions
ummy+0.004
 important+0.004
 adorable+0.004
 importantly+0.004
ls+0.004
Neg logit contributions
 meet-0.005
 join-0.005
 greet-0.004
 build-0.004
talk-0.004
Token they
Token ID484
Feature activation+2.42e-02

Pos logit contributions
ummy+0.102
 adorable+0.098
 important+0.097
ls+0.096
ience+0.096
Neg logit contributions
 join-0.160
 meet-0.158
 greet-0.149
 build-0.147
talk-0.145
Token are
Token ID389
Feature activation-2.04e-02

Pos logit contributions
talk+0.030
 join+0.029
 greet+0.029
 contain+0.029
 meet+0.028
Neg logit contributions
 adorable-0.058
 cute-0.057
 important-0.057
ls-0.056
 problem-0.056
Token fun
Token ID1257
Feature activation-9.79e-02

Pos logit contributions
ummy+0.031
ls+0.029
ience+0.028
 problem+0.028
 important+0.027
Neg logit contributions
 join-0.049
 meet-0.048
 greet-0.045
 build-0.045
 gra-0.043
Token to
Token ID284
Feature activation+1.09e-01

Pos logit contributions
ummy+0.180
ience+0.171
ls+0.160
 adorable+0.160
omo+0.159
Neg logit contributions
 join-0.203
 meet-0.194
 build-0.175
 greet-0.174
talk-0.171
Token play
Token ID711
Feature activation+5.32e-02

Pos logit contributions
talk+0.183
 contain+0.153
keeping+0.148
iece+0.148
tted+0.142
Neg logit contributions
 cute-0.226
 adorable-0.224
 important-0.215
 disgusting-0.207
 problem-0.207
Token with
Token ID351
Feature activation+5.60e-02

Pos logit contributions
 join+0.112
 meet+0.112
 greet+0.106
talk+0.105
 build+0.103
Neg logit contributions
ummy-0.089
 adorable-0.087
 important-0.087
ls-0.087
 problem-0.084
Token.
Token ID13
Feature activation+3.79e-03

Pos logit contributions
 join+0.103
 meet+0.100
 greet+0.091
 build+0.091
talk+0.088
Neg logit contributions
ummy-0.114
 problem-0.110
ience-0.109
 importantly-0.108
 important-0.108
Token One
Token ID1881
Feature activation-2.74e-03

Pos logit contributions
 join+0.008
 meet+0.008
 greet+0.008
talk+0.008
 contain+0.008
Neg logit contributions
ummy-0.006
 important-0.006
 adorable-0.006
ls-0.005
 problem-0.005
Token day
Token ID1110
Feature activation-4.94e-02

Pos logit contributions
ummy+0.004
ls+0.004
ience+0.004
 adorable+0.004
 importantly+0.004
Neg logit contributions
 meet-0.006
 join-0.006
 greet-0.006
 build-0.006
talk-0.006
Token of
Token ID286
Feature activation+7.08e-02

Pos logit contributions
ummy+0.087
 adorable+0.085
 important+0.085
ls+0.085
 problem+0.082
Neg logit contributions
 meet-0.116
 join-0.116
 greet-0.110
talk-0.109
 build-0.107
Token special
Token ID2041
Feature activation-7.72e-02

Pos logit contributions
 meet+0.165
 join+0.164
 greet+0.157
talk+0.154
 build+0.153
Neg logit contributions
ummy-0.103
ls-0.100
 adorable-0.099
 important-0.098
ience-0.096
Token stones
Token ID14966
Feature activation+4.91e-02

Pos logit contributions
ummy+0.105
 adorable+0.102
ls+0.101
 important+0.099
ience+0.099
Neg logit contributions
 meet-0.186
 join-0.185
 greet-0.177
talk-0.175
 build-0.174
Token.
Token ID13
Feature activation+6.59e-03

Pos logit contributions
 join+0.106
 meet+0.104
 greet+0.098
 build+0.098
talk+0.097
Neg logit contributions
ummy-0.078
 adorable-0.077
 important-0.076
ience-0.075
 problem-0.074
Token
Token ID220
Feature activation-4.89e-02

Pos logit contributions
 join+0.014
 meet+0.014
 greet+0.014
talk+0.014
 contain+0.014
Neg logit contributions
ummy-0.011
 adorable-0.010
 important-0.010
ls-0.010
 problem-0.009
Token
Token ID198
Feature activation+2.96e-02

Pos logit contributions
ummy+0.064
 important+0.060
ls+0.060
 adorable+0.059
 importantly+0.054
Neg logit contributions
 meet-0.120
 greet-0.118
 join-0.118
 contain-0.118
talk-0.117
Token
Token ID198
Feature activation+2.57e-02

Pos logit contributions
 join+0.070
 meet+0.069
 greet+0.068
talk+0.067
 contain+0.067
Neg logit contributions
ummy-0.040
 important-0.039
 adorable-0.038
ls-0.037
 cute-0.035
TokenOne
Token ID3198
Feature activation-1.23e-02

Pos logit contributions
 join+0.050
 meet+0.050
 greet+0.048
talk+0.048
 contain+0.046
Neg logit contributions
 important-0.046
 adorable-0.046
ummy-0.046
ls-0.044
 problem-0.043
Token day
Token ID1110
Feature activation-5.56e-02

Pos logit contributions
ls+0.022
ience+0.022
ummy+0.021
ement+0.020
 importantly+0.020
Neg logit contributions
 meet-0.026
 join-0.025
 greet-0.023
 build-0.023
 gra-0.022
Token he
Token ID339
Feature activation-2.85e-02

Pos logit contributions
ummy+0.074
 important+0.072
 adorable+0.071
ience+0.070
ls+0.068
Neg logit contributions
 join-0.141
 meet-0.137
 build-0.129
 greet-0.127
 save-0.126
Token found
Token ID1043
Feature activation+1.67e-02

Pos logit contributions
ummy+0.055
 adorable+0.052
ls+0.052
 importantly+0.051
 important+0.051
Neg logit contributions
 meet-0.056
 join-0.055
 build-0.053
talk-0.052
 greet-0.051
Token Jill
Token ID22772
Feature activation-1.53e-02

Pos logit contributions
 join+0.075
 meet+0.074
 greet+0.069
 build+0.067
talk+0.067
Neg logit contributions
ummy-0.077
ience-0.077
ls-0.075
 important-0.075
 adorable-0.074
Token said
Token ID531
Feature activation-5.29e-03

Pos logit contributions
ummy+0.024
 important+0.023
 problem+0.021
 importantly+0.021
 adorable+0.021
Neg logit contributions
 join-0.035
 meet-0.035
talk-0.034
 build-0.034
 contain-0.034
Token.
Token ID13
Feature activation+1.60e-02

Pos logit contributions
ummy+0.007
ls+0.006
 adorable+0.006
 important+0.006
 problem+0.006
Neg logit contributions
 meet-0.014
 join-0.013
 greet-0.013
talk-0.013
 gra-0.013
Token "
Token ID366
Feature activation+2.19e-02

Pos logit contributions
 meet+0.040
 join+0.040
 greet+0.038
talk+0.038
 build+0.037
Neg logit contributions
ummy-0.020
 adorable-0.020
 important-0.020
ls-0.020
 problem-0.019
TokenI
Token ID40
Feature activation+5.91e-02

Pos logit contributions
 join+0.047
 greet+0.047
 meet+0.046
 contain+0.045
talk+0.044
Neg logit contributions
 adorable-0.033
 cute-0.031
ummy-0.030
 brightly-0.030
 important-0.029
Token have
Token ID423
Feature activation+5.09e-02

Pos logit contributions
 meet+0.095
 join+0.095
 greet+0.089
talk+0.088
 contain+0.086
Neg logit contributions
 important-0.126
 adorable-0.126
ummy-0.124
ls-0.123
 problem-0.122
Token a
Token ID257
Feature activation+8.18e-03

Pos logit contributions
 meet+0.102
 join+0.100
 greet+0.094
talk+0.092
 build+0.092
Neg logit contributions
ummy-0.094
ls-0.091
 adorable-0.088
 important-0.087
 importantly-0.087
Token surprise
Token ID5975
Feature activation-3.06e-02

Pos logit contributions
 join+0.019
 meet+0.018
 greet+0.017
 build+0.017
 make+0.016
Neg logit contributions
ls-0.014
 importantly-0.013
ience-0.013
ummy-0.012
dd-0.012
Token for
Token ID329
Feature activation+7.42e-02

Pos logit contributions
ummy+0.046
 adorable+0.045
 important+0.045
ls+0.045
 problem+0.043
Neg logit contributions
 join-0.069
 meet-0.069
 greet-0.066
talk-0.065
 build-0.064
Token you
Token ID345
Feature activation+3.58e-02

Pos logit contributions
 join+0.167
 meet+0.166
 greet+0.156
 build+0.153
talk+0.150
Neg logit contributions
ls-0.113
ummy-0.113
 important-0.113
 adorable-0.112
ience-0.110
Token.
Token ID13
Feature activation+2.72e-02

Pos logit contributions
 join+0.078
 meet+0.077
 greet+0.071
 build+0.070
 contain+0.069
Neg logit contributions
ls-0.057
ummy-0.057
 adorable-0.057
 important-0.057
 importantly-0.056
Token a
Token ID257
Feature activation+1.59e-03

Pos logit contributions
 join+0.161
 meet+0.160
 greet+0.151
talk+0.147
 build+0.147
Neg logit contributions
ummy-0.141
ls-0.136
 adorable-0.135
 important-0.134
ience-0.132
Token lot
Token ID1256
Feature activation-5.89e-02

Pos logit contributions
 meet+0.004
 join+0.004
 greet+0.004
talk+0.004
 build+0.004
Neg logit contributions
ummy-0.002
 adorable-0.002
 important-0.002
ls-0.002
ience-0.002
Token of
Token ID286
Feature activation+9.15e-02

Pos logit contributions
ls+0.078
 important+0.077
ummy+0.074
 adorable+0.073
 disgusting+0.072
Neg logit contributions
 meet-0.152
 greet-0.144
 join-0.143
 contain-0.139
talk-0.137
Token noise
Token ID7838
Feature activation-2.70e-02

Pos logit contributions
 join+0.223
 meet+0.222
 greet+0.213
 build+0.209
talk+0.209
Neg logit contributions
ummy-0.126
ls-0.118
 adorable-0.118
 important-0.116
ience-0.114
Token and
Token ID290
Feature activation-5.44e-02

Pos logit contributions
ummy+0.037
 adorable+0.035
 important+0.035
ls+0.035
ience+0.034
Neg logit contributions
 join-0.066
 meet-0.065
 greet-0.062
talk-0.061
 build-0.061
Token wake
Token ID7765
Feature activation+5.26e-03

Pos logit contributions
 adorable+0.094
 disgusting+0.094
 problem+0.092
ls+0.092
 important+0.092
Neg logit contributions
talk-0.106
 meet-0.104
 join-0.103
 greet-0.100
 contain-0.097
Token up
Token ID510
Feature activation-1.71e-03

Pos logit contributions
 join+0.011
 meet+0.010
 greet+0.010
 build+0.010
talk+0.010
Neg logit contributions
 importantly-0.009
ummy-0.009
 adorable-0.009
 important-0.009
ience-0.009
Token Mom
Token ID11254
Feature activation+2.73e-02

Pos logit contributions
ummy+0.003
 adorable+0.003
 important+0.003
ls+0.003
 problem+0.002
Neg logit contributions
 join-0.004
 meet-0.004
 greet-0.004
talk-0.004
 build-0.004
Token.
Token ID13
Feature activation+2.25e-02

Pos logit contributions
 join+0.079
 meet+0.079
 contain+0.078
 greet+0.077
talk+0.076
Neg logit contributions
ls-0.022
ummy-0.022
 adorable-0.021
 problem-0.019
 important-0.019
Token
Token ID198
Feature activation+4.79e-02

Pos logit contributions
 meet+0.042
 join+0.042
 greet+0.039
talk+0.038
 build+0.037
Neg logit contributions
ience-0.043
ls-0.043
 important-0.043
ummy-0.043
 adorable-0.043
Token
Token ID198
Feature activation+4.35e-02

Pos logit contributions
 join+0.106
 meet+0.106
talk+0.103
 contain+0.102
 greet+0.102
Neg logit contributions
ummy-0.073
 important-0.072
ls-0.071
 adorable-0.071
 cute-0.067
Token the
Token ID262
Feature activation+6.82e-02

Pos logit contributions
 meet+0.174
talk+0.172
 join+0.171
 greet+0.167
 contain+0.164
Neg logit contributions
 important-0.119
ummy-0.119
 adorable-0.117
ls-0.113
 problem-0.112
Token fire
Token ID2046
Feature activation+3.96e-02

Pos logit contributions
 meet+0.148
 join+0.148
 greet+0.140
talk+0.139
 build+0.136
Neg logit contributions
ummy-0.110
 adorable-0.108
 important-0.107
ls-0.107
 problem-0.103
Tokenmen
Token ID3653
Feature activation+1.01e-01

Pos logit contributions
 meet+0.077
 join+0.077
 greet+0.075
talk+0.074
 contain+0.074
Neg logit contributions
ummy-0.071
 important-0.068
 adorable-0.068
 problem-0.068
ls-0.068
Token.
Token ID13
Feature activation+2.31e-02

Pos logit contributions
talk+0.230
 meet+0.227
 join+0.226
 greet+0.225
 contain+0.221
Neg logit contributions
ummy-0.134
 important-0.131
 problem-0.130
 adorable-0.130
ls-0.127
Token They
Token ID1119
Feature activation+3.71e-02

Pos logit contributions
 meet+0.048
 join+0.048
talk+0.047
 greet+0.046
 contain+0.046
Neg logit contributions
ummy-0.038
 adorable-0.037
 important-0.037
ience-0.035
 importantly-0.035
Token can
Token ID460
Feature activation+3.87e-02

Pos logit contributions
 join+0.053
talk+0.052
 meet+0.051
 contain+0.050
 greet+0.050
Neg logit contributions
 important-0.083
 adorable-0.082
 problem-0.081
ience-0.079
 cute-0.079
Token help
Token ID1037
Feature activation+6.26e-02

Pos logit contributions
talk+0.046
 join+0.044
 contain+0.043
 meet+0.042
 greet+0.041
Neg logit contributions
 adorable-0.093
ls-0.092
ummy-0.092
 important-0.091
 cute-0.091
Token us
Token ID514
Feature activation+8.29e-02

Pos logit contributions
 meet+0.134
 join+0.134
talk+0.133
 greet+0.130
 contain+0.129
Neg logit contributions
 important-0.099
ummy-0.097
 adorable-0.095
ls-0.094
 problem-0.091
Token get
Token ID651
Feature activation+2.41e-02

Pos logit contributions
talk+0.116
 contain+0.103
 greet+0.101
 meet+0.101
 gra+0.100
Neg logit contributions
 important-0.187
 adorable-0.183
 disgusting-0.177
 cute-0.176
 lazy-0.174
Token Spot
Token ID15899
Feature activation+4.87e-02

Pos logit contributions
 meet+0.047
 join+0.047
 greet+0.044
talk+0.044
 build+0.043
Neg logit contributions
ummy-0.044
 adorable-0.043
 important-0.043
ls-0.043
 problem-0.041
Token out
Token ID503
Feature activation+6.50e-02

Pos logit contributions
 join+0.101
 meet+0.101
 greet+0.098
talk+0.097
 contain+0.094
Neg logit contributions
 important-0.081
 adorable-0.080
ummy-0.080
ls-0.079
 problem-0.077
Token His
Token ID2399
Feature activation-4.10e-02

Pos logit contributions
ummy+0.001
 adorable+0.001
 important+0.001
ls+0.001
 problem+0.001
Neg logit contributions
 meet-0.002
 join-0.002
 greet-0.002
talk-0.002
 build-0.001
Token mom
Token ID1995
Feature activation+3.92e-02

Pos logit contributions
ience+0.068
ls+0.066
ummy+0.062
 importantly+0.059
 important+0.058
Neg logit contributions
 join-0.091
 meet-0.090
 greet-0.085
 build-0.084
talk-0.081
Tokenma
Token ID2611
Feature activation+2.49e-02

Pos logit contributions
 join+0.085
 meet+0.085
 greet+0.081
talk+0.079
 build+0.078
Neg logit contributions
ummy-0.063
ls-0.062
 important-0.062
 adorable-0.061
 problem-0.060
Token was
Token ID373
Feature activation-5.85e-02

Pos logit contributions
 join+0.055
 meet+0.055
 greet+0.053
 build+0.051
talk+0.050
Neg logit contributions
ummy-0.040
 important-0.039
ls-0.039
 adorable-0.038
 problem-0.038
Token in
Token ID287
Feature activation+8.29e-02

Pos logit contributions
ummy+0.071
ls+0.070
 important+0.068
 adorable+0.068
ience+0.066
Neg logit contributions
 join-0.150
 meet-0.150
 greet-0.144
talk-0.141
 build-0.140
Token the
Token ID262
Feature activation+3.75e-02

Pos logit contributions
talk+0.174
 meet+0.174
 join+0.170
 build+0.164
 greet+0.160
Neg logit contributions
 adorable-0.135
ummy-0.131
 important-0.130
 problem-0.128
 cute-0.126
Token kitchen
Token ID9592
Feature activation-1.81e-03

Pos logit contributions
 meet+0.078
 join+0.077
 greet+0.074
talk+0.072
 build+0.071
Neg logit contributions
ummy-0.065
ls-0.064
 important-0.063
 adorable-0.062
ience-0.061
Token and
Token ID290
Feature activation-6.28e-02

Pos logit contributions
ls+0.002
ummy+0.002
 adorable+0.002
 important+0.002
ience+0.002
Neg logit contributions
 join-0.005
 meet-0.005
 greet-0.004
 build-0.004
talk-0.004
Token saw
Token ID2497
Feature activation+9.41e-03

Pos logit contributions
ummy+0.095
ls+0.090
 important+0.089
 adorable+0.087
 problem+0.087
Neg logit contributions
 join-0.147
 meet-0.146
 greet-0.141
 build-0.138
talk-0.131
Token him
Token ID683
Feature activation+8.05e-03

Pos logit contributions
 meet+0.017
 join+0.017
 greet+0.016
 build+0.016
talk+0.016
Neg logit contributions
ummy-0.017
ls-0.017
 important-0.017
ience-0.017
 adorable-0.017
Token holding
Token ID4769
Feature activation+4.47e-02

Pos logit contributions
 join+0.018
 meet+0.018
 greet+0.017
 build+0.017
talk+0.017
Neg logit contributions
ummy-0.012
ls-0.012
 important-0.012
 adorable-0.012
ience-0.011
Token in
Token ID287
Feature activation+8.01e-02

Pos logit contributions
 join+0.036
 meet+0.035
 greet+0.034
talk+0.033
 build+0.033
Neg logit contributions
ummy-0.025
 adorable-0.024
 important-0.024
ls-0.024
 problem-0.023
Token the
Token ID262
Feature activation+3.11e-02

Pos logit contributions
 meet+0.173
 join+0.172
talk+0.168
 greet+0.161
 build+0.160
Neg logit contributions
 adorable-0.128
ummy-0.124
 disgusting-0.123
 important-0.123
ls-0.120
Token ground
Token ID2323
Feature activation+1.48e-02

Pos logit contributions
 join+0.058
 greet+0.053
 meet+0.053
talk+0.052
 contain+0.051
Neg logit contributions
 adorable-0.061
 disgusting-0.061
 important-0.060
ummy-0.058
 problem-0.058
Token.
Token ID13
Feature activation-1.20e-03

Pos logit contributions
 join+0.033
 meet+0.033
 greet+0.031
talk+0.031
 build+0.031
Neg logit contributions
ummy-0.023
 adorable-0.022
 important-0.022
ls-0.022
 problem-0.021
Token It
Token ID632
Feature activation-1.23e-02

Pos logit contributions
ls+0.002
 important+0.002
 problem+0.002
 adorable+0.002
ience+0.002
Neg logit contributions
 join-0.002
 meet-0.002
 greet-0.002
 build-0.002
 gra-0.002
Token was
Token ID373
Feature activation-7.53e-02

Pos logit contributions
ummy+0.029
 importantly+0.025
olly+0.025
ip+0.023
ement+0.023
Neg logit contributions
 make-0.020
 join-0.020
 meet-0.020
 build-0.019
 greet-0.018
Token tall
Token ID7331
Feature activation-1.17e-01

Pos logit contributions
ience+0.128
ls+0.125
ummy+0.121
opus+0.117
olly+0.113
Neg logit contributions
 join-0.165
 meet-0.162
 build-0.155
 greet-0.154
 make-0.151
Token and
Token ID290
Feature activation-1.05e-01

Pos logit contributions
ummy+0.165
ls+0.164
 importantly+0.156
olly+0.154
ience+0.150
Neg logit contributions
 join-0.251
 meet-0.247
 build-0.246
 greet-0.238
 make-0.236
Token gray
Token ID12768
Feature activation-1.06e-01

Pos logit contributions
ummy+0.176
ls+0.149
ience+0.149
olly+0.148
opus+0.147
Neg logit contributions
 join-0.231
 meet-0.228
 make-0.222
 build-0.220
 greet-0.218
Token and
Token ID290
Feature activation-9.64e-02

Pos logit contributions
ience+0.163
ls+0.163
 important+0.161
 adorable+0.155
ummy+0.154
Neg logit contributions
 join-0.230
 meet-0.223
 build-0.215
 greet-0.211
talk-0.208
Token didn
Token ID1422
Feature activation-5.94e-02

Pos logit contributions
ummy+0.170
ls+0.150
ience+0.148
ement+0.145
olly+0.144
Neg logit contributions
 join-0.198
 meet-0.196
 make-0.190
 build-0.188
 greet-0.185
Token day
Token ID1110
Feature activation-4.86e-02

Pos logit contributions
ummy+0.000
ience+0.000
ls+0.000
 importantly+0.000
ement+0.000
Neg logit contributions
 meet-0.000
 join-0.000
 build-0.000
 greet-0.000
talk-0.000
Token,
Token ID11
Feature activation-7.72e-02

Pos logit contributions
 adorable+0.061
ience+0.061
ummy+0.060
 important+0.060
 importantly+0.055
Neg logit contributions
 join-0.128
 meet-0.123
 build-0.114
 greet-0.113
talk-0.112
Token Tom
Token ID4186
Feature activation+2.70e-02

Pos logit contributions
ummy+0.149
ience+0.145
 important+0.141
ls+0.140
 adorable+0.139
Neg logit contributions
 join-0.152
 meet-0.147
 greet-0.135
 build-0.134
talk-0.130
Token had
Token ID550
Feature activation+2.44e-02

Pos logit contributions
 meet+0.060
 join+0.056
 build+0.054
talk+0.054
 greet+0.052
Neg logit contributions
ummy-0.048
 importantly-0.047
ement-0.045
 important-0.044
 adorable-0.043
Token an
Token ID281
Feature activation-3.73e-02

Pos logit contributions
 meet+0.052
 join+0.052
 greet+0.049
talk+0.049
 build+0.048
Neg logit contributions
ummy-0.041
 adorable-0.040
ls-0.040
 important-0.040
 problem-0.038
Token idea
Token ID2126
Feature activation-6.54e-02

Pos logit contributions
ls+0.072
ience+0.065
 importantly+0.065
 sudden+0.063
ummy+0.063
Neg logit contributions
 meet-0.075
 join-0.072
 build-0.072
 greet-0.069
 contain-0.069
Token.
Token ID13
Feature activation+1.30e-02

Pos logit contributions
ummy+0.094
 adorable+0.093
 important+0.093
ls+0.092
 problem+0.089
Neg logit contributions
 meet-0.152
 join-0.151
 greet-0.145
talk-0.143
 build-0.140
Token He
Token ID679
Feature activation-6.69e-03

Pos logit contributions
 meet+0.031
 join+0.031
 greet+0.030
talk+0.029
 contain+0.029
Neg logit contributions
ummy-0.018
 adorable-0.018
 important-0.018
ls-0.017
 problem-0.017
Token wanted
Token ID2227
Feature activation+1.08e-02

Pos logit contributions
ummy+0.015
ement+0.013
 importantly+0.013
 accidents+0.013
 adorable+0.012
Neg logit contributions
 meet-0.013
 join-0.012
talk-0.012
 build-0.012
 greet-0.012
Token to
Token ID284
Feature activation+1.10e-01

Pos logit contributions
 meet+0.022
 join+0.022
 greet+0.021
talk+0.021
 contain+0.020
Neg logit contributions
 important-0.019
ummy-0.018
 adorable-0.018
ls-0.018
 problem-0.018
Token scare
Token ID19437
Feature activation+1.86e-02

Pos logit contributions
talk+0.147
iece+0.126
 contain+0.124
oes+0.118
 meet+0.118
Neg logit contributions
 adorable-0.254
 cute-0.244
 important-0.243
 expensive-0.240
 disgusting-0.237
Token giant
Token ID6175
Feature activation-7.89e-02

Pos logit contributions
ement+0.053
ls+0.053
der+0.050
ummy+0.050
ience+0.049
Neg logit contributions
 join-0.096
 meet-0.095
 build-0.087
 greet-0.086
 gra-0.086
Token tree
Token ID5509
Feature activation-2.43e-02

Pos logit contributions
ls+0.141
ummy+0.140
 important+0.138
ience+0.136
 tempting+0.134
Neg logit contributions
 join-0.151
 meet-0.148
 build-0.139
 greet-0.133
 gra-0.133
Token in
Token ID287
Feature activation+6.81e-02

Pos logit contributions
ummy+0.040
ls+0.040
 adorable+0.039
 important+0.039
ience+0.039
Neg logit contributions
 join-0.051
 meet-0.051
 build-0.047
 greet-0.047
talk-0.047
Token a
Token ID257
Feature activation-2.84e-02

Pos logit contributions
 join+0.151
 meet+0.149
talk+0.147
 greet+0.142
 contain+0.141
Neg logit contributions
 adorable-0.108
 important-0.104
ummy-0.104
 disgusting-0.100
 cute-0.100
Token hidden
Token ID7104
Feature activation-7.21e-02

Pos logit contributions
ummy+0.044
ls+0.044
 importantly+0.044
ement+0.043
der+0.043
Neg logit contributions
 meet-0.067
 join-0.062
 greet-0.060
 build-0.060
 gra-0.059
Token forest
Token ID8222
Feature activation-3.38e-02

Pos logit contributions
ummy+0.120
ience+0.119
ls+0.119
 adorable+0.118
 important+0.116
Neg logit contributions
 join-0.149
 meet-0.147
 greet-0.141
talk-0.137
 build-0.137
Token.
Token ID13
Feature activation-1.07e-02

Pos logit contributions
ummy+0.052
 adorable+0.050
 important+0.050
ls+0.050
 problem+0.048
Neg logit contributions
 meet-0.076
 join-0.076
 greet-0.072
talk-0.071
 build-0.070
Token Every
Token ID3887
Feature activation-6.79e-03

Pos logit contributions
ummy+0.017
 adorable+0.017
 important+0.017
 disgusting+0.016
 cute+0.016
Neg logit contributions
 join-0.023
 meet-0.023
talk-0.022
 greet-0.022
 contain-0.021
Token night
Token ID1755
Feature activation-6.47e-02

Pos logit contributions
ls+0.011
ummy+0.011
 adorable+0.011
ience+0.011
 importantly+0.011
Neg logit contributions
 meet-0.015
 join-0.014
 greet-0.013
talk-0.013
 build-0.013
Token an
Token ID281
Feature activation-4.78e-02

Pos logit contributions
ummy+0.057
 adorable+0.055
 important+0.055
ls+0.054
 problem+0.051
Neg logit contributions
 join-0.187
 meet-0.187
 greet-0.180
talk-0.179
 build-0.175
Token owl
Token ID39610
Feature activation+1.30e-02

Pos logit contributions
ls+0.079
ience+0.069
 importantly+0.064
ip+0.063
 sudden+0.063
Neg logit contributions
 meet-0.106
 contain-0.103
 join-0.099
 greet-0.097
 build-0.096
Token large
Token ID1588
Feature activation-8.89e-02

Pos logit contributions
ummy+0.041
ls+0.041
 importantly+0.040
ement+0.040
ience+0.039
Neg logit contributions
 meet-0.059
 join-0.058
 build-0.055
 greet-0.055
 gra-0.055
Token clock
Token ID8801
Feature activation-6.93e-03

Pos logit contributions
 importantly+0.180
ls+0.179
ience+0.173
 important+0.167
 adorable+0.167
Neg logit contributions
 meet-0.151
 join-0.149
 build-0.147
 greet-0.141
 gra-0.138
Token.
Token ID13
Feature activation-8.44e-03

Pos logit contributions
ummy+0.010
ls+0.010
 adorable+0.010
 important+0.009
 problem+0.009
Neg logit contributions
 meet-0.016
 join-0.016
 greet-0.015
talk-0.015
 contain-0.015
Token It
Token ID632
Feature activation-1.57e-02

Pos logit contributions
ls+0.018
ummy+0.018
 important+0.018
 adorable+0.018
 problem+0.018
Neg logit contributions
 join-0.014
 meet-0.014
 greet-0.013
 build-0.013
talk-0.012
Token was
Token ID373
Feature activation-7.65e-02

Pos logit contributions
ummy+0.034
 importantly+0.030
ement+0.029
ls+0.028
olly+0.028
Neg logit contributions
 meet-0.029
 join-0.028
 make-0.028
 build-0.028
 gra-0.026
Token big
Token ID1263
Feature activation-8.97e-02

Pos logit contributions
ls+0.128
ience+0.126
ummy+0.120
opus+0.114
ement+0.109
Neg logit contributions
 join-0.170
 meet-0.169
 build-0.162
 greet-0.160
 gra-0.157
Token and
Token ID290
Feature activation-1.13e-01

Pos logit contributions
ls+0.133
 important+0.131
ummy+0.131
omo+0.130
 sudden+0.126
Neg logit contributions
 join-0.176
 build-0.175
 meet-0.175
 make-0.170
 greet-0.166
Token made
Token ID925
Feature activation+2.40e-02

Pos logit contributions
ummy+0.179
ls+0.155
olly+0.154
ement+0.153
der+0.145
Neg logit contributions
 meet-0.253
 make-0.251
 join-0.251
 build-0.251
 gra-0.245
Token a
Token ID257
Feature activation-3.17e-02

Pos logit contributions
 join+0.046
 meet+0.046
 greet+0.043
 build+0.042
talk+0.042
Neg logit contributions
ummy-0.045
ls-0.044
 important-0.044
 adorable-0.044
ience-0.043
Token loud
Token ID7812
Feature activation-6.70e-02

Pos logit contributions
ls+0.043
ummy+0.041
ience+0.040
 importantly+0.039
ement+0.037
Neg logit contributions
 join-0.078
 meet-0.078
 build-0.073
 greet-0.073
 gra-0.072
Token tick
Token ID4378
Feature activation-8.84e-02

Pos logit contributions
ls+0.104
ummy+0.104
 importantly+0.101
 important+0.099
 adorable+0.099
Neg logit contributions
 meet-0.148
 join-0.148
 build-0.141
 greet-0.140
 gra-0.136
Token ground
Token ID2323
Feature activation+1.68e-02

Pos logit contributions
 join+0.095
 meet+0.095
 greet+0.090
talk+0.089
 build+0.087
Neg logit contributions
ummy-0.072
 adorable-0.071
 important-0.071
ls-0.070
 problem-0.068
Token.
Token ID13
Feature activation+1.45e-02

Pos logit contributions
 join+0.042
 meet+0.042
 greet+0.040
talk+0.039
 build+0.039
Neg logit contributions
ummy-0.022
 adorable-0.021
 important-0.021
ls-0.021
ience-0.020
Token The
Token ID383
Feature activation+1.26e-02

Pos logit contributions
 meet+0.029
 join+0.029
 greet+0.028
talk+0.027
 build+0.027
Neg logit contributions
ummy-0.025
 adorable-0.025
 important-0.025
ls-0.025
 problem-0.024
Token feather
Token ID17972
Feature activation+8.11e-03

Pos logit contributions
 meet+0.027
 join+0.027
 greet+0.025
 build+0.025
talk+0.025
Neg logit contributions
ummy-0.021
ls-0.020
 important-0.019
ience-0.019
 importantly-0.019
Token flew
Token ID13112
Feature activation+2.72e-02

Pos logit contributions
 join+0.017
 meet+0.017
 build+0.016
 greet+0.016
talk+0.016
Neg logit contributions
ummy-0.014
 important-0.013
 adorable-0.013
 importantly-0.013
ience-0.012
Token away
Token ID1497
Feature activation-4.67e-02

Pos logit contributions
 join+0.048
 meet+0.046
 greet+0.043
talk+0.043
 build+0.042
Neg logit contributions
ummy-0.056
 important-0.054
 adorable-0.054
 importantly-0.054
ience-0.054
Token in
Token ID287
Feature activation+8.83e-02

Pos logit contributions
ience+0.072
ummy+0.068
 importantly+0.067
 important+0.067
 adorable+0.066
Neg logit contributions
 join-0.108
 meet-0.101
 contain-0.098
 greet-0.097
 build-0.096
Token the
Token ID262
Feature activation+2.88e-02

Pos logit contributions
 meet+0.183
 join+0.181
talk+0.174
 greet+0.172
 build+0.168
Neg logit contributions
ummy-0.150
 adorable-0.149
 important-0.147
ls-0.145
 problem-0.143
Token wind
Token ID2344
Feature activation+1.24e-02

Pos logit contributions
 join+0.069
 meet+0.068
 greet+0.065
talk+0.064
 contain+0.063
Neg logit contributions
 adorable-0.040
 important-0.040
ummy-0.040
 problem-0.039
ls-0.039
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
 join+0.031
 meet+0.031
 greet+0.029
talk+0.029
 build+0.029
Neg logit contributions
ummy-0.015
 important-0.015
 adorable-0.015
ls-0.015
ience-0.014
Token L
Token ID406
Feature activation-4.34e-02

Pos logit contributions
 meet+0.027
 join+0.027
 greet+0.025
talk+0.025
 build+0.025
Neg logit contributions
ummy-0.022
 adorable-0.021
 important-0.021
ls-0.021
 problem-0.020
Token that
Token ID326
Feature activation-2.51e-03

Pos logit contributions
 meet+0.147
 join+0.147
 greet+0.141
talk+0.140
 contain+0.135
Neg logit contributions
ummy-0.083
 important-0.081
 adorable-0.080
ls-0.077
 problem-0.076
Token made
Token ID925
Feature activation+4.03e-02

Pos logit contributions
 important+0.003
 adorable+0.003
ummy+0.003
ls+0.003
 disgusting+0.003
Neg logit contributions
 meet-0.006
 join-0.006
 greet-0.006
talk-0.006
 contain-0.006
Token her
Token ID607
Feature activation+1.66e-02

Pos logit contributions
 meet+0.092
 join+0.091
talk+0.088
 greet+0.087
 contain+0.086
Neg logit contributions
 adorable-0.060
 important-0.060
ummy-0.060
ls-0.058
 problem-0.056
Token feel
Token ID1254
Feature activation-5.43e-02

Pos logit contributions
 join+0.037
 meet+0.036
 greet+0.035
talk+0.034
 build+0.034
Neg logit contributions
ummy-0.026
ls-0.026
 important-0.026
 adorable-0.026
ience-0.025
Token very
Token ID845
Feature activation-1.92e-01

Pos logit contributions
ummy+0.065
ls+0.061
ience+0.055
opus+0.054
omo+0.052
Neg logit contributions
 join-0.145
 meet-0.142
 greet-0.137
 build-0.136
 contain-0.133
Token uncomfortable
Token ID12916
Feature activation-1.17e-01

Pos logit contributions
ummy+0.216
 importantly+0.202
ience+0.199
omo+0.194
opus+0.192
Neg logit contributions
 join-0.486
 build-0.475
 meet-0.474
 contain-0.467
 make-0.466
Token.
Token ID13
Feature activation+1.42e-03

Pos logit contributions
ls+0.195
ummy+0.194
ience+0.190
 adorable+0.190
 important+0.186
Neg logit contributions
 join-0.245
 meet-0.241
 greet-0.227
 build-0.226
talk-0.225
Token
Token ID220
Feature activation-5.05e-02

Pos logit contributions
 meet+0.003
 join+0.003
 greet+0.003
talk+0.003
 build+0.002
Neg logit contributions
ummy-0.003
 adorable-0.003
 important-0.003
ls-0.003
ience-0.002
Token
Token ID198
Feature activation+2.89e-02

Pos logit contributions
ummy+0.069
 important+0.065
ls+0.064
 adorable+0.064
 cute+0.059
Neg logit contributions
 meet-0.122
 join-0.121
 greet-0.120
 contain-0.120
talk-0.120
Token
Token ID198
Feature activation+2.49e-02

Pos logit contributions
 join+0.065
 meet+0.063
 greet+0.062
 contain+0.061
 build+0.061
Neg logit contributions
 important-0.043
ummy-0.043
 adorable-0.043
ls-0.043
 problem-0.040
TokenFaith
Token ID45536
Feature activation+1.03e-02

Pos logit contributions
 meet+0.048
 join+0.047
talk+0.044
 greet+0.044
 build+0.042
Neg logit contributions
ummy-0.047
ls-0.045
 adorable-0.045
 important-0.044
 problem-0.044
Token But
Token ID887
Feature activation-3.80e-02

Pos logit contributions
ummy+0.010
 adorable+0.010
 important+0.010
ls+0.010
 problem+0.009
Neg logit contributions
 meet-0.013
 join-0.013
 greet-0.013
talk-0.013
 build-0.012
Token she
Token ID673
Feature activation-2.79e-02

Pos logit contributions
ummy+0.056
ls+0.052
 adorable+0.051
ience+0.051
 important+0.050
Neg logit contributions
 meet-0.091
 join-0.090
 greet-0.085
 build-0.084
talk-0.082
Token was
Token ID373
Feature activation-5.47e-02

Pos logit contributions
ummy+0.056
 importantly+0.051
ls+0.051
 adorable+0.050
ement+0.048
Neg logit contributions
 build-0.053
 meet-0.052
 join-0.051
 make-0.050
 rebuild-0.048
Token having
Token ID1719
Feature activation+3.04e-02

Pos logit contributions
ummy+0.075
ls+0.070
ience+0.066
 importantly+0.061
 adorable+0.061
Neg logit contributions
 join-0.142
 meet-0.140
 build-0.133
 greet-0.130
 contain-0.128
Token so
Token ID523
Feature activation-1.24e-01

Pos logit contributions
 join+0.069
 meet+0.067
 build+0.063
 make+0.061
 greet+0.060
Neg logit contributions
ls-0.050
ummy-0.049
ience-0.048
opus-0.047
 importantly-0.043
Token much
Token ID881
Feature activation-1.19e-01

Pos logit contributions
ummy+0.133
ience+0.125
 adorable+0.119
ls+0.119
 problem+0.113
Neg logit contributions
 meet-0.332
 join-0.330
talk-0.317
 build-0.309
 greet-0.304
Token fun
Token ID1257
Feature activation-1.00e-01

Pos logit contributions
ience+0.145
ummy+0.129
ls+0.123
 adorable+0.120
opus+0.115
Neg logit contributions
 join-0.325
 meet-0.320
 build-0.301
 greet-0.294
talk-0.290
Token that
Token ID326
Feature activation-1.87e-02

Pos logit contributions
ience+0.168
ummy+0.165
 adorable+0.161
ls+0.160
 important+0.155
Neg logit contributions
 join-0.217
 meet-0.205
 build-0.201
 greet-0.187
 make-0.186
Token she
Token ID673
Feature activation-2.84e-02

Pos logit contributions
ummy+0.024
ls+0.023
ience+0.023
 adorable+0.021
 importantly+0.021
Neg logit contributions
 meet-0.048
 join-0.048
 build-0.044
 greet-0.044
 make-0.043
Token didn
Token ID1422
Feature activation-3.01e-02

Pos logit contributions
ummy+0.061
 importantly+0.053
ement+0.052
 adorable+0.051
ls+0.050
Neg logit contributions
 meet-0.056
 join-0.055
 build-0.055
 make-0.051
talk-0.049
Token't
Token ID470
Feature activation-2.59e-02

Pos logit contributions
 adorable+0.068
ummy+0.067
ls+0.067
 important+0.067
ience+0.067
Neg logit contributions
 join-0.046
 meet-0.045
 build-0.040
 greet-0.040
talk-0.039
Token fly
Token ID6129
Feature activation-2.21e-02

Pos logit contributions
ls+0.061
ement+0.060
ience+0.058
der+0.057
ummy+0.057
Neg logit contributions
 join-0.107
 meet-0.105
 build-0.098
 arrive-0.096
 greet-0.096
Token.
Token ID13
Feature activation-1.23e-02

Pos logit contributions
ummy+0.032
 adorable+0.032
 important+0.032
ls+0.031
 problem+0.030
Neg logit contributions
 meet-0.051
 join-0.051
 greet-0.049
talk-0.048
 build-0.047
Token The
Token ID383
Feature activation-1.07e-02

Pos logit contributions
ummy+0.022
 adorable+0.022
 important+0.021
 cute+0.020
 importantly+0.020
Neg logit contributions
 join-0.024
 meet-0.024
talk-0.024
 greet-0.023
 contain-0.022
Token fly
Token ID6129
Feature activation-3.83e-03

Pos logit contributions
 adorable+0.023
 important+0.023
ls+0.022
 problem+0.022
ummy+0.022
Neg logit contributions
 join-0.017
 meet-0.017
talk-0.017
 greet-0.016
 contain-0.016
Token was
Token ID373
Feature activation-7.36e-02

Pos logit contributions
ummy+0.006
 adorable+0.006
ls+0.006
 important+0.005
ience+0.005
Neg logit contributions
 meet-0.009
 join-0.009
 greet-0.008
talk-0.008
 build-0.008
Token very
Token ID845
Feature activation-2.01e-01

Pos logit contributions
ummy+0.097
ience+0.095
ls+0.092
ement+0.083
 importantly+0.083
Neg logit contributions
 meet-0.189
 join-0.187
 build-0.177
 greet-0.175
 contain-0.171
Token clean
Token ID3424
Feature activation-4.26e-02

Pos logit contributions
ience+0.268
ummy+0.264
ement+0.253
der+0.245
 accidents+0.243
Neg logit contributions
 join-0.507
 meet-0.504
 build-0.465
 greet-0.463
 contain-0.461
Token.
Token ID13
Feature activation-7.38e-03

Pos logit contributions
ls+0.070
ience+0.068
 importantly+0.068
ummy+0.066
 problem+0.064
Neg logit contributions
 join-0.087
 meet-0.085
 build-0.084
 make-0.081
 contain-0.080
Token One
Token ID1881
Feature activation-1.51e-02

Pos logit contributions
ummy+0.013
 adorable+0.012
 important+0.012
 importantly+0.012
 cute+0.011
Neg logit contributions
 join-0.015
 meet-0.015
talk-0.015
 greet-0.015
 contain-0.014
Token day
Token ID1110
Feature activation-5.75e-02

Pos logit contributions
ls+0.024
ummy+0.024
ience+0.023
 importantly+0.022
 adorable+0.022
Neg logit contributions
 meet-0.033
 join-0.033
 greet-0.031
 build-0.031
talk-0.030
Token it
Token ID340
Feature activation-1.59e-02

Pos logit contributions
ience+0.077
 important+0.075
 adorable+0.074
ummy+0.073
 importantly+0.070
Neg logit contributions
 join-0.145
 meet-0.139
 build-0.131
 greet-0.126
 arrive-0.126
Token.
Token ID13
Feature activation+7.79e-04

Pos logit contributions
 meet+0.027
 join+0.027
 greet+0.026
talk+0.026
 build+0.025
Neg logit contributions
ummy-0.019
 adorable-0.018
 important-0.018
ls-0.018
 problem-0.018
Token They
Token ID1119
Feature activation+2.60e-02

Pos logit contributions
 meet+0.001
 join+0.001
 build+0.001
 greet+0.001
talk+0.001
Neg logit contributions
ls-0.001
 adorable-0.001
 important-0.001
ummy-0.001
ience-0.001
Token both
Token ID1111
Feature activation-1.86e-02

Pos logit contributions
 meet+0.052
 join+0.052
 greet+0.049
talk+0.049
 build+0.048
Neg logit contributions
ummy-0.046
 adorable-0.045
 important-0.045
ls-0.045
 problem-0.044
Token had
Token ID550
Feature activation+2.73e-02

Pos logit contributions
ummy+0.031
 adorable+0.030
 important+0.030
ls+0.030
 problem+0.029
Neg logit contributions
 meet-0.040
 join-0.040
 greet-0.038
talk-0.038
 build-0.037
Token a
Token ID257
Feature activation-8.51e-03

Pos logit contributions
 meet+0.068
 join+0.068
 greet+0.064
 build+0.063
talk+0.063
Neg logit contributions
ummy-0.037
ls-0.036
 adorable-0.035
 important-0.034
ience-0.034
Token great
Token ID1049
Feature activation-1.17e-01

Pos logit contributions
ummy+0.015
ls+0.014
der+0.013
ement+0.013
ience+0.013
Neg logit contributions
 join-0.020
 meet-0.020
 build-0.018
 greet-0.017
 save-0.017
Token time
Token ID640
Feature activation-2.18e-03

Pos logit contributions
ls+0.219
ience+0.215
ummy+0.204
 importantly+0.202
 adorable+0.195
Neg logit contributions
 meet-0.241
 join-0.238
 build-0.221
 greet-0.209
talk-0.205
Token,
Token ID11
Feature activation-8.45e-02

Pos logit contributions
ummy+0.004
ience+0.003
 adorable+0.003
 important+0.003
ls+0.003
Neg logit contributions
 join-0.005
 meet-0.005
 build-0.004
 greet-0.004
 rebuild-0.004
Token and
Token ID290
Feature activation-5.56e-02

Pos logit contributions
ummy+0.107
ls+0.103
ience+0.098
 important+0.098
 adorable+0.097
Neg logit contributions
 join-0.221
 meet-0.220
 build-0.207
 greet-0.206
 contain-0.200
Token Tim
Token ID5045
Feature activation+4.83e-03

Pos logit contributions
ummy+0.088
ls+0.084
 adorable+0.084
 important+0.083
 problem+0.082
Neg logit contributions
 meet-0.125
 join-0.125
 greet-0.117
 build-0.115
talk-0.115
Token learned
Token ID4499
Feature activation+1.75e-02

Pos logit contributions
 meet+0.011
 join+0.011
 build+0.010
talk+0.010
 greet+0.010
Neg logit contributions
ummy-0.008
ls-0.008
 importantly-0.007
 adorable-0.007
 important-0.007
Token.
Token ID13
Feature activation+2.99e-03

Pos logit contributions
 join+0.007
 meet+0.007
 greet+0.007
talk+0.007
 build+0.007
Neg logit contributions
ummy-0.005
 adorable-0.005
 important-0.005
ls-0.005
 problem-0.005
Token "
Token ID366
Feature activation+1.11e-02

Pos logit contributions
 join+0.007
 meet+0.007
talk+0.007
 greet+0.007
 contain+0.007
Neg logit contributions
ummy-0.004
 adorable-0.004
ls-0.003
 important-0.003
 cute-0.003
TokenYou
Token ID1639
Feature activation+2.20e-02

Pos logit contributions
 join+0.021
 build+0.020
 meet+0.020
 greet+0.020
talk+0.020
Neg logit contributions
 adorable-0.020
 cute-0.019
 important-0.018
 disgusting-0.018
 brightly-0.018
Token look
Token ID804
Feature activation-7.46e-02

Pos logit contributions
 join+0.026
 meet+0.026
talk+0.024
 greet+0.024
 contain+0.023
Neg logit contributions
 adorable-0.056
 important-0.055
ummy-0.055
ls-0.055
 problem-0.054
Token so
Token ID523
Feature activation-1.22e-01

Pos logit contributions
ummy+0.119
ience+0.114
ls+0.111
omo+0.111
ittens+0.108
Neg logit contributions
 join-0.173
 meet-0.170
 greet-0.158
 contain-0.149
 make-0.147
Token shiny
Token ID22441
Feature activation-1.22e-01

Pos logit contributions
ummy+0.130
der+0.129
ience+0.124
 importantly+0.121
ement+0.121
Neg logit contributions
 meet-0.337
 join-0.324
 build-0.301
 greet-0.299
 make-0.290
Token and
Token ID290
Feature activation-6.59e-02

Pos logit contributions
omo+0.268
 sudden+0.263
 importantly+0.254
ittens+0.240
 problem+0.240
Neg logit contributions
 make-0.155
talk-0.151
 join-0.150
 meet-0.144
 greet-0.134
Token colorful
Token ID20239
Feature activation-9.85e-02

Pos logit contributions
ummy+0.072
ls+0.067
 problem+0.063
 important+0.062
 adorable+0.062
Neg logit contributions
 meet-0.181
 join-0.180
 greet-0.172
 build-0.170
 contain-0.166
Token."
Token ID526
Feature activation+2.75e-02

Pos logit contributions
omo+0.205
ience+0.204
 problem+0.204
 important+0.202
ummy+0.201
Neg logit contributions
 join-0.158
 meet-0.149
talk-0.140
 greet-0.138
 contain-0.134
Token
Token ID198
Feature activation+2.64e-02

Pos logit contributions
 join+0.056
 meet+0.056
 greet+0.053
talk+0.053
 contain+0.052
Neg logit contributions
ummy-0.047
 adorable-0.046
 important-0.046
ls-0.046
 problem-0.044
Token
Token ID198
Feature activation+2.32e-02

Pos logit contributions
 join+0.055
 meet+0.055
 greet+0.053
talk+0.052
 contain+0.051
Neg logit contributions
ummy-0.044
ls-0.043
 adorable-0.043
 important-0.043
 problem-0.041