TOP ACTIVATIONS
MAX = 0.275

said together , " A br ac ad ab ra !"
another wand . " A br ac ad ab ra !"
it around . " A br ac ad ab ra !"
and said , " A br ac ad ab ra !"
She said , " A br ac ad ab ra !"
the air . " A br ac ad ab ra !"
pond . The two ze br as stayed together for a
and said , " A br ac ad ab ra !
and said , " A br ac ad ab ra !
and said , " A br ac ad ab ra !"
and said , " A br ac ad ab ra !"
and saying , " A br ac ad ab ra !"
and said , " A br ac ad ab ra !
pond . The two ze br as stayed together for a
and said , " A br ac ad ab ra ,
and said , " A br ac ad ab ra ,
can 't . The ze br as have their own food
around and said " A br ac ad ab ra !"
wand and say " A br ac ad ab ra !"
scree ched . The ze br as ne ighed . The

MIN ACTIVATIONS
MIN = -0.263

if nothing had ever happened <|endoftext|> Once upon a time there
belonged to the park care taker . He used the soft
birds , no matter how <|endoftext|> Once upon a time there
understand things a bit better <|endoftext|> Once upon a time ,
also felt better knowing that <|endoftext|> Once there was a girl
smiled and said goodbye before <|endoftext|> One day , something bright
how cold it got or <|endoftext|> Once there was a little
it couldn 't tell the <|endoftext|> John was walking in a
and waved goodbye as it <|endoftext|> Tom my was walking in
she couldn 't get it <|endoftext|> Once upon a time there
this old man might be <|endoftext|> Once there was a little
use it whenever you want <|endoftext|> Once there was a girl
full of love . <|endoftext|> Sad ie was feeling angry .
an adventure they would never <|endoftext|> Once upon a time ,
more than his other toys <|endoftext|> An adventurous boy named Tom
, but still True was nowhere to be found .
day , but it was nowhere to be found . He
elf and waved goodbye as <|endoftext|> Tom my was walking in
him from playing too deep <|endoftext|> Once upon a time there
smiled as they watched it <|endoftext|> Once upon a time there

INTERVAL 0.140 - 0.275
CONTAINS 2.891%

to play in her backyard , where she had a sandbox
were flying . One day , Lily met a
made them even more angry , so they started shouting at
world outside . One day , the cat saw a big
named Leo . Max said , " Hi , I am

INTERVAL 0.006 - 0.140
CONTAINS 39.368%

he doing ?" Her dad said , " He
Tom and Anna are scared . They drop the
for the polite answer . Then , it went off to
. She noticed the same math book in the same place
are his family . They go inside the house

INTERVAL -0.129 - 0.006
CONTAINS 55.975%

he didn 't know how to make a cake . One
What 's wrong , honey ?" P apa
to explore . He was always excited to learn new things
Now you 'll be successful !" he said . And sure
. He raced so fast that no - one could catch

INTERVAL -0.263 - -0.129
CONTAINS 1.766%

M ia was sad and angry . She ran after Tom
but Tim my still didn 't want to . Instead ,
smiles on their faces . <|endoftext|> Once upon a time ,
disappointed but still imagined what it would have been like if
Tim my loved to wander around his neighborhood and see all
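
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. That is an inference from the printed numbers, not something the dump states. A minimal Python sketch, assuming four bins and using hypothetical names (`bin_edges`, `printed_edges`, `shares`), recomputes the edges from the rounded MIN and MAX and sums the CONTAINS percentages:

```python
# Values copied from the dump above (already rounded to three decimals).
ACT_MIN = -0.263
ACT_MAX = 0.275
N_BINS = 4  # assumption: one equal-width bin per INTERVAL block


def bin_edges(lo, hi, n):
    """Return n + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


edges = bin_edges(ACT_MIN, ACT_MAX, N_BINS)

# Interval edges as printed in the dump, lowest first.
printed_edges = [-0.263, -0.129, 0.006, 0.140, 0.275]

# CONTAINS percentages as printed, lowest interval first.
shares = [1.766, 55.975, 39.368, 2.891]

# Largest disagreement between recomputed and printed edges.
max_edge_gap = max(abs(a - b) for a, b in zip(edges, printed_edges))
total_share = sum(shares)

print([round(e, 4) for e in edges])
print(max_edge_gap, total_share)
```

Any small gap between recomputed and printed edges would be consistent with MIN and MAX having been rounded before display. The sketch does not show how the dashboard actually computed its bins.
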
Token said
Token ID531
Feature activation+1.01e-02

Pos logit contributions
 detective+0.088
 customer+0.085
 chefs+0.082
 cam+0.082
 cowboy+0.081
Neg logit contributions
 shouldn-0.076
os-0.067
 hurts-0.064
|-0.061
 doesn-0.061
Token together
Token ID1978
Feature activation+3.91e-02

Pos logit contributions
 detective+0.015
 customer+0.015
 cam+0.014
 cowboy+0.014
 buyer+0.014
Neg logit contributions
 shouldn-0.016
 hurts-0.015
 means-0.014
 doesn-0.014
os-0.014
Token,
Token ID11
Feature activation+1.59e-01

Pos logit contributions
 detective+0.065
 cam+0.063
 cowboy+0.062
 customer+0.062
 chefs+0.061
Neg logit contributions
 shouldn-0.059
 hurts-0.055
os-0.053
 doesn-0.051
idents-0.051
Token "
Token ID366
Feature activation+7.08e-02

Pos logit contributions
 detective+0.259
 customer+0.248
 cam+0.248
 rulers+0.242
 servant+0.239
Neg logit contributions
 shouldn-0.240
 hurts-0.219
idents-0.218
 doesn-0.213
elling-0.213
TokenA
Token ID32
Feature activation+7.59e-02

Pos logit contributions
 detective+0.098
 customer+0.094
 cam+0.091
 rulers+0.091
 pirates+0.090
Neg logit contributions
 shouldn-0.124
 hurts-0.119
rated-0.115
elling-0.115
|-0.115
Tokenbr
Token ID1671
Feature activation+2.75e-01

Pos logit contributions
 detective+0.123
 cam+0.120
 customer+0.117
 chefs+0.117
 pirates+0.116
Neg logit contributions
 shouldn-0.115
 hurts-0.112
|-0.104
elling-0.104
rated-0.103
Tokenac
Token ID330
Feature activation+5.38e-02

Pos logit contributions
 detective+0.396
 cam+0.377
 customer+0.373
 cowboy+0.365
 chefs+0.362
Neg logit contributions
 shouldn-0.468
 hurts-0.447
|-0.443
elling-0.435
rated-0.431
Tokenad
Token ID324
Feature activation+8.49e-02

Pos logit contributions
 detective+0.095
 rulers+0.095
 customer+0.095
 cam+0.093
 ha+0.093
Neg logit contributions
 shouldn-0.074
|-0.068
idents-0.068
rated-0.064
 hurts-0.063
Tokenab
Token ID397
Feature activation+8.22e-02

Pos logit contributions
 detective+0.129
 customer+0.122
 cam+0.121
 cowboy+0.120
 ha+0.118
Neg logit contributions
 shouldn-0.141
idents-0.135
|-0.132
 aback-0.130
 hurts-0.130
Tokenra
Token ID430
Feature activation+4.23e-02

Pos logit contributions
 detective+0.106
 customer+0.101
 ha+0.101
 rulers+0.101
 cam+0.100
Neg logit contributions
 shouldn-0.154
|-0.148
elling-0.143
rated-0.141
 hurts-0.141
Token!"
Token ID2474
Feature activation-6.99e-02

Pos logit contributions
 detective+0.075
 cam+0.072
 customer+0.072
 chefs+0.070
 cowboy+0.070
Neg logit contributions
 shouldn-0.058
 hurts-0.055
|-0.053
elling-0.052
rated-0.052
Token another
Token ID1194
Feature activation+6.47e-02

Pos logit contributions
 detective+0.147
 cam+0.144
 customer+0.143
 chefs+0.139
 servant+0.138
Neg logit contributions
 shouldn-0.095
 aback-0.093
 hurts-0.092
|-0.091
elling-0.086
Token wand
Token ID11569
Feature activation+2.83e-02

Pos logit contributions
 cam+0.075
 detective+0.074
 chefs+0.073
 customer+0.072
 rulers+0.070
Neg logit contributions
 hurts-0.122
 shouldn-0.120
|-0.116
rated-0.114
elling-0.113
Token.
Token ID13
Feature activation-2.46e-02

Pos logit contributions
 detective+0.047
 cam+0.046
 customer+0.045
 cowboy+0.044
 chefs+0.044
Neg logit contributions
 shouldn-0.041
 hurts-0.039
|-0.038
elling-0.037
 aback-0.037
Token "
Token ID366
Feature activation+7.84e-02

Pos logit contributions
 shouldn+0.021
 hurts+0.019
|+0.018
 aback+0.018
elling+0.017
Neg logit contributions
 detective-0.056
 cam-0.055
 rulers-0.054
 customer-0.054
 cowboy-0.053
TokenA
Token ID32
Feature activation+6.90e-02

Pos logit contributions
 detective+0.140
 cam+0.136
 customer+0.134
 servant+0.131
 cowboy+0.130
Neg logit contributions
 shouldn-0.106
|-0.104
 hurts-0.099
elling-0.096
rated-0.095
Tokenbr
Token ID1671
Feature activation+2.72e-01

Pos logit contributions
 chefs+0.086
 cam+0.081
 customer+0.080
 pirates+0.080
 buyer+0.079
Neg logit contributions
 hurts-0.128
ches-0.124
 shouldn-0.124
|-0.116
 doesn-0.114
Tokenac
Token ID330
Feature activation+4.64e-02

Pos logit contributions
 detective+0.431
 cam+0.418
 customer+0.409
 cowboy+0.408
 chefs+0.403
Neg logit contributions
 shouldn-0.411
|-0.395
elling-0.395
 hurts-0.394
os-0.393
Tokenad
Token ID324
Feature activation+9.11e-02

Pos logit contributions
 detective+0.086
 cam+0.084
 customer+0.082
 ha+0.082
 rulers+0.082
Neg logit contributions
 shouldn-0.061
idents-0.055
|-0.055
 hurts-0.053
rated-0.052
Tokenab
Token ID397
Feature activation+8.67e-02

Pos logit contributions
 detective+0.139
 customer+0.130
 cam+0.130
 cowboy+0.129
 ha+0.126
Neg logit contributions
 shouldn-0.150
idents-0.142
 hurts-0.139
|-0.139
elling-0.138
Tokenra
Token ID430
Feature activation+5.43e-02

Pos logit contributions
 detective+0.123
 cam+0.116
 customer+0.115
 cowboy+0.112
 rulers+0.112
Neg logit contributions
 shouldn-0.152
|-0.143
 hurts-0.142
elling-0.140
rated-0.139
Token!"
Token ID2474
Feature activation-6.77e-02

Pos logit contributions
 detective+0.106
 cam+0.102
 customer+0.102
 cowboy+0.099
 chefs+0.099
Neg logit contributions
 shouldn-0.064
 hurts-0.060
|-0.059
elling-0.058
rated-0.057
Token it
Token ID340
Feature activation-9.79e-02

Pos logit contributions
 customer+0.028
 detective+0.028
 cam+0.027
 chefs+0.027
 buyer+0.027
Neg logit contributions
 shouldn-0.015
 hurts-0.015
 aback-0.014
os-0.013
elling-0.013
Token around
Token ID1088
Feature activation-1.38e-01

Pos logit contributions
 shouldn+0.131
 hurts+0.129
os+0.120
 aback+0.119
rated+0.119
Neg logit contributions
 detective-0.174
 customer-0.171
 chefs-0.171
 cam-0.170
 cowboy-0.165
Token.
Token ID13
Feature activation-3.89e-02

Pos logit contributions
 shouldn+0.177
|+0.170
 hurts+0.167
os+0.163
 aback+0.162
Neg logit contributions
 detective-0.246
 cam-0.238
 cowboy-0.236
 customer-0.233
 rulers-0.231
Token "
Token ID366
Feature activation+7.10e-02

Pos logit contributions
 shouldn+0.049
 hurts+0.046
|+0.046
rated+0.045
elling+0.044
Neg logit contributions
 detective-0.073
 cam-0.071
 customer-0.071
 chefs-0.069
 cowboy-0.069
TokenA
Token ID32
Feature activation+6.53e-02

Pos logit contributions
 detective+0.118
 cam+0.114
 customer+0.113
 chefs+0.111
 servant+0.110
Neg logit contributions
 shouldn-0.103
|-0.102
 hurts-0.098
elling-0.095
rated-0.094
Tokenbr
Token ID1671
Feature activation+2.67e-01

Pos logit contributions
 chefs+0.078
 customer+0.074
 cam+0.073
 buyer+0.072
 pirates+0.072
Neg logit contributions
 hurts-0.123
 shouldn-0.120
ches-0.119
|-0.113
 doesn-0.111
Tokenac
Token ID330
Feature activation+4.46e-02

Pos logit contributions
 detective+0.412
 cam+0.397
 customer+0.390
 cowboy+0.388
 chefs+0.381
Neg logit contributions
 shouldn-0.416
|-0.401
 hurts-0.399
elling-0.398
os-0.395
Tokenad
Token ID324
Feature activation+8.41e-02

Pos logit contributions
 detective+0.082
 cam+0.080
 customer+0.078
 ha+0.078
 rulers+0.078
Neg logit contributions
 shouldn-0.060
idents-0.054
|-0.053
 hurts-0.052
rated-0.051
Tokenab
Token ID397
Feature activation+7.91e-02

Pos logit contributions
 detective+0.125
 cowboy+0.117
 cam+0.116
 customer+0.116
 pirate+0.113
Neg logit contributions
 shouldn-0.142
idents-0.137
 hurts-0.132
 aback-0.132
|-0.131
Tokenra
Token ID430
Feature activation+4.65e-02

Pos logit contributions
 detective+0.113
 cam+0.107
 customer+0.106
 rulers+0.104
 cowboy+0.103
Neg logit contributions
 shouldn-0.138
|-0.129
 hurts-0.129
elling-0.127
rated-0.126
Token!"
Token ID2474
Feature activation-7.14e-02

Pos logit contributions
 detective+0.091
 cam+0.087
 customer+0.087
 cowboy+0.085
 rulers+0.084
Neg logit contributions
 shouldn-0.055
|-0.052
 hurts-0.051
elling-0.050
rated-0.049
Token and
Token ID290
Feature activation-5.67e-02

Pos logit contributions
 detective+0.036
 cam+0.035
 customer+0.035
 chefs+0.034
 cowboy+0.034
Neg logit contributions
 shouldn-0.030
 hurts-0.029
elling-0.027
 aback-0.027
os-0.027
Token said
Token ID531
Feature activation-5.53e-03

Pos logit contributions
 shouldn+0.064
 hurts+0.062
 aback+0.056
 doesn+0.056
 means+0.055
Neg logit contributions
 customer-0.109
 detective-0.108
 rulers-0.103
 chefs-0.103
 cam-0.103
Token,
Token ID11
Feature activation+1.41e-01

Pos logit contributions
 hurts+0.007
 aback+0.007
 shouldn+0.007
'.+0.007
os+0.007
Neg logit contributions
 detective-0.009
 customer-0.009
 cowboy-0.009
 buyer-0.009
 chefs-0.008
Token "
Token ID366
Feature activation+5.98e-02

Pos logit contributions
 detective+0.247
 customer+0.234
 cam+0.233
 rulers+0.233
 chefs+0.230
Neg logit contributions
 shouldn-0.193
 hurts-0.175
idents-0.172
os-0.169
 doesn-0.169
TokenA
Token ID32
Feature activation+6.68e-02

Pos logit contributions
 detective+0.102
 cam+0.099
 chefs+0.098
 customer+0.098
 servant+0.095
Neg logit contributions
 shouldn-0.083
|-0.081
 hurts-0.078
elling-0.077
 aback-0.076
Tokenbr
Token ID1671
Feature activation+2.66e-01

Pos logit contributions
 detective+0.083
 chefs+0.082
 customer+0.080
 cam+0.079
 pirates+0.078
Neg logit contributions
 shouldn-0.121
 hurts-0.120
|-0.114
elling-0.113
upid-0.111
Tokenac
Token ID330
Feature activation+4.65e-02

Pos logit contributions
 detective+0.415
 cam+0.396
 cowboy+0.393
 customer+0.389
 chefs+0.383
Neg logit contributions
 shouldn-0.411
elling-0.397
|-0.396
 hurts-0.393
os-0.391
Tokenad
Token ID324
Feature activation+8.06e-02

Pos logit contributions
 detective+0.085
 cam+0.084
 customer+0.083
 rulers+0.083
 ha+0.082
Neg logit contributions
 shouldn-0.062
idents-0.056
|-0.055
 hurts-0.054
rated-0.053
Tokenab
Token ID397
Feature activation+7.66e-02

Pos logit contributions
 detective+0.123
 cam+0.117
 customer+0.115
 cowboy+0.114
 pirates+0.113
Neg logit contributions
 shouldn-0.133
idents-0.127
 hurts-0.124
|-0.124
rated-0.122
Tokenra
Token ID430
Feature activation+3.89e-02

Pos logit contributions
 detective+0.106
 cam+0.101
 customer+0.099
 rulers+0.098
 ha+0.098
Neg logit contributions
 shouldn-0.137
|-0.129
 hurts-0.127
elling-0.126
rated-0.125
Token!"
Token ID2474
Feature activation-6.94e-02

Pos logit contributions
 detective+0.063
 cam+0.060
 customer+0.060
 chefs+0.058
 cowboy+0.058
Neg logit contributions
 shouldn-0.059
 hurts-0.057
|-0.054
elling-0.054
rated-0.053
Token She
Token ID1375
Feature activation+4.44e-02

Pos logit contributions
rated+0.033
 shouldn+0.031
|+0.031
elling+0.030
 hurts+0.030
Neg logit contributions
 detective-0.041
 customer-0.040
 chefs-0.040
 cam-0.039
 pirates-0.039
Token said
Token ID531
Feature activation+1.79e-02

Pos logit contributions
 detective+0.073
 customer+0.071
 cam+0.070
 chefs+0.068
 cowboy+0.067
Neg logit contributions
 shouldn-0.068
 hurts-0.064
|-0.061
os-0.059
elling-0.059
Token,
Token ID11
Feature activation+1.52e-01

Pos logit contributions
 detective+0.027
 customer+0.026
 cam+0.026
 cowboy+0.025
 rulers+0.025
Neg logit contributions
 shouldn-0.029
 hurts-0.028
 aback-0.026
elling-0.026
os-0.026
Token "
Token ID366
Feature activation+6.57e-02

Pos logit contributions
 detective+0.207
 cam+0.197
 customer+0.196
 chefs+0.189
 cowboy+0.188
Neg logit contributions
 shouldn-0.271
 hurts-0.259
|-0.253
elling-0.249
rated-0.246
TokenA
Token ID32
Feature activation+6.75e-02

Pos logit contributions
 detective+0.091
 customer+0.089
 ha+0.088
P+0.087
 cam+0.087
Neg logit contributions
 shouldn-0.113
rated-0.108
|-0.106
elling-0.106
 hurts-0.105
Tokenbr
Token ID1671
Feature activation+2.64e-01

Pos logit contributions
 cam+0.103
 detective+0.097
 chefs+0.097
 pirates+0.097
 customer+0.096
Neg logit contributions
 hurts-0.112
 shouldn-0.112
 doesn-0.096
upid-0.096
elling-0.095
Tokenac
Token ID330
Feature activation+4.40e-02

Pos logit contributions
 detective+0.410
 cam+0.394
 cowboy+0.385
 customer+0.385
 chefs+0.376
Neg logit contributions
 shouldn-0.421
 hurts-0.403
|-0.395
elling-0.395
os-0.390
Tokenad
Token ID324
Feature activation+8.45e-02

Pos logit contributions
 ha+0.075
 detective+0.075
 cam+0.074
 rulers+0.074
 customer+0.073
Neg logit contributions
 shouldn-0.061
idents-0.058
|-0.058
rated-0.054
 aback-0.052
Tokenab
Token ID397
Feature activation+7.89e-02

Pos logit contributions
 detective+0.133
 customer+0.126
 cam+0.125
 cowboy+0.124
 ha+0.123
Neg logit contributions
 shouldn-0.133
idents-0.127
|-0.125
 hurts-0.124
rated-0.123
Tokenra
Token ID430
Feature activation+4.59e-02

Pos logit contributions
 detective+0.103
 ha+0.097
 customer+0.097
 cam+0.096
 rulers+0.095
Neg logit contributions
 shouldn-0.146
|-0.140
elling-0.136
rated-0.135
 hurts-0.135
Token!"
Token ID2474
Feature activation-6.87e-02

Pos logit contributions
 detective+0.071
 cam+0.068
 customer+0.068
 chefs+0.066
 cowboy+0.066
Neg logit contributions
 shouldn-0.073
 hurts-0.069
|-0.068
elling-0.067
rated-0.066
Token the
Token ID262
Feature activation-3.71e-02

Pos logit contributions
 shouldn+0.016
|+0.016
 hurts+0.015
 aback+0.015
upid+0.014
Neg logit contributions
 detective-0.028
 cam-0.027
 cowboy-0.027
 servant-0.027
 chefs-0.026
Token air
Token ID1633
Feature activation-1.80e-02

Pos logit contributions
 shouldn+0.069
 hurts+0.068
|+0.066
elling+0.065
 aback+0.064
Neg logit contributions
 detective-0.047
 cam-0.046
 customer-0.045
 cowboy-0.044
 chefs-0.044
Token.
Token ID13
Feature activation-3.70e-02

Pos logit contributions
 shouldn+0.024
 hurts+0.024
|+0.023
rated+0.022
elling+0.022
Neg logit contributions
 detective-0.032
 cam-0.030
 customer-0.030
 cowboy-0.030
 chefs-0.029
Token "
Token ID366
Feature activation+6.95e-02

Pos logit contributions
 shouldn+0.049
 hurts+0.047
|+0.046
elling+0.044
rated+0.044
Neg logit contributions
 detective-0.067
 cam-0.064
 customer-0.064
 cowboy-0.063
 rulers-0.062
TokenA
Token ID32
Feature activation+5.98e-02

Pos logit contributions
 detective+0.121
 cam+0.120
 servant+0.116
 customer+0.116
 cowboy+0.113
Neg logit contributions
 shouldn-0.095
|-0.092
 hurts-0.089
elling-0.086
rated-0.085
Tokenbr
Token ID1671
Feature activation+2.64e-01

Pos logit contributions
 cam+0.084
 chefs+0.081
 customer+0.079
 servant+0.078
 rulers+0.075
Neg logit contributions
 hurts-0.103
 shouldn-0.101
ches-0.094
 doesn-0.091
'.-0.090
Tokenac
Token ID330
Feature activation+3.90e-02

Pos logit contributions
 detective+0.417
 cam+0.400
 cowboy+0.393
 customer+0.393
 chefs+0.382
Neg logit contributions
 shouldn-0.409
|-0.390
 hurts-0.389
elling-0.387
os-0.381
Tokenad
Token ID324
Feature activation+8.20e-02

Pos logit contributions
 ha+0.065
 rulers+0.065
 detective+0.065
 cam+0.064
 pirates+0.064
Neg logit contributions
 shouldn-0.056
idents-0.053
|-0.051
rated-0.048
 hurts-0.048
Tokenab
Token ID397
Feature activation+7.84e-02

Pos logit contributions
 detective+0.126
 cam+0.117
 customer+0.117
 cowboy+0.116
 ha+0.114
Neg logit contributions
 shouldn-0.135
idents-0.129
|-0.125
rated-0.124
 hurts-0.123
Tokenra
Token ID430
Feature activation+4.44e-02

Pos logit contributions
 detective+0.101
 ha+0.096
 cam+0.094
 rulers+0.094
 customer+0.094
Neg logit contributions
 shouldn-0.149
|-0.140
rated-0.137
elling-0.137
 hurts-0.136
Token!"
Token ID2474
Feature activation-7.32e-02

Pos logit contributions
 detective+0.088
 cam+0.085
 customer+0.084
 cowboy+0.083
 chefs+0.082
Neg logit contributions
 shouldn-0.051
 hurts-0.048
|-0.048
elling-0.046
rated-0.045
Token pond
Token ID16723
Feature activation+5.46e-02

Pos logit contributions
 shouldn+0.062
 hurts+0.059
|+0.059
elling+0.058
rated+0.057
Neg logit contributions
 detective-0.050
 cam-0.048
 customer-0.047
 cowboy-0.046
 chefs-0.046
Token.
Token ID13
Feature activation-1.76e-02

Pos logit contributions
 detective+0.079
 cam+0.077
 customer+0.077
 pirates+0.075
 chefs+0.075
Neg logit contributions
 shouldn-0.089
|-0.086
 hurts-0.085
elling-0.085
rated-0.081
Token The
Token ID383
Feature activation-2.71e-02

Pos logit contributions
rated+0.026
 hurts+0.024
 shouldn+0.023
elling+0.023
|+0.023
Neg logit contributions
 customer-0.030
 detective-0.029
 cam-0.029
 knights-0.028
 pirates-0.028
Token two
Token ID734
Feature activation+1.19e-01

Pos logit contributions
 shouldn+0.050
 hurts+0.050
|+0.048
elling+0.047
upid+0.047
Neg logit contributions
 chefs-0.030
 detective-0.030
 customer-0.028
 rulers-0.028
 cam-0.027
Token ze
Token ID41271
Feature activation+7.70e-02

Pos logit contributions
 detective+0.207
 chefs+0.207
 customer+0.203
 servant+0.191
P+0.190
Neg logit contributions
 shouldn-0.155
os-0.146
 hurts-0.145
 cries-0.137
ches-0.135
Tokenbr
Token ID1671
Feature activation+2.63e-01

Pos logit contributions
 detective+0.116
 customer+0.114
 chefs+0.112
 servant+0.107
 buyer+0.106
Neg logit contributions
 shouldn-0.125
 hurts-0.116
|-0.116
elling-0.116
os-0.114
Tokenas
Token ID292
Feature activation+2.07e-02

Pos logit contributions
 cam+0.259
 detective+0.258
 rulers+0.253
 customer+0.249
 chefs+0.238
Neg logit contributions
 shouldn-0.560
 hurts-0.543
idents-0.527
|-0.526
rated-0.513
Token stayed
Token ID9658
Feature activation+8.25e-02

Pos logit contributions
 detective+0.029
 cam+0.029
 customer+0.028
 rulers+0.027
 pirates+0.027
Neg logit contributions
 shouldn-0.034
 hurts-0.034
elling-0.033
|-0.033
idents-0.033
Token together
Token ID1978
Feature activation+1.74e-02

Pos logit contributions
 detective+0.124
 customer+0.123
 servant+0.120
 cam+0.116
 cowboy+0.115
Neg logit contributions
 shouldn-0.128
 aback-0.123
|-0.121
 hurts-0.121
rated-0.115
Token for
Token ID329
Feature activation-9.09e-03

Pos logit contributions
 detective+0.030
 cowboy+0.028
 cam+0.028
 customer+0.028
 chefs+0.028
Neg logit contributions
 shouldn-0.024
|-0.023
 hurts-0.023
 aback-0.022
os-0.022
Token a
Token ID257
Feature activation+7.47e-02

Pos logit contributions
 shouldn+0.013
|+0.012
 hurts+0.012
 aback+0.011
 cries+0.011
Neg logit contributions
 detective-0.016
 customer-0.015
 servant-0.014
 cam-0.014
 buyer-0.014
Token and
Token ID290
Feature activation-5.69e-02

Pos logit contributions
 shouldn+0.152
|+0.145
 hurts+0.143
elling+0.137
 aback+0.137
Neg logit contributions
 detective-0.186
 cam-0.179
 customer-0.178
 cowboy-0.176
 chefs-0.173
Token said
Token ID531
Feature activation-9.31e-03

Pos logit contributions
 shouldn+0.083
 hurts+0.078
|+0.076
rated+0.075
idents+0.073
Neg logit contributions
 customer-0.095
 detective-0.095
 cam-0.092
 chefs-0.091
 rulers-0.090
Token,
Token ID11
Feature activation+1.39e-01

Pos logit contributions
 hurts+0.013
 shouldn+0.013
 It+0.012
 doesn+0.012
 aback+0.012
Neg logit contributions
 customer-0.016
 chefs-0.015
 buyer-0.015
 detective-0.015
 cowboy-0.014
Token "
Token ID366
Feature activation+5.40e-02

Pos logit contributions
 detective+0.225
 customer+0.215
 cam+0.213
 chefs+0.212
 rulers+0.212
Neg logit contributions
 shouldn-0.210
 hurts-0.193
idents-0.191
|-0.185
 doesn-0.183
TokenA
Token ID32
Feature activation+6.09e-02

Pos logit contributions
 detective+0.087
 cam+0.084
 customer+0.083
 chefs+0.081
 cowboy+0.081
Neg logit contributions
 shouldn-0.083
 hurts-0.078
|-0.078
elling-0.076
rated-0.075
Tokenbr
Token ID1671
Feature activation+2.62e-01

Pos logit contributions
 chefs+0.084
 cam+0.083
 rulers+0.080
 pirates+0.080
 detective+0.078
Neg logit contributions
 shouldn-0.109
 hurts-0.106
|-0.100
ches-0.096
 doesn-0.095
Tokenac
Token ID330
Feature activation+4.20e-02

Pos logit contributions
 detective+0.407
 cam+0.391
 cowboy+0.383
 customer+0.381
 chefs+0.375
Neg logit contributions
 shouldn-0.417
 hurts-0.395
|-0.394
elling-0.391
os-0.383
Tokenad
Token ID324
Feature activation+8.05e-02

Pos logit contributions
 detective+0.075
 ha+0.074
 cam+0.073
 rulers+0.072
 customer+0.072
Neg logit contributions
 shouldn-0.056
idents-0.053
|-0.051
rated-0.049
 hurts-0.049
Tokenab
Token ID397
Feature activation+7.54e-02

Pos logit contributions
 detective+0.121
 customer+0.114
 cowboy+0.112
 cam+0.112
 ha+0.111
Neg logit contributions
 shouldn-0.134
idents-0.129
 hurts-0.126
 aback-0.125
|-0.124
Tokenra
Token ID430
Feature activation+4.32e-02

Pos logit contributions
 detective+0.105
 cam+0.100
 customer+0.100
 ha+0.097
 rulers+0.096
Neg logit contributions
 shouldn-0.132
|-0.125
 hurts-0.124
elling-0.123
rated-0.122
Token!
Token ID0
Feature activation-3.96e-02

Pos logit contributions
 detective+0.090
 cam+0.087
 customer+0.086
 cowboy+0.085
 chefs+0.084
Neg logit contributions
 shouldn-0.046
 hurts-0.043
|-0.042
elling-0.041
rated-0.040
Token and
Token ID290
Feature activation-5.85e-02

Pos logit contributions
 shouldn+0.060
 hurts+0.056
|+0.055
elling+0.054
 aback+0.053
Neg logit contributions
 detective-0.069
 cam-0.066
 customer-0.065
 cowboy-0.064
 chefs-0.063
Token said
Token ID531
Feature activation-7.26e-03

Pos logit contributions
 shouldn+0.092
 hurts+0.086
idents+0.081
 doesn+0.080
|+0.080
Neg logit contributions
 customer-0.094
 detective-0.093
 cam-0.091
 servant-0.088
 rulers-0.088
Token,
Token ID11
Feature activation+1.37e-01

Pos logit contributions
 shouldn+0.012
 hurts+0.011
 It+0.010
 doesn+0.010
idents+0.010
Neg logit contributions
 customer-0.012
 detective-0.011
 buyer-0.011
 cowboy-0.011
 cam-0.011
Token "
Token ID366
Feature activation+5.56e-02

Pos logit contributions
 detective+0.193
 cam+0.184
 customer+0.183
 rulers+0.177
 chefs+0.177
Neg logit contributions
 shouldn-0.238
 hurts-0.225
|-0.220
idents-0.217
elling-0.216
TokenA
Token ID32
Feature activation+6.02e-02

Pos logit contributions
 detective+0.083
 customer+0.079
 cam+0.078
 cowboy+0.077
 pirates+0.076
Neg logit contributions
 shouldn-0.090
 hurts-0.088
|-0.085
elling-0.083
idents-0.083
Tokenbr
Token ID1671
Feature activation+2.60e-01

Pos logit contributions
 chefs+0.081
 cam+0.079
 customer+0.078
 rulers+0.074
 detective+0.072
Neg logit contributions
 shouldn-0.109
 hurts-0.108
|-0.101
ches-0.099
 doesn-0.097
Tokenac
Token ID330
Feature activation+4.17e-02

Pos logit contributions
 detective+0.392
 cam+0.375
 customer+0.371
 cowboy+0.367
 chefs+0.360
Neg logit contributions
 shouldn-0.427
 hurts-0.405
|-0.402
elling-0.399
rated-0.390
Tokenad
Token ID324
Feature activation+8.30e-02

Pos logit contributions
 detective+0.072
 ha+0.072
 cam+0.071
 pirates+0.069
 rulers+0.069
Neg logit contributions
 shouldn-0.056
idents-0.055
|-0.053
rated-0.051
 hurts-0.050
Tokenab
Token ID397
Feature activation+7.67e-02

Pos logit contributions
 detective+0.125
 cam+0.118
 customer+0.117
 cowboy+0.116
 ha+0.115
Neg logit contributions
 shouldn-0.137
idents-0.132
 hurts-0.129
|-0.128
rated-0.127
Tokenra
Token ID430
Feature activation+4.71e-02

Pos logit contributions
 detective+0.103
 cam+0.097
 ha+0.096
 customer+0.096
 rulers+0.095
Neg logit contributions
 shouldn-0.139
|-0.131
 hurts-0.129
rated-0.129
elling-0.128
Token!
Token ID0
Feature activation-3.75e-02

Pos logit contributions
 detective+0.092
 cam+0.089
 customer+0.088
 cowboy+0.086
 chefs+0.086
Neg logit contributions
 shouldn-0.056
 hurts-0.052
|-0.052
elling-0.050
rated-0.049
Token and
Token ID290
Feature activation-5.71e-02

Pos logit contributions
 shouldn+0.054
 hurts+0.051
|+0.050
elling+0.049
 aback+0.048
Neg logit contributions
 detective-0.061
 cam-0.058
 customer-0.058
 cowboy-0.057
 chefs-0.056
Token said
Token ID531
Feature activation-5.60e-03

Pos logit contributions
 shouldn+0.092
 hurts+0.087
 doesn+0.081
idents+0.081
 aback+0.079
Neg logit contributions
 customer-0.090
 detective-0.089
 cam-0.087
 servant-0.084
 rulers-0.083
Token,
Token ID11
Feature activation+1.38e-01

Pos logit contributions
 shouldn+0.009
 hurts+0.008
 It+0.008
 doesn+0.008
idents+0.008
Neg logit contributions
 customer-0.009
 detective-0.009
 buyer-0.009
 cam-0.009
 cowboy-0.008
Token "
Token ID366
Feature activation+5.63e-02

Pos logit contributions
 detective+0.198
 cam+0.189
 customer+0.187
 rulers+0.182
 chefs+0.181
Neg logit contributions
 shouldn-0.238
 hurts-0.224
|-0.219
idents-0.216
elling-0.215
TokenA
Token ID32
Feature activation+6.03e-02

Pos logit contributions
 detective+0.086
 customer+0.082
 cam+0.081
 cowboy+0.079
 chefs+0.079
Neg logit contributions
 shouldn-0.090
 hurts-0.088
|-0.085
elling-0.084
rated-0.083
Tokenbr
Token ID1671
Feature activation+2.59e-01

Pos logit contributions
 chefs+0.074
 cam+0.074
 customer+0.071
 rulers+0.068
 detective+0.067
Neg logit contributions
 shouldn-0.113
 hurts-0.112
ches-0.105
|-0.103
upid-0.101
Tokenac
Token ID330
Feature activation+4.13e-02

Pos logit contributions
 detective+0.398
 cam+0.383
 customer+0.375
 cowboy+0.373
 chefs+0.365
Neg logit contributions
 shouldn-0.420
 hurts-0.395
|-0.392
elling-0.392
rated-0.383
Tokenad
Token ID324
Feature activation+8.27e-02

Pos logit contributions
 detective+0.072
 ha+0.071
 cam+0.070
 pirates+0.069
 rulers+0.069
Neg logit contributions
 shouldn-0.056
idents-0.054
|-0.053
rated-0.051
 hurts-0.050
Tokenab
Token ID397
Feature activation+7.72e-02

Pos logit contributions
 detective+0.125
 cowboy+0.117
 cam+0.116
 customer+0.116
 ha+0.115
Neg logit contributions
 shouldn-0.136
idents-0.132
 hurts-0.129
|-0.128
 aback-0.128
Tokenra
Token ID430
Feature activation+4.65e-02

Pos logit contributions
 detective+0.105
 cam+0.098
 customer+0.098
 ha+0.097
 rulers+0.096
Neg logit contributions
 shouldn-0.139
|-0.132
 hurts-0.131
elling-0.129
rated-0.129
Token!"
Token ID2474
Feature activation-7.96e-02

Pos logit contributions
 detective+0.086
 cam+0.083
 customer+0.083
 cowboy+0.081
 chefs+0.080
Neg logit contributions
 shouldn-0.060
|-0.056
 hurts-0.056
elling-0.054
rated-0.054
Token and
Token ID290
Feature activation-4.78e-02

Pos logit contributions
 shouldn+0.009
 hurts+0.008
|+0.008
elling+0.008
rated+0.008
Neg logit contributions
 detective-0.008
 cam-0.008
 customer-0.008
 pirates-0.007
 cowboy-0.007
Token said
Token ID531
Feature activation-1.50e-03

Pos logit contributions
 shouldn+0.072
 hurts+0.067
|+0.064
idents+0.063
 aback+0.063
Neg logit contributions
 detective-0.079
 customer-0.076
 cam-0.076
 cowboy-0.073
 rulers-0.073
Token,
Token ID11
Feature activation+1.45e-01

Pos logit contributions
 shouldn+0.002
 hurts+0.002
 aback+0.002
'.+0.002
upid+0.002
Neg logit contributions
 customer-0.002
 detective-0.002
 cowboy-0.002
 cam-0.002
 chefs-0.002
Token "
Token ID366
Feature activation+5.97e-02

Pos logit contributions
 detective+0.219
 cam+0.208
 customer+0.207
 rulers+0.203
 chefs+0.202
Neg logit contributions
 shouldn-0.238
 hurts-0.221
|-0.215
idents-0.215
elling-0.211
TokenA
Token ID32
Feature activation+6.10e-02

Pos logit contributions
 detective+0.100
 cam+0.096
 customer+0.096
 cowboy+0.093
 chefs+0.093
Neg logit contributions
 shouldn-0.088
 hurts-0.083
|-0.083
elling-0.081
rated-0.080
Tokenbr
Token ID1671
Feature activation+2.59e-01

Pos logit contributions
 chefs+0.078
 cam+0.077
 detective+0.075
 customer+0.075
 pirates+0.074
Neg logit contributions
 shouldn-0.111
 hurts-0.108
|-0.100
upid-0.100
elling-0.099
Tokenac
Token ID330
Feature activation+3.95e-02

Pos logit contributions
 detective+0.402
 cam+0.385
 customer+0.379
 cowboy+0.378
 chefs+0.368
Neg logit contributions
 shouldn-0.415
 hurts-0.391
|-0.390
elling-0.390
os-0.381
Tokenad
Token ID324
Feature activation+8.48e-02

Pos logit contributions
 ha+0.066
 detective+0.066
 cam+0.065
 rulers+0.065
 pirates+0.064
Neg logit contributions
 shouldn-0.054
idents-0.053
|-0.052
rated-0.049
 hurts-0.049
Tokenab
Token ID397
Feature activation+7.68e-02

Pos logit contributions
 detective+0.128
 cowboy+0.120
 customer+0.120
 cam+0.119
 pirate+0.118
Neg logit contributions
 shouldn-0.140
idents-0.137
 hurts-0.133
|-0.131
 aback-0.130
Tokenra
Token ID430
Feature activation+4.64e-02

Pos logit contributions
 detective+0.107
 cam+0.102
 customer+0.101
 rulers+0.098
 ha+0.098
Neg logit contributions
 shouldn-0.135
|-0.128
 hurts-0.127
elling-0.126
rated-0.125
Token!"
Token ID2474
Feature activation-7.14e-02

Pos logit contributions
 detective+0.087
 cam+0.084
 customer+0.084
 cowboy+0.082
 chefs+0.081
Neg logit contributions
 shouldn-0.058
 hurts-0.055
|-0.054
elling-0.053
rated-0.052
Token and
Token ID290
Feature activation-8.17e-02

Pos logit contributions
 hurts+0.008
 shouldn+0.007
os+0.007
 aback+0.007
elling+0.007
Neg logit contributions
 detective-0.013
 cam-0.013
 cowboy-0.012
 chefs-0.012
 customer-0.012
Token saying
Token ID2282
Feature activation-6.30e-02

Pos logit contributions
 hurts+0.118
 cries+0.108
 shouldn+0.107
 doesn+0.105
 aback+0.103
Neg logit contributions
 cowboy-0.136
 detective-0.136
 customer-0.135
 servant-0.129
 cents-0.128
Token,
Token ID11
Feature activation+1.23e-01

Pos logit contributions
 hurts+0.092
 shouldn+0.083
'.+0.083
 doesn+0.082
 aback+0.082
Neg logit contributions
 detective-0.097
 buyer-0.096
 customer-0.096
 cam-0.095
 chefs-0.095
Token "
Token ID366
Feature activation+5.03e-02

Pos logit contributions
 detective+0.202
 customer+0.191
 cam+0.189
 rulers+0.186
 chefs+0.185
Neg logit contributions
 shouldn-0.172
 hurts-0.171
idents-0.165
ric-0.159
os-0.158
TokenA
Token ID32
Feature activation+5.41e-02

Pos logit contributions
 detective+0.073
 customer+0.069
 cam+0.069
 cowboy+0.067
 rulers+0.066
Neg logit contributions
 shouldn-0.086
 hurts-0.082
|-0.081
elling-0.080
rated-0.079
Tokenbr
Token ID1671
Feature activation+2.59e-01

Pos logit contributions
 chefs+0.078
 cam+0.073
 detective+0.073
 pirates+0.071
 knights+0.070
Neg logit contributions
 hurts-0.094
 shouldn-0.089
'.-0.080
 doesn-0.080
|-0.080
Tokenac
Token ID330
Feature activation+3.77e-02

Pos logit contributions
 detective+0.440
 cam+0.430
 cowboy+0.425
 chefs+0.408
 customer+0.404
Neg logit contributions
 shouldn-0.355
os-0.350
elling-0.347
upid-0.345
 hurts-0.341
Tokenad
Token ID324
Feature activation+7.58e-02

Pos logit contributions
 detective+0.069
 customer+0.067
 cam+0.067
 ha+0.067
 rulers+0.066
Neg logit contributions
 shouldn-0.050
|-0.046
idents-0.045
 hurts-0.043
rated-0.042
Tokenab
Token ID397
Feature activation+7.12e-02

Pos logit contributions
 detective+0.117
 customer+0.111
 cam+0.110
 cowboy+0.109
 pirates+0.107
Neg logit contributions
 shouldn-0.123
idents-0.116
|-0.114
 hurts-0.114
rated-0.113
Tokenra
Token ID430
Feature activation+3.74e-02

Pos logit contributions
 detective+0.104
 cam+0.099
 customer+0.099
 cowboy+0.096
 chefs+0.095
Neg logit contributions
 shouldn-0.120
 hurts-0.114
|-0.113
elling-0.111
rated-0.110
Token!"
Token ID2474
Feature activation-7.32e-02

Pos logit contributions
 detective+0.064
 cam+0.061
 customer+0.061
 chefs+0.060
 pirates+0.059
Neg logit contributions
 shouldn-0.053
 hurts-0.051
|-0.049
elling-0.048
rated-0.048
Token and
Token ID290
Feature activation-6.47e-02

Pos logit contributions
 shouldn+0.205
 hurts+0.194
|+0.193
elling+0.186
rated+0.184
Neg logit contributions
 detective-0.222
 cam-0.213
 customer-0.212
 cowboy-0.208
 chefs-0.205
Token said
Token ID531
Feature activation-1.27e-02

Pos logit contributions
 shouldn+0.097
 hurts+0.092
|+0.090
elling+0.087
rated+0.087
Neg logit contributions
 detective-0.106
 customer-0.103
 cam-0.102
 chefs-0.099
 cowboy-0.099
Token,
Token ID11
Feature activation+1.38e-01

Pos logit contributions
 shouldn+0.020
 hurts+0.019
 aback+0.019
 It+0.018
os+0.018
Neg logit contributions
 customer-0.019
 detective-0.018
 chefs-0.018
 cowboy-0.018
 buyer-0.018
Token "
Token ID366
Feature activation+5.18e-02

Pos logit contributions
 detective+0.208
 customer+0.198
 cam+0.197
 rulers+0.193
 chefs+0.193
Neg logit contributions
 shouldn-0.225
 hurts-0.211
|-0.206
idents-0.205
elling-0.201
TokenA
Token ID32
Feature activation+5.94e-02

Pos logit contributions
 detective+0.086
 cam+0.083
 customer+0.082
 cowboy+0.080
 chefs+0.080
Neg logit contributions
 shouldn-0.077
 hurts-0.073
|-0.072
elling-0.070
rated-0.069
Tokenbr
Token ID1671
Feature activation+2.59e-01

Pos logit contributions
 chefs+0.083
 cam+0.080
 pirates+0.078
 rulers+0.078
 knights+0.076
Neg logit contributions
 shouldn-0.106
 hurts-0.104
|-0.099
ches-0.096
elling-0.095
Tokenac
Token ID330
Feature activation+3.92e-02

Pos logit contributions
 detective+0.404
 cam+0.391
 cowboy+0.384
 customer+0.382
 chefs+0.378
Neg logit contributions
 shouldn-0.406
elling-0.386
|-0.386
 hurts-0.384
os-0.377
Tokenad
Token ID324
Feature activation+8.18e-02

Pos logit contributions
 detective+0.068
 ha+0.067
 cam+0.067
 rulers+0.066
 customer+0.065
Neg logit contributions
 shouldn-0.054
idents-0.052
|-0.049
 hurts-0.048
rated-0.048
Tokenab
Token ID397
Feature activation+7.57e-02

Pos logit contributions
 detective+0.119
 customer+0.110
 cowboy+0.110
 ha+0.109
 cam+0.108
Neg logit contributions
 shouldn-0.141
idents-0.138
 hurts-0.132
 aback-0.132
elling-0.130
Tokenra
Token ID430
Feature activation+4.59e-02

Pos logit contributions
 detective+0.109
 cam+0.104
 customer+0.103
 cowboy+0.100
 rulers+0.100
Neg logit contributions
 shouldn-0.129
 hurts-0.122
|-0.122
elling-0.119
rated-0.119
Token!
Token ID0
Feature activation-3.69e-02

Pos logit contributions
 detective+0.093
 cam+0.090
 customer+0.089
 cowboy+0.088
 rulers+0.087
Neg logit contributions
 shouldn-0.051
|-0.047
 hurts-0.047
elling-0.046
rated-0.045
Token pond
Token ID16723
Feature activation+5.74e-02

Pos logit contributions
 shouldn+0.051
|+0.049
 hurts+0.048
elling+0.048
rated+0.047
Neg logit contributions
 detective-0.041
 cam-0.040
 customer-0.039
 pirates-0.038
 cowboy-0.038
Token.
Token ID13
Feature activation-1.85e-02

Pos logit contributions
 detective+0.078
 cam+0.077
 customer+0.077
 pirates+0.075
 chefs+0.075
Neg logit contributions
 shouldn-0.097
|-0.095
elling-0.093
 hurts-0.093
idents-0.090
Token The
Token ID383
Feature activation-2.41e-02

Pos logit contributions
rated+0.028
 hurts+0.026
elling+0.025
 shouldn+0.024
|+0.024
Neg logit contributions
 customer-0.031
 detective-0.030
 cam-0.030
 knights-0.030
 pirates-0.029
Token two
Token ID734
Feature activation+1.23e-01

Pos logit contributions
 shouldn+0.042
 hurts+0.041
elling+0.039
|+0.039
rated+0.038
Neg logit contributions
 detective-0.031
 chefs-0.030
 customer-0.029
 rulers-0.028
 cam-0.028
Token ze
Token ID41271
Feature activation+7.54e-02

Pos logit contributions
 detective+0.218
 chefs+0.217
 customer+0.214
 servant+0.199
 ti+0.197
Neg logit contributions
 shouldn-0.164
 hurts-0.148
os-0.145
elling-0.140
 cries-0.139
Tokenbr
Token ID1671
Feature activation+2.59e-01

Pos logit contributions
 detective+0.123
 customer+0.120
 chefs+0.116
 cam+0.114
 servant+0.113
Neg logit contributions
 shouldn-0.115
 hurts-0.107
|-0.106
elling-0.105
os-0.104
Tokenas
Token ID292
Feature activation+1.98e-02

Pos logit contributions
 cam+0.252
 detective+0.250
 rulers+0.248
 customer+0.243
 chefs+0.234
Neg logit contributions
 shouldn-0.556
 hurts-0.538
|-0.523
idents-0.521
rated-0.510
Token stayed
Token ID9658
Feature activation+8.03e-02

Pos logit contributions
 detective+0.029
 cam+0.028
 customer+0.027
 pirates+0.027
 rulers+0.027
Neg logit contributions
 shouldn-0.032
 hurts-0.032
|-0.031
elling-0.031
idents-0.031
Token together
Token ID1978
Feature activation+1.98e-02

Pos logit contributions
 detective+0.125
 customer+0.122
 servant+0.119
 cam+0.117
 ha+0.114
Neg logit contributions
 shouldn-0.124
 aback-0.119
 hurts-0.117
|-0.115
rated-0.112
Token for
Token ID329
Feature activation-8.57e-03

Pos logit contributions
 detective+0.033
 cam+0.032
 customer+0.031
 cowboy+0.031
 chefs+0.031
Neg logit contributions
 shouldn-0.029
 hurts-0.027
|-0.027
 aback-0.026
os-0.026
Token a
Token ID257
Feature activation+7.88e-02

Pos logit contributions
 shouldn+0.012
|+0.011
 hurts+0.011
 aback+0.011
idents+0.010
Neg logit contributions
 detective-0.015
 cam-0.014
 customer-0.014
 servant-0.014
 buyer-0.014
Token and
Token ID290
Feature activation-6.76e-02

Pos logit contributions
 shouldn+0.036
 hurts+0.035
'.+0.034
os+0.034
 aback+0.032
Neg logit contributions
 detective-0.089
 cowboy-0.088
 cam-0.088
 chefs-0.086
 buyer-0.084
Token said
Token ID531
Feature activation-1.29e-02

Pos logit contributions
 shouldn+0.076
 hurts+0.075
 doesn+0.072
|+0.070
 cries+0.069
Neg logit contributions
 customer-0.131
 detective-0.125
 cowboy-0.124
 rulers-0.124
 cam-0.123
Token,
Token ID11
Feature activation+1.32e-01

Pos logit contributions
 hurts+0.016
upid+0.014
 doesn+0.014
 It+0.014
'.+0.014
Neg logit contributions
 customer-0.023
 buyer-0.023
 cowboy-0.022
 detective-0.022
 chefs-0.022
Token "
Token ID366
Feature activation+5.06e-02

Pos logit contributions
 detective+0.240
 customer+0.234
 cam+0.232
 rulers+0.232
 chefs+0.228
Neg logit contributions
 shouldn-0.165
idents-0.153
 hurts-0.152
 It-0.148
 doesn-0.148
TokenA
Token ID32
Feature activation+5.66e-02

Pos logit contributions
 detective+0.086
 cam+0.085
 customer+0.083
 chefs+0.083
 pirates+0.081
Neg logit contributions
 shouldn-0.071
|-0.070
 hurts-0.068
rated-0.064
elling-0.064
Tokenbr
Token ID1671
Feature activation+2.56e-01

Pos logit contributions
 chefs+0.073
 cam+0.072
 detective+0.070
 pirates+0.069
 customer+0.067
Neg logit contributions
 hurts-0.104
 shouldn-0.100
|-0.098
elling-0.093
rated-0.092
Tokenac
Token ID330
Feature activation+3.73e-02

Pos logit contributions
 detective+0.409
 cam+0.400
 cowboy+0.391
 customer+0.382
 chefs+0.379
Neg logit contributions
 shouldn-0.380
elling-0.368
|-0.368
 hurts-0.366
os-0.365
Tokenad
Token ID324
Feature activation+7.62e-02

Pos logit contributions
 detective+0.067
 ha+0.065
 customer+0.064
 cam+0.064
 pirates+0.063
Neg logit contributions
 shouldn-0.052
idents-0.047
|-0.046
rated-0.045
 hurts-0.044
Tokenab
Token ID397
Feature activation+7.29e-02

Pos logit contributions
 detective+0.120
 customer+0.113
 cam+0.113
 cowboy+0.111
 pirates+0.110
Neg logit contributions
 shouldn-0.122
idents-0.114
 hurts-0.113
|-0.113
rated-0.112
Tokenra
Token ID430
Feature activation+3.98e-02

Pos logit contributions
 detective+0.106
 cam+0.100
 customer+0.100
 rulers+0.097
 cowboy+0.096
Neg logit contributions
 shouldn-0.125
|-0.117
 hurts-0.117
elling-0.115
rated-0.114
Token,
Token ID11
Feature activation+1.77e-01

Pos logit contributions
 detective+0.063
 cam+0.061
 customer+0.061
 chefs+0.059
 rulers+0.058
Neg logit contributions
 shouldn-0.061
 hurts-0.059
|-0.056
elling-0.056
rated-0.055
Token and
Token ID290
Feature activation-5.18e-02

Pos logit contributions
 shouldn+0.092
 hurts+0.088
|+0.087
elling+0.085
rated+0.084
Neg logit contributions
 detective-0.084
 cam-0.080
 customer-0.080
 cowboy-0.077
 chefs-0.077
Token said
Token ID531
Feature activation-1.03e-02

Pos logit contributions
 shouldn+0.069
 hurts+0.064
 doesn+0.060
 aback+0.059
|+0.058
Neg logit contributions
 customer-0.094
 detective-0.093
 rulers-0.090
 cam-0.090
 chefs-0.090
Token,
Token ID11
Feature activation+1.39e-01

Pos logit contributions
 shouldn+0.016
 hurts+0.016
 doesn+0.014
 aback+0.014
os+0.014
Neg logit contributions
 customer-0.016
 detective-0.015
 cam-0.015
 chefs-0.015
 buyer-0.015
Token "
Token ID366
Feature activation+5.34e-02

Pos logit contributions
 detective+0.207
 cam+0.196
 customer+0.196
 rulers+0.191
 chefs+0.191
Neg logit contributions
 shouldn-0.230
 hurts-0.217
idents-0.209
|-0.208
elling-0.204
TokenA
Token ID32
Feature activation+5.53e-02

Pos logit contributions
 detective+0.080
 customer+0.076
 cam+0.075
 cowboy+0.075
 ha+0.073
Neg logit contributions
 shouldn-0.087
 hurts-0.084
|-0.082
elling-0.081
idents-0.081
Tokenbr
Token ID1671
Feature activation+2.56e-01

Pos logit contributions
 cam+0.071
 chefs+0.071
 pirates+0.071
 rulers+0.068
 detective+0.068
Neg logit contributions
 shouldn-0.097
 hurts-0.096
upid-0.088
|-0.088
ches-0.087
Tokenac
Token ID330
Feature activation+3.70e-02

Pos logit contributions
 detective+0.401
 cam+0.389
 cowboy+0.380
 customer+0.376
 chefs+0.369
Neg logit contributions
 shouldn-0.398
elling-0.378
|-0.377
 hurts-0.376
os-0.376
Tokenad
Token ID324
Feature activation+7.82e-02

Pos logit contributions
 ha+0.063
 detective+0.063
 cam+0.062
 pirates+0.061
 customer+0.061
Neg logit contributions
 shouldn-0.052
idents-0.049
|-0.048
rated-0.046
 hurts-0.045
Tokenab
Token ID397
Feature activation+7.25e-02

Pos logit contributions
 detective+0.118
 cowboy+0.111
 customer+0.111
 ha+0.109
 cam+0.109
Neg logit contributions
 shouldn-0.128
idents-0.126
|-0.120
 aback-0.120
 hurts-0.120
Tokenra
Token ID430
Feature activation+4.17e-02

Pos logit contributions
 detective+0.106
 cam+0.101
 customer+0.100
 cowboy+0.097
 rulers+0.097
Neg logit contributions
 shouldn-0.123
 hurts-0.116
|-0.116
elling-0.113
rated-0.113
Token,
Token ID11
Feature activation+1.77e-01

Pos logit contributions
 detective+0.070
 cam+0.067
 customer+0.066
 cowboy+0.065
 chefs+0.064
Neg logit contributions
 shouldn-0.061
 hurts-0.058
|-0.057
elling-0.056
rated-0.055
Token can
Token ID460
Feature activation-6.53e-03

Pos logit contributions
 chefs+0.038
 ha+0.038
 pirates+0.036
 rulers+0.035
 detective+0.035
Neg logit contributions
opl-0.045
elling-0.045
|-0.044
ingle-0.043
 aback-0.042
Token't
Token ID470
Feature activation-1.02e-01

Pos logit contributions
rated+0.012
elling+0.011
 aback+0.011
 shouldn+0.011
|+0.011
Neg logit contributions
 ha-0.009
 rulers-0.008
 pirates-0.008
 dressing-0.008
 detective-0.008
Token.
Token ID13
Feature activation-1.93e-02

Pos logit contributions
 shouldn+0.158
|+0.154
elling+0.151
 aback+0.150
 hurts+0.149
Neg logit contributions
 detective-0.157
 cam-0.152
 customer-0.151
 pirates-0.149
 chefs-0.149
Token The
Token ID383
Feature activation-1.34e-02

Pos logit contributions
 shouldn+0.025
rated+0.024
|+0.023
 hurts+0.023
 aback+0.022
Neg logit contributions
 detective-0.035
 cam-0.034
 chefs-0.034
 customer-0.034
 pirates-0.034
Token ze
Token ID41271
Feature activation+7.08e-02

Pos logit contributions
 shouldn+0.024
 hurts+0.024
|+0.023
elling+0.023
rated+0.023
Neg logit contributions
 detective-0.017
 cam-0.016
 customer-0.016
 cowboy-0.016
 chefs-0.016
Tokenbr
Token ID1671
Feature activation+2.54e-01

Pos logit contributions
 detective+0.069
 cam+0.064
 customer+0.063
 chefs+0.060
 cowboy+0.060
Neg logit contributions
 shouldn-0.154
 hurts-0.148
|-0.147
elling-0.145
rated-0.143
Tokenas
Token ID292
Feature activation+1.72e-02

Pos logit contributions
 detective+0.355
 cam+0.336
 customer+0.335
 cowboy+0.326
 chefs+0.323
Neg logit contributions
 shouldn-0.442
 hurts-0.424
|-0.421
elling-0.413
rated-0.408
Token have
Token ID423
Feature activation+1.90e-02

Pos logit contributions
 ha+0.026
 detective+0.025
 cam+0.025
 pirates+0.025
 rulers+0.025
Neg logit contributions
elling-0.025
 shouldn-0.025
|-0.025
rated-0.025
idents-0.024
Token their
Token ID511
Feature activation+7.60e-02

Pos logit contributions
 detective+0.025
 cam+0.025
 pirates+0.024
 rulers+0.024
 knights+0.023
Neg logit contributions
 shouldn-0.033
rated-0.033
|-0.033
idents-0.033
elling-0.032
Token own
Token ID898
Feature activation-4.02e-02

Pos logit contributions
 detective+0.105
 cam+0.103
 pirates+0.100
 rulers+0.099
 customer+0.099
Neg logit contributions
 shouldn-0.132
|-0.128
elling-0.127
rated-0.126
 aback-0.126
Token food
Token ID2057
Feature activation+8.77e-02

Pos logit contributions
 shouldn+0.073
|+0.070
elling+0.069
rated+0.067
idents+0.067
Neg logit contributions
 detective-0.053
 cam-0.052
 customer-0.050
 rulers-0.050
 pirates-0.049
Token around
Token ID1088
Feature activation-1.25e-01

Pos logit contributions
 shouldn+0.131
 hurts+0.126
|+0.119
rated+0.118
 aback+0.117
Neg logit contributions
 detective-0.145
 customer-0.142
 cam-0.142
 chefs-0.139
 cowboy-0.135
Token and
Token ID290
Feature activation-6.02e-02

Pos logit contributions
 shouldn+0.174
|+0.165
 hurts+0.165
idents+0.158
 aback+0.158
Neg logit contributions
 detective-0.211
 cam-0.206
 customer-0.205
 cowboy-0.203
 rulers-0.199
Token said
Token ID531
Feature activation-1.09e-02

Pos logit contributions
 shouldn+0.085
 hurts+0.078
 doesn+0.074
idents+0.073
|+0.073
Neg logit contributions
 customer-0.110
 detective-0.104
 rulers-0.103
 cam-0.102
 chefs-0.102
Token "
Token ID366
Feature activation+4.08e-02

Pos logit contributions
 shouldn+0.016
 hurts+0.016
 It+0.015
 doesn+0.015
idents+0.015
Neg logit contributions
 customer-0.018
 buyer-0.017
 detective-0.017
 chefs-0.017
 cowboy-0.016
TokenA
Token ID32
Feature activation+5.56e-02

Pos logit contributions
 detective+0.060
 customer+0.056
 rulers+0.056
 cam+0.055
 cowboy+0.055
Neg logit contributions
 shouldn-0.067
 hurts-0.065
elling-0.063
idents-0.062
|-0.061
Tokenbr
Token ID1671
Feature activation+2.54e-01

Pos logit contributions
 cam+0.072
 chefs+0.072
 customer+0.069
 rulers+0.066
 buyer+0.064
Neg logit contributions
 hurts-0.102
 shouldn-0.101
ches-0.094
|-0.094
 doesn-0.091
Tokenac
Token ID330
Feature activation+3.80e-02

Pos logit contributions
 detective+0.386
 cam+0.374
 customer+0.366
 cowboy+0.363
 chefs+0.356
Neg logit contributions
 shouldn-0.407
 hurts-0.386
|-0.384
elling-0.381
os-0.372
Tokenad
Token ID324
Feature activation+7.08e-02

Pos logit contributions
 detective+0.068
 ha+0.066
 cam+0.066
 rulers+0.065
 pirates+0.065
Neg logit contributions
 shouldn-0.051
idents-0.048
|-0.046
rated-0.045
 hurts-0.045
Tokenab
Token ID397
Feature activation+6.90e-02

Pos logit contributions
 detective+0.109
 customer+0.102
 cowboy+0.101
 cam+0.101
 rulers+0.099
Neg logit contributions
 shouldn-0.116
 hurts-0.110
idents-0.110
|-0.109
rated-0.107
Tokenra
Token ID430
Feature activation+3.60e-02

Pos logit contributions
 detective+0.092
 cam+0.086
 customer+0.086
 ha+0.085
 rulers+0.085
Neg logit contributions
 shouldn-0.126
|-0.120
 hurts-0.118
elling-0.117
rated-0.117
Token!"
Token ID2474
Feature activation-8.25e-02

Pos logit contributions
 detective+0.074
 cam+0.071
 customer+0.071
 cowboy+0.070
 chefs+0.069
Neg logit contributions
 shouldn-0.039
 hurts-0.036
|-0.036
elling-0.035
rated-0.034
Token wand
Token ID11569
Feature activation+1.65e-02

Pos logit contributions
 detective+0.041
 customer+0.038
 pirates+0.038
 cam+0.038
 pirate+0.038
Neg logit contributions
 shouldn-0.065
|-0.063
elling-0.062
rated-0.061
 hurts-0.060
Token and
Token ID290
Feature activation-6.59e-02

Pos logit contributions
 detective+0.024
 cam+0.023
 customer+0.023
 cowboy+0.022
 rulers+0.022
Neg logit contributions
 shouldn-0.027
 hurts-0.026
|-0.026
elling-0.025
rated-0.025
Token say
Token ID910
Feature activation-1.14e-02

Pos logit contributions
 shouldn+0.110
elling+0.107
|+0.106
 hurts+0.104
rated+0.103
Neg logit contributions
 detective-0.097
 cam-0.092
 customer-0.090
 pirates-0.089
 cowboy-0.089
Token "
Token ID366
Feature activation+3.24e-02

Pos logit contributions
 shouldn+0.018
 hurts+0.018
 doesn+0.017
os+0.016
upid+0.016
Neg logit contributions
 detective-0.016
 customer-0.016
 chefs-0.016
 cam-0.016
 buyer-0.015
TokenA
Token ID32
Feature activation+5.22e-02

Pos logit contributions
 detective+0.050
 customer+0.048
 cowboy+0.047
 rulers+0.046
 cam+0.046
Neg logit contributions
 shouldn-0.051
 hurts-0.049
elling-0.048
idents-0.047
|-0.047
Tokenbr
Token ID1671
Feature activation+2.53e-01

Pos logit contributions
 detective+0.064
 cam+0.060
 customer+0.060
 cowboy+0.057
 chefs+0.057
Neg logit contributions
 shouldn-0.101
 hurts-0.097
|-0.096
elling-0.094
rated-0.093
Tokenac
Token ID330
Feature activation+3.94e-02

Pos logit contributions
 detective+0.370
 cam+0.354
 customer+0.350
 cowboy+0.341
 chefs+0.339
Neg logit contributions
 shouldn-0.425
 hurts-0.405
|-0.402
elling-0.395
rated-0.390
Tokenad
Token ID324
Feature activation+6.16e-02

Pos logit contributions
 rulers+0.066
 detective+0.065
 customer+0.065
 ha+0.065
 cam+0.064
Neg logit contributions
 shouldn-0.057
idents-0.055
|-0.052
 hurts-0.049
rated-0.048
Tokenab
Token ID397
Feature activation+5.98e-02

Pos logit contributions
 detective+0.093
 customer+0.089
 cowboy+0.088
 cam+0.086
 rulers+0.085
Neg logit contributions
 shouldn-0.102
idents-0.098
 hurts-0.097
|-0.096
 aback-0.094
Tokenra
Token ID430
Feature activation+1.86e-02

Pos logit contributions
 detective+0.079
 customer+0.075
 rulers+0.075
 cam+0.074
 ha+0.074
Neg logit contributions
 shouldn-0.111
|-0.106
 hurts-0.102
elling-0.102
rated-0.102
Token!"
Token ID2474
Feature activation-7.71e-02

Pos logit contributions
 detective+0.037
 customer+0.036
 cam+0.036
 cowboy+0.035
 ha+0.035
Neg logit contributions
 shouldn-0.021
|-0.020
 hurts-0.020
elling-0.019
idents-0.019
Token scree
Token ID43465
Feature activation-4.14e-02

Pos logit contributions
 cam+0.089
 detective+0.085
 rulers+0.084
 pirates+0.082
 ha+0.081
Neg logit contributions
elling-0.104
idents-0.101
 shouldn-0.101
 aback-0.099
|-0.099
Tokenched
Token ID1740
Feature activation-5.47e-02

Pos logit contributions
idents+0.055
 fault+0.052
ric+0.051
 shouldn+0.051
 disl+0.049
Neg logit contributions
 cowboy-0.069
 cam-0.068
 mer-0.066
 detective-0.066
P-0.065
Token.
Token ID13
Feature activation-2.47e-02

Pos logit contributions
 shouldn+0.089
|+0.085
elling+0.084
 hurts+0.083
rated+0.083
Neg logit contributions
 cam-0.078
 detective-0.078
 customer-0.077
 pirates-0.076
 cowboy-0.075
Token The
Token ID383
Feature activation-1.90e-02

Pos logit contributions
rated+0.031
 shouldn+0.031
 hurts+0.031
elling+0.029
idents+0.029
Neg logit contributions
 customer-0.045
 cam-0.045
 detective-0.045
 pirates-0.044
 knights-0.044
Token ze
Token ID41271
Feature activation+7.61e-02

Pos logit contributions
 shouldn+0.036
 hurts+0.034
|+0.034
elling+0.033
rated+0.033
Neg logit contributions
 detective-0.024
 cam-0.023
 customer-0.023
 cowboy-0.022
 pirates-0.022
Tokenbr
Token ID1671
Feature activation+2.50e-01

Pos logit contributions
 detective+0.090
 cam+0.085
 customer+0.084
 chefs+0.081
 cowboy+0.080
Neg logit contributions
 shouldn-0.149
 hurts-0.144
|-0.142
elling-0.140
rated-0.138
Tokenas
Token ID292
Feature activation+1.89e-02

Pos logit contributions
 detective+0.271
 cam+0.266
 rulers+0.263
 customer+0.261
 pirates+0.249
Neg logit contributions
 shouldn-0.516
 hurts-0.491
|-0.482
idents-0.475
elling-0.475
Token ne
Token ID497
Feature activation+1.41e-01

Pos logit contributions
 detective+0.024
 cam+0.024
 cowboy+0.023
 rulers+0.023
 customer+0.023
Neg logit contributions
 shouldn-0.033
elling-0.033
|-0.032
idents-0.032
 hurts-0.032
Tokenighed
Token ID12570
Feature activation-4.47e-02

Pos logit contributions
 detective+0.226
 cam+0.217
 customer+0.215
 cowboy+0.209
 rulers+0.208
Neg logit contributions
 shouldn-0.218
 hurts-0.207
|-0.203
elling-0.200
rated-0.197
Token.
Token ID13
Feature activation-2.41e-02

Pos logit contributions
 shouldn+0.080
|+0.077
elling+0.076
 hurts+0.075
rated+0.073
Neg logit contributions
 detective-0.060
 cam-0.057
 customer-0.056
 pirates-0.056
 cowboy-0.055
Token The
Token ID383
Feature activation-2.36e-02

Pos logit contributions
 shouldn+0.029
rated+0.029
 hurts+0.028
elling+0.027
idents+0.027
Neg logit contributions
 detective-0.045
 cam-0.045
 customer-0.045
 pirates-0.044
 knights-0.044
Token if
Token ID611
Feature activation-3.93e-02

Pos logit contributions
 shouldn+0.018
|+0.016
 hurts+0.016
os+0.016
elling+0.015
Neg logit contributions
 detective-0.016
 cam-0.015
 customer-0.015
 cowboy-0.014
 ha-0.014
Token nothing
Token ID2147
Feature activation-2.06e-02

Pos logit contributions
 shouldn+0.058
 hurts+0.052
idents+0.049
 doesn+0.049
 aback+0.049
Neg logit contributions
 detective-0.064
 cam-0.063
 customer-0.062
 servant-0.060
 rulers-0.060
Token had
Token ID550
Feature activation-2.19e-02

Pos logit contributions
 shouldn+0.029
idents+0.028
|+0.028
elling+0.028
 hurts+0.027
Neg logit contributions
 detective-0.035
 cam-0.033
 pirates-0.032
 mer-0.032
 cowboy-0.032
Token ever
Token ID1683
Feature activation-4.72e-02

Pos logit contributions
|+0.030
 shouldn+0.029
elling+0.028
idents+0.028
rated+0.028
Neg logit contributions
 detective-0.037
 pirates-0.036
 cam-0.036
 customer-0.036
 cowboy-0.035
Token happened
Token ID3022
Feature activation-3.51e-02

Pos logit contributions
 shouldn+0.066
|+0.065
idents+0.063
elling+0.062
os+0.062
Neg logit contributions
 detective-0.078
 cam-0.076
 pirates-0.075
 chefs-0.074
 customer-0.073
Token<|endoftext|>
Token ID50256
Feature activation-2.29e-01

Pos logit contributions
 shouldn+0.057
|+0.055
elling+0.054
 hurts+0.054
rated+0.053
Neg logit contributions
 detective-0.052
 cam-0.050
 pirates-0.049
 customer-0.049
 chefs-0.049
TokenOnce
Token ID7454
Feature activation-3.94e-03

Pos logit contributions
idents+0.306
rated+0.306
 shouldn+0.304
 hurts+0.288
elling+0.285
Neg logit contributions
 pirates-0.405
 detective-0.400
 cam-0.398
 customer-0.397
 chefs-0.393
Token upon
Token ID2402
Feature activation-1.71e-01

Pos logit contributions
elling+0.007
 shouldn+0.007
rated+0.007
 hurts+0.007
 aback+0.006
Neg logit contributions
 detective-0.005
 customer-0.005
 pirates-0.005
 cam-0.005
 pirate-0.005
Token a
Token ID257
Feature activation+1.97e-03

Pos logit contributions
elling+0.379
 shouldn+0.360
rated+0.358
 hurts+0.353
idents+0.353
Neg logit contributions
 pirates-0.172
 detective-0.167
 knights-0.157
 chefs-0.155
 pirate-0.153
Token time
Token ID640
Feature activation-3.38e-02

Pos logit contributions
 detective+0.003
 cam+0.003
 customer+0.003
 chefs+0.003
 servant+0.003
Neg logit contributions
 shouldn-0.003
|-0.003
 hurts-0.003
rated-0.003
elling-0.002
Token there
Token ID612
Feature activation-8.10e-02

Pos logit contributions
 shouldn+0.068
|+0.066
 hurts+0.066
os+0.063
elling+0.063
Neg logit contributions
 detective-0.038
 customer-0.035
 cam-0.035
 cowboy-0.034
 chefs-0.034
Token belonged
Token ID19611
Feature activation-1.89e-02

Pos logit contributions
upid+0.187
|+0.186
opl+0.181
ingle+0.178
ric+0.178
Neg logit contributions
 detective-0.142
 pirates-0.134
 cam-0.128
 ha-0.128
 knights-0.126
Token to
Token ID284
Feature activation-3.32e-02

Pos logit contributions
 shouldn+0.025
 hurts+0.023
|+0.022
elling+0.022
 aback+0.021
Neg logit contributions
 detective-0.035
 customer-0.034
 cam-0.034
 cowboy-0.033
 rulers-0.033
Token the
Token ID262
Feature activation-6.05e-02

Pos logit contributions
 shouldn+0.055
 hurts+0.052
|+0.051
elling+0.050
rated+0.049
Neg logit contributions
 detective-0.050
 cam-0.048
 customer-0.047
 cowboy-0.046
 chefs-0.046
Token park
Token ID3952
Feature activation+5.41e-02

Pos logit contributions
 shouldn+0.104
 hurts+0.097
 doesn+0.091
rated+0.090
idents+0.090
Neg logit contributions
 detective-0.085
 cam-0.082
 rulers-0.082
 customer-0.080
 chefs-0.079
Token care
Token ID1337
Feature activation-2.98e-02

Pos logit contributions
 detective+0.087
 cam+0.084
 customer+0.083
 cowboy+0.081
 chefs+0.081
Neg logit contributions
 shouldn-0.083
 hurts-0.078
|-0.077
elling-0.075
rated-0.074
Tokentaker
Token ID30157
Feature activation-2.26e-01

Pos logit contributions
|+0.054
idents+0.053
 shouldn+0.052
 hurts+0.051
rated+0.050
Neg logit contributions
 detective-0.038
 pirates-0.037
 cam-0.037
 knights-0.036
 rulers-0.035
Token.
Token ID13
Feature activation-4.10e-02

Pos logit contributions
idents+0.413
 shouldn+0.412
 hurts+0.407
|+0.401
 aback+0.399
Neg logit contributions
 detective-0.285
 pirates-0.273
 customer-0.265
 chefs-0.258
 explorers-0.256
Token He
Token ID679
Feature activation+6.14e-02

Pos logit contributions
 hurts+0.061
 shouldn+0.060
rated+0.060
|+0.057
opl+0.055
Neg logit contributions
 detective-0.064
 customer-0.064
 cam-0.063
 knights-0.062
 servant-0.061
Token used
Token ID973
Feature activation+1.07e-01

Pos logit contributions
 detective+0.099
 cam+0.095
 customer+0.094
 cowboy+0.092
 chefs+0.091
Neg logit contributions
 shouldn-0.094
 hurts-0.090
|-0.089
elling-0.087
rated-0.086
Token the
Token ID262
Feature activation-1.02e-02

Pos logit contributions
 detective+0.172
 cam+0.164
 customer+0.164
 pirates+0.160
 cowboy+0.159
Neg logit contributions
 shouldn-0.165
|-0.159
rated-0.156
 hurts-0.156
elling-0.153
Token soft
Token ID2705
Feature activation+1.37e-02

Pos logit contributions
|+0.021
 shouldn+0.021
elling+0.020
os+0.020
 hurts+0.020
Neg logit contributions
 detective-0.011
 customer-0.011
 cowboy-0.010
 cam-0.010
 pirates-0.010
Token birds
Token ID10087
Feature activation-1.01e-01

Pos logit contributions
|+0.163
 shouldn+0.159
elling+0.152
idents+0.151
rated+0.150
Neg logit contributions
 detective-0.121
 cam-0.115
 pirates-0.114
 knights-0.112
 chefs-0.110
Token,
Token ID11
Feature activation+1.33e-01

Pos logit contributions
 shouldn+0.161
|+0.155
 hurts+0.154
elling+0.153
rated+0.151
Neg logit contributions
 detective-0.154
 cam-0.146
 customer-0.144
 pirates-0.143
 cowboy-0.142
Token no
Token ID645
Feature activation+4.85e-02

Pos logit contributions
 detective+0.151
 pirates+0.147
 cam+0.143
 cowboy+0.139
 knights+0.136
Neg logit contributions
|-0.263
rated-0.258
elling-0.256
 hurts-0.250
ingle-0.249
Token matter
Token ID2300
Feature activation-5.79e-02

Pos logit contributions
 detective+0.088
 cam+0.086
 customer+0.086
 chefs+0.085
 cowboy+0.083
Neg logit contributions
 shouldn-0.064
 hurts-0.062
rated-0.058
elling-0.057
|-0.055
Token how
Token ID703
Feature activation+3.59e-03

Pos logit contributions
 shouldn+0.096
|+0.093
 hurts+0.093
rated+0.090
elling+0.090
Neg logit contributions
 detective-0.084
 cam-0.080
 customer-0.079
 pirates-0.078
 rulers-0.078
Token<|endoftext|>
Token ID50256
Feature activation-2.25e-01

Pos logit contributions
 pirates+0.003
 knights+0.003
 long+0.003
 cam+0.003
 rulers+0.003
Neg logit contributions
|-0.008
idents-0.007
ric-0.007
zi-0.007
elling-0.007
TokenOnce
Token ID7454
Feature activation-5.85e-04

Pos logit contributions
idents+0.362
rated+0.346
 shouldn+0.333
 disl+0.331
elling+0.329
Neg logit contributions
 pirates-0.352
 customer-0.346
 knights-0.339
P-0.339
 detective-0.338
Token upon
Token ID2402
Feature activation-1.71e-01

Pos logit contributions
elling+0.001
 shouldn+0.001
idents+0.001
rated+0.001
 hurts+0.001
Neg logit contributions
 detective-0.001
 customer-0.001
 pirates-0.001
 servant-0.001
 pirate-0.001
Token a
Token ID257
Feature activation-5.43e-05

Pos logit contributions
elling+0.376
 shouldn+0.362
rated+0.361
idents+0.358
 hurts+0.352
Neg logit contributions
 pirates-0.172
 detective-0.167
 knights-0.161
 pirate-0.155
 chefs-0.150
Token time
Token ID640
Feature activation-3.12e-02

Pos logit contributions
|+0.000
 shouldn+0.000
 hurts+0.000
rated+0.000
ches+0.000
Neg logit contributions
 detective-0.000
 chefs-0.000
 cam-0.000
 buyer-0.000
 servant-0.000
Token there
Token ID612
Feature activation-7.53e-02

Pos logit contributions
 shouldn+0.062
|+0.059
 hurts+0.059
os+0.057
 aback+0.056
Neg logit contributions
 detective-0.037
 customer-0.034
 cam-0.034
 cowboy-0.034
 chefs-0.033
Token understand
Token ID1833
Feature activation-7.35e-02

Pos logit contributions
 shouldn+0.043
 hurts+0.041
|+0.041
elling+0.040
rated+0.039
Neg logit contributions
 detective-0.054
 cam-0.052
 customer-0.052
 cowboy-0.051
 chefs-0.051
Token things
Token ID1243
Feature activation-4.35e-02

Pos logit contributions
 shouldn+0.105
 hurts+0.098
|+0.093
elling+0.093
 aback+0.093
Neg logit contributions
 detective-0.126
 cam-0.121
 customer-0.119
 chefs-0.117
 cowboy-0.117
Token a
Token ID257
Feature activation+4.94e-02

Pos logit contributions
|+0.075
 shouldn+0.075
elling+0.074
 hurts+0.073
idents+0.071
Neg logit contributions
 detective-0.057
 cam-0.053
 customer-0.052
 mer-0.052
 pirates-0.051
Token bit
Token ID1643
Feature activation+2.72e-02

Pos logit contributions
 detective+0.068
 cam+0.066
 customer+0.064
 chefs+0.064
 pirates+0.063
Neg logit contributions
 shouldn-0.087
 hurts-0.083
|-0.081
rated-0.080
idents-0.079
Token better
Token ID1365
Feature activation-1.28e-02

Pos logit contributions
 detective+0.044
 cam+0.042
 customer+0.042
 cowboy+0.041
 chefs+0.040
Neg logit contributions
 shouldn-0.042
 hurts-0.040
|-0.039
elling-0.039
rated-0.038
Token<|endoftext|>
Token ID50256
Feature activation-2.20e-01

Pos logit contributions
 shouldn+0.019
 hurts+0.018
rated+0.017
|+0.017
elling+0.017
Neg logit contributions
 detective-0.022
 cam-0.021
 customer-0.021
 cowboy-0.020
 chefs-0.020
TokenOnce
Token ID7454
Feature activation+6.22e-04

Pos logit contributions
 shouldn+0.270
idents+0.269
rated+0.263
 hurts+0.257
elling+0.250
Neg logit contributions
 detective-0.407
 cam-0.406
 pirates-0.400
 customer-0.400
 chefs-0.395
Token upon
Token ID2402
Feature activation-1.66e-01

Pos logit contributions
 detective+0.001
 customer+0.001
 cam+0.001
 pirates+0.001
 servant+0.001
Neg logit contributions
elling-0.001
 shouldn-0.001
agree-0.001
idents-0.001
 hurts-0.001
Token a
Token ID257
Feature activation+1.35e-02

Pos logit contributions
elling+0.357
 shouldn+0.344
idents+0.343
rated+0.341
 hurts+0.337
Neg logit contributions
 pirates-0.167
 detective-0.165
 knights-0.154
 chefs-0.153
 customer-0.151
Token time
Token ID640
Feature activation-2.83e-02

Pos logit contributions
 detective+0.023
 cam+0.022
 customer+0.022
 chefs+0.022
 servant+0.022
Neg logit contributions
 shouldn-0.019
|-0.019
 hurts-0.018
rated-0.018
idents-0.017
Token,
Token ID11
Feature activation+1.17e-01

Pos logit contributions
 shouldn+0.051
|+0.050
os+0.048
 hurts+0.048
 aback+0.046
Neg logit contributions
 detective-0.038
 cowboy-0.036
 customer-0.036
 cam-0.035
 rulers-0.035
Token also
Token ID635
Feature activation-3.97e-02

Pos logit contributions
 detective+0.081
 pirate+0.073
 ha+0.073
 cowboy+0.072
 pirates+0.071
Neg logit contributions
|-0.112
elling-0.109
upid-0.108
rated-0.105
opl-0.104
Token felt
Token ID2936
Feature activation-1.26e-02

Pos logit contributions
|+0.068
elling+0.066
rated+0.065
 shouldn+0.065
 hurts+0.064
Neg logit contributions
 detective-0.056
 cam-0.053
 customer-0.052
 ha-0.051
 pirates-0.051
Token better
Token ID1365
Feature activation-1.84e-02

Pos logit contributions
 shouldn+0.023
|+0.022
 hurts+0.022
rated+0.022
elling+0.022
Neg logit contributions
 detective-0.017
 cam-0.016
 customer-0.016
 pirates-0.015
 chefs-0.015
Token knowing
Token ID6970
Feature activation-5.48e-02

Pos logit contributions
 shouldn+0.027
 hurts+0.026
|+0.024
 aback+0.024
rated+0.024
Neg logit contributions
 detective-0.031
 cam-0.030
 customer-0.030
 cowboy-0.029
 rulers-0.029
Token that
Token ID326
Feature activation-7.25e-02

Pos logit contributions
 shouldn+0.080
 hurts+0.074
 aback+0.069
|+0.068
 doesn+0.068
Neg logit contributions
 detective-0.093
 cam-0.090
 customer-0.088
 cowboy-0.087
 chefs-0.087
Token<|endoftext|>
Token ID50256
Feature activation-2.20e-01

Pos logit contributions
 shouldn+0.120
 hurts+0.115
|+0.109
rated+0.107
 aback+0.106
Neg logit contributions
 detective-0.109
 cam-0.104
 customer-0.101
 cowboy-0.101
 chefs-0.100
TokenOnce
Token ID7454
Feature activation-1.43e-03

Pos logit contributions
idents+0.307
rated+0.300
 shouldn+0.298
 hurts+0.282
elling+0.279
Neg logit contributions
 cam-0.374
 pirates-0.374
 detective-0.368
 chefs-0.366
 customer-0.363
Token there
Token ID612
Feature activation-9.78e-02

Pos logit contributions
elling+0.002
 shouldn+0.002
rated+0.002
agree+0.002
idents+0.002
Neg logit contributions
 detective-0.002
 customer-0.002
 cam-0.002
 pirates-0.002
 servant-0.002
Token was
Token ID373
Feature activation-3.56e-02

Pos logit contributions
elling+0.177
rated+0.161
 hurts+0.160
idents+0.157
agree+0.157
Neg logit contributions
 ha-0.133
 cam-0.125
 detective-0.123
 customer-0.122
 pirates-0.122
Token a
Token ID257
Feature activation+2.75e-02

Pos logit contributions
 shouldn+0.065
 hurts+0.063
rated+0.062
|+0.061
elling+0.061
Neg logit contributions
 detective-0.047
 cam-0.045
 customer-0.044
 pirates-0.044
 chefs-0.044
Token girl
Token ID2576
Feature activation-6.77e-02

Pos logit contributions
 rulers+0.044
 penn+0.044
 cents+0.042
anges+0.041
 detective+0.041
Neg logit contributions
 shouldn-0.034
 hurts-0.031
|-0.030
 The-0.030
�-0.028
Token smiled
Token ID13541
Feature activation-3.99e-02

Pos logit contributions
elling+0.024
 shouldn+0.023
 hurts+0.022
rated+0.022
|+0.022
Neg logit contributions
 cam-0.024
 detective-0.024
 ha-0.022
 cowboy-0.022
 mer-0.022
Token and
Token ID290
Feature activation-5.49e-02

Pos logit contributions
 shouldn+0.055
 aback+0.050
 hurts+0.049
elling+0.048
idents+0.047
Neg logit contributions
 detective-0.069
 customer-0.068
 cam-0.068
 ha-0.068
 chefs-0.068
Token said
Token ID531
Feature activation-8.27e-03

Pos logit contributions
 shouldn+0.087
 hurts+0.082
|+0.081
elling+0.077
rated+0.077
Neg logit contributions
 detective-0.086
 customer-0.083
 cam-0.081
 chefs-0.080
 rulers-0.080
Token goodbye
Token ID24829
Feature activation-5.14e-02

Pos logit contributions
 shouldn+0.013
 hurts+0.012
 aback+0.012
'.+0.011
 Whenever+0.011
Neg logit contributions
 detective-0.013
 customer-0.013
 chefs-0.013
 rulers-0.012
 cam-0.012
Token before
Token ID878
Feature activation-1.12e-01

Pos logit contributions
 shouldn+0.079
 hurts+0.075
|+0.075
elling+0.073
rated+0.072
Neg logit contributions
 detective-0.082
 cam-0.079
 customer-0.078
 cowboy-0.076
 chefs-0.076
Token<|endoftext|>
Token ID50256
Feature activation-2.19e-01

Pos logit contributions
 shouldn+0.187
|+0.178
 hurts+0.176
idents+0.164
 Doing+0.163
Neg logit contributions
 detective-0.165
 customer-0.156
 servant-0.154
 cam-0.154
 rulers-0.154
TokenOne
Token ID3198
Feature activation+5.61e-02

Pos logit contributions
rated+0.320
idents+0.316
 shouldn+0.307
elling+0.298
 hurts+0.298
Neg logit contributions
 pirates-0.365
 cam-0.362
 detective-0.353
 chefs-0.351
 ha-0.350
Token day
Token ID1110
Feature activation-4.30e-02

Pos logit contributions
 customer+0.059
 cam+0.058
 fair+0.056
 ha+0.055
 pirate+0.055
Neg logit contributions
idents-0.117
 shouldn-0.111
 aback-0.111
elling-0.108
 hurts-0.106
Token,
Token ID11
Feature activation+1.11e-01

Pos logit contributions
 shouldn+0.078
 hurts+0.075
|+0.074
os+0.072
'.+0.071
Neg logit contributions
 detective-0.057
 customer-0.052
 cam-0.052
 servant-0.051
 chefs-0.051
Token something
Token ID1223
Feature activation+7.78e-03

Pos logit contributions
 detective+0.150
 customer+0.143
 cam+0.142
 pirates+0.142
 cowboy+0.139
Neg logit contributions
 shouldn-0.193
 hurts-0.190
elling-0.189
|-0.184
rated-0.184
Token bright
Token ID6016
Feature activation-5.38e-02

Pos logit contributions
 detective+0.012
 customer+0.011
 cam+0.011
 chefs+0.011
 rulers+0.011
Neg logit contributions
 shouldn-0.013
 hurts-0.012
|-0.012
rated-0.011
elling-0.011
Token how
Token ID703
Feature activation+7.82e-04

Pos logit contributions
 shouldn+0.085
 hurts+0.081
|+0.080
elling+0.078
rated+0.078
Neg logit contributions
 detective-0.083
 cam-0.079
 customer-0.079
 cowboy-0.076
 chefs-0.076
Token cold
Token ID4692
Feature activation-8.76e-02

Pos logit contributions
 cam+0.001
 detective+0.001
 pirates+0.001
 knights+0.001
 rulers+0.001
Neg logit contributions
|-0.002
idents-0.002
elling-0.002
 shouldn-0.002
agree-0.002
Token it
Token ID340
Feature activation-1.27e-01

Pos logit contributions
 shouldn+0.155
|+0.153
 hurts+0.149
elling+0.148
rated+0.146
Neg logit contributions
 detective-0.115
 cam-0.110
 customer-0.107
 pirates-0.107
 cowboy-0.107
Token got
Token ID1392
Feature activation+6.58e-02

Pos logit contributions
elling+0.206
upid+0.200
|+0.192
ric+0.190
 aback+0.190
Neg logit contributions
 detective-0.173
 mer-0.165
 pirates-0.164
 cam-0.164
 pirate-0.161
Token or
Token ID393
Feature activation-6.75e-02

Pos logit contributions
 detective+0.097
 cam+0.092
 customer+0.091
 cowboy+0.089
 chefs+0.089
Neg logit contributions
 shouldn-0.109
 hurts-0.104
|-0.104
elling-0.101
rated-0.100
Token<|endoftext|>
Token ID50256
Feature activation-2.19e-01

Pos logit contributions
|+0.115
elling+0.111
 shouldn+0.110
os+0.109
ingle+0.107
Neg logit contributions
 detective-0.091
 pirates-0.088
 cam-0.087
 rulers-0.086
 chefs-0.085
TokenOnce
Token ID7454
Feature activation+4.06e-03

Pos logit contributions
idents+0.298
rated+0.293
 shouldn+0.290
 hurts+0.281
elling+0.279
Neg logit contributions
 pirates-0.382
 cam-0.380
 detective-0.378
 customer-0.378
 chefs-0.371
Token there
Token ID612
Feature activation-9.51e-02

Pos logit contributions
 detective+0.005
 customer+0.005
 cam+0.005
 pirates+0.005
 pirate+0.005
Neg logit contributions
elling-0.007
 shouldn-0.007
rated-0.007
idents-0.007
 hurts-0.007
Token was
Token ID373
Feature activation-3.62e-02

Pos logit contributions
elling+0.154
 hurts+0.141
rated+0.141
 shouldn+0.138
idents+0.136
Neg logit contributions
 ha-0.144
 detective-0.141
 cam-0.141
 customer-0.139
 pirates-0.136
Token a
Token ID257
Feature activation+2.56e-02

Pos logit contributions
 shouldn+0.069
 hurts+0.068
rated+0.066
elling+0.065
idents+0.065
Neg logit contributions
 detective-0.045
 pirates-0.043
 cam-0.042
 chefs-0.042
 customer-0.042
Token little
Token ID1310
Feature activation-2.27e-02

Pos logit contributions
 rulers+0.040
 penn+0.039
 chefs+0.038
 detective+0.037
 cents+0.036
Neg logit contributions
 shouldn-0.033
|-0.031
 hurts-0.030
rig-0.029
 The-0.029
Token it
Token ID340
Feature activation-1.12e-01

Pos logit contributions
|+0.091
elling+0.091
 shouldn+0.089
rated+0.088
idents+0.087
Neg logit contributions
 detective-0.089
 pirates-0.086
 customer-0.085
 cam-0.085
 rulers-0.083
Token couldn
Token ID3521
Feature activation-1.34e-01

Pos logit contributions
opl+0.216
upid+0.210
 aback+0.204
elling+0.203
agree+0.202
Neg logit contributions
 mer-0.123
 pirates-0.123
 explorers-0.119
 pirate-0.117
 cam-0.117
Token't
Token ID470
Feature activation-1.20e-01

Pos logit contributions
rated+0.235
 aback+0.226
 disl+0.222
elling+0.216
anne+0.208
Neg logit contributions
 ha-0.183
 customer-0.150
 dressing-0.147
P-0.143
 b-0.141
Token tell
Token ID1560
Feature activation-5.33e-02

Pos logit contributions
elling+0.230
 aback+0.226
 shouldn+0.221
 Doing+0.218
idents+0.217
Neg logit contributions
 cam-0.138
 mer-0.132
 pirates-0.131
 detective-0.130
 customer-0.129
Token the
Token ID262
Feature activation-6.77e-02

Pos logit contributions
|+0.088
 shouldn+0.087
idents+0.084
rated+0.084
elling+0.084
Neg logit contributions
 detective-0.075
 customer-0.073
 cam-0.072
 pirates-0.071
 rulers-0.071
Token<|endoftext|>
Token ID50256
Feature activation-2.18e-01

Pos logit contributions
 shouldn+0.118
|+0.118
elling+0.114
 hurts+0.113
os+0.113
Neg logit contributions
 detective-0.092
 cam-0.090
 customer-0.088
 pirates-0.087
 chefs-0.086
TokenJohn
Token ID7554
Feature activation-1.05e-01

Pos logit contributions
idents+0.291
 shouldn+0.278
rated+0.271
 hurts+0.269
elling+0.264
Neg logit contributions
 customer-0.390
 pirates-0.386
 detective-0.386
 cam-0.383
 knights-0.375
Token was
Token ID373
Feature activation-5.02e-02

Pos logit contributions
|+0.184
idents+0.183
elling+0.181
 hurts+0.180
 shouldn+0.178
Neg logit contributions
 detective-0.138
 cam-0.134
 customer-0.132
 pirates-0.128
 pirate-0.126
Token walking
Token ID6155
Feature activation+3.81e-02

Pos logit contributions
 shouldn+0.099
 hurts+0.096
|+0.095
elling+0.093
rated+0.093
Neg logit contributions
 detective-0.058
 cam-0.055
 customer-0.055
 cowboy-0.052
 chefs-0.052
Token in
Token ID287
Feature activation-3.01e-02

Pos logit contributions
 detective+0.061
 customer+0.058
 cam+0.058
 pirates+0.057
 cowboy+0.057
Neg logit contributions
 shouldn-0.059
 hurts-0.057
|-0.055
elling-0.055
idents-0.053
Token a
Token ID257
Feature activation+3.58e-02

Pos logit contributions
 shouldn+0.043
 hurts+0.041
elling+0.040
rated+0.039
|+0.039
Neg logit contributions
 detective-0.052
 customer-0.051
 cam-0.050
 pirates-0.050
 chefs-0.048
Token and
Token ID290
Feature activation-5.48e-02

Pos logit contributions
 shouldn+0.162
 hurts+0.152
|+0.149
elling+0.148
rated+0.147
Neg logit contributions
 detective-0.155
 cam-0.148
 customer-0.148
 chefs-0.144
 rulers-0.142
Token waved
Token ID26834
Feature activation-1.70e-03

Pos logit contributions
 shouldn+0.085
 hurts+0.081
|+0.079
elling+0.077
rated+0.077
Neg logit contributions
 detective-0.088
 customer-0.084
 cam-0.083
 chefs-0.081
 cowboy-0.081
Token goodbye
Token ID24829
Feature activation-5.61e-02

Pos logit contributions
 aback+0.002
 shouldn+0.002
 hurts+0.002
'.+0.002
rated+0.002
Neg logit contributions
 detective-0.003
 chefs-0.003
 customer-0.003
 rulers-0.003
 cam-0.003
Token as
Token ID355
Feature activation-6.69e-03

Pos logit contributions
 shouldn+0.081
 hurts+0.077
|+0.075
rated+0.073
idents+0.073
Neg logit contributions
 detective-0.095
 customer-0.091
 cam-0.091
 chefs-0.089
 cowboy-0.088
Token it
Token ID340
Feature activation-1.24e-01

Pos logit contributions
 shouldn+0.011
 hurts+0.010
|+0.010
elling+0.010
rated+0.010
Neg logit contributions
 detective-0.010
 cam-0.009
 customer-0.009
 chefs-0.009
 cowboy-0.009
Token<|endoftext|>
Token ID50256
Feature activation-2.18e-01

Pos logit contributions
 shouldn+0.210
 hurts+0.201
|+0.199
rated+0.194
os+0.191
Neg logit contributions
 detective-0.179
 customer-0.172
 cam-0.168
 chefs-0.167
 rulers-0.162
TokenTom
Token ID13787
Feature activation-5.82e-02

Pos logit contributions
idents+0.318
rated+0.312
 shouldn+0.307
 hurts+0.297
elling+0.293
Neg logit contributions
 pirates-0.365
 cam-0.363
 detective-0.357
 chefs-0.352
 customer-0.351
Tokenmy
Token ID1820
Feature activation-1.06e-01

Pos logit contributions
idents+0.074
 shouldn+0.073
 hurts+0.072
|+0.071
agree+0.071
Neg logit contributions
 detective-0.103
 customer-0.102
 cam-0.101
 servant-0.098
 ha-0.096
Token was
Token ID373
Feature activation-5.56e-02

Pos logit contributions
 shouldn+0.184
|+0.182
 hurts+0.182
elling+0.180
rated+0.179
Neg logit contributions
 detective-0.141
 cam-0.138
 customer-0.131
 ha-0.130
 pirates-0.130
Token walking
Token ID6155
Feature activation+3.30e-02

Pos logit contributions
 shouldn+0.108
 hurts+0.105
idents+0.103
elling+0.102
|+0.102
Neg logit contributions
 detective-0.066
 cam-0.064
 customer-0.063
 pirates-0.061
 cowboy-0.060
Token in
Token ID287
Feature activation-3.26e-02

Pos logit contributions
 detective+0.057
 cam+0.055
 customer+0.055
 pirates+0.054
 cowboy+0.053
Neg logit contributions
 shouldn-0.048
 hurts-0.045
elling-0.043
|-0.043
idents-0.043
Token she
Token ID673
Feature activation+3.32e-02

Pos logit contributions
elling+0.118
|+0.117
 shouldn+0.116
rated+0.114
 hurts+0.113
Neg logit contributions
 detective-0.097
 cam-0.092
 customer-0.091
 ha-0.090
 pirates-0.090
Token couldn
Token ID3521
Feature activation-1.26e-01

Pos logit contributions
 ha+0.051
 detective+0.050
 cam+0.048
 pirate+0.048
 pirates+0.047
Neg logit contributions
elling-0.054
rated-0.050
|-0.050
upid-0.048
idents-0.048
Token't
Token ID470
Feature activation-1.18e-01

Pos logit contributions
rated+0.248
elling+0.223
 aback+0.212
rig+0.206
ric+0.205
Neg logit contributions
 ha-0.171
 customer-0.142
P-0.133
 soccer-0.131
 cowboy-0.127
Token get
Token ID651
Feature activation+2.81e-02

Pos logit contributions
 shouldn+0.184
 hurts+0.177
|+0.174
elling+0.172
rated+0.169
Neg logit contributions
 detective-0.185
 customer-0.177
 cam-0.177
 cowboy-0.172
 pirates-0.171
Token it
Token ID340
Feature activation-1.36e-01

Pos logit contributions
 detective+0.048
 customer+0.047
 cam+0.046
 pirates+0.045
 chefs+0.045
Neg logit contributions
 shouldn-0.040
|-0.038
 hurts-0.038
elling-0.037
rated-0.037
Token<|endoftext|>
Token ID50256
Feature activation-2.17e-01

Pos logit contributions
 shouldn+0.209
elling+0.199
|+0.198
 hurts+0.197
rated+0.192
Neg logit contributions
 detective-0.214
 customer-0.206
 cam-0.200
 pirates-0.199
 cowboy-0.199
TokenOnce
Token ID7454
Feature activation-2.75e-03

Pos logit contributions
idents+0.305
rated+0.303
 shouldn+0.292
 hurts+0.286
elling+0.284
Neg logit contributions
 pirates-0.372
 detective-0.364
 cam-0.364
P-0.363
 customer-0.362
Token upon
Token ID2402
Feature activation-1.65e-01

Pos logit contributions
elling+0.005
agree+0.005
rated+0.005
 shouldn+0.005
 aback+0.005
Neg logit contributions
 customer-0.003
 detective-0.003
 cam-0.003
 pirates-0.003
 dressing-0.003
Token a
Token ID257
Feature activation+1.18e-02

Pos logit contributions
elling+0.395
rated+0.371
idents+0.369
upid+0.364
opl+0.363
Neg logit contributions
 pirates-0.144
 detective-0.131
 knights-0.128
 chefs-0.125
 customer-0.119
Token time
Token ID640
Feature activation-3.01e-02

Pos logit contributions
 detective+0.019
 cam+0.018
 customer+0.018
 chefs+0.018
 cowboy+0.018
Neg logit contributions
 shouldn-0.018
|-0.017
 hurts-0.017
rated-0.016
elling-0.016
Token there
Token ID612
Feature activation-7.60e-02

Pos logit contributions
 shouldn+0.058
|+0.056
 hurts+0.055
os+0.054
idents+0.053
Neg logit contributions
 detective-0.036
 cam-0.034
 customer-0.034
 cowboy-0.034
 rulers-0.033
Token this
Token ID428
Feature activation-2.15e-02

Pos logit contributions
 shouldn+0.119
 hurts+0.112
|+0.109
elling+0.106
 aback+0.105
Neg logit contributions
 detective-0.108
 customer-0.102
 cam-0.102
 chefs-0.101
 cowboy-0.100
Token old
Token ID1468
Feature activation-5.01e-02

Pos logit contributions
|+0.030
 shouldn+0.030
elling+0.029
 hurts+0.029
idents+0.028
Neg logit contributions
 detective-0.036
 cam-0.035
 customer-0.035
 cowboy-0.034
 pirates-0.034
Token man
Token ID582
Feature activation-5.53e-02

Pos logit contributions
 shouldn+0.097
|+0.096
elling+0.094
 hurts+0.093
rated+0.091
Neg logit contributions
 detective-0.061
 customer-0.058
 cam-0.058
 cowboy-0.055
 pirates-0.054
Token might
Token ID1244
Feature activation-9.09e-02

Pos logit contributions
|+0.075
elling+0.074
 hurts+0.072
 shouldn+0.072
rated+0.071
Neg logit contributions
 detective-0.095
 customer-0.090
 cam-0.090
 ha-0.088
 cowboy-0.088
Token be
Token ID307
Feature activation+5.60e-02

Pos logit contributions
elling+0.167
 shouldn+0.155
os+0.155
idents+0.154
rated+0.150
Neg logit contributions
 cam-0.118
 detective-0.117
 customer-0.115
 ha-0.110
 mer-0.109
Token<|endoftext|>
Token ID50256
Feature activation-2.17e-01

Pos logit contributions
 detective+0.086
 cam+0.082
 customer+0.082
 cowboy+0.079
 chefs+0.079
Neg logit contributions
 shouldn-0.090
 hurts-0.086
|-0.084
elling-0.082
rated-0.082
TokenOnce
Token ID7454
Feature activation+8.34e-04

Pos logit contributions
 shouldn+0.270
idents+0.269
rated+0.267
elling+0.255
 hurts+0.254
Neg logit contributions
 detective-0.398
 cam-0.397
 pirates-0.395
 customer-0.395
 chefs-0.389
Token there
Token ID612
Feature activation-9.82e-02

Pos logit contributions
 customer+0.001
 detective+0.001
 cam+0.001
 pirates+0.001
 servant+0.001
Neg logit contributions
elling-0.001
rated-0.001
 shouldn-0.001
agree-0.001
idents-0.001
Token was
Token ID373
Feature activation-3.51e-02

Pos logit contributions
elling+0.168
 hurts+0.153
rated+0.152
idents+0.147
agree+0.147
Neg logit contributions
 ha-0.144
 detective-0.137
 cam-0.136
 customer-0.135
 pirates-0.132
Token a
Token ID257
Feature activation+2.31e-02

Pos logit contributions
 shouldn+0.066
 hurts+0.065
rated+0.064
elling+0.063
|+0.062
Neg logit contributions
 detective-0.044
 pirates-0.042
 cam-0.042
 chefs-0.041
 customer-0.041
Token little
Token ID1310
Feature activation-2.32e-02

Pos logit contributions
 rulers+0.037
 penn+0.036
 cents+0.034
 detective+0.034
 chefs+0.034
Neg logit contributions
 shouldn-0.029
|-0.026
 hurts-0.025
 The-0.025
rig-0.024
Token use
Token ID779
Feature activation+6.17e-02

Pos logit contributions
 shouldn+0.023
elling+0.022
rated+0.022
|+0.022
 hurts+0.022
Neg logit contributions
 detective-0.019
 customer-0.018
 ha-0.018
 cam-0.018
 rulers-0.017
Token it
Token ID340
Feature activation-8.70e-02

Pos logit contributions
 detective+0.088
 customer+0.085
 rulers+0.085
 pirates+0.085
 chefs+0.084
Neg logit contributions
 shouldn-0.102
|-0.102
rated-0.102
elling-0.095
 hurts-0.094
Token whenever
Token ID8797
Feature activation-1.95e-02

Pos logit contributions
 shouldn+0.138
|+0.136
 hurts+0.132
elling+0.131
rated+0.129
Neg logit contributions
 detective-0.131
 customer-0.124
 rulers-0.123
 pirates-0.122
 chefs-0.122
Token you
Token ID345
Feature activation-6.46e-02

Pos logit contributions
 shouldn+0.028
 hurts+0.027
|+0.026
elling+0.025
rated+0.025
Neg logit contributions
 detective-0.033
 cam-0.032
 customer-0.031
 cowboy-0.031
 chefs-0.030
Token want
Token ID765
Feature activation-5.14e-02

Pos logit contributions
elling+0.102
|+0.099
idents+0.098
 shouldn+0.097
 hurts+0.096
Neg logit contributions
 detective-0.090
 ha-0.090
 pirates-0.088
 cam-0.087
 rulers-0.086
Token<|endoftext|>
Token ID50256
Feature activation-2.17e-01

Pos logit contributions
|+0.073
 shouldn+0.072
rated+0.069
elling+0.068
 hurts+0.067
Neg logit contributions
 detective-0.086
 customer-0.084
 chefs-0.082
 pirates-0.082
 rulers-0.081
TokenOnce
Token ID7454
Feature activation-6.13e-05

Pos logit contributions
rated+0.351
idents+0.346
 shouldn+0.326
elling+0.321
 disl+0.310
Neg logit contributions
 pirates-0.344
P-0.336
 chefs-0.334
 ha-0.333
 customer-0.329
Token there
Token ID612
Feature activation-9.89e-02

Pos logit contributions
elling+0.000
 shouldn+0.000
 hurts+0.000
 aback+0.000
agree+0.000
Neg logit contributions
 customer-0.000
 detective-0.000
 pirates-0.000
 mate-0.000
 servant-0.000
Token was
Token ID373
Feature activation-3.92e-02

Pos logit contributions
elling+0.165
 hurts+0.151
rated+0.151
 shouldn+0.145
ches+0.145
Neg logit contributions
 ha-0.147
 customer-0.142
 detective-0.142
 cam-0.139
 pirates-0.138
Token a
Token ID257
Feature activation+1.87e-02

Pos logit contributions
 hurts+0.078
rated+0.076
 shouldn+0.076
elling+0.074
idents+0.074
Neg logit contributions
 pirates-0.046
 chefs-0.045
 detective-0.044
 cam-0.043
 knights-0.043
Token girl
Token ID2576
Feature activation-6.32e-02

Pos logit contributions
 rulers+0.027
 penn+0.027
 detective+0.026
 cam+0.026
 chefs+0.026
Neg logit contributions
 shouldn-0.027
|-0.025
 hurts-0.024
rig-0.023
 doesn-0.023
Token full
Token ID1336
Feature activation+5.79e-02

Pos logit contributions
 detective+0.232
 cam+0.222
 customer+0.221
 rulers+0.213
 chefs+0.212
Neg logit contributions
 shouldn-0.260
 hurts-0.244
|-0.241
elling-0.235
rated-0.232
Token of
Token ID286
Feature activation+2.50e-02

Pos logit contributions
 detective+0.107
 chefs+0.101
 cam+0.101
 customer+0.100
 cowboy+0.100
Neg logit contributions
 shouldn-0.075
 hurts-0.071
elling-0.068
|-0.066
 aback-0.064
Token love
Token ID1842
Feature activation-3.72e-02

Pos logit contributions
 detective+0.034
 cam+0.032
 customer+0.032
 cowboy+0.031
 chefs+0.031
Neg logit contributions
 shouldn-0.044
 hurts-0.042
|-0.041
elling-0.041
rated-0.040
Token.
Token ID13
Feature activation-3.42e-02

Pos logit contributions
 shouldn+0.057
 hurts+0.054
|+0.053
elling+0.052
rated+0.051
Neg logit contributions
 detective-0.060
 cam-0.058
 customer-0.057
 cowboy-0.056
 chefs-0.055
Token<|endoftext|>
Token ID50256
Feature activation-1.80e-01

Pos logit contributions
rated+0.063
 hurts+0.062
elling+0.059
ingle+0.054
upid+0.054
Neg logit contributions
 customer-0.046
 detective-0.043
 cam-0.042
 pirates-0.041
 knights-0.040
TokenSad
Token ID26699
Feature activation-2.16e-01

Pos logit contributions
idents+0.340
rated+0.337
elling+0.324
 disl+0.320
 shouldn+0.314
Neg logit contributions
 pirates-0.232
P-0.224
 customer-0.221
 chefs-0.219
 knights-0.218
Tokenie
Token ID494
Feature activation-7.83e-02

Pos logit contributions
 shouldn+0.397
 aback+0.391
idents+0.388
agree+0.377
|+0.376
Neg logit contributions
 cam-0.272
 customer-0.230
 princess-0.228
 knights-0.226
 neck-0.226
Token was
Token ID373
Feature activation-2.38e-02

Pos logit contributions
idents+0.136
elling+0.134
rated+0.133
 shouldn+0.130
upid+0.130
Neg logit contributions
 cam-0.106
 ha-0.104
 detective-0.103
 pirate-0.100
 pirates-0.096
Token feeling
Token ID4203
Feature activation-8.64e-02

Pos logit contributions
 shouldn+0.043
 hurts+0.041
|+0.041
elling+0.040
rated+0.039
Neg logit contributions
 detective-0.032
 cam-0.030
 customer-0.030
 cowboy-0.029
 chefs-0.029
Token angry
Token ID7954
Feature activation-1.73e-01

Pos logit contributions
 shouldn+0.158
 hurts+0.151
|+0.150
elling+0.147
rated+0.146
Neg logit contributions
 detective-0.114
 cam-0.109
 customer-0.108
 cowboy-0.104
 chefs-0.103
Token.
Token ID13
Feature activation-7.28e-02

Pos logit contributions
 shouldn+0.248
 hurts+0.233
elling+0.222
rated+0.218
idents+0.217
Neg logit contributions
 detective-0.293
 cam-0.283
 customer-0.283
 cowboy-0.272
 buyer-0.272
Token an
Token ID281
Feature activation+3.80e-02

Pos logit contributions
 detective+0.022
 cam+0.021
 customer+0.021
 chefs+0.020
 cowboy+0.020
Neg logit contributions
 shouldn-0.025
 hurts-0.025
|-0.025
elling-0.024
rated-0.024
Token adventure
Token ID8855
Feature activation+7.90e-02

Pos logit contributions
 detective+0.057
 cam+0.054
 customer+0.054
 chefs+0.053
 cowboy+0.052
Neg logit contributions
 shouldn-0.063
 hurts-0.059
elling-0.058
|-0.058
rated-0.058
Token they
Token ID484
Feature activation+2.72e-02

Pos logit contributions
 detective+0.124
 cam+0.118
 customer+0.118
 cowboy+0.114
 pirates+0.114
Neg logit contributions
 shouldn-0.123
 hurts-0.118
|-0.117
elling-0.115
rated-0.113
Token would
Token ID561
Feature activation+1.15e-02

Pos logit contributions
 detective+0.041
 cam+0.039
 customer+0.039
 cowboy+0.038
 pirates+0.038
Neg logit contributions
 hurts-0.044
elling-0.043
 shouldn-0.041
idents-0.041
|-0.041
Token never
Token ID1239
Feature activation-6.34e-02

Pos logit contributions
 detective+0.017
 cam+0.017
 customer+0.017
 rulers+0.016
 cowboy+0.016
Neg logit contributions
 shouldn-0.018
 hurts-0.018
elling-0.018
|-0.017
rated-0.017
Token<|endoftext|>
Token ID50256
Feature activation-2.16e-01

Pos logit contributions
 shouldn+0.091
 hurts+0.085
|+0.084
rated+0.081
elling+0.081
Neg logit contributions
 detective-0.109
 cam-0.105
 customer-0.105
 cowboy-0.102
 chefs-0.102
TokenOnce
Token ID7454
Feature activation+1.98e-04

Pos logit contributions
 shouldn+0.265
idents+0.264
 hurts+0.261
rated+0.258
elling+0.252
Neg logit contributions
 customer-0.401
 detective-0.400
 cam-0.396
 pirates-0.396
 chefs-0.387
Token upon
Token ID2402
Feature activation-1.68e-01

Pos logit contributions
 customer+0.000
 cam+0.000
 pirates+0.000
 detective+0.000
 mate+0.000
Neg logit contributions
elling-0.000
agree-0.000
rated-0.000
 hurts-0.000
|-0.000
Token a
Token ID257
Feature activation+1.13e-02

Pos logit contributions
elling+0.367
 shouldn+0.349
rated+0.345
 hurts+0.345
idents+0.342
Neg logit contributions
 pirates-0.168
 detective-0.163
 knights-0.155
 customer-0.152
 chefs-0.152
Token time
Token ID640
Feature activation-2.82e-02

Pos logit contributions
 detective+0.019
 cam+0.018
 customer+0.018
 chefs+0.018
 cowboy+0.017
Neg logit contributions
 shouldn-0.017
|-0.016
 hurts-0.016
rated-0.015
elling-0.015
Token,
Token ID11
Feature activation+1.17e-01

Pos logit contributions
 shouldn+0.053
|+0.050
 hurts+0.049
os+0.049
idents+0.048
Neg logit contributions
 detective-0.036
 customer-0.034
 cam-0.033
 cowboy-0.033
 rulers-0.033
Token more
Token ID517
Feature activation+4.80e-02

Pos logit contributions
 shouldn+0.044
 hurts+0.043
rated+0.042
 aback+0.042
 doesn+0.042
Neg logit contributions
 detective-0.037
 customer-0.037
 cam-0.037
 cowboy-0.036
 chefs-0.036
Token than
Token ID621
Feature activation+1.01e-03

Pos logit contributions
 detective+0.075
 cam+0.072
 customer+0.071
 cowboy+0.070
 chefs+0.069
Neg logit contributions
 shouldn-0.078
 hurts-0.073
|-0.072
elling-0.070
rated-0.070
Token his
Token ID465
Feature activation+2.40e-02

Pos logit contributions
 detective+0.002
 cam+0.002
 customer+0.002
 cowboy+0.002
 chefs+0.002
Neg logit contributions
 shouldn-0.001
 hurts-0.001
|-0.001
elling-0.001
rated-0.001
Token other
Token ID584
Feature activation-5.59e-03

Pos logit contributions
 detective+0.031
 customer+0.030
 chefs+0.030
 cowboy+0.030
 cam+0.030
Neg logit contributions
 shouldn-0.042
 hurts-0.041
 fault-0.039
idents-0.039
 doesn-0.038
Token toys
Token ID14958
Feature activation+3.74e-03

Pos logit contributions
 shouldn+0.010
 hurts+0.009
|+0.009
rated+0.009
elling+0.009
Neg logit contributions
 detective-0.008
 customer-0.008
 cowboy-0.007
 chefs-0.007
 cam-0.007
Token<|endoftext|>
Token ID50256
Feature activation-2.16e-01

Pos logit contributions
 detective+0.007
 cam+0.006
 customer+0.006
 cowboy+0.006
 chefs+0.006
Neg logit contributions
 shouldn-0.005
 hurts-0.005
|-0.005
rated-0.004
os-0.004
TokenAn
Token ID2025
Feature activation-7.42e-02

Pos logit contributions
idents+0.293
rated+0.287
 shouldn+0.286
 hurts+0.274
elling+0.272
Neg logit contributions
 pirates-0.379
 cam-0.377
 detective-0.372
 customer-0.371
 chefs-0.370
Token adventurous
Token ID39770
Feature activation-6.74e-02

Pos logit contributions
 shouldn+0.122
 hurts+0.120
idents+0.118
elling+0.113
 aback+0.112
Neg logit contributions
 detective-0.111
 customer-0.109
 cam-0.108
 pirates-0.105
 cowboy-0.104
Token boy
Token ID2933
Feature activation-9.02e-02

Pos logit contributions
 shouldn+0.114
 hurts+0.109
|+0.108
elling+0.106
rated+0.105
Neg logit contributions
 detective-0.097
 cam-0.092
 customer-0.092
 chefs-0.089
 cowboy-0.088
Token named
Token ID3706
Feature activation-4.49e-02

Pos logit contributions
 shouldn+0.157
 hurts+0.144
|+0.143
os+0.142
elling+0.139
Neg logit contributions
 detective-0.129
 customer-0.124
 cowboy-0.121
 cam-0.120
 rulers-0.119
Token Tom
Token ID4186
Feature activation-5.91e-02

Pos logit contributions
 hurts+0.060
elling+0.056
idents+0.056
 shouldn+0.055
 aback+0.055
Neg logit contributions
 cam-0.080
 detective-0.078
 cowboy-0.077
 pirates-0.077
 customer-0.076
Token,
Token ID11
Feature activation+1.36e-01

Pos logit contributions
 shouldn+0.232
 hurts+0.221
|+0.219
elling+0.214
rated+0.211
Neg logit contributions
 detective-0.233
 cam-0.223
 customer-0.222
 cowboy-0.216
 chefs-0.215
Token but
Token ID475
Feature activation-7.45e-02

Pos logit contributions
 detective+0.171
 cam+0.163
 customer+0.162
 cowboy+0.159
 pirates+0.157
Neg logit contributions
 shouldn-0.251
elling-0.244
|-0.243
 hurts-0.242
rated-0.236
Token still
Token ID991
Feature activation-8.18e-02

Pos logit contributions
 shouldn+0.129
 hurts+0.123
|+0.121
elling+0.118
rated+0.118
Neg logit contributions
 detective-0.105
 cam-0.100
 customer-0.099
 cowboy-0.096
 chefs-0.096
Token True
Token ID6407
Feature activation-2.31e-02

Pos logit contributions
 shouldn+0.140
|+0.140
elling+0.139
 hurts+0.134
idents+0.133
Neg logit contributions
 detective-0.111
 cam-0.105
 customer-0.103
 pirates-0.102
 ha-0.100
Token was
Token ID373
Feature activation-1.71e-02

Pos logit contributions
 shouldn+0.035
elling+0.035
|+0.035
 hurts+0.034
rated+0.033
Neg logit contributions
 detective-0.036
 cam-0.035
 customer-0.034
 servant-0.033
 ha-0.033
Token nowhere
Token ID12062
Feature activation-2.16e-01

Pos logit contributions
 shouldn+0.031
 hurts+0.030
|+0.030
elling+0.029
rated+0.029
Neg logit contributions
 detective-0.022
 cam-0.021
 customer-0.021
 cowboy-0.020
 chefs-0.020
Token to
Token ID284
Feature activation-6.67e-02

Pos logit contributions
 shouldn+0.344
|+0.339
elling+0.337
idents+0.336
rated+0.331
Neg logit contributions
 detective-0.327
 cam-0.306
 customer-0.305
 pirates-0.302
 cowboy-0.296
Token be
Token ID307
Feature activation+4.49e-02

Pos logit contributions
 shouldn+0.122
elling+0.120
idents+0.116
os+0.115
 hurts+0.114
Neg logit contributions
 detective-0.085
 customer-0.082
 ha-0.082
 cam-0.081
 cowboy-0.078
Token found
Token ID1043
Feature activation+3.44e-02

Pos logit contributions
 detective+0.060
 pirates+0.058
 cam+0.057
 knights+0.057
 cowboy+0.057
Neg logit contributions
idents-0.077
elling-0.077
 shouldn-0.075
|-0.074
 hurts-0.074
Token.
Token ID13
Feature activation-2.96e-02

Pos logit contributions
 detective+0.051
 customer+0.049
 cam+0.049
 pirates+0.048
 chefs+0.047
Neg logit contributions
 shouldn-0.055
|-0.054
 hurts-0.053
elling-0.052
rated-0.052
Token\n
Token ID198
Feature activation+5.75e-03

Pos logit contributions
rated+0.046
elling+0.043
 shouldn+0.041
ingle+0.041
upid+0.040
Neg logit contributions
 customer-0.049
 detective-0.046
 knights-0.045
 servant-0.045
 ha-0.044
Token day
Token ID1110
Feature activation-1.26e-01

Pos logit contributions
elling+0.017
|+0.017
 shouldn+0.017
idents+0.017
 hurts+0.017
Neg logit contributions
 detective-0.018
 cam-0.018
 customer-0.017
 pirates-0.017
 ha-0.017
Token,
Token ID11
Feature activation+1.15e-01

Pos logit contributions
 shouldn+0.252
 hurts+0.241
elling+0.240
|+0.239
rated+0.236
Neg logit contributions
 detective-0.144
 cam-0.137
 customer-0.135
 cowboy-0.130
 pirates-0.130
Token but
Token ID475
Feature activation-8.17e-02

Pos logit contributions
 detective+0.166
 cam+0.158
 customer+0.157
 cowboy+0.153
 pirates+0.152
Neg logit contributions
 shouldn-0.196
 hurts-0.188
|-0.185
elling-0.184
rated-0.181
Token it
Token ID340
Feature activation-1.19e-01

Pos logit contributions
 shouldn+0.138
 hurts+0.131
|+0.128
rated+0.124
idents+0.123
Neg logit contributions
 detective-0.118
 cam-0.113
 customer-0.111
 chefs-0.109
 rulers-0.108
Token was
Token ID373
Feature activation-2.45e-02

Pos logit contributions
elling+0.194
upid+0.188
 aback+0.177
idents+0.177
|+0.176
Neg logit contributions
 ha-0.172
 detective-0.172
 cam-0.167
 pirate-0.164
 pirates-0.162
Token nowhere
Token ID12062
Feature activation-2.14e-01

Pos logit contributions
elling+0.045
|+0.045
idents+0.045
 shouldn+0.045
os+0.044
Neg logit contributions
 detective-0.029
 customer-0.029
 cam-0.028
 pirates-0.028
 cowboy-0.027
Token to
Token ID284
Feature activation-6.31e-02

Pos logit contributions
idents+0.396
|+0.389
elling+0.383
 shouldn+0.376
upid+0.376
Neg logit contributions
 detective-0.265
 pirates-0.250
 cam-0.249
 customer-0.241
 knights-0.237
Token be
Token ID307
Feature activation+4.42e-02

Pos logit contributions
elling+0.125
 shouldn+0.123
idents+0.121
 aback+0.119
os+0.117
Neg logit contributions
 detective-0.069
 ha-0.069
 customer-0.067
 cam-0.065
 pirates-0.064
Token found
Token ID1043
Feature activation+2.95e-02

Pos logit contributions
 detective+0.058
 pirates+0.058
 knights+0.054
 pirate+0.054
 cam+0.053
Neg logit contributions
idents-0.080
elling-0.078
 hurts-0.074
 shouldn-0.073
|-0.073
Token.
Token ID13
Feature activation-3.71e-02

Pos logit contributions
 detective+0.043
 cam+0.041
 customer+0.041
 pirates+0.041
 chefs+0.040
Neg logit contributions
 shouldn-0.048
|-0.047
 hurts-0.047
idents-0.046
elling-0.045
Token He
Token ID679
Feature activation+6.92e-02

Pos logit contributions
rated+0.050
 hurts+0.049
elling+0.047
 shouldn+0.046
idents+0.045
Neg logit contributions
 customer-0.067
 detective-0.065
 pirates-0.063
 cowboy-0.062
 ha-0.062
Token elf
Token ID23878
Feature activation-4.04e-03

Pos logit contributions
 shouldn+0.067
 hurts+0.064
elling+0.061
 aback+0.060
|+0.060
Neg logit contributions
 detective-0.071
 chefs-0.070
 rulers-0.069
 customer-0.068
 cam-0.068
Token and
Token ID290
Feature activation-5.44e-02

Pos logit contributions
 shouldn+0.007
 hurts+0.007
|+0.007
elling+0.007
rated+0.007
Neg logit contributions
 detective-0.005
 customer-0.005
 cam-0.005
 servant-0.005
 rulers-0.005
Token waved
Token ID26834
Feature activation+7.13e-03

Pos logit contributions
 shouldn+0.081
 hurts+0.076
|+0.075
 aback+0.072
os+0.071
Neg logit contributions
 detective-0.088
 chefs-0.087
 customer-0.086
 rulers-0.084
 cam-0.083
Token goodbye
Token ID24829
Feature activation-5.34e-02

Pos logit contributions
 chefs+0.013
 detective+0.012
 customer+0.012
 rulers+0.012
 cam+0.012
Neg logit contributions
 aback-0.009
 shouldn-0.009
 hurts-0.009
elling-0.008
'.-0.008
Token as
Token ID355
Feature activation-1.02e-02

Pos logit contributions
 shouldn+0.080
 hurts+0.076
|+0.074
 aback+0.073
rated+0.073
Neg logit contributions
 detective-0.088
 customer-0.084
 cam-0.083
 chefs-0.082
 cowboy-0.082
Token<|endoftext|>
Token ID50256
Feature activation-2.14e-01

Pos logit contributions
 shouldn+0.015
 hurts+0.015
|+0.015
os+0.014
elling+0.014
Neg logit contributions
 detective-0.016
 customer-0.015
 cam-0.015
 cowboy-0.015
 chefs-0.015
TokenTom
Token ID13787
Feature activation-5.53e-02

Pos logit contributions
idents+0.282
 shouldn+0.277
rated+0.275
 hurts+0.265
elling+0.258
Neg logit contributions
 cam-0.384
 detective-0.383
 pirates-0.377
 ha-0.376
 customer-0.374
Tokenmy
Token ID1820
Feature activation-1.05e-01

Pos logit contributions
 shouldn+0.065
 hurts+0.063
idents+0.062
|+0.061
rated+0.059
Neg logit contributions
 detective-0.105
 customer-0.104
 cam-0.103
 servant-0.099
 ha-0.099
Token was
Token ID373
Feature activation-5.21e-02

Pos logit contributions
 shouldn+0.183
 hurts+0.178
|+0.178
elling+0.175
rated+0.175
Neg logit contributions
 detective-0.142
 cam-0.136
 customer-0.132
 pirates-0.130
 cowboy-0.129
Token walking
Token ID6155
Feature activation+3.31e-02

Pos logit contributions
 shouldn+0.102
 hurts+0.099
|+0.097
elling+0.097
idents+0.096
Neg logit contributions
 detective-0.061
 customer-0.058
 cam-0.058
 pirates-0.056
 cowboy-0.055
Token in
Token ID287
Feature activation-3.13e-02

Pos logit contributions
 detective+0.061
 cam+0.058
 customer+0.058
 cowboy+0.057
 chefs+0.056
Neg logit contributions
 shouldn-0.044
 hurts-0.041
|-0.041
elling-0.040
rated-0.039
Token him
Token ID683
Feature activation-3.99e-02

Pos logit contributions
 shouldn+0.027
elling+0.026
 hurts+0.026
|+0.026
rated+0.025
Neg logit contributions
 detective-0.030
 customer-0.030
 cam-0.029
 pirates-0.029
 cowboy-0.028
Token from
Token ID422
Feature activation-2.01e-02

Pos logit contributions
 shouldn+0.057
 hurts+0.054
|+0.053
elling+0.052
rated+0.051
Neg logit contributions
 detective-0.068
 cam-0.066
 customer-0.065
 chefs-0.063
 cowboy-0.063
Token playing
Token ID2712
Feature activation+3.82e-02

Pos logit contributions
 shouldn+0.034
 hurts+0.033
|+0.031
rated+0.031
elling+0.031
Neg logit contributions
 detective-0.030
 cam-0.028
 customer-0.027
 cowboy-0.027
 servant-0.027
Token too
Token ID1165
Feature activation-1.29e-01

Pos logit contributions
 detective+0.052
 cam+0.050
 customer+0.049
 cowboy+0.048
 servant+0.047
Neg logit contributions
 shouldn-0.067
 hurts-0.064
|-0.063
 aback-0.061
elling-0.061
Token deep
Token ID2769
Feature activation-6.45e-02

Pos logit contributions
 shouldn+0.231
 hurts+0.222
|+0.217
rated+0.211
elling+0.210
Neg logit contributions
 detective-0.178
 cam-0.167
 customer-0.166
 cowboy-0.162
 chefs-0.160
Token<|endoftext|>
Token ID50256
Feature activation-2.13e-01

Pos logit contributions
 shouldn+0.102
 hurts+0.098
|+0.096
elling+0.094
rated+0.093
Neg logit contributions
 detective-0.100
 customer-0.096
 cam-0.095
 cowboy-0.093
 rulers-0.091
TokenOnce
Token ID7454
Feature activation-6.04e-05

Pos logit contributions
 shouldn+0.256
idents+0.252
rated+0.246
 hurts+0.237
elling+0.237
Neg logit contributions
 pirates-0.403
 detective-0.403
 cam-0.401
 customer-0.401
 chefs-0.397
Token upon
Token ID2402
Feature activation-1.65e-01

Pos logit contributions
elling+0.000
agree+0.000
 aback+0.000
 shouldn+0.000
|+0.000
Neg logit contributions
 customer-0.000
 detective-0.000
 pirates-0.000
 cam-0.000
 mate-0.000
Token a
Token ID257
Feature activation+6.85e-03

Pos logit contributions
elling+0.366
 shouldn+0.350
rated+0.349
idents+0.343
 hurts+0.342
Neg logit contributions
 pirates-0.163
 detective-0.155
 knights-0.149
 chefs-0.148
 pirate-0.144
Token time
Token ID640
Feature activation-3.26e-02

Pos logit contributions
 detective+0.012
 cam+0.011
 customer+0.011
 chefs+0.011
 servant+0.011
Neg logit contributions
 shouldn-0.010
|-0.009
 hurts-0.009
rated-0.009
idents-0.008
Token there
Token ID612
Feature activation-8.05e-02

Pos logit contributions
 shouldn+0.063
|+0.060
 hurts+0.060
os+0.058
idents+0.057
Neg logit contributions
 detective-0.040
 customer-0.038
 cowboy-0.037
 cam-0.037
 rulers-0.036
Token smiled
Token ID13541
Feature activation-5.50e-02

Pos logit contributions
 shouldn+0.115
 hurts+0.109
|+0.108
elling+0.105
rated+0.104
Neg logit contributions
 detective-0.107
 customer-0.103
 cam-0.102
 chefs-0.099
 cowboy-0.099
Token as
Token ID355
Feature activation+1.14e-03

Pos logit contributions
 shouldn+0.085
 aback+0.078
 hurts+0.078
idents+0.076
rated+0.076
Neg logit contributions
 detective-0.086
 customer-0.084
 cam-0.083
 ha-0.082
 chefs-0.081
Token they
Token ID484
Feature activation+6.94e-03

Pos logit contributions
 detective+0.002
 cam+0.002
 customer+0.002
 cowboy+0.002
 pirates+0.002
Neg logit contributions
 shouldn-0.002
 hurts-0.002
|-0.002
elling-0.002
rated-0.002
Token watched
Token ID7342
Feature activation+4.65e-02

Pos logit contributions
 detective+0.010
 customer+0.010
 chefs+0.010
 cam+0.010
 pirates+0.009
Neg logit contributions
 shouldn-0.012
|-0.011
os-0.011
 hurts-0.010
rated-0.010
Token it
Token ID340
Feature activation-1.33e-01

Pos logit contributions
 detective+0.077
 cam+0.074
 customer+0.073
 pirates+0.072
 cowboy+0.072
Neg logit contributions
 shouldn-0.069
 hurts-0.066
|-0.066
elling-0.064
rated-0.063
Token<|endoftext|>
Token ID50256
Feature activation-2.13e-01

Pos logit contributions
 shouldn+0.221
 hurts+0.212
|+0.210
rated+0.204
elling+0.203
Neg logit contributions
 detective-0.195
 customer-0.187
 cam-0.186
 chefs-0.182
 cowboy-0.180
TokenOnce
Token ID7454
Feature activation+6.70e-03

Pos logit contributions
rated+0.350
idents+0.346
 shouldn+0.323
elling+0.321
 hurts+0.315
Neg logit contributions
 pirates-0.334
P-0.322
 cam-0.319
 knights-0.316
 ha-0.315
Token upon
Token ID2402
Feature activation-1.68e-01

Pos logit contributions
 detective+0.009
 customer+0.009
 pirates+0.009
 cam+0.009
 mate+0.009
Neg logit contributions
elling-0.011
 shouldn-0.011
 hurts-0.011
idents-0.011
rated-0.011
Token a
Token ID257
Feature activation-3.71e-04

Pos logit contributions
elling+0.362
 shouldn+0.350
rated+0.349
idents+0.346
 hurts+0.345
Neg logit contributions
 pirates-0.173
 detective-0.169
 knights-0.161
 chefs-0.159
 pirate-0.155
Token time
Token ID640
Feature activation-2.87e-02

Pos logit contributions
 shouldn+0.001
|+0.001
 hurts+0.000
rated+0.000
elling+0.000
Neg logit contributions
 detective-0.001
 cam-0.001
 customer-0.001
 chefs-0.001
 servant-0.001
Token there
Token ID612
Feature activation-7.24e-02

Pos logit contributions
 shouldn+0.058
|+0.056
 hurts+0.056
os+0.053
elling+0.053
Neg logit contributions
 detective-0.033
 customer-0.031
 cam-0.030
 cowboy-0.030
 chefs-0.029
Token to
Token ID284
Feature activation-1.86e-02

Pos logit contributions
 shouldn+0.042
 hurts+0.040
|+0.038
elling+0.038
idents+0.037
Neg logit contributions
 detective-0.064
 customer-0.061
 cam-0.060
 rulers-0.060
 buyer-0.059
Token play
Token ID711
Feature activation+2.12e-02

Pos logit contributions
 hurts+0.022
|+0.021
 shouldn+0.021
 Whenever+0.020
 disl+0.019
Neg logit contributions
 detective-0.035
 buyer-0.033
 cam-0.033
 customer-0.033
 chefs-0.033
Token in
Token ID287
Feature activation-2.02e-02

Pos logit contributions
 cents+0.043
 customer+0.040
 buyer+0.040
 cam+0.040
 servant+0.040
Neg logit contributions
|-0.021
 hurts-0.020
 up-0.020
ches-0.019
 It-0.019
Token her
Token ID607
Feature activation+1.58e-02

Pos logit contributions
 shouldn+0.023
|+0.023
 hurts+0.023
elling+0.021
idents+0.020
Neg logit contributions
 detective-0.040
 cam-0.038
 customer-0.038
 cowboy-0.037
 servant-0.037
Token backyard
Token ID24296
Feature activation+5.58e-02

Pos logit contributions
 detective+0.024
 customer+0.024
 rulers+0.024
 cents+0.024
 cam+0.023
Neg logit contributions
 hurts-0.024
elling-0.022
|-0.022
 shouldn-0.022
 belongs-0.021
Token,
Token ID11
Feature activation+1.43e-01

Pos logit contributions
 detective+0.108
 cowboy+0.108
 servant+0.098
 cam+0.097
 cents+0.097
Neg logit contributions
 hurts-0.064
os-0.064
|-0.064
 Whenever-0.063
 shouldn-0.062
Token where
Token ID810
Feature activation+1.67e-02

Pos logit contributions
 detective+0.196
 customer+0.188
 servant+0.181
 cam+0.181
 rulers+0.177
Neg logit contributions
 shouldn-0.258
|-0.249
 hurts-0.245
idents-0.231
rated-0.231
Token she
Token ID673
Feature activation+1.14e-02

Pos logit contributions
 detective+0.023
 customer+0.022
 cam+0.022
 cowboy+0.022
 rulers+0.021
Neg logit contributions
 shouldn-0.029
 hurts-0.028
|-0.027
os-0.026
elling-0.026
Token had
Token ID550
Feature activation-5.88e-04

Pos logit contributions
 detective+0.021
 customer+0.021
 cowboy+0.020
 buyer+0.020
 rulers+0.020
Neg logit contributions
 shouldn-0.015
os-0.014
 hurts-0.014
|-0.013
 doesn-0.013
Token a
Token ID257
Feature activation+6.06e-02

Pos logit contributions
 shouldn+0.001
 hurts+0.001
|+0.001
 aback+0.001
elling+0.001
Neg logit contributions
 detective-0.001
 customer-0.001
 cam-0.001
 servant-0.001
 cowboy-0.001
Token sandbox
Token ID35204
Feature activation+1.40e-01

Pos logit contributions
 cam+0.092
ider+0.092
anges+0.090
 rulers+0.090
per+0.089
Neg logit contributions
 shouldn-0.088
 hurts-0.084
idents-0.076
 fault-0.072
 doesn-0.070
Token were
Token ID547
Feature activation+4.15e-04

Pos logit contributions
 shouldn+0.045
 hurts+0.043
|+0.042
elling+0.041
rated+0.040
Neg logit contributions
 detective-0.059
 cam-0.056
 customer-0.056
 cowboy-0.055
 chefs-0.055
Token flying
Token ID7348
Feature activation+3.77e-03

Pos logit contributions
 detective+0.001
 customer+0.001
 cam+0.001
 buyer+0.001
 ha+0.000
Neg logit contributions
 shouldn-0.001
os-0.001
 hurts-0.001
|-0.001
ric-0.001
Token.
Token ID13
Feature activation-3.94e-02

Pos logit contributions
 detective+0.005
 cam+0.005
 pirates+0.005
 customer+0.005
 rulers+0.005
Neg logit contributions
 shouldn-0.007
elling-0.007
 hurts-0.007
idents-0.006
rated-0.006
Token
Token ID198
Feature activation+2.77e-03

Pos logit contributions
 shouldn+0.055
rated+0.054
elling+0.052
 hurts+0.052
|+0.049
Neg logit contributions
 detective-0.069
 customer-0.067
 cam-0.066
 pirates-0.064
 knights-0.064
Token
Token ID198
Feature activation+1.76e-02

Pos logit contributions
 customer+0.004
 detective+0.004
 cam+0.004
 pirates+0.004
 knights+0.004
Neg logit contributions
 shouldn-0.004
elling-0.004
rated-0.004
idents-0.004
 hurts-0.004
TokenOne
Token ID3198
Feature activation+1.44e-01

Pos logit contributions
 detective+0.023
 cam+0.022
 chefs+0.021
 servant+0.021
 customer+0.021
Neg logit contributions
 shouldn-0.032
|-0.032
 hurts-0.031
rated-0.030
elling-0.029
Token day
Token ID1110
Feature activation+3.48e-02

Pos logit contributions
 detective+0.243
 cam+0.236
 customer+0.234
 cowboy+0.225
 pirates+0.225
Neg logit contributions
 shouldn-0.212
 hurts-0.198
|-0.196
elling-0.192
idents-0.190
Token,
Token ID11
Feature activation+1.83e-01

Pos logit contributions
 detective+0.054
 cowboy+0.053
 buyer+0.051
 servant+0.050
 rulers+0.050
Neg logit contributions
 hurts-0.053
 shouldn-0.050
'.-0.048
|-0.048
 means-0.047
Token Lily
Token ID20037
Feature activation-1.06e-03

Pos logit contributions
 detective+0.258
 cam+0.250
 cowboy+0.238
 customer+0.238
 rulers+0.238
Neg logit contributions
 shouldn-0.312
 hurts-0.298
|-0.295
idents-0.291
 aback-0.279
Token met
Token ID1138
Feature activation+1.09e-01

Pos logit contributions
 shouldn+0.002
elling+0.002
|+0.002
idents+0.002
 hurts+0.002
Neg logit contributions
 detective-0.002
 cam-0.001
 pirates-0.001
 cowboy-0.001
 chefs-0.001
Token a
Token ID257
Feature activation+8.41e-02

Pos logit contributions
 detective+0.146
 customer+0.139
 cam+0.138
 pirates+0.135
 chefs+0.133
Neg logit contributions
 shouldn-0.198
 hurts-0.188
rated-0.187
|-0.187
elling-0.184
Token made
Token ID925
Feature activation+4.50e-02

Pos logit contributions
 detective+0.041
 customer+0.039
 cam+0.039
 cowboy+0.038
 pirates+0.037
Neg logit contributions
 shouldn-0.047
|-0.045
 hurts-0.045
elling-0.045
idents-0.044
Token them
Token ID606
Feature activation-4.20e-02

Pos logit contributions
 cam+0.054
 pirates+0.054
 customer+0.053
 detective+0.052
 chefs+0.051
Neg logit contributions
opl-0.080
|-0.079
 hurts-0.078
 shouldn-0.077
os-0.077
Token even
Token ID772
Feature activation+7.17e-04

Pos logit contributions
elling+0.083
|+0.078
 shouldn+0.074
opl+0.073
rated+0.073
Neg logit contributions
 detective-0.048
 rulers-0.047
 customer-0.047
 pirates-0.047
 cam-0.046
Token more
Token ID517
Feature activation+6.12e-02

Pos logit contributions
 detective+0.001
 cam+0.001
 customer+0.001
 chefs+0.001
 cowboy+0.001
Neg logit contributions
 shouldn-0.001
 hurts-0.001
|-0.001
rated-0.001
 aback-0.001
Token angry
Token ID7954
Feature activation-1.13e-01

Pos logit contributions
 detective+0.073
 cam+0.068
 pirates+0.067
 customer+0.066
 cowboy+0.066
Neg logit contributions
|-0.117
 shouldn-0.115
elling-0.115
 hurts-0.112
idents-0.112
Token,
Token ID11
Feature activation+1.45e-01

Pos logit contributions
 shouldn+0.189
 hurts+0.181
|+0.178
elling+0.175
rated+0.173
Neg logit contributions
 detective-0.166
 cam-0.158
 customer-0.158
 cowboy-0.153
 chefs-0.152
Token so
Token ID523
Feature activation-5.06e-02

Pos logit contributions
 detective+0.173
 cam+0.164
 customer+0.161
 pirates+0.160
 cowboy+0.158
Neg logit contributions
|-0.270
 shouldn-0.269
elling-0.267
 hurts-0.265
rated-0.259
Token they
Token ID484
Feature activation+2.40e-02

Pos logit contributions
 shouldn+0.093
|+0.093
elling+0.092
 hurts+0.091
os+0.089
Neg logit contributions
 detective-0.061
 cam-0.059
 customer-0.057
 pirates-0.056
 cowboy-0.055
Token started
Token ID2067
Feature activation+6.00e-02

Pos logit contributions
 detective+0.036
 cam+0.034
 customer+0.034
 cowboy+0.034
 pirates+0.033
Neg logit contributions
elling-0.038
 hurts-0.038
 shouldn-0.038
|-0.037
idents-0.036
Token shouting
Token ID19642
Feature activation-4.47e-02

Pos logit contributions
 detective+0.085
 cam+0.082
 customer+0.081
 pirates+0.079
 chefs+0.078
Neg logit contributions
 shouldn-0.101
elling-0.101
|-0.098
rated-0.097
 hurts-0.096
Token at
Token ID379
Feature activation-1.53e-02

Pos logit contributions
 shouldn+0.074
 hurts+0.071
|+0.071
elling+0.069
rated+0.068
Neg logit contributions
 detective-0.066
 cam-0.063
 customer-0.063
 cowboy-0.061
 chefs-0.061
Token world
Token ID995
Feature activation-8.33e-02

Pos logit contributions
 detective+0.037
 customer+0.035
 cam+0.035
 cowboy+0.034
 chefs+0.034
Neg logit contributions
 shouldn-0.045
 hurts-0.044
elling-0.041
|-0.041
rated-0.041
Token outside
Token ID2354
Feature activation-3.00e-02

Pos logit contributions
 shouldn+0.131
 hurts+0.125
|+0.124
elling+0.122
rated+0.120
Neg logit contributions
 detective-0.130
 cam-0.123
 customer-0.123
 pirates-0.120
 chefs-0.119
Token.
Token ID13
Feature activation-3.60e-02

Pos logit contributions
 shouldn+0.046
|+0.044
 hurts+0.044
elling+0.043
rated+0.043
Neg logit contributions
 detective-0.047
 cam-0.046
 customer-0.045
 pirates-0.045
 chefs-0.044
Token One
Token ID1881
Feature activation+1.14e-01

Pos logit contributions
rated+0.053
 hurts+0.053
 shouldn+0.050
elling+0.049
idents+0.048
Neg logit contributions
 detective-0.060
 customer-0.059
 pirates-0.058
 knights-0.058
 cam-0.058
Token day
Token ID1110
Feature activation+2.99e-02

Pos logit contributions
 detective+0.169
 cam+0.167
 customer+0.162
 pirates+0.159
 pirate+0.156
Neg logit contributions
 shouldn-0.186
idents-0.177
 aback-0.177
 hurts-0.175
elling-0.174
Token,
Token ID11
Feature activation+1.71e-01

Pos logit contributions
 detective+0.039
 customer+0.037
 servant+0.035
 rulers+0.035
 cam+0.035
Neg logit contributions
 shouldn-0.055
 hurts-0.053
|-0.051
 aback-0.050
os-0.050
Token the
Token ID262
Feature activation-7.70e-03

Pos logit contributions
 detective+0.261
 cam+0.248
 customer+0.247
 rulers+0.240
 cowboy+0.238
Neg logit contributions
 shouldn-0.276
 hurts-0.260
|-0.259
idents-0.250
rated-0.245
Token cat
Token ID3797
Feature activation-3.71e-02

Pos logit contributions
 shouldn+0.010
 hurts+0.010
elling+0.009
idents+0.009
 doesn+0.008
Neg logit contributions
 detective-0.013
 customer-0.013
 cam-0.013
 rulers-0.013
 buyer-0.013
Token saw
Token ID2497
Feature activation+6.24e-02

Pos logit contributions
 shouldn+0.065
elling+0.064
|+0.064
 hurts+0.063
idents+0.063
Neg logit contributions
 detective-0.050
 cam-0.048
 pirates-0.046
 cowboy-0.045
 chefs-0.045
Token a
Token ID257
Feature activation+7.36e-02

Pos logit contributions
 detective+0.071
 pirates+0.071
 cam+0.070
 chefs+0.069
 knights+0.067
Neg logit contributions
|-0.127
rated-0.126
 shouldn-0.122
idents-0.119
 hurts-0.118
Token big
Token ID1263
Feature activation+6.11e-02

Pos logit contributions
 rulers+0.104
 cam+0.103
 detective+0.102
 pirates+0.102
 chefs+0.101
Neg logit contributions
 shouldn-0.120
 hurts-0.115
idents-0.102
 doesn-0.097
|-0.096
Token named
Token ID3706
Feature activation+1.66e-02

Pos logit contributions
 detective+0.018
 customer+0.017
 cam+0.017
 cowboy+0.016
 chefs+0.016
Neg logit contributions
 shouldn-0.025
 hurts-0.024
|-0.023
os-0.023
elling-0.022
Token Leo
Token ID19632
Feature activation-2.02e-02

Pos logit contributions
 detective+0.027
 cam+0.027
 customer+0.026
 pirates+0.026
 knights+0.026
Neg logit contributions
 shouldn-0.024
elling-0.024
rated-0.024
 hurts-0.023
idents-0.023
Token.
Token ID13
Feature activation-2.93e-02

Pos logit contributions
 shouldn+0.031
 hurts+0.030
|+0.029
elling+0.029
rated+0.028
Neg logit contributions
 detective-0.032
 cam-0.031
 customer-0.031
 cowboy-0.030
 pirates-0.030
Token Max
Token ID5436
Feature activation-4.74e-03

Pos logit contributions
 shouldn+0.043
|+0.041
elling+0.041
rated+0.041
 hurts+0.040
Neg logit contributions
 detective-0.049
 cam-0.048
 customer-0.047
 knights-0.046
 servant-0.046
Token said
Token ID531
Feature activation+1.19e-02

Pos logit contributions
elling+0.007
upid+0.007
|+0.007
rated+0.007
 aback+0.007
Neg logit contributions
 detective-0.007
 cam-0.007
 pirates-0.007
 cowboy-0.007
 ha-0.007
Token,
Token ID11
Feature activation+1.50e-01

Pos logit contributions
 detective+0.016
 customer+0.016
 cam+0.015
 rulers+0.015
 chefs+0.015
Neg logit contributions
 shouldn-0.021
 hurts-0.021
os-0.019
 doesn-0.019
 aback-0.019
Token "
Token ID366
Feature activation+6.65e-02

Pos logit contributions
 detective+0.202
 cam+0.192
 customer+0.191
 chefs+0.183
 cowboy+0.183
Neg logit contributions
 shouldn-0.270
 hurts-0.257
|-0.254
elling-0.249
rated-0.247
TokenHi
Token ID17250
Feature activation+4.99e-02

Pos logit contributions
 detective+0.084
P+0.082
 customer+0.080
 cam+0.079
 cowboy+0.078
Neg logit contributions
 fault-0.126
 shouldn-0.125
idents-0.121
elling-0.120
upid-0.119
Token,
Token ID11
Feature activation+1.70e-01

Pos logit contributions
 detective+0.063
 customer+0.060
 cam+0.060
 buyer+0.057
 rulers+0.057
Neg logit contributions
 shouldn-0.093
 hurts-0.090
|-0.086
os-0.085
elling-0.085
Token I
Token ID314
Feature activation+4.10e-02

Pos logit contributions
 detective+0.249
 cam+0.238
 customer+0.235
 cowboy+0.231
 pirates+0.229
Neg logit contributions
 shouldn-0.286
|-0.276
rated-0.274
 hurts-0.273
elling-0.269
Token am
Token ID716
Feature activation+4.95e-02

Pos logit contributions
 detective+0.077
 pirates+0.072
 cam+0.071
 ha+0.071
 chefs+0.071
Neg logit contributions
elling-0.055
rated-0.052
 shouldn-0.050
 hurts-0.050
idents-0.050
Token he
Token ID339
Feature activation+2.40e-02

Pos logit contributions
 shouldn+0.027
 hurts+0.023
 means+0.023
elling+0.023
 doesn+0.022
Neg logit contributions
 detective-0.045
 servant-0.043
 cam-0.043
 customer-0.042
 rulers-0.042
Token doing
Token ID1804
Feature activation+4.69e-03

Pos logit contributions
 detective+0.042
 cam+0.040
 customer+0.040
 cowboy+0.039
 chefs+0.039
Neg logit contributions
 shouldn-0.034
 hurts-0.032
|-0.032
elling-0.031
rated-0.030
Token?"
Token ID1701
Feature activation-1.09e-01

Pos logit contributions
 detective+0.009
 cam+0.008
 customer+0.008
 servant+0.008
 rulers+0.008
Neg logit contributions
 shouldn-0.006
 hurts-0.006
elling-0.005
|-0.005
idents-0.005
Token
Token ID198
Feature activation-6.88e-03

Pos logit contributions
 shouldn+0.146
rated+0.138
|+0.138
 hurts+0.137
elling+0.136
Neg logit contributions
 customer-0.200
 detective-0.197
 cam-0.191
 pirates-0.185
 cowboy-0.185
Token
Token ID198
Feature activation+7.28e-03

Pos logit contributions
rated+0.011
 shouldn+0.011
elling+0.010
idents+0.010
 hurts+0.010
Neg logit contributions
 customer-0.012
 detective-0.011
 buyer-0.011
 cam-0.010
 ha-0.010
TokenHer
Token ID9360
Feature activation+7.21e-02

Pos logit contributions
 detective+0.010
 cam+0.009
 chefs+0.009
 cowboy+0.009
 servant+0.009
Neg logit contributions
 shouldn-0.013
 hurts-0.012
|-0.012
elling-0.012
os-0.012
Token dad
Token ID9955
Feature activation+1.84e-02

Pos logit contributions
 detective+0.091
 cam+0.084
 rulers+0.083
 cowboy+0.082
 chefs+0.081
Neg logit contributions
 shouldn-0.134
 hurts-0.130
 cries-0.122
idents-0.121
 doesn-0.118
Token said
Token ID531
Feature activation+1.22e-02

Pos logit contributions
 detective+0.036
 cam+0.035
 customer+0.035
 chefs+0.034
 cowboy+0.034
Neg logit contributions
 shouldn-0.022
 hurts-0.020
|-0.020
elling-0.019
rated-0.019
Token,
Token ID11
Feature activation+1.54e-01

Pos logit contributions
 detective+0.019
 rulers+0.019
 cowboy+0.018
P+0.018
 cam+0.018
Neg logit contributions
 shouldn-0.018
 hurts-0.018
 aback-0.017
os-0.017
 doesn-0.016
Token "
Token ID366
Feature activation+6.98e-02

Pos logit contributions
 detective+0.252
 cam+0.239
 rulers+0.238
 customer+0.234
 chefs+0.230
Neg logit contributions
 shouldn-0.236
 hurts-0.217
idents-0.208
os-0.205
 doesn-0.204
TokenHe
Token ID1544
Feature activation+8.17e-02

Pos logit contributions
 detective+0.097
 cam+0.092
 customer+0.092
 cowboy+0.089
 pirates+0.088
Neg logit contributions
 shouldn-0.122
 hurts-0.118
|-0.116
rated-0.115
elling-0.115
Token
Token ID198
Feature activation-2.05e-02

Pos logit contributions
rated+0.136
elling+0.134
 shouldn+0.133
|+0.127
idents+0.125
Neg logit contributions
 customer-0.135
 ha-0.133
 detective-0.132
 cam-0.131
 pirates-0.128
Token
Token ID198
Feature activation-9.34e-04

Pos logit contributions
rated+0.035
idents+0.034
elling+0.034
 shouldn+0.033
 hurts+0.032
Neg logit contributions
P-0.031
 customer-0.029
 pirates-0.029
 ha-0.029
 cam-0.028
TokenTom
Token ID13787
Feature activation+3.62e-02

Pos logit contributions
 shouldn+0.002
 hurts+0.002
elling+0.002
rated+0.002
|+0.002
Neg logit contributions
 detective-0.001
 customer-0.001
 cam-0.001
 cowboy-0.001
 pirates-0.001
Token and
Token ID290
Feature activation-2.30e-02

Pos logit contributions
 cam+0.047
 ha+0.045
 detective+0.045
 knights+0.043
 customer+0.043
Neg logit contributions
|-0.062
elling-0.061
 shouldn-0.061
idents-0.060
agree-0.060
Token Anna
Token ID11735
Feature activation-3.72e-02

Pos logit contributions
 shouldn+0.035
 hurts+0.034
|+0.033
 aback+0.031
rated+0.031
Neg logit contributions
 detective-0.036
 cam-0.035
 customer-0.035
 rulers-0.034
 chefs-0.034
Token are
Token ID389
Feature activation+4.50e-02

Pos logit contributions
 shouldn+0.059
 hurts+0.056
|+0.055
elling+0.053
rated+0.053
Neg logit contributions
 detective-0.058
 customer-0.056
 cam-0.055
 cowboy-0.054
 chefs-0.054
Token scared
Token ID12008
Feature activation-1.69e-01

Pos logit contributions
 detective+0.077
 cam+0.072
 customer+0.072
 cowboy+0.070
 servant+0.070
Neg logit contributions
 shouldn-0.067
 hurts-0.063
|-0.060
 aback-0.058
elling-0.057
Token.
Token ID13
Feature activation-3.59e-02

Pos logit contributions
 shouldn+0.226
 hurts+0.221
elling+0.207
|+0.207
rated+0.201
Neg logit contributions
 detective-0.309
 cam-0.292
 customer-0.292
 cowboy-0.288
 servant-0.286
Token They
Token ID1119
Feature activation+5.23e-02

Pos logit contributions
 shouldn+0.051
rated+0.048
 hurts+0.048
|+0.048
elling+0.047
Neg logit contributions
 customer-0.061
 detective-0.061
 cam-0.060
 pirates-0.058
 chefs-0.058
Token drop
Token ID4268
Feature activation+4.26e-02

Pos logit contributions
 cam+0.070
 detective+0.067
 ha+0.066
 cowboy+0.064
 shoes+0.061
Neg logit contributions
elling-0.095
os-0.090
|-0.090
opl-0.087
ingle-0.086
Token the
Token ID262
Feature activation-3.42e-02

Pos logit contributions
 detective+0.056
 customer+0.056
 cam+0.055
 chefs+0.055
 pirates+0.055
Neg logit contributions
 shouldn-0.078
rated-0.074
|-0.073
 hurts-0.071
idents-0.070
Token for
Token ID329
Feature activation-2.70e-03

Pos logit contributions
 shouldn+0.080
 hurts+0.078
|+0.077
idents+0.076
elling+0.075
Neg logit contributions
 detective-0.039
 cam-0.037
 servant-0.037
 customer-0.036
 knights-0.036
Token the
Token ID262
Feature activation-2.07e-02

Pos logit contributions
 shouldn+0.004
 hurts+0.003
|+0.003
 aback+0.003
elling+0.003
Neg logit contributions
 detective-0.005
 cam-0.004
 customer-0.004
 chefs-0.004
 cowboy-0.004
Token polite
Token ID23507
Feature activation+8.08e-03

Pos logit contributions
 shouldn+0.035
 hurts+0.033
|+0.033
elling+0.032
rated+0.032
Neg logit contributions
 detective-0.030
 cam-0.029
 customer-0.029
 cowboy-0.028
 chefs-0.028
Token answer
Token ID3280
Feature activation+2.93e-02

Pos logit contributions
 detective+0.012
 cam+0.012
 customer+0.012
 pirates+0.011
 rulers+0.011
Neg logit contributions
 shouldn-0.013
|-0.013
 hurts-0.013
elling-0.012
rated-0.012
Token.
Token ID13
Feature activation-2.94e-02

Pos logit contributions
 detective+0.040
 customer+0.039
 cam+0.038
 pirates+0.037
 rulers+0.037
Neg logit contributions
|-0.051
 shouldn-0.050
 hurts-0.049
rated-0.048
elling-0.048
Token Then
Token ID3244
Feature activation+7.13e-02

Pos logit contributions
rated+0.043
 hurts+0.040
|+0.038
 shouldn+0.038
idents+0.035
Neg logit contributions
 customer-0.048
 cam-0.047
 knights-0.047
 detective-0.047
 pirates-0.047
Token,
Token ID11
Feature activation+1.67e-01

Pos logit contributions
 detective+0.119
 customer+0.115
 cam+0.113
 chefs+0.112
 cowboy+0.111
Neg logit contributions
 shouldn-0.106
 hurts-0.100
 doesn-0.093
 aback-0.092
idents-0.092
Token it
Token ID340
Feature activation-6.65e-02

Pos logit contributions
 detective+0.289
 cam+0.279
 customer+0.278
 pirates+0.272
 cowboy+0.271
Neg logit contributions
 shouldn-0.236
|-0.227
 hurts-0.226
elling-0.221
rated-0.218
Token went
Token ID1816
Feature activation+1.24e-01

Pos logit contributions
 shouldn+0.108
|+0.106
 hurts+0.104
elling+0.103
idents+0.102
Neg logit contributions
 detective-0.100
 cam-0.096
 customer-0.092
 cowboy-0.092
 pirates-0.091
Token off
Token ID572
Feature activation+1.20e-03

Pos logit contributions
 detective+0.232
 chefs+0.230
 cam+0.229
 customer+0.227
 rulers+0.224
Neg logit contributions
 shouldn-0.146
|-0.141
 hurts-0.140
 aback-0.139
elling-0.133
Token to
Token ID284
Feature activation+8.91e-03

Pos logit contributions
 detective+0.002
 customer+0.002
 cam+0.002
 pirates+0.002
 knights+0.001
Neg logit contributions
 shouldn-0.002
elling-0.002
rated-0.002
 hurts-0.002
idents-0.002
Token.
Token ID13
Feature activation-2.52e-02

Pos logit contributions
 detective+0.036
 cam+0.035
 customer+0.035
 cowboy+0.035
 chefs+0.034
Neg logit contributions
 shouldn-0.026
 hurts-0.025
|-0.024
elling-0.024
 aback-0.023
Token She
Token ID1375
Feature activation+5.03e-02

Pos logit contributions
 hurts+0.038
 shouldn+0.037
rated+0.037
elling+0.037
ric+0.036
Neg logit contributions
 detective-0.039
 customer-0.037
 servant-0.036
 cam-0.036
 knights-0.035
Token noticed
Token ID6810
Feature activation+4.58e-02

Pos logit contributions
 detective+0.089
 customer+0.088
 chefs+0.085
 cam+0.085
 rulers+0.085
Neg logit contributions
 shouldn-0.072
|-0.063
 hurts-0.063
os-0.060
 doesn-0.059
Token the
Token ID262
Feature activation-3.49e-02

Pos logit contributions
 detective+0.074
 cam+0.071
 customer+0.071
 cowboy+0.069
 chefs+0.068
Neg logit contributions
 shouldn-0.070
 hurts-0.066
|-0.065
elling-0.064
rated-0.063
Token same
Token ID976
Feature activation+4.44e-02

Pos logit contributions
 shouldn+0.061
 hurts+0.058
|+0.057
elling+0.057
rated+0.056
Neg logit contributions
 detective-0.049
 cam-0.047
 customer-0.046
 cowboy-0.045
 chefs-0.045
Token math
Token ID10688
Feature activation+6.78e-02

Pos logit contributions
 cam+0.061
 detective+0.060
 chefs+0.059
 customer+0.058
 cowboy+0.058
Neg logit contributions
 shouldn-0.074
 hurts-0.073
rated-0.067
 aback-0.067
elling-0.067
Token book
Token ID1492
Feature activation-2.02e-02

Pos logit contributions
 cam+0.094
 detective+0.089
 customer+0.087
 chefs+0.087
 cowboy+0.086
Neg logit contributions
 shouldn-0.117
 hurts-0.115
 doesn-0.107
|-0.104
idents-0.103
Token in
Token ID287
Feature activation-1.36e-03

Pos logit contributions
 shouldn+0.031
 hurts+0.030
|+0.029
elling+0.028
rated+0.028
Neg logit contributions
 detective-0.032
 cam-0.031
 customer-0.031
 chefs-0.030
 cowboy-0.030
Token the
Token ID262
Feature activation-2.67e-02

Pos logit contributions
 shouldn+0.002
 hurts+0.002
|+0.002
elling+0.002
rated+0.002
Neg logit contributions
 detective-0.002
 customer-0.002
 cam-0.002
 pirates-0.002
 rulers-0.002
Token same
Token ID976
Feature activation+4.61e-02

Pos logit contributions
 shouldn+0.045
 hurts+0.043
|+0.042
elling+0.042
rated+0.041
Neg logit contributions
 detective-0.039
 cowboy-0.037
 cam-0.037
 customer-0.036
 chefs-0.036
Token place
Token ID1295
Feature activation+7.23e-02

Pos logit contributions
 detective+0.073
 cowboy+0.073
 chefs+0.072
 cam+0.072
ider+0.070
Neg logit contributions
 shouldn-0.065
 aback-0.063
|-0.061
 hurts-0.061
'.-0.060
Token are
Token ID389
Feature activation+2.79e-02

Pos logit contributions
elling+0.021
anne+0.021
 fault+0.021
idents+0.020
 hurts+0.020
Neg logit contributions
 detective-0.017
 pirates-0.017
 cam-0.017
 cowboy-0.016
 pirate-0.016
Token his
Token ID465
Feature activation+2.11e-02

Pos logit contributions
 detective+0.039
 cam+0.037
 customer+0.037
 ha+0.035
 servant+0.035
Neg logit contributions
 shouldn-0.048
 hurts-0.046
|-0.044
elling-0.043
 aback-0.042
Token family
Token ID1641
Feature activation-5.56e-02

Pos logit contributions
 detective+0.023
 cam+0.022
 customer+0.022
 chefs+0.021
 cowboy+0.021
Neg logit contributions
 shouldn-0.042
 hurts-0.041
|-0.040
idents-0.039
elling-0.039
Token.
Token ID13
Feature activation-3.16e-02

Pos logit contributions
 shouldn+0.077
 hurts+0.072
|+0.072
elling+0.068
os+0.068
Neg logit contributions
 detective-0.098
 cam-0.094
 customer-0.092
 cowboy-0.092
 chefs-0.090
Token
Token ID198
Feature activation+3.24e-03

Pos logit contributions
 shouldn+0.037
 hurts+0.035
rated+0.035
elling+0.034
|+0.034
Neg logit contributions
 detective-0.062
 cam-0.061
 customer-0.060
 cowboy-0.059
 chefs-0.058
Token
Token ID198
Feature activation+1.69e-02

Pos logit contributions
 cam+0.004
 pirates+0.004
 pirate+0.004
P+0.004
 customer+0.004
Neg logit contributions
rated-0.006
idents-0.006
elling-0.006
 shouldn-0.006
 aback-0.006
TokenThey
Token ID2990
Feature activation+8.30e-02

Pos logit contributions
 detective+0.022
 cam+0.020
 chefs+0.020
 customer+0.020
 servant+0.020
Neg logit contributions
 shouldn-0.031
|-0.030
 hurts-0.030
elling-0.029
rated-0.028
Token go
Token ID467
Feature activation+1.37e-01

Pos logit contributions
 detective+0.120
 cam+0.118
 customer+0.113
 cowboy+0.111
 servant+0.110
Neg logit contributions
 shouldn-0.139
 hurts-0.135
elling-0.134
rated-0.131
|-0.129
Token inside
Token ID2641
Feature activation+8.96e-03

Pos logit contributions
 detective+0.276
 cowboy+0.268
 customer+0.265
 chefs+0.264
 rulers+0.263
Neg logit contributions
|-0.151
 hurts-0.135
os-0.134
 shouldn-0.130
 aback-0.127
Token the
Token ID262
Feature activation-2.24e-02

Pos logit contributions
 detective+0.013
 customer+0.013
 cam+0.013
 pirates+0.013
 chefs+0.013
Neg logit contributions
 shouldn-0.015
 hurts-0.014
elling-0.014
|-0.014
rated-0.014
Token house
Token ID2156
Feature activation+6.47e-02

Pos logit contributions
 shouldn+0.040
 hurts+0.038
|+0.038
elling+0.037
idents+0.036
Neg logit contributions
 detective-0.031
 customer-0.029
 cam-0.029
 cowboy-0.028
 rulers-0.028
Token he
Token ID339
Feature activation+5.07e-02

Pos logit contributions
 shouldn+0.108
 hurts+0.102
|+0.096
 doesn+0.096
rated+0.095
Neg logit contributions
 detective-0.109
 cam-0.108
 customer-0.103
 servant-0.102
 rulers-0.101
Token didn
Token ID1422
Feature activation-1.26e-01

Pos logit contributions
 detective+0.072
 customer+0.070
 cowboy+0.070
 ha+0.070
 cam+0.070
Neg logit contributions
 aback-0.087
elling-0.087
upid-0.081
idents-0.081
|-0.081
Token't
Token ID470
Feature activation-1.19e-01

Pos logit contributions
rated+0.230
 aback+0.204
elling+0.199
idents+0.192
 disl+0.187
Neg logit contributions
 customer-0.175
 ha-0.164
 soccer-0.163
 rulers-0.160
 pirates-0.159
Token know
Token ID760
Feature activation-5.17e-02

Pos logit contributions
elling+0.177
 shouldn+0.174
idents+0.173
 hurts+0.166
 aback+0.165
Neg logit contributions
 customer-0.191
 detective-0.188
 cam-0.187
 cowboy-0.180
 rulers-0.177
Token how
Token ID703
Feature activation-1.91e-02

Pos logit contributions
 shouldn+0.074
 hurts+0.069
elling+0.065
os+0.065
 doesn+0.064
Neg logit contributions
 detective-0.091
 customer-0.087
 cam-0.085
 cowboy-0.084
 buyer-0.084
Token to
Token ID284
Feature activation-4.01e-02

Pos logit contributions
|+0.033
 shouldn+0.032
 aback+0.031
elling+0.031
rated+0.031
Neg logit contributions
 chefs-0.026
 cam-0.026
 detective-0.026
 pirates-0.026
 customer-0.026
Token make
Token ID787
Feature activation+4.84e-03

Pos logit contributions
 shouldn+0.070
 hurts+0.067
|+0.066
elling+0.064
rated+0.064
Neg logit contributions
 detective-0.057
 cam-0.053
 customer-0.052
 cowboy-0.052
 servant-0.051
Token a
Token ID257
Feature activation+4.28e-02

Pos logit contributions
 detective+0.007
 customer+0.007
 cam+0.007
 chefs+0.007
 pirates+0.007
Neg logit contributions
 shouldn-0.008
|-0.008
 hurts-0.007
rated-0.007
elling-0.007
Token cake
Token ID12187
Feature activation+5.58e-02

Pos logit contributions
 detective+0.063
 cam+0.060
 customer+0.059
 cowboy+0.057
 chefs+0.057
Neg logit contributions
 shouldn-0.072
 hurts-0.069
|-0.067
elling-0.066
rated-0.065
Token.
Token ID13
Feature activation-3.95e-02

Pos logit contributions
 detective+0.095
 cam+0.093
 customer+0.090
 cowboy+0.088
 ha+0.088
Neg logit contributions
 shouldn-0.079
 hurts-0.076
idents-0.071
rated-0.071
|-0.071
Token One
Token ID1881
Feature activation+1.18e-01

Pos logit contributions
 hurts+0.059
 shouldn+0.058
rated+0.057
elling+0.056
|+0.055
Neg logit contributions
 customer-0.063
 detective-0.062
 knights-0.062
 pirates-0.062
 chefs-0.061
TokenWhat
Token ID2061
Feature activation-1.16e-02

Pos logit contributions
 detective+0.106
 cam+0.100
 customer+0.099
 servant+0.098
 rulers+0.097
Neg logit contributions
 shouldn-0.113
 hurts-0.105
|-0.105
elling-0.104
 aback-0.103
Token's
Token ID338
Feature activation+3.02e-02

Pos logit contributions
elling+0.015
|+0.015
rated+0.015
opl+0.014
 aback+0.014
Neg logit contributions
 chefs-0.021
 detective-0.021
 pirates-0.021
 knights-0.020
 cam-0.020
Token wrong
Token ID2642
Feature activation-4.48e-02

Pos logit contributions
 detective+0.051
 cam+0.049
 customer+0.049
 cowboy+0.048
 chefs+0.048
Neg logit contributions
 shouldn-0.044
 hurts-0.042
|-0.042
elling-0.040
rated-0.040
Token,
Token ID11
Feature activation+1.60e-01

Pos logit contributions
 shouldn+0.063
 hurts+0.060
rated+0.057
elling+0.056
idents+0.055
Neg logit contributions
 detective-0.077
 cam-0.073
 rulers-0.072
 customer-0.072
 cowboy-0.071
Token honey
Token ID12498
Feature activation+4.52e-02

Pos logit contributions
 detective+0.251
 cam+0.245
 customer+0.242
 chefs+0.237
 cowboy+0.234
Neg logit contributions
 shouldn-0.246
|-0.242
 hurts-0.239
rated-0.229
elling-0.229
Token?"
Token ID1701
Feature activation-1.02e-01

Pos logit contributions
 detective+0.089
 cam+0.087
 customer+0.086
 chefs+0.084
 cowboy+0.084
Neg logit contributions
 shouldn-0.052
|-0.049
 hurts-0.048
elling-0.047
rated-0.046
Token
Token ID220
Feature activation-5.66e-02

Pos logit contributions
 shouldn+0.131
|+0.130
 hurts+0.127
rated+0.125
elling+0.122
Neg logit contributions
 detective-0.186
 cam-0.184
 customer-0.183
 chefs-0.179
 cowboy-0.176
Token
Token ID198
Feature activation-3.20e-03

Pos logit contributions
|+0.103
rated+0.101
elling+0.099
 hurts+0.095
 shouldn+0.091
Neg logit contributions
 cam-0.072
 customer-0.072
 chefs-0.072
 pirates-0.067
 buyer-0.064
Token
Token ID198
Feature activation+8.63e-03

Pos logit contributions
 shouldn+0.004
 hurts+0.004
|+0.003
elling+0.003
rated+0.003
Neg logit contributions
 detective-0.006
 cam-0.006
 customer-0.006
 cowboy-0.006
 chefs-0.006
TokenP
Token ID47
Feature activation+1.47e-01

Pos logit contributions
 detective+0.012
 servant+0.011
 cam+0.011
 rulers+0.011
 customer+0.011
Neg logit contributions
 shouldn-0.015
 hurts-0.014
|-0.014
elling-0.014
'.-0.014
Tokenapa
Token ID32678
Feature activation+3.62e-02

Pos logit contributions
 cam+0.232
 customer+0.229
 detective+0.222
 ha+0.220
 rulers+0.217
Neg logit contributions
|-0.225
 shouldn-0.219
 hurts-0.217
rated-0.216
elling-0.200
Token to
Token ID284
Feature activation-2.34e-02

Pos logit contributions
 shouldn+0.045
 hurts+0.043
|+0.042
elling+0.040
rated+0.040
Neg logit contributions
 detective-0.061
 customer-0.059
 cam-0.059
 cowboy-0.057
 rulers-0.057
Token explore
Token ID7301
Feature activation+1.32e-02

Pos logit contributions
 shouldn+0.035
 hurts+0.035
|+0.034
rated+0.033
elling+0.032
Neg logit contributions
 detective-0.037
 customer-0.035
 cam-0.035
 chefs-0.035
 cowboy-0.034
Token.
Token ID13
Feature activation-4.38e-02

Pos logit contributions
 detective+0.024
 customer+0.024
 cam+0.023
 cowboy+0.023
 servant+0.023
Neg logit contributions
 shouldn-0.017
 hurts-0.016
|-0.016
elling-0.015
rated-0.015
Token He
Token ID679
Feature activation+6.75e-02

Pos logit contributions
 shouldn+0.065
 hurts+0.063
rated+0.062
elling+0.061
idents+0.061
Neg logit contributions
 detective-0.072
 customer-0.072
 cam-0.070
 pirates-0.069
 knights-0.068
Token was
Token ID373
Feature activation+1.82e-02

Pos logit contributions
 detective+0.120
 customer+0.117
 cam+0.114
 chefs+0.113
 cowboy+0.112
Neg logit contributions
 shouldn-0.094
|-0.088
 hurts-0.086
rated-0.082
os-0.081
Token always
Token ID1464
Feature activation+5.72e-03

Pos logit contributions
 detective+0.026
 customer+0.025
 cam+0.024
 cowboy+0.024
 chefs+0.024
Neg logit contributions
 shouldn-0.031
 hurts-0.029
|-0.029
 aback-0.027
rated-0.027
Token excited
Token ID6568
Feature activation-6.95e-02

Pos logit contributions
 detective+0.008
 customer+0.008
 cowboy+0.008
 cam+0.008
 rulers+0.007
Neg logit contributions
 shouldn-0.010
 hurts-0.009
 aback-0.009
rated-0.009
|-0.009
Token to
Token ID284
Feature activation-2.09e-02

Pos logit contributions
 shouldn+0.100
 hurts+0.095
|+0.094
elling+0.092
rated+0.091
Neg logit contributions
 detective-0.119
 cam-0.114
 customer-0.113
 cowboy-0.110
 chefs-0.110
Token learn
Token ID2193
Feature activation-1.81e-02

Pos logit contributions
 shouldn+0.028
 hurts+0.026
|+0.026
rated+0.025
upid+0.024
Neg logit contributions
 detective-0.038
 customer-0.036
 chefs-0.036
 cam-0.036
 cowboy-0.035
Token new
Token ID649
Feature activation-3.52e-02

Pos logit contributions
 shouldn+0.030
|+0.029
 hurts+0.029
elling+0.028
rated+0.028
Neg logit contributions
 detective-0.026
 cam-0.025
 customer-0.025
 pirates-0.024
 cowboy-0.024
Token things
Token ID1243
Feature activation-4.22e-02

Pos logit contributions
 shouldn+0.064
 hurts+0.061
|+0.060
elling+0.059
rated+0.058
Neg logit contributions
 detective-0.047
 cam-0.045
 customer-0.045
 cowboy-0.043
 chefs-0.043
TokenNow
Token ID3844
Feature activation-3.20e-02

Pos logit contributions
 detective+0.106
 cam+0.103
 customer+0.101
 chefs+0.098
 cowboy+0.098
Neg logit contributions
 shouldn-0.104
|-0.099
 hurts-0.099
elling-0.096
rated-0.095
Token you
Token ID345
Feature activation-3.55e-02

Pos logit contributions
 shouldn+0.042
 doesn+0.040
 hurts+0.040
idents+0.038
 isn+0.035
Neg logit contributions
 detective-0.059
 customer-0.058
 chefs-0.056
 cam-0.056
 buyer-0.055
Token'll
Token ID1183
Feature activation-4.73e-02

Pos logit contributions
 hurts+0.048
opl+0.047
elling+0.047
rated+0.046
 shouldn+0.045
Neg logit contributions
 detective-0.060
 pirates-0.058
 rulers-0.058
 cam-0.057
 ha-0.056
Token be
Token ID307
Feature activation+7.42e-02

Pos logit contributions
 shouldn+0.074
 hurts+0.072
elling+0.072
rated+0.071
|+0.070
Neg logit contributions
 detective-0.073
 cam-0.070
 customer-0.069
 ha-0.068
 rulers-0.067
Token successful
Token ID4388
Feature activation-5.60e-02

Pos logit contributions
 cam+0.114
 detective+0.113
 customer+0.107
 servant+0.104
 rulers+0.102
Neg logit contributions
 shouldn-0.113
 aback-0.110
 hurts-0.104
|-0.101
elling-0.101
Token!"
Token ID2474
Feature activation-8.75e-02

Pos logit contributions
 shouldn+0.077
 hurts+0.073
|+0.070
 aback+0.069
elling+0.069
Neg logit contributions
 detective-0.098
 cam-0.095
 customer-0.094
 cowboy-0.092
 rulers-0.091
Token he
Token ID339
Feature activation+6.25e-02

Pos logit contributions
rated+0.189
ric+0.183
upid+0.179
elling+0.177
 fault+0.176
Neg logit contributions
 cam-0.083
 pirates-0.082
 detective-0.082
 shoes-0.081
 pirate-0.080
Token said
Token ID531
Feature activation-6.37e-04

Pos logit contributions
 detective+0.093
 cowboy+0.089
 customer+0.087
 cam+0.087
 pirate+0.087
Neg logit contributions
elling-0.101
opl-0.100
|-0.099
ric-0.099
upid-0.098
Token.
Token ID13
Feature activation-3.44e-02

Pos logit contributions
 shouldn+0.001
 hurts+0.001
|+0.001
 aback+0.001
rated+0.001
Neg logit contributions
 detective-0.001
 cam-0.001
 cowboy-0.001
 chefs-0.001
 customer-0.001
Token And
Token ID843
Feature activation+4.70e-02

Pos logit contributions
rated+0.049
 shouldn+0.049
 hurts+0.048
elling+0.047
|+0.046
Neg logit contributions
 customer-0.056
 detective-0.055
 chefs-0.054
 cam-0.053
 servant-0.053
Token sure
Token ID1654
Feature activation-5.33e-02

Pos logit contributions
 detective+0.084
 cam+0.082
 customer+0.081
 cowboy+0.080
 rulers+0.080
Neg logit contributions
 shouldn-0.065
|-0.058
 hurts-0.057
 aback-0.055
idents-0.054
Token.
Token ID13
Feature activation-3.92e-02

Pos logit contributions
 shouldn+0.125
elling+0.124
 hurts+0.121
|+0.120
idents+0.117
Neg logit contributions
 detective-0.101
 customer-0.097
 cam-0.096
 pirates-0.092
 rulers-0.092
Token He
Token ID679
Feature activation+6.26e-02

Pos logit contributions
 shouldn+0.050
 hurts+0.049
|+0.047
rated+0.046
elling+0.045
Neg logit contributions
 detective-0.073
 customer-0.072
 cam-0.070
 pirates-0.069
 servant-0.069
Token raced
Token ID32897
Feature activation+1.07e-01

Pos logit contributions
 detective+0.102
 cam+0.097
 customer+0.097
 chefs+0.095
 rulers+0.094
Neg logit contributions
 shouldn-0.096
 hurts-0.089
|-0.089
rated-0.086
elling-0.085
Token so
Token ID523
Feature activation-4.44e-02

Pos logit contributions
 detective+0.201
 cam+0.194
 customer+0.193
 cowboy+0.188
 chefs+0.188
Neg logit contributions
 shouldn-0.137
 hurts-0.129
|-0.127
elling-0.123
rated-0.122
Token fast
Token ID3049
Feature activation-6.59e-02

Pos logit contributions
 shouldn+0.077
 hurts+0.073
|+0.072
 aback+0.069
rated+0.068
Neg logit contributions
 detective-0.065
 chefs-0.062
 cam-0.061
 customer-0.060
 cowboy-0.058
Token that
Token ID326
Feature activation-4.55e-02

Pos logit contributions
 shouldn+0.102
 hurts+0.097
|+0.093
 aback+0.093
elling+0.092
Neg logit contributions
 detective-0.104
 customer-0.101
 cam-0.100
 chefs-0.098
 rulers-0.097
Token no
Token ID645
Feature activation+3.93e-02

Pos logit contributions
 shouldn+0.064
 hurts+0.060
|+0.056
 doesn+0.056
rated+0.055
Neg logit contributions
 detective-0.079
 cam-0.075
 customer-0.075
 chefs-0.074
 servant-0.073
Token-
Token ID12
Feature activation+1.35e-02

Pos logit contributions
 detective+0.063
 cam+0.062
 chefs+0.062
 mer+0.061
P+0.060
Neg logit contributions
 shouldn-0.055
rated-0.053
 hurts-0.053
elling-0.050
 doesn-0.049
Tokenone
Token ID505
Feature activation+4.91e-02

Pos logit contributions
 customer+0.021
 detective+0.021
 ha+0.020
 cowboy+0.020
 rulers+0.020
Neg logit contributions
|-0.021
idents-0.020
 hurts-0.020
 shouldn-0.020
rated-0.019
Token could
Token ID714
Feature activation-9.07e-03

Pos logit contributions
 detective+0.077
 cam+0.075
 customer+0.074
 ha+0.074
 pirates+0.074
Neg logit contributions
idents-0.073
elling-0.073
|-0.072
rated-0.069
 shouldn-0.069
Token catch
Token ID4929
Feature activation+1.19e-03

Pos logit contributions
 shouldn+0.015
 hurts+0.014
|+0.014
elling+0.014
rated+0.014
Neg logit contributions
 detective-0.014
 cam-0.013
 customer-0.013
 cowboy-0.013
 rulers-0.012
TokenM
Token ID44
Feature activation+8.34e-02

Pos logit contributions
 detective+0.015
 cam+0.014
 customer+0.013
 cowboy+0.013
 chefs+0.013
Neg logit contributions
 shouldn-0.027
 hurts-0.026
|-0.026
rated-0.025
os-0.025
Tokenia
Token ID544
Feature activation+2.71e-02

Pos logit contributions
 detective+0.146
 cowboy+0.139
 cam+0.139
 chefs+0.136
 customer+0.135
Neg logit contributions
 shouldn-0.115
 hurts-0.114
elling-0.112
os-0.107
upid-0.106
Token was
Token ID373
Feature activation+2.81e-02

Pos logit contributions
 detective+0.044
 cam+0.042
 customer+0.042
 pirates+0.041
 cowboy+0.041
Neg logit contributions
 shouldn-0.041
|-0.039
 hurts-0.039
elling-0.038
rated-0.038
Token sad
Token ID6507
Feature activation-1.59e-01

Pos logit contributions
 detective+0.046
 cam+0.044
 customer+0.044
 pirates+0.043
 chefs+0.043
Neg logit contributions
 shouldn-0.042
|-0.040
 hurts-0.040
elling-0.039
rated-0.039
Token and
Token ID290
Feature activation-8.14e-02

Pos logit contributions
 hurts+0.188
 shouldn+0.181
idents+0.167
 cries+0.167
 doesn+0.165
Neg logit contributions
 detective-0.309
 cowboy-0.304
 customer-0.304
 buyer-0.303
 rulers-0.298
Token angry
Token ID7954
Feature activation-1.39e-01

Pos logit contributions
 shouldn+0.118
|+0.114
 hurts+0.111
elling+0.111
rated+0.109
Neg logit contributions
 detective-0.136
 cam-0.131
 customer-0.128
 pirates-0.127
 cowboy-0.126
Token.
Token ID13
Feature activation-2.75e-02

Pos logit contributions
 shouldn+0.160
 hurts+0.158
idents+0.143
 doesn+0.143
elling+0.141
Neg logit contributions
 detective-0.271
 customer-0.268
 cam-0.265
 buyer-0.265
 cowboy-0.262
Token She
Token ID1375
Feature activation+4.08e-02

Pos logit contributions
 shouldn+0.038
|+0.036
 hurts+0.036
rated+0.036
elling+0.035
Neg logit contributions
 detective-0.048
 cam-0.046
 customer-0.046
 chefs-0.045
 pirates-0.045
Token ran
Token ID4966
Feature activation+1.11e-01

Pos logit contributions
 detective+0.061
 cam+0.058
 customer+0.057
 cowboy+0.056
 pirates+0.055
Neg logit contributions
 shouldn-0.066
|-0.064
 hurts-0.064
elling-0.063
rated-0.062
Token after
Token ID706
Feature activation+1.31e-03

Pos logit contributions
 detective+0.197
 cam+0.190
 customer+0.189
 cowboy+0.184
 chefs+0.183
Neg logit contributions
 shouldn-0.152
 hurts-0.144
|-0.142
elling-0.138
rated-0.137
Token Tom
Token ID4186
Feature activation-1.07e-02

Pos logit contributions
 detective+0.002
 cam+0.002
 customer+0.002
 cowboy+0.002
 rulers+0.002
Neg logit contributions
 shouldn-0.002
 hurts-0.002
|-0.002
elling-0.002
idents-0.001
Token but
Token ID475
Feature activation-6.35e-02

Pos logit contributions
 detective+0.150
 cam+0.141
 customer+0.140
 cowboy+0.135
 chefs+0.134
Neg logit contributions
 shouldn-0.269
 hurts-0.260
|-0.259
elling-0.255
rated-0.252
Token Tim
Token ID5045
Feature activation-3.64e-02

Pos logit contributions
 shouldn+0.089
 hurts+0.085
|+0.084
elling+0.082
rated+0.081
Neg logit contributions
 detective-0.111
 cam-0.106
 customer-0.106
 cowboy-0.103
 chefs-0.103
Tokenmy
Token ID1820
Feature activation-3.66e-02

Pos logit contributions
elling+0.044
|+0.044
 shouldn+0.041
idents+0.041
 disl+0.039
Neg logit contributions
 detective-0.066
 ha-0.064
 customer-0.064
 cowboy-0.063
 servant-0.062
Token still
Token ID991
Feature activation-6.66e-02

Pos logit contributions
elling+0.048
|+0.048
 shouldn+0.046
 hurts+0.045
rated+0.044
Neg logit contributions
 detective-0.065
 cam-0.062
 cowboy-0.062
 customer-0.062
 ha-0.061
Token didn
Token ID1422
Feature activation-1.40e-01

Pos logit contributions
elling+0.119
|+0.118
 aback+0.108
neath+0.106
opl+0.105
Neg logit contributions
 detective-0.087
 cowboy-0.085
 cam-0.084
 neck-0.081
 pirates-0.080
Token't
Token ID470
Feature activation-1.30e-01

Pos logit contributions
rated+0.201
elling+0.192
 aback+0.182
 disl+0.177
anne+0.177
Neg logit contributions
 ha-0.226
 customer-0.220
 cowboy-0.217
 detective-0.210
 soccer-0.210
Token want
Token ID765
Feature activation-7.11e-02

Pos logit contributions
 shouldn+0.187
 hurts+0.181
|+0.174
rated+0.171
 aback+0.165
Neg logit contributions
 detective-0.220
 cam-0.212
 customer-0.210
 chefs-0.206
 rulers-0.206
Token to
Token ID284
Feature activation-5.42e-02

Pos logit contributions
 shouldn+0.108
rated+0.107
|+0.107
 hurts+0.103
elling+0.102
Neg logit contributions
 detective-0.111
 customer-0.107
 cam-0.107
 pirates-0.106
 chefs-0.105
Token.
Token ID13
Feature activation-4.69e-02

Pos logit contributions
 shouldn+0.086
 hurts+0.084
|+0.081
rated+0.078
idents+0.078
Neg logit contributions
 detective-0.086
 customer-0.081
 cam-0.081
 chefs-0.079
 rulers-0.079
Token Instead
Token ID5455
Feature activation+3.94e-02

Pos logit contributions
|+0.073
 hurts+0.071
elling+0.069
rated+0.069
ingle+0.068
Neg logit contributions
 detective-0.072
 customer-0.069
 cowboy-0.069
 pirates-0.068
 chefs-0.068
Token,
Token ID11
Feature activation+1.49e-01

Pos logit contributions
 detective+0.054
 cam+0.050
 chefs+0.050
 customer+0.050
 servant+0.048
Neg logit contributions
 shouldn-0.071
 hurts-0.070
os-0.064
 doesn-0.064
|-0.064
Token smiles
Token ID21845
Feature activation-1.01e-01

Pos logit contributions
 detective+0.008
 cam+0.007
 customer+0.007
 cowboy+0.007
 chefs+0.007
Neg logit contributions
 shouldn-0.008
 hurts-0.007
|-0.007
elling-0.007
rated-0.007
Token on
Token ID319
Feature activation-3.18e-02

Pos logit contributions
 shouldn+0.155
 hurts+0.144
rated+0.141
os+0.141
|+0.140
Neg logit contributions
 detective-0.159
 cam-0.154
 customer-0.150
 chefs-0.148
 rulers-0.146
Token their
Token ID511
Feature activation+4.50e-02

Pos logit contributions
 shouldn+0.052
 hurts+0.050
|+0.049
elling+0.048
rated+0.048
Neg logit contributions
 detective-0.048
 cam-0.046
 customer-0.046
 cowboy-0.044
 chefs-0.044
Token faces
Token ID6698
Feature activation+9.56e-03

Pos logit contributions
 detective+0.055
 cam+0.052
 customer+0.051
 cowboy+0.050
 pirates+0.050
Neg logit contributions
 shouldn-0.087
|-0.082
elling-0.082
 hurts-0.081
 aback-0.081
Token.
Token ID13
Feature activation-3.01e-02

Pos logit contributions
 detective+0.018
 cam+0.017
 chefs+0.017
 customer+0.017
 cowboy+0.017
Neg logit contributions
 shouldn-0.012
 hurts-0.011
os-0.011
|-0.011
elling-0.011
Token<|endoftext|>
Token ID50256
Feature activation-1.81e-01

Pos logit contributions
rated+0.058
 hurts+0.054
elling+0.053
 shouldn+0.050
idents+0.050
Neg logit contributions
 customer-0.038
 detective-0.035
 cam-0.034
 ha-0.033
 buyer-0.033
TokenOnce
Token ID7454
Feature activation+2.44e-02

Pos logit contributions
idents+0.288
rated+0.286
elling+0.277
 shouldn+0.272
 disl+0.269
Neg logit contributions
 pirates-0.276
 detective-0.272
 customer-0.271
 cam-0.270
 chefs-0.267
Token upon
Token ID2402
Feature activation-1.42e-01

Pos logit contributions
 customer+0.029
 detective+0.029
 cam+0.027
 pirates+0.027
 servant+0.027
Neg logit contributions
elling-0.044
 shouldn-0.042
rated-0.042
 aback-0.042
 hurts-0.041
Token a
Token ID257
Feature activation+1.97e-02

Pos logit contributions
 shouldn+0.276
elling+0.274
 hurts+0.269
rated+0.267
idents+0.263
Neg logit contributions
 detective-0.165
 pirates-0.160
 customer-0.155
 cam-0.153
 chefs-0.153
Token time
Token ID640
Feature activation-1.62e-02

Pos logit contributions
 detective+0.035
 cam+0.033
 chefs+0.033
 servant+0.033
 customer+0.033
Neg logit contributions
 shouldn-0.027
|-0.027
 hurts-0.026
rated-0.025
idents-0.024
Token,
Token ID11
Feature activation+1.27e-01

Pos logit contributions
 shouldn+0.033
 hurts+0.032
|+0.032
elling+0.031
idents+0.030
Neg logit contributions
 detective-0.018
 customer-0.017
 cam-0.017
 cowboy-0.016
 chefs-0.016
Token disappointed
Token ID11679
Feature activation-1.43e-01

Pos logit contributions
 detective+0.027
 customer+0.026
 cam+0.026
 pirates+0.025
 cowboy+0.025
Neg logit contributions
 shouldn-0.029
 hurts-0.028
|-0.028
elling-0.028
rated-0.027
Token but
Token ID475
Feature activation-7.80e-02

Pos logit contributions
 shouldn+0.240
 hurts+0.232
|+0.219
elling+0.218
idents+0.217
Neg logit contributions
 detective-0.213
 cam-0.203
 customer-0.201
 cowboy-0.199
 chefs-0.198
Token still
Token ID991
Feature activation-9.12e-02

Pos logit contributions
 shouldn+0.132
 hurts+0.119
|+0.114
 aback+0.112
rated+0.112
Neg logit contributions
 detective-0.120
 cowboy-0.115
 cam-0.114
 customer-0.113
 chefs-0.112
Token imagined
Token ID15758
Feature activation+6.20e-02

Pos logit contributions
|+0.149
elling+0.146
 shouldn+0.143
 hurts+0.142
idents+0.139
Neg logit contributions
 detective-0.136
 customer-0.126
 cam-0.126
 pirates-0.123
 servant-0.122
Token what
Token ID644
Feature activation-4.13e-02

Pos logit contributions
 detective+0.091
 cam+0.087
 customer+0.086
 pirates+0.084
 rulers+0.084
Neg logit contributions
 shouldn-0.103
 hurts-0.098
|-0.098
elling-0.096
rated-0.096
Token it
Token ID340
Feature activation-1.33e-01

Pos logit contributions
 shouldn+0.071
 hurts+0.067
|+0.066
elling+0.064
rated+0.064
Neg logit contributions
 detective-0.060
 cam-0.057
 customer-0.056
 cowboy-0.055
 chefs-0.055
Token would
Token ID561
Feature activation-2.19e-02

Pos logit contributions
upid+0.219
elling+0.218
opl+0.205
ric+0.203
|+0.202
Neg logit contributions
 detective-0.202
 cam-0.183
 pirates-0.177
 mer-0.177
 princess-0.176
Token have
Token ID423
Feature activation-4.89e-02

Pos logit contributions
elling+0.038
 shouldn+0.037
|+0.036
 hurts+0.036
os+0.036
Neg logit contributions
 detective-0.031
 customer-0.029
 pirates-0.028
 cam-0.028
 rulers-0.027
Token been
Token ID587
Feature activation+3.32e-02

Pos logit contributions
 shouldn+0.078
|+0.075
 hurts+0.075
elling+0.074
rated+0.073
Neg logit contributions
 detective-0.075
 cam-0.071
 customer-0.070
 pirates-0.070
 chefs-0.069
Token like
Token ID588
Feature activation-5.92e-02

Pos logit contributions
 detective+0.050
 cam+0.048
 customer+0.047
 pirates+0.046
 chefs+0.046
Neg logit contributions
 shouldn-0.054
 hurts-0.052
|-0.052
elling-0.051
rated-0.050
Token if
Token ID611
Feature activation-4.83e-02

Pos logit contributions
 shouldn+0.090
 hurts+0.085
|+0.083
elling+0.081
rated+0.080
Neg logit contributions
 detective-0.095
 cam-0.091
 customer-0.091
 cowboy-0.089
 chefs-0.088
Token Tim
Token ID5045
Feature activation-2.85e-02

Pos logit contributions
 shouldn+0.062
 hurts+0.058
|+0.057
elling+0.056
rated+0.055
Neg logit contributions
 detective-0.097
 cam-0.094
 customer-0.094
 chefs-0.091
 cowboy-0.091
Tokenmy
Token ID1820
Feature activation-2.31e-02

Pos logit contributions
idents+0.042
upid+0.039
elling+0.039
 aback+0.039
 shouldn+0.037
Neg logit contributions
 pirates-0.045
 chefs-0.044
 detective-0.044
 dressing-0.043
 pirate-0.043
Token loved
Token ID6151
Feature activation-3.85e-02

Pos logit contributions
elling+0.039
 shouldn+0.039
 aback+0.039
idents+0.038
rated+0.038
Neg logit contributions
 detective-0.032
 cam-0.031
 pirates-0.030
 customer-0.030
 pirate-0.029
Token to
Token ID284
Feature activation-2.23e-02

Pos logit contributions
 shouldn+0.044
 hurts+0.043
|+0.040
elling+0.039
idents+0.039
Neg logit contributions
 detective-0.075
 customer-0.070
 rulers-0.070
 cam-0.070
 buyer-0.069
Token wander
Token ID27776
Feature activation+9.54e-02

Pos logit contributions
 hurts+0.028
 shouldn+0.027
|+0.027
os+0.025
idents+0.024
Neg logit contributions
 detective-0.040
 cam-0.039
 customer-0.038
 rulers-0.038
 buyer-0.038
Token around
Token ID1088
Feature activation-1.42e-01

Pos logit contributions
 detective+0.192
 customer+0.185
 cam+0.184
 cowboy+0.180
 chefs+0.180
Neg logit contributions
 shouldn-0.109
 hurts-0.103
|-0.102
elling-0.098
rated-0.096
Token his
Token ID465
Feature activation+3.79e-02

Pos logit contributions
|+0.164
 shouldn+0.161
 hurts+0.155
 aback+0.141
os+0.140
Neg logit contributions
 detective-0.268
 customer-0.257
 servant-0.254
 rulers-0.252
 cam-0.252
Token neighborhood
Token ID6232
Feature activation+3.23e-02

Pos logit contributions
 detective+0.061
 customer+0.059
 rulers+0.058
 cowboy+0.058
 cam+0.057
Neg logit contributions
 hurts-0.054
 shouldn-0.054
idents-0.052
|-0.052
elling-0.049
Token and
Token ID290
Feature activation-7.21e-02

Pos logit contributions
 detective+0.074
 cowboy+0.073
P+0.071
 ha+0.070
 customer+0.069
Neg logit contributions
os-0.024
 up-0.023
 The-0.021
 around-0.021
 Every-0.021
Token see
Token ID766
Feature activation+3.31e-02

Pos logit contributions
 shouldn+0.097
|+0.093
 hurts+0.091
rated+0.085
idents+0.084
Neg logit contributions
 detective-0.128
 customer-0.124
 chefs-0.122
 cam-0.121
 rulers-0.121
Token all
Token ID477
Feature activation-1.07e-01

Pos logit contributions
 detective+0.061
ider+0.060
 customer+0.059
 ha+0.057
 servant+0.056
Neg logit contributions
 shouldn-0.043
elling-0.038
 hurts-0.038
 aback-0.037
 Whenever-0.036