TOP ACTIVATIONS
MAX = 8.030

. Chloe took his ho of and they started to walk
they played with the boun cy ball together and had lots
he could help the boun cy thing have some fun .
he had used the squee gee to good use ! <|endoftext|>
take turns reading the emer ald green book to each other
He got on the boun cy castle , closed his eyes
With a to othy grin , Kenny led the
I am taking the rh in oc eros to a new
sweet to share the boun cy castle with her friends .
So Tom jumped onto the ear and it was so much
so he raised his ho of . But then he saw
Let 's play tug - of - war ," said the
clay . He took the biggest piece and said , "
finding ! He showed the three gems to his friends ,
The unicorn took the rh in oc eros to a nearby
. He used the squee gee and lots of water to
night and took the rh in oc eros home with her
. He played tug - of - war and ran in
Jack thanked the ow ls and he waved goodbye
the tree and picked the biggest , rip est olive she

MIN ACTIVATIONS
MIN = -8.803

spoiled and mold y and ranc id and expired and best
that her injury was too severe and she would never be
the soup , it was ste aming hot ! She pretended
There was a gr umpy ranc her named Tom . Tom
wallet had coins , gum wra ppers , buttons , and
control . The ranc her heard the noise and
scratched . The ranc her heard the noise and
the new horse when the ranc her was not looking .
room . The dish was ste aming hot . It was
duck slowly crawled , it envelop ed in the fresh ,
but today they were qu arre ling . Jenna wanted to
little birds that were qu arre ling . The
willing . The ranc her forg ave them ,
doctor , which was an err and he didn 't understand
. Lucy thought it was wine . She knew she should
, cup cakes and ch ocol ates .
It was sweet , ch ocol at y and warm ,
. There were cans , wra ppers and bottles . Sara
but today they were qu arre ling . They were arguing
little girl who was feeling un conf ident . She wanted

INTERVAL 3.822 - 8.030
CONTAINS 0.701%

as he put the finishing touches on his painting . It
you . We made your favorite dinner : pizza and ice
always be here to make each other feel better ."
? She asked her mom my . Mom my couldn 't
the bar ber in the mirror . The bar ber was

INTERVAL -0.387 - 3.822
CONTAINS 61.618%

said , " I understand . I
shelf . But when she tried to put Anna
the mud and started to roll around . He was having
blowing in the wind . It was so pretty ! Tim
?" Tom asks . " Sometimes , the water

INTERVAL -4.595 - -0.387
CONTAINS 37.511%

. Bob and Lily were very happy . They shared the
is so graceful . It swim s like a dancer ."
the little boy 's shoulder . He begged and begged the
had fun . One of the games was " cl ap
" I wish I never gave my bug to you ,

INTERVAL -8.803 - -4.595
CONTAINS 0.170%

on the island were very thirsty , but no matter how
what the man was looking for - a book he had
foolish . She knew that folders weren 't meant to be
Big John was feeling generous , so he happily agreed
together . They have ten dollars . Sam is happy .
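
NOTE ON INTERVALS
The four INTERVAL ranges above look like equal-width bins spanning MIN = -8.803 to MAX = 8.030. This is inferred from the printed numbers only; the export does not say how the bins were chosen. A minimal sketch that reproduces the edges under that assumption follows. The helper name equal_width_edges is hypothetical and not part of any dashboard tooling.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced bin edges from lo to hi (inclusive)."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as printed in the export (already rounded to 3 decimals).
edges = equal_width_edges(-8.803, 8.030, 4)

# CONTAINS percentages as printed, ordered from the lowest interval up.
shares = [0.170, 37.511, 61.618, 0.701]
total_share = sum(shares)

for left, right, share in zip(edges[:-1], edges[1:], shares):
    print(f"INTERVAL {left:.3f} - {right:.3f}  CONTAINS {share:.3f}%")
print(f"TOTAL {total_share:.3f}%")
```

Because the printed MIN and MAX are themselves rounded, the recomputed edges can differ from the listed ones in the last decimal place.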
Token.
Token ID13
Feature activation-6.55e-02

Pos logit contributions
 sacks+0.001
mons+0.001
 bulldo+0.001
 problem+0.001
unes+0.001
Neg logit contributions
 who-0.000
Sure-0.000
 gently-0.000
 softly-0.000
 eagerly-0.000
Token Chloe
Token ID29476
Feature activation+1.17e+00

Pos logit contributions
 trouble+0.003
 guns+0.003
 men+0.003
 problem+0.003
 thieves+0.003
Neg logit contributions
Sure-0.003
 approval-0.002
Acc-0.002
Indeed-0.002
Of-0.002
Token took
Token ID1718
Feature activation+6.85e-01

Pos logit contributions
Sure+0.034
 approval+0.030
Indeed+0.029
 gently+0.029
Of+0.028
Neg logit contributions
 guns-0.062
 trouble-0.062
 thieves-0.062
 problem-0.062
 men-0.061
Token his
Token ID465
Feature activation+1.93e+00

Pos logit contributions
Sure+0.026
Acc+0.026
 approval+0.024
Indeed+0.024
Of+0.024
Neg logit contributions
 trouble-0.030
 men-0.029
 guns-0.029
 thieves-0.029
 problem-0.028
Token ho
Token ID8169
Feature activation+1.92e+00

Pos logit contributions
Sure+0.077
Acc+0.073
 approval+0.071
Indeed+0.070
Of+0.068
Neg logit contributions
 trouble-0.082
 guns-0.081
 problem-0.081
 thieves-0.080
 men-0.080
Tokenof
Token ID1659
Feature activation+8.03e+00

Pos logit contributions
Acc+0.029
Sure+0.029
Indeed+0.025
 approval+0.025
And+0.022
Neg logit contributions
 guns-0.124
 thieves-0.123
 bulldo-0.120
 trouble-0.119
 men-0.119
Token and
Token ID290
Feature activation+1.53e+00

Pos logit contributions
Sure+0.280
 approval+0.254
Acc+0.243
Indeed+0.243
Of+0.243
Neg logit contributions
 trouble-0.379
 guns-0.375
 men-0.374
 problem-0.373
 thieves-0.370
Token they
Token ID484
Feature activation-2.40e+00

Pos logit contributions
 gently+0.050
Sure+0.049
Of+0.043
 carefully+0.043
 softly+0.041
Neg logit contributions
 problem-0.082
unes-0.081
 inconvenience-0.079
irens-0.078
 virus-0.077
Token started
Token ID2067
Feature activation+2.25e-01

Pos logit contributions
 trouble+0.116
 guns+0.115
 problem+0.114
 men+0.114
 thieves+0.114
Neg logit contributions
Sure-0.081
 approval-0.074
Acc-0.071
Indeed-0.071
Of-0.070
Token to
Token ID284
Feature activation-1.62e+00

Pos logit contributions
 gently+0.007
 softly+0.006
 who+0.006
 comfort+0.006
 eagerly+0.006
Neg logit contributions
mons-0.012
 virus-0.012
 problem-0.011
 inconvenience-0.011
 thieves-0.011
Token walk
Token ID2513
Feature activation+1.25e+00

Pos logit contributions
 trouble+0.082
 guns+0.082
 men+0.082
 problem+0.081
 thieves+0.081
Neg logit contributions
Sure-0.051
 approval-0.046
Acc-0.044
Indeed-0.044
Of-0.043
Token they
Token ID484
Feature activation-2.10e+00

Pos logit contributions
Sure+0.035
Of+0.029
 gently+0.029
Indeed+0.028
Acc+0.028
Neg logit contributions
 thieves-0.051
mons-0.051
inc-0.050
 men-0.050
 trouble-0.050
Token played
Token ID2826
Feature activation+1.62e+00

Pos logit contributions
 guns+0.087
 trouble+0.086
 men+0.085
 problem+0.085
 bulldo+0.084
Neg logit contributions
Sure-0.084
 approval-0.079
Acc-0.076
Of-0.076
Indeed-0.075
Token with
Token ID351
Feature activation+1.95e+00

Pos logit contributions
Sure+0.068
 approval+0.064
Acc+0.063
Of+0.061
Indeed+0.061
Neg logit contributions
 trouble-0.065
 guns-0.065
 men-0.063
 problem-0.062
 thieves-0.062
Token the
Token ID262
Feature activation+3.37e-01

Pos logit contributions
Sure+0.089
 approval+0.082
Acc+0.081
Of+0.081
Indeed+0.080
Neg logit contributions
 guns-0.077
 trouble-0.075
 men-0.072
 thieves-0.071
 problem-0.071
Token boun
Token ID31283
Feature activation+3.03e+00

Pos logit contributions
Sure+0.015
 approval+0.014
Acc+0.014
Indeed+0.014
Of+0.014
Neg logit contributions
 guns-0.012
 trouble-0.012
 men-0.012
 thieves-0.012
 problem-0.012
Tokency
Token ID948
Feature activation+7.64e+00

Pos logit contributions
Sure+0.099
 approval+0.094
Acc+0.093
Indeed+0.090
Of+0.089
Neg logit contributions
 guns-0.146
 trouble-0.144
 problem-0.144
 thieves-0.142
 bulldo-0.140
Token ball
Token ID2613
Feature activation+2.32e+00

Pos logit contributions
Sure+0.366
 approval+0.343
Acc+0.338
Indeed+0.333
Of+0.332
Neg logit contributions
 trouble-0.259
 guns-0.259
 men-0.255
 problem-0.254
 thieves-0.253
Token together
Token ID1978
Feature activation+1.56e+00

Pos logit contributions
Sure+0.101
 approval+0.099
Acc+0.095
Of+0.093
Indeed+0.093
Neg logit contributions
 guns-0.089
 problem-0.086
 men-0.085
 trouble-0.085
 thieves-0.083
Token and
Token ID290
Feature activation+1.42e+00

Pos logit contributions
Sure+0.066
Acc+0.062
Of+0.061
 approval+0.061
Indeed+0.059
Neg logit contributions
 guns-0.065
 problem-0.062
 trouble-0.062
 men-0.061
 sale-0.061
Token had
Token ID550
Feature activation-4.91e-01

Pos logit contributions
Sure+0.054
Acc+0.049
 approval+0.048
Indeed+0.048
Of+0.047
Neg logit contributions
 thieves-0.062
 trouble-0.062
 problem-0.062
 men-0.061
irens-0.061
Token lots
Token ID6041
Feature activation+4.21e-01

Pos logit contributions
 trouble+0.022
 guns+0.021
 problem+0.020
 sacks+0.020
 men+0.020
Neg logit contributions
Sure-0.019
 approval-0.018
Acc-0.018
Of-0.017
Indeed-0.017
Token he
Token ID339
Feature activation-1.54e+00

Pos logit contributions
Sure+0.047
 approval+0.044
 gently+0.044
 who+0.043
Of+0.042
Neg logit contributions
mons-0.057
 bulldo-0.057
riger-0.056
irens-0.056
 problem-0.056
Token could
Token ID714
Feature activation+3.07e+00

Pos logit contributions
 trouble+0.068
 guns+0.068
 men+0.067
 problem+0.067
 thieves+0.066
Neg logit contributions
Sure-0.058
 approval-0.053
Acc-0.052
Indeed-0.051
Of-0.051
Token help
Token ID1037
Feature activation+9.75e-01

Pos logit contributions
Sure+0.097
 approval+0.088
Acc+0.084
Indeed+0.083
Of+0.082
Neg logit contributions
 guns-0.154
 trouble-0.154
 men-0.153
 problem-0.153
 thieves-0.152
Token the
Token ID262
Feature activation+1.09e+00

Pos logit contributions
Sure+0.029
 approval+0.028
 gently+0.026
 forgive+0.025
 agreement+0.025
Neg logit contributions
mons-0.050
 bulldo-0.049
 problem-0.049
 thieves-0.048
 guns-0.048
Token boun
Token ID31283
Feature activation+2.60e+00

Pos logit contributions
Sure+0.044
 approval+0.042
Acc+0.040
Of+0.039
Indeed+0.039
Neg logit contributions
 guns-0.043
irens-0.043
 trouble-0.043
mons-0.042
 men-0.042
Tokency
Token ID948
Feature activation+7.60e+00

Pos logit contributions
Sure+0.082
 approval+0.077
Acc+0.075
Indeed+0.073
Of+0.072
Neg logit contributions
 guns-0.128
 trouble-0.127
 problem-0.126
 thieves-0.126
 men-0.125
Token thing
Token ID1517
Feature activation+2.40e+00

Pos logit contributions
Sure+0.291
 approval+0.271
 gently+0.259
Indeed+0.253
 agreement+0.253
Neg logit contributions
irens-0.330
 trouble-0.329
 thieves-0.325
 men-0.322
 guns-0.318
Token have
Token ID423
Feature activation+8.10e-01

Pos logit contributions
Sure+0.076
 gently+0.068
 approval+0.067
Indeed+0.064
Of+0.063
Neg logit contributions
 trouble-0.120
 bulldo-0.119
 men-0.119
 thieves-0.118
 problem-0.118
Token some
Token ID617
Feature activation-1.07e+00

Pos logit contributions
Sure+0.018
 gently+0.018
 forgive+0.018
 approval+0.018
 eagerly+0.017
Neg logit contributions
 bulldo-0.045
irens-0.045
 men-0.045
 paramed-0.044
 problem-0.044
Token fun
Token ID1257
Feature activation+1.10e+00

Pos logit contributions
irens+0.049
 bulldo+0.049
mons+0.048
 guns+0.048
 men+0.048
Neg logit contributions
Sure-0.037
 approval-0.036
 forgive-0.033
 gently-0.033
Of-0.032
Token.
Token ID13
Feature activation-1.52e-01

Pos logit contributions
 approval+0.027
Sure+0.027
 who+0.026
 gently+0.026
 forgive+0.024
Neg logit contributions
 bulldo-0.059
irens-0.058
riger-0.058
 thieves-0.058
 paramed-0.057
Token he
Token ID339
Feature activation-4.08e-01

Pos logit contributions
irens+0.009
 bulldo+0.008
mons+0.008
 thieves+0.008
inc+0.008
Neg logit contributions
Sure-0.005
 approval-0.005
 forgive-0.005
 gently-0.005
Acc-0.005
Token had
Token ID550
Feature activation-1.80e-01

Pos logit contributions
 guns+0.018
 trouble+0.018
 men+0.018
 problem+0.018
 thieves+0.018
Neg logit contributions
Sure-0.015
 approval-0.014
Acc-0.014
Indeed-0.014
Of-0.014
Token used
Token ID973
Feature activation+3.91e-01

Pos logit contributions
irens+0.008
 bulldo+0.008
mons+0.008
 men+0.008
riger+0.008
Neg logit contributions
 approval-0.007
 forgive-0.006
Sure-0.006
 gently-0.006
Acc-0.006
Token the
Token ID262
Feature activation+7.05e-01

Pos logit contributions
 approval+0.012
Sure+0.012
 gently+0.011
 forgive+0.010
Acc+0.010
Neg logit contributions
 bulldo-0.019
 men-0.019
mons-0.019
irens-0.019
 thieves-0.019
Token squee
Token ID13170
Feature activation+2.74e+00

Pos logit contributions
Sure+0.029
 approval+0.027
Acc+0.026
Of+0.026
Indeed+0.026
Neg logit contributions
 guns-0.028
 men-0.028
 trouble-0.028
irens-0.028
 thieves-0.028
Tokengee
Token ID29622
Feature activation+7.59e+00

Pos logit contributions
Sure+0.102
 approval+0.098
Acc+0.094
 gently+0.089
Indeed+0.089
Neg logit contributions
 men-0.118
 thieves-0.117
 trouble-0.117
 guns-0.117
 problem-0.114
Token to
Token ID284
Feature activation-8.46e-01

Pos logit contributions
Sure+0.267
 approval+0.244
 gently+0.233
of+0.232
 forgive+0.227
Neg logit contributions
 thieves-0.349
 trouble-0.345
 problem-0.337
irens-0.332
 guns-0.332
Token good
Token ID922
Feature activation+1.57e+00

Pos logit contributions
 guns+0.043
 men+0.042
 problem+0.042
 thieves+0.042
 trouble+0.042
Neg logit contributions
Sure-0.026
 approval-0.024
Acc-0.022
 gently-0.022
 forgive-0.022
Token use
Token ID779
Feature activation+1.10e+00

Pos logit contributions
Sure+0.066
 approval+0.065
 agreement+0.060
 gently+0.059
 forgive+0.058
Neg logit contributions
mons-0.060
 guns-0.059
irens-0.058
 bulldo-0.058
 thieves-0.057
Token!
Token ID0
Feature activation+1.34e+00

Pos logit contributions
Sure+0.033
 approval+0.032
 gently+0.032
 forgive+0.029
Acc+0.028
Neg logit contributions
mons-0.054
 thieves-0.054
 men-0.054
 bulldo-0.054
irens-0.053
Token<|endoftext|>
Token ID50256
Feature activation-2.61e+00

Pos logit contributions
Sure+0.037
 approval+0.034
Acc+0.032
Indeed+0.032
Of+0.031
Neg logit contributions
 trouble-0.071
 guns-0.071
 men-0.070
 problem-0.070
 thieves-0.070
Token take
Token ID1011
Feature activation+1.09e+00

Pos logit contributions
Sure+0.013
 approval+0.012
Of+0.011
 gently+0.011
Indeed+0.011
Neg logit contributions
 trouble-0.025
 problem-0.025
 thieves-0.024
 guns-0.024
irens-0.024
Token turns
Token ID4962
Feature activation+1.78e+00

Pos logit contributions
Sure+0.041
 approval+0.038
Indeed+0.036
Of+0.036
Acc+0.035
Neg logit contributions
 guns-0.048
 thieves-0.048
 trouble-0.048
 problem-0.048
 men-0.047
Token reading
Token ID3555
Feature activation-7.47e-01

Pos logit contributions
 who+0.061
 gently+0.060
 softly+0.057
 carefully+0.057
 eagerly+0.055
Neg logit contributions
 virus-0.082
mons-0.082
 problem-0.081
 bulldo-0.078
 AW-0.077
Token the
Token ID262
Feature activation+7.39e-01

Pos logit contributions
 trouble+0.033
 guns+0.033
 problem+0.032
 men+0.032
 thieves+0.032
Neg logit contributions
Sure-0.028
 approval-0.026
Acc-0.025
Indeed-0.025
Of-0.025
Token emer
Token ID3165
Feature activation+6.64e-01

Pos logit contributions
Sure+0.037
Acc+0.037
 approval+0.034
Indeed+0.034
Of+0.034
Neg logit contributions
 trouble-0.024
 problem-0.024
 men-0.023
 guns-0.023
 thieves-0.022
Tokenald
Token ID1940
Feature activation+7.53e+00

Pos logit contributions
Sure+0.021
 approval+0.019
Acc+0.019
Indeed+0.019
Of+0.018
Neg logit contributions
 guns-0.032
 thieves-0.032
 trouble-0.032
 men-0.032
 problem-0.032
Token green
Token ID4077
Feature activation+3.63e+00

Pos logit contributions
Sure+0.288
of+0.275
 gently+0.258
 who+0.249
 approval+0.249
Neg logit contributions
 trouble-0.339
 bulldo-0.326
irens-0.320
 thieves-0.320
 problem-0.318
Token book
Token ID1492
Feature activation-7.13e-01

Pos logit contributions
Sure+0.159
 approval+0.149
Indeed+0.143
Of+0.142
Acc+0.140
Neg logit contributions
 trouble-0.137
 guns-0.137
 men-0.135
 thieves-0.135
 bulldo-0.134
Token to
Token ID284
Feature activation-1.10e+00

Pos logit contributions
 trouble+0.035
 thieves+0.035
 guns+0.035
 bulldo+0.035
 men+0.035
Neg logit contributions
Sure-0.023
 approval-0.021
 gently-0.020
Indeed-0.020
Of-0.020
Token each
Token ID1123
Feature activation+3.09e+00

Pos logit contributions
 trouble+0.050
 men+0.048
 sale+0.048
 guns+0.048
 thieves+0.047
Neg logit contributions
Acc-0.042
Sure-0.041
Indeed-0.037
Of-0.037
And-0.035
Token other
Token ID584
Feature activation+1.17e+00

Pos logit contributions
Acc+0.145
Sure+0.129
Of+0.127
 Higher+0.123
Indeed+0.121
Neg logit contributions
 trouble-0.125
 problem-0.121
 men-0.118
 guns-0.117
 sale-0.114
Token He
Token ID679
Feature activation+5.39e-01

Pos logit contributions
Sure+0.003
 approval+0.003
Acc+0.003
Indeed+0.003
Of+0.003
Neg logit contributions
 trouble-0.003
 men-0.003
 guns-0.003
 bulldo-0.003
 thieves-0.003
Token got
Token ID1392
Feature activation-4.04e-01

Pos logit contributions
Sure+0.020
 approval+0.018
Acc+0.018
Indeed+0.017
Of+0.017
Neg logit contributions
 guns-0.024
 trouble-0.024
 men-0.024
 problem-0.024
 thieves-0.024
Token on
Token ID319
Feature activation+1.69e-01

Pos logit contributions
 problem+0.021
irens+0.020
 guns+0.020
 bulldo+0.020
mons+0.020
Neg logit contributions
Sure-0.013
 approval-0.012
 gently-0.011
Indeed-0.011
Of-0.011
Token the
Token ID262
Feature activation+1.05e+00

Pos logit contributions
Sure+0.004
 approval+0.004
 gently+0.004
 eagerly+0.004
Indeed+0.004
Neg logit contributions
mons-0.009
 problem-0.009
 thieves-0.009
 pests-0.009
ages-0.009
Token boun
Token ID31283
Feature activation+2.70e+00

Pos logit contributions
Sure+0.044
Acc+0.041
 approval+0.041
Indeed+0.040
Of+0.040
Neg logit contributions
 trouble-0.042
 guns-0.042
 men-0.041
 problem-0.041
 thieves-0.041
Tokency
Token ID948
Feature activation+7.52e+00

Pos logit contributions
Sure+0.092
Acc+0.092
 approval+0.089
Of+0.086
And+0.085
Neg logit contributions
 guns-0.121
 trouble-0.120
 problem-0.120
 thieves-0.118
 sacks-0.117
Token castle
Token ID16669
Feature activation+1.12e+00

Pos logit contributions
Sure+0.332
Acc+0.309
 approval+0.304
Of+0.301
Indeed+0.299
Neg logit contributions
 guns-0.285
 trouble-0.282
 problem-0.277
 men-0.276
 thieves-0.275
Token,
Token ID11
Feature activation-9.20e-01

Pos logit contributions
 gently+0.024
Sure+0.024
 carefully+0.021
 approval+0.021
 eagerly+0.021
Neg logit contributions
 thieves-0.067
 trouble-0.067
mons-0.066
 problem-0.065
irens-0.065
Token closed
Token ID4838
Feature activation-1.10e+00

Pos logit contributions
 trouble+0.044
 guns+0.044
 men+0.044
 problem+0.043
 thieves+0.043
Neg logit contributions
Sure-0.031
 approval-0.028
Acc-0.027
Indeed-0.027
Of-0.027
Token his
Token ID465
Feature activation+3.03e-01

Pos logit contributions
 trouble+0.049
 guns+0.048
 men+0.048
 problem+0.048
 thieves+0.048
Neg logit contributions
Sure-0.041
 approval-0.038
Acc-0.037
Indeed-0.036
Of-0.036
Token eyes
Token ID2951
Feature activation+1.60e+00

Pos logit contributions
Sure+0.012
 approval+0.011
Indeed+0.011
Acc+0.011
Of+0.011
Neg logit contributions
 guns-0.013
 trouble-0.013
 men-0.013
 thieves-0.013
 bulldo-0.012
Token\n
Token ID198
Feature activation+8.30e-01

Pos logit contributions
Sure+0.014
 approval+0.012
Acc+0.012
Indeed+0.012
Of+0.011
Neg logit contributions
 guns-0.040
 trouble-0.040
 problem-0.040
 men-0.040
 thieves-0.040
Token\n
Token ID198
Feature activation+1.04e+00

Pos logit contributions
 approval+0.026
Acc+0.025
Sure+0.024
 agreement+0.024
Of+0.022
Neg logit contributions
irens-0.041
 bulldo-0.040
 trouble-0.039
uts-0.038
 guns-0.038
TokenWith
Token ID3152
Feature activation+7.12e-01

Pos logit contributions
Sure+0.034
 approval+0.030
Indeed+0.030
Of+0.029
Acc+0.029
Neg logit contributions
 trouble-0.052
 guns-0.052
 problem-0.052
 thieves-0.052
 men-0.051
Token a
Token ID257
Feature activation+1.02e+00

Pos logit contributions
Sure+0.026
 approval+0.024
Indeed+0.023
Of+0.023
Acc+0.022
Neg logit contributions
 trouble-0.032
 guns-0.032
 problem-0.032
 thieves-0.032
 men-0.032
Token to
Token ID284
Feature activation-1.35e+00

Pos logit contributions
Acc+0.033
Sure+0.033
 approval+0.030
Indeed+0.029
Of+0.028
Neg logit contributions
 problem-0.050
 trouble-0.050
 men-0.049
 guns-0.048
 bulldo-0.048
Tokenothy
Token ID14863
Feature activation+7.45e+00

Pos logit contributions
 trouble+0.063
 guns+0.063
 men+0.063
 problem+0.062
 thieves+0.062
Neg logit contributions
Sure-0.047
 approval-0.043
Acc-0.042
Indeed-0.041
Of-0.041
Token grin
Token ID16770
Feature activation+3.48e+00

Pos logit contributions
Sure+0.261
 approval+0.240
Of+0.227
Indeed+0.224
 agreement+0.220
Neg logit contributions
 guns-0.349
 trouble-0.348
 thieves-0.344
mons-0.343
 men-0.339
Token,
Token ID11
Feature activation-5.22e-01

Pos logit contributions
 gently+0.116
Sure+0.107
 softly+0.106
 then+0.105
 carefully+0.104
Neg logit contributions
 trouble-0.178
mons-0.177
 thieves-0.174
 pests-0.170
 inconvenience-0.168
Token Kenny
Token ID22102
Feature activation+6.12e-01

Pos logit contributions
mons+0.024
 thieves+0.023
 problem+0.023
unes+0.023
 pests+0.023
Neg logit contributions
Sure-0.020
 gently-0.018
Of-0.018
Indeed-0.017
 who-0.017
Token led
Token ID2957
Feature activation+6.52e-01

Pos logit contributions
Sure+0.020
 approval+0.018
Acc+0.018
Indeed+0.018
Of+0.017
Neg logit contributions
 trouble-0.030
 guns-0.030
 men-0.029
 problem-0.029
 thieves-0.029
Token the
Token ID262
Feature activation+2.46e-01

Pos logit contributions
Sure+0.021
 approval+0.019
 gently+0.018
Of+0.018
Indeed+0.018
Neg logit contributions
mons-0.032
 problem-0.032
 guns-0.032
 thieves-0.032
 bulldo-0.032
Token I
Token ID314
Feature activation+2.52e+00

Pos logit contributions
Sure+0.021
 approval+0.020
 gently+0.020
Of+0.019
 forgive+0.018
Neg logit contributions
irens-0.030
 bulldo-0.030
 problem-0.030
mons-0.029
 thieves-0.029
Token am
Token ID716
Feature activation+6.99e-01

Pos logit contributions
Sure+0.085
 forgive+0.074
 approval+0.074
Of+0.072
 gently+0.072
Neg logit contributions
irens-0.127
 guns-0.127
 thieves-0.124
 men-0.123
 trouble-0.121
Token taking
Token ID2263
Feature activation+4.40e-01

Pos logit contributions
Sure+0.023
 approval+0.021
 gently+0.021
 forgive+0.021
Acc+0.020
Neg logit contributions
 guns-0.034
mons-0.033
irens-0.033
 bulldo-0.033
 men-0.033
Token the
Token ID262
Feature activation+2.47e+00

Pos logit contributions
Sure+0.016
 approval+0.015
 gently+0.014
Of+0.014
Indeed+0.013
Neg logit contributions
 bulldo-0.020
mons-0.020
 men-0.020
 problem-0.020
 guns-0.020
Token rh
Token ID9529
Feature activation+1.55e+00

Pos logit contributions
Sure+0.107
 approval+0.101
Of+0.096
 gently+0.095
 forgive+0.094
Neg logit contributions
irens-0.092
uts-0.089
 trouble-0.088
 men-0.088
mons-0.088
Tokenin
Token ID259
Feature activation+7.39e+00

Pos logit contributions
Sure+0.034
 approval+0.029
Indeed+0.027
Acc+0.027
Of+0.026
Neg logit contributions
 trouble-0.092
 guns-0.091
 men-0.090
 problem-0.089
 bulldo-0.089
Tokenoc
Token ID420
Feature activation-1.39e+00

Pos logit contributions
Sure+0.315
 approval+0.287
Of+0.278
Indeed+0.277
And+0.272
Neg logit contributions
 trouble-0.290
 guns-0.283
 men-0.283
 thieves-0.281
irens-0.279
Tokeneros
Token ID27498
Feature activation+4.16e+00

Pos logit contributions
 trouble+0.043
 guns+0.041
 men+0.040
 problem+0.040
 virus+0.040
Neg logit contributions
Sure-0.068
 approval-0.066
Acc-0.062
Indeed-0.062
Of-0.061
Token to
Token ID284
Feature activation-2.93e-01

Pos logit contributions
 who+0.134
Sure+0.129
 forgive+0.127
 gently+0.127
of+0.119
Neg logit contributions
irens-0.198
mons-0.197
 men-0.190
 trouble-0.187
 bulldo-0.186
Token a
Token ID257
Feature activation+2.06e+00

Pos logit contributions
mons+0.014
irens+0.014
 problem+0.014
 guns+0.014
 men+0.013
Neg logit contributions
Sure-0.010
 approval-0.009
 forgive-0.009
 gently-0.009
Indeed-0.009
Token new
Token ID649
Feature activation+3.04e+00

Pos logit contributions
Sure+0.086
 approval+0.080
Acc+0.079
Indeed+0.078
Of+0.077
Neg logit contributions
 trouble-0.082
 problem-0.082
 guns-0.082
 men-0.080
 thieves-0.080
Token sweet
Token ID6029
Feature activation-5.85e-01

Pos logit contributions
 guns+0.091
 trouble+0.091
 men+0.090
 problem+0.089
 thieves+0.089
Neg logit contributions
Sure-0.060
 approval-0.055
Acc-0.053
Indeed-0.053
Of-0.052
Token to
Token ID284
Feature activation-2.40e+00

Pos logit contributions
 bulldo+0.030
irens+0.030
 guns+0.030
mons+0.029
 men+0.029
Neg logit contributions
Sure-0.018
 approval-0.016
Of-0.015
Indeed-0.015
 gently-0.015
Token share
Token ID2648
Feature activation+1.17e+00

Pos logit contributions
 trouble+0.120
 guns+0.120
 men+0.119
 problem+0.119
 thieves+0.119
Neg logit contributions
Sure-0.076
 approval-0.069
Acc-0.067
Indeed-0.066
Of-0.065
Token the
Token ID262
Feature activation+7.96e-01

Pos logit contributions
Sure+0.036
 approval+0.035
 gently+0.032
 forgive+0.031
Indeed+0.031
Neg logit contributions
 bulldo-0.060
 problem-0.059
mons-0.059
 men-0.058
 thieves-0.058
Token boun
Token ID31283
Feature activation+2.71e+00

Pos logit contributions
Sure+0.033
 approval+0.031
Indeed+0.029
Of+0.029
Acc+0.029
Neg logit contributions
irens-0.032
 men-0.032
 guns-0.032
mons-0.031
 thieves-0.031
Tokency
Token ID948
Feature activation+7.34e+00

Pos logit contributions
Sure+0.091
Acc+0.089
 approval+0.088
Indeed+0.085
Of+0.083
Neg logit contributions
 guns-0.122
 problem-0.120
 trouble-0.120
 thieves-0.119
mons-0.119
Token castle
Token ID16669
Feature activation+7.09e-01

Pos logit contributions
Sure+0.297
 approval+0.294
 agreement+0.271
of+0.266
 gently+0.264
Neg logit contributions
irens-0.301
 trouble-0.289
 thieves-0.284
 men-0.283
 bulldo-0.281
Token with
Token ID351
Feature activation+2.34e-01

Pos logit contributions
Sure+0.018
 gently+0.018
 approval+0.018
 who+0.017
 carefully+0.016
Neg logit contributions
 bulldo-0.039
 men-0.039
irens-0.039
 trouble-0.039
 thieves-0.038
Token her
Token ID607
Feature activation+6.80e-01

Pos logit contributions
Sure+0.009
Acc+0.008
Of+0.008
Indeed+0.008
 approval+0.008
Neg logit contributions
 trouble-0.011
 guns-0.011
 men-0.010
 thieves-0.010
 problem-0.010
Token friends
Token ID2460
Feature activation+6.56e-01

Pos logit contributions
Sure+0.025
 approval+0.024
Indeed+0.023
Acc+0.023
Of+0.023
Neg logit contributions
irens-0.030
 men-0.029
 trouble-0.029
 guns-0.029
 bulldo-0.029
Token.
Token ID13
Feature activation-4.31e-01

Pos logit contributions
Sure+0.018
 approval+0.017
 gently+0.016
 forgive+0.015
 who+0.015
Neg logit contributions
 men-0.036
 trouble-0.036
irens-0.036
 bulldo-0.036
mons-0.035
Token So
Token ID1406
Feature activation+7.88e-01

Pos logit contributions
Sure+0.026
 approval+0.024
Acc+0.023
Indeed+0.023
Of+0.022
Neg logit contributions
 trouble-0.044
 guns-0.043
 men-0.043
 problem-0.043
 thieves-0.043
Token Tom
Token ID4186
Feature activation+2.64e-01

Pos logit contributions
Sure+0.026
 gently+0.024
 then+0.024
 carefully+0.022
 eagerly+0.022
Neg logit contributions
 bulldo-0.041
inc-0.041
unes-0.041
mons-0.040
 virus-0.040
Token jumped
Token ID11687
Feature activation-3.76e-01

Pos logit contributions
Sure+0.009
 approval+0.008
Acc+0.008
Indeed+0.008
Of+0.008
Neg logit contributions
 trouble-0.013
 guns-0.013
 men-0.012
 problem-0.012
 thieves-0.012
Token onto
Token ID4291
Feature activation-1.40e-01

Pos logit contributions
 thieves+0.019
mons+0.019
 guns+0.019
 problem+0.018
 trouble+0.018
Neg logit contributions
Sure-0.013
 gently-0.013
 softly-0.011
 approval-0.011
 carefully-0.011
Token the
Token ID262
Feature activation+1.22e+00

Pos logit contributions
 trouble+0.006
 guns+0.006
 men+0.006
 thieves+0.006
 problem+0.006
Neg logit contributions
Sure-0.006
Acc-0.005
 approval-0.005
Indeed-0.005
Of-0.005
Token ear
Token ID1027
Feature activation+7.34e+00

Pos logit contributions
Acc+0.058
Sure+0.052
 forgive+0.051
And+0.050
Rachel+0.050
Neg logit contributions
 problem-0.045
 trouble-0.045
 men-0.044
 thieves-0.042
 bags-0.040
Token and
Token ID290
Feature activation+1.07e+00

Pos logit contributions
Acc+0.291
Sure+0.290
 approval+0.272
Indeed+0.266
Of+0.264
Neg logit contributions
 guns-0.296
 men-0.293
uts-0.290
 sale-0.288
 problem-0.288
Token it
Token ID340
Feature activation-6.43e-02

Pos logit contributions
Sure+0.038
 gently+0.034
Of+0.033
 approval+0.032
Indeed+0.032
Neg logit contributions
 problem-0.052
 trouble-0.051
 thieves-0.051
 guns-0.050
 men-0.050
Token was
Token ID373
Feature activation-5.19e-01

Pos logit contributions
 thieves+0.003
 trouble+0.003
 guns+0.003
 men+0.003
 problem+0.003
Neg logit contributions
Sure-0.002
 gently-0.002
 approval-0.002
Indeed-0.002
Of-0.002
Token so
Token ID523
Feature activation-2.15e+00

Pos logit contributions
 guns+0.025
 men+0.025
 thieves+0.025
 trouble+0.024
mons+0.024
Neg logit contributions
Sure-0.017
 approval-0.016
Acc-0.015
Indeed-0.015
 gently-0.015
Token much
Token ID881
Feature activation-2.14e+00

Pos logit contributions
 trouble+0.103
 guns+0.102
 men+0.102
 problem+0.101
 thieves+0.100
Neg logit contributions
Sure-0.074
 approval-0.068
Acc-0.066
Indeed-0.064
Of-0.064
Token so
Token ID523
Feature activation-4.15e-01

Pos logit contributions
Sure+0.008
Of+0.007
 approval+0.007
 gently+0.007
Indeed+0.007
Neg logit contributions
mons-0.017
irens-0.017
 thieves-0.017
 guns-0.017
 bulldo-0.017
Token he
Token ID339
Feature activation-1.94e+00

Pos logit contributions
 guns+0.018
 bulldo+0.017
 sacks+0.017
 trouble+0.017
mons+0.017
Neg logit contributions
Sure-0.017
 gently-0.015
Of-0.015
Indeed-0.014
of-0.014
Token raised
Token ID4376
Feature activation+9.79e-01

Pos logit contributions
 trouble+0.093
irens+0.091
 thieves+0.090
 guns+0.090
mons+0.089
Neg logit contributions
Sure-0.071
 gently-0.063
Of-0.061
Indeed-0.059
 approval-0.058
Token his
Token ID465
Feature activation+6.75e-01

Pos logit contributions
Sure+0.037
Acc+0.034
 approval+0.033
Indeed+0.033
Of+0.032
Neg logit contributions
 trouble-0.044
 guns-0.044
 men-0.043
 problem-0.043
 thieves-0.042
Token ho
Token ID8169
Feature activation+1.14e+00

Pos logit contributions
Sure+0.022
 approval+0.021
Of+0.019
 agreement+0.018
 gently+0.018
Neg logit contributions
 guns-0.033
 bulldo-0.032
 trouble-0.032
irens-0.032
mons-0.032
Tokenof
Token ID1659
Feature activation+7.31e+00

Pos logit contributions
Sure+0.046
Acc+0.045
Indeed+0.042
Of+0.041
 approval+0.041
Neg logit contributions
 guns-0.047
 thieves-0.046
irens-0.045
 men-0.044
 problem-0.044
Token.
Token ID13
Feature activation-2.56e-02

Pos logit contributions
Sure+0.312
Acc+0.295
 approval+0.288
Indeed+0.284
Of+0.283
Neg logit contributions
 guns-0.286
 problem-0.277
 men-0.277
 trouble-0.277
 thieves-0.272
Token But
Token ID887
Feature activation+8.60e-01

Pos logit contributions
 thieves+0.001
 trouble+0.001
 guns+0.001
 problem+0.001
 pests+0.001
Neg logit contributions
Sure-0.001
Of-0.001
 approval-0.001
 gently-0.001
Indeed-0.001
Token then
Token ID788
Feature activation+2.72e+00

Pos logit contributions
Sure+0.025
 gently+0.025
 then+0.024
 quietly+0.022
 carefully+0.022
Neg logit contributions
irens-0.046
 bullets-0.046
mons-0.046
 trouble-0.046
 thieves-0.045
Token he
Token ID339
Feature activation-1.30e+00

Pos logit contributions
 gently+0.063
Sure+0.061
 smiled+0.060
 then+0.058
 softly+0.058
Neg logit contributions
mons-0.161
 inconvenience-0.155
unes-0.154
keeping-0.153
ained-0.152
Token saw
Token ID2497
Feature activation+1.45e+00

Pos logit contributions
mons+0.065
 trouble+0.065
irens+0.064
 guns+0.064
 sacks+0.064
Neg logit contributions
Sure-0.039
 gently-0.035
 approval-0.033
Indeed-0.033
of-0.033
TokenLet
Token ID5756
Feature activation+3.22e+00

Pos logit contributions
mons+0.114
 guns+0.109
 bulldo+0.107
 men+0.107
irens+0.105
Neg logit contributions
 approval-0.076
 agreement-0.070
Acc-0.068
 gently-0.063
Sure-0.063
Token's
Token ID338
Feature activation+1.05e+00

Pos logit contributions
Sure+0.107
of+0.094
 gently+0.094
 approval+0.090
 agreement+0.084
Neg logit contributions
mons-0.168
irens-0.163
 bulldo-0.162
ometers-0.160
 thieves-0.159
Token play
Token ID711
Feature activation+3.00e+00

Pos logit contributions
Sure+0.027
 approval+0.023
 forgive+0.022
 gently+0.022
Indeed+0.021
Neg logit contributions
 trouble-0.059
 men-0.059
irens-0.059
 problem-0.059
 thieves-0.058
Token tug
Token ID27762
Feature activation+2.15e+00

Pos logit contributions
Sure+0.113
 approval+0.104
Indeed+0.100
Acc+0.099
Of+0.099
Neg logit contributions
 trouble-0.132
 men-0.131
 guns-0.131
 problem-0.131
 thieves-0.131
Token-
Token ID12
Feature activation+1.22e+00

Pos logit contributions
Sure+0.092
 approval+0.086
 gently+0.085
Indeed+0.082
Acc+0.080
Neg logit contributions
 trouble-0.082
 bulldo-0.080
 men-0.080
 thieves-0.080
 problem-0.079
Tokenof
Token ID1659
Feature activation+7.29e+00

Pos logit contributions
 approval+0.046
Sure+0.044
Acc+0.044
 agreement+0.044
Indeed+0.042
Neg logit contributions
mons-0.052
 guns-0.052
irens-0.051
 bulldo-0.051
 men-0.050
Token-
Token ID12
Feature activation+1.24e+00

Pos logit contributions
Sure+0.291
of+0.267
 approval+0.266
 gently+0.256
Indeed+0.253
Neg logit contributions
 trouble-0.310
 men-0.303
 bulldo-0.303
 thieves-0.301
 problem-0.299
Tokenwar
Token ID5767
Feature activation+1.77e+00

Pos logit contributions
 approval+0.046
 agreement+0.044
Acc+0.043
Sure+0.042
Indeed+0.041
Neg logit contributions
mons-0.054
irens-0.053
 guns-0.053
 bulldo-0.052
 men-0.051
Token,"
Token ID553
Feature activation+8.56e-01

Pos logit contributions
 gently+0.056
Sure+0.054
 softly+0.053
 approval+0.052
And+0.049
Neg logit contributions
 problem-0.086
 bulldo-0.085
 trouble-0.085
 thieves-0.085
 virus-0.082
Token said
Token ID531
Feature activation-2.37e+00

Pos logit contributions
Sure+0.036
 approval+0.033
Of+0.032
Indeed+0.032
 gently+0.031
Neg logit contributions
 guns-0.035
 trouble-0.034
 thieves-0.034
 problem-0.034
 bulldo-0.034
Token the
Token ID262
Feature activation-1.81e+00

Pos logit contributions
 trouble+0.112
 guns+0.112
 problem+0.110
 men+0.110
 thieves+0.110
Neg logit contributions
Sure-0.082
 approval-0.074
Acc-0.071
Indeed-0.071
Of-0.071
Token clay
Token ID21558
Feature activation+1.28e+00

Pos logit contributions
irens+0.040
mons+0.039
riger+0.038
inc+0.038
 tresp+0.038
Neg logit contributions
 approval-0.026
Sure-0.025
 gently-0.024
 agreement-0.024
 forgive-0.023
Token.
Token ID13
Feature activation+2.30e-01

Pos logit contributions
 gently+0.031
Sure+0.029
 who+0.029
 forgive+0.028
 approval+0.027
Neg logit contributions
irens-0.074
riger-0.072
 inconvenience-0.071
 trouble-0.070
 virus-0.069
Token He
Token ID679
Feature activation+7.28e-01

Pos logit contributions
Sure+0.009
 approval+0.008
Of+0.007
Indeed+0.007
Acc+0.007
Neg logit contributions
 thieves-0.010
 trouble-0.010
 guns-0.010
 problem-0.010
 men-0.010
Token took
Token ID1718
Feature activation+9.50e-01

Pos logit contributions
Sure+0.026
 gently+0.023
Of+0.023
 approval+0.022
Indeed+0.022
Neg logit contributions
 trouble-0.034
 thieves-0.034
irens-0.034
 guns-0.033
 men-0.033
Token the
Token ID262
Feature activation+1.23e+00

Pos logit contributions
Sure+0.033
 approval+0.031
Indeed+0.029
 gently+0.028
Of+0.028
Neg logit contributions
 guns-0.044
mons-0.044
 thieves-0.044
 bulldo-0.044
 problem-0.044
Token biggest
Token ID4094
Feature activation+7.25e+00

Pos logit contributions
Sure+0.045
 approval+0.043
 gently+0.042
Indeed+0.040
Acc+0.040
Neg logit contributions
irens-0.054
 trouble-0.053
 thieves-0.053
 men-0.052
mons-0.052
Token piece
Token ID3704
Feature activation+1.76e+00

Pos logit contributions
Sure+0.257
 approval+0.252
 gently+0.237
 agreement+0.231
 forgive+0.229
Neg logit contributions
mons-0.323
irens-0.321
 thieves-0.317
 men-0.316
riger-0.313
Token and
Token ID290
Feature activation+1.87e+00

Pos logit contributions
Sure+0.061
 gently+0.056
 approval+0.055
Indeed+0.053
 forgive+0.052
Neg logit contributions
 thieves-0.082
 trouble-0.082
 men-0.081
 problem-0.081
irens-0.080
Token said
Token ID531
Feature activation-1.69e+00

Pos logit contributions
Sure+0.068
 gently+0.063
Indeed+0.059
Of+0.059
 approval+0.058
Neg logit contributions
 thieves-0.085
irens-0.085
 trouble-0.085
 problem-0.084
 men-0.084
Token,
Token ID11
Feature activation-2.30e-01

Pos logit contributions
irens+0.079
 thieves+0.079
 trouble+0.079
 sacks+0.078
 pests+0.077
Neg logit contributions
Sure-0.061
 gently-0.055
 forgive-0.055
 approval-0.053
Of-0.053
Token "
Token ID366
Feature activation-1.45e+00

Pos logit contributions
 trouble+0.011
 thieves+0.011
 guns+0.011
 problem+0.011
 men+0.011
Neg logit contributions
Sure-0.008
 approval-0.007
Acc-0.007
Of-0.007
 gently-0.007
Token finding
Token ID4917
Feature activation+2.50e-01

Pos logit contributions
irens+0.014
 men+0.014
 thieves+0.013
 guns+0.013
mons+0.013
Neg logit contributions
Sure-0.015
 approval-0.015
 forgive-0.014
 gently-0.014
Indeed-0.014
Token!
Token ID0
Feature activation+4.92e-01

Pos logit contributions
Sure+0.009
 approval+0.009
Indeed+0.008
Acc+0.008
Of+0.008
Neg logit contributions
 trouble-0.011
 guns-0.011
 men-0.011
 problem-0.011
 bulldo-0.011
Token He
Token ID679
Feature activation+5.86e-01

Pos logit contributions
Sure+0.019
 approval+0.017
Acc+0.017
Indeed+0.017
Of+0.017
Neg logit contributions
 trouble-0.022
 guns-0.021
 men-0.021
 problem-0.021
 thieves-0.021
Token showed
Token ID3751
Feature activation+1.14e+00

Pos logit contributions
Sure+0.023
 approval+0.022
Acc+0.022
Of+0.021
 agreement+0.020
Neg logit contributions
 bulldo-0.024
 guns-0.024
 problem-0.024
 trouble-0.024
 men-0.023
Token the
Token ID262
Feature activation+1.28e+00

Pos logit contributions
Acc+0.058
Sure+0.054
And+0.052
Of+0.050
Indeed+0.049
Neg logit contributions
 guns-0.041
 trouble-0.039
 men-0.039
 bullets-0.037
 thieves-0.037
Token three
Token ID1115
Feature activation+7.20e+00

Pos logit contributions
Sure+0.054
 approval+0.052
 forgive+0.049
Indeed+0.049
Of+0.049
Neg logit contributions
irens-0.054
mons-0.051
 bulldo-0.049
 trouble-0.048
 men-0.048
Token gems
Token ID23685
Feature activation+1.22e-01

Pos logit contributions
Sure+0.291
 approval+0.278
Indeed+0.263
of+0.260
 agreement+0.258
Neg logit contributions
 trouble-0.288
 bulldo-0.286
mons-0.285
 thieves-0.284
 guns-0.282
Token to
Token ID284
Feature activation-1.19e+00

Pos logit contributions
Sure+0.004
 gently+0.004
 approval+0.004
Indeed+0.004
 carefully+0.004
Neg logit contributions
irens-0.006
 trouble-0.006
 virus-0.006
 bulldo-0.006
mons-0.006
Token his
Token ID465
Feature activation+2.82e-01

Pos logit contributions
 guns+0.038
 sale+0.036
 trouble+0.035
 men+0.034
 bullets+0.033
Neg logit contributions
Acc-0.067
Sure-0.063
Of-0.061
Indeed-0.057
And-0.057
Token friends
Token ID2460
Feature activation+2.00e+00

Pos logit contributions
Sure+0.013
Acc+0.012
 approval+0.012
Of+0.012
Indeed+0.012
Neg logit contributions
 trouble-0.010
 guns-0.010
 problem-0.010
 men-0.010
 thieves-0.010
Token,
Token ID11
Feature activation-7.11e-01

Pos logit contributions
Sure+0.053
 gently+0.050
 approval+0.049
 who+0.049
 forgive+0.047
Neg logit contributions
 trouble-0.111
 problem-0.109
mons-0.109
 thieves-0.108
 bulldo-0.107
Token The
Token ID383
Feature activation+1.25e-02

Pos logit contributions
Sure+0.012
 approval+0.012
Acc+0.012
Indeed+0.011
of+0.011
Neg logit contributions
 men-0.011
 bulldo-0.011
 guns-0.011
 trouble-0.011
 problem-0.010
Token unicorn
Token ID44986
Feature activation+3.25e+00

Pos logit contributions
Acc+0.001
Sure+0.001
Of+0.001
Indeed+0.001
 approval+0.001
Neg logit contributions
 problem-0.000
 trouble-0.000
 men-0.000
 guns-0.000
 thieves-0.000
Token took
Token ID1718
Feature activation+8.69e-01

Pos logit contributions
Sure+0.108
 approval+0.098
Indeed+0.095
Acc+0.094
Of+0.093
Neg logit contributions
 trouble-0.158
 guns-0.156
 thieves-0.154
 men-0.154
 problem-0.154
Token the
Token ID262
Feature activation+1.45e+00

Pos logit contributions
Sure+0.028
 approval+0.026
Indeed+0.024
 gently+0.024
Of+0.024
Neg logit contributions
 guns-0.043
 problem-0.042
mons-0.042
 men-0.042
 bulldo-0.042
Token rh
Token ID9529
Feature activation+1.90e+00

Pos logit contributions
Sure+0.063
 approval+0.059
Indeed+0.056
Of+0.056
 gently+0.055
Neg logit contributions
irens-0.055
 trouble-0.054
 guns-0.054
 men-0.054
mons-0.054
Tokenin
Token ID259
Feature activation+7.15e+00

Pos logit contributions
Sure+0.055
 approval+0.051
Acc+0.049
Indeed+0.048
Of+0.047
Neg logit contributions
 guns-0.100
 trouble-0.099
 thieves-0.099
 men-0.099
 problem-0.099
Tokenoc
Token ID420
Feature activation-3.48e-01

Pos logit contributions
Sure+0.343
 approval+0.324
Acc+0.321
Indeed+0.314
Of+0.312
Neg logit contributions
 guns-0.239
 trouble-0.237
 men-0.235
 problem-0.233
 thieves-0.233
Tokeneros
Token ID27498
Feature activation+4.06e+00

Pos logit contributions
 trouble+0.010
 guns+0.009
 problem+0.009
 men+0.009
 sacks+0.009
Neg logit contributions
Sure-0.019
 approval-0.018
Acc-0.017
Indeed-0.017
Of-0.017
Token to
Token ID284
Feature activation+1.40e-01

Pos logit contributions
 gently+0.139
 who+0.126
Sure+0.125
 carefully+0.124
 approval+0.121
Neg logit contributions
irens-0.205
 bulldo-0.195
 inconvenience-0.192
 problem-0.192
mons-0.191
Token a
Token ID257
Feature activation+2.61e+00

Pos logit contributions
Sure+0.005
Acc+0.005
Of+0.005
 approval+0.005
Indeed+0.005
Neg logit contributions
 trouble-0.006
 guns-0.006
 men-0.006
 thieves-0.006
 problem-0.006
Token nearby
Token ID6716
Feature activation+1.93e+00

Pos logit contributions
Sure+0.103
 approval+0.095
Acc+0.093
Indeed+0.092
Of+0.091
Neg logit contributions
 trouble-0.110
 guns-0.110
 men-0.109
 thieves-0.108
 problem-0.108
Token.
Token ID13
Feature activation+6.95e-02

Pos logit contributions
Sure+0.052
 approval+0.048
Acc+0.046
Indeed+0.046
Of+0.045
Neg logit contributions
 guns-0.074
 trouble-0.074
 men-0.074
 thieves-0.074
 bulldo-0.073
Token He
Token ID679
Feature activation+4.67e-01

Pos logit contributions
Sure+0.003
 approval+0.002
Indeed+0.002
Acc+0.002
Of+0.002
Neg logit contributions
 guns-0.003
 trouble-0.003
 thieves-0.003
 problem-0.003
 men-0.003
Token used
Token ID973
Feature activation+4.65e-01

Pos logit contributions
Sure+0.018
 approval+0.017
Acc+0.016
Of+0.015
 agreement+0.015
Neg logit contributions
 problem-0.020
 trouble-0.020
 bulldo-0.020
 guns-0.020
 men-0.020
Token the
Token ID262
Feature activation+5.61e-02

Pos logit contributions
Sure+0.016
 approval+0.015
Indeed+0.014
Acc+0.014
Of+0.014
Neg logit contributions
 men-0.022
 thieves-0.022
 guns-0.021
 trouble-0.021
 problem-0.021
Token squee
Token ID13170
Feature activation+2.45e+00

Pos logit contributions
Sure+0.002
 approval+0.002
Acc+0.002
 gently+0.002
Of+0.002
Neg logit contributions
 guns-0.002
 thieves-0.002
 men-0.002
irens-0.002
mons-0.002
Tokengee
Token ID29622
Feature activation+7.12e+00

Pos logit contributions
Sure+0.116
Of+0.108
Indeed+0.107
Acc+0.104
 approval+0.104
Neg logit contributions
 problem-0.087
 trouble-0.086
 guns-0.085
 bulldo-0.084
 men-0.082
Token and
Token ID290
Feature activation+1.17e+00

Pos logit contributions
Sure+0.250
 approval+0.231
Of+0.216
 gently+0.215
Indeed+0.214
Neg logit contributions
 thieves-0.331
 trouble-0.330
 problem-0.325
 guns-0.322
 men-0.320
Token lots
Token ID6041
Feature activation+4.02e-01

Pos logit contributions
Sure+0.040
 approval+0.037
Acc+0.036
Indeed+0.036
Of+0.035
Neg logit contributions
 trouble-0.055
 guns-0.055
 men-0.055
 thieves-0.055
 problem-0.054
Token of
Token ID286
Feature activation-6.77e-01

Pos logit contributions
Sure+0.019
 approval+0.017
Acc+0.017
Indeed+0.017
Of+0.017
Neg logit contributions
 trouble-0.014
 guns-0.014
 problem-0.014
 men-0.014
 thieves-0.014
Token water
Token ID1660
Feature activation-4.52e-01

Pos logit contributions
 trouble+0.029
 problem+0.028
 men+0.026
 guns+0.026
 machines+0.025
Neg logit contributions
Sure-0.029
Indeed-0.027
Acc-0.026
Of-0.026
And-0.025
Token to
Token ID284
Feature activation-8.27e-01

Pos logit contributions
 thieves+0.022
 trouble+0.022
 virus+0.022
 problem+0.022
 robbery+0.022
Neg logit contributions
 gently-0.015
Sure-0.014
 softly-0.013
 carefully-0.013
 forgive-0.013
Token night
Token ID1755
Feature activation+1.42e+00

Pos logit contributions
Sure+0.014
Of+0.013
 approval+0.013
Acc+0.013
Indeed+0.012
Neg logit contributions
 trouble-0.009
 sale-0.008
 thieves-0.008
 guns-0.008
 problem-0.008
Token and
Token ID290
Feature activation+1.07e+00

Pos logit contributions
 who+0.025
 gently+0.025
 softly+0.023
 carefully+0.022
 quietly+0.022
Neg logit contributions
 thieves-0.092
mons-0.091
 tresp-0.088
 pistols-0.088
 guns-0.088
Token took
Token ID1718
Feature activation+7.24e-01

Pos logit contributions
Sure+0.036
 gently+0.032
 approval+0.032
Acc+0.032
Of+0.031
Neg logit contributions
irens-0.051
 guns-0.051
 problem-0.051
 bulldo-0.050
 men-0.050
Token the
Token ID262
Feature activation+1.19e+00

Pos logit contributions
Sure+0.026
 approval+0.024
Acc+0.023
Indeed+0.023
Of+0.023
Neg logit contributions
 guns-0.033
 trouble-0.033
 men-0.033
 problem-0.033
 thieves-0.033
Token rh
Token ID9529
Feature activation+2.13e+00

Pos logit contributions
Sure+0.064
 approval+0.060
Of+0.059
Indeed+0.059
 agreement+0.057
Neg logit contributions
 guns-0.032
irens-0.032
 trouble-0.032
mons-0.032
 thieves-0.031
Tokenin
Token ID259
Feature activation+7.12e+00

Pos logit contributions
Sure+0.053
 approval+0.046
Acc+0.045
Indeed+0.044
Of+0.043
Neg logit contributions
 trouble-0.121
 guns-0.121
 men-0.120
 problem-0.119
 thieves-0.119
Tokenoc
Token ID420
Feature activation+2.37e-01

Pos logit contributions
Sure+0.324
 approval+0.298
Indeed+0.291
Of+0.290
Acc+0.287
Neg logit contributions
 trouble-0.263
 guns-0.258
 problem-0.256
 thieves-0.254
 men-0.254
Tokeneros
Token ID27498
Feature activation+3.99e+00

Pos logit contributions
Sure+0.012
 approval+0.012
Acc+0.011
Indeed+0.011
Of+0.011
Neg logit contributions
 trouble-0.007
 problem-0.007
 guns-0.007
 men-0.007
 sacks-0.006
Token home
Token ID1363
Feature activation-5.04e-01

Pos logit contributions
 gently+0.142
Sure+0.131
 who+0.131
 forgive+0.127
 approval+0.126
Neg logit contributions
irens-0.193
 men-0.185
 problem-0.183
 bulldo-0.181
 trouble-0.179
Token with
Token ID351
Feature activation+5.94e-01

Pos logit contributions
 trouble+0.025
 guns+0.025
 problem+0.025
 men+0.025
 thieves+0.025
Neg logit contributions
Sure-0.016
 approval-0.015
Acc-0.014
Indeed-0.014
 gently-0.014
Token her
Token ID607
Feature activation+8.33e-01

Pos logit contributions
Sure+0.017
 approval+0.016
Acc+0.015
Indeed+0.015
Of+0.015
Neg logit contributions
 guns-0.031
 problem-0.031
 men-0.031
 bulldo-0.031
 trouble-0.031
Token.
Token ID13
Feature activation-8.94e-02

Pos logit contributions
Sure+0.018
 who+0.017
 gently+0.016
 forgive+0.015
 approval+0.015
Neg logit contributions
 trouble-0.046
 thieves-0.046
 bulldo-0.046
irens-0.046
mons-0.046
Token He
Token ID679
Feature activation+4.97e-01

Pos logit contributions
 men+0.003
 trouble+0.003
 bulldo+0.003
 guns+0.003
 sacks+0.003
Neg logit contributions
Sure-0.004
Acc-0.004
 approval-0.004
Indeed-0.004
Of-0.004
Token played
Token ID2826
Feature activation+1.18e+00

Pos logit contributions
Sure+0.017
 approval+0.016
Acc+0.015
Indeed+0.015
Of+0.015
Neg logit contributions
 trouble-0.023
 guns-0.023
 men-0.023
 problem-0.023
 thieves-0.023
Token tug
Token ID27762
Feature activation+2.04e+00

Pos logit contributions
Sure+0.049
 approval+0.045
Acc+0.044
Indeed+0.044
Of+0.043
Neg logit contributions
 trouble-0.047
 guns-0.047
 men-0.047
 problem-0.046
 thieves-0.046
Token-
Token ID12
Feature activation+1.23e+00

Pos logit contributions
Sure+0.072
 gently+0.066
 approval+0.066
Indeed+0.063
of+0.063
Neg logit contributions
 thieves-0.091
 trouble-0.091
 bulldo-0.090
 problem-0.089
mons-0.089
Tokenof
Token ID1659
Feature activation+7.11e+00

Pos logit contributions
 approval+0.049
 agreement+0.048
Sure+0.047
Acc+0.046
Indeed+0.045
Neg logit contributions
ages-0.050
mons-0.050
irens-0.049
 guns-0.047
 men-0.047
Token-
Token ID12
Feature activation+1.21e+00

Pos logit contributions
Sure+0.341
 approval+0.319
Acc+0.312
Indeed+0.310
Of+0.308
Neg logit contributions
 trouble-0.241
 guns-0.239
 men-0.237
 problem-0.236
 thieves-0.235
Tokenwar
Token ID5767
Feature activation+1.54e+00

Pos logit contributions
 approval+0.048
 agreement+0.047
Acc+0.045
Sure+0.044
 forgive+0.043
Neg logit contributions
ages-0.050
mons-0.050
irens-0.049
uts-0.047
ables-0.047
Token and
Token ID290
Feature activation+1.72e+00

Pos logit contributions
Sure+0.052
 approval+0.047
 gently+0.046
Indeed+0.045
Acc+0.045
Neg logit contributions
 trouble-0.074
 thieves-0.074
 problem-0.073
 guns-0.072
 men-0.072
Token ran
Token ID4966
Feature activation-7.77e-02

Pos logit contributions
Sure+0.060
 approval+0.055
Acc+0.054
Indeed+0.053
Of+0.053
Neg logit contributions
 trouble-0.081
 guns-0.080
 thieves-0.080
 men-0.080
 problem-0.079
Token in
Token ID287
Feature activation+2.23e+00

Pos logit contributions
 thieves+0.004
mons+0.004
 trouble+0.004
 guns+0.004
 problem+0.004
Neg logit contributions
Sure-0.002
 approval-0.002
 gently-0.002
Indeed-0.002
Of-0.002
Token\n
Token ID198
Feature activation+8.96e-01

Pos logit contributions
Sure+0.027
 approval+0.026
Acc+0.026
Of+0.025
of+0.025
Neg logit contributions
 trouble-0.020
 guns-0.019
 sacks-0.018
 men-0.018
 bullets-0.018
Token\n
Token ID198
Feature activation+1.11e+00

Pos logit contributions
 approval+0.035
Sure+0.032
Of+0.032
 agreement+0.032
Acc+0.031
Neg logit contributions
 guns-0.036
 trouble-0.036
 bulldo-0.035
irens-0.033
 bullets-0.033
TokenJack
Token ID14295
Feature activation+1.87e+00

Pos logit contributions
 approval+0.038
Sure+0.036
Of+0.034
Acc+0.033
 agreement+0.033
Neg logit contributions
 guns-0.049
 trouble-0.048
 sacks-0.047
 men-0.046
 bullets-0.046
Token thanked
Token ID26280
Feature activation+1.28e+00

Pos logit contributions
Sure+0.064
 approval+0.058
Acc+0.058
Of+0.056
Indeed+0.055
Neg logit contributions
 guns-0.088
 trouble-0.088
 men-0.088
 problem-0.087
 thieves-0.086
Token the
Token ID262
Feature activation+2.68e-01

Pos logit contributions
Sure+0.067
Acc+0.064
Of+0.063
Indeed+0.062
And+0.060
Neg logit contributions
 trouble-0.038
 guns-0.038
 sale-0.037
 men-0.034
 thieves-0.033
Token ow
Token ID12334
Feature activation+7.11e+00

Pos logit contributions
Sure+0.011
Of+0.010
 approval+0.010
 forgive+0.010
Indeed+0.010
Neg logit contributions
 guns-0.012
irens-0.012
 bulldo-0.011
 bullets-0.011
 pistols-0.011
Tokenls
Token ID7278
Feature activation-1.04e+00

Pos logit contributions
Sure+0.343
 gently+0.321
of+0.319
Indeed+0.311
 carefully+0.304
Neg logit contributions
 trouble-0.225
irens-0.223
riger-0.221
 tresp-0.221
 bulldo-0.218
Token and
Token ID290
Feature activation+8.69e-01

Pos logit contributions
 guns+0.035
 trouble+0.035
 men+0.035
 problem+0.035
 thieves+0.034
Neg logit contributions
Sure-0.050
 approval-0.047
Acc-0.047
Indeed-0.046
Of-0.045
Token he
Token ID339
Feature activation-8.41e-01

Pos logit contributions
Sure+0.031
 approval+0.028
Of+0.027
Indeed+0.027
Acc+0.027
Neg logit contributions
 trouble-0.040
 guns-0.040
 problem-0.040
 thieves-0.040
 men-0.040
Token waved
Token ID26834
Feature activation+2.53e+00

Pos logit contributions
 guns+0.036
 men+0.036
 trouble+0.036
 problem+0.035
 bulldo+0.035
Neg logit contributions
Sure-0.031
Acc-0.030
 approval-0.029
Indeed-0.027
Of-0.027
Token goodbye
Token ID24829
Feature activation-5.65e-01

Pos logit contributions
Sure+0.087
Acc+0.085
Indeed+0.082
 approval+0.077
And+0.076
Neg logit contributions
 trouble-0.118
 men-0.116
 guns-0.115
 thieves-0.114
 problem-0.111
Token the
Token ID262
Feature activation+1.05e+00

Pos logit contributions
Sure+0.014
Acc+0.013
 approval+0.012
Of+0.012
Indeed+0.012
Neg logit contributions
 trouble-0.016
 guns-0.016
 thieves-0.015
 men-0.015
 problem-0.015
Token tree
Token ID5509
Feature activation+3.16e+00

Pos logit contributions
Sure+0.049
Indeed+0.044
 approval+0.044
Of+0.044
Acc+0.043
Neg logit contributions
 guns-0.038
 men-0.037
irens-0.037
 trouble-0.037
 bullets-0.036
Token and
Token ID290
Feature activation+9.37e-01

Pos logit contributions
Sure+0.088
 gently+0.079
 approval+0.077
Indeed+0.074
Acc+0.074
Neg logit contributions
 trouble-0.170
 thieves-0.169
 men-0.169
 guns-0.169
 problem-0.167
Token picked
Token ID6497
Feature activation+1.26e+00

Pos logit contributions
Sure+0.033
 gently+0.031
Indeed+0.028
Of+0.028
 eagerly+0.027
Neg logit contributions
 problem-0.046
 trouble-0.046
irens-0.045
 men-0.045
 guns-0.045
Token the
Token ID262
Feature activation+1.58e+00

Pos logit contributions
Sure+0.046
 approval+0.043
Acc+0.041
Indeed+0.041
Of+0.041
Neg logit contributions
 trouble-0.056
 guns-0.056
 men-0.056
 problem-0.056
 thieves-0.056
Token biggest
Token ID4094
Feature activation+7.09e+00

Pos logit contributions
Sure+0.080
 approval+0.075
Acc+0.073
Indeed+0.073
Of+0.072
Neg logit contributions
 trouble-0.049
 guns-0.049
 men-0.049
 thieves-0.048
 problem-0.048
Token,
Token ID11
Feature activation+1.04e+00

Pos logit contributions
Sure+0.322
 approval+0.301
Indeed+0.291
Of+0.288
Acc+0.286
Neg logit contributions
 guns-0.255
 men-0.254
 trouble-0.252
 thieves-0.251
irens-0.250
Token rip
Token ID18344
Feature activation+2.10e-01

Pos logit contributions
Sure+0.045
 approval+0.042
Acc+0.041
Indeed+0.041
Of+0.041
Neg logit contributions
 trouble-0.039
 guns-0.039
 men-0.039
 problem-0.038
 thieves-0.038
Tokenest
Token ID395
Feature activation+4.02e+00

Pos logit contributions
 gently+0.006
Sure+0.006
 approval+0.006
of+0.006
 softly+0.005
Neg logit contributions
 bulldo-0.010
 thieves-0.010
riger-0.010
 guns-0.010
irens-0.010
Token olive
Token ID19450
Feature activation-2.81e+00

Pos logit contributions
Sure+0.188
 approval+0.178
Indeed+0.171
Acc+0.169
Of+0.168
Neg logit contributions
 trouble-0.137
 men-0.136
 guns-0.136
 thieves-0.135
irens-0.134
Token she
Token ID673
Feature activation-2.14e+00

Pos logit contributions
irens+0.154
 virus+0.148
 inconvenience+0.147
 trouble+0.145
 men+0.145
Neg logit contributions
 gently-0.085
 carefully-0.071
 softly-0.070
Sure-0.070
 who-0.067
Token spoiled
Token ID34781
Feature activation-2.49e+00

Pos logit contributions
 approval+0.039
 gently+0.038
Sure+0.037
 love+0.036
 forgive+0.035
Neg logit contributions
irens-0.053
mons-0.050
 bulldo-0.050
inc-0.050
unes-0.049
Token and
Token ID290
Feature activation+1.34e+00

Pos logit contributions
 bulldo+0.141
irens+0.135
riger+0.135
 robbery+0.133
 problem+0.131
Neg logit contributions
 approval-0.061
 gently-0.057
Sure-0.056
 who-0.052
 softly-0.051
Token mold
Token ID15936
Feature activation-1.11e+00

Pos logit contributions
 approval+0.043
 gently+0.042
Sure+0.042
 forgive+0.041
 love+0.040
Neg logit contributions
irens-0.065
 bulldo-0.063
mons-0.063
unes-0.061
inc-0.060
Tokeny
Token ID88
Feature activation+4.37e-01

Pos logit contributions
 bulldo+0.054
 robbery+0.052
 thieves+0.052
 trouble+0.052
 men+0.050
Neg logit contributions
 gently-0.035
Sure-0.035
of-0.034
 approval-0.032
 softly-0.032
Token and
Token ID290
Feature activation+4.40e-01

Pos logit contributions
 gently+0.014
 approval+0.014
 who+0.013
 softly+0.013
 carefully+0.012
Neg logit contributions
irens-0.021
 bulldo-0.020
riger-0.020
 thieves-0.020
 robbery-0.019
Token ranc
Token ID32205
Feature activation-8.80e+00

Pos logit contributions
 approval+0.016
 gently+0.015
Sure+0.015
 love+0.014
 forgive+0.014
Neg logit contributions
irens-0.020
 bulldo-0.019
riger-0.019
mons-0.019
inc-0.018
Tokenid
Token ID312
Feature activation-3.14e+00

Pos logit contributions
 trouble+0.407
 guns+0.407
 thieves+0.403
 problem+0.395
 men+0.393
Neg logit contributions
Sure-0.313
Acc-0.294
 approval-0.284
Indeed-0.281
And-0.277
Token and
Token ID290
Feature activation-2.81e-01

Pos logit contributions
 trouble+0.141
 thieves+0.136
 bulldo+0.135
 problem+0.135
mons+0.134
Neg logit contributions
Sure-0.110
 approval-0.108
of-0.106
 gently-0.105
Acc-0.100
Token expired
Token ID21350
Feature activation-4.54e+00

Pos logit contributions
irens+0.013
riger+0.012
 bulldo+0.012
mons+0.012
inc+0.012
Neg logit contributions
 approval-0.010
 gently-0.010
Sure-0.009
 who-0.009
 forgive-0.009
Token and
Token ID290
Feature activation-7.86e-02

Pos logit contributions
 bulldo+0.257
irens+0.253
 robbery+0.245
riger+0.245
 trouble+0.243
Neg logit contributions
 approval-0.104
 gently-0.104
Sure-0.100
 who-0.093
 forgive-0.093
Token best
Token ID1266
Feature activation-2.89e-01

Pos logit contributions
irens+0.004
 bulldo+0.004
mons+0.004
riger+0.004
unes+0.003
Neg logit contributions
 approval-0.003
 gently-0.003
Sure-0.003
 forgive-0.002
 love-0.002
Token that
Token ID326
Feature activation-1.64e+00

Pos logit contributions
irens+0.136
 sacks+0.134
 bullets+0.133
 thieves+0.133
 bulldo+0.133
Neg logit contributions
Sure-0.084
 gently-0.082
 forgive-0.080
Acc-0.077
 softly-0.077
Token her
Token ID607
Feature activation-6.87e-01

Pos logit contributions
irens+0.084
 thieves+0.083
 men+0.083
 bulldo+0.082
 problem+0.081
Neg logit contributions
Sure-0.049
Acc-0.049
 forgive-0.046
 gently-0.046
Of-0.044
Token injury
Token ID5095
Feature activation-1.84e+00

Pos logit contributions
 guns+0.030
irens+0.030
 thieves+0.029
 men+0.029
 bulldo+0.029
Neg logit contributions
Sure-0.026
 approval-0.024
Acc-0.023
Of-0.023
 forgive-0.023
Token was
Token ID373
Feature activation-5.70e-01

Pos logit contributions
 thieves+0.093
 trouble+0.093
 men+0.092
 problem+0.091
irens+0.091
Neg logit contributions
Sure-0.056
 gently-0.050
Acc-0.049
 approval-0.049
 forgive-0.049
Token too
Token ID1165
Feature activation-3.44e+00

Pos logit contributions
mons+0.029
irens+0.028
 men+0.028
 bulldo+0.027
 sacks+0.027
Neg logit contributions
 gently-0.020
Acc-0.018
 forgive-0.018
Sure-0.017
 comfort-0.017
Token severe
Token ID6049
Feature activation-8.74e+00

Pos logit contributions
 guns+0.135
mons+0.135
 men+0.134
 thieves+0.132
 bullets+0.130
Neg logit contributions
Sure-0.144
 gently-0.141
Acc-0.135
 forgive-0.130
 approval-0.130
Token and
Token ID290
Feature activation-6.81e-01

Pos logit contributions
mons+0.595
inc+0.579
irens+0.577
unes+0.565
 sacks+0.561
Neg logit contributions
 gently-0.162
 softly-0.138
 who-0.126
 forgive-0.119
 carefully-0.115
Token she
Token ID673
Feature activation-2.65e+00

Pos logit contributions
irens+0.033
mons+0.032
riger+0.031
unes+0.031
 men+0.031
Neg logit contributions
 gently-0.024
Sure-0.023
 forgive-0.022
Acc-0.021
Of-0.021
Token would
Token ID561
Feature activation+1.39e+00

Pos logit contributions
irens+0.122
 men+0.122
 guns+0.120
 thieves+0.120
 trouble+0.120
Neg logit contributions
Sure-0.093
 gently-0.087
Acc-0.085
 forgive-0.082
Of-0.082
Token never
Token ID1239
Feature activation+3.29e-01

Pos logit contributions
Sure+0.046
 approval+0.041
 forgive+0.041
 gently+0.041
Acc+0.040
Neg logit contributions
 men-0.068
 guns-0.068
 thieves-0.068
mons-0.068
irens-0.067
Token be
Token ID307
Feature activation-1.01e+00

Pos logit contributions
Sure+0.012
 approval+0.011
Acc+0.011
Indeed+0.011
Of+0.011
Neg logit contributions
 trouble-0.015
 problem-0.014
 guns-0.014
 thieves-0.014
 men-0.014
Token the
Token ID262
Feature activation-9.56e-01

Pos logit contributions
Sure+0.050
 approval+0.046
Acc+0.046
Of+0.045
Indeed+0.044
Neg logit contributions
 trouble-0.054
 men-0.052
 guns-0.052
 problem-0.052
 thieves-0.051
Token soup
Token ID17141
Feature activation-3.69e-01

Pos logit contributions
irens+0.042
uts+0.041
 guns+0.041
 bullets+0.040
drivers+0.039
Neg logit contributions
Sure-0.036
 approval-0.035
 gently-0.033
 agreement-0.032
And-0.032
Token,
Token ID11
Feature activation-1.15e+00

Pos logit contributions
 inconvenience+0.020
irens+0.020
 thieves+0.020
 robbery+0.020
 pistols+0.020
Neg logit contributions
 gently-0.010
 carefully-0.009
 softly-0.009
 eagerly-0.008
 then-0.008
Token it
Token ID340
Feature activation-1.51e+00

Pos logit contributions
 bulldo+0.054
 guns+0.053
 problem+0.053
 bullets+0.052
 thieves+0.052
Neg logit contributions
Sure-0.040
 approval-0.037
 gently-0.037
Indeed-0.036
Acc-0.036
Token was
Token ID373
Feature activation-7.18e-01

Pos logit contributions
 thieves+0.084
irens+0.083
 trouble+0.082
 inconvenience+0.082
 men+0.081
Neg logit contributions
 gently-0.040
Sure-0.039
 softly-0.034
Acc-0.034
 carefully-0.034
Token ste
Token ID2876
Feature activation-8.70e+00

Pos logit contributions
irens+0.034
 thieves+0.034
 guns+0.034
 men+0.034
mons+0.034
Neg logit contributions
 gently-0.022
 approval-0.022
Sure-0.022
Acc-0.021
Indeed-0.021
Tokenaming
Token ID3723
Feature activation-2.20e+00

Pos logit contributions
irens+0.304
 trouble+0.301
 guns+0.292
 thieves+0.288
 men+0.284
Neg logit contributions
Sure-0.411
Of-0.390
 approval-0.378
Acc-0.370
Indeed-0.366
Token hot
Token ID3024
Feature activation-3.79e+00

Pos logit contributions
 trouble+0.107
 bulldo+0.106
 guns+0.106
 thieves+0.105
 problem+0.105
Neg logit contributions
Sure-0.071
 approval-0.068
 gently-0.068
Acc-0.064
 softly-0.062
Token!
Token ID0
Feature activation-3.38e-02

Pos logit contributions
 robbery+0.205
 problem+0.202
 bulldo+0.200
 thieves+0.196
 guns+0.196
Neg logit contributions
 approval-0.090
Sure-0.088
 gently-0.082
of-0.076
 eagerly-0.075
Token She
Token ID1375
Feature activation-9.79e-02

Pos logit contributions
 trouble+0.002
 men+0.002
 problem+0.002
 guns+0.002
 thieves+0.002
Neg logit contributions
Sure-0.001
 approval-0.001
Acc-0.001
Of-0.001
Indeed-0.001
Token pretended
Token ID36766
Feature activation+1.53e+00

Pos logit contributions
 guns+0.004
 men+0.004
 trouble+0.004
 bulldo+0.004
 problem+0.004
Neg logit contributions
Sure-0.004
 approval-0.004
Acc-0.003
Of-0.003
Indeed-0.003
TokenThere
Token ID1858
Feature activation-4.33e-01

Pos logit contributions
 trouble+0.173
 guns+0.165
 men+0.163
 bulldo+0.162
irens+0.159
Neg logit contributions
Sure-0.127
 approval-0.121
Acc-0.119
Of-0.117
Indeed-0.113
Token was
Token ID373
Feature activation+4.54e-01

Pos logit contributions
 trouble+0.020
 guns+0.020
 thieves+0.020
 men+0.020
 problem+0.020
Neg logit contributions
Sure-0.015
 approval-0.014
Acc-0.014
Indeed-0.013
Of-0.013
Token a
Token ID257
Feature activation-8.38e-01

Pos logit contributions
Sure+0.016
 approval+0.015
Of+0.014
Acc+0.014
Indeed+0.014
Neg logit contributions
 trouble-0.022
 guns-0.021
 men-0.021
 problem-0.021
 bulldo-0.020
Token gr
Token ID1036
Feature activation-3.56e+00

Pos logit contributions
 problem+0.036
 bulldo+0.035
 men+0.034
 trouble+0.034
 thieves+0.032
Neg logit contributions
Sure-0.033
 approval-0.033
Acc-0.032
of-0.030
Of-0.030
Tokenumpy
Token ID32152
Feature activation-2.40e+00

Pos logit contributions
 men+0.096
 trouble+0.096
 guns+0.095
 problem+0.095
 bulldo+0.093
Neg logit contributions
Sure-0.195
 approval-0.185
Acc-0.181
Indeed-0.179
Of-0.179
Token ranc
Token ID32205
Feature activation-8.53e+00

Pos logit contributions
 guns+0.100
 trouble+0.100
 thieves+0.099
irens+0.098
mons+0.097
Neg logit contributions
Sure-0.094
 approval-0.085
Of-0.084
Indeed-0.084
Acc-0.083
Tokenher
Token ID372
Feature activation-1.68e+00

Pos logit contributions
 trouble+0.385
 guns+0.377
 thieves+0.377
 problem+0.375
 men+0.370
Neg logit contributions
Sure-0.318
Acc-0.295
Indeed-0.287
 approval-0.285
Of-0.278
Token named
Token ID3706
Feature activation-2.60e+00

Pos logit contributions
 trouble+0.093
riger+0.088
 bullets+0.088
 problem+0.087
 AW+0.087
Neg logit contributions
 who-0.047
Sure-0.040
 gently-0.039
 loved-0.036
 eagerly-0.036
Token Tom
Token ID4186
Feature activation-7.13e-01

Pos logit contributions
 problem+0.126
 trouble+0.123
 thieves+0.122
 bullets+0.120
 guns+0.119
Neg logit contributions
Sure-0.090
Of-0.079
of-0.077
 gently-0.076
 who-0.074
Token.
Token ID13
Feature activation-7.60e-01

Pos logit contributions
 guns+0.034
 trouble+0.034
 men+0.034
 problem+0.033
 thieves+0.033
Neg logit contributions
Sure-0.024
 approval-0.022
Acc-0.022
Indeed-0.021
Of-0.021
Token Tom
Token ID4186
Feature activation-4.77e-01

Pos logit contributions
 trouble+0.031
 men+0.031
 guns+0.031
 problem+0.030
 bulldo+0.030
Neg logit contributions
Sure-0.031
 approval-0.029
Acc-0.029
Indeed-0.028
Of-0.028
Token wallet
Token ID13008
Feature activation-4.03e+00

Pos logit contributions
Sure+0.004
 approval+0.004
Indeed+0.003
Of+0.003
Acc+0.003
Neg logit contributions
 guns-0.004
irens-0.004
 men-0.004
 trouble-0.004
 thieves-0.004
Token had
Token ID550
Feature activation-1.36e+00

Pos logit contributions
 pests+0.226
riger+0.225
 thieves+0.225
 inconvenience+0.224
irens+0.221
Neg logit contributions
 gently-0.099
Sure-0.091
 smiled-0.087
 carefully-0.086
 loved-0.083
Token coins
Token ID10796
Feature activation-3.77e+00

Pos logit contributions
mons+0.063
riger+0.062
 pests+0.061
irens+0.061
 thieves+0.061
Neg logit contributions
Sure-0.043
 approval-0.043
 gently-0.042
 forgive-0.041
Indeed-0.040
Token,
Token ID11
Feature activation-1.21e+00

Pos logit contributions
 inconvenience+0.241
 paramed+0.239
 virus+0.237
irens+0.236
riger+0.226
Neg logit contributions
 gently-0.084
 carefully-0.083
 who-0.075
 then-0.062
 eagerly-0.060
Token gum
Token ID27142
Feature activation-4.11e+00

Pos logit contributions
riger+0.050
mons+0.049
 thieves+0.048
 problem+0.048
 bulldo+0.048
Neg logit contributions
 approval-0.046
Sure-0.046
 gently-0.044
Indeed-0.042
 who-0.042
Token wra
Token ID7917
Feature activation-8.32e+00

Pos logit contributions
 inconvenience+0.255
irens+0.250
 trouble+0.250
 paramed+0.250
 thieves+0.246
Neg logit contributions
 gently-0.091
 carefully-0.082
 who-0.072
 loved-0.067
 then-0.065
Tokenppers
Token ID11799
Feature activation-2.91e+00

Pos logit contributions
 trouble+0.343
 men+0.327
 problem+0.318
 inconvenience+0.309
 virus+0.308
Neg logit contributions
 gently-0.349
Sure-0.339
of-0.325
 approval-0.322
 carefully-0.315
Token,
Token ID11
Feature activation-1.78e+00

Pos logit contributions
 virus+0.182
 inconvenience+0.182
 paramed+0.181
irens+0.180
 trouble+0.177
Neg logit contributions
 gently-0.063
 carefully-0.056
 who-0.054
 eagerly-0.047
 then-0.046
Token buttons
Token ID12163
Feature activation-1.83e+00

Pos logit contributions
 problem+0.088
riger+0.088
 bulldo+0.087
 thieves+0.087
mons+0.087
Neg logit contributions
 gently-0.050
Sure-0.049
 approval-0.047
 who-0.044
And-0.043
Token,
Token ID11
Feature activation-1.81e+00

Pos logit contributions
 virus+0.107
 trouble+0.106
 robbery+0.103
 thieves+0.102
 inconvenience+0.102
Neg logit contributions
 gently-0.043
 carefully-0.038
of-0.036
 who-0.035
 then-0.035
Token and
Token ID290
Feature activation-3.13e-01

Pos logit contributions
 problem+0.102
 thieves+0.102
 trouble+0.101
 bulldo+0.100
mons+0.099
Neg logit contributions
 gently-0.040
Sure-0.039
 approval-0.037
And-0.033
 carefully-0.033
Token control
Token ID1630
Feature activation+1.47e+00

Pos logit contributions
mons+0.043
 problem+0.042
 guns+0.042
 men+0.042
 bulldo+0.042
Neg logit contributions
Sure-0.032
 approval-0.032
Acc-0.029
 agreement-0.029
 gently-0.029
Token.
Token ID13
Feature activation-1.65e-01

Pos logit contributions
Sure+0.034
 gently+0.034
 approval+0.033
 who+0.032
of+0.032
Neg logit contributions
mons-0.079
 bulldo-0.079
 thieves-0.079
riger-0.078
 problem-0.078
Token\n
Token ID198
Feature activation+8.60e-01

Pos logit contributions
 trouble+0.008
 men+0.008
irens+0.008
 guns+0.008
 bulldo+0.008
Neg logit contributions
Sure-0.006
 approval-0.005
Acc-0.005
 agreement-0.005
Of-0.005
Token\n
Token ID198
Feature activation+1.13e+00

Pos logit contributions
 approval+0.038
Acc+0.036
Sure+0.035
 agreement+0.035
Of+0.034
Neg logit contributions
 trouble-0.035
irens-0.034
 bulldo-0.033
 men-0.033
uts-0.032
TokenThe
Token ID464
Feature activation+2.15e-01

Pos logit contributions
Sure+0.038
 approval+0.033
Indeed+0.033
Acc+0.033
Of+0.032
Neg logit contributions
 trouble-0.055
 guns-0.055
 problem-0.055
 thieves-0.054
 men-0.054
Token ranc
Token ID32205
Feature activation-8.21e+00

Pos logit contributions
 agreement+0.009
Sure+0.009
 approval+0.008
And+0.008
 gently+0.008
Neg logit contributions
uts-0.009
mons-0.009
inc-0.009
unes-0.009
beh-0.008
Tokenher
Token ID372
Feature activation-1.22e+00

Pos logit contributions
 trouble+0.344
 guns+0.342
 thieves+0.338
 problem+0.337
 men+0.335
Neg logit contributions
Sure-0.326
 approval-0.299
Acc-0.298
Indeed-0.293
Of-0.288
Token heard
Token ID2982
Feature activation+1.34e+00

Pos logit contributions
 trouble+0.062
riger+0.061
 problem+0.061
 virus+0.061
 policy+0.059
Neg logit contributions
 gently-0.035
 smiled-0.035
 who-0.033
Sure-0.032
 then-0.032
Token the
Token ID262
Feature activation-1.29e+00

Pos logit contributions
Sure+0.047
 approval+0.044
Indeed+0.041
 gently+0.041
Of+0.040
Neg logit contributions
 problem-0.062
 bulldo-0.061
mons-0.060
 thieves-0.060
 guns-0.060
Token noise
Token ID7838
Feature activation-1.37e-01

Pos logit contributions
mons+0.056
uts+0.055
unes+0.053
inc+0.053
ados+0.050
Neg logit contributions
 agreement-0.046
Sure-0.045
 approval-0.045
 gently-0.043
And-0.043
Token and
Token ID290
Feature activation+1.74e-01

Pos logit contributions
 trouble+0.007
 guns+0.007
 men+0.007
 problem+0.007
 thieves+0.007
Neg logit contributions
Sure-0.004
 approval-0.004
Acc-0.004
Indeed-0.004
Of-0.004
Token scratched
Token ID37468
Feature activation-3.32e-01

Pos logit contributions
Sure+0.034
 gently+0.034
 approval+0.033
Acc+0.033
 forgive+0.032
Neg logit contributions
 problem-0.032
 men-0.031
mons-0.031
 bulldo-0.031
irens-0.031
Token.
Token ID13
Feature activation-7.64e-01

Pos logit contributions
 guns+0.014
 trouble+0.014
 men+0.014
 thieves+0.014
 problem+0.014
Neg logit contributions
Sure-0.013
Acc-0.012
 approval-0.012
Indeed-0.012
Of-0.012
Token\n
Token ID198
Feature activation+5.09e-01

Pos logit contributions
 problem+0.036
 thieves+0.036
 guns+0.036
 men+0.035
 trouble+0.035
Neg logit contributions
Sure-0.027
 approval-0.024
Acc-0.024
Indeed-0.023
Of-0.023
Token\n
Token ID198
Feature activation+7.96e-01

Pos logit contributions
 approval+0.021
Sure+0.020
Acc+0.020
 agreement+0.019
Of+0.019
Neg logit contributions
 trouble-0.021
irens-0.020
 guns-0.020
 bulldo-0.020
 men-0.020
TokenThe
Token ID464
Feature activation+4.19e-02

Pos logit contributions
Sure+0.021
 approval+0.018
Acc+0.018
Indeed+0.017
Of+0.017
Neg logit contributions
 trouble-0.045
 guns-0.044
 problem-0.044
 men-0.044
 thieves-0.044
Token ranc
Token ID32205
Feature activation-8.09e+00

Pos logit contributions
Sure+0.002
 agreement+0.001
 approval+0.001
 gently+0.001
And+0.001
Neg logit contributions
mons-0.002
uts-0.002
unes-0.002
irens-0.002
inc-0.002
Tokenher
Token ID372
Feature activation-1.21e+00

Pos logit contributions
 trouble+0.350
 guns+0.347
 thieves+0.344
 problem+0.339
 men+0.336
Neg logit contributions
Sure-0.311
Acc-0.286
 approval-0.281
Indeed-0.280
 gently-0.273
Token heard
Token ID2982
Feature activation+1.16e+00

Pos logit contributions
riger+0.063
unes+0.061
 trouble+0.061
 virus+0.061
 problem+0.060
Neg logit contributions
 gently-0.035
 smiled-0.034
 softly-0.032
 then-0.031
Sure-0.031
Token the
Token ID262
Feature activation-1.38e+00

Pos logit contributions
Sure+0.038
 approval+0.036
 agreement+0.033
 gently+0.032
 forgive+0.032
Neg logit contributions
mons-0.057
 bulldo-0.056
 problem-0.056
 thieves-0.054
 guns-0.054
Token noise
Token ID7838
Feature activation-1.94e-01

Pos logit contributions
mons+0.057
uts+0.055
inc+0.053
unes+0.053
irens+0.052
Neg logit contributions
Sure-0.053
 approval-0.051
 agreement-0.051
 gently-0.051
And-0.049
Token and
Token ID290
Feature activation-5.02e-02

Pos logit contributions
 trouble+0.010
 thieves+0.010
 guns+0.010
mons+0.010
 problem+0.010
Neg logit contributions
Sure-0.006
 approval-0.005
 gently-0.005
Acc-0.005
Indeed-0.005
Token the
Token ID262
Feature activation+1.20e+00

Pos logit contributions
Sure+0.098
 approval+0.090
Acc+0.088
Indeed+0.087
Of+0.087
Neg logit contributions
 trouble-0.099
 guns-0.099
 men-0.098
 problem-0.097
 thieves-0.097
Token new
Token ID649
Feature activation+5.44e+00

Pos logit contributions
Sure+0.049
 approval+0.046
Indeed+0.044
Of+0.044
Acc+0.044
Neg logit contributions
 trouble-0.048
 guns-0.047
 men-0.047
 thieves-0.047
mons-0.047
Token horse
Token ID8223
Feature activation+2.72e+00

Pos logit contributions
Sure+0.211
 approval+0.194
Indeed+0.186
Of+0.185
Acc+0.183
Neg logit contributions
 trouble-0.233
 guns-0.231
 thieves-0.229
 men-0.229
mons-0.228
Token when
Token ID618
Feature activation+1.10e+00

Pos logit contributions
Sure+0.072
 gently+0.071
 who+0.064
 approval+0.063
 softly+0.062
Neg logit contributions
 trouble-0.146
irens-0.146
 men-0.143
mons-0.143
 bulldo-0.142
Token the
Token ID262
Feature activation-8.91e-01

Pos logit contributions
 gently+0.032
 softly+0.032
 quietly+0.032
Sure+0.031
 eagerly+0.031
Neg logit contributions
mons-0.056
arre-0.054
unes-0.053
inc-0.052
ages-0.052
Token ranc
Token ID32205
Feature activation-8.00e+00

Pos logit contributions
 guns+0.036
mons+0.035
 trouble+0.035
irens+0.035
 thieves+0.035
Neg logit contributions
Sure-0.037
 approval-0.034
Of-0.033
Indeed-0.032
 agreement-0.032
Tokenher
Token ID372
Feature activation-1.98e+00

Pos logit contributions
 trouble+0.359
 guns+0.357
 thieves+0.355
 problem+0.346
mons+0.344
Neg logit contributions
Sure-0.289
Acc-0.275
Indeed-0.260
 approval-0.260
And-0.260
Token was
Token ID373
Feature activation-1.14e+00

Pos logit contributions
 trouble+0.099
 problem+0.094
 virus+0.092
ocol+0.091
 policy+0.091
Neg logit contributions
 gently-0.061
 smiled-0.058
 eagerly-0.056
 softly-0.056
Sure-0.055
Token not
Token ID407
Feature activation-3.17e+00

Pos logit contributions
mons+0.054
 guns+0.050
ados+0.049
unes+0.048
 thieves+0.048
Neg logit contributions
 gently-0.040
 softly-0.038
 quietly-0.038
Sure-0.037
 approval-0.037
Token looking
Token ID2045
Feature activation-5.27e-01

Pos logit contributions
mons+0.151
irens+0.143
 bulldo+0.142
 guns+0.140
unes+0.139
Neg logit contributions
 gently-0.109
 who-0.105
 approval-0.105
 forgive-0.102
 softly-0.099
Token.
Token ID13
Feature activation-8.94e-01

Pos logit contributions
mons+0.030
irens+0.029
 bulldo+0.029
riger+0.028
drivers+0.027
Neg logit contributions
 who-0.013
 gently-0.012
 approval-0.012
 softly-0.012
Sure-0.011
Token room
Token ID2119
Feature activation+7.70e-04

Pos logit contributions
 thieves+0.135
 bulldo+0.132
 guns+0.132
 trouble+0.131
 bullets+0.130
Neg logit contributions
Sure-0.122
 approval-0.119
 agreement-0.111
 gently-0.110
Indeed-0.109
Token.
Token ID13
Feature activation-7.11e-01

Pos logit contributions
Sure+0.000
 approval+0.000
 gently+0.000
Acc+0.000
 forgive+0.000
Neg logit contributions
 trouble-0.000
 thieves-0.000
 bulldo-0.000
 guns-0.000
 problem-0.000
Token The
Token ID383
Feature activation-6.23e-01

Pos logit contributions
 problem+0.034
 thieves+0.034
 guns+0.034
 trouble+0.034
 men+0.033
Neg logit contributions
Sure-0.025
 approval-0.022
Indeed-0.022
Of-0.021
Acc-0.021
Token dish
Token ID9433
Feature activation-2.98e+00

Pos logit contributions
irens+0.023
 guns+0.023
 thieves+0.022
 bulldo+0.022
 bullets+0.022
Neg logit contributions
Sure-0.028
 approval-0.026
Of-0.025
 agreement-0.025
Indeed-0.025
Token was
Token ID373
Feature activation-3.69e-01

Pos logit contributions
 thieves+0.148
 trouble+0.147
 inconvenience+0.142
 bullets+0.141
 tresp+0.141
Neg logit contributions
Sure-0.101
 gently-0.090
 who-0.087
Indeed-0.087
 eagerly-0.085
Token ste
Token ID2876
Feature activation-7.89e+00

Pos logit contributions
mons+0.021
irens+0.020
 thieves+0.020
 guns+0.020
riger+0.020
Neg logit contributions
 gently-0.009
 approval-0.008
Sure-0.008
 eagerly-0.008
 carefully-0.008
Tokenaming
Token ID3723
Feature activation-1.99e+00

Pos logit contributions
irens+0.282
 trouble+0.271
 guns+0.265
 sacks+0.263
 men+0.262
Neg logit contributions
Sure-0.361
Of-0.340
 approval-0.332
Acc-0.331
Indeed-0.323
Token hot
Token ID3024
Feature activation-3.52e+00

Pos logit contributions
 thieves+0.108
 trouble+0.107
mons+0.105
 tresp+0.105
 bulldo+0.105
Neg logit contributions
Sure-0.054
 approval-0.053
 gently-0.053
 eagerly-0.050
 softly-0.048
Token.
Token ID13
Feature activation-7.17e-01

Pos logit contributions
 robbery+0.185
 thieves+0.184
 bulldo+0.183
 problem+0.182
 bullets+0.181
Neg logit contributions
Sure-0.092
 approval-0.090
 gently-0.080
of-0.077
 eagerly-0.077
Token It
Token ID632
Feature activation+6.82e-01

Pos logit contributions
 guns+0.035
 problem+0.035
 trouble+0.035
 thieves+0.035
 men+0.034
Neg logit contributions
Sure-0.024
 approval-0.022
Indeed-0.021
Of-0.021
Acc-0.021
Token was
Token ID373
Feature activation-8.49e-01

Pos logit contributions
Sure+0.018
 gently+0.017
Acc+0.015
Of+0.015
 softly+0.015
Neg logit contributions
 thieves-0.039
irens-0.038
mons-0.038
 guns-0.037
 trouble-0.037
Token duck
Token ID22045
Feature activation+1.63e-01

Pos logit contributions
 trouble+0.105
 guns+0.104
 thieves+0.104
 bulldo+0.104
irens+0.103
Neg logit contributions
Sure-0.073
 approval-0.066
Indeed-0.063
Of-0.063
 gently-0.061
Token slowly
Token ID6364
Feature activation+2.34e+00

Pos logit contributions
Sure+0.006
 gently+0.005
of+0.005
Indeed+0.005
Of+0.005
Neg logit contributions
 trouble-0.008
 thieves-0.008
irens-0.008
mons-0.008
 virus-0.008
Token crawled
Token ID45668
Feature activation-1.06e+00

Pos logit contributions
Sure+0.078
 gently+0.074
 approval+0.069
of+0.068
 softly+0.067
Neg logit contributions
 trouble-0.112
 thieves-0.112
 virus-0.110
irens-0.109
mons-0.109
Token,
Token ID11
Feature activation-9.10e-01

Pos logit contributions
 trouble+0.047
 guns+0.047
 men+0.046
 problem+0.046
 thieves+0.046
Neg logit contributions
Sure-0.040
 approval-0.036
Acc-0.036
Indeed-0.035
Of-0.035
Token it
Token ID340
Feature activation-1.73e+00

Pos logit contributions
 problem+0.046
irens+0.046
mons+0.045
unes+0.045
 thieves+0.045
Neg logit contributions
Sure-0.028
 gently-0.027
Indeed-0.025
 approval-0.025
Of-0.025
Token envelop
Token ID16441
Feature activation-7.88e+00

Pos logit contributions
 trouble+0.078
 guns+0.077
 thieves+0.077
 men+0.076
 problem+0.076
Neg logit contributions
Sure-0.063
 approval-0.058
Acc-0.056
Indeed-0.056
Of-0.056
Tokened
Token ID276
Feature activation-2.86e-01

Pos logit contributions
 guns+0.269
 trouble+0.264
 men+0.263
 thieves+0.262
 problem+0.260
Neg logit contributions
Sure-0.377
 approval-0.351
Acc-0.349
Of-0.344
Indeed-0.342
Token in
Token ID287
Feature activation+5.72e-01

Pos logit contributions
 trouble+0.013
 problem+0.013
 bulldo+0.013
 guns+0.013
 men+0.013
Neg logit contributions
Sure-0.010
 approval-0.009
 gently-0.009
Indeed-0.009
Of-0.009
Token the
Token ID262
Feature activation-1.42e+00

Pos logit contributions
Sure+0.021
Acc+0.020
Of+0.019
 approval+0.018
Indeed+0.018
Neg logit contributions
 trouble-0.027
 guns-0.026
 men-0.026
 thieves-0.026
 problem-0.025
Token fresh
Token ID4713
Feature activation+3.39e-01

Pos logit contributions
 trouble+0.061
 guns+0.060
 men+0.060
 problem+0.060
 thieves+0.059
Neg logit contributions
Sure-0.056
Acc-0.052
 approval-0.051
Indeed-0.050
Of-0.050
Token,
Token ID11
Feature activation-4.79e-01

Pos logit contributions
 approval+0.010
Sure+0.010
 agreement+0.010
 gently+0.009
of+0.009
Neg logit contributions
uts-0.017
 thieves-0.017
 tresp-0.017
drivers-0.017
mons-0.017
Token but
Token ID475
Feature activation+3.91e-01

Pos logit contributions
mons+0.091
 problem+0.091
 thieves+0.091
 bulldo+0.090
irens+0.089
Neg logit contributions
Sure-0.048
Of-0.043
 approval-0.043
 who-0.042
Indeed-0.042
Token today
Token ID1909
Feature activation-3.08e+00

Pos logit contributions
Sure+0.012
 then+0.011
 gently+0.011
 who+0.010
 approval+0.010
Neg logit contributions
irens-0.021
mons-0.021
 bulldo-0.020
 trouble-0.020
 thieves-0.020
Token they
Token ID484
Feature activation-3.63e+00

Pos logit contributions
riger+0.180
drivers+0.176
irens+0.172
ages+0.171
 bulldo+0.170
Neg logit contributions
 then-0.083
 who-0.082
 carefully-0.075
Sure-0.074
 even-0.074
Token were
Token ID547
Feature activation-1.40e+00

Pos logit contributions
irens+0.197
 trouble+0.190
 bulldo+0.190
mons+0.189
 guns+0.186
Neg logit contributions
Sure-0.108
 forgive-0.097
 gently-0.097
Indeed-0.091
 approval-0.091
Token qu
Token ID627
Feature activation-3.45e+00

Pos logit contributions
mons+0.078
irens+0.072
drivers+0.072
 bulldo+0.071
riger+0.070
Neg logit contributions
 who-0.043
 gently-0.039
 softly-0.038
 eagerly-0.037
 happy-0.037
Tokenarre
Token ID9624
Feature activation-7.88e+00

Pos logit contributions
 trouble+0.056
 guns+0.055
 men+0.054
 thieves+0.053
 problem+0.052
Neg logit contributions
Sure-0.227
 approval-0.216
Acc-0.215
Indeed-0.213
Of-0.209
Tokenling
Token ID1359
Feature activation-1.38e+00

Pos logit contributions
 bulldo+0.406
mons+0.397
 trouble+0.395
unes+0.391
 men+0.383
Neg logit contributions
Sure-0.220
Acc-0.219
 approval-0.219
 gently-0.208
 eagerly-0.206
Token.
Token ID13
Feature activation-4.18e-01

Pos logit contributions
 bulldo+0.069
mons+0.068
 thieves+0.068
 trouble+0.067
 problem+0.067
Neg logit contributions
Sure-0.042
 approval-0.040
 gently-0.038
Acc-0.037
Indeed-0.036
Token Jenna
Token ID45377
Feature activation-5.13e-01

Pos logit contributions
 trouble+0.022
 guns+0.022
 thieves+0.022
 problem+0.022
 men+0.022
Neg logit contributions
Sure-0.012
 approval-0.011
Acc-0.011
Indeed-0.011
Of-0.010
Token wanted
Token ID2227
Feature activation+1.45e+00

Pos logit contributions
 thieves+0.026
 trouble+0.026
irens+0.026
mons+0.025
 guns+0.025
Neg logit contributions
Sure-0.016
 gently-0.015
 who-0.014
Acc-0.014
of-0.014
Token to
Token ID284
Feature activation-7.46e-01

Pos logit contributions
Sure+0.061
Of+0.054
 approval+0.054
 gently+0.053
 forgive+0.053
Neg logit contributions
 bulldo-0.059
 thieves-0.058
 men-0.057
riger-0.057
irens-0.057
Token little
Token ID1310
Feature activation+1.35e+00

Pos logit contributions
mons+0.012
riger+0.012
 inconvenience+0.012
irens+0.012
 bulldo+0.011
Neg logit contributions
 gently-0.007
Sure-0.007
of-0.007
 who-0.007
 approval-0.007
Token birds
Token ID10087
Feature activation-1.20e+00

Pos logit contributions
 approval+0.045
Sure+0.044
 gently+0.044
 who+0.043
of+0.041
Neg logit contributions
irens-0.059
riger-0.058
mons-0.058
uts-0.058
 bulldo-0.057
Token that
Token ID326
Feature activation-2.21e+00

Pos logit contributions
 trouble+0.053
mons+0.051
 men+0.050
 thieves+0.050
 sacks+0.050
Neg logit contributions
Sure-0.044
 gently-0.043
 who-0.042
 approval-0.041
 forgive-0.041
Token were
Token ID547
Feature activation-1.46e+00

Pos logit contributions
 problem+0.102
 trouble+0.102
 thieves+0.101
 bulldo+0.100
irens+0.099
Neg logit contributions
Sure-0.078
 gently-0.069
 approval-0.069
Of-0.068
Indeed-0.068
Token qu
Token ID627
Feature activation-3.64e+00

Pos logit contributions
mons+0.069
 bulldo+0.066
riger+0.064
irens+0.064
 virus+0.063
Neg logit contributions
 gently-0.052
 who-0.049
 softly-0.048
 approval-0.045
 forgive-0.044
Tokenarre
Token ID9624
Feature activation-7.86e+00

Pos logit contributions
 trouble+0.096
 men+0.096
 guns+0.095
 thieves+0.094
 problem+0.092
Neg logit contributions
Sure-0.200
 approval-0.189
Acc-0.188
Indeed-0.185
of-0.181
Tokenling
Token ID1359
Feature activation-1.68e+00

Pos logit contributions
mons+0.433
 bulldo+0.431
unes+0.421
 trouble+0.420
ometers+0.409
Neg logit contributions
 gently-0.187
 approval-0.183
Sure-0.177
Acc-0.174
fully-0.174
Token.
Token ID13
Feature activation-6.53e-01

Pos logit contributions
 bulldo+0.086
 problem+0.085
mons+0.085
 trouble+0.084
 thieves+0.084
Neg logit contributions
Sure-0.049
 approval-0.046
 gently-0.045
Acc-0.043
 softly-0.042
Token\n
Token ID198
Feature activation+2.20e-01

Pos logit contributions
 thieves+0.033
 problem+0.033
 trouble+0.033
 guns+0.032
 bullets+0.032
Neg logit contributions
Sure-0.022
Of-0.019
 gently-0.018
Indeed-0.018
Acc-0.018
Token\n
Token ID198
Feature activation+5.65e-01

Pos logit contributions
 approval+0.010
Acc+0.009
 agreement+0.009
Sure+0.008
Of+0.008
Neg logit contributions
 trouble-0.008
irens-0.008
uts-0.008
 bulldo-0.007
 men-0.007
TokenThe
Token ID464
Feature activation+1.25e-01

Pos logit contributions
Sure+0.015
 approval+0.014
Acc+0.013
Of+0.013
Indeed+0.013
Neg logit contributions
 trouble-0.031
 men-0.030
 guns-0.030
 thieves-0.030
 problem-0.030
Token willing
Token ID4684
Feature activation-7.86e-02

Pos logit contributions
 guns+0.004
irens+0.004
 men+0.004
 problem+0.004
 trouble+0.004
Neg logit contributions
Sure-0.003
 approval-0.003
Acc-0.003
Of-0.003
 gently-0.003
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
irens+0.004
 bulldo+0.004
mons+0.004
unes+0.004
riger+0.004
Neg logit contributions
 forgive-0.003
 gently-0.003
 approval-0.003
Sure-0.003
 who-0.003
Token\n
Token ID198
Feature activation+1.03e+00

Pos logit contributions
 trouble+0.001
 men+0.001
 guns+0.001
 bulldo+0.001
 thieves+0.001
Neg logit contributions
Sure-0.000
 approval-0.000
Acc-0.000
Indeed-0.000
Of-0.000
Token\n
Token ID198
Feature activation+1.26e+00

Pos logit contributions
 approval+0.039
Sure+0.038
Acc+0.037
Of+0.035
 copy+0.035
Neg logit contributions
irens-0.043
 trouble-0.043
 bulldo-0.043
 guns-0.043
 men-0.042
TokenThe
Token ID464
Feature activation+2.15e-01

Pos logit contributions
Sure+0.040
 approval+0.038
Acc+0.036
Indeed+0.035
Of+0.035
Neg logit contributions
 guns-0.062
 trouble-0.061
 men-0.061
 bulldo-0.060
 problem-0.060
Token ranc
Token ID32205
Feature activation-7.81e+00

Pos logit contributions
Sure+0.006
 approval+0.005
 agreement+0.005
Of+0.005
And+0.005
Neg logit contributions
mons-0.012
 guns-0.012
uts-0.011
 thieves-0.011
irens-0.011
Tokenher
Token ID372
Feature activation-8.32e-01

Pos logit contributions
 trouble+0.321
 guns+0.320
 men+0.316
 problem+0.315
 thieves+0.315
Neg logit contributions
Sure-0.317
 approval-0.293
Acc-0.288
Indeed-0.284
Of-0.282
Token forg
Token ID5784
Feature activation-3.93e-01

Pos logit contributions
 trouble+0.042
 virus+0.040
 problem+0.040
riger+0.039
 policy+0.039
Neg logit contributions
 gently-0.027
Sure-0.026
 smiled-0.025
 eagerly-0.024
 softly-0.024
Tokenave
Token ID1015
Feature activation+2.13e+00

Pos logit contributions
 guns+0.018
 trouble+0.018
 men+0.018
 problem+0.017
 thieves+0.017
Neg logit contributions
Sure-0.014
 approval-0.013
Acc-0.013
Indeed-0.013
Of-0.012
Token them
Token ID606
Feature activation+5.37e-01

Pos logit contributions
Sure+0.076
 approval+0.069
of+0.068
Of+0.064
 gently+0.064
Neg logit contributions
 bulldo-0.098
 problem-0.097
mons-0.095
 trouble-0.094
 guns-0.093
Token,
Token ID11
Feature activation-2.02e-01

Pos logit contributions
Sure+0.013
 gently+0.012
 approval+0.012
 forgive+0.011
Of+0.011
Neg logit contributions
 trouble-0.031
 problem-0.030
mons-0.030
 thieves-0.030
irens-0.030
Token doctor
Token ID6253
Feature activation+1.01e+00

Pos logit contributions
 trouble+0.020
 problem+0.020
 men+0.020
 guns+0.020
 thieves+0.019
Neg logit contributions
Sure-0.030
 approval-0.028
Acc-0.028
Indeed-0.027
Of-0.027
Token,
Token ID11
Feature activation-1.57e+00

Pos logit contributions
Sure+0.029
 gently+0.027
 who+0.027
 approval+0.026
 forgive+0.025
Neg logit contributions
 trouble-0.053
 guns-0.052
 thieves-0.052
 bullets-0.051
mons-0.051
Token which
Token ID543
Feature activation-5.21e-01

Pos logit contributions
 problem+0.090
 guns+0.089
mons+0.089
 men+0.089
 trouble+0.088
Neg logit contributions
Sure-0.038
 approval-0.033
Of-0.033
Acc-0.032
 forgive-0.032
Token was
Token ID373
Feature activation-5.57e-01

Pos logit contributions
mons+0.030
unes+0.030
 thieves+0.030
inc+0.029
ados+0.029
Neg logit contributions
Sure-0.011
 smiled-0.011
 who-0.011
 loved-0.011
Me-0.011
Token an
Token ID281
Feature activation-4.01e+00

Pos logit contributions
mons+0.035
ados+0.033
uts+0.032
riger+0.032
unes+0.032
Neg logit contributions
 gently-0.013
 who-0.011
 softly-0.011
Sure-0.011
 carefully-0.011
Token err
Token ID11454
Feature activation-7.78e+00

Pos logit contributions
 guns+0.138
mons+0.133
 trouble+0.132
 thieves+0.130
 men+0.129
Neg logit contributions
Sure-0.187
 approval-0.180
Acc-0.174
Indeed-0.173
 agreement-0.170
Tokenand
Token ID392
Feature activation+1.77e+00

Pos logit contributions
 guns+0.357
 trouble+0.357
 men+0.353
 problem+0.352
 thieves+0.351
Neg logit contributions
Sure-0.278
 approval-0.254
Acc-0.248
Indeed-0.244
Of-0.243
Token he
Token ID339
Feature activation-2.40e+00

Pos logit contributions
 who+0.064
Sure+0.063
of+0.061
 gently+0.061
 carefully+0.058
Neg logit contributions
 trouble-0.077
 problem-0.074
 sacks-0.074
 bullets-0.073
 bulldo-0.073
Token didn
Token ID1422
Feature activation-2.30e+00

Pos logit contributions
 guns+0.123
irens+0.120
mons+0.120
 thieves+0.118
 sacks+0.118
Neg logit contributions
Sure-0.074
 gently-0.065
 forgive-0.064
Of-0.063
 approval-0.062
Token't
Token ID470
Feature activation-1.47e+00

Pos logit contributions
 thieves+0.099
mons+0.097
 guns+0.096
 sacks+0.095
 pests+0.093
Neg logit contributions
Sure-0.094
 gently-0.086
of-0.085
And-0.085
Her-0.082
Token understand
Token ID1833
Feature activation+1.31e+00

Pos logit contributions
mons+0.089
 guns+0.085
ados+0.083
 sacks+0.082
 bulldo+0.081
Neg logit contributions
 understand-0.038
 forgive-0.036
Sure-0.035
 believe-0.033
of-0.032
Token.
Token ID13
Feature activation-7.43e-01

Pos logit contributions
 trouble+0.244
irens+0.241
 robbery+0.240
 paramed+0.238
 bulldo+0.238
Neg logit contributions
 gently-0.093
 who-0.076
 carefully-0.072
 then-0.069
 softly-0.065
Token Lucy
Token ID22162
Feature activation+9.37e-01

Pos logit contributions
 trouble+0.033
 men+0.032
 guns+0.032
 problem+0.032
irens+0.031
Neg logit contributions
Sure-0.027
 approval-0.026
Acc-0.026
Indeed-0.024
Of-0.024
Token thought
Token ID1807
Feature activation+7.20e-01

Pos logit contributions
Sure+0.032
 approval+0.030
Acc+0.029
Of+0.028
Indeed+0.028
Neg logit contributions
 guns-0.044
 trouble-0.044
 men-0.044
 problem-0.044
 thieves-0.043
Token it
Token ID340
Feature activation-1.16e+00

Pos logit contributions
Sure+0.024
 gently+0.023
 approval+0.022
Acc+0.022
Indeed+0.021
Neg logit contributions
 thieves-0.034
irens-0.034
 bulldo-0.033
 men-0.033
 guns-0.032
Token was
Token ID373
Feature activation-8.87e-01

Pos logit contributions
 problem+0.045
 guns+0.043
 trouble+0.042
 bulldo+0.041
 men+0.041
Neg logit contributions
 approval-0.053
Sure-0.050
 agreement-0.049
Acc-0.048
Of-0.048
Token wine
Token ID8237
Feature activation-7.69e+00

Pos logit contributions
irens+0.047
 bulldo+0.047
 men+0.046
 thieves+0.046
 guns+0.046
Neg logit contributions
Sure-0.025
 approval-0.023
Acc-0.022
Indeed-0.022
 gently-0.021
Token.
Token ID13
Feature activation-8.82e-01

Pos logit contributions
irens+0.429
 bulldo+0.429
 men+0.414
 trouble+0.412
 thieves+0.411
Neg logit contributions
Sure-0.203
 gently-0.193
 forgive-0.171
 who-0.168
 softly-0.167
Token She
Token ID1375
Feature activation-2.87e-01

Pos logit contributions
 trouble+0.041
 men+0.040
mons+0.039
 guns+0.039
 problem+0.039
Neg logit contributions
 approval-0.031
Sure-0.030
Acc-0.029
 agreement-0.028
Indeed-0.027
Token knew
Token ID2993
Feature activation+1.78e-01

Pos logit contributions
 guns+0.010
 men+0.010
 bulldo+0.010
 problem+0.010
mons+0.010
Neg logit contributions
 approval-0.013
Sure-0.012
Acc-0.012
 agreement-0.012
 Higher-0.011
Token she
Token ID673
Feature activation-1.32e+00

Pos logit contributions
Sure+0.007
 approval+0.006
Acc+0.006
Of+0.006
Indeed+0.006
Neg logit contributions
 bulldo-0.008
 men-0.008
 trouble-0.008
irens-0.008
 guns-0.008
Token should
Token ID815
Feature activation+1.24e+00

Pos logit contributions
 trouble+0.056
 guns+0.056
 men+0.056
 problem+0.056
 thieves+0.055
Neg logit contributions
Sure-0.052
 approval-0.048
Acc-0.046
Indeed-0.046
Of-0.046
Token,
Token ID11
Feature activation-1.76e+00

Pos logit contributions
 trouble+0.265
irens+0.258
 virus+0.256
 bulldo+0.255
 problem+0.254
Neg logit contributions
 gently-0.097
 who-0.088
 carefully-0.087
 eagerly-0.086
Sure-0.082
Token cup
Token ID6508
Feature activation-5.45e+00

Pos logit contributions
 problem+0.091
 thieves+0.086
 trouble+0.086
irens+0.086
 men+0.085
Neg logit contributions
Sure-0.056
 gently-0.053
 approval-0.051
Of-0.050
 who-0.049
Tokencakes
Token ID37263
Feature activation-4.53e+00

Pos logit contributions
 trouble+0.262
 thieves+0.254
 problem+0.242
 pests+0.240
 robbery+0.239
Neg logit contributions
Sure-0.198
 gently-0.183
of-0.181
 eagerly-0.170
 softly-0.167
Token and
Token ID290
Feature activation+4.13e-01

Pos logit contributions
 trouble+0.258
 thieves+0.256
 problem+0.248
 bulldo+0.248
 bullets+0.246
Neg logit contributions
Sure-0.127
 gently-0.118
of-0.108
Of-0.105
 eagerly-0.104
Token ch
Token ID442
Feature activation-2.68e-01

Pos logit contributions
Sure+0.016
 gently+0.015
Of+0.015
 approval+0.015
Indeed+0.014
Neg logit contributions
 problem-0.018
 trouble-0.018
irens-0.017
 thieves-0.017
 bulldo-0.017
Tokenocol
Token ID4668
Feature activation-7.60e+00

Pos logit contributions
 trouble+0.007
 problem+0.007
 virus+0.007
 bulldo+0.007
 guns+0.007
Neg logit contributions
 approval-0.014
Sure-0.014
of-0.014
 gently-0.013
 agreement-0.013
Tokenates
Token ID689
Feature activation-1.06e+00

Pos logit contributions
 trouble+0.292
 problem+0.289
 men+0.282
 guns+0.280
 bulldo+0.277
Neg logit contributions
Sure-0.324
 approval-0.312
Acc-0.305
Indeed-0.298
 gently-0.290
Token.
Token ID13
Feature activation-6.76e-01

Pos logit contributions
 trouble+0.060
 problem+0.058
irens+0.057
 bulldo+0.057
 thieves+0.057
Neg logit contributions
 gently-0.029
Sure-0.027
 approval-0.026
of-0.025
 eagerly-0.025
Token
Token ID220
Feature activation-2.42e-01

Pos logit contributions
 problem+0.036
 thieves+0.036
 guns+0.036
 trouble+0.035
 men+0.035
Neg logit contributions
Sure-0.021
Of-0.018
 approval-0.017
Indeed-0.017
Acc-0.016
Token\n
Token ID198
Feature activation+3.41e-01

Pos logit contributions
 trouble+0.014
 guns+0.014
 men+0.014
 problem+0.014
 thieves+0.014
Neg logit contributions
Sure-0.005
 approval-0.005
Acc-0.005
Indeed-0.004
Of-0.004
Token\n
Token ID198
Feature activation+7.13e-01

Pos logit contributions
Sure+0.009
 approval+0.009
Acc+0.009
Indeed+0.008
Of+0.008
Neg logit contributions
 trouble-0.018
 bulldo-0.018
 men-0.018
irens-0.018
 guns-0.018
Token It
Token ID632
Feature activation+1.33e+00

Pos logit contributions
Sure+0.001
 approval+0.001
Acc+0.001
Indeed+0.001
Of+0.001
Neg logit contributions
 trouble-0.001
 guns-0.001
 men-0.001
 problem-0.001
 bulldo-0.001
Token was
Token ID373
Feature activation-1.79e-01

Pos logit contributions
Sure+0.055
 approval+0.052
Acc+0.050
Of+0.050
Indeed+0.049
Neg logit contributions
 trouble-0.052
 problem-0.052
 guns-0.051
 men-0.051
 bulldo-0.049
Token sweet
Token ID6029
Feature activation-7.01e-01

Pos logit contributions
 trouble+0.008
 guns+0.008
 men+0.008
 problem+0.008
 thieves+0.008
Neg logit contributions
Sure-0.007
 approval-0.006
Acc-0.006
Indeed-0.006
Of-0.006
Token,
Token ID11
Feature activation-1.27e+00

Pos logit contributions
 guns+0.036
 trouble+0.036
 men+0.035
 thieves+0.035
 problem+0.035
Neg logit contributions
Sure-0.022
 approval-0.020
Acc-0.019
Indeed-0.019
Of-0.018
Token ch
Token ID442
Feature activation-1.36e-01

Pos logit contributions
 trouble+0.058
 guns+0.057
 men+0.057
 problem+0.056
 thieves+0.056
Neg logit contributions
Sure-0.046
 approval-0.042
Acc-0.041
Indeed-0.040
Of-0.040
Tokenocol
Token ID4668
Feature activation-7.59e+00

Pos logit contributions
 trouble+0.004
 thieves+0.004
 guns+0.004
 men+0.004
 problem+0.004
Neg logit contributions
Sure-0.007
 approval-0.006
Acc-0.006
Indeed-0.006
Of-0.006
Tokenat
Token ID265
Feature activation-2.53e+00

Pos logit contributions
mons+0.209
 guns+0.195
 thieves+0.194
irens+0.192
unes+0.188
Neg logit contributions
Sure-0.444
Of-0.416
 agreement-0.400
 approval-0.399
Indeed-0.397
Tokeny
Token ID88
Feature activation-8.65e-01

Pos logit contributions
 trouble+0.110
 guns+0.109
 thieves+0.108
 problem+0.107
 men+0.107
Neg logit contributions
Sure-0.098
 approval-0.091
 gently-0.088
Indeed-0.087
Acc-0.087
Token and
Token ID290
Feature activation+3.31e-02

Pos logit contributions
 trouble+0.042
 thieves+0.042
 guns+0.042
irens+0.042
 men+0.042
Neg logit contributions
Sure-0.028
 approval-0.026
 gently-0.025
Indeed-0.024
Acc-0.024
Token warm
Token ID5814
Feature activation-1.75e+00

Pos logit contributions
Sure+0.001
 approval+0.001
Acc+0.001
Indeed+0.001
Of+0.001
Neg logit contributions
 trouble-0.001
 guns-0.001
 men-0.001
 problem-0.001
 thieves-0.001
Token,
Token ID11
Feature activation-1.30e+00

Pos logit contributions
 trouble+0.081
 guns+0.081
 thieves+0.081
 bulldo+0.080
 men+0.080
Neg logit contributions
Sure-0.062
 approval-0.058
Acc-0.054
 gently-0.054
Of-0.054
Token.
Token ID13
Feature activation-7.61e-01

Pos logit contributions
 gently+0.045
 who+0.043
of+0.043
Sure+0.041
 softly+0.037
Neg logit contributions
riger-0.167
 trouble-0.167
 problem-0.164
 thieves-0.163
 robbery-0.162
Token There
Token ID1318
Feature activation-1.34e+00

Pos logit contributions
 problem+0.041
 thieves+0.040
 guns+0.040
 bulldo+0.039
irens+0.039
Neg logit contributions
Sure-0.021
Of-0.018
 approval-0.018
Acc-0.018
Indeed-0.018
Token were
Token ID547
Feature activation-1.82e-02

Pos logit contributions
riger+0.086
 thieves+0.086
irens+0.086
 virus+0.083
inc+0.083
Neg logit contributions
Sure-0.022
 gently-0.022
 forgive-0.019
Me-0.019
 smiled-0.018
Token cans
Token ID23916
Feature activation-5.79e+00

Pos logit contributions
riger+0.001
irens+0.001
 bulldo+0.001
 AW+0.001
inc+0.001
Neg logit contributions
 gently-0.000
Sure-0.000
 approval-0.000
 forgive-0.000
 who-0.000
Token,
Token ID11
Feature activation-7.16e-01

Pos logit contributions
riger+0.334
 virus+0.319
mons+0.319
 paramed+0.318
ages+0.318
Neg logit contributions
 gently-0.145
 who-0.129
 forgive-0.123
 carefully-0.123
 then-0.117
Token wra
Token ID7917
Feature activation-7.56e+00

Pos logit contributions
riger+0.023
 problem+0.022
irens+0.021
ables+0.021
 robbery+0.021
Neg logit contributions
 gently-0.034
 who-0.033
Sure-0.033
 approval-0.032
 then-0.032
Tokenppers
Token ID11799
Feature activation-2.56e+00

Pos logit contributions
 trouble+0.298
 men+0.282
 problem+0.282
 virus+0.272
 bulldo+0.269
Neg logit contributions
 gently-0.319
Sure-0.311
of-0.293
 approval-0.287
 carefully-0.280
Token and
Token ID290
Feature activation+1.76e-01

Pos logit contributions
 paramed+0.149
riger+0.147
irens+0.146
 virus+0.146
beh+0.143
Neg logit contributions
 gently-0.063
 who-0.058
 carefully-0.051
 softly-0.048
 then-0.046
Token bottles
Token ID14096
Feature activation-7.10e+00

Pos logit contributions
 gently+0.007
 approval+0.007
 who+0.007
Sure+0.007
 forgive+0.007
Neg logit contributions
riger-0.007
arre-0.006
uts-0.006
ables-0.006
irens-0.006
Token.
Token ID13
Feature activation-8.28e-01

Pos logit contributions
riger+0.444
 virus+0.442
 paramed+0.430
irens+0.429
 bulldo+0.427
Neg logit contributions
 gently-0.155
 carefully-0.124
 who-0.116
 forgive-0.113
 softly-0.105
Token Sara
Token ID24799
Feature activation+2.63e-02

Pos logit contributions
 problem+0.043
 guns+0.042
 thieves+0.042
 bulldo+0.041
irens+0.041
Neg logit contributions
Sure-0.025
Of-0.021
 approval-0.021
 gently-0.021
And-0.021
Token but
Token ID475
Feature activation+1.01e+00

Pos logit contributions
 trouble+0.001
 thieves+0.001
 problem+0.001
 men+0.001
 guns+0.001
Neg logit contributions
Sure-0.001
 approval-0.001
Acc-0.001
Indeed-0.001
Of-0.001
Token today
Token ID1909
Feature activation-2.46e+00

Pos logit contributions
Sure+0.035
 gently+0.031
Indeed+0.030
Of+0.029
Acc+0.029
Neg logit contributions
irens-0.049
 thieves-0.049
 trouble-0.048
mons-0.048
 bulldo-0.048
Token they
Token ID484
Feature activation-3.39e+00

Pos logit contributions
drivers+0.162
riger+0.155
 thieves+0.152
unes+0.152
ados+0.150
Neg logit contributions
 who-0.050
 then-0.042
 gently-0.042
 softly-0.041
 quietly-0.040
Token were
Token ID547
Feature activation-1.17e+00

Pos logit contributions
irens+0.188
 bulldo+0.177
drivers+0.176
 trouble+0.173
 bullets+0.173
Neg logit contributions
Sure-0.097
 gently-0.093
 forgive-0.090
 who-0.083
Indeed-0.082
Token qu
Token ID627
Feature activation-3.15e+00

Pos logit contributions
mons+0.061
irens+0.060
 bulldo+0.059
 bullets+0.059
drivers+0.059
Neg logit contributions
 who-0.037
 gently-0.037
 softly-0.036
 happy-0.035
 quietly-0.034
Tokenarre
Token ID9624
Feature activation-7.54e+00

Pos logit contributions
 trouble+0.050
 guns+0.049
 men+0.048
 problem+0.047
 thieves+0.047
Neg logit contributions
Sure-0.208
 approval-0.198
Acc-0.196
Indeed-0.195
Of-0.193
Tokenling
Token ID1359
Feature activation-1.25e+00

Pos logit contributions
 bulldo+0.377
 trouble+0.376
mons+0.371
 guns+0.369
 men+0.368
Neg logit contributions
Sure-0.223
Acc-0.216
 approval-0.209
 gently-0.206
Indeed-0.198
Token.
Token ID13
Feature activation-3.05e-01

Pos logit contributions
 trouble+0.060
 bulldo+0.060
 thieves+0.060
 problem+0.060
 guns+0.060
Neg logit contributions
Sure-0.041
 approval-0.037
Acc-0.036
 gently-0.036
Indeed-0.035
Token They
Token ID1119
Feature activation-1.58e+00

Pos logit contributions
 thieves+0.013
 trouble+0.013
 problem+0.013
 guns+0.013
 men+0.013
Neg logit contributions
Sure-0.012
Indeed-0.010
Of-0.010
 approval-0.010
 gently-0.010
Token were
Token ID547
Feature activation-1.02e+00

Pos logit contributions
 trouble+0.071
 guns+0.071
 men+0.070
 problem+0.070
 thieves+0.070
Neg logit contributions
Sure-0.058
 approval-0.053
Acc-0.051
Indeed-0.051
Of-0.051
Token arguing
Token ID11810
Feature activation-3.04e+00

Pos logit contributions
mons+0.057
 policy+0.055
 bulldo+0.055
irens+0.054
 guy+0.054
Neg logit contributions
 gently-0.029
 happy-0.028
 who-0.027
 softly-0.026
 then-0.025
Token little
Token ID1310
Feature activation+1.00e+00

Pos logit contributions
 problem+0.024
 bulldo+0.024
 men+0.023
 trouble+0.022
 thieves+0.022
Neg logit contributions
 approval-0.024
Sure-0.023
Acc-0.022
of-0.022
Indeed-0.022
Token girl
Token ID2576
Feature activation+3.29e-02

Pos logit contributions
Sure+0.037
 approval+0.033
Acc+0.032
Of+0.032
Indeed+0.032
Neg logit contributions
 trouble-0.045
 guns-0.045
 men-0.045
 thieves-0.044
 problem-0.044
Token who
Token ID508
Feature activation+1.08e+00

Pos logit contributions
Sure+0.002
 approval+0.002
Indeed+0.002
Of+0.002
 Higher+0.002
Neg logit contributions
 guns-0.001
 bulldo-0.001
 trouble-0.001
 problem-0.001
 men-0.001
Token was
Token ID373
Feature activation-7.80e-01

Pos logit contributions
Sure+0.029
 approval+0.028
Indeed+0.026
Of+0.025
 agreement+0.024
Neg logit contributions
 trouble-0.059
 problem-0.058
 bulldo-0.057
 men-0.057
 guns-0.057
Token feeling
Token ID4203
Feature activation-1.29e+00

Pos logit contributions
 guns+0.044
 bullets+0.043
 men+0.043
irens+0.043
 trouble+0.043
Neg logit contributions
Sure-0.020
Acc-0.018
 approval-0.016
Indeed-0.016
Of-0.016
Token un
Token ID555
Feature activation-7.54e+00

Pos logit contributions
 guns+0.060
 trouble+0.059
 men+0.058
 problem+0.058
 thieves+0.058
Neg logit contributions
Sure-0.046
 approval-0.042
Acc-0.041
Indeed-0.040
Of-0.040
Tokenconf
Token ID10414
Feature activation-4.88e+00

Pos logit contributions
 trouble+0.319
 problem+0.311
 guns+0.308
 men+0.304
 bullets+0.294
Neg logit contributions
Sure-0.330
Indeed-0.290
 approval-0.289
Acc-0.289
Of-0.287
Tokenident
Token ID738
Feature activation-1.53e+00

Pos logit contributions
 trouble+0.213
 guns+0.212
 men+0.211
 problem+0.202
 thieves+0.201
Neg logit contributions
Sure-0.203
Indeed-0.183
Acc-0.182
Of-0.179
 gently-0.176
Token.
Token ID13
Feature activation-6.64e-01

Pos logit contributions
 trouble+0.087
 thieves+0.087
 bulldo+0.087
mons+0.086
irens+0.086
Neg logit contributions
Sure-0.033
 gently-0.032
 who-0.031
 approval-0.030
 softly-0.029
Token She
Token ID1375
Feature activation-1.46e-02

Pos logit contributions
 trouble+0.023
 sale+0.023
mons+0.022
 men+0.022
unes+0.022
Neg logit contributions
Sure-0.029
 approval-0.029
Acc-0.029
Indeed-0.028
And-0.027
Token wanted
Token ID2227
Feature activation+1.55e+00

Pos logit contributions
 trouble+0.001
 guns+0.001
 problem+0.001
 men+0.001
 bulldo+0.001
Neg logit contributions
Sure-0.000
 approval-0.000
Acc-0.000
Indeed-0.000
Of-0.000
Token as
Token ID355
Feature activation+9.44e-02

Pos logit contributions
Sure+0.045
Acc+0.041
 approval+0.041
Of+0.040
Indeed+0.040
Neg logit contributions
 trouble-0.039
 men-0.039
 problem-0.038
 guns-0.038
 thieves-0.038
Token he
Token ID339
Feature activation-1.37e+00

Pos logit contributions
Sure+0.005
 approval+0.004
Acc+0.004
Of+0.004
Indeed+0.004
Neg logit contributions
 men-0.003
 trouble-0.003
 guns-0.003
 problem-0.003
 thieves-0.003
Token put
Token ID1234
Feature activation+7.33e-01

Pos logit contributions
mons+0.047
 bulldo+0.046
 problem+0.046
inc+0.045
 caused+0.045
Neg logit contributions
 approval-0.061
Sure-0.061
 agreement-0.060
Acc-0.057
Of-0.056
Token the
Token ID262
Feature activation-2.58e-01

Pos logit contributions
Sure+0.033
Acc+0.031
 approval+0.031
Indeed+0.030
Of+0.030
Neg logit contributions
 guns-0.027
 trouble-0.027
 men-0.027
 problem-0.026
 thieves-0.026
Token finishing
Token ID12848
Feature activation+1.31e+00

Pos logit contributions
 problem+0.010
 trouble+0.010
 guns+0.010
 men+0.010
 thieves+0.010
Neg logit contributions
Sure-0.011
Acc-0.010
 approval-0.010
Indeed-0.010
Of-0.010
Token touches
Token ID18105
Feature activation+3.84e+00

Pos logit contributions
 approval+0.046
 gently+0.046
Sure+0.045
 eagerly+0.042
 softly+0.041
Neg logit contributions
 trouble-0.058
 bulldo-0.057
 robbery-0.057
 thieves-0.057
 guns-0.057
Token on
Token ID319
Feature activation+2.00e-01

Pos logit contributions
Sure+0.133
 approval+0.125
 gently+0.119
Indeed+0.116
Acc+0.114
Neg logit contributions
 thieves-0.175
 trouble-0.175
 problem-0.173
 guns-0.172
 bulldo-0.171
Token his
Token ID465
Feature activation+1.81e+00

Pos logit contributions
Sure+0.007
 approval+0.007
Acc+0.006
Indeed+0.006
Of+0.006
Neg logit contributions
 guns-0.009
 trouble-0.009
 thieves-0.009
 problem-0.009
 men-0.009
Token painting
Token ID12036
Feature activation+2.36e+00

Pos logit contributions
Sure+0.078
Acc+0.071
Of+0.069
Indeed+0.069
 approval+0.068
Neg logit contributions
 problem-0.073
 trouble-0.073
 guns-0.071
 men-0.071
 thieves-0.070
Token.
Token ID13
Feature activation-3.56e-01

Pos logit contributions
Sure+0.074
 approval+0.067
 gently+0.064
Indeed+0.062
Of+0.062
Neg logit contributions
 thieves-0.120
 trouble-0.118
 guns-0.117
 men-0.116
 bulldo-0.116
Token It
Token ID632
Feature activation+1.07e+00

Pos logit contributions
 men+0.013
 trouble+0.013
 bulldo+0.013
irens+0.012
 guns+0.012
Neg logit contributions
Sure-0.015
Acc-0.015
 approval-0.015
Indeed-0.014
of-0.014
Token you
Token ID345
Feature activation+9.71e-01

Pos logit contributions
 guns+0.061
 trouble+0.060
 thieves+0.059
 men+0.058
 problem+0.058
Neg logit contributions
Sure-0.099
Acc-0.092
 approval-0.091
Indeed-0.091
Of-0.090
Token.
Token ID13
Feature activation-1.18e-01

Pos logit contributions
Sure+0.039
 approval+0.035
Acc+0.035
Indeed+0.035
Of+0.035
Neg logit contributions
 trouble-0.041
 guns-0.041
 problem-0.041
 men-0.040
 thieves-0.040
Token We
Token ID775
Feature activation+1.56e-01

Pos logit contributions
 trouble+0.004
 guns+0.004
 men+0.004
 thieves+0.004
 bulldo+0.004
Neg logit contributions
Sure-0.005
 approval-0.005
Acc-0.005
Indeed-0.005
 gently-0.005
Token made
Token ID925
Feature activation-8.50e-01

Pos logit contributions
Sure+0.004
 approval+0.004
Acc+0.004
Indeed+0.004
Of+0.004
Neg logit contributions
 guns-0.008
 trouble-0.008
 men-0.008
 thieves-0.008
 problem-0.008
Token your
Token ID534
Feature activation+2.26e-02

Pos logit contributions
 trouble+0.039
 guns+0.039
 men+0.038
 problem+0.038
 thieves+0.038
Neg logit contributions
Sure-0.031
 approval-0.028
Acc-0.028
Indeed-0.027
Of-0.027
Token favorite
Token ID4004
Feature activation+4.65e+00

Pos logit contributions
Sure+0.001
Acc+0.001
 approval+0.001
Indeed+0.001
Of+0.001
Neg logit contributions
 trouble-0.001
 problem-0.001
 guns-0.001
 men-0.001
 thieves-0.001
Token dinner
Token ID8073
Feature activation-3.35e+00

Pos logit contributions
Sure+0.210
Acc+0.201
 approval+0.193
Indeed+0.191
Of+0.191
Neg logit contributions
 problem-0.172
 guns-0.171
 trouble-0.171
 men-0.167
 thieves-0.163
Token:
Token ID25
Feature activation+1.36e-01

Pos logit contributions
 guns+0.154
 trouble+0.154
 thieves+0.153
 bulldo+0.153
 men+0.153
Neg logit contributions
Sure-0.118
 approval-0.108
Indeed-0.104
Acc-0.103
Of-0.102
Token pizza
Token ID14256
Feature activation-4.37e+00

Pos logit contributions
Sure+0.005
 approval+0.005
Acc+0.005
Indeed+0.005
Of+0.005
Neg logit contributions
 trouble-0.006
 guns-0.006
 men-0.006
 problem-0.006
 thieves-0.006
Token and
Token ID290
Feature activation+2.25e-01

Pos logit contributions
irens+0.218
 trouble+0.217
 bulldo+0.216
 thieves+0.213
 guns+0.213
Neg logit contributions
Sure-0.133
 gently-0.121
 approval-0.120
And-0.118
Indeed-0.116
Token ice
Token ID4771
Feature activation-2.16e+00

Pos logit contributions
Sure+0.010
 approval+0.010
 gently+0.010
 forgive+0.010
Of+0.010
Neg logit contributions
irens-0.008
inc-0.008
mons-0.008
 bulldo-0.007
unes-0.007
Token always
Token ID1464
Feature activation+1.01e+00

Pos logit contributions
Sure+0.084
Acc+0.079
 approval+0.076
Indeed+0.076
Of+0.076
Neg logit contributions
 trouble-0.103
 problem-0.101
 guns-0.099
 bulldo-0.097
 sale-0.097
Token be
Token ID307
Feature activation-4.72e-01

Pos logit contributions
Sure+0.038
Acc+0.036
Indeed+0.035
Of+0.034
 Higher+0.033
Neg logit contributions
 trouble-0.046
 problem-0.045
 thieves-0.043
 guns-0.042
 job-0.042
Token here
Token ID994
Feature activation-1.85e+00

Pos logit contributions
 bulldo+0.023
mons+0.023
 men+0.023
 guns+0.023
 bullets+0.022
Neg logit contributions
Sure-0.016
 approval-0.015
Acc-0.015
 gently-0.014
 forgive-0.014
Token to
Token ID284
Feature activation-1.42e+00

Pos logit contributions
 trouble+0.072
 problem+0.070
 guns+0.069
 thieves+0.068
 men+0.067
Neg logit contributions
Sure-0.080
Acc-0.074
 approval-0.073
Indeed-0.072
Of-0.071
Token make
Token ID787
Feature activation+1.81e+00

Pos logit contributions
 trouble+0.062
 problem+0.058
 sale+0.057
 guns+0.056
 thieves+0.056
Neg logit contributions
Sure-0.055
Acc-0.054
Indeed-0.052
Of-0.050
 approval-0.048
Token each
Token ID1123
Feature activation+4.14e+00

Pos logit contributions
Sure+0.075
 approval+0.069
Acc+0.068
Indeed+0.067
Of+0.067
Neg logit contributions
 trouble-0.073
 guns-0.073
 men-0.072
 problem-0.072
 thieves-0.072
Token other
Token ID584
Feature activation+1.89e+00

Pos logit contributions
Sure+0.155
Acc+0.147
Of+0.138
Indeed+0.136
 approval+0.132
Neg logit contributions
 trouble-0.194
 men-0.188
 problem-0.188
 thieves-0.184
 guns-0.184
Token feel
Token ID1254
Feature activation+2.07e+00

Pos logit contributions
Sure+0.054
 approval+0.047
Acc+0.047
Indeed+0.046
Of+0.046
Neg logit contributions
 trouble-0.101
 guns-0.099
 men-0.099
 problem-0.099
 thieves-0.098
Token better
Token ID1365
Feature activation-4.56e-01

Pos logit contributions
Sure+0.065
Indeed+0.056
Acc+0.056
 approval+0.055
Of+0.054
Neg logit contributions
 trouble-0.108
 guns-0.104
 problem-0.104
 men-0.103
 thieves-0.102
Token."
Token ID526
Feature activation+5.51e-01

Pos logit contributions
 guns+0.021
 trouble+0.021
 men+0.021
 problem+0.021
 thieves+0.021
Neg logit contributions
Sure-0.016
 approval-0.014
Acc-0.014
Indeed-0.014
Of-0.014
Token
Token ID198
Feature activation+6.54e-01

Pos logit contributions
 approval+0.019
Sure+0.019
Acc+0.019
Indeed+0.018
 agreement+0.018
Neg logit contributions
 trouble-0.024
 guns-0.023
mons-0.023
uts-0.022
 peanuts-0.022
Token?
Token ID30
Feature activation+2.24e-01

Pos logit contributions
 gently+0.015
Sure+0.014
of+0.014
 carefully+0.013
 forgive+0.013
Neg logit contributions
mons-0.045
ages-0.044
ometers-0.041
 men-0.041
uts-0.040
Token She
Token ID1375
Feature activation-1.67e-01

Pos logit contributions
Sure+0.008
 approval+0.008
Acc+0.008
 agreement+0.008
Indeed+0.008
Neg logit contributions
 trouble-0.009
mons-0.009
 men-0.009
 sale-0.008
 bulldo-0.008
Token asked
Token ID1965
Feature activation-8.81e-02

Pos logit contributions
 trouble+0.008
 problem+0.007
 men+0.007
 guns+0.007
 thieves+0.007
Neg logit contributions
Sure-0.006
 approval-0.006
Acc-0.006
Of-0.005
Indeed-0.005
Token her
Token ID607
Feature activation+1.17e+00

Pos logit contributions
 problem+0.003
 trouble+0.003
 guns+0.003
 sale+0.003
 men+0.003
Neg logit contributions
Sure-0.004
And-0.004
Acc-0.004
Indeed-0.004
Of-0.003
Token mom
Token ID1995
Feature activation+8.26e-01

Pos logit contributions
Acc+0.040
Sure+0.039
 approval+0.035
And+0.035
Indeed+0.034
Neg logit contributions
 problem-0.057
 trouble-0.056
 men-0.054
 thieves-0.052
 guns-0.052
Tokenmy
Token ID1820
Feature activation+4.06e+00

Pos logit contributions
Sure+0.026
 approval+0.024
Acc+0.023
Indeed+0.023
Of+0.022
Neg logit contributions
 guns-0.041
 trouble-0.041
 problem-0.041
 men-0.041
 thieves-0.041
Token.
Token ID13
Feature activation+7.45e-02

Pos logit contributions
Sure+0.135
 approval+0.122
 gently+0.118
Acc+0.117
 forgive+0.117
Neg logit contributions
 trouble-0.198
 guns-0.195
 men-0.194
 bulldo-0.194
 thieves-0.194
Token Mom
Token ID11254
Feature activation-1.02e+00

Pos logit contributions
 approval+0.002
of+0.002
Sure+0.002
Acc+0.002
Indeed+0.002
Neg logit contributions
 sale-0.003
 trouble-0.003
mons-0.003
 guns-0.003
 men-0.003
Tokenmy
Token ID1820
Feature activation+2.17e+00

Pos logit contributions
 problem+0.050
 guns+0.048
 men+0.047
 thieves+0.047
 trouble+0.046
Neg logit contributions
Sure-0.032
Acc-0.030
 approval-0.030
 Higher-0.029
Indeed-0.029
Token couldn
Token ID3521
Feature activation+1.10e+00

Pos logit contributions
Sure+0.080
 approval+0.073
Acc+0.071
Indeed+0.071
Of+0.070
Neg logit contributions
 trouble-0.097
 guns-0.097
 men-0.096
 thieves-0.095
 problem-0.095
Token't
Token ID470
Feature activation+2.23e-01

Pos logit contributions
Sure+0.047
 gently+0.043
And+0.041
of+0.040
 forgive+0.040
Neg logit contributions
 thieves-0.047
 guns-0.047
mons-0.046
 sacks-0.045
 men-0.045
Token the
Token ID262
Feature activation+1.15e+00

Pos logit contributions
Sure+0.034
 approval+0.032
Acc+0.029
Indeed+0.029
Of+0.029
Neg logit contributions
 trouble-0.045
 problem-0.045
 thieves-0.044
 guns-0.044
mons-0.044
Token bar
Token ID2318
Feature activation+2.77e+00

Pos logit contributions
Sure+0.049
Acc+0.046
 approval+0.045
Indeed+0.045
Of+0.044
Neg logit contributions
 trouble-0.045
 men-0.044
 guns-0.044
 problem-0.044
 thieves-0.043
Tokenber
Token ID527
Feature activation+2.24e+00

Pos logit contributions
Sure+0.120
Acc+0.114
 approval+0.114
Indeed+0.109
Of+0.108
Neg logit contributions
 men-0.107
 problem-0.104
 thieves-0.104
 trouble-0.104
 guns-0.103
Token in
Token ID287
Feature activation+1.50e+00

Pos logit contributions
Sure+0.077
 approval+0.072
 gently+0.068
Indeed+0.067
Acc+0.066
Neg logit contributions
 trouble-0.106
 guns-0.104
 thieves-0.104
 men-0.103
 problem-0.102
Token the
Token ID262
Feature activation-3.36e-01

Pos logit contributions
Sure+0.052
 approval+0.048
Acc+0.046
Indeed+0.046
Of+0.045
Neg logit contributions
 guns-0.070
 men-0.069
 trouble-0.069
 problem-0.069
 thieves-0.069
Token mirror
Token ID10162
Feature activation+4.80e+00

Pos logit contributions
 trouble+0.014
 problem+0.013
 men+0.013
 guns+0.013
 thieves+0.013
Neg logit contributions
Sure-0.014
Acc-0.013
Indeed-0.013
 approval-0.013
Of-0.013
Token.
Token ID13
Feature activation-3.06e-01

Pos logit contributions
Sure+0.132
 approval+0.125
 gently+0.124
Acc+0.114
of+0.113
Neg logit contributions
 trouble-0.249
 thieves-0.247
 bulldo-0.247
 guns-0.246
 men-0.243
Token The
Token ID383
Feature activation-1.61e-01

Pos logit contributions
 trouble+0.014
 bulldo+0.013
 men+0.013
irens+0.013
 guns+0.013
Neg logit contributions
Sure-0.011
 approval-0.010
Acc-0.010
Indeed-0.010
 agreement-0.009
Token bar
Token ID2318
Feature activation-1.32e-02

Pos logit contributions
 problem+0.006
 trouble+0.005
 men+0.005
 thieves+0.005
 bulldo+0.005
Neg logit contributions
Acc-0.008
Sure-0.008
of-0.008
Indeed-0.008
And-0.007
Tokenber
Token ID527
Feature activation-8.50e-01

Pos logit contributions
 men+0.001
 problem+0.000
 trouble+0.000
 thieves+0.000
 bulldo+0.000
Neg logit contributions
Sure-0.001
Acc-0.001
 approval-0.001
Indeed-0.001
 agreement-0.001
Token was
Token ID373
Feature activation-4.86e-01

Pos logit contributions
 trouble+0.044
 guns+0.044
 thieves+0.044
 men+0.044
 problem+0.044
Neg logit contributions
Sure-0.025
 approval-0.022
Acc-0.022
Indeed-0.021
Of-0.021
Token said
Token ID531
Feature activation-1.89e+00

Pos logit contributions
Sure+0.017
 approval+0.016
Acc+0.016
Indeed+0.015
Of+0.015
Neg logit contributions
 trouble-0.020
 men-0.020
 problem-0.020
 guns-0.019
 bulldo-0.019
Token,
Token ID11
Feature activation-4.92e-01

Pos logit contributions
 thieves+0.084
 trouble+0.083
mons+0.083
 guns+0.083
irens+0.082
Neg logit contributions
Sure-0.073
 gently-0.066
 approval-0.064
Of-0.064
 softly-0.063
Token �
Token ID6184
Feature activation-1.60e+00

Pos logit contributions
 guns+0.019
 men+0.019
 bulldo+0.019
 trouble+0.019
 problem+0.019
Neg logit contributions
Sure-0.020
 approval-0.019
Acc-0.019
Indeed-0.019
Of-0.018
Token�
Token ID95
Feature activation-3.05e+00

Pos logit contributions
 guns+0.059
 trouble+0.057
 thieves+0.055
 problem+0.055
 men+0.055
Neg logit contributions
Sure-0.078
Of-0.070
Acc-0.069
Indeed-0.069
 approval-0.069
Token€
Token ID26391
Feature activation+1.47e+00

Pos logit contributions
 guns+0.125
 trouble+0.123
 men+0.122
 thieves+0.122
 problem+0.122
Neg logit contributions
Sure-0.127
 approval-0.117
Acc-0.115
Indeed-0.114
Of-0.113
Token�
Token ID129
Feature activation-3.68e-01

Pos logit contributions
Sure+0.055
And+0.053
of+0.051
Indeed+0.050
Again+0.049
Neg logit contributions
 thieves-0.064
 guns-0.064
 problem-0.063
 bullets-0.061
 men-0.061
Token�
Token ID241
Feature activation-6.38e-01

Pos logit contributions
 thieves+0.015
 guns+0.015
 trouble+0.014
 problem+0.014
 men+0.014
Neg logit contributions
Sure-0.015
 approval-0.014
Acc-0.014
Indeed-0.014
 agreement-0.014
TokenI
Token ID40
Feature activation+3.35e+00

Pos logit contributions
 trouble+0.030
 bulldo+0.028
 men+0.028
mons+0.028
 problem+0.027
Neg logit contributions
Sure-0.021
 approval-0.021
Acc-0.020
 gently-0.019
 agreement-0.019
Token understand
Token ID1833
Feature activation+2.62e+00

Pos logit contributions
Sure+0.129
 approval+0.121
Acc+0.119
Indeed+0.116
Of+0.115
Neg logit contributions
 trouble-0.143
 problem-0.140
 men-0.140
 guns-0.139
 bulldo-0.138
Token.
Token ID13
Feature activation+5.53e-01

Pos logit contributions
Sure+0.102
 approval+0.096
Of+0.090
 gently+0.089
 forgive+0.089
Neg logit contributions
 guns-0.110
 trouble-0.110
 bulldo-0.109
 thieves-0.109
mons-0.109
Token I
Token ID314
Feature activation+2.14e+00

Pos logit contributions
Sure+0.023
Acc+0.023
 approval+0.022
Indeed+0.021
 agreement+0.021
Neg logit contributions
 trouble-0.021
 bulldo-0.020
 men-0.020
 guns-0.019
 thieves-0.019
Token shelf
Token ID18316
Feature activation-1.90e-01

Pos logit contributions
Sure+0.055
 approval+0.051
Acc+0.049
Indeed+0.049
Of+0.049
Neg logit contributions
 trouble-0.050
 guns-0.050
 men-0.049
 thieves-0.049
 problem-0.048
Token.
Token ID13
Feature activation-5.83e-01

Pos logit contributions
 trouble+0.008
 guns+0.008
 problem+0.008
 men+0.008
 thieves+0.008
Neg logit contributions
Sure-0.007
 approval-0.007
Acc-0.007
Indeed-0.006
Of-0.006
Token
Token ID198
Feature activation+6.47e-01

Pos logit contributions
 trouble+0.019
mons+0.017
uts+0.016
 men+0.016
irens+0.016
Neg logit contributions
 approval-0.029
isy-0.029
Acc-0.028
 agreement-0.027
rene-0.027
Token
Token ID198
Feature activation+9.33e-01

Pos logit contributions
 approval+0.032
Acc+0.029
 agreement+0.028
Of+0.027
 upper+0.027
Neg logit contributions
 trouble-0.022
irens-0.021
 bulldo-0.020
uts-0.019
 men-0.019
TokenBut
Token ID1537
Feature activation+1.42e+00

Pos logit contributions
 approval+0.036
Sure+0.036
Acc+0.035
Of+0.033
Indeed+0.033
Neg logit contributions
 trouble-0.037
 men-0.037
 guns-0.037
irens-0.037
 problem-0.037
Token when
Token ID618
Feature activation+9.73e-01

Pos logit contributions
Sure+0.037
 approval+0.031
Indeed+0.030
Of+0.030
Acc+0.030
Neg logit contributions
 trouble-0.081
 guns-0.081
 bulldo-0.080
 men-0.080
 thieves-0.080
Token she
Token ID673
Feature activation-4.41e-01

Pos logit contributions
Sure+0.029
 gently+0.026
Me+0.023
 forgive+0.023
Of+0.023
Neg logit contributions
 bulldo-0.054
 bullets-0.052
 sacks-0.052
irens-0.052
inc-0.052
Token tried
Token ID3088
Feature activation+8.54e-01

Pos logit contributions
 problem+0.015
 trouble+0.014
 bulldo+0.014
 thieves+0.014
 guns+0.013
Neg logit contributions
 approval-0.022
Sure-0.021
Acc-0.021
Indeed-0.020
Of-0.020
Token to
Token ID284
Feature activation-1.06e+00

Pos logit contributions
Sure+0.033
 gently+0.033
Acc+0.030
 softly+0.029
 forgive+0.029
Neg logit contributions
 men-0.036
 thieves-0.036
 bulldo-0.035
 bullets-0.035
 guns-0.034
Token put
Token ID1234
Feature activation+1.26e+00

Pos logit contributions
 trouble+0.046
 thieves+0.045
 guns+0.045
 men+0.044
 sale+0.044
Neg logit contributions
Sure-0.040
Acc-0.039
 approval-0.037
Of-0.037
Indeed-0.036
Token Anna
Token ID11735
Feature activation+3.59e+00

Pos logit contributions
Sure+0.048
 approval+0.044
Indeed+0.042
Acc+0.042
Of+0.042
Neg logit contributions
 guns-0.055
 trouble-0.055
 men-0.054
 thieves-0.054
 problem-0.054
Token the
Token ID262
Feature activation-1.14e+00

Pos logit contributions
 guns+0.031
 trouble+0.030
 problem+0.030
 men+0.030
 thieves+0.030
Neg logit contributions
Sure-0.026
 approval-0.024
Acc-0.023
Indeed-0.023
Of-0.023
Token mud
Token ID17492
Feature activation+1.01e+00

Pos logit contributions
 trouble+0.041
 thieves+0.041
 guns+0.040
 men+0.040
 problem+0.040
Neg logit contributions
Sure-0.052
 approval-0.048
Acc-0.047
Indeed-0.047
Of-0.046
Token and
Token ID290
Feature activation+5.42e-01

Pos logit contributions
Sure+0.026
 gently+0.023
 approval+0.021
Acc+0.021
Indeed+0.020
Neg logit contributions
 trouble-0.057
 thieves-0.056
 problem-0.055
 men-0.055
 bulldo-0.055
Token started
Token ID2067
Feature activation-1.35e-01

Pos logit contributions
Sure+0.020
Acc+0.018
 gently+0.018
 approval+0.018
Of+0.018
Neg logit contributions
 trouble-0.024
 problem-0.024
 thieves-0.024
 men-0.024
irens-0.023
Token to
Token ID284
Feature activation-1.89e+00

Pos logit contributions
 thieves+0.006
 problem+0.006
 virus+0.006
 men+0.006
 trouble+0.006
Neg logit contributions
Sure-0.005
 gently-0.005
Acc-0.005
 carefully-0.004
 softly-0.004
Token roll
Token ID4836
Feature activation+1.13e+00

Pos logit contributions
 guns+0.076
 bulldo+0.074
 trouble+0.074
 problem+0.073
 men+0.073
Neg logit contributions
Sure-0.079
 approval-0.074
Of-0.073
Indeed-0.073
Acc-0.072
Token around
Token ID1088
Feature activation+2.60e+00

Pos logit contributions
Sure+0.047
 approval+0.044
Acc+0.044
Indeed+0.043
Of+0.043
Neg logit contributions
 trouble-0.046
 guns-0.046
 men-0.045
 problem-0.045
 thieves-0.044
Token.
Token ID13
Feature activation-3.73e-01

Pos logit contributions
Sure+0.097
 approval+0.089
Acc+0.087
Of+0.086
Indeed+0.086
Neg logit contributions
 trouble-0.117
 guns-0.117
 men-0.115
 problem-0.115
 thieves-0.115
Token He
Token ID679
Feature activation+3.65e-01

Pos logit contributions
 bulldo+0.016
 trouble+0.016
 men+0.016
irens+0.016
mons+0.016
Neg logit contributions
 approval-0.015
Acc-0.015
 agreement-0.014
Sure-0.013
isy-0.013
Token was
Token ID373
Feature activation+1.06e-01

Pos logit contributions
 approval+0.019
Sure+0.018
 agreement+0.017
Acc+0.017
Of+0.017
Neg logit contributions
 guns-0.013
 bulldo-0.013
 problem-0.012
 men-0.012
 bullets-0.011
Token having
Token ID1719
Feature activation-5.67e-01

Pos logit contributions
Sure+0.004
 approval+0.004
Acc+0.004
Indeed+0.004
Of+0.004
Neg logit contributions
 trouble-0.005
 guns-0.005
 men-0.005
 problem-0.005
 thieves-0.005
Token blowing
Token ID19280
Feature activation-2.96e-01

Pos logit contributions
 trouble+0.038
mons+0.038
 men+0.037
ometers+0.036
 thieves+0.036
Neg logit contributions
 who-0.025
 gently-0.023
Sure-0.023
 carefully-0.021
 then-0.021
Token in
Token ID287
Feature activation+5.57e-01

Pos logit contributions
 thieves+0.013
 problem+0.013
 men+0.013
mons+0.013
 trouble+0.013
Neg logit contributions
Sure-0.010
 gently-0.010
 approval-0.010
Acc-0.009
Indeed-0.009
Token the
Token ID262
Feature activation-1.66e+00

Pos logit contributions
Sure+0.022
 approval+0.020
Acc+0.020
Indeed+0.019
Of+0.019
Neg logit contributions
 trouble-0.024
 guns-0.024
 men-0.024
 problem-0.023
 thieves-0.023
Token wind
Token ID2344
Feature activation+1.57e-01

Pos logit contributions
 guns+0.066
 trouble+0.066
 men+0.065
 problem+0.064
 thieves+0.064
Neg logit contributions
Sure-0.071
 approval-0.066
Acc-0.065
Indeed-0.065
Of-0.064
Token.
Token ID13
Feature activation-5.12e-01

Pos logit contributions
Sure+0.003
 gently+0.003
 who+0.003
Acc+0.003
 approval+0.003
Neg logit contributions
 trouble-0.009
 thieves-0.009
 men-0.009
 sacks-0.009
 problem-0.009
Token It
Token ID632
Feature activation+7.76e-01

Pos logit contributions
 trouble+0.023
 guns+0.023
 men+0.023
 problem+0.022
 bulldo+0.022
Neg logit contributions
Sure-0.019
 approval-0.018
Acc-0.017
Indeed-0.017
Of-0.017
Token was
Token ID373
Feature activation-6.81e-01

Pos logit contributions
Sure+0.025
 gently+0.022
Indeed+0.021
Acc+0.021
Of+0.021
Neg logit contributions
 thieves-0.040
 trouble-0.040
 men-0.039
mons-0.039
 guns-0.039
Token so
Token ID523
Feature activation-2.30e+00

Pos logit contributions
mons+0.039
 thieves+0.039
 men+0.037
irens+0.037
 pests+0.037
Neg logit contributions
 gently-0.016
Sure-0.016
Indeed-0.015
 then-0.014
 approval-0.014
Token pretty
Token ID2495
Feature activation-6.23e-01

Pos logit contributions
 trouble+0.103
 guns+0.103
 men+0.102
 problem+0.101
 thieves+0.101
Neg logit contributions
Sure-0.086
 approval-0.079
Acc-0.077
Indeed-0.076
Of-0.075
Token!
Token ID0
Feature activation-3.86e-01

Pos logit contributions
 thieves+0.034
 inconvenience+0.034
 robbery+0.034
 trouble+0.034
 problem+0.034
Neg logit contributions
 gently-0.015
Sure-0.014
 who-0.014
 softly-0.014
 carefully-0.013
Token Tim
Token ID5045
Feature activation-1.11e+00

Pos logit contributions
 trouble+0.020
 guns+0.020
 thieves+0.019
 problem+0.019
 men+0.019
Neg logit contributions
Sure-0.012
 approval-0.011
Acc-0.011
Indeed-0.010
Of-0.010
Token?"
Token ID1701
Feature activation+2.01e+00

Pos logit contributions
uts+0.225
 trouble+0.225
 bulldo+0.222
 robbery+0.221
irens+0.221
Neg logit contributions
Sure-0.122
 forgive-0.117
 gently-0.114
of-0.106
 approval-0.104
Token Tom
Token ID4186
Feature activation+2.46e-01

Pos logit contributions
Sure+0.072
Of+0.060
 gently+0.060
Indeed+0.059
 approval+0.057
Neg logit contributions
mons-0.095
 guns-0.094
 problem-0.093
 thieves-0.093
unes-0.092
Token asks
Token ID7893
Feature activation-1.84e-01

Pos logit contributions
Sure+0.010
 approval+0.009
Acc+0.009
Indeed+0.009
 gently+0.009
Neg logit contributions
 guns-0.010
 trouble-0.010
 thieves-0.010
 men-0.010
mons-0.010
Token.
Token ID13
Feature activation+5.19e-01

Pos logit contributions
 trouble+0.008
 guns+0.008
 men+0.008
 thieves+0.008
 problem+0.008
Neg logit contributions
Sure-0.007
 approval-0.006
Acc-0.006
Indeed-0.006
Of-0.006
Token
Token ID198
Feature activation+9.95e-01

Pos logit contributions
Sure+0.014
 approval+0.014
Acc+0.013
Indeed+0.012
Of+0.012
Neg logit contributions
 trouble-0.028
 men-0.028
 guns-0.028
 bulldo-0.028
irens-0.027
Token
Token ID198
Feature activation+1.16e+00

Pos logit contributions
 approval+0.040
Acc+0.040
 agreement+0.036
Sure+0.035
Indeed+0.034
Neg logit contributions
irens-0.043
 trouble-0.042
 bulldo-0.041
 men-0.041
 guns-0.040
Token"
Token ID1
Feature activation-2.02e+00

Pos logit contributions
Sure+0.039
Indeed+0.033
 approval+0.033
And+0.033
Of+0.033
Neg logit contributions
 trouble-0.058
 problem-0.058
 guns-0.058
 thieves-0.057
 men-0.057
TokenSometimes
Token ID15468
Feature activation+2.82e+00

Pos logit contributions
 problem+0.112
 thieves+0.111
 trouble+0.111
 guns+0.111
 men+0.109
Neg logit contributions
Sure-0.059
Of-0.049
Indeed-0.048
Acc-0.048
And-0.047
Token,
Token ID11
Feature activation+4.84e-01

Pos logit contributions
 gently+0.097
 who+0.097
 softly+0.091
 forgive+0.089
 then+0.088
Neg logit contributions
irens-0.128
mons-0.125
unes-0.124
inc-0.123
riger-0.123
Token the
Token ID262
Feature activation-2.21e+00

Pos logit contributions
Sure+0.018
 gently+0.016
 approval+0.016
Of+0.016
Indeed+0.016
Neg logit contributions
irens-0.021
mons-0.021
 bulldo-0.021
 thieves-0.020
unes-0.020
Token water
Token ID1660
Feature activation-9.70e-02

Pos logit contributions
irens+0.112
riger+0.108
uts+0.108
drivers+0.105
cases+0.104
Neg logit contributions
Sure-0.069
 forgive-0.068
 agreement-0.064
 nod-0.062
 gently-0.062
Token.
Token ID13
Feature activation-1.82e-01

Pos logit contributions
Sure+0.032
 approval+0.029
Acc+0.028
Indeed+0.028
 gently+0.028
Neg logit contributions
 trouble-0.050
 guns-0.049
 men-0.049
 thieves-0.049
 problem-0.049
Token Bob
Token ID5811
Feature activation+1.17e+00

Pos logit contributions
 problem+0.009
 guns+0.009
 thieves+0.009
 trouble+0.009
 men+0.009
Neg logit contributions
Sure-0.006
Indeed-0.006
Of-0.006
 approval-0.005
 gently-0.005
Token and
Token ID290
Feature activation+1.61e-01

Pos logit contributions
Sure+0.039
 approval+0.035
Acc+0.034
Indeed+0.034
Of+0.033
Neg logit contributions
 trouble-0.057
 guns-0.057
 men-0.056
 problem-0.056
 thieves-0.056
Token Lily
Token ID20037
Feature activation+2.58e-01

Pos logit contributions
Sure+0.007
Acc+0.007
 approval+0.007
Indeed+0.006
And+0.006
Neg logit contributions
 guns-0.006
 trouble-0.006
 men-0.006
 thieves-0.006
 problem-0.006
Token were
Token ID547
Feature activation-1.18e+00

Pos logit contributions
Sure+0.011
 approval+0.010
Acc+0.010
Of+0.010
Indeed+0.010
Neg logit contributions
 guns-0.010
 bulldo-0.010
 men-0.010
 trouble-0.010
mons-0.010
Token very
Token ID845
Feature activation-1.53e+00

Pos logit contributions
 guns+0.071
 problem+0.070
 trouble+0.070
 men+0.070
mons+0.070
Neg logit contributions
Sure-0.025
 approval-0.023
Indeed-0.021
 gently-0.021
Acc-0.021
Token happy
Token ID3772
Feature activation-2.21e-01

Pos logit contributions
 guns+0.102
 trouble+0.102
 problem+0.101
 men+0.101
 thieves+0.101
Neg logit contributions
Sure-0.024
 approval-0.019
Acc-0.018
Indeed-0.018
Of-0.017
Token.
Token ID13
Feature activation-4.36e-01

Pos logit contributions
irens+0.012
 bulldo+0.012
 problem+0.012
 men+0.012
 thieves+0.012
Neg logit contributions
Sure-0.006
 gently-0.005
 approval-0.005
Acc-0.005
 forgive-0.005
Token They
Token ID1119
Feature activation-1.59e+00

Pos logit contributions
 trouble+0.016
 guns+0.016
 problem+0.016
 thieves+0.016
 men+0.016
Neg logit contributions
Sure-0.019
 approval-0.018
Acc-0.018
Indeed-0.017
Of-0.017
Token shared
Token ID4888
Feature activation+1.44e+00

Pos logit contributions
 guns+0.069
 bulldo+0.068
 trouble+0.067
 men+0.067
 thieves+0.067
Neg logit contributions
Sure-0.059
 approval-0.056
Acc-0.054
Of-0.052
Indeed-0.052
Token the
Token ID262
Feature activation+7.83e-01

Pos logit contributions
Sure+0.067
Acc+0.063
 approval+0.061
Of+0.061
Indeed+0.060
Neg logit contributions
 guns-0.054
 trouble-0.054
 men-0.051
 thieves-0.050
 sacks-0.049
Token is
Token ID318
Feature activation-5.98e-01

Pos logit contributions
Sure+0.049
 gently+0.047
 who+0.043
of+0.042
Of+0.042
Neg logit contributions
irens-0.071
 trouble-0.071
 thieves-0.069
 virus-0.068
 men-0.067
Token so
Token ID523
Feature activation-1.92e+00

Pos logit contributions
irens+0.035
mons+0.033
riger+0.032
 bulldo+0.032
 thieves+0.032
Neg logit contributions
 gently-0.014
 approval-0.013
Sure-0.013
 agreement-0.012
And-0.012
Token graceful
Token ID44363
Feature activation-6.91e-01

Pos logit contributions
 trouble+0.091
 guns+0.090
 thieves+0.090
 men+0.090
 problem+0.090
Neg logit contributions
Sure-0.066
 approval-0.060
Acc-0.058
Indeed-0.058
Of-0.057
Token.
Token ID13
Feature activation-6.35e-01

Pos logit contributions
 problem+0.037
 robbery+0.037
 bulldo+0.037
irens+0.037
 trouble+0.037
Neg logit contributions
 gently-0.017
 who-0.017
 softly-0.016
Sure-0.016
And-0.015
Token It
Token ID632
Feature activation+1.24e+00

Pos logit contributions
 men+0.021
 guns+0.021
 bulldo+0.020
mons+0.019
 sacks+0.019
Neg logit contributions
Acc-0.030
 approval-0.028
Sure-0.028
of-0.027
isy-0.027
Token swim
Token ID9422
Feature activation-9.81e-01

Pos logit contributions
Sure+0.042
 gently+0.040
Of+0.037
 who+0.037
 forgive+0.037
Neg logit contributions
irens-0.060
 trouble-0.058
 sale-0.056
 thieves-0.056
 virus-0.056
Tokens
Token ID82
Feature activation-1.73e+00

Pos logit contributions
 trouble+0.043
 guns+0.043
 men+0.042
 problem+0.042
 thieves+0.042
Neg logit contributions
Sure-0.037
 approval-0.034
Acc-0.033
Indeed-0.033
Of-0.033
Token like
Token ID588
Feature activation+1.82e+00

Pos logit contributions
 guns+0.077
 trouble+0.076
 men+0.075
 problem+0.075
 thieves+0.074
Neg logit contributions
Sure-0.066
 approval-0.061
Acc-0.060
Indeed-0.059
Of-0.058
Token a
Token ID257
Feature activation+7.62e-01

Pos logit contributions
Sure+0.057
 approval+0.051
Of+0.049
Indeed+0.049
Acc+0.049
Neg logit contributions
 problem-0.089
irens-0.089
 trouble-0.089
 bulldo-0.089
 men-0.088
Token dancer
Token ID38619
Feature activation+1.02e+00

Pos logit contributions
Sure+0.033
Acc+0.030
 approval+0.030
Indeed+0.030
Of+0.029
Neg logit contributions
 guns-0.030
 trouble-0.030
 problem-0.030
 men-0.029
 thieves-0.029
Token."
Token ID526
Feature activation+3.49e-01

Pos logit contributions
 who+0.026
 gently+0.024
Sure+0.024
 approval+0.022
 softly+0.022
Neg logit contributions
irens-0.057
 trouble-0.055
 bulldo-0.055
mons-0.055
 problem-0.054
Token the
Token ID262
Feature activation+1.59e+00

Pos logit contributions
Sure+0.024
 approval+0.022
Indeed+0.021
 gently+0.021
Acc+0.021
Neg logit contributions
mons-0.036
 thieves-0.035
 men-0.035
 problem-0.035
 bulldo-0.035
Token little
Token ID1310
Feature activation+5.96e+00

Pos logit contributions
Sure+0.078
 approval+0.076
Acc+0.075
Of+0.072
 forgive+0.071
Neg logit contributions
 problem-0.058
 bulldo-0.056
 guns-0.055
 men-0.054
 thieves-0.054
Token boy
Token ID2933
Feature activation+3.88e+00

Pos logit contributions
Sure+0.235
 approval+0.216
Acc+0.211
Indeed+0.209
Of+0.208
Neg logit contributions
 trouble-0.251
 guns-0.251
 men-0.248
 problem-0.246
 thieves-0.246
Token's
Token ID338
Feature activation+7.57e-01

Pos logit contributions
Sure+0.148
 approval+0.136
 gently+0.133
Acc+0.130
Indeed+0.130
Neg logit contributions
 trouble-0.166
 men-0.165
 thieves-0.162
 guns-0.162
 problem-0.162
Token shoulder
Token ID8163
Feature activation+2.77e+00

Pos logit contributions
Sure+0.037
Acc+0.035
 approval+0.034
Indeed+0.034
Of+0.033
Neg logit contributions
 problem-0.027
 trouble-0.026
 guns-0.026
 men-0.025
 bulldo-0.025
Token.
Token ID13
Feature activation-5.75e-01

Pos logit contributions
Sure+0.086
 gently+0.077
 approval+0.077
Of+0.073
Indeed+0.072
Neg logit contributions
 trouble-0.138
 thieves-0.138
 men-0.137
irens-0.136
mons-0.136
Token He
Token ID679
Feature activation+8.07e-02

Pos logit contributions
 trouble+0.029
 thieves+0.029
 problem+0.029
 men+0.029
 guns+0.029
Neg logit contributions
Sure-0.019
Of-0.016
 approval-0.016
Indeed-0.016
Acc-0.015
Token begged
Token ID32946
Feature activation-8.80e-01

Pos logit contributions
Sure+0.003
 approval+0.003
Acc+0.003
Indeed+0.003
Of+0.003
Neg logit contributions
 guns-0.003
 trouble-0.003
 men-0.003
 problem-0.003
 thieves-0.003
Token and
Token ID290
Feature activation+1.10e+00

Pos logit contributions
 guns+0.035
 trouble+0.034
 men+0.033
 bulldo+0.033
 problem+0.032
Neg logit contributions
Sure-0.038
Acc-0.037
 approval-0.036
Indeed-0.036
Of-0.035
Token begged
Token ID32946
Feature activation-8.16e-01

Pos logit contributions
Sure+0.047
 gently+0.042
Of+0.041
 approval+0.040
 forgive+0.040
Neg logit contributions
 trouble-0.048
 problem-0.048
unes-0.047
irens-0.046
mons-0.046
Token the
Token ID262
Feature activation-6.64e-01

Pos logit contributions
 guns+0.029
 trouble+0.028
 men+0.028
 bulldo+0.027
 problem+0.027
Neg logit contributions
Sure-0.038
Acc-0.037
Indeed-0.036
 approval-0.036
Of-0.036
Token had
Token ID550
Feature activation-8.24e-01

Pos logit contributions
Sure+0.042
 approval+0.039
Acc+0.038
Indeed+0.037
Of+0.037
Neg logit contributions
 trouble-0.054
 problem-0.053
 thieves-0.053
 guns-0.052
irens-0.052
Token fun
Token ID1257
Feature activation+7.31e-01

Pos logit contributions
 trouble+0.036
 guns+0.036
 men+0.035
 problem+0.034
 thieves+0.034
Neg logit contributions
Sure-0.032
 approval-0.029
Acc-0.029
Of-0.029
Indeed-0.028
Token.
Token ID13
Feature activation-1.62e-01

Pos logit contributions
Sure+0.021
 approval+0.019
And+0.019
Acc+0.018
Indeed+0.018
Neg logit contributions
 thieves-0.038
 trouble-0.037
 bulldo-0.037
 men-0.037
 problem-0.037
Token One
Token ID1881
Feature activation+9.48e-01

Pos logit contributions
 trouble+0.007
 guns+0.007
 men+0.007
 problem+0.007
 bulldo+0.007
Neg logit contributions
Sure-0.006
 approval-0.006
Acc-0.005
Indeed-0.005
Of-0.005
Token of
Token ID286
Feature activation+5.75e-02

Pos logit contributions
Sure+0.036
 approval+0.033
Indeed+0.032
Of+0.032
Acc+0.032
Neg logit contributions
 guns-0.041
 trouble-0.041
 thieves-0.041
irens-0.040
 men-0.040
Token the
Token ID262
Feature activation-8.32e-01

Pos logit contributions
Sure+0.003
 approval+0.002
Acc+0.002
Of+0.002
Indeed+0.002
Neg logit contributions
 guns-0.002
 trouble-0.002
 men-0.002
 thieves-0.002
 problem-0.002
Token games
Token ID1830
Feature activation-4.49e-02

Pos logit contributions
 bulldo+0.038
irens+0.038
 bullets+0.037
 guns+0.037
drivers+0.037
Neg logit contributions
Sure-0.030
 approval-0.028
 agreement-0.026
Acc-0.026
Indeed-0.026
Token was
Token ID373
Feature activation+1.24e+00

Pos logit contributions
irens+0.003
drivers+0.002
riger+0.002
 bulldo+0.002
unes+0.002
Neg logit contributions
Sure-0.001
 who-0.001
 smiled-0.001
 forgive-0.001
 loved-0.001
Token "
Token ID366
Feature activation-1.60e+00

Pos logit contributions
Sure+0.034
 gently+0.033
 approval+0.031
Acc+0.031
 forgive+0.030
Neg logit contributions
mons-0.064
 thieves-0.064
 bulldo-0.064
irens-0.063
 guns-0.062
Tokencl
Token ID565
Feature activation+1.80e+00

Pos logit contributions
mons+0.068
 guns+0.066
 sale+0.066
 trouble+0.065
 thieves+0.065
Neg logit contributions
Sure-0.057
 approval-0.057
Acc-0.056
 agreement-0.053
Indeed-0.052
Tokenap
Token ID499
Feature activation+7.23e-01

Pos logit contributions
Sure+0.072
 approval+0.065
Acc+0.064
Indeed+0.062
Of+0.062
Neg logit contributions
 trouble-0.079
 men-0.078
 guns-0.077
 thieves-0.077
 problem-0.076
Token "
Token ID366
Feature activation-1.42e+00

Pos logit contributions
 guns+0.027
 trouble+0.027
 men+0.026
 problem+0.026
 bulldo+0.026
Neg logit contributions
Sure-0.026
 approval-0.025
Acc-0.024
Indeed-0.024
Of-0.023
TokenI
Token ID40
Feature activation+3.83e+00

Pos logit contributions
 trouble+0.064
mons+0.060
irens+0.059
 thieves+0.059
 men+0.058
Neg logit contributions
 approval-0.046
Sure-0.045
 agreement-0.043
 gently-0.042
 forgive-0.041
Token wish
Token ID4601
Feature activation+4.35e-01

Pos logit contributions
Sure+0.134
 approval+0.120
Acc+0.118
Of+0.116
Indeed+0.116
Neg logit contributions
 guns-0.184
 men-0.181
 thieves-0.180
irens-0.179
 trouble-0.179
Token I
Token ID314
Feature activation+2.65e+00

Pos logit contributions
Sure+0.017
 approval+0.015
Of+0.015
Acc+0.015
 gently+0.015
Neg logit contributions
 bulldo-0.018
 men-0.018
 trouble-0.018
 guns-0.018
 thieves-0.018
Token never
Token ID1239
Feature activation+7.71e-01

Pos logit contributions
Sure+0.103
 approval+0.100
Acc+0.098
 agreement+0.097
Indeed+0.093
Neg logit contributions
 trouble-0.109
 guns-0.106
 problem-0.106
 bulldo-0.104
 thieves-0.104
Token gave
Token ID2921
Feature activation-1.15e+00

Pos logit contributions
Sure+0.030
 approval+0.027
Acc+0.027
Indeed+0.026
Of+0.026
Neg logit contributions
 trouble-0.034
 guns-0.033
 problem-0.033
 thieves-0.033
 men-0.033
Token my
Token ID616
Feature activation+1.94e+00

Pos logit contributions
 trouble+0.046
 guns+0.045
 thieves+0.045
 problem+0.044
 men+0.043
Neg logit contributions
Sure-0.050
Acc-0.046
 approval-0.046
Indeed-0.045
Of-0.045
Token bug
Token ID5434
Feature activation+3.26e+00

Pos logit contributions
Sure+0.083
 approval+0.077
Acc+0.075
Indeed+0.074
Of+0.074
Neg logit contributions
 trouble-0.076
 guns-0.076
 men-0.075
 problem-0.075
 thieves-0.074
Token to
Token ID284
Feature activation-4.14e-01

Pos logit contributions
Sure+0.110
 approval+0.101
Of+0.095
Acc+0.095
Indeed+0.094
Neg logit contributions
 men-0.155
 trouble-0.155
 guns-0.154
 bulldo-0.153
 problem-0.152
Token you
Token ID345
Feature activation+1.40e+00

Pos logit contributions
 trouble+0.015
 guns+0.014
 thieves+0.014
 problem+0.014
 sale+0.014
Neg logit contributions
Sure-0.020
Acc-0.019
Indeed-0.018
Of-0.018
 approval-0.017
Token,
Token ID11
Feature activation-3.86e-01

Pos logit contributions
Sure+0.054
Acc+0.049
 approval+0.049
Indeed+0.048
Of+0.048
Neg logit contributions
 trouble-0.061
 guns-0.061
 problem-0.061
 thieves-0.060
 men-0.060
Token on
Token ID319
Feature activation-3.92e-01

Pos logit contributions
 trouble+0.060
 guns+0.060
 men+0.060
 problem+0.060
 thieves+0.059
Neg logit contributions
Sure-0.053
 approval-0.048
Acc-0.047
Indeed-0.047
Of-0.046
Token the
Token ID262
Feature activation+1.42e-01

Pos logit contributions
mons+0.020
 bulldo+0.020
 thieves+0.020
 problem+0.020
 robbery+0.020
Neg logit contributions
Sure-0.010
 gently-0.010
 approval-0.010
Indeed-0.009
 agreement-0.009
Token island
Token ID7022
Feature activation+1.50e-02

Pos logit contributions
Sure+0.007
 approval+0.007
Acc+0.007
Of+0.007
Indeed+0.007
Neg logit contributions
 guns-0.005
 men-0.005
 trouble-0.005
 problem-0.005
 thieves-0.004
Token were
Token ID547
Feature activation-1.17e+00

Pos logit contributions
Sure+0.001
 approval+0.000
 gently+0.000
Acc+0.000
Indeed+0.000
Neg logit contributions
 trouble-0.001
 thieves-0.001
 bulldo-0.001
irens-0.001
 guns-0.001
Token very
Token ID845
Feature activation-1.42e+00

Pos logit contributions
mons+0.066
irens+0.064
riger+0.063
 virus+0.063
unes+0.062
Neg logit contributions
 who-0.030
 gently-0.029
 happy-0.029
 forgive-0.027
 softly-0.027
Token thirsty
Token ID47124
Feature activation-5.04e+00

Pos logit contributions
irens+0.080
 guns+0.080
 thieves+0.079
mons+0.079
 trouble+0.078
Neg logit contributions
Sure-0.035
 gently-0.030
 approval-0.030
 forgive-0.029
Indeed-0.029
Token,
Token ID11
Feature activation-1.99e+00

Pos logit contributions
irens+0.312
 bulldo+0.307
riger+0.302
mons+0.300
 robbery+0.296
Neg logit contributions
 gently-0.094
 who-0.092
Sure-0.085
 softly-0.079
 forgive-0.079
Token but
Token ID475
Feature activation+8.46e-04

Pos logit contributions
 problem+0.116
 thieves+0.115
mons+0.114
irens+0.114
riger+0.114
Neg logit contributions
Sure-0.043
 gently-0.042
 who-0.039
 softly-0.037
 quietly-0.036
Token no
Token ID645
Feature activation-8.96e-01

Pos logit contributions
Sure+0.000
 gently+0.000
 approval+0.000
Indeed+0.000
Acc+0.000
Neg logit contributions
irens-0.000
 bulldo-0.000
 trouble-0.000
mons-0.000
 thieves-0.000
Token matter
Token ID2300
Feature activation-2.48e-01

Pos logit contributions
 trouble+0.042
 guns+0.042
 problem+0.042
 men+0.042
 thieves+0.042
Neg logit contributions
Sure-0.031
 approval-0.028
Acc-0.028
Indeed-0.028
Of-0.027
Token how
Token ID703
Feature activation+2.02e+00

Pos logit contributions
 guns+0.014
 trouble+0.013
 problem+0.013
 thieves+0.013
 men+0.013
Neg logit contributions
Sure-0.007
 approval-0.006
Acc-0.006
Of-0.006
Indeed-0.006
Token what
Token ID644
Feature activation+1.74e+00

Pos logit contributions
 trouble+0.012
 guns+0.012
 men+0.012
 thieves+0.012
 problem+0.012
Neg logit contributions
Sure-0.010
Acc-0.010
 approval-0.009
Of-0.009
Indeed-0.009
Token the
Token ID262
Feature activation-1.95e+00

Pos logit contributions
Sure+0.061
 approval+0.055
 forgive+0.054
 gently+0.054
Indeed+0.054
Neg logit contributions
unes-0.082
mons-0.082
irens-0.081
 bulldo-0.079
inc-0.079
Token man
Token ID582
Feature activation-1.38e+00

Pos logit contributions
 guns+0.072
 trouble+0.070
 men+0.070
 thieves+0.069
 bulldo+0.069
Neg logit contributions
Sure-0.088
 approval-0.083
Indeed-0.080
Of-0.080
Acc-0.079
Token was
Token ID373
Feature activation-6.76e-01

Pos logit contributions
irens+0.070
mons+0.069
 trouble+0.068
 guns+0.068
unes+0.067
Neg logit contributions
Sure-0.043
 gently-0.040
 who-0.039
Indeed-0.038
 forgive-0.038
Token looking
Token ID2045
Feature activation-5.85e-01

Pos logit contributions
mons+0.031
 pistols+0.030
 bullets+0.029
irens+0.029
 guns+0.029
Neg logit contributions
 who-0.025
 gently-0.025
 forgive-0.025
 eagerly-0.024
 carefully-0.024
Token for
Token ID329
Feature activation-4.76e+00

Pos logit contributions
irens+0.021
mons+0.021
 guns+0.021
 bulldo+0.020
 trouble+0.020
Neg logit contributions
Sure-0.026
 approval-0.025
 gently-0.024
Of-0.024
Indeed-0.024
Token -
Token ID532
Feature activation-1.79e+00

Pos logit contributions
 problem+0.235
mons+0.233
 guns+0.233
 bulldo+0.233
irens+0.232
Neg logit contributions
Sure-0.146
 approval-0.140
 gently-0.128
Indeed-0.127
 forgive-0.126
Token a
Token ID257
Feature activation-6.16e-01

Pos logit contributions
mons+0.081
 problem+0.080
irens+0.079
 guns+0.078
 thieves+0.077
Neg logit contributions
Sure-0.067
 approval-0.062
 gently-0.061
Of-0.060
Indeed-0.060
Token book
Token ID1492
Feature activation-3.85e+00

Pos logit contributions
 guns+0.026
 trouble+0.026
 men+0.025
 thieves+0.025
 problem+0.025
Neg logit contributions
Sure-0.025
 approval-0.023
Acc-0.022
Indeed-0.022
Of-0.022
Token he
Token ID339
Feature activation-2.71e+00

Pos logit contributions
mons+0.200
 thieves+0.188
 guns+0.187
 sacks+0.187
 trouble+0.186
Neg logit contributions
Sure-0.117
 who-0.114
 gently-0.111
of-0.110
 forgive-0.104
Token had
Token ID550
Feature activation-1.55e+00

Pos logit contributions
 trouble+0.117
 guns+0.115
 problem+0.115
 men+0.115
 thieves+0.114
Neg logit contributions
Sure-0.105
 approval-0.098
Acc-0.096
Indeed-0.093
Of-0.093
Token foolish
Token ID19538
Feature activation-3.03e+00

Pos logit contributions
 guns+0.051
 bullets+0.049
 bulldo+0.049
 men+0.049
irens+0.049
Neg logit contributions
Sure-0.045
Acc-0.043
 gently-0.042
Of-0.041
 approval-0.041
Token.
Token ID13
Feature activation+3.71e-01

Pos logit contributions
 bulldo+0.163
irens+0.160
 sacks+0.157
 men+0.156
 guns+0.156
Neg logit contributions
Sure-0.078
 who-0.078
 gently-0.075
 softly-0.072
Acc-0.070
Token She
Token ID1375
Feature activation+2.87e-01

Pos logit contributions
 approval+0.015
Sure+0.015
 agreement+0.015
Acc+0.014
 gently+0.014
Neg logit contributions
 trouble-0.014
uts-0.013
unes-0.013
 problem-0.013
mons-0.013
Token knew
Token ID2993
Feature activation+1.05e+00

Pos logit contributions
Sure+0.010
 approval+0.010
Of+0.009
Acc+0.009
Indeed+0.009
Neg logit contributions
 guns-0.013
 trouble-0.013
 bulldo-0.013
 problem-0.013
 men-0.013
Token that
Token ID326
Feature activation-1.10e+00

Pos logit contributions
Sure+0.045
 approval+0.042
Acc+0.041
Indeed+0.041
Of+0.041
Neg logit contributions
 trouble-0.041
 guns-0.041
 men-0.040
 problem-0.040
 thieves-0.040
Token folders
Token ID24512
Feature activation-5.99e+00

Pos logit contributions
 thieves+0.055
 men+0.055
 bulldo+0.054
irens+0.054
 guns+0.053
Neg logit contributions
Sure-0.035
Acc-0.033
 approval-0.032
Indeed-0.031
Of-0.031
Token weren
Token ID6304
Feature activation-2.25e+00

Pos logit contributions
 bulldo+0.325
irens+0.322
 thieves+0.313
 men+0.311
 trouble+0.306
Neg logit contributions
Sure-0.170
 gently-0.163
 forgive-0.160
 approval-0.151
Indeed-0.149
Token't
Token ID470
Feature activation-2.06e+00

Pos logit contributions
mons+0.099
 men+0.099
 guns+0.097
 thieves+0.095
 sacks+0.095
Neg logit contributions
Sure-0.092
 gently-0.083
of-0.081
Indeed-0.079
And-0.079
Token meant
Token ID4001
Feature activation-2.99e+00

Pos logit contributions
irens+0.125
mons+0.122
 bulldo+0.118
ops+0.118
ables+0.113
Neg logit contributions
 gently-0.057
 who-0.052
 even-0.051
 happy-0.051
 carefully-0.048
Token to
Token ID284
Feature activation-2.64e+00

Pos logit contributions
irens+0.156
 bulldo+0.154
riger+0.146
 men+0.144
 ditch+0.144
Neg logit contributions
 gently-0.096
 who-0.091
 forgive-0.088
 softly-0.085
 carefully-0.085
Token be
Token ID307
Feature activation-6.72e-01

Pos logit contributions
 men+0.134
 guns+0.134
 trouble+0.133
 bulldo+0.133
 problem+0.131
Neg logit contributions
Sure-0.082
 approval-0.075
Of-0.070
Acc-0.070
Indeed-0.070
Token\n
Token ID198
Feature activation+4.23e-01

Pos logit contributions
Acc+0.005
 approval+0.005
Sure+0.005
Of+0.004
 agreement+0.004
Neg logit contributions
irens-0.007
 bulldo-0.007
 trouble-0.007
 men-0.007
 guns-0.007
TokenBig
Token ID12804
Feature activation+4.92e-01

Pos logit contributions
Sure+0.012
 approval+0.011
Acc+0.011
Of+0.010
Indeed+0.010
Neg logit contributions
 trouble-0.022
 men-0.022
 guns-0.022
 bulldo-0.022
irens-0.022
Token John
Token ID1757
Feature activation+1.66e+00

Pos logit contributions
Sure+0.016
 approval+0.014
Indeed+0.014
Of+0.014
of+0.013
Neg logit contributions
 guns-0.024
 thieves-0.024
mons-0.024
 trouble-0.024
 men-0.023
Token was
Token ID373
Feature activation-3.21e-01

Pos logit contributions
Sure+0.049
 gently+0.043
 approval+0.042
Indeed+0.042
Of+0.041
Neg logit contributions
 trouble-0.090
 guns-0.088
 thieves-0.088
 problem-0.087
irens-0.086
Token feeling
Token ID4203
Feature activation-1.23e+00

Pos logit contributions
 guns+0.017
mons+0.016
 problem+0.016
 thieves+0.016
 bulldo+0.016
Neg logit contributions
Sure-0.010
 approval-0.009
Indeed-0.009
Acc-0.009
 gently-0.008
Token generous
Token ID14431
Feature activation-5.19e+00

Pos logit contributions
 guns+0.064
 thieves+0.063
mons+0.062
 men+0.062
 problem+0.062
Neg logit contributions
Sure-0.037
 approval-0.034
Acc-0.032
Indeed-0.032
Of-0.032
Token,
Token ID11
Feature activation-1.63e+00

Pos logit contributions
mons+0.308
irens+0.306
 guns+0.303
 bulldo+0.302
 thieves+0.296
Neg logit contributions
Sure-0.116
 approval-0.107
 gently-0.104
 who-0.103
 forgive-0.096
Token so
Token ID523
Feature activation-1.78e+00

Pos logit contributions
 guns+0.082
 thieves+0.082
 problem+0.082
mons+0.081
 trouble+0.081
Neg logit contributions
Sure-0.052
 approval-0.046
Of-0.046
Indeed-0.045
Acc-0.045
Token he
Token ID339
Feature activation-2.72e+00

Pos logit contributions
mons+0.075
 thieves+0.073
 guns+0.072
unes+0.071
 trouble+0.070
Neg logit contributions
Sure-0.075
 approval-0.068
Indeed-0.067
Of-0.066
 gently-0.066
Token happily
Token ID18177
Feature activation+1.67e+00

Pos logit contributions
 trouble+0.142
irens+0.138
 thieves+0.138
mons+0.137
 guns+0.137
Neg logit contributions
Sure-0.091
 gently-0.082
Of-0.079
 approval-0.077
Indeed-0.077
Token agreed
Token ID4987
Feature activation-2.63e-01

Pos logit contributions
Sure+0.059
 approval+0.053
Of+0.052
 gently+0.051
Indeed+0.051
Neg logit contributions
 trouble-0.082
 guns-0.079
 problem-0.079
 thieves-0.079
 men-0.078
Token together
Token ID1978
Feature activation+7.55e-01

Pos logit contributions
 trouble+0.076
 virus+0.075
irens+0.074
 bulldo+0.074
 paramed+0.073
Neg logit contributions
 gently-0.035
 carefully-0.031
 who-0.031
 softly-0.028
 eagerly-0.027
Token.
Token ID13
Feature activation-3.31e-01

Pos logit contributions
Sure+0.025
 approval+0.023
Indeed+0.022
Acc+0.021
 gently+0.021
Neg logit contributions
 trouble-0.037
 guns-0.036
 thieves-0.036
 problem-0.036
 men-0.036
Token They
Token ID1119
Feature activation-1.17e+00

Pos logit contributions
 trouble+0.012
 guns+0.012
 men+0.012
 bulldo+0.011
 thieves+0.011
Neg logit contributions
Sure-0.015
 approval-0.014
Acc-0.014
Indeed-0.014
Of-0.014
Token have
Token ID423
Feature activation+1.65e-01

Pos logit contributions
 trouble+0.058
 guns+0.058
 men+0.057
 problem+0.057
 thieves+0.057
Neg logit contributions
Sure-0.038
 approval-0.034
Acc-0.033
Indeed-0.033
Of-0.032
Token ten
Token ID3478
Feature activation+3.52e+00

Pos logit contributions
Sure+0.005
 approval+0.005
 gently+0.005
Indeed+0.005
Acc+0.005
Neg logit contributions
 bulldo-0.008
irens-0.008
mons-0.008
 problem-0.008
 thieves-0.008
Token dollars
Token ID5054
Feature activation-5.05e+00

Pos logit contributions
 gently+0.173
 who+0.171
 approval+0.168
 carefully+0.165
of+0.161
Neg logit contributions
 virus-0.105
mons-0.098
storm-0.096
ained-0.094
 bulldo-0.092
Token.
Token ID13
Feature activation-4.60e-01

Pos logit contributions
 virus+0.291
 bulldo+0.286
arre+0.278
 paramed+0.274
riger+0.273
Neg logit contributions
 gently-0.125
 carefully-0.110
Sure-0.108
 then-0.107
 who-0.107
Token Sam
Token ID3409
Feature activation-4.82e-02

Pos logit contributions
 trouble+0.015
 guns+0.015
 men+0.015
mons+0.015
 bulldo+0.014
Neg logit contributions
Sure-0.022
 approval-0.021
Acc-0.021
Indeed-0.020
Of-0.020
Token is
Token ID318
Feature activation-1.40e-01

Pos logit contributions
 guns+0.002
 trouble+0.002
irens+0.002
 thieves+0.002
mons+0.002
Neg logit contributions
Sure-0.002
 gently-0.002
 approval-0.002
Indeed-0.002
Acc-0.001
Token happy
Token ID3772
Feature activation+2.92e-01

Pos logit contributions
 guns+0.008
 bulldo+0.008
 trouble+0.008
 problem+0.008
mons+0.008
Neg logit contributions
Sure-0.003
 approval-0.003
Acc-0.003
Indeed-0.003
Of-0.002
Token.
Token ID13
Feature activation-5.31e-01

Pos logit contributions
Sure+0.008
 approval+0.007
 gently+0.007
 forgive+0.007
Acc+0.007
Neg logit contributions
 bulldo-0.016
 guns-0.015
irens-0.015
mons-0.015
 men-0.015