TOP ACTIVATIONS
MAX = 0.202

friendly and asked the l itt ler frog to join them
She grabbed oven m itt s and carefully pulled out
took off the oven m itt s and opened the oven
little hat and oven m itt s . She took a
this flower was even l itt ler than the da isy
. She used a m itt to take out the pan
there was a three - year - old who found a
sometimes , even the l itt lest things can lead to
and get it , k itt ies !" she called ,
, there was a l itt tle girl . She had
put on her oven m itt s and z ipped open
put on an oven m itt and a a pron .
her depend able oven m itt s and bake y ummy
was an average three - year - old boy . He
They put on oven m itt s and took out the
clown s juggling and ac rob ats doing flips . But
thought they looked like ac rob ats . She enjoyed the
mum and her three - year - old daughter went to
saw a little 3 - year - old boy . He
They get a m itt and a spat ula .

MIN ACTIVATIONS
MIN = -0.128

eat some of the fish ~ â Jane
't judge things by their appearance , be curious and respectful
a trick for you . Watch this ." He took the
also see words in different languages . " What
" Yes , part . Watch this ." They
that it 's not the appearance of things that matter ,
play with their toys and forget about the tiger . They
mouth . And don 't forget to share ."
lot of fun . They forget about their impat ience .
't judge someone by their appearance . Sometimes they are not
. They have fun and forget about the juice . They
lesson . They did not forget their new shoes again .
fly fast and loud . Watch this !" Max
again tomorrow . Don 't forget me !" L
They have fun . They forget about the hurt . <|endoftext|>
They have fun . They forget about the sun , the
the sees aw . They forget about the new boy and
will take some money . Watch me ." He runs to
and their music . They forget about the lion and the
tomorrow . And don 't forget your bone . It 's

INTERVAL 0.119 - 0.202
CONTAINS 0.020%

The little three - year - old t ipt o
over . K aren thought , " You want
he heard a loud me owing . He peek ed out
One day , a three - year - old boy was
my was a 3 - year - old who loved to

INTERVAL 0.037 - 0.119
CONTAINS 4.994%

magical nail went around freezing everything in sight . Everywhere he
, " But I want milk now !" and started to
, " That is very sweet of you . But you
ens . When the 30 minutes were up , Lily turned
tell that his mom was struggling to make it . He

INTERVAL -0.045 - 0.037
CONTAINS 92.023%

day on , Tim my shared all his toys with his
will find a good home with other kids who will love
there was a little boy named Tim . Tim was a
joined the castle . They had a lot of fun in
She asked the store worker , " Can you get

INTERVAL -0.128 - -0.045
CONTAINS 2.963%

flowers , some looked like animals . Lily and Sam cl
man . He saw that Jack was really thirsty . "
it . L ily took off one of her
was over , there were two winners - one with a
decided to have a dance party . He invited all of
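
The four INTERVAL ranges above look consistent with splitting the span between MIN (-0.128) and MAX (0.202) into four equal-width bins, and the four CONTAINS percentages sum to 100%. The sketch below only checks that arithmetic. The equal-width rule, the helper name `bin_edges`, and the constant `N_BINS` are assumptions made for illustration, not something the dashboard states, and the last printed digit can differ slightly from the displayed edges because MIN and MAX are themselves rounded.

```python
# Values copied from the dashboard above.
ACT_MIN, ACT_MAX = -0.128, 0.202
N_BINS = 4  # assumption: four equal-width intervals


def bin_edges(lo, hi, n):
    """Return n + 1 equally spaced edges from lo to hi (hypothetical helper)."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


edges = bin_edges(ACT_MIN, ACT_MAX, N_BINS)

# Reported share of tokens per interval, lowest interval first.
reported_pct = [2.963, 92.023, 4.994, 0.020]

# Pair each consecutive (left, right) edge with its reported percentage.
for (left, right), pct in zip(zip(edges, edges[1:]), reported_pct):
    print(f"{left:+.4f} to {right:+.4f}: {pct:.3f}%")

print(f"total: {sum(reported_pct):.3f}%")
```

Read this way, about 92% of tokens fall in the bin that straddles zero, and only 0.020% reach the top bin where the `itt` and `year` tokens sit.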
Token friendly
Token ID8030
Feature activation-2.30e-03

Pos logit contributions
 rescued+0.028
eh+0.027
vous+0.027
 deliber+0.026
year+0.024
Neg logit contributions
 outfits-0.040
 appearance-0.040
 shirts-0.038
 houses-0.038
 languages-0.038
Token and
Token ID290
Feature activation+8.37e-04

Pos logit contributions
 outfits+0.004
 appearance+0.004
 stations+0.004
 shirts+0.004
utes+0.004
Neg logit contributions
 rescued-0.004
 deliber-0.003
vous-0.003
eh-0.003
 eaten-0.003
Token asked
Token ID1965
Feature activation+1.72e-03

Pos logit contributions
 rescued+0.001
eh+0.001
vous+0.001
 deliber+0.001
year+0.001
Neg logit contributions
 outfits-0.002
 appearance-0.002
 shirts-0.002
 stations-0.002
 houses-0.002
Token the
Token ID262
Feature activation+2.04e-02

Pos logit contributions
 rescued+0.003
eh+0.002
 deliber+0.002
vous+0.002
year+0.002
Neg logit contributions
 outfits-0.003
 appearance-0.003
 shirts-0.003
 stations-0.003
 houses-0.003
Token l
Token ID300
Feature activation+3.38e-02

Pos logit contributions
 rescued+0.032
vous+0.032
eh+0.030
 deliber+0.030
 eaten+0.028
Neg logit contributions
 outfits-0.039
 appearance-0.037
 shirts-0.036
 stations-0.036
ies-0.035
Tokenitt
Token ID715
Feature activation+2.02e-01

Pos logit contributions
 rescued+0.052
 deliber+0.050
vous+0.048
eh+0.046
 eaten+0.043
Neg logit contributions
 outfits-0.062
 hoops-0.060
 languages-0.059
 shirts-0.059
 appearance-0.059
Tokenler
Token ID1754
Feature activation+8.04e-03

Pos logit contributions
eh+0.414
 rescued+0.411
vous+0.398
on+0.395
dd+0.389
Neg logit contributions
 appearance-0.265
 outfits-0.262
 stations-0.254
 shirts-0.251
 houses-0.250
Token frog
Token ID21264
Feature activation-2.31e-02

Pos logit contributions
 rescued+0.010
eh+0.009
vous+0.009
on+0.008
sm+0.008
Neg logit contributions
 stations-0.017
 shirts-0.017
 appearance-0.017
 outfits-0.017
 houses-0.016
Token to
Token ID284
Feature activation+2.61e-02

Pos logit contributions
 shirts+0.047
 stations+0.046
 outfits+0.046
 appearance+0.046
 heads+0.045
Neg logit contributions
 rescued-0.031
eh-0.029
sm-0.027
year-0.027
vous-0.027
Token join
Token ID4654
Feature activation-6.18e-02

Pos logit contributions
 rescued+0.043
 oblig+0.037
 deliber+0.037
eh+0.037
vous+0.037
Neg logit contributions
 outfits-0.051
 stations-0.049
 shirts-0.049
lights-0.048
ies-0.047
Token them
Token ID606
Feature activation-1.97e-02

Pos logit contributions
 appearance+0.114
 outfits+0.111
 stations+0.109
 shirts+0.107
 houses+0.103
Neg logit contributions
 rescued-0.098
eh-0.091
 deliber-0.089
year-0.088
on-0.088
Token
Token ID220
Feature activation+3.83e-02

Pos logit contributions
 rescued+0.021
eh+0.021
vous+0.020
 deliber+0.020
year+0.018
Neg logit contributions
 outfits-0.025
 appearance-0.024
 shirts-0.024
 houses-0.023
 languages-0.023
Token She
Token ID1375
Feature activation+1.98e-02

Pos logit contributions
 rescued+0.056
eh+0.054
vous+0.053
 deliber+0.052
year+0.047
Neg logit contributions
 outfits-0.074
 appearance-0.073
 shirts-0.071
 houses-0.071
 stations-0.070
Token grabbed
Token ID13176
Feature activation+1.56e-02

Pos logit contributions
eh+0.029
 rescued+0.029
vous+0.029
 deliber+0.028
year+0.026
Neg logit contributions
 outfits-0.037
 appearance-0.035
 shirts-0.035
ies-0.035
 houses-0.035
Token oven
Token ID14361
Feature activation+6.48e-02

Pos logit contributions
vous+0.026
eh+0.026
 rescued+0.025
 deliber+0.025
jected+0.023
Neg logit contributions
 outfits-0.026
 houses-0.026
 hoops-0.026
 shirts-0.025
 faces-0.025
Token m
Token ID285
Feature activation+5.50e-02

Pos logit contributions
 rescued+0.125
 deliber+0.117
eh+0.115
on+0.110
vous+0.110
Neg logit contributions
 appearance-0.099
 outfits-0.096
 languages-0.092
 stations-0.090
 hoops-0.089
Tokenitt
Token ID715
Feature activation+2.01e-01

Pos logit contributions
 rescued+0.056
eh+0.052
vous+0.050
 deliber+0.046
 saved+0.044
Neg logit contributions
 outfits-0.131
ies-0.129
 faces-0.127
 appearance-0.126
 shirts-0.125
Tokens
Token ID82
Feature activation-1.87e-02

Pos logit contributions
 rescued+0.455
eh+0.453
 deliber+0.437
vous+0.432
on+0.427
Neg logit contributions
 outfits-0.219
 appearance-0.219
 stations-0.205
 houses-0.202
 languages-0.201
Token and
Token ID290
Feature activation-2.51e-03

Pos logit contributions
 appearance+0.031
 outfits+0.030
 languages+0.030
 stations+0.029
 hoops+0.029
Neg logit contributions
 rescued-0.036
eh-0.032
on-0.032
 saved-0.031
 eaten-0.031
Token carefully
Token ID7773
Feature activation+2.70e-02

Pos logit contributions
 outfits+0.005
 houses+0.005
 shirts+0.004
ies+0.004
 appearance+0.004
Neg logit contributions
vous-0.004
eh-0.004
 rescued-0.004
 deliber-0.004
jected-0.003
Token pulled
Token ID5954
Feature activation+3.15e-02

Pos logit contributions
vous+0.042
eh+0.041
 rescued+0.040
 deliber+0.039
jected+0.036
Neg logit contributions
 outfits-0.050
 shirts-0.048
 appearance-0.047
 houses-0.046
ies-0.046
Token out
Token ID503
Feature activation+2.29e-02

Pos logit contributions
 rescued+0.051
eh+0.049
vous+0.047
 deliber+0.047
on+0.044
Neg logit contributions
 outfits-0.057
 appearance-0.057
 shirts-0.054
 languages-0.054
 stations-0.053
Token took
Token ID1718
Feature activation+6.12e-03

Pos logit contributions
 rescued+0.031
eh+0.031
vous+0.030
 deliber+0.029
on+0.027
Neg logit contributions
 outfits-0.042
 appearance-0.041
 shirts-0.040
 houses-0.040
ies-0.039
Token off
Token ID572
Feature activation-3.91e-02

Pos logit contributions
eh+0.012
vous+0.012
 deliber+0.012
 rescued+0.011
jected+0.011
Neg logit contributions
 houses-0.009
 outfits-0.008
 faces-0.008
 shirts-0.008
 stations-0.007
Token the
Token ID262
Feature activation+1.68e-02

Pos logit contributions
 houses+0.058
 outfits+0.056
 shirts+0.054
 faces+0.051
 hoops+0.050
Neg logit contributions
vous-0.080
eh-0.074
jected-0.072
 rescued-0.070
dd-0.070
Token oven
Token ID14361
Feature activation+6.49e-02

Pos logit contributions
 rescued+0.031
vous+0.030
dd+0.030
 deliber+0.030
eh+0.029
Neg logit contributions
 houses-0.026
 outfits-0.025
 faces-0.024
ies-0.024
 shirts-0.024
Token m
Token ID285
Feature activation+5.30e-02

Pos logit contributions
 rescued+0.133
eh+0.128
 deliber+0.126
vous+0.125
on+0.118
Neg logit contributions
 appearance-0.090
 outfits-0.089
 shirts-0.083
 languages-0.082
 stations-0.081
Tokenitt
Token ID715
Feature activation+1.96e-01

Pos logit contributions
 rescued+0.078
eh+0.074
vous+0.073
 deliber+0.070
 saved+0.066
Neg logit contributions
ies-0.102
 outfits-0.101
 faces-0.099
 houses-0.097
 appearance-0.097
Tokens
Token ID82
Feature activation-2.48e-02

Pos logit contributions
 rescued+0.434
eh+0.429
on+0.414
 deliber+0.412
vous+0.405
Neg logit contributions
 appearance-0.230
 outfits-0.226
 languages-0.213
 stations-0.211
 houses-0.208
Token and
Token ID290
Feature activation-1.04e-03

Pos logit contributions
 appearance+0.042
 hoops+0.041
 languages+0.040
 outfits+0.040
 stations+0.040
Neg logit contributions
 rescued-0.045
eh-0.040
on-0.040
vous-0.039
 eaten-0.039
Token opened
Token ID4721
Feature activation-3.88e-03

Pos logit contributions
 outfits+0.002
 appearance+0.002
 shirts+0.002
 houses+0.002
 stations+0.002
Neg logit contributions
eh-0.001
vous-0.001
 rescued-0.001
 deliber-0.001
year-0.001
Token the
Token ID262
Feature activation+1.69e-02

Pos logit contributions
 outfits+0.006
 houses+0.005
 appearance+0.005
 shirts+0.005
 faces+0.005
Neg logit contributions
vous-0.008
eh-0.008
 rescued-0.007
 deliber-0.007
jected-0.007
Token oven
Token ID14361
Feature activation+6.57e-02

Pos logit contributions
 rescued+0.027
eh+0.027
vous+0.026
 deliber+0.026
dd+0.024
Neg logit contributions
 outfits-0.030
 appearance-0.030
 houses-0.029
 shirts-0.029
 stations-0.028
Token little
Token ID1310
Feature activation-5.44e-04

Pos logit contributions
 rescued+0.009
eh+0.008
vous+0.008
 deliber+0.008
year+0.008
Neg logit contributions
 outfits-0.007
 appearance-0.007
 shirts-0.007
 houses-0.007
 stations-0.007
Token hat
Token ID6877
Feature activation-1.07e-02

Pos logit contributions
 languages+0.001
 stations+0.001
 appearance+0.001
 countries+0.001
 outfits+0.001
Neg logit contributions
 rescued-0.001
eh-0.001
 che-0.001
vous-0.001
 deliber-0.001
Token and
Token ID290
Feature activation-7.63e-03

Pos logit contributions
 appearance+0.019
 outfits+0.018
 languages+0.018
 countries+0.018
 stations+0.018
Neg logit contributions
 rescued-0.019
eh-0.018
on-0.017
 saved-0.016
vous-0.016
Token oven
Token ID14361
Feature activation+6.03e-02

Pos logit contributions
 appearance+0.015
 outfits+0.015
 languages+0.014
 stations+0.014
 countries+0.014
Neg logit contributions
 rescued-0.012
eh-0.011
 deliber-0.010
vous-0.010
 eaten-0.010
Token m
Token ID285
Feature activation+4.51e-02

Pos logit contributions
 rescued+0.087
eh+0.080
 deliber+0.078
vous+0.074
 eaten+0.073
Neg logit contributions
 appearance-0.124
 outfits-0.120
 languages-0.116
 countries-0.115
 stations-0.114
Tokenitt
Token ID715
Feature activation+1.96e-01

Pos logit contributions
 rescued+0.066
vous+0.065
eh+0.064
 deliber+0.060
year+0.055
Neg logit contributions
 outfits-0.088
 appearance-0.086
 shirts-0.086
ies-0.083
 faces-0.083
Tokens
Token ID82
Feature activation-2.52e-02

Pos logit contributions
 rescued+0.477
eh+0.476
dd+0.454
on+0.454
 deliber+0.452
Neg logit contributions
 appearance-0.185
 outfits-0.177
 languages-0.170
 houses-0.167
 stations-0.163
Token.
Token ID13
Feature activation+9.65e-03

Pos logit contributions
 appearance+0.050
 outfits+0.050
 languages+0.049
 countries+0.048
 shirts+0.048
Neg logit contributions
 rescued-0.037
eh-0.034
 eaten-0.032
 saved-0.031
 deliber-0.031
Token She
Token ID1375
Feature activation+1.30e-02

Pos logit contributions
 rescued+0.013
 deliber+0.012
eh+0.012
vous+0.012
year+0.011
Neg logit contributions
 outfits-0.020
 appearance-0.020
 shirts-0.019
ies-0.019
 stations-0.018
Token took
Token ID1718
Feature activation-2.76e-03

Pos logit contributions
 rescued+0.018
eh+0.018
vous+0.017
 deliber+0.017
year+0.015
Neg logit contributions
 outfits-0.026
 appearance-0.026
 shirts-0.025
 houses-0.025
 languages-0.025
Token a
Token ID257
Feature activation-7.54e-03

Pos logit contributions
 outfits+0.005
 shirts+0.005
 appearance+0.005
 houses+0.005
 stations+0.005
Neg logit contributions
eh-0.004
 rescued-0.004
vous-0.004
 deliber-0.004
jected-0.004
Token this
Token ID428
Feature activation-3.71e-02

Pos logit contributions
 outfits+0.035
 appearance+0.035
 stations+0.034
 shirts+0.034
utes+0.033
Neg logit contributions
 rescued-0.018
 deliber-0.016
 eaten-0.016
on-0.016
eh-0.016
Token flower
Token ID15061
Feature activation-1.57e-02

Pos logit contributions
 appearance+0.067
 stations+0.066
 outfits+0.065
 shirts+0.065
 houses+0.063
Neg logit contributions
eh-0.060
 rescued-0.056
vous-0.055
sm-0.051
 eaten-0.051
Token was
Token ID373
Feature activation+6.01e-03

Pos logit contributions
 stations+0.035
 appearance+0.034
 outfits+0.034
 shirts+0.034
 countries+0.033
Neg logit contributions
 rescued-0.019
on-0.017
 saved-0.017
eh-0.017
 helped-0.016
Token even
Token ID772
Feature activation-3.88e-02

Pos logit contributions
 rescued+0.008
eh+0.007
 eaten+0.007
 saved+0.006
vous+0.006
Neg logit contributions
 outfits-0.014
 stations-0.013
 appearance-0.013
 shirts-0.013
 languages-0.013
Token l
Token ID300
Feature activation+2.34e-02

Pos logit contributions
 appearance+0.073
 outfits+0.073
 shirts+0.070
 countries+0.070
 languages+0.069
Neg logit contributions
 rescued-0.062
eh-0.060
vous-0.056
 eaten-0.054
 saved-0.053
Tokenitt
Token ID715
Feature activation+1.93e-01

Pos logit contributions
 rescued+0.039
 deliber+0.039
vous+0.038
eh+0.037
 eaten+0.034
Neg logit contributions
 outfits-0.039
 appearance-0.037
 hoops-0.036
 shirts-0.036
ies-0.036
Tokenler
Token ID1754
Feature activation-5.94e-03

Pos logit contributions
 rescued+0.358
eh+0.357
vous+0.338
 deliber+0.337
dd+0.336
Neg logit contributions
 appearance-0.295
 outfits-0.291
 stations-0.283
 houses-0.279
 shirts-0.275
Token than
Token ID621
Feature activation+3.19e-02

Pos logit contributions
 outfits+0.012
 appearance+0.012
 stations+0.012
 shirts+0.012
 houses+0.012
Neg logit contributions
 rescued-0.007
eh-0.007
vous-0.007
sm-0.007
 eaten-0.006
Token the
Token ID262
Feature activation+6.92e-03

Pos logit contributions
 rescued+0.044
eh+0.040
 eaten+0.040
 deliber+0.040
vous+0.039
Neg logit contributions
 outfits-0.065
 appearance-0.062
 stations-0.062
 shirts-0.061
 houses-0.061
Token da
Token ID12379
Feature activation+1.77e-02

Pos logit contributions
eh+0.012
 rescued+0.011
vous+0.011
 eaten+0.010
 che+0.010
Neg logit contributions
 outfits-0.012
 shirts-0.011
lights-0.011
 appearance-0.011
 stations-0.011
Tokenisy
Token ID13560
Feature activation+2.33e-02

Pos logit contributions
 rescued+0.029
eh+0.029
vous+0.028
 deliber+0.027
on+0.025
Neg logit contributions
 outfits-0.031
 appearance-0.031
 shirts-0.030
 houses-0.029
 stations-0.029
Token.
Token ID13
Feature activation-2.24e-03

Pos logit contributions
 appearance+0.023
 outfits+0.023
 shirts+0.022
 languages+0.022
 stations+0.022
Neg logit contributions
 rescued-0.020
eh-0.018
 deliber-0.018
vous-0.018
year-0.017
Token She
Token ID1375
Feature activation+1.26e-02

Pos logit contributions
 outfits+0.005
 appearance+0.004
 shirts+0.004
 stations+0.004
ies+0.004
Neg logit contributions
 rescued-0.003
eh-0.003
 deliber-0.003
vous-0.003
year-0.003
Token used
Token ID973
Feature activation-1.72e-02

Pos logit contributions
 rescued+0.017
eh+0.015
vous+0.014
 deliber+0.014
 saved+0.013
Neg logit contributions
 appearance-0.028
 outfits-0.027
 languages-0.027
 stations-0.027
 shirts-0.026
Token a
Token ID257
Feature activation-1.45e-02

Pos logit contributions
 appearance+0.036
 outfits+0.036
 stations+0.034
 shirts+0.034
 languages+0.034
Neg logit contributions
 rescued-0.023
eh-0.021
 deliber-0.021
vous-0.020
on-0.019
Token m
Token ID285
Feature activation+4.17e-02

Pos logit contributions
 appearance+0.028
 stations+0.027
 outfits+0.026
 languages+0.026
 countries+0.026
Neg logit contributions
 rescued-0.023
eh-0.023
on-0.021
vous-0.021
 deliber-0.021
Tokenitt
Token ID715
Feature activation+1.93e-01

Pos logit contributions
 rescued+0.053
eh+0.050
vous+0.048
 deliber+0.047
 eaten+0.042
Neg logit contributions
 outfits-0.091
 appearance-0.089
 shirts-0.087
ies-0.087
 faces-0.086
Token to
Token ID284
Feature activation+7.40e-03

Pos logit contributions
 rescued+0.350
eh+0.345
vous+0.333
 deliber+0.330
on+0.319
Neg logit contributions
 appearance-0.307
 outfits-0.305
 shirts-0.289
 houses-0.288
 stations-0.287
Token take
Token ID1011
Feature activation-1.10e-02

Pos logit contributions
 rescued+0.012
eh+0.012
vous+0.011
 deliber+0.011
on+0.011
Neg logit contributions
 appearance-0.014
 outfits-0.013
 languages-0.013
 stations-0.013
 shirts-0.013
Token out
Token ID503
Feature activation+7.88e-03

Pos logit contributions
 outfits+0.022
 appearance+0.022
 shirts+0.021
 houses+0.021
 stations+0.021
Neg logit contributions
 rescued-0.015
eh-0.015
vous-0.015
 deliber-0.014
year-0.013
Token the
Token ID262
Feature activation+4.99e-03

Pos logit contributions
 rescued+0.009
eh+0.008
sm+0.008
on+0.007
vous+0.007
Neg logit contributions
 appearance-0.019
 outfits-0.018
 shirts-0.017
 stations-0.017
 sax-0.017
Token pan
Token ID3425
Feature activation+4.15e-02

Pos logit contributions
 rescued+0.008
eh+0.007
vous+0.007
 deliber+0.007
on+0.007
Neg logit contributions
 appearance-0.009
 outfits-0.009
 languages-0.009
 stations-0.009
 hoops-0.009
Token there
Token ID612
Feature activation+2.88e-02

Pos logit contributions
 rescued+0.060
eh+0.059
vous+0.058
 deliber+0.058
year+0.055
Neg logit contributions
 outfits-0.030
 appearance-0.029
 shirts-0.028
 houses-0.027
 stations-0.026
Token was
Token ID373
Feature activation+6.33e-02

Pos logit contributions
 rescued+0.041
eh+0.039
on+0.037
vous+0.037
 eaten+0.035
Neg logit contributions
 outfits-0.060
 appearance-0.059
 shirts-0.058
 languages-0.058
 stations-0.058
Token a
Token ID257
Feature activation+2.69e-02

Pos logit contributions
 rescued+0.089
vous+0.085
 deliber+0.085
eh+0.080
jected+0.078
Neg logit contributions
 shirts-0.120
 outfits-0.120
 hoops-0.119
 houses-0.119
 faces-0.116
Token three
Token ID1115
Feature activation+3.01e-02

Pos logit contributions
 rescued+0.058
eh+0.057
vous+0.056
 deliber+0.055
year+0.052
Neg logit contributions
 outfits-0.034
 appearance-0.033
 shirts-0.032
 houses-0.031
 stations-0.030
Token-
Token ID12
Feature activation+1.40e-01

Pos logit contributions
 rescued+0.054
 deliber+0.051
vous+0.049
eh+0.049
jected+0.046
Neg logit contributions
 outfits-0.050
ies-0.048
 houses-0.047
 shirts-0.047
 appearance-0.047
Tokenyear
Token ID1941
Feature activation+1.93e-01

Pos logit contributions
 rescued+0.120
 deliber+0.116
+0.097
vous+0.088
 saved+0.084
Neg logit contributions
 outfits-0.347
 houses-0.341
ies-0.337
 shirts-0.328
 languages-0.326
Token-
Token ID12
Feature activation+1.34e-01

Pos logit contributions
 rescued+0.379
 deliber+0.339
vous+0.329
 saved+0.326
+0.319
Neg logit contributions
ies-0.312
 outfits-0.291
 houses-0.283
 faces-0.282
 languages-0.276
Tokenold
Token ID727
Feature activation+9.34e-02

Pos logit contributions
 rescued+0.182
 deliber+0.168
+0.156
vous+0.152
 saved+0.148
Neg logit contributions
 outfits-0.277
 houses-0.266
ies-0.263
 countries-0.258
 languages-0.258
Token who
Token ID508
Feature activation+8.82e-02

Pos logit contributions
 rescued+0.192
eh+0.188
vous+0.185
 deliber+0.182
year+0.171
Neg logit contributions
 outfits-0.127
 appearance-0.124
 shirts-0.119
 houses-0.116
 stations-0.115
Token found
Token ID1043
Feature activation+1.10e-02

Pos logit contributions
 deliber+0.151
eh+0.148
 rescued+0.141
vous+0.139
jected+0.129
Neg logit contributions
 outfits-0.142
 houses-0.133
 shirts-0.132
ies-0.128
 faces-0.127
Token a
Token ID257
Feature activation+1.23e-02

Pos logit contributions
 deliber+0.017
vous+0.016
 rescued+0.016
eh+0.015
 oblig+0.014
Neg logit contributions
 houses-0.020
 outfits-0.020
 shirts-0.020
 stations-0.018
 hoops-0.018
Token sometimes
Token ID3360
Feature activation-5.74e-02

Pos logit contributions
 outfits+0.040
 shirts+0.038
 appearance+0.037
maid+0.037
utes+0.036
Neg logit contributions
 rescued-0.026
 deliber-0.024
 saved-0.023
 eaten-0.023
vous-0.022
Token,
Token ID11
Feature activation-1.22e-02

Pos logit contributions
 outfits+0.110
 shirts+0.105
utes+0.102
maid+0.100
 hoops+0.099
Neg logit contributions
 rescued-0.090
 deliber-0.083
 saved-0.081
vous-0.077
 eaten-0.077
Token even
Token ID772
Feature activation-4.31e-02

Pos logit contributions
 outfits+0.024
maid+0.023
 shirts+0.022
utes+0.022
inate+0.021
Neg logit contributions
 rescued-0.018
 saved-0.017
 deliber-0.016
 eaten-0.015
 che-0.015
Token the
Token ID262
Feature activation+4.98e-03

Pos logit contributions
 outfits+0.086
 shirts+0.081
maid+0.079
 appearance+0.076
utes+0.076
Neg logit contributions
 rescued-0.068
 eaten-0.061
 saved-0.060
vous-0.059
eh-0.059
Token l
Token ID300
Feature activation+1.93e-02

Pos logit contributions
 rescued+0.008
vous+0.007
eh+0.007
 deliber+0.007
 che+0.007
Neg logit contributions
 outfits-0.009
ies-0.009
 appearance-0.009
 stations-0.008
 languages-0.008
Tokenitt
Token ID715
Feature activation+1.92e-01

Pos logit contributions
 rescued+0.026
eh+0.025
 deliber+0.024
vous+0.024
 eaten+0.021
Neg logit contributions
 outfits-0.040
 appearance-0.039
 shirts-0.038
 houses-0.038
 stations-0.037
Tokenlest
Token ID32712
Feature activation-8.80e-03

Pos logit contributions
 rescued+0.328
eh+0.312
on+0.308
vous+0.303
dd+0.302
Neg logit contributions
 appearance-0.331
 outfits-0.328
 stations-0.312
 shirts-0.312
 houses-0.307
Token things
Token ID1243
Feature activation-2.35e-02

Pos logit contributions
 appearance+0.014
 outfits+0.014
inate+0.014
 stations+0.013
 shirts+0.013
Neg logit contributions
 rescued-0.015
vous-0.014
on-0.014
sm-0.013
 saved-0.013
Token can
Token ID460
Feature activation-3.00e-02

Pos logit contributions
 outfits+0.042
 appearance+0.040
 shirts+0.039
 stations+0.038
maid+0.038
Neg logit contributions
 rescued-0.042
 saved-0.038
 helped-0.035
vous-0.035
 deliber-0.035
Token lead
Token ID1085
Feature activation-5.58e-02

Pos logit contributions
 outfits+0.058
 shirts+0.055
 hoops+0.053
 languages+0.052
 stations+0.052
Neg logit contributions
 rescued-0.050
vous-0.044
 saved-0.043
 deliber-0.043
 oblig-0.042
Token to
Token ID284
Feature activation+1.02e-02

Pos logit contributions
 outfits+0.131
maid+0.128
 shirts+0.127
utes+0.121
lights+0.121
Neg logit contributions
 rescued-0.067
 saved-0.055
vous-0.055
 eaten-0.053
eh-0.050
Token and
Token ID290
Feature activation-5.14e-03

Pos logit contributions
 appearance+0.012
 languages+0.012
 stations+0.011
 outfits+0.011
 countries+0.011
Neg logit contributions
 rescued-0.007
on-0.006
eh-0.006
 deliber-0.006
 saved-0.006
Token get
Token ID651
Feature activation+2.40e-02

Pos logit contributions
 outfits+0.010
 languages+0.010
 appearance+0.010
 stations+0.010
 shirts+0.010
Neg logit contributions
 rescued-0.009
eh-0.007
 deliber-0.007
 saved-0.007
vous-0.007
Token it
Token ID340
Feature activation-7.10e-03

Pos logit contributions
 rescued+0.041
eh+0.040
vous+0.039
 deliber+0.038
 eaten+0.036
Neg logit contributions
 outfits-0.041
 appearance-0.041
 shirts-0.039
 languages-0.039
 stations-0.039
Token,
Token ID11
Feature activation-1.20e-02

Pos logit contributions
 appearance+0.014
 outfits+0.014
 languages+0.014
 shirts+0.014
 stations+0.014
Neg logit contributions
 rescued-0.011
eh-0.010
vous-0.009
 deliber-0.009
 eaten-0.009
Token k
Token ID479
Feature activation+5.33e-02

Pos logit contributions
 outfits+0.025
 languages+0.024
 shirts+0.023
 stations+0.023
 appearance+0.023
Neg logit contributions
 rescued-0.017
sha-0.015
eh-0.015
 saved-0.014
vous-0.014
Tokenitt
Token ID715
Feature activation+1.90e-01

Pos logit contributions
 deliber+0.072
 rescued+0.071
vous+0.068
eh+0.067
 eaten+0.063
Neg logit contributions
 outfits-0.107
ies-0.104
 shirts-0.103
 houses-0.103
 appearance-0.102
Tokenies
Token ID444
Feature activation-3.18e-02

Pos logit contributions
eh+0.469
 rescued+0.464
on+0.446
vous+0.444
dd+0.442
Neg logit contributions
 appearance-0.182
 outfits-0.178
 houses-0.170
 languages-0.166
 stations-0.165
Token!"
Token ID2474
Feature activation+4.34e-03

Pos logit contributions
 appearance+0.070
 outfits+0.069
 shirts+0.067
 languages+0.067
 hoops+0.066
Neg logit contributions
 rescued-0.041
eh-0.039
vous-0.036
on-0.035
 deliber-0.034
Token she
Token ID673
Feature activation+1.34e-02

Pos logit contributions
 rescued+0.006
eh+0.005
 deliber+0.005
vous+0.005
on+0.005
Neg logit contributions
 appearance-0.009
 outfits-0.009
 stations-0.009
 languages-0.009
 shirts-0.009
Token called
Token ID1444
Feature activation-9.14e-03

Pos logit contributions
 rescued+0.015
eh+0.015
 deliber+0.014
vous+0.014
 eaten+0.012
Neg logit contributions
 appearance-0.032
 outfits-0.031
 stations-0.030
 languages-0.030
 houses-0.029
Token,
Token ID11
Feature activation-1.28e-02

Pos logit contributions
 outfits+0.016
 appearance+0.016
 shirts+0.015
 languages+0.015
 houses+0.015
Neg logit contributions
 rescued-0.015
eh-0.015
vous-0.014
 deliber-0.014
year-0.013
Token,
Token ID11
Feature activation+4.12e-03

Pos logit contributions
 outfits+0.013
 appearance+0.013
 shirts+0.013
 houses+0.013
ies+0.013
Neg logit contributions
 rescued-0.016
vous-0.016
eh-0.016
 deliber-0.015
year-0.014
Token there
Token ID612
Feature activation-1.35e-02

Pos logit contributions
 rescued+0.007
 deliber+0.007
vous+0.007
eh+0.007
year+0.006
Neg logit contributions
 outfits-0.006
 houses-0.006
 appearance-0.006
ies-0.006
 shirts-0.006
Token was
Token ID373
Feature activation+1.35e-02

Pos logit contributions
 outfits+0.022
 appearance+0.022
 shirts+0.021
 houses+0.020
 stations+0.020
Neg logit contributions
 rescued-0.024
eh-0.024
vous-0.023
 deliber-0.023
year-0.021
Token a
Token ID257
Feature activation-3.28e-03

Pos logit contributions
 deliber+0.021
 rescued+0.021
vous+0.020
jected+0.019
eh+0.019
Neg logit contributions
 shirts-0.023
 houses-0.023
 hoops-0.023
 outfits-0.023
 stations-0.022
Token l
Token ID300
Feature activation+2.30e-02

Pos logit contributions
 outfits+0.004
 appearance+0.004
 shirts+0.004
 houses+0.004
ies+0.004
Neg logit contributions
 rescued-0.007
eh-0.007
vous-0.007
 deliber-0.007
year-0.006
Tokenitt
Token ID715
Feature activation+1.89e-01

Pos logit contributions
 deliber+0.037
 rescued+0.037
vous+0.032
 saved+0.031
 eaten+0.030
Neg logit contributions
ies-0.040
 outfits-0.039
 countries-0.039
utes-0.039
 hoops-0.039
Tokentle
Token ID7100
Feature activation-1.25e-02

Pos logit contributions
 rescued+0.373
eh+0.370
vous+0.361
 deliber+0.353
on+0.335
Neg logit contributions
 outfits-0.270
 appearance-0.266
 shirts-0.254
 houses-0.249
 stations-0.248
Token girl
Token ID2576
Feature activation+8.97e-03

Pos logit contributions
 outfits+0.019
 appearance+0.019
 shirts+0.018
 stations+0.017
 languages+0.017
Neg logit contributions
 rescued-0.024
eh-0.023
vous-0.023
 deliber-0.021
year-0.021
Token.
Token ID13
Feature activation+1.88e-02

Pos logit contributions
 rescued+0.011
 deliber+0.011
vous+0.010
eh+0.010
year+0.009
Neg logit contributions
 outfits-0.019
 appearance-0.019
 shirts-0.019
ies-0.019
 houses-0.019
Token She
Token ID1375
Feature activation+2.07e-02

Pos logit contributions
 rescued+0.034
eh+0.034
vous+0.032
dd+0.032
 deliber+0.031
Neg logit contributions
 outfits-0.030
 appearance-0.030
 houses-0.029
 shirts-0.028
 languages-0.028
Token had
Token ID550
Feature activation-9.01e-03

Pos logit contributions
 deliber+0.038
 rescued+0.037
vous+0.036
eh+0.035
year+0.035
Neg logit contributions
 outfits-0.031
 shirts-0.030
ies-0.030
 houses-0.029
 dresses-0.028
Token put
Token ID1234
Feature activation+1.60e-02

Pos logit contributions
 outfits+0.032
 shirts+0.031
 appearance+0.031
ies+0.030
 houses+0.030
Neg logit contributions
 rescued-0.028
eh-0.027
vous-0.027
 deliber-0.026
year-0.025
Token on
Token ID319
Feature activation-9.65e-03

Pos logit contributions
 rescued+0.026
eh+0.025
vous+0.024
 deliber+0.024
year+0.022
Neg logit contributions
 appearance-0.029
 outfits-0.029
 shirts-0.028
 languages-0.027
 stations-0.027
Token her
Token ID607
Feature activation+1.29e-03

Pos logit contributions
 outfits+0.018
 shirts+0.017
 appearance+0.017
 houses+0.017
 faces+0.017
Neg logit contributions
 rescued-0.015
eh-0.015
vous-0.015
 deliber-0.014
year-0.013
Token oven
Token ID14361
Feature activation+5.34e-02

Pos logit contributions
 rescued+0.002
eh+0.002
vous+0.002
 deliber+0.002
 eaten+0.002
Neg logit contributions
 appearance-0.002
 outfits-0.002
 stations-0.002
 languages-0.002
 countries-0.002
Token m
Token ID285
Feature activation+4.18e-02

Pos logit contributions
 rescued+0.096
eh+0.091
 deliber+0.090
on+0.086
vous+0.085
Neg logit contributions
 appearance-0.089
 outfits-0.084
 languages-0.082
 stations-0.081
 countries-0.080
Tokenitt
Token ID715
Feature activation+1.89e-01

Pos logit contributions
 rescued+0.059
eh+0.056
vous+0.056
 deliber+0.053
 saved+0.048
Neg logit contributions
 outfits-0.085
 appearance-0.082
 shirts-0.082
ies-0.079
 faces-0.079
Tokens
Token ID82
Feature activation-3.25e-02

Pos logit contributions
eh+0.388
 rescued+0.386
on+0.384
dd+0.374
 eaten+0.372
Neg logit contributions
 appearance-0.239
 houses-0.226
 outfits-0.225
 stations-0.225
 languages-0.219
Token and
Token ID290
Feature activation-6.72e-03

Pos logit contributions
 appearance+0.062
 languages+0.061
 nations+0.060
 countries+0.059
 stations+0.059
Neg logit contributions
 rescued-0.051
eh-0.048
on-0.046
 eaten-0.044
 saved-0.043
Token z
Token ID1976
Feature activation+3.17e-03

Pos logit contributions
 outfits+0.013
 appearance+0.013
 shirts+0.012
 languages+0.012
 stations+0.012
Neg logit contributions
 rescued-0.010
eh-0.010
vous-0.009
 deliber-0.009
 eaten-0.008
Tokenipped
Token ID3949
Feature activation+3.78e-02

Pos logit contributions
vous+0.006
 rescued+0.006
eh+0.005
 deliber+0.005
year+0.005
Neg logit contributions
 outfits-0.005
 shirts-0.005
ies-0.005
 appearance-0.005
 hoops-0.005
Token open
Token ID1280
Feature activation-2.19e-02

Pos logit contributions
 rescued+0.060
eh+0.056
 deliber+0.055
on+0.054
 eaten+0.052
Neg logit contributions
 appearance-0.072
 languages-0.068
 outfits-0.067
 countries-0.066
 stations-0.066
Token put
Token ID1234
Feature activation+1.21e-02

Pos logit contributions
 rescued+0.004
eh+0.003
vous+0.003
 deliber+0.003
 eaten+0.003
Neg logit contributions
 outfits-0.005
 appearance-0.005
 shirts-0.005
 languages-0.005
 stations-0.005
Token on
Token ID319
Feature activation-1.43e-02

Pos logit contributions
 rescued+0.018
eh+0.017
vous+0.016
 deliber+0.016
on+0.016
Neg logit contributions
 appearance-0.024
 outfits-0.024
 languages-0.022
 stations-0.022
 shirts-0.022
Token an
Token ID281
Feature activation-1.66e-02

Pos logit contributions
 outfits+0.027
 appearance+0.027
 shirts+0.026
 houses+0.026
 stations+0.025
Neg logit contributions
 rescued-0.021
eh-0.021
vous-0.021
 deliber-0.020
year-0.018
Token oven
Token ID14361
Feature activation+5.11e-02

Pos logit contributions
 outfits+0.030
 appearance+0.029
 shirts+0.028
 houses+0.028
 stations+0.028
Neg logit contributions
 rescued-0.026
eh-0.026
vous-0.025
 deliber-0.025
year-0.023
Token m
Token ID285
Feature activation+3.91e-02

Pos logit contributions
 rescued+0.084
eh+0.078
 deliber+0.076
on+0.072
 saved+0.071
Neg logit contributions
 appearance-0.095
 outfits-0.092
 languages-0.089
maid-0.087
 hoops-0.086
Tokenitt
Token ID715
Feature activation+1.87e-01

Pos logit contributions
 rescued+0.049
eh+0.046
vous+0.045
 deliber+0.043
 saved+0.039
Neg logit contributions
 outfits-0.085
 appearance-0.083
 shirts-0.082
 faces-0.081
ies-0.081
Token and
Token ID290
Feature activation-1.53e-02

Pos logit contributions
 rescued+0.362
eh+0.361
on+0.350
dd+0.340
 deliber+0.339
Neg logit contributions
 appearance-0.267
 outfits-0.262
 languages-0.250
 houses-0.247
 stations-0.244
Token a
Token ID257
Feature activation-1.95e-02

Pos logit contributions
 outfits+0.030
 shirts+0.030
 houses+0.030
 appearance+0.029
 faces+0.029
Neg logit contributions
vous-0.021
eh-0.021
 rescued-0.020
 deliber-0.020
year-0.018
Token a
Token ID257
Feature activation-1.99e-02

Pos logit contributions
 appearance+0.036
 outfits+0.036
 languages+0.035
 stations+0.034
 shirts+0.034
Neg logit contributions
 rescued-0.031
eh-0.031
vous-0.028
 deliber-0.028
on-0.027
Tokenpron
Token ID31186
Feature activation+5.64e-02

Pos logit contributions
 appearance+0.038
 outfits+0.037
 languages+0.036
 stations+0.036
 countries+0.036
Neg logit contributions
 rescued-0.031
eh-0.031
vous-0.028
 deliber-0.028
on-0.026
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
 rescued+0.085
eh+0.081
on+0.075
vous+0.075
 eaten+0.073
Neg logit contributions
 appearance-0.114
 outfits-0.108
 languages-0.107
 countries-0.106
 hoops-0.104
Token her
Token ID607
Feature activation+3.68e-03

Pos logit contributions
 rescued+0.012
eh+0.011
vous+0.011
 deliber+0.011
year+0.010
Neg logit contributions
 outfits-0.017
 appearance-0.017
 shirts-0.017
 houses-0.016
 stations-0.016
Token depend
Token ID4745
Feature activation+1.99e-02

Pos logit contributions
 rescued+0.006
eh+0.006
vous+0.006
 deliber+0.006
 eaten+0.005
Neg logit contributions
 outfits-0.006
 appearance-0.006
 shirts-0.006
 stations-0.006
 languages-0.006
Tokenable
Token ID540
Feature activation+5.53e-02

Pos logit contributions
 rescued+0.029
 deliber+0.028
eh+0.027
vous+0.025
jected+0.025
Neg logit contributions
 outfits-0.039
ies-0.039
 shirts-0.037
 houses-0.036
 appearance-0.036
Token oven
Token ID14361
Feature activation+5.41e-02

Pos logit contributions
 rescued+0.091
eh+0.090
vous+0.086
 deliber+0.084
 eaten+0.081
Neg logit contributions
 outfits-0.095
 appearance-0.094
 languages-0.091
 stations-0.091
 shirts-0.090
Token m
Token ID285
Feature activation+4.10e-02

Pos logit contributions
 rescued+0.078
eh+0.074
 deliber+0.072
vous+0.070
on+0.068
Neg logit contributions
 appearance-0.107
 outfits-0.104
 languages-0.099
 stations-0.099
 hoops-0.099
Tokenitt
Token ID715
Feature activation+1.86e-01

Pos logit contributions
 rescued+0.056
eh+0.052
vous+0.052
 deliber+0.049
 saved+0.046
Neg logit contributions
 outfits-0.085
ies-0.083
 houses-0.081
 appearance-0.081
 shirts-0.081
Tokens
Token ID82
Feature activation-3.65e-02

Pos logit contributions
eh+0.455
 rescued+0.451
on+0.434
dd+0.434
 eaten+0.433
Neg logit contributions
 appearance-0.167
 outfits-0.163
 languages-0.152
 stations-0.151
 houses-0.148
Token and
Token ID290
Feature activation-9.95e-03

Pos logit contributions
 appearance+0.068
 outfits+0.066
 languages+0.064
 hoops+0.063
 stations+0.063
Neg logit contributions
 rescued-0.058
eh-0.055
vous-0.053
 deliber-0.052
 eaten-0.051
Token bake
Token ID28450
Feature activation+1.43e-02

Pos logit contributions
 outfits+0.017
 appearance+0.017
 shirts+0.016
 languages+0.016
 stations+0.016
Neg logit contributions
 rescued-0.017
eh-0.016
 deliber-0.016
vous-0.016
 eaten-0.015
Token y
Token ID331
Feature activation+4.21e-02

Pos logit contributions
 rescued+0.022
eh+0.022
vous+0.021
 deliber+0.021
on+0.019
Neg logit contributions
 appearance-0.027
 outfits-0.027
 shirts-0.025
 languages-0.025
 stations-0.025
Tokenummy
Token ID13513
Feature activation-2.21e-03

Pos logit contributions
 rescued+0.088
eh+0.085
vous+0.085
 deliber+0.081
 saved+0.078
Neg logit contributions
 outfits-0.058
 appearance-0.056
 shirts-0.055
ies-0.055
 houses-0.055
Token was
Token ID373
Feature activation+4.84e-02

Pos logit contributions
 rescued+0.061
eh+0.061
vous+0.058
on+0.058
 deliber+0.055
Neg logit contributions
 appearance-0.063
 outfits-0.062
 stations-0.061
 hoops-0.060
 languages-0.060
Token an
Token ID281
Feature activation+1.45e-02

Pos logit contributions
 rescued+0.071
vous+0.070
 deliber+0.068
eh+0.066
year+0.061
Neg logit contributions
 outfits-0.090
 shirts-0.087
 houses-0.087
 appearance-0.087
 hoops-0.084
Token average
Token ID2811
Feature activation-6.95e-03

Pos logit contributions
 rescued+0.027
eh+0.024
 deliber+0.024
 eaten+0.024
vous+0.023
Neg logit contributions
 appearance-0.023
 shirts-0.022
 outfits-0.022
 houses-0.022
 countries-0.022
Token three
Token ID1115
Feature activation+2.85e-02

Pos logit contributions
 outfits+0.012
 appearance+0.011
 stations+0.011
 shirts+0.011
 languages+0.011
Neg logit contributions
eh-0.011
vous-0.011
 rescued-0.011
 deliber-0.010
year-0.010
Token-
Token ID12
Feature activation+1.37e-01

Pos logit contributions
 rescued+0.042
eh+0.041
vous+0.040
 deliber+0.039
year+0.036
Neg logit contributions
 outfits-0.055
 appearance-0.054
 shirts-0.053
 houses-0.052
 languages-0.052
Tokenyear
Token ID1941
Feature activation+1.85e-01

Pos logit contributions
 rescued+0.101
 deliber+0.091
+0.070
vous+0.069
 saved+0.069
Neg logit contributions
 outfits-0.361
 houses-0.354
 stations-0.348
ies-0.347
 shirts-0.339
Token-
Token ID12
Feature activation+1.32e-01

Pos logit contributions
 rescued+0.356
 deliber+0.314
vous+0.312
eh+0.311
 saved+0.304
Neg logit contributions
ies-0.300
 outfits-0.291
 houses-0.285
 languages-0.279
 countries-0.277
Tokenold
Token ID727
Feature activation+8.90e-02

Pos logit contributions
 rescued+0.185
 deliber+0.167
vous+0.154
 saved+0.153
eh+0.153
Neg logit contributions
 outfits-0.269
 houses-0.258
ies-0.252
 countries-0.251
 languages-0.249
Token boy
Token ID2933
Feature activation+2.48e-02

Pos logit contributions
 rescued+0.160
eh+0.156
vous+0.154
 deliber+0.145
on+0.141
Neg logit contributions
 appearance-0.146
 outfits-0.141
 shirts-0.134
 stations-0.131
 hoops-0.131
Token.
Token ID13
Feature activation+3.73e-02

Pos logit contributions
 rescued+0.038
eh+0.036
vous+0.034
 deliber+0.033
on+0.033
Neg logit contributions
 appearance-0.048
 outfits-0.048
 shirts-0.045
 languages-0.045
 stations-0.045
Token He
Token ID679
Feature activation+2.26e-02

Pos logit contributions
 rescued+0.053
eh+0.052
vous+0.051
 deliber+0.049
year+0.045
Neg logit contributions
 outfits-0.074
 appearance-0.073
 shirts-0.071
 houses-0.070
 stations-0.069
Token They
Token ID1119
Feature activation-1.08e-02

Pos logit contributions
 outfits+0.015
 appearance+0.015
ies+0.014
 shirts+0.014
 languages+0.014
Neg logit contributions
 rescued-0.019
 deliber-0.018
eh-0.018
vous-0.018
 eaten-0.017
Token put
Token ID1234
Feature activation+9.91e-03

Pos logit contributions
 outfits+0.019
 shirts+0.019
 houses+0.019
ies+0.018
 faces+0.018
Neg logit contributions
vous-0.017
eh-0.017
 deliber-0.016
 rescued-0.015
dd-0.015
Token on
Token ID319
Feature activation-1.27e-02

Pos logit contributions
vous+0.018
eh+0.018
 deliber+0.017
 rescued+0.017
year+0.015
Neg logit contributions
 outfits-0.016
 houses-0.016
 shirts-0.016
 faces-0.015
ies-0.015
Token oven
Token ID14361
Feature activation+5.02e-02

Pos logit contributions
 outfits+0.020
 houses+0.019
 shirts+0.019
 faces+0.019
 hoops+0.018
Neg logit contributions
vous-0.026
eh-0.024
 rescued-0.024
 deliber-0.024
year-0.023
Token m
Token ID285
Feature activation+3.77e-02

Pos logit contributions
 rescued+0.096
 deliber+0.091
eh+0.089
vous+0.086
 eaten+0.084
Neg logit contributions
 appearance-0.078
 outfits-0.075
 languages-0.072
 hoops-0.070
 shirts-0.069
Tokenitt
Token ID715
Feature activation+1.85e-01

Pos logit contributions
 rescued+0.044
eh+0.044
vous+0.042
 deliber+0.038
 saved+0.036
Neg logit contributions
 outfits-0.083
 faces-0.081
 appearance-0.080
 shirts-0.080
ies-0.079
Tokens
Token ID82
Feature activation-4.26e-02

Pos logit contributions
 rescued+0.425
eh+0.421
 deliber+0.405
on+0.404
dd+0.401
Neg logit contributions
 appearance-0.194
 outfits-0.190
 languages-0.179
 stations-0.176
 houses-0.173
Token and
Token ID290
Feature activation-2.08e-02

Pos logit contributions
 appearance+0.076
 languages+0.073
 outfits+0.072
 sax+0.070
 hoops+0.069
Neg logit contributions
 rescued-0.074
eh-0.071
 eaten-0.065
vous-0.065
itt-0.064
Token took
Token ID1718
Feature activation-8.93e-03

Pos logit contributions
 outfits+0.040
 shirts+0.039
 houses+0.039
 appearance+0.038
 faces+0.038
Neg logit contributions
vous-0.030
eh-0.030
 rescued-0.029
 deliber-0.028
jected-0.026
Token out
Token ID503
Feature activation+6.08e-03

Pos logit contributions
 outfits+0.015
 houses+0.015
 shirts+0.015
 appearance+0.014
 faces+0.014
Neg logit contributions
eh-0.015
vous-0.015
 rescued-0.015
 deliber-0.014
jected-0.013
Token the
Token ID262
Feature activation-2.52e-03

Pos logit contributions
 rescued+0.008
eh+0.008
vous+0.008
 deliber+0.008
year+0.007
Neg logit contributions
 outfits-0.013
 appearance-0.012
 shirts-0.012
 houses-0.012
 stations-0.012
Token clown
Token ID25573
Feature activation+5.29e-03

Pos logit contributions
 rescued+0.006
vous+0.005
 saved+0.005
on+0.005
eh+0.005
Neg logit contributions
 appearance-0.018
 outfits-0.017
ies-0.016
maid-0.016
inate-0.016
Tokens
Token ID82
Feature activation-2.67e-02

Pos logit contributions
on+0.009
 rescued+0.009
o+0.008
eh+0.008
 deliber+0.008
Neg logit contributions
 appearance-0.009
 stations-0.009
 outfits-0.008
 languages-0.008
adoes-0.008
Token juggling
Token ID48097
Feature activation-4.58e-02

Pos logit contributions
 stations+0.061
 outfits+0.060
 appearance+0.059
 conclusions+0.056
 hoops+0.056
Neg logit contributions
 rescued-0.028
on-0.027
 full-0.026
 saved-0.024
 Bob-0.023
Token and
Token ID290
Feature activation-5.99e-03

Pos logit contributions
 appearance+0.089
maid+0.084
inate+0.082
 outfits+0.082
ies+0.079
Neg logit contributions
 rescued-0.067
 che-0.066
eh-0.066
on-0.063
vous-0.062
Token ac
Token ID936
Feature activation+1.09e-02

Pos logit contributions
 appearance+0.013
 outfits+0.013
 countries+0.012
 stations+0.012
 languages+0.012
Neg logit contributions
 rescued-0.008
 deliber-0.007
 eaten-0.007
eh-0.007
on-0.007
Tokenrob
Token ID22609
Feature activation+1.84e-01

Pos logit contributions
 deliber+0.014
vous+0.013
 rescued+0.013
eh+0.013
 oblig+0.013
Neg logit contributions
 shirts-0.022
 outfits-0.022
ies-0.021
 stations-0.021
 hoops-0.021
Tokenats
Token ID1381
Feature activation+4.70e-02

Pos logit contributions
 rescued+0.311
vous+0.296
eh+0.291
 deliber+0.289
year+0.272
Neg logit contributions
 outfits-0.334
 shirts-0.316
ies-0.310
 appearance-0.309
 languages-0.301
Token doing
Token ID1804
Feature activation-5.97e-03

Pos logit contributions
 rescued+0.065
on+0.060
 eaten+0.056
 saved+0.055
 deliber+0.054
Neg logit contributions
 appearance-0.101
 outfits-0.097
 countries-0.094
 languages-0.092
 stations-0.091
Token flips
Token ID45971
Feature activation-6.94e-03

Pos logit contributions
 appearance+0.011
 outfits+0.011
maid+0.011
ies+0.010
 heads+0.010
Neg logit contributions
 rescued-0.009
 che-0.008
eh-0.008
 deliber-0.008
 saved-0.008
Token.
Token ID13
Feature activation+3.20e-03

Pos logit contributions
 appearance+0.015
 outfits+0.015
maid+0.014
 shirts+0.014
utes+0.014
Neg logit contributions
 rescued-0.008
eh-0.008
vous-0.007
sm-0.007
year-0.007
Token But
Token ID887
Feature activation-1.74e-03

Pos logit contributions
 rescued+0.004
eh+0.004
 deliber+0.003
vous+0.003
on+0.003
Neg logit contributions
 outfits-0.007
 appearance-0.007
 shirts-0.007
 stations-0.007
ies-0.007
Token thought
Token ID1807
Feature activation+1.68e-02

Pos logit contributions
 rescued+0.000
eh+0.000
vous+0.000
 deliber+0.000
on+0.000
Neg logit contributions
 appearance-0.000
 outfits-0.000
 shirts-0.000
 languages-0.000
 stations-0.000
Token they
Token ID484
Feature activation+7.07e-03

Pos logit contributions
 deliber+0.020
on+0.020
 rescued+0.019
eh+0.018
year+0.017
Neg logit contributions
 appearance-0.035
 outfits-0.035
 stations-0.034
 heads-0.034
maid-0.033
Token looked
Token ID3114
Feature activation+2.38e-02

Pos logit contributions
 rescued+0.012
vous+0.011
eh+0.011
 deliber+0.011
year+0.010
Neg logit contributions
 outfits-0.012
 appearance-0.012
 shirts-0.012
 houses-0.012
 languages-0.012
Token like
Token ID588
Feature activation-2.47e-02

Pos logit contributions
 rescued+0.033
eh+0.030
vous+0.029
 deliber+0.028
 saved+0.028
Neg logit contributions
 outfits-0.049
 appearance-0.048
 stations-0.047
 shirts-0.047
 countries-0.046
Token ac
Token ID936
Feature activation+1.22e-02

Pos logit contributions
 appearance+0.058
 heads+0.055
 outfits+0.054
ies+0.054
inate+0.054
Neg logit contributions
 rescued-0.027
 saved-0.023
 deliber-0.023
 helped-0.022
vous-0.022
Tokenrob
Token ID22609
Feature activation+1.81e-01

Pos logit contributions
 deliber+0.016
 rescued+0.015
eh+0.014
vous+0.014
year+0.014
Neg logit contributions
ies-0.025
 outfits-0.024
 shirts-0.024
 stations-0.023
 sax-0.023
Tokenats
Token ID1381
Feature activation+4.73e-02

Pos logit contributions
 rescued+0.319
vous+0.302
eh+0.293
 deliber+0.289
year+0.282
Neg logit contributions
 outfits-0.325
 shirts-0.308
ies-0.305
 appearance-0.299
 hoops-0.290
Token.
Token ID13
Feature activation+1.16e-02

Pos logit contributions
 rescued+0.067
on+0.060
 saved+0.060
 helped+0.057
vous+0.054
Neg logit contributions
 appearance-0.097
 outfits-0.094
utes-0.090
inate-0.090
 countries-0.090
Token She
Token ID1375
Feature activation+1.88e-02

Pos logit contributions
 rescued+0.013
 deliber+0.013
eh+0.012
vous+0.012
itt+0.011
Neg logit contributions
 outfits-0.025
 appearance-0.024
maid-0.024
ies-0.024
 stations-0.024
Token enjoyed
Token ID8359
Feature activation+1.91e-02

Pos logit contributions
 rescued+0.025
eh+0.024
vous+0.023
 deliber+0.023
 eaten+0.021
Neg logit contributions
 appearance-0.040
 outfits-0.039
 stations-0.038
 languages-0.038
 shirts-0.037
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
 rescued+0.026
eh+0.025
 deliber+0.025
vous+0.024
on+0.022
Neg logit contributions
 outfits-0.039
 appearance-0.038
 shirts-0.037
 languages-0.036
 houses-0.036
Token mum
Token ID25682
Feature activation+1.43e-02

Pos logit contributions
 rescued+0.052
eh+0.050
 deliber+0.049
vous+0.048
 eaten+0.045
Neg logit contributions
 outfits-0.080
 shirts-0.077
 appearance-0.077
 stations-0.075
 hoops-0.074
Token and
Token ID290
Feature activation+4.16e-02

Pos logit contributions
 rescued+0.021
eh+0.020
on+0.019
vous+0.019
 deliber+0.019
Neg logit contributions
 appearance-0.028
 stations-0.028
 outfits-0.028
 languages-0.027
 hoops-0.027
Token her
Token ID607
Feature activation+2.52e-02

Pos logit contributions
 rescued+0.055
eh+0.048
 eaten+0.048
 deliber+0.047
 saved+0.045
Neg logit contributions
 appearance-0.094
 stations-0.090
 outfits-0.088
 languages-0.087
 countries-0.084
Token three
Token ID1115
Feature activation+2.89e-02

Pos logit contributions
 rescued+0.042
eh+0.040
on+0.038
 deliber+0.038
itt+0.037
Neg logit contributions
 stations-0.044
 appearance-0.042
 languages-0.040
 countries-0.039
inate-0.039
Token-
Token ID12
Feature activation+1.31e-01

Pos logit contributions
 rescued+0.042
eh+0.042
vous+0.041
 deliber+0.040
year+0.037
Neg logit contributions
 appearance-0.056
 outfits-0.053
 stations-0.053
 houses-0.052
 hoops-0.052
Tokenyear
Token ID1941
Feature activation+1.81e-01

Pos logit contributions
 rescued+0.094
eh+0.082
 deliber+0.080
vous+0.079
 saved+0.060
Neg logit contributions
 outfits-0.354
 shirts-0.339
 stations-0.335
 houses-0.335
ies-0.334
Token-
Token ID12
Feature activation+1.27e-01

Pos logit contributions
 rescued+0.297
eh+0.284
vous+0.279
 deliber+0.269
 eaten+0.252
Neg logit contributions
 outfits-0.330
 appearance-0.319
 shirts-0.314
 houses-0.311
ies-0.310
Tokenold
Token ID727
Feature activation+8.56e-02

Pos logit contributions
 rescued+0.147
eh+0.139
vous+0.132
 deliber+0.130
 eaten+0.116
Neg logit contributions
 outfits-0.286
 shirts-0.271
 houses-0.269
 stations-0.269
 languages-0.268
Token daughter
Token ID4957
Feature activation+2.79e-02

Pos logit contributions
 rescued+0.150
vous+0.142
eh+0.138
 deliber+0.135
 saved+0.128
Neg logit contributions
 appearance-0.145
 outfits-0.138
 hoops-0.135
 shirts-0.135
utes-0.134
Token went
Token ID1816
Feature activation+1.36e-02

Pos logit contributions
 rescued+0.041
eh+0.035
 deliber+0.035
on+0.034
vous+0.034
Neg logit contributions
 appearance-0.058
 outfits-0.055
 stations-0.055
 shirts-0.054
 hoops-0.054
Token to
Token ID284
Feature activation+3.12e-02

Pos logit contributions
 rescued+0.020
eh+0.019
 deliber+0.018
vous+0.017
on+0.016
Neg logit contributions
 appearance-0.029
 outfits-0.027
 languages-0.026
 stations-0.026
 shirts-0.026
Token saw
Token ID2497
Feature activation+2.13e-02

Pos logit contributions
 rescued+0.040
eh+0.038
vous+0.034
 saved+0.034
 eaten+0.033
Neg logit contributions
 appearance-0.066
 outfits-0.065
 languages-0.063
 stations-0.063
 shirts-0.062
Token a
Token ID257
Feature activation+1.33e-03

Pos logit contributions
 rescued+0.022
eh+0.021
vous+0.020
 deliber+0.018
on+0.017
Neg logit contributions
 outfits-0.052
 appearance-0.051
 shirts-0.049
 languages-0.049
ies-0.049
Token little
Token ID1310
Feature activation+1.21e-02

Pos logit contributions
eh+0.002
 rescued+0.002
 che+0.002
vous+0.002
 satisfied+0.002
Neg logit contributions
 outfits-0.003
 stations-0.003
 appearance-0.003
 languages-0.003
 shirts-0.003
Token 3
Token ID513
Feature activation+4.15e-02

Pos logit contributions
 rescued+0.021
eh+0.021
vous+0.020
 che+0.019
 deliber+0.018
Neg logit contributions
 outfits-0.021
 shirts-0.019
 appearance-0.019
 stations-0.019
utes-0.019
Token-
Token ID12
Feature activation+1.30e-01

Pos logit contributions
 rescued+0.071
 deliber+0.065
eh+0.063
vous+0.061
 saved+0.059
Neg logit contributions
 outfits-0.072
ies-0.070
 languages-0.068
 houses-0.068
 faces-0.067
Tokenyear
Token ID1941
Feature activation+1.80e-01

Pos logit contributions
 deliber+0.096
 rescued+0.096
+0.090
eh+0.074
vous+0.073
Neg logit contributions
 outfits-0.319
 houses-0.311
ies-0.310
 stations-0.305
 languages-0.300
Token-
Token ID12
Feature activation+1.26e-01

Pos logit contributions
 rescued+0.366
vous+0.328
 deliber+0.327
+0.324
eh+0.323
Neg logit contributions
ies-0.275
 houses-0.249
 languages-0.243
 outfits-0.242
 faces-0.242
Tokenold
Token ID727
Feature activation+8.56e-02

Pos logit contributions
 rescued+0.178
+0.169
 deliber+0.165
eh+0.158
vous+0.151
Neg logit contributions
 outfits-0.238
 houses-0.229
 countries-0.224
ies-0.224
 languages-0.223
Token boy
Token ID2933
Feature activation+2.00e-02

Pos logit contributions
 rescued+0.169
eh+0.162
vous+0.162
 deliber+0.155
year+0.148
Neg logit contributions
 appearance-0.125
 outfits-0.124
 shirts-0.119
 stations-0.116
 hoops-0.115
Token.
Token ID13
Feature activation+2.16e-02

Pos logit contributions
 rescued+0.026
eh+0.024
vous+0.023
 deliber+0.023
on+0.021
Neg logit contributions
 outfits-0.043
 appearance-0.043
 shirts-0.042
 stations-0.041
 languages-0.041
Token He
Token ID679
Feature activation+9.22e-03

Pos logit contributions
 rescued+0.028
eh+0.027
vous+0.026
 deliber+0.025
year+0.023
Neg logit contributions
 outfits-0.045
 appearance-0.045
 shirts-0.044
 houses-0.043
 languages-0.043
Token\n
Token ID198
Feature activation-5.82e-03

Pos logit contributions
 outfits+0.015
 appearance+0.014
 shirts+0.014
 houses+0.014
 stations+0.014
Neg logit contributions
 rescued-0.010
eh-0.010
vous-0.010
 deliber-0.009
year-0.008
TokenThey
Token ID2990
Feature activation-2.91e-02

Pos logit contributions
 appearance+0.008
 outfits+0.008
 shirts+0.007
 languages+0.007
 houses+0.007
Neg logit contributions
 rescued-0.012
eh-0.012
vous-0.011
 deliber-0.011
year-0.011
Token get
Token ID651
Feature activation+1.61e-02

Pos logit contributions
 outfits+0.050
 appearance+0.050
 languages+0.049
 stations+0.047
 houses+0.047
Neg logit contributions
 rescued-0.051
eh-0.047
 deliber-0.045
vous-0.045
 saved-0.044
Token a
Token ID257
Feature activation-2.11e-02

Pos logit contributions
 rescued+0.029
eh+0.029
vous+0.028
 deliber+0.028
year+0.026
Neg logit contributions
 outfits-0.026
 appearance-0.025
 shirts-0.025
 houses-0.024
 stations-0.024
Token m
Token ID285
Feature activation+3.26e-02

Pos logit contributions
 appearance+0.042
 outfits+0.042
 languages+0.040
 stations+0.040
 shirts+0.040
Neg logit contributions
 rescued-0.031
eh-0.031
vous-0.029
 deliber-0.028
on-0.026
Tokenitt
Token ID715
Feature activation+1.80e-01

Pos logit contributions
 rescued+0.052
eh+0.048
 deliber+0.048
vous+0.046
 saved+0.044
Neg logit contributions
ies-0.061
 faces-0.058
 outfits-0.057
 appearance-0.056
 shirts-0.054
Token and
Token ID290
Feature activation-2.26e-02

Pos logit contributions
 rescued+0.344
eh+0.339
vous+0.330
 deliber+0.325
 eaten+0.306
Neg logit contributions
 outfits-0.269
 appearance-0.266
 shirts-0.253
 houses-0.248
 stations-0.247
Token a
Token ID257
Feature activation-2.24e-02

Pos logit contributions
 outfits+0.044
 appearance+0.044
 languages+0.042
 shirts+0.041
 stations+0.041
Neg logit contributions
 rescued-0.034
eh-0.032
vous-0.031
 deliber-0.031
 eaten-0.029
Token spat
Token ID15246
Feature activation+5.26e-03

Pos logit contributions
 appearance+0.045
 outfits+0.044
 languages+0.044
 stations+0.043
 countries+0.043
Neg logit contributions
 rescued-0.033
eh-0.032
 deliber-0.030
vous-0.030
on-0.028
Tokenula
Token ID4712
Feature activation-5.44e-02

Pos logit contributions
 rescued+0.008
on+0.008
eh+0.008
 deliber+0.007
dd+0.007
Neg logit contributions
 appearance-0.010
 languages-0.010
 outfits-0.010
 countries-0.009
 stations-0.009
Token.
Token ID13
Feature activation-1.46e-02

Pos logit contributions
 outfits+0.099
 appearance+0.098
 languages+0.096
 countries+0.094
 stations+0.093
Neg logit contributions
 rescued-0.089
eh-0.085
vous-0.082
 deliber-0.080
on-0.078
Token eat
Token ID4483
Feature activation+1.44e-02

Pos logit contributions
 outfits+0.048
 appearance+0.046
 stations+0.046
 languages+0.046
 shirts+0.046
Neg logit contributions
 rescued-0.043
vous-0.039
 deliber-0.039
eh-0.038
 oblig-0.038
Token some
Token ID617
Feature activation-2.19e-02

Pos logit contributions
eh+0.018
 rescued+0.018
on+0.017
 deliber+0.016
 che+0.016
Neg logit contributions
 outfits-0.030
 appearance-0.030
 stations-0.029
 languages-0.029
ies-0.028
Token of
Token ID286
Feature activation-2.68e-02

Pos logit contributions
 appearance+0.046
 outfits+0.044
 stations+0.043
 heads+0.041
 languages+0.041
Neg logit contributions
eh-0.030
 rescued-0.030
vous-0.029
 deliber-0.028
on-0.026
Token the
Token ID262
Feature activation+5.91e-03

Pos logit contributions
 appearance+0.058
 outfits+0.058
 stations+0.056
 languages+0.056
ies+0.055
Neg logit contributions
 rescued-0.033
eh-0.031
 deliber-0.030
on-0.029
vous-0.028
Token fish
Token ID5916
Feature activation-2.39e-02

Pos logit contributions
 rescued+0.009
vous+0.009
eh+0.008
 deliber+0.008
 saved+0.008
Neg logit contributions
 appearance-0.011
 outfits-0.010
 languages-0.010
adoes-0.010
 stations-0.010
Token~
Token ID93
Feature activation-1.28e-01

Pos logit contributions
 appearance+0.049
 outfits+0.048
 stations+0.047
 shirts+0.046
 languages+0.046
Neg logit contributions
 rescued-0.035
eh-0.034
vous-0.032
on-0.031
 eaten-0.030
Tokenâ
Token ID22940
Feature activation-4.14e-02

Pos logit contributions
 appearance+0.273
 outfits+0.270
 shirts+0.256
 languages+0.252
 faces+0.252
Neg logit contributions
 rescued-0.176
eh-0.168
vous-0.164
 deliber-0.156
on-0.155
Token
Token ID26391
Feature activation+2.26e-02

Pos logit contributions
 appearance+0.081
 outfits+0.078
 shirts+0.077
 stations+0.074
 languages+0.074
Neg logit contributions
 rescued-0.054
itt-0.053
eh-0.053
year-0.051
 deliber-0.049
Token\n
Token ID198
Feature activation+1.31e-02

Pos logit contributions
 rescued+0.036
eh+0.035
vous+0.035
 deliber+0.034
year+0.033
Neg logit contributions
 outfits-0.040
 appearance-0.039
 shirts-0.039
 houses-0.038
 countries-0.037
Token\n
Token ID198
Feature activation+1.37e-02

Pos logit contributions
 rescued+0.021
eh+0.021
vous+0.020
 deliber+0.019
 eaten+0.019
Neg logit contributions
 outfits-0.022
 shirts-0.021
 houses-0.021
 appearance-0.021
 languages-0.021
TokenJane
Token ID41083
Feature activation+2.03e-02

Pos logit contributions
 rescued+0.020
eh+0.019
vous+0.019
 deliber+0.018
 eaten+0.017
Neg logit contributions
 outfits-0.026
 appearance-0.025
 shirts-0.024
 houses-0.024
 stations-0.024
Token't
Token ID470
Feature activation+3.78e-03

Pos logit contributions
 rescued+0.023
eh+0.021
vous+0.021
 deliber+0.021
on+0.019
Neg logit contributions
 outfits-0.040
 appearance-0.040
 shirts-0.038
 houses-0.038
 stations-0.037
Token judge
Token ID5052
Feature activation-6.97e-02

Pos logit contributions
 rescued+0.005
 eaten+0.005
vous+0.005
 che+0.005
 deliber+0.005
Neg logit contributions
 outfits-0.008
lights-0.008
 shirts-0.007
 stations-0.007
 houses-0.007
Token things
Token ID1243
Feature activation-2.92e-02

Pos logit contributions
lights+0.115
 outfits+0.113
 runway+0.112
utes+0.111
 stations+0.109
Neg logit contributions
 rescued-0.114
 saved-0.104
vous-0.103
 helped-0.100
sm-0.099
Token by
Token ID416
Feature activation+2.13e-02

Pos logit contributions
 outfits+0.067
maid+0.063
 stations+0.063
 appearance+0.063
 shirts+0.063
Neg logit contributions
 rescued-0.035
 saved-0.031
 deliber-0.028
 helped-0.028
sm-0.027
Token their
Token ID511
Feature activation-8.23e-03

Pos logit contributions
 rescued+0.033
 deliber+0.032
vous+0.031
eh+0.030
 eaten+0.030
Neg logit contributions
 outfits-0.038
 shirts-0.036
 houses-0.036
lights-0.035
ies-0.035
Token appearance
Token ID5585
Feature activation-1.23e-01

Pos logit contributions
lights+0.010
bles+0.009
ists+0.009
ters+0.009
utes+0.009
Neg logit contributions
 deliber-0.015
 rescued-0.015
 che-0.014
vous-0.014
 eaten-0.014
Token,
Token ID11
Feature activation-2.02e-02

Pos logit contributions
 outfits+0.281
inate+0.278
utes+0.276
 sax+0.266
 shirts+0.264
Neg logit contributions
on-0.097
 rescued-0.086
sm-0.085
 ever-0.081
 saved-0.079
Token be
Token ID307
Feature activation+5.16e-04

Pos logit contributions
 outfits+0.041
maid+0.039
utes+0.037
 shirts+0.036
 trigger+0.036
Neg logit contributions
 saved-0.027
 rescued-0.027
 eaten-0.024
 helped-0.024
 che-0.023
Token curious
Token ID11040
Feature activation-3.49e-03

Pos logit contributions
 rescued+0.001
eh+0.001
vous+0.001
 deliber+0.001
year+0.000
Neg logit contributions
 outfits-0.001
 appearance-0.001
 shirts-0.001
 houses-0.001
 stations-0.001
Token and
Token ID290
Feature activation-1.81e-02

Pos logit contributions
utes+0.007
inate+0.007
maid+0.007
 outfits+0.007
ramids+0.007
Neg logit contributions
 rescued-0.004
vous-0.004
 helped-0.004
 saved-0.004
sm-0.004
Token respectful
Token ID25923
Feature activation+2.08e-02

Pos logit contributions
 outfits+0.037
 shirts+0.036
 appearance+0.035
 stations+0.035
maid+0.034
Neg logit contributions
 rescued-0.027
vous-0.024
 deliber-0.023
eh-0.023
 saved-0.023
Token a
Token ID257
Feature activation-2.43e-02

Pos logit contributions
 outfits+0.032
 appearance+0.031
 stations+0.030
 shirts+0.030
 languages+0.030
Neg logit contributions
 rescued-0.021
eh-0.019
 saved-0.019
on-0.018
 eaten-0.018
Token trick
Token ID6908
Feature activation-8.21e-02

Pos logit contributions
 stations+0.043
 countries+0.043
 languages+0.042
 outfits+0.041
 shirts+0.041
Neg logit contributions
eh-0.038
 rescued-0.037
vous-0.036
 che-0.035
 deliber-0.033
Token for
Token ID329
Feature activation+6.39e-03

Pos logit contributions
 appearance+0.187
 outfits+0.182
 heads+0.171
 shirts+0.170
 houses+0.170
Neg logit contributions
on-0.101
 rescued-0.097
eh-0.090
vous-0.089
year-0.083
Token you
Token ID345
Feature activation-2.73e-03

Pos logit contributions
 rescued+0.009
on+0.008
 deliber+0.008
eh+0.008
vous+0.007
Neg logit contributions
 outfits-0.013
 stations-0.012
ies-0.012
 appearance-0.012
inate-0.012
Token.
Token ID13
Feature activation-2.67e-02

Pos logit contributions
 outfits+0.006
 stations+0.006
lights+0.006
 houses+0.006
utes+0.006
Neg logit contributions
 rescued-0.003
 eaten-0.003
vous-0.003
eh-0.003
on-0.002
Token Watch
Token ID6305
Feature activation-1.23e-01

Pos logit contributions
 outfits+0.051
 appearance+0.050
ies+0.049
 shirts+0.048
 stations+0.047
Neg logit contributions
 rescued-0.041
vous-0.039
eh-0.038
 deliber-0.038
on-0.035
Token this
Token ID428
Feature activation-4.57e-02

Pos logit contributions
 outfits+0.227
 appearance+0.225
 cultures+0.221
 stations+0.221
 houses+0.217
Neg logit contributions
 rescued-0.191
on-0.184
eh-0.182
 deliber-0.173
vous-0.161
Token."
Token ID526
Feature activation+2.27e-03

Pos logit contributions
 stations+0.095
 outfits+0.094
 shirts+0.094
lights+0.093
 houses+0.092
Neg logit contributions
eh-0.054
 rescued-0.054
vous-0.050
 eaten-0.050
on-0.049
Token He
Token ID679
Feature activation-2.76e-02

Pos logit contributions
 rescued+0.003
eh+0.002
 deliber+0.002
vous+0.002
on+0.002
Neg logit contributions
 outfits-0.005
 appearance-0.005
 stations-0.005
 shirts-0.005
ies-0.005
Token took
Token ID1718
Feature activation-1.28e-02

Pos logit contributions
 appearance+0.067
 outfits+0.063
 languages+0.062
 stations+0.061
 countries+0.061
Neg logit contributions
 rescued-0.035
eh-0.031
 saved-0.028
 deliber-0.028
vous-0.027
Token the
Token ID262
Feature activation-4.91e-03

Pos logit contributions
 outfits+0.025
 appearance+0.024
 shirts+0.024
 houses+0.024
 stations+0.023
Neg logit contributions
 rescued-0.019
eh-0.018
vous-0.018
 deliber-0.018
year-0.016
Token also
Token ID635
Feature activation-1.03e-02

Pos logit contributions
 outfits+0.048
 appearance+0.046
 shirts+0.045
utes+0.045
ies+0.043
Neg logit contributions
 rescued-0.041
 saved-0.036
vous-0.035
 deliber-0.035
 che-0.035
Token see
Token ID766
Feature activation-2.90e-02

Pos logit contributions
 outfits+0.019
 appearance+0.018
 shirts+0.018
ies+0.018
 stations+0.018
Neg logit contributions
 rescued-0.017
vous-0.015
eh-0.015
 deliber-0.015
 eaten-0.014
Token words
Token ID2456
Feature activation-2.33e-02

Pos logit contributions
 outfits+0.059
 appearance+0.059
uing+0.056
inate+0.056
utes+0.055
Neg logit contributions
 rescued-0.038
 saved-0.034
vous-0.034
sm-0.034
eh-0.033
Token in
Token ID287
Feature activation-2.72e-02

Pos logit contributions
 outfits+0.059
 appearance+0.058
 heads+0.056
 stations+0.054
inate+0.054
Neg logit contributions
on-0.021
 rescued-0.018
 helped-0.018
 full-0.017
 saved-0.017
Token different
Token ID1180
Feature activation-1.72e-02

Pos logit contributions
 outfits+0.062
maid+0.062
ies+0.061
utes+0.060
 appearance+0.060
Neg logit contributions
 rescued-0.026
sm-0.023
vous-0.023
 saved-0.022
 che-0.021
Token languages
Token ID8950
Feature activation-1.22e-01

Pos logit contributions
utes+0.021
ute+0.019
maid+0.019
 trigger+0.019
 sax+0.018
Neg logit contributions
vous-0.034
 deliber-0.032
 rescued-0.032
 helped-0.030
sm-0.030
Token.
Token ID13
Feature activation-2.85e-02

Pos logit contributions
 appearance+0.312
 outfits+0.304
inate+0.297
 faces+0.289
 heads+0.287
Neg logit contributions
on-0.095
 rescued-0.084
 helped-0.080
 saved-0.076
sm-0.067
Token
Token ID198
Feature activation-1.30e-02

Pos logit contributions
 outfits+0.065
 appearance+0.063
ies+0.062
 shirts+0.060
 heads+0.059
Neg logit contributions
 rescued-0.034
 deliber-0.032
vous-0.031
eh-0.031
itt-0.030
Token
Token ID198
Feature activation-1.13e-02

Pos logit contributions
 outfits+0.027
 appearance+0.027
 shirts+0.026
 stations+0.025
 languages+0.025
Neg logit contributions
 rescued-0.017
eh-0.017
vous-0.016
 deliber-0.016
on-0.014
Token"
Token ID1
Feature activation+1.82e-03

Pos logit contributions
 appearance+0.019
 outfits+0.019
 shirts+0.017
 languages+0.017
 faces+0.017
Neg logit contributions
 rescued-0.020
eh-0.020
vous-0.020
 deliber-0.019
year-0.018
TokenWhat
Token ID2061
Feature activation-6.55e-02

Pos logit contributions
vous+0.003
 rescued+0.003
 deliber+0.003
eh+0.003
year+0.003
Neg logit contributions
 appearance-0.003
 outfits-0.003
ies-0.003
 faces-0.003
 shirts-0.003
Token "
Token ID366
Feature activation-4.18e-02

Pos logit contributions
 outfits+0.049
 appearance+0.048
 countries+0.046
 stations+0.046
 shirts+0.045
Neg logit contributions
 rescued-0.031
 deliber-0.028
eh-0.027
on-0.027
year-0.026
TokenYes
Token ID5297
Feature activation-1.78e-02

Pos logit contributions
 appearance+0.083
 outfits+0.082
ies+0.078
 shirts+0.076
 houses+0.075
Neg logit contributions
vous-0.063
 rescued-0.062
eh-0.059
 deliber-0.057
year-0.055
Token,
Token ID11
Feature activation-2.40e-02

Pos logit contributions
 outfits+0.041
 stations+0.039
 appearance+0.039
 languages+0.039
 houses+0.039
Neg logit contributions
 rescued-0.018
on-0.017
 deliber-0.016
eh-0.016
vous-0.015
Token part
Token ID636
Feature activation-1.89e-02

Pos logit contributions
 outfits+0.056
 appearance+0.052
 shirts+0.050
 heads+0.050
ies+0.049
Neg logit contributions
 rescued-0.027
on-0.025
 saved-0.025
 helped-0.023
 deliber-0.022
Token.
Token ID13
Feature activation-2.55e-02

Pos logit contributions
 outfits+0.034
 appearance+0.034
 shirts+0.032
 houses+0.031
 faces+0.031
Neg logit contributions
 rescued-0.031
vous-0.031
on-0.030
eh-0.029
sm-0.028
Token Watch
Token ID6305
Feature activation-1.22e-01

Pos logit contributions
 outfits+0.049
ies+0.047
 appearance+0.047
 shirts+0.046
 hoops+0.044
Neg logit contributions
 rescued-0.040
vous-0.036
 deliber-0.035
year-0.034
itt-0.033
Token this
Token ID428
Feature activation-4.72e-02

Pos logit contributions
 appearance+0.230
 outfits+0.230
 cultures+0.221
ies+0.220
 languages+0.218
Neg logit contributions
 rescued-0.179
on-0.170
eh-0.168
 deliber-0.157
vous-0.152
Token."
Token ID526
Feature activation-2.42e-03

Pos logit contributions
 shirts+0.096
 outfits+0.095
 hoops+0.089
lights+0.089
 stations+0.088
Neg logit contributions
 rescued-0.064
vous-0.061
eh-0.060
 deliber-0.058
on-0.057
Token
Token ID198
Feature activation-8.17e-03

Pos logit contributions
 appearance+0.006
 outfits+0.006
 languages+0.006
 stations+0.005
ies+0.005
Neg logit contributions
 rescued-0.002
 deliber-0.002
eh-0.002
itt-0.002
on-0.002
Token
Token ID198
Feature activation-7.48e-03

Pos logit contributions
 appearance+0.020
 outfits+0.019
 shirts+0.018
 languages+0.018
ies+0.018
Neg logit contributions
 rescued-0.009
eh-0.008
 deliber-0.008
vous-0.008
on-0.007
TokenThey
Token ID2990
Feature activation-3.11e-02

Pos logit contributions
 appearance+0.014
 outfits+0.013
maid+0.012
ies+0.012
 faces+0.012
Neg logit contributions
eh-0.014
 rescued-0.013
vous-0.012
on-0.012
itt-0.012
Token that
Token ID326
Feature activation-2.56e-02

Pos logit contributions
 outfits+0.078
 appearance+0.077
 stations+0.075
 shirts+0.075
ies+0.074
Neg logit contributions
 rescued-0.037
 deliber-0.032
eh-0.032
vous-0.030
 eaten-0.029
Token it
Token ID340
Feature activation-1.51e-02

Pos logit contributions
 outfits+0.049
maid+0.047
 shirts+0.047
 appearance+0.046
utes+0.045
Neg logit contributions
 rescued-0.036
 deliber-0.036
on-0.034
 eaten-0.034
 saved-0.032
Token's
Token ID338
Feature activation-4.83e-02

Pos logit contributions
 outfits+0.025
 appearance+0.024
 shirts+0.023
 stations+0.023
 faces+0.022
Neg logit contributions
 rescued-0.027
vous-0.026
eh-0.026
 deliber-0.025
 saved-0.024
Token not
Token ID407
Feature activation-4.40e-03

Pos logit contributions
 outfits+0.103
 appearance+0.100
 shirts+0.099
 houses+0.096
 stations+0.096
Neg logit contributions
 rescued-0.065
eh-0.062
vous-0.060
 deliber-0.058
 eaten-0.054
Token the
Token ID262
Feature activation-2.30e-03

Pos logit contributions
 outfits+0.009
 shirts+0.009
 appearance+0.009
maid+0.009
 stations+0.009
Neg logit contributions
 rescued-0.006
vous-0.005
eh-0.005
 saved-0.005
 eaten-0.005
Token appearance
Token ID5585
Feature activation-1.22e-01

Pos logit contributions
 outfits+0.004
ies+0.004
maid+0.004
 shirts+0.004
utes+0.004
Neg logit contributions
 rescued-0.003
vous-0.003
 deliber-0.003
eh-0.003
 saved-0.003
Token of
Token ID286
Feature activation-3.78e-02

Pos logit contributions
 outfits+0.267
inate+0.265
maid+0.262
utes+0.261
 faces+0.258
Neg logit contributions
on-0.125
 helped-0.113
sm-0.112
 rescued-0.112
vous-0.108
Token things
Token ID1243
Feature activation-2.97e-02

Pos logit contributions
 outfits+0.072
maid+0.071
ies+0.068
lights+0.066
 shirts+0.066
Neg logit contributions
 rescued-0.055
 deliber-0.050
 saved-0.048
 eaten-0.048
vous-0.047
Token that
Token ID326
Feature activation-2.61e-02

Pos logit contributions
 outfits+0.064
 stations+0.060
 appearance+0.060
 shirts+0.059
maid+0.058
Neg logit contributions
 rescued-0.039
 saved-0.034
on-0.034
 deliber-0.033
 helped-0.032
Token matter
Token ID2300
Feature activation-7.16e-02

Pos logit contributions
 outfits+0.047
 shirts+0.045
 stations+0.044
maid+0.044
 appearance+0.044
Neg logit contributions
 rescued-0.043
vous-0.040
 saved-0.039
 deliber-0.039
 eaten-0.038
Token,
Token ID11
Feature activation-1.42e-02

Pos logit contributions
 outfits+0.134
 shirts+0.131
 faces+0.131
 appearance+0.124
 edge+0.123
Neg logit contributions
 rescued-0.098
on-0.097
sm-0.094
year-0.091
vous-0.088
Token play
Token ID711
Feature activation-3.10e-02

Pos logit contributions
 outfits+0.039
 appearance+0.038
 shirts+0.036
 houses+0.036
 stations+0.035
Neg logit contributions
 rescued-0.054
eh-0.054
vous-0.053
 deliber-0.052
year-0.049
Token with
Token ID351
Feature activation+2.14e-02

Pos logit contributions
 outfits+0.047
 appearance+0.047
 shirts+0.045
 languages+0.044
 houses+0.043
Neg logit contributions
 rescued-0.059
eh-0.057
vous-0.056
 deliber-0.055
year-0.052
Token their
Token ID511
Feature activation-1.61e-02

Pos logit contributions
 rescued+0.034
eh+0.032
vous+0.030
 eaten+0.030
 saved+0.029
Neg logit contributions
 appearance-0.038
 outfits-0.038
 languages-0.036
maid-0.036
 countries-0.035
Token toys
Token ID14958
Feature activation-2.59e-02

Pos logit contributions
maid+0.021
 appearance+0.021
 outfits+0.020
inate+0.020
 languages+0.019
Neg logit contributions
 rescued-0.032
eh-0.030
 deliber-0.029
 saved-0.029
vous-0.028
Token and
Token ID290
Feature activation-2.97e-02

Pos logit contributions
 outfits+0.042
 appearance+0.041
 languages+0.039
 shirts+0.038
maid+0.038
Neg logit contributions
 rescued-0.047
eh-0.044
 saved-0.042
 deliber-0.042
vous-0.042
Token forget
Token ID6044
Feature activation-1.22e-01

Pos logit contributions
maid+0.050
 outfits+0.047
 appearance+0.047
utes+0.047
inate+0.046
Neg logit contributions
 rescued-0.053
 saved-0.048
 eaten-0.047
 deliber-0.047
 helped-0.046
Token about
Token ID546
Feature activation-2.35e-02

Pos logit contributions
 outfits+0.246
 appearance+0.243
maid+0.237
 shirts+0.233
 countries+0.231
Neg logit contributions
 rescued-0.160
 deliber-0.145
eh-0.139
 saved-0.139
 eaten-0.137
Token the
Token ID262
Feature activation-1.08e-02

Pos logit contributions
 outfits+0.043
 appearance+0.040
 shirts+0.040
 languages+0.040
 countries+0.039
Neg logit contributions
 rescued-0.037
eh-0.034
 deliber-0.034
vous-0.033
 eaten-0.033
Token tiger
Token ID26241
Feature activation+3.83e-03

Pos logit contributions
 outfits+0.017
maid+0.017
inate+0.016
 appearance+0.016
ies+0.016
Neg logit contributions
 rescued-0.019
eh-0.019
vous-0.018
 saved-0.017
 deliber-0.017
Token.
Token ID13
Feature activation-2.17e-02

Pos logit contributions
 rescued+0.005
eh+0.005
vous+0.005
on+0.005
 eaten+0.005
Neg logit contributions
 outfits-0.008
 appearance-0.008
 shirts-0.007
 languages-0.007
utes-0.007
Token They
Token ID1119
Feature activation-2.66e-02

Pos logit contributions
 outfits+0.035
ies+0.034
 appearance+0.034
 shirts+0.031
 countries+0.031
Neg logit contributions
 rescued-0.039
 deliber-0.038
vous-0.037
eh-0.036
 eaten-0.035
Token mouth
Token ID5422
Feature activation+1.62e-03

Pos logit contributions
 stations+0.052
 outfits+0.049
 countries+0.049
 languages+0.049
utes+0.049
Neg logit contributions
 eaten-0.044
eh-0.044
 rescued-0.043
vous-0.041
 che-0.040
Token.
Token ID13
Feature activation-3.01e-02

Pos logit contributions
eh+0.002
 rescued+0.002
vous+0.002
 eaten+0.002
on+0.002
Neg logit contributions
 outfits-0.003
 languages-0.003
 countries-0.003
 cultures-0.003
 appearance-0.003
Token And
Token ID843
Feature activation-6.39e-04

Pos logit contributions
 appearance+0.054
 outfits+0.053
 shirts+0.052
 houses+0.051
 stations+0.050
Neg logit contributions
 rescued-0.048
eh-0.048
vous-0.046
 deliber-0.045
year-0.042
Token don
Token ID836
Feature activation+1.49e-02

Pos logit contributions
 outfits+0.001
 stations+0.001
 appearance+0.001
 languages+0.001
 shirts+0.001
Neg logit contributions
 rescued-0.001
 deliber-0.001
 eaten-0.001
 saved-0.001
eh-0.001
Token't
Token ID470
Feature activation-1.69e-03

Pos logit contributions
eh+0.024
 rescued+0.024
vous+0.023
 deliber+0.022
year+0.021
Neg logit contributions
 outfits-0.026
 appearance-0.026
 houses-0.026
 shirts-0.025
 languages-0.025
Token forget
Token ID6044
Feature activation-1.22e-01

Pos logit contributions
 outfits+0.004
 stations+0.003
 languages+0.003
 shirts+0.003
 countries+0.003
Neg logit contributions
 rescued-0.002
 eaten-0.002
 che-0.002
vous-0.002
 deliber-0.002
Token to
Token ID284
Feature activation-2.20e-03

Pos logit contributions
 outfits+0.250
 languages+0.238
 stations+0.235
 hoops+0.235
 appearance+0.234
Neg logit contributions
 rescued-0.154
eh-0.146
 deliber-0.139
 eaten-0.138
vous-0.133
Token share
Token ID2648
Feature activation-2.22e-02

Pos logit contributions
 outfits+0.004
 languages+0.004
 stations+0.004
 shirts+0.004
 countries+0.004
Neg logit contributions
 rescued-0.004
eh-0.003
 eaten-0.003
vous-0.003
 deliber-0.003
Token."
Token ID526
Feature activation-4.58e-03

Pos logit contributions
 outfits+0.044
 stations+0.043
 languages+0.043
 appearance+0.042
utes+0.042
Neg logit contributions
 rescued-0.029
 eaten-0.028
on-0.027
eh-0.027
 che-0.026
Token
Token ID198
Feature activation-1.45e-02

Pos logit contributions
 outfits+0.010
 appearance+0.009
 shirts+0.009
 stations+0.009
 languages+0.009
Neg logit contributions
 rescued-0.006
eh-0.006
 deliber-0.006
vous-0.006
 eaten-0.005
Token
Token ID198
Feature activation-1.35e-02

Pos logit contributions
 outfits+0.031
 appearance+0.031
 shirts+0.029
 stations+0.029
 languages+0.029
Neg logit contributions
 rescued-0.019
eh-0.018
vous-0.018
 deliber-0.017
on-0.016
Token lot
Token ID1256
Feature activation+2.92e-02

Pos logit contributions
 appearance+0.046
 stations+0.043
 outfits+0.043
 languages+0.043
 countries+0.042
Neg logit contributions
 rescued-0.049
eh-0.047
 deliber-0.045
vous-0.043
 che-0.042
Token of
Token ID286
Feature activation-4.44e-02

Pos logit contributions
 rescued+0.053
eh+0.052
vous+0.051
 deliber+0.050
year+0.046
Neg logit contributions
 outfits-0.047
 appearance-0.046
 shirts-0.045
 houses-0.044
 stations-0.044
Token fun
Token ID1257
Feature activation+1.08e-02

Pos logit contributions
 outfits+0.084
 shirts+0.081
 appearance+0.080
 houses+0.078
 languages+0.078
Neg logit contributions
 rescued-0.070
eh-0.069
vous-0.067
 deliber-0.064
year-0.061
Token.
Token ID13
Feature activation-2.51e-02

Pos logit contributions
eh+0.015
 rescued+0.015
 deliber+0.015
vous+0.014
on+0.014
Neg logit contributions
 appearance-0.021
 outfits-0.021
 shirts-0.020
 languages-0.019
 stations-0.019
Token They
Token ID1119
Feature activation-2.53e-02

Pos logit contributions
 outfits+0.040
 appearance+0.039
ies+0.038
 shirts+0.037
 countries+0.036
Neg logit contributions
 rescued-0.047
 deliber-0.044
eh-0.044
vous-0.043
 eaten-0.041
Token forget
Token ID6044
Feature activation-1.22e-01

Pos logit contributions
 outfits+0.041
 appearance+0.041
 houses+0.039
 languages+0.039
ies+0.039
Neg logit contributions
 rescued-0.046
 saved-0.041
vous-0.040
 deliber-0.040
eh-0.039
Token about
Token ID546
Feature activation-2.26e-02

Pos logit contributions
 outfits+0.272
 appearance+0.269
 stations+0.260
 shirts+0.257
maid+0.257
Neg logit contributions
 rescued-0.129
 deliber-0.119
on-0.111
 eaten-0.110
 saved-0.110
Token their
Token ID511
Feature activation-1.46e-02

Pos logit contributions
 outfits+0.049
 appearance+0.046
 shirts+0.046
 stations+0.046
 languages+0.045
Neg logit contributions
 rescued-0.027
 deliber-0.025
 eaten-0.024
eh-0.024
vous-0.023
Token impat
Token ID28189
Feature activation-6.12e-04

Pos logit contributions
 outfits+0.022
 stations+0.022
 appearance+0.021
 shirts+0.021
ies+0.021
Neg logit contributions
 rescued-0.026
 deliber-0.025
eh-0.024
vous-0.024
 eaten-0.023
Tokenience
Token ID1240
Feature activation+1.05e-02

Pos logit contributions
 outfits+0.001
 appearance+0.001
 shirts+0.001
ies+0.001
 languages+0.001
Neg logit contributions
eh-0.001
 rescued-0.001
vous-0.001
 deliber-0.001
year-0.001
Token.
Token ID13
Feature activation-2.40e-02

Pos logit contributions
 rescued+0.012
 deliber+0.011
vous+0.011
 eaten+0.011
 saved+0.011
Neg logit contributions
 outfits-0.022
 appearance-0.022
maid-0.022
utes-0.022
inate-0.021
Token't
Token ID470
Feature activation+1.07e-03

Pos logit contributions
 outfits+0.098
 appearance+0.097
 shirts+0.095
 houses+0.094
 languages+0.094
Neg logit contributions
 rescued-0.073
eh-0.072
vous-0.070
 deliber-0.068
year-0.063
Token judge
Token ID5052
Feature activation-7.07e-02

Pos logit contributions
 che+0.001
 rescued+0.001
 eaten+0.001
vous+0.001
 deliber+0.001
Neg logit contributions
 outfits-0.002
lights-0.002
 stations-0.002
 shirts-0.002
maid-0.002
Token someone
Token ID2130
Feature activation-4.91e-02

Pos logit contributions
lights+0.132
 outfits+0.130
utes+0.127
 runway+0.126
 stations+0.123
Neg logit contributions
 rescued-0.101
vous-0.091
 saved-0.089
sm-0.089
 eaten-0.085
Token by
Token ID416
Feature activation+1.96e-02

Pos logit contributions
 outfits+0.090
 stations+0.086
 shirts+0.085
 appearance+0.084
lights+0.083
Neg logit contributions
 rescued-0.081
vous-0.074
eh-0.072
 saved-0.071
 eaten-0.070
Token their
Token ID511
Feature activation-9.16e-03

Pos logit contributions
 rescued+0.031
 deliber+0.030
vous+0.030
eh+0.030
 eaten+0.029
Neg logit contributions
 outfits-0.034
 shirts-0.032
 houses-0.031
maid-0.031
lights-0.031
Token appearance
Token ID5585
Feature activation-1.21e-01

Pos logit contributions
bles+0.010
utes+0.010
lights+0.010
uing+0.009
inate+0.009
Neg logit contributions
 rescued-0.017
 che-0.016
 deliber-0.015
 eaten-0.015
vous-0.015
Token.
Token ID13
Feature activation-2.57e-02

Pos logit contributions
utes+0.303
 outfits+0.300
inate+0.297
 sax+0.290
 stations+0.286
Neg logit contributions
on-0.068
sm-0.063
vous-0.060
 ever-0.059
 rescued-0.058
Token Sometimes
Token ID8975
Feature activation-1.95e-02

Pos logit contributions
 outfits+0.048
 shirts+0.046
ies+0.045
 appearance+0.045
 stations+0.044
Neg logit contributions
 rescued-0.040
vous-0.038
 deliber-0.038
eh-0.036
on-0.035
Token they
Token ID484
Feature activation-1.98e-02

Pos logit contributions
 outfits+0.042
utes+0.040
 shirts+0.040
inate+0.039
maid+0.039
Neg logit contributions
 rescued-0.022
 saved-0.021
 deliber-0.020
 helped-0.020
 che-0.019
Token are
Token ID389
Feature activation-1.58e-02

Pos logit contributions
 outfits+0.041
 shirts+0.039
lights+0.039
utes+0.038
 hoops+0.037
Neg logit contributions
 rescued-0.033
 saved-0.030
 defeated-0.027
 helped-0.027
 eaten-0.026
Token not
Token ID407
Feature activation-6.19e-03

Pos logit contributions
 outfits+0.037
 shirts+0.035
 stations+0.033
 appearance+0.033
lights+0.033
Neg logit contributions
 rescued-0.021
 eaten-0.019
 saved-0.018
eh-0.018
 defeated-0.017
Token.
Token ID13
Feature activation-2.34e-02

Pos logit contributions
 rescued+0.029
eh+0.027
 deliber+0.027
vous+0.027
on+0.026
Neg logit contributions
 appearance-0.044
 outfits-0.044
 languages-0.042
 stations-0.042
 shirts-0.042
Token They
Token ID1119
Feature activation-2.04e-02

Pos logit contributions
 outfits+0.037
 appearance+0.036
ies+0.035
 languages+0.034
 shirts+0.034
Neg logit contributions
 rescued-0.043
 deliber-0.042
eh-0.041
vous-0.040
 eaten-0.039
Token have
Token ID423
Feature activation-2.50e-03

Pos logit contributions
 outfits+0.030
 appearance+0.029
 shirts+0.028
 houses+0.027
 stations+0.027
Neg logit contributions
 rescued-0.040
eh-0.039
vous-0.039
 deliber-0.038
year-0.035
Token fun
Token ID1257
Feature activation+1.53e-02

Pos logit contributions
 outfits+0.005
 shirts+0.005
 appearance+0.005
 houses+0.005
 stations+0.004
Neg logit contributions
eh-0.004
vous-0.004
 rescued-0.004
 deliber-0.004
year-0.004
Token and
Token ID290
Feature activation-3.05e-02

Pos logit contributions
 rescued+0.030
eh+0.029
vous+0.029
 deliber+0.028
year+0.027
Neg logit contributions
 outfits-0.023
 appearance-0.022
 shirts-0.021
 houses-0.021
 stations-0.021
Token forget
Token ID6044
Feature activation-1.21e-01

Pos logit contributions
 outfits+0.051
 appearance+0.050
 shirts+0.048
 houses+0.047
 languages+0.047
Neg logit contributions
 rescued-0.053
 deliber-0.050
vous-0.049
eh-0.049
 eaten-0.046
Token about
Token ID546
Feature activation-2.05e-02

Pos logit contributions
 outfits+0.249
 appearance+0.247
 stations+0.235
 shirts+0.235
ies+0.233
Neg logit contributions
 rescued-0.153
 deliber-0.143
eh-0.139
 eaten-0.131
vous-0.131
Token the
Token ID262
Feature activation-7.31e-03

Pos logit contributions
 outfits+0.039
 appearance+0.038
 shirts+0.037
 languages+0.037
 stations+0.037
Neg logit contributions
 rescued-0.030
eh-0.028
 deliber-0.028
vous-0.027
 eaten-0.026
Token juice
Token ID13135
Feature activation-1.00e-02

Pos logit contributions
 outfits+0.012
 countries+0.012
 appearance+0.012
 languages+0.012
 stations+0.012
Neg logit contributions
 rescued-0.012
eh-0.011
vous-0.011
 che-0.011
 deliber-0.011
Token.
Token ID13
Feature activation-2.17e-02

Pos logit contributions
 appearance+0.022
 outfits+0.021
 languages+0.021
 cultures+0.021
inate+0.020
Neg logit contributions
 rescued-0.013
on-0.011
eh-0.011
 eaten-0.011
 saved-0.011
Token They
Token ID1119
Feature activation-2.16e-02

Pos logit contributions
 outfits+0.039
 appearance+0.038
 shirts+0.037
ies+0.036
 stations+0.036
Neg logit contributions
 rescued-0.035
eh-0.034
 deliber-0.033
vous-0.033
 eaten-0.031
Token lesson
Token ID11483
Feature activation-2.16e-02

Pos logit contributions
 outfits+0.047
 appearance+0.046
 shirts+0.044
 houses+0.044
 languages+0.043
Neg logit contributions
 rescued-0.047
eh-0.046
vous-0.045
 deliber-0.044
year-0.041
Token.
Token ID13
Feature activation-2.31e-02

Pos logit contributions
 outfits+0.037
 appearance+0.037
 shirts+0.036
 houses+0.035
 stations+0.035
Neg logit contributions
 rescued-0.037
eh-0.036
vous-0.035
 deliber-0.034
 eaten-0.032
Token They
Token ID1119
Feature activation-2.29e-02

Pos logit contributions
 outfits+0.033
 appearance+0.032
ies+0.030
 shirts+0.030
 stations+0.029
Neg logit contributions
 rescued-0.046
 deliber-0.045
eh-0.044
vous-0.044
 eaten-0.042
Token did
Token ID750
Feature activation-8.18e-03

Pos logit contributions
 outfits+0.039
 shirts+0.038
 countries+0.036
 stations+0.036
 appearance+0.036
Neg logit contributions
eh-0.040
 deliber-0.036
vous-0.036
itt-0.034
on-0.034
Token not
Token ID407
Feature activation-8.55e-03

Pos logit contributions
 outfits+0.016
 appearance+0.015
 shirts+0.015
 stations+0.015
 languages+0.015
Neg logit contributions
 rescued-0.012
eh-0.012
 deliber-0.011
vous-0.011
 eaten-0.011
Token forget
Token ID6044
Feature activation-1.21e-01

Pos logit contributions
 outfits+0.016
 shirts+0.015
 houses+0.015
 stations+0.015
 appearance+0.015
Neg logit contributions
 rescued-0.014
vous-0.014
eh-0.013
 deliber-0.013
 eaten-0.013
Token their
Token ID511
Feature activation-1.34e-02

Pos logit contributions
 outfits+0.272
 appearance+0.259
maid+0.255
 shirts+0.251
 languages+0.246
Neg logit contributions
 rescued-0.142
 saved-0.123
eh-0.123
 deliber-0.122
 eaten-0.120
Token new
Token ID649
Feature activation-3.26e-02

Pos logit contributions
 stations+0.021
maid+0.020
 outfits+0.020
 languages+0.020
 appearance+0.020
Neg logit contributions
 deliber-0.023
 rescued-0.022
vous-0.021
eh-0.021
 eaten-0.020
Token shoes
Token ID10012
Feature activation+2.05e-02

Pos logit contributions
hips+0.044
 stations+0.043
 appearance+0.041
 languages+0.041
 trigger+0.040
Neg logit contributions
 deliber-0.061
eh-0.058
 rescued-0.057
 eaten-0.055
vous-0.055
Token again
Token ID757
Feature activation-1.73e-03

Pos logit contributions
 rescued+0.032
vous+0.029
eh+0.029
 deliber+0.028
 eaten+0.027
Neg logit contributions
 outfits-0.039
 appearance-0.038
 languages-0.037
 shirts-0.036
 houses-0.036
Token.
Token ID13
Feature activation-2.42e-02

Pos logit contributions
 outfits+0.003
 appearance+0.003
 shirts+0.003
 houses+0.003
 stations+0.003
Neg logit contributions
 rescued-0.003
eh-0.003
vous-0.003
 deliber-0.003
 eaten-0.002
Token fly
Token ID6129
Feature activation-1.97e-02

Pos logit contributions
 outfits+0.068
 appearance+0.066
 languages+0.065
 countries+0.062
 stations+0.062
Neg logit contributions
 rescued-0.066
on-0.059
eh-0.058
 saved-0.057
 deliber-0.056
Token fast
Token ID3049
Feature activation+1.40e-02

Pos logit contributions
 outfits+0.045
 faces+0.044
 appearance+0.043
 shirts+0.042
maid+0.042
Neg logit contributions
 rescued-0.023
eh-0.019
 saved-0.019
 defeated-0.018
sm-0.017
Token and
Token ID290
Feature activation-2.34e-02

Pos logit contributions
 rescued+0.020
 saved+0.017
 eaten+0.017
eh+0.017
 che+0.017
Neg logit contributions
 outfits-0.027
maid-0.026
 appearance-0.026
inate-0.025
utes-0.024
Token loud
Token ID7812
Feature activation-4.04e-03

Pos logit contributions
 outfits+0.044
maid+0.041
 appearance+0.041
 shirts+0.041
ies+0.040
Neg logit contributions
 rescued-0.038
 saved-0.033
eh-0.032
vous-0.032
 che-0.032
Token.
Token ID13
Feature activation-2.11e-02

Pos logit contributions
 outfits+0.008
 appearance+0.008
maid+0.007
ies+0.007
 shirts+0.007
Neg logit contributions
 rescued-0.006
eh-0.005
 saved-0.005
 deliber-0.005
vous-0.005
Token Watch
Token ID6305
Feature activation-1.21e-01

Pos logit contributions
 outfits+0.044
ies+0.043
 appearance+0.043
 shirts+0.041
 countries+0.040
Neg logit contributions
 rescued-0.029
vous-0.026
 deliber-0.026
eh-0.026
on-0.025
Token this
Token ID428
Feature activation-4.15e-02

Pos logit contributions
 outfits+0.213
 appearance+0.213
 cultures+0.205
 countries+0.205
 languages+0.204
Neg logit contributions
 rescued-0.198
eh-0.187
on-0.184
 deliber-0.179
vous-0.172
Token!"
Token ID2474
Feature activation-1.06e-02

Pos logit contributions
 appearance+0.088
 outfits+0.088
 shirts+0.087
 cultures+0.086
 faces+0.086
Neg logit contributions
eh-0.045
 rescued-0.043
on-0.041
 saved-0.036
 eaten-0.036
Token
Token ID198
Feature activation-7.75e-03

Pos logit contributions
 appearance+0.022
 outfits+0.022
 shirts+0.021
 languages+0.021
 stations+0.021
Neg logit contributions
 rescued-0.015
eh-0.014
 deliber-0.014
vous-0.013
on-0.012
Token
Token ID198
Feature activation-6.26e-03

Pos logit contributions
 appearance+0.017
 outfits+0.017
 shirts+0.016
 stations+0.016
 languages+0.016
Neg logit contributions
 rescued-0.010
eh-0.009
vous-0.009
 deliber-0.009
on-0.008
TokenMax
Token ID11518
Feature activation-1.41e-02

Pos logit contributions
 appearance+0.013
 outfits+0.012
 languages+0.012
maid+0.012
 faces+0.012
Neg logit contributions
eh-0.009
 rescued-0.009
vous-0.009
jected-0.008
 deliber-0.008
Token again
Token ID757
Feature activation-1.62e-03

Pos logit contributions
 outfits+0.124
 appearance+0.122
 shirts+0.118
 stations+0.117
 languages+0.115
Neg logit contributions
 rescued-0.078
eh-0.070
 saved-0.067
 eaten-0.066
year-0.065
Token tomorrow
Token ID9439
Feature activation+2.35e-02

Pos logit contributions
 outfits+0.003
 appearance+0.003
 shirts+0.003
 languages+0.003
 faces+0.003
Neg logit contributions
 rescued-0.003
eh-0.003
vous-0.002
 deliber-0.002
 eaten-0.002
Token.
Token ID13
Feature activation-2.68e-02

Pos logit contributions
 rescued+0.035
 deliber+0.033
eh+0.033
vous+0.032
 eaten+0.031
Neg logit contributions
 outfits-0.044
 appearance-0.043
 languages-0.042
 hoops-0.042
 shirts-0.041
Token Don
Token ID2094
Feature activation-5.12e-02

Pos logit contributions
 outfits+0.044
 appearance+0.044
 shirts+0.043
 houses+0.042
 languages+0.042
Neg logit contributions
eh-0.046
 rescued-0.046
vous-0.044
 deliber-0.043
 eaten-0.040
Token't
Token ID470
Feature activation+8.53e-04

Pos logit contributions
 outfits+0.098
 appearance+0.097
 shirts+0.095
 houses+0.095
 languages+0.094
Neg logit contributions
 rescued-0.074
eh-0.073
vous-0.071
 deliber-0.069
year-0.065
Token forget
Token ID6044
Feature activation-1.21e-01

Pos logit contributions
 rescued+0.001
vous+0.001
 che+0.001
 eaten+0.001
 deliber+0.001
Neg logit contributions
 outfits-0.002
 stations-0.002
 shirts-0.002
utes-0.002
 hoops-0.002
Token me
Token ID502
Feature activation-4.66e-02

Pos logit contributions
 outfits+0.246
 shirts+0.231
 stations+0.231
 languages+0.229
 hoops+0.229
Neg logit contributions
 rescued-0.159
eh-0.149
 deliber-0.145
on-0.137
vous-0.137
Token!"
Token ID2474
Feature activation-9.54e-03

Pos logit contributions
 outfits+0.076
 appearance+0.075
 shirts+0.072
 houses+0.071
 stations+0.071
Neg logit contributions
 rescued-0.084
eh-0.081
vous-0.080
 deliber-0.078
 eaten-0.073
Token
Token ID198
Feature activation-1.01e-02

Pos logit contributions
 outfits+0.020
 appearance+0.020
 shirts+0.019
 stations+0.019
 houses+0.018
Neg logit contributions
 rescued-0.013
eh-0.013
vous-0.012
 deliber-0.012
on-0.011
Token
Token ID198
Feature activation-9.01e-03

Pos logit contributions
 appearance+0.022
 outfits+0.021
 shirts+0.020
 stations+0.020
 languages+0.020
Neg logit contributions
 rescued-0.013
eh-0.013
vous-0.012
 deliber-0.012
on-0.011
TokenL
Token ID43
Feature activation+6.24e-03

Pos logit contributions
 appearance+0.018
 outfits+0.017
 languages+0.016
 shirts+0.016
 faces+0.016
Neg logit contributions
eh-0.014
 rescued-0.014
vous-0.013
on-0.012
 deliber-0.012
Token They
Token ID1119
Feature activation-2.79e-02

Pos logit contributions
 outfits+0.051
 appearance+0.050
ies+0.048
 shirts+0.047
 stations+0.047
Neg logit contributions
 rescued-0.043
 deliber-0.041
eh-0.040
vous-0.040
 eaten-0.037
Token have
Token ID423
Feature activation-1.78e-03

Pos logit contributions
 outfits+0.044
 appearance+0.043
 shirts+0.041
 houses+0.041
 languages+0.040
Neg logit contributions
 rescued-0.053
eh-0.050
vous-0.049
 deliber-0.049
 saved-0.046
Token fun
Token ID1257
Feature activation+1.27e-02

Pos logit contributions
 shirts+0.003
 outfits+0.003
 houses+0.003
 faces+0.003
 appearance+0.003
Neg logit contributions
eh-0.003
vous-0.003
year-0.003
 deliber-0.003
itt-0.003
Token.
Token ID13
Feature activation-2.86e-02

Pos logit contributions
 rescued+0.022
eh+0.022
vous+0.021
 deliber+0.021
year+0.019
Neg logit contributions
 outfits-0.022
 appearance-0.021
 shirts-0.021
 houses-0.020
 stations-0.020
Token They
Token ID1119
Feature activation-2.96e-02

Pos logit contributions
 outfits+0.054
 appearance+0.053
 shirts+0.051
ies+0.050
 stations+0.050
Neg logit contributions
 rescued-0.045
eh-0.042
vous-0.042
 deliber-0.042
 eaten-0.038
Token forget
Token ID6044
Feature activation-1.21e-01

Pos logit contributions
 outfits+0.043
 appearance+0.043
 shirts+0.041
 houses+0.040
 languages+0.040
Neg logit contributions
 rescued-0.059
eh-0.056
vous-0.055
 deliber-0.055
 saved-0.051
Token about
Token ID546
Feature activation-2.17e-02

Pos logit contributions
 outfits+0.230
 appearance+0.227
 shirts+0.218
 stations+0.215
 languages+0.214
Neg logit contributions
 rescued-0.181
eh-0.171
 deliber-0.168
vous-0.166
 eaten-0.156
Token the
Token ID262
Feature activation-7.81e-03

Pos logit contributions
 outfits+0.043
 appearance+0.042
 shirts+0.041
 houses+0.040
 stations+0.040
Neg logit contributions
 rescued-0.032
eh-0.031
vous-0.030
 deliber-0.029
year-0.027
Token hurt
Token ID5938
Feature activation-3.66e-02

Pos logit contributions
 outfits+0.014
 appearance+0.013
 languages+0.012
 shirts+0.012
 stations+0.012
Neg logit contributions
 rescued-0.013
eh-0.013
vous-0.012
 deliber-0.012
 che-0.012
Token.
Token ID13
Feature activation-2.79e-02

Pos logit contributions
 outfits+0.059
 appearance+0.058
 languages+0.055
inate+0.052
 countries+0.052
Neg logit contributions
eh-0.066
 rescued-0.061
vous-0.061
on-0.060
 eaten-0.059
Token<|endoftext|>
Token ID50256
Feature activation-1.04e-01

Pos logit contributions
 outfits+0.048
 appearance+0.048
 shirts+0.045
ies+0.045
 stations+0.045
Neg logit contributions
 rescued-0.048
eh-0.045
vous-0.045
 deliber-0.045
 eaten-0.041
Token They
Token ID1119
Feature activation-2.34e-02

Pos logit contributions
ies+0.050
 outfits+0.049
 appearance+0.048
maid+0.046
 countries+0.043
Neg logit contributions
 rescued-0.046
 deliber-0.045
vous-0.041
 eaten-0.040
 saved-0.040
Token have
Token ID423
Feature activation-3.32e-03

Pos logit contributions
 appearance+0.042
 outfits+0.041
 languages+0.039
ies+0.039
 houses+0.038
Neg logit contributions
 rescued-0.040
 deliber-0.035
eh-0.034
vous-0.034
 saved-0.034
Token fun
Token ID1257
Feature activation+1.37e-02

Pos logit contributions
 appearance+0.007
 outfits+0.007
ies+0.007
 languages+0.007
 shirts+0.006
Neg logit contributions
 rescued-0.005
eh-0.004
vous-0.004
 deliber-0.004
 eaten-0.004
Token.
Token ID13
Feature activation-2.85e-02

Pos logit contributions
 rescued+0.022
eh+0.021
 deliber+0.021
vous+0.020
on+0.019
Neg logit contributions
 appearance-0.025
 outfits-0.024
 shirts-0.023
 languages-0.023
 countries-0.023
Token They
Token ID1119
Feature activation-2.51e-02

Pos logit contributions
 outfits+0.052
 appearance+0.050
ies+0.050
maid+0.047
 shirts+0.047
Neg logit contributions
 rescued-0.046
 deliber-0.044
vous-0.043
eh-0.042
 eaten-0.040
Token forget
Token ID6044
Feature activation-1.21e-01

Pos logit contributions
 outfits+0.042
 appearance+0.041
 languages+0.039
 houses+0.039
ies+0.039
Neg logit contributions
 rescued-0.045
vous-0.040
 deliber-0.040
eh-0.040
 saved-0.040
Token about
Token ID546
Feature activation-2.14e-02

Pos logit contributions
 appearance+0.259
 outfits+0.258
maid+0.246
 stations+0.244
ies+0.243
Neg logit contributions
 rescued-0.137
 deliber-0.128
eh-0.120
 saved-0.118
on-0.117
Token the
Token ID262
Feature activation-9.79e-03

Pos logit contributions
 outfits+0.044
 appearance+0.042
 shirts+0.041
ies+0.041
 languages+0.041
Neg logit contributions
 rescued-0.028
 deliber-0.027
eh-0.025
 eaten-0.025
vous-0.025
Token sun
Token ID4252
Feature activation-4.61e-04

Pos logit contributions
 outfits+0.016
 appearance+0.016
inate+0.015
ies+0.015
utes+0.015
Neg logit contributions
 rescued-0.016
vous-0.015
 deliber-0.015
eh-0.014
 saved-0.014
Token,
Token ID11
Feature activation-2.83e-02

Pos logit contributions
 appearance+0.001
 outfits+0.001
 languages+0.001
 countries+0.001
 hoops+0.001
Neg logit contributions
 rescued-0.001
on-0.001
 eaten-0.001
 saved-0.001
 deliber-0.001
Token the
Token ID262
Feature activation-1.03e-02

Pos logit contributions
 outfits+0.063
 appearance+0.061
 languages+0.059
maid+0.059
utes+0.059
Neg logit contributions
 rescued-0.035
 saved-0.030
eh-0.030
 eaten-0.030
 deliber-0.029
Token the
Token ID262
Feature activation-8.39e-03

Pos logit contributions
 outfits+0.074
utes+0.073
maid+0.073
 appearance+0.072
ies+0.069
Neg logit contributions
 rescued-0.032
 eaten-0.027
 deliber-0.026
 saved-0.026
 che-0.025
Token sees
Token ID7224
Feature activation-2.25e-02

Pos logit contributions
inate+0.014
 outfits+0.014
 appearance+0.013
uing+0.013
ies+0.013
Neg logit contributions
 rescued-0.013
 che-0.013
eh-0.013
vous-0.012
 saved-0.012
Tokenaw
Token ID707
Feature activation+4.94e-02

Pos logit contributions
 appearance+0.051
 outfits+0.049
inate+0.048
ies+0.047
maid+0.046
Neg logit contributions
 rescued-0.027
 saved-0.024
on-0.023
eh-0.023
sm-0.023
Token.
Token ID13
Feature activation-2.70e-02

Pos logit contributions
 rescued+0.053
on+0.049
eh+0.049
 eaten+0.045
sm+0.043
Neg logit contributions
 outfits-0.115
 appearance-0.110
 languages-0.107
 shirts-0.103
 countries-0.103
Token They
Token ID1119
Feature activation-2.75e-02

Pos logit contributions
 outfits+0.046
ies+0.045
 appearance+0.044
maid+0.042
 languages+0.041
Neg logit contributions
 rescued-0.047
 deliber-0.045
vous-0.043
eh-0.043
 eaten-0.042
Token forget
Token ID6044
Feature activation-1.21e-01

Pos logit contributions
 outfits+0.050
 appearance+0.049
ies+0.048
utes+0.047
 languages+0.047
Neg logit contributions
 rescued-0.047
 saved-0.041
 che-0.039
 deliber-0.039
vous-0.038
Token about
Token ID546
Feature activation-2.16e-02

Pos logit contributions
 outfits+0.265
 appearance+0.264
maid+0.252
ies+0.252
 shirts+0.250
Neg logit contributions
 rescued-0.135
 deliber-0.124
 saved-0.117
year-0.114
 eaten-0.114
Token the
Token ID262
Feature activation-8.64e-03

Pos logit contributions
 outfits+0.050
 appearance+0.047
ies+0.047
maid+0.047
utes+0.047
Neg logit contributions
 rescued-0.022
 deliber-0.021
 eaten-0.019
 che-0.018
 saved-0.018
Token new
Token ID649
Feature activation-3.36e-02

Pos logit contributions
 outfits+0.016
inate+0.015
maid+0.014
 appearance+0.014
uing+0.014
Neg logit contributions
 rescued-0.013
vous-0.013
 che-0.012
eh-0.012
 deliber-0.012
Token boy
Token ID2933
Feature activation-2.14e-02

Pos logit contributions
inate+0.054
 appearance+0.054
 outfits+0.053
 stations+0.052
uing+0.050
Neg logit contributions
eh-0.050
 deliber-0.050
vous-0.049
 che-0.049
 rescued-0.048
Token and
Token ID290
Feature activation-3.31e-02

Pos logit contributions
 appearance+0.039
 outfits+0.039
 stations+0.036
 shirts+0.035
 houses+0.035
Neg logit contributions
 rescued-0.035
on-0.033
eh-0.032
vous-0.032
 eaten-0.032
Token will
Token ID481
Feature activation-2.80e-02

Pos logit contributions
 outfits+0.016
 appearance+0.015
 shirts+0.015
 houses+0.015
 stations+0.015
Neg logit contributions
 rescued-0.014
vous-0.013
eh-0.013
 deliber-0.013
 eaten-0.012
Token take
Token ID1011
Feature activation-1.93e-02

Pos logit contributions
lights+0.060
 outfits+0.060
 stations+0.058
 shirts+0.057
 houses+0.057
Neg logit contributions
 rescued-0.042
 saved-0.036
 eaten-0.036
 deliber-0.034
 che-0.033
Token some
Token ID617
Feature activation-2.91e-02

Pos logit contributions
 outfits+0.037
 appearance+0.036
 stations+0.035
 shirts+0.035
ies+0.034
Neg logit contributions
 rescued-0.030
eh-0.027
vous-0.027
 deliber-0.026
on-0.026
Token money
Token ID1637
Feature activation+2.30e-02

Pos logit contributions
 appearance+0.059
 outfits+0.058
 stations+0.058
maid+0.057
inate+0.055
Neg logit contributions
 rescued-0.040
eh-0.036
 eaten-0.035
 deliber-0.035
sm-0.034
Token.
Token ID13
Feature activation-2.55e-02

Pos logit contributions
 rescued+0.032
 eaten+0.028
eh+0.027
on+0.027
 saved+0.027
Neg logit contributions
 outfits-0.047
 appearance-0.047
 languages-0.045
 hoops-0.044
 faces-0.044
Token Watch
Token ID6305
Feature activation-1.21e-01

Pos logit contributions
 outfits+0.046
 appearance+0.045
 shirts+0.043
ies+0.043
 houses+0.042
Neg logit contributions
 rescued-0.042
eh-0.040
vous-0.040
 deliber-0.039
year-0.036
Token me
Token ID502
Feature activation-4.48e-02

Pos logit contributions
 outfits+0.225
 appearance+0.219
ies+0.211
 stations+0.211
 languages+0.210
Neg logit contributions
 rescued-0.191
eh-0.184
on-0.177
 deliber-0.172
vous-0.168
Token."
Token ID526
Feature activation+2.75e-03

Pos logit contributions
 outfits+0.084
 appearance+0.082
 shirts+0.080
 stations+0.080
 languages+0.080
Neg logit contributions
 rescued-0.075
eh-0.070
vous-0.068
 deliber-0.065
 eaten-0.064
Token He
Token ID679
Feature activation-2.56e-02

Pos logit contributions
 rescued+0.004
eh+0.004
vous+0.004
 deliber+0.004
year+0.003
Neg logit contributions
 outfits-0.006
 appearance-0.006
 shirts-0.005
 houses-0.005
 stations-0.005
Token runs
Token ID4539
Feature activation-1.20e-02

Pos logit contributions
 appearance+0.047
 outfits+0.043
 stations+0.042
ies+0.042
 languages+0.041
Neg logit contributions
 rescued-0.045
eh-0.041
 deliber-0.039
 saved-0.039
 che-0.038
Token to
Token ID284
Feature activation+3.15e-03

Pos logit contributions
 outfits+0.022
 appearance+0.022
 shirts+0.021
 languages+0.021
 stations+0.021
Neg logit contributions
 rescued-0.019
eh-0.018
vous-0.018
 deliber-0.018
year-0.017
Token and
Token ID290
Feature activation-2.54e-02

Pos logit contributions
 appearance+0.047
 outfits+0.045
 hoops+0.042
 stations+0.042
utes+0.042
Neg logit contributions
 rescued-0.065
vous-0.061
eh-0.059
 deliber-0.058
 saved-0.058
Token their
Token ID511
Feature activation-1.35e-02

Pos logit contributions
 outfits+0.037
 appearance+0.036
 shirts+0.034
 stations+0.033
 languages+0.033
Neg logit contributions
 rescued-0.050
eh-0.048
 deliber-0.047
vous-0.047
 eaten-0.045
Token music
Token ID2647
Feature activation-3.56e-02

Pos logit contributions
inate+0.019
 appearance+0.019
 stations+0.018
maid+0.018
en+0.018
Neg logit contributions
 rescued-0.023
 deliber-0.021
vous-0.020
 saved-0.020
 satisfied-0.020
Token.
Token ID13
Feature activation-2.49e-02

Pos logit contributions
 appearance+0.081
inate+0.077
 countries+0.073
 outfits+0.073
utes+0.072
Neg logit contributions
 rescued-0.042
 helped-0.036
 saved-0.036
vous-0.034
 eaten-0.034
Token They
Token ID1119
Feature activation-2.24e-02

Pos logit contributions
ies+0.049
 outfits+0.048
 appearance+0.048
maid+0.047
 countries+0.045
Neg logit contributions
 rescued-0.035
 deliber-0.035
vous-0.030
 The-0.029
 helped-0.029
Token forget
Token ID6044
Feature activation-1.20e-01

Pos logit contributions
 appearance+0.038
 outfits+0.038
 houses+0.036
 shirts+0.036
ies+0.036
Neg logit contributions
 rescued-0.039
 deliber-0.035
vous-0.035
eh-0.035
 saved-0.034
Token about
Token ID546
Feature activation-2.26e-02

Pos logit contributions
 appearance+0.263
 outfits+0.260
maid+0.252
 heads+0.248
ies+0.248
Neg logit contributions
 rescued-0.132
 deliber-0.129
on-0.116
 saved-0.115
year-0.114
Token the
Token ID262
Feature activation-1.11e-02

Pos logit contributions
 outfits+0.046
 appearance+0.043
 countries+0.042
maid+0.042
utes+0.042
Neg logit contributions
 rescued-0.030
 deliber-0.029
 eaten-0.027
eh-0.026
 che-0.025
Token lion
Token ID18744
Feature activation-1.85e-02

Pos logit contributions
 appearance+0.018
 outfits+0.018
maid+0.018
inate+0.017
uing+0.017
Neg logit contributions
 rescued-0.018
vous-0.017
eh-0.017
 deliber-0.016
 saved-0.016
Token and
Token ID290
Feature activation-2.59e-02

Pos logit contributions
 outfits+0.033
 appearance+0.033
utes+0.032
inate+0.031
 stations+0.031
Neg logit contributions
 rescued-0.030
on-0.028
 saved-0.027
 eaten-0.027
eh-0.026
Token the
Token ID262
Feature activation-1.16e-02

Pos logit contributions
 outfits+0.058
 appearance+0.055
utes+0.055
maid+0.055
ies+0.054
Neg logit contributions
 rescued-0.033
 eaten-0.028
 saved-0.027
 deliber-0.027
eh-0.026
Token tomorrow
Token ID9439
Feature activation+2.20e-02

Pos logit contributions
 outfits+0.003
 appearance+0.002
 shirts+0.002
 languages+0.002
 stations+0.002
Neg logit contributions
 rescued-0.002
eh-0.002
vous-0.002
 deliber-0.002
 eaten-0.002
Token.
Token ID13
Feature activation-2.99e-02

Pos logit contributions
 rescued+0.032
eh+0.031
 deliber+0.030
vous+0.030
 eaten+0.028
Neg logit contributions
 outfits-0.043
 appearance-0.042
 languages-0.041
 stations-0.040
 shirts-0.040
Token And
Token ID843
Feature activation-1.75e-04

Pos logit contributions
 appearance+0.049
 outfits+0.048
 shirts+0.046
 houses+0.045
 languages+0.045
Neg logit contributions
eh-0.053
 rescued-0.051
vous-0.050
 deliber-0.049
dd-0.046
Token don
Token ID836
Feature activation+1.68e-02

Pos logit contributions
 outfits+0.000
 appearance+0.000
 shirts+0.000
 stations+0.000
ies+0.000
Neg logit contributions
 rescued-0.000
 deliber-0.000
vous-0.000
 saved-0.000
eh-0.000
Token't
Token ID470
Feature activation-2.36e-04

Pos logit contributions
eh+0.026
 rescued+0.026
vous+0.025
 deliber+0.024
year+0.023
Neg logit contributions
 outfits-0.030
 appearance-0.030
 houses-0.030
 shirts-0.029
 languages-0.029
Token forget
Token ID6044
Feature activation-1.20e-01

Pos logit contributions
 outfits+0.000
 stations+0.000
 shirts+0.000
utes+0.000
 hoops+0.000
Neg logit contributions
 rescued-0.000
 che-0.000
vous-0.000
 eaten-0.000
 deliber-0.000
Token your
Token ID534
Feature activation-3.00e-02

Pos logit contributions
 outfits+0.223
 languages+0.211
 appearance+0.211
 stations+0.210
ies+0.209
Neg logit contributions
 rescued-0.181
eh-0.171
 deliber-0.165
vous-0.162
 saved-0.157
Token bone
Token ID9970
Feature activation+1.33e-02

Pos logit contributions
 stations+0.055
 outfits+0.054
angles+0.053
 countries+0.053
inate+0.053
Neg logit contributions
 rescued-0.045
 deliber-0.041
vous-0.041
eh-0.039
 eaten-0.039
Token.
Token ID13
Feature activation-3.29e-02

Pos logit contributions
 rescued+0.019
vous+0.017
eh+0.016
 saved+0.016
 eaten+0.016
Neg logit contributions
 outfits-0.027
 appearance-0.027
 languages-0.027
 shirts-0.026
 stations-0.026
Token It
Token ID632
Feature activation-1.53e-03

Pos logit contributions
 outfits+0.063
 appearance+0.062
 shirts+0.060
 houses+0.060
 stations+0.059
Neg logit contributions
 rescued-0.049
eh-0.047
vous-0.047
 deliber-0.046
year-0.042
Token's
Token ID338
Feature activation-5.66e-02

Pos logit contributions
 outfits+0.002
 appearance+0.002
 shirts+0.002
 houses+0.002
 stations+0.002
Neg logit contributions
 rescued-0.003
eh-0.003
vous-0.003
 deliber-0.003
year-0.003
Token\n
Token ID198
Feature activation+1.34e-02

Pos logit contributions
 rescued+0.018
eh+0.017
vous+0.017
 deliber+0.016
year+0.015
Neg logit contributions
 outfits-0.027
 appearance-0.027
 shirts-0.026
 houses-0.025
 stations-0.025
TokenThe
Token ID464
Feature activation-1.53e-02

Pos logit contributions
eh+0.020
 rescued+0.019
vous+0.019
 deliber+0.018
year+0.017
Neg logit contributions
 appearance-0.028
 outfits-0.027
ies-0.026
 shirts-0.026
 faces-0.025
Token little
Token ID1310
Feature activation-7.11e-03

Pos logit contributions
 appearance+0.025
 outfits+0.025
 languages+0.024
 hoops+0.024
 stations+0.024
Neg logit contributions
 rescued-0.025
vous-0.025
 deliber-0.025
eh-0.024
year-0.024
Token three
Token ID1115
Feature activation-8.49e-03

Pos logit contributions
 appearance+0.010
 outfits+0.009
 languages+0.009
 stations+0.009
 hoops+0.009
Neg logit contributions
 rescued-0.014
vous-0.013
 deliber-0.013
eh-0.013
on-0.013
Token-
Token ID12
Feature activation+1.08e-01

Pos logit contributions
 appearance+0.019
 outfits+0.018
 stations+0.018
 shirts+0.018
 houses+0.017
Neg logit contributions
eh-0.010
vous-0.010
 rescued-0.010
 deliber-0.010
year-0.009
Tokenyear
Token ID1941
Feature activation+1.47e-01

Pos logit contributions
 rescued+0.071
 deliber+0.062
eh+0.055
vous+0.051
 saved+0.042
Neg logit contributions
 outfits-0.292
 languages-0.282
ies-0.282
 stations-0.281
 shirts-0.281
Token-
Token ID12
Feature activation+1.08e-01

Pos logit contributions
 rescued+0.260
eh+0.242
vous+0.242
 deliber+0.234
 saved+0.219
Neg logit contributions
ies-0.247
 outfits-0.246
 appearance-0.243
 houses-0.240
 shirts-0.238
Tokenold
Token ID727
Feature activation+5.93e-02

Pos logit contributions
 rescued+0.143
 deliber+0.131
eh+0.128
vous+0.121
+0.113
Neg logit contributions
 outfits-0.220
 languages-0.211
ies-0.210
 countries-0.208
 shirts-0.207
Token t
Token ID256
Feature activation+1.55e-02

Pos logit contributions
 rescued+0.090
vous+0.082
eh+0.080
 deliber+0.078
 saved+0.076
Neg logit contributions
 appearance-0.117
 outfits-0.114
 stations-0.110
 shirts-0.110
 hoops-0.110
Tokenipt
Token ID10257
Feature activation+8.51e-02

Pos logit contributions
 rescued+0.026
eh+0.026
vous+0.026
year+0.026
itt+0.025
Neg logit contributions
 appearance-0.026
 outfits-0.025
 shirts-0.024
 languages-0.024
 stations-0.024
Tokeno
Token ID78
Feature activation+9.70e-02

Pos logit contributions
 rescued+0.091
eh+0.088
 deliber+0.087
vous+0.084
dd+0.072
Neg logit contributions
 outfits-0.190
ies-0.189
 appearance-0.187
 houses-0.183
 shirts-0.181
Token over
Token ID625
Feature activation+2.25e-02

Pos logit contributions
 appearance+0.093
 outfits+0.089
 countries+0.086
 languages+0.085
 stations+0.084
Neg logit contributions
 rescued-0.080
eh-0.075
 deliber-0.070
vous-0.070
 eaten-0.067
Token.
Token ID13
Feature activation+9.67e-03

Pos logit contributions
 rescued+0.032
 deliber+0.029
eh+0.029
vous+0.028
 eaten+0.026
Neg logit contributions
 appearance-0.046
 outfits-0.046
 shirts-0.044
ies-0.043
 languages-0.043
Token\n
Token ID198
Feature activation+1.61e-02

Pos logit contributions
 rescued+0.013
eh+0.012
vous+0.012
 deliber+0.012
year+0.011
Neg logit contributions
 outfits-0.020
 appearance-0.020
 shirts-0.019
 stations-0.019
 houses-0.019
Token\n
Token ID198
Feature activation+1.63e-02

Pos logit contributions
 rescued+0.024
eh+0.023
vous+0.023
 deliber+0.022
year+0.020
Neg logit contributions
 outfits-0.031
 appearance-0.030
 shirts-0.029
 houses-0.029
 stations-0.029
TokenK
Token ID42
Feature activation+8.42e-02

Pos logit contributions
 rescued+0.022
eh+0.022
vous+0.021
 deliber+0.021
on+0.019
Neg logit contributions
 outfits-0.032
 appearance-0.031
 houses-0.031
 shirts-0.031
 stations-0.030
Tokenaren
Token ID5757
Feature activation+1.20e-01

Pos logit contributions
 rescued+0.109
eh+0.107
vous+0.100
 deliber+0.096
 eaten+0.093
Neg logit contributions
 outfits-0.174
 appearance-0.170
 shirts-0.169
ies-0.167
 languages-0.165
Token thought
Token ID1807
Feature activation+1.34e-02

Pos logit contributions
 rescued+0.176
eh+0.166
vous+0.160
 deliber+0.155
 eaten+0.145
Neg logit contributions
 appearance-0.243
 outfits-0.242
 stations-0.232
 shirts-0.230
 languages-0.230
Token,
Token ID11
Feature activation-3.62e-03

Pos logit contributions
 rescued+0.019
 deliber+0.018
eh+0.018
vous+0.017
on+0.017
Neg logit contributions
 outfits-0.027
 appearance-0.027
 shirts-0.025
 houses-0.025
 stations-0.025
Token "
Token ID366
Feature activation-2.20e-02

Pos logit contributions
 outfits+0.006
 appearance+0.006
 shirts+0.006
 houses+0.006
 stations+0.006
Neg logit contributions
 rescued-0.006
eh-0.006
vous-0.006
 deliber-0.006
year-0.005
TokenYou
Token ID1639
Feature activation+1.88e-02

Pos logit contributions
 outfits+0.044
 appearance+0.044
 shirts+0.043
ies+0.042
 faces+0.041
Neg logit contributions
vous-0.033
 rescued-0.033
eh-0.032
 deliber-0.029
year-0.028
Token want
Token ID765
Feature activation+1.95e-03

Pos logit contributions
 rescued+0.034
vous+0.031
eh+0.030
 saved+0.029
 deliber+0.029
Neg logit contributions
 outfits-0.033
 appearance-0.032
 shirts-0.032
 houses-0.031
 stations-0.031
Token he
Token ID339
Feature activation+1.19e-02

Pos logit contributions
 appearance+0.015
 houses+0.015
 outfits+0.015
 shirts+0.014
 stations+0.014
Neg logit contributions
eh-0.010
 rescued-0.010
vous-0.010
 deliber-0.010
year-0.009
Token heard
Token ID2982
Feature activation+8.31e-03

Pos logit contributions
 rescued+0.017
eh+0.017
 deliber+0.017
vous+0.017
year+0.016
Neg logit contributions
 outfits-0.022
 appearance-0.022
 shirts-0.021
 houses-0.021
ies-0.021
Token a
Token ID257
Feature activation+4.77e-04

Pos logit contributions
 rescued+0.009
eh+0.008
vous+0.008
 deliber+0.008
on+0.007
Neg logit contributions
 outfits-0.020
 appearance-0.020
 shirts-0.019
ies-0.019
 stations-0.019
Token loud
Token ID7812
Feature activation+1.35e-02

Pos logit contributions
 rescued+0.001
vous+0.001
eh+0.001
 deliber+0.001
 satisfied+0.001
Neg logit contributions
 outfits-0.001
 shirts-0.001
 appearance-0.001
 stations-0.001
 countries-0.001
Token me
Token ID502
Feature activation-3.12e-02

Pos logit contributions
 rescued+0.029
eh+0.028
vous+0.028
 deliber+0.027
year+0.026
Neg logit contributions
 outfits-0.017
 appearance-0.017
 shirts-0.016
 houses-0.016
 stations-0.016
Tokenowing
Token ID7855
Feature activation+1.45e-01

Pos logit contributions
 appearance+0.075
 outfits+0.075
ies+0.072
 shirts+0.072
 languages+0.071
Neg logit contributions
 rescued-0.030
eh-0.029
 deliber-0.028
vous-0.026
dd-0.024
Token.
Token ID13
Feature activation+1.36e-02

Pos logit contributions
 rescued+0.284
eh+0.279
vous+0.277
 deliber+0.266
on+0.254
Neg logit contributions
 outfits-0.216
 shirts-0.207
 appearance-0.205
 hoops-0.191
 countries-0.189
Token He
Token ID679
Feature activation-7.71e-04

Pos logit contributions
eh+0.023
 rescued+0.022
vous+0.022
dd+0.021
 deliber+0.021
Neg logit contributions
 houses-0.023
 appearance-0.022
 faces-0.022
 shirts-0.022
 outfits-0.022
Token peek
Token ID27185
Feature activation+4.34e-03

Pos logit contributions
 outfits+0.002
 appearance+0.002
 shirts+0.002
 stations+0.001
 languages+0.001
Neg logit contributions
 rescued-0.001
eh-0.001
vous-0.001
 deliber-0.001
year-0.001
Tokened
Token ID276
Feature activation+7.12e-02

Pos logit contributions
 rescued+0.005
eh+0.005
 deliber+0.005
vous+0.005
on+0.004
Neg logit contributions
 outfits-0.009
 appearance-0.009
 houses-0.009
 shirts-0.009
ies-0.009
Token out
Token ID503
Feature activation+2.30e-02

Pos logit contributions
 rescued+0.124
eh+0.116
 deliber+0.113
vous+0.112
year+0.105
Neg logit contributions
 outfits-0.123
 appearance-0.121
 shirts-0.118
 languages-0.115
 stations-0.114
Token One
Token ID1881
Feature activation-6.25e-03

Pos logit contributions
 rescued+0.035
 deliber+0.034
vous+0.033
eh+0.033
on+0.030
Neg logit contributions
 appearance-0.052
 outfits-0.052
ies-0.051
 shirts-0.048
 stations-0.048
Token day
Token ID1110
Feature activation+3.17e-02

Pos logit contributions
 outfits+0.012
 appearance+0.012
 shirts+0.011
 houses+0.011
 languages+0.011
Neg logit contributions
 rescued-0.009
eh-0.009
 deliber-0.009
vous-0.008
dd-0.008
Token,
Token ID11
Feature activation+7.53e-03

Pos logit contributions
 rescued+0.056
eh+0.055
vous+0.055
 deliber+0.053
year+0.050
Neg logit contributions
 outfits-0.051
 appearance-0.050
 shirts-0.049
 houses-0.049
ies-0.048
Token a
Token ID257
Feature activation+6.45e-03

Pos logit contributions
 rescued+0.007
eh+0.007
vous+0.007
 deliber+0.006
year+0.006
Neg logit contributions
 appearance-0.019
 outfits-0.019
 shirts-0.018
 languages-0.017
 stations-0.017
Token three
Token ID1115
Feature activation+2.84e-03

Pos logit contributions
 rescued+0.012
eh+0.012
vous+0.012
 deliber+0.011
on+0.011
Neg logit contributions
 outfits-0.010
 appearance-0.010
 shirts-0.009
 stations-0.009
 languages-0.009
Token-
Token ID12
Feature activation+1.19e-01

Pos logit contributions
eh+0.004
 rescued+0.003
vous+0.003
 deliber+0.003
year+0.003
Neg logit contributions
 appearance-0.006
 outfits-0.006
 shirts-0.006
 stations-0.006
 houses-0.006
Tokenyear
Token ID1941
Feature activation+1.55e-01

Pos logit contributions
 rescued+0.080
 deliber+0.075
+0.067
eh+0.058
 eaten+0.054
Neg logit contributions
 outfits-0.317
 languages-0.312
 stations-0.310
 countries-0.305
 houses-0.305
Token-
Token ID12
Feature activation+1.19e-01

Pos logit contributions
 rescued+0.289
 deliber+0.259
vous+0.258
eh+0.258
 saved+0.245
Neg logit contributions
ies-0.258
 outfits-0.248
 languages-0.240
 countries-0.240
 houses-0.239
Tokenold
Token ID727
Feature activation+6.77e-02

Pos logit contributions
 rescued+0.159
 deliber+0.147
eh+0.137
+0.136
vous+0.132
Neg logit contributions
 outfits-0.246
 languages-0.240
 countries-0.237
ies-0.235
 houses-0.233
Token boy
Token ID2933
Feature activation+9.28e-03

Pos logit contributions
 rescued+0.141
vous+0.135
eh+0.134
 deliber+0.127
 saved+0.123
Neg logit contributions
 appearance-0.095
 outfits-0.093
 shirts-0.089
 stations-0.085
 hoops-0.084
Token was
Token ID373
Feature activation+1.91e-02

Pos logit contributions
 rescued+0.010
eh+0.010
vous+0.009
 deliber+0.009
 eaten+0.008
Neg logit contributions
 appearance-0.022
 outfits-0.022
 shirts-0.021
 languages-0.021
 stations-0.021
Tokenmy
Token ID1820
Feature activation+3.21e-02

Pos logit contributions
 rescued+0.069
eh+0.065
vous+0.063
 deliber+0.060
on+0.059
Neg logit contributions
 outfits-0.088
 appearance-0.087
 stations-0.086
 languages-0.085
 houses-0.085
Token was
Token ID373
Feature activation+1.09e-02

Pos logit contributions
 rescued+0.046
eh+0.042
vous+0.040
 deliber+0.040
 eaten+0.038
Neg logit contributions
 outfits-0.066
 appearance-0.065
 languages-0.064
 stations-0.063
 shirts-0.062
Token a
Token ID257
Feature activation-2.61e-03

Pos logit contributions
 rescued+0.015
vous+0.014
eh+0.014
 deliber+0.014
year+0.012
Neg logit contributions
 outfits-0.021
 appearance-0.021
 shirts-0.021
 houses-0.021
 hoops-0.020
Token 3
Token ID513
Feature activation+1.26e-02

Pos logit contributions
 appearance+0.004
 outfits+0.004
 houses+0.004
 shirts+0.004
ies+0.004
Neg logit contributions
 rescued-0.005
eh-0.005
 deliber-0.005
vous-0.005
 eaten-0.004
Token-
Token ID12
Feature activation+1.11e-01

Pos logit contributions
 rescued+0.019
eh+0.018
vous+0.018
 deliber+0.018
 eaten+0.016
Neg logit contributions
 outfits-0.024
 appearance-0.023
 shirts-0.023
 houses-0.023
ies-0.023
Tokenyear
Token ID1941
Feature activation+1.47e-01

Pos logit contributions
 rescued+0.084
 deliber+0.081
+0.067
eh+0.063
 saved+0.062
Neg logit contributions
 outfits-0.279
ies-0.269
 houses-0.262
 stations-0.261
 languages-0.257
Token-
Token ID12
Feature activation+1.10e-01

Pos logit contributions
 rescued+0.289
eh+0.259
vous+0.258
 deliber+0.257
 saved+0.249
Neg logit contributions
ies-0.231
 outfits-0.221
 faces-0.214
 houses-0.212
 countries-0.211
Tokenold
Token ID727
Feature activation+5.83e-02

Pos logit contributions
 rescued+0.155
 deliber+0.144
+0.141
eh+0.131
 saved+0.130
Neg logit contributions
 outfits-0.215
ies-0.206
 languages-0.197
 countries-0.197
 houses-0.197
Token who
Token ID508
Feature activation+5.85e-02

Pos logit contributions
 rescued+0.118
eh+0.113
vous+0.113
 deliber+0.109
year+0.106
Neg logit contributions
 appearance-0.082
 outfits-0.081
 shirts-0.076
 stations-0.074
 languages-0.074
Token loved
Token ID6151
Feature activation+1.32e-02

Pos logit contributions
 rescued+0.083
eh+0.083
vous+0.079
 deliber+0.078
year+0.070
Neg logit contributions
 outfits-0.115
 appearance-0.113
 shirts-0.110
 houses-0.109
 languages-0.107
Token to
Token ID284
Feature activation+1.53e-02

Pos logit contributions
 rescued+0.016
 deliber+0.016
eh+0.015
vous+0.014
on+0.014
Neg logit contributions
 appearance-0.027
 outfits-0.027
maid-0.027
 stations-0.026
ies-0.026
Token magical
Token ID10883
Feature activation-5.48e-02

Pos logit contributions
 appearance+0.004
 houses+0.004
 outfits+0.004
 shirts+0.003
 hoops+0.003
Neg logit contributions
 rescued-0.005
eh-0.005
vous-0.005
 deliber-0.005
year-0.005
Token nail
Token ID17864
Feature activation+2.75e-02

Pos logit contributions
 outfits+0.092
 appearance+0.090
 shirts+0.087
 houses+0.085
 stations+0.085
Neg logit contributions
 rescued-0.095
eh-0.093
vous-0.091
 deliber-0.090
year-0.083
Token went
Token ID1816
Feature activation+1.04e-03

Pos logit contributions
 rescued+0.045
eh+0.040
vous+0.039
 saved+0.039
 eaten+0.038
Neg logit contributions
 appearance-0.051
inate-0.048
 shirts-0.048
 outfits-0.047
 stations-0.047
Token around
Token ID1088
Feature activation+2.01e-04

Pos logit contributions
 rescued+0.001
eh+0.001
 deliber+0.001
vous+0.001
 saved+0.001
Neg logit contributions
 appearance-0.002
 outfits-0.002
 shirts-0.002
 languages-0.002
 stations-0.002
Token freezing
Token ID20884
Feature activation+7.61e-03

Pos logit contributions
eh+0.000
 rescued+0.000
vous+0.000
 deliber+0.000
year+0.000
Neg logit contributions
 outfits-0.000
 appearance-0.000
 houses-0.000
 stations-0.000
 shirts-0.000
Token everything
Token ID2279
Feature activation+5.08e-02

Pos logit contributions
 rescued+0.015
eh+0.014
vous+0.014
 deliber+0.014
 saved+0.013
Neg logit contributions
 appearance-0.011
 outfits-0.011
 shirts-0.010
 languages-0.010
 countries-0.010
Token in
Token ID287
Feature activation-3.86e-03

Pos logit contributions
 rescued+0.082
eh+0.079
vous+0.078
 deliber+0.076
on+0.070
Neg logit contributions
 outfits-0.091
 appearance-0.090
 shirts-0.087
 languages-0.085
 stations-0.085
Token sight
Token ID6504
Feature activation+4.66e-02

Pos logit contributions
 outfits+0.008
 appearance+0.008
 shirts+0.008
ies+0.008
 languages+0.008
Neg logit contributions
 rescued-0.005
eh-0.005
vous-0.005
 deliber-0.005
 eaten-0.004
Token.
Token ID13
Feature activation+1.64e-02

Pos logit contributions
 rescued+0.058
vous+0.057
eh+0.055
 deliber+0.054
on+0.052
Neg logit contributions
 outfits-0.098
 appearance-0.096
 shirts-0.095
 stations-0.092
 hoops-0.089
Token Everywhere
Token ID48201
Feature activation-5.84e-02

Pos logit contributions
 rescued+0.029
eh+0.029
vous+0.028
 deliber+0.027
year+0.025
Neg logit contributions
 outfits-0.027
 appearance-0.026
 shirts-0.025
 houses-0.025
 languages-0.025
Token he
Token ID339
Feature activation+1.82e-02

Pos logit contributions
 outfits+0.117
 appearance+0.116
 shirts+0.114
 houses+0.113
 languages+0.111
Neg logit contributions
 rescued-0.081
eh-0.080
vous-0.077
 deliber-0.075
year-0.069
Token,
Token ID11
Feature activation-9.19e-03

Pos logit contributions
vous+0.000
eh+0.000
 rescued+0.000
 deliber+0.000
year+0.000
Neg logit contributions
 outfits-0.000
 appearance-0.000
ies-0.000
 faces-0.000
 shirts-0.000
Token "
Token ID366
Feature activation-3.05e-02

Pos logit contributions
 outfits+0.014
 appearance+0.014
 shirts+0.013
 houses+0.013
 stations+0.013
Neg logit contributions
 rescued-0.017
eh-0.017
vous-0.017
 deliber-0.016
year-0.015
TokenBut
Token ID1537
Feature activation-1.41e-03

Pos logit contributions
 outfits+0.048
 appearance+0.047
 shirts+0.045
 houses+0.045
 stations+0.044
Neg logit contributions
 rescued-0.056
eh-0.055
vous-0.054
 deliber-0.053
 eaten-0.049
Token I
Token ID314
Feature activation-8.93e-04

Pos logit contributions
 outfits+0.003
 languages+0.003
 stations+0.002
 shirts+0.002
 countries+0.002
Neg logit contributions
 rescued-0.002
 deliber-0.002
on-0.002
eh-0.002
 saved-0.002
Token want
Token ID765
Feature activation-4.02e-03

Pos logit contributions
 appearance+0.001
 outfits+0.001
ies+0.001
 shirts+0.001
 countries+0.001
Neg logit contributions
eh-0.002
 rescued-0.002
vous-0.002
 deliber-0.002
year-0.002
Token milk
Token ID7545
Feature activation+6.19e-02

Pos logit contributions
 outfits+0.008
 appearance+0.008
 stations+0.008
 languages+0.008
 shirts+0.008
Neg logit contributions
 rescued-0.006
eh-0.005
 deliber-0.005
vous-0.005
 eaten-0.005
Token now
Token ID783
Feature activation+2.06e-02

Pos logit contributions
 rescued+0.081
eh+0.077
on+0.071
 eaten+0.071
vous+0.071
Neg logit contributions
 languages-0.134
 appearance-0.132
 outfits-0.130
 countries-0.129
 stations-0.127
Token!"
Token ID2474
Feature activation+2.62e-04

Pos logit contributions
 rescued+0.032
eh+0.030
 deliber+0.030
vous+0.030
on+0.028
Neg logit contributions
 outfits-0.039
 appearance-0.038
 shirts-0.037
 languages-0.037
 houses-0.037
Token and
Token ID290
Feature activation-9.03e-03

Pos logit contributions
 rescued+0.000
eh+0.000
vous+0.000
 deliber+0.000
year+0.000
Neg logit contributions
 outfits-0.001
 appearance-0.001
 shirts-0.001
 houses-0.001
 stations-0.001
Token started
Token ID2067
Feature activation-7.26e-03

Pos logit contributions
 appearance+0.017
 outfits+0.017
 languages+0.016
 shirts+0.016
 stations+0.016
Neg logit contributions
 rescued-0.014
eh-0.013
 deliber-0.013
vous-0.013
 eaten-0.012
Token to
Token ID284
Feature activation+1.24e-02

Pos logit contributions
 outfits+0.012
 houses+0.011
 shirts+0.011
 appearance+0.011
 faces+0.011
Neg logit contributions
vous-0.014
eh-0.013
 rescued-0.013
jected-0.013
dd-0.012
Token,
Token ID11
Feature activation-2.02e-02

Pos logit contributions
 outfits+0.005
 appearance+0.004
 shirts+0.004
 houses+0.004
 stations+0.004
Neg logit contributions
 rescued-0.004
eh-0.004
vous-0.004
 deliber-0.004
year-0.003
Token "
Token ID366
Feature activation-4.05e-02

Pos logit contributions
 outfits+0.031
 appearance+0.031
 shirts+0.030
 houses+0.029
 stations+0.029
Neg logit contributions
 rescued-0.037
eh-0.037
vous-0.036
 deliber-0.035
year-0.033
TokenThat
Token ID2504
Feature activation-1.20e-02

Pos logit contributions
 outfits+0.066
 shirts+0.066
 houses+0.064
 stations+0.064
 languages+0.063
Neg logit contributions
eh-0.073
 rescued-0.071
vous-0.067
 deliber-0.065
 eaten-0.063
Token is
Token ID318
Feature activation+5.71e-03

Pos logit contributions
 appearance+0.018
 outfits+0.018
 shirts+0.017
 stations+0.017
 hoops+0.017
Neg logit contributions
 rescued-0.023
vous-0.022
eh-0.022
 deliber-0.021
on-0.020
Token very
Token ID845
Feature activation+2.07e-03

Pos logit contributions
eh+0.009
 rescued+0.009
vous+0.009
 deliber+0.008
dd+0.008
Neg logit contributions
 outfits-0.010
 appearance-0.010
 shirts-0.010
 houses-0.010
 languages-0.010
Token sweet
Token ID6029
Feature activation+5.68e-02

Pos logit contributions
 rescued+0.004
eh+0.004
vous+0.004
 deliber+0.004
ez+0.004
Neg logit contributions
 outfits-0.003
ies-0.003
 shirts-0.003
 languages-0.003
 appearance-0.002
Token of
Token ID286
Feature activation-3.09e-02

Pos logit contributions
vous+0.073
 deliber+0.068
eh+0.068
 rescued+0.067
sm+0.065
Neg logit contributions
 appearance-0.119
 outfits-0.116
utes-0.112
 stations-0.112
 shirts-0.110
Token you
Token ID345
Feature activation+7.51e-03

Pos logit contributions
 outfits+0.056
 appearance+0.054
 stations+0.051
 shirts+0.051
 houses+0.050
Neg logit contributions
 rescued-0.051
eh-0.049
 deliber-0.047
vous-0.047
on-0.046
Token.
Token ID13
Feature activation-2.24e-02

Pos logit contributions
 rescued+0.011
eh+0.011
vous+0.011
 deliber+0.010
on+0.010
Neg logit contributions
 outfits-0.014
 appearance-0.014
 shirts-0.014
 stations-0.013
 houses-0.013
Token But
Token ID887
Feature activation-1.48e-02

Pos logit contributions
 appearance+0.035
 outfits+0.034
 shirts+0.033
 languages+0.033
 houses+0.033
Neg logit contributions
eh-0.043
 rescued-0.040
vous-0.040
 deliber-0.038
dd-0.037
Token you
Token ID345
Feature activation+8.04e-03

Pos logit contributions
 outfits+0.025
 appearance+0.024
 shirts+0.024
 stations+0.023
 houses+0.023
Neg logit contributions
 rescued-0.026
 deliber-0.024
eh-0.024
vous-0.024
on-0.022
Tokenens
Token ID641
Feature activation+3.79e-02

Pos logit contributions
 rescued+0.015
eh+0.014
vous+0.014
 deliber+0.014
year+0.013
Neg logit contributions
 outfits-0.018
 appearance-0.017
 shirts-0.017
ies-0.017
 faces-0.017
Token.
Token ID13
Feature activation+4.56e-03

Pos logit contributions
 rescued+0.054
eh+0.049
 deliber+0.048
vous+0.046
 eaten+0.046
Neg logit contributions
 appearance-0.076
 outfits-0.074
 languages-0.071
 shirts-0.071
 stations-0.071
Token When
Token ID1649
Feature activation-3.23e-02

Pos logit contributions
 rescued+0.006
eh+0.005
 deliber+0.005
vous+0.005
 eaten+0.005
Neg logit contributions
 outfits-0.010
 appearance-0.010
 stations-0.010
 countries-0.010
 languages-0.010
Token the
Token ID262
Feature activation+4.93e-03

Pos logit contributions
 outfits+0.056
 appearance+0.055
 shirts+0.054
 houses+0.054
 hoops+0.053
Neg logit contributions
vous-0.054
 rescued-0.053
eh-0.053
 deliber-0.050
year-0.048
Token 30
Token ID1542
Feature activation+4.43e-02

Pos logit contributions
 rescued+0.011
eh+0.011
vous+0.011
 deliber+0.011
on+0.010
Neg logit contributions
 outfits-0.006
 appearance-0.006
 shirts-0.005
 countries-0.005
 stations-0.005
Token minutes
Token ID2431
Feature activation+7.69e-02

Pos logit contributions
 rescued+0.060
eh+0.059
vous+0.056
 deliber+0.056
on+0.050
Neg logit contributions
 outfits-0.091
 appearance-0.091
 shirts-0.087
 houses-0.086
 languages-0.086
Token were
Token ID547
Feature activation+4.92e-03

Pos logit contributions
vous+0.132
eh+0.131
 rescued+0.128
 deliber+0.120
jected+0.117
Neg logit contributions
 outfits-0.128
 shirts-0.122
ies-0.121
 houses-0.120
 appearance-0.118
Token up
Token ID510
Feature activation-9.42e-03

Pos logit contributions
 rescued+0.008
eh+0.008
vous+0.007
 deliber+0.007
year+0.007
Neg logit contributions
 outfits-0.009
 appearance-0.009
 shirts-0.009
 houses-0.009
 stations-0.008
Token,
Token ID11
Feature activation-9.09e-03

Pos logit contributions
 outfits+0.016
 appearance+0.016
 shirts+0.016
 houses+0.015
ies+0.015
Neg logit contributions
 rescued-0.016
eh-0.015
vous-0.015
 deliber-0.015
year-0.014
Token Lily
Token ID20037
Feature activation-7.25e-03

Pos logit contributions
 outfits+0.018
 appearance+0.018
 languages+0.018
 shirts+0.017
 stations+0.017
Neg logit contributions
 rescued-0.013
eh-0.013
 deliber-0.012
vous-0.012
 eaten-0.011
Token turned
Token ID2900
Feature activation-9.68e-05

Pos logit contributions
 appearance+0.015
 outfits+0.014
 languages+0.014
 stations+0.014
 countries+0.014
Neg logit contributions
 rescued-0.011
eh-0.010
vous-0.009
 deliber-0.009
 saved-0.009
Token tell
Token ID1560
Feature activation-1.39e-02

Pos logit contributions
 appearance+0.008
 languages+0.008
 stations+0.008
 nations+0.008
 countries+0.008
Neg logit contributions
 rescued-0.007
on-0.006
 deliber-0.006
 oblig-0.006
 saved-0.006
Token that
Token ID326
Feature activation-5.84e-03

Pos logit contributions
 appearance+0.031
ies+0.031
inate+0.030
utes+0.030
 stations+0.030
Neg logit contributions
 rescued-0.013
 deliber-0.013
on-0.013
zed-0.012
 saved-0.011
Token his
Token ID465
Feature activation-1.29e-04

Pos logit contributions
 appearance+0.015
 outfits+0.014
 heads+0.014
inate+0.014
maid+0.014
Neg logit contributions
on-0.005
 rescued-0.005
 deliber-0.005
 che-0.004
 saved-0.004
Token mom
Token ID1995
Feature activation-5.30e-03

Pos logit contributions
 stations+0.000
hips+0.000
angles+0.000
 trigger+0.000
 nations+0.000
Neg logit contributions
 rescued-0.000
 deliber-0.000
 helped-0.000
 che-0.000
on-0.000
Token was
Token ID373
Feature activation+1.52e-02

Pos logit contributions
 appearance+0.011
 stations+0.011
 outfits+0.011
 hoops+0.011
 faces+0.011
Neg logit contributions
 rescued-0.007
 saved-0.006
vous-0.006
 helped-0.006
on-0.006
Token struggling
Token ID9648
Feature activation+7.56e-02

Pos logit contributions
 rescued+0.020
eh+0.020
vous+0.019
 deliber+0.018
 eaten+0.017
Neg logit contributions
 outfits-0.032
 appearance-0.032
 shirts-0.031
 stations-0.030
 languages-0.030
Token to
Token ID284
Feature activation+2.25e-02

Pos logit contributions
 rescued+0.101
eh+0.096
vous+0.094
 deliber+0.090
 eaten+0.089
Neg logit contributions
 outfits-0.157
 appearance-0.153
 faces-0.149
 stations-0.149
 shirts-0.148
Token make
Token ID787
Feature activation-1.99e-02

Pos logit contributions
 rescued+0.038
eh+0.035
vous+0.035
 deliber+0.034
on+0.033
Neg logit contributions
 outfits-0.040
 appearance-0.039
 languages-0.038
 houses-0.038
 shirts-0.038
Token it
Token ID340
Feature activation+2.01e-03

Pos logit contributions
 appearance+0.044
 languages+0.042
 outfits+0.042
 nations+0.041
 countries+0.040
Neg logit contributions
 rescued-0.023
eh-0.021
on-0.021
 deliber-0.021
vous-0.020
Token.
Token ID13
Feature activation+2.40e-02

Pos logit contributions
 rescued+0.003
eh+0.002
 saved+0.002
vous+0.002
 eaten+0.002
Neg logit contributions
 appearance-0.004
 shirts-0.004
 outfits-0.004
 languages-0.004
 stations-0.004
Token He
Token ID679
Feature activation+5.68e-03

Pos logit contributions
 deliber+0.026
 rescued+0.025
eh+0.024
vous+0.024
on+0.022
Neg logit contributions
 appearance-0.055
 outfits-0.055
ies-0.053
 stations-0.053
 languages-0.052
Token day
Token ID1110
Feature activation+1.73e-02

Pos logit contributions
 appearance+0.039
 outfits+0.038
 languages+0.037
 houses+0.037
 shirts+0.037
Neg logit contributions
 rescued-0.036
eh-0.036
vous-0.035
 deliber-0.033
year-0.033
Token on
Token ID319
Feature activation-1.17e-02

Pos logit contributions
vous+0.043
eh+0.042
 deliber+0.040
sha+0.037
 rescued+0.037
Neg logit contributions
ies-0.013
 houses-0.013
 hoops-0.013
 shirts-0.012
 or-0.012
Token,
Token ID11
Feature activation-1.81e-02

Pos logit contributions
 outfits+0.023
 appearance+0.023
 stations+0.022
 shirts+0.022
 languages+0.022
Neg logit contributions
 rescued-0.017
eh-0.016
 deliber-0.016
vous-0.015
on-0.015
Token Tim
Token ID5045
Feature activation+9.10e-03

Pos logit contributions
 outfits+0.040
 appearance+0.040
 houses+0.039
 shirts+0.039
 stations+0.038
Neg logit contributions
 rescued-0.021
eh-0.021
vous-0.020
 deliber-0.019
year-0.018
Tokenmy
Token ID1820
Feature activation+2.40e-02

Pos logit contributions
 rescued+0.016
eh+0.015
vous+0.015
 deliber+0.015
 eaten+0.014
Neg logit contributions
 outfits-0.015
 appearance-0.015
 shirts-0.015
 houses-0.014
 stations-0.014
Token shared
Token ID4888
Feature activation-2.39e-02

Pos logit contributions
 rescued+0.036
eh+0.035
vous+0.034
 deliber+0.034
year+0.031
Neg logit contributions
 outfits-0.046
 appearance-0.045
 shirts-0.044
 houses-0.043
 languages-0.043
Token all
Token ID477
Feature activation-2.92e-02

Pos logit contributions
 appearance+0.065
 stations+0.064
 outfits+0.062
maid+0.061
utes+0.061
Neg logit contributions
 rescued-0.020
 saved-0.015
 eaten-0.014
 che-0.014
on-0.014
Token his
Token ID465
Feature activation-1.94e-02

Pos logit contributions
 outfits+0.052
 shirts+0.050
 houses+0.050
 languages+0.049
 appearance+0.049
Neg logit contributions
 rescued-0.048
eh-0.047
 deliber-0.045
vous-0.045
jected-0.041
Token toys
Token ID14958
Feature activation-1.83e-02

Pos logit contributions
 appearance+0.023
 outfits+0.021
 stations+0.021
maid+0.020
 shirts+0.019
Neg logit contributions
 rescued-0.044
vous-0.041
eh-0.041
 deliber-0.040
 saved-0.039
Token with
Token ID351
Feature activation+3.19e-02

Pos logit contributions
 appearance+0.027
 outfits+0.026
 stations+0.025
 shirts+0.024
 languages+0.024
Neg logit contributions
 rescued-0.036
eh-0.035
vous-0.034
 deliber-0.034
on-0.033
Token his
Token ID465
Feature activation-1.97e-02

Pos logit contributions
 rescued+0.032
eh+0.027
 eaten+0.027
 saved+0.026
vous+0.026
Neg logit contributions
 outfits-0.077
 appearance-0.076
 stations-0.075
 shirts-0.073
 languages-0.073
Token will
Token ID481
Feature activation-2.20e-02

Pos logit contributions
 outfits+0.014
 appearance+0.014
 shirts+0.013
 hoops+0.013
 languages+0.013
Neg logit contributions
 rescued-0.017
 saved-0.015
vous-0.015
 deliber-0.015
eh-0.014
Token find
Token ID1064
Feature activation-2.80e-02

Pos logit contributions
 hoops+0.044
 outfits+0.043
 languages+0.043
 shirts+0.042
ies+0.042
Neg logit contributions
 rescued-0.035
 saved-0.030
 oblig-0.028
 eaten-0.028
 deliber-0.028
Token a
Token ID257
Feature activation-1.35e-02

Pos logit contributions
 appearance+0.051
 outfits+0.050
 shirts+0.048
 stations+0.048
 languages+0.048
Neg logit contributions
 rescued-0.046
eh-0.040
vous-0.040
 deliber-0.039
 saved-0.039
Token good
Token ID922
Feature activation+3.45e-02

Pos logit contributions
 languages+0.024
 outfits+0.024
 appearance+0.024
 stations+0.024
ies+0.023
Neg logit contributions
 rescued-0.023
eh-0.020
 deliber-0.020
vous-0.019
 saved-0.019
Token home
Token ID1363
Feature activation+9.48e-03

Pos logit contributions
 rescued+0.059
 deliber+0.055
vous+0.054
 eaten+0.053
eh+0.052
Neg logit contributions
 appearance-0.057
 stations-0.056
 languages-0.056
utes-0.056
 outfits-0.056
Token with
Token ID351
Feature activation+3.30e-02

Pos logit contributions
 rescued+0.015
vous+0.013
eh+0.013
 deliber+0.013
year+0.013
Neg logit contributions
 appearance-0.018
 outfits-0.017
 hoops-0.017
 shirts-0.017
 languages-0.017
Token other
Token ID584
Feature activation-4.43e-02

Pos logit contributions
 rescued+0.053
 saved+0.046
 eaten+0.046
 deliber+0.046
eh+0.046
Neg logit contributions
 appearance-0.059
 outfits-0.058
 stations-0.055
utes-0.055
inate-0.055
Token kids
Token ID3988
Feature activation-2.93e-02

Pos logit contributions
utes+0.074
ute+0.067
inate+0.067
maid+0.063
 edge+0.063
Neg logit contributions
 rescued-0.078
 saved-0.072
vous-0.070
 deliber-0.070
sm-0.069
Token who
Token ID508
Feature activation+4.38e-02

Pos logit contributions
 appearance+0.056
 outfits+0.053
 stations+0.052
 houses+0.052
 languages+0.052
Neg logit contributions
 rescued-0.045
eh-0.041
vous-0.040
 deliber-0.039
 eaten-0.039
Token will
Token ID481
Feature activation-2.14e-02

Pos logit contributions
eh+0.076
 rescued+0.076
vous+0.074
 deliber+0.073
year+0.067
Neg logit contributions
 outfits-0.072
 appearance-0.071
 shirts-0.068
 houses-0.067
 stations-0.066
Token love
Token ID1842
Feature activation+2.43e-02

Pos logit contributions
 outfits+0.043
 hoops+0.042
 languages+0.041
 shirts+0.041
 appearance+0.041
Neg logit contributions
 rescued-0.033
 deliber-0.029
 saved-0.028
vous-0.028
 eaten-0.028
Token there
Token ID612
Feature activation-3.50e-03

Pos logit contributions
 rescued+0.017
 deliber+0.017
vous+0.016
eh+0.016
 oblig+0.015
Neg logit contributions
 houses-0.012
ies-0.012
 outfits-0.012
 appearance-0.012
 shirts-0.012
Token was
Token ID373
Feature activation+1.68e-02

Pos logit contributions
 outfits+0.005
 appearance+0.004
 houses+0.004
 shirts+0.004
ies+0.004
Neg logit contributions
eh-0.007
 rescued-0.007
vous-0.007
 deliber-0.007
year-0.007
Token a
Token ID257
Feature activation+5.80e-03

Pos logit contributions
 deliber+0.030
 rescued+0.029
vous+0.028
jected+0.028
 oblig+0.028
Neg logit contributions
 shirts-0.025
 houses-0.025
 hoops-0.025
 outfits-0.024
 boats-0.023
Token little
Token ID1310
Feature activation+1.01e-02

Pos logit contributions
 rescued+0.013
 deliber+0.012
vous+0.012
eh+0.012
year+0.012
Neg logit contributions
 houses-0.007
 outfits-0.006
 appearance-0.006
ies-0.006
 shirts-0.006
Token boy
Token ID2933
Feature activation+8.47e-03

Pos logit contributions
 rescued+0.023
eh+0.023
vous+0.022
 deliber+0.022
year+0.021
Neg logit contributions
 outfits-0.011
 appearance-0.011
 shirts-0.011
 houses-0.010
ies-0.010
Token named
Token ID3706
Feature activation+3.07e-02

Pos logit contributions
 deliber+0.015
 rescued+0.014
 succeeded+0.014
vous+0.014
jected+0.013
Neg logit contributions
ies-0.014
 houses-0.013
 outfits-0.013
 shirts-0.013
 appearance-0.013
Token Tim
Token ID5045
Feature activation+3.29e-02

Pos logit contributions
eh+0.058
jected+0.057
 rescued+0.056
vous+0.056
year+0.055
Neg logit contributions
 shirts-0.043
 hoops-0.042
 houses-0.042
 names-0.040
 faces-0.039
Token.
Token ID13
Feature activation+2.48e-02

Pos logit contributions
 deliber+0.075
 rescued+0.070
eh+0.069
itt+0.069
year+0.068
Neg logit contributions
ies-0.038
 outfits-0.036
 appearance-0.036
 houses-0.035
 shirts-0.035
Token Tim
Token ID5045
Feature activation+3.47e-02

Pos logit contributions
eh+0.043
 rescued+0.041
dd+0.041
jected+0.040
year+0.038
Neg logit contributions
 houses-0.042
 hoops-0.041
 shirts-0.040
 appearance-0.039
 faces-0.038
Token was
Token ID373
Feature activation+1.57e-02

Pos logit contributions
 deliber+0.069
year+0.065
eh+0.063
 oblig+0.063
 rescued+0.062
Neg logit contributions
 shirts-0.047
 houses-0.047
 outfits-0.047
ies-0.046
 appearance-0.043
Token a
Token ID257
Feature activation+5.10e-03

Pos logit contributions
 deliber+0.029
dd+0.029
jected+0.028
eh+0.028
sha+0.028
Neg logit contributions
 houses-0.023
 boats-0.020
 sax-0.020
 shirts-0.020
 hoops-0.020
Token joined
Token ID5399
Feature activation-6.05e-02

Pos logit contributions
utes+0.027
 appearance+0.026
 outfits+0.026
inate+0.025
maid+0.025
Neg logit contributions
 rescued-0.019
 saved-0.016
 helped-0.016
 deliber-0.016
 eaten-0.016
Token the
Token ID262
Feature activation+4.32e-03

Pos logit contributions
 appearance+0.119
 stations+0.113
 outfits+0.112
 sax+0.110
 shirts+0.106
Neg logit contributions
 rescued-0.085
 deliber-0.079
on-0.077
eh-0.077
year-0.076
Token castle
Token ID16669
Feature activation-5.07e-02

Pos logit contributions
 rescued+0.007
vous+0.007
 deliber+0.007
 che+0.007
 eaten+0.007
Neg logit contributions
 appearance-0.007
utes-0.007
 stations-0.007
inate-0.007
uing-0.007
Token.
Token ID13
Feature activation-5.48e-03

Pos logit contributions
 appearance+0.112
 outfits+0.104
 shirts+0.101
inate+0.101
maid+0.099
Neg logit contributions
 rescued-0.066
eh-0.061
on-0.055
vous-0.055
 deliber-0.054
Token They
Token ID1119
Feature activation-2.34e-03

Pos logit contributions
 outfits+0.009
 appearance+0.009
 shirts+0.009
ies+0.008
 stations+0.008
Neg logit contributions
 rescued-0.010
eh-0.009
 deliber-0.009
vous-0.009
year-0.009
Token had
Token ID550
Feature activation-1.90e-02

Pos logit contributions
 appearance+0.005
 outfits+0.005
 languages+0.005
 houses+0.005
 shirts+0.005
Neg logit contributions
 rescued-0.003
 saved-0.003
eh-0.003
vous-0.002
 deliber-0.002
Token a
Token ID257
Feature activation-1.02e-02

Pos logit contributions
 appearance+0.043
 outfits+0.042
ies+0.040
 shirts+0.040
 languages+0.040
Neg logit contributions
 rescued-0.024
eh-0.021
vous-0.021
 deliber-0.021
 eaten-0.019
Token lot
Token ID1256
Feature activation+4.67e-02

Pos logit contributions
 appearance+0.018
 stations+0.017
 outfits+0.017
adoes+0.017
 languages+0.016
Neg logit contributions
 rescued-0.018
eh-0.017
 che-0.016
 deliber-0.015
 satisfied-0.015
Token of
Token ID286
Feature activation-2.65e-02

Pos logit contributions
 rescued+0.076
eh+0.073
vous+0.072
 deliber+0.069
year+0.066
Neg logit contributions
 appearance-0.087
 outfits-0.086
 shirts-0.081
 houses-0.080
ies-0.079
Token fun
Token ID1257
Feature activation+2.82e-02

Pos logit contributions
 appearance+0.059
 outfits+0.057
 stations+0.056
ies+0.055
maid+0.055
Neg logit contributions
 rescued-0.032
eh-0.030
 deliber-0.030
vous-0.029
 eaten-0.026
Token in
Token ID287
Feature activation-1.31e-02

Pos logit contributions
eh+0.035
 rescued+0.035
 deliber+0.033
on+0.033
sm+0.032
Neg logit contributions
 appearance-0.060
 outfits-0.056
 shirts-0.055
 languages-0.054
maid-0.054
Token\n
Token ID198
Feature activation+2.06e-02

Pos logit contributions
 rescued+0.027
eh+0.026
vous+0.025
 deliber+0.025
on+0.023
Neg logit contributions
 appearance-0.044
 outfits-0.044
 shirts-0.042
 stations-0.042
 languages-0.041
TokenShe
Token ID3347
Feature activation-3.36e-03

Pos logit contributions
eh+0.031
 rescued+0.030
vous+0.029
 deliber+0.029
year+0.028
Neg logit contributions
 appearance-0.043
 outfits-0.040
maid-0.038
ies-0.038
 languages-0.038
Token asked
Token ID1965
Feature activation-2.70e-03

Pos logit contributions
 outfits+0.007
 appearance+0.007
 shirts+0.006
 houses+0.006
 stations+0.006
Neg logit contributions
 rescued-0.005
eh-0.005
vous-0.005
 deliber-0.004
year-0.004
Token the
Token ID262
Feature activation+1.47e-02

Pos logit contributions
 outfits+0.005
 appearance+0.005
 shirts+0.005
 houses+0.004
 languages+0.004
Neg logit contributions
 rescued-0.005
eh-0.004
vous-0.004
 deliber-0.004
dd-0.004
Token store
Token ID3650
Feature activation+6.62e-03

Pos logit contributions
 rescued+0.026
eh+0.024
 deliber+0.024
vous+0.024
 saved+0.022
Neg logit contributions
 outfits-0.025
 appearance-0.025
ies-0.023
 stations-0.023
maid-0.023
Token worker
Token ID8383
Feature activation+9.83e-03

Pos logit contributions
 rescued+0.011
 deliber+0.010
vous+0.010
eh+0.010
year+0.010
Neg logit contributions
 appearance-0.012
 outfits-0.012
 stations-0.011
 shirts-0.011
 hoops-0.011
Token,
Token ID11
Feature activation-3.16e-04

Pos logit contributions
 rescued+0.014
year+0.013
eh+0.013
 deliber+0.013
vous+0.012
Neg logit contributions
 outfits-0.019
 appearance-0.019
 stations-0.018
 shirts-0.018
 faces-0.018
Token "
Token ID366
Feature activation-1.99e-02

Pos logit contributions
 outfits+0.000
 appearance+0.000
 shirts+0.000
 houses+0.000
 stations+0.000
Neg logit contributions
 rescued-0.001
eh-0.001
vous-0.001
 deliber-0.001
dd-0.001
TokenCan
Token ID6090
Feature activation-1.86e-02

Pos logit contributions
 appearance+0.033
 outfits+0.033
ies+0.032
 faces+0.031
 languages+0.030
Neg logit contributions
 rescued-0.037
eh-0.035
vous-0.034
 deliber-0.034
year-0.034
Token you
Token ID345
Feature activation+1.54e-02

Pos logit contributions
 outfits+0.039
 stations+0.036
 appearance+0.035
ies+0.035
 languages+0.034
Neg logit contributions
 rescued-0.027
 deliber-0.025
on-0.024
eh-0.024
vous-0.023
Token get
Token ID651
Feature activation+3.23e-02

Pos logit contributions
 rescued+0.025
eh+0.023
vous+0.023
 deliber+0.023
 saved+0.021
Neg logit contributions
 outfits-0.029
 appearance-0.028
 stations-0.028
 shirts-0.027
 languages-0.027
Token flowers
Token ID12734
Feature activation-1.92e-02

Pos logit contributions
 appearance+0.090
 outfits+0.086
inate+0.085
ies+0.084
maid+0.083
Neg logit contributions
 rescued-0.044
 saved-0.039
eh-0.037
on-0.037
 helped-0.036
Token,
Token ID11
Feature activation-1.95e-02

Pos logit contributions
 outfits+0.038
 appearance+0.038
 shirts+0.035
 languages+0.035
 stations+0.035
Neg logit contributions
 rescued-0.028
on-0.025
 saved-0.025
eh-0.024
 eaten-0.024
Token some
Token ID617
Feature activation-2.39e-02

Pos logit contributions
 appearance+0.031
 outfits+0.030
maid+0.029
utes+0.029
uing+0.028
Neg logit contributions
 rescued-0.031
 saved-0.029
 che-0.029
on-0.028
 helped-0.028
Token looked
Token ID3114
Feature activation+1.20e-02

Pos logit contributions
inate+0.041
utes+0.040
 appearance+0.039
maid+0.038
 outfits+0.037
Neg logit contributions
 rescued-0.040
 saved-0.039
 helped-0.037
vous-0.036
 measured-0.036
Token like
Token ID588
Feature activation-3.78e-02

Pos logit contributions
 rescued+0.013
 saved+0.012
 full+0.011
 eaten+0.011
sm+0.011
Neg logit contributions
 outfits-0.027
 appearance-0.026
 shirts-0.026
 stations-0.026
utes-0.026
Token animals
Token ID4695
Feature activation-5.71e-02

Pos logit contributions
 appearance+0.087
 outfits+0.082
inate+0.080
ies+0.080
maid+0.079
Neg logit contributions
 rescued-0.043
 saved-0.038
eh-0.037
on-0.036
 eaten-0.035
Token.
Token ID13
Feature activation-2.23e-02

Pos logit contributions
 outfits+0.124
 appearance+0.120
 shirts+0.117
utes+0.115
maid+0.114
Neg logit contributions
 rescued-0.075
 saved-0.064
on-0.063
 eaten-0.062
 helped-0.060
Token Lily
Token ID20037
Feature activation-1.55e-02

Pos logit contributions
 outfits+0.048
ies+0.047
 appearance+0.047
 heads+0.044
 shirts+0.044
Neg logit contributions
 rescued-0.029
 deliber-0.027
itt-0.026
vous-0.026
eh-0.025
Token and
Token ID290
Feature activation-2.03e-02

Pos logit contributions
 appearance+0.024
 outfits+0.023
 stations+0.022
 languages+0.022
 countries+0.021
Neg logit contributions
 rescued-0.031
eh-0.028
vous-0.028
 saved-0.027
 eaten-0.027
Token Sam
Token ID3409
Feature activation-4.16e-02

Pos logit contributions
 outfits+0.044
 appearance+0.043
 shirts+0.041
 stations+0.041
 houses+0.041
Neg logit contributions
 rescued-0.026
eh-0.025
vous-0.024
 deliber-0.024
 eaten-0.021
Token cl
Token ID537
Feature activation+3.22e-02

Pos logit contributions
 appearance+0.089
 outfits+0.088
 shirts+0.085
utes+0.085
 hoops+0.084
Neg logit contributions
 rescued-0.056
vous-0.047
 saved-0.047
 eaten-0.044
 helped-0.044
Token man
Token ID582
Feature activation+1.96e-02

Pos logit contributions
 rescued+0.029
eh+0.029
vous+0.028
 deliber+0.027
 eaten+0.025
Neg logit contributions
 outfits-0.030
 appearance-0.029
 shirts-0.029
 stations-0.028
 hoops-0.028
Token.
Token ID13
Feature activation+4.56e-03

Pos logit contributions
 rescued+0.033
eh+0.030
vous+0.030
 deliber+0.030
on+0.029
Neg logit contributions
 outfits-0.035
 appearance-0.035
 shirts-0.034
 stations-0.034
 hoops-0.033
Token He
Token ID679
Feature activation-3.88e-03

Pos logit contributions
 rescued+0.005
eh+0.005
vous+0.005
 deliber+0.005
year+0.004
Neg logit contributions
 outfits-0.010
 appearance-0.010
 shirts-0.010
ies-0.010
 stations-0.010
Token saw
Token ID2497
Feature activation+9.40e-03

Pos logit contributions
 outfits+0.008
 appearance+0.008
 stations+0.008
 shirts+0.008
 languages+0.008
Neg logit contributions
 rescued-0.005
eh-0.005
vous-0.005
 deliber-0.005
 saved-0.004
Token that
Token ID326
Feature activation-1.17e-02

Pos logit contributions
 rescued+0.011
eh+0.009
vous+0.009
on+0.009
 deliber+0.009
Neg logit contributions
 outfits-0.022
 appearance-0.022
 shirts-0.021
 stations-0.021
ies-0.021
Token Jack
Token ID3619
Feature activation-4.67e-02

Pos logit contributions
 outfits+0.028
 appearance+0.027
 shirts+0.026
 stations+0.026
ies+0.026
Neg logit contributions
 rescued-0.013
 deliber-0.012
eh-0.011
vous-0.011
on-0.011
Token was
Token ID373
Feature activation+1.15e-02

Pos logit contributions
 outfits+0.093
 appearance+0.091
 stations+0.090
 hoops+0.089
 languages+0.087
Neg logit contributions
 rescued-0.071
vous-0.064
eh-0.064
 deliber-0.061
 saved-0.061
Token really
Token ID1107
Feature activation-1.63e-02

Pos logit contributions
 rescued+0.015
eh+0.015
vous+0.014
 deliber+0.014
 eaten+0.013
Neg logit contributions
 outfits-0.025
 appearance-0.024
 shirts-0.024
 stations-0.024
 languages-0.023
Token thirsty
Token ID47124
Feature activation+3.52e-02

Pos logit contributions
 outfits+0.034
 appearance+0.033
 shirts+0.032
 stations+0.032
 houses+0.032
Neg logit contributions
 rescued-0.023
eh-0.022
vous-0.021
 deliber-0.021
 eaten-0.019
Token.
Token ID13
Feature activation+3.60e-03

Pos logit contributions
 rescued+0.052
vous+0.049
eh+0.049
 deliber+0.048
 eaten+0.046
Neg logit contributions
 outfits-0.069
 appearance-0.067
utes-0.065
 languages-0.065
 stations-0.064
Token "
Token ID366
Feature activation-2.27e-02

Pos logit contributions
 rescued+0.004
eh+0.004
 deliber+0.004
vous+0.004
year+0.003
Neg logit contributions
 outfits-0.008
 appearance-0.008
 shirts-0.008
ies-0.008
 stations-0.008
Token it
Token ID340
Feature activation-7.93e-03

Pos logit contributions
 rescued+0.029
eh+0.028
vous+0.027
 deliber+0.026
on+0.024
Neg logit contributions
 outfits-0.035
 appearance-0.034
 shirts-0.033
 stations-0.033
 houses-0.033
Token.
Token ID13
Feature activation+1.65e-03

Pos logit contributions
 outfits+0.014
 appearance+0.014
 shirts+0.014
 stations+0.013
 languages+0.013
Neg logit contributions
 rescued-0.013
eh-0.012
vous-0.012
 deliber-0.012
 eaten-0.011
Token\n
Token ID198
Feature activation+8.87e-03

Pos logit contributions
eh+0.003
vous+0.003
 rescued+0.003
 deliber+0.003
year+0.003
Neg logit contributions
 houses-0.003
 outfits-0.003
 appearance-0.003
 shirts-0.003
 faces-0.003
Token\n
Token ID198
Feature activation+8.82e-03

Pos logit contributions
 rescued+0.015
eh+0.015
vous+0.015
 deliber+0.014
jected+0.013
Neg logit contributions
 outfits-0.014
 houses-0.014
 appearance-0.014
 shirts-0.014
ies-0.013
TokenL
Token ID43
Feature activation+1.80e-02

Pos logit contributions
eh+0.014
 rescued+0.014
vous+0.014
 deliber+0.013
jected+0.012
Neg logit contributions
 appearance-0.017
 outfits-0.016
 shirts-0.015
maid-0.015
 languages-0.015
Tokenily
Token ID813
Feature activation-4.67e-02

Pos logit contributions
 rescued+0.024
 deliber+0.022
eh+0.022
vous+0.021
 eaten+0.019
Neg logit contributions
 outfits-0.038
 appearance-0.037
 shirts-0.036
ies-0.036
 languages-0.036
Token took
Token ID1718
Feature activation-7.50e-03

Pos logit contributions
 outfits+0.083
 appearance+0.081
 shirts+0.080
 houses+0.079
ies+0.078
Neg logit contributions
 rescued-0.073
eh-0.073
vous-0.073
 deliber-0.071
year-0.066
Token off
Token ID572
Feature activation-5.02e-02

Pos logit contributions
 houses+0.011
 them+0.010
 faces+0.010
 outfits+0.010
 hoops+0.010
Neg logit contributions
vous-0.015
eh-0.015
 deliber-0.014
jected-0.014
ager-0.014
Token one
Token ID530
Feature activation-2.67e-02

Pos logit contributions
 houses+0.058
 hoops+0.052
 outfits+0.052
 them+0.051
 shirts+0.051
Neg logit contributions
vous-0.120
jected-0.113
dd-0.110
eh-0.107
sha-0.105
Token of
Token ID286
Feature activation-2.95e-02

Pos logit contributions
 appearance+0.046
 outfits+0.046
 shirts+0.044
 stations+0.044
 languages+0.043
Neg logit contributions
eh-0.045
 rescued-0.045
vous-0.043
 deliber-0.041
on-0.040
Token her
Token ID607
Feature activation-3.91e-05

Pos logit contributions
 outfits+0.050
 appearance+0.049
 houses+0.049
 hoops+0.049
 shirts+0.049
Neg logit contributions
vous-0.049
 rescued-0.048
eh-0.048
 deliber-0.047
dd-0.044
Token was
Token ID373
Feature activation-3.16e-04

Pos logit contributions
 outfits+0.020
 appearance+0.020
 shirts+0.020
 languages+0.019
 stations+0.019
Neg logit contributions
 rescued-0.019
eh-0.018
vous-0.018
 deliber-0.018
 eaten-0.017
Token over
Token ID625
Feature activation+1.58e-02

Pos logit contributions
 outfits+0.001
 appearance+0.001
 shirts+0.001
 houses+0.001
 stations+0.001
Neg logit contributions
 rescued-0.000
eh-0.000
vous-0.000
 deliber-0.000
year-0.000
Token,
Token ID11
Feature activation-9.04e-03

Pos logit contributions
 rescued+0.024
eh+0.023
 deliber+0.023
vous+0.023
 eaten+0.021
Neg logit contributions
 outfits-0.030
 appearance-0.030
 shirts-0.029
 stations-0.028
 languages-0.028
Token there
Token ID612
Feature activation-2.29e-02

Pos logit contributions
 appearance+0.020
 outfits+0.020
 shirts+0.019
 languages+0.018
 stations+0.018
Neg logit contributions
 rescued-0.011
 saved-0.010
 deliber-0.010
on-0.009
 eaten-0.009
Token were
Token ID547
Feature activation+5.18e-03

Pos logit contributions
 appearance+0.050
 outfits+0.049
 shirts+0.047
 hoops+0.046
utes+0.046
Neg logit contributions
 rescued-0.029
 saved-0.027
on-0.026
 helped-0.025
 deliber-0.024
Token two
Token ID734
Feature activation-4.85e-02

Pos logit contributions
 rescued+0.006
 saved+0.005
 eaten+0.005
vous+0.005
 measured+0.005
Neg logit contributions
 outfits-0.012
 appearance-0.012
utes-0.012
maid-0.011
 sax-0.011
Token winners
Token ID14591
Feature activation+3.57e-03

Pos logit contributions
inate+0.078
utes+0.077
 appearance+0.077
 outfits+0.074
 sax+0.071
Neg logit contributions
 rescued-0.086
 saved-0.079
vous-0.077
 deliber-0.077
 satisfied-0.074
Token -
Token ID532
Feature activation-1.86e-02

Pos logit contributions
 rescued+0.004
 saved+0.004
on+0.004
sm+0.004
vous+0.003
Neg logit contributions
 appearance-0.008
inate-0.008
 outfits-0.008
 shirts-0.007
 languages-0.007
Token one
Token ID530
Feature activation-2.63e-02

Pos logit contributions
 appearance+0.044
 outfits+0.043
ies+0.042
maid+0.041
 heads+0.040
Neg logit contributions
 rescued-0.022
 saved-0.019
 deliber-0.017
on-0.017
vous-0.016
Token with
Token ID351
Feature activation+3.62e-02

Pos logit contributions
 appearance+0.054
 outfits+0.053
 shirts+0.052
inate+0.051
 stations+0.050
Neg logit contributions
 rescued-0.037
vous-0.035
 saved-0.033
on-0.033
 helped-0.031
Token a
Token ID257
Feature activation-1.14e-02

Pos logit contributions
 rescued+0.047
vous+0.041
 saved+0.040
 eaten+0.038
on+0.038
Neg logit contributions
 appearance-0.077
 outfits-0.077
ies-0.075
inate-0.074
maid-0.073
Token decided
Token ID3066
Feature activation-4.33e-02

Pos logit contributions
 appearance+0.025
 outfits+0.024
 stations+0.024
utes+0.024
 languages+0.023
Neg logit contributions
 rescued-0.013
 saved-0.011
vous-0.011
eh-0.011
 helped-0.011
Token to
Token ID284
Feature activation+1.41e-02

Pos logit contributions
 outfits+0.106
 stations+0.101
 appearance+0.100
 languages+0.100
 shirts+0.100
Neg logit contributions
 rescued-0.046
 deliber-0.041
 saved-0.038
 eaten-0.038
eh-0.037
Token have
Token ID423
Feature activation+1.33e-02

Pos logit contributions
 rescued+0.021
 oblig+0.020
 helped+0.019
vous+0.019
 saved+0.019
Neg logit contributions
lights-0.026
 appearance-0.025
 houses-0.025
ies-0.025
 stations-0.024
Token a
Token ID257
Feature activation-6.33e-03

Pos logit contributions
 rescued+0.016
eh+0.016
vous+0.015
 deliber+0.014
year+0.013
Neg logit contributions
 outfits-0.030
 appearance-0.029
 shirts-0.029
 houses-0.028
 stations-0.028
Token dance
Token ID9280
Feature activation-6.46e-02

Pos logit contributions
 appearance+0.011
 outfits+0.011
 stations+0.011
 languages+0.010
 shirts+0.010
Neg logit contributions
 rescued-0.011
eh-0.011
vous-0.010
 deliber-0.010
on-0.009
Token party
Token ID2151
Feature activation-4.96e-02

Pos logit contributions
 appearance+0.127
 houses+0.118
 outfits+0.118
 countries+0.117
 faces+0.116
Neg logit contributions
eh-0.097
 rescued-0.092
vous-0.091
 deliber-0.084
on-0.082
Token.
Token ID13
Feature activation+1.46e-02

Pos logit contributions
 appearance+0.099
 outfits+0.094
 shirts+0.090
 languages+0.090
 hoops+0.089
Neg logit contributions
 rescued-0.074
eh-0.071
vous-0.069
 deliber-0.068
on-0.064
Token He
Token ID679
Feature activation-3.48e-03

Pos logit contributions
 rescued+0.018
vous+0.017
 deliber+0.017
eh+0.016
year+0.015
Neg logit contributions
 outfits-0.033
 appearance-0.032
ies-0.032
 shirts-0.031
 stations-0.031
Token invited
Token ID9392
Feature activation-6.57e-02

Pos logit contributions
 appearance+0.008
 languages+0.007
 faces+0.007
 outfits+0.007
 stations+0.007
Neg logit contributions
 rescued-0.004
eh-0.004
vous-0.004
 deliber-0.004
 helped-0.004
Token all
Token ID477
Feature activation-2.12e-02

Pos logit contributions
 appearance+0.160
 outfits+0.155
utes+0.150
maid+0.150
ies+0.150
Neg logit contributions
 rescued-0.070
 deliber-0.060
 saved-0.057
on-0.057
vous-0.056
Token of
Token ID286
Feature activation-2.71e-02

Pos logit contributions
 appearance+0.047
 outfits+0.045
 shirts+0.043
 stations+0.042
 faces+0.042
Neg logit contributions
 rescued-0.027
eh-0.027
vous-0.026
 deliber-0.025
on-0.024