TOP ACTIVATIONS
MAX = 0.065

" Do you like my mustache ?" he asks .
starts to limp . He wh ines and l icks his
and runs away . He wh ines and hides under a
and runs away . He wh ines and looks back .
in the fence . He wh ines and w ags his
Why are you picking my onions ? â
lies on his bed and wh ines . Anna is worried
" What are peanuts ?" Lily asked .
Lily and Tom . He wh ines and looks at them
icks their faces . He wh ines and w ags his
are you doing with the grapes ? Those are for eating
are you doing with my beans ?" It was Mrs .
Daddy , why are the whales marching ?", exclaimed Ben .
asked , " What do chickens look like ?"
button . They hear a wh ir ring sound and see
like " castle ", " pr ince ", " dragon ",
cardboard ." Sur pr ised , Ralph looked at
they do it . He imag ines he is one of
together . They let their imag inations take over and soon
" Because lady bugs are kind . They help

MIN ACTIVATIONS
MIN = -0.064

the mailbox . On the way , she spotted something disgusting
. We will find a way to fix your car ."
come , and the lovely leather wrapped doll stayed with her
the pigeon would find a way to fix its wing and
it was decorated just the way she liked it . <|endoftext|>
a lamp to see the way . The lamp was old
her grandma . On the way , she found a shiny
. On the way to the park , they
y showed the bug the way home and the bug was
hopes they will stay that way . <|endoftext|> It was a
They try to find the way back to their home .
her pocket . On the way to her grandma 's house
it would never find its way back home . But then
find their ball or the way back . "
some ice cream on the way back . L
. The map showed the way to a big treasure chest
He couldn 't find the way out . He got frustrated
he slowly crawled back the way he came .
and could not find its way back home . The pretty
He liked his plane the way it was , but he

INTERVAL 0.033 - 0.065
CONTAINS 0.177%

. She lived in a beautiful green me adow .
" Do you hear the bird singing ?" Lily stopped what
up , she saw an owl flying in the air .
spaghetti like snakes again . <|endoftext|> L ily was eager to
mind the gloomy day . <|endoftext|> S ara and Tom were

INTERVAL 0.001 - 0.033
CONTAINS 47.352%

Their mom said yes , but reminded them to clean up
saw an old man standing nearby . The old man said
y bird left . But , Lucy and Tom were still
, the farmer started scattering some wheat in the field for
put it in her pocket . As she was walking home

INTERVAL -0.032 - 0.001
CONTAINS 52.124%

to greet her . She was really nice and said ,
Once upon a time there was a little girl named Emily
She went home and had a big , happy smile on
big gr umpy bear . He didn 't like to do
and bye - bye and make funny noises .

INTERVAL -0.064 - -0.032
CONTAINS 0.347%

and laughed until the sun began to set . Lucy said
Mom my said sorry and promised to pack her favorite sandwich
store their honey and take care of their babies ," mom
it that this was the way things had to be and
put the blanket on the side of her bed . It
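
The four INTERVAL blocks above appear consistent with equal-width bins spanning MIN (-0.064) to MAX (0.065), with each CONTAINS figure giving the share of token activations that fall in that bin. The following is a minimal sketch of that binning, assuming equal-width bins; the function name `interval_shares` and the sample values are hypothetical and are not taken from the tool that produced this dump.

```python
from typing import List, Sequence, Tuple


def interval_shares(
    activations: Sequence[float], n_bins: int = 4
) -> List[Tuple[float, float, float]]:
    """Split [min, max] into equal-width bins and return
    (lower edge, upper edge, percent of activations) per bin, lowest bin first."""
    if not activations:
        raise ValueError("activations must be non-empty")
    lo, hi = min(activations), max(activations)
    if hi == lo:
        # Degenerate case: every activation is identical, so one bin holds all.
        return [(lo, hi, 100.0)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in activations:
        # Clamp so the maximum value lands in the last bin, not one past it.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(activations)
    return [
        (lo + i * width, lo + (i + 1) * width, 100.0 * counts[i] / total)
        for i in range(n_bins)
    ]


# Hypothetical sample activations, chosen to span the MIN/MAX shown above.
sample = [-0.064, -0.01, 0.0, 0.02, 0.065]
for lower, upper, pct in interval_shares(sample):
    print(f"INTERVAL {lower:.3f} - {upper:.3f}  CONTAINS {pct:.3f}%")
```

The sketch returns bins from lowest to highest, whereas the dump lists them from highest to lowest; reversing the returned list would match that order.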
Token"
Token ID1
Feature activation+9.18e-03

Pos logit contributions
 aware+0.007
 believing+0.006
 bra+0.006
 thanking+0.006
 disl+0.006
Neg logit contributions
room-0.007
roller-0.006
ork-0.006
ocated-0.006
Mine-0.006
TokenDo
Token ID5211
Feature activation+2.27e-04

Pos logit contributions
 aware+0.016
 questions+0.014
 cautious+0.014
 believing+0.014
 mindful+0.013
Neg logit contributions
Mine-0.013
robe-0.012
room-0.012
tered-0.011
ase-0.010
Token you
Token ID345
Feature activation+1.54e-02

Pos logit contributions
 aware+0.000
 bra+0.000
 cautious+0.000
 consider+0.000
 disl+0.000
Neg logit contributions
robe-0.000
roller-0.000
room-0.000
ork-0.000
 ceiling-0.000
Token like
Token ID588
Feature activation+5.03e-03

Pos logit contributions
 aware+0.023
Just+0.022
 bra+0.021
Next+0.021
emy+0.021
Neg logit contributions
room-0.020
robe-0.019
tered-0.018
Mine-0.018
 belong-0.017
Token my
Token ID616
Feature activation+9.79e-03

Pos logit contributions
 aware+0.008
 cautious+0.008
 bra+0.007
 mindful+0.007
 consider+0.007
Neg logit contributions
robe-0.007
 ceiling-0.006
roller-0.006
room-0.006
ork-0.006
Token mustache
Token ID49303
Feature activation+6.55e-02

Pos logit contributions
 aware+0.014
Just+0.014
 bra+0.013
 consider+0.013
 cautious+0.013
Neg logit contributions
robe-0.015
 ceiling-0.013
 frame-0.013
 plate-0.013
 table-0.013
Token?"
Token ID1701
Feature activation+7.70e-03

Pos logit contributions
 aware+0.057
 bra+0.053
 cautious+0.052
 disl+0.050
Just+0.048
Neg logit contributions
robe-0.139
room-0.130
roller-0.126
 ceiling-0.124
ase-0.123
Token he
Token ID339
Feature activation+2.27e-02

Pos logit contributions
 aware+0.014
ologist+0.014
cod+0.013
 mindful+0.013
 believing+0.013
Neg logit contributions
 ceiling-0.008
 shelf-0.008
 glass-0.008
room-0.008
roller-0.008
Token asks
Token ID7893
Feature activation-7.20e-03

Pos logit contributions
 aware+0.030
fing+0.028
 disl+0.028
ologist+0.028
smart+0.027
Neg logit contributions
Mine-0.033
 button-0.033
 ends-0.033
 shelf-0.032
 ceiling-0.031
Token.
Token ID13
Feature activation-1.79e-02

Pos logit contributions
robe+0.015
roller+0.015
 ceiling+0.014
room+0.014
ase+0.014
Neg logit contributions
 bra-0.006
 aware-0.006
 questions-0.006
 cautious-0.006
 disl-0.005
Token
Token ID198
Feature activation-7.50e-03

Pos logit contributions
robe+0.030
roller+0.028
room+0.027
ork+0.027
 ceiling+0.026
Neg logit contributions
 aware-0.024
 disl-0.023
 bra-0.022
 consider-0.022
 cautious-0.022
Token starts
Token ID4940
Feature activation-8.46e-03

Pos logit contributions
 bra+0.022
 aware+0.021
 cautious+0.020
 questions+0.020
 consider+0.020
Neg logit contributions
robe-0.023
roller-0.022
room-0.020
 ceiling-0.020
ork-0.020
Token to
Token ID284
Feature activation+6.14e-03

Pos logit contributions
robe+0.012
roller+0.012
room+0.011
 ceiling+0.011
ork+0.011
Neg logit contributions
 bra-0.012
 aware-0.012
 disl-0.012
 consider-0.012
 cautious-0.012
Token limp
Token ID42102
Feature activation-2.91e-03

Pos logit contributions
 bra+0.008
 aware+0.008
 cautious+0.007
 disl+0.007
 consider+0.007
Neg logit contributions
robe-0.010
roller-0.010
room-0.009
 ceiling-0.009
ase-0.009
Token.
Token ID13
Feature activation+2.75e-05

Pos logit contributions
robe+0.006
roller+0.006
room+0.006
ase+0.006
 ceiling+0.006
Neg logit contributions
 bra-0.002
 aware-0.002
 cautious-0.002
 questions-0.002
 consider-0.002
Token He
Token ID679
Feature activation+1.59e-02

Pos logit contributions
 bra+0.000
 aware+0.000
 questions+0.000
 cautious+0.000
 til+0.000
Neg logit contributions
robe-0.000
room-0.000
ase-0.000
ocated-0.000
attered-0.000
Token wh
Token ID348
Feature activation+6.44e-02

Pos logit contributions
 aware+0.024
 disl+0.023
 bra+0.023
 consider+0.022
 impressed+0.022
Neg logit contributions
robe-0.022
roller-0.021
 ceiling-0.020
ork-0.020
tered-0.019
Tokenines
Token ID1127
Feature activation+2.22e-02

Pos logit contributions
 bra+0.091
 disl+0.079
 aware+0.078
 cautious+0.076
 consider+0.076
Neg logit contributions
robe-0.107
room-0.105
ase-0.103
roller-0.100
ings-0.096
Token and
Token ID290
Feature activation+1.89e-02

Pos logit contributions
 bra+0.016
 aware+0.015
 questions+0.013
 cautious+0.013
 proud+0.012
Neg logit contributions
robe-0.050
roller-0.049
room-0.046
attered-0.046
 ceiling-0.046
Token l
Token ID300
Feature activation+3.13e-02

Pos logit contributions
ologist+0.026
 aware+0.026
 consider+0.026
And+0.026
 impressed+0.025
Neg logit contributions
robe-0.026
roller-0.025
 ceiling-0.024
 glass-0.024
Mine-0.024
Tokenicks
Token ID3378
Feature activation+2.21e-02

Pos logit contributions
 bra+0.040
 aware+0.039
 cautious+0.036
 questions+0.036
 consider+0.035
Neg logit contributions
robe-0.054
roller-0.052
room-0.049
 ceiling-0.049
ase-0.048
Token his
Token ID465
Feature activation+9.74e-03

Pos logit contributions
 bra+0.024
 questions+0.023
But+0.021
 aware+0.021
smart+0.020
Neg logit contributions
robe-0.043
roller-0.041
ocated-0.040
ase-0.039
 ceiling-0.039
Token and
Token ID290
Feature activation+1.77e-02

Pos logit contributions
 bra+0.011
 aware+0.010
 cautious+0.009
 proud+0.009
But+0.009
Neg logit contributions
robe-0.036
roller-0.034
room-0.033
ase-0.033
ork-0.032
Token runs
Token ID4539
Feature activation-2.13e-02

Pos logit contributions
 aware+0.026
 consider+0.026
ologist+0.025
 cautious+0.025
 knowledgeable+0.025
Neg logit contributions
robe-0.023
roller-0.022
 ceiling-0.022
Mine-0.022
 glass-0.022
Token away
Token ID1497
Feature activation+5.19e-05

Pos logit contributions
robe+0.033
roller+0.031
room+0.030
 ceiling+0.030
ase+0.029
Neg logit contributions
 bra-0.031
 aware-0.030
 cautious-0.028
 questions-0.028
 disl-0.027
Token.
Token ID13
Feature activation-7.47e-03

Pos logit contributions
ologist+0.000
 disl+0.000
 consider+0.000
 aware+0.000
 bra+0.000
Neg logit contributions
robe-0.000
roller-0.000
ings-0.000
ork-0.000
tered-0.000
Token He
Token ID679
Feature activation+2.42e-02

Pos logit contributions
robe+0.011
roller+0.010
room+0.010
 ceiling+0.010
ork+0.010
Neg logit contributions
 aware-0.011
 bra-0.011
 consider-0.011
ologist-0.010
 cautious-0.010
Token wh
Token ID348
Feature activation+6.32e-02

Pos logit contributions
 aware+0.040
 consider+0.038
ologist+0.037
 disl+0.036
 mindful+0.036
Neg logit contributions
robe-0.029
 spilled-0.029
roller-0.029
 glass-0.028
 ceiling-0.027
Tokenines
Token ID1127
Feature activation+1.95e-02

Pos logit contributions
 bra+0.081
 disl+0.072
 aware+0.069
 consider+0.069
 cautious+0.067
Neg logit contributions
room-0.111
robe-0.110
ase-0.106
roller-0.106
ings-0.102
Token and
Token ID290
Feature activation+2.57e-02

Pos logit contributions
 aware+0.017
 bra+0.017
 disl+0.016
 cautious+0.016
 consider+0.015
Neg logit contributions
robe-0.040
roller-0.038
room-0.038
 ceiling-0.037
ase-0.036
Token hides
Token ID30768
Feature activation+4.31e-03

Pos logit contributions
ologist+0.039
 aware+0.038
 consider+0.037
 knowledgeable+0.036
inations+0.036
Neg logit contributions
robe-0.034
 glass-0.032
roller-0.032
 ceiling-0.032
Mine-0.031
Token under
Token ID739
Feature activation-1.40e-02

Pos logit contributions
 aware+0.006
 bra+0.006
 disl+0.005
 consider+0.005
 cautious+0.005
Neg logit contributions
robe-0.007
roller-0.007
room-0.006
 ceiling-0.006
tered-0.006
Token a
Token ID257
Feature activation+7.80e-04

Pos logit contributions
robe+0.029
roller+0.027
attered+0.026
ase+0.026
ocated+0.026
Neg logit contributions
 bra-0.014
But-0.012
 aware-0.012
 questions-0.012
 disl-0.012
Token and
Token ID290
Feature activation+1.91e-02

Pos logit contributions
robe+0.005
roller+0.005
room+0.005
ase+0.004
 ceiling+0.004
Neg logit contributions
 aware-0.003
 disl-0.003
 consider-0.003
 bra-0.003
 trusts-0.003
Token runs
Token ID4539
Feature activation-2.64e-02

Pos logit contributions
 aware+0.027
 consider+0.027
Just+0.026
ologist+0.026
 cautious+0.026
Neg logit contributions
robe-0.026
roller-0.025
 glass-0.025
 ceiling-0.024
Mine-0.023
Token away
Token ID1497
Feature activation+2.47e-04

Pos logit contributions
robe+0.039
roller+0.036
room+0.035
 ceiling+0.034
ork+0.033
Neg logit contributions
 aware-0.039
 bra-0.039
 disl-0.037
 cautious-0.037
 consider-0.037
Token.
Token ID13
Feature activation-4.82e-03

Pos logit contributions
ologist+0.000
 aware+0.000
 consider+0.000
 disl+0.000
 trusts+0.000
Neg logit contributions
robe-0.000
roller-0.000
ings-0.000
ork-0.000
 ceiling-0.000
Token He
Token ID679
Feature activation+2.04e-02

Pos logit contributions
robe+0.008
roller+0.008
room+0.008
 ceiling+0.008
ase+0.008
Neg logit contributions
 bra-0.006
 aware-0.006
 disl-0.005
 cautious-0.005
 consider-0.005
Token wh
Token ID348
Feature activation+6.28e-02

Pos logit contributions
 aware+0.034
 consider+0.031
ologist+0.031
 mindful+0.030
 disl+0.030
Neg logit contributions
robe-0.026
roller-0.025
 spilled-0.024
ork-0.024
Mine-0.023
Tokenines
Token ID1127
Feature activation+1.89e-02

Pos logit contributions
 bra+0.079
 disl+0.069
 aware+0.068
 consider+0.067
 cautious+0.065
Neg logit contributions
robe-0.112
room-0.111
ase-0.107
roller-0.106
ings-0.102
Token and
Token ID290
Feature activation+2.20e-02

Pos logit contributions
 bra+0.016
 aware+0.016
 disl+0.015
 cautious+0.015
 consider+0.014
Neg logit contributions
robe-0.040
roller-0.038
room-0.037
 ceiling-0.036
ase-0.036
Token looks
Token ID3073
Feature activation+1.16e-02

Pos logit contributions
ologist+0.036
 aware+0.034
inations+0.034
ules+0.033
ascal+0.033
Neg logit contributions
robe-0.026
 glass-0.026
roller-0.025
Mine-0.025
 spilled-0.025
Token back
Token ID736
Feature activation-8.30e-03

Pos logit contributions
ologist+0.021
 disl+0.020
Just+0.020
ascal+0.020
fing+0.020
Neg logit contributions
 out-0.011
robe-0.010
 spilled-0.010
tered-0.010
 everywhere-0.009
Token.
Token ID13
Feature activation-4.84e-03

Pos logit contributions
ings+0.012
roller+0.012
robe+0.011
room+0.011
ork+0.011
Neg logit contributions
 disl-0.013
 consider-0.012
 bra-0.012
 aware-0.012
ologist-0.011
Token in
Token ID287
Feature activation-2.46e-03

Pos logit contributions
robe+0.019
roller+0.019
ase+0.018
ork+0.018
 ceiling+0.018
Neg logit contributions
 bra-0.007
 aware-0.006
 cautious-0.006
But-0.006
 brave-0.005
Token the
Token ID262
Feature activation-2.84e-03

Pos logit contributions
robe+0.005
attered+0.005
ase+0.005
roller+0.005
tered+0.004
Neg logit contributions
 bra-0.003
But-0.002
 questions-0.002
smart-0.002
 brave-0.002
Token fence
Token ID13990
Feature activation+3.89e-03

Pos logit contributions
robe+0.004
roller+0.003
room+0.003
ase+0.003
tered+0.003
Neg logit contributions
 bra-0.005
 aware-0.004
 cautious-0.004
But-0.004
 questions-0.004
Token.
Token ID13
Feature activation-5.37e-03

Pos logit contributions
 bra+0.003
 aware+0.002
 cautious+0.002
 questions+0.002
But+0.002
Neg logit contributions
robe-0.009
roller-0.009
 ceiling-0.008
ork-0.008
room-0.008
Token He
Token ID679
Feature activation+1.98e-02

Pos logit contributions
robe+0.009
roller+0.009
room+0.009
 ceiling+0.008
ork+0.008
Neg logit contributions
 aware-0.007
 bra-0.007
 disl-0.006
 consider-0.006
 cautious-0.006
Token wh
Token ID348
Feature activation+6.24e-02

Pos logit contributions
 aware+0.029
 bra+0.028
 disl+0.027
 consider+0.027
 cautious+0.027
Neg logit contributions
robe-0.029
roller-0.028
 ceiling-0.026
ork-0.026
room-0.026
Tokenines
Token ID1127
Feature activation+2.28e-02

Pos logit contributions
 bra+0.081
 disl+0.070
 aware+0.069
 consider+0.068
 knowledgeable+0.066
Neg logit contributions
room-0.111
robe-0.110
ase-0.106
roller-0.105
ings-0.103
Token and
Token ID290
Feature activation+2.16e-02

Pos logit contributions
 bra+0.017
 aware+0.016
 cautious+0.014
 questions+0.014
 proud+0.013
Neg logit contributions
robe-0.051
roller-0.049
room-0.047
 ceiling-0.046
ase-0.046
Token w
Token ID266
Feature activation+4.72e-02

Pos logit contributions
 aware+0.032
 consider+0.031
ologist+0.030
 knowledgeable+0.030
 disl+0.030
Neg logit contributions
robe-0.030
roller-0.029
 ceiling-0.028
Mine-0.027
ocated-0.026
Tokenags
Token ID3775
Feature activation+1.07e-02

Pos logit contributions
 bra+0.056
 aware+0.056
 disl+0.053
 cautious+0.052
 consider+0.052
Neg logit contributions
robe-0.083
roller-0.079
room-0.078
ork-0.075
 ceiling-0.075
Token his
Token ID465
Feature activation-1.22e-02

Pos logit contributions
 bra+0.014
 aware+0.014
 cautious+0.013
 disl+0.013
 consider+0.013
Neg logit contributions
robe-0.018
roller-0.017
room-0.016
 ceiling-0.016
ase-0.016
TokenWhy
Token ID5195
Feature activation+3.17e-03

Pos logit contributions
robe+0.003
roller+0.003
Mine+0.003
room+0.003
tered+0.003
Neg logit contributions
 aware-0.003
 bra-0.003
ologist-0.003
cod-0.003
 questions-0.003
Token are
Token ID389
Feature activation+8.98e-03

Pos logit contributions
 bra+0.005
 aware+0.005
 disl+0.005
 consider+0.005
 questions+0.005
Neg logit contributions
robe-0.004
roller-0.004
 ceiling-0.004
ase-0.004
attered-0.004
Token you
Token ID345
Feature activation+1.77e-02

Pos logit contributions
 aware+0.011
 bra+0.011
 cautious+0.010
 disl+0.010
 consider+0.010
Neg logit contributions
robe-0.015
roller-0.014
 ceiling-0.014
room-0.014
tered-0.014
Token picking
Token ID10868
Feature activation+1.87e-02

Pos logit contributions
 bra+0.023
 aware+0.021
 disl+0.020
 cautious+0.020
 trusts+0.020
Neg logit contributions
robe-0.030
roller-0.028
 ceiling-0.027
room-0.027
tered-0.026
Token my
Token ID616
Feature activation+2.35e-02

Pos logit contributions
 aware+0.033
 bra+0.033
 cautious+0.032
 consider+0.032
Just+0.031
Neg logit contributions
robe-0.021
room-0.020
ings-0.019
 ceiling-0.019
ase-0.018
Token onions
Token ID24681
Feature activation+6.11e-02

Pos logit contributions
 bra+0.034
 aware+0.032
 consider+0.031
 disl+0.030
 cautious+0.030
Neg logit contributions
robe-0.037
 ceiling-0.034
roller-0.033
ings-0.032
ork-0.032
Token?
Token ID30
Feature activation+1.05e-02

Pos logit contributions
 bra+0.045
 aware+0.043
 disl+0.039
 cautious+0.039
 questions+0.036
Neg logit contributions
robe-0.139
roller-0.129
room-0.128
 ceiling-0.127
ase-0.124
Tokenâ
Token ID22940
Feature activation+9.26e-03

Pos logit contributions
 aware+0.019
 bra+0.018
 cautious+0.017
 disl+0.017
 believing+0.017
Neg logit contributions
robe-0.013
roller-0.012
room-0.011
Mine-0.011
ase-0.011
Token
Token ID26391
Feature activation+1.43e-02

Pos logit contributions
 bra+0.015
 questions+0.015
 aware+0.015
 impressed+0.014
 ze+0.014
Neg logit contributions
robe-0.012
ase-0.011
room-0.011
roller-0.011
tered-0.011
Token
Token ID220
Feature activation+6.09e-03

Pos logit contributions
 bra+0.019
 questions+0.017
 aware+0.017
 consider+0.016
 cautious+0.016
Neg logit contributions
robe-0.023
roller-0.023
room-0.022
ork-0.022
ase-0.022
Token
Token ID198
Feature activation+9.51e-03

Pos logit contributions
 bra+0.007
 aware+0.006
 cautious+0.006
 disl+0.006
 consider+0.006
Neg logit contributions
robe-0.012
roller-0.011
room-0.011
 ceiling-0.011
ase-0.010
Token lies
Token ID7363
Feature activation+1.13e-02

Pos logit contributions
 aware+0.010
 bra+0.010
 disl+0.009
 cautious+0.009
 consider+0.009
Neg logit contributions
robe-0.013
roller-0.012
 ceiling-0.012
room-0.011
tered-0.011
Token on
Token ID319
Feature activation+6.40e-04

Pos logit contributions
 bra+0.013
 aware+0.013
 cautious+0.012
 questions+0.012
 proud+0.011
Neg logit contributions
robe-0.020
roller-0.020
room-0.019
 ceiling-0.019
ocated-0.019
Token his
Token ID465
Feature activation+1.09e-02

Pos logit contributions
 bra+0.001
 questions+0.001
 aware+0.001
But+0.001
 disl+0.001
Neg logit contributions
robe-0.001
roller-0.001
attered-0.001
room-0.001
ocated-0.001
Token bed
Token ID3996
Feature activation+7.37e-03

Pos logit contributions
 bra+0.018
 aware+0.017
 cautious+0.017
 questions+0.017
 proud+0.016
Neg logit contributions
robe-0.014
room-0.013
roller-0.013
ocated-0.013
ase-0.012
Token and
Token ID290
Feature activation+1.00e-02

Pos logit contributions
 bra+0.004
 aware+0.004
 questions+0.004
 proud+0.004
But+0.004
Neg logit contributions
Mine-0.016
roller-0.016
robe-0.016
ocated-0.016
attered-0.015
Token wh
Token ID348
Feature activation+6.10e-02

Pos logit contributions
 aware+0.014
 bra+0.013
 consider+0.013
 cautious+0.013
 disl+0.013
Neg logit contributions
robe-0.015
roller-0.014
 ceiling-0.014
room-0.013
tered-0.013
Tokenines
Token ID1127
Feature activation+2.06e-02

Pos logit contributions
 bra+0.086
 disl+0.073
 aware+0.072
 cautious+0.071
 consider+0.071
Neg logit contributions
robe-0.100
room-0.100
ase-0.098
roller-0.094
ings-0.093
Token.
Token ID13
Feature activation-1.29e-03

Pos logit contributions
 bra+0.018
 aware+0.017
 questions+0.015
 cautious+0.015
 disl+0.014
Neg logit contributions
robe-0.043
roller-0.042
room-0.040
 ceiling-0.039
ase-0.039
Token Anna
Token ID11735
Feature activation+1.45e-02

Pos logit contributions
robe+0.002
roller+0.002
room+0.002
 ceiling+0.001
ase+0.001
Neg logit contributions
 bra-0.002
 aware-0.002
 cautious-0.002
 disl-0.002
 questions-0.002
Token is
Token ID318
Feature activation+1.36e-02

Pos logit contributions
 aware+0.023
 disl+0.022
 bra+0.022
 consider+0.022
 cautious+0.021
Neg logit contributions
robe-0.019
roller-0.018
 ceiling-0.017
room-0.017
ork-0.016
Token worried
Token ID7960
Feature activation+1.59e-02

Pos logit contributions
 bra+0.015
 aware+0.015
 disl+0.014
 cautious+0.014
 consider+0.014
Neg logit contributions
robe-0.025
roller-0.023
room-0.023
 ceiling-0.023
tered-0.022
Token
Token ID198
Feature activation-1.05e-02

Pos logit contributions
robe+0.028
 ceiling+0.027
ocated+0.026
attered+0.026
ase+0.026
Neg logit contributions
But-0.007
 bra-0.006
 proud-0.006
 questions-0.005
Just-0.005
Token
Token ID198
Feature activation-7.19e-03

Pos logit contributions
robe+0.023
roller+0.021
 ceiling+0.021
room+0.021
ase+0.020
Neg logit contributions
 bra-0.008
 aware-0.008
 consider-0.008
 disl-0.007
 cautious-0.007
Token"
Token ID1
Feature activation+8.79e-03

Pos logit contributions
robe+0.016
roller+0.015
 ceiling+0.015
room+0.015
ase+0.015
Neg logit contributions
 bra-0.005
 aware-0.005
But-0.005
 consider-0.005
 cautious-0.005
TokenWhat
Token ID2061
Feature activation-3.94e-03

Pos logit contributions
 aware+0.012
 bra+0.011
 disl+0.011
 cautious+0.011
 questions+0.010
Neg logit contributions
robe-0.015
roller-0.014
room-0.014
Mine-0.013
 ceiling-0.013
Token are
Token ID389
Feature activation+1.26e-02

Pos logit contributions
robe+0.007
roller+0.006
attered+0.006
ocated+0.006
ase+0.006
Neg logit contributions
 questions-0.005
 consider-0.005
 bra-0.005
 aware-0.005
 trusts-0.005
Token peanuts
Token ID45612
Feature activation+5.97e-02

Pos logit contributions
 bra+0.017
 aware+0.017
 disl+0.016
 cautious+0.016
 consider+0.015
Neg logit contributions
robe-0.020
roller-0.019
room-0.018
 ceiling-0.018
tered-0.018
Token?"
Token ID1701
Feature activation-4.16e-03

Pos logit contributions
 bra+0.046
 aware+0.044
 disl+0.041
 cautious+0.040
 questions+0.037
Neg logit contributions
robe-0.132
roller-0.124
room-0.124
 ceiling-0.121
ase-0.119
Token Lily
Token ID20037
Feature activation+7.28e-03

Pos logit contributions
robe+0.006
roller+0.005
room+0.005
 ceiling+0.005
ase+0.005
Neg logit contributions
 bra-0.007
 aware-0.007
 disl-0.006
 cautious-0.006
 consider-0.006
Token asked
Token ID1965
Feature activation+1.65e-03

Pos logit contributions
 disl+0.008
 bra+0.008
 aware+0.008
smart+0.007
cod+0.007
Neg logit contributions
roller-0.012
robe-0.012
 ceiling-0.012
tered-0.012
room-0.012
Token.
Token ID13
Feature activation-1.48e-02

Pos logit contributions
 bra+0.001
 questions+0.001
 aware+0.001
 proud+0.001
 saying+0.001
Neg logit contributions
robe-0.004
ase-0.004
roller-0.004
attered-0.004
 ceiling-0.004
Token
Token ID198
Feature activation-3.08e-03

Pos logit contributions
robe+0.033
roller+0.030
ase+0.030
room+0.030
attered+0.030
Neg logit contributions
 bra-0.011
But-0.011
 aware-0.010
 proud-0.010
 questions-0.010
Token Lily
Token ID20037
Feature activation+6.61e-03

Pos logit contributions
 aware+0.015
 bra+0.015
 cautious+0.014
 consider+0.014
 disl+0.014
Neg logit contributions
robe-0.014
roller-0.013
 ceiling-0.013
room-0.013
ork-0.012
Token and
Token ID290
Feature activation+1.07e-02

Pos logit contributions
 bra+0.008
 aware+0.008
 cautious+0.007
 questions+0.007
 disl+0.007
Neg logit contributions
robe-0.012
roller-0.011
room-0.011
 ceiling-0.011
ork-0.011
Token Tom
Token ID4186
Feature activation+1.15e-02

Pos logit contributions
 aware+0.017
 bra+0.017
 cautious+0.016
 consider+0.016
Just+0.016
Neg logit contributions
robe-0.015
roller-0.013
 ceiling-0.013
room-0.013
tered-0.012
Token.
Token ID13
Feature activation+1.08e-03

Pos logit contributions
 bra+0.013
 aware+0.013
 disl+0.013
 consider+0.012
 cautious+0.012
Neg logit contributions
robe-0.021
room-0.019
roller-0.019
 ceiling-0.019
tered-0.018
Token He
Token ID679
Feature activation+2.17e-02

Pos logit contributions
 aware+0.002
 believing+0.002
 bra+0.002
cod+0.002
 cautious+0.002
Neg logit contributions
robe-0.001
room-0.001
roller-0.001
Mine-0.001
 ceiling-0.001
Token wh
Token ID348
Feature activation+5.96e-02

Pos logit contributions
 aware+0.035
smart+0.033
fing+0.033
Just+0.032
arks+0.032
Neg logit contributions
robe-0.027
 ceiling-0.025
 spilled-0.025
 glass-0.025
Mine-0.024
Tokenines
Token ID1127
Feature activation+1.53e-02

Pos logit contributions
 bra+0.087
 aware+0.073
 cautious+0.072
 disl+0.072
 consider+0.071
Neg logit contributions
room-0.099
robe-0.098
ase-0.096
roller-0.090
ings-0.089
Token and
Token ID290
Feature activation+1.71e-02

Pos logit contributions
 bra+0.015
 aware+0.014
 cautious+0.013
 disl+0.013
 consider+0.013
Neg logit contributions
robe-0.031
roller-0.029
room-0.028
 ceiling-0.028
ase-0.028
Token looks
Token ID3073
Feature activation+1.04e-02

Pos logit contributions
emy+0.025
ologist+0.024
 aware+0.024
ules+0.024
Just+0.024
Neg logit contributions
robe-0.024
 ceiling-0.023
 glass-0.022
Mine-0.022
 shelf-0.021
Token at
Token ID379
Feature activation+2.28e-03

Pos logit contributions
Just+0.016
emy+0.016
 aware+0.015
fing+0.015
ologist+0.015
Neg logit contributions
robe-0.013
tered-0.012
 budget-0.012
 ceiling-0.012
 glass-0.011
Token them
Token ID606
Feature activation-3.53e-03

Pos logit contributions
 bra+0.003
 questions+0.003
 aware+0.002
 disl+0.002
But+0.002
Neg logit contributions
roller-0.004
robe-0.004
ocated-0.004
ork-0.004
ase-0.004
Tokenicks
Token ID3378
Feature activation+7.91e-03

Pos logit contributions
 aware+0.035
 bra+0.034
 disl+0.034
 consider+0.032
 cautious+0.032
Neg logit contributions
robe-0.037
roller-0.034
room-0.033
ase-0.032
 ceiling-0.032
Token their
Token ID511
Feature activation+4.05e-03

Pos logit contributions
 bra+0.010
 questions+0.010
 aware+0.009
 disl+0.009
But+0.009
Neg logit contributions
robe-0.013
roller-0.013
 ceiling-0.012
ocated-0.012
ase-0.012
Token faces
Token ID6698
Feature activation+8.71e-03

Pos logit contributions
 bra+0.006
 aware+0.006
 cautious+0.006
 disl+0.006
 consider+0.006
Neg logit contributions
robe-0.006
roller-0.006
room-0.005
 ceiling-0.005
ase-0.005
Token.
Token ID13
Feature activation-3.98e-03

Pos logit contributions
 aware+0.010
 disl+0.009
 bra+0.009
 cautious+0.009
 consider+0.009
Neg logit contributions
robe-0.016
roller-0.015
room-0.015
 ceiling-0.015
tered-0.015
Token He
Token ID679
Feature activation+2.03e-02

Pos logit contributions
robe+0.006
roller+0.005
room+0.005
 ceiling+0.005
tered+0.005
Neg logit contributions
 aware-0.007
 bra-0.006
 cautious-0.006
 consider-0.006
 mindful-0.006
Token wh
Token ID348
Feature activation+5.91e-02

Pos logit contributions
 aware+0.032
fing+0.029
 cautious+0.029
 importantly+0.029
Just+0.029
Neg logit contributions
robe-0.025
 glass-0.024
 spilled-0.024
Mine-0.024
 ceiling-0.023
Tokenines
Token ID1127
Feature activation+1.52e-02

Pos logit contributions
 bra+0.085
 aware+0.071
 cautious+0.070
 disl+0.070
 consider+0.069
Neg logit contributions
room-0.101
robe-0.097
ase-0.097
roller-0.093
ings-0.092
Token and
Token ID290
Feature activation+1.66e-02

Pos logit contributions
 aware+0.015
 bra+0.015
 cautious+0.014
 disl+0.014
 consider+0.014
Neg logit contributions
robe-0.030
roller-0.028
room-0.028
 ceiling-0.028
tered-0.027
Token w
Token ID266
Feature activation+4.45e-02

Pos logit contributions
emy+0.024
ules+0.024
ologist+0.024
 aware+0.023
Just+0.022
Neg logit contributions
robe-0.022
 glass-0.022
 ceiling-0.022
Mine-0.022
 shelf-0.020
Tokenags
Token ID3775
Feature activation+8.43e-03

Pos logit contributions
 bra+0.059
 aware+0.058
 cautious+0.055
 disl+0.055
 consider+0.053
Neg logit contributions
room-0.072
robe-0.071
tered-0.066
ork-0.066
roller-0.066
Token his
Token ID465
Feature activation-1.32e-02

Pos logit contributions
 bra+0.012
 aware+0.012
 cautious+0.011
 disl+0.011
 consider+0.011
Neg logit contributions
robe-0.013
roller-0.012
room-0.012
 ceiling-0.012
tered-0.011
Token are
Token ID389
Feature activation+1.29e-02

Pos logit contributions
robe+0.014
roller+0.014
attered+0.013
tered+0.012
ocated+0.012
Neg logit contributions
 bra-0.013
 questions-0.013
 disl-0.012
 aware-0.012
 trusts-0.012
Token you
Token ID345
Feature activation+3.91e-02

Pos logit contributions
 bra+0.012
 aware+0.012
 cautious+0.011
 disl+0.011
 consider+0.011
Neg logit contributions
robe-0.026
roller-0.024
room-0.024
 ceiling-0.023
ork-0.023
Token doing
Token ID1804
Feature activation+3.25e-02

Pos logit contributions
 bra+0.049
 disl+0.046
 aware+0.043
 trusts+0.043
 cautious+0.042
Neg logit contributions
robe-0.068
room-0.067
box-0.064
 ceiling-0.062
roller-0.062
Token with
Token ID351
Feature activation+9.49e-03

Pos logit contributions
 bra+0.028
 aware+0.027
 cautious+0.025
 disl+0.024
 consider+0.023
Neg logit contributions
robe-0.071
room-0.069
roller-0.065
 ceiling-0.065
ase-0.064
Token the
Token ID262
Feature activation+1.21e-02

Pos logit contributions
 bra+0.009
 questions+0.008
 disl+0.008
 aware+0.008
 consider+0.007
Neg logit contributions
robe-0.020
roller-0.018
attered-0.018
room-0.018
tered-0.017
Token grapes
Token ID36656
Feature activation+5.79e-02

Pos logit contributions
 aware+0.020
 bra+0.019
 cautious+0.018
 consider+0.018
Just+0.018
Neg logit contributions
robe-0.016
 ceiling-0.015
roller-0.015
room-0.014
ork-0.014
Token?
Token ID30
Feature activation+1.16e-02

Pos logit contributions
 aware+0.047
 bra+0.046
 disl+0.043
 cautious+0.042
 questions+0.038
Neg logit contributions
robe-0.129
room-0.120
 ceiling-0.119
roller-0.119
ase-0.116
Token Those
Token ID5845
Feature activation+1.51e-02

Pos logit contributions
 aware+0.019
 bra+0.019
 cautious+0.018
 consider+0.018
 questions+0.017
Neg logit contributions
robe-0.016
roller-0.014
room-0.014
ase-0.014
 ceiling-0.014
Token are
Token ID389
Feature activation+1.04e-02

Pos logit contributions
 bra+0.026
 aware+0.025
 disl+0.024
 questions+0.024
 cautious+0.024
Neg logit contributions
robe-0.019
roller-0.018
room-0.017
 ceiling-0.016
ase-0.016
Token for
Token ID329
Feature activation+3.60e-03

Pos logit contributions
 aware+0.015
 bra+0.014
 cautious+0.014
 disl+0.014
 consider+0.013
Neg logit contributions
robe-0.016
roller-0.015
 ceiling-0.014
room-0.014
ork-0.014
Token eating
Token ID6600
Feature activation+1.48e-02

Pos logit contributions
 bra+0.004
 aware+0.004
 questions+0.003
 disl+0.003
 consider+0.003
Neg logit contributions
robe-0.007
roller-0.007
room-0.006
ase-0.006
 ceiling-0.006
Token are
Token ID389
Feature activation+8.37e-03

Pos logit contributions
roller+0.017
robe+0.017
attered+0.016
tered+0.015
ase+0.015
Neg logit contributions
 trusts-0.014
 questions-0.014
 brave-0.014
 tell-0.014
 consider-0.014
Token you
Token ID345
Feature activation+3.29e-02

Pos logit contributions
 bra+0.009
 aware+0.009
 cautious+0.008
 disl+0.008
 consider+0.008
Neg logit contributions
robe-0.016
roller-0.015
room-0.015
 ceiling-0.014
ork-0.014
Token doing
Token ID1804
Feature activation+2.99e-02

Pos logit contributions
 bra+0.036
 aware+0.033
 disl+0.032
 cautious+0.031
 questions+0.030
Neg logit contributions
robe-0.063
room-0.061
 ceiling-0.058
roller-0.058
box-0.058
Token with
Token ID351
Feature activation+9.62e-03

Pos logit contributions
 aware+0.025
 bra+0.025
 cautious+0.023
 disl+0.022
 consider+0.021
Neg logit contributions
robe-0.064
room-0.062
roller-0.059
 ceiling-0.059
ork-0.058
Token my
Token ID616
Feature activation+1.66e-02

Pos logit contributions
 bra+0.009
 aware+0.009
 disl+0.009
 questions+0.009
 consider+0.008
Neg logit contributions
robe-0.019
roller-0.018
room-0.017
 ceiling-0.017
tered-0.017
Token beans
Token ID16567
Feature activation+5.76e-02

Pos logit contributions
 bra+0.029
 aware+0.029
Just+0.027
 consider+0.027
 cautious+0.027
Neg logit contributions
robe-0.023
 ceiling-0.019
roller-0.019
ork-0.018
ings-0.017
Token?"
Token ID1701
Feature activation+1.98e-02

Pos logit contributions
 bra+0.051
 aware+0.048
 disl+0.046
 cautious+0.045
 questions+0.042
Neg logit contributions
robe-0.125
roller-0.115
room-0.114
 ceiling-0.114
box-0.113
Token It
Token ID632
Feature activation+1.05e-03

Pos logit contributions
 aware+0.036
ologist+0.036
 cautious+0.033
 mindful+0.033
 consider+0.033
Neg logit contributions
 glass-0.020
 ceiling-0.019
ork-0.018
 No-0.018
room-0.018
Token was
Token ID373
Feature activation-7.16e-03

Pos logit contributions
 aware+0.002
 bra+0.002
 cautious+0.002
 consider+0.002
 disl+0.002
Neg logit contributions
robe-0.001
roller-0.001
room-0.001
 ceiling-0.001
tered-0.001
Token Mrs
Token ID9074
Feature activation-1.33e-02

Pos logit contributions
robe+0.011
roller+0.010
room+0.010
 ceiling+0.010
ase+0.010
Neg logit contributions
 bra-0.010
 aware-0.010
 disl-0.010
 cautious-0.010
 consider-0.009
Token.
Token ID13
Feature activation-1.78e-02

Pos logit contributions
robe+0.024
roller+0.022
room+0.022
 ceiling+0.021
ase+0.021
Neg logit contributions
 aware-0.016
 bra-0.016
 disl-0.015
 cautious-0.015
 consider-0.015
TokenDaddy
Token ID48280
Feature activation+1.65e-02

Pos logit contributions
 aware+0.013
 bra+0.013
 disl+0.012
 cautious+0.012
 questions+0.012
Neg logit contributions
robe-0.013
room-0.013
roller-0.012
Mine-0.012
 ceiling-0.012
Token,
Token ID11
Feature activation-8.02e-03

Pos logit contributions
 bra+0.021
 aware+0.021
 disl+0.020
 cautious+0.020
 questions+0.019
Neg logit contributions
robe-0.027
roller-0.026
room-0.025
 ceiling-0.024
ings-0.023
Token why
Token ID1521
Feature activation+9.47e-03

Pos logit contributions
robe+0.014
roller+0.013
room+0.013
 ceiling+0.013
ocated+0.012
Neg logit contributions
 bra-0.010
 aware-0.009
 cautious-0.009
Just-0.009
 disl-0.009
Token are
Token ID389
Feature activation+1.32e-02

Pos logit contributions
 aware+0.015
 bra+0.015
 disl+0.014
 consider+0.014
 questions+0.014
Neg logit contributions
robe-0.013
roller-0.013
ase-0.012
room-0.012
attered-0.011
Token the
Token ID262
Feature activation+1.93e-02

Pos logit contributions
 aware+0.018
 bra+0.018
 cautious+0.017
 disl+0.017
 consider+0.016
Neg logit contributions
robe-0.020
roller-0.019
 ceiling-0.019
room-0.018
tered-0.018
Token whales
Token ID24635
Feature activation+5.76e-02

Pos logit contributions
smart+0.022
 aware+0.021
 disl+0.021
Just+0.021
 consider+0.020
Neg logit contributions
 ceiling-0.033
robe-0.032
 bottom-0.031
roller-0.031
room-0.031
Token marching
Token ID26409
Feature activation+1.95e-02

Pos logit contributions
 bra+0.062
emy+0.057
 aware+0.056
 disl+0.056
Just+0.055
Neg logit contributions
robe-0.106
 ceiling-0.101
roller-0.100
room-0.099
ings-0.094
Token?",
Token ID35379
Feature activation+6.43e-03

Pos logit contributions
 bra+0.020
 aware+0.019
 cautious+0.018
 disl+0.017
 consider+0.017
Neg logit contributions
robe-0.039
roller-0.037
room-0.036
 ceiling-0.035
tered-0.034
Token exclaimed
Token ID33552
Feature activation-8.96e-03

Pos logit contributions
 aware+0.010
cod+0.010
ologist+0.010
 disl+0.010
 bra+0.010
Neg logit contributions
robe-0.008
tered-0.007
roller-0.007
Mine-0.007
room-0.007
Token Ben
Token ID3932
Feature activation-8.28e-03

Pos logit contributions
robe+0.013
roller+0.012
room+0.011
 ceiling+0.011
tered+0.011
Neg logit contributions
 aware-0.014
 bra-0.014
 disl-0.013
 cautious-0.013
 consider-0.013
Token.
Token ID13
Feature activation-1.35e-02

Pos logit contributions
robe+0.014
roller+0.013
room+0.013
tered+0.013
 ceiling+0.013
Neg logit contributions
 disl-0.010
 bra-0.010
 aware-0.009
 trusts-0.009
emy-0.009
Token asked
Token ID1965
Feature activation+1.27e-03

Pos logit contributions
 aware+0.006
 questions+0.006
 bra+0.006
 disl+0.006
ologist+0.006
Neg logit contributions
robe-0.011
roller-0.010
tered-0.010
 ceiling-0.010
room-0.010
Token,
Token ID11
Feature activation-3.64e-03

Pos logit contributions
 bra+0.002
 aware+0.002
 disl+0.002
 cautious+0.002
 consider+0.002
Neg logit contributions
robe-0.002
roller-0.002
room-0.002
 ceiling-0.002
ase-0.002
Token "
Token ID366
Feature activation-1.29e-05

Pos logit contributions
robe+0.002
tered+0.002
roller+0.002
 ceiling+0.002
 No+0.002
Neg logit contributions
 disl-0.008
cod-0.008
ieve-0.008
inations-0.008
ises-0.008
TokenWhat
Token ID2061
Feature activation-1.50e-02

Pos logit contributions
Mine+0.000
robe+0.000
room+0.000
ase+0.000
mail+0.000
Neg logit contributions
 disl-0.000
 til-0.000
 aware-0.000
 impressed-0.000
 questions-0.000
Token do
Token ID466
Feature activation-4.59e-03

Pos logit contributions
robe+0.024
roller+0.024
attered+0.022
ocated+0.021
ase+0.021
Neg logit contributions
 bra-0.021
 questions-0.021
 aware-0.021
 consider-0.020
 trusts-0.020
Token chickens
Token ID25972
Feature activation+5.70e-02

Pos logit contributions
robe+0.007
room+0.007
ork+0.007
roller+0.006
 ceiling+0.006
Neg logit contributions
 aware-0.006
 bra-0.006
 cautious-0.006
 impressed-0.006
ologist-0.006
Token look
Token ID804
Feature activation+1.52e-02

Pos logit contributions
 bra+0.085
Just+0.083
emy+0.083
 aware+0.080
 disl+0.077
Neg logit contributions
robe-0.078
room-0.072
 ceiling-0.071
 frame-0.067
tered-0.066
Token like
Token ID588
Feature activation+2.28e-02

Pos logit contributions
Just+0.023
 aware+0.022
 trusts+0.021
 disl+0.021
 cautious+0.021
Neg logit contributions
robe-0.020
tered-0.018
 ceiling-0.018
ings-0.017
 glass-0.017
Token?"
Token ID1701
Feature activation+1.04e-02

Pos logit contributions
 bra+0.024
 aware+0.024
 disl+0.023
 cautious+0.022
 trusts+0.022
Neg logit contributions
robe-0.042
 ceiling-0.041
room-0.039
 glass-0.038
ase-0.038
Token
Token ID198
Feature activation+2.31e-03

Pos logit contributions
 aware+0.017
ieve+0.017
cod+0.017
 cautious+0.017
 knowledgeable+0.017
Neg logit contributions
robe-0.012
roller-0.012
room-0.012
ase-0.011
ork-0.011
Token
Token ID198
Feature activation+5.83e-03

Pos logit contributions
 aware+0.003
 bra+0.003
 cautious+0.003
ologist+0.003
 disl+0.003
Neg logit contributions
roller-0.004
ork-0.004
room-0.004
robe-0.004
ase-0.003
Token button
Token ID4936
Feature activation-1.09e-02

Pos logit contributions
 aware+0.010
 bra+0.009
 cautious+0.009
 consider+0.009
 disl+0.009
Neg logit contributions
robe-0.004
roller-0.004
 ceiling-0.003
room-0.003
ork-0.003
Token.
Token ID13
Feature activation+7.23e-04

Pos logit contributions
robe+0.023
roller+0.022
room+0.021
 ceiling+0.021
ocated+0.021
Neg logit contributions
 bra-0.009
 aware-0.008
 questions-0.008
 cautious-0.008
 disl-0.008
Token They
Token ID1119
Feature activation+1.81e-02

Pos logit contributions
 aware+0.001
 bra+0.001
 cautious+0.001
 consider+0.001
 disl+0.001
Neg logit contributions
robe-0.001
roller-0.001
room-0.001
 ceiling-0.001
ork-0.001
Token hear
Token ID3285
Feature activation+2.29e-03

Pos logit contributions
 aware+0.031
emy+0.030
Just+0.029
 cautious+0.029
 disl+0.028
Neg logit contributions
 ceiling-0.023
robe-0.022
room-0.021
Mine-0.021
 shelf-0.020
Token a
Token ID257
Feature activation+1.12e-03

Pos logit contributions
 aware+0.004
 bra+0.004
 cautious+0.003
 consider+0.003
 disl+0.003
Neg logit contributions
robe-0.003
roller-0.003
room-0.003
 ceiling-0.003
tered-0.003
Token wh
Token ID348
Feature activation+5.67e-02

Pos logit contributions
 aware+0.002
 cautious+0.002
 bra+0.002
 disl+0.002
 consider+0.002
Neg logit contributions
 ceiling-0.002
robe-0.002
roller-0.002
room-0.001
ork-0.001
Tokenir
Token ID343
Feature activation+1.54e-02

Pos logit contributions
 bra+0.079
 disl+0.070
 aware+0.068
 consider+0.067
 cautious+0.066
Neg logit contributions
room-0.095
ase-0.092
ings-0.090
robe-0.090
roller-0.089
Tokenring
Token ID1806
Feature activation+1.60e-02

Pos logit contributions
 aware+0.017
 bra+0.017
 cautious+0.017
 trusts+0.017
 consider+0.017
Neg logit contributions
robe-0.027
room-0.026
tered-0.026
roller-0.025
ase-0.025
Token sound
Token ID2128
Feature activation+8.11e-03

Pos logit contributions
 aware+0.022
 bra+0.022
 cautious+0.021
 consider+0.021
 disl+0.021
Neg logit contributions
robe-0.025
roller-0.023
room-0.023
 ceiling-0.022
ase-0.022
Token and
Token ID290
Feature activation+1.11e-02

Pos logit contributions
emy+0.012
 disl+0.012
 aware+0.011
 trusts+0.011
 cautious+0.011
Neg logit contributions
room-0.012
robe-0.012
tered-0.011
 ceiling-0.011
ork-0.011
Token see
Token ID766
Feature activation+5.74e-03

Pos logit contributions
 aware+0.015
emy+0.015
 cautious+0.014
Just+0.014
 consider+0.014
Neg logit contributions
robe-0.017
 ceiling-0.016
roller-0.015
room-0.015
 glass-0.015
Token like
Token ID588
Feature activation-3.37e-04

Pos logit contributions
 aware+0.022
 disl+0.021
 bra+0.020
 cautious+0.020
emy+0.020
Neg logit contributions
robe-0.030
room-0.028
roller-0.028
 ceiling-0.028
tered-0.028
Token "
Token ID366
Feature activation+2.24e-03

Pos logit contributions
robe+0.001
roller+0.001
room+0.000
 ceiling+0.000
ork+0.000
Neg logit contributions
 aware-0.000
 bra-0.000
 cautious-0.000
 disl-0.000
 consider-0.000
Tokencastle
Token ID18676
Feature activation+3.79e-02

Pos logit contributions
 bra+0.003
 aware+0.003
 disl+0.003
 questions+0.003
 consider+0.003
Neg logit contributions
robe-0.004
roller-0.003
 ceiling-0.003
room-0.003
ocated-0.003
Token",
Token ID1600
Feature activation+1.44e-02

Pos logit contributions
 bra+0.044
 aware+0.043
 cautious+0.040
 disl+0.040
 consider+0.039
Neg logit contributions
robe-0.068
roller-0.064
room-0.064
 ceiling-0.062
ase-0.061
Token "
Token ID366
Feature activation+2.77e-02

Pos logit contributions
 aware+0.018
 bra+0.017
 cautious+0.016
cod+0.016
emy+0.016
Neg logit contributions
robe-0.024
room-0.024
tered-0.023
roller-0.023
 ceiling-0.023
Tokenpr
Token ID1050
Feature activation+5.66e-02

Pos logit contributions
 aware+0.036
 bra+0.035
 cautious+0.033
 disl+0.033
 trusts+0.032
Neg logit contributions
robe-0.045
room-0.045
roller-0.043
Mine-0.042
ase-0.042
Tokenince
Token ID924
Feature activation+1.91e-02

Pos logit contributions
 bra+0.088
 aware+0.084
 cautious+0.081
 disl+0.078
 thanking+0.076
Neg logit contributions
room-0.085
ings-0.076
roller-0.076
robe-0.075
tered-0.075
Token",
Token ID1600
Feature activation+1.21e-02

Pos logit contributions
 bra+0.025
 aware+0.025
 cautious+0.023
 disl+0.022
emy+0.022
Neg logit contributions
robe-0.030
room-0.030
roller-0.029
ork-0.029
 ceiling-0.028
Token "
Token ID366
Feature activation+2.85e-02

Pos logit contributions
 aware+0.016
 bra+0.015
emy+0.015
cod+0.015
 cautious+0.015
Neg logit contributions
robe-0.019
room-0.018
tered-0.018
 ceiling-0.018
roller-0.018
Tokendragon
Token ID14844
Feature activation+3.18e-02

Pos logit contributions
 aware+0.037
 bra+0.036
 disl+0.034
 cautious+0.034
 trusts+0.033
Neg logit contributions
room-0.047
robe-0.046
roller-0.044
Mine-0.043
ase-0.043
Token",
Token ID1600
Feature activation+1.16e-02

Pos logit contributions
 bra+0.046
 aware+0.045
emy+0.042
 cautious+0.042
 confident+0.041
Neg logit contributions
robe-0.047
room-0.045
ings-0.043
ork-0.043
roller-0.043
Token cardboard
Token ID26819
Feature activation-1.00e-02

Pos logit contributions
robe+0.016
roller+0.015
room+0.014
 ceiling+0.014
ase+0.014
Neg logit contributions
 bra-0.020
 aware-0.020
 cautious-0.019
 disl-0.019
 consider-0.019
Token."
Token ID526
Feature activation+1.94e-02

Pos logit contributions
robe+0.018
roller+0.017
room+0.017
 ceiling+0.016
ase+0.016
Neg logit contributions
 aware-0.012
 bra-0.012
 disl-0.011
 cautious-0.011
 consider-0.011
Token
Token ID198
Feature activation+1.04e-02

Pos logit contributions
 bra+0.030
 disl+0.029
 aware+0.029
 consider+0.028
 knowledgeable+0.028
Neg logit contributions
robe-0.026
roller-0.026
room-0.024
 ceiling-0.024
tered-0.023
Token
Token ID198
Feature activation+1.70e-02

Pos logit contributions
 disl+0.014
 bra+0.014
 aware+0.014
 cautious+0.013
 knowledgeable+0.013
Neg logit contributions
roller-0.016
robe-0.016
room-0.016
ork-0.016
tered-0.015
TokenSur
Token ID14214
Feature activation+3.23e-02

Pos logit contributions
 disl+0.027
 bra+0.026
 questions+0.025
smart+0.024
 aware+0.024
Neg logit contributions
Mine-0.021
room-0.021
roller-0.021
tered-0.021
mail-0.020
Tokenpr
Token ID1050
Feature activation+5.64e-02

Pos logit contributions
 disl+0.041
 bra+0.035
 til+0.033
 aware+0.033
 ze+0.033
Neg logit contributions
room-0.057
ase-0.054
robe-0.054
box-0.053
ocated-0.052
Tokenised
Token ID1417
Feature activation+2.37e-02

Pos logit contributions
 bra+0.067
 aware+0.064
 disl+0.062
 cautious+0.061
 questions+0.059
Neg logit contributions
robe-0.100
roller-0.095
room-0.093
 ceiling-0.091
ase-0.091
Token,
Token ID11
Feature activation+2.03e-02

Pos logit contributions
 disl+0.042
 bra+0.040
cod+0.039
 knowledgeable+0.038
ologist+0.037
Neg logit contributions
ings-0.027
roller-0.027
robe-0.025
tered-0.024
box-0.024
Token Ralph
Token ID20993
Feature activation-7.61e-03

Pos logit contributions
 disl+0.032
 bra+0.031
 aware+0.030
cod+0.029
 consider+0.029
Neg logit contributions
robe-0.027
tered-0.025
roller-0.024
 ceiling-0.024
room-0.023
Token looked
Token ID3114
Feature activation+1.30e-02

Pos logit contributions
robe+0.012
roller+0.012
room+0.011
tered+0.011
 ceiling+0.011
Neg logit contributions
 bra-0.010
 aware-0.010
 disl-0.010
 consider-0.009
 cautious-0.009
Token at
Token ID379
Feature activation+1.33e-02

Pos logit contributions
 disl+0.020
 bra+0.019
emy+0.018
 consider+0.018
 aware+0.018
Neg logit contributions
robe-0.019
 ceiling-0.017
tered-0.017
roller-0.017
attered-0.016
Token they
Token ID484
Feature activation+1.65e-02

Pos logit contributions
robe+0.016
roller+0.015
room+0.014
 ceiling+0.014
ase+0.014
Neg logit contributions
 bra-0.015
 aware-0.015
 cautious-0.014
 disl-0.014
 questions-0.014
Token do
Token ID466
Feature activation+3.68e-03

Pos logit contributions
 bra+0.024
 disl+0.023
 aware+0.023
Just+0.022
 impressed+0.022
Neg logit contributions
robe-0.023
 ceiling-0.022
roller-0.021
room-0.021
mail-0.021
Token it
Token ID340
Feature activation+2.95e-03

Pos logit contributions
 bra+0.005
 aware+0.005
 disl+0.004
 cautious+0.004
 consider+0.004
Neg logit contributions
robe-0.006
roller-0.006
room-0.006
 ceiling-0.005
ork-0.005
Token.
Token ID13
Feature activation-6.52e-03

Pos logit contributions
 aware+0.004
 bra+0.004
 disl+0.003
 cautious+0.003
 consider+0.003
Neg logit contributions
robe-0.005
room-0.005
roller-0.005
 ceiling-0.005
tered-0.004
Token He
Token ID679
Feature activation+2.11e-02

Pos logit contributions
robe+0.010
roller+0.009
 ceiling+0.009
room+0.009
ork+0.008
Neg logit contributions
 aware-0.010
 bra-0.009
 disl-0.009
cod-0.009
 consider-0.009
Token imag
Token ID3590
Feature activation+5.60e-02

Pos logit contributions
 aware+0.030
 disl+0.029
smart+0.029
 consider+0.028
ologist+0.028
Neg logit contributions
robe-0.027
roller-0.027
 ceiling-0.026
Mine-0.026
ork-0.026
Tokenines
Token ID1127
Feature activation+1.08e-02

Pos logit contributions
 bra+0.080
 wiser+0.070
 disl+0.069
emy+0.068
 ze+0.068
Neg logit contributions
room-0.084
ied-0.080
ool-0.080
box-0.079
roller-0.078
Token he
Token ID339
Feature activation+2.15e-02

Pos logit contributions
 bra+0.012
 aware+0.011
 questions+0.011
 disl+0.010
 consider+0.010
Neg logit contributions
robe-0.020
roller-0.020
attered-0.018
ase-0.018
room-0.018
Token is
Token ID318
Feature activation+2.36e-02

Pos logit contributions
 aware+0.037
 bra+0.035
 disl+0.035
emy+0.035
 consider+0.034
Neg logit contributions
robe-0.025
 ceiling-0.023
room-0.022
roller-0.022
ork-0.021
Token one
Token ID530
Feature activation+4.92e-03

Pos logit contributions
 disl+0.032
cod+0.032
emy+0.032
 trusts+0.032
Just+0.031
Neg logit contributions
robe-0.036
 ceiling-0.032
room-0.030
roller-0.030
tered-0.029
Token of
Token ID286
Feature activation+7.72e-03

Pos logit contributions
 disl+0.007
 aware+0.007
 bra+0.006
 consider+0.006
Just+0.006
Neg logit contributions
robe-0.007
 ceiling-0.007
room-0.007
ase-0.007
roller-0.007
Token together
Token ID1978
Feature activation+5.15e-03

Pos logit contributions
 aware+0.019
 bra+0.019
 disl+0.018
 consider+0.018
 trusts+0.018
Neg logit contributions
robe-0.018
room-0.018
 ceiling-0.016
ase-0.016
roller-0.015
Token.
Token ID13
Feature activation+1.06e-02

Pos logit contributions
 bra+0.006
 aware+0.006
 disl+0.006
 consider+0.005
 cautious+0.005
Neg logit contributions
robe-0.009
room-0.009
roller-0.009
 ceiling-0.009
ase-0.008
Token They
Token ID1119
Feature activation+4.17e-03

Pos logit contributions
 bra+0.015
 aware+0.015
 disl+0.014
 cautious+0.014
 questions+0.014
Neg logit contributions
robe-0.016
roller-0.015
room-0.015
 ceiling-0.014
ase-0.014
Token let
Token ID1309
Feature activation-8.17e-03

Pos logit contributions
 bra+0.007
 aware+0.007
 cautious+0.006
 disl+0.006
 consider+0.006
Neg logit contributions
robe-0.006
roller-0.005
room-0.005
 ceiling-0.005
ork-0.005
Token their
Token ID511
Feature activation+2.11e-02

Pos logit contributions
robe+0.018
roller+0.017
attered+0.017
bars+0.017
ase+0.016
Neg logit contributions
 bra-0.007
 brave-0.007
 questions-0.007
 say-0.006
 tell-0.006
Token imag
Token ID3590
Feature activation+5.59e-02

Pos logit contributions
 bra+0.022
 aware+0.022
 questions+0.021
 cautious+0.020
 disl+0.020
Neg logit contributions
robe-0.041
roller-0.038
ase-0.036
room-0.036
tered-0.036
Tokeninations
Token ID7352
Feature activation+7.87e-03

Pos logit contributions
 bra+0.065
 cautious+0.057
 aware+0.056
 wiser+0.056
But+0.056
Neg logit contributions
room-0.105
ool-0.100
roller-0.099
ase-0.098
robe-0.096
Token take
Token ID1011
Feature activation-4.89e-03

Pos logit contributions
 bra+0.011
 aware+0.010
 consider+0.010
 questions+0.010
 disl+0.010
Neg logit contributions
robe-0.012
roller-0.012
ase-0.011
room-0.011
 ceiling-0.011
Token over
Token ID625
Feature activation+1.51e-03

Pos logit contributions
robe+0.009
roller+0.008
ase+0.008
 ceiling+0.008
ocated+0.008
Neg logit contributions
 bra-0.006
 questions-0.006
 brave-0.006
 aware-0.006
 proud-0.005
Token and
Token ID290
Feature activation+3.40e-03

Pos logit contributions
 bra+0.002
 aware+0.002
 disl+0.002
 cautious+0.002
 consider+0.002
Neg logit contributions
robe-0.002
roller-0.002
room-0.002
 ceiling-0.002
ork-0.002
Token soon
Token ID2582
Feature activation+1.53e-02

Pos logit contributions
 aware+0.005
 bra+0.005
 cautious+0.005
 disl+0.005
 consider+0.004
Neg logit contributions
robe-0.005
roller-0.005
room-0.005
 ceiling-0.005
tered-0.004
Token
Token ID198
Feature activation+9.27e-03

Pos logit contributions
robe+0.001
roller+0.001
room+0.001
 ceiling+0.001
ork+0.001
Neg logit contributions
 aware-0.001
 disl-0.001
 bra-0.001
 consider-0.001
cod-0.001
Token
Token ID198
Feature activation+1.25e-02

Pos logit contributions
 bra+0.012
 aware+0.011
 cautious+0.011
 consider+0.010
 knowledgeable+0.010
Neg logit contributions
robe-0.015
roller-0.015
room-0.015
ork-0.015
 ceiling-0.014
Token"
Token ID1
Feature activation+1.41e-02

Pos logit contributions
 bra+0.017
 aware+0.016
 questions+0.016
 knowledgeable+0.016
 cautious+0.016
Neg logit contributions
roller-0.019
room-0.019
ork-0.018
robe-0.018
Mine-0.018
TokenBecause
Token ID8128
Feature activation+1.33e-02

Pos logit contributions
 aware+0.018
 bra+0.017
 questions+0.017
 cautious+0.016
 consider+0.016
Neg logit contributions
robe-0.024
room-0.022
roller-0.022
Mine-0.022
ork-0.021
Token lady
Token ID10846
Feature activation+3.07e-02

Pos logit contributions
 bra+0.019
 aware+0.019
 cautious+0.018
 disl+0.018
 consider+0.018
Neg logit contributions
robe-0.020
roller-0.019
room-0.018
 ceiling-0.018
ase-0.017
Tokenbugs
Token ID32965
Feature activation+5.58e-02

Pos logit contributions
 aware+0.047
 bra+0.045
Just+0.044
 cautious+0.044
 disl+0.042
Neg logit contributions
robe-0.042
room-0.039
ase-0.038
roller-0.038
ork-0.037
Token are
Token ID389
Feature activation+1.33e-02

Pos logit contributions
 aware+0.113
 bra+0.110
 cautious+0.108
Just+0.107
 disl+0.107
Neg logit contributions
robe-0.049
room-0.043
roller-0.041
 ceiling-0.041
tered-0.039
Token kind
Token ID1611
Feature activation+9.18e-03

Pos logit contributions
Just+0.018
 trusts+0.017
 aware+0.017
fing+0.017
ologist+0.017
Neg logit contributions
robe-0.019
 ceiling-0.018
tered-0.017
 glass-0.017
roller-0.017
Token.
Token ID13
Feature activation-4.61e-03

Pos logit contributions
 bra+0.011
 aware+0.011
 disl+0.010
 consider+0.010
 cautious+0.010
Neg logit contributions
robe-0.016
roller-0.015
room-0.015
 ceiling-0.014
ase-0.014
Token They
Token ID1119
Feature activation+2.37e-02

Pos logit contributions
robe+0.005
room+0.005
roller+0.004
 ceiling+0.004
ase+0.004
Neg logit contributions
 aware-0.009
 bra-0.009
 cautious-0.008
 believing-0.008
 impressed-0.008
Token help
Token ID1037
Feature activation-6.15e-03

Pos logit contributions
 aware+0.046
Just+0.046
 bra+0.044
emy+0.044
 cautious+0.044
Neg logit contributions
robe-0.020
 ceiling-0.019
tered-0.017
 frame-0.017
 glass-0.017
Token the
Token ID262
Feature activation-1.45e-02

Pos logit contributions
 bra+0.006
 aware+0.006
 questions+0.006
 disl+0.006
 proud+0.006
Neg logit contributions
robe-0.009
roller-0.008
ase-0.008
attered-0.008
room-0.008
Token mailbox
Token ID37282
Feature activation-1.31e-02

Pos logit contributions
robe+0.019
roller+0.018
room+0.017
tered+0.017
ocated+0.017
Neg logit contributions
 bra-0.022
 proud-0.022
 aware-0.021
But-0.021
 questions-0.021
Token.
Token ID13
Feature activation-1.30e-02

Pos logit contributions
robe+0.029
roller+0.027
room+0.026
ork+0.026
 ceiling+0.026
Neg logit contributions
 bra-0.010
 aware-0.010
 cautious-0.009
 consider-0.008
 questions-0.008
Token On
Token ID1550
Feature activation-1.06e-02

Pos logit contributions
robe+0.022
roller+0.020
ase+0.020
room+0.020
 ceiling+0.019
Neg logit contributions
 aware-0.015
 bra-0.015
 proud-0.015
But-0.015
 disl-0.015
Token the
Token ID262
Feature activation-1.64e-02

Pos logit contributions
attered+0.026
robe+0.026
roller+0.023
ocated+0.023
ase+0.023
Neg logit contributions
 then-0.006
 proud-0.005
But-0.005
 saying-0.005
?-0.005
Token way
Token ID835
Feature activation-6.39e-02

Pos logit contributions
robe+0.029
room+0.023
ocated+0.022
ase+0.022
attered+0.022
Neg logit contributions
 saying-0.023
But-0.022
 bra-0.022
 tell-0.022
 proud-0.021
Token,
Token ID11
Feature activation-5.38e-03

Pos logit contributions
robe+0.157
Mine+0.152
attered+0.148
tered+0.147
ase+0.144
Neg logit contributions
 brave-0.025
 then-0.024
But-0.024
 proud-0.024
 and-0.020
Token she
Token ID673
Feature activation-3.59e-03

Pos logit contributions
robe+0.011
ase+0.011
roller+0.010
attered+0.010
ool+0.010
Neg logit contributions
 proud-0.005
But-0.005
 bra-0.005
 brave-0.005
 questions-0.005
Token spotted
Token ID13489
Feature activation+1.81e-02

Pos logit contributions
robe+0.007
roller+0.007
room+0.006
 ceiling+0.006
Mine+0.006
Neg logit contributions
 bra-0.004
 aware-0.004
 proud-0.003
 impressed-0.003
 cautious-0.003
Token something
Token ID1223
Feature activation-2.46e-02

Pos logit contributions
 bra+0.019
 questions+0.018
 saying+0.017
 aware+0.017
 proud+0.017
Neg logit contributions
robe-0.035
roller-0.035
attered-0.032
ase-0.032
oun-0.031
Token disgusting
Token ID23374
Feature activation-1.01e-02

Pos logit contributions
robe+0.042
roller+0.039
room+0.038
 ceiling+0.037
ase+0.037
Neg logit contributions
 bra-0.031
 aware-0.031
 cautious-0.029
 disl-0.029
 consider-0.029
Token.
Token ID13
Feature activation-2.29e-02

Pos logit contributions
robe+0.013
roller+0.012
ase+0.012
room+0.011
 ceiling+0.011
Neg logit contributions
 bra-0.005
 aware-0.005
 questions-0.004
 disl-0.004
 cautious-0.004
Token We
Token ID775
Feature activation-1.10e-02

Pos logit contributions
robe+0.038
roller+0.036
room+0.034
 ceiling+0.034
ork+0.034
Neg logit contributions
 bra-0.029
 aware-0.029
 disl-0.028
 cautious-0.027
 consider-0.027
Token will
Token ID481
Feature activation-2.49e-02

Pos logit contributions
robe+0.017
roller+0.016
room+0.015
 ceiling+0.015
ork+0.015
Neg logit contributions
 bra-0.016
 aware-0.016
 disl-0.015
 cautious-0.015
 consider-0.014
Token find
Token ID1064
Feature activation-1.73e-02

Pos logit contributions
robe+0.036
roller+0.034
room+0.033
 ceiling+0.032
ocated+0.031
Neg logit contributions
 bra-0.037
 aware-0.037
 cautious-0.035
 disl-0.035
 questions-0.035
Token a
Token ID257
Feature activation-2.43e-02

Pos logit contributions
robe+0.033
roller+0.031
ase+0.029
 ceiling+0.028
room+0.028
Neg logit contributions
 bra-0.021
 aware-0.020
 questions-0.019
 proud-0.018
 consider-0.018
Token way
Token ID835
Feature activation-6.27e-02

Pos logit contributions
robe+0.040
tered+0.036
room+0.036
ase+0.036
attered+0.036
Neg logit contributions
 bra-0.034
 aware-0.032
 consider-0.031
 cautious-0.031
 proud-0.031
Token to
Token ID284
Feature activation-2.47e-02

Pos logit contributions
robe+0.106
roller+0.098
room+0.095
ase+0.093
 ceiling+0.093
Neg logit contributions
 bra-0.079
 aware-0.078
 questions-0.073
 cautious-0.073
 consider-0.073
Token fix
Token ID4259
Feature activation-3.73e-03

Pos logit contributions
robe+0.042
roller+0.038
ase+0.037
 ceiling+0.036
room+0.036
Neg logit contributions
 bra-0.033
 aware-0.032
 consider-0.031
 disl-0.030
 cautious-0.030
Token your
Token ID534
Feature activation-6.76e-03

Pos logit contributions
robe+0.006
roller+0.006
ase+0.005
attered+0.005
 ceiling+0.005
Neg logit contributions
 bra-0.005
 aware-0.005
 questions-0.005
 disl-0.005
 proud-0.005
Token car
Token ID1097
Feature activation-1.52e-02

Pos logit contributions
robe+0.007
roller+0.007
room+0.006
tered+0.006
ase+0.006
Neg logit contributions
 bra-0.013
 aware-0.012
 questions-0.012
 cautious-0.012
 disl-0.012
Token."
Token ID526
Feature activation+2.46e-03

Pos logit contributions
robe+0.032
roller+0.031
 ceiling+0.030
Mine+0.030
attered+0.030
Neg logit contributions
 bra-0.012
 aware-0.011
 questions-0.010
 consider-0.010
 cautious-0.010
Token come
Token ID1282
Feature activation-1.09e-02

Pos logit contributions
robe+0.012
roller+0.011
room+0.011
 ceiling+0.011
ase+0.010
Neg logit contributions
 bra-0.010
 aware-0.010
 disl-0.009
 cautious-0.009
 consider-0.009
Token,
Token ID11
Feature activation+6.54e-03

Pos logit contributions
robe+0.019
roller+0.018
room+0.017
 ceiling+0.017
ork+0.017
Neg logit contributions
 bra-0.013
 aware-0.013
 disl-0.013
 cautious-0.012
 consider-0.012
Token and
Token ID290
Feature activation+2.15e-03

Pos logit contributions
 aware+0.007
 bra+0.007
 disl+0.007
 cautious+0.007
 questions+0.006
Neg logit contributions
robe-0.012
roller-0.011
 ceiling-0.011
room-0.011
tered-0.010
Token the
Token ID262
Feature activation-1.88e-02

Pos logit contributions
 bra+0.003
 aware+0.003
 disl+0.003
 cautious+0.003
 consider+0.003
Neg logit contributions
robe-0.004
roller-0.003
room-0.003
 ceiling-0.003
ase-0.003
Token lovely
Token ID14081
Feature activation-4.09e-02

Pos logit contributions
robe+0.032
roller+0.032
room+0.030
ork+0.029
attered+0.029
Neg logit contributions
 bra-0.024
 aware-0.022
 questions-0.021
 proud-0.021
 cautious-0.021
Token leather
Token ID11620
Feature activation-6.25e-02

Pos logit contributions
roller+0.053
robe+0.052
attered+0.051
ocated+0.050
room+0.050
Neg logit contributions
 bra-0.064
 proud-0.060
 med-0.059
 aware-0.059
 brave-0.058
Token wrapped
Token ID12908
Feature activation-3.65e-02

Pos logit contributions
attered+0.068
roller+0.068
ocated+0.068
room+0.066
tered+0.066
Neg logit contributions
 bra-0.099
 proud-0.094
 brave-0.094
 questions-0.088
 trusts-0.088
Token doll
Token ID3654
Feature activation-2.48e-02

Pos logit contributions
roller+0.064
ocated+0.060
OP+0.059
Mine+0.058
 ceiling+0.057
Neg logit contributions
 bra-0.040
 questions-0.036
 brave-0.035
 proud-0.035
 aware-0.034
Token stayed
Token ID9658
Feature activation-2.11e-02

Pos logit contributions
robe+0.043
roller+0.041
room+0.041
 ceiling+0.039
ocated+0.039
Neg logit contributions
 bra-0.030
 aware-0.029
 cautious-0.027
 proud-0.027
 consider-0.027
Token with
Token ID351
Feature activation-1.94e-02

Pos logit contributions
robe+0.036
roller+0.034
room+0.033
 ceiling+0.032
ase+0.032
Neg logit contributions
 bra-0.026
 aware-0.026
 disl-0.024
 cautious-0.024
 consider-0.024
Token her
Token ID607
Feature activation+8.99e-03

Pos logit contributions
robe+0.032
roller+0.032
ocated+0.029
ase+0.029
room+0.029
Neg logit contributions
 bra-0.027
 aware-0.024
 questions-0.024
 proud-0.024
 consider-0.023
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
robe+0.008
roller+0.008
room+0.008
ase+0.008
attered+0.007
Neg logit contributions
 bra-0.006
 aware-0.006
 questions-0.006
 consider-0.005
 cautious-0.005
Token pigeon
Token ID46387
Feature activation-8.58e-03

Pos logit contributions
 bra+0.019
 aware+0.018
 cautious+0.017
 disl+0.017
 consider+0.017
Neg logit contributions
robe-0.021
roller-0.020
room-0.019
 ceiling-0.019
ase-0.019
Token would
Token ID561
Feature activation-3.31e-02

Pos logit contributions
robe+0.009
roller+0.009
room+0.008
 ceiling+0.008
ase+0.008
Neg logit contributions
 bra-0.016
 aware-0.016
 cautious-0.015
 disl-0.015
 consider-0.015
Token find
Token ID1064
Feature activation-1.41e-02

Pos logit contributions
robe+0.046
roller+0.042
 ceiling+0.041
tered+0.040
room+0.040
Neg logit contributions
 aware-0.049
 disl-0.049
 cautious-0.047
 bra-0.047
ologist-0.046
Token a
Token ID257
Feature activation-4.17e-03

Pos logit contributions
robe+0.024
roller+0.023
room+0.022
 ceiling+0.021
ase+0.021
Neg logit contributions
 bra-0.020
 aware-0.018
 questions-0.017
 cautious-0.017
 consider-0.017
Token way
Token ID835
Feature activation-6.21e-02

Pos logit contributions
robe+0.007
roller+0.006
room+0.006
ase+0.006
tered+0.006
Neg logit contributions
 bra-0.006
 aware-0.006
 cautious-0.005
 consider-0.005
 disl-0.005
Token to
Token ID284
Feature activation-1.22e-02

Pos logit contributions
robe+0.087
roller+0.082
 ceiling+0.077
room+0.076
ork+0.075
Neg logit contributions
 bra-0.097
 aware-0.096
 disl-0.093
 cautious-0.091
 consider-0.090
Token fix
Token ID4259
Feature activation+1.39e-03

Pos logit contributions
robe+0.018
roller+0.017
room+0.017
 ceiling+0.016
ork+0.016
Neg logit contributions
 aware-0.017
 bra-0.017
 cautious-0.016
 disl-0.016
 questions-0.016
Token its
Token ID663
Feature activation+1.42e-02

Pos logit contributions
 bra+0.002
 aware+0.002
 questions+0.002
 disl+0.002
 proud+0.002
Neg logit contributions
robe-0.002
roller-0.002
 ceiling-0.002
ocated-0.002
room-0.002
Token wing
Token ID8539
Feature activation+8.24e-03

Pos logit contributions
 bra+0.023
 aware+0.023
 disl+0.022
 cautious+0.022
 consider+0.022
Neg logit contributions
robe-0.018
roller-0.017
 ceiling-0.016
ork-0.016
room-0.015
Token and
Token ID290
Feature activation-1.24e-02

Pos logit contributions
 aware+0.009
 bra+0.008
 disl+0.008
 cautious+0.008
 consider+0.008
Neg logit contributions
robe-0.016
roller-0.015
room-0.014
 ceiling-0.014
ork-0.014
Token it
Token ID340
Feature activation-2.74e-02

Pos logit contributions
roller+0.026
room+0.025
robe+0.025
ase+0.025
ork+0.024
Neg logit contributions
 bra-0.018
 aware-0.017
 disl-0.017
 proud-0.017
 consider-0.017
Token was
Token ID373
Feature activation-3.59e-02

Pos logit contributions
roller+0.049
ifier+0.044
room+0.044
ings+0.043
olk+0.042
Neg logit contributions
 bra-0.038
 brave-0.036
 proud-0.034
 cal-0.032
 reassured-0.032
Token decorated
Token ID24789
Feature activation-2.13e-02

Pos logit contributions
robe+0.062
roller+0.062
room+0.059
ase+0.058
ork+0.056
Neg logit contributions
 bra-0.045
 aware-0.044
 proud-0.042
 consider-0.042
 cautious-0.040
Token just
Token ID655
Feature activation-2.37e-02

Pos logit contributions
roller+0.046
robe+0.042
attered+0.041
 ceiling+0.041
Mine+0.041
Neg logit contributions
 bra-0.018
But-0.017
 aware-0.015
 proud-0.015
 brave-0.014
Token the
Token ID262
Feature activation-2.43e-02

Pos logit contributions
robe+0.045
roller+0.042
room+0.041
 ceiling+0.041
tered+0.040
Neg logit contributions
 bra-0.025
 aware-0.025
 disl-0.023
 cautious-0.023
 consider-0.022
Token way
Token ID835
Feature activation-6.17e-02

Pos logit contributions
robe+0.033
roller+0.032
room+0.030
ase+0.029
tered+0.029
Neg logit contributions
 bra-0.038
 aware-0.036
 questions-0.036
 proud-0.035
 cautious-0.035
Token she
Token ID673
Feature activation-4.49e-03

Pos logit contributions
robe+0.123
roller+0.120
ase+0.120
Mine+0.120
room+0.118
Neg logit contributions
 bra-0.049
 proud-0.048
 aware-0.043
 brave-0.043
?-0.043
Token liked
Token ID8288
Feature activation-9.30e-03

Pos logit contributions
robe+0.007
roller+0.007
room+0.006
 ceiling+0.006
ork+0.006
Neg logit contributions
 bra-0.007
 aware-0.006
 cautious-0.006
 consider-0.006
 proud-0.006
Token it
Token ID340
Feature activation-1.11e-02

Pos logit contributions
robe+0.021
roller+0.021
room+0.019
 ceiling+0.019
attered+0.019
Neg logit contributions
 bra-0.006
 aware-0.006
 questions-0.006
 disl-0.005
 cautious-0.005
Token.
Token ID13
Feature activation+3.08e-03

Pos logit contributions
roller+0.024
robe+0.024
room+0.023
ase+0.022
attered+0.022
Neg logit contributions
 bra-0.009
 aware-0.008
 proud-0.007
 consider-0.007
 cautious-0.007
Token<|endoftext|>
Token ID50256
Feature activation+2.31e-02

Pos logit contributions
 bra+0.003
 aware+0.003
 disl+0.003
 cautious+0.003
 consider+0.003
Neg logit contributions
robe-0.006
roller-0.006
room-0.006
 ceiling-0.006
ase-0.006
Token a
Token ID257
Feature activation+1.27e-02

Pos logit contributions
 bra+0.006
 questions+0.006
 aware+0.006
 disl+0.006
 cautious+0.006
Neg logit contributions
robe-0.008
roller-0.008
ase-0.008
room-0.008
ocated-0.007
Token lamp
Token ID20450
Feature activation+8.55e-03

Pos logit contributions
 aware+0.022
 bra+0.021
 disl+0.021
 consider+0.020
 cautious+0.020
Neg logit contributions
robe-0.016
roller-0.015
 ceiling-0.014
ork-0.014
room-0.014
Token to
Token ID284
Feature activation-5.02e-03

Pos logit contributions
 bra+0.009
 aware+0.009
 questions+0.009
 cautious+0.009
 disl+0.008
Neg logit contributions
robe-0.016
roller-0.015
room-0.014
 ceiling-0.014
ocated-0.014
Token see
Token ID766
Feature activation-4.66e-03

Pos logit contributions
robe+0.008
roller+0.008
room+0.007
ase+0.007
 ceiling+0.007
Neg logit contributions
 bra-0.007
 aware-0.007
 cautious-0.006
 consider-0.006
 disl-0.006
Token the
Token ID262
Feature activation-2.20e-03

Pos logit contributions
robe+0.008
roller+0.007
ase+0.007
room+0.007
ork+0.007
Neg logit contributions
 bra-0.006
 aware-0.006
 questions-0.006
 cautious-0.006
 consider-0.006
Token way
Token ID835
Feature activation-6.14e-02

Pos logit contributions
robe+0.003
 ceiling+0.003
roller+0.003
room+0.003
 shelf+0.002
Neg logit contributions
 aware-0.004
 disl-0.004
 consider-0.004
 bra-0.004
 cautious-0.003
Token.
Token ID13
Feature activation+1.21e-02

Pos logit contributions
robe+0.104
ase+0.099
Mine+0.097
roller+0.097
room+0.095
Neg logit contributions
 bra-0.069
 questions-0.069
 aware-0.068
But-0.068
 brave-0.065
Token The
Token ID383
Feature activation-9.81e-03

Pos logit contributions
 bra+0.014
 aware+0.013
 questions+0.013
But+0.012
 cautious+0.012
Neg logit contributions
robe-0.022
roller-0.020
room-0.020
ase-0.020
ocated-0.020
Token lamp
Token ID20450
Feature activation-1.67e-02

Pos logit contributions
robe+0.010
 ceiling+0.009
roller+0.008
room+0.008
ase+0.007
Neg logit contributions
 aware-0.020
 bra-0.019
 disl-0.019
 consider-0.019
 cautious-0.018
Token was
Token ID373
Feature activation-1.20e-02

Pos logit contributions
robe+0.021
roller+0.020
room+0.019
ocated+0.019
Mine+0.018
Neg logit contributions
 bra-0.028
 aware-0.027
 cautious-0.026
 questions-0.025
 consider-0.025
Token old
Token ID1468
Feature activation-4.23e-03

Pos logit contributions
robe+0.019
roller+0.018
room+0.018
 ceiling+0.017
ork+0.017
Neg logit contributions
 bra-0.016
 aware-0.016
 disl-0.015
 cautious-0.015
 consider-0.015
Token her
Token ID607
Feature activation-9.87e-03

Pos logit contributions
 bra+0.000
 aware+0.000
 cautious+0.000
 disl+0.000
 consider+0.000
Neg logit contributions
robe-0.000
roller-0.000
room-0.000
 ceiling-0.000
ork-0.000
Token grandma
Token ID49890
Feature activation-2.61e-03

Pos logit contributions
robe+0.012
roller+0.011
room+0.010
 ceiling+0.010
ase+0.010
Neg logit contributions
 bra-0.018
 aware-0.017
 cautious-0.017
 disl-0.017
 consider-0.017
Token.
Token ID13
Feature activation-1.31e-02

Pos logit contributions
robe+0.004
roller+0.004
room+0.004
 ceiling+0.004
ase+0.004
Neg logit contributions
 disl-0.004
 aware-0.004
 bra-0.004
 cautious-0.003
 consider-0.003
Token On
Token ID1550
Feature activation-1.58e-02

Pos logit contributions
ocated+0.025
robe+0.025
kin+0.025
ase+0.024
ied+0.024
Neg logit contributions
 questions-0.010
But-0.010
 disl-0.010
 proud-0.010
 tell-0.009
Token the
Token ID262
Feature activation-2.16e-02

Pos logit contributions
robe+0.034
attered+0.033
ocated+0.032
roller+0.031
arded+0.030
Neg logit contributions
 bra-0.014
 saying-0.013
But-0.012
 proud-0.012
 tell-0.012
Token way
Token ID835
Feature activation-6.12e-02

Pos logit contributions
robe+0.034
ocated+0.032
tered+0.030
kin+0.029
room+0.029
Neg logit contributions
 bra-0.030
 aware-0.028
 tell-0.028
 proud-0.027
 med-0.027
Token,
Token ID11
Feature activation-1.54e-02

Pos logit contributions
Mine+0.172
tered+0.170
attered+0.169
robe+0.166
ocated+0.166
Neg logit contributions
,-0.003
 and+0.001
?+0.003
 next+0.004
 brave+0.005
Token she
Token ID673
Feature activation-6.49e-03

Pos logit contributions
roller+0.038
ase+0.036
attered+0.035
ool+0.035
ings+0.035
Neg logit contributions
But-0.010
 proud-0.009
 then-0.008
 brave-0.008
 bra-0.007
Token found
Token ID1043
Feature activation+1.46e-02

Pos logit contributions
robe+0.014
roller+0.013
room+0.013
Mine+0.013
 ceiling+0.013
Neg logit contributions
 bra-0.006
 aware-0.005
 proud-0.005
But-0.005
 brave-0.004
Token a
Token ID257
Feature activation-1.14e-02

Pos logit contributions
 bra+0.011
 brave+0.010
 proud+0.009
 aware+0.008
 questions+0.008
Neg logit contributions
roller-0.035
robe-0.033
oom-0.032
ocated-0.031
 budget-0.031
Token shiny
Token ID22441
Feature activation-1.05e-02

Pos logit contributions
robe+0.020
room+0.019
tered+0.019
ase+0.018
roller+0.018
Neg logit contributions
 bra-0.014
 proud-0.013
 aware-0.013
 brave-0.012
 cautious-0.012
Token.
Token ID13
Feature activation-1.59e-02

Pos logit contributions
robe+0.041
roller+0.039
 ceiling+0.038
attered+0.038
Mine+0.038
Neg logit contributions
 bra-0.013
 aware-0.012
 questions-0.011
But-0.011
 cautious-0.010
Token
Token ID198
Feature activation-9.82e-03

Pos logit contributions
robe+0.024
roller+0.023
room+0.022
 ceiling+0.021
ase+0.021
Neg logit contributions
 bra-0.023
 aware-0.023
 disl-0.021
 cautious-0.021
 consider-0.021
Token
Token ID198
Feature activation-1.21e-02

Pos logit contributions
robe+0.019
roller+0.019
room+0.018
 ceiling+0.018
ork+0.017
Neg logit contributions
 bra-0.010
 aware-0.010
 disl-0.009
 cautious-0.009
 consider-0.009
TokenOn
Token ID2202
Feature activation-3.76e-04

Pos logit contributions
robe+0.024
roller+0.023
room+0.023
 ceiling+0.022
ocated+0.022
Neg logit contributions
 bra-0.012
 aware-0.011
 disl-0.011
 cautious-0.010
 questions-0.010
Token the
Token ID262
Feature activation-1.58e-02

Pos logit contributions
attered+0.001
robe+0.001
ase+0.001
arded+0.001
ocated+0.001
Neg logit contributions
 bra-0.000
 then-0.000
But-0.000
 proud-0.000
 saying-0.000
Token way
Token ID835
Feature activation-6.11e-02

Pos logit contributions
robe+0.023
ase+0.021
attered+0.021
uttered+0.020
ocated+0.019
Neg logit contributions
 bra-0.026
But-0.024
 tell-0.024
 ze-0.024
 saying-0.024
Token to
Token ID284
Feature activation-7.11e-04

Pos logit contributions
robe+0.135
tered+0.133
Mine+0.129
ool+0.128
ocated+0.128
Neg logit contributions
But-0.036
 brave-0.031
?-0.031
 next-0.031
 then-0.031
Token the
Token ID262
Feature activation-1.02e-02

Pos logit contributions
attered+0.002
robe+0.002
ase+0.002
ied+0.002
box+0.002
Neg logit contributions
 bra-0.001
 tell-0.000
 proud-0.000
But-0.000
 med-0.000
Token park
Token ID3952
Feature activation-4.28e-03

Pos logit contributions
robe+0.019
tered+0.018
attered+0.017
ase+0.017
uttered+0.017
Neg logit contributions
 bra-0.012
But-0.011
 med-0.011
 proud-0.011
 questions-0.011
Token,
Token ID11
Feature activation-8.58e-03

Pos logit contributions
robe+0.011
Mine+0.010
ifier+0.010
attered+0.010
ocated+0.010
Neg logit contributions
 bra-0.002
 proud-0.002
But-0.002
 questions-0.002
 brave-0.002
Token they
Token ID484
Feature activation+1.51e-03

Pos logit contributions
ase+0.016
robe+0.016
roller+0.016
ool+0.015
attered+0.015
Neg logit contributions
But-0.009
 proud-0.009
 bra-0.009
 brave-0.008
 aware-0.008
Tokeny
Token ID88
Feature activation-4.13e-03

Pos logit contributions
robe+0.014
roller+0.011
room+0.010
ase+0.009
 ceiling+0.009
Neg logit contributions
 bra-0.053
 aware-0.052
 disl-0.051
 consider-0.051
 cautious-0.050
Token showed
Token ID3751
Feature activation-1.47e-02

Pos logit contributions
robe+0.007
roller+0.006
room+0.006
 ceiling+0.006
tered+0.006
Neg logit contributions
 bra-0.005
 aware-0.005
 disl-0.005
 consider-0.005
 questions-0.005
Token the
Token ID262
Feature activation+9.95e-03

Pos logit contributions
robe+0.026
roller+0.025
room+0.023
ase+0.023
attered+0.023
Neg logit contributions
 bra-0.018
 aware-0.018
 questions-0.017
 disl-0.016
 cautious-0.016
Token bug
Token ID5434
Feature activation-1.00e-02

Pos logit contributions
 bra+0.015
 aware+0.014
 cautious+0.014
 disl+0.014
 questions+0.013
Neg logit contributions
robe-0.015
roller-0.014
room-0.014
tered-0.013
ase-0.013
Token the
Token ID262
Feature activation+1.24e-03

Pos logit contributions
robe+0.016
roller+0.015
room+0.015
 ceiling+0.014
ase+0.014
Neg logit contributions
 bra-0.014
 aware-0.013
 cautious-0.013
 disl-0.013
 consider-0.012
Token way
Token ID835
Feature activation-6.11e-02

Pos logit contributions
 bra+0.002
 aware+0.002
 cautious+0.002
 questions+0.002
 disl+0.002
Neg logit contributions
robe-0.002
roller-0.002
room-0.002
tered-0.002
ase-0.002
Token home
Token ID1363
Feature activation-2.37e-02

Pos logit contributions
robe+0.117
roller+0.112
room+0.111
ase+0.110
Mine+0.108
Neg logit contributions
 bra-0.062
 aware-0.059
 proud-0.058
But-0.057
 brave-0.055
Token and
Token ID290
Feature activation-6.56e-03

Pos logit contributions
Mine+0.054
roller+0.054
robe+0.052
room+0.050
ase+0.050
Neg logit contributions
 bra-0.016
 aware-0.015
 proud-0.013
 cautious-0.013
 brave-0.013
Token the
Token ID262
Feature activation-2.77e-04

Pos logit contributions
robe+0.012
roller+0.011
room+0.011
ase+0.011
 ceiling+0.011
Neg logit contributions
 bra-0.008
 aware-0.008
 cautious-0.007
 consider-0.007
 disl-0.007
Token bug
Token ID5434
Feature activation-1.52e-02

Pos logit contributions
robe+0.000
roller+0.000
room+0.000
 ceiling+0.000
ase+0.000
Neg logit contributions
 bra-0.000
 aware-0.000
 disl-0.000
 cautious-0.000
 consider-0.000
Token was
Token ID373
Feature activation-9.62e-03

Pos logit contributions
robe+0.022
roller+0.021
room+0.020
 ceiling+0.019
ase+0.019
Neg logit contributions
 bra-0.024
 aware-0.023
 cautious-0.022
 consider-0.022
 disl-0.021
Token hopes
Token ID7176
Feature activation-2.80e-03

Pos logit contributions
 aware+0.024
Just+0.022
 consider+0.022
 cautious+0.022
emy+0.022
Neg logit contributions
robe-0.020
roller-0.019
 ceiling-0.019
 glass-0.019
 section-0.018
Token they
Token ID484
Feature activation-2.68e-03

Pos logit contributions
robe+0.005
roller+0.005
ase+0.004
attered+0.004
room+0.004
Neg logit contributions
 bra-0.003
 questions-0.003
 aware-0.003
But-0.003
 cautious-0.003
Token will
Token ID481
Feature activation-2.05e-02

Pos logit contributions
robe+0.004
roller+0.003
 ceiling+0.003
room+0.003
ork+0.003
Neg logit contributions
 aware-0.004
 bra-0.004
 disl-0.004
 cautious-0.004
emy-0.004
Token stay
Token ID2652
Feature activation-2.13e-02

Pos logit contributions
robe+0.030
roller+0.029
room+0.029
 ceiling+0.028
ork+0.027
Neg logit contributions
 aware-0.030
 bra-0.029
emy-0.029
 cautious-0.028
 disl-0.028
Token that
Token ID326
Feature activation-9.79e-03

Pos logit contributions
robe+0.029
room+0.026
kin+0.024
roller+0.024
ings+0.024
Neg logit contributions
 disl-0.036
 eagle-0.033
 ze-0.033
 bra-0.033
cod-0.032
Token way
Token ID835
Feature activation-6.10e-02

Pos logit contributions
robe+0.018
roller+0.018
ocated+0.017
ase+0.017
ork+0.017
Neg logit contributions
 bra-0.010
 questions-0.010
 proud-0.010
 brave-0.009
 tell-0.009
Token.
Token ID13
Feature activation-5.41e-03

Pos logit contributions
robe+0.117
roller+0.109
ase+0.106
mail+0.105
room+0.104
Neg logit contributions
 aware-0.062
 bra-0.061
 questions-0.058
 proud-0.057
But-0.057
Token<|endoftext|>
Token ID50256
Feature activation+3.68e-02

Pos logit contributions
robe+0.008
roller+0.007
 ceiling+0.007
room+0.007
Mine+0.006
Neg logit contributions
 bra-0.008
 aware-0.008
 consider-0.008
ologist-0.008
 impressed-0.008
TokenIt
Token ID1026
Feature activation-1.65e-03

Pos logit contributions
 bra+0.027
 aware+0.027
 disl+0.024
 cautious+0.024
 consider+0.023
Neg logit contributions
robe-0.082
roller-0.078
room-0.076
 ceiling-0.075
ase-0.074
Token was
Token ID373
Feature activation-1.81e-02

Pos logit contributions
robe+0.002
room+0.001
 ceiling+0.001
roller+0.001
ase+0.001
Neg logit contributions
 bra-0.003
 aware-0.003
 consider-0.003
 cautious-0.003
 disl-0.003
Token a
Token ID257
Feature activation-4.67e-05

Pos logit contributions
robe+0.029
roller+0.028
ork+0.026
ocated+0.026
ase+0.025
Neg logit contributions
 bra-0.025
 aware-0.025
 questions-0.023
 proud-0.023
 cautious-0.023
Token They
Token ID1119
Feature activation+2.05e-02

Pos logit contributions
 bra+0.005
But+0.005
Just+0.005
 aware+0.005
 proud+0.005
Neg logit contributions
robe-0.010
 ceiling-0.009
ase-0.009
room-0.009
ocated-0.009
Token try
Token ID1949
Feature activation-1.00e-02

Pos logit contributions
 aware+0.032
 disl+0.031
smart+0.030
ologist+0.030
 bra+0.030
Neg logit contributions
robe-0.026
roller-0.025
Mine-0.024
ocated-0.024
 ceiling-0.024
Token to
Token ID284
Feature activation-3.27e-03

Pos logit contributions
robe+0.012
roller+0.012
ork+0.011
room+0.011
ocated+0.010
Neg logit contributions
 bra-0.016
 aware-0.016
 disl-0.016
 consider-0.016
 cautious-0.015
Token find
Token ID1064
Feature activation-1.89e-03

Pos logit contributions
robe+0.005
roller+0.005
room+0.005
 ceiling+0.004
ocated+0.004
Neg logit contributions
 aware-0.005
 bra-0.005
 disl-0.004
 cautious-0.004
 questions-0.004
Token the
Token ID262
Feature activation+1.12e-02

Pos logit contributions
robe+0.004
roller+0.004
ase+0.004
 ceiling+0.004
attered+0.003
Neg logit contributions
 bra-0.002
 brave-0.002
 questions-0.002
 aware-0.002
 cautious-0.002
Token way
Token ID835
Feature activation-6.00e-02

Pos logit contributions
 bra+0.014
 questions+0.013
 aware+0.013
 cautious+0.013
 brave+0.012
Neg logit contributions
robe-0.019
room-0.019
ase-0.019
roller-0.019
tered-0.019
Token back
Token ID736
Feature activation-9.66e-03

Pos logit contributions
robe+0.109
ase+0.101
room+0.100
roller+0.099
 ceiling+0.098
Neg logit contributions
 bra-0.067
But-0.064
 aware-0.064
 proud-0.062
 questions-0.062
Token to
Token ID284
Feature activation+9.55e-03

Pos logit contributions
robe+0.017
roller+0.016
room+0.016
 ceiling+0.016
ase+0.015
Neg logit contributions
 bra-0.011
 aware-0.011
 cautious-0.010
 questions-0.010
 consider-0.010
Token their
Token ID511
Feature activation+1.66e-02

Pos logit contributions
 bra+0.009
 tell+0.008
 saying+0.008
 brave+0.007
 aware+0.007
Neg logit contributions
robe-0.022
ase-0.022
attered-0.021
roller-0.020
bars-0.020
Token home
Token ID1363
Feature activation-2.93e-03

Pos logit contributions
 bra+0.025
 questions+0.024
 aware+0.024
 cautious+0.023
 trusts+0.022
Neg logit contributions
robe-0.025
tered-0.024
ase-0.023
room-0.023
roller-0.022
Token.
Token ID13
Feature activation+5.71e-03

Pos logit contributions
robe+0.007
Mine+0.007
ase+0.007
roller+0.007
attered+0.006
Neg logit contributions
 bra-0.002
 aware-0.002
 questions-0.001
 cautious-0.001
 proud-0.001
Token her
Token ID607
Feature activation+3.53e-03

Pos logit contributions
robe+0.020
attered+0.019
ase+0.019
roller+0.019
ometer+0.018
Neg logit contributions
 bra-0.010
 questions-0.010
But-0.010
 saying-0.010
 aware-0.010
Token pocket
Token ID10000
Feature activation-6.96e-03

Pos logit contributions
 proud+0.005
 aware+0.005
 bra+0.005
 brave+0.005
 cautious+0.005
Neg logit contributions
robe-0.005
tered-0.005
ocated-0.005
room-0.005
roller-0.004
Token.
Token ID13
Feature activation-2.58e-03

Pos logit contributions
robe+0.015
Mine+0.015
roller+0.015
room+0.014
ase+0.014
Neg logit contributions
 aware-0.005
 bra-0.005
 cautious-0.004
 questions-0.004
 consider-0.004
Token On
Token ID1550
Feature activation-6.80e-03

Pos logit contributions
robe+0.005
kin+0.005
room+0.005
ase+0.005
roller+0.004
Neg logit contributions
 bra-0.002
Just-0.002
But-0.002
 aware-0.002
 questions-0.002
Token the
Token ID262
Feature activation-1.04e-02

Pos logit contributions
attered+0.017
robe+0.016
ase+0.015
ometer+0.015
suit+0.015
Neg logit contributions
 then-0.004
 saying-0.004
 bra-0.004
But-0.004
 proud-0.003
Token way
Token ID835
Feature activation-5.99e-02

Pos logit contributions
robe+0.015
uttered+0.013
ase+0.013
attered+0.013
ings+0.013
Neg logit contributions
But-0.017
 bra-0.017
 saying-0.017
 tell-0.016
Just-0.016
Token to
Token ID284
Feature activation+5.37e-04

Pos logit contributions
robe+0.139
Mine+0.138
attered+0.132
tered+0.130
mail+0.129
Neg logit contributions
 brave-0.028
 then-0.028
But-0.028
,-0.027
 and-0.025
Token her
Token ID607
Feature activation-1.17e-02

Pos logit contributions
 bra+0.000
 tell+0.000
But+0.000
 proud+0.000
 say+0.000
Neg logit contributions
attered-0.001
robe-0.001
ase-0.001
ometer-0.001
roller-0.001
Token grandma
Token ID49890
Feature activation+1.53e-03

Pos logit contributions
robe+0.021
tered+0.019
ase+0.018
kin+0.018
attered+0.018
Neg logit contributions
 bra-0.015
 brave-0.015
 tell-0.015
 aware-0.014
 proud-0.014
Token's
Token ID338
Feature activation-1.50e-02

Pos logit contributions
 bra+0.002
 aware+0.002
 questions+0.002
 tell+0.002
 trusts+0.002
Neg logit contributions
robe-0.003
roller-0.003
attered-0.002
tered-0.002
 ceiling-0.002
Token house
Token ID2156
Feature activation+6.70e-03

Pos logit contributions
robe+0.026
tered+0.025
room+0.024
ocated+0.023
ase+0.023
Neg logit contributions
 aware-0.018
 bra-0.018
 questions-0.017
 cautious-0.017
 brave-0.017
Token it
Token ID340
Feature activation-1.34e-02

Pos logit contributions
 bra+0.007
 aware+0.007
 questions+0.006
 cautious+0.006
 consider+0.006
Neg logit contributions
robe-0.010
roller-0.010
ase-0.009
room-0.009
attered-0.009
Token would
Token ID561
Feature activation-2.70e-02

Pos logit contributions
robe+0.015
roller+0.014
 ceiling+0.013
room+0.013
ork+0.012
Neg logit contributions
 aware-0.024
 bra-0.024
 disl-0.023
 consider-0.023
 cautious-0.023
Token never
Token ID1239
Feature activation-2.06e-02

Pos logit contributions
robe+0.044
roller+0.042
room+0.041
ase+0.040
attered+0.039
Neg logit contributions
 bra-0.037
 aware-0.036
 consider-0.034
 disl-0.034
 cautious-0.033
Token find
Token ID1064
Feature activation-1.00e-02

Pos logit contributions
robe+0.024
roller+0.022
 ceiling+0.022
room+0.022
ocated+0.020
Neg logit contributions
 aware-0.035
 disl-0.034
 bra-0.034
 cautious-0.033
Just-0.032
Token its
Token ID663
Feature activation+6.92e-03

Pos logit contributions
robe+0.017
roller+0.017
ase+0.016
room+0.016
ocated+0.015
Neg logit contributions
 bra-0.014
 aware-0.013
 questions-0.012
 brave-0.012
 cautious-0.012
Token way
Token ID835
Feature activation-5.99e-02

Pos logit contributions
 bra+0.015
 aware+0.014
 disl+0.014
 consider+0.014
 cautious+0.014
Neg logit contributions
robe-0.006
roller-0.005
 ceiling-0.005
room-0.004
ork-0.004
Token back
Token ID736
Feature activation-2.11e-02

Pos logit contributions
robe+0.088
roller+0.081
room+0.081
ase+0.078
tered+0.076
Neg logit contributions
 aware-0.089
 bra-0.089
 cautious-0.083
 consider-0.082
 questions-0.082
Token home
Token ID1363
Feature activation-1.75e-02

Pos logit contributions
robe+0.027
roller+0.025
room+0.024
 ceiling+0.023
ork+0.022
Neg logit contributions
 bra-0.035
 aware-0.035
 disl-0.034
 cautious-0.033
 consider-0.033
Token.
Token ID13
Feature activation+5.23e-03

Pos logit contributions
robe+0.040
Mine+0.039
roller+0.039
ase+0.038
room+0.038
Neg logit contributions
 aware-0.011
 bra-0.011
 cautious-0.010
 questions-0.009
 proud-0.009
Token But
Token ID887
Feature activation+6.70e-03

Pos logit contributions
 bra+0.008
 aware+0.008
 cautious+0.007
 disl+0.007
 consider+0.007
Neg logit contributions
robe-0.008
roller-0.007
room-0.007
 ceiling-0.007
ase-0.007
Token then
Token ID788
Feature activation+1.94e-02

Pos logit contributions
 bra+0.007
 aware+0.007
 cautious+0.006
 consider+0.006
 questions+0.006
Neg logit contributions
robe-0.013
roller-0.013
room-0.012
 ceiling-0.012
ase-0.012
Token find
Token ID1064
Feature activation-1.73e-03

Pos logit contributions
robe+0.009
room+0.009
tered+0.009
ocated+0.009
 ceiling+0.009
Neg logit contributions
 disl-0.013
 aware-0.012
 impressed-0.012
 importantly-0.012
 knowledgeable-0.012
Token their
Token ID511
Feature activation+6.18e-03

Pos logit contributions
robe+0.004
roller+0.004
ase+0.003
 ceiling+0.003
kin+0.003
Neg logit contributions
 bra-0.002
 questions-0.002
 brave-0.002
 elephants-0.002
 tell-0.002
Token ball
Token ID2613
Feature activation-4.07e-03

Pos logit contributions
 bra+0.010
 aware+0.010
 questions+0.010
 cautious+0.010
 disl+0.010
Neg logit contributions
robe-0.008
room-0.007
roller-0.007
ase-0.007
tered-0.007
Token or
Token ID393
Feature activation-1.33e-02

Pos logit contributions
robe+0.009
roller+0.009
ase+0.008
Mine+0.008
room+0.008
Neg logit contributions
 bra-0.003
 brave-0.003
smart-0.002
?-0.002
 proud-0.002
Token the
Token ID262
Feature activation-6.36e-04

Pos logit contributions
robe+0.022
roller+0.021
room+0.020
ase+0.020
ork+0.020
Neg logit contributions
 bra-0.018
 aware-0.017
 disl-0.016
 questions-0.016
 consider-0.016
Token way
Token ID835
Feature activation-5.98e-02

Pos logit contributions
robe+0.001
roller+0.001
 ceiling+0.001
room+0.001
ork+0.001
Neg logit contributions
 aware-0.001
 bra-0.001
 disl-0.001
 consider-0.001
 cautious-0.001
Token back
Token ID736
Feature activation-1.75e-02

Pos logit contributions
robe+0.097
roller+0.089
room+0.088
ase+0.086
tered+0.086
Neg logit contributions
 bra-0.078
 aware-0.075
 proud-0.072
 cautious-0.071
But-0.071
Token.
Token ID13
Feature activation+1.63e-03

Pos logit contributions
robe+0.031
roller+0.030
room+0.029
 ceiling+0.028
ork+0.027
Neg logit contributions
 bra-0.021
 aware-0.021
 disl-0.019
 cautious-0.019
 consider-0.019
Token
Token ID198
Feature activation-3.55e-03

Pos logit contributions
 bra+0.002
 aware+0.002
 disl+0.002
 cautious+0.002
 consider+0.002
Neg logit contributions
robe-0.003
roller-0.003
room-0.003
 ceiling-0.003
ase-0.002
Token
Token ID198
Feature activation+5.22e-03

Pos logit contributions
robe+0.007
roller+0.007
room+0.006
ork+0.006
 ceiling+0.006
Neg logit contributions
 bra-0.004
 aware-0.004
 cautious-0.003
 disl-0.003
 consider-0.003
Token"
Token ID1
Feature activation+4.93e-03

Pos logit contributions
 bra+0.006
 aware+0.006
 disl+0.005
 questions+0.005
 cautious+0.005
Neg logit contributions
roller-0.009
robe-0.009
room-0.009
ork-0.009
tered-0.009
Token some
Token ID617
Feature activation+1.86e-03

Pos logit contributions
robe+0.029
roller+0.028
room+0.027
 ceiling+0.026
ase+0.026
Neg logit contributions
 bra-0.021
 aware-0.020
 disl-0.019
 consider-0.019
 cautious-0.019
Token ice
Token ID4771
Feature activation-1.62e-02

Pos logit contributions
 aware+0.003
 disl+0.003
ologist+0.003
 cautious+0.003
Just+0.003
Neg logit contributions
robe-0.003
 ceiling-0.002
 glass-0.002
room-0.002
roller-0.002
Token cream
Token ID8566
Feature activation-3.62e-03

Pos logit contributions
robe+0.023
roller+0.022
room+0.021
 ceiling+0.021
ase+0.020
Neg logit contributions
 bra-0.024
 aware-0.024
 cautious-0.023
 disl-0.023
 consider-0.023
Token on
Token ID319
Feature activation-2.59e-02

Pos logit contributions
robe+0.007
roller+0.007
room+0.006
 ceiling+0.006
ase+0.006
Neg logit contributions
 bra-0.004
 aware-0.004
 disl-0.003
 cautious-0.003
 consider-0.003
Token the
Token ID262
Feature activation-8.90e-03

Pos logit contributions
robe+0.050
roller+0.048
attered+0.047
ocated+0.046
ase+0.045
Neg logit contributions
But-0.027
 bra-0.025
?-0.024
 questions-0.023
 disl-0.023
Token way
Token ID835
Feature activation-5.97e-02

Pos logit contributions
robe+0.012
room+0.011
roller+0.011
ocated+0.010
ase+0.010
Neg logit contributions
 bra-0.015
 aware-0.013
But-0.013
 questions-0.013
 proud-0.013
Token back
Token ID736
Feature activation-2.01e-02

Pos logit contributions
robe+0.135
roller+0.129
mail+0.127
ocated+0.126
Mine+0.126
Neg logit contributions
 bra-0.033
But-0.032
 aware-0.031
 questions-0.031
 proud-0.028
Token.
Token ID13
Feature activation-6.62e-03

Pos logit contributions
robe+0.039
roller+0.037
tered+0.036
room+0.036
ocated+0.036
Neg logit contributions
 bra-0.019
 aware-0.019
 questions-0.018
 cautious-0.017
 consider-0.017
Token
Token ID198
Feature activation-5.39e-03

Pos logit contributions
robe+0.009
roller+0.009
 ceiling+0.008
room+0.008
ork+0.007
Neg logit contributions
 aware-0.011
 bra-0.011
cod-0.011
 cautious-0.010
ologist-0.010
Token
Token ID198
Feature activation-5.66e-03

Pos logit contributions
robe+0.008
roller+0.008
room+0.008
ork+0.008
ase+0.008
Neg logit contributions
 aware-0.007
 bra-0.007
ologist-0.007
 cautious-0.007
 disl-0.007
TokenL
Token ID43
Feature activation-2.72e-02

Pos logit contributions
room+0.005
roller+0.005
robe+0.005
Mine+0.005
ork+0.005
Neg logit contributions
 aware-0.011
 disl-0.011
 bra-0.011
ologist-0.011
 questions-0.010
Token.
Token ID13
Feature activation-1.95e-03

Pos logit contributions
 bra+0.001
 aware+0.001
 disl+0.001
 cautious+0.001
 consider+0.001
Neg logit contributions
robe-0.001
roller-0.001
room-0.001
 ceiling-0.001
ase-0.001
Token The
Token ID383
Feature activation+4.98e-03

Pos logit contributions
robe+0.004
roller+0.004
ork+0.004
ocated+0.004
room+0.004
Neg logit contributions
 bra-0.002
 aware-0.002
 cautious-0.002
 questions-0.002
 trusts-0.002
Token map
Token ID3975
Feature activation-1.28e-02

Pos logit contributions
 bra+0.008
 aware+0.008
 questions+0.007
 cautious+0.007
 consider+0.007
Neg logit contributions
robe-0.007
roller-0.006
room-0.006
ase-0.006
 ceiling-0.006
Token showed
Token ID3751
Feature activation-3.94e-03

Pos logit contributions
robe+0.019
roller+0.019
room+0.018
 ceiling+0.017
attered+0.017
Neg logit contributions
 bra-0.018
 aware-0.017
 proud-0.017
 cautious-0.017
 trusts-0.016
Token the
Token ID262
Feature activation+6.72e-03

Pos logit contributions
robe+0.009
roller+0.008
ase+0.008
 ceiling+0.008
attered+0.007
Neg logit contributions
 bra-0.004
 questions-0.004
 aware-0.004
 brave-0.004
 trusts-0.003
Token way
Token ID835
Feature activation-5.96e-02

Pos logit contributions
 bra+0.010
 aware+0.010
 cautious+0.009
 questions+0.009
 disl+0.009
Neg logit contributions
robe-0.010
roller-0.009
room-0.009
ase-0.009
 ceiling-0.009
Token to
Token ID284
Feature activation+2.00e-03

Pos logit contributions
robe+0.121
ase+0.116
room+0.115
Mine+0.113
roller+0.110
Neg logit contributions
 questions-0.053
 proud-0.053
 saying-0.051
 aware-0.051
 bra-0.051
Token a
Token ID257
Feature activation+1.67e-03

Pos logit contributions
 tell+0.002
 bra+0.002
 questions+0.002
 say+0.002
 saying+0.002
Neg logit contributions
attered-0.005
robe-0.004
ase-0.004
roller-0.004
olk-0.004
Token big
Token ID1263
Feature activation+3.40e-03

Pos logit contributions
 aware+0.003
 bra+0.002
 disl+0.002
 cautious+0.002
 consider+0.002
Neg logit contributions
robe-0.002
roller-0.002
 ceiling-0.002
ork-0.002
room-0.002
Token treasure
Token ID14068
Feature activation-4.47e-03

Pos logit contributions
 aware+0.005
 bra+0.005
 disl+0.005
 cautious+0.005
 consider+0.005
Neg logit contributions
robe-0.005
roller-0.004
 ceiling-0.004
room-0.004
ork-0.004
Token chest
Token ID7721
Feature activation-1.59e-02

Pos logit contributions
robe+0.008
roller+0.008
room+0.008
 ceiling+0.007
tered+0.007
Neg logit contributions
 bra-0.005
 aware-0.005
 cautious-0.004
 questions-0.004
 consider-0.004
Token He
Token ID679
Feature activation-4.52e-03

Pos logit contributions
robe+0.008
roller+0.007
room+0.007
 ceiling+0.007
ase+0.007
Neg logit contributions
 bra-0.005
 aware-0.005
 disl-0.005
 cautious-0.005
 consider-0.005
Token couldn
Token ID3521
Feature activation+1.40e-03

Pos logit contributions
robe+0.007
roller+0.007
room+0.007
ase+0.007
 ceiling+0.006
Neg logit contributions
 bra-0.006
 cautious-0.006
 aware-0.006
 brave-0.006
Just-0.006
Token't
Token ID470
Feature activation-1.49e-02

Pos logit contributions
 bra+0.002
But+0.002
Just+0.002
 proud+0.002
Voice+0.002
Neg logit contributions
robe-0.002
roller-0.002
attered-0.002
room-0.002
ase-0.002
Token find
Token ID1064
Feature activation-6.04e-03

Pos logit contributions
robe+0.022
roller+0.021
room+0.020
 ceiling+0.020
ork+0.019
Neg logit contributions
 bra-0.022
 aware-0.022
 disl-0.020
 cautious-0.020
 consider-0.020
Token the
Token ID262
Feature activation-1.28e-02

Pos logit contributions
robe+0.014
roller+0.014
ase+0.013
attered+0.013
ork+0.013
Neg logit contributions
 bra-0.005
 brave-0.004
 why-0.004
 questions-0.004
smart-0.004
Token way
Token ID835
Feature activation-5.96e-02

Pos logit contributions
robe+0.015
roller+0.015
room+0.014
ase+0.013
tered+0.013
Neg logit contributions
 bra-0.022
 aware-0.022
 questions-0.021
 cautious-0.021
 disl-0.021
Token out
Token ID503
Feature activation-9.70e-03

Pos logit contributions
robe+0.107
ase+0.101
room+0.100
Mine+0.099
tered+0.098
Neg logit contributions
 bra-0.065
 aware-0.063
 proud-0.061
 questions-0.060
But-0.060
Token.
Token ID13
Feature activation-1.20e-03

Pos logit contributions
robe+0.018
roller+0.017
room+0.017
 ceiling+0.016
ork+0.016
Neg logit contributions
 bra-0.011
 aware-0.011
 disl-0.010
 cautious-0.010
 consider-0.010
Token He
Token ID679
Feature activation-1.38e-03

Pos logit contributions
robe+0.002
roller+0.002
room+0.002
 ceiling+0.002
ork+0.002
Neg logit contributions
 bra-0.001
 aware-0.001
 disl-0.001
 cautious-0.001
 consider-0.001
Token got
Token ID1392
Feature activation-3.83e-03

Pos logit contributions
robe+0.002
roller+0.002
room+0.002
ase+0.002
 ceiling+0.002
Neg logit contributions
 bra-0.002
 aware-0.002
 cautious-0.002
 brave-0.002
 questions-0.002
Token frustrated
Token ID14718
Feature activation+1.01e-02

Pos logit contributions
robe+0.007
roller+0.007
room+0.007
ase+0.007
 ceiling+0.007
Neg logit contributions
 bra-0.004
 aware-0.004
 cautious-0.004
 questions-0.004
 proud-0.004
Token he
Token ID339
Feature activation-5.98e-03

Pos logit contributions
 aware+0.002
 bra+0.002
 disl+0.002
 cautious+0.002
 consider+0.002
Neg logit contributions
robe-0.004
roller-0.003
room-0.003
 ceiling-0.003
tered-0.003
Token slowly
Token ID6364
Feature activation-2.90e-03

Pos logit contributions
robe+0.010
roller+0.009
room+0.009
 ceiling+0.009
ase+0.008
Neg logit contributions
 bra-0.008
 aware-0.008
 disl-0.008
 cautious-0.008
 consider-0.007
Token crawled
Token ID45668
Feature activation-9.82e-03

Pos logit contributions
robe+0.004
roller+0.004
room+0.004
 ceiling+0.004
ork+0.004
Neg logit contributions
 aware-0.004
 disl-0.004
 bra-0.004
 consider-0.004
 cautious-0.004
Token back
Token ID736
Feature activation-1.26e-02

Pos logit contributions
robe+0.015
roller+0.014
room+0.014
 ceiling+0.013
ase+0.013
Neg logit contributions
 bra-0.014
 aware-0.014
 cautious-0.013
 questions-0.013
 disl-0.013
Token the
Token ID262
Feature activation+7.30e-04

Pos logit contributions
robe+0.018
roller+0.017
room+0.016
 ceiling+0.016
ork+0.015
Neg logit contributions
 bra-0.020
 aware-0.019
 disl-0.019
 consider-0.019
 cautious-0.018
Token way
Token ID835
Feature activation-5.96e-02

Pos logit contributions
 bra+0.001
 aware+0.001
 cautious+0.001
 questions+0.001
 disl+0.001
Neg logit contributions
robe-0.001
roller-0.001
room-0.001
ase-0.001
tered-0.001
Token he
Token ID339
Feature activation-5.15e-03

Pos logit contributions
robe+0.116
room+0.107
ase+0.106
roller+0.105
Mine+0.104
Neg logit contributions
 bra-0.062
 aware-0.059
 proud-0.057
 cautious-0.054
But-0.054
Token came
Token ID1625
Feature activation-1.75e-03

Pos logit contributions
robe+0.008
roller+0.008
room+0.008
 ceiling+0.007
ase+0.007
Neg logit contributions
 bra-0.007
 aware-0.007
 cautious-0.007
 consider-0.006
 trusts-0.006
Token.
Token ID13
Feature activation-2.93e-03

Pos logit contributions
robe+0.002
roller+0.002
room+0.002
 ceiling+0.002
ork+0.002
Neg logit contributions
 disl-0.003
 bra-0.003
 aware-0.003
 consider-0.002
 cautious-0.002
Token
Token ID220
Feature activation-1.89e-03

Pos logit contributions
robe+0.005
roller+0.004
room+0.004
 ceiling+0.004
ork+0.004
Neg logit contributions
 bra-0.004
 aware-0.004
 disl-0.004
 cautious-0.004
 consider-0.004
Token
Token ID198
Feature activation-8.72e-03

Pos logit contributions
robe+0.004
roller+0.004
 ceiling+0.004
room+0.004
ase+0.004
Neg logit contributions
 aware-0.001
 bra-0.001
 cautious-0.001
 consider-0.001
 disl-0.001
Token and
Token ID290
Feature activation-1.78e-03

Pos logit contributions
roller+0.036
robe+0.036
ase+0.034
Mine+0.034
ork+0.034
Neg logit contributions
 aware-0.009
 bra-0.009
 brave-0.008
But-0.008
 cautious-0.008
Token could
Token ID714
Feature activation-3.22e-02

Pos logit contributions
roller+0.003
room+0.003
ase+0.003
robe+0.003
ool+0.003
Neg logit contributions
 brave-0.002
 proud-0.002
 bra-0.002
 aware-0.002
 cautious-0.002
Token not
Token ID407
Feature activation-2.34e-02

Pos logit contributions
ase+0.062
robe+0.062
roller+0.061
room+0.061
attered+0.060
Neg logit contributions
 bra-0.034
 aware-0.031
 proud-0.031
 brave-0.030
 tell-0.030
Token find
Token ID1064
Feature activation-7.23e-03

Pos logit contributions
robe+0.040
roller+0.038
ase+0.037
room+0.036
attered+0.036
Neg logit contributions
 bra-0.030
 aware-0.030
 consider-0.028
 cautious-0.027
 questions-0.027
Token its
Token ID663
Feature activation+4.20e-03

Pos logit contributions
robe+0.016
roller+0.016
ase+0.015
room+0.015
OM+0.014
Neg logit contributions
 bra-0.007
 brave-0.007
 questions-0.006
 aware-0.006
 why-0.006
Token way
Token ID835
Feature activation-5.93e-02

Pos logit contributions
 bra+0.007
 aware+0.007
 cautious+0.007
 questions+0.007
 consider+0.007
Neg logit contributions
robe-0.005
roller-0.005
room-0.005
ase-0.004
 ceiling-0.004
Token back
Token ID736
Feature activation-2.12e-02

Pos logit contributions
robe+0.089
room+0.085
roller+0.082
ase+0.082
Mine+0.081
Neg logit contributions
 aware-0.084
 bra-0.083
 proud-0.078
 cautious-0.077
 questions-0.077
Token home
Token ID1363
Feature activation-1.67e-02

Pos logit contributions
robe+0.038
roller+0.036
room+0.035
 ceiling+0.034
ase+0.034
Neg logit contributions
 aware-0.025
 bra-0.025
 cautious-0.023
 consider-0.022
 disl-0.022
Token.
Token ID13
Feature activation+3.99e-03

Pos logit contributions
Mine+0.041
robe+0.040
roller+0.040
ase+0.039
room+0.039
Neg logit contributions
 aware-0.008
 bra-0.008
 cautious-0.006
 proud-0.006
 brave-0.006
Token The
Token ID383
Feature activation+1.88e-02

Pos logit contributions
 bra+0.004
 aware+0.004
Just+0.004
But+0.004
 cautious+0.004
Neg logit contributions
robe-0.007
roller-0.007
room-0.007
ocated-0.007
ork-0.007
Token pretty
Token ID2495
Feature activation-6.44e-03

Pos logit contributions
 disl+0.025
smart+0.023
 bra+0.023
 aware+0.023
 consider+0.022
Neg logit contributions
robe-0.031
 ceiling-0.028
roller-0.028
tered-0.027
room-0.027
Token He
Token ID679
Feature activation-2.50e-03

Pos logit contributions
robe+0.010
roller+0.009
room+0.009
 ceiling+0.009
ase+0.009
Neg logit contributions
 aware-0.011
 bra-0.010
 disl-0.010
 consider-0.010
cod-0.010
Token liked
Token ID8288
Feature activation+8.46e-03

Pos logit contributions
robe+0.003
tered+0.003
roller+0.003
room+0.003
 ceiling+0.003
Neg logit contributions
 aware-0.004
 disl-0.004
 bra-0.004
 cautious-0.004
 questions-0.004
Token his
Token ID465
Feature activation-1.61e-03

Pos logit contributions
 aware+0.013
 bra+0.012
 cautious+0.012
 consider+0.012
 trusts+0.012
Neg logit contributions
robe-0.012
roller-0.011
 ceiling-0.011
room-0.011
ork-0.010
Token plane
Token ID6614
Feature activation+4.76e-03

Pos logit contributions
robe+0.002
roller+0.002
 ceiling+0.002
room+0.001
ork+0.001
Neg logit contributions
 aware-0.003
 bra-0.003
 disl-0.003
 consider-0.003
 cautious-0.003
Token the
Token ID262
Feature activation-9.95e-03

Pos logit contributions
 aware+0.005
 bra+0.005
 disl+0.004
 cautious+0.004
 consider+0.004
Neg logit contributions
robe-0.009
roller-0.009
room-0.009
 ceiling-0.009
ase-0.008
Token way
Token ID835
Feature activation-5.91e-02

Pos logit contributions
robe+0.015
 ceiling+0.014
roller+0.013
room+0.013
ase+0.012
Neg logit contributions
 aware-0.015
 disl-0.014
 bra-0.014
 cautious-0.014
 consider-0.014
Token it
Token ID340
Feature activation-1.09e-02

Pos logit contributions
robe+0.104
roller+0.099
ocated+0.098
room+0.096
ase+0.095
Neg logit contributions
 bra-0.065
 questions-0.062
 proud-0.060
 aware-0.059
But-0.057
Token was
Token ID373
Feature activation-1.26e-02

Pos logit contributions
robe+0.014
roller+0.014
room+0.013
 ceiling+0.013
ocated+0.013
Neg logit contributions
 bra-0.018
 aware-0.017
 cautious-0.016
 consider-0.016
 questions-0.016
Token,
Token ID11
Feature activation+3.73e-03

Pos logit contributions
robe+0.020
 ceiling+0.018
roller+0.017
tered+0.017
room+0.017
Neg logit contributions
 disl-0.017
 aware-0.017
 trusts-0.016
cod-0.016
emy-0.016
Token but
Token ID475
Feature activation+3.65e-04

Pos logit contributions
 aware+0.004
 disl+0.004
 bra+0.004
 cautious+0.004
 trusts+0.004
Neg logit contributions
robe-0.007
 ceiling-0.006
room-0.006
roller-0.006
tered-0.006
Token he
Token ID339
Feature activation-3.70e-03

Pos logit contributions
 bra+0.000
 aware+0.000
 cautious+0.000
 disl+0.000
 consider+0.000
Neg logit contributions
robe-0.001
roller-0.001
room-0.001
 ceiling-0.001
ase-0.001
Token.
Token ID13
Feature activation+5.83e-03

Pos logit contributions
 disl+0.029
 bra+0.028
 aware+0.027
 consider+0.026
 trusts+0.026
Neg logit contributions
robe-0.043
roller-0.038
 ceiling-0.038
room-0.037
attered-0.037
Token She
Token ID1375
Feature activation-1.76e-03

Pos logit contributions
 bra+0.007
 aware+0.007
 disl+0.007
 questions+0.007
 cautious+0.007
Neg logit contributions
robe-0.010
roller-0.009
room-0.009
ocated-0.009
ork-0.009
Token lived
Token ID5615
Feature activation+6.68e-03

Pos logit contributions
robe+0.003
roller+0.002
room+0.002
 ceiling+0.002
ase+0.002
Neg logit contributions
 bra-0.003
 aware-0.003
 disl-0.003
 cautious-0.003
 consider-0.002
Token in
Token ID287
Feature activation+1.35e-02

Pos logit contributions
 disl+0.010
 bra+0.010
 aware+0.009
 consider+0.009
ologist+0.009
Neg logit contributions
robe-0.010
room-0.009
roller-0.009
ase-0.009
tered-0.009
Token a
Token ID257
Feature activation+2.72e-02

Pos logit contributions
 bra+0.015
 aware+0.014
 brave+0.013
 cautious+0.013
But+0.013
Neg logit contributions
robe-0.028
roller-0.027
attered-0.026
ase-0.025
ocated-0.025
Token beautiful
Token ID4950
Feature activation+3.45e-02

Pos logit contributions
 bra+0.040
 aware+0.040
 disl+0.039
 consider+0.037
 questions+0.037
Neg logit contributions
robe-0.039
roller-0.038
room-0.035
 ceiling-0.035
ork-0.035
Token green
Token ID4077
Feature activation+2.39e-02

Pos logit contributions
 bra+0.051
 aware+0.049
 disl+0.048
 cautious+0.048
 questions+0.047
Neg logit contributions
robe-0.050
roller-0.050
 ceiling-0.047
room-0.046
ork-0.046
Token me
Token ID502
Feature activation+8.18e-03

Pos logit contributions
 bra+0.034
 aware+0.033
 disl+0.032
 cautious+0.032
 consider+0.031
Neg logit contributions
robe-0.037
roller-0.035
room-0.034
 ceiling-0.033
ase-0.032
Tokenadow
Token ID4584
Feature activation-3.43e-03

Pos logit contributions
 bra+0.011
 aware+0.010
 disl+0.010
 consider+0.010
 cautious+0.010
Neg logit contributions
robe-0.014
roller-0.013
room-0.012
ase-0.012
ork-0.012
Token.
Token ID13
Feature activation+8.56e-03

Pos logit contributions
robe+0.006
roller+0.006
room+0.006
 ceiling+0.006
ork+0.006
Neg logit contributions
 bra-0.004
 aware-0.004
 cautious-0.003
 disl-0.003
 consider-0.003
Token
Token ID198
Feature activation-8.07e-03

Pos logit contributions
 bra+0.010
 aware+0.010
 cautious+0.010
 disl+0.010
 consider+0.009
Neg logit contributions
robe-0.015
roller-0.014
room-0.014
 ceiling-0.014
ase-0.013
Token "
Token ID366
Feature activation+8.37e-03

Pos logit contributions
robe+0.025
roller+0.023
room+0.022
 ceiling+0.022
tered+0.021
Neg logit contributions
 aware-0.027
 bra-0.026
 cautious-0.025
 questions-0.025
 consider-0.025
TokenDo
Token ID5211
Feature activation-4.35e-03

Pos logit contributions
 questions+0.011
 bra+0.011
 aware+0.011
 trusts+0.011
 cautious+0.010
Neg logit contributions
robe-0.013
Mine-0.013
tered-0.012
room-0.012
ase-0.011
Token you
Token ID345
Feature activation+1.21e-02

Pos logit contributions
robe+0.008
roller+0.008
room+0.008
 ceiling+0.008
ork+0.007
Neg logit contributions
 bra-0.005
 aware-0.005
 cautious-0.004
 disl-0.004
 consider-0.004
Token hear
Token ID3285
Feature activation+1.41e-02

Pos logit contributions
Just+0.017
 aware+0.017
 bra+0.017
emy+0.017
 impressed+0.016
Neg logit contributions
room-0.016
 frame-0.015
 ceiling-0.015
robe-0.015
tered-0.015
Token the
Token ID262
Feature activation+1.16e-02

Pos logit contributions
 bra+0.020
 aware+0.019
 cautious+0.019
 trusts+0.018
 consider+0.018
Neg logit contributions
room-0.020
robe-0.020
 ceiling-0.020
roller-0.019
ork-0.019
Token bird
Token ID6512
Feature activation+3.62e-02

Pos logit contributions
 aware+0.019
smart+0.018
 consider+0.018
 bra+0.018
 trusts+0.018
Neg logit contributions
 ceiling-0.015
robe-0.015
 frame-0.013
roller-0.013
ork-0.013
Token singing
Token ID13777
Feature activation+2.58e-02

Pos logit contributions
emy+0.053
Just+0.053
 knowledgeable+0.053
 disl+0.052
 bra+0.052
Neg logit contributions
robe-0.045
 frame-0.044
room-0.043
 ceiling-0.041
ings-0.041
Token?"
Token ID1701
Feature activation-1.64e-02

Pos logit contributions
 bra+0.031
 aware+0.030
 trusts+0.029
 disl+0.029
 cautious+0.028
Neg logit contributions
robe-0.046
room-0.044
 ceiling-0.042
tered-0.041
ase-0.040
Token Lily
Token ID20037
Feature activation-7.82e-03

Pos logit contributions
robe+0.018
roller+0.017
tered+0.016
ork+0.015
ase+0.015
Neg logit contributions
 aware-0.031
ologist-0.029
 bra-0.029
 questions-0.029
 cautious-0.029
Token stopped
Token ID5025
Feature activation+1.71e-02

Pos logit contributions
robe+0.013
roller+0.013
tered+0.013
room+0.012
 ceiling+0.012
Neg logit contributions
 bra-0.009
 disl-0.009
 aware-0.008
 questions-0.008
cod-0.008
Token what
Token ID644
Feature activation+5.39e-04

Pos logit contributions
inations+0.031
 Deal+0.029
 trusts+0.029
 knowledgeable+0.029
ises+0.028
Neg logit contributions
 reaction-0.015
y-0.015
ho-0.014
 out-0.013
 baby-0.012
Token up
Token ID510
Feature activation+7.95e-03

Pos logit contributions
robe+0.005
tered+0.005
attered+0.005
ied+0.005
 ceiling+0.005
Neg logit contributions
 disl-0.011
ologist-0.010
emy-0.010
cod-0.010
 consider-0.010
Token,
Token ID11
Feature activation+1.17e-02

Pos logit contributions
 disl+0.012
 aware+0.012
cod+0.012
 consider+0.012
ologist+0.012
Neg logit contributions
robe-0.010
roller-0.010
tered-0.010
 ceiling-0.010
ork-0.010
Token she
Token ID673
Feature activation+7.08e-03

Pos logit contributions
 bra+0.016
 aware+0.016
 proud+0.015
But+0.015
 brave+0.015
Neg logit contributions
robe-0.019
roller-0.018
room-0.017
ase-0.017
attered-0.017
Token saw
Token ID2497
Feature activation+2.86e-02

Pos logit contributions
 disl+0.012
 bra+0.011
 aware+0.011
 questions+0.011
 consider+0.011
Neg logit contributions
roller-0.008
robe-0.008
ork-0.008
tered-0.008
 ceiling-0.008
Token an
Token ID281
Feature activation+8.00e-05

Pos logit contributions
 bra+0.034
 aware+0.034
 questions+0.031
 disl+0.031
 brave+0.031
Neg logit contributions
robe-0.052
roller-0.051
ase-0.047
ork-0.046
attered-0.046
Token owl
Token ID39610
Feature activation+3.61e-02

Pos logit contributions
 bra+0.000
 aware+0.000
 disl+0.000
 cautious+0.000
 consider+0.000
Neg logit contributions
robe-0.000
roller-0.000
room-0.000
 ceiling-0.000
ase-0.000
Token flying
Token ID7348
Feature activation+5.65e-03

Pos logit contributions
 disl+0.052
 Deal+0.049
 questions+0.048
 aware+0.048
 knowledgeable+0.048
Neg logit contributions
robe-0.050
roller-0.049
room-0.048
 ceiling-0.048
 frame-0.047
Token in
Token ID287
Feature activation-7.02e-03

Pos logit contributions
 bra+0.007
 aware+0.007
 cautious+0.006
 disl+0.006
 consider+0.006
Neg logit contributions
robe-0.010
roller-0.010
room-0.009
 ceiling-0.009
ase-0.009
Token the
Token ID262
Feature activation+9.51e-04

Pos logit contributions
robe+0.011
roller+0.010
room+0.010
 ceiling+0.010
ork+0.010
Neg logit contributions
 aware-0.010
 bra-0.010
 disl-0.009
 cautious-0.009
 consider-0.009
Token air
Token ID1633
Feature activation+2.30e-03

Pos logit contributions
 aware+0.002
 bra+0.002
 disl+0.001
 consider+0.001
 cautious+0.001
Neg logit contributions
robe-0.001
roller-0.001
 ceiling-0.001
room-0.001
ase-0.001
Token.
Token ID13
Feature activation-4.97e-03

Pos logit contributions
 bra+0.002
 aware+0.002
 disl+0.002
 cautious+0.002
 consider+0.002
Neg logit contributions
robe-0.005
roller-0.004
room-0.004
 ceiling-0.004
ase-0.004
Token spaghetti
Token ID48241
Feature activation-1.35e-02

Pos logit contributions
robe+0.005
 ceiling+0.005
roller+0.004
room+0.004
ase+0.004
Neg logit contributions
 aware-0.007
 bra-0.007
 consider-0.007
 disl-0.007
 cautious-0.007
Token like
Token ID588
Feature activation-1.36e-02

Pos logit contributions
robe+0.025
roller+0.024
room+0.023
 ceiling+0.023
attered+0.023
Neg logit contributions
 bra-0.015
 aware-0.014
 questions-0.013
 cautious-0.013
 consider-0.012
Token snakes
Token ID26042
Feature activation+1.03e-03

Pos logit contributions
robe+0.021
roller+0.021
room+0.019
ocated+0.019
attered+0.019
Neg logit contributions
 bra-0.019
 aware-0.018
 questions-0.018
 disl-0.018
 cautious-0.017
Token again
Token ID757
Feature activation-5.94e-05

Pos logit contributions
 aware+0.001
 bra+0.001
 disl+0.001
 consider+0.001
 cautious+0.001
Neg logit contributions
robe-0.002
 ceiling-0.002
room-0.002
roller-0.002
ase-0.002
Token.
Token ID13
Feature activation-2.38e-05

Pos logit contributions
robe+0.000
room+0.000
ork+0.000
 ceiling+0.000
roller+0.000
Neg logit contributions
 aware-0.000
 bra-0.000
 disl-0.000
 trusts-0.000
 cautious-0.000
Token<|endoftext|>
Token ID50256
Feature activation+3.97e-02

Pos logit contributions
robe+0.000
roller+0.000
room+0.000
 ceiling+0.000
Mine+0.000
Neg logit contributions
 bra-0.000
 aware-0.000
 cautious-0.000
 consider-0.000
ologist-0.000
TokenL
Token ID43
Feature activation-2.42e-02

Pos logit contributions
 aware+0.036
 bra+0.035
 disl+0.033
 cautious+0.032
 consider+0.032
Neg logit contributions
robe-0.084
room-0.078
roller-0.078
 ceiling-0.076
ase-0.076
Tokenily
Token ID813
Feature activation+1.26e-02

Pos logit contributions
robe+0.053
attered+0.049
tered+0.049
roller+0.049
ocated+0.049
Neg logit contributions
 questions-0.020
?-0.016
 tell-0.016
 aware-0.015
fing-0.015
Token was
Token ID373
Feature activation+6.73e-03

Pos logit contributions
 bra+0.017
 aware+0.017
 cautious+0.016
 disl+0.015
 consider+0.015
Neg logit contributions
robe-0.020
roller-0.019
room-0.019
 ceiling-0.018
ase-0.018
Token eager
Token ID11069
Feature activation-1.35e-02

Pos logit contributions
 bra+0.008
 aware+0.008
 disl+0.008
 cautious+0.008
 consider+0.008
Neg logit contributions
robe-0.011
roller-0.011
room-0.010
 ceiling-0.010
ase-0.010
Token to
Token ID284
Feature activation+7.33e-03

Pos logit contributions
robe+0.024
roller+0.023
room+0.022
ocated+0.021
ase+0.021
Neg logit contributions
 bra-0.016
 aware-0.016
 questions-0.015
 cautious-0.015
 disl-0.015
Token mind
Token ID2000
Feature activation-1.20e-02

Pos logit contributions
Just+0.001
 aware+0.001
 cautious+0.001
 bra+0.001
 trusts+0.001
Neg logit contributions
robe-0.001
 ceiling-0.001
 glass-0.001
tered-0.001
room-0.001
Token the
Token ID262
Feature activation-5.01e-03

Pos logit contributions
robe+0.019
roller+0.018
room+0.017
 ceiling+0.017
ork+0.017
Neg logit contributions
 bra-0.016
 aware-0.016
 cautious-0.015
 disl-0.015
 consider-0.015
Token gloomy
Token ID46400
Feature activation-2.11e-02

Pos logit contributions
robe+0.006
 ceiling+0.006
 glass+0.005
roller+0.005
 shelf+0.005
Neg logit contributions
 aware-0.009
 consider-0.008
 bra-0.008
smart-0.008
 disl-0.008
Token day
Token ID1110
Feature activation-1.54e-02

Pos logit contributions
robe+0.028
roller+0.027
room+0.025
ocated+0.024
ork+0.024
Neg logit contributions
 bra-0.034
 aware-0.034
 disl-0.032
 questions-0.032
 cautious-0.032
Token.
Token ID13
Feature activation-3.20e-03

Pos logit contributions
robe+0.036
roller+0.035
ocated+0.033
ase+0.032
 ceiling+0.032
Neg logit contributions
 bra-0.010
 questions-0.009
 aware-0.009
 proud-0.008
 cautious-0.008
Token<|endoftext|>
Token ID50256
Feature activation+3.42e-02

Pos logit contributions
robe+0.004
room+0.004
roller+0.004
 ceiling+0.004
ork+0.003
Neg logit contributions
 bra-0.006
 aware-0.006
ologist-0.005
 cautious-0.005
 eagle-0.005
TokenS
Token ID50
Feature activation+8.68e-03

Pos logit contributions
 bra+0.025
 aware+0.024
 cautious+0.022
 disl+0.021
 consider+0.021
Neg logit contributions
robe-0.076
roller-0.073
room-0.071
 ceiling-0.070
ase-0.069
Tokenara
Token ID3301
Feature activation-2.16e-02

Pos logit contributions
 bra+0.011
 aware+0.010
 questions+0.010
 disl+0.010
 trusts+0.010
Neg logit contributions
robe-0.015
roller-0.013
tered-0.013
 ceiling-0.013
ocated-0.013
Token and
Token ID290
Feature activation-4.55e-03

Pos logit contributions
robe+0.046
roller+0.046
 ceiling+0.044
ocated+0.044
room+0.042
Neg logit contributions
 aware-0.017
 bra-0.016
 cautious-0.015
 trusts-0.015
 disl-0.015
Token Tom
Token ID4186
Feature activation-5.71e-03

Pos logit contributions
robe+0.008
roller+0.008
ork+0.007
ase+0.007
 ceiling+0.007
Neg logit contributions
 aware-0.006
 bra-0.006
 proud-0.006
 disl-0.005
 consider-0.005
Token were
Token ID547
Feature activation+1.02e-02

Pos logit contributions
robe+0.009
roller+0.009
ork+0.008
 ceiling+0.008
ocated+0.008
Neg logit contributions
 bra-0.008
 aware-0.008
 trusts-0.008
 disl-0.008
 proud-0.008
Token Their
Token ID5334
Feature activation-4.99e-03

Pos logit contributions
robe+0.014
roller+0.013
room+0.013
 ceiling+0.012
ocated+0.012
Neg logit contributions
 bra-0.016
 aware-0.016
 disl-0.015
 questions-0.015
 consider-0.015
Token mom
Token ID1995
Feature activation-1.18e-02

Pos logit contributions
robe+0.008
roller+0.007
room+0.007
ocated+0.007
tered+0.007
Neg logit contributions
 bra-0.007
 questions-0.007
 disl-0.007
 aware-0.007
 cautious-0.007
Token said
Token ID531
Feature activation-1.22e-02

Pos logit contributions
robe+0.025
roller+0.023
room+0.023
 ceiling+0.023
ork+0.022
Neg logit contributions
 bra-0.010
 aware-0.010
 cautious-0.009
 disl-0.009
 consider-0.009
Token yes
Token ID3763
Feature activation-6.88e-03

Pos logit contributions
robe+0.024
roller+0.023
room+0.022
 ceiling+0.022
attered+0.021
Neg logit contributions
 bra-0.013
 aware-0.012
 questions-0.012
 cautious-0.011
 consider-0.011
Token,
Token ID11
Feature activation-2.59e-03

Pos logit contributions
robe+0.015
roller+0.014
 ceiling+0.014
room+0.014
ase+0.013
Neg logit contributions
 bra-0.006
 aware-0.006
 consider-0.005
 questions-0.005
 cautious-0.005
Token but
Token ID475
Feature activation+3.15e-03

Pos logit contributions
robe+0.004
tered+0.004
roller+0.004
room+0.004
 ceiling+0.004
Neg logit contributions
 disl-0.003
 aware-0.003
cod-0.003
ologist-0.003
emy-0.003
Token reminded
Token ID14516
Feature activation-2.37e-02

Pos logit contributions
 aware+0.005
 bra+0.005
 disl+0.004
 cautious+0.004
 consider+0.004
Neg logit contributions
robe-0.005
roller-0.004
 ceiling-0.004
room-0.004
tered-0.004
Token them
Token ID606
Feature activation-1.47e-02

Pos logit contributions
robe+0.042
roller+0.040
 ceiling+0.037
ase+0.037
room+0.037
Neg logit contributions
 bra-0.030
 aware-0.028
 questions-0.028
 disl-0.027
 consider-0.026
Token to
Token ID284
Feature activation-6.93e-03

Pos logit contributions
robe+0.027
roller+0.025
room+0.025
 ceiling+0.024
ase+0.024
Neg logit contributions
 aware-0.017
 bra-0.017
 disl-0.015
 cautious-0.015
 questions-0.015
Token clean
Token ID3424
Feature activation-9.15e-03

Pos logit contributions
robe+0.010
roller+0.010
room+0.009
 ceiling+0.009
tered+0.009
Neg logit contributions
 bra-0.010
 aware-0.010
 cautious-0.009
 disl-0.009
 questions-0.009
Token up
Token ID510
Feature activation-1.16e-02

Pos logit contributions
robe+0.014
roller+0.013
 ceiling+0.012
ase+0.012
ocated+0.012
Neg logit contributions
 bra-0.013
 aware-0.013
 questions-0.012
 disl-0.012
 proud-0.012
Token saw
Token ID2497
Feature activation+1.63e-02

Pos logit contributions
robe+0.010
tered+0.009
roller+0.009
 ceiling+0.009
 nobody+0.009
Neg logit contributions
 disl-0.017
ologist-0.017
emy-0.017
 bra-0.017
agons-0.017
Token an
Token ID281
Feature activation-9.70e-03

Pos logit contributions
 bra+0.022
 aware+0.022
 questions+0.021
 cautious+0.020
 disl+0.020
Neg logit contributions
robe-0.026
roller-0.026
room-0.024
ase-0.024
ork-0.023
Token old
Token ID1468
Feature activation+2.42e-03

Pos logit contributions
robe+0.016
roller+0.015
room+0.014
 ceiling+0.014
ase+0.014
Neg logit contributions
 bra-0.013
 aware-0.013
 disl-0.012
 cautious-0.012
 consider-0.012
Token man
Token ID582
Feature activation+1.25e-02

Pos logit contributions
 bra+0.003
 aware+0.003
 disl+0.003
 cautious+0.003
 consider+0.003
Neg logit contributions
robe-0.004
roller-0.004
 ceiling-0.003
room-0.003
ork-0.003
Token standing
Token ID5055
Feature activation+1.63e-02

Pos logit contributions
 disl+0.016
 aware+0.016
 bra+0.016
emy+0.015
 cautious+0.015
Neg logit contributions
robe-0.019
roller-0.019
room-0.018
 ceiling-0.018
ork-0.017
Token nearby
Token ID6716
Feature activation+1.71e-02

Pos logit contributions
 disl+0.028
cod+0.025
 bra+0.024
emy+0.024
 consider+0.024
Neg logit contributions
robe-0.022
roller-0.020
room-0.020
ings-0.019
box-0.019
Token.
Token ID13
Feature activation+2.41e-03

Pos logit contributions
 aware+0.017
 bra+0.017
 disl+0.016
 cautious+0.016
 consider+0.015
Neg logit contributions
robe-0.033
roller-0.031
room-0.031
 ceiling-0.030
tered-0.030
Token The
Token ID383
Feature activation+1.09e-02

Pos logit contributions
 aware+0.004
 bra+0.004
 disl+0.004
 cautious+0.004
 consider+0.004
Neg logit contributions
robe-0.003
roller-0.003
 ceiling-0.003
room-0.003
ork-0.003
Token old
Token ID1468
Feature activation-7.20e-03

Pos logit contributions
 bra+0.015
 aware+0.015
 disl+0.014
 consider+0.014
 cautious+0.014
Neg logit contributions
robe-0.017
roller-0.016
 ceiling-0.016
room-0.015
ase-0.015
Token man
Token ID582
Feature activation+7.22e-04

Pos logit contributions
robe+0.012
roller+0.012
room+0.011
 ceiling+0.011
ase+0.011
Neg logit contributions
 bra-0.009
 aware-0.009
 cautious-0.008
 disl-0.008
 consider-0.008
Token said
Token ID531
Feature activation-1.37e-02

Pos logit contributions
 disl+0.001
 aware+0.001
 bra+0.001
 questions+0.001
 cautious+0.001
Neg logit contributions
robe-0.001
roller-0.001
tered-0.001
room-0.001
ork-0.001
Tokeny
Token ID88
Feature activation-2.04e-02

Pos logit contributions
robe+0.018
roller+0.016
ocated+0.015
ork+0.014
mail+0.013
Neg logit contributions
 aware-0.028
 bra-0.028
 disl-0.027
 eagle-0.027
emy-0.027
Token bird
Token ID6512
Feature activation-7.60e-04

Pos logit contributions
robe+0.031
roller+0.029
 ceiling+0.028
room+0.028
ase+0.027
Neg logit contributions
 bra-0.029
 aware-0.029
 disl-0.028
 consider-0.027
 cautious-0.027
Token left
Token ID1364
Feature activation-9.63e-03

Pos logit contributions
robe+0.001
roller+0.001
ase+0.001
room+0.001
ork+0.001
Neg logit contributions
 aware-0.001
 questions-0.001
 bra-0.001
 disl-0.001
 consider-0.001
Token.
Token ID13
Feature activation+5.67e-03

Pos logit contributions
roller+0.018
robe+0.018
ocated+0.017
attered+0.017
ork+0.017
Neg logit contributions
 bra-0.009
 aware-0.009
 proud-0.009
 questions-0.008
 brave-0.008
Token But
Token ID887
Feature activation+5.91e-03

Pos logit contributions
 bra+0.007
 aware+0.007
 disl+0.006
 cautious+0.006
 consider+0.006
Neg logit contributions
robe-0.010
roller-0.009
room-0.009
 ceiling-0.009
ork-0.009
Token,
Token ID11
Feature activation+3.47e-03

Pos logit contributions
 bra+0.007
 aware+0.007
 cautious+0.006
 consider+0.006
 proud+0.006
Neg logit contributions
robe-0.011
roller-0.011
room-0.010
ase-0.010
 ceiling-0.010
Token Lucy
Token ID22162
Feature activation-6.11e-03

Pos logit contributions
 bra+0.005
 aware+0.005
 cautious+0.004
 consider+0.004
 questions+0.004
Neg logit contributions
robe-0.006
roller-0.006
room-0.005
ase-0.005
ork-0.005
Token and
Token ID290
Feature activation+3.45e-03

Pos logit contributions
robe+0.012
roller+0.011
room+0.011
 ceiling+0.011
ase+0.011
Neg logit contributions
 bra-0.006
 aware-0.006
 disl-0.006
 consider-0.006
 questions-0.006
Token Tom
Token ID4186
Feature activation+8.40e-04

Pos logit contributions
 bra+0.005
 saying+0.004
 questions+0.004
 aware+0.004
 proud+0.004
Neg logit contributions
robe-0.006
roller-0.006
ase-0.006
attered-0.005
ork-0.005
Token were
Token ID547
Feature activation+1.43e-02

Pos logit contributions
 bra+0.002
 aware+0.002
 cautious+0.002
 consider+0.002
 disl+0.002
Neg logit contributions
robe-0.001
roller-0.001
room-0.001
 ceiling-0.001
ase-0.001
Token still
Token ID991
Feature activation+1.06e-02

Pos logit contributions
 bra+0.017
 aware+0.017
 disl+0.016
 cautious+0.016
 consider+0.015
Neg logit contributions
robe-0.025
roller-0.024
room-0.023
 ceiling-0.023
ase-0.022
Token,
Token ID11
Feature activation+1.45e-02

Pos logit contributions
 bra+0.022
 aware+0.022
 cautious+0.020
 disl+0.020
 consider+0.020
Neg logit contributions
robe-0.040
roller-0.038
room-0.037
 ceiling-0.036
ase-0.036
Token the
Token ID262
Feature activation+1.44e-02

Pos logit contributions
 bra+0.020
 aware+0.020
 disl+0.019
 cautious+0.019
 consider+0.018
Neg logit contributions
robe-0.023
roller-0.022
room-0.021
 ceiling-0.021
ase-0.020
Token farmer
Token ID18739
Feature activation+1.15e-02

Pos logit contributions
smart+0.028
 disl+0.027
fing+0.027
 consider+0.026
 bra+0.026
Neg logit contributions
 ceiling-0.014
robe-0.013
 shelf-0.012
 frame-0.012
 glass-0.012
Token started
Token ID2067
Feature activation+1.19e-03

Pos logit contributions
 bra+0.016
 aware+0.016
 disl+0.016
 questions+0.016
 consider+0.015
Neg logit contributions
robe-0.017
room-0.015
ase-0.015
roller-0.015
 ceiling-0.015
Token scattering
Token ID45765
Feature activation+1.70e-02

Pos logit contributions
 trusts+0.002
 bra+0.002
 disl+0.002
 Deal+0.002
cod+0.002
Neg logit contributions
robe-0.001
roller-0.001
 ceiling-0.001
room-0.001
ings-0.001
Token some
Token ID617
Feature activation+1.62e-02

Pos logit contributions
 aware+0.022
 bra+0.022
 disl+0.021
 cautious+0.021
 consider+0.021
Neg logit contributions
robe-0.028
roller-0.026
room-0.025
 ceiling-0.025
ase-0.024
Token wheat
Token ID17135
Feature activation+1.46e-02

Pos logit contributions
 aware+0.023
 disl+0.023
Just+0.022
 bra+0.022
 consider+0.022
Neg logit contributions
robe-0.023
 ceiling-0.022
room-0.021
 frame-0.021
 glass-0.020
Token in
Token ID287
Feature activation-2.25e-03

Pos logit contributions
 bra+0.017
 aware+0.016
 cautious+0.015
 disl+0.015
 consider+0.015
Neg logit contributions
robe-0.027
roller-0.025
room-0.024
 ceiling-0.024
ork-0.024
Token the
Token ID262
Feature activation+2.16e-02

Pos logit contributions
robe+0.004
roller+0.004
ase+0.004
room+0.004
 ceiling+0.004
Neg logit contributions
 bra-0.003
 aware-0.003
 questions-0.003
 cautious-0.002
 disl-0.002
Token field
Token ID2214
Feature activation-7.55e-03

Pos logit contributions
 bra+0.032
 aware+0.032
 disl+0.032
 consider+0.031
 cautious+0.030
Neg logit contributions
robe-0.029
roller-0.028
 ceiling-0.028
room-0.027
ork-0.027
Token for
Token ID329
Feature activation+1.57e-03

Pos logit contributions
robe+0.016
roller+0.015
ocated+0.015
 ceiling+0.014
ork+0.014
Neg logit contributions
 aware-0.007
 bra-0.006
 cautious-0.006
 questions-0.006
 consider-0.006
Token put
Token ID1234
Feature activation-2.70e-03

Pos logit contributions
robe+0.004
ase+0.004
roller+0.004
room+0.004
 ceiling+0.004
Neg logit contributions
 bra-0.003
 aware-0.002
 proud-0.002
 consider-0.002
But-0.002
Token it
Token ID340
Feature activation-1.77e-02

Pos logit contributions
robe+0.004
roller+0.004
room+0.004
 ceiling+0.004
attered+0.003
Neg logit contributions
 bra-0.004
 proud-0.004
 aware-0.004
But-0.004
 questions-0.004
Token in
Token ID287
Feature activation-1.23e-02

Pos logit contributions
roller+0.033
robe+0.033
room+0.032
 ceiling+0.030
ase+0.030
Neg logit contributions
 bra-0.019
 aware-0.018
 cautious-0.017
 proud-0.016
 questions-0.016
Token her
Token ID607
Feature activation+9.83e-03

Pos logit contributions
robe+0.022
attered+0.021
roller+0.020
ase+0.020
room+0.019
Neg logit contributions
 bra-0.016
But-0.015
 aware-0.015
 proud-0.014
 saying-0.014
Token pocket
Token ID10000
Feature activation-2.34e-03

Pos logit contributions
 bra+0.015
 aware+0.014
 proud+0.014
 cautious+0.014
 trusts+0.014
Neg logit contributions
robe-0.014
room-0.013
tered-0.013
ocated-0.012
roller-0.012
Token.
Token ID13
Feature activation+2.03e-03

Pos logit contributions
robe+0.005
room+0.005
roller+0.005
 ceiling+0.005
Mine+0.005
Neg logit contributions
 bra-0.002
 aware-0.002
 cautious-0.002
 consider-0.002
 questions-0.002
Token As
Token ID1081
Feature activation+6.62e-03

Pos logit contributions
 bra+0.002
 aware+0.002
 questions+0.002
Just+0.002
 disl+0.002
Neg logit contributions
robe-0.004
roller-0.003
room-0.003
kin-0.003
ase-0.003
Token she
Token ID673
Feature activation+5.76e-03

Pos logit contributions
 bra+0.007
 aware+0.007
But+0.006
 proud+0.006
 cautious+0.006
Neg logit contributions
robe-0.013
roller-0.013
room-0.012
ase-0.012
attered-0.012
Token was
Token ID373
Feature activation+8.88e-03

Pos logit contributions
 bra+0.008
 aware+0.007
Just+0.007
 consider+0.007
 cautious+0.007
Neg logit contributions
robe-0.010
roller-0.009
 ceiling-0.009
room-0.009
ase-0.008
Token walking
Token ID6155
Feature activation+1.52e-02

Pos logit contributions
 bra+0.011
 disl+0.011
 aware+0.011
 cautious+0.011
 questions+0.011
Neg logit contributions
robe-0.014
roller-0.014
ork-0.013
room-0.012
ocated-0.012
Token home
Token ID1363
Feature activation-3.72e-03

Pos logit contributions
 disl+0.022
 aware+0.022
 bra+0.021
 consider+0.021
cod+0.020
Neg logit contributions
robe-0.023
roller-0.021
room-0.021
 ceiling-0.020
ase-0.020
Token to
Token ID284
Feature activation-3.54e-03

Pos logit contributions
 bra+0.003
 aware+0.003
 cautious+0.003
 questions+0.003
 consider+0.003
Neg logit contributions
robe-0.005
roller-0.005
room-0.005
 ceiling-0.005
ase-0.005
Token greet
Token ID12589
Feature activation-7.84e-03

Pos logit contributions
robe+0.006
roller+0.006
room+0.006
 ceiling+0.005
tered+0.005
Neg logit contributions
 aware-0.004
 bra-0.004
 cautious-0.004
 disl-0.004
 consider-0.004
Token her
Token ID607
Feature activation+6.48e-03

Pos logit contributions
roller+0.013
robe+0.013
ocated+0.011
attered+0.011
ase+0.011
Neg logit contributions
 questions-0.010
 bra-0.010
 aware-0.010
But-0.009
 cautious-0.009
Token.
Token ID13
Feature activation-6.95e-06

Pos logit contributions
 aware+0.008
 disl+0.008
 bra+0.008
emy+0.008
 cautious+0.007
Neg logit contributions
robe-0.011
roller-0.010
room-0.010
 ceiling-0.010
tered-0.010
Token She
Token ID1375
Feature activation-6.61e-03

Feature not active on prev token; no predictions were affected.
Token was
Token ID373
Feature activation-2.70e-03

Pos logit contributions
robe+0.010
roller+0.009
room+0.008
 ceiling+0.008
tered+0.008
Neg logit contributions
 aware-0.010
 bra-0.010
 disl-0.009
 cautious-0.009
 questions-0.009
Token really
Token ID1107
Feature activation+2.50e-03

Pos logit contributions
robe+0.005
roller+0.005
room+0.005
 ceiling+0.005
ase+0.004
Neg logit contributions
 bra-0.003
 aware-0.003
 disl-0.003
 cautious-0.003
 consider-0.003
Token nice
Token ID3621
Feature activation+2.62e-03

Pos logit contributions
 aware+0.002
 bra+0.002
 disl+0.002
ologist+0.002
emy+0.002
Neg logit contributions
robe-0.005
roller-0.005
room-0.005
 ceiling-0.005
tered-0.005
Token and
Token ID290
Feature activation+8.29e-04

Pos logit contributions
 bra+0.003
 aware+0.003
 disl+0.003
 consider+0.003
cod+0.003
Neg logit contributions
robe-0.005
roller-0.005
room-0.004
 ceiling-0.004
ase-0.004
Token said
Token ID531
Feature activation-8.35e-03

Pos logit contributions
 bra+0.001
 aware+0.001
 cautious+0.001
 disl+0.001
 consider+0.001
Neg logit contributions
robe-0.002
roller-0.001
room-0.001
 ceiling-0.001
ase-0.001
Token,
Token ID11
Feature activation-4.56e-03

Pos logit contributions
robe+0.013
roller+0.012
tered+0.012
ork+0.012
 ceiling+0.012
Neg logit contributions
 aware-0.011
 bra-0.010
 cautious-0.010
ologist-0.010
cod-0.010
TokenOnce
Token ID7454
Feature activation+1.08e-03

Pos logit contributions
 bra+0.013
 aware+0.011
But+0.011
 questions+0.011
 proud+0.010
Neg logit contributions
robe-0.049
roller-0.048
attered-0.046
 ceiling-0.046
ocated-0.045
Token upon
Token ID2402
Feature activation-3.09e-02

Pos logit contributions
 aware+0.001
 bra+0.001
 questions+0.001
 proud+0.001
 impressed+0.001
Neg logit contributions
robe-0.002
roller-0.002
room-0.002
ocated-0.002
tered-0.002
Token a
Token ID257
Feature activation-6.33e-03

Pos logit contributions
robe+0.051
roller+0.051
ase+0.046
room+0.046
attered+0.046
Neg logit contributions
 bra-0.040
 aware-0.040
 proud-0.037
But-0.037
 cautious-0.037
Token time
Token ID640
Feature activation-4.23e-03

Pos logit contributions
robe+0.008
roller+0.007
room+0.007
 ceiling+0.007
ase+0.007
Neg logit contributions
 bra-0.011
 aware-0.011
 disl-0.010
 cautious-0.010
 consider-0.010
Token there
Token ID612
Feature activation-1.22e-02

Pos logit contributions
robe+0.009
roller+0.009
attered+0.008
 ceiling+0.008
ase+0.008
Neg logit contributions
 aware-0.003
 bra-0.003
 disl-0.003
 cautious-0.003
 questions-0.003
Token was
Token ID373
Feature activation-7.91e-03

Pos logit contributions
roller+0.028
ifier+0.027
attered+0.026
ocated+0.026
robe+0.026
Neg logit contributions
?-0.010
 brave-0.009
-0.009
But-0.009
 told-0.008
Token a
Token ID257
Feature activation+9.14e-03

Pos logit contributions
robe+0.013
roller+0.012
room+0.011
ocated+0.011
ork+0.011
Neg logit contributions
 bra-0.011
 aware-0.011
 cautious-0.010
 proud-0.010
 disl-0.010
Token little
Token ID1310
Feature activation-6.03e-03

Pos logit contributions
 aware+0.015
 disl+0.015
 bra+0.015
smart+0.015
 consider+0.015
Neg logit contributions
robe-0.011
 ceiling-0.010
roller-0.010
room-0.010
ork-0.010
Token girl
Token ID2576
Feature activation-6.07e-03

Pos logit contributions
robe+0.009
roller+0.008
room+0.008
 ceiling+0.008
ase+0.007
Neg logit contributions
 bra-0.009
 aware-0.009
 disl-0.009
 consider-0.009
 cautious-0.009
Token named
Token ID3706
Feature activation-2.02e-02

Pos logit contributions
robe+0.011
room+0.011
roller+0.010
ase+0.010
 frame+0.010
Neg logit contributions
 bra-0.006
 disl-0.006
 aware-0.006
 questions-0.006
 consider-0.006
Token Emily
Token ID17608
Feature activation-6.04e-03

Pos logit contributions
robe+0.031
roller+0.029
room+0.027
ase+0.026
attered+0.026
Neg logit contributions
 aware-0.028
 bra-0.028
But-0.027
 questions-0.027
 disl-0.027
Token She
Token ID1375
Feature activation+1.41e-03

Pos logit contributions
 bra+0.013
 aware+0.013
 disl+0.013
 questions+0.012
 cautious+0.012
Neg logit contributions
robe-0.017
roller-0.016
room-0.015
ocated-0.015
ork-0.015
Token went
Token ID1816
Feature activation-4.13e-03

Pos logit contributions
 aware+0.002
 bra+0.002
 disl+0.002
 cautious+0.002
 questions+0.002
Neg logit contributions
robe-0.002
roller-0.002
room-0.002
 ceiling-0.002
ork-0.001
Token home
Token ID1363
Feature activation-1.61e-02

Pos logit contributions
robe+0.005
roller+0.005
 ceiling+0.004
room+0.004
ork+0.004
Neg logit contributions
 disl-0.007
 aware-0.007
 bra-0.007
 consider-0.007
 cautious-0.007
Token and
Token ID290
Feature activation-4.86e-03

Pos logit contributions
robe+0.029
roller+0.027
room+0.026
 ceiling+0.026
ork+0.025
Neg logit contributions
 bra-0.019
 aware-0.018
 disl-0.017
 cautious-0.017
 consider-0.017
Token had
Token ID550
Feature activation-1.35e-03

Pos logit contributions
robe+0.008
roller+0.008
room+0.008
 ceiling+0.007
ase+0.007
Neg logit contributions
 bra-0.006
 aware-0.006
 cautious-0.006
 consider-0.006
 disl-0.006
Token a
Token ID257
Feature activation-2.31e-03

Pos logit contributions
robe+0.002
roller+0.002
ase+0.002
room+0.002
ocated+0.002
Neg logit contributions
 bra-0.002
 aware-0.002
 consider-0.002
 proud-0.002
 impressed-0.002
Token big
Token ID1263
Feature activation+7.74e-03

Pos logit contributions
robe+0.004
roller+0.004
 ceiling+0.004
ork+0.003
room+0.003
Neg logit contributions
 aware-0.003
 bra-0.003
 disl-0.003
 cautious-0.003
 consider-0.003
Token,
Token ID11
Feature activation+7.70e-03

Pos logit contributions
 aware+0.011
 bra+0.011
 disl+0.010
 cautious+0.010
 consider+0.010
Neg logit contributions
robe-0.012
roller-0.011
 ceiling-0.011
room-0.011
ase-0.010
Token happy
Token ID3772
Feature activation-3.59e-03

Pos logit contributions
 aware+0.011
 bra+0.010
 disl+0.010
Just+0.010
emy+0.010
Neg logit contributions
robe-0.012
 ceiling-0.011
roller-0.011
 frame-0.010
room-0.010
Token smile
Token ID8212
Feature activation+8.49e-03

Pos logit contributions
robe+0.005
roller+0.005
room+0.005
 ceiling+0.005
ase+0.005
Neg logit contributions
 bra-0.005
 aware-0.005
 cautious-0.005
 disl-0.005
 consider-0.005
Token on
Token ID319
Feature activation-1.36e-02

Pos logit contributions
 bra+0.009
 aware+0.009
 cautious+0.008
 consider+0.008
 disl+0.008
Neg logit contributions
robe-0.016
roller-0.016
room-0.015
 ceiling-0.015
ocated-0.014
Token big
Token ID1263
Feature activation+1.61e-02

Pos logit contributions
 aware+0.011
 disl+0.011
 bra+0.011
smart+0.011
 consider+0.011
Neg logit contributions
robe-0.007
 ceiling-0.007
roller-0.007
room-0.006
 shelf-0.006
Token gr
Token ID1036
Feature activation+1.39e-02

Pos logit contributions
 aware+0.025
 bra+0.024
Just+0.024
 disl+0.024
 consider+0.024
Neg logit contributions
robe-0.022
 ceiling-0.022
roller-0.021
room-0.021
 frame-0.020
Tokenumpy
Token ID32152
Feature activation+4.97e-03

Pos logit contributions
 bra+0.015
 aware+0.015
 cautious+0.013
 disl+0.013
 consider+0.013
Neg logit contributions
robe-0.026
roller-0.025
room-0.024
 ceiling-0.024
ase-0.023
Token bear
Token ID6842
Feature activation+1.05e-02

Pos logit contributions
 bra+0.007
 aware+0.006
 cautious+0.006
 disl+0.006
 consider+0.006
Neg logit contributions
robe-0.008
roller-0.008
room-0.007
 ceiling-0.007
ase-0.007
Token.
Token ID13
Feature activation+4.79e-03

Pos logit contributions
 Deal+0.014
 bra+0.014
 disl+0.014
 aware+0.013
 consider+0.012
Neg logit contributions
robe-0.018
room-0.017
ase-0.017
roller-0.016
 ceiling-0.015
Token He
Token ID679
Feature activation-3.43e-04

Pos logit contributions
 bra+0.005
 aware+0.005
 questions+0.004
 disl+0.004
 cautious+0.004
Neg logit contributions
robe-0.009
roller-0.009
ocated-0.009
ork-0.009
room-0.008
Token didn
Token ID1422
Feature activation+6.07e-03

Pos logit contributions
robe+0.000
roller+0.000
room+0.000
 ceiling+0.000
tered+0.000
Neg logit contributions
 bra-0.001
 aware-0.001
 disl-0.001
 cautious-0.001
 consider-0.001
Token't
Token ID470
Feature activation+1.52e-03

Pos logit contributions
 bra+0.010
 aware+0.009
 consider+0.009
 disl+0.009
 questions+0.009
Neg logit contributions
robe-0.008
roller-0.008
room-0.007
ase-0.007
 ceiling-0.007
Token like
Token ID588
Feature activation+9.87e-03

Pos logit contributions
 aware+0.002
 bra+0.002
 disl+0.001
 trusts+0.001
Just+0.001
Neg logit contributions
robe-0.003
tered-0.003
room-0.003
 ceiling-0.003
roller-0.003
Token to
Token ID284
Feature activation-5.68e-03

Pos logit contributions
 bra+0.013
 aware+0.013
 disl+0.012
 cautious+0.012
 consider+0.012
Neg logit contributions
robe-0.016
roller-0.015
room-0.014
 ceiling-0.014
ase-0.014
Token do
Token ID466
Feature activation+6.66e-03

Pos logit contributions
robe+0.009
roller+0.008
room+0.008
 ceiling+0.008
ase+0.008
Neg logit contributions
 bra-0.008
 aware-0.008
 cautious-0.007
 disl-0.007
 consider-0.007
Token and
Token ID290
Feature activation+1.12e-02

Pos logit contributions
robe+0.022
roller+0.021
room+0.020
 ceiling+0.020
ase+0.020
Neg logit contributions
 bra-0.013
 aware-0.013
 cautious-0.012
 disl-0.011
 questions-0.011
Token bye
Token ID33847
Feature activation-7.07e-04

Pos logit contributions
emy+0.016
Just+0.016
 aware+0.016
 cautious+0.015
cod+0.015
Neg logit contributions
 ceiling-0.015
robe-0.015
Mine-0.014
 frame-0.014
room-0.014
Token-
Token ID12
Feature activation-1.04e-02

Pos logit contributions
robe+0.001
roller+0.001
room+0.001
 ceiling+0.001
ork+0.001
Neg logit contributions
 aware-0.001
 bra-0.001
 cautious-0.001
 consider-0.001
 disl-0.001
Tokenbye
Token ID16390
Feature activation-1.70e-03

Pos logit contributions
robe+0.015
roller+0.014
room+0.013
 ceiling+0.013
ase+0.013
Neg logit contributions
 bra-0.016
 aware-0.016
 disl-0.015
 cautious-0.015
 consider-0.015
Token and
Token ID290
Feature activation+1.03e-02

Pos logit contributions
robe+0.003
roller+0.003
room+0.002
 ceiling+0.002
ork+0.002
Neg logit contributions
 aware-0.002
 cautious-0.002
 bra-0.002
 consider-0.002
 disl-0.002
Token make
Token ID787
Feature activation-8.31e-03

Pos logit contributions
 aware+0.013
 bra+0.012
 disl+0.012
 cautious+0.012
 consider+0.011
Neg logit contributions
robe-0.017
roller-0.016
 ceiling-0.016
room-0.016
tered-0.016
Token funny
Token ID8258
Feature activation-8.13e-03

Pos logit contributions
robe+0.016
roller+0.015
ase+0.015
ocated+0.014
attered+0.014
Neg logit contributions
 bra-0.010
 questions-0.009
 aware-0.009
 proud-0.009
 brave-0.008
Token noises
Token ID26782
Feature activation+5.47e-04

Pos logit contributions
robe+0.010
room+0.010
roller+0.010
 ceiling+0.009
ings+0.009
Neg logit contributions
 aware-0.013
 bra-0.013
 cautious-0.013
emy-0.013
 consider-0.013
Token.
Token ID13
Feature activation+1.27e-02

Pos logit contributions
 aware+0.001
emy+0.001
 disl+0.001
 cautious+0.001
 trusts+0.001
Neg logit contributions
robe-0.001
room-0.001
 ceiling-0.001
tered-0.001
roller-0.001
Token
Token ID198
Feature activation-2.06e-03

Pos logit contributions
 bra+0.015
 aware+0.015
 disl+0.014
 cautious+0.014
 consider+0.014
Neg logit contributions
robe-0.023
roller-0.021
room-0.021
 ceiling-0.020
ork-0.020
Token
Token ID198
Feature activation+1.47e-03

Pos logit contributions
robe+0.004
roller+0.004
room+0.004
ork+0.004
 ceiling+0.003
Neg logit contributions
 bra-0.002
 aware-0.002
 disl-0.002
 cautious-0.002
 consider-0.002
Token and
Token ID290
Feature activation+5.39e-03

Pos logit contributions
 aware+0.009
 disl+0.009
 bra+0.008
 cautious+0.008
 consider+0.008
Neg logit contributions
robe-0.010
room-0.010
tered-0.009
 ceiling-0.009
roller-0.009
Token laughed
Token ID13818
Feature activation+1.20e-02

Pos logit contributions
 bra+0.007
 aware+0.007
 cautious+0.007
 disl+0.007
 proud+0.007
Neg logit contributions
robe-0.009
roller-0.008
room-0.008
ork-0.008
ase-0.008
Token until
Token ID1566
Feature activation+1.19e-02

Pos logit contributions
 disl+0.015
 aware+0.015
 bra+0.015
 cautious+0.014
 consider+0.014
Neg logit contributions
robe-0.021
room-0.019
tered-0.019
 ceiling-0.019
roller-0.018
Token the
Token ID262
Feature activation-7.19e-03

Pos logit contributions
 bra+0.012
 proud+0.011
 then+0.010
But+0.010
 saying+0.010
Neg logit contributions
robe-0.025
room-0.025
roller-0.024
ase-0.024
ool-0.023
Token sun
Token ID4252
Feature activation-1.42e-02

Pos logit contributions
robe+0.012
 ceiling+0.012
roller+0.012
room+0.011
ase+0.011
Neg logit contributions
 aware-0.009
 bra-0.009
 consider-0.008
 cautious-0.008
 disl-0.008
Token began
Token ID2540
Feature activation-4.20e-02

Pos logit contributions
roller+0.023
ocated+0.023
robe+0.023
 budget+0.023
room+0.022
Neg logit contributions
 bra-0.019
 med-0.016
 proud-0.016
But-0.015
 cal-0.015
Token to
Token ID284
Feature activation-3.44e-02

Pos logit contributions
robe+0.070
roller+0.066
room+0.064
ase+0.063
 ceiling+0.062
Neg logit contributions
 bra-0.055
 aware-0.054
 cautious-0.050
 questions-0.050
 consider-0.050
Token set
Token ID900
Feature activation-1.72e-02

Pos logit contributions
robe+0.044
roller+0.041
room+0.039
 ceiling+0.038
ase+0.037
Neg logit contributions
 bra-0.058
 aware-0.057
 cautious-0.054
 disl-0.054
 consider-0.054
Token.
Token ID13
Feature activation+2.13e-03

Pos logit contributions
robe+0.043
roller+0.042
ase+0.041
 ceiling+0.041
Mine+0.040
Neg logit contributions
 bra-0.009
 aware-0.006
But-0.006
 questions-0.006
 proud-0.005
Token Lucy
Token ID22162
Feature activation-9.79e-03

Pos logit contributions
 bra+0.003
 aware+0.003
 disl+0.002
 proud+0.002
But+0.002
Neg logit contributions
robe-0.004
roller-0.003
room-0.003
ocated-0.003
 ceiling-0.003
Token said
Token ID531
Feature activation+3.31e-03

Pos logit contributions
robe+0.018
roller+0.017
room+0.016
 ceiling+0.016
ase+0.016
Neg logit contributions
 aware-0.011
 bra-0.011
 disl-0.011
 questions-0.010
 cautious-0.010
Token Mom
Token ID11254
Feature activation-1.86e-02

Pos logit contributions
robe+0.016
roller+0.015
ocated+0.014
room+0.014
kin+0.014
Neg logit contributions
 disl-0.013
 bra-0.012
 aware-0.012
 proud-0.012
But-0.012
Tokenmy
Token ID1820
Feature activation-6.94e-03

Pos logit contributions
robe+0.026
roller+0.024
room+0.023
ase+0.023
 ceiling+0.022
Neg logit contributions
 bra-0.029
 disl-0.028
 aware-0.028
 cautious-0.027
 consider-0.027
Token said
Token ID531
Feature activation-1.53e-02

Pos logit contributions
robe+0.014
roller+0.013
room+0.013
tered+0.013
ase+0.012
Neg logit contributions
 bra-0.006
 aware-0.006
 disl-0.006
 cautious-0.006
 questions-0.006
Token sorry
Token ID7926
Feature activation-1.04e-02

Pos logit contributions
robe+0.031
roller+0.030
room+0.029
 ceiling+0.028
ase+0.028
Neg logit contributions
 bra-0.015
 aware-0.014
 proud-0.013
 disl-0.013
 consider-0.013
Token and
Token ID290
Feature activation-8.31e-03

Pos logit contributions
robe+0.019
roller+0.018
room+0.017
 ceiling+0.017
tered+0.017
Neg logit contributions
 bra-0.012
 aware-0.012
 disl-0.011
 cautious-0.011
 consider-0.011
Token promised
Token ID8072
Feature activation-3.48e-02

Pos logit contributions
robe+0.017
roller+0.017
room+0.016
 ceiling+0.016
ork+0.016
Neg logit contributions
 bra-0.008
 aware-0.008
 consider-0.007
 proud-0.007
 cautious-0.007
Token to
Token ID284
Feature activation-7.15e-03

Pos logit contributions
robe+0.058
roller+0.054
room+0.053
 ceiling+0.052
ase+0.051
Neg logit contributions
 bra-0.046
 aware-0.046
 disl-0.043
 consider-0.042
 cautious-0.042
Token pack
Token ID2353
Feature activation-2.87e-02

Pos logit contributions
robe+0.011
roller+0.010
room+0.010
 ceiling+0.010
ork+0.009
Neg logit contributions
 bra-0.010
 aware-0.010
 cautious-0.009
 disl-0.009
 questions-0.009
Token her
Token ID607
Feature activation+5.16e-03

Pos logit contributions
roller+0.054
robe+0.051
ocated+0.049
room+0.048
 ceiling+0.048
Neg logit contributions
 bra-0.034
 questions-0.034
 proud-0.032
 cal-0.028
 brave-0.028
Token favorite
Token ID4004
Feature activation-1.45e-02

Pos logit contributions
 bra+0.010
 aware+0.010
 disl+0.009
 cautious+0.009
 consider+0.009
Neg logit contributions
robe-0.006
roller-0.005
room-0.005
 ceiling-0.005
ork-0.004
Token sandwich
Token ID20433
Feature activation-6.84e-03

Pos logit contributions
robe+0.017
roller+0.016
room+0.015
 ceiling+0.015
ocated+0.014
Neg logit contributions
 bra-0.025
 aware-0.025
 questions-0.024
 disl-0.024
 cautious-0.024
Token store
Token ID3650
Feature activation-1.48e-02

Pos logit contributions
robe+0.036
roller+0.034
 ceiling+0.034
room+0.033
ork+0.032
Neg logit contributions
 aware-0.039
 bra-0.039
 cautious-0.038
 disl-0.037
 questions-0.036
Token their
Token ID511
Feature activation+1.39e-02

Pos logit contributions
robe+0.023
roller+0.021
room+0.020
 ceiling+0.020
ork+0.020
Neg logit contributions
 bra-0.021
 aware-0.021
 cautious-0.020
 disl-0.020
 consider-0.019
Token honey
Token ID12498
Feature activation+8.46e-03

Pos logit contributions
 bra+0.019
 aware+0.019
 disl+0.018
 cautious+0.018
 consider+0.018
Neg logit contributions
robe-0.022
roller-0.021
 ceiling-0.020
room-0.020
ork-0.019
Token and
Token ID290
Feature activation-7.43e-03

Pos logit contributions
 bra+0.010
 aware+0.010
 disl+0.010
 cautious+0.010
 consider+0.009
Neg logit contributions
robe-0.015
roller-0.014
 ceiling-0.014
room-0.014
ork-0.013
Token take
Token ID1011
Feature activation-4.06e-03

Pos logit contributions
robe+0.011
 ceiling+0.010
roller+0.010
ork+0.009
room+0.009
Neg logit contributions
 bra-0.011
 aware-0.011
 cautious-0.010
Just-0.010
 impressed-0.010
Token care
Token ID1337
Feature activation-3.31e-02

Pos logit contributions
robe+0.007
roller+0.007
room+0.007
ase+0.007
 ceiling+0.006
Neg logit contributions
 bra-0.005
 questions-0.005
 aware-0.005
 brave-0.004
 proud-0.004
Token of
Token ID286
Feature activation-8.31e-03

Pos logit contributions
robe+0.046
roller+0.042
 ceiling+0.041
room+0.040
ork+0.040
Neg logit contributions
 bra-0.052
 aware-0.050
 disl-0.050
 cautious-0.049
 consider-0.047
Token their
Token ID511
Feature activation+1.43e-02

Pos logit contributions
robe+0.012
roller+0.011
room+0.011
 ceiling+0.011
ork+0.010
Neg logit contributions
 bra-0.013
 aware-0.013
 cautious-0.012
 disl-0.012
 consider-0.012
Token babies
Token ID11903
Feature activation+1.15e-02

Pos logit contributions
 bra+0.019
 aware+0.019
 consider+0.018
Just+0.018
 disl+0.018
Neg logit contributions
robe-0.023
 ceiling-0.022
roller-0.020
ork-0.020
ings-0.020
Token,"
Token ID553
Feature activation+1.86e-02

Pos logit contributions
 bra+0.013
 aware+0.013
 disl+0.012
 cautious+0.012
 trusts+0.012
Neg logit contributions
robe-0.021
roller-0.019
 ceiling-0.019
room-0.019
tered-0.018
Token mom
Token ID1995
Feature activation-1.87e-02

Pos logit contributions
 aware+0.034
 bra+0.034
 disl+0.033
 cautious+0.033
 consider+0.032
Neg logit contributions
robe-0.020
roller-0.018
room-0.018
 ceiling-0.018
Mine-0.016
Token it
Token ID340
Feature activation-6.63e-04

Pos logit contributions
robe+0.003
roller+0.003
ase+0.003
attered+0.002
room+0.002
Neg logit contributions
 bra-0.002
 questions-0.002
 aware-0.002
 disl-0.002
 consider-0.002
Token that
Token ID326
Feature activation+8.51e-03

Pos logit contributions
robe+0.001
roller+0.001
room+0.001
ase+0.001
ocated+0.001
Neg logit contributions
 aware-0.001
 bra-0.001
 questions-0.001
 cautious-0.001
 disl-0.001
Token this
Token ID428
Feature activation-2.48e-03

Pos logit contributions
 bra+0.013
 aware+0.013
 cautious+0.012
 questions+0.012
 disl+0.012
Neg logit contributions
robe-0.013
roller-0.012
room-0.011
ase-0.011
attered-0.011
Token was
Token ID373
Feature activation-1.19e-02

Pos logit contributions
robe+0.003
roller+0.003
room+0.003
 ceiling+0.003
ase+0.003
Neg logit contributions
 bra-0.004
 aware-0.004
 cautious-0.004
 disl-0.004
 consider-0.004
Token the
Token ID262
Feature activation-7.66e-03

Pos logit contributions
robe+0.018
roller+0.016
 ceiling+0.016
tered+0.016
ocated+0.015
Neg logit contributions
 aware-0.016
 bra-0.016
 disl-0.016
 trusts-0.015
 cautious-0.015
Token way
Token ID835
Feature activation-4.56e-02

Pos logit contributions
robe+0.011
 ceiling+0.010
roller+0.010
room+0.009
box+0.009
Neg logit contributions
 aware-0.012
 disl-0.012
 bra-0.011
smart-0.011
emy-0.011
Token things
Token ID1243
Feature activation-3.39e-03

Pos logit contributions
robe+0.080
roller+0.074
room+0.074
ase+0.073
tered+0.070
Neg logit contributions
 aware-0.054
 bra-0.054
 cautious-0.051
 consider-0.050
 questions-0.050
Token had
Token ID550
Feature activation-1.88e-02

Pos logit contributions
robe+0.005
roller+0.005
room+0.005
 ceiling+0.005
ase+0.005
Neg logit contributions
 bra-0.005
 aware-0.005
 disl-0.004
 cautious-0.004
 consider-0.004
Token to
Token ID284
Feature activation-9.37e-03

Pos logit contributions
robe+0.031
roller+0.029
room+0.028
ase+0.028
 ceiling+0.027
Neg logit contributions
 bra-0.025
 aware-0.025
 consider-0.024
 cautious-0.023
 questions-0.023
Token be
Token ID307
Feature activation+3.35e-03

Pos logit contributions
robe+0.015
roller+0.014
room+0.013
 ceiling+0.013
ork+0.013
Neg logit contributions
 bra-0.013
 aware-0.013
 cautious-0.012
 disl-0.012
 consider-0.012
Token and
Token ID290
Feature activation-3.66e-03

Pos logit contributions
 bra+0.004
 aware+0.004
 disl+0.004
 trusts+0.004
 questions+0.004
Neg logit contributions
robe-0.005
roller-0.005
room-0.005
 ceiling-0.005
tered-0.005
Token put
Token ID1234
Feature activation-1.05e-02

Pos logit contributions
robe+0.014
roller+0.014
ork+0.013
 ceiling+0.013
ings+0.013
Neg logit contributions
 aware-0.011
 bra-0.011
 proud-0.010
 tell-0.010
 consider-0.010
Token the
Token ID262
Feature activation-1.37e-02

Pos logit contributions
roller+0.025
ocated+0.023
ork+0.023
arded+0.022
case+0.022
Neg logit contributions
 proud-0.008
 bra-0.008
 questions-0.007
 brave-0.006
 then-0.006
Token blanket
Token ID18447
Feature activation-2.96e-02

Pos logit contributions
roller+0.017
room+0.016
attered+0.016
robe+0.016
ork+0.015
Neg logit contributions
 brave-0.023
 bra-0.023
 proud-0.022
 questions-0.022
 aware-0.022
Token on
Token ID319
Feature activation-2.21e-02

Pos logit contributions
Mine+0.060
roller+0.059
room+0.057
robe+0.057
ocated+0.057
Neg logit contributions
 proud-0.025
 bra-0.024
 aware-0.023
 brave-0.023
 cautious-0.021
Token the
Token ID262
Feature activation-1.83e-02

Pos logit contributions
robe+0.049
attered+0.048
ocated+0.048
roller+0.048
ase+0.046
Neg logit contributions
But-0.017
 bra-0.014
 then-0.014
 proud-0.013
 saying-0.013
Token side
Token ID1735
Feature activation-3.46e-02

Pos logit contributions
ings+0.021
robe+0.021
room+0.021
uttered+0.021
attered+0.021
Neg logit contributions
 bra-0.031
 brave-0.031
But-0.031
 proud-0.030
 questions-0.029
Token of
Token ID286
Feature activation-2.04e-02

Pos logit contributions
robe+0.074
Mine+0.074
ocated+0.073
tered+0.070
mail+0.067
Neg logit contributions
 bra-0.022
 proud-0.021
 saying-0.021
 aware-0.021
 questions-0.020
Token her
Token ID607
Feature activation-8.20e-03

Pos logit contributions
robe+0.053
attered+0.053
roller+0.052
ocated+0.051
t+0.049
Neg logit contributions
 saying-0.009
 questions-0.008
 bra-0.007
 then-0.007
But-0.007
Token bed
Token ID3996
Feature activation-1.46e-02

Pos logit contributions
robe+0.011
room+0.010
ocated+0.010
roller+0.010
tered+0.010
Neg logit contributions
 bra-0.013
 aware-0.013
 proud-0.012
 questions-0.012
 cautious-0.012
Token.
Token ID13
Feature activation-5.05e-03

Pos logit contributions
Mine+0.035
ocated+0.031
roller+0.031
robe+0.031
mail+0.030
Neg logit contributions
 proud-0.008
 bra-0.008
 aware-0.007
 confident-0.007
But-0.006
Token It
Token ID632
Feature activation-2.43e-02

Pos logit contributions
robe+0.009
kin+0.008
roller+0.008
ings+0.008
ase+0.008
Neg logit contributions
 disl-0.005
 bra-0.005
 proud-0.005
 questions-0.005
 aware-0.005