TOP ACTIVATIONS
MAX = 0.064

, the animals learned to always share and be kind to
Or we can ask your dad to make another one ?"
decided to use Sammy 's dad 's tool box to make
She did not want her dad to go away . She
his dog can do what he wants . Mom is very
She did not want her dad to see her face .
learned to share and to be careful . They had a
Sometimes Is la 's dad would play with her .
and he can do what he wants . M
Tim my and Billy would always go to the park and
everyone around him do what he said . One day ,
They learned to share and be kind . They were not
He learns to share and be nice . They play with
learned to share and to be kind . They had a
story is to share and be kind to others . <|endoftext|>
on , Sammy learned to always ask before taking something that
They learn to share and be good . Mom is proud
promised to always share and be good friends . <|endoftext|> Once
help him buy the things he needed . They said they
They learned that springs can be dangerous . They learned that

MIN ACTIVATIONS
MIN = -0.072

a strong lizard named T az . He lived happily among
" M az zy was so excited about
exclaimed . L izz ie smiled back . She
you do ?!" Sad ie quickly looked up ,
Clara stepped quietly so she wouldn 't scare it away .
leave , the old man wouldn 't let him . He
protect her . <|endoftext|> Pat ches the puppy was very hungry
. L izz ie and her sister were
ears anymore ! <|endoftext|> J os ie wanted to pack her
tray very carefully so he wouldn 't drop the food .
. L izz ie and her sister learned
string , so that he wouldn 't lose it again .
it away so Tim my wouldn 't launch it again .
the cabinet so Tim my wouldn 't eat it all before
. The elevator stopped and wouldn 't move . She was
" L izz ie nodded and said ,
One day , T az heard some voices coming from
it , so that she wouldn 't pull it too hard
ill . But the ant wouldn 't give up . The
in the closet so Lily wouldn 't see it anymore .

INTERVAL 0.030 - 0.064
CONTAINS 0.912%

and found all the tools he needed to fix the car
L ily asked her mom my , " Why is
to call her mom and dad , but they were in
fence . And they did not mean to upset the dog
They smile . They do not see the man . He

INTERVAL -0.004 - 0.030
CONTAINS 61.184%

and kept repeating the silly song . He was very excited
sad , but then the conductor came over and said ,
" What happened ?" Mom asked .
and held Jill 's hand . They sw am back to
me sad ." " Me too . I wish

INTERVAL -0.038 - -0.004
CONTAINS 37.637%

, the sky started to turn dark and the wind began
pillar ! He said hello to each one . They all
pretty . Lily felt proud and happy . She told them
her . She was so determined to get it , she
toilet , feeling very silly . She said , " I

INTERVAL -0.072 - -0.038
CONTAINS 0.267%

my was sad that he couldn 't step on the bug
. He had to z ig zag around and make big
. L ola started to record the pig
and tried , but she couldn 't do it .
office closed ?" she asked . " Because
Token,
Token ID11
Feature activation+1.15e-03

Pos logit contributions
 harmful+0.004
 charity+0.004
 wasteful+0.004
Or+0.004
 spies+0.004
Neg logit contributions
 couldn-0.002
field-0.002
 doorway-0.002
 comet-0.002
 night-0.002
Token the
Token ID262
Feature activation+4.27e-03

Pos logit contributions
 harmful+0.002
 wasteful+0.002
drivers+0.002
 charity+0.002
Or+0.002
Neg logit contributions
 couldn-0.001
 night-0.001
field-0.001
 comet-0.001
 Cloud-0.001
Token animals
Token ID4695
Feature activation-7.36e-03

Pos logit contributions
 harmful+0.009
 charity+0.008
 wasteful+0.008
Or+0.008
drivers+0.008
Neg logit contributions
 doorway-0.006
field-0.005
 comet-0.005
 doorstep-0.005
 couldn-0.005
Token learned
Token ID4499
Feature activation-7.45e-04

Pos logit contributions
 doorway+0.010
 doorstep+0.010
field+0.010
 comet+0.010
chair+0.010
Neg logit contributions
 harmful-0.014
 wasteful-0.013
Or-0.013
 charity-0.012
 spies-0.012
Token to
Token ID284
Feature activation+3.46e-02

Pos logit contributions
 doorway+0.001
 couldn+0.001
 comet+0.001
 doorstep+0.001
field+0.001
Neg logit contributions
 harmful-0.002
 wasteful-0.002
 charity-0.002
Or-0.002
drivers-0.002
Token always
Token ID1464
Feature activation+6.44e-02

Pos logit contributions
 harmful+0.057
 spies+0.053
drivers+0.053
Or+0.051
 wasteful+0.050
Neg logit contributions
 doorstep-0.055
 replied-0.054
 couldn-0.053
field-0.053
 night-0.052
Token share
Token ID2648
Feature activation+2.44e-02

Pos logit contributions
 harmful+0.117
drivers+0.114
istered+0.107
 spies+0.106
 wasteful+0.106
Neg logit contributions
 replied-0.086
 couldn-0.085
 night-0.084
 doorstep-0.084
field-0.084
Token and
Token ID290
Feature activation+3.65e-02

Pos logit contributions
drivers+0.057
 harmful+0.056
ief+0.056
 wasteful+0.054
iers+0.054
Neg logit contributions
 couldn-0.016
 night-0.016
 tightly-0.014
 comet-0.013
 was-0.013
Token be
Token ID307
Feature activation+5.71e-02

Pos logit contributions
 harmful+0.066
drivers+0.058
 wasteful+0.058
 charity+0.057
Or+0.057
Neg logit contributions
 couldn-0.054
 doorstep-0.053
 doorway-0.053
field-0.053
 comet-0.052
Token kind
Token ID1611
Feature activation+1.52e-02

Pos logit contributions
drivers+0.123
istered+0.116
renches+0.112
 harmful+0.111
 charity+0.111
Neg logit contributions
 waved-0.048
 cl-0.045
 was-0.045
 loved-0.042
 couldn-0.042
Token to
Token ID284
Feature activation+2.10e-02

Pos logit contributions
 harmful+0.035
 wasteful+0.031
 charity+0.031
Or+0.031
drivers+0.030
Neg logit contributions
 couldn-0.016
 doorstep-0.015
 doorway-0.015
 night-0.015
chair-0.015
TokenOr
Token ID5574
Feature activation+1.63e-02

Pos logit contributions
 harmful+0.007
 thieves+0.007
 wasteful+0.007
 dishonest+0.006
drivers+0.006
Neg logit contributions
field-0.005
chair-0.005
 comet-0.005
 sky-0.005
 doorstep-0.004
Token we
Token ID356
Feature activation-2.37e-03

Pos logit contributions
 harmful+0.029
 wasteful+0.024
cod+0.024
Or+0.023
 charity+0.023
Neg logit contributions
 couldn-0.026
 comet-0.025
chair-0.024
 night-0.024
 doorstep-0.024
Token can
Token ID460
Feature activation-1.36e-02

Pos logit contributions
 couldn+0.003
field+0.003
 doorstep+0.003
 comet+0.003
 doorway+0.003
Neg logit contributions
 harmful-0.004
 wasteful-0.004
 charity-0.004
Or-0.004
drivers-0.004
Token ask
Token ID1265
Feature activation+1.29e-02

Pos logit contributions
 doorway+0.022
field+0.022
 doorstep+0.021
 comet+0.021
 couldn+0.021
Neg logit contributions
 harmful-0.024
 wasteful-0.021
 charity-0.021
Or-0.020
 spies-0.020
Token your
Token ID534
Feature activation+1.19e-02

Pos logit contributions
 harmful+0.025
 wasteful+0.022
Or+0.022
drivers+0.022
 charity+0.022
Neg logit contributions
 doorstep-0.018
field-0.017
 comet-0.017
 doorway-0.017
 couldn-0.017
Token dad
Token ID9955
Feature activation+6.14e-02

Pos logit contributions
 harmful+0.024
 charity+0.021
 spies+0.021
 wasteful+0.021
Or+0.021
Neg logit contributions
 doorway-0.016
field-0.016
 doorstep-0.016
 comet-0.016
 couldn-0.015
Token to
Token ID284
Feature activation+1.32e-02

Pos logit contributions
 harmful+0.130
 wasteful+0.114
Or+0.113
drivers+0.111
 spies+0.110
Neg logit contributions
 couldn-0.075
 night-0.072
field-0.072
 doorstep-0.071
chair-0.068
Token make
Token ID787
Feature activation-6.01e-03

Pos logit contributions
 harmful+0.024
drivers+0.021
 wasteful+0.021
Or+0.021
 spies+0.020
Neg logit contributions
 doorstep-0.020
field-0.020
 comet-0.020
 doorway-0.020
 couldn-0.019
Token another
Token ID1194
Feature activation+2.02e-03

Pos logit contributions
 comet+0.009
 doorway+0.009
 doorstep+0.009
field+0.009
 night+0.009
Neg logit contributions
 harmful-0.011
Or-0.010
 wasteful-0.010
 charity-0.009
drivers-0.009
Token one
Token ID530
Feature activation+1.77e-03

Pos logit contributions
 harmful+0.004
 charity+0.003
 wasteful+0.003
 spies+0.003
Or+0.003
Neg logit contributions
 doorway-0.003
 doorstep-0.003
field-0.003
 comet-0.003
 couldn-0.003
Token?"
Token ID1701
Feature activation-1.11e-02

Pos logit contributions
 harmful+0.003
 charity+0.003
 wasteful+0.003
 spies+0.003
Or+0.003
Neg logit contributions
 doorway-0.003
 doorstep-0.003
field-0.003
 comet-0.003
 couldn-0.003
Token decided
Token ID3066
Feature activation+5.15e-03

Pos logit contributions
 harmful+0.000
 wasteful+0.000
 charity+0.000
Or+0.000
 spies+0.000
Neg logit contributions
 doorway-0.000
field-0.000
 doorstep-0.000
 comet-0.000
 couldn-0.000
Token to
Token ID284
Feature activation+2.73e-03

Pos logit contributions
 harmful+0.012
 wasteful+0.011
 charity+0.011
Or+0.011
drivers+0.011
Neg logit contributions
 couldn-0.005
 doorway-0.005
 comet-0.005
 doorstep-0.005
field-0.005
Token use
Token ID779
Feature activation+3.79e-03

Pos logit contributions
 harmful+0.005
 wasteful+0.004
 charity+0.004
Or+0.004
 spies+0.004
Neg logit contributions
 doorway-0.004
 doorstep-0.004
field-0.004
 comet-0.004
 couldn-0.004
Token Sammy
Token ID44067
Feature activation+1.59e-02

Pos logit contributions
 harmful+0.006
 wasteful+0.006
Or+0.006
 charity+0.005
 dishonest+0.005
Neg logit contributions
 doorway-0.006
 comet-0.006
 couldn-0.006
field-0.006
 night-0.006
Token's
Token ID338
Feature activation+5.96e-03

Pos logit contributions
 harmful+0.036
 wasteful+0.033
Or+0.033
 charity+0.032
 spies+0.031
Neg logit contributions
 couldn-0.017
field-0.016
 doorstep-0.016
 doorway-0.016
 night-0.016
Token dad
Token ID9955
Feature activation+6.02e-02

Pos logit contributions
 harmful+0.010
 wasteful+0.009
 charity+0.009
Or+0.009
 spies+0.009
Neg logit contributions
 doorway-0.010
 doorstep-0.009
 comet-0.009
field-0.009
 couldn-0.009
Token's
Token ID338
Feature activation+7.37e-03

Pos logit contributions
 harmful+0.152
Or+0.142
 wasteful+0.141
 spies+0.135
Like+0.134
Neg logit contributions
 couldn-0.052
 night-0.046
 doorstep-0.044
chair-0.042
field-0.042
Token tool
Token ID2891
Feature activation-1.89e-03

Pos logit contributions
 harmful+0.014
 wasteful+0.012
Or+0.012
 charity+0.012
 spies+0.012
Neg logit contributions
 doorway-0.011
 doorstep-0.011
 comet-0.011
field-0.011
 couldn-0.010
Token box
Token ID3091
Feature activation+1.36e-02

Pos logit contributions
 doorway+0.002
field+0.002
 doorstep+0.002
 couldn+0.002
 comet+0.002
Neg logit contributions
 harmful-0.004
 wasteful-0.003
 charity-0.003
Or-0.003
 spies-0.003
Token to
Token ID284
Feature activation+7.74e-03

Pos logit contributions
 harmful+0.029
 wasteful+0.026
 charity+0.025
Or+0.025
 spies+0.025
Neg logit contributions
 doorway-0.017
 couldn-0.017
field-0.017
 doorstep-0.017
chair-0.016
Token make
Token ID787
Feature activation+8.82e-04

Pos logit contributions
 harmful+0.015
drivers+0.014
 wasteful+0.014
Or+0.014
cod+0.013
Neg logit contributions
 doorstep-0.011
 couldn-0.010
 doorway-0.010
 comet-0.010
 night-0.010
Token She
Token ID1375
Feature activation+3.31e-04

Pos logit contributions
 harmful+0.002
 charity+0.002
 wasteful+0.002
Or+0.002
 spies+0.002
Neg logit contributions
 doorway-0.003
field-0.003
 doorstep-0.003
 comet-0.003
 couldn-0.003
Token did
Token ID750
Feature activation-5.59e-03

Pos logit contributions
 harmful+0.001
 wasteful+0.000
 dishonest+0.000
Or+0.000
 greedy+0.000
Neg logit contributions
field-0.001
 doorway-0.001
 comet-0.001
 sunrise-0.001
 doorstep-0.001
Token not
Token ID407
Feature activation+4.12e-02

Pos logit contributions
 couldn+0.009
 doorstep+0.009
 doorway+0.009
 comet+0.009
 night+0.009
Neg logit contributions
 harmful-0.009
drivers-0.008
Or-0.008
 charity-0.008
 wasteful-0.008
Token want
Token ID765
Feature activation+5.58e-03

Pos logit contributions
drivers+0.079
Or+0.074
 harmful+0.070
 charity+0.066
Like+0.065
Neg logit contributions
 night-0.064
night-0.054
 comet-0.054
 yawn-0.053
 couldn-0.053
Token her
Token ID607
Feature activation-1.61e-03

Pos logit contributions
 harmful+0.012
 wasteful+0.011
 charity+0.011
Or+0.011
 spies+0.011
Neg logit contributions
 doorway-0.006
field-0.006
 doorstep-0.006
 comet-0.006
 couldn-0.006
Token dad
Token ID9955
Feature activation+6.01e-02

Pos logit contributions
field+0.002
 doorway+0.002
 doorstep+0.002
 comet+0.002
 couldn+0.002
Neg logit contributions
 harmful-0.003
 charity-0.003
 spies-0.003
 wasteful-0.003
Or-0.003
Token to
Token ID284
Feature activation+1.05e-02

Pos logit contributions
 harmful+0.146
drivers+0.141
ief+0.138
Or+0.136
 wasteful+0.136
Neg logit contributions
 couldn-0.049
chair-0.045
 night-0.045
 doorstep-0.041
night-0.039
Token go
Token ID467
Feature activation+7.20e-03

Pos logit contributions
 harmful+0.021
drivers+0.021
Or+0.020
 wasteful+0.019
 spies+0.019
Neg logit contributions
 doorstep-0.014
 night-0.013
 couldn-0.013
 comet-0.013
field-0.012
Token away
Token ID1497
Feature activation-2.05e-03

Pos logit contributions
 harmful+0.018
 wasteful+0.017
Or+0.016
cod+0.016
drivers+0.016
Neg logit contributions
 night-0.006
 doorway-0.006
 doorstep-0.006
field-0.006
 couldn-0.005
Token.
Token ID13
Feature activation+1.29e-03

Pos logit contributions
 night+0.002
field+0.002
 doorway+0.002
 doorstep+0.002
 couldn+0.002
Neg logit contributions
 harmful-0.005
drivers-0.004
 wasteful-0.004
Or-0.004
ief-0.004
Token She
Token ID1375
Feature activation+1.18e-03

Pos logit contributions
 harmful+0.002
 wasteful+0.002
 charity+0.002
drivers+0.002
Or+0.002
Neg logit contributions
 doorway-0.002
 doorstep-0.002
field-0.002
 comet-0.002
 couldn-0.002
Token his
Token ID465
Feature activation+1.01e-02

Pos logit contributions
 couldn+0.005
chair+0.004
 comet+0.004
 night+0.004
 Cloud+0.004
Neg logit contributions
 harmful-0.009
 charity-0.008
Or-0.007
ief-0.007
Like-0.007
Token dog
Token ID3290
Feature activation+2.00e-02

Pos logit contributions
 harmful+0.019
 spies+0.017
 charity+0.017
Or+0.016
 wasteful+0.016
Neg logit contributions
field-0.015
 doorway-0.015
 comet-0.015
 doorstep-0.014
 awoke-0.014
Token can
Token ID460
Feature activation-2.27e-03

Pos logit contributions
 harmful+0.048
 charity+0.046
ief+0.045
 wasteful+0.044
drivers+0.044
Neg logit contributions
 couldn-0.017
 night-0.014
chair-0.013
night-0.013
 doorstep-0.013
Token do
Token ID466
Feature activation+1.30e-02

Pos logit contributions
 doorstep+0.003
field+0.003
 couldn+0.003
 night+0.003
 doorway+0.003
Neg logit contributions
 harmful-0.004
 wasteful-0.004
 spies-0.004
drivers-0.004
 charity-0.004
Token what
Token ID644
Feature activation+2.27e-02

Pos logit contributions
 harmful+0.022
Or+0.022
drivers+0.021
 charity+0.021
 wasteful+0.020
Neg logit contributions
 couldn-0.018
 night-0.018
 comet-0.016
chair-0.016
night-0.016
Token he
Token ID339
Feature activation+5.99e-02

Pos logit contributions
 harmful+0.041
 wasteful+0.035
 charity+0.035
Or+0.035
drivers+0.034
Neg logit contributions
 doorway-0.035
 comet-0.035
field-0.035
 doorstep-0.034
 couldn-0.034
Token wants
Token ID3382
Feature activation+1.58e-02

Pos logit contributions
 harmful+0.114
 charity+0.106
drivers+0.102
 wasteful+0.101
 spies+0.101
Neg logit contributions
 couldn-0.084
 night-0.078
 doorstep-0.075
night-0.075
field-0.075
Token.
Token ID13
Feature activation+5.82e-03

Pos logit contributions
 harmful+0.034
 wasteful+0.031
Or+0.030
drivers+0.030
 charity+0.030
Neg logit contributions
 couldn-0.018
 night-0.018
 doorstep-0.017
chair-0.017
 doorway-0.017
Token Mom
Token ID11254
Feature activation+8.38e-03

Pos logit contributions
 harmful+0.012
 charity+0.011
 wasteful+0.011
drivers+0.011
Or+0.011
Neg logit contributions
 doorway-0.007
chair-0.007
 couldn-0.007
field-0.007
 comet-0.007
Token is
Token ID318
Feature activation+8.28e-03

Pos logit contributions
 harmful+0.015
 wasteful+0.013
Or+0.013
drivers+0.013
 spies+0.013
Neg logit contributions
field-0.013
 doorway-0.013
 comet-0.013
 doorstep-0.013
 couldn-0.012
Token very
Token ID845
Feature activation+1.99e-02

Pos logit contributions
 harmful+0.013
drivers+0.013
 charity+0.012
Or+0.012
ief+0.011
Neg logit contributions
 doorway-0.013
 couldn-0.012
 night-0.012
 comet-0.012
 doorstep-0.012
Token She
Token ID1375
Feature activation+3.90e-03

Pos logit contributions
 doorway+0.001
 couldn+0.001
chair+0.001
 doorstep+0.001
 Cloud+0.001
Neg logit contributions
 harmful-0.001
 wasteful-0.001
cod-0.001
 charity-0.001
ief-0.001
Token did
Token ID750
Feature activation-1.91e-03

Pos logit contributions
 harmful+0.006
Or+0.006
 wasteful+0.005
 dishonest+0.005
 spies+0.005
Neg logit contributions
field-0.007
 doorstep-0.007
 comet-0.007
 doorway-0.007
 sunrise-0.007
Token not
Token ID407
Feature activation+3.00e-02

Pos logit contributions
 couldn+0.003
 doorway+0.003
 doorstep+0.003
 comet+0.003
 night+0.003
Neg logit contributions
 harmful-0.003
drivers-0.003
 wasteful-0.003
 charity-0.003
Or-0.003
Token want
Token ID765
Feature activation+8.39e-03

Pos logit contributions
 harmful+0.054
drivers+0.054
Or+0.053
 wasteful+0.051
 charity+0.050
Neg logit contributions
 night-0.044
 comet-0.040
 yawn-0.039
night-0.037
 awoke-0.037
Token her
Token ID607
Feature activation+3.53e-03

Pos logit contributions
 harmful+0.019
 wasteful+0.017
 charity+0.017
Or+0.017
 spies+0.016
Neg logit contributions
 doorway-0.009
 doorstep-0.009
 comet-0.009
field-0.009
 couldn-0.009
Token dad
Token ID9955
Feature activation+5.95e-02

Pos logit contributions
 harmful+0.007
 spies+0.006
 charity+0.006
 wasteful+0.006
Or+0.006
Neg logit contributions
field-0.005
 doorway-0.005
 comet-0.005
 doorstep-0.005
 couldn-0.004
Token to
Token ID284
Feature activation+1.01e-02

Pos logit contributions
 harmful+0.143
drivers+0.136
 wasteful+0.134
Or+0.134
ief+0.133
Neg logit contributions
 couldn-0.048
chair-0.047
 night-0.046
 doorstep-0.044
 awoke-0.041
Token see
Token ID766
Feature activation+1.87e-02

Pos logit contributions
 harmful+0.022
drivers+0.021
 wasteful+0.021
Or+0.020
 dishonest+0.019
Neg logit contributions
 doorstep-0.012
 night-0.012
 comet-0.011
 couldn-0.011
 cloud-0.010
Token her
Token ID607
Feature activation+4.88e-03

Pos logit contributions
 harmful+0.034
Or+0.030
 wasteful+0.030
drivers+0.030
 charity+0.029
Neg logit contributions
 night-0.028
 doorway-0.028
 comet-0.028
chair-0.027
 couldn-0.027
Token face
Token ID1986
Feature activation+1.56e-02

Pos logit contributions
 harmful+0.009
 charity+0.008
 wasteful+0.008
 spies+0.008
Or+0.008
Neg logit contributions
 doorway-0.008
field-0.007
 doorstep-0.007
 comet-0.007
 couldn-0.007
Token.
Token ID13
Feature activation-7.46e-04

Pos logit contributions
 harmful+0.032
 charity+0.030
drivers+0.030
 wasteful+0.030
Or+0.029
Neg logit contributions
 couldn-0.019
 comet-0.019
chair-0.018
 doorstep-0.018
 night-0.018
Token learned
Token ID4499
Feature activation-7.20e-03

Pos logit contributions
 harmful+0.005
Or+0.005
 wasteful+0.005
 charity+0.005
 spies+0.005
Neg logit contributions
 doorway-0.004
 doorstep-0.004
 comet-0.004
field-0.004
chair-0.004
Token to
Token ID284
Feature activation+2.90e-02

Pos logit contributions
 doorway+0.012
field+0.012
 doorstep+0.011
 comet+0.011
 couldn+0.011
Neg logit contributions
 harmful-0.013
 charity-0.011
 spies-0.011
 wasteful-0.011
Or-0.011
Token share
Token ID2648
Feature activation+1.39e-02

Pos logit contributions
 harmful+0.056
 wasteful+0.051
 spies+0.050
drivers+0.050
 dishonest+0.048
Neg logit contributions
field-0.041
 doorstep-0.040
 night-0.039
 comet-0.038
 doorway-0.038
Token and
Token ID290
Feature activation+4.33e-02

Pos logit contributions
 harmful+0.033
drivers+0.031
 wasteful+0.030
ief+0.030
 dishonest+0.030
Neg logit contributions
 comet-0.012
 couldn-0.012
 night-0.012
field-0.011
 doorstep-0.011
Token to
Token ID284
Feature activation+1.89e-02

Pos logit contributions
 harmful+0.077
 wasteful+0.068
 charity+0.066
drivers+0.066
Or+0.066
Neg logit contributions
 doorway-0.067
field-0.067
 doorstep-0.067
 comet-0.066
 couldn-0.065
Token be
Token ID307
Feature activation+5.93e-02

Pos logit contributions
 harmful+0.036
drivers+0.033
 spies+0.032
 wasteful+0.032
 dishonest+0.031
Neg logit contributions
 night-0.027
field-0.026
 couldn-0.026
 comet-0.026
 doorstep-0.026
Token careful
Token ID8161
Feature activation-2.95e-03

Pos logit contributions
drivers+0.119
 harmful+0.112
renches+0.106
 charity+0.106
ief+0.105
Neg logit contributions
 waved-0.070
 night-0.068
 morning-0.063
 cl-0.063
 began-0.058
Token.
Token ID13
Feature activation+1.06e-02

Pos logit contributions
 doorstep+0.004
 couldn+0.004
 doorway+0.004
field+0.004
 comet+0.004
Neg logit contributions
 harmful-0.006
 wasteful-0.005
 charity-0.005
Or-0.005
drivers-0.005
Token They
Token ID1119
Feature activation+7.83e-03

Pos logit contributions
 harmful+0.017
 wasteful+0.015
 charity+0.015
Or+0.015
 spies+0.015
Neg logit contributions
 doorway-0.018
field-0.018
 doorstep-0.018
 comet-0.018
 couldn-0.017
Token had
Token ID550
Feature activation+2.39e-02

Pos logit contributions
 harmful+0.014
Or+0.013
 wasteful+0.012
 charity+0.012
 spies+0.012
Neg logit contributions
 doorway-0.012
 doorstep-0.012
 comet-0.012
field-0.012
chair-0.011
Token a
Token ID257
Feature activation+1.40e-02

Pos logit contributions
 harmful+0.035
 charity+0.030
 wasteful+0.030
drivers+0.029
Or+0.029
Neg logit contributions
 couldn-0.044
 doorway-0.043
field-0.043
 night-0.043
 doorstep-0.042
Token
Token ID198
Feature activation+2.21e-02

Pos logit contributions
 doorway+0.001
 comet+0.001
 doorstep+0.001
field+0.001
 couldn+0.001
Neg logit contributions
 harmful-0.001
 charity-0.001
Or-0.001
 wasteful-0.001
drivers-0.001
TokenSometimes
Token ID15468
Feature activation+2.41e-02

Pos logit contributions
 harmful+0.035
 wasteful+0.030
drivers+0.029
 spies+0.029
 charity+0.028
Neg logit contributions
field-0.039
 doorway-0.039
 doorstep-0.039
 comet-0.038
 couldn-0.038
Token Is
Token ID1148
Feature activation-1.62e-02

Pos logit contributions
 harmful+0.049
 charity+0.044
 wasteful+0.044
drivers+0.044
Or+0.043
Neg logit contributions
 comet-0.032
 couldn-0.032
 night-0.031
field-0.031
 doorway-0.030
Tokenla
Token ID5031
Feature activation+2.35e-03

Pos logit contributions
 doorway+0.022
field+0.022
 doorstep+0.022
 comet+0.021
 couldn+0.020
Neg logit contributions
 harmful-0.032
 wasteful-0.029
 charity-0.028
 spies-0.028
Or-0.028
Token's
Token ID338
Feature activation+1.05e-02

Pos logit contributions
 harmful+0.005
 charity+0.005
 wasteful+0.005
 spies+0.005
drivers+0.005
Neg logit contributions
 couldn-0.003
field-0.002
 comet-0.002
 doorstep-0.002
 doorway-0.002
Token dad
Token ID9955
Feature activation+5.84e-02

Pos logit contributions
 harmful+0.021
 wasteful+0.018
 charity+0.018
Or+0.018
drivers+0.018
Neg logit contributions
 doorway-0.015
 comet-0.014
 doorstep-0.014
field-0.014
 couldn-0.014
Token would
Token ID561
Feature activation+3.52e-03

Pos logit contributions
 harmful+0.125
drivers+0.114
 spies+0.113
 charity+0.113
 wasteful+0.113
Neg logit contributions
 couldn-0.072
field-0.065
 night-0.062
 doorstep-0.062
chair-0.061
Token play
Token ID711
Feature activation+1.98e-02

Pos logit contributions
 harmful+0.007
drivers+0.006
 wasteful+0.006
 spies+0.006
Or+0.006
Neg logit contributions
field-0.005
 doorstep-0.005
 comet-0.005
 couldn-0.005
 doorway-0.005
Token with
Token ID351
Feature activation+6.46e-03

Pos logit contributions
 harmful+0.045
 wasteful+0.042
drivers+0.042
istered+0.041
 dishonest+0.039
Neg logit contributions
 night-0.019
chair-0.018
 comet-0.018
 morning-0.017
 couldn-0.017
Token her
Token ID607
Feature activation+1.06e-02

Pos logit contributions
 harmful+0.012
 wasteful+0.010
Or+0.010
 charity+0.010
drivers+0.009
Neg logit contributions
 comet-0.010
 doorway-0.010
 couldn-0.010
 night-0.010
field-0.010
Token.
Token ID13
Feature activation+7.22e-03

Pos logit contributions
 harmful+0.020
 charity+0.018
 wasteful+0.017
Or+0.017
 spies+0.017
Neg logit contributions
 doorway-0.015
field-0.015
 doorstep-0.015
 comet-0.015
 couldn-0.015
Token and
Token ID290
Feature activation+4.25e-02

Pos logit contributions
 harmful+0.046
 charity+0.044
ief+0.043
 wasteful+0.042
cod+0.041
Neg logit contributions
 night-0.020
chair-0.018
 couldn-0.017
field-0.017
night-0.016
Token he
Token ID339
Feature activation+2.53e-02

Pos logit contributions
 charity+0.063
 harmful+0.062
drivers+0.061
ief+0.059
 groups+0.057
Neg logit contributions
 couldn-0.072
chair-0.066
 night-0.066
 replied-0.063
 comet-0.061
Token can
Token ID460
Feature activation-1.23e-02

Pos logit contributions
 harmful+0.044
 wasteful+0.038
Or+0.037
 spies+0.036
drivers+0.036
Neg logit contributions
 doorway-0.043
 doorstep-0.042
field-0.042
 comet-0.041
 couldn-0.038
Token do
Token ID466
Feature activation+1.21e-02

Pos logit contributions
 doorway+0.018
field+0.018
 doorstep+0.018
 comet+0.018
 couldn+0.018
Neg logit contributions
 harmful-0.023
 wasteful-0.020
 charity-0.020
Or-0.020
 spies-0.019
Token what
Token ID644
Feature activation+2.35e-02

Pos logit contributions
 harmful+0.021
 charity+0.020
Or+0.020
drivers+0.019
 spies+0.019
Neg logit contributions
 couldn-0.017
 night-0.016
chair-0.015
 comet-0.014
 morning-0.014
Token he
Token ID339
Feature activation+5.84e-02

Pos logit contributions
 harmful+0.043
 charity+0.038
 wasteful+0.038
Or+0.037
drivers+0.037
Neg logit contributions
 comet-0.035
 couldn-0.034
 doorway-0.034
field-0.034
chair-0.034
Token wants
Token ID3382
Feature activation+1.47e-02

Pos logit contributions
 harmful+0.107
 charity+0.101
drivers+0.097
Or+0.095
 spies+0.094
Neg logit contributions
 couldn-0.086
 night-0.081
night-0.077
 replied-0.077
chair-0.077
Token.
Token ID13
Feature activation+7.49e-03

Pos logit contributions
 harmful+0.032
 wasteful+0.029
Or+0.029
 charity+0.028
drivers+0.028
Neg logit contributions
 couldn-0.017
 night-0.017
chair-0.016
 doorstep-0.015
 comet-0.015
Token
Token ID198
Feature activation-2.10e-03

Pos logit contributions
 harmful+0.015
 charity+0.013
 wasteful+0.013
drivers+0.013
Or+0.013
Neg logit contributions
 doorway-0.010
field-0.010
chair-0.010
 comet-0.010
 doorstep-0.010
Token
Token ID198
Feature activation+2.84e-02

Pos logit contributions
 doorway+0.003
 doorstep+0.003
field+0.003
 comet+0.003
 couldn+0.003
Neg logit contributions
 harmful-0.004
 charity-0.004
 wasteful-0.004
Or-0.004
 spies-0.004
TokenM
Token ID44
Feature activation+1.28e-02

Pos logit contributions
 harmful+0.045
drivers+0.038
 charity+0.038
 spies+0.037
 wasteful+0.036
Neg logit contributions
chair-0.050
 doorway-0.050
field-0.049
 night-0.048
 comet-0.048
Token Tim
Token ID5045
Feature activation+2.41e-04

Pos logit contributions
 harmful+0.022
 wasteful+0.020
drivers+0.019
 dishonest+0.019
Or+0.019
Neg logit contributions
 couldn-0.018
 night-0.018
 comet-0.017
field-0.017
chair-0.017
Tokenmy
Token ID1820
Feature activation-3.75e-03

Pos logit contributions
 harmful+0.000
 wasteful+0.000
Or+0.000
 charity+0.000
 spies+0.000
Neg logit contributions
 doorway-0.000
 doorstep-0.000
field-0.000
 comet-0.000
chair-0.000
Token and
Token ID290
Feature activation+7.57e-03

Pos logit contributions
 doorway+0.005
 doorstep+0.005
 comet+0.005
field+0.005
chair+0.005
Neg logit contributions
 harmful-0.007
Or-0.007
 wasteful-0.007
drivers-0.006
 spies-0.006
Token Billy
Token ID15890
Feature activation+1.92e-02

Pos logit contributions
 harmful+0.012
 wasteful+0.010
 charity+0.010
Or+0.010
drivers+0.009
Neg logit contributions
 doorway-0.014
 comet-0.014
field-0.014
 couldn-0.014
 doorstep-0.014
Token would
Token ID561
Feature activation+5.68e-03

Pos logit contributions
 harmful+0.043
 charity+0.038
 wasteful+0.038
 spies+0.038
drivers+0.038
Neg logit contributions
 couldn-0.022
field-0.022
 doorstep-0.021
 comet-0.021
 doorway-0.021
Token always
Token ID1464
Feature activation+5.83e-02

Pos logit contributions
 harmful+0.011
 charity+0.009
Or+0.009
 wasteful+0.009
 spies+0.009
Neg logit contributions
 doorway-0.009
 doorstep-0.008
 comet-0.008
field-0.008
chair-0.008
Token go
Token ID467
Feature activation+2.54e-02

Pos logit contributions
 harmful+0.112
drivers+0.104
 wasteful+0.102
 spies+0.102
 charity+0.099
Neg logit contributions
field-0.080
 night-0.078
 doorstep-0.077
 couldn-0.077
 comet-0.075
Token to
Token ID284
Feature activation+9.14e-03

Pos logit contributions
 harmful+0.066
 wasteful+0.061
Or+0.060
 charity+0.059
 dishonest+0.058
Neg logit contributions
 night-0.020
field-0.019
 doorway-0.019
 doorstep-0.019
 morning-0.017
Token the
Token ID262
Feature activation-3.11e-03

Pos logit contributions
 harmful+0.019
drivers+0.017
Or+0.017
 wasteful+0.017
 charity+0.017
Neg logit contributions
field-0.011
 night-0.011
 doorway-0.011
 doorstep-0.011
 comet-0.011
Token park
Token ID3952
Feature activation+1.97e-02

Pos logit contributions
 doorway+0.004
field+0.004
 comet+0.004
 doorstep+0.004
 couldn+0.004
Neg logit contributions
 harmful-0.006
 charity-0.005
 wasteful-0.005
Or-0.005
 spies-0.005
Token and
Token ID290
Feature activation+1.22e-02

Pos logit contributions
 harmful+0.041
 wasteful+0.038
drivers+0.036
 charity+0.036
 dishonest+0.035
Neg logit contributions
field-0.025
 doorstep-0.025
 doorway-0.024
 couldn-0.024
 night-0.024
Token everyone
Token ID2506
Feature activation+2.53e-02

Pos logit contributions
 doorway+0.015
field+0.015
 doorstep+0.015
 comet+0.014
 couldn+0.014
Neg logit contributions
 harmful-0.015
 wasteful-0.013
 charity-0.013
Or-0.013
 spies-0.013
Token around
Token ID1088
Feature activation+5.34e-03

Pos logit contributions
 harmful+0.060
 wasteful+0.054
 charity+0.054
Or+0.054
 spies+0.053
Neg logit contributions
 doorway-0.025
field-0.024
 doorstep-0.024
 comet-0.024
 couldn-0.023
Token him
Token ID683
Feature activation+1.78e-02

Pos logit contributions
 harmful+0.010
 wasteful+0.009
drivers+0.009
Or+0.009
 charity+0.009
Neg logit contributions
 night-0.008
 couldn-0.008
 comet-0.008
field-0.008
 doorway-0.007
Token do
Token ID466
Feature activation+1.61e-02

Pos logit contributions
 harmful+0.037
 wasteful+0.033
 charity+0.033
Or+0.032
 spies+0.032
Neg logit contributions
 doorway-0.023
field-0.022
 doorstep-0.022
 comet-0.022
 couldn-0.021
Token what
Token ID644
Feature activation+1.20e-02

Pos logit contributions
 harmful+0.032
 wasteful+0.029
Or+0.029
drivers+0.028
 spies+0.028
Neg logit contributions
 couldn-0.021
field-0.020
 doorstep-0.020
 night-0.020
 doorway-0.019
Token he
Token ID339
Feature activation+5.80e-02

Pos logit contributions
 harmful+0.021
 wasteful+0.018
 charity+0.018
Or+0.018
 spies+0.018
Neg logit contributions
 doorway-0.019
field-0.019
 doorstep-0.019
 comet-0.019
 couldn-0.018
Token said
Token ID531
Feature activation+1.20e-03

Pos logit contributions
 harmful+0.120
 wasteful+0.106
 charity+0.106
Or+0.105
 spies+0.104
Neg logit contributions
 doorway-0.075
field-0.074
 doorstep-0.073
 comet-0.073
 couldn-0.071
Token.
Token ID13
Feature activation+5.18e-03

Pos logit contributions
 harmful+0.002
 charity+0.002
 wasteful+0.002
 spies+0.002
Or+0.002
Neg logit contributions
 doorway-0.002
field-0.002
 doorstep-0.002
 comet-0.002
 couldn-0.002
Token One
Token ID1881
Feature activation+2.08e-03

Pos logit contributions
 harmful+0.009
 wasteful+0.008
 charity+0.008
Or+0.008
 spies+0.008
Neg logit contributions
 doorway-0.008
field-0.008
 doorstep-0.008
 comet-0.008
 couldn-0.008
Token day
Token ID1110
Feature activation+1.86e-02

Pos logit contributions
 harmful+0.005
 charity+0.004
 wasteful+0.004
 spies+0.004
Or+0.004
Neg logit contributions
 doorstep-0.003
 doorway-0.002
field-0.002
 comet-0.002
 couldn-0.002
Token,
Token ID11
Feature activation+4.74e-03

Pos logit contributions
 harmful+0.037
 wasteful+0.033
 charity+0.032
Or+0.032
 spies+0.032
Neg logit contributions
 doorway-0.025
 comet-0.025
field-0.025
 doorstep-0.025
 couldn-0.025
Token They
Token ID1119
Feature activation+1.01e-02

Pos logit contributions
 harmful+0.034
 wasteful+0.028
 charity+0.027
drivers+0.027
Or+0.027
Neg logit contributions
 doorway-0.041
 doorstep-0.041
field-0.041
 couldn-0.041
 comet-0.041
Token learned
Token ID4499
Feature activation-8.45e-03

Pos logit contributions
 harmful+0.020
 wasteful+0.017
Or+0.017
 charity+0.017
 spies+0.017
Neg logit contributions
 doorway-0.014
 doorstep-0.014
field-0.014
 comet-0.014
chair-0.013
Token to
Token ID284
Feature activation+3.40e-02

Pos logit contributions
 doorway+0.012
field+0.012
 doorstep+0.012
 comet+0.012
 couldn+0.011
Neg logit contributions
 harmful-0.016
 charity-0.015
 wasteful-0.014
 spies-0.014
Or-0.014
Token share
Token ID2648
Feature activation+1.20e-02

Pos logit contributions
 harmful+0.065
 spies+0.060
drivers+0.059
 wasteful+0.058
 thieves+0.057
Neg logit contributions
 doorstep-0.046
field-0.046
 night-0.046
 replied-0.044
 couldn-0.043
Token and
Token ID290
Feature activation+4.48e-02

Pos logit contributions
 harmful+0.031
drivers+0.029
ief+0.028
 wasteful+0.028
 dishonest+0.028
Neg logit contributions
 comet-0.008
 night-0.008
 couldn-0.008
 doorstep-0.007
field-0.007
Token be
Token ID307
Feature activation+5.80e-02

Pos logit contributions
 harmful+0.080
 wasteful+0.070
drivers+0.069
 charity+0.068
Or+0.068
Neg logit contributions
 doorstep-0.068
 doorway-0.068
field-0.068
 couldn-0.067
 comet-0.067
Token kind
Token ID1611
Feature activation+1.49e-02

Pos logit contributions
drivers+0.119
 harmful+0.117
asy+0.111
renches+0.111
 wasteful+0.107
Neg logit contributions
 waved-0.060
 cl-0.058
 night-0.057
 day-0.050
 morning-0.050
Token.
Token ID13
Feature activation+1.82e-02

Pos logit contributions
 harmful+0.033
 wasteful+0.030
 charity+0.029
Or+0.029
drivers+0.029
Neg logit contributions
 couldn-0.016
 doorstep-0.016
 doorway-0.016
chair-0.016
 night-0.016
Token They
Token ID1119
Feature activation+1.31e-02

Pos logit contributions
 harmful+0.030
 wasteful+0.025
drivers+0.025
ief+0.024
 thieves+0.024
Neg logit contributions
 couldn-0.031
field-0.031
 doorway-0.031
 night-0.031
 doorstep-0.030
Token were
Token ID547
Feature activation+1.19e-02

Pos logit contributions
 harmful+0.025
 wasteful+0.022
Or+0.022
 charity+0.022
 spies+0.021
Neg logit contributions
 doorway-0.019
 doorstep-0.019
field-0.019
 comet-0.019
chair-0.018
Token not
Token ID407
Feature activation+3.05e-02

Pos logit contributions
 harmful+0.020
drivers+0.019
 charity+0.018
Or+0.017
 wasteful+0.017
Neg logit contributions
 couldn-0.019
 comet-0.018
 night-0.018
 doorway-0.018
field-0.018
Token He
Token ID679
Feature activation+1.10e-02

Pos logit contributions
 harmful+0.006
 wasteful+0.005
 charity+0.005
Or+0.005
 spies+0.005
Neg logit contributions
 doorway-0.007
field-0.007
 doorstep-0.007
 comet-0.007
 couldn-0.007
Token learns
Token ID22974
Feature activation-9.65e-04

Pos logit contributions
 harmful+0.018
Or+0.016
 spies+0.016
 wasteful+0.016
 charity+0.016
Neg logit contributions
 doorway-0.018
 doorstep-0.018
field-0.018
 comet-0.018
 couldn-0.017
Token to
Token ID284
Feature activation+2.16e-02

Pos logit contributions
 doorway+0.001
 comet+0.001
 doorstep+0.001
 couldn+0.001
chair+0.001
Neg logit contributions
 harmful-0.002
 wasteful-0.002
Or-0.002
 charity-0.002
ief-0.002
Token share
Token ID2648
Feature activation+1.27e-02

Pos logit contributions
 harmful+0.040
drivers+0.037
 wasteful+0.036
 spies+0.036
Or+0.035
Neg logit contributions
 night-0.031
field-0.030
 doorstep-0.030
 comet-0.030
 replied-0.030
Token and
Token ID290
Feature activation+4.04e-02

Pos logit contributions
 harmful+0.034
drivers+0.032
 wasteful+0.031
ief+0.031
 dishonest+0.030
Neg logit contributions
 night-0.007
 couldn-0.007
 comet-0.007
chair-0.006
 sky-0.006
Token be
Token ID307
Feature activation+5.80e-02

Pos logit contributions
 harmful+0.074
 wasteful+0.066
drivers+0.065
 charity+0.062
 spies+0.062
Neg logit contributions
 doorstep-0.058
 doorway-0.058
field-0.058
 couldn-0.057
 night-0.057
Token nice
Token ID3621
Feature activation+1.74e-02

Pos logit contributions
drivers+0.126
 harmful+0.121
istered+0.118
renches+0.118
assing+0.116
Neg logit contributions
 waved-0.053
 night-0.049
 cl-0.048
 was-0.046
 morning-0.045
Token.
Token ID13
Feature activation+1.23e-02

Pos logit contributions
 harmful+0.039
 wasteful+0.035
drivers+0.035
ief+0.034
Or+0.034
Neg logit contributions
 couldn-0.018
 night-0.018
 doorstep-0.018
chair-0.017
 doorway-0.017
Token They
Token ID1119
Feature activation+1.05e-02

Pos logit contributions
 harmful+0.025
 wasteful+0.022
ief+0.021
ts+0.021
Or+0.021
Neg logit contributions
 doorway-0.017
chair-0.016
 couldn-0.016
 night-0.015
field-0.015
Token play
Token ID711
Feature activation+1.43e-02

Pos logit contributions
 harmful+0.018
 wasteful+0.015
 charity+0.015
Or+0.015
 spies+0.015
Neg logit contributions
 doorway-0.018
field-0.017
 doorstep-0.017
 comet-0.017
 couldn-0.017
Token with
Token ID351
Feature activation+1.25e-02

Pos logit contributions
 harmful+0.030
drivers+0.027
 wasteful+0.026
 dishonest+0.025
 thieves+0.025
Neg logit contributions
 night-0.017
 couldn-0.016
 doorstep-0.016
 doorway-0.016
night-0.016
Token learned
Token ID4499
Feature activation-3.80e-03

Pos logit contributions
 harmful+0.022
 charity+0.020
 wasteful+0.020
drivers+0.019
 spies+0.019
Neg logit contributions
field-0.013
 doorway-0.013
 comet-0.013
 doorstep-0.013
 couldn-0.013
Token to
Token ID284
Feature activation+2.99e-02

Pos logit contributions
 doorway+0.005
field+0.005
 doorstep+0.005
 comet+0.005
 couldn+0.005
Neg logit contributions
 harmful-0.008
 wasteful-0.007
 charity-0.007
Or-0.007
 spies-0.007
Token share
Token ID2648
Feature activation+1.72e-02

Pos logit contributions
 harmful+0.055
drivers+0.053
 spies+0.051
 wasteful+0.049
 dishonest+0.048
Neg logit contributions
field-0.043
 doorstep-0.042
 night-0.041
 couldn-0.039
 replied-0.039
Token and
Token ID290
Feature activation+4.19e-02

Pos logit contributions
drivers+0.045
 harmful+0.043
ief+0.043
 wasteful+0.041
 dishonest+0.040
Neg logit contributions
 night-0.010
 comet-0.009
 couldn-0.009
field-0.009
 sunrise-0.008
Token to
Token ID284
Feature activation+2.26e-02

Pos logit contributions
 harmful+0.075
 wasteful+0.066
drivers+0.066
 charity+0.064
 spies+0.064
Neg logit contributions
field-0.063
 doorstep-0.063
 doorway-0.063
 couldn-0.063
 comet-0.061
Token be
Token ID307
Feature activation+5.79e-02

Pos logit contributions
drivers+0.040
 harmful+0.039
iers+0.037
 spies+0.037
 thieves+0.036
Neg logit contributions
 night-0.033
 couldn-0.032
field-0.031
 doorstep-0.030
 comet-0.030
Token kind
Token ID1611
Feature activation+1.71e-02

Pos logit contributions
drivers+0.132
ief+0.116
renches+0.115
 charity+0.113
 harmful+0.110
Neg logit contributions
 night-0.056
 waved-0.055
 morning-0.050
 cl-0.050
 day-0.049
Token.
Token ID13
Feature activation+1.32e-02

Pos logit contributions
 harmful+0.038
 wasteful+0.034
 charity+0.033
drivers+0.033
ief+0.033
Neg logit contributions
 couldn-0.018
 doorstep-0.018
 night-0.018
chair-0.018
field-0.018
Token They
Token ID1119
Feature activation+1.23e-02

Pos logit contributions
 harmful+0.026
drivers+0.023
ief+0.023
 wasteful+0.023
ts+0.022
Neg logit contributions
 night-0.018
 sky-0.018
field-0.018
 doorway-0.018
 couldn-0.018
Token had
Token ID550
Feature activation+2.39e-02

Pos logit contributions
 harmful+0.025
 wasteful+0.022
 charity+0.022
 spies+0.021
drivers+0.021
Neg logit contributions
field-0.016
 doorway-0.016
 doorstep-0.016
 comet-0.016
 couldn-0.016
Token a
Token ID257
Feature activation+1.13e-02

Pos logit contributions
 harmful+0.041
drivers+0.038
 charity+0.037
ief+0.037
 wasteful+0.036
Neg logit contributions
 couldn-0.038
 night-0.036
field-0.035
 doorway-0.034
 doorstep-0.034
Token story
Token ID1621
Feature activation+4.08e-03

Pos logit contributions
 harmful+0.015
 charity+0.013
 wasteful+0.013
 spies+0.013
 dishonest+0.013
Neg logit contributions
 doorway-0.008
field-0.007
 doorstep-0.007
 couldn-0.007
chair-0.007
Token is
Token ID318
Feature activation-2.86e-03

Pos logit contributions
 harmful+0.006
Or+0.006
 wasteful+0.006
 charity+0.006
 spies+0.005
Neg logit contributions
 doorway-0.007
field-0.007
 doorstep-0.007
 comet-0.007
 sunrise-0.006
Token to
Token ID284
Feature activation+1.53e-03

Pos logit contributions
 doorway+0.004
 comet+0.004
 doorstep+0.004
chair+0.004
 awoke+0.004
Neg logit contributions
 harmful-0.006
 wasteful-0.005
 charity-0.005
 spies-0.005
 dishonest-0.005
Token share
Token ID2648
Feature activation+1.09e-02

Pos logit contributions
 harmful+0.003
drivers+0.002
 wasteful+0.002
 spies+0.002
Or+0.002
Neg logit contributions
field-0.002
 comet-0.002
 doorstep-0.002
 couldn-0.002
 doorway-0.002
Token and
Token ID290
Feature activation+3.44e-02

Pos logit contributions
drivers+0.026
 harmful+0.025
ief+0.023
Or+0.023
 wasteful+0.023
Neg logit contributions
 comet-0.009
 night-0.009
 couldn-0.008
field-0.008
 doorstep-0.008
Token be
Token ID307
Feature activation+5.77e-02

Pos logit contributions
 harmful+0.060
 wasteful+0.052
drivers+0.052
Or+0.051
 charity+0.051
Neg logit contributions
 doorstep-0.054
field-0.054
 doorway-0.054
 comet-0.053
 couldn-0.053
Token kind
Token ID1611
Feature activation+1.12e-02

Pos logit contributions
drivers+0.131
 harmful+0.120
renches+0.115
istered+0.112
asy+0.110
Neg logit contributions
 night-0.056
 waved-0.053
 cl-0.051
 morning-0.048
 heart-0.046
Token to
Token ID284
Feature activation+9.93e-03

Pos logit contributions
 harmful+0.025
 wasteful+0.023
 charity+0.022
Or+0.022
drivers+0.022
Neg logit contributions
 doorstep-0.012
 doorway-0.012
 couldn-0.012
field-0.012
 comet-0.012
Token others
Token ID1854
Feature activation+7.38e-03

Pos logit contributions
drivers+0.020
 harmful+0.019
istered+0.019
Or+0.018
iers+0.018
Neg logit contributions
 night-0.013
 comet-0.012
 couldn-0.012
 mummy-0.011
 morning-0.011
Token.
Token ID13
Feature activation-1.71e-04

Pos logit contributions
 harmful+0.015
drivers+0.013
 wasteful+0.013
Or+0.013
 charity+0.013
Neg logit contributions
 comet-0.009
field-0.009
 couldn-0.009
 doorstep-0.009
 doorway-0.009
Token<|endoftext|>
Token ID50256
Feature activation+5.52e-03

Pos logit contributions
 doorway+0.000
 doorstep+0.000
 comet+0.000
field+0.000
 couldn+0.000
Neg logit contributions
 harmful-0.000
 charity-0.000
 wasteful-0.000
 spies-0.000
Or-0.000
Token on
Token ID319
Feature activation+4.70e-03

Pos logit contributions
 harmful+0.036
 charity+0.032
 wasteful+0.032
Or+0.032
 spies+0.031
Neg logit contributions
 doorway-0.020
field-0.020
 doorstep-0.020
 comet-0.019
 couldn-0.019
Token,
Token ID11
Feature activation-5.99e-03

Pos logit contributions
 harmful+0.011
 charity+0.010
Or+0.010
 wasteful+0.010
 spies+0.009
Neg logit contributions
 couldn-0.006
 night-0.005
field-0.005
 doorway-0.005
 comet-0.005
Token Sammy
Token ID44067
Feature activation-1.17e-02

Pos logit contributions
 couldn+0.008
 night+0.007
 comet+0.007
field+0.007
 morning+0.006
Neg logit contributions
 harmful-0.013
Or-0.012
 charity-0.012
drivers-0.012
 wasteful-0.012
Token learned
Token ID4499
Feature activation-6.17e-03

Pos logit contributions
 doorway+0.015
 doorstep+0.014
field+0.014
 comet+0.014
chair+0.013
Neg logit contributions
 harmful-0.025
 wasteful-0.022
Or-0.022
 charity-0.022
 spies-0.022
Token to
Token ID284
Feature activation+2.90e-02

Pos logit contributions
 doorway+0.008
field+0.008
 doorstep+0.007
 comet+0.007
 couldn+0.007
Neg logit contributions
 harmful-0.013
 wasteful-0.012
 charity-0.012
 spies-0.012
Or-0.012
Token always
Token ID1464
Feature activation+5.73e-02

Pos logit contributions
 harmful+0.055
 spies+0.050
drivers+0.049
 wasteful+0.049
Or+0.048
Neg logit contributions
 doorstep-0.041
field-0.041
 night-0.039
 couldn-0.039
 doorway-0.038
Token ask
Token ID1265
Feature activation+2.31e-02

Pos logit contributions
 harmful+0.112
drivers+0.106
 wasteful+0.103
 spies+0.102
Or+0.098
Neg logit contributions
field-0.072
 night-0.072
 doorstep-0.071
 couldn-0.069
 replied-0.067
Token before
Token ID878
Feature activation-9.29e-03

Pos logit contributions
 harmful+0.051
Or+0.048
drivers+0.047
istered+0.047
 wasteful+0.046
Neg logit contributions
 couldn-0.025
 night-0.024
 doorstep-0.021
 mummy-0.021
night-0.021
Token taking
Token ID2263
Feature activation+2.98e-02

Pos logit contributions
 couldn+0.013
field+0.012
 comet+0.012
 night+0.012
 doorway+0.011
Neg logit contributions
 harmful-0.018
Or-0.017
 wasteful-0.017
 charity-0.016
ief-0.016
Token something
Token ID1223
Feature activation+1.10e-02

Pos logit contributions
 harmful+0.050
Or+0.050
 wasteful+0.047
drivers+0.047
istered+0.046
Neg logit contributions
 night-0.048
night-0.045
 morning-0.043
 comet-0.040
 couldn-0.040
Token that
Token ID326
Feature activation+2.33e-02

Pos logit contributions
 harmful+0.021
Or+0.020
ief+0.020
drivers+0.020
 wasteful+0.019
Neg logit contributions
 couldn-0.015
 doorstep-0.014
 night-0.013
 comet-0.013
 doorway-0.013
Token They
Token ID1119
Feature activation+1.50e-02

Pos logit contributions
 harmful+0.016
 charity+0.014
Or+0.014
 spies+0.014
 wasteful+0.014
Neg logit contributions
 doorway-0.020
 doorstep-0.020
field-0.020
 comet-0.019
 couldn-0.019
Token learn
Token ID2193
Feature activation-5.48e-03

Pos logit contributions
 harmful+0.025
 wasteful+0.021
 charity+0.021
drivers+0.021
 spies+0.021
Neg logit contributions
field-0.025
 doorway-0.025
 doorstep-0.025
 comet-0.025
 couldn-0.025
Token to
Token ID284
Feature activation+2.71e-02

Pos logit contributions
field+0.011
 doorway+0.011
 doorstep+0.011
 couldn+0.010
 comet+0.010
Neg logit contributions
 harmful-0.008
 spies-0.007
 charity-0.007
 wasteful-0.007
Or-0.007
Token share
Token ID2648
Feature activation+1.19e-02

Pos logit contributions
 harmful+0.050
 spies+0.045
drivers+0.045
 wasteful+0.044
 dishonest+0.043
Neg logit contributions
 doorstep-0.040
field-0.039
 night-0.039
 comet-0.038
 couldn-0.037
Token and
Token ID290
Feature activation+4.21e-02

Pos logit contributions
 harmful+0.027
drivers+0.025
 wasteful+0.025
 dishonest+0.024
ief+0.024
Neg logit contributions
 comet-0.012
 night-0.012
field-0.012
 couldn-0.012
 doorstep-0.011
Token be
Token ID307
Feature activation+5.72e-02

Pos logit contributions
 harmful+0.075
 wasteful+0.066
drivers+0.065
 charity+0.064
Or+0.064
Neg logit contributions
 doorway-0.065
 doorstep-0.064
field-0.064
 comet-0.063
 couldn-0.062
Token good
Token ID922
Feature activation+6.24e-03

Pos logit contributions
drivers+0.116
 harmful+0.114
asy+0.105
renches+0.105
assing+0.103
Neg logit contributions
 night-0.063
 cl-0.059
 waved-0.057
 morning-0.055
 day-0.052
Token.
Token ID13
Feature activation+9.42e-03

Pos logit contributions
 harmful+0.011
 wasteful+0.010
Or+0.010
 charity+0.010
drivers+0.010
Neg logit contributions
 doorway-0.010
 doorstep-0.010
 comet-0.009
field-0.009
 couldn-0.009
Token Mom
Token ID11254
Feature activation+1.44e-02

Pos logit contributions
 harmful+0.017
 wasteful+0.014
 charity+0.014
Or+0.014
 spies+0.014
Neg logit contributions
 doorway-0.015
field-0.015
 doorstep-0.015
 comet-0.015
 couldn-0.014
Token is
Token ID318
Feature activation+6.37e-03

Pos logit contributions
 harmful+0.024
 wasteful+0.022
Or+0.022
 spies+0.021
drivers+0.021
Neg logit contributions
 doorstep-0.023
 doorway-0.023
field-0.023
 comet-0.022
 couldn-0.022
Token proud
Token ID6613
Feature activation-1.40e-03

Pos logit contributions
 harmful+0.012
drivers+0.010
 charity+0.010
Or+0.010
 wasteful+0.010
Neg logit contributions
 doorway-0.010
 comet-0.009
field-0.009
 doorstep-0.009
 couldn-0.009
Token promised
Token ID8072
Feature activation-3.35e-03

Pos logit contributions
 harmful+0.005
 wasteful+0.005
 charity+0.005
Or+0.005
 spies+0.005
Neg logit contributions
 doorway-0.003
 doorstep-0.003
field-0.003
 comet-0.003
 couldn-0.003
Token to
Token ID284
Feature activation+1.29e-02

Pos logit contributions
 doorstep+0.003
 couldn+0.003
field+0.002
 doorway+0.002
 night+0.002
Neg logit contributions
 harmful-0.009
 wasteful-0.008
Or-0.008
 charity-0.008
 spies-0.008
Token always
Token ID1464
Feature activation+5.11e-02

Pos logit contributions
 harmful+0.023
 wasteful+0.020
 charity+0.020
 spies+0.020
Or+0.020
Neg logit contributions
field-0.020
 doorway-0.020
 doorstep-0.020
 comet-0.020
 couldn-0.019
Token share
Token ID2648
Feature activation+1.61e-02

Pos logit contributions
 harmful+0.095
drivers+0.087
 wasteful+0.085
istered+0.081
 spies+0.080
Neg logit contributions
 doorstep-0.073
 night-0.072
field-0.071
 couldn-0.070
 comet-0.068
Token and
Token ID290
Feature activation+3.27e-02

Pos logit contributions
 harmful+0.035
drivers+0.035
ief+0.033
Or+0.032
 wasteful+0.032
Neg logit contributions
 comet-0.016
 night-0.016
 couldn-0.015
 doorstep-0.014
 sunrise-0.014
Token be
Token ID307
Feature activation+5.72e-02

Pos logit contributions
 harmful+0.059
 wasteful+0.052
 charity+0.051
drivers+0.051
Or+0.051
Neg logit contributions
 doorway-0.049
 doorstep-0.049
field-0.049
 couldn-0.049
 comet-0.048
Token good
Token ID922
Feature activation+1.01e-02

Pos logit contributions
drivers+0.123
 harmful+0.114
istered+0.112
renches+0.110
asy+0.109
Neg logit contributions
 night-0.050
 waved-0.050
 cl-0.050
 loved-0.046
 was-0.045
Token friends
Token ID2460
Feature activation+9.04e-03

Pos logit contributions
 harmful+0.020
 wasteful+0.018
Or+0.018
 charity+0.018
drivers+0.017
Neg logit contributions
 couldn-0.013
 night-0.013
 doorstep-0.013
night-0.013
 comet-0.013
Token.
Token ID13
Feature activation+7.22e-03

Pos logit contributions
 harmful+0.017
 wasteful+0.015
 charity+0.015
Or+0.015
drivers+0.015
Neg logit contributions
 doorway-0.013
field-0.013
 doorstep-0.013
 comet-0.013
 couldn-0.012
Token<|endoftext|>
Token ID50256
Feature activation-1.65e-03

Pos logit contributions
 harmful+0.014
Or+0.012
 charity+0.012
 wasteful+0.012
drivers+0.012
Neg logit contributions
 doorway-0.010
 doorstep-0.010
 comet-0.010
field-0.010
 couldn-0.009
TokenOnce
Token ID7454
Feature activation-1.41e-02

Pos logit contributions
 doorstep+0.003
 comet+0.003
 doorway+0.003
field+0.003
 couldn+0.003
Neg logit contributions
 harmful-0.003
 charity-0.002
 spies-0.002
drivers-0.002
Or-0.002
Token help
Token ID1037
Feature activation+2.99e-03

Pos logit contributions
 harmful+0.009
 wasteful+0.008
drivers+0.008
 spies+0.008
Or+0.007
Neg logit contributions
 doorstep-0.008
field-0.008
 couldn-0.008
 doorway-0.007
 comet-0.007
Token him
Token ID683
Feature activation+1.48e-02

Pos logit contributions
 harmful+0.006
Or+0.005
drivers+0.005
 wasteful+0.005
 dishonest+0.005
Neg logit contributions
 doorstep-0.004
 couldn-0.004
 night-0.004
 comet-0.004
field-0.004
Token buy
Token ID2822
Feature activation+1.45e-02

Pos logit contributions
 harmful+0.030
 wasteful+0.029
drivers+0.028
 dishonest+0.027
Or+0.027
Neg logit contributions
 doorstep-0.019
 couldn-0.018
field-0.017
 night-0.016
 doorway-0.016
Token the
Token ID262
Feature activation-6.61e-03

Pos logit contributions
 harmful+0.031
Or+0.029
drivers+0.028
Like+0.028
 wasteful+0.027
Neg logit contributions
 night-0.018
 comet-0.016
 doorway-0.015
 doorstep-0.015
 couldn-0.015
Token things
Token ID1243
Feature activation+2.36e-03

Pos logit contributions
 doorway+0.010
field+0.010
 doorstep+0.010
 comet+0.009
 couldn+0.009
Neg logit contributions
 harmful-0.012
 wasteful-0.011
 charity-0.011
Or-0.011
 spies-0.011
Token he
Token ID339
Feature activation+5.72e-02

Pos logit contributions
 harmful+0.005
drivers+0.005
cod+0.005
 wasteful+0.005
ief+0.005
Neg logit contributions
 night-0.003
 couldn-0.003
field-0.002
 comet-0.002
chair-0.002
Token needed
Token ID2622
Feature activation-1.74e-03

Pos logit contributions
 harmful+0.127
drivers+0.120
 charity+0.118
 wasteful+0.117
 spies+0.116
Neg logit contributions
 couldn-0.067
field-0.055
 night-0.055
 doorstep-0.054
chair-0.050
Token.
Token ID13
Feature activation+5.54e-03

Pos logit contributions
 couldn+0.002
 doorstep+0.002
field+0.002
 night+0.002
 doorway+0.002
Neg logit contributions
 harmful-0.004
 wasteful-0.003
Or-0.003
drivers-0.003
ief-0.003
Token They
Token ID1119
Feature activation+9.27e-03

Pos logit contributions
 harmful+0.009
 wasteful+0.007
drivers+0.007
 charity+0.007
Or+0.007
Neg logit contributions
 doorway-0.010
field-0.010
 doorstep-0.010
 couldn-0.010
 comet-0.010
Token said
Token ID531
Feature activation-7.13e-03

Pos logit contributions
 harmful+0.020
 wasteful+0.018
 charity+0.018
Or+0.018
 spies+0.018
Neg logit contributions
 doorway-0.011
field-0.011
 doorstep-0.011
 comet-0.011
 couldn-0.010
Token they
Token ID484
Feature activation+2.69e-02

Pos logit contributions
 couldn+0.010
 comet+0.010
 night+0.009
 replied+0.009
chair+0.009
Neg logit contributions
 harmful-0.014
 wasteful-0.012
cod-0.012
 charity-0.012
ief-0.012
Token They
Token ID1119
Feature activation+1.05e-02

Pos logit contributions
 harmful+0.032
drivers+0.027
 wasteful+0.027
 dishonest+0.026
ts+0.026
Neg logit contributions
field-0.033
 doorstep-0.032
 doorway-0.032
 night-0.031
 couldn-0.031
Token learned
Token ID4499
Feature activation-1.70e-02

Pos logit contributions
 harmful+0.019
Or+0.017
 wasteful+0.017
 charity+0.016
 spies+0.016
Neg logit contributions
 doorway-0.017
 comet-0.016
 doorstep-0.016
field-0.016
chair-0.015
Token that
Token ID326
Feature activation-7.50e-03

Pos logit contributions
 doorway+0.024
field+0.023
 doorstep+0.023
 comet+0.022
chair+0.021
Neg logit contributions
 harmful-0.035
 wasteful-0.031
 charity-0.031
 spies-0.030
Or-0.030
Token springs
Token ID24747
Feature activation+1.49e-02

Pos logit contributions
 doorway+0.013
 doorstep+0.013
field+0.013
 comet+0.012
 couldn+0.012
Neg logit contributions
 harmful-0.012
 charity-0.011
 wasteful-0.011
 spies-0.011
Or-0.010
Token can
Token ID460
Feature activation+1.13e-02

Pos logit contributions
 harmful+0.032
 charity+0.029
 spies+0.028
 wasteful+0.028
Or+0.028
Neg logit contributions
 couldn-0.018
field-0.016
 doorway-0.016
chair-0.016
 comet-0.016
Token be
Token ID307
Feature activation+5.72e-02

Pos logit contributions
 harmful+0.023
 spies+0.023
 wasteful+0.022
 dishonest+0.022
 chefs+0.022
Neg logit contributions
field-0.013
 couldn-0.013
 doorstep-0.012
 night-0.012
 comet-0.012
Token dangerous
Token ID4923
Feature activation+3.00e-02

Pos logit contributions
 charity+0.090
ief+0.088
drivers+0.088
iers+0.084
Or+0.081
Neg logit contributions
 couldn-0.090
 night-0.084
 began-0.071
 morning-0.071
 waved-0.071
Token.
Token ID13
Feature activation+1.81e-02

Pos logit contributions
 harmful+0.073
Or+0.069
 charity+0.068
 wasteful+0.068
cod+0.067
Neg logit contributions
 night-0.024
 couldn-0.023
chair-0.021
night-0.019
 morning-0.018
Token They
Token ID1119
Feature activation+1.09e-02

Pos logit contributions
 harmful+0.031
 wasteful+0.026
drivers+0.026
ts+0.025
 dishonest+0.025
Neg logit contributions
field-0.029
 night-0.029
 doorstep-0.029
 couldn-0.028
 doorway-0.028
Token learned
Token ID4499
Feature activation-1.57e-02

Pos logit contributions
 harmful+0.020
Or+0.017
 wasteful+0.017
 charity+0.017
 spies+0.016
Neg logit contributions
 doorway-0.018
 comet-0.017
 doorstep-0.017
field-0.017
chair-0.017
Token that
Token ID326
Feature activation-7.13e-03

Pos logit contributions
 doorway+0.021
field+0.021
 doorstep+0.021
 comet+0.020
chair+0.019
Neg logit contributions
 harmful-0.032
 wasteful-0.029
 charity-0.028
 spies-0.028
Or-0.028
Token a
Token ID257
Feature activation-7.03e-04

Pos logit contributions
field+0.005
 doorway+0.005
 doorstep+0.005
 comet+0.004
 couldn+0.004
Neg logit contributions
 harmful-0.004
 spies-0.003
 charity-0.003
 wasteful-0.003
Or-0.003
Token strong
Token ID1913
Feature activation-1.55e-02

Pos logit contributions
 doorway+0.001
 doorstep+0.001
field+0.001
 comet+0.001
 couldn+0.001
Neg logit contributions
 harmful-0.001
 charity-0.001
 wasteful-0.001
 spies-0.001
Or-0.001
Token lizard
Token ID42406
Feature activation-1.65e-02

Pos logit contributions
 doorstep+0.024
 doorway+0.024
field+0.023
 couldn+0.023
 awoke+0.021
Neg logit contributions
 harmful-0.028
 wasteful-0.024
 spies-0.024
 charity-0.024
Or-0.023
Token named
Token ID3706
Feature activation+1.13e-02

Pos logit contributions
 doorway+0.035
 doorstep+0.035
 horizon+0.032
 comet+0.032
field+0.032
Neg logit contributions
 harmful-0.022
Or-0.018
smart-0.017
 spies-0.017
 charity-0.017
Token T
Token ID309
Feature activation-2.57e-02

Pos logit contributions
 harmful+0.016
 spies+0.014
 charity+0.014
drivers+0.013
 wasteful+0.013
Neg logit contributions
 doorway-0.021
 doorstep-0.021
field-0.021
 comet-0.020
 awoke-0.019
Tokenaz
Token ID1031
Feature activation-6.96e-02

Pos logit contributions
 doorway+0.038
field+0.037
 doorstep+0.036
 couldn+0.035
 comet+0.034
Neg logit contributions
 harmful-0.048
 charity-0.042
 spies-0.042
drivers-0.041
 wasteful-0.040
Token.
Token ID13
Feature activation+9.70e-03

Pos logit contributions
 doorway+0.119
 doorstep+0.108
field+0.106
 comet+0.105
 theatre+0.101
Neg logit contributions
 harmful-0.119
ts-0.100
drivers-0.100
Or-0.099
cod-0.098
Token He
Token ID679
Feature activation+2.05e-03

Pos logit contributions
 harmful+0.022
 wasteful+0.019
 charity+0.019
Or+0.019
 spies+0.019
Neg logit contributions
 doorway-0.011
field-0.011
 doorstep-0.011
 comet-0.011
 couldn-0.010
Token lived
Token ID5615
Feature activation+7.52e-04

Pos logit contributions
 harmful+0.004
Or+0.004
 wasteful+0.004
 spies+0.003
ts+0.003
Neg logit contributions
 doorway-0.003
 doorstep-0.003
field-0.003
 comet-0.003
chair-0.002
Token happily
Token ID18177
Feature activation+1.07e-02

Pos logit contributions
 harmful+0.001
 spies+0.001
Or+0.001
 wasteful+0.001
 charity+0.001
Neg logit contributions
 doorstep-0.001
 doorway-0.001
 comet-0.001
chair-0.001
field-0.001
Token among
Token ID1871
Feature activation-1.99e-02

Pos logit contributions
 harmful+0.020
 spies+0.017
 wasteful+0.017
Or+0.017
 charity+0.017
Neg logit contributions
 doorway-0.016
 doorstep-0.016
 comet-0.016
field-0.015
chair-0.015
Tokenâ
Token ID22940
Feature activation-7.18e-03

Pos logit contributions
 doorway+0.006
 comet+0.006
 doorstep+0.006
field+0.006
 couldn+0.005
Neg logit contributions
 harmful-0.007
 charity-0.006
 spies-0.006
Or-0.006
 wasteful-0.006
Token
Token ID26391
Feature activation-8.99e-03

Pos logit contributions
 comet+0.011
 doorway+0.011
 doorstep+0.010
field+0.010
 couldn+0.010
Neg logit contributions
 harmful-0.013
 wasteful-0.012
 charity-0.011
Or-0.011
ief-0.011
Token\n
Token ID198
Feature activation-1.05e-02

Pos logit contributions
 doorway+0.014
 comet+0.014
 doorstep+0.014
field+0.013
night+0.013
Neg logit contributions
 harmful-0.016
drivers-0.014
Or-0.014
 charity-0.014
 spies-0.014
Token\n
Token ID198
Feature activation+1.18e-02

Pos logit contributions
 comet+0.017
 doorway+0.017
 doorstep+0.016
field+0.016
night+0.016
Neg logit contributions
 harmful-0.018
 charity-0.017
Or-0.016
Their-0.016
drivers-0.016
TokenM
Token ID44
Feature activation-3.35e-03

Pos logit contributions
 harmful+0.019
 wasteful+0.016
 spies+0.016
 charity+0.016
drivers+0.016
Neg logit contributions
 doorway-0.020
field-0.020
 doorstep-0.020
 comet-0.020
 couldn-0.020
Tokenaz
Token ID1031
Feature activation-6.69e-02

Pos logit contributions
 doorway+0.004
field+0.004
 comet+0.004
 doorstep+0.004
 couldn+0.003
Neg logit contributions
 harmful-0.007
 wasteful-0.007
 charity-0.007
drivers-0.007
Or-0.007
Tokenzy
Token ID7357
Feature activation-3.93e-02

Pos logit contributions
 doorway+0.097
 comet+0.097
field+0.091
 doorstep+0.091
 night+0.081
Neg logit contributions
 harmful-0.124
drivers-0.114
 wasteful-0.113
 disl-0.112
Or-0.107
Token was
Token ID373
Feature activation-8.67e-03

Pos logit contributions
 doorway+0.054
 comet+0.054
 doorstep+0.051
field+0.050
chair+0.048
Neg logit contributions
 harmful-0.071
Or-0.066
 wasteful-0.063
drivers-0.062
smart-0.060
Token so
Token ID523
Feature activation+5.59e-04

Pos logit contributions
 doorstep+0.016
 comet+0.016
field+0.016
 doorway+0.015
chair+0.015
Neg logit contributions
 harmful-0.013
 wasteful-0.012
Or-0.011
 dishonest-0.011
 spies-0.011
Token excited
Token ID6568
Feature activation-4.91e-03

Pos logit contributions
 harmful+0.001
 wasteful+0.001
Or+0.001
 charity+0.001
 spies+0.001
Neg logit contributions
 doorway-0.001
 doorstep-0.001
field-0.001
 comet-0.001
chair-0.001
Token about
Token ID546
Feature activation-1.18e-02

Pos logit contributions
 doorway+0.008
 comet+0.008
field+0.008
 doorstep+0.007
 awoke+0.007
Neg logit contributions
 harmful-0.009
 spies-0.007
 wasteful-0.007
 charity-0.007
drivers-0.007
Token exclaimed
Token ID33552
Feature activation-3.89e-04

Pos logit contributions
 doorway+0.025
 comet+0.024
 doorstep+0.024
field+0.024
 sunrise+0.022
Neg logit contributions
 harmful-0.019
Or-0.017
 greedy-0.016
 dishonest-0.016
 spies-0.016
Token.
Token ID13
Feature activation-5.92e-03

Pos logit contributions
 doorway+0.001
 doorstep+0.001
 couldn+0.001
field+0.001
 comet+0.001
Neg logit contributions
 harmful-0.001
 wasteful-0.001
 charity-0.001
Or-0.001
drivers-0.001
Token\n
Token ID198
Feature activation-9.27e-03

Pos logit contributions
 comet+0.009
field+0.009
 doorstep+0.009
 doorway+0.009
chair+0.009
Neg logit contributions
 harmful-0.010
drivers-0.009
 charity-0.009
Or-0.009
 spies-0.009
Token\n
Token ID198
Feature activation+1.49e-02

Pos logit contributions
 comet+0.015
field+0.014
 doorway+0.014
 doorstep+0.014
night+0.013
Neg logit contributions
 harmful-0.016
 charity-0.015
Or-0.015
drivers-0.014
 spies-0.014
TokenL
Token ID43
Feature activation-1.19e-02

Pos logit contributions
 harmful+0.020
 wasteful+0.016
 spies+0.016
 thieves+0.016
 charity+0.015
Neg logit contributions
 couldn-0.030
 doorway-0.030
 doorstep-0.030
field-0.029
chair-0.029
Tokenizz
Token ID6457
Feature activation-6.39e-02

Pos logit contributions
 doorway+0.018
 doorstep+0.018
 comet+0.017
field+0.017
 couldn+0.017
Neg logit contributions
 harmful-0.022
 charity-0.019
 wasteful-0.019
 spies-0.019
Or-0.019
Tokenie
Token ID494
Feature activation-3.23e-02

Pos logit contributions
 comet+0.109
 doorstep+0.108
 doorway+0.107
field+0.105
chair+0.102
Neg logit contributions
 harmful-0.096
ts-0.085
 wasteful-0.085
smart-0.084
drivers-0.082
Token smiled
Token ID13541
Feature activation-8.53e-03

Pos logit contributions
 doorway+0.050
 comet+0.048
field+0.048
 doorstep+0.048
chair+0.045
Neg logit contributions
Or-0.049
 harmful-0.048
smart-0.044
 greedy-0.044
Po-0.044
Token back
Token ID736
Feature activation-1.96e-02

Pos logit contributions
field+0.013
 doorway+0.013
 doorstep+0.012
 comet+0.012
chair+0.012
Neg logit contributions
 harmful-0.016
 spies-0.014
Or-0.014
 charity-0.014
 wasteful-0.013
Token.
Token ID13
Feature activation-5.00e-03

Pos logit contributions
 comet+0.043
 doorway+0.042
 awoke+0.040
field+0.040
 doorstep+0.039
Neg logit contributions
 harmful-0.020
 spies-0.018
 greedy-0.016
 than-0.016
S-0.016
Token She
Token ID1375
Feature activation-7.28e-03

Pos logit contributions
 comet+0.009
field+0.009
 doorstep+0.009
 doorway+0.008
chair+0.008
Neg logit contributions
 harmful-0.008
Or-0.007
 charity-0.007
drivers-0.007
 spies-0.007
Token you
Token ID345
Feature activation-2.01e-02

Pos logit contributions
 doorway+0.019
field+0.019
 doorstep+0.019
 comet+0.019
 couldn+0.018
Neg logit contributions
 harmful-0.015
 wasteful-0.013
 charity-0.013
 spies-0.013
Or-0.012
Token do
Token ID466
Feature activation-1.69e-02

Pos logit contributions
 doorstep+0.028
field+0.028
 comet+0.028
 couldn+0.028
 doorway+0.028
Neg logit contributions
 harmful-0.039
Or-0.035
 wasteful-0.034
drivers-0.034
 charity-0.034
Token?!"
Token ID30823
Feature activation-9.28e-03

Pos logit contributions
 doorway+0.022
field+0.022
 doorstep+0.022
 comet+0.022
 couldn+0.021
Neg logit contributions
 harmful-0.034
 charity-0.031
 wasteful-0.030
 spies-0.030
Or-0.030
Token\n
Token ID198
Feature activation-1.85e-02

Pos logit contributions
field+0.015
 doorstep+0.014
 comet+0.014
 doorway+0.013
 awoke+0.012
Neg logit contributions
 harmful-0.017
smart-0.015
Or-0.015
drivers-0.015
 charity-0.014
Token\n
Token ID198
Feature activation+2.22e-03

Pos logit contributions
field+0.031
 doorstep+0.029
 comet+0.029
 doorway+0.029
night+0.027
Neg logit contributions
 harmful-0.031
smart-0.028
drivers-0.027
 charity-0.027
Or-0.027
TokenSad
Token ID26699
Feature activation-6.34e-02

Pos logit contributions
 harmful+0.004
Or+0.003
 charity+0.003
 wasteful+0.003
drivers+0.003
Neg logit contributions
field-0.004
 doorway-0.003
 doorstep-0.003
 comet-0.003
 couldn-0.003
Tokenie
Token ID494
Feature activation-3.70e-02

Pos logit contributions
field+0.108
 doorstep+0.099
 comet+0.098
 doorway+0.097
 theatre+0.087
Neg logit contributions
 harmful-0.105
drivers-0.093
 wasteful-0.092
smart-0.091
ts-0.083
Token quickly
Token ID2952
Feature activation-1.10e-02

Pos logit contributions
 comet+0.054
field+0.054
 doorway+0.052
 sunrise+0.050
 night+0.050
Neg logit contributions
 harmful-0.067
Po-0.059
 wasteful-0.059
Or-0.059
 dishonest-0.059
Token looked
Token ID3114
Feature activation-1.01e-03

Pos logit contributions
 doorway+0.014
 doorstep+0.014
 couldn+0.014
field+0.014
 comet+0.013
Neg logit contributions
 harmful-0.023
 charity-0.020
 wasteful-0.020
Or-0.020
 spies-0.020
Token up
Token ID510
Feature activation-1.08e-02

Pos logit contributions
field+0.002
 doorstep+0.001
 comet+0.001
 doorway+0.001
chair+0.001
Neg logit contributions
 harmful-0.002
 wasteful-0.002
 spies-0.002
 dishonest-0.002
 thieves-0.002
Token,
Token ID11
Feature activation-2.85e-02

Pos logit contributions
 doorstep+0.017
field+0.017
 comet+0.017
 doorway+0.017
chair+0.016
Neg logit contributions
 harmful-0.018
 spies-0.016
Or-0.016
 wasteful-0.015
 dishonest-0.015
Token Clara
Token ID27443
Feature activation-2.98e-02

Pos logit contributions
 comet+0.014
field+0.014
 doorstep+0.014
 doorway+0.013
 recorder+0.013
Neg logit contributions
Or-0.010
 harmful-0.010
 charity-0.010
 wasteful-0.009
 spies-0.009
Token stepped
Token ID10764
Feature activation-2.42e-02

Pos logit contributions
 doorway+0.047
 comet+0.044
 doorstep+0.044
chair+0.042
night+0.041
Neg logit contributions
 harmful-0.050
Or-0.048
Po-0.044
Like-0.043
 greedy-0.042
Token quietly
Token ID12703
Feature activation-6.72e-03

Pos logit contributions
field+0.043
 comet+0.043
chair+0.042
 doorstep+0.041
 couldn+0.040
Neg logit contributions
 harmful-0.036
 spies-0.032
Or-0.031
smart-0.031
 charity-0.031
Token so
Token ID523
Feature activation-2.04e-02

Pos logit contributions
 comet+0.009
 doorway+0.009
field+0.009
 doorstep+0.009
 couldn+0.009
Neg logit contributions
 harmful-0.013
Or-0.011
 spies-0.011
 charity-0.011
 wasteful-0.011
Token she
Token ID673
Feature activation-1.67e-02

Pos logit contributions
 doorway+0.032
 doorstep+0.031
field+0.031
 comet+0.031
 couldn+0.030
Neg logit contributions
 harmful-0.037
 wasteful-0.032
Or-0.032
 charity-0.032
 spies-0.031
Token wouldn
Token ID3636
Feature activation-6.33e-02

Pos logit contributions
 doorway+0.027
 doorstep+0.026
 night+0.025
field+0.025
 comet+0.025
Neg logit contributions
 harmful-0.028
Or-0.027
Po-0.024
 spies-0.023
ts-0.023
Token't
Token ID470
Feature activation-1.02e-02

Pos logit contributions
 doorway+0.104
field+0.097
 comet+0.096
night+0.094
 doorstep+0.094
Neg logit contributions
 harmful-0.106
smart-0.095
drivers-0.094
Or-0.092
 charity-0.090
Token scare
Token ID19437
Feature activation-1.16e-02

Pos logit contributions
 doorway+0.016
 doorstep+0.016
 comet+0.016
field+0.016
 couldn+0.016
Neg logit contributions
 harmful-0.018
 wasteful-0.016
 charity-0.016
drivers-0.016
Or-0.016
Token it
Token ID340
Feature activation-2.37e-03

Pos logit contributions
 doorway+0.017
field+0.017
 doorstep+0.017
 comet+0.016
chair+0.016
Neg logit contributions
 harmful-0.021
 charity-0.019
 spies-0.019
 wasteful-0.018
Or-0.018
Token away
Token ID1497
Feature activation-8.30e-03

Pos logit contributions
 doorstep+0.003
 couldn+0.003
 doorway+0.003
field+0.003
 comet+0.003
Neg logit contributions
 harmful-0.005
 wasteful-0.005
 charity-0.004
Or-0.004
drivers-0.004
Token.
Token ID13
Feature activation-1.38e-02

Pos logit contributions
 doorway+0.011
field+0.011
 doorstep+0.011
 comet+0.011
 couldn+0.011
Neg logit contributions
 harmful-0.017
 wasteful-0.015
 charity-0.015
drivers-0.014
Or-0.014
Token leave
Token ID2666
Feature activation+2.21e-03

Pos logit contributions
 doorway+0.039
field+0.038
 comet+0.038
chair+0.038
 doorstep+0.038
Neg logit contributions
 harmful-0.046
 charity-0.041
 spies-0.040
 wasteful-0.039
Or-0.038
Token,
Token ID11
Feature activation-1.67e-02

Pos logit contributions
 harmful+0.005
 wasteful+0.004
drivers+0.004
Or+0.004
 charity+0.004
Neg logit contributions
 doorway-0.003
field-0.003
 doorstep-0.003
 night-0.003
 couldn-0.003
Token the
Token ID262
Feature activation+2.31e-03

Pos logit contributions
 doorway+0.027
 doorstep+0.027
field+0.026
 comet+0.026
chair+0.025
Neg logit contributions
 harmful-0.029
 wasteful-0.025
 charity-0.025
Or-0.025
 spies-0.025
Token old
Token ID1468
Feature activation-1.23e-02

Pos logit contributions
 harmful+0.004
 wasteful+0.003
 charity+0.003
Or+0.003
drivers+0.003
Neg logit contributions
 doorway-0.004
 doorstep-0.004
field-0.004
 comet-0.004
 couldn-0.004
Token man
Token ID582
Feature activation-2.93e-03

Pos logit contributions
 comet+0.018
field+0.018
 couldn+0.018
 doorstep+0.018
 doorway+0.018
Neg logit contributions
 harmful-0.022
 spies-0.020
 wasteful-0.019
 charity-0.019
Or-0.019
Token wouldn
Token ID3636
Feature activation-6.32e-02

Pos logit contributions
 comet+0.004
 doorway+0.004
field+0.004
 doorstep+0.004
chair+0.004
Neg logit contributions
 harmful-0.006
Or-0.005
 wasteful-0.005
 spies-0.005
 charity-0.005
Token't
Token ID470
Feature activation+5.29e-03

Pos logit contributions
 doorway+0.093
field+0.088
 comet+0.087
 doorstep+0.085
night+0.082
Neg logit contributions
 harmful-0.117
drivers-0.104
 charity-0.103
 spies-0.102
smart-0.101
Token let
Token ID1309
Feature activation-6.50e-03

Pos logit contributions
 harmful+0.010
 wasteful+0.010
drivers+0.010
Or+0.009
 charity+0.009
Neg logit contributions
 doorstep-0.007
 couldn-0.007
 comet-0.007
field-0.007
 night-0.007
Token him
Token ID683
Feature activation+4.12e-03

Pos logit contributions
 doorway+0.010
 doorstep+0.010
 comet+0.010
 couldn+0.010
field+0.010
Neg logit contributions
 harmful-0.012
 wasteful-0.010
Or-0.010
 charity-0.010
drivers-0.010
Token.
Token ID13
Feature activation+1.09e-03

Pos logit contributions
 harmful+0.008
 wasteful+0.007
Or+0.007
drivers+0.007
 charity+0.007
Neg logit contributions
 doorstep-0.006
 couldn-0.005
field-0.005
 doorway-0.005
 comet-0.005
Token He
Token ID679
Feature activation+5.87e-03

Pos logit contributions
 harmful+0.002
 wasteful+0.002
 charity+0.002
Or+0.002
 spies+0.002
Neg logit contributions
 doorway-0.002
field-0.002
 doorstep-0.002
 comet-0.002
 couldn-0.002
Token protect
Token ID1805
Feature activation-1.51e-02

Pos logit contributions
 harmful+0.019
 wasteful+0.017
Or+0.016
drivers+0.016
 spies+0.016
Neg logit contributions
 doorstep-0.011
field-0.010
 comet-0.010
 doorway-0.010
 couldn-0.010
Token her
Token ID607
Feature activation-5.89e-03

Pos logit contributions
 doorway+0.026
 doorstep+0.026
field+0.026
 comet+0.026
 couldn+0.025
Neg logit contributions
 harmful-0.024
 wasteful-0.021
 charity-0.021
Or-0.021
drivers-0.020
Token.
Token ID13
Feature activation-1.38e-02

Pos logit contributions
 doorway+0.009
field+0.009
 doorstep+0.009
 comet+0.009
 couldn+0.008
Neg logit contributions
 harmful-0.011
 charity-0.009
 wasteful-0.009
Or-0.009
 spies-0.009
Token<|endoftext|>
Token ID50256
Feature activation-6.87e-04

Pos logit contributions
 comet+0.028
field+0.026
 doorstep+0.025
 doorway+0.024
 recorder+0.023
Neg logit contributions
 charity-0.016
Or-0.016
 harmful-0.015
smart-0.015
S-0.015
TokenPat
Token ID12130
Feature activation-3.29e-02

Pos logit contributions
 doorstep+0.001
 comet+0.001
 doorway+0.001
field+0.001
 couldn+0.001
Neg logit contributions
 harmful-0.001
aks-0.001
 spies-0.001
 charity-0.001
drivers-0.001
Tokenches
Token ID2052
Feature activation-6.32e-02

Pos logit contributions
 doorway+0.051
 comet+0.048
 doorstep+0.047
field+0.046
chair+0.043
Neg logit contributions
 harmful-0.060
Or-0.050
 wasteful-0.049
smart-0.049
cod-0.048
Token the
Token ID262
Feature activation-8.55e-03

Pos logit contributions
 comet+0.084
 doorway+0.082
field+0.079
 doorstep+0.077
chair+0.074
Neg logit contributions
 harmful-0.115
Or-0.107
 charity-0.104
ts-0.101
 dishonest-0.101
Token puppy
Token ID26188
Feature activation-1.12e-02

Pos logit contributions
field+0.014
 doorstep+0.014
 doorway+0.014
 awoke+0.013
 couldn+0.013
Neg logit contributions
 harmful-0.014
 spies-0.013
 charity-0.013
 thieves-0.012
 wasteful-0.012
Token was
Token ID373
Feature activation-3.73e-03

Pos logit contributions
 comet+0.015
 doorway+0.015
 doorstep+0.014
field+0.014
chair+0.013
Neg logit contributions
 harmful-0.022
Or-0.020
 spies-0.019
 charity-0.019
 wasteful-0.019
Token very
Token ID845
Feature activation+4.64e-03

Pos logit contributions
 doorstep+0.006
 comet+0.006
field+0.006
 doorway+0.006
chair+0.006
Neg logit contributions
 harmful-0.006
 spies-0.006
 wasteful-0.006
Or-0.005
 charity-0.005
Token hungry
Token ID14720
Feature activation+1.19e-02

Pos logit contributions
 harmful+0.008
 wasteful+0.007
 charity+0.007
Or+0.007
 spies+0.007
Neg logit contributions
 doorway-0.008
field-0.007
 doorstep-0.007
 comet-0.007
 couldn-0.007
Token.
Token ID13
Feature activation-2.10e-03

Pos logit contributions
 doorway+0.025
field+0.025
 doorstep+0.025
 comet+0.025
 couldn+0.024
Neg logit contributions
 harmful-0.035
 wasteful-0.031
 charity-0.031
Or-0.030
 spies-0.030
Token
Token ID220
Feature activation-7.51e-03

Pos logit contributions
 comet+0.003
 doorway+0.002
 doorstep+0.002
field+0.002
 couldn+0.002
Neg logit contributions
 harmful-0.004
 charity-0.004
Or-0.004
 wasteful-0.004
drivers-0.004
Token
Token ID198
Feature activation+2.71e-03

Pos logit contributions
 doorway+0.009
field+0.009
 doorstep+0.009
 comet+0.009
 couldn+0.008
Neg logit contributions
 harmful-0.016
 wasteful-0.014
 charity-0.014
Or-0.014
 spies-0.014
Token
Token ID198
Feature activation+2.07e-02

Pos logit contributions
 harmful+0.006
 wasteful+0.005
 charity+0.005
Or+0.005
 spies+0.005
Neg logit contributions
 doorway-0.003
field-0.003
 doorstep-0.003
 comet-0.003
 couldn-0.003
TokenL
Token ID43
Feature activation-1.36e-02

Pos logit contributions
 harmful+0.027
 spies+0.022
 wasteful+0.022
drivers+0.022
 charity+0.021
Neg logit contributions
field-0.041
 couldn-0.041
 doorway-0.041
 doorstep-0.041
 comet-0.040
Tokenizz
Token ID6457
Feature activation-6.30e-02

Pos logit contributions
 doorway+0.023
field+0.023
 doorstep+0.022
 comet+0.022
 couldn+0.022
Neg logit contributions
 harmful-0.023
 wasteful-0.020
 charity-0.020
Or-0.019
 spies-0.019
Tokenie
Token ID494
Feature activation-3.14e-02

Pos logit contributions
 comet+0.100
 doorway+0.097
 doorstep+0.097
field+0.095
 sunrise+0.087
Neg logit contributions
 harmful-0.099
smart-0.089
 wasteful-0.089
drivers-0.087
ts-0.086
Token and
Token ID290
Feature activation-1.15e-02

Pos logit contributions
 comet+0.046
 doorway+0.046
 doorstep+0.044
field+0.044
 sunrise+0.042
Neg logit contributions
 harmful-0.053
Or-0.050
 wasteful-0.046
smart-0.045
 dishonest-0.045
Token her
Token ID607
Feature activation+6.36e-03

Pos logit contributions
 doorstep+0.025
 doorway+0.025
field+0.025
 comet+0.024
chair+0.024
Neg logit contributions
 harmful-0.013
 charity-0.011
 spies-0.010
 wasteful-0.010
Or-0.010
Token sister
Token ID6621
Feature activation+9.23e-03

Pos logit contributions
 harmful+0.011
 charity+0.010
 wasteful+0.010
 spies+0.010
Or+0.010
Neg logit contributions
field-0.010
 doorway-0.010
 doorstep-0.010
 comet-0.010
 couldn-0.009
Token were
Token ID547
Feature activation-6.28e-03

Pos logit contributions
 harmful+0.017
 charity+0.015
Or+0.015
 spies+0.015
 wasteful+0.015
Neg logit contributions
 doorway-0.013
field-0.013
 comet-0.013
 doorstep-0.012
chair-0.011
Tokenears
Token ID4127
Feature activation+6.34e-03

Pos logit contributions
 harmful+0.012
 spies+0.012
 charity+0.011
Or+0.011
 wasteful+0.011
Neg logit contributions
 doorway-0.011
chair-0.011
 doorstep-0.011
field-0.010
 couldn-0.010
Token anymore
Token ID7471
Feature activation+3.67e-03

Pos logit contributions
 harmful+0.013
drivers+0.012
 wasteful+0.012
 charity+0.012
 spies+0.012
Neg logit contributions
 couldn-0.008
field-0.008
 comet-0.007
 doorstep-0.007
 night-0.007
Token!
Token ID0
Feature activation-3.63e-04

Pos logit contributions
 harmful+0.007
 spies+0.006
Or+0.006
 charity+0.006
 wasteful+0.006
Neg logit contributions
 doorway-0.006
 doorstep-0.006
field-0.005
 comet-0.005
chair-0.005
Token<|endoftext|>
Token ID50256
Feature activation+3.92e-03

Pos logit contributions
 doorstep+0.001
 comet+0.001
 doorway+0.001
field+0.001
 couldn+0.001
Neg logit contributions
 harmful-0.001
 charity-0.001
smart-0.001
Or-0.001
 spies-0.001
TokenJ
Token ID41
Feature activation-2.65e-02

Pos logit contributions
 harmful+0.005
 spies+0.005
 charity+0.005
drivers+0.005
aks+0.005
Neg logit contributions
 doorstep-0.007
 comet-0.007
 doorway-0.007
field-0.006
 couldn-0.006
Tokenos
Token ID418
Feature activation-6.30e-02

Pos logit contributions
 doorway+0.046
field+0.045
 comet+0.045
 doorstep+0.045
chair+0.042
Neg logit contributions
 harmful-0.041
 charity-0.036
drivers-0.036
 wasteful-0.035
 spies-0.035
Tokenie
Token ID494
Feature activation-3.36e-02

Pos logit contributions
 doorway+0.091
 doorstep+0.086
 comet+0.083
field+0.079
chair+0.078
Neg logit contributions
 harmful-0.116
Or-0.102
 wasteful-0.101
 charity-0.100
smart-0.098
Token wanted
Token ID2227
Feature activation-9.25e-03

Pos logit contributions
 doorway+0.050
 doorstep+0.049
 comet+0.049
field+0.047
chair+0.045
Neg logit contributions
 harmful-0.058
Or-0.053
 spies-0.050
 wasteful-0.049
 charity-0.049
Token to
Token ID284
Feature activation-8.77e-03

Pos logit contributions
field+0.013
 doorway+0.013
 doorstep+0.013
 comet+0.013
chair+0.012
Neg logit contributions
 harmful-0.018
 charity-0.016
 spies-0.016
 wasteful-0.015
Or-0.015
Token pack
Token ID2353
Feature activation-4.07e-03

Pos logit contributions
 doorway+0.016
field+0.015
chair+0.015
night+0.014
 comet+0.014
Neg logit contributions
 harmful-0.013
 charity-0.012
 spies-0.011
Or-0.011
 groups-0.010
Token her
Token ID607
Feature activation-6.69e-03

Pos logit contributions
 doorway+0.007
field+0.007
 doorstep+0.007
 comet+0.006
 couldn+0.006
Neg logit contributions
 harmful-0.007
 wasteful-0.006
 charity-0.006
Or-0.006
 spies-0.006
Token tray
Token ID26473
Feature activation-9.14e-03

Pos logit contributions
 harmful+0.000
 charity+0.000
 wasteful+0.000
 spies+0.000
Or+0.000
Neg logit contributions
field-0.000
 doorway-0.000
 doorstep-0.000
 couldn-0.000
 comet-0.000
Token very
Token ID845
Feature activation+8.40e-03

Pos logit contributions
 comet+0.013
 doorway+0.012
field+0.012
 doorstep+0.012
 couldn+0.012
Neg logit contributions
 harmful-0.017
 charity-0.015
 spies-0.015
Or-0.015
 wasteful-0.015
Token carefully
Token ID7773
Feature activation-6.81e-04

Pos logit contributions
 harmful+0.016
 charity+0.015
drivers+0.014
 spies+0.014
 wasteful+0.014
Neg logit contributions
 couldn-0.012
field-0.012
chair-0.011
 doorway-0.011
 night-0.011
Token so
Token ID523
Feature activation-5.18e-03

Pos logit contributions
 doorway+0.001
field+0.001
 doorstep+0.001
 comet+0.001
 couldn+0.001
Neg logit contributions
 harmful-0.001
 wasteful-0.001
 charity-0.001
Or-0.001
 spies-0.001
Token he
Token ID339
Feature activation+3.12e-03

Pos logit contributions
 couldn+0.008
field+0.007
 doorway+0.007
 comet+0.007
 doorstep+0.007
Neg logit contributions
 harmful-0.010
 charity-0.009
 wasteful-0.009
Or-0.008
 spies-0.008
Token wouldn
Token ID3636
Feature activation-6.29e-02

Pos logit contributions
 harmful+0.006
Or+0.006
 spies+0.005
drivers+0.005
 wasteful+0.005
Neg logit contributions
 doorway-0.004
 comet-0.004
 doorstep-0.004
field-0.004
 sunrise-0.004
Token't
Token ID470
Feature activation-3.76e-03

Pos logit contributions
 doorway+0.102
 comet+0.096
field+0.095
 doorstep+0.092
night+0.091
Neg logit contributions
 harmful-0.105
drivers-0.097
 charity-0.093
smart-0.092
 spies-0.091
Token drop
Token ID4268
Feature activation-7.50e-03

Pos logit contributions
 doorstep+0.006
field+0.005
 couldn+0.005
 comet+0.005
 doorway+0.005
Neg logit contributions
 harmful-0.007
 wasteful-0.006
drivers-0.006
Or-0.006
 charity-0.006
Token the
Token ID262
Feature activation-6.54e-03

Pos logit contributions
 night+0.004
 comet+0.004
 couldn+0.004
 doorway+0.004
chair+0.004
Neg logit contributions
 harmful-0.021
 charity-0.019
Or-0.019
 wasteful-0.019
drivers-0.019
Token food
Token ID2057
Feature activation-7.27e-03

Pos logit contributions
field+0.009
 couldn+0.009
 doorway+0.009
 doorstep+0.009
 comet+0.009
Neg logit contributions
 harmful-0.013
 charity-0.011
 wasteful-0.011
 spies-0.011
Or-0.011
Token.
Token ID13
Feature activation-4.34e-03

Pos logit contributions
 doorway+0.010
field+0.010
 doorstep+0.010
 comet+0.010
 couldn+0.010
Neg logit contributions
 harmful-0.014
 wasteful-0.012
 charity-0.012
Or-0.012
 spies-0.012
Token.
Token ID13
Feature activation-4.86e-03

Pos logit contributions
 doorway+0.001
field+0.001
 doorstep+0.001
 comet+0.001
 couldn+0.001
Neg logit contributions
 harmful-0.001
 wasteful-0.001
 charity-0.001
Or-0.001
 spies-0.001
Token
Token ID220
Feature activation-7.72e-03

Pos logit contributions
 comet+0.007
 doorstep+0.006
field+0.006
 doorway+0.006
night+0.006
Neg logit contributions
 harmful-0.009
 charity-0.009
Or-0.009
drivers-0.009
 wasteful-0.008
Token
Token ID198
Feature activation-8.17e-03

Pos logit contributions
 doorway+0.009
field+0.009
 doorstep+0.009
 comet+0.009
 couldn+0.008
Neg logit contributions
 harmful-0.017
 wasteful-0.015
 charity-0.015
Or-0.015
 spies-0.015
Token
Token ID198
Feature activation+1.49e-02

Pos logit contributions
 comet+0.013
 doorway+0.013
 doorstep+0.013
field+0.012
night+0.012
Neg logit contributions
 harmful-0.014
 charity-0.013
Or-0.013
drivers-0.013
 wasteful-0.012
TokenL
Token ID43
Feature activation-1.38e-02

Pos logit contributions
 harmful+0.022
 charity+0.019
 wasteful+0.019
Or+0.019
 spies+0.018
Neg logit contributions
 doorway-0.028
field-0.027
 doorstep-0.027
 comet-0.027
 couldn-0.026
Tokenizz
Token ID6457
Feature activation-6.26e-02

Pos logit contributions
 doorway+0.023
field+0.023
 doorstep+0.023
 comet+0.022
 couldn+0.022
Neg logit contributions
 harmful-0.023
 wasteful-0.020
 charity-0.020
Or-0.020
 spies-0.020
Tokenie
Token ID494
Feature activation-2.92e-02

Pos logit contributions
 comet+0.097
 doorstep+0.097
 doorway+0.096
field+0.095
chair+0.087
Neg logit contributions
 harmful-0.103
 wasteful-0.095
smart-0.092
drivers-0.089
 dishonest-0.088
Token and
Token ID290
Feature activation-1.28e-02

Pos logit contributions
 comet+0.043
 doorway+0.041
 doorstep+0.041
field+0.040
 sunrise+0.039
Neg logit contributions
 harmful-0.049
Or-0.047
 wasteful-0.043
smart-0.043
 dishonest-0.042
Token her
Token ID607
Feature activation+4.36e-03

Pos logit contributions
field+0.026
 doorstep+0.026
 doorway+0.026
 comet+0.026
chair+0.025
Neg logit contributions
 harmful-0.017
 wasteful-0.014
 charity-0.014
 spies-0.014
Or-0.013
Token sister
Token ID6621
Feature activation+5.11e-03

Pos logit contributions
 harmful+0.008
 charity+0.007
 wasteful+0.007
Or+0.007
 spies+0.007
Neg logit contributions
 doorway-0.007
field-0.007
 doorstep-0.007
 comet-0.007
 couldn-0.007
Token learned
Token ID4499
Feature activation-9.61e-03

Pos logit contributions
 harmful+0.010
 charity+0.009
Or+0.009
 wasteful+0.008
drivers+0.008
Neg logit contributions
 comet-0.007
field-0.007
 doorway-0.007
 doorstep-0.007
chair-0.006
Token string
Token ID4731
Feature activation-2.05e-02

Pos logit contributions
field+0.021
 couldn+0.021
 doorstep+0.021
chair+0.021
night+0.020
Neg logit contributions
 harmful-0.022
 charity-0.020
Or-0.019
 wasteful-0.019
 spies-0.019
Token,
Token ID11
Feature activation-1.65e-02

Pos logit contributions
 doorstep+0.028
 doorway+0.028
 comet+0.028
field+0.028
 couldn+0.027
Neg logit contributions
 harmful-0.039
Or-0.034
drivers-0.034
 spies-0.034
 charity-0.034
Token so
Token ID523
Feature activation-4.51e-03

Pos logit contributions
 doorstep+0.024
 comet+0.024
 doorway+0.024
field+0.024
chair+0.023
Neg logit contributions
 harmful-0.031
Or-0.027
 charity-0.027
 spies-0.027
 wasteful-0.026
Token that
Token ID326
Feature activation-1.78e-02

Pos logit contributions
 doorway+0.007
field+0.007
 comet+0.007
 doorstep+0.007
chair+0.007
Neg logit contributions
 harmful-0.008
 wasteful-0.007
 spies-0.007
Or-0.007
 charity-0.007
Token he
Token ID339
Feature activation-9.80e-05

Pos logit contributions
 doorway+0.029
 doorstep+0.028
chair+0.027
field+0.027
 comet+0.027
Neg logit contributions
 harmful-0.031
 wasteful-0.027
 spies-0.027
Or-0.026
 charity-0.026
Token wouldn
Token ID3636
Feature activation-6.25e-02

Pos logit contributions
 doorway+0.000
 doorstep+0.000
 sunrise+0.000
 comet+0.000
field+0.000
Neg logit contributions
 harmful-0.000
Or-0.000
 spies-0.000
 wasteful-0.000
ts-0.000
Token't
Token ID470
Feature activation-6.01e-03

Pos logit contributions
 doorway+0.103
field+0.097
 comet+0.094
night+0.092
 doorstep+0.091
Neg logit contributions
 harmful-0.104
drivers-0.093
smart-0.092
 charity-0.090
 spies-0.089
Token lose
Token ID4425
Feature activation+5.04e-04

Pos logit contributions
 doorstep+0.009
 comet+0.009
field+0.009
 doorway+0.009
 couldn+0.009
Neg logit contributions
 harmful-0.011
 wasteful-0.010
 charity-0.010
drivers-0.009
Or-0.009
Token it
Token ID340
Feature activation+2.85e-03

Pos logit contributions
 harmful+0.001
 wasteful+0.001
drivers+0.001
Or+0.001
 charity+0.001
Neg logit contributions
 comet-0.000
 couldn-0.000
 doorway-0.000
 night-0.000
 doorstep-0.000
Token again
Token ID757
Feature activation+1.04e-02

Pos logit contributions
 harmful+0.006
 wasteful+0.006
Or+0.006
 charity+0.006
drivers+0.006
Neg logit contributions
 couldn-0.003
 doorstep-0.003
field-0.003
 doorway-0.003
 night-0.003
Token.
Token ID13
Feature activation-2.80e-03

Pos logit contributions
 harmful+0.023
 wasteful+0.022
drivers+0.021
 charity+0.021
Or+0.021
Neg logit contributions
field-0.012
 couldn-0.011
 night-0.011
 doorstep-0.011
 doorway-0.011
Token it
Token ID340
Feature activation+1.52e-02

Pos logit contributions
 harmful+0.006
 wasteful+0.005
 charity+0.005
Or+0.005
 spies+0.005
Neg logit contributions
 doorway-0.002
field-0.002
 doorstep-0.002
 comet-0.002
 couldn-0.002
Token away
Token ID1497
Feature activation-3.80e-03

Pos logit contributions
 harmful+0.033
 charity+0.030
 wasteful+0.029
Or+0.029
drivers+0.029
Neg logit contributions
 doorway-0.018
 doorstep-0.018
 comet-0.018
 couldn-0.018
field-0.018
Token so
Token ID523
Feature activation-9.52e-03

Pos logit contributions
 doorway+0.006
field+0.006
 doorstep+0.006
 comet+0.006
 couldn+0.005
Neg logit contributions
 harmful-0.007
 wasteful-0.006
 charity-0.006
Or-0.006
 spies-0.006
Token Tim
Token ID5045
Feature activation+3.02e-03

Pos logit contributions
 couldn+0.013
 comet+0.013
field+0.012
 doorway+0.012
 night+0.012
Neg logit contributions
 harmful-0.019
 charity-0.017
Or-0.017
 wasteful-0.017
drivers-0.017
Tokenmy
Token ID1820
Feature activation+4.09e-03

Pos logit contributions
 harmful+0.005
 wasteful+0.004
drivers+0.004
 dishonest+0.004
 spies+0.004
Neg logit contributions
field-0.005
 doorway-0.005
 comet-0.005
 doorstep-0.005
 Cloud-0.005
Token wouldn
Token ID3636
Feature activation-6.21e-02

Pos logit contributions
 harmful+0.008
Or+0.007
 wasteful+0.007
drivers+0.007
 dishonest+0.007
Neg logit contributions
 doorway-0.006
 comet-0.006
 sunrise-0.006
field-0.005
 Cloud-0.005
Token't
Token ID470
Feature activation-6.08e-03

Pos logit contributions
 doorway+0.109
field+0.102
 comet+0.098
night+0.097
 doorstep+0.096
Neg logit contributions
 harmful-0.098
drivers-0.087
smart-0.086
 spies-0.084
 charity-0.081
Token launch
Token ID4219
Feature activation-8.89e-03

Pos logit contributions
 comet+0.009
 doorstep+0.009
 doorway+0.009
field+0.009
 couldn+0.009
Neg logit contributions
 harmful-0.011
 wasteful-0.010
 charity-0.009
Or-0.009
drivers-0.009
Token it
Token ID340
Feature activation+6.69e-04

Pos logit contributions
 comet+0.007
 couldn+0.007
 night+0.007
 doorway+0.007
 doorstep+0.007
Neg logit contributions
 harmful-0.022
 wasteful-0.021
 charity-0.021
Or-0.020
drivers-0.020
Token again
Token ID757
Feature activation+1.19e-02

Pos logit contributions
 harmful+0.001
 wasteful+0.001
 charity+0.001
Or+0.001
drivers+0.001
Neg logit contributions
 couldn-0.001
 doorstep-0.001
 night-0.001
chair-0.001
 doorway-0.001
Token.
Token ID13
Feature activation-1.28e-02

Pos logit contributions
 harmful+0.027
 wasteful+0.026
drivers+0.026
Or+0.026
 charity+0.025
Neg logit contributions
 night-0.012
field-0.012
 couldn-0.012
 doorstep-0.011
 doorway-0.011
Token the
Token ID262
Feature activation+3.31e-05

Pos logit contributions
 night+0.004
 doorway+0.004
 comet+0.003
field+0.003
 doorstep+0.003
Neg logit contributions
 harmful-0.006
drivers-0.006
 wasteful-0.006
Or-0.006
 dishonest-0.005
Token cabinet
Token ID13447
Feature activation-7.89e-03

Pos logit contributions
 harmful+0.000
 charity+0.000
 wasteful+0.000
Or+0.000
 spies+0.000
Neg logit contributions
field-0.000
 comet-0.000
 couldn-0.000
 doorstep-0.000
 doorway-0.000
Token so
Token ID523
Feature activation-2.01e-03

Pos logit contributions
 comet+0.013
 doorway+0.012
field+0.012
 doorstep+0.012
 couldn+0.012
Neg logit contributions
 harmful-0.014
 wasteful-0.012
 charity-0.012
Or-0.012
 spies-0.012
Token Tim
Token ID5045
Feature activation+1.06e-02

Pos logit contributions
 couldn+0.003
 doorway+0.003
field+0.003
 comet+0.003
 doorstep+0.003
Neg logit contributions
 harmful-0.004
 charity-0.004
 wasteful-0.003
Or-0.003
 spies-0.003
Tokenmy
Token ID1820
Feature activation+5.96e-03

Pos logit contributions
 harmful+0.017
 wasteful+0.015
drivers+0.014
 dishonest+0.014
Or+0.014
Neg logit contributions
 comet-0.018
 doorway-0.018
field-0.018
 doorstep-0.018
 Cloud-0.017
Token wouldn
Token ID3636
Feature activation-6.20e-02

Pos logit contributions
 harmful+0.012
 wasteful+0.011
Or+0.010
 dishonest+0.010
 spies+0.010
Neg logit contributions
 doorway-0.008
 comet-0.008
field-0.007
 Cloud-0.007
 doorstep-0.007
Token't
Token ID470
Feature activation-3.36e-03

Pos logit contributions
 doorway+0.089
field+0.084
 comet+0.082
 doorstep+0.080
night+0.078
Neg logit contributions
 harmful-0.117
drivers-0.104
 charity-0.104
 spies-0.102
 wasteful-0.101
Token eat
Token ID4483
Feature activation+2.64e-03

Pos logit contributions
 doorstep+0.005
field+0.005
 comet+0.005
 couldn+0.005
 night+0.005
Neg logit contributions
 harmful-0.006
drivers-0.006
 wasteful-0.006
Or-0.006
 charity-0.006
Token it
Token ID340
Feature activation+2.95e-03

Pos logit contributions
 harmful+0.007
Or+0.007
drivers+0.007
 charity+0.007
 wasteful+0.007
Neg logit contributions
 night-0.002
 couldn-0.002
 comet-0.001
field-0.001
 doorway-0.001
Token all
Token ID477
Feature activation-1.32e-02

Pos logit contributions
 harmful+0.007
 wasteful+0.006
drivers+0.006
cod+0.006
 charity+0.006
Neg logit contributions
 night-0.003
 couldn-0.003
 doorstep-0.003
chair-0.003
field-0.003
Token before
Token ID878
Feature activation-2.35e-02

Pos logit contributions
 doorway+0.019
 doorstep+0.019
field+0.019
 comet+0.019
 couldn+0.018
Neg logit contributions
 harmful-0.025
 wasteful-0.022
 charity-0.022
Or-0.022
 spies-0.021
Token.
Token ID13
Feature activation+2.58e-05

Pos logit contributions
 harmful+0.018
 wasteful+0.016
 charity+0.016
Or+0.016
 spies+0.015
Neg logit contributions
 doorway-0.012
field-0.012
 doorstep-0.012
 comet-0.012
 couldn-0.012
Token The
Token ID383
Feature activation-4.49e-03

Pos logit contributions
 harmful+0.000
 wasteful+0.000
 charity+0.000
 spies+0.000
Or+0.000
Neg logit contributions
field-0.000
 doorway-0.000
 doorstep-0.000
 comet-0.000
 couldn-0.000
Token elevator
Token ID20932
Feature activation+1.03e-03

Pos logit contributions
field+0.008
night+0.007
 awoke+0.007
 doorstep+0.007
 couldn+0.007
Neg logit contributions
 harmful-0.008
 charity-0.007
 wasteful-0.007
 spies-0.007
Or-0.006
Token stopped
Token ID5025
Feature activation-1.04e-02

Pos logit contributions
 harmful+0.002
Or+0.002
drivers+0.002
 wasteful+0.002
 spies+0.002
Neg logit contributions
field-0.001
 comet-0.001
 doorway-0.001
night-0.001
 doorstep-0.001
Token and
Token ID290
Feature activation-1.80e-02

Pos logit contributions
field+0.011
 comet+0.011
 couldn+0.011
 doorway+0.011
 doorstep+0.011
Neg logit contributions
 harmful-0.023
 charity-0.022
 wasteful-0.021
drivers-0.021
Or-0.021
Token wouldn
Token ID3636
Feature activation-6.17e-02

Pos logit contributions
 doorstep+0.028
 doorway+0.027
field+0.027
 comet+0.026
chair+0.025
Neg logit contributions
 harmful-0.033
 wasteful-0.029
Or-0.028
 spies-0.028
 charity-0.027
Token't
Token ID470
Feature activation-9.82e-03

Pos logit contributions
 doorway+0.099
field+0.094
 doorstep+0.092
 comet+0.090
night+0.089
Neg logit contributions
 harmful-0.106
drivers-0.095
Or-0.093
 spies-0.090
smart-0.088
Token move
Token ID1445
Feature activation+1.61e-03

Pos logit contributions
 doorway+0.016
field+0.015
 doorstep+0.015
 comet+0.015
 couldn+0.015
Neg logit contributions
 harmful-0.017
 wasteful-0.015
 charity-0.015
Or-0.015
 spies-0.015
Token.
Token ID13
Feature activation+4.30e-03

Pos logit contributions
 harmful+0.004
 charity+0.003
 wasteful+0.003
Or+0.003
cod+0.003
Neg logit contributions
 couldn-0.002
 night-0.002
 comet-0.002
 doorway-0.002
chair-0.002
Token She
Token ID1375
Feature activation-1.21e-02

Pos logit contributions
 harmful+0.008
 wasteful+0.008
 charity+0.008
Or+0.007
 spies+0.007
Neg logit contributions
field-0.006
 doorway-0.006
 comet-0.006
 doorstep-0.006
 couldn-0.006
Token was
Token ID373
Feature activation-2.94e-02

Pos logit contributions
field+0.018
 comet+0.018
 doorway+0.018
 sunrise+0.018
 doorstep+0.017
Neg logit contributions
 harmful-0.021
Or-0.019
ts-0.018
 wasteful-0.018
 recycled-0.017
Tokenâ
Token ID22940
Feature activation-9.49e-03

Pos logit contributions
 doorway+0.015
field+0.015
 doorstep+0.015
 comet+0.015
 couldn+0.014
Neg logit contributions
 harmful-0.022
 wasteful-0.019
 charity-0.019
Or-0.019
 spies-0.019
Token
Token ID26391
Feature activation-1.10e-02

Pos logit contributions
 comet+0.014
 doorway+0.014
 doorstep+0.014
field+0.013
 couldn+0.013
Neg logit contributions
 harmful-0.018
 wasteful-0.016
 charity-0.015
 dishonest-0.015
Or-0.015
Token
Token ID198
Feature activation-4.13e-03

Pos logit contributions
 doorway+0.015
 comet+0.015
 doorstep+0.015
field+0.014
night+0.014
Neg logit contributions
 harmful-0.022
Or-0.019
 charity-0.019
 wasteful-0.019
 spies-0.019
Token
Token ID198
Feature activation+1.80e-02

Pos logit contributions
 doorway+0.005
 comet+0.005
 doorstep+0.005
field+0.005
 couldn+0.005
Neg logit contributions
 harmful-0.009
 charity-0.008
 wasteful-0.008
Or-0.008
drivers-0.007
TokenL
Token ID43
Feature activation-1.51e-02

Pos logit contributions
 harmful+0.025
 spies+0.021
drivers+0.020
 wasteful+0.020
 thieves+0.020
Neg logit contributions
field-0.035
 couldn-0.035
 doorway-0.034
 doorstep-0.034
chair-0.034
Tokenizz
Token ID6457
Feature activation-6.16e-02

Pos logit contributions
field+0.028
 doorway+0.028
 doorstep+0.028
 comet+0.028
 couldn+0.026
Neg logit contributions
 harmful-0.022
 wasteful-0.019
 charity-0.019
Or-0.019
drivers-0.018
Tokenie
Token ID494
Feature activation-2.82e-02

Pos logit contributions
 doorstep+0.098
 doorway+0.096
 comet+0.095
field+0.090
chair+0.085
Neg logit contributions
 harmful-0.104
 wasteful-0.093
smart-0.091
Or-0.088
drivers-0.088
Token nodded
Token ID14464
Feature activation-7.78e-03

Pos logit contributions
 doorway+0.040
 doorstep+0.040
 comet+0.039
field+0.037
 sunrise+0.036
Neg logit contributions
 harmful-0.051
Or-0.047
 wasteful-0.044
 dishonest-0.044
smart-0.043
Token and
Token ID290
Feature activation-1.54e-02

Pos logit contributions
 doorway+0.011
field+0.011
 doorstep+0.011
 comet+0.011
chair+0.010
Neg logit contributions
 harmful-0.015
Or-0.013
 charity-0.013
 wasteful-0.013
 spies-0.013
Token said
Token ID531
Feature activation+5.74e-03

Pos logit contributions
 comet+0.022
 doorstep+0.022
 doorway+0.022
field+0.021
chair+0.020
Neg logit contributions
 harmful-0.029
Or-0.026
 wasteful-0.025
 spies-0.024
 dishonest-0.024
Token,
Token ID11
Feature activation-1.20e-02

Pos logit contributions
 harmful+0.011
 wasteful+0.010
 charity+0.010
Or+0.010
 spies+0.010
Neg logit contributions
 doorway-0.008
field-0.008
 doorstep-0.008
 comet-0.008
 couldn-0.008
Token\n
Token ID198
Feature activation+2.25e-02

Pos logit contributions
 doorway+0.023
 doorstep+0.023
 comet+0.022
night+0.022
field+0.022
Neg logit contributions
 harmful-0.019
 charity-0.018
Or-0.017
 spies-0.016
smart-0.016
TokenOne
Token ID3198
Feature activation-3.48e-03

Pos logit contributions
 harmful+0.026
 wasteful+0.021
 charity+0.021
 spies+0.020
Or+0.020
Neg logit contributions
field-0.049
 doorway-0.049
 comet-0.049
 doorstep-0.048
 couldn-0.048
Token day
Token ID1110
Feature activation+1.74e-02

Pos logit contributions
 doorstep+0.005
 doorway+0.005
field+0.004
 couldn+0.004
 comet+0.004
Neg logit contributions
 harmful-0.007
 wasteful-0.006
Or-0.006
S-0.006
cod-0.006
Token,
Token ID11
Feature activation+1.27e-03

Pos logit contributions
 harmful+0.028
 spies+0.025
drivers+0.025
 charity+0.024
 wasteful+0.024
Neg logit contributions
 doorway-0.030
 doorstep-0.030
field-0.029
 comet-0.029
chair-0.027
Token T
Token ID309
Feature activation-2.77e-02

Pos logit contributions
 harmful+0.003
 charity+0.002
 wasteful+0.002
 spies+0.002
Or+0.002
Neg logit contributions
 doorway-0.002
 doorstep-0.002
field-0.002
 comet-0.002
chair-0.001
Tokenaz
Token ID1031
Feature activation-6.15e-02

Pos logit contributions
field+0.045
 doorway+0.045
 doorstep+0.041
 couldn+0.038
chair+0.038
Neg logit contributions
 harmful-0.047
 spies-0.041
drivers-0.040
 charity-0.040
 wasteful-0.039
Token heard
Token ID2982
Feature activation-5.44e-03

Pos logit contributions
 doorway+0.093
 comet+0.088
field+0.086
 doorstep+0.083
 theatre+0.080
Neg logit contributions
 harmful-0.110
drivers-0.097
ts-0.096
Po-0.093
cod-0.093
Token some
Token ID617
Feature activation+1.77e-02

Pos logit contributions
field+0.012
 doorstep+0.011
 doorway+0.011
 comet+0.011
 couldn+0.010
Neg logit contributions
 harmful-0.007
 spies-0.006
 charity-0.006
 wasteful-0.006
smart-0.005
Token voices
Token ID10839
Feature activation-1.44e-02

Pos logit contributions
 harmful+0.029
 wasteful+0.025
 charity+0.025
 spies+0.025
Or+0.025
Neg logit contributions
 doorway-0.030
field-0.030
 doorstep-0.030
 comet-0.030
 couldn-0.029
Token coming
Token ID2406
Feature activation-1.09e-02

Pos logit contributions
field+0.023
 doorway+0.022
 doorstep+0.022
 comet+0.021
 couldn+0.021
Neg logit contributions
 harmful-0.026
 spies-0.022
 charity-0.022
smart-0.021
Or-0.021
Token from
Token ID422
Feature activation+4.74e-03

Pos logit contributions
 doorway+0.013
field+0.013
 doorstep+0.013
 comet+0.013
 couldn+0.012
Neg logit contributions
 harmful-0.024
 wasteful-0.021
 charity-0.021
Or-0.021
 spies-0.021
Token it
Token ID340
Feature activation-1.50e-03

Pos logit contributions
 harmful+0.006
 charity+0.005
 wasteful+0.005
Or+0.005
drivers+0.005
Neg logit contributions
 doorway-0.003
 couldn-0.003
 night-0.003
 comet-0.003
 doorstep-0.003
Token,
Token ID11
Feature activation-2.79e-02

Pos logit contributions
 doorway+0.002
 doorstep+0.002
field+0.002
 comet+0.002
 couldn+0.002
Neg logit contributions
 harmful-0.003
 wasteful-0.003
 charity-0.003
Or-0.003
 spies-0.003
Token so
Token ID523
Feature activation-1.22e-02

Pos logit contributions
 doorway+0.042
field+0.042
 comet+0.042
 doorstep+0.042
 couldn+0.040
Neg logit contributions
 harmful-0.052
 wasteful-0.045
 charity-0.045
Or-0.044
 spies-0.044
Token that
Token ID326
Feature activation-1.64e-02

Pos logit contributions
 doorway+0.018
 couldn+0.018
field+0.018
 doorstep+0.018
 comet+0.017
Neg logit contributions
 harmful-0.023
 charity-0.020
 wasteful-0.020
 spies-0.020
drivers-0.020
Token she
Token ID673
Feature activation-7.07e-03

Pos logit contributions
 doorway+0.023
field+0.022
 doorstep+0.022
 comet+0.022
 couldn+0.022
Neg logit contributions
 harmful-0.032
 charity-0.029
 wasteful-0.029
drivers-0.028
Or-0.028
Token wouldn
Token ID3636
Feature activation-6.14e-02

Pos logit contributions
 sunrise+0.009
 comet+0.009
 doorway+0.009
field+0.009
 doorstep+0.008
Neg logit contributions
 harmful-0.014
Or-0.013
 spies-0.012
drivers-0.012
ts-0.012
Token't
Token ID470
Feature activation-1.07e-02

Pos logit contributions
 doorway+0.099
field+0.096
 comet+0.094
night+0.091
 doorstep+0.089
Neg logit contributions
 harmful-0.103
drivers-0.095
smart-0.090
 spies-0.089
 charity-0.089
Token pull
Token ID2834
Feature activation-6.10e-03

Pos logit contributions
 doorstep+0.017
 doorway+0.017
 comet+0.016
field+0.016
 couldn+0.016
Neg logit contributions
 harmful-0.019
 wasteful-0.017
 charity-0.017
Or-0.016
drivers-0.016
Token it
Token ID340
Feature activation+3.94e-04

Pos logit contributions
 couldn+0.004
 night+0.004
 comet+0.004
 doorway+0.003
chair+0.003
Neg logit contributions
 harmful-0.016
 charity-0.015
Or-0.015
 wasteful-0.015
cod-0.015
Token too
Token ID1165
Feature activation+9.12e-03

Pos logit contributions
 harmful+0.001
 wasteful+0.001
Or+0.001
 charity+0.001
 dishonest+0.001
Neg logit contributions
 doorstep-0.000
 couldn-0.000
 night-0.000
 doorway-0.000
chair-0.000
Token hard
Token ID1327
Feature activation-1.56e-02

Pos logit contributions
 harmful+0.016
 charity+0.016
Or+0.016
drivers+0.015
 spies+0.015
Neg logit contributions
 night-0.013
 couldn-0.013
 doorway-0.013
chair-0.012
 doorstep-0.012
Tokenill
Token ID359
Feature activation-2.41e-02

Pos logit contributions
 comet+0.098
 doorstep+0.094
 doorway+0.094
night+0.093
field+0.093
Neg logit contributions
 harmful-0.035
 spies-0.029
Or-0.029
drivers-0.028
smart-0.028
Token.
Token ID13
Feature activation-2.18e-03

Pos logit contributions
 doorway+0.042
 comet+0.042
 doorstep+0.040
field+0.039
 Cloud+0.038
Neg logit contributions
 harmful-0.038
Or-0.031
 spies-0.029
 wasteful-0.029
 charity-0.028
Token But
Token ID887
Feature activation-1.12e-02

Pos logit contributions
 comet+0.003
field+0.003
 doorstep+0.003
 doorway+0.003
 awoke+0.003
Neg logit contributions
 harmful-0.004
Or-0.003
 wasteful-0.003
 charity-0.003
 spies-0.003
Token the
Token ID262
Feature activation-6.97e-04

Pos logit contributions
 doorway+0.017
 doorstep+0.017
 comet+0.016
chair+0.016
field+0.016
Neg logit contributions
 harmful-0.021
 wasteful-0.019
Or-0.018
 spies-0.018
 dishonest-0.017
Token ant
Token ID1885
Feature activation-2.06e-02

Pos logit contributions
chair+0.001
 awoke+0.001
field+0.001
night+0.001
 comet+0.001
Neg logit contributions
 harmful-0.001
 wasteful-0.001
 spies-0.001
 thieves-0.001
Or-0.001
Token wouldn
Token ID3636
Feature activation-6.14e-02

Pos logit contributions
 comet+0.029
 doorway+0.029
 doorstep+0.027
field+0.026
 sunrise+0.025
Neg logit contributions
 harmful-0.041
Or-0.036
 wasteful-0.035
 dishonest-0.033
 spies-0.033
Token't
Token ID470
Feature activation+7.92e-03

Pos logit contributions
 doorway+0.102
field+0.096
 doorstep+0.092
night+0.092
 comet+0.091
Neg logit contributions
 harmful-0.103
drivers-0.090
smart-0.089
 spies-0.088
Or-0.088
Token give
Token ID1577
Feature activation+8.59e-03

Pos logit contributions
 harmful+0.014
 wasteful+0.012
 charity+0.012
drivers+0.012
Or+0.012
Neg logit contributions
 doorstep-0.012
 doorway-0.012
 comet-0.012
field-0.012
 couldn-0.012
Token up
Token ID510
Feature activation+2.46e-03

Pos logit contributions
 harmful+0.021
 wasteful+0.019
 charity+0.019
 spies+0.019
Or+0.019
Neg logit contributions
 doorway-0.008
field-0.008
 doorstep-0.008
 comet-0.008
 couldn-0.007
Token.
Token ID13
Feature activation+3.48e-03

Pos logit contributions
 harmful+0.005
drivers+0.005
 wasteful+0.005
 charity+0.005
Or+0.005
Neg logit contributions
 couldn-0.003
 comet-0.003
field-0.003
 night-0.003
 doorstep-0.003
Token The
Token ID383
Feature activation+2.70e-03

Pos logit contributions
 harmful+0.007
 wasteful+0.006
 charity+0.006
Or+0.006
 spies+0.006
Neg logit contributions
 comet-0.005
 doorway-0.005
field-0.005
 doorstep-0.005
 couldn-0.004
Token in
Token ID287
Feature activation-3.15e-03

Pos logit contributions
 doorway+0.011
 comet+0.011
field+0.011
 doorstep+0.011
 couldn+0.010
Neg logit contributions
 harmful-0.018
 wasteful-0.015
 charity-0.015
 spies-0.015
Or-0.015
Token the
Token ID262
Feature activation-1.26e-02

Pos logit contributions
 doorway+0.004
 comet+0.004
field+0.004
 doorstep+0.004
 night+0.004
Neg logit contributions
 harmful-0.006
 wasteful-0.006
drivers-0.006
Or-0.005
 dishonest-0.005
Token closet
Token ID21615
Feature activation-1.87e-02

Pos logit contributions
 couldn+0.020
field+0.020
chair+0.019
ous+0.019
 comet+0.019
Neg logit contributions
 harmful-0.023
 charity-0.019
 wasteful-0.019
Or-0.018
 spies-0.018
Token so
Token ID523
Feature activation-1.81e-03

Pos logit contributions
 comet+0.026
 doorway+0.026
field+0.026
 doorstep+0.025
 couldn+0.025
Neg logit contributions
 harmful-0.036
 charity-0.032
 wasteful-0.032
 spies-0.032
Or-0.031
Token Lily
Token ID20037
Feature activation-9.03e-03

Pos logit contributions
 doorway+0.003
 doorstep+0.003
field+0.003
 comet+0.003
 couldn+0.002
Neg logit contributions
 harmful-0.003
 wasteful-0.003
 charity-0.003
Or-0.003
 spies-0.003
Token wouldn
Token ID3636
Feature activation-6.14e-02

Pos logit contributions
 sunrise+0.013
 comet+0.013
 doorway+0.013
field+0.012
 horizon+0.011
Neg logit contributions
 harmful-0.018
 spies-0.015
ts-0.015
Or-0.015
drivers-0.015
Token't
Token ID470
Feature activation-3.21e-03

Pos logit contributions
 doorway+0.091
field+0.089
 comet+0.087
 doorstep+0.083
night+0.083
Neg logit contributions
 harmful-0.112
drivers-0.101
 spies-0.098
 charity-0.097
Or-0.096
Token see
Token ID766
Feature activation-5.67e-03

Pos logit contributions
 doorstep+0.005
 couldn+0.005
 comet+0.005
 doorway+0.005
 night+0.005
Neg logit contributions
 harmful-0.006
 wasteful-0.005
drivers-0.005
 charity-0.005
Or-0.005
Token it
Token ID340
Feature activation-2.69e-03

Pos logit contributions
 doorway+0.006
 comet+0.006
 doorstep+0.005
 couldn+0.005
field+0.005
Neg logit contributions
 harmful-0.013
 wasteful-0.012
 charity-0.012
drivers-0.012
Or-0.012
Token anymore
Token ID7471
Feature activation-6.42e-03

Pos logit contributions
 couldn+0.003
 doorstep+0.003
 night+0.003
 doorway+0.003
 comet+0.003
Neg logit contributions
 harmful-0.006
 wasteful-0.005
 charity-0.005
 dishonest-0.005
drivers-0.005
Token.
Token ID13
Feature activation-1.50e-02

Pos logit contributions
 doorway+0.010
field+0.010
 doorstep+0.010
 comet+0.010
 couldn+0.009
Neg logit contributions
 harmful-0.012
 charity-0.010
 spies-0.010
 wasteful-0.010
Or-0.010
Token and
Token ID290
Feature activation-1.36e-02

Pos logit contributions
 comet+0.016
 doorway+0.016
field+0.016
 doorstep+0.015
chair+0.015
Neg logit contributions
 harmful-0.025
Or-0.022
 wasteful-0.022
drivers-0.022
 charity-0.022
Token found
Token ID1043
Feature activation+1.54e-03

Pos logit contributions
 doorway+0.019
 comet+0.019
 doorstep+0.019
field+0.019
chair+0.018
Neg logit contributions
 harmful-0.026
 spies-0.023
Or-0.023
 wasteful-0.023
 charity-0.023
Token all
Token ID477
Feature activation-1.41e-02

Pos logit contributions
 harmful+0.003
 wasteful+0.003
 charity+0.003
Or+0.003
 spies+0.003
Neg logit contributions
 doorway-0.002
 doorstep-0.002
field-0.002
 comet-0.002
 couldn-0.002
Token the
Token ID262
Feature activation+1.57e-02

Pos logit contributions
 doorstep+0.022
field+0.021
 comet+0.021
 doorway+0.021
 couldn+0.021
Neg logit contributions
 harmful-0.025
 wasteful-0.022
 charity-0.022
 spies-0.021
Or-0.021
Token tools
Token ID4899
Feature activation-5.65e-03

Pos logit contributions
 harmful+0.025
 wasteful+0.022
Or+0.022
 charity+0.021
 spies+0.021
Neg logit contributions
 doorway-0.027
 doorstep-0.027
 comet-0.026
field-0.026
 night-0.026
Token he
Token ID339
Feature activation+3.60e-02

Pos logit contributions
 doorway+0.007
 couldn+0.007
field+0.007
 doorstep+0.007
 comet+0.007
Neg logit contributions
 harmful-0.012
 wasteful-0.010
 charity-0.010
Or-0.010
 spies-0.010
Token needed
Token ID2622
Feature activation-2.00e-03

Pos logit contributions
 harmful+0.081
 charity+0.075
 wasteful+0.075
 spies+0.073
Or+0.071
Neg logit contributions
 couldn-0.042
 doorstep-0.038
 doorway-0.037
field-0.037
 comet-0.034
Token to
Token ID284
Feature activation-5.18e-04

Pos logit contributions
 doorway+0.002
 couldn+0.002
 doorstep+0.002
field+0.002
 night+0.002
Neg logit contributions
 harmful-0.004
 wasteful-0.004
Or-0.004
 charity-0.004
 spies-0.004
Token fix
Token ID4259
Feature activation+5.24e-03

Pos logit contributions
 doorstep+0.001
field+0.001
 couldn+0.001
 doorway+0.001
 comet+0.001
Neg logit contributions
 harmful-0.001
 wasteful-0.001
Or-0.001
 spies-0.001
drivers-0.001
Token the
Token ID262
Feature activation+5.79e-03

Pos logit contributions
 harmful+0.012
Or+0.011
 wasteful+0.011
 charity+0.011
 spies+0.010
Neg logit contributions
 night-0.005
 couldn-0.005
 doorway-0.005
field-0.005
 comet-0.005
Token car
Token ID1097
Feature activation+2.18e-02

Pos logit contributions
 harmful+0.011
 wasteful+0.010
Or+0.010
 charity+0.010
drivers+0.009
Neg logit contributions
 doorway-0.008
 doorstep-0.008
field-0.008
 comet-0.008
 couldn-0.008
Token\n
Token ID198
Feature activation+2.32e-02

Pos logit contributions
field+0.015
 comet+0.015
 doorway+0.015
 doorstep+0.014
night+0.013
Neg logit contributions
 harmful-0.015
 charity-0.014
Or-0.013
 spies-0.013
drivers-0.013
TokenL
Token ID43
Feature activation+4.34e-03

Pos logit contributions
 harmful+0.028
 wasteful+0.023
 spies+0.023
drivers+0.023
 charity+0.022
Neg logit contributions
 doorstep-0.050
 doorway-0.050
 couldn-0.049
chair-0.049
 comet-0.048
Tokenily
Token ID813
Feature activation-6.79e-03

Pos logit contributions
 harmful+0.004
 charity+0.003
Or+0.003
 wasteful+0.003
drivers+0.003
Neg logit contributions
 doorway-0.010
 doorstep-0.010
 comet-0.010
field-0.010
 couldn-0.010
Token asked
Token ID1965
Feature activation+2.36e-02

Pos logit contributions
 comet+0.010
field+0.010
 doorway+0.010
 doorstep+0.009
 sunrise+0.009
Neg logit contributions
 harmful-0.013
Or-0.011
 wasteful-0.011
 spies-0.011
 dishonest-0.010
Token her
Token ID607
Feature activation+5.69e-03

Pos logit contributions
 harmful+0.048
drivers+0.041
Or+0.041
 wasteful+0.041
cod+0.039
Neg logit contributions
 night-0.034
 couldn-0.034
night-0.032
chair-0.032
 comet-0.031
Token mom
Token ID1995
Feature activation+3.28e-02

Pos logit contributions
 harmful+0.013
 charity+0.012
 spies+0.012
 wasteful+0.012
Or+0.011
Neg logit contributions
field-0.006
 doorway-0.006
 doorstep-0.006
 comet-0.006
 couldn-0.006
Tokenmy
Token ID1820
Feature activation+2.89e-02

Pos logit contributions
 harmful+0.072
Or+0.064
 wasteful+0.063
drivers+0.062
 spies+0.062
Neg logit contributions
 couldn-0.039
 night-0.038
 doorway-0.037
 doorstep-0.037
 comet-0.036
Token,
Token ID11
Feature activation+2.25e-02

Pos logit contributions
 harmful+0.061
Or+0.054
 wasteful+0.053
drivers+0.053
 charity+0.052
Neg logit contributions
 couldn-0.037
 comet-0.035
 night-0.035
 doorway-0.035
 doorstep-0.035
Token "
Token ID366
Feature activation+6.74e-03

Pos logit contributions
 harmful+0.047
 wasteful+0.042
 charity+0.041
Or+0.041
drivers+0.041
Neg logit contributions
 doorway-0.028
 couldn-0.028
 comet-0.027
field-0.027
 doorstep-0.027
TokenWhy
Token ID5195
Feature activation-1.28e-02

Pos logit contributions
 harmful+0.009
 wasteful+0.007
 charity+0.007
Or+0.007
 spies+0.007
Neg logit contributions
 doorway-0.014
field-0.013
 doorstep-0.013
 comet-0.013
 couldn-0.013
Token is
Token ID318
Feature activation-1.44e-02

Pos logit contributions
 doorway+0.028
field+0.027
 doorstep+0.026
 horizon+0.024
 comet+0.024
Neg logit contributions
 harmful-0.017
 wasteful-0.015
 spies-0.014
 charity-0.014
drivers-0.013
Token to
Token ID284
Feature activation-1.39e-02

Pos logit contributions
field+0.000
 doorway+0.000
 doorstep+0.000
 comet+0.000
 couldn+0.000
Neg logit contributions
 harmful-0.000
 charity-0.000
 spies-0.000
 wasteful-0.000
Or-0.000
Token call
Token ID869
Feature activation-8.09e-04

Pos logit contributions
 doorway+0.030
field+0.030
chair+0.029
oline+0.028
ane+0.028
Neg logit contributions
 harmful-0.017
 spies-0.013
 charity-0.013
 survive-0.013
Or-0.012
Token her
Token ID607
Feature activation+5.42e-03

Pos logit contributions
field+0.002
 doorway+0.001
 doorstep+0.001
 comet+0.001
 couldn+0.001
Neg logit contributions
 harmful-0.001
 charity-0.001
 spies-0.001
 wasteful-0.001
Or-0.001
Token mom
Token ID1995
Feature activation+3.44e-02

Pos logit contributions
 harmful+0.010
 spies+0.009
 charity+0.009
 thieves+0.008
 wasteful+0.008
Neg logit contributions
 doorway-0.008
field-0.008
 awoke-0.008
 doorstep-0.008
chair-0.008
Token and
Token ID290
Feature activation+8.47e-03

Pos logit contributions
 harmful+0.076
 wasteful+0.068
Or+0.067
 charity+0.066
 spies+0.065
Neg logit contributions
 couldn-0.039
 doorway-0.039
 doorstep-0.039
field-0.039
 night-0.038
Token dad
Token ID9955
Feature activation+3.32e-02

Pos logit contributions
 harmful+0.015
 spies+0.014
 charity+0.014
 wasteful+0.014
Or+0.014
Neg logit contributions
field-0.013
 doorway-0.013
 comet-0.013
 doorstep-0.013
chair-0.012
Token,
Token ID11
Feature activation-3.29e-04

Pos logit contributions
 harmful+0.066
 wasteful+0.059
Or+0.058
 charity+0.058
 spies+0.057
Neg logit contributions
 doorway-0.045
 doorstep-0.044
field-0.044
 comet-0.043
 couldn-0.043
Token but
Token ID475
Feature activation-8.08e-03

Pos logit contributions
 comet+0.001
field+0.001
 doorway+0.001
 doorstep+0.001
night+0.001
Neg logit contributions
 harmful-0.000
 charity-0.000
Or-0.000
 spies-0.000
 wasteful-0.000
Token they
Token ID484
Feature activation+4.18e-03

Pos logit contributions
 doorway+0.014
field+0.014
 doorstep+0.014
 comet+0.014
 couldn+0.013
Neg logit contributions
 harmful-0.013
 wasteful-0.011
 charity-0.011
Or-0.011
 spies-0.011
Token were
Token ID547
Feature activation-1.09e-02

Pos logit contributions
 harmful+0.007
 thieves+0.005
 wasteful+0.005
Or+0.005
ts+0.005
Neg logit contributions
 sunrise-0.007
night-0.007
 doorway-0.007
 comet-0.007
field-0.007
Token in
Token ID287
Feature activation-1.42e-03

Pos logit contributions
 doorstep+0.020
field+0.019
 doorway+0.019
 comet+0.019
 couldn+0.018
Neg logit contributions
 harmful-0.018
 spies-0.015
 wasteful-0.015
 charity-0.014
Or-0.014
Token fence
Token ID13990
Feature activation-2.98e-03

Pos logit contributions
 harmful+0.002
 charity+0.002
 wasteful+0.002
 spies+0.002
Or+0.002
Neg logit contributions
field-0.002
 doorway-0.002
 couldn-0.002
 doorstep-0.002
 comet-0.002
Token.
Token ID13
Feature activation+6.07e-03

Pos logit contributions
 doorway+0.004
 doorstep+0.004
field+0.004
 couldn+0.004
 comet+0.004
Neg logit contributions
 harmful-0.006
 wasteful-0.005
 charity-0.005
drivers-0.005
 spies-0.005
Token And
Token ID843
Feature activation+5.67e-03

Pos logit contributions
 harmful+0.010
 wasteful+0.008
 charity+0.008
Or+0.008
drivers+0.008
Neg logit contributions
 doorway-0.011
field-0.011
 doorstep-0.011
 comet-0.011
 couldn-0.010
Token they
Token ID484
Feature activation+1.38e-02

Pos logit contributions
 harmful+0.010
 wasteful+0.008
 charity+0.008
drivers+0.008
Or+0.008
Neg logit contributions
 doorway-0.009
field-0.009
 comet-0.009
 couldn-0.009
 doorstep-0.009
Token did
Token ID750
Feature activation+5.39e-03

Pos logit contributions
 harmful+0.023
 wasteful+0.020
Or+0.019
 charity+0.019
 dishonest+0.018
Neg logit contributions
 doorway-0.025
field-0.024
 comet-0.024
 doorstep-0.023
chair-0.023
Token not
Token ID407
Feature activation+3.41e-02

Pos logit contributions
 harmful+0.008
 charity+0.007
 wasteful+0.007
Or+0.007
drivers+0.007
Neg logit contributions
 doorway-0.010
 doorstep-0.010
field-0.010
 comet-0.010
 couldn-0.009
Token mean
Token ID1612
Feature activation+1.01e-02

Pos logit contributions
drivers+0.061
Or+0.057
 harmful+0.057
 charity+0.053
 spies+0.052
Neg logit contributions
 night-0.050
 comet-0.047
 couldn-0.046
 cloud-0.044
 doorstep-0.044
Token to
Token ID284
Feature activation+1.70e-02

Pos logit contributions
 harmful+0.025
 charity+0.022
 wasteful+0.022
Or+0.022
drivers+0.022
Neg logit contributions
 couldn-0.009
 comet-0.009
 doorstep-0.009
 doorway-0.008
field-0.008
Token upset
Token ID9247
Feature activation-1.47e-03

Pos logit contributions
drivers+0.033
 harmful+0.032
iers+0.031
Or+0.031
 spies+0.030
Neg logit contributions
 comet-0.022
 night-0.021
 cloud-0.021
 couldn-0.020
 doorstep-0.020
Token the
Token ID262
Feature activation+4.61e-03

Pos logit contributions
 comet+0.002
 night+0.002
 couldn+0.002
chair+0.002
 doorstep+0.002
Neg logit contributions
 harmful-0.003
drivers-0.003
Or-0.003
 wasteful-0.002
 charity-0.002
Token dog
Token ID3290
Feature activation+1.47e-03

Pos logit contributions
 harmful+0.009
 wasteful+0.008
 charity+0.008
Or+0.008
 spies+0.008
Neg logit contributions
 doorway-0.006
field-0.006
 doorstep-0.006
 comet-0.006
 couldn-0.006
Token They
Token ID1119
Feature activation+5.28e-03

Pos logit contributions
 doorway+0.014
field+0.014
 doorstep+0.014
 comet+0.014
 couldn+0.014
Neg logit contributions
 harmful-0.011
 wasteful-0.009
 charity-0.009
Or-0.008
 spies-0.008
Token smile
Token ID8212
Feature activation+5.84e-03

Pos logit contributions
 harmful+0.010
 charity+0.009
 wasteful+0.009
Or+0.009
 spies+0.009
Neg logit contributions
 doorway-0.008
field-0.008
 doorstep-0.008
 comet-0.007
 couldn-0.007
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
 harmful+0.011
 wasteful+0.009
 charity+0.009
Or+0.009
 spies+0.009
Neg logit contributions
 doorway-0.009
field-0.009
 doorstep-0.009
 comet-0.009
 couldn-0.008
Token They
Token ID1119
Feature activation+1.30e-02

Pos logit contributions
 harmful+0.016
 wasteful+0.013
 charity+0.013
Or+0.013
 spies+0.013
Neg logit contributions
 doorway-0.019
field-0.019
 comet-0.019
 doorstep-0.019
 couldn-0.019
Token do
Token ID466
Feature activation+4.41e-03

Pos logit contributions
 harmful+0.022
 wasteful+0.019
 charity+0.019
Or+0.019
 spies+0.019
Neg logit contributions
 doorway-0.021
field-0.021
 doorstep-0.021
 comet-0.021
 couldn-0.021
Token not
Token ID407
Feature activation+3.31e-02

Pos logit contributions
 harmful+0.007
 wasteful+0.006
 charity+0.006
Or+0.006
 spies+0.005
Neg logit contributions
 doorway-0.008
field-0.008
 doorstep-0.008
 comet-0.008
 couldn-0.008
Token see
Token ID766
Feature activation+6.89e-03

Pos logit contributions
drivers+0.059
 harmful+0.055
Or+0.054
 charity+0.052
 spies+0.050
Neg logit contributions
 night-0.051
 comet-0.048
 couldn-0.048
 cloud-0.045
 doorstep-0.044
Token the
Token ID262
Feature activation-9.39e-03

Pos logit contributions
 harmful+0.014
drivers+0.013
 charity+0.012
Or+0.012
 wasteful+0.012
Neg logit contributions
 comet-0.009
 night-0.009
 couldn-0.009
field-0.009
 doorway-0.009
Token man
Token ID582
Feature activation+2.08e-02

Pos logit contributions
field+0.016
 doorway+0.016
chair+0.015
 doorstep+0.015
 couldn+0.015
Neg logit contributions
 harmful-0.016
 wasteful-0.014
 charity-0.014
 spies-0.013
Or-0.013
Token.
Token ID13
Feature activation+4.38e-03

Pos logit contributions
 harmful+0.042
drivers+0.038
 wasteful+0.038
 charity+0.037
Or+0.037
Neg logit contributions
 doorstep-0.026
field-0.026
 comet-0.026
 couldn-0.026
 doorway-0.026
Token He
Token ID679
Feature activation+2.34e-02

Pos logit contributions
 harmful+0.008
 wasteful+0.007
ief+0.007
 charity+0.007
 dishonest+0.007
Neg logit contributions
field-0.006
 doorway-0.006
 comet-0.006
 couldn-0.006
 night-0.006
Token and
Token ID290
Feature activation-1.34e-02

Pos logit contributions
 comet+0.033
 doorway+0.032
field+0.031
 doorstep+0.031
chair+0.030
Neg logit contributions
 harmful-0.032
 charity-0.027
drivers-0.027
 spies-0.027
Or-0.027
Token kept
Token ID4030
Feature activation-1.37e-02

Pos logit contributions
 comet+0.018
 doorway+0.017
field+0.017
 doorstep+0.017
chair+0.016
Neg logit contributions
 harmful-0.027
Or-0.024
 spies-0.024
 wasteful-0.023
 charity-0.023
Token repeating
Token ID20394
Feature activation-1.51e-04

Pos logit contributions
 doorstep+0.019
field+0.019
 doorway+0.019
 comet+0.019
chair+0.019
Neg logit contributions
 harmful-0.026
 spies-0.023
 charity-0.023
drivers-0.022
 wasteful-0.022
Token the
Token ID262
Feature activation+4.70e-03

Pos logit contributions
 doorway+0.000
 comet+0.000
field+0.000
 couldn+0.000
 doorstep+0.000
Neg logit contributions
 harmful-0.000
 wasteful-0.000
 charity-0.000
Or-0.000
 spies-0.000
Token silly
Token ID14397
Feature activation+6.69e-03

Pos logit contributions
 harmful+0.008
 charity+0.007
 wasteful+0.007
Or+0.006
 spies+0.006
Neg logit contributions
 doorway-0.008
field-0.008
 doorstep-0.008
 comet-0.008
 couldn-0.008
Token song
Token ID3496
Feature activation+1.29e-03

Pos logit contributions
 harmful+0.013
 wasteful+0.011
Or+0.011
 charity+0.011
 spies+0.011
Neg logit contributions
 doorway-0.010
 doorstep-0.010
 comet-0.009
 couldn-0.009
field-0.009
Token.
Token ID13
Feature activation-5.18e-03

Pos logit contributions
 harmful+0.002
 wasteful+0.002
 charity+0.002
Or+0.002
 spies+0.002
Neg logit contributions
 doorway-0.002
field-0.002
 doorstep-0.002
 comet-0.002
 couldn-0.002
Token He
Token ID679
Feature activation+3.94e-03

Pos logit contributions
 comet+0.008
field+0.008
 doorstep+0.008
 doorway+0.008
 couldn+0.007
Neg logit contributions
 harmful-0.009
 charity-0.008
drivers-0.008
Or-0.008
 wasteful-0.008
Token was
Token ID373
Feature activation-1.22e-02

Pos logit contributions
 harmful+0.007
Or+0.007
 wasteful+0.006
ts+0.006
drivers+0.006
Neg logit contributions
 doorway-0.006
 doorstep-0.006
field-0.005
 comet-0.005
 sunrise-0.005
Token very
Token ID845
Feature activation+1.41e-02

Pos logit contributions
 doorstep+0.020
field+0.020
 comet+0.019
 doorway+0.019
chair+0.018
Neg logit contributions
 harmful-0.021
 wasteful-0.019
 spies-0.018
Or-0.018
 dishonest-0.018
Token excited
Token ID6568
Feature activation+8.74e-04

Pos logit contributions
 harmful+0.025
 charity+0.024
drivers+0.023
Or+0.022
 spies+0.022
Neg logit contributions
 couldn-0.021
 doorway-0.021
 comet-0.020
field-0.020
 doorstep-0.020
Token sad
Token ID6507
Feature activation+3.18e-03

Pos logit contributions
 comet+0.030
field+0.030
 doorstep+0.029
 doorway+0.028
chair+0.027
Neg logit contributions
 harmful-0.024
 wasteful-0.020
 spies-0.020
 dishonest-0.019
 charity-0.019
Token,
Token ID11
Feature activation-1.53e-02

Pos logit contributions
 harmful+0.007
 charity+0.006
 wasteful+0.006
Or+0.006
drivers+0.006
Neg logit contributions
 couldn-0.004
 doorway-0.004
 doorstep-0.004
field-0.004
 comet-0.004
Token but
Token ID475
Feature activation-2.05e-02

Pos logit contributions
 doorway+0.026
field+0.026
 doorstep+0.026
 comet+0.025
 couldn+0.025
Neg logit contributions
 harmful-0.025
 wasteful-0.022
 charity-0.022
Or-0.021
 spies-0.021
Token then
Token ID788
Feature activation+1.77e-02

Pos logit contributions
 doorstep+0.038
 doorway+0.038
field+0.038
 comet+0.037
chair+0.035
Neg logit contributions
 harmful-0.031
 wasteful-0.026
 spies-0.026
Or-0.025
 charity-0.025
Token the
Token ID262
Feature activation+1.69e-02

Pos logit contributions
 harmful+0.032
 wasteful+0.028
 charity+0.028
Or+0.027
drivers+0.027
Neg logit contributions
 doorway-0.027
 couldn-0.027
field-0.027
 comet-0.027
 doorstep-0.027
Token conductor
Token ID39206
Feature activation+6.04e-03

Pos logit contributions
 harmful+0.030
 charity+0.027
 wasteful+0.026
 spies+0.026
Or+0.026
Neg logit contributions
field-0.026
 doorstep-0.026
 doorway-0.026
 comet-0.025
 couldn-0.025
Token came
Token ID1625
Feature activation+9.49e-04

Pos logit contributions
 harmful+0.012
Or+0.011
drivers+0.011
 wasteful+0.011
 charity+0.011
Neg logit contributions
field-0.008
 comet-0.008
 doorway-0.008
 doorstep-0.007
chair-0.007
Token over
Token ID625
Feature activation+2.08e-02

Pos logit contributions
 harmful+0.002
 wasteful+0.002
 charity+0.002
Or+0.002
 spies+0.002
Neg logit contributions
 doorway-0.001
field-0.001
 doorstep-0.001
 comet-0.001
 couldn-0.001
Token and
Token ID290
Feature activation-1.65e-03

Pos logit contributions
 harmful+0.052
 wasteful+0.047
drivers+0.046
cod+0.045
ief+0.045
Neg logit contributions
 doorway-0.019
 couldn-0.019
 night-0.018
 doorstep-0.018
field-0.017
Token said
Token ID531
Feature activation+9.09e-03

Pos logit contributions
 doorway+0.002
field+0.002
 doorstep+0.002
 comet+0.002
 couldn+0.002
Neg logit contributions
 harmful-0.004
 wasteful-0.003
 charity-0.003
Or-0.003
 spies-0.003
Token,
Token ID11
Feature activation-5.69e-04

Pos logit contributions
 harmful+0.021
 charity+0.018
 wasteful+0.018
Or+0.018
ief+0.018
Neg logit contributions
 couldn-0.011
 comet-0.009
 replied-0.009
chair-0.009
 doorway-0.009
Token\n
Token ID198
Feature activation-2.75e-03

Pos logit contributions
 doorway+0.002
 couldn+0.002
 sky+0.002
chair+0.002
 night+0.002
Neg logit contributions
 harmful-0.007
ief-0.006
ts-0.006
cod-0.006
 wasteful-0.006
Token\n
Token ID198
Feature activation+2.50e-02

Pos logit contributions
 doorway+0.003
 doorstep+0.003
field+0.003
 couldn+0.003
 comet+0.002
Neg logit contributions
 harmful-0.007
 wasteful-0.006
Or-0.006
 charity-0.006
 spies-0.006
Token"
Token ID1
Feature activation+2.58e-02

Pos logit contributions
 harmful+0.051
drivers+0.044
 theirs+0.044
 thieves+0.043
 spies+0.043
Neg logit contributions
chair-0.031
 doorway-0.030
 couldn-0.029
 night-0.028
field-0.028
TokenWhat
Token ID2061
Feature activation+1.05e-02

Pos logit contributions
 harmful+0.044
drivers+0.041
smart+0.038
 charity+0.037
cod+0.037
Neg logit contributions
 comet-0.040
 sky-0.039
 doorway-0.039
 sunrise-0.037
 cloud-0.037
Token happened
Token ID3022
Feature activation+5.04e-03

Pos logit contributions
 harmful+0.020
 wasteful+0.018
 charity+0.018
Or+0.018
 spies+0.018
Neg logit contributions
 doorway-0.015
field-0.015
 doorstep-0.015
 comet-0.014
 couldn-0.014
Token?"
Token ID1701
Feature activation+6.42e-03

Pos logit contributions
 harmful+0.010
Or+0.009
 charity+0.009
 wasteful+0.009
drivers+0.009
Neg logit contributions
 comet-0.007
 doorway-0.007
field-0.007
 doorstep-0.007
 couldn-0.007
Token Mom
Token ID11254
Feature activation-1.42e-02

Pos logit contributions
 harmful+0.016
drivers+0.014
 charity+0.014
Or+0.014
 wasteful+0.014
Neg logit contributions
 doorway-0.006
 couldn-0.006
 comet-0.006
 night-0.006
chair-0.005
Token asked
Token ID1965
Feature activation+2.04e-02

Pos logit contributions
field+0.020
 doorstep+0.020
 doorway+0.020
 comet+0.020
 couldn+0.018
Neg logit contributions
 harmful-0.027
 wasteful-0.024
 spies-0.024
drivers-0.024
Or-0.023
Token.
Token ID13
Feature activation-1.50e-02

Pos logit contributions
 harmful+0.050
Or+0.047
 wasteful+0.045
cod+0.045
 charity+0.044
Neg logit contributions
 couldn-0.018
 night-0.016
 doorway-0.016
chair-0.015
night-0.015
Token\n
Token ID198
Feature activation+6.11e-03

Pos logit contributions
 doorway+0.018
 couldn+0.017
 doorstep+0.016
 comet+0.016
field+0.016
Neg logit contributions
 harmful-0.033
 wasteful-0.030
 charity-0.029
drivers-0.029
ief-0.029
Token\n
Token ID198
Feature activation+2.23e-02

Pos logit contributions
 harmful+0.015
 wasteful+0.014
Or+0.014
 dishonest+0.014
 charity+0.013
Neg logit contributions
 doorway-0.005
 doorstep-0.005
chair-0.005
 couldn-0.005
 comet-0.005
Token and
Token ID290
Feature activation-6.88e-03

Pos logit contributions
 harmful+0.020
 wasteful+0.019
drivers+0.019
 charity+0.019
 dishonest+0.018
Neg logit contributions
 couldn-0.008
 comet-0.008
 night-0.008
 awoke-0.008
night-0.008
Token held
Token ID2714
Feature activation+8.63e-03

Pos logit contributions
 doorway+0.007
 couldn+0.007
field+0.007
 doorstep+0.007
 comet+0.007
Neg logit contributions
 harmful-0.016
 charity-0.014
 wasteful-0.014
drivers-0.014
 spies-0.014
Token Jill
Token ID22772
Feature activation+3.22e-03

Pos logit contributions
 harmful+0.017
 charity+0.015
 wasteful+0.015
Or+0.015
drivers+0.015
Neg logit contributions
 doorway-0.012
 comet-0.012
 night-0.012
 couldn-0.012
 doorstep-0.011
Token's
Token ID338
Feature activation-1.86e-04

Pos logit contributions
 harmful+0.007
 wasteful+0.006
 charity+0.006
Or+0.006
 spies+0.006
Neg logit contributions
 doorway-0.004
 doorstep-0.004
field-0.004
 comet-0.004
 couldn-0.004
Token hand
Token ID1021
Feature activation-1.06e-03

Pos logit contributions
field+0.000
 doorway+0.000
 doorstep+0.000
 comet+0.000
 couldn+0.000
Neg logit contributions
 harmful-0.000
 charity-0.000
Or-0.000
 wasteful-0.000
 spies-0.000
Token.
Token ID13
Feature activation+3.18e-03

Pos logit contributions
 doorway+0.002
field+0.002
 doorstep+0.002
 comet+0.002
 couldn+0.002
Neg logit contributions
 harmful-0.002
 wasteful-0.002
 charity-0.002
Or-0.002
 spies-0.002
Token They
Token ID1119
Feature activation+4.42e-03

Pos logit contributions
 harmful+0.007
ief+0.006
 thieves+0.006
 wasteful+0.006
cod+0.006
Neg logit contributions
 doorway-0.004
 couldn-0.004
 sky-0.004
 replied-0.004
field-0.004
Token sw
Token ID1509
Feature activation+8.47e-03

Pos logit contributions
 harmful+0.008
Or+0.007
 wasteful+0.007
 spies+0.007
 charity+0.007
Neg logit contributions
 doorway-0.007
 comet-0.006
field-0.006
 doorstep-0.006
chair-0.006
Tokenam
Token ID321
Feature activation+1.46e-02

Pos logit contributions
 harmful+0.016
 charity+0.015
 wasteful+0.014
Or+0.014
 spies+0.014
Neg logit contributions
 doorstep-0.012
 couldn-0.012
field-0.012
 doorway-0.012
night-0.012
Token back
Token ID736
Feature activation+4.56e-03

Pos logit contributions
 harmful+0.030
 wasteful+0.027
 charity+0.027
Or+0.026
drivers+0.026
Neg logit contributions
 doorway-0.019
 doorstep-0.018
field-0.018
 comet-0.018
 couldn-0.018
Token to
Token ID284
Feature activation-1.38e-02

Pos logit contributions
 harmful+0.009
 spies+0.008
Or+0.008
drivers+0.008
 charity+0.008
Neg logit contributions
 comet-0.006
 doorway-0.006
 doorstep-0.006
field-0.006
 couldn-0.006
Token me
Token ID502
Feature activation-1.51e-05

Pos logit contributions
field+0.039
 doorway+0.037
 doorstep+0.037
 comet+0.035
 couldn+0.034
Neg logit contributions
 harmful-0.035
 wasteful-0.029
 spies-0.029
 charity-0.028
Or-0.028
Token sad
Token ID6507
Feature activation+3.41e-03

Pos logit contributions
 doorway+0.000
field+0.000
 doorstep+0.000
 comet+0.000
 couldn+0.000
Neg logit contributions
 harmful-0.000
 wasteful-0.000
 charity-0.000
Or-0.000
 spies-0.000
Token."
Token ID526
Feature activation-7.34e-03

Pos logit contributions
 harmful+0.007
drivers+0.006
 charity+0.006
 wasteful+0.006
Or+0.006
Neg logit contributions
 comet-0.004
 night-0.004
 couldn-0.004
chair-0.004
 doorway-0.004
Token
Token ID198
Feature activation-1.40e-02

Pos logit contributions
 doorway+0.010
 doorstep+0.009
field+0.009
 comet+0.009
 couldn+0.009
Neg logit contributions
 harmful-0.015
Or-0.013
 charity-0.013
 spies-0.013
 wasteful-0.013
Token
Token ID198
Feature activation+1.12e-02

Pos logit contributions
 doorway+0.019
 doorstep+0.019
 comet+0.018
field+0.018
 couldn+0.017
Neg logit contributions
 harmful-0.028
 charity-0.025
Or-0.025
 spies-0.024
 wasteful-0.024
Token"
Token ID1
Feature activation+1.76e-02

Pos logit contributions
 harmful+0.018
drivers+0.016
 wasteful+0.016
 dishonest+0.015
 charity+0.015
Neg logit contributions
field-0.020
 comet-0.019
 doorway-0.018
 cloud-0.018
 doorstep-0.018
TokenMe
Token ID5308
Feature activation+2.97e-04

Pos logit contributions
 harmful+0.033
drivers+0.031
 charity+0.030
 thieves+0.029
 spies+0.029
Neg logit contributions
 comet-0.025
field-0.024
 doorway-0.023
 couldn-0.023
chair-0.023
Token too
Token ID1165
Feature activation-4.16e-03

Pos logit contributions
 harmful+0.001
 charity+0.000
 wasteful+0.000
Or+0.000
 spies+0.000
Neg logit contributions
 doorway-0.000
field-0.000
 comet-0.000
 doorstep-0.000
 couldn-0.000
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
 doorway+0.005
 doorstep+0.005
 couldn+0.005
 comet+0.005
field+0.005
Neg logit contributions
 harmful-0.009
 charity-0.008
Or-0.008
 wasteful-0.008
 spies-0.008
Token I
Token ID314
Feature activation-1.11e-03

Pos logit contributions
 doorway+0.015
field+0.015
 doorstep+0.015
 comet+0.015
 couldn+0.014
Neg logit contributions
 harmful-0.019
 charity-0.017
 wasteful-0.016
Or-0.016
 spies-0.016
Token wish
Token ID4601
Feature activation-2.25e-02

Pos logit contributions
 doorway+0.002
field+0.002
 doorstep+0.002
 comet+0.002
chair+0.002
Neg logit contributions
 harmful-0.002
 wasteful-0.001
Or-0.001
 charity-0.001
 spies-0.001
Token,
Token ID11
Feature activation+1.07e-02

Pos logit contributions
 harmful+0.027
 spies+0.024
 wasteful+0.023
drivers+0.022
 thieves+0.022
Neg logit contributions
 doorstep-0.036
 doorway-0.036
field-0.036
 comet-0.034
chair-0.032
Token the
Token ID262
Feature activation+1.06e-03

Pos logit contributions
 harmful+0.015
 spies+0.013
 wasteful+0.013
Or+0.013
 charity+0.012
Neg logit contributions
 doorstep-0.021
 doorway-0.021
field-0.021
 comet-0.020
chair-0.019
Token sky
Token ID6766
Feature activation+6.46e-03

Pos logit contributions
 harmful+0.002
 spies+0.002
 charity+0.002
 wasteful+0.002
Or+0.002
Neg logit contributions
field-0.002
 awoke-0.002
 couldn-0.002
 doorway-0.002
 doorstep-0.001
Token started
Token ID2067
Feature activation-1.92e-02

Pos logit contributions
 harmful+0.010
 mean+0.008
 wasteful+0.008
 trash+0.008
 rude+0.008
Neg logit contributions
field-0.011
 doorway-0.010
 doorstep-0.010
 comet-0.010
la-0.010
Token to
Token ID284
Feature activation-1.31e-03

Pos logit contributions
 doorstep+0.031
field+0.031
 comet+0.031
 doorway+0.029
 therapist+0.029
Neg logit contributions
 harmful-0.034
 spies-0.028
 wasteful-0.027
Or-0.027
 traps-0.026
Token turn
Token ID1210
Feature activation-7.86e-03

Pos logit contributions
 doorway+0.002
 doorstep+0.002
field+0.002
 comet+0.002
 couldn+0.002
Neg logit contributions
 harmful-0.002
 wasteful-0.002
 charity-0.002
Or-0.002
 spies-0.002
Token dark
Token ID3223
Feature activation-8.13e-03

Pos logit contributions
 doorstep+0.013
field+0.013
 doorway+0.012
 comet+0.012
 couldn+0.011
Neg logit contributions
 harmful-0.014
 wasteful-0.012
 spies-0.012
Or-0.011
drivers-0.011
Token and
Token ID290
Feature activation+2.78e-03

Pos logit contributions
 doorstep+0.012
field+0.012
 doorway+0.012
 comet+0.012
 couldn+0.012
Neg logit contributions
 harmful-0.015
 spies-0.013
drivers-0.012
Or-0.012
 wasteful-0.012
Token the
Token ID262
Feature activation-5.52e-03

Pos logit contributions
 harmful+0.005
 wasteful+0.004
 spies+0.004
Or+0.004
 charity+0.004
Neg logit contributions
 doorstep-0.005
 doorway-0.005
field-0.005
 comet-0.005
chair-0.005
Token wind
Token ID2344
Feature activation-8.41e-03

Pos logit contributions
field+0.008
 awoke+0.008
 replied+0.008
night+0.008
chair+0.008
Neg logit contributions
 harmful-0.010
 spies-0.009
 charity-0.009
 wasteful-0.009
 thieves-0.008
Token began
Token ID2540
Feature activation-1.26e-02

Pos logit contributions
 doorstep+0.014
 doorway+0.014
 comet+0.013
field+0.013
 Moon+0.013
Neg logit contributions
 harmful-0.014
Or-0.011
 wasteful-0.011
ts-0.011
drivers-0.010
Tokenpillar
Token ID41643
Feature activation-1.21e-02

Pos logit contributions
 doorway+0.031
field+0.030
 comet+0.030
 doorstep+0.029
chair+0.029
Neg logit contributions
 harmful-0.070
 charity-0.064
 wasteful-0.063
Or-0.063
 spies-0.063
Token!
Token ID0
Feature activation+1.24e-03

Pos logit contributions
 doorway+0.019
 doorstep+0.019
 comet+0.019
field+0.019
 couldn+0.018
Neg logit contributions
 harmful-0.021
Or-0.018
 spies-0.018
 charity-0.018
 wasteful-0.018
Token He
Token ID679
Feature activation+3.14e-04

Pos logit contributions
 harmful+0.002
 wasteful+0.002
 charity+0.002
Or+0.002
 spies+0.002
Neg logit contributions
 doorway-0.002
 doorstep-0.002
field-0.002
 comet-0.002
 couldn-0.002
Token said
Token ID531
Feature activation+1.32e-03

Pos logit contributions
 harmful+0.001
Or+0.001
 wasteful+0.001
 spies+0.001
 charity+0.001
Neg logit contributions
 doorway-0.000
 doorstep-0.000
 comet-0.000
field-0.000
chair-0.000
Token hello
Token ID23748
Feature activation+1.64e-04

Pos logit contributions
 harmful+0.003
 charity+0.003
 wasteful+0.003
Or+0.003
drivers+0.003
Neg logit contributions
 couldn-0.001
 comet-0.001
 replied-0.001
 Cloud-0.001
 awoke-0.001
Token to
Token ID284
Feature activation-5.59e-03

Pos logit contributions
 harmful+0.000
 wasteful+0.000
 charity+0.000
drivers+0.000
Or+0.000
Neg logit contributions
 doorway-0.000
 doorstep-0.000
field-0.000
 comet-0.000
 couldn-0.000
Token each
Token ID1123
Feature activation-7.90e-03

Pos logit contributions
 couldn+0.009
 comet+0.009
 doorway+0.009
 night+0.009
 doorstep+0.008
Neg logit contributions
 harmful-0.010
 wasteful-0.009
drivers-0.009
Or-0.008
 spies-0.008
Token one
Token ID530
Feature activation+3.64e-03

Pos logit contributions
 doorway+0.013
field+0.013
 doorstep+0.013
 comet+0.013
 couldn+0.012
Neg logit contributions
 harmful-0.014
 charity-0.012
 wasteful-0.012
Or-0.012
 spies-0.011
Token.
Token ID13
Feature activation-3.11e-03

Pos logit contributions
 harmful+0.005
Or+0.004
 wasteful+0.004
 charity+0.004
smart+0.004
Neg logit contributions
 doorway-0.007
 doorstep-0.007
field-0.006
 comet-0.006
 couldn-0.006
Token They
Token ID1119
Feature activation-7.73e-03

Pos logit contributions
 doorway+0.005
field+0.005
 comet+0.005
 doorstep+0.005
 couldn+0.005
Neg logit contributions
 harmful-0.006
 charity-0.005
 wasteful-0.005
Or-0.005
 spies-0.005
Token all
Token ID477
Feature activation-2.03e-02

Pos logit contributions
 doorway+0.012
 doorstep+0.012
chair+0.011
 comet+0.011
field+0.011
Neg logit contributions
 harmful-0.014
Or-0.013
 wasteful-0.012
 spies-0.012
 charity-0.012
Token pretty
Token ID2495
Feature activation-7.29e-03

Pos logit contributions
 harmful+0.005
 wasteful+0.004
 charity+0.004
 spies+0.004
Or+0.004
Neg logit contributions
field-0.006
 doorstep-0.006
 doorway-0.005
 comet-0.005
chair-0.005
Token.
Token ID13
Feature activation+5.63e-03

Pos logit contributions
 doorway+0.013
 doorstep+0.013
field+0.013
 couldn+0.012
 comet+0.012
Neg logit contributions
 harmful-0.011
Or-0.010
 spies-0.010
 charity-0.010
 wasteful-0.009
Token Lily
Token ID20037
Feature activation-8.74e-03

Pos logit contributions
 harmful+0.010
 wasteful+0.009
 charity+0.009
Or+0.009
 spies+0.009
Neg logit contributions
 doorway-0.009
field-0.009
 comet-0.009
 doorstep-0.009
 couldn-0.008
Token felt
Token ID2936
Feature activation-1.18e-02

Pos logit contributions
 doorway+0.015
 doorstep+0.015
field+0.014
 comet+0.014
ous+0.013
Neg logit contributions
 harmful-0.014
Or-0.014
 wasteful-0.012
Like-0.012
 greedy-0.012
Token proud
Token ID6613
Feature activation-1.62e-03

Pos logit contributions
 comet+0.020
 couldn+0.020
 doorway+0.020
field+0.019
 night+0.019
Neg logit contributions
 harmful-0.019
drivers-0.017
 charity-0.017
Or-0.016
 wasteful-0.016
Token and
Token ID290
Feature activation-9.75e-03

Pos logit contributions
field+0.003
 doorstep+0.003
 doorway+0.003
 comet+0.002
 awoke+0.002
Neg logit contributions
 harmful-0.003
 spies-0.003
 charity-0.002
 wasteful-0.002
smart-0.002
Token happy
Token ID3772
Feature activation+5.77e-03

Pos logit contributions
 doorstep+0.020
field+0.019
 doorway+0.019
 comet+0.019
chair+0.018
Neg logit contributions
 harmful-0.014
 wasteful-0.012
Or-0.011
 spies-0.011
 dishonest-0.011
Token.
Token ID13
Feature activation+3.85e-03

Pos logit contributions
 harmful+0.009
 spies+0.008
 charity+0.008
Or+0.008
 wasteful+0.008
Neg logit contributions
field-0.010
 doorway-0.010
 doorstep-0.010
 comet-0.010
chair-0.009
Token She
Token ID1375
Feature activation-5.43e-03

Pos logit contributions
 harmful+0.006
 charity+0.005
 wasteful+0.005
Or+0.005
 spies+0.005
Neg logit contributions
 doorstep-0.007
 doorway-0.007
field-0.007
 comet-0.007
 couldn-0.007
Token told
Token ID1297
Feature activation-3.25e-03

Pos logit contributions
 doorway+0.010
field+0.009
 doorstep+0.009
 comet+0.009
chair+0.009
Neg logit contributions
 harmful-0.008
Or-0.008
 spies-0.007
 greedy-0.007
 wasteful-0.007
Token them
Token ID606
Feature activation-2.66e-03

Pos logit contributions
 comet+0.006
 doorway+0.006
 doorstep+0.005
 couldn+0.005
 night+0.005
Neg logit contributions
 harmful-0.005
Or-0.005
 wasteful-0.005
drivers-0.005
 charity-0.004
Token her
Token ID607
Feature activation-1.17e-02

Pos logit contributions
 harmful+0.027
Or+0.025
drivers+0.024
 wasteful+0.024
 charity+0.023
Neg logit contributions
 night-0.025
 doorway-0.024
 couldn-0.024
 doorstep-0.024
 comet-0.023
Token.
Token ID13
Feature activation-1.11e-02

Pos logit contributions
 doorway+0.019
field+0.019
 comet+0.018
 doorstep+0.017
chair+0.017
Neg logit contributions
 harmful-0.021
 charity-0.018
 spies-0.017
Or-0.017
drivers-0.017
Token She
Token ID1375
Feature activation+3.72e-04

Pos logit contributions
 comet+0.018
field+0.017
 doorway+0.017
 doorstep+0.017
chair+0.016
Neg logit contributions
 harmful-0.020
 charity-0.017
 wasteful-0.017
Or-0.017
 spies-0.017
Token was
Token ID373
Feature activation-1.64e-02

Pos logit contributions
 harmful+0.001
Or+0.001
 wasteful+0.001
 dishonest+0.001
ts+0.001
Neg logit contributions
 comet-0.001
 doorway-0.001
field-0.001
 doorstep-0.001
 sunrise-0.001
Token so
Token ID523
Feature activation-1.86e-03

Pos logit contributions
 comet+0.029
field+0.029
 doorstep+0.029
 doorway+0.028
chair+0.027
Neg logit contributions
 harmful-0.027
 wasteful-0.023
 spies-0.022
Or-0.022
 dishonest-0.021
Token determined
Token ID5295
Feature activation-4.08e-03

Pos logit contributions
 couldn+0.003
 doorway+0.003
field+0.002
 comet+0.002
 doorstep+0.002
Neg logit contributions
 harmful-0.004
 charity-0.003
drivers-0.003
 spies-0.003
Or-0.003
Token to
Token ID284
Feature activation-1.79e-02

Pos logit contributions
 comet+0.005
 doorway+0.005
field+0.005
 doorstep+0.005
chair+0.005
Neg logit contributions
 harmful-0.009
 wasteful-0.007
Or-0.007
 charity-0.007
 spies-0.007
Token get
Token ID651
Feature activation+1.31e-03

Pos logit contributions
 doorway+0.034
field+0.033
 comet+0.033
chair+0.032
 doorstep+0.032
Neg logit contributions
 harmful-0.026
 charity-0.021
 spies-0.021
Or-0.021
 wasteful-0.021
Token it
Token ID340
Feature activation+7.97e-03

Pos logit contributions
 harmful+0.003
 charity+0.003
 wasteful+0.003
Or+0.003
drivers+0.003
Neg logit contributions
 doorway-0.001
field-0.001
 doorstep-0.001
 comet-0.001
 couldn-0.001
Token,
Token ID11
Feature activation-2.76e-02

Pos logit contributions
 harmful+0.018
 wasteful+0.017
 charity+0.016
Or+0.016
drivers+0.016
Neg logit contributions
 doorway-0.009
 couldn-0.009
 doorstep-0.009
 night-0.009
field-0.009
Token she
Token ID673
Feature activation+7.15e-03

Pos logit contributions
 doorway+0.043
field+0.042
 doorstep+0.042
 comet+0.042
 couldn+0.041
Neg logit contributions
 harmful-0.050
 wasteful-0.043
 charity-0.043
Or-0.043
 spies-0.042
Token toilet
Token ID16146
Feature activation-1.11e-02

Pos logit contributions
 couldn+0.013
night+0.012
field+0.012
 doorstep+0.011
 awoke+0.011
Neg logit contributions
 harmful-0.019
 charity-0.017
 spies-0.017
 wasteful-0.016
Or-0.016
Token,
Token ID11
Feature activation-2.92e-02

Pos logit contributions
 comet+0.016
 doorway+0.016
 doorstep+0.016
field+0.016
 couldn+0.016
Neg logit contributions
 harmful-0.020
 charity-0.018
Or-0.018
 wasteful-0.018
drivers-0.018
Token feeling
Token ID4203
Feature activation-9.22e-03

Pos logit contributions
 comet+0.045
 doorstep+0.044
 doorway+0.043
field+0.043
chair+0.043
Neg logit contributions
 harmful-0.053
Or-0.047
 wasteful-0.046
 spies-0.046
 charity-0.045
Token very
Token ID845
Feature activation-1.88e-03

Pos logit contributions
 doorway+0.013
 couldn+0.013
 comet+0.012
 night+0.012
field+0.012
Neg logit contributions
 harmful-0.017
 charity-0.016
drivers-0.016
Or-0.015
 spies-0.015
Token silly
Token ID14397
Feature activation-5.61e-03

Pos logit contributions
 couldn+0.003
 doorway+0.003
field+0.003
 comet+0.003
 night+0.003
Neg logit contributions
 harmful-0.003
 charity-0.003
drivers-0.003
Or-0.003
 spies-0.003
Token.
Token ID13
Feature activation-4.71e-03

Pos logit contributions
 doorway+0.008
field+0.008
 doorstep+0.008
 comet+0.008
 couldn+0.008
Neg logit contributions
 harmful-0.011
 wasteful-0.009
 charity-0.009
Or-0.009
 spies-0.009
Token She
Token ID1375
Feature activation-1.03e-02

Pos logit contributions
 doorway+0.007
 comet+0.007
field+0.007
 doorstep+0.007
 couldn+0.006
Neg logit contributions
 harmful-0.009
 wasteful-0.008
 charity-0.008
Or-0.008
 spies-0.008
Token said
Token ID531
Feature activation-2.88e-03

Pos logit contributions
 doorway+0.015
 doorstep+0.015
 comet+0.014
field+0.014
 sunrise+0.013
Neg logit contributions
 harmful-0.019
Or-0.018
 wasteful-0.017
 dishonest-0.016
 spies-0.016
Token,
Token ID11
Feature activation-1.11e-02

Pos logit contributions
 couldn+0.004
 comet+0.003
 doorway+0.003
field+0.003
 doorstep+0.003
Neg logit contributions
 harmful-0.006
 charity-0.006
 wasteful-0.005
Or-0.005
drivers-0.005
Token "
Token ID366
Feature activation+8.46e-03

Pos logit contributions
 doorstep+0.018
 doorway+0.018
 comet+0.018
field+0.018
chair+0.017
Neg logit contributions
 harmful-0.019
Or-0.017
 charity-0.016
 spies-0.016
 wasteful-0.016
TokenI
Token ID40
Feature activation-8.54e-03

Pos logit contributions
 harmful+0.014
 wasteful+0.012
 charity+0.012
drivers+0.012
 spies+0.012
Neg logit contributions
 doorway-0.014
field-0.014
 comet-0.014
 doorstep-0.014
 couldn-0.014
Tokenmy
Token ID1820
Feature activation-6.95e-03

Pos logit contributions
 comet+0.001
field+0.001
 doorway+0.001
 doorstep+0.001
chair+0.001
Neg logit contributions
 harmful-0.001
 wasteful-0.001
drivers-0.001
Or-0.001
 dishonest-0.001
Token was
Token ID373
Feature activation-4.02e-03

Pos logit contributions
 doorway+0.010
 comet+0.010
field+0.010
 doorstep+0.009
 sunrise+0.009
Neg logit contributions
 harmful-0.014
Or-0.012
 wasteful-0.012
drivers-0.011
 spies-0.011
Token sad
Token ID6507
Feature activation+2.69e-03

Pos logit contributions
 doorway+0.007
 couldn+0.006
field+0.006
 doorstep+0.006
 comet+0.006
Neg logit contributions
 harmful-0.007
 charity-0.006
Or-0.006
drivers-0.006
 wasteful-0.006
Token that
Token ID326
Feature activation-7.24e-03

Pos logit contributions
 harmful+0.005
 charity+0.005
 wasteful+0.005
Or+0.005
drivers+0.005
Neg logit contributions
 doorway-0.004
field-0.004
 doorstep-0.004
 comet-0.003
 couldn-0.003
Token he
Token ID339
Feature activation+1.92e-02

Pos logit contributions
 couldn+0.012
 comet+0.012
 doorway+0.012
 night+0.012
field+0.011
Neg logit contributions
 harmful-0.013
Or-0.011
drivers-0.011
 wasteful-0.011
 charity-0.011
Token couldn
Token ID3521
Feature activation-4.19e-02

Pos logit contributions
 harmful+0.039
 wasteful+0.032
 dishonest+0.031
 greedy+0.030
Or+0.030
Neg logit contributions
 doorway-0.031
 comet-0.030
field-0.030
 sunrise-0.028
 doorstep-0.027
Token't
Token ID470
Feature activation+2.82e-03

Pos logit contributions
 doorway+0.053
 comet+0.052
field+0.051
 doorstep+0.050
 couldn+0.047
Neg logit contributions
 harmful-0.086
drivers-0.077
 wasteful-0.077
 charity-0.076
Or-0.076
Token step
Token ID2239
Feature activation-1.53e-02

Pos logit contributions
 harmful+0.005
 wasteful+0.004
 charity+0.004
Or+0.004
drivers+0.004
Neg logit contributions
 doorstep-0.004
 comet-0.004
field-0.004
 doorway-0.004
 couldn-0.004
Token on
Token ID319
Feature activation-2.28e-03

Pos logit contributions
 comet+0.014
 couldn+0.014
 night+0.014
chair+0.013
 doorway+0.013
Neg logit contributions
 harmful-0.036
 charity-0.034
Or-0.034
 wasteful-0.033
drivers-0.033
Token the
Token ID262
Feature activation-1.14e-02

Pos logit contributions
 comet+0.003
 night+0.003
 doorway+0.002
 couldn+0.002
 doorstep+0.002
Neg logit contributions
 harmful-0.005
 wasteful-0.005
Or-0.005
drivers-0.005
 charity-0.005
Token bug
Token ID5434
Feature activation-7.15e-03

Pos logit contributions
field+0.017
 doorway+0.017
 doorstep+0.016
 awoke+0.016
 couldn+0.016
Neg logit contributions
 harmful-0.022
 wasteful-0.019
 charity-0.019
 spies-0.019
Or-0.018
Token.
Token ID13
Feature activation+6.19e-03

Pos logit contributions
field+0.010
 doorway+0.010
 doorstep+0.010
 comet+0.010
 couldn+0.009
Neg logit contributions
 harmful-0.010
 spies-0.009
 charity-0.009
Or-0.009
drivers-0.009
Token He
Token ID679
Feature activation+1.27e-03

Pos logit contributions
 harmful+0.008
 wasteful+0.007
drivers+0.006
 spies+0.006
 charity+0.006
Neg logit contributions
 comet-0.013
field-0.013
 doorway-0.013
 doorstep-0.012
 couldn-0.012
Token had
Token ID550
Feature activation-6.45e-03

Pos logit contributions
 harmful+0.002
Or+0.002
 spies+0.002
 wasteful+0.002
drivers+0.002
Neg logit contributions
 doorway-0.002
field-0.002
 doorstep-0.002
 comet-0.002
 sunrise-0.002
Token to
Token ID284
Feature activation+1.70e-02

Pos logit contributions
field+0.010
 comet+0.010
 doorstep+0.010
 doorway+0.009
chair+0.009
Neg logit contributions
 harmful-0.012
 spies-0.011
Or-0.010
 wasteful-0.010
 charity-0.010
Token z
Token ID1976
Feature activation-6.65e-03

Pos logit contributions
 harmful+0.033
drivers+0.030
 wasteful+0.030
Or+0.030
 dishonest+0.028
Neg logit contributions
 doorstep-0.024
 comet-0.024
 doorway-0.023
field-0.023
 night-0.022
Tokenig
Token ID328
Feature activation-3.93e-02

Pos logit contributions
 doorway+0.010
field+0.010
 doorstep+0.010
 comet+0.010
 couldn+0.010
Neg logit contributions
 harmful-0.012
 wasteful-0.011
 charity-0.011
 spies-0.011
Or-0.010
Tokenzag
Token ID50183
Feature activation+3.55e-03

Pos logit contributions
 doorway+0.044
field+0.043
 comet+0.043
 doorstep+0.043
 couldn+0.041
Neg logit contributions
 harmful-0.087
 wasteful-0.078
 charity-0.078
Or-0.077
 spies-0.077
Token around
Token ID1088
Feature activation-1.52e-02

Pos logit contributions
 harmful+0.007
 charity+0.006
 wasteful+0.006
 spies+0.006
Or+0.006
Neg logit contributions
 doorway-0.005
 couldn-0.005
field-0.005
 comet-0.005
 doorstep-0.005
Token and
Token ID290
Feature activation-1.03e-02

Pos logit contributions
 doorway+0.022
field+0.022
 doorstep+0.022
 comet+0.021
 couldn+0.020
Neg logit contributions
 harmful-0.029
 charity-0.025
 wasteful-0.025
 spies-0.025
Or-0.025
Token make
Token ID787
Feature activation-3.08e-03

Pos logit contributions
 doorway+0.016
field+0.016
 comet+0.016
 doorstep+0.016
chair+0.014
Neg logit contributions
 harmful-0.018
 wasteful-0.016
Or-0.016
 spies-0.016
 charity-0.016
Token big
Token ID1263
Feature activation+7.75e-04

Pos logit contributions
 doorway+0.005
field+0.005
 doorstep+0.005
 comet+0.005
 couldn+0.005
Neg logit contributions
 harmful-0.005
 wasteful-0.005
 charity-0.005
Or-0.005
 spies-0.005
Token.
Token ID13
Feature activation-8.40e-03

Pos logit contributions
 harmful+0.002
 wasteful+0.002
Or+0.002
drivers+0.002
 charity+0.002
Neg logit contributions
 doorway-0.002
 comet-0.002
 couldn-0.002
 doorstep-0.002
field-0.002
Token
Token ID220
Feature activation-5.72e-03

Pos logit contributions
 comet+0.014
 doorstep+0.013
field+0.013
 doorway+0.013
night+0.012
Neg logit contributions
 harmful-0.014
 charity-0.013
Or-0.013
drivers-0.013
 wasteful-0.012
Token
Token ID198
Feature activation-9.41e-03

Pos logit contributions
 doorway+0.006
field+0.006
 doorstep+0.006
 comet+0.006
 couldn+0.006
Neg logit contributions
 harmful-0.013
 wasteful-0.012
 charity-0.012
Or-0.011
 spies-0.011
Token
Token ID198
Feature activation+1.06e-02

Pos logit contributions
 doorway+0.012
 comet+0.012
 doorstep+0.012
field+0.012
 couldn+0.011
Neg logit contributions
 harmful-0.019
 charity-0.017
Or-0.017
 wasteful-0.017
drivers-0.017
TokenL
Token ID43
Feature activation-7.42e-03

Pos logit contributions
 harmful+0.017
 wasteful+0.014
 charity+0.014
 spies+0.014
Or+0.014
Neg logit contributions
 doorway-0.019
field-0.019
 doorstep-0.019
 comet-0.018
 couldn-0.018
Tokenola
Token ID5708
Feature activation-3.96e-02

Pos logit contributions
field+0.011
 doorway+0.011
 doorstep+0.011
 comet+0.011
 couldn+0.011
Neg logit contributions
 harmful-0.013
 wasteful-0.012
 charity-0.012
Or-0.012
 spies-0.012
Token started
Token ID2067
Feature activation-1.67e-02

Pos logit contributions
 doorway+0.056
 comet+0.054
 doorstep+0.053
field+0.051
chair+0.050
Neg logit contributions
 harmful-0.071
Or-0.069
 wasteful-0.062
drivers-0.060
 dishonest-0.060
Token to
Token ID284
Feature activation+1.52e-04

Pos logit contributions
field+0.021
 doorstep+0.021
 comet+0.021
 doorway+0.020
chair+0.020
Neg logit contributions
 harmful-0.035
 wasteful-0.030
 spies-0.030
Or-0.030
 charity-0.030
Token record
Token ID1700
Feature activation-2.16e-03

Pos logit contributions
 harmful+0.000
 wasteful+0.000
 charity+0.000
Or+0.000
 spies+0.000
Neg logit contributions
 doorway-0.000
field-0.000
 doorstep-0.000
 comet-0.000
 couldn-0.000
Token the
Token ID262
Feature activation+1.96e-03

Pos logit contributions
 doorway+0.003
 comet+0.003
 couldn+0.003
 doorstep+0.003
 night+0.003
Neg logit contributions
 harmful-0.004
 wasteful-0.004
Or-0.003
 charity-0.003
drivers-0.003
Token pig
Token ID12967
Feature activation-6.41e-03

Pos logit contributions
 harmful+0.004
 wasteful+0.003
 charity+0.003
Or+0.003
 spies+0.003
Neg logit contributions
 doorway-0.003
field-0.003
 doorstep-0.003
 comet-0.003
 couldn-0.003
Token and
Token ID290
Feature activation+1.75e-02

Pos logit contributions
 doorway+0.014
 couldn+0.014
field+0.014
 doorstep+0.014
 comet+0.013
Neg logit contributions
 harmful-0.035
 wasteful-0.031
 charity-0.031
 spies-0.031
Or-0.031
Token tried
Token ID3088
Feature activation-9.12e-03

Pos logit contributions
 harmful+0.041
 charity+0.039
 spies+0.038
 wasteful+0.037
Or+0.037
Neg logit contributions
 couldn-0.021
 night-0.016
 doorway-0.016
 replied-0.015
field-0.015
Token,
Token ID11
Feature activation-1.01e-02

Pos logit contributions
 couldn+0.009
 doorway+0.009
 doorstep+0.009
 night+0.009
field+0.008
Neg logit contributions
 harmful-0.021
 wasteful-0.020
ief-0.019
cod-0.019
Or-0.019
Token but
Token ID475
Feature activation-3.62e-03

Pos logit contributions
 couldn+0.013
 doorway+0.011
field+0.011
 night+0.011
 doorstep+0.010
Neg logit contributions
 harmful-0.021
 charity-0.020
 spies-0.020
 wasteful-0.019
 disl-0.018
Token she
Token ID673
Feature activation+1.26e-02

Pos logit contributions
 doorway+0.005
field+0.005
 doorstep+0.005
 comet+0.005
 couldn+0.005
Neg logit contributions
 harmful-0.007
 wasteful-0.006
 charity-0.006
Or-0.006
 spies-0.006
Token couldn
Token ID3521
Feature activation-4.79e-02

Pos logit contributions
 harmful+0.029
 wasteful+0.026
drivers+0.026
Or+0.026
 dishonest+0.025
Neg logit contributions
 doorway-0.014
 comet-0.013
 doorstep-0.012
 sunrise-0.012
field-0.012
Token't
Token ID470
Feature activation-1.06e-02

Pos logit contributions
 doorway+0.063
field+0.063
 comet+0.061
 doorstep+0.061
night+0.057
Neg logit contributions
 harmful-0.094
drivers-0.085
 wasteful-0.083
Or-0.083
 charity-0.083
Token do
Token ID466
Feature activation-2.02e-02

Pos logit contributions
 doorstep+0.016
field+0.016
 doorway+0.016
 comet+0.016
 couldn+0.016
Neg logit contributions
 harmful-0.019
 wasteful-0.017
 charity-0.017
Or-0.017
 spies-0.017
Token it
Token ID340
Feature activation+1.21e-02

Pos logit contributions
 couldn+0.017
 doorway+0.016
 doorstep+0.016
 comet+0.016
field+0.015
Neg logit contributions
 harmful-0.051
 wasteful-0.046
Or-0.046
 charity-0.046
 spies-0.045
Token.
Token ID13
Feature activation-7.83e-03

Pos logit contributions
 harmful+0.029
Or+0.028
cod+0.027
 disl+0.027
 wasteful+0.027
Neg logit contributions
 couldn-0.013
 night-0.012
 doorstep-0.010
chair-0.010
ned-0.009
Token
Token ID220
Feature activation-1.09e-02

Pos logit contributions
 doorway+0.009
field+0.009
 comet+0.009
 doorstep+0.009
 couldn+0.008
Neg logit contributions
 harmful-0.017
 wasteful-0.015
 charity-0.015
Or-0.015
 spies-0.015
Token office
Token ID2607
Feature activation+9.51e-03

Pos logit contributions
 doorway+0.006
 doorstep+0.006
field+0.006
 comet+0.006
 couldn+0.006
Neg logit contributions
 harmful-0.007
 wasteful-0.006
 charity-0.006
Or-0.006
drivers-0.006
Token closed
Token ID4838
Feature activation-2.47e-02

Pos logit contributions
 harmful+0.019
 charity+0.017
 wasteful+0.017
Or+0.017
 spies+0.016
Neg logit contributions
 doorway-0.013
 doorstep-0.013
field-0.013
 comet-0.012
 couldn-0.012
Token?"
Token ID1701
Feature activation-8.73e-03

Pos logit contributions
 doorway+0.034
 doorstep+0.034
field+0.034
 comet+0.033
 couldn+0.032
Neg logit contributions
 harmful-0.048
 wasteful-0.042
 charity-0.042
Or-0.041
 spies-0.041
Token she
Token ID673
Feature activation-1.09e-02

Pos logit contributions
 doorway+0.012
 doorstep+0.012
field+0.012
 comet+0.012
 couldn+0.011
Neg logit contributions
 harmful-0.017
 wasteful-0.015
 charity-0.015
Or-0.015
 spies-0.015
Token asked
Token ID1965
Feature activation-9.21e-05

Pos logit contributions
field+0.016
 doorway+0.016
 doorstep+0.016
 comet+0.016
chair+0.015
Neg logit contributions
 harmful-0.020
 wasteful-0.017
 spies-0.017
Or-0.017
 dishonest-0.017
Token.
Token ID13
Feature activation-3.78e-02

Pos logit contributions
 doorway+0.000
field+0.000
 doorstep+0.000
 comet+0.000
 couldn+0.000
Neg logit contributions
 harmful-0.000
 wasteful-0.000
 charity-0.000
Or-0.000
 spies-0.000
Token
Token ID220
Feature activation-1.73e-02

Pos logit contributions
 comet+0.054
 doorstep+0.053
field+0.053
 doorway+0.052
chair+0.050
Neg logit contributions
 harmful-0.067
 charity-0.063
Or-0.063
 spies-0.061
drivers-0.059
Token
Token ID198
Feature activation-1.96e-02

Pos logit contributions
 doorway+0.021
field+0.020
 doorstep+0.020
 comet+0.020
 couldn+0.019
Neg logit contributions
 harmful-0.037
 wasteful-0.033
 charity-0.033
Or-0.033
 spies-0.033
Token
Token ID198
Feature activation-4.98e-03

Pos logit contributions
 doorway+0.026
field+0.026
 comet+0.026
 doorstep+0.026
 couldn+0.025
Neg logit contributions
 harmful-0.038
 charity-0.035
Or-0.034
 wasteful-0.033
 spies-0.033
Token"
Token ID1
Feature activation+5.39e-03

Pos logit contributions
 doorway+0.009
 doorstep+0.009
field+0.009
 comet+0.009
 couldn+0.009
Neg logit contributions
 harmful-0.008
 wasteful-0.007
drivers-0.006
 charity-0.006
 spies-0.006
TokenBecause
Token ID8128
Feature activation-2.76e-02

Pos logit contributions
 harmful+0.010
 wasteful+0.009
 charity+0.009
 spies+0.009
drivers+0.009
Neg logit contributions
field-0.008
 comet-0.008
 doorway-0.008
 couldn-0.008
 doorstep-0.008