TOP ACTIVATIONS
MAX = 0.082

But as soon as she bit into it , she felt
as soon as Twe ety bit the brick , it hurt
favorite bugs because they always bit him and made him it
stars are like lots of little planets , each one is
made it into lots of little pieces . Now everyone could
. As soon as he bit it , the nut cracked
dog was not friendly and bit Tim my 's hand .
didn 't like it and bit the boy . Lily learned
, but it was all fore shadow ed by the kind
very organized with lots of little draw ers . She was
known as the whale who healed . Animals from all over
. I can 't be bit en . But if you
. There were lots of little birds and flowers that Todd
day and wanted something little bit different . So she asked
want to go empty - handed , so she sque aked
now I 'm feeling a bit chilly ." The
fast as he could and bit the fish . The ot
, taking out lots of little pieces of wood .
sense of accomplishment for having healed the bird and giving it
man who lived in a little house . Every day ,

MIN ACTIVATIONS
MIN = -0.111

would sit by the books he lf and read together every
tried to climb the books he lf to get the book
puppy jumped on the books he lf and knocked all the
book falls off the books he lf . It makes a
showed her friend the books he lf . From then on
Can I touch the books he lf too ?" His mom
chair to climb the books he lf . She pushes the
They go to the books he lf and start to climb
felt his tum my ac he . He felt his head
about her tum my ac he . Her mom my said
. She moved the books he lf a little and got
she always kept her books he lf tidy and never did
books back on the books he lf , smiling proudly .
toy and poking the books he lf . She was having
can 't reach the books he lf . Can you reach
felt his tum my ac he starting to feel a bit
it back on the books he lf so we can find
books back on the books he lf , but this time
down the slide - bat he feeling the water on her
They ran to the books he lf and put the box

INTERVAL 0.034 - 0.082
CONTAINS 1.223%

it wasn 't safe . So he put it away and
Tim my promised to only shoot at pretend bad guys and
before they could board their plane . Lily was bored and
ummy treats ! Cl ara gently touched the door
others made her happy and helped her make new friends .

INTERVAL -0.014 - 0.034
CONTAINS 84.308%

toy . And be careful with it . It can break
way . As Sam j ogged , she had
so happy and gave his mom my a big hug .
some . Jill and Jack agreed that it was much better
! Emma was so happy and wanted to take the stone

INTERVAL -0.063 - -0.014
CONTAINS 14.444%

L ily said , " You are welcome ,
a time there was a purple flower . It was in
good all year and hoped for some nice toys .
. L ily thought for a moment . She
each other and Sue put on her sunglasses . " Thank

INTERVAL -0.111 - -0.063
CONTAINS 0.026%

unw ra pped the l oll ipop and l icked it
closer . She saw the spinning toy on the billboard .
They grabbed the vines and started to climb .
family with a rare books he lf in their home .
a bad tum my ac he and felt very sick .
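
NOTE ON THE INTERVALS
The four interval boundaries above are consistent with splitting the range from MIN (-0.111) to MAX (0.082) into four equal-width bins, and the CONTAINS percentages sum to roughly 100%. The sketch below reproduces that binning. The equal-width rule is inferred from the printed boundaries and is not stated by the source; the function names `interval_edges` and `interval_shares` are hypothetical and are not part of any dashboard tooling.

```python
# Illustrative only: equal-width binning inferred from the printed interval edges.
# Not the dashboard's actual code; all names here are hypothetical.

def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    # Append hi explicitly so the last edge is exact despite float rounding.
    return [lo + i * width for i in range(n_bins)] + [hi]

def interval_shares(values, edges):
    """Percentage of values in each bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = len(values)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]

# MIN and MAX as printed in this dump (rounded to three decimals).
edges = interval_edges(-0.111, 0.082)
```

Because MIN and MAX are printed to three decimals, the recomputed edges agree with the listed ones only to within about 0.001.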
Token But
Token ID887
Feature activation+1.19e-02

Pos logit contributions
 neither+0.017
charged+0.014
 Twe+0.014
idents+0.014
 worry+0.014
Neg logit contributions
book-0.013
te-0.013
 calendar-0.013
 fant-0.013
 around-0.012
Token as
Token ID355
Feature activation+2.48e-02

Pos logit contributions
 neither+0.015
 Instead+0.015
 So+0.015
 Twe+0.015
idents+0.013
Neg logit contributions
te-0.020
bridge-0.019
THE-0.018
 calendar-0.018
 fant-0.018
Token soon
Token ID2582
Feature activation+5.98e-03

Pos logit contributions
 Instead+0.039
 So+0.039
idents+0.037
 neither+0.036
ucker+0.036
Neg logit contributions
te-0.035
 fant-0.032
THE-0.031
book-0.031
bridge-0.030
Token as
Token ID355
Feature activation+3.18e-02

Pos logit contributions
 neither+0.007
 Instead+0.007
 So+0.006
 Twe+0.006
idents+0.006
Neg logit contributions
te-0.011
bridge-0.011
THE-0.010
 fant-0.010
book-0.010
Token she
Token ID673
Feature activation+3.95e-03

Pos logit contributions
 So+0.050
 Instead+0.049
idents+0.048
 neither+0.047
 Ow+0.045
Neg logit contributions
te-0.044
 fant-0.042
THE-0.041
bridge-0.040
 throne-0.039
Token bit
Token ID1643
Feature activation+8.24e-02

Pos logit contributions
 neither+0.006
 Instead+0.006
idents+0.006
 Twe+0.006
 So+0.005
Neg logit contributions
te-0.006
bridge-0.006
THE-0.005
 fant-0.005
book-0.005
Token into
Token ID656
Feature activation+2.92e-03

Pos logit contributions
 So+0.150
 Instead+0.146
idents+0.145
 neither+0.140
mm+0.140
Neg logit contributions
 around-0.089
te-0.081
 downstairs-0.078
 over-0.077
 forward-0.075
Token it
Token ID340
Feature activation+1.83e-02

Pos logit contributions
idents+0.005
 So+0.005
ety+0.005
 Instead+0.005
 neither+0.005
Neg logit contributions
te-0.003
 curtain-0.003
 corner-0.003
 calendar-0.003
 fant-0.003
Token,
Token ID11
Feature activation+1.33e-02

Pos logit contributions
 neither+0.028
 Twe+0.027
 Instead+0.027
 So+0.026
idents+0.026
Neg logit contributions
te-0.025
bridge-0.024
 fant-0.023
 curtain-0.023
 around-0.023
Token she
Token ID673
Feature activation-8.08e-03

Pos logit contributions
 So+0.020
 Instead+0.020
ucker+0.020
eter+0.019
 neither+0.019
Neg logit contributions
bridge-0.018
te-0.018
 fant-0.018
 calendar-0.018
 curtain-0.017
Token felt
Token ID2936
Feature activation-3.16e-03

Pos logit contributions
te+0.012
bridge+0.011
 throne+0.011
THE+0.011
 fant+0.010
Neg logit contributions
 Twe-0.012
 neither-0.012
 Instead-0.012
 So-0.012
idents-0.011
Token as
Token ID355
Feature activation+1.85e-02

Pos logit contributions
ucker+0.009
 Instead+0.009
eter+0.008
 So+0.008
idents+0.008
Neg logit contributions
THE-0.011
 around-0.010
te-0.010
 fant-0.010
 inside-0.010
Token soon
Token ID2582
Feature activation+2.67e-03

Pos logit contributions
ucker+0.031
idents+0.030
 So+0.030
 Instead+0.030
uck+0.029
Neg logit contributions
THE-0.024
te-0.023
 fant-0.023
 inside-0.020
 around-0.020
Token as
Token ID355
Feature activation+2.67e-02

Pos logit contributions
 neither+0.003
 Instead+0.003
 So+0.003
idents+0.003
ucker+0.003
Neg logit contributions
te-0.004
THE-0.004
bridge-0.004
 fant-0.004
 tour-0.004
Token Twe
Token ID24205
Feature activation+6.64e-02

Pos logit contributions
ucker+0.043
idents+0.042
 So+0.042
 Instead+0.040
uck+0.039
Neg logit contributions
 fant-0.037
THE-0.036
te-0.034
 amazing-0.033
 tour-0.032
Tokenety
Token ID2963
Feature activation+3.59e-02

Pos logit contributions
 neither+0.093
 Sorry+0.091
 Instead+0.081
 disaster+0.074
idents+0.072
Neg logit contributions
THE-0.097
ines-0.095
 pad-0.095
iet-0.093
----0.087
Token bit
Token ID1643
Feature activation+8.16e-02

Pos logit contributions
 neither+0.061
 Sorry+0.057
 Instead+0.057
idents+0.056
 Twe+0.055
Neg logit contributions
te-0.045
 around-0.041
bridge-0.040
 fant-0.039
book-0.039
Token the
Token ID262
Feature activation+9.55e-03

Pos logit contributions
idents+0.110
mm+0.107
 So+0.105
 Sorry+0.098
Fine+0.097
Neg logit contributions
 around-0.134
 over-0.123
 inside-0.118
 forward-0.112
 across-0.108
Token brick
Token ID17214
Feature activation+1.16e-03

Pos logit contributions
 neither+0.016
 Instead+0.015
 Twe+0.015
 So+0.015
 Mitt+0.014
Neg logit contributions
te-0.013
bridge-0.013
THE-0.012
 fant-0.012
book-0.012
Token,
Token ID11
Feature activation+7.07e-03

Pos logit contributions
 neither+0.002
 Instead+0.002
 Twe+0.002
 So+0.002
 Mitt+0.001
Neg logit contributions
te-0.002
bridge-0.002
 calendar-0.002
THE-0.002
book-0.002
Token it
Token ID340
Feature activation+1.18e-02

Pos logit contributions
ucker+0.015
eter+0.014
 So+0.013
charged+0.013
 Instead+0.013
Neg logit contributions
 fant-0.007
bridge-0.007
THE-0.007
 around-0.007
 inside-0.007
Token hurt
Token ID5938
Feature activation+1.69e-02

Pos logit contributions
 neither+0.016
 Instead+0.015
 Twe+0.015
 So+0.015
 Mitt+0.014
Neg logit contributions
te-0.020
bridge-0.019
THE-0.018
 fant-0.018
 calendar-0.018
Token favorite
Token ID4004
Feature activation-7.42e-03

Pos logit contributions
 neither+0.017
 Instead+0.016
 Twe+0.016
 So+0.016
 Mitt+0.015
Neg logit contributions
te-0.014
bridge-0.014
THE-0.013
 fant-0.013
 calendar-0.013
Token bugs
Token ID11316
Feature activation-1.82e-02

Pos logit contributions
te+0.011
bridge+0.011
 calendar+0.011
THE+0.011
 fant+0.011
Neg logit contributions
 neither-0.011
 Instead-0.010
 Twe-0.010
 So-0.010
 Mitt-0.009
Token because
Token ID780
Feature activation-2.87e-02

Pos logit contributions
te+0.032
bridge+0.032
THE+0.031
 calendar+0.030
 throne+0.030
Neg logit contributions
 neither-0.022
 Twe-0.020
 Instead-0.020
 So-0.020
 Mitt-0.018
Token they
Token ID484
Feature activation+4.94e-03

Pos logit contributions
te+0.053
bridge+0.052
THE+0.050
book+0.049
 fant+0.049
Neg logit contributions
 neither-0.033
 Instead-0.031
 Twe-0.030
 So-0.030
 Mitt-0.028
Token always
Token ID1464
Feature activation+2.48e-02

Pos logit contributions
 neither+0.006
 Instead+0.006
 Twe+0.006
 So+0.006
 Mitt+0.005
Neg logit contributions
te-0.008
bridge-0.008
THE-0.008
 fant-0.008
 calendar-0.008
Token bit
Token ID1643
Feature activation+7.85e-02

Pos logit contributions
 neither+0.038
 So+0.038
 Instead+0.037
 Twe+0.037
 Ow+0.037
Neg logit contributions
te-0.032
 around-0.031
 fant-0.030
 throne-0.029
 curtain-0.028
Token him
Token ID683
Feature activation+2.71e-02

Pos logit contributions
 So+0.123
 Instead+0.120
 neither+0.117
 Twe+0.115
mm+0.114
Neg logit contributions
 around-0.103
ines-0.095
te-0.093
 fant-0.090
 curtain-0.090
Token and
Token ID290
Feature activation+7.41e-03

Pos logit contributions
 neither+0.039
 Twe+0.038
 Instead+0.037
 So+0.035
 Mitt+0.034
Neg logit contributions
te-0.041
THE-0.038
 calendar-0.038
bridge-0.037
 fant-0.037
Token made
Token ID925
Feature activation+1.79e-02

Pos logit contributions
 Instead+0.010
 So+0.010
 neither+0.010
 Twe+0.010
 Ow+0.009
Neg logit contributions
te-0.011
 throne-0.010
 fant-0.010
bridge-0.010
 calendar-0.010
Token him
Token ID683
Feature activation+1.80e-02

Pos logit contributions
 neither+0.029
 Instead+0.027
 So+0.027
ety+0.027
our+0.026
Neg logit contributions
 around-0.023
bridge-0.022
 fant-0.022
te-0.022
 calendar-0.021
Token it
Token ID340
Feature activation+5.36e-03

Pos logit contributions
 neither+0.026
 Instead+0.025
 Twe+0.025
 So+0.025
 Mitt+0.023
Neg logit contributions
te-0.027
bridge-0.026
THE-0.025
 throne-0.025
 fant-0.025
Token stars
Token ID5788
Feature activation-1.53e-02

Pos logit contributions
 neither+0.026
 Instead+0.025
 Twe+0.025
 So+0.025
 Mitt+0.023
Neg logit contributions
te-0.018
bridge-0.017
book-0.017
THE-0.016
 fant-0.016
Token are
Token ID389
Feature activation+1.34e-02

Pos logit contributions
te+0.026
bridge+0.025
THE+0.024
 calendar+0.024
 fant+0.024
Neg logit contributions
 neither-0.020
 Instead-0.019
 Twe-0.019
 So-0.018
 Mitt-0.017
Token like
Token ID588
Feature activation-5.83e-03

Pos logit contributions
 neither+0.019
 Instead+0.018
 Twe+0.018
 So+0.016
 Mitt+0.016
Neg logit contributions
bridge-0.022
te-0.021
book-0.021
THE-0.020
ibrarian-0.019
Token lots
Token ID6041
Feature activation+2.61e-02

Pos logit contributions
te+0.009
bridge+0.009
 fant+0.009
 calendar+0.009
THE+0.008
Neg logit contributions
 neither-0.008
 Instead-0.008
 So-0.008
 Twe-0.007
idents-0.007
Token of
Token ID286
Feature activation+2.48e-02

Pos logit contributions
 neither+0.042
 Instead+0.042
 Twe+0.041
 So+0.041
 Mitt+0.038
Neg logit contributions
te-0.034
bridge-0.033
THE-0.032
book-0.031
 calendar-0.031
Token little
Token ID1310
Feature activation+7.85e-02

Pos logit contributions
 neither+0.039
 Instead+0.039
 Twe+0.039
 So+0.038
 Mitt+0.035
Neg logit contributions
te-0.033
bridge-0.032
THE-0.031
book-0.031
 calendar-0.030
Token planets
Token ID14705
Feature activation+2.34e-03

Pos logit contributions
 neither+0.143
 Instead+0.138
 Twe+0.136
 So+0.134
 Mitt+0.133
Neg logit contributions
te-0.090
bridge-0.088
 fant-0.086
 calendar-0.084
 throne-0.083
Token,
Token ID11
Feature activation+1.14e-02

Pos logit contributions
 neither+0.003
 Instead+0.002
 So+0.002
 Twe+0.002
 you+0.002
Neg logit contributions
te-0.004
bridge-0.004
THE-0.004
book-0.004
 fant-0.004
Token each
Token ID1123
Feature activation+1.06e-02

Pos logit contributions
 neither+0.016
 Instead+0.015
 Twe+0.015
 So+0.015
 Mitt+0.014
Neg logit contributions
te-0.018
bridge-0.018
THE-0.017
book-0.017
 fant-0.017
Token one
Token ID530
Feature activation+2.97e-02

Pos logit contributions
 neither+0.015
 Instead+0.015
 So+0.014
 Twe+0.014
 Mitt+0.013
Neg logit contributions
te-0.016
bridge-0.016
THE-0.015
 fant-0.015
book-0.014
Token is
Token ID318
Feature activation+2.59e-03

Pos logit contributions
 neither+0.041
 Instead+0.040
 So+0.038
 Twe+0.038
 Mitt+0.035
Neg logit contributions
te-0.047
bridge-0.047
THE-0.045
 fant-0.044
book-0.044
Token made
Token ID925
Feature activation+1.29e-02

Pos logit contributions
 neither+0.002
 Instead+0.002
 So+0.002
 Twe+0.002
idents+0.002
Neg logit contributions
te-0.002
bridge-0.002
 throne-0.002
THE-0.002
 carriage-0.002
Token it
Token ID340
Feature activation+3.66e-03

Pos logit contributions
 neither+0.018
 Instead+0.017
 So+0.017
 Twe+0.017
ety+0.016
Neg logit contributions
bridge-0.021
te-0.020
 fant-0.019
 calendar-0.019
 around-0.019
Token into
Token ID656
Feature activation-3.80e-03

Pos logit contributions
 neither+0.006
 Instead+0.006
 Twe+0.006
 So+0.005
 Mitt+0.005
Neg logit contributions
te-0.005
bridge-0.005
THE-0.005
 fant-0.005
 calendar-0.005
Token lots
Token ID6041
Feature activation+3.51e-03

Pos logit contributions
te+0.006
bridge+0.006
 throne+0.005
 fant+0.005
book+0.005
Neg logit contributions
 neither-0.006
 So-0.005
idents-0.005
 Twe-0.005
ety-0.005
Token of
Token ID286
Feature activation+1.99e-02

Pos logit contributions
 neither+0.006
 Instead+0.006
 Twe+0.006
 So+0.006
 Mitt+0.006
Neg logit contributions
te-0.004
bridge-0.004
THE-0.004
 fant-0.004
book-0.004
Token little
Token ID1310
Feature activation+7.83e-02

Pos logit contributions
 neither+0.035
 Instead+0.033
 Twe+0.033
 So+0.032
idents+0.031
Neg logit contributions
bridge-0.025
te-0.025
 fant-0.024
 calendar-0.023
THE-0.023
Token pieces
Token ID5207
Feature activation+3.85e-03

Pos logit contributions
 neither+0.120
 Twe+0.113
 Instead+0.112
 So+0.109
 Mitt+0.107
Neg logit contributions
bridge-0.113
te-0.111
 fant-0.108
THE-0.105
 throne-0.105
Token.
Token ID13
Feature activation+1.01e-02

Pos logit contributions
 neither+0.005
 Instead+0.005
 Twe+0.005
 So+0.005
 Mitt+0.004
Neg logit contributions
te-0.006
bridge-0.006
 fant-0.006
 calendar-0.006
THE-0.006
Token Now
Token ID2735
Feature activation-5.92e-03

Pos logit contributions
 neither+0.016
idents+0.015
ety+0.015
 worry+0.015
 scrape+0.014
Neg logit contributions
te-0.013
bridge-0.013
THE-0.013
book-0.013
idi-0.012
Token everyone
Token ID2506
Feature activation-9.64e-03

Pos logit contributions
te+0.010
bridge+0.009
THE+0.009
 fant+0.009
 throne+0.009
Neg logit contributions
 neither-0.008
 Instead-0.007
 Twe-0.007
 So-0.007
idents-0.007
Token could
Token ID714
Feature activation+2.31e-02

Pos logit contributions
bridge+0.017
te+0.016
book+0.016
THE+0.016
 calendar+0.016
Neg logit contributions
 neither-0.012
 Instead-0.011
 So-0.011
 Twe-0.010
 B-0.009
Token.
Token ID13
Feature activation+1.09e-02

Pos logit contributions
te+0.002
bridge+0.002
THE+0.002
 fant+0.002
book+0.002
Neg logit contributions
 neither-0.001
 Instead-0.001
 So-0.001
 Twe-0.001
idents-0.001
Token As
Token ID1081
Feature activation+2.07e-02

Pos logit contributions
 neither+0.021
ety+0.019
charged+0.018
idents+0.018
 worry+0.018
Neg logit contributions
book-0.011
 around-0.011
te-0.011
idi-0.011
 corner-0.011
Token soon
Token ID2582
Feature activation-5.18e-03

Pos logit contributions
 Instead+0.029
 neither+0.029
 So+0.028
idents+0.027
 Twe+0.026
Neg logit contributions
te-0.032
THE-0.031
 fant-0.031
bridge-0.031
 throne-0.030
Token as
Token ID355
Feature activation+2.43e-02

Pos logit contributions
te+0.009
bridge+0.009
THE+0.009
 fant+0.009
book+0.009
Neg logit contributions
 neither-0.006
 Instead-0.006
 So-0.005
 Twe-0.005
idents-0.005
Token he
Token ID339
Feature activation+1.86e-02

Pos logit contributions
 So+0.040
 Instead+0.039
idents+0.038
 neither+0.037
ucker+0.036
Neg logit contributions
te-0.032
THE-0.031
 fant-0.031
 throne-0.030
bridge-0.029
Token bit
Token ID1643
Feature activation+7.77e-02

Pos logit contributions
 neither+0.028
 Twe+0.027
 Instead+0.027
idents+0.027
 So+0.025
Neg logit contributions
te-0.028
THE-0.025
bridge-0.025
book-0.024
 fant-0.024
Token it
Token ID340
Feature activation+2.23e-02

Pos logit contributions
 So+0.131
idents+0.131
 Instead+0.130
 neither+0.129
mm+0.126
Neg logit contributions
 around-0.092
 downstairs-0.082
te-0.080
 over-0.079
 forward-0.078
Token,
Token ID11
Feature activation+8.40e-03

Pos logit contributions
 neither+0.032
 Instead+0.031
 Twe+0.030
 So+0.030
idents+0.028
Neg logit contributions
te-0.034
bridge-0.033
THE-0.032
book-0.032
 fant-0.031
Token the
Token ID262
Feature activation+6.45e-03

Pos logit contributions
ucker+0.017
ety+0.016
eter+0.016
 So+0.016
 Instead+0.015
Neg logit contributions
 around-0.009
bridge-0.009
 fant-0.009
te-0.008
 throne-0.008
Token nut
Token ID6701
Feature activation+1.29e-02

Pos logit contributions
 neither+0.012
 Twe+0.011
 Instead+0.011
ucker+0.010
 So+0.010
Neg logit contributions
 calendar-0.008
 corner-0.008
 curtain-0.008
bridge-0.008
te-0.007
Token cracked
Token ID21368
Feature activation+2.71e-02

Pos logit contributions
 neither+0.017
 Instead+0.017
 Twe+0.017
 So+0.016
 Mitt+0.015
Neg logit contributions
te-0.021
bridge-0.021
THE-0.020
book-0.019
 fant-0.019
Token dog
Token ID3290
Feature activation+1.33e-02

Pos logit contributions
 neither+0.007
 Instead+0.007
 Twe+0.007
 So+0.006
 Mitt+0.006
Neg logit contributions
te-0.008
bridge-0.008
THE-0.008
 calendar-0.008
 fant-0.008
Token was
Token ID373
Feature activation+5.35e-03

Pos logit contributions
 Instead+0.019
 neither+0.019
 So+0.018
 Twe+0.018
idents+0.018
Neg logit contributions
te-0.019
 fant-0.017
 around-0.017
idi-0.016
 calendar-0.016
Token not
Token ID407
Feature activation-1.10e-02

Pos logit contributions
 neither+0.007
 Instead+0.006
 Twe+0.006
 Mitt+0.006
 So+0.006
Neg logit contributions
bridge-0.010
te-0.009
THE-0.009
book-0.009
 calendar-0.009
Token friendly
Token ID8030
Feature activation+1.62e-02

Pos logit contributions
te+0.014
bridge+0.013
THE+0.013
book+0.013
 fant+0.012
Neg logit contributions
 neither-0.019
 Instead-0.019
 So-0.018
 Twe-0.018
idents-0.017
Token and
Token ID290
Feature activation-1.51e-05

Pos logit contributions
 neither+0.022
 Twe+0.021
 Instead+0.021
 So+0.020
 Mitt+0.019
Neg logit contributions
te-0.026
bridge-0.024
THE-0.024
book-0.024
 fant-0.023
Token bit
Token ID1643
Feature activation+7.71e-02

Pos logit contributions
te+0.000
 around+0.000
 fant+0.000
idi+0.000
 tour+0.000
Neg logit contributions
 So-0.000
 Instead-0.000
 Ow-0.000
uck-0.000
idents-0.000
Token Tim
Token ID5045
Feature activation+8.85e-03

Pos logit contributions
�+0.087
mm+0.084
 So+0.080
unch+0.078
idents+0.076
Neg logit contributions
 around-0.147
 over-0.133
 inside-0.124
 forward-0.124
ines-0.122
Tokenmy
Token ID1820
Feature activation+3.00e-03

Pos logit contributions
 Sorry+0.015
 Twe+0.015
 neither+0.014
mm+0.014
 Instead+0.014
Neg logit contributions
te-0.011
 around-0.010
ines-0.010
book-0.010
 over-0.010
Token's
Token ID338
Feature activation+1.26e-02

Pos logit contributions
 Twe+0.004
 neither+0.004
 Instead+0.004
 Ow+0.004
 Sorry+0.004
Neg logit contributions
te-0.004
 around-0.004
book-0.004
THE-0.004
 calendar-0.004
Token hand
Token ID1021
Feature activation+4.13e-02

Pos logit contributions
 neither+0.019
 Twe+0.018
 Instead+0.018
 So+0.018
 Mitt+0.016
Neg logit contributions
te-0.018
bridge-0.017
 calendar-0.017
THE-0.017
 throne-0.017
Token.
Token ID13
Feature activation+6.59e-03

Pos logit contributions
 Twe+0.062
 Instead+0.062
 neither+0.060
 Ow+0.060
idents+0.060
Neg logit contributions
te-0.055
 around-0.052
book-0.048
bridge-0.047
 over-0.046
Token didn
Token ID1422
Feature activation-1.20e-02

Pos logit contributions
 neither+0.001
 Instead+0.001
 Twe+0.001
 So+0.001
 Mitt+0.001
Neg logit contributions
te-0.001
bridge-0.001
THE-0.001
book-0.001
 fant-0.001
Token't
Token ID470
Feature activation-8.96e-03

Pos logit contributions
te+0.020
bridge+0.020
THE+0.019
 fant+0.019
book+0.019
Neg logit contributions
 neither-0.015
 Instead-0.015
 Twe-0.014
 So-0.014
 Mitt-0.013
Token like
Token ID588
Feature activation-4.57e-03

Pos logit contributions
te+0.012
bridge+0.011
 fant+0.010
book+0.010
THE+0.010
Neg logit contributions
 Instead-0.015
 neither-0.015
 So-0.014
 Twe-0.014
idents-0.013
Token it
Token ID340
Feature activation+2.62e-02

Pos logit contributions
te+0.008
bridge+0.007
 fant+0.007
 calendar+0.007
THE+0.007
Neg logit contributions
 neither-0.006
 Instead-0.006
 So-0.006
 Twe-0.005
idents-0.005
Token and
Token ID290
Feature activation+1.09e-02

Pos logit contributions
 Instead+0.037
 Twe+0.037
 neither+0.036
 So+0.036
 Ow+0.035
Neg logit contributions
te-0.039
 around-0.035
 calendar-0.035
book-0.035
THE-0.035
Token bit
Token ID1643
Feature activation+7.70e-02

Pos logit contributions
 So+0.012
 Instead+0.012
 neither+0.012
 Ow+0.012
idents+0.011
Neg logit contributions
te-0.018
bridge-0.017
 fant-0.017
THE-0.017
 around-0.017
Token the
Token ID262
Feature activation+9.54e-03

Pos logit contributions
 So+0.114
 Instead+0.112
mm+0.108
 neither+0.107
idents+0.105
Neg logit contributions
 around-0.110
 over-0.098
 downstairs-0.096
te-0.096
ines-0.096
Token boy
Token ID2933
Feature activation+1.90e-02

Pos logit contributions
 neither+0.016
 Instead+0.015
 Twe+0.015
 So+0.015
ucker+0.015
Neg logit contributions
 calendar-0.012
te-0.012
 curtain-0.012
 corner-0.012
bridge-0.012
Token.
Token ID13
Feature activation+1.06e-02

Pos logit contributions
 Twe+0.028
 Ow+0.028
 Instead+0.028
idents+0.027
 neither+0.027
Neg logit contributions
te-0.026
 around-0.025
bridge-0.023
book-0.023
 throne-0.023
Token Lily
Token ID20037
Feature activation+8.04e-03

Pos logit contributions
 neither+0.016
 acts+0.014
unn+0.013
our+0.013
 mun+0.013
Neg logit contributions
THE-0.015
 fant-0.014
te-0.014
book-0.013
bridge-0.013
Token learned
Token ID4499
Feature activation-2.83e-03

Pos logit contributions
 neither+0.011
 Twe+0.011
 Instead+0.011
 Ow+0.011
idents+0.010
Neg logit contributions
te-0.012
THE-0.012
idi-0.011
bridge-0.011
 around-0.011
Token,
Token ID11
Feature activation+7.32e-03

Pos logit contributions
 neither+0.052
 Instead+0.051
 Twe+0.050
 So+0.049
 Mitt+0.046
Neg logit contributions
te-0.058
bridge-0.055
THE-0.054
 calendar-0.053
book-0.053
Token but
Token ID475
Feature activation+1.25e-02

Pos logit contributions
 neither+0.011
 Instead+0.010
 Twe+0.010
 So+0.010
idents+0.009
Neg logit contributions
te-0.011
bridge-0.011
THE-0.011
 fant-0.010
 calendar-0.010
Token it
Token ID340
Feature activation+2.10e-02

Pos logit contributions
 neither+0.016
 Instead+0.016
 So+0.015
 Twe+0.015
idents+0.014
Neg logit contributions
te-0.021
bridge-0.020
THE-0.020
 fant-0.019
book-0.019
Token was
Token ID373
Feature activation+1.86e-02

Pos logit contributions
 neither+0.026
 Instead+0.025
 So+0.024
 Twe+0.024
 Mitt+0.022
Neg logit contributions
bridge-0.037
te-0.037
book-0.035
 calendar-0.035
 throne-0.035
Token all
Token ID477
Feature activation+1.03e-02

Pos logit contributions
 neither+0.024
 Instead+0.023
 Twe+0.023
 So+0.021
 Mitt+0.021
Neg logit contributions
bridge-0.032
te-0.031
 calendar-0.030
book-0.030
 throne-0.030
Token fore
Token ID1674
Feature activation+7.65e-02

Pos logit contributions
 neither+0.017
 Instead+0.016
 Twe+0.016
 So+0.016
 Mitt+0.015
Neg logit contributions
te-0.014
bridge-0.014
THE-0.013
book-0.013
 calendar-0.013
Tokenshadow
Token ID19106
Feature activation+1.60e-02

Pos logit contributions
 neither+0.093
 Instead+0.090
 Twe+0.086
 So+0.085
 Mitt+0.083
Neg logit contributions
te-0.135
bridge-0.129
THE-0.126
 fant-0.122
 carriage-0.121
Tokened
Token ID276
Feature activation+3.50e-02

Pos logit contributions
idents+0.026
 Sorry+0.024
 scrape+0.024
 Twe+0.024
 neither+0.024
Neg logit contributions
te-0.025
amed-0.021
 around-0.020
THE-0.020
ative-0.019
Token by
Token ID416
Feature activation-1.91e-02

Pos logit contributions
 neither+0.055
 Instead+0.054
 So+0.053
 Twe+0.052
ucker+0.050
Neg logit contributions
te-0.048
 around-0.043
bridge-0.042
THE-0.042
 fant-0.042
Token the
Token ID262
Feature activation+9.09e-03

Pos logit contributions
te+0.028
 calendar+0.023
 fant+0.023
 throne+0.023
 curtain+0.023
Neg logit contributions
 So-0.030
 Instead-0.029
Fine-0.028
 neither-0.028
ucker-0.028
Token kind
Token ID1611
Feature activation+1.73e-03

Pos logit contributions
 Instead+0.015
 neither+0.015
 So+0.015
 Twe+0.015
ucker+0.014
Neg logit contributions
te-0.012
 calendar-0.011
bridge-0.011
 fant-0.010
THE-0.010
Token very
Token ID845
Feature activation+6.50e-03

Pos logit contributions
 Twe+0.014
 neither+0.014
 Mitt+0.013
 Instead+0.013
 Tim+0.012
Neg logit contributions
bridge-0.021
THE-0.020
te-0.020
iet-0.020
anne-0.019
Token organized
Token ID8389
Feature activation+2.21e-03

Pos logit contributions
 Twe+0.010
 So+0.009
 neither+0.009
 Instead+0.009
 D+0.008
Neg logit contributions
bridge-0.011
te-0.010
THE-0.010
book-0.009
 carriage-0.009
Token with
Token ID351
Feature activation+1.55e-02

Pos logit contributions
 neither+0.003
 Instead+0.002
 Twe+0.002
 So+0.002
 Mitt+0.002
Neg logit contributions
bridge-0.004
te-0.004
THE-0.004
 fant-0.004
 throne-0.004
Token lots
Token ID6041
Feature activation+8.69e-03

Pos logit contributions
 neither+0.024
 Instead+0.023
 So+0.023
 Twe+0.022
idents+0.022
Neg logit contributions
te-0.022
bridge-0.021
 calendar-0.020
book-0.020
 fant-0.020
Token of
Token ID286
Feature activation+2.06e-02

Pos logit contributions
 Instead+0.014
 So+0.014
 neither+0.013
 Twe+0.013
 Mitt+0.012
Neg logit contributions
te-0.012
bridge-0.012
THE-0.011
 fant-0.011
 throne-0.011
Token little
Token ID1310
Feature activation+7.61e-02

Pos logit contributions
 neither+0.028
 Instead+0.028
 Twe+0.028
 So+0.027
 Mitt+0.025
Neg logit contributions
te-0.033
bridge-0.032
THE-0.031
 throne-0.030
book-0.030
Token draw
Token ID3197
Feature activation+1.02e-02

Pos logit contributions
 neither+0.115
 Instead+0.108
 Twe+0.105
 So+0.102
idents+0.098
Neg logit contributions
te-0.109
bridge-0.109
 fant-0.105
 calendar-0.104
 curtain-0.103
Tokeners
Token ID364
Feature activation+1.25e-02

Pos logit contributions
 Instead+0.013
 neither+0.013
 Twe+0.013
 So+0.012
 Mitt+0.012
Neg logit contributions
te-0.016
THE-0.016
 throne-0.016
 fant-0.016
idi-0.015
Token.
Token ID13
Feature activation+5.56e-03

Pos logit contributions
 neither+0.015
 Instead+0.014
 So+0.014
 Twe+0.014
 Mitt+0.012
Neg logit contributions
te-0.023
bridge-0.022
THE-0.021
 fant-0.021
book-0.021
Token She
Token ID1375
Feature activation-3.42e-03

Pos logit contributions
 neither+0.009
idents+0.007
eter+0.007
ucker+0.007
 Twe+0.007
Neg logit contributions
te-0.008
 fant-0.008
THE-0.007
 corner-0.007
 calendar-0.007
Token was
Token ID373
Feature activation+7.00e-03

Pos logit contributions
te+0.006
bridge+0.006
THE+0.005
 fant+0.005
 calendar+0.005
Neg logit contributions
 neither-0.004
 Instead-0.004
 Twe-0.004
 So-0.004
 Mitt-0.004
Token known
Token ID1900
Feature activation+9.76e-03

Pos logit contributions
 neither+0.004
 Twe+0.004
 Instead+0.004
 Mitt+0.003
 So+0.003
Neg logit contributions
 calendar-0.005
bridge-0.005
te-0.005
book-0.005
 curtain-0.005
Token as
Token ID355
Feature activation+8.94e-03

Pos logit contributions
 neither+0.012
 Instead+0.011
 Twe+0.011
 So+0.011
 Mitt+0.010
Neg logit contributions
te-0.017
bridge-0.017
THE-0.016
book-0.016
 fant-0.016
Token the
Token ID262
Feature activation+1.13e-02

Pos logit contributions
 neither+0.014
 So+0.013
idents+0.013
 Instead+0.013
ucker+0.013
Neg logit contributions
te-0.013
 fant-0.012
bridge-0.012
THE-0.011
book-0.011
Token whale
Token ID22206
Feature activation+4.11e-02

Pos logit contributions
 neither+0.019
 Instead+0.018
 Twe+0.018
 So+0.018
 Mitt+0.017
Neg logit contributions
te-0.015
bridge-0.014
THE-0.014
 fant-0.013
 calendar-0.013
Token who
Token ID508
Feature activation+2.21e-02

Pos logit contributions
 neither+0.062
 Instead+0.060
 Twe+0.059
 So+0.058
 Mitt+0.055
Neg logit contributions
te-0.060
bridge-0.059
THE-0.056
 fant-0.055
 calendar-0.055
Token healed
Token ID28557
Feature activation+7.53e-02

Pos logit contributions
 neither+0.036
 Instead+0.035
 So+0.035
 Twe+0.034
idents+0.034
Neg logit contributions
te-0.030
bridge-0.028
 fant-0.027
THE-0.027
book-0.026
Token.
Token ID13
Feature activation+8.55e-03

Pos logit contributions
 neither+0.118
ucker+0.116
idents+0.115
 Instead+0.115
 So+0.111
Neg logit contributions
te-0.104
 around-0.093
 fant-0.091
THE-0.090
bridge-0.085
Token Animals
Token ID25390
Feature activation+1.64e-02

Pos logit contributions
 neither+0.012
 Twe+0.011
idents+0.010
 Mitt+0.010
 Instead+0.010
Neg logit contributions
te-0.014
THE-0.013
 fant-0.013
bridge-0.013
 calendar-0.012
Token from
Token ID422
Feature activation-2.16e-02

Pos logit contributions
 neither+0.023
 Instead+0.021
 Twe+0.021
 So+0.021
 Mitt+0.018
Neg logit contributions
bridge-0.026
te-0.026
book-0.025
THE-0.025
 throne-0.024
Token all
Token ID477
Feature activation-5.49e-03

Pos logit contributions
te+0.034
bridge+0.034
THE+0.032
book+0.032
 fant+0.031
Neg logit contributions
 neither-0.030
 Instead-0.029
 Twe-0.029
 So-0.028
 Mitt-0.026
Token over
Token ID625
Feature activation-1.35e-02

Pos logit contributions
bridge+0.009
te+0.009
book+0.009
THE+0.009
 calendar+0.009
Neg logit contributions
 neither-0.007
 Instead-0.007
 So-0.007
 Twe-0.006
 B-0.006
Token.
Token ID13
Feature activation-5.88e-03

Pos logit contributions
 neither+0.044
 Instead+0.042
 Twe+0.040
 So+0.040
ucker+0.040
Neg logit contributions
te-0.054
 calendar-0.051
 fant-0.050
 throne-0.050
THE-0.050
Token I
Token ID314
Feature activation+5.85e-03

Pos logit contributions
te+0.010
bridge+0.009
 fant+0.009
 throne+0.009
THE+0.009
Neg logit contributions
 neither-0.007
ucker-0.006
idents-0.006
ety-0.006
 mun-0.006
Token can
Token ID460
Feature activation+6.91e-03

Pos logit contributions
 neither+0.008
 Instead+0.008
 Twe+0.008
 So+0.007
 Mitt+0.007
Neg logit contributions
te-0.010
bridge-0.009
 fant-0.008
THE-0.008
book-0.008
Token't
Token ID470
Feature activation-1.56e-02

Pos logit contributions
 neither+0.010
 Instead+0.010
 So+0.010
 Twe+0.010
 Mitt+0.009
Neg logit contributions
te-0.010
bridge-0.010
 throne-0.009
THE-0.009
 calendar-0.009
Token be
Token ID307
Feature activation+1.64e-02

Pos logit contributions
te+0.023
bridge+0.022
THE+0.021
 throne+0.021
 calendar+0.020
Neg logit contributions
 neither-0.024
 Instead-0.024
 So-0.023
ucker-0.022
 Twe-0.022
Token bit
Token ID1643
Feature activation+7.53e-02

Pos logit contributions
 So+0.025
 neither+0.024
 Instead+0.024
idents+0.023
 Twe+0.023
Neg logit contributions
te-0.024
bridge-0.023
 throne-0.021
 fant-0.021
THE-0.021
Tokenen
Token ID268
Feature activation+2.38e-02

Pos logit contributions
 So+0.116
mm+0.115
 Instead+0.112
 neither+0.110
idents+0.108
Neg logit contributions
 around-0.103
te-0.094
 downstairs-0.092
 forward-0.091
ines-0.090
Token.
Token ID13
Feature activation-1.04e-03

Pos logit contributions
 neither+0.035
 Instead+0.033
 Twe+0.033
 So+0.032
 Mitt+0.030
Neg logit contributions
te-0.036
bridge-0.035
THE-0.034
book-0.034
 fant-0.033
Token But
Token ID887
Feature activation+5.19e-03

Pos logit contributions
te+0.002
bridge+0.002
 fant+0.002
THE+0.002
book+0.002
Neg logit contributions
 neither-0.001
 Twe-0.001
idents-0.001
ucker-0.001
ety-0.001
Token if
Token ID611
Feature activation+1.89e-02

Pos logit contributions
 Instead+0.006
 So+0.006
ucker+0.005
 neither+0.005
 Twe+0.005
Neg logit contributions
te-0.009
 throne-0.009
 calendar-0.009
 curtain-0.009
 fant-0.009
Token you
Token ID345
Feature activation-7.65e-03

Pos logit contributions
ucker+0.026
unch+0.024
ety+0.023
elt+0.022
 Instead+0.022
Neg logit contributions
 fant-0.029
te-0.029
 amazing-0.028
 throne-0.028
 calendar-0.028
Token.
Token ID13
Feature activation+2.64e-03

Pos logit contributions
 neither+0.028
 Instead+0.027
 Twe+0.026
 So+0.026
 Mitt+0.024
Neg logit contributions
te-0.036
bridge-0.035
THE-0.034
book-0.033
 fant-0.033
Token There
Token ID1318
Feature activation+2.75e-02

Pos logit contributions
 neither+0.004
 Twe+0.004
idents+0.004
 Mitt+0.004
ucker+0.003
Neg logit contributions
te-0.004
bridge-0.004
 throne-0.004
 fant-0.004
book-0.004
Token were
Token ID547
Feature activation+8.72e-03

Pos logit contributions
 neither+0.036
 Instead+0.036
 Twe+0.035
 So+0.034
 Mitt+0.032
Neg logit contributions
te-0.045
bridge-0.044
THE-0.042
 fant-0.042
book-0.042
Token lots
Token ID6041
Feature activation+1.62e-02

Pos logit contributions
 neither+0.011
 Instead+0.010
 Twe+0.010
 So+0.010
 Mitt+0.009
Neg logit contributions
te-0.015
bridge-0.015
THE-0.014
 fant-0.014
book-0.014
Token of
Token ID286
Feature activation+2.45e-02

Pos logit contributions
 Twe+0.023
 neither+0.023
 Instead+0.023
 So+0.023
 Mitt+0.022
Neg logit contributions
te-0.024
bridge-0.023
THE-0.023
book-0.022
 calendar-0.022
Token little
Token ID1310
Feature activation+7.51e-02

Pos logit contributions
 neither+0.037
 Instead+0.036
 Twe+0.035
 So+0.035
 Mitt+0.033
Neg logit contributions
te-0.036
bridge-0.035
THE-0.033
 fant-0.033
 calendar-0.033
Token birds
Token ID10087
Feature activation+8.89e-03

Pos logit contributions
 neither+0.116
 Instead+0.113
 So+0.109
 Twe+0.108
 Mitt+0.103
Neg logit contributions
te-0.105
bridge-0.104
 fant-0.101
 calendar-0.098
THE-0.097
Token and
Token ID290
Feature activation+1.42e-03

Pos logit contributions
 neither+0.014
 Instead+0.013
 Twe+0.013
 So+0.013
 Mitt+0.012
Neg logit contributions
te-0.013
bridge-0.012
THE-0.012
 fant-0.012
 calendar-0.011
Token flowers
Token ID12734
Feature activation-1.08e-02

Pos logit contributions
 Instead+0.002
 So+0.002
 neither+0.002
idents+0.002
ucker+0.002
Neg logit contributions
bridge-0.002
te-0.002
 fant-0.002
 throne-0.002
 calendar-0.002
Token that
Token ID326
Feature activation+5.85e-03

Pos logit contributions
te+0.019
bridge+0.018
THE+0.018
 fant+0.017
 calendar+0.017
Neg logit contributions
 neither-0.013
 Instead-0.013
 Twe-0.013
 So-0.012
 Mitt-0.011
Token Todd
Token ID14377
Feature activation+2.10e-02

Pos logit contributions
 neither+0.008
 Instead+0.007
 So+0.007
 Twe+0.007
 Mitt+0.007
Neg logit contributions
te-0.010
bridge-0.010
THE-0.009
 fant-0.009
book-0.009
Token day
Token ID1110
Feature activation+1.88e-02

Pos logit contributions
bridge+0.006
 throne+0.005
te+0.005
 carriage+0.005
 vows+0.005
Neg logit contributions
 Twe-0.005
 neither-0.005
 Instead-0.004
 Mitt-0.004
 you-0.004
Token and
Token ID290
Feature activation+7.55e-03

Pos logit contributions
 neither+0.024
 Instead+0.023
 So+0.023
 Twe+0.022
 Mitt+0.021
Neg logit contributions
bridge-0.032
te-0.032
THE-0.031
 fant-0.030
 calendar-0.029
Token wanted
Token ID2227
Feature activation-3.86e-03

Pos logit contributions
 neither+0.012
 Instead+0.012
 So+0.012
 Twe+0.012
 Ow+0.011
Neg logit contributions
te-0.010
bridge-0.009
THE-0.009
 fant-0.009
 calendar-0.009
Token something
Token ID1223
Feature activation+8.55e-03

Pos logit contributions
te+0.005
bridge+0.005
THE+0.005
 fant+0.004
 calendar+0.004
Neg logit contributions
 neither-0.007
 Instead-0.006
 So-0.006
 Twe-0.006
idents-0.006
Token little
Token ID1310
Feature activation+6.40e-02

Pos logit contributions
 neither+0.013
 Instead+0.013
 Twe+0.013
 So+0.013
 Mitt+0.012
Neg logit contributions
bridge-0.013
te-0.013
THE-0.012
book-0.011
 carriage-0.011
Token bit
Token ID1643
Feature activation+7.49e-02

Pos logit contributions
 Twe+0.100
 neither+0.098
 Instead+0.097
 Ow+0.095
ucker+0.094
Neg logit contributions
 calendar-0.083
 around-0.079
te-0.079
 downstairs-0.078
 curtain-0.078
Token different
Token ID1180
Feature activation+6.69e-03

Pos logit contributions
 neither+0.118
 Twe+0.114
 Instead+0.113
 So+0.113
idents+0.109
Neg logit contributions
 around-0.096
te-0.094
 calendar-0.091
ines-0.090
 fant-0.089
Token.
Token ID13
Feature activation+1.31e-02

Pos logit contributions
 neither+0.008
 Instead+0.008
 Twe+0.008
 So+0.008
 Mitt+0.007
Neg logit contributions
te-0.011
bridge-0.011
THE-0.011
book-0.011
 fant-0.011
Token So
Token ID1406
Feature activation+4.34e-02

Pos logit contributions
 neither+0.017
 Twe+0.016
 Mitt+0.015
 Instead+0.015
idents+0.015
Neg logit contributions
te-0.021
 fant-0.020
THE-0.020
bridge-0.020
 throne-0.020
Token she
Token ID673
Feature activation-1.03e-02

Pos logit contributions
 neither+0.058
 Instead+0.057
 Twe+0.056
 So+0.054
 Mitt+0.052
Neg logit contributions
te-0.069
bridge-0.065
THE-0.065
 fant-0.065
 calendar-0.064
Token asked
Token ID1965
Feature activation+9.11e-03

Pos logit contributions
te+0.018
bridge+0.018
THE+0.017
 calendar+0.017
book+0.017
Neg logit contributions
 neither-0.013
 Instead-0.012
 Twe-0.012
 So-0.012
 Mitt-0.011
Token want
Token ID765
Feature activation+1.25e-03

Pos logit contributions
te+0.015
bridge+0.015
THE+0.014
 fant+0.014
 calendar+0.014
Neg logit contributions
 neither-0.013
 Instead-0.012
 Twe-0.012
 So-0.012
 Mitt-0.011
Token to
Token ID284
Feature activation+1.40e-02

Pos logit contributions
 neither+0.002
 Instead+0.002
 So+0.002
 Twe+0.002
idents+0.002
Neg logit contributions
te-0.002
bridge-0.002
 fant-0.002
THE-0.002
 calendar-0.002
Token go
Token ID467
Feature activation+2.64e-02

Pos logit contributions
elt+0.021
 neither+0.021
idents+0.021
 Instead+0.021
 So+0.021
Neg logit contributions
te-0.019
 throne-0.017
 tour-0.017
THE-0.016
 fant-0.015
Token empty
Token ID6565
Feature activation+3.13e-03

Pos logit contributions
 neither+0.045
 Instead+0.044
 Twe+0.043
 So+0.042
 Mitt+0.040
Neg logit contributions
te-0.034
bridge-0.033
THE-0.031
 fant-0.031
book-0.031
Token-
Token ID12
Feature activation+7.99e-03

Pos logit contributions
 neither+0.004
 Instead+0.004
 Twe+0.004
 So+0.004
 Mitt+0.004
Neg logit contributions
te-0.005
bridge-0.005
THE-0.005
book-0.005
 calendar-0.005
Tokenhanded
Token ID13638
Feature activation+7.47e-02

Pos logit contributions
 neither+0.011
 Instead+0.011
 So+0.011
 Twe+0.010
 Ow+0.010
Neg logit contributions
te-0.013
bridge-0.012
THE-0.012
book-0.012
 around-0.011
Token,
Token ID11
Feature activation-7.01e-03

Pos logit contributions
 neither+0.104
 Twe+0.102
 Instead+0.100
 Ow+0.097
 Mitt+0.096
Neg logit contributions
te-0.111
bridge-0.105
 throne-0.104
 calendar-0.103
THE-0.103
Token so
Token ID523
Feature activation+1.34e-02

Pos logit contributions
te+0.011
bridge+0.011
THE+0.011
 fant+0.011
book+0.011
Neg logit contributions
 neither-0.009
 Instead-0.009
 Twe-0.009
 So-0.009
 Mitt-0.008
Token she
Token ID673
Feature activation-6.95e-03

Pos logit contributions
idents+0.018
 Instead+0.018
 So+0.018
 neither+0.018
 Ow+0.018
Neg logit contributions
te-0.020
 fant-0.019
bridge-0.019
THE-0.019
 throne-0.018
Token sque
Token ID35776
Feature activation+2.61e-02

Pos logit contributions
te+0.012
bridge+0.011
THE+0.011
 fant+0.011
book+0.011
Neg logit contributions
 neither-0.009
 Instead-0.009
 Twe-0.009
 So-0.009
 Mitt-0.008
Tokenaked
Token ID4335
Feature activation+3.17e-02

Pos logit contributions
 neither+0.048
 Instead+0.047
 Twe+0.046
 So+0.046
 Mitt+0.043
Neg logit contributions
te-0.030
bridge-0.029
THE-0.027
 fant-0.027
book-0.027
Token now
Token ID783
Feature activation-8.31e-03

Pos logit contributions
 So+0.003
 neither+0.003
 Instead+0.003
 Twe+0.003
ucker+0.003
Neg logit contributions
te-0.004
 around-0.004
 fant-0.004
THE-0.004
 calendar-0.004
Token I
Token ID314
Feature activation+1.87e-02

Pos logit contributions
 around+0.014
te+0.014
THE+0.014
 fant+0.014
 throne+0.014
Neg logit contributions
 So-0.009
 neither-0.009
ucker-0.009
 Instead-0.009
ety-0.008
Token'm
Token ID1101
Feature activation+3.55e-03

Pos logit contributions
 neither+0.026
 Instead+0.025
 Twe+0.025
 So+0.024
 Mitt+0.023
Neg logit contributions
te-0.030
bridge-0.028
THE-0.027
 fant-0.027
book-0.027
Token feeling
Token ID4203
Feature activation+2.74e-03

Pos logit contributions
 neither+0.005
 Instead+0.005
 Twe+0.005
 So+0.005
 Mitt+0.004
Neg logit contributions
te-0.006
bridge-0.006
THE-0.005
book-0.005
 calendar-0.005
Token a
Token ID257
Feature activation+3.72e-02

Pos logit contributions
 neither+0.004
 So+0.004
 Instead+0.004
 Twe+0.004
idents+0.004
Neg logit contributions
te-0.004
bridge-0.004
 fant-0.004
THE-0.004
 calendar-0.004
Token bit
Token ID1643
Feature activation+7.46e-02

Pos logit contributions
 neither+0.037
 Instead+0.035
 Twe+0.034
 So+0.034
 Mitt+0.030
Neg logit contributions
te-0.073
bridge-0.071
THE-0.070
 fant-0.069
 calendar-0.069
Token chilly
Token ID49018
Feature activation+2.02e-02

Pos logit contributions
 neither+0.111
 Twe+0.107
idents+0.106
 Instead+0.105
 So+0.105
Neg logit contributions
te-0.101
 around-0.099
 fant-0.095
bridge-0.094
THE-0.092
Token."
Token ID526
Feature activation+2.17e-02

Pos logit contributions
 neither+0.026
 Instead+0.025
 So+0.024
 Twe+0.024
idents+0.023
Neg logit contributions
te-0.033
bridge-0.032
 around-0.032
 downstairs-0.031
THE-0.031
Token
Token ID198
Feature activation+1.08e-03

Pos logit contributions
 neither+0.039
idents+0.037
charged+0.034
ucker+0.034
eter+0.033
Neg logit contributions
te-0.024
THE-0.023
idi-0.023
 around-0.023
 fant-0.022
Token
Token ID198
Feature activation+3.90e-03

Pos logit contributions
 neither+0.002
 Instead+0.001
 Twe+0.001
idents+0.001
 So+0.001
Neg logit contributions
te-0.002
THE-0.002
bridge-0.002
book-0.002
 fant-0.002
TokenThe
Token ID464
Feature activation+2.89e-02

Pos logit contributions
 neither+0.006
 Instead+0.006
 Twe+0.006
 So+0.006
 Mitt+0.006
Neg logit contributions
te-0.005
bridge-0.005
THE-0.005
 fant-0.005
 calendar-0.005
Token fast
Token ID3049
Feature activation+2.67e-03

Pos logit contributions
 Instead+0.008
 So+0.008
idents+0.007
 neither+0.007
 Twe+0.007
Neg logit contributions
te-0.004
THE-0.003
bridge-0.003
 around-0.003
 fant-0.003
Token as
Token ID355
Feature activation+2.04e-02

Pos logit contributions
 Instead+0.004
 So+0.003
 neither+0.003
idents+0.003
ucker+0.003
Neg logit contributions
te-0.004
bridge-0.004
 around-0.004
THE-0.004
 carriage-0.004
Token he
Token ID339
Feature activation+1.86e-02

Pos logit contributions
 Instead+0.032
 So+0.031
idents+0.031
 neither+0.029
ucker+0.029
Neg logit contributions
te-0.029
THE-0.028
bridge-0.028
 fant-0.026
 throne-0.026
Token could
Token ID714
Feature activation+2.80e-02

Pos logit contributions
 neither+0.036
idents+0.036
 Instead+0.035
 Twe+0.035
 Sorry+0.034
Neg logit contributions
te-0.018
bridge-0.016
THE-0.016
 around-0.015
 fant-0.015
Token and
Token ID290
Feature activation+1.03e-02

Pos logit contributions
 neither+0.036
 Instead+0.034
 So+0.033
 Twe+0.033
 Mitt+0.030
Neg logit contributions
te-0.048
bridge-0.048
THE-0.046
 calendar-0.046
 fant-0.045
Token bit
Token ID1643
Feature activation+7.45e-02

Pos logit contributions
 So+0.016
 Instead+0.016
ucker+0.016
 neither+0.015
idents+0.015
Neg logit contributions
te-0.013
bridge-0.013
 around-0.013
 fant-0.013
 throne-0.013
Token the
Token ID262
Feature activation+2.83e-03

Pos logit contributions
idents+0.136
 So+0.134
mm+0.134
 Instead+0.129
 Sorry+0.128
Neg logit contributions
 around-0.083
 over-0.075
 downstairs-0.071
 corner-0.067
THE-0.067
Token fish
Token ID5916
Feature activation+2.99e-02

Pos logit contributions
 Instead+0.004
 neither+0.004
 So+0.004
 Twe+0.004
 Mitt+0.004
Neg logit contributions
bridge-0.004
 corner-0.004
 throne-0.004
 tour-0.004
THE-0.004
Token.
Token ID13
Feature activation+1.35e-02

Pos logit contributions
 neither+0.045
 Instead+0.044
 Twe+0.044
 So+0.043
 Mitt+0.040
Neg logit contributions
te-0.043
bridge-0.042
THE-0.040
 fant-0.039
 calendar-0.039
Token The
Token ID383
Feature activation+2.01e-02

Pos logit contributions
ucker+0.025
 neither+0.024
ety+0.024
charged+0.022
w+0.022
Neg logit contributions
 around-0.015
 corner-0.013
 throne-0.013
 over-0.012
idi-0.012
Token ot
Token ID30972
Feature activation+1.36e-02

Pos logit contributions
 Instead+0.030
 neither+0.029
 So+0.029
idents+0.027
eter+0.027
Neg logit contributions
bridge-0.029
THE-0.028
 tour-0.028
te-0.028
 corner-0.027
Token,
Token ID11
Feature activation+1.08e-02

Pos logit contributions
 neither+0.008
 Instead+0.008
 So+0.008
 Twe+0.007
 Mitt+0.007
Neg logit contributions
te-0.010
bridge-0.010
THE-0.009
 fant-0.009
 calendar-0.009
Token taking
Token ID2263
Feature activation-5.01e-03

Pos logit contributions
 neither+0.014
 Instead+0.014
 So+0.014
 Twe+0.014
 Mitt+0.013
Neg logit contributions
te-0.018
bridge-0.017
THE-0.017
 fant-0.016
 calendar-0.016
Token out
Token ID503
Feature activation+7.21e-03

Pos logit contributions
te+0.008
bridge+0.008
THE+0.007
book+0.007
 calendar+0.007
Neg logit contributions
 neither-0.007
 Instead-0.007
 Twe-0.007
 So-0.006
 Mitt-0.006
Token lots
Token ID6041
Feature activation+2.84e-03

Pos logit contributions
 neither+0.008
 Instead+0.008
 So+0.007
 Twe+0.007
 Mitt+0.006
Neg logit contributions
te-0.014
bridge-0.013
THE-0.013
 fant-0.013
book-0.013
Token of
Token ID286
Feature activation+1.98e-02

Pos logit contributions
 neither+0.004
 So+0.004
 Instead+0.004
 Twe+0.004
 Mitt+0.004
Neg logit contributions
te-0.004
bridge-0.004
THE-0.004
 fant-0.004
 calendar-0.004
Token little
Token ID1310
Feature activation+7.43e-02

Pos logit contributions
 Instead+0.027
 So+0.027
 neither+0.027
 Twe+0.027
 Mitt+0.025
Neg logit contributions
te-0.031
bridge-0.031
THE-0.030
 calendar-0.028
 throne-0.028
Token pieces
Token ID5207
Feature activation+6.45e-03

Pos logit contributions
 neither+0.112
 Instead+0.107
 Twe+0.106
 So+0.101
 Mitt+0.098
Neg logit contributions
te-0.107
bridge-0.107
 fant-0.103
 calendar-0.101
 throne-0.101
Token of
Token ID286
Feature activation+1.35e-02

Pos logit contributions
 neither+0.008
 Instead+0.007
 So+0.007
 Twe+0.007
 Mitt+0.006
Neg logit contributions
te-0.011
bridge-0.011
 fant-0.011
THE-0.011
 calendar-0.011
Token wood
Token ID4898
Feature activation+1.88e-02

Pos logit contributions
 neither+0.025
 Instead+0.024
 Twe+0.023
 So+0.023
idents+0.023
Neg logit contributions
te-0.015
bridge-0.014
 fant-0.014
 calendar-0.014
book-0.014
Token.
Token ID13
Feature activation+1.05e-02

Pos logit contributions
 neither+0.024
 Instead+0.023
 Twe+0.022
 So+0.022
 Mitt+0.020
Neg logit contributions
te-0.032
bridge-0.032
THE-0.031
 fant-0.030
 calendar-0.030
Token
Token ID198
Feature activation+9.04e-04

Pos logit contributions
 neither+0.014
 Twe+0.014
idents+0.013
 Instead+0.013
ety+0.012
Neg logit contributions
te-0.016
bridge-0.016
THE-0.016
book-0.015
 fant-0.015
Token sense
Token ID2565
Feature activation+2.78e-02

Pos logit contributions
 neither+0.007
 Instead+0.007
 Twe+0.007
 So+0.007
 Mitt+0.007
Neg logit contributions
te-0.006
bridge-0.006
THE-0.006
 fant-0.005
book-0.005
Token of
Token ID286
Feature activation+6.38e-03

Pos logit contributions
 neither+0.040
 Instead+0.037
 Twe+0.036
 So+0.036
 Mitt+0.034
Neg logit contributions
bridge-0.043
 fant-0.041
THE-0.040
te-0.040
 throne-0.040
Token accomplishment
Token ID29340
Feature activation-3.17e-02

Pos logit contributions
 neither+0.010
 Instead+0.009
 Twe+0.009
 So+0.009
 Mitt+0.009
Neg logit contributions
te-0.009
bridge-0.009
THE-0.009
book-0.009
 fant-0.009
Token for
Token ID329
Feature activation-3.90e-03

Pos logit contributions
bridge+0.054
te+0.054
 fant+0.053
THE+0.052
 calendar+0.051
Neg logit contributions
 neither-0.040
 Twe-0.036
 Instead-0.036
 So-0.035
 Mitt-0.032
Token having
Token ID1719
Feature activation-2.88e-02

Pos logit contributions
te+0.005
THE+0.005
bridge+0.005
 fant+0.005
 calendar+0.005
Neg logit contributions
 Instead-0.006
 So-0.006
 neither-0.006
idents-0.006
ucker-0.006
Token healed
Token ID28557
Feature activation+7.41e-02

Pos logit contributions
te+0.042
 fant+0.040
THE+0.039
bridge+0.039
book+0.038
Neg logit contributions
 neither-0.044
 Instead-0.041
idents-0.041
 So-0.041
 Twe-0.040
Token the
Token ID262
Feature activation+1.27e-02

Pos logit contributions
 neither+0.111
 Instead+0.106
 So+0.103
 Twe+0.101
idents+0.098
Neg logit contributions
te-0.112
bridge-0.104
THE-0.102
 fant-0.101
book-0.101
Token bird
Token ID6512
Feature activation+4.06e-02

Pos logit contributions
 neither+0.022
 Instead+0.021
 So+0.020
 Twe+0.020
 Mitt+0.019
Neg logit contributions
te-0.017
bridge-0.016
 calendar-0.016
THE-0.016
 fant-0.015
Token and
Token ID290
Feature activation+8.29e-03

Pos logit contributions
 neither+0.057
 Instead+0.055
 Twe+0.054
 So+0.054
 Mitt+0.050
Neg logit contributions
te-0.064
bridge-0.062
THE-0.060
 calendar-0.059
book-0.059
Token giving
Token ID3501
Feature activation+1.95e-02

Pos logit contributions
 neither+0.012
 Instead+0.011
 So+0.011
 Twe+0.011
idents+0.010
Neg logit contributions
te-0.013
bridge-0.012
THE-0.012
 calendar-0.012
 fant-0.012
Token it
Token ID340
Feature activation+8.90e-03

Pos logit contributions
 neither+0.032
 Instead+0.030
 So+0.030
idents+0.029
 Twe+0.028
Neg logit contributions
te-0.026
THE-0.025
 around-0.024
bridge-0.024
 fant-0.024
Token man
Token ID582
Feature activation+2.86e-02

Pos logit contributions
 neither+0.013
 Instead+0.012
 Twe+0.012
 So+0.012
 Mitt+0.011
Neg logit contributions
te-0.021
bridge-0.020
THE-0.019
book-0.019
 fant-0.019
Token who
Token ID508
Feature activation+1.15e-02

Pos logit contributions
 neither+0.043
 Instead+0.042
idents+0.040
 Twe+0.040
 So+0.040
Neg logit contributions
te-0.043
bridge-0.040
book-0.039
THE-0.038
 throne-0.038
Token lived
Token ID5615
Feature activation+2.02e-02

Pos logit contributions
 neither+0.022
 Instead+0.021
 Twe+0.020
 So+0.020
idents+0.019
Neg logit contributions
te-0.013
bridge-0.013
 fant-0.012
book-0.012
THE-0.012
Token in
Token ID287
Feature activation+1.99e-02

Pos logit contributions
 neither+0.041
 Instead+0.041
 So+0.040
 Twe+0.040
idents+0.038
Neg logit contributions
te-0.019
bridge-0.018
THE-0.017
book-0.016
 fant-0.016
Token a
Token ID257
Feature activation+1.44e-02

Pos logit contributions
 neither+0.025
 So+0.024
 Instead+0.023
idents+0.023
 Twe+0.022
Neg logit contributions
te-0.034
bridge-0.034
 fant-0.033
 calendar-0.032
book-0.032
Token little
Token ID1310
Feature activation+7.39e-02

Pos logit contributions
 Instead+0.022
 Twe+0.021
 So+0.021
 Mitt+0.020
 neither+0.020
Neg logit contributions
te-0.023
bridge-0.022
book-0.021
THE-0.020
uits-0.019
Token house
Token ID2156
Feature activation+1.73e-02

Pos logit contributions
 neither+0.138
 Instead+0.129
 Twe+0.127
 So+0.125
idents+0.122
Neg logit contributions
te-0.082
 calendar-0.078
bridge-0.078
THE-0.077
 fant-0.077
Token.
Token ID13
Feature activation+7.85e-03

Pos logit contributions
 neither+0.024
 Instead+0.023
 Twe+0.023
 So+0.022
idents+0.022
Neg logit contributions
te-0.028
bridge-0.026
book-0.025
 calendar-0.025
 throne-0.025
Token Every
Token ID3887
Feature activation+4.23e-03

Pos logit contributions
 neither+0.012
idents+0.011
ucker+0.010
uck+0.010
 Twe+0.010
Neg logit contributions
te-0.012
bridge-0.012
book-0.012
 throne-0.011
 corner-0.011
Token day
Token ID1110
Feature activation+7.29e-03

Pos logit contributions
 neither+0.008
 Instead+0.008
 Twe+0.008
 So+0.008
 Mitt+0.008
Neg logit contributions
te-0.004
bridge-0.004
THE-0.004
book-0.004
 calendar-0.004
Token,
Token ID11
Feature activation-4.09e-04

Pos logit contributions
 neither+0.009
 Instead+0.009
 Twe+0.008
 So+0.008
 Mitt+0.008
Neg logit contributions
te-0.013
bridge-0.012
THE-0.012
 fant-0.012
book-0.012
Token would
Token ID561
Feature activation-3.04e-03

Pos logit contributions
bridge+0.025
te+0.025
THE+0.024
 fant+0.024
book+0.024
Neg logit contributions
 neither-0.019
 Twe-0.018
 Instead-0.018
 So-0.018
 Mitt-0.017
Token sit
Token ID1650
Feature activation-1.73e-02

Pos logit contributions
te+0.004
bridge+0.004
THE+0.004
 calendar+0.004
 fant+0.004
Neg logit contributions
 neither-0.005
 Instead-0.005
 Twe-0.004
 So-0.004
 Mitt-0.004
Token by
Token ID416
Feature activation-4.93e-02

Pos logit contributions
bridge+0.027
te+0.026
 fant+0.025
book+0.025
 throne+0.024
Neg logit contributions
 neither-0.025
 Twe-0.024
 Instead-0.023
 So-0.023
 Mitt-0.022
Token the
Token ID262
Feature activation-1.40e-02

Pos logit contributions
te+0.067
 calendar+0.062
 throne+0.059
 curtain+0.057
 fant+0.055
Neg logit contributions
 Instead-0.085
 So-0.084
 neither-0.082
ucker-0.079
elt-0.077
Token books
Token ID3835
Feature activation-5.37e-02

Pos logit contributions
bridge+0.020
te+0.020
book+0.018
THE+0.018
 fant+0.018
Neg logit contributions
 Instead-0.022
 So-0.021
 Mitt-0.021
 neither-0.020
 Twe-0.020
Tokenhe
Token ID258
Feature activation-1.06e-01

Pos logit contributions
bridge+0.056
te+0.055
 fant+0.054
THE+0.051
 throne+0.048
Neg logit contributions
 neither-0.101
 Instead-0.099
 So-0.098
 Twe-0.094
 Mitt-0.090
Tokenlf
Token ID1652
Feature activation-4.79e-02

Pos logit contributions
 fant+0.090
bridge+0.089
te+0.085
 spectacular+0.082
THE+0.082
Neg logit contributions
 neither-0.207
 So-0.202
 Instead-0.197
 Twe-0.186
 Mitt-0.185
Token and
Token ID290
Feature activation-5.00e-03

Pos logit contributions
te+0.072
bridge+0.070
THE+0.068
 calendar+0.066
book+0.066
Neg logit contributions
 neither-0.070
 Instead-0.068
 Twe-0.067
 So-0.066
 Mitt-0.062
Token read
Token ID1100
Feature activation-1.68e-02

Pos logit contributions
te+0.007
bridge+0.006
 throne+0.006
 calendar+0.006
 fant+0.006
Neg logit contributions
 neither-0.008
 Instead-0.008
 So-0.008
 Twe-0.008
 Ow-0.007
Token together
Token ID1978
Feature activation-2.45e-03

Pos logit contributions
te+0.025
bridge+0.025
THE+0.024
 fant+0.024
book+0.023
Neg logit contributions
 neither-0.024
 Instead-0.023
 Twe-0.023
 So-0.023
 Mitt-0.021
Token every
Token ID790
Feature activation-9.09e-03

Pos logit contributions
bridge+0.005
THE+0.004
 fant+0.004
iet+0.004
te+0.004
Neg logit contributions
 neither-0.003
 So-0.003
 Instead-0.003
 Twe-0.002
 Mitt-0.002
Token tried
Token ID3088
Feature activation+1.71e-02

Pos logit contributions
te+0.016
idi+0.014
 around+0.014
 corner+0.014
bridge+0.013
Neg logit contributions
 Twe-0.015
 neither-0.015
 Instead-0.014
idents-0.014
ucker-0.014
Token to
Token ID284
Feature activation+1.96e-03

Pos logit contributions
 neither+0.026
 Instead+0.025
 Twe+0.024
 So+0.024
 Mitt+0.023
Neg logit contributions
te-0.026
bridge-0.026
THE-0.024
book-0.024
 fant-0.024
Token climb
Token ID12080
Feature activation-6.07e-03

Pos logit contributions
 neither+0.003
 Instead+0.003
 Twe+0.003
 So+0.003
ucker+0.003
Neg logit contributions
te-0.003
bridge-0.003
 calendar-0.003
 throne-0.002
THE-0.002
Token the
Token ID262
Feature activation-1.45e-02

Pos logit contributions
bridge+0.011
te+0.010
book+0.010
 fant+0.010
THE+0.010
Neg logit contributions
 neither-0.008
 Twe-0.008
 Instead-0.007
 So-0.007
 Mitt-0.007
Token books
Token ID3835
Feature activation-6.17e-02

Pos logit contributions
bridge+0.020
te+0.019
THE+0.018
 fant+0.017
idi+0.017
Neg logit contributions
 Instead-0.024
 So-0.024
 neither-0.022
 Mitt-0.021
idents-0.021
Tokenhe
Token ID258
Feature activation-1.05e-01

Pos logit contributions
te+0.072
bridge+0.072
 fant+0.068
THE+0.068
 throne+0.065
Neg logit contributions
 neither-0.112
 Instead-0.109
 So-0.107
 Twe-0.105
 Mitt-0.098
Tokenlf
Token ID1652
Feature activation-5.42e-02

Pos logit contributions
THE+0.096
bridge+0.092
 fant+0.092
 spectacular+0.091
vy+0.088
Neg logit contributions
 neither-0.200
 So-0.199
 Instead-0.197
 Twe-0.179
 Tim-0.176
Token to
Token ID284
Feature activation-6.98e-03

Pos logit contributions
te+0.088
bridge+0.086
THE+0.082
 calendar+0.082
book+0.082
Neg logit contributions
 neither-0.072
 Instead-0.070
 Twe-0.069
 So-0.068
 Mitt-0.064
Token get
Token ID651
Feature activation-1.36e-02

Pos logit contributions
te+0.010
 calendar+0.009
 throne+0.009
bridge+0.009
 corner+0.009
Neg logit contributions
 neither-0.011
 Instead-0.011
 So-0.011
 Twe-0.011
ucker-0.010
Token the
Token ID262
Feature activation-2.03e-02

Pos logit contributions
te+0.019
bridge+0.018
THE+0.017
 fant+0.017
 calendar+0.017
Neg logit contributions
 neither-0.022
 Instead-0.021
 So-0.021
 Twe-0.021
 Mitt-0.019
Token book
Token ID1492
Feature activation-2.26e-02

Pos logit contributions
te+0.024
bridge+0.024
THE+0.024
 fant+0.022
idi+0.021
Neg logit contributions
 neither-0.035
 Instead-0.035
 So-0.034
 Twe-0.033
 Mitt-0.032
Token puppy
Token ID26188
Feature activation+6.75e-04

Pos logit contributions
te+0.068
bridge+0.065
 calendar+0.065
THE+0.063
 fant+0.063
Neg logit contributions
 neither-0.055
 Instead-0.052
 So-0.049
 Twe-0.049
 Mitt-0.045
Token jumped
Token ID11687
Feature activation-1.33e-02

Pos logit contributions
 neither+0.001
 Instead+0.001
 Twe+0.001
 So+0.001
idents+0.001
Neg logit contributions
te-0.001
bridge-0.001
THE-0.001
 fant-0.001
 calendar-0.001
Token on
Token ID319
Feature activation-3.81e-02

Pos logit contributions
bridge+0.021
 throne+0.020
pter+0.019
te+0.019
 orchestra+0.019
Neg logit contributions
 neither-0.019
 Instead-0.019
 Twe-0.019
 Mitt-0.018
 So-0.018
Token the
Token ID262
Feature activation-1.36e-02

Pos logit contributions
te+0.061
bridge+0.057
 calendar+0.056
THE+0.056
book+0.055
Neg logit contributions
 neither-0.054
 Instead-0.052
 So-0.050
 Twe-0.050
idents-0.047
Token books
Token ID3835
Feature activation-5.73e-02

Pos logit contributions
te+0.021
bridge+0.020
THE+0.020
eway+0.019
 fant+0.019
Neg logit contributions
 So-0.018
 Mitt-0.018
 Instead-0.018
 Twe-0.018
 neither-0.017
Tokenhe
Token ID258
Feature activation-1.05e-01

Pos logit contributions
te+0.060
bridge+0.059
THE+0.055
 fant+0.055
 calendar+0.052
Neg logit contributions
 neither-0.111
 Instead-0.108
 So-0.106
 Twe-0.105
 Mitt-0.099
Tokenlf
Token ID1652
Feature activation-4.72e-02

Pos logit contributions
 fant+0.082
THE+0.081
bridge+0.081
te+0.078
 spectacular+0.075
Neg logit contributions
 neither-0.215
 So-0.208
 Instead-0.205
 Twe-0.204
 Mitt-0.196
Token and
Token ID290
Feature activation-6.07e-03

Pos logit contributions
te+0.069
bridge+0.066
 calendar+0.063
book+0.063
 throne+0.062
Neg logit contributions
 neither-0.071
 Instead-0.070
 Twe-0.070
 So-0.068
 Mitt-0.065
Token knocked
Token ID13642
Feature activation-1.00e-02

Pos logit contributions
te+0.009
bridge+0.009
 fant+0.009
 throne+0.009
THE+0.009
Neg logit contributions
 neither-0.008
 Instead-0.008
 So-0.008
 Twe-0.008
idents-0.007
Token all
Token ID477
Feature activation-4.02e-02

Pos logit contributions
bridge+0.010
te+0.010
 fant+0.010
 throne+0.010
 calendar+0.009
Neg logit contributions
 neither-0.019
 Instead-0.019
 Twe-0.019
 So-0.019
 Mitt-0.018
Token the
Token ID262
Feature activation-1.15e-02

Pos logit contributions
bridge+0.065
te+0.064
 fant+0.061
 throne+0.060
THE+0.059
Neg logit contributions
 Instead-0.056
 Twe-0.055
 neither-0.055
 So-0.055
 Mitt-0.052
Token book
Token ID1492
Feature activation-1.53e-02

Pos logit contributions
 Instead+0.028
 neither+0.027
 So+0.027
 Twe+0.026
 Mitt+0.025
Neg logit contributions
te-0.020
THE-0.019
bridge-0.018
 fant-0.017
iet-0.017
Token falls
Token ID8953
Feature activation+7.63e-04

Pos logit contributions
te+0.030
THE+0.029
 fant+0.029
bridge+0.029
book+0.028
Neg logit contributions
 neither-0.017
 Instead-0.016
 So-0.015
 Twe-0.013
idents-0.012
Token off
Token ID572
Feature activation-4.46e-02

Pos logit contributions
 neither+0.001
 Instead+0.001
 Twe+0.001
 So+0.001
 Mitt+0.001
Neg logit contributions
te-0.001
bridge-0.001
THE-0.001
 calendar-0.001
book-0.001
Token the
Token ID262
Feature activation-1.17e-02

Pos logit contributions
te+0.073
bridge+0.071
THE+0.068
book+0.067
 calendar+0.067
Neg logit contributions
 neither-0.061
 Instead-0.059
 Twe-0.057
 So-0.057
 Mitt-0.053
Token books
Token ID3835
Feature activation-6.25e-02

Pos logit contributions
te+0.016
THE+0.016
bridge+0.016
 fant+0.015
iet+0.015
Neg logit contributions
 Instead-0.017
 So-0.017
 neither-0.017
 Twe-0.017
 Mitt-0.016
Tokenhe
Token ID258
Feature activation-1.05e-01

Pos logit contributions
te+0.080
 fant+0.079
bridge+0.078
THE+0.076
 calendar+0.073
Neg logit contributions
 neither-0.104
 Instead-0.102
 So-0.099
 Twe-0.096
 Mitt-0.090
Tokenlf
Token ID1652
Feature activation-5.32e-02

Pos logit contributions
 fant+0.086
THE+0.081
te+0.077
 spectacular+0.075
bridge+0.074
Neg logit contributions
 neither-0.213
 Instead-0.207
 So-0.205
 Twe-0.196
 Ow-0.188
Token.
Token ID13
Feature activation+5.26e-03

Pos logit contributions
te+0.094
bridge+0.093
THE+0.090
 fant+0.088
 calendar+0.088
Neg logit contributions
 neither-0.064
 Instead-0.061
 Twe-0.059
 So-0.059
 Mitt-0.054
Token It
Token ID632
Feature activation+1.86e-02

Pos logit contributions
 Twe+0.007
ety+0.007
 neither+0.007
charged+0.007
idents+0.007
Neg logit contributions
book-0.008
bridge-0.007
te-0.007
 around-0.007
 curtain-0.007
Token makes
Token ID1838
Feature activation+1.18e-02

Pos logit contributions
 neither+0.020
 Instead+0.019
 Twe+0.018
 So+0.018
 Mitt+0.016
Neg logit contributions
te-0.036
bridge-0.035
THE-0.035
book-0.034
 fant-0.034
Token a
Token ID257
Feature activation+7.99e-03

Pos logit contributions
 neither+0.016
 Twe+0.016
 So+0.016
ety+0.016
 Instead+0.015
Neg logit contributions
bridge-0.018
te-0.018
 around-0.016
 curtain-0.016
THE-0.016
Token showed
Token ID3751
Feature activation+1.25e-02

Pos logit contributions
te+0.034
 calendar+0.030
 around+0.030
book+0.030
THE+0.030
Neg logit contributions
 neither-0.040
 So-0.039
 Twe-0.038
 Instead-0.038
idents-0.038
Token her
Token ID607
Feature activation-2.95e-02

Pos logit contributions
 neither+0.022
 Instead+0.022
 So+0.022
 Twe+0.021
idents+0.020
Neg logit contributions
te-0.014
 calendar-0.013
THE-0.013
bridge-0.013
 fant-0.013
Token friend
Token ID1545
Feature activation-1.31e-02

Pos logit contributions
bridge+0.053
te+0.052
book+0.050
THE+0.050
 fant+0.049
Neg logit contributions
 neither-0.037
 Instead-0.035
 Twe-0.033
 Mitt-0.033
 So-0.032
Token the
Token ID262
Feature activation-1.95e-02

Pos logit contributions
bridge+0.024
te+0.023
THE+0.022
 fant+0.022
 throne+0.022
Neg logit contributions
 neither-0.016
 Instead-0.015
 Twe-0.014
 So-0.014
 Mitt-0.014
Token books
Token ID3835
Feature activation-6.03e-02

Pos logit contributions
bridge+0.028
te+0.028
THE+0.027
iet+0.026
vy+0.025
Neg logit contributions
 Instead-0.030
 So-0.028
 neither-0.028
 Twe-0.027
 Mitt-0.026
Tokenhe
Token ID258
Feature activation-1.03e-01

Pos logit contributions
bridge+0.103
te+0.102
 fant+0.100
THE+0.098
 throne+0.097
Neg logit contributions
 neither-0.076
 Instead-0.071
 Twe-0.068
 So-0.068
 Mitt-0.061
Tokenlf
Token ID1652
Feature activation-6.21e-02

Pos logit contributions
 spectacular+0.111
THE+0.108
 fant+0.107
bridge+0.106
 carriage+0.102
Neg logit contributions
 neither-0.180
 So-0.166
 Instead-0.165
 Tim-0.162
 Twe-0.156
Token.
Token ID13
Feature activation-5.03e-04

Pos logit contributions
bridge+0.114
te+0.113
THE+0.111
 fant+0.110
book+0.106
Neg logit contributions
 neither-0.070
 Instead-0.064
 Twe-0.063
 So-0.062
 Mitt-0.056
Token From
Token ID3574
Feature activation-2.34e-02

Pos logit contributions
te+0.001
THE+0.001
 calendar+0.001
 fant+0.001
book+0.001
Neg logit contributions
 neither-0.001
 mun-0.001
 Twe-0.001
ucker-0.001
 scrape-0.001
Token then
Token ID788
Feature activation-1.26e-02

Pos logit contributions
bridge+0.049
te+0.048
book+0.047
 throne+0.046
 fant+0.046
Neg logit contributions
 neither-0.023
 Twe-0.021
 Instead-0.020
 So-0.019
 Mitt-0.019
Token on
Token ID319
Feature activation-2.27e-02

Pos logit contributions
bridge+0.015
 throne+0.014
te+0.014
 fant+0.013
 carriage+0.013
Neg logit contributions
 neither-0.024
 So-0.023
 Instead-0.023
 Mitt-0.022
 Twe-0.022
TokenCan
Token ID6090
Feature activation+7.54e-03

Pos logit contributions
te+0.021
bridge+0.021
 calendar+0.020
 throne+0.020
ines+0.020
Neg logit contributions
 So-0.020
 Instead-0.020
 neither-0.020
 Twe-0.019
 Mitt-0.019
Token I
Token ID314
Feature activation-8.31e-03

Pos logit contributions
ucker+0.009
 Instead+0.009
eter+0.008
 neither+0.008
anked+0.008
Neg logit contributions
te-0.013
THE-0.013
 throne-0.012
 calendar-0.012
 fant-0.012
Token touch
Token ID3638
Feature activation+1.10e-02

Pos logit contributions
bridge+0.015
te+0.014
THE+0.014
book+0.014
 fant+0.014
Neg logit contributions
 Instead-0.010
 neither-0.010
 So-0.010
 Twe-0.010
 Mitt-0.009
Token the
Token ID262
Feature activation-2.10e-02

Pos logit contributions
 neither+0.016
 Instead+0.016
 So+0.015
idents+0.015
ucker+0.015
Neg logit contributions
te-0.016
bridge-0.016
 calendar-0.015
THE-0.015
 fant-0.015
Token books
Token ID3835
Feature activation-5.79e-02

Pos logit contributions
te+0.030
THE+0.029
bridge+0.029
idi+0.027
iet+0.027
Neg logit contributions
 Instead-0.033
 So-0.031
 neither-0.030
 Twe-0.030
 Mitt-0.029
Tokenhe
Token ID258
Feature activation-1.03e-01

Pos logit contributions
te+0.076
bridge+0.075
 fant+0.072
THE+0.071
 calendar+0.070
Neg logit contributions
 neither-0.095
 Instead-0.093
 So-0.090
 Twe-0.090
 Mitt-0.082
Tokenlf
Token ID1652
Feature activation-5.25e-02

Pos logit contributions
THE+0.077
te+0.076
 fant+0.075
bridge+0.074
uits+0.070
Neg logit contributions
 neither-0.211
 So-0.207
 Instead-0.207
 Twe-0.198
 you-0.195
Token too
Token ID1165
Feature activation-1.60e-02

Pos logit contributions
te+0.085
bridge+0.083
THE+0.080
book+0.079
 calendar+0.078
Neg logit contributions
 neither-0.071
 Instead-0.069
 Twe-0.067
 So-0.066
 Mitt-0.062
Token?"
Token ID1701
Feature activation-1.53e-03

Pos logit contributions
te+0.022
 calendar+0.021
book+0.020
 fant+0.020
bridge+0.020
Neg logit contributions
 neither-0.025
 Twe-0.023
 Instead-0.023
 So-0.023
 Mitt-0.023
Token His
Token ID2399
Feature activation+3.01e-02

Pos logit contributions
te+0.002
 fant+0.002
book+0.002
THE+0.002
 corner+0.002
Neg logit contributions
 neither-0.002
idents-0.002
 Twe-0.002
ucker-0.002
 Instead-0.002
Token mom
Token ID1995
Feature activation-4.60e-03

Pos logit contributions
 neither+0.034
 Instead+0.033
 So+0.032
 Twe+0.030
 Mitt+0.030
Neg logit contributions
bridge-0.058
te-0.057
THE-0.055
book-0.054
 fant-0.053
Token chair
Token ID5118
Feature activation-3.93e-03

Pos logit contributions
 neither+0.002
 Instead+0.002
 So+0.002
 Ow+0.001
idents+0.001
Neg logit contributions
te-0.002
THE-0.001
bridge-0.001
 fant-0.001
 reporter-0.001
Token to
Token ID284
Feature activation+7.45e-03

Pos logit contributions
te+0.006
bridge+0.006
THE+0.006
 fant+0.006
 calendar+0.006
Neg logit contributions
 neither-0.006
 Instead-0.005
 So-0.005
 Twe-0.005
 Mitt-0.005
Token climb
Token ID12080
Feature activation-2.92e-03

Pos logit contributions
 neither+0.012
 Instead+0.012
 Twe+0.011
 So+0.011
idents+0.011
Neg logit contributions
te-0.010
bridge-0.010
 throne-0.010
 calendar-0.010
book-0.010
Token the
Token ID262
Feature activation-1.27e-02

Pos logit contributions
te+0.005
bridge+0.005
THE+0.004
book+0.004
 fant+0.004
Neg logit contributions
 neither-0.004
 Instead-0.004
 Twe-0.004
 So-0.004
 Mitt-0.003
Token books
Token ID3835
Feature activation-5.68e-02

Pos logit contributions
te+0.019
bridge+0.018
THE+0.017
 reporter+0.017
iet+0.016
Neg logit contributions
 Instead-0.019
 So-0.019
 neither-0.017
 Twe-0.016
 Ow-0.016
Tokenhe
Token ID258
Feature activation-1.03e-01

Pos logit contributions
te+0.084
bridge+0.083
 fant+0.081
THE+0.080
 calendar+0.077
Neg logit contributions
 neither-0.084
 Instead-0.082
 So-0.080
 Twe-0.077
 Mitt-0.070
Tokenlf
Token ID1652
Feature activation-4.92e-02

Pos logit contributions
THE+0.086
 fant+0.086
te+0.083
bridge+0.081
 spectacular+0.080
Neg logit contributions
 neither-0.201
 So-0.199
 Instead-0.199
 Twe-0.181
uck-0.177
Token.
Token ID13
Feature activation+7.42e-03

Pos logit contributions
te+0.088
bridge+0.086
THE+0.083
 fant+0.082
book+0.082
Neg logit contributions
 neither-0.059
 Instead-0.056
 Twe-0.055
 So-0.054
 Mitt-0.049
Token She
Token ID1375
Feature activation-4.71e-03

Pos logit contributions
 neither+0.014
unch+0.013
ucker+0.013
 mun+0.013
charged+0.013
Neg logit contributions
 around-0.008
 downstairs-0.007
 corner-0.007
 calendar-0.007
 curtain-0.007
Token pushes
Token ID20070
Feature activation-1.69e-03

Pos logit contributions
te+0.008
bridge+0.008
THE+0.008
 fant+0.008
 calendar+0.008
Neg logit contributions
 neither-0.006
 Instead-0.006
 Twe-0.006
 So-0.005
 Mitt-0.005
Token the
Token ID262
Feature activation-9.76e-03

Pos logit contributions
te+0.003
bridge+0.002
THE+0.002
book+0.002
 calendar+0.002
Neg logit contributions
 neither-0.002
 Instead-0.002
 Twe-0.002
 So-0.002
idents-0.002
TokenThey
Token ID2990
Feature activation-1.81e-03

Pos logit contributions
te+0.014
bridge+0.014
 calendar+0.013
 fant+0.013
 throne+0.013
Neg logit contributions
 neither-0.015
 So-0.015
 Instead-0.015
 Twe-0.014
 Mitt-0.014
Token go
Token ID467
Feature activation-1.05e-03

Pos logit contributions
te+0.003
bridge+0.003
THE+0.003
book+0.003
 fant+0.003
Neg logit contributions
 neither-0.002
 Instead-0.002
 So-0.002
 Twe-0.002
 Mitt-0.002
Token to
Token ID284
Feature activation-1.68e-02

Pos logit contributions
bridge+0.002
 fant+0.002
te+0.002
book+0.002
iet+0.002
Neg logit contributions
 neither-0.001
 So-0.001
 Instead-0.001
 Twe-0.001
 Mitt-0.001
Token the
Token ID262
Feature activation-1.74e-02

Pos logit contributions
te+0.023
 calendar+0.022
 downstairs+0.022
 curtain+0.022
 throne+0.022
Neg logit contributions
 neither-0.026
 Instead-0.026
 So-0.025
 Twe-0.025
ucker-0.024
Token books
Token ID3835
Feature activation-5.58e-02

Pos logit contributions
te+0.027
bridge+0.027
book+0.025
iet+0.024
THE+0.024
Neg logit contributions
 Instead-0.025
 So-0.025
 Mitt-0.024
 cash-0.023
 Twe-0.022
Tokenhe
Token ID258
Feature activation-1.02e-01

Pos logit contributions
 fant+0.076
te+0.075
bridge+0.074
THE+0.071
iet+0.068
Neg logit contributions
 neither-0.089
 So-0.088
 Instead-0.086
 Twe-0.077
uck-0.073
Tokenlf
Token ID1652
Feature activation-4.96e-02

Pos logit contributions
 fant+0.085
THE+0.080
 spectacular+0.080
te+0.080
bridge+0.079
Neg logit contributions
 neither-0.204
 So-0.199
 Instead-0.195
 you-0.183
 Twe-0.183
Token and
Token ID290
Feature activation-7.62e-03

Pos logit contributions
te+0.075
bridge+0.073
THE+0.070
book+0.069
 calendar+0.069
Neg logit contributions
 neither-0.072
 Instead-0.070
 Twe-0.070
 So-0.067
 Mitt-0.065
Token start
Token ID923
Feature activation-1.44e-02

Pos logit contributions
te+0.012
bridge+0.011
THE+0.011
 fant+0.011
 calendar+0.011
Neg logit contributions
 neither-0.011
 Instead-0.011
 Twe-0.011
 So-0.010
 Mitt-0.010
Token to
Token ID284
Feature activation-3.54e-03

Pos logit contributions
bridge+0.023
te+0.023
THE+0.022
 fant+0.021
book+0.021
Neg logit contributions
 neither-0.020
 Instead-0.020
 Twe-0.019
 So-0.019
 Mitt-0.018
Token climb
Token ID12080
Feature activation-9.96e-03

Pos logit contributions
te+0.005
 throne+0.005
 calendar+0.005
bridge+0.005
 fant+0.005
Neg logit contributions
 neither-0.005
 Instead-0.005
 Twe-0.005
 So-0.005
idents-0.005
Token felt
Token ID2936
Feature activation-2.33e-02

Pos logit contributions
 neither+0.021
 Instead+0.020
 Twe+0.020
 So+0.020
idents+0.019
Neg logit contributions
te-0.022
bridge-0.021
THE-0.020
 fant-0.019
 throne-0.019
Token his
Token ID465
Feature activation+1.16e-02

Pos logit contributions
te+0.035
bridge+0.035
THE+0.034
 fant+0.033
book+0.032
Neg logit contributions
 neither-0.032
 Instead-0.032
 So-0.031
 Twe-0.031
idents-0.029
Token tum
Token ID11814
Feature activation+3.19e-02

Pos logit contributions
 Instead+0.018
 Twe+0.017
Fine+0.016
 So+0.016
 neither+0.016
Neg logit contributions
 curtain-0.016
te-0.016
 throne-0.015
 calendar-0.015
 corner-0.015
Tokenmy
Token ID1820
Feature activation-1.74e-02

Pos logit contributions
 neither+0.049
 Twe+0.046
idents+0.046
 Instead+0.046
 So+0.045
Neg logit contributions
bridge-0.047
THE-0.047
te-0.046
book-0.045
 corner-0.042
Token ac
Token ID936
Feature activation-3.11e-02

Pos logit contributions
te+0.025
bridge+0.025
 around+0.024
THE+0.023
 corner+0.023
Neg logit contributions
 Instead-0.025
 Twe-0.025
 neither-0.025
 So-0.025
 Cr-0.024
Tokenhe
Token ID258
Feature activation-1.02e-01

Pos logit contributions
te+0.016
THE+0.016
bridge+0.016
book+0.013
 around+0.011
Neg logit contributions
 Instead-0.079
 So-0.077
 neither-0.077
 Twe-0.075
idents-0.073
Token.
Token ID13
Feature activation-9.77e-04

Pos logit contributions
te+0.164
bridge+0.161
THE+0.154
book+0.151
 throne+0.150
Neg logit contributions
 Instead-0.141
 neither-0.139
 So-0.136
 Twe-0.135
 Mitt-0.125
Token He
Token ID679
Feature activation+1.42e-02

Pos logit contributions
te+0.001
book+0.001
bridge+0.001
 fant+0.001
---+0.001
Neg logit contributions
 neither-0.002
idents-0.002
 mun-0.002
ucker-0.002
charged-0.002
Token felt
Token ID2936
Feature activation-2.22e-02

Pos logit contributions
 neither+0.020
 Instead+0.020
 Twe+0.020
 So+0.019
idents+0.018
Neg logit contributions
te-0.022
bridge-0.021
THE-0.020
 throne-0.020
 fant-0.019
Token his
Token ID465
Feature activation+1.17e-02

Pos logit contributions
te+0.034
bridge+0.033
THE+0.032
 fant+0.032
book+0.031
Neg logit contributions
 neither-0.031
 Instead-0.030
 So-0.030
 Twe-0.029
idents-0.027
Token head
Token ID1182
Feature activation-1.10e-02

Pos logit contributions
 Instead+0.017
Fine+0.017
 Twe+0.017
 So+0.016
 neither+0.016
Neg logit contributions
 curtain-0.015
te-0.015
 throne-0.015
 calendar-0.015
 corner-0.015
Token about
Token ID546
Feature activation-4.35e-02

Pos logit contributions
te+0.027
 calendar+0.025
THE+0.025
bridge+0.024
book+0.024
Neg logit contributions
 neither-0.034
 Twe-0.033
 Instead-0.033
 So-0.032
 Ow-0.030
Token her
Token ID607
Feature activation-1.53e-02

Pos logit contributions
te+0.060
 calendar+0.060
 fant+0.058
THE+0.057
 throne+0.056
Neg logit contributions
 neither-0.067
 Instead-0.066
 So-0.066
 Twe-0.060
idents-0.060
Token tum
Token ID11814
Feature activation+3.72e-02

Pos logit contributions
 calendar+0.022
te+0.022
 corner+0.021
 fant+0.020
THE+0.020
Neg logit contributions
 neither-0.023
 Twe-0.022
 Instead-0.022
 Sorry-0.021
 So-0.020
Tokenmy
Token ID1820
Feature activation-1.47e-02

Pos logit contributions
 neither+0.058
 Twe+0.056
 Instead+0.054
 So+0.053
 mun+0.053
Neg logit contributions
te-0.052
THE-0.052
book-0.051
bridge-0.051
ines-0.049
Token ac
Token ID936
Feature activation-3.13e-02

Pos logit contributions
te+0.020
 around+0.019
book+0.019
bridge+0.019
 calendar+0.018
Neg logit contributions
 neither-0.023
 Instead-0.022
 Twe-0.022
 So-0.022
idents-0.022
Tokenhe
Token ID258
Feature activation-1.01e-01

Pos logit contributions
te+0.031
THE+0.031
bridge+0.029
book+0.029
ines+0.028
Neg logit contributions
 neither-0.064
 Instead-0.062
 So-0.060
 Twe-0.059
idents-0.057
Token.
Token ID13
Feature activation-7.12e-03

Pos logit contributions
te+0.150
bridge+0.144
THE+0.139
book+0.138
 calendar+0.137
Neg logit contributions
 neither-0.153
 Instead-0.152
 Twe-0.147
 So-0.145
 Mitt-0.136
Token Her
Token ID2332
Feature activation+9.29e-03

Pos logit contributions
te+0.010
 calendar+0.009
 fant+0.009
 curtain+0.009
 throne+0.009
Neg logit contributions
 neither-0.011
 Twe-0.010
 Instead-0.010
 So-0.010
 mun-0.010
Token mom
Token ID1995
Feature activation-8.24e-03

Pos logit contributions
 neither+0.013
 Instead+0.013
 Twe+0.012
 So+0.012
 Mitt+0.011
Neg logit contributions
te-0.015
bridge-0.014
THE-0.014
 calendar-0.014
 fant-0.014
Tokenmy
Token ID1820
Feature activation-1.73e-02

Pos logit contributions
te+0.012
THE+0.011
book+0.011
bridge+0.011
 calendar+0.011
Neg logit contributions
 Twe-0.012
 neither-0.012
 Instead-0.012
 Ow-0.011
idents-0.011
Token said
Token ID531
Feature activation-2.77e-02

Pos logit contributions
te+0.028
bridge+0.025
THE+0.025
book+0.024
 calendar+0.024
Neg logit contributions
 Twe-0.023
 neither-0.023
 Instead-0.023
 Ow-0.022
 So-0.021
Token.
Token ID13
Feature activation+3.19e-03

Pos logit contributions
 Twe+0.005
 neither+0.005
mm+0.005
 Instead+0.005
 Ow+0.005
Neg logit contributions
te-0.005
 around-0.005
 downstairs-0.004
book-0.004
bridge-0.004
Token She
Token ID1375
Feature activation-1.96e-03

Pos logit contributions
 neither+0.004
 Twe+0.004
 Instead+0.004
 So+0.003
 Mitt+0.003
Neg logit contributions
te-0.005
 calendar-0.005
book-0.005
bridge-0.005
 fant-0.005
Token moved
Token ID3888
Feature activation+3.34e-03

Pos logit contributions
te+0.003
bridge+0.003
THE+0.003
 fant+0.003
 calendar+0.003
Neg logit contributions
 neither-0.003
 Instead-0.003
 Twe-0.003
 So-0.003
 Mitt-0.002
Token the
Token ID262
Feature activation-1.11e-02

Pos logit contributions
 neither+0.004
 Twe+0.004
 Instead+0.003
 Mitt+0.003
 So+0.003
Neg logit contributions
bridge-0.007
te-0.006
 throne-0.006
THE-0.006
 calendar-0.006
Token books
Token ID3835
Feature activation-5.66e-02

Pos logit contributions
te+0.017
bridge+0.017
THE+0.016
vy+0.016
eway+0.016
Neg logit contributions
 Instead-0.016
 So-0.016
 neither-0.015
 Mitt-0.015
 Ow-0.014
Tokenhe
Token ID258
Feature activation-1.01e-01

Pos logit contributions
bridge+0.071
te+0.069
 fant+0.069
THE+0.065
 throne+0.064
Neg logit contributions
 neither-0.101
 So-0.094
 Instead-0.094
 Twe-0.090
 Mitt-0.086
Tokenlf
Token ID1652
Feature activation-4.84e-02

Pos logit contributions
 fant+0.098
bridge+0.098
THE+0.096
 spectacular+0.094
vy+0.090
Neg logit contributions
 neither-0.186
 So-0.182
 Instead-0.175
 Tim-0.168
 Mitt-0.164
Token a
Token ID257
Feature activation-1.07e-02

Pos logit contributions
te+0.067
bridge+0.064
THE+0.062
 calendar+0.061
book+0.061
Neg logit contributions
 neither-0.077
 Instead-0.075
 Twe-0.074
 So-0.073
 Mitt-0.069
Token little
Token ID1310
Feature activation+3.64e-02

Pos logit contributions
bridge+0.021
te+0.020
THE+0.019
 fant+0.019
iet+0.018
Neg logit contributions
 Instead-0.011
 So-0.011
 neither-0.011
 Twe-0.011
 Mitt-0.010
Token and
Token ID290
Feature activation+8.17e-03

Pos logit contributions
 neither+0.050
 Twe+0.049
 Instead+0.046
mm+0.045
 Ow+0.045
Neg logit contributions
te-0.056
 corner-0.055
 calendar-0.055
 curtain-0.054
 downstairs-0.053
Token got
Token ID1392
Feature activation+6.96e-03

Pos logit contributions
 Instead+0.012
 neither+0.012
 So+0.012
 Twe+0.012
idents+0.011
Neg logit contributions
te-0.012
bridge-0.011
book-0.010
 fant-0.010
THE-0.010
Token she
Token ID673
Feature activation-2.39e-02

Pos logit contributions
te+0.013
THE+0.013
 calendar+0.013
 fant+0.013
bridge+0.012
Neg logit contributions
 Instead-0.009
 neither-0.009
 So-0.009
 Twe-0.009
ucker-0.008
Token always
Token ID1464
Feature activation-1.02e-03

Pos logit contributions
bridge+0.039
te+0.038
book+0.038
 calendar+0.037
 throne+0.037
Neg logit contributions
 neither-0.033
 Instead-0.031
 So-0.030
 Twe-0.030
 Mitt-0.028
Token kept
Token ID4030
Feature activation-6.08e-03

Pos logit contributions
te+0.001
book+0.001
bridge+0.001
 downstairs+0.001
 reporter+0.001
Neg logit contributions
 Instead-0.002
 neither-0.002
 Twe-0.002
 Ow-0.002
 So-0.002
Token her
Token ID607
Feature activation-1.97e-02

Pos logit contributions
te+0.009
bridge+0.009
THE+0.008
book+0.008
 fant+0.008
Neg logit contributions
 neither-0.009
 Instead-0.009
 Twe-0.009
 So-0.009
 Mitt-0.008
Token books
Token ID3835
Feature activation-4.82e-02

Pos logit contributions
bridge+0.029
te+0.028
THE+0.027
book+0.025
 fant+0.025
Neg logit contributions
 neither-0.031
 Instead-0.031
 So-0.030
 Mitt-0.028
 Twe-0.028
Tokenhe
Token ID258
Feature activation-1.00e-01

Pos logit contributions
te+0.059
bridge+0.059
 fant+0.057
THE+0.055
 throne+0.054
Neg logit contributions
 neither-0.083
 Instead-0.079
 Twe-0.078
 So-0.078
 Mitt-0.074
Tokenlf
Token ID1652
Feature activation-5.23e-02

Pos logit contributions
THE+0.097
bridge+0.094
 fant+0.092
 spectacular+0.091
 ferry+0.089
Neg logit contributions
 neither-0.184
 So-0.172
 Instead-0.168
 you-0.166
 Mitt-0.163
Token tidy
Token ID43044
Feature activation-2.10e-02

Pos logit contributions
te+0.071
bridge+0.071
THE+0.068
 fant+0.067
book+0.065
Neg logit contributions
 neither-0.083
 Instead-0.079
 Twe-0.078
 So-0.078
 Mitt-0.074
Token and
Token ID290
Feature activation-1.49e-02

Pos logit contributions
bridge+0.032
te+0.032
THE+0.031
 fant+0.030
 calendar+0.030
Neg logit contributions
 neither-0.030
 Instead-0.029
 So-0.028
 Twe-0.028
 Mitt-0.026
Token never
Token ID1239
Feature activation-2.89e-02

Pos logit contributions
te+0.020
bridge+0.019
THE+0.018
book+0.018
 fant+0.018
Neg logit contributions
 neither-0.025
 Instead-0.024
 So-0.024
 Twe-0.024
 Mitt-0.022
Token did
Token ID750
Feature activation-4.59e-03

Pos logit contributions
te+0.034
 around+0.034
 over+0.032
 downstairs+0.032
 tour+0.030
Neg logit contributions
 Ow-0.049
mm-0.048
idents-0.046
 Twe-0.046
 neither-0.046
Token books
Token ID3835
Feature activation-5.49e-02

Pos logit contributions
te+0.023
THE+0.023
 reporter+0.022
eway+0.022
iet+0.022
Neg logit contributions
 So-0.019
 Instead-0.019
 neither-0.018
 Twe-0.018
 you-0.017
Token back
Token ID736
Feature activation-2.86e-03

Pos logit contributions
 fant+0.067
te+0.066
bridge+0.065
iet+0.065
 orchestra+0.064
Neg logit contributions
 neither-0.095
 Instead-0.089
 Twe-0.088
 So-0.088
 Tim-0.079
Token on
Token ID319
Feature activation-2.82e-02

Pos logit contributions
te+0.003
bridge+0.003
THE+0.003
 fant+0.003
 calendar+0.003
Neg logit contributions
 neither-0.005
 Instead-0.005
 Twe-0.005
 So-0.005
 Mitt-0.005
Token the
Token ID262
Feature activation-7.18e-03

Pos logit contributions
te+0.047
bridge+0.047
THE+0.045
 fant+0.044
 calendar+0.044
Neg logit contributions
 neither-0.037
 Instead-0.036
 Twe-0.035
 So-0.034
 Mitt-0.032
Token books
Token ID3835
Feature activation-5.50e-02

Pos logit contributions
bridge+0.011
te+0.011
THE+0.010
iet+0.010
eway+0.010
Neg logit contributions
 So-0.010
 Instead-0.010
 Twe-0.009
 Mitt-0.009
idents-0.009
Tokenhe
Token ID258
Feature activation-1.00e-01

Pos logit contributions
te+0.059
bridge+0.057
THE+0.054
 fant+0.053
 calendar+0.052
Neg logit contributions
 neither-0.104
 Instead-0.102
 So-0.100
 Twe-0.100
 Mitt-0.094
Tokenlf
Token ID1652
Feature activation-4.93e-02

Pos logit contributions
THE+0.101
vy+0.099
 fant+0.098
 spectacular+0.097
te+0.096
Neg logit contributions
 So-0.175
 neither-0.172
 Instead-0.169
 Tim-0.158
 Twe-0.156
Token,
Token ID11
Feature activation+8.37e-04

Pos logit contributions
te+0.084
bridge+0.082
THE+0.079
book+0.078
 calendar+0.078
Neg logit contributions
 neither-0.062
 Instead-0.060
 Twe-0.059
 So-0.058
 Mitt-0.054
Token smiling
Token ID16755
Feature activation-2.10e-02

Pos logit contributions
 neither+0.001
 Instead+0.001
 Twe+0.001
 So+0.001
 Mitt+0.001
Neg logit contributions
te-0.002
bridge-0.001
THE-0.001
 calendar-0.001
book-0.001
Token proudly
Token ID23017
Feature activation+4.96e-04

Pos logit contributions
te+0.036
bridge+0.035
THE+0.034
 fant+0.034
 calendar+0.034
Neg logit contributions
 Instead-0.026
 neither-0.026
 So-0.025
 Twe-0.025
 Mitt-0.022
Token.
Token ID13
Feature activation+1.21e-02

Pos logit contributions
 neither+0.001
 Instead+0.001
 So+0.000
 Twe+0.000
 Mitt+0.000
Neg logit contributions
bridge-0.001
te-0.001
 fant-0.001
THE-0.001
 calendar-0.001
Token toy
Token ID13373
Feature activation+2.42e-03

Pos logit contributions
bridge+0.024
te+0.024
THE+0.024
book+0.022
vy+0.022
Neg logit contributions
 So-0.018
 Instead-0.018
 Twe-0.017
 Mitt-0.016
 neither-0.015
Token and
Token ID290
Feature activation+2.09e-03

Pos logit contributions
 neither+0.003
 Instead+0.003
 So+0.003
 Twe+0.003
 Mitt+0.003
Neg logit contributions
te-0.004
bridge-0.004
THE-0.004
book-0.004
 fant-0.004
Token poking
Token ID39048
Feature activation-5.13e-03

Pos logit contributions
 neither+0.003
 Instead+0.003
 Twe+0.003
 So+0.003
 Mitt+0.003
Neg logit contributions
te-0.003
bridge-0.003
THE-0.003
book-0.003
 calendar-0.003
Token the
Token ID262
Feature activation-1.06e-02

Pos logit contributions
te+0.008
bridge+0.007
THE+0.007
 fant+0.007
book+0.007
Neg logit contributions
 neither-0.008
 Instead-0.007
 Twe-0.007
 So-0.007
 Mitt-0.007
Token books
Token ID3835
Feature activation-5.47e-02

Pos logit contributions
te+0.016
THE+0.016
eway+0.015
vy+0.015
bridge+0.015
Neg logit contributions
 Instead-0.014
 So-0.014
 Twe-0.013
 Ow-0.013
idents-0.013
Tokenhe
Token ID258
Feature activation-1.00e-01

Pos logit contributions
te+0.075
 fant+0.074
bridge+0.071
THE+0.070
idi+0.069
Neg logit contributions
 neither-0.086
 Instead-0.082
 So-0.078
 Twe-0.077
 Mitt-0.070
Tokenlf
Token ID1652
Feature activation-4.92e-02

Pos logit contributions
THE+0.088
te+0.086
 fant+0.082
bridge+0.079
 carriage+0.078
Neg logit contributions
 neither-0.190
 Instead-0.186
 So-0.183
 Twe-0.180
 Tim-0.175
Token.
Token ID13
Feature activation+7.53e-03

Pos logit contributions
te+0.078
bridge+0.076
THE+0.075
 fant+0.073
book+0.072
Neg logit contributions
 neither-0.068
 Instead-0.065
 Twe-0.063
 So-0.063
 Mitt-0.058
Token She
Token ID1375
Feature activation-3.32e-03

Pos logit contributions
 neither+0.010
 Twe+0.009
 Instead+0.009
 So+0.008
 Mitt+0.008
Neg logit contributions
te-0.012
bridge-0.012
THE-0.012
book-0.012
 fant-0.012
Token was
Token ID373
Feature activation-3.69e-03

Pos logit contributions
bridge+0.006
te+0.006
THE+0.005
 calendar+0.005
book+0.005
Neg logit contributions
 neither-0.004
 Instead-0.004
 So-0.004
 Twe-0.004
 Mitt-0.004
Token having
Token ID1719
Feature activation-4.36e-02

Pos logit contributions
book+0.007
 calendar+0.007
bridge+0.007
te+0.006
iet+0.006
Neg logit contributions
 neither-0.005
 Twe-0.004
 Instead-0.004
 Mitt-0.004
 So-0.004
Token can
Token ID460
Feature activation+6.01e-03

Pos logit contributions
te+0.000
bridge+0.000
THE+0.000
 fant+0.000
book+0.000
Neg logit contributions
 neither-0.000
 Instead-0.000
 Twe-0.000
 So-0.000
 Mitt-0.000
Token't
Token ID470
Feature activation-2.51e-02

Pos logit contributions
 neither+0.009
 Twe+0.009
 Instead+0.009
ety+0.008
 Mitt+0.008
Neg logit contributions
te-0.009
 calendar-0.008
bridge-0.008
 throne-0.008
 curtain-0.008
Token reach
Token ID3151
Feature activation+9.99e-03

Pos logit contributions
te+0.035
bridge+0.032
 calendar+0.032
THE+0.031
 downstairs+0.031
Neg logit contributions
 neither-0.039
 Instead-0.038
 Twe-0.037
ucker-0.037
 Mitt-0.036
Token the
Token ID262
Feature activation-1.81e-02

Pos logit contributions
 neither+0.015
 Instead+0.015
 So+0.015
 Twe+0.014
idents+0.013
Neg logit contributions
te-0.014
 calendar-0.014
THE-0.014
bridge-0.014
 fant-0.013
Token books
Token ID3835
Feature activation-5.81e-02

Pos logit contributions
te+0.024
bridge+0.024
THE+0.023
 fant+0.021
iet+0.021
Neg logit contributions
 Instead-0.029
 neither-0.028
 So-0.028
 Twe-0.027
 Mitt-0.027
Tokenhe
Token ID258
Feature activation-1.00e-01

Pos logit contributions
te+0.055
bridge+0.054
THE+0.050
 fant+0.050
 calendar+0.048
Neg logit contributions
 neither-0.117
 Instead-0.115
 Twe-0.112
 So-0.112
 Mitt-0.106
Tokenlf
Token ID1652
Feature activation-5.24e-02

Pos logit contributions
THE+0.084
 fant+0.082
te+0.079
bridge+0.077
 spectacular+0.077
Neg logit contributions
 neither-0.194
 So-0.193
 Instead-0.193
 you-0.179
 Twe-0.179
Token.
Token ID13
Feature activation-5.43e-03

Pos logit contributions
te+0.083
bridge+0.081
THE+0.077
book+0.076
 calendar+0.076
Neg logit contributions
 neither-0.072
 Instead-0.071
 Twe-0.069
 So-0.069
 Mitt-0.065
Token Can
Token ID1680
Feature activation+9.70e-03

Pos logit contributions
te+0.009
bridge+0.008
 calendar+0.008
 downstairs+0.008
book+0.008
Neg logit contributions
 neither-0.007
ucker-0.007
ety-0.007
 Twe-0.006
 mun-0.006
Token you
Token ID345
Feature activation-2.37e-02

Pos logit contributions
ucker+0.014
eter+0.013
 Instead+0.013
 amounts+0.012
 mun+0.012
Neg logit contributions
te-0.014
THE-0.014
 around-0.012
 calendar-0.012
lf-0.012
Token reach
Token ID3151
Feature activation+4.81e-03

Pos logit contributions
te+0.036
bridge+0.034
 calendar+0.033
THE+0.033
 throne+0.033
Neg logit contributions
 neither-0.034
 Instead-0.033
 Twe-0.032
 So-0.031
idents-0.030
Token felt
Token ID2936
Feature activation-1.82e-02

Pos logit contributions
 neither+0.017
 Instead+0.017
 So+0.017
 Twe+0.017
 Mitt+0.015
Neg logit contributions
te-0.022
bridge-0.022
THE-0.021
book-0.020
 calendar-0.020
Token his
Token ID465
Feature activation+1.92e-02

Pos logit contributions
bridge+0.031
 calendar+0.031
te+0.030
iet+0.030
 throne+0.030
Neg logit contributions
 Instead-0.024
 Twe-0.023
 neither-0.023
 So-0.022
 Mitt-0.021
Token tum
Token ID11814
Feature activation+3.99e-02

Pos logit contributions
 neither+0.030
 Instead+0.029
 Twe+0.029
 So+0.028
 Mitt+0.027
Neg logit contributions
te-0.027
 calendar-0.027
 throne-0.026
 fant-0.025
bridge-0.025
Tokenmy
Token ID1820
Feature activation-1.24e-02

Pos logit contributions
 neither+0.061
 Twe+0.057
 Instead+0.055
idents+0.055
 So+0.055
Neg logit contributions
te-0.060
bridge-0.058
book-0.058
THE-0.057
ines-0.053
Token ac
Token ID936
Feature activation-2.77e-02

Pos logit contributions
te+0.018
bridge+0.018
THE+0.017
 calendar+0.017
 fant+0.017
Neg logit contributions
 neither-0.019
 Instead-0.018
 Twe-0.018
 So-0.018
 Mitt-0.016
Tokenhe
Token ID258
Feature activation-1.00e-01

Pos logit contributions
te+0.018
bridge+0.015
book+0.014
THE+0.014
 calendar+0.013
Neg logit contributions
 neither-0.068
 Instead-0.066
 Twe-0.065
 So-0.065
idents-0.062
Token starting
Token ID3599
Feature activation-3.33e-02

Pos logit contributions
te+0.151
bridge+0.146
THE+0.140
 calendar+0.138
book+0.138
Neg logit contributions
 neither-0.147
 Instead-0.144
 Twe-0.141
 So-0.139
 Mitt-0.130
Token to
Token ID284
Feature activation+2.80e-03

Pos logit contributions
te+0.049
bridge+0.048
THE+0.046
 fant+0.045
 calendar+0.045
Neg logit contributions
 neither-0.050
 Instead-0.049
 Twe-0.047
 So-0.047
 Mitt-0.044
Token feel
Token ID1254
Feature activation-2.32e-02

Pos logit contributions
 neither+0.004
 So+0.004
 Instead+0.004
 Twe+0.004
idents+0.004
Neg logit contributions
te-0.004
bridge-0.004
 fant-0.004
 throne-0.004
THE-0.004
Token a
Token ID257
Feature activation+1.16e-02

Pos logit contributions
te+0.037
bridge+0.036
 calendar+0.035
THE+0.035
book+0.034
Neg logit contributions
 neither-0.032
 Instead-0.032
 Twe-0.031
 So-0.030
 Mitt-0.029
Token bit
Token ID1643
Feature activation+5.44e-02

Pos logit contributions
 neither+0.014
 Instead+0.013
 Twe+0.013
 So+0.013
 Mitt+0.011
Neg logit contributions
te-0.021
bridge-0.020
THE-0.020
 fant-0.020
 calendar-0.020
Token it
Token ID340
Feature activation-8.35e-03

Pos logit contributions
te+0.004
bridge+0.004
THE+0.004
 calendar+0.004
 fant+0.004
Neg logit contributions
 neither-0.004
 Instead-0.004
 Twe-0.004
 So-0.004
 Mitt-0.003
Token back
Token ID736
Feature activation+7.84e-03

Pos logit contributions
bridge+0.010
 fant+0.010
te+0.010
 reporter+0.009
THE+0.009
Neg logit contributions
 neither-0.014
 Twe-0.014
 Instead-0.013
 So-0.013
 Mitt-0.012
Token on
Token ID319
Feature activation-2.50e-02

Pos logit contributions
 neither+0.012
 Instead+0.012
 Twe+0.012
 So+0.011
 Mitt+0.011
Neg logit contributions
te-0.011
bridge-0.011
THE-0.010
 calendar-0.010
book-0.010
Token the
Token ID262
Feature activation-6.99e-03

Pos logit contributions
te+0.041
bridge+0.040
THE+0.039
 calendar+0.038
book+0.038
Neg logit contributions
 neither-0.034
 Instead-0.032
 Twe-0.032
 So-0.031
 Mitt-0.029
Token books
Token ID3835
Feature activation-5.13e-02

Pos logit contributions
bridge+0.011
te+0.011
THE+0.010
vy+0.010
eway+0.010
Neg logit contributions
 So-0.009
 Instead-0.009
 Mitt-0.009
 Twe-0.009
 you-0.008
Tokenhe
Token ID258
Feature activation-9.98e-02

Pos logit contributions
te+0.051
bridge+0.049
THE+0.046
book+0.045
 calendar+0.045
Neg logit contributions
 neither-0.101
 Instead-0.099
 Twe-0.098
 So-0.096
 Mitt-0.093
Tokenlf
Token ID1652
Feature activation-4.77e-02

Pos logit contributions
vy+0.097
 spectacular+0.095
 fant+0.093
te+0.092
THE+0.091
Neg logit contributions
 So-0.179
 Instead-0.173
 neither-0.172
 you-0.165
mm-0.163
Token so
Token ID523
Feature activation+4.30e-03

Pos logit contributions
te+0.076
bridge+0.072
book+0.070
 calendar+0.069
THE+0.069
Neg logit contributions
 neither-0.067
 Instead-0.066
 Twe-0.066
 So-0.062
 Mitt-0.061
Token we
Token ID356
Feature activation-1.32e-03

Pos logit contributions
 Instead+0.006
 scrape+0.006
elt+0.006
 So+0.006
unch+0.005
Neg logit contributions
te-0.007
 around-0.006
THE-0.006
 calendar-0.006
 downstairs-0.006
Token can
Token ID460
Feature activation+9.86e-03

Pos logit contributions
te+0.002
bridge+0.002
THE+0.002
 calendar+0.002
book+0.002
Neg logit contributions
 neither-0.002
 Instead-0.002
 Twe-0.002
 So-0.002
 Mitt-0.002
Token find
Token ID1064
Feature activation-1.42e-02

Pos logit contributions
 neither+0.015
 Instead+0.014
 Twe+0.014
 So+0.014
 Mitt+0.013
Neg logit contributions
te-0.015
bridge-0.014
THE-0.014
 calendar-0.013
 throne-0.013
Token books
Token ID3835
Feature activation-4.99e-02

Pos logit contributions
 So+0.021
 Instead+0.021
 neither+0.020
 Twe+0.019
 Mitt+0.019
Neg logit contributions
te-0.021
bridge-0.021
THE-0.021
 reporter-0.019
 fant-0.019
Token back
Token ID736
Feature activation+1.82e-03

Pos logit contributions
 fant+0.061
bridge+0.061
te+0.060
 calendar+0.058
 throne+0.057
Neg logit contributions
 neither-0.087
 So-0.081
 Twe-0.081
 Instead-0.080
 Mitt-0.072
Token on
Token ID319
Feature activation-2.64e-02

Pos logit contributions
 neither+0.003
 Instead+0.003
 Twe+0.003
 So+0.003
 Mitt+0.003
Neg logit contributions
te-0.002
bridge-0.002
THE-0.002
 fant-0.002
book-0.002
Token the
Token ID262
Feature activation-4.61e-03

Pos logit contributions
te+0.043
bridge+0.042
THE+0.040
book+0.039
 calendar+0.039
Neg logit contributions
 neither-0.036
 Instead-0.035
 Twe-0.034
 So-0.034
 Mitt-0.031
Token books
Token ID3835
Feature activation-5.56e-02

Pos logit contributions
bridge+0.007
te+0.007
THE+0.007
vy+0.007
eway+0.007
Neg logit contributions
 So-0.006
 Instead-0.006
idents-0.005
 Mitt-0.005
 Twe-0.005
Tokenhe
Token ID258
Feature activation-9.96e-02

Pos logit contributions
te+0.052
bridge+0.051
THE+0.047
 fant+0.046
 calendar+0.046
Neg logit contributions
 neither-0.113
 Instead-0.110
 Twe-0.108
 So-0.108
 Mitt-0.103
Tokenlf
Token ID1652
Feature activation-4.86e-02

Pos logit contributions
 spectacular+0.095
THE+0.095
 fant+0.094
bridge+0.094
vy+0.093
Neg logit contributions
 So-0.179
 neither-0.177
 Instead-0.174
idents-0.161
 Twe-0.160
Token,
Token ID11
Feature activation-4.74e-03

Pos logit contributions
te+0.067
bridge+0.064
book+0.062
 calendar+0.061
 throne+0.061
Neg logit contributions
 neither-0.077
 Twe-0.076
 Instead-0.075
 So-0.072
 Mitt-0.071
Token but
Token ID475
Feature activation+2.96e-04

Pos logit contributions
te+0.008
bridge+0.008
THE+0.007
book+0.007
 fant+0.007
Neg logit contributions
 neither-0.006
 Instead-0.006
 Twe-0.006
 So-0.006
 Mitt-0.006
Token this
Token ID428
Feature activation-1.21e-02

Pos logit contributions
 neither+0.000
 Instead+0.000
 So+0.000
 Twe+0.000
 Mitt+0.000
Neg logit contributions
te-0.001
bridge-0.000
THE-0.000
 fant-0.000
book-0.000
Token time
Token ID640
Feature activation+1.90e-02

Pos logit contributions
te+0.015
THE+0.015
bridge+0.014
 fant+0.014
 throne+0.014
Neg logit contributions
 Instead-0.020
 So-0.020
 neither-0.020
 Twe-0.018
 Mitt-0.017
Token down
Token ID866
Feature activation-2.26e-02

Pos logit contributions
book+0.018
 calendar+0.017
te+0.017
 reporter+0.017
ibrarian+0.017
Neg logit contributions
 Twe-0.017
 B-0.015
 So-0.015
 neither-0.015
 Tim-0.015
Token the
Token ID262
Feature activation-9.45e-03

Pos logit contributions
te+0.039
bridge+0.038
THE+0.036
book+0.036
 calendar+0.036
Neg logit contributions
 neither-0.029
 Instead-0.028
 Twe-0.027
 So-0.027
 Mitt-0.025
Token slide
Token ID10649
Feature activation-1.58e-02

Pos logit contributions
te+0.017
book+0.016
bridge+0.016
 fant+0.015
THE+0.015
Neg logit contributions
 neither-0.012
 So-0.011
 Instead-0.011
 Twe-0.011
 Mitt-0.010
Token-
Token ID12
Feature activation-3.25e-03

Pos logit contributions
te+0.025
bridge+0.024
THE+0.023
book+0.022
 calendar+0.022
Neg logit contributions
 neither-0.022
 Instead-0.022
 Twe-0.021
 So-0.021
 Mitt-0.020
Tokenbat
Token ID8664
Feature activation-1.23e-02

Pos logit contributions
te+0.004
bridge+0.004
THE+0.004
book+0.003
 around+0.003
Neg logit contributions
 neither-0.006
 Instead-0.006
 So-0.006
 Twe-0.006
 Mitt-0.006
Tokenhe
Token ID258
Feature activation-9.95e-02

Pos logit contributions
te+0.010
bridge+0.010
THE+0.009
 fant+0.009
 calendar+0.009
Neg logit contributions
 neither-0.026
 Instead-0.026
 Twe-0.025
 So-0.025
 Mitt-0.024
Token feeling
Token ID4203
Feature activation-2.77e-02

Pos logit contributions
te+0.133
bridge+0.129
THE+0.126
 fant+0.125
 calendar+0.120
Neg logit contributions
 neither-0.163
 Instead-0.151
 So-0.151
 Twe-0.150
 Mitt-0.139
Token the
Token ID262
Feature activation+8.23e-03

Pos logit contributions
te+0.042
 calendar+0.042
bridge+0.041
book+0.040
 throne+0.040
Neg logit contributions
 neither-0.041
 Twe-0.039
 Instead-0.038
 So-0.037
 Mitt-0.036
Token water
Token ID1660
Feature activation+2.45e-02

Pos logit contributions
 Instead+0.013
 neither+0.013
 So+0.013
 Twe+0.013
 Mitt+0.012
Neg logit contributions
bridge-0.011
te-0.011
THE-0.010
 carriage-0.010
 throne-0.010
Token on
Token ID319
Feature activation-4.78e-03

Pos logit contributions
 neither+0.043
 Instead+0.042
 Twe+0.041
 So+0.040
 Mitt+0.038
Neg logit contributions
te-0.030
bridge-0.029
THE-0.028
 fant-0.027
 calendar-0.027
Token her
Token ID607
Feature activation+3.02e-03

Pos logit contributions
te+0.007
bridge+0.007
THE+0.007
book+0.006
 fant+0.006
Neg logit contributions
 neither-0.007
 Twe-0.007
 Instead-0.007
 So-0.007
 Mitt-0.007
TokenThey
Token ID2990
Feature activation+1.61e-03

Pos logit contributions
te+0.005
bridge+0.005
 calendar+0.005
 fant+0.005
book+0.005
Neg logit contributions
 neither-0.007
 Instead-0.007
 So-0.007
 Twe-0.006
 Mitt-0.006
Token ran
Token ID4966
Feature activation-1.62e-02

Pos logit contributions
 neither+0.002
 Instead+0.002
 Twe+0.002
 So+0.002
 Mitt+0.002
Neg logit contributions
te-0.003
bridge-0.003
THE-0.003
 fant-0.003
 calendar-0.003
Token to
Token ID284
Feature activation-1.37e-02

Pos logit contributions
bridge+0.033
book+0.030
te+0.029
 fant+0.028
 calendar+0.028
Neg logit contributions
 neither-0.021
 Instead-0.019
 Twe-0.019
 So-0.019
 Tim-0.018
Token the
Token ID262
Feature activation-1.05e-02

Pos logit contributions
te+0.019
 fant+0.018
 calendar+0.018
THE+0.017
 curtain+0.017
Neg logit contributions
 neither-0.021
 Instead-0.021
 Twe-0.020
 So-0.020
ucker-0.019
Token books
Token ID3835
Feature activation-5.68e-02

Pos logit contributions
te+0.018
book+0.017
bridge+0.016
iet+0.016
 fant+0.015
Neg logit contributions
 Mitt-0.014
 Twe-0.014
 Instead-0.014
 So-0.013
 scrape-0.013
Tokenhe
Token ID258
Feature activation-9.94e-02

Pos logit contributions
 fant+0.088
te+0.087
bridge+0.087
THE+0.083
 calendar+0.080
Neg logit contributions
 neither-0.082
 So-0.078
 Instead-0.077
 Twe-0.070
 Mitt-0.066
Tokenlf
Token ID1652
Feature activation-4.79e-02

Pos logit contributions
 fant+0.082
te+0.080
bridge+0.076
THE+0.076
 spectacular+0.072
Neg logit contributions
 neither-0.202
 So-0.190
 Twe-0.186
 Instead-0.185
 Mitt-0.179
Token and
Token ID290
Feature activation-5.37e-04

Pos logit contributions
te+0.072
bridge+0.070
THE+0.067
book+0.066
 calendar+0.066
Neg logit contributions
 neither-0.070
 Instead-0.069
 Twe-0.068
 So-0.066
 Mitt-0.063
Token put
Token ID1234
Feature activation-1.25e-02

Pos logit contributions
te+0.001
bridge+0.001
book+0.001
THE+0.001
 throne+0.001
Neg logit contributions
 neither-0.001
 Instead-0.001
 Twe-0.001
 So-0.001
idents-0.001
Token the
Token ID262
Feature activation-1.32e-02

Pos logit contributions
te+0.015
bridge+0.015
book+0.013
 around+0.013
 calendar+0.013
Neg logit contributions
 Instead-0.022
 So-0.021
 neither-0.021
 Twe-0.021
idents-0.020
Token box
Token ID3091
Feature activation-1.37e-02

Pos logit contributions
te+0.023
THE+0.022
 fant+0.021
book+0.021
bridge+0.021
Neg logit contributions
 neither-0.017
 Twe-0.016
 Ow-0.015
 Mitt-0.015
 Instead-0.015
Token it
Token ID340
Feature activation+1.09e-02

Pos logit contributions
te+0.026
bridge+0.026
THE+0.026
 fant+0.025
 throne+0.024
Neg logit contributions
 Instead-0.024
 So-0.024
 neither-0.024
idents-0.023
 Twe-0.022
Token wasn
Token ID2492
Feature activation-1.22e-02

Pos logit contributions
 Instead+0.017
 neither+0.017
 So+0.016
 Twe+0.016
ucker+0.016
Neg logit contributions
te-0.015
book-0.013
 fant-0.013
bridge-0.013
 around-0.013
Token't
Token ID470
Feature activation-1.40e-02

Pos logit contributions
te+0.020
bridge+0.020
THE+0.019
 fant+0.018
 calendar+0.018
Neg logit contributions
 neither-0.016
 Instead-0.016
 So-0.015
 Twe-0.015
 Mitt-0.014
Token safe
Token ID3338
Feature activation+2.24e-02

Pos logit contributions
te+0.019
bridge+0.018
THE+0.018
 fant+0.017
book+0.017
Neg logit contributions
 neither-0.023
 Instead-0.022
 Twe-0.022
 So-0.022
 Mitt-0.020
Token.
Token ID13
Feature activation+7.49e-03

Pos logit contributions
 neither+0.029
 Instead+0.028
 Twe+0.028
 So+0.027
 Mitt+0.025
Neg logit contributions
te-0.038
bridge-0.037
THE-0.035
 fant-0.035
book-0.035
Token So
Token ID1406
Feature activation+3.99e-02

Pos logit contributions
 neither+0.015
idents+0.013
 scrape+0.013
charged+0.013
ucker+0.013
Neg logit contributions
 around-0.007
 corner-0.007
THE-0.006
te-0.006
idi-0.006
Token he
Token ID339
Feature activation+4.79e-03

Pos logit contributions
ucker+0.058
 neither+0.058
 Instead+0.057
idents+0.057
ety+0.056
Neg logit contributions
te-0.056
THE-0.053
bridge-0.052
 throne-0.052
 fant-0.052
Token put
Token ID1234
Feature activation-2.70e-03

Pos logit contributions
 neither+0.007
 Instead+0.007
 Twe+0.007
 So+0.006
idents+0.006
Neg logit contributions
te-0.007
bridge-0.007
THE-0.006
idi-0.006
 fant-0.006
Token it
Token ID340
Feature activation-3.28e-03

Pos logit contributions
te+0.003
bridge+0.003
THE+0.003
book+0.003
 around+0.003
Neg logit contributions
 neither-0.005
 Instead-0.005
 So-0.004
 Twe-0.004
idents-0.004
Token away
Token ID1497
Feature activation-2.27e-03

Pos logit contributions
te+0.004
bridge+0.004
 fant+0.003
 calendar+0.003
 orchestra+0.003
Neg logit contributions
 neither-0.006
 Twe-0.006
 Instead-0.006
 So-0.006
 Mitt-0.006
Token and
Token ID290
Feature activation-2.07e-03

Pos logit contributions
te+0.003
bridge+0.003
THE+0.003
book+0.003
 fant+0.003
Neg logit contributions
 neither-0.004
 Instead-0.004
 Twe-0.003
 So-0.003
 Mitt-0.003
Token Tim
Token ID5045
Feature activation+4.32e-04

Pos logit contributions
 around+0.004
 fant+0.004
THE+0.004
 over+0.004
te+0.004
Neg logit contributions
ucker-0.004
uck-0.003
unn-0.003
our-0.003
Fine-0.003
Tokenmy
Token ID1820
Feature activation-4.68e-03

Pos logit contributions
 neither+0.000
 Instead+0.000
 Twe+0.000
 So+0.000
 Mitt+0.000
Neg logit contributions
te-0.001
bridge-0.001
THE-0.001
 fant-0.001
 calendar-0.001
Token promised
Token ID8072
Feature activation-1.92e-02

Pos logit contributions
bridge+0.007
te+0.007
book+0.007
 calendar+0.007
 throne+0.007
Neg logit contributions
 neither-0.007
 Instead-0.006
 Twe-0.006
 So-0.006
 Mitt-0.006
Token to
Token ID284
Feature activation-1.54e-02

Pos logit contributions
THE+0.013
te+0.013
idi+0.011
 tour+0.011
 around+0.011
Neg logit contributions
 neither-0.041
 Sorry-0.041
idents-0.040
 Instead-0.040
ucker-0.039
Token only
Token ID691
Feature activation-4.65e-04

Pos logit contributions
te+0.018
THE+0.016
 throne+0.016
 tour+0.016
bridge+0.015
Neg logit contributions
 neither-0.028
 Instead-0.027
 So-0.026
idents-0.026
 Twe-0.025
Token shoot
Token ID2686
Feature activation+3.74e-02

Pos logit contributions
te+0.001
bridge+0.001
 calendar+0.001
 throne+0.001
THE+0.001
Neg logit contributions
 neither-0.001
 Instead-0.001
 So-0.001
 Twe-0.001
idents-0.001
Token at
Token ID379
Feature activation-1.09e-02

Pos logit contributions
 neither+0.057
 Instead+0.055
 Twe+0.054
 So+0.054
idents+0.051
Neg logit contributions
te-0.053
bridge-0.052
THE-0.049
 fant-0.049
 calendar-0.048
Token pretend
Token ID16614
Feature activation-1.91e-02

Pos logit contributions
 around+0.013
 calendar+0.013
te+0.012
 corner+0.012
THE+0.012
Neg logit contributions
 So-0.019
 Instead-0.019
ucker-0.018
idents-0.018
Fine-0.017
Token bad
Token ID2089
Feature activation-6.87e-03

Pos logit contributions
te+0.031
bridge+0.031
THE+0.030
book+0.029
 calendar+0.029
Neg logit contributions
 neither-0.026
 Instead-0.025
 Twe-0.024
 So-0.024
 Mitt-0.022
Token guys
Token ID3730
Feature activation-1.13e-02

Pos logit contributions
 calendar+0.008
bridge+0.007
 tour+0.007
te+0.007
 around+0.007
Neg logit contributions
 So-0.012
 Instead-0.012
 neither-0.012
ucker-0.012
 Twe-0.011
Token and
Token ID290
Feature activation-8.48e-03

Pos logit contributions
te+0.017
bridge+0.017
THE+0.016
 fant+0.016
 calendar+0.016
Neg logit contributions
 neither-0.016
 Instead-0.015
 Twe-0.015
 So-0.015
 Mitt-0.014
Token before
Token ID878
Feature activation+1.00e-02

Pos logit contributions
 neither+0.029
 Instead+0.027
 So+0.026
 Twe+0.026
 Mitt+0.023
Neg logit contributions
bridge-0.049
te-0.048
THE-0.047
 calendar-0.046
 fant-0.046
Token they
Token ID484
Feature activation+5.36e-03

Pos logit contributions
 Instead+0.014
 neither+0.014
 So+0.014
 Twe+0.014
ety+0.013
Neg logit contributions
te-0.015
THE-0.014
 fant-0.014
bridge-0.014
 calendar-0.014
Token could
Token ID714
Feature activation+2.33e-02

Pos logit contributions
 neither+0.008
 Instead+0.008
 Twe+0.008
 So+0.008
 Mitt+0.007
Neg logit contributions
te-0.008
bridge-0.008
THE-0.007
book-0.007
 throne-0.007
Token board
Token ID3096
Feature activation+3.12e-02

Pos logit contributions
 neither+0.033
 Instead+0.032
 Twe+0.031
 So+0.031
 Mitt+0.029
Neg logit contributions
te-0.037
bridge-0.036
THE-0.034
 calendar-0.034
 fant-0.034
Token their
Token ID511
Feature activation+1.93e-02

Pos logit contributions
 neither+0.044
 Instead+0.043
 Twe+0.042
 So+0.041
 Mitt+0.038
Neg logit contributions
te-0.048
bridge-0.047
THE-0.045
 calendar-0.045
 fant-0.045
Token plane
Token ID6614
Feature activation+3.80e-02

Pos logit contributions
 neither+0.032
 Twe+0.030
 So+0.030
 Instead+0.030
idents+0.028
Neg logit contributions
te-0.026
bridge-0.025
book-0.024
THE-0.024
 fant-0.023
Token.
Token ID13
Feature activation+8.92e-03

Pos logit contributions
 neither+0.048
 Instead+0.047
 Twe+0.046
 So+0.045
 Mitt+0.041
Neg logit contributions
te-0.064
bridge-0.063
THE-0.061
 fant-0.060
book-0.060
Token Lily
Token ID20037
Feature activation+9.94e-03

Pos logit contributions
 neither+0.011
 Twe+0.009
 mun+0.009
idents+0.009
 scrape+0.009
Neg logit contributions
te-0.015
 fant-0.015
book-0.015
THE-0.015
bridge-0.014
Token was
Token ID373
Feature activation+5.41e-03

Pos logit contributions
 neither+0.013
 Instead+0.012
 Twe+0.012
 So+0.012
 Mitt+0.011
Neg logit contributions
te-0.017
bridge-0.017
THE-0.016
book-0.016
 calendar-0.016
Token bored
Token ID17533
Feature activation+1.36e-03

Pos logit contributions
 neither+0.008
 Twe+0.007
 Instead+0.007
 So+0.006
 Mitt+0.006
Neg logit contributions
THE-0.009
book-0.009
 calendar-0.009
bridge-0.009
te-0.009
Token and
Token ID290
Feature activation+2.12e-03

Pos logit contributions
 neither+0.002
 Instead+0.002
 So+0.002
 Twe+0.002
 Mitt+0.002
Neg logit contributions
te-0.002
bridge-0.002
THE-0.002
 fant-0.002
 calendar-0.002
Tokenummy
Token ID13513
Feature activation+1.37e-02

Pos logit contributions
 neither+0.050
 Twe+0.045
 Instead+0.042
 So+0.040
 Mitt+0.039
Neg logit contributions
te-0.059
 calendar-0.059
ines-0.058
book-0.057
bridge-0.057
Token treats
Token ID18432
Feature activation+5.37e-03

Pos logit contributions
 neither+0.016
 Instead+0.016
 So+0.015
 Twe+0.015
 Mitt+0.014
Neg logit contributions
te-0.024
bridge-0.024
THE-0.023
 fant-0.022
book-0.022
Token!
Token ID0
Feature activation+5.88e-03

Pos logit contributions
 neither+0.006
 Twe+0.005
 Instead+0.005
 So+0.005
 Mitt+0.005
Neg logit contributions
bridge-0.010
te-0.010
 fant-0.010
THE-0.010
 throne-0.009
Token
Token ID198
Feature activation-1.84e-03

Pos logit contributions
 neither+0.010
idents+0.009
 Twe+0.009
 scrape+0.008
ucker+0.008
Neg logit contributions
te-0.008
THE-0.008
book-0.007
 fant-0.007
 downstairs-0.007
Token
Token ID198
Feature activation+1.75e-03

Pos logit contributions
bridge+0.003
te+0.003
 calendar+0.003
 throne+0.003
THE+0.003
Neg logit contributions
 Instead-0.002
 neither-0.002
 So-0.002
 Twe-0.002
 Mitt-0.002
TokenCl
Token ID2601
Feature activation+4.61e-02

Pos logit contributions
 So+0.002
 Instead+0.002
 neither+0.002
 Twe+0.002
Fine+0.002
Neg logit contributions
 throne-0.003
bridge-0.003
 calendar-0.003
te-0.003
 curtain-0.003
Tokenara
Token ID3301
Feature activation+1.56e-02

Pos logit contributions
 neither+0.071
 Twe+0.070
 Mitt+0.065
 Bugs+0.065
 Tweet+0.063
Neg logit contributions
te-0.073
THE-0.067
book-0.067
iet-0.063
 calendar-0.061
Token gently
Token ID15165
Feature activation+1.49e-04

Pos logit contributions
 neither+0.020
 Instead+0.020
 So+0.020
 Twe+0.019
 Mitt+0.018
Neg logit contributions
te-0.026
bridge-0.026
THE-0.025
 fant-0.024
 calendar-0.024
Token touched
Token ID12615
Feature activation+2.44e-02

Pos logit contributions
 neither+0.000
 Instead+0.000
 Twe+0.000
 So+0.000
idents+0.000
Neg logit contributions
te-0.000
bridge-0.000
book-0.000
 throne-0.000
 fant-0.000
Token the
Token ID262
Feature activation-1.04e-02

Pos logit contributions
 neither+0.037
 Instead+0.036
 Twe+0.036
 So+0.036
idents+0.035
Neg logit contributions
te-0.034
 calendar-0.032
bridge-0.032
book-0.032
 around-0.032
Token door
Token ID3420
Feature activation-2.21e-02

Pos logit contributions
te+0.017
bridge+0.016
THE+0.016
 fant+0.016
book+0.015
Neg logit contributions
 Instead-0.014
 neither-0.014
 So-0.013
 Twe-0.013
 Mitt-0.013
Token others
Token ID1854
Feature activation+1.04e-02

Pos logit contributions
 neither+0.004
 Twe+0.004
 Instead+0.004
 So+0.004
idents+0.004
Neg logit contributions
te-0.004
 calendar-0.004
THE-0.004
 around-0.004
 curtain-0.004
Token made
Token ID925
Feature activation+6.82e-03

Pos logit contributions
 neither+0.014
 Instead+0.014
 So+0.013
 Twe+0.013
 Mitt+0.012
Neg logit contributions
te-0.017
bridge-0.017
THE-0.016
 fant-0.016
book-0.016
Token her
Token ID607
Feature activation-6.33e-03

Pos logit contributions
 neither+0.012
 Instead+0.012
 mun+0.012
 So+0.011
elt+0.011
Neg logit contributions
 calendar-0.008
 around-0.007
te-0.007
 fant-0.007
THE-0.007
Token happy
Token ID3772
Feature activation-2.09e-02

Pos logit contributions
te+0.010
THE+0.009
 calendar+0.009
bridge+0.009
 fant+0.009
Neg logit contributions
 Instead-0.009
 neither-0.009
 Twe-0.009
 So-0.008
 Ow-0.008
Token and
Token ID290
Feature activation+9.56e-03

Pos logit contributions
te+0.036
bridge+0.035
THE+0.034
 calendar+0.033
 fant+0.033
Neg logit contributions
 neither-0.026
 Twe-0.025
 Instead-0.025
 So-0.024
 Mitt-0.023
Token helped
Token ID4193
Feature activation+3.72e-02

Pos logit contributions
 neither+0.015
 Twe+0.015
 Instead+0.015
 So+0.015
 Ow+0.015
Neg logit contributions
te-0.012
THE-0.012
 around-0.011
 fant-0.011
 calendar-0.011
Token her
Token ID607
Feature activation-5.72e-03

Pos logit contributions
 neither+0.063
 mun+0.059
ucker+0.059
�+0.059
mm+0.059
Neg logit contributions
 around-0.045
te-0.044
 calendar-0.042
 over-0.041
 downstairs-0.040
Token make
Token ID787
Feature activation+1.05e-02

Pos logit contributions
te+0.008
THE+0.008
 calendar+0.008
 throne+0.008
bridge+0.007
Neg logit contributions
 Twe-0.008
 Instead-0.008
 neither-0.008
 So-0.008
 Ow-0.008
Token new
Token ID649
Feature activation+3.37e-03

Pos logit contributions
 neither+0.019
elt+0.018
�+0.018
ety+0.018
 So+0.017
Neg logit contributions
 fant-0.012
 amazing-0.012
 calendar-0.011
 around-0.011
THE-0.011
Token friends
Token ID2460
Feature activation-1.65e-02

Pos logit contributions
 neither+0.005
 Twe+0.004
 Instead+0.004
 Ow+0.004
 Mitt+0.004
Neg logit contributions
te-0.005
 calendar-0.005
 fant-0.005
THE-0.005
bridge-0.005
Token.
Token ID13
Feature activation+5.39e-03

Pos logit contributions
te+0.030
bridge+0.029
book+0.028
 calendar+0.028
THE+0.028
Neg logit contributions
 neither-0.019
 Instead-0.019
 So-0.018
 Twe-0.018
 Mitt-0.016
Token toy
Token ID13373
Feature activation+6.00e-03

Pos logit contributions
te+0.008
bridge+0.007
 calendar+0.007
THE+0.007
book+0.007
Neg logit contributions
 neither-0.011
 Instead-0.011
 Twe-0.010
 So-0.010
 Mitt-0.010
Token.
Token ID13
Feature activation+6.61e-03

Pos logit contributions
 neither+0.008
 Instead+0.007
 Twe+0.007
 So+0.007
 Mitt+0.006
Neg logit contributions
te-0.010
bridge-0.010
THE-0.010
 fant-0.010
book-0.010
Token And
Token ID843
Feature activation+2.36e-02

Pos logit contributions
 neither+0.010
idents+0.009
 Twe+0.009
 mun+0.008
ety+0.008
Neg logit contributions
te-0.010
bridge-0.010
book-0.009
THE-0.009
 fant-0.009
Token be
Token ID307
Feature activation+1.90e-02

Pos logit contributions
 Instead+0.027
 Twe+0.026
 neither+0.026
 Ow+0.025
 So+0.025
Neg logit contributions
te-0.040
 calendar-0.038
 around-0.037
THE-0.037
 fant-0.037
Token careful
Token ID8161
Feature activation+1.09e-03

Pos logit contributions
 neither+0.024
 Instead+0.024
 So+0.022
 Twe+0.022
 Mitt+0.022
Neg logit contributions
te-0.032
bridge-0.032
 calendar-0.031
 throne-0.031
 fant-0.031
Token with
Token ID351
Feature activation+9.76e-03

Pos logit contributions
 neither+0.001
 Instead+0.001
 Twe+0.001
 So+0.001
 Mitt+0.001
Neg logit contributions
te-0.002
bridge-0.002
THE-0.002
 fant-0.002
book-0.002
Token it
Token ID340
Feature activation+1.34e-02

Pos logit contributions
 neither+0.014
 So+0.014
 Instead+0.014
 Twe+0.013
idents+0.013
Neg logit contributions
te-0.014
 calendar-0.014
bridge-0.013
 curtain-0.013
THE-0.013
Token.
Token ID13
Feature activation+6.96e-03

Pos logit contributions
 neither+0.017
 Instead+0.017
 Twe+0.017
 So+0.016
 Mitt+0.015
Neg logit contributions
te-0.023
bridge-0.021
THE-0.021
book-0.021
 calendar-0.021
Token It
Token ID632
Feature activation+2.65e-02

Pos logit contributions
 neither+0.010
 mun+0.009
idents+0.009
 Twe+0.009
 scrape+0.009
Neg logit contributions
te-0.011
bridge-0.010
THE-0.009
book-0.009
 fant-0.009
Token can
Token ID460
Feature activation+1.27e-02

Pos logit contributions
 neither+0.040
 Instead+0.039
 Twe+0.039
 So+0.038
 Mitt+0.036
Neg logit contributions
te-0.038
bridge-0.036
 calendar-0.035
 fant-0.035
THE-0.035
Token break
Token ID2270
Feature activation+2.31e-02

Pos logit contributions
 Instead+0.017
 neither+0.017
 So+0.017
 Twe+0.015
idents+0.015
Neg logit contributions
te-0.020
bridge-0.020
 throne-0.019
 calendar-0.019
THE-0.019
Token way
Token ID835
Feature activation+1.91e-02

Pos logit contributions
 neither+0.005
 Instead+0.004
 So+0.004
 Twe+0.004
 Mitt+0.004
Neg logit contributions
te-0.007
bridge-0.007
book-0.007
THE-0.007
 fant-0.007
Token.
Token ID13
Feature activation+9.01e-03

Pos logit contributions
 neither+0.023
 Instead+0.022
 Twe+0.021
 So+0.021
 Mitt+0.019
Neg logit contributions
te-0.034
bridge-0.033
THE-0.032
book-0.032
 fant-0.032
Token
Token ID198
Feature activation-5.62e-03

Pos logit contributions
 neither+0.013
 Twe+0.012
idents+0.011
 scrape+0.011
 Instead+0.011
Neg logit contributions
te-0.013
THE-0.013
book-0.013
bridge-0.012
 fant-0.012
Token
Token ID198
Feature activation-2.13e-03

Pos logit contributions
te+0.011
bridge+0.011
 calendar+0.010
 throne+0.010
THE+0.010
Neg logit contributions
 neither-0.006
 Instead-0.006
 So-0.006
 Twe-0.005
 Mitt-0.005
TokenAs
Token ID1722
Feature activation+1.97e-02

Pos logit contributions
 throne+0.004
te+0.004
 calendar+0.004
bridge+0.004
 carriage+0.004
Neg logit contributions
 So-0.003
 Instead-0.003
 Mitt-0.002
 neither-0.002
Fine-0.002
Token Sam
Token ID3409
Feature activation+5.72e-04

Pos logit contributions
 neither+0.020
 Instead+0.019
 Twe+0.018
 So+0.018
 Mitt+0.016
Neg logit contributions
te-0.039
bridge-0.038
THE-0.037
 fant-0.036
 calendar-0.036
Token j
Token ID474
Feature activation-7.59e-03

Pos logit contributions
 neither+0.001
 Instead+0.001
 Twe+0.001
 So+0.001
 Mitt+0.001
Neg logit contributions
te-0.001
bridge-0.001
THE-0.001
 calendar-0.001
book-0.001
Tokenogged
Token ID42545
Feature activation+1.90e-02

Pos logit contributions
te+0.012
bridge+0.011
book+0.011
THE+0.011
ines+0.010
Neg logit contributions
 neither-0.011
 Instead-0.011
 Twe-0.010
idents-0.010
 So-0.010
Token,
Token ID11
Feature activation-8.12e-03

Pos logit contributions
 neither+0.019
 Twe+0.019
 So+0.017
 Instead+0.017
 B+0.016
Neg logit contributions
bridge-0.040
book-0.039
te-0.038
 calendar-0.037
iet-0.036
Token she
Token ID673
Feature activation-2.27e-02

Pos logit contributions
bridge+0.012
te+0.012
THE+0.011
 fant+0.011
 calendar+0.011
Neg logit contributions
 Instead-0.012
 So-0.011
 Twe-0.011
ucker-0.011
 neither-0.011
Token had
Token ID550
Feature activation-1.37e-02

Pos logit contributions
te+0.037
bridge+0.036
THE+0.034
book+0.034
 calendar+0.034
Neg logit contributions
 neither-0.031
 Instead-0.030
 Twe-0.029
 So-0.029
 Mitt-0.027
Token so
Token ID523
Feature activation+1.96e-02

Pos logit contributions
 neither+0.008
 Instead+0.008
 Twe+0.008
 So+0.007
 Mitt+0.007
Neg logit contributions
 calendar-0.012
book-0.012
bridge-0.012
te-0.012
iet-0.012
Token happy
Token ID3772
Feature activation+3.67e-03

Pos logit contributions
 neither+0.036
 Instead+0.035
 Twe+0.034
 So+0.034
 Mitt+0.032
Neg logit contributions
bridge-0.023
te-0.023
 calendar-0.022
THE-0.021
book-0.021
Token and
Token ID290
Feature activation+4.13e-03

Pos logit contributions
 neither+0.005
 Instead+0.005
 Twe+0.005
 So+0.005
 Mitt+0.004
Neg logit contributions
te-0.006
bridge-0.006
THE-0.005
 fant-0.005
 calendar-0.005
Token gave
Token ID2921
Feature activation+3.12e-02

Pos logit contributions
 neither+0.007
 Instead+0.006
 So+0.006
 Twe+0.006
 Ow+0.006
Neg logit contributions
te-0.006
bridge-0.005
THE-0.005
 fant-0.005
 throne-0.005
Token his
Token ID465
Feature activation+1.82e-02

Pos logit contributions
 neither+0.056
 Instead+0.053
 So+0.053
idents+0.052
ucker+0.050
Neg logit contributions
 around-0.036
te-0.036
THE-0.033
 corner-0.033
 over-0.032
Token mom
Token ID1995
Feature activation+3.95e-03

Pos logit contributions
 neither+0.026
 Instead+0.025
 Twe+0.025
 So+0.024
 Mitt+0.023
Neg logit contributions
te-0.028
bridge-0.027
 calendar-0.026
THE-0.026
 fant-0.026
Tokenmy
Token ID1820
Feature activation-1.58e-02

Pos logit contributions
 neither+0.005
 Instead+0.005
 Twe+0.004
 So+0.004
idents+0.004
Neg logit contributions
te-0.007
bridge-0.007
THE-0.007
book-0.007
 calendar-0.007
Token a
Token ID257
Feature activation-4.54e-03

Pos logit contributions
te+0.031
bridge+0.031
THE+0.030
 fant+0.029
book+0.029
Neg logit contributions
 neither-0.016
 Instead-0.015
 Twe-0.015
 So-0.015
 Mitt-0.013
Token big
Token ID1263
Feature activation-8.74e-03

Pos logit contributions
bridge+0.007
te+0.007
THE+0.006
 fant+0.006
ines+0.006
Neg logit contributions
 Mitt-0.006
 Instead-0.006
 So-0.006
 Twe-0.006
 neither-0.006
Token hug
Token ID16225
Feature activation+2.06e-02

Pos logit contributions
te+0.012
bridge+0.011
THE+0.011
 calendar+0.011
book+0.011
Neg logit contributions
 neither-0.014
 Instead-0.014
 Twe-0.014
 So-0.013
 Mitt-0.013
Token.
Token ID13
Feature activation+2.64e-03

Pos logit contributions
 neither+0.024
 Instead+0.023
 Twe+0.023
 So+0.023
 Mitt+0.021
Neg logit contributions
te-0.037
bridge-0.036
THE-0.035
 fant-0.035
 calendar-0.035
Token some
Token ID617
Feature activation+9.17e-03

Pos logit contributions
Fine+0.014
idents+0.014
 So+0.014
 Instead+0.013
 neither+0.013
Neg logit contributions
 around-0.014
te-0.014
book-0.014
bridge-0.014
THE-0.014
Token.
Token ID13
Feature activation+1.73e-02

Pos logit contributions
 neither+0.015
 Instead+0.014
 So+0.013
 Twe+0.013
idents+0.013
Neg logit contributions
te-0.012
bridge-0.012
 fant-0.012
THE-0.012
 calendar-0.011
Token Jill
Token ID22772
Feature activation-1.03e-02

Pos logit contributions
 neither+0.020
 Twe+0.019
 Instead+0.018
idents+0.017
 scrape+0.016
Neg logit contributions
te-0.031
book-0.029
bridge-0.029
THE-0.029
 fant-0.029
Token and
Token ID290
Feature activation+1.09e-02

Pos logit contributions
bridge+0.018
te+0.018
THE+0.018
 calendar+0.017
 fant+0.017
Neg logit contributions
 neither-0.012
 Instead-0.012
 So-0.012
 Twe-0.012
 Mitt-0.011
Token Jack
Token ID3619
Feature activation+4.05e-03

Pos logit contributions
 Instead+0.015
 neither+0.014
idents+0.014
 So+0.014
ucker+0.013
Neg logit contributions
te-0.018
 fant-0.016
 calendar-0.016
 tour-0.016
book-0.016
Token agreed
Token ID4987
Feature activation+1.14e-02

Pos logit contributions
 neither+0.005
 Instead+0.005
 Twe+0.005
 So+0.005
 Mitt+0.004
Neg logit contributions
te-0.007
bridge-0.007
THE-0.007
 fant-0.006
 calendar-0.006
Token that
Token ID326
Feature activation+7.15e-03

Pos logit contributions
 neither+0.015
 Instead+0.014
 So+0.014
 Twe+0.014
 Mitt+0.013
Neg logit contributions
bridge-0.020
te-0.019
THE-0.018
 fant-0.018
 calendar-0.018
Token it
Token ID340
Feature activation+1.19e-02

Pos logit contributions
 Instead+0.010
 So+0.010
 Ow+0.009
 neither+0.009
idents+0.009
Neg logit contributions
te-0.011
 fant-0.010
THE-0.010
 calendar-0.010
 around-0.009
Token was
Token ID373
Feature activation+5.61e-03

Pos logit contributions
 neither+0.016
 Twe+0.016
 Instead+0.016
 So+0.015
 Mitt+0.015
Neg logit contributions
te-0.019
bridge-0.018
THE-0.017
 fant-0.017
 calendar-0.017
Token much
Token ID881
Feature activation+2.18e-02

Pos logit contributions
 neither+0.007
 Instead+0.007
 Twe+0.007
 So+0.007
 Mitt+0.006
Neg logit contributions
te-0.010
bridge-0.009
THE-0.009
 calendar-0.009
 throne-0.009
Token better
Token ID1365
Feature activation+1.29e-02

Pos logit contributions
 Twe+0.034
 neither+0.034
 Ow+0.033
 So+0.033
 Instead+0.033
Neg logit contributions
te-0.027
THE-0.026
 fant-0.026
bridge-0.026
 calendar-0.025
Token!
Token ID0
Feature activation+4.42e-03

Pos logit contributions
 neither+0.003
 Instead+0.003
 Twe+0.003
 So+0.003
 Mitt+0.002
Neg logit contributions
te-0.004
bridge-0.004
THE-0.003
 fant-0.003
book-0.003
Token Emma
Token ID18966
Feature activation-1.36e-02

Pos logit contributions
 neither+0.006
idents+0.006
ucker+0.005
 scrape+0.005
 Twe+0.005
Neg logit contributions
te-0.007
book-0.007
bridge-0.007
THE-0.006
 fant-0.006
Token was
Token ID373
Feature activation+7.83e-04

Pos logit contributions
te+0.022
bridge+0.022
THE+0.021
 fant+0.020
book+0.020
Neg logit contributions
 neither-0.018
 Instead-0.018
 Twe-0.017
 So-0.017
 Mitt-0.016
Token so
Token ID523
Feature activation+1.46e-02

Pos logit contributions
 neither+0.001
 Instead+0.001
 Twe+0.001
 So+0.001
 Mitt+0.001
Neg logit contributions
bridge-0.002
book-0.002
te-0.002
 calendar-0.002
THE-0.002
Token happy
Token ID3772
Feature activation+2.77e-03

Pos logit contributions
 neither+0.024
 Instead+0.023
 Twe+0.022
 So+0.022
 Mitt+0.020
Neg logit contributions
bridge-0.021
te-0.020
THE-0.020
 calendar-0.019
book-0.019
Token and
Token ID290
Feature activation+2.25e-03

Pos logit contributions
 neither+0.004
 Instead+0.004
 Twe+0.004
 So+0.004
 Mitt+0.003
Neg logit contributions
te-0.004
bridge-0.004
THE-0.004
 fant-0.004
 calendar-0.004
Token wanted
Token ID2227
Feature activation-7.00e-03

Pos logit contributions
 neither+0.003
 Instead+0.003
 Twe+0.003
 So+0.003
 Mitt+0.003
Neg logit contributions
te-0.003
bridge-0.003
THE-0.003
 fant-0.003
 calendar-0.003
Token to
Token ID284
Feature activation+7.67e-03

Pos logit contributions
te+0.010
bridge+0.009
THE+0.009
 fant+0.009
 calendar+0.009
Neg logit contributions
 neither-0.011
 Instead-0.011
 Twe-0.011
 So-0.011
 Mitt-0.010
Token take
Token ID1011
Feature activation+1.54e-02

Pos logit contributions
 neither+0.010
 Instead+0.010
 Twe+0.010
 So+0.010
 Mitt+0.009
Neg logit contributions
te-0.012
bridge-0.012
THE-0.012
book-0.012
 fant-0.012
Token the
Token ID262
Feature activation-2.92e-03

Pos logit contributions
 neither+0.021
 Instead+0.020
 So+0.019
 Twe+0.019
idents+0.018
Neg logit contributions
te-0.025
bridge-0.024
THE-0.024
 calendar-0.023
 fant-0.023
Token stone
Token ID7815
Feature activation+4.61e-03

Pos logit contributions
te+0.004
bridge+0.004
THE+0.004
 fant+0.004
book+0.004
Neg logit contributions
 neither-0.005
 Instead-0.004
 Twe-0.004
 So-0.004
 Mitt-0.004
Token
Token ID198
Feature activation-1.01e-02

Pos logit contributions
 neither+0.014
idents+0.013
 Twe+0.012
eter+0.012
charged+0.012
Neg logit contributions
THE-0.010
te-0.010
 fant-0.010
 corner-0.010
book-0.010
Token
Token ID198
Feature activation-6.60e-03

Pos logit contributions
THE+0.015
te+0.015
bridge+0.014
 fant+0.013
book+0.013
Neg logit contributions
 neither-0.016
 Twe-0.014
 Instead-0.014
idents-0.014
eter-0.013
TokenL
Token ID43
Feature activation+1.83e-02

Pos logit contributions
te+0.012
bridge+0.012
THE+0.011
 calendar+0.011
 fant+0.011
Neg logit contributions
 neither-0.008
 Instead-0.007
 Twe-0.007
 So-0.007
 Mitt-0.006
Tokenily
Token ID813
Feature activation-4.25e-03

Pos logit contributions
 neither+0.038
 Instead+0.037
 Twe+0.036
 So+0.036
 Mitt+0.034
Neg logit contributions
te-0.018
bridge-0.017
THE-0.016
book-0.016
 calendar-0.015
Token said
Token ID531
Feature activation-2.00e-02

Pos logit contributions
te+0.006
idi+0.005
bridge+0.005
 around+0.005
ibrarian+0.005
Neg logit contributions
 Twe-0.007
 neither-0.007
 Instead-0.007
 Bugs-0.006
 Ow-0.006
Token,
Token ID11
Feature activation-1.68e-02

Pos logit contributions
te+0.026
 around+0.023
 calendar+0.023
THE+0.022
 forward+0.022
Neg logit contributions
 Instead-0.032
 Twe-0.031
 neither-0.031
idents-0.030
 So-0.030
Token "
Token ID366
Feature activation-1.12e-02

Pos logit contributions
te+0.023
bridge+0.022
THE+0.021
 fant+0.021
book+0.021
Neg logit contributions
 neither-0.027
 Instead-0.026
 Twe-0.026
 So-0.026
 Mitt-0.024
TokenYou
Token ID1639
Feature activation-2.72e-02

Pos logit contributions
THE+0.014
te+0.013
bridge+0.013
 fant+0.012
---+0.012
Neg logit contributions
 neither-0.018
 Instead-0.018
 Twe-0.017
idents-0.017
 So-0.017
Token are
Token ID389
Feature activation+1.07e-02

Pos logit contributions
te+0.038
bridge+0.036
THE+0.033
 fant+0.032
 calendar+0.032
Neg logit contributions
eter-0.039
 Twe-0.039
 Instead-0.038
 neither-0.038
 So-0.037
Token welcome
Token ID7062
Feature activation+1.76e-02

Pos logit contributions
 neither+0.013
 Instead+0.012
 Twe+0.012
 So+0.011
 Mitt+0.011
Neg logit contributions
bridge-0.020
te-0.019
book-0.019
THE-0.019
 carriage-0.018
Token,
Token ID11
Feature activation-1.15e-02

Pos logit contributions
 neither+0.025
 Instead+0.024
 Twe+0.023
 So+0.023
idents+0.022
Neg logit contributions
te-0.027
bridge-0.025
 calendar-0.024
THE-0.024
 fant-0.024
Token a
Token ID257
Feature activation+7.56e-03

Pos logit contributions
 neither+0.014
ucker+0.013
idents+0.013
 Instead+0.012
eter+0.012
Neg logit contributions
bridge-0.013
te-0.013
THE-0.013
 fant-0.012
book-0.012
Token time
Token ID640
Feature activation+2.59e-02

Pos logit contributions
 neither+0.016
 Instead+0.015
 Twe+0.015
 So+0.015
 Mitt+0.014
Neg logit contributions
te-0.007
bridge-0.007
THE-0.006
 fant-0.006
book-0.006
Token there
Token ID612
Feature activation+2.85e-02

Pos logit contributions
 neither+0.032
 Instead+0.031
 Twe+0.030
 So+0.030
 Mitt+0.028
Neg logit contributions
te-0.045
bridge-0.044
THE-0.042
 fant-0.042
 calendar-0.041
Token was
Token ID373
Feature activation-3.66e-03

Pos logit contributions
 neither+0.041
 Instead+0.039
 Twe+0.038
 So+0.038
idents+0.036
Neg logit contributions
te-0.044
bridge-0.042
THE-0.041
 fant-0.041
book-0.040
Token a
Token ID257
Feature activation+9.55e-03

Pos logit contributions
te+0.007
bridge+0.007
THE+0.007
 fant+0.007
book+0.007
Neg logit contributions
 neither-0.004
 Instead-0.004
 So-0.004
 Twe-0.003
idents-0.003
Token purple
Token ID14032
Feature activation-3.30e-02

Pos logit contributions
 neither+0.011
 Instead+0.010
 Twe+0.010
 So+0.010
 Mitt+0.009
Neg logit contributions
te-0.018
bridge-0.017
THE-0.017
 fant-0.016
book-0.016
Token flower
Token ID15061
Feature activation+1.27e-02

Pos logit contributions
te+0.049
bridge+0.047
 calendar+0.047
 throne+0.046
THE+0.045
Neg logit contributions
 neither-0.050
 Instead-0.049
 So-0.046
 Twe-0.046
idents-0.043
Token.
Token ID13
Feature activation-3.36e-06

Pos logit contributions
 neither+0.016
 Instead+0.015
 Twe+0.015
 So+0.015
 Mitt+0.014
Neg logit contributions
te-0.022
bridge-0.021
THE-0.021
 calendar-0.020
 fant-0.020
Token It
Token ID632
Feature activation+2.04e-02

Feature not active on previous token; no predictions were affected.
Token was
Token ID373
Feature activation+8.55e-03

Pos logit contributions
 Twe+0.023
 neither+0.023
 Instead+0.023
 So+0.022
 Mitt+0.021
Neg logit contributions
bridge-0.037
te-0.036
THE-0.035
book-0.035
 calendar-0.035
Token in
Token ID287
Feature activation+1.91e-02

Pos logit contributions
 Twe+0.012
 neither+0.012
 Instead+0.012
 Mitt+0.012
 So+0.011
Neg logit contributions
bridge-0.013
te-0.013
THE-0.013
 fant-0.012
book-0.012
Token good
Token ID922
Feature activation+3.68e-03

Pos logit contributions
bridge+0.000
te+0.000
THE+0.000
book+0.000
 calendar+0.000
Neg logit contributions
 neither-0.000
 Instead-0.000
 Twe-0.000
 So-0.000
 Mitt-0.000
Token all
Token ID477
Feature activation-2.00e-02

Pos logit contributions
 neither+0.006
 Instead+0.006
 Twe+0.006
 So+0.006
 Mitt+0.005
Neg logit contributions
te-0.005
bridge-0.005
THE-0.005
book-0.005
 calendar-0.005
Token year
Token ID614
Feature activation+3.29e-02

Pos logit contributions
bridge+0.040
iet+0.033
THE+0.033
te+0.033
 reporter+0.032
Neg logit contributions
 neither-0.023
 Twe-0.022
 you-0.021
 So-0.021
 Instead-0.021
Token and
Token ID290
Feature activation+3.60e-03

Pos logit contributions
 So+0.043
 neither+0.043
 Instead+0.043
 Twe+0.042
idents+0.038
Neg logit contributions
bridge-0.055
te-0.051
THE-0.051
book-0.048
 fant-0.048
Token hoped
Token ID10719
Feature activation-6.98e-03

Pos logit contributions
 Instead+0.006
 So+0.006
 Twe+0.006
 neither+0.006
 Ow+0.006
Neg logit contributions
te-0.005
bridge-0.004
book-0.004
 fant-0.004
THE-0.004
Token for
Token ID329
Feature activation-2.20e-02

Pos logit contributions
te+0.011
bridge+0.010
 fant+0.010
THE+0.010
 throne+0.010
Neg logit contributions
 neither-0.010
 Instead-0.010
 So-0.010
 Twe-0.009
idents-0.009
Token some
Token ID617
Feature activation+1.95e-03

Pos logit contributions
te+0.027
 fant+0.024
THE+0.024
 throne+0.023
bridge+0.023
Neg logit contributions
 neither-0.040
 Instead-0.038
ucker-0.038
 So-0.037
ety-0.037
Token nice
Token ID3621
Feature activation+8.84e-03

Pos logit contributions
 neither+0.003
 Instead+0.003
 Twe+0.003
 So+0.003
 Mitt+0.003
Neg logit contributions
te-0.003
bridge-0.003
THE-0.002
 fant-0.002
 calendar-0.002
Token toys
Token ID14958
Feature activation-8.75e-03

Pos logit contributions
 neither+0.013
 Instead+0.012
 Twe+0.012
 So+0.012
 Mitt+0.011
Neg logit contributions
te-0.013
bridge-0.013
THE-0.013
 fant-0.012
book-0.012
Token.
Token ID13
Feature activation+2.32e-03

Pos logit contributions
bridge+0.016
te+0.016
 fant+0.015
THE+0.015
 calendar+0.015
Neg logit contributions
 neither-0.010
 Instead-0.009
 So-0.009
 Twe-0.009
 Mitt-0.008
Token
Token ID198
Feature activation-1.20e-02

Pos logit contributions
 neither+0.003
 Twe+0.003
 Instead+0.003
 So+0.002
 Mitt+0.002
Neg logit contributions
te-0.004
bridge-0.004
THE-0.004
 fant-0.004
book-0.004
Token.
Token ID13
Feature activation+1.84e-03

Pos logit contributions
te+0.038
bridge+0.038
THE+0.036
 fant+0.036
book+0.036
Neg logit contributions
 neither-0.027
 Instead-0.026
 Twe-0.025
 So-0.025
 Mitt-0.023
Token
Token ID198
Feature activation-9.84e-03

Pos logit contributions
 neither+0.003
 mun+0.003
charged+0.002
 Twe+0.002
iously+0.002
Neg logit contributions
te-0.002
THE-0.002
 fant-0.002
book-0.002
 calendar-0.002
Token
Token ID198
Feature activation-4.77e-03

Pos logit contributions
te+0.015
THE+0.014
book+0.014
 fant+0.014
bridge+0.013
Neg logit contributions
 neither-0.015
 Instead-0.013
 Twe-0.013
idents-0.013
eter-0.012
TokenL
Token ID43
Feature activation+2.08e-02

Pos logit contributions
te+0.010
bridge+0.010
 throne+0.009
 calendar+0.009
 fant+0.009
Neg logit contributions
 neither-0.005
 So-0.005
 Instead-0.005
 Twe-0.004
 Mitt-0.004
Tokenily
Token ID813
Feature activation-4.11e-03

Pos logit contributions
 neither+0.043
 Instead+0.042
 Twe+0.042
 So+0.041
 Mitt+0.039
Neg logit contributions
te-0.020
bridge-0.019
book-0.018
THE-0.018
 calendar-0.017
Token thought
Token ID1807
Feature activation-2.77e-02

Pos logit contributions
te+0.007
bridge+0.007
THE+0.007
 fant+0.006
book+0.006
Neg logit contributions
 neither-0.005
 Instead-0.005
 Twe-0.005
 So-0.005
 Mitt-0.005
Token for
Token ID329
Feature activation-2.25e-02

Pos logit contributions
te+0.036
bridge+0.034
THE+0.033
 calendar+0.032
 fant+0.032
Neg logit contributions
 neither-0.047
 Instead-0.046
 Twe-0.045
 So-0.045
 Mitt-0.042
Token a
Token ID257
Feature activation-1.05e-03

Pos logit contributions
te+0.035
book+0.033
THE+0.032
bridge+0.032
 calendar+0.032
Neg logit contributions
 neither-0.034
 Twe-0.031
ucker-0.031
 Instead-0.031
 Sorry-0.030
Token moment
Token ID2589
Feature activation+2.84e-03

Pos logit contributions
te+0.001
bridge+0.001
THE+0.001
book+0.001
 fant+0.001
Neg logit contributions
 neither-0.002
 Instead-0.002
 Twe-0.002
 So-0.002
 Mitt-0.002
Token.
Token ID13
Feature activation+9.50e-04

Pos logit contributions
 neither+0.003
 So+0.003
 Instead+0.003
 Twe+0.003
 Mitt+0.003
Neg logit contributions
bridge-0.005
te-0.005
 fant-0.005
 throne-0.005
 carriage-0.005
Token She
Token ID1375
Feature activation-5.80e-03

Pos logit contributions
 neither+0.002
 Twe+0.001
 mun+0.001
ucker+0.001
idents+0.001
Neg logit contributions
te-0.001
THE-0.001
 calendar-0.001
 fant-0.001
bridge-0.001
Token each
Token ID1123
Feature activation-1.31e-02

Pos logit contributions
 Instead+0.016
 neither+0.016
 Twe+0.015
 So+0.014
ucker+0.014
Neg logit contributions
te-0.019
THE-0.017
 calendar-0.017
 around-0.017
 fant-0.017
Token other
Token ID584
Feature activation-8.16e-03

Pos logit contributions
bridge+0.023
te+0.023
 throne+0.022
 fant+0.022
THE+0.021
Neg logit contributions
 So-0.016
 neither-0.016
 Instead-0.015
 Mitt-0.014
 Twe-0.014
Token and
Token ID290
Feature activation+5.95e-03

Pos logit contributions
te+0.012
bridge+0.011
THE+0.011
 calendar+0.011
book+0.011
Neg logit contributions
 neither-0.013
 Instead-0.012
 Twe-0.012
 So-0.012
 Mitt-0.011
Token Sue
Token ID26089
Feature activation-3.62e-03

Pos logit contributions
 Instead+0.009
 Ow+0.009
 neither+0.009
ucker+0.009
idents+0.009
Neg logit contributions
te-0.008
THE-0.007
 around-0.007
 tour-0.007
 carriage-0.007
Token put
Token ID1234
Feature activation-2.36e-03

Pos logit contributions
te+0.005
bridge+0.005
THE+0.005
 throne+0.005
 fant+0.005
Neg logit contributions
 neither-0.005
 Instead-0.005
 Twe-0.005
 So-0.005
idents-0.005
Token on
Token ID319
Feature activation-1.63e-02

Pos logit contributions
te+0.003
bridge+0.003
THE+0.003
 fant+0.003
 calendar+0.003
Neg logit contributions
 neither-0.004
 Instead-0.004
 Twe-0.004
 So-0.004
 Mitt-0.004
Token her
Token ID607
Feature activation-3.44e-03

Pos logit contributions
te+0.023
bridge+0.022
THE+0.021
 fant+0.021
 calendar+0.021
Neg logit contributions
 neither-0.026
 Instead-0.025
 Twe-0.025
 So-0.025
 Mitt-0.023
Token sunglasses
Token ID36251
Feature activation-2.96e-02

Pos logit contributions
te+0.006
bridge+0.006
book+0.005
THE+0.005
 throne+0.005
Neg logit contributions
 So-0.005
 Twe-0.005
 neither-0.004
 Instead-0.004
 Mitt-0.004
Token.
Token ID13
Feature activation+7.46e-03

Pos logit contributions
te+0.048
bridge+0.047
THE+0.045
 fant+0.045
 calendar+0.044
Neg logit contributions
 neither-0.040
 Instead-0.039
 Twe-0.038
 So-0.038
 Mitt-0.035
Token "
Token ID366
Feature activation+7.53e-03

Pos logit contributions
 neither+0.009
 Twe+0.008
idents+0.008
 Instead+0.008
 scrape+0.008
Neg logit contributions
te-0.013
THE-0.013
bridge-0.012
 fant-0.012
book-0.012
TokenThank
Token ID10449
Feature activation+1.58e-02

Pos logit contributions
 neither+0.011
 Instead+0.010
 Twe+0.010
 So+0.010
 Mitt+0.009
Neg logit contributions
te-0.012
bridge-0.012
THE-0.011
 fant-0.011
book-0.011
Token unw
Token ID7379
Feature activation+3.02e-04

Pos logit contributions
 Instead+0.035
 neither+0.034
 Twe+0.033
 So+0.033
idents+0.032
Neg logit contributions
te-0.034
bridge-0.031
 fant-0.031
idi-0.030
 throne-0.030
Tokenra
Token ID430
Feature activation-2.13e-02

Pos logit contributions
 neither+0.000
 Instead+0.000
 Twe+0.000
 So+0.000
 Mitt+0.000
Neg logit contributions
bridge-0.001
book-0.001
 fant-0.001
te-0.001
 calendar-0.001
Tokenpped
Token ID1496
Feature activation+2.21e-02

Pos logit contributions
te+0.030
bridge+0.029
 fant+0.028
 calendar+0.028
THE+0.028
Neg logit contributions
 neither-0.033
 Instead-0.032
 Twe-0.031
 So-0.031
 Mitt-0.030
Token the
Token ID262
Feature activation-4.68e-03

Pos logit contributions
 neither+0.029
 Twe+0.029
 Instead+0.027
 Mitt+0.026
 So+0.025
Neg logit contributions
te-0.038
bridge-0.038
 fant-0.036
 throne-0.036
book-0.036
Token l
Token ID300
Feature activation-3.82e-02

Pos logit contributions
te+0.008
bridge+0.008
THE+0.007
 calendar+0.007
book+0.007
Neg logit contributions
 neither-0.006
 Instead-0.006
 Twe-0.006
 So-0.006
 Mitt-0.005
Tokenoll
Token ID692
Feature activation-6.74e-02

Pos logit contributions
te+0.057
 fant+0.055
bridge+0.055
 calendar+0.054
THE+0.054
Neg logit contributions
 neither-0.054
 Instead-0.054
 So-0.052
 Twe-0.052
 Mitt-0.049
Tokenipop
Token ID42800
Feature activation-2.31e-02

Pos logit contributions
te+0.101
bridge+0.098
THE+0.095
book+0.094
 calendar+0.093
Neg logit contributions
 neither-0.099
 Instead-0.098
 Twe-0.094
 So-0.094
 Mitt-0.087
Token and
Token ID290
Feature activation+4.69e-03

Pos logit contributions
te+0.029
bridge+0.028
THE+0.027
book+0.027
 calendar+0.026
Neg logit contributions
 neither-0.038
 Instead-0.038
 Twe-0.038
 So-0.036
 Mitt-0.035
Token l
Token ID300
Feature activation-1.70e-02

Pos logit contributions
 Instead+0.007
 neither+0.007
 So+0.007
idents+0.007
unch+0.006
Neg logit contributions
te-0.006
bridge-0.006
 around-0.006
 throne-0.006
idi-0.006
Tokenicked
Token ID9484
Feature activation+8.23e-03

Pos logit contributions
te+0.027
bridge+0.027
book+0.025
THE+0.025
 throne+0.024
Neg logit contributions
 neither-0.024
 Twe-0.023
 Instead-0.023
idents-0.022
 So-0.022
Token it
Token ID340
Feature activation-5.02e-03

Pos logit contributions
 neither+0.011
 Instead+0.010
 Twe+0.010
 So+0.010
idents+0.010
Neg logit contributions
te-0.013
bridge-0.013
THE-0.013
book-0.012
 fant-0.012
Token closer
Token ID5699
Feature activation-7.78e-03

Pos logit contributions
bridge+0.001
book+0.001
te+0.001
THE+0.001
 throne+0.001
Neg logit contributions
 neither-0.001
 Instead-0.001
 Twe-0.001
 So-0.001
 Mitt-0.001
Token.
Token ID13
Feature activation-1.96e-02

Pos logit contributions
te+0.013
bridge+0.012
THE+0.012
 fant+0.012
 calendar+0.012
Neg logit contributions
 neither-0.011
 Instead-0.010
 Twe-0.010
 So-0.010
 Mitt-0.009
Token She
Token ID1375
Feature activation-1.79e-02

Pos logit contributions
te+0.029
bridge+0.027
 fant+0.027
THE+0.027
book+0.027
Neg logit contributions
 neither-0.030
 Twe-0.027
 Instead-0.027
 So-0.026
idents-0.026
Token saw
Token ID2497
Feature activation-3.57e-03

Pos logit contributions
bridge+0.033
te+0.033
THE+0.032
book+0.031
 throne+0.031
Neg logit contributions
 neither-0.020
 Instead-0.019
 So-0.019
 Twe-0.018
 Mitt-0.017
Token the
Token ID262
Feature activation-3.58e-03

Pos logit contributions
te+0.005
 calendar+0.005
bridge+0.005
 curtain+0.005
 around+0.005
Neg logit contributions
 So-0.005
 Instead-0.005
idents-0.005
 neither-0.005
eter-0.005
Token spinning
Token ID19493
Feature activation-6.56e-02

Pos logit contributions
te+0.005
bridge+0.005
THE+0.005
 calendar+0.005
 fant+0.005
Neg logit contributions
 neither-0.006
 Instead-0.006
 Twe-0.005
 So-0.005
 Mitt-0.005
Token toy
Token ID13373
Feature activation-1.22e-02

Pos logit contributions
te+0.088
bridge+0.085
THE+0.082
 fant+0.080
book+0.080
Neg logit contributions
 neither-0.107
 Instead-0.104
 Twe-0.102
 So-0.101
 Mitt-0.096
Token on
Token ID319
Feature activation-2.09e-02

Pos logit contributions
te+0.017
bridge+0.016
THE+0.015
 fant+0.015
book+0.015
Neg logit contributions
 neither-0.020
 Instead-0.019
 Twe-0.019
 So-0.019
 Mitt-0.018
Token the
Token ID262
Feature activation+1.87e-03

Pos logit contributions
te+0.033
bridge+0.032
 calendar+0.031
THE+0.031
book+0.030
Neg logit contributions
 neither-0.029
 Instead-0.028
 So-0.028
 Twe-0.027
idents-0.026
Token billboard
Token ID41448
Feature activation-8.40e-03

Pos logit contributions
 Twe+0.003
 Mitt+0.002
 neither+0.002
 So+0.002
 Instead+0.002
Neg logit contributions
te-0.003
bridge-0.003
THE-0.003
 fant-0.003
book-0.003
Token.
Token ID13
Feature activation-4.57e-03

Pos logit contributions
te+0.014
bridge+0.013
THE+0.013
book+0.013
 calendar+0.013
Neg logit contributions
 neither-0.011
 Instead-0.011
 Twe-0.011
 So-0.011
 Mitt-0.010
Token
Token ID198
Feature activation-1.68e-02

Pos logit contributions
te+0.008
 around+0.008
book+0.008
 downstairs+0.008
 corner+0.008
Neg logit contributions
 neither-0.011
 Twe-0.011
idents-0.011
eter-0.011
 mun-0.010
Token
Token ID198
Feature activation-1.42e-02

Pos logit contributions
te+0.028
THE+0.027
bridge+0.026
 fant+0.025
book+0.025
Neg logit contributions
 neither-0.023
 Instead-0.021
 Twe-0.021
 So-0.020
idents-0.019
TokenThey
Token ID2990
Feature activation-6.04e-03

Pos logit contributions
te+0.024
bridge+0.024
 throne+0.023
 calendar+0.023
 fant+0.022
Neg logit contributions
 neither-0.018
 So-0.018
 Instead-0.018
 Twe-0.017
 Mitt-0.016
Token grabbed
Token ID13176
Feature activation-1.27e-02

Pos logit contributions
te+0.009
bridge+0.009
THE+0.009
book+0.009
 calendar+0.008
Neg logit contributions
 neither-0.009
 Instead-0.009
 Twe-0.008
 So-0.008
idents-0.008
Token the
Token ID262
Feature activation-1.89e-02

Pos logit contributions
te+0.019
 calendar+0.018
bridge+0.018
 around+0.018
book+0.017
Neg logit contributions
 Instead-0.018
 So-0.017
 neither-0.017
 Twe-0.017
idents-0.016
Token vines
Token ID44439
Feature activation-6.52e-02

Pos logit contributions
te+0.029
bridge+0.027
THE+0.026
book+0.026
 fant+0.026
Neg logit contributions
 neither-0.029
 Instead-0.027
 Twe-0.027
 So-0.027
 Mitt-0.026
Token and
Token ID290
Feature activation-1.23e-02

Pos logit contributions
te+0.093
bridge+0.090
THE+0.087
 calendar+0.085
 fant+0.085
Neg logit contributions
 neither-0.101
 Instead-0.098
 Twe-0.096
 So-0.095
 Mitt-0.090
Token started
Token ID2067
Feature activation-2.22e-02

Pos logit contributions
te+0.017
bridge+0.016
 calendar+0.015
THE+0.015
 around+0.015
Neg logit contributions
 Instead-0.019
 neither-0.018
 So-0.018
 Twe-0.018
 Ow-0.017
Token to
Token ID284
Feature activation-1.48e-03

Pos logit contributions
te+0.024
bridge+0.023
 calendar+0.023
THE+0.022
 fant+0.022
Neg logit contributions
 neither-0.039
 Instead-0.039
 So-0.038
 Twe-0.038
idents-0.037
Token climb
Token ID12080
Feature activation-1.13e-02

Pos logit contributions
 calendar+0.002
te+0.002
 throne+0.002
 tour+0.002
 curtain+0.002
Neg logit contributions
 neither-0.002
 Instead-0.002
 So-0.002
 Twe-0.002
ucker-0.002
Token.
Token ID13
Feature activation-3.24e-03

Pos logit contributions
te+0.018
bridge+0.017
THE+0.017
book+0.016
 fant+0.016
Neg logit contributions
 neither-0.016
 Instead-0.015
 Twe-0.015
 So-0.015
 Mitt-0.014
Token family
Token ID1641
Feature activation-1.21e-02

Pos logit contributions
te+0.007
bridge+0.007
THE+0.007
book+0.006
 fant+0.006
Neg logit contributions
 neither-0.004
 Instead-0.004
 Twe-0.004
 So-0.004
 Mitt-0.004
Token with
Token ID351
Feature activation+4.04e-03

Pos logit contributions
te+0.020
bridge+0.020
THE+0.019
 fant+0.019
book+0.019
Neg logit contributions
 neither-0.016
 Instead-0.015
 Twe-0.015
 So-0.015
 Mitt-0.014
Token a
Token ID257
Feature activation-3.94e-03

Pos logit contributions
 neither+0.005
 Instead+0.005
 So+0.005
idents+0.005
 Twe+0.005
Neg logit contributions
te-0.007
bridge-0.006
 calendar-0.006
 fant-0.006
book-0.006
Token rare
Token ID4071
Feature activation-2.92e-02

Pos logit contributions
te+0.006
bridge+0.006
THE+0.006
book+0.006
 fant+0.006
Neg logit contributions
 neither-0.005
 Twe-0.005
 Instead-0.005
 So-0.005
 Mitt-0.005
Token books
Token ID3835
Feature activation-5.13e-02

Pos logit contributions
te+0.039
bridge+0.038
 calendar+0.037
THE+0.036
 fant+0.036
Neg logit contributions
 neither-0.048
 Instead-0.047
 So-0.045
 Twe-0.045
 Mitt-0.042
Tokenhe
Token ID258
Feature activation-8.90e-02

Pos logit contributions
 fant+0.106
bridge+0.103
THE+0.101
iet+0.100
idi+0.099
Neg logit contributions
 neither-0.046
 Instead-0.041
 So-0.040
 Twe-0.039
 Tim-0.031
Tokenlf
Token ID1652
Feature activation-4.09e-02

Pos logit contributions
 fant+0.070
THE+0.068
te+0.063
 spectacular+0.062
bridge+0.061
Neg logit contributions
 neither-0.188
 Twe-0.176
 So-0.170
 Instead-0.168
 Mitt-0.167
Token in
Token ID287
Feature activation-7.35e-03

Pos logit contributions
te+0.069
bridge+0.066
THE+0.063
book+0.063
 calendar+0.063
Neg logit contributions
 neither-0.053
 Instead-0.053
 Twe-0.051
 So-0.050
 Mitt-0.047
Token their
Token ID511
Feature activation+1.89e-03

Pos logit contributions
te+0.011
bridge+0.011
THE+0.010
 fant+0.010
book+0.010
Neg logit contributions
 neither-0.011
 Instead-0.011
 Twe-0.011
 So-0.010
 Mitt-0.010
Token home
Token ID1363
Feature activation-1.40e-02

Pos logit contributions
 Instead+0.003
 So+0.003
 Twe+0.003
 Mitt+0.003
 neither+0.003
Neg logit contributions
bridge-0.003
te-0.003
book-0.003
THE-0.003
eway-0.003
Token.
Token ID13
Feature activation-1.50e-03

Pos logit contributions
te+0.022
book+0.020
bridge+0.020
 calendar+0.020
 throne+0.019
Neg logit contributions
 neither-0.020
 Twe-0.020
 Instead-0.019
idents-0.019
 Ow-0.018
Token a
Token ID257
Feature activation+1.39e-02

Pos logit contributions
te+0.009
bridge+0.008
 fant+0.008
THE+0.008
 throne+0.008
Neg logit contributions
 neither-0.006
 Twe-0.006
 Instead-0.006
 So-0.006
idents-0.006
Token bad
Token ID2089
Feature activation+4.24e-03

Pos logit contributions
 neither+0.022
 Instead+0.022
 Twe+0.021
 So+0.021
 Mitt+0.020
Neg logit contributions
te-0.020
bridge-0.019
THE-0.019
book-0.018
 fant-0.017
Token tum
Token ID11814
Feature activation+5.38e-02

Pos logit contributions
 neither+0.007
 Instead+0.007
 Twe+0.007
 So+0.007
 Mitt+0.006
Neg logit contributions
 calendar-0.005
 throne-0.005
te-0.005
 tour-0.005
 carriage-0.005
Tokenmy
Token ID1820
Feature activation-5.85e-04

Pos logit contributions
 neither+0.072
 Twe+0.068
 Instead+0.067
 So+0.066
idents+0.065
Neg logit contributions
te-0.089
bridge-0.087
book-0.086
THE-0.085
ines-0.079
Token ac
Token ID936
Feature activation-1.65e-02

Pos logit contributions
te+0.001
bridge+0.001
book+0.001
THE+0.001
 calendar+0.001
Neg logit contributions
 neither-0.001
 Instead-0.001
 Twe-0.001
 So-0.001
idents-0.001
Tokenhe
Token ID258
Feature activation-8.06e-02

Pos logit contributions
te+0.012
bridge+0.011
THE+0.011
book+0.010
 calendar+0.010
Neg logit contributions
 neither-0.038
 Instead-0.036
 So-0.036
 Twe-0.036
idents-0.034
Token and
Token ID290
Feature activation+1.79e-03

Pos logit contributions
te+0.104
bridge+0.102
THE+0.098
 fant+0.097
 calendar+0.095
Neg logit contributions
 neither-0.134
 Instead-0.129
 Twe-0.128
 So-0.126
 Mitt-0.119
Token felt
Token ID2936
Feature activation-9.19e-03

Pos logit contributions
 Instead+0.003
 So+0.003
 neither+0.003
 Twe+0.003
idents+0.002
Neg logit contributions
te-0.003
bridge-0.002
book-0.002
 throne-0.002
 around-0.002
Token very
Token ID845
Feature activation+1.12e-02

Pos logit contributions
te+0.012
bridge+0.012
THE+0.012
 fant+0.012
 calendar+0.011
Neg logit contributions
 neither-0.014
 Instead-0.014
 So-0.014
 Twe-0.014
idents-0.013
Token sick
Token ID6639
Feature activation+2.72e-02

Pos logit contributions
 neither+0.018
 So+0.018
 Instead+0.017
 Twe+0.017
idents+0.016
Neg logit contributions
te-0.014
 around-0.014
 fant-0.013
 throne-0.013
 calendar-0.013
Token.
Token ID13
Feature activation+1.21e-02

Pos logit contributions
 neither+0.034
 Instead+0.033
 Twe+0.033
 So+0.032
 Mitt+0.029
Neg logit contributions
te-0.047
bridge-0.045
THE-0.044
book-0.043
 calendar-0.043