TOP ACTIVATIONS
MAX = 0.076

. He never wanted to do what his mom my and
. Jack never wanted to do anything useful ever again .
. He wondered what other quizz es she had created for
He had never dared to do something so brave before .
to her that the gir aff es were wild animals and
'. Whenever Lily wanted to do something special , her mom
. He didn 't like hairc uts because they made him
. He never wanted to do any work .
He promised himself never to do it again and to always
learnt his lesson not to do anything dangerous , no matter
, she understood why the par rot was so different from
lesson to never trust naughty mag icians and to be careful
though she didn ’ t know how the ice
learned that even though the tree was split , it could
something new ! She thought av oc ados were the most
knew she didn ’ t have to do everything
that if they wanted to do something , they just had
he didn 't want to do something and to take care
if you don ’ t know what ’
you count how many ost ric hes there are ?"

MIN ACTIVATIONS
MIN = -0.083

placed it onto the wheel bar row . John
him a big red wheel bar row and filled it with
of soil into the wheel bar row . It was heavy
it away in a wheel bar row . Once
dolls around in a wheel bar row she had found in
back with a big wheel bar row . He asked the
. He found a wheel bar row , a tr amp
a big , spinning wheel bar row and put the pumpkin
He jumped onto the wheel bar row and started rolling around
harder to push the wheel bar row . But her backyard
. She saw a wheel bar row , a large sp
filled up lots of wheel bar rows with the waste and
" Good job , O ll ie !" his mother praised
Finally , the sun began to set and Emma ’
turned into a giant sand storm . The sand
the garden . <|endoftext|> " Mom my , what 's that
welcome her . <|endoftext|> " Mom my , look at the
Suddenly , a loud noise made them stop and look up
. <|endoftext|> " Look , Mom my !" exclaimed three -
stopped and the sun started to shine . Ber tha and

INTERVAL 0.036 - 0.076
CONTAINS 0.411%

dress . She would tw irl around in it and act
. She wanted to know what it felt like to splash
come on !" Lily climbed up the ladder to the top
her mom , and the par rot . After
, she wanted to go on an adventure , so she

INTERVAL -0.003 - 0.036
CONTAINS 60.300%

Larry . Larry was a good - nat ured bird and
. <|endoftext|> One day , a little boy named Tim found
fly was scared . The fly tried to hide
until they came to a little island with a tree .
Once upon a time , there was a clown named Bob

INTERVAL -0.043 - -0.003
CONTAINS 39.246%

and pointed to a word . " This is cat ,"
" Oh no , Tom , what did you
. She used the peanuts to make a peanut butter sandwich
my lively twins . I know you didn 't mean to
. They take out two toys . One is a car

INTERVAL -0.083 - -0.043
CONTAINS 0.043%

story . " Mom my , I 'm enjoying
. Their moms heard the noise and came to
garden . <|endoftext|> " Mom my , what 's that ?"
! M ummy asked ,
to the park with her mom . They saw an interesting
Token.
Token ID13
Feature activation+4.96e-03

Pos logit contributions
 judge+0.021
 pigeon+0.021
 judged+0.021
ked+0.020
 scored+0.020
Neg logit contributions
 dining-0.032
rd-0.030
 blankets-0.030
 counters-0.029
zie-0.029
Token He
Token ID679
Feature activation+1.06e-02

Pos logit contributions
 judge+0.006
 en+0.006
 judged+0.006
 jealous+0.006
 scored+0.006
Neg logit contributions
 dining-0.011
rd-0.010
 fluids-0.010
 blankets-0.010
 counters-0.010
Token never
Token ID1239
Feature activation+2.76e-02

Pos logit contributions
 pigeon+0.015
ked+0.015
 gol+0.014
 judge+0.014
 judged+0.013
Neg logit contributions
 dining-0.019
zie-0.018
rd-0.018
 blankets-0.016
 counters-0.016
Token wanted
Token ID2227
Feature activation+1.80e-02

Pos logit contributions
 pigeon+0.037
ked+0.036
 gol+0.034
 en+0.033
 ar+0.032
Neg logit contributions
zie-0.047
 dining-0.045
rated-0.044
rd-0.043
my-0.043
Token to
Token ID284
Feature activation+3.31e-02

Pos logit contributions
ked+0.029
 gol+0.026
 pigeon+0.026
 judged+0.025
 scored+0.025
Neg logit contributions
 blankets-0.033
 bed-0.029
 counters-0.027
 directions-0.027
 beds-0.027
Token do
Token ID466
Feature activation+7.57e-02

Pos logit contributions
ked+0.057
 judged+0.052
 scored+0.051
 en+0.050
 pigeon+0.050
Neg logit contributions
 dining-0.056
zie-0.050
rd-0.049
 counters-0.049
 blankets-0.049
Token what
Token ID644
Feature activation+4.79e-02

Pos logit contributions
ked+0.144
 pigeon+0.134
 judge+0.133
 judged+0.131
 gol+0.129
Neg logit contributions
rd-0.104
 counters-0.104
 dining-0.103
 blankets-0.100
 fluids-0.096
Token his
Token ID465
Feature activation+2.96e-02

Pos logit contributions
ked+0.089
 judged+0.074
 pigeon+0.073
 judge+0.069
 teased+0.067
Neg logit contributions
 dining-0.078
 blankets-0.078
rd-0.075
 counters-0.074
 occasions-0.073
Token mom
Token ID1995
Feature activation+2.37e-02

Pos logit contributions
ked+0.053
 scored+0.052
 judged+0.051
 pigeon+0.049
 teased+0.049
Neg logit contributions
 dining-0.049
 bedroom-0.045
 blankets-0.044
rd-0.043
 bed-0.043
Tokenmy
Token ID1820
Feature activation+3.33e-02

Pos logit contributions
ked+0.044
 pigeon+0.042
 ar+0.040
 bully+0.040
 en+0.040
Neg logit contributions
 dining-0.032
zie-0.032
ma-0.031
rd-0.030
 counters-0.029
Token and
Token ID290
Feature activation+2.54e-02

Pos logit contributions
ked+0.050
 pigeon+0.048
 ar+0.046
 gol+0.046
 judge+0.045
Neg logit contributions
 dining-0.055
rd-0.054
zie-0.052
 counters-0.052
 fluids-0.050
Token.
Token ID13
Feature activation-7.74e-03

Pos logit contributions
 dining+0.067
rd+0.059
zie+0.059
 counters+0.059
 blankets+0.057
Neg logit contributions
 judge-0.055
 judged-0.054
 pigeon-0.053
 scored-0.052
 teased-0.052
Token Jack
Token ID3619
Feature activation-9.98e-03

Pos logit contributions
 dining+0.016
rd+0.014
 counters+0.014
zie+0.014
 blankets+0.014
Neg logit contributions
 judge-0.011
 judged-0.011
 scored-0.011
 teased-0.010
 pigeon-0.010
Token never
Token ID1239
Feature activation+1.98e-02

Pos logit contributions
 dining+0.021
rd+0.019
zie+0.019
 counters+0.018
 blankets+0.018
Neg logit contributions
ked-0.013
 judge-0.013
 pigeon-0.012
 judged-0.012
 scored-0.012
Token wanted
Token ID2227
Feature activation+1.58e-02

Pos logit contributions
 pigeon+0.028
ked+0.027
 judge+0.026
 gol+0.025
 ar+0.025
Neg logit contributions
 dining-0.034
zie-0.033
rd-0.032
rated-0.032
 fluids-0.031
Token to
Token ID284
Feature activation+2.72e-02

Pos logit contributions
ked+0.024
 judged+0.021
 gol+0.021
 pigeon+0.020
 scored+0.020
Neg logit contributions
 blankets-0.031
 dining-0.029
 bed-0.028
 counters-0.027
 fluids-0.026
Token do
Token ID466
Feature activation+7.42e-02

Pos logit contributions
 judge+0.040
 judged+0.039
 pigeon+0.038
 scored+0.038
 teased+0.037
Neg logit contributions
 dining-0.054
rd-0.048
zie-0.047
 counters-0.047
 blankets-0.046
Token anything
Token ID1997
Feature activation+5.14e-02

Pos logit contributions
ked+0.133
 judge+0.128
 judged+0.125
 pigeon+0.125
 teased+0.121
Neg logit contributions
 dining-0.118
rd-0.114
 counters-0.111
 blankets-0.109
 fluids-0.104
Token useful
Token ID4465
Feature activation+3.64e-02

Pos logit contributions
ked+0.102
 pigeon+0.091
 judged+0.087
 judge+0.084
 teased+0.077
Neg logit contributions
rd-0.077
 dining-0.075
 blankets-0.065
 counters-0.064
 fluids-0.062
Token ever
Token ID1683
Feature activation+1.78e-02

Pos logit contributions
 pigeon+0.062
 judged+0.062
 judge+0.059
ked+0.059
 en+0.054
Neg logit contributions
 dining-0.058
rd-0.057
 occasions-0.053
 blankets-0.053
zie-0.050
Token again
Token ID757
Feature activation+3.05e-02

Pos logit contributions
 judge+0.028
 judged+0.028
 pigeon+0.027
 scored+0.027
 teased+0.026
Neg logit contributions
 dining-0.033
rd-0.029
zie-0.029
 counters-0.029
 blankets-0.028
Token.
Token ID13
Feature activation-2.09e-03

Pos logit contributions
 judge+0.049
 judged+0.048
 pigeon+0.047
 scored+0.047
ked+0.046
Neg logit contributions
 dining-0.056
rd-0.050
zie-0.048
 counters-0.048
 blankets-0.048
Token.
Token ID13
Feature activation-2.42e-04

Pos logit contributions
 judge+0.030
 pigeon+0.030
 judged+0.030
ked+0.030
 scored+0.029
Neg logit contributions
 dining-0.031
rd-0.027
 blankets-0.027
 counters-0.026
zie-0.025
Token He
Token ID679
Feature activation+4.16e-03

Pos logit contributions
 dining+0.001
rd+0.000
 counters+0.000
zie+0.000
 fluids+0.000
Neg logit contributions
 judge-0.000
 judged-0.000
 en-0.000
 scored-0.000
 jealous-0.000
Token wondered
Token ID14028
Feature activation-1.80e-03

Pos logit contributions
ked+0.006
 pigeon+0.006
 judge+0.006
 en+0.006
 bully+0.006
Neg logit contributions
 dining-0.007
 blankets-0.006
zie-0.006
rd-0.006
 counters-0.006
Token what
Token ID644
Feature activation+2.99e-02

Pos logit contributions
 dining+0.004
zie+0.003
 blankets+0.003
 counters+0.003
rd+0.003
Neg logit contributions
 judge-0.003
 judged-0.003
 pigeon-0.003
 scored-0.003
 teased-0.003
Token other
Token ID584
Feature activation+1.09e-02

Pos logit contributions
ked+0.056
 judged+0.045
 en+0.042
 judge+0.042
 gol+0.042
Neg logit contributions
 dining-0.051
 blankets-0.047
 occasions-0.046
rd-0.045
ma-0.043
Token quizz
Token ID48258
Feature activation+7.09e-02

Pos logit contributions
ked+0.021
 judged+0.019
 scored+0.018
 pigeon+0.018
 judge+0.018
Neg logit contributions
 dining-0.017
rd-0.016
 blankets-0.016
 counters-0.015
 occasions-0.014
Tokenes
Token ID274
Feature activation+2.75e-02

Pos logit contributions
 pigeon+0.155
 en+0.152
 judged+0.151
 judge+0.151
 jealous+0.149
Neg logit contributions
 dining-0.086
rd-0.074
 counters-0.068
 blankets-0.067
zie-0.060
Token she
Token ID673
Feature activation+1.18e-02

Pos logit contributions
ked+0.041
 pigeon+0.041
 judged+0.041
 judge+0.040
 en+0.040
Neg logit contributions
 dining-0.047
 blankets-0.044
zie-0.043
rd-0.042
 counters-0.041
Token had
Token ID550
Feature activation+6.22e-03

Pos logit contributions
 pigeon+0.018
ked+0.017
 en+0.017
 bully+0.016
 gol+0.016
Neg logit contributions
 dining-0.021
rd-0.019
zie-0.019
 occasions-0.018
 counters-0.017
Token created
Token ID2727
Feature activation-5.66e-03

Pos logit contributions
 pigeon+0.010
 en+0.010
 bully+0.010
ked+0.010
 judged+0.009
Neg logit contributions
 dining-0.011
rd-0.010
 occasions-0.009
zie-0.009
keeping-0.009
Token for
Token ID329
Feature activation+5.95e-03

Pos logit contributions
 dining+0.011
rd+0.010
 counters+0.009
 blankets+0.009
zie+0.009
Neg logit contributions
 judge-0.009
 judged-0.009
 pigeon-0.009
ked-0.008
 scored-0.008
Token He
Token ID679
Feature activation-5.68e-03

Pos logit contributions
 dining+0.003
zie+0.003
 counters+0.003
rd+0.003
 blankets+0.002
Neg logit contributions
 pigeon-0.004
 judge-0.004
 pin-0.004
 judged-0.004
ked-0.004
Token had
Token ID550
Feature activation+2.22e-02

Pos logit contributions
 dining+0.010
zie+0.009
 blankets+0.009
rd+0.009
 counters+0.009
Neg logit contributions
ked-0.008
 judge-0.008
 bully-0.008
 pigeon-0.008
 judged-0.008
Token never
Token ID1239
Feature activation+2.03e-02

Pos logit contributions
ked+0.033
 judge+0.032
 pigeon+0.032
 bully+0.030
 judged+0.030
Neg logit contributions
 dining-0.043
zie-0.038
rd-0.037
 counters-0.037
 blankets-0.037
Token dared
Token ID28765
Feature activation+1.35e-03

Pos logit contributions
ked+0.034
 pigeon+0.032
 judge+0.030
 ar+0.030
 en+0.029
Neg logit contributions
 dining-0.032
rd-0.030
zie-0.028
 counters-0.026
keeping-0.026
Token to
Token ID284
Feature activation+2.90e-02

Pos logit contributions
 judged+0.002
 pigeon+0.002
ked+0.002
 scored+0.002
 judge+0.002
Neg logit contributions
zie-0.003
 blankets-0.003
 dining-0.003
 counters-0.003
 directions-0.002
Token do
Token ID466
Feature activation+7.01e-02

Pos logit contributions
ked+0.051
 scored+0.044
 judged+0.044
 pigeon+0.043
 en+0.041
Neg logit contributions
 dining-0.046
 bed-0.041
zie-0.041
 fluids-0.040
rd-0.040
Token something
Token ID1223
Feature activation+2.62e-02

Pos logit contributions
ked+0.126
 judge+0.122
 judged+0.121
 pigeon+0.118
 teased+0.116
Neg logit contributions
 dining-0.104
rd-0.100
 fluids-0.097
 counters-0.096
 blankets-0.093
Token so
Token ID523
Feature activation+4.73e-03

Pos logit contributions
ked+0.048
 judged+0.042
 pigeon+0.041
 judge+0.040
 scored+0.039
Neg logit contributions
 dining-0.044
rd-0.042
 counters-0.037
 blankets-0.036
 fluids-0.036
Token brave
Token ID14802
Feature activation+3.33e-02

Pos logit contributions
ked+0.009
 judged+0.008
 judge+0.008
 teased+0.007
 pigeon+0.007
Neg logit contributions
rd-0.007
 dining-0.007
 fluids-0.006
 counters-0.006
heart-0.006
Token before
Token ID878
Feature activation+6.42e-03

Pos logit contributions
 judged+0.050
ked+0.049
 judge+0.049
 pigeon+0.047
 scored+0.047
Neg logit contributions
 dining-0.057
rd-0.056
 blankets-0.052
 counters-0.052
zie-0.051
Token.
Token ID13
Feature activation+3.54e-03

Pos logit contributions
 judge+0.010
 judged+0.009
 pigeon+0.009
 scored+0.009
 teased+0.009
Neg logit contributions
 dining-0.013
rd-0.011
zie-0.011
 counters-0.011
 blankets-0.011
Token to
Token ID284
Feature activation-2.34e-03

Pos logit contributions
 scored+0.015
 judged+0.014
ked+0.014
 judge+0.013
 pigeon+0.013
Neg logit contributions
 dining-0.025
 blankets-0.023
 counters-0.023
 fluids-0.022
rd-0.021
Token her
Token ID607
Feature activation+2.78e-02

Pos logit contributions
 dining+0.005
zie+0.004
 counters+0.004
rd+0.004
 blankets+0.004
Neg logit contributions
 judge-0.003
 pigeon-0.003
 judged-0.003
 scored-0.003
ked-0.003
Token that
Token ID326
Feature activation+1.06e-02

Pos logit contributions
 scored+0.043
 judged+0.042
 judge+0.042
 teased+0.041
ked+0.041
Neg logit contributions
 dining-0.051
 blankets-0.046
 counters-0.046
zie-0.045
rd-0.045
Token the
Token ID262
Feature activation+1.26e-02

Pos logit contributions
ked+0.019
 scored+0.016
 judged+0.016
 pigeon+0.015
 judge+0.015
Neg logit contributions
 dining-0.018
 blankets-0.017
zie-0.015
 fluids-0.015
 counters-0.015
Token gir
Token ID37370
Feature activation+4.99e-02

Pos logit contributions
 judged+0.017
 scored+0.016
ked+0.016
 judge+0.016
 teased+0.016
Neg logit contributions
 dining-0.026
rd-0.023
 counters-0.023
 blankets-0.023
zie-0.023
Tokenaff
Token ID2001
Feature activation+7.00e-02

Pos logit contributions
 judged+0.087
 en+0.083
 scored+0.082
 teased+0.082
 jealous+0.081
Neg logit contributions
zie-0.065
rd-0.063
ma-0.061
 blankets-0.059
 dining-0.059
Tokenes
Token ID274
Feature activation+2.72e-02

Pos logit contributions
 gol+0.146
 jealous+0.142
 judged+0.142
 en+0.142
 judge+0.134
Neg logit contributions
ma-0.063
zie-0.057
rd-0.056
 dining-0.054
igi-0.052
Token were
Token ID547
Feature activation-2.43e-03

Pos logit contributions
 pigeon+0.047
 judge+0.047
ked+0.047
 judged+0.046
 gol+0.045
Neg logit contributions
 dining-0.038
 blankets-0.038
 counters-0.035
zie-0.035
 fluids-0.034
Token wild
Token ID4295
Feature activation+3.57e-03

Pos logit contributions
 dining+0.005
 blankets+0.005
 counters+0.004
zie+0.004
 fluids+0.004
Neg logit contributions
 scored-0.003
 judge-0.003
 judged-0.003
 pigeon-0.003
ked-0.003
Token animals
Token ID4695
Feature activation+1.31e-02

Pos logit contributions
 judge+0.005
 judged+0.005
 scored+0.005
 pigeon+0.005
 en+0.004
Neg logit contributions
 dining-0.007
 blankets-0.007
 counters-0.007
rd-0.007
zie-0.006
Token and
Token ID290
Feature activation+2.59e-02

Pos logit contributions
 judge+0.021
 judged+0.021
 pigeon+0.020
 scored+0.020
ked+0.020
Neg logit contributions
 dining-0.023
 blankets-0.021
rd-0.020
 counters-0.020
 fluids-0.019
Token'.
Token ID4458
Feature activation+6.00e-03

Pos logit contributions
 judged+0.013
 judge+0.013
ked+0.013
 scored+0.013
 pigeon+0.013
Neg logit contributions
 dining-0.012
 counters-0.010
rd-0.010
 blankets-0.010
zie-0.010
Token Whenever
Token ID21326
Feature activation+1.37e-02

Pos logit contributions
 judge+0.007
 judged+0.007
 pigeon+0.006
 en+0.006
 teased+0.006
Neg logit contributions
 dining-0.014
zie-0.013
rd-0.012
 counters-0.012
 blankets-0.012
Token Lily
Token ID20037
Feature activation+3.47e-02

Pos logit contributions
ked+0.019
 judge+0.019
 judged+0.018
 pigeon+0.018
 scored+0.018
Neg logit contributions
 dining-0.028
rd-0.025
zie-0.025
 blankets-0.025
 counters-0.025
Token wanted
Token ID2227
Feature activation+1.21e-02

Pos logit contributions
ked+0.053
 en+0.050
 pigeon+0.050
 jealous+0.048
 gol+0.047
Neg logit contributions
 dining-0.060
zie-0.057
rd-0.055
 counters-0.055
 bedroom-0.054
Token to
Token ID284
Feature activation+2.45e-02

Pos logit contributions
ked+0.023
 scored+0.021
 gol+0.021
 judged+0.020
 pigeon+0.019
Neg logit contributions
 dining-0.017
 blankets-0.016
 counters-0.016
 bed-0.015
rd-0.015
Token do
Token ID466
Feature activation+6.95e-02

Pos logit contributions
ked+0.046
 scored+0.041
 judged+0.040
 teased+0.038
 gol+0.038
Neg logit contributions
 dining-0.038
 bed-0.036
 blankets-0.035
 counters-0.033
zie-0.033
Token something
Token ID1223
Feature activation+2.74e-02

Pos logit contributions
ked+0.120
 judged+0.112
 pigeon+0.111
 gol+0.110
 judge+0.110
Neg logit contributions
 dining-0.110
rd-0.110
 counters-0.105
 blankets-0.101
 occasions-0.097
Token special
Token ID2041
Feature activation+1.52e-02

Pos logit contributions
ked+0.052
 scored+0.048
 pigeon+0.048
 judged+0.047
 judge+0.047
Neg logit contributions
 dining-0.041
rd-0.037
 blankets-0.037
 counters-0.036
 fluids-0.035
Token,
Token ID11
Feature activation+1.74e-02

Pos logit contributions
 pigeon+0.029
ked+0.028
 judged+0.028
 scored+0.028
 judge+0.027
Neg logit contributions
 dining-0.022
 occasions-0.020
rd-0.020
 blankets-0.019
zie-0.019
Token her
Token ID607
Feature activation+5.08e-03

Pos logit contributions
ked+0.024
 judge+0.023
 scored+0.023
 judged+0.023
 pigeon+0.022
Neg logit contributions
 dining-0.035
 blankets-0.032
 counters-0.032
rd-0.031
zie-0.031
Token mom
Token ID1995
Feature activation-8.27e-03

Pos logit contributions
 scored+0.009
ked+0.009
 judged+0.009
 pigeon+0.009
 teased+0.009
Neg logit contributions
 dining-0.009
 blankets-0.008
 bedroom-0.008
 counters-0.007
 bed-0.007
Token.
Token ID13
Feature activation+4.58e-03

Pos logit contributions
 judge+0.013
 judged+0.013
 pigeon+0.013
 scored+0.013
 teased+0.012
Neg logit contributions
 dining-0.015
rd-0.013
 counters-0.013
 blankets-0.013
zie-0.013
Token He
Token ID679
Feature activation+8.98e-03

Pos logit contributions
 judge+0.006
 en+0.006
 judged+0.006
 scored+0.006
 teased+0.005
Neg logit contributions
 dining-0.010
rd-0.009
 fluids-0.009
 blankets-0.009
 counters-0.009
Token didn
Token ID1422
Feature activation+2.42e-02

Pos logit contributions
 judge+0.010
 judged+0.009
 pigeon+0.009
 scored+0.009
ked+0.009
Neg logit contributions
 dining-0.021
zie-0.019
rd-0.019
 counters-0.019
 blankets-0.019
Token't
Token ID470
Feature activation-8.63e-03

Pos logit contributions
 judged+0.032
 judge+0.032
 pigeon+0.032
ked+0.031
 teased+0.031
Neg logit contributions
 dining-0.049
zie-0.044
rd-0.044
 counters-0.044
 blankets-0.044
Token like
Token ID588
Feature activation+2.38e-02

Pos logit contributions
 dining+0.018
rd+0.016
zie+0.016
 counters+0.016
 blankets+0.016
Neg logit contributions
 judge-0.012
 judged-0.011
 pigeon-0.011
 scored-0.011
 teased-0.011
Token hairc
Token ID37363
Feature activation+6.85e-02

Pos logit contributions
ked+0.040
 pigeon+0.037
 judged+0.036
 scored+0.036
 judge+0.035
Neg logit contributions
 blankets-0.042
 dining-0.042
 bed-0.040
 counters-0.039
 occasions-0.038
Tokenuts
Token ID5500
Feature activation+1.42e-02

Pos logit contributions
 pigeon+0.089
 judge+0.088
 judged+0.083
 pin+0.081
 bully+0.081
Neg logit contributions
 dining-0.140
rd-0.134
 counters-0.129
zie-0.122
amas-0.119
Token because
Token ID780
Feature activation+1.91e-02

Pos logit contributions
 judged+0.024
 pigeon+0.024
 judge+0.023
ked+0.023
 ar+0.022
Neg logit contributions
 dining-0.022
 blankets-0.020
zie-0.020
 bed-0.019
 counters-0.019
Token they
Token ID484
Feature activation-6.11e-04

Pos logit contributions
 judged+0.027
 judge+0.027
ked+0.027
 pigeon+0.026
 scored+0.025
Neg logit contributions
 dining-0.038
 blankets-0.035
rd-0.034
 counters-0.033
zie-0.033
Token made
Token ID925
Feature activation-6.06e-03

Pos logit contributions
 dining+0.001
zie+0.001
 blankets+0.001
 counters+0.001
 fluids+0.001
Neg logit contributions
 pigeon-0.001
 ar-0.001
 judge-0.001
 gol-0.001
ked-0.001
Token him
Token ID683
Feature activation+8.73e-03

Pos logit contributions
 dining+0.012
 counters+0.011
 blankets+0.011
rd+0.010
zie+0.010
Neg logit contributions
 judge-0.009
 judged-0.009
 pigeon-0.009
 scored-0.008
ked-0.008
Token.
Token ID13
Feature activation+7.14e-03

Pos logit contributions
 judge+0.002
 judged+0.002
 pigeon+0.002
 scored+0.002
 teased+0.002
Neg logit contributions
 dining-0.003
rd-0.003
zie-0.003
 counters-0.003
 blankets-0.003
Token He
Token ID679
Feature activation+3.55e-03

Pos logit contributions
 judge+0.012
 judged+0.011
 pigeon+0.011
ked+0.011
 teased+0.011
Neg logit contributions
 dining-0.013
rd-0.011
zie-0.011
 counters-0.011
 blankets-0.011
Token never
Token ID1239
Feature activation+2.21e-02

Pos logit contributions
 judge+0.005
ked+0.005
 pigeon+0.005
 judged+0.005
 en+0.005
Neg logit contributions
 dining-0.007
 blankets-0.006
zie-0.006
rd-0.006
 fluids-0.006
Token wanted
Token ID2227
Feature activation+1.55e-02

Pos logit contributions
 pigeon+0.032
 judge+0.031
ked+0.031
 gol+0.029
 en+0.029
Neg logit contributions
 dining-0.042
zie-0.039
rd-0.038
 blankets-0.037
 counters-0.036
Token to
Token ID284
Feature activation+2.87e-02

Pos logit contributions
ked+0.021
 pigeon+0.020
 gol+0.020
 judged+0.019
 judge+0.019
Neg logit contributions
 blankets-0.032
 dining-0.031
 bed-0.029
 counters-0.029
 beds-0.028
Token do
Token ID466
Feature activation+6.84e-02

Pos logit contributions
 judge+0.044
 judged+0.042
 pigeon+0.041
 scored+0.040
 teased+0.040
Neg logit contributions
 dining-0.056
rd-0.050
 counters-0.049
zie-0.049
 blankets-0.048
Token any
Token ID597
Feature activation+1.44e-02

Pos logit contributions
 judge+0.109
ked+0.107
 pigeon+0.105
 judged+0.105
 teased+0.100
Neg logit contributions
 dining-0.124
rd-0.117
 counters-0.115
 blankets-0.115
 fluids-0.111
Token work
Token ID670
Feature activation+2.50e-02

Pos logit contributions
ked+0.028
 judged+0.026
 pigeon+0.026
 scored+0.026
 judge+0.026
Neg logit contributions
 dining-0.022
 blankets-0.020
rd-0.020
 counters-0.020
 fluids-0.019
Token.
Token ID13
Feature activation+1.07e-02

Pos logit contributions
ked+0.040
 pigeon+0.040
 judged+0.039
 judge+0.038
 scored+0.035
Neg logit contributions
 dining-0.045
rd-0.042
 blankets-0.040
 counters-0.040
zie-0.038
Token
Token ID198
Feature activation+1.65e-02

Pos logit contributions
 judge+0.016
 judged+0.015
 pigeon+0.015
 scored+0.015
 teased+0.014
Neg logit contributions
 dining-0.021
rd-0.019
zie-0.019
 counters-0.019
 blankets-0.018
Token
Token ID198
Feature activation+3.00e-03

Pos logit contributions
 judge+0.017
 judged+0.017
 scored+0.017
 pigeon+0.016
 en+0.016
Neg logit contributions
 dining-0.040
 counters-0.037
zie-0.036
 blankets-0.035
rd-0.035
Token He
Token ID679
Feature activation+8.89e-03

Pos logit contributions
 judge+0.002
 judged+0.002
 en+0.001
 scored+0.001
 teased+0.001
Neg logit contributions
 dining-0.003
 counters-0.002
 blankets-0.002
rd-0.002
zie-0.002
Token promised
Token ID8072
Feature activation+1.29e-03

Pos logit contributions
 judge+0.013
ked+0.012
 pigeon+0.012
 scored+0.011
 bully+0.011
Neg logit contributions
 dining-0.016
zie-0.015
 blankets-0.015
rd-0.015
 counters-0.014
Token himself
Token ID2241
Feature activation+1.85e-02

Pos logit contributions
 judge+0.001
 judged+0.001
 pigeon+0.001
 teased+0.001
 scored+0.001
Neg logit contributions
 dining-0.004
rd-0.003
 counters-0.003
zie-0.003
 blankets-0.003
Token never
Token ID1239
Feature activation+2.22e-02

Pos logit contributions
 judge+0.018
 judged+0.017
 pigeon+0.016
 scored+0.016
 teased+0.016
Neg logit contributions
 dining-0.046
rd-0.042
zie-0.042
 counters-0.041
 blankets-0.041
Token to
Token ID284
Feature activation+3.41e-02

Pos logit contributions
ked+0.020
 en+0.020
 pigeon+0.019
 scored+0.019
 judge+0.019
Neg logit contributions
 dining-0.049
 blankets-0.047
zie-0.046
rd-0.045
 counters-0.044
Token do
Token ID466
Feature activation+6.82e-02

Pos logit contributions
 judged+0.046
 scored+0.045
 judge+0.045
ked+0.045
 pigeon+0.044
Neg logit contributions
 dining-0.068
rd-0.062
zie-0.061
 counters-0.061
 blankets-0.060
Token it
Token ID340
Feature activation+4.21e-02

Pos logit contributions
ked+0.113
 judged+0.111
 judge+0.109
 teased+0.105
 gol+0.104
Neg logit contributions
 dining-0.107
rd-0.104
 counters-0.102
 blankets-0.100
 fluids-0.099
Token again
Token ID757
Feature activation+3.85e-02

Pos logit contributions
 pigeon+0.071
 judge+0.070
 judged+0.070
 scored+0.067
ked+0.067
Neg logit contributions
 dining-0.070
rd-0.065
zie-0.064
 counters-0.061
 blankets-0.061
Token and
Token ID290
Feature activation+3.60e-02

Pos logit contributions
 judge+0.063
 judged+0.062
 pigeon+0.061
ked+0.061
 scored+0.061
Neg logit contributions
 dining-0.068
rd-0.061
zie-0.059
 counters-0.059
 blankets-0.058
Token to
Token ID284
Feature activation+1.78e-02

Pos logit contributions
ked+0.047
 judge+0.047
 pigeon+0.046
 judged+0.046
 scored+0.045
Neg logit contributions
 dining-0.075
zie-0.066
rd-0.066
 counters-0.065
 blankets-0.064
Token always
Token ID1464
Feature activation+1.08e-02

Pos logit contributions
 judge+0.025
 judged+0.025
 pigeon+0.024
 scored+0.024
 teased+0.023
Neg logit contributions
 dining-0.036
rd-0.032
 counters-0.032
zie-0.032
 blankets-0.031
Token learnt
Token ID26338
Feature activation+3.70e-02

Pos logit contributions
ked+0.016
 pigeon+0.016
 judge+0.016
 en+0.015
 judged+0.015
Neg logit contributions
 dining-0.019
zie-0.018
rd-0.017
 blankets-0.017
 counters-0.017
Token his
Token ID465
Feature activation-4.58e-03

Pos logit contributions
ked+0.049
 judged+0.048
 scored+0.047
 pigeon+0.046
 judge+0.046
Neg logit contributions
 dining-0.073
 blankets-0.072
 directions-0.067
 counters-0.066
 fluids-0.066
Token lesson
Token ID11483
Feature activation+2.64e-02

Pos logit contributions
 dining+0.008
rd+0.007
 counters+0.007
zie+0.007
 blankets+0.007
Neg logit contributions
 judged-0.008
ked-0.008
 scored-0.007
 judge-0.007
 teased-0.007
Token not
Token ID407
Feature activation+3.13e-02

Pos logit contributions
 pigeon+0.036
 judged+0.036
ked+0.035
 judge+0.034
 en+0.033
Neg logit contributions
rd-0.051
 dining-0.051
zie-0.048
 counters-0.047
 blankets-0.046
Token to
Token ID284
Feature activation+2.28e-02

Pos logit contributions
ked+0.044
 pigeon+0.039
 scored+0.037
 bully+0.036
 judged+0.035
Neg logit contributions
 blankets-0.051
 directions-0.051
 dining-0.048
 hinges-0.046
zie-0.046
Token do
Token ID466
Feature activation+6.81e-02

Pos logit contributions
 judge+0.031
 judged+0.030
 pigeon+0.030
 scored+0.029
ked+0.029
Neg logit contributions
 dining-0.047
rd-0.042
zie-0.042
 counters-0.042
 blankets-0.041
Token anything
Token ID1997
Feature activation+5.62e-02

Pos logit contributions
ked+0.117
 judged+0.112
 judge+0.111
 pigeon+0.106
 teased+0.106
Neg logit contributions
 dining-0.108
 counters-0.103
 blankets-0.100
 fluids-0.099
rd-0.096
Token dangerous
Token ID4923
Feature activation+3.76e-02

Pos logit contributions
ked+0.097
 pigeon+0.092
 judge+0.086
 judged+0.084
 scored+0.078
Neg logit contributions
 dining-0.089
rd-0.089
 counters-0.081
 fluids-0.078
 blankets-0.078
Token,
Token ID11
Feature activation+2.66e-02

Pos logit contributions
 pigeon+0.056
 judged+0.053
ked+0.053
 judge+0.052
 scored+0.050
Neg logit contributions
 dining-0.063
rd-0.062
zie-0.060
 counters-0.060
 blankets-0.059
Token no
Token ID645
Feature activation-1.26e-03

Pos logit contributions
ked+0.045
 judged+0.039
 judge+0.038
 scored+0.037
 pigeon+0.037
Neg logit contributions
 dining-0.044
 blankets-0.042
 counters-0.040
 fluids-0.038
zie-0.038
Token matter
Token ID2300
Feature activation+2.95e-02

Pos logit contributions
 dining+0.002
 blankets+0.002
zie+0.002
 fluids+0.002
rd+0.002
Neg logit contributions
ked-0.002
 judged-0.002
 scored-0.002
 judge-0.002
 pigeon-0.002
Token,
Token ID11
Feature activation-1.10e-02

Pos logit contributions
 dining+0.022
rd+0.019
 counters+0.019
zie+0.019
 blankets+0.019
Neg logit contributions
 judge-0.016
 judged-0.016
 pigeon-0.015
 scored-0.015
ked-0.015
Token she
Token ID673
Feature activation-3.58e-03

Pos logit contributions
 dining+0.023
 blankets+0.020
 counters+0.020
zie+0.019
rd+0.019
Neg logit contributions
 judge-0.015
 judged-0.015
ked-0.015
 scored-0.014
 pigeon-0.014
Token understood
Token ID7247
Feature activation+4.69e-03

Pos logit contributions
 dining+0.007
zie+0.006
 counters+0.006
 blankets+0.006
rd+0.006
Neg logit contributions
 judge-0.005
 pigeon-0.005
ked-0.005
 judged-0.005
 en-0.005
Token why
Token ID1521
Feature activation+1.61e-02

Pos logit contributions
ked+0.006
 judged+0.006
 judge+0.006
 scored+0.006
 pigeon+0.006
Neg logit contributions
 dining-0.009
zie-0.008
 blankets-0.008
rd-0.008
 counters-0.008
Token the
Token ID262
Feature activation+1.94e-02

Pos logit contributions
ked+0.023
 judged+0.023
 judge+0.023
 pigeon+0.022
 scored+0.022
Neg logit contributions
 dining-0.031
 blankets-0.027
zie-0.027
 counters-0.026
rd-0.026
Token par
Token ID1582
Feature activation+6.80e-02

Pos logit contributions
ked+0.016
 judged+0.016
 teased+0.015
 scored+0.015
 judge+0.013
Neg logit contributions
 dining-0.048
rd-0.045
 blankets-0.043
zie-0.043
 counters-0.042
Tokenrot
Token ID10599
Feature activation+3.77e-02

Pos logit contributions
 teased+0.112
 judged+0.112
 en+0.110
 bored+0.110
 ignored+0.108
Neg logit contributions
rd-0.091
lor-0.086
ator-0.081
 Amen-0.081
 dining-0.080
Token was
Token ID373
Feature activation+2.04e-03

Pos logit contributions
ked+0.066
 gol+0.065
 pigeon+0.062
 judge+0.062
 judged+0.061
Neg logit contributions
zie-0.048
 dining-0.048
rd-0.045
 blankets-0.044
heart-0.043
Token so
Token ID523
Feature activation-1.17e-02

Pos logit contributions
ked+0.003
 scored+0.003
 judge+0.003
 pigeon+0.003
 bully+0.003
Neg logit contributions
 dining-0.004
rd-0.003
zie-0.003
 blankets-0.003
 counters-0.003
Token different
Token ID1180
Feature activation+1.51e-02

Pos logit contributions
 dining+0.020
rd+0.017
 directions+0.016
 counters+0.016
heart+0.016
Neg logit contributions
ked-0.019
 scored-0.019
 judge-0.017
 pigeon-0.017
 bully-0.017
Token from
Token ID422
Feature activation+8.81e-03

Pos logit contributions
 scored+0.024
 judged+0.023
 judge+0.023
ked+0.023
 pigeon+0.023
Neg logit contributions
 dining-0.027
 blankets-0.024
rd-0.024
heart-0.024
zie-0.024
Token lesson
Token ID11483
Feature activation+2.70e-02

Pos logit contributions
ked+0.018
 teased+0.017
 judged+0.017
 pigeon+0.016
 scored+0.016
Neg logit contributions
 dining-0.012
rd-0.010
 blankets-0.010
 fluids-0.010
 directions-0.010
Token to
Token ID284
Feature activation+2.98e-03

Pos logit contributions
 judged+0.030
 pigeon+0.030
 judge+0.029
 en+0.029
ked+0.028
Neg logit contributions
 dining-0.058
rd-0.057
zie-0.054
 blankets-0.053
 counters-0.053
Token never
Token ID1239
Feature activation+2.89e-02

Pos logit contributions
 judged+0.004
 pigeon+0.004
ked+0.004
 judge+0.004
 scored+0.004
Neg logit contributions
 dining-0.006
zie-0.005
rd-0.005
 blankets-0.005
 counters-0.005
Token trust
Token ID3774
Feature activation+2.80e-02

Pos logit contributions
ked+0.046
 pigeon+0.043
Because+0.038
 en+0.038
 judged+0.038
Neg logit contributions
 dining-0.043
 occasions-0.040
zie-0.038
 directions-0.037
 bed-0.037
Token naughty
Token ID45128
Feature activation+2.35e-02

Pos logit contributions
 pigeon+0.044
ked+0.043
 judged+0.043
 scored+0.042
 judge+0.040
Neg logit contributions
zie-0.045
 dining-0.045
 blankets-0.045
 counters-0.043
 directions-0.042
Token mag
Token ID2153
Feature activation+6.79e-02

Pos logit contributions
ked+0.042
 judged+0.041
 pigeon+0.041
 scored+0.040
 judge+0.039
Neg logit contributions
 dining-0.033
 counters-0.032
heart-0.031
rd-0.031
 occasions-0.030
Tokenicians
Token ID5106
Feature activation+4.24e-02

Pos logit contributions
 pigeon+0.133
 judge+0.121
 judged+0.120
ked+0.118
 en+0.117
Neg logit contributions
rd-0.081
 dining-0.076
zie-0.072
 occasions-0.068
 counters-0.067
Token and
Token ID290
Feature activation+3.77e-02

Pos logit contributions
 pigeon+0.067
ked+0.064
 judged+0.064
 judge+0.061
 en+0.061
Neg logit contributions
 dining-0.073
rd-0.066
zie-0.062
 blankets-0.061
 counters-0.060
Token to
Token ID284
Feature activation+2.28e-02

Pos logit contributions
ked+0.053
 pigeon+0.047
 judged+0.045
 scored+0.045
 en+0.044
Neg logit contributions
 dining-0.074
 occasions-0.065
 blankets-0.064
 fluids-0.064
 counters-0.063
Token be
Token ID307
Feature activation+2.34e-02

Pos logit contributions
ked+0.036
 pigeon+0.033
 judged+0.032
 scored+0.031
 teased+0.031
Neg logit contributions
 dining-0.040
zie-0.038
 occasions-0.036
 blankets-0.036
 fluids-0.036
Token careful
Token ID8161
Feature activation+1.50e-02

Pos logit contributions
ked+0.040
 pigeon+0.038
 judge+0.037
 scored+0.035
 gol+0.034
Neg logit contributions
 dining-0.035
rd-0.034
 directions-0.033
 blankets-0.031
 counters-0.031
Token though
Token ID996
Feature activation+1.71e-02

Pos logit contributions
ked+0.058
 pigeon+0.054
 scored+0.053
 judge+0.052
 bully+0.051
Neg logit contributions
 dining-0.047
 blankets-0.046
heart-0.046
 fluids-0.045
 counters-0.043
Token she
Token ID673
Feature activation+2.52e-02

Pos logit contributions
ked+0.024
 judged+0.023
 scored+0.023
 judge+0.023
 pigeon+0.022
Neg logit contributions
 dining-0.034
 blankets-0.031
rd-0.031
 fluids-0.030
zie-0.030
Token didn
Token ID1422
Feature activation+2.58e-02

Pos logit contributions
ked+0.046
 pigeon+0.044
 gol+0.040
 ar+0.040
 en+0.040
Neg logit contributions
 dining-0.036
zie-0.036
rd-0.035
ma-0.031
 counters-0.031
Tokenâ
Token ID22940
Feature activation+1.17e-02

Pos logit contributions
 judged+0.035
 pigeon+0.034
ked+0.034
 judge+0.034
 scored+0.034
Neg logit contributions
 dining-0.052
rd-0.047
zie-0.047
 counters-0.046
 blankets-0.046
Token
Token ID26391
Feature activation+4.99e-02

Pos logit contributions
 pigeon+0.025
 judge+0.025
 judged+0.025
ked+0.024
 en+0.024
Neg logit contributions
 dining-0.014
rd-0.013
zie-0.013
 fluids-0.012
 counters-0.012
Token
Token ID8151
Feature activation+6.70e-02

Pos logit contributions
 judge+0.088
 pigeon+0.087
 judged+0.087
ked+0.085
 en+0.083
Neg logit contributions
 dining-0.081
rd-0.077
 counters-0.070
zie-0.068
 blankets-0.065
Tokent
Token ID83
Feature activation+1.01e-02

Pos logit contributions
 judged+0.099
 judge+0.098
 pigeon+0.097
 ar+0.092
 scored+0.091
Neg logit contributions
 dining-0.123
rd-0.117
zie-0.115
 counters-0.108
 blankets-0.104
Token know
Token ID760
Feature activation+2.01e-02

Pos logit contributions
ked+0.016
 pigeon+0.015
 judged+0.015
 scored+0.015
 teased+0.014
Neg logit contributions
 dining-0.018
zie-0.016
rd-0.016
 counters-0.015
 fluids-0.015
Token how
Token ID703
Feature activation+2.50e-02

Pos logit contributions
 scored+0.032
 judge+0.031
 judged+0.031
ked+0.031
 pigeon+0.031
Neg logit contributions
 dining-0.035
 blankets-0.033
 counters-0.032
zie-0.032
 fluids-0.031
Token the
Token ID262
Feature activation+1.66e-02

Pos logit contributions
 judge+0.030
 judged+0.029
 pigeon+0.029
 scored+0.029
ked+0.029
Neg logit contributions
 dining-0.055
rd-0.050
 counters-0.050
zie-0.049
 blankets-0.049
Token ice
Token ID4771
Feature activation+1.76e-02

Pos logit contributions
 judge+0.026
 pigeon+0.025
 en+0.025
 judged+0.025
 jealous+0.024
Neg logit contributions
 dining-0.032
zie-0.029
rd-0.028
 counters-0.028
 blankets-0.027
Token learned
Token ID4499
Feature activation+3.19e-02

Pos logit contributions
ked+0.030
 pigeon+0.030
 judge+0.029
 judged+0.028
 scored+0.027
Neg logit contributions
 dining-0.052
zie-0.046
 blankets-0.045
rd-0.045
 counters-0.045
Token that
Token ID326
Feature activation+5.85e-03

Pos logit contributions
ked+0.047
 scored+0.044
 judged+0.044
 judge+0.043
 teased+0.042
Neg logit contributions
 dining-0.059
 blankets-0.058
 fluids-0.052
 directions-0.052
 bed-0.051
Token even
Token ID772
Feature activation+3.57e-02

Pos logit contributions
ked+0.009
 judged+0.008
 scored+0.008
 pigeon+0.008
 judge+0.008
Neg logit contributions
 dining-0.011
 blankets-0.010
 counters-0.009
 fluids-0.009
rd-0.009
Token though
Token ID996
Feature activation+2.10e-02

Pos logit contributions
ked+0.064
 pigeon+0.060
 scored+0.057
 judge+0.056
 bully+0.054
Neg logit contributions
 blankets-0.053
 fluids-0.047
 dining-0.047
 directions-0.045
 counters-0.044
Token the
Token ID262
Feature activation+2.25e-02

Pos logit contributions
ked+0.029
 judged+0.029
 judge+0.029
 scored+0.028
 pigeon+0.027
Neg logit contributions
 dining-0.042
 blankets-0.038
rd-0.037
 counters-0.037
 fluids-0.037
Token tree
Token ID5509
Feature activation+6.69e-02

Pos logit contributions
 scored+0.035
ked+0.035
 judged+0.034
 teased+0.034
 judge+0.033
Neg logit contributions
 dining-0.041
 blankets-0.036
 counters-0.036
zie-0.036
rd-0.035
Token was
Token ID373
Feature activation+1.08e-02

Pos logit contributions
ked+0.126
 pigeon+0.116
 gol+0.115
 judged+0.115
 scored+0.115
Neg logit contributions
 dining-0.088
zie-0.086
 blankets-0.084
rd-0.084
 counters-0.083
Token split
Token ID6626
Feature activation+1.39e-02

Pos logit contributions
ked+0.016
 pigeon+0.016
 judge+0.016
 scored+0.015
 judged+0.015
Neg logit contributions
 dining-0.020
 blankets-0.019
zie-0.018
 counters-0.018
 fluids-0.017
Token,
Token ID11
Feature activation+2.24e-02

Pos logit contributions
 judge+0.017
 judged+0.016
 pigeon+0.016
 scored+0.016
 teased+0.016
Neg logit contributions
 dining-0.031
rd-0.028
zie-0.028
 counters-0.028
 blankets-0.027
Token it
Token ID340
Feature activation+3.11e-02

Pos logit contributions
ked+0.034
 judge+0.032
 judged+0.031
 scored+0.031
 pigeon+0.031
Neg logit contributions
 dining-0.041
 blankets-0.040
 counters-0.038
 fluids-0.037
zie-0.036
Token could
Token ID714
Feature activation+2.63e-03

Pos logit contributions
 pigeon+0.054
ked+0.053
 judge+0.052
 gol+0.050
 judged+0.049
Neg logit contributions
 dining-0.047
 blankets-0.046
zie-0.045
 counters-0.043
rd-0.042
Token something
Token ID1223
Feature activation+1.48e-02

Pos logit contributions
ked+0.048
 judged+0.046
 judge+0.046
 scored+0.045
 teased+0.043
Neg logit contributions
 dining-0.062
rd-0.056
 blankets-0.056
 fluids-0.055
zie-0.055
Token new
Token ID649
Feature activation+1.30e-02

Pos logit contributions
ked+0.029
 judged+0.026
 scored+0.026
 judge+0.026
 teased+0.025
Neg logit contributions
 dining-0.023
rd-0.021
 counters-0.020
 blankets-0.020
 fluids-0.019
Token!
Token ID0
Feature activation-6.01e-03

Pos logit contributions
 judged+0.022
ked+0.022
 judge+0.022
 pigeon+0.021
 en+0.021
Neg logit contributions
 dining-0.022
rd-0.021
 occasions-0.019
zie-0.019
 blankets-0.019
Token She
Token ID1375
Feature activation-6.17e-03

Pos logit contributions
 dining+0.010
zie+0.009
rd+0.009
 blankets+0.009
 counters+0.009
Neg logit contributions
 judge-0.010
 pigeon-0.010
 judged-0.010
ked-0.010
 teased-0.010
Token thought
Token ID1807
Feature activation-1.47e-02

Pos logit contributions
 dining+0.012
zie+0.011
rd+0.011
 counters+0.011
 blankets+0.011
Neg logit contributions
ked-0.008
 judge-0.008
 judged-0.008
 pigeon-0.008
 en-0.008
Token av
Token ID1196
Feature activation+6.68e-02

Pos logit contributions
 dining+0.024
 blankets+0.023
 counters+0.022
zie+0.022
 bed+0.021
Neg logit contributions
ked-0.025
 judged-0.024
 judge-0.023
 teased-0.023
 scored-0.023
Tokenoc
Token ID420
Feature activation+3.53e-02

Pos logit contributions
 judge+0.155
 judged+0.153
 pigeon+0.151
 scored+0.149
 teased+0.148
Neg logit contributions
 dining-0.073
rd-0.060
zie-0.059
 counters-0.058
 blankets-0.056
Tokenados
Token ID22484
Feature activation+2.44e-02

Pos logit contributions
 judged+0.072
 scored+0.070
 en+0.070
 judge+0.068
 jealous+0.068
Neg logit contributions
zie-0.043
 dining-0.042
rd-0.041
 counters-0.040
 occasions-0.038
Token were
Token ID547
Feature activation-6.61e-03

Pos logit contributions
ked+0.043
 judge+0.043
 pigeon+0.042
 judged+0.042
 en+0.041
Neg logit contributions
 dining-0.036
 blankets-0.033
zie-0.033
rd-0.031
 counters-0.030
Token the
Token ID262
Feature activation+4.10e-03

Pos logit contributions
 dining+0.013
 blankets+0.011
zie+0.011
 counters+0.011
rd+0.011
Neg logit contributions
ked-0.010
 judge-0.010
 judged-0.010
 scored-0.009
 pigeon-0.009
Token most
Token ID749
Feature activation+1.09e-02

Pos logit contributions
ked+0.008
 scored+0.008
 judged+0.007
 teased+0.007
 pigeon+0.007
Neg logit contributions
 dining-0.006
zie-0.005
 blankets-0.005
 counters-0.005
rd-0.005
Token knew
Token ID2993
Feature activation+9.42e-03

Pos logit contributions
ked+0.017
 pigeon+0.016
 judge+0.015
 en+0.015
 jealous+0.015
Neg logit contributions
zie-0.015
 dining-0.015
rd-0.014
 blankets-0.014
ma-0.013
Token she
Token ID673
Feature activation+1.50e-02

Pos logit contributions
ked+0.013
 scored+0.012
 judged+0.012
 bully+0.012
 judge+0.011
Neg logit contributions
 dining-0.018
zie-0.017
 blankets-0.017
 fluids-0.016
rd-0.016
Token didn
Token ID1422
Feature activation+2.50e-02

Pos logit contributions
 pigeon+0.025
ked+0.024
 en+0.024
 judge+0.024
 judged+0.023
Neg logit contributions
 dining-0.023
zie-0.022
rd-0.022
 counters-0.020
 blankets-0.020
Tokenâ
Token ID22940
Feature activation+3.58e-03

Pos logit contributions
ked+0.037
 judged+0.037
 pigeon+0.037
 judge+0.035
 scored+0.035
Neg logit contributions
 dining-0.045
rd-0.042
zie-0.042
 counters-0.041
 fluids-0.040
Token
Token ID26391
Feature activation+4.32e-02

Pos logit contributions
 judge+0.007
 judged+0.007
 pigeon+0.007
 scored+0.007
 teased+0.007
Neg logit contributions
 dining-0.005
rd-0.004
zie-0.004
 counters-0.004
 fluids-0.004
Token
Token ID8151
Feature activation+6.68e-02

Pos logit contributions
 judge+0.076
 judged+0.075
 pigeon+0.074
 en+0.072
ked+0.071
Neg logit contributions
 dining-0.071
rd-0.066
 counters-0.061
zie-0.061
 blankets-0.058
Tokent
Token ID83
Feature activation-1.71e-03

Pos logit contributions
 judged+0.092
 judge+0.092
 pigeon+0.089
 en+0.085
 scored+0.085
Neg logit contributions
 dining-0.133
rd-0.125
zie-0.123
 counters-0.117
 blankets-0.112
Token have
Token ID423
Feature activation+3.65e-02

Pos logit contributions
 dining+0.003
zie+0.003
rd+0.003
 counters+0.003
 occasions+0.003
Neg logit contributions
ked-0.003
 pigeon-0.002
 judged-0.002
 scored-0.002
 ar-0.002
Token to
Token ID284
Feature activation+1.38e-02

Pos logit contributions
ked+0.052
 pigeon+0.046
 judge+0.043
 bully+0.042
 gol+0.041
Neg logit contributions
 directions-0.063
 occasions-0.063
 dining-0.062
rd-0.062
zie-0.062
Token do
Token ID466
Feature activation+5.61e-02

Pos logit contributions
 judge+0.019
 judged+0.019
 pigeon+0.019
 scored+0.018
ked+0.018
Neg logit contributions
 dining-0.028
rd-0.025
zie-0.025
 counters-0.025
 blankets-0.024
Token everything
Token ID2279
Feature activation+2.95e-02

Pos logit contributions
ked+0.089
 judge+0.085
 judged+0.084
 pigeon+0.084
 scored+0.080
Neg logit contributions
 dining-0.101
rd-0.094
 counters-0.092
 fluids-0.091
 blankets-0.090
Token that
Token ID326
Feature activation-2.62e-03

Pos logit contributions
 dining+0.001
 blankets+0.001
rd+0.001
zie+0.001
 directions+0.001
Neg logit contributions
 judged-0.001
ked-0.001
 scored-0.001
 judge-0.001
 teased-0.001
Token if
Token ID611
Feature activation+8.64e-03

Pos logit contributions
 dining+0.005
 blankets+0.004
rd+0.004
 fluids+0.004
 counters+0.004
Neg logit contributions
ked-0.004
 judge-0.004
 pigeon-0.004
 judged-0.004
 scored-0.004
Token they
Token ID484
Feature activation+1.72e-02

Pos logit contributions
 scored+0.012
ked+0.011
 judge+0.011
 judged+0.011
 pigeon+0.011
Neg logit contributions
 dining-0.017
 blankets-0.015
rd-0.015
 fluids-0.015
 counters-0.015
Token wanted
Token ID2227
Feature activation+4.26e-03

Pos logit contributions
 pigeon+0.025
 en+0.025
 jealous+0.025
ked+0.025
 gol+0.024
Neg logit contributions
 blankets-0.025
 dining-0.024
zie-0.024
 fluids-0.023
rd-0.022
Token to
Token ID284
Feature activation+2.23e-02

Pos logit contributions
ked+0.005
 scored+0.005
 judged+0.005
 judge+0.005
 pigeon+0.004
Neg logit contributions
 blankets-0.009
 dining-0.008
 directions-0.008
 occasions-0.008
 counters-0.008
Token do
Token ID466
Feature activation+6.65e-02

Pos logit contributions
 judged+0.034
ked+0.034
 judge+0.033
 pigeon+0.033
 scored+0.033
Neg logit contributions
 dining-0.040
rd-0.036
zie-0.035
 blankets-0.035
 counters-0.035
Token something
Token ID1223
Feature activation+2.94e-02

Pos logit contributions
 judge+0.112
ked+0.111
 judged+0.109
 pigeon+0.105
 teased+0.104
Neg logit contributions
 dining-0.110
rd-0.103
 blankets-0.101
 counters-0.100
 fluids-0.097
Token,
Token ID11
Feature activation+1.57e-02

Pos logit contributions
ked+0.048
 judge+0.045
 judged+0.045
 pigeon+0.044
 scored+0.043
Neg logit contributions
 dining-0.052
rd-0.048
 blankets-0.047
 counters-0.046
 fluids-0.045
Token they
Token ID484
Feature activation-5.15e-03

Pos logit contributions
ked+0.024
 judged+0.022
 judge+0.022
 scored+0.022
 bully+0.021
Neg logit contributions
 dining-0.028
 blankets-0.027
 fluids-0.026
 counters-0.025
rd-0.025
Token just
Token ID655
Feature activation+1.70e-02

Pos logit contributions
 dining+0.008
zie+0.008
 blankets+0.008
 Amen+0.007
 counters+0.007
Neg logit contributions
 pigeon-0.008
ked-0.008
 judge-0.008
 bully-0.007
 judged-0.007
Token had
Token ID550
Feature activation+2.51e-02

Pos logit contributions
 judge+0.021
 pigeon+0.020
 scored+0.020
 judged+0.020
ked+0.019
Neg logit contributions
 dining-0.032
zie-0.030
 blankets-0.029
rd-0.029
 fluids-0.029
Token he
Token ID339
Feature activation+2.23e-02

Pos logit contributions
ked+0.029
 pigeon+0.028
 judged+0.028
 scored+0.028
 judge+0.028
Neg logit contributions
 dining-0.032
 blankets-0.030
 counters-0.030
rd-0.029
zie-0.028
Token didn
Token ID1422
Feature activation+2.73e-02

Pos logit contributions
 pigeon+0.036
ked+0.032
chard+0.032
 gol+0.031
 en+0.030
Neg logit contributions
rd-0.031
hew-0.031
zie-0.030
ma-0.029
 occasions-0.028
Token't
Token ID470
Feature activation+9.31e-03

Pos logit contributions
 judged+0.038
 pigeon+0.037
ked+0.037
 judge+0.037
 scored+0.036
Neg logit contributions
 dining-0.053
rd-0.049
zie-0.048
 counters-0.048
 blankets-0.047
Token want
Token ID765
Feature activation+1.43e-02

Pos logit contributions
ked+0.014
 pigeon+0.013
 judged+0.013
 scored+0.013
 en+0.012
Neg logit contributions
 dining-0.015
zie-0.015
rd-0.014
 counters-0.014
 blankets-0.014
Token to
Token ID284
Feature activation+3.16e-02

Pos logit contributions
ked+0.018
 pigeon+0.017
 scored+0.016
 judged+0.016
 judge+0.015
Neg logit contributions
 blankets-0.028
 directions-0.027
 occasions-0.026
 counters-0.026
zie-0.025
Token do
Token ID466
Feature activation+6.65e-02

Pos logit contributions
ked+0.046
 judged+0.043
 scored+0.042
 pigeon+0.042
 en+0.040
Neg logit contributions
 dining-0.055
zie-0.053
rd-0.052
 occasions-0.050
 counters-0.050
Token something
Token ID1223
Feature activation+2.68e-02

Pos logit contributions
ked+0.112
 judge+0.109
 pigeon+0.109
 judged+0.108
 teased+0.102
Neg logit contributions
 dining-0.102
rd-0.100
 counters-0.099
 blankets-0.095
 fluids-0.094
Token and
Token ID290
Feature activation+3.94e-02

Pos logit contributions
ked+0.041
 pigeon+0.041
 judge+0.039
 judged+0.038
 scored+0.037
Neg logit contributions
 dining-0.044
 counters-0.043
rd-0.043
 blankets-0.041
 fluids-0.040
Token to
Token ID284
Feature activation+1.57e-02

Pos logit contributions
ked+0.052
 pigeon+0.048
 judged+0.047
 judge+0.047
 scored+0.046
Neg logit contributions
 dining-0.079
 blankets-0.071
rd-0.071
 counters-0.070
zie-0.070
Token take
Token ID1011
Feature activation+9.40e-03

Pos logit contributions
ked+0.023
 judged+0.022
 pigeon+0.022
 scored+0.022
 judge+0.021
Neg logit contributions
 dining-0.029
zie-0.028
rd-0.026
 fluids-0.026
 blankets-0.026
Token care
Token ID1337
Feature activation+1.60e-02

Pos logit contributions
 pigeon+0.016
 judge+0.016
 judged+0.016
ked+0.016
 scored+0.015
Neg logit contributions
 blankets-0.014
rd-0.013
 fluids-0.013
 counters-0.013
 dining-0.013
Token if
Token ID611
Feature activation+3.89e-03

Pos logit contributions
 judge+0.018
 judged+0.017
 pigeon+0.017
 scored+0.017
 teased+0.016
Neg logit contributions
 dining-0.029
rd-0.026
zie-0.026
 counters-0.026
 blankets-0.025
Token you
Token ID345
Feature activation+3.96e-02

Pos logit contributions
ked+0.007
 judged+0.007
 scored+0.007
 judge+0.007
 pigeon+0.007
Neg logit contributions
 dining-0.006
 fluids-0.005
 blankets-0.005
 counters-0.005
rd-0.005
Token don
Token ID836
Feature activation+5.23e-03

Pos logit contributions
ked+0.062
 pigeon+0.061
 judged+0.058
 judge+0.058
 en+0.057
Neg logit contributions
 dining-0.066
 fluids-0.060
zie-0.059
rd-0.057
 counters-0.057
Tokenâ
Token ID22940
Feature activation+8.49e-03

Pos logit contributions
 judge+0.006
 judged+0.005
 pigeon+0.005
 scored+0.005
 teased+0.005
Neg logit contributions
 dining-0.013
rd-0.011
 counters-0.011
zie-0.011
 blankets-0.011
Token
Token ID26391
Feature activation+4.33e-02

Pos logit contributions
 pigeon+0.018
ked+0.017
 judge+0.017
 judged+0.017
 en+0.017
Neg logit contributions
 dining-0.011
rd-0.010
zie-0.010
 fluids-0.010
 occasions-0.009
Token
Token ID8151
Feature activation+6.63e-02

Pos logit contributions
 judge+0.073
 judged+0.072
 pigeon+0.071
ked+0.070
 teased+0.069
Neg logit contributions
 dining-0.075
rd-0.069
 counters-0.065
zie-0.063
 blankets-0.062
Tokent
Token ID83
Feature activation+1.20e-03

Pos logit contributions
 judged+0.097
 judge+0.096
 pigeon+0.094
ked+0.091
 en+0.090
Neg logit contributions
 dining-0.125
rd-0.117
zie-0.114
 counters-0.109
 blankets-0.106
Token know
Token ID760
Feature activation+1.19e-02

Pos logit contributions
ked+0.002
 judged+0.002
 pigeon+0.002
 judge+0.002
 scored+0.002
Neg logit contributions
 dining-0.002
zie-0.002
rd-0.002
 counters-0.002
 fluids-0.002
Token what
Token ID644
Feature activation+2.71e-02

Pos logit contributions
 judge+0.017
 judged+0.016
 scored+0.016
 pigeon+0.016
ked+0.016
Neg logit contributions
 dining-0.024
 counters-0.021
 blankets-0.021
rd-0.021
zie-0.021
Tokenâ
Token ID22940
Feature activation+6.36e-03

Pos logit contributions
ked+0.046
 judged+0.042
 judge+0.041
 pigeon+0.040
 scored+0.040
Neg logit contributions
 dining-0.048
rd-0.044
 counters-0.042
 fluids-0.042
 blankets-0.041
Token
Token ID26391
Feature activation+3.02e-02

Pos logit contributions
ked+0.014
 judged+0.014
 pigeon+0.014
 judge+0.014
 en+0.014
Neg logit contributions
 dining-0.007
rd-0.007
zie-0.006
 fluids-0.006
 occasions-0.006
Token you
Token ID345
Feature activation+2.15e-02

Pos logit contributions
 judged+0.004
 scored+0.004
ked+0.004
 teased+0.004
 judge+0.004
Neg logit contributions
 dining-0.003
zie-0.003
 blankets-0.003
rd-0.003
 counters-0.003
Token count
Token ID954
Feature activation+1.83e-02

Pos logit contributions
ked+0.036
 judged+0.034
 scored+0.033
 en+0.032
 pigeon+0.031
Neg logit contributions
 dining-0.033
zie-0.030
 blankets-0.030
rd-0.029
 Amen-0.029
Token how
Token ID703
Feature activation+1.23e-02

Pos logit contributions
 judged+0.025
 judge+0.024
 scored+0.024
 teased+0.023
ked+0.023
Neg logit contributions
 dining-0.037
 blankets-0.035
 counters-0.034
zie-0.034
rd-0.033
Token many
Token ID867
Feature activation+1.18e-02

Pos logit contributions
 judged+0.015
ked+0.015
 scored+0.014
 judge+0.014
 teased+0.014
Neg logit contributions
 dining-0.027
 blankets-0.024
 counters-0.024
rd-0.024
zie-0.023
Token ost
Token ID23619
Feature activation+3.43e-02

Pos logit contributions
 scored+0.020
ked+0.020
 judge+0.020
 judged+0.020
 teased+0.019
Neg logit contributions
 dining-0.019
 blankets-0.018
 counters-0.017
rd-0.017
 fluids-0.017
Tokenric
Token ID1173
Feature activation+6.63e-02

Pos logit contributions
 jealous+0.081
 bully+0.080
 pigeon+0.080
 judged+0.080
 en+0.080
Neg logit contributions
zie-0.032
 dining-0.030
rd-0.026
 blankets-0.023
 counters-0.022
Tokenhes
Token ID956
Feature activation+1.60e-02

Pos logit contributions
 judge+0.125
 judged+0.123
ked+0.123
 pigeon+0.120
 teased+0.117
Neg logit contributions
 dining-0.095
rd-0.086
zie-0.082
 counters-0.081
 blankets-0.077
Token there
Token ID612
Feature activation+1.29e-03

Pos logit contributions
 judged+0.027
ked+0.027
 judge+0.026
 scored+0.025
 teased+0.025
Neg logit contributions
 dining-0.027
zie-0.024
 blankets-0.024
 counters-0.023
rd-0.023
Token are
Token ID389
Feature activation+8.33e-03

Pos logit contributions
ked+0.003
 judged+0.002
 judge+0.002
 scored+0.002
 pigeon+0.002
Neg logit contributions
 dining-0.002
 blankets-0.002
zie-0.002
 counters-0.001
rd-0.001
Token?"
Token ID1701
Feature activation+2.67e-03

Pos logit contributions
ked+0.014
 judged+0.013
 judge+0.013
 scored+0.013
 pigeon+0.013
Neg logit contributions
 dining-0.014
 blankets-0.013
zie-0.012
 counters-0.012
rd-0.012
Token\n
Token ID198
Feature activation+2.59e-03

Pos logit contributions
 judge+0.004
 judged+0.004
 pigeon+0.004
 scored+0.004
 teased+0.004
Neg logit contributions
 dining-0.005
rd-0.004
 counters-0.004
zie-0.004
 blankets-0.004
Token placed
Token ID4624
Feature activation-5.08e-03

Pos logit contributions
 judge+0.005
 judged+0.005
 pigeon+0.004
 scored+0.004
 teased+0.004
Neg logit contributions
 dining-0.009
rd-0.008
zie-0.008
 counters-0.008
 fluids-0.008
Token it
Token ID340
Feature activation-1.90e-02

Pos logit contributions
 dining+0.012
rd+0.010
zie+0.010
 counters+0.010
 blankets+0.010
Neg logit contributions
 judge-0.006
 judged-0.006
 pigeon-0.006
 scored-0.005
 teased-0.005
Token onto
Token ID4291
Feature activation-2.63e-02

Pos logit contributions
 dining+0.047
 blankets+0.042
 counters+0.042
rd+0.042
zie+0.042
Neg logit contributions
 judge-0.020
 judged-0.019
 pigeon-0.019
 scored-0.017
 teased-0.017
Token the
Token ID262
Feature activation-5.99e-03

Pos logit contributions
 dining+0.066
rd+0.061
zie+0.060
 counters+0.060
 blankets+0.058
Neg logit contributions
 judge-0.025
 judged-0.024
 pigeon-0.023
 scored-0.023
 teased-0.022
Token wheel
Token ID7825
Feature activation-2.61e-02

Pos logit contributions
 dining+0.009
rd+0.009
zie+0.009
 fluids+0.008
 Amen+0.008
Neg logit contributions
 judge-0.011
 pigeon-0.011
 pin-0.010
 en-0.010
 judged-0.010
Tokenbar
Token ID5657
Feature activation-8.26e-02

Pos logit contributions
 dining+0.052
rd+0.047
 counters+0.046
 fluids+0.046
zie+0.046
Neg logit contributions
 judge-0.037
 judged-0.036
 scored-0.035
 pigeon-0.035
ked-0.035
Tokenrow
Token ID808
Feature activation-3.50e-02

Pos logit contributions
 dining+0.113
 Amen+0.093
rd+0.087
 fluids+0.086
 counters+0.081
Neg logit contributions
 at-0.173
ked-0.170
 pin-0.169
 judge-0.164
 scored-0.161
Token.
Token ID13
Feature activation-2.08e-02

Pos logit contributions
 dining+0.075
rd+0.069
 fluids+0.068
zie+0.068
 counters+0.066
Neg logit contributions
ked-0.042
 judge-0.041
 at-0.041
 judged-0.041
 scored-0.041
Token\n
Token ID198
Feature activation-1.87e-02

Pos logit contributions
 dining+0.042
rd+0.038
zie+0.038
 blankets+0.037
 counters+0.037
Neg logit contributions
 judge-0.028
 judged-0.028
 scored-0.028
 en-0.027
 pigeon-0.027
Token\n
Token ID198
Feature activation-2.89e-02

Pos logit contributions
 dining+0.048
 blankets+0.044
zie+0.044
 counters+0.043
rd+0.042
Neg logit contributions
 scored-0.017
 judge-0.016
 judged-0.016
 en-0.015
 teased-0.015
TokenJohn
Token ID7554
Feature activation-1.27e-02

Pos logit contributions
 dining+0.052
zie+0.050
 blankets+0.050
 counters+0.047
 fluids+0.047
Neg logit contributions
 judge-0.046
 scored-0.045
 judged-0.043
 en-0.043
 pigeon-0.043
Token him
Token ID683
Feature activation-7.73e-03

Pos logit contributions
 dining+0.045
 blankets+0.041
 counters+0.040
zie+0.040
rd+0.040
Neg logit contributions
 judge-0.037
 judged-0.036
 pigeon-0.035
 scored-0.034
ked-0.033
Token a
Token ID257
Feature activation-9.07e-03

Pos logit contributions
 dining+0.015
 blankets+0.014
rd+0.014
zie+0.014
 counters+0.013
Neg logit contributions
 judged-0.011
 judge-0.011
 pigeon-0.010
ked-0.010
 teased-0.010
Token big
Token ID1263
Feature activation-1.33e-02

Pos logit contributions
 dining+0.017
 blankets+0.015
rd+0.015
 counters+0.014
zie+0.014
Neg logit contributions
 judged-0.014
ked-0.014
 judge-0.014
 teased-0.014
 pigeon-0.013
Token red
Token ID2266
Feature activation-2.49e-02

Pos logit contributions
 dining+0.025
rd+0.022
zie+0.022
 counters+0.022
 blankets+0.021
Neg logit contributions
 judge-0.021
 judged-0.021
 pigeon-0.020
 scored-0.020
 teased-0.020
Token wheel
Token ID7825
Feature activation-2.42e-02

Pos logit contributions
 dining+0.048
rd+0.043
zie+0.042
 counters+0.042
 blankets+0.042
Neg logit contributions
 judge-0.037
 judged-0.037
 pigeon-0.035
 scored-0.035
 teased-0.035
Tokenbar
Token ID5657
Feature activation-7.47e-02

Pos logit contributions
 dining+0.045
rd+0.040
zie+0.040
 counters+0.040
 blankets+0.039
Neg logit contributions
 judged-0.037
 judge-0.037
 pigeon-0.036
 scored-0.035
 teased-0.035
Tokenrow
Token ID808
Feature activation-3.10e-02

Pos logit contributions
 dining+0.116
 fluids+0.094
rd+0.093
 counters+0.093
zie+0.087
Neg logit contributions
 judge-0.143
ked-0.141
 at-0.139
 pin-0.139
 judged-0.138
Token and
Token ID290
Feature activation+4.98e-03

Pos logit contributions
 dining+0.067
rd+0.060
zie+0.060
 counters+0.060
 blankets+0.058
Neg logit contributions
 judge-0.039
 judged-0.038
 pigeon-0.037
 scored-0.037
 teased-0.036
Token filled
Token ID5901
Feature activation-3.97e-03

Pos logit contributions
 judge+0.006
 judged+0.006
 pigeon+0.006
ked+0.006
 scored+0.006
Neg logit contributions
 dining-0.011
 blankets-0.010
 counters-0.010
zie-0.009
rd-0.009
Token it
Token ID340
Feature activation-1.68e-02

Pos logit contributions
 dining+0.009
zie+0.009
rd+0.009
 counters+0.009
 blankets+0.008
Neg logit contributions
 judge-0.004
 pigeon-0.004
 judged-0.004
 scored-0.004
 teased-0.004
Token with
Token ID351
Feature activation-1.19e-02

Pos logit contributions
 dining+0.042
zie+0.039
rd+0.039
 counters+0.039
 fluids+0.038
Neg logit contributions
 judge-0.015
 judged-0.014
 scored-0.014
 pigeon-0.014
 en-0.014
Token of
Token ID286
Feature activation+8.00e-03

Pos logit contributions
 dining+0.020
 fluids+0.019
rd+0.019
 Amen+0.019
zie+0.019
Neg logit contributions
 at-0.008
 en-0.007
 teased-0.007
 jealous-0.007
ked-0.007
Token soil
Token ID9260
Feature activation-1.50e-02

Pos logit contributions
 pigeon+0.013
 pin+0.013
 judge+0.013
 en+0.013
 judged+0.013
Neg logit contributions
 dining-0.014
rd-0.013
zie-0.013
 Amen-0.012
 counters-0.011
Token into
Token ID656
Feature activation-8.47e-03

Pos logit contributions
 dining+0.036
rd+0.034
zie+0.033
 fluids+0.033
 counters+0.032
Neg logit contributions
 at-0.014
 judge-0.014
 judged-0.014
 en-0.014
 teased-0.014
Token the
Token ID262
Feature activation-4.30e-03

Pos logit contributions
 dining+0.023
rd+0.022
zie+0.022
 Amen+0.021
 fluids+0.021
Neg logit contributions
 en-0.005
 jealous-0.005
 pin-0.005
 judged-0.005
 judge-0.005
Token wheel
Token ID7825
Feature activation-1.91e-02

Pos logit contributions
rd+0.007
 Amen+0.007
zie+0.007
 occasions+0.006
 dining+0.006
Neg logit contributions
 judge-0.007
 pigeon-0.007
 pin-0.007
 jealous-0.007
 en-0.007
Tokenbar
Token ID5657
Feature activation-7.00e-02

Pos logit contributions
 dining+0.040
rd+0.036
 fluids+0.035
 counters+0.035
zie+0.034
Neg logit contributions
 judge-0.026
 judged-0.025
 scored-0.024
ked-0.024
 pigeon-0.024
Tokenrow
Token ID808
Feature activation-2.81e-02

Pos logit contributions
 dining+0.102
 Amen+0.089
rd+0.080
 fluids+0.078
 counters+0.073
Neg logit contributions
 at-0.142
 pin-0.139
ked-0.135
 judge-0.130
 touched-0.130
Token.
Token ID13
Feature activation-2.10e-02

Pos logit contributions
 dining+0.062
rd+0.057
 fluids+0.056
zie+0.056
 Amen+0.055
Neg logit contributions
 at-0.032
ked-0.032
 en-0.032
 judge-0.031
 judged-0.031
Token It
Token ID632
Feature activation-3.74e-03

Pos logit contributions
 dining+0.043
rd+0.039
zie+0.039
 fluids+0.038
 blankets+0.037
Neg logit contributions
 judge-0.028
 judged-0.027
 scored-0.027
 en-0.027
 teased-0.026
Token was
Token ID373
Feature activation-1.59e-02

Pos logit contributions
 dining+0.008
rd+0.007
zie+0.007
 counters+0.007
 blankets+0.007
Neg logit contributions
 judge-0.005
 judged-0.005
 pigeon-0.005
 scored-0.005
 teased-0.005
Token heavy
Token ID4334
Feature activation-1.25e-02

Pos logit contributions
 dining+0.038
rd+0.036
zie+0.035
 counters+0.034
 fluids+0.034
Neg logit contributions
 judged-0.016
 en-0.015
 judge-0.015
 jealous-0.015
 teased-0.015
Token it
Token ID340
Feature activation+5.39e-03

Pos logit contributions
 judge+0.009
 judged+0.009
 pigeon+0.009
 teased+0.008
 scored+0.008
Neg logit contributions
 dining-0.012
 blankets-0.011
 counters-0.011
zie-0.011
rd-0.011
Token away
Token ID1497
Feature activation-3.23e-04

Pos logit contributions
 judge+0.006
 judged+0.006
 pigeon+0.006
 scored+0.006
 teased+0.006
Neg logit contributions
 dining-0.012
rd-0.011
 counters-0.011
zie-0.011
 blankets-0.011
Token in
Token ID287
Feature activation-3.91e-03

Pos logit contributions
 dining+0.001
rd+0.001
 counters+0.001
 blankets+0.001
zie+0.001
Neg logit contributions
 judge-0.001
 judged-0.001
 pigeon-0.001
 scored-0.000
 teased-0.000
Token a
Token ID257
Feature activation-9.01e-03

Pos logit contributions
 dining+0.010
rd+0.009
 counters+0.009
zie+0.009
 fluids+0.009
Neg logit contributions
 judge-0.004
 judged-0.004
 scored-0.003
 pigeon-0.003
 teased-0.003
Token wheel
Token ID7825
Feature activation-1.38e-02

Pos logit contributions
 dining+0.018
zie+0.017
rd+0.017
 counters+0.016
 fluids+0.016
Neg logit contributions
 judge-0.013
 pigeon-0.013
 judged-0.012
 jealous-0.012
 scored-0.012
Tokenbar
Token ID5657
Feature activation-6.94e-02

Pos logit contributions
 dining+0.026
rd+0.023
zie+0.023
 counters+0.023
 blankets+0.022
Neg logit contributions
 judge-0.022
 judged-0.021
 pigeon-0.021
 scored-0.021
 teased-0.020
Tokenrow
Token ID808
Feature activation-2.21e-02

Pos logit contributions
 dining+0.094
 Amen+0.073
 counters+0.070
 fluids+0.069
rd+0.069
Neg logit contributions
 at-0.148
 pin-0.147
ked-0.146
 judge-0.144
 judged-0.141
Token.
Token ID13
Feature activation-4.78e-03

Pos logit contributions
 dining+0.045
rd+0.040
 counters+0.040
zie+0.040
 fluids+0.039
Neg logit contributions
 judge-0.031
 judged-0.030
 scored-0.029
 pigeon-0.029
 teased-0.029
Token\n
Token ID198
Feature activation+1.50e-03

Pos logit contributions
 dining+0.009
rd+0.008
 counters+0.008
zie+0.008
 occasions+0.008
Neg logit contributions
 scored-0.007
 judge-0.007
 judged-0.007
 en-0.007
 teased-0.007
Token\n
Token ID198
Feature activation-4.27e-03

Pos logit contributions
 scored+0.001
 judged+0.001
 judge+0.001
 en+0.001
 at+0.001
Neg logit contributions
 dining-0.004
 counters-0.004
zie-0.004
 blankets-0.004
rd-0.004
TokenOnce
Token ID7454
Feature activation+1.22e-02

Pos logit contributions
 dining+0.007
rd+0.006
zie+0.006
 counters+0.006
 blankets+0.006
Neg logit contributions
 judge-0.008
 judged-0.008
 pigeon-0.008
 scored-0.007
 teased-0.007
Token dolls
Token ID36062
Feature activation+1.15e-02

Pos logit contributions
 blankets+0.001
 dining+0.001
 counters+0.001
 bedroom+0.001
 bed+0.001
Neg logit contributions
 judged-0.001
 pigeon-0.001
 judge-0.001
 scored-0.001
ked-0.001
Token around
Token ID1088
Feature activation+7.22e-03

Pos logit contributions
 pigeon+0.018
 judged+0.017
 judge+0.017
ked+0.016
 en+0.015
Neg logit contributions
 dining-0.021
 blankets-0.020
zie-0.019
 counters-0.019
 bed-0.019
Token in
Token ID287
Feature activation-9.55e-04

Pos logit contributions
 judged+0.010
 pigeon+0.010
ked+0.010
 judge+0.010
 scored+0.010
Neg logit contributions
 dining-0.014
 blankets-0.013
zie-0.013
 counters-0.013
rd-0.012
Token a
Token ID257
Feature activation-2.61e-03

Pos logit contributions
 dining+0.002
rd+0.002
 counters+0.002
zie+0.002
 fluids+0.002
Neg logit contributions
 judge-0.001
 judged-0.001
 en-0.001
 teased-0.001
 jealous-0.001
Token wheel
Token ID7825
Feature activation-1.15e-02

Pos logit contributions
 dining+0.005
zie+0.005
rd+0.005
 counters+0.005
 fluids+0.005
Neg logit contributions
 judge-0.004
 pigeon-0.003
 scored-0.003
 judged-0.003
 en-0.003
Tokenbar
Token ID5657
Feature activation-6.92e-02

Pos logit contributions
 dining+0.021
zie+0.019
rd+0.019
 counters+0.019
 blankets+0.019
Neg logit contributions
 judged-0.017
 judge-0.017
 pigeon-0.017
ked-0.017
 scored-0.016
Tokenrow
Token ID808
Feature activation-2.06e-02

Pos logit contributions
 dining+0.100
 counters+0.079
 fluids+0.079
rd+0.074
 Amen+0.072
Neg logit contributions
 judge-0.141
 pin-0.140
 at-0.139
 judged-0.136
ked-0.136
Token she
Token ID673
Feature activation-5.61e-03

Pos logit contributions
 dining+0.044
rd+0.040
 counters+0.039
zie+0.039
 blankets+0.038
Neg logit contributions
 judge-0.027
 judged-0.026
 pigeon-0.025
 scored-0.025
 teased-0.025
Token had
Token ID550
Feature activation+1.60e-02

Pos logit contributions
 dining+0.011
rd+0.010
zie+0.010
 counters+0.010
 blankets+0.009
Neg logit contributions
 pigeon-0.008
ked-0.007
 judge-0.007
 judged-0.007
 en-0.007
Token found
Token ID1043
Feature activation-1.18e-02

Pos logit contributions
 pigeon+0.024
 judge+0.023
 bully+0.022
 judged+0.022
ked+0.021
Neg logit contributions
 dining-0.030
rd-0.028
 blankets-0.027
 counters-0.027
zie-0.027
Token in
Token ID287
Feature activation+4.09e-03

Pos logit contributions
 dining+0.025
 blankets+0.022
rd+0.022
zie+0.022
 counters+0.022
Neg logit contributions
 judge-0.016
 judged-0.015
 pigeon-0.015
 scored-0.014
 bully-0.014
Token back
Token ID736
Feature activation+1.29e-02

Pos logit contributions
 dining+0.014
rd+0.013
 counters+0.013
zie+0.013
 blankets+0.012
Neg logit contributions
 judge-0.006
 judged-0.006
 pigeon-0.006
 scored-0.005
 teased-0.005
Token with
Token ID351
Feature activation-9.62e-03

Pos logit contributions
 pigeon+0.017
 scored+0.016
 judged+0.016
 judge+0.016
ked+0.016
Neg logit contributions
 dining-0.028
 blankets-0.025
zie-0.024
 counters-0.024
rd-0.024
Token a
Token ID257
Feature activation-4.54e-03

Pos logit contributions
 dining+0.022
 blankets+0.020
rd+0.019
zie+0.019
 counters+0.019
Neg logit contributions
 judge-0.012
 judged-0.011
 pigeon-0.011
 scored-0.011
ked-0.011
Token big
Token ID1263
Feature activation-1.36e-02

Pos logit contributions
 dining+0.009
rd+0.008
 counters+0.008
zie+0.008
 blankets+0.008
Neg logit contributions
 judge-0.007
 judged-0.007
 pigeon-0.006
 scored-0.006
 teased-0.006
Token wheel
Token ID7825
Feature activation-1.54e-02

Pos logit contributions
 dining+0.026
rd+0.023
 counters+0.023
zie+0.023
 blankets+0.022
Neg logit contributions
 judge-0.021
 judged-0.020
 pigeon-0.020
 scored-0.020
ked-0.020
Tokenbar
Token ID5657
Feature activation-6.78e-02

Pos logit contributions
 dining+0.032
rd+0.029
zie+0.029
 counters+0.029
 blankets+0.028
Neg logit contributions
 judge-0.020
 judged-0.020
 pigeon-0.020
 scored-0.019
 teased-0.019
Tokenrow
Token ID808
Feature activation-2.14e-02

Pos logit contributions
 dining+0.085
 fluids+0.063
rd+0.062
 Amen+0.062
 counters+0.061
Neg logit contributions
 pin-0.153
ked-0.153
 judge-0.152
 at-0.150
 judged-0.147
Token.
Token ID13
Feature activation-1.83e-04

Pos logit contributions
 dining+0.048
rd+0.043
zie+0.043
 counters+0.043
 blankets+0.042
Neg logit contributions
 judge-0.026
 judged-0.025
 pigeon-0.024
 scored-0.024
 teased-0.024
Token He
Token ID679
Feature activation+2.58e-06

Pos logit contributions
 dining+0.000
 counters+0.000
zie+0.000
 blankets+0.000
rd+0.000
Neg logit contributions
 judge-0.000
 pigeon-0.000
 judged-0.000
ked-0.000
 teased-0.000
Token asked
Token ID1965
Feature activation-1.51e-02

Feature not active on prev token; no predictions were affected.
Token the
Token ID262
Feature activation+1.52e-02

Pos logit contributions
 dining+0.030
zie+0.028
 blankets+0.027
rd+0.027
 counters+0.026
Neg logit contributions
 judge-0.021
 judged-0.021
 pigeon-0.021
 scored-0.020
ked-0.020
Token.
Token ID13
Feature activation-7.72e-03

Pos logit contributions
 judge+0.002
 judged+0.002
 pigeon+0.002
ked+0.002
 bully+0.002
Neg logit contributions
 dining-0.002
 blankets-0.002
 counters-0.002
rd-0.002
zie-0.002
Token He
Token ID679
Feature activation+1.35e-03

Pos logit contributions
 dining+0.014
rd+0.013
 counters+0.013
zie+0.012
 blankets+0.012
Neg logit contributions
 judge-0.012
 judged-0.012
 pigeon-0.012
 scored-0.012
 teased-0.012
Token found
Token ID1043
Feature activation-3.01e-03

Pos logit contributions
 judge+0.002
 pigeon+0.002
 bully+0.002
ked+0.002
 judged+0.002
Neg logit contributions
 dining-0.003
zie-0.002
rd-0.002
 blankets-0.002
 counters-0.002
Token a
Token ID257
Feature activation-3.07e-03

Pos logit contributions
 dining+0.006
 counters+0.006
 blankets+0.006
zie+0.006
rd+0.006
Neg logit contributions
 judge-0.004
 judged-0.004
 pigeon-0.004
 scored-0.004
 teased-0.004
Token wheel
Token ID7825
Feature activation-1.35e-02

Pos logit contributions
 dining+0.006
zie+0.006
rd+0.006
 counters+0.006
 fluids+0.005
Neg logit contributions
 judge-0.004
 judged-0.004
 pigeon-0.004
 scored-0.004
 teased-0.004
Tokenbar
Token ID5657
Feature activation-6.60e-02

Pos logit contributions
 dining+0.030
zie+0.028
rd+0.027
 counters+0.027
 blankets+0.027
Neg logit contributions
 judged-0.016
 judge-0.015
 pigeon-0.015
 scored-0.014
 teased-0.014
Tokenrow
Token ID808
Feature activation-1.99e-02

Pos logit contributions
 dining+0.088
rd+0.067
 counters+0.066
 Amen+0.066
 fluids+0.066
Neg logit contributions
ked-0.143
 judge-0.143
 pin-0.142
 at-0.139
 scored-0.139
Token,
Token ID11
Feature activation+1.03e-02

Pos logit contributions
 dining+0.047
rd+0.043
zie+0.043
 counters+0.043
 blankets+0.042
Neg logit contributions
 judge-0.021
 judged-0.020
 pigeon-0.020
 scored-0.019
 teased-0.019
Token a
Token ID257
Feature activation-5.28e-03

Pos logit contributions
 judge+0.014
 judged+0.013
 pigeon+0.013
 bully+0.012
 scored+0.012
Neg logit contributions
 dining-0.022
 blankets-0.020
 counters-0.020
zie-0.019
rd-0.019
Token tr
Token ID491
Feature activation+4.04e-03

Pos logit contributions
 dining+0.010
rd+0.009
zie+0.009
 counters+0.009
 fluids+0.009
Neg logit contributions
 judge-0.008
 pigeon-0.008
 judged-0.007
 scored-0.007
 en-0.007
Tokenamp
Token ID696
Feature activation-2.93e-03

Pos logit contributions
 judge+0.008
 judged+0.008
 scored+0.008
 pigeon+0.007
 teased+0.007
Neg logit contributions
 dining-0.006
 counters-0.005
rd-0.005
zie-0.005
 blankets-0.005
Token a
Token ID257
Feature activation-4.59e-03

Pos logit contributions
 judge+0.010
 judged+0.010
 scored+0.010
 teased+0.009
 pigeon+0.009
Neg logit contributions
 dining-0.015
 blankets-0.015
 counters-0.014
rd-0.014
zie-0.014
Token big
Token ID1263
Feature activation-1.21e-02

Pos logit contributions
 dining+0.009
rd+0.008
zie+0.008
 counters+0.008
 blankets+0.008
Neg logit contributions
 judge-0.007
 judged-0.007
 pigeon-0.006
 scored-0.006
 teased-0.006
Token,
Token ID11
Feature activation+5.37e-03

Pos logit contributions
 dining+0.021
rd+0.019
zie+0.018
 counters+0.018
 blankets+0.018
Neg logit contributions
 judge-0.021
 judged-0.020
 pigeon-0.020
 scored-0.020
 teased-0.019
Token spinning
Token ID19493
Feature activation+4.48e-03

Pos logit contributions
 judge+0.008
 judged+0.008
ked+0.008
 scored+0.008
 teased+0.008
Neg logit contributions
 dining-0.010
 counters-0.009
 blankets-0.009
rd-0.009
zie-0.008
Token wheel
Token ID7825
Feature activation-1.17e-02

Pos logit contributions
 judged+0.007
 judge+0.007
 teased+0.007
 scored+0.007
ked+0.007
Neg logit contributions
 dining-0.008
 blankets-0.007
zie-0.007
 counters-0.007
rd-0.007
Tokenbar
Token ID5657
Feature activation-6.34e-02

Pos logit contributions
 dining+0.024
zie+0.022
rd+0.022
 counters+0.021
 blankets+0.021
Neg logit contributions
 judge-0.016
 judged-0.016
 pigeon-0.015
 scored-0.015
 teased-0.014
Tokenrow
Token ID808
Feature activation-1.83e-02

Pos logit contributions
 dining+0.081
rd+0.060
 fluids+0.059
 counters+0.058
 Amen+0.057
Neg logit contributions
ked-0.142
 judge-0.140
 pin-0.139
 at-0.139
 judged-0.137
Token and
Token ID290
Feature activation+1.23e-02

Pos logit contributions
 dining+0.042
rd+0.038
 counters+0.038
zie+0.038
 blankets+0.038
Neg logit contributions
 judge-0.021
 judged-0.020
 pigeon-0.019
 scored-0.019
 teased-0.019
Token put
Token ID1234
Feature activation-5.43e-03

Pos logit contributions
 judge+0.016
 pigeon+0.015
 judged+0.015
 bully+0.015
ked+0.014
Neg logit contributions
 dining-0.026
 blankets-0.024
 counters-0.024
zie-0.023
rd-0.022
Token the
Token ID262
Feature activation-9.93e-03

Pos logit contributions
 dining+0.010
 blankets+0.010
 counters+0.010
zie+0.010
 fluids+0.009
Neg logit contributions
 judge-0.008
 judged-0.008
 pigeon-0.007
 scored-0.007
 bully-0.007
Token pumpkin
Token ID30089
Feature activation-1.06e-02

Pos logit contributions
 dining+0.019
 counters+0.017
rd+0.016
zie+0.016
 blankets+0.016
Neg logit contributions
 judge-0.015
 judged-0.015
 scored-0.015
 teased-0.015
ked-0.015
Token He
Token ID679
Feature activation+3.04e-03

Pos logit contributions
 dining+0.001
rd+0.001
 counters+0.001
zie+0.001
 blankets+0.001
Neg logit contributions
 judge-0.001
 judged-0.001
 pigeon-0.001
 scored-0.001
 teased-0.001
Token jumped
Token ID11687
Feature activation+5.94e-03

Pos logit contributions
 judge+0.004
 pigeon+0.004
ked+0.004
 bully+0.004
 gol+0.004
Neg logit contributions
 dining-0.005
 blankets-0.005
zie-0.005
 counters-0.005
rd-0.005
Token onto
Token ID4291
Feature activation+9.82e-03

Pos logit contributions
 judge+0.009
 judged+0.009
 pigeon+0.008
 scored+0.008
 teased+0.008
Neg logit contributions
 dining-0.012
 blankets-0.011
 counters-0.011
zie-0.011
rd-0.010
Token the
Token ID262
Feature activation-1.56e-03

Pos logit contributions
 judge+0.012
 judged+0.011
 pigeon+0.011
 scored+0.011
 teased+0.011
Neg logit contributions
 dining-0.022
 blankets-0.020
zie-0.020
 counters-0.020
 fluids-0.020
Token wheel
Token ID7825
Feature activation-5.44e-03

Pos logit contributions
 dining+0.003
rd+0.003
zie+0.002
 fluids+0.002
 occasions+0.002
Neg logit contributions
 judge-0.003
 pigeon-0.003
 pin-0.002
 en-0.002
 judged-0.002
Tokenbar
Token ID5657
Feature activation-6.34e-02

Pos logit contributions
 dining+0.012
zie+0.011
 blankets+0.011
rd+0.010
 counters+0.010
Neg logit contributions
 judged-0.007
 judge-0.007
 pigeon-0.006
 scored-0.006
 en-0.006
Tokenrow
Token ID808
Feature activation-1.10e-02

Pos logit contributions
 dining+0.080
 counters+0.058
rd+0.058
 fluids+0.057
 Amen+0.057
Neg logit contributions
 judge-0.141
ked-0.140
 pin-0.138
 at-0.138
 judged-0.137
Token and
Token ID290
Feature activation+6.59e-03

Pos logit contributions
 dining+0.025
zie+0.022
rd+0.022
 counters+0.022
 blankets+0.022
Neg logit contributions
 judge-0.013
 judged-0.013
 pigeon-0.012
 scored-0.012
 teased-0.012
Token started
Token ID2067
Feature activation-4.47e-03

Pos logit contributions
 judge+0.010
 pigeon+0.009
 judged+0.009
 bully+0.008
 en+0.008
Neg logit contributions
 dining-0.013
 blankets-0.013
 counters-0.012
zie-0.012
 bed-0.011
Token rolling
Token ID10708
Feature activation-4.61e-03

Pos logit contributions
 dining+0.010
 blankets+0.010
 counters+0.010
rd+0.009
zie+0.009
Neg logit contributions
 judge-0.005
 pigeon-0.005
 judged-0.005
 scored-0.004
 bully-0.004
Token around
Token ID1088
Feature activation+3.77e-04

Pos logit contributions
 dining+0.010
rd+0.009
zie+0.009
 counters+0.009
 blankets+0.009
Neg logit contributions
 judge-0.005
 judged-0.005
 pigeon-0.005
 scored-0.005
 teased-0.005
Token harder
Token ID7069
Feature activation+2.28e-03

Pos logit contributions
 judge+0.004
 judged+0.004
 pigeon+0.004
ked+0.004
 scored+0.004
Neg logit contributions
 dining-0.007
 counters-0.007
zie-0.007
 blankets-0.007
rd-0.007
Token to
Token ID284
Feature activation-1.05e-02

Pos logit contributions
 judge+0.003
 judged+0.003
 pigeon+0.003
 scored+0.003
 teased+0.003
Neg logit contributions
 dining-0.005
rd-0.004
zie-0.004
 counters-0.004
 blankets-0.004
Token push
Token ID4574
Feature activation-2.49e-03

Pos logit contributions
 dining+0.020
zie+0.019
 blankets+0.018
 counters+0.018
 fluids+0.017
Neg logit contributions
ked-0.015
 judged-0.014
 judge-0.014
 pigeon-0.014
 scored-0.014
Token the
Token ID262
Feature activation-5.68e-03

Pos logit contributions
 dining+0.005
zie+0.005
 counters+0.005
rd+0.005
 blankets+0.005
Neg logit contributions
 judge-0.003
 judged-0.003
 pigeon-0.003
 scored-0.003
 teased-0.003
Token wheel
Token ID7825
Feature activation-3.53e-03

Pos logit contributions
 dining+0.011
rd+0.009
zie+0.009
 counters+0.009
 blankets+0.009
Neg logit contributions
 judge-0.009
 judged-0.009
 pigeon-0.008
 scored-0.008
 teased-0.008
Tokenbar
Token ID5657
Feature activation-6.28e-02

Pos logit contributions
 dining+0.006
zie+0.006
 counters+0.006
 blankets+0.006
rd+0.006
Neg logit contributions
 judge-0.005
 judged-0.005
 pigeon-0.005
 scored-0.005
 teased-0.005
Tokenrow
Token ID808
Feature activation-7.40e-03

Pos logit contributions
 dining+0.104
rd+0.086
 counters+0.084
 fluids+0.084
 blankets+0.080
Neg logit contributions
 judge-0.115
 judged-0.112
 scored-0.112
ked-0.111
 pin-0.111
Token.
Token ID13
Feature activation-9.33e-03

Pos logit contributions
 dining+0.016
rd+0.014
zie+0.014
 counters+0.014
 blankets+0.014
Neg logit contributions
 judge-0.010
 judged-0.009
 scored-0.009
 pigeon-0.009
 teased-0.009
Token But
Token ID887
Feature activation-1.75e-02

Pos logit contributions
 dining+0.018
rd+0.016
zie+0.016
 counters+0.016
 blankets+0.016
Neg logit contributions
 judge-0.014
 judged-0.014
 pigeon-0.013
 scored-0.013
 teased-0.013
Token her
Token ID607
Feature activation+3.47e-03

Pos logit contributions
 dining+0.034
zie+0.031
 blankets+0.030
 counters+0.029
rd+0.028
Neg logit contributions
 judged-0.025
 judge-0.025
 pigeon-0.024
ked-0.024
 scored-0.023
Token backyard
Token ID24296
Feature activation+1.92e-02

Pos logit contributions
 judged+0.005
ked+0.005
 scored+0.005
 judge+0.005
 teased+0.005
Neg logit contributions
 dining-0.006
 blankets-0.006
 counters-0.006
rd-0.006
zie-0.006
Token.
Token ID13
Feature activation-9.42e-03

Pos logit contributions
 at+0.013
 judge+0.012
 en+0.012
 judged+0.012
 pigeon+0.012
Neg logit contributions
 dining-0.026
zie-0.024
rd-0.024
 counters-0.024
 fluids-0.023
Token She
Token ID1375
Feature activation+1.14e-03

Pos logit contributions
 dining+0.018
rd+0.016
zie+0.016
 counters+0.016
 blankets+0.016
Neg logit contributions
 judge-0.014
 judged-0.014
 scored-0.013
 pigeon-0.013
 en-0.013
Token saw
Token ID2497
Feature activation-1.94e-03

Pos logit contributions
 judge+0.002
 pigeon+0.002
ked+0.002
 bully+0.002
 judged+0.002
Neg logit contributions
 dining-0.002
zie-0.002
rd-0.002
 blankets-0.002
 counters-0.002
Token a
Token ID257
Feature activation+2.82e-03

Pos logit contributions
 dining+0.004
 counters+0.004
 blankets+0.004
zie+0.004
rd+0.004
Neg logit contributions
 judge-0.003
 judged-0.003
 scored-0.003
 pigeon-0.002
ked-0.002
Token wheel
Token ID7825
Feature activation-9.69e-03

Pos logit contributions
 judge+0.004
 pigeon+0.004
 judged+0.004
 scored+0.004
 en+0.004
Neg logit contributions
 dining-0.006
zie-0.005
rd-0.005
 counters-0.005
 fluids-0.005
Tokenbar
Token ID5657
Feature activation-6.08e-02

Pos logit contributions
 dining+0.022
rd+0.020
zie+0.020
 counters+0.020
 blankets+0.019
Neg logit contributions
 judge-0.011
 judged-0.011
 pigeon-0.011
 scored-0.011
 teased-0.010
Tokenrow
Token ID808
Feature activation-1.73e-02

Pos logit contributions
 dining+0.088
 fluids+0.070
 Amen+0.069
rd+0.066
 counters+0.064
Neg logit contributions
 at-0.126
ked-0.126
 pin-0.125
 judge-0.124
 teased-0.119
Token,
Token ID11
Feature activation+1.15e-02

Pos logit contributions
 dining+0.041
rd+0.037
zie+0.036
 fluids+0.036
 counters+0.036
Neg logit contributions
 judge-0.019
 judged-0.018
 pigeon-0.018
 scored-0.017
ked-0.017
Token a
Token ID257
Feature activation-4.49e-03

Pos logit contributions
 judge+0.015
 judged+0.014
 pigeon+0.014
 scored+0.014
ked+0.013
Neg logit contributions
 dining-0.025
 blankets-0.023
 counters-0.023
zie-0.022
rd-0.022
Token large
Token ID1588
Feature activation-1.08e-02

Pos logit contributions
 dining+0.009
rd+0.008
zie+0.008
 counters+0.008
 fluids+0.008
Neg logit contributions
 judge-0.007
 pigeon-0.006
 judged-0.006
 scored-0.006
 en-0.006
Token sp
Token ID599
Feature activation-1.48e-02

Pos logit contributions
 dining+0.020
rd+0.019
zie+0.018
 counters+0.018
 fluids+0.018
Neg logit contributions
 judge-0.017
 pigeon-0.016
 judged-0.016
 scored-0.015
 pin-0.015
Token filled
Token ID5901
Feature activation+1.26e-02

Pos logit contributions
 judge+0.003
 pigeon+0.002
 judged+0.002
 en+0.002
ked+0.002
Neg logit contributions
 dining-0.004
zie-0.004
 blankets-0.004
 counters-0.004
rd-0.003
Token up
Token ID510
Feature activation+2.72e-02

Pos logit contributions
 judge+0.015
 judged+0.014
 pigeon+0.014
ked+0.013
 scored+0.013
Neg logit contributions
 dining-0.029
 counters-0.026
 blankets-0.026
rd-0.026
zie-0.025
Token lots
Token ID6041
Feature activation+3.66e-04

Pos logit contributions
 judge+0.041
 judged+0.040
ked+0.038
 pigeon+0.038
 scored+0.036
Neg logit contributions
 dining-0.055
 blankets-0.050
 counters-0.048
rd-0.046
zie-0.045
Token of
Token ID286
Feature activation+1.28e-02

Pos logit contributions
 judge+0.001
 judged+0.001
 pigeon+0.001
 scored+0.000
 teased+0.000
Neg logit contributions
 dining-0.001
rd-0.001
 counters-0.001
zie-0.001
 blankets-0.001
Token wheel
Token ID7825
Feature activation-1.24e-02

Pos logit contributions
 en+0.020
 judge+0.020
 judged+0.019
 jealous+0.019
 pigeon+0.019
Neg logit contributions
 dining-0.024
zie-0.022
rd-0.021
 counters-0.020
 fluids-0.020
Tokenbar
Token ID5657
Feature activation-6.01e-02

Pos logit contributions
 dining+0.031
rd+0.028
 fluids+0.028
zie+0.027
 counters+0.027
Neg logit contributions
 judge-0.012
 pigeon-0.011
 judged-0.011
 scored-0.011
ked-0.011
Tokenrows
Token ID8516
Feature activation+2.22e-03

Pos logit contributions
 dining+0.096
 Amen+0.082
 fluids+0.076
rd+0.075
jar+0.072
Neg logit contributions
 at-0.118
ked-0.115
 pin-0.113
 teased-0.109
 judge-0.108
Token with
Token ID351
Feature activation-1.51e-03

Pos logit contributions
 judge+0.003
 judged+0.003
 pigeon+0.003
 scored+0.002
 bully+0.002
Neg logit contributions
 dining-0.005
rd-0.004
 blankets-0.004
zie-0.004
 counters-0.004
Token the
Token ID262
Feature activation+3.44e-03

Pos logit contributions
 dining+0.003
 blankets+0.003
 counters+0.003
zie+0.003
rd+0.003
Neg logit contributions
 judge-0.002
 judged-0.002
 pigeon-0.002
 scored-0.002
ked-0.002
Token waste
Token ID7030
Feature activation+5.90e-03

Pos logit contributions
 judged+0.006
 judge+0.006
 scored+0.006
ked+0.006
 teased+0.006
Neg logit contributions
 dining-0.006
 counters-0.005
 blankets-0.005
rd-0.005
zie-0.005
Token and
Token ID290
Feature activation+2.14e-03

Pos logit contributions
 judged+0.009
 judge+0.008
ked+0.008
 scored+0.008
 teased+0.008
Neg logit contributions
 dining-0.012
 blankets-0.010
 counters-0.010
zie-0.010
rd-0.010
Token"
Token ID1
Feature activation-1.19e-02

Pos logit contributions
 dining+0.038
zie+0.036
 counters+0.034
 blankets+0.034
rd+0.033
Neg logit contributions
 judge-0.029
 en-0.027
 judged-0.027
 scored-0.027
 pigeon-0.026
TokenGood
Token ID10248
Feature activation-2.04e-02

Pos logit contributions
 dining+0.023
rd+0.021
 counters+0.021
zie+0.021
 blankets+0.020
Neg logit contributions
 judge-0.018
 judged-0.018
 pigeon-0.017
 scored-0.017
 teased-0.017
Token job
Token ID1693
Feature activation-8.36e-03

Pos logit contributions
 dining+0.041
 counters+0.037
zie+0.036
rd+0.036
 blankets+0.035
Neg logit contributions
 judge-0.031
 pigeon-0.030
 judged-0.029
 scored-0.028
 teased-0.028
Token,
Token ID11
Feature activation-2.47e-02

Pos logit contributions
 dining+0.018
zie+0.016
 counters+0.016
rd+0.016
 fluids+0.016
Neg logit contributions
 judge-0.011
 scored-0.010
 teased-0.010
 pigeon-0.010
 judged-0.010
Token O
Token ID440
Feature activation-2.04e-02

Pos logit contributions
 dining+0.055
rd+0.050
 counters+0.050
zie+0.047
 occasions+0.047
Neg logit contributions
 judge-0.032
 pigeon-0.030
 en-0.030
 judged-0.030
 jealous-0.029
Tokenll
Token ID297
Feature activation-5.87e-02

Pos logit contributions
 dining+0.040
 blankets+0.037
 counters+0.036
 fluids+0.034
 occasions+0.034
Neg logit contributions
 judge-0.030
 pigeon-0.029
 scored-0.029
 pin-0.029
 judged-0.029
Tokenie
Token ID494
Feature activation-3.10e-02

Pos logit contributions
 dining+0.131
 counters+0.117
 bedrooms+0.115
rd+0.115
 fluids+0.110
Neg logit contributions
 pigeon-0.079
 at-0.079
 pin-0.079
 judge-0.078
 ar-0.073
Token!"
Token ID2474
Feature activation-5.07e-03

Pos logit contributions
 dining+0.062
 counters+0.055
 fluids+0.055
 bedrooms+0.054
 occasions+0.052
Neg logit contributions
 at-0.048
 judge-0.047
 pigeon-0.046
 pin-0.046
 teased-0.044
Token his
Token ID465
Feature activation+3.39e-03

Pos logit contributions
 dining+0.011
rd+0.010
zie+0.010
 counters+0.009
 blankets+0.009
Neg logit contributions
 judge-0.007
 judged-0.007
 pigeon-0.006
 scored-0.006
ked-0.006
Token mother
Token ID2802
Feature activation-5.05e-03

Pos logit contributions
 judge+0.005
 judged+0.005
 pigeon+0.005
 teased+0.004
 en+0.004
Neg logit contributions
 dining-0.007
 counters-0.006
rd-0.006
zie-0.006
 fluids-0.006
Token praised
Token ID15342
Feature activation-4.18e-02

Pos logit contributions
 dining+0.012
rd+0.011
 counters+0.011
zie+0.011
 blankets+0.010
Neg logit contributions
 judge-0.006
 judged-0.006
 pigeon-0.005
 scored-0.005
 teased-0.005
TokenFinally
Token ID11158
Feature activation-2.39e-02

Pos logit contributions
 dining+0.043
zie+0.041
 counters+0.039
 blankets+0.038
rd+0.037
Neg logit contributions
 judge-0.036
 scored-0.035
 judged-0.035
 pigeon-0.034
 teased-0.034
Token,
Token ID11
Feature activation-1.30e-02

Pos logit contributions
 dining+0.053
 counters+0.048
zie+0.048
rd+0.048
 blankets+0.046
Neg logit contributions
 judge-0.029
 judged-0.028
 scored-0.028
 teased-0.027
 pigeon-0.027
Token the
Token ID262
Feature activation-8.14e-04

Pos logit contributions
 dining+0.025
 blankets+0.021
 counters+0.021
zie+0.021
rd+0.021
Neg logit contributions
 judge-0.021
 judged-0.020
ked-0.020
 pigeon-0.020
 scored-0.019
Token sun
Token ID4252
Feature activation-8.85e-03

Pos logit contributions
 dining+0.002
zie+0.001
rd+0.001
 counters+0.001
 fluids+0.001
Neg logit contributions
 judge-0.001
 judged-0.001
 scored-0.001
 pigeon-0.001
 en-0.001
Token began
Token ID2540
Feature activation-2.62e-02

Pos logit contributions
 dining+0.019
rd+0.018
 counters+0.017
zie+0.017
 fluids+0.017
Neg logit contributions
 judge-0.011
 judged-0.011
 pigeon-0.011
 scored-0.011
 teased-0.010
Token to
Token ID284
Feature activation-5.85e-02

Pos logit contributions
 dining+0.070
 blankets+0.064
zie+0.064
 counters+0.063
rd+0.063
Neg logit contributions
 judge-0.021
 pigeon-0.020
 judged-0.020
 scored-0.018
 bully-0.018
Token set
Token ID900
Feature activation-3.31e-02

Pos logit contributions
 dining+0.118
rd+0.112
 counters+0.105
zie+0.103
 fluids+0.102
Neg logit contributions
 judge-0.082
 judged-0.079
 pigeon-0.077
 en-0.075
 scored-0.075
Token and
Token ID290
Feature activation-2.81e-02

Pos logit contributions
 dining+0.078
rd+0.071
 fluids+0.070
zie+0.069
 counters+0.069
Neg logit contributions
 judge-0.035
 at-0.034
 pigeon-0.033
 judged-0.032
 en-0.032
Token Emma
Token ID18966
Feature activation-4.19e-03

Pos logit contributions
 dining+0.058
rd+0.052
zie+0.052
 counters+0.052
 blankets+0.050
Neg logit contributions
 judge-0.039
 judged-0.038
 pigeon-0.037
 scored-0.036
 teased-0.036
Tokenâ
Token ID22940
Feature activation-1.96e-02

Pos logit contributions
 dining+0.009
rd+0.008
zie+0.008
 counters+0.008
 blankets+0.007
Neg logit contributions
 judge-0.006
 pigeon-0.006
 judged-0.006
ked-0.005
 en-0.005
Token
Token ID26391
Feature activation+1.57e-02

Pos logit contributions
 dining+0.027
rd+0.023
 counters+0.023
zie+0.023
 blankets+0.022
Neg logit contributions
 judge-0.040
 judged-0.040
 pigeon-0.039
 scored-0.039
 teased-0.038
Token turned
Token ID2900
Feature activation-8.11e-03

Pos logit contributions
 dining+0.004
 counters+0.003
zie+0.003
rd+0.003
 blankets+0.003
Neg logit contributions
 judge-0.003
 judged-0.003
 pigeon-0.003
 scored-0.003
 teased-0.003
Token into
Token ID656
Feature activation-1.70e-02

Pos logit contributions
 dining+0.016
 blankets+0.015
 counters+0.015
zie+0.015
rd+0.014
Neg logit contributions
 judge-0.012
 pigeon-0.011
 judged-0.011
 scored-0.010
 bully-0.010
Token a
Token ID257
Feature activation-1.95e-02

Pos logit contributions
 dining+0.043
rd+0.040
zie+0.040
 counters+0.039
 fluids+0.038
Neg logit contributions
 judge-0.015
 judged-0.015
 pigeon-0.014
 scored-0.014
 en-0.014
Token giant
Token ID6175
Feature activation-2.50e-02

Pos logit contributions
 dining+0.039
rd+0.037
zie+0.037
 counters+0.036
 fluids+0.035
Neg logit contributions
 judge-0.027
 pigeon-0.026
 judged-0.026
 scored-0.025
 jealous-0.025
Token sand
Token ID6450
Feature activation-2.53e-02

Pos logit contributions
 dining+0.046
rd+0.041
zie+0.041
 counters+0.041
 fluids+0.040
Neg logit contributions
 judge-0.039
 pigeon-0.038
 judged-0.038
 scored-0.037
 teased-0.036
Tokenstorm
Token ID12135
Feature activation-5.84e-02

Pos logit contributions
 dining+0.044
rd+0.038
zie+0.038
 counters+0.038
 blankets+0.037
Neg logit contributions
 judge-0.043
 judged-0.043
 pigeon-0.042
 scored-0.041
 teased-0.041
Token.
Token ID13
Feature activation-2.09e-02

Pos logit contributions
 dining+0.111
rd+0.099
 counters+0.097
zie+0.097
 fluids+0.094
Neg logit contributions
 judge-0.089
 judged-0.087
 pigeon-0.086
 scored-0.086
 teased-0.085
Token
Token ID198
Feature activation-2.22e-02

Pos logit contributions
 dining+0.042
rd+0.038
zie+0.037
 counters+0.037
 blankets+0.037
Neg logit contributions
 judge-0.030
 judged-0.029
 scored-0.028
 pigeon-0.028
 teased-0.028
Token
Token ID198
Feature activation-2.77e-02

Pos logit contributions
 dining+0.059
zie+0.054
 counters+0.053
 blankets+0.053
rd+0.053
Neg logit contributions
 scored-0.018
 judge-0.017
 judged-0.017
 en-0.017
 jealous-0.016
TokenThe
Token ID464
Feature activation-7.07e-03

Pos logit contributions
 dining+0.053
zie+0.049
 blankets+0.048
 fluids+0.046
 counters+0.046
Neg logit contributions
 scored-0.043
 judge-0.043
 judged-0.041
 en-0.040
 jealous-0.040
Token sand
Token ID6450
Feature activation+5.43e-03

Pos logit contributions
 dining+0.014
rd+0.012
zie+0.012
 counters+0.012
 fluids+0.012
Neg logit contributions
 judge-0.011
 pigeon-0.011
 judged-0.010
 en-0.010
 jealous-0.010
Token the
Token ID262
Feature activation-8.66e-03

Pos logit contributions
 dining+0.027
 counters+0.025
rd+0.025
zie+0.025
 fluids+0.024
Neg logit contributions
 judge-0.009
 judged-0.009
 teased-0.008
 scored-0.008
 pigeon-0.008
Token garden
Token ID11376
Feature activation-5.51e-03

Pos logit contributions
 dining+0.017
rd+0.016
 occasions+0.016
 Amen+0.016
 fluids+0.016
Neg logit contributions
 judge-0.012
 en-0.011
 jealous-0.011
 judged-0.011
 shy-0.011
Token.
Token ID13
Feature activation-2.03e-02

Pos logit contributions
 dining+0.011
rd+0.009
zie+0.009
 counters+0.009
 blankets+0.009
Neg logit contributions
 judge-0.008
 judged-0.008
 pigeon-0.008
 scored-0.008
 teased-0.008
Token<|endoftext|>
Token ID50256
Feature activation-1.20e-02

Pos logit contributions
 dining+0.042
 counters+0.037
zie+0.037
rd+0.037
 blankets+0.037
Neg logit contributions
 judge-0.027
 judged-0.027
 scored-0.026
 pigeon-0.025
 en-0.025
Token"
Token ID1
Feature activation-5.63e-03

Pos logit contributions
 dining+0.027
 counters+0.025
rd+0.025
zie+0.025
 fluids+0.024
Neg logit contributions
 scored-0.013
 judge-0.013
 at-0.013
 teased-0.012
 judged-0.012
TokenMom
Token ID29252
Feature activation-5.83e-02

Pos logit contributions
 dining+0.010
rd+0.009
zie+0.008
 counters+0.008
 blankets+0.008
Neg logit contributions
 judge-0.010
 judged-0.009
 pigeon-0.009
 scored-0.009
 teased-0.009
Tokenmy
Token ID1820
Feature activation-4.61e-02

Pos logit contributions
 dining+0.087
 fluids+0.085
rd+0.084
 counters+0.077
 occasions+0.077
Neg logit contributions
 at-0.109
 judge-0.108
 teased-0.105
 judged-0.103
 scored-0.102
Token,
Token ID11
Feature activation-9.67e-03

Pos logit contributions
 dining+0.097
rd+0.088
 counters+0.087
 fluids+0.086
zie+0.086
Neg logit contributions
 judge-0.062
 judged-0.059
 scored-0.058
 pigeon-0.058
 teased-0.058
Token what
Token ID644
Feature activation+2.01e-02

Pos logit contributions
 dining+0.020
 counters+0.018
rd+0.018
zie+0.018
 blankets+0.017
Neg logit contributions
 judge-0.013
 judged-0.013
 pigeon-0.012
 scored-0.012
 teased-0.012
Token's
Token ID338
Feature activation-8.12e-03

Pos logit contributions
ked+0.032
 judged+0.031
 judge+0.031
 pigeon+0.030
 teased+0.030
Neg logit contributions
 dining-0.037
rd-0.033
 counters-0.031
 blankets-0.030
 fluids-0.030
Token that
Token ID326
Feature activation-1.49e-02

Pos logit contributions
 dining+0.016
rd+0.014
 counters+0.014
zie+0.014
 blankets+0.014
Neg logit contributions
 judged-0.012
 judge-0.012
ked-0.011
 scored-0.011
 teased-0.011
Token welcome
Token ID7062
Feature activation+2.59e-03

Pos logit contributions
 dining+0.037
rd+0.033
 counters+0.033
zie+0.033
 blankets+0.032
Neg logit contributions
 judge-0.026
 judged-0.025
 pigeon-0.024
 scored-0.024
 teased-0.024
Token her
Token ID607
Feature activation+9.07e-03

Pos logit contributions
 judged+0.004
 scored+0.004
 judge+0.004
ked+0.004
 pigeon+0.003
Neg logit contributions
 dining-0.005
zie-0.005
 blankets-0.005
 counters-0.005
rd-0.005
Token.
Token ID13
Feature activation-8.07e-03

Pos logit contributions
 judge+0.014
 judged+0.014
 pigeon+0.013
 scored+0.013
ked+0.013
Neg logit contributions
 dining-0.018
rd-0.015
 counters-0.015
 blankets-0.015
zie-0.015
Token<|endoftext|>
Token ID50256
Feature activation-1.12e-02

Pos logit contributions
 dining+0.016
rd+0.015
 counters+0.015
zie+0.014
 blankets+0.014
Neg logit contributions
 judge-0.011
 judged-0.011
 pigeon-0.011
 scored-0.011
 en-0.011
Token"
Token ID1
Feature activation-6.70e-03

Pos logit contributions
 dining+0.025
 counters+0.023
rd+0.022
zie+0.022
 fluids+0.022
Neg logit contributions
 scored-0.013
 judge-0.013
 teased-0.013
 at-0.013
 judged-0.013
TokenMom
Token ID29252
Feature activation-5.80e-02

Pos logit contributions
 dining+0.012
rd+0.011
zie+0.011
 counters+0.011
 blankets+0.010
Neg logit contributions
 judge-0.011
 judged-0.011
 pigeon-0.010
 scored-0.010
 teased-0.010
Tokenmy
Token ID1820
Feature activation-4.73e-02

Pos logit contributions
 dining+0.092
 fluids+0.090
rd+0.089
 counters+0.084
 occasions+0.081
Neg logit contributions
 at-0.102
 judge-0.102
 teased-0.098
 judged-0.096
 scored-0.095
Token,
Token ID11
Feature activation-9.01e-03

Pos logit contributions
 dining+0.102
rd+0.094
 counters+0.093
 fluids+0.093
 blankets+0.091
Neg logit contributions
 judge-0.061
 judged-0.057
 pigeon-0.057
 scored-0.057
 teased-0.057
Token look
Token ID804
Feature activation-1.70e-02

Pos logit contributions
 dining+0.018
zie+0.016
 counters+0.016
rd+0.016
 blankets+0.016
Neg logit contributions
 judged-0.013
 judge-0.013
ked-0.012
 pigeon-0.012
 scored-0.012
Token at
Token ID379
Feature activation+6.95e-04

Pos logit contributions
 dining+0.050
 counters+0.046
rd+0.046
zie+0.045
 blankets+0.045
Neg logit contributions
 judge-0.009
 judged-0.008
 scored-0.007
 pigeon-0.007
 teased-0.007
Token the
Token ID262
Feature activation-4.29e-03

Pos logit contributions
 judge+0.001
 judged+0.001
 scored+0.001
 pigeon+0.001
ked+0.001
Neg logit contributions
 dining-0.002
zie-0.002
 counters-0.002
rd-0.002
 blankets-0.002
TokenSuddenly
Token ID38582
Feature activation-9.62e-03

Pos logit contributions
 dining+0.020
zie+0.020
rd+0.018
heart+0.018
 counters+0.018
Neg logit contributions
 judge-0.016
 scored-0.015
 judged-0.015
 pigeon-0.015
 jealous-0.014
Token,
Token ID11
Feature activation-1.68e-02

Pos logit contributions
 dining+0.020
rd+0.018
zie+0.018
 counters+0.018
 blankets+0.018
Neg logit contributions
 judge-0.013
 judged-0.012
 pigeon-0.012
 scored-0.012
 teased-0.012
Token a
Token ID257
Feature activation-1.43e-02

Pos logit contributions
 dining+0.035
 blankets+0.032
 counters+0.031
 fluids+0.030
zie+0.029
Neg logit contributions
 judge-0.023
 judged-0.023
 pigeon-0.021
 scored-0.021
ked-0.021
Token loud
Token ID7812
Feature activation+6.12e-04

Pos logit contributions
 dining+0.032
rd+0.029
zie+0.029
 counters+0.029
 blankets+0.028
Neg logit contributions
 judge-0.018
 judged-0.017
 pigeon-0.017
 scored-0.016
 teased-0.016
Token noise
Token ID7838
Feature activation-1.66e-02

Pos logit contributions
 judged+0.001
 judge+0.001
 scored+0.001
 pigeon+0.001
 teased+0.001
Neg logit contributions
 dining-0.001
zie-0.001
 counters-0.001
rd-0.001
 blankets-0.001
Token made
Token ID925
Feature activation-5.78e-02

Pos logit contributions
 dining+0.030
 blankets+0.029
 counters+0.029
rd+0.027
zie+0.026
Neg logit contributions
 judge-0.024
 judged-0.024
 pigeon-0.023
 en-0.022
 scored-0.021
Token them
Token ID606
Feature activation-1.10e-02

Pos logit contributions
 dining+0.127
rd+0.118
zie+0.112
 counters+0.111
 fluids+0.107
Neg logit contributions
 judge-0.072
 judged-0.071
 pigeon-0.069
 teased-0.068
 jealous-0.068
Token stop
Token ID2245
Feature activation-3.17e-02

Pos logit contributions
 dining+0.023
rd+0.021
 counters+0.021
zie+0.021
 blankets+0.021
Neg logit contributions
 judge-0.014
 judged-0.014
 pigeon-0.014
 scored-0.013
 teased-0.013
Token and
Token ID290
Feature activation+3.71e-03

Pos logit contributions
 dining+0.075
rd+0.070
zie+0.068
 counters+0.065
 fluids+0.065
Neg logit contributions
 at-0.035
 judge-0.034
 judged-0.033
 teased-0.033
 jealous-0.033
Token look
Token ID804
Feature activation-1.81e-02

Pos logit contributions
 judge+0.005
 pigeon+0.005
 judged+0.005
 scored+0.005
 gol+0.005
Neg logit contributions
 dining-0.007
 blankets-0.007
 counters-0.007
zie-0.006
 fluids-0.006
Token up
Token ID510
Feature activation-1.03e-02

Pos logit contributions
 dining+0.045
 counters+0.041
 blankets+0.041
rd+0.041
zie+0.040
Neg logit contributions
 judge-0.018
 judged-0.017
 pigeon-0.016
 scored-0.016
 teased-0.015
Token.
Token ID13
Feature activation-1.15e-02

Pos logit contributions
 dining+0.022
 counters+0.019
rd+0.019
zie+0.019
 fluids+0.019
Neg logit contributions
 judge-0.014
 judged-0.013
ked-0.013
 pigeon-0.013
 at-0.013
Token<|endoftext|>
Token ID50256
Feature activation-1.61e-02

Pos logit contributions
 dining+0.024
 counters+0.022
rd+0.022
zie+0.022
 blankets+0.022
Neg logit contributions
 judge-0.015
 judged-0.015
 en-0.014
 scored-0.014
 teased-0.014
Token"
Token ID1
Feature activation-6.57e-03

Pos logit contributions
 dining+0.037
 counters+0.034
zie+0.033
rd+0.033
 hinges+0.033
Neg logit contributions
 scored-0.018
 at-0.017
 judge-0.017
 teased-0.017
 judged-0.016
TokenLook
Token ID8567
Feature activation-3.08e-02

Pos logit contributions
 dining+0.012
rd+0.010
zie+0.010
 counters+0.010
 blankets+0.010
Neg logit contributions
 judge-0.011
 judged-0.011
 pigeon-0.011
 scored-0.011
 teased-0.010
Token,
Token ID11
Feature activation-2.63e-02

Pos logit contributions
 dining+0.074
rd+0.067
 counters+0.067
zie+0.066
 blankets+0.066
Neg logit contributions
 judge-0.031
 judged-0.031
 pigeon-0.030
 scored-0.029
 teased-0.029
Token Mom
Token ID11254
Feature activation-5.76e-02

Pos logit contributions
 dining+0.057
rd+0.051
 counters+0.050
zie+0.050
 blankets+0.049
Neg logit contributions
 judge-0.034
 judged-0.033
 pigeon-0.033
 scored-0.032
 teased-0.032
Tokenmy
Token ID1820
Feature activation-5.09e-02

Pos logit contributions
 dining+0.096
 fluids+0.094
rd+0.091
 counters+0.089
 occasions+0.088
Neg logit contributions
 at-0.101
 judge-0.098
 teased-0.096
 pin-0.092
 only-0.091
Token!"
Token ID2474
Feature activation-3.17e-02

Pos logit contributions
 dining+0.101
 fluids+0.093
 counters+0.093
rd+0.091
 blankets+0.089
Neg logit contributions
 judge-0.074
 at-0.073
 teased-0.071
 scored-0.070
 pigeon-0.070
Token exclaimed
Token ID33552
Feature activation-3.37e-02

Pos logit contributions
 dining+0.073
zie+0.067
rd+0.066
 counters+0.064
 fluids+0.064
Neg logit contributions
 teased-0.035
 judged-0.035
 judge-0.035
 en-0.034
 pigeon-0.034
Token three
Token ID1115
Feature activation-3.53e-04

Pos logit contributions
 dining+0.062
rd+0.054
zie+0.054
 counters+0.054
 blankets+0.052
Neg logit contributions
 judge-0.054
 judged-0.053
 pigeon-0.052
 scored-0.052
 teased-0.051
Token-
Token ID12
Feature activation+2.06e-02

Pos logit contributions
 dining+0.001
zie+0.001
rd+0.001
 counters+0.001
 blankets+0.001
Neg logit contributions
 judge-0.001
 judged-0.001
 pigeon-0.001
 scored-0.000
 teased-0.000
Token stopped
Token ID5025
Feature activation-2.73e-02

Pos logit contributions
 dining+0.012
 counters+0.011
zie+0.011
rd+0.011
 blankets+0.011
Neg logit contributions
 judge-0.005
 judged-0.005
 pigeon-0.005
 scored-0.004
 en-0.004
Token and
Token ID290
Feature activation-9.15e-03

Pos logit contributions
 dining+0.058
zie+0.052
 counters+0.052
 blankets+0.052
rd+0.052
Neg logit contributions
 judge-0.035
 judged-0.034
 pigeon-0.034
 scored-0.034
 teased-0.032
Token the
Token ID262
Feature activation-3.73e-03

Pos logit contributions
 dining+0.018
 blankets+0.016
zie+0.016
 counters+0.016
rd+0.015
Neg logit contributions
 judge-0.014
 scored-0.013
 judged-0.013
 pigeon-0.013
 en-0.013
Token sun
Token ID4252
Feature activation-1.31e-02

Pos logit contributions
 dining+0.007
rd+0.006
zie+0.006
 counters+0.006
 blankets+0.006
Neg logit contributions
 judge-0.006
 judged-0.006
 pigeon-0.006
 scored-0.006
 teased-0.006
Token started
Token ID2067
Feature activation-3.66e-02

Pos logit contributions
 dining+0.029
zie+0.026
 counters+0.026
rd+0.026
 blankets+0.026
Neg logit contributions
 judge-0.016
 judged-0.015
 pigeon-0.015
 scored-0.015
 en-0.014
Token to
Token ID284
Feature activation-5.70e-02

Pos logit contributions
 dining+0.083
 blankets+0.080
zie+0.078
 counters+0.076
rd+0.074
Neg logit contributions
 judge-0.041
 pigeon-0.039
 scored-0.038
 judged-0.038
 bully-0.036
Token shine
Token ID18340
Feature activation-1.56e-02

Pos logit contributions
 dining+0.118
rd+0.109
 counters+0.105
zie+0.102
 fluids+0.101
Neg logit contributions
 judge-0.078
 judged-0.077
 pigeon-0.074
 teased-0.073
ked-0.072
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 dining+0.033
rd+0.029
zie+0.029
 counters+0.029
 fluids+0.028
Neg logit contributions
 judged-0.021
 judge-0.021
 pigeon-0.020
 teased-0.020
ked-0.020
Token Ber
Token ID4312
Feature activation-4.73e-03

Pos logit contributions
 dining+0.019
 counters+0.017
zie+0.017
 blankets+0.017
rd+0.017
Neg logit contributions
 judge-0.018
 judged-0.017
 pigeon-0.017
 scored-0.017
 teased-0.017
Tokentha
Token ID12898
Feature activation+8.32e-03

Pos logit contributions
 dining+0.008
rd+0.007
 fluids+0.007
 counters+0.007
 occasions+0.007
Neg logit contributions
ked-0.008
 judge-0.008
 pigeon-0.008
 judged-0.008
 at-0.008
Token and
Token ID290
Feature activation-3.71e-03

Pos logit contributions
 judge+0.011
 pigeon+0.011
 en+0.011
 judged+0.011
 scored+0.011
Neg logit contributions
 dining-0.015
zie-0.015
 counters-0.014
 blankets-0.014
rd-0.014
Token dress
Token ID6576
Feature activation-5.16e-04

Pos logit contributions
 dining+0.040
 blankets+0.036
rd+0.035
 counters+0.034
zie+0.034
Neg logit contributions
 judged-0.039
 scored-0.037
 judge-0.037
ked-0.036
 teased-0.036
Token.
Token ID13
Feature activation+7.10e-03

Pos logit contributions
 dining+0.001
 blankets+0.001
 counters+0.001
rd+0.001
zie+0.001
Neg logit contributions
 judged-0.001
 judge-0.001
 scored-0.001
 pigeon-0.001
 bully-0.001
Token She
Token ID1375
Feature activation+3.76e-03

Pos logit contributions
 judge+0.011
 judged+0.011
 pigeon+0.010
 scored+0.010
 teased+0.010
Neg logit contributions
 dining-0.014
rd-0.012
zie-0.012
 counters-0.012
 blankets-0.012
Token would
Token ID561
Feature activation+9.59e-03

Pos logit contributions
 judge+0.005
 pigeon+0.005
 judged+0.005
ked+0.005
 scored+0.005
Neg logit contributions
 dining-0.008
rd-0.007
zie-0.007
 blankets-0.007
 counters-0.007
Token tw
Token ID665
Feature activation+2.09e-02

Pos logit contributions
 judged+0.013
ked+0.013
 scored+0.013
 judge+0.013
 pigeon+0.013
Neg logit contributions
 dining-0.018
 blankets-0.017
zie-0.016
 counters-0.016
rd-0.016
Tokenirl
Token ID1901
Feature activation+3.97e-02

Pos logit contributions
 judge+0.036
 judged+0.035
 pigeon+0.034
 scored+0.034
 teased+0.033
Neg logit contributions
 dining-0.036
rd-0.032
zie-0.032
 counters-0.032
 blankets-0.031
Token around
Token ID1088
Feature activation+2.16e-02

Pos logit contributions
 judge+0.064
 judged+0.058
 pigeon+0.054
 bully+0.054
 greedy+0.053
Neg logit contributions
 blankets-0.067
 dining-0.067
zie-0.066
rd-0.063
 counters-0.062
Token in
Token ID287
Feature activation+1.09e-02

Pos logit contributions
 judged+0.037
 judge+0.036
 pigeon+0.035
ked+0.035
 scored+0.034
Neg logit contributions
 dining-0.037
 blankets-0.037
 bed-0.034
zie-0.033
 bedroom-0.032
Token it
Token ID340
Feature activation+5.64e-03

Pos logit contributions
 judge+0.010
 en+0.009
 judged+0.009
 pigeon+0.009
 pin+0.009
Neg logit contributions
 dining-0.028
rd-0.026
 counters-0.025
zie-0.025
 fluids-0.025
Token and
Token ID290
Feature activation+1.38e-02

Pos logit contributions
ked+0.010
 judge+0.010
 pigeon+0.010
 judged+0.010
 scored+0.009
Neg logit contributions
 dining-0.009
 blankets-0.009
zie-0.008
 bed-0.008
 bedroom-0.008
Token act
Token ID719
Feature activation+1.70e-02

Pos logit contributions
 pigeon+0.021
 judge+0.021
 judged+0.020
ked+0.020
 greedy+0.019
Neg logit contributions
 dining-0.024
zie-0.023
 blankets-0.022
 counters-0.021
 bedroom-0.021
Token.
Token ID13
Feature activation-4.91e-03

Pos logit contributions
 dining+0.010
rd+0.009
 counters+0.009
zie+0.009
 blankets+0.008
Neg logit contributions
 judge-0.008
 judged-0.008
 pigeon-0.008
 scored-0.008
ked-0.008
Token She
Token ID1375
Feature activation+5.57e-03

Pos logit contributions
 dining+0.009
rd+0.008
zie+0.008
 counters+0.008
 blankets+0.008
Neg logit contributions
 judge-0.008
 judged-0.007
 pigeon-0.007
 scored-0.007
 teased-0.007
Token wanted
Token ID2227
Feature activation-6.46e-03

Pos logit contributions
 judge+0.007
 pigeon+0.006
 judged+0.006
ked+0.006
 bully+0.006
Neg logit contributions
 dining-0.012
zie-0.011
 counters-0.011
rd-0.011
 blankets-0.011
Token to
Token ID284
Feature activation+9.29e-03

Pos logit contributions
 dining+0.017
 blankets+0.016
 counters+0.016
zie+0.016
rd+0.016
Neg logit contributions
 judge-0.005
 judged-0.005
 scored-0.004
 pigeon-0.004
ked-0.004
Token know
Token ID760
Feature activation+1.66e-02

Pos logit contributions
 judge+0.012
 judged+0.012
 pigeon+0.012
 scored+0.011
 teased+0.011
Neg logit contributions
 dining-0.020
rd-0.018
 counters-0.018
zie-0.018
 blankets-0.017
Token what
Token ID644
Feature activation+3.80e-02

Pos logit contributions
 judge+0.021
 judged+0.020
 pigeon+0.020
 en+0.020
 teased+0.020
Neg logit contributions
 dining-0.037
rd-0.033
zie-0.033
 counters-0.032
 fluids-0.031
Token it
Token ID340
Feature activation+3.54e-02

Pos logit contributions
 judge+0.055
 judged+0.054
 pigeon+0.052
 scored+0.052
ked+0.052
Neg logit contributions
 dining-0.075
rd-0.068
zie-0.067
 counters-0.067
 blankets-0.066
Token felt
Token ID2936
Feature activation+2.45e-02

Pos logit contributions
 judge+0.062
 pigeon+0.062
ked+0.061
 judged+0.059
 scored+0.058
Neg logit contributions
 dining-0.056
zie-0.053
 blankets-0.051
rd-0.050
 counters-0.050
Token like
Token ID588
Feature activation+1.64e-02

Pos logit contributions
 judge+0.027
 pigeon+0.026
 judged+0.026
ked+0.025
 scored+0.025
Neg logit contributions
 dining-0.056
rd-0.051
 counters-0.051
 blankets-0.051
zie-0.050
Token to
Token ID284
Feature activation+5.02e-04

Pos logit contributions
 judge+0.023
 judged+0.023
 pigeon+0.022
 scored+0.022
ked+0.022
Neg logit contributions
 dining-0.033
rd-0.030
 counters-0.030
 blankets-0.030
zie-0.029
Token splash
Token ID22870
Feature activation+4.50e-03

Pos logit contributions
 judge+0.001
 pigeon+0.001
 pin+0.001
 scored+0.001
 judged+0.001
Neg logit contributions
 dining-0.001
rd-0.001
 counters-0.001
 blankets-0.001
zie-0.001
Token come
Token ID1282
Feature activation-1.32e-02

Pos logit contributions
 dining+0.000
zie+0.000
rd+0.000
 blankets+0.000
 counters+0.000
Neg logit contributions
 judged-0.000
 judge-0.000
 scored-0.000
ked-0.000
 teased-0.000
Token on
Token ID319
Feature activation+1.98e-02

Pos logit contributions
 dining+0.033
 counters+0.030
rd+0.030
 fluids+0.030
zie+0.029
Neg logit contributions
 judge-0.014
 judged-0.012
ked-0.012
 pigeon-0.012
 at-0.012
Token!"
Token ID2474
Feature activation+6.05e-03

Pos logit contributions
 judge+0.033
 judged+0.033
 pigeon+0.032
 scored+0.031
ked+0.031
Neg logit contributions
 dining-0.035
rd-0.031
zie-0.031
 counters-0.030
 blankets-0.030
Token Lily
Token ID20037
Feature activation+1.75e-02

Pos logit contributions
 judge+0.011
 pigeon+0.011
 judged+0.010
 scored+0.010
 bully+0.010
Neg logit contributions
 dining-0.010
 counters-0.009
rd-0.009
 blankets-0.009
zie-0.009
Token climbed
Token ID19952
Feature activation-4.03e-03

Pos logit contributions
 pigeon+0.026
 judge+0.026
ked+0.025
 bully+0.024
 judged+0.024
Neg logit contributions
 dining-0.031
zie-0.029
rd-0.029
 counters-0.028
 blankets-0.027
Token up
Token ID510
Feature activation+3.90e-02

Pos logit contributions
 dining+0.008
zie+0.007
 blankets+0.007
 counters+0.007
rd+0.007
Neg logit contributions
 judge-0.006
 judged-0.006
 pigeon-0.006
 scored-0.006
 teased-0.005
Token the
Token ID262
Feature activation+4.15e-03

Pos logit contributions
 judge+0.061
 judged+0.060
 scored+0.057
ked+0.056
 pigeon+0.055
Neg logit contributions
 blankets-0.070
 dining-0.068
 bed-0.064
zie-0.064
 counters-0.059
Token ladder
Token ID18002
Feature activation-2.74e-03

Pos logit contributions
 judge+0.007
 judged+0.006
 pigeon+0.006
 scored+0.006
 en+0.006
Neg logit contributions
 dining-0.008
rd-0.007
 counters-0.007
zie-0.007
 fluids-0.007
Token to
Token ID284
Feature activation-4.98e-03

Pos logit contributions
 dining+0.006
zie+0.006
 blankets+0.006
 counters+0.006
rd+0.006
Neg logit contributions
 judge-0.003
 judged-0.003
 pigeon-0.003
 scored-0.003
 teased-0.003
Token the
Token ID262
Feature activation+1.34e-02

Pos logit contributions
 dining+0.010
zie+0.009
 blankets+0.009
 counters+0.009
 fluids+0.009
Neg logit contributions
 judged-0.007
 judge-0.007
 pigeon-0.007
 scored-0.007
ked-0.007
Token top
Token ID1353
Feature activation+1.96e-03

Pos logit contributions
 judge+0.020
 pigeon+0.020
 judged+0.020
 scored+0.019
 en+0.019
Neg logit contributions
 dining-0.026
rd-0.023
zie-0.023
 counters-0.023
 blankets-0.022
Token her
Token ID607
Feature activation-2.96e-03

Pos logit contributions
 dining+0.000
 blankets+0.000
 counters+0.000
zie+0.000
 Amen+0.000
Neg logit contributions
 judge-0.000
 bully-0.000
 judged-0.000
 scored-0.000
 pigeon-0.000
Token mom
Token ID1995
Feature activation-2.37e-02

Pos logit contributions
 dining+0.006
rd+0.005
 blankets+0.005
 counters+0.005
zie+0.005
Neg logit contributions
 judged-0.004
 judge-0.004
 scored-0.004
 teased-0.004
 pigeon-0.004
Token,
Token ID11
Feature activation+8.22e-04

Pos logit contributions
 dining+0.053
rd+0.048
 counters+0.048
zie+0.047
 fluids+0.046
Neg logit contributions
 judge-0.029
 judged-0.028
 pigeon-0.028
 scored-0.027
 teased-0.027
Token and
Token ID290
Feature activation-5.66e-03

Pos logit contributions
 judge+0.001
 bully+0.001
 scored+0.001
 judged+0.001
 pigeon+0.001
Neg logit contributions
 dining-0.002
 blankets-0.001
zie-0.001
 counters-0.001
 Amen-0.001
Token the
Token ID262
Feature activation+1.24e-02

Pos logit contributions
 dining+0.012
 blankets+0.011
zie+0.010
 counters+0.010
rd+0.010
Neg logit contributions
 judge-0.008
 judged-0.007
 scored-0.007
 bully-0.007
 en-0.007
Token par
Token ID1582
Feature activation+5.40e-02

Pos logit contributions
 judged+0.011
 scored+0.010
ked+0.010
 judge+0.010
 teased+0.010
Neg logit contributions
 dining-0.031
 blankets-0.028
rd-0.028
 counters-0.028
zie-0.027
Tokenrot
Token ID10599
Feature activation-6.57e-03

Pos logit contributions
 judged+0.070
 en+0.068
 judge+0.068
 teased+0.067
 jealous+0.067
Neg logit contributions
 dining-0.104
rd-0.100
zie-0.094
 blankets-0.092
 counters-0.091
Token.
Token ID13
Feature activation-2.03e-02

Pos logit contributions
 dining+0.012
 blankets+0.011
zie+0.011
rd+0.011
 counters+0.010
Neg logit contributions
 judge-0.010
 judged-0.009
 pigeon-0.009
 en-0.009
 bully-0.009
Token\n
Token ID198
Feature activation-2.28e-02

Pos logit contributions
 dining+0.046
rd+0.041
zie+0.041
 counters+0.041
 occasions+0.039
Neg logit contributions
 judged-0.023
 en-0.023
 judge-0.023
 scored-0.022
 pigeon-0.022
Token\n
Token ID198
Feature activation-3.01e-02

Pos logit contributions
 dining+0.061
 counters+0.057
zie+0.056
heart+0.055
rd+0.054
Neg logit contributions
 scored-0.015
 en-0.015
 at-0.014
 judged-0.014
 teased-0.014
TokenAfter
Token ID3260
Feature activation-5.89e-03

Pos logit contributions
 dining+0.060
zie+0.056
 counters+0.054
 blankets+0.052
rd+0.051
Neg logit contributions
 scored-0.041
 pigeon-0.041
 judged-0.041
 en-0.040
 judge-0.040
Token,
Token ID11
Feature activation+1.50e-02

Pos logit contributions
 judge+0.001
 judged+0.001
 pigeon+0.001
 scored+0.001
 en+0.001
Neg logit contributions
 dining-0.002
rd-0.001
 blankets-0.001
zie-0.001
 counters-0.001
Token she
Token ID673
Feature activation+8.08e-04

Pos logit contributions
 scored+0.025
 judged+0.025
 judge+0.025
ked+0.024
 gol+0.023
Neg logit contributions
 dining-0.026
 blankets-0.024
 counters-0.022
 bed-0.022
zie-0.021
Token wanted
Token ID2227
Feature activation+4.62e-04

Pos logit contributions
ked+0.001
 judge+0.001
 pigeon+0.001
 judged+0.001
 en+0.001
Neg logit contributions
 dining-0.002
rd-0.001
 counters-0.001
zie-0.001
 blankets-0.001
Token to
Token ID284
Feature activation+1.05e-02

Pos logit contributions
ked+0.001
 scored+0.000
 judged+0.000
 pigeon+0.000
 judge+0.000
Neg logit contributions
 dining-0.001
 blankets-0.001
 counters-0.001
 bed-0.001
zie-0.001
Token go
Token ID467
Feature activation+9.42e-03

Pos logit contributions
 judge+0.015
 judged+0.014
 pigeon+0.014
 scored+0.014
 teased+0.014
Neg logit contributions
 dining-0.022
rd-0.019
zie-0.019
 counters-0.019
 blankets-0.019
Token on
Token ID319
Feature activation+5.26e-02

Pos logit contributions
 judge+0.009
 judged+0.009
 pigeon+0.009
 scored+0.009
ked+0.009
Neg logit contributions
 dining-0.023
rd-0.021
zie-0.021
 counters-0.021
 blankets-0.021
Token an
Token ID281
Feature activation+5.19e-03

Pos logit contributions
 judge+0.064
 judged+0.063
ked+0.063
 scored+0.061
 pigeon+0.061
Neg logit contributions
 dining-0.118
 blankets-0.107
rd-0.107
zie-0.105
 counters-0.104
Token adventure
Token ID8855
Feature activation-7.60e-03

Pos logit contributions
 judge+0.009
 judged+0.009
 pigeon+0.009
 scored+0.009
 teased+0.009
Neg logit contributions
 dining-0.008
rd-0.007
zie-0.007
 counters-0.007
 blankets-0.007
Token,
Token ID11
Feature activation+1.77e-02

Pos logit contributions
 dining+0.015
rd+0.014
zie+0.014
 blankets+0.014
 counters+0.014
Neg logit contributions
 judge-0.011
 judged-0.011
ked-0.010
 pigeon-0.010
 scored-0.010
Token so
Token ID523
Feature activation+1.39e-02

Pos logit contributions
ked+0.029
 judge+0.028
 scored+0.028
 judged+0.028
 pigeon+0.027
Neg logit contributions
 dining-0.032
 blankets-0.029
 counters-0.029
zie-0.027
 bed-0.027
Token she
Token ID673
Feature activation+5.85e-03

Pos logit contributions
 judge+0.017
 judged+0.017
 scored+0.016
 pigeon+0.016
ked+0.016
Neg logit contributions
 dining-0.031
rd-0.028
zie-0.027
 counters-0.027
 blankets-0.027
Token Larry
Token ID13633
Feature activation-2.66e-02

Pos logit contributions
 dining+0.021
zie+0.019
rd+0.019
 blankets+0.019
 counters+0.018
Neg logit contributions
 judge-0.015
 judged-0.015
 scored-0.014
 pigeon-0.014
 teased-0.014
Token.
Token ID13
Feature activation-5.93e-03

Pos logit contributions
 dining+0.050
rd+0.044
zie+0.044
 counters+0.044
 blankets+0.043
Neg logit contributions
 judge-0.042
 judged-0.041
 pigeon-0.040
 scored-0.039
 teased-0.039
Token Larry
Token ID13633
Feature activation-6.46e-03

Pos logit contributions
 dining+0.010
 counters+0.009
rd+0.009
zie+0.009
 fluids+0.008
Neg logit contributions
 judge-0.010
 judged-0.010
 scored-0.010
 pigeon-0.010
 teased-0.010
Token was
Token ID373
Feature activation+5.47e-03

Pos logit contributions
 dining+0.012
rd+0.011
zie+0.010
 blankets+0.010
 counters+0.010
Neg logit contributions
 judge-0.010
ked-0.009
 judged-0.009
 scored-0.009
 pigeon-0.009
Token a
Token ID257
Feature activation+1.82e-02

Pos logit contributions
 scored+0.008
ked+0.008
 judge+0.007
 pigeon+0.007
 judged+0.007
Neg logit contributions
 dining-0.010
 blankets-0.009
zie-0.009
 Amen-0.009
 bed-0.009
Token good
Token ID922
Feature activation+7.50e-03

Pos logit contributions
ked+0.030
 scored+0.028
 judged+0.027
 teased+0.027
 judge+0.024
Neg logit contributions
 dining-0.032
 Amen-0.028
 fort-0.026
 ig-0.026
 cave-0.026
Token-
Token ID12
Feature activation+3.54e-02

Pos logit contributions
ked+0.012
 scored+0.012
 judged+0.011
 teased+0.011
 en+0.010
Neg logit contributions
 dining-0.013
 fort-0.012
 ig-0.012
 cave-0.011
rd-0.011
Tokennat
Token ID32353
Feature activation+1.79e-02

Pos logit contributions
 judged+0.058
 judge+0.055
 teased+0.055
ked+0.055
 scored+0.055
Neg logit contributions
 dining-0.062
rd-0.058
zie-0.054
 counters-0.053
 fluids-0.052
Tokenured
Token ID1522
Feature activation+9.88e-03

Pos logit contributions
 judged+0.028
 judge+0.027
 teased+0.026
 scored+0.026
 bully+0.025
Neg logit contributions
rd-0.030
 dining-0.030
 counters-0.025
zie-0.025
esh-0.025
Token bird
Token ID6512
Feature activation-7.55e-03

Pos logit contributions
 scored+0.016
 judged+0.015
ked+0.015
 teased+0.014
 en+0.013
Neg logit contributions
 dining-0.018
 blankets-0.016
 fort-0.016
 ig-0.016
rd-0.016
Token and
Token ID290
Feature activation+1.54e-02

Pos logit contributions
 dining+0.016
rd+0.015
zie+0.015
 blankets+0.015
 counters+0.014
Neg logit contributions
 judged-0.009
 judge-0.009
ked-0.009
 scored-0.009
 pigeon-0.009
Token.
Token ID13
Feature activation-6.27e-03

Pos logit contributions
 dining+0.005
 blankets+0.005
zie+0.004
 counters+0.004
rd+0.004
Neg logit contributions
 judged-0.006
 judge-0.006
ked-0.005
 pigeon-0.005
 en-0.005
Token<|endoftext|>
Token ID50256
Feature activation-1.82e-03

Pos logit contributions
 dining+0.012
rd+0.011
zie+0.011
 counters+0.011
 blankets+0.011
Neg logit contributions
 judge-0.009
 judged-0.009
 pigeon-0.009
 scored-0.009
 teased-0.009
TokenOne
Token ID3198
Feature activation-1.51e-02

Pos logit contributions
 dining+0.004
 counters+0.004
zie+0.004
rd+0.004
 fluids+0.004
Neg logit contributions
 scored-0.002
 judge-0.002
 teased-0.002
 judged-0.002
 at-0.002
Token day
Token ID1110
Feature activation-6.51e-03

Pos logit contributions
 dining+0.026
rd+0.023
zie+0.023
 counters+0.023
 blankets+0.022
Neg logit contributions
 judge-0.026
 judged-0.026
 pigeon-0.025
 scored-0.025
 teased-0.025
Token,
Token ID11
Feature activation+1.65e-02

Pos logit contributions
 dining+0.013
rd+0.011
zie+0.011
 counters+0.011
 blankets+0.011
Neg logit contributions
 judge-0.010
 judged-0.009
 pigeon-0.009
 scored-0.009
 en-0.009
Token a
Token ID257
Feature activation+2.04e-02

Pos logit contributions
 judge+0.019
 judged+0.019
 pigeon+0.018
 scored+0.017
 teased+0.017
Neg logit contributions
 dining-0.037
rd-0.034
zie-0.033
 blankets-0.033
 counters-0.033
Token little
Token ID1310
Feature activation-5.58e-03

Pos logit contributions
ked+0.033
 judged+0.032
 teased+0.032
 scored+0.031
 en+0.028
Neg logit contributions
 dining-0.034
 fort-0.028
 blankets-0.028
 counters-0.027
 bedroom-0.027
Token boy
Token ID2933
Feature activation-3.31e-02

Pos logit contributions
 dining+0.010
 blankets+0.009
zie+0.009
rd+0.009
 counters+0.009
Neg logit contributions
 judged-0.008
 scored-0.008
ked-0.008
 judge-0.008
 teased-0.008
Token named
Token ID3706
Feature activation-6.79e-03

Pos logit contributions
 dining+0.078
rd+0.070
 counters+0.070
zie+0.070
 blankets+0.068
Neg logit contributions
 judge-0.037
 judged-0.036
 pigeon-0.035
 scored-0.034
 teased-0.034
Token Tim
Token ID5045
Feature activation-3.62e-02

Pos logit contributions
 dining+0.017
 counters+0.016
rd+0.016
 fluids+0.016
 occasions+0.016
Neg logit contributions
 judge-0.007
 scored-0.006
 teased-0.006
 pigeon-0.006
 at-0.006
Token found
Token ID1043
Feature activation-7.16e-03

Pos logit contributions
 dining+0.096
 counters+0.086
rd+0.086
zie+0.085
 fluids+0.084
Neg logit contributions
 judge-0.033
 judged-0.032
 teased-0.031
 scored-0.031
 pigeon-0.029
Token fly
Token ID6129
Feature activation+3.31e-02

Pos logit contributions
ked+0.023
 scored+0.022
 judged+0.022
 teased+0.022
 judge+0.018
Neg logit contributions
 dining-0.033
 blankets-0.033
 bed-0.031
 fort-0.031
 counters-0.030
Token was
Token ID373
Feature activation+1.68e-03

Pos logit contributions
 judge+0.058
 gol+0.053
 pigeon+0.052
 client+0.052
 lawyer+0.051
Neg logit contributions
 blankets-0.037
rated-0.037
row-0.036
rd-0.036
 directions-0.035
Token scared
Token ID12008
Feature activation-3.15e-03

Pos logit contributions
 judge+0.003
 pigeon+0.003
 gol+0.002
 scored+0.002
 bully+0.002
Neg logit contributions
 blankets-0.003
 dining-0.003
zie-0.002
 directions-0.002
 bed-0.002
Token.
Token ID13
Feature activation+1.24e-03

Pos logit contributions
 dining+0.005
 blankets+0.005
zie+0.005
 counters+0.005
rd+0.005
Neg logit contributions
 judge-0.005
 pigeon-0.005
 judged-0.005
 scored-0.005
ked-0.005
Token\n
Token ID198
Feature activation+1.05e-02

Pos logit contributions
 judge+0.002
 pigeon+0.002
 judged+0.002
ked+0.002
 pin+0.002
Neg logit contributions
 dining-0.002
 counters-0.002
rd-0.002
zie-0.002
 blankets-0.002
Token\n
Token ID198
Feature activation+2.99e-04

Pos logit contributions
 judge+0.010
 scored+0.010
 judged+0.010
 teased+0.010
 en+0.009
Neg logit contributions
 dining-0.026
 counters-0.024
zie-0.023
 blankets-0.023
rd-0.023
TokenThe
Token ID464
Feature activation+2.19e-02

Pos logit contributions
 judge+0.001
 judged+0.000
 pigeon+0.000
 scored+0.000
 teased+0.000
Neg logit contributions
 dining-0.001
rd-0.000
zie-0.000
 counters-0.000
 blankets-0.000
Token fly
Token ID6129
Feature activation+2.55e-02

Pos logit contributions
ked+0.031
 scored+0.028
 teased+0.028
 judged+0.028
 nor+0.025
Neg logit contributions
 dining-0.041
 blankets-0.039
 fort-0.037
 bed-0.036
 counters-0.036
Token tried
Token ID3088
Feature activation+7.47e-03

Pos logit contributions
 judge+0.040
 gol+0.036
 nerd+0.036
 pigeon+0.036
 judged+0.036
Neg logit contributions
rd-0.034
 blankets-0.033
zie-0.033
 dining-0.033
row-0.032
Token to
Token ID284
Feature activation-8.71e-03

Pos logit contributions
 pigeon+0.010
 judge+0.009
ked+0.009
 judged+0.009
 gol+0.008
Neg logit contributions
 blankets-0.015
 dining-0.015
 directions-0.014
 bed-0.014
zie-0.014
Token hide
Token ID7808
Feature activation+2.44e-02

Pos logit contributions
 dining+0.018
zie+0.017
 blankets+0.017
 counters+0.016
 fluids+0.016
Neg logit contributions
 judged-0.011
 scored-0.011
 judge-0.011
 pigeon-0.010
ked-0.010
Token until
Token ID1566
Feature activation-1.67e-02

Pos logit contributions
 dining+0.011
zie+0.010
rd+0.010
 blankets+0.010
 counters+0.010
Neg logit contributions
 judge-0.007
 judged-0.007
 pigeon-0.007
 scored-0.006
 teased-0.006
Token they
Token ID484
Feature activation-2.20e-02

Pos logit contributions
 dining+0.037
zie+0.033
 blankets+0.033
rd+0.033
 counters+0.033
Neg logit contributions
 judge-0.021
 judged-0.020
 pigeon-0.019
 teased-0.019
 en-0.019
Token came
Token ID1625
Feature activation-2.13e-03

Pos logit contributions
 dining+0.040
zie+0.037
rd+0.034
 blankets+0.033
 counters+0.031
Neg logit contributions
 judge-0.034
 pigeon-0.032
 bully-0.031
 judged-0.030
 en-0.030
Token to
Token ID284
Feature activation+3.32e-03

Pos logit contributions
 dining+0.006
rd+0.005
 counters+0.005
zie+0.005
 fluids+0.005
Neg logit contributions
 judge-0.002
 judged-0.002
 pigeon-0.002
 scored-0.002
 teased-0.002
Token a
Token ID257
Feature activation-2.22e-04

Pos logit contributions
 judge+0.003
 judged+0.003
 pigeon+0.003
 scored+0.003
 teased+0.003
Neg logit contributions
 dining-0.008
rd-0.008
zie-0.008
 counters-0.008
 blankets-0.007
Token little
Token ID1310
Feature activation-5.49e-04

Pos logit contributions
 dining+0.000
rd+0.000
zie+0.000
 counters+0.000
 blankets+0.000
Neg logit contributions
 judge-0.000
 pigeon-0.000
 judged-0.000
 scored-0.000
 en-0.000
Token island
Token ID7022
Feature activation-1.32e-02

Pos logit contributions
 dining+0.001
rd+0.001
zie+0.001
 counters+0.001
 fluids+0.001
Neg logit contributions
 judge-0.001
 pigeon-0.001
 judged-0.001
 teased-0.001
 jealous-0.001
Token with
Token ID351
Feature activation-1.26e-02

Pos logit contributions
 dining+0.029
zie+0.026
 fluids+0.026
 counters+0.026
rd+0.026
Neg logit contributions
 judge-0.017
 pigeon-0.016
 at-0.016
 judged-0.016
 scored-0.016
Token a
Token ID257
Feature activation-1.85e-02

Pos logit contributions
 dining+0.028
rd+0.026
zie+0.026
 counters+0.025
 fluids+0.024
Neg logit contributions
 judge-0.015
 judged-0.015
 pigeon-0.014
 scored-0.014
 teased-0.014
Token tree
Token ID5509
Feature activation-9.61e-03

Pos logit contributions
 dining+0.039
rd+0.036
zie+0.036
 counters+0.035
 blankets+0.034
Neg logit contributions
 judge-0.024
 judged-0.023
 pigeon-0.023
 scored-0.023
 teased-0.022
Token.
Token ID13
Feature activation-9.15e-03

Pos logit contributions
 dining+0.021
zie+0.019
 fluids+0.019
rd+0.019
 counters+0.019
Neg logit contributions
 judge-0.012
 pigeon-0.012
 judged-0.012
 at-0.012
 scored-0.012
TokenOnce
Token ID7454
Feature activation+1.41e-02

Pos logit contributions
 dining+0.032
 counters+0.030
zie+0.029
rd+0.029
 fluids+0.028
Neg logit contributions
 scored-0.011
 judge-0.010
 at-0.010
 teased-0.010
 judged-0.010
Token upon
Token ID2402
Feature activation+8.42e-03

Pos logit contributions
 judge+0.025
 judged+0.024
 scored+0.024
 pigeon+0.024
 en+0.024
Neg logit contributions
 dining-0.024
 counters-0.021
rd-0.021
 fluids-0.020
zie-0.020
Token a
Token ID257
Feature activation+1.34e-02

Pos logit contributions
 jealous+0.003
 pigeon+0.003
 en+0.003
 judged+0.003
 scored+0.003
Neg logit contributions
 dining-0.025
zie-0.024
 counters-0.024
rd-0.023
 occasions-0.023
Token time
Token ID640
Feature activation+8.89e-03

Pos logit contributions
 judge+0.017
 judged+0.017
 pigeon+0.017
 scored+0.017
 teased+0.016
Neg logit contributions
 dining-0.029
 counters-0.027
zie-0.025
 blankets-0.025
keeping-0.025
Token,
Token ID11
Feature activation+1.43e-02

Pos logit contributions
 judge+0.010
 judged+0.010
 scored+0.009
 teased+0.009
 pigeon+0.009
Neg logit contributions
 dining-0.021
 counters-0.019
 fluids-0.018
zie-0.018
 blankets-0.018
Token there
Token ID612
Feature activation-7.40e-04

Pos logit contributions
 at+0.014
 en+0.013
 judged+0.013
 judge+0.013
 jealous+0.013
Neg logit contributions
 dining-0.036
 counters-0.034
 occasions-0.033
 fluids-0.033
zie-0.032
Token was
Token ID373
Feature activation+2.08e-02

Pos logit contributions
 dining+0.002
 counters+0.002
rd+0.001
zie+0.001
 blankets+0.001
Neg logit contributions
 judge-0.001
 judged-0.001
 scored-0.001
 teased-0.001
 pigeon-0.001
Token a
Token ID257
Feature activation+1.87e-02

Pos logit contributions
 pigeon+0.013
 judge+0.013
 en+0.013
 jealous+0.013
 judged+0.013
Neg logit contributions
 dining-0.058
zie-0.054
 counters-0.054
rd-0.053
 occasions-0.052
Token clown
Token ID25573
Feature activation+3.13e-03

Pos logit contributions
ked+0.026
 judged+0.025
 scored+0.025
 teased+0.024
 judge+0.024
Neg logit contributions
 dining-0.038
 counters-0.033
 blankets-0.033
rd-0.033
zie-0.032
Token named
Token ID3706
Feature activation-1.71e-02

Pos logit contributions
 judged+0.005
ked+0.005
 scored+0.004
 en+0.004
 judge+0.004
Neg logit contributions
 dining-0.006
rd-0.005
 Amen-0.005
 ig-0.005
 blankets-0.005
Token Bob
Token ID5811
Feature activation-3.37e-02

Pos logit contributions
 dining+0.032
zie+0.029
rd+0.028
 blankets+0.028
 counters+0.027
Neg logit contributions
 judged-0.026
 judge-0.025
ked-0.024
 teased-0.024
 pigeon-0.024
Token and
Token ID290
Feature activation+8.99e-03

Pos logit contributions
 judged+0.038
 judge+0.038
 pigeon+0.037
ked+0.036
 teased+0.036
Neg logit contributions
 dining-0.054
 blankets-0.048
 counters-0.047
rd-0.046
zie-0.045
Token pointed
Token ID6235
Feature activation-1.25e-02

Pos logit contributions
 judge+0.014
 pigeon+0.013
 en+0.013
 bully+0.013
 gol+0.013
Neg logit contributions
 dining-0.016
 blankets-0.014
 counters-0.014
zie-0.013
 beds-0.013
Token to
Token ID284
Feature activation-1.28e-03

Pos logit contributions
 dining+0.026
 blankets+0.025
 counters+0.024
 directions+0.024
zie+0.023
Neg logit contributions
 judge-0.015
 judged-0.014
 bully-0.014
 scored-0.014
 gol-0.013
Token a
Token ID257
Feature activation+1.99e-04

Pos logit contributions
 dining+0.003
 blankets+0.003
 counters+0.002
zie+0.002
rd+0.002
Neg logit contributions
 judge-0.002
 judged-0.002
ked-0.002
 pigeon-0.001
 scored-0.001
Token word
Token ID1573
Feature activation+7.91e-03

Pos logit contributions
ked+0.000
 judged+0.000
 judge+0.000
 teased+0.000
 scored+0.000
Neg logit contributions
 dining-0.000
 counters-0.000
 blankets-0.000
rd-0.000
zie-0.000
Token.
Token ID13
Feature activation-5.49e-03

Pos logit contributions
 judged+0.014
 judge+0.013
 teased+0.013
 en+0.013
 pigeon+0.013
Neg logit contributions
 dining-0.013
 blankets-0.011
 counters-0.011
rd-0.011
zie-0.010
Token "
Token ID366
Feature activation+8.44e-03

Pos logit contributions
 dining+0.008
 counters+0.008
 blankets+0.007
 bed+0.007
 bedroom+0.007
Neg logit contributions
 pigeon-0.011
 judge-0.010
 judged-0.010
 ar-0.010
ked-0.010
TokenThis
Token ID1212
Feature activation+3.77e-03

Pos logit contributions
ked+0.016
 judged+0.015
 en+0.015
 judge+0.015
 teased+0.015
Neg logit contributions
 dining-0.011
 Amen-0.010
rd-0.010
 counters-0.010
 occasions-0.010
Token is
Token ID318
Feature activation-6.50e-03

Pos logit contributions
ked+0.008
 judged+0.008
 en+0.008
 teased+0.007
 pigeon+0.007
Neg logit contributions
 dining-0.005
rd-0.004
 blankets-0.004
 occasions-0.004
zie-0.004
Token cat
Token ID3797
Feature activation+3.07e-03

Pos logit contributions
 blankets+0.010
 dining+0.010
zie+0.009
 bed+0.008
 hinges+0.008
Neg logit contributions
ked-0.011
 judged-0.011
 bully-0.011
 pigeon-0.011
 teased-0.010
Token,"
Token ID553
Feature activation+7.92e-03

Pos logit contributions
 judged+0.006
ked+0.005
 judge+0.005
 en+0.005
 scored+0.005
Neg logit contributions
 dining-0.005
rd-0.004
 blankets-0.004
zie-0.004
 counters-0.004
Token\n
Token ID198
Feature activation-1.92e-03

Pos logit contributions
 dining+0.006
 counters+0.005
 blankets+0.005
rd+0.005
zie+0.005
Neg logit contributions
 judge-0.007
 pigeon-0.007
 judged-0.006
ked-0.006
 teased-0.006
Token\n
Token ID198
Feature activation-1.58e-02

Pos logit contributions
 dining+0.005
rd+0.005
zie+0.005
 counters+0.005
 occasions+0.005
Neg logit contributions
 judged-0.001
 en-0.001
 scored-0.001
 judge-0.001
 at-0.001
Token"
Token ID1
Feature activation-1.07e-02

Pos logit contributions
 dining+0.037
zie+0.035
 occasions+0.034
 blankets+0.033
heart+0.033
Neg logit contributions
 scored-0.019
 at-0.018
 judge-0.018
 judged-0.017
 en-0.017
TokenOh
Token ID5812
Feature activation-1.16e-02

Pos logit contributions
 dining+0.017
 counters+0.015
 fluids+0.015
 blankets+0.014
zie+0.014
Neg logit contributions
 judge-0.020
 judged-0.019
 pigeon-0.019
 en-0.018
 scored-0.018
Token no
Token ID645
Feature activation-3.21e-02

Pos logit contributions
 dining+0.031
 counters+0.029
zie+0.028
 fluids+0.028
rd+0.028
Neg logit contributions
 judge-0.010
 pigeon-0.009
 judged-0.009
 scored-0.009
 teased-0.009
Token,
Token ID11
Feature activation-6.04e-03

Pos logit contributions
 dining+0.083
 counters+0.076
zie+0.072
rolley+0.071
keeping+0.071
Neg logit contributions
 at-0.032
 judge-0.032
 teased-0.032
 pin-0.031
 scored-0.031
Token Tom
Token ID4186
Feature activation-1.80e-02

Pos logit contributions
 dining+0.013
rd+0.012
zie+0.012
 counters+0.012
 blankets+0.011
Neg logit contributions
 judge-0.008
 judged-0.007
 pigeon-0.007
 scored-0.007
 teased-0.007
Token,
Token ID11
Feature activation-3.00e-04

Pos logit contributions
 dining+0.046
 fluids+0.040
 counters+0.040
rd+0.039
 occasions+0.039
Neg logit contributions
 judge-0.020
 at-0.020
 teased-0.020
 pin-0.020
 scored-0.019
Token what
Token ID644
Feature activation+1.41e-03

Pos logit contributions
 dining+0.001
rd+0.000
 blankets+0.000
zie+0.000
 counters+0.000
Neg logit contributions
 judged-0.000
 judge-0.000
ked-0.000
 scored-0.000
 teased-0.000
Token did
Token ID750
Feature activation-3.58e-03

Pos logit contributions
ked+0.002
 pigeon+0.002
 judged+0.002
 judge+0.002
 en+0.002
Neg logit contributions
 dining-0.003
rd-0.003
 counters-0.003
 blankets-0.002
zie-0.002
Token you
Token ID345
Feature activation+5.76e-03

Pos logit contributions
 dining+0.006
rd+0.005
zie+0.005
 counters+0.005
 blankets+0.005
Neg logit contributions
 judged-0.006
 judge-0.006
ked-0.006
 pigeon-0.006
 teased-0.006
Token.
Token ID13
Feature activation+3.34e-03

Pos logit contributions
 dining+0.046
rd+0.041
zie+0.040
 counters+0.040
 blankets+0.040
Neg logit contributions
 judge-0.042
 judged-0.042
 pigeon-0.041
 scored-0.040
 teased-0.040
Token She
Token ID1375
Feature activation+1.00e-02

Pos logit contributions
 judge+0.006
 pigeon+0.006
 judged+0.006
 scored+0.006
ked+0.006
Neg logit contributions
 dining-0.006
 counters-0.005
zie-0.005
rd-0.005
 blankets-0.005
Token used
Token ID973
Feature activation+3.34e-03

Pos logit contributions
 judge+0.013
 pigeon+0.012
ked+0.012
 en+0.012
 judged+0.012
Neg logit contributions
 dining-0.020
 counters-0.018
 blankets-0.018
zie-0.018
rd-0.018
Token the
Token ID262
Feature activation-7.57e-03

Pos logit contributions
 judge+0.005
 bully+0.005
 pigeon+0.005
 scored+0.005
 judged+0.005
Neg logit contributions
 blankets-0.006
 dining-0.006
zie-0.005
 counters-0.005
 bed-0.005
Token peanuts
Token ID45612
Feature activation+9.15e-03

Pos logit contributions
 dining+0.017
rd+0.015
zie+0.015
 counters+0.015
 fluids+0.014
Neg logit contributions
 judge-0.010
 judged-0.010
 pigeon-0.009
 scored-0.009
 teased-0.009
Token to
Token ID284
Feature activation-1.91e-02

Pos logit contributions
 judge+0.012
 judged+0.012
 bully+0.012
 en+0.012
 pigeon+0.012
Neg logit contributions
 blankets-0.017
 dining-0.016
 counters-0.015
 fort-0.015
 contents-0.015
Token make
Token ID787
Feature activation-1.91e-03

Pos logit contributions
 dining+0.036
zie+0.032
 counters+0.032
 blankets+0.031
rd+0.031
Neg logit contributions
 judged-0.028
 judge-0.028
 pigeon-0.028
 scored-0.028
ked-0.027
Token a
Token ID257
Feature activation+3.76e-03

Pos logit contributions
 dining+0.004
 counters+0.003
 blankets+0.003
zie+0.003
rd+0.003
Neg logit contributions
 judge-0.003
 judged-0.003
 pigeon-0.003
 scored-0.003
ked-0.003
Token peanut
Token ID26636
Feature activation-7.11e-04

Pos logit contributions
 judge+0.005
 judged+0.005
 pigeon+0.005
 scored+0.005
 teased+0.005
Neg logit contributions
 dining-0.007
rd-0.007
zie-0.007
 counters-0.007
 blankets-0.006
Token butter
Token ID9215
Feature activation+1.53e-03

Pos logit contributions
 dining+0.001
 counters+0.001
rd+0.001
zie+0.001
 blankets+0.001
Neg logit contributions
 judge-0.001
 judged-0.001
 pigeon-0.001
 scored-0.001
 teased-0.001
Token sandwich
Token ID20433
Feature activation-1.20e-03

Pos logit contributions
 judge+0.002
 judged+0.002
 pigeon+0.002
 scored+0.002
 teased+0.002
Neg logit contributions
 dining-0.003
rd-0.003
zie-0.003
 counters-0.003
 blankets-0.003
Token my
Token ID616
Feature activation+1.11e-03

Pos logit contributions
 dining+0.018
rd+0.016
 counters+0.016
zie+0.016
 fluids+0.015
Neg logit contributions
 judge-0.012
 pigeon-0.011
 judged-0.011
 en-0.011
 jealous-0.011
Token lively
Token ID29696
Feature activation+8.32e-04

Pos logit contributions
 judge+0.002
 pigeon+0.002
 judged+0.002
 scored+0.002
 en+0.002
Neg logit contributions
 dining-0.002
zie-0.002
rd-0.002
 counters-0.002
 blankets-0.002
Token twins
Token ID20345
Feature activation-1.33e-02

Pos logit contributions
 judge+0.001
 pigeon+0.001
 judged+0.001
 teased+0.001
 en+0.001
Neg logit contributions
 dining-0.002
zie-0.001
 fluids-0.001
rd-0.001
 blankets-0.001
Token.
Token ID13
Feature activation+1.34e-02

Pos logit contributions
 dining+0.020
rd+0.017
 counters+0.016
 blankets+0.016
zie+0.015
Neg logit contributions
 pigeon-0.025
 judged-0.025
 judge-0.024
 scored-0.024
 en-0.024
Token I
Token ID314
Feature activation-5.28e-03

Pos logit contributions
 judge+0.019
 judged+0.018
 pigeon+0.018
 teased+0.017
 en+0.017
Neg logit contributions
 dining-0.027
rd-0.024
 counters-0.024
zie-0.024
 blankets-0.024
Token know
Token ID760
Feature activation-1.50e-02

Pos logit contributions
 dining+0.009
rd+0.008
 fluids+0.008
zie+0.008
 counters+0.007
Neg logit contributions
ked-0.009
 pigeon-0.009
 judge-0.008
 judged-0.008
 scored-0.008
Token you
Token ID345
Feature activation+1.18e-02

Pos logit contributions
 dining+0.030
rd+0.026
zie+0.026
 counters+0.026
 blankets+0.025
Neg logit contributions
 judge-0.022
 judged-0.021
 pigeon-0.021
 scored-0.021
 teased-0.020
Token didn
Token ID1422
Feature activation+6.83e-03

Pos logit contributions
 judge+0.017
 pigeon+0.017
ked+0.017
 judged+0.016
 en+0.016
Neg logit contributions
 dining-0.022
rd-0.019
 fluids-0.019
zie-0.019
 counters-0.019
Token't
Token ID470
Feature activation-8.19e-03

Pos logit contributions
 judged+0.009
 judge+0.009
 pigeon+0.008
ked+0.008
 scored+0.008
Neg logit contributions
 dining-0.014
rd-0.013
 counters-0.013
zie-0.013
 blankets-0.013
Token mean
Token ID1612
Feature activation+2.99e-02

Pos logit contributions
 dining+0.018
rd+0.017
zie+0.017
 counters+0.017
 blankets+0.016
Neg logit contributions
ked-0.009
 pigeon-0.009
 judged-0.008
 judge-0.008
 scored-0.008
Token to
Token ID284
Feature activation+2.45e-02

Pos logit contributions
 judged+0.030
ked+0.030
 judge+0.029
 scored+0.028
 en+0.027
Neg logit contributions
 dining-0.064
 counters-0.063
rd-0.062
 blankets-0.061
 directions-0.060
Token.
Token ID13
Feature activation-2.63e-03

Pos logit contributions
 dining+0.078
rd+0.073
zie+0.073
 fluids+0.071
 counters+0.071
Neg logit contributions
 judge-0.040
 at-0.038
 judged-0.038
 scored-0.037
 pigeon-0.037
Token They
Token ID1119
Feature activation-2.53e-02

Pos logit contributions
 dining+0.004
 counters+0.004
rd+0.004
 blankets+0.004
zie+0.004
Neg logit contributions
 judge-0.005
 judged-0.005
 pigeon-0.004
 scored-0.004
 teased-0.004
Token take
Token ID1011
Feature activation-1.93e-02

Pos logit contributions
 dining+0.045
 blankets+0.040
 counters+0.039
zie+0.038
 bedrooms+0.035
Neg logit contributions
 judged-0.040
ked-0.040
 pigeon-0.040
 judge-0.039
 teased-0.038
Token out
Token ID503
Feature activation-6.17e-03

Pos logit contributions
 dining+0.037
 blankets+0.035
 counters+0.034
rd+0.034
zie+0.032
Neg logit contributions
 judge-0.029
 judged-0.028
 pigeon-0.026
 teased-0.026
 scored-0.026
Token two
Token ID734
Feature activation-3.03e-02

Pos logit contributions
 dining+0.013
 blankets+0.012
 counters+0.012
rd+0.012
zie+0.012
Neg logit contributions
 judge-0.008
 judged-0.008
 scored-0.007
 pigeon-0.007
 teased-0.007
Token toys
Token ID14958
Feature activation-2.62e-02

Pos logit contributions
 dining+0.055
zie+0.054
rd+0.050
 fluids+0.049
 Amen+0.047
Neg logit contributions
 judge-0.047
 at-0.047
 judged-0.046
 pigeon-0.046
 en-0.046
Token.
Token ID13
Feature activation-4.20e-03

Pos logit contributions
 dining+0.047
 blankets+0.042
 counters+0.041
rd+0.040
zie+0.038
Neg logit contributions
 judged-0.042
 judge-0.042
 en-0.041
 pigeon-0.040
 scored-0.040
Token One
Token ID1881
Feature activation-1.35e-02

Pos logit contributions
 dining+0.007
 counters+0.006
rd+0.006
 blankets+0.006
zie+0.006
Neg logit contributions
 judge-0.008
 judged-0.008
 pigeon-0.007
 scored-0.007
 teased-0.007
Token is
Token ID318
Feature activation+1.58e-03

Pos logit contributions
 dining+0.026
rd+0.023
 counters+0.023
zie+0.023
 blankets+0.023
Neg logit contributions
 judge-0.021
 judged-0.020
 pigeon-0.020
 scored-0.020
ked-0.020
Token a
Token ID257
Feature activation-2.01e-03

Pos logit contributions
 judge+0.002
 judged+0.002
 pigeon+0.002
ked+0.002
 bully+0.002
Neg logit contributions
 dining-0.003
 blankets-0.003
 counters-0.003
zie-0.003
rd-0.003
Token car
Token ID1097
Feature activation-1.33e-02

Pos logit contributions
 dining+0.004
 blankets+0.003
 counters+0.003
rd+0.003
zie+0.003
Neg logit contributions
ked-0.003
 judged-0.003
 teased-0.003
 judge-0.003
 scored-0.003
Token story
Token ID1621
Feature activation+9.17e-03

Pos logit contributions
 dining+0.003
rd+0.002
 counters+0.002
zie+0.002
 blankets+0.002
Neg logit contributions
 judge-0.003
 judged-0.003
 pigeon-0.003
 scored-0.003
 teased-0.003
Token.
Token ID13
Feature activation-1.49e-02

Pos logit contributions
 judged+0.016
 judge+0.015
 pigeon+0.015
ked+0.015
 teased+0.015
Neg logit contributions
 dining-0.016
rd-0.014
 counters-0.014
 blankets-0.014
zie-0.013
Token\n
Token ID198
Feature activation-2.09e-02

Pos logit contributions
 dining+0.027
 counters+0.024
zie+0.024
rd+0.024
 blankets+0.023
Neg logit contributions
 judge-0.024
 pigeon-0.024
 judged-0.024
ked-0.023
 scored-0.023
Token\n
Token ID198
Feature activation-2.80e-02

Pos logit contributions
 dining+0.052
zie+0.048
 counters+0.048
 blankets+0.048
rd+0.047
Neg logit contributions
 judge-0.019
 judged-0.018
 scored-0.018
 teased-0.017
 en-0.017
Token"
Token ID1
Feature activation-9.10e-03

Pos logit contributions
 dining+0.048
zie+0.045
 blankets+0.044
 counters+0.043
 fluids+0.043
Neg logit contributions
 judge-0.047
 scored-0.045
 judged-0.045
 jealous-0.043
 pigeon-0.043
TokenMom
Token ID29252
Feature activation-4.78e-02

Pos logit contributions
 dining+0.015
zie+0.012
rd+0.012
 counters+0.012
 blankets+0.012
Neg logit contributions
 judge-0.017
 judged-0.016
 pigeon-0.016
ked-0.016
 en-0.016
Tokenmy
Token ID1820
Feature activation-3.14e-02

Pos logit contributions
 dining+0.069
 fluids+0.064
rd+0.061
 counters+0.059
 occasions+0.057
Neg logit contributions
 judge-0.097
 teased-0.093
 judged-0.093
 at-0.092
 pin-0.091
Token,
Token ID11
Feature activation-1.70e-02

Pos logit contributions
 dining+0.062
rd+0.055
zie+0.055
 counters+0.055
 blankets+0.054
Neg logit contributions
 judge-0.045
 judged-0.044
 pigeon-0.043
 scored-0.043
 teased-0.042
Token I
Token ID314
Feature activation-1.61e-02

Pos logit contributions
 dining+0.036
rd+0.033
zie+0.032
 counters+0.032
 blankets+0.032
Neg logit contributions
 judge-0.022
 judged-0.022
 pigeon-0.021
 scored-0.021
 teased-0.021
Token'm
Token ID1101
Feature activation-1.74e-02

Pos logit contributions
 dining+0.031
rd+0.028
zie+0.027
 counters+0.026
 fluids+0.026
Neg logit contributions
ked-0.024
 pigeon-0.023
 en-0.022
 judge-0.022
 judged-0.022
Token enjoying
Token ID13226
Feature activation+1.33e-02

Pos logit contributions
 dining+0.032
 counters+0.025
 bed+0.025
rd+0.025
 blankets+0.024
Neg logit contributions
ked-0.030
 pigeon-0.028
 scored-0.028
 judge-0.027
 judged-0.027
Token.
Token ID13
Feature activation-2.02e-03

Pos logit contributions
 dining+0.031
 blankets+0.028
 counters+0.027
rd+0.026
zie+0.026
Neg logit contributions
 judge-0.035
 pigeon-0.034
 judged-0.034
 scored-0.033
 teased-0.032
Token\n
Token ID198
Feature activation-6.44e-04

Pos logit contributions
 dining+0.002
 counters+0.002
 blankets+0.002
zie+0.002
rd+0.002
Neg logit contributions
 pigeon-0.004
 judge-0.004
 judged-0.004
 ar-0.004
ked-0.004
Token\n
Token ID198
Feature activation-1.02e-02

Pos logit contributions
 dining+0.001
rd+0.001
zie+0.001
 counters+0.001
 blankets+0.001
Neg logit contributions
 judge-0.001
 judged-0.001
 scored-0.001
 pigeon-0.001
 teased-0.001
TokenTheir
Token ID14574
Feature activation-6.07e-03

Pos logit contributions
 dining+0.018
zie+0.016
rd+0.016
 counters+0.015
 blankets+0.015
Neg logit contributions
 judge-0.018
 judged-0.018
 scored-0.017
 pigeon-0.017
 teased-0.017
Token moms
Token ID34015
Feature activation-7.75e-03

Pos logit contributions
 dining+0.011
 blankets+0.010
 counters+0.010
rd+0.009
zie+0.009
Neg logit contributions
 judged-0.010
 pigeon-0.009
 judge-0.009
 scored-0.009
ked-0.009
Token heard
Token ID2982
Feature activation-4.49e-02

Pos logit contributions
 dining+0.016
 counters+0.015
rd+0.015
zie+0.015
 blankets+0.014
Neg logit contributions
 judge-0.010
 judged-0.010
 pigeon-0.010
 scored-0.009
 en-0.009
Token the
Token ID262
Feature activation-2.07e-03

Pos logit contributions
 dining+0.105
rd+0.096
zie+0.092
 counters+0.090
 fluids+0.089
Neg logit contributions
 judge-0.052
 judged-0.051
 pigeon-0.050
 teased-0.050
 en-0.050
Token noise
Token ID7838
Feature activation-8.67e-03

Pos logit contributions
 dining+0.004
zie+0.004
rd+0.003
 fluids+0.003
 counters+0.003
Neg logit contributions
 judge-0.003
 bully-0.003
 jealous-0.003
 en-0.003
 judged-0.003
Token and
Token ID290
Feature activation+1.66e-02

Pos logit contributions
 dining+0.017
 counters+0.016
 blankets+0.016
rd+0.015
zie+0.015
Neg logit contributions
 judge-0.012
 judged-0.012
 pigeon-0.012
 scored-0.012
ked-0.011
Token came
Token ID1625
Feature activation-2.08e-02

Pos logit contributions
 judge+0.018
 pigeon+0.017
 judged+0.017
 scored+0.017
 teased+0.016
Neg logit contributions
 dining-0.038
 counters-0.035
zie-0.035
 blankets-0.034
rd-0.034
Token to
Token ID284
Feature activation-1.31e-02

Pos logit contributions
 dining+0.072
rd+0.069
 fluids+0.068
zie+0.066
 Amen+0.064
Neg logit contributions
 at-0.004
 to-0.004
 teased-0.001
 judge-0.000
 pin+0.000
Token garden
Token ID11376
Feature activation-5.51e-03

Pos logit contributions
 dining+0.017
rd+0.016
 occasions+0.016
 Amen+0.016
 fluids+0.016
Neg logit contributions
 judge-0.012
 en-0.011
 jealous-0.011
 judged-0.011
 shy-0.011
Token.
Token ID13
Feature activation-2.03e-02

Pos logit contributions
 dining+0.011
rd+0.009
zie+0.009
 counters+0.009
 blankets+0.009
Neg logit contributions
 judge-0.008
 judged-0.008
 pigeon-0.008
 scored-0.008
 teased-0.008
Token<|endoftext|>
Token ID50256
Feature activation-1.20e-02

Pos logit contributions
 dining+0.042
 counters+0.037
zie+0.037
rd+0.037
 blankets+0.037
Neg logit contributions
 judge-0.027
 judged-0.027
 scored-0.026
 pigeon-0.025
 en-0.025
Token"
Token ID1
Feature activation-5.63e-03

Pos logit contributions
 dining+0.027
 counters+0.025
rd+0.025
zie+0.025
 fluids+0.024
Neg logit contributions
 scored-0.013
 judge-0.013
 at-0.013
 teased-0.012
 judged-0.012
TokenMom
Token ID29252
Feature activation-5.83e-02

Pos logit contributions
 dining+0.010
rd+0.009
zie+0.008
 counters+0.008
 blankets+0.008
Neg logit contributions
 judge-0.010
 judged-0.009
 pigeon-0.009
 scored-0.009
 teased-0.009
Tokenmy
Token ID1820
Feature activation-4.61e-02

Pos logit contributions
 dining+0.087
 fluids+0.085
rd+0.084
 counters+0.077
 occasions+0.077
Neg logit contributions
 at-0.109
 judge-0.108
 teased-0.105
 judged-0.103
 scored-0.102
Token,
Token ID11
Feature activation-9.67e-03

Pos logit contributions
 dining+0.097
rd+0.088
 counters+0.087
 fluids+0.086
zie+0.086
Neg logit contributions
 judge-0.062
 judged-0.059
 scored-0.058
 pigeon-0.058
 teased-0.058
Token what
Token ID644
Feature activation+2.01e-02

Pos logit contributions
 dining+0.020
 counters+0.018
rd+0.018
zie+0.018
 blankets+0.017
Neg logit contributions
 judge-0.013
 judged-0.013
 pigeon-0.012
 scored-0.012
 teased-0.012
Token's
Token ID338
Feature activation-8.12e-03

Pos logit contributions
ked+0.032
 judged+0.031
 judge+0.031
 pigeon+0.030
 teased+0.030
Neg logit contributions
 dining-0.037
rd-0.033
 counters-0.031
 blankets-0.030
 fluids-0.030
Token that
Token ID326
Feature activation-1.49e-02

Pos logit contributions
 dining+0.016
rd+0.014
 counters+0.014
zie+0.014
 blankets+0.014
Neg logit contributions
 judged-0.012
 judge-0.012
ked-0.011
 scored-0.011
 teased-0.011
Token?"
Token ID1701
Feature activation-1.49e-02

Pos logit contributions
 dining+0.020
zie+0.017
 counters+0.016
 blankets+0.016
rd+0.015
Neg logit contributions
ked-0.033
 judged-0.030
 scored-0.029
 pigeon-0.029
 judge-0.029
Token!
Token ID0
Feature activation-2.28e-02

Pos logit contributions
 dining+0.046
rd+0.041
 fluids+0.041
 counters+0.040
zie+0.040
Neg logit contributions
 judge-0.030
 judged-0.028
 teased-0.028
 pigeon-0.028
 scored-0.027
Token\n
Token ID198
Feature activation-2.66e-02

Pos logit contributions
 dining+0.042
 counters+0.036
zie+0.036
rd+0.036
 blankets+0.035
Neg logit contributions
 judge-0.037
 judged-0.036
 pigeon-0.036
 scored-0.035
ked-0.035
Token\n
Token ID198
Feature activation-3.20e-02

Pos logit contributions
 dining+0.063
zie+0.058
rd+0.057
 counters+0.056
 blankets+0.056
Neg logit contributions
 judge-0.029
 judged-0.028
 scored-0.028
 en-0.027
 teased-0.026
TokenM
Token ID44
Feature activation-2.79e-02

Pos logit contributions
 dining+0.054
zie+0.048
rd+0.047
 counters+0.046
 blankets+0.046
Neg logit contributions
 judge-0.057
 judged-0.056
 pigeon-0.055
 scored-0.055
 teased-0.054
Tokenummy
Token ID13513
Feature activation-3.07e-03

Pos logit contributions
 dining+0.038
 blankets+0.032
rd+0.032
zie+0.031
 counters+0.031
Neg logit contributions
 judge-0.059
 judged-0.057
 pigeon-0.056
 scored-0.055
 teased-0.055
Token asked
Token ID1965
Feature activation-4.44e-02

Pos logit contributions
 dining+0.007
zie+0.006
rd+0.006
 counters+0.006
 blankets+0.006
Neg logit contributions
 judge-0.004
 judged-0.004
 pigeon-0.004
 scored-0.003
 teased-0.003
Token,
Token ID11
Feature activation-7.21e-03

Pos logit contributions
 dining+0.091
rd+0.082
zie+0.081
 counters+0.081
 blankets+0.079
Neg logit contributions
 judge-0.062
 judged-0.060
 scored-0.058
 pigeon-0.058
 teased-0.058
Token
Token ID6184
Feature activation-1.01e-02

Pos logit contributions
 dining+0.014
 counters+0.012
zie+0.012
rd+0.012
 blankets+0.011
Neg logit contributions
 judge-0.011
 judged-0.011
 scored-0.011
 pigeon-0.011
 teased-0.010
Token
Token ID95
Feature activation-7.44e-03

Pos logit contributions
 dining+0.018
 counters+0.016
rd+0.016
zie+0.015
 fluids+0.015
Neg logit contributions
 judged-0.017
 judge-0.017
 scored-0.016
 teased-0.016
 pigeon-0.016
Token
Token ID26391
Feature activation+8.04e-03

Pos logit contributions
 dining+0.013
 blankets+0.011
zie+0.011
 counters+0.011
rd+0.011
Neg logit contributions
 judge-0.013
 judged-0.012
 scored-0.012
 pigeon-0.012
 teased-0.012
Token
Token ID129
Feature activation+9.96e-03

Pos logit contributions
 judge+0.012
 judged+0.012
 pigeon+0.012
 scored+0.012
 teased+0.012
Neg logit contributions
 dining-0.015
zie-0.013
 counters-0.013
rd-0.013
 blankets-0.013
Token to
Token ID284
Feature activation+1.99e-02

Pos logit contributions
 dining+0.024
 blankets+0.022
 counters+0.021
rd+0.021
zie+0.021
Neg logit contributions
 judged-0.008
 judge-0.008
 pigeon-0.008
 scored-0.007
 teased-0.007
Token the
Token ID262
Feature activation+2.00e-02

Pos logit contributions
 judge+0.021
 judged+0.021
 pigeon+0.020
 scored+0.020
ked+0.019
Neg logit contributions
 dining-0.047
rd-0.043
zie-0.043
 counters-0.043
 blankets-0.042
Token park
Token ID3952
Feature activation+8.34e-03

Pos logit contributions
ked+0.038
 judged+0.037
 teased+0.036
 scored+0.036
 judge+0.034
Neg logit contributions
 dining-0.033
 blankets-0.026
 counters-0.026
rd-0.026
zie-0.026
Token with
Token ID351
Feature activation-3.79e-03

Pos logit contributions
 judge+0.009
 judged+0.009
 pigeon+0.009
 scored+0.008
ked+0.008
Neg logit contributions
 dining-0.020
rd-0.018
 counters-0.018
zie-0.018
 blankets-0.017
Token her
Token ID607
Feature activation-3.04e-03

Pos logit contributions
 dining+0.010
rd+0.009
 counters+0.009
zie+0.009
 fluids+0.009
Neg logit contributions
 judge-0.003
 judged-0.003
 teased-0.003
 en-0.003
 jealous-0.003
Token mom
Token ID1995
Feature activation-4.77e-02

Pos logit contributions
 dining+0.006
 blankets+0.005
 counters+0.005
zie+0.005
rd+0.005
Neg logit contributions
 scored-0.004
 judged-0.004
 judge-0.004
ked-0.004
 pigeon-0.004
Token.
Token ID13
Feature activation+2.44e-03

Pos logit contributions
 dining+0.114
 fluids+0.108
 counters+0.106
rd+0.106
zie+0.104
Neg logit contributions
 at-0.055
 judge-0.052
 teased-0.046
 to-0.045
 judged-0.044
Token They
Token ID1119
Feature activation-1.46e-02

Pos logit contributions
 judge+0.003
 judged+0.003
 teased+0.003
 pigeon+0.003
 scored+0.003
Neg logit contributions
 dining-0.005
rd-0.004
zie-0.004
 counters-0.004
 fluids-0.004
Token saw
Token ID2497
Feature activation+9.88e-04

Pos logit contributions
 dining+0.030
zie+0.026
 blankets+0.026
 counters+0.025
 bedrooms+0.024
Neg logit contributions
 judge-0.019
 bully-0.019
 pigeon-0.019
 judged-0.018
ked-0.018
Token an
Token ID281
Feature activation-5.48e-03

Pos logit contributions
 judge+0.001
 judged+0.001
 scored+0.001
ked+0.001
 teased+0.001
Neg logit contributions
 dining-0.002
 blankets-0.002
 counters-0.002
zie-0.002
rd-0.002
Token interesting
Token ID3499
Feature activation-3.79e-03

Pos logit contributions
 dining+0.011
rd+0.010
zie+0.010
 counters+0.010
 blankets+0.010
Neg logit contributions
 judge-0.007
 judged-0.007
 scored-0.007
ked-0.007
 pigeon-0.007