TOP ACTIVATIONS
MAX = 0.044

lion with a big man e and sharp teeth . Lily
lion with a big man e and sharp teeth .
he gave a big sne e ze ! Big Bear sne
, after an extra - long flight through the desert ,
He has a big man e of hair and a loud
a big , scary sne e ze . Every time he
. He had a man e and sharp teeth . He
He had a thick man e and he loved to roar
lion with a fluffy man e . She saw a tall
horse with a long man e and a tail . It
yellow and has a man e . It ro ars like
horse with a long man e and a tail . And
suddenly heard a big sne e ze . She opened the
It has a long man e and a long tail .
lion with a golden man e . He looked at her
A few days later , Sab rina had to stay in
and had a long man e . Tom wanted to ride
let out a big sne e ze ! It felt so
with just a tiny sne e ze . He was so
horse with a long man e and a tail . L

MIN ACTIVATIONS
MIN = -0.055

you how to wash the rad ishes ." The little girl
was able to wash the rad ishes and get a drink
they started to pick the blo oms .
he decided to remove the seaw eed from the water so
boy tried to make the rad ish happy . He sang
and together they shook the rad ish until it finally came
the sun would make the sc ooter too hot to touch
, it kept on flying err atically with inexplicable speed .
him how to hold the golf club and how to swing
Jimmy quickly cleaned his sc ooter and then had a
was going to fix the sc ooter . First , Daddy
, " What is this big cross ?" Lily said ,
He decided to take his sc ooter out into the backyard
flew down and snatched the rad ish from Ben . It
Bob tried to pull the rad ish out . He pulled
him how to hold the golf stick and hit the ball
. He showed them how to hold the ladder and slide
, " Can I pick those flowers ?" Her mom smiled
was ready to pick the rad ish . She pl ucked
said . She put the cherry in her mouth and smiled

INTERVAL 0.019 - 0.044
CONTAINS 0.876%

shelf , and they could not reach it . They tried
mom heard the noise and came to see what happened .
was very sad and didn 't know what to do .
home with the fan ci est king in the land .
, there was a little bear named Teddy . Teddy had

INTERVAL -0.005 - 0.019
CONTAINS 74.251%

attend to other customers . Jo anne and Joey said goodbye
the tomato on the floor . He is frightened and starts
and took the button out of his mouth . He felt
their favorite shows . Sometimes they watched cartoons , sometimes they
! You spoil the cr ay ons !" Tom

INTERVAL -0.030 - -0.005
CONTAINS 24.839%

mom said , " Good job , Lily ! Now we
Ben says . He tries to lift the rock , but
sorry . They take Mia and Tim home . They put
hand and pur red . L ily hugged the
say , " Okay , Mom . We will share .

INTERVAL -0.055 - -0.030
CONTAINS 0.034%

sad ." Anna and Ben look at each other
mom my showed her how to use just a little bit
" Good job , Lily and Tom !" Mom says .
house . They wash the beans and put them in a
and it kept her warm even when it was cold outside

Token lion
Token ID18744
Feature activation+1.91e-02

Pos logit contributions
 agrees+0.009
 cleans+0.008
ours+0.008
 learns+0.008
 wasteful+0.008
Neg logit contributions
 stub-0.004
 all-0.004
 ever-0.004
 nowhere-0.004
--0.004
Token with
Token ID351
Feature activation+2.10e-02

Pos logit contributions
 stub+0.013
 all+0.011
 year+0.011
 nowhere+0.011
 ever+0.010
Neg logit contributions
 agrees-0.051
 cleans-0.047
ours-0.047
 learns-0.046
 cooperate-0.044
Token a
Token ID257
Feature activation+3.57e-03

Pos logit contributions
 stub+0.024
 year+0.023
-+0.021
 pupil+0.021
 nowhere+0.020
Neg logit contributions
 agrees-0.045
 cleans-0.043
ours-0.040
 learns-0.040
 yeast-0.040
Token big
Token ID1263
Feature activation-7.63e-03

Pos logit contributions
 all+0.003
 stub+0.003
 ever+0.003
 year+0.003
-+0.003
Neg logit contributions
 agrees-0.009
 cleans-0.008
ours-0.008
 learns-0.008
 wasteful-0.007
Token man
Token ID582
Feature activation+9.98e-03

Pos logit contributions
 agrees+0.017
 cleans+0.016
ours+0.015
 learns+0.015
 cooperate+0.014
Neg logit contributions
 stub-0.008
 all-0.008
 year-0.008
 ever-0.008
 nowhere-0.008
Tokene
Token ID68
Feature activation+4.40e-02

Pos logit contributions
 stub+0.008
 all+0.008
 year+0.007
 nowhere+0.007
 t+0.006
Neg logit contributions
 agrees-0.025
 cleans-0.023
ours-0.023
 learns-0.022
 cooperate-0.021
Token and
Token ID290
Feature activation+3.80e-03

Pos logit contributions
 stub+0.051
 nowhere+0.049
 pupil+0.045
 year+0.045
 ever+0.044
Neg logit contributions
 agrees-0.089
 cleans-0.083
 learns-0.079
ours-0.077
 becomes-0.074
Token sharp
Token ID7786
Feature activation+1.06e-02

Pos logit contributions
 stub+0.003
 all+0.003
 year+0.003
-+0.003
 ever+0.003
Neg logit contributions
 agrees-0.009
 cleans-0.009
ours-0.009
 learns-0.008
 wasteful-0.008
Token teeth
Token ID9941
Feature activation+1.82e-02

Pos logit contributions
 stub+0.016
 nowhere+0.016
 year+0.016
 all+0.016
 ever+0.016
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.017
 learns-0.017
 becomes-0.015
Token.
Token ID13
Feature activation-8.09e-03

Pos logit contributions
 stub+0.013
 year+0.011
 nowhere+0.010
 pupil+0.010
 all+0.010
Neg logit contributions
 agrees-0.049
 cleans-0.046
ours-0.045
 learns-0.044
 cooperate-0.042
Token Lily
Token ID20037
Feature activation-7.55e-03

Pos logit contributions
 agrees+0.020
 cleans+0.019
ours+0.019
 wasteful+0.018
 learns+0.018
Neg logit contributions
 all-0.008
 one-0.007
 stub-0.007
--0.007
 just-0.007
Token lion
Token ID18744
Feature activation+1.39e-02

Pos logit contributions
 agrees+0.024
 cleans+0.022
ours+0.022
 learns+0.021
 wasteful+0.021
Neg logit contributions
 stub-0.010
 all-0.010
 ever-0.010
 one-0.009
--0.009
Token with
Token ID351
Feature activation+2.07e-02

Pos logit contributions
 stub+0.009
 all+0.008
 year+0.008
 nowhere+0.008
 ever+0.007
Neg logit contributions
 agrees-0.038
 cleans-0.035
ours-0.035
 learns-0.034
 cooperate-0.033
Token a
Token ID257
Feature activation+5.94e-03

Pos logit contributions
 stub+0.023
 year+0.022
-+0.020
 pupil+0.020
 nowhere+0.019
Neg logit contributions
 agrees-0.045
 cleans-0.044
ours-0.040
 yeast-0.040
 learns-0.039
Token big
Token ID1263
Feature activation-2.87e-03

Pos logit contributions
 all+0.005
 stub+0.005
 ever+0.005
 year+0.005
-+0.005
Neg logit contributions
 agrees-0.015
 cleans-0.014
ours-0.013
 learns-0.013
 wasteful-0.013
Token man
Token ID582
Feature activation+4.98e-03

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.006
 learns+0.005
 cooperate+0.005
Neg logit contributions
 stub-0.003
 all-0.003
 ever-0.003
 year-0.003
 nowhere-0.003
Tokene
Token ID68
Feature activation+4.40e-02

Pos logit contributions
 stub+0.004
 all+0.004
 year+0.004
 nowhere+0.004
 pupil+0.003
Neg logit contributions
 agrees-0.012
 cleans-0.011
ours-0.011
 learns-0.011
 cooperate-0.010
Token and
Token ID290
Feature activation+3.24e-03

Pos logit contributions
 pupil+0.048
 stub+0.048
 nowhere+0.047
 year+0.045
 visitor+0.045
Neg logit contributions
 agrees-0.088
 cleans-0.085
 learns-0.078
ours-0.077
 becomes-0.073
Token sharp
Token ID7786
Feature activation+1.37e-02

Pos logit contributions
 stub+0.003
 all+0.003
 year+0.003
-+0.003
 ever+0.003
Neg logit contributions
 agrees-0.008
 cleans-0.007
ours-0.007
 learns-0.007
 wasteful-0.007
Token teeth
Token ID9941
Feature activation+1.80e-02

Pos logit contributions
 stub+0.021
 nowhere+0.021
 year+0.021
 all+0.021
 ever+0.021
Neg logit contributions
 agrees-0.024
 cleans-0.023
ours-0.022
 learns-0.021
 becomes-0.019
Token.
Token ID13
Feature activation+2.68e-03

Pos logit contributions
 stub+0.013
 pupil+0.011
 year+0.011
 nowhere+0.011
 all+0.010
Neg logit contributions
 agrees-0.048
 cleans-0.045
ours-0.044
 learns-0.043
 cooperate-0.041
Token
Token ID220
Feature activation-7.02e-03

Pos logit contributions
 all+0.003
 stub+0.003
-+0.003
 one+0.003
 ever+0.003
Neg logit contributions
 agrees-0.006
 cleans-0.006
ours-0.006
 learns-0.005
 wasteful-0.005
Token he
Token ID339
Feature activation+1.75e-02

Pos logit contributions
 stub+0.011
-+0.011
 all+0.010
 pupil+0.010
 year+0.010
Neg logit contributions
 agrees-0.025
 cleans-0.025
ours-0.024
 safer-0.023
 learns-0.023
Token gave
Token ID2921
Feature activation+9.78e-03

Pos logit contributions
 all+0.018
 pupil+0.016
 ad+0.016
 one+0.015
 spirit+0.015
Neg logit contributions
 agrees-0.037
 cleans-0.036
ours-0.034
 learns-0.033
 finishes-0.033
Token a
Token ID257
Feature activation+6.85e-03

Pos logit contributions
 stub+0.013
 year+0.012
 pupil+0.012
t+0.012
dream+0.012
Neg logit contributions
 cleans-0.019
 agrees-0.019
 safer-0.018
 ourselves-0.018
ours-0.017
Token big
Token ID1263
Feature activation+2.06e-03

Pos logit contributions
 all+0.006
 stub+0.005
 one+0.005
-+0.005
 t+0.005
Neg logit contributions
 agrees-0.018
ours-0.017
 cleans-0.017
 wasteful-0.016
 learns-0.016
Token sne
Token ID10505
Feature activation+2.16e-02

Pos logit contributions
 all+0.003
 stub+0.003
 ever+0.003
-+0.003
 one+0.003
Neg logit contributions
 agrees-0.004
 cleans-0.004
ours-0.004
 learns-0.003
 wasteful-0.003
Tokene
Token ID68
Feature activation+4.39e-02

Pos logit contributions
 nowhere+0.031
 all+0.030
 stub+0.029
 t+0.028
 addressed+0.027
Neg logit contributions
 agrees-0.043
 cleans-0.039
 learns-0.036
ws-0.036
 cooperate-0.036
Tokenze
Token ID2736
Feature activation+2.84e-02

Pos logit contributions
 nowhere+0.058
 ever+0.053
 quite+0.050
 stub+0.049
 usually+0.049
Neg logit contributions
 agrees-0.080
 cleans-0.076
ours-0.068
ashes-0.063
 learns-0.063
Token!
Token ID0
Feature activation+5.76e-03

Pos logit contributions
 stub+0.021
 nowhere+0.019
 year+0.018
 pupil+0.018
 all+0.018
Neg logit contributions
 agrees-0.072
 cleans-0.068
ours-0.064
 safer-0.062
 learns-0.061
Token Big
Token ID4403
Feature activation+1.14e-02

Pos logit contributions
 all+0.006
 stub+0.005
 one+0.005
-+0.005
 in+0.005
Neg logit contributions
 agrees-0.014
 cleans-0.013
ours-0.013
 wasteful-0.012
 learns-0.012
Token Bear
Token ID14732
Feature activation+2.99e-03

Pos logit contributions
 all+0.012
 stub+0.012
 ever+0.011
 year+0.011
 one+0.010
Neg logit contributions
 agrees-0.027
 cleans-0.025
ours-0.024
 learns-0.023
 wasteful-0.023
Token sne
Token ID10505
Feature activation+2.05e-02

Pos logit contributions
 stub+0.003
 all+0.002
 one+0.002
-+0.002
 ever+0.002
Neg logit contributions
 agrees-0.008
 cleans-0.007
ours-0.007
 learns-0.007
 cooperate-0.006
Token,
Token ID11
Feature activation-6.13e-03

Pos logit contributions
 stub+0.001
 all+0.001
 ever+0.001
 year+0.001
-+0.001
Neg logit contributions
 agrees-0.004
 cleans-0.003
ours-0.003
 learns-0.003
 wasteful-0.003
Token after
Token ID706
Feature activation+2.06e-02

Pos logit contributions
 agrees+0.012
 cleans+0.011
ours+0.011
 wasteful+0.011
 learns+0.010
Neg logit contributions
 stub-0.008
--0.007
 ever-0.007
 year-0.007
 all-0.007
Token an
Token ID281
Feature activation+1.91e-02

Pos logit contributions
 stub+0.020
-+0.017
 all+0.017
 just+0.016
 year+0.016
Neg logit contributions
 agrees-0.047
 cleans-0.046
ours-0.045
 learns-0.043
 wasteful-0.042
Token extra
Token ID3131
Feature activation+1.45e-02

Pos logit contributions
 stub+0.021
 all+0.018
 t+0.017
 just+0.017
 ever+0.017
Neg logit contributions
 agrees-0.043
 cleans-0.039
ours-0.039
 wasteful-0.038
 learns-0.038
Token-
Token ID12
Feature activation+2.89e-02

Pos logit contributions
 all+0.014
 stub+0.013
 nowhere+0.013
 ever+0.012
 just+0.012
Neg logit contributions
 agrees-0.034
 cleans-0.031
ours-0.030
 learns-0.029
 safer-0.029
Tokenlong
Token ID6511
Feature activation+4.38e-02

Pos logit contributions
 all+0.043
 t+0.041
 stub+0.038
 nowhere+0.037
 just+0.036
Neg logit contributions
 agrees-0.054
 cleans-0.048
 learns-0.047
 cooperate-0.045
ours-0.045
Token flight
Token ID5474
Feature activation-6.98e-03

Pos logit contributions
 all+0.044
 nowhere+0.042
 stub+0.039
 addressed+0.036
 even+0.034
Neg logit contributions
 agrees-0.091
 cleans-0.089
 yeast-0.082
ours-0.081
 cooperate-0.081
Token through
Token ID832
Feature activation+7.20e-03

Pos logit contributions
 agrees+0.017
 cleans+0.015
ours+0.015
 learns+0.015
 cooperate+0.014
Neg logit contributions
 stub-0.006
 all-0.006
 t-0.006
 nowhere-0.006
 ever-0.005
Token the
Token ID262
Feature activation+1.30e-02

Pos logit contributions
 stub+0.010
-+0.008
 just+0.008
 ever+0.008
 pupil+0.008
Neg logit contributions
 agrees-0.013
 cleans-0.013
 learns-0.013
ours-0.012
 safer-0.012
Token desert
Token ID10326
Feature activation+7.31e-03

Pos logit contributions
 all+0.013
 stub+0.013
 one+0.012
-+0.012
 ever+0.012
Neg logit contributions
 agrees-0.031
 cleans-0.029
ours-0.028
 learns-0.027
 wasteful-0.027
Token,
Token ID11
Feature activation+4.28e-03

Pos logit contributions
 all+0.006
 stub+0.005
 one+0.005
-+0.005
 ever+0.005
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.018
 learns-0.017
 wasteful-0.017
Token He
Token ID679
Feature activation+8.06e-03

Pos logit contributions
 agrees+0.016
 cleans+0.015
ours+0.015
 wasteful+0.014
 learns+0.014
Neg logit contributions
 all-0.010
 one-0.009
 stub-0.009
 in-0.009
 ever-0.009
Token has
Token ID468
Feature activation+8.53e-03

Pos logit contributions
 all+0.015
 just+0.014
-+0.014
 one+0.014
 in+0.014
Neg logit contributions
 agrees-0.013
ours-0.012
 cleans-0.011
 wasteful-0.011
 cooperate-0.010
Token a
Token ID257
Feature activation+6.17e-03

Pos logit contributions
 stub+0.009
 all+0.008
-+0.008
 year+0.008
 one+0.008
Neg logit contributions
 agrees-0.020
 cleans-0.019
ours-0.018
 learns-0.018
 wasteful-0.017
Token big
Token ID1263
Feature activation-7.72e-03

Pos logit contributions
 all+0.007
 nowhere+0.007
 stub+0.006
 ever+0.006
 in+0.006
Neg logit contributions
 agrees-0.013
 cleans-0.012
 safer-0.012
ours-0.012
 learns-0.012
Token man
Token ID582
Feature activation+5.28e-03

Pos logit contributions
 agrees+0.016
 cleans+0.015
 learns+0.014
ours+0.014
 cooperate+0.013
Neg logit contributions
 stub-0.009
t-0.009
 nowhere-0.009
 all-0.009
 ever-0.009
Tokene
Token ID68
Feature activation+4.36e-02

Pos logit contributions
 stub+0.005
 all+0.005
 nowhere+0.004
 year+0.004
 t+0.004
Neg logit contributions
 agrees-0.013
 cleans-0.012
ours-0.011
 learns-0.011
 wasteful-0.011
Token of
Token ID286
Feature activation+4.82e-03

Pos logit contributions
 stub+0.048
 nowhere+0.045
 quite+0.040
 pupil+0.040
 ever+0.040
Neg logit contributions
 agrees-0.097
 cleans-0.091
 learns-0.085
ours-0.085
 cooperate-0.078
Token hair
Token ID4190
Feature activation+1.62e-02

Pos logit contributions
 stub+0.006
-+0.006
 year+0.006
 all+0.006
 nowhere+0.005
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 learns-0.009
 cooperate-0.008
Token and
Token ID290
Feature activation+7.41e-03

Pos logit contributions
 stub+0.010
 all+0.010
 one+0.009
-+0.009
 ever+0.009
Neg logit contributions
 agrees-0.044
 cleans-0.042
ours-0.041
 learns-0.040
 wasteful-0.039
Token a
Token ID257
Feature activation+1.11e-03

Pos logit contributions
 all+0.006
 stub+0.006
 one+0.006
-+0.006
 in+0.006
Neg logit contributions
 agrees-0.019
ours-0.018
 cleans-0.018
 wasteful-0.017
 learns-0.017
Token loud
Token ID7812
Feature activation+9.11e-03

Pos logit contributions
 all+0.001
 stub+0.001
 ever+0.001
-+0.001
 nowhere+0.001
Neg logit contributions
 agrees-0.003
 cleans-0.002
ours-0.002
 learns-0.002
 wasteful-0.002
Token a
Token ID257
Feature activation+8.45e-03

Pos logit contributions
 year+0.007
-+0.007
lf+0.007
 elf+0.007
ie+0.007
Neg logit contributions
 cleans-0.011
 yeast-0.011
 safer-0.011
 agrees-0.011
 learns-0.010
Token big
Token ID1263
Feature activation-1.04e-02

Pos logit contributions
 ever+0.008
 all+0.008
 stub+0.007
 year+0.007
 nowhere+0.007
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.018
 wasteful-0.017
 safer-0.017
Token,
Token ID11
Feature activation-3.90e-03

Pos logit contributions
 agrees+0.023
 cleans+0.022
ours+0.022
 learns+0.021
 wasteful+0.021
Neg logit contributions
 stub-0.011
 ever-0.011
 all-0.011
--0.010
 nowhere-0.010
Token scary
Token ID14343
Feature activation+1.26e-02

Pos logit contributions
 cleans+0.008
 agrees+0.008
 wasteful+0.008
 safer+0.008
ours+0.007
Neg logit contributions
 all-0.004
 stub-0.004
 ever-0.004
 year-0.004
--0.004
Token sne
Token ID10505
Feature activation+2.06e-02

Pos logit contributions
 ever+0.015
 nowhere+0.015
t+0.015
 surrounded+0.014
 all+0.014
Neg logit contributions
 agrees-0.023
 cleans-0.022
ours-0.022
 learns-0.021
 wasteful-0.020
Tokene
Token ID68
Feature activation+4.34e-02

Pos logit contributions
 addressed+0.025
 nowhere+0.025
 stub+0.024
 t+0.024
 all+0.024
Neg logit contributions
 agrees-0.041
 cleans-0.039
 learns-0.038
 wasteful-0.037
 cooperate-0.036
Tokenze
Token ID2736
Feature activation+2.44e-02

Pos logit contributions
 Mill+0.058
 elf+0.058
 Sus+0.056
 Peter+0.056
 Frank+0.056
Neg logit contributions
 cleans-0.065
 agrees-0.064
 learns-0.059
ows-0.059
ashes-0.056
Token.
Token ID13
Feature activation-8.88e-03

Pos logit contributions
 stub+0.026
 3+0.023
 pupil+0.022
 nowhere+0.022
yl+0.022
Neg logit contributions
 agrees-0.049
 cleans-0.048
 learns-0.045
 wasteful-0.045
ours-0.044
Token Every
Token ID3887
Feature activation-9.85e-05

Pos logit contributions
 agrees+0.021
 cleans+0.020
ours+0.020
 learns+0.019
 wasteful+0.019
Neg logit contributions
 all-0.009
 stub-0.009
 one-0.008
--0.008
 in-0.008
Token time
Token ID640
Feature activation-5.93e-03

Pos logit contributions
 agrees+0.000
 cleans+0.000
ours+0.000
 learns+0.000
 yeast+0.000
Neg logit contributions
 all-0.000
 stub-0.000
 nowhere-0.000
 just-0.000
 ever-0.000
Token he
Token ID339
Feature activation+6.54e-03

Pos logit contributions
 agrees+0.015
 cleans+0.014
ours+0.014
 learns+0.013
 wasteful+0.013
Neg logit contributions
 all-0.005
 stub-0.005
--0.004
 one-0.004
 year-0.004
Token.
Token ID13
Feature activation+4.01e-03

Pos logit contributions
 stub+0.013
 all+0.012
 year+0.011
 one+0.011
 pupil+0.011
Neg logit contributions
 agrees-0.049
 cleans-0.046
ours-0.045
 learns-0.044
 cooperate-0.042
Token He
Token ID679
Feature activation+5.53e-03

Pos logit contributions
 all+0.004
 one+0.004
 in+0.004
 just+0.004
 t+0.003
Neg logit contributions
 agrees-0.010
ours-0.009
 cleans-0.009
 wasteful-0.009
 learns-0.009
Token had
Token ID550
Feature activation+8.49e-03

Pos logit contributions
 stub+0.005
 all+0.005
 year+0.005
 pupil+0.005
-+0.005
Neg logit contributions
 agrees-0.013
 cleans-0.013
ours-0.012
 learns-0.012
 wasteful-0.011
Token a
Token ID257
Feature activation+9.14e-03

Pos logit contributions
 pupil+0.007
 stub+0.007
-+0.007
 year+0.007
 all+0.007
Neg logit contributions
 agrees-0.021
 cleans-0.020
 learns-0.019
ours-0.018
 safer-0.018
Token man
Token ID582
Feature activation+9.37e-03

Pos logit contributions
 all+0.008
 stub+0.008
 ever+0.008
-+0.008
 year+0.008
Neg logit contributions
 agrees-0.021
 cleans-0.020
ours-0.019
 safer-0.019
 wasteful-0.019
Tokene
Token ID68
Feature activation+4.29e-02

Pos logit contributions
 stub+0.010
 all+0.009
 nowhere+0.008
 pupil+0.008
 year+0.008
Neg logit contributions
 agrees-0.021
 cleans-0.020
ours-0.019
 learns-0.019
 cooperate-0.018
Token and
Token ID290
Feature activation+4.47e-03

Pos logit contributions
 stub+0.051
 pupil+0.048
 nowhere+0.047
 3+0.046
 ever+0.045
Neg logit contributions
 agrees-0.087
 cleans-0.082
 learns-0.077
ours-0.075
ows-0.073
Token sharp
Token ID7786
Feature activation+1.38e-02

Pos logit contributions
 all+0.003
 stub+0.003
-+0.003
 year+0.003
 one+0.003
Neg logit contributions
 agrees-0.012
 cleans-0.011
ours-0.011
 learns-0.011
 wasteful-0.011
Token teeth
Token ID9941
Feature activation+1.92e-02

Pos logit contributions
 all+0.024
 stub+0.024
 nowhere+0.023
 year+0.023
 ever+0.023
Neg logit contributions
 agrees-0.023
 cleans-0.021
ours-0.019
 learns-0.019
ws-0.017
Token.
Token ID13
Feature activation+6.41e-03

Pos logit contributions
 stub+0.015
 pupil+0.013
 year+0.012
 all+0.012
 nowhere+0.011
Neg logit contributions
 agrees-0.050
 cleans-0.047
ours-0.046
 learns-0.045
 safer-0.043
Token He
Token ID679
Feature activation+6.05e-03

Pos logit contributions
 all+0.005
 one+0.005
 just+0.005
 in+0.005
 stub+0.005
Neg logit contributions
 agrees-0.016
 cleans-0.015
ours-0.015
 wasteful-0.015
 cooperate-0.015
Token He
Token ID679
Feature activation+7.28e-03

Pos logit contributions
 agrees+0.012
 cleans+0.011
ours+0.010
 learns+0.010
 wasteful+0.009
Neg logit contributions
 stub-0.008
--0.008
 all-0.008
 ever-0.008
 just-0.008
Token had
Token ID550
Feature activation+8.17e-03

Pos logit contributions
dream+0.009
 stub+0.009
 pupil+0.008
 one+0.008
 3+0.008
Neg logit contributions
 agrees-0.013
 learns-0.013
 cleans-0.013
 becomes-0.012
 cooperate-0.012
Token a
Token ID257
Feature activation+9.46e-03

Pos logit contributions
-+0.009
 year+0.009
 pupil+0.009
lf+0.009
 stub+0.009
Neg logit contributions
 agrees-0.016
 cleans-0.016
 learns-0.016
 safer-0.015
 wasteful-0.014
Token thick
Token ID6546
Feature activation+1.07e-02

Pos logit contributions
 stub+0.009
 ever+0.009
 nowhere+0.009
 all+0.009
t+0.008
Neg logit contributions
 agrees-0.020
 cleans-0.019
 safer-0.019
 wasteful-0.019
 learns-0.019
Token man
Token ID582
Feature activation+1.09e-02

Pos logit contributions
 ever+0.014
 stub+0.014
t+0.013
 nowhere+0.013
T+0.013
Neg logit contributions
 agrees-0.020
 cleans-0.019
 learns-0.018
ours-0.017
 cooperate-0.016
Tokene
Token ID68
Feature activation+4.28e-02

Pos logit contributions
 stub+0.012
 all+0.011
 nowhere+0.011
 barely+0.010
 t+0.010
Neg logit contributions
 agrees-0.022
 cleans-0.022
ours-0.021
 learns-0.020
 safer-0.020
Token and
Token ID290
Feature activation+7.41e-03

Pos logit contributions
 stub+0.052
 nowhere+0.050
 ever+0.048
 visitor+0.047
 3+0.047
Neg logit contributions
 agrees-0.071
 cleans-0.066
 learns-0.065
ows-0.062
girls-0.058
Token he
Token ID339
Feature activation+6.67e-03

Pos logit contributions
 stub+0.007
 year+0.007
-+0.006
 nowhere+0.006
 all+0.006
Neg logit contributions
 agrees-0.017
 cleans-0.016
 learns-0.015
ours-0.015
 wasteful-0.015
Token loved
Token ID6151
Feature activation+2.66e-03

Pos logit contributions
 stub+0.007
dream+0.006
 one+0.006
 pupil+0.006
 Halloween+0.006
Neg logit contributions
 agrees-0.014
 cleans-0.014
 learns-0.014
 wasteful-0.013
ours-0.013
Token to
Token ID284
Feature activation-3.16e-03

Pos logit contributions
 stub+0.003
 pupil+0.003
 year+0.002
-+0.002
dream+0.002
Neg logit contributions
 agrees-0.006
 cleans-0.005
 learns-0.005
ours-0.005
 cooperate-0.005
Token roar
Token ID34532
Feature activation+9.87e-03

Pos logit contributions
 agrees+0.007
 cleans+0.006
ours+0.006
 learns+0.006
 cooperate+0.006
Neg logit contributions
 stub-0.004
 all-0.004
 nowhere-0.004
--0.004
 year-0.004
Token lion
Token ID18744
Feature activation+1.90e-02

Pos logit contributions
 agrees+0.026
 cleans+0.024
ours+0.023
 learns+0.022
 safer+0.022
Neg logit contributions
 stub-0.015
 ever-0.013
 all-0.013
t-0.013
 nowhere-0.013
Token with
Token ID351
Feature activation+1.77e-02

Pos logit contributions
 stub+0.016
 all+0.013
 year+0.013
 nowhere+0.012
 one+0.012
Neg logit contributions
 agrees-0.048
 cleans-0.045
ours-0.044
 learns-0.043
 safer-0.041
Token a
Token ID257
Feature activation+7.80e-03

Pos logit contributions
 stub+0.020
 year+0.019
-+0.018
 pupil+0.018
lf+0.018
Neg logit contributions
 agrees-0.037
 cleans-0.035
 learns-0.032
 yeast-0.032
ours-0.032
Token fluffy
Token ID39145
Feature activation+1.78e-02

Pos logit contributions
 ever+0.007
 all+0.007
 nowhere+0.007
 year+0.007
 stub+0.007
Neg logit contributions
 agrees-0.018
 cleans-0.017
 learns-0.016
ours-0.016
 safer-0.016
Token man
Token ID582
Feature activation+6.91e-03

Pos logit contributions
 ever+0.027
t+0.026
 nowhere+0.026
 ad+0.026
 year+0.025
Neg logit contributions
 agrees-0.026
 cleans-0.024
 learns-0.021
 nods-0.021
 mouths-0.021
Tokene
Token ID68
Feature activation+4.25e-02

Pos logit contributions
 stub+0.006
 all+0.006
 year+0.005
 nowhere+0.005
 pupil+0.005
Neg logit contributions
 agrees-0.016
 cleans-0.016
ours-0.015
 learns-0.014
 supervision-0.014
Token.
Token ID13
Feature activation+2.55e-04

Pos logit contributions
 pupil+0.048
 stub+0.047
 nowhere+0.046
 3+0.045
 ever+0.045
Neg logit contributions
 agrees-0.082
 cleans-0.080
 learns-0.072
ours-0.072
ows-0.070
Token She
Token ID1375
Feature activation-2.55e-03

Pos logit contributions
 all+0.000
 one+0.000
 just+0.000
 in+0.000
 stub+0.000
Neg logit contributions
 agrees-0.001
 cleans-0.001
ours-0.001
 wasteful-0.001
 cooperate-0.001
Token saw
Token ID2497
Feature activation-1.15e-02

Pos logit contributions
 agrees+0.006
 cleans+0.005
 learns+0.005
ours+0.005
 cooperate+0.005
Neg logit contributions
 stub-0.003
 year-0.002
 all-0.002
--0.002
 pupil-0.002
Token a
Token ID257
Feature activation+2.71e-04

Pos logit contributions
 agrees+0.026
 cleans+0.024
ours+0.024
 learns+0.023
 wasteful+0.022
Neg logit contributions
 stub-0.013
 all-0.012
--0.012
 year-0.012
 ever-0.012
Token tall
Token ID7331
Feature activation-8.11e-03

Pos logit contributions
 all+0.000
 stub+0.000
 ever+0.000
 nowhere+0.000
 in+0.000
Neg logit contributions
 agrees-0.001
 cleans-0.001
ours-0.001
 wasteful-0.001
 safer-0.001
Token horse
Token ID8223
Feature activation+1.25e-02

Pos logit contributions
 agrees+0.003
 cleans+0.003
ours+0.003
 wasteful+0.003
 learns+0.003
Neg logit contributions
 stub-0.002
 all-0.002
 ever-0.002
 just-0.001
--0.001
Token with
Token ID351
Feature activation+2.04e-02

Pos logit contributions
 stub+0.009
 all+0.007
 year+0.007
 one+0.007
 nowhere+0.007
Neg logit contributions
 agrees-0.033
 cleans-0.031
ours-0.031
 learns-0.029
 cooperate-0.029
Token a
Token ID257
Feature activation+9.17e-03

Pos logit contributions
 stub+0.022
 year+0.021
-+0.020
 pupil+0.019
 ever+0.018
Neg logit contributions
 agrees-0.046
 cleans-0.044
ours-0.042
 yeast-0.040
 learns-0.040
Token long
Token ID890
Feature activation+1.44e-02

Pos logit contributions
 all+0.009
 ever+0.009
 stub+0.008
 year+0.008
-+0.008
Neg logit contributions
 agrees-0.021
 cleans-0.020
ours-0.020
 learns-0.019
 wasteful-0.019
Token man
Token ID582
Feature activation+1.11e-02

Pos logit contributions
 ever+0.018
 all+0.016
 pupil+0.016
 stub+0.015
 nowhere+0.015
Neg logit contributions
 agrees-0.030
 cleans-0.029
ours-0.026
 learns-0.026
 yeast-0.025
Tokene
Token ID68
Feature activation+4.24e-02

Pos logit contributions
 all+0.010
 stub+0.010
 year+0.009
 nowhere+0.009
 barely+0.008
Neg logit contributions
 agrees-0.026
 cleans-0.025
ours-0.025
 learns-0.023
 supervision-0.022
Token and
Token ID290
Feature activation+3.23e-03

Pos logit contributions
 pupil+0.042
 stub+0.042
 year+0.041
 nowhere+0.040
 ever+0.040
Neg logit contributions
 cleans-0.088
 agrees-0.087
ours-0.081
 learns-0.077
ows-0.074
Token a
Token ID257
Feature activation+6.03e-03

Pos logit contributions
 all+0.002
 stub+0.002
 year+0.002
-+0.002
 ever+0.002
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.008
 learns-0.008
 wasteful-0.007
Token tail
Token ID7894
Feature activation+2.06e-02

Pos logit contributions
 all+0.006
 ever+0.006
 stub+0.006
 year+0.006
-+0.006
Neg logit contributions
 agrees-0.014
 cleans-0.013
ours-0.013
 learns-0.012
 wasteful-0.012
Token.
Token ID13
Feature activation-3.89e-03

Pos logit contributions
 stub+0.015
 year+0.015
 pupil+0.015
 ever+0.014
 nowhere+0.014
Neg logit contributions
 agrees-0.050
 cleans-0.048
ours-0.046
 learns-0.044
 safer-0.043
Token It
Token ID632
Feature activation+1.21e-02

Pos logit contributions
 agrees+0.010
 cooperate+0.009
 cleans+0.009
 wasteful+0.009
ours+0.009
Neg logit contributions
 one-0.004
 all-0.004
 in-0.003
 stub-0.003
 just-0.003
Token yellow
Token ID7872
Feature activation+3.23e-03

Pos logit contributions
 agrees+0.010
 cleans+0.009
ours+0.009
 learns+0.009
 wasteful+0.009
Neg logit contributions
 all-0.004
 stub-0.004
 one-0.003
--0.003
 in-0.003
Token and
Token ID290
Feature activation-4.46e-04

Pos logit contributions
 stub+0.003
 all+0.002
 nowhere+0.002
 year+0.002
-+0.002
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.008
 learns-0.007
 cooperate-0.007
Token has
Token ID468
Feature activation+1.79e-02

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 learns+0.001
 wasteful+0.001
Neg logit contributions
 all-0.001
 stub-0.001
--0.000
 one-0.000
 ever-0.000
Token a
Token ID257
Feature activation+7.38e-03

Pos logit contributions
 stub+0.019
-+0.018
 all+0.018
 year+0.017
 pupil+0.017
Neg logit contributions
 agrees-0.041
 cleans-0.038
ours-0.037
 learns-0.036
 wasteful-0.035
Token man
Token ID582
Feature activation+6.06e-03

Pos logit contributions
 nowhere+0.008
 all+0.008
 stub+0.008
 year+0.007
 ever+0.007
Neg logit contributions
 agrees-0.016
 cleans-0.014
 learns-0.014
ours-0.014
 safer-0.014
Tokene
Token ID68
Feature activation+4.24e-02

Pos logit contributions
 stub+0.006
 all+0.006
 nowhere+0.005
 year+0.005
 t+0.005
Neg logit contributions
 agrees-0.014
 cleans-0.013
ours-0.013
 learns-0.013
 wasteful-0.013
Token.
Token ID13
Feature activation+1.74e-03

Pos logit contributions
 stub+0.053
 nowhere+0.052
 pupil+0.047
 all+0.044
 ever+0.044
Neg logit contributions
 agrees-0.088
 cleans-0.079
ours-0.077
 learns-0.077
 becomes-0.070
Token It
Token ID632
Feature activation+1.28e-02

Pos logit contributions
 all+0.003
 stub+0.003
 one+0.003
-+0.003
 in+0.002
Neg logit contributions
 agrees-0.003
 cleans-0.003
ours-0.003
 learns-0.003
 wasteful-0.003
Token ro
Token ID686
Feature activation+3.87e-03

Pos logit contributions
 all+0.023
 just+0.022
 one+0.022
 in+0.022
-+0.021
Neg logit contributions
 agrees-0.021
ours-0.019
 cleans-0.019
 wasteful-0.018
 cooperate-0.018
Tokenars
Token ID945
Feature activation+1.49e-02

Pos logit contributions
 all+0.007
 stub+0.007
 one+0.007
-+0.007
 t+0.007
Neg logit contributions
 agrees-0.006
 cleans-0.005
ours-0.005
 learns-0.005
 wasteful-0.005
Token like
Token ID588
Feature activation+3.26e-03

Pos logit contributions
 stub+0.009
 all+0.009
-+0.008
 one+0.008
 year+0.008
Neg logit contributions
 agrees-0.041
 cleans-0.038
ours-0.038
 learns-0.036
 wasteful-0.036
Token horse
Token ID8223
Feature activation+9.98e-03

Pos logit contributions
 agrees+0.002
 cleans+0.002
ours+0.002
 learns+0.001
 wasteful+0.001
Neg logit contributions
 stub-0.001
 all-0.001
--0.001
 ever-0.001
 one-0.001
Token with
Token ID351
Feature activation+2.17e-02

Pos logit contributions
 stub+0.006
 all+0.005
 year+0.005
 nowhere+0.005
 ever+0.004
Neg logit contributions
 agrees-0.028
 cleans-0.026
ours-0.026
 learns-0.025
 cooperate-0.025
Token a
Token ID257
Feature activation+6.80e-03

Pos logit contributions
 stub+0.024
 year+0.024
-+0.022
 pupil+0.021
t+0.020
Neg logit contributions
 agrees-0.046
 cleans-0.045
ours-0.043
 yeast-0.043
 learns-0.041
Token long
Token ID890
Feature activation+1.28e-02

Pos logit contributions
 all+0.007
 stub+0.006
 ever+0.006
-+0.006
 year+0.006
Neg logit contributions
 agrees-0.016
 cleans-0.015
ours-0.015
 learns-0.014
 wasteful-0.014
Token man
Token ID582
Feature activation+7.57e-03

Pos logit contributions
 ever+0.016
 all+0.015
 pupil+0.014
 ad+0.014
 nowhere+0.014
Neg logit contributions
 agrees-0.025
 cleans-0.024
 learns-0.022
ours-0.022
 yeast-0.022
Tokene
Token ID68
Feature activation+4.22e-02

Pos logit contributions
 stub+0.006
 all+0.006
 year+0.005
 nowhere+0.005
 t+0.005
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.018
 learns-0.017
 cooperate-0.016
Token and
Token ID290
Feature activation+1.89e-03

Pos logit contributions
 stub+0.047
 nowhere+0.044
 pupil+0.042
 year+0.042
 ever+0.040
Neg logit contributions
 agrees-0.090
 cleans-0.083
 learns-0.080
ours-0.080
 becomes-0.074
Token a
Token ID257
Feature activation+3.95e-03

Pos logit contributions
 all+0.001
 stub+0.001
-+0.001
 year+0.001
 ever+0.001
Neg logit contributions
 agrees-0.005
 cleans-0.005
ours-0.005
 learns-0.004
 wasteful-0.004
Token tail
Token ID7894
Feature activation+1.89e-02

Pos logit contributions
 all+0.004
 stub+0.004
-+0.004
 ever+0.004
 one+0.004
Neg logit contributions
 agrees-0.009
 cleans-0.009
ours-0.009
 learns-0.008
 wasteful-0.008
Token.
Token ID13
Feature activation+1.89e-03

Pos logit contributions
 stub+0.013
 year+0.011
 all+0.011
 pupil+0.011
 nowhere+0.011
Neg logit contributions
 agrees-0.050
 cleans-0.047
ours-0.046
 learns-0.045
 cooperate-0.042
Token And
Token ID843
Feature activation+5.20e-03

Pos logit contributions
 all+0.002
 one+0.002
 in+0.002
 just+0.002
 stub+0.002
Neg logit contributions
 agrees-0.005
ours-0.004
 cleans-0.004
 wasteful-0.004
 cooperate-0.004
Token suddenly
Token ID6451
Feature activation+5.67e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
 learns+0.001
ours+0.001
 wasteful+0.001
Neg logit contributions
 year-0.001
 Halloween-0.001
 ad-0.001
 all-0.001
 ever-0.001
Token heard
Token ID2982
Feature activation+5.11e-03

Pos logit contributions
 year+0.006
 ever+0.006
 ad+0.006
 Halloween+0.006
 all+0.006
Neg logit contributions
 agrees-0.012
 cleans-0.011
 learns-0.011
ours-0.010
 wasteful-0.010
Token a
Token ID257
Feature activation+1.02e-02

Pos logit contributions
 stub+0.006
 miner+0.005
 pupil+0.005
 ever+0.005
amb+0.005
Neg logit contributions
 agrees-0.011
 cleans-0.010
 learns-0.009
 safer-0.009
ours-0.009
Token big
Token ID1263
Feature activation-8.54e-03

Pos logit contributions
 all+0.008
 stub+0.008
 one+0.007
-+0.007
 ever+0.007
Neg logit contributions
 agrees-0.027
 cleans-0.025
ours-0.025
 learns-0.024
 wasteful-0.024
Token sne
Token ID10505
Feature activation+1.55e-02

Pos logit contributions
 agrees+0.019
 cleans+0.018
ours+0.017
 wasteful+0.017
 learns+0.017
Neg logit contributions
 stub-0.010
 all-0.009
 ever-0.009
 one-0.009
 just-0.008
Tokene
Token ID68
Feature activation+4.16e-02

Pos logit contributions
 all+0.019
 stub+0.018
 t+0.018
 nowhere+0.018
 one+0.018
Neg logit contributions
 agrees-0.034
 cleans-0.031
 learns-0.029
 cooperate-0.028
ws-0.027
Tokenze
Token ID2736
Feature activation+2.06e-02

Pos logit contributions
 nowhere+0.051
 stub+0.051
 recently+0.045
 pupil+0.045
 3+0.044
Neg logit contributions
 agrees-0.080
 cleans-0.069
 learns-0.064
ows-0.062
 finishes-0.061
Token.
Token ID13
Feature activation+9.28e-03

Pos logit contributions
 stub+0.021
 nowhere+0.020
 Halloween+0.019
 pupil+0.018
 one+0.017
Neg logit contributions
 agrees-0.046
 cleans-0.042
ours-0.038
 finishes-0.038
 learns-0.037
Token She
Token ID1375
Feature activation+2.66e-03

Pos logit contributions
 all+0.011
 stub+0.010
 one+0.010
-+0.010
 in+0.010
Neg logit contributions
 agrees-0.021
 cleans-0.019
ours-0.019
 wasteful-0.018
 learns-0.018
Token opened
Token ID4721
Feature activation+6.00e-03

Pos logit contributions
 stub+0.003
 all+0.003
 one+0.003
 year+0.003
 pupil+0.003
Neg logit contributions
 agrees-0.006
 cleans-0.005
ours-0.005
 learns-0.005
 finishes-0.005
Token the
Token ID262
Feature activation-3.20e-03

Pos logit contributions
 stub+0.008
 year+0.007
 pupil+0.007
 ever+0.007
dream+0.006
Neg logit contributions
 agrees-0.013
 cleans-0.012
ours-0.011
 learns-0.010
 safer-0.010
Token It
Token ID632
Feature activation+8.60e-03

Pos logit contributions
 all+0.002
 stub+0.002
 one+0.002
 in+0.002
 t+0.002
Neg logit contributions
 agrees-0.005
 cleans-0.004
ours-0.004
 learns-0.004
 cooperate-0.004
Token has
Token ID468
Feature activation+1.51e-02

Pos logit contributions
 all+0.016
 one+0.016
 t+0.015
 just+0.015
 in+0.015
Neg logit contributions
 agrees-0.013
ours-0.012
 cleans-0.012
 wasteful-0.012
 cooperate-0.012
Token a
Token ID257
Feature activation+9.69e-03

Pos logit contributions
 stub+0.015
-+0.014
 year+0.014
 pupil+0.014
 all+0.013
Neg logit contributions
 agrees-0.035
 cleans-0.033
ours-0.031
 learns-0.031
 yeast-0.031
Token long
Token ID890
Feature activation+7.12e-03

Pos logit contributions
 ever+0.009
 all+0.009
 nowhere+0.009
 stub+0.008
 year+0.008
Neg logit contributions
 agrees-0.022
 cleans-0.021
ours-0.021
 safer-0.019
 learns-0.019
Token man
Token ID582
Feature activation+7.45e-03

Pos logit contributions
 ever+0.009
 all+0.008
 nowhere+0.008
 stub+0.008
 pupil+0.008
Neg logit contributions
 agrees-0.015
 cleans-0.014
ours-0.013
 learns-0.013
 yeast-0.012
Tokene
Token ID68
Feature activation+4.16e-02

Pos logit contributions
 all+0.007
 stub+0.007
 nowhere+0.006
 year+0.006
 barely+0.006
Neg logit contributions
 agrees-0.017
 cleans-0.016
ours-0.016
 learns-0.015
 supervision-0.015
Token and
Token ID290
Feature activation+1.51e-03

Pos logit contributions
 stub+0.046
 nowhere+0.046
 ever+0.044
 pupil+0.044
 year+0.043
Neg logit contributions
 agrees-0.086
 cleans-0.085
ours-0.079
 learns-0.077
ows-0.073
Token a
Token ID257
Feature activation+4.81e-03

Pos logit contributions
 all+0.001
 stub+0.001
 one+0.001
-+0.001
 just+0.001
Neg logit contributions
 agrees-0.004
ours-0.004
 cleans-0.004
 wasteful-0.004
 learns-0.004
Token long
Token ID890
Feature activation+4.71e-03

Pos logit contributions
 all+0.005
 ever+0.005
 year+0.004
 stub+0.004
 nowhere+0.004
Neg logit contributions
 agrees-0.011
 cleans-0.011
ours-0.010
 learns-0.010
 wasteful-0.010
Token tail
Token ID7894
Feature activation+1.57e-02

Pos logit contributions
 ever+0.006
 all+0.005
 nowhere+0.005
 stub+0.005
 pupil+0.005
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 learns-0.008
 yeast-0.008
Token.
Token ID13
Feature activation-3.40e-04

Pos logit contributions
 stub+0.010
 year+0.010
 all+0.009
 nowhere+0.009
 ever+0.009
Neg logit contributions
 agrees-0.042
 cleans-0.039
ours-0.039
 learns-0.037
 yeast-0.036
Token lion
Token ID18744
Feature activation+1.92e-02

Pos logit contributions
 agrees+0.021
 cleans+0.020
ours+0.019
 wasteful+0.018
 learns+0.018
Neg logit contributions
 stub-0.011
 all-0.011
 ever-0.011
 just-0.010
 year-0.010
Token with
Token ID351
Feature activation+2.13e-02

Pos logit contributions
 stub+0.014
 all+0.013
 year+0.012
 one+0.012
 pupil+0.011
Neg logit contributions
 agrees-0.050
 cleans-0.047
ours-0.046
 learns-0.044
 cooperate-0.043
Token a
Token ID257
Feature activation+1.09e-02

Pos logit contributions
 stub+0.026
 year+0.025
-+0.023
 pupil+0.023
t+0.022
Neg logit contributions
 agrees-0.042
 cleans-0.042
ours-0.038
 vinegar-0.037
 learns-0.037
Token golden
Token ID10861
Feature activation+1.33e-02

Pos logit contributions
 all+0.011
 ever+0.010
 year+0.010
 stub+0.010
 nowhere+0.010
Neg logit contributions
 agrees-0.024
 cleans-0.023
ours-0.022
 learns-0.021
 wasteful-0.021
Token man
Token ID582
Feature activation+5.35e-03

Pos logit contributions
 ever+0.018
 stub+0.017
 nowhere+0.017
 all+0.016
t+0.016
Neg logit contributions
 agrees-0.024
 cleans-0.024
ours-0.023
 yeast-0.021
 vinegar-0.021
Tokene
Token ID68
Feature activation+4.16e-02

Pos logit contributions
 stub+0.005
 all+0.005
 year+0.004
 pupil+0.004
 one+0.004
Neg logit contributions
 agrees-0.013
 cleans-0.012
ours-0.012
 learns-0.011
 cooperate-0.011
Token.
Token ID13
Feature activation+8.78e-03

Pos logit contributions
 stub+0.046
 pupil+0.043
 nowhere+0.043
 year+0.041
 quite+0.041
Neg logit contributions
 agrees-0.084
 cleans-0.081
ours-0.075
 learns-0.073
 finishes-0.068
Token He
Token ID679
Feature activation+6.81e-03

Pos logit contributions
 all+0.008
 one+0.007
 stub+0.007
 in+0.007
 just+0.007
Neg logit contributions
 agrees-0.022
 cleans-0.021
ours-0.021
 wasteful-0.020
 cooperate-0.020
Token looked
Token ID3114
Feature activation-7.04e-03

Pos logit contributions
 stub+0.007
 all+0.007
 year+0.007
dream+0.006
 Halloween+0.006
Neg logit contributions
 agrees-0.015
 cleans-0.014
 learns-0.013
ours-0.013
 wasteful-0.013
Token at
Token ID379
Feature activation+8.22e-03

Pos logit contributions
 agrees+0.016
 cleans+0.015
ours+0.015
 wasteful+0.014
 learns+0.014
Neg logit contributions
 stub-0.008
 year-0.007
 ever-0.007
 all-0.007
lf-0.007
Token her
Token ID607
Feature activation-2.61e-04

Pos logit contributions
 stub+0.010
-+0.009
 year+0.009
dream+0.009
 pupil+0.009
Neg logit contributions
 agrees-0.017
 cleans-0.017
ours-0.016
 learns-0.015
 wasteful-0.015
TokenA
Token ID32
Feature activation+9.10e-03

Pos logit contributions
 all+0.005
 stub+0.005
-+0.005
 one+0.005
 in+0.005
Neg logit contributions
 agrees-0.008
 cleans-0.007
ours-0.007
 learns-0.007
 wasteful-0.007
Token few
Token ID1178
Feature activation-3.44e-04

Pos logit contributions
 stub+0.011
 all+0.011
 nowhere+0.010
 in+0.009
 just+0.009
Neg logit contributions
 agrees-0.019
 cleans-0.018
 learns-0.017
 cooperate-0.017
 safer-0.017
Token days
Token ID1528
Feature activation-4.76e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 learns+0.001
 wasteful+0.001
Neg logit contributions
 all-0.000
 stub-0.000
--0.000
 one-0.000
 in-0.000
Token later
Token ID1568
Feature activation+1.87e-02

Pos logit contributions
 agrees+0.009
 cleans+0.008
ours+0.008
 learns+0.008
 wasteful+0.008
Neg logit contributions
 stub-0.007
 all-0.007
--0.006
 one-0.006
 t-0.006
Token,
Token ID11
Feature activation+9.33e-03

Pos logit contributions
 stub+0.021
 nowhere+0.017
 armour+0.017
 t+0.017
 all+0.016
Neg logit contributions
 agrees-0.041
 cleans-0.040
ours-0.037
 learns-0.037
 wasteful-0.035
Token Sab
Token ID9910
Feature activation+4.10e-02

Pos logit contributions
 stub+0.013
 nowhere+0.011
-+0.011
 year+0.011
 pupil+0.010
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.017
 learns-0.017
 safer-0.016
Tokenrina
Token ID22267
Feature activation+3.80e-03

Pos logit contributions
 stub+0.076
 all+0.074
 one+0.072
 t+0.069
 in+0.069
Neg logit contributions
 agrees-0.063
 cleans-0.057
ours-0.056
 learns-0.053
 wasteful-0.047
Token had
Token ID550
Feature activation+5.28e-03

Pos logit contributions
 all+0.004
 stub+0.003
 one+0.003
 year+0.003
 pupil+0.003
Neg logit contributions
 agrees-0.009
 cleans-0.009
ours-0.008
 learns-0.008
 cooperate-0.008
Token to
Token ID284
Feature activation-3.11e-03

Pos logit contributions
 all+0.004
 stub+0.004
 one+0.003
-+0.003
 in+0.003
Neg logit contributions
 agrees-0.014
 cleans-0.013
ours-0.013
 learns-0.013
 wasteful-0.013
Token stay
Token ID2652
Feature activation+1.44e-03

Pos logit contributions
 agrees+0.006
 cleans+0.005
 cooperate+0.005
 learns+0.005
ours+0.005
Neg logit contributions
 stub-0.005
 all-0.004
--0.004
 nowhere-0.004
 year-0.004
Token in
Token ID287
Feature activation+9.83e-03

Pos logit contributions
 stub+0.001
 ever+0.001
t+0.001
 year+0.001
 pupil+0.001
Neg logit contributions
 agrees-0.003
 cleans-0.003
ours-0.003
 safer-0.003
 wasteful-0.003
Token and
Token ID290
Feature activation-1.95e-03

Pos logit contributions
 agrees+0.035
 cleans+0.032
ours+0.032
 learns+0.031
 wasteful+0.031
Neg logit contributions
 stub-0.007
 all-0.006
 one-0.005
--0.005
 ever-0.005
Token had
Token ID550
Feature activation+1.31e-02

Pos logit contributions
 agrees+0.005
 cleans+0.004
ours+0.004
 wasteful+0.004
 safer+0.004
Neg logit contributions
 stub-0.002
 year-0.002
--0.002
 all-0.002
 one-0.002
Token a
Token ID257
Feature activation+7.66e-03

Pos logit contributions
 stub+0.013
-+0.012
 year+0.012
 pupil+0.012
 all+0.012
Neg logit contributions
 agrees-0.030
 cleans-0.029
ours-0.028
 learns-0.027
 wasteful-0.026
Token long
Token ID890
Feature activation+7.48e-03

Pos logit contributions
 ever+0.008
 nowhere+0.007
 all+0.007
 year+0.007
 stub+0.007
Neg logit contributions
 agrees-0.017
 cleans-0.016
ours-0.016
 learns-0.015
 safer-0.015
Token man
Token ID582
Feature activation+9.69e-03

Pos logit contributions
 ever+0.009
 all+0.009
 nowhere+0.009
 pupil+0.008
 stub+0.008
Neg logit contributions
 agrees-0.015
 cleans-0.014
ours-0.014
 learns-0.013
 yeast-0.013
Tokene
Token ID68
Feature activation+4.10e-02

Pos logit contributions
 all+0.010
 stub+0.009
 nowhere+0.009
 year+0.009
 barely+0.009
Neg logit contributions
 agrees-0.021
 cleans-0.020
ours-0.020
 learns-0.019
 supervision-0.019
Token.
Token ID13
Feature activation-2.81e-03

Pos logit contributions
 nowhere+0.041
 pupil+0.041
 stub+0.040
 visitor+0.040
 3+0.039
Neg logit contributions
 agrees-0.084
 cleans-0.084
ours-0.079
 learns-0.078
 becomes-0.074
Token Tom
Token ID4186
Feature activation-1.92e-03

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.006
 learns+0.005
 wasteful+0.005
Neg logit contributions
 all-0.003
 one-0.003
 stub-0.003
--0.003
 just-0.003
Token wanted
Token ID2227
Feature activation+1.43e-03

Pos logit contributions
 agrees+0.005
 cleans+0.005
ours+0.005
 learns+0.004
 wasteful+0.004
Neg logit contributions
 all-0.001
 stub-0.001
 one-0.001
 ever-0.001
 year-0.001
Token to
Token ID284
Feature activation-2.92e-03

Pos logit contributions
 stub+0.001
 all+0.001
-+0.001
 year+0.001
 ever+0.001
Neg logit contributions
 agrees-0.004
 cleans-0.004
ours-0.004
 learns-0.003
 wasteful-0.003
Token ride
Token ID6594
Feature activation-2.97e-04

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.005
 cooperate+0.005
 learns+0.005
Neg logit contributions
 stub-0.004
 all-0.004
 nowhere-0.004
 year-0.004
--0.004
Token let
Token ID1309
Feature activation+6.88e-03

Pos logit contributions
 year+0.004
 stub+0.004
-+0.004
 ever+0.004
 all+0.004
Neg logit contributions
 agrees-0.013
 cleans-0.012
ours-0.012
 wasteful-0.011
 learns-0.011
Token out
Token ID503
Feature activation-1.31e-02

Pos logit contributions
 nowhere+0.008
 stub+0.008
 year+0.007
 pupil+0.007
 miner+0.007
Neg logit contributions
 agrees-0.015
 cleans-0.014
ours-0.013
 safer-0.013
 learns-0.013
Token a
Token ID257
Feature activation+1.68e-02

Pos logit contributions
 agrees+0.036
 cleans+0.034
ours+0.033
 learns+0.032
 wasteful+0.031
Neg logit contributions
 stub-0.009
 all-0.008
--0.008
 ever-0.007
 one-0.007
Token big
Token ID1263
Feature activation+5.46e-03

Pos logit contributions
 all+0.017
 stub+0.016
 one+0.016
-+0.015
 ever+0.015
Neg logit contributions
 agrees-0.040
 cleans-0.037
ours-0.037
 wasteful-0.035
 learns-0.035
Token sne
Token ID10505
Feature activation+1.72e-02

Pos logit contributions
 all+0.008
 stub+0.008
 ever+0.008
 one+0.008
 nowhere+0.008
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 cooperate-0.008
 learns-0.008
Tokene
Token ID68
Feature activation+4.08e-02

Pos logit contributions
 all+0.022
 nowhere+0.021
 stub+0.020
 t+0.019
 pupil+0.019
Neg logit contributions
 agrees-0.038
 cleans-0.035
 learns-0.032
 cooperate-0.032
ours-0.032
Tokenze
Token ID2736
Feature activation+2.03e-02

Pos logit contributions
 nowhere+0.056
 stub+0.048
 3+0.047
 miner+0.047
 all+0.046
Neg logit contributions
 agrees-0.079
 cleans-0.071
ours-0.066
 learns-0.064
ashes-0.062
Token!
Token ID0
Feature activation+2.51e-03

Pos logit contributions
 stub+0.015
 nowhere+0.015
 year+0.014
 pupil+0.013
 all+0.013
Neg logit contributions
 agrees-0.052
 cleans-0.048
ours-0.046
 cooperate-0.044
 learns-0.044
Token It
Token ID632
Feature activation+1.11e-02

Pos logit contributions
 stub+0.003
 all+0.003
-+0.003
 ever+0.003
 one+0.003
Neg logit contributions
 agrees-0.006
 cleans-0.005
ours-0.005
 learns-0.005
 cooperate-0.005
Token felt
Token ID2936
Feature activation-9.75e-03

Pos logit contributions
 pupil+0.011
 year+0.011
dream+0.011
 Halloween+0.011
 stub+0.011
Neg logit contributions
 agrees-0.024
 cleans-0.023
ours-0.021
 becomes-0.021
 safer-0.021
Token so
Token ID523
Feature activation-1.07e-02

Pos logit contributions
 agrees+0.025
ours+0.023
 cleans+0.023
 learns+0.022
 cooperate+0.021
Neg logit contributions
 all-0.009
 stub-0.009
 one-0.008
 just-0.008
 in-0.008
Token with
Token ID351
Feature activation+1.70e-02

Pos logit contributions
 stub+0.005
 all+0.005
 year+0.004
 nowhere+0.004
-+0.004
Neg logit contributions
 agrees-0.018
 cleans-0.017
ours-0.017
 learns-0.016
 cooperate-0.016
Token just
Token ID655
Feature activation+7.40e-03

Pos logit contributions
 stub+0.019
-+0.018
 year+0.017
 pupil+0.017
lf+0.017
Neg logit contributions
 agrees-0.036
 cleans-0.034
 safer-0.033
 learns-0.031
 cooperate-0.031
Token a
Token ID257
Feature activation+1.54e-02

Pos logit contributions
 all+0.006
 stub+0.006
-+0.005
 one+0.005
 year+0.005
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.018
 learns-0.017
 wasteful-0.017
Token tiny
Token ID7009
Feature activation+1.61e-02

Pos logit contributions
 all+0.016
 stub+0.015
-+0.015
 one+0.015
 in+0.014
Neg logit contributions
 agrees-0.036
 cleans-0.034
ours-0.033
 learns-0.032
 wasteful-0.032
Token sne
Token ID10505
Feature activation+2.15e-02

Pos logit contributions
 all+0.015
 stub+0.015
 nowhere+0.014
 ever+0.013
-+0.013
Neg logit contributions
 agrees-0.039
 cleans-0.035
ours-0.034
 learns-0.033
 cooperate-0.033
Tokene
Token ID68
Feature activation+4.07e-02

Pos logit contributions
 all+0.028
 t+0.026
 stub+0.026
 nowhere+0.026
 one+0.025
Neg logit contributions
 agrees-0.046
 cleans-0.042
 learns-0.040
ours-0.039
 cooperate-0.039
Tokenze
Token ID2736
Feature activation+1.83e-02

Pos logit contributions
 nowhere+0.043
 stub+0.042
 all+0.037
 pupil+0.036
 year+0.036
Neg logit contributions
 agrees-0.090
 cleans-0.077
 learns-0.076
ours-0.074
 cooperate-0.072
Token.
Token ID13
Feature activation-2.88e-03

Pos logit contributions
 stub+0.013
 nowhere+0.011
 year+0.011
 all+0.010
 pupil+0.010
Neg logit contributions
 agrees-0.048
 cleans-0.044
ours-0.043
 learns-0.042
 cooperate-0.041
Token He
Token ID679
Feature activation+1.63e-03

Pos logit contributions
 agrees+0.007
 cleans+0.006
ours+0.006
 wasteful+0.006
 learns+0.006
Neg logit contributions
 all-0.003
 stub-0.003
 one-0.003
--0.003
 just-0.003
Token was
Token ID373
Feature activation-1.57e-03

Pos logit contributions
 all+0.001
 stub+0.001
-+0.001
 one+0.001
 year+0.001
Neg logit contributions
 agrees-0.004
 cleans-0.004
ours-0.004
 learns-0.004
 wasteful-0.004
Token so
Token ID523
Feature activation-7.60e-03

Pos logit contributions
 agrees+0.004
 cleans+0.004
ours+0.004
 learns+0.003
 wasteful+0.003
Neg logit contributions
 stub-0.001
 all-0.001
--0.001
 year-0.001
 one-0.001
Token horse
Token ID8223
Feature activation+8.79e-03

Pos logit contributions
 stub+0.002
 nowhere+0.002
 ever+0.002
 barely+0.002
t+0.002
Neg logit contributions
 agrees-0.003
ours-0.003
 cleans-0.003
 yeast-0.003
 salsa-0.002
Token with
Token ID351
Feature activation+1.47e-02

Pos logit contributions
 stub+0.008
 nowhere+0.006
 all+0.006
 year+0.006
 Halloween+0.006
Neg logit contributions
 agrees-0.021
 cleans-0.020
ours-0.019
 learns-0.019
 safer-0.019
Token a
Token ID257
Feature activation+5.62e-03

Pos logit contributions
 stub+0.016
 year+0.015
 pupil+0.015
-+0.015
lf+0.015
Neg logit contributions
 agrees-0.031
 cleans-0.030
 learns-0.028
ours-0.028
 yeast-0.027
Token long
Token ID890
Feature activation+8.45e-03

Pos logit contributions
 ever+0.005
 all+0.005
 nowhere+0.005
 year+0.005
 stub+0.005
Neg logit contributions
 agrees-0.013
 cleans-0.013
ours-0.012
 learns-0.012
 wasteful-0.012
Token man
Token ID582
Feature activation+8.04e-03

Pos logit contributions
 ever+0.010
 all+0.010
 pupil+0.010
 nowhere+0.009
 stub+0.009
Neg logit contributions
 agrees-0.017
 cleans-0.016
ours-0.015
 learns-0.015
 yeast-0.014
Tokene
Token ID68
Feature activation+4.07e-02

Pos logit contributions
 all+0.008
 stub+0.007
 nowhere+0.007
 year+0.007
 barely+0.006
Neg logit contributions
 agrees-0.018
 cleans-0.017
ours-0.017
 learns-0.016
 supervision-0.016
Token and
Token ID290
Feature activation+3.39e-03

Pos logit contributions
 pupil+0.044
 nowhere+0.042
 stub+0.042
 ever+0.041
 visitor+0.041
Neg logit contributions
 agrees-0.081
 cleans-0.081
ours-0.072
 learns-0.072
 becomes-0.066
Token a
Token ID257
Feature activation+3.50e-03

Pos logit contributions
 all+0.003
 year+0.003
 stub+0.003
-+0.002
 ever+0.002
Neg logit contributions
 agrees-0.009
 cleans-0.008
ours-0.008
 learns-0.008
 wasteful-0.008
Token tail
Token ID7894
Feature activation+1.79e-02

Pos logit contributions
 all+0.003
 ever+0.003
 year+0.003
 stub+0.003
 nowhere+0.003
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.008
 wasteful-0.007
 learns-0.007
Token.
Token ID13
Feature activation-5.16e-04

Pos logit contributions
 stub+0.013
 year+0.013
 pupil+0.013
 ever+0.012
 nowhere+0.012
Neg logit contributions
 agrees-0.044
 cleans-0.042
ours-0.041
 learns-0.039
 yeast-0.037
Token L
Token ID406
Feature activation+5.87e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 wasteful+0.001
 learns+0.001
Neg logit contributions
 all-0.001
 stub-0.001
 one-0.001
--0.001
 in-0.001
Token you
Token ID345
Feature activation-1.51e-02

Pos logit contributions
 agrees+0.006
 cleans+0.005
ours+0.005
 learns+0.005
 cooperate+0.005
Neg logit contributions
 stub-0.006
 year-0.006
--0.006
 pupil-0.006
 t-0.006
Token how
Token ID703
Feature activation-1.17e-02

Pos logit contributions
 agrees+0.033
 cleans+0.031
ours+0.031
 wasteful+0.030
 learns+0.029
Neg logit contributions
 all-0.020
 one-0.018
 in-0.018
--0.017
 just-0.017
Token to
Token ID284
Feature activation-2.16e-02

Pos logit contributions
 agrees+0.034
 cleans+0.032
ours+0.031
 learns+0.030
 cooperate+0.030
Neg logit contributions
 all-0.007
 in-0.007
 one-0.006
 ever-0.006
--0.006
Token wash
Token ID13502
Feature activation-1.50e-02

Pos logit contributions
 agrees+0.046
ours+0.042
 cleans+0.042
 learns+0.039
 wasteful+0.039
Neg logit contributions
 all-0.029
 one-0.027
 stub-0.027
 just-0.027
 in-0.027
Token the
Token ID262
Feature activation-1.78e-02

Pos logit contributions
 agrees+0.033
 cleans+0.031
ours+0.030
 learns+0.029
 wasteful+0.029
Neg logit contributions
 all-0.019
 one-0.018
--0.017
 in-0.017
 just-0.017
Token rad
Token ID2511
Feature activation-5.23e-02

Pos logit contributions
 agrees+0.038
 cleans+0.035
ours+0.034
 learns+0.032
 wasteful+0.032
Neg logit contributions
 all-0.024
 one-0.023
 stub-0.022
--0.022
 in-0.022
Tokenishes
Token ID5614
Feature activation-3.39e-02

Pos logit contributions
 agrees+0.074
 cleans+0.067
 learns+0.061
ours+0.061
 cooperate+0.060
Neg logit contributions
 all-0.108
--0.105
 one-0.104
 in-0.103
 just-0.100
Token."
Token ID526
Feature activation-1.04e-04

Pos logit contributions
 agrees+0.090
 cleans+0.086
ours+0.084
 wasteful+0.081
 learns+0.080
Neg logit contributions
 all-0.032
 in-0.029
 one-0.027
--0.026
 just-0.025
Token The
Token ID383
Feature activation-2.59e-03

Pos logit contributions
 agrees+0.000
ours+0.000
 wasteful+0.000
 cleans+0.000
 learns+0.000
Neg logit contributions
 all-0.000
 in-0.000
 one-0.000
 just-0.000
--0.000
Token little
Token ID1310
Feature activation-1.33e-02

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.006
 wasteful+0.006
 learns+0.005
Neg logit contributions
 stub-0.002
 ever-0.002
 all-0.002
--0.002
 just-0.002
Token girl
Token ID2576
Feature activation-3.89e-03

Pos logit contributions
 agrees+0.026
 cleans+0.023
ours+0.023
 learns+0.021
 wasteful+0.021
Neg logit contributions
 stub-0.019
 all-0.019
 ever-0.018
 barely-0.018
--0.017
Token was
Token ID373
Feature activation-4.00e-03

Pos logit contributions
 agrees+0.002
 cleans+0.002
ours+0.002
 learns+0.002
 wasteful+0.002
Neg logit contributions
 stub-0.001
 year-0.001
 all-0.001
 one-0.001
 pupil-0.001
Token able
Token ID1498
Feature activation+2.93e-03

Pos logit contributions
 agrees+0.009
 cleans+0.008
ours+0.008
 wasteful+0.008
 safer+0.008
Neg logit contributions
 stub-0.005
--0.004
 all-0.004
 year-0.004
 pupil-0.004
Token to
Token ID284
Feature activation-1.75e-02

Pos logit contributions
 stub+0.002
 all+0.002
-+0.002
 year+0.002
 one+0.002
Neg logit contributions
 agrees-0.008
 cleans-0.007
ours-0.007
 learns-0.007
 wasteful-0.007
Token wash
Token ID13502
Feature activation-7.18e-03

Pos logit contributions
 agrees+0.038
ours+0.035
 cleans+0.035
 wasteful+0.033
 learns+0.033
Neg logit contributions
 all-0.024
 just-0.022
 one-0.021
 in-0.021
 stub-0.021
Token the
Token ID262
Feature activation-9.66e-03

Pos logit contributions
 agrees+0.017
 cleans+0.016
ours+0.016
 wasteful+0.015
 learns+0.015
Neg logit contributions
 stub-0.007
 ever-0.006
 year-0.006
 all-0.006
--0.006
Token rad
Token ID2511
Feature activation-5.05e-02

Pos logit contributions
 agrees+0.022
 cleans+0.020
ours+0.020
 learns+0.019
 wasteful+0.019
Neg logit contributions
 all-0.012
 one-0.011
--0.011
 in-0.011
 stub-0.011
Tokenishes
Token ID5614
Feature activation-2.58e-02

Pos logit contributions
 agrees+0.071
 cleans+0.064
ours+0.058
 learns+0.058
 wasteful+0.057
Neg logit contributions
 all-0.105
--0.102
 one-0.101
 in-0.099
 just-0.097
Token and
Token ID290
Feature activation-4.70e-03

Pos logit contributions
 agrees+0.076
ours+0.073
 cleans+0.073
 wasteful+0.070
 learns+0.069
Neg logit contributions
 all-0.015
 in-0.013
 one-0.012
 just-0.011
--0.011
Token get
Token ID651
Feature activation+2.99e-03

Pos logit contributions
 agrees+0.011
ours+0.010
 cleans+0.010
 learns+0.009
 wasteful+0.009
Neg logit contributions
 all-0.006
 one-0.005
 just-0.005
 in-0.005
 t-0.005
Token a
Token ID257
Feature activation-1.03e-03

Pos logit contributions
 stub+0.003
 all+0.003
-+0.003
 year+0.003
 one+0.003
Neg logit contributions
 agrees-0.007
 cleans-0.007
ours-0.007
 wasteful-0.006
 learns-0.006
Token drink
Token ID4144
Feature activation-8.72e-03

Pos logit contributions
 agrees+0.002
ours+0.002
 cleans+0.002
 learns+0.002
 wasteful+0.002
Neg logit contributions
 all-0.001
 one-0.001
 stub-0.001
--0.001
 t-0.001
Token they
Token ID484
Feature activation-1.59e-02

Pos logit contributions
 agrees+0.017
 cleans+0.015
ours+0.015
 learns+0.014
 cooperate+0.014
Neg logit contributions
 stub-0.010
--0.008
 nowhere-0.008
 all-0.008
 year-0.008
Token started
Token ID2067
Feature activation-1.02e-02

Pos logit contributions
 agrees+0.039
 cleans+0.036
ours+0.035
 learns+0.034
 cooperate+0.033
Neg logit contributions
 stub-0.015
--0.014
 year-0.014
 one-0.013
 nowhere-0.013
Token to
Token ID284
Feature activation-7.45e-03

Pos logit contributions
 agrees+0.027
 cleans+0.026
ours+0.026
 wasteful+0.024
 learns+0.024
Neg logit contributions
 all-0.008
 stub-0.007
 one-0.007
--0.007
 in-0.007
Token pick
Token ID2298
Feature activation-1.17e-02

Pos logit contributions
 agrees+0.016
ours+0.016
 cleans+0.015
 wasteful+0.015
 learns+0.014
Neg logit contributions
 all-0.010
 one-0.009
 t-0.009
 stub-0.009
 just-0.009
Token the
Token ID262
Feature activation-4.09e-03

Pos logit contributions
 agrees+0.029
 cleans+0.027
ours+0.027
 learns+0.025
 wasteful+0.025
Neg logit contributions
 stub-0.011
 all-0.011
 ever-0.010
 year-0.010
--0.010
Token blo
Token ID24924
Feature activation-4.97e-02

Pos logit contributions
 agrees+0.009
 cleans+0.008
ours+0.008
 wasteful+0.008
 learns+0.007
Neg logit contributions
 all-0.005
 stub-0.005
--0.005
 ever-0.005
 one-0.005
Tokenoms
Token ID3150
Feature activation-1.60e-02

Pos logit contributions
 cleans+0.074
 agrees+0.072
 cooperate+0.069
 wasteful+0.069
 supervision+0.066
Neg logit contributions
 all-0.098
--0.095
 in-0.095
 one-0.091
 around-0.086
Token.
Token ID13
Feature activation-2.97e-03

Pos logit contributions
 agrees+0.045
 cleans+0.042
ours+0.042
 learns+0.040
 wasteful+0.040
Neg logit contributions
 all-0.010
 stub-0.009
 one-0.008
--0.008
 in-0.008
Token (space)
Token ID220
Feature activation-1.18e-02

Pos logit contributions
 agrees+0.006
 cleans+0.005
ours+0.005
 wasteful+0.005
 learns+0.005
Neg logit contributions
 all-0.004
 one-0.004
 stub-0.004
 just-0.004
 in-0.004
Token (newline)
Token ID198
Feature activation+1.72e-03

Pos logit contributions
 agrees+0.032
 cleans+0.030
 wasteful+0.030
ours+0.030
 learns+0.029
Neg logit contributions
 all-0.008
 one-0.007
 in-0.007
 just-0.006
--0.006
Token (newline)
Token ID198
Feature activation+2.95e-03

Pos logit contributions
 all+0.002
 stub+0.001
 one+0.001
-+0.001
 in+0.001
Neg logit contributions
 agrees-0.004
 cleans-0.004
ours-0.004
 wasteful-0.004
 learns-0.004
Token he
Token ID339
Feature activation-6.00e-03

Pos logit contributions
 agrees+0.009
 cleans+0.009
ours+0.009
 wasteful+0.009
 safer+0.009
Neg logit contributions
 stub-0.006
--0.005
 year-0.005
 pupil-0.005
ind-0.005
Token decided
Token ID3066
Feature activation+1.07e-02

Pos logit contributions
 agrees+0.014
 cleans+0.013
ours+0.013
 learns+0.012
 wasteful+0.012
Neg logit contributions
 stub-0.006
 all-0.006
 year-0.006
 ever-0.006
--0.005
Token to
Token ID284
Feature activation-1.46e-02

Pos logit contributions
 year+0.010
 stub+0.010
 all+0.009
 pupil+0.009
 ever+0.009
Neg logit contributions
 agrees-0.024
 cleans-0.023
 learns-0.023
 wasteful-0.023
ours-0.022
Token remove
Token ID4781
Feature activation-7.05e-03

Pos logit contributions
 agrees+0.027
 cleans+0.025
ours+0.024
 cooperate+0.024
 learns+0.023
Neg logit contributions
 stub-0.022
 nowhere-0.021
--0.021
 pupil-0.020
 year-0.020
Token the
Token ID262
Feature activation-1.53e-02

Pos logit contributions
 agrees+0.015
 cleans+0.015
ours+0.015
 wasteful+0.014
 learns+0.014
Neg logit contributions
 stub-0.008
 year-0.008
 pupil-0.007
 ever-0.007
--0.007
Token seaw
Token ID32085
Feature activation-4.80e-02

Pos logit contributions
 agrees+0.027
ours+0.025
 cleans+0.024
 learns+0.023
 wasteful+0.023
Neg logit contributions
 all-0.023
 stub-0.023
 year-0.023
 ever-0.022
--0.022
Tokeneed
Token ID2308
Feature activation-1.52e-02

Pos logit contributions
 agrees+0.071
ours+0.064
 cleans+0.060
 learns+0.057
 cooperate+0.056
Neg logit contributions
 all-0.093
 stub-0.091
 one-0.088
 ever-0.088
 nowhere-0.087
Token from
Token ID422
Feature activation+9.43e-03

Pos logit contributions
 agrees+0.039
 cleans+0.037
ours+0.037
 learns+0.035
 wasteful+0.035
Neg logit contributions
 all-0.012
 stub-0.011
 one-0.011
--0.011
 in-0.011
Token the
Token ID262
Feature activation-1.11e-02

Pos logit contributions
 stub+0.011
 year+0.010
 pupil+0.010
dream+0.010
t+0.010
Neg logit contributions
 cleans-0.019
ours-0.018
 wasteful-0.018
 agrees-0.018
 safer-0.018
Token water
Token ID1660
Feature activation-4.24e-04

Pos logit contributions
 agrees+0.020
ours+0.018
 cleans+0.018
 wasteful+0.017
 learns+0.017
Neg logit contributions
 all-0.017
 stub-0.017
--0.016
 year-0.016
 ever-0.016
Token so
Token ID523
Feature activation+8.44e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 learns+0.001
 wasteful+0.001
Neg logit contributions
 all-0.000
 stub-0.000
--0.000
 one-0.000
 ever-0.000
Token boy
Token ID2933
Feature activation+1.80e-04

Pos logit contributions
 agrees+0.042
 cleans+0.039
ours+0.039
 learns+0.037
 wasteful+0.037
Neg logit contributions
 all-0.018
 stub-0.017
 one-0.017
--0.017
 in-0.017
Token tried
Token ID3088
Feature activation+5.98e-04

Pos logit contributions
 all+0.000
 stub+0.000
-+0.000
 one+0.000
 in+0.000
Neg logit contributions
 agrees-0.000
 cleans-0.000
ours-0.000
 learns-0.000
 wasteful-0.000
Token to
Token ID284
Feature activation-9.49e-03

Pos logit contributions
 all+0.000
 one+0.000
 in+0.000
 just+0.000
-+0.000
Neg logit contributions
 agrees-0.002
ours-0.002
 cleans-0.002
 wasteful-0.002
 learns-0.002
Token make
Token ID787
Feature activation-9.71e-03

Pos logit contributions
 agrees+0.020
 cleans+0.019
 cooperate+0.018
ours+0.017
 learns+0.017
Neg logit contributions
 stub-0.012
 all-0.012
--0.011
 year-0.011
 one-0.011
Token the
Token ID262
Feature activation-1.06e-02

Pos logit contributions
 agrees+0.025
 cleans+0.023
ours+0.023
 learns+0.022
 wasteful+0.022
Neg logit contributions
 stub-0.008
 all-0.008
--0.007
 one-0.007
 ever-0.007
Token rad
Token ID2511
Feature activation-4.79e-02

Pos logit contributions
 agrees+0.025
 cleans+0.023
ours+0.023
 learns+0.022
 wasteful+0.022
Neg logit contributions
 all-0.012
 one-0.011
 stub-0.011
 in-0.010
--0.010
Tokenish
Token ID680
Feature activation-2.84e-03

Pos logit contributions
 agrees+0.081
 cleans+0.073
ours+0.071
 learns+0.067
 wasteful+0.066
Neg logit contributions
 all-0.084
--0.080
 one-0.080
 stub-0.079
 in-0.078
Token happy
Token ID3772
Feature activation+1.57e-03

Pos logit contributions
 agrees+0.007
ours+0.007
 cleans+0.007
 wasteful+0.007
 learns+0.007
Neg logit contributions
 all-0.003
 in-0.003
 one-0.003
 just-0.002
--0.002
Token.
Token ID13
Feature activation+1.01e-03

Pos logit contributions
 all+0.001
 in+0.001
 one+0.001
 just+0.001
-+0.001
Neg logit contributions
 agrees-0.005
ours-0.005
 cleans-0.005
 wasteful-0.004
 learns-0.004
Token He
Token ID679
Feature activation+1.35e-03

Pos logit contributions
 all+0.001
 one+0.001
 with+0.001
 in+0.001
 just+0.001
Neg logit contributions
 agrees-0.003
 cleans-0.003
 wasteful-0.003
ours-0.003
 supervision-0.003
Token sang
Token ID25889
Feature activation+1.58e-03

Pos logit contributions
 stub+0.001
 all+0.001
 year+0.001
-+0.001
 one+0.001
Neg logit contributions
 agrees-0.003
 cleans-0.003
ours-0.003
 learns-0.003
 wasteful-0.003
Token and
Token ID290
Feature activation-8.09e-03

Pos logit contributions
 stub+0.002
 pupil+0.002
 all+0.002
 year+0.001
 nowhere+0.001
Neg logit contributions
 agrees-0.005
 cleans-0.005
ours-0.005
 cooperate-0.005
 learns-0.005
Token together
Token ID1978
Feature activation-2.31e-02

Pos logit contributions
 agrees+0.018
ours+0.016
 cleans+0.016
 wasteful+0.015
 learns+0.015
Neg logit contributions
 all-0.010
 one-0.010
 stub-0.009
 just-0.009
--0.009
Token they
Token ID484
Feature activation-7.37e-03

Pos logit contributions
 agrees+0.041
 cleans+0.038
ours+0.037
 learns+0.034
 cooperate+0.034
Neg logit contributions
 stub-0.036
 all-0.034
 year-0.034
--0.034
 pupil-0.034
Token shook
Token ID14682
Feature activation+2.26e-03

Pos logit contributions
 agrees+0.017
 cleans+0.016
ours+0.015
 learns+0.015
 wasteful+0.015
Neg logit contributions
 stub-0.008
--0.008
 year-0.007
 one-0.007
 all-0.007
Token the
Token ID262
Feature activation-1.34e-02

Pos logit contributions
 all+0.003
 stub+0.003
-+0.002
 one+0.002
 ever+0.002
Neg logit contributions
 agrees-0.005
 cleans-0.005
ours-0.005
 learns-0.004
 wasteful-0.004
Token rad
Token ID2511
Feature activation-4.73e-02

Pos logit contributions
 agrees+0.029
 cleans+0.027
ours+0.027
 wasteful+0.025
 learns+0.025
Neg logit contributions
 all-0.016
 stub-0.016
--0.015
 one-0.015
 ever-0.015
Tokenish
Token ID680
Feature activation-7.28e-03

Pos logit contributions
 agrees+0.083
 cleans+0.076
ours+0.073
 learns+0.070
 wasteful+0.069
Neg logit contributions
 all-0.079
--0.077
 one-0.076
 in-0.074
 just-0.074
Token until
Token ID1566
Feature activation+4.13e-03

Pos logit contributions
 agrees+0.018
 cleans+0.017
ours+0.017
 wasteful+0.016
 learns+0.016
Neg logit contributions
 all-0.008
 in-0.007
 one-0.007
--0.007
 just-0.006
Token it
Token ID340
Feature activation+6.25e-03

Pos logit contributions
 stub+0.004
 all+0.004
-+0.004
 year+0.004
 t+0.004
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 wasteful-0.008
 learns-0.008
Token finally
Token ID3443
Feature activation+4.88e-03

Pos logit contributions
 stub+0.005
 all+0.005
 pupil+0.005
 year+0.005
-+0.005
Neg logit contributions
 agrees-0.015
 cleans-0.014
ours-0.014
 learns-0.014
 wasteful-0.013
Token came
Token ID1625
Feature activation+1.13e-02

Pos logit contributions
 all+0.004
 stub+0.004
 one+0.004
-+0.004
 in+0.004
Neg logit contributions
 agrees-0.012
 cleans-0.012
ours-0.012
 wasteful-0.011
 learns-0.011
Token the
Token ID262
Feature activation+8.46e-05

Pos logit contributions
-+0.007
 stub+0.007
 year+0.007
 all+0.007
 armour+0.007
Neg logit contributions
 agrees-0.013
 cleans-0.013
ours-0.012
 learns-0.012
 wasteful-0.012
Token sun
Token ID4252
Feature activation-8.26e-03

Pos logit contributions
dd+0.000
dream+0.000
 ever+0.000
te+0.000
eb+0.000
Neg logit contributions
 cleans-0.000
 agrees-0.000
 safer-0.000
 sandbox-0.000
 learns-0.000
Token would
Token ID561
Feature activation+9.02e-03

Pos logit contributions
 agrees+0.018
 cleans+0.017
ours+0.017
 learns+0.016
 becomes+0.015
Neg logit contributions
 stub-0.009
 all-0.009
 pupil-0.008
 nowhere-0.008
 Halloween-0.008
Token make
Token ID787
Feature activation-7.13e-03

Pos logit contributions
-+0.013
 pupil+0.013
 nowhere+0.013
 stub+0.013
 year+0.012
Neg logit contributions
 agrees-0.016
 cleans-0.015
 cooperate-0.014
 safer-0.014
 learns-0.014
Token the
Token ID262
Feature activation-6.55e-03

Pos logit contributions
 agrees+0.018
 cleans+0.017
ours+0.016
 learns+0.016
 wasteful+0.016
Neg logit contributions
 stub-0.006
 year-0.005
 ever-0.005
 pupil-0.005
 nowhere-0.005
Token sc
Token ID629
Feature activation-4.70e-02

Pos logit contributions
 agrees+0.012
 cleans+0.011
 learns+0.010
 safer+0.010
ours+0.010
Neg logit contributions
 ever-0.009
 nowhere-0.009
 stub-0.009
dd-0.009
te-0.009
Tokenooter
Token ID25141
Feature activation-1.57e-02

Pos logit contributions
 agrees+0.090
ours+0.081
 cleans+0.081
 learns+0.076
 wasteful+0.074
Neg logit contributions
 all-0.070
 stub-0.069
 nowhere-0.068
 one-0.066
 t-0.065
Token too
Token ID1165
Feature activation+5.72e-03

Pos logit contributions
 agrees+0.035
 cleans+0.032
ours+0.032
 learns+0.030
 wasteful+0.030
Neg logit contributions
 stub-0.018
 all-0.018
 year-0.017
 one-0.017
--0.016
Token hot
Token ID3024
Feature activation-1.32e-03

Pos logit contributions
 stub+0.005
 all+0.005
 pupil+0.004
 ever+0.004
 nowhere+0.004
Neg logit contributions
 agrees-0.014
 cleans-0.013
ours-0.013
 wasteful-0.013
 learns-0.013
Token to
Token ID284
Feature activation-1.63e-02

Pos logit contributions
 agrees+0.004
 cleans+0.003
ours+0.003
 learns+0.003
 wasteful+0.003
Neg logit contributions
 stub-0.001
 year-0.001
 all-0.001
 pupil-0.001
 t-0.001
Token touch
Token ID3638
Feature activation-4.95e-04

Pos logit contributions
 cleans+0.024
 agrees+0.024
 cooperate+0.023
 safer+0.022
 rinse+0.021
Neg logit contributions
 pupil-0.027
 nowhere-0.027
 stub-0.025
 3-0.025
 year-0.024
Token,
Token ID11
Feature activation+9.84e-03

Pos logit contributions
 agrees+0.033
 cleans+0.031
ours+0.031
 learns+0.030
 wasteful+0.029
Neg logit contributions
 all-0.009
 stub-0.009
 one-0.008
--0.008
 ever-0.008
Token it
Token ID340
Feature activation-1.32e-03

Pos logit contributions
 stub+0.012
-+0.011
 year+0.010
 ever+0.010
 all+0.010
Neg logit contributions
 agrees-0.021
 cleans-0.020
ours-0.019
 wasteful-0.019
 learns-0.019
Token kept
Token ID4030
Feature activation-9.26e-03

Pos logit contributions
 agrees+0.003
 cleans+0.002
 learns+0.002
 wasteful+0.002
 becomes+0.002
Neg logit contributions
 year-0.002
 pupil-0.002
 stub-0.001
 Halloween-0.001
dream-0.001
Token on
Token ID319
Feature activation-1.06e-02

Pos logit contributions
 cleans+0.018
 agrees+0.018
ours+0.017
 wasteful+0.017
 safer+0.017
Neg logit contributions
 stub-0.011
 year-0.011
t-0.010
 pupil-0.010
lf-0.010
Token flying
Token ID7348
Feature activation-1.24e-02

Pos logit contributions
 agrees+0.022
 cleans+0.021
 learns+0.020
 safer+0.020
ours+0.019
Neg logit contributions
 stub-0.012
t-0.011
 year-0.011
 pupil-0.011
 ever-0.010
Token err
Token ID11454
Feature activation-4.70e-02

Pos logit contributions
 agrees+0.030
 safer+0.026
 learns+0.025
 cleans+0.025
 cooperate+0.024
Neg logit contributions
 stub-0.013
 Halloween-0.011
 pupil-0.011
lf-0.010
 t-0.010
Tokenatically
Token ID4142
Feature activation-3.86e-03

Pos logit contributions
 agrees+0.068
ours+0.062
 cleans+0.062
 wasteful+0.053
 learns+0.053
Neg logit contributions
 all-0.091
--0.089
 ever-0.088
 stub-0.087
 year-0.087
Token with
Token ID351
Feature activation+3.65e-03

Pos logit contributions
 agrees+0.011
 cleans+0.010
ours+0.010
 learns+0.009
 wasteful+0.009
Neg logit contributions
 all-0.002
 stub-0.002
--0.002
 one-0.002
 year-0.002
Token inexplicable
Token ID49602
Feature activation+2.18e-03

Pos logit contributions
 stub+0.004
lf+0.004
-+0.004
 year+0.004
 pupil+0.004
Neg logit contributions
 agrees-0.008
 cleans-0.007
 learns-0.007
 safer-0.007
ours-0.007
Token speed
Token ID2866
Feature activation-3.60e-03

Pos logit contributions
 stub+0.003
 nowhere+0.003
 pupil+0.002
 all+0.002
 year+0.002
Neg logit contributions
 agrees-0.005
 cleans-0.004
 learns-0.004
 safer-0.004
 cooperate-0.004
Token.
Token ID13
Feature activation+2.04e-03

Pos logit contributions
 agrees+0.009
 cleans+0.008
 learns+0.008
ours+0.008
 supervision+0.008
Neg logit contributions
 stub-0.003
 Halloween-0.002
 year-0.002
 nowhere-0.002
 all-0.002
Token him
Token ID683
Feature activation-6.20e-04

Pos logit contributions
 stub+0.007
 year+0.006
 ever+0.006
 pupil+0.006
 surrounded+0.005
Neg logit contributions
 agrees-0.014
 cleans-0.013
ours-0.012
 praises-0.011
 safer-0.011
Token how
Token ID703
Feature activation-9.14e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 wasteful+0.001
 learns+0.001
Neg logit contributions
 all-0.001
 one-0.001
--0.001
 in-0.001
 stub-0.001
Token to
Token ID284
Feature activation-2.55e-02

Pos logit contributions
 agrees+0.026
 cleans+0.025
ours+0.025
 learns+0.023
 wasteful+0.023
Neg logit contributions
 all-0.006
 stub-0.005
 one-0.005
--0.005
 in-0.004
Token hold
Token ID1745
Feature activation-7.95e-03

Pos logit contributions
 agrees+0.055
ours+0.051
 cleans+0.051
 wasteful+0.047
 learns+0.046
Neg logit contributions
 all-0.035
 one-0.033
 just-0.032
 stub-0.031
 in-0.031
Token the
Token ID262
Feature activation-2.48e-02

Pos logit contributions
 agrees+0.020
 cleans+0.019
ours+0.019
 learns+0.018
 wasteful+0.018
Neg logit contributions
 stub-0.007
 all-0.006
--0.006
 year-0.006
 ever-0.006
Token golf
Token ID13126
Feature activation-4.62e-02

Pos logit contributions
 agrees+0.054
ours+0.052
 cleans+0.052
 wasteful+0.047
 learns+0.045
Neg logit contributions
 all-0.032
 one-0.031
 in-0.029
 just-0.029
--0.028
Token club
Token ID3430
Feature activation-2.67e-02

Pos logit contributions
 agrees+0.101
 cleans+0.097
ours+0.096
 wasteful+0.090
 learns+0.088
Neg logit contributions
 all-0.057
 one-0.054
--0.053
 in-0.052
 just-0.051
Token and
Token ID290
Feature activation+6.44e-04

Pos logit contributions
 agrees+0.072
 cleans+0.067
ours+0.066
 learns+0.064
 cooperate+0.063
Neg logit contributions
 stub-0.019
 all-0.016
--0.015
 year-0.015
 ever-0.015
Token how
Token ID703
Feature activation-8.68e-03

Pos logit contributions
 all+0.001
 one+0.001
 just+0.001
 in+0.001
 stub+0.001
Neg logit contributions
 agrees-0.002
ours-0.001
 cleans-0.001
 wasteful-0.001
 learns-0.001
Token to
Token ID284
Feature activation-2.51e-02

Pos logit contributions
 agrees+0.023
 cleans+0.022
ours+0.022
 learns+0.021
 wasteful+0.021
Neg logit contributions
 all-0.006
 stub-0.006
--0.006
 one-0.006
 in-0.005
Token swing
Token ID9628
Feature activation-1.04e-02

Pos logit contributions
 agrees+0.053
 cleans+0.049
ours+0.049
 wasteful+0.045
 learns+0.045
Neg logit contributions
 all-0.034
 one-0.032
 stub-0.031
 just-0.031
 in-0.031
Token
Token ID198
Feature activation-1.76e-03

Pos logit contributions
 agrees+0.012
 cleans+0.011
ours+0.011
 learns+0.010
 cooperate+0.010
Neg logit contributions
 stub-0.005
 all-0.005
--0.005
 ever-0.005
 one-0.005
TokenJimmy
Token ID40335
Feature activation-1.14e-02

Pos logit contributions
 agrees+0.003
ours+0.003
 cleans+0.003
 learns+0.003
 cooperate+0.003
Neg logit contributions
 stub-0.003
 all-0.003
 t-0.003
 one-0.003
--0.003
Token quickly
Token ID2952
Feature activation-1.10e-02

Pos logit contributions
 agrees+0.028
 cleans+0.026
ours+0.026
 wasteful+0.024
 learns+0.024
Neg logit contributions
 all-0.011
 stub-0.011
--0.010
 one-0.010
 in-0.010
Token cleaned
Token ID20750
Feature activation-5.53e-03

Pos logit contributions
 agrees+0.026
 cleans+0.025
ours+0.024
 wasteful+0.023
 learns+0.023
Neg logit contributions
 all-0.011
 stub-0.010
--0.010
 one-0.010
 in-0.010
Token his
Token ID465
Feature activation-2.06e-02

Pos logit contributions
 agrees+0.013
 cleans+0.012
ours+0.012
 learns+0.011
 wasteful+0.011
Neg logit contributions
 stub-0.006
 ever-0.005
 year-0.005
 nowhere-0.005
 all-0.005
Token sc
Token ID629
Feature activation-4.60e-02

Pos logit contributions
 agrees+0.047
 cleans+0.044
ours+0.043
 learns+0.042
 wasteful+0.040
Neg logit contributions
 stub-0.023
 ever-0.022
 all-0.020
 nowhere-0.020
 year-0.019
Tokenooter
Token ID25141
Feature activation-1.44e-02

Pos logit contributions
 agrees+0.096
 cleans+0.088
ours+0.088
 learns+0.082
 wasteful+0.082
Neg logit contributions
 all-0.060
 stub-0.059
 one-0.057
--0.056
 in-0.055
Token and
Token ID290
Feature activation+1.12e-03

Pos logit contributions
 agrees+0.038
 cleans+0.035
ours+0.034
 learns+0.033
 cooperate+0.032
Neg logit contributions
 stub-0.012
 nowhere-0.010
 all-0.010
 year-0.009
 ever-0.009
Token then
Token ID788
Feature activation+2.90e-03

Pos logit contributions
 all+0.001
 stub+0.001
 one+0.001
-+0.001
 in+0.001
Neg logit contributions
 agrees-0.003
 cleans-0.002
ours-0.002
 learns-0.002
 wasteful-0.002
Token had
Token ID550
Feature activation+7.10e-03

Pos logit contributions
 stub+0.003
 all+0.003
 year+0.003
-+0.002
 ever+0.002
Neg logit contributions
 agrees-0.007
 cleans-0.007
ours-0.006
 learns-0.006
 wasteful-0.006
Token a
Token ID257
Feature activation+7.33e-03

Pos logit contributions
 all+0.005
 stub+0.005
-+0.004
 one+0.004
 in+0.004
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.018
 learns-0.017
 wasteful-0.017
Token was
Token ID373
Feature activation-2.96e-03

Pos logit contributions
 stub+0.000
 year+0.000
 Halloween+0.000
 all+0.000
 pupil+0.000
Neg logit contributions
 agrees-0.000
 cleans-0.000
ours-0.000
 learns-0.000
 wasteful-0.000
Token going
Token ID1016
Feature activation-5.19e-03

Pos logit contributions
 cleans+0.006
 safer+0.006
 agrees+0.006
 wasteful+0.006
ours+0.005
Neg logit contributions
 stub-0.004
 pupil-0.003
 year-0.003
amb-0.003
 ever-0.003
Token to
Token ID284
Feature activation-8.43e-04

Pos logit contributions
 agrees+0.011
 cleans+0.010
 learns+0.010
 safer+0.010
ours+0.010
Neg logit contributions
 stub-0.006
 year-0.005
lf-0.005
 pupil-0.005
 t-0.005
Token fix
Token ID4259
Feature activation-4.34e-03

Pos logit contributions
 agrees+0.002
 cleans+0.002
ours+0.001
 cooperate+0.001
 learns+0.001
Neg logit contributions
 stub-0.001
--0.001
 all-0.001
 year-0.001
 ever-0.001
Token the
Token ID262
Feature activation+2.30e-03

Pos logit contributions
 agrees+0.010
 cleans+0.010
ours+0.010
 wasteful+0.009
 learns+0.009
Neg logit contributions
 stub-0.004
 all-0.004
 ever-0.004
 year-0.004
 pupil-0.004
Token sc
Token ID629
Feature activation-4.59e-02

Pos logit contributions
 ever+0.003
 stub+0.003
dream+0.003
dd+0.003
 nowhere+0.003
Neg logit contributions
 agrees-0.004
 cleans-0.004
 learns-0.004
 wasteful-0.004
 safer-0.004
Tokenooter
Token ID25141
Feature activation-1.18e-02

Pos logit contributions
 agrees+0.092
ours+0.084
 cleans+0.084
 learns+0.079
 wasteful+0.078
Neg logit contributions
 all-0.064
 stub-0.063
 one-0.060
 ever-0.059
 t-0.059
Token.
Token ID13
Feature activation+1.10e-03

Pos logit contributions
 agrees+0.033
 cleans+0.031
ours+0.031
 learns+0.030
 wasteful+0.030
Neg logit contributions
 stub-0.007
 all-0.006
 year-0.005
 one-0.005
 ever-0.005
Token First
Token ID3274
Feature activation+3.28e-03

Pos logit contributions
 all+0.001
 one+0.001
 in+0.001
 just+0.001
 stub+0.001
Neg logit contributions
 agrees-0.003
 cleans-0.002
ours-0.002
 wasteful-0.002
 learns-0.002
Token,
Token ID11
Feature activation-2.55e-03

Pos logit contributions
 stub+0.003
 all+0.003
 pupil+0.002
 nowhere+0.002
-+0.002
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.008
 learns-0.007
 wasteful-0.007
Token Daddy
Token ID26926
Feature activation-2.54e-03

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.005
 wasteful+0.005
 learns+0.005
Neg logit contributions
 all-0.003
 stub-0.003
--0.003
 one-0.002
 ever-0.002
Token,
Token ID11
Feature activation+3.22e-03

Pos logit contributions
 agrees+0.025
 cleans+0.024
ours+0.024
 wasteful+0.023
 yeast+0.022
Neg logit contributions
 all-0.007
 one-0.006
 in-0.006
--0.005
 just-0.005
Token "
Token ID366
Feature activation+1.17e-02

Pos logit contributions
 all+0.004
 stub+0.003
-+0.003
 one+0.003
 year+0.003
Neg logit contributions
 agrees-0.007
 cleans-0.007
ours-0.007
 wasteful-0.006
 learns-0.006
TokenWhat
Token ID2061
Feature activation-2.65e-02

Pos logit contributions
 all+0.018
 stub+0.017
-+0.017
 ever+0.017
 just+0.016
Neg logit contributions
 agrees-0.022
 cleans-0.020
ours-0.019
 cooperate-0.018
 wasteful-0.018
Token is
Token ID318
Feature activation+5.87e-03

Pos logit contributions
 agrees+0.046
ours+0.043
 cleans+0.041
 wasteful+0.040
 learns+0.038
Neg logit contributions
 all-0.044
 in-0.043
 ever-0.043
--0.042
 just-0.042
Token this
Token ID428
Feature activation-3.03e-02

Pos logit contributions
 stub+0.006
 year+0.006
 all+0.006
 ever+0.006
-+0.005
Neg logit contributions
 agrees-0.013
 cleans-0.013
ours-0.013
 wasteful-0.012
 learns-0.012
Token big
Token ID1263
Feature activation-4.59e-02

Pos logit contributions
 agrees+0.072
 cleans+0.066
ours+0.065
 learns+0.061
 cooperate+0.061
Neg logit contributions
 all-0.035
 one-0.034
--0.033
 in-0.032
 just-0.032
Token cross
Token ID3272
Feature activation-1.78e-02

Pos logit contributions
 agrees+0.102
 cleans+0.094
ours+0.093
 learns+0.085
 cooperate+0.085
Neg logit contributions
 one-0.059
 all-0.059
--0.057
 in-0.055
 just-0.053
Token?"
Token ID1701
Feature activation+7.43e-03

Pos logit contributions
 agrees+0.040
 cleans+0.037
ours+0.037
 learns+0.035
 wasteful+0.034
Neg logit contributions
 all-0.021
 stub-0.020
 one-0.019
--0.019
 in-0.019
Token Lily
Token ID20037
Feature activation-1.32e-03

Pos logit contributions
 all+0.009
 stub+0.008
 one+0.008
-+0.008
 in+0.008
Neg logit contributions
 agrees-0.017
ours-0.016
 cleans-0.016
 wasteful-0.015
 learns-0.015
Token said
Token ID531
Feature activation+2.70e-03

Pos logit contributions
 agrees+0.003
 cleans+0.003
ours+0.003
 learns+0.002
 cooperate+0.002
Neg logit contributions
 stub-0.002
 all-0.002
 ever-0.002
--0.002
 year-0.002
Token,
Token ID11
Feature activation+3.81e-03

Pos logit contributions
 stub+0.002
 all+0.002
 year+0.002
 nowhere+0.002
 one+0.002
Neg logit contributions
 agrees-0.007
 cleans-0.006
ours-0.006
 learns-0.006
 wasteful-0.006
Token He
Token ID679
Feature activation-1.41e-03

Pos logit contributions
 agrees+0.012
ours+0.011
 cleans+0.011
 learns+0.010
 cooperate+0.010
Neg logit contributions
 stub-0.007
--0.007
 ever-0.007
 all-0.007
 t-0.007
Token decided
Token ID3066
Feature activation+9.60e-03

Pos logit contributions
 agrees+0.003
 cleans+0.003
 learns+0.003
ours+0.003
 cooperate+0.003
Neg logit contributions
 stub-0.001
 year-0.001
--0.001
 Halloween-0.001
 all-0.001
Token to
Token ID284
Feature activation-1.42e-02

Pos logit contributions
 stub+0.009
 year+0.008
 t+0.008
 pupil+0.008
-+0.008
Neg logit contributions
 agrees-0.022
 cleans-0.021
 learns-0.020
ours-0.020
 wasteful-0.020
Token take
Token ID1011
Feature activation-5.57e-03

Pos logit contributions
 agrees+0.028
 cleans+0.025
 cooperate+0.024
ours+0.023
 learns+0.023
Neg logit contributions
 stub-0.020
 nowhere-0.020
--0.019
 year-0.018
 pupil-0.018
Token his
Token ID465
Feature activation-2.92e-03

Pos logit contributions
 agrees+0.012
 cleans+0.011
 learns+0.010
 safer+0.010
ours+0.010
Neg logit contributions
 stub-0.007
 nowhere-0.006
 year-0.006
lf-0.006
 pupil-0.006
Token sc
Token ID629
Feature activation-4.58e-02

Pos logit contributions
 agrees+0.006
 cleans+0.005
ours+0.005
 supervision+0.005
 learns+0.005
Neg logit contributions
 stub-0.004
 ever-0.003
 nowhere-0.003
dd-0.003
te-0.003
Tokenooter
Token ID25141
Feature activation-1.41e-02

Pos logit contributions
 agrees+0.084
ours+0.074
 cleans+0.071
 learns+0.070
 cooperate+0.068
Neg logit contributions
 stub-0.073
 all-0.071
 nowhere-0.070
 t-0.069
 one-0.068
Token out
Token ID503
Feature activation-8.50e-03

Pos logit contributions
 agrees+0.031
 safer+0.028
 learns+0.028
 cooperate+0.027
 cleans+0.027
Neg logit contributions
 stub-0.017
 nowhere-0.013
 year-0.013
dd-0.013
 Halloween-0.012
Token into
Token ID656
Feature activation-3.27e-03

Pos logit contributions
 agrees+0.021
 cleans+0.019
ours+0.019
 learns+0.018
 safer+0.018
Neg logit contributions
 stub-0.008
 all-0.007
 t-0.006
 pupil-0.006
 one-0.006
Token the
Token ID262
Feature activation-5.49e-03

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.006
 cooperate+0.005
 safer+0.005
Neg logit contributions
 stub-0.004
t-0.004
--0.004
man-0.004
 ever-0.004
Token backyard
Token ID24296
Feature activation-4.06e-03

Pos logit contributions
 agrees+0.012
 cleans+0.011
ours+0.011
 learns+0.010
 wasteful+0.010
Neg logit contributions
 all-0.007
 stub-0.007
--0.007
 one-0.007
 ever-0.006
Token flew
Token ID13112
Feature activation-2.39e-03

Pos logit contributions
 all+0.002
 pupil+0.002
 year+0.002
dream+0.002
 stub+0.002
Neg logit contributions
 agrees-0.004
 cleans-0.004
ours-0.004
 learns-0.004
 wasteful-0.004
Token down
Token ID866
Feature activation-1.77e-02

Pos logit contributions
 agrees+0.005
ours+0.004
 cleans+0.004
 safer+0.004
 wasteful+0.004
Neg logit contributions
 stub-0.003
lf-0.003
 year-0.003
 Halloween-0.003
 pupil-0.003
Token and
Token ID290
Feature activation-1.57e-03

Pos logit contributions
 agrees+0.047
 cleans+0.044
ours+0.044
 learns+0.042
 wasteful+0.042
Neg logit contributions
 stub-0.013
 all-0.012
--0.011
 one-0.011
 year-0.011
Token snatched
Token ID43276
Feature activation-1.68e-02

Pos logit contributions
 agrees+0.004
ours+0.004
 cleans+0.004
 learns+0.003
 wasteful+0.003
Neg logit contributions
 all-0.002
 stub-0.002
 one-0.001
 in-0.001
 just-0.001
Token the
Token ID262
Feature activation-4.80e-03

Pos logit contributions
 agrees+0.038
ours+0.035
 cleans+0.035
 learns+0.033
 wasteful+0.032
Neg logit contributions
 stub-0.020
 year-0.019
lf-0.017
 ever-0.017
 pupil-0.017
Token rad
Token ID2511
Feature activation-4.57e-02

Pos logit contributions
 agrees+0.010
 cleans+0.010
ours+0.009
 wasteful+0.009
 learns+0.009
Neg logit contributions
 stub-0.006
 all-0.006
--0.006
 ever-0.005
 year-0.005
Tokenish
Token ID680
Feature activation-1.10e-02

Pos logit contributions
 agrees+0.074
 cleans+0.067
ours+0.066
 learns+0.061
 wasteful+0.061
Neg logit contributions
 all-0.081
 stub-0.079
--0.078
 one-0.078
 in-0.076
Token from
Token ID422
Feature activation-2.10e-03

Pos logit contributions
 agrees+0.027
ours+0.025
 cleans+0.025
 learns+0.024
 wasteful+0.024
Neg logit contributions
 stub-0.010
 all-0.009
 year-0.008
 one-0.008
 nowhere-0.008
Token Ben
Token ID3932
Feature activation+1.46e-02

Pos logit contributions
ours+0.003
 agrees+0.003
 cleans+0.003
 vinegar+0.003
 safer+0.003
Neg logit contributions
t-0.004
 year-0.004
 stub-0.004
--0.004
 pupil-0.003
Token.
Token ID13
Feature activation+1.87e-03

Pos logit contributions
 stub+0.011
 all+0.010
 year+0.010
 one+0.009
-+0.009
Neg logit contributions
 agrees-0.038
 cleans-0.036
ours-0.035
 learns-0.034
 wasteful-0.033
Token It
Token ID632
Feature activation+3.63e-03

Pos logit contributions
 all+0.002
 stub+0.002
 one+0.002
 just+0.002
 in+0.002
Neg logit contributions
 agrees-0.005
 cleans-0.004
ours-0.004
 wasteful-0.004
 learns-0.004
TokenBob
Token ID18861
Feature activation-1.16e-02

Pos logit contributions
 agrees+0.004
 cleans+0.004
ours+0.004
 learns+0.003
 cooperate+0.003
Neg logit contributions
 stub-0.003
 all-0.003
 t-0.002
 pupil-0.002
 nowhere-0.002
Token tried
Token ID3088
Feature activation-2.37e-03

Pos logit contributions
 agrees+0.030
 cleans+0.027
ours+0.027
 learns+0.026
 wasteful+0.026
Neg logit contributions
 all-0.010
 stub-0.009
 one-0.009
 ever-0.009
--0.008
Token to
Token ID284
Feature activation-1.18e-02

Pos logit contributions
 agrees+0.007
ours+0.007
 cleans+0.007
 wasteful+0.006
 learns+0.006
Neg logit contributions
 all-0.001
 one-0.001
 in-0.001
--0.001
 stub-0.001
Token pull
Token ID2834
Feature activation-5.32e-03

Pos logit contributions
 agrees+0.024
 cleans+0.022
 cooperate+0.021
ours+0.021
 wasteful+0.021
Neg logit contributions
 stub-0.016
 all-0.015
--0.015
 year-0.015
 one-0.014
Token the
Token ID262
Feature activation-1.08e-02

Pos logit contributions
 agrees+0.011
 cleans+0.010
ours+0.010
 safer+0.010
 wasteful+0.010
Neg logit contributions
 stub-0.007
 pupil-0.006
 year-0.006
dream-0.006
lf-0.006
Token rad
Token ID2511
Feature activation-4.55e-02

Pos logit contributions
 agrees+0.023
 cleans+0.021
ours+0.021
 wasteful+0.020
 learns+0.019
Neg logit contributions
 stub-0.013
 all-0.013
--0.013
 ever-0.012
 year-0.012
Tokenish
Token ID680
Feature activation-2.42e-03

Pos logit contributions
 agrees+0.075
 cleans+0.067
ours+0.066
 learns+0.061
 wasteful+0.061
Neg logit contributions
 all-0.081
 stub-0.078
--0.077
 one-0.077
 in-0.076
Token out
Token ID503
Feature activation-8.42e-03

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.006
 learns+0.006
 wasteful+0.006
Neg logit contributions
 all-0.002
 stub-0.002
 one-0.002
--0.002
 in-0.002
Token.
Token ID13
Feature activation+4.22e-03

Pos logit contributions
 agrees+0.022
 cleans+0.021
ours+0.021
 learns+0.020
 wasteful+0.020
Neg logit contributions
 all-0.007
 stub-0.006
 one-0.006
--0.006
 ever-0.006
Token He
Token ID679
Feature activation-1.25e-03

Pos logit contributions
 all+0.004
 one+0.004
 in+0.004
-+0.004
 just+0.004
Neg logit contributions
 agrees-0.010
 cleans-0.010
ours-0.010
 wasteful-0.009
 learns-0.009
Token pulled
Token ID5954
Feature activation+1.86e-03

Pos logit contributions
 agrees+0.003
 cleans+0.002
ours+0.002
 wasteful+0.002
 learns+0.002
Neg logit contributions
 stub-0.001
 year-0.001
 pupil-0.001
 all-0.001
 Halloween-0.001
Token him
Token ID683
Feature activation-4.73e-03

Pos logit contributions
 stub+0.004
 year+0.003
 ever+0.003
dream+0.003
 pupil+0.003
Neg logit contributions
 agrees-0.011
 cleans-0.010
ours-0.010
 wasteful-0.009
 learns-0.009
Token how
Token ID703
Feature activation-1.10e-02

Pos logit contributions
 agrees+0.011
 cleans+0.010
ours+0.010
 wasteful+0.010
 learns+0.010
Neg logit contributions
 all-0.005
 one-0.005
--0.005
 in-0.005
 just-0.005
Token to
Token ID284
Feature activation-2.80e-02

Pos logit contributions
 agrees+0.032
 cleans+0.030
ours+0.030
 learns+0.028
 wasteful+0.028
Neg logit contributions
 all-0.006
 stub-0.006
--0.005
 one-0.005
 in-0.005
Token hold
Token ID1745
Feature activation-1.23e-02

Pos logit contributions
 agrees+0.060
ours+0.055
 cleans+0.055
 wasteful+0.050
 learns+0.050
Neg logit contributions
 all-0.038
 one-0.036
 just-0.035
 stub-0.035
 in-0.034
Token the
Token ID262
Feature activation-1.67e-02

Pos logit contributions
 agrees+0.030
 cleans+0.028
ours+0.028
 wasteful+0.027
 learns+0.027
Neg logit contributions
 all-0.012
 stub-0.011
 one-0.011
--0.011
 in-0.010
Token golf
Token ID13126
Feature activation-4.55e-02

Pos logit contributions
 agrees+0.035
 cleans+0.033
ours+0.033
 wasteful+0.030
 learns+0.030
Neg logit contributions
 all-0.022
 stub-0.021
 one-0.021
--0.021
 in-0.020
Token stick
Token ID4859
Feature activation-6.19e-03

Pos logit contributions
 agrees+0.100
 cleans+0.096
ours+0.096
 wasteful+0.089
 learns+0.086
Neg logit contributions
 all-0.056
 one-0.052
--0.051
 in-0.051
 just-0.049
Token and
Token ID290
Feature activation-6.58e-03

Pos logit contributions
 agrees+0.017
 cleans+0.015
ours+0.015
 learns+0.015
 wasteful+0.015
Neg logit contributions
 stub-0.004
 all-0.004
 year-0.004
--0.003
 pupil-0.003
Token hit
Token ID2277
Feature activation+1.22e-03

Pos logit contributions
 agrees+0.015
ours+0.014
 cleans+0.014
 wasteful+0.013
 learns+0.013
Neg logit contributions
 all-0.008
 one-0.007
 just-0.007
 in-0.007
 t-0.007
Token the
Token ID262
Feature activation-1.23e-02

Pos logit contributions
 stub+0.001
 year+0.001
 pupil+0.001
 ever+0.001
-+0.001
Neg logit contributions
 agrees-0.003
 cleans-0.003
ours-0.003
 wasteful-0.002
 safer-0.002
Token ball
Token ID2613
Feature activation-2.27e-03

Pos logit contributions
 agrees+0.027
 cleans+0.025
ours+0.024
 learns+0.023
 wasteful+0.023
Neg logit contributions
 all-0.015
 stub-0.015
 one-0.014
--0.014
 in-0.014
Token.
Token ID13
Feature activation-6.47e-03

Pos logit contributions
 stub+0.000
 all+0.000
-+0.000
 one+0.000
 ever+0.000
Neg logit contributions
 agrees-0.002
 cleans-0.002
ours-0.002
 learns-0.002
 wasteful-0.002
Token He
Token ID679
Feature activation-7.80e-03

Pos logit contributions
 agrees+0.014
 cleans+0.013
ours+0.013
 wasteful+0.012
 cooperate+0.012
Neg logit contributions
 all-0.009
 one-0.008
 just-0.008
 in-0.008
 with-0.008
Token showed
Token ID3751
Feature activation+1.75e-03

Pos logit contributions
 agrees+0.018
 cleans+0.016
ours+0.016
 learns+0.015
 wasteful+0.015
Neg logit contributions
 stub-0.009
 all-0.009
--0.008
 year-0.008
 one-0.008
Token them
Token ID606
Feature activation-3.43e-03

Pos logit contributions
 stub+0.003
 year+0.002
dream+0.002
 nowhere+0.002
 ever+0.002
Neg logit contributions
 agrees-0.003
 cleans-0.003
ours-0.003
 learns-0.003
 safer-0.003
Token how
Token ID703
Feature activation-2.06e-02

Pos logit contributions
 agrees+0.009
 wasteful+0.008
 cleans+0.008
ours+0.008
 cooperate+0.008
Neg logit contributions
 all-0.004
 one-0.003
 in-0.003
--0.003
 just-0.003
Token to
Token ID284
Feature activation-4.51e-02

Pos logit contributions
 agrees+0.059
 cleans+0.056
ours+0.055
 learns+0.052
 wasteful+0.052
Neg logit contributions
 all-0.013
 one-0.011
--0.011
 ever-0.011
 in-0.011
Token hold
Token ID1745
Feature activation-1.31e-02

Pos logit contributions
 agrees+0.091
 cleans+0.084
ours+0.083
 wasteful+0.078
 learns+0.077
Neg logit contributions
 all-0.067
 one-0.063
 just-0.062
 in-0.061
--0.060
Token the
Token ID262
Feature activation-1.72e-02

Pos logit contributions
 agrees+0.033
 cleans+0.031
ours+0.030
 wasteful+0.029
 learns+0.029
Neg logit contributions
 all-0.012
--0.011
 one-0.011
 stub-0.011
 in-0.011
Token ladder
Token ID18002
Feature activation-2.44e-02

Pos logit contributions
 agrees+0.033
ours+0.030
 cleans+0.029
 learns+0.028
 wasteful+0.028
Neg logit contributions
 all-0.027
 one-0.026
 in-0.025
 t-0.025
--0.025
Token and
Token ID290
Feature activation-1.16e-02

Pos logit contributions
 agrees+0.068
 cleans+0.064
ours+0.063
 learns+0.060
 wasteful+0.060
Neg logit contributions
 all-0.015
 stub-0.014
--0.013
 one-0.013
 ever-0.013
Token slide
Token ID10649
Feature activation+2.54e-04

Pos logit contributions
 agrees+0.028
ours+0.027
 wasteful+0.025
 cleans+0.025
ucker+0.024
Neg logit contributions
 all-0.014
 one-0.013
 in-0.012
 just-0.012
 t-0.012
Token,
Token ID11
Feature activation-5.59e-03

Pos logit contributions
 agrees+0.036
 cleans+0.034
ours+0.034
 wasteful+0.032
 learns+0.032
Neg logit contributions
 all-0.013
 one-0.012
 in-0.012
--0.011
 just-0.011
Token "
Token ID366
Feature activation+1.14e-02

Pos logit contributions
 agrees+0.013
 cleans+0.012
ours+0.012
 learns+0.012
 wasteful+0.012
Neg logit contributions
 all-0.006
 stub-0.006
 one-0.005
--0.005
 in-0.005
TokenCan
Token ID6090
Feature activation-9.23e-03

Pos logit contributions
 stub+0.017
 all+0.016
-+0.016
 ever+0.015
 t+0.015
Neg logit contributions
 agrees-0.021
ours-0.020
 cleans-0.019
 yeast-0.018
 wasteful-0.018
Token I
Token ID314
Feature activation-1.86e-03

Pos logit contributions
 agrees+0.017
 cleans+0.015
ours+0.015
 learns+0.014
 wasteful+0.014
Neg logit contributions
 all-0.014
 stub-0.014
 one-0.014
--0.014
 in-0.013
Token pick
Token ID2298
Feature activation-2.32e-02

Pos logit contributions
 agrees+0.004
 cleans+0.003
ours+0.003
 wasteful+0.003
 learns+0.003
Neg logit contributions
 all-0.003
 stub-0.002
 one-0.002
--0.002
 just-0.002
Token those
Token ID883
Feature activation-4.51e-02

Pos logit contributions
 agrees+0.055
ours+0.051
 cleans+0.051
 learns+0.048
 wasteful+0.048
Neg logit contributions
 stub-0.024
 all-0.022
 year-0.022
--0.021
 ever-0.021
Token flowers
Token ID12734
Feature activation-3.59e-02

Pos logit contributions
 agrees+0.097
 cleans+0.090
ours+0.089
 learns+0.085
 wasteful+0.084
Neg logit contributions
 all-0.057
 one-0.053
--0.053
 stub-0.053
 in-0.052
Token?"
Token ID1701
Feature activation+8.53e-03

Pos logit contributions
 agrees+0.097
 cleans+0.093
ours+0.091
 learns+0.088
 wasteful+0.087
Neg logit contributions
 all-0.031
 in-0.027
 one-0.026
 just-0.025
--0.024
Token Her
Token ID2332
Feature activation-4.70e-04

Pos logit contributions
 stub+0.013
 all+0.013
 ever+0.012
 nowhere+0.012
-+0.012
Neg logit contributions
 agrees-0.015
 cleans-0.014
ours-0.014
 learns-0.013
 cooperate-0.013
Token mom
Token ID1995
Feature activation+3.03e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 learns+0.001
 wasteful+0.001
Neg logit contributions
dd-0.000
 stub-0.000
 ever-0.000
dream-0.000
 year-0.000
Token smiled
Token ID13541
Feature activation-5.83e-03

Pos logit contributions
 stub+0.005
 Halloween+0.005
 armour+0.005
 pupil+0.005
 year+0.005
Neg logit contributions
 agrees-0.005
 cleans-0.005
ours-0.004
 cooperate-0.004
 learns-0.004
Token was
Token ID373
Feature activation-3.06e-03

Pos logit contributions
 all+0.004
 stub+0.004
 one+0.003
-+0.003
 year+0.003
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.007
 learns-0.007
 wasteful-0.007
Token ready
Token ID3492
Feature activation+1.01e-02

Pos logit contributions
 agrees+0.007
 cleans+0.006
ours+0.006
 safer+0.006
 wasteful+0.006
Neg logit contributions
 stub-0.003
 year-0.003
 pupil-0.003
--0.003
 all-0.003
Token to
Token ID284
Feature activation-4.57e-03

Pos logit contributions
 stub+0.008
 all+0.007
 year+0.007
-+0.007
 pupil+0.007
Neg logit contributions
 agrees-0.026
 cleans-0.024
ours-0.024
 learns-0.023
 cooperate-0.023
Token pick
Token ID2298
Feature activation-9.29e-03

Pos logit contributions
 agrees+0.010
 cleans+0.009
ours+0.009
 learns+0.009
 wasteful+0.009
Neg logit contributions
 all-0.006
 stub-0.005
--0.005
 one-0.005
 ever-0.005
Token the
Token ID262
Feature activation-1.04e-02

Pos logit contributions
 agrees+0.023
 cleans+0.021
ours+0.021
 learns+0.020
 wasteful+0.020
Neg logit contributions
 stub-0.009
 all-0.008
--0.008
 ever-0.008
 year-0.008
Token rad
Token ID2511
Feature activation-4.47e-02

Pos logit contributions
 agrees+0.023
 cleans+0.021
ours+0.021
 wasteful+0.019
 learns+0.019
Neg logit contributions
 stub-0.012
 all-0.012
--0.012
 one-0.012
 ever-0.012
Tokenish
Token ID680
Feature activation-8.37e-03

Pos logit contributions
 agrees+0.070
 cleans+0.063
ours+0.062
 learns+0.057
 wasteful+0.057
Neg logit contributions
 all-0.084
 one-0.080
--0.080
 in-0.078
 just-0.078
Token.
Token ID13
Feature activation+2.98e-04

Pos logit contributions
 agrees+0.024
 cleans+0.022
ours+0.022
 wasteful+0.021
 learns+0.021
Neg logit contributions
 all-0.005
 stub-0.004
 one-0.004
--0.004
 in-0.004
Token She
Token ID1375
Feature activation-4.89e-03

Pos logit contributions
 all+0.000
 one+0.000
 in+0.000
 just+0.000
 ever+0.000
Neg logit contributions
 agrees-0.001
 wasteful-0.001
 cleans-0.001
ours-0.001
 yeast-0.001
Token pl
Token ID458
Feature activation+6.55e-03

Pos logit contributions
 agrees+0.012
 cleans+0.011
ours+0.011
 learns+0.010
 cooperate+0.010
Neg logit contributions
 stub-0.005
 all-0.005
 one-0.004
--0.004
 year-0.004
Tokenucked
Token ID17758
Feature activation+1.72e-03

Pos logit contributions
 all+0.009
 stub+0.009
-+0.009
 one+0.009
 in+0.009
Neg logit contributions
 agrees-0.013
 cleans-0.012
ours-0.012
 learns-0.011
 wasteful-0.011
Token said
Token ID531
Feature activation-2.34e-03

Pos logit contributions
 agrees+0.029
 cleans+0.026
ours+0.026
 learns+0.024
 wasteful+0.024
Neg logit contributions
 all-0.018
 stub-0.018
--0.017
 year-0.017
 one-0.017
Token.
Token ID13
Feature activation+1.11e-03

Pos logit contributions
 agrees+0.006
 cleans+0.006
ours+0.006
 cooperate+0.006
 learns+0.005
Neg logit contributions
 stub-0.002
 all-0.001
 year-0.001
 nowhere-0.001
 pupil-0.001
Token She
Token ID1375
Feature activation-1.89e-02

Pos logit contributions
 all+0.001
 stub+0.001
-+0.001
 one+0.001
 ever+0.001
Neg logit contributions
 agrees-0.002
 cleans-0.002
ours-0.002
 learns-0.002
 wasteful-0.002
Token put
Token ID1234
Feature activation+4.27e-04

Pos logit contributions
 agrees+0.043
 cleans+0.040
ours+0.039
 learns+0.036
 wasteful+0.035
Neg logit contributions
 stub-0.020
 all-0.020
 year-0.020
 one-0.019
 ever-0.019
Token the
Token ID262
Feature activation-1.37e-02

Pos logit contributions
 stub+0.000
 year+0.000
 pupil+0.000
 ever+0.000
-+0.000
Neg logit contributions
 agrees-0.001
 cleans-0.001
ours-0.001
 wasteful-0.001
 yeast-0.001
Token cherry
Token ID23612
Feature activation-4.47e-02

Pos logit contributions
 agrees+0.025
 cleans+0.023
ours+0.023
 learns+0.021
 wasteful+0.021
Neg logit contributions
 all-0.022
 stub-0.021
 one-0.021
--0.020
 in-0.020
Token in
Token ID287
Feature activation-2.10e-03

Pos logit contributions
 agrees+0.127
 cleans+0.120
ours+0.119
 learns+0.114
 wasteful+0.114
Neg logit contributions
 all-0.026
 stub-0.023
 one-0.022
--0.022
 in-0.021
Token her
Token ID607
Feature activation-1.47e-02

Pos logit contributions
 agrees+0.005
 cleans+0.005
ours+0.004
 safer+0.004
 yeast+0.004
Neg logit contributions
 stub-0.002
--0.002
 pupil-0.002
 ever-0.002
 year-0.002
Token mouth
Token ID5422
Feature activation+9.03e-03

Pos logit contributions
 agrees+0.032
 cleans+0.029
ours+0.029
 wasteful+0.028
 cooperate+0.028
Neg logit contributions
 all-0.019
 one-0.018
--0.017
 in-0.017
 stub-0.017
Token and
Token ID290
Feature activation-5.48e-03

Pos logit contributions
 stub+0.009
 pupil+0.008
 Halloween+0.008
lf+0.007
 ever+0.007
Neg logit contributions
 agrees-0.021
 cleans-0.019
ours-0.019
 learns-0.018
 safer-0.018
Token smiled
Token ID13541
Feature activation-3.32e-03

Pos logit contributions
 agrees+0.013
ours+0.012
 cleans+0.012
 wasteful+0.012
 cooperate+0.012
Neg logit contributions
 all-0.006
 one-0.005
 just-0.005
 stub-0.005
 in-0.005
Token shelf
Token ID18316
Feature activation+1.25e-02

Pos logit contributions
 stub+0.003
dream+0.003
 ever+0.003
 all+0.003
 barely+0.002
Neg logit contributions
 agrees-0.003
 cleans-0.003
 cooperate-0.003
ours-0.003
 yeast-0.003
Token,
Token ID11
Feature activation-6.08e-04

Pos logit contributions
 stub+0.011
 t+0.009
 Halloween+0.008
 pupil+0.008
 all+0.008
Neg logit contributions
 agrees-0.030
ours-0.028
 cleans-0.028
 safer-0.027
 wasteful-0.027
Token and
Token ID290
Feature activation+7.22e-03

Pos logit contributions
 agrees+0.002
ours+0.002
 cleans+0.001
 learns+0.001
 cooperate+0.001
Neg logit contributions
 all-0.000
 one-0.000
 in-0.000
 just-0.000
 stub-0.000
Token they
Token ID484
Feature activation+3.10e-03

Pos logit contributions
 all+0.007
 one+0.006
 stub+0.006
 in+0.006
-+0.006
Neg logit contributions
 agrees-0.018
 cleans-0.017
ours-0.017
 learns-0.016
 cooperate-0.016
Token could
Token ID714
Feature activation+6.33e-03

Pos logit contributions
-+0.004
 year+0.004
 stub+0.004
 one+0.004
 Halloween+0.004
Neg logit contributions
 agrees-0.006
 cleans-0.006
ours-0.006
 learns-0.005
 wasteful-0.005
Token not
Token ID407
Feature activation+2.11e-02

Pos logit contributions
 stub+0.007
-+0.007
 year+0.007
 all+0.007
 one+0.006
Neg logit contributions
 agrees-0.014
 cleans-0.013
ours-0.013
 cooperate-0.013
 learns-0.013
Token reach
Token ID3151
Feature activation+1.95e-03

Pos logit contributions
 stub+0.028
 year+0.026
-+0.026
 all+0.026
 pupil+0.025
Neg logit contributions
 agrees-0.042
 cleans-0.039
 cooperate-0.039
ours-0.038
 learns-0.037
Token it
Token ID340
Feature activation+1.23e-02

Pos logit contributions
 stub+0.002
 year+0.002
 pupil+0.002
-+0.002
 t+0.002
Neg logit contributions
 agrees-0.004
ours-0.004
 cleans-0.004
 wasteful-0.004
 safer-0.004
Token.
Token ID13
Feature activation+5.53e-03

Pos logit contributions
 stub+0.006
 all+0.006
-+0.005
 one+0.005
 year+0.005
Neg logit contributions
 agrees-0.035
 cleans-0.033
ours-0.033
 learns-0.032
 wasteful-0.031
Token They
Token ID1119
Feature activation-7.29e-03

Pos logit contributions
 all+0.006
 one+0.006
 just+0.005
 in+0.005
 stub+0.005
Neg logit contributions
 agrees-0.013
 cleans-0.012
ours-0.012
 learns-0.011
 cooperate-0.011
Token tried
Token ID3088
Feature activation+8.85e-04

Pos logit contributions
 agrees+0.015
 cleans+0.014
ours+0.014
 learns+0.013
 wasteful+0.013
Neg logit contributions
--0.009
 stub-0.009
 year-0.009
 Halloween-0.008
 one-0.008
Token mom
Token ID1995
Feature activation-7.36e-03

Pos logit contributions
 stub+0.006
 all+0.005
-+0.005
 ever+0.005
 year+0.005
Neg logit contributions
 agrees-0.011
 cleans-0.010
ours-0.010
 learns-0.010
 wasteful-0.009
Token heard
Token ID2982
Feature activation-7.40e-03

Pos logit contributions
 agrees+0.018
ours+0.017
 cleans+0.017
 wasteful+0.017
 learns+0.016
Neg logit contributions
 all-0.007
 in-0.006
--0.006
 just-0.006
 one-0.006
Token the
Token ID262
Feature activation+3.60e-03

Pos logit contributions
 agrees+0.018
 cleans+0.017
ours+0.017
 learns+0.016
 wasteful+0.016
Neg logit contributions
 stub-0.007
 all-0.007
--0.006
 ever-0.006
 one-0.006
Token noise
Token ID7838
Feature activation+3.83e-03

Pos logit contributions
 all+0.004
 stub+0.004
 one+0.004
-+0.004
 in+0.004
Neg logit contributions
 agrees-0.008
ours-0.008
 cleans-0.008
 learns-0.007
 wasteful-0.007
Token and
Token ID290
Feature activation+6.69e-03

Pos logit contributions
 stub+0.002
 all+0.002
 one+0.002
-+0.002
 year+0.002
Neg logit contributions
 agrees-0.011
 cleans-0.010
ours-0.010
 learns-0.009
 wasteful-0.009
Token came
Token ID1625
Feature activation+2.17e-02

Pos logit contributions
 stub+0.004
 all+0.004
-+0.004
 one+0.004
 year+0.004
Neg logit contributions
 agrees-0.018
 cleans-0.017
ours-0.017
 learns-0.016
 wasteful-0.016
Token to
Token ID284
Feature activation+4.79e-03

Pos logit contributions
 stub+0.019
 year+0.018
lf+0.017
 all+0.017
 one+0.016
Neg logit contributions
 agrees-0.053
 cleans-0.049
ours-0.048
 cooperate-0.046
 safer-0.046
Token see
Token ID766
Feature activation-1.99e-03

Pos logit contributions
 stub+0.006
 all+0.006
-+0.006
 year+0.006
 pupil+0.006
Neg logit contributions
 agrees-0.010
 cleans-0.009
 cooperate-0.009
ours-0.008
 safer-0.008
Token what
Token ID644
Feature activation+3.63e-03

Pos logit contributions
 agrees+0.004
ours+0.004
 cleans+0.003
 learns+0.003
 wasteful+0.003
Neg logit contributions
 all-0.003
 one-0.003
 in-0.003
--0.003
 just-0.003
Token happened
Token ID3022
Feature activation+1.13e-02

Pos logit contributions
 stub+0.003
 all+0.003
-+0.003
 t+0.003
 one+0.003
Neg logit contributions
 agrees-0.009
 cleans-0.008
ours-0.008
 wasteful-0.008
 learns-0.008
Token.
Token ID13
Feature activation+6.26e-03

Pos logit contributions
 stub+0.009
 all+0.008
 t+0.007
 year+0.007
 pupil+0.007
Neg logit contributions
 agrees-0.029
ours-0.027
 cleans-0.026
 learns-0.026
 cooperate-0.025
Token was
Token ID373
Feature activation+3.61e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 learns+0.001
 wasteful+0.001
Neg logit contributions
 year-0.001
 Halloween-0.001
 one-0.001
dream-0.001
 ever-0.001
Token very
Token ID845
Feature activation-1.10e-02

Pos logit contributions
 year+0.004
-+0.003
 stub+0.003
 one+0.003
 ever+0.003
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.007
 wasteful-0.007
 safer-0.007
Token sad
Token ID6507
Feature activation-4.84e-03

Pos logit contributions
 agrees+0.025
 cleans+0.023
ours+0.023
 wasteful+0.021
 learns+0.021
Neg logit contributions
 all-0.013
 stub-0.012
 one-0.012
--0.012
 year-0.012
Token and
Token ID290
Feature activation+1.16e-02

Pos logit contributions
 agrees+0.012
 cleans+0.012
ours+0.012
 learns+0.011
 wasteful+0.011
Neg logit contributions
 stub-0.004
 year-0.003
 all-0.003
 ever-0.003
 nowhere-0.003
Token didn
Token ID1422
Feature activation+2.21e-02

Pos logit contributions
-+0.012
 year+0.012
 all+0.011
 one+0.011
 ever+0.011
Neg logit contributions
 agrees-0.026
 cleans-0.025
ours-0.024
 wasteful-0.024
 learns-0.024
Token't
Token ID470
Feature activation+2.14e-02

Pos logit contributions
 stub+0.014
 all+0.014
-+0.013
 year+0.013
 one+0.013
Neg logit contributions
 agrees-0.059
 cleans-0.056
ours-0.055
 learns-0.053
 wasteful-0.052
Token know
Token ID760
Feature activation-9.13e-03

Pos logit contributions
 stub+0.040
 all+0.040
-+0.040
 year+0.040
 nowhere+0.039
Neg logit contributions
 agrees-0.031
 cleans-0.027
ours-0.025
 learns-0.025
 cooperate-0.024
Token what
Token ID644
Feature activation+1.23e-02

Pos logit contributions
 agrees+0.016
 cleans+0.014
ours+0.014
 learns+0.013
 wasteful+0.013
Neg logit contributions
 all-0.015
 stub-0.015
 one-0.015
--0.015
 in-0.014
Token to
Token ID284
Feature activation-7.57e-03

Pos logit contributions
 all+0.009
 stub+0.008
-+0.007
 year+0.007
 one+0.007
Neg logit contributions
 agrees-0.033
 cleans-0.031
ours-0.031
 learns-0.029
 wasteful-0.029
Token do
Token ID466
Feature activation+2.85e-02

Pos logit contributions
 agrees+0.013
 cleans+0.011
ours+0.011
 cooperate+0.011
 learns+0.010
Neg logit contributions
 stub-0.013
 year-0.012
 all-0.012
--0.012
 one-0.012
Token.
Token ID13
Feature activation+3.25e-03

Pos logit contributions
 stub+0.030
 armour+0.028
 wolf+0.023
ind+0.023
 miner+0.023
Neg logit contributions
 agrees-0.066
 cooperate-0.059
 learns-0.058
 cleans-0.058
ours-0.056
Token home
Token ID1363
Feature activation+3.82e-03

Pos logit contributions
 stub+0.006
 all+0.006
 nowhere+0.006
 just+0.005
 ever+0.005
Neg logit contributions
 agrees-0.009
 cleans-0.008
ours-0.008
 learns-0.008
 cooperate-0.008
Token with
Token ID351
Feature activation+1.49e-02

Pos logit contributions
 stub+0.003
 all+0.002
 one+0.002
 t+0.002
 year+0.002
Neg logit contributions
 agrees-0.010
 cleans-0.010
ours-0.010
 learns-0.009
 cooperate-0.009
Token the
Token ID262
Feature activation+9.69e-03

Pos logit contributions
 stub+0.018
-+0.015
 pupil+0.015
t+0.014
 year+0.014
Neg logit contributions
 agrees-0.032
 cleans-0.031
 safer-0.030
 learns-0.029
ours-0.028
Token fan
Token ID4336
Feature activation+1.55e-03

Pos logit contributions
 stub+0.012
-+0.011
 all+0.011
 nowhere+0.011
dd+0.011
Neg logit contributions
 agrees-0.020
 cleans-0.018
 safer-0.017
 learns-0.017
 wasteful-0.016
Tokenci
Token ID979
Feature activation+2.48e-02

Pos logit contributions
 stub+0.001
 all+0.001
 year+0.001
-+0.001
 one+0.001
Neg logit contributions
 agrees-0.004
 cleans-0.004
ours-0.004
 learns-0.004
 cooperate-0.004
Tokenest
Token ID395
Feature activation+2.60e-02

Pos logit contributions
 stub+0.023
 all+0.022
 nowhere+0.020
 in+0.019
 just+0.019
Neg logit contributions
 agrees-0.062
 cleans-0.057
ours-0.055
 learns-0.054
 cooperate-0.052
Token king
Token ID5822
Feature activation-1.30e-03

Pos logit contributions
 stub+0.032
 all+0.029
 nowhere+0.028
 barely+0.026
 pupil+0.026
Neg logit contributions
 agrees-0.056
 cleans-0.051
ours-0.048
 learns-0.047
 cooperate-0.046
Token in
Token ID287
Feature activation+1.59e-02

Pos logit contributions
 agrees+0.003
 cleans+0.003
ours+0.003
 learns+0.003
 safer+0.003
Neg logit contributions
 stub-0.001
 all-0.001
 year-0.001
 one-0.001
 t-0.001
Token the
Token ID262
Feature activation+1.37e-02

Pos logit contributions
 stub+0.023
-+0.022
t+0.021
 just+0.019
dream+0.019
Neg logit contributions
 agrees-0.027
 cleans-0.026
 learns-0.025
ours-0.024
 safer-0.024
Token land
Token ID1956
Feature activation+9.33e-03

Pos logit contributions
 stub+0.017
-+0.016
 all+0.016
 barely+0.016
 just+0.016
Neg logit contributions
 agrees-0.027
 cleans-0.025
 learns-0.025
 safer-0.024
ours-0.023
Token.
Token ID13
Feature activation-3.70e-03

Pos logit contributions
 stub+0.007
 t+0.006
 all+0.006
 pupil+0.005
 one+0.005
Neg logit contributions
 agrees-0.025
 cleans-0.023
ours-0.022
 learns-0.022
 safer-0.021
Token,
Token ID11
Feature activation-3.80e-03

Pos logit contributions
 stub+0.009
 all+0.009
-+0.008
 just+0.008
 pupil+0.008
Neg logit contributions
 agrees-0.024
ours-0.023
 cleans-0.023
 learns-0.021
 cooperate-0.021
Token there
Token ID612
Feature activation-7.00e-03

Pos logit contributions
 agrees+0.010
 cleans+0.010
ours+0.010
 wasteful+0.009
 learns+0.009
Neg logit contributions
 all-0.003
 stub-0.003
--0.002
 one-0.002
 ever-0.002
Token was
Token ID373
Feature activation-6.08e-03

Pos logit contributions
 agrees+0.016
 cleans+0.015
ours+0.015
 learns+0.014
 becomes+0.014
Neg logit contributions
 stub-0.007
 all-0.006
dream-0.006
--0.006
 pupil-0.006
Token a
Token ID257
Feature activation+7.75e-04

Pos logit contributions
 agrees+0.015
 cleans+0.014
ours+0.014
 yeast+0.013
 learns+0.013
Neg logit contributions
 stub-0.005
--0.005
 all-0.004
 ever-0.004
 year-0.004
Token little
Token ID1310
Feature activation-4.48e-03

Pos logit contributions
 all+0.001
 stub+0.001
 ever+0.001
 nowhere+0.001
 one+0.001
Neg logit contributions
 agrees-0.002
 cleans-0.002
ours-0.002
 wasteful-0.002
 safer-0.002
Token bear
Token ID6842
Feature activation+1.99e-02

Pos logit contributions
 agrees+0.009
 cleans+0.009
ours+0.009
 wasteful+0.008
 yeast+0.008
Neg logit contributions
 stub-0.006
 all-0.006
 ever-0.005
 just-0.005
 nowhere-0.005
Token named
Token ID3706
Feature activation-4.90e-03

Pos logit contributions
 nowhere+0.024
dream+0.023
 barely+0.023
 armour+0.023
 stub+0.023
Neg logit contributions
 cleans-0.033
ours-0.033
 agrees-0.033
 learns-0.032
 cooperate-0.030
Token Teddy
Token ID29345
Feature activation+1.87e-02

Pos logit contributions
ours+0.008
 cleans+0.008
 agrees+0.008
 safer+0.008
 yeast+0.007
Neg logit contributions
--0.007
 year-0.007
 ad-0.007
 stub-0.007
dream-0.006
Token.
Token ID13
Feature activation+2.44e-03

Pos logit contributions
 stub+0.019
 nowhere+0.016
dream+0.016
 barely+0.015
 year+0.015
Neg logit contributions
 agrees-0.042
 cleans-0.040
 learns-0.039
 cooperate-0.039
ours-0.037
Token Teddy
Token ID29345
Feature activation+8.84e-03

Pos logit contributions
 ever+0.003
-+0.003
 year+0.003
 stub+0.003
 nowhere+0.002
Neg logit contributions
 agrees-0.005
 cleans-0.005
ours-0.005
 learns-0.005
 safer-0.004
Token had
Token ID550
Feature activation+1.00e-02

Pos logit contributions
 stub+0.007
 all+0.007
 one+0.006
 ever+0.006
-+0.006
Neg logit contributions
 agrees-0.022
 cleans-0.021
 learns-0.020
ours-0.020
 cooperate-0.019
Token attend
Token ID5262
Feature activation-1.01e-02

Pos logit contributions
 stub+0.001
-+0.001
 all+0.001
 year+0.001
 ever+0.001
Neg logit contributions
 agrees-0.002
 cleans-0.002
ours-0.002
 cooperate-0.002
 learns-0.002
Token to
Token ID284
Feature activation+4.30e-03

Pos logit contributions
 agrees+0.025
 cleans+0.024
ours+0.023
 learns+0.023
 safer+0.022
Neg logit contributions
 stub-0.009
 ever-0.006
 all-0.006
 pupil-0.006
 t-0.006
Token other
Token ID584
Feature activation-1.83e-04

Pos logit contributions
 stub+0.005
-+0.004
 all+0.004
 year+0.004
 ever+0.004
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 learns-0.009
 cooperate-0.009
Token customers
Token ID4297
Feature activation-3.17e-03

Pos logit contributions
 agrees+0.000
 cleans+0.000
ours+0.000
 learns+0.000
 wasteful+0.000
Neg logit contributions
 stub-0.000
 all-0.000
 nowhere-0.000
 ever-0.000
 barely-0.000
Token.
Token ID13
Feature activation-3.67e-03

Pos logit contributions
 agrees+0.009
 cleans+0.008
ours+0.008
 learns+0.008
 cooperate+0.008
Neg logit contributions
 stub-0.002
 all-0.002
--0.002
 t-0.002
 one-0.001
Token Jo
Token ID5302
Feature activation+1.42e-02

Pos logit contributions
 agrees+0.009
 cleans+0.008
ours+0.008
 learns+0.008
 wasteful+0.008
Neg logit contributions
 all-0.004
 one-0.004
 stub-0.004
 just-0.003
 year-0.003
Tokenanne
Token ID21952
Feature activation-1.95e-02

Pos logit contributions
 all+0.015
 stub+0.014
 one+0.014
 Halloween+0.013
 pupil+0.013
Neg logit contributions
 agrees-0.032
 cleans-0.030
 learns-0.028
ours-0.028
 cooperate-0.026
Token and
Token ID290
Feature activation-8.98e-03

Pos logit contributions
 agrees+0.047
 cleans+0.044
ours+0.042
 learns+0.042
 cooperate+0.039
Neg logit contributions
 stub-0.018
 all-0.018
 pupil-0.016
--0.016
 one-0.016
Token Joey
Token ID26154
Feature activation-1.22e-02

Pos logit contributions
 agrees+0.020
 cleans+0.019
 learns+0.018
ours+0.018
 supervision+0.017
Neg logit contributions
 stub-0.010
 year-0.009
--0.009
 nowhere-0.008
t-0.008
Token said
Token ID531
Feature activation+4.58e-03

Pos logit contributions
 agrees+0.029
 cleans+0.027
ours+0.027
 learns+0.026
 wasteful+0.025
Neg logit contributions
 stub-0.012
 all-0.011
--0.011
 one-0.011
 year-0.011
Token goodbye
Token ID24829
Feature activation-2.05e-03

Pos logit contributions
 all+0.005
 stub+0.005
-+0.004
 one+0.004
 ever+0.004
Neg logit contributions
 agrees-0.011
 cleans-0.010
ours-0.010
 learns-0.009
 wasteful-0.009
Token the
Token ID262
Feature activation-5.52e-04

Pos logit contributions
 all+0.012
 one+0.011
 t+0.011
 in+0.011
 just+0.011
Neg logit contributions
 agrees-0.009
ours-0.008
 wasteful-0.008
 cooperate-0.008
 cleans-0.008
Token tomato
Token ID24240
Feature activation-7.62e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 wasteful+0.001
 learns+0.001
Neg logit contributions
 all-0.001
 stub-0.001
--0.001
 one-0.001
 ever-0.001
Token on
Token ID319
Feature activation+1.56e-03

Pos logit contributions
 agrees+0.022
 cleans+0.020
ours+0.020
 wasteful+0.020
 cooperate+0.019
Neg logit contributions
 all-0.005
 in-0.005
 one-0.004
 just-0.004
--0.004
Token the
Token ID262
Feature activation-8.46e-04

Pos logit contributions
 stub+0.002
 ever+0.002
 ad+0.002
 pupil+0.002
 year+0.002
Neg logit contributions
 agrees-0.003
 cleans-0.003
ours-0.003
 yeast-0.003
 learns-0.003
Token floor
Token ID4314
Feature activation+1.56e-02

Pos logit contributions
 agrees+0.002
 cleans+0.002
ours+0.002
 learns+0.001
 wasteful+0.001
Neg logit contributions
 all-0.001
 stub-0.001
 one-0.001
--0.001
 in-0.001
Token.
Token ID13
Feature activation+3.18e-03

Pos logit contributions
 all+0.007
 stub+0.007
-+0.006
 one+0.006
 year+0.005
Neg logit contributions
 agrees-0.046
 cleans-0.043
ours-0.043
 learns-0.041
 wasteful-0.041
Token He
Token ID679
Feature activation-3.21e-03

Pos logit contributions
 all+0.003
 one+0.002
 stub+0.002
 just+0.002
 in+0.002
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.008
 wasteful-0.008
 learns-0.007
Token is
Token ID318
Feature activation+4.12e-03

Pos logit contributions
 agrees+0.004
 supervision+0.004
 cooperate+0.004
 cleans+0.004
ours+0.004
Neg logit contributions
 all-0.007
 just-0.006
 t-0.006
 one-0.006
 in-0.006
Token frightened
Token ID24776
Feature activation-7.78e-03

Pos logit contributions
 all+0.004
 in+0.004
 stub+0.004
 one+0.004
 just+0.004
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 learns-0.009
 cooperate-0.009
Token and
Token ID290
Feature activation+4.85e-03

Pos logit contributions
 agrees+0.020
 cleans+0.019
ours+0.019
 wasteful+0.018
 learns+0.018
Neg logit contributions
 stub-0.006
 barely-0.005
 all-0.005
 year-0.005
 pupil-0.005
Token starts
Token ID4940
Feature activation+5.76e-03

Pos logit contributions
 all+0.007
 in+0.006
 just+0.006
 stub+0.006
 one+0.006
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 cooperate-0.009
 yeast-0.009
Token and
Token ID290
Feature activation-2.54e-03

Pos logit contributions
 agrees+0.018
ours+0.017
 cleans+0.017
 wasteful+0.017
 learns+0.016
Neg logit contributions
 all-0.004
--0.003
 one-0.003
 in-0.003
 just-0.003
Token took
Token ID1718
Feature activation+2.28e-03

Pos logit contributions
 agrees+0.006
ours+0.006
 cleans+0.006
 wasteful+0.005
 learns+0.005
Neg logit contributions
 all-0.003
 stub-0.002
 one-0.002
--0.002
 just-0.002
Token the
Token ID262
Feature activation+4.24e-03

Pos logit contributions
 stub+0.002
 year+0.002
 pupil+0.002
 nowhere+0.002
 ever+0.002
Neg logit contributions
 agrees-0.005
 cleans-0.005
ours-0.005
 learns-0.005
 safer-0.005
Token button
Token ID4936
Feature activation+5.98e-03

Pos logit contributions
 all+0.006
 stub+0.006
 ever+0.006
 just+0.005
-+0.005
Neg logit contributions
 cleans-0.008
 agrees-0.007
ours-0.007
 wasteful-0.007
 safer-0.007
Token out
Token ID503
Feature activation+3.54e-03

Pos logit contributions
 stub+0.005
 year+0.004
 all+0.004
 nowhere+0.004
 ever+0.004
Neg logit contributions
 agrees-0.015
 cleans-0.014
ours-0.014
 learns-0.014
 safer-0.014
Token of
Token ID286
Feature activation+8.77e-03

Pos logit contributions
 stub+0.003
 pupil+0.003
 all+0.002
 t+0.002
 one+0.002
Neg logit contributions
 agrees-0.009
 cleans-0.009
ours-0.008
 learns-0.008
 wasteful-0.008
Token his
Token ID465
Feature activation+5.63e-03

Pos logit contributions
 stub+0.011
-+0.010
 year+0.010
 pupil+0.010
 ever+0.010
Neg logit contributions
 cleans-0.018
 agrees-0.017
ours-0.016
 wasteful-0.016
 safer-0.016
Token mouth
Token ID5422
Feature activation+8.73e-03

Pos logit contributions
 all+0.008
 stub+0.008
 one+0.007
-+0.007
 ever+0.007
Neg logit contributions
 agrees-0.011
 cleans-0.011
ours-0.010
 learns-0.010
 wasteful-0.010
Token.
Token ID13
Feature activation-1.69e-03

Pos logit contributions
 stub+0.007
 pupil+0.006
 year+0.005
 nowhere+0.005
 all+0.005
Neg logit contributions
 agrees-0.023
 cleans-0.021
ours-0.021
 safer-0.020
 learns-0.020
Token He
Token ID679
Feature activation-4.55e-03

Pos logit contributions
 agrees+0.004
 cleans+0.004
ours+0.004
 wasteful+0.004
 learns+0.004
Neg logit contributions
 all-0.002
 one-0.001
 stub-0.001
--0.001
 just-0.001
Token felt
Token ID2936
Feature activation-7.48e-03

Pos logit contributions
 agrees+0.011
 cleans+0.010
ours+0.010
 learns+0.009
 wasteful+0.009
Neg logit contributions
 stub-0.004
 all-0.004
 year-0.004
 one-0.004
--0.004
Token their
Token ID511
Feature activation+1.34e-02

Pos logit contributions
 stub+0.004
 all+0.003
-+0.003
 ever+0.003
 one+0.003
Neg logit contributions
 agrees-0.008
 cleans-0.007
ours-0.007
 learns-0.007
 wasteful-0.007
Token favorite
Token ID4004
Feature activation+2.83e-02

Pos logit contributions
 stub+0.016
 all+0.015
-+0.014
 one+0.014
 ever+0.014
Neg logit contributions
 agrees-0.030
 cleans-0.028
ours-0.028
 learns-0.026
 wasteful-0.026
Token shows
Token ID2523
Feature activation+7.53e-03

Pos logit contributions
 stub+0.037
 all+0.035
 ever+0.034
-+0.033
 one+0.033
Neg logit contributions
 agrees-0.060
 cleans-0.055
ours-0.055
 learns-0.051
 wasteful-0.050
Token.
Token ID13
Feature activation-1.01e-02

Pos logit contributions
 stub+0.006
 all+0.004
-+0.004
 nowhere+0.004
 year+0.004
Neg logit contributions
 agrees-0.020
 cleans-0.019
ours-0.019
 learns-0.018
 cooperate-0.018
Token Sometimes
Token ID8975
Feature activation+3.86e-03

Pos logit contributions
 agrees+0.024
 cleans+0.022
 wasteful+0.021
ours+0.021
 learns+0.021
Neg logit contributions
 all-0.011
 one-0.011
 stub-0.010
 just-0.010
 in-0.010
Token they
Token ID484
Feature activation+1.51e-02

Pos logit contributions
 all+0.006
 one+0.005
-+0.005
 stub+0.005
 in+0.005
Neg logit contributions
 agrees-0.008
ours-0.007
 cleans-0.007
 wasteful-0.007
 learns-0.007
Token watched
Token ID7342
Feature activation-2.08e-03

Pos logit contributions
 stub+0.015
 all+0.014
-+0.014
 one+0.014
 year+0.013
Neg logit contributions
 agrees-0.036
 cleans-0.034
ours-0.033
 learns-0.032
 wasteful-0.031
Token cartoons
Token ID30070
Feature activation-5.30e-03

Pos logit contributions
 agrees+0.005
 cleans+0.004
ours+0.004
 wasteful+0.004
 learns+0.004
Neg logit contributions
 stub-0.002
 all-0.002
 ever-0.002
 nowhere-0.002
 year-0.002
Token,
Token ID11
Feature activation+8.62e-03

Pos logit contributions
 agrees+0.015
ours+0.015
 cleans+0.014
 wasteful+0.014
 learns+0.014
Neg logit contributions
 all-0.003
 in-0.003
 one-0.003
--0.003
 just-0.002
Token sometimes
Token ID3360
Feature activation+3.56e-03

Pos logit contributions
 all+0.009
 one+0.008
 just+0.008
 in+0.008
 t+0.008
Neg logit contributions
 agrees-0.021
 cleans-0.020
ours-0.020
 wasteful-0.018
 learns-0.018
Token they
Token ID484
Feature activation+1.54e-02

Pos logit contributions
 all+0.005
 one+0.005
 stub+0.005
 in+0.005
-+0.005
Neg logit contributions
 agrees-0.007
ours-0.007
 cleans-0.006
 wasteful-0.006
 learns-0.006
Token!
Token ID0
Feature activation-1.04e-02

Pos logit contributions
 stub+0.005
 all+0.005
 one+0.004
 t+0.004
 nowhere+0.004
Neg logit contributions
 agrees-0.013
 cleans-0.012
ours-0.012
 learns-0.011
 wasteful-0.011
Token You
Token ID921
Feature activation-7.25e-03

Pos logit contributions
 agrees+0.021
 cleans+0.019
ours+0.019
 wasteful+0.018
 learns+0.018
Neg logit contributions
 all-0.014
 stub-0.014
 one-0.014
--0.013
 just-0.013
Token spoil
Token ID20851
Feature activation-1.07e-02

Pos logit contributions
 agrees+0.013
ours+0.012
 cleans+0.012
 learns+0.011
 wasteful+0.011
Neg logit contributions
 all-0.012
 stub-0.011
 one-0.011
 just-0.011
 ever-0.011
Token the
Token ID262
Feature activation-8.28e-03

Pos logit contributions
 agrees+0.025
 cleans+0.024
ours+0.024
 learns+0.022
 wasteful+0.022
Neg logit contributions
 all-0.011
 stub-0.011
--0.010
 one-0.010
 ever-0.010
Token cr
Token ID1067
Feature activation+5.87e-03

Pos logit contributions
 agrees+0.015
 cleans+0.014
 learns+0.013
 cooperate+0.013
ours+0.013
Neg logit contributions
 stub-0.013
 all-0.012
--0.012
 nowhere-0.012
 ever-0.011
Tokenay
Token ID323
Feature activation+1.16e-02

Pos logit contributions
 stub+0.006
 all+0.005
 one+0.005
 t+0.005
 nowhere+0.005
Neg logit contributions
 agrees-0.015
ours-0.014
 cleans-0.014
 learns-0.013
 cooperate-0.013
Tokenons
Token ID684
Feature activation-8.21e-03

Pos logit contributions
 stub+0.017
 all+0.016
 nowhere+0.015
 t+0.015
 just+0.015
Neg logit contributions
 agrees-0.023
 cleans-0.021
ours-0.021
 learns-0.019
 cooperate-0.019
Token!"
Token ID2474
Feature activation+1.82e-03

Pos logit contributions
 agrees+0.020
 cleans+0.018
ours+0.018
 wasteful+0.017
 learns+0.017
Neg logit contributions
 stub-0.008
 all-0.008
 one-0.007
--0.007
 year-0.007
Token
Token ID198
Feature activation-5.85e-03

Pos logit contributions
 all+0.001
 one+0.001
 in+0.001
 just+0.001
 stub+0.001
Neg logit contributions
 agrees-0.005
ours-0.005
 cleans-0.005
 wasteful-0.005
 cooperate-0.004
Token
Token ID198
Feature activation-4.95e-03

Pos logit contributions
 agrees+0.015
ours+0.014
 cleans+0.014
 wasteful+0.014
 learns+0.013
Neg logit contributions
 all-0.005
 stub-0.004
--0.004
 one-0.004
 in-0.004
TokenTom
Token ID13787
Feature activation+3.52e-03

Pos logit contributions
 agrees+0.012
ours+0.011
 cleans+0.011
 wasteful+0.010
 learns+0.010
Neg logit contributions
--0.005
 all-0.005
 one-0.005
 stub-0.004
 in-0.004
Token mom
Token ID1995
Feature activation-1.24e-02

Pos logit contributions
 agrees+0.029
 cleans+0.027
ours+0.027
 learns+0.026
 wasteful+0.025
Neg logit contributions
 stub-0.008
 all-0.008
--0.007
 ever-0.007
 year-0.007
Token said
Token ID531
Feature activation-9.68e-04

Pos logit contributions
 agrees+0.029
ours+0.027
 cleans+0.027
 wasteful+0.026
 learns+0.026
Neg logit contributions
 all-0.013
 stub-0.012
 one-0.012
--0.012
 just-0.011
Token,
Token ID11
Feature activation-6.01e-03

Pos logit contributions
 agrees+0.002
 cleans+0.002
ours+0.002
 wasteful+0.002
 learns+0.002
Neg logit contributions
 stub-0.001
 all-0.001
 year-0.001
 pupil-0.001
--0.001
Token "
Token ID366
Feature activation-4.51e-03

Pos logit contributions
 agrees+0.014
 cleans+0.013
ours+0.012
 wasteful+0.012
 learns+0.012
Neg logit contributions
 all-0.007
 stub-0.007
--0.006
 one-0.006
 ever-0.006
TokenGood
Token ID10248
Feature activation-1.82e-03

Pos logit contributions
 agrees+0.009
 cleans+0.009
ours+0.008
 learns+0.008
 wasteful+0.008
Neg logit contributions
 all-0.006
 stub-0.006
--0.006
 one-0.006
 in-0.006
Token job
Token ID1693
Feature activation-1.22e-02

Pos logit contributions
 agrees+0.004
ours+0.004
 cleans+0.004
 wasteful+0.004
 learns+0.004
Neg logit contributions
 all-0.002
 stub-0.002
 nowhere-0.002
 t-0.002
 year-0.002
Token,
Token ID11
Feature activation-1.56e-02

Pos logit contributions
 agrees+0.033
 cleans+0.031
ours+0.031
 learns+0.030
 wasteful+0.030
Neg logit contributions
 all-0.009
--0.008
 stub-0.008
 one-0.008
 in-0.007
Token Lily
Token ID20037
Feature activation-3.59e-03

Pos logit contributions
 agrees+0.036
ours+0.033
 cleans+0.033
 learns+0.032
 wasteful+0.031
Neg logit contributions
 all-0.018
 one-0.017
 just-0.017
 in-0.016
--0.016
Token!
Token ID0
Feature activation-1.68e-02

Pos logit contributions
 agrees+0.010
 cleans+0.009
ours+0.009
 learns+0.009
 wasteful+0.008
Neg logit contributions
 stub-0.003
 all-0.003
 one-0.002
 year-0.002
 ever-0.002
Token Now
Token ID2735
Feature activation+2.69e-03

Pos logit contributions
 agrees+0.033
 cleans+0.029
ours+0.029
 wasteful+0.028
 learns+0.027
Neg logit contributions
 all-0.024
 stub-0.024
 one-0.024
 ever-0.023
 just-0.023
Token we
Token ID356
Feature activation+4.33e-03

Pos logit contributions
 all+0.003
 stub+0.003
 one+0.003
-+0.003
 just+0.003
Neg logit contributions
 agrees-0.006
 cleans-0.005
ours-0.005
 wasteful-0.005
 learns-0.005
Token Ben
Token ID3932
Feature activation-2.63e-03

Pos logit contributions
 agrees+0.021
 cleans+0.020
ours+0.020
 wasteful+0.019
 learns+0.019
Neg logit contributions
 all-0.014
 one-0.014
 stub-0.014
 just-0.013
 in-0.013
Token says
Token ID1139
Feature activation-1.05e-04

Pos logit contributions
 agrees+0.002
ours+0.002
 wasteful+0.002
 cleans+0.002
 yeast+0.002
Neg logit contributions
 all-0.006
 one-0.006
 just-0.006
 in-0.006
 t-0.006
Token.
Token ID13
Feature activation-1.44e-03

Pos logit contributions
 agrees+0.000
 cleans+0.000
ours+0.000
 cooperate+0.000
 learns+0.000
Neg logit contributions
 stub-0.000
 all-0.000
 year-0.000
 nowhere-0.000
 pupil-0.000
Token He
Token ID679
Feature activation-1.09e-02

Pos logit contributions
 agrees+0.003
ours+0.003
 cleans+0.003
 learns+0.003
 wasteful+0.003
Neg logit contributions
 all-0.002
 stub-0.002
--0.002
 ever-0.002
 year-0.002
Token tries
Token ID8404
Feature activation-5.30e-03

Pos logit contributions
 agrees+0.014
 cleans+0.012
ours+0.012
 wasteful+0.011
 cooperate+0.011
Neg logit contributions
 all-0.023
 one-0.022
--0.022
 stub-0.022
 just-0.022
Token to
Token ID284
Feature activation-1.78e-02

Pos logit contributions
 agrees+0.016
ours+0.015
 cleans+0.015
 wasteful+0.014
 learns+0.014
Neg logit contributions
 all-0.003
 one-0.003
--0.002
 stub-0.002
 in-0.002
Token lift
Token ID10303
Feature activation-1.49e-02

Pos logit contributions
 agrees+0.031
 cooperate+0.028
 cleans+0.028
 safer+0.025
 learns+0.025
Neg logit contributions
 stub-0.028
 year-0.027
dream-0.027
 pupil-0.026
--0.026
Token the
Token ID262
Feature activation-1.80e-02

Pos logit contributions
 agrees+0.033
ours+0.030
 cleans+0.030
 learns+0.028
 safer+0.028
Neg logit contributions
 stub-0.018
 year-0.017
 pupil-0.016
 ever-0.015
dream-0.015
Token rock
Token ID3881
Feature activation+1.77e-03

Pos logit contributions
 agrees+0.039
 cleans+0.036
ours+0.036
 learns+0.034
 wasteful+0.033
Neg logit contributions
 stub-0.020
 ever-0.020
 year-0.020
--0.019
 all-0.019
Token,
Token ID11
Feature activation-1.27e-02

Pos logit contributions
 stub+0.001
 all+0.001
 year+0.001
 nowhere+0.001
-+0.001
Neg logit contributions
 agrees-0.005
 cleans-0.004
ours-0.004
 learns-0.004
 wasteful-0.004
Token but
Token ID475
Feature activation-8.11e-03

Pos logit contributions
 agrees+0.036
ours+0.034
 cleans+0.034
 learns+0.032
 cooperate+0.032
Neg logit contributions
 all-0.009
 one-0.008
 just-0.008
 in-0.008
 stub-0.007
Token sorry
Token ID7926
Feature activation-8.38e-03

Pos logit contributions
 stub+0.014
-+0.014
 one+0.014
 ever+0.014
 all+0.013
Neg logit contributions
 agrees-0.022
 cleans-0.021
ours-0.020
 wasteful-0.019
 safer-0.019
Token.
Token ID13
Feature activation+6.09e-03

Pos logit contributions
 agrees+0.027
 cleans+0.025
ours+0.025
 wasteful+0.024
 learns+0.024
Neg logit contributions
 all-0.003
 in-0.002
 one-0.002
--0.002
 just-0.002
Token They
Token ID1119
Feature activation-4.12e-03

Pos logit contributions
 all+0.010
 in+0.010
 one+0.010
 just+0.010
-+0.010
Neg logit contributions
 agrees-0.011
ours-0.010
 cleans-0.010
 supervision-0.009
 wasteful-0.009
Token take
Token ID1011
Feature activation-8.68e-03

Pos logit contributions
 agrees+0.008
ours+0.008
 cleans+0.007
 wasteful+0.007
 yeast+0.007
Neg logit contributions
 all-0.006
 just-0.006
 t-0.006
 in-0.006
 one-0.006
Token Mia
Token ID32189
Feature activation+9.23e-03

Pos logit contributions
 agrees+0.019
 cleans+0.017
ours+0.017
 wasteful+0.016
 learns+0.016
Neg logit contributions
 all-0.011
 stub-0.010
 one-0.010
--0.010
 in-0.010
Token and
Token ID290
Feature activation-8.73e-03

Pos logit contributions
 all+0.007
 stub+0.007
 one+0.006
 year+0.006
-+0.006
Neg logit contributions
 agrees-0.024
 cleans-0.023
ours-0.022
 learns-0.021
 wasteful-0.021
Token Tim
Token ID5045
Feature activation-1.31e-03

Pos logit contributions
 agrees+0.019
ours+0.018
 cleans+0.017
 wasteful+0.017
 learns+0.017
Neg logit contributions
 all-0.011
 one-0.011
 t-0.011
 just-0.010
 in-0.010
Token home
Token ID1363
Feature activation-3.15e-03

Pos logit contributions
 agrees+0.004
ours+0.004
 cleans+0.003
 wasteful+0.003
 learns+0.003
Neg logit contributions
 all-0.001
 in-0.001
--0.001
 one-0.001
 just-0.001
Token.
Token ID13
Feature activation+4.03e-03

Pos logit contributions
 agrees+0.010
ours+0.009
 cleans+0.009
 wasteful+0.009
 learns+0.009
Neg logit contributions
 all-0.001
--0.001
 in-0.001
 one-0.001
 stub-0.001
Token They
Token ID1119
Feature activation-7.84e-03

Pos logit contributions
 all+0.006
 in+0.006
 one+0.006
 just+0.006
 with+0.006
Neg logit contributions
 agrees-0.008
ours-0.007
 cleans-0.007
 supervision-0.007
 wasteful-0.007
Token put
Token ID1234
Feature activation-1.50e-03

Pos logit contributions
 agrees+0.016
ours+0.015
 cleans+0.015
 wasteful+0.014
 learns+0.014
Neg logit contributions
 all-0.011
 just-0.010
 t-0.010
 in-0.010
 one-0.010
Token hand
Token ID1021
Feature activation+8.46e-03

Pos logit contributions
 agrees+0.035
 cleans+0.031
ours+0.031
 learns+0.030
 cooperate+0.029
Neg logit contributions
 stub-0.018
 all-0.017
 year-0.017
 nowhere-0.016
 ever-0.016
Token and
Token ID290
Feature activation+3.66e-03

Pos logit contributions
 stub+0.005
 all+0.005
 year+0.004
 nowhere+0.004
 ever+0.004
Neg logit contributions
 agrees-0.024
 cleans-0.022
ours-0.022
 learns-0.021
 wasteful-0.021
Token pur
Token ID1308
Feature activation-2.08e-03

Pos logit contributions
 all+0.004
 stub+0.004
 one+0.004
-+0.004
 in+0.004
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.008
 learns-0.007
 wasteful-0.007
Tokenred
Token ID445
Feature activation+5.58e-04

Pos logit contributions
 agrees+0.005
 cleans+0.005
ours+0.005
 learns+0.004
 cooperate+0.004
Neg logit contributions
 all-0.002
 stub-0.002
 one-0.002
 nowhere-0.002
 just-0.002
Token.
Token ID13
Feature activation-2.75e-03

Pos logit contributions
 stub+0.000
lf+0.000
 nowhere+0.000
 year+0.000
 pupil+0.000
Neg logit contributions
 agrees-0.001
 cleans-0.001
ours-0.001
 learns-0.001
 safer-0.001
Token
Token ID198
Feature activation-8.02e-03

Pos logit contributions
 agrees+0.007
ours+0.006
 cleans+0.006
 wasteful+0.006
 cooperate+0.006
Neg logit contributions
 all-0.003
 stub-0.002
 one-0.002
 in-0.002
 just-0.002
Token
Token ID198
Feature activation-5.63e-03

Pos logit contributions
 agrees+0.022
ours+0.020
 cleans+0.020
 wasteful+0.020
 learns+0.019
Neg logit contributions
 all-0.006
 stub-0.005
 in-0.005
 one-0.005
--0.005
TokenL
Token ID43
Feature activation+7.60e-03

Pos logit contributions
 agrees+0.016
ours+0.015
 cleans+0.015
 supervision+0.014
 learns+0.014
Neg logit contributions
 all-0.003
 one-0.003
 in-0.003
 stub-0.003
--0.003
Tokenily
Token ID813
Feature activation-9.11e-03

Pos logit contributions
-+0.007
 all+0.007
 in+0.006
 ever+0.006
 just+0.006
Neg logit contributions
 agrees-0.019
 cleans-0.018
ours-0.018
 wasteful-0.017
 learns-0.017
Token hugged
Token ID41130
Feature activation-1.59e-03

Pos logit contributions
 agrees+0.022
 cleans+0.021
ours+0.020
 learns+0.019
 wasteful+0.019
Neg logit contributions
 all-0.009
 stub-0.009
 one-0.008
--0.008
 ever-0.008
Token the
Token ID262
Feature activation-1.09e-02

Pos logit contributions
 agrees+0.003
 cleans+0.003
 learns+0.003
ours+0.003
 safer+0.003
Neg logit contributions
 year-0.002
 stub-0.002
dream-0.002
 ever-0.002
 miner-0.002
Token say
Token ID910
Feature activation+4.31e-03

Pos logit contributions
ours+0.013
 agrees+0.012
 wasteful+0.012
 cleans+0.012
 supervision+0.011
Neg logit contributions
 all-0.011
 in-0.010
 just-0.010
 t-0.010
 one-0.009
Token,
Token ID11
Feature activation+9.97e-03

Pos logit contributions
 all+0.002
 in+0.002
 one+0.002
-+0.001
 with+0.001
Neg logit contributions
 agrees-0.013
ours-0.013
 wasteful-0.013
 cleans-0.013
 learns-0.012
Token "
Token ID366
Feature activation+1.83e-03

Pos logit contributions
 all+0.010
 one+0.009
 in+0.009
 stub+0.009
-+0.009
Neg logit contributions
 agrees-0.025
 cleans-0.023
ours-0.023
 wasteful-0.022
 learns-0.022
TokenOkay
Token ID16454
Feature activation-4.28e-04

Pos logit contributions
 one+0.003
 ever+0.002
 stub+0.002
-+0.002
 just+0.002
Neg logit contributions
 agrees-0.004
ours-0.004
 cleans-0.003
 learns-0.003
 wasteful-0.003
Token,
Token ID11
Feature activation+7.94e-04

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 learns+0.001
 wasteful+0.001
Neg logit contributions
 stub-0.000
 all-0.000
--0.000
 year-0.000
 ever-0.000
Token Mom
Token ID11254
Feature activation-1.31e-02

Pos logit contributions
 all+0.001
 one+0.001
 just+0.000
 stub+0.000
 t+0.000
Neg logit contributions
 agrees-0.002
 cleans-0.002
ours-0.002
 learns-0.002
 wasteful-0.002
Token.
Token ID13
Feature activation-7.81e-03

Pos logit contributions
 agrees+0.039
 cleans+0.037
ours+0.037
 wasteful+0.036
 learns+0.035
Neg logit contributions
 all-0.006
 one-0.005
--0.005
 in-0.005
 just-0.005
Token We
Token ID775
Feature activation-1.00e-02

Pos logit contributions
 agrees+0.014
ours+0.012
 wasteful+0.012
 cleans+0.012
 learns+0.012
Neg logit contributions
 one-0.013
 ever-0.012
 all-0.012
 just-0.012
 in-0.012
Token will
Token ID481
Feature activation+2.03e-03

Pos logit contributions
 agrees+0.016
ours+0.015
 cleans+0.014
 wasteful+0.013
 learns+0.013
Neg logit contributions
 all-0.018
 one-0.017
 just-0.017
 stub-0.017
 ever-0.017
Token share
Token ID2648
Feature activation-1.60e-03

Pos logit contributions
 all+0.003
 stub+0.003
 one+0.003
-+0.003
 just+0.002
Neg logit contributions
 agrees-0.004
 cleans-0.004
ours-0.004
 learns-0.004
 wasteful-0.004
Token.
Token ID13
Feature activation-6.08e-03

Pos logit contributions
 agrees+0.004
ours+0.003
 cleans+0.003
 wasteful+0.003
 learns+0.003
Neg logit contributions
 all-0.002
 one-0.002
 in-0.002
--0.002
 just-0.002
Token sad
Token ID6507
Feature activation-9.90e-03

Pos logit contributions
 agrees+0.008
ours+0.008
 cleans+0.008
 supervision+0.007
 yeast+0.007
Neg logit contributions
 all-0.004
 just-0.004
 in-0.004
 one-0.004
 with-0.004
Token."
Token ID526
Feature activation+3.79e-03

Pos logit contributions
 agrees+0.028
ours+0.027
 cleans+0.027
 wasteful+0.026
 learns+0.025
Neg logit contributions
 all-0.006
 in-0.005
 one-0.005
--0.005
 just-0.005
Token
Token ID198
Feature activation-6.54e-03

Pos logit contributions
 all+0.003
 stub+0.003
 one+0.003
 in+0.003
-+0.003
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 wasteful-0.009
 learns-0.009
Token
Token ID198
Feature activation-5.93e-03

Pos logit contributions
 agrees+0.019
ours+0.018
 cleans+0.017
 wasteful+0.017
 learns+0.017
Neg logit contributions
 all-0.004
 stub-0.003
 in-0.003
 one-0.003
--0.003
TokenAnna
Token ID31160
Feature activation-8.56e-03

Pos logit contributions
 agrees+0.013
ours+0.013
 cleans+0.012
 wasteful+0.012
 supervision+0.011
Neg logit contributions
 one-0.007
 year-0.007
 with-0.007
 in-0.006
--0.006
Token and
Token ID290
Feature activation-3.03e-02

Pos logit contributions
 wasteful+0.024
ours+0.024
 supervision+0.024
 agrees+0.023
 safer+0.023
Neg logit contributions
 all-0.004
 just-0.004
 in-0.004
 one-0.004
--0.003
Token Ben
Token ID3932
Feature activation-8.87e-03

Pos logit contributions
 agrees+0.067
ours+0.062
 cleans+0.061
 wasteful+0.059
 cooperate+0.057
Neg logit contributions
 all-0.039
 one-0.038
 t-0.036
 in-0.035
 stub-0.035
Token look
Token ID804
Feature activation-3.70e-03

Pos logit contributions
ours+0.017
 agrees+0.016
 cleans+0.015
 wasteful+0.015
 yeast+0.015
Neg logit contributions
 all-0.015
 just-0.013
 in-0.013
 one-0.013
 t-0.013
Token at
Token ID379
Feature activation+2.47e-03

Pos logit contributions
 agrees+0.012
 cleans+0.011
ours+0.011
 cooperate+0.011
 yeast+0.011
Neg logit contributions
 all-0.001
 in-0.001
--0.001
 one-0.001
 just-0.001
Token each
Token ID1123
Feature activation+7.80e-04

Pos logit contributions
 all+0.002
 one+0.002
 in+0.002
 just+0.002
 t+0.002
Neg logit contributions
 agrees-0.006
ours-0.006
 cleans-0.006
 wasteful-0.006
 learns-0.006
Token other
Token ID584
Feature activation-6.28e-03

Pos logit contributions
 all+0.001
 one+0.001
 in+0.001
-+0.001
 stub+0.001
Neg logit contributions
 agrees-0.002
ours-0.002
 cleans-0.002
 wasteful-0.001
 learns-0.001
Token mom
Token ID1995
Feature activation-7.49e-03

Pos logit contributions
dd+0.003
eb+0.003
 ever+0.003
Frank+0.003
de+0.003
Neg logit contributions
 cleans-0.005
 agrees-0.005
 learns-0.004
ours-0.004
 wasteful-0.004
Tokenmy
Token ID1820
Feature activation-7.03e-03

Pos logit contributions
 agrees+0.015
 cleans+0.014
ours+0.014
 learns+0.013
 wasteful+0.013
Neg logit contributions
 stub-0.010
 all-0.009
 Halloween-0.009
 year-0.009
 ever-0.009
Token showed
Token ID3751
Feature activation+4.51e-03

Pos logit contributions
 agrees+0.015
 cleans+0.014
ours+0.013
 learns+0.013
 wasteful+0.012
Neg logit contributions
 stub-0.009
 year-0.008
 all-0.008
 nowhere-0.008
 ever-0.008
Token her
Token ID607
Feature activation-3.78e-03

Pos logit contributions
 stub+0.004
 all+0.003
 ever+0.003
 year+0.003
-+0.003
Neg logit contributions
 agrees-0.012
 cleans-0.011
ours-0.011
 learns-0.010
 wasteful-0.010
Token how
Token ID703
Feature activation-8.54e-03

Pos logit contributions
 agrees+0.008
 cleans+0.007
ours+0.007
 wasteful+0.007
 learns+0.007
Neg logit contributions
 all-0.005
 one-0.005
 stub-0.005
--0.005
 in-0.004
Token to
Token ID284
Feature activation-3.18e-02

Pos logit contributions
 agrees+0.023
 cleans+0.022
ours+0.022
 wasteful+0.021
 learns+0.021
Neg logit contributions
 all-0.006
 stub-0.006
 one-0.005
--0.005
 year-0.005
Token use
Token ID779
Feature activation-5.86e-03

Pos logit contributions
 agrees+0.065
 cleans+0.060
ours+0.059
 learns+0.056
 cooperate+0.056
Neg logit contributions
 stub-0.042
 all-0.041
--0.040
 year-0.039
 ever-0.039
Token just
Token ID655
Feature activation-9.61e-03

Pos logit contributions
 agrees+0.014
 cleans+0.013
ours+0.013
 learns+0.012
 wasteful+0.012
Neg logit contributions
 stub-0.006
 year-0.006
 all-0.006
 ever-0.005
--0.005
Token a
Token ID257
Feature activation-7.29e-03

Pos logit contributions
 agrees+0.026
 cleans+0.024
ours+0.024
 wasteful+0.023
 learns+0.023
Neg logit contributions
 all-0.007
 one-0.007
 stub-0.007
--0.006
 in-0.006
Token little
Token ID1310
Feature activation-4.92e-03

Pos logit contributions
 agrees+0.014
 cleans+0.014
ours+0.013
 wasteful+0.013
 learns+0.012
Neg logit contributions
 all-0.009
 ever-0.009
 stub-0.009
 nowhere-0.009
 in-0.009
Token bit
Token ID1643
Feature activation-1.31e-02

Pos logit contributions
 agrees+0.011
 cleans+0.010
ours+0.010
 learns+0.010
 wasteful+0.010
Neg logit contributions
 all-0.006
 stub-0.005
--0.005
 one-0.005
 ever-0.005
Token"
Token ID1
Feature activation+5.67e-03

Pos logit contributions
 agrees+0.008
ours+0.007
 cleans+0.007
 supervision+0.007
 wasteful+0.006
Neg logit contributions
 with-0.003
 in-0.003
--0.003
 all-0.003
 year-0.003
TokenGood
Token ID10248
Feature activation+3.96e-03

Pos logit contributions
 all+0.007
 one+0.007
 ever+0.007
 stub+0.007
 just+0.006
Neg logit contributions
 agrees-0.013
ours-0.012
 cleans-0.012
 wasteful-0.011
 learns-0.011
Token job
Token ID1693
Feature activation-1.15e-02

Pos logit contributions
 stub+0.004
 all+0.004
 nowhere+0.004
 year+0.004
 t+0.004
Neg logit contributions
 agrees-0.009
ours-0.009
 cleans-0.009
 learns-0.008
 cooperate-0.008
Token,
Token ID11
Feature activation-1.40e-02

Pos logit contributions
 agrees+0.033
 cleans+0.031
ours+0.031
 learns+0.030
 wasteful+0.030
Neg logit contributions
 all-0.007
 one-0.006
--0.006
 in-0.006
 ever-0.005
Token Lily
Token ID20037
Feature activation-8.59e-03

Pos logit contributions
 agrees+0.032
ours+0.032
 cleans+0.032
 wasteful+0.029
 learns+0.029
Neg logit contributions
 all-0.018
 one-0.016
 just-0.015
 in-0.014
 t-0.014
Token and
Token ID290
Feature activation-3.11e-02

Pos logit contributions
 agrees+0.022
 cleans+0.020
 learns+0.019
ours+0.019
 wasteful+0.018
Neg logit contributions
 stub-0.008
 all-0.007
 nowhere-0.007
 pupil-0.007
 t-0.007
Token Tom
Token ID4186
Feature activation-2.05e-03

Pos logit contributions
 agrees+0.075
ours+0.070
 cleans+0.070
 cooperate+0.065
 wasteful+0.065
Neg logit contributions
 all-0.035
 one-0.033
 in-0.030
 just-0.029
 t-0.029
Token!"
Token ID2474
Feature activation-6.84e-03

Pos logit contributions
 agrees+0.005
ours+0.005
 cleans+0.005
 wasteful+0.005
 learns+0.005
Neg logit contributions
 all-0.002
--0.002
 one-0.002
 in-0.002
 stub-0.002
Token Mom
Token ID11254
Feature activation-5.19e-03

Pos logit contributions
 agrees+0.017
ours+0.017
 cleans+0.016
 wasteful+0.016
 supervision+0.015
Neg logit contributions
 all-0.007
 one-0.006
 in-0.006
 just-0.006
 t-0.006
Token says
Token ID1139
Feature activation+2.54e-03

Pos logit contributions
 wasteful+0.007
 agrees+0.007
ours+0.007
 cleans+0.007
 cooperate+0.006
Neg logit contributions
 all-0.010
 in-0.010
 one-0.010
 just-0.010
 t-0.010
Token.
Token ID13
Feature activation+6.86e-03

Pos logit contributions
 all+0.001
-+0.001
 in+0.001
 one+0.001
 ever+0.001
Neg logit contributions
 agrees-0.008
ours-0.007
 cleans-0.007
 wasteful-0.007
 learns-0.007
Token house
Token ID2156
Feature activation-3.76e-03

Pos logit contributions
 all+0.006
 one+0.005
 in+0.005
 t+0.005
 stub+0.005
Neg logit contributions
 agrees-0.012
ours-0.010
 cleans-0.010
 wasteful-0.010
 cooperate-0.010
Token.
Token ID13
Feature activation+1.58e-03

Pos logit contributions
 agrees+0.011
 cleans+0.010
ours+0.010
 learns+0.010
 wasteful+0.010
Neg logit contributions
 all-0.002
 stub-0.002
--0.001
 one-0.001
 ever-0.001
Token They
Token ID1119
Feature activation-1.18e-02

Pos logit contributions
 all+0.002
 in+0.002
 one+0.002
 with+0.002
 t+0.002
Neg logit contributions
 agrees-0.003
 cleans-0.003
ours-0.003
 supervision-0.003
 wasteful-0.003
Token wash
Token ID13502
Feature activation+6.05e-04

Pos logit contributions
 agrees+0.027
ours+0.025
 cleans+0.025
 wasteful+0.024
 learns+0.023
Neg logit contributions
 all-0.014
 t-0.012
 in-0.012
 just-0.012
 one-0.012
Token the
Token ID262
Feature activation+3.71e-03

Pos logit contributions
 all+0.001
 in+0.000
 one+0.000
-+0.000
 just+0.000
Neg logit contributions
 agrees-0.002
 cleans-0.001
ours-0.001
 wasteful-0.001
 learns-0.001
Token beans
Token ID16567
Feature activation-3.17e-02

Pos logit contributions
 all+0.005
 stub+0.004
 one+0.004
 in+0.004
-+0.004
Neg logit contributions
 agrees-0.008
 cleans-0.008
ours-0.008
 wasteful-0.007
 learns-0.007
Token and
Token ID290
Feature activation+8.08e-04

Pos logit contributions
 agrees+0.100
 cleans+0.095
ours+0.095
 wasteful+0.093
 learns+0.091
Neg logit contributions
 all-0.015
 in-0.013
 one-0.009
 with-0.009
--0.008
Token put
Token ID1234
Feature activation+4.24e-03

Pos logit contributions
 all+0.001
 in+0.001
 one+0.001
 t+0.001
 just+0.001
Neg logit contributions
 agrees-0.002
ours-0.002
 cleans-0.002
 wasteful-0.002
oke-0.002
Token them
Token ID606
Feature activation-1.65e-02

Pos logit contributions
 stub+0.004
 all+0.004
-+0.004
 year+0.004
 one+0.004
Neg logit contributions
 agrees-0.010
 cleans-0.009
ours-0.009
 learns-0.009
 wasteful-0.009
Token in
Token ID287
Feature activation-5.44e-04

Pos logit contributions
 agrees+0.053
 cleans+0.050
ours+0.050
 learns+0.048
 wasteful+0.048
Neg logit contributions
 all-0.003
 stub-0.002
 one-0.002
--0.002
 in-0.002
Token a
Token ID257
Feature activation-1.05e-03

Pos logit contributions
 agrees+0.001
 cleans+0.001
ours+0.001
 learns+0.001
 wasteful+0.001
Neg logit contributions
 all-0.001
 stub-0.000
 one-0.000
--0.000
 in-0.000
Token and
Token ID290
Feature activation-1.65e-03

Pos logit contributions
 agrees+0.030
 cleans+0.028
ours+0.027
 safer+0.027
 learns+0.027
Neg logit contributions
 stub-0.011
 pupil-0.009
 year-0.009
 t-0.008
lf-0.008
Token it
Token ID340
Feature activation+4.68e-03

Pos logit contributions
 agrees+0.004
 cleans+0.003
ours+0.003
 learns+0.003
 wasteful+0.003
Neg logit contributions
 all-0.002
 stub-0.002
--0.002
 one-0.002
 year-0.002
Token kept
Token ID4030
Feature activation-1.59e-03

Pos logit contributions
 stub+0.004
 all+0.004
 one+0.004
-+0.004
 year+0.004
Neg logit contributions
 agrees-0.011
 cleans-0.011
ours-0.010
 learns-0.010
 wasteful-0.010
Token her
Token ID607
Feature activation-1.19e-02

Pos logit contributions
 cleans+0.003
 safer+0.003
 agrees+0.003
 wasteful+0.003
ours+0.003
Neg logit contributions
 stub-0.002
 pupil-0.002
 year-0.002
t-0.002
lf-0.002
Token warm
Token ID5814
Feature activation-2.27e-03

Pos logit contributions
 agrees+0.026
 cleans+0.024
ours+0.023
 safer+0.022
 learns+0.022
Neg logit contributions
 stub-0.014
 all-0.013
--0.013
 year-0.012
 one-0.012
Token even
Token ID772
Feature activation-3.20e-02

Pos logit contributions
 agrees+0.005
 cleans+0.005
 safer+0.005
ours+0.005
 learns+0.004
Neg logit contributions
 stub-0.002
 pupil-0.002
lf-0.002
 nowhere-0.002
 all-0.002
Token when
Token ID618
Feature activation+2.82e-03

Pos logit contributions
 agrees+0.057
ours+0.054
 cleans+0.051
 cooperate+0.047
 learns+0.046
Neg logit contributions
 in-0.057
 all-0.056
 one-0.056
--0.054
 just-0.054
Token it
Token ID340
Feature activation+8.10e-03

Pos logit contributions
 all+0.003
 stub+0.003
-+0.003
 one+0.003
 year+0.003
Neg logit contributions
 agrees-0.007
 cleans-0.006
ours-0.006
 learns-0.006
 wasteful-0.006
Token was
Token ID373
Feature activation+1.29e-02

Pos logit contributions
 all+0.007
 stub+0.007
 one+0.007
-+0.007
 year+0.006
Neg logit contributions
 agrees-0.020
 cleans-0.019
ours-0.019
 learns-0.018
 wasteful-0.018
Token cold
Token ID4692
Feature activation+1.96e-04

Pos logit contributions
 pupil+0.017
-+0.016
 stub+0.016
 year+0.015
 ever+0.015
Neg logit contributions
 cleans-0.024
 agrees-0.024
 safer-0.023
ours-0.022
 wasteful-0.022
Token outside
Token ID2354
Feature activation+9.93e-03

Pos logit contributions
 stub+0.000
 pupil+0.000
 all+0.000
 year+0.000
-+0.000
Neg logit contributions
 agrees-0.001
 cleans-0.001
ours-0.001
 learns-0.000
 wasteful-0.000