TOP ACTIVATIONS
MAX = 0.071

What made this mess , Mom my ?" he asked .
climb and hard for grown ups , but not for Bob
, and only grown - ups could use them . Tom
hard sweet heart , but Mom my needs to help organize
afternoon and had had lots of fun learning about the computer
so heavy that two grown ups had to carry it .
, " Care ful , Mom my !" Mom
" Y ummy beef ! Mom , can I have more
he could be and never mis beh aved . <|endoftext|> Once
the park . When they arrived , they saw a new
, " That 's what moms are for , sweet ie
It is for grown - ups . It can make you
it and said , " Mom my , can I go
a new doll ! Mom o was so excited when
. When she arrived , the teacher taught them
can 't I go , Mom my ?" Mom
I play it today , Mom my ?" Mom
show her . When she arrived , Grand pa said ,
excited ! But when they arrived , the place was very
. When they arrived they unloaded the car and
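Each TOP ACTIVATIONS entry above is an 11-token window with the peak token in the middle. For example, ' Mom' at +7.06e-02 is the sixth of the eleven tokens in the first line. The MIN ACTIVATIONS list below applies the same idea to the most negative values. The following is a minimal sketch of that selection. It assumes a single flat token sequence with one activation per token. `top_contexts` and its `before`/`after` defaults are hypothetical choices that match the observed window size. A real dashboard may also deduplicate overlapping windows or search across many separate sequences.

```python
from typing import List, Tuple


def top_contexts(tokens: List[str], acts: List[float], k: int,
                 before: int = 5, after: int = 5,
                 largest: bool = True) -> List[Tuple[float, List[str]]]:
    """Return the k most extreme activations, each with its surrounding tokens.

    largest=True ranks from the highest activation down (TOP list);
    largest=False ranks from the lowest up (MIN list).
    """
    if len(tokens) != len(acts):
        raise ValueError("tokens and acts must be aligned one-to-one")
    # Positions sorted by activation; sorted() is stable, so ties keep their order.
    order = sorted(range(len(acts)), key=lambda i: acts[i], reverse=largest)
    results = []
    for i in order[:k]:
        lo = max(0, i - before)                # clip the window at the sequence start
        hi = min(len(tokens), i + after + 1)   # ...and at the sequence end
        results.append((acts[i], tokens[lo:hi]))
    return results


# Tokens and activations copied from the first example's per-token detail.
tokens = ["What", " made", " this", " mess", ",", " Mom", "my", '?"', " he", " asked", "."]
acts = [-9.66e-03, -2.91e-05, -1.29e-02, 1.44e-02, 1.95e-02, 7.06e-02,
        3.07e-02, 2.72e-02, -1.15e-02, 6.59e-03, -1.79e-02]

print(top_contexts(tokens, acts, k=1, before=2, after=2))
# [(0.0706, [' mess', ',', ' Mom', 'my', '?"'])]
print(top_contexts(tokens, acts, k=1, before=1, after=1, largest=False))
# [(-0.0179, [' asked', '.'])]
```

With the default `before=5, after=5`, the peak at ' Mom' returns the whole 11-token line, which matches the window width seen in the lists.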

MIN ACTIVATIONS
MIN = -0.083

you ?" B run o said , " Of
She takes out a magn ifying glass . She looks at
the necklace and the magn ifying glass and started to examine
He took out a magn ifying glass and a notebook and
When Daisy put the magn ifying glass down , the cater
She took out a magn ifying glass and looked very carefully
iley laughed at the nos y dove and then continued on
there were two cats , Tig gy and Snow y .
the watch with the magn ifying glass . He saw the
, handing him the magn ifying glass . She looked around
ara and Tom run to Rocky . They hope he is
M am ma sc run ched up all of the
said , holding the magn ifying glass over a black ant
cried , dropping the magn ifying glass . "
Ex amine the Tent â and he decided
toy had fallen behind the book case . He reached in
the necklace and the magn ifying glass and told him that
a big camera on a tripod . It belongs to a
Mom looked at Mr . Jones . Mr . Jones
they sw am back up river and soon they were back

INTERVAL 0.032 to 0.071
CONTAINS 1.820%

s and seaw eed . Then we can pretend to play
. The sun flower was so proud of
" Let 's go take a closer look ,"
so she wouldn 't get her dress dirty . Lily thought
in the morning . The sun was just starting to come

INTERVAL -0.006 to 0.032
CONTAINS 65.230%

ball too hard and it rolled away into the bushes .
. But she was relieved too , that they had only
high ! You will fall !" Bob o was
was so happy , but he was now too nervous to
listened to Red and they never got into a wreck again

INTERVAL -0.044 to -0.006
CONTAINS 32.796%

then shouted , " I 'm going for a dive !"
easier for them to swim around . The two friends were
. They have fun . They make a new friend .
in the sky . <|endoftext|> A little girl was playing at
knob , but Sam was determined . He ran up to

INTERVAL -0.083 to -0.044
CONTAINS 0.155%

to drink from the pond every day . Then one day
a big bubble near the ceiling . He thought , "
to play with the boss y girl . They wanted to
their own ending to the story . L ily
They put the cap on Lily 's head and laughed
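The four INTERVAL headings split the full range from MIN (-0.083) to MAX (0.071) into equal-width bins of about 0.0385 each, which reproduces the printed edges to within rounding. Each CONTAINS figure is presumably the share of sampled token activations that falls in that bin. The four shares add up to 100% within rounding (1.820 + 65.230 + 32.796 + 0.155 = 100.001). The following is a minimal sketch of that bookkeeping. It assumes all activations are available as one flat list. `interval_shares` is a hypothetical helper. Putting the maximum value into the top bin, rather than into a bin of its own, is an assumption about how the right edge is handled.

```python
from typing import List, Tuple


def interval_shares(acts: List[float], n_bins: int = 4) -> List[Tuple[float, float, float]]:
    """Split [min, max] into n_bins equal-width bins; return (low, high, percent) per bin."""
    lo, hi = min(acts), max(acts)
    if hi == lo:
        # Degenerate case: every activation is identical, so one bin holds them all.
        return [(lo, hi, 100.0)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in acts:
        # Clamp so the maximum itself falls in the top bin instead of overflowing.
        idx = min(int((a - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + b * width, lo + (b + 1) * width, 100.0 * counts[b] / len(acts))
            for b in range(n_bins)]


# Applied to the 11 per-token activations of the first TOP ACTIVATIONS example.
acts = [-9.66e-03, -2.91e-05, -1.29e-02, 1.44e-02, 1.95e-02, 7.06e-02,
        3.07e-02, 2.72e-02, -1.15e-02, 6.59e-03, -1.79e-02]
for low, high, pct in interval_shares(acts):
    print(f"INTERVAL {low:.3f} to {high:.3f}  CONTAINS {pct:.3f}%")
```

On a corpus-sized sample, most activations sit near zero, so the two middle bins dominate. That matches the 65.230% and 32.796% shown above.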
PER-TOKEN DETAIL

Token 'What' (ID 2061): feature activation -9.66e-03
  Pos logit contributions: 'thank' +0.033, ' met' +0.029, 'organized' +0.028, ' neighborhood' +0.028, 'please' +0.028
  Neg logit contributions: 'ah' -0.039, ' snowball' -0.039, ' cannon' -0.037, ' may' -0.037, 'affe' -0.037
Token ' made' (ID 925): feature activation -2.91e-05
  Pos logit contributions: 'ah' +0.014, ' snowball' +0.014, ' cannon' +0.014, 'affe' +0.013, ' tentacles' +0.013
  Neg logit contributions: 'thank' -0.018, ' met' -0.016, 'organized' -0.016, 'please' -0.016, ' neighborhood' -0.016
Token ' this' (ID 428): feature activation -1.29e-02
  Pos logit contributions: 'ah' +0.000, ' snowball' +0.000, ' cannon' +0.000, 'affe' +0.000, ' tentacles' +0.000
  Neg logit contributions: 'thank' -0.000, ' met' -0.000, 'organized' -0.000, 'please' -0.000, ' neighborhood' -0.000
Token ' mess' (ID 2085): feature activation +1.44e-02
  Pos logit contributions: 'ah' +0.022, ' cannon' +0.022, ' snowball' +0.022, 'affe' +0.022, ' tentacles' +0.021
  Neg logit contributions: 'thank' -0.021, ' met' -0.018, 'organized' -0.018, 'please' -0.018, ' neighborhood' -0.017
Token ',' (ID 11): feature activation +1.95e-02
  Pos logit contributions: 'thank' +0.022, 'please' +0.019, ' met' +0.018, 'organized' +0.018, ' neighborhood' +0.018
  Neg logit contributions: 'ah' -0.026, ' cannon' -0.026, ' may' -0.025, ' snowball' -0.025, ' button' -0.025
Token ' Mom' (ID 11254): feature activation +7.06e-02
  Pos logit contributions: 'thank' +0.020, ' doctors' +0.017, ' responsible' +0.017, ' met' +0.017, ' neighborhood' +0.017
  Neg logit contributions: 'ah' -0.043, ' cannon' -0.041, ' tentacles' -0.041, 'fly' -0.041, 'ience' -0.041
Token 'my' (ID 1820): feature activation +3.07e-02
  Pos logit contributions: 'thank' +0.119, ' met' +0.098, ' neighborhood' +0.096, 'please' +0.095, 'organized' +0.094
  Neg logit contributions: 'ah' -0.120, ' snowball' -0.115, 'fly' -0.112, ' may' -0.111, ' button' -0.111
Token '?"' (ID 1701): feature activation +2.72e-02
  Pos logit contributions: 'thank' +0.051, ' neighborhood' +0.043, 'organized' +0.043, 'ups' +0.042, 'please' +0.041
  Neg logit contributions: 'ah' -0.050, ' may' -0.050, 'fly' -0.048, ' button' -0.047, ' carrot' -0.047
Token ' he' (ID 339): feature activation -1.15e-02
  Pos logit contributions: 'thank' +0.050, 'please' +0.043, 'organized' +0.043, ' met' +0.042, ' neighborhood' +0.042
  Neg logit contributions: ' snowball' -0.041, ' button' -0.040, ' may' -0.040, 'ah' -0.040, ' cannon' -0.039
Token ' asked' (ID 1965): feature activation +6.59e-03
  Pos logit contributions: 'ah' +0.027, ' snowball' +0.027, ' tentacles' +0.027, ' button' +0.027, ' cannon' +0.027
  Neg logit contributions: 'thank' -0.012, 'organized' -0.009, ' neighborhood' -0.008, 'please' -0.008, ' met' -0.007
Token '.' (ID 13): feature activation -1.79e-02
  Pos logit contributions: 'thank' +0.011, 'please' +0.010, ' met' +0.010, ' neighborhood' +0.010, 'organized' +0.009
  Neg logit contributions: ' snowball' -0.011, 'ah' -0.011, ' button' -0.010, ' cannon' -0.010, ' sled' -0.010
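Within each example, the contributions listed at a token have the same sign as the activation at the *previous* token and scale roughly with it. Three cases illustrate this:

- After ' Mom' (+7.06e-02), the 'thank' contribution listed at 'my' is +0.119.
- After ' made' (-2.91e-05), every contribution listed at ' this' rounds to 0.000.
- After a negative activation, the positive and negative lists swap.

That pattern is consistent with a direct-logit-attribution reading, in which each contribution is the previous activation times a fixed per-vocabulary weight, plausibly the feature's decoder direction mapped through the unembedding. The ratio drifts between positions (roughly 1.0 to 1.9 for 'thank' in this example), possibly because of a final normalization layer. The sketch below assumes exactly that linear model. `logit_contributions`, `logit_weights`, and the per-position `scale` are hypothetical and are not values recovered from this dashboard.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_contributions(prev_act: float, logit_weights: Dict[str, float],
                        scale: float = 1.0, k: int = 5) -> Tuple[Ranked, Ranked]:
    """Top-k positive and negative contributions to the current token's logits.

    Assumed model (hypothetical): contribution[tok] = prev_act * scale * logit_weights[tok],
    where logit_weights stands in for the feature's decoder direction mapped
    through the unembedding and scale for any per-position normalization.
    """
    contrib = {tok: prev_act * scale * w for tok, w in logit_weights.items()}
    ranked = sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)
    pos = [kv for kv in ranked[:k] if kv[1] > 0]              # largest first
    neg = [kv for kv in reversed(ranked[-k:]) if kv[1] < 0]   # most negative first
    return pos, neg


# Hypothetical weights, chosen only to echo token names seen in the lists.
weights = {"thank": 1.7, " met": 1.4, "ah": -1.7, " snowball": -1.6}

print(logit_contributions(7.06e-02, weights, k=2))   # positive activation
print(logit_contributions(-9.66e-03, weights, k=2))  # negative: the lists swap
```

Because the model is linear in `prev_act`, a negative activation simply mirrors every contribution. That is why the same handful of tokens ('thank', ' met', 'ah', ' snowball', ...) keep trading places between the two lists.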
Token ' climb' (ID 12080): feature activation +1.65e-02
  Pos logit contributions: 'ah' +0.017, ' snowball' +0.017, ' cannon' +0.017, ' button' +0.017, ' may' +0.017
  Neg logit contributions: 'thank' -0.017, 'please' -0.015, 'organized' -0.014, ' met' -0.014, ' neighborhood' -0.014
Token ' and' (ID 290): feature activation +2.45e-03
  Pos logit contributions: 'thank' +0.020, 'organized' +0.017, ' met' +0.017, ' neighborhood' +0.016, ' outside' +0.015
  Neg logit contributions: 'ah' -0.035, 'affe' -0.035, ' cannon' -0.034, 'ience' -0.033, 'bee' -0.033
Token ' hard' (ID 1327): feature activation -6.20e-03
  Pos logit contributions: 'thank' +0.004, 'please' +0.004, 'organized' +0.004, ' met' +0.004, ' neighborhood' +0.004
  Neg logit contributions: 'ah' -0.004, ' may' -0.004, ' snowball' -0.004, ' cannon' -0.004, ' button' -0.004
Token ' for' (ID 329): feature activation +1.24e-02
  Pos logit contributions: 'ah' +0.012, ' cannon' +0.012, ' snowball' +0.012, ' may' +0.011, 'affe' +0.011
  Neg logit contributions: 'thank' -0.009, 'please' -0.008, ' met' -0.007, 'organized' -0.007, ' neighborhood' -0.007
Token ' grown' (ID 7334): feature activation +2.43e-02
  Pos logit contributions: 'thank' +0.016, ' met' +0.014, ' doctors' +0.014, 'organized' +0.014, ' neighborhood' +0.014
  Neg logit contributions: 'ah' -0.025, 'affe' -0.024, 'fly' -0.024, ' cannon' -0.024, ' may' -0.023
Token 'ups' (ID 4739): feature activation +7.03e-02
  Pos logit contributions: 'thank' +0.042, ' met' +0.035, 'please' +0.035, ' neighborhood' +0.033, 'organized' +0.032
  Neg logit contributions: ' snowball' -0.039, ' tentacles' -0.038, 'ah' -0.037, ' button' -0.037, ' sled' -0.037
Token ',' (ID 11): feature activation +1.03e-02
  Pos logit contributions: 'thank' +0.102, 'please' +0.085, ' met' +0.084, 'organized' +0.084, ' neighborhood' +0.082
  Neg logit contributions: 'ah' -0.131, ' cannon' -0.129, ' snowball' -0.128, ' may' -0.128, 'affe' -0.126
Token ' but' (ID 475): feature activation -5.72e-03
  Pos logit contributions: 'thank' +0.014, ' neighborhood' +0.011, ' met' +0.011, ' doctors' +0.011, ' bored' +0.011
  Neg logit contributions: 'ah' -0.022, 'affe' -0.021, 'ience' -0.020, 'rum' -0.020, ' cannon' -0.020
Token ' not' (ID 407): feature activation +1.81e-03
  Pos logit contributions: 'affe' +0.010, 'ah' +0.010, ' snowball' +0.010, 'hell' +0.009, ' cannon' +0.009
  Neg logit contributions: 'thank' -0.009, ' met' -0.009, ' doctors' -0.008, 'organized' -0.008, ' bored' -0.008
Token ' for' (ID 329): feature activation +2.41e-02
  Pos logit contributions: 'thank' +0.002, ' responsible' +0.002, 'organized' +0.002, 'please' +0.002, ' met' +0.002
  Neg logit contributions: 'ah' -0.004, ' cannon' -0.004, 'affe' -0.004, ' snowball' -0.004, ' may' -0.004
Token ' Bob' (ID 5811): feature activation +4.01e-03
  Pos logit contributions: 'thank' +0.029, ' doctors' +0.027, 'organized' +0.027, ' met' +0.026, 'please' +0.026
  Neg logit contributions: 'affe' -0.050, 'ah' -0.050, ' cannon' -0.048, 'fly' -0.048, ' snowball' -0.047
Token ',' (ID 11): feature activation +7.27e-03
  Pos logit contributions: 'thank' +0.023, 'organized' +0.020, ' met' +0.019, ' doctors' +0.019, 'please' +0.019
  Neg logit contributions: 'ah' -0.040, 'affe' -0.039, ' cannon' -0.039, ' snowball' -0.039, ' may' -0.038
Token ' and' (ID 290): feature activation -6.41e-04
  Pos logit contributions: 'thank' +0.010, ' doctors' +0.008, 'organized' +0.008, ' met' +0.008, 'please' +0.008
  Neg logit contributions: 'ah' -0.014, 'affe' -0.014, 'fly' -0.014, ' globe' -0.014, 'ience' -0.013
Token ' only' (ID 691): feature activation +2.62e-02
  Pos logit contributions: 'ah' +0.001, ' cannon' +0.001, 'affe' +0.001, ' snowball' +0.001, ' globe' +0.001
  Neg logit contributions: 'thank' -0.001, ' met' -0.001, 'organized' -0.001, ' doctors' -0.001, 'please' -0.001
Token ' grown' (ID 7334): feature activation +3.92e-02
  Pos logit contributions: 'thank' +0.037, ' met' +0.033, 'organized' +0.032, 'please' +0.031, ' responsible' +0.030
  Neg logit contributions: 'ah' -0.049, 'affe' -0.049, ' snowball' -0.049, ' cannon' -0.048, ' globe' -0.047
Token '-' (ID 12): feature activation +5.17e-02
  Pos logit contributions: 'thank' +0.073, ' met' +0.062, ' neighborhood' +0.061, 'please' +0.060, 'organized' +0.059
  Neg logit contributions: ' snowball' -0.060, 'ah' -0.060, ' tentacles' -0.057, ' button' -0.057, ' cannon' -0.056
Token 'ups' (ID 4739): feature activation +7.01e-02
  Pos logit contributions: 'thank' +0.032, ' neighborhood' +0.019, ' met' +0.017, 'please' +0.015, 'organized' +0.013
  Neg logit contributions: 'ah' -0.141, ' snowball' -0.136, ' may' -0.135, 'ience' -0.135, ' button' -0.134
Token ' could' (ID 714): feature activation +1.49e-03
  Pos logit contributions: 'thank' +0.104, ' met' +0.090, 'organized' +0.089, 'please' +0.088, ' neighborhood' +0.084
  Neg logit contributions: ' snowball' -0.129, 'ah' -0.129, ' cannon' -0.128, 'affe' -0.126, ' tentacles' -0.123
Token ' use' (ID 779): feature activation +6.95e-03
  Pos logit contributions: 'thank' +0.002, 'please' +0.002, ' punished' +0.002, ' met' +0.002, ' doctors' +0.002
  Neg logit contributions: 'affe' -0.003, 'ience' -0.003, 'ah' -0.003, ' cannon' -0.003, ' globe' -0.003
Token ' them' (ID 606): feature activation +3.92e-02
  Pos logit contributions: 'thank' +0.010, 'organized' +0.009, ' doctors' +0.009, 'ups' +0.009, ' met' +0.009
  Neg logit contributions: 'affe' -0.014, 'ah' -0.013, 'fly' -0.013, 'bee' -0.012, ' globe' -0.012
Token '.' (ID 13): feature activation +6.06e-03
  Pos logit contributions: 'thank' +0.048, 'organized' +0.040, ' met' +0.039, 'please' +0.038, 'ups' +0.038
  Neg logit contributions: 'ah' -0.082, ' snowball' -0.081, ' cannon' -0.081, 'affe' -0.081, ' may' -0.079
Token ' Tom' (ID 4186): feature activation +7.19e-03
  Pos logit contributions: 'thank' +0.011, 'please' +0.009, 'organized' +0.009, ' met' +0.009, ' neighborhood' +0.009
  Neg logit contributions: 'ah' -0.010, ' snowball' -0.010, ' cannon' -0.009, ' may' -0.009, 'affe' -0.009
Token ' hard' (ID 1327): feature activation -3.49e-03
  Pos logit contributions: 'ah' +0.008, ' cannon' +0.008, ' snowball' +0.008, ' may' +0.008, 'affe' +0.008
  Neg logit contributions: 'thank' -0.007, 'please' -0.006, ' met' -0.006, 'organized' -0.006, ' neighborhood' -0.006
Token ' sweet' (ID 6029): feature activation +1.23e-02
  Pos logit contributions: 'ah' +0.006, ' cannon' +0.006, ' snowball' +0.006, 'affe' +0.006, ' may' +0.006
  Neg logit contributions: 'thank' -0.006, 'organized' -0.005, 'please' -0.005, ' met' -0.005, ' neighborhood' -0.005
Token 'heart' (ID 11499): feature activation -1.07e-02
  Pos logit contributions: 'thank' +0.020, 'organized' +0.017, 'please' +0.017, ' neighborhood' +0.016, ' doctors' +0.016
  Neg logit contributions: ' cannon' -0.021, 'ah' -0.021, ' snowball' -0.021, ' tentacles' -0.020, ' spoiler' -0.020
Token ',' (ID 11): feature activation +2.82e-04
  Pos logit contributions: ' cannon' +0.022, ' snowball' +0.022, 'ah' +0.022, ' tentacles' +0.021, ' spear' +0.021
  Neg logit contributions: 'thank' -0.014, ' responsible' -0.011, 'organized' -0.011, ' met' -0.011, 'please' -0.011
Token ' but' (ID 475): feature activation +1.92e-03
  Pos logit contributions: 'thank' +0.000, ' doctors' +0.000, ' responsible' +0.000, ' punished' +0.000, ' organized' +0.000
  Neg logit contributions: ' cannon' -0.001, 'ah' -0.001, ' tentacles' -0.001, 'fly' -0.001, ' globe' -0.001
Token ' Mom' (ID 11254): feature activation +6.96e-02
  Pos logit contributions: 'thank' +0.003, ' met' +0.003, 'please' +0.003, 'organized' +0.003, ' neighborhood' +0.002
  Neg logit contributions: ' snowball' -0.003, 'ah' -0.003, ' cannon' -0.003, 'affe' -0.003, ' tentacles' -0.003
Token 'my' (ID 1820): feature activation +3.37e-02
  Pos logit contributions: 'thank' +0.100, ' met' +0.085, 'organized' +0.085, 'please' +0.084, ' neighborhood' +0.081
  Neg logit contributions: ' cannon' -0.132, ' snowball' -0.131, 'ah' -0.129, 'affe' -0.128, ' tentacles' -0.127
Token ' needs' (ID 2476): feature activation -1.60e-02
  Pos logit contributions: 'thank' +0.059, ' met' +0.053, 'organized' +0.051, 'please' +0.051, ' punished' +0.050
  Neg logit contributions: ' snowball' -0.054, ' cannon' -0.054, 'ah' -0.052, ' sled' -0.052, ' tentacles' -0.051
Token ' to' (ID 284): feature activation +2.18e-02
  Pos logit contributions: 'ah' +0.031, ' snowball' +0.031, ' cannon' +0.031, 'affe' +0.030, ' may' +0.030
  Neg logit contributions: 'thank' -0.023, ' met' -0.019, 'please' -0.019, 'organized' -0.018, ' neighborhood' -0.018
Token ' help' (ID 1037): feature activation +1.64e-02
  Pos logit contributions: 'thank' +0.034, 'please' +0.028, ' met' +0.028, 'organized' +0.028, ' neighborhood' +0.027
  Neg logit contributions: 'ah' -0.040, ' snowball' -0.039, ' cannon' -0.039, 'affe' -0.039, ' may' -0.038
Token ' organize' (ID 16481): feature activation +1.59e-02
  Pos logit contributions: 'thank' +0.027, ' met' +0.023, 'please' +0.023, ' neighborhood' +0.022, 'organized' +0.022
  Neg logit contributions: ' may' -0.027, ' button' -0.027, 'ah' -0.026, ' snowball' -0.026, ' cannon' -0.026
Token ' afternoon' (ID 6672): feature activation +1.32e-02
  Pos logit contributions: 'affe' +0.008, ' cannon' +0.008, 'rum' +0.008, 'ience' +0.008, ' may' +0.008
  Neg logit contributions: 'thank' -0.006, ' neighborhood' -0.005, ' met' -0.005, ' responsible' -0.005, ' neighbourhood' -0.005
Token ' and' (ID 290): feature activation -1.30e-03
  Pos logit contributions: 'thank' +0.020, 'please' +0.017, 'organized' +0.017, ' met' +0.017, ' neighborhood' +0.016
  Neg logit contributions: 'ah' -0.024, ' cannon' -0.023, ' snowball' -0.023, ' may' -0.023, ' button' -0.022
Token ' had' (ID 550): feature activation -1.15e-02
  Pos logit contributions: ' may' +0.002, 'ah' +0.002, ' cannon' +0.002, ' snowball' +0.002, ' tentacles' +0.002
  Neg logit contributions: 'thank' -0.002, 'please' -0.002, 'organized' -0.002, ' neighborhood' -0.002, ' met' -0.002
Token ' had' (ID 550): feature activation -4.04e-03
  Pos logit contributions: 'ah' +0.020, ' snowball' +0.020, ' cannon' +0.020, ' may' +0.019, ' tentacles' +0.019
  Neg logit contributions: 'thank' -0.018, 'please' -0.015, 'organized' -0.015, ' neighborhood' -0.015, ' met' -0.015
Token ' lots' (ID 6041): feature activation +1.49e-02
  Pos logit contributions: 'ah' +0.007, ' snowball' +0.007, ' cannon' +0.007, ' tentacles' +0.007, ' may' +0.007
  Neg logit contributions: 'thank' -0.007, 'organized' -0.006, 'please' -0.006, ' neighborhood' -0.005, ' met' -0.005
Token ' of' (ID 286): feature activation +6.93e-02
  Pos logit contributions: 'thank' +0.029, 'please' +0.026, 'organized' +0.026, ' met' +0.026, ' neighborhood' +0.024
  Neg logit contributions: ' snowball' -0.020, 'ah' -0.020, ' may' -0.020, 'affe' -0.019, ' cannon' -0.019
Token ' fun' (ID 1257): feature activation +3.44e-02
  Pos logit contributions: 'thank' +0.095, 'please' +0.081, 'organized' +0.075, ' met' +0.074, 'ups' +0.068
  Neg logit contributions: ' snowball' -0.142, ' may' -0.136, ' button' -0.135, ' cannon' -0.135, ' tentacles' -0.134
Token ' learning' (ID 4673): feature activation -2.02e-02
  Pos logit contributions: 'thank' +0.053, 'please' +0.046, ' met' +0.045, 'organized' +0.045, ' neighborhood' +0.043
  Neg logit contributions: 'ah' -0.061, ' snowball' -0.061, ' cannon' -0.060, ' may' -0.059, ' button' -0.059
Token ' about' (ID 546): feature activation -1.30e-02
  Pos logit contributions: 'ah' +0.043, 'affe' +0.042, ' cannon' +0.041, ' snowball' +0.041, 'fly' +0.041
  Neg logit contributions: 'thank' -0.024, 'ups' -0.020, ' neighborhood' -0.020, 'organized' -0.020, ' met' -0.020
Token ' the' (ID 262): feature activation +1.37e-02
  Pos logit contributions: 'affe' +0.025, 'ah' +0.025, 'ience' +0.023, ' cannon' +0.023, ' snowball' +0.023
  Neg logit contributions: 'thank' -0.018, ' neighborhood' -0.017, 'organized' -0.016, ' doctors' -0.016, ' met' -0.016
Token ' computer' (ID 3644): feature activation -1.23e-02
  Pos logit contributions: 'thank' +0.023, 'please' +0.020, ' met' +0.020, 'organized' +0.019, 'ups' +0.019
  Neg logit contributions: ' snowball' -0.022, 'ah' -0.022, ' button' -0.022, ' cannon' -0.022, ' may' -0.022
Token ' so' (ID 523): feature activation -1.66e-03
  Pos logit contributions: ' cannon' +0.003, ' snowball' +0.003, 'ah' +0.003, ' button' +0.003, ' may' +0.003
  Neg logit contributions: 'thank' -0.003, 'please' -0.003, 'organized' -0.003, ' met' -0.003, ' neighborhood' -0.003
Token ' heavy' (ID 4334): feature activation +1.08e-03
  Pos logit contributions: ' may' +0.003, ' cannon' +0.003, ' button' +0.003, ' snowball' +0.003, ' tentacles' +0.003
  Neg logit contributions: 'thank' -0.003, 'please' -0.002, 'organized' -0.002, ' met' -0.002, ' neighborhood' -0.002
Token ' that' (ID 326): feature activation -1.05e-02
  Pos logit contributions: 'thank' +0.001, 'organized' +0.001, ' doctors' +0.001, ' met' +0.001, ' neighborhood' +0.001
  Neg logit contributions: 'ah' -0.002, 'affe' -0.002, ' cannon' -0.002, ' snowball' -0.002, ' may' -0.002
Token ' two' (ID 734): feature activation +3.07e-02
  Pos logit contributions: 'ah' +0.013, ' snowball' +0.013, ' cannon' +0.013, 'affe' +0.012, ' may' +0.012
  Neg logit contributions: 'thank' -0.022, ' met' -0.020, 'please' -0.020, 'organized' -0.020, ' neighborhood' -0.019
Token ' grown' (ID 7334): feature activation +2.28e-02
  Pos logit contributions: 'thank' +0.033, ' met' +0.029, 'organized' +0.028, ' neighborhood' +0.028, ' doctors' +0.027
  Neg logit contributions: 'ah' -0.068, ' cannon' -0.065, ' snowball' -0.065, 'fly' -0.064, 'affe' -0.064
Token 'ups' (ID 4739): feature activation +6.88e-02
  Pos logit contributions: 'thank' +0.043, 'please' +0.037, ' met' +0.035, ' neighborhood' +0.034, 'Thank' +0.033
  Neg logit contributions: ' tentacles' -0.034, ' button' -0.033, ' snowball' -0.033, ' cannon' -0.032, 'ah' -0.032
Token ' had' (ID 550): feature activation +1.99e-03
  Pos logit contributions: 'thank' +0.103, 'please' +0.087, 'organized' +0.083, ' neighborhood' +0.082, ' doctors' +0.081
  Neg logit contributions: ' may' -0.126, 'ah' -0.124, ' cannon' -0.122, ' button' -0.119, 'fly' -0.118
Token ' to' (ID 284): feature activation +1.43e-02
  Pos logit contributions: 'thank' +0.003, 'please' +0.002, 'organized' +0.002, ' met' +0.002, ' neighborhood' +0.002
  Neg logit contributions: 'ah' -0.004, ' cannon' -0.004, ' snowball' -0.004, ' may' -0.004, ' button' -0.004
Token ' carry' (ID 3283): feature activation +1.87e-02
  Pos logit contributions: 'thank' +0.022, ' met' +0.018, 'please' +0.018, 'organized' +0.018, ' neighborhood' +0.017
  Neg logit contributions: 'ah' -0.026, ' cannon' -0.026, ' snowball' -0.026, 'affe' -0.025, ' may' -0.025
Token ' it' (ID 340): feature activation +5.44e-03
  Pos logit contributions: 'thank' +0.027, 'organized' +0.023, ' met' +0.023, 'please' +0.022, ' neighborhood' +0.022
  Neg logit contributions: 'ah' -0.036, 'affe' -0.035, ' snowball' -0.035, ' cannon' -0.034, ' may' -0.034
Token '.' (ID 13): feature activation +9.46e-03
  Pos logit contributions: 'thank' +0.008, 'please' +0.006, ' met' +0.006, 'organized' +0.006, ' neighborhood' +0.006
  Neg logit contributions: 'ah' -0.011, ' cannon' -0.010, ' snowball' -0.010, ' may' -0.010, 'affe' -0.010
Token ',' (ID 11): feature activation +5.47e-03
  Pos logit contributions: ' snowball' +0.009, 'ah' +0.009, ' cannon' +0.009, 'affe' +0.009, 'fly' +0.009
  Neg logit contributions: 'thank' -0.008, ' met' -0.007, 'please' -0.007, 'organized' -0.006, ' responsible' -0.006
Token ' "' (ID 366): feature activation +1.96e-02
  Pos logit contributions: 'thank' +0.007, 'please' +0.006, ' met' +0.006, ' responsible' +0.006, 'organized' +0.006
  Neg logit contributions: ' snowball' -0.011, 'ah' -0.011, ' cannon' -0.011, 'affe' -0.011, ' tentacles' -0.010
Token 'Care' (ID 17784): feature activation -1.08e-03
  Pos logit contributions: 'thank' +0.031, ' met' +0.027, 'organized' +0.027, 'please' +0.026, ' neighborhood' +0.026
  Neg logit contributions: 'ah' -0.034, ' snowball' -0.033, ' cannon' -0.033, ' may' -0.032, 'affe' -0.032
Token 'ful' (ID 913): feature activation -9.09e-04
  Pos logit contributions: ' snowball' +0.001, 'ah' +0.001, ' cannon' +0.001, 'affe' +0.001, ' may' +0.001
  Neg logit contributions: 'thank' -0.002, 'organized' -0.002, 'please' -0.002, ' met' -0.002, ' neighborhood' -0.002
Token ',' (ID 11): feature activation +2.49e-02
  Pos logit contributions: ' cannon' +0.002, ' snowball' +0.002, ' globe' +0.002, ' spoiler' +0.002, 'ah' +0.002
  Neg logit contributions: 'thank' -0.001, 'ups' -0.001, 'organized' -0.001, 'please' -0.001, ' responsible' -0.001
Token ' Mom' (ID 11254): feature activation +6.88e-02
  Pos logit contributions: 'thank' +0.026, ' doctors' +0.022, ' responsible' +0.022, ' organized' +0.021, ' met' +0.021
  Neg logit contributions: 'ah' -0.055, 'fly' -0.053, ' cannon' -0.053, 'affe' -0.053, 'ience' -0.052
Token 'my' (ID 1820): feature activation +1.99e-02
  Pos logit contributions: 'thank' +0.090, 'please' +0.073, ' met' +0.073, 'organized' +0.073, ' neighborhood' +0.070
  Neg logit contributions: 'ah' -0.139, ' cannon' -0.138, ' snowball' -0.138, ' may' -0.136, 'affe' -0.136
Token '!"' (ID 2474): feature activation +1.52e-02
  Pos logit contributions: 'thank' +0.034, 'please' +0.029, 'organized' +0.029, ' met' +0.029, ' neighborhood' +0.028
  Neg logit contributions: 'ah' -0.032, ' snowball' -0.032, ' cannon' -0.032, 'affe' -0.031, ' may' -0.031
Token '\n' (ID 198): feature activation -4.88e-03
  Pos logit contributions: 'thank' +0.029, ' met' +0.026, 'organized' +0.026, 'please' +0.026, ' doctors' +0.025
  Neg logit contributions: ' snowball' -0.021, ' button' -0.021, ' may' -0.020, 'ah' -0.020, ' sled' -0.020
Token '\n' (ID 198): feature activation -1.48e-03
  Pos logit contributions: ' snowball' +0.008, 'ah' +0.008, ' button' +0.008, ' cannon' +0.008, ' sled' +0.008
  Neg logit contributions: 'thank' -0.008, 'organized' -0.007, ' met' -0.007, 'please' -0.007, ' neighborhood' -0.007
Token 'Mom' (ID 29252): feature activation +4.15e-02
  Pos logit contributions: 'ah' +0.003, ' snowball' +0.003, ' cannon' +0.002, ' button' +0.002, 'affe' +0.002
  Neg logit contributions: 'thank' -0.002, ' met' -0.002, 'please' -0.002, 'organized' -0.002, ' neighborhood' -0.002
Token ' "' (ID 366): feature activation +1.92e-02
  Pos logit contributions: 'thank' +0.019, 'please' +0.015, ' met' +0.015, ' responsible' +0.014, ' neighborhood' +0.014
  Neg logit contributions: 'ah' -0.024, ' snowball' -0.024, ' cannon' -0.023, 'affe' -0.023, ' tentacles' -0.022
Token 'Y' (ID 56): feature activation +1.58e-02
  Pos logit contributions: 'thank' +0.016, ' met' +0.013, 'organized' +0.013, ' neighborhood' +0.012, 'please' +0.012
  Neg logit contributions: 'ah' -0.047, ' snowball' -0.046, 'affe' -0.045, ' may' -0.045, ' cannon' -0.045
Token 'ummy' (ID 13513): feature activation +8.95e-03
  Pos logit contributions: 'thank' +0.033, 'please' +0.030, 'organized' +0.029, ' met' +0.029, ' neighborhood' +0.029
  Neg logit contributions: 'ah' -0.020, ' cannon' -0.019, ' snowball' -0.019, ' may' -0.019, 'affe' -0.019
Token ' beef' (ID 12023): feature activation +2.88e-02
  Pos logit contributions: 'thank' +0.014, 'please' +0.012, 'organized' +0.011, ' met' +0.011, ' doctors' +0.011
  Neg logit contributions: ' cannon' -0.016, ' may' -0.016, ' tentacles' -0.016, ' spoiler' -0.015, ' snowball' -0.015
Token '!' (ID 0): feature activation -5.15e-03
  Pos logit contributions: 'thank' +0.050, 'please' +0.043, ' met' +0.042, 'organized' +0.041, ' neighborhood' +0.039
  Neg logit contributions: 'ah' -0.046, ' cannon' -0.046, ' snowball' -0.046, 'affe' -0.044, ' spoiler' -0.044
Token ' Mom' (ID 11254): feature activation +6.79e-02
  Pos logit contributions: 'ah' +0.009, 'bee' +0.009, ' snowball' +0.009, ' cannon' +0.009, ' button' +0.009
  Neg logit contributions: 'thank' -0.008, 'please' -0.007, 'organized' -0.006, ' met' -0.006, 'Thank' -0.006
Token ',' (ID 11): feature activation +6.52e-03
  Pos logit contributions: 'thank' +0.084, 'please' +0.069, 'organized' +0.068, ' met' +0.067, ' neighborhood' +0.064
  Neg logit contributions: ' cannon' -0.143, 'ah' -0.141, ' snowball' -0.141, 'affe' -0.140, ' may' -0.139
Token ' can' (ID 460): feature activation +4.20e-03
  Pos logit contributions: 'thank' +0.008, ' responsible' +0.008, ' baked' +0.008, 'please' +0.007, ' doctors' +0.007
  Neg logit contributions: ' spoiler' -0.013, 'rum' -0.012, 'ah' -0.012, 'affe' -0.012, ' cannon' -0.012
Token ' I' (ID 314): feature activation +4.44e-03
  Pos logit contributions: 'thank' +0.006, 'please' +0.005, ' met' +0.005, 'organized' +0.005, ' doctors' +0.005
  Neg logit contributions: ' cannon' -0.008, 'ah' -0.008, ' snowball' -0.008, 'affe' -0.008, ' tentacles' -0.008
Token ' have' (ID 423): feature activation +6.05e-03
  Pos logit contributions: 'thank' +0.007, 'please' +0.006, ' met' +0.006, ' punished' +0.006, ' baked' +0.005
  Neg logit contributions: ' cannon' -0.008, ' spoiler' -0.008, ' tentacles' -0.008, ' globe' -0.008, ' snowball' -0.008
Token ' more' (ID 517): feature activation +3.14e-02
  Pos logit contributions: 'thank' +0.010, 'please' +0.009, ' met' +0.009, 'organized' +0.008, ' neighborhood' +0.008
  Neg logit contributions: 'ah' -0.010, ' cannon' -0.010, 'affe' -0.010, ' snowball' -0.010, ' may' -0.010
Token ' he' (ID 339): feature activation -3.38e-03
  Pos logit contributions: 'thank' +0.009, ' met' +0.008, ' neighborhood' +0.008, 'organized' +0.008, 'please' +0.008
  Neg logit contributions: 'ah' -0.007, ' cannon' -0.007, ' snowball' -0.007, 'affe' -0.007, ' may' -0.007
Token ' could' (ID 714): feature activation -8.54e-03
  Pos logit contributions: ' may' +0.005, 'ah' +0.005, 'affe' +0.005, 'ience' +0.005, ' cannon' +0.004
  Neg logit contributions: 'thank' -0.006, 'organized' -0.006, 'please' -0.006, ' neighborhood' -0.005, 'ups' -0.005
Token ' be' (ID 307): feature activation +1.21e-02
  Pos logit contributions: 'ah' +0.014, ' may' +0.014, ' snowball' +0.014, ' cannon' +0.014, 'affe' +0.014
  Neg logit contributions: 'thank' -0.014, 'please' -0.012, 'organized' -0.012, ' met' -0.012, ' neighborhood' -0.011
Token ' and' (ID 290): feature activation -1.72e-03
  Pos logit contributions: 'thank' +0.017, 'please' +0.014, 'organized' +0.014, ' neighborhood' +0.013, 'ups' +0.013
  Neg logit contributions: ' snowball' -0.021, ' cannon' -0.021, 'ah' -0.021, 'rum' -0.020, ' spear' -0.020
Token ' never' (ID 1239): feature activation +5.54e-03
  Pos logit contributions: 'ah' +0.003, ' snowball' +0.003, ' cannon' +0.003, 'affe' +0.003, ' may' +0.003
  Neg logit contributions: 'thank' -0.003, ' met' -0.002, 'please' -0.002, 'organized' -0.002, ' neighborhood' -0.002
Token ' mis' (ID 2984): feature activation +6.75e-02
  Pos logit contributions: 'thank' +0.010, 'organized' +0.008, 'please' +0.008, ' neighborhood' +0.008, ' doctors' +0.008
  Neg logit contributions: ' may' -0.009, 'ah' -0.009, ' cannon' -0.008, 'fly' -0.008, ' nut' -0.008
Token 'beh' (ID 20709): feature activation +2.80e-02
  Pos logit contributions: 'thank' +0.125, ' met' +0.107, ' neighborhood' +0.103, 'please' +0.102, ' doctors' +0.100
  Neg logit contributions: 'ah' -0.098, 'fly' -0.096, 'affe' -0.095, ' snowball' -0.095, ' cannon' -0.094
Token 'aved' (ID 9586): feature activation +1.47e-02
  Pos logit contributions: 'thank' +0.047, ' met' +0.040, 'organized' +0.040, 'please' +0.040, ' neighborhood' +0.039
  Neg logit contributions: 'ah' -0.046, ' snowball' -0.046, ' cannon' -0.045, 'affe' -0.045, ' may' -0.045
Token '.' (ID 13): feature activation -3.51e-03
  Pos logit contributions: 'thank' +0.021, 'organized' +0.017, 'please' +0.017, ' met' +0.017, ' neighborhood' +0.017
  Neg logit contributions: 'ah' -0.028, ' snowball' -0.028, ' cannon' -0.028, 'affe' -0.027, ' may' -0.027
Token '<|endoftext|>' (ID 50256): feature activation -5.84e-03
  Pos logit contributions: 'ah' +0.007, ' cannon' +0.007, ' snowball' +0.007, ' button' +0.007, 'bee' +0.006
  Neg logit contributions: 'thank' -0.005, ' met' -0.004, ' neighborhood' -0.004, 'please' -0.004, 'organized' -0.004
Token 'Once' (ID 7454): feature activation -1.41e-02
  Pos logit contributions: 'ah' +0.012, ' cannon' +0.012, 'affe' +0.012, ' snowball' +0.011, 'ience' +0.011
  Neg logit contributions: 'thank' -0.008, ' met' -0.007, ' neighborhood' -0.006, 'please' -0.006, 'organized' -0.006
Token ' the' (ID 262): feature activation +3.29e-02
  Pos logit contributions: 'thank' +0.006, ' neighborhood' +0.005, ' met' +0.005, ' neighbourhood' +0.004, ' outside' +0.004
  Neg logit contributions: 'affe' -0.011, 'ah' -0.010, 'ience' -0.010, 'bee' -0.010, 'hell' -0.010
Token ' park' (ID 3952): feature activation +4.07e-03
  Pos logit contributions: 'thank' +0.046, 'please' +0.039, 'organized' +0.037, 'ups' +0.034, ' met' +0.034
  Neg logit contributions: ' snowball' -0.065, ' button' -0.063, 'ah' -0.063, ' cannon' -0.063, ' sled' -0.063
Token '.' (ID 13): feature activation +1.59e-03
  Pos logit contributions: 'thank' +0.007, 'please' +0.007, 'organized' +0.006, ' punished' +0.006, ' met' +0.006
  Neg logit contributions: 'ah' -0.006, ' sled' -0.006, ' button' -0.006, ' snowball' -0.006, ' tentacles' -0.006
Token ' When' (ID 1649): feature activation +1.31e-02
  Pos logit contributions: 'thank' +0.003, 'please' +0.003, ' met' +0.003, 'organized' +0.003, ' neighborhood' +0.003
  Neg logit contributions: 'ah' -0.002, ' snowball' -0.002, ' cannon' -0.002, ' may' -0.002, 'affe' -0.002
Token ' they' (ID 484): feature activation +8.83e-03
  Pos logit contributions: 'thank' +0.025, 'please' +0.023, 'organized' +0.022, ' met' +0.022, ' neighborhood' +0.021
  Neg logit contributions: ' may' -0.018, 'ah' -0.018, ' snowball' -0.018, ' cannon' -0.018, ' sled' -0.018
Token ' arrived' (ID 5284): feature activation +6.73e-02
  Pos logit contributions: 'thank' +0.013, 'organized' +0.012, 'please' +0.011, ' responsible' +0.011, ' doctors' +0.011
  Neg logit contributions: ' may' -0.018, 'ah' -0.017, ' sled' -0.016, ' button' -0.016, ' cannon' -0.016
Token ',' (ID 11): feature activation +3.56e-02
  Pos logit contributions: 'thank' +0.143, 'please' +0.130, 'organized' +0.124, ' punished' +0.116, ' doctors' +0.115
  Neg logit contributions: ' may' -0.087, ' sled' -0.084, ' button' -0.084, 'ah' -0.083, ' cannon' -0.083
Token ' they' (ID 484): feature activation +1.56e-02
  Pos logit contributions: 'thank' +0.070, 'please' +0.063, 'organized' +0.060, ' met' +0.058, ' neighborhood' +0.057
  Neg logit contributions: ' may' -0.049, ' snowball' -0.048, ' cannon' -0.048, ' button' -0.048, 'ah' -0.048
Token ' saw' (ID 2497): feature activation +3.13e-02
  Pos logit contributions: 'thank' +0.022, 'organized' +0.020, 'please' +0.018, ' doctors' +0.018, ' responsible' +0.018
  Neg logit contributions: ' may' -0.030, 'ah' -0.029, ' cannon' -0.028, ' sled' -0.027, ' button' -0.027
Token ' a' (ID 257): feature activation +3.11e-03
  Pos logit contributions: 'thank' +0.051, 'please' +0.043, 'organized' +0.043, ' met' +0.042, ' neighborhood' +0.041
  Neg logit contributions: 'ah' -0.054, ' snowball' -0.054, ' cannon' -0.054, ' may' -0.052, 'affe' -0.052
Token ' new' (ID 649): feature activation -3.26e-03
  Pos logit contributions: 'thank' +0.004, ' neighborhood' +0.004, ' met' +0.004, 'please' +0.004, ' responsible' +0.003
  Neg logit contributions: 'rum' -0.006, 'ah' -0.006, ' tentacles' -0.006, 'affe' -0.006, ' cannon' -0.006
Token ',' (ID 11): feature activation -4.70e-03
  Pos logit contributions: ' snowball' +0.025, 'ah' +0.025, ' cannon' +0.024, 'affe' +0.024, ' tentacles' +0.024
  Neg logit contributions: 'thank' -0.023, 'please' -0.019, 'organized' -0.019, ' met' -0.019, ' responsible' -0.018
Token ' "' (ID 366): feature activation -4.79e-03
  Pos logit contributions: 'ah' +0.008, ' snowball' +0.008, ' cannon' +0.008, 'affe' +0.008, ' tentacles' +0.008
  Neg logit contributions: 'thank' -0.008, 'please' -0.007, ' met' -0.006, 'organized' -0.006, ' responsible' -0.006
Token 'That' (ID 2504): feature activation -3.43e-02
  Pos logit contributions: 'ah' +0.009, ' cannon' +0.009, ' snowball' +0.009, ' may' +0.009, 'affe' +0.009
  Neg logit contributions: 'thank' -0.007, 'please' -0.005, ' met' -0.005, 'organized' -0.005, ' neighborhood' -0.005
Token "'s" (ID 338): feature activation +9.49e-03
  Pos logit contributions: ' cannon' +0.070, ' spoiler' +0.068, 'rum' +0.067, ' tentacles' +0.067, 'affe' +0.066
  Neg logit contributions: 'thank' -0.044, ' met' -0.039, ' neighborhood' -0.038, 'organized' -0.037, 'please' -0.036
Token ' what' (ID 644): feature activation -2.54e-02
  Pos logit contributions: 'thank' +0.014, ' met' +0.012, 'please' +0.012, 'organized' +0.012, ' neighborhood' +0.012
  Neg logit contributions: 'ah' -0.018, ' snowball' -0.017, ' may' -0.017, ' cannon' -0.017, ' button' -0.017
Token ' moms' (ID 34015): feature activation +6.73e-02
  Pos logit contributions: 'ah' +0.047, ' snowball' +0.047, ' cannon' +0.046, 'affe' +0.046, ' button' +0.045
  Neg logit contributions: 'thank' -0.037, ' met' -0.032, 'please' -0.032, 'organized' -0.031, ' neighborhood' -0.030
Token ' are' (ID 389): feature activation +6.86e-03
  Pos logit contributions: 'thank' +0.142, 'please' +0.126, 'organized' +0.125, ' met' +0.124, ' neighborhood' +0.123
  Neg logit contributions: 'ah' -0.082, ' may' -0.080, ' cannon' -0.080, ' snowball' -0.078, 'affe' -0.078
Token ' for' (ID 329): feature activation +1.28e-02
  Pos logit contributions: 'thank' +0.010, ' met' +0.009, 'please' +0.009, ' neighborhood' +0.009, 'organized' +0.009
  Neg logit contributions: ' may' -0.012, 'ah' -0.012, ' cannon' -0.012, ' globe' -0.012, ' button' -0.012
Token ',' (ID 11): feature activation +3.67e-03
  Pos logit contributions: 'thank' +0.022, 'please' +0.019, ' met' +0.019, 'organized' +0.019, ' neighborhood' +0.019
  Neg logit contributions: ' cannon' -0.020, ' may' -0.020, 'ah' -0.020, ' snowball' -0.020, ' tentacles' -0.020
Token ' sweet' (ID 6029): feature activation -6.98e-03
  Pos logit contributions: 'thank' +0.003, ' doctors' +0.003, ' responsible' +0.003, ' punished' +0.003, 'please' +0.003
  Neg logit contributions: 'ah' -0.009, 'fly' -0.008, ' cannon' -0.008, ' tentacles' -0.008, ' globe' -0.008
Token 'ie' (ID 494): feature activation -4.58e-03
  Pos logit contributions: ' cannon' +0.014, ' spoiler' +0.014, 'ah' +0.014, ' globe' +0.014, ' tentacles' +0.014
  Neg logit contributions: 'thank' -0.009, 'please' -0.008, 'ups' -0.008, ' doctors' -0.007, 'organized' -0.007
Token ' It' (ID 632): feature activation +1.50e-03
  Pos logit contributions: 'thank' +0.042, 'organized' +0.039, 'please' +0.038, ' met' +0.037, ' doctors' +0.037
  Neg logit contributions: 'ah' -0.029, ' snowball' -0.028, ' cannon' -0.028, ' spear' -0.027, ' tentacles' -0.027
Token ' is' (ID 318): feature activation -1.16e-03
  Pos logit contributions: 'thank' +0.003, ' met' +0.002, 'organized' +0.002, ' punished' +0.002, 'ups' +0.002
  Neg logit contributions: ' cannon' -0.002, 'affe' -0.002, 'ah' -0.002, ' snowball' -0.002, ' tentacles' -0.002
Token ' for' (ID 329): feature activation +3.14e-02
  Pos logit contributions: 'ah' +0.002, 'affe' +0.002, ' cannon' +0.002, ' snowball' +0.002, ' may' +0.002
  Neg logit contributions: 'thank' -0.002, ' met' -0.001, 'please' -0.001, 'organized' -0.001, ' neighborhood' -0.001
Token ' grown' (ID 7334): feature activation +2.56e-02
  Pos logit contributions: 'thank' +0.049, 'organized' +0.040, 'please' +0.040, ' met' +0.040, ' neighborhood' +0.039
  Neg logit contributions: ' cannon' -0.054, ' tentacles' -0.054, ' button' -0.054, ' may' -0.054, ' snowball' -0.053
Token '-' (ID 12): feature activation +5.41e-02
  Pos logit contributions: 'thank' +0.049, ' met' +0.042, 'please' +0.042, ' neighborhood' +0.041, 'organized' +0.040
  Neg logit contributions: ' snowball' -0.037, 'ah' -0.036, ' tentacles' -0.035, ' cannon' -0.035, ' button' -0.035
Token 'ups' (ID 4739): feature activation +6.72e-02
  Pos logit contributions: 'thank' +0.039, ' neighborhood' +0.030, ' met' +0.026, 'please' +0.024, ' neighbourhood' +0.023
  Neg logit contributions: 'ah' -0.135, 'ion' -0.133, 'ience' -0.132, ' button' -0.130, 'hell' -0.129
Token '.' (ID 13): feature activation +2.67e-02
  Pos logit contributions: 'thank' +0.103, 'please' +0.087, ' met' +0.086, 'organized' +0.086, ' neighborhood' +0.084
  Neg logit contributions: 'ah' -0.120, ' cannon' -0.118, ' snowball' -0.118, ' may' -0.117, 'affe' -0.116
Token ' It' (ID 632): feature activation +2.83e-03
  Pos logit contributions: 'thank' +0.051, 'organized' +0.045, 'please' +0.045, ' met' +0.045, ' neighborhood' +0.044
  Neg logit contributions: 'ah' -0.038, ' cannon' -0.037, ' snowball' -0.036, ' may' -0.036, 'affe' -0.035
Token ' can' (ID 460): feature activation -1.17e-02
  Pos logit contributions: 'thank' +0.005, ' met' +0.005, 'organized' +0.005, ' punished' +0.005, 'please' +0.005
  Neg logit contributions: 'ah' -0.004, ' cannon' -0.004, ' snowball' -0.004, 'affe' -0.004, ' tentacles' -0.004
Token ' make' (ID 787): feature activation +9.33e-04
  Pos logit contributions: 'ah' +0.020, ' snowball' +0.020, ' cannon' +0.020, 'affe' +0.020, ' may' +0.019
  Neg logit contributions: 'thank' -0.019, 'please' -0.016, ' met' -0.016, 'organized' -0.016, ' neighborhood' -0.015
Token ' you' (ID 345): feature activation +5.58e-03
  Pos logit contributions: 'thank' +0.001, 'please' +0.001, ' met' +0.001, 'organized' +0.001, ' neighborhood' +0.001
  Neg logit contributions: 'ah' -0.002, ' cannon' -0.002, ' snowball' -0.002, ' may' -0.002, ' button' -0.002
Token it
Token ID340
Feature activation-2.50e-02

Pos logit contributions
thank+0.045
 met+0.041
organized+0.040
 neighborhood+0.040
please+0.039
Neg logit contributions
ah-0.028
affe-0.027
 cannon-0.027
ience-0.026
fly-0.026
Token and
Token ID290
Feature activation-4.71e-03

Pos logit contributions
ah+0.054
affe+0.054
 cannon+0.053
haw+0.053
ience+0.052
Neg logit contributions
thank-0.030
organized-0.025
 outside-0.025
 neighborhood-0.025
 organized-0.023
Token said
Token ID531
Feature activation-2.60e-04

Pos logit contributions
 may+0.008
ah+0.008
 snowball+0.008
 button+0.008
 cannon+0.008
Neg logit contributions
thank-0.007
please-0.006
organized-0.006
 neighborhood-0.006
 doctors-0.006
Token,
Token ID11
Feature activation+1.97e-02

Pos logit contributions
 snowball+0.000
ah+0.000
fly+0.000
 cannon+0.000
affe+0.000
Neg logit contributions
thank-0.000
 met-0.000
 responsible-0.000
ups-0.000
please-0.000
Token "
Token ID366
Feature activation+2.45e-02

Pos logit contributions
thank+0.024
please+0.021
 met+0.020
 bored+0.019
 neighborhood+0.019
Neg logit contributions
ah-0.041
 snowball-0.041
affe-0.040
 cannon-0.040
ience-0.039
TokenMom
Token ID29252
Feature activation+6.70e-02

Pos logit contributions
thank+0.037
 met+0.031
please+0.031
organized+0.031
 neighborhood+0.031
Neg logit contributions
ah-0.044
 snowball-0.043
 cannon-0.042
 button-0.042
 may-0.042
Tokenmy
Token ID1820
Feature activation+2.13e-02

Pos logit contributions
thank+0.075
organized+0.062
please+0.061
 met+0.061
 neighborhood+0.058
Neg logit contributions
 cannon-0.146
 snowball-0.144
affe-0.143
ah-0.143
 may-0.142
Token,
Token ID11
Feature activation+1.87e-02

Pos logit contributions
thank+0.034
 met+0.030
organized+0.029
please+0.029
 neighborhood+0.028
Neg logit contributions
 snowball-0.036
 cannon-0.036
ah-0.036
affe-0.036
 tentacles-0.035
Token can
Token ID460
Feature activation+8.33e-03

Pos logit contributions
 met+0.023
thank+0.022
 organized+0.022
 outside+0.022
 responsible+0.022
Neg logit contributions
affe-0.036
ah-0.035
 tentacles-0.035
fly-0.035
rum-0.034
Token I
Token ID314
Feature activation-2.04e-03

Pos logit contributions
thank+0.012
 met+0.011
organized+0.011
please+0.011
 doctors+0.010
Neg logit contributions
ah-0.015
 snowball-0.015
 cannon-0.015
affe-0.014
 tentacles-0.014
Token go
Token ID467
Feature activation+4.81e-02

Pos logit contributions
 tentacles+0.004
 cannon+0.004
ience+0.004
affe+0.004
ah+0.004
Neg logit contributions
thank-0.003
 met-0.003
please-0.003
 neighborhood-0.002
organized-0.002
Token
Token ID250
Feature activation+1.85e-02

Pos logit contributions
thank+0.015
 met+0.014
 doctors+0.014
please+0.014
organized+0.014
Neg logit contributions
 may-0.011
ah-0.011
 snowball-0.010
ience-0.010
 cannon-0.010
Token a
Token ID257
Feature activation+1.05e-03

Pos logit contributions
thank+0.026
please+0.022
organized+0.021
 met+0.021
 neighborhood+0.020
Neg logit contributions
 cannon-0.037
 snowball-0.036
 may-0.036
ah-0.036
affe-0.036
Token new
Token ID649
Feature activation-1.75e-03

Pos logit contributions
thank+0.001
 responsible+0.001
please+0.001
 neighborhood+0.001
 met+0.001
Neg logit contributions
rum-0.002
affe-0.002
ah-0.002
 cannon-0.002
 tentacles-0.002
Token doll
Token ID3654
Feature activation+3.01e-02

Pos logit contributions
affe+0.003
 may+0.003
 cannon+0.003
ience+0.003
ah+0.003
Neg logit contributions
thank-0.003
 neighborhood-0.003
 met-0.003
please-0.003
organized-0.003
Token!
Token ID0
Feature activation+3.25e-03

Pos logit contributions
thank+0.051
 met+0.044
please+0.043
organized+0.043
 neighborhood+0.041
Neg logit contributions
 snowball-0.050
 cannon-0.050
ah-0.049
 tentacles-0.048
 sled-0.047
Token Mom
Token ID11254
Feature activation+6.70e-02

Pos logit contributions
thank+0.005
organized+0.005
please+0.005
 met+0.005
 responsible+0.004
Neg logit contributions
 snowball-0.005
 spear-0.005
bee-0.005
 globe-0.005
 cannon-0.005
Tokeno
Token ID78
Feature activation+1.31e-02

Pos logit contributions
thank+0.097
 met+0.081
please+0.081
organized+0.081
 neighborhood+0.077
Neg logit contributions
 snowball-0.126
 cannon-0.126
ah-0.125
affe-0.123
 tentacles-0.121
Token was
Token ID373
Feature activation+3.67e-03

Pos logit contributions
thank+0.023
 met+0.021
 punished+0.019
organized+0.019
please+0.019
Neg logit contributions
 snowball-0.021
 sled-0.021
 cannon-0.021
 spoiler-0.020
 tentacles-0.020
Token so
Token ID523
Feature activation+1.95e-02

Pos logit contributions
thank+0.005
please+0.004
organized+0.004
 met+0.004
 neighborhood+0.004
Neg logit contributions
ah-0.007
 cannon-0.007
 snowball-0.007
 may-0.007
affe-0.007
Token excited
Token ID6568
Feature activation+1.27e-02

Pos logit contributions
thank+0.026
please+0.021
organized+0.021
 met+0.021
 neighborhood+0.021
Neg logit contributions
ah-0.039
 cannon-0.038
 snowball-0.038
 may-0.038
 button-0.037
Token when
Token ID618
Feature activation+9.86e-03

Pos logit contributions
thank+0.021
please+0.018
 met+0.018
organized+0.018
 neighborhood+0.018
Neg logit contributions
ah-0.021
 may-0.021
 snowball-0.021
 cannon-0.020
affe-0.020
Token.
Token ID13
Feature activation+1.77e-02

Pos logit contributions
thank+0.052
please+0.048
 punished+0.043
organized+0.043
 met+0.043
Neg logit contributions
 button-0.034
ah-0.032
 may-0.032
hell-0.029
 medal-0.029
Token
Token ID198
Feature activation+1.39e-02

Pos logit contributions
thank+0.035
please+0.033
 punished+0.031
organized+0.030
 doctors+0.030
Neg logit contributions
 may-0.023
 button-0.023
ah-0.022
 It-0.021
 cannon-0.020
Token
Token ID198
Feature activation+1.94e-02

Pos logit contributions
thank+0.027
please+0.023
organized+0.023
 punished+0.023
 doctors+0.022
Neg logit contributions
ah-0.019
 button-0.019
 may-0.018
 snowball-0.018
 sled-0.018
TokenWhen
Token ID2215
Feature activation+1.53e-02

Pos logit contributions
thank+0.035
please+0.030
organized+0.029
 met+0.028
 punished+0.028
Neg logit contributions
ah-0.029
 button-0.028
 may-0.027
hell-0.027
 snowball-0.027
Token she
Token ID673
Feature activation+2.48e-02

Pos logit contributions
thank+0.028
please+0.026
organized+0.025
 punished+0.024
 met+0.023
Neg logit contributions
 may-0.023
 button-0.023
ah-0.023
 sled-0.021
 cannon-0.020
Token arrived
Token ID5284
Feature activation+6.69e-02

Pos logit contributions
thank+0.044
please+0.040
organized+0.040
 responsible+0.038
 doctors+0.037
Neg logit contributions
ah-0.040
 may-0.040
 button-0.038
 cannon-0.035
 snowball-0.034
Token,
Token ID11
Feature activation+3.39e-02

Pos logit contributions
thank+0.134
please+0.125
organized+0.112
 responsible+0.097
 punished+0.097
Neg logit contributions
 button-0.095
 may-0.093
 lizard-0.088
 medal-0.087
ience-0.085
Token the
Token ID262
Feature activation+2.19e-02

Pos logit contributions
thank+0.065
please+0.062
organized+0.056
 punished+0.053
 neighborhood+0.053
Neg logit contributions
 may-0.049
 button-0.048
ah-0.044
 globe-0.042
 lizard-0.042
Token teacher
Token ID4701
Feature activation+2.80e-02

Pos logit contributions
thank+0.039
please+0.034
organized+0.033
 punished+0.031
 met+0.030
Neg logit contributions
 button-0.037
 lizard-0.034
 medal-0.033
 may-0.033
ah-0.031
Token taught
Token ID7817
Feature activation+6.06e-03

Pos logit contributions
thank+0.042
please+0.035
organized+0.035
 responsible+0.032
 neighborhood+0.032
Neg logit contributions
ah-0.052
 may-0.051
 button-0.050
 cannon-0.049
affe-0.048
Token them
Token ID606
Feature activation+1.22e-02

Pos logit contributions
thank+0.008
please+0.007
 met+0.007
organized+0.007
 neighborhood+0.007
Neg logit contributions
ah-0.012
 cannon-0.011
 snowball-0.011
 button-0.011
 may-0.011
Token can
Token ID460
Feature activation+7.49e-03

Pos logit contributions
 tentacles+0.054
 cannon+0.054
ah+0.054
 snowball+0.053
fly+0.052
Neg logit contributions
thank-0.057
 met-0.054
organized-0.052
 responsible-0.052
 neighborhood-0.052
Token't
Token ID470
Feature activation+4.35e-03

Pos logit contributions
thank+0.011
 met+0.010
 doctors+0.010
organized+0.010
please+0.010
Neg logit contributions
ah-0.014
 snowball-0.013
 cannon-0.013
affe-0.013
 tentacles-0.012
Token I
Token ID314
Feature activation-1.09e-02

Pos logit contributions
thank+0.008
please+0.007
organized+0.007
 met+0.007
 neighborhood+0.007
Neg logit contributions
 may-0.006
 button-0.006
 cannon-0.006
ah-0.006
 snowball-0.006
Token go
Token ID467
Feature activation+3.80e-02

Pos logit contributions
 cannon+0.021
ience+0.021
ah+0.020
 tentacles+0.020
 snowball+0.020
Neg logit contributions
thank-0.015
 met-0.014
 neighborhood-0.013
please-0.013
 punished-0.012
Token,
Token ID11
Feature activation+1.97e-02

Pos logit contributions
thank+0.053
please+0.044
 met+0.044
organized+0.043
 neighborhood+0.042
Neg logit contributions
ah-0.074
 cannon-0.073
 snowball-0.073
 may-0.072
affe-0.072
Token Mom
Token ID11254
Feature activation+6.68e-02

Pos logit contributions
 doctors+0.012
 organized+0.011
 responsible+0.011
 bored+0.011
 tidy+0.010
Neg logit contributions
ah-0.051
 cannon-0.050
pe-0.049
haw-0.049
rum-0.048
Tokenmy
Token ID1820
Feature activation+2.03e-02

Pos logit contributions
thank+0.106
 met+0.088
please+0.085
organized+0.085
 neighborhood+0.084
Neg logit contributions
ah-0.120
 snowball-0.116
 may-0.112
 button-0.111
fly-0.111
Token?"
Token ID1701
Feature activation+2.42e-02

Pos logit contributions
thank+0.033
organized+0.027
 met+0.027
 neighborhood+0.027
ups+0.026
Neg logit contributions
ah-0.035
 snowball-0.033
 may-0.033
fly-0.033
 button-0.032
Token
Token ID198
Feature activation-1.80e-03

Pos logit contributions
thank+0.048
organized+0.043
please+0.041
ups+0.041
 punished+0.041
Neg logit contributions
 may-0.033
 button-0.032
ah-0.031
 snowball-0.031
 tentacles-0.030
Token
Token ID198
Feature activation+2.57e-03

Pos logit contributions
 snowball+0.002
 button+0.002
 may+0.002
ah+0.002
 sled+0.002
Neg logit contributions
thank-0.004
organized-0.003
please-0.003
 punished-0.003
 doctors-0.003
TokenMom
Token ID29252
Feature activation+3.43e-02

Pos logit contributions
thank+0.004
 met+0.003
please+0.003
organized+0.003
 neighborhood+0.003
Neg logit contributions
ah-0.004
 snowball-0.004
 may-0.004
 button-0.004
 cannon-0.004
Token I
Token ID314
Feature activation-6.95e-03

Pos logit contributions
ah+0.013
 snowball+0.013
 cannon+0.013
affe+0.013
 tentacles+0.013
Neg logit contributions
thank-0.013
 met-0.012
please-0.011
organized-0.011
 neighborhood-0.011
Token play
Token ID711
Feature activation+2.72e-02

Pos logit contributions
 cannon+0.013
 tentacles+0.012
 snowball+0.012
affe+0.012
ah+0.012
Neg logit contributions
thank-0.011
please-0.009
 met-0.009
organized-0.009
 neighborhood-0.009
Token it
Token ID340
Feature activation-8.30e-03

Pos logit contributions
thank+0.042
please+0.036
 met+0.035
organized+0.035
 neighborhood+0.034
Neg logit contributions
 snowball-0.048
ah-0.048
 button-0.047
 cannon-0.047
 may-0.046
Token today
Token ID1909
Feature activation-1.43e-02

Pos logit contributions
ah+0.014
 snowball+0.014
 may+0.013
 button+0.013
 cannon+0.013
Neg logit contributions
thank-0.014
please-0.012
organized-0.012
 met-0.012
 neighborhood-0.012
Token,
Token ID11
Feature activation+1.67e-02

Pos logit contributions
ah+0.025
 may+0.024
 snowball+0.024
 button+0.024
 cannon+0.023
Neg logit contributions
thank-0.023
please-0.020
organized-0.020
 met-0.020
 neighborhood-0.019
Token Mom
Token ID11254
Feature activation+6.66e-02

Pos logit contributions
thank+0.021
please+0.017
 met+0.017
organized+0.017
 neighborhood+0.016
Neg logit contributions
ah-0.035
 cannon-0.034
 tentacles-0.034
 snowball-0.034
ience-0.033
Tokenmy
Token ID1820
Feature activation+2.14e-02

Pos logit contributions
thank+0.103
 met+0.086
organized+0.083
please+0.083
 neighborhood+0.083
Neg logit contributions
ah-0.122
 snowball-0.118
 may-0.114
 cannon-0.113
 button-0.113
Token?"
Token ID1701
Feature activation+2.10e-02

Pos logit contributions
thank+0.032
 neighborhood+0.027
organized+0.027
 met+0.027
ups+0.026
Neg logit contributions
ah-0.040
 may-0.038
 snowball-0.037
 button-0.036
fly-0.036
Token
Token ID198
Feature activation-8.78e-03

Pos logit contributions
thank+0.048
organized+0.046
 punished+0.045
 responsible+0.045
ups+0.045
Neg logit contributions
 may-0.019
ah-0.018
 snowball-0.018
 button-0.018
 sled-0.018
Token
Token ID198
Feature activation-5.07e-03

Pos logit contributions
 snowball+0.011
ah+0.011
 button+0.011
 may+0.011
 sled+0.011
Neg logit contributions
thank-0.018
organized-0.016
 punished-0.015
 doctors-0.015
please-0.015
TokenMom
Token ID29252
Feature activation+3.57e-02

Pos logit contributions
ah+0.008
 snowball+0.008
 button+0.008
affe+0.008
 may+0.008
Neg logit contributions
thank-0.008
 met-0.007
organized-0.007
please-0.007
 neighborhood-0.007
Token show
Token ID905
Feature activation+5.64e-03

Pos logit contributions
ah+0.004
 cannon+0.004
 may+0.004
 snowball+0.004
 button+0.004
Neg logit contributions
thank-0.004
please-0.004
 met-0.004
organized-0.004
 neighborhood-0.004
Token her
Token ID607
Feature activation+1.78e-02

Pos logit contributions
thank+0.008
please+0.007
organized+0.007
 met+0.007
 neighborhood+0.006
Neg logit contributions
ah-0.011
 cannon-0.011
 snowball-0.011
 may-0.011
affe-0.010
Token.
Token ID13
Feature activation+4.87e-04

Pos logit contributions
thank+0.023
organized+0.019
 met+0.019
 neighborhood+0.018
please+0.018
Neg logit contributions
ah-0.037
affe-0.036
 cannon-0.036
 snowball-0.036
 may-0.034
Token When
Token ID1649
Feature activation+1.43e-02

Pos logit contributions
thank+0.001
please+0.001
 met+0.001
organized+0.001
 neighborhood+0.001
Neg logit contributions
ah-0.001
 cannon-0.001
 snowball-0.001
 may-0.001
affe-0.001
Token she
Token ID673
Feature activation+1.80e-02

Pos logit contributions
thank+0.025
organized+0.023
please+0.022
 met+0.021
 neighborhood+0.021
Neg logit contributions
 may-0.022
ah-0.022
 button-0.021
 snowball-0.021
 sled-0.021
Token arrived
Token ID5284
Feature activation+6.64e-02

Pos logit contributions
thank+0.029
organized+0.025
please+0.025
ups+0.024
 doctors+0.024
Neg logit contributions
 may-0.032
ah-0.031
 button-0.029
 cannon-0.029
 snowball-0.028
Token,
Token ID11
Feature activation+3.66e-02

Pos logit contributions
thank+0.147
please+0.136
organized+0.131
 met+0.123
 doctors+0.121
Neg logit contributions
 may-0.074
 button-0.072
ah-0.068
ience-0.067
 sled-0.067
Token Grand
Token ID5675
Feature activation+4.66e-03

Pos logit contributions
thank+0.066
please+0.058
organized+0.057
 met+0.054
 neighborhood+0.054
Neg logit contributions
 may-0.057
 button-0.056
ah-0.055
 snowball-0.055
 sled-0.054
Tokenpa
Token ID8957
Feature activation+1.20e-02

Pos logit contributions
thank+0.007
please+0.006
 met+0.006
organized+0.006
 neighborhood+0.006
Neg logit contributions
ah-0.008
 snowball-0.008
 cannon-0.008
 may-0.008
affe-0.008
Token said
Token ID531
Feature activation+1.22e-03

Pos logit contributions
thank+0.016
please+0.013
 met+0.013
organized+0.013
 neighborhood+0.013
Neg logit contributions
ah-0.024
 snowball-0.024
 cannon-0.024
 may-0.023
affe-0.023
Token,
Token ID11
Feature activation+1.02e-02

Pos logit contributions
thank+0.002
 met+0.002
 doctors+0.002
 responsible+0.001
 bored+0.001
Neg logit contributions
 snowball-0.002
ah-0.002
affe-0.002
 tentacles-0.002
 cannon-0.002
Token excited
Token ID6568
Feature activation+1.25e-02

Pos logit contributions
thank+0.023
please+0.020
organized+0.019
 neighborhood+0.019
ups+0.019
Neg logit contributions
 may-0.023
ah-0.023
 cannon-0.023
 snowball-0.023
 button-0.023
Token!
Token ID0
Feature activation-2.53e-03

Pos logit contributions
thank+0.021
please+0.018
 met+0.018
organized+0.018
 neighborhood+0.017
Neg logit contributions
ah-0.021
 may-0.020
 snowball-0.020
affe-0.020
 cannon-0.020
Token But
Token ID887
Feature activation-1.17e-03

Pos logit contributions
ah+0.005
bee+0.005
 cannon+0.004
hell+0.004
ience+0.004
Neg logit contributions
thank-0.004
 neighborhood-0.003
 met-0.003
 neighbourhood-0.003
 asked-0.003
Token when
Token ID618
Feature activation+1.23e-02

Pos logit contributions
affe+0.003
hell+0.002
 cannon+0.002
 snowball+0.002
ah+0.002
Neg logit contributions
thank-0.001
 met-0.001
 when-0.001
 doctors-0.001
 neighborhood-0.001
Token they
Token ID484
Feature activation+1.37e-02

Pos logit contributions
thank+0.023
 met+0.021
please+0.020
 neighborhood+0.020
 doctors+0.019
Neg logit contributions
ah-0.018
affe-0.018
 snowball-0.017
 cannon-0.017
hell-0.017
Token arrived
Token ID5284
Feature activation+6.64e-02

Pos logit contributions
thank+0.021
please+0.018
organized+0.018
 neighborhood+0.017
 met+0.017
Neg logit contributions
ah-0.025
 may-0.025
 cannon-0.024
 button-0.024
 snowball-0.024
Token,
Token ID11
Feature activation+2.89e-02

Pos logit contributions
thank+0.114
please+0.100
organized+0.096
 met+0.093
 neighborhood+0.089
Neg logit contributions
 may-0.108
 cannon-0.108
ah-0.108
 button-0.104
 spear-0.104
Token the
Token ID262
Feature activation+1.72e-02

Pos logit contributions
thank+0.056
 met+0.053
 neighborhood+0.050
 doctors+0.049
please+0.049
Neg logit contributions
ah-0.040
affe-0.039
 snowball-0.038
 cannon-0.037
hell-0.036
Token place
Token ID1295
Feature activation+1.20e-02

Pos logit contributions
thank+0.024
 met+0.020
 neighborhood+0.020
organized+0.020
please+0.020
Neg logit contributions
ah-0.034
 snowball-0.033
affe-0.033
 cannon-0.033
 may-0.032
Token was
Token ID373
Feature activation-2.09e-03

Pos logit contributions
thank+0.019
 met+0.016
organized+0.016
please+0.016
 neighborhood+0.016
Neg logit contributions
ah-0.021
 snowball-0.021
 cannon-0.021
affe-0.020
 may-0.020
Token very
Token ID845
Feature activation+7.24e-03

Pos logit contributions
 snowball+0.004
ah+0.004
 may+0.004
 cannon+0.004
 button+0.004
Neg logit contributions
thank-0.003
please-0.003
organized-0.003
 met-0.003
 neighborhood-0.003
Token.
Token ID13
Feature activation+8.11e-03

Pos logit contributions
thank+0.022
please+0.019
organized+0.019
 met+0.019
 punished+0.018
Neg logit contributions
ah-0.023
 snowball-0.023
 cannon-0.023
 button-0.022
 may-0.022
Token
Token ID198
Feature activation+8.88e-03

Pos logit contributions
thank+0.011
 neighborhood+0.010
 met+0.009
organized+0.009
please+0.009
Neg logit contributions
fly-0.016
ah-0.016
 cannon-0.016
affe-0.016
bee-0.015
Token
Token ID198
Feature activation+1.45e-02

Pos logit contributions
thank+0.012
 met+0.010
 neighborhood+0.010
please+0.010
organized+0.010
Neg logit contributions
 cannon-0.017
ah-0.017
 snowball-0.017
 may-0.017
affe-0.017
TokenWhen
Token ID2215
Feature activation+1.30e-02

Pos logit contributions
thank+0.023
 met+0.021
please+0.020
 neighborhood+0.020
organized+0.020
Neg logit contributions
 cannon-0.025
fly-0.024
 tentacles-0.024
ah-0.024
 may-0.024
Token they
Token ID484
Feature activation+1.78e-02

Pos logit contributions
thank+0.021
 met+0.018
please+0.018
organized+0.018
 neighborhood+0.018
Neg logit contributions
ah-0.022
affe-0.022
 cannon-0.022
 snowball-0.021
fly-0.021
Token arrived
Token ID5284
Feature activation+6.63e-02

Pos logit contributions
thank+0.029
please+0.025
organized+0.025
 met+0.025
 neighborhood+0.024
Neg logit contributions
ah-0.030
 cannon-0.030
 snowball-0.029
 may-0.029
affe-0.029
Token they
Token ID484
Feature activation+2.80e-02

Pos logit contributions
thank+0.117
please+0.103
organized+0.098
 met+0.088
 punished+0.087
Neg logit contributions
 sled-0.111
 spear-0.107
 cannon-0.106
ah-0.106
 snowball-0.105
Token unloaded
Token ID49975
Feature activation+2.82e-02

Pos logit contributions
thank+0.040
organized+0.037
ups+0.033
please+0.033
 responsible+0.032
Neg logit contributions
 may-0.055
ah-0.053
 cannon-0.052
 sled-0.051
 carrot-0.048
Token the
Token ID262
Feature activation+3.63e-02

Pos logit contributions
thank+0.038
organized+0.031
 neighborhood+0.031
 met+0.031
 doctors+0.030
Neg logit contributions
ah-0.057
affe-0.056
fly-0.053
 may-0.053
ience-0.053
Token car
Token ID1097
Feature activation+1.56e-02

Pos logit contributions
thank+0.071
 met+0.063
organized+0.062
please+0.062
 neighborhood+0.062
Neg logit contributions
ah-0.050
 cannon-0.049
affe-0.048
 snowball-0.048
 may-0.047
Token and
Token ID290
Feature activation+1.62e-02

Pos logit contributions
thank+0.028
please+0.025
 met+0.024
organized+0.024
 neighborhood+0.023
Neg logit contributions
ah-0.024
 cannon-0.024
 snowball-0.024
 may-0.024
 button-0.023
Token you
Token ID345
Feature activation-1.79e-02

Pos logit contributions
thank+0.015
please+0.013
 met+0.012
organized+0.012
 doctors+0.011
Neg logit contributions
ah-0.023
 snowball-0.023
 cannon-0.023
affe-0.022
rum-0.022
Token?"
Token ID1701
Feature activation+2.15e-02

Pos logit contributions
ah+0.036
 cannon+0.035
 snowball+0.035
 may+0.035
affe+0.035
Neg logit contributions
thank-0.024
please-0.019
organized-0.019
 met-0.019
 neighborhood-0.019
Token
Token ID198
Feature activation-1.11e-02

Pos logit contributions
thank+0.037
organized+0.032
please+0.031
 met+0.031
 neighborhood+0.030
Neg logit contributions
 snowball-0.035
ah-0.035
 may-0.035
affe-0.034
 cannon-0.034
Token
Token ID198
Feature activation-5.74e-03

Pos logit contributions
 snowball+0.017
ah+0.016
affe+0.016
 may+0.016
 sled+0.016
Neg logit contributions
thank-0.021
organized-0.018
please-0.018
 met-0.017
 punished-0.017
TokenB
Token ID33
Feature activation-1.05e-02

Pos logit contributions
ah+0.010
 snowball+0.010
 cannon+0.010
 may+0.010
affe+0.010
Neg logit contributions
thank-0.009
please-0.008
 met-0.008
organized-0.008
 neighborhood-0.007
Tokenrun
Token ID5143
Feature activation-8.29e-02

Pos logit contributions
 cannon+0.020
 prope+0.020
 hawk+0.020
 button+0.020
 tentacles+0.020
Neg logit contributions
thank-0.014
organized-0.013
please-0.012
udd-0.012
ave-0.012
Tokeno
Token ID78
Feature activation-7.97e-03

Pos logit contributions
 cannon+0.133
 sled+0.127
 boulder+0.126
 spear+0.126
 button+0.125
Neg logit contributions
thank-0.146
 met-0.135
organized-0.125
please-0.120
ups-0.117
Token said
Token ID531
Feature activation-1.51e-02

Pos logit contributions
 cannon+0.016
 snowball+0.016
rum+0.015
 may+0.015
ah+0.015
Neg logit contributions
thank-0.011
 met-0.010
please-0.009
organized-0.008
 neighborhood-0.008
Token,
Token ID11
Feature activation-1.23e-02

Pos logit contributions
 snowball+0.025
ah+0.025
 cannon+0.024
affe+0.024
 may+0.024
Neg logit contributions
thank-0.025
 met-0.022
please-0.022
organized-0.021
 responsible-0.020
Token "
Token ID366
Feature activation-5.83e-03

Pos logit contributions
ah+0.021
 snowball+0.020
 cannon+0.020
affe+0.020
 may+0.020
Neg logit contributions
thank-0.021
please-0.018
 met-0.018
organized-0.017
 neighborhood-0.017
TokenOf
Token ID5189
Feature activation-4.91e-03

Pos logit contributions
 cannon+0.012
 snowball+0.012
ah+0.012
 may+0.012
 tentacles+0.012
Neg logit contributions
thank-0.008
please-0.007
 met-0.006
organized-0.006
 responsible-0.006
Token She
Token ID1375
Feature activation-2.70e-02

Pos logit contributions
ah+0.073
 cannon+0.072
 snowball+0.071
 tentacles+0.070
ience+0.070
Neg logit contributions
thank-0.051
please-0.043
 met-0.043
 neighborhood-0.043
organized-0.041
Token takes
Token ID2753
Feature activation-5.21e-03

Pos logit contributions
ah+0.039
 cannon+0.039
affe+0.038
 snowball+0.038
fly+0.037
Neg logit contributions
thank-0.048
 met-0.046
 neighborhood-0.044
 punished-0.042
 doctors-0.041
Token out
Token ID503
Feature activation-6.93e-03

Pos logit contributions
affe+0.011
ah+0.011
 may+0.011
 cannon+0.011
 snowball+0.011
Neg logit contributions
thank-0.006
 met-0.005
organized-0.005
please-0.005
 doctors-0.005
Token a
Token ID257
Feature activation-1.67e-02

Pos logit contributions
affe+0.017
 may+0.017
fly+0.017
 cannon+0.017
ah+0.016
Neg logit contributions
 doctors-0.005
thank-0.005
 organized-0.004
bel-0.004
 bored-0.004
Token magn
Token ID7842
Feature activation-3.91e-02

Pos logit contributions
affe+0.029
bee+0.029
ah+0.029
 tentacles+0.028
rum+0.028
Neg logit contributions
thank-0.023
 met-0.023
 neighborhood-0.022
 responsible-0.022
 organized-0.022
Tokenifying
Token ID4035
Feature activation-8.28e-02

Pos logit contributions
 snowball+0.062
haw+0.058
 nut+0.058
bee+0.057
 cannon+0.055
Neg logit contributions
 outside-0.058
amed-0.058
iring-0.055
thank-0.054
organized-0.054
Token glass
Token ID5405
Feature activation-3.72e-02

Pos logit contributions
ah+0.101
affe+0.098
bee+0.097
 cannon+0.096
 may+0.096
Neg logit contributions
thank-0.175
 met-0.154
 neighborhood-0.151
organized-0.150
please-0.150
Token.
Token ID13
Feature activation-1.63e-02

Pos logit contributions
haw+0.074
ah+0.072
ience+0.070
affe+0.069
 may+0.069
Neg logit contributions
thank-0.050
 outside-0.041
ups-0.041
 organized-0.040
 doctors-0.038
Token She
Token ID1375
Feature activation-1.65e-02

Pos logit contributions
ah+0.031
ience+0.030
 snowball+0.029
affe+0.029
 cannon+0.029
Neg logit contributions
thank-0.023
 neighborhood-0.020
please-0.020
 met-0.020
organized-0.019
Token looks
Token ID3073
Feature activation-3.10e-03

Pos logit contributions
ah+0.024
 cannon+0.023
affe+0.023
 snowball+0.023
fly+0.021
Neg logit contributions
thank-0.031
 met-0.029
 neighborhood-0.028
please-0.027
organized-0.027
Token at
Token ID379
Feature activation+7.70e-03

Pos logit contributions
 may+0.006
 snowball+0.006
ah+0.006
 cannon+0.006
 button+0.006
Neg logit contributions
thank-0.005
organized-0.004
please-0.004
 neighborhood-0.004
ups-0.004
Token the
Token ID262
Feature activation+7.87e-03

Pos logit contributions
affe+0.004
ah+0.004
 cannon+0.004
fly+0.004
 spoiler+0.004
Neg logit contributions
 doctors-0.001
 outside-0.001
ups-0.001
 met-0.001
thank-0.001
Token necklace
Token ID38987
Feature activation-3.46e-02

Pos logit contributions
thank+0.015
 met+0.014
 neighborhood+0.013
 doctors+0.013
 responsible+0.012
Neg logit contributions
ah-0.012
rum-0.012
ience-0.011
affe-0.011
 snowball-0.010
Token and
Token ID290
Feature activation-8.29e-03

Pos logit contributions
 cannon+0.061
 snowball+0.061
ah+0.060
ience+0.059
haw+0.059
Neg logit contributions
thank-0.055
 met-0.046
organized-0.044
 outside-0.044
 neighborhood-0.044
Token the
Token ID262
Feature activation+6.66e-03

Pos logit contributions
 snowball+0.014
 cannon+0.014
 may+0.014
 tentacles+0.014
 carrot+0.014
Neg logit contributions
thank-0.013
organized-0.012
please-0.012
 neighborhood-0.012
 responsible-0.011
Token magn
Token ID7842
Feature activation-3.71e-02

Pos logit contributions
thank+0.013
 met+0.012
 neighborhood+0.012
please+0.011
 doctors+0.011
Neg logit contributions
ah-0.009
rum-0.009
 snowball-0.008
ience-0.008
affe-0.008
Tokenifying
Token ID4035
Feature activation-8.21e-02

Pos logit contributions
 snowball+0.058
rum+0.052
 spear+0.051
 sled+0.049
 cannon+0.049
Neg logit contributions
thank-0.063
amed-0.059
 met-0.059
 outside-0.057
organized-0.055
Token glass
Token ID5405
Feature activation-2.92e-02

Pos logit contributions
ah+0.092
 cannon+0.087
affe+0.087
 snowball+0.087
 may+0.086
Neg logit contributions
thank-0.183
 met-0.162
please-0.160
organized-0.159
 neighborhood-0.159
Token and
Token ID290
Feature activation+2.14e-04

Pos logit contributions
ah+0.066
ience+0.064
 snowball+0.064
haw+0.063
affe+0.063
Neg logit contributions
thank-0.032
ups-0.025
 met-0.024
 outside-0.022
 doctors-0.021
Token started
Token ID2067
Feature activation+2.57e-02

Pos logit contributions
thank+0.000
organized+0.000
please+0.000
 neighborhood+0.000
 responsible+0.000
Neg logit contributions
 tentacles-0.000
 may-0.000
 cannon-0.000
 carrot-0.000
 snowball-0.000
Token to
Token ID284
Feature activation+6.88e-03

Pos logit contributions
thank+0.030
please+0.024
 met+0.024
organized+0.023
 neighborhood+0.022
Neg logit contributions
ah-0.058
affe-0.057
 may-0.056
 cannon-0.055
 snowball-0.055
Token examine
Token ID10716
Feature activation+2.38e-02

Pos logit contributions
thank+0.011
 met+0.009
please+0.009
organized+0.009
 neighborhood+0.008
Neg logit contributions
affe-0.013
ah-0.013
ience-0.012
rum-0.012
 cannon-0.012
Token He
Token ID679
Feature activation-2.19e-02

Pos logit contributions
thank+0.016
 met+0.014
 neighborhood+0.014
organized+0.014
please+0.014
Neg logit contributions
ah-0.015
 cannon-0.015
 snowball-0.015
 may-0.015
ience-0.015
Token took
Token ID1718
Feature activation+5.09e-03

Pos logit contributions
ah+0.041
 snowball+0.040
 cannon+0.040
 may+0.040
affe+0.039
Neg logit contributions
thank-0.033
please-0.027
organized-0.027
 met-0.026
 neighborhood-0.026
Token out
Token ID503
Feature activation-1.52e-04

Pos logit contributions
thank+0.008
 met+0.006
organized+0.006
please+0.006
 neighborhood+0.006
Neg logit contributions
ah-0.009
 cannon-0.009
 snowball-0.009
 may-0.009
affe-0.009
Token a
Token ID257
Feature activation-1.34e-02

Pos logit contributions
fly+0.000
 may+0.000
affe+0.000
ah+0.000
bee+0.000
Neg logit contributions
 doctors-0.000
thank-0.000
 met-0.000
 organized-0.000
 outside-0.000
Token magn
Token ID7842
Feature activation-3.44e-02

Pos logit contributions
affe+0.022
rum+0.022
ah+0.022
 tentacles+0.021
bee+0.021
Neg logit contributions
thank-0.020
 met-0.020
 neighborhood-0.019
 responsible-0.018
 organized-0.018
Tokenifying
Token ID4035
Feature activation-7.82e-02

Pos logit contributions
 snowball+0.050
 nut+0.047
rum+0.047
 cannon+0.045
bee+0.044
Neg logit contributions
 outside-0.057
thank-0.056
amed-0.054
organized-0.053
ave-0.052
Token glass
Token ID5405
Feature activation-3.47e-02

Pos logit contributions
ah+0.095
affe+0.092
 cannon+0.092
 snowball+0.092
 may+0.091
Neg logit contributions
thank-0.165
 met-0.146
 neighborhood-0.144
please-0.143
organized-0.142
Token and
Token ID290
Feature activation-3.68e-03

Pos logit contributions
haw+0.065
ience+0.064
ah+0.064
rum+0.063
 spoiler+0.063
Neg logit contributions
thank-0.047
 outside-0.040
ups-0.038
 met-0.038
 neighborhood-0.038
Token a
Token ID257
Feature activation-7.49e-03

Pos logit contributions
 may+0.007
 cannon+0.007
ah+0.007
 button+0.007
 snowball+0.007
Neg logit contributions
thank-0.005
please-0.005
organized-0.005
 neighborhood-0.004
 met-0.004
Token notebook
Token ID20922
Feature activation-1.06e-02

Pos logit contributions
rum+0.011
ah+0.011
affe+0.011
ience+0.011
ion+0.011
Neg logit contributions
thank-0.013
 met-0.012
 neighborhood-0.012
 responsible-0.011
please-0.011
Token and
Token ID290
Feature activation-1.14e-02

Pos logit contributions
ah+0.019
 snowball+0.019
 cannon+0.019
 may+0.019
affe+0.018
Neg logit contributions
thank-0.016
 met-0.013
organized-0.013
please-0.013
 neighborhood-0.013
Token When
Token ID1649
Feature activation+2.18e-02

Pos logit contributions
thank+0.030
 met+0.027
 neighborhood+0.026
please+0.026
organized+0.024
Neg logit contributions
ah-0.039
 snowball-0.037
 cannon-0.037
ience-0.037
bee-0.037
Token Daisy
Token ID40355
Feature activation+2.02e-03

Pos logit contributions
thank+0.031
 met+0.028
 doctors+0.027
please+0.026
 neighborhood+0.025
Neg logit contributions
affe-0.041
ah-0.041
 snowball-0.040
 cannon-0.039
rum-0.039
Token put
Token ID1234
Feature activation+1.33e-02

Pos logit contributions
thank+0.003
 met+0.002
please+0.002
 neighborhood+0.002
 punished+0.002
Neg logit contributions
 cannon-0.004
 snowball-0.004
 spoiler-0.004
affe-0.004
 tentacles-0.004
Token the
Token ID262
Feature activation+2.39e-02

Pos logit contributions
thank+0.016
 met+0.014
 outside+0.013
 doctors+0.012
 bored+0.012
Neg logit contributions
affe-0.029
 cannon-0.028
ah-0.027
fly-0.027
ience-0.027
Token magn
Token ID7842
Feature activation-2.44e-02

Pos logit contributions
thank+0.044
 neighborhood+0.037
 met+0.037
 doctors+0.037
please+0.036
Neg logit contributions
rum-0.035
ah-0.035
affe-0.034
 may-0.034
 cannon-0.034
Tokenifying
Token ID4035
Feature activation-7.79e-02

Pos logit contributions
 snowball+0.037
rum+0.034
 cannon+0.033
 sled+0.033
haw+0.032
Neg logit contributions
thank-0.042
 outside-0.041
amed-0.040
organized-0.039
 met-0.038
Token glass
Token ID5405
Feature activation-1.26e-02

Pos logit contributions
ah+0.107
bee+0.106
affe+0.105
ience+0.105
 cannon+0.103
Neg logit contributions
thank-0.152
 met-0.132
 neighborhood-0.128
organized-0.125
please-0.124
Token down
Token ID866
Feature activation+4.22e-03

Pos logit contributions
ah+0.022
affe+0.021
 cannon+0.021
 snowball+0.021
haw+0.021
Neg logit contributions
thank-0.020
 met-0.017
organized-0.017
ups-0.016
 neighborhood-0.016
Token,
Token ID11
Feature activation+1.40e-02

Pos logit contributions
thank+0.006
 met+0.005
 neighborhood+0.005
organized+0.005
 doctors+0.005
Neg logit contributions
ah-0.008
 snowball-0.008
affe-0.008
 cannon-0.008
ience-0.007
Token the
Token ID262
Feature activation+1.14e-02

Pos logit contributions
thank+0.016
 met+0.014
please+0.012
organized+0.012
 neighborhood+0.012
Neg logit contributions
ah-0.031
 cannon-0.031
 snowball-0.030
affe-0.030
 tentacles-0.029
Token cater
Token ID20825
Feature activation-3.47e-02

Pos logit contributions
thank+0.021
 met+0.018
 neighborhood+0.018
please+0.018
 doctors+0.017
Neg logit contributions
ah-0.018
rum-0.017
 cannon-0.017
 snowball-0.017
affe-0.016
Token She
Token ID1375
Feature activation-1.12e-03

Pos logit contributions
ah+0.011
 snowball+0.011
 cannon+0.011
 may+0.010
affe+0.010
Neg logit contributions
thank-0.009
 met-0.007
please-0.007
organized-0.007
 neighborhood-0.007
Token took
Token ID1718
Feature activation+1.66e-02

Pos logit contributions
ah+0.002
 cannon+0.002
 snowball+0.002
affe+0.002
 may+0.002
Neg logit contributions
thank-0.001
 met-0.001
please-0.001
organized-0.001
 neighborhood-0.001
Token out
Token ID503
Feature activation+5.30e-03

Pos logit contributions
thank+0.019
 met+0.018
 doctors+0.016
organized+0.015
please+0.015
Neg logit contributions
affe-0.035
ah-0.035
 cannon-0.034
 globe-0.034
rum-0.034
Token a
Token ID257
Feature activation-7.10e-03

Pos logit contributions
thank+0.004
 doctors+0.004
 met+0.004
ups+0.004
 organized+0.004
Neg logit contributions
affe-0.013
ah-0.013
 cannon-0.013
 may-0.013
 spoiler-0.012
Token magn
Token ID7842
Feature activation-3.02e-02

Pos logit contributions
rum+0.014
affe+0.014
bee+0.013
ah+0.013
haw+0.013
Neg logit contributions
 met-0.009
thank-0.008
 organized-0.008
 neighborhood-0.008
 outside-0.008
Tokenifying
Token ID4035
Feature activation-7.63e-02

Pos logit contributions
 snowball+0.044
haw+0.039
rum+0.039
 nut+0.039
 cannon+0.039
Neg logit contributions
amed-0.051
thank-0.050
 outside-0.050
organized-0.049
ave-0.048
Token glass
Token ID5405
Feature activation-2.84e-02

Pos logit contributions
ah+0.089
 snowball+0.086
 cannon+0.086
affe+0.085
 may+0.084
Neg logit contributions
thank-0.165
 met-0.147
please-0.146
organized-0.145
 neighborhood-0.144
Token and
Token ID290
Feature activation+5.45e-03

Pos logit contributions
haw+0.058
ah+0.057
 spoiler+0.057
ience+0.056
rum+0.056
Neg logit contributions
thank-0.035
 outside-0.030
ups-0.028
 organized-0.028
 met-0.027
Token looked
Token ID3114
Feature activation+1.81e-02

Pos logit contributions
thank+0.008
please+0.007
organized+0.007
 neighborhood+0.007
 responsible+0.006
Neg logit contributions
 may-0.010
 cannon-0.010
 tentacles-0.010
ah-0.010
 button-0.010
Token very
Token ID845
Feature activation+2.78e-02

Pos logit contributions
thank+0.025
please+0.021
organized+0.021
 met+0.020
 neighborhood+0.020
Neg logit contributions
ah-0.035
 may-0.035
 cannon-0.035
 snowball-0.035
 button-0.034
Token carefully
Token ID7773
Feature activation+1.15e-03

Pos logit contributions
thank+0.051
please+0.046
organized+0.044
 met+0.042
 neighborhood+0.040
Neg logit contributions
 may-0.046
 tentacles-0.043
 cannon-0.042
ience-0.042
 button-0.042
Tokeniley
Token ID9618
Feature activation-2.78e-02

Pos logit contributions
 cannon+0.025
ah+0.025
 snowball+0.025
 may+0.024
 button+0.024
Neg logit contributions
thank-0.026
please-0.022
 met-0.022
organized-0.022
 neighborhood-0.021
Token laughed
Token ID13818
Feature activation-5.71e-03

Pos logit contributions
 snowball+0.055
 cannon+0.054
ah+0.054
 button+0.053
affe+0.053
Neg logit contributions
thank-0.037
 met-0.033
please-0.030
organized-0.030
 neighborhood-0.030
Token at
Token ID379
Feature activation-1.68e-02

Pos logit contributions
ah+0.011
 snowball+0.010
 cannon+0.010
 may+0.010
affe+0.010
Neg logit contributions
thank-0.009
 met-0.007
organized-0.007
 neighborhood-0.007
please-0.007
Token the
Token ID262
Feature activation-3.48e-02

Pos logit contributions
 cannon+0.036
ah+0.036
affe+0.035
 snowball+0.035
 may+0.034
Neg logit contributions
thank-0.022
 met-0.017
 neighborhood-0.016
organized-0.016
please-0.016
Token nos
Token ID43630
Feature activation-2.28e-02

Pos logit contributions
 cannon+0.052
rum+0.050
ah+0.050
 snowball+0.050
affe+0.050
Neg logit contributions
thank-0.064
 met-0.058
please-0.057
 neighborhood-0.055
 doctors-0.053
Tokeny
Token ID88
Feature activation-7.61e-02

Pos logit contributions
 cannon+0.040
 may+0.038
 spoiler+0.038
 snowball+0.037
ience+0.037
Neg logit contributions
thank-0.036
ups-0.033
 neighborhood-0.031
please-0.031
organized-0.031
Token dove
Token ID38862
Feature activation-3.13e-02

Pos logit contributions
 cannon+0.109
 spoiler+0.108
rum+0.105
 fossil+0.102
driver+0.101
Neg logit contributions
thank-0.149
 met-0.134
please-0.130
 neighborhood-0.125
 doctors-0.124
Token and
Token ID290
Feature activation-2.90e-02

Pos logit contributions
 snowball+0.066
 spoiler+0.066
 fossil+0.066
 cannon+0.066
haw+0.065
Neg logit contributions
thank-0.034
 met-0.030
 outside-0.027
 when-0.027
 neighborhood-0.027
Token then
Token ID788
Feature activation-7.23e-03

Pos logit contributions
ah+0.056
 cannon+0.055
 snowball+0.055
 may+0.055
affe+0.054
Neg logit contributions
thank-0.040
organized-0.034
please-0.034
 met-0.033
 neighborhood-0.033
Token continued
Token ID3767
Feature activation-4.85e-03

Pos logit contributions
ah+0.014
 snowball+0.014
 cannon+0.014
affe+0.014
 may+0.014
Neg logit contributions
thank-0.010
 met-0.009
please-0.008
organized-0.008
 neighborhood-0.008
Token on
Token ID319
Feature activation-3.40e-02

Pos logit contributions
affe+0.010
ah+0.010
 may+0.010
 snowball+0.010
 cannon+0.010
Neg logit contributions
thank-0.006
 met-0.005
please-0.005
 neighborhood-0.005
organized-0.005
Token there
Token ID612
Feature activation+8.33e-03

Pos logit contributions
 cannon+0.027
 snowball+0.027
ah+0.026
affe+0.026
 tentacles+0.026
Neg logit contributions
thank-0.019
 met-0.017
 neighborhood-0.017
organized-0.016
please-0.016
Token were
Token ID547
Feature activation+1.65e-02

Pos logit contributions
thank+0.012
 met+0.011
organized+0.010
 neighborhood+0.010
please+0.010
Neg logit contributions
 cannon-0.015
ah-0.015
affe-0.015
 snowball-0.015
 tentacles-0.015
Token two
Token ID734
Feature activation+1.36e-02

Pos logit contributions
thank+0.027
 met+0.023
please+0.023
organized+0.023
 neighborhood+0.023
Neg logit contributions
ah-0.028
 cannon-0.027
 snowball-0.027
 may-0.027
affe-0.027
Token cats
Token ID11875
Feature activation+2.36e-03

Pos logit contributions
thank+0.017
please+0.013
organized+0.013
 met+0.013
 neighborhood+0.011
Neg logit contributions
 snowball-0.030
ah-0.029
 cannon-0.029
affe-0.029
 may-0.028
Token,
Token ID11
Feature activation-1.86e-02

Pos logit contributions
thank+0.003
 met+0.003
 neighborhood+0.003
please+0.003
organized+0.003
Neg logit contributions
 cannon-0.004
ah-0.004
 may-0.004
 snowball-0.004
 button-0.004
Token Tig
Token ID14189
Feature activation-7.40e-02

Pos logit contributions
ah+0.032
 cannon+0.032
 may+0.032
 snowball+0.031
affe+0.031
Neg logit contributions
thank-0.029
 met-0.025
please-0.025
organized-0.025
 neighborhood-0.024
Tokengy
Token ID1360
Feature activation-1.09e-02

Pos logit contributions
 cannon+0.104
 snowball+0.098
 may+0.098
 button+0.098
ah+0.096
Neg logit contributions
thank-0.149
 met-0.130
organized-0.128
please-0.128
 neighborhood-0.125
Token and
Token ID290
Feature activation-2.03e-02

Pos logit contributions
ah+0.015
 snowball+0.015
affe+0.015
 cannon+0.015
 may+0.014
Neg logit contributions
thank-0.021
please-0.019
organized-0.019
 met-0.018
 neighborhood-0.018
Token Snow
Token ID7967
Feature activation-3.22e-02

Pos logit contributions
 cannon+0.033
 button+0.033
ah+0.033
 snowball+0.032
affe+0.032
Neg logit contributions
thank-0.034
 met-0.030
please-0.030
organized-0.029
 doctors-0.028
Tokeny
Token ID88
Feature activation-1.23e-02

Pos logit contributions
 cannon+0.042
 may+0.041
 snowball+0.041
 button+0.040
ah+0.039
Neg logit contributions
thank-0.065
 met-0.058
organized-0.058
please-0.058
 neighborhood-0.057
Token.
Token ID13
Feature activation-1.80e-02

Pos logit contributions
 cannon+0.025
ah+0.025
 may+0.025
 button+0.025
 snowball+0.025
Neg logit contributions
thank-0.015
 met-0.013
 neighborhood-0.013
please-0.012
organized-0.012
Token the
Token ID262
Feature activation-6.65e-03

Pos logit contributions
ah+0.014
affe+0.014
ience+0.014
 cannon+0.014
 may+0.013
Neg logit contributions
thank-0.009
 met-0.008
organized-0.008
 neighborhood-0.008
 doctors-0.007
Token watch
Token ID2342
Feature activation-1.72e-02

Pos logit contributions
ah+0.010
rum+0.010
 cannon+0.010
 snowball+0.010
affe+0.010
Neg logit contributions
thank-0.012
 met-0.010
 neighborhood-0.010
please-0.010
 doctors-0.010
Token with
Token ID351
Feature activation-1.69e-02

Pos logit contributions
 snowball+0.034
ience+0.033
 cannon+0.033
 tentacles+0.032
rum+0.032
Neg logit contributions
thank-0.023
 met-0.019
organized-0.018
 neighborhood-0.018
 doctors-0.018
Token the
Token ID262
Feature activation+7.05e-03

Pos logit contributions
ah+0.032
affe+0.032
ience+0.031
 cannon+0.031
bee+0.031
Neg logit contributions
thank-0.025
 met-0.021
organized-0.020
 neighborhood-0.019
please-0.019
Token magn
Token ID7842
Feature activation-3.80e-02

Pos logit contributions
thank+0.012
 met+0.011
please+0.011
 neighborhood+0.011
 doctors+0.010
Neg logit contributions
ah-0.011
 cannon-0.011
rum-0.011
 snowball-0.011
ience-0.011
Tokenifying
Token ID4035
Feature activation-7.37e-02

Pos logit contributions
 snowball+0.057
rum+0.056
haw+0.054
 sled+0.052
 cannon+0.051
Neg logit contributions
 outside-0.061
thank-0.058
amed-0.057
organized-0.057
ave-0.056
Token glass
Token ID5405
Feature activation-3.25e-02

Pos logit contributions
ah+0.084
 cannon+0.081
affe+0.080
 snowball+0.080
 may+0.079
Neg logit contributions
thank-0.162
 met-0.143
please-0.141
organized-0.141
 neighborhood-0.140
Token.
Token ID13
Feature activation-5.76e-03

Pos logit contributions
haw+0.069
ah+0.068
 snowball+0.067
rum+0.067
ience+0.066
Neg logit contributions
thank-0.034
ups-0.031
 met-0.027
 outside-0.027
 doctors-0.026
Token He
Token ID679
Feature activation-1.69e-02

Pos logit contributions
ah+0.008
ience+0.007
 cannon+0.007
bee+0.007
affe+0.007
Neg logit contributions
thank-0.011
organized-0.010
 neighborhood-0.010
 met-0.010
please-0.010
Token saw
Token ID2497
Feature activation+1.27e-02

Pos logit contributions
 may+0.035
ah+0.035
 snowball+0.034
 tentacles+0.034
 cannon+0.033
Neg logit contributions
thank-0.023
organized-0.018
please-0.017
 neighborhood-0.016
 responsible-0.016
Token the
Token ID262
Feature activation+7.22e-03

Pos logit contributions
thank+0.019
 met+0.016
organized+0.016
please+0.016
 neighborhood+0.015
Neg logit contributions
ah-0.024
 may-0.023
 cannon-0.023
 snowball-0.023
affe-0.023
Token,
Token ID11
Feature activation-9.66e-03

Pos logit contributions
ah+0.015
 snowball+0.015
 cannon+0.014
affe+0.014
 tentacles+0.014
Neg logit contributions
thank-0.012
 met-0.010
please-0.010
organized-0.010
 responsible-0.009
Token handing
Token ID22786
Feature activation-1.36e-02

Pos logit contributions
ah+0.018
 cannon+0.017
 snowball+0.017
affe+0.017
 tentacles+0.017
Neg logit contributions
thank-0.015
please-0.013
 met-0.013
organized-0.012
 neighborhood-0.012
Token him
Token ID683
Feature activation-2.28e-02

Pos logit contributions
ah+0.024
 cannon+0.024
affe+0.024
 snowball+0.024
 may+0.024
Neg logit contributions
thank-0.021
please-0.018
 met-0.018
organized-0.018
 neighborhood-0.017
Token the
Token ID262
Feature activation+6.55e-04

Pos logit contributions
 cannon+0.051
affe+0.051
 spoiler+0.051
ah+0.050
 may+0.049
Neg logit contributions
thank-0.021
 met-0.020
ups-0.020
 doctors-0.018
 outside-0.017
Token magn
Token ID7842
Feature activation-3.81e-02

Pos logit contributions
thank+0.001
 met+0.001
 neighborhood+0.001
 doctors+0.001
please+0.001
Neg logit contributions
ience-0.001
ah-0.001
rum-0.001
affe-0.001
 may-0.001
Tokenifying
Token ID4035
Feature activation-7.29e-02

Pos logit contributions
 snowball+0.060
rum+0.057
 sled+0.055
haw+0.054
bee+0.054
Neg logit contributions
 outside-0.058
thank-0.054
organized-0.053
ave-0.052
amed-0.051
Token glass
Token ID5405
Feature activation-3.10e-02

Pos logit contributions
ah+0.091
bee+0.090
affe+0.087
ience+0.086
 may+0.085
Neg logit contributions
thank-0.152
 met-0.133
 neighborhood-0.131
organized-0.128
please-0.128
Token.
Token ID13
Feature activation-9.71e-03

Pos logit contributions
ience+0.069
ah+0.067
haw+0.067
Bernie+0.066
 may+0.065
Neg logit contributions
thank-0.032
ups-0.026
 outside-0.024
 met-0.021
 organized-0.021
Token She
Token ID1375
Feature activation-1.17e-02

Pos logit contributions
ah+0.017
 cannon+0.017
 snowball+0.017
 may+0.016
affe+0.016
Neg logit contributions
thank-0.015
please-0.013
 met-0.013
organized-0.013
 neighborhood-0.013
Token looked
Token ID3114
Feature activation-2.99e-03

Pos logit contributions
ah+0.022
 snowball+0.022
 may+0.021
 cannon+0.021
affe+0.021
Neg logit contributions
thank-0.017
organized-0.015
please-0.014
 neighborhood-0.014
 met-0.013
Token around
Token ID1088
Feature activation-1.73e-02

Pos logit contributions
 snowball+0.006
ah+0.006
 may+0.006
 cannon+0.006
affe+0.006
Neg logit contributions
thank-0.004
please-0.003
organized-0.003
 met-0.003
 neighborhood-0.003
Tokenara
Token ID3301
Feature activation+3.19e-03

Pos logit contributions
thank+0.007
 met+0.006
 neighborhood+0.006
please+0.006
organized+0.006
Neg logit contributions
 snowball-0.005
 cannon-0.005
 tentacles-0.005
 spear-0.005
 button-0.005
Token and
Token ID290
Feature activation-2.34e-02

Pos logit contributions
 met+0.005
thank+0.005
 neighborhood+0.005
ups+0.005
 punished+0.005
Neg logit contributions
 cannon-0.005
ah-0.005
 tentacles-0.005
 spoiler-0.005
 globe-0.005
Token Tom
Token ID4186
Feature activation-1.03e-02

Pos logit contributions
 cannon+0.042
ah+0.041
 snowball+0.041
 tentacles+0.040
rum+0.040
Neg logit contributions
thank-0.036
 met-0.032
please-0.031
 doctors-0.030
 neighborhood-0.030
Token run
Token ID1057
Feature activation+1.17e-03

Pos logit contributions
ah+0.013
 snowball+0.013
 cannon+0.013
affe+0.013
 may+0.013
Neg logit contributions
thank-0.021
 met-0.019
please-0.018
organized-0.018
 neighborhood-0.018
Token to
Token ID284
Feature activation-1.56e-02

Pos logit contributions
thank+0.002
please+0.001
 met+0.001
organized+0.001
 neighborhood+0.001
Neg logit contributions
ah-0.002
 cannon-0.002
 snowball-0.002
 may-0.002
affe-0.002
Token Rocky
Token ID24534
Feature activation-7.25e-02

Pos logit contributions
ah+0.031
affe+0.030
 cannon+0.029
ience+0.029
 snowball+0.028
Neg logit contributions
thank-0.022
 met-0.019
please-0.018
 neighborhood-0.018
organized-0.018
Token.
Token ID13
Feature activation-3.26e-03

Pos logit contributions
haw+0.167
ah+0.165
 spoiler+0.164
 snowball+0.159
 cannon+0.159
Neg logit contributions
 neighborhood-0.069
 met-0.069
thank-0.068
 outside-0.068
 when-0.064
Token They
Token ID1119
Feature activation-1.40e-02

Pos logit contributions
ah+0.007
bee+0.006
rum+0.006
affe+0.006
hell+0.006
Neg logit contributions
thank-0.004
 met-0.004
 bored-0.003
 neighborhood-0.003
 doctors-0.003
Token hope
Token ID2911
Feature activation-2.03e-02

Pos logit contributions
ah+0.025
 snowball+0.024
affe+0.024
 cannon+0.024
rum+0.024
Neg logit contributions
thank-0.022
 met-0.020
 neighborhood-0.019
 punished-0.019
please-0.018
Token he
Token ID339
Feature activation-1.47e-02

Pos logit contributions
ah+0.034
 snowball+0.033
 cannon+0.032
affe+0.032
rum+0.032
Neg logit contributions
thank-0.034
 met-0.030
 doctors-0.030
please-0.029
 neighborhood-0.029
Token is
Token ID318
Feature activation-1.35e-02

Pos logit contributions
 cannon+0.019
 tentacles+0.019
ah+0.019
affe+0.018
 snowball+0.018
Neg logit contributions
thank-0.029
 met-0.027
 neighborhood-0.026
please-0.026
 punished-0.025
Token
Token ID198
Feature activation+1.28e-02

Pos logit contributions
thank+0.015
organized+0.013
please+0.013
 met+0.012
 punished+0.012
Neg logit contributions
 snowball-0.012
ah-0.012
 may-0.011
 button-0.011
 sled-0.011
TokenM
Token ID44
Feature activation+9.69e-03

Pos logit contributions
thank+0.022
 met+0.018
organized+0.018
please+0.018
 neighborhood+0.018
Neg logit contributions
ah-0.021
 snowball-0.021
 button-0.021
 may-0.020
 sled-0.020
Tokenam
Token ID321
Feature activation+1.83e-02

Pos logit contributions
thank+0.019
 met+0.017
organized+0.016
please+0.016
 neighborhood+0.015
Neg logit contributions
ah-0.015
 snowball-0.014
 button-0.014
 sled-0.013
 may-0.013
Tokenma
Token ID2611
Feature activation+1.59e-02

Pos logit contributions
thank+0.029
organized+0.026
please+0.025
 neighborhood+0.025
 met+0.023
Neg logit contributions
ah-0.033
 button-0.031
 snowball-0.030
affe-0.030
 may-0.030
Token sc
Token ID629
Feature activation+2.50e-02

Pos logit contributions
thank+0.027
please+0.022
organized+0.022
 neighborhood+0.021
 responsible+0.021
Neg logit contributions
ah-0.027
 snowball-0.026
 may-0.026
 cannon-0.026
 button-0.026
Tokenrun
Token ID5143
Feature activation-7.21e-02

Pos logit contributions
thank+0.027
 neighborhood+0.020
please+0.020
 met+0.020
organized+0.019
Neg logit contributions
ah-0.059
affe-0.057
ience-0.055
 cannon-0.055
 may-0.055
Tokenched
Token ID1740
Feature activation-2.08e-02

Pos logit contributions
 cannon+0.126
 may+0.121
 snowball+0.120
 tentacles+0.120
 spear+0.119
Neg logit contributions
thank-0.118
 met-0.103
organized-0.101
please-0.100
ups-0.096
Token up
Token ID510
Feature activation-6.43e-03

Pos logit contributions
 cannon+0.043
affe+0.043
ah+0.042
 may+0.042
 snowball+0.042
Neg logit contributions
thank-0.027
please-0.022
organized-0.022
 met-0.022
 neighborhood-0.020
Token all
Token ID477
Feature activation-1.27e-02

Pos logit contributions
ah+0.012
 cannon+0.012
 snowball+0.011
affe+0.011
 may+0.011
Neg logit contributions
thank-0.010
please-0.008
organized-0.008
 met-0.008
 neighborhood-0.008
Token of
Token ID286
Feature activation-1.12e-02

Pos logit contributions
 cannon+0.023
ah+0.023
 snowball+0.023
affe+0.023
 may+0.023
Neg logit contributions
thank-0.019
 met-0.016
organized-0.016
please-0.016
 neighborhood-0.016
Token the
Token ID262
Feature activation+2.35e-02

Pos logit contributions
affe+0.022
ah+0.022
 cannon+0.022
ience+0.022
 may+0.022
Neg logit contributions
thank-0.015
please-0.013
 met-0.012
organized-0.012
 neighborhood-0.012
Token said
Token ID531
Feature activation-4.32e-03

Pos logit contributions
ah+0.013
 snowball+0.012
 button+0.012
affe+0.012
 may+0.012
Neg logit contributions
thank-0.010
please-0.008
organized-0.008
 neighborhood-0.008
 unatt-0.008
Token,
Token ID11
Feature activation-3.20e-02

Pos logit contributions
ah+0.006
 cannon+0.006
 snowball+0.006
 button+0.006
 may+0.006
Neg logit contributions
thank-0.008
please-0.007
 met-0.007
 neighborhood-0.007
organized-0.007
Token holding
Token ID4769
Feature activation-1.28e-02

Pos logit contributions
ah+0.058
 snowball+0.058
 cannon+0.058
 may+0.057
affe+0.056
Neg logit contributions
thank-0.048
please-0.040
 met-0.040
organized-0.040
 neighborhood-0.039
Token the
Token ID262
Feature activation-6.09e-03

Pos logit contributions
affe+0.030
 cannon+0.030
ah+0.029
 may+0.029
ience+0.029
Neg logit contributions
thank-0.012
organized-0.010
 met-0.010
 outside-0.010
 bored-0.009
Token magn
Token ID7842
Feature activation-3.88e-02

Pos logit contributions
 spoiler+0.009
ience+0.009
ah+0.009
rum+0.009
 cannon+0.009
Neg logit contributions
thank-0.010
 neighborhood-0.010
 met-0.009
 bored-0.009
 neighbors-0.009
Tokenifying
Token ID4035
Feature activation-7.19e-02

Pos logit contributions
haw+0.060
 snowball+0.059
 cannon+0.057
 sled+0.055
rum+0.054
Neg logit contributions
 outside-0.059
amed-0.053
organized-0.052
aming-0.052
iring-0.051
Token glass
Token ID5405
Feature activation-3.42e-02

Pos logit contributions
ah+0.085
affe+0.081
 cannon+0.081
bee+0.080
 may+0.080
Neg logit contributions
thank-0.155
 met-0.137
 neighborhood-0.134
please-0.134
organized-0.133
Token over
Token ID625
Feature activation+1.16e-02

Pos logit contributions
haw+0.085
ah+0.081
ience+0.080
 spoiler+0.079
 may+0.079
Neg logit contributions
 outside-0.026
thank-0.025
ups-0.022
 organized-0.022
 when-0.020
Token a
Token ID257
Feature activation-3.49e-02

Pos logit contributions
thank+0.015
 met+0.012
please+0.012
 neighborhood+0.011
organized+0.011
Neg logit contributions
affe-0.025
ience-0.024
ah-0.024
 may-0.024
 cannon-0.024
Token black
Token ID2042
Feature activation-4.62e-02

Pos logit contributions
haw+0.056
rum+0.056
ience+0.055
ah+0.053
ion+0.052
Neg logit contributions
thank-0.053
 neighborhood-0.049
 met-0.049
 responsible-0.048
 organized-0.048
Token ant
Token ID1885
Feature activation-2.44e-02

Pos logit contributions
ience+0.084
haw+0.072
ret+0.069
 cannon+0.068
 may+0.068
Neg logit contributions
thank-0.078
 outside-0.071
 doctors-0.063
 neighborhood-0.061
 met-0.060
Token cried
Token ID16896
Feature activation-1.07e-02

Pos logit contributions
ah+0.029
 button+0.029
 snowball+0.029
 tentacles+0.028
 may+0.028
Neg logit contributions
thank-0.013
organized-0.008
please-0.008
 met-0.008
 neighborhood-0.008
Token,
Token ID11
Feature activation-2.66e-02

Pos logit contributions
 snowball+0.017
ah+0.016
 button+0.016
 cannon+0.016
 sled+0.016
Neg logit contributions
thank-0.020
 met-0.017
 neighborhood-0.017
please-0.016
organized-0.016
Token dropping
Token ID12047
Feature activation-9.13e-03

Pos logit contributions
ah+0.049
 cannon+0.048
 snowball+0.048
 may+0.048
affe+0.048
Neg logit contributions
thank-0.040
please-0.033
 met-0.033
organized-0.033
 neighborhood-0.032
Token the
Token ID262
Feature activation+2.84e-03

Pos logit contributions
affe+0.021
ah+0.021
ience+0.020
 may+0.020
 cannon+0.020
Neg logit contributions
thank-0.010
 doctors-0.008
 bored-0.008
organized-0.008
 responsible-0.008
Token magn
Token ID7842
Feature activation-3.38e-02

Pos logit contributions
thank+0.005
 neighborhood+0.005
 met+0.005
 bored+0.005
please+0.005
Neg logit contributions
ah-0.004
 spoiler-0.004
 cannon-0.004
ience-0.004
rum-0.004
Tokenifying
Token ID4035
Feature activation-7.19e-02

Pos logit contributions
haw+0.048
 snowball+0.048
 cannon+0.046
rum+0.045
 sled+0.044
Neg logit contributions
 outside-0.055
organized-0.052
thank-0.052
amed-0.049
aming-0.049
Token glass
Token ID5405
Feature activation-2.93e-02

Pos logit contributions
ah+0.077
 cannon+0.075
 snowball+0.075
 may+0.073
affe+0.073
Neg logit contributions
thank-0.162
please-0.145
 met-0.145
organized-0.144
 neighborhood-0.141
Token.
Token ID13
Feature activation-9.77e-03

Pos logit contributions
haw+0.060
ah+0.060
ience+0.058
 spoiler+0.057
 may+0.057
Neg logit contributions
thank-0.032
ups-0.028
 outside-0.028
organized-0.026
 doctors-0.025
Token
Token ID198
Feature activation-2.76e-03

Pos logit contributions
 may+0.013
 button+0.013
 cannon+0.012
 snowball+0.012
ah+0.012
Neg logit contributions
thank-0.020
organized-0.017
please-0.017
 met-0.017
 punished-0.017
Token
Token ID198
Feature activation+1.80e-03

Pos logit contributions
 snowball+0.004
 cannon+0.004
 may+0.004
ah+0.004
affe+0.004
Neg logit contributions
thank-0.005
please-0.004
organized-0.004
 met-0.004
 punished-0.004
Token"
Token ID1
Feature activation+2.32e-02

Pos logit contributions
thank+0.004
please+0.004
organized+0.004
 met+0.004
 neighborhood+0.003
Neg logit contributions
ah-0.002
 snowball-0.002
affe-0.002
 cannon-0.002
 button-0.002
Token
Token ID129
Feature activation-1.96e-02

Pos logit contributions
ah+0.005
 cannon+0.005
 snowball+0.005
affe+0.005
 may+0.005
Neg logit contributions
thank-0.006
 met-0.005
please-0.005
organized-0.005
 neighborhood-0.005
Token
Token ID241
Feature activation+2.65e-03

Pos logit contributions
 snowball+0.033
ah+0.033
 cannon+0.033
 may+0.033
fly+0.032
Neg logit contributions
thank-0.032
organized-0.028
 met-0.027
please-0.027
 neighborhood-0.026
TokenEx
Token ID3109
Feature activation-1.45e-02

Pos logit contributions
thank+0.004
please+0.004
 met+0.003
organized+0.003
 neighborhood+0.003
Neg logit contributions
ience-0.005
 cannon-0.005
fly-0.005
 may-0.005
affe-0.005
Tokenamine
Token ID9862
Feature activation-1.82e-02

Pos logit contributions
 snowball+0.024
 cannon+0.023
 lizard+0.023
 globe+0.023
fly+0.023
Neg logit contributions
thank-0.024
organized-0.023
ups-0.023
 met-0.022
please-0.022
Token the
Token ID262
Feature activation-1.67e-03

Pos logit contributions
 snowball+0.031
 tentacles+0.030
 cannon+0.029
ah+0.029
fly+0.029
Neg logit contributions
ups-0.028
thank-0.028
 met-0.026
organized-0.026
 responsible-0.026
Token Tent
Token ID30765
Feature activation-7.16e-02

Pos logit contributions
ience+0.003
rum+0.002
ah+0.002
fly+0.002
affe+0.002
Neg logit contributions
thank-0.003
 neighborhood-0.003
 met-0.003
 doctors-0.003
 neighbourhood-0.003
Tokenâ
Token ID22940
Feature activation-2.85e-02

Pos logit contributions
 snowball+0.124
ah+0.111
 carrot+0.111
 tentacles+0.110
 sled+0.110
Neg logit contributions
thank-0.117
 met-0.099
 neighborhood-0.098
 outside-0.095
organized-0.095
Token
Token ID26391
Feature activation-2.26e-02

Pos logit contributions
 cannon+0.049
 snowball+0.049
ah+0.048
 may+0.048
affe+0.048
Neg logit contributions
thank-0.047
organized-0.039
 met-0.039
please-0.039
 neighborhood-0.037
Token and
Token ID290
Feature activation-8.78e-03

Pos logit contributions
ah+0.034
 snowball+0.033
 cannon+0.033
affe+0.032
fly+0.032
Neg logit contributions
thank-0.041
 met-0.037
organized-0.036
 neighborhood-0.036
please-0.036
Token he
Token ID339
Feature activation+4.05e-03

Pos logit contributions
ah+0.016
 snowball+0.016
 cannon+0.016
affe+0.016
 may+0.016
Neg logit contributions
thank-0.013
 met-0.011
please-0.011
organized-0.011
 neighborhood-0.011
Token decided
Token ID3066
Feature activation-1.11e-02

Pos logit contributions
thank+0.005
 met+0.004
please+0.004
organized+0.004
 neighborhood+0.004
Neg logit contributions
 cannon-0.008
 snowball-0.008
ah-0.008
 globe-0.008
affe-0.008
Token toy
Token ID13373
Feature activation-1.08e-02

Pos logit contributions
ah+0.012
 tentacles+0.012
 cannon+0.012
rum+0.012
ience+0.012
Neg logit contributions
thank-0.010
 neighborhood-0.009
 met-0.008
please-0.008
 responsible-0.008
Token had
Token ID550
Feature activation-1.24e-02

Pos logit contributions
 tentacles+0.015
 snowball+0.015
ah+0.015
 cannon+0.015
rum+0.014
Neg logit contributions
thank-0.020
 met-0.018
 doctors-0.018
 neighborhood-0.018
organized-0.018
Token fallen
Token ID9292
Feature activation+1.33e-02

Pos logit contributions
ah+0.022
 cannon+0.022
 may+0.022
 button+0.022
 snowball+0.021
Neg logit contributions
thank-0.019
please-0.017
organized-0.016
 neighborhood-0.016
 met-0.015
Token behind
Token ID2157
Feature activation+3.30e-03

Pos logit contributions
thank+0.015
organized+0.012
 doctors+0.012
 outside+0.012
 met+0.011
Neg logit contributions
ience-0.028
affe-0.028
 tentacles-0.028
ah-0.028
 cannon-0.028
Token the
Token ID262
Feature activation-1.23e-02

Pos logit contributions
thank+0.005
 met+0.004
organized+0.004
 neighborhood+0.004
 doctors+0.004
Neg logit contributions
ah-0.006
affe-0.006
ience-0.006
bee-0.006
 may-0.006
Token book
Token ID1492
Feature activation-7.16e-02

Pos logit contributions
ah+0.018
 cannon+0.017
 tentacles+0.016
ience+0.016
 snowball+0.016
Neg logit contributions
thank-0.023
 neighborhood-0.021
 met-0.021
organized-0.021
 doctors-0.020
Tokencase
Token ID7442
Feature activation-4.05e-02

Pos logit contributions
 snowball+0.132
ah+0.131
 cannon+0.131
 tentacles+0.130
bee+0.128
Neg logit contributions
thank-0.102
 met-0.086
organized-0.086
 neighborhood-0.085
ups-0.080
Token.
Token ID13
Feature activation+8.25e-03

Pos logit contributions
ah+0.085
 snowball+0.085
 cannon+0.084
affe+0.084
 tentacles+0.083
Neg logit contributions
thank-0.049
 met-0.038
organized-0.038
 neighborhood-0.037
please-0.036
Token He
Token ID679
Feature activation-1.10e-02

Pos logit contributions
thank+0.016
 met+0.014
organized+0.014
please+0.014
 neighborhood+0.014
Neg logit contributions
ah-0.011
 cannon-0.011
 snowball-0.011
 may-0.011
affe-0.010
Token reached
Token ID4251
Feature activation+2.00e-02

Pos logit contributions
 cannon+0.023
ah+0.022
 globe+0.022
 snowball+0.022
 may+0.022
Neg logit contributions
thank-0.013
 met-0.012
please-0.012
 punished-0.011
organized-0.011
Token in
Token ID287
Feature activation-8.04e-03

Pos logit contributions
thank+0.019
 met+0.016
 outside+0.015
 neighborhood+0.014
organized+0.014
Neg logit contributions
affe-0.047
ah-0.046
 cannon-0.046
ience-0.045
bee-0.045
Token the
Token ID262
Feature activation+1.84e-03

Pos logit contributions
affe+0.055
ah+0.054
 spoiler+0.054
 cannon+0.054
 snowball+0.053
Neg logit contributions
ups-0.021
 met-0.020
thank-0.020
 doctors-0.019
 outside-0.018
Token necklace
Token ID38987
Feature activation-2.57e-02

Pos logit contributions
thank+0.004
 met+0.003
 neighborhood+0.003
 doctors+0.003
please+0.003
Neg logit contributions
ah-0.003
 snowball-0.003
rum-0.003
affe-0.003
haw-0.002
Token and
Token ID290
Feature activation-2.15e-02

Pos logit contributions
 snowball+0.053
haw+0.053
ience+0.052
 cannon+0.052
 spoiler+0.052
Neg logit contributions
thank-0.031
 outside-0.027
 met-0.026
ups-0.025
 organized-0.024
Token the
Token ID262
Feature activation+4.64e-03

Pos logit contributions
 cannon+0.041
ah+0.041
 snowball+0.041
 may+0.040
 button+0.040
Neg logit contributions
thank-0.030
organized-0.025
please-0.025
 neighborhood-0.024
 met-0.023
Token magn
Token ID7842
Feature activation-3.53e-02

Pos logit contributions
thank+0.009
 met+0.008
 neighborhood+0.008
please+0.008
 doctors+0.008
Neg logit contributions
ah-0.006
 snowball-0.006
rum-0.006
affe-0.006
 cannon-0.006
Tokenifying
Token ID4035
Feature activation-7.11e-02

Pos logit contributions
 snowball+0.058
 sled+0.051
rum+0.050
 spear+0.049
 log+0.048
Neg logit contributions
thank-0.061
amed-0.057
 met-0.056
organized-0.055
 outside-0.054
Token glass
Token ID5405
Feature activation-2.81e-02

Pos logit contributions
ah+0.073
 snowball+0.071
 cannon+0.071
affe+0.069
 may+0.069
Neg logit contributions
thank-0.164
 met-0.147
please-0.146
organized-0.145
 neighborhood-0.143
Token and
Token ID290
Feature activation-6.44e-03

Pos logit contributions
ience+0.065
 snowball+0.065
ah+0.065
haw+0.065
rum+0.064
Neg logit contributions
thank-0.026
ups-0.021
 met-0.019
 outside-0.018
 doctors-0.016
Token told
Token ID1297
Feature activation+1.47e-02

Pos logit contributions
 cannon+0.012
 may+0.012
ah+0.011
 button+0.011
 snowball+0.011
Neg logit contributions
thank-0.010
organized-0.008
please-0.008
 neighborhood-0.008
 met-0.007
Token him
Token ID683
Feature activation-5.33e-03

Pos logit contributions
 doctors+0.016
ups+0.015
thank+0.014
 met+0.014
 bored+0.014
Neg logit contributions
fly-0.031
ah-0.031
 snowball-0.031
affe-0.030
 globe-0.030
Token that
Token ID326
Feature activation-1.30e-02

Pos logit contributions
 snowball+0.012
ah+0.012
 spoiler+0.012
affe+0.011
 cannon+0.011
Neg logit contributions
ups-0.006
 met-0.005
 doctors-0.005
 responsible-0.005
thank-0.005
Token a
Token ID257
Feature activation-2.63e-04

Pos logit contributions
thank+0.025
 met+0.022
please+0.021
organized+0.021
 neighborhood+0.021
Neg logit contributions
ah-0.032
 cannon-0.032
 may-0.031
affe-0.031
 snowball-0.031
Token big
Token ID1263
Feature activation-1.00e-02

Pos logit contributions
rum+0.001
ah+0.001
 tentacles+0.001
 cannon+0.001
ience+0.001
Neg logit contributions
thank-0.000
 neighborhood-0.000
 met-0.000
please-0.000
 doctors-0.000
Token camera
Token ID4676
Feature activation-3.18e-02

Pos logit contributions
rum+0.018
 tentacles+0.017
ience+0.017
ret+0.017
 may+0.016
Neg logit contributions
 neighborhood-0.015
thank-0.015
 outside-0.013
 met-0.013
 neighbourhood-0.013
Token on
Token ID319
Feature activation-4.11e-02

Pos logit contributions
 cannon+0.063
ah+0.063
 snowball+0.063
 may+0.063
 tentacles+0.062
Neg logit contributions
thank-0.042
organized-0.034
 met-0.034
 neighborhood-0.033
please-0.033
Token a
Token ID257
Feature activation-1.36e-02

Pos logit contributions
ience+0.110
rum+0.108
affe+0.107
ah+0.106
 may+0.106
Neg logit contributions
thank-0.028
 doctors-0.023
 outside-0.022
 when-0.021
 stores-0.020
Token tripod
Token ID47439
Feature activation-7.10e-02

Pos logit contributions
rum+0.024
ience+0.022
 jaws+0.021
 tentacles+0.021
fly+0.021
Neg logit contributions
 neighborhood-0.021
 outside-0.020
thank-0.020
 met-0.020
 organized-0.019
Token.
Token ID13
Feature activation+5.17e-03

Pos logit contributions
ience+0.156
 snowball+0.156
 may+0.152
 cannon+0.152
haw+0.149
Neg logit contributions
thank-0.083
 met-0.067
 neighborhood-0.062
organized-0.062
please-0.058
Token It
Token ID632
Feature activation-1.91e-02

Pos logit contributions
thank+0.009
 met+0.008
 neighborhood+0.008
please+0.008
organized+0.008
Neg logit contributions
ah-0.008
affe-0.008
 cannon-0.008
 snowball-0.008
fly-0.008
Token belongs
Token ID14448
Feature activation+1.72e-02

Pos logit contributions
rum+0.036
affe+0.033
 jaws+0.031
 cannon+0.031
 tentacles+0.031
Neg logit contributions
 met-0.031
 asked-0.027
 called-0.026
thank-0.025
 invited-0.025
Token to
Token ID284
Feature activation+8.60e-03

Pos logit contributions
thank+0.030
please+0.026
organized+0.025
 punished+0.024
 neighborhood+0.024
Neg logit contributions
ah-0.027
 tentacles-0.026
 lizard-0.026
 cannon-0.026
 sled-0.026
Token a
Token ID257
Feature activation+2.86e-03

Pos logit contributions
thank+0.015
please+0.013
organized+0.013
 met+0.012
 neighborhood+0.012
Neg logit contributions
 cannon-0.014
 snowball-0.014
ah-0.014
 may-0.014
 button-0.013
Token
Token ID198
Feature activation-1.31e-02

Pos logit contributions
ah+0.038
 cannon+0.037
 snowball+0.036
 may+0.036
affe+0.036
Neg logit contributions
thank-0.026
 met-0.022
please-0.022
 neighborhood-0.022
organized-0.021
TokenMom
Token ID29252
Feature activation+5.12e-03

Pos logit contributions
ah+0.025
 may+0.024
hell+0.024
ience+0.024
affe+0.023
Neg logit contributions
thank-0.018
 neighborhood-0.016
 met-0.016
ups-0.016
please-0.016
Token looked
Token ID3114
Feature activation-1.86e-02

Pos logit contributions
thank+0.006
 met+0.006
please+0.005
ups+0.005
organized+0.005
Neg logit contributions
 cannon-0.011
rum-0.010
 tentacles-0.010
 jaws-0.010
 spoiler-0.010
Token at
Token ID379
Feature activation-2.00e-02

Pos logit contributions
 snowball+0.037
ience+0.036
 may+0.036
ah+0.035
 spear+0.035
Neg logit contributions
thank-0.028
please-0.023
organized-0.022
 neighborhood-0.020
ups-0.020
Token Mr
Token ID1770
Feature activation-2.89e-02

Pos logit contributions
rum+0.049
ience+0.048
affe+0.048
hell+0.047
fly+0.046
Neg logit contributions
 doctors-0.015
 outside-0.014
 neighborhood-0.013
 when-0.012
thank-0.012
Token.
Token ID13
Feature activation-7.10e-02

Pos logit contributions
ience+0.067
 may+0.066
 cannon+0.065
 tentacles+0.065
 spoiler+0.063
Neg logit contributions
thank-0.026
 neighborhood-0.022
 doctors-0.022
please-0.021
organized-0.021
Token Jones
Token ID5437
Feature activation-4.01e-02

Pos logit contributions
ience+0.127
hell+0.124
ion+0.121
affe+0.121
ah+0.115
Neg logit contributions
 neighborhood-0.104
thank-0.103
 doctors-0.095
please-0.094
 met-0.094
Token.
Token ID13
Feature activation-4.90e-02

Pos logit contributions
haw+0.097
 snowball+0.095
ah+0.094
 cannon+0.094
affe+0.093
Neg logit contributions
 met-0.029
 asked-0.029
thank-0.028
 bored-0.028
 outside-0.028
Token Mr
Token ID1770
Feature activation-1.56e-02

Pos logit contributions
ah+0.092
ience+0.092
 cannon+0.086
 tentacles+0.086
affe+0.086
Neg logit contributions
thank-0.066
 neighborhood-0.064
 doctors-0.062
 bored-0.061
please-0.059
Token.
Token ID13
Feature activation-6.08e-02

Pos logit contributions
 cannon+0.034
ience+0.033
ah+0.033
 may+0.033
 tentacles+0.033
Neg logit contributions
thank-0.017
organized-0.014
please-0.013
 neighborhood-0.013
 met-0.013
Token Jones
Token ID5437
Feature activation-3.40e-02

Pos logit contributions
ience+0.108
hell+0.105
affe+0.105
ion+0.105
ah+0.104
Neg logit contributions
thank-0.089
 neighborhood-0.085
please-0.080
 met-0.079
 doctors-0.077
Token they
Token ID484
Feature activation+5.23e-03

Pos logit contributions
thank+0.010
 met+0.009
 neighborhood+0.009
organized+0.008
ups+0.008
Neg logit contributions
 snowball-0.012
ah-0.012
affe-0.012
 cannon-0.012
bee-0.012
Token sw
Token ID1509
Feature activation+2.06e-02

Pos logit contributions
thank+0.006
 met+0.005
please+0.005
 punished+0.004
 neighborhood+0.004
Neg logit contributions
 snowball-0.012
 cannon-0.011
 button-0.011
rum-0.011
affe-0.011
Tokenam
Token ID321
Feature activation+1.48e-02

Pos logit contributions
thank+0.023
 met+0.019
ups+0.019
organized+0.018
 neighborhood+0.018
Neg logit contributions
 cannon-0.045
 snowball-0.045
ah-0.044
 may-0.044
 button-0.044
Token back
Token ID736
Feature activation+5.32e-03

Pos logit contributions
thank+0.019
 met+0.016
 neighborhood+0.015
please+0.015
organized+0.015
Neg logit contributions
ah-0.030
 cannon-0.030
affe-0.030
 may-0.030
 snowball-0.030
Token up
Token ID510
Feature activation-5.45e-03

Pos logit contributions
thank+0.006
 met+0.005
please+0.005
organized+0.005
 neighborhood+0.005
Neg logit contributions
ah-0.011
 cannon-0.011
 snowball-0.011
affe-0.011
 may-0.011
Tokenriver
Token ID38291
Feature activation-7.08e-02

Pos logit contributions
ah+0.009
 snowball+0.009
 cannon+0.009
 may+0.009
 button+0.009
Neg logit contributions
thank-0.009
please-0.008
organized-0.008
 met-0.007
 neighborhood-0.007
Token and
Token ID290
Feature activation-1.41e-02

Pos logit contributions
ah+0.123
 snowball+0.122
 cannon+0.121
affe+0.119
fly+0.117
Neg logit contributions
thank-0.113
 met-0.096
 neighborhood-0.093
organized-0.093
ups-0.090
Token soon
Token ID2582
Feature activation-1.34e-02

Pos logit contributions
ah+0.027
 may+0.026
 cannon+0.026
 snowball+0.026
 sled+0.026
Neg logit contributions
thank-0.020
please-0.017
organized-0.017
 neighborhood-0.016
 doctors-0.015
Token they
Token ID484
Feature activation+1.26e-02

Pos logit contributions
ah+0.022
 cannon+0.022
 may+0.021
 tentacles+0.021
 snowball+0.021
Neg logit contributions
thank-0.023
organized-0.020
please-0.020
 neighborhood-0.019
 responsible-0.018
Token were
Token ID547
Feature activation+1.25e-02

Pos logit contributions
thank+0.017
 met+0.014
please+0.014
organized+0.014
 neighborhood+0.013
Neg logit contributions
ah-0.025
 snowball-0.025
 cannon-0.025
affe-0.024
 may-0.024
Token back
Token ID736
Feature activation+2.35e-02

Pos logit contributions
thank+0.018
please+0.015
organized+0.015
 met+0.014
 neighborhood+0.014
Neg logit contributions
ah-0.024
 snowball-0.024
 cannon-0.024
 tentacles-0.023
 may-0.023
Tokens
Token ID82
Feature activation-2.66e-03

Pos logit contributions
thank+0.022
ups+0.021
organized+0.020
 met+0.019
please+0.019
Neg logit contributions
 snowball-0.026
 cannon-0.025
ah-0.025
affe-0.025
 lizard-0.025
Token and
Token ID290
Feature activation-4.43e-03

Pos logit contributions
ah+0.005
 snowball+0.005
affe+0.005
 cannon+0.005
 may+0.005
Neg logit contributions
thank-0.004
 met-0.003
please-0.003
organized-0.003
 neighborhood-0.003
Token seaw
Token ID32085
Feature activation-1.74e-03

Pos logit contributions
ah+0.007
 snowball+0.007
affe+0.007
 cannon+0.007
 may+0.007
Neg logit contributions
thank-0.008
 met-0.007
please-0.007
organized-0.006
 neighborhood-0.006
Tokeneed
Token ID2308
Feature activation-7.85e-03

Pos logit contributions
ah+0.002
 may+0.002
 snowball+0.002
fly+0.002
 cannon+0.002
Neg logit contributions
thank-0.004
please-0.004
 met-0.003
organized-0.003
ups-0.003
Token.
Token ID13
Feature activation+1.06e-02

Pos logit contributions
ah+0.017
 snowball+0.017
affe+0.016
 spoiler+0.016
 cannon+0.016
Neg logit contributions
thank-0.009
ups-0.008
 met-0.008
organized-0.007
please-0.007
Token Then
Token ID3244
Feature activation+3.69e-02

Pos logit contributions
thank+0.016
please+0.014
 met+0.014
organized+0.014
 doctors+0.014
Neg logit contributions
ah-0.019
 snowball-0.018
bee-0.018
 cannon-0.018
affe-0.018
Token we
Token ID356
Feature activation+1.54e-02

Pos logit contributions
thank+0.063
 met+0.055
please+0.054
organized+0.054
 neighborhood+0.053
Neg logit contributions
ah-0.060
 snowball-0.059
 cannon-0.058
affe-0.058
 tentacles-0.056
Token can
Token ID460
Feature activation+2.16e-03

Pos logit contributions
thank+0.031
 met+0.031
 punished+0.029
please+0.028
 organized+0.028
Neg logit contributions
affe-0.018
 lizard-0.018
rum-0.018
 snowball-0.018
ah-0.017
Token pretend
Token ID16614
Feature activation+3.64e-03

Pos logit contributions
thank+0.003
 met+0.003
please+0.003
 neighborhood+0.002
organized+0.002
Neg logit contributions
affe-0.004
ah-0.004
 snowball-0.004
ience-0.004
 lizard-0.004
Token to
Token ID284
Feature activation+3.55e-03

Pos logit contributions
thank+0.006
 met+0.005
please+0.005
organized+0.005
 neighborhood+0.005
Neg logit contributions
ah-0.006
 snowball-0.006
 cannon-0.006
affe-0.006
 may-0.006
Token play
Token ID711
Feature activation+9.92e-03

Pos logit contributions
thank+0.006
 met+0.005
please+0.005
organized+0.005
 neighborhood+0.005
Neg logit contributions
ah-0.006
 snowball-0.006
affe-0.006
 cannon-0.006
 may-0.006
Token.
Token ID13
Feature activation+3.18e-03

Pos logit contributions
ah+0.000
 cannon+0.000
 snowball+0.000
 may+0.000
affe+0.000
Neg logit contributions
thank-0.000
please-0.000
 met-0.000
organized-0.000
 neighborhood-0.000
Token
Token ID220
Feature activation+1.39e-02

Pos logit contributions
thank+0.006
please+0.005
organized+0.005
 met+0.005
 neighborhood+0.005
Neg logit contributions
ah-0.005
 snowball-0.005
 cannon-0.005
 may-0.005
affe-0.005
Token
Token ID198
Feature activation-1.75e-03

Pos logit contributions
thank+0.026
please+0.023
organized+0.023
 punished+0.022
 met+0.022
Neg logit contributions
 may-0.020
 snowball-0.020
ah-0.020
affe-0.020
 cannon-0.019
Token
Token ID198
Feature activation+1.85e-03

Pos logit contributions
ah+0.003
 snowball+0.003
 cannon+0.003
 may+0.003
affe+0.003
Neg logit contributions
thank-0.003
organized-0.002
please-0.002
 met-0.002
 neighborhood-0.002
TokenThe
Token ID464
Feature activation-7.41e-03

Pos logit contributions
thank+0.003
organized+0.003
please+0.003
 met+0.003
 neighborhood+0.003
Neg logit contributions
ah-0.003
 snowball-0.003
affe-0.003
 may-0.003
 button-0.003
Token sun
Token ID4252
Feature activation+3.95e-02

Pos logit contributions
 snowball+0.011
ah+0.011
 carrot+0.011
 button+0.011
 may+0.011
Neg logit contributions
thank-0.013
organized-0.012
please-0.011
 punished-0.011
 met-0.011
Tokenflower
Token ID25547
Feature activation+3.38e-02

Pos logit contributions
thank+0.065
 met+0.058
 neighborhood+0.054
please+0.054
organized+0.051
Neg logit contributions
 cannon-0.070
hell-0.066
affe-0.066
ience-0.066
 tentacles-0.065
Token was
Token ID373
Feature activation-6.59e-03

Pos logit contributions
thank+0.048
 met+0.042
please+0.039
 neighborhood+0.039
organized+0.038
Neg logit contributions
 cannon-0.066
 snowball-0.065
ah-0.064
 button-0.063
affe-0.063
Token so
Token ID523
Feature activation+4.50e-03

Pos logit contributions
ah+0.012
 snowball+0.012
 cannon+0.011
 tentacles+0.011
fly+0.011
Neg logit contributions
thank-0.010
organized-0.009
please-0.008
ups-0.008
 neighborhood-0.008
Token proud
Token ID6613
Feature activation+9.98e-04

Pos logit contributions
thank+0.007
organized+0.006
please+0.006
 doctors+0.006
 neighborhood+0.006
Neg logit contributions
ah-0.007
 may-0.007
 button-0.007
 snowball-0.007
 cannon-0.007
Token of
Token ID286
Feature activation-3.00e-02

Pos logit contributions
thank+0.002
organized+0.002
please+0.002
 met+0.002
 punished+0.002
Neg logit contributions
ah-0.001
 may-0.001
affe-0.001
 snowball-0.001
 lizard-0.001
Token
Token ID198
Feature activation-2.25e-02

Pos logit contributions
ah+0.016
 snowball+0.016
 cannon+0.016
 may+0.016
affe+0.016
Neg logit contributions
thank-0.017
please-0.015
organized-0.014
 met-0.014
 neighborhood-0.014
Token
Token ID198
Feature activation-1.70e-02

Pos logit contributions
ah+0.039
 snowball+0.038
 cannon+0.038
affe+0.037
 may+0.037
Neg logit contributions
thank-0.037
organized-0.031
please-0.031
 met-0.031
 neighborhood-0.030
Token"
Token ID1
Feature activation+2.16e-02

Pos logit contributions
ah+0.021
 cannon+0.021
 snowball+0.021
 may+0.021
 tentacles+0.020
Neg logit contributions
thank-0.036
please-0.032
 met-0.032
organized-0.031
 neighborhood-0.031
TokenLet
Token ID5756
Feature activation-3.08e-02

Pos logit contributions
thank+0.031
please+0.026
 met+0.026
organized+0.025
 neighborhood+0.024
Neg logit contributions
ah-0.042
 cannon-0.041
 snowball-0.041
 may-0.041
affe-0.040
Token's
Token ID338
Feature activation+4.49e-03

Pos logit contributions
ah+0.044
 may+0.043
 snowball+0.042
 button+0.042
affe+0.042
Neg logit contributions
thank-0.061
organized-0.053
please-0.052
 neighborhood-0.051
bel-0.050
Token go
Token ID467
Feature activation+3.60e-02

Pos logit contributions
thank+0.007
 met+0.006
please+0.006
 neighborhood+0.005
organized+0.005
Neg logit contributions
ah-0.008
affe-0.008
ience-0.008
 snowball-0.008
 cannon-0.008
Token take
Token ID1011
Feature activation+3.25e-03

Pos logit contributions
thank+0.052
 met+0.045
please+0.044
organized+0.042
 neighborhood+0.041
Neg logit contributions
ah-0.068
affe-0.067
 cannon-0.067
 snowball-0.066
 may-0.066
Token a
Token ID257
Feature activation+1.46e-03

Pos logit contributions
thank+0.004
 met+0.004
please+0.003
organized+0.003
 outside+0.003
Neg logit contributions
fly-0.007
ah-0.007
affe-0.007
bee-0.006
rum-0.006
Token closer
Token ID5699
Feature activation-1.12e-02

Pos logit contributions
 met+0.002
thank+0.002
 outside+0.002
 organized+0.002
 bored+0.002
Neg logit contributions
rum-0.003
affe-0.002
 thirteen-0.002
pe-0.002
bee-0.002
Token look
Token ID804
Feature activation-1.91e-02

Pos logit contributions
ience+0.023
rum+0.021
 may+0.020
 thirteen+0.020
affe+0.020
Neg logit contributions
thank-0.014
 outside-0.013
 met-0.013
please-0.012
 asked-0.011
Token,"
Token ID553
Feature activation+2.09e-02

Pos logit contributions
ah+0.030
 snowball+0.029
 cannon+0.029
 may+0.029
affe+0.029
Neg logit contributions
thank-0.034
please-0.029
organized-0.029
 met-0.029
 neighborhood-0.028
Token so
Token ID523
Feature activation+4.20e-03

Pos logit contributions
ah+0.005
 may+0.005
 button+0.005
 snowball+0.005
 cannon+0.005
Neg logit contributions
thank-0.005
 neighborhood-0.004
please-0.004
 met-0.004
organized-0.004
Token she
Token ID673
Feature activation+2.67e-02

Pos logit contributions
thank+0.007
organized+0.006
please+0.006
 met+0.006
 neighborhood+0.006
Neg logit contributions
ah-0.007
 cannon-0.007
 snowball-0.007
affe-0.006
 may-0.006
Token wouldn
Token ID3636
Feature activation-3.04e-02

Pos logit contributions
thank+0.040
 neighborhood+0.035
organized+0.034
please+0.032
 unatt+0.032
Neg logit contributions
 may-0.049
ah-0.047
 snowball-0.047
 button-0.046
ience-0.046
Token't
Token ID470
Feature activation+1.56e-02

Pos logit contributions
ah+0.044
 cannon+0.044
 snowball+0.043
affe+0.042
 tentacles+0.042
Neg logit contributions
thank-0.057
please-0.051
organized-0.050
 met-0.050
 neighborhood-0.047
Token get
Token ID651
Feature activation+3.87e-02

Pos logit contributions
thank+0.025
please+0.022
 met+0.021
organized+0.021
 neighborhood+0.020
Neg logit contributions
 cannon-0.027
ah-0.027
 snowball-0.027
affe-0.027
ience-0.026
Token her
Token ID607
Feature activation+3.59e-02

Pos logit contributions
thank+0.063
please+0.054
organized+0.054
 met+0.054
 neighborhood+0.053
Neg logit contributions
ah-0.066
 snowball-0.065
 cannon-0.065
 may-0.064
affe-0.063
Token dress
Token ID6576
Feature activation+1.92e-02

Pos logit contributions
thank+0.061
please+0.053
 met+0.053
organized+0.053
 neighborhood+0.052
Neg logit contributions
 snowball-0.058
 may-0.057
 button-0.057
ah-0.056
 cannon-0.056
Token dirty
Token ID11841
Feature activation+2.81e-02

Pos logit contributions
thank+0.031
 met+0.028
 neighborhood+0.028
please+0.027
organized+0.026
Neg logit contributions
ah-0.033
 button-0.031
 snowball-0.031
 may-0.031
 cannon-0.030
Token.
Token ID13
Feature activation+2.46e-03

Pos logit contributions
thank+0.046
 met+0.042
please+0.041
 neighborhood+0.040
organized+0.038
Neg logit contributions
ah-0.046
 snowball-0.046
 cannon-0.045
 button-0.045
 may-0.044
Token Lily
Token ID20037
Feature activation+1.05e-02

Pos logit contributions
thank+0.005
please+0.005
 punished+0.005
organized+0.005
ups+0.005
Neg logit contributions
 may-0.003
 It-0.003
 button-0.003
 snowball-0.003
 carrot-0.002
Token thought
Token ID1807
Feature activation-1.48e-02

Pos logit contributions
thank+0.017
please+0.015
organized+0.014
 neighborhood+0.014
 met+0.014
Neg logit contributions
ah-0.018
 may-0.018
 snowball-0.017
 button-0.017
affe-0.017
Token in
Token ID287
Feature activation-1.64e-02

Pos logit contributions
ah+0.029
 snowball+0.028
 cannon+0.027
 button+0.027
 tentacles+0.027
Neg logit contributions
thank-0.027
please-0.024
organized-0.024
 punished-0.023
ups-0.023
Token the
Token ID262
Feature activation+2.10e-02

Pos logit contributions
affe+0.038
ah+0.037
ience+0.036
hell+0.036
bee+0.036
Neg logit contributions
thank-0.018
 met-0.016
 neighborhood-0.016
 doctors-0.014
please-0.014
Token morning
Token ID3329
Feature activation+1.77e-03

Pos logit contributions
thank+0.044
 met+0.040
 neighborhood+0.039
please+0.039
organized+0.038
Neg logit contributions
ah-0.026
affe-0.025
 cannon-0.025
 snowball-0.024
 may-0.024
Token.
Token ID13
Feature activation-6.81e-03

Pos logit contributions
thank+0.003
organized+0.003
please+0.003
 punished+0.003
 responsible+0.003
Neg logit contributions
 cannon-0.003
 snowball-0.003
 tentacles-0.003
 button-0.003
ah-0.002
Token The
Token ID383
Feature activation-2.55e-03

Pos logit contributions
 cannon+0.010
 snowball+0.010
ah+0.010
 may+0.010
affe+0.010
Neg logit contributions
thank-0.012
please-0.011
 met-0.011
organized-0.011
 neighborhood-0.010
Token sun
Token ID4252
Feature activation+3.36e-02

Pos logit contributions
ah+0.004
 cannon+0.004
 snowball+0.004
affe+0.004
 may+0.004
Neg logit contributions
thank-0.005
 met-0.004
please-0.004
 neighborhood-0.004
organized-0.004
Token was
Token ID373
Feature activation-7.78e-03

Pos logit contributions
thank+0.049
 met+0.042
please+0.039
organized+0.039
 neighborhood+0.038
Neg logit contributions
ah-0.063
affe-0.063
 cannon-0.062
 snowball-0.061
rum-0.061
Token just
Token ID655
Feature activation-1.11e-02

Pos logit contributions
 snowball+0.014
ah+0.014
 cannon+0.014
 carrot+0.013
 tentacles+0.013
Neg logit contributions
thank-0.012
organized-0.010
 neighborhood-0.010
ups-0.010
 met-0.010
Token starting
Token ID3599
Feature activation+3.35e-03

Pos logit contributions
ah+0.019
 cannon+0.019
 snowball+0.019
 may+0.019
 tentacles+0.019
Neg logit contributions
thank-0.017
organized-0.015
please-0.015
 neighborhood-0.015
 met-0.015
Token to
Token ID284
Feature activation+8.45e-03

Pos logit contributions
thank+0.006
organized+0.005
 met+0.005
 doctors+0.005
please+0.005
Neg logit contributions
 snowball-0.005
 sled-0.005
 tentacles-0.005
ah-0.005
 cannon-0.005
Token come
Token ID1282
Feature activation+2.06e-02

Pos logit contributions
thank+0.013
organized+0.011
 met+0.011
please+0.011
 neighborhood+0.011
Neg logit contributions
ah-0.015
 snowball-0.015
 cannon-0.015
 may-0.015
 button-0.014
Token ball
Token ID2613
Feature activation-2.43e-02

Pos logit contributions
thank+0.008
 neighborhood+0.007
organized+0.007
 met+0.007
 bored+0.007
Neg logit contributions
ah-0.007
rum-0.007
fly-0.007
ience-0.007
 cannon-0.006
Token too
Token ID1165
Feature activation+3.37e-03

Pos logit contributions
 cannon+0.047
ah+0.047
 snowball+0.046
affe+0.046
 may+0.045
Neg logit contributions
thank-0.034
 met-0.028
organized-0.028
please-0.028
 neighborhood-0.026
Token hard
Token ID1327
Feature activation-2.68e-02

Pos logit contributions
thank+0.006
organized+0.005
please+0.005
 met+0.005
 responsible+0.005
Neg logit contributions
 cannon-0.005
fly-0.005
 tentacles-0.005
ah-0.005
 globe-0.005
Token and
Token ID290
Feature activation+5.83e-03

Pos logit contributions
ah+0.040
 snowball+0.039
 cannon+0.039
 may+0.038
affe+0.038
Neg logit contributions
thank-0.050
please-0.043
 met-0.043
organized-0.043
 neighborhood-0.042
Token it
Token ID340
Feature activation+1.23e-02

Pos logit contributions
thank+0.016
 neighborhood+0.015
please+0.015
organized+0.015
 neighbourhood+0.014
Neg logit contributions
 may-0.004
 snowball-0.003
ah-0.003
 button-0.003
 cannon-0.003
Token rolled
Token ID11686
Feature activation+1.00e-02

Pos logit contributions
thank+0.023
 neighborhood+0.020
please+0.020
organized+0.019
 neighbourhood+0.019
Neg logit contributions
 snowball-0.019
 may-0.019
ah-0.019
 button-0.019
 cannon-0.018
Token away
Token ID1497
Feature activation-5.50e-03

Pos logit contributions
thank+0.013
organized+0.010
 met+0.010
please+0.010
 outside+0.009
Neg logit contributions
affe-0.021
ah-0.021
ience-0.021
 cannon-0.021
 may-0.020
Token into
Token ID656
Feature activation+2.50e-02

Pos logit contributions
 snowball+0.009
 button+0.009
ah+0.008
fly+0.008
 sled+0.008
Neg logit contributions
thank-0.009
please-0.009
 met-0.009
 neighborhood-0.008
organized-0.008
Token the
Token ID262
Feature activation+2.34e-02

Pos logit contributions
thank+0.041
please+0.035
organized+0.035
 met+0.035
 neighborhood+0.033
Neg logit contributions
 snowball-0.042
 cannon-0.042
ah-0.042
 button-0.041
 may-0.040
Token bushes
Token ID37413
Feature activation+6.17e-03

Pos logit contributions
thank+0.035
please+0.030
organized+0.029
 met+0.029
 neighborhood+0.027
Neg logit contributions
 snowball-0.043
ah-0.043
 cannon-0.043
 may-0.042
affe-0.042
Token.
Token ID13
Feature activation+9.88e-03

Pos logit contributions
thank+0.010
please+0.009
 met+0.009
organized+0.009
 neighborhood+0.009
Neg logit contributions
ah-0.010
 snowball-0.010
 button-0.010
 may-0.010
 cannon-0.010
Token.
Token ID13
Feature activation+1.01e-02

Pos logit contributions
thank+0.029
please+0.024
 met+0.024
organized+0.024
 doctors+0.023
Neg logit contributions
 cannon-0.045
ah-0.045
 snowball-0.045
 globe-0.044
 may-0.043
Token But
Token ID887
Feature activation+1.11e-02

Pos logit contributions
thank+0.019
please+0.017
organized+0.016
 met+0.016
 doctors+0.015
Neg logit contributions
 may-0.015
 button-0.014
affe-0.014
 cannon-0.014
ah-0.013
Token she
Token ID673
Feature activation+1.02e-02

Pos logit contributions
thank+0.016
please+0.014
organized+0.014
 met+0.013
 neighborhood+0.013
Neg logit contributions
ah-0.021
 may-0.020
 cannon-0.020
 snowball-0.020
affe-0.020
Token was
Token ID373
Feature activation+6.26e-04

Pos logit contributions
thank+0.014
 met+0.012
please+0.012
organized+0.012
 neighborhood+0.011
Neg logit contributions
ah-0.020
 cannon-0.020
 snowball-0.020
affe-0.019
 may-0.019
Token relieved
Token ID22598
Feature activation-1.26e-02

Pos logit contributions
thank+0.001
organized+0.001
please+0.001
 neighborhood+0.001
 met+0.001
Neg logit contributions
ah-0.001
 snowball-0.001
 cannon-0.001
 tentacles-0.001
 may-0.001
Token too
Token ID1165
Feature activation+7.54e-03

Pos logit contributions
 cannon+0.022
 snowball+0.022
ah+0.022
affe+0.021
 may+0.021
Neg logit contributions
thank-0.021
please-0.017
 met-0.017
organized-0.017
 neighborhood-0.017
Token,
Token ID11
Feature activation+5.77e-03

Pos logit contributions
thank+0.012
please+0.010
 met+0.010
organized+0.010
 neighborhood+0.010
Neg logit contributions
ah-0.013
 cannon-0.013
 snowball-0.013
 may-0.013
affe-0.013
Token that
Token ID326
Feature activation-2.14e-02

Pos logit contributions
thank+0.009
 met+0.008
please+0.008
 neighborhood+0.007
organized+0.007
Neg logit contributions
ah-0.010
 cannon-0.010
 snowball-0.010
affe-0.010
 button-0.010
Token they
Token ID484
Feature activation+1.46e-02

Pos logit contributions
ah+0.038
 may+0.037
 cannon+0.037
 snowball+0.037
 button+0.036
Neg logit contributions
thank-0.033
please-0.028
organized-0.028
 met-0.028
 neighborhood-0.027
Token had
Token ID550
Feature activation-1.55e-02

Pos logit contributions
thank+0.022
please+0.019
organized+0.019
 met+0.019
 neighborhood+0.018
Neg logit contributions
ah-0.027
 cannon-0.026
 snowball-0.026
 may-0.026
affe-0.026
Token only
Token ID691
Feature activation+1.06e-02

Pos logit contributions
 may+0.028
 cannon+0.028
 tentacles+0.028
ah+0.028
 snowball+0.028
Neg logit contributions
thank-0.022
organized-0.019
please-0.018
 neighborhood-0.017
bel-0.017
Token high
Token ID1029
Feature activation-2.29e-03

Pos logit contributions
thank+0.004
 met+0.004
organized+0.004
please+0.004
 neighborhood+0.003
Neg logit contributions
ah-0.005
 cannon-0.005
 snowball-0.005
affe-0.005
 may-0.005
Token!
Token ID0
Feature activation+2.03e-04

Pos logit contributions
ah+0.004
 cannon+0.004
 snowball+0.004
affe+0.004
 may+0.004
Neg logit contributions
thank-0.004
please-0.003
organized-0.003
 met-0.003
 neighborhood-0.003
Token You
Token ID921
Feature activation-2.02e-02

Pos logit contributions
thank+0.000
please+0.000
 neighborhood+0.000
 met+0.000
organized+0.000
Neg logit contributions
ah-0.000
hell-0.000
 cannon-0.000
 snowball-0.000
bee-0.000
Token will
Token ID481
Feature activation-7.86e-03

Pos logit contributions
 snowball+0.029
 cannon+0.027
rum+0.027
ah+0.027
 spoiler+0.026
Neg logit contributions
 met-0.038
thank-0.036
 punished-0.034
 bored-0.033
 accomplished-0.033
Token fall
Token ID2121
Feature activation-6.30e-03

Pos logit contributions
 snowball+0.016
affe+0.016
 cannon+0.015
ah+0.015
ience+0.015
Neg logit contributions
thank-0.011
 met-0.009
please-0.009
 neighborhood-0.008
 responsible-0.008
Token!"
Token ID2474
Feature activation+1.57e-02

Pos logit contributions
affe+0.012
 cannon+0.012
ah+0.012
 tentacles+0.012
 may+0.011
Neg logit contributions
thank-0.009
please-0.007
 met-0.007
organized-0.007
 neighborhood-0.007
Token
Token ID198
Feature activation-1.74e-02

Pos logit contributions
thank+0.017
 met+0.016
please+0.015
 neighborhood+0.014
 doctors+0.013
Neg logit contributions
 cannon-0.033
ience-0.033
 button-0.033
 globe-0.033
ah-0.032
Token
Token ID198
Feature activation-1.39e-02

Pos logit contributions
 cannon+0.037
ah+0.037
ience+0.036
 may+0.036
 tentacles+0.036
Neg logit contributions
thank-0.019
please-0.017
 met-0.017
 neighborhood-0.016
ups-0.015
TokenBob
Token ID18861
Feature activation-1.17e-03

Pos logit contributions
 cannon+0.026
 may+0.025
 jaws+0.025
 tentacles+0.025
 spoiler+0.025
Neg logit contributions
thank-0.021
please-0.020
 met-0.018
ups-0.018
 neighborhood-0.018
Tokeno
Token ID78
Feature activation-1.02e-02

Pos logit contributions
 cannon+0.002
 spoiler+0.002
 button+0.002
 may+0.002
 globe+0.002
Neg logit contributions
thank-0.002
 met-0.001
please-0.001
organized-0.001
 neighborhood-0.001
Token was
Token ID373
Feature activation-2.03e-02

Pos logit contributions
 cannon+0.021
 snowball+0.020
 globe+0.020
rum+0.020
 spear+0.020
Neg logit contributions
thank-0.014
 met-0.012
please-0.011
organized-0.010
 neighborhood-0.010
Token was
Token ID373
Feature activation-5.04e-03

Pos logit contributions
ah+0.020
 cannon+0.020
 snowball+0.020
 may+0.020
affe+0.020
Neg logit contributions
thank-0.016
 met-0.013
please-0.013
organized-0.013
 neighborhood-0.012
Token so
Token ID523
Feature activation+1.16e-02

Pos logit contributions
 snowball+0.009
 cannon+0.009
 tentacles+0.009
ah+0.009
 button+0.009
Neg logit contributions
thank-0.007
organized-0.006
please-0.006
 neighborhood-0.006
 met-0.006
Token happy
Token ID3772
Feature activation-5.42e-03

Pos logit contributions
thank+0.018
please+0.015
organized+0.015
 neighborhood+0.015
 doctors+0.014
Neg logit contributions
 cannon-0.020
 tentacles-0.020
 button-0.019
 may-0.019
 snowball-0.019
Token,
Token ID11
Feature activation+1.17e-03

Pos logit contributions
ah+0.009
 snowball+0.009
 cannon+0.009
affe+0.009
 may+0.009
Neg logit contributions
thank-0.009
 met-0.008
please-0.008
organized-0.008
 neighborhood-0.008
Token but
Token ID475
Feature activation-6.21e-03

Pos logit contributions
thank+0.002
 met+0.001
please+0.001
 neighborhood+0.001
 responsible+0.001
Neg logit contributions
ah-0.002
 cannon-0.002
affe-0.002
fly-0.002
 snowball-0.002
Token he
Token ID339
Feature activation+5.00e-03

Pos logit contributions
 snowball+0.011
 cannon+0.011
affe+0.011
ah+0.011
 button+0.011
Neg logit contributions
thank-0.010
 met-0.008
 neighborhood-0.008
please-0.008
organized-0.008
Token was
Token ID373
Feature activation-2.16e-03

Pos logit contributions
thank+0.007
 met+0.006
please+0.006
organized+0.006
 neighborhood+0.006
Neg logit contributions
 cannon-0.010
 snowball-0.010
ah-0.010
affe-0.009
 button-0.009
Token now
Token ID783
Feature activation-2.14e-02

Pos logit contributions
ah+0.004
 snowball+0.004
 cannon+0.004
 may+0.004
affe+0.004
Neg logit contributions
thank-0.003
please-0.003
organized-0.003
 met-0.003
 neighborhood-0.003
Token too
Token ID1165
Feature activation+8.92e-03

Pos logit contributions
ah+0.045
affe+0.044
 snowball+0.044
 cannon+0.044
 may+0.043
Neg logit contributions
thank-0.028
 met-0.023
please-0.022
organized-0.022
 neighborhood-0.021
Token nervous
Token ID10927
Feature activation+9.77e-03

Pos logit contributions
thank+0.012
organized+0.010
please+0.010
 met+0.010
 neighborhood+0.009
Neg logit contributions
ah-0.018
 snowball-0.017
 cannon-0.017
 may-0.017
 tentacles-0.017
Token to
Token ID284
Feature activation+3.62e-03

Pos logit contributions
thank+0.014
please+0.012
 met+0.012
organized+0.012
 neighborhood+0.012
Neg logit contributions
ah-0.018
 cannon-0.018
 may-0.018
 snowball-0.018
affe-0.017
Token listened
Token ID16399
Feature activation+1.71e-03

Pos logit contributions
ah+0.021
 may+0.020
 cannon+0.020
 sled+0.019
 snowball+0.019
Neg logit contributions
thank-0.018
ups-0.016
bel-0.015
please-0.015
 neighborhood-0.015
Token to
Token ID284
Feature activation-8.68e-03

Pos logit contributions
thank+0.003
please+0.002
organized+0.002
 neighborhood+0.002
 met+0.002
Neg logit contributions
ah-0.003
 may-0.003
 button-0.003
ience-0.003
 snowball-0.003
Token Red
Token ID2297
Feature activation-3.67e-02

Pos logit contributions
 may+0.012
 snowball+0.012
 cannon+0.012
 button+0.011
ah+0.011
Neg logit contributions
thank-0.017
 met-0.015
please-0.014
ups-0.014
organized-0.014
Token and
Token ID290
Feature activation-1.79e-02

Pos logit contributions
 snowball+0.068
 cannon+0.068
ah+0.067
 tentacles+0.066
 may+0.064
Neg logit contributions
thank-0.055
organized-0.048
 met-0.047
please-0.046
 neighborhood-0.046
Token they
Token ID484
Feature activation+1.64e-02

Pos logit contributions
 may+0.030
ah+0.028
 spear+0.028
 cannon+0.028
 sled+0.028
Neg logit contributions
thank-0.028
organized-0.025
please-0.024
 neighborhood-0.023
ups-0.022
Token never
Token ID1239
Feature activation+6.25e-03

Pos logit contributions
thank+0.021
please+0.017
organized+0.017
 met+0.017
 neighborhood+0.017
Neg logit contributions
ah-0.034
 cannon-0.033
 may-0.033
 snowball-0.033
affe-0.032
Token got
Token ID1392
Feature activation+3.92e-02

Pos logit contributions
thank+0.011
organized+0.009
 neighborhood+0.009
 doctors+0.009
please+0.009
Neg logit contributions
ah-0.010
 may-0.010
 cannon-0.010
fly-0.009
 roar-0.009
Token into
Token ID656
Feature activation+3.23e-02

Pos logit contributions
thank+0.064
 neighborhood+0.052
please+0.051
organized+0.051
 met+0.050
Neg logit contributions
 snowball-0.068
 cannon-0.067
ah-0.066
 may-0.065
 sled-0.065
Token a
Token ID257
Feature activation-2.01e-03

Pos logit contributions
thank+0.046
organized+0.038
 met+0.038
please+0.038
 neighborhood+0.037
Neg logit contributions
ah-0.063
affe-0.062
 cannon-0.061
 snowball-0.060
 may-0.060
Token wreck
Token ID15228
Feature activation+3.91e-03

Pos logit contributions
 snowball+0.003
ah+0.003
 cannon+0.003
 may+0.003
 button+0.003
Neg logit contributions
thank-0.003
please-0.003
organized-0.003
 met-0.003
 neighborhood-0.003
Token again
Token ID757
Feature activation-3.95e-03

Pos logit contributions
thank+0.008
please+0.007
 met+0.007
organized+0.007
 neighborhood+0.006
Neg logit contributions
ah-0.005
 button-0.005
 snowball-0.005
 cannon-0.005
 may-0.005
Token then
Token ID788
Feature activation-1.24e-02

Pos logit contributions
ah+0.016
 snowball+0.016
 cannon+0.015
 may+0.015
 tentacles+0.015
Neg logit contributions
thank-0.019
organized-0.016
please-0.016
 met-0.016
 neighborhood-0.016
Token shouted
Token ID17293
Feature activation-1.56e-03

Pos logit contributions
ah+0.022
 snowball+0.021
 cannon+0.021
 may+0.021
 button+0.021
Neg logit contributions
thank-0.019
organized-0.017
please-0.017
 met-0.016
 neighborhood-0.016
Token,
Token ID11
Feature activation-1.99e-03

Pos logit contributions
ah+0.003
 snowball+0.003
 cannon+0.003
affe+0.003
 may+0.003
Neg logit contributions
thank-0.002
please-0.002
 met-0.002
organized-0.002
 neighborhood-0.002
Token "
Token ID366
Feature activation+1.25e-02

Pos logit contributions
ah+0.004
 cannon+0.003
 snowball+0.003
affe+0.003
 may+0.003
Neg logit contributions
thank-0.003
please-0.003
 met-0.003
 neighborhood-0.003
organized-0.003
TokenI
Token ID40
Feature activation-2.15e-03

Pos logit contributions
thank+0.018
organized+0.016
 met+0.015
 neighborhood+0.015
please+0.015
Neg logit contributions
ah-0.023
 snowball-0.023
affe-0.022
 may-0.022
 cannon-0.022
Token'm
Token ID1101
Feature activation-6.22e-03

Pos logit contributions
affe+0.004
 cannon+0.004
 globe+0.004
 spoiler+0.004
 snowball+0.004
Neg logit contributions
thank-0.003
 met-0.003
please-0.003
 punished-0.003
 neighborhood-0.003
Token going
Token ID1016
Feature activation+2.09e-02

Pos logit contributions
ah+0.012
 cannon+0.012
 snowball+0.012
 may+0.012
affe+0.012
Neg logit contributions
thank-0.009
please-0.007
 met-0.007
organized-0.007
 neighborhood-0.007
Token for
Token ID329
Feature activation+1.80e-02

Pos logit contributions
thank+0.024
please+0.020
 met+0.019
 neighborhood+0.018
organized+0.017
Neg logit contributions
affe-0.046
ah-0.045
 cannon-0.045
bee-0.044
 snowball-0.044
Token a
Token ID257
Feature activation-1.33e-04

Pos logit contributions
thank+0.026
 met+0.022
please+0.022
organized+0.021
 doctors+0.021
Neg logit contributions
ah-0.035
affe-0.034
 cannon-0.034
 snowball-0.033
 may-0.033
Token dive
Token ID15647
Feature activation+9.52e-03

Pos logit contributions
rum+0.000
 spoiler+0.000
affe+0.000
 tentacles+0.000
ion+0.000
Neg logit contributions
thank-0.000
please-0.000
 met-0.000
 responsible-0.000
 neighborhood-0.000
Token!"
Token ID2474
Feature activation+1.23e-02

Pos logit contributions
thank+0.015
 met+0.013
please+0.013
organized+0.013
 neighborhood+0.012
Neg logit contributions
affe-0.016
 cannon-0.016
ah-0.016
 may-0.016
 snowball-0.016
Token easier
Token ID4577
Feature activation+2.79e-03

Pos logit contributions
thank+0.027
organized+0.025
please+0.024
 met+0.024
 neighborhood+0.023
Neg logit contributions
 snowball-0.024
 cannon-0.023
 may-0.023
 carrot-0.023
 spear-0.022
Token for
Token ID329
Feature activation+9.53e-03

Pos logit contributions
thank+0.004
 met+0.003
please+0.003
organized+0.003
 neighborhood+0.003
Neg logit contributions
ah-0.005
 snowball-0.005
 cannon-0.005
affe-0.005
 may-0.005
Token them
Token ID606
Feature activation+1.43e-02

Pos logit contributions
thank+0.014
 met+0.012
please+0.012
 neighborhood+0.011
organized+0.011
Neg logit contributions
ah-0.018
affe-0.018
 snowball-0.017
 may-0.017
 cannon-0.017
Token to
Token ID284
Feature activation-2.21e-03

Pos logit contributions
thank+0.019
 met+0.016
please+0.015
organized+0.015
 neighborhood+0.015
Neg logit contributions
ah-0.029
 snowball-0.029
 cannon-0.028
affe-0.028
 button-0.028
Token swim
Token ID9422
Feature activation+1.15e-02

Pos logit contributions
affe+0.004
ah+0.004
 snowball+0.004
ience+0.004
 cannon+0.004
Neg logit contributions
thank-0.004
 met-0.003
 neighborhood-0.003
please-0.003
 punished-0.003
Token around
Token ID1088
Feature activation-2.16e-02

Pos logit contributions
thank+0.016
 met+0.013
please+0.013
organized+0.013
 neighborhood+0.013
Neg logit contributions
ah-0.022
 cannon-0.022
 snowball-0.022
affe-0.022
 may-0.022
Token.
Token ID13
Feature activation-2.33e-03

Pos logit contributions
affe+0.047
ah+0.047
 snowball+0.045
 cannon+0.045
ience+0.045
Neg logit contributions
thank-0.027
 met-0.021
 neighborhood-0.020
please-0.019
organized-0.019
Token The
Token ID383
Feature activation-1.71e-02

Pos logit contributions
ah+0.004
bee+0.004
 snowball+0.004
 button+0.004
 lizard+0.004
Neg logit contributions
thank-0.003
 neighborhood-0.003
 met-0.003
organized-0.003
please-0.003
Token two
Token ID734
Feature activation+1.44e-03

Pos logit contributions
affe+0.036
rum+0.035
ah+0.035
 may+0.035
hell+0.034
Neg logit contributions
thank-0.020
 met-0.019
 neighborhood-0.018
 doctors-0.017
 bored-0.017
Token friends
Token ID2460
Feature activation+3.03e-02

Pos logit contributions
thank+0.001
 met+0.001
 neighborhood+0.001
please+0.001
ups+0.001
Neg logit contributions
ah-0.003
 snowball-0.003
 cannon-0.003
 globe-0.003
 button-0.003
Token were
Token ID547
Feature activation-4.27e-03

Pos logit contributions
thank+0.037
 met+0.036
 neighborhood+0.028
 punished+0.028
 accomplished+0.028
Neg logit contributions
 snowball-0.065
affe-0.062
 button-0.062
ah-0.062
 cannon-0.062
Token.
Token ID13
Feature activation+7.38e-03

Pos logit contributions
thank+0.003
 met+0.003
please+0.003
organized+0.003
 neighborhood+0.003
Neg logit contributions
ah-0.004
 snowball-0.003
 cannon-0.003
affe-0.003
 may-0.003
Token They
Token ID1119
Feature activation-7.82e-03

Pos logit contributions
thank+0.012
 met+0.010
please+0.010
 neighborhood+0.010
organized+0.010
Neg logit contributions
ah-0.013
affe-0.012
 snowball-0.012
ience-0.012
 cannon-0.012
Token have
Token ID423
Feature activation-1.31e-02

Pos logit contributions
 snowball+0.014
 cannon+0.013
rum+0.013
ah+0.013
affe+0.013
Neg logit contributions
thank-0.013
 met-0.011
please-0.011
 punished-0.010
 neighborhood-0.010
Token fun
Token ID1257
Feature activation+2.51e-02

Pos logit contributions
ah+0.026
 cannon+0.026
 snowball+0.026
 may+0.025
 tentacles+0.025
Neg logit contributions
thank-0.017
organized-0.015
please-0.014
 met-0.014
 neighborhood-0.014
Token.
Token ID13
Feature activation+8.24e-03

Pos logit contributions
thank+0.037
please+0.031
 met+0.031
organized+0.030
 neighborhood+0.030
Neg logit contributions
ah-0.047
 cannon-0.046
 snowball-0.046
affe-0.046
 may-0.046
Token They
Token ID1119
Feature activation-7.86e-03

Pos logit contributions
thank+0.014
please+0.012
 met+0.012
organized+0.012
 neighborhood+0.012
Neg logit contributions
ah-0.013
 snowball-0.013
 cannon-0.013
affe-0.013
 may-0.013
Token make
Token ID787
Feature activation-9.87e-04

Pos logit contributions
 snowball+0.012
ah+0.012
 cannon+0.012
affe+0.012
rum+0.012
Neg logit contributions
thank-0.014
 met-0.013
please-0.012
 neighborhood-0.012
organized-0.012
Token a
Token ID257
Feature activation+2.23e-02

Pos logit contributions
ah+0.002
 snowball+0.002
 cannon+0.002
 may+0.002
affe+0.002
Neg logit contributions
thank-0.002
please-0.001
 met-0.001
organized-0.001
 neighborhood-0.001
Token new
Token ID649
Feature activation+1.24e-02

Pos logit contributions
thank+0.032
please+0.027
 neighborhood+0.026
 met+0.026
organized+0.026
Neg logit contributions
ah-0.042
 cannon-0.041
affe-0.041
 may-0.040
 snowball-0.040
Token friend
Token ID1545
Feature activation+1.86e-02

Pos logit contributions
thank+0.021
please+0.018
 met+0.018
organized+0.018
 neighborhood+0.017
Neg logit contributions
ah-0.020
 snowball-0.020
 cannon-0.020
 may-0.020
affe-0.019
Token.
Token ID13
Feature activation+5.26e-03

Pos logit contributions
thank+0.022
 met+0.019
please+0.018
 neighborhood+0.017
organized+0.017
Neg logit contributions
ah-0.039
 cannon-0.039
 snowball-0.039
 tentacles-0.039
 may-0.038
Token in
Token ID287
Feature activation+4.67e-04

Pos logit contributions
affe+0.022
 cannon+0.022
ah+0.021
 snowball+0.021
 may+0.021
Neg logit contributions
thank-0.012
 met-0.009
organized-0.009
 neighborhood-0.009
please-0.008
Token the
Token ID262
Feature activation+2.56e-02

Pos logit contributions
thank+0.001
 neighborhood+0.000
 met+0.000
 outside+0.000
 neighbourhood+0.000
Neg logit contributions
affe-0.001
ience-0.001
bee-0.001
ah-0.001
haw-0.001
Token sky
Token ID6766
Feature activation-5.95e-03

Pos logit contributions
thank+0.039
 met+0.034
 neighborhood+0.033
please+0.032
organized+0.031
Neg logit contributions
ah-0.046
 cannon-0.046
 snowball-0.045
affe-0.045
rum-0.045
Token.
Token ID13
Feature activation+8.84e-03

Pos logit contributions
 cannon+0.012
ah+0.012
 snowball+0.012
affe+0.011
 may+0.011
Neg logit contributions
thank-0.008
 met-0.007
please-0.007
organized-0.006
 neighborhood-0.006
Token<|endoftext|>
Token ID50256
Feature activation+7.99e-04

Pos logit contributions
thank+0.012
 neighborhood+0.010
 met+0.010
please+0.010
 neighbourhood+0.010
Neg logit contributions
 cannon-0.017
 button-0.017
bee-0.017
ah-0.017
 globe-0.017
TokenA
Token ID32
Feature activation-1.02e-02

Pos logit contributions
thank+0.001
 met+0.001
 neighborhood+0.001
please+0.001
organized+0.001
Neg logit contributions
ah-0.002
 cannon-0.002
affe-0.002
ience-0.002
 tentacles-0.002
Token little
Token ID1310
Feature activation+4.54e-03

Pos logit contributions
 cannon+0.018
 may+0.018
 snowball+0.018
 tentacles+0.018
ah+0.018
Neg logit contributions
thank-0.016
please-0.013
 met-0.013
 neighborhood-0.013
organized-0.013
Token girl
Token ID2576
Feature activation+1.02e-02

Pos logit contributions
thank+0.005
 met+0.004
 neighborhood+0.004
please+0.004
organized+0.004
Neg logit contributions
 cannon-0.010
 may-0.010
rum-0.010
 tentacles-0.010
affe-0.009
Token was
Token ID373
Feature activation-3.56e-03

Pos logit contributions
thank+0.015
please+0.013
organized+0.013
 responsible+0.012
ups+0.012
Neg logit contributions
ah-0.019
 may-0.018
 snowball-0.018
 cannon-0.018
 button-0.018
Token playing
Token ID2712
Feature activation+1.27e-02

Pos logit contributions
 snowball+0.006
ah+0.006
 tentacles+0.006
 sled+0.006
 cannon+0.006
Neg logit contributions
thank-0.005
ups-0.005
organized-0.005
please-0.005
bel-0.004
Token at
Token ID379
Feature activation+6.08e-03

Pos logit contributions
thank+0.016
please+0.014
organized+0.013
 met+0.013
ups+0.013
Neg logit contributions
 snowball-0.026
ah-0.025
 cannon-0.025
 button-0.025
 lizard-0.024
Token knob
Token ID36842
Feature activation-1.72e-02

Pos logit contributions
thank+0.031
 met+0.028
please+0.028
organized+0.027
 neighborhood+0.027
Neg logit contributions
 button-0.018
ah-0.017
 snowball-0.017
 may-0.016
 sled-0.016
Token,
Token ID11
Feature activation-4.79e-03

Pos logit contributions
 tentacles+0.033
 cannon+0.033
 snowball+0.033
ah+0.033
affe+0.033
Neg logit contributions
thank-0.024
organized-0.020
please-0.019
 met-0.018
 punished-0.018
Token but
Token ID475
Feature activation-1.04e-02

Pos logit contributions
ah+0.009
affe+0.009
fly+0.009
 cannon+0.009
 globe+0.009
Neg logit contributions
thank-0.007
 met-0.006
 neighborhood-0.006
 punished-0.006
please-0.006
Token Sam
Token ID3409
Feature activation-4.07e-03

Pos logit contributions
ah+0.018
 snowball+0.018
affe+0.018
 cannon+0.017
 button+0.017
Neg logit contributions
thank-0.016
 met-0.015
 neighborhood-0.014
organized-0.013
please-0.013
Token was
Token ID373
Feature activation-2.39e-03

Pos logit contributions
ah+0.007
 snowball+0.007
 cannon+0.007
 may+0.007
affe+0.007
Neg logit contributions
thank-0.006
please-0.005
organized-0.005
 met-0.005
 neighborhood-0.005
Token determined
Token ID5295
Feature activation-1.98e-02

Pos logit contributions
 snowball+0.004
ah+0.004
 cannon+0.004
 tentacles+0.004
 sled+0.004
Neg logit contributions
thank-0.004
organized-0.003
please-0.003
 neighborhood-0.003
 met-0.003
Token.
Token ID13
Feature activation-7.52e-04

Pos logit contributions
ah+0.036
 may+0.035
 cannon+0.035
 snowball+0.035
 button+0.034
Neg logit contributions
thank-0.030
please-0.025
organized-0.025
 met-0.025
 neighborhood-0.024
Token He
Token ID679
Feature activation-1.07e-02

Pos logit contributions
ah+0.001
 cannon+0.001
 snowball+0.001
 may+0.001
affe+0.001
Neg logit contributions
thank-0.002
please-0.001
 met-0.001
organized-0.001
 neighborhood-0.001
Token ran
Token ID4966
Feature activation+1.26e-02

Pos logit contributions
ah+0.022
 cannon+0.022
 snowball+0.022
 may+0.022
affe+0.022
Neg logit contributions
thank-0.013
please-0.011
 met-0.011
organized-0.011
 neighborhood-0.010
Token up
Token ID510
Feature activation+6.20e-03

Pos logit contributions
thank+0.016
 met+0.013
organized+0.013
please+0.012
 neighborhood+0.012
Neg logit contributions
ah-0.027
 cannon-0.026
affe-0.026
 may-0.026
 snowball-0.026
Token to
Token ID284
Feature activation-1.76e-02

Pos logit contributions
thank+0.009
 met+0.007
please+0.007
organized+0.007
 neighborhood+0.007
Neg logit contributions
ah-0.012
 cannon-0.012
affe-0.012
 may-0.012
 snowball-0.012
Token to
Token ID284
Feature activation-4.09e-03

Pos logit contributions
ah+0.023
affe+0.022
 cannon+0.022
 snowball+0.022
 may+0.022
Neg logit contributions
thank-0.015
please-0.012
 met-0.012
organized-0.012
 neighborhood-0.011
Token drink
Token ID4144
Feature activation-1.71e-02

Pos logit contributions
affe+0.009
 snowball+0.008
ah+0.008
hell+0.008
 cannon+0.008
Neg logit contributions
thank-0.005
 met-0.005
 neighborhood-0.004
please-0.004
organized-0.004
Token from
Token ID422
Feature activation+1.15e-02

Pos logit contributions
ah+0.031
 cannon+0.030
affe+0.030
 snowball+0.030
 may+0.030
Neg logit contributions
thank-0.027
please-0.022
 met-0.022
organized-0.022
 neighborhood-0.021
Token the
Token ID262
Feature activation+5.48e-03

Pos logit contributions
thank+0.017
 met+0.015
please+0.014
 neighborhood+0.014
organized+0.014
Neg logit contributions
ah-0.021
affe-0.021
 may-0.020
 cannon-0.020
ience-0.020
Token pond
Token ID16723
Feature activation-1.82e-03

Pos logit contributions
thank+0.010
 neighborhood+0.009
 met+0.008
 doctors+0.008
please+0.008
Neg logit contributions
ah-0.009
rum-0.008
affe-0.008
 cannon-0.008
 may-0.008
Token every
Token ID790
Feature activation-5.04e-02

Pos logit contributions
ah+0.003
 cannon+0.003
 snowball+0.003
 may+0.003
affe+0.003
Neg logit contributions
thank-0.003
 met-0.002
organized-0.002
 neighborhood-0.002
please-0.002
Token day
Token ID1110
Feature activation-1.27e-02

Pos logit contributions
 cannon+0.117
affe+0.116
 may+0.115
 spoiler+0.114
rum+0.114
Neg logit contributions
thank-0.053
 met-0.043
please-0.039
 neighborhood-0.037
 doctors-0.037
Token.
Token ID13
Feature activation-9.05e-03

Pos logit contributions
ah+0.024
 snowball+0.024
 cannon+0.024
 may+0.023
affe+0.023
Neg logit contributions
thank-0.018
 met-0.015
please-0.015
organized-0.015
 neighborhood-0.015
Token Then
Token ID3244
Feature activation+1.78e-02

Pos logit contributions
ah+0.015
 cannon+0.015
bee+0.015
ience+0.015
 button+0.015
Neg logit contributions
thank-0.015
 neighborhood-0.013
 met-0.013
please-0.013
organized-0.012
Token one
Token ID530
Feature activation+2.07e-02

Pos logit contributions
thank+0.027
 met+0.025
 doctors+0.024
please+0.024
 neighborhood+0.023
Neg logit contributions
affe-0.031
 snowball-0.030
ah-0.030
 cannon-0.030
 globe-0.029
Token day
Token ID1110
Feature activation+7.73e-03

Pos logit contributions
thank+0.027
 met+0.023
please+0.022
 neighborhood+0.021
organized+0.021
Neg logit contributions
 cannon-0.042
 may-0.041
ah-0.041
 snowball-0.040
affe-0.040
Token a
Token ID257
Feature activation-1.90e-02

Pos logit contributions
thank+0.015
 met+0.013
please+0.013
organized+0.013
 doctors+0.012
Neg logit contributions
ah-0.019
 may-0.018
affe-0.018
 cannon-0.018
 snowball-0.018
Token big
Token ID1263
Feature activation-1.73e-02

Pos logit contributions
rum+0.039
ion+0.036
ience+0.036
ah+0.035
affe+0.035
Neg logit contributions
thank-0.024
 neighborhood-0.024
 met-0.023
please-0.022
 responsible-0.021
Token bubble
Token ID14310
Feature activation-3.66e-02

Pos logit contributions
rum+0.028
ience+0.026
hell+0.026
ion+0.026
ret+0.026
Neg logit contributions
 neighborhood-0.026
thank-0.026
 met-0.025
 doctors-0.025
 outside-0.025
Token near
Token ID1474
Feature activation-2.20e-02

Pos logit contributions
rum+0.061
haw+0.059
ience+0.058
ah+0.058
 spoiler+0.057
Neg logit contributions
thank-0.055
 outside-0.053
 met-0.050
 doctors-0.050
 neighborhood-0.047
Token the
Token ID262
Feature activation-1.13e-02

Pos logit contributions
ience+0.053
affe+0.052
ah+0.051
 may+0.050
bee+0.049
Neg logit contributions
thank-0.022
 met-0.019
 doctors-0.017
 neighborhood-0.017
 outside-0.017
Token ceiling
Token ID13387
Feature activation-5.20e-02

Pos logit contributions
rum+0.021
ah+0.020
ience+0.019
affe+0.019
haw+0.019
Neg logit contributions
thank-0.017
 met-0.016
 neighborhood-0.016
 doctors-0.015
 bored-0.014
Token.
Token ID13
Feature activation-1.45e-03

Pos logit contributions
ience+0.112
haw+0.112
affe+0.110
ah+0.109
 snowball+0.109
Neg logit contributions
thank-0.058
 neighborhood-0.046
 met-0.045
ups-0.043
 outside-0.042
Token He
Token ID679
Feature activation-1.84e-02

Pos logit contributions
ah+0.002
 snowball+0.002
 cannon+0.002
bee+0.002
affe+0.002
Neg logit contributions
thank-0.003
 met-0.003
please-0.003
organized-0.002
 neighborhood-0.002
Token thought
Token ID1807
Feature activation-2.56e-02

Pos logit contributions
 cannon+0.040
 globe+0.040
rum+0.039
ah+0.039
 snowball+0.039
Neg logit contributions
thank-0.020
 met-0.020
please-0.018
 neighborhood-0.016
 asked-0.015
Token,
Token ID11
Feature activation+8.55e-03

Pos logit contributions
 spoiler+0.051
ah+0.050
fly+0.047
 globe+0.047
osaurus+0.046
Neg logit contributions
 outside-0.032
 when-0.030
 doctors-0.028
 organized-0.027
 bored-0.027
Token "
Token ID366
Feature activation+1.19e-02

Pos logit contributions
 doctors+0.008
 met+0.008
thank+0.007
please+0.007
 bored+0.007
Neg logit contributions
ah-0.020
 snowball-0.019
rum-0.019
 cannon-0.018
 jaws-0.018
Token to
Token ID284
Feature activation+1.37e-02

Pos logit contributions
ah+0.032
 cannon+0.031
 snowball+0.031
 may+0.031
affe+0.031
Neg logit contributions
thank-0.021
please-0.017
 met-0.017
organized-0.017
 neighborhood-0.016
Token play
Token ID711
Feature activation+6.93e-03

Pos logit contributions
thank+0.024
organized+0.022
 neighborhood+0.021
 met+0.021
please+0.021
Neg logit contributions
ah-0.021
 button-0.020
 cannon-0.020
 snowball-0.020
 may-0.020
Token with
Token ID351
Feature activation+1.66e-03

Pos logit contributions
thank+0.010
please+0.009
organized+0.009
 met+0.009
 neighborhood+0.008
Neg logit contributions
 snowball-0.013
 cannon-0.013
 button-0.012
ah-0.012
 may-0.012
Token the
Token ID262
Feature activation+2.85e-03

Pos logit contributions
thank+0.002
organized+0.002
 met+0.002
please+0.002
 neighborhood+0.002
Neg logit contributions
ah-0.003
affe-0.003
 cannon-0.003
 snowball-0.003
 may-0.003
Token boss
Token ID6478
Feature activation+5.31e-03

Pos logit contributions
thank+0.005
 neighborhood+0.005
 doctors+0.004
 met+0.004
please+0.004
Neg logit contributions
ah-0.004
rum-0.004
ience-0.004
affe-0.004
 may-0.004
Tokeny
Token ID88
Feature activation-4.57e-02

Pos logit contributions
thank+0.009
 met+0.008
organized+0.008
ups+0.008
please+0.008
Neg logit contributions
 cannon-0.008
 tentacles-0.008
 may-0.008
 snowball-0.008
ience-0.008
Token girl
Token ID2576
Feature activation-2.30e-02

Pos logit contributions
ah+0.073
 cannon+0.071
 snowball+0.071
 may+0.070
affe+0.070
Neg logit contributions
thank-0.080
please-0.069
 met-0.068
organized-0.068
 neighborhood-0.067
Token.
Token ID13
Feature activation+3.44e-03

Pos logit contributions
ah+0.048
 cannon+0.048
 snowball+0.047
 tentacles+0.047
 button+0.047
Neg logit contributions
thank-0.028
 met-0.024
please-0.022
organized-0.022
 neighborhood-0.022
Token They
Token ID1119
Feature activation-1.84e-03

Pos logit contributions
thank+0.006
 met+0.005
please+0.005
organized+0.005
 neighborhood+0.005
Neg logit contributions
ah-0.005
 cannon-0.005
 snowball-0.005
affe-0.005
 may-0.005
Token wanted
Token ID2227
Feature activation-2.45e-02

Pos logit contributions
ah+0.003
 may+0.003
 snowball+0.003
 cannon+0.003
affe+0.003
Neg logit contributions
thank-0.003
please-0.002
organized-0.002
 neighborhood-0.002
 met-0.002
Token to
Token ID284
Feature activation-3.29e-03

Pos logit contributions
ah+0.050
 cannon+0.049
 snowball+0.049
affe+0.048
 may+0.048
Neg logit contributions
thank-0.032
please-0.027
 met-0.026
organized-0.026
 neighborhood-0.025
Token their
Token ID511
Feature activation+1.28e-02

Pos logit contributions
thank+0.030
 met+0.025
please+0.024
organized+0.024
 neighborhood+0.024
Neg logit contributions
ah-0.038
 cannon-0.037
 snowball-0.037
 may-0.037
affe-0.036
Token own
Token ID898
Feature activation+3.00e-04

Pos logit contributions
thank+0.023
 met+0.020
please+0.020
organized+0.019
 neighborhood+0.019
Neg logit contributions
ah-0.020
 cannon-0.020
 snowball-0.020
 may-0.019
affe-0.019
Token ending
Token ID7464
Feature activation-9.46e-03

Pos logit contributions
thank+0.001
 met+0.001
organized+0.001
 neighborhood+0.001
please+0.001
Neg logit contributions
ah-0.000
 cannon-0.000
 snowball-0.000
 may-0.000
affe-0.000
Token to
Token ID284
Feature activation-9.21e-03

Pos logit contributions
ah+0.016
 may+0.016
 cannon+0.016
 snowball+0.016
 button+0.015
Neg logit contributions
thank-0.015
please-0.013
 met-0.013
organized-0.013
 neighborhood-0.013
Token the
Token ID262
Feature activation+9.89e-03

Pos logit contributions
 may+0.015
 cannon+0.015
 button+0.015
ah+0.014
 snowball+0.014
Neg logit contributions
thank-0.015
please-0.013
organized-0.013
ups-0.013
 met-0.013
Token story
Token ID1621
Feature activation-4.60e-02

Pos logit contributions
thank+0.022
organized+0.020
please+0.020
 met+0.019
 punished+0.019
Neg logit contributions
ah-0.011
 button-0.011
 may-0.011
affe-0.011
 lizard-0.011
Token.
Token ID13
Feature activation+1.87e-03

Pos logit contributions
ah+0.084
 cannon+0.083
 snowball+0.082
 may+0.082
affe+0.082
Neg logit contributions
thank-0.069
please-0.059
 met-0.058
organized-0.058
 neighborhood-0.056
Token\n
Token ID198
Feature activation-3.44e-03

Pos logit contributions
thank+0.004
please+0.004
 punished+0.004
 unatt+0.004
 doctors+0.003
Neg logit contributions
 may-0.002
 It-0.002
 button-0.002
 cannon-0.002
affe-0.002
Token\n
Token ID198
Feature activation+1.24e-03

Pos logit contributions
 snowball+0.005
ah+0.005
 may+0.005
affe+0.005
 button+0.005
Neg logit contributions
thank-0.006
please-0.005
organized-0.005
 met-0.005
 punished-0.005
TokenL
Token ID43
Feature activation-1.09e-02

Pos logit contributions
thank+0.002
please+0.002
 met+0.002
 bored+0.002
 neighborhood+0.002
Neg logit contributions
ah-0.002
 button-0.002
affe-0.002
 snowball-0.002
 may-0.002
Tokenily
Token ID813
Feature activation+8.32e-03

Pos logit contributions
hell+0.017
affe+0.017
ah+0.017
ion+0.016
bee+0.016
Neg logit contributions
thank-0.019
please-0.018
 met-0.016
 neighborhood-0.016
 bored-0.016
Token\n
Token ID198
Feature activation-4.83e-03

Pos logit contributions
 snowball+0.014
ah+0.014
 cannon+0.014
affe+0.014
 may+0.014
Neg logit contributions
thank-0.015
organized-0.013
please-0.013
 met-0.012
 neighborhood-0.012
TokenThey
Token ID2990
Feature activation-5.52e-03

Pos logit contributions
 snowball+0.010
ah+0.010
 cannon+0.010
affe+0.010
 may+0.010
Neg logit contributions
thank-0.006
organized-0.005
please-0.005
 met-0.005
 neighborhood-0.005
Token put
Token ID1234
Feature activation+4.22e-04

Pos logit contributions
ah+0.011
 snowball+0.010
 cannon+0.010
 may+0.010
 sled+0.010
Neg logit contributions
thank-0.008
organized-0.007
please-0.007
 neighborhood-0.007
 doctors-0.006
Token the
Token ID262
Feature activation+5.66e-03

Pos logit contributions
thank+0.001
 met+0.001
organized+0.001
please+0.000
 neighborhood+0.000
Neg logit contributions
affe-0.001
ah-0.001
 cannon-0.001
fly-0.001
 may-0.001
Token cap
Token ID1451
Feature activation-9.58e-03

Pos logit contributions
thank+0.010
 met+0.010
 doctors+0.009
 neighborhood+0.009
please+0.009
Neg logit contributions
rum-0.008
ience-0.008
affe-0.008
fly-0.008
ah-0.008
Token on
Token ID319
Feature activation-4.51e-02

Pos logit contributions
ah+0.020
 may+0.019
 cannon+0.019
 tentacles+0.019
affe+0.019
Neg logit contributions
thank-0.012
 met-0.010
organized-0.010
ups-0.010
 neighborhood-0.009
Token Lily
Token ID20037
Feature activation-3.05e-02

Pos logit contributions
ah+0.083
affe+0.082
 cannon+0.081
 tentacles+0.081
 may+0.081
Neg logit contributions
thank-0.068
organized-0.055
 met-0.055
 neighborhood-0.055
please-0.053
Token's
Token ID338
Feature activation-2.99e-02

Pos logit contributions
ah+0.063
 cannon+0.062
 snowball+0.061
affe+0.061
haw+0.061
Neg logit contributions
thank-0.036
 met-0.034
organized-0.033
 neighborhood-0.032
 outside-0.030
Token head
Token ID1182
Feature activation-7.90e-03

Pos logit contributions
ience+0.058
ah+0.058
 may+0.055
fly+0.054
affe+0.054
Neg logit contributions
thank-0.041
 neighborhood-0.035
 doctors-0.035
 met-0.034
 responsible-0.033
Token and
Token ID290
Feature activation-2.68e-02

Pos logit contributions
 cannon+0.013
 snowball+0.013
affe+0.013
ah+0.013
 tentacles+0.013
Neg logit contributions
thank-0.013
organized-0.011
 met-0.011
 neighborhood-0.011
please-0.011
Token laughed
Token ID13818
Feature activation+1.35e-03

Pos logit contributions
 cannon+0.047
 snowball+0.046
 sled+0.045
ah+0.045
 may+0.045
Neg logit contributions
thank-0.043
organized-0.038
please-0.038
 neighborhood-0.036
 responsible-0.034