TOP ACTIVATIONS
MAX = 5.878

and said , " G reetings Sir . Do you speak
dog replied , " G reetings , little one . Don
gen ie . " G reetings , kind one !" said
friendly voice , " G reetings , little one . Why
" G reetings , brave rabbit . I
spoke up . " G reetings , le opard ," it
friendly voice . " G reetings Jack ! Are you here
it . But the fire fly was too quick and flew
and thanked the woman prof use ly . The woman smiled
. They were called ost ric hes . Lily and Ben
and said , " Exc use me , that 's our
. " Exc use me , sir , can
and not take things for granted . The end
. " Exc use me , m ister ,"
ostr ich pen . Ost ric hes are big birds with
. " Exc use me , children . It
to take your health for granted . <|endoftext|> Once upon a
“ Exc use me , Fox , ”
about how your actions might affect others . <|endoftext|> Once upon
someone say , " Exc use me , little girl .

MIN ACTIVATIONS
MIN = -7.453

her mom my watched in amaz ement as the machine flew
Sam gasped in amaz ement . He had never
egg ! He watched in amaz ement as the egg began
air . He watched with amaz ement as the z ig
ly . Jill nodded in amaz ement .
little girl looked up in amaz ement and said , "
. Tim my watched with amaz ement as he saw stars
ered in , blinking in amaz ement . But
. To her amaz ement , the clothes came
Tim my just stared in amaz ement at the t eddy
eddy bear and watched in amaz ement as it shrunk an
L ucky watched in amaz ement as the frog zoom
Little Jack gasped in amaz ement . Dad pointed to
go ! Everyone watched in amaz ement as it took off
friends stopped and watched in amaz ement . The dog too
my 's eyes widened with amaz ement . Grand ma then
the zipper and gasped with amaz ement at all the treasures
, her eyes wide with amaz ement . Her
, her eyes wide with amaz ement . "
Her best friend watched in amaz ement as Amelia gigg led

INTERVAL 2.546 - 5.878
CONTAINS 0.257%

. They were eating y ummy spaghetti and meat balls .
saw the monster again - sc ur rying away from the
find yourself in a dangerous situation , it 's important to
was time for bed . Patty was so happy and fell
head . M ummy : " Thank you ,

INTERVAL -0.787 - 2.546
CONTAINS 87.355%

happy and put on his lucky sock . But
work . Can you wait a little bit ?"
our home more . Can we go back now ?"
a time , there was a kid named Jack . He
the tree . This time , she made sure to hold

INTERVAL -4.120 - -0.787
CONTAINS 12.374%

the new fashion and started down the hill . Everyone in
had gone on a journey with Santa and how it had
with the leaves . He did not see them . He
They were amazed at how tall and beautiful it was .
hard to eat . <|endoftext|> Once upon a time there was

INTERVAL -7.453 - -4.120
CONTAINS 0.013%

at the bug in the p uddle !" she said in
they both looked around in amaz ement . The lake was
fox got out of the p uddle . She shook off
a bad tum my ac he and her mom my gave
a nos y old man pe ering at them through the
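
The four INTERVAL blocks above appear to split the range between MIN and MAX into equal-width bins. The sketch below is only a consistency check on the reported numbers, not the dashboard's own code; the helper name `equal_width_edges` is hypothetical, and the equal-width assumption is inferred from the listed boundaries rather than stated in the source.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as reported in this listing.
edges = equal_width_edges(-7.453, 5.878, 4)

# Interval boundaries and "CONTAINS" shares as reported, lowest bin first.
reported = [-7.453, -4.120, -0.787, 2.546, 5.878]
shares = [0.013, 12.374, 87.355, 0.257]

# Largest disagreement between computed and reported edges (rounding only).
max_gap = max(abs(e - r) for e, r in zip(edges, reported))

# The shares should account for essentially all positions.
total_share = sum(shares)

print([round(e, 3) for e in edges], max_gap, total_share)
```

Under that assumption the computed edges agree with the reported ones to within rounding, and the shares sum to just under 100%, so nearly all activations fall in the two middle bins.
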
Token and
Token ID290
Feature activation-1.46e-01

Pos logit contributions
 two+0.018
 excitement+0.017
Y+0.016
 himself+0.015
 his+0.015
Neg logit contributions
aring-0.016
ues-0.016
 appearance-0.014
land-0.014
ems-0.014
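
Each per-token record in this listing has the same layout: a `Token` line, a `Token ID` line, a `Feature activation` line, then the top positive and negative logit contributions, with each label fused directly to its signed value. The following is a minimal parsing sketch under that assumption; the function names `split_label_value` and `parse_record` are hypothetical and not part of any tool named in the source.

```python
import re

# A signed number (optionally in scientific notation) at the end of a line.
VALUE = re.compile(r"([+-]\d+(?:\.\d+)?(?:e[+-]?\d+)?)$")


def split_label_value(line):
    """Split a fused line such as ' two+0.018' into (' two', 0.018)."""
    match = VALUE.search(line)
    if match is None:
        raise ValueError(f"no trailing signed number in {line!r}")
    return line[:match.start()], float(match.group(1))


def parse_record(lines):
    """Parse one record: token, token ID, activation, contributions."""
    token = lines[0][len("Token"):]            # keeps a leading space
    token_id = int(lines[1][len("Token ID"):])
    _, activation = split_label_value(lines[2])
    pos, neg, bucket = [], [], None
    for line in lines[3:]:
        if line.strip() == "Pos logit contributions":
            bucket = pos
        elif line.strip() == "Neg logit contributions":
            bucket = neg
        elif line.strip() and bucket is not None:
            bucket.append(split_label_value(line))
    return {"token": token, "token_id": token_id,
            "activation": activation, "pos": pos, "neg": neg}


# The first record of this listing, copied line by line.
record = parse_record([
    "Token and", "Token ID290", "Feature activation-1.46e-01", "",
    "Pos logit contributions",
    " two+0.018", " excitement+0.017", "Y+0.016", " himself+0.015", " his+0.015",
    "Neg logit contributions",
    "aring-0.016", "ues-0.016", " appearance-0.014", "land-0.014", "ems-0.014",
])
print(record["token_id"], record["activation"], record["pos"][0])
```

Leading spaces in labels are preserved because they distinguish word-initial tokens (` two`) from continuation pieces (`aring`).
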
Token said
Token ID531
Feature activation-2.77e-01

Pos logit contributions
Y+0.004
 excitement+0.004
 two+0.004
 mixture+0.004
 whole+0.004
Neg logit contributions
aring-0.005
 where-0.005
 why-0.005
ues-0.005
 about-0.005
Token,
Token ID11
Feature activation+2.87e-01

Pos logit contributions
 two+0.009
 excitement+0.008
Y+0.008
 himself+0.008
 his+0.008
Neg logit contributions
aring-0.010
ues-0.010
 appearance-0.010
ems-0.009
land-0.009
Token "
Token ID366
Feature activation-4.58e-01

Pos logit contributions
aring+0.010
ues+0.010
 appearance+0.009
land+0.009
ems+0.009
Neg logit contributions
 two-0.011
Y-0.010
 excitement-0.010
 himself-0.009
 p-0.009
TokenG
Token ID38
Feature activation-7.74e-01

Pos logit contributions
 excitement+0.013
 two+0.013
Y+0.011
 mixture+0.011
 p+0.011
Neg logit contributions
aring-0.018
land-0.017
ues-0.017
 Spanish-0.016
 rather-0.016
Tokenreetings
Token ID46648
Feature activation+5.88e+00

Pos logit contributions
 two+0.030
Y+0.029
 excitement+0.028
 himself+0.027
 mixture+0.025
Neg logit contributions
aring-0.022
ues-0.022
 rather-0.020
 appearance-0.020
 Spanish-0.019
Token Sir
Token ID7361
Feature activation+1.31e+00

Pos logit contributions
aring+0.210
ues+0.206
 rather+0.184
 appearance+0.184
land+0.179
Neg logit contributions
 two-0.205
Y-0.190
 excitement-0.189
 himself-0.171
 p-0.167
Token.
Token ID13
Feature activation-3.49e-03

Pos logit contributions
aring+0.054
ues+0.053
 rather+0.048
 appearance+0.048
land+0.047
Neg logit contributions
 two-0.038
Y-0.035
 excitement-0.034
 himself-0.031
 p-0.029
Token Do
Token ID2141
Feature activation+2.27e+00

Pos logit contributions
 two+0.000
Y+0.000
 excitement+0.000
 himself+0.000
 ah+0.000
Neg logit contributions
ues-0.000
aring-0.000
land-0.000
 appearance-0.000
 rather-0.000
Token you
Token ID345
Feature activation+2.72e-01

Pos logit contributions
aring+0.065
ues+0.061
land+0.055
 appearance+0.054
 rather+0.053
Neg logit contributions
 two-0.092
Y-0.091
 excitement-0.090
 himself-0.081
 mixture-0.080
Token speak
Token ID2740
Feature activation+9.61e-01

Pos logit contributions
ues+0.010
aring+0.010
land+0.009
 appearance+0.009
 Portuguese+0.009
Neg logit contributions
 two-0.009
 excitement-0.008
Y-0.008
 himself-0.008
 his-0.007
Token dog
Token ID3290
Feature activation+6.74e-01

Pos logit contributions
aring+0.010
ues+0.010
land+0.009
 rather+0.009
 appearance+0.009
Neg logit contributions
Y-0.006
 excitement-0.006
 two-0.006
 himself-0.005
 mixture-0.005
Token replied
Token ID8712
Feature activation+6.03e-01

Pos logit contributions
aring+0.029
ues+0.029
land+0.027
 appearance+0.026
 rather+0.026
Neg logit contributions
 two-0.017
Y-0.016
 excitement-0.016
 himself-0.014
 p-0.014
Token,
Token ID11
Feature activation+2.00e-01

Pos logit contributions
aring+0.020
ues+0.020
 appearance+0.019
land+0.019
ems+0.018
Neg logit contributions
 two-0.022
 excitement-0.020
Y-0.020
 himself-0.019
 p-0.018
Token "
Token ID366
Feature activation-2.63e-01

Pos logit contributions
aring+0.006
ues+0.006
 appearance+0.005
land+0.005
 rather+0.005
Neg logit contributions
 two-0.008
Y-0.008
 excitement-0.008
 himself-0.007
 p-0.007
TokenG
Token ID38
Feature activation-6.83e-01

Pos logit contributions
 excitement+0.008
 two+0.008
Y+0.007
 mixture+0.007
 himself+0.007
Neg logit contributions
aring-0.010
ues-0.009
 rather-0.009
 Spanish-0.009
.'"-0.008
Tokenreetings
Token ID46648
Feature activation+5.76e+00

Pos logit contributions
 two+0.025
Y+0.024
 excitement+0.023
 himself+0.022
 p+0.020
Neg logit contributions
aring-0.022
ues-0.022
 rather-0.020
 appearance-0.020
land-0.019
Token,
Token ID11
Feature activation-6.25e-02

Pos logit contributions
ues+0.229
aring+0.226
 appearance+0.205
 rather+0.203
land+0.197
Neg logit contributions
 two-0.185
 excitement-0.160
Y-0.157
 himself-0.150
 p-0.146
Token little
Token ID1310
Feature activation-1.58e-01

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 himself+0.002
 p+0.002
Neg logit contributions
ues-0.002
aring-0.002
land-0.002
 appearance-0.002
 rather-0.002
Token one
Token ID530
Feature activation-2.26e-01

Pos logit contributions
 two+0.004
Y+0.003
 excitement+0.003
 himself+0.003
 p+0.003
Neg logit contributions
aring-0.007
ues-0.007
land-0.007
 appearance-0.007
 rather-0.007
Token.
Token ID13
Feature activation-7.68e-02

Pos logit contributions
 two+0.007
 excitement+0.006
 p+0.006
Y+0.006
 giant+0.006
Neg logit contributions
ues-0.009
aring-0.009
 Him-0.008
 appearance-0.008
land-0.008
Token Don
Token ID2094
Feature activation+7.04e-01

Pos logit contributions
 two+0.003
Y+0.003
 excitement+0.003
 himself+0.003
 ah+0.002
Neg logit contributions
ues-0.002
aring-0.002
land-0.002
 appearance-0.002
 rather-0.002
Token gen
Token ID2429
Feature activation-1.27e+00

Pos logit contributions
 two+0.013
Y+0.013
 excitement+0.012
 himself+0.011
 mixture+0.010
Neg logit contributions
aring-0.021
ues-0.020
land-0.019
 appearance-0.019
 rather-0.019
Tokenie
Token ID494
Feature activation+1.02e+00

Pos logit contributions
 two+0.047
Y+0.045
 excitement+0.044
 himself+0.041
 p+0.040
Neg logit contributions
aring-0.040
ues-0.039
 appearance-0.035
 rather-0.035
 Spanish-0.034
Token.
Token ID13
Feature activation-3.41e-01

Pos logit contributions
aring+0.038
ues+0.037
 appearance+0.033
land+0.033
 rather+0.033
Neg logit contributions
 two-0.033
 excitement-0.031
Y-0.031
 himself-0.028
 p-0.027
Token "
Token ID366
Feature activation-5.79e-01

Pos logit contributions
 two+0.011
Y+0.010
 excitement+0.010
 himself+0.009
 mixture+0.009
Neg logit contributions
aring-0.013
ues-0.013
land-0.012
 rather-0.011
 appearance-0.011
TokenG
Token ID38
Feature activation-8.54e-01

Pos logit contributions
 excitement+0.017
 two+0.016
 himself+0.014
 p+0.013
 mixture+0.013
Neg logit contributions
aring-0.022
ues-0.020
 rather-0.020
land-0.020
 Spanish-0.020
Tokenreetings
Token ID46648
Feature activation+5.73e+00

Pos logit contributions
 two+0.030
Y+0.028
 himself+0.027
 excitement+0.026
 mixture+0.025
Neg logit contributions
aring-0.027
 rather-0.026
ues-0.026
 appearance-0.025
 Spanish-0.025
Token,
Token ID11
Feature activation+6.49e-03

Pos logit contributions
aring+0.204
ues+0.201
 appearance+0.178
 rather+0.177
ems+0.172
Neg logit contributions
 two-0.206
 excitement-0.186
Y-0.183
 p-0.169
 himself-0.169
Token kind
Token ID1611
Feature activation+1.21e+00

Pos logit contributions
aring+0.000
ues+0.000
land+0.000
 appearance+0.000
 rather+0.000
Neg logit contributions
 two-0.000
Y-0.000
 excitement-0.000
 himself-0.000
 mixture-0.000
Token one
Token ID530
Feature activation-2.12e-01

Pos logit contributions
aring+0.044
ues+0.043
land+0.039
 appearance+0.039
 rather+0.038
Neg logit contributions
 two-0.039
Y-0.039
 excitement-0.038
 himself-0.034
 mixture-0.033
Token!"
Token ID2474
Feature activation-4.01e-01

Pos logit contributions
 two+0.008
 excitement+0.007
Y+0.007
 whole+0.007
 p+0.007
Neg logit contributions
ues-0.007
aring-0.007
 appearance-0.006
ems-0.006
land-0.006
Token said
Token ID531
Feature activation-3.01e-01

Pos logit contributions
Y+0.011
 two+0.011
 excitement+0.011
 mixture+0.009
 himself+0.009
Neg logit contributions
aring-0.016
ues-0.015
land-0.014
 where-0.014
 rather-0.014
Token friendly
Token ID8030
Feature activation+1.27e+00

Pos logit contributions
 two+0.014
 excitement+0.013
Y+0.013
 p+0.011
 himself+0.011
Neg logit contributions
aring-0.021
ues-0.021
land-0.019
 appearance-0.019
ems-0.018
Token voice
Token ID3809
Feature activation+4.82e-01

Pos logit contributions
ues+0.059
aring+0.059
ems+0.053
 rather+0.053
.'"+0.052
Neg logit contributions
 excitement-0.032
 two-0.031
Y-0.025
 p-0.025
 himself-0.025
Token,
Token ID11
Feature activation+7.54e-02

Pos logit contributions
ues+0.017
aring+0.016
 appearance+0.016
 rather+0.015
ems+0.014
Neg logit contributions
 two-0.017
 excitement-0.015
 himself-0.014
 p-0.014
Y-0.014
Token "
Token ID366
Feature activation-5.31e-01

Pos logit contributions
ues+0.003
aring+0.003
 appearance+0.002
land+0.002
ute+0.002
Neg logit contributions
 two-0.003
 excitement-0.002
Y-0.002
 himself-0.002
 p-0.002
TokenG
Token ID38
Feature activation-9.95e-01

Pos logit contributions
 two+0.016
 excitement+0.016
Y+0.015
 himself+0.013
 mixture+0.013
Neg logit contributions
aring-0.021
ues-0.020
land-0.019
 appearance-0.018
 rather-0.018
Tokenreetings
Token ID46648
Feature activation+5.72e+00

Pos logit contributions
 two+0.039
Y+0.036
 himself+0.035
 excitement+0.033
Mike+0.032
Neg logit contributions
 rather-0.027
ues-0.027
aring-0.026
 appearance-0.026
 Spanish-0.024
Token,
Token ID11
Feature activation+2.92e-02

Pos logit contributions
aring+0.219
ues+0.219
 appearance+0.196
 rather+0.195
land+0.185
Neg logit contributions
 two-0.191
 excitement-0.167
Y-0.165
 himself-0.153
 p-0.148
Token little
Token ID1310
Feature activation-1.21e-01

Pos logit contributions
aring+0.001
ues+0.001
land+0.001
 appearance+0.001
 rather+0.001
Neg logit contributions
 two-0.001
Y-0.001
 excitement-0.001
 himself-0.001
 p-0.001
Token one
Token ID530
Feature activation-3.25e-01

Pos logit contributions
 two+0.003
Y+0.003
 excitement+0.003
 himself+0.002
 mixture+0.002
Neg logit contributions
aring-0.006
ues-0.005
land-0.005
 appearance-0.005
 rather-0.005
Token.
Token ID13
Feature activation-3.34e-02

Pos logit contributions
 two+0.012
 excitement+0.010
 giant+0.009
 whole+0.009
 p+0.009
Neg logit contributions
ues-0.013
aring-0.011
 appearance-0.011
 Him-0.010
ree-0.010
Token Why
Token ID4162
Feature activation+1.23e+00

Pos logit contributions
 two+0.001
Y+0.001
 excitement+0.001
 himself+0.001
 ah+0.001
Neg logit contributions
ues-0.001
aring-0.001
 appearance-0.001
land-0.001
 rather-0.001
Token
Token ID220
Feature activation-8.96e-01

Pos logit contributions
 two+0.016
 excitement+0.015
Y+0.015
 himself+0.014
 p+0.013
Neg logit contributions
ues-0.021
aring-0.021
 appearance-0.020
land-0.020
 rather-0.019
Token
Token ID198
Feature activation-1.52e-01

Pos logit contributions
 two+0.022
 himself+0.019
 excitement+0.017
Y+0.017
 his+0.016
Neg logit contributions
 appearance-0.041
aring-0.041
ues-0.041
 culture-0.038
 tale-0.038
Token
Token ID198
Feature activation-1.72e-01

Pos logit contributions
 two+0.005
Y+0.004
 excitement+0.004
 himself+0.004
 p+0.004
Neg logit contributions
aring-0.006
ues-0.006
 appearance-0.005
land-0.005
 rather-0.005
Token"
Token ID1
Feature activation-1.52e+00

Pos logit contributions
 two+0.006
 excitement+0.006
Y+0.006
 himself+0.005
 mixture+0.005
Neg logit contributions
aring-0.006
ues-0.006
land-0.005
 rather-0.005
 appearance-0.005
TokenG
Token ID38
Feature activation-1.15e+00

Pos logit contributions
 excitement+0.049
 two+0.044
 mixture+0.043
 mar+0.043
 Lego+0.042
Neg logit contributions
aring-0.049
 why-0.047
 Spanish-0.044
.'"-0.043
land-0.043
Tokenreetings
Token ID46648
Feature activation+5.66e+00

Pos logit contributions
 two+0.043
Y+0.040
 himself+0.038
 excitement+0.037
 mixture+0.035
Neg logit contributions
ues-0.036
aring-0.035
 appearance-0.033
 rather-0.033
 Spanish-0.033
Token,
Token ID11
Feature activation-7.13e-02

Pos logit contributions
aring+0.208
ues+0.207
 appearance+0.185
 rather+0.183
land+0.181
Neg logit contributions
 two-0.191
 excitement-0.176
Y-0.175
 himself-0.160
 p-0.156
Token brave
Token ID14802
Feature activation-4.16e-01

Pos logit contributions
 two+0.002
 excitement+0.002
Y+0.002
 himself+0.001
 p+0.001
Neg logit contributions
ues-0.003
aring-0.003
 appearance-0.003
land-0.003
 rather-0.003
Token rabbit
Token ID22746
Feature activation+1.08e+00

Pos logit contributions
 two+0.010
 excitement+0.009
Y+0.009
 himself+0.008
 p+0.007
Neg logit contributions
aring-0.019
ues-0.019
land-0.017
 appearance-0.017
 rather-0.017
Token.
Token ID13
Feature activation-1.85e-01

Pos logit contributions
ues+0.037
aring+0.036
 appearance+0.035
 Spanish+0.032
 rather+0.032
Neg logit contributions
 two-0.039
Y-0.035
 excitement-0.034
 himself-0.033
 p-0.033
Token I
Token ID314
Feature activation+2.99e-01

Pos logit contributions
 two+0.007
Y+0.007
 excitement+0.007
 himself+0.006
 ah+0.006
Neg logit contributions
ues-0.006
aring-0.005
 appearance-0.005
land-0.005
 rather-0.005
Token spoke
Token ID5158
Feature activation+1.01e+00

Pos logit contributions
aring+0.024
ues+0.024
land+0.022
 where+0.022
 Spanish+0.021
Neg logit contributions
Y-0.022
 two-0.021
 excitement-0.021
 himself-0.018
 mixture-0.018
Token up
Token ID510
Feature activation-1.10e+00

Pos logit contributions
aring+0.034
ues+0.033
 appearance+0.030
 rather+0.029
land+0.029
Neg logit contributions
 two-0.037
 excitement-0.035
Y-0.033
 himself-0.032
 his-0.031
Token.
Token ID13
Feature activation-1.42e-01

Pos logit contributions
 two+0.041
 excitement+0.040
Y+0.039
 himself+0.036
 p+0.035
Neg logit contributions
aring-0.036
ues-0.035
land-0.031
 appearance-0.031
 rather-0.030
Token "
Token ID366
Feature activation-4.76e-01

Pos logit contributions
 two+0.005
 excitement+0.004
Y+0.004
 himself+0.004
 p+0.004
Neg logit contributions
aring-0.005
ues-0.005
 appearance-0.005
 rather-0.005
land-0.005
TokenG
Token ID38
Feature activation-8.67e-01

Pos logit contributions
 excitement+0.015
 two+0.015
Y+0.014
 mixture+0.013
 himself+0.013
Neg logit contributions
aring-0.018
ues-0.016
land-0.016
 Spanish-0.016
 rather-0.015
Tokenreetings
Token ID46648
Feature activation+5.64e+00

Pos logit contributions
 two+0.032
Y+0.030
 himself+0.028
 excitement+0.028
 mixture+0.026
Neg logit contributions
ues-0.027
aring-0.027
 rather-0.025
 appearance-0.025
 Spanish-0.024
Token,
Token ID11
Feature activation+3.13e-02

Pos logit contributions
ues+0.216
aring+0.215
 rather+0.195
 appearance+0.193
 Spanish+0.185
Neg logit contributions
 two-0.192
 excitement-0.164
Y-0.160
 p-0.152
 himself-0.151
Token le
Token ID443
Feature activation-1.83e+00

Pos logit contributions
ues+0.001
aring+0.001
 appearance+0.001
land+0.001
 rather+0.001
Neg logit contributions
 two-0.001
Y-0.001
 excitement-0.001
 himself-0.001
 p-0.001
Tokenopard
Token ID15478
Feature activation-2.01e-01

Pos logit contributions
 excitement+0.077
Y+0.071
 two+0.071
 anticipation+0.063
 himself+0.063
Neg logit contributions
aring-0.058
ues-0.051
land-0.046
 where-0.044
 rather-0.044
Token,"
Token ID553
Feature activation-1.03e+00

Pos logit contributions
 two+0.006
Y+0.006
 excitement+0.005
 p+0.005
 himself+0.005
Neg logit contributions
ues-0.008
aring-0.008
 appearance-0.007
 Spanish-0.007
land-0.007
Token it
Token ID340
Feature activation-1.47e-01

Pos logit contributions
 two+0.031
 excitement+0.028
Y+0.027
 himself+0.024
 p+0.024
Neg logit contributions
aring-0.045
ues-0.044
 appearance-0.041
land-0.039
 rather-0.039
Token friendly
Token ID8030
Feature activation+1.12e+00

Pos logit contributions
 two+0.018
Y+0.018
 excitement+0.018
 himself+0.015
 p+0.014
Neg logit contributions
aring-0.026
ues-0.026
land-0.024
 appearance-0.024
 rather-0.023
Token voice
Token ID3809
Feature activation+3.26e-01

Pos logit contributions
ues+0.047
aring+0.047
ems+0.042
 rather+0.042
land+0.042
Neg logit contributions
 excitement-0.031
 two-0.031
Y-0.028
 p-0.026
 himself-0.025
Token.
Token ID13
Feature activation-3.88e-01

Pos logit contributions
ues+0.012
aring+0.012
 appearance+0.011
 rather+0.011
ems+0.010
Neg logit contributions
 two-0.011
 excitement-0.010
Y-0.009
 himself-0.009
 p-0.009
Token "
Token ID366
Feature activation-6.80e-01

Pos logit contributions
 two+0.014
 excitement+0.013
Y+0.013
 himself+0.012
 his+0.011
Neg logit contributions
aring-0.014
ues-0.013
land-0.013
 appearance-0.013
 tale-0.012
TokenG
Token ID38
Feature activation-9.83e-01

Pos logit contributions
 excitement+0.022
 two+0.021
Y+0.020
 mixture+0.019
 himself+0.018
Neg logit contributions
aring-0.025
ues-0.023
land-0.022
 rather-0.022
 Spanish-0.022
Tokenreetings
Token ID46648
Feature activation+5.58e+00

Pos logit contributions
 two+0.039
Y+0.037
 himself+0.035
 excitement+0.035
 mixture+0.032
Neg logit contributions
aring-0.028
ues-0.027
 rather-0.027
 appearance-0.026
 Spanish-0.025
Token Jack
Token ID3619
Feature activation+7.98e-01

Pos logit contributions
aring+0.203
ues+0.201
 appearance+0.179
 rather+0.178
land+0.174
Neg logit contributions
 two-0.192
 excitement-0.177
Y-0.174
 himself-0.161
 p-0.156
Token!
Token ID0
Feature activation-1.67e-01

Pos logit contributions
ues+0.028
aring+0.028
 appearance+0.026
 rather+0.025
 Spanish+0.024
Neg logit contributions
 two-0.029
 excitement-0.025
Y-0.025
 himself-0.024
 p-0.023
Token Are
Token ID4231
Feature activation+7.02e-01

Pos logit contributions
 two+0.007
Y+0.006
 excitement+0.006
 himself+0.006
 ah+0.005
Neg logit contributions
ues-0.005
aring-0.005
 appearance-0.005
land-0.005
 rather-0.004
Token you
Token ID345
Feature activation+4.65e-02

Pos logit contributions
aring+0.019
ues+0.017
 where+0.015
land+0.015
 rather+0.015
Neg logit contributions
Y-0.029
 excitement-0.029
 two-0.029
 mixture-0.026
 himself-0.026
Token here
Token ID994
Feature activation-8.45e-02

Pos logit contributions
aring+0.002
ues+0.002
land+0.002
 appearance+0.002
 rather+0.001
Neg logit contributions
 two-0.002
Y-0.001
 excitement-0.001
 himself-0.001
 mixture-0.001
Token it
Token ID340
Feature activation-9.31e-01

Pos logit contributions
aring+0.010
ues+0.009
 rather+0.009
 where+0.009
 Spanish+0.008
Neg logit contributions
 two-0.010
Y-0.010
 excitement-0.010
 mixture-0.009
 himself-0.009
Token.
Token ID13
Feature activation-3.40e-01

Pos logit contributions
 two+0.030
 excitement+0.026
Y+0.026
 himself+0.026
 p+0.025
Neg logit contributions
aring-0.038
ues-0.037
 Spanish-0.033
 Him-0.033
 appearance-0.032
Token But
Token ID887
Feature activation-7.25e-01

Pos logit contributions
 two+0.010
Y+0.010
 excitement+0.009
 himself+0.008
 p+0.008
Neg logit contributions
aring-0.014
ues-0.014
 appearance-0.013
land-0.013
 rather-0.012
Token the
Token ID262
Feature activation-3.36e-01

Pos logit contributions
 two+0.021
Y+0.021
 excitement+0.020
 himself+0.018
 p+0.017
Neg logit contributions
aring-0.029
ues-0.028
land-0.026
 appearance-0.026
 rather-0.025
Token fire
Token ID2046
Feature activation-1.47e-01

Pos logit contributions
 two+0.008
Y+0.008
 excitement+0.008
 himself+0.007
 mixture+0.007
Neg logit contributions
aring-0.015
ues-0.014
land-0.013
 appearance-0.013
 rather-0.013
Tokenfly
Token ID12254
Feature activation+4.83e+00

Pos logit contributions
 two+0.005
Y+0.005
 excitement+0.005
 himself+0.004
 p+0.004
Neg logit contributions
aring-0.005
ues-0.005
land-0.005
 appearance-0.005
 rather-0.004
Token was
Token ID373
Feature activation+4.58e-01

Pos logit contributions
aring+0.182
ues+0.182
land+0.166
 Spanish+0.162
 appearance+0.159
Neg logit contributions
 two-0.156
Y-0.142
 excitement-0.142
 p-0.134
 himself-0.127
Token too
Token ID1165
Feature activation-8.72e-01

Pos logit contributions
aring+0.017
ues+0.016
land+0.015
 appearance+0.014
 rather+0.014
Neg logit contributions
 two-0.015
Y-0.015
 excitement-0.015
 himself-0.013
 p-0.013
Token quick
Token ID2068
Feature activation+1.51e+00

Pos logit contributions
 two+0.023
Y+0.022
 excitement+0.022
 himself+0.019
 mixture+0.018
Neg logit contributions
aring-0.037
ues-0.036
land-0.034
 appearance-0.033
 rather-0.033
Token and
Token ID290
Feature activation+9.45e-02

Pos logit contributions
ues+0.062
aring+0.058
 Spanish+0.055
 Portuguese+0.054
 appearance+0.053
Neg logit contributions
 two-0.047
 excitement-0.041
 himself-0.040
 p-0.038
 his-0.038
Token flew
Token ID13112
Feature activation+1.75e+00

Pos logit contributions
aring+0.004
ues+0.004
land+0.003
 appearance+0.003
 rather+0.003
Neg logit contributions
 two-0.003
Y-0.003
 excitement-0.003
 himself-0.002
 p-0.002
Token and
Token ID290
Feature activation-2.42e-01

Pos logit contributions
 two+0.032
 excitement+0.031
Y+0.031
 himself+0.028
 p+0.027
Neg logit contributions
aring-0.027
ues-0.027
 appearance-0.023
land-0.023
 rather-0.023
Token thanked
Token ID26280
Feature activation-9.82e-02

Pos logit contributions
Y+0.006
 two+0.006
 excitement+0.006
 mixture+0.005
 himself+0.005
Neg logit contributions
aring-0.010
ues-0.009
 where-0.009
 rather-0.009
land-0.009
Token the
Token ID262
Feature activation-6.35e-02

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 himself+0.002
 mixture+0.002
Neg logit contributions
aring-0.005
ues-0.004
land-0.004
 appearance-0.004
 rather-0.004
Token woman
Token ID2415
Feature activation+6.77e-01

Pos logit contributions
Y+0.002
 two+0.002
 excitement+0.002
 himself+0.001
Carl+0.001
Neg logit contributions
aring-0.003
ues-0.002
land-0.002
 appearance-0.002
 where-0.002
Token prof
Token ID1534
Feature activation+2.53e+00

Pos logit contributions
ues+0.025
aring+0.024
 appearance+0.023
ems+0.022
 rather+0.021
Neg logit contributions
 two-0.023
 excitement-0.021
 himself-0.019
 his-0.018
 p-0.018
Tokenuse
Token ID1904
Feature activation+4.75e+00

Pos logit contributions
aring+0.056
ues+0.053
land+0.045
 appearance+0.045
 rather+0.043
Neg logit contributions
 two-0.120
Y-0.117
 excitement-0.116
 himself-0.107
 p-0.105
Tokenly
Token ID306
Feature activation+2.49e+00

Pos logit contributions
aring+0.171
ues+0.166
ute+0.156
 Portuguese+0.150
land+0.150
Neg logit contributions
 two-0.164
 excitement-0.154
Y-0.146
 himself-0.140
 his-0.130
Token.
Token ID13
Feature activation-4.77e-01

Pos logit contributions
aring+0.084
ues+0.082
 appearance+0.074
land+0.074
 rather+0.073
Neg logit contributions
 two-0.089
 excitement-0.085
Y-0.083
 himself-0.076
 p-0.073
Token The
Token ID383
Feature activation+3.79e-01

Pos logit contributions
 two+0.015
Y+0.014
 excitement+0.014
 himself+0.012
 p+0.011
Neg logit contributions
aring-0.019
ues-0.019
 appearance-0.018
land-0.018
 tale-0.017
Token woman
Token ID2415
Feature activation+5.05e-01

Pos logit contributions
aring+0.014
ues+0.014
land+0.013
 appearance+0.013
 rather+0.013
Neg logit contributions
 two-0.012
Y-0.012
 excitement-0.011
 himself-0.010
 p-0.010
Token smiled
Token ID13541
Feature activation-4.81e-01

Pos logit contributions
aring+0.022
ues+0.022
land+0.020
 rather+0.020
 where+0.020
Neg logit contributions
Y-0.012
 two-0.012
 excitement-0.012
 himself-0.010
 mixture-0.010
Token.
Token ID13
Feature activation+1.58e-01

Pos logit contributions
ues+0.022
aring+0.022
 appearance+0.020
 Him+0.019
ems+0.019
Neg logit contributions
 two-0.017
 excitement-0.015
Y-0.014
 himself-0.014
 p-0.013
Token They
Token ID1119
Feature activation-3.01e-01

Pos logit contributions
aring+0.007
ues+0.006
 appearance+0.006
land+0.006
 tale+0.006
Neg logit contributions
 two-0.005
 excitement-0.004
Y-0.004
 himself-0.004
 p-0.004
Token were
Token ID547
Feature activation-2.92e-01

Pos logit contributions
 two+0.008
Y+0.008
 excitement+0.008
 himself+0.007
 mixture+0.006
Neg logit contributions
aring-0.013
ues-0.012
land-0.011
 appearance-0.011
 rather-0.011
Token called
Token ID1444
Feature activation+2.11e+00

Pos logit contributions
 two+0.009
 excitement+0.009
Y+0.008
 himself+0.007
 p+0.007
Neg logit contributions
aring-0.012
ues-0.012
land-0.011
 appearance-0.010
ems-0.010
Token ost
Token ID23619
Feature activation-9.30e-01

Pos logit contributions
aring+0.081
ues+0.080
 rather+0.071
 appearance+0.070
ems+0.068
Neg logit contributions
 two-0.075
 excitement-0.066
Y-0.059
 p-0.058
 himself-0.057
Tokenric
Token ID1173
Feature activation+4.74e+00

Pos logit contributions
 two+0.045
 excitement+0.040
Y+0.039
 himself+0.039
 p+0.038
Neg logit contributions
ues-0.019
aring-0.018
 rather-0.016
 Spanish-0.014
land-0.014
Tokenhes
Token ID956
Feature activation-6.55e-01

Pos logit contributions
aring+0.171
ues+0.166
 appearance+0.154
land+0.151
 rather+0.149
Neg logit contributions
 two-0.160
Y-0.149
 excitement-0.148
 himself-0.136
 p-0.127
Token.
Token ID13
Feature activation+1.71e-01

Pos logit contributions
 two+0.022
 excitement+0.020
Y+0.019
 himself+0.018
 p+0.017
Neg logit contributions
aring-0.025
ues-0.024
 appearance-0.022
land-0.021
 rather-0.021
Token Lily
Token ID20037
Feature activation-1.27e-02

Pos logit contributions
aring+0.007
ues+0.007
 appearance+0.007
land+0.006
 rather+0.006
Neg logit contributions
 two-0.005
Y-0.005
 excitement-0.005
 himself-0.004
 p-0.004
Token and
Token ID290
Feature activation+5.85e-02

Pos logit contributions
 two+0.000
Y+0.000
 excitement+0.000
 p+0.000
 himself+0.000
Neg logit contributions
aring-0.001
ues-0.001
 appearance-0.000
land-0.000
 Him-0.000
Token Ben
Token ID3932
Feature activation+1.23e+00

Pos logit contributions
 where+0.002
 why+0.002
 appearance+0.002
aring+0.002
 Him+0.002
Neg logit contributions
Y-0.002
 excitement-0.002
 mixture-0.002
 two-0.002
 entire-0.002
Token and
Token ID290
Feature activation-6.64e-03

Pos logit contributions
ues+0.023
aring+0.022
 appearance+0.020
 Portuguese+0.020
ems+0.019
Neg logit contributions
 two-0.016
 excitement-0.014
 himself-0.014
 his-0.013
 p-0.013
Token said
Token ID531
Feature activation+6.28e-01

Pos logit contributions
Y+0.000
 two+0.000
 excitement+0.000
 himself+0.000
 mixture+0.000
Neg logit contributions
aring-0.000
ues-0.000
 where-0.000
land-0.000
 rather-0.000
Token,
Token ID11
Feature activation+6.51e-01

Pos logit contributions
aring+0.024
ues+0.024
 appearance+0.022
ems+0.022
land+0.022
Neg logit contributions
 two-0.020
 excitement-0.019
 himself-0.018
Y-0.018
 his-0.017
Token "
Token ID366
Feature activation-1.74e-01

Pos logit contributions
ues+0.025
aring+0.025
 appearance+0.024
land+0.022
 tale+0.022
Neg logit contributions
 two-0.022
 excitement-0.019
Y-0.019
 himself-0.018
 his-0.018
TokenExc
Token ID40127
Feature activation+7.53e-01

Pos logit contributions
 excitement+0.005
 two+0.005
 p+0.004
Y+0.004
 himself+0.004
Neg logit contributions
aring-0.007
ues-0.006
land-0.006
 Spanish-0.006
 rather-0.006
Tokenuse
Token ID1904
Feature activation+4.61e+00

Pos logit contributions
ues+0.017
aring+0.017
land+0.014
 rather+0.014
 appearance+0.014
Neg logit contributions
Y-0.035
 two-0.035
 excitement-0.033
 himself-0.031
 p-0.031
Token me
Token ID502
Feature activation+8.21e-01

Pos logit contributions
aring+0.183
ues+0.183
 rather+0.161
land+0.157
 Spanish+0.157
Neg logit contributions
 two-0.147
Y-0.138
 excitement-0.133
 himself-0.122
 p-0.120
Token,
Token ID11
Feature activation+1.28e-01

Pos logit contributions
ues+0.030
aring+0.029
 appearance+0.027
land+0.027
 rather+0.026
Neg logit contributions
 two-0.028
 excitement-0.025
Y-0.024
 himself-0.023
 p-0.023
Token that
Token ID326
Feature activation+5.49e-01

Pos logit contributions
ues+0.004
aring+0.004
 appearance+0.004
land+0.004
 rather+0.003
Neg logit contributions
 two-0.005
Y-0.005
 excitement-0.005
 p-0.004
 himself-0.004
Token's
Token ID338
Feature activation+1.49e+00

Pos logit contributions
ues+0.019
aring+0.019
land+0.017
 appearance+0.016
 rather+0.016
Neg logit contributions
 two-0.020
Y-0.019
 excitement-0.019
 himself-0.017
 p-0.017
Token our
Token ID674
Feature activation+1.92e-01

Pos logit contributions
aring+0.069
ues+0.068
land+0.061
 appearance+0.060
ems+0.058
Neg logit contributions
 two-0.041
Y-0.035
 excitement-0.035
 himself-0.033
 his-0.032
Token.
Token ID13
Feature activation+2.65e-01

Pos logit contributions
aring+0.015
ues+0.015
 appearance+0.013
land+0.013
 rather+0.013
Neg logit contributions
 two-0.013
 excitement-0.012
Y-0.012
 himself-0.011
 p-0.011
Token
Token ID198
Feature activation+2.16e-01

Pos logit contributions
aring+0.013
ues+0.013
 appearance+0.012
land+0.012
 rather+0.012
Neg logit contributions
 two-0.006
Y-0.005
 excitement-0.005
 himself-0.004
 his-0.004
Token
Token ID198
Feature activation+1.71e-01

Pos logit contributions
aring+0.009
 appearance+0.009
ues+0.009
ems+0.009
land+0.009
Neg logit contributions
 two-0.006
Y-0.005
 excitement-0.005
 himself-0.005
 his-0.004
Token"
Token ID1
Feature activation-1.15e+00

Pos logit contributions
aring+0.006
ues+0.006
land+0.005
 rather+0.005
 appearance+0.005
Neg logit contributions
 two-0.006
 excitement-0.006
Y-0.006
 himself-0.005
 mixture-0.005
TokenExc
Token ID40127
Feature activation+8.59e-01

Pos logit contributions
 excitement+0.039
 two+0.034
 himself+0.033
Y+0.032
 mixture+0.032
Neg logit contributions
aring-0.039
ues-0.038
 Spanish-0.036
land-0.036
 English-0.035
Tokenuse
Token ID1904
Feature activation+4.61e+00

Pos logit contributions
ues+0.022
aring+0.021
 rather+0.018
 about+0.018
land+0.018
Neg logit contributions
 two-0.038
Y-0.038
 excitement-0.035
 p-0.034
 himself-0.033
Token me
Token ID502
Feature activation+9.19e-01

Pos logit contributions
aring+0.185
ues+0.183
 rather+0.166
ute+0.158
 Spanish+0.156
Neg logit contributions
 two-0.153
Y-0.136
 excitement-0.130
 p-0.123
 himself-0.120
Token,
Token ID11
Feature activation+4.14e-01

Pos logit contributions
ues+0.034
aring+0.033
 appearance+0.032
 rather+0.031
land+0.031
Neg logit contributions
 two-0.031
 excitement-0.026
Y-0.025
 himself-0.025
 p-0.024
Token sir
Token ID15967
Feature activation+8.57e-01

Pos logit contributions
ues+0.013
aring+0.013
 appearance+0.011
land+0.011
 rather+0.011
Neg logit contributions
 two-0.017
Y-0.015
 excitement-0.015
 p-0.014
 himself-0.014
Token,
Token ID11
Feature activation+4.37e-01

Pos logit contributions
ues+0.032
aring+0.030
 appearance+0.030
 rather+0.028
land+0.027
Neg logit contributions
 two-0.029
Y-0.024
 himself-0.024
 p-0.023
 excitement-0.023
Token can
Token ID460
Feature activation+2.29e-01

Pos logit contributions
ues+0.015
aring+0.014
 appearance+0.013
land+0.013
.'"+0.012
Neg logit contributions
 two-0.018
Y-0.015
 excitement-0.015
 p-0.015
 his-0.014
Token and
Token ID290
Feature activation+4.78e-01

Pos logit contributions
ues+0.026
aring+0.026
 appearance+0.023
land+0.023
 Spanish+0.022
Neg logit contributions
 two-0.021
 excitement-0.017
 himself-0.017
Y-0.017
 his-0.017
Token not
Token ID407
Feature activation+3.72e-01

Pos logit contributions
ues+0.017
aring+0.017
land+0.016
.'"+0.014
 appearance+0.014
Neg logit contributions
 two-0.017
 excitement-0.016
Y-0.015
 himself-0.015
 his-0.014
Token take
Token ID1011
Feature activation+4.57e-01

Pos logit contributions
ues+0.015
aring+0.014
land+0.013
 Portuguese+0.012
ems+0.012
Neg logit contributions
 two-0.012
 excitement-0.011
Y-0.010
 himself-0.010
 p-0.010
Token things
Token ID1243
Feature activation+4.86e-01

Pos logit contributions
aring+0.020
ues+0.020
land+0.018
ems+0.018
ute+0.017
Neg logit contributions
 two-0.013
 excitement-0.012
 himself-0.011
Y-0.010
 his-0.010
Token for
Token ID329
Feature activation+9.97e-01

Pos logit contributions
ues+0.018
aring+0.017
 appearance+0.016
 Spanish+0.016
 Portuguese+0.016
Neg logit contributions
 two-0.016
 excitement-0.014
 himself-0.014
 his-0.013
Y-0.013
Token granted
Token ID7520
Feature activation+4.60e+00

Pos logit contributions
aring+0.041
ues+0.039
land+0.036
 appearance+0.034
ems+0.034
Neg logit contributions
 two-0.032
 excitement-0.029
Y-0.028
 himself-0.027
 his-0.025
Token.
Token ID13
Feature activation+6.29e-01

Pos logit contributions
aring+0.207
ues+0.200
 appearance+0.181
 Portuguese+0.180
land+0.176
Neg logit contributions
 two-0.122
 excitement-0.100
 himself-0.098
 his-0.094
Y-0.091
Token
Token ID198
Feature activation+7.20e-01

Pos logit contributions
aring+0.025
ues+0.025
 appearance+0.023
land+0.023
 rather+0.022
Neg logit contributions
 two-0.019
Y-0.018
 excitement-0.018
 himself-0.016
 p-0.015
Token
Token ID198
Feature activation+6.09e-01

Pos logit contributions
aring+0.028
ues+0.028
 appearance+0.027
land+0.025
 rather+0.025
Neg logit contributions
 two-0.022
Y-0.021
 excitement-0.021
 himself-0.019
 his-0.018
TokenThe
Token ID464
Feature activation+8.50e-01

Pos logit contributions
aring+0.024
ues+0.024
land+0.022
 appearance+0.021
 rather+0.021
Neg logit contributions
 two-0.018
Y-0.017
 excitement-0.017
 himself-0.015
 mixture-0.014
Token end
Token ID886
Feature activation+2.08e-01

Pos logit contributions
aring+0.031
ues+0.030
land+0.028
 appearance+0.028
 rather+0.028
Neg logit contributions
Y-0.026
 excitement-0.025
 two-0.025
 himself-0.023
 mixture-0.022
Token.
Token ID13
Feature activation+2.33e-01

Pos logit contributions
ues+0.012
aring+0.012
 appearance+0.011
 rather+0.011
ems+0.011
Neg logit contributions
 two-0.008
 excitement-0.007
 himself-0.007
 his-0.007
Y-0.007
Token
Token ID198
Feature activation+2.54e-01

Pos logit contributions
aring+0.009
ues+0.009
 appearance+0.009
land+0.009
 rather+0.008
Neg logit contributions
 two-0.007
Y-0.007
 excitement-0.006
 himself-0.006
 p-0.005
Token
Token ID198
Feature activation+1.76e-01

Pos logit contributions
aring+0.011
ues+0.010
 appearance+0.010
land+0.010
ems+0.009
Neg logit contributions
 two-0.007
Y-0.007
 excitement-0.007
 himself-0.006
 his-0.006
Token"
Token ID1
Feature activation-1.21e+00

Pos logit contributions
aring+0.005
ues+0.005
 Spanish+0.004
 rather+0.004
land+0.004
Neg logit contributions
 excitement-0.007
 two-0.007
Y-0.006
 mixture-0.006
 himself-0.006
TokenExc
Token ID40127
Feature activation+8.61e-01

Pos logit contributions
 excitement+0.042
 mixture+0.037
 himself+0.037
 two+0.036
inge+0.035
Neg logit contributions
 Spanish-0.036
 English-0.036
land-0.035
aring-0.035
ems-0.033
Tokenuse
Token ID1904
Feature activation+4.59e+00

Pos logit contributions
aring+0.020
ues+0.019
 rather+0.016
land+0.015
 appearance+0.015
Neg logit contributions
 two-0.040
Y-0.040
 excitement-0.038
 p-0.036
 himself-0.035
Token me
Token ID502
Feature activation+8.68e-01

Pos logit contributions
aring+0.178
ues+0.173
 rather+0.156
land+0.151
 appearance+0.150
Neg logit contributions
 two-0.153
Y-0.142
 excitement-0.138
 himself-0.126
 p-0.124
Token,
Token ID11
Feature activation+3.67e-01

Pos logit contributions
ues+0.031
aring+0.031
 appearance+0.029
 rather+0.028
land+0.028
Neg logit contributions
 two-0.030
 excitement-0.026
Y-0.026
 himself-0.024
 p-0.024
Token m
Token ID285
Feature activation-2.94e+00

Pos logit contributions
aring+0.011
ues+0.011
land+0.010
 appearance+0.009
 rather+0.009
Neg logit contributions
 two-0.014
Y-0.014
 excitement-0.014
 himself-0.013
 p-0.012
Tokenister
Token ID1694
Feature activation+4.24e-01

Pos logit contributions
 two+0.112
Y+0.109
 excitement+0.108
 himself+0.099
 p+0.095
Neg logit contributions
aring-0.090
ues-0.088
land-0.079
 appearance-0.079
 rather-0.078
Token,"
Token ID553
Feature activation-1.14e+00

Pos logit contributions
ues+0.015
aring+0.014
 appearance+0.014
 rather+0.013
land+0.013
Neg logit contributions
 two-0.015
Y-0.013
 excitement-0.013
 himself-0.012
 p-0.012
Token ostr
Token ID42387
Feature activation-1.72e+00

Pos logit contributions
 two+0.004
Y+0.004
 excitement+0.004
 himself+0.003
 p+0.003
Neg logit contributions
aring-0.005
ues-0.005
land-0.004
 appearance-0.004
 rather-0.004
Tokenich
Token ID488
Feature activation+1.01e+00

Pos logit contributions
 two+0.080
Y+0.078
 excitement+0.077
 himself+0.072
 p+0.070
Neg logit contributions
aring-0.038
ues-0.037
land-0.031
 appearance-0.030
 rather-0.030
Token pen
Token ID3112
Feature activation+2.02e+00

Pos logit contributions
ues+0.040
aring+0.038
 appearance+0.035
ems+0.035
 Spanish+0.035
Neg logit contributions
 two-0.031
 p-0.028
 his-0.026
 front-0.026
 himself-0.025
Token.
Token ID13
Feature activation+1.71e-01

Pos logit contributions
aring+0.083
ues+0.078
 appearance+0.078
 unnecessary+0.071
itionally+0.070
Neg logit contributions
 two-0.065
 himself-0.052
 p-0.052
 excitement-0.051
 his-0.050
Token Ost
Token ID38919
Feature activation-3.39e-01

Pos logit contributions
aring+0.007
ues+0.007
 appearance+0.006
land+0.006
ems+0.006
Neg logit contributions
 two-0.005
Y-0.005
 excitement-0.005
 himself-0.004
 his-0.004
Tokenric
Token ID1173
Feature activation+4.57e+00

Pos logit contributions
 two+0.019
Y+0.018
 excitement+0.018
 himself+0.017
 mixture+0.017
Neg logit contributions
aring-0.005
ues-0.004
land-0.003
 appearance-0.003
 where-0.003
Tokenhes
Token ID956
Feature activation-7.87e-01

Pos logit contributions
aring+0.168
ues+0.164
 appearance+0.156
land+0.151
 rather+0.149
Neg logit contributions
 two-0.152
Y-0.139
 excitement-0.137
 himself-0.128
 p-0.119
Token are
Token ID389
Feature activation+4.32e-01

Pos logit contributions
 two+0.026
 himself+0.022
 his+0.021
 p+0.021
 excitement+0.021
Neg logit contributions
aring-0.031
ues-0.031
 appearance-0.028
 rather-0.028
land-0.027
Token big
Token ID1263
Feature activation+5.60e-01

Pos logit contributions
aring+0.019
ues+0.019
land+0.017
 appearance+0.017
ems+0.017
Neg logit contributions
 two-0.013
 excitement-0.011
 p-0.010
Y-0.010
 himself-0.010
Token birds
Token ID10087
Feature activation+6.84e-01

Pos logit contributions
aring+0.021
ues+0.021
land+0.018
ute+0.018
 rather+0.018
Neg logit contributions
 two-0.019
 excitement-0.017
 p-0.016
Y-0.016
 himself-0.015
Token with
Token ID351
Feature activation-5.76e-01

Pos logit contributions
ues+0.031
 appearance+0.028
aring+0.028
ems+0.027
 Spanish+0.027
Neg logit contributions
 two-0.019
 p-0.015
 excitement-0.015
 himself-0.014
 his-0.013
Token.
Token ID13
Feature activation+4.19e-01

Pos logit contributions
aring+0.046
ues+0.042
 appearance+0.039
 Him+0.039
 Portuguese+0.039
Neg logit contributions
 two-0.032
 excitement-0.027
 p-0.026
Y-0.026
 himself-0.026
Token
Token ID198
Feature activation+2.53e-01

Pos logit contributions
aring+0.016
ues+0.016
 appearance+0.014
land+0.014
 rather+0.014
Neg logit contributions
 two-0.013
Y-0.013
 excitement-0.013
 himself-0.011
 p-0.011
Token
Token ID198
Feature activation+2.04e-01

Pos logit contributions
 appearance+0.012
aring+0.012
ues+0.011
ems+0.011
land+0.011
Neg logit contributions
 two-0.006
Y-0.006
 excitement-0.005
 himself-0.005
Mike-0.005
Token"
Token ID1
Feature activation-1.12e+00

Pos logit contributions
aring+0.007
ues+0.007
land+0.006
 appearance+0.006
 rather+0.006
Neg logit contributions
 two-0.007
Y-0.007
 excitement-0.007
 himself-0.006
 mixture-0.006
TokenExc
Token ID40127
Feature activation+1.02e+00

Pos logit contributions
 excitement+0.038
 himself+0.034
 two+0.032
 anticipation+0.031
 delight+0.030
Neg logit contributions
 Spanish-0.035
ues-0.034
aring-0.034
land-0.034
 English-0.034
Tokenuse
Token ID1904
Feature activation+4.57e+00

Pos logit contributions
aring+0.025
ues+0.025
 rather+0.021
land+0.021
 about+0.020
Neg logit contributions
Y-0.046
 two-0.046
 excitement-0.042
 p-0.041
 front-0.039
Token me
Token ID502
Feature activation+1.09e+00

Pos logit contributions
aring+0.188
ues+0.184
 rather+0.168
land+0.160
ute+0.159
Neg logit contributions
 two-0.145
Y-0.132
 excitement-0.126
 p-0.118
 himself-0.113
Token,
Token ID11
Feature activation+4.90e-01

Pos logit contributions
ues+0.040
aring+0.039
 appearance+0.037
land+0.036
 rather+0.036
Neg logit contributions
 two-0.037
 excitement-0.032
Y-0.031
 p-0.030
 himself-0.030
Token children
Token ID1751
Feature activation-1.40e+00

Pos logit contributions
aring+0.017
ues+0.017
land+0.015
 appearance+0.015
 rather+0.015
Neg logit contributions
 two-0.017
Y-0.016
 excitement-0.016
 himself-0.014
 p-0.014
Token.
Token ID13
Feature activation+4.07e-01

Pos logit contributions
 two+0.044
 excitement+0.037
 p+0.037
Y+0.036
 himself+0.035
Neg logit contributions
ues-0.055
aring-0.053
 appearance-0.050
 Portuguese-0.049
land-0.049
Token It
Token ID632
Feature activation-2.05e-01

Pos logit contributions
ues+0.013
aring+0.013
land+0.012
 appearance+0.012
 rather+0.011
Neg logit contributions
 two-0.015
Y-0.015
 excitement-0.014
 himself-0.013
 ah-0.013
Token to
Token ID284
Feature activation+2.55e-01

Pos logit contributions
ues+0.015
aring+0.014
land+0.013
 Portuguese+0.013
 Spanish+0.012
Neg logit contributions
 two-0.010
 excitement-0.009
Y-0.009
 himself-0.009
 p-0.009
Token take
Token ID1011
Feature activation+4.88e-01

Pos logit contributions
aring+0.010
ues+0.010
land+0.009
 appearance+0.009
 Spanish+0.009
Neg logit contributions
 two-0.008
Y-0.007
 excitement-0.007
 himself-0.006
 p-0.006
Token your
Token ID534
Feature activation+1.12e+00

Pos logit contributions
aring+0.021
ues+0.021
land+0.019
ems+0.018
.'"+0.018
Neg logit contributions
 two-0.014
 excitement-0.013
Y-0.012
 himself-0.012
 his-0.011
Token health
Token ID1535
Feature activation+7.42e-01

Pos logit contributions
aring+0.047
ues+0.047
land+0.042
 rather+0.041
 where+0.041
Neg logit contributions
 two-0.032
 excitement-0.031
Y-0.029
 himself-0.026
 p-0.025
Token for
Token ID329
Feature activation+8.39e-01

Pos logit contributions
ues+0.030
aring+0.029
 appearance+0.026
 Portuguese+0.026
land+0.026
Neg logit contributions
 two-0.023
 excitement-0.020
 himself-0.020
Y-0.019
 his-0.018
Token granted
Token ID7520
Feature activation+4.55e+00

Pos logit contributions
aring+0.037
ues+0.036
land+0.033
ems+0.032
 appearance+0.031
Neg logit contributions
 two-0.024
 excitement-0.022
Y-0.021
 himself-0.021
 his-0.019
Token.
Token ID13
Feature activation+2.40e-01

Pos logit contributions
ues+0.188
aring+0.188
 Portuguese+0.165
 appearance+0.164
land+0.161
Neg logit contributions
 two-0.134
 himself-0.116
 excitement-0.114
 his-0.113
Y-0.103
Token<|endoftext|>
Token ID50256
Feature activation+1.33e+00

Pos logit contributions
ues+0.009
aring+0.009
land+0.008
 appearance+0.008
ems+0.008
Neg logit contributions
 two-0.008
Y-0.007
 excitement-0.007
 himself-0.007
 his-0.006
TokenOnce
Token ID7454
Feature activation-1.14e+00

Pos logit contributions
aring+0.052
ues+0.050
land+0.047
 appearance+0.046
 rather+0.045
Neg logit contributions
 two-0.039
Y-0.038
 excitement-0.038
 himself-0.033
 mixture-0.032
Token upon
Token ID2402
Feature activation+2.71e-01

Pos logit contributions
 two+0.033
 excitement+0.029
Y+0.027
 his+0.026
 himself+0.026
Neg logit contributions
aring-0.048
ues-0.047
land-0.045
 appearance-0.044
 rather-0.043
Token a
Token ID257
Feature activation-4.51e-01

Pos logit contributions
aring+0.011
ues+0.011
land+0.010
 where+0.010
 rather+0.010
Neg logit contributions
Y-0.007
 excitement-0.007
 two-0.007
 mixture-0.006
 ah-0.006
Tokenâ
Token ID22940
Feature activation+1.48e+00

Pos logit contributions
 excitement+0.005
 two+0.005
Y+0.004
 mixture+0.004
 ah+0.004
Neg logit contributions
aring-0.005
ues-0.004
 Spanish-0.004
 rather-0.004
land-0.004
Token
Token ID26391
Feature activation-2.16e-01

Pos logit contributions
aring+0.053
ues+0.047
land+0.045
.'"+0.044
 appearance+0.043
Neg logit contributions
 two-0.052
 excitement-0.051
Y-0.049
 ah-0.044
 mar-0.042
Token
Token ID129
Feature activation-6.14e-01

Pos logit contributions
 two+0.007
Y+0.007
 excitement+0.007
 himself+0.006
 p+0.006
Neg logit contributions
ues-0.008
aring-0.007
 appearance-0.007
 Spanish-0.007
land-0.007
Token
Token ID241
Feature activation-2.68e-01

Pos logit contributions
 two+0.020
Y+0.020
 excitement+0.020
 himself+0.018
 p+0.017
Neg logit contributions
aring-0.022
ues-0.021
land-0.019
 appearance-0.019
 rather-0.019
TokenExc
Token ID40127
Feature activation+6.64e-01

Pos logit contributions
 two+0.007
 excitement+0.007
Y+0.007
 himself+0.006
 mixture+0.006
Neg logit contributions
aring-0.011
ues-0.011
land-0.010
 rather-0.010
 appearance-0.010
Tokenuse
Token ID1904
Feature activation+4.55e+00

Pos logit contributions
aring+0.019
ues+0.019
land+0.017
 appearance+0.016
 rather+0.016
Neg logit contributions
 two-0.027
Y-0.026
 excitement-0.026
 himself-0.023
 p-0.023
Token me
Token ID502
Feature activation+7.23e-01

Pos logit contributions
ues+0.181
aring+0.179
 rather+0.157
land+0.155
 appearance+0.155
Neg logit contributions
 two-0.148
Y-0.138
 excitement-0.130
 himself-0.123
 p-0.122
Token,
Token ID11
Feature activation-5.06e-02

Pos logit contributions
ues+0.026
aring+0.025
land+0.024
 appearance+0.024
 rather+0.023
Neg logit contributions
 two-0.025
 excitement-0.022
Y-0.021
 himself-0.021
 p-0.020
Token Fox
Token ID5426
Feature activation+1.83e+00

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 himself+0.001
 p+0.001
Neg logit contributions
ues-0.002
aring-0.002
 appearance-0.002
land-0.002
 rather-0.002
Token,
Token ID11
Feature activation-2.21e-02

Pos logit contributions
ues+0.069
aring+0.068
 appearance+0.062
land+0.061
 rather+0.061
Neg logit contributions
 two-0.060
 excitement-0.052
Y-0.052
 p-0.049
 himself-0.049
Tokenâ
Token ID22940
Feature activation+1.32e+00

Pos logit contributions
 two+0.001
 excitement+0.001
Y+0.001
 himself+0.001
 p+0.001
Neg logit contributions
ues-0.001
aring-0.001
 appearance-0.001
land-0.001
ute-0.001
Token about
Token ID546
Feature activation-2.32e-02

Pos logit contributions
aring+0.041
ues+0.039
itionally+0.037
 appearance+0.037
ems+0.035
Neg logit contributions
 two-0.048
 himself-0.043
 his-0.043
 excitement-0.042
 other-0.038
Token how
Token ID703
Feature activation-1.02e+00

Pos logit contributions
 two+0.001
 himself+0.001
 excitement+0.001
 his+0.001
Y+0.001
Neg logit contributions
ues-0.001
aring-0.001
land-0.001
ute-0.001
ems-0.001
Token your
Token ID534
Feature activation+1.21e+00

Pos logit contributions
 two+0.036
 excitement+0.031
 himself+0.030
Y+0.030
 his+0.030
Neg logit contributions
ues-0.039
aring-0.036
land-0.035
 appearance-0.034
ems-0.033
Token actions
Token ID4028
Feature activation+2.49e+00

Pos logit contributions
aring+0.054
ues+0.054
land+0.048
 appearance+0.045
ems+0.045
Neg logit contributions
 two-0.035
 excitement-0.033
Y-0.029
 himself-0.027
 p-0.025
Token might
Token ID1244
Feature activation+1.33e+00

Pos logit contributions
ues+0.097
aring+0.091
land+0.083
 Portuguese+0.081
 appearance+0.081
Neg logit contributions
 two-0.085
 excitement-0.074
 himself-0.072
Y-0.069
 his-0.068
Token affect
Token ID2689
Feature activation+4.55e+00

Pos logit contributions
ues+0.061
aring+0.055
 Portuguese+0.054
land+0.052
 Him+0.052
Neg logit contributions
 two-0.037
 himself-0.031
 excitement-0.030
 whole-0.028
 p-0.028
Token others
Token ID1854
Feature activation+9.01e-01

Pos logit contributions
ues+0.224
aring+0.212
land+0.199
 Portuguese+0.193
ems+0.187
Neg logit contributions
 two-0.120
 himself-0.108
 his-0.097
 excitement-0.096
 other-0.089
Token.
Token ID13
Feature activation+2.30e-01

Pos logit contributions
ues+0.046
aring+0.041
ree+0.040
 Portuguese+0.039
land+0.038
Neg logit contributions
 two-0.023
 himself-0.018
 his-0.016
 excitement-0.016
 other-0.015
Token<|endoftext|>
Token ID50256
Feature activation+1.28e+00

Pos logit contributions
ues+0.010
aring+0.010
 appearance+0.009
land+0.009
ems+0.009
Neg logit contributions
 two-0.006
Y-0.006
 excitement-0.006
 himself-0.005
 his-0.005
TokenOnce
Token ID7454
Feature activation-1.27e+00

Pos logit contributions
aring+0.052
ues+0.051
 appearance+0.047
land+0.046
 rather+0.046
Neg logit contributions
 two-0.038
Y-0.036
 excitement-0.035
 himself-0.032
 p-0.030
Token upon
Token ID2402
Feature activation+2.34e-01

Pos logit contributions
 two+0.037
 excitement+0.033
Y+0.031
 himself+0.030
 his+0.030
Neg logit contributions
aring-0.053
ues-0.052
land-0.049
 appearance-0.049
 rather-0.047
Token someone
Token ID2130
Feature activation+1.67e+00

Pos logit contributions
aring+0.036
ues+0.035
land+0.033
 where+0.033
 appearance+0.032
Neg logit contributions
Y-0.022
 two-0.021
 excitement-0.020
 mixture-0.018
 himself-0.017
Token say
Token ID910
Feature activation+4.56e-01

Pos logit contributions
ues+0.061
aring+0.058
ems+0.052
 appearance+0.052
.'"+0.052
Neg logit contributions
 two-0.055
 excitement-0.054
 p-0.049
 himself-0.048
Y-0.048
Token,
Token ID11
Feature activation+2.02e-01

Pos logit contributions
ues+0.018
aring+0.018
 appearance+0.016
ems+0.016
 rather+0.016
Neg logit contributions
 two-0.015
 excitement-0.013
 his-0.012
 himself-0.012
Y-0.012
Token "
Token ID366
Feature activation-5.06e-01

Pos logit contributions
ues+0.008
aring+0.008
 appearance+0.007
land+0.007
ems+0.007
Neg logit contributions
 two-0.007
 excitement-0.006
Y-0.006
 himself-0.006
 p-0.006
TokenExc
Token ID40127
Feature activation+8.15e-01

Pos logit contributions
 excitement+0.014
 two+0.014
 himself+0.013
 mixture+0.012
 mar+0.012
Neg logit contributions
aring-0.020
land-0.019
ues-0.018
 Spanish-0.018
ems-0.018
Tokenuse
Token ID1904
Feature activation+4.55e+00

Pos logit contributions
aring+0.018
ues+0.018
land+0.014
 rather+0.014
 where+0.014
Neg logit contributions
 two-0.038
Y-0.038
 excitement-0.036
 p-0.034
 himself-0.034
Token me
Token ID502
Feature activation+8.12e-01

Pos logit contributions
aring+0.183
ues+0.180
 rather+0.162
land+0.157
 appearance+0.154
Neg logit contributions
 two-0.144
Y-0.134
 excitement-0.130
 p-0.117
 himself-0.114
Token,
Token ID11
Feature activation+5.81e-02

Pos logit contributions
ues+0.030
aring+0.029
 appearance+0.027
land+0.027
 rather+0.026
Neg logit contributions
 two-0.028
 excitement-0.024
Y-0.023
 himself-0.022
 p-0.022
Token little
Token ID1310
Feature activation+1.02e-02

Pos logit contributions
ues+0.002
aring+0.002
 appearance+0.002
land+0.002
 rather+0.002
Neg logit contributions
 two-0.002
Y-0.002
 excitement-0.002
 p-0.002
 himself-0.002
Token girl
Token ID2576
Feature activation-6.05e-01

Pos logit contributions
aring+0.000
ues+0.000
land+0.000
 appearance+0.000
 rather+0.000
Neg logit contributions
 two-0.000
Y-0.000
 excitement-0.000
 himself-0.000
 mixture-0.000
Token.
Token ID13
Feature activation-9.04e-02

Pos logit contributions
 two+0.018
Y+0.016
 excitement+0.016
 p+0.015
 himself+0.014
Neg logit contributions
ues-0.024
aring-0.024
 appearance-0.022
land-0.022
 rather-0.022
Token her
Token ID607
Feature activation-1.17e+00

Pos logit contributions
aring+0.037
ues+0.036
land+0.033
 appearance+0.032
 rather+0.031
Neg logit contributions
 two-0.032
 excitement-0.030
Y-0.029
 himself-0.026
 p-0.025
Token mom
Token ID1995
Feature activation-1.22e+00

Pos logit contributions
 two+0.033
 excitement+0.031
Y+0.028
 p+0.026
 himself+0.026
Neg logit contributions
aring-0.052
ues-0.051
land-0.045
ems-0.045
 appearance-0.044
Tokenmy
Token ID1820
Feature activation-2.33e-01

Pos logit contributions
 two+0.037
Y+0.032
 excitement+0.031
 p+0.030
 himself+0.028
Neg logit contributions
ues-0.051
aring-0.051
 Spanish-0.047
 appearance-0.047
 Portuguese-0.046
Token watched
Token ID7342
Feature activation-1.33e+00

Pos logit contributions
 two+0.007
Y+0.006
 excitement+0.006
 p+0.006
 himself+0.006
Neg logit contributions
aring-0.010
ues-0.010
 appearance-0.009
 Spanish-0.009
 Portuguese-0.008
Token in
Token ID287
Feature activation+2.24e-01

Pos logit contributions
 two+0.040
 excitement+0.038
Y+0.037
 himself+0.034
 p+0.032
Neg logit contributions
aring-0.055
ues-0.052
 appearance-0.048
land-0.048
 rather-0.047
Token amaz
Token ID40642
Feature activation-6.36e+00

Pos logit contributions
aring+0.010
ues+0.009
land+0.009
 appearance+0.009
 rather+0.008
Neg logit contributions
 two-0.006
 excitement-0.006
Y-0.006
 himself-0.005
 p-0.005
Tokenement
Token ID972
Feature activation-3.81e-01

Pos logit contributions
 two+0.162
 excitement+0.155
Y+0.154
 himself+0.134
 mixture+0.126
Neg logit contributions
aring-0.281
ues-0.267
 appearance-0.249
 rather-0.248
land-0.245
Token as
Token ID355
Feature activation-1.31e+00

Pos logit contributions
 two+0.013
 excitement+0.012
Y+0.011
 himself+0.011
 p+0.010
Neg logit contributions
aring-0.014
ues-0.013
 appearance-0.013
 rather-0.012
 Spanish-0.012
Token the
Token ID262
Feature activation-3.39e-01

Pos logit contributions
Y+0.037
 two+0.036
 excitement+0.035
 mixture+0.031
 himself+0.030
Neg logit contributions
aring-0.050
ues-0.050
 where-0.047
 rather-0.047
land-0.047
Token machine
Token ID4572
Feature activation+9.57e-01

Pos logit contributions
Y+0.009
 two+0.008
 excitement+0.008
 himself+0.008
Carl+0.007
Neg logit contributions
aring-0.014
ues-0.013
land-0.013
 appearance-0.012
 Spanish-0.012
Token flew
Token ID13112
Feature activation+1.57e+00

Pos logit contributions
aring+0.041
ues+0.040
 appearance+0.037
land+0.036
 rather+0.036
Neg logit contributions
 two-0.026
Y-0.024
 excitement-0.024
 himself-0.021
 p-0.020
Token
Token ID198
Feature activation-4.65e-01

Pos logit contributions
 two+0.005
Y+0.005
 excitement+0.005
 himself+0.005
 ah+0.004
Neg logit contributions
aring-0.006
ues-0.006
 appearance-0.006
 rather-0.006
land-0.006
Token
Token ID198
Feature activation-4.88e-01

Pos logit contributions
 two+0.014
Y+0.013
 excitement+0.013
 himself+0.012
 p+0.011
Neg logit contributions
aring-0.018
ues-0.018
 appearance-0.017
land-0.017
 rather-0.016
TokenSam
Token ID16305
Feature activation-7.78e-01

Pos logit contributions
 excitement+0.014
 two+0.014
Y+0.013
 mixture+0.012
 himself+0.011
Neg logit contributions
aring-0.020
ues-0.018
land-0.017
 Spanish-0.017
 rather-0.017
Token gasped
Token ID45236
Feature activation-1.06e+00

Pos logit contributions
 two+0.024
Y+0.023
 excitement+0.023
 himself+0.020
 p+0.019
Neg logit contributions
aring-0.030
ues-0.030
land-0.027
 appearance-0.027
 rather-0.027
Token in
Token ID287
Feature activation+1.90e-01

Pos logit contributions
 two+0.038
Y+0.038
 excitement+0.037
 himself+0.033
 mixture+0.033
Neg logit contributions
aring-0.035
ues-0.034
land-0.030
 appearance-0.030
 rather-0.030
Token amaz
Token ID40642
Feature activation-6.33e+00

Pos logit contributions
 where+0.009
 Spanish+0.008
aring+0.008
 why+0.008
 appearance+0.008
Neg logit contributions
Y-0.004
 mixture-0.004
 two-0.004
Carl-0.003
 delic-0.003
Tokenement
Token ID972
Feature activation-7.41e-01

Pos logit contributions
 two+0.180
Y+0.175
 excitement+0.173
 himself+0.150
 mixture+0.144
Neg logit contributions
aring-0.257
ues-0.251
land-0.232
 appearance-0.229
 rather-0.226
Token.
Token ID13
Feature activation-6.88e-01

Pos logit contributions
 two+0.025
 excitement+0.024
Y+0.024
 himself+0.021
 p+0.020
Neg logit contributions
aring-0.026
ues-0.026
 appearance-0.023
land-0.023
 rather-0.023
Token He
Token ID679
Feature activation-9.76e-01

Pos logit contributions
 two+0.018
Y+0.017
 excitement+0.017
 himself+0.015
 p+0.014
Neg logit contributions
aring-0.030
ues-0.029
 appearance-0.027
land-0.027
 rather-0.027
Token had
Token ID550
Feature activation+4.70e-01

Pos logit contributions
 two+0.023
 excitement+0.022
Y+0.022
 himself+0.018
 p+0.018
Neg logit contributions
aring-0.045
ues-0.044
land-0.041
 appearance-0.041
 rather-0.040
Token never
Token ID1239
Feature activation+5.74e-01

Pos logit contributions
aring+0.014
ues+0.014
land+0.013
 where+0.013
 rather+0.013
Neg logit contributions
Y-0.016
 excitement-0.016
 two-0.016
 mixture-0.015
 ah-0.014
Token egg
Token ID5935
Feature activation-7.11e-01

Pos logit contributions
 two+0.034
 excitement+0.031
Y+0.031
 p+0.027
 himself+0.027
Neg logit contributions
aring-0.049
ues-0.049
 appearance-0.044
land-0.044
 rather-0.043
Token!
Token ID0
Feature activation-2.02e-01

Pos logit contributions
 two+0.023
Y+0.021
 excitement+0.020
 p+0.019
 himself+0.019
Neg logit contributions
aring-0.027
ues-0.027
 appearance-0.025
 Spanish-0.025
land-0.024
Token He
Token ID679
Feature activation-8.58e-01

Pos logit contributions
 two+0.006
Y+0.006
 excitement+0.006
 himself+0.005
 p+0.005
Neg logit contributions
aring-0.008
ues-0.008
land-0.007
 appearance-0.007
 rather-0.007
Token watched
Token ID7342
Feature activation-1.31e+00

Pos logit contributions
 two+0.021
Y+0.020
 excitement+0.020
 himself+0.017
 p+0.017
Neg logit contributions
aring-0.038
ues-0.037
land-0.035
 appearance-0.035
 rather-0.034
Token in
Token ID287
Feature activation+4.06e-01

Pos logit contributions
Y+0.037
 two+0.037
 excitement+0.036
 himself+0.031
 mixture+0.030
Neg logit contributions
aring-0.052
ues-0.051
land-0.047
 appearance-0.046
 rather-0.046
Token amaz
Token ID40642
Feature activation-6.31e+00

Pos logit contributions
aring+0.017
ues+0.016
 where+0.016
 appearance+0.016
land+0.016
Neg logit contributions
Y-0.011
 two-0.010
 excitement-0.009
 mixture-0.008
 himself-0.008
Tokenement
Token ID972
Feature activation-4.94e-01

Pos logit contributions
 two+0.188
Y+0.185
 excitement+0.180
 himself+0.155
 mixture+0.151
Neg logit contributions
aring-0.245
ues-0.243
land-0.223
 appearance-0.218
 rather-0.216
Token as
Token ID355
Feature activation-1.17e+00

Pos logit contributions
 two+0.018
 excitement+0.017
Y+0.017
 himself+0.015
 p+0.015
Neg logit contributions
aring-0.017
ues-0.016
land-0.014
 appearance-0.014
 rather-0.014
Token the
Token ID262
Feature activation-2.40e-01

Pos logit contributions
Y+0.030
 two+0.027
 excitement+0.027
 mixture+0.025
Carl+0.022
Neg logit contributions
aring-0.048
ues-0.047
 where-0.046
 rather-0.045
land-0.044
Token egg
Token ID5935
Feature activation-2.87e-01

Pos logit contributions
Y+0.007
Carl+0.006
 mar+0.006
 himself+0.006
Mike+0.006
Neg logit contributions
land-0.008
 tale-0.008
aring-0.008
 where-0.008
 why-0.008
Token began
Token ID2540
Feature activation-3.78e-01

Pos logit contributions
 two+0.008
Y+0.008
 excitement+0.008
 himself+0.007
 mixture+0.006
Neg logit contributions
aring-0.012
ues-0.011
land-0.011
 appearance-0.010
 rather-0.010
Token air
Token ID1633
Feature activation-1.28e+00

Pos logit contributions
Y+0.003
 excitement+0.003
 two+0.003
 mixture+0.003
 himself+0.002
Neg logit contributions
aring-0.005
ues-0.005
 where-0.005
land-0.005
 rather-0.004
Token.
Token ID13
Feature activation-3.01e-01

Pos logit contributions
 two+0.040
 excitement+0.037
Y+0.035
 p+0.035
 himself+0.034
Neg logit contributions
aring-0.051
ues-0.049
 Spanish-0.045
 appearance-0.045
 rather-0.043
Token He
Token ID679
Feature activation-1.08e+00

Pos logit contributions
 two+0.009
Y+0.008
 excitement+0.008
 himself+0.007
 p+0.007
Neg logit contributions
aring-0.012
ues-0.012
 appearance-0.011
land-0.011
 rather-0.011
Token watched
Token ID7342
Feature activation-1.69e+00

Pos logit contributions
 two+0.030
 excitement+0.027
Y+0.027
 p+0.024
 himself+0.023
Neg logit contributions
aring-0.047
ues-0.045
land-0.042
 appearance-0.042
 rather-0.040
Token with
Token ID351
Feature activation-7.84e-01

Pos logit contributions
 two+0.045
Y+0.044
 excitement+0.043
 himself+0.037
 mixture+0.035
Neg logit contributions
aring-0.071
ues-0.070
land-0.064
 appearance-0.064
 rather-0.063
Token amaz
Token ID40642
Feature activation-6.22e+00

Pos logit contributions
Y+0.020
 two+0.018
 excitement+0.016
 mixture+0.016
Carl+0.015
Neg logit contributions
 where-0.032
ues-0.032
aring-0.032
 rather-0.031
 appearance-0.031
Tokenement
Token ID972
Feature activation-3.20e-01

Pos logit contributions
 two+0.158
 excitement+0.151
Y+0.150
 himself+0.133
 mixture+0.123
Neg logit contributions
aring-0.274
ues-0.260
 appearance-0.244
 rather-0.242
land-0.241
Token as
Token ID355
Feature activation-1.23e+00

Pos logit contributions
 two+0.011
 excitement+0.010
 himself+0.010
Y+0.009
 his+0.009
Neg logit contributions
aring-0.011
ues-0.011
 appearance-0.010
 rather-0.010
 Spanish-0.010
Token the
Token ID262
Feature activation-3.31e-01

Pos logit contributions
Y+0.029
 two+0.026
 excitement+0.025
 mixture+0.023
Carl+0.021
Neg logit contributions
aring-0.054
ues-0.054
 where-0.052
 rather-0.051
land-0.051
Token z
Token ID1976
Feature activation-7.93e-01

Pos logit contributions
Y+0.009
 two+0.008
 excitement+0.008
 himself+0.008
Carl+0.007
Neg logit contributions
aring-0.013
ues-0.012
land-0.012
 language-0.012
 Spanish-0.012
Tokenig
Token ID328
Feature activation+4.51e-01

Pos logit contributions
 two+0.033
 excitement+0.033
Y+0.033
 himself+0.029
 p+0.028
Neg logit contributions
ues-0.022
aring-0.022
land-0.020
 appearance-0.018
 where-0.018
Tokenly
Token ID306
Feature activation+2.42e+00

Pos logit contributions
 two+0.029
Y+0.028
 excitement+0.028
 himself+0.025
 mixture+0.024
Neg logit contributions
aring-0.037
ues-0.035
 appearance-0.032
 rather-0.032
land-0.032
Token.
Token ID13
Feature activation-2.81e-01

Pos logit contributions
aring+0.095
ues+0.091
 appearance+0.081
 rather+0.080
land+0.080
Neg logit contributions
 two-0.077
 excitement-0.073
Y-0.070
 himself-0.064
 p-0.064
Token Jill
Token ID22772
Feature activation+1.24e-01

Pos logit contributions
 two+0.007
Y+0.007
 excitement+0.007
 himself+0.006
 mixture+0.005
Neg logit contributions
aring-0.012
ues-0.012
land-0.011
 appearance-0.011
 rather-0.011
Token nodded
Token ID14464
Feature activation-2.90e+00

Pos logit contributions
aring+0.005
ues+0.005
 rather+0.005
land+0.005
 where+0.005
Neg logit contributions
Y-0.003
 two-0.003
 excitement-0.003
 himself-0.003
 mixture-0.002
Token in
Token ID287
Feature activation+1.68e-01

Pos logit contributions
 two+0.088
 excitement+0.084
Y+0.083
 himself+0.075
 his+0.071
Neg logit contributions
aring-0.119
ues-0.113
land-0.105
 appearance-0.104
 rather-0.100
Token amaz
Token ID40642
Feature activation-6.09e+00

Pos logit contributions
aring+0.006
 Spanish+0.006
ues+0.006
 where+0.006
 appearance+0.006
Neg logit contributions
Y-0.005
 two-0.005
 mixture-0.005
Carl-0.004
 excitement-0.004
Tokenement
Token ID972
Feature activation-7.29e-01

Pos logit contributions
 two+0.173
Y+0.168
 excitement+0.166
 himself+0.144
 mixture+0.137
Neg logit contributions
aring-0.248
ues-0.242
land-0.223
 appearance-0.221
 rather-0.218
Token.
Token ID13
Feature activation-5.46e-01

Pos logit contributions
 two+0.024
 excitement+0.023
Y+0.023
 himself+0.021
 p+0.020
Neg logit contributions
aring-0.026
ues-0.025
 appearance-0.023
land-0.023
 rather-0.023
Token
Token ID220
Feature activation-1.12e+00

Pos logit contributions
 two+0.016
Y+0.016
 excitement+0.015
 himself+0.014
 p+0.013
Neg logit contributions
aring-0.022
ues-0.021
 appearance-0.020
land-0.020
 rather-0.019
Token
Token ID198
Feature activation-2.77e-01

Pos logit contributions
 two+0.031
Y+0.027
 excitement+0.026
 himself+0.026
 his+0.022
Neg logit contributions
aring-0.047
 appearance-0.046
ues-0.046
land-0.043
 rather-0.043
Token
Token ID198
Feature activation-3.18e-01

Pos logit contributions
 two+0.009
Y+0.009
 excitement+0.009
 himself+0.008
 mixture+0.007
Neg logit contributions
aring-0.010
ues-0.010
land-0.009
 appearance-0.009
 rather-0.009
Token little
Token ID1310
Feature activation-2.90e-01

Pos logit contributions
Y+0.003
 excitement+0.003
 two+0.003
 himself+0.002
 mixture+0.002
Neg logit contributions
aring-0.005
ues-0.004
land-0.004
 appearance-0.004
 rather-0.004
Token girl
Token ID2576
Feature activation-1.04e+00

Pos logit contributions
Y+0.007
 excitement+0.007
 two+0.007
 himself+0.006
 mixture+0.005
Neg logit contributions
aring-0.012
ues-0.012
 appearance-0.011
land-0.011
 where-0.011
Token looked
Token ID3114
Feature activation+9.33e-02

Pos logit contributions
 two+0.026
 excitement+0.024
Y+0.024
 p+0.021
 himself+0.020
Neg logit contributions
aring-0.047
ues-0.047
 appearance-0.043
land-0.043
 Spanish-0.042
Token up
Token ID510
Feature activation-8.95e-01

Pos logit contributions
aring+0.004
ues+0.004
land+0.003
 appearance+0.003
ems+0.003
Neg logit contributions
 two-0.003
 excitement-0.003
Y-0.002
 p-0.002
 his-0.002
Token in
Token ID287
Feature activation+8.85e-02

Pos logit contributions
 two+0.035
Y+0.034
 excitement+0.034
 himself+0.031
 mixture+0.030
Neg logit contributions
aring-0.027
ues-0.026
land-0.023
 appearance-0.023
 rather-0.023
Token amaz
Token ID40642
Feature activation-6.05e+00

Pos logit contributions
ues+0.004
aring+0.004
 where+0.004
 appearance+0.004
 Spanish+0.004
Neg logit contributions
Y-0.002
 two-0.002
 excitement-0.002
 mixture-0.002
 himself-0.002
Tokenement
Token ID972
Feature activation-6.30e-01

Pos logit contributions
 two+0.164
Y+0.158
 excitement+0.158
 himself+0.137
 mixture+0.129
Neg logit contributions
aring-0.256
ues-0.246
land-0.228
 appearance-0.227
 rather-0.225
Token and
Token ID290
Feature activation-5.53e-01

Pos logit contributions
 two+0.021
 excitement+0.020
Y+0.019
 himself+0.018
 p+0.017
Neg logit contributions
aring-0.023
ues-0.022
 appearance-0.020
 Spanish-0.020
 rather-0.020
Token said
Token ID531
Feature activation-1.31e-01

Pos logit contributions
 two+0.015
 excitement+0.015
Y+0.014
 himself+0.012
 p+0.012
Neg logit contributions
aring-0.023
ues-0.023
land-0.021
 appearance-0.021
 rather-0.021
Token,
Token ID11
Feature activation+7.38e-02

Pos logit contributions
 two+0.004
 excitement+0.004
Y+0.004
 himself+0.004
 his+0.003
Neg logit contributions
aring-0.005
ues-0.005
 appearance-0.005
ems-0.005
 rather-0.004
Token "
Token ID366
Feature activation-4.95e-01

Pos logit contributions
aring+0.003
ues+0.003
 appearance+0.003
ems+0.003
 tale+0.003
Neg logit contributions
 two-0.002
 excitement-0.002
Y-0.002
 himself-0.002
 p-0.002
Token.
Token ID13
Feature activation-4.11e-01

Pos logit contributions
 two+0.029
Y+0.028
 excitement+0.027
 himself+0.025
 p+0.024
Neg logit contributions
aring-0.031
ues-0.031
 appearance-0.027
land-0.027
 rather-0.026
Token Tim
Token ID5045
Feature activation-1.55e+00

Pos logit contributions
 two+0.012
Y+0.011
 excitement+0.011
 himself+0.010
 p+0.009
Neg logit contributions
ues-0.017
aring-0.017
 appearance-0.016
land-0.016
 rather-0.015
Tokenmy
Token ID1820
Feature activation-9.86e-02

Pos logit contributions
 two+0.046
Y+0.042
 excitement+0.042
 himself+0.038
 p+0.037
Neg logit contributions
ues-0.064
aring-0.063
land-0.057
 appearance-0.057
 Spanish-0.055
Token watched
Token ID7342
Feature activation-1.60e+00

Pos logit contributions
 two+0.002
 excitement+0.002
Y+0.002
 p+0.002
 himself+0.002
Neg logit contributions
aring-0.005
ues-0.005
 appearance-0.004
land-0.004
 Spanish-0.004
Token with
Token ID351
Feature activation-6.90e-01

Pos logit contributions
 two+0.042
Y+0.041
 excitement+0.040
 himself+0.033
 mixture+0.033
Neg logit contributions
aring-0.068
ues-0.067
land-0.062
 appearance-0.061
 rather-0.061
Token amaz
Token ID40642
Feature activation-6.04e+00

Pos logit contributions
Y+0.018
 two+0.015
 mixture+0.014
Carl+0.014
 excitement+0.013
Neg logit contributions
 where-0.029
 rather-0.027
 why-0.027
ues-0.027
 Spanish-0.027
Tokenement
Token ID972
Feature activation-1.65e-01

Pos logit contributions
 two+0.156
Y+0.149
 excitement+0.149
 himself+0.132
 mixture+0.121
Neg logit contributions
aring-0.262
ues-0.251
 appearance-0.235
land-0.234
 rather-0.232
Token as
Token ID355
Feature activation-1.11e+00

Pos logit contributions
 two+0.006
 excitement+0.005
Y+0.005
 himself+0.005
 p+0.005
Neg logit contributions
aring-0.006
ues-0.006
 appearance-0.005
 rather-0.005
land-0.005
Token he
Token ID339
Feature activation-1.20e+00

Pos logit contributions
Y+0.027
 excitement+0.024
 two+0.024
 mixture+0.023
Carl+0.021
Neg logit contributions
aring-0.047
ues-0.046
 where-0.046
 rather-0.045
 Spanish-0.044
Token saw
Token ID2497
Feature activation+4.27e-01

Pos logit contributions
Y+0.034
 excitement+0.034
 two+0.034
 himself+0.029
 mixture+0.029
Neg logit contributions
aring-0.046
ues-0.046
land-0.043
 appearance-0.042
 rather-0.042
Token stars
Token ID5788
Feature activation+9.73e-01

Pos logit contributions
aring+0.017
ues+0.016
 where+0.016
land+0.016
 Spanish+0.015
Neg logit contributions
Y-0.012
 excitement-0.011
 two-0.011
 mixture-0.010
Carl-0.009
Tokenered
Token ID1068
Feature activation-5.79e-02

Pos logit contributions
 two+0.104
Y+0.099
 excitement+0.098
 himself+0.088
 mixture+0.083
Neg logit contributions
aring-0.143
ues-0.138
land-0.127
 appearance-0.127
 rather-0.126
Token in
Token ID287
Feature activation+1.30e-01

Pos logit contributions
Y+0.002
 two+0.002
 excitement+0.002
 ah+0.002
 mixture+0.002
Neg logit contributions
aring-0.002
ues-0.002
land-0.002
 where-0.002
 rather-0.002
Token,
Token ID11
Feature activation+2.86e-03

Pos logit contributions
aring+0.006
ues+0.006
 where+0.005
land+0.005
 appearance+0.005
Neg logit contributions
Y-0.003
 two-0.003
 excitement-0.003
 mixture-0.002
 ah-0.002
Token blinking
Token ID43196
Feature activation+5.06e-01

Pos logit contributions
aring+0.000
ues+0.000
 appearance+0.000
land+0.000
 Spanish+0.000
Neg logit contributions
 two-0.000
 excitement-0.000
Y-0.000
 himself-0.000
 p-0.000
Token in
Token ID287
Feature activation+2.37e-01

Pos logit contributions
aring+0.019
ues+0.018
land+0.017
 appearance+0.017
 rather+0.016
Neg logit contributions
 two-0.016
Y-0.016
 excitement-0.016
 himself-0.014
 mixture-0.013
Token amaz
Token ID40642
Feature activation-6.01e+00

Pos logit contributions
aring+0.010
ues+0.010
 where+0.010
 appearance+0.010
 Spanish+0.010
Neg logit contributions
Y-0.006
 two-0.005
 excitement-0.005
 mixture-0.004
 ah-0.004
Tokenement
Token ID972
Feature activation-5.81e-01

Pos logit contributions
 two+0.169
Y+0.164
 excitement+0.163
 himself+0.141
 mixture+0.135
Neg logit contributions
aring-0.246
ues-0.240
land-0.221
 appearance-0.219
 rather-0.217
Token.
Token ID13
Feature activation-6.32e-01

Pos logit contributions
 two+0.019
 excitement+0.018
Y+0.018
 himself+0.017
 p+0.016
Neg logit contributions
aring-0.021
ues-0.021
 appearance-0.019
 rather-0.018
land-0.018
Token
Token ID198
Feature activation-4.17e-01

Pos logit contributions
 two+0.018
Y+0.018
 excitement+0.017
 himself+0.015
 mixture+0.014
Neg logit contributions
aring-0.026
ues-0.025
land-0.023
 appearance-0.023
 rather-0.023
Token
Token ID198
Feature activation-4.46e-01

Pos logit contributions
 two+0.013
Y+0.013
 excitement+0.013
 himself+0.011
 p+0.011
Neg logit contributions
aring-0.016
ues-0.015
 appearance-0.014
land-0.014
 rather-0.014
TokenBut
Token ID1537
Feature activation-2.20e+00

Pos logit contributions
 excitement+0.014
 two+0.014
Y+0.013
 mixture+0.012
 p+0.012
Neg logit contributions
aring-0.017
ues-0.016
land-0.015
 Spanish-0.014
 rather-0.014
Token.
Token ID13
Feature activation-6.10e-01

Pos logit contributions
aring+0.049
ues+0.048
 appearance+0.044
 Spanish+0.044
 Him+0.042
Neg logit contributions
 two-0.040
 excitement-0.035
Y-0.034
 himself-0.034
 p-0.033
Token
Token ID198
Feature activation-2.13e-01

Pos logit contributions
 two+0.018
Y+0.018
 excitement+0.018
 himself+0.015
 mixture+0.015
Neg logit contributions
aring-0.024
ues-0.023
land-0.021
 appearance-0.021
 rather-0.021
Token
Token ID198
Feature activation-1.41e-01

Pos logit contributions
 two+0.007
Y+0.007
 excitement+0.006
 himself+0.006
 p+0.005
Neg logit contributions
aring-0.008
ues-0.008
 appearance-0.007
land-0.007
 rather-0.007
TokenTo
Token ID2514
Feature activation+2.27e-01

Pos logit contributions
 excitement+0.005
 two+0.005
Y+0.004
 mixture+0.004
 himself+0.004
Neg logit contributions
aring-0.005
ues-0.005
 rather-0.004
 Spanish-0.004
land-0.004
Token her
Token ID607
Feature activation+2.85e-01

Pos logit contributions
aring+0.009
ues+0.009
 where+0.009
land+0.009
 rather+0.009
Neg logit contributions
Y-0.006
 two-0.006
 excitement-0.006
 mixture-0.005
 himself-0.005
Token amaz
Token ID40642
Feature activation-5.96e+00

Pos logit contributions
aring+0.013
ues+0.013
land+0.012
 appearance+0.012
 rather+0.012
Neg logit contributions
 two-0.006
Y-0.006
 excitement-0.006
 himself-0.005
 mixture-0.005
Tokenement
Token ID972
Feature activation-5.25e-01

Pos logit contributions
 two+0.123
 excitement+0.115
Y+0.106
 himself+0.097
 mixture+0.095
Neg logit contributions
aring-0.297
ues-0.272
ems-0.270
 appearance-0.261
 rather-0.260
Token,
Token ID11
Feature activation-9.46e-02

Pos logit contributions
 two+0.018
 excitement+0.016
Y+0.015
 himself+0.015
 p+0.014
Neg logit contributions
aring-0.019
ues-0.018
 appearance-0.017
 rather-0.017
 Spanish-0.016
Token the
Token ID262
Feature activation-3.73e-01

Pos logit contributions
Y+0.003
 two+0.002
 excitement+0.002
 mixture+0.002
 himself+0.002
Neg logit contributions
aring-0.004
ues-0.004
 where-0.004
land-0.004
 rather-0.003
Token clothes
Token ID8242
Feature activation-1.40e+00

Pos logit contributions
Y+0.010
 two+0.009
 excitement+0.009
 himself+0.008
 ah+0.008
Neg logit contributions
aring-0.015
ues-0.015
land-0.014
 rather-0.014
 appearance-0.014
Token came
Token ID1625
Feature activation+1.35e-01

Pos logit contributions
 two+0.033
 excitement+0.031
Y+0.030
 himself+0.025
 p+0.025
Neg logit contributions
aring-0.065
ues-0.063
 appearance-0.059
land-0.059
 rather-0.058
TokenTim
Token ID14967
Feature activation-1.87e+00

Pos logit contributions
 two+0.008
 excitement+0.008
Y+0.007
 mixture+0.007
 himself+0.006
Neg logit contributions
aring-0.010
ues-0.010
 rather-0.009
land-0.009
 where-0.009
Tokenmy
Token ID1820
Feature activation-9.19e-02

Pos logit contributions
 two+0.059
Y+0.057
 excitement+0.057
 himself+0.050
 p+0.048
Neg logit contributions
aring-0.070
ues-0.068
land-0.062
 appearance-0.062
 rather-0.061
Token just
Token ID655
Feature activation-2.23e+00

Pos logit contributions
 two+0.003
Y+0.003
 excitement+0.003
 himself+0.002
 p+0.002
Neg logit contributions
aring-0.004
ues-0.004
land-0.003
 appearance-0.003
 rather-0.003
Token stared
Token ID18484
Feature activation-2.03e+00

Pos logit contributions
 two+0.057
Y+0.057
 excitement+0.055
 himself+0.047
 mixture+0.046
Neg logit contributions
aring-0.096
ues-0.093
land-0.086
 rather-0.086
 appearance-0.086
Token in
Token ID287
Feature activation+2.53e-01

Pos logit contributions
 two+0.069
 excitement+0.065
Y+0.061
 himself+0.059
 p+0.055
Neg logit contributions
aring-0.076
ues-0.073
 appearance-0.066
land-0.066
ems-0.064
Token amaz
Token ID40642
Feature activation-5.93e+00

Pos logit contributions
aring+0.012
ues+0.012
 appearance+0.011
 where+0.011
land+0.011
Neg logit contributions
Y-0.005
 two-0.005
 excitement-0.005
 mixture-0.004
 himself-0.004
Tokenement
Token ID972
Feature activation-9.67e-01

Pos logit contributions
 two+0.173
Y+0.169
 excitement+0.166
 himself+0.144
 mixture+0.138
Neg logit contributions
aring-0.236
ues-0.232
land-0.214
 appearance-0.210
 rather-0.207
Token at
Token ID379
Feature activation-1.92e+00

Pos logit contributions
 two+0.032
 excitement+0.030
Y+0.029
 himself+0.027
 p+0.025
Neg logit contributions
aring-0.035
ues-0.035
 appearance-0.032
 rather-0.031
 Spanish-0.031
Token the
Token ID262
Feature activation-2.75e-01

Pos logit contributions
 two+0.041
 excitement+0.039
Y+0.039
 himself+0.033
 p+0.030
Neg logit contributions
aring-0.093
ues-0.091
land-0.084
 appearance-0.083
 rather-0.082
Token t
Token ID256
Feature activation-2.59e+00

Pos logit contributions
Y+0.007
Carl+0.007
 excitement+0.007
 ah+0.007
 two+0.006
Neg logit contributions
 appearance-0.010
 rather-0.010
 language-0.010
aring-0.009
 Spanish-0.009
Tokeneddy
Token ID21874
Feature activation-7.06e-01

Pos logit contributions
 excitement+0.132
 two+0.126
Y+0.123
 ah+0.115
 himself+0.110
Neg logit contributions
aring-0.059
ues-0.053
land-0.046
ute-0.045
 where-0.040
Tokeneddy
Token ID21874
Feature activation-3.02e-01

Pos logit contributions
 two+0.089
Y+0.087
 excitement+0.087
 himself+0.078
 mixture+0.076
Neg logit contributions
aring-0.067
ues-0.065
land-0.058
 appearance-0.057
 rather-0.056
Token bear
Token ID6842
Feature activation-7.48e-02

Pos logit contributions
 two+0.004
Y+0.003
 excitement+0.003
 himself+0.002
 p+0.002
Neg logit contributions
aring-0.018
ues-0.017
land-0.016
 appearance-0.016
 rather-0.016
Token and
Token ID290
Feature activation-7.95e-02

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 p+0.002
 himself+0.002
Neg logit contributions
aring-0.003
ues-0.003
 Spanish-0.002
 appearance-0.002
 rather-0.002
Token watched
Token ID7342
Feature activation-1.32e+00

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 himself+0.002
 mixture+0.002
Neg logit contributions
aring-0.003
ues-0.003
land-0.003
 appearance-0.003
 rather-0.003
Token in
Token ID287
Feature activation+5.75e-01

Pos logit contributions
 two+0.041
Y+0.039
 excitement+0.039
 himself+0.035
 mixture+0.033
Neg logit contributions
aring-0.051
ues-0.050
land-0.045
 appearance-0.045
 rather-0.044
Token amaz
Token ID40642
Feature activation-5.90e+00

Pos logit contributions
aring+0.025
ues+0.024
land+0.023
 appearance+0.023
 rather+0.022
Neg logit contributions
 two-0.014
Y-0.014
 excitement-0.014
 himself-0.012
 mixture-0.011
Tokenement
Token ID972
Feature activation-1.91e-01

Pos logit contributions
 two+0.163
Y+0.158
 excitement+0.156
 himself+0.136
 mixture+0.129
Neg logit contributions
aring-0.246
ues-0.238
land-0.219
 appearance-0.219
 rather-0.217
Token as
Token ID355
Feature activation-1.03e+00

Pos logit contributions
 two+0.006
 excitement+0.006
Y+0.006
 himself+0.005
 p+0.005
Neg logit contributions
aring-0.007
ues-0.007
 appearance-0.006
 rather-0.006
land-0.006
Token it
Token ID340
Feature activation+1.67e-01

Pos logit contributions
Y+0.030
 two+0.029
 excitement+0.029
 mixture+0.025
 himself+0.025
Neg logit contributions
aring-0.039
ues-0.039
land-0.036
 rather-0.036
 where-0.036
Token shrunk
Token ID38900
Feature activation+9.01e-01

Pos logit contributions
ues+0.006
aring+0.006
land+0.006
 where+0.006
 why+0.006
Neg logit contributions
Y-0.005
 excitement-0.005
 two-0.005
 himself-0.004
Mike-0.004
Token an
Token ID281
Feature activation-8.09e-01

Pos logit contributions
aring+0.036
ues+0.034
 rather+0.031
 appearance+0.031
land+0.031
Neg logit contributions
 two-0.028
 excitement-0.026
Y-0.026
 himself-0.024
 p-0.023
Token
Token ID198
Feature activation+2.14e-01

Pos logit contributions
aring+0.012
ues+0.011
 appearance+0.011
land+0.010
 rather+0.010
Neg logit contributions
 two-0.010
Y-0.009
 excitement-0.009
 himself-0.008
 p-0.008
TokenL
Token ID43
Feature activation-1.17e+00

Pos logit contributions
aring+0.009
ues+0.008
land+0.007
 Spanish+0.007
 rather+0.007
Neg logit contributions
 excitement-0.006
 two-0.006
Y-0.006
 mixture-0.005
 p-0.005
Tokenucky
Token ID5309
Feature activation+2.10e-01

Pos logit contributions
 excitement+0.037
Y+0.036
 two+0.035
 mixture+0.031
 p+0.029
Neg logit contributions
aring-0.046
ues-0.044
land-0.041
 where-0.038
 rather-0.038
Token watched
Token ID7342
Feature activation-1.54e+00

Pos logit contributions
aring+0.009
ues+0.009
land+0.008
 appearance+0.008
 rather+0.008
Neg logit contributions
 two-0.006
Y-0.005
 excitement-0.005
 himself-0.005
 p-0.004
Token in
Token ID287
Feature activation+3.09e-01

Pos logit contributions
 two+0.049
Y+0.049
 excitement+0.047
 himself+0.041
 mixture+0.041
Neg logit contributions
aring-0.057
ues-0.056
land-0.051
 appearance-0.050
 rather-0.050
Token amaz
Token ID40642
Feature activation-5.89e+00

Pos logit contributions
aring+0.014
ues+0.014
land+0.013
 appearance+0.013
 rather+0.013
Neg logit contributions
 two-0.007
Y-0.007
 excitement-0.007
 himself-0.006
 p-0.005
Tokenement
Token ID972
Feature activation-3.27e-01

Pos logit contributions
 two+0.166
Y+0.161
 excitement+0.159
 himself+0.139
 mixture+0.132
Neg logit contributions
aring-0.241
ues-0.235
land-0.217
 appearance-0.215
 rather-0.213
Token as
Token ID355
Feature activation-1.24e+00

Pos logit contributions
 two+0.011
 excitement+0.011
Y+0.010
 himself+0.010
 p+0.009
Neg logit contributions
aring-0.011
ues-0.011
 appearance-0.010
land-0.010
 rather-0.010
Token the
Token ID262
Feature activation-1.26e-01

Pos logit contributions
Y+0.033
 two+0.030
 excitement+0.030
 mixture+0.027
 ah+0.024
Neg logit contributions
aring-0.050
ues-0.050
 rather-0.048
 where-0.048
 Spanish-0.047
Token frog
Token ID21264
Feature activation+6.58e-01

Pos logit contributions
Y+0.003
 two+0.003
Mike+0.003
 excitement+0.003
Andy+0.003
Neg logit contributions
aring-0.005
 language-0.005
 Spanish-0.005
 appearance-0.005
 rather-0.004
Token zoom
Token ID19792
Feature activation+1.39e-01

Pos logit contributions
aring+0.029
ues+0.028
land+0.026
 appearance+0.026
 rather+0.026
Neg logit contributions
 two-0.017
Y-0.016
 excitement-0.016
 himself-0.013
 p-0.013
Token
Token ID198
Feature activation-3.23e-01

Pos logit contributions
 two+0.009
Y+0.008
 excitement+0.008
 himself+0.007
 p+0.007
Neg logit contributions
aring-0.011
ues-0.011
 appearance-0.010
land-0.010
 rather-0.010
TokenLittle
Token ID22253
Feature activation-4.96e-01

Pos logit contributions
 excitement+0.011
 two+0.011
Y+0.010
 mixture+0.009
 himself+0.009
Neg logit contributions
aring-0.011
ues-0.011
land-0.010
 rather-0.010
 Spanish-0.010
Token Jack
Token ID3619
Feature activation+5.83e-01

Pos logit contributions
Y+0.013
 two+0.013
 excitement+0.013
 himself+0.011
 mixture+0.010
Neg logit contributions
aring-0.021
ues-0.020
land-0.019
 appearance-0.019
 where-0.018
Token gasped
Token ID45236
Feature activation-1.25e+00

Pos logit contributions
aring+0.024
ues+0.023
land+0.022
 appearance+0.021
 rather+0.021
Neg logit contributions
 two-0.016
Y-0.016
 excitement-0.016
 himself-0.014
 mixture-0.013
Token in
Token ID287
Feature activation+1.29e-01

Pos logit contributions
 two+0.043
 excitement+0.041
Y+0.041
 himself+0.038
 p+0.036
Neg logit contributions
aring-0.045
ues-0.043
land-0.039
 appearance-0.039
ems-0.038
Token amaz
Token ID40642
Feature activation-5.89e+00

Pos logit contributions
 where+0.005
 Spanish+0.005
 why+0.005
 English+0.005
 appearance+0.005
Neg logit contributions
Y-0.003
 mixture-0.003
 two-0.003
Carl-0.002
ops-0.002
Tokenement
Token ID972
Feature activation-7.52e-01

Pos logit contributions
 two+0.172
Y+0.169
 excitement+0.165
 himself+0.143
 mixture+0.138
Neg logit contributions
aring-0.233
ues-0.230
land-0.212
 appearance-0.208
 rather-0.205
Token.
Token ID13
Feature activation-8.43e-01

Pos logit contributions
 two+0.026
 excitement+0.025
Y+0.025
 himself+0.023
 p+0.022
Neg logit contributions
aring-0.026
ues-0.025
land-0.023
 appearance-0.023
 rather-0.022
Token Dad
Token ID17415
Feature activation-9.06e-01

Pos logit contributions
 two+0.021
Y+0.020
 excitement+0.020
 himself+0.017
 mixture+0.016
Neg logit contributions
aring-0.038
ues-0.037
land-0.034
 appearance-0.034
 rather-0.033
Token pointed
Token ID6235
Feature activation-2.01e-01

Pos logit contributions
Y+0.023
 two+0.023
 excitement+0.023
 himself+0.020
 mixture+0.019
Neg logit contributions
aring-0.039
ues-0.037
land-0.035
 rather-0.034
 where-0.034
Token to
Token ID284
Feature activation-3.76e-01

Pos logit contributions
 two+0.008
 excitement+0.008
Y+0.007
 himself+0.007
 his+0.007
Neg logit contributions
aring-0.006
ues-0.006
land-0.006
 appearance-0.005
ems-0.005
Token go
Token ID467
Feature activation+7.41e-01

Pos logit contributions
 two+0.018
Y+0.018
 excitement+0.018
 himself+0.015
 mixture+0.015
Neg logit contributions
aring-0.027
ues-0.026
land-0.024
 appearance-0.024
 rather-0.023
Token!
Token ID0
Feature activation+4.33e-01

Pos logit contributions
aring+0.028
ues+0.028
 appearance+0.026
ems+0.024
 Spanish+0.024
Neg logit contributions
 two-0.025
 himself-0.023
 his-0.022
 excitement-0.022
 p-0.022
Token Everyone
Token ID11075
Feature activation-5.47e-01

Pos logit contributions
aring+0.017
ues+0.017
 appearance+0.016
land+0.016
 rather+0.015
Neg logit contributions
 two-0.013
Y-0.012
 excitement-0.012
 himself-0.011
 p-0.010
Token watched
Token ID7342
Feature activation-1.15e+00

Pos logit contributions
 two+0.015
 excitement+0.014
Y+0.013
 himself+0.012
 p+0.012
Neg logit contributions
ues-0.024
aring-0.023
 appearance-0.022
land-0.021
 rather-0.021
Token in
Token ID287
Feature activation+7.45e-01

Pos logit contributions
 two+0.037
 excitement+0.035
Y+0.034
 himself+0.031
 p+0.029
Neg logit contributions
aring-0.044
ues-0.042
 appearance-0.039
land-0.039
 rather-0.038
Token amaz
Token ID40642
Feature activation-5.85e+00

Pos logit contributions
aring+0.030
ues+0.030
land+0.027
 appearance+0.027
 rather+0.026
Neg logit contributions
 two-0.022
 excitement-0.021
Y-0.021
 himself-0.018
 p-0.017
Tokenement
Token ID972
Feature activation-1.30e-01

Pos logit contributions
 two+0.155
 excitement+0.149
Y+0.148
 himself+0.129
 mixture+0.121
Neg logit contributions
aring-0.251
ues-0.240
 appearance-0.223
 rather-0.222
land-0.222
Token as
Token ID355
Feature activation-9.55e-01

Pos logit contributions
 two+0.004
 excitement+0.004
Y+0.004
 himself+0.004
 p+0.004
Neg logit contributions
aring-0.005
ues-0.004
 appearance-0.004
 rather-0.004
 Spanish-0.004
Token it
Token ID340
Feature activation+4.01e-01

Pos logit contributions
Y+0.028
 two+0.027
 excitement+0.027
 mixture+0.023
 himself+0.023
Neg logit contributions
aring-0.037
ues-0.036
land-0.033
 rather-0.033
 where-0.033
Token took
Token ID1718
Feature activation-1.33e-01

Pos logit contributions
aring+0.016
ues+0.016
land+0.015
 appearance+0.015
 rather+0.015
Neg logit contributions
 two-0.011
Y-0.011
 excitement-0.011
 himself-0.009
 mixture-0.009
Token off
Token ID572
Feature activation-4.43e-01

Pos logit contributions
Y+0.004
 two+0.004
 excitement+0.004
 mixture+0.003
 himself+0.003
Neg logit contributions
aring-0.005
ues-0.005
land-0.005
 where-0.005
 rather-0.004
Token friends
Token ID2460
Feature activation-1.99e+00

Pos logit contributions
aring+0.051
ues+0.050
land+0.046
 appearance+0.046
 rather+0.046
Neg logit contributions
 two-0.031
Y-0.030
 excitement-0.030
 himself-0.026
 mixture-0.024
Token stopped
Token ID5025
Feature activation+1.80e-01

Pos logit contributions
 two+0.050
 excitement+0.047
Y+0.045
 himself+0.039
 p+0.038
Neg logit contributions
aring-0.090
ues-0.089
land-0.082
 appearance-0.080
 rather-0.079
Token and
Token ID290
Feature activation-3.40e-01

Pos logit contributions
ues+0.007
aring+0.007
land+0.006
ems+0.006
 appearance+0.006
Neg logit contributions
 two-0.006
 excitement-0.006
Y-0.005
 himself-0.005
 his-0.005
Token watched
Token ID7342
Feature activation-1.26e+00

Pos logit contributions
Y+0.010
 excitement+0.010
 two+0.009
 mixture+0.009
 himself+0.009
Neg logit contributions
aring-0.013
ues-0.012
 where-0.012
 why-0.012
 rather-0.012
Token in
Token ID287
Feature activation+4.58e-01

Pos logit contributions
 two+0.039
Y+0.038
 excitement+0.038
 himself+0.033
 mixture+0.032
Neg logit contributions
aring-0.048
ues-0.047
land-0.043
 appearance-0.042
 rather-0.042
Token amaz
Token ID40642
Feature activation-5.84e+00

Pos logit contributions
aring+0.020
ues+0.020
 where+0.019
 appearance+0.018
 rather+0.018
Neg logit contributions
Y-0.011
 two-0.011
 excitement-0.010
 himself-0.009
 mixture-0.009
Tokenement
Token ID972
Feature activation-2.11e-01

Pos logit contributions
 two+0.162
Y+0.157
 excitement+0.156
 himself+0.135
 mixture+0.128
Neg logit contributions
aring-0.242
ues-0.235
land-0.217
 appearance-0.215
 rather-0.213
Token.
Token ID13
Feature activation-5.74e-01

Pos logit contributions
 two+0.007
 excitement+0.007
Y+0.007
 himself+0.006
 p+0.006
Neg logit contributions
aring-0.007
ues-0.007
 appearance-0.007
land-0.007
 rather-0.007
Token The
Token ID383
Feature activation+2.78e-01

Pos logit contributions
 two+0.019
Y+0.018
 excitement+0.018
 himself+0.016
 p+0.015
Neg logit contributions
aring-0.021
ues-0.020
land-0.019
 appearance-0.018
 rather-0.018
Token dog
Token ID3290
Feature activation+6.17e-01

Pos logit contributions
aring+0.011
ues+0.011
 Spanish+0.011
 appearance+0.011
 rather+0.010
Neg logit contributions
Y-0.007
 excitement-0.006
 two-0.006
 himself-0.006
 mixture-0.005
Token too
Token ID1165
Feature activation-1.15e+00

Pos logit contributions
aring+0.027
ues+0.026
land+0.024
 rather+0.024
 appearance+0.024
Neg logit contributions
 two-0.015
Y-0.015
 excitement-0.015
 himself-0.013
 mixture-0.012
Tokenmy
Token ID1820
Feature activation-3.42e-02

Pos logit contributions
 two+0.054
Y+0.054
 excitement+0.053
 himself+0.046
 ah+0.046
Neg logit contributions
aring-0.055
ues-0.052
land-0.048
 appearance-0.047
 rather-0.047
Token's
Token ID338
Feature activation+1.60e+00

Pos logit contributions
Y+0.001
 two+0.001
 excitement+0.001
 ah+0.001
 himself+0.001
Neg logit contributions
aring-0.001
ues-0.001
land-0.001
 rather-0.001
 where-0.001
Token eyes
Token ID2951
Feature activation-1.81e-01

Pos logit contributions
aring+0.069
ues+0.068
land+0.062
 appearance+0.061
 rather+0.061
Neg logit contributions
 two-0.044
 excitement-0.042
Y-0.041
 himself-0.035
 p-0.034
Token widened
Token ID36405
Feature activation-8.27e-01

Pos logit contributions
 two+0.004
Y+0.004
 excitement+0.004
 himself+0.004
 ah+0.003
Neg logit contributions
aring-0.008
ues-0.008
land-0.007
 rather-0.007
 appearance-0.007
Token with
Token ID351
Feature activation-8.30e-01

Pos logit contributions
 two+0.028
 excitement+0.027
Y+0.027
 himself+0.025
 p+0.023
Neg logit contributions
aring-0.030
ues-0.029
land-0.026
 appearance-0.026
ems-0.025
Token amaz
Token ID40642
Feature activation-5.83e+00

Pos logit contributions
Y+0.014
 two+0.012
Carl+0.011
 mixture+0.011
 p+0.009
Neg logit contributions
 rather-0.040
 where-0.040
 Spanish-0.040
aring-0.040
ues-0.040
Tokenement
Token ID972
Feature activation-8.03e-01

Pos logit contributions
 two+0.161
Y+0.156
 excitement+0.155
 himself+0.135
 mixture+0.128
Neg logit contributions
aring-0.243
ues-0.235
land-0.217
 appearance-0.216
 rather-0.214
Token.
Token ID13
Feature activation-1.86e-01

Pos logit contributions
 two+0.029
Y+0.028
 excitement+0.028
 himself+0.025
 mixture+0.024
Neg logit contributions
aring-0.027
ues-0.026
land-0.023
 appearance-0.023
 rather-0.023
Token Grand
Token ID5675
Feature activation-2.88e-01

Pos logit contributions
 two+0.005
Y+0.005
 excitement+0.005
 himself+0.004
 mixture+0.004
Neg logit contributions
aring-0.008
ues-0.007
land-0.007
 where-0.007
 rather-0.007
Tokenma
Token ID2611
Feature activation+1.40e-01

Pos logit contributions
 excitement+0.009
Y+0.009
 two+0.009
 mixture+0.007
Carl+0.007
Neg logit contributions
aring-0.011
ues-0.010
land-0.010
 appearance-0.009
 where-0.009
Token then
Token ID788
Feature activation-7.48e-01

Pos logit contributions
aring+0.006
ues+0.005
 rather+0.005
land+0.005
 where+0.005
Neg logit contributions
Y-0.004
 excitement-0.004
 two-0.004
 himself-0.003
 mixture-0.003
Token the
Token ID262
Feature activation-2.51e-01

Pos logit contributions
Y+0.014
 excitement+0.011
 two+0.011
 ah+0.011
Carl+0.011
Neg logit contributions
 where-0.017
ues-0.016
 Spanish-0.015
land-0.015
 about-0.015
Token zipper
Token ID48992
Feature activation-6.27e-01

Pos logit contributions
Y+0.006
 two+0.006
 excitement+0.006
 ah+0.005
 himself+0.005
Neg logit contributions
aring-0.010
ues-0.010
 appearance-0.010
land-0.010
 where-0.010
Token and
Token ID290
Feature activation-4.33e-01

Pos logit contributions
 two+0.025
Y+0.024
 excitement+0.024
 himself+0.022
 mixture+0.021
Neg logit contributions
aring-0.019
ues-0.018
land-0.016
 appearance-0.016
 rather-0.016
Token gasped
Token ID45236
Feature activation-1.26e+00

Pos logit contributions
Y+0.014
 excitement+0.013
 mixture+0.012
 two+0.012
Carl+0.011
Neg logit contributions
 where-0.015
ues-0.014
 why-0.013
aring-0.013
 who-0.013
Token with
Token ID351
Feature activation-1.05e+00

Pos logit contributions
Y+0.047
 two+0.046
 excitement+0.045
 mixture+0.040
 ah+0.039
Neg logit contributions
aring-0.038
ues-0.038
 where-0.035
land-0.034
 rather-0.034
Token amaz
Token ID40642
Feature activation-5.83e+00

Pos logit contributions
Y+0.022
 mixture+0.019
Carl+0.018
ops+0.017
 two+0.016
Neg logit contributions
 where-0.048
 rather-0.046
 Spanish-0.046
 why-0.045
 about-0.045
Tokenement
Token ID972
Feature activation-8.47e-01

Pos logit contributions
 two+0.175
Y+0.171
 excitement+0.167
 himself+0.145
 mixture+0.140
Neg logit contributions
aring-0.227
ues-0.225
land-0.206
 appearance-0.202
 rather-0.199
Token at
Token ID379
Feature activation-1.75e+00

Pos logit contributions
Y+0.033
 two+0.031
 excitement+0.031
Carl+0.027
 mixture+0.027
Neg logit contributions
aring-0.027
ues-0.026
land-0.024
 where-0.023
 appearance-0.022
Token all
Token ID477
Feature activation-3.50e-01

Pos logit contributions
Y+0.047
 two+0.041
 excitement+0.041
 mixture+0.037
Carl+0.034
Neg logit contributions
ues-0.073
aring-0.072
 where-0.072
 appearance-0.069
 rather-0.067
Token the
Token ID262
Feature activation-1.63e-01

Pos logit contributions
Y+0.010
 two+0.008
 excitement+0.008
 mixture+0.007
Carl+0.007
Neg logit contributions
aring-0.015
 where-0.015
 about-0.014
ues-0.014
 why-0.014
Token treasures
Token ID29561
Feature activation-8.86e-01

Pos logit contributions
Y+0.005
Carl+0.004
 two+0.004
 mixture+0.004
 ah+0.004
Neg logit contributions
aring-0.006
 appearance-0.005
land-0.005
 language-0.005
 Spanish-0.005
Token,
Token ID11
Feature activation+1.91e-01

Pos logit contributions
aring+0.024
ues+0.022
land+0.021
 where+0.021
 rather+0.020
Neg logit contributions
Y-0.028
 two-0.027
 excitement-0.026
 mixture-0.023
 himself-0.023
Token her
Token ID607
Feature activation+3.16e-01

Pos logit contributions
aring+0.007
ues+0.007
 appearance+0.006
land+0.006
ems+0.006
Neg logit contributions
 two-0.006
 excitement-0.006
Y-0.006
 himself-0.005
 p-0.005
Token eyes
Token ID2951
Feature activation-2.17e-01

Pos logit contributions
aring+0.012
ues+0.012
 appearance+0.011
 rather+0.011
land+0.011
Neg logit contributions
Y-0.009
 two-0.008
 excitement-0.008
 himself-0.008
 ah-0.007
Token wide
Token ID3094
Feature activation+2.38e-01

Pos logit contributions
 two+0.006
Y+0.006
 excitement+0.005
 himself+0.005
Carl+0.005
Neg logit contributions
aring-0.009
ues-0.009
land-0.008
 rather-0.008
 where-0.008
Token with
Token ID351
Feature activation-9.43e-01

Pos logit contributions
aring+0.009
ues+0.009
 appearance+0.008
land+0.008
ems+0.008
Neg logit contributions
 two-0.008
 excitement-0.007
Y-0.007
 himself-0.007
 p-0.006
Token amaz
Token ID40642
Feature activation-5.82e+00

Pos logit contributions
Y+0.022
 two+0.020
 mixture+0.018
Carl+0.018
 excitement+0.018
Neg logit contributions
aring-0.039
 rather-0.038
 where-0.038
ues-0.038
 appearance-0.037
Tokenement
Token ID972
Feature activation-6.70e-01

Pos logit contributions
 two+0.159
Y+0.154
 excitement+0.153
 himself+0.133
 mixture+0.127
Neg logit contributions
aring-0.245
ues-0.237
land-0.219
 appearance-0.218
 rather-0.216
Token.
Token ID13
Feature activation+8.52e-02

Pos logit contributions
 two+0.021
 excitement+0.020
Y+0.020
 himself+0.018
 p+0.017
Neg logit contributions
aring-0.025
ues-0.025
 appearance-0.023
 rather-0.022
land-0.022
Token
Token ID198
Feature activation+1.38e-01

Pos logit contributions
aring+0.004
ues+0.004
 appearance+0.003
land+0.003
 tale+0.003
Neg logit contributions
 two-0.002
Y-0.002
 excitement-0.002
 himself-0.002
 p-0.002
Token
Token ID198
Feature activation+6.38e-02

Pos logit contributions
aring+0.006
ues+0.006
 appearance+0.005
land+0.005
 rather+0.005
Neg logit contributions
 two-0.004
Y-0.004
 excitement-0.004
 himself-0.003
 p-0.003
TokenHer
Token ID9360
Feature activation+3.23e-01

Pos logit contributions
aring+0.002
ues+0.002
land+0.002
 appearance+0.002
 rather+0.002
Neg logit contributions
 two-0.002
Y-0.002
 excitement-0.002
 himself-0.002
 mixture-0.002
Token,
Token ID11
Feature activation+1.94e-01

Pos logit contributions
 two+0.005
Y+0.005
 excitement+0.005
 himself+0.005
 p+0.004
Neg logit contributions
aring-0.006
ues-0.005
land-0.005
 appearance-0.005
 rather-0.005
Token her
Token ID607
Feature activation+3.02e-01

Pos logit contributions
aring+0.006
ues+0.006
land+0.006
 appearance+0.005
 rather+0.005
Neg logit contributions
 two-0.007
Y-0.007
 excitement-0.007
 himself-0.006
 mixture-0.006
Token eyes
Token ID2951
Feature activation-2.80e-01

Pos logit contributions
aring+0.011
ues+0.011
 appearance+0.010
 rather+0.010
 where+0.010
Neg logit contributions
Y-0.010
 two-0.008
 himself-0.008
 excitement-0.008
Mike-0.008
Token wide
Token ID3094
Feature activation+1.12e-01

Pos logit contributions
Y+0.008
 two+0.007
 excitement+0.007
Carl+0.007
 himself+0.007
Neg logit contributions
aring-0.011
ues-0.011
land-0.011
 rather-0.010
 where-0.010
Token with
Token ID351
Feature activation-9.38e-01

Pos logit contributions
aring+0.004
ues+0.004
land+0.003
 appearance+0.003
 rather+0.003
Neg logit contributions
 two-0.004
Y-0.004
 excitement-0.004
 himself-0.003
 mixture-0.003
Token amaz
Token ID40642
Feature activation-5.82e+00

Pos logit contributions
Y+0.022
Carl+0.019
 mixture+0.018
 two+0.018
Mike+0.017
Neg logit contributions
 Spanish-0.039
 where-0.038
 rather-0.038
ues-0.037
 English-0.037
Tokenement
Token ID972
Feature activation-7.00e-01

Pos logit contributions
 two+0.165
Y+0.160
 excitement+0.158
 himself+0.137
 mixture+0.131
Neg logit contributions
aring-0.237
ues-0.231
land-0.214
 appearance-0.211
 rather-0.209
Token.
Token ID13
Feature activation-7.40e-02

Pos logit contributions
 two+0.024
Y+0.024
 excitement+0.024
 himself+0.021
 mixture+0.020
Neg logit contributions
aring-0.024
ues-0.023
land-0.021
 appearance-0.021
 rather-0.021
Token
Token ID198
Feature activation+5.54e-02

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 himself+0.002
 p+0.002
Neg logit contributions
aring-0.003
ues-0.003
 appearance-0.003
land-0.003
 rather-0.002
Token
Token ID198
Feature activation-1.19e-02

Pos logit contributions
aring+0.002
ues+0.002
 appearance+0.002
land+0.002
 rather+0.002
Neg logit contributions
 two-0.002
Y-0.002
 excitement-0.002
 himself-0.001
 p-0.001
Token"
Token ID1
Feature activation-1.49e+00

Pos logit contributions
 two+0.000
 excitement+0.000
Y+0.000
 himself+0.000
 mixture+0.000
Neg logit contributions
aring-0.000
ues-0.000
land-0.000
 rather-0.000
 where-0.000
Token Her
Token ID2332
Feature activation+5.70e-01

Pos logit contributions
 two+0.008
 excitement+0.008
Y+0.007
 himself+0.006
 top+0.006
Neg logit contributions
aring-0.012
ues-0.012
 appearance-0.011
land-0.011
 rather-0.010
Token best
Token ID1266
Feature activation+8.91e-02

Pos logit contributions
aring+0.025
ues+0.024
land+0.022
 appearance+0.022
 rather+0.022
Neg logit contributions
 two-0.015
Y-0.014
 excitement-0.014
 himself-0.012
 mixture-0.011
Token friend
Token ID1545
Feature activation-6.00e-01

Pos logit contributions
aring+0.004
ues+0.003
land+0.003
 appearance+0.003
 rather+0.003
Neg logit contributions
 two-0.003
Y-0.003
 excitement-0.003
 himself-0.002
 mixture-0.002
Token watched
Token ID7342
Feature activation-1.56e+00

Pos logit contributions
 two+0.017
 excitement+0.015
Y+0.015
 himself+0.013
 p+0.013
Neg logit contributions
ues-0.026
aring-0.026
 appearance-0.023
land-0.023
 rather-0.023
Token in
Token ID287
Feature activation+3.61e-01

Pos logit contributions
 two+0.041
 excitement+0.039
Y+0.037
 himself+0.034
 his+0.032
Neg logit contributions
aring-0.070
ues-0.067
 appearance-0.062
land-0.062
ems-0.061
Token amaz
Token ID40642
Feature activation-5.81e+00

Pos logit contributions
aring+0.017
ues+0.017
land+0.015
ems+0.015
 appearance+0.015
Neg logit contributions
 two-0.009
 excitement-0.009
Y-0.008
 himself-0.007
 p-0.007
Tokenement
Token ID972
Feature activation-5.94e-01

Pos logit contributions
 two+0.155
 excitement+0.149
Y+0.149
 himself+0.129
 p+0.122
Neg logit contributions
aring-0.249
ues-0.239
 appearance-0.220
land-0.220
 rather-0.219
Token as
Token ID355
Feature activation-1.13e+00

Pos logit contributions
 two+0.020
 excitement+0.019
Y+0.018
 himself+0.017
 p+0.016
Neg logit contributions
aring-0.021
ues-0.021
 appearance-0.019
 rather-0.019
land-0.018
Token Amelia
Token ID49769
Feature activation+1.79e+00

Pos logit contributions
 two+0.030
Y+0.029
 excitement+0.029
 himself+0.025
 mixture+0.024
Neg logit contributions
aring-0.048
ues-0.047
land-0.044
 appearance-0.043
 rather-0.043
Token gigg
Token ID30442
Feature activation-2.27e+00

Pos logit contributions
aring+0.077
ues+0.075
land+0.070
 appearance+0.069
 rather+0.068
Neg logit contributions
 two-0.046
Y-0.045
 excitement-0.045
 himself-0.038
 mixture-0.036
Tokenled
Token ID992
Feature activation+2.84e-01

Pos logit contributions
 two+0.064
Y+0.063
 excitement+0.062
 himself+0.054
 mixture+0.051
Neg logit contributions
aring-0.093
ues-0.091
land-0.084
 appearance-0.083
 rather-0.082
Token.
Token ID13
Feature activation-3.48e-01

Pos logit contributions
 two+0.037
Y+0.036
 excitement+0.036
 himself+0.032
 p+0.031
Neg logit contributions
aring-0.038
ues-0.037
land-0.034
 appearance-0.033
 rather-0.033
Token They
Token ID1119
Feature activation-5.79e-01

Pos logit contributions
 two+0.010
Y+0.010
 excitement+0.010
 himself+0.009
 p+0.008
Neg logit contributions
aring-0.014
ues-0.014
land-0.013
 appearance-0.013
 rather-0.012
Token were
Token ID547
Feature activation-7.08e-01

Pos logit contributions
 excitement+0.019
Y+0.018
 two+0.017
 himself+0.017
 mixture+0.015
Neg logit contributions
 where-0.019
 rather-0.019
aring-0.018
 why-0.018
ues-0.018
Token eating
Token ID6600
Feature activation-1.41e-01

Pos logit contributions
Y+0.025
 two+0.025
 excitement+0.024
 himself+0.022
 mixture+0.021
Neg logit contributions
aring-0.023
ues-0.023
land-0.021
 rather-0.020
 appearance-0.020
Token y
Token ID331
Feature activation-1.88e+00

Pos logit contributions
 excitement+0.004
Y+0.004
 two+0.004
 himself+0.003
 ah+0.003
Neg logit contributions
ues-0.005
 where-0.005
aring-0.005
land-0.005
 rather-0.005
Tokenummy
Token ID13513
Feature activation+2.65e+00

Pos logit contributions
 two+0.077
 excitement+0.074
Y+0.074
 himself+0.066
 ah+0.066
Neg logit contributions
aring-0.053
ues-0.052
 appearance-0.045
land-0.044
 rather-0.043
Token spaghetti
Token ID48241
Feature activation+9.02e-01

Pos logit contributions
ues+0.096
land+0.095
aring+0.095
 where+0.090
 tale+0.088
Neg logit contributions
Y-0.073
 excitement-0.072
 two-0.070
Cool-0.064
 himself-0.064
Token and
Token ID290
Feature activation-1.64e-01

Pos logit contributions
aring+0.030
ues+0.028
 appearance+0.026
land+0.026
 Spanish+0.025
Neg logit contributions
 two-0.033
Y-0.032
 excitement-0.031
 himself-0.029
 p-0.028
Token meat
Token ID6174
Feature activation-6.32e-01

Pos logit contributions
Y+0.005
 two+0.005
 excitement+0.005
 himself+0.004
Cool+0.004
Neg logit contributions
ues-0.006
 where-0.006
aring-0.006
 rather-0.005
 about-0.005
Tokenballs
Token ID21591
Feature activation+1.39e+00

Pos logit contributions
 two+0.018
Y+0.018
 excitement+0.018
 himself+0.015
 mixture+0.015
Neg logit contributions
aring-0.025
ues-0.025
land-0.023
 appearance-0.022
 rather-0.022
Token.
Token ID13
Feature activation-3.40e-01

Pos logit contributions
aring+0.045
ues+0.044
land+0.040
 rather+0.039
 where+0.039
Neg logit contributions
 two-0.049
Y-0.049
 excitement-0.049
 himself-0.043
 mixture-0.042
Token saw
Token ID2497
Feature activation+1.45e-01

Pos logit contributions
Y+0.015
 excitement+0.014
 two+0.014
 himself+0.012
 mixture+0.012
Neg logit contributions
aring-0.019
ues-0.018
 rather-0.017
land-0.017
 appearance-0.017
Token the
Token ID262
Feature activation+9.89e-02

Pos logit contributions
aring+0.005
ues+0.005
 where+0.005
land+0.004
 rather+0.004
Neg logit contributions
Y-0.005
 excitement-0.005
 two-0.005
 mixture-0.004
 ah-0.004
Token monster
Token ID9234
Feature activation+8.89e-01

Pos logit contributions
 appearance+0.003
aring+0.003
 language+0.003
 where+0.003
 rather+0.003
Neg logit contributions
Y-0.003
 excitement-0.003
 himself-0.002
Mike-0.002
 ah-0.002
Token again
Token ID757
Feature activation-5.06e-01

Pos logit contributions
aring+0.034
ues+0.034
land+0.030
 appearance+0.030
 rather+0.030
Neg logit contributions
 two-0.028
 excitement-0.026
Y-0.026
 himself-0.023
 p-0.023
Token -
Token ID532
Feature activation-1.30e-03

Pos logit contributions
 two+0.017
Y+0.017
 excitement+0.017
 himself+0.015
 p+0.015
Neg logit contributions
aring-0.018
ues-0.017
land-0.016
 appearance-0.015
 rather-0.015
Tokensc
Token ID1416
Feature activation+3.64e+00

Pos logit contributions
Y+0.000
 two+0.000
 excitement+0.000
 himself+0.000
 mixture+0.000
Neg logit contributions
aring-0.000
ues-0.000
land-0.000
 rather-0.000
 appearance-0.000
Tokenur
Token ID333
Feature activation+2.77e-02

Pos logit contributions
aring+0.126
ues+0.121
 where+0.107
 appearance+0.105
 rather+0.105
Neg logit contributions
 two-0.125
Y-0.123
 excitement-0.122
 mixture-0.109
 ah-0.106
Tokenrying
Token ID14992
Feature activation+7.19e-01

Pos logit contributions
aring+0.001
ues+0.001
land+0.001
 appearance+0.001
 rather+0.001
Neg logit contributions
 two-0.001
Y-0.001
 excitement-0.001
 himself-0.001
 p-0.001
Token away
Token ID1497
Feature activation-6.83e-01

Pos logit contributions
aring+0.027
ues+0.027
land+0.024
 appearance+0.024
 rather+0.023
Neg logit contributions
 two-0.023
 excitement-0.022
Y-0.022
 himself-0.020
 p-0.019
Token from
Token ID422
Feature activation-9.09e-01

Pos logit contributions
 two+0.024
Y+0.023
 excitement+0.023
 himself+0.020
 mixture+0.020
Neg logit contributions
aring-0.023
ues-0.023
land-0.021
 appearance-0.020
 rather-0.020
Token the
Token ID262
Feature activation+1.79e-01

Pos logit contributions
 two+0.021
Y+0.021
 excitement+0.020
 himself+0.017
 mixture+0.016
Neg logit contributions
aring-0.041
ues-0.040
land-0.038
 appearance-0.037
 rather-0.037
Token find
Token ID1064
Feature activation+6.02e-01

Pos logit contributions
ues+0.071
aring+0.069
land+0.063
 Spanish+0.062
 Portuguese+0.060
Neg logit contributions
 two-0.052
 excitement-0.048
Y-0.046
 himself-0.043
 p-0.042
Token yourself
Token ID3511
Feature activation+1.70e+00

Pos logit contributions
aring+0.023
ues+0.022
land+0.020
 appearance+0.019
 rather+0.019
Neg logit contributions
 two-0.021
 excitement-0.019
Y-0.018
 himself-0.018
 his-0.017
Token in
Token ID287
Feature activation+8.46e-01

Pos logit contributions
ues+0.066
aring+0.064
 Portuguese+0.056
 Spanish+0.056
 Him+0.054
Neg logit contributions
 two-0.058
 himself-0.051
 excitement-0.049
 his-0.047
 p-0.047
Token a
Token ID257
Feature activation-1.86e-02

Pos logit contributions
aring+0.038
ues+0.037
land+0.033
ems+0.033
 Portuguese+0.031
Neg logit contributions
 two-0.026
 excitement-0.023
 himself-0.021
Y-0.020
 his-0.020
Token dangerous
Token ID4923
Feature activation+1.21e-01

Pos logit contributions
 two+0.001
 excitement+0.001
Y+0.001
 p+0.000
 himself+0.000
Neg logit contributions
aring-0.001
ues-0.001
land-0.001
ems-0.001
 appearance-0.001
Token situation
Token ID3074
Feature activation+2.70e+00

Pos logit contributions
ues+0.006
aring+0.006
ems+0.006
 Him+0.005
land+0.005
Neg logit contributions
 two-0.003
 excitement-0.003
 p-0.002
 giant-0.002
 himself-0.002
Token,
Token ID11
Feature activation+4.97e-01

Pos logit contributions
ues+0.116
aring+0.111
 Portuguese+0.105
 Spanish+0.101
 Him+0.101
Neg logit contributions
 two-0.078
 himself-0.066
 excitement-0.066
 his-0.062
 p-0.061
Token it
Token ID340
Feature activation+1.70e-01

Pos logit contributions
aring+0.017
ues+0.017
land+0.015
 appearance+0.015
ems+0.014
Neg logit contributions
 two-0.019
 excitement-0.018
Y-0.017
 himself-0.016
 his-0.015
Token's
Token ID338
Feature activation+2.21e+00

Pos logit contributions
ues+0.006
aring+0.005
land+0.005
 Spanish+0.005
 appearance+0.005
Neg logit contributions
 two-0.007
 excitement-0.006
Y-0.006
 himself-0.006
 p-0.006
Token important
Token ID1593
Feature activation+1.36e+00

Pos logit contributions
ues+0.092
aring+0.087
land+0.077
ems+0.074
ute+0.073
Neg logit contributions
 two-0.080
 himself-0.068
 his-0.066
 excitement-0.065
 p-0.060
Token to
Token ID284
Feature activation+2.29e-01

Pos logit contributions
ues+0.048
aring+0.046
itionally+0.043
 Him+0.042
ems+0.042
Neg logit contributions
 two-0.050
 himself-0.046
 excitement-0.045
 his-0.043
 p-0.041
Token was
Token ID373
Feature activation+3.83e-01

Pos logit contributions
land+0.005
ues+0.005
aring+0.005
 about+0.005
 where+0.004
Neg logit contributions
 excitement-0.005
Y-0.005
 two-0.005
 ah-0.005
 Lego-0.005
Token time
Token ID640
Feature activation+1.04e+00

Pos logit contributions
aring+0.013
ues+0.013
land+0.012
 rather+0.012
 where+0.012
Neg logit contributions
Y-0.013
 two-0.013
 excitement-0.012
 himself-0.011
 ah-0.011
Token for
Token ID329
Feature activation+5.97e-01

Pos logit contributions
aring+0.033
ues+0.031
 appearance+0.030
 Spanish+0.029
land+0.028
Neg logit contributions
 two-0.041
 excitement-0.039
Y-0.036
 whole-0.035
 himself-0.035
Token bed
Token ID3996
Feature activation-3.38e-01

Pos logit contributions
ues+0.024
aring+0.024
land+0.024
 where+0.023
 appearance+0.023
Neg logit contributions
Y-0.015
 two-0.014
 excitement-0.014
 mixture-0.013
 himself-0.012
Token.
Token ID13
Feature activation+1.96e-02

Pos logit contributions
 two+0.010
Y+0.009
 excitement+0.009
 p+0.008
 himself+0.008
Neg logit contributions
aring-0.014
ues-0.013
 Spanish-0.013
 appearance-0.013
 English-0.012
Token Patty
Token ID42417
Feature activation+2.59e+00

Pos logit contributions
aring+0.001
ues+0.001
land+0.001
 appearance+0.001
 rather+0.001
Neg logit contributions
 two-0.001
Y-0.001
 excitement-0.001
 himself-0.001
 p-0.000
Token was
Token ID373
Feature activation+3.67e-01

Pos logit contributions
aring+0.111
ues+0.109
land+0.100
 appearance+0.100
 rather+0.097
Neg logit contributions
 two-0.069
Y-0.066
 excitement-0.066
 himself-0.056
 p-0.055
Token so
Token ID523
Feature activation+1.22e+00

Pos logit contributions
aring+0.013
ues+0.013
land+0.012
 appearance+0.012
ems+0.011
Neg logit contributions
 two-0.013
 excitement-0.012
Y-0.012
 himself-0.011
 p-0.010
Token happy
Token ID3772
Feature activation-4.71e-01

Pos logit contributions
aring+0.049
ues+0.047
land+0.044
 rather+0.043
 appearance+0.043
Neg logit contributions
 two-0.035
Y-0.035
 excitement-0.034
 himself-0.030
 mixture-0.029
Token and
Token ID290
Feature activation+1.43e-01

Pos logit contributions
 two+0.018
 excitement+0.017
Y+0.017
 himself+0.016
 p+0.015
Neg logit contributions
ues-0.015
aring-0.015
 appearance-0.013
 Spanish-0.013
ems-0.013
Token fell
Token ID3214
Feature activation+9.16e-01

Pos logit contributions
aring+0.006
ues+0.005
 rather+0.005
 where+0.005
land+0.005
Neg logit contributions
 two-0.004
Y-0.004
 excitement-0.004
 himself-0.003
 mixture-0.003
Token head
Token ID1182
Feature activation+1.49e-02

Pos logit contributions
aring+0.075
ues+0.074
land+0.067
 appearance+0.066
 rather+0.065
Neg logit contributions
 two-0.054
 excitement-0.053
Y-0.052
 himself-0.046
 mixture-0.043
Token.
Token ID13
Feature activation-2.09e-01

Pos logit contributions
aring+0.001
ues+0.001
land+0.000
 appearance+0.000
 rather+0.000
Neg logit contributions
 two-0.001
Y-0.000
 excitement-0.000
 himself-0.000
 mixture-0.000
Token
Token ID198
Feature activation+8.60e-02

Pos logit contributions
 two+0.007
Y+0.006
 excitement+0.006
 himself+0.006
 p+0.005
Neg logit contributions
aring-0.008
ues-0.008
land-0.007
 appearance-0.007
 rather-0.007
Token
Token ID198
Feature activation+6.89e-02

Pos logit contributions
aring+0.003
ues+0.003
 appearance+0.003
land+0.003
 rather+0.003
Neg logit contributions
 two-0.003
Y-0.003
 excitement-0.003
 himself-0.002
 p-0.002
TokenM
Token ID44
Feature activation-1.91e+00

Pos logit contributions
aring+0.003
ues+0.003
land+0.002
 rather+0.002
 where+0.002
Neg logit contributions
 two-0.002
 excitement-0.002
Y-0.002
 himself-0.002
 mixture-0.002
Tokenummy
Token ID13513
Feature activation+2.68e+00

Pos logit contributions
 excitement+0.080
 two+0.080
Y+0.079
 himself+0.073
 ah+0.072
Neg logit contributions
aring-0.053
ues-0.049
land-0.044
 Him-0.041
 rather-0.041
Token:
Token ID25
Feature activation+1.78e+00

Pos logit contributions
aring+0.104
ues+0.101
land+0.094
 appearance+0.092
 rather+0.091
Neg logit contributions
 two-0.079
Y-0.078
 excitement-0.077
 himself-0.067
 mixture-0.064
Token "
Token ID366
Feature activation-4.91e-01

Pos logit contributions
aring+0.061
ues+0.058
 appearance+0.053
land+0.052
 rather+0.050
Neg logit contributions
 two-0.068
Y-0.066
 excitement-0.064
 himself-0.058
 p-0.057
TokenThank
Token ID10449
Feature activation-1.12e+00

Pos logit contributions
 excitement+0.013
 two+0.013
Y+0.012
 himself+0.011
 p+0.011
Neg logit contributions
aring-0.020
ues-0.019
 rather-0.018
land-0.018
 Spanish-0.018
Token you
Token ID345
Feature activation+2.22e-01

Pos logit contributions
 two+0.047
 excitement+0.046
Y+0.046
 himself+0.042
 mixture+0.041
Neg logit contributions
aring-0.030
ues-0.028
land-0.025
 rather-0.025
 appearance-0.024
Token,
Token ID11
Feature activation+2.05e-01

Pos logit contributions
aring+0.008
ues+0.008
 appearance+0.007
land+0.007
ems+0.006
Neg logit contributions
 two-0.008
 excitement-0.008
Y-0.007
 himself-0.007
 p-0.007
Token happy
Token ID3772
Feature activation-6.78e-01

Pos logit contributions
aring+0.031
 rather+0.028
ues+0.028
 where+0.027
 about+0.026
Neg logit contributions
Y-0.038
 two-0.037
 excitement-0.035
 mixture-0.035
Carl-0.034
Token and
Token ID290
Feature activation+9.27e-02

Pos logit contributions
 two+0.026
 excitement+0.025
Y+0.025
 himself+0.024
 his+0.023
Neg logit contributions
aring-0.021
ues-0.021
 appearance-0.018
land-0.018
 Spanish-0.018
Token put
Token ID1234
Feature activation-5.81e-01

Pos logit contributions
aring+0.003
 where+0.003
ues+0.003
 rather+0.003
 about+0.003
Neg logit contributions
 two-0.003
Y-0.003
 excitement-0.003
 mixture-0.002
Carl-0.002
Token on
Token ID319
Feature activation+6.77e-01

Pos logit contributions
Y+0.021
 excitement+0.020
 two+0.020
 mixture+0.018
Carl+0.018
Neg logit contributions
 where-0.017
ues-0.016
aring-0.015
 about-0.015
 why-0.015
Token his
Token ID465
Feature activation-5.17e-02

Pos logit contributions
 where+0.033
ues+0.031
aring+0.031
 about+0.030
 rather+0.030
Neg logit contributions
Y-0.012
 two-0.011
 excitement-0.010
 ah-0.010
 mixture-0.009
Token lucky
Token ID9670
Feature activation-3.79e-01

Pos logit contributions
Y+0.001
 excitement+0.001
 two+0.001
 ah+0.001
Mike+0.001
Neg logit contributions
ues-0.002
aring-0.002
 where-0.002
 appearance-0.002
land-0.002
Token sock
Token ID32263
Feature activation+3.59e-01

Pos logit contributions
Y+0.010
 two+0.009
 excitement+0.009
 ah+0.008
 himself+0.008
Neg logit contributions
aring-0.016
ues-0.015
 appearance-0.015
land-0.015
 where-0.014
Token.
Token ID13
Feature activation-1.55e-01

Pos logit contributions
aring+0.013
ues+0.012
land+0.011
 appearance+0.011
 rather+0.011
Neg logit contributions
 two-0.012
Y-0.012
 excitement-0.012
 himself-0.010
 p-0.010
Token
Token ID198
Feature activation+1.07e-01

Pos logit contributions
 two+0.005
 excitement+0.005
Y+0.005
 mixture+0.004
 ah+0.004
Neg logit contributions
aring-0.006
 where-0.005
ues-0.005
 about-0.005
 rather-0.005
Token
Token ID198
Feature activation+4.97e-02

Pos logit contributions
aring+0.004
ues+0.004
 appearance+0.004
land+0.004
 rather+0.004
Neg logit contributions
 two-0.003
Y-0.003
 excitement-0.003
 himself-0.003
 his-0.003
TokenBut
Token ID1537
Feature activation-1.67e+00

Pos logit contributions
aring+0.002
ues+0.002
 where+0.001
 rather+0.001
 about+0.001
Neg logit contributions
 excitement-0.002
 two-0.002
Y-0.002
 mixture-0.002
 ah-0.002
Token work
Token ID670
Feature activation+6.90e-02

Pos logit contributions
aring+0.004
ues+0.004
land+0.004
 appearance+0.004
 rather+0.004
Neg logit contributions
 two-0.003
Y-0.003
 excitement-0.003
 himself-0.003
 mixture-0.002
Token.
Token ID13
Feature activation+4.79e-01

Pos logit contributions
aring+0.003
 appearance+0.002
ues+0.002
land+0.002
itionally+0.002
Neg logit contributions
 two-0.002
 himself-0.002
 excitement-0.002
Y-0.002
 his-0.002
Token Can
Token ID1680
Feature activation+7.89e-01

Pos logit contributions
aring+0.015
ues+0.014
land+0.013
 appearance+0.013
 rather+0.012
Neg logit contributions
 two-0.019
Y-0.018
 excitement-0.018
 himself-0.016
 p-0.016
Token you
Token ID345
Feature activation+4.75e-01

Pos logit contributions
aring+0.025
ues+0.025
land+0.023
 appearance+0.022
 rather+0.021
Neg logit contributions
 two-0.031
Y-0.030
 excitement-0.028
 himself-0.027
 p-0.026
Token wait
Token ID4043
Feature activation-2.05e+00

Pos logit contributions
aring+0.020
ues+0.020
land+0.018
 rather+0.018
 appearance+0.018
Neg logit contributions
 two-0.012
Y-0.012
 excitement-0.012
 himself-0.010
 mixture-0.010
Token a
Token ID257
Feature activation-4.36e-01

Pos logit contributions
 two+0.074
 excitement+0.064
 his+0.061
 p+0.061
 himself+0.060
Neg logit contributions
aring-0.085
 tale-0.071
ues-0.068
 language-0.068
rots-0.068
Token little
Token ID1310
Feature activation+2.64e-01

Pos logit contributions
 two+0.012
Y+0.011
 excitement+0.011
 p+0.009
 himself+0.009
Neg logit contributions
aring-0.019
ues-0.018
land-0.017
 appearance-0.017
 rather-0.016
Token bit
Token ID1643
Feature activation-4.75e-01

Pos logit contributions
aring+0.011
ues+0.011
land+0.010
 appearance+0.010
ems+0.010
Neg logit contributions
 two-0.007
 excitement-0.007
Y-0.007
 p-0.006
 mixture-0.006
Token?"
Token ID1701
Feature activation+1.13e+00

Pos logit contributions
 two+0.019
 p+0.016
 excitement+0.016
Y+0.015
 himself+0.015
Neg logit contributions
aring-0.015
ues-0.015
 appearance-0.014
 culture-0.013
land-0.013
Token
Token ID198
Feature activation+1.00e-01

Pos logit contributions
aring+0.042
ues+0.041
land+0.038
 appearance+0.037
 rather+0.037
Neg logit contributions
 two-0.035
Y-0.035
 excitement-0.034
 himself-0.030
 mixture-0.029
Token
Token ID198
Feature activation+4.52e-02

Pos logit contributions
aring+0.004
ues+0.004
 appearance+0.004
land+0.004
 rather+0.004
Neg logit contributions
 two-0.003
Y-0.003
 excitement-0.003
 himself-0.003
 p-0.002
Token our
Token ID674
Feature activation+5.48e-01

Pos logit contributions
aring+0.088
ues+0.087
ems+0.078
uing+0.076
ows+0.075
Neg logit contributions
 two-0.032
 excitement-0.024
 p-0.023
 his-0.023
 other-0.021
Token home
Token ID1363
Feature activation-1.90e+00

Pos logit contributions
aring+0.023
ues+0.022
land+0.020
 appearance+0.019
 rather+0.019
Neg logit contributions
 two-0.016
 excitement-0.016
Y-0.015
 himself-0.013
 p-0.013
Token more
Token ID517
Feature activation-8.49e-01

Pos logit contributions
 two+0.065
 excitement+0.057
Y+0.056
 himself+0.054
 p+0.053
Neg logit contributions
aring-0.072
ues-0.069
 appearance-0.065
ems-0.061
 Spanish-0.061
Token.
Token ID13
Feature activation+7.49e-01

Pos logit contributions
 two+0.031
 excitement+0.030
Y+0.028
 p+0.026
 himself+0.026
Neg logit contributions
aring-0.029
ues-0.028
 appearance-0.025
ems-0.024
 rather-0.024
Token Can
Token ID1680
Feature activation+8.53e-01

Pos logit contributions
ues+0.028
aring+0.028
 appearance+0.026
land+0.025
ems+0.024
Neg logit contributions
 two-0.025
Y-0.024
 excitement-0.023
 himself-0.021
 p-0.019
Token we
Token ID356
Feature activation+6.53e-01

Pos logit contributions
ues+0.029
aring+0.029
land+0.027
 appearance+0.025
 tale+0.024
Neg logit contributions
 two-0.033
Y-0.031
 excitement-0.029
 himself-0.028
 p-0.028
Token go
Token ID467
Feature activation+8.01e-01

Pos logit contributions
aring+0.023
ues+0.022
land+0.021
 rather+0.021
 appearance+0.020
Neg logit contributions
 two-0.021
Y-0.021
 excitement-0.021
 himself-0.019
 mixture-0.018
Token back
Token ID736
Feature activation+1.01e+00

Pos logit contributions
aring+0.035
ues+0.034
 appearance+0.031
ems+0.030
 Spanish+0.030
Neg logit contributions
 two-0.024
 p-0.020
 himself-0.020
 his-0.020
 excitement-0.019
Token now
Token ID783
Feature activation+2.39e-01

Pos logit contributions
aring+0.037
ues+0.034
 appearance+0.030
 rather+0.029
ems+0.028
Neg logit contributions
 two-0.040
Y-0.034
 excitement-0.033
 p-0.033
 his-0.032
Token?"
Token ID1701
Feature activation+1.45e+00

Pos logit contributions
aring+0.007
ues+0.006
 Spanish+0.005
 rather+0.005
 appearance+0.005
Neg logit contributions
 two-0.011
 p-0.010
 excitement-0.010
 his-0.010
Y-0.010
Token
Token ID198
Feature activation+5.29e-01

Pos logit contributions
aring+0.061
ues+0.060
 appearance+0.056
land+0.056
 rather+0.053
Neg logit contributions
 two-0.042
Y-0.039
 excitement-0.039
 himself-0.035
 p-0.033
Token a
Token ID257
Feature activation-7.32e-01

Pos logit contributions
aring+0.001
 where+0.001
 rather+0.001
ues+0.001
land+0.001
Neg logit contributions
Y-0.001
 excitement-0.001
 two-0.001
 mixture-0.001
 ah-0.001
Token time
Token ID640
Feature activation+6.58e-01

Pos logit contributions
 two+0.021
Y+0.021
 excitement+0.020
 p+0.017
 himself+0.017
Neg logit contributions
aring-0.030
ues-0.029
land-0.027
 appearance-0.027
 Spanish-0.026
Token,
Token ID11
Feature activation+1.57e-01

Pos logit contributions
aring+0.022
ues+0.022
land+0.020
 appearance+0.020
 rather+0.019
Neg logit contributions
 two-0.024
Y-0.023
 excitement-0.022
 himself-0.020
 p-0.020
Token there
Token ID612
Feature activation-1.07e+00

Pos logit contributions
aring+0.006
ues+0.006
 appearance+0.005
land+0.005
 rather+0.005
Neg logit contributions
 two-0.005
Y-0.005
 excitement-0.005
 himself-0.004
 p-0.004
Token was
Token ID373
Feature activation+1.59e-02

Pos logit contributions
Y+0.031
 excitement+0.031
 two+0.030
 mixture+0.027
 ah+0.026
Neg logit contributions
aring-0.039
ues-0.038
 where-0.038
land-0.036
 rather-0.036
Token a
Token ID257
Feature activation-7.11e-01

Pos logit contributions
 where+0.001
 rather+0.001
 about+0.000
 Spanish+0.000
ues+0.000
Neg logit contributions
Y-0.000
 excitement-0.000
 ah-0.000
 mixture-0.000
hetically-0.000
Token kid
Token ID5141
Feature activation-1.26e+00

Pos logit contributions
 two+0.017
 excitement+0.016
Y+0.016
 himself+0.014
 mixture+0.013
Neg logit contributions
aring-0.032
ues-0.031
land-0.029
 rather-0.029
 appearance-0.029
Token named
Token ID3706
Feature activation+1.02e-01

Pos logit contributions
 two+0.046
 p+0.042
Y+0.042
 his+0.040
 excitement+0.039
Neg logit contributions
ues-0.043
aring-0.040
 appearance-0.040
land-0.038
ems-0.038
Token Jack
Token ID3619
Feature activation+6.42e-01

Pos logit contributions
aring+0.005
 where+0.004
ues+0.004
 appearance+0.004
 Spanish+0.004
Neg logit contributions
 excitement-0.002
Y-0.002
 two-0.002
 mixture-0.002
 ah-0.002
Token.
Token ID13
Feature activation-3.11e-01

Pos logit contributions
ues+0.030
aring+0.029
 appearance+0.026
land+0.025
 Portuguese+0.025
Neg logit contributions
 two-0.018
 his-0.016
 himself-0.015
 p-0.015
Y-0.014
Token He
Token ID679
Feature activation-8.82e-01

Pos logit contributions
 two+0.008
Y+0.007
 excitement+0.007
 mixture+0.007
 delic+0.006
Neg logit contributions
aring-0.013
 where-0.012
ues-0.012
 rather-0.012
 Spanish-0.012
Token the
Token ID262
Feature activation+1.32e-02

Pos logit contributions
Y+0.026
 two+0.025
 excitement+0.025
 mixture+0.021
 ah+0.021
Neg logit contributions
ues-0.041
aring-0.040
 where-0.040
 rather-0.039
 appearance-0.039
Token tree
Token ID5509
Feature activation+4.44e-01

Pos logit contributions
aring+0.001
ues+0.000
 appearance+0.000
 where+0.000
 rather+0.000
Neg logit contributions
Y-0.000
 excitement-0.000
 two-0.000
 himself-0.000
 ah-0.000
Token.
Token ID13
Feature activation-3.92e-01

Pos logit contributions
aring+0.016
ues+0.016
 appearance+0.015
land+0.014
ems+0.014
Neg logit contributions
 two-0.014
 excitement-0.013
Y-0.013
 himself-0.012
 p-0.012
Token This
Token ID770
Feature activation+1.11e+00

Pos logit contributions
 two+0.013
Y+0.012
 excitement+0.012
 himself+0.011
 p+0.010
Neg logit contributions
aring-0.015
ues-0.014
land-0.013
 appearance-0.013
 rather-0.013
Token time
Token ID640
Feature activation+6.84e-01

Pos logit contributions
aring+0.047
ues+0.046
land+0.042
 appearance+0.041
 Him+0.040
Neg logit contributions
 two-0.033
 excitement-0.031
Y-0.031
 mixture-0.026
 p-0.026
Token,
Token ID11
Feature activation+1.84e-01

Pos logit contributions
aring+0.025
ues+0.024
land+0.022
 appearance+0.022
 rather+0.021
Neg logit contributions
 two-0.023
 excitement-0.022
Y-0.022
 himself-0.019
 p-0.019
Token she
Token ID673
Feature activation-1.87e-01

Pos logit contributions
aring+0.008
ues+0.007
land+0.007
 appearance+0.007
 rather+0.007
Neg logit contributions
 two-0.005
Y-0.005
 excitement-0.005
 himself-0.004
 mixture-0.004
Token made
Token ID925
Feature activation+8.77e-01

Pos logit contributions
 two+0.005
Y+0.005
 excitement+0.005
 himself+0.004
 mixture+0.004
Neg logit contributions
aring-0.008
ues-0.007
land-0.007
 rather-0.007
 where-0.007
Token sure
Token ID1654
Feature activation+5.80e-01

Pos logit contributions
ues+0.031
aring+0.030
land+0.029
 where+0.028
 rather+0.028
Neg logit contributions
Y-0.027
 two-0.026
 excitement-0.025
Carl-0.023
 ah-0.023
Token to
Token ID284
Feature activation-7.19e-02

Pos logit contributions
aring+0.017
ues+0.017
land+0.014
 appearance+0.014
ems+0.013
Neg logit contributions
 two-0.024
 excitement-0.023
Y-0.022
 himself-0.021
 his-0.021
Token hold
Token ID1745
Feature activation+2.08e-01

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 mixture+0.002
 ah+0.002
Neg logit contributions
aring-0.003
ues-0.002
 rather-0.002
 appearance-0.002
 where-0.002
Token the
Token ID262
Feature activation-5.00e-02

Pos logit contributions
 where+0.020
ues+0.020
 rather+0.019
land+0.019
 Spanish+0.019
Neg logit contributions
Y-0.009
 two-0.008
 excitement-0.008
 ah-0.008
Carl-0.008
Token new
Token ID649
Feature activation+2.05e-01

Pos logit contributions
Y+0.001
 two+0.001
 excitement+0.001
 himself+0.001
Carl+0.001
Neg logit contributions
aring-0.002
ues-0.002
land-0.002
 appearance-0.002
 where-0.002
Token fashion
Token ID6977
Feature activation-1.22e-01

Pos logit contributions
aring+0.008
ues+0.008
land+0.008
 appearance+0.008
 where+0.007
Neg logit contributions
Y-0.006
 two-0.006
 excitement-0.005
 himself-0.005
 ah-0.004
Token and
Token ID290
Feature activation-1.50e-01

Pos logit contributions
 two+0.004
Y+0.004
 excitement+0.004
 himself+0.003
 p+0.003
Neg logit contributions
aring-0.005
ues-0.005
 appearance-0.004
land-0.004
 rather-0.004
Token started
Token ID2067
Feature activation-1.12e+00

Pos logit contributions
Y+0.004
 two+0.004
 excitement+0.004
Carl+0.003
 himself+0.003
Neg logit contributions
ues-0.006
 where-0.006
aring-0.006
 rather-0.005
land-0.005
Token down
Token ID866
Feature activation-8.18e-01

Pos logit contributions
 two+0.041
Y+0.041
 excitement+0.040
 himself+0.036
 mixture+0.035
Neg logit contributions
aring-0.036
ues-0.035
land-0.032
 appearance-0.031
 rather-0.031
Token the
Token ID262
Feature activation+1.93e-02

Pos logit contributions
Y+0.024
 two+0.023
 excitement+0.023
Carl+0.020
 mixture+0.020
Neg logit contributions
ues-0.030
aring-0.030
 where-0.029
land-0.029
 rather-0.028
Token hill
Token ID12788
Feature activation+1.31e+00

Pos logit contributions
aring+0.001
land+0.001
ues+0.001
 rather+0.001
 appearance+0.001
Neg logit contributions
Y-0.001
 excitement-0.001
 two-0.001
 ah-0.000
 himself-0.000
Token.
Token ID13
Feature activation-2.83e-01

Pos logit contributions
aring+0.044
ues+0.043
land+0.039
 rather+0.038
 appearance+0.038
Neg logit contributions
 two-0.046
Y-0.046
 excitement-0.045
 himself-0.040
 mixture-0.039
Token Everyone
Token ID11075
Feature activation-1.26e+00

Pos logit contributions
Y+0.009
 two+0.008
 excitement+0.008
 mixture+0.007
 p+0.007
Neg logit contributions
aring-0.011
ues-0.010
 rather-0.009
 where-0.009
 Spanish-0.009
Token in
Token ID287
Feature activation+4.70e-01

Pos logit contributions
Y+0.039
 two+0.037
 excitement+0.036
 himself+0.032
 mixture+0.031
Neg logit contributions
aring-0.047
ues-0.045
 where-0.043
 about-0.042
 rather-0.041
Token had
Token ID550
Feature activation+1.01e+00

Pos logit contributions
 excitement+0.000
Y+0.000
 two+0.000
 ah+0.000
 himself+0.000
Neg logit contributions
aring-0.001
ues-0.001
 rather-0.001
land-0.001
 where-0.001
Token gone
Token ID3750
Feature activation+1.46e+00

Pos logit contributions
aring+0.042
ues+0.041
land+0.039
 appearance+0.038
 rather+0.038
Neg logit contributions
 two-0.027
Y-0.027
 excitement-0.026
 himself-0.022
 mixture-0.022
Token on
Token ID319
Feature activation+5.20e-01

Pos logit contributions
aring+0.064
ues+0.063
ems+0.055
 Him+0.055
 appearance+0.053
Neg logit contributions
 two-0.042
 himself-0.038
 excitement-0.037
 his-0.037
 p-0.034
Token a
Token ID257
Feature activation-4.14e-01

Pos logit contributions
aring+0.022
ues+0.022
land+0.021
 appearance+0.021
 rather+0.021
Neg logit contributions
Y-0.013
 two-0.013
 excitement-0.012
 mixture-0.010
 himself-0.010
Token journey
Token ID7002
Feature activation+2.47e+00

Pos logit contributions
 two+0.011
 excitement+0.010
Y+0.010
 p+0.009
 whole+0.009
Neg logit contributions
aring-0.019
ues-0.019
ems-0.017
 Him-0.016
 appearance-0.016
Token with
Token ID351
Feature activation-9.05e-01

Pos logit contributions
aring+0.101
ues+0.100
 appearance+0.093
ems+0.091
 Him+0.089
Neg logit contributions
 two-0.070
 excitement-0.063
 himself-0.062
Y-0.057
 his-0.057
Token Santa
Token ID8909
Feature activation+7.19e-01

Pos logit contributions
Y+0.025
 two+0.023
 excitement+0.022
 mixture+0.020
 himself+0.019
Neg logit contributions
aring-0.036
ues-0.036
 where-0.034
 rather-0.034
 appearance-0.034
Token and
Token ID290
Feature activation-1.21e-01

Pos logit contributions
ues+0.026
aring+0.026
 appearance+0.023
 rather+0.023
 Spanish+0.022
Neg logit contributions
 two-0.024
 excitement-0.022
Y-0.021
 himself-0.020
 p-0.019
Token how
Token ID703
Feature activation-1.35e+00

Pos logit contributions
 two+0.003
 excitement+0.003
Y+0.003
 himself+0.002
 p+0.002
Neg logit contributions
aring-0.005
ues-0.005
land-0.005
 appearance-0.005
 rather-0.005
Token it
Token ID340
Feature activation+3.67e-02

Pos logit contributions
 two+0.038
 excitement+0.035
Y+0.035
 himself+0.032
 his+0.031
Neg logit contributions
ues-0.057
aring-0.056
 appearance-0.052
land-0.052
ems-0.050
Token had
Token ID550
Feature activation+1.15e+00

Pos logit contributions
aring+0.001
ues+0.001
 rather+0.001
 where+0.001
land+0.001
Neg logit contributions
 excitement-0.001
Y-0.001
 two-0.001
 himself-0.001
 mixture-0.001
Token with
Token ID351
Feature activation-7.52e-01

Pos logit contributions
aring+0.004
ues+0.004
land+0.003
 appearance+0.003
 rather+0.003
Neg logit contributions
 two-0.004
Y-0.004
 excitement-0.004
 himself-0.003
 p-0.003
Token the
Token ID262
Feature activation-1.62e-01

Pos logit contributions
Y+0.020
 two+0.017
 mixture+0.016
 excitement+0.016
Carl+0.016
Neg logit contributions
 where-0.029
ues-0.028
aring-0.028
 rather-0.028
 Spanish-0.028
Token leaves
Token ID5667
Feature activation-1.56e-01

Pos logit contributions
Y+0.004
 excitement+0.004
 two+0.004
 himself+0.003
Mike+0.003
Neg logit contributions
aring-0.007
ues-0.006
 appearance-0.006
 where-0.006
 rather-0.006
Token.
Token ID13
Feature activation+1.51e-01

Pos logit contributions
 two+0.005
 excitement+0.005
Y+0.005
 himself+0.005
 p+0.004
Neg logit contributions
aring-0.006
ues-0.005
 appearance-0.005
land-0.005
 Spanish-0.005
Token He
Token ID679
Feature activation-4.83e-01

Pos logit contributions
aring+0.006
ues+0.006
land+0.006
 where+0.006
 rather+0.006
Neg logit contributions
 two-0.004
Y-0.004
 excitement-0.004
 mixture-0.003
 himself-0.003
Token did
Token ID750
Feature activation-8.18e-01

Pos logit contributions
 excitement+0.013
Y+0.013
 two+0.013
 himself+0.011
 mixture+0.011
Neg logit contributions
aring-0.019
ues-0.019
 rather-0.018
 where-0.017
land-0.017
Token not
Token ID407
Feature activation-7.62e-02

Pos logit contributions
 two+0.036
Y+0.035
 excitement+0.034
 himself+0.033
 p+0.031
Neg logit contributions
aring-0.022
ues-0.020
land-0.018
 appearance-0.018
ems-0.017
Token see
Token ID766
Feature activation+1.65e+00

Pos logit contributions
 two+0.003
Y+0.003
 excitement+0.003
 himself+0.002
 p+0.002
Neg logit contributions
aring-0.003
ues-0.003
land-0.002
 appearance-0.002
 rather-0.002
Token them
Token ID606
Feature activation+1.32e-01

Pos logit contributions
aring+0.077
ues+0.074
land+0.069
ems+0.067
 rather+0.066
Neg logit contributions
 two-0.042
 excitement-0.037
Y-0.036
 himself-0.035
 his-0.034
Token.
Token ID13
Feature activation+2.41e-01

Pos logit contributions
ues+0.004
aring+0.004
land+0.004
 Spanish+0.004
 appearance+0.004
Neg logit contributions
 two-0.005
 excitement-0.004
Y-0.004
 himself-0.004
 p-0.004
Token He
Token ID679
Feature activation-4.85e-01

Pos logit contributions
aring+0.011
ues+0.010
land+0.010
 where+0.009
 rather+0.009
Neg logit contributions
 two-0.006
Y-0.006
 excitement-0.006
 mixture-0.005
 himself-0.005
Token They
Token ID1119
Feature activation-4.44e-01

Pos logit contributions
Y+0.011
 two+0.011
 excitement+0.011
 mixture+0.010
 himself+0.009
Neg logit contributions
aring-0.012
ues-0.011
land-0.010
 where-0.010
 rather-0.010
Token were
Token ID547
Feature activation-4.22e-01

Pos logit contributions
Y+0.013
 excitement+0.012
 two+0.012
 mixture+0.012
 himself+0.011
Neg logit contributions
 where-0.016
ues-0.015
aring-0.015
 rather-0.015
 about-0.015
Token amazed
Token ID24562
Feature activation-1.34e+00

Pos logit contributions
Y+0.016
 two+0.015
 excitement+0.015
 himself+0.014
 mixture+0.014
Neg logit contributions
aring-0.013
ues-0.012
 where-0.011
 rather-0.011
land-0.011
Token at
Token ID379
Feature activation-1.46e+00

Pos logit contributions
 two+0.046
 excitement+0.044
Y+0.042
 himself+0.039
 his+0.038
Neg logit contributions
ues-0.048
aring-0.048
 appearance-0.042
ems-0.042
land-0.041
Token how
Token ID703
Feature activation-1.35e+00

Pos logit contributions
 two+0.058
 excitement+0.055
Y+0.052
 himself+0.050
 p+0.048
Neg logit contributions
aring-0.047
ues-0.045
land-0.040
 appearance-0.039
ems-0.038
Token tall
Token ID7331
Feature activation-1.09e+00

Pos logit contributions
Y+0.040
 two+0.040
 excitement+0.039
 himself+0.035
 mixture+0.034
Neg logit contributions
aring-0.052
ues-0.050
land-0.046
 where-0.046
 rather-0.045
Token and
Token ID290
Feature activation-1.08e-01

Pos logit contributions
Y+0.028
 two+0.028
 excitement+0.028
 himself+0.024
 mixture+0.023
Neg logit contributions
aring-0.045
ues-0.045
land-0.042
 where-0.041
 appearance-0.041
Token beautiful
Token ID4950
Feature activation+1.58e-01

Pos logit contributions
 two+0.003
Y+0.003
 excitement+0.003
 himself+0.003
 mixture+0.003
Neg logit contributions
aring-0.004
ues-0.004
 where-0.003
land-0.003
 rather-0.003
Token it
Token ID340
Feature activation+2.07e-02

Pos logit contributions
aring+0.006
ues+0.006
 appearance+0.005
 rather+0.005
land+0.005
Neg logit contributions
 two-0.005
 excitement-0.005
Y-0.005
 himself-0.004
 p-0.004
Token was
Token ID373
Feature activation+1.99e-01

Pos logit contributions
aring+0.001
ues+0.001
 where+0.001
land+0.001
 rather+0.001
Neg logit contributions
Y-0.001
 excitement-0.001
 two-0.001
 himself-0.001
 ah-0.001
Token.
Token ID13
Feature activation-2.44e-01

Pos logit contributions
aring+0.008
ues+0.008
 appearance+0.007
ems+0.007
 Spanish+0.007
Neg logit contributions
 two-0.006
 excitement-0.006
Y-0.006
 himself-0.005
 p-0.005
Token hard
Token ID1327
Feature activation+4.41e-01

Pos logit contributions
 two+0.030
 excitement+0.028
Y+0.028
 himself+0.024
 p+0.024
Neg logit contributions
aring-0.039
ues-0.038
land-0.035
 appearance-0.034
 Spanish-0.033
Token to
Token ID284
Feature activation-1.40e-02

Pos logit contributions
aring+0.016
ues+0.016
 Portuguese+0.015
 Spanish+0.015
 appearance+0.014
Neg logit contributions
 two-0.015
 excitement-0.013
 himself-0.013
 his-0.012
Y-0.012
Token eat
Token ID4483
Feature activation-6.71e-01

Pos logit contributions
 two+0.000
Y+0.000
 excitement+0.000
 himself+0.000
 mixture+0.000
Neg logit contributions
aring-0.001
ues-0.000
land-0.000
 appearance-0.000
 rather-0.000
Token.
Token ID13
Feature activation+2.95e-02

Pos logit contributions
 two+0.019
Y+0.018
 excitement+0.017
 himself+0.017
 p+0.016
Neg logit contributions
aring-0.030
ues-0.028
land-0.025
 appearance-0.025
 Spanish-0.025
Token<|endoftext|>
Token ID50256
Feature activation+1.04e+00

Pos logit contributions
aring+0.001
ues+0.001
 appearance+0.001
land+0.001
 rather+0.001
Neg logit contributions
 two-0.001
Y-0.001
 excitement-0.001
 himself-0.001
 p-0.001
TokenOnce
Token ID7454
Feature activation-1.34e+00

Pos logit contributions
aring+0.035
ues+0.034
land+0.032
 where+0.031
 rather+0.031
Neg logit contributions
Y-0.035
 excitement-0.034
 two-0.034
 mixture-0.030
 himself-0.030
Token upon
Token ID2402
Feature activation+9.46e-03

Pos logit contributions
 two+0.040
 excitement+0.037
Y+0.035
 himself+0.033
 his+0.032
Neg logit contributions
aring-0.054
ues-0.052
 appearance-0.049
land-0.049
 rather-0.048
Token a
Token ID257
Feature activation-5.86e-01

Pos logit contributions
aring+0.000
ues+0.000
land+0.000
 rather+0.000
 where+0.000
Neg logit contributions
 two-0.000
Y-0.000
 excitement-0.000
 mixture-0.000
 himself-0.000
Token time
Token ID640
Feature activation+3.69e-01

Pos logit contributions
 two+0.014
Y+0.013
 excitement+0.013
 whole+0.011
 p+0.011
Neg logit contributions
ues-0.027
aring-0.027
 appearance-0.025
ems-0.025
land-0.024
Token there
Token ID612
Feature activation-1.26e+00

Pos logit contributions
aring+0.013
ues+0.013
land+0.012
 appearance+0.012
 rather+0.012
Neg logit contributions
 two-0.013
 excitement-0.012
Y-0.012
 himself-0.011
 p-0.010
Token was
Token ID373
Feature activation-7.32e-02

Pos logit contributions
Y+0.038
 excitement+0.037
 two+0.037
 mixture+0.033
 himself+0.031
Neg logit contributions
aring-0.044
ues-0.043
 where-0.043
 about-0.041
 why-0.041
Token at
Token ID379
Feature activation-1.54e+00

Pos logit contributions
aring+0.001
ues+0.001
 rather+0.001
 appearance+0.001
land+0.001
Neg logit contributions
 two-0.001
 excitement-0.001
Y-0.001
 p-0.001
 himself-0.001
Token the
Token ID262
Feature activation-2.44e-03

Pos logit contributions
 two+0.050
 p+0.042
Y+0.041
 excitement+0.040
 himself+0.038
Neg logit contributions
aring-0.064
ues-0.063
land-0.056
ems-0.056
 rather-0.055
Token bug
Token ID5434
Feature activation+2.54e-01

Pos logit contributions
Y+0.000
 excitement+0.000
 two+0.000
 himself+0.000
Mike+0.000
Neg logit contributions
aring-0.000
 appearance-0.000
ues-0.000
 rather-0.000
 where-0.000
Token in
Token ID287
Feature activation+4.42e-01

Pos logit contributions
ues+0.009
aring+0.009
 Spanish+0.008
 appearance+0.008
 Him+0.008
Neg logit contributions
 two-0.009
 p-0.008
 excitement-0.008
Y-0.008
 himself-0.007
Token the
Token ID262
Feature activation+6.96e-03

Pos logit contributions
aring+0.019
ues+0.019
land+0.017
 appearance+0.017
 rather+0.017
Neg logit contributions
 two-0.012
Y-0.011
 excitement-0.011
 himself-0.010
 p-0.009
Token p
Token ID279
Feature activation-4.26e+00

Pos logit contributions
aring+0.000
 appearance+0.000
 where+0.000
land+0.000
 rather+0.000
Neg logit contributions
Y-0.000
 excitement-0.000
 two-0.000
 himself-0.000
Andy-0.000
Tokenuddle
Token ID24500
Feature activation-7.74e-01

Pos logit contributions
 two+0.133
Y+0.130
 p+0.125
 excitement+0.123
 mixture+0.115
Neg logit contributions
aring-0.146
 Him-0.144
 rather-0.142
ues-0.141
 Spanish-0.140
Token!"
Token ID2474
Feature activation-3.69e-01

Pos logit contributions
 two+0.030
Y+0.028
 excitement+0.028
 himself+0.026
 p+0.026
Neg logit contributions
aring-0.024
ues-0.023
 appearance-0.021
land-0.021
 Spanish-0.021
Token she
Token ID673
Feature activation-3.14e-01

Pos logit contributions
Y+0.011
 two+0.011
 excitement+0.010
 mixture+0.009
 himself+0.009
Neg logit contributions
aring-0.014
ues-0.014
land-0.013
 where-0.013
 Spanish-0.012
Token said
Token ID531
Feature activation-4.70e-01

Pos logit contributions
Y+0.009
 two+0.008
 excitement+0.008
 mixture+0.008
Cool+0.007
Neg logit contributions
aring-0.012
ues-0.011
 rather-0.011
 about-0.011
 where-0.010
Token in
Token ID287
Feature activation+4.54e-01

Pos logit contributions
 two+0.016
Y+0.016
 excitement+0.016
 himself+0.014
 mixture+0.014
Neg logit contributions
aring-0.016
ues-0.016
land-0.014
 rather-0.014
 appearance-0.014
Token they
Token ID484
Feature activation+2.17e-02

Pos logit contributions
aring+0.013
ues+0.013
 appearance+0.011
land+0.011
.'"+0.011
Neg logit contributions
 two-0.012
 excitement-0.011
Y-0.011
 himself-0.010
 p-0.010
Token both
Token ID1111
Feature activation-3.27e-01

Pos logit contributions
aring+0.001
ues+0.001
land+0.001
 appearance+0.001
 rather+0.001
Neg logit contributions
 two-0.001
 excitement-0.001
Y-0.001
 himself-0.000
 p-0.000
Token looked
Token ID3114
Feature activation+4.88e-01

Pos logit contributions
Y+0.008
 two+0.008
 excitement+0.008
 himself+0.007
 mixture+0.007
Neg logit contributions
aring-0.014
ues-0.014
land-0.013
 rather-0.013
 where-0.013
Token around
Token ID1088
Feature activation+5.21e-02

Pos logit contributions
aring+0.020
ues+0.019
ems+0.017
 appearance+0.017
land+0.017
Neg logit contributions
 two-0.016
 excitement-0.015
Y-0.013
 p-0.013
 his-0.013
Token in
Token ID287
Feature activation+3.61e-01

Pos logit contributions
aring+0.002
ues+0.002
 appearance+0.002
land+0.002
ems+0.002
Neg logit contributions
 two-0.002
 excitement-0.002
Y-0.002
 himself-0.002
 his-0.002
Token amaz
Token ID40642
Feature activation-5.70e+00

Pos logit contributions
aring+0.016
ues+0.015
land+0.014
 appearance+0.014
 where+0.014
Neg logit contributions
 two-0.009
Y-0.009
 excitement-0.008
 himself-0.007
 mixture-0.007
Tokenement
Token ID972
Feature activation-1.11e-01

Pos logit contributions
 two+0.147
 excitement+0.141
Y+0.139
 himself+0.122
 p+0.114
Neg logit contributions
aring-0.251
ues-0.238
 appearance-0.223
 rather-0.222
land-0.219
Token.
Token ID13
Feature activation-5.23e-01

Pos logit contributions
 two+0.004
 excitement+0.003
Y+0.003
 himself+0.003
 p+0.003
Neg logit contributions
aring-0.004
ues-0.004
 appearance-0.004
 Spanish-0.004
 rather-0.004
Token The
Token ID383
Feature activation+2.15e-01

Pos logit contributions
 two+0.016
Y+0.016
 excitement+0.015
 himself+0.013
 p+0.013
Neg logit contributions
aring-0.020
ues-0.020
 appearance-0.018
land-0.018
 rather-0.018
Token lake
Token ID13546
Feature activation+1.29e+00

Pos logit contributions
aring+0.009
ues+0.009
land+0.008
 appearance+0.008
 rather+0.008
Neg logit contributions
Y-0.005
 excitement-0.005
 two-0.005
 himself-0.004
 ah-0.004
Token was
Token ID373
Feature activation+7.84e-03

Pos logit contributions
aring+0.056
ues+0.055
land+0.051
 appearance+0.050
 rather+0.050
Neg logit contributions
 two-0.033
Y-0.032
 excitement-0.031
 himself-0.027
 mixture-0.025
Token fox
Token ID21831
Feature activation+1.42e+00

Pos logit contributions
Y+0.004
 himself+0.004
 his+0.004
 excitement+0.004
Mike+0.004
Neg logit contributions
aring-0.006
 appearance-0.006
 tale-0.006
land-0.006
 rather-0.006
Token got
Token ID1392
Feature activation-1.33e-01

Pos logit contributions
aring+0.059
ues+0.057
land+0.054
 rather+0.052
 where+0.052
Neg logit contributions
Y-0.038
 two-0.037
 excitement-0.037
 himself-0.033
 mixture-0.031
Token out
Token ID503
Feature activation+5.36e-01

Pos logit contributions
Y+0.005
 excitement+0.005
 ah+0.005
 two+0.004
 mixture+0.004
Neg logit contributions
 rather-0.004
 where-0.004
aring-0.003
ues-0.003
 about-0.003
Token of
Token ID286
Feature activation-1.47e+00

Pos logit contributions
aring+0.018
ues+0.017
 where+0.016
land+0.016
 rather+0.016
Neg logit contributions
Y-0.019
 two-0.018
 excitement-0.018
 himself-0.016
 mixture-0.016
Token the
Token ID262
Feature activation-1.53e-01

Pos logit contributions
Y+0.045
 two+0.039
Cool+0.037
 mixture+0.037
Carl+0.036
Neg logit contributions
 where-0.053
 rather-0.048
 Spanish-0.047
 about-0.047
 why-0.046
Token p
Token ID279
Feature activation-4.25e+00

Pos logit contributions
Y+0.003
 excitement+0.003
 himself+0.003
 two+0.002
 ah+0.002
Neg logit contributions
aring-0.007
 where-0.007
land-0.007
 appearance-0.006
ues-0.006
Tokenuddle
Token ID24500
Feature activation-8.54e-01

Pos logit contributions
 two+0.172
Y+0.169
 excitement+0.168
 himself+0.152
 mixture+0.147
Neg logit contributions
aring-0.123
ues-0.119
land-0.105
 appearance-0.103
 rather-0.101
Token.
Token ID13
Feature activation-4.95e-01

Pos logit contributions
 two+0.033
Y+0.032
 excitement+0.032
 himself+0.029
 mixture+0.028
Neg logit contributions
aring-0.026
ues-0.025
land-0.023
 appearance-0.022
 rather-0.022
Token She
Token ID1375
Feature activation-8.87e-01

Pos logit contributions
 two+0.016
Y+0.016
 excitement+0.015
 himself+0.014
 mixture+0.013
Neg logit contributions
aring-0.018
ues-0.017
land-0.016
 appearance-0.016
 rather-0.016
Token shook
Token ID14682
Feature activation-2.82e-01

Pos logit contributions
Y+0.023
 two+0.022
 excitement+0.022
 himself+0.020
 mixture+0.018
Neg logit contributions
aring-0.035
ues-0.035
 rather-0.033
 where-0.033
land-0.032
Token off
Token ID572
Feature activation-6.72e-01

Pos logit contributions
 two+0.007
 excitement+0.007
Y+0.007
 himself+0.006
 p+0.006
Neg logit contributions
aring-0.013
ues-0.013
land-0.012
 appearance-0.011
 rather-0.011
Token a
Token ID257
Feature activation-5.94e-01

Pos logit contributions
aring+0.029
 rather+0.028
 where+0.028
ues+0.028
land+0.027
Neg logit contributions
Y-0.020
 excitement-0.020
 two-0.019
 ah-0.018
 himself-0.018
Token bad
Token ID2089
Feature activation+2.37e-01

Pos logit contributions
 two+0.015
Y+0.015
 excitement+0.015
 himself+0.012
 p+0.012
Neg logit contributions
aring-0.026
ues-0.025
land-0.023
 appearance-0.023
 rather-0.023
Token tum
Token ID11814
Feature activation-1.97e+00

Pos logit contributions
aring+0.011
ues+0.011
land+0.010
.'"+0.009
 Spanish+0.009
Neg logit contributions
 two-0.006
 excitement-0.006
Y-0.005
 p-0.005
 mixture-0.005
Tokenmy
Token ID1820
Feature activation-3.57e-02

Pos logit contributions
 two+0.057
Y+0.052
 excitement+0.050
 himself+0.049
 p+0.044
Neg logit contributions
ues-0.082
aring-0.081
 appearance-0.074
land-0.074
 rather-0.072
Token ac
Token ID936
Feature activation-1.28e+00

Pos logit contributions
 two+0.001
 p+0.001
 excitement+0.001
Y+0.001
 himself+0.001
Neg logit contributions
ues-0.002
aring-0.002
 Spanish-0.002
 appearance-0.001
 English-0.001
Tokenhe
Token ID258
Feature activation-4.62e+00

Pos logit contributions
 excitement+0.026
 two+0.025
Y+0.025
 ah+0.018
 mixture+0.018
Neg logit contributions
aring-0.064
ues-0.061
land-0.057
 where-0.057
 rather-0.057
Token and
Token ID290
Feature activation-2.55e-01

Pos logit contributions
 two+0.128
Y+0.122
 excitement+0.121
 himself+0.107
 p+0.102
Neg logit contributions
aring-0.195
ues-0.190
land-0.174
 appearance-0.173
 rather-0.170
Token her
Token ID607
Feature activation+2.90e-01

Pos logit contributions
 two+0.007
Y+0.007
 excitement+0.007
 himself+0.006
 p+0.006
Neg logit contributions
aring-0.011
ues-0.011
land-0.010
 appearance-0.010
 Spanish-0.010
Token mom
Token ID1995
Feature activation-9.11e-01

Pos logit contributions
aring+0.012
ues+0.011
land+0.011
 appearance+0.011
 rather+0.011
Neg logit contributions
 two-0.008
Y-0.008
 excitement-0.008
 himself-0.007
 ah-0.006
Tokenmy
Token ID1820
Feature activation-6.14e-02

Pos logit contributions
 two+0.024
Y+0.020
 p+0.019
 excitement+0.019
 his+0.018
Neg logit contributions
ues-0.044
aring-0.042
 appearance-0.039
 Spanish-0.038
land-0.038
Token gave
Token ID2921
Feature activation+1.10e+00

Pos logit contributions
 two+0.002
Y+0.002
 excitement+0.002
 p+0.002
 himself+0.002
Neg logit contributions
ues-0.003
aring-0.002
 appearance-0.002
 Spanish-0.002
land-0.002
Token a
Token ID257
Feature activation-6.03e-01

Pos logit contributions
aring+0.032
ues+0.031
land+0.029
 appearance+0.029
 rather+0.028
Neg logit contributions
 two-0.018
Y-0.018
 excitement-0.018
 himself-0.015
 mixture-0.014
Token nos
Token ID43630
Feature activation-1.88e+00

Pos logit contributions
 two+0.015
Y+0.015
 excitement+0.015
 himself+0.013
 mixture+0.012
Neg logit contributions
aring-0.025
ues-0.025
land-0.023
 rather-0.023
 appearance-0.023
Tokeny
Token ID88
Feature activation+1.49e+00

Pos logit contributions
Y+0.097
 two+0.094
 excitement+0.092
 himself+0.085
 mixture+0.085
Neg logit contributions
aring-0.038
ues-0.032
land-0.028
 rather-0.028
 appearance-0.027
Token old
Token ID1468
Feature activation+9.37e-01

Pos logit contributions
aring+0.062
ues+0.058
 where+0.056
land+0.056
 appearance+0.056
Neg logit contributions
Y-0.039
 two-0.038
 excitement-0.037
 himself-0.033
 ease-0.031
Token man
Token ID582
Feature activation-6.81e-01

Pos logit contributions
aring+0.041
ues+0.039
land+0.037
 appearance+0.037
 where+0.037
Neg logit contributions
Y-0.023
 two-0.023
 excitement-0.022
 himself-0.019
 mixture-0.018
Token pe
Token ID613
Feature activation-4.16e+00

Pos logit contributions
 two+0.026
 excitement+0.025
Y+0.024
 himself+0.023
 p+0.022
Neg logit contributions
ues-0.021
aring-0.021
 appearance-0.019
land-0.018
ems-0.018
Tokenering
Token ID1586
Feature activation+1.65e+00

Pos logit contributions
Y+0.191
 two+0.191
 excitement+0.189
 himself+0.170
 mixture+0.168
Neg logit contributions
aring-0.097
ues-0.093
land-0.081
 appearance-0.077
 rather-0.075
Token at
Token ID379
Feature activation-1.57e+00

Pos logit contributions
aring+0.057
ues+0.055
land+0.051
 where+0.050
 rather+0.050
Neg logit contributions
Y-0.056
 two-0.055
 excitement-0.054
 mixture-0.047
 ah-0.047
Token them
Token ID606
Feature activation-1.19e-01

Pos logit contributions
 two+0.035
Y+0.033
 excitement+0.033
 himself+0.027
 mixture+0.026
Neg logit contributions
aring-0.074
ues-0.073
land-0.068
 appearance-0.067
 rather-0.067
Token through
Token ID832
Feature activation-4.36e-01

Pos logit contributions
 two+0.004
 excitement+0.004
Y+0.004
 himself+0.004
 p+0.003
Neg logit contributions
ues-0.004
aring-0.004
 appearance-0.004
land-0.004
ems-0.004
Token the
Token ID262
Feature activation-1.02e-01

Pos logit contributions
Y+0.012
 two+0.011
 excitement+0.010
 mixture+0.010
 ah+0.009
Neg logit contributions
aring-0.017
ues-0.017
 where-0.016
 Spanish-0.016
land-0.016