TOP ACTIVATIONS
MAX = 9.841

reward . It was ch ocol ates and sweets !
out her finger and gently stro ked the cater pillar .
it ! <|endoftext|> Sam stared int ently through the microscope .
as he opened the ch ocol ates and started eating them
time there was a blue ostr ich that lived in the
a big box of ch ocol ates . It was so
was a box of ch ocol ates . Lily loved ch
giving him too many ch ocol ates ." The king learned
oll ip ops , ch ocol ates , g ummies ,
a stroke . As she stro ked him , the horse
to Rose . Rose gently stro ked the turkey 's feathers
a large selection of ch ocol ates . Ell a picked
he ate all the ch ocol ates and sweets . Then
a design , when in stro de a small rabbit .
Jack . M ummy stro ked her chin . "
The little lamb watched him int ently , and when the
looking at all the ch ocol ates , Lex i chose
a small box with other therm ometers . But Tim my
reached out his hand and stro ked a white sheep .
Tom ran to Rex and stro ked his fur . Rex

MIN ACTIVATIONS
MIN = -6.678

away ! You are not a pirate !" Ben said .
. He helps them make structures with blocks . He makes
want . You can make towers and houses and cars and
learners . You can make triangles with your blocks too .
television . It is not a toy . It is expensive
lose . You are not a good friend ."
mess . You are not a good friend ."
mean . You are not a good customer ."
my notebook ! You are a bad friend !"
Ruby ! She is not a princess , she is my
fair ! You are not a queen , you are just
funny . You are not a superhero , you are a
. I was not being a good friend . I was
dolls and animals to make houses , roads , farms and
her . You are not a good friend , you are
shapes . They can make towers , bridges , houses and
mean . You are not a dragon , you are my
every day . They make structures with blocks . They are
you . He is not a toy , he is a
nice . You are not a good friend . You do

INTERVAL 5.711 - 9.841
CONTAINS 0.004%

there was a rh in oc eros who was always naughty
and their little three - year - old daughter , Mill
park . When the ch ocol ates were all gone ,
my !" exclaimed three - year - old Jenny . She
started eating the delicious ch ocol ates . She was so

INTERVAL 1.581 - 5.711
CONTAINS 14.072%

and confused . He shook his head and threw the cushion
made the old man very happy . He decided
up and up , trying to get closer to the sun
jumped in surprise , but Cel ery just smiled and kept
went away . They told her to stay inside and be

INTERVAL -2.548 - 1.581
CONTAINS 81.350%

was happy and proud . He never forgot the girl who
and a boy . They were outside playing in the bright
me . Can I help you fix your sc ooter ?"
." Dad said , " The fish will not
always looking for treasure and having fun . <|endoftext|> Once upon

INTERVAL -6.678 - -2.548
CONTAINS 4.575%

and have lots of fun playing together . And they always
day , her mom and dad would organize things around the
being on the roof for a while , until they heard
we have a net and a stick . We can catch
It has big wheels and a big horn . Lily and
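
The four INTERVAL ranges above look like equal-width bins spanning MIN to MAX. A minimal sketch under that assumption follows; the function name `interval_edges` is hypothetical and is not taken from the tool that produced this page.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return n_bins + 1 equally spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

# MIN and MAX as displayed above (the page rounds them to three decimals).
edges = interval_edges(-6.678, 9.841)

# Print the bins in ascending order; the page lists them from highest to lowest.
for left, right in zip(edges, edges[1:]):
    print(f"INTERVAL {left:.3f} - {right:.3f}")
```

The computed edges agree with the listed boundaries (-2.548, 1.581, 5.711) to within the rounding of the displayed MIN and MAX, which is consistent with equal-width binning but does not prove it.
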
Token reward
Token ID6721
Feature activation+2.97e+00

Pos logit contributions
ment+0.112
isher+0.111
edly+0.110
anged+0.108
 thirst+0.106
Neg logit contributions
 Anne-0.064
 great-0.060
 different-0.060
Beck-0.059
 big-0.059
Token.
Token ID13
Feature activation+1.23e+00

Pos logit contributions
ment+0.079
isher+0.077
edly+0.077
anged+0.075
 thirst+0.074
Neg logit contributions
 Anne-0.041
 great-0.038
Beck-0.038
Anne-0.038
 different-0.037
Token It
Token ID632
Feature activation+1.71e+00

Pos logit contributions
ment+0.033
edly+0.032
isher+0.032
anged+0.031
 thirst+0.031
Neg logit contributions
 Anne-0.016
 great-0.016
Beck-0.015
 different-0.015
 neat-0.015
Token was
Token ID373
Feature activation+7.82e-01

Pos logit contributions
ment+0.041
isher+0.041
edly+0.041
 thirst+0.040
anged+0.039
Neg logit contributions
 Anne-0.025
Anne-0.024
Beck-0.024
 great-0.024
 neat-0.023
Token ch
Token ID442
Feature activation+1.87e+00

Pos logit contributions
ment+0.023
isher+0.022
 thirst+0.022
edly+0.022
anged+0.022
Neg logit contributions
 Anne-0.008
Beck-0.008
Anne-0.008
 third-0.007
 great-0.007
Tokenocol
Token ID4668
Feature activation+7.20e+00

Pos logit contributions
edly+0.029
ment+0.029
isher+0.027
anged+0.027
 thirst+0.027
Neg logit contributions
 great-0.045
 big-0.045
 different-0.044
 out-0.044
 neat-0.043
Tokenates
Token ID689
Feature activation+6.87e-01

Pos logit contributions
 thirst+0.163
isher+0.156
edly+0.153
anged+0.153
ment+0.151
Neg logit contributions
Beck-0.116
 Anne-0.113
Anne-0.113
 ten-0.112
 out-0.112
Token and
Token ID290
Feature activation+2.66e+00

Pos logit contributions
ment+0.018
isher+0.018
edly+0.018
anged+0.017
 thirst+0.017
Neg logit contributions
 Anne-0.010
Beck-0.009
Anne-0.009
 great-0.009
 different-0.008
Token sweets
Token ID42402
Feature activation+1.32e+00

Pos logit contributions
ment+0.072
isher+0.072
edly+0.071
anged+0.070
 thirst+0.069
Neg logit contributions
 Anne-0.035
 great-0.032
Beck-0.032
Anne-0.032
 different-0.032
Token!
Token ID0
Feature activation+1.61e-01

Pos logit contributions
ment+0.035
edly+0.035
isher+0.035
anged+0.034
 thirst+0.033
Neg logit contributions
 Anne-0.018
Beck-0.017
Anne-0.017
 great-0.016
 different-0.016
Token\n
Token ID198
Feature activation+1.15e+00

Pos logit contributions
ment+0.004
edly+0.004
isher+0.004
anged+0.004
 thirst+0.004
Neg logit contributions
 Anne-0.002
 great-0.002
 neat-0.002
 different-0.002
Beck-0.002
Token out
Token ID503
Feature activation+1.81e-01

Pos logit contributions
ment+0.067
anged+0.065
 thirst+0.064
isher+0.062
edly+0.061
Neg logit contributions
Anne-0.025
Alice-0.025
Beck-0.024
 Anne-0.023
 great-0.022
Token her
Token ID607
Feature activation+3.94e+00

Pos logit contributions
ment+0.006
edly+0.005
anged+0.005
 thirst+0.005
isher+0.005
Neg logit contributions
Beck-0.002
Anne-0.002
 great-0.002
 overcome-0.002
 achieve-0.002
Token finger
Token ID7660
Feature activation+3.20e+00

Pos logit contributions
ment+0.083
edly+0.081
isher+0.081
anged+0.081
 thirst+0.080
Neg logit contributions
 Anne-0.074
Anne-0.072
Beck-0.072
 great-0.070
 neat-0.068
Token and
Token ID290
Feature activation+3.17e+00

Pos logit contributions
isher+0.089
ment+0.089
edly+0.087
anged+0.086
 thirst+0.085
Neg logit contributions
 Anne-0.040
 different-0.036
 great-0.036
 big-0.036
 third-0.036
Token gently
Token ID15165
Feature activation+2.30e+00

Pos logit contributions
ment+0.064
edly+0.062
anged+0.062
 thirst+0.061
isher+0.061
Neg logit contributions
 Anne-0.062
 great-0.061
Anne-0.060
Beck-0.059
 third-0.058
Token stro
Token ID9660
Feature activation+7.00e+00

Pos logit contributions
ment+0.046
isher+0.046
edly+0.045
anged+0.044
 thirst+0.044
Neg logit contributions
 Anne-0.047
 great-0.044
Anne-0.044
Beck-0.044
 different-0.043
Tokenked
Token ID9091
Feature activation+1.09e+00

Pos logit contributions
isher+0.073
ment+0.066
edly+0.057
 thirst+0.057
anged+0.056
Neg logit contributions
 out-0.214
 her-0.207
Beck-0.206
 Anne-0.206
 third-0.204
Token the
Token ID262
Feature activation-2.46e-01

Pos logit contributions
ment+0.034
edly+0.033
 thirst+0.032
isher+0.032
anged+0.032
Neg logit contributions
 Anne-0.011
Anne-0.010
Beck-0.010
 great-0.009
 neat-0.009
Token cater
Token ID20825
Feature activation+1.44e+00

Pos logit contributions
 Anne+0.004
 big+0.004
 great+0.004
 different+0.004
 third+0.004
Neg logit contributions
isher-0.006
ment-0.006
anged-0.006
edly-0.006
 thirst-0.006
Tokenpillar
Token ID41643
Feature activation+1.04e+00

Pos logit contributions
isher+0.031
ment+0.031
edly+0.030
anged+0.030
 thirst+0.029
Neg logit contributions
 Anne-0.027
 big-0.026
 great-0.026
Beck-0.026
 different-0.026
Token.
Token ID13
Feature activation+3.48e-01

Pos logit contributions
ment+0.028
edly+0.028
isher+0.027
anged+0.027
 thirst+0.027
Neg logit contributions
 Anne-0.014
 great-0.013
Anne-0.013
Beck-0.013
 neat-0.012
Token it
Token ID340
Feature activation-2.44e-01

Pos logit contributions
 Anne+0.005
Beck+0.005
Anne+0.004
 great+0.004
 third+0.004
Neg logit contributions
ment-0.013
isher-0.013
edly-0.013
anged-0.013
 thirst-0.013
Token!
Token ID0
Feature activation-8.51e-01

Pos logit contributions
 Anne+0.003
Beck+0.003
Anne+0.003
 great+0.003
 third+0.003
Neg logit contributions
ment-0.006
isher-0.006
edly-0.006
anged-0.006
 thirst-0.006
Token<|endoftext|>
Token ID50256
Feature activation+3.30e+00

Pos logit contributions
 Anne+0.007
 great+0.006
Anne+0.006
Beck+0.006
 different+0.006
Neg logit contributions
isher-0.028
ment-0.028
edly-0.027
anged-0.027
 thirst-0.027
TokenSam
Token ID16305
Feature activation-3.35e+00

Pos logit contributions
ment+0.079
edly+0.075
isher+0.074
anged+0.072
 thirst+0.071
Neg logit contributions
 Anne-0.051
 neat-0.051
 great-0.050
 different-0.048
Beck-0.048
Token stared
Token ID18484
Feature activation+1.76e+00

Pos logit contributions
 Anne+0.060
Beck+0.059
 third+0.058
Anne+0.057
 great+0.057
Neg logit contributions
isher-0.073
anged-0.070
ment-0.070
 thirst-0.066
edly-0.066
Token int
Token ID493
Feature activation+6.99e+00

Pos logit contributions
ment+0.051
isher+0.049
edly+0.049
 thirst+0.048
anged+0.048
Neg logit contributions
 Anne-0.021
Anne-0.020
Beck-0.020
 third-0.019
 great-0.019
Tokenently
Token ID1473
Feature activation+4.01e-01

Pos logit contributions
isher+0.152
anged+0.151
 thirst+0.151
ment+0.148
edly+0.147
Neg logit contributions
 Anne-0.126
 great-0.120
 best-0.117
Anne-0.117
 out-0.117
Token through
Token ID832
Feature activation-1.61e+00

Pos logit contributions
ment+0.011
edly+0.011
isher+0.011
anged+0.010
 thirst+0.010
Neg logit contributions
 Anne-0.005
Beck-0.005
Anne-0.005
 great-0.005
 third-0.005
Token the
Token ID262
Feature activation+6.86e-01

Pos logit contributions
Beck+0.016
Anne+0.016
 Anne+0.015
 overcome+0.014
Alice+0.013
Neg logit contributions
ment-0.048
isher-0.046
edly-0.046
 thirst-0.045
anged-0.044
Token microscope
Token ID36396
Feature activation+1.91e+00

Pos logit contributions
isher+0.016
ment+0.015
 thirst+0.015
edly+0.015
anged+0.014
Neg logit contributions
 Anne-0.012
Beck-0.011
Anne-0.011
 overcome-0.010
 great-0.010
Token.
Token ID13
Feature activation+9.32e-01

Pos logit contributions
ment+0.050
edly+0.049
isher+0.049
anged+0.047
 thirst+0.046
Neg logit contributions
 Anne-0.026
Beck-0.026
Anne-0.025
 great-0.025
 third-0.024
Token as
Token ID355
Feature activation-1.24e+00

Pos logit contributions
ment+0.039
isher+0.038
edly+0.038
anged+0.037
 thirst+0.037
Neg logit contributions
 Anne-0.020
 great-0.019
Beck-0.019
Anne-0.019
 different-0.018
Token he
Token ID339
Feature activation+1.34e+00

Pos logit contributions
 Anne+0.017
 great+0.016
Beck+0.016
Anne+0.015
 different+0.015
Neg logit contributions
ment-0.033
isher-0.033
edly-0.032
anged-0.032
 thirst-0.031
Token opened
Token ID4721
Feature activation-8.64e-01

Pos logit contributions
ment+0.034
isher+0.034
edly+0.033
 thirst+0.032
anged+0.032
Neg logit contributions
 Anne-0.021
 great-0.020
 big-0.020
 different-0.019
 third-0.019
Token the
Token ID262
Feature activation-6.06e-01

Pos logit contributions
Beck+0.009
Anne+0.009
 Anne+0.009
 third+0.008
 great+0.008
Neg logit contributions
ment-0.025
isher-0.025
edly-0.024
 thirst-0.023
anged-0.023
Token ch
Token ID442
Feature activation+1.49e+00

Pos logit contributions
 Anne+0.010
Anne+0.010
Beck+0.010
 great+0.009
 neat+0.009
Neg logit contributions
isher-0.014
ment-0.014
edly-0.014
 thirst-0.014
anged-0.013
Tokenocol
Token ID4668
Feature activation+6.98e+00

Pos logit contributions
edly+0.020
idents+0.017
anged+0.017
 thirst+0.017
outed+0.017
Neg logit contributions
 big-0.038
 different-0.037
 great-0.037
 out-0.036
 matching-0.036
Tokenates
Token ID689
Feature activation+3.68e-01

Pos logit contributions
 thirst+0.167
isher+0.156
anged+0.155
idents+0.153
outed+0.153
Neg logit contributions
Beck-0.102
 ten-0.097
 out-0.097
Anne-0.096
 Anne-0.094
Token and
Token ID290
Feature activation+1.94e+00

Pos logit contributions
ment+0.010
isher+0.010
edly+0.010
anged+0.009
 thirst+0.009
Neg logit contributions
 Anne-0.005
 great-0.005
Beck-0.005
Anne-0.005
 different-0.005
Token started
Token ID2067
Feature activation-1.74e+00

Pos logit contributions
ment+0.050
isher+0.049
edly+0.049
anged+0.048
 thirst+0.047
Neg logit contributions
 Anne-0.028
 great-0.026
Beck-0.026
Anne-0.026
 different-0.026
Token eating
Token ID6600
Feature activation-1.57e+00

Pos logit contributions
 Anne+0.021
Beck+0.021
Anne+0.020
 great+0.020
 third+0.019
Neg logit contributions
ment-0.048
anged-0.046
edly-0.046
isher-0.046
 thirst-0.045
Token them
Token ID606
Feature activation-1.19e+00

Pos logit contributions
Anne+0.021
 Anne+0.021
Beck+0.020
Alice+0.017
 June+0.017
Neg logit contributions
ment-0.044
isher-0.042
edly-0.042
 thirst-0.042
anged-0.039
Token time
Token ID640
Feature activation-3.03e-01

Pos logit contributions
 Anne+0.045
Anne+0.042
Beck+0.041
 out+0.040
 neat+0.039
Neg logit contributions
 thirst-0.061
isher-0.058
ment-0.058
edly-0.058
anged-0.057
Token there
Token ID612
Feature activation+3.38e-01

Pos logit contributions
Beck+0.004
Anne+0.004
 Anne+0.004
 neat+0.003
 great+0.003
Neg logit contributions
ment-0.008
anged-0.008
 thirst-0.007
outed-0.007
edly-0.007
Token was
Token ID373
Feature activation+1.77e+00

Pos logit contributions
ment+0.009
isher+0.008
anged+0.008
edly+0.008
 thirst+0.008
Neg logit contributions
 Anne-0.005
Beck-0.005
Anne-0.005
 great-0.005
 neat-0.005
Token a
Token ID257
Feature activation-5.62e-01

Pos logit contributions
ment+0.056
edly+0.055
isher+0.055
 thirst+0.054
anged+0.054
Neg logit contributions
 Anne-0.015
Anne-0.013
Beck-0.013
 great-0.013
 third-0.013
Token blue
Token ID4171
Feature activation+3.29e+00

Pos logit contributions
 Anne+0.011
Anne+0.010
Beck+0.010
 overcome+0.009
 out+0.009
Neg logit contributions
 thirst-0.011
isher-0.011
edly-0.011
ment-0.011
ro-0.010
Token ostr
Token ID42387
Feature activation+6.91e+00

Pos logit contributions
ment+0.085
edly+0.084
isher+0.084
anged+0.083
 thirst+0.080
Neg logit contributions
 Anne-0.047
 big-0.045
 different-0.045
 great-0.044
 third-0.043
Tokenich
Token ID488
Feature activation+2.47e+00

Pos logit contributions
ment+0.124
isher+0.123
edly+0.119
 thirst+0.115
anged+0.115
Neg logit contributions
 Anne-0.153
Anne-0.146
Beck-0.145
 big-0.145
 great-0.144
Token that
Token ID326
Feature activation+1.67e+00

Pos logit contributions
ment+0.063
isher+0.061
edly+0.061
anged+0.060
 thirst+0.059
Neg logit contributions
 Anne-0.037
 great-0.035
Beck-0.035
Anne-0.034
 neat-0.034
Token lived
Token ID5615
Feature activation+2.94e+00

Pos logit contributions
isher+0.044
ment+0.043
edly+0.043
anged+0.042
 thirst+0.040
Neg logit contributions
 Anne-0.024
 different-0.023
 big-0.023
 great-0.023
Beck-0.023
Token in
Token ID287
Feature activation+3.59e-01

Pos logit contributions
ment+0.086
edly+0.085
isher+0.084
 thirst+0.083
anged+0.081
Neg logit contributions
 Anne-0.034
Anne-0.030
 great-0.029
Beck-0.029
 third-0.028
Token the
Token ID262
Feature activation+3.12e-01

Pos logit contributions
isher+0.013
ment+0.012
edly+0.012
anged+0.012
 thirst+0.012
Neg logit contributions
 Anne-0.002
 great-0.002
 different-0.002
 big-0.002
Beck-0.002
Token a
Token ID257
Feature activation-2.06e+00

Pos logit contributions
ment+0.025
isher+0.025
edly+0.025
 thirst+0.024
anged+0.024
Neg logit contributions
 Anne-0.009
Beck-0.009
Anne-0.009
 great-0.008
 third-0.008
Token big
Token ID1263
Feature activation-2.56e+00

Pos logit contributions
 Anne+0.029
Anne+0.028
Beck+0.027
 overcome+0.024
 June+0.024
Neg logit contributions
isher-0.051
 thirst-0.051
ment-0.050
edly-0.048
uc-0.047
Token box
Token ID3091
Feature activation+2.78e-01

Pos logit contributions
Beck+0.041
 Anne+0.040
Anne+0.040
 overcome+0.036
 third+0.036
Neg logit contributions
isher-0.063
ment-0.061
 thirst-0.060
anged-0.059
edly-0.057
Token of
Token ID286
Feature activation-1.70e+00

Pos logit contributions
ment+0.008
isher+0.008
edly+0.008
anged+0.008
 thirst+0.008
Neg logit contributions
 Anne-0.003
 different-0.003
 big-0.003
 great-0.003
 neat-0.003
Token ch
Token ID442
Feature activation+2.26e+00

Pos logit contributions
 Anne+0.029
Anne+0.029
Beck+0.029
 third+0.027
 overcome+0.027
Neg logit contributions
isher-0.039
ment-0.037
 thirst-0.036
anged-0.035
edly-0.035
Tokenocol
Token ID4668
Feature activation+6.83e+00

Pos logit contributions
edly+0.035
 thirst+0.031
ment+0.030
outed+0.026
isher+0.026
Neg logit contributions
 out-0.057
 big-0.056
 different-0.055
 neat-0.053
Beck-0.053
Tokenates
Token ID689
Feature activation+8.11e-01

Pos logit contributions
 thirst+0.153
isher+0.148
edly+0.145
ment+0.145
anged+0.142
Neg logit contributions
Beck-0.113
Anne-0.109
 Anne-0.108
 out-0.106
 ten-0.103
Token.
Token ID13
Feature activation-1.79e-01

Pos logit contributions
ment+0.022
isher+0.022
edly+0.022
anged+0.022
 thirst+0.021
Neg logit contributions
 Anne-0.010
 great-0.010
 different-0.009
 big-0.009
 neat-0.009
Token It
Token ID632
Feature activation+1.11e+00

Pos logit contributions
 Anne+0.003
 great+0.003
Beck+0.003
Anne+0.003
 different+0.003
Neg logit contributions
ment-0.004
isher-0.004
edly-0.004
anged-0.004
 thirst-0.004
Token was
Token ID373
Feature activation+5.04e-01

Pos logit contributions
edly+0.025
ment+0.024
isher+0.024
anged+0.024
 thirst+0.024
Neg logit contributions
 Anne-0.018
Beck-0.018
Anne-0.017
 great-0.017
 neat-0.017
Token so
Token ID523
Feature activation+2.81e-01

Pos logit contributions
ment+0.014
 thirst+0.014
edly+0.014
isher+0.014
anged+0.013
Neg logit contributions
Beck-0.006
Anne-0.006
 Anne-0.006
 overcome-0.005
 third-0.005
Token was
Token ID373
Feature activation+1.98e-01

Pos logit contributions
edly+0.040
isher+0.040
ment+0.040
anged+0.039
 thirst+0.038
Neg logit contributions
 Anne-0.028
Anne-0.027
Beck-0.027
 great-0.027
 neat-0.026
Token a
Token ID257
Feature activation-3.38e+00

Pos logit contributions
ment+0.006
isher+0.006
edly+0.006
 thirst+0.006
anged+0.005
Neg logit contributions
 Anne-0.002
Beck-0.002
Anne-0.002
 third-0.002
 great-0.002
Token box
Token ID3091
Feature activation-1.73e-01

Pos logit contributions
 Anne+0.051
Anne+0.049
Beck+0.047
 overcome+0.044
Alice+0.043
Neg logit contributions
 thirst-0.081
isher-0.081
ment-0.078
edly-0.077
ro-0.074
Token of
Token ID286
Feature activation-2.03e+00

Pos logit contributions
 Anne+0.002
Beck+0.002
Anne+0.002
 great+0.002
 third+0.002
Neg logit contributions
ment-0.005
isher-0.005
edly-0.005
anged-0.004
 thirst-0.004
Token ch
Token ID442
Feature activation+2.10e+00

Pos logit contributions
 Anne+0.033
Anne+0.032
Beck+0.031
 third+0.029
 overcome+0.029
Neg logit contributions
isher-0.049
ment-0.049
 thirst-0.047
edly-0.046
anged-0.045
Tokenocol
Token ID4668
Feature activation+6.79e+00

Pos logit contributions
edly+0.027
 thirst+0.023
ment+0.023
outed+0.022
isher+0.021
Neg logit contributions
 out-0.058
 big-0.057
 different-0.056
 neat-0.055
Beck-0.055
Tokenates
Token ID689
Feature activation+8.94e-01

Pos logit contributions
 thirst+0.153
isher+0.146
edly+0.146
ment+0.144
anged+0.141
Neg logit contributions
Beck-0.112
Anne-0.107
 Anne-0.107
 out-0.105
 ten-0.103
Token.
Token ID13
Feature activation-4.42e-01

Pos logit contributions
ment+0.024
isher+0.023
edly+0.023
anged+0.023
 thirst+0.022
Neg logit contributions
 Anne-0.012
Beck-0.011
 great-0.011
Anne-0.011
 different-0.011
Token Lily
Token ID20037
Feature activation-1.23e+00

Pos logit contributions
 Anne+0.007
 great+0.007
Beck+0.007
Anne+0.007
 different+0.007
Neg logit contributions
ment-0.010
isher-0.010
edly-0.010
anged-0.010
 thirst-0.010
Token loved
Token ID6151
Feature activation+1.12e+00

Pos logit contributions
 great+0.023
 Anne+0.022
Beck+0.022
Anne+0.021
 third+0.021
Neg logit contributions
edly-0.026
anged-0.025
isher-0.025
ment-0.024
outed-0.024
Token ch
Token ID442
Feature activation+7.03e-01

Pos logit contributions
ment+0.030
isher+0.030
edly+0.029
anged+0.029
 thirst+0.029
Neg logit contributions
Beck-0.014
 Anne-0.014
Anne-0.013
 great-0.013
 third-0.013
Token giving
Token ID3501
Feature activation+1.13e+00

Pos logit contributions
 Anne+0.006
 great+0.006
 different+0.006
 big+0.005
 third+0.005
Neg logit contributions
isher-0.013
ment-0.013
edly-0.012
anged-0.012
 thirst-0.012
Token him
Token ID683
Feature activation-1.16e+00

Pos logit contributions
ment+0.032
isher+0.032
edly+0.031
anged+0.031
 thirst+0.030
Neg logit contributions
 Anne-0.014
 great-0.013
Beck-0.013
 different-0.013
Anne-0.013
Token too
Token ID1165
Feature activation-2.02e-01

Pos logit contributions
 Anne+0.012
Anne+0.011
Beck+0.011
 great+0.011
 different+0.011
Neg logit contributions
ment-0.034
isher-0.034
edly-0.034
anged-0.033
 thirst-0.033
Token many
Token ID867
Feature activation-1.43e+00

Pos logit contributions
 Anne+0.002
 great+0.002
 different+0.002
Beck+0.002
 big+0.002
Neg logit contributions
ment-0.006
isher-0.006
edly-0.006
anged-0.006
 thirst-0.006
Token ch
Token ID442
Feature activation+1.16e+00

Pos logit contributions
 Anne+0.021
 great+0.019
Beck+0.019
Anne+0.019
 different+0.019
Neg logit contributions
ment-0.037
isher-0.037
edly-0.036
anged-0.036
 thirst-0.035
Tokenocol
Token ID4668
Feature activation+6.78e+00

Pos logit contributions
edly+0.019
idents+0.016
 thirst+0.015
isher+0.015
anged+0.015
Neg logit contributions
 big-0.027
 different-0.027
 great-0.026
 many-0.026
 out-0.026
Tokenates
Token ID689
Feature activation-4.37e-01

Pos logit contributions
 thirst+0.165
edly+0.153
isher+0.153
anged+0.153
outed+0.150
Neg logit contributions
Beck-0.092
 best-0.091
 ten-0.091
 out-0.089
 overcome-0.088
Token."
Token ID526
Feature activation+9.17e-01

Pos logit contributions
 Anne+0.006
 great+0.005
 different+0.005
Beck+0.005
Anne+0.005
Neg logit contributions
ment-0.012
isher-0.012
edly-0.012
anged-0.011
 thirst-0.011
Token The
Token ID383
Feature activation+1.27e+00

Pos logit contributions
ment+0.027
edly+0.027
isher+0.026
anged+0.026
 thirst+0.026
Neg logit contributions
 Anne-0.010
 great-0.009
 neat-0.009
Beck-0.009
 different-0.009
Token king
Token ID5822
Feature activation-4.35e-01

Pos logit contributions
ment+0.033
isher+0.033
edly+0.032
anged+0.032
 thirst+0.031
Neg logit contributions
 Anne-0.019
 great-0.018
 big-0.018
 different-0.018
 third-0.017
Token learned
Token ID4499
Feature activation+1.06e+00

Pos logit contributions
 Anne+0.007
Anne+0.007
 big+0.007
 different+0.007
Beck+0.006
Neg logit contributions
isher-0.011
ment-0.010
edly-0.010
anged-0.010
 thirst-0.010
Tokenoll
Token ID692
Feature activation+7.78e-01

Pos logit contributions
ment+0.045
isher+0.044
edly+0.043
anged+0.042
 thirst+0.041
Neg logit contributions
 Anne-0.077
 great-0.074
Beck-0.073
Anne-0.073
 different-0.073
Tokenip
Token ID541
Feature activation+1.27e+00

Pos logit contributions
edly+0.015
 thirst+0.015
isher+0.014
ment+0.014
 habitat+0.014
Neg logit contributions
 best-0.015
 out-0.015
 great-0.015
 big-0.015
 on-0.015
Tokenops
Token ID2840
Feature activation-3.10e-01

Pos logit contributions
isher+0.016
ment+0.016
edly+0.015
anged+0.014
 thirst+0.014
Neg logit contributions
 Anne-0.035
 great-0.035
 big-0.034
 different-0.034
 neat-0.034
Token,
Token ID11
Feature activation-5.31e-01

Pos logit contributions
 Anne+0.004
Beck+0.004
Anne+0.004
 great+0.004
 third+0.003
Neg logit contributions
ment-0.009
edly-0.008
isher-0.008
anged-0.008
 thirst-0.008
Token ch
Token ID442
Feature activation+1.49e+00

Pos logit contributions
Beck+0.007
 Anne+0.007
Anne+0.007
 great+0.006
 third+0.006
Neg logit contributions
ment-0.014
edly-0.014
 thirst-0.013
isher-0.013
anged-0.013
Tokenocol
Token ID4668
Feature activation+6.69e+00

Pos logit contributions
edly+0.019
isher+0.017
 thirst+0.017
ment+0.016
anged+0.015
Neg logit contributions
 big-0.040
 great-0.039
 different-0.039
 ten-0.038
 out-0.038
Tokenates
Token ID689
Feature activation-4.44e-01

Pos logit contributions
 thirst+0.154
isher+0.149
ment+0.145
edly+0.144
anged+0.143
Neg logit contributions
 ten-0.101
Beck-0.100
 out-0.097
Anne-0.097
 Anne-0.096
Token,
Token ID11
Feature activation-1.94e-01

Pos logit contributions
 Anne+0.005
Beck+0.004
Anne+0.004
 great+0.004
 different+0.004
Neg logit contributions
ment-0.013
isher-0.013
edly-0.013
anged-0.013
 thirst-0.012
Token g
Token ID308
Feature activation+3.13e+00

Pos logit contributions
 Anne+0.003
Beck+0.003
Anne+0.003
 great+0.003
 neat+0.002
Neg logit contributions
ment-0.005
edly-0.005
isher-0.005
 thirst-0.005
anged-0.005
Tokenummies
Token ID39578
Feature activation-2.41e+00

Pos logit contributions
isher+0.054
ment+0.051
edly+0.051
anged+0.050
 thirst+0.050
Neg logit contributions
 Anne-0.070
 great-0.070
 big-0.069
 neat-0.068
 different-0.067
Token,
Token ID11
Feature activation-8.62e-01

Pos logit contributions
 Anne+0.028
Beck+0.027
Anne+0.027
 great+0.026
 third+0.025
Neg logit contributions
ment-0.069
edly-0.067
isher-0.067
anged-0.065
 thirst-0.065
Token a
Token ID257
Feature activation-1.52e-01

Pos logit contributions
ment+0.080
isher+0.078
edly+0.078
anged+0.077
 thirst+0.076
Neg logit contributions
 Anne-0.028
 great-0.026
Beck-0.026
Anne-0.026
 neat-0.025
Token stroke
Token ID14000
Feature activation+3.16e+00

Pos logit contributions
 Anne+0.002
Anne+0.002
Beck+0.002
 overcome+0.002
 third+0.002
Neg logit contributions
 thirst-0.004
isher-0.004
ment-0.004
edly-0.004
anged-0.004
Token.
Token ID13
Feature activation+1.26e+00

Pos logit contributions
isher+0.091
anged+0.089
ment+0.089
edly+0.089
 thirst+0.087
Neg logit contributions
 Anne-0.036
 out-0.033
 big-0.033
 different-0.033
 great-0.032
Token As
Token ID1081
Feature activation-5.30e-01

Pos logit contributions
ment+0.032
edly+0.031
isher+0.030
anged+0.030
 thirst+0.030
Neg logit contributions
 great-0.018
 Anne-0.018
Beck-0.018
 neat-0.017
 different-0.017
Token she
Token ID673
Feature activation+1.51e+00

Pos logit contributions
 Anne+0.007
Beck+0.007
Anne+0.007
 great+0.007
 neat+0.007
Neg logit contributions
ment-0.014
isher-0.013
edly-0.013
 thirst-0.013
anged-0.013
Token stro
Token ID9660
Feature activation+6.60e+00

Pos logit contributions
ment+0.037
isher+0.037
edly+0.036
anged+0.035
 thirst+0.035
Neg logit contributions
 Anne-0.024
 great-0.023
Beck-0.022
 different-0.022
Anne-0.022
Tokenked
Token ID9091
Feature activation+3.60e-01

Pos logit contributions
isher+0.069
ment+0.064
 thirst+0.056
anged+0.053
edly+0.052
Neg logit contributions
 out-0.202
 Anne-0.199
Beck-0.196
Anne-0.193
 her-0.192
Token him
Token ID683
Feature activation+2.50e+00

Pos logit contributions
ment+0.011
edly+0.011
 thirst+0.010
anged+0.010
isher+0.010
Neg logit contributions
Beck-0.004
Anne-0.003
 Anne-0.003
 neat-0.003
 overcome-0.003
Token,
Token ID11
Feature activation+1.47e-01

Pos logit contributions
ment+0.073
edly+0.072
isher+0.070
anged+0.070
 thirst+0.069
Neg logit contributions
 Anne-0.027
Beck-0.027
Anne-0.026
 great-0.025
 neat-0.025
Token the
Token ID262
Feature activation+2.33e+00

Pos logit contributions
ment+0.004
edly+0.004
isher+0.004
 thirst+0.004
anged+0.004
Neg logit contributions
 Anne-0.001
Beck-0.001
Anne-0.001
 great-0.001
 neat-0.001
Token horse
Token ID8223
Feature activation+9.99e-01

Pos logit contributions
ment+0.060
isher+0.060
anged+0.059
edly+0.059
 thirst+0.056
Neg logit contributions
 Anne-0.034
 great-0.033
 big-0.032
 different-0.032
 third-0.031
Token to
Token ID284
Feature activation+1.09e+00

Pos logit contributions
ment+0.029
edly+0.029
anged+0.026
 thirst+0.026
isher+0.026
Neg logit contributions
Beck-0.017
Anne-0.017
 Anne-0.017
 great-0.016
Alice-0.015
Token Rose
Token ID8049
Feature activation+3.53e-01

Pos logit contributions
ment+0.035
edly+0.034
 thirst+0.034
anged+0.034
isher+0.033
Neg logit contributions
 Anne-0.008
Anne-0.007
Beck-0.007
 great-0.007
 neat-0.007
Token.
Token ID13
Feature activation+9.39e-01

Pos logit contributions
ment+0.009
edly+0.009
anged+0.009
isher+0.009
 thirst+0.009
Neg logit contributions
 Anne-0.005
Anne-0.005
Beck-0.005
 great-0.005
 neat-0.005
Token Rose
Token ID8049
Feature activation-1.28e+00

Pos logit contributions
ment+0.026
edly+0.026
isher+0.025
anged+0.025
 thirst+0.024
Neg logit contributions
 Anne-0.011
 great-0.011
Beck-0.011
Anne-0.011
 neat-0.011
Token gently
Token ID15165
Feature activation+2.25e+00

Pos logit contributions
 Anne+0.023
 great+0.022
 neat+0.022
Beck+0.021
Anne+0.021
Neg logit contributions
edly-0.028
ment-0.028
isher-0.027
anged-0.027
 thirst-0.027
Token stro
Token ID9660
Feature activation+6.60e+00

Pos logit contributions
ment+0.049
edly+0.048
anged+0.047
isher+0.047
 thirst+0.046
Neg logit contributions
 Anne-0.042
Anne-0.040
 great-0.040
Beck-0.040
 neat-0.039
Tokenked
Token ID9091
Feature activation+7.82e-01

Pos logit contributions
isher+0.067
ment+0.063
 thirst+0.053
anged+0.053
edly+0.050
Neg logit contributions
 Anne-0.199
 out-0.198
Beck-0.196
 different-0.194
 third-0.194
Token the
Token ID262
Feature activation+1.33e-01

Pos logit contributions
ment+0.023
edly+0.022
 thirst+0.021
anged+0.020
isher+0.020
Neg logit contributions
Anne-0.009
Beck-0.009
 Anne-0.009
 neat-0.008
 great-0.008
Token turkey
Token ID26876
Feature activation+7.28e-01

Pos logit contributions
ment+0.004
edly+0.004
isher+0.004
 thirst+0.003
anged+0.003
Neg logit contributions
 Anne-0.002
Anne-0.002
Beck-0.002
 great-0.002
 neat-0.002
Token's
Token ID338
Feature activation+3.26e+00

Pos logit contributions
edly+0.020
ment+0.020
 thirst+0.020
anged+0.019
isher+0.019
Neg logit contributions
 Anne-0.009
Anne-0.008
Beck-0.008
 great-0.008
Alice-0.008
Token feathers
Token ID27400
Feature activation+1.38e+00

Pos logit contributions
isher+0.073
ment+0.070
anged+0.070
edly+0.068
 thirst+0.065
Neg logit contributions
 Anne-0.059
 big-0.058
 great-0.058
 third-0.057
 different-0.057
Token a
Token ID257
Feature activation-3.61e+00

Pos logit contributions
 Anne+0.003
Beck+0.003
Anne+0.003
 great+0.003
 third+0.003
Neg logit contributions
ment-0.011
isher-0.010
edly-0.010
 thirst-0.010
anged-0.010
Token large
Token ID1588
Feature activation-1.46e+00

Pos logit contributions
 Anne+0.052
Anne+0.051
Beck+0.051
 overcome+0.045
 June+0.043
Neg logit contributions
 thirst-0.091
isher-0.090
ment-0.086
edly-0.084
anged-0.082
Token selection
Token ID6356
Feature activation+2.16e+00

Pos logit contributions
 Anne+0.023
Beck+0.023
Anne+0.023
 overcome+0.021
 third+0.020
Neg logit contributions
isher-0.036
ment-0.035
 thirst-0.034
edly-0.033
anged-0.033
Token of
Token ID286
Feature activation-1.10e+00

Pos logit contributions
ment+0.048
isher+0.048
edly+0.047
anged+0.045
 thirst+0.044
Neg logit contributions
 Anne-0.040
Beck-0.038
Anne-0.038
 great-0.037
 third-0.036
Token ch
Token ID442
Feature activation+1.77e+00

Pos logit contributions
 Anne+0.019
Beck+0.019
Anne+0.019
 overcome+0.017
 third+0.017
Neg logit contributions
isher-0.025
ment-0.025
 thirst-0.024
anged-0.023
edly-0.023
Tokenocol
Token ID4668
Feature activation+6.58e+00

Pos logit contributions
edly+0.018
 thirst+0.016
ment+0.016
isher+0.014
outed+0.014
Neg logit contributions
 big-0.053
 different-0.053
 great-0.052
 matching-0.051
 out-0.051
Tokenates
Token ID689
Feature activation+5.02e-01

Pos logit contributions
 thirst+0.145
edly+0.138
isher+0.138
ment+0.136
anged+0.135
Neg logit contributions
Beck-0.111
 Anne-0.108
Anne-0.107
 ten-0.105
 out-0.105
Token.
Token ID13
Feature activation-3.31e-01

Pos logit contributions
ment+0.013
isher+0.013
edly+0.013
anged+0.013
 thirst+0.012
Neg logit contributions
 Anne-0.007
Beck-0.007
 great-0.007
Anne-0.007
 different-0.006
Token Ell
Token ID7122
Feature activation-9.97e-02

Pos logit contributions
 Anne+0.005
Beck+0.005
 great+0.005
Anne+0.004
 neat+0.004
Neg logit contributions
ment-0.008
edly-0.008
isher-0.008
anged-0.008
 thirst-0.008
Tokena
Token ID64
Feature activation-9.51e-01

Pos logit contributions
 Anne+0.001
Anne+0.001
's+0.001
Beck+0.001
 different+0.001
Neg logit contributions
ment-0.003
isher-0.003
 thirst-0.003
edly-0.003
anged-0.003
Token picked
Token ID6497
Feature activation+1.92e-01

Pos logit contributions
 Anne+0.015
Beck+0.015
 great+0.015
Anne+0.015
 neat+0.014
Neg logit contributions
edly-0.022
ment-0.022
isher-0.022
anged-0.022
 thirst-0.021
Token he
Token ID339
Feature activation+5.99e-01

Pos logit contributions
 Anne+0.001
Beck+0.001
Anne+0.001
 great+0.001
 different+0.001
Neg logit contributions
ment-0.002
isher-0.002
edly-0.002
 thirst-0.002
anged-0.002
Token ate
Token ID15063
Feature activation-1.44e-02

Pos logit contributions
ment+0.013
isher+0.013
edly+0.013
anged+0.013
 thirst+0.012
Neg logit contributions
 Anne-0.011
Beck-0.010
 great-0.010
Anne-0.010
 different-0.010
Token all
Token ID477
Feature activation-8.88e-01

Pos logit contributions
 Anne+0.000
Anne+0.000
Beck+0.000
 great+0.000
 third+0.000
Neg logit contributions
ment-0.000
isher-0.000
edly-0.000
 thirst-0.000
anged-0.000
Token the
Token ID262
Feature activation-5.03e-01

Pos logit contributions
 Anne+0.007
 great+0.006
Beck+0.006
Anne+0.006
 different+0.006
Neg logit contributions
ment-0.029
isher-0.029
edly-0.029
anged-0.028
 thirst-0.028
Token ch
Token ID442
Feature activation+1.42e+00

Pos logit contributions
 Anne+0.009
Anne+0.008
Beck+0.008
 June+0.008
 overcome+0.007
Neg logit contributions
ment-0.012
isher-0.012
 thirst-0.011
edly-0.011
anged-0.011
Tokenocol
Token ID4668
Feature activation+6.48e+00

Pos logit contributions
edly+0.019
idents+0.018
 thirst+0.016
outed+0.016
isher+0.016
Neg logit contributions
 big-0.034
 many-0.033
 different-0.033
 her-0.033
 great-0.033
Tokenates
Token ID689
Feature activation-1.86e-01

Pos logit contributions
 thirst+0.147
isher+0.143
anged+0.138
edly+0.136
outed+0.135
Neg logit contributions
Beck-0.105
Anne-0.102
 Anne-0.101
 ten-0.096
 out-0.095
Token and
Token ID290
Feature activation+1.98e+00

Pos logit contributions
 Anne+0.003
Beck+0.002
Anne+0.002
 great+0.002
 third+0.002
Neg logit contributions
ment-0.005
edly-0.005
isher-0.005
anged-0.005
 thirst-0.005
Token sweets
Token ID42402
Feature activation+2.93e-01

Pos logit contributions
isher+0.049
ment+0.048
anged+0.047
edly+0.046
 thirst+0.045
Neg logit contributions
 Anne-0.032
 great-0.030
 big-0.030
 different-0.030
 neat-0.030
Token.
Token ID13
Feature activation-1.70e-01

Pos logit contributions
ment+0.008
isher+0.008
edly+0.008
anged+0.008
 thirst+0.008
Neg logit contributions
 Anne-0.004
 great-0.004
 different-0.003
Beck-0.003
Anne-0.003
Token Then
Token ID3244
Feature activation-1.28e+00

Pos logit contributions
 Anne+0.003
 great+0.003
Anne+0.003
 different+0.003
Beck+0.003
Neg logit contributions
isher-0.004
ment-0.004
edly-0.004
anged-0.004
 thirst-0.004
Token a
Token ID257
Feature activation-1.77e+00

Pos logit contributions
ment+0.002
isher+0.002
edly+0.002
anged+0.002
 thirst+0.002
Neg logit contributions
 Anne-0.001
 great-0.001
 different-0.001
Beck-0.001
 big-0.001
Token design
Token ID1486
Feature activation+9.69e-01

Pos logit contributions
 Anne+0.024
Anne+0.024
Beck+0.021
Alice+0.021
 Alice+0.021
Neg logit contributions
 thirst-0.044
isher-0.042
ro-0.041
 habitat-0.040
anged-0.039
Token,
Token ID11
Feature activation-3.04e-01

Pos logit contributions
ment+0.026
isher+0.026
edly+0.026
anged+0.025
 thirst+0.025
Neg logit contributions
 Anne-0.013
Beck-0.012
Anne-0.012
 great-0.012
 different-0.011
Token when
Token ID618
Feature activation-1.40e+00

Pos logit contributions
 Anne+0.004
Beck+0.003
Anne+0.003
 great+0.003
 third+0.003
Neg logit contributions
ment-0.009
isher-0.009
edly-0.008
anged-0.008
 thirst-0.008
Token in
Token ID287
Feature activation-6.22e-01

Pos logit contributions
 Anne+0.019
 great+0.018
 different+0.017
 big+0.017
 third+0.017
Neg logit contributions
isher-0.038
ment-0.038
edly-0.037
anged-0.036
 thirst-0.036
Token stro
Token ID9660
Feature activation+6.41e+00

Pos logit contributions
 Anne+0.003
 big+0.003
 different+0.003
 great+0.003
 third+0.002
Neg logit contributions
edly-0.023
isher-0.023
ment-0.022
anged-0.022
 thirst-0.022
Tokende
Token ID2934
Feature activation+3.30e+00

Pos logit contributions
isher+0.133
 thirst+0.115
ment+0.115
edly+0.114
anged+0.114
Neg logit contributions
 out-0.138
 big-0.135
 different-0.128
's-0.125
 great-0.124
Token a
Token ID257
Feature activation-2.05e+00

Pos logit contributions
isher+0.089
ment+0.088
edly+0.087
anged+0.086
 thirst+0.085
Neg logit contributions
 Anne-0.045
 great-0.041
 different-0.041
 big-0.040
Anne-0.040
Token small
Token ID1402
Feature activation+1.18e+00

Pos logit contributions
 Anne+0.024
Anne+0.022
Beck+0.021
 Alice+0.020
 out+0.019
Neg logit contributions
 thirst-0.056
ment-0.056
isher-0.056
edly-0.054
anged-0.054
Token rabbit
Token ID22746
Feature activation+6.76e-01

Pos logit contributions
ment+0.030
isher+0.030
edly+0.029
anged+0.029
 thirst+0.028
Neg logit contributions
 Anne-0.018
 great-0.017
Beck-0.016
Anne-0.016
 different-0.016
Token.
Token ID13
Feature activation+4.49e-01

Pos logit contributions
ment+0.018
anged+0.018
isher+0.018
edly+0.018
 thirst+0.018
Neg logit contributions
 Anne-0.008
 great-0.008
Anne-0.008
Beck-0.008
 neat-0.008
Token Jack
Token ID3619
Feature activation-2.41e+00

Pos logit contributions
ment+0.031
 thirst+0.029
edly+0.029
isher+0.029
anged+0.028
Neg logit contributions
Anne-0.017
 great-0.017
 neat-0.017
 Anne-0.017
Beck-0.016
Token.
Token ID13
Feature activation+8.08e-01

Pos logit contributions
 Anne+0.033
 great+0.032
Beck+0.031
Anne+0.031
 different+0.030
Neg logit contributions
ment-0.063
edly-0.062
isher-0.061
anged-0.061
 thirst-0.060
Token\n
Token ID198
Feature activation+1.55e+00

Pos logit contributions
ment+0.021
edly+0.020
 thirst+0.019
anged+0.019
elt+0.019
Neg logit contributions
 great-0.011
 neat-0.011
 Anne-0.010
 best-0.010
Anne-0.010
TokenM
Token ID44
Feature activation+2.83e+00

Pos logit contributions
ment+0.037
edly+0.035
ipt+0.033
 thirst+0.033
outed+0.033
Neg logit contributions
 best-0.023
 neat-0.022
 merry-0.022
 calendar-0.022
 great-0.022
Tokenummy
Token ID13513
Feature activation+3.81e-01

Pos logit contributions
isher+0.056
edly+0.055
ment+0.054
 thirst+0.053
anged+0.053
Neg logit contributions
 Anne-0.056
 big-0.054
 great-0.054
Beck-0.052
 third-0.052
Token stro
Token ID9660
Feature activation+6.41e+00

Pos logit contributions
ment+0.008
edly+0.008
isher+0.008
anged+0.008
 thirst+0.007
Neg logit contributions
 Anne-0.007
 great-0.007
 neat-0.007
Anne-0.007
Beck-0.007
Tokenked
Token ID9091
Feature activation+6.59e-01

Pos logit contributions
ment+0.063
isher+0.061
 thirst+0.055
anged+0.054
edly+0.053
Neg logit contributions
 Anne-0.192
Beck-0.190
 out-0.188
 big-0.188
 third-0.188
Token her
Token ID607
Feature activation+3.82e+00

Pos logit contributions
ment+0.019
edly+0.019
 thirst+0.018
isher+0.018
anged+0.017
Neg logit contributions
Anne-0.007
 Anne-0.007
Beck-0.007
 great-0.007
 neat-0.007
Token chin
Token ID22531
Feature activation+1.93e+00

Pos logit contributions
ment+0.070
edly+0.069
isher+0.068
 thirst+0.066
anged+0.064
Neg logit contributions
 Anne-0.084
Anne-0.083
Beck-0.083
 great-0.079
 different-0.078
Token.
Token ID13
Feature activation-2.71e-02

Pos logit contributions
ment+0.052
edly+0.052
isher+0.051
anged+0.050
 thirst+0.050
Neg logit contributions
 Anne-0.025
Anne-0.024
 great-0.024
Beck-0.024
 different-0.022
Token "
Token ID366
Feature activation+2.52e-01

Pos logit contributions
 great+0.000
 neat+0.000
 best+0.000
 merry+0.000
Anne+0.000
Neg logit contributions
ment-0.001
edly-0.001
elt-0.001
 thirst-0.001
outed-0.001
Token The
Token ID383
Feature activation+1.37e-01

Pos logit contributions
 Anne+0.008
 great+0.008
Beck+0.008
 neat+0.008
Anne+0.008
Neg logit contributions
ment-0.020
edly-0.020
isher-0.020
anged-0.019
 thirst-0.019
Token little
Token ID1310
Feature activation+9.13e-01

Pos logit contributions
anged+0.004
ment+0.004
isher+0.004
edly+0.004
outed+0.004
Neg logit contributions
 big-0.002
 great-0.002
 different-0.002
 third-0.002
 best-0.002
Token lamb
Token ID19343
Feature activation+1.93e+00

Pos logit contributions
anged+0.023
ment+0.022
isher+0.022
edly+0.022
outed+0.021
Neg logit contributions
 big-0.015
 different-0.015
 out-0.015
 great-0.014
 best-0.014
Token watched
Token ID7342
Feature activation+5.47e-02

Pos logit contributions
ment+0.043
edly+0.043
isher+0.043
anged+0.042
 thirst+0.042
Neg logit contributions
 Anne-0.033
 great-0.032
Beck-0.032
Anne-0.032
 neat-0.031
Token him
Token ID683
Feature activation+6.97e-01

Pos logit contributions
ment+0.001
edly+0.001
 thirst+0.001
anged+0.001
isher+0.001
Neg logit contributions
Anne-0.001
Beck-0.001
 Anne-0.001
 great-0.001
 neat-0.001
Token int
Token ID493
Feature activation+6.39e+00

Pos logit contributions
ment+0.017
edly+0.017
anged+0.016
 thirst+0.016
isher+0.016
Neg logit contributions
 Anne-0.011
Beck-0.011
Anne-0.011
Alice-0.010
 great-0.010
Tokenently
Token ID1473
Feature activation-1.68e-02

Pos logit contributions
isher+0.156
 thirst+0.155
anged+0.154
edly+0.152
ro+0.148
Neg logit contributions
 best-0.093
 Anne-0.091
 overcome-0.090
 her-0.089
 out-0.089
Token,
Token ID11
Feature activation+3.79e-01

Pos logit contributions
 Anne+0.000
Anne+0.000
Beck+0.000
 great+0.000
 third+0.000
Neg logit contributions
ment-0.000
edly-0.000
anged-0.000
isher-0.000
 thirst-0.000
Token and
Token ID290
Feature activation+1.66e+00

Pos logit contributions
ment+0.010
edly+0.010
isher+0.010
anged+0.009
 thirst+0.009
Neg logit contributions
 Anne-0.005
Beck-0.005
 great-0.005
Anne-0.005
 third-0.005
Token when
Token ID618
Feature activation-1.75e+00

Pos logit contributions
ment+0.043
edly+0.042
isher+0.042
anged+0.041
 thirst+0.041
Neg logit contributions
 Anne-0.024
 great-0.023
Beck-0.023
Anne-0.022
 neat-0.022
Token the
Token ID262
Feature activation+1.24e-01

Pos logit contributions
 Anne+0.029
Anne+0.029
Beck+0.029
 neat+0.028
 matching+0.027
Neg logit contributions
ment-0.039
 thirst-0.037
edly-0.036
anged-0.035
isher-0.034
Token looking
Token ID2045
Feature activation-1.15e+00

Pos logit contributions
 Anne+0.012
Beck+0.011
Anne+0.010
 great+0.010
 neat+0.010
Neg logit contributions
ment-0.034
edly-0.033
isher-0.033
anged-0.032
 thirst-0.032
Token at
Token ID379
Feature activation+6.58e-01

Pos logit contributions
Anne+0.022
 Anne+0.021
Beck+0.020
Alice+0.020
 tag+0.019
Neg logit contributions
ment-0.027
isher-0.025
anged-0.024
 thirst-0.024
edly-0.024
Token all
Token ID477
Feature activation-1.11e+00

Pos logit contributions
ment+0.021
isher+0.021
edly+0.020
anged+0.020
 thirst+0.020
Neg logit contributions
 Anne-0.006
Beck-0.005
Anne-0.005
 great-0.005
 neat-0.005
Token the
Token ID262
Feature activation-4.68e-01

Pos logit contributions
Beck+0.011
 Anne+0.010
Anne+0.010
 third+0.009
 great+0.009
Neg logit contributions
ment-0.034
isher-0.034
anged-0.032
edly-0.031
 thirst-0.031
Token ch
Token ID442
Feature activation+1.54e+00

Pos logit contributions
 Anne+0.008
Beck+0.008
Anne+0.008
 June+0.008
 overcome+0.007
Neg logit contributions
isher-0.010
ment-0.010
 thirst-0.010
anged-0.009
outed-0.009
Tokenocol
Token ID4668
Feature activation+6.38e+00

Pos logit contributions
edly+0.018
 thirst+0.014
isher+0.014
ment+0.013
anged+0.012
Neg logit contributions
 big-0.043
 great-0.042
 different-0.042
 neat-0.042
 out-0.042
Tokenates
Token ID689
Feature activation+2.33e-01

Pos logit contributions
 thirst+0.142
isher+0.138
edly+0.135
ment+0.133
anged+0.132
Neg logit contributions
Beck-0.104
Anne-0.103
 Anne-0.103
 ten-0.096
 out-0.096
Token,
Token ID11
Feature activation-1.91e-01

Pos logit contributions
ment+0.007
isher+0.007
edly+0.007
anged+0.007
 thirst+0.007
Neg logit contributions
 Anne-0.002
Beck-0.002
Anne-0.002
 great-0.002
 third-0.002
Token Lex
Token ID17210
Feature activation+4.78e-01

Pos logit contributions
 Anne+0.002
Anne+0.002
Beck+0.002
 neat+0.002
 great+0.002
Neg logit contributions
ment-0.005
edly-0.005
isher-0.005
 thirst-0.005
anged-0.005
Tokeni
Token ID72
Feature activation+8.12e-01

Pos logit contributions
isher+0.007
edly+0.007
anged+0.007
ment+0.006
 thirst+0.006
Neg logit contributions
 Anne-0.012
Beck-0.012
 different-0.012
 big-0.012
 great-0.012
Token chose
Token ID7690
Feature activation+1.53e+00

Pos logit contributions
ment+0.019
isher+0.018
edly+0.018
anged+0.018
 thirst+0.018
Neg logit contributions
 Anne-0.014
 great-0.013
Beck-0.013
Anne-0.013
 different-0.013
Token a
Token ID257
Feature activation-1.45e+00

Pos logit contributions
isher+0.004
anged+0.004
ment+0.004
edly+0.004
outed+0.004
Neg logit contributions
 big-0.000
 Anne-0.000
 great-0.000
 different-0.000
 third-0.000
Token small
Token ID1402
Feature activation+5.76e-01

Pos logit contributions
 Anne+0.015
Anne+0.013
Beck+0.012
 Alice+0.011
 June+0.011
Neg logit contributions
 thirst-0.042
isher-0.042
ment-0.041
edly-0.040
anged-0.040
Token box
Token ID3091
Feature activation+1.15e+00

Pos logit contributions
ment+0.014
isher+0.014
edly+0.014
anged+0.014
 thirst+0.014
Neg logit contributions
 Anne-0.009
Anne-0.008
Beck-0.008
 great-0.008
 third-0.008
Token with
Token ID351
Feature activation+1.20e+00

Pos logit contributions
ment+0.035
isher+0.035
edly+0.035
anged+0.034
 thirst+0.034
Neg logit contributions
 Anne-0.011
 different-0.010
 big-0.010
 great-0.010
 neat-0.010
Token other
Token ID584
Feature activation-6.80e-01

Pos logit contributions
ment+0.039
isher+0.038
edly+0.038
anged+0.037
 thirst+0.037
Neg logit contributions
 Anne-0.010
 great-0.009
Beck-0.009
Anne-0.009
 different-0.008
Token therm
Token ID21969
Feature activation+6.35e+00

Pos logit contributions
 Anne+0.012
Beck+0.011
Anne+0.011
 great+0.010
 third+0.010
Neg logit contributions
isher-0.016
ment-0.016
edly-0.015
 thirst-0.015
anged-0.015
Tokenometers
Token ID40077
Feature activation-1.01e+00

Pos logit contributions
ited+0.135
 thirst+0.125
isher+0.119
ised+0.114
ment+0.112
Neg logit contributions
 out-0.132
 big-0.130
 many-0.126
 best-0.120
 Anne-0.115
Token.
Token ID13
Feature activation-2.96e-01

Pos logit contributions
 Anne+0.013
 great+0.012
 different+0.012
 big+0.012
 neat+0.012
Neg logit contributions
ment-0.028
isher-0.028
edly-0.027
anged-0.027
 thirst-0.027
Token But
Token ID887
Feature activation+1.13e+00

Pos logit contributions
 Anne+0.005
Beck+0.004
 great+0.004
Anne+0.004
 third+0.004
Neg logit contributions
ment-0.007
isher-0.007
edly-0.007
anged-0.007
 thirst-0.007
Token Tim
Token ID5045
Feature activation-1.57e+00

Pos logit contributions
ment+0.034
isher+0.034
edly+0.034
anged+0.033
uc+0.031
Neg logit contributions
 different-0.013
 Anne-0.013
 great-0.012
 big-0.012
 neat-0.012
Tokenmy
Token ID1820
Feature activation-5.17e-01

Pos logit contributions
 Anne+0.027
Beck+0.025
Anne+0.025
 great+0.025
 third+0.025
Neg logit contributions
isher-0.036
ment-0.036
anged-0.035
edly-0.035
 thirst-0.035
Token reached
Token ID4251
Feature activation+1.38e+00

Pos logit contributions
 Anne+0.037
Beck+0.036
Anne+0.035
 great+0.035
 neat+0.034
Neg logit contributions
edly-0.050
ment-0.049
anged-0.048
 thirst-0.048
isher-0.047
Token out
Token ID503
Feature activation-3.89e-01

Pos logit contributions
ment+0.041
 thirst+0.040
anged+0.039
isher+0.039
edly+0.038
Neg logit contributions
Anne-0.014
Beck-0.014
Alice-0.012
 great-0.012
 neat-0.012
Token his
Token ID465
Feature activation+2.83e+00

Pos logit contributions
Beck+0.006
Anne+0.005
 great+0.005
 Anne+0.005
Alice+0.005
Neg logit contributions
ment-0.010
edly-0.010
anged-0.009
isher-0.009
 thirst-0.009
Token hand
Token ID1021
Feature activation+2.71e+00

Pos logit contributions
ment+0.059
isher+0.058
anged+0.057
 thirst+0.057
edly+0.057
Neg logit contributions
 Anne-0.055
Beck-0.053
Anne-0.053
 great-0.051
 neat-0.050
Token and
Token ID290
Feature activation+2.16e+00

Pos logit contributions
ment+0.076
edly+0.075
isher+0.075
anged+0.074
 thirst+0.072
Neg logit contributions
 Anne-0.033
 great-0.031
Beck-0.031
Anne-0.031
 different-0.030
Token stro
Token ID9660
Feature activation+6.35e+00

Pos logit contributions
ment+0.050
isher+0.050
edly+0.049
anged+0.048
 thirst+0.047
Neg logit contributions
 Anne-0.038
 great-0.035
Beck-0.035
 different-0.035
Anne-0.035
Tokenked
Token ID9091
Feature activation+6.96e-01

Pos logit contributions
isher+0.069
ment+0.060
edly+0.052
anged+0.051
 thirst+0.050
Neg logit contributions
 out-0.192
 Anne-0.185
 big-0.184
 third-0.184
 different-0.183
Token a
Token ID257
Feature activation-2.16e+00

Pos logit contributions
ment+0.021
edly+0.020
 thirst+0.020
isher+0.020
anged+0.019
Neg logit contributions
 Anne-0.008
Anne-0.007
Beck-0.007
 great-0.007
 neat-0.007
Token white
Token ID2330
Feature activation+2.11e+00

Pos logit contributions
 Anne+0.023
 great+0.020
Anne+0.020
Beck+0.020
 different+0.020
Neg logit contributions
ment-0.065
isher-0.064
edly-0.063
anged-0.062
 thirst-0.062
Token sheep
Token ID15900
Feature activation+1.05e-01

Pos logit contributions
isher+0.058
anged+0.057
outed+0.055
edly+0.054
ment+0.054
Neg logit contributions
 big-0.029
 Anne-0.028
 different-0.028
 out-0.027
 tag-0.027
Token.
Token ID13
Feature activation-7.28e-01

Pos logit contributions
ment+0.003
isher+0.003
edly+0.003
anged+0.003
 thirst+0.003
Neg logit contributions
 Anne-0.002
 great-0.001
Beck-0.001
Anne-0.001
 different-0.001
Token Tom
Token ID4186
Feature activation-2.49e+00

Pos logit contributions
ment+0.051
isher+0.050
edly+0.049
anged+0.049
 thirst+0.048
Neg logit contributions
 Anne-0.023
Beck-0.021
 great-0.021
Anne-0.021
 neat-0.020
Token ran
Token ID4966
Feature activation-4.62e-01

Pos logit contributions
 Anne+0.048
Anne+0.046
Beck+0.045
 great+0.045
 neat+0.044
Neg logit contributions
edly-0.052
anged-0.052
ment-0.051
 thirst-0.050
isher-0.050
Token to
Token ID284
Feature activation+1.05e+00

Pos logit contributions
Beck+0.007
 Anne+0.007
Anne+0.007
 third+0.007
 neat+0.007
Neg logit contributions
edly-0.011
anged-0.011
ment-0.011
isher-0.010
 thirst-0.010
Token Rex
Token ID17853
Feature activation+8.77e-02

Pos logit contributions
anged+0.030
ment+0.029
 thirst+0.029
isher+0.028
edly+0.028
Neg logit contributions
Beck-0.011
Anne-0.010
 Anne-0.010
 neat-0.010
 great-0.009
Token and
Token ID290
Feature activation+1.70e+00

Pos logit contributions
anged+0.002
edly+0.002
ment+0.002
 thirst+0.002
isher+0.002
Neg logit contributions
 great-0.001
Beck-0.001
Anne-0.001
 Anne-0.001
 neat-0.001
Token stro
Token ID9660
Feature activation+6.34e+00

Pos logit contributions
ment+0.040
edly+0.040
anged+0.039
isher+0.039
 thirst+0.039
Neg logit contributions
 Anne-0.027
 great-0.026
Beck-0.026
Anne-0.026
 neat-0.025
Tokenked
Token ID9091
Feature activation+9.51e-01

Pos logit contributions
isher+0.063
ment+0.058
 thirst+0.051
edly+0.050
anged+0.047
Neg logit contributions
 out-0.196
 Anne-0.195
 big-0.189
 different-0.189
 third-0.188
Token his
Token ID465
Feature activation+3.32e+00

Pos logit contributions
ment+0.028
edly+0.027
isher+0.027
anged+0.026
 thirst+0.026
Neg logit contributions
 Anne-0.011
Beck-0.010
Anne-0.010
 great-0.010
 neat-0.009
Token fur
Token ID9230
Feature activation+1.86e+00

Pos logit contributions
isher+0.076
ment+0.073
anged+0.072
edly+0.070
 thirst+0.068
Neg logit contributions
 big-0.059
 Anne-0.058
 great-0.058
 different-0.057
 third-0.057
Token.
Token ID13
Feature activation-2.66e-01

Pos logit contributions
isher+0.054
ment+0.052
anged+0.052
 thirst+0.050
edly+0.050
Neg logit contributions
 Anne-0.021
 different-0.021
 big-0.020
 out-0.020
 third-0.019
Token Rex
Token ID17853
Feature activation-8.93e-01

Pos logit contributions
 Anne+0.004
Beck+0.004
 great+0.004
Anne+0.004
 neat+0.004
Neg logit contributions
ment-0.006
edly-0.006
anged-0.006
isher-0.006
 thirst-0.006
Token away
Token ID1497
Feature activation+3.97e-01

Pos logit contributions
ment+0.045
isher+0.044
edly+0.044
anged+0.043
 thirst+0.042
Neg logit contributions
 Anne-0.022
 great-0.021
Beck-0.020
 different-0.020
Anne-0.020
Token!
Token ID0
Feature activation-2.47e+00

Pos logit contributions
ment+0.010
isher+0.010
edly+0.010
anged+0.009
 thirst+0.009
Neg logit contributions
 Anne-0.006
Beck-0.006
Anne-0.006
 third-0.005
 great-0.005
Token You
Token ID921
Feature activation+6.38e-01

Pos logit contributions
 Anne+0.035
 great+0.031
 out+0.030
Anne+0.030
 big+0.030
Neg logit contributions
ment-0.067
isher-0.066
edly-0.066
 thirst-0.064
anged-0.062
Token are
Token ID389
Feature activation-1.64e+00

Pos logit contributions
ment+0.021
edly+0.020
isher+0.020
anged+0.019
 thirst+0.019
Neg logit contributions
 big-0.006
 great-0.006
 out-0.006
 different-0.006
 Anne-0.006
Token not
Token ID407
Feature activation-1.29e+00

Pos logit contributions
 Anne+0.024
Beck+0.024
Anne+0.024
 third+0.022
 neat+0.021
Neg logit contributions
 thirst-0.039
isher-0.039
edly-0.038
anged-0.038
ment-0.037
Token a
Token ID257
Feature activation-6.68e+00

Pos logit contributions
 Anne+0.016
Beck+0.015
Anne+0.015
 third+0.014
 neat+0.013
Neg logit contributions
ment-0.035
isher-0.035
 thirst-0.035
anged-0.034
edly-0.034
Token pirate
Token ID25868
Feature activation-1.84e+00

Pos logit contributions
 Anne+0.135
Anne+0.126
 June+0.115
 Higher+0.112
 Alice+0.109
Neg logit contributions
isher-0.148
 thirst-0.136
edly-0.130
der-0.130
ory-0.128
Token!"
Token ID2474
Feature activation-1.77e+00

Pos logit contributions
 Anne+0.023
Beck+0.021
 great+0.021
Anne+0.021
 different+0.020
Neg logit contributions
ment-0.051
isher-0.051
edly-0.050
anged-0.049
 thirst-0.049
Token Ben
Token ID3932
Feature activation-1.98e+00

Pos logit contributions
 Anne+0.026
Anne+0.026
Beck+0.026
 great+0.026
 neat+0.025
Neg logit contributions
ment-0.043
anged-0.042
 thirst-0.041
edly-0.041
isher-0.040
Token said
Token ID531
Feature activation+8.55e-01

Pos logit contributions
 Anne+0.042
Beck+0.041
 great+0.041
Anne+0.040
 neat+0.040
Neg logit contributions
ment-0.036
edly-0.036
anged-0.035
isher-0.035
 thirst-0.034
Token.
Token ID13
Feature activation-7.96e-01

Pos logit contributions
ment+0.022
edly+0.021
anged+0.021
isher+0.021
 thirst+0.020
Neg logit contributions
 Anne-0.013
Beck-0.013
Anne-0.013
 great-0.013
 third-0.012
Token.
Token ID13
Feature activation-1.14e+00

Pos logit contributions
Beck+0.030
 Anne+0.029
Anne+0.029
 great+0.027
 neat+0.026
Neg logit contributions
edly-0.038
ment-0.038
isher-0.037
anged-0.037
 thirst-0.036
Token He
Token ID679
Feature activation+3.98e-01

Pos logit contributions
 Anne+0.022
 great+0.021
Beck+0.021
Anne+0.021
 neat+0.021
Neg logit contributions
ment-0.024
edly-0.023
isher-0.023
 thirst-0.022
anged-0.022
Token helps
Token ID5419
Feature activation-3.54e-01

Pos logit contributions
isher+0.008
anged+0.008
ment+0.008
edly+0.008
outed+0.008
Neg logit contributions
 Anne-0.007
Beck-0.007
Anne-0.007
Alice-0.007
 neat-0.007
Token them
Token ID606
Feature activation-2.21e+00

Pos logit contributions
 great+0.004
 neat+0.004
Beck+0.004
 wonderful+0.004
 Anne+0.004
Neg logit contributions
ment-0.010
edly-0.009
isher-0.009
 thirst-0.009
ro-0.009
Token make
Token ID787
Feature activation-3.56e+00

Pos logit contributions
Beck+0.035
Anne+0.034
 Anne+0.034
Alice+0.033
 great+0.031
Neg logit contributions
anged-0.054
ment-0.053
ro-0.052
edly-0.051
isher-0.051
Token structures
Token ID8573
Feature activation-6.41e+00

Pos logit contributions
 Anne+0.038
Beck+0.037
Anne+0.037
 third+0.033
 great+0.032
Neg logit contributions
ment-0.103
isher-0.100
 thirst-0.100
edly-0.099
anged-0.097
Token with
Token ID351
Feature activation-2.23e+00

Pos logit contributions
 Anne+0.103
Anne+0.100
Beck+0.097
Alice+0.091
 Alice+0.090
Neg logit contributions
ment-0.160
edly-0.153
isher-0.150
anged-0.147
 thirst-0.145
Token blocks
Token ID7021
Feature activation-5.01e+00

Pos logit contributions
 Anne+0.021
Anne+0.020
Beck+0.019
 great+0.018
 third+0.017
Neg logit contributions
ment-0.068
isher-0.068
anged-0.066
 thirst-0.066
edly-0.066
Token.
Token ID13
Feature activation-1.51e+00

Pos logit contributions
 Anne+0.056
 different+0.055
 big+0.054
 great+0.053
 out+0.052
Neg logit contributions
isher-0.144
ment-0.144
edly-0.142
 thirst-0.141
anged-0.140
Token He
Token ID679
Feature activation+6.36e-02

Pos logit contributions
 Anne+0.028
 neat+0.027
 great+0.027
Beck+0.027
Anne+0.027
Neg logit contributions
ment-0.033
edly-0.032
isher-0.030
 thirst-0.030
anged-0.030
Token makes
Token ID1838
Feature activation-1.80e+00

Pos logit contributions
isher+0.001
anged+0.001
ment+0.001
outed+0.001
edly+0.001
Neg logit contributions
 Anne-0.001
Anne-0.001
Beck-0.001
Alice-0.001
 Alice-0.001
Token want
Token ID765
Feature activation-1.81e+00

Pos logit contributions
ment+0.028
edly+0.027
isher+0.026
 thirst+0.025
anged+0.025
Neg logit contributions
 great-0.014
 different-0.014
 big-0.014
 out-0.013
 neat-0.013
Token.
Token ID13
Feature activation-2.25e+00

Pos logit contributions
Beck+0.038
Anne+0.037
 Anne+0.036
Alice+0.033
Con+0.033
Neg logit contributions
isher-0.036
ment-0.036
edly-0.033
 thirst-0.033
anged-0.033
Token You
Token ID921
Feature activation+1.70e-01

Pos logit contributions
 Anne+0.034
 great+0.031
Anne+0.031
Beck+0.031
 different+0.030
Neg logit contributions
ment-0.058
isher-0.057
edly-0.056
anged-0.055
 thirst-0.054
Token can
Token ID460
Feature activation-1.03e+00

Pos logit contributions
ment+0.005
isher+0.005
edly+0.005
anged+0.004
 thirst+0.004
Neg logit contributions
 Anne-0.002
 great-0.002
Beck-0.002
Anne-0.002
 different-0.002
Token make
Token ID787
Feature activation-3.56e+00

Pos logit contributions
 Anne+0.016
Anne+0.015
Beck+0.015
 Alice+0.013
 June+0.013
Neg logit contributions
isher-0.025
anged-0.024
 thirst-0.024
ment-0.023
edly-0.023
Token towers
Token ID18028
Feature activation-6.34e+00

Pos logit contributions
Anne+0.047
Beck+0.047
 Anne+0.046
Alice+0.041
 third+0.040
Neg logit contributions
isher-0.091
ment-0.091
 thirst-0.088
anged-0.084
edly-0.084
Token and
Token ID290
Feature activation+6.79e-02

Pos logit contributions
 Anne+0.089
Beck+0.085
Anne+0.085
 great+0.078
 third+0.076
Neg logit contributions
ment-0.169
isher-0.163
edly-0.162
anged-0.157
 thirst-0.155
Token houses
Token ID7777
Feature activation-5.35e+00

Pos logit contributions
ment+0.002
isher+0.002
edly+0.002
anged+0.002
 thirst+0.002
Neg logit contributions
 Anne-0.001
Beck-0.001
Anne-0.001
 great-0.001
 third-0.001
Token and
Token ID290
Feature activation-1.41e-01

Pos logit contributions
 Anne+0.075
Beck+0.071
Anne+0.070
 great+0.066
 third+0.065
Neg logit contributions
ment-0.142
isher-0.139
edly-0.139
anged-0.134
 thirst-0.133
Token cars
Token ID5006
Feature activation-4.52e+00

Pos logit contributions
 Anne+0.002
Beck+0.002
Anne+0.002
 great+0.002
 third+0.002
Neg logit contributions
ment-0.004
isher-0.004
edly-0.004
anged-0.004
 thirst-0.004
Token and
Token ID290
Feature activation-2.51e-01

Pos logit contributions
 Anne+0.055
 great+0.051
 different+0.050
Beck+0.050
Anne+0.050
Neg logit contributions
ment-0.127
isher-0.126
edly-0.125
anged-0.123
 thirst-0.122
Token learners
Token ID46184
Feature activation-1.28e+00

Pos logit contributions
ment+0.001
isher+0.001
edly+0.001
anged+0.001
 thirst+0.001
Neg logit contributions
Beck-0.001
Anne-0.001
 Anne-0.001
 third-0.000
 neat-0.000
Token.
Token ID13
Feature activation-1.76e+00

Pos logit contributions
 Anne+0.018
Beck+0.017
 great+0.017
Anne+0.017
 neat+0.016
Neg logit contributions
ment-0.033
isher-0.033
edly-0.033
anged-0.032
 thirst-0.031
Token You
Token ID921
Feature activation+3.89e-01

Pos logit contributions
 Anne+0.032
 great+0.032
Beck+0.031
 different+0.031
 big+0.031
Neg logit contributions
ment-0.038
isher-0.037
edly-0.037
anged-0.036
 thirst-0.035
Token can
Token ID460
Feature activation-8.85e-01

Pos logit contributions
isher+0.010
anged+0.009
 thirst+0.009
ment+0.009
edly+0.009
Neg logit contributions
 Anne-0.006
Anne-0.005
Beck-0.005
 Alice-0.005
Alice-0.005
Token make
Token ID787
Feature activation-3.06e+00

Pos logit contributions
 Anne+0.015
Anne+0.014
Beck+0.014
 Alice+0.013
 neat+0.013
Neg logit contributions
isher-0.020
anged-0.020
 thirst-0.019
edly-0.019
ro-0.019
Token triangles
Token ID44360
Feature activation-6.29e+00

Pos logit contributions
Anne+0.039
 Anne+0.039
Beck+0.037
Alice+0.037
 Alice+0.033
Neg logit contributions
ment-0.080
isher-0.079
 thirst-0.077
edly-0.074
anged-0.074
Token with
Token ID351
Feature activation-1.69e+00

Pos logit contributions
 Anne+0.091
Anne+0.088
Beck+0.083
Alice+0.081
 Alice+0.080
Neg logit contributions
ment-0.166
edly-0.157
isher-0.156
 thirst-0.152
anged-0.149
Token your
Token ID534
Feature activation+1.33e+00

Pos logit contributions
 Anne+0.017
Anne+0.015
Beck+0.015
 great+0.014
 third+0.014
Neg logit contributions
isher-0.051
ment-0.051
edly-0.050
anged-0.049
 thirst-0.049
Token blocks
Token ID7021
Feature activation-4.39e+00

Pos logit contributions
ment+0.033
isher+0.032
edly+0.032
anged+0.031
 thirst+0.031
Neg logit contributions
 Anne-0.021
 great-0.020
 different-0.020
 big-0.019
Beck-0.019
Token too
Token ID1165
Feature activation-1.08e+00

Pos logit contributions
 Anne+0.055
 great+0.052
 different+0.052
 big+0.051
 neat+0.051
Neg logit contributions
ment-0.121
isher-0.120
edly-0.119
anged-0.118
 thirst-0.116
Token.
Token ID13
Feature activation-2.18e+00

Pos logit contributions
 Anne+0.018
Beck+0.017
Anne+0.017
 great+0.016
 third+0.015
Neg logit contributions
ment-0.025
edly-0.025
isher-0.025
 thirst-0.024
anged-0.024
Token television
Token ID5581
Feature activation-8.94e-01

Pos logit contributions
Beck+0.026
 Anne+0.025
Anne+0.025
 overcome+0.024
Alice+0.022
Neg logit contributions
isher-0.065
 thirst-0.057
ment-0.057
iness-0.056
outed-0.056
Token.
Token ID13
Feature activation-2.40e+00

Pos logit contributions
 Anne+0.013
 great+0.012
Beck+0.012
Anne+0.012
 different+0.011
Neg logit contributions
ment-0.023
isher-0.023
edly-0.023
anged-0.023
 thirst-0.022
Token It
Token ID632
Feature activation+5.76e-01

Pos logit contributions
 Anne+0.039
 great+0.036
Anne+0.036
Beck+0.036
 different+0.036
Neg logit contributions
ment-0.058
isher-0.058
edly-0.057
anged-0.056
 thirst-0.055
Token is
Token ID318
Feature activation-1.11e+00

Pos logit contributions
ment+0.018
isher+0.018
edly+0.018
anged+0.017
 thirst+0.017
Neg logit contributions
 Anne-0.006
 great-0.005
 different-0.005
 big-0.005
Beck-0.005
Token not
Token ID407
Feature activation-1.00e+00

Pos logit contributions
 Anne+0.013
Beck+0.012
Anne+0.012
 great+0.012
 third+0.012
Neg logit contributions
ment-0.032
isher-0.031
edly-0.031
anged-0.030
 thirst-0.030
Token a
Token ID257
Feature activation-6.27e+00

Pos logit contributions
 Anne+0.010
Beck+0.009
Anne+0.009
 great+0.009
 third+0.009
Neg logit contributions
ment-0.030
isher-0.030
edly-0.029
anged-0.029
 thirst-0.029
Token toy
Token ID13373
Feature activation-1.71e+00

Pos logit contributions
 Anne+0.096
Anne+0.095
Beck+0.091
 Higher+0.090
Alice+0.087
Neg logit contributions
isher-0.140
der-0.131
 thirst-0.129
mer-0.125
ro-0.125
Token.
Token ID13
Feature activation-2.45e+00

Pos logit contributions
Beck+0.028
 Anne+0.028
Anne+0.028
 great+0.026
Alice+0.025
Neg logit contributions
isher-0.040
edly-0.039
ment-0.039
anged-0.038
der-0.037
Token It
Token ID632
Feature activation+6.79e-01

Pos logit contributions
 Anne+0.039
 great+0.036
Anne+0.036
Beck+0.036
 different+0.035
Neg logit contributions
isher-0.061
ment-0.060
edly-0.059
anged-0.058
 thirst-0.058
Token is
Token ID318
Feature activation-9.99e-01

Pos logit contributions
ment+0.021
isher+0.021
edly+0.021
anged+0.021
 thirst+0.020
Neg logit contributions
 Anne-0.006
 great-0.006
 different-0.006
 big-0.006
Anne-0.006
Token expensive
Token ID5789
Feature activation+1.29e+00

Pos logit contributions
 Anne+0.011
 great+0.010
 different+0.010
 big+0.010
 neat+0.009
Neg logit contributions
ment-0.030
isher-0.030
edly-0.029
anged-0.029
 thirst-0.028
Token lose
Token ID4425
Feature activation-6.52e-01

Pos logit contributions
 Anne+0.007
 great+0.007
 big+0.006
 different+0.006
 neat+0.006
Neg logit contributions
isher-0.018
ment-0.018
edly-0.018
anged-0.018
 thirst-0.018
Token.
Token ID13
Feature activation-2.28e+00

Pos logit contributions
 Anne+0.010
Anne+0.009
Beck+0.009
 great+0.009
 third+0.009
Neg logit contributions
ment-0.017
isher-0.017
edly-0.016
 thirst-0.016
anged-0.016
Token You
Token ID921
Feature activation+7.15e-01

Pos logit contributions
 Anne+0.038
Anne+0.035
Beck+0.034
 great+0.034
 third+0.034
Neg logit contributions
isher-0.055
ment-0.055
edly-0.054
anged-0.053
 thirst-0.052
Token are
Token ID389
Feature activation-1.44e+00

Pos logit contributions
ment+0.024
edly+0.023
isher+0.023
anged+0.022
 thirst+0.022
Neg logit contributions
 great-0.007
 big-0.007
 different-0.007
 out-0.006
 Anne-0.006
Token not
Token ID407
Feature activation-1.09e+00

Pos logit contributions
 Anne+0.019
Anne+0.018
Beck+0.018
 third+0.016
Alice+0.015
Neg logit contributions
isher-0.038
 thirst-0.038
edly-0.037
ment-0.037
anged-0.037
Token a
Token ID257
Feature activation-6.25e+00

Pos logit contributions
 Anne+0.012
 great+0.011
 different+0.011
 big+0.011
 neat+0.011
Neg logit contributions
ment-0.032
isher-0.032
edly-0.032
anged-0.031
 thirst-0.031
Token good
Token ID922
Feature activation-2.10e+00

Pos logit contributions
 Anne+0.109
Anne+0.102
 Alice+0.091
Beck+0.091
 June+0.090
Neg logit contributions
isher-0.149
 thirst-0.138
ment-0.133
anged-0.132
edly-0.131
Token friend
Token ID1545
Feature activation-2.16e+00

Pos logit contributions
 Anne+0.035
 great+0.035
 big+0.034
 different+0.034
 out+0.034
Neg logit contributions
ment-0.050
edly-0.050
isher-0.049
 thirst-0.047
anged-0.047
Token."
Token ID526
Feature activation-9.53e-01

Pos logit contributions
 Anne+0.031
 great+0.029
Beck+0.028
Anne+0.028
 different+0.028
Neg logit contributions
ment-0.057
isher-0.056
edly-0.055
anged-0.054
 thirst-0.054
Token\n
Token ID198
Feature activation+1.31e-01

Pos logit contributions
 Anne+0.008
 great+0.007
 different+0.007
 big+0.007
Beck+0.007
Neg logit contributions
isher-0.031
ment-0.031
edly-0.031
anged-0.030
 thirst-0.030
Token\n
Token ID198
Feature activation+6.55e-01

Pos logit contributions
isher+0.004
ment+0.004
edly+0.004
anged+0.004
 thirst+0.004
Neg logit contributions
 Anne-0.001
 great-0.001
 big-0.001
 different-0.001
Anne-0.001
Token mess
Token ID2085
Feature activation-8.26e-01

Pos logit contributions
 Anne+0.100
Anne+0.096
Beck+0.094
 Alice+0.091
Alice+0.089
Neg logit contributions
 thirst-0.077
isher-0.075
ment-0.072
anged-0.068
edly-0.068
Token.
Token ID13
Feature activation-2.31e+00

Pos logit contributions
 Anne+0.012
Beck+0.011
 great+0.011
Anne+0.011
 third+0.011
Neg logit contributions
ment-0.022
isher-0.021
edly-0.021
anged-0.020
 thirst-0.020
Token You
Token ID921
Feature activation+1.11e+00

Pos logit contributions
 Anne+0.041
Anne+0.038
Beck+0.038
 great+0.038
 different+0.037
Neg logit contributions
isher-0.053
ment-0.053
edly-0.052
anged-0.051
 thirst-0.050
Token are
Token ID389
Feature activation-1.21e+00

Pos logit contributions
ment+0.035
edly+0.034
isher+0.033
anged+0.032
 thirst+0.032
Neg logit contributions
 big-0.012
 great-0.012
 different-0.012
 Anne-0.012
's-0.012
Token not
Token ID407
Feature activation-9.28e-01

Pos logit contributions
 Anne+0.016
Beck+0.015
Anne+0.015
 third+0.015
 great+0.014
Neg logit contributions
isher-0.031
ment-0.031
edly-0.031
 thirst-0.031
anged-0.030
Token a
Token ID257
Feature activation-6.25e+00

Pos logit contributions
 Anne+0.010
Beck+0.009
Anne+0.009
 great+0.009
 third+0.009
Neg logit contributions
ment-0.027
isher-0.027
edly-0.027
anged-0.026
 thirst-0.026
Token good
Token ID922
Feature activation-1.83e+00

Pos logit contributions
 Anne+0.108
 Higher+0.104
Anne+0.102
 June+0.094
Beck+0.093
Neg logit contributions
isher-0.131
 thirst-0.127
der-0.122
 actions-0.119
edly-0.118
Token friend
Token ID1545
Feature activation-2.08e+00

Pos logit contributions
 Anne+0.033
Beck+0.031
Anne+0.031
 great+0.031
 third+0.030
Neg logit contributions
ment-0.041
isher-0.041
edly-0.040
anged-0.039
 thirst-0.039
Token."
Token ID526
Feature activation-1.45e+00

Pos logit contributions
 Anne+0.032
Beck+0.031
Anne+0.030
 great+0.030
 neat+0.029
Neg logit contributions
ment-0.051
isher-0.050
edly-0.050
anged-0.049
 thirst-0.048
Token\n
Token ID198
Feature activation-3.06e-03

Pos logit contributions
 neat+0.023
 great+0.022
 Anne+0.022
 matching+0.022
Beck+0.022
Neg logit contributions
ment-0.033
edly-0.032
anged-0.029
outed-0.029
 thirst-0.029
Token\n
Token ID198
Feature activation+4.92e-01

Pos logit contributions
 Anne+0.000
 great+0.000
Beck+0.000
 third+0.000
 neat+0.000
Neg logit contributions
ment-0.000
edly-0.000
anged-0.000
 thirst-0.000
isher-0.000
Token mean
Token ID1612
Feature activation-1.06e+00

Pos logit contributions
 Anne+0.038
Beck+0.038
Anne+0.036
 fant+0.032
 third+0.032
Neg logit contributions
isher-0.049
 thirst-0.047
outed-0.046
edly-0.045
anged-0.044
Token.
Token ID13
Feature activation-3.14e+00

Pos logit contributions
 Anne+0.013
 great+0.012
 different+0.012
 big+0.012
 out+0.011
Neg logit contributions
ment-0.030
isher-0.030
edly-0.029
 thirst-0.029
anged-0.029
Token You
Token ID921
Feature activation+4.39e-01

Pos logit contributions
 Anne+0.055
 great+0.051
Anne+0.051
Beck+0.051
 different+0.050
Neg logit contributions
ment-0.072
isher-0.072
edly-0.071
anged-0.069
 thirst-0.068
Token are
Token ID389
Feature activation-1.26e+00

Pos logit contributions
ment+0.015
edly+0.014
isher+0.014
 thirst+0.013
anged+0.013
Neg logit contributions
 big-0.004
 great-0.004
 different-0.004
 Anne-0.004
 out-0.004
Token not
Token ID407
Feature activation-1.21e+00

Pos logit contributions
 Anne+0.017
Beck+0.016
Anne+0.016
 third+0.015
 great+0.014
Neg logit contributions
isher-0.033
ment-0.033
edly-0.032
 thirst-0.032
anged-0.032
Token a
Token ID257
Feature activation-6.22e+00

Pos logit contributions
 Anne+0.014
Beck+0.013
Anne+0.013
 third+0.012
 great+0.012
Neg logit contributions
ment-0.035
isher-0.035
edly-0.034
anged-0.034
 thirst-0.034
Token good
Token ID922
Feature activation-1.78e+00

Pos logit contributions
 Anne+0.115
Anne+0.106
 June+0.100
Beck+0.100
 Amanda+0.094
Neg logit contributions
isher-0.144
edly-0.125
der-0.125
ment-0.124
 thirst-0.124
Token customer
Token ID6491
Feature activation-1.30e+00

Pos logit contributions
 Anne+0.034
Beck+0.032
Anne+0.032
 great+0.032
 different+0.031
Neg logit contributions
ment-0.038
isher-0.038
edly-0.037
anged-0.036
 thirst-0.036
Token."
Token ID526
Feature activation-7.00e-01

Pos logit contributions
 Anne+0.019
Beck+0.018
 great+0.017
Anne+0.017
 different+0.017
Neg logit contributions
ment-0.033
isher-0.033
edly-0.033
anged-0.032
 thirst-0.032
Token\n
Token ID198
Feature activation+4.50e-02

Pos logit contributions
 Anne+0.008
 great+0.007
Beck+0.007
Anne+0.007
 different+0.007
Neg logit contributions
ment-0.021
isher-0.020
edly-0.020
anged-0.020
 thirst-0.020
Token\n
Token ID198
Feature activation+5.20e-01

Pos logit contributions
ment+0.001
isher+0.001
edly+0.001
anged+0.001
 thirst+0.001
Neg logit contributions
 Anne-0.000
 great-0.000
Beck-0.000
Anne-0.000
 different-0.000
Token my
Token ID616
Feature activation+5.40e-02

Pos logit contributions
Beck+0.009
 Anne+0.009
Anne+0.009
 third+0.007
 neat+0.007
Neg logit contributions
isher-0.019
ment-0.019
 thirst-0.018
anged-0.018
uc-0.017
Token notebook
Token ID20922
Feature activation-1.58e+00

Pos logit contributions
ment+0.001
isher+0.001
edly+0.001
anged+0.001
 thirst+0.001
Neg logit contributions
 Anne-0.001
 great-0.001
 different-0.001
 big-0.001
Beck-0.001
Token!
Token ID0
Feature activation-2.25e+00

Pos logit contributions
 Anne+0.022
Beck+0.021
Anne+0.020
 great+0.019
 different+0.019
Neg logit contributions
ment-0.042
isher-0.042
edly-0.041
anged-0.040
 thirst-0.040
Token You
Token ID921
Feature activation+5.90e-01

Pos logit contributions
 Anne+0.039
 great+0.037
 different+0.037
Beck+0.037
 neat+0.037
Neg logit contributions
ment-0.051
edly-0.050
isher-0.050
anged-0.049
 thirst-0.048
Token are
Token ID389
Feature activation-1.55e+00

Pos logit contributions
ment+0.018
isher+0.018
edly+0.018
anged+0.018
 thirst+0.017
Neg logit contributions
 Anne-0.006
 great-0.005
 different-0.005
 big-0.005
 neat-0.005
Token a
Token ID257
Feature activation-6.20e+00

Pos logit contributions
 Anne+0.024
Beck+0.023
Anne+0.023
 third+0.021
 neat+0.020
Neg logit contributions
isher-0.037
 thirst-0.036
edly-0.035
anged-0.035
ment-0.035
Token bad
Token ID2089
Feature activation-7.21e-01

Pos logit contributions
 Anne+0.114
Anne+0.105
 June+0.098
Beck+0.096
 Higher+0.096
Neg logit contributions
isher-0.133
edly-0.131
 thirst-0.129
ment-0.122
der-0.119
Token friend
Token ID1545
Feature activation-1.86e+00

Pos logit contributions
 great+0.012
 big+0.012
 Anne+0.011
 best+0.011
 different+0.011
Neg logit contributions
ment-0.017
isher-0.017
edly-0.017
anged-0.017
 thirst-0.017
Token!"
Token ID2474
Feature activation-1.16e+00

Pos logit contributions
Beck+0.028
 Anne+0.028
 neat+0.027
Anne+0.027
 great+0.026
Neg logit contributions
ment-0.045
isher-0.045
edly-0.044
anged-0.043
 thirst-0.042
Token\n
Token ID198
Feature activation+5.95e-02

Pos logit contributions
 Anne+0.014
Anne+0.012
Beck+0.012
 great+0.012
 neat+0.012
Neg logit contributions
ment-0.034
edly-0.033
anged-0.032
 thirst-0.032
isher-0.032
Token\n
Token ID198
Feature activation+4.94e-01

Pos logit contributions
ment+0.002
edly+0.002
isher+0.002
anged+0.002
 thirst+0.002
Neg logit contributions
 Anne-0.001
 great-0.001
Beck-0.001
Anne-0.001
 neat-0.001
Token Ruby
Token ID10888
Feature activation-7.06e-01

Pos logit contributions
Beck+0.007
 overcome+0.007
 great+0.007
 third+0.007
 neat+0.007
Neg logit contributions
ment-0.016
isher-0.016
edly-0.016
 thirst-0.016
anged-0.015
Token!
Token ID0
Feature activation-2.72e+00

Pos logit contributions
Beck+0.010
 Anne+0.010
Anne+0.010
 great+0.009
 different+0.009
Neg logit contributions
edly-0.017
ment-0.017
isher-0.017
 thirst-0.017
anged-0.016
Token She
Token ID1375
Feature activation+3.54e-01

Pos logit contributions
 Anne+0.041
 great+0.040
 different+0.040
 neat+0.039
Beck+0.039
Neg logit contributions
ment-0.068
edly-0.066
isher-0.066
anged-0.065
 thirst-0.064
Token is
Token ID318
Feature activation-1.54e+00

Pos logit contributions
ment+0.011
edly+0.011
isher+0.011
anged+0.011
 thirst+0.010
Neg logit contributions
's-0.003
 Anne-0.003
 great-0.003
 big-0.003
 out-0.003
Token not
Token ID407
Feature activation-8.62e-01

Pos logit contributions
 Anne+0.020
Beck+0.019
Anne+0.019
 third+0.018
 great+0.018
Neg logit contributions
isher-0.040
ment-0.040
edly-0.040
 thirst-0.040
anged-0.039
Token a
Token ID257
Feature activation-6.20e+00

Pos logit contributions
 Anne+0.008
 great+0.007
 different+0.007
Beck+0.007
Anne+0.007
Neg logit contributions
ment-0.027
isher-0.026
edly-0.026
anged-0.026
 thirst-0.025
Token princess
Token ID21752
Feature activation-2.05e+00

Pos logit contributions
 Anne+0.102
 Higher+0.099
Anne+0.095
 June+0.088
 overcome+0.087
Neg logit contributions
isher-0.140
 thirst-0.134
edly-0.132
ment-0.128
ro-0.128
Token,
Token ID11
Feature activation-2.74e+00

Pos logit contributions
 Anne+0.028
Beck+0.026
 great+0.026
Anne+0.026
 neat+0.025
Neg logit contributions
ment-0.054
isher-0.053
edly-0.053
anged-0.051
 thirst-0.051
Token she
Token ID673
Feature activation-4.51e-01

Pos logit contributions
 Anne+0.035
Beck+0.034
Anne+0.033
 great+0.033
 third+0.032
Neg logit contributions
ment-0.074
isher-0.072
edly-0.072
 thirst-0.070
anged-0.069
Token is
Token ID318
Feature activation-1.28e+00

Pos logit contributions
 Anne+0.005
 great+0.005
 big+0.004
 different+0.004
's+0.004
Neg logit contributions
ment-0.014
isher-0.014
edly-0.013
anged-0.013
 thirst-0.013
Token my
Token ID616
Feature activation+3.95e-01

Pos logit contributions
 Anne+0.015
Beck+0.014
Anne+0.013
 third+0.013
 great+0.013
Neg logit contributions
ment-0.036
isher-0.036
edly-0.035
 thirst-0.035
anged-0.035
Token fair
Token ID3148
Feature activation+1.10e+00

Pos logit contributions
ment+0.005
isher+0.005
 thirst+0.005
edly+0.005
anged+0.005
Neg logit contributions
 Anne-0.002
Beck-0.002
Anne-0.002
 third-0.002
 great-0.002
Token!
Token ID0
Feature activation-2.08e+00

Pos logit contributions
ment+0.032
isher+0.031
edly+0.031
anged+0.031
 thirst+0.030
Neg logit contributions
 Anne-0.013
Beck-0.012
Anne-0.012
 great-0.012
 different-0.011
Token You
Token ID921
Feature activation+5.61e-01

Pos logit contributions
 Anne+0.034
 great+0.032
Anne+0.031
Beck+0.031
 different+0.031
Neg logit contributions
ment-0.050
isher-0.050
edly-0.049
anged-0.048
 thirst-0.048
Token are
Token ID389
Feature activation-1.30e+00

Pos logit contributions
ment+0.016
edly+0.016
isher+0.016
anged+0.015
 thirst+0.015
Neg logit contributions
 Anne-0.007
 great-0.007
 different-0.007
 big-0.007
 neat-0.006
Token not
Token ID407
Feature activation-8.94e-01

Pos logit contributions
 Anne+0.019
Beck+0.019
Anne+0.019
 third+0.017
Alice+0.016
Neg logit contributions
 thirst-0.030
isher-0.030
edly-0.029
ment-0.029
anged-0.029
Token a
Token ID257
Feature activation-6.20e+00

Pos logit contributions
 Anne+0.010
Beck+0.010
Anne+0.010
 third+0.009
 great+0.009
Neg logit contributions
ment-0.025
isher-0.025
edly-0.025
anged-0.025
 thirst-0.024
Token queen
Token ID16599
Feature activation-2.11e+00

Pos logit contributions
 Anne+0.124
 Higher+0.118
Anne+0.117
 June+0.107
 Ashley+0.107
Neg logit contributions
isher-0.125
edly-0.114
ory-0.113
 thirst-0.110
ment-0.109
Token,
Token ID11
Feature activation-3.42e+00

Pos logit contributions
 Anne+0.030
Beck+0.029
 great+0.028
Anne+0.028
 neat+0.028
Neg logit contributions
ment-0.054
isher-0.054
edly-0.053
anged-0.052
 thirst-0.051
Token you
Token ID345
Feature activation+6.11e-01

Pos logit contributions
 Anne+0.050
Beck+0.049
Anne+0.048
 great+0.045
 neat+0.045
Neg logit contributions
ment-0.087
isher-0.085
edly-0.085
 thirst-0.082
anged-0.080
Token are
Token ID389
Feature activation-1.25e+00

Pos logit contributions
ment+0.016
isher+0.016
anged+0.016
edly+0.016
 thirst+0.015
Neg logit contributions
 Anne-0.008
Beck-0.008
Anne-0.008
 great-0.007
 neat-0.007
Token just
Token ID655
Feature activation+6.97e-01

Pos logit contributions
Beck+0.017
 Anne+0.017
Anne+0.017
 third+0.016
 Higher+0.015
Neg logit contributions
 thirst-0.031
isher-0.030
ment-0.030
edly-0.030
anged-0.029
Token funny
Token ID8258
Feature activation-5.90e-01

Pos logit contributions
 Anne+0.004
Beck+0.003
Anne+0.003
 great+0.003
 third+0.003
Neg logit contributions
ment-0.011
isher-0.011
edly-0.011
 thirst-0.010
anged-0.010
Token.
Token ID13
Feature activation-2.28e+00

Pos logit contributions
 Anne+0.008
Beck+0.008
Anne+0.008
 great+0.007
 third+0.007
Neg logit contributions
ment-0.016
isher-0.015
edly-0.015
anged-0.015
 thirst-0.015
Token You
Token ID921
Feature activation+1.01e+00

Pos logit contributions
 Anne+0.035
Beck+0.032
Anne+0.032
 great+0.031
 third+0.031
Neg logit contributions
isher-0.057
ment-0.057
edly-0.056
anged-0.055
 thirst-0.055
Token are
Token ID389
Feature activation-1.51e+00

Pos logit contributions
ment+0.032
edly+0.031
isher+0.031
anged+0.029
 thirst+0.029
Neg logit contributions
 great-0.011
 Anne-0.011
 big-0.010
 different-0.010
 neat-0.010
Token not
Token ID407
Feature activation-9.44e-01

Pos logit contributions
 Anne+0.019
Anne+0.017
Beck+0.017
 third+0.017
 great+0.016
Neg logit contributions
isher-0.041
ment-0.041
edly-0.041
 thirst-0.040
anged-0.040
Token a
Token ID257
Feature activation-6.19e+00

Pos logit contributions
 Anne+0.010
 great+0.010
 different+0.009
Beck+0.009
Anne+0.009
Neg logit contributions
ment-0.028
isher-0.028
edly-0.027
anged-0.027
 thirst-0.027
Token superhero
Token ID17343
Feature activation-8.53e-01

Pos logit contributions
 Anne+0.095
Anne+0.090
 Higher+0.082
 June+0.082
 Amanda+0.078
Neg logit contributions
isher-0.147
 thirst-0.146
edly-0.143
der-0.141
ro-0.138
Token,
Token ID11
Feature activation-3.04e+00

Pos logit contributions
 Anne+0.010
 great+0.009
 different+0.009
 big+0.009
 out+0.009
Neg logit contributions
ment-0.025
isher-0.025
anged-0.024
edly-0.024
 thirst-0.024
Token you
Token ID345
Feature activation+8.49e-01

Pos logit contributions
 Anne+0.037
 great+0.035
 different+0.034
 big+0.034
 third+0.033
Neg logit contributions
ment-0.087
isher-0.086
edly-0.085
anged-0.085
 thirst-0.082
Token are
Token ID389
Feature activation-1.20e+00

Pos logit contributions
ment+0.027
isher+0.027
edly+0.027
 thirst+0.026
anged+0.026
Neg logit contributions
 great-0.008
 Anne-0.007
 big-0.007
 different-0.007
 neat-0.007
Token a
Token ID257
Feature activation-5.01e+00

Pos logit contributions
 Anne+0.014
Beck+0.014
Anne+0.013
 third+0.013
 great+0.012
Neg logit contributions
ment-0.033
isher-0.033
edly-0.033
 thirst-0.032
anged-0.032
Token.
Token ID13
Feature activation-2.39e+00

Pos logit contributions
 Anne+0.003
 great+0.003
 different+0.003
 big+0.003
 neat+0.003
Neg logit contributions
ment-0.008
isher-0.008
edly-0.008
anged-0.008
 thirst-0.008
Token I
Token ID314
Feature activation+6.91e-01

Pos logit contributions
 Anne+0.037
 great+0.034
Anne+0.033
Beck+0.033
 different+0.033
Neg logit contributions
isher-0.061
ment-0.060
edly-0.059
anged-0.059
 thirst-0.058
Token was
Token ID373
Feature activation-1.60e+00

Pos logit contributions
ment+0.020
isher+0.020
edly+0.020
 thirst+0.018
anged+0.018
Neg logit contributions
 great-0.009
 different-0.009
 big-0.009
's-0.008
 hide-0.008
Token not
Token ID407
Feature activation-3.46e-01

Pos logit contributions
 Anne+0.024
Anne+0.023
Beck+0.023
 third+0.022
 neat+0.020
Neg logit contributions
isher-0.039
ment-0.039
 thirst-0.039
edly-0.038
anged-0.038
Token being
Token ID852
Feature activation-1.99e+00

Pos logit contributions
 Anne+0.004
 great+0.004
 different+0.004
 big+0.003
 neat+0.003
Neg logit contributions
ment-0.010
isher-0.010
edly-0.010
anged-0.010
 thirst-0.010
Token a
Token ID257
Feature activation-6.17e+00

Pos logit contributions
 Anne+0.035
Beck+0.034
Anne+0.033
 third+0.031
Alice+0.029
Neg logit contributions
isher-0.046
ment-0.045
anged-0.044
 thirst-0.044
edly-0.043
Token good
Token ID922
Feature activation-1.29e+00

Pos logit contributions
 Anne+0.123
Anne+0.114
 Higher+0.112
 June+0.104
 Alice+0.099
Neg logit contributions
isher-0.130
 gol-0.115
edly-0.111
 thirst-0.111
 actions-0.110
Token friend
Token ID1545
Feature activation-1.75e+00

Pos logit contributions
 Anne+0.025
Anne+0.024
Beck+0.024
 great+0.023
 neat+0.023
Neg logit contributions
ment-0.027
isher-0.027
edly-0.026
anged-0.026
 thirst-0.025
Token.
Token ID13
Feature activation-2.18e+00

Pos logit contributions
 Anne+0.025
 great+0.023
Beck+0.023
Anne+0.023
 different+0.023
Neg logit contributions
ment-0.045
isher-0.045
edly-0.045
anged-0.044
 thirst-0.043
Token I
Token ID314
Feature activation+7.81e-01

Pos logit contributions
 Anne+0.036
 great+0.034
Beck+0.033
Anne+0.033
 different+0.033
Neg logit contributions
ment-0.052
isher-0.051
edly-0.051
anged-0.050
 thirst-0.049
Token was
Token ID373
Feature activation-1.37e+00

Pos logit contributions
ment+0.024
isher+0.023
edly+0.023
ipt+0.021
 thirst+0.021
Neg logit contributions
 great-0.010
 different-0.010
 big-0.009
's-0.009
 hide-0.009
Token dolls
Token ID36062
Feature activation-3.05e+00

Pos logit contributions
Beck+0.024
Anne+0.023
 Anne+0.023
Alice+0.022
 great+0.022
Neg logit contributions
ment-0.035
isher-0.035
edly-0.034
anged-0.034
ro-0.034
Token and
Token ID290
Feature activation+4.82e-01

Pos logit contributions
Beck+0.042
Anne+0.040
 Anne+0.040
 great+0.037
 neat+0.035
Neg logit contributions
ment-0.079
anged-0.078
ro-0.078
edly-0.077
isher-0.077
Token animals
Token ID4695
Feature activation-3.42e+00

Pos logit contributions
ment+0.012
anged+0.012
edly+0.012
isher+0.012
 thirst+0.011
Neg logit contributions
 Anne-0.006
Beck-0.006
Anne-0.006
Alice-0.006
 great-0.006
Token to
Token ID284
Feature activation-5.68e-01

Pos logit contributions
 Anne+0.060
Beck+0.058
Anne+0.057
 great+0.056
 neat+0.054
Neg logit contributions
ment-0.077
edly-0.076
isher-0.074
anged-0.073
 thirst-0.070
Token make
Token ID787
Feature activation-3.52e+00

Pos logit contributions
 Anne+0.008
Beck+0.008
Anne+0.008
Alice+0.007
Con+0.007
Neg logit contributions
isher-0.013
 thirst-0.013
ment-0.013
anged-0.013
ro-0.013
Token houses
Token ID7777
Feature activation-6.16e+00

Pos logit contributions
Anne+0.047
Beck+0.047
 Anne+0.045
Alice+0.043
Elizabeth+0.043
Neg logit contributions
ment-0.091
isher-0.086
 thirst-0.085
edly-0.082
anged-0.080
Token,
Token ID11
Feature activation-1.50e+00

Pos logit contributions
 Anne+0.097
Anne+0.093
Beck+0.092
 great+0.083
Alice+0.083
Neg logit contributions
ment-0.153
edly-0.146
isher-0.144
 thirst-0.138
anged-0.138
Token roads
Token ID9725
Feature activation-2.96e+00

Pos logit contributions
 Anne+0.020
Beck+0.019
Anne+0.019
 great+0.018
 third+0.017
Neg logit contributions
ment-0.040
edly-0.039
isher-0.039
anged-0.038
 thirst-0.038
Token,
Token ID11
Feature activation-1.49e+00

Pos logit contributions
 Anne+0.039
Beck+0.037
Anne+0.037
 great+0.036
 third+0.034
Neg logit contributions
ment-0.081
isher-0.079
edly-0.079
anged-0.076
 thirst-0.075
Token farms
Token ID15893
Feature activation-2.27e+00

Pos logit contributions
 Anne+0.022
Beck+0.022
Anne+0.021
 great+0.020
 third+0.019
Neg logit contributions
ment-0.037
isher-0.036
edly-0.036
anged-0.035
 thirst-0.035
Token and
Token ID290
Feature activation+1.58e-01

Pos logit contributions
 Anne+0.027
Beck+0.026
Anne+0.026
 great+0.025
 third+0.023
Neg logit contributions
ment-0.064
isher-0.063
edly-0.062
anged-0.061
 thirst-0.060
Token her
Token ID607
Feature activation+5.72e-01

Pos logit contributions
ment+0.004
edly+0.004
isher+0.004
anged+0.004
 thirst+0.004
Neg logit contributions
 Anne-0.002
Beck-0.002
Anne-0.002
 third-0.002
 great-0.002
Token.
Token ID13
Feature activation-2.44e+00

Pos logit contributions
ment+0.014
edly+0.014
isher+0.014
 thirst+0.014
anged+0.013
Neg logit contributions
 Anne-0.009
Beck-0.008
Anne-0.008
 great-0.008
 different-0.008
Token You
Token ID921
Feature activation+2.18e-01

Pos logit contributions
 Anne+0.043
 great+0.041
Beck+0.040
 different+0.040
Anne+0.040
Neg logit contributions
ment-0.056
isher-0.055
edly-0.054
anged-0.053
 thirst-0.052
Token are
Token ID389
Feature activation-1.73e+00

Pos logit contributions
ment+0.007
isher+0.007
edly+0.007
anged+0.007
 thirst+0.007
Neg logit contributions
 great-0.002
 different-0.002
 big-0.002
 Anne-0.002
 out-0.002
Token not
Token ID407
Feature activation-1.39e+00

Pos logit contributions
 Anne+0.022
Anne+0.021
Beck+0.021
 third+0.020
Alice+0.018
Neg logit contributions
isher-0.045
 thirst-0.045
edly-0.044
ment-0.044
anged-0.042
Token a
Token ID257
Feature activation-6.16e+00

Pos logit contributions
 Anne+0.015
Beck+0.014
Anne+0.013
 third+0.013
 great+0.013
Neg logit contributions
ment-0.041
isher-0.041
edly-0.040
anged-0.040
 thirst-0.040
Token good
Token ID922
Feature activation-1.82e+00

Pos logit contributions
 Anne+0.100
Anne+0.093
 Higher+0.089
 June+0.084
Alice+0.081
Neg logit contributions
isher-0.144
 thirst-0.139
der-0.137
edly-0.136
ment-0.134
Token friend
Token ID1545
Feature activation-2.50e+00

Pos logit contributions
 Anne+0.035
Anne+0.034
Beck+0.034
 great+0.033
 neat+0.032
Neg logit contributions
ment-0.038
isher-0.038
edly-0.037
anged-0.036
 thirst-0.036
Token,
Token ID11
Feature activation-3.33e+00

Pos logit contributions
 Anne+0.039
Beck+0.037
Anne+0.036
 great+0.036
 neat+0.035
Neg logit contributions
ment-0.061
isher-0.060
edly-0.060
anged-0.058
 thirst-0.057
Token you
Token ID345
Feature activation-8.74e-02

Pos logit contributions
Beck+0.046
 Anne+0.046
Anne+0.044
 neat+0.043
 great+0.043
Neg logit contributions
ment-0.083
edly-0.081
isher-0.080
 thirst-0.079
ro-0.075
Token are
Token ID389
Feature activation-1.74e+00

Pos logit contributions
 great+0.001
 Anne+0.001
 different+0.001
 big+0.001
 neat+0.001
Neg logit contributions
ment-0.003
isher-0.003
edly-0.003
anged-0.003
 thirst-0.003
Token shapes
Token ID15268
Feature activation-3.05e+00

Pos logit contributions
ment+0.028
isher+0.027
edly+0.027
anged+0.027
 thirst+0.026
Neg logit contributions
 Anne-0.020
Beck-0.019
Anne-0.019
 great-0.018
 third-0.018
Token.
Token ID13
Feature activation-8.35e-01

Pos logit contributions
Anne+0.057
Beck+0.056
 Anne+0.056
Alice+0.052
Elizabeth+0.051
Neg logit contributions
ment-0.068
edly-0.066
isher-0.064
der-0.061
anged-0.059
Token They
Token ID1119
Feature activation-1.25e+00

Pos logit contributions
 great+0.014
 Anne+0.014
 neat+0.014
Beck+0.013
 big+0.013
Neg logit contributions
ment-0.018
edly-0.018
anged-0.018
isher-0.018
 thirst-0.017
Token can
Token ID460
Feature activation-1.47e+00

Pos logit contributions
 Anne+0.019
Anne+0.019
Beck+0.018
Alice+0.017
 Alice+0.016
Neg logit contributions
isher-0.029
anged-0.028
 thirst-0.028
edly-0.027
ment-0.027
Token make
Token ID787
Feature activation-3.30e+00

Pos logit contributions
 Anne+0.020
Anne+0.019
Beck+0.019
 great+0.017
 Alice+0.017
Neg logit contributions
isher-0.037
anged-0.036
edly-0.036
ment-0.035
 thirst-0.035
Token towers
Token ID18028
Feature activation-6.15e+00

Pos logit contributions
Anne+0.048
Beck+0.045
 Anne+0.045
Alice+0.045
Elizabeth+0.042
Neg logit contributions
ment-0.084
isher-0.081
 thirst-0.078
anged-0.076
edly-0.076
Token,
Token ID11
Feature activation-2.02e+00

Pos logit contributions
 Anne+0.095
Anne+0.094
Beck+0.094
Alice+0.085
 Ashley+0.085
Neg logit contributions
ment-0.155
isher-0.145
edly-0.142
anged-0.140
ro-0.135
Token bridges
Token ID19432
Feature activation-4.93e+00

Pos logit contributions
 Anne+0.027
Beck+0.026
Anne+0.026
Alice+0.024
 great+0.024
Neg logit contributions
ment-0.053
isher-0.051
edly-0.051
anged-0.050
 thirst-0.049
Token,
Token ID11
Feature activation-2.02e+00

Pos logit contributions
Beck+0.071
 Anne+0.070
Anne+0.069
 great+0.061
 Ashley+0.060
Neg logit contributions
ment-0.129
isher-0.124
edly-0.123
anged-0.119
ro-0.118
Token houses
Token ID7777
Feature activation-4.65e+00

Pos logit contributions
 Anne+0.030
Anne+0.029
Beck+0.029
Alice+0.027
 Alice+0.026
Neg logit contributions
ment-0.050
isher-0.049
edly-0.048
anged-0.047
ro-0.046
Token and
Token ID290
Feature activation-1.79e-02

Pos logit contributions
 Anne+0.061
Beck+0.061
Anne+0.060
Alice+0.050
 great+0.050
Neg logit contributions
ment-0.127
isher-0.122
edly-0.120
anged-0.118
ro-0.117
Token mean
Token ID1612
Feature activation+2.98e-01

Pos logit contributions
 Anne+0.019
Beck+0.019
Anne+0.019
 third+0.018
 neat+0.017
Neg logit contributions
 thirst-0.028
isher-0.027
edly-0.027
ment-0.027
anged-0.026
Token.
Token ID13
Feature activation-3.04e+00

Pos logit contributions
ment+0.009
isher+0.009
edly+0.008
 thirst+0.008
anged+0.008
Neg logit contributions
 Anne-0.003
 great-0.003
 different-0.003
 big-0.003
 out-0.003
Token You
Token ID921
Feature activation+9.89e-01

Pos logit contributions
 Anne+0.052
Anne+0.048
Beck+0.048
 great+0.048
 different+0.047
Neg logit contributions
isher-0.071
ment-0.071
edly-0.070
anged-0.068
 thirst-0.068
Token are
Token ID389
Feature activation-1.44e+00

Pos logit contributions
ment+0.031
edly+0.030
isher+0.030
anged+0.029
 thirst+0.029
Neg logit contributions
 Anne-0.010
 great-0.010
 big-0.010
 different-0.009
 out-0.009
Token not
Token ID407
Feature activation-9.12e-01

Pos logit contributions
 Anne+0.020
Beck+0.020
Anne+0.019
 third+0.019
 neat+0.018
Neg logit contributions
isher-0.035
 thirst-0.035
edly-0.034
ment-0.034
anged-0.034
Token a
Token ID257
Feature activation-6.15e+00

Pos logit contributions
 Anne+0.010
Beck+0.010
Anne+0.010
 third+0.010
 great+0.009
Neg logit contributions
ment-0.026
isher-0.025
 thirst-0.025
edly-0.025
anged-0.025
Token dragon
Token ID10441
Feature activation-1.31e+00

Pos logit contributions
 Anne+0.117
 Higher+0.115
Anne+0.112
Alice+0.104
 June+0.103
Neg logit contributions
isher-0.116
 thirst-0.114
der-0.112
edly-0.109
mer-0.105
Token,
Token ID11
Feature activation-3.06e+00

Pos logit contributions
 Anne+0.015
 great+0.014
 different+0.014
 big+0.014
 third+0.013
Neg logit contributions
ment-0.038
isher-0.038
edly-0.037
anged-0.037
 thirst-0.036
Token you
Token ID345
Feature activation+1.03e+00

Pos logit contributions
 Anne+0.040
 great+0.037
Beck+0.037
Anne+0.037
 different+0.036
Neg logit contributions
ment-0.083
isher-0.082
edly-0.082
anged-0.080
 thirst-0.079
Token are
Token ID389
Feature activation-1.06e+00

Pos logit contributions
ment+0.033
isher+0.033
edly+0.033
anged+0.031
 thirst+0.031
Neg logit contributions
 Anne-0.009
 great-0.009
 big-0.009
 different-0.009
 third-0.008
Token my
Token ID616
Feature activation+2.14e-01

Pos logit contributions
 Anne+0.013
Beck+0.013
Anne+0.013
 third+0.012
 great+0.012
Neg logit contributions
ment-0.028
isher-0.028
 thirst-0.028
edly-0.028
anged-0.027
Token every
Token ID790
Feature activation+4.87e-01

Pos logit contributions
 Anne+0.050
Beck+0.047
 great+0.046
Anne+0.046
 neat+0.044
Neg logit contributions
ment-0.103
isher-0.102
edly-0.101
anged-0.099
 thirst-0.097
Token day
Token ID1110
Feature activation-3.61e-01

Pos logit contributions
ment+0.011
edly+0.011
isher+0.011
anged+0.011
 thirst+0.010
Neg logit contributions
 Anne-0.009
 great-0.008
 different-0.008
 big-0.008
 third-0.008
Token.
Token ID13
Feature activation-1.23e+00

Pos logit contributions
 Anne+0.006
Beck+0.006
Anne+0.006
 great+0.005
 neat+0.005
Neg logit contributions
ment-0.009
edly-0.009
anged-0.008
isher-0.008
 thirst-0.008
Token They
Token ID1119
Feature activation-1.35e+00

Pos logit contributions
 neat+0.024
 Anne+0.024
Anne+0.023
Beck+0.023
 great+0.023
Neg logit contributions
edly-0.025
ment-0.025
 thirst-0.023
anged-0.022
ro-0.022
Token make
Token ID787
Feature activation-2.58e+00

Pos logit contributions
 Anne+0.024
Anne+0.024
Alice+0.023
 Alice+0.022
Beck+0.022
Neg logit contributions
ro-0.028
isher-0.027
anged-0.027
 thirst-0.026
edly-0.024
Token structures
Token ID8573
Feature activation-6.15e+00

Pos logit contributions
Beck+0.037
Anne+0.037
 Anne+0.036
Alice+0.034
Con+0.032
Neg logit contributions
ment-0.064
 thirst-0.063
isher-0.062
edly-0.061
anged-0.060
Token with
Token ID351
Feature activation-1.89e+00

Pos logit contributions
Anne+0.107
 Anne+0.106
Beck+0.103
Alice+0.101
 Ashley+0.097
Neg logit contributions
ment-0.147
edly-0.136
anged-0.132
isher-0.131
 thirst-0.125
Token blocks
Token ID7021
Feature activation-4.88e+00

Pos logit contributions
 Anne+0.020
Anne+0.020
Beck+0.019
Alice+0.018
May+0.016
Neg logit contributions
isher-0.056
ment-0.055
anged-0.054
 thirst-0.054
edly-0.052
Token.
Token ID13
Feature activation-1.72e+00

Pos logit contributions
 Anne+0.062
 great+0.057
Beck+0.057
Anne+0.056
 different+0.055
Neg logit contributions
ment-0.136
isher-0.134
edly-0.133
anged-0.131
 thirst-0.129
Token They
Token ID1119
Feature activation-1.44e+00

Pos logit contributions
 neat+0.037
 Anne+0.036
Anne+0.035
Beck+0.035
 great+0.035
Neg logit contributions
edly-0.033
ment-0.031
 thirst-0.029
ro-0.029
der-0.027
Token are
Token ID389
Feature activation-1.11e+00

Pos logit contributions
 Anne+0.024
Anne+0.024
Alice+0.023
Beck+0.022
 Alice+0.022
Neg logit contributions
ro-0.031
isher-0.030
anged-0.030
 thirst-0.030
edly-0.028
Token you
Token ID345
Feature activation+3.18e-01

Pos logit contributions
 Anne+0.017
Beck+0.016
Anne+0.016
 great+0.015
 third+0.015
Neg logit contributions
ment-0.053
isher-0.052
edly-0.051
 thirst-0.051
anged-0.050
Token.
Token ID13
Feature activation-1.75e+00

Pos logit contributions
ment+0.008
edly+0.008
isher+0.008
anged+0.008
 thirst+0.008
Neg logit contributions
 Anne-0.005
Beck-0.005
Anne-0.005
 great-0.004
 neat-0.004
Token He
Token ID679
Feature activation+1.61e-01

Pos logit contributions
 Anne+0.029
Anne+0.027
Beck+0.026
 great+0.026
 different+0.026
Neg logit contributions
isher-0.043
ment-0.043
edly-0.042
anged-0.041
 thirst-0.041
Token is
Token ID318
Feature activation-1.35e+00

Pos logit contributions
ment+0.005
isher+0.005
edly+0.005
anged+0.004
 thirst+0.004
Neg logit contributions
 Anne-0.002
 great-0.002
 different-0.002
 big-0.002
 out-0.002
Token not
Token ID407
Feature activation-1.42e+00

Pos logit contributions
 Anne+0.015
Beck+0.015
Anne+0.014
 third+0.014
 neat+0.013
Neg logit contributions
 thirst-0.037
ment-0.037
isher-0.037
edly-0.036
anged-0.034
Token a
Token ID257
Feature activation-6.14e+00

Pos logit contributions
 Anne+0.017
Beck+0.015
 great+0.015
Anne+0.015
 different+0.015
Neg logit contributions
ment-0.040
isher-0.040
edly-0.040
anged-0.039
 thirst-0.039
Token toy
Token ID13373
Feature activation-1.56e+00

Pos logit contributions
 Anne+0.096
Anne+0.091
Beck+0.089
 Higher+0.086
 overcome+0.086
Neg logit contributions
isher-0.136
 thirst-0.133
edly-0.130
der-0.127
mer-0.127
Token,
Token ID11
Feature activation-2.75e+00

Pos logit contributions
Beck+0.023
 Anne+0.023
Anne+0.022
 neat+0.022
 great+0.021
Neg logit contributions
edly-0.039
isher-0.039
ment-0.039
anged-0.037
 thirst-0.036
Token he
Token ID339
Feature activation-3.60e-01

Pos logit contributions
 Anne+0.031
 great+0.029
 different+0.028
 big+0.028
 neat+0.028
Neg logit contributions
ment-0.080
isher-0.080
edly-0.079
anged-0.078
 thirst-0.076
Token is
Token ID318
Feature activation-1.67e+00

Pos logit contributions
 great+0.004
 Anne+0.004
 different+0.004
 big+0.004
's+0.004
Neg logit contributions
ment-0.011
isher-0.010
edly-0.010
anged-0.010
 thirst-0.010
Token a
Token ID257
Feature activation-5.00e+00

Pos logit contributions
 Anne+0.019
Beck+0.018
Anne+0.017
 third+0.017
 great+0.017
Neg logit contributions
ment-0.048
isher-0.047
edly-0.047
 thirst-0.046
anged-0.045
Token nice
Token ID3621
Feature activation-2.21e-01

Pos logit contributions
 Anne+0.015
Beck+0.014
Anne+0.014
 third+0.013
 great+0.013
Neg logit contributions
ment-0.034
isher-0.034
edly-0.034
 thirst-0.033
anged-0.033
Token.
Token ID13
Feature activation-3.32e+00

Pos logit contributions
 Anne+0.003
 great+0.003
 different+0.003
Beck+0.003
Anne+0.003
Neg logit contributions
ment-0.006
isher-0.006
edly-0.006
anged-0.006
 thirst-0.006
Token You
Token ID921
Feature activation+6.69e-01

Pos logit contributions
 great+0.064
 Anne+0.064
 neat+0.063
 big+0.063
 different+0.062
Neg logit contributions
ment-0.067
edly-0.065
isher-0.063
anged-0.062
 thirst-0.060
Token are
Token ID389
Feature activation-1.37e+00

Pos logit contributions
ment+0.019
isher+0.019
edly+0.018
anged+0.018
 thirst+0.018
Neg logit contributions
 Anne-0.008
 great-0.007
Beck-0.007
Anne-0.007
 different-0.007
Token not
Token ID407
Feature activation-1.22e+00

Pos logit contributions
 Anne+0.018
Anne+0.018
Beck+0.017
 third+0.016
 Higher+0.016
Neg logit contributions
 thirst-0.034
isher-0.034
edly-0.033
ment-0.033
asured-0.032
Token a
Token ID257
Feature activation-6.13e+00

Pos logit contributions
 Anne+0.015
Beck+0.014
Anne+0.014
 third+0.013
 great+0.013
Neg logit contributions
ment-0.034
isher-0.033
edly-0.033
 thirst-0.032
anged-0.032
Token good
Token ID922
Feature activation-1.49e+00

Pos logit contributions
 Anne+0.101
Anne+0.099
 Higher+0.092
Alice+0.087
 Ashley+0.084
Neg logit contributions
isher-0.133
der-0.128
 gol-0.124
 actions-0.122
ro-0.121
Token friend
Token ID1545
Feature activation-2.01e+00

Pos logit contributions
 Anne+0.029
Anne+0.028
Beck+0.027
 great+0.026
 third+0.026
Neg logit contributions
isher-0.031
ment-0.031
edly-0.030
anged-0.030
 thirst-0.029
Token.
Token ID13
Feature activation-2.90e+00

Pos logit contributions
 Anne+0.032
Beck+0.031
Anne+0.030
 great+0.030
 neat+0.029
Neg logit contributions
ment-0.049
isher-0.048
edly-0.048
anged-0.046
 thirst-0.046
Token You
Token ID921
Feature activation+5.43e-01

Pos logit contributions
 great+0.056
 Anne+0.056
 neat+0.055
 big+0.054
 different+0.054
Neg logit contributions
ment-0.059
edly-0.057
isher-0.056
anged-0.055
 thirst-0.054
Token do
Token ID466
Feature activation-2.52e+00

Pos logit contributions
ment+0.016
isher+0.016
edly+0.016
anged+0.015
 thirst+0.015
Neg logit contributions
 Anne-0.006
 great-0.006
 different-0.005
Beck-0.005
 big-0.005
Token there
Token ID612
Feature activation+6.66e-01

Pos logit contributions
 Anne+0.004
 great+0.003
Beck+0.003
Anne+0.003
 different+0.003
Neg logit contributions
ment-0.014
isher-0.014
edly-0.014
anged-0.013
 thirst-0.013
Token was
Token ID373
Feature activation+2.06e+00

Pos logit contributions
ment+0.018
isher+0.018
edly+0.017
anged+0.017
 thirst+0.017
Neg logit contributions
 Anne-0.009
Beck-0.008
 great-0.008
Anne-0.008
 neat-0.008
Token a
Token ID257
Feature activation-1.82e+00

Pos logit contributions
ment+0.060
edly+0.059
 thirst+0.059
isher+0.058
anged+0.057
Neg logit contributions
 Anne-0.020
Beck-0.019
Anne-0.019
 neat-0.018
 third-0.018
Token rh
Token ID9529
Feature activation+5.05e+00

Pos logit contributions
 Anne+0.032
Anne+0.031
Beck+0.030
 out+0.029
 overcome+0.028
Neg logit contributions
 thirst-0.038
isher-0.037
ment-0.035
edly-0.035
anged-0.034
Tokenin
Token ID259
Feature activation+2.03e-01

Pos logit contributions
ment+0.132
edly+0.132
anged+0.130
 thirst+0.129
isher+0.123
Neg logit contributions
 different-0.071
 big-0.071
 Anne-0.068
Anne-0.068
Beck-0.068
Tokenoc
Token ID420
Feature activation+6.13e+00

Pos logit contributions
ment+0.004
isher+0.004
edly+0.004
anged+0.004
outed+0.004
Neg logit contributions
 big-0.005
 different-0.004
 great-0.004
's-0.004
 third-0.004
Tokeneros
Token ID27498
Feature activation+2.17e+00

Pos logit contributions
 thirst+0.081
ment+0.076
edly+0.075
outed+0.074
anged+0.071
Neg logit contributions
 Anne-0.155
 out-0.151
 best-0.148
 big-0.147
 ten-0.147
Token who
Token ID508
Feature activation+2.37e+00

Pos logit contributions
isher+0.060
ment+0.060
edly+0.059
anged+0.058
 thirst+0.058
Neg logit contributions
 Anne-0.027
 great-0.025
 different-0.025
 big-0.024
Anne-0.024
Token was
Token ID373
Feature activation+1.28e+00

Pos logit contributions
isher+0.064
ment+0.064
edly+0.062
anged+0.061
 thirst+0.061
Neg logit contributions
 Anne-0.033
 great-0.031
 different-0.031
 big-0.031
Anne-0.030
Token always
Token ID1464
Feature activation+3.23e+00

Pos logit contributions
 thirst+0.037
ment+0.037
edly+0.036
anged+0.036
isher+0.036
Neg logit contributions
 Anne-0.013
Beck-0.012
Anne-0.012
 third-0.010
 neat-0.010
Token naughty
Token ID45128
Feature activation+5.95e-01

Pos logit contributions
ment+0.094
isher+0.093
edly+0.092
anged+0.091
 thirst+0.089
Neg logit contributions
 Anne-0.037
 great-0.035
 different-0.034
 big-0.033
 neat-0.033
Token and
Token ID290
Feature activation+1.17e+00

Pos logit contributions
Beck+0.043
Anne+0.040
 Anne+0.040
 third+0.040
 overcome+0.036
Neg logit contributions
isher-0.075
anged-0.073
ment-0.073
uc-0.070
 thirst-0.069
Token their
Token ID511
Feature activation+2.26e+00

Pos logit contributions
edly+0.031
ment+0.031
isher+0.030
anged+0.030
 thirst+0.029
Neg logit contributions
 Anne-0.017
 big-0.017
 great-0.017
 different-0.016
 neat-0.016
Token little
Token ID1310
Feature activation+2.33e+00

Pos logit contributions
isher+0.050
ment+0.049
edly+0.049
 thirst+0.048
anged+0.048
Neg logit contributions
 Anne-0.041
Anne-0.039
Beck-0.039
 neat-0.036
 great-0.036
Token three
Token ID1115
Feature activation-7.93e-01

Pos logit contributions
ment+0.047
isher+0.046
edly+0.045
anged+0.044
 thirst+0.044
Neg logit contributions
 Anne-0.047
 great-0.045
Beck-0.045
Anne-0.044
 different-0.044
Token-
Token ID12
Feature activation+2.65e+00

Pos logit contributions
 Anne+0.015
Beck+0.014
Anne+0.014
 great+0.014
 neat+0.014
Neg logit contributions
ment-0.017
isher-0.017
edly-0.016
anged-0.016
 thirst-0.016
Tokenyear
Token ID1941
Feature activation+5.86e+00

Pos logit contributions
ment+0.030
edly+0.029
isher+0.029
 thirst+0.028
anged+0.027
Neg logit contributions
 Anne-0.077
Beck-0.074
Anne-0.074
 different-0.074
 great-0.074
Token-
Token ID12
Feature activation+3.04e+00

Pos logit contributions
isher+0.114
edly+0.109
anged+0.109
 thirst+0.108
ment+0.104
Neg logit contributions
 big-0.126
 many-0.119
 different-0.117
 good-0.113
 great-0.113
Tokenold
Token ID727
Feature activation+1.61e-01

Pos logit contributions
ment+0.057
isher+0.056
edly+0.055
anged+0.054
 thirst+0.053
Neg logit contributions
 Anne-0.066
 great-0.063
Beck-0.063
Anne-0.062
 different-0.062
Token daughter
Token ID4957
Feature activation-1.71e+00

Pos logit contributions
ment+0.003
isher+0.003
anged+0.003
edly+0.003
 thirst+0.003
Neg logit contributions
 Anne-0.003
Anne-0.003
Beck-0.003
 great-0.003
 neat-0.003
Token,
Token ID11
Feature activation-7.63e-01

Pos logit contributions
Beck+0.027
Anne+0.027
 Anne+0.024
Lou+0.024
 highest+0.024
Neg logit contributions
anged-0.042
ment-0.040
 thirst-0.039
uc-0.038
isher-0.038
Token Mill
Token ID9212
Feature activation-1.09e+00

Pos logit contributions
Beck+0.013
 third+0.013
Anne+0.013
 overcome+0.012
Con+0.012
Neg logit contributions
 thirst-0.015
ment-0.015
outed-0.015
anged-0.014
edly-0.014
Token park
Token ID3952
Feature activation+2.93e-01

Pos logit contributions
 Anne+0.015
 great+0.013
Beck+0.013
Anne+0.013
 different+0.013
Neg logit contributions
ment-0.045
isher-0.045
edly-0.045
anged-0.044
 thirst-0.044
Token.
Token ID13
Feature activation-3.60e-01

Pos logit contributions
ment+0.008
isher+0.008
edly+0.008
anged+0.007
 thirst+0.007
Neg logit contributions
 Anne-0.004
 great-0.004
Beck-0.004
Anne-0.004
 different-0.004
Token When
Token ID1649
Feature activation-2.11e+00

Pos logit contributions
Beck+0.006
 great+0.006
 neat+0.006
 Anne+0.006
 different+0.006
Neg logit contributions
ment-0.008
edly-0.008
anged-0.008
isher-0.008
 thirst-0.008
Token the
Token ID262
Feature activation-1.74e-01

Pos logit contributions
Beck+0.033
Anne+0.031
 Anne+0.030
 neat+0.029
 third+0.027
Neg logit contributions
ment-0.052
 thirst-0.048
anged-0.047
edly-0.047
isher-0.046
Token ch
Token ID442
Feature activation+1.32e-01

Pos logit contributions
 Anne+0.003
 great+0.003
 different+0.002
 big+0.002
 third+0.002
Neg logit contributions
ment-0.004
isher-0.004
edly-0.004
anged-0.004
 thirst-0.004
Tokenocol
Token ID4668
Feature activation+6.10e+00

Pos logit contributions
edly+0.002
isher+0.001
 thirst+0.001
idents+0.001
keeping+0.001
Neg logit contributions
 big-0.003
 different-0.003
 these-0.003
 great-0.003
 out-0.003
Tokenates
Token ID689
Feature activation+1.24e-01

Pos logit contributions
 thirst+0.140
isher+0.134
edly+0.133
anged+0.129
ment+0.129
Neg logit contributions
Anne-0.092
Beck-0.092
 Anne-0.090
 ten-0.087
 out-0.084
Token were
Token ID547
Feature activation-7.97e-03

Pos logit contributions
ment+0.003
isher+0.003
edly+0.003
anged+0.003
 thirst+0.003
Neg logit contributions
 Anne-0.002
 great-0.002
 different-0.002
 big-0.002
Beck-0.002
Token all
Token ID477
Feature activation-8.99e-01

Pos logit contributions
Beck+0.000
Anne+0.000
 Anne+0.000
Con+0.000
 June+0.000
Neg logit contributions
 thirst-0.000
outed-0.000
ment-0.000
edly-0.000
anged-0.000
Token gone
Token ID3750
Feature activation+1.34e-01

Pos logit contributions
 Anne+0.016
Beck+0.015
Anne+0.015
 third+0.014
 great+0.014
Neg logit contributions
isher-0.020
ment-0.020
anged-0.020
 thirst-0.020
edly-0.020
Token,
Token ID11
Feature activation-1.37e+00

Pos logit contributions
ment+0.004
edly+0.004
isher+0.004
anged+0.004
 thirst+0.004
Neg logit contributions
Anne-0.001
Beck-0.001
 Anne-0.001
 third-0.001
 great-0.001
Tokenmy
Token ID1820
Feature activation-1.82e+00

Pos logit contributions
 Anne+0.020
 great+0.018
 different+0.018
Anne+0.018
Beck+0.018
Neg logit contributions
ment-0.038
isher-0.037
edly-0.037
anged-0.036
 thirst-0.036
Token!"
Token ID2474
Feature activation-1.60e+00

Pos logit contributions
 Anne+0.019
 great+0.017
Beck+0.016
Anne+0.016
 different+0.016
Neg logit contributions
ment-0.055
isher-0.054
edly-0.054
anged-0.053
 thirst-0.052
Token exclaimed
Token ID33552
Feature activation+5.31e-01

Pos logit contributions
 Anne+0.024
 big+0.022
 different+0.022
 great+0.022
 third+0.021
Neg logit contributions
isher-0.042
ment-0.041
edly-0.040
anged-0.040
 thirst-0.039
Token three
Token ID1115
Feature activation-7.65e-01

Pos logit contributions
ment+0.014
isher+0.013
edly+0.013
anged+0.013
 thirst+0.013
Neg logit contributions
 Anne-0.007
Anne-0.007
Beck-0.007
 great-0.007
 different-0.007
Token-
Token ID12
Feature activation+3.47e+00

Pos logit contributions
 Anne+0.009
 big+0.008
 different+0.008
 great+0.008
 neat+0.008
Neg logit contributions
isher-0.022
ment-0.022
anged-0.022
edly-0.021
 thirst-0.021
Tokenyear
Token ID1941
Feature activation+5.73e+00

Pos logit contributions
ment+0.038
isher+0.037
edly+0.036
anged+0.035
 thirst+0.034
Neg logit contributions
 Anne-0.102
 great-0.098
Beck-0.098
Anne-0.098
 different-0.098
Token-
Token ID12
Feature activation+4.09e+00

Pos logit contributions
isher+0.110
anged+0.108
 thirst+0.106
edly+0.103
ment+0.101
Neg logit contributions
 big-0.123
 different-0.116
 many-0.113
 great-0.112
 neat-0.110
Tokenold
Token ID727
Feature activation+1.16e+00

Pos logit contributions
ment+0.078
isher+0.077
edly+0.076
anged+0.074
 thirst+0.073
Neg logit contributions
 Anne-0.088
 great-0.084
Beck-0.083
Anne-0.083
 different-0.083
Token Jenny
Token ID28437
Feature activation-1.21e+00

Pos logit contributions
isher+0.030
ment+0.030
edly+0.029
 thirst+0.029
anged+0.029
Neg logit contributions
 Anne-0.017
 great-0.015
Beck-0.015
 different-0.015
Anne-0.015
Token.
Token ID13
Feature activation+1.08e+00

Pos logit contributions
 Anne+0.017
 great+0.015
Beck+0.015
Anne+0.015
 different+0.015
Neg logit contributions
ment-0.032
isher-0.032
edly-0.032
anged-0.031
 thirst-0.031
Token She
Token ID1375
Feature activation+2.01e+00

Pos logit contributions
ment+0.030
isher+0.029
edly+0.029
anged+0.029
 thirst+0.028
Neg logit contributions
 Anne-0.013
Beck-0.012
 great-0.012
Anne-0.012
 different-0.012
Token started
Token ID2067
Feature activation-1.14e+00

Pos logit contributions
ment+0.057
anged+0.056
isher+0.056
edly+0.054
 thirst+0.052
Neg logit contributions
 great-0.040
Anne-0.039
 Anne-0.038
 third-0.038
Beck-0.038
Token eating
Token ID6600
Feature activation-7.44e-01

Pos logit contributions
Beck+0.017
Anne+0.016
 great+0.016
Alice+0.016
 Anne+0.016
Neg logit contributions
ment-0.030
anged-0.029
edly-0.027
uc-0.026
 thirst-0.026
Token the
Token ID262
Feature activation-1.31e+00

Pos logit contributions
Anne+0.012
Beck+0.011
 Anne+0.011
June+0.011
 June+0.011
Neg logit contributions
ment-0.020
edly-0.018
 thirst-0.018
isher-0.017
uc-0.016
Token delicious
Token ID12625
Feature activation-1.73e-01

Pos logit contributions
 Anne+0.021
Anne+0.021
Beck+0.021
 June+0.020
 overcome+0.019
Neg logit contributions
isher-0.031
ment-0.031
 thirst-0.030
edly-0.029
anged-0.028
Token ch
Token ID442
Feature activation+1.29e+00

Pos logit contributions
 Anne+0.003
Anne+0.003
Beck+0.003
 overcome+0.003
 June+0.003
Neg logit contributions
ment-0.004
isher-0.004
 thirst-0.003
edly-0.003
anged-0.003
Tokenocol
Token ID4668
Feature activation+6.11e+00

Pos logit contributions
edly+0.016
 thirst+0.014
outed+0.014
isher+0.013
ipt+0.012
Neg logit contributions
 big-0.035
 out-0.034
 great-0.034
 different-0.034
 matching-0.033
Tokenates
Token ID689
Feature activation+8.09e-01

Pos logit contributions
 thirst+0.135
isher+0.132
anged+0.127
edly+0.124
ment+0.123
Neg logit contributions
Beck-0.106
 Anne-0.105
Anne-0.103
 out-0.097
 ten-0.094
Token.
Token ID13
Feature activation-4.91e-01

Pos logit contributions
ment+0.021
edly+0.021
isher+0.020
anged+0.020
 thirst+0.019
Neg logit contributions
 Anne-0.012
Beck-0.011
Anne-0.011
 great-0.011
 third-0.011
Token She
Token ID1375
Feature activation+8.96e-01

Pos logit contributions
 overcome+0.007
 neat+0.007
 great+0.007
Beck+0.007
 best+0.007
Neg logit contributions
edly-0.012
ment-0.012
outed-0.011
anged-0.011
orted-0.011
Token was
Token ID373
Feature activation-5.46e-02

Pos logit contributions
edly+0.019
ment+0.019
anged+0.019
 thirst+0.019
isher+0.019
Neg logit contributions
 Anne-0.015
Beck-0.015
Anne-0.015
 great-0.015
 neat-0.015
Token so
Token ID523
Feature activation-4.20e-01

Pos logit contributions
Beck+0.001
Anne+0.001
 Anne+0.001
 third+0.001
Miss+0.001
Neg logit contributions
 thirst-0.001
edly-0.001
ment-0.001
anged-0.001
asured-0.001
Token and
Token ID290
Feature activation+1.73e+00

Pos logit contributions
isher+0.015
ment+0.015
edly+0.015
anged+0.015
 thirst+0.015
Neg logit contributions
 Anne-0.006
 great-0.006
 different-0.005
 big-0.005
 out-0.005
Token confused
Token ID10416
Feature activation+2.71e+00

Pos logit contributions
isher+0.051
ment+0.050
edly+0.049
anged+0.046
row+0.045
Neg logit contributions
 big-0.024
 different-0.024
 out-0.022
 neat-0.022
 great-0.022
Token.
Token ID13
Feature activation-1.90e+00

Pos logit contributions
isher+0.079
ment+0.077
anged+0.075
edly+0.075
 thirst+0.075
Neg logit contributions
 Anne-0.031
 different-0.030
 great-0.029
 big-0.029
 out-0.029
Token He
Token ID679
Feature activation+1.57e-01

Pos logit contributions
 Anne+0.024
 great+0.023
 neat+0.023
Anne+0.023
Beck+0.023
Neg logit contributions
ment-0.051
edly-0.050
isher-0.049
anged-0.048
 thirst-0.048
Token shook
Token ID14682
Feature activation-4.47e-02

Pos logit contributions
ment+0.004
isher+0.004
edly+0.004
anged+0.004
 thirst+0.004
Neg logit contributions
 big-0.002
 Anne-0.002
 different-0.002
 great-0.002
Anne-0.002
Token his
Token ID465
Feature activation+2.57e+00

Pos logit contributions
Beck+0.001
 Anne+0.001
Anne+0.001
 great+0.001
 third+0.001
Neg logit contributions
ment-0.001
edly-0.001
anged-0.001
isher-0.001
 thirst-0.001
Token head
Token ID1182
Feature activation-8.42e-02

Pos logit contributions
ment+0.053
isher+0.052
edly+0.052
anged+0.050
 thirst+0.050
Neg logit contributions
 Anne-0.051
Beck-0.049
 great-0.049
Anne-0.049
 different-0.048
Token and
Token ID290
Feature activation+2.12e+00

Pos logit contributions
 Anne+0.001
Beck+0.001
Anne+0.001
 great+0.001
 neat+0.001
Neg logit contributions
ment-0.002
edly-0.002
isher-0.002
anged-0.002
 thirst-0.002
Token threw
Token ID9617
Feature activation-7.55e-01

Pos logit contributions
isher+0.054
ment+0.053
edly+0.052
anged+0.050
 thirst+0.050
Neg logit contributions
 Anne-0.033
 different-0.032
 big-0.032
 out-0.031
 neat-0.031
Token the
Token ID262
Feature activation-6.81e-01

Pos logit contributions
 Anne+0.008
Beck+0.007
Anne+0.007
 great+0.006
 third+0.006
Neg logit contributions
ment-0.023
isher-0.023
edly-0.022
anged-0.022
 thirst-0.022
Token cushion
Token ID33579
Feature activation-2.54e-01

Pos logit contributions
 Anne+0.007
Beck+0.007
Anne+0.007
 overcome+0.007
 great+0.007
Neg logit contributions
isher-0.019
ment-0.019
edly-0.019
 thirst-0.019
anged-0.018
Token made
Token ID925
Feature activation-1.59e+00

Pos logit contributions
ment+0.030
edly+0.030
isher+0.030
anged+0.030
 thirst+0.029
Neg logit contributions
 Anne-0.019
Beck-0.018
Anne-0.018
 great-0.018
 neat-0.017
Token the
Token ID262
Feature activation-1.10e+00

Pos logit contributions
 Anne+0.019
 big+0.018
 great+0.018
 different+0.018
 out+0.017
Neg logit contributions
isher-0.046
edly-0.046
ment-0.046
anged-0.045
 thirst-0.044
Token old
Token ID1468
Feature activation+1.70e+00

Pos logit contributions
 big+0.016
 different+0.016
 great+0.015
 best+0.015
 third+0.015
Neg logit contributions
ment-0.032
anged-0.031
isher-0.031
edly-0.030
 thirst-0.028
Token man
Token ID582
Feature activation-9.93e-01

Pos logit contributions
isher+0.048
ment+0.047
anged+0.046
edly+0.046
 thirst+0.046
Neg logit contributions
 Anne-0.022
 big-0.022
 different-0.022
 great-0.021
 neat-0.021
Token very
Token ID845
Feature activation+1.29e+00

Pos logit contributions
 Anne+0.011
 great+0.010
Beck+0.010
Anne+0.010
 different+0.010
Neg logit contributions
ment-0.029
isher-0.029
edly-0.028
anged-0.028
 thirst-0.028
Token happy
Token ID3772
Feature activation+3.05e+00

Pos logit contributions
isher+0.043
ment+0.042
edly+0.041
anged+0.041
 thirst+0.040
Neg logit contributions
 different-0.010
 great-0.010
 big-0.010
 Anne-0.010
 neat-0.010
Token.
Token ID13
Feature activation-5.67e-01

Pos logit contributions
isher+0.092
ment+0.089
anged+0.088
 thirst+0.088
edly+0.086
Neg logit contributions
 Anne-0.033
 out-0.032
 different-0.032
 big-0.031
 great-0.031
Token
Token ID198
Feature activation+1.22e-01

Pos logit contributions
 Anne+0.006
 great+0.006
Beck+0.006
Anne+0.006
 different+0.006
Neg logit contributions
ment-0.017
isher-0.017
edly-0.016
anged-0.016
 thirst-0.016
Token
Token ID198
Feature activation+1.97e-01

Pos logit contributions
ment+0.003
edly+0.003
anged+0.003
ro+0.002
ipt+0.002
Neg logit contributions
 neat-0.002
 Anne-0.002
 best-0.002
Beck-0.002
 great-0.002
TokenHe
Token ID1544
Feature activation+1.14e+00

Pos logit contributions
edly+0.005
isher+0.005
ro+0.005
ment+0.005
ised+0.005
Neg logit contributions
 Anne-0.003
 best-0.003
 out-0.003
 great-0.003
Beck-0.002
Token decided
Token ID3066
Feature activation+6.02e-01

Pos logit contributions
ment+0.026
isher+0.025
edly+0.025
anged+0.025
 thirst+0.024
Neg logit contributions
 Anne-0.020
 great-0.019
Beck-0.019
Anne-0.019
 different-0.019
Token up
Token ID510
Feature activation+1.11e-01

Pos logit contributions
 Anne+0.013
Beck+0.012
Anne+0.012
 great+0.011
 third+0.011
Neg logit contributions
ment-0.031
edly-0.030
isher-0.030
anged-0.030
 thirst-0.029
Token and
Token ID290
Feature activation+2.40e+00

Pos logit contributions
isher+0.003
ment+0.003
edly+0.003
anged+0.003
 thirst+0.003
Neg logit contributions
 Anne-0.001
 different-0.001
 big-0.001
 great-0.001
 out-0.001
Token up
Token ID510
Feature activation+3.99e-01

Pos logit contributions
ment+0.067
isher+0.065
edly+0.063
anged+0.059
 thirst+0.058
Neg logit contributions
 out-0.034
 different-0.034
 big-0.033
 Anne-0.033
 great-0.031
Token,
Token ID11
Feature activation-3.45e-01

Pos logit contributions
ment+0.012
edly+0.011
isher+0.011
anged+0.011
 thirst+0.011
Neg logit contributions
 Anne-0.005
Beck-0.004
Anne-0.004
 great-0.004
 neat-0.004
Token trying
Token ID2111
Feature activation-8.10e-01

Pos logit contributions
 Anne+0.005
Beck+0.004
Anne+0.004
 great+0.004
 third+0.004
Neg logit contributions
ment-0.009
edly-0.009
isher-0.009
 thirst-0.009
anged-0.009
Token to
Token ID284
Feature activation+1.69e+00

Pos logit contributions
Anne+0.015
Beck+0.015
 Anne+0.015
Alice+0.014
 Ellen+0.013
Neg logit contributions
ment-0.019
edly-0.018
 thirst-0.018
anged-0.017
der-0.017
Token get
Token ID651
Feature activation-5.54e-01

Pos logit contributions
ment+0.049
isher+0.048
edly+0.048
anged+0.047
 thirst+0.045
Neg logit contributions
 Anne-0.020
 great-0.019
 different-0.019
 big-0.019
 third-0.018
Token closer
Token ID5699
Feature activation+4.12e-01

Pos logit contributions
 out+0.007
 big+0.006
 different+0.006
 great+0.006
 good+0.006
Neg logit contributions
edly-0.017
isher-0.017
ment-0.017
anged-0.016
ately-0.016
Token to
Token ID284
Feature activation+6.67e-01

Pos logit contributions
ment+0.011
edly+0.011
isher+0.010
anged+0.010
 thirst+0.010
Neg logit contributions
 Anne-0.006
Beck-0.006
Anne-0.006
 great-0.005
 third-0.005
Token the
Token ID262
Feature activation-2.63e-01

Pos logit contributions
isher+0.023
ment+0.023
edly+0.023
anged+0.023
 thirst+0.022
Neg logit contributions
 Anne-0.004
 different-0.003
 great-0.003
 big-0.003
 out-0.003
Token sun
Token ID4252
Feature activation-1.45e-01

Pos logit contributions
 big+0.004
 great+0.004
 different+0.004
 third+0.004
 Anne+0.004
Neg logit contributions
ment-0.007
isher-0.007
anged-0.007
edly-0.007
 thirst-0.006
Token jumped
Token ID11687
Feature activation-3.11e-01

Pos logit contributions
 Anne+0.046
Beck+0.045
 great+0.045
 neat+0.044
Anne+0.043
Neg logit contributions
edly-0.066
ment-0.066
anged-0.065
isher-0.064
 thirst-0.063
Token in
Token ID287
Feature activation-2.41e+00

Pos logit contributions
Anne+0.005
Beck+0.005
 Anne+0.004
 piping+0.004
Alice+0.004
Neg logit contributions
edly-0.008
ment-0.008
 thirst-0.007
ro-0.007
isher-0.007
Token surprise
Token ID5975
Feature activation+4.05e-01

Pos logit contributions
Beck+0.034
 Anne+0.033
Anne+0.033
 overcome+0.031
 best+0.030
Neg logit contributions
ment-0.062
edly-0.060
 thirst-0.058
isher-0.058
anged-0.057
Token,
Token ID11
Feature activation-6.14e-01

Pos logit contributions
ment+0.010
edly+0.010
anged+0.010
isher+0.009
 thirst+0.009
Neg logit contributions
Beck-0.006
Anne-0.006
 great-0.006
 third-0.006
 Anne-0.006
Token but
Token ID475
Feature activation+1.33e+00

Pos logit contributions
 Anne+0.009
Beck+0.008
 great+0.008
Anne+0.008
 third+0.008
Neg logit contributions
ment-0.016
edly-0.016
isher-0.015
 thirst-0.015
anged-0.015
Token Cel
Token ID15248
Feature activation+2.59e+00

Pos logit contributions
ment+0.038
isher+0.038
edly+0.038
anged+0.037
 thirst+0.037
Neg logit contributions
 Anne-0.015
Beck-0.014
 great-0.014
Anne-0.014
 different-0.013
Tokenery
Token ID1924
Feature activation-2.52e-01

Pos logit contributions
ment+0.055
edly+0.055
isher+0.055
anged+0.054
 thirst+0.052
Neg logit contributions
 Anne-0.048
Beck-0.047
 great-0.046
 third-0.045
 big-0.045
Token just
Token ID655
Feature activation+4.58e+00

Pos logit contributions
 Anne+0.004
 great+0.004
 different+0.003
 big+0.003
 third+0.003
Neg logit contributions
isher-0.006
ment-0.006
anged-0.006
edly-0.006
 thirst-0.006
Token smiled
Token ID13541
Feature activation+1.52e+00

Pos logit contributions
isher+0.107
ment+0.106
edly+0.103
anged+0.100
 thirst+0.097
Neg logit contributions
 Anne-0.080
 different-0.076
 great-0.076
 big-0.075
 neat-0.075
Token and
Token ID290
Feature activation+2.32e+00

Pos logit contributions
ment+0.040
isher+0.039
edly+0.039
anged+0.038
 thirst+0.038
Neg logit contributions
 Anne-0.022
 great-0.020
Beck-0.020
Anne-0.020
 different-0.019
Token kept
Token ID4030
Feature activation-6.89e-01

Pos logit contributions
isher+0.057
ment+0.056
edly+0.054
anged+0.054
 thirst+0.053
Neg logit contributions
 Anne-0.038
 great-0.036
 different-0.035
 big-0.035
 neat-0.035
Token went
Token ID1816
Feature activation-1.14e+00

Pos logit contributions
 Anne+0.057
Anne+0.055
Beck+0.055
 great+0.052
 third+0.052
Neg logit contributions
anged-0.053
isher-0.052
ment-0.050
 thirst-0.050
outed-0.050
Token away
Token ID1497
Feature activation+2.51e+00

Pos logit contributions
 Anne+0.018
Beck+0.017
Anne+0.017
 third+0.016
 great+0.016
Neg logit contributions
ment-0.027
anged-0.027
edly-0.027
isher-0.026
 thirst-0.026
Token.
Token ID13
Feature activation-4.98e-01

Pos logit contributions
ment+0.063
edly+0.061
isher+0.059
anged+0.059
 thirst+0.058
Neg logit contributions
 Anne-0.039
Beck-0.038
Anne-0.037
 great-0.036
 third-0.036
Token They
Token ID1119
Feature activation-7.96e-01

Pos logit contributions
 Anne+0.007
 great+0.006
Beck+0.006
Anne+0.006
 different+0.006
Neg logit contributions
ment-0.013
isher-0.013
edly-0.013
anged-0.013
 thirst-0.013
Token told
Token ID1297
Feature activation+1.59e+00

Pos logit contributions
 Anne+0.013
 great+0.012
Beck+0.012
Anne+0.012
 different+0.012
Neg logit contributions
ment-0.019
isher-0.019
edly-0.019
anged-0.018
 thirst-0.018
Token her
Token ID607
Feature activation+2.08e+00

Pos logit contributions
ment+0.045
 thirst+0.044
edly+0.044
isher+0.044
anged+0.044
Neg logit contributions
 Anne-0.018
Beck-0.017
Anne-0.017
 third-0.016
 great-0.016
Token to
Token ID284
Feature activation+2.00e+00

Pos logit contributions
ment+0.056
edly+0.053
 thirst+0.052
isher+0.051
anged+0.051
Neg logit contributions
Beck-0.028
 Anne-0.028
Anne-0.027
 third-0.025
 neat-0.025
Token stay
Token ID2652
Feature activation+4.34e-01

Pos logit contributions
ment+0.058
isher+0.058
edly+0.057
anged+0.056
 thirst+0.055
Neg logit contributions
 Anne-0.023
 great-0.021
 different-0.021
 big-0.020
 neat-0.020
Token inside
Token ID2641
Feature activation+1.07e+00

Pos logit contributions
ment+0.012
isher+0.012
edly+0.012
 thirst+0.012
anged+0.012
Neg logit contributions
 Anne-0.005
Anne-0.005
Beck-0.005
 third-0.004
 great-0.004
Token and
Token ID290
Feature activation+2.02e+00

Pos logit contributions
ment+0.029
edly+0.028
isher+0.028
 thirst+0.027
anged+0.027
Neg logit contributions
 Anne-0.014
Beck-0.014
Anne-0.014
 great-0.013
 third-0.012
Token be
Token ID307
Feature activation+2.43e-02

Pos logit contributions
isher+0.065
ment+0.063
edly+0.062
anged+0.062
outed+0.059
Neg logit contributions
 Anne-0.021
 out-0.019
 different-0.019
 big-0.019
 neat-0.019
Token was
Token ID373
Feature activation-7.48e-01

Pos logit contributions
ment+0.010
isher+0.010
edly+0.010
anged+0.010
 thirst+0.010
Neg logit contributions
 Anne-0.007
 great-0.006
Beck-0.006
Anne-0.006
 different-0.006
Token happy
Token ID3772
Feature activation+1.32e+00

Pos logit contributions
 Anne+0.011
Beck+0.011
Anne+0.011
 third+0.010
 calendar+0.010
Neg logit contributions
 thirst-0.018
ment-0.018
edly-0.017
asured-0.017
isher-0.017
Token and
Token ID290
Feature activation+1.35e+00

Pos logit contributions
isher+0.040
ment+0.039
anged+0.038
edly+0.038
 thirst+0.038
Neg logit contributions
 Anne-0.014
 different-0.013
 big-0.013
 great-0.013
 out-0.013
Token proud
Token ID6613
Feature activation+1.24e+00

Pos logit contributions
isher+0.042
ment+0.039
anged+0.039
edly+0.038
 thirst+0.037
Neg logit contributions
 big-0.016
 great-0.016
 neat-0.016
 different-0.016
 Anne-0.015
Token.
Token ID13
Feature activation-1.63e+00

Pos logit contributions
isher+0.037
anged+0.036
ment+0.035
edly+0.035
 thirst+0.035
Neg logit contributions
 different-0.014
 big-0.014
 Anne-0.014
 out-0.013
 great-0.013
Token He
Token ID679
Feature activation+3.58e-01

Pos logit contributions
 Anne+0.020
 big+0.019
 great+0.019
 different+0.018
Beck+0.018
Neg logit contributions
isher-0.048
anged-0.046
ment-0.046
edly-0.046
 thirst-0.045
Token never
Token ID1239
Feature activation+2.53e+00

Pos logit contributions
ment+0.009
isher+0.009
edly+0.009
anged+0.008
 thirst+0.008
Neg logit contributions
 Anne-0.006
 great-0.005
Beck-0.005
Anne-0.005
 different-0.005
Token forgot
Token ID16453
Feature activation-2.31e-01

Pos logit contributions
isher+0.071
edly+0.068
ment+0.068
 thirst+0.063
ocol+0.062
Neg logit contributions
 different-0.037
 big-0.037
's-0.036
 great-0.034
 out-0.034
Token the
Token ID262
Feature activation-7.68e-01

Pos logit contributions
Beck+0.003
Anne+0.003
 Anne+0.003
 third+0.003
Alice+0.003
Neg logit contributions
ment-0.006
isher-0.006
edly-0.006
 thirst-0.005
der-0.005
Token girl
Token ID2576
Feature activation-1.94e-01

Pos logit contributions
 Anne+0.011
Anne+0.010
Beck+0.010
 out+0.009
 Alice+0.009
Neg logit contributions
isher-0.020
ment-0.020
 thirst-0.020
edly-0.019
anged-0.018
Token who
Token ID508
Feature activation+8.81e-01

Pos logit contributions
 Anne+0.002
Beck+0.002
Anne+0.002
 great+0.002
 neat+0.002
Neg logit contributions
ment-0.005
edly-0.005
isher-0.005
anged-0.005
 thirst-0.005
Token and
Token ID290
Feature activation+2.20e+00

Pos logit contributions
 Anne+0.005
 great+0.005
Beck+0.005
Anne+0.004
 third+0.004
Neg logit contributions
ment-0.008
isher-0.008
edly-0.008
anged-0.008
 thirst-0.008
Token a
Token ID257
Feature activation-7.45e-01

Pos logit contributions
ment+0.067
isher+0.066
 thirst+0.065
anged+0.064
edly+0.064
Neg logit contributions
 Anne-0.018
Beck-0.017
Anne-0.017
 third-0.016
 great-0.016
Token boy
Token ID2933
Feature activation-1.64e+00

Pos logit contributions
 Anne+0.014
Anne+0.014
Beck+0.013
 overcome+0.013
Alice+0.013
Neg logit contributions
 thirst-0.015
ment-0.014
isher-0.014
anged-0.013
edly-0.013
Token.
Token ID13
Feature activation-2.84e-01

Pos logit contributions
 Anne+0.020
 different+0.019
 great+0.018
 big+0.018
 out+0.017
Neg logit contributions
isher-0.047
ment-0.046
edly-0.046
 thirst-0.044
anged-0.044
Token They
Token ID1119
Feature activation-1.31e+00

Pos logit contributions
 great+0.006
 Anne+0.005
Beck+0.005
Anne+0.005
 wonderful+0.005
Neg logit contributions
anged-0.006
ment-0.005
isher-0.005
edly-0.005
 thirst-0.005
Token were
Token ID547
Feature activation-7.90e-01

Pos logit contributions
 Anne+0.025
Anne+0.024
Beck+0.024
Alice+0.023
 Alice+0.023
Neg logit contributions
 thirst-0.027
isher-0.027
anged-0.026
outed-0.025
edly-0.025
Token outside
Token ID2354
Feature activation-1.25e+00

Pos logit contributions
Beck+0.011
 Anne+0.011
Anne+0.011
Alice+0.009
 third+0.009
Neg logit contributions
 thirst-0.020
anged-0.019
outed-0.019
isher-0.018
uc-0.018
Token playing
Token ID2712
Feature activation-2.77e+00

Pos logit contributions
 Anne+0.013
 great+0.012
Beck+0.012
Anne+0.012
 different+0.012
Neg logit contributions
ment-0.037
isher-0.037
edly-0.036
anged-0.036
 thirst-0.036
Token in
Token ID287
Feature activation-8.68e-01

Pos logit contributions
 Anne+0.036
Anne+0.033
Beck+0.033
 third+0.031
 Alice+0.030
Neg logit contributions
edly-0.075
ment-0.075
 thirst-0.073
isher-0.073
anged-0.071
Token the
Token ID262
Feature activation+7.80e-02

Pos logit contributions
 Anne+0.009
Beck+0.009
Anne+0.008
 third+0.008
 best+0.008
Neg logit contributions
ment-0.024
isher-0.023
 thirst-0.023
anged-0.023
edly-0.022
Token bright
Token ID6016
Feature activation+1.44e-01

Pos logit contributions
isher+0.002
ment+0.002
 thirst+0.002
anged+0.002
elt+0.002
Neg logit contributions
 Anne-0.001
Anne-0.001
Beck-0.001
 Alice-0.001
 June-0.001
Token me
Token ID502
Feature activation-3.40e-02

Pos logit contributions
 Anne+0.027
Beck+0.027
Anne+0.026
 third+0.026
 neat+0.025
Neg logit contributions
isher-0.035
ment-0.035
anged-0.035
 thirst-0.035
edly-0.034
Token.
Token ID13
Feature activation-1.77e+00

Pos logit contributions
 Anne+0.001
Beck+0.000
 great+0.000
Anne+0.000
 different+0.000
Neg logit contributions
ment-0.001
isher-0.001
edly-0.001
anged-0.001
 thirst-0.001
Token Can
Token ID1680
Feature activation+9.91e-01

Pos logit contributions
 Anne+0.031
 great+0.029
Beck+0.028
Anne+0.028
 different+0.028
Neg logit contributions
ment-0.041
isher-0.041
edly-0.040
anged-0.039
 thirst-0.039
Token I
Token ID314
Feature activation+5.74e-01

Pos logit contributions
ment+0.027
isher+0.027
edly+0.027
anged+0.026
 thirst+0.026
Neg logit contributions
 Anne-0.013
 great-0.012
 third-0.012
 different-0.012
 big-0.012
Token help
Token ID1037
Feature activation-1.82e+00

Pos logit contributions
ment+0.015
isher+0.015
anged+0.014
edly+0.014
 thirst+0.014
Neg logit contributions
 Anne-0.008
Beck-0.008
Anne-0.008
 great-0.007
 third-0.007
Token you
Token ID345
Feature activation-5.58e-01

Pos logit contributions
Beck+0.036
 Anne+0.036
 great+0.035
Anne+0.035
 third+0.035
Neg logit contributions
isher-0.036
ment-0.035
anged-0.034
edly-0.032
 thirst-0.032
Token fix
Token ID4259
Feature activation-1.78e+00

Pos logit contributions
 Anne+0.008
Beck+0.008
Anne+0.008
 great+0.007
 third+0.007
Neg logit contributions
ment-0.015
isher-0.014
edly-0.014
anged-0.014
 thirst-0.014
Token your
Token ID534
Feature activation-4.38e-01

Pos logit contributions
Beck+0.022
 Anne+0.021
Anne+0.020
 third+0.020
 neat+0.018
Neg logit contributions
isher-0.050
ment-0.049
anged-0.046
edly-0.045
ishing-0.045
Token sc
Token ID629
Feature activation+2.23e-01

Pos logit contributions
 Anne+0.005
 great+0.005
 different+0.005
 big+0.005
 neat+0.004
Neg logit contributions
ment-0.013
isher-0.013
edly-0.013
anged-0.012
 thirst-0.012
Tokenooter
Token ID25141
Feature activation-2.66e-01

Pos logit contributions
edly+0.006
 thirst+0.006
ment+0.006
isher+0.006
anged+0.005
Neg logit contributions
 big-0.003
 great-0.003
 different-0.003
 best-0.003
 neat-0.003
Token?"
Token ID1701
Feature activation-8.66e-02

Pos logit contributions
 Anne+0.003
 great+0.003
 big+0.003
 out+0.003
 different+0.003
Neg logit contributions
ment-0.008
edly-0.008
isher-0.008
 thirst-0.008
anged-0.008
Token."
Token ID526
Feature activation-2.44e-01

Pos logit contributions
 Anne+0.003
 great+0.003
Beck+0.003
Anne+0.003
 different+0.003
Neg logit contributions
ment-0.006
isher-0.006
edly-0.006
anged-0.006
 thirst-0.006
Token
Token ID198
Feature activation+8.01e-01

Pos logit contributions
 great+0.003
 Anne+0.003
 neat+0.003
Beck+0.003
Anne+0.003
Neg logit contributions
ment-0.007
edly-0.007
anged-0.006
 thirst-0.006
isher-0.006
Token
Token ID198
Feature activation+1.05e+00

Pos logit contributions
ment+0.021
edly+0.020
isher+0.020
 thirst+0.020
anged+0.020
Neg logit contributions
 Anne-0.010
 great-0.010
 neat-0.010
 matching-0.010
 best-0.010
TokenDad
Token ID46270
Feature activation-8.40e-01

Pos logit contributions
ment+0.025
edly+0.025
isher+0.024
 thirst+0.023
anged+0.023
Neg logit contributions
 out-0.016
 Anne-0.016
 best-0.016
 great-0.016
 matching-0.015
Token said
Token ID531
Feature activation+4.62e-01

Pos logit contributions
 Anne+0.014
 great+0.014
Beck+0.014
Anne+0.014
 third+0.014
Neg logit contributions
ment-0.019
isher-0.019
edly-0.018
anged-0.018
 thirst-0.018
Token,
Token ID11
Feature activation-1.72e+00

Pos logit contributions
ment+0.015
edly+0.014
isher+0.014
 thirst+0.014
anged+0.014
Neg logit contributions
 Anne-0.004
Anne-0.004
Beck-0.004
 great-0.004
 neat-0.004
Token "
Token ID366
Feature activation+4.25e-01

Pos logit contributions
 Anne+0.018
 great+0.017
Beck+0.017
Anne+0.016
 neat+0.016
Neg logit contributions
ment-0.051
edly-0.050
isher-0.049
anged-0.048
 thirst-0.048
TokenThe
Token ID464
Feature activation+5.02e-01

Pos logit contributions
anged+0.012
ment+0.012
 thirst+0.011
isher+0.011
edly+0.011
Neg logit contributions
Beck-0.006
Anne-0.006
 Anne-0.005
 third-0.005
 different-0.005
Token fish
Token ID5916
Feature activation-7.70e-01

Pos logit contributions
ment+0.013
anged+0.013
edly+0.012
isher+0.012
outed+0.011
Neg logit contributions
 big-0.008
 great-0.008
 different-0.008
 third-0.008
 best-0.008
Token will
Token ID481
Feature activation-7.47e-01

Pos logit contributions
 Anne+0.008
 great+0.007
 different+0.007
 big+0.007
 out+0.007
Neg logit contributions
ment-0.023
edly-0.023
isher-0.023
anged-0.023
 thirst-0.022
Token not
Token ID407
Feature activation-9.04e-01

Pos logit contributions
 Anne+0.011
Anne+0.010
Beck+0.010
 great+0.010
 different+0.010
Neg logit contributions
ment-0.019
edly-0.019
isher-0.019
 thirst-0.018
anged-0.018
Token always
Token ID1464
Feature activation+2.42e+00

Pos logit contributions
Beck+0.014
 Anne+0.013
Anne+0.013
 third+0.013
Alice+0.012
Neg logit contributions
ment-0.023
 thirst-0.023
anged-0.022
edly-0.022
isher-0.022
Token looking
Token ID2045
Feature activation-2.25e+00

Pos logit contributions
ment+0.066
isher+0.065
edly+0.065
anged+0.063
 thirst+0.063
Neg logit contributions
 Anne-0.032
 great-0.030
Beck-0.029
 different-0.029
Anne-0.029
Token for
Token ID329
Feature activation-1.54e+00

Pos logit contributions
 Anne+0.039
Anne+0.039
Beck+0.037
 third+0.036
Alice+0.035
Neg logit contributions
ment-0.053
isher-0.051
edly-0.050
 thirst-0.049
anged-0.049
Token treasure
Token ID14068
Feature activation-1.07e+00

Pos logit contributions
 Anne+0.023
Anne+0.022
Beck+0.021
 third+0.020
 great+0.019
Neg logit contributions
ment-0.040
isher-0.039
 thirst-0.038
edly-0.038
anged-0.037
Token and
Token ID290
Feature activation+7.70e-01

Pos logit contributions
 Anne+0.015
 great+0.014
 different+0.014
 big+0.013
Beck+0.013
Neg logit contributions
ment-0.028
isher-0.028
edly-0.028
anged-0.028
 thirst-0.027
Token having
Token ID1719
Feature activation-2.02e+00

Pos logit contributions
isher+0.022
edly+0.022
ment+0.022
anged+0.021
outed+0.021
Neg logit contributions
 Anne-0.010
 great-0.009
 different-0.009
 big-0.009
 neat-0.009
Token fun
Token ID1257
Feature activation-2.41e+00

Pos logit contributions
 Anne+0.023
Anne+0.022
Beck+0.021
 third+0.020
 out+0.019
Neg logit contributions
ment-0.057
 thirst-0.056
edly-0.055
isher-0.054
anged-0.053
Token.
Token ID13
Feature activation-1.02e+00

Pos logit contributions
 Anne+0.031
 great+0.029
Beck+0.028
Anne+0.028
 different+0.028
Neg logit contributions
ment-0.066
isher-0.066
edly-0.065
anged-0.064
 thirst-0.063
Token<|endoftext|>
Token ID50256
Feature activation+2.40e+00

Pos logit contributions
 Anne+0.010
 great+0.009
Beck+0.009
Anne+0.009
 different+0.009
Neg logit contributions
ment-0.031
isher-0.030
edly-0.030
anged-0.030
 thirst-0.030
TokenOnce
Token ID7454
Feature activation+5.48e-01

Pos logit contributions
ment+0.063
edly+0.061
isher+0.059
 thirst+0.059
anged+0.059
Neg logit contributions
 Anne-0.030
 neat-0.030
Beck-0.028
 great-0.028
 third-0.028
Token upon
Token ID2402
Feature activation+2.60e+00

Pos logit contributions
edly+0.010
isher+0.010
ment+0.010
anged+0.009
 thirst+0.009
Neg logit contributions
 Anne-0.013
 big-0.012
 different-0.012
 great-0.012
 neat-0.012
Token and
Token ID290
Feature activation+3.92e-01

Pos logit contributions
 Anne+0.046
Beck+0.044
Anne+0.043
 great+0.042
 third+0.041
Neg logit contributions
ment-0.098
edly-0.096
isher-0.095
anged-0.093
 thirst-0.092
Token have
Token ID423
Feature activation-1.58e+00

Pos logit contributions
ment+0.011
isher+0.011
edly+0.011
anged+0.011
 thirst+0.011
Neg logit contributions
 Anne-0.004
Beck-0.004
Anne-0.004
 great-0.004
 third-0.004
Token lots
Token ID6041
Feature activation-4.65e-01

Pos logit contributions
 Anne+0.024
Anne+0.024
Beck+0.023
Con+0.022
 calendar+0.021
Neg logit contributions
 thirst-0.036
ment-0.035
edly-0.033
isher-0.033
 habitat-0.031
Token of
Token ID286
Feature activation-2.54e+00

Pos logit contributions
 Anne+0.008
Beck+0.008
Anne+0.008
 great+0.008
 third+0.008
Neg logit contributions
ment-0.011
isher-0.010
edly-0.010
anged-0.010
 thirst-0.010
Token fun
Token ID1257
Feature activation-2.33e+00

Pos logit contributions
 Anne+0.039
Beck+0.038
Anne+0.037
 third+0.034
 Ashley+0.033
Neg logit contributions
ment-0.062
 thirst-0.061
isher-0.061
edly-0.058
anged-0.058
Token playing
Token ID2712
Feature activation-4.25e+00

Pos logit contributions
 Anne+0.031
Beck+0.029
Anne+0.029
 great+0.029
 third+0.028
Neg logit contributions
ment-0.063
isher-0.062
edly-0.061
anged-0.060
 thirst-0.060
Token together
Token ID1978
Feature activation-1.38e+00

Pos logit contributions
 Anne+0.061
Beck+0.059
Anne+0.058
 third+0.055
 great+0.052
Neg logit contributions
ment-0.108
edly-0.108
isher-0.106
 thirst-0.102
anged-0.097
Token.
Token ID13
Feature activation-9.17e-01

Pos logit contributions
 Anne+0.024
Beck+0.023
Anne+0.023
 great+0.021
 third+0.021
Neg logit contributions
edly-0.031
ment-0.031
isher-0.030
 thirst-0.029
anged-0.028
Token And
Token ID843
Feature activation+4.30e-01

Pos logit contributions
 Anne+0.010
 great+0.009
Beck+0.009
Anne+0.009
 different+0.008
Neg logit contributions
ment-0.027
isher-0.027
edly-0.027
anged-0.026
 thirst-0.026
Token they
Token ID484
Feature activation-2.41e+00

Pos logit contributions
ment+0.012
isher+0.012
edly+0.012
anged+0.012
 thirst+0.012
Neg logit contributions
 Anne-0.005
 great-0.005
 different-0.005
Beck-0.005
Anne-0.005
Token always
Token ID1464
Feature activation+1.93e+00

Pos logit contributions
 Anne+0.034
Anne+0.032
Beck+0.031
 third+0.030
 neat+0.030
Neg logit contributions
isher-0.061
ment-0.060
edly-0.060
 thirst-0.059
anged-0.058
Token day
Token ID1110
Feature activation-7.28e-01

Pos logit contributions
isher+0.045
ment+0.044
anged+0.044
edly+0.044
uc+0.041
Neg logit contributions
 Anne-0.030
 great-0.030
 big-0.030
 different-0.029
 third-0.028
Token,
Token ID11
Feature activation-3.82e-01

Pos logit contributions
Anne+0.009
 Anne+0.009
Beck+0.008
Alice+0.008
 neat+0.008
Neg logit contributions
ment-0.021
anged-0.020
 thirst-0.020
edly-0.019
isher-0.019
Token her
Token ID607
Feature activation+3.80e+00

Pos logit contributions
 Anne+0.004
 great+0.004
Beck+0.004
Anne+0.004
 different+0.004
Neg logit contributions
ment-0.011
isher-0.011
edly-0.011
anged-0.011
 thirst-0.011
Token mom
Token ID1995
Feature activation-9.07e-01

Pos logit contributions
ment+0.098
isher+0.097
edly+0.096
anged+0.094
 thirst+0.093
Neg logit contributions
 Anne-0.056
 great-0.052
Beck-0.052
Anne-0.051
 different-0.051
Token and
Token ID290
Feature activation+1.91e+00

Pos logit contributions
Beck+0.015
 great+0.015
 Anne+0.015
 third+0.015
Anne+0.015
Neg logit contributions
isher-0.019
ment-0.019
anged-0.019
edly-0.018
outed-0.018
Token dad
Token ID9955
Feature activation-2.80e+00

Pos logit contributions
ment+0.048
isher+0.048
edly+0.047
anged+0.046
 thirst+0.045
Neg logit contributions
 Anne-0.029
 great-0.027
Beck-0.027
Anne-0.027
 third-0.026
Token would
Token ID561
Feature activation-6.68e-01

Pos logit contributions
 Anne+0.050
Anne+0.049
Beck+0.049
 third+0.047
 great+0.046
Neg logit contributions
isher-0.058
anged-0.056
ment-0.055
 thirst-0.054
edly-0.053
Token organize
Token ID16481
Feature activation-5.05e-01

Pos logit contributions
 Anne+0.009
Anne+0.009
Beck+0.009
 great+0.008
 third+0.008
Neg logit contributions
ment-0.017
 thirst-0.017
isher-0.017
edly-0.016
anged-0.016
Token things
Token ID1243
Feature activation-2.38e+00

Pos logit contributions
Anne+0.005
Beck+0.005
 Anne+0.005
Alice+0.004
 third+0.004
Neg logit contributions
ment-0.015
isher-0.015
 thirst-0.014
anged-0.014
edly-0.014
Token around
Token ID1088
Feature activation-1.06e+00

Pos logit contributions
 Anne+0.032
Anne+0.030
Beck+0.030
 third+0.029
 great+0.028
Neg logit contributions
ment-0.064
isher-0.063
edly-0.063
anged-0.061
 thirst-0.060
Token the
Token ID262
Feature activation-3.19e-01

Pos logit contributions
Alice+0.009
Anne+0.009
 Anne+0.009
 Higher+0.008
 third+0.008
Neg logit contributions
ment-0.032
isher-0.031
anged-0.030
edly-0.030
 thirst-0.030
Token being
Token ID852
Feature activation-1.32e+00

Pos logit contributions
 Anne+0.002
Beck+0.002
Anne+0.002
 third+0.002
 great+0.001
Neg logit contributions
ment-0.007
edly-0.007
 thirst-0.007
anged-0.007
isher-0.007
Token on
Token ID319
Feature activation-1.92e+00

Pos logit contributions
 Anne+0.022
Anne+0.021
Beck+0.021
 third+0.021
 Higher+0.020
Neg logit contributions
isher-0.030
ment-0.029
 thirst-0.029
edly-0.028
ishing-0.027
Token the
Token ID262
Feature activation-6.56e-01

Pos logit contributions
Beck+0.022
Anne+0.022
 Anne+0.021
 third+0.018
 neat+0.017
Neg logit contributions
ment-0.054
 thirst-0.053
edly-0.053
isher-0.049
ised-0.048
Token roof
Token ID9753
Feature activation-2.70e-01

Pos logit contributions
 Anne+0.011
Anne+0.011
Beck+0.011
 June+0.010
 Alice+0.010
Neg logit contributions
 thirst-0.015
isher-0.015
edly-0.014
ment-0.013
ro-0.013
Token for
Token ID329
Feature activation-1.98e+00

Pos logit contributions
 Anne+0.003
 great+0.003
 different+0.003
 big+0.003
 neat+0.003
Neg logit contributions
ment-0.008
isher-0.008
edly-0.008
anged-0.007
 thirst-0.007
Token a
Token ID257
Feature activation-3.16e+00

Pos logit contributions
 Anne+0.016
Beck+0.014
Anne+0.014
 great+0.014
 neat+0.014
Neg logit contributions
ment-0.064
edly-0.063
isher-0.062
 thirst-0.061
anged-0.061
Token while
Token ID981
Feature activation+4.51e-01

Pos logit contributions
 Anne+0.061
Anne+0.060
Beck+0.058
 overcome+0.056
 out+0.055
Neg logit contributions
 thirst-0.060
edly-0.058
ment-0.056
ro-0.055
anged-0.055
Token,
Token ID11
Feature activation-6.68e-02

Pos logit contributions
ment+0.012
edly+0.012
isher+0.012
anged+0.012
 thirst+0.012
Neg logit contributions
 Anne-0.006
Beck-0.005
Anne-0.005
 great-0.005
 neat-0.005
Token until
Token ID1566
Feature activation-4.85e-01

Pos logit contributions
Beck+0.001
 Anne+0.001
Anne+0.001
 great+0.001
 third+0.001
Neg logit contributions
edly-0.002
 thirst-0.002
ment-0.002
isher-0.002
anged-0.002
Token they
Token ID484
Feature activation-5.53e-01

Pos logit contributions
 Anne+0.005
Anne+0.005
Beck+0.005
 great+0.005
 neat+0.005
Neg logit contributions
ment-0.014
isher-0.014
edly-0.014
 thirst-0.014
anged-0.014
Token heard
Token ID2982
Feature activation+3.67e-01

Pos logit contributions
 Anne+0.007
 great+0.007
Beck+0.006
 different+0.006
 big+0.006
Neg logit contributions
ment-0.016
isher-0.015
edly-0.015
anged-0.015
 thirst-0.015
Token we
Token ID356
Feature activation-3.88e-01

Pos logit contributions
isher+0.025
ment+0.025
edly+0.024
anged+0.024
 thirst+0.023
Neg logit contributions
 Anne-0.009
 great-0.008
 different-0.008
 big-0.008
 neat-0.008
Token have
Token ID423
Feature activation-1.81e+00

Pos logit contributions
 Anne+0.005
 great+0.004
Beck+0.004
Anne+0.004
 different+0.004
Neg logit contributions
ment-0.011
isher-0.011
edly-0.011
anged-0.010
 thirst-0.010
Token a
Token ID257
Feature activation-4.59e+00

Pos logit contributions
 Anne+0.023
Beck+0.022
Anne+0.022
 third+0.020
 neat+0.020
Neg logit contributions
ment-0.048
 thirst-0.048
edly-0.047
isher-0.046
anged-0.045
Token net
Token ID2010
Feature activation-1.77e+00

Pos logit contributions
 Anne+0.067
Anne+0.067
Beck+0.061
Her+0.060
Alice+0.058
Neg logit contributions
 thirst-0.114
isher-0.107
ment-0.098
edly-0.097
anged-0.097
Token and
Token ID290
Feature activation+9.57e-01

Pos logit contributions
 Anne+0.024
Beck+0.023
Anne+0.023
 great+0.022
 third+0.022
Neg logit contributions
ment-0.046
isher-0.046
edly-0.046
anged-0.044
 thirst-0.044
Token a
Token ID257
Feature activation-3.48e+00

Pos logit contributions
isher+0.029
ment+0.029
edly+0.028
anged+0.028
 thirst+0.027
Neg logit contributions
 Anne-0.010
 great-0.010
 big-0.009
 different-0.009
 third-0.009
Token stick
Token ID4859
Feature activation-2.23e+00

Pos logit contributions
 Anne+0.055
Anne+0.053
Beck+0.050
Alice+0.047
 Alice+0.047
Neg logit contributions
 thirst-0.084
isher-0.081
ment-0.080
edly-0.077
anged-0.074
Token.
Token ID13
Feature activation-1.61e+00

Pos logit contributions
 Anne+0.028
 great+0.026
 different+0.026
Beck+0.026
Anne+0.026
Neg logit contributions
ment-0.061
isher-0.061
edly-0.060
anged-0.059
 thirst-0.059
Token We
Token ID775
Feature activation-2.10e-01

Pos logit contributions
 Anne+0.028
 great+0.026
Beck+0.026
Anne+0.026
 different+0.026
Neg logit contributions
ment-0.037
isher-0.037
edly-0.036
anged-0.036
 thirst-0.035
Token can
Token ID460
Feature activation-1.92e+00

Pos logit contributions
 Anne+0.003
 great+0.003
Beck+0.003
Anne+0.002
 different+0.002
Neg logit contributions
ment-0.006
isher-0.006
edly-0.006
anged-0.006
 thirst-0.005
Token catch
Token ID4929
Feature activation-1.63e+00

Pos logit contributions
 Anne+0.026
Anne+0.025
Beck+0.025
 great+0.023
 neat+0.023
Neg logit contributions
isher-0.049
 thirst-0.048
edly-0.048
ment-0.048
anged-0.047
Token It
Token ID632
Feature activation+1.13e+00

Pos logit contributions
 Anne+0.005
 great+0.005
Beck+0.005
 different+0.005
Anne+0.005
Neg logit contributions
ment-0.011
isher-0.011
edly-0.010
anged-0.010
 thirst-0.010
Token has
Token ID468
Feature activation+2.69e-02

Pos logit contributions
ment+0.034
isher+0.033
edly+0.033
anged+0.032
 thirst+0.032
Neg logit contributions
 Anne-0.012
 different-0.011
 big-0.011
 great-0.011
 third-0.011
Token big
Token ID1263
Feature activation-2.25e+00

Pos logit contributions
ment+0.001
edly+0.001
isher+0.001
anged+0.001
 thirst+0.001
Neg logit contributions
 Anne-0.000
Beck-0.000
Anne-0.000
 great-0.000
 neat-0.000
Token wheels
Token ID13666
Feature activation-3.32e-01

Pos logit contributions
 Anne+0.044
Beck+0.043
Anne+0.042
 great+0.041
 third+0.040
Neg logit contributions
ment-0.048
isher-0.046
anged-0.046
edly-0.045
 thirst-0.044
Token and
Token ID290
Feature activation+2.28e+00

Pos logit contributions
 Anne+0.003
 different+0.003
 big+0.003
 great+0.003
 out+0.003
Neg logit contributions
ment-0.010
isher-0.010
edly-0.010
 thirst-0.010
anged-0.009
Token a
Token ID257
Feature activation-2.81e+00

Pos logit contributions
isher+0.071
ment+0.069
edly+0.069
anged+0.068
 thirst+0.068
Neg logit contributions
 big-0.022
 Anne-0.022
 different-0.021
 great-0.021
 third-0.020
Token big
Token ID1263
Feature activation-2.20e+00

Pos logit contributions
 Anne+0.038
Anne+0.035
Beck+0.035
 great+0.032
 Alice+0.032
Neg logit contributions
ment-0.075
isher-0.073
edly-0.073
anged-0.072
 thirst-0.071
Token horn
Token ID12718
Feature activation+1.49e+00

Pos logit contributions
 Anne+0.036
Beck+0.035
Anne+0.034
 great+0.033
 different+0.033
Neg logit contributions
ment-0.052
isher-0.052
edly-0.051
anged-0.050
 thirst-0.050
Token.
Token ID13
Feature activation-6.85e-01

Pos logit contributions
isher+0.043
ment+0.043
edly+0.042
 thirst+0.041
anged+0.041
Neg logit contributions
 Anne-0.017
 great-0.016
 big-0.016
 different-0.016
 out-0.016
Token Lily
Token ID20037
Feature activation-1.43e+00

Pos logit contributions
 Anne+0.009
 great+0.008
Anne+0.008
Beck+0.008
 different+0.008
Neg logit contributions
ment-0.019
isher-0.019
edly-0.019
anged-0.018
 thirst-0.018
Token and
Token ID290
Feature activation+1.96e+00

Pos logit contributions
Beck+0.025
 Anne+0.024
 great+0.024
Anne+0.023
 third+0.023
Neg logit contributions
edly-0.031
anged-0.031
isher-0.030
ment-0.030
outed-0.030