TOP ACTIVATIONS
MAX = 6.388

so big and their eyes spark led in admiration . Jenny
with wonder and her eyes spark led with excitement . She
inside and Sally 's eyes spark led as she looked at
blue eyes and her curly blonde hair . She loved to
hair , and her eyes spark led like stars . One
blue and shiny and it spark led in the sun .
ing off her long , blonde hair . She
was smiling and her eyes spark led . "
bright blue eyes and long blonde hair .
it , and their eyes spark led . They wanted to
was bright and shiny and spark led in the sun .
was red and white and spark led like a star .
her pink dress and her spark ly shoes every day .
years old and had long blonde hair . She loved to
she had blue eyes and blonde hair . One
’s eyes spark led . She said ,
small and shiny and it spark led in the sun .
only three , with curly blonde hair and she loved to
on a hill and it spark led in the sunlight .
box full of colorful and spark ly clothes and every morning

MIN ACTIVATIONS
MIN = -4.973

and sad . They told Sam and Tim that they were
the shell . They told Tom and Lily that it was
the car . She told Tom to stay with Grand ma
the branch . He told Tom and Lily to go home
both tired . Lucy told Sam that they should take the
what happened . She told Tim my , " I told
very disappointed . She told Lisa that if she wanted a
nearby garage . She told Josh that she thought the pie
kitchen drawer . She told Tom , " I think I
the vet . They told Lily and Max to stay with
not happy . She told Tom and Lily to stop fighting
very upset . She told Mary that she needs to learn
was happening . She told Sally to give the toy back
family history . She told Sarah about all the brave and
her side . She told Jane it was ok , and
a suit . He told Jill it was a special dinosaur
She was unhappy and told Jill she had to clean it
and sad . She tells Tom and Lily to sit on
was wrong . Spot told Sam about his lost ball .
to help . She told Sarah that it wasn 't fair

INTERVAL 3.548 - 6.388
CONTAINS 0.087%

slide down . Her eyes spark led with excitement and she
a white fur and big brown eyes . The
me , Lily !" he cried . " I can 't
. They are safer and prett ier ," Mom says .
listened to his mom and worn pants . Jack 's mom

INTERVAL 0.708 - 3.548
CONTAINS 25.639%

My t eddy is stuck ! What
a tasty treat . She smiled and thanked him . The
. She loved eating rice for dinner . One day ,
he trusted his dad . His dad went inside and came
a red car that could zoom very fast . They called

INTERVAL -2.132 - 0.708
CONTAINS 71.824%

to wake up early . It wanted to see what the
, in a far away corner , he saw a big
. She was more polite and gentle with her humans and
studied , the harder it was to concentrate , but Jacob
pepper . <|endoftext|> Once upon a time , there was a

INTERVAL -4.973 - -2.132
CONTAINS 2.451%

Anna cried , running to Ben . Ben heard
In the end , Rex decided not to permit the visitor
house . Daddy , Mom my , the little boy and
. The dog was sad , but then suddenly it heard
From that day on , Lily always remembered to be safe
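
The four INTERVAL ranges above appear to split the span from MIN to MAX into equal-width bins. That is an inference from the printed bounds, not something the export states. A minimal sketch under that assumption is below; `interval_edges` and `shares` are hypothetical names introduced for illustration, and the numbers are copied from the MIN, MAX, and CONTAINS lines.

```python
def interval_edges(lo, hi, n_bins=4):
    """Return the n_bins + 1 edges of equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# MIN and MAX as printed in the export (already rounded to 3 decimals).
edges = interval_edges(-4.973, 6.388)

# CONTAINS percentages as printed, ordered from the lowest bin to the highest.
shares = [2.451, 71.824, 25.639, 0.087]

for (left, right), share in zip(zip(edges, edges[1:]), shares):
    print(f"{left:7.3f} to {right:7.3f}: {share:6.3f}% of tokens")
```

The computed inner edges agree with the printed bounds to within about 0.001. The small gap is what one would expect if the displayed MIN and MAX were rounded before being printed. The shares sum to roughly 100%, with the bulk of tokens in the bin around zero and only 0.087% in the top bin.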
Token so
Token ID523
Feature activation+1.53e+00

Pos logit contributions
ped+0.013
led+0.012
kin+0.012
uc+0.012
ked+0.011
Neg logit contributions
 Elizabeth-0.012
 Caroline-0.012
 university-0.012
 Mary-0.012
abella-0.012
Token big
Token ID1263
Feature activation+1.78e-01

Pos logit contributions
kin+0.036
ped+0.036
led+0.034
uc+0.034
bit+0.032
Neg logit contributions
 Elizabeth-0.036
 Caroline-0.036
abella-0.035
 university-0.035
 Mary-0.035
Token and
Token ID290
Feature activation+6.01e-01

Pos logit contributions
kin+0.005
ped+0.005
uc+0.005
led+0.004
aces+0.004
Neg logit contributions
 Elizabeth-0.004
 Caroline-0.004
 Mary-0.004
 university-0.003
 Lib-0.003
Token their
Token ID511
Feature activation+2.02e+00

Pos logit contributions
kin+0.014
ped+0.013
uc+0.013
kins+0.013
led+0.013
Neg logit contributions
 Elizabeth-0.014
 Caroline-0.014
 Lib-0.014
 university-0.014
 Mary-0.014
Token eyes
Token ID2951
Feature activation+3.66e+00

Pos logit contributions
kin+0.042
ped+0.040
uc+0.039
kins+0.037
led+0.037
Neg logit contributions
 Mary-0.056
 Elizabeth-0.055
 Caroline-0.055
 university-0.054
 Alice-0.054
Token spark
Token ID9009
Feature activation+6.39e+00

Pos logit contributions
kin+0.097
uc+0.093
 elbow+0.092
ags+0.090
ped+0.090
Neg logit contributions
 Mary-0.075
 �-0.075
 Karen-0.073
 Lib-0.072
 Alice-0.072
Tokenled
Token ID992
Feature activation+3.10e+00

Pos logit contributions
ags+0.086
arl+0.084
kin+0.083
 elbow+0.082
ands+0.082
Neg logit contributions
abella-0.194
 Mary-0.192
 Karen-0.192
 about-0.190
 Alice-0.190
Token in
Token ID287
Feature activation+6.55e-01

Pos logit contributions
kin+0.079
uc+0.075
ped+0.075
 elbow+0.072
aces+0.070
Neg logit contributions
 Mary-0.070
 �-0.069
 Caroline-0.069
 Elizabeth-0.069
 Alice-0.068
Token admiration
Token ID28506
Feature activation+4.27e+00

Pos logit contributions
kin+0.020
uc+0.019
ped+0.019
aces+0.019
 elbow+0.019
Neg logit contributions
 Mary-0.012
 Caroline-0.011
 Elizabeth-0.011
 Alice-0.011
 Lib-0.011
Token.
Token ID13
Feature activation+5.83e-01

Pos logit contributions
kin+0.103
ped+0.099
uc+0.098
 elbow+0.095
bit+0.093
Neg logit contributions
 Mary-0.102
 Caroline-0.100
 �-0.100
 Alice-0.099
 Elizabeth-0.099
Token Jenny
Token ID28437
Feature activation-7.31e-01

Pos logit contributions
kin+0.017
uc+0.016
ped+0.016
 shorts+0.016
agged+0.015
Neg logit contributions
 Elizabeth-0.011
 Karen-0.010
 Lib-0.010
 university-0.010
 Caroline-0.010
Token with
Token ID351
Feature activation+7.79e-01

Pos logit contributions
kin+0.002
aces+0.002
ped+0.002
kins+0.002
uc+0.002
Neg logit contributions
abella-0.003
Kat-0.003
 Elizabeth-0.003
 Karen-0.003
 Caroline-0.003
Token wonder
Token ID4240
Feature activation+1.32e+00

Pos logit contributions
kin+0.026
uc+0.025
led+0.024
kins+0.023
agged+0.023
Neg logit contributions
 Mary-0.014
 �-0.014
 Caroline-0.014
 Tammy-0.013
 certain-0.013
Token and
Token ID290
Feature activation+1.69e+00

Pos logit contributions
kin+0.035
ped+0.033
uc+0.032
 elbow+0.031
bit+0.031
Neg logit contributions
 Mary-0.029
 Elizabeth-0.029
 Caroline-0.029
 �-0.028
 Alice-0.028
Token her
Token ID607
Feature activation+3.06e+00

Pos logit contributions
kin+0.048
uc+0.046
ped+0.044
agged+0.044
bit+0.044
Neg logit contributions
 Mary-0.037
 Caroline-0.035
 Elizabeth-0.035
 �-0.035
 Alice-0.035
Token eyes
Token ID2951
Feature activation+3.23e+00

Pos logit contributions
kin+0.066
ped+0.065
uc+0.062
bit+0.061
led+0.060
Neg logit contributions
 Caroline-0.080
 Elizabeth-0.079
 Mary-0.079
 university-0.079
 Alice-0.077
Token spark
Token ID9009
Feature activation+6.35e+00

Pos logit contributions
kin+0.081
ped+0.077
uc+0.077
 elbow+0.075
bit+0.073
Neg logit contributions
 �-0.072
 Mary-0.072
 Caroline-0.070
 Lib-0.070
 Alice-0.069
Tokenled
Token ID992
Feature activation+2.98e+00

Pos logit contributions
kin+0.095
arl+0.092
 elbow+0.092
ands+0.091
 fiber+0.091
Neg logit contributions
 about-0.187
 closely-0.177
 Paula-0.172
abella-0.171
 Mary-0.171
Token with
Token ID351
Feature activation+1.40e+00

Pos logit contributions
kin+0.079
uc+0.078
ped+0.076
 elbow+0.074
bit+0.072
Neg logit contributions
 �-0.064
 Mary-0.064
 Caroline-0.061
 Lib-0.061
 Alice-0.061
Token excitement
Token ID14067
Feature activation+3.63e+00

Pos logit contributions
kin+0.037
uc+0.035
ped+0.033
kins+0.033
led+0.032
Neg logit contributions
 Mary-0.033
 �-0.033
 Caroline-0.032
 Alice-0.032
 certain-0.031
Token.
Token ID13
Feature activation+3.23e-01

Pos logit contributions
kin+0.096
ped+0.092
uc+0.090
 elbow+0.089
agged+0.089
Neg logit contributions
 Mary-0.079
 �-0.077
 Alice-0.076
 Caroline-0.075
 university-0.075
Token She
Token ID1375
Feature activation+1.61e+00

Pos logit contributions
aces+0.008
ped+0.008
 fingers+0.007
 paws+0.007
 cheeks+0.007
Neg logit contributions
abella-0.007
str-0.007
Kat-0.007
 certain-0.007
Alice-0.007
Token inside
Token ID2641
Feature activation-1.58e+00

Pos logit contributions
Kat+0.051
 Elizabeth+0.051
Mary+0.051
 Heidi+0.050
str+0.050
Neg logit contributions
uc-0.033
ags-0.031
ked-0.031
ped-0.031
led-0.030
Token and
Token ID290
Feature activation+7.42e-01

Pos logit contributions
abella+0.039
 Elizabeth+0.039
 Caroline+0.039
 Lib+0.037
 Carol+0.037
Neg logit contributions
kin-0.034
uc-0.034
ped-0.032
kins-0.031
 fingers-0.030
Token Sally
Token ID25737
Feature activation-1.24e+00

Pos logit contributions
kin+0.015
led+0.015
ped+0.015
uc+0.015
ked+0.014
Neg logit contributions
 Elizabeth-0.019
 Caroline-0.019
 university-0.019
 �-0.018
 Carol-0.018
Token's
Token ID338
Feature activation+1.66e+00

Pos logit contributions
 university+0.031
 Caroline+0.031
 Elizabeth+0.031
 Tammy+0.030
 Mary+0.030
Neg logit contributions
uc-0.027
ped-0.027
led-0.027
kin-0.026
ked-0.025
Token eyes
Token ID2951
Feature activation+3.31e+00

Pos logit contributions
kin+0.044
ped+0.044
uc+0.041
led+0.041
bit+0.041
Neg logit contributions
 university-0.036
 Mary-0.035
 Caroline-0.035
 Alice-0.035
 Elizabeth-0.035
Token spark
Token ID9009
Feature activation+6.31e+00

Pos logit contributions
kin+0.088
ped+0.082
uc+0.082
 elbow+0.082
bit+0.080
Neg logit contributions
 �-0.069
 Mary-0.068
 Alice-0.068
 Lib-0.067
 Karen-0.067
Tokenled
Token ID992
Feature activation+2.88e+00

Pos logit contributions
kin+0.102
 fiber+0.095
 elbow+0.094
ases+0.092
ands+0.091
Neg logit contributions
 about-0.182
 Paula-0.178
 closely-0.175
abella-0.174
land-0.173
Token as
Token ID355
Feature activation-2.16e-01

Pos logit contributions
kin+0.072
ped+0.069
uc+0.069
 elbow+0.066
aces+0.064
Neg logit contributions
 Mary-0.066
 Caroline-0.065
 Elizabeth-0.065
 �-0.065
 Alice-0.064
Token she
Token ID673
Feature activation+2.58e+00

Pos logit contributions
 Elizabeth+0.004
 Caroline+0.003
 university+0.003
 Lib+0.003
 Mary+0.003
Neg logit contributions
kin-0.007
ped-0.007
uc-0.006
kins-0.006
led-0.006
Token looked
Token ID3114
Feature activation+1.15e+00

Pos logit contributions
kin+0.073
ags+0.071
ped+0.069
bit+0.069
aces+0.067
Neg logit contributions
abella-0.051
 Mary-0.048
 Caroline-0.048
 Carol-0.047
 Alice-0.047
Token at
Token ID379
Feature activation-3.32e-01

Pos logit contributions
kin+0.030
ped+0.029
uc+0.028
led+0.027
aces+0.026
Neg logit contributions
 Elizabeth-0.026
 Caroline-0.026
 Mary-0.025
 Karen-0.025
 Lib-0.025
Token blue
Token ID4171
Feature activation+3.44e+00

Pos logit contributions
kin+0.050
ped+0.049
uc+0.048
led+0.047
icked+0.045
Neg logit contributions
 Mary-0.059
 university-0.058
 Caroline-0.058
 Elizabeth-0.057
 Alice-0.057
Token eyes
Token ID2951
Feature activation+1.34e+00

Pos logit contributions
led+0.076
kin+0.075
wy+0.072
ands+0.072
ped+0.072
Neg logit contributions
 Mary-0.086
 �-0.086
 university-0.085
unt-0.085
 Caroline-0.082
Token and
Token ID290
Feature activation+5.69e-01

Pos logit contributions
kin+0.032
ped+0.031
uc+0.030
 elbow+0.029
kins+0.029
Neg logit contributions
 Elizabeth-0.032
 Caroline-0.032
 Mary-0.032
 Lib-0.031
 Alice-0.031
Token her
Token ID607
Feature activation+3.04e+00

Pos logit contributions
kin+0.016
uc+0.016
ater+0.016
agged+0.016
ands+0.015
Neg logit contributions
 Mary-0.012
 Alice-0.011
 Elizabeth-0.011
 Caroline-0.011
 �-0.011
Token curly
Token ID45731
Feature activation+5.54e+00

Pos logit contributions
kin+0.073
uc+0.072
ands+0.072
ped+0.071
icked+0.069
Neg logit contributions
 Mary-0.072
 Caroline-0.071
 �-0.070
 Alice-0.069
 Elizabeth-0.069
Token blonde
Token ID21541
Feature activation+6.26e+00

Pos logit contributions
kin+0.107
ases+0.104
ands+0.103
kins+0.102
uc+0.101
Neg logit contributions
 Mary-0.152
 Alice-0.148
 Caroline-0.146
 �-0.146
 Karen-0.146
Token hair
Token ID4190
Feature activation+3.44e+00

Pos logit contributions
kin+0.129
ands+0.122
kins+0.120
uc+0.120
ases+0.117
Neg logit contributions
 Mary-0.162
 Lib-0.157
 Alice-0.157
 Caroline-0.156
 �-0.153
Token.
Token ID13
Feature activation-7.52e-01

Pos logit contributions
kin+0.092
ped+0.088
bit+0.088
uc+0.087
ags+0.087
Neg logit contributions
 Lib-0.066
 Mary-0.065
 �-0.064
 Alice-0.064
 Karen-0.063
Token She
Token ID1375
Feature activation+7.31e-01

Pos logit contributions
abella+0.016
str+0.014
 Elizabeth+0.014
 certain+0.014
Kat+0.014
Neg logit contributions
aces-0.018
ags-0.018
agged-0.017
 fingers-0.017
 paws-0.017
Token loved
Token ID6151
Feature activation-1.72e+00

Pos logit contributions
kin+0.019
ped+0.018
uc+0.018
 elbow+0.017
kins+0.017
Neg logit contributions
 Elizabeth-0.016
 Caroline-0.016
 Mary-0.016
 university-0.016
 Lib-0.016
Token to
Token ID284
Feature activation+1.37e-01

Pos logit contributions
 Caroline+0.034
 Elizabeth+0.034
abella+0.033
 Lib+0.032
 Mary+0.032
Neg logit contributions
ped-0.047
uc-0.046
kin-0.045
 elbow-0.044
aces-0.042
Token hair
Token ID4190
Feature activation+8.56e-01

Pos logit contributions
kin+0.075
ped+0.071
icked+0.068
agged+0.066
uc+0.065
Neg logit contributions
 university-0.099
 Mary-0.097
 bakery-0.096
 Caroline-0.095
 Karen-0.095
Token,
Token ID11
Feature activation-1.43e+00

Pos logit contributions
kin+0.024
ped+0.023
uc+0.023
 elbow+0.022
kins+0.022
Neg logit contributions
 Mary-0.017
 Elizabeth-0.017
 Caroline-0.017
 Lib-0.016
 university-0.016
Token and
Token ID290
Feature activation-3.27e-01

Pos logit contributions
 Elizabeth+0.030
 Caroline+0.029
 Mary+0.029
 university+0.029
 Lib+0.029
Neg logit contributions
kin-0.039
ped-0.038
uc-0.037
 elbow-0.035
aces-0.035
Token her
Token ID607
Feature activation+2.53e+00

Pos logit contributions
 Mary+0.006
 Alice+0.006
 Elizabeth+0.006
 Caroline+0.006
 �+0.005
Neg logit contributions
kin-0.010
uc-0.010
ped-0.010
agged-0.009
bit-0.009
Token eyes
Token ID2951
Feature activation+2.34e+00

Pos logit contributions
ped+0.059
kin+0.059
uc+0.057
led+0.056
icked+0.056
Neg logit contributions
 Mary-0.060
 university-0.060
 Caroline-0.059
 Elizabeth-0.059
 Alice-0.058
Token spark
Token ID9009
Feature activation+6.08e+00

Pos logit contributions
kin+0.062
ped+0.058
 elbow+0.058
uc+0.057
ocol+0.056
Neg logit contributions
 Paula-0.047
 �-0.047
 Lib-0.046
 Karen-0.046
 Mary-0.046
Tokenled
Token ID992
Feature activation+2.56e+00

Pos logit contributions
kin+0.108
 fiber+0.107
ocol+0.104
 elbow+0.101
 Deal+0.097
Neg logit contributions
 about-0.169
land-0.161
 Paula-0.155
 beyond-0.152
 closely-0.151
Token like
Token ID588
Feature activation-1.52e-01

Pos logit contributions
kin+0.069
uc+0.067
ped+0.066
 elbow+0.065
bit+0.063
Neg logit contributions
 Mary-0.054
 �-0.053
 university-0.052
 Caroline-0.052
 Lib-0.052
Token stars
Token ID5788
Feature activation+2.15e+00

Pos logit contributions
 Elizabeth+0.003
 Mary+0.003
 Caroline+0.003
 Lib+0.003
 Alice+0.003
Neg logit contributions
kin-0.004
ped-0.004
uc-0.004
 elbow-0.004
kins-0.004
Token.
Token ID13
Feature activation-4.88e-01

Pos logit contributions
kin+0.056
uc+0.054
ped+0.054
 elbow+0.052
agged+0.051
Neg logit contributions
 Mary-0.045
 Lib-0.044
 Elizabeth-0.044
 Alice-0.044
 Caroline-0.044
Token One
Token ID1881
Feature activation-3.02e-01

Pos logit contributions
 Elizabeth+0.009
 Caroline+0.008
 university+0.008
 Lib+0.008
 Alice+0.008
Neg logit contributions
kin-0.014
ped-0.013
uc-0.013
aces-0.013
 toes-0.013
Token blue
Token ID4171
Feature activation+1.83e+00

Pos logit contributions
 Elizabeth+0.010
 Mary+0.010
 Caroline+0.010
 Lib+0.010
 Alice+0.010
Neg logit contributions
kin-0.013
ped-0.013
uc-0.013
 elbow-0.012
kins-0.012
Token and
Token ID290
Feature activation-4.81e-01

Pos logit contributions
kin+0.057
ped+0.055
uc+0.054
 elbow+0.053
kins+0.053
Neg logit contributions
 Mary-0.032
 �-0.030
 Caroline-0.030
 Lib-0.030
 Elizabeth-0.030
Token shiny
Token ID22441
Feature activation+4.05e+00

Pos logit contributions
 Mary+0.011
 Elizabeth+0.011
 Caroline+0.011
 Alice+0.011
 Lib+0.011
Neg logit contributions
kin-0.012
ped-0.012
uc-0.011
 elbow-0.011
aces-0.011
Token and
Token ID290
Feature activation-3.70e-01

Pos logit contributions
kin+0.105
uc+0.102
ped+0.102
 elbow+0.096
agged+0.095
Neg logit contributions
 Mary-0.088
 �-0.087
 Lib-0.086
 Caroline-0.086
 Alice-0.086
Token it
Token ID340
Feature activation-8.76e-01

Pos logit contributions
 Elizabeth+0.009
 Caroline+0.008
 university+0.008
 Mary+0.008
abella+0.008
Neg logit contributions
kin-0.009
ped-0.009
uc-0.008
kins-0.008
led-0.008
Token spark
Token ID9009
Feature activation+5.96e+00

Pos logit contributions
 Elizabeth+0.020
 Caroline+0.019
 Mary+0.019
 university+0.019
 Lib+0.019
Neg logit contributions
kin-0.023
ped-0.022
uc-0.021
led-0.020
kins-0.020
Tokenled
Token ID992
Feature activation+2.89e+00

Pos logit contributions
 elbow+0.073
kin+0.072
 elbows+0.069
ocol+0.069
ags+0.068
Neg logit contributions
abella-0.196
 about-0.193
 Mary-0.192
 Alice-0.190
 Karen-0.189
Token in
Token ID287
Feature activation+8.63e-01

Pos logit contributions
kin+0.074
uc+0.071
ped+0.071
 elbow+0.068
kins+0.066
Neg logit contributions
 Mary-0.065
 Caroline-0.064
 �-0.063
 Lib-0.063
 Elizabeth-0.063
Token the
Token ID262
Feature activation+1.58e+00

Pos logit contributions
kin+0.028
uc+0.027
ped+0.027
 elbow+0.026
aces+0.026
Neg logit contributions
 Mary-0.014
 Caroline-0.014
 Alice-0.013
 university-0.013
 Elizabeth-0.013
Token sun
Token ID4252
Feature activation+1.70e+00

Pos logit contributions
kin+0.035
uc+0.034
ped+0.033
 elbow+0.032
icked+0.031
Neg logit contributions
 �-0.041
 university-0.040
 Mary-0.040
 certain-0.039
 Caroline-0.039
Token.
Token ID13
Feature activation+2.61e-01

Pos logit contributions
kin+0.041
ped+0.040
uc+0.039
 elbow+0.038
agged+0.037
Neg logit contributions
 Mary-0.040
 Elizabeth-0.039
 Caroline-0.039
 �-0.039
 Alice-0.039
Tokening
Token ID278
Feature activation+1.26e+00

Pos logit contributions
kin+0.054
 elbow+0.050
ped+0.050
uc+0.050
kins+0.048
Neg logit contributions
 Mary-0.033
 �-0.032
 Caroline-0.031
 Elizabeth-0.031
 Lib-0.030
Token off
Token ID572
Feature activation+2.37e+00

Pos logit contributions
kin+0.033
ped+0.032
uc+0.031
 elbow+0.030
kins+0.030
Neg logit contributions
 Elizabeth-0.027
 Mary-0.027
 Caroline-0.027
 Alice-0.027
 university-0.027
Token her
Token ID607
Feature activation+2.28e+00

Pos logit contributions
kin+0.063
ped+0.061
uc+0.060
 elbow+0.057
kins+0.057
Neg logit contributions
 Elizabeth-0.051
 Mary-0.050
 Caroline-0.050
 Alice-0.049
 Lib-0.049
Token long
Token ID890
Feature activation+2.49e+00

Pos logit contributions
kin+0.038
ped+0.038
uc+0.037
icked+0.036
agged+0.034
Neg logit contributions
 university-0.070
 �-0.070
 Mary-0.069
 Caroline-0.069
 Alice-0.068
Token,
Token ID11
Feature activation-1.39e+00

Pos logit contributions
kin+0.061
uc+0.059
ped+0.059
icked+0.056
kins+0.056
Neg logit contributions
 Mary-0.058
 Caroline-0.058
 university-0.058
 Elizabeth-0.058
 �-0.058
Token blonde
Token ID21541
Feature activation+5.89e+00

Pos logit contributions
 Elizabeth+0.035
 Caroline+0.035
abella+0.035
 Mary+0.034
 Lib+0.034
Neg logit contributions
kin-0.031
ped-0.030
uc-0.029
 elbow-0.028
led-0.028
Token hair
Token ID4190
Feature activation+3.17e+00

Pos logit contributions
kin+0.123
kins+0.113
uc+0.110
icked+0.108
ped+0.108
Neg logit contributions
 Mary-0.162
 Alice-0.156
 university-0.155
 Caroline-0.155
 Lib-0.153
Token.
Token ID13
Feature activation-7.55e-01

Pos logit contributions
kin+0.085
ped+0.083
uc+0.081
agged+0.080
ags+0.079
Neg logit contributions
 Mary-0.062
 �-0.061
 Caroline-0.060
 university-0.060
 Alice-0.060
Token\n
Token ID198
Feature activation-8.82e-01

Pos logit contributions
 Elizabeth+0.015
 Caroline+0.015
abella+0.014
 Lib+0.014
 foreign+0.014
Neg logit contributions
aces-0.018
kin-0.018
ags-0.018
agged-0.018
 shorts-0.018
Token\n
Token ID198
Feature activation+1.03e+00

Pos logit contributions
 Elizabeth+0.021
 Caroline+0.021
 Lib+0.021
 university+0.021
 bakery+0.020
Neg logit contributions
kin-0.017
agged-0.015
uc-0.015
ped-0.015
ags-0.014
TokenShe
Token ID3347
Feature activation+1.25e+00

Pos logit contributions
kin+0.025
ater+0.022
ands+0.022
agged+0.022
 beads+0.022
Neg logit contributions
 Lib-0.023
 Mary-0.023
 Elizabeth-0.022
 Amelia-0.022
 Caroline-0.022
Token was
Token ID373
Feature activation-6.05e-01

Pos logit contributions
ped+0.041
kin+0.040
uc+0.038
led+0.038
ked+0.037
Neg logit contributions
 university-0.045
 Lib-0.043
 Elizabeth-0.042
 Caroline-0.042
abella-0.042
Token smiling
Token ID16755
Feature activation+2.65e+00

Pos logit contributions
abella+0.014
 university+0.013
unt+0.013
 Elizabeth+0.013
 Lib+0.013
Neg logit contributions
ped-0.014
kin-0.014
 lips-0.013
ps-0.013
ked-0.013
Token and
Token ID290
Feature activation+1.16e+00

Pos logit contributions
kin+0.071
ped+0.068
uc+0.067
 elbow+0.064
kins+0.063
Neg logit contributions
 Mary-0.057
 Caroline-0.056
 Elizabeth-0.056
 Alice-0.055
 �-0.055
Token her
Token ID607
Feature activation+2.20e+00

Pos logit contributions
kin+0.030
ped+0.030
uc+0.029
 elbow+0.028
kins+0.027
Neg logit contributions
 Elizabeth-0.025
 Caroline-0.025
 Mary-0.024
 university-0.024
 Lib-0.024
Token eyes
Token ID2951
Feature activation+3.19e+00

Pos logit contributions
kin+0.045
uc+0.043
ped+0.043
led+0.040
agged+0.040
Neg logit contributions
 Mary-0.060
 Elizabeth-0.060
 Caroline-0.060
 Alice-0.058
 university-0.058
Token spark
Token ID9009
Feature activation+5.87e+00

Pos logit contributions
kin+0.081
uc+0.076
ped+0.076
 elbow+0.074
kins+0.073
Neg logit contributions
 Mary-0.070
 �-0.070
 Lib-0.069
 Alice-0.069
 Caroline-0.069
Tokenled
Token ID992
Feature activation+2.90e+00

Pos logit contributions
kin+0.097
ands+0.089
 fiber+0.086
 elbow+0.082
ags+0.077
Neg logit contributions
 about-0.172
 Alice-0.166
 Mary-0.166
 Karen-0.165
 Caroline-0.165
Token.
Token ID13
Feature activation+1.28e-01

Pos logit contributions
kin+0.073
uc+0.069
ped+0.069
 elbow+0.066
kins+0.065
Neg logit contributions
 Mary-0.067
 Caroline-0.066
 �-0.066
 Elizabeth-0.066
 Lib-0.065
Token\n
Token ID198
Feature activation+2.68e-02

Pos logit contributions
ped+0.003
 lips+0.003
ked+0.003
 paws+0.003
 hands+0.003
Neg logit contributions
abella-0.003
 university-0.003
 certain-0.003
 cinema-0.003
 divorce-0.003
Token\n
Token ID198
Feature activation+8.89e-01

Pos logit contributions
ped+0.001
kin+0.001
agged+0.001
uc+0.001
kins+0.001
Neg logit contributions
 university-0.001
abella-0.001
 Caroline-0.001
 meeting-0.001
 Lib-0.001
Token"
Token ID1
Feature activation+4.82e-01

Pos logit contributions
kin+0.020
ped+0.019
uc+0.019
kins+0.018
led+0.017
Neg logit contributions
abella-0.021
 university-0.021
 Lib-0.021
 Mary-0.021
 Caroline-0.021
Token bright
Token ID6016
Feature activation+2.84e+00

Pos logit contributions
kin+0.015
ped+0.015
aces+0.014
 elbow+0.014
uc+0.014
Neg logit contributions
 Elizabeth-0.018
 Caroline-0.018
 �-0.018
abella-0.018
 Lib-0.018
Token blue
Token ID4171
Feature activation+3.07e+00

Pos logit contributions
kin+0.073
uc+0.064
ped+0.063
ked+0.063
kins+0.061
Neg logit contributions
 Mary-0.073
 university-0.070
 Karen-0.069
 Alice-0.069
 �-0.068
Token eyes
Token ID2951
Feature activation+1.54e+00

Pos logit contributions
kin+0.076
ped+0.067
kins+0.063
led+0.063
wy+0.059
Neg logit contributions
 university-0.081
 bakery-0.080
 Mary-0.079
 �-0.079
 Karen-0.075
Token and
Token ID290
Feature activation+7.93e-01

Pos logit contributions
kin+0.029
ped+0.028
uc+0.027
aces+0.026
led+0.025
Neg logit contributions
 Elizabeth-0.044
 Caroline-0.044
 Mary-0.043
 university-0.043
abella-0.043
Token long
Token ID890
Feature activation+2.52e+00

Pos logit contributions
kin+0.022
uc+0.022
ped+0.021
agged+0.021
ked+0.021
Neg logit contributions
 Mary-0.017
 Alice-0.016
 Elizabeth-0.016
 Caroline-0.016
 �-0.016
Token blonde
Token ID21541
Feature activation+5.87e+00

Pos logit contributions
kin+0.054
uc+0.048
ped+0.045
bit+0.045
kins+0.044
Neg logit contributions
 �-0.069
 Mary-0.069
 Alice-0.067
 university-0.066
 Karen-0.066
Token hair
Token ID4190
Feature activation+3.42e+00

Pos logit contributions
kin+0.125
kins+0.109
uc+0.108
ped+0.106
icked+0.100
Neg logit contributions
 Mary-0.162
 Lib-0.158
 Alice-0.156
 Caroline-0.153
 university-0.152
Token.
Token ID13
Feature activation+7.51e-02

Pos logit contributions
kin+0.083
ped+0.079
uc+0.077
 elbow+0.074
bit+0.074
Neg logit contributions
 Mary-0.080
 Lib-0.079
 Elizabeth-0.079
 Caroline-0.078
 Alice-0.078
Token
Token ID220
Feature activation-3.62e-01

Pos logit contributions
kin+0.002
uc+0.002
ped+0.002
 toes+0.002
aces+0.002
Neg logit contributions
 Elizabeth-0.001
 Caroline-0.001
 Lib-0.001
 Alice-0.001
Alice-0.001
Token\n
Token ID198
Feature activation-1.32e-01

Pos logit contributions
 Elizabeth+0.007
 Caroline+0.007
 university+0.007
 Lib+0.006
abella+0.006
Neg logit contributions
kin-0.010
ped-0.010
kins-0.009
aces-0.009
uc-0.009
Token\n
Token ID198
Feature activation+1.04e+00

Pos logit contributions
 Elizabeth+0.003
 university+0.003
 Caroline+0.003
 Heidi+0.003
 Lib+0.003
Neg logit contributions
kin-0.003
uc-0.003
kins-0.003
ped-0.003
ippers-0.003
Token it
Token ID340
Feature activation+2.44e-01

Pos logit contributions
 Elizabeth+0.027
 Caroline+0.027
 Mary+0.026
 Lib+0.026
 university+0.026
Neg logit contributions
kin-0.046
ped-0.046
uc-0.044
 elbow-0.043
kins-0.042
Token,
Token ID11
Feature activation-1.70e+00

Pos logit contributions
kin+0.005
ped+0.005
led+0.005
uc+0.005
aces+0.005
Neg logit contributions
 Karen-0.007
 Lib-0.007
 Elizabeth-0.007
abella-0.007
 Caroline-0.007
Token and
Token ID290
Feature activation-3.15e-01

Pos logit contributions
abella+0.041
unt+0.041
 Elizabeth+0.040
Alice+0.040
Kat+0.039
Neg logit contributions
ped-0.040
ked-0.039
 fingers-0.038
 toes-0.038
kin-0.037
Token their
Token ID511
Feature activation+1.84e+00

Pos logit contributions
 Elizabeth+0.006
 Caroline+0.006
 Lib+0.006
 university+0.006
Alice+0.006
Neg logit contributions
kin-0.009
ped-0.009
uc-0.009
led-0.008
ked-0.008
Token eyes
Token ID2951
Feature activation+3.02e+00

Pos logit contributions
kin+0.038
ped+0.035
uc+0.035
aces+0.034
kins+0.033
Neg logit contributions
 Caroline-0.050
 Mary-0.050
 university-0.049
 Elizabeth-0.049
 �-0.049
Token spark
Token ID9009
Feature activation+5.84e+00

Pos logit contributions
kin+0.077
ped+0.075
 elbow+0.075
uc+0.074
ags+0.072
Neg logit contributions
 Mary-0.062
 �-0.062
 Caroline-0.060
 certain-0.060
 Paula-0.059
Tokenled
Token ID992
Feature activation+2.64e+00

Pos logit contributions
arl+0.081
ags+0.078
ocol+0.077
 elbow+0.075
kin+0.075
Neg logit contributions
abella-0.177
 about-0.175
 Paula-0.174
 Mary-0.172
 Caroline-0.171
Token.
Token ID13
Feature activation-6.65e-01

Pos logit contributions
kin+0.071
uc+0.069
ped+0.067
 elbow+0.066
ands+0.064
Neg logit contributions
 �-0.056
 Mary-0.056
 Caroline-0.054
 Alice-0.053
 Lib-0.053
Token They
Token ID1119
Feature activation+9.19e-01

Pos logit contributions
 Elizabeth+0.014
 Caroline+0.014
 university+0.014
 Lib+0.014
abella+0.014
Neg logit contributions
ped-0.017
uc-0.016
kin-0.016
 paws-0.016
icked-0.016
Token wanted
Token ID2227
Feature activation-1.82e+00

Pos logit contributions
kin+0.024
uc+0.022
ped+0.022
agged+0.022
ags+0.022
Neg logit contributions
 Mary-0.021
 Caroline-0.021
 �-0.020
 Alice-0.020
 Elizabeth-0.020
Token to
Token ID284
Feature activation+6.59e-01

Pos logit contributions
Alice+0.051
 Elizabeth+0.050
unt+0.050
 Kathy+0.049
aren+0.048
Neg logit contributions
 fingers-0.035
 cr-0.035
 feathers-0.034
 hands-0.034
ped-0.033
Token was
Token ID373
Feature activation-6.76e-01

Pos logit contributions
 Elizabeth+0.002
 Mary+0.002
 Caroline+0.002
 �+0.002
 Alice+0.002
Neg logit contributions
kin-0.003
ped-0.003
uc-0.003
 elbow-0.003
kins-0.003
Token bright
Token ID6016
Feature activation+2.15e+00

Pos logit contributions
 Mary+0.014
 Elizabeth+0.014
 Caroline+0.014
 university+0.014
 Alice+0.014
Neg logit contributions
kin-0.018
ped-0.018
uc-0.017
 elbow-0.017
kins-0.017
Token and
Token ID290
Feature activation-2.47e-01

Pos logit contributions
kin+0.061
ped+0.059
uc+0.058
 elbow+0.056
kins+0.056
Neg logit contributions
 Mary-0.043
 Caroline-0.042
 Elizabeth-0.042
 Alice-0.041
 university-0.041
Token shiny
Token ID22441
Feature activation+4.17e+00

Pos logit contributions
 Mary+0.006
 Elizabeth+0.006
 Caroline+0.006
 �+0.006
 Alice+0.006
Neg logit contributions
kin-0.006
ped-0.006
uc-0.006
agged-0.006
 elbow-0.006
Token and
Token ID290
Feature activation-4.18e-01

Pos logit contributions
kin+0.113
uc+0.113
ped+0.110
 elbow+0.105
agged+0.104
Neg logit contributions
 Mary-0.084
 university-0.081
 Lib-0.081
 Alice-0.081
 Caroline-0.080
Token spark
Token ID9009
Feature activation+5.79e+00

Pos logit contributions
 Elizabeth+0.010
 Caroline+0.010
 university+0.010
 Mary+0.010
 Lib+0.010
Neg logit contributions
kin-0.010
ped-0.010
uc-0.009
kins-0.009
 elbow-0.009
Tokenled
Token ID992
Feature activation+2.88e+00

Pos logit contributions
 elbow+0.091
 elbows+0.090
kin+0.085
ags+0.085
ocol+0.085
Neg logit contributions
 about-0.176
abella-0.172
 Mary-0.166
 Lib-0.166
 Alice-0.165
Token in
Token ID287
Feature activation+8.45e-01

Pos logit contributions
kin+0.080
uc+0.078
 elbow+0.077
ped+0.076
bit+0.073
Neg logit contributions
 Mary-0.058
 �-0.057
 certain-0.056
 university-0.056
 Lib-0.056
Token the
Token ID262
Feature activation+1.54e+00

Pos logit contributions
kin+0.028
ped+0.027
uc+0.027
led+0.026
 elbow+0.026
Neg logit contributions
 Mary-0.014
 Caroline-0.013
 Elizabeth-0.013
 Alice-0.013
 university-0.013
Token sun
Token ID4252
Feature activation+1.54e+00

Pos logit contributions
kin+0.032
ped+0.031
uc+0.030
 elbow+0.028
kins+0.028
Neg logit contributions
 Elizabeth-0.041
 Mary-0.041
 Caroline-0.041
 university-0.041
 Lib-0.040
Token.
Token ID13
Feature activation+1.14e-02

Pos logit contributions
kin+0.040
ped+0.038
uc+0.037
 elbow+0.037
agged+0.036
Neg logit contributions
 Mary-0.035
 �-0.033
 Alice-0.033
 Lib-0.033
 Elizabeth-0.033
Token was
Token ID373
Feature activation-1.19e+00

Pos logit contributions
 Mary+0.006
 �+0.006
 Elizabeth+0.006
 Caroline+0.006
 university+0.006
Neg logit contributions
kin-0.013
uc-0.012
ped-0.012
 elbow-0.012
kins-0.012
Token red
Token ID2266
Feature activation+2.11e+00

Pos logit contributions
 Elizabeth+0.024
 Caroline+0.024
 Mary+0.024
 Lib+0.024
 Alice+0.023
Neg logit contributions
kin-0.033
ped-0.032
uc-0.031
 elbow-0.029
aces-0.029
Token and
Token ID290
Feature activation-8.22e-01

Pos logit contributions
kin+0.061
ped+0.059
uc+0.058
kins+0.055
 elbow+0.055
Neg logit contributions
 Mary-0.041
 Elizabeth-0.041
 Caroline-0.040
 university-0.040
 Alice-0.039
Token white
Token ID2330
Feature activation+2.41e+00

Pos logit contributions
 Mary+0.017
 �+0.017
 Elizabeth+0.017
 Caroline+0.017
 Alice+0.017
Neg logit contributions
kin-0.022
ped-0.021
uc-0.021
agged-0.020
aces-0.020
Token and
Token ID290
Feature activation-5.29e-01

Pos logit contributions
kin+0.057
ped+0.055
uc+0.053
 elbow+0.051
aces+0.051
Neg logit contributions
 Elizabeth-0.059
 Caroline-0.058
 Mary-0.058
 Lib-0.057
 Alice-0.057
Token spark
Token ID9009
Feature activation+5.78e+00

Pos logit contributions
 Elizabeth+0.011
 Caroline+0.011
 Mary+0.011
 Lib+0.011
 Alice+0.011
Neg logit contributions
kin-0.014
ped-0.014
uc-0.013
 elbow-0.013
kins-0.013
Tokenled
Token ID992
Feature activation+2.79e+00

Pos logit contributions
 elbow+0.100
kin+0.099
 fiber+0.097
 elbows+0.096
ocol+0.094
Neg logit contributions
 about-0.169
abella-0.155
 beyond-0.153
 Mary-0.150
 Alice-0.148
Token like
Token ID588
Feature activation+2.97e-01

Pos logit contributions
kin+0.075
uc+0.073
ped+0.071
 elbow+0.071
bit+0.067
Neg logit contributions
 Mary-0.059
 �-0.058
 university-0.057
 Caroline-0.057
 Lib-0.056
Token a
Token ID257
Feature activation+6.92e-01

Pos logit contributions
kin+0.008
ped+0.008
uc+0.007
 elbow+0.007
kins+0.007
Neg logit contributions
 Elizabeth-0.006
 Caroline-0.006
 Mary-0.006
 Lib-0.006
 Alice-0.006
Token star
Token ID3491
Feature activation+1.99e+00

Pos logit contributions
kin+0.017
ped+0.016
uc+0.016
 elbow+0.016
kins+0.015
Neg logit contributions
 Elizabeth-0.016
 Caroline-0.016
 Mary-0.016
 Lib-0.016
 Alice-0.015
Token.
Token ID13
Feature activation+4.71e-01

Pos logit contributions
ped+0.051
uc+0.051
kin+0.050
agged+0.050
ags+0.049
Neg logit contributions
 university-0.041
 Mary-0.041
 certain-0.039
 Alice-0.039
 Elizabeth-0.039
Token her
Token ID607
Feature activation+3.35e+00

Pos logit contributions
kin+0.068
ped+0.064
uc+0.062
 elbow+0.061
kins+0.061
Neg logit contributions
 Caroline-0.048
 university-0.048
 Alice-0.047
 Elizabeth-0.047
 Paula-0.046
Token pink
Token ID11398
Feature activation+4.25e+00

Pos logit contributions
ped+0.093
kin+0.090
icked+0.088
uc+0.087
ked+0.087
Neg logit contributions
 university-0.068
 Caroline-0.066
 Alice-0.066
 Mary-0.066
 Elizabeth-0.065
Token dress
Token ID6576
Feature activation+1.98e+00

Pos logit contributions
ped+0.112
led+0.106
icked+0.106
kin+0.105
ps+0.103
Neg logit contributions
 university-0.095
 �-0.091
unt-0.090
 bakery-0.089
 Caroline-0.089
Token and
Token ID290
Feature activation+1.84e+00

Pos logit contributions
kin+0.051
ped+0.048
uc+0.048
aces+0.046
kins+0.045
Neg logit contributions
 Elizabeth-0.045
 Mary-0.044
 Caroline-0.044
 Lib-0.043
 Alice-0.043
Token her
Token ID607
Feature activation+3.46e+00

Pos logit contributions
agged+0.048
kin+0.048
ped+0.048
uc+0.047
ases+0.047
Neg logit contributions
 Mary-0.040
 �-0.040
 Alice-0.040
 university-0.039
 Paula-0.039
Token spark
Token ID9009
Feature activation+5.73e+00

Pos logit contributions
ped+0.097
uc+0.094
icked+0.094
ases+0.093
ands+0.093
Neg logit contributions
 Caroline-0.067
 Mary-0.066
 Alice-0.065
 university-0.065
 Karen-0.065
Tokenly
Token ID306
Feature activation+4.07e+00

Pos logit contributions
ocol+0.110
 fiber+0.090
uc+0.088
bit+0.087
kin+0.086
Neg logit contributions
 about-0.164
 Paula-0.154
abella-0.153
land-0.151
 Alice-0.150
Token shoes
Token ID10012
Feature activation+2.22e+00

Pos logit contributions
ases+0.096
ped+0.094
agged+0.092
uc+0.091
icked+0.090
Neg logit contributions
 Paula-0.090
 �-0.086
 Carol-0.085
 Caroline-0.084
 university-0.084
Token every
Token ID790
Feature activation+1.90e+00

Pos logit contributions
kin+0.063
ocol+0.063
ped+0.063
bit+0.062
uc+0.062
Neg logit contributions
 Lib-0.040
 �-0.039
 about-0.039
 certain-0.038
 Paula-0.038
Token day
Token ID1110
Feature activation+2.00e+00

Pos logit contributions
uc+0.065
kin+0.064
ped+0.064
kins+0.062
ked+0.062
Neg logit contributions
 Mary-0.025
 university-0.024
 Alice-0.024
 Elizabeth-0.024
 Caroline-0.024
Token.
Token ID13
Feature activation-5.07e-01

Pos logit contributions
kin+0.040
ped+0.037
uc+0.036
aces+0.035
kins+0.034
Neg logit contributions
 Elizabeth-0.057
 Caroline-0.056
 Mary-0.054
 university-0.054
abella-0.054
Token years
Token ID812
Feature activation+4.85e-01

Pos logit contributions
kin+0.002
ped+0.002
uc+0.002
 elbow+0.002
kins+0.002
Neg logit contributions
 Elizabeth-0.002
 Caroline-0.002
 Mary-0.002
 Lib-0.002
 university-0.002
Token old
Token ID1468
Feature activation+2.31e+00

Pos logit contributions
kin+0.014
ped+0.013
uc+0.013
 elbow+0.013
kins+0.013
Neg logit contributions
 Elizabeth-0.009
 Caroline-0.009
 Mary-0.009
 Lib-0.009
 Alice-0.009
Token and
Token ID290
Feature activation+4.00e-01

Pos logit contributions
kin+0.058
ped+0.056
uc+0.055
 elbow+0.052
kins+0.052
Neg logit contributions
 Elizabeth-0.052
 Caroline-0.052
 Mary-0.051
abella-0.051
 university-0.051
Token had
Token ID550
Feature activation+1.01e+00

Pos logit contributions
kin+0.012
ped+0.012
uc+0.011
 elbow+0.011
bit+0.011
Neg logit contributions
 Mary-0.007
 Elizabeth-0.007
 Caroline-0.007
 Alice-0.007
 Lib-0.007
Token long
Token ID890
Feature activation+2.23e+00

Pos logit contributions
kin+0.022
ped+0.021
uc+0.020
 elbow+0.020
aces+0.020
Neg logit contributions
 Elizabeth-0.026
 Caroline-0.026
abella-0.025
 �-0.025
 Mary-0.025
Token blonde
Token ID21541
Feature activation+5.72e+00

Pos logit contributions
kin+0.045
uc+0.041
ped+0.040
ked+0.039
icked+0.037
Neg logit contributions
 Mary-0.064
 Alice-0.063
 �-0.063
 university-0.062
 Elizabeth-0.062
Token hair
Token ID4190
Feature activation+2.82e+00

Pos logit contributions
kin+0.128
uc+0.114
kins+0.114
ped+0.112
icked+0.105
Neg logit contributions
 Mary-0.153
 Alice-0.148
 Lib-0.147
 university-0.145
 �-0.144
Token.
Token ID13
Feature activation-1.79e-01

Pos logit contributions
kin+0.067
ped+0.065
uc+0.062
 elbow+0.059
bit+0.059
Neg logit contributions
 Mary-0.067
 Elizabeth-0.067
 Caroline-0.067
 Lib-0.066
 Alice-0.066
Token She
Token ID1375
Feature activation+7.35e-01

Pos logit contributions
abella+0.004
 Elizabeth+0.003
unt+0.003
str+0.003
 Kathy+0.003
Neg logit contributions
agged-0.004
aces-0.004
kin-0.004
ags-0.004
 toes-0.004
Token loved
Token ID6151
Feature activation-1.07e+00

Pos logit contributions
kin+0.019
ped+0.018
uc+0.018
 elbow+0.017
kins+0.017
Neg logit contributions
 Elizabeth-0.017
 Caroline-0.016
 Mary-0.016
 Lib-0.016
 university-0.016
Token to
Token ID284
Feature activation+7.16e-01

Pos logit contributions
 Caroline+0.019
 Elizabeth+0.019
abella+0.018
 Mary+0.018
 Lib+0.018
Neg logit contributions
ped-0.031
uc-0.031
kin-0.031
 elbow-0.029
aces-0.029
Token she
Token ID673
Feature activation+4.55e-01

Pos logit contributions
kin+0.019
ped+0.018
uc+0.017
aces+0.017
kins+0.017
Neg logit contributions
 Elizabeth-0.014
 Caroline-0.014
 Mary-0.014
 Lib-0.013
 university-0.013
Token had
Token ID550
Feature activation+5.98e-01

Pos logit contributions
kin+0.012
ped+0.012
uc+0.012
 elbow+0.011
kins+0.011
Neg logit contributions
 Mary-0.009
 Elizabeth-0.009
 Caroline-0.009
 Alice-0.009
 �-0.009
Token blue
Token ID4171
Feature activation+2.61e+00

Pos logit contributions
ped+0.015
kin+0.015
uc+0.015
aces+0.014
 elbow+0.014
Neg logit contributions
 Elizabeth-0.013
 Caroline-0.013
 Lib-0.012
 Mary-0.012
 university-0.012
Token eyes
Token ID2951
Feature activation+1.28e+00

Pos logit contributions
kin+0.058
ped+0.053
led+0.051
kins+0.050
uc+0.049
Neg logit contributions
 Mary-0.072
 �-0.071
 Caroline-0.069
 Karen-0.068
 university-0.068
Token and
Token ID290
Feature activation+7.05e-01

Pos logit contributions
kin+0.031
ped+0.031
uc+0.030
aces+0.029
led+0.028
Neg logit contributions
 Elizabeth-0.029
 Caroline-0.029
 Mary-0.029
 university-0.029
abella-0.028
Token blonde
Token ID21541
Feature activation+5.69e+00

Pos logit contributions
kin+0.018
ped+0.017
uc+0.017
 elbow+0.016
kins+0.016
Neg logit contributions
 Mary-0.016
 Elizabeth-0.016
 Caroline-0.016
 Alice-0.015
 Lib-0.015
Token hair
Token ID4190
Feature activation+3.43e+00

Pos logit contributions
kin+0.118
uc+0.105
kins+0.100
ped+0.099
ands+0.096
Neg logit contributions
 Mary-0.162
 Lib-0.159
 Alice-0.158
 Caroline-0.157
 Karen-0.155
Token.
Token ID13
Feature activation-3.92e-01

Pos logit contributions
kin+0.083
ped+0.079
uc+0.077
 elbow+0.074
bit+0.073
Neg logit contributions
 Mary-0.080
 Elizabeth-0.080
 Lib-0.080
 Caroline-0.079
 Alice-0.079
Token
Token ID198
Feature activation+2.83e-02

Pos logit contributions
abella+0.007
 certain+0.007
 university+0.007
str+0.007
 Elizabeth+0.007
Neg logit contributions
aces-0.010
ped-0.010
 paws-0.010
kin-0.010
 fingers-0.009
Token
Token ID198
Feature activation+8.12e-01

Pos logit contributions
ped+0.001
kin+0.001
aces+0.001
uc+0.001
agged+0.001
Neg logit contributions
 university-0.001
 Lib-0.001
 Elizabeth-0.001
 meeting-0.001
 Caroline-0.001
TokenOne
Token ID3198
Feature activation+1.30e-01

Pos logit contributions
kin+0.020
ped+0.019
uc+0.019
aces+0.018
bit+0.018
Neg logit contributions
 Lib-0.016
 Elizabeth-0.016
 university-0.015
 Caroline-0.015
 Alice-0.015
Tokenâ
Token ID22940
Feature activation+3.71e-01

Pos logit contributions
 university+0.005
 Elizabeth+0.005
 Caroline+0.004
 Lib+0.004
 Heidi+0.004
Neg logit contributions
ped-0.003
kin-0.003
uc-0.003
led-0.003
ked-0.003
Token
Token ID26391
Feature activation-5.99e-01

Pos logit contributions
kin+0.010
ped+0.010
uc+0.010
agged+0.010
bit+0.010
Neg logit contributions
 Elizabeth-0.007
 Caroline-0.007
 Mary-0.007
 university-0.007
abella-0.007
Token
Token ID8151
Feature activation-1.43e+00

Pos logit contributions
 Elizabeth+0.012
abella+0.011
 �+0.011
 Lib+0.011
 Caroline+0.011
Neg logit contributions
kin-0.016
uc-0.016
aces-0.015
kins-0.015
ped-0.015
Tokens
Token ID82
Feature activation+1.63e+00

Pos logit contributions
 Elizabeth+0.034
 university+0.034
 Caroline+0.033
 Carol+0.032
 Mary+0.032
Neg logit contributions
kin-0.035
ped-0.034
uc-0.033
icked-0.031
agged-0.031
Token eyes
Token ID2951
Feature activation+2.81e+00

Pos logit contributions
kin+0.033
ped+0.031
uc+0.031
 elbow+0.028
bit+0.028
Neg logit contributions
 Elizabeth-0.046
 Lib-0.045
 Caroline-0.045
 Mary-0.045
 Alice-0.045
Token spark
Token ID9009
Feature activation+5.68e+00

Pos logit contributions
kin+0.073
uc+0.067
ped+0.066
 elbow+0.066
kins+0.064
Neg logit contributions
 �-0.062
 Mary-0.061
 Caroline-0.060
 Karen-0.060
 Lib-0.060
Tokenled
Token ID992
Feature activation+2.56e+00

Pos logit contributions
kin+0.093
ands+0.084
 elbow+0.082
 fiber+0.081
ags+0.080
Neg logit contributions
 about-0.167
 closely-0.161
 Paula-0.160
 �-0.158
 Caroline-0.158
Token.
Token ID13
Feature activation+1.08e+00

Pos logit contributions
kin+0.069
uc+0.066
ped+0.066
 elbow+0.063
kins+0.062
Neg logit contributions
 Mary-0.054
 �-0.053
 Caroline-0.053
 Elizabeth-0.053
 Alice-0.052
Token She
Token ID1375
Feature activation+1.71e+00

Pos logit contributions
aces+0.027
ped+0.026
uc+0.026
 shorts+0.025
 paws+0.025
Neg logit contributions
 Elizabeth-0.021
abella-0.021
Alice-0.021
Kat-0.021
 certain-0.021
Token said
Token ID531
Feature activation-2.16e-01

Pos logit contributions
kin+0.039
ped+0.038
uc+0.037
led+0.035
 elbow+0.035
Neg logit contributions
 Elizabeth-0.043
 Caroline-0.042
 Mary-0.042
 university-0.042
 Lib-0.041
Token,
Token ID11
Feature activation-2.61e-01

Pos logit contributions
 Elizabeth+0.005
 Caroline+0.005
abella+0.005
 university+0.005
Alice+0.005
Neg logit contributions
kin-0.005
ped-0.005
uc-0.005
icked-0.004
ked-0.004
Token small
Token ID1402
Feature activation+1.00e+00

Pos logit contributions
 Mary+0.016
 Elizabeth+0.016
 Caroline+0.016
 university+0.015
 Lib+0.015
Neg logit contributions
kin-0.020
ped-0.020
uc-0.020
 elbow-0.019
kins-0.019
Token and
Token ID290
Feature activation-4.55e-01

Pos logit contributions
kin+0.026
ped+0.025
uc+0.025
aces+0.024
 elbow+0.024
Neg logit contributions
 Elizabeth-0.022
 Caroline-0.022
 Mary-0.021
abella-0.021
 Lib-0.021
Token shiny
Token ID22441
Feature activation+3.80e+00

Pos logit contributions
 Mary+0.010
 Caroline+0.010
 Elizabeth+0.010
 Alice+0.010
 �+0.010
Neg logit contributions
kin-0.012
ped-0.011
uc-0.011
agged-0.011
 elbow-0.011
Token and
Token ID290
Feature activation-6.27e-01

Pos logit contributions
kin+0.099
uc+0.099
ped+0.098
agged+0.091
 elbow+0.091
Neg logit contributions
 university-0.081
 Mary-0.080
 Lib-0.079
 Alice-0.079
 Caroline-0.078
Token it
Token ID340
Feature activation-1.09e+00

Pos logit contributions
 Elizabeth+0.015
 Caroline+0.014
 Mary+0.014
abella+0.014
 university+0.014
Neg logit contributions
kin-0.016
ped-0.015
uc-0.014
kins-0.014
aces-0.014
Token spark
Token ID9009
Feature activation+5.68e+00

Pos logit contributions
 Elizabeth+0.025
 Caroline+0.024
 Mary+0.024
 Lib+0.024
 Alice+0.024
Neg logit contributions
kin-0.028
ped-0.026
uc-0.026
 elbow-0.025
kins-0.025
Tokenled
Token ID992
Feature activation+2.47e+00

Pos logit contributions
ocol+0.080
 elbow+0.078
 elbows+0.075
kin+0.075
 fiber+0.074
Neg logit contributions
 about-0.178
abella-0.173
 Alice-0.167
 Mary-0.166
 beyond-0.165
Token in
Token ID287
Feature activation+6.43e-01

Pos logit contributions
kin+0.066
uc+0.065
ped+0.063
 elbow+0.062
bit+0.060
Neg logit contributions
 Mary-0.052
 �-0.051
 university-0.051
 Caroline-0.050
 Alice-0.050
Token the
Token ID262
Feature activation+1.17e+00

Pos logit contributions
kin+0.020
ped+0.019
uc+0.019
 elbow+0.018
led+0.018
Neg logit contributions
 Mary-0.012
 Caroline-0.011
 Elizabeth-0.011
 university-0.011
 Alice-0.011
Token sun
Token ID4252
Feature activation+1.29e+00

Pos logit contributions
kin+0.029
ped+0.028
uc+0.027
 elbow+0.026
kins+0.026
Neg logit contributions
 university-0.028
 Caroline-0.028
 Mary-0.028
 Elizabeth-0.027
 Alice-0.027
Token.
Token ID13
Feature activation-2.11e-02

Pos logit contributions
kin+0.033
ped+0.032
uc+0.031
 elbow+0.031
agged+0.030
Neg logit contributions
 Mary-0.029
 Alice-0.028
 Elizabeth-0.028
 Lib-0.028
 �-0.028
Token only
Token ID691
Feature activation+1.86e+00

Pos logit contributions
 Elizabeth+0.014
 Caroline+0.014
abella+0.014
Kat+0.014
 �+0.014
Neg logit contributions
kin-0.017
ped-0.016
uc-0.016
 elbow-0.015
aces-0.015
Token three
Token ID1115
Feature activation+2.34e-01

Pos logit contributions
kin+0.041
ped+0.039
uc+0.039
 toes+0.037
 fingers+0.037
Neg logit contributions
 Elizabeth-0.046
abella-0.046
 Caroline-0.045
Kat-0.045
 �-0.045
Token,
Token ID11
Feature activation-2.40e+00

Pos logit contributions
kin+0.006
ped+0.005
uc+0.005
 elbow+0.005
kins+0.005
Neg logit contributions
 Mary-0.006
 Elizabeth-0.006
 Caroline-0.006
 Alice-0.006
 Lib-0.006
Token with
Token ID351
Feature activation+6.05e-01

Pos logit contributions
abella+0.053
 Elizabeth+0.050
Kat+0.050
 Caroline+0.050
 Paula+0.050
Neg logit contributions
kin-0.064
 toes-0.058
ped-0.058
aces-0.056
 elbow-0.056
Token curly
Token ID45731
Feature activation+4.79e+00

Pos logit contributions
kin+0.019
led+0.019
uc+0.019
kins+0.019
ked+0.019
Neg logit contributions
 Mary-0.011
 Alice-0.011
 Caroline-0.011
 Elizabeth-0.011
 university-0.010
Token blonde
Token ID21541
Feature activation+5.67e+00

Pos logit contributions
kin+0.106
uc+0.093
kins+0.092
ands+0.087
ases+0.086
Neg logit contributions
 Mary-0.131
 Alice-0.126
 �-0.125
 Karen-0.124
 Lib-0.122
Token hair
Token ID4190
Feature activation+2.60e+00

Pos logit contributions
kin+0.123
kins+0.110
uc+0.109
ped+0.105
ands+0.103
Neg logit contributions
 Mary-0.151
 Lib-0.145
 Alice-0.144
 �-0.142
 Caroline-0.141
Token and
Token ID290
Feature activation+2.03e+00

Pos logit contributions
kin+0.076
ped+0.073
uc+0.071
bit+0.070
 elbow+0.069
Neg logit contributions
 Mary-0.046
 Lib-0.046
 �-0.045
 Elizabeth-0.045
 Alice-0.045
Token she
Token ID673
Feature activation+9.98e-01

Pos logit contributions
uc+0.055
kin+0.055
ags+0.054
agged+0.053
ands+0.053
Neg logit contributions
 Mary-0.043
 Alice-0.040
 Caroline-0.040
 �-0.040
 Elizabeth-0.039
Token loved
Token ID6151
Feature activation-7.45e-01

Pos logit contributions
kin+0.029
ped+0.028
uc+0.027
 elbow+0.026
kins+0.026
Neg logit contributions
 Mary-0.019
 Elizabeth-0.019
 Caroline-0.019
abella-0.019
 Alice-0.018
Token to
Token ID284
Feature activation+5.85e-01

Pos logit contributions
 Elizabeth+0.015
 Caroline+0.015
abella+0.014
 Heidi+0.014
 Lib+0.014
Neg logit contributions
uc-0.020
ped-0.020
kin-0.020
 elbow-0.020
aces-0.019
Token on
Token ID319
Feature activation+1.07e+00

Pos logit contributions
kin+0.009
ped+0.009
uc+0.009
 elbow+0.009
kins+0.008
Neg logit contributions
 Mary-0.005
 Elizabeth-0.005
 Caroline-0.005
 university-0.005
 Lib-0.005
Token a
Token ID257
Feature activation-3.99e-01

Pos logit contributions
kin+0.027
ped+0.026
uc+0.025
bit+0.024
kins+0.024
Neg logit contributions
 Elizabeth-0.024
 Caroline-0.024
abella-0.023
 Mary-0.023
 Lib-0.023
Token hill
Token ID12788
Feature activation-7.04e-01

Pos logit contributions
 Elizabeth+0.010
abella+0.010
 Mary+0.010
 Caroline+0.010
 Karen+0.010
Neg logit contributions
kin-0.010
uc-0.008
ater-0.008
 toes-0.008
 fingers-0.008
Token and
Token ID290
Feature activation+4.11e-01

Pos logit contributions
 Elizabeth+0.019
 Caroline+0.019
abella+0.019
 Carol+0.018
Kat+0.018
Neg logit contributions
kin-0.015
ped-0.014
led-0.013
uc-0.013
aces-0.013
Token it
Token ID340
Feature activation-6.99e-01

Pos logit contributions
kin+0.011
ped+0.011
uc+0.010
 elbow+0.010
led+0.010
Neg logit contributions
 Elizabeth-0.008
 Caroline-0.008
 Mary-0.008
 university-0.008
 Lib-0.008
Token spark
Token ID9009
Feature activation+5.66e+00

Pos logit contributions
 Elizabeth+0.017
 Caroline+0.017
 Mary+0.017
 Lib+0.017
 Alice+0.017
Neg logit contributions
kin-0.017
ped-0.016
led-0.015
uc-0.015
kins-0.015
Tokenled
Token ID992
Feature activation+2.72e+00

Pos logit contributions
ocol+0.083
 elbow+0.077
 fiber+0.076
kin+0.075
arl+0.073
Neg logit contributions
 about-0.179
abella-0.176
 Mary-0.173
 Alice-0.169
 Lib-0.169
Token in
Token ID287
Feature activation+5.41e-01

Pos logit contributions
kin+0.073
uc+0.072
ped+0.069
 elbow+0.069
kins+0.066
Neg logit contributions
 Mary-0.058
 �-0.057
 Caroline-0.056
 Lib-0.055
 Alice-0.055
Token the
Token ID262
Feature activation+6.46e-01

Pos logit contributions
kin+0.018
uc+0.017
ped+0.017
led+0.017
aces+0.016
Neg logit contributions
 Mary-0.009
 Caroline-0.009
 university-0.008
 Alice-0.008
 Elizabeth-0.008
Token sunlight
Token ID19606
Feature activation+1.85e+00

Pos logit contributions
kin+0.012
ped+0.011
uc+0.011
 elbow+0.010
bit+0.010
Neg logit contributions
 Elizabeth-0.019
 Caroline-0.019
 Mary-0.019
 Lib-0.018
 Alice-0.018
Token.
Token ID13
Feature activation+7.60e-02

Pos logit contributions
kin+0.045
ped+0.044
uc+0.043
 elbow+0.042
agged+0.041
Neg logit contributions
 Mary-0.043
 Caroline-0.042
 Lib-0.042
 Elizabeth-0.042
 �-0.041
Token box
Token ID3091
Feature activation-3.29e-01

Pos logit contributions
kin+0.006
ped+0.005
uc+0.005
 elbow+0.005
kins+0.005
Neg logit contributions
 Elizabeth-0.006
 Mary-0.006
 Caroline-0.006
 Lib-0.006
 Alice-0.006
Token full
Token ID1336
Feature activation+1.06e+00

Pos logit contributions
 Mary+0.007
 Elizabeth+0.006
 Caroline+0.006
 university+0.006
 Lib+0.006
Neg logit contributions
kin-0.009
ped-0.009
uc-0.009
 elbow-0.008
bit-0.008
Token of
Token ID286
Feature activation+1.67e-01

Pos logit contributions
ped+0.032
uc+0.031
kin+0.030
 elbow+0.030
agged+0.030
Neg logit contributions
 �-0.019
 Mary-0.019
 about-0.018
 university-0.018
 Lib-0.018
Token colorful
Token ID20239
Feature activation+2.59e+00

Pos logit contributions
kin+0.004
ped+0.004
ked+0.004
uc+0.004
bit+0.004
Neg logit contributions
 Mary-0.004
 university-0.004
 certain-0.004
 Alice-0.004
 Karen-0.004
Token and
Token ID290
Feature activation+9.79e-01

Pos logit contributions
kin+0.058
agged+0.055
ped+0.055
bit+0.054
icked+0.052
Neg logit contributions
 university-0.067
 Mary-0.065
 Paula-0.065
 �-0.065
 bakery-0.064
Token spark
Token ID9009
Feature activation+5.64e+00

Pos logit contributions
ped+0.027
uc+0.026
kin+0.026
bit+0.026
agged+0.026
Neg logit contributions
 Mary-0.020
 �-0.020
 Alice-0.020
 Lib-0.019
 Caroline-0.019
Tokenly
Token ID306
Feature activation+2.85e+00

Pos logit contributions
kin+0.130
 elbow+0.120
bit+0.116
 elbows+0.115
uc+0.113
Neg logit contributions
 about-0.134
 Paula-0.128
 Alice-0.128
 Mary-0.128
 Caroline-0.127
Token clothes
Token ID8242
Feature activation+2.37e+00

Pos logit contributions
kin+0.057
agged+0.056
uc+0.056
ped+0.055
 elbow+0.052
Neg logit contributions
 Mary-0.075
 �-0.075
 Lib-0.074
 university-0.074
 Alice-0.073
Token and
Token ID290
Feature activation+1.36e+00

Pos logit contributions
kin+0.062
ped+0.060
uc+0.058
 elbow+0.057
bit+0.057
Neg logit contributions
 Lib-0.051
 Mary-0.051
 Alice-0.050
 Caroline-0.050
 university-0.050
Token every
Token ID790
Feature activation+9.63e-01

Pos logit contributions
agged+0.034
kin+0.034
uc+0.034
ater+0.033
ags+0.033
Neg logit contributions
 Mary-0.031
 Alice-0.031
 Paula-0.030
 Caroline-0.030
 �-0.030
Token morning
Token ID3329
Feature activation+1.20e+00

Pos logit contributions
kin+0.030
ped+0.030
uc+0.030
bit+0.029
agged+0.029
Neg logit contributions
 Mary-0.015
 Caroline-0.015
 Elizabeth-0.015
 Alice-0.015
 university-0.015
Token and
Token ID290
Feature activation+1.12e+00

Pos logit contributions
kin+0.039
ped+0.039
uc+0.037
led+0.036
aces+0.036
Neg logit contributions
 Elizabeth-0.034
 Caroline-0.034
 Lib-0.033
 Mary-0.033
abella-0.033
Token sad
Token ID6507
Feature activation+2.39e+00

Pos logit contributions
kin+0.030
uc+0.029
ped+0.028
aces+0.027
kins+0.027
Neg logit contributions
 Mary-0.024
 Elizabeth-0.024
 Caroline-0.023
 Alice-0.023
 �-0.023
Token.
Token ID13
Feature activation+1.02e-01

Pos logit contributions
kin+0.060
uc+0.057
ped+0.057
agged+0.054
 elbow+0.054
Neg logit contributions
 Mary-0.056
 �-0.054
 Elizabeth-0.054
 Caroline-0.054
 Alice-0.054
Token They
Token ID1119
Feature activation+6.96e-01

Pos logit contributions
ped+0.002
kin+0.002
aces+0.002
agged+0.002
uc+0.002
Neg logit contributions
 Elizabeth-0.002
 Lib-0.002
 Caroline-0.002
 Paula-0.002
abella-0.002
Token told
Token ID1297
Feature activation+4.88e-01

Pos logit contributions
kin+0.016
ped+0.015
uc+0.015
 elbow+0.014
kins+0.014
Neg logit contributions
 Elizabeth-0.017
 Mary-0.017
 Caroline-0.017
 Lib-0.017
 Alice-0.017
Token Sam
Token ID3409
Feature activation-4.97e+00

Pos logit contributions
ped+0.011
kin+0.011
uc+0.010
bit+0.010
ked+0.010
Neg logit contributions
 Lib-0.012
 Elizabeth-0.011
 Caroline-0.011
Kat-0.011
 university-0.011
Token and
Token ID290
Feature activation+3.99e-01

Pos logit contributions
 bakery+0.112
 university+0.112
 Caroline+0.111
 Elizabeth+0.110
Kat+0.109
Neg logit contributions
ped-0.121
kin-0.121
aces-0.113
led-0.110
icked-0.110
Token Tim
Token ID5045
Feature activation-2.69e+00

Pos logit contributions
kin+0.011
ped+0.010
uc+0.010
 elbow+0.010
kins+0.010
Neg logit contributions
 Elizabeth-0.008
 Mary-0.008
 Caroline-0.008
 Alice-0.008
 Lib-0.008
Token that
Token ID326
Feature activation-3.01e+00

Pos logit contributions
 Elizabeth+0.059
 Caroline+0.059
Kat+0.058
 Lib+0.057
abella+0.057
Neg logit contributions
ped-0.069
kin-0.066
uc-0.064
icked-0.063
aces-0.063
Token they
Token ID484
Feature activation+5.44e-01

Pos logit contributions
 Elizabeth+0.063
 Caroline+0.063
Alice+0.062
Kat+0.062
abella+0.061
Neg logit contributions
kin-0.080
ped-0.076
kins-0.073
 elbow-0.070
aces-0.069
Token were
Token ID547
Feature activation-1.59e+00

Pos logit contributions
kin+0.015
uc+0.014
ped+0.014
 elbow+0.014
kins+0.014
Neg logit contributions
 Mary-0.011
 Elizabeth-0.011
 Caroline-0.011
 Karen-0.011
 Alice-0.011
Token the
Token ID262
Feature activation-1.79e+00

Pos logit contributions
 Caroline+0.008
 Elizabeth+0.008
 university+0.008
unt+0.007
abella+0.007
Neg logit contributions
ped-0.008
uc-0.008
kin-0.007
led-0.007
ked-0.007
Token shell
Token ID7582
Feature activation-3.40e-01

Pos logit contributions
 Caroline+0.043
 Mary+0.042
 Elizabeth+0.042
 Lib+0.042
 Alice+0.041
Neg logit contributions
kin-0.043
uc-0.041
ped-0.041
aces-0.039
 elbow-0.038
Token.
Token ID13
Feature activation+2.81e-01

Pos logit contributions
 Elizabeth+0.008
 Mary+0.008
 Caroline+0.008
 Lib+0.007
 Alice+0.007
Neg logit contributions
kin-0.009
ped-0.008
uc-0.008
 elbow-0.008
kins-0.008
Token They
Token ID1119
Feature activation+1.44e+00

Pos logit contributions
kin+0.006
ped+0.006
uc+0.006
 lips+0.006
 fingers+0.005
Neg logit contributions
abella-0.007
 Elizabeth-0.007
 Caroline-0.007
 university-0.007
 Paula-0.007
Token told
Token ID1297
Feature activation+3.34e-01

Pos logit contributions
kin+0.032
ped+0.031
uc+0.030
 elbow+0.029
kins+0.028
Neg logit contributions
 Elizabeth-0.037
 Mary-0.037
 Caroline-0.037
 Lib-0.036
 Alice-0.036
Token Tom
Token ID4186
Feature activation-4.96e+00

Pos logit contributions
ped+0.007
 fingers+0.007
ps+0.006
 lips+0.006
uc+0.006
Neg logit contributions
str-0.008
 Elizabeth-0.008
Kat-0.008
itating-0.008
abella-0.008
Token and
Token ID290
Feature activation+5.87e-01

Pos logit contributions
 university+0.122
 Caroline+0.118
 bakery+0.117
Kat+0.117
 Elizabeth+0.116
Neg logit contributions
ped-0.118
kin-0.113
uc-0.110
led-0.108
aces-0.104
Token Lily
Token ID20037
Feature activation-2.74e+00

Pos logit contributions
kin+0.017
ped+0.017
uc+0.016
aces+0.016
kins+0.016
Neg logit contributions
 Mary-0.011
 Elizabeth-0.011
 Caroline-0.011
 Alice-0.011
 Lib-0.011
Token that
Token ID326
Feature activation-2.35e+00

Pos logit contributions
 Elizabeth+0.056
 Caroline+0.056
 Carol+0.054
unt+0.054
 Paula+0.053
Neg logit contributions
ped-0.074
kin-0.073
uc-0.070
led-0.070
icked-0.068
Token it
Token ID340
Feature activation-9.93e-01

Pos logit contributions
 Elizabeth+0.045
 Caroline+0.045
 �+0.044
abella+0.044
 university+0.044
Neg logit contributions
kin-0.067
ped-0.065
kins-0.061
 elbow-0.061
uc-0.061
Token was
Token ID373
Feature activation-3.01e-01

Pos logit contributions
 university+0.021
 Caroline+0.021
 Elizabeth+0.020
 Lib+0.020
 Mary+0.020
Neg logit contributions
kin-0.027
ped-0.027
led-0.026
uc-0.025
kins-0.025
Token the
Token ID262
Feature activation-9.72e-01

Pos logit contributions
 Caroline+0.004
 Elizabeth+0.004
abella+0.004
 Lib+0.004
 �+0.004
Neg logit contributions
kin-0.005
ped-0.005
 elbow-0.004
uc-0.004
ags-0.004
Token car
Token ID1097
Feature activation-9.32e-01

Pos logit contributions
 Caroline+0.018
 Elizabeth+0.018
 Mary+0.017
 Lib+0.017
abella+0.017
Neg logit contributions
kin-0.029
ped-0.028
uc-0.027
 elbow-0.027
kins-0.026
Token.
Token ID13
Feature activation+1.21e-01

Pos logit contributions
 Elizabeth+0.024
 Caroline+0.024
abella+0.023
 Mary+0.023
 Lib+0.023
Neg logit contributions
kin-0.021
ped-0.020
uc-0.019
 elbow-0.019
aces-0.019
Token She
Token ID1375
Feature activation+1.34e+00

Pos logit contributions
kin+0.003
ped+0.003
 elbow+0.003
agged+0.003
aces+0.003
Neg logit contributions
abella-0.002
 Caroline-0.002
 Elizabeth-0.002
 Lib-0.002
str-0.002
Token told
Token ID1297
Feature activation+6.43e-01

Pos logit contributions
kin+0.034
ped+0.032
uc+0.031
aces+0.030
kins+0.030
Neg logit contributions
 Elizabeth-0.031
 Mary-0.031
 Caroline-0.030
 Alice-0.030
 Lib-0.030
Token Tom
Token ID4186
Feature activation-4.93e+00

Pos logit contributions
kin+0.015
ped+0.015
uc+0.014
 elbow+0.014
aces+0.014
Neg logit contributions
 Elizabeth-0.015
 Caroline-0.015
 Mary-0.015
 university-0.015
 Lib-0.015
Token to
Token ID284
Feature activation+1.00e+00

Pos logit contributions
Kat+0.114
 Caroline+0.112
 university+0.110
 bakery+0.109
abella+0.108
Neg logit contributions
ped-0.121
kin-0.113
uc-0.112
icked-0.111
aces-0.110
Token stay
Token ID2652
Feature activation-7.58e-01

Pos logit contributions
kin+0.025
ped+0.024
uc+0.024
led+0.023
kins+0.023
Neg logit contributions
 Mary-0.023
 Elizabeth-0.023
 Alice-0.023
 Caroline-0.023
 Karen-0.023
Token with
Token ID351
Feature activation-5.15e-01

Pos logit contributions
 Elizabeth+0.016
 Mary+0.016
 Caroline+0.016
 Alice+0.016
 university+0.016
Neg logit contributions
kin-0.020
ped-0.019
uc-0.019
 elbow-0.018
aces-0.018
Token Grand
Token ID5675
Feature activation-4.11e-01

Pos logit contributions
 Mary+0.011
 Alice+0.010
 Elizabeth+0.010
 Caroline+0.010
 university+0.010
Neg logit contributions
kin-0.015
uc-0.014
ped-0.014
led-0.013
kins-0.013
Tokenma
Token ID2611
Feature activation-1.76e+00

Pos logit contributions
 Elizabeth+0.007
 Caroline+0.007
 Mary+0.007
 Lib+0.007
 university+0.007
Neg logit contributions
kin-0.012
ped-0.012
uc-0.012
 elbow-0.011
kins-0.011
Token the
Token ID262
Feature activation-6.54e-01

Pos logit contributions
kin+0.035
ped+0.033
uc+0.032
agged+0.031
aces+0.031
Neg logit contributions
 Mary-0.036
 Elizabeth-0.035
 Alice-0.035
 Caroline-0.035
 Lib-0.035
Token branch
Token ID8478
Feature activation-3.84e-01

Pos logit contributions
 Elizabeth+0.016
 Mary+0.016
 university+0.015
 Caroline+0.015
 �+0.015
Neg logit contributions
kin-0.016
ped-0.015
uc-0.015
led-0.014
agged-0.014
Token.
Token ID13
Feature activation+4.83e-01

Pos logit contributions
 Mary+0.008
 �+0.008
 Elizabeth+0.008
 Lib+0.007
 Alice+0.007
Neg logit contributions
kin-0.010
uc-0.010
ped-0.010
bit-0.010
 elbow-0.010
Token He
Token ID679
Feature activation+1.62e+00

Pos logit contributions
kin+0.010
aces+0.009
ped+0.009
agged+0.009
 elbow+0.009
Neg logit contributions
 Elizabeth-0.013
 Paula-0.012
 Caroline-0.012
str-0.012
 Amelia-0.012
Token told
Token ID1297
Feature activation+8.87e-01

Pos logit contributions
kin+0.038
ped+0.036
uc+0.036
aces+0.034
 elbow+0.034
Neg logit contributions
 Elizabeth-0.039
 Mary-0.039
 Caroline-0.039
 Lib-0.038
 Alice-0.038
Token Tom
Token ID4186
Feature activation-4.93e+00

Pos logit contributions
ped+0.021
kin+0.020
ps+0.019
 elbow+0.019
bit+0.018
Neg logit contributions
 Elizabeth-0.021
 Caroline-0.020
 �-0.020
 university-0.020
str-0.020
Token and
Token ID290
Feature activation+5.17e-01

Pos logit contributions
 Caroline+0.125
 university+0.125
 Elizabeth+0.123
 Carol+0.121
 Paula+0.121
Neg logit contributions
ped-0.112
kin-0.109
uc-0.104
aces-0.103
led-0.102
Token Lily
Token ID20037
Feature activation-2.97e+00

Pos logit contributions
kin+0.014
ped+0.014
uc+0.013
 elbow+0.013
kins+0.013
Neg logit contributions
 Elizabeth-0.011
 Caroline-0.010
 Mary-0.010
 university-0.010
 Lib-0.010
Token to
Token ID284
Feature activation+8.17e-01

Pos logit contributions
 Elizabeth+0.058
 Caroline+0.058
abella+0.056
 Carol+0.056
 Mary+0.055
Neg logit contributions
ped-0.082
kin-0.081
uc-0.080
 elbow-0.078
led-0.077
Token go
Token ID467
Feature activation-2.73e+00

Pos logit contributions
kin+0.020
ped+0.019
uc+0.019
led+0.018
kins+0.018
Neg logit contributions
 Mary-0.019
 Elizabeth-0.019
 Caroline-0.019
 Alice-0.019
 Karen-0.019
Token home
Token ID1363
Feature activation-9.55e-01

Pos logit contributions
 Caroline+0.068
 Elizabeth+0.068
abella+0.065
 Carol+0.064
 Mary+0.064
Neg logit contributions
ped-0.062
kin-0.061
uc-0.059
ked-0.057
kins-0.057
Token both
Token ID1111
Feature activation+3.53e-01

Pos logit contributions
 Elizabeth+0.003
 university+0.003
 Caroline+0.003
unt+0.003
 Lib+0.003
Neg logit contributions
kin-0.003
ped-0.003
uc-0.002
led-0.002
icked-0.002
Token tired
Token ID10032
Feature activation+9.71e-01

Pos logit contributions
kin+0.010
ped+0.010
uc+0.009
 elbow+0.009
kins+0.009
Neg logit contributions
 Elizabeth-0.007
 Caroline-0.007
 university-0.007
abella-0.007
 Mary-0.007
Token.
Token ID13
Feature activation-5.89e-03

Pos logit contributions
kin+0.025
ped+0.024
uc+0.023
 elbow+0.022
led+0.022
Neg logit contributions
 Elizabeth-0.022
 Caroline-0.021
abella-0.021
 Mary-0.021
 Lib-0.020
Token Lucy
Token ID22162
Feature activation-1.42e+00

Pos logit contributions
 university+0.000
str+0.000
abella+0.000
 foreign+0.000
 certain+0.000
Neg logit contributions
 hands-0.000
 paws-0.000
ped-0.000
 fingers-0.000
 lips-0.000
Token told
Token ID1297
Feature activation+4.62e-01

Pos logit contributions
 university+0.036
 Elizabeth+0.034
 Caroline+0.034
 Lib+0.034
 Karen+0.033
Neg logit contributions
kin-0.032
ped-0.032
uc-0.030
led-0.030
 elbow-0.029
Token Sam
Token ID3409
Feature activation-4.91e+00

Pos logit contributions
ped+0.009
ps+0.008
kin+0.007
 lips+0.007
 cheeks+0.007
Neg logit contributions
str-0.013
 bakery-0.012
Today-0.012
 Taking-0.012
unt-0.012
Token that
Token ID326
Feature activation-2.88e+00

Pos logit contributions
 university+0.120
 bakery+0.119
 cinema+0.119
Kat+0.112
roads+0.112
Neg logit contributions
kin-0.107
ped-0.106
aces-0.100
led-0.098
icked-0.098
Token they
Token ID484
Feature activation+1.66e-01

Pos logit contributions
abella+0.058
 Elizabeth+0.058
 Caroline+0.057
Alice+0.057
unt+0.056
Neg logit contributions
kin-0.076
ped-0.070
 elbow-0.069
kins-0.069
 toes-0.067
Token should
Token ID815
Feature activation-4.50e-01

Pos logit contributions
kin+0.004
ped+0.004
uc+0.004
 elbow+0.004
kins+0.004
Neg logit contributions
 Elizabeth-0.004
 Mary-0.004
 Caroline-0.004
 Lib-0.004
 Alice-0.004
Token take
Token ID1011
Feature activation+3.62e-02

Pos logit contributions
 Elizabeth+0.009
 Caroline+0.009
 Mary+0.009
 Lib+0.008
 Alice+0.008
Neg logit contributions
kin-0.013
ped-0.012
uc-0.012
 elbow-0.012
kins-0.012
Token the
Token ID262
Feature activation-1.10e+00

Pos logit contributions
ped+0.001
kin+0.001
ags+0.001
bit+0.001
ps+0.001
Neg logit contributions
 Lib-0.001
 �-0.001
 bakery-0.001
abella-0.001
 Elizabeth-0.001
Token what
Token ID644
Feature activation-8.88e-01

Pos logit contributions
 Elizabeth+0.014
 Caroline+0.014
 Mary+0.014
 Lib+0.013
 Alice+0.013
Neg logit contributions
kin-0.021
ped-0.020
uc-0.020
 elbow-0.019
kins-0.019
Token happened
Token ID3022
Feature activation-1.06e+00

Pos logit contributions
 Caroline+0.021
 Elizabeth+0.021
 Lib+0.020
 university+0.020
 Mary+0.020
Neg logit contributions
kin-0.022
ped-0.021
uc-0.021
 elbow-0.020
led-0.020
Token.
Token ID13
Feature activation-5.04e-01

Pos logit contributions
 Elizabeth+0.034
 Caroline+0.034
abella+0.033
 Carol+0.033
 Lib+0.033
Neg logit contributions
ped-0.017
kin-0.016
uc-0.015
led-0.015
kins-0.014
Token She
Token ID1375
Feature activation+7.28e-01

Pos logit contributions
 university+0.011
abella+0.011
 foreign+0.011
 Caroline+0.011
 certain+0.011
Neg logit contributions
ped-0.011
 lips-0.011
 paws-0.011
uc-0.011
 cheeks-0.010
Token told
Token ID1297
Feature activation+6.77e-02

Pos logit contributions
ped+0.015
uc+0.014
led+0.014
kin+0.014
icked+0.013
Neg logit contributions
 university-0.021
 Caroline-0.020
 Lib-0.020
 Elizabeth-0.019
 Mary-0.019
Token Tim
Token ID5045
Feature activation-4.85e+00

Pos logit contributions
ped+0.001
kin+0.001
 elbow+0.001
uc+0.001
ps+0.001
Neg logit contributions
 Caroline-0.002
 university-0.002
 Elizabeth-0.002
abella-0.002
 Lib-0.002
Tokenmy
Token ID1820
Feature activation-3.88e+00

Pos logit contributions
 Caroline+0.111
 university+0.110
 bakery+0.107
 Lib+0.107
Kat+0.106
Neg logit contributions
ped-0.124
kin-0.116
uc-0.111
led-0.110
aces-0.110
Token,
Token ID11
Feature activation-1.09e+00

Pos logit contributions
 Caroline+0.094
 Carol+0.089
 Lib+0.089
 Elizabeth+0.089
Kat+0.088
Neg logit contributions
ped-0.092
uc-0.084
kin-0.083
icked-0.083
led-0.082
Token "
Token ID366
Feature activation-4.33e-01

Pos logit contributions
 Caroline+0.020
 university+0.019
abella+0.019
 Lib+0.019
 Elizabeth+0.018
Neg logit contributions
ped-0.032
kin-0.031
 elbow-0.031
uc-0.030
 lips-0.029
TokenI
Token ID40
Feature activation+6.83e-02

Pos logit contributions
 Caroline+0.010
 Elizabeth+0.009
 Mary+0.009
 university+0.009
 Lib+0.009
Neg logit contributions
kin-0.011
ped-0.011
uc-0.011
 elbow-0.010
led-0.010
Token told
Token ID1297
Feature activation-7.78e-01

Pos logit contributions
kin+0.002
ped+0.002
uc+0.002
 elbow+0.002
led+0.001
Neg logit contributions
 Caroline-0.002
 Elizabeth-0.002
 Mary-0.002
 Lib-0.002
 university-0.002
Token very
Token ID845
Feature activation+2.55e-01

Pos logit contributions
 Caroline+0.030
 Elizabeth+0.029
 Mary+0.029
 Carol+0.028
 Lib+0.028
Neg logit contributions
kin-0.036
ped-0.034
uc-0.034
 elbow-0.033
agged-0.032
Token disappointed
Token ID11679
Feature activation+1.83e+00

Pos logit contributions
kin+0.007
ped+0.007
uc+0.007
 elbow+0.007
aces+0.007
Neg logit contributions
 Elizabeth-0.005
 Mary-0.005
 Caroline-0.005
 university-0.005
 Alice-0.005
Token.
Token ID13
Feature activation-1.87e-01

Pos logit contributions
kin+0.034
ped+0.030
uc+0.030
kins+0.029
led+0.029
Neg logit contributions
 Elizabeth-0.052
 Caroline-0.052
abella-0.051
 Lib-0.051
 university-0.050
Token She
Token ID1375
Feature activation+8.09e-01

Pos logit contributions
str+0.004
abella+0.004
 Lib+0.004
 Karen+0.004
 Elizabeth+0.004
Neg logit contributions
 heads-0.004
ped-0.004
 fingers-0.004
 cheeks-0.004
 lips-0.004
Token told
Token ID1297
Feature activation+4.14e-01

Pos logit contributions
ped+0.018
kin+0.017
uc+0.017
led+0.017
icked+0.016
Neg logit contributions
 Caroline-0.021
 Elizabeth-0.020
 Lib-0.020
 university-0.020
 Mary-0.020
Token Lisa
Token ID15378
Feature activation-4.83e+00

Pos logit contributions
ped+0.011
kin+0.010
ps+0.010
 fingers+0.010
aces+0.010
Neg logit contributions
str-0.008
 Lib-0.008
 Karen-0.008
 bakery-0.008
 �-0.008
Token that
Token ID326
Feature activation-2.67e+00

Pos logit contributions
 bakery+0.118
 cinema+0.117
 Caroline+0.117
 university+0.115
 Lib+0.115
Neg logit contributions
ped-0.103
kin-0.102
icked-0.098
led-0.097
aces-0.095
Token if
Token ID611
Feature activation+9.78e-02

Pos logit contributions
 Caroline+0.060
Alice+0.058
 Carol+0.057
 Lib+0.057
 Elizabeth+0.057
Neg logit contributions
kin-0.066
kins-0.063
 cr-0.061
uc-0.061
 fingers-0.060
Token she
Token ID673
Feature activation+1.20e-01

Pos logit contributions
kin+0.003
kins+0.002
uc+0.002
ped+0.002
 elbow+0.002
Neg logit contributions
 �-0.002
 Lib-0.002
 Karen-0.002
 bakery-0.002
unt-0.002
Token wanted
Token ID2227
Feature activation-1.32e+00

Pos logit contributions
kin+0.003
ped+0.003
uc+0.003
ags+0.003
agged+0.003
Neg logit contributions
 Mary-0.003
 Elizabeth-0.003
abella-0.002
 Alice-0.002
 Caroline-0.002
Token a
Token ID257
Feature activation-5.16e-01

Pos logit contributions
 Caroline+0.030
 Lib+0.029
 Karen+0.029
 Carol+0.029
Alice+0.029
Neg logit contributions
kin-0.031
uc-0.031
 fingers-0.030
ped-0.030
 beads-0.029
Token nearby
Token ID6716
Feature activation-1.02e+00

Pos logit contributions
 Caroline+0.033
abella+0.033
 Paula+0.032
 Elizabeth+0.032
 Carol+0.032
Neg logit contributions
kin-0.042
ped-0.041
uc-0.039
 elbow-0.039
kins-0.038
Token garage
Token ID15591
Feature activation-6.34e-01

Pos logit contributions
abella+0.022
 Paula+0.021
 Lib+0.021
 Elizabeth+0.021
 Caroline+0.021
Neg logit contributions
kin-0.026
ped-0.023
uc-0.023
 paws-0.023
 lips-0.023
Token.
Token ID13
Feature activation-5.11e-01

Pos logit contributions
 Elizabeth+0.020
abella+0.020
 Caroline+0.019
 Alice+0.019
 Heidi+0.019
Neg logit contributions
kin-0.010
ped-0.010
kins-0.009
ked-0.009
uc-0.009
Token She
Token ID1375
Feature activation+1.28e+00

Pos logit contributions
 cinema+0.013
abella+0.013
 bakery+0.013
unt+0.013
str+0.012
Neg logit contributions
 fingers-0.009
ped-0.009
 lips-0.009
 hands-0.009
icked-0.009
Token told
Token ID1297
Feature activation+8.36e-01

Pos logit contributions
kin+0.032
ped+0.032
uc+0.030
led+0.029
kins+0.029
Neg logit contributions
 Elizabeth-0.029
 university-0.029
 Caroline-0.028
 Lib-0.028
 Mary-0.028
Token Josh
Token ID8518
Feature activation-4.83e+00

Pos logit contributions
ped+0.019
ps+0.018
 fingers+0.016
 lips+0.016
 hands+0.016
Neg logit contributions
 bakery-0.022
Today-0.021
 cinema-0.020
unt-0.020
str-0.020
Token that
Token ID326
Feature activation-2.73e+00

Pos logit contributions
 university+0.121
 bakery+0.121
 cinema+0.116
Kat+0.114
roads+0.112
Neg logit contributions
ped-0.109
kin-0.103
icked-0.100
ked-0.098
uc-0.098
Token she
Token ID673
Feature activation-2.05e-01

Pos logit contributions
abella+0.062
Alice+0.061
 Caroline+0.059
unt+0.059
itating+0.057
Neg logit contributions
kin-0.064
kins-0.063
ped-0.060
 cr-0.058
 beads-0.058
Token thought
Token ID1807
Feature activation-1.62e+00

Pos logit contributions
 Elizabeth+0.005
 Lib+0.005
 university+0.005
 Caroline+0.005
 Alice+0.005
Neg logit contributions
kin-0.005
ped-0.005
uc-0.004
led-0.004
kins-0.004
Token the
Token ID262
Feature activation-1.24e+00

Pos logit contributions
 Lib+0.034
 Caroline+0.033
 Elizabeth+0.033
Kat+0.033
 bakery+0.033
Neg logit contributions
kin-0.044
kins-0.040
ped-0.040
aces-0.039
led-0.037
Token pie
Token ID2508
Feature activation-2.07e+00

Pos logit contributions
 Caroline+0.033
abella+0.032
 Elizabeth+0.032
 Carol+0.031
 Paula+0.031
Neg logit contributions
kin-0.027
ped-0.027
uc-0.026
kins-0.025
bit-0.024
Token kitchen
Token ID9592
Feature activation+1.16e-01

Pos logit contributions
 Elizabeth+0.011
 Mary+0.011
 Caroline+0.011
abella+0.011
 Alice+0.011
Neg logit contributions
kin-0.013
ped-0.012
uc-0.012
 elbow-0.011
kins-0.011
Token drawer
Token ID33451
Feature activation+1.74e-01

Pos logit contributions
kin+0.003
ped+0.003
uc+0.003
 elbow+0.003
kins+0.003
Neg logit contributions
 Elizabeth-0.003
 Caroline-0.003
 Mary-0.003
 Lib-0.003
 Alice-0.003
Token.
Token ID13
Feature activation-1.15e+00

Pos logit contributions
kin+0.004
ped+0.004
uc+0.004
 elbow+0.004
kins+0.004
Neg logit contributions
 Elizabeth-0.004
 Mary-0.004
 Caroline-0.004
 Lib-0.004
 Alice-0.004
Token She
Token ID1375
Feature activation+9.61e-01

Pos logit contributions
 further+0.027
 foreign+0.026
 brewing+0.026
 certain+0.026
abella+0.026
Neg logit contributions
aces-0.022
 lips-0.021
 paws-0.021
 hands-0.021
agged-0.021
Token told
Token ID1297
Feature activation+9.12e-02

Pos logit contributions
kin+0.023
ped+0.022
uc+0.021
led+0.020
 elbow+0.020
Neg logit contributions
 Elizabeth-0.023
 Caroline-0.023
 university-0.023
 Mary-0.023
 Lib-0.023
Token Tom
Token ID4186
Feature activation-4.83e+00

Pos logit contributions
ped+0.002
 elbow+0.002
ps+0.002
 fingers+0.001
bit+0.001
Neg logit contributions
Kat-0.002
itating-0.002
str-0.002
 bakery-0.002
 university-0.002
Token,
Token ID11
Feature activation-9.41e-01

Pos logit contributions
 bakery+0.117
Kat+0.115
 cinema+0.114
 university+0.111
 Caroline+0.111
Neg logit contributions
ped-0.111
kin-0.105
icked-0.102
aces-0.102
uc-0.100
Token "
Token ID366
Feature activation-5.49e-01

Pos logit contributions
 Caroline+0.017
 Elizabeth+0.017
abella+0.016
 Lib+0.016
 university+0.016
Neg logit contributions
kin-0.027
ped-0.027
uc-0.025
 elbow-0.025
 lips-0.025
TokenI
Token ID40
Feature activation+8.30e-01

Pos logit contributions
 university+0.012
 Elizabeth+0.012
 Caroline+0.012
 �+0.012
 Mary+0.012
Neg logit contributions
kin-0.013
ped-0.013
uc-0.013
aces-0.012
ater-0.012
Token think
Token ID892
Feature activation-1.16e+00

Pos logit contributions
kin+0.020
uc+0.019
ped+0.018
kins+0.017
 elbow+0.017
Neg logit contributions
 Elizabeth-0.021
 Caroline-0.020
 university-0.020
 Lib-0.020
 Alice-0.020
Token I
Token ID314
Feature activation+7.72e-01

Pos logit contributions
 Elizabeth+0.027
 Caroline+0.027
 Lib+0.027
Kat+0.026
abella+0.026
Neg logit contributions
kin-0.028
ped-0.026
aces-0.025
kins-0.025
uc-0.025
Token the
Token ID262
Feature activation-1.57e+00

Pos logit contributions
 Caroline+0.008
 Elizabeth+0.008
abella+0.008
 �+0.008
 Lib+0.008
Neg logit contributions
kin-0.007
ped-0.007
 elbow-0.007
ps-0.007
ags-0.007
Token vet
Token ID20202
Feature activation-7.83e-01

Pos logit contributions
 Karen+0.026
 Caroline+0.026
 Elizabeth+0.026
 Paula+0.026
 Lib+0.026
Neg logit contributions
kin-0.047
 elbow-0.046
uc-0.043
ped-0.042
 beads-0.042
Token.
Token ID13
Feature activation-1.09e+00

Pos logit contributions
 Caroline+0.022
 Elizabeth+0.022
abella+0.022
 Lib+0.022
Kat+0.022
Neg logit contributions
kin-0.015
ped-0.014
uc-0.014
kins-0.013
aces-0.013
Token They
Token ID1119
Feature activation+6.57e-01

Pos logit contributions
 Lib+0.022
 Caroline+0.022
abella+0.022
 university+0.022
 Elizabeth+0.021
Neg logit contributions
kin-0.029
ped-0.028
uc-0.027
 elbow-0.026
ked-0.026
Token told
Token ID1297
Feature activation+2.03e-01

Pos logit contributions
kin+0.016
ped+0.016
uc+0.016
 elbow+0.015
kins+0.015
Neg logit contributions
 Elizabeth-0.015
 Mary-0.015
 Caroline-0.015
 Lib-0.015
 Alice-0.015
Token Lily
Token ID20037
Feature activation-4.82e+00

Pos logit contributions
kin+0.005
ped+0.005
uc+0.005
 elbow+0.005
bit+0.004
Neg logit contributions
 Elizabeth-0.005
 Caroline-0.005
 Lib-0.005
 university-0.005
 Mary-0.005
Token and
Token ID290
Feature activation-8.34e-02

Pos logit contributions
 university+0.127
 Caroline+0.127
 Lib+0.123
Kat+0.122
 Elizabeth+0.121
Neg logit contributions
ped-0.104
kin-0.100
uc-0.095
aces-0.092
led-0.092
Token Max
Token ID5436
Feature activation-3.36e+00

Pos logit contributions
 Mary+0.002
 Elizabeth+0.002
 Alice+0.002
 Caroline+0.002
abella+0.002
Neg logit contributions
kin-0.002
ped-0.002
uc-0.002
kins-0.002
agged-0.002
Token to
Token ID284
Feature activation+3.97e-01

Pos logit contributions
 Caroline+0.069
 Elizabeth+0.068
Kat+0.068
abella+0.068
 Lib+0.066
Neg logit contributions
ped-0.089
kin-0.086
uc-0.085
icked-0.083
aces-0.081
Token stay
Token ID2652
Feature activation-5.19e-01

Pos logit contributions
kin+0.009
ped+0.009
uc+0.009
led+0.008
kins+0.008
Neg logit contributions
 Mary-0.010
 Elizabeth-0.010
 Caroline-0.010
 Alice-0.010
 Karen-0.010
Token with
Token ID351
Feature activation-1.04e+00

Pos logit contributions
 Elizabeth+0.011
 Mary+0.011
 Caroline+0.011
 Alice+0.011
 university+0.011
Neg logit contributions
kin-0.014
ped-0.013
uc-0.013
aces-0.012
 elbow-0.012
Token not
Token ID407
Feature activation+3.25e-01

Pos logit contributions
 Elizabeth+0.024
 Caroline+0.024
 Mary+0.023
 Lib+0.023
 university+0.023
Neg logit contributions
kin-0.035
ped-0.035
uc-0.033
 elbow-0.032
agged-0.032
Token happy
Token ID3772
Feature activation+2.62e-01

Pos logit contributions
kin+0.010
ped+0.010
uc+0.010
bit+0.009
led+0.009
Neg logit contributions
 Mary-0.006
 Alice-0.005
 university-0.005
 Elizabeth-0.005
 Caroline-0.005
Token.
Token ID13
Feature activation-2.33e-02

Pos logit contributions
kin+0.006
ped+0.006
uc+0.006
 elbow+0.006
kins+0.006
Neg logit contributions
 Mary-0.006
 Elizabeth-0.006
 Caroline-0.006
 Alice-0.006
 university-0.006
Token She
Token ID1375
Feature activation+5.01e-01

Pos logit contributions
 Caroline+0.000
abella+0.000
 Elizabeth+0.000
 Lib+0.000
 Paula+0.000
Neg logit contributions
kin-0.001
ped-0.001
uc-0.001
aces-0.001
agged-0.001
Token told
Token ID1297
Feature activation+1.54e-01

Pos logit contributions
ped+0.011
kin+0.011
uc+0.010
led+0.010
icked+0.010
Neg logit contributions
 university-0.013
 Caroline-0.013
 Elizabeth-0.013
 Lib-0.013
 Mary-0.013
Token Tom
Token ID4186
Feature activation-4.81e+00

Pos logit contributions
ped+0.004
kin+0.004
 elbow+0.003
uc+0.003
bit+0.003
Neg logit contributions
 Caroline-0.003
 Elizabeth-0.003
 �-0.003
Kat-0.003
 certain-0.003
Token and
Token ID290
Feature activation-2.92e-01

Pos logit contributions
 Caroline+0.110
Kat+0.109
 university+0.108
 bakery+0.108
 Lib+0.106
Neg logit contributions
ped-0.120
kin-0.112
aces-0.110
led-0.109
uc-0.109
Token Lily
Token ID20037
Feature activation-3.09e+00

Pos logit contributions
 Elizabeth+0.006
 Caroline+0.006
 Mary+0.006
 university+0.006
 Lib+0.006
Neg logit contributions
kin-0.008
ped-0.008
uc-0.008
 elbow-0.008
kins-0.007
Token to
Token ID284
Feature activation+6.08e-01

Pos logit contributions
 Caroline+0.062
 Elizabeth+0.062
abella+0.060
 Lib+0.059
 Mary+0.059
Neg logit contributions
ped-0.085
kin-0.082
uc-0.081
 elbow-0.079
aces-0.079
Token stop
Token ID2245
Feature activation-1.63e+00

Pos logit contributions
kin+0.015
ped+0.014
uc+0.014
led+0.013
kins+0.013
Neg logit contributions
 Mary-0.015
 Elizabeth-0.014
 Caroline-0.014
 Alice-0.014
 Karen-0.014
Token fighting
Token ID4330
Feature activation-7.49e-01

Pos logit contributions
 Elizabeth+0.035
 Caroline+0.034
 Lib+0.034
abella+0.033
 Mary+0.033
Neg logit contributions
kin-0.043
ped-0.043
uc-0.041
 elbow-0.041
ps-0.039
Token very
Token ID845
Feature activation-1.82e-01

Pos logit contributions
 Caroline+0.037
 Elizabeth+0.037
 Mary+0.036
 Lib+0.035
 Carol+0.035
Neg logit contributions
kin-0.045
ped-0.042
uc-0.041
ps-0.040
 elbow-0.040
Token upset
Token ID9247
Feature activation+1.36e+00

Pos logit contributions
 Caroline+0.004
 Elizabeth+0.004
 Mary+0.004
 Lib+0.004
 university+0.004
Neg logit contributions
kin-0.005
uc-0.005
ped-0.005
kins-0.004
led-0.004
Token.
Token ID13
Feature activation-1.05e+00

Pos logit contributions
kin+0.027
ped+0.026
uc+0.025
led+0.024
aces+0.023
Neg logit contributions
 Elizabeth-0.038
 Caroline-0.038
 Lib-0.037
 university-0.037
abella-0.037
Token She
Token ID1375
Feature activation+4.32e-01

Pos logit contributions
 further+0.025
abella+0.025
 cinema+0.024
saw+0.024
 university+0.023
Neg logit contributions
 cheeks-0.021
 fingers-0.021
 lips-0.020
ped-0.020
 paws-0.020
Token told
Token ID1297
Feature activation+2.56e-01

Pos logit contributions
ped+0.009
kin+0.009
uc+0.009
led+0.009
icked+0.008
Neg logit contributions
 Caroline-0.011
 university-0.011
 Lib-0.011
 Elizabeth-0.011
 Mary-0.011
Token Mary
Token ID5335
Feature activation-4.81e+00

Pos logit contributions
ped+0.007
ps+0.007
 fingers+0.006
kin+0.006
 cheeks+0.006
Neg logit contributions
itating-0.005
 foreign-0.005
 university-0.005
 cinema-0.005
 Taking-0.005
Token that
Token ID326
Feature activation-2.87e+00

Pos logit contributions
 cinema+0.112
 Caroline+0.112
 bakery+0.110
Kat+0.110
 university+0.110
Neg logit contributions
ped-0.112
kin-0.105
icked-0.104
aces-0.101
uc-0.101
Token she
Token ID673
Feature activation-3.71e-01

Pos logit contributions
 Caroline+0.061
 Elizabeth+0.059
abella+0.058
Alice+0.058
 Lib+0.057
Neg logit contributions
kin-0.071
ped-0.066
 elbow-0.066
 fingers-0.066
 toes-0.065
Token needs
Token ID2476
Feature activation+1.43e-01

Pos logit contributions
 Caroline+0.009
 Elizabeth+0.009
 Lib+0.009
 Mary+0.009
 university+0.009
Neg logit contributions
kin-0.009
ped-0.009
uc-0.008
led-0.008
 elbow-0.008
Token to
Token ID284
Feature activation+7.14e-01

Pos logit contributions
kin+0.005
ped+0.005
uc+0.005
kins+0.005
bit+0.005
Neg logit contributions
 Mary-0.002
 Alice-0.002
 �-0.002
 Elizabeth-0.002
 university-0.002
Token learn
Token ID2193
Feature activation-1.04e+00

Pos logit contributions
kin+0.018
ped+0.017
uc+0.017
led+0.017
aces+0.016
Neg logit contributions
 Mary-0.016
 university-0.016
 Alice-0.016
 Elizabeth-0.016
 Karen-0.016
Token was
Token ID373
Feature activation-2.86e+00

Pos logit contributions
 Caroline+0.025
 Elizabeth+0.025
 university+0.024
abella+0.024
 Carol+0.024
Neg logit contributions
kin-0.022
ped-0.021
uc-0.021
kins-0.020
led-0.020
Token happening
Token ID5836
Feature activation-9.30e-02

Pos logit contributions
 Caroline+0.042
 Elizabeth+0.042
 Mary+0.042
 Carol+0.040
abella+0.039
Neg logit contributions
kin-0.097
ped-0.096
uc-0.091
led-0.091
kins-0.090
Token.
Token ID13
Feature activation-3.56e-01

Pos logit contributions
 Elizabeth+0.003
abella+0.003
 Caroline+0.003
 Carol+0.003
 university+0.003
Neg logit contributions
ped-0.001
kin-0.001
led-0.001
uc-0.001
kins-0.001
Token She
Token ID1375
Feature activation+9.08e-01

Pos logit contributions
abella+0.008
 Caroline+0.008
 university+0.007
 Elizabeth+0.007
 Carol+0.007
Neg logit contributions
ped-0.009
uc-0.009
 lips-0.008
 paws-0.008
icked-0.008
Token told
Token ID1297
Feature activation+7.35e-02

Pos logit contributions
ped+0.020
uc+0.020
kin+0.019
led+0.018
icked+0.018
Neg logit contributions
 university-0.024
 Caroline-0.024
 Elizabeth-0.023
 Lib-0.023
 Heidi-0.023
Token Sally
Token ID25737
Feature activation-4.78e+00

Pos logit contributions
ped+0.002
kin+0.002
uc+0.002
bit+0.002
ked+0.002
Neg logit contributions
 Caroline-0.002
 university-0.002
 Elizabeth-0.002
abella-0.002
 Carol-0.002
Token to
Token ID284
Feature activation+9.03e-01

Pos logit contributions
roads+0.111
 Caroline+0.110
 university+0.109
Kat+0.108
 cinema+0.108
Neg logit contributions
ped-0.113
kin-0.105
uc-0.105
icked-0.103
aces-0.101
Token give
Token ID1577
Feature activation-1.49e+00

Pos logit contributions
kin+0.022
ped+0.022
uc+0.022
led+0.020
aces+0.020
Neg logit contributions
 Mary-0.021
 Elizabeth-0.021
 Caroline-0.021
 Alice-0.020
 university-0.020
Token the
Token ID262
Feature activation-1.16e+00

Pos logit contributions
 university+0.034
 Caroline+0.034
 Elizabeth+0.033
abella+0.033
 Carol+0.032
Neg logit contributions
ped-0.039
kin-0.036
uc-0.035
led-0.034
ked-0.034
Token toy
Token ID13373
Feature activation-2.26e-01

Pos logit contributions
 Elizabeth+0.026
 Mary+0.026
 Caroline+0.026
 Lib+0.025
abella+0.025
Neg logit contributions
kin-0.030
ped-0.029
uc-0.028
 elbow-0.027
kins-0.026
Token back
Token ID736
Feature activation-1.34e-01

Pos logit contributions
 Elizabeth+0.004
 Mary+0.004
 Caroline+0.004
 university+0.004
 Alice+0.004
Neg logit contributions
kin-0.007
ped-0.006
uc-0.006
 elbow-0.006
kins-0.006
Token family
Token ID1641
Feature activation-3.39e-01

Pos logit contributions
kin+0.040
ped+0.039
uc+0.038
 elbow+0.036
kins+0.036
Neg logit contributions
 Mary-0.035
 Elizabeth-0.035
 Caroline-0.035
 university-0.034
 Alice-0.034
Token history
Token ID2106
Feature activation+1.38e+00

Pos logit contributions
 Elizabeth+0.010
 Caroline+0.010
abella+0.009
 university+0.009
 Carol+0.009
Neg logit contributions
kin-0.006
ped-0.006
uc-0.006
kins-0.006
aces-0.006
Token.
Token ID13
Feature activation-5.25e-01

Pos logit contributions
kin+0.028
ped+0.026
uc+0.025
aces+0.025
kins+0.025
Neg logit contributions
 Elizabeth-0.038
 Caroline-0.037
abella-0.037
 Lib-0.037
 Heidi-0.036
Token She
Token ID1375
Feature activation+5.31e-01

Pos logit contributions
str+0.009
 Elizabeth+0.009
abella+0.009
 certain+0.009
 Caroline+0.009
Neg logit contributions
aces-0.013
 shorts-0.013
 paws-0.013
 lips-0.012
ped-0.012
Token told
Token ID1297
Feature activation+7.42e-01

Pos logit contributions
kin+0.012
ped+0.012
uc+0.011
led+0.011
kins+0.011
Neg logit contributions
 Elizabeth-0.013
 university-0.012
 Caroline-0.012
 Lib-0.012
 Heidi-0.012
Token Sarah
Token ID10490
Feature activation-4.78e+00

Pos logit contributions
ped+0.021
kin+0.020
uc+0.019
ps+0.019
 fingers+0.019
Neg logit contributions
 Elizabeth-0.013
 bakery-0.013
 Lib-0.013
 university-0.013
 Caroline-0.012
Token about
Token ID546
Feature activation-2.84e+00

Pos logit contributions
 university+0.107
 cinema+0.106
Kat+0.105
 bakery+0.104
 foreign+0.103
Neg logit contributions
ped-0.111
aces-0.106
kin-0.106
icked-0.101
uc-0.100
Token all
Token ID477
Feature activation-2.06e+00

Pos logit contributions
abella+0.064
Alice+0.064
 Elizabeth+0.063
unt+0.063
Kat+0.063
Neg logit contributions
ped-0.061
kin-0.060
 beads-0.058
uc-0.058
 elbow-0.056
Token the
Token ID262
Feature activation-4.65e-01

Pos logit contributions
abella+0.047
Kat+0.046
Alice+0.044
 Caroline+0.044
 Elizabeth+0.044
Neg logit contributions
ped-0.047
kin-0.045
led-0.043
aces-0.043
kins-0.043
Token brave
Token ID14802
Feature activation+1.29e-01

Pos logit contributions
 Elizabeth+0.012
 �+0.012
abella+0.012
 Caroline+0.011
 Heidi+0.011
Neg logit contributions
ped-0.010
kin-0.009
uc-0.009
kins-0.009
aces-0.009
Token and
Token ID290
Feature activation-7.90e-01

Pos logit contributions
kin+0.002
ped+0.002
uc+0.002
aces+0.002
led+0.002
Neg logit contributions
 Elizabeth-0.004
abella-0.004
 Caroline-0.004
 �-0.004
 Lib-0.004
Token her
Token ID607
Feature activation-2.14e-01

Pos logit contributions
abella+0.010
 Caroline+0.010
unt+0.010
 Elizabeth+0.010
 Lib+0.010
Neg logit contributions
ped-0.007
kin-0.007
ps-0.007
 elbow-0.007
uc-0.006
Token side
Token ID1735
Feature activation-2.05e+00

Pos logit contributions
 Caroline+0.005
abella+0.005
 Elizabeth+0.005
 Lib+0.005
 Mary+0.005
Neg logit contributions
kin-0.005
ped-0.005
uc-0.005
 elbow-0.005
kins-0.005
Token.
Token ID13
Feature activation-7.70e-01

Pos logit contributions
 Caroline+0.058
abella+0.058
 Elizabeth+0.058
 Lib+0.056
 university+0.055
Neg logit contributions
kin-0.040
ped-0.036
uc-0.035
 elbow-0.034
kins-0.034
Token She
Token ID1375
Feature activation-1.82e-01

Pos logit contributions
 university+0.015
abella+0.014
 Caroline+0.013
 foreign+0.013
 subway+0.013
Neg logit contributions
ped-0.020
 hands-0.020
 paws-0.020
 lips-0.019
icked-0.019
Token told
Token ID1297
Feature activation+5.60e-01

Pos logit contributions
 university+0.005
 Caroline+0.005
 Lib+0.005
 Elizabeth+0.005
 Mary+0.005
Neg logit contributions
ped-0.004
kin-0.004
uc-0.004
led-0.003
 elbow-0.003
Token Jane
Token ID12091
Feature activation-4.77e+00

Pos logit contributions
ped+0.013
kin+0.012
ps+0.011
icked+0.011
 elbow+0.011
Neg logit contributions
 Caroline-0.013
 university-0.013
abella-0.013
 Elizabeth-0.013
unt-0.013
Token it
Token ID340
Feature activation-1.91e+00

Pos logit contributions
 Caroline+0.109
 cinema+0.109
roads+0.108
 university+0.108
Kat+0.108
Neg logit contributions
ped-0.111
icked-0.106
kin-0.106
aces-0.103
uc-0.101
Token was
Token ID373
Feature activation-2.29e-01

Pos logit contributions
Kat+0.052
 university+0.052
 Caroline+0.051
 Lib+0.050
 festivals+0.049
Neg logit contributions
ped-0.042
led-0.038
kin-0.036
icked-0.036
uc-0.035
Token ok
Token ID12876
Feature activation-6.95e-01

Pos logit contributions
 Elizabeth+0.005
 Mary+0.005
 Caroline+0.004
 Alice+0.004
 university+0.004
Neg logit contributions
kin-0.006
ped-0.006
uc-0.006
 elbow-0.006
kins-0.006
Token,
Token ID11
Feature activation-2.96e+00

Pos logit contributions
abella+0.017
 Caroline+0.017
 Elizabeth+0.017
 Lib+0.016
 university+0.016
Neg logit contributions
kin-0.016
ped-0.015
uc-0.015
kins-0.014
aces-0.014
Token and
Token ID290
Feature activation-2.35e-01

Pos logit contributions
abella+0.065
 Caroline+0.063
 Elizabeth+0.060
 Lib+0.060
Kat+0.060
Neg logit contributions
ped-0.074
kin-0.074
 elbow-0.071
 fingers-0.070
aces-0.069
Token a
Token ID257
Feature activation-1.07e-01

Pos logit contributions
kin+0.049
ped+0.047
uc+0.047
kins+0.044
 elbow+0.044
Neg logit contributions
 Mary-0.043
 Elizabeth-0.043
 Caroline-0.043
 Alice-0.042
 Lib-0.042
Token suit
Token ID6050
Feature activation+1.20e+00

Pos logit contributions
 Elizabeth+0.003
 Mary+0.003
 Caroline+0.003
abella+0.003
 Lib+0.003
Neg logit contributions
kin-0.002
uc-0.002
ped-0.002
 elbow-0.002
kins-0.002
Token.
Token ID13
Feature activation+4.06e-01

Pos logit contributions
kin+0.029
ped+0.028
uc+0.027
 elbow+0.026
kins+0.026
Neg logit contributions
 Elizabeth-0.028
 Caroline-0.028
 Mary-0.028
 Lib-0.027
 Alice-0.027
Token He
Token ID679
Feature activation+1.20e+00

Pos logit contributions
 lips+0.007
 paws+0.007
aces+0.007
ped+0.007
 shorts+0.006
Neg logit contributions
 certain-0.011
abella-0.010
itating-0.010
 university-0.010
 arriving-0.010
Token told
Token ID1297
Feature activation+6.98e-01

Pos logit contributions
ped+0.024
led+0.023
kin+0.022
ked+0.022
uc+0.022
Neg logit contributions
 university-0.033
 Elizabeth-0.031
 Lib-0.031
 Heidi-0.031
 Caroline-0.030
Token Jill
Token ID22772
Feature activation-4.77e+00

Pos logit contributions
ped+0.014
kin+0.012
ps+0.012
 lips+0.012
 elbow+0.012
Neg logit contributions
itating-0.018
abella-0.018
 university-0.018
str-0.018
 Caroline-0.018
Token it
Token ID340
Feature activation-1.73e+00

Pos logit contributions
 university+0.101
 bakery+0.101
 Caroline+0.100
 Elizabeth+0.100
 cinema+0.099
Neg logit contributions
ped-0.118
kin-0.117
aces-0.115
led-0.111
icked-0.108
Token was
Token ID373
Feature activation-1.81e-01

Pos logit contributions
Kat+0.043
 Elizabeth+0.041
 Caroline+0.041
 university+0.041
Alice+0.041
Neg logit contributions
ped-0.040
kin-0.037
led-0.036
aces-0.035
ked-0.034
Token a
Token ID257
Feature activation-1.68e+00

Pos logit contributions
 Caroline+0.004
 Elizabeth+0.004
 Lib+0.004
 Mary+0.004
 university+0.004
Neg logit contributions
kin-0.005
ped-0.004
uc-0.004
 elbow-0.004
aces-0.004
Token special
Token ID2041
Feature activation+8.06e-02

Pos logit contributions
 Elizabeth+0.042
 Lib+0.042
 Mary+0.042
 Caroline+0.041
 Karen+0.041
Neg logit contributions
kin-0.039
 elbow-0.034
uc-0.034
 seaw-0.032
ater-0.031
Token dinosaur
Token ID28257
Feature activation-1.02e+00

Pos logit contributions
kin+0.002
uc+0.002
ped+0.002
 elbow+0.002
kins+0.001
Neg logit contributions
 Elizabeth-0.002
 Caroline-0.002
abella-0.002
 Lib-0.002
 Alice-0.002
Token She
Token ID1375
Feature activation+9.46e-01

Pos logit contributions
ped+0.007
kin+0.007
icked+0.006
uc+0.006
agged+0.006
Neg logit contributions
 Elizabeth-0.005
abella-0.005
unt-0.005
 Kathy-0.005
 Caroline-0.005
Token was
Token ID373
Feature activation-1.19e+00

Pos logit contributions
kin+0.023
ped+0.023
uc+0.022
led+0.021
kins+0.021
Neg logit contributions
 university-0.023
 Caroline-0.022
 Elizabeth-0.022
 Lib-0.022
 Mary-0.022
Token unhappy
Token ID19283
Feature activation+6.01e-01

Pos logit contributions
 Elizabeth+0.021
 Caroline+0.021
 Mary+0.021
 Lib+0.020
 university+0.020
Neg logit contributions
kin-0.036
ped-0.035
uc-0.034
 elbow-0.033
kins-0.032
Token and
Token ID290
Feature activation-8.30e-01

Pos logit contributions
kin+0.014
ped+0.013
uc+0.013
kins+0.012
 elbow+0.012
Neg logit contributions
 Elizabeth-0.015
 Caroline-0.015
 Lib-0.014
abella-0.014
 university-0.014
Token told
Token ID1297
Feature activation+4.88e-01

Pos logit contributions
 Elizabeth+0.018
 Caroline+0.018
 Mary+0.017
 university+0.017
 Lib+0.017
Neg logit contributions
kin-0.022
ped-0.021
uc-0.021
 elbow-0.020
led-0.020
Token Jill
Token ID22772
Feature activation-4.76e+00

Pos logit contributions
kin+0.011
ped+0.011
uc+0.010
kins+0.010
ps+0.010
Neg logit contributions
 Elizabeth-0.012
unt-0.012
 Caroline-0.012
Alice-0.012
 Lib-0.012
Token she
Token ID673
Feature activation-1.28e-01

Pos logit contributions
 Caroline+0.105
 cinema+0.105
 bakery+0.104
 Elizabeth+0.103
 Carol+0.102
Neg logit contributions
kin-0.116
ped-0.113
aces-0.109
led-0.107
icked-0.106
Token had
Token ID550
Feature activation-4.50e-02

Pos logit contributions
 Caroline+0.003
 Elizabeth+0.003
 university+0.003
 Lib+0.003
 Mary+0.003
Neg logit contributions
ped-0.003
kin-0.003
uc-0.003
led-0.003
icked-0.003
Token to
Token ID284
Feature activation+1.99e+00

Pos logit contributions
 Mary+0.001
 Elizabeth+0.001
 Caroline+0.001
 university+0.001
 Alice+0.001
Neg logit contributions
kin-0.001
ped-0.001
uc-0.001
kins-0.001
led-0.001
Token clean
Token ID3424
Feature activation-8.13e-01

Pos logit contributions
ped+0.055
led+0.054
uc+0.054
ked+0.053
kin+0.052
Neg logit contributions
 university-0.042
 Mary-0.041
 Alice-0.040
 Karen-0.040
 Elizabeth-0.039
Token it
Token ID340
Feature activation+7.58e-02

Pos logit contributions
 Elizabeth+0.016
 Caroline+0.016
 Lib+0.016
unt+0.016
 �+0.015
Neg logit contributions
kin-0.022
ped-0.021
 elbow-0.021
kins-0.021
uc-0.021
Token and
Token ID290
Feature activation-8.52e-01

Pos logit contributions
 Elizabeth+0.012
 Caroline+0.012
abella+0.011
 Mary+0.011
 Lib+0.011
Neg logit contributions
kin-0.012
ped-0.012
uc-0.011
aces-0.011
led-0.011
Token sad
Token ID6507
Feature activation+1.78e+00

Pos logit contributions
 Elizabeth+0.017
 Caroline+0.017
 Mary+0.017
 Lib+0.017
 university+0.016
Neg logit contributions
kin-0.023
ped-0.023
uc-0.023
 elbow-0.022
led-0.021
Token.
Token ID13
Feature activation-3.97e-01

Pos logit contributions
kin+0.051
ped+0.048
uc+0.047
bit+0.046
kins+0.045
Neg logit contributions
 �-0.037
 Mary-0.036
 university-0.036
 about-0.035
 Alice-0.035
Token She
Token ID1375
Feature activation-1.67e-01

Pos logit contributions
 Caroline+0.007
abella+0.007
 Elizabeth+0.007
 Lib+0.007
 Mary+0.007
Neg logit contributions
ped-0.011
kin-0.011
uc-0.011
agged-0.011
aces-0.011
Token tells
Token ID4952
Feature activation+4.45e-01

Pos logit contributions
 Caroline+0.004
 Mary+0.004
 Elizabeth+0.004
 Lib+0.004
 Alice+0.004
Neg logit contributions
ped-0.004
kin-0.004
uc-0.004
led-0.004
 elbow-0.004
Token Tom
Token ID4186
Feature activation-4.75e+00

Pos logit contributions
ped+0.010
kin+0.009
ps+0.009
uc+0.009
ags+0.009
Neg logit contributions
 �-0.011
 Caroline-0.010
 Elizabeth-0.010
 university-0.010
Kat-0.010
Token and
Token ID290
Feature activation-2.43e-01

Pos logit contributions
 Caroline+0.120
 university+0.119
 Elizabeth+0.116
 Lib+0.115
 Paula+0.115
Neg logit contributions
ped-0.110
kin-0.106
uc-0.101
led-0.100
aces-0.099
Token Lily
Token ID20037
Feature activation-3.42e+00

Pos logit contributions
 Mary+0.004
 Elizabeth+0.004
 Alice+0.004
 Caroline+0.004
 �+0.004
Neg logit contributions
kin-0.008
uc-0.008
ped-0.007
kins-0.007
bit-0.007
Token to
Token ID284
Feature activation+8.32e-01

Pos logit contributions
 Caroline+0.071
 Elizabeth+0.069
abella+0.067
 Carol+0.066
 Lib+0.066
Neg logit contributions
ped-0.092
uc-0.089
kin-0.088
 elbow-0.086
led-0.085
Token sit
Token ID1650
Feature activation-1.03e+00

Pos logit contributions
kin+0.021
ped+0.020
uc+0.020
led+0.020
aces+0.019
Neg logit contributions
 Mary-0.020
 Alice-0.019
 Karen-0.019
 Elizabeth-0.019
 Caroline-0.019
Token on
Token ID319
Feature activation-6.65e-01

Pos logit contributions
 Mary+0.020
 Elizabeth+0.020
 Caroline+0.020
 university+0.020
 Alice+0.020
Neg logit contributions
kin-0.029
ped-0.028
uc-0.027
bit-0.027
aces-0.027
Token was
Token ID373
Feature activation-1.17e+00

Pos logit contributions
 Caroline+0.036
 Elizabeth+0.036
 Paula+0.036
 Carol+0.036
abella+0.035
Neg logit contributions
kin-0.037
ped-0.034
kins-0.033
uc-0.032
led-0.032
Token wrong
Token ID2642
Feature activation+1.11e+00

Pos logit contributions
 Mary+0.010
 Elizabeth+0.009
 Caroline+0.009
 university+0.009
 Lib+0.009
Neg logit contributions
kin-0.047
ped-0.045
uc-0.045
bit-0.044
 elbow-0.043
Token.
Token ID13
Feature activation-1.15e+00

Pos logit contributions
kin+0.023
ped+0.021
uc+0.021
kins+0.020
aces+0.019
Neg logit contributions
 Elizabeth-0.031
 Caroline-0.031
 Lib-0.030
 university-0.030
 Mary-0.030
Token Spot
Token ID15899
Feature activation-7.99e-01

Pos logit contributions
 university+0.026
 certain+0.025
Kat+0.025
 Paula+0.024
abella+0.023
Neg logit contributions
ped-0.027
uc-0.027
 lips-0.026
 paws-0.026
ked-0.026
Token told
Token ID1297
Feature activation+4.91e-01

Pos logit contributions
 university+0.024
 Caroline+0.022
 Lib+0.022
 Paula+0.022
Kat+0.022
Neg logit contributions
ped-0.015
uc-0.014
kin-0.014
led-0.014
ked-0.013
Token Sam
Token ID3409
Feature activation-4.73e+00

Pos logit contributions
ped+0.010
kin+0.010
bit+0.009
ps+0.009
 elbow+0.009
Neg logit contributions
Kat-0.013
 bakery-0.013
 university-0.013
 Lib-0.013
 Paula-0.013
Token about
Token ID546
Feature activation-2.69e+00

Pos logit contributions
 bakery+0.110
 university+0.108
Kat+0.108
Alice+0.107
 Caroline+0.107
Neg logit contributions
ped-0.117
kin-0.110
icked-0.108
led-0.108
aces-0.104
Token his
Token ID465
Feature activation+2.76e-01

Pos logit contributions
Kat+0.075
 Caroline+0.072
 Elizabeth+0.071
abella+0.071
Alice+0.071
Neg logit contributions
 elbow-0.052
kin-0.051
ped-0.050
 beads-0.046
uc-0.045
Token lost
Token ID2626
Feature activation+1.41e+00

Pos logit contributions
uc+0.007
ped+0.007
kin+0.006
icked+0.006
led+0.006
Neg logit contributions
 �-0.006
 university-0.006
 certain-0.006
 Caroline-0.006
 Tammy-0.006
Token ball
Token ID2613
Feature activation-3.45e-01

Pos logit contributions
uc+0.032
ped+0.032
kin+0.032
led+0.031
agged+0.031
Neg logit contributions
 university-0.034
 Mary-0.033
 Caroline-0.033
 �-0.033
 Elizabeth-0.033
Token.
Token ID13
Feature activation-1.01e+00

Pos logit contributions
 Mary+0.006
 university+0.006
 Elizabeth+0.006
 �+0.006
 Alice+0.006
Neg logit contributions
kin-0.010
ped-0.010
uc-0.010
agged-0.009
bit-0.009
Token to
Token ID284
Feature activation-1.18e-01

Pos logit contributions
 Elizabeth+0.036
 Caroline+0.036
abella+0.036
 Lib+0.035
 Carol+0.034
Neg logit contributions
kin-0.047
uc-0.046
ped-0.046
kins-0.043
 elbow-0.043
Token help
Token ID1037
Feature activation-5.69e-01

Pos logit contributions
 Elizabeth+0.003
 Caroline+0.003
 Mary+0.003
 Lib+0.003
 university+0.003
Neg logit contributions
kin-0.003
ped-0.003
uc-0.003
 elbow-0.003
kins-0.003
Token.
Token ID13
Feature activation-3.29e-01

Pos logit contributions
 Caroline+0.015
 Elizabeth+0.015
 Lib+0.014
abella+0.014
 university+0.014
Neg logit contributions
kin-0.012
ped-0.012
uc-0.012
kins-0.011
led-0.011
Token She
Token ID1375
Feature activation+7.85e-01

Pos logit contributions
 Caroline+0.007
abella+0.007
 Elizabeth+0.007
Mary+0.007
str+0.007
Neg logit contributions
ped-0.007
 cheeks-0.007
 paws-0.007
uc-0.007
 lips-0.007
Token told
Token ID1297
Feature activation+4.28e-01

Pos logit contributions
uc+0.016
ped+0.016
kin+0.015
led+0.015
icked+0.015
Neg logit contributions
 university-0.022
 Caroline-0.022
 Lib-0.021
 Elizabeth-0.021
 Mary-0.021
Token Sarah
Token ID10490
Feature activation-4.73e+00

Pos logit contributions
ped+0.010
kin+0.010
uc+0.010
 elbow+0.009
led+0.009
Neg logit contributions
 Caroline-0.010
 Elizabeth-0.010
 Lib-0.009
 university-0.009
abella-0.009
Token that
Token ID326
Feature activation-2.82e+00

Pos logit contributions
roads+0.118
 Caroline+0.118
 cinema+0.117
Kat+0.116
 bakery+0.116
Neg logit contributions
ped-0.103
uc-0.095
icked-0.094
led-0.092
kin-0.091
Token it
Token ID340
Feature activation-1.49e+00

Pos logit contributions
 Caroline+0.059
 Elizabeth+0.058
abella+0.057
Alice+0.057
 Carol+0.055
Neg logit contributions
kin-0.073
ped-0.069
 beads-0.069
uc-0.068
 elbow-0.068
Token wasn
Token ID2492
Feature activation-1.08e+00

Pos logit contributions
 Caroline+0.030
 Mary+0.029
 Lib+0.029
 university+0.029
Kat+0.029
Neg logit contributions
ped-0.043
kin-0.041
led-0.040
uc-0.040
aces-0.038
Token't
Token ID470
Feature activation+7.93e-01

Pos logit contributions
 Elizabeth+0.016
 Caroline+0.015
 Alice+0.015
 university+0.015
 Karen+0.015
Neg logit contributions
kin-0.037
uc-0.035
ped-0.035
kins-0.034
 elbow-0.034
Token fair
Token ID3148
Feature activation+5.51e-01

Pos logit contributions
kin+0.023
uc+0.023
ped+0.022
 elbow+0.022
kins+0.021
Neg logit contributions
 Mary-0.015
 Alice-0.015
 Elizabeth-0.015
 Karen-0.015
 Lib-0.014
Token slide
Token ID10649
Feature activation-1.47e+00

Pos logit contributions
 elbow+0.007
ped+0.007
 toes+0.007
 fingers+0.007
 feet+0.007
Neg logit contributions
abella-0.009
 Elizabeth-0.009
Kat-0.009
unt-0.009
 Caroline-0.009
Token down
Token ID866
Feature activation-4.14e-01

Pos logit contributions
 Elizabeth+0.041
 Caroline+0.040
 university+0.039
Kat+0.039
abella+0.039
Neg logit contributions
kin-0.028
ped-0.028
 elbow-0.026
led-0.026
uc-0.026
Token.
Token ID13
Feature activation-5.66e-01

Pos logit contributions
 Elizabeth+0.011
abella+0.011
unt+0.011
 Caroline+0.011
Kat+0.011
Neg logit contributions
ped-0.008
kin-0.008
ps-0.007
aces-0.007
 elbow-0.007
Token Her
Token ID2332
Feature activation+1.67e+00

Pos logit contributions
 Elizabeth+0.011
 university+0.011
 Caroline+0.011
 foreign+0.010
Kat+0.010
Neg logit contributions
ped-0.015
uc-0.014
 paws-0.014
aces-0.014
 fingers-0.014
Token eyes
Token ID2951
Feature activation+1.71e+00

Pos logit contributions
kin+0.041
ped+0.039
uc+0.038
kins+0.036
led+0.036
Neg logit contributions
 Mary-0.040
 Caroline-0.039
 Elizabeth-0.039
 university-0.039
 Alice-0.038
Token spark
Token ID9009
Feature activation+4.66e+00

Pos logit contributions
kin+0.045
uc+0.042
ped+0.041
 elbow+0.041
kins+0.041
Neg logit contributions
 Mary-0.037
 �-0.037
 Karen-0.036
 Lib-0.036
 Alice-0.036
Tokenled
Token ID992
Feature activation+8.36e-01

Pos logit contributions
kin+0.064
ands+0.055
 fiber+0.054
 elbow+0.053
ocol+0.053
Neg logit contributions
 Caroline-0.151
 about-0.151
 Mary-0.150
 Alice-0.149
 Paula-0.149
Token with
Token ID351
Feature activation+5.76e-01

Pos logit contributions
kin+0.021
ped+0.020
uc+0.019
led+0.019
 elbow+0.019
Neg logit contributions
 Elizabeth-0.019
 Caroline-0.019
 Mary-0.019
 university-0.019
abella-0.019
Token excitement
Token ID14067
Feature activation+1.91e+00

Pos logit contributions
kin+0.016
uc+0.015
kins+0.015
ands+0.014
 elbow+0.013
Neg logit contributions
 Mary-0.014
 �-0.013
 Caroline-0.013
 Karen-0.013
 Alice-0.013
Token and
Token ID290
Feature activation+1.30e+00

Pos logit contributions
kin+0.046
ped+0.044
uc+0.043
 elbow+0.041
kins+0.041
Neg logit contributions
 Mary-0.045
 Elizabeth-0.045
 Caroline-0.045
 Alice-0.044
 Lib-0.044
Token she
Token ID673
Feature activation+1.63e+00

Pos logit contributions
kin+0.037
uc+0.035
ped+0.035
agged+0.033
aces+0.033
Neg logit contributions
 Mary-0.027
 Caroline-0.027
 Elizabeth-0.026
 Alice-0.026
 Lib-0.026
Token a
Token ID257
Feature activation-9.81e-02

Pos logit contributions
kin+0.014
ped+0.014
uc+0.013
kins+0.013
 elbow+0.013
Neg logit contributions
 Elizabeth-0.012
 Mary-0.012
 Caroline-0.012
 Alice-0.012
 Lib-0.012
Token white
Token ID2330
Feature activation+2.58e+00

Pos logit contributions
 Mary+0.002
abella+0.002
 Elizabeth+0.002
 Caroline+0.002
 Lib+0.002
Neg logit contributions
kin-0.002
ped-0.002
uc-0.002
 elbow-0.002
bit-0.002
Token fur
Token ID9230
Feature activation+1.29e+00

Pos logit contributions
kin+0.062
uc+0.062
kins+0.062
ands+0.061
ped+0.060
Neg logit contributions
 Karen-0.059
 Alice-0.059
 Elizabeth-0.058
 Mary-0.058
 Caroline-0.058
Token and
Token ID290
Feature activation+7.71e-01

Pos logit contributions
kin+0.036
ped+0.035
uc+0.034
kins+0.033
 elbow+0.033
Neg logit contributions
 Elizabeth-0.026
 Mary-0.025
 Caroline-0.025
 Lib-0.025
 Alice-0.025
Token big
Token ID1263
Feature activation+1.60e+00

Pos logit contributions
kin+0.023
uc+0.022
ags+0.021
ands+0.021
agged+0.021
Neg logit contributions
 Mary-0.015
 Elizabeth-0.015
 Alice-0.015
 Caroline-0.015
 �-0.015
Token brown
Token ID7586
Feature activation+3.90e+00

Pos logit contributions
kin+0.028
uc+0.027
ped+0.026
kins+0.024
led+0.024
Neg logit contributions
 Mary-0.050
 Caroline-0.050
 Elizabeth-0.049
 �-0.049
 Alice-0.049
Token eyes
Token ID2951
Feature activation+2.39e+00

Pos logit contributions
kin+0.069
uc+0.066
ands+0.065
kins+0.064
wy+0.061
Neg logit contributions
 Mary-0.119
 Karen-0.117
 Alice-0.116
 Elizabeth-0.114
 Caroline-0.114
Token.
Token ID13
Feature activation-6.91e-01

Pos logit contributions
kin+0.058
ped+0.055
uc+0.055
kins+0.052
 elbow+0.052
Neg logit contributions
 Mary-0.056
 Elizabeth-0.055
 Lib-0.055
 Caroline-0.055
 �-0.055
Token
Token ID198
Feature activation-4.97e-01

Pos logit contributions
abella+0.016
 certain+0.015
 university+0.015
 arriving+0.015
 Paula+0.015
Neg logit contributions
 paws-0.015
 lips-0.015
aces-0.014
 hands-0.014
 fingers-0.014
Token
Token ID198
Feature activation+1.36e+00

Pos logit contributions
 university+0.011
 Lib+0.011
 Caroline+0.011
 Elizabeth+0.011
abella+0.011
Neg logit contributions
kin-0.011
ped-0.010
uc-0.010
agged-0.010
icked-0.010
TokenThe
Token ID464
Feature activation-6.88e-01

Pos logit contributions
kin+0.031
ped+0.028
ater+0.026
led+0.026
uc+0.026
Neg logit contributions
 Lib-0.033
 Paula-0.033
abella-0.032
 Karen-0.032
 Caroline-0.032
Token me
Token ID502
Feature activation+1.24e+00

Pos logit contributions
kin+0.058
ped+0.055
uc+0.054
 elbow+0.052
kins+0.052
Neg logit contributions
 Mary-0.054
 Elizabeth-0.054
 Caroline-0.054
 Alice-0.053
 Lib-0.052
Token,
Token ID11
Feature activation-1.29e-01

Pos logit contributions
kin+0.035
ped+0.034
uc+0.033
kins+0.032
led+0.032
Neg logit contributions
 Mary-0.025
 Elizabeth-0.025
 Caroline-0.024
 Alice-0.024
 university-0.024
Token Lily
Token ID20037
Feature activation-1.65e+00

Pos logit contributions
 Mary+0.003
 Elizabeth+0.003
 Caroline+0.003
 Alice+0.003
 university+0.003
Neg logit contributions
kin-0.003
ped-0.003
uc-0.003
led-0.003
aces-0.003
Token!"
Token ID2474
Feature activation+7.70e-01

Pos logit contributions
 Caroline+0.042
 Elizabeth+0.042
 Mary+0.041
 university+0.041
 Lib+0.041
Neg logit contributions
kin-0.037
ped-0.035
uc-0.034
 elbow-0.033
kins-0.032
Token he
Token ID339
Feature activation+3.55e+00

Pos logit contributions
kin+0.018
ped+0.018
 elbow+0.018
uc+0.017
 fingers+0.017
Neg logit contributions
 university-0.016
 Elizabeth-0.016
 Caroline-0.016
abella-0.016
 certain-0.016
Token cried
Token ID16896
Feature activation+4.04e+00

Pos logit contributions
kin+0.090
uc+0.084
ped+0.083
 elbow+0.080
agged+0.080
Neg logit contributions
 Elizabeth-0.083
 Mary-0.082
 Caroline-0.081
 Alice-0.081
 Karen-0.079
Token.
Token ID13
Feature activation+5.52e-02

Pos logit contributions
kin+0.087
ped+0.084
uc+0.081
 elbow+0.077
ked+0.076
Neg logit contributions
 Elizabeth-0.106
 Caroline-0.106
 Mary-0.105
 Lib-0.104
 university-0.103
Token "
Token ID366
Feature activation+5.64e-02

Pos logit contributions
ped+0.002
kin+0.002
 lips+0.001
 elbow+0.001
ked+0.001
Neg logit contributions
 Caroline-0.001
 university-0.001
 Elizabeth-0.001
abella-0.001
 foreign-0.001
TokenI
Token ID40
Feature activation+1.08e+00

Pos logit contributions
kin+0.002
uc+0.002
ped+0.002
kins+0.002
icked+0.002
Neg logit contributions
 Elizabeth-0.001
 Mary-0.001
 Caroline-0.001
Alice-0.001
 Alice-0.001
Token can
Token ID460
Feature activation-4.35e-01

Pos logit contributions
kin+0.028
ped+0.028
uc+0.027
agged+0.027
kins+0.026
Neg logit contributions
 Mary-0.023
 Elizabeth-0.023
 certain-0.022
abella-0.022
 Karen-0.022
Token't
Token ID470
Feature activation+7.89e-01

Pos logit contributions
 Elizabeth+0.005
 Alice+0.005
 Mary+0.005
 university+0.005
 Karen+0.005
Neg logit contributions
kin-0.016
uc-0.016
ped-0.016
led-0.015
agged-0.015
Token.
Token ID13
Feature activation+2.23e-01

Pos logit contributions
 �+0.000
 about+0.000
 Mary+0.000
 Alice+0.000
 university+0.000
Neg logit contributions
ped-0.000
kin-0.000
agged-0.000
 elbow-0.000
uc-0.000
Token They
Token ID1119
Feature activation-7.09e-01

Pos logit contributions
kin+0.006
ped+0.006
uc+0.006
aces+0.006
bit+0.006
Neg logit contributions
 Caroline-0.004
 Elizabeth-0.004
 Mary-0.004
 Lib-0.004
abella-0.004
Token are
Token ID389
Feature activation-9.86e-01

Pos logit contributions
 university+0.016
 certain+0.016
 Elizabeth+0.016
 Mary+0.016
 Caroline+0.016
Neg logit contributions
kin-0.017
ped-0.017
uc-0.016
agged-0.016
 elbow-0.016
Token safer
Token ID14178
Feature activation+5.71e-01

Pos logit contributions
 Elizabeth+0.019
 Mary+0.019
 Caroline+0.019
 university+0.019
 Alice+0.018
Neg logit contributions
kin-0.028
ped-0.027
uc-0.027
 elbow-0.026
led-0.026
Token and
Token ID290
Feature activation-1.38e-01

Pos logit contributions
kin+0.018
ped+0.018
agged+0.017
uc+0.017
icked+0.017
Neg logit contributions
 university-0.010
 Mary-0.009
 Alice-0.009
 Elizabeth-0.009
 Caroline-0.009
Token prett
Token ID46442
Feature activation+3.76e+00

Pos logit contributions
 university+0.002
 certain+0.002
 �+0.002
 Mary+0.002
 Alice+0.002
Neg logit contributions
kin-0.005
ped-0.005
uc-0.004
agged-0.004
bit-0.004
Tokenier
Token ID959
Feature activation+1.17e+00

Pos logit contributions
ked+0.101
 paws+0.098
 elbow+0.097
 elbows+0.097
 tentacles+0.095
Neg logit contributions
 about-0.081
abella-0.068
unt-0.062
 wrong-0.059
 By-0.058
Token,"
Token ID553
Feature activation-1.35e-02

Pos logit contributions
agged+0.032
ped+0.031
kin+0.031
uc+0.031
ked+0.030
Neg logit contributions
 Mary-0.023
 Alice-0.022
 university-0.022
 about-0.022
 �-0.021
Token Mom
Token ID11254
Feature activation-7.02e-01

Pos logit contributions
 Caroline+0.000
 Elizabeth+0.000
abella+0.000
 Lib+0.000
 Mary+0.000
Neg logit contributions
kin-0.000
ped-0.000
 elbow-0.000
uc-0.000
kins-0.000
Token says
Token ID1139
Feature activation+2.46e-01

Pos logit contributions
 Karen+0.020
 Caroline+0.020
 university+0.020
 Lib+0.020
Kat+0.020
Neg logit contributions
ped-0.012
ps-0.012
icked-0.012
uc-0.011
kin-0.011
Token.
Token ID13
Feature activation-2.23e-01

Pos logit contributions
kin+0.006
ped+0.005
uc+0.005
 elbow+0.005
kins+0.005
Neg logit contributions
 Elizabeth-0.006
 Mary-0.006
 Caroline-0.006
 Alice-0.006
 Lib-0.006
Token listened
Token ID16399
Feature activation-4.86e-01

Pos logit contributions
 Mary+0.006
 Alice+0.006
abella+0.006
 Caroline+0.006
 Elizabeth+0.006
Neg logit contributions
kin-0.009
ped-0.008
uc-0.008
kins-0.008
led-0.008
Token to
Token ID284
Feature activation+3.28e-01

Pos logit contributions
 Elizabeth+0.009
 Caroline+0.009
 Karen+0.009
 Lib+0.009
 Mary+0.009
Neg logit contributions
kin-0.014
ped-0.014
uc-0.013
led-0.013
aces-0.013
Token his
Token ID465
Feature activation+2.59e+00

Pos logit contributions
kin+0.006
ped+0.006
 elbow+0.006
uc+0.006
kins+0.006
Neg logit contributions
 Elizabeth-0.009
 Lib-0.009
 Caroline-0.009
abella-0.009
 Karen-0.009
Token mom
Token ID1995
Feature activation-1.08e+00

Pos logit contributions
ped+0.075
kin+0.072
led+0.071
uc+0.071
agged+0.070
Neg logit contributions
 university-0.051
 Mary-0.051
 Caroline-0.049
 Alice-0.049
 Carol-0.048
Token and
Token ID290
Feature activation+1.75e+00

Pos logit contributions
 Caroline+0.029
 Elizabeth+0.029
abella+0.029
Kat+0.029
 Lib+0.028
Neg logit contributions
kin-0.022
ped-0.021
uc-0.020
aces-0.020
kins-0.019
Token worn
Token ID12666
Feature activation+3.77e+00

Pos logit contributions
kin+0.046
ped+0.044
uc+0.044
agged+0.044
aces+0.043
Neg logit contributions
 Mary-0.039
 Alice-0.038
 Elizabeth-0.037
 Caroline-0.037
 university-0.037
Token pants
Token ID12581
Feature activation+1.11e+00

Pos logit contributions
kin+0.103
ped+0.099
uc+0.098
kins+0.096
agged+0.094
Neg logit contributions
 Mary-0.077
abella-0.075
 Alice-0.075
 �-0.074
 Carol-0.074
Token.
Token ID13
Feature activation+8.95e-01

Pos logit contributions
ped+0.029
kin+0.029
uc+0.029
bit+0.028
agged+0.027
Neg logit contributions
 �-0.022
 university-0.022
abella-0.022
 Alice-0.022
 Lib-0.022
Token Jack
Token ID3619
Feature activation-5.01e-01

Pos logit contributions
ped+0.025
kin+0.025
uc+0.024
aces+0.023
agged+0.023
Neg logit contributions
 Elizabeth-0.017
 Caroline-0.017
 Lib-0.017
 Karen-0.017
 Carol-0.016
Token's
Token ID338
Feature activation+9.49e-01

Pos logit contributions
 Elizabeth+0.011
 Caroline+0.011
 university+0.011
 Lib+0.011
 Mary+0.011
Neg logit contributions
kin-0.013
ped-0.012
uc-0.012
led-0.012
 elbow-0.011
Token mom
Token ID1995
Feature activation-3.68e-01

Pos logit contributions
ped+0.026
uc+0.026
kin+0.026
led+0.025
kins+0.025
Neg logit contributions
 Mary-0.019
 Alice-0.018
 university-0.018
 Caroline-0.018
 Carol-0.018
Token
Token ID95
Feature activation-5.34e-01

Pos logit contributions
kin+0.020
uc+0.018
ped+0.018
 elbow+0.018
aces+0.017
Neg logit contributions
 Mary-0.017
 Elizabeth-0.017
 Caroline-0.017
 Alice-0.017
 Lib-0.017
Token
Token ID26391
Feature activation-7.13e-01

Pos logit contributions
 Mary+0.011
 Elizabeth+0.011
 Caroline+0.011
 Lib+0.011
 Paula+0.011
Neg logit contributions
kin-0.014
aces-0.013
ped-0.013
bit-0.013
ags-0.013
Token
Token ID129
Feature activation+8.99e-01

Pos logit contributions
 Elizabeth+0.015
 Caroline+0.015
 Lib+0.015
 Mary+0.015
abella+0.014
Neg logit contributions
kin-0.019
uc-0.018
ped-0.018
aces-0.017
kins-0.017
Token
Token ID241
Feature activation+4.51e-01

Pos logit contributions
ped+0.022
kin+0.021
bit+0.021
aces+0.020
uc+0.020
Neg logit contributions
 Mary-0.021
 Elizabeth-0.021
 Caroline-0.021
 Lib-0.020
 Alice-0.020
TokenMy
Token ID3666
Feature activation+1.22e+00

Pos logit contributions
kin+0.012
ped+0.011
uc+0.011
 elbow+0.011
aces+0.011
Neg logit contributions
 Elizabeth-0.010
 Caroline-0.009
 university-0.009
 Mary-0.009
 Lib-0.009
Token t
Token ID256
Feature activation+1.46e+00

Pos logit contributions
ped+0.026
kin+0.026
led+0.025
uc+0.025
agged+0.025
Neg logit contributions
 Mary-0.032
 university-0.032
 �-0.032
 Elizabeth-0.031
 Alice-0.031
Tokeneddy
Token ID21874
Feature activation-4.58e-02

Pos logit contributions
ped+0.037
kin+0.036
 seaw+0.035
kins+0.035
ps+0.034
Neg logit contributions
unt-0.031
 Paula-0.030
abella-0.030
 Karen-0.030
Alice-0.029
Token is
Token ID318
Feature activation-1.27e+00

Pos logit contributions
 Elizabeth+0.001
 Mary+0.001
 Alice+0.001
 Caroline+0.001
 �+0.001
Neg logit contributions
kin-0.001
agged-0.001
uc-0.001
ands-0.001
ags-0.001
Token stuck
Token ID7819
Feature activation+6.24e-01

Pos logit contributions
 Elizabeth+0.028
 Mary+0.028
 Caroline+0.028
 Alice+0.027
 �+0.027
Neg logit contributions
kin-0.033
ped-0.032
uc-0.031
kins-0.030
 elbow-0.030
Token!
Token ID0
Feature activation+2.65e-01

Pos logit contributions
kin+0.017
uc+0.017
ped+0.017
agged+0.016
kins+0.016
Neg logit contributions
 Mary-0.012
 �-0.012
 Alice-0.012
 university-0.012
 Elizabeth-0.012
Token What
Token ID1867
Feature activation+6.84e-02

Pos logit contributions
kin+0.006
ped+0.006
uc+0.006
aces+0.006
agged+0.006
Neg logit contributions
 Caroline-0.006
 Elizabeth-0.006
abella-0.006
 university-0.005
 Mary-0.005
Token a
Token ID257
Feature activation+8.48e-02

Pos logit contributions
 Elizabeth+0.029
 Caroline+0.029
 university+0.029
abella+0.028
 Paula+0.028
Neg logit contributions
kin-0.024
ped-0.024
uc-0.022
aces-0.022
bit-0.021
Token tasty
Token ID25103
Feature activation+6.43e-01

Pos logit contributions
kin+0.002
uc+0.002
ped+0.002
 elbow+0.002
kins+0.002
Neg logit contributions
 Elizabeth-0.002
 Mary-0.002
 Caroline-0.002
 Lib-0.002
abella-0.002
Token treat
Token ID2190
Feature activation+9.98e-01

Pos logit contributions
ped+0.016
kins+0.015
led+0.015
kin+0.015
aces+0.015
Neg logit contributions
 Mary-0.015
 �-0.015
 Elizabeth-0.014
 university-0.014
 Alice-0.014
Token.
Token ID13
Feature activation+8.05e-01

Pos logit contributions
kin+0.025
ped+0.024
uc+0.024
agged+0.023
 elbow+0.023
Neg logit contributions
 Mary-0.023
 Elizabeth-0.022
 �-0.022
 Lib-0.022
 Alice-0.022
Token She
Token ID1375
Feature activation+1.10e+00

Pos logit contributions
kin+0.023
ped+0.023
uc+0.022
aces+0.022
bit+0.021
Neg logit contributions
 Elizabeth-0.015
 Caroline-0.014
abella-0.014
 university-0.014
 certain-0.014
Token smiled
Token ID13541
Feature activation+1.28e+00

Pos logit contributions
kin+0.029
ped+0.028
uc+0.027
 elbow+0.026
kins+0.026
Neg logit contributions
 Elizabeth-0.024
 Mary-0.023
 Caroline-0.023
 Lib-0.023
 Alice-0.023
Token and
Token ID290
Feature activation+1.87e+00

Pos logit contributions
kin+0.039
ped+0.037
uc+0.037
 elbow+0.035
kins+0.035
Neg logit contributions
 Mary-0.023
 �-0.023
 Caroline-0.023
 Elizabeth-0.023
 Alice-0.023
Token thanked
Token ID26280
Feature activation+7.30e-01

Pos logit contributions
kin+0.044
ped+0.042
uc+0.042
aces+0.040
kins+0.039
Neg logit contributions
 Mary-0.046
 Elizabeth-0.046
 Caroline-0.046
 Alice-0.045
 Lib-0.045
Token him
Token ID683
Feature activation-1.30e-01

Pos logit contributions
ped+0.015
kin+0.015
 elbow+0.014
 lips+0.013
uc+0.013
Neg logit contributions
abella-0.018
 Elizabeth-0.018
str-0.018
unt-0.018
 Caroline-0.017
Token.
Token ID13
Feature activation+4.54e-02

Pos logit contributions
Kat+0.004
 Elizabeth+0.004
 Karen+0.004
Kelly+0.004
 Kathy+0.004
Neg logit contributions
kin-0.003
ped-0.002
aces-0.002
uc-0.002
kins-0.002
Token The
Token ID383
Feature activation-8.90e-01

Pos logit contributions
ped+0.001
kin+0.001
aces+0.001
uc+0.001
icked+0.001
Neg logit contributions
abella-0.001
 Caroline-0.001
 university-0.001
 Elizabeth-0.001
 certain-0.001
Token.
Token ID13
Feature activation+4.41e-01

Pos logit contributions
 Elizabeth+0.029
 Mary+0.029
 Caroline+0.029
 Alice+0.029
 Lib+0.029
Neg logit contributions
kin-0.027
ped-0.026
uc-0.025
 elbow-0.024
kins-0.024
Token She
Token ID1375
Feature activation+7.12e-01

Pos logit contributions
kin+0.014
ped+0.014
uc+0.014
aces+0.013
 elbow+0.013
Neg logit contributions
 Elizabeth-0.007
 Caroline-0.007
 Mary-0.007
 Lib-0.007
 university-0.007
Token loved
Token ID6151
Feature activation-4.07e-01

Pos logit contributions
uc+0.023
kin+0.023
ocol+0.022
ags+0.022
ped+0.022
Neg logit contributions
 Mary-0.010
 Alice-0.010
 Elizabeth-0.010
abella-0.010
 university-0.010
Token eating
Token ID6600
Feature activation-5.04e-01

Pos logit contributions
 Caroline+0.007
 Elizabeth+0.007
 Mary+0.007
abella+0.007
 Lib+0.007
Neg logit contributions
kin-0.012
ped-0.012
uc-0.012
 elbow-0.012
aces-0.011
Token rice
Token ID11464
Feature activation+5.40e-01

Pos logit contributions
 Mary+0.013
 Elizabeth+0.013
 Lib+0.013
 Caroline+0.013
 Alice+0.013
Neg logit contributions
kin-0.011
ped-0.010
uc-0.010
 elbow-0.009
aces-0.009
Token for
Token ID329
Feature activation+1.70e+00

Pos logit contributions
ped+0.015
kin+0.015
 elbow+0.015
uc+0.015
ked+0.014
Neg logit contributions
 Mary-0.010
 Lib-0.010
 Elizabeth-0.010
 university-0.010
 Alice-0.010
Token dinner
Token ID8073
Feature activation+3.00e-01

Pos logit contributions
kin+0.041
ped+0.039
ked+0.038
aces+0.038
icked+0.038
Neg logit contributions
 Alice-0.041
 university-0.040
 Mrs-0.040
 Elizabeth-0.040
 Mary-0.040
Token.
Token ID13
Feature activation-4.62e-01

Pos logit contributions
kin+0.007
ped+0.006
uc+0.006
kins+0.006
aces+0.006
Neg logit contributions
 Elizabeth-0.008
 Caroline-0.008
 Mary-0.008
 Lib-0.007
 Alice-0.007
Token One
Token ID1881
Feature activation-2.70e-01

Pos logit contributions
 Caroline+0.008
abella+0.008
 Elizabeth+0.008
Kat+0.008
 certain+0.008
Neg logit contributions
kin-0.013
aces-0.012
ags-0.012
uc-0.012
bit-0.012
Token day
Token ID1110
Feature activation+1.22e+00

Pos logit contributions
 Elizabeth+0.004
 Caroline+0.004
 Mary+0.004
 Lib+0.004
 university+0.004
Neg logit contributions
kin-0.009
ped-0.008
uc-0.008
 elbow-0.008
kins-0.008
Token,
Token ID11
Feature activation+4.21e-01

Pos logit contributions
kin+0.039
ped+0.039
bit+0.038
aces+0.037
uc+0.037
Neg logit contributions
 Mary-0.020
 Elizabeth-0.019
 Alice-0.019
 �-0.019
 Lib-0.018
Token he
Token ID339
Feature activation+9.50e-01

Pos logit contributions
 Elizabeth+0.030
 Caroline+0.029
 university+0.029
abella+0.029
 Carol+0.028
Neg logit contributions
kin-0.035
ped-0.034
 elbow-0.032
uc-0.031
led-0.031
Token trusted
Token ID13467
Feature activation+3.39e-01

Pos logit contributions
kin+0.024
ped+0.023
uc+0.022
led+0.021
 elbow+0.021
Neg logit contributions
 Elizabeth-0.022
 Caroline-0.022
 Mary-0.022
 university-0.022
 Lib-0.021
Token his
Token ID465
Feature activation+1.07e+00

Pos logit contributions
kin+0.007
ped+0.007
uc+0.007
 elbow+0.007
kins+0.006
Neg logit contributions
 Caroline-0.009
 Elizabeth-0.009
 university-0.009
 Carol-0.009
 Lib-0.009
Token dad
Token ID9955
Feature activation-1.67e+00

Pos logit contributions
kin+0.024
ped+0.023
uc+0.022
aces+0.021
 elbow+0.021
Neg logit contributions
 Mary-0.028
 Elizabeth-0.028
 Caroline-0.028
 university-0.027
 Alice-0.027
Token.
Token ID13
Feature activation-7.43e-01

Pos logit contributions
 Caroline+0.053
 Carol+0.052
 Elizabeth+0.052
 university+0.052
Kat+0.051
Neg logit contributions
ped-0.028
kin-0.027
led-0.027
aces-0.025
kins-0.025
Token His
Token ID2399
Feature activation+1.14e+00

Pos logit contributions
 Elizabeth+0.018
 Caroline+0.018
 university+0.017
abella+0.017
 Carol+0.017
Neg logit contributions
ped-0.017
uc-0.017
kin-0.016
icked-0.016
 paws-0.016
Token dad
Token ID9955
Feature activation-5.37e-01

Pos logit contributions
kin+0.026
ped+0.025
uc+0.024
aces+0.022
kins+0.022
Neg logit contributions
 Mary-0.030
 Caroline-0.029
 Elizabeth-0.029
 university-0.029
 Alice-0.029
Token went
Token ID1816
Feature activation-2.14e+00

Pos logit contributions
 university+0.016
 Caroline+0.016
 Elizabeth+0.015
 Carol+0.015
 Lib+0.015
Neg logit contributions
ped-0.011
led-0.010
kin-0.010
uc-0.010
ked-0.010
Token inside
Token ID2641
Feature activation-1.89e+00

Pos logit contributions
str+0.056
 Caroline+0.055
abella+0.054
 Elizabeth+0.054
Kat+0.053
Neg logit contributions
ags-0.041
uc-0.041
ped-0.041
ps-0.040
ked-0.039
Token and
Token ID290
Feature activation-1.42e-01

Pos logit contributions
 Elizabeth+0.046
 Caroline+0.045
abella+0.044
 Lib+0.044
 Carol+0.043
Neg logit contributions
kin-0.045
uc-0.044
ped-0.043
 elbow-0.041
kins-0.041
Token came
Token ID1625
Feature activation-5.65e-01

Pos logit contributions
 Caroline+0.004
 Elizabeth+0.004
 university+0.004
 Lib+0.004
 Carol+0.004
Neg logit contributions
kin-0.003
uc-0.003
ped-0.003
led-0.003
 elbow-0.003
Token a
Token ID257
Feature activation-1.44e+00

Pos logit contributions
 Elizabeth+0.059
 Caroline+0.058
 Lib+0.057
 Amelia+0.057
 Heidi+0.056
Neg logit contributions
kin-0.063
aces-0.057
ped-0.057
uc-0.056
kins-0.054
Token red
Token ID2266
Feature activation+1.22e+00

Pos logit contributions
 Lib+0.038
 Elizabeth+0.037
 Caroline+0.037
 Mary+0.037
abella+0.036
Neg logit contributions
kin-0.033
 elbow-0.028
uc-0.028
kins-0.027
 beads-0.027
Token car
Token ID1097
Feature activation+2.04e-01

Pos logit contributions
kin+0.036
ped+0.036
led+0.035
uc+0.034
kins+0.034
Neg logit contributions
 university-0.023
 �-0.023
 Mary-0.022
 Caroline-0.021
 Karen-0.021
Token that
Token ID326
Feature activation-1.11e+00

Pos logit contributions
kin+0.005
ped+0.005
uc+0.005
 elbow+0.005
kins+0.005
Neg logit contributions
 Elizabeth-0.005
 Mary-0.005
 Caroline-0.005
 university-0.005
 Alice-0.005
Token could
Token ID714
Feature activation-3.48e-01

Pos logit contributions
 Elizabeth+0.023
 Caroline+0.023
 Mary+0.023
 Lib+0.023
 university+0.022
Neg logit contributions
kin-0.030
ped-0.029
uc-0.028
kins-0.027
 elbow-0.027
Token zoom
Token ID19792
Feature activation+1.58e+00

Pos logit contributions
 university+0.007
 Mary+0.007
 Alice+0.007
 Caroline+0.007
 Karen+0.007
Neg logit contributions
uc-0.010
kin-0.009
ped-0.009
led-0.009
agged-0.009
Token very
Token ID845
Feature activation+6.55e-01

Pos logit contributions
kin+0.046
uc+0.044
ped+0.044
 elbow+0.043
kins+0.042
Neg logit contributions
 university-0.030
 Caroline-0.030
 Mary-0.029
 �-0.029
 Alice-0.029
Token fast
Token ID3049
Feature activation+1.66e+00

Pos logit contributions
kin+0.018
ped+0.016
uc+0.016
aces+0.016
kins+0.016
Neg logit contributions
 Elizabeth-0.014
 Mary-0.014
 Caroline-0.014
 Lib-0.014
 Heidi-0.014
Token.
Token ID13
Feature activation-6.76e-01

Pos logit contributions
kin+0.045
uc+0.043
ped+0.042
kins+0.040
agged+0.040
Neg logit contributions
 university-0.036
 �-0.035
 Caroline-0.035
 Mary-0.035
 Alice-0.035
Token They
Token ID1119
Feature activation+2.76e-01

Pos logit contributions
 Caroline+0.014
 certain+0.014
 Elizabeth+0.014
 university+0.013
 Lib+0.013
Neg logit contributions
kin-0.016
 paws-0.015
bit-0.015
aces-0.015
ped-0.015
Token called
Token ID1444
Feature activation-6.75e-01

Pos logit contributions
kin+0.007
uc+0.007
ped+0.007
agged+0.006
kins+0.006
Neg logit contributions
 Mary-0.006
 Caroline-0.006
 Elizabeth-0.006
 Alice-0.006
abella-0.006
Token to
Token ID284
Feature activation+2.18e+00

Pos logit contributions
 Elizabeth+0.029
 Caroline+0.028
abella+0.028
 Paula+0.028
Alice+0.028
Neg logit contributions
ped-0.031
kin-0.030
uc-0.029
led-0.029
ked-0.029
Token wake
Token ID7765
Feature activation+7.16e-01

Pos logit contributions
ped+0.059
led+0.059
ked+0.059
uc+0.058
agged+0.058
Neg logit contributions
 university-0.046
 Mary-0.046
 Elizabeth-0.044
 Alice-0.043
 Carol-0.042
Token up
Token ID510
Feature activation+3.32e-01

Pos logit contributions
ped+0.022
kin+0.022
uc+0.021
led+0.020
ps+0.020
Neg logit contributions
 Elizabeth-0.012
 Caroline-0.012
 Karen-0.012
Alice-0.012
 Lib-0.012
Token early
Token ID1903
Feature activation+1.46e+00

Pos logit contributions
kin+0.009
ped+0.009
uc+0.009
 elbow+0.008
kins+0.008
Neg logit contributions
 Elizabeth-0.007
 Caroline-0.007
 Mary-0.007
 Lib-0.006
 Alice-0.006
Token.
Token ID13
Feature activation+5.28e-01

Pos logit contributions
kin+0.032
ped+0.031
uc+0.030
led+0.029
aces+0.029
Neg logit contributions
 Elizabeth-0.037
 Caroline-0.037
 Karen-0.037
 Mary-0.036
 Carol-0.036
Token It
Token ID632
Feature activation-1.48e-01

Pos logit contributions
ped+0.014
kin+0.014
aces+0.014
uc+0.013
ked+0.013
Neg logit contributions
abella-0.010
 Caroline-0.010
 Elizabeth-0.010
 Lib-0.010
Kat-0.010
Token wanted
Token ID2227
Feature activation-1.10e+00

Pos logit contributions
 Mary+0.003
 Elizabeth+0.003
 �+0.003
 Alice+0.003
 Caroline+0.003
Neg logit contributions
kin-0.004
uc-0.004
ped-0.004
agged-0.004
aces-0.004
Token to
Token ID284
Feature activation+9.82e-01

Pos logit contributions
 Caroline+0.025
 Elizabeth+0.024
 Lib+0.024
 Karen+0.024
 Carol+0.024
Neg logit contributions
ped-0.026
uc-0.025
kin-0.025
 fingers-0.025
 cr-0.024
Token see
Token ID766
Feature activation-1.65e+00

Pos logit contributions
led+0.031
ked+0.030
ped+0.029
uc+0.029
agged+0.029
Neg logit contributions
 university-0.021
 Mary-0.020
 Alice-0.018
 Elizabeth-0.018
 Carol-0.018
Token what
Token ID644
Feature activation-3.19e-01

Pos logit contributions
 Caroline+0.035
abella+0.035
 Elizabeth+0.035
 Lib+0.033
Alice+0.033
Neg logit contributions
ped-0.041
kin-0.041
uc-0.040
kins-0.039
 elbow-0.039
Token the
Token ID262
Feature activation-2.72e-01

Pos logit contributions
abella+0.008
 Caroline+0.008
 Paula+0.007
 Lib+0.007
Kelly+0.007
Neg logit contributions
kin-0.007
ped-0.007
kins-0.007
led-0.006
 cr-0.006
Token,
Token ID11
Feature activation+5.51e-01

Pos logit contributions
kin+0.054
ped+0.052
uc+0.050
led+0.048
kins+0.048
Neg logit contributions
 Elizabeth-0.048
 Caroline-0.047
 Mary-0.046
 Lib-0.045
 university-0.045
Token in
Token ID287
Feature activation-2.94e-01

Pos logit contributions
kin+0.013
ped+0.013
uc+0.013
kins+0.012
ked+0.012
Neg logit contributions
 Elizabeth-0.013
 Caroline-0.013
 Lib-0.012
abella-0.012
 Carol-0.012
Token a
Token ID257
Feature activation-6.51e-01

Pos logit contributions
 Lib+0.006
 Elizabeth+0.006
abella+0.006
 Caroline+0.006
Kat+0.006
Neg logit contributions
kin-0.008
ped-0.007
 elbow-0.007
uc-0.007
kins-0.007
Token far
Token ID1290
Feature activation+4.58e-01

Pos logit contributions
 Lib+0.016
 Karen+0.016
abella+0.015
 Paula+0.015
 Alice+0.015
Neg logit contributions
kin-0.015
 seaw-0.014
 paws-0.013
 claws-0.013
 lips-0.013
Token away
Token ID1497
Feature activation+1.14e+00

Pos logit contributions
kin+0.010
ped+0.010
uc+0.009
led+0.009
bit+0.009
Neg logit contributions
 Elizabeth-0.011
abella-0.011
 Caroline-0.011
 Lib-0.011
 Mary-0.011
Token corner
Token ID5228
Feature activation-1.48e-01

Pos logit contributions
kin+0.031
ped+0.030
uc+0.028
kins+0.028
led+0.028
Neg logit contributions
 Elizabeth-0.024
 Caroline-0.023
 �-0.023
abella-0.023
 Lib-0.023
Token,
Token ID11
Feature activation-3.48e-01

Pos logit contributions
 Elizabeth+0.004
 Caroline+0.004
 Lib+0.004
abella+0.004
 Karen+0.004
Neg logit contributions
kin-0.003
ped-0.003
uc-0.003
led-0.003
aces-0.003
Token he
Token ID339
Feature activation+1.30e+00

Pos logit contributions
 Elizabeth+0.009
unt+0.009
 Caroline+0.009
 Lib+0.008
Kat+0.008
Neg logit contributions
kin-0.008
ped-0.007
led-0.007
 elbow-0.007
ked-0.007
Token saw
Token ID2497
Feature activation+2.73e-01

Pos logit contributions
kin+0.035
ped+0.034
uc+0.033
led+0.032
kins+0.032
Neg logit contributions
 Elizabeth-0.027
 Caroline-0.027
 Mary-0.027
 Lib-0.027
 Alice-0.026
Token a
Token ID257
Feature activation-1.19e+00

Pos logit contributions
kin+0.007
ped+0.007
uc+0.006
 elbow+0.006
led+0.006
Neg logit contributions
 Elizabeth-0.006
 Caroline-0.006
 Mary-0.006
 Lib-0.006
 university-0.006
Token big
Token ID1263
Feature activation-1.34e+00

Pos logit contributions
 Lib+0.033
abella+0.033
 Elizabeth+0.032
 Alice+0.032
 Heidi+0.032
Neg logit contributions
kin-0.025
uc-0.020
 lips-0.020
 seaw-0.020
kins-0.020
Token.
Token ID13
Feature activation+4.63e-01

Pos logit contributions
ped+0.015
led+0.014
aces+0.014
kin+0.013
ps+0.012
Neg logit contributions
Kat-0.025
 bakery-0.025
 Caroline-0.025
 Elizabeth-0.025
abella-0.025
Token She
Token ID1375
Feature activation+5.91e-01

Pos logit contributions
kin+0.012
aces+0.012
ped+0.012
 elbow+0.012
agged+0.012
Neg logit contributions
abella-0.009
 Elizabeth-0.009
 Lib-0.009
 Caroline-0.009
str-0.009
Token was
Token ID373
Feature activation-1.86e+00

Pos logit contributions
kin+0.017
ped+0.017
uc+0.017
kins+0.016
 elbow+0.016
Neg logit contributions
 Elizabeth-0.011
 Mary-0.011
 Caroline-0.011
abella-0.011
 Alice-0.011
Token more
Token ID517
Feature activation-7.56e-01

Pos logit contributions
 Caroline+0.045
 Elizabeth+0.044
 Mary+0.044
 Carol+0.044
 Paula+0.043
Neg logit contributions
ped-0.045
kin-0.045
aces-0.041
agged-0.041
ps-0.040
Token polite
Token ID23507
Feature activation+1.89e-01

Pos logit contributions
 Elizabeth+0.019
 Mary+0.019
 Caroline+0.019
 Alice+0.019
 university+0.019
Neg logit contributions
kin-0.017
ped-0.016
uc-0.016
 elbow-0.015
kins-0.015
Token and
Token ID290
Feature activation-1.51e-01

Pos logit contributions
kin+0.006
ped+0.005
uc+0.005
 elbow+0.005
kins+0.005
Neg logit contributions
 Mary-0.003
 Elizabeth-0.003
 Caroline-0.003
 Alice-0.003
 university-0.003
Token gentle
Token ID10296
Feature activation+1.06e+00

Pos logit contributions
 Elizabeth+0.003
 Mary+0.003
 �+0.003
 Caroline+0.003
 Alice+0.003
Neg logit contributions
kin-0.004
uc-0.004
ped-0.004
agged-0.004
aces-0.004
Token with
Token ID351
Feature activation+1.95e-01

Pos logit contributions
kin+0.030
ped+0.028
uc+0.028
kins+0.027
 elbow+0.027
Neg logit contributions
 Mary-0.022
 Elizabeth-0.022
 Caroline-0.021
 Alice-0.021
 university-0.021
Token her
Token ID607
Feature activation+2.05e+00

Pos logit contributions
kin+0.005
ped+0.005
uc+0.005
 elbow+0.005
kins+0.005
Neg logit contributions
 Elizabeth-0.004
 Mary-0.004
 Caroline-0.004
 Alice-0.004
 Lib-0.004
Token humans
Token ID5384
Feature activation-1.41e+00

Pos logit contributions
ped+0.048
kin+0.047
agged+0.046
icked+0.046
led+0.045
Neg logit contributions
 university-0.049
 �-0.047
 Elizabeth-0.047
 Mary-0.047
 certain-0.046
Token and
Token ID290
Feature activation+1.57e-01

Pos logit contributions
 Elizabeth+0.035
 Caroline+0.035
 Mary+0.035
 Lib+0.034
abella+0.034
Neg logit contributions
kin-0.032
ped-0.031
uc-0.030
aces-0.029
 elbow-0.028
Token studied
Token ID9713
Feature activation-3.47e-01

Pos logit contributions
kin+0.005
ped+0.004
uc+0.004
led+0.004
aces+0.004
Neg logit contributions
 Elizabeth-0.004
 Caroline-0.004
 Mary-0.004
 Lib-0.004
 university-0.004
Token,
Token ID11
Feature activation-2.45e+00

Pos logit contributions
 Elizabeth+0.008
 Lib+0.008
 Heidi+0.008
 Caroline+0.008
 Karen+0.008
Neg logit contributions
kin-0.009
ped-0.008
led-0.008
uc-0.008
kins-0.008
Token the
Token ID262
Feature activation+9.06e-02

Pos logit contributions
 Elizabeth+0.054
 Lib+0.052
abella+0.052
unt+0.052
 Karen+0.052
Neg logit contributions
ped-0.062
kin-0.058
aces-0.058
 fingers-0.057
led-0.057
Token harder
Token ID7069
Feature activation-5.51e-01

Pos logit contributions
kin+0.002
ped+0.002
uc+0.002
 elbow+0.002
kins+0.002
Neg logit contributions
 Elizabeth-0.002
 Lib-0.002
 Mary-0.002
 Caroline-0.002
 Karen-0.002
Token it
Token ID340
Feature activation-2.25e-01

Pos logit contributions
 Elizabeth+0.013
 Lib+0.013
abella+0.013
 Karen+0.013
 Caroline+0.013
Neg logit contributions
kin-0.013
ped-0.012
kins-0.012
uc-0.012
aces-0.012
Token was
Token ID373
Feature activation-5.64e-01

Pos logit contributions
 Elizabeth+0.005
 Caroline+0.005
 Mary+0.005
 Lib+0.005
 Alice+0.005
Neg logit contributions
kin-0.006
ped-0.005
uc-0.005
led-0.005
aces-0.005
Token to
Token ID284
Feature activation+7.21e-01

Pos logit contributions
 Elizabeth+0.012
 Caroline+0.012
 Lib+0.011
 Karen+0.011
 Mary+0.011
Neg logit contributions
kin-0.015
ped-0.015
uc-0.014
aces-0.014
kins-0.014
Token concentrate
Token ID20062
Feature activation+3.74e-01

Pos logit contributions
kin+0.017
ped+0.016
uc+0.016
 elbow+0.015
led+0.015
Neg logit contributions
 Mary-0.017
 Elizabeth-0.017
 Caroline-0.017
 university-0.017
 Alice-0.017
Token,
Token ID11
Feature activation-2.36e+00

Pos logit contributions
kin+0.009
ped+0.009
uc+0.009
aces+0.008
kins+0.008
Neg logit contributions
 Elizabeth-0.009
 Caroline-0.009
 Mary-0.009
 Lib-0.009
 Alice-0.008
Token but
Token ID475
Feature activation-1.23e+00

Pos logit contributions
 Elizabeth+0.049
abella+0.049
 Lib+0.048
 Karen+0.048
 Caroline+0.048
Neg logit contributions
ped-0.061
kin-0.061
aces-0.057
 elbow-0.057
ps-0.057
Token Jacob
Token ID12806
Feature activation-7.76e-01

Pos logit contributions
 Elizabeth+0.025
Kat+0.025
unt+0.025
Alice+0.025
 Lib+0.024
Neg logit contributions
kin-0.033
ped-0.031
 elbow-0.029
kins-0.029
 toes-0.029
Token pepper
Token ID13385
Feature activation-1.39e+00

Pos logit contributions
 Mary+0.007
 Elizabeth+0.007
 Caroline+0.007
 university+0.007
 Alice+0.007
Neg logit contributions
kin-0.006
ped-0.006
uc-0.005
 elbow-0.005
aces-0.005
Token.
Token ID13
Feature activation-1.04e+00

Pos logit contributions
 Elizabeth+0.033
 Caroline+0.033
 Mary+0.033
 Lib+0.032
 Alice+0.032
Neg logit contributions
kin-0.033
ped-0.032
uc-0.031
 elbow-0.030
kins-0.030
Token<|endoftext|>
Token ID50256
Feature activation+2.10e-01

Pos logit contributions
 university+0.021
abella+0.021
 foreign+0.020
 arriving+0.020
 site+0.020
Neg logit contributions
aces-0.025
ped-0.024
kin-0.024
 paws-0.024
 hands-0.023
TokenOnce
Token ID7454
Feature activation-9.41e-01

Pos logit contributions
kin+0.006
ped+0.005
uc+0.005
aces+0.005
kins+0.005
Neg logit contributions
 Elizabeth-0.004
 university-0.004
 Caroline-0.004
abella-0.004
 Lib-0.004
Token upon
Token ID2402
Feature activation+6.68e-01

Pos logit contributions
 Elizabeth+0.027
 Caroline+0.026
 university+0.026
 Lib+0.026
 bakery+0.025
Neg logit contributions
kin-0.019
uc-0.016
ped-0.016
kins-0.016
led-0.016
Token a
Token ID257
Feature activation-1.21e+00

Pos logit contributions
uc+0.020
kin+0.019
ped+0.019
kins+0.018
led+0.018
Neg logit contributions
 Mary-0.013
 Alice-0.013
 Caroline-0.013
 university-0.013
 Elizabeth-0.012
Token time
Token ID640
Feature activation-8.04e-01

Pos logit contributions
 Lib+0.025
 Paula+0.025
 Caroline+0.025
 Elizabeth+0.025
 Karen+0.025
Neg logit contributions
kin-0.033
ped-0.029
bit-0.029
uc-0.029
agged-0.028
Token,
Token ID11
Feature activation-4.64e-01

Pos logit contributions
 Elizabeth+0.020
abella+0.020
 Caroline+0.020
Kat+0.019
 university+0.019
Neg logit contributions
kin-0.018
ped-0.016
uc-0.016
kins-0.016
icked-0.015
Token there
Token ID612
Feature activation-1.01e+00

Pos logit contributions
 Elizabeth+0.009
 Caroline+0.009
 Mary+0.009
 university+0.009
 Lib+0.009
Neg logit contributions
kin-0.013
ped-0.013
uc-0.012
led-0.012
kins-0.012
Token was
Token ID373
Feature activation+4.68e-01

Pos logit contributions
 Elizabeth+0.022
 university+0.022
 Caroline+0.022
abella+0.021
 Lib+0.021
Neg logit contributions
kin-0.027
ped-0.026
uc-0.024
kins-0.024
led-0.024
Token a
Token ID257
Feature activation-1.06e+00

Pos logit contributions
kin+0.013
uc+0.013
ped+0.013
kins+0.012
 elbow+0.012
Neg logit contributions
 Mary-0.010
 Alice-0.009
 Elizabeth-0.009
 Caroline-0.009
 Lib-0.009
Token Anna
Token ID11735
Feature activation+6.11e-01

Pos logit contributions
kin+0.041
ped+0.041
uc+0.039
 elbow+0.038
aces+0.037
Neg logit contributions
 university-0.026
 Caroline-0.026
 Elizabeth-0.025
abella-0.025
 certain-0.025
Token cried
Token ID16896
Feature activation+2.97e+00

Pos logit contributions
kin+0.015
ped+0.014
uc+0.014
 elbow+0.013
kins+0.013
Neg logit contributions
 Elizabeth-0.015
 Caroline-0.014
 Mary-0.014
 Lib-0.014
 university-0.014
Token,
Token ID11
Feature activation-6.19e-01

Pos logit contributions
kin+0.076
uc+0.072
ped+0.072
kins+0.068
 elbow+0.067
Neg logit contributions
 Elizabeth-0.067
 Mary-0.067
 �-0.066
 Caroline-0.066
 Alice-0.066
Token running
Token ID2491
Feature activation+1.46e-01

Pos logit contributions
abella+0.012
 university+0.012
 Caroline+0.012
 Lib+0.011
Kat+0.011
Neg logit contributions
ped-0.017
kin-0.017
 elbow-0.017
ked-0.017
 lips-0.016
Token to
Token ID284
Feature activation-7.26e-01

Pos logit contributions
kin+0.004
ped+0.003
uc+0.003
 elbow+0.003
kins+0.003
Neg logit contributions
 Elizabeth-0.003
 Caroline-0.003
 Mary-0.003
 university-0.003
 Lib-0.003
Token Ben
Token ID3932
Feature activation-2.47e+00

Pos logit contributions
 Elizabeth+0.017
 Lib+0.017
 Caroline+0.017
abella+0.017
 Mary+0.016
Neg logit contributions
kin-0.017
ped-0.017
 elbow-0.016
uc-0.016
bit-0.016
Token.
Token ID13
Feature activation+7.39e-01

Pos logit contributions
abella+0.073
 Caroline+0.072
 Lib+0.071
 Elizabeth+0.071
 Karen+0.070
Neg logit contributions
kin-0.045
ped-0.043
uc-0.040
aces-0.040
kins-0.039
Token\n
Token ID198
Feature activation-1.87e-02

Pos logit contributions
kin+0.018
 elbow+0.017
ped+0.017
ked+0.017
 lips+0.017
Neg logit contributions
abella-0.015
 Karen-0.015
 Lib-0.015
str-0.015
 university-0.015
Token\n
Token ID198
Feature activation+1.65e+00

Pos logit contributions
 Lib+0.000
 Caroline+0.000
 Elizabeth+0.000
 Karen+0.000
 university+0.000
Neg logit contributions
kin-0.001
ped-0.000
uc-0.000
ands-0.000
bit-0.000
TokenBen
Token ID11696
Feature activation-1.10e+00

Pos logit contributions
kin+0.040
ped+0.038
uc+0.037
 elbow+0.035
kins+0.035
Neg logit contributions
 Elizabeth-0.040
 Mary-0.039
 Caroline-0.039
 Lib-0.039
 Alice-0.038
Token heard
Token ID2982
Feature activation+1.35e+00

Pos logit contributions
 university+0.027
 Caroline+0.027
 Lib+0.027
 Elizabeth+0.026
 Mary+0.026
Neg logit contributions
kin-0.025
ped-0.025
uc-0.024
led-0.023
 elbow-0.023
Token In
Token ID554
Feature activation-1.89e+00

Pos logit contributions
Kat+0.027
 Lib+0.027
 university+0.027
 bakery+0.026
 Paula+0.026
Neg logit contributions
 paws-0.030
ked-0.029
ped-0.028
bit-0.028
agged-0.028
Token the
Token ID262
Feature activation-1.13e+00

Pos logit contributions
 Lib+0.039
Kat+0.039
 Elizabeth+0.038
abella+0.038
 Mary+0.038
Neg logit contributions
kin-0.050
ped-0.048
 elbow-0.045
bit-0.044
uc-0.044
Token end
Token ID886
Feature activation-2.13e+00

Pos logit contributions
abella+0.035
 Lib+0.034
 Paula+0.032
Kat+0.032
 Karen+0.032
Neg logit contributions
kin-0.017
bit-0.017
ags-0.017
ped-0.017
 paws-0.016
Token,
Token ID11
Feature activation-1.34e+00

Pos logit contributions
Kat+0.064
 Lib+0.062
aman+0.060
abella+0.060
 Heidi+0.060
Neg logit contributions
kin-0.033
kins-0.032
 paws-0.029
uc-0.029
ippers-0.028
Token Rex
Token ID17853
Feature activation-1.34e+00

Pos logit contributions
Kat+0.031
 Lib+0.029
 bakery+0.029
 university+0.029
 Elizabeth+0.028
Neg logit contributions
kin-0.034
ped-0.031
ked-0.030
kins-0.030
led-0.030
Token decided
Token ID3066
Feature activation-2.36e+00

Pos logit contributions
 Lib+0.032
 Elizabeth+0.032
 university+0.032
 Caroline+0.032
 Karen+0.031
Neg logit contributions
ped-0.031
kin-0.031
led-0.030
uc-0.028
ked-0.028
Token not
Token ID407
Feature activation-9.75e-01

Pos logit contributions
 Elizabeth+0.058
 Caroline+0.056
unt+0.056
 bakery+0.056
Kat+0.055
Neg logit contributions
ped-0.049
ked-0.045
 claws-0.042
led-0.042
 elbow-0.041
Token to
Token ID284
Feature activation+1.24e+00

Pos logit contributions
abella+0.022
 Elizabeth+0.021
 Caroline+0.021
unt+0.021
 Lib+0.020
Neg logit contributions
ped-0.024
uc-0.022
kin-0.022
ked-0.021
 fingers-0.021
Token permit
Token ID8749
Feature activation-1.23e+00

Pos logit contributions
kin+0.027
ped+0.026
uc+0.025
bit+0.024
 elbow+0.024
Neg logit contributions
 Elizabeth-0.032
 Caroline-0.031
abella-0.031
 Mary-0.031
 Lib-0.031
Token the
Token ID262
Feature activation-9.17e-01

Pos logit contributions
Kat+0.034
 bakery+0.033
str+0.032
unt+0.032
Alice+0.032
Neg logit contributions
ped-0.024
ps-0.023
bit-0.021
 toes-0.021
 fingers-0.021
Token visitor
Token ID21493
Feature activation-1.15e+00

Pos logit contributions
abella+0.022
 �+0.021
Kat+0.021
Mary+0.020
 Taking+0.020
Neg logit contributions
ped-0.020
ps-0.019
 heads-0.018
 feathers-0.018
kin-0.017
Token house
Token ID2156
Feature activation+8.14e-01

Pos logit contributions
kin+0.038
ped+0.037
uc+0.036
 elbow+0.035
kins+0.035
Neg logit contributions
 Elizabeth-0.026
 Caroline-0.026
abella-0.026
 Mary-0.026
 Lib-0.025
Token.
Token ID13
Feature activation+1.30e-01

Pos logit contributions
kin+0.018
ped+0.017
uc+0.016
 elbow+0.015
kins+0.015
Neg logit contributions
 Elizabeth-0.022
 Caroline-0.021
 Mary-0.021
 Lib-0.021
 university-0.021
Token Daddy
Token ID26926
Feature activation-2.83e+00

Pos logit contributions
kin+0.003
 shorts+0.003
 toes+0.003
 fingers+0.003
 paws+0.003
Neg logit contributions
 Elizabeth-0.003
 certain-0.003
abella-0.003
 Caroline-0.003
Kat-0.003
Token,
Token ID11
Feature activation-2.29e-01

Pos logit contributions
 Caroline+0.070
 Elizabeth+0.069
 university+0.068
 Carol+0.067
 Lib+0.066
Neg logit contributions
kin-0.067
ped-0.065
uc-0.064
led-0.062
kins-0.061
Token Mom
Token ID11254
Feature activation-3.40e+00

Pos logit contributions
 Elizabeth+0.006
abella+0.006
 Caroline+0.006
itating+0.006
 certain+0.006
Neg logit contributions
kin-0.005
kins-0.004
 elbow-0.004
ped-0.004
uc-0.004
Tokenmy
Token ID1820
Feature activation-2.21e+00

Pos logit contributions
 Caroline+0.067
 Elizabeth+0.066
 university+0.065
 Lib+0.063
abella+0.062
Neg logit contributions
kin-0.095
ped-0.095
uc-0.094
led-0.088
kins-0.088
Token,
Token ID11
Feature activation-5.39e-01

Pos logit contributions
 Caroline+0.051
 Elizabeth+0.050
 university+0.049
 Lib+0.049
 Mary+0.049
Neg logit contributions
kin-0.055
ped-0.054
uc-0.052
led-0.050
kins-0.050
Token the
Token ID262
Feature activation-2.77e-01

Pos logit contributions
 Caroline+0.012
 Elizabeth+0.012
abella+0.011
 university+0.011
 Carol+0.011
Neg logit contributions
kin-0.014
ped-0.013
uc-0.013
kins-0.012
 elbow-0.012
Token little
Token ID1310
Feature activation+9.66e-01

Pos logit contributions
 Mary+0.006
 Elizabeth+0.006
 Caroline+0.006
 university+0.006
 Alice+0.006
Neg logit contributions
kin-0.007
ped-0.007
uc-0.007
led-0.006
aces-0.006
Token boy
Token ID2933
Feature activation-2.05e+00

Pos logit contributions
kin+0.027
ped+0.026
uc+0.025
aces+0.025
agged+0.025
Neg logit contributions
 Mary-0.019
 Elizabeth-0.019
 Caroline-0.019
 Alice-0.019
 university-0.019
Token and
Token ID290
Feature activation+6.28e-01

Pos logit contributions
 Caroline+0.048
 Elizabeth+0.048
 university+0.047
 Carol+0.046
abella+0.046
Neg logit contributions
kin-0.050
uc-0.048
ped-0.048
kins-0.045
led-0.045
Token.
Token ID13
Feature activation+1.05e-01

Pos logit contributions
kin+0.027
ped+0.026
uc+0.025
led+0.024
kins+0.023
Neg logit contributions
 Elizabeth-0.031
 Caroline-0.030
 university-0.030
 Lib-0.030
 Mary-0.029
Token The
Token ID383
Feature activation-9.56e-01

Pos logit contributions
kin+0.002
ped+0.002
aces+0.002
agged+0.002
 paws+0.002
Neg logit contributions
 Elizabeth-0.002
 Lib-0.002
 Caroline-0.002
Kat-0.002
 bakery-0.002
Token dog
Token ID3290
Feature activation-9.13e-01

Pos logit contributions
 Caroline+0.022
 university+0.022
 Elizabeth+0.022
 Mary+0.022
 �+0.021
Neg logit contributions
kin-0.024
ped-0.023
uc-0.023
kins-0.021
icked-0.021
Token was
Token ID373
Feature activation-1.35e+00

Pos logit contributions
 university+0.023
 Elizabeth+0.023
 Lib+0.023
 Heidi+0.022
 Caroline+0.022
Neg logit contributions
ped-0.020
kin-0.020
led-0.019
ked-0.019
uc-0.018
Token sad
Token ID6507
Feature activation+1.11e+00

Pos logit contributions
 Elizabeth+0.031
 university+0.030
unt+0.030
 Heidi+0.029
 Caroline+0.029
Neg logit contributions
kin-0.033
ped-0.033
ps-0.030
bit-0.029
 toes-0.029
Token,
Token ID11
Feature activation-3.03e+00

Pos logit contributions
kin+0.028
ped+0.027
uc+0.026
 elbow+0.026
led+0.026
Neg logit contributions
 Elizabeth-0.025
 Caroline-0.025
 Lib-0.024
abella-0.024
Alice-0.024
Token but
Token ID475
Feature activation-1.59e+00

Pos logit contributions
abella+0.068
 university+0.068
 Elizabeth+0.067
Kat+0.066
 bakery+0.066
Neg logit contributions
ped-0.074
kin-0.069
 elbow-0.068
ked-0.068
 fingers-0.067
Token then
Token ID788
Feature activation+6.44e-01

Pos logit contributions
 university+0.036
 Elizabeth+0.035
 bakery+0.035
 Paula+0.034
unt+0.034
Neg logit contributions
kin-0.040
ped-0.039
led-0.036
 elbow-0.035
 toes-0.034
Token suddenly
Token ID6451
Feature activation+1.95e+00

Pos logit contributions
kin+0.017
ped+0.016
 elbow+0.015
uc+0.015
led+0.015
Neg logit contributions
 Elizabeth-0.014
 Lib-0.013
 university-0.013
 Caroline-0.013
 Karen-0.013
Token it
Token ID340
Feature activation+6.63e-01

Pos logit contributions
kin+0.057
ped+0.055
uc+0.054
 elbow+0.052
kins+0.052
Neg logit contributions
 Elizabeth-0.036
 Caroline-0.036
 Mary-0.036
 Lib-0.035
 university-0.035
Token heard
Token ID2982
Feature activation+1.09e+00

Pos logit contributions
kin+0.017
uc+0.017
ped+0.016
 elbow+0.016
ags+0.016
Neg logit contributions
 �-0.014
 Caroline-0.014
 Mary-0.014
abella-0.014
 Elizabeth-0.014
Token From
Token ID3574
Feature activation-1.80e-01

Pos logit contributions
 university+0.026
 Elizabeth+0.025
 foreign+0.025
 Caroline+0.025
 site+0.025
Neg logit contributions
kin-0.028
bit-0.028
 paws-0.027
 hands-0.027
ped-0.027
Token that
Token ID326
Feature activation-3.20e+00

Pos logit contributions
 Elizabeth+0.004
 Mary+0.003
 Caroline+0.003
 university+0.003
 Lib+0.003
Neg logit contributions
kin-0.005
ped-0.005
uc-0.005
led-0.005
 elbow-0.005
Token day
Token ID1110
Feature activation-1.03e-01

Pos logit contributions
abella+0.087
 Lib+0.087
 Caroline+0.086
 Karen+0.085
saw+0.085
Neg logit contributions
kin-0.069
ped-0.061
w-0.054
 hands-0.053
kins-0.053
Token on
Token ID319
Feature activation-4.79e-01

Pos logit contributions
 Mary+0.002
 Elizabeth+0.002
 Caroline+0.002
 Lib+0.002
 Alice+0.002
Neg logit contributions
kin-0.003
ped-0.002
uc-0.002
kins-0.002
agged-0.002
Token,
Token ID11
Feature activation-1.35e+00

Pos logit contributions
abella+0.011
 Lib+0.011
 Caroline+0.011
 Elizabeth+0.011
Kat+0.011
Neg logit contributions
kin-0.012
ped-0.011
kins-0.011
uc-0.010
 elbow-0.010
Token Lily
Token ID20037
Feature activation-2.52e+00

Pos logit contributions
 Elizabeth+0.033
 Caroline+0.032
 Lib+0.031
 Mary+0.031
abella+0.031
Neg logit contributions
kin-0.032
uc-0.029
ped-0.028
 beads-0.028
kins-0.028
Token always
Token ID1464
Feature activation+3.45e-01

Pos logit contributions
 Elizabeth+0.054
 Lib+0.054
 Caroline+0.054
 Mary+0.054
 Karen+0.053
Neg logit contributions
kin-0.066
ped-0.066
uc-0.062
led-0.061
bit-0.060
Token remembered
Token ID12086
Feature activation-2.15e+00

Pos logit contributions
kin+0.008
ped+0.008
uc+0.007
 elbow+0.007
kins+0.007
Neg logit contributions
 Elizabeth-0.009
 Mary-0.009
 Caroline-0.008
 Alice-0.008
 Lib-0.008
Token to
Token ID284
Feature activation-3.80e-01

Pos logit contributions
 Caroline+0.043
 Elizabeth+0.042
 Lib+0.041
abella+0.041
 Mary+0.040
Neg logit contributions
ped-0.058
kin-0.057
uc-0.057
 elbow-0.054
kins-0.053
Token be
Token ID307
Feature activation-2.57e-01

Pos logit contributions
 Elizabeth+0.009
 Caroline+0.009
 Mary+0.009
 Lib+0.009
abella+0.009
Neg logit contributions
kin-0.009
ped-0.008
uc-0.008
 elbow-0.008
bit-0.008
Token safe
Token ID3338
Feature activation+1.51e-02

Pos logit contributions
 Caroline+0.006
 Elizabeth+0.006
 Mary+0.006
 Lib+0.006
abella+0.006
Neg logit contributions
kin-0.006
ped-0.006
uc-0.006
kins-0.006
aces-0.006