TOP ACTIVATIONS
MAX = 0.088

then on Lily and the f auc et played gently ,
anymore ." The f auc et was happy again
. The f auc et started to feel
play !" The f auc et was happy and
Lily started to tease the f auc et . She would
faster she skipped until the f auc et was a blur
swimming around her . The mer maid smiled and said ,
my quickly turned off the f auc et and apologized to
L ily and the f auc et played all day
started to flow from the f auc et . It wasn
could play too . The babys itter said it was fine
Suddenly , the f auc et broke and water
not to play with the f auc et because it was
yellow button and suddenly a voice came out of nowhere and
you ." The f auc et wasn 't very
finally changed ! Everywhere the f auc et went the rainbow
came and turned off the f auc et for her .
. The babys itter told Lily that they
She noticed that the f auc et was wide open
girl and knew that the f auc et was stuck because

MIN ACTIVATIONS
MIN = -0.079

. Tim my loved going on holiday with his family .
. Tim my loved going on saf ari with his family
Lily . She loved going on holiday with her family .
Lily . She loved going on adventures in the bush with
Lily . She loved going on adventures with her family .
. Tim my loved going on pic n ics with his
. Tim my loved going on vacation because he got to
Lily . She loved going on adventures in the forest near
Lily . She loved going on adventures with her mom my
. Tim my loved going on the subway with his mom
John . He loved going on adventures and playing outside .
named Sam . Sam lived on a farm with other animals
Lily . She loved going on adventures with her dog ,
Lily . She loved going on adventures with her family .
. Tim my loved going on adventures with his regular jacket
Lily . She loved going on trips with her family .
Lily . She loved going on vacation with her family .
ily and Tom were going on a journey with Mom and
Rose . She loved going on adventures and one day ,
little girl who loved going on adventures . Today , she

INTERVAL 0.046 - 0.088
CONTAINS 0.096%

concert and Lily danced and cl apped her hands . After
What 's wrong ?" The bird softly replied between its so
. Their parents hugged them and praised them . They said
Then , a voice called out from the garden
a big box in the living room . He thought ,

INTERVAL 0.005 - 0.046
CONTAINS 37.925%

other was called Lily . They were always playing together but
" Here you go ! " The
, Jenny told her friends all about their secret adventure .
did not like dogs . They had sharp teeth and rough
So Matt said , " We should share the perfume .

INTERVAL -0.037 - 0.005
CONTAINS 61.256%

Spot was so happy , he knew he had to prepare
yes . S ue put the rock in her
bug who was very clean . He lived in the garden
" Look there is an airplane !" said the mom my
and ride the pony . They rode around the field and

INTERVAL -0.079 - -0.037
CONTAINS 0.723%

very much . <|endoftext|> L ily and Max were playing in
! He put the money in his pocket and started running
She looked angry . " This is a private party .
an idea . He hopped up on the chair , then
. She started to read it aloud . M
Token then
Token ID788
Feature activation-4.86e-03

Pos logit contributions
arl+0.037
auc+0.033
lessness+0.033
washer+0.032
udd+0.031
Neg logit contributions
 Jeffrey-0.020
 Raven-0.019
 Tab-0.016
 Pet-0.016
 Heidi-0.016
Token on
Token ID319
Feature activation-1.55e-02

Pos logit contributions
 Jeffrey+0.007
 Raven+0.007
 Tab+0.006
 Heidi+0.006
 Pet+0.006
Neg logit contributions
arl-0.012
lessness-0.010
auc-0.010
washer-0.010
udd-0.009
Token Lily
Token ID20037
Feature activation-3.44e-03

Pos logit contributions
 Jeffrey+0.029
 Raven+0.026
 Gor+0.025
 Tab+0.024
 Heidi+0.024
Neg logit contributions
arl-0.032
lessness-0.027
auc-0.026
washer-0.025
ot-0.024
Token and
Token ID290
Feature activation+1.82e-02

Pos logit contributions
 Jeffrey+0.005
 Raven+0.004
 Tab+0.004
 Pet+0.004
 Ruth+0.004
Neg logit contributions
arl-0.009
auc-0.008
lessness-0.008
washer-0.007
leys-0.006
Token the
Token ID262
Feature activation+4.79e-02

Pos logit contributions
arl+0.045
lessness+0.039
auc+0.038
washer+0.036
udd+0.035
Neg logit contributions
 Jeffrey-0.028
 Raven-0.025
 Tab-0.023
 Heidi-0.023
 Gor-0.023
Token f
Token ID277
Feature activation+8.64e-02

Pos logit contributions
arl+0.091
lessness+0.073
leys+0.073
ares+0.072
udd+0.066
Neg logit contributions
 Jeffrey-0.084
 Raven-0.080
 Pet-0.077
 Tab-0.076
 gor-0.075
Tokenauc
Token ID14272
Feature activation+6.46e-02

Pos logit contributions
arl+0.097
 die+0.092
 divorce+0.076
 forgive+0.066
 charity+0.065
Neg logit contributions
alia-0.157
 Noah-0.141
elle-0.140
 Jeffrey-0.140
sey-0.138
Tokenet
Token ID316
Feature activation+2.79e-02

Pos logit contributions
arl+0.190
lessness+0.169
washer+0.159
auc+0.151
 outage+0.150
Neg logit contributions
 Jeffrey-0.046
elle-0.034
 Gor-0.033
 Raven-0.031
 gor-0.028
Token played
Token ID2826
Feature activation+4.04e-02

Pos logit contributions
arl+0.067
lessness+0.051
auc+0.046
 die+0.046
 outage+0.046
Neg logit contributions
 Jeffrey-0.035
elle-0.032
 Raven-0.032
 Pet-0.032
 Tab-0.031
Token gently
Token ID15165
Feature activation+6.73e-03

Pos logit contributions
arl+0.113
lessness+0.092
oke+0.089
auc+0.088
washer+0.087
Neg logit contributions
 Jeffrey-0.043
 Raven-0.038
 Pet-0.033
 counting-0.032
 Sunshine-0.032
Token,
Token ID11
Feature activation-1.55e-03

Pos logit contributions
arl+0.018
lessness+0.016
auc+0.015
washer+0.015
udd+0.014
Neg logit contributions
 Jeffrey-0.008
 Raven-0.008
 Pet-0.007
 gor-0.007
 Tab-0.006
Token anymore
Token ID7471
Feature activation+8.49e-03

Pos logit contributions
arl+0.025
auc+0.022
lessness+0.022
washer+0.021
oke+0.019
Neg logit contributions
 Jeffrey-0.016
 Raven-0.016
 Tab-0.014
 Gor-0.014
 Ruth-0.013
Token."
Token ID526
Feature activation-1.49e-03

Pos logit contributions
arl+0.020
auc+0.017
lessness+0.017
washer+0.016
udd+0.015
Neg logit contributions
 Jeffrey-0.013
 Raven-0.013
 Heidi-0.011
 Gor-0.011
 Tab-0.011
Token
Token ID198
Feature activation+1.64e-02

Pos logit contributions
 Jeffrey+0.002
 Raven+0.002
 Tab+0.002
 Heidi+0.002
 Gor+0.002
Neg logit contributions
arl-0.003
lessness-0.003
auc-0.003
washer-0.003
udd-0.003
Token
Token ID198
Feature activation+1.38e-02

Pos logit contributions
arl+0.042
lessness+0.039
auc+0.039
washer+0.037
udd+0.034
Neg logit contributions
 Jeffrey-0.021
 Raven-0.020
 Heidi-0.017
 Ruth-0.016
 Tab-0.016
TokenThe
Token ID464
Feature activation+3.70e-02

Pos logit contributions
arl+0.032
lessness+0.029
auc+0.028
washer+0.027
udd+0.025
Neg logit contributions
 Jeffrey-0.022
 Raven-0.020
 Heidi-0.018
 Tab-0.018
 Pet-0.017
Token f
Token ID277
Feature activation+8.52e-02

Pos logit contributions
arl+0.081
lessness+0.072
leys+0.070
ares+0.065
udd+0.065
Neg logit contributions
 Jeffrey-0.050
 Raven-0.049
 gor-0.046
 Pet-0.045
 Gor-0.045
Tokenauc
Token ID14272
Feature activation+5.99e-02

Pos logit contributions
 die+0.093
arl+0.093
 divorce+0.072
 promise+0.068
 ruin+0.064
Neg logit contributions
alia-0.157
 Jeffrey-0.151
elle-0.148
 Noah-0.140
Jacob-0.139
Tokenet
Token ID316
Feature activation+2.50e-02

Pos logit contributions
arl+0.169
lessness+0.147
washer+0.140
auc+0.135
emic+0.132
Neg logit contributions
 Jeffrey-0.056
 Raven-0.042
elle-0.042
 Gor-0.040
 Heidi-0.038
Token was
Token ID373
Feature activation+1.43e-02

Pos logit contributions
arl+0.058
lessness+0.048
auc+0.044
washer+0.041
udd+0.039
Neg logit contributions
 Jeffrey-0.041
 Raven-0.037
 Tab-0.035
 Gor-0.034
 Pet-0.034
Token happy
Token ID3772
Feature activation+1.58e-02

Pos logit contributions
arl+0.033
auc+0.029
lessness+0.028
washer+0.028
ry+0.027
Neg logit contributions
 Jeffrey-0.022
 Raven-0.021
 Heidi-0.019
 Sunshine-0.018
 Tab-0.018
Token again
Token ID757
Feature activation+6.77e-03

Pos logit contributions
arl+0.040
auc+0.035
lessness+0.034
washer+0.033
udd+0.031
Neg logit contributions
 Jeffrey-0.022
 Raven-0.020
 Heidi-0.018
 Tab-0.018
 Gor-0.017
Token.
Token ID13
Feature activation+1.36e-02

Pos logit contributions
 Jeffrey+0.011
 Raven+0.009
 Tab+0.009
 Heidi+0.009
 Pet+0.008
Neg logit contributions
arl-0.018
auc-0.015
lessness-0.015
washer-0.014
udd-0.013
Token
Token ID220
Feature activation+5.18e-04

Pos logit contributions
arl+0.032
lessness+0.028
auc+0.028
washer+0.027
udd+0.025
Neg logit contributions
 Jeffrey-0.021
 Raven-0.020
 Tab-0.018
 Heidi-0.018
 Gor-0.017
Token
Token ID198
Feature activation+1.71e-02

Pos logit contributions
arl+0.001
lessness+0.001
auc+0.001
washer+0.001
udd+0.001
Neg logit contributions
 Jeffrey-0.001
 Raven-0.001
 Heidi-0.001
 Tab-0.001
 Pet-0.001
Token
Token ID198
Feature activation+9.52e-03

Pos logit contributions
arl+0.041
auc+0.038
lessness+0.038
washer+0.036
 die+0.033
Neg logit contributions
 Jeffrey-0.024
 Raven-0.022
 Tab-0.020
 Heidi-0.020
 Ruth-0.020
TokenThe
Token ID464
Feature activation+4.24e-02

Pos logit contributions
arl+0.021
lessness+0.018
auc+0.018
washer+0.017
udd+0.016
Neg logit contributions
 Jeffrey-0.016
 Raven-0.015
 Tab-0.014
 Heidi-0.013
 Gor-0.013
Token f
Token ID277
Feature activation+7.96e-02

Pos logit contributions
arl+0.088
lessness+0.072
leys+0.071
bs+0.069
run+0.067
Neg logit contributions
 Jeffrey-0.057
 Pet-0.056
 gor-0.056
 counting-0.055
 Gor-0.055
Tokenauc
Token ID14272
Feature activation+5.26e-02

Pos logit contributions
arl+0.080
 die+0.070
 divorce+0.051
 forgive+0.050
 promise+0.050
Neg logit contributions
 Jeffrey-0.167
alia-0.167
elle-0.162
 Noah-0.157
 Pet-0.155
Tokenet
Token ID316
Feature activation+1.84e-02

Pos logit contributions
arl+0.152
lessness+0.132
washer+0.128
auc+0.124
icker+0.118
Neg logit contributions
 Jeffrey-0.049
 Raven-0.038
 Gor-0.036
elle-0.035
 Heidi-0.032
Token started
Token ID2067
Feature activation+9.06e-04

Pos logit contributions
arl+0.042
lessness+0.034
auc+0.032
washer+0.031
udd+0.029
Neg logit contributions
 Jeffrey-0.031
 Raven-0.029
 Gor-0.026
 Tab-0.026
 Pet-0.026
Token to
Token ID284
Feature activation+2.57e-02

Pos logit contributions
arl+0.002
lessness+0.002
auc+0.002
washer+0.002
udd+0.002
Neg logit contributions
 Jeffrey-0.001
 Raven-0.001
 Tab-0.001
 Pet-0.001
 Ruth-0.001
Token feel
Token ID1254
Feature activation+3.30e-02

Pos logit contributions
arl+0.054
lessness+0.046
auc+0.044
washer+0.044
oke+0.040
Neg logit contributions
 Jeffrey-0.046
 Raven-0.044
 Pet-0.040
 Tab-0.039
 Sunshine-0.038
Token play
Token ID711
Feature activation+3.69e-02

Pos logit contributions
arl+0.077
lessness+0.067
auc+0.063
washer+0.059
udd+0.057
Neg logit contributions
 Jeffrey-0.071
 Raven-0.065
 Tab-0.061
 Gor-0.060
 Heidi-0.059
Token!"
Token ID2474
Feature activation+7.21e-03

Pos logit contributions
arl+0.097
lessness+0.083
auc+0.076
oke+0.074
washer+0.070
Neg logit contributions
 Jeffrey-0.048
 Raven-0.045
 Sunshine-0.038
 Gor-0.037
 Ruth-0.036
Token
Token ID198
Feature activation+1.47e-02

Pos logit contributions
arl+0.015
lessness+0.013
auc+0.013
washer+0.012
udd+0.012
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Tab-0.011
 Heidi-0.011
 Gor-0.011
Token
Token ID198
Feature activation+7.88e-03

Pos logit contributions
arl+0.035
lessness+0.032
auc+0.031
washer+0.030
leys+0.027
Neg logit contributions
 Jeffrey-0.022
 Raven-0.021
 Tab-0.018
 Ruth-0.018
 Pet-0.018
TokenThe
Token ID464
Feature activation+3.84e-02

Pos logit contributions
arl+0.012
lessness+0.010
auc+0.009
washer+0.009
udd+0.009
Neg logit contributions
 Jeffrey-0.019
 Raven-0.018
 Gor-0.017
 Tab-0.017
 Heidi-0.017
Token f
Token ID277
Feature activation+7.91e-02

Pos logit contributions
arl+0.080
lessness+0.067
leys+0.067
bs+0.064
ares+0.062
Neg logit contributions
 Tab-0.052
 Pet-0.051
 Jeffrey-0.051
 Gor-0.051
 Raven-0.050
Tokenauc
Token ID14272
Feature activation+5.27e-02

Pos logit contributions
arl+0.078
 die+0.067
 divorce+0.052
lessness+0.047
 forgive+0.044
Neg logit contributions
 Jeffrey-0.172
alia-0.167
elle-0.162
 Pet-0.159
 Gor-0.158
Tokenet
Token ID316
Feature activation+1.82e-02

Pos logit contributions
arl+0.148
lessness+0.130
washer+0.122
auc+0.121
udd+0.114
Neg logit contributions
 Jeffrey-0.056
 Raven-0.045
 Gor-0.042
 Heidi-0.039
 Tab-0.038
Token was
Token ID373
Feature activation+4.92e-03

Pos logit contributions
arl+0.041
lessness+0.034
auc+0.032
washer+0.030
udd+0.029
Neg logit contributions
 Jeffrey-0.031
 Raven-0.028
 Tab-0.026
 Gor-0.026
 Pet-0.026
Token happy
Token ID3772
Feature activation+9.58e-03

Pos logit contributions
arl+0.011
lessness+0.009
auc+0.009
washer+0.009
udd+0.008
Neg logit contributions
 Jeffrey-0.009
 Raven-0.008
 Tab-0.007
 Gor-0.007
 Heidi-0.007
Token and
Token ID290
Feature activation+8.08e-03

Pos logit contributions
arl+0.020
lessness+0.018
auc+0.017
washer+0.016
udd+0.015
Neg logit contributions
 Jeffrey-0.018
 Raven-0.016
 Tab-0.015
 Gor-0.015
 Heidi-0.015
Token Lily
Token ID20037
Feature activation-4.91e-03

Pos logit contributions
arl+0.042
lessness+0.037
auc+0.036
washer+0.036
udd+0.034
Neg logit contributions
 Jeffrey-0.024
 Raven-0.022
 Tab-0.020
 Heidi-0.020
 Gor-0.019
Token started
Token ID2067
Feature activation+7.42e-05

Pos logit contributions
 Jeffrey+0.009
 Raven+0.009
 Tab+0.008
 Heidi+0.008
 Gor+0.008
Neg logit contributions
arl-0.010
lessness-0.008
auc-0.008
washer-0.008
udd-0.007
Token to
Token ID284
Feature activation+1.75e-02

Pos logit contributions
arl+0.000
lessness+0.000
auc+0.000
washer+0.000
udd+0.000
Neg logit contributions
 Jeffrey-0.000
 Raven-0.000
 Tab-0.000
 Heidi-0.000
 Gor-0.000
Token tease
Token ID34691
Feature activation+1.99e-02

Pos logit contributions
arl+0.032
lessness+0.027
auc+0.026
washer+0.024
udd+0.023
Neg logit contributions
 Jeffrey-0.037
 Raven-0.034
 Tab-0.032
 Gor-0.032
 Heidi-0.032
Token the
Token ID262
Feature activation+4.37e-02

Pos logit contributions
arl+0.043
lessness+0.037
auc+0.035
ot+0.035
udd+0.033
Neg logit contributions
 Jeffrey-0.033
 Gor-0.029
 Raven-0.029
 Tab-0.028
 gor-0.028
Token f
Token ID277
Feature activation+7.91e-02

Pos logit contributions
arl+0.086
lessness+0.070
leys+0.067
bs+0.065
ares+0.062
Neg logit contributions
 Jeffrey-0.070
 Pet-0.068
 Tab-0.068
 Raven-0.066
 Gor-0.066
Tokenauc
Token ID14272
Feature activation+5.65e-02

Pos logit contributions
arl+0.077
 die+0.059
lessness+0.046
 divorce+0.044
leys+0.043
Neg logit contributions
alia-0.171
 Jeffrey-0.169
 Noah-0.162
 Gor-0.161
 Tab-0.161
Tokenet
Token ID316
Feature activation-3.98e-03

Pos logit contributions
arl+0.160
lessness+0.143
washer+0.134
auc+0.131
emic+0.125
Neg logit contributions
 Jeffrey-0.054
 Raven-0.042
 Gor-0.041
 Tab-0.036
 gor-0.036
Token.
Token ID13
Feature activation+9.88e-03

Pos logit contributions
 Jeffrey+0.006
 Raven+0.005
 Tab+0.005
 Gor+0.005
 Pet+0.005
Neg logit contributions
arl-0.010
lessness-0.008
auc-0.008
washer-0.008
udd-0.007
Token She
Token ID1375
Feature activation+2.83e-02

Pos logit contributions
arl+0.025
lessness+0.022
auc+0.022
washer+0.021
udd+0.020
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Tab-0.011
 Heidi-0.011
 Gor-0.010
Token would
Token ID561
Feature activation+6.88e-03

Pos logit contributions
arl+0.059
lessness+0.049
auc+0.047
washer+0.044
udd+0.041
Neg logit contributions
 Jeffrey-0.054
 Raven-0.050
 Tab-0.047
 Gor-0.046
 Heidi-0.045
Token faster
Token ID5443
Feature activation+8.15e-03

Pos logit contributions
arl+0.030
lessness+0.025
auc+0.025
washer+0.024
udd+0.023
Neg logit contributions
 Jeffrey-0.017
 Raven-0.015
 Tab-0.014
 Heidi-0.013
 Gor-0.013
Token she
Token ID673
Feature activation+1.15e-03

Pos logit contributions
arl+0.018
lessness+0.016
auc+0.016
washer+0.015
udd+0.014
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Gor-0.011
 Tab-0.011
 Heidi-0.011
Token skipped
Token ID26684
Feature activation+9.30e-03

Pos logit contributions
arl+0.003
lessness+0.002
auc+0.002
washer+0.002
udd+0.002
Neg logit contributions
 Jeffrey-0.002
 Raven-0.002
 Tab-0.002
 Pet-0.002
 Heidi-0.002
Token until
Token ID1566
Feature activation-4.69e-03

Pos logit contributions
arl+0.020
lessness+0.017
auc+0.016
washer+0.015
udd+0.015
Neg logit contributions
 Jeffrey-0.017
 Raven-0.016
 Gor-0.015
 Tab-0.015
 Heidi-0.014
Token the
Token ID262
Feature activation+2.72e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Tab+0.006
 Gor+0.006
 Heidi+0.006
Neg logit contributions
arl-0.011
lessness-0.009
auc-0.009
washer-0.009
udd-0.008
Token f
Token ID277
Feature activation+7.86e-02

Pos logit contributions
arl+0.057
lessness+0.048
leys+0.046
bs+0.044
udd+0.044
Neg logit contributions
 Jeffrey-0.041
 Raven-0.039
 Gor-0.038
 gor-0.037
 Tab-0.037
Tokenauc
Token ID14272
Feature activation+5.58e-02

Pos logit contributions
arl+0.084
 die+0.075
 promise+0.064
 forgive+0.063
 divorce+0.057
Neg logit contributions
alia-0.155
elle-0.151
 Jeffrey-0.149
 Noah-0.144
 gor-0.142
Tokenet
Token ID316
Feature activation+1.38e-02

Pos logit contributions
arl+0.158
lessness+0.139
washer+0.132
auc+0.130
 die+0.121
Neg logit contributions
 Jeffrey-0.057
 Raven-0.046
 Gor-0.043
 Tab-0.040
elle-0.039
Token was
Token ID373
Feature activation-2.36e-02

Pos logit contributions
arl+0.032
lessness+0.027
auc+0.025
washer+0.024
 die+0.023
Neg logit contributions
 Jeffrey-0.021
 Raven-0.019
 Tab-0.018
 Gor-0.018
 Pet-0.018
Token a
Token ID257
Feature activation+1.73e-02

Pos logit contributions
 Jeffrey+0.042
 Raven+0.038
 Tab+0.035
 Gor+0.035
 Heidi+0.035
Neg logit contributions
arl-0.051
lessness-0.044
auc-0.043
washer-0.041
udd-0.039
Token blur
Token ID23671
Feature activation+2.43e-02

Pos logit contributions
arl+0.038
auc+0.032
lessness+0.032
washer+0.029
leys+0.028
Neg logit contributions
 Jeffrey-0.030
 Raven-0.028
 Tab-0.025
 Gor-0.025
 Pet-0.025
Token swimming
Token ID14899
Feature activation+2.56e-02

Pos logit contributions
arl+0.060
lessness+0.052
auc+0.050
washer+0.048
udd+0.045
Neg logit contributions
 Jeffrey-0.049
 Raven-0.044
 Tab-0.041
 Heidi-0.041
 Gor-0.041
Token around
Token ID1088
Feature activation-4.34e-03

Pos logit contributions
arl+0.057
lessness+0.049
auc+0.048
washer+0.046
udd+0.043
Neg logit contributions
 Jeffrey-0.044
 Raven-0.040
 Tab-0.037
 Heidi-0.037
 Gor-0.036
Token her
Token ID607
Feature activation+9.67e-03

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Gor+0.007
 Tab+0.007
Alexander+0.007
Neg logit contributions
arl-0.008
lessness-0.008
auc-0.007
washer-0.006
ot-0.006
Token.
Token ID13
Feature activation+9.74e-03

Pos logit contributions
arl+0.019
lessness+0.017
auc+0.016
washer+0.016
udd+0.015
Neg logit contributions
 Jeffrey-0.018
 Raven-0.017
 Heidi-0.016
 Tab-0.016
 Gor-0.016
Token The
Token ID383
Feature activation+7.85e-03

Pos logit contributions
arl+0.021
lessness+0.019
auc+0.018
washer+0.017
udd+0.016
Neg logit contributions
 Jeffrey-0.017
 Raven-0.015
 Tab-0.014
 Heidi-0.014
 Gor-0.014
Token mer
Token ID4017
Feature activation+7.79e-02

Pos logit contributions
arl+0.015
auc+0.012
lessness+0.012
udd+0.011
washer+0.011
Neg logit contributions
 Jeffrey-0.016
 Raven-0.015
 Tab-0.014
 Pet-0.013
 Sunshine-0.013
Tokenmaid
Token ID23151
Feature activation+3.86e-02

Pos logit contributions
arl+0.150
lessness+0.130
auc+0.123
washer+0.122
udd+0.109
Neg logit contributions
 Jeffrey-0.143
 Raven-0.133
 Tab-0.126
 Gor-0.124
 Pet-0.123
Token smiled
Token ID13541
Feature activation-2.12e-03

Pos logit contributions
arl+0.083
auc+0.066
lessness+0.066
washer+0.060
udd+0.057
Neg logit contributions
 Jeffrey-0.074
 Raven-0.069
 Tab-0.064
 Gor-0.063
 Pet-0.062
Token and
Token ID290
Feature activation+9.64e-03

Pos logit contributions
 Jeffrey+0.004
 Raven+0.003
 Tab+0.003
 Heidi+0.003
 Gor+0.003
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
udd-0.004
Token said
Token ID531
Feature activation+1.65e-02

Pos logit contributions
arl+0.018
auc+0.015
lessness+0.015
washer+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.021
 Raven-0.019
 Tab-0.018
 Pet-0.018
 Gor-0.018
Token,
Token ID11
Feature activation-6.62e-03

Pos logit contributions
arl+0.037
lessness+0.032
auc+0.031
washer+0.030
udd+0.028
Neg logit contributions
 Jeffrey-0.027
 Raven-0.025
 Tab-0.023
 Gor-0.023
 Heidi-0.023
Tokenmy
Token ID1820
Feature activation+1.48e-02

Pos logit contributions
 Jeffrey+0.004
 Raven+0.004
 Tab+0.003
 Heidi+0.003
 Gor+0.003
Neg logit contributions
arl-0.006
lessness-0.005
auc-0.005
washer-0.005
udd-0.004
Token quickly
Token ID2952
Feature activation+3.18e-02

Pos logit contributions
arl+0.033
lessness+0.027
auc+0.027
washer+0.025
udd+0.023
Neg logit contributions
 Jeffrey-0.026
 Raven-0.024
 Tab-0.022
 Gor-0.022
 Heidi-0.022
Token turned
Token ID2900
Feature activation+2.21e-02

Pos logit contributions
arl+0.082
lessness+0.068
auc+0.064
washer+0.060
icker+0.059
Neg logit contributions
 Jeffrey-0.040
 Tab-0.035
 Pet-0.035
 Raven-0.033
 Heidi-0.033
Token off
Token ID572
Feature activation-1.14e-03

Pos logit contributions
arl+0.057
auc+0.047
lessness+0.047
washer+0.046
emic+0.044
Neg logit contributions
 Jeffrey-0.029
 Raven-0.026
 Heidi-0.023
 Tab-0.023
 Pet-0.023
Token the
Token ID262
Feature activation+4.27e-02

Pos logit contributions
 Jeffrey+0.001
 Raven+0.001
 Tab+0.001
 Heidi+0.001
 Gor+0.001
Neg logit contributions
arl-0.003
lessness-0.003
auc-0.003
washer-0.003
udd-0.002
Token f
Token ID277
Feature activation+7.75e-02

Pos logit contributions
arl+0.085
leys+0.078
lessness+0.073
bs+0.072
emic+0.071
Neg logit contributions
 trees-0.053
 Jeffrey-0.051
 top-0.050
 bar-0.050
 heels-0.050
Tokenauc
Token ID14272
Feature activation+6.16e-02

Pos logit contributions
arl+0.079
 die+0.072
 forgive+0.061
 divorce+0.056
leys+0.056
Neg logit contributions
alia-0.149
 Jeffrey-0.139
elle-0.139
 Noah-0.137
 counting-0.130
Tokenet
Token ID316
Feature activation-4.20e-03

Pos logit contributions
arl+0.173
lessness+0.156
washer+0.146
auc+0.142
emic+0.138
Neg logit contributions
 Jeffrey-0.056
 Raven-0.042
 Gor-0.041
elle-0.038
 Heidi-0.037
Token and
Token ID290
Feature activation+1.90e-02

Pos logit contributions
 Jeffrey+0.004
 Raven+0.003
tons+0.003
 Tab+0.003
 Pet+0.003
Neg logit contributions
arl-0.012
lessness-0.010
auc-0.009
prise-0.009
udd-0.009
Token apologized
Token ID22893
Feature activation+3.08e-02

Pos logit contributions
arl+0.049
lessness+0.039
auc+0.038
oke+0.035
washer+0.035
Neg logit contributions
 Jeffrey-0.030
 Raven-0.027
 Heidi-0.025
 Tab-0.025
 Gor-0.024
Token to
Token ID284
Feature activation+1.14e-02

Pos logit contributions
arl+0.073
auc+0.063
lessness+0.062
washer+0.058
udd+0.056
Neg logit contributions
 Jeffrey-0.047
 Raven-0.042
 Heidi-0.038
 Gor-0.038
 Tab-0.038
Token
Token ID198
Feature activation+1.44e-02

Pos logit contributions
arl+0.057
lessness+0.053
auc+0.052
washer+0.051
leys+0.045
Neg logit contributions
 Jeffrey-0.029
 Raven-0.027
 Ruth-0.024
 Pet-0.024
 Heidi-0.023
TokenL
Token ID43
Feature activation+2.41e-02

Pos logit contributions
arl+0.023
lessness+0.019
auc+0.018
washer+0.017
udd+0.016
Neg logit contributions
 Jeffrey-0.034
 Raven-0.031
 Tab-0.030
 Gor-0.029
 Heidi-0.029
Tokenily
Token ID813
Feature activation-6.86e-03

Pos logit contributions
arl+0.046
lessness+0.039
washer+0.037
auc+0.036
 die+0.034
Neg logit contributions
 Jeffrey-0.044
 Raven-0.040
 Heidi-0.038
 Tab-0.038
 Gor-0.037
Token and
Token ID290
Feature activation+6.05e-03

Pos logit contributions
 Jeffrey+0.014
 Raven+0.013
 Heidi+0.012
 Tab+0.012
 Gor+0.012
Neg logit contributions
arl-0.012
washer-0.011
auc-0.011
lessness-0.011
udd-0.010
Token the
Token ID262
Feature activation+4.29e-02

Pos logit contributions
arl+0.014
lessness+0.012
auc+0.011
washer+0.011
ot+0.011
Neg logit contributions
 Jeffrey-0.010
 Raven-0.009
 Gor-0.008
 Tab-0.008
 Heidi-0.008
Token f
Token ID277
Feature activation+7.72e-02

Pos logit contributions
arl+0.089
leys+0.074
lessness+0.072
ares+0.068
bs+0.066
Neg logit contributions
 Jeffrey-0.062
 Pet-0.062
 Tab-0.061
 Raven-0.059
 Gor-0.058
Tokenauc
Token ID14272
Feature activation+5.02e-02

Pos logit contributions
arl+0.079
 die+0.074
 divorce+0.060
 forgive+0.053
 promise+0.049
Neg logit contributions
alia-0.154
 Jeffrey-0.147
elle-0.143
 Noah-0.141
 Tab-0.140
Tokenet
Token ID316
Feature activation+1.58e-02

Pos logit contributions
arl+0.139
lessness+0.122
washer+0.115
auc+0.114
 die+0.106
Neg logit contributions
 Jeffrey-0.055
 Raven-0.044
 Gor-0.042
 Heidi-0.039
 Tab-0.039
Token played
Token ID2826
Feature activation+2.59e-02

Pos logit contributions
arl+0.037
lessness+0.029
auc+0.027
washer+0.026
 die+0.025
Neg logit contributions
 Jeffrey-0.025
 Raven-0.022
 Tab-0.022
 Pet-0.021
 Gor-0.021
Token all
Token ID477
Feature activation+9.89e-03

Pos logit contributions
arl+0.062
lessness+0.051
auc+0.050
washer+0.047
oke+0.046
Neg logit contributions
 Jeffrey-0.041
 Raven-0.037
 Pet-0.033
 Sunshine-0.032
 Tab-0.032
Token day
Token ID1110
Feature activation-1.02e-02

Pos logit contributions
arl+0.022
lessness+0.018
auc+0.018
washer+0.017
udd+0.016
Neg logit contributions
 Jeffrey-0.017
 Raven-0.016
 Tab-0.014
 Gor-0.014
 Heidi-0.014
Token started
Token ID2067
Feature activation+2.58e-03

Pos logit contributions
arl+0.085
lessness+0.070
auc+0.064
udd+0.062
washer+0.061
Neg logit contributions
 Jeffrey-0.055
 Raven-0.050
 Pet-0.046
 Tab-0.046
 Gor-0.044
Token to
Token ID284
Feature activation+1.76e-02

Pos logit contributions
arl+0.006
lessness+0.005
auc+0.005
washer+0.005
udd+0.005
Neg logit contributions
 Jeffrey-0.004
 Raven-0.003
 Tab-0.003
 Pet-0.003
 Gor-0.003
Token flow
Token ID5202
Feature activation+3.84e-02

Pos logit contributions
arl+0.040
lessness+0.034
udd+0.031
auc+0.031
oke+0.031
Neg logit contributions
 Jeffrey-0.029
 Raven-0.027
 Pet-0.025
 Tab-0.025
 Sunshine-0.024
Token from
Token ID422
Feature activation+1.75e-03

Pos logit contributions
arl+0.087
lessness+0.073
auc+0.072
washer+0.069
udd+0.065
Neg logit contributions
 Jeffrey-0.063
 Raven-0.058
 Tab-0.054
 Gor-0.053
 Pet-0.052
Token the
Token ID262
Feature activation+2.73e-02

Pos logit contributions
arl+0.004
lessness+0.003
auc+0.003
washer+0.003
ot+0.003
Neg logit contributions
 Jeffrey-0.003
 Raven-0.002
 Tab-0.002
 Heidi-0.002
 Gor-0.002
Token f
Token ID277
Feature activation+7.66e-02

Pos logit contributions
arl+0.056
lessness+0.047
auc+0.045
washer+0.043
udd+0.041
Neg logit contributions
 Jeffrey-0.051
 Raven-0.047
 Tab-0.044
 Gor-0.044
 Pet-0.043
Tokenauc
Token ID14272
Feature activation+6.13e-02

Pos logit contributions
arl+0.069
 die+0.048
 forgive+0.043
lessness+0.043
 divorce+0.038
Neg logit contributions
alia-0.177
 Jeffrey-0.174
 gor-0.164
 Pet-0.164
 Tab-0.163
Tokenet
Token ID316
Feature activation+1.07e-02

Pos logit contributions
arl+0.170
lessness+0.152
washer+0.142
auc+0.140
ares+0.132
Neg logit contributions
 Jeffrey-0.063
 Raven-0.052
 Gor-0.050
 gor-0.047
 Tab-0.045
Token.
Token ID13
Feature activation+2.23e-03

Pos logit contributions
arl+0.028
lessness+0.023
auc+0.022
washer+0.020
 Acc+0.020
Neg logit contributions
 Jeffrey-0.014
 Raven-0.013
 Tab-0.011
 Pet-0.011
 Gor-0.011
Token It
Token ID632
Feature activation+2.78e-02

Pos logit contributions
arl+0.005
lessness+0.004
auc+0.004
washer+0.004
udd+0.004
Neg logit contributions
 Jeffrey-0.004
 Raven-0.003
 Tab-0.003
 Heidi-0.003
 Gor-0.003
Token wasn
Token ID2492
Feature activation-1.26e-02

Pos logit contributions
arl+0.083
lessness+0.064
auc+0.058
udd+0.056
oke+0.055
Neg logit contributions
 Jeffrey-0.033
 Raven-0.031
 Gor-0.028
 Pet-0.028
 Ruth-0.025
Token could
Token ID714
Feature activation-4.41e-03

Pos logit contributions
 Jeffrey+0.004
 Raven+0.004
 Heidi+0.004
 Tab+0.004
 Sunshine+0.003
Neg logit contributions
arl-0.002
lessness-0.002
washer-0.002
auc-0.002
udd-0.002
Token play
Token ID711
Feature activation+1.36e-02

Pos logit contributions
 Jeffrey+0.009
 Raven+0.008
 Tab+0.008
 Gor+0.008
 Heidi+0.008
Neg logit contributions
arl-0.008
lessness-0.007
auc-0.007
washer-0.007
udd-0.006
Token too
Token ID1165
Feature activation-6.02e-03

Pos logit contributions
arl+0.031
lessness+0.026
auc+0.026
washer+0.025
udd+0.023
Neg logit contributions
 Jeffrey-0.023
 Raven-0.021
 Tab-0.019
 Heidi-0.019
 Gor-0.019
Token.
Token ID13
Feature activation+7.76e-03

Pos logit contributions
 Jeffrey+0.012
 Raven+0.011
 Tab+0.010
 Gor+0.010
 Heidi+0.010
Neg logit contributions
arl-0.012
lessness-0.010
auc-0.010
washer-0.009
udd-0.009
Token The
Token ID383
Feature activation+3.10e-03

Pos logit contributions
arl+0.017
lessness+0.014
auc+0.014
washer+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.014
 Raven-0.013
 Tab-0.012
 Heidi-0.012
 Gor-0.012
Token babys
Token ID46711
Feature activation+7.65e-02

Pos logit contributions
arl+0.008
lessness+0.007
auc+0.007
washer+0.006
udd+0.006
Neg logit contributions
 Jeffrey-0.005
 Raven-0.004
 Tab-0.004
 Gor-0.004
 Heidi-0.004
Tokenitter
Token ID1967
Feature activation+1.99e-02

Pos logit contributions
arl+0.100
lessness+0.077
 die+0.066
ulp+0.066
auc+0.065
Neg logit contributions
 Jeffrey-0.177
 Raven-0.170
 Tab-0.163
 Lib-0.161
 Ruth-0.161
Token said
Token ID531
Feature activation+1.26e-02

Pos logit contributions
arl+0.028
lessness+0.023
auc+0.022
washer+0.022
udd+0.019
Neg logit contributions
 Jeffrey-0.048
 Raven-0.045
 Tab-0.043
 Heidi-0.043
 Gor-0.042
Token it
Token ID340
Feature activation-7.08e-03

Pos logit contributions
arl+0.026
lessness+0.022
auc+0.022
washer+0.021
udd+0.020
Neg logit contributions
 Jeffrey-0.023
 Raven-0.021
 Tab-0.020
 Heidi-0.020
 Gor-0.020
Token was
Token ID373
Feature activation-1.11e-02

Pos logit contributions
 Jeffrey+0.014
 Raven+0.013
 Tab+0.013
 Heidi+0.012
 Gor+0.012
Neg logit contributions
arl-0.013
lessness-0.011
auc-0.011
washer-0.011
udd-0.009
Token fine
Token ID3734
Feature activation-9.19e-03

Pos logit contributions
 Jeffrey+0.023
 Raven+0.021
 Tab+0.020
 Gor+0.019
Alexander+0.019
Neg logit contributions
arl-0.020
lessness-0.016
auc-0.015
washer-0.014
ot-0.014
Token
Token ID198
Feature activation+1.59e-02

Pos logit contributions
arl+0.021
lessness+0.019
auc+0.019
washer+0.018
 die+0.016
Neg logit contributions
 Jeffrey-0.012
 Raven-0.010
 Heidi-0.010
 Tab-0.010
 Pet-0.009
Token
Token ID198
Feature activation+1.42e-02

Pos logit contributions
arl+0.036
lessness+0.032
auc+0.032
washer+0.031
udd+0.028
Neg logit contributions
 Jeffrey-0.026
 Raven-0.023
 Heidi-0.022
 Tab-0.021
 Pet-0.021
TokenSuddenly
Token ID38582
Feature activation+5.70e-03

Pos logit contributions
arl+0.029
lessness+0.025
auc+0.025
washer+0.023
udd+0.021
Neg logit contributions
 Jeffrey-0.026
 Raven-0.024
 Heidi-0.023
 Tab-0.022
 Gor-0.022
Token,
Token ID11
Feature activation+2.01e-02

Pos logit contributions
arl+0.011
lessness+0.010
auc+0.009
washer+0.009
ot+0.008
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Gor-0.010
 Tab-0.009
 gor-0.009
Token the
Token ID262
Feature activation+4.19e-02

Pos logit contributions
arl+0.041
lessness+0.035
auc+0.034
washer+0.033
udd+0.031
Neg logit contributions
 Jeffrey-0.038
 Raven-0.034
 Tab-0.032
 Gor-0.032
 Heidi-0.032
Token f
Token ID277
Feature activation+7.57e-02

Pos logit contributions
arl+0.081
leys+0.065
lessness+0.065
ares+0.061
udd+0.059
Neg logit contributions
 Jeffrey-0.070
 Raven-0.068
 Pet-0.066
 Tab-0.065
 Gor-0.065
Tokenauc
Token ID14272
Feature activation+5.33e-02

Pos logit contributions
arl+0.076
 die+0.062
 forgive+0.048
 promise+0.046
 divorce+0.045
Neg logit contributions
alia-0.163
 Jeffrey-0.162
elle-0.153
 Noah-0.150
 Tab-0.147
Tokenet
Token ID316
Feature activation+1.87e-02

Pos logit contributions
arl+0.151
lessness+0.133
washer+0.125
auc+0.122
udd+0.116
Neg logit contributions
 Jeffrey-0.054
 Raven-0.042
 Gor-0.040
 Heidi-0.038
elle-0.036
Token broke
Token ID6265
Feature activation+2.85e-02

Pos logit contributions
arl+0.041
lessness+0.034
auc+0.030
washer+0.029
udd+0.028
Neg logit contributions
 Jeffrey-0.032
 Raven-0.029
 Gor-0.027
 Tab-0.027
 Pet-0.027
Token and
Token ID290
Feature activation+2.61e-02

Pos logit contributions
arl+0.065
lessness+0.056
auc+0.054
washer+0.052
udd+0.049
Neg logit contributions
 Jeffrey-0.047
 Raven-0.043
 Tab-0.040
 Heidi-0.039
 Gor-0.039
Token water
Token ID1660
Feature activation+4.07e-02

Pos logit contributions
arl+0.067
auc+0.052
lessness+0.052
udd+0.050
oke+0.049
Neg logit contributions
 Jeffrey-0.037
 Raven-0.034
 Heidi-0.032
 Tab-0.031
 Ruth-0.030
Token not
Token ID407
Feature activation+8.52e-03

Pos logit contributions
arl+0.004
lessness+0.004
auc+0.004
washer+0.004
udd+0.003
Neg logit contributions
 Jeffrey-0.005
 Raven-0.005
 Tab-0.005
 Gor-0.005
 Heidi-0.005
Token to
Token ID284
Feature activation+2.36e-02

Pos logit contributions
arl+0.020
auc+0.018
lessness+0.018
washer+0.016
udd+0.016
Neg logit contributions
 Jeffrey-0.013
 Raven-0.011
 Tab-0.011
 Gor-0.010
 Heidi-0.010
Token play
Token ID711
Feature activation+2.44e-02

Pos logit contributions
arl+0.050
lessness+0.044
auc+0.042
washer+0.040
oke+0.038
Neg logit contributions
 Jeffrey-0.039
 Raven-0.036
 Tab-0.035
 Pet-0.033
tons-0.033
Token with
Token ID351
Feature activation+1.15e-04

Pos logit contributions
arl+0.067
lessness+0.058
oke+0.053
auc+0.053
leys+0.053
Neg logit contributions
 Jeffrey-0.026
 Raven-0.023
tons-0.020
 Gor-0.019
 Pet-0.019
Token the
Token ID262
Feature activation+3.65e-02

Pos logit contributions
arl+0.000
lessness+0.000
auc+0.000
washer+0.000
oke+0.000
Neg logit contributions
 Jeffrey-0.000
 Raven-0.000
 Gor-0.000
 Tab-0.000
 gor-0.000
Token f
Token ID277
Feature activation+7.54e-02

Pos logit contributions
arl+0.075
lessness+0.062
leys+0.058
bs+0.056
oke+0.055
Neg logit contributions
 Jeffrey-0.053
 gor-0.052
 Pet-0.052
 Tab-0.051
 Raven-0.051
Tokenauc
Token ID14272
Feature activation+6.00e-02

Pos logit contributions
arl+0.074
 die+0.065
 forgive+0.056
 divorce+0.052
 sank+0.052
Neg logit contributions
alia-0.153
 Jeffrey-0.136
elle-0.136
 gor-0.132
 counting-0.132
Tokenet
Token ID316
Feature activation-3.79e-03

Pos logit contributions
arl+0.162
lessness+0.146
washer+0.136
auc+0.132
 die+0.129
Neg logit contributions
 Jeffrey-0.054
elle-0.043
 Gor-0.042
 Raven-0.040
 gor-0.038
Token because
Token ID780
Feature activation+2.17e-02

Pos logit contributions
 Jeffrey+0.005
 Raven+0.005
 Gor+0.004
 Tab+0.004
tons+0.004
Neg logit contributions
arl-0.009
lessness-0.008
auc-0.007
udd-0.007
washer-0.007
Token it
Token ID340
Feature activation+3.47e-02

Pos logit contributions
arl+0.054
lessness+0.045
auc+0.044
udd+0.043
washer+0.041
Neg logit contributions
 Jeffrey-0.031
 Raven-0.028
 Heidi-0.026
 Gor-0.025
 Tab-0.025
Token was
Token ID373
Feature activation-2.64e-03

Pos logit contributions
arl+0.084
lessness+0.068
auc+0.064
udd+0.059
washer+0.059
Neg logit contributions
 Jeffrey-0.055
 Raven-0.050
 Gor-0.046
 Tab-0.045
 Pet-0.044
Token yellow
Token ID7872
Feature activation+1.49e-02

Pos logit contributions
arl+0.040
lessness+0.033
auc+0.032
washer+0.031
udd+0.029
Neg logit contributions
 Jeffrey-0.043
 Raven-0.040
 Tab-0.037
 Gor-0.037
 Heidi-0.037
Token button
Token ID4936
Feature activation+2.53e-03

Pos logit contributions
arl+0.026
lessness+0.021
auc+0.021
washer+0.019
udd+0.018
Neg logit contributions
 Jeffrey-0.033
 Raven-0.030
 Tab-0.029
 Gor-0.028
 Heidi-0.028
Token and
Token ID290
Feature activation+1.06e-02

Pos logit contributions
arl+0.007
lessness+0.006
auc+0.006
washer+0.005
udd+0.005
Neg logit contributions
 Jeffrey-0.003
 Raven-0.003
 Tab-0.003
 Heidi-0.002
 Gor-0.002
Token suddenly
Token ID6451
Feature activation-1.82e-03

Pos logit contributions
arl+0.027
auc+0.022
lessness+0.022
washer+0.020
udd+0.020
Neg logit contributions
 Jeffrey-0.017
 Raven-0.015
 Tab-0.014
 Pet-0.014
 Heidi-0.014
Token a
Token ID257
Feature activation+2.87e-02

Pos logit contributions
 Jeffrey+0.003
 Raven+0.003
 Tab+0.003
 Gor+0.003
 Heidi+0.003
Neg logit contributions
arl-0.004
lessness-0.003
auc-0.003
washer-0.003
udd-0.003
Token voice
Token ID3809
Feature activation+7.53e-02

Pos logit contributions
arl+0.059
lessness+0.050
auc+0.049
washer+0.046
udd+0.045
Neg logit contributions
 Jeffrey-0.053
 Raven-0.050
 Tab-0.046
 Gor-0.045
 Heidi-0.045
Token came
Token ID1625
Feature activation+2.74e-02

Pos logit contributions
arl+0.190
udd+0.154
lessness+0.151
auc+0.143
uddle+0.142
Neg logit contributions
 Jeffrey-0.114
 Raven-0.110
 Tab-0.103
 Pet-0.101
 Sunshine-0.101
Token out
Token ID503
Feature activation+2.55e-02

Pos logit contributions
arl+0.054
lessness+0.046
auc+0.045
washer+0.043
udd+0.040
Neg logit contributions
 Jeffrey-0.053
 Raven-0.049
 Tab-0.046
 Heidi-0.045
 Gor-0.045
Token of
Token ID286
Feature activation+9.20e-03

Pos logit contributions
arl+0.054
auc+0.046
lessness+0.046
washer+0.045
udd+0.044
Neg logit contributions
 Jeffrey-0.045
 Raven-0.042
 Tab-0.039
 Gor-0.038
 Heidi-0.038
Token nowhere
Token ID12062
Feature activation+3.09e-02

Pos logit contributions
arl+0.022
lessness+0.020
auc+0.019
washer+0.018
udd+0.018
Neg logit contributions
 Jeffrey-0.014
 Raven-0.012
 Tab-0.011
 Heidi-0.011
 Gor-0.011
Token and
Token ID290
Feature activation+3.25e-02

Pos logit contributions
arl+0.071
lessness+0.061
auc+0.059
washer+0.056
udd+0.054
Neg logit contributions
 Jeffrey-0.051
 Raven-0.047
 Tab-0.043
 Heidi-0.043
 Gor-0.042
Token you
Token ID345
Feature activation+1.99e-03

Pos logit contributions
 Jeffrey+0.011
 Raven+0.010
 Tab+0.009
 Heidi+0.009
 Gor+0.009
Neg logit contributions
arl-0.012
lessness-0.010
auc-0.010
washer-0.010
udd-0.009
Token."
Token ID526
Feature activation+3.16e-03

Pos logit contributions
arl+0.004
lessness+0.003
auc+0.003
washer+0.003
udd+0.003
Neg logit contributions
 Jeffrey-0.004
 Raven-0.003
 Tab-0.003
 Heidi-0.003
 Gor-0.003
Token↵
Token ID198
Feature activation+1.42e-02

Pos logit contributions
arl+0.006
lessness+0.006
auc+0.005
washer+0.005
udd+0.005
Neg logit contributions
 Jeffrey-0.006
 Raven-0.005
 Tab-0.005
 Gor-0.005
 Heidi-0.005
Token↵
Token ID198
Feature activation+7.68e-03

Pos logit contributions
arl+0.034
lessness+0.030
auc+0.030
washer+0.029
udd+0.026
Neg logit contributions
 Jeffrey-0.022
 Raven-0.020
 Heidi-0.018
 Tab-0.018
 Ruth-0.018
TokenThe
Token ID464
Feature activation+2.48e-02

Pos logit contributions
arl+0.017
lessness+0.014
auc+0.014
washer+0.013
udd+0.013
Neg logit contributions
 Jeffrey-0.014
 Raven-0.012
 Tab-0.011
 Heidi-0.011
 Gor-0.011
Token f
Token ID277
Feature activation+7.51e-02

Pos logit contributions
arl+0.057
lessness+0.047
leys+0.046
bs+0.045
ares+0.045
Neg logit contributions
 Jeffrey-0.033
 Gor-0.032
 gor-0.031
 Raven-0.031
 Tab-0.030
Tokenauc
Token ID14272
Feature activation+4.84e-02

Pos logit contributions
arl+0.075
 die+0.073
 divorce+0.051
 promise+0.048
 forgive+0.048
Neg logit contributions
 Jeffrey-0.158
alia-0.152
elle-0.152
 Gor-0.144
Jacob-0.144
Tokenet
Token ID316
Feature activation+4.22e-03

Pos logit contributions
arl+0.133
lessness+0.115
auc+0.110
washer+0.109
udd+0.102
Neg logit contributions
 Jeffrey-0.056
 Raven-0.046
 Gor-0.043
 Tab-0.040
 Heidi-0.040
Token wasn
Token ID2492
Feature activation-1.57e-02

Pos logit contributions
arl+0.009
lessness+0.007
auc+0.007
washer+0.007
udd+0.006
Neg logit contributions
 Jeffrey-0.008
 Raven-0.007
 Tab-0.007
 Gor-0.007
 Heidi-0.007
Token't
Token ID470
Feature activation-5.70e-03

Pos logit contributions
 Jeffrey+0.028
 Raven+0.025
 Tab+0.024
 Gor+0.023
 Heidi+0.023
Neg logit contributions
arl-0.034
lessness-0.029
auc-0.028
washer-0.027
udd-0.024
Token very
Token ID845
Feature activation+3.60e-03

Pos logit contributions
 Jeffrey+0.010
 Raven+0.009
 Tab+0.009
 Gor+0.009
 Heidi+0.009
Neg logit contributions
arl-0.012
lessness-0.010
auc-0.010
washer-0.010
udd-0.009
Token finally
Token ID3443
Feature activation-5.19e-03

Pos logit contributions
 Jeffrey+0.004
 Raven+0.004
 Tab+0.004
 Heidi+0.003
 Gor+0.003
Neg logit contributions
arl-0.006
lessness-0.005
auc-0.005
washer-0.005
udd-0.005
Token changed
Token ID3421
Feature activation+1.84e-02

Pos logit contributions
 Jeffrey+0.009
 Raven+0.008
 Tab+0.008
 Heidi+0.008
 Pet+0.008
Neg logit contributions
arl-0.011
auc-0.009
lessness-0.009
washer-0.009
udd-0.008
Token!
Token ID0
Feature activation+1.31e-02

Pos logit contributions
arl+0.041
lessness+0.035
auc+0.034
washer+0.033
udd+0.031
Neg logit contributions
 Jeffrey-0.032
 Raven-0.029
 Tab-0.027
 Gor-0.026
 Heidi-0.026
Token Everywhere
Token ID48201
Feature activation+3.86e-04

Pos logit contributions
arl+0.028
lessness+0.024
auc+0.024
washer+0.023
udd+0.021
Neg logit contributions
 Jeffrey-0.023
 Raven-0.021
 Tab-0.019
 Gor-0.019
 Heidi-0.019
Token the
Token ID262
Feature activation+2.69e-02

Pos logit contributions
arl+0.001
lessness+0.001
auc+0.001
washer+0.001
ry+0.001
Neg logit contributions
 Jeffrey-0.001
 Raven-0.001
 Gor-0.001
 Tab-0.001
 Heidi-0.001
Token f
Token ID277
Feature activation+7.43e-02

Pos logit contributions
arl+0.052
lessness+0.042
auc+0.039
udd+0.037
washer+0.036
Neg logit contributions
 Jeffrey-0.054
 Raven-0.051
 Tab-0.048
 Gor-0.048
 Pet-0.046
Tokenauc
Token ID14272
Feature activation+5.62e-02

Pos logit contributions
arl+0.086
 die+0.060
 promise+0.053
 forgive+0.051
 divorce+0.051
Neg logit contributions
 Jeffrey-0.157
alia-0.153
 Pet-0.145
 Tab-0.145
elle-0.144
Tokenet
Token ID316
Feature activation+2.58e-02

Pos logit contributions
arl+0.162
lessness+0.136
washer+0.124
auc+0.124
 die+0.122
Neg logit contributions
 Jeffrey-0.056
 Gor-0.043
 Raven-0.041
elle-0.039
 Tab-0.037
Token went
Token ID1816
Feature activation+1.52e-02

Pos logit contributions
arl+0.062
lessness+0.049
auc+0.045
washer+0.042
udd+0.042
Neg logit contributions
 Jeffrey-0.040
 Raven-0.036
 Gor-0.034
 Tab-0.034
 Pet-0.033
Token the
Token ID262
Feature activation+1.72e-02

Pos logit contributions
arl+0.033
lessness+0.029
auc+0.028
washer+0.027
udd+0.025
Neg logit contributions
 Jeffrey-0.027
 Raven-0.025
 Tab-0.023
 Gor-0.022
 Heidi-0.022
Token rainbow
Token ID27223
Feature activation+2.80e-02

Pos logit contributions
arl+0.032
lessness+0.026
auc+0.024
washer+0.023
udd+0.022
Neg logit contributions
 Jeffrey-0.035
 Raven-0.033
 Tab-0.032
 Gor-0.031
 Pet-0.030
Token came
Token ID1625
Feature activation+1.29e-02

Pos logit contributions
arl+0.059
lessness+0.048
auc+0.046
washer+0.042
udd+0.040
Neg logit contributions
 Jeffrey-0.048
 Raven-0.044
 Gor-0.041
 Tab-0.041
 Heidi-0.041
Token and
Token ID290
Feature activation+3.63e-02

Pos logit contributions
arl+0.025
lessness+0.021
auc+0.020
washer+0.020
udd+0.018
Neg logit contributions
 Jeffrey-0.026
 Raven-0.024
 Tab-0.022
 Gor-0.022
 Heidi-0.022
Token turned
Token ID2900
Feature activation+2.50e-02

Pos logit contributions
arl+0.089
lessness+0.072
auc+0.069
ares+0.062
ulp+0.062
Neg logit contributions
 Jeffrey-0.058
 Raven-0.052
 Tab-0.052
 Heidi-0.050
 Gor-0.049
Token off
Token ID572
Feature activation+1.29e-02

Pos logit contributions
arl+0.061
lessness+0.052
auc+0.051
washer+0.050
udd+0.047
Neg logit contributions
 Jeffrey-0.035
 Raven-0.032
 Tab-0.029
 Heidi-0.029
 Pet-0.028
Token the
Token ID262
Feature activation+3.83e-02

Pos logit contributions
arl+0.037
lessness+0.032
auc+0.032
washer+0.032
udd+0.031
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Tab-0.011
 Heidi-0.010
 Sunshine-0.010
Token f
Token ID277
Feature activation+7.42e-02

Pos logit contributions
arl+0.075
leys+0.069
bs+0.066
lessness+0.063
emic+0.062
Neg logit contributions
 Jeffrey-0.051
 trees-0.049
 Tab-0.048
 heels-0.048
 bar-0.048
Tokenauc
Token ID14272
Feature activation+5.92e-02

Pos logit contributions
arl+0.076
 die+0.064
 forgive+0.057
 promise+0.049
 divorce+0.049
Neg logit contributions
alia-0.153
 Jeffrey-0.149
elle-0.141
 gor-0.138
 Gor-0.137
Tokenet
Token ID316
Feature activation+4.17e-03

Pos logit contributions
arl+0.164
lessness+0.148
washer+0.136
auc+0.135
 die+0.129
Neg logit contributions
 Jeffrey-0.059
 Raven-0.046
 Gor-0.045
 Heidi-0.040
 Tab-0.040
Token for
Token ID329
Feature activation+1.51e-02

Pos logit contributions
arl+0.011
lessness+0.009
udd+0.008
prise+0.008
auc+0.008
Neg logit contributions
 Jeffrey-0.005
 Raven-0.004
 Tab-0.004
elle-0.004
 Ruth-0.004
Token her
Token ID607
Feature activation+2.56e-02

Pos logit contributions
arl+0.040
auc+0.036
lessness+0.035
washer+0.035
udd+0.033
Neg logit contributions
 Jeffrey-0.019
 Raven-0.017
 Tab-0.016
 Heidi-0.015
 Sunshine-0.015
Token.
Token ID13
Feature activation+1.04e-02

Pos logit contributions
arl+0.064
auc+0.055
lessness+0.053
leys+0.051
udd+0.050
Neg logit contributions
 Jeffrey-0.032
 Raven-0.029
 heels-0.027
 Tab-0.026
 Gor-0.026
Token.
Token ID13
Feature activation+2.22e-02

Pos logit contributions
arl+0.055
auc+0.048
lessness+0.046
washer+0.044
oke+0.042
Neg logit contributions
 Jeffrey-0.032
 Raven-0.029
 Heidi-0.026
 Tab-0.026
 Sunshine-0.026
Token␣
Token ID220
Feature activation+4.93e-03

Pos logit contributions
arl+0.057
auc+0.049
lessness+0.048
washer+0.047
 outage+0.044
Neg logit contributions
 Jeffrey-0.030
 Raven-0.027
 Heidi-0.026
 Tab-0.025
 Gor-0.025
Token↵
Token ID198
Feature activation+2.20e-02

Pos logit contributions
arl+0.012
lessness+0.011
auc+0.010
washer+0.010
udd+0.009
Neg logit contributions
 Jeffrey-0.007
 Raven-0.007
 Heidi-0.006
 Tab-0.006
 Pet-0.006
Token↵
Token ID198
Feature activation+1.77e-02

Pos logit contributions
arl+0.062
auc+0.059
lessness+0.056
washer+0.054
iness+0.053
Neg logit contributions
 Jeffrey-0.021
 Raven-0.019
 Heidi-0.016
 Tab-0.016
 Sunshine-0.015
TokenThe
Token ID464
Feature activation+3.23e-03

Pos logit contributions
arl+0.043
auc+0.038
lessness+0.037
washer+0.035
 die+0.035
Neg logit contributions
 Jeffrey-0.022
 Raven-0.020
 Heidi-0.018
Ste-0.018
 Tab-0.018
Token babys
Token ID46711
Feature activation+7.34e-02

Pos logit contributions
arl+0.007
lessness+0.006
auc+0.006
washer+0.006
udd+0.005
Neg logit contributions
 Jeffrey-0.006
 Raven-0.005
 Tab-0.005
 Gor-0.005
 Heidi-0.005
Tokenitter
Token ID1967
Feature activation+1.57e-02

Pos logit contributions
arl+0.096
lessness+0.072
 die+0.061
auc+0.060
ulp+0.060
Neg logit contributions
 Jeffrey-0.172
 Raven-0.165
 Tab-0.161
 Lib-0.159
 Ruth-0.158
Token told
Token ID1297
Feature activation+3.36e-02

Pos logit contributions
arl+0.028
lessness+0.024
auc+0.023
washer+0.022
udd+0.021
Neg logit contributions
 Jeffrey-0.033
 Raven-0.031
 Tab-0.029
 Heidi-0.029
 Gor-0.028
Token Lily
Token ID20037
Feature activation-2.18e-02

Pos logit contributions
arl+0.084
auc+0.074
lessness+0.074
washer+0.073
udd+0.067
Neg logit contributions
 Jeffrey-0.047
 Raven-0.042
 Heidi-0.040
 Tab-0.038
 Sunshine-0.037
Token that
Token ID326
Feature activation+1.66e-02

Pos logit contributions
 Jeffrey+0.042
 Raven+0.039
 Tab+0.037
 Gor+0.036
 Heidi+0.036
Neg logit contributions
arl-0.043
lessness-0.037
auc-0.035
washer-0.034
udd-0.032
Token they
Token ID484
Feature activation+2.68e-02

Pos logit contributions
arl+0.034
lessness+0.029
auc+0.029
washer+0.028
udd+0.026
Neg logit contributions
 Jeffrey-0.031
 Raven-0.028
 Tab-0.026
 Heidi-0.026
 Gor-0.026
Token↵
Token ID198
Feature activation-2.47e-02

Pos logit contributions
 Jeffrey+0.017
 Raven+0.015
 Tab+0.014
 Gor+0.014
 Heidi+0.014
Neg logit contributions
arl-0.017
lessness-0.014
auc-0.014
washer-0.013
udd-0.012
TokenShe
Token ID3347
Feature activation+2.30e-02

Pos logit contributions
 Jeffrey+0.046
 Raven+0.043
tons+0.042
 Gor+0.040
 Stanley+0.040
Neg logit contributions
arl-0.042
ot-0.034
lessness-0.034
leys-0.033
auc-0.033
Token noticed
Token ID6810
Feature activation+1.30e-02

Pos logit contributions
arl+0.047
lessness+0.040
auc+0.038
washer+0.035
udd+0.033
Neg logit contributions
 Jeffrey-0.044
 Raven-0.040
 Tab-0.038
 Gor-0.038
 Heidi-0.037
Token that
Token ID326
Feature activation+8.91e-03

Pos logit contributions
arl+0.027
lessness+0.023
auc+0.022
washer+0.021
udd+0.020
Neg logit contributions
 Jeffrey-0.024
 Raven-0.022
 Tab-0.021
 Gor-0.021
 Heidi-0.021
Token the
Token ID262
Feature activation+3.13e-02

Pos logit contributions
arl+0.018
lessness+0.017
auc+0.017
washer+0.016
ot+0.015
Neg logit contributions
 Jeffrey-0.015
 Raven-0.013
 Gor-0.012
 Tab-0.012
Alexander-0.012
Token f
Token ID277
Feature activation+7.34e-02

Pos logit contributions
arl+0.061
lessness+0.050
leys+0.046
ares+0.046
bs+0.045
Neg logit contributions
 Jeffrey-0.050
 Pet-0.050
 Tab-0.048
 Raven-0.048
 Gor-0.047
Tokenauc
Token ID14272
Feature activation+5.86e-02

Pos logit contributions
arl+0.068
 die+0.050
 forgive+0.042
 promise+0.037
 divorce+0.037
Neg logit contributions
 Jeffrey-0.169
alia-0.164
 Pet-0.160
 Tab-0.159
elle-0.157
Tokenet
Token ID316
Feature activation+1.82e-02

Pos logit contributions
arl+0.167
lessness+0.146
washer+0.137
auc+0.134
 die+0.129
Neg logit contributions
 Jeffrey-0.055
 Raven-0.043
 Gor-0.043
 Tab-0.038
 gor-0.037
Token was
Token ID373
Feature activation-2.85e-03

Pos logit contributions
arl+0.044
lessness+0.036
auc+0.034
washer+0.032
udd+0.032
Neg logit contributions
 Jeffrey-0.028
 Raven-0.025
 Tab-0.023
 Gor-0.023
 Pet-0.023
Token wide
Token ID3094
Feature activation-6.03e-03

Pos logit contributions
 Jeffrey+0.005
 Raven+0.005
 Tab+0.004
 Heidi+0.004
 Gor+0.004
Neg logit contributions
arl-0.006
lessness-0.005
auc-0.005
washer-0.005
udd-0.005
Token open
Token ID1280
Feature activation+1.22e-02

Pos logit contributions
 Jeffrey+0.011
 Raven+0.010
 Heidi+0.009
 Tab+0.009
 Gor+0.009
Neg logit contributions
arl-0.012
lessness-0.011
auc-0.011
washer-0.011
udd-0.010
Token girl
Token ID2576
Feature activation+4.98e-03

Pos logit contributions
 Jeffrey+0.022
 Raven+0.020
 Tab+0.019
 Charles+0.018
 Heidi+0.018
Neg logit contributions
arl-0.021
lessness-0.018
auc-0.018
washer-0.017
oke-0.016
Token and
Token ID290
Feature activation+7.44e-03

Pos logit contributions
arl+0.008
lessness+0.007
washer+0.007
auc+0.007
ry+0.006
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Tab-0.009
 Heidi-0.009
 Charles-0.009
Token knew
Token ID2993
Feature activation+1.41e-02

Pos logit contributions
arl+0.014
lessness+0.012
auc+0.012
washer+0.011
udd+0.010
Neg logit contributions
 Jeffrey-0.015
 Raven-0.014
 Tab-0.013
 Gor-0.013
 Heidi-0.013
Token that
Token ID326
Feature activation+2.04e-03

Pos logit contributions
arl+0.025
lessness+0.021
auc+0.020
washer+0.020
udd+0.018
Neg logit contributions
 Jeffrey-0.030
 Raven-0.027
 Tab-0.026
 Gor-0.026
 Heidi-0.025
Token the
Token ID262
Feature activation+1.90e-02

Pos logit contributions
arl+0.004
lessness+0.004
auc+0.004
washer+0.004
udd+0.003
Neg logit contributions
 Jeffrey-0.004
 Raven-0.003
 Tab-0.003
 Gor-0.003
 Heidi-0.003
Token f
Token ID277
Feature activation+7.31e-02

Pos logit contributions
arl+0.039
lessness+0.032
ares+0.029
leys+0.029
auc+0.028
Neg logit contributions
 Jeffrey-0.033
 Raven-0.031
 Gor-0.031
 Tab-0.030
 Pet-0.030
Tokenauc
Token ID14272
Feature activation+5.87e-02

Pos logit contributions
arl+0.076
 die+0.061
 forgive+0.052
 promise+0.050
 divorce+0.047
Neg logit contributions
 Jeffrey-0.145
alia-0.145
 Gor-0.139
elle-0.139
 gor-0.137
Tokenet
Token ID316
Feature activation+1.47e-02

Pos logit contributions
arl+0.169
lessness+0.147
washer+0.137
auc+0.134
icker+0.131
Neg logit contributions
 Jeffrey-0.050
 Gor-0.041
 Raven-0.038
elle-0.035
 gor-0.034
Token was
Token ID373
Feature activation-1.09e-03

Pos logit contributions
arl+0.033
lessness+0.027
auc+0.026
washer+0.024
udd+0.024
Neg logit contributions
 Jeffrey-0.025
 Raven-0.023
 Gor-0.021
 Tab-0.021
 Heidi-0.021
Token stuck
Token ID7819
Feature activation+2.41e-02

Pos logit contributions
 Jeffrey+0.002
 Raven+0.002
 Tab+0.002
 Heidi+0.002
 Gor+0.002
Neg logit contributions
arl-0.002
lessness-0.002
auc-0.002
washer-0.002
udd-0.002
Token because
Token ID780
Feature activation+4.99e-03

Pos logit contributions
arl+0.054
lessness+0.046
auc+0.046
washer+0.044
udd+0.042
Neg logit contributions
 Jeffrey-0.040
 Raven-0.036
 Tab-0.033
 Heidi-0.033
 Gor-0.033
Token.
Token ID13
Feature activation-2.32e-03

Pos logit contributions
 Jeffrey+0.030
 Raven+0.028
 Tab+0.026
 Heidi+0.026
 Gor+0.025
Neg logit contributions
arl-0.030
lessness-0.025
auc-0.025
washer-0.024
udd-0.022
Token Tim
Token ID5045
Feature activation-1.18e-02

Pos logit contributions
 Jeffrey+0.005
 Raven+0.005
 Tab+0.005
 Heidi+0.004
 Pet+0.004
Neg logit contributions
arl-0.004
lessness-0.003
auc-0.003
washer-0.003
udd-0.003
Tokenmy
Token ID1820
Feature activation-1.27e-02

Pos logit contributions
 Jeffrey+0.023
 Raven+0.021
 Tab+0.020
 Heidi+0.019
 Ruth+0.019
Neg logit contributions
arl-0.023
lessness-0.021
auc-0.020
washer-0.020
udd-0.019
Token loved
Token ID6151
Feature activation-2.14e-02

Pos logit contributions
 Jeffrey+0.030
 Raven+0.027
 Tab+0.027
 Heidi+0.026
 Ruth+0.026
Neg logit contributions
arl-0.017
washer-0.017
lessness-0.016
auc-0.016
udd-0.015
Token going
Token ID1016
Feature activation-3.81e-02

Pos logit contributions
 Jeffrey+0.046
 Raven+0.042
 Heidi+0.041
 Tab+0.040
 Pet+0.039
Neg logit contributions
arl-0.037
washer-0.030
auc-0.030
lessness-0.029
udd-0.027
Token on
Token ID319
Feature activation-7.90e-02

Pos logit contributions
 Jeffrey+0.087
 Raven+0.079
 Heidi+0.078
 Tab+0.076
 Ruth+0.073
Neg logit contributions
arl-0.064
washer-0.052
auc-0.050
lessness-0.050
udd-0.045
Token holiday
Token ID9912
Feature activation-8.69e-03

Pos logit contributions
 Jeffrey+0.140
 Raven+0.130
 Heidi+0.126
 Gor+0.121
 Pet+0.120
Neg logit contributions
arl-0.132
washer-0.118
udd-0.114
auc-0.108
lessness-0.102
Token with
Token ID351
Feature activation-1.77e-02

Pos logit contributions
 Jeffrey+0.019
 Raven+0.018
 Heidi+0.017
 Gor+0.017
 Tab+0.016
Neg logit contributions
arl-0.013
washer-0.012
udd-0.012
auc-0.011
lessness-0.010
Token his
Token ID465
Feature activation+7.20e-03

Pos logit contributions
 Jeffrey+0.027
 Raven+0.025
 Tab+0.022
 Heidi+0.022
 Gor+0.022
Neg logit contributions
arl-0.042
lessness-0.036
auc-0.036
washer-0.035
udd-0.033
Token family
Token ID1641
Feature activation+1.49e-02

Pos logit contributions
arl+0.015
lessness+0.013
auc+0.013
washer+0.012
udd+0.011
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Tab-0.011
 Gor-0.011
 Heidi-0.011
Token.
Token ID13
Feature activation+2.09e-03

Pos logit contributions
arl+0.022
washer+0.019
lessness+0.019
auc+0.017
ry+0.015
Neg logit contributions
 Jeffrey-0.036
 Raven-0.034
 Heidi-0.032
 Ruth-0.031
 Alexander-0.031
Token.
Token ID13
Feature activation+1.40e-03

Pos logit contributions
 Jeffrey+0.019
 Raven+0.017
 Tab+0.016
 Heidi+0.016
 Gor+0.016
Neg logit contributions
arl-0.019
lessness-0.016
auc-0.016
washer-0.015
udd-0.014
Token Tim
Token ID5045
Feature activation-1.08e-02

Pos logit contributions
arl+0.002
lessness+0.002
auc+0.002
washer+0.002
udd+0.002
Neg logit contributions
 Jeffrey-0.003
 Raven-0.003
 Tab-0.003
 Heidi-0.003
 Ruth-0.003
Tokenmy
Token ID1820
Feature activation-1.01e-02

Pos logit contributions
 Jeffrey+0.022
 Raven+0.020
 Tab+0.019
 Heidi+0.018
 Ruth+0.018
Neg logit contributions
arl-0.020
lessness-0.018
washer-0.018
auc-0.018
udd-0.017
Token loved
Token ID6151
Feature activation-1.97e-02

Pos logit contributions
 Jeffrey+0.025
 Raven+0.022
 Tab+0.022
 Heidi+0.022
 Ruth+0.021
Neg logit contributions
arl-0.013
washer-0.013
lessness-0.012
auc-0.012
udd-0.011
Token going
Token ID1016
Feature activation-3.48e-02

Pos logit contributions
 Jeffrey+0.042
 Raven+0.039
 Heidi+0.037
 Tab+0.036
 Pet+0.036
Neg logit contributions
arl-0.034
washer-0.028
auc-0.028
lessness-0.027
udd-0.025
Token on
Token ID319
Feature activation-7.87e-02

Pos logit contributions
 Jeffrey+0.082
 Raven+0.074
 Heidi+0.074
 Tab+0.071
 Ruth+0.069
Neg logit contributions
arl-0.056
washer-0.046
auc-0.043
lessness-0.043
udd-0.040
Token saf
Token ID1932
Feature activation-2.41e-03

Pos logit contributions
 Jeffrey+0.146
 Raven+0.135
 Heidi+0.132
 Lex+0.128
 Pet+0.128
Neg logit contributions
arl-0.116
washer-0.106
udd-0.104
auc-0.097
 divorce-0.090
Tokenari
Token ID2743
Feature activation-8.62e-03

Pos logit contributions
 Jeffrey+0.006
 Raven+0.005
 Heidi+0.005
 Tab+0.005
 Gor+0.005
Neg logit contributions
arl-0.004
auc-0.003
lessness-0.003
washer-0.003
udd-0.003
Token with
Token ID351
Feature activation-1.45e-02

Pos logit contributions
 Jeffrey+0.020
 Heidi+0.019
 Raven+0.019
 Stanley+0.018
 Lex+0.018
Neg logit contributions
washer-0.010
auc-0.010
udd-0.010
arl-0.009
lessness-0.008
Token his
Token ID465
Feature activation+6.76e-03

Pos logit contributions
 Jeffrey+0.024
 Raven+0.022
 Heidi+0.020
 Tab+0.020
 Gor+0.019
Neg logit contributions
arl-0.032
auc-0.027
lessness-0.027
washer-0.027
ot-0.025
Token family
Token ID1641
Feature activation+1.28e-02

Pos logit contributions
arl+0.014
lessness+0.012
auc+0.012
washer+0.011
udd+0.010
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Tab-0.011
 Gor-0.011
 Heidi-0.011
Token Lily
Token ID20037
Feature activation-7.79e-03

Pos logit contributions
arl+0.042
lessness+0.038
auc+0.037
washer+0.036
udd+0.034
Neg logit contributions
 Jeffrey-0.031
 Raven-0.030
 Heidi-0.029
 Tab-0.029
 Sunshine-0.028
Token.
Token ID13
Feature activation+6.28e-03

Pos logit contributions
 Jeffrey+0.014
 Raven+0.013
 Tab+0.012
 Gor+0.012
 Heidi+0.012
Neg logit contributions
arl-0.017
lessness-0.014
auc-0.014
washer-0.013
udd-0.012
Token She
Token ID1375
Feature activation-5.36e-03

Pos logit contributions
arl+0.017
lessness+0.015
auc+0.015
washer+0.014
udd+0.014
Neg logit contributions
 Jeffrey-0.008
 Raven-0.007
 Tab-0.006
 Gor-0.006
 Heidi-0.006
Token loved
Token ID6151
Feature activation-2.06e-02

Pos logit contributions
 Jeffrey+0.011
 Raven+0.010
 Tab+0.010
 Heidi+0.010
 Gor+0.009
Neg logit contributions
arl-0.010
lessness-0.009
auc-0.008
washer-0.008
udd-0.007
Token going
Token ID1016
Feature activation-3.54e-02

Pos logit contributions
 Jeffrey+0.045
 Raven+0.041
 Heidi+0.039
 Tab+0.039
 Pet+0.038
Neg logit contributions
arl-0.035
auc-0.028
lessness-0.028
washer-0.028
udd-0.025
Token on
Token ID319
Feature activation-7.82e-02

Pos logit contributions
 Jeffrey+0.084
 Heidi+0.077
 Raven+0.076
 Tab+0.074
 Eva+0.072
Neg logit contributions
arl-0.056
washer-0.045
auc-0.042
lessness-0.041
udd-0.039
Token holiday
Token ID9912
Feature activation-9.00e-03

Pos logit contributions
 Jeffrey+0.143
 Raven+0.127
 Gor+0.125
tons+0.124
 Alexander+0.123
Neg logit contributions
arl-0.119
udd-0.110
washer-0.101
auc-0.096
lessness-0.093
Token with
Token ID351
Feature activation-1.96e-02

Pos logit contributions
 Jeffrey+0.020
 Raven+0.019
 Heidi+0.018
 Gor+0.018
 Alexander+0.018
Neg logit contributions
arl-0.012
washer-0.012
udd-0.012
auc-0.010
lessness-0.010
Token her
Token ID607
Feature activation+8.92e-03

Pos logit contributions
 Jeffrey+0.031
 Raven+0.027
 Gor+0.025
 Heidi+0.024
 Pet+0.024
Neg logit contributions
arl-0.046
lessness-0.039
auc-0.038
washer-0.037
ot-0.036
Token family
Token ID1641
Feature activation+1.44e-02

Pos logit contributions
arl+0.017
lessness+0.015
auc+0.014
washer+0.013
udd+0.013
Neg logit contributions
 Jeffrey-0.018
 Raven-0.017
 Tab-0.016
 Heidi-0.015
 Gor-0.015
Token.
Token ID13
Feature activation+1.31e-02

Pos logit contributions
arl+0.020
washer+0.018
lessness+0.017
auc+0.016
ry+0.015
Neg logit contributions
 Jeffrey-0.035
 Raven-0.033
 Heidi-0.032
 Alexander-0.030
Alexander-0.030
Token Lily
Token ID20037
Feature activation-8.08e-03

Pos logit contributions
arl+0.046
lessness+0.041
auc+0.041
washer+0.040
udd+0.037
Neg logit contributions
 Jeffrey-0.030
 Raven-0.030
 Heidi-0.029
 Tab-0.029
 Sunshine-0.028
Token.
Token ID13
Feature activation+7.07e-03

Pos logit contributions
 Jeffrey+0.014
 Raven+0.013
 Tab+0.012
 Gor+0.012
 Heidi+0.012
Neg logit contributions
arl-0.018
lessness-0.015
auc-0.015
washer-0.014
udd-0.013
Token She
Token ID1375
Feature activation-3.67e-03

Pos logit contributions
arl+0.019
lessness+0.017
auc+0.017
washer+0.016
udd+0.016
Neg logit contributions
 Jeffrey-0.009
 Raven-0.008
 Tab-0.007
 Gor-0.007
 Heidi-0.006
Token loved
Token ID6151
Feature activation-1.82e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Tab+0.007
 Heidi+0.006
 Gor+0.006
Neg logit contributions
arl-0.007
lessness-0.006
auc-0.006
washer-0.005
udd-0.005
Token going
Token ID1016
Feature activation-3.63e-02

Pos logit contributions
 Jeffrey+0.040
 Raven+0.036
 Heidi+0.034
 Tab+0.034
 Gor+0.033
Neg logit contributions
arl-0.032
auc-0.025
lessness-0.025
washer-0.025
udd-0.023
Token on
Token ID319
Feature activation-7.76e-02

Pos logit contributions
 Jeffrey+0.086
 Heidi+0.078
 Raven+0.078
 Tab+0.075
 Eva+0.073
Neg logit contributions
arl-0.057
washer-0.047
auc-0.043
lessness-0.043
udd-0.040
Token adventures
Token ID17545
Feature activation-2.09e-02

Pos logit contributions
 Jeffrey+0.142
 Raven+0.125
 Gor+0.123
tons+0.122
 Alexander+0.121
Neg logit contributions
arl-0.118
udd-0.109
washer-0.100
auc-0.095
 divorce-0.093
Token in
Token ID287
Feature activation-3.88e-02

Pos logit contributions
 Jeffrey+0.054
 Tab+0.049
 Stanley+0.048
 Heidi+0.048
 Gor+0.048
Neg logit contributions
 together-0.022
washer-0.018
lessness-0.017
auc-0.015
udd-0.015
Token the
Token ID262
Feature activation-9.78e-03

Pos logit contributions
 Jeffrey+0.066
 Raven+0.060
 Gor+0.057
 Alexander+0.054
 Tab+0.054
Neg logit contributions
arl-0.081
lessness-0.069
auc-0.066
washer-0.064
ot-0.063
Token bush
Token ID24484
Feature activation+9.98e-03

Pos logit contributions
 Jeffrey+0.016
 Raven+0.014
 Tab+0.013
 Heidi+0.013
 Gor+0.013
Neg logit contributions
arl-0.023
auc-0.019
lessness-0.019
washer-0.018
udd-0.018
Token with
Token ID351
Feature activation-6.05e-03

Pos logit contributions
arl+0.020
lessness+0.017
auc+0.017
washer+0.017
udd+0.016
Neg logit contributions
 Jeffrey-0.019
 Raven-0.017
 Tab-0.016
 Heidi-0.016
 Gor-0.016
Token Lily
Token ID20037
Feature activation-1.12e-02

Pos logit contributions
arl+0.042
lessness+0.038
auc+0.038
washer+0.037
udd+0.034
Neg logit contributions
 Jeffrey-0.028
 Raven-0.028
 Heidi-0.027
 Tab-0.027
 Sunshine-0.026
Token.
Token ID13
Feature activation+4.03e-03

Pos logit contributions
 Jeffrey+0.021
 Raven+0.019
 Tab+0.018
 Gor+0.017
 Heidi+0.017
Neg logit contributions
arl-0.024
lessness-0.020
auc-0.020
washer-0.019
udd-0.018
Token She
Token ID1375
Feature activation-5.29e-03

Pos logit contributions
arl+0.011
lessness+0.009
auc+0.009
washer+0.009
udd+0.008
Neg logit contributions
 Jeffrey-0.006
 Raven-0.005
 Tab-0.004
 Gor-0.004
 Heidi-0.004
Token loved
Token ID6151
Feature activation-1.93e-02

Pos logit contributions
 Jeffrey+0.011
 Raven+0.010
 Tab+0.010
 Heidi+0.010
 Gor+0.010
Neg logit contributions
arl-0.009
lessness-0.008
auc-0.008
washer-0.008
udd-0.007
Token going
Token ID1016
Feature activation-3.73e-02

Pos logit contributions
 Jeffrey+0.042
 Raven+0.038
 Heidi+0.037
 Tab+0.036
 Pet+0.036
Neg logit contributions
arl-0.033
auc-0.026
lessness-0.026
washer-0.026
udd-0.024
Token on
Token ID319
Feature activation-7.73e-02

Pos logit contributions
 Jeffrey+0.088
 Raven+0.080
 Heidi+0.080
 Tab+0.077
 Ruth+0.075
Neg logit contributions
arl-0.059
washer-0.048
auc-0.044
lessness-0.044
udd-0.042
Token adventures
Token ID17545
Feature activation-2.20e-02

Pos logit contributions
 Jeffrey+0.143
 Raven+0.127
 Gor+0.124
 Alexander+0.122
 Pet+0.121
Neg logit contributions
arl-0.121
udd-0.109
washer-0.102
auc-0.097
lessness-0.095
Token with
Token ID351
Feature activation-2.08e-02

Pos logit contributions
 Jeffrey+0.056
 Tab+0.051
 Gor+0.051
 Stanley+0.050
 Heidi+0.050
Neg logit contributions
 together-0.023
washer-0.019
lessness-0.018
auc-0.017
udd-0.017
Token her
Token ID607
Feature activation+6.14e-03

Pos logit contributions
 Jeffrey+0.034
 Raven+0.031
 Gor+0.029
 Tab+0.027
 Pet+0.027
Neg logit contributions
arl-0.046
lessness-0.039
auc-0.039
washer-0.037
ot-0.036
Token family
Token ID1641
Feature activation+1.36e-02

Pos logit contributions
arl+0.012
lessness+0.010
auc+0.010
washer+0.009
udd+0.009
Neg logit contributions
 Jeffrey-0.012
 Raven-0.011
 Tab-0.011
 Heidi-0.011
 Gor-0.010
Token.
Token ID13
Feature activation+8.14e-03

Pos logit contributions
arl+0.015
washer+0.014
lessness+0.013
auc+0.011
ry+0.011
Neg logit contributions
 Jeffrey-0.036
 Raven-0.034
 Heidi-0.034
 Alexander-0.032
Alexander-0.032
Token.
Token ID13
Feature activation-1.69e-03

Pos logit contributions
 Jeffrey+0.020
 Raven+0.018
 Tab+0.017
 Heidi+0.017
 Gor+0.017
Neg logit contributions
arl-0.021
lessness-0.018
auc-0.017
washer-0.017
udd-0.016
Token Tim
Token ID5045
Feature activation-9.38e-03

Pos logit contributions
 Jeffrey+0.004
 Raven+0.003
 Tab+0.003
 Heidi+0.003
 Ruth+0.003
Neg logit contributions
arl-0.003
lessness-0.002
auc-0.002
washer-0.002
udd-0.002
Tokenmy
Token ID1820
Feature activation-1.01e-02

Pos logit contributions
 Jeffrey+0.018
 Raven+0.016
 Tab+0.016
 Heidi+0.015
 Ruth+0.015
Neg logit contributions
arl-0.019
lessness-0.017
auc-0.016
washer-0.016
udd-0.015
Token loved
Token ID6151
Feature activation-1.95e-02

Pos logit contributions
 Jeffrey+0.024
 Raven+0.022
 Tab+0.022
 Heidi+0.021
 Ruth+0.021
Neg logit contributions
arl-0.014
washer-0.013
lessness-0.013
auc-0.013
udd-0.012
Token going
Token ID1016
Feature activation-3.30e-02

Pos logit contributions
 Jeffrey+0.042
 Raven+0.038
 Heidi+0.036
 Tab+0.036
 Pet+0.035
Neg logit contributions
arl-0.034
washer-0.028
auc-0.028
lessness-0.027
udd-0.025
Token on
Token ID319
Feature activation-7.68e-02

Pos logit contributions
 Jeffrey+0.078
 Raven+0.071
 Heidi+0.071
 Tab+0.068
 Ruth+0.067
Neg logit contributions
arl-0.052
washer-0.043
auc-0.040
lessness-0.040
udd-0.037
Token pic
Token ID8301
Feature activation-4.28e-03

Pos logit contributions
 Jeffrey+0.141
 Raven+0.131
 Heidi+0.127
 Pet+0.123
 Gor+0.123
Neg logit contributions
arl-0.118
washer-0.107
udd-0.104
auc-0.099
 divorce-0.091
Tokenn
Token ID77
Feature activation-2.56e-02

Pos logit contributions
 Jeffrey+0.009
 Raven+0.008
 Tab+0.007
 Heidi+0.007
 Gor+0.007
Neg logit contributions
arl-0.008
lessness-0.007
auc-0.007
washer-0.006
udd-0.006
Tokenics
Token ID873
Feature activation-2.57e-02

Pos logit contributions
 Jeffrey+0.051
 Raven+0.047
 Gor+0.044
 Tab+0.044
 Heidi+0.043
Neg logit contributions
arl-0.049
lessness-0.042
auc-0.041
washer-0.040
udd-0.037
Token with
Token ID351
Feature activation-2.25e-02

Pos logit contributions
 Jeffrey+0.064
 Raven+0.062
 Gor+0.061
 Tab+0.061
 Ruth+0.061
Neg logit contributions
 together-0.024
arl-0.022
washer-0.021
udd-0.019
auc-0.018
Token his
Token ID465
Feature activation+2.07e-03

Pos logit contributions
 Jeffrey+0.040
 Raven+0.038
 Gor+0.035
 Pet+0.034
 Tab+0.034
Neg logit contributions
arl-0.044
auc-0.036
washer-0.036
lessness-0.035
ot-0.034
Token.
Token ID13
Feature activation+1.81e-03

Pos logit contributions
arl+0.016
washer+0.014
auc+0.013
lessness+0.013
udd+0.013
Neg logit contributions
 Jeffrey-0.020
 Raven-0.018
 Gor-0.017
 Tab-0.017
 Heidi-0.017
Token Tim
Token ID5045
Feature activation-1.29e-02

Pos logit contributions
arl+0.004
lessness+0.003
auc+0.003
washer+0.003
udd+0.003
Neg logit contributions
 Jeffrey-0.003
 Raven-0.003
 Tab-0.003
 Gor-0.002
 Heidi-0.002
Tokenmy
Token ID1820
Feature activation-1.32e-02

Pos logit contributions
 Jeffrey+0.024
 Raven+0.022
 Tab+0.020
 Heidi+0.020
 Gor+0.020
Neg logit contributions
arl-0.027
lessness-0.023
auc-0.023
washer-0.022
udd-0.022
Token loved
Token ID6151
Feature activation-2.40e-02

Pos logit contributions
 Jeffrey+0.026
 Raven+0.024
 Heidi+0.022
 Tab+0.022
 Gor+0.022
Neg logit contributions
arl-0.025
lessness-0.022
auc-0.021
washer-0.021
udd-0.020
Token going
Token ID1016
Feature activation-3.37e-02

Pos logit contributions
 Jeffrey+0.045
 Raven+0.042
 Heidi+0.039
 Tab+0.039
 Gor+0.038
Neg logit contributions
arl-0.047
lessness-0.039
washer-0.039
auc-0.039
udd-0.036
Token on
Token ID319
Feature activation-7.65e-02

Pos logit contributions
 Jeffrey+0.072
 Raven+0.067
 Heidi+0.064
 Tab+0.063
 Gor+0.061
Neg logit contributions
arl-0.060
washer-0.049
lessness-0.048
auc-0.048
udd-0.044
Token vacation
Token ID14600
Feature activation-7.17e-03

Pos logit contributions
 Jeffrey+0.145
 Raven+0.136
 Heidi+0.129
 Gor+0.128
 Pet+0.124
Neg logit contributions
arl-0.124
washer-0.107
udd-0.106
auc-0.103
lessness-0.100
Token because
Token ID780
Feature activation-3.63e-03

Pos logit contributions
 Jeffrey+0.015
 Raven+0.014
 Gor+0.013
 Heidi+0.013
 Tab+0.013
Neg logit contributions
arl-0.011
washer-0.011
udd-0.010
auc-0.009
lessness-0.009
Token he
Token ID339
Feature activation-1.25e-02

Pos logit contributions
 Jeffrey+0.007
 Raven+0.006
 Tab+0.006
 Heidi+0.006
 Gor+0.006
Neg logit contributions
arl-0.007
lessness-0.006
auc-0.006
washer-0.006
udd-0.005
Token got
Token ID1392
Feature activation-1.48e-02

Pos logit contributions
 Jeffrey+0.026
 Raven+0.024
 Heidi+0.024
 Tab+0.023
 Alexander+0.022
Neg logit contributions
arl-0.021
washer-0.019
lessness-0.019
auc-0.018
udd-0.016
Token to
Token ID284
Feature activation-2.78e-02

Pos logit contributions
 Jeffrey+0.030
 Raven+0.028
 Heidi+0.026
 Tab+0.026
 Gor+0.025
Neg logit contributions
arl-0.027
lessness-0.024
auc-0.023
washer-0.023
udd-0.021
Token Lily
Token ID20037
Feature activation-6.03e-03

Pos logit contributions
arl+0.040
lessness+0.036
auc+0.036
washer+0.034
udd+0.032
Neg logit contributions
 Jeffrey-0.029
 Raven-0.028
 Heidi-0.027
 Tab-0.027
 Sunshine-0.026
Token.
Token ID13
Feature activation+2.84e-03

Pos logit contributions
 Jeffrey+0.011
 Raven+0.010
 Tab+0.009
 Gor+0.009
 Heidi+0.009
Neg logit contributions
arl-0.013
lessness-0.011
auc-0.011
washer-0.010
udd-0.010
Token She
Token ID1375
Feature activation-2.26e-03

Pos logit contributions
arl+0.008
lessness+0.007
auc+0.006
washer+0.006
udd+0.006
Neg logit contributions
 Jeffrey-0.004
 Raven-0.003
 Tab-0.003
 Gor-0.003
 Heidi-0.003
Token loved
Token ID6151
Feature activation-1.86e-02

Pos logit contributions
 Jeffrey+0.005
 Raven+0.004
 Tab+0.004
 Heidi+0.004
 Gor+0.004
Neg logit contributions
arl-0.004
lessness-0.004
auc-0.003
washer-0.003
udd-0.003
Token going
Token ID1016
Feature activation-3.47e-02

Pos logit contributions
 Jeffrey+0.041
 Raven+0.037
 Heidi+0.036
 Tab+0.035
 Pet+0.034
Neg logit contributions
arl-0.031
auc-0.025
lessness-0.025
washer-0.025
udd-0.023
Token on
Token ID319
Feature activation-7.61e-02

Pos logit contributions
 Jeffrey+0.082
 Raven+0.075
 Heidi+0.075
 Tab+0.072
 Eva+0.070
Neg logit contributions
arl-0.055
washer-0.045
lessness-0.042
auc-0.042
udd-0.039
Token adventures
Token ID17545
Feature activation-2.15e-02

Pos logit contributions
 Jeffrey+0.138
 Raven+0.125
 Gor+0.119
tons+0.118
 Heidi+0.118
Neg logit contributions
arl-0.120
udd-0.110
washer-0.103
auc-0.098
lessness-0.094
Token in
Token ID287
Feature activation-3.87e-02

Pos logit contributions
 Jeffrey+0.054
 Tab+0.049
 Gor+0.049
 Stanley+0.049
 Raven+0.048
Neg logit contributions
 together-0.023
washer-0.020
lessness-0.019
auc-0.018
arl-0.017
Token the
Token ID262
Feature activation-1.06e-02

Pos logit contributions
 Jeffrey+0.067
 Raven+0.061
 Gor+0.058
 Alexander+0.055
 Tab+0.054
Neg logit contributions
arl-0.080
lessness-0.068
auc-0.065
washer-0.062
ot-0.062
Token forest
Token ID8222
Feature activation+9.83e-03

Pos logit contributions
 Jeffrey+0.017
 Raven+0.015
 Tab+0.014
 Heidi+0.014
 Gor+0.014
Neg logit contributions
arl-0.024
auc-0.021
lessness-0.021
washer-0.020
udd-0.019
Token near
Token ID1474
Feature activation-1.43e-02

Pos logit contributions
arl+0.016
lessness+0.014
auc+0.014
washer+0.014
ot+0.013
Neg logit contributions
 Jeffrey-0.022
 Raven-0.020
 Heidi-0.019
 Gor-0.019
 Alexander-0.019
Token Lily
Token ID20037
Feature activation-9.55e-03

Pos logit contributions
arl+0.034
lessness+0.030
auc+0.029
washer+0.028
udd+0.026
Neg logit contributions
 Jeffrey-0.029
 Raven-0.027
 Heidi-0.026
 Tab-0.026
 Ruth-0.025
Token.
Token ID13
Feature activation+5.46e-03

Pos logit contributions
 Jeffrey+0.018
 Raven+0.016
 Tab+0.015
 Heidi+0.015
 Gor+0.015
Neg logit contributions
arl-0.020
lessness-0.017
auc-0.017
washer-0.016
udd-0.015
Token She
Token ID1375
Feature activation-3.75e-03

Pos logit contributions
arl+0.014
lessness+0.013
auc+0.012
washer+0.012
udd+0.011
Neg logit contributions
 Jeffrey-0.007
 Raven-0.006
 Tab-0.006
 Gor-0.006
 Heidi-0.006
Token loved
Token ID6151
Feature activation-1.77e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Tab+0.007
 Heidi+0.007
 Gor+0.007
Neg logit contributions
arl-0.007
lessness-0.006
auc-0.006
washer-0.005
udd-0.005
Token going
Token ID1016
Feature activation-3.16e-02

Pos logit contributions
 Jeffrey+0.039
 Raven+0.035
 Heidi+0.034
 Tab+0.033
 Pet+0.033
Neg logit contributions
arl-0.030
auc-0.024
lessness-0.024
washer-0.024
udd-0.021
Token on
Token ID319
Feature activation-7.61e-02

Pos logit contributions
 Jeffrey+0.071
 Raven+0.065
 Heidi+0.064
 Tab+0.062
 Ruth+0.060
Neg logit contributions
arl-0.054
washer-0.043
lessness-0.042
auc-0.042
udd-0.038
Token adventures
Token ID17545
Feature activation-1.86e-02

Pos logit contributions
 Jeffrey+0.136
 Raven+0.120
 Gor+0.117
tons+0.116
 Heidi+0.115
Neg logit contributions
arl-0.121
udd-0.109
washer-0.104
auc-0.099
lessness-0.096
Token with
Token ID351
Feature activation-2.23e-02

Pos logit contributions
 Jeffrey+0.047
 Tab+0.043
 Gor+0.042
 Stanley+0.042
 Heidi+0.042
Neg logit contributions
 together-0.020
washer-0.017
lessness-0.016
auc-0.015
arl-0.015
Token her
Token ID607
Feature activation+8.44e-03

Pos logit contributions
 Jeffrey+0.036
 Raven+0.032
 Gor+0.030
 Tab+0.029
 Pet+0.029
Neg logit contributions
arl-0.049
lessness-0.042
auc-0.041
washer-0.039
ot-0.039
Token mom
Token ID1995
Feature activation+8.20e-03

Pos logit contributions
arl+0.017
lessness+0.014
auc+0.014
washer+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.017
 Raven-0.015
 Tab-0.014
 Heidi-0.014
 Gor-0.014
Tokenmy
Token ID1820
Feature activation-2.87e-03

Pos logit contributions
arl+0.015
lessness+0.013
washer+0.013
auc+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.016
 Raven-0.015
 Heidi-0.014
 Tab-0.014
 Gor-0.014
Token.
Token ID13
Feature activation+1.59e-03

Pos logit contributions
 Jeffrey+0.014
 Raven+0.012
 Tab+0.012
 Gor+0.011
 Heidi+0.011
Neg logit contributions
arl-0.015
lessness-0.013
auc-0.013
washer-0.012
udd-0.011
Token Tim
Token ID5045
Feature activation-8.12e-03

Pos logit contributions
arl+0.003
lessness+0.002
auc+0.002
washer+0.002
udd+0.002
Neg logit contributions
 Jeffrey-0.003
 Raven-0.003
 Tab-0.003
 Heidi-0.003
 Pet-0.003
Tokenmy
Token ID1820
Feature activation-9.24e-03

Pos logit contributions
 Jeffrey+0.015
 Raven+0.014
 Tab+0.013
 Heidi+0.013
 Gor+0.012
Neg logit contributions
arl-0.017
lessness-0.015
auc-0.014
washer-0.014
udd-0.013
Token loved
Token ID6151
Feature activation-1.98e-02

Pos logit contributions
 Jeffrey+0.022
 Raven+0.020
 Tab+0.020
 Heidi+0.019
 Ruth+0.019
Neg logit contributions
arl-0.013
washer-0.012
lessness-0.012
auc-0.011
udd-0.011
Token going
Token ID1016
Feature activation-3.38e-02

Pos logit contributions
 Jeffrey+0.042
 Raven+0.038
 Heidi+0.037
 Tab+0.036
 Pet+0.036
Neg logit contributions
arl-0.035
washer-0.028
auc-0.028
lessness-0.028
udd-0.025
Token on
Token ID319
Feature activation-7.56e-02

Pos logit contributions
 Jeffrey+0.079
 Heidi+0.071
 Raven+0.071
 Tab+0.069
 Ruth+0.067
Neg logit contributions
arl-0.055
washer-0.045
lessness-0.042
auc-0.042
udd-0.039
Token the
Token ID262
Feature activation-1.02e-02

Pos logit contributions
 Jeffrey+0.135
 Raven+0.124
 Heidi+0.120
 Pet+0.118
 Gor+0.117
Neg logit contributions
arl-0.123
washer-0.109
udd-0.108
auc-0.101
lessness-0.096
Token subway
Token ID19612
Feature activation-4.04e-03

Pos logit contributions
 Jeffrey+0.018
 Raven+0.017
 Heidi+0.016
 Tab+0.015
 Gor+0.015
Neg logit contributions
arl-0.021
auc-0.018
washer-0.018
lessness-0.017
udd-0.016
Token with
Token ID351
Feature activation-1.07e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Heidi+0.007
 Tab+0.007
 Gor+0.007
Neg logit contributions
arl-0.008
auc-0.007
washer-0.007
lessness-0.007
udd-0.006
Token his
Token ID465
Feature activation+1.82e-02

Pos logit contributions
 Jeffrey+0.014
 Raven+0.013
 Tab+0.011
 Heidi+0.011
 Gor+0.011
Neg logit contributions
arl-0.028
lessness-0.025
auc-0.024
washer-0.023
udd-0.022
Token mom
Token ID1995
Feature activation+1.30e-02

Pos logit contributions
arl+0.042
auc+0.037
lessness+0.036
ares+0.032
washer+0.032
Neg logit contributions
 Jeffrey-0.030
 Raven-0.028
 Gor-0.025
 Tab-0.025
 Charles-0.024
Token John
Token ID1757
Feature activation-1.56e-02

Pos logit contributions
arl+0.031
lessness+0.027
auc+0.026
washer+0.025
udd+0.023
Neg logit contributions
 Jeffrey-0.028
 Raven-0.026
 Tab-0.024
 Gor-0.024
 Heidi-0.024
Token.
Token ID13
Feature activation-2.24e-02

Pos logit contributions
 Jeffrey+0.030
 Raven+0.028
 Tab+0.026
 Heidi+0.026
 Gor+0.025
Neg logit contributions
arl-0.031
lessness-0.026
auc-0.026
washer-0.025
udd-0.023
Token He
Token ID679
Feature activation-7.80e-03

Pos logit contributions
 Raven+0.037
 Jeffrey+0.036
 Petra+0.033
 Pet+0.033
 Tab+0.032
Neg logit contributions
arl-0.044
lessness-0.038
washer-0.036
auc-0.036
udd-0.032
Token loved
Token ID6151
Feature activation-1.80e-02

Pos logit contributions
 Jeffrey+0.019
 Tab+0.017
 Heidi+0.017
 Raven+0.017
 Ruth+0.016
Neg logit contributions
arl-0.010
washer-0.009
lessness-0.009
auc-0.009
ot-0.008
Token going
Token ID1016
Feature activation-3.07e-02

Pos logit contributions
 Jeffrey+0.040
 Raven+0.036
 Heidi+0.035
 Tab+0.034
 Pet+0.033
Neg logit contributions
arl-0.030
washer-0.025
auc-0.024
lessness-0.024
udd-0.022
Token on
Token ID319
Feature activation-7.55e-02

Pos logit contributions
 Jeffrey+0.070
 Raven+0.064
 Heidi+0.063
 Tab+0.062
 Ruth+0.060
Neg logit contributions
arl-0.051
washer-0.041
lessness-0.040
auc-0.040
udd-0.036
Token adventures
Token ID17545
Feature activation-1.89e-02

Pos logit contributions
 Jeffrey+0.137
 Raven+0.129
 Heidi+0.122
 Lex+0.120
 Pet+0.120
Neg logit contributions
arl-0.116
washer-0.105
udd-0.102
auc-0.097
lessness-0.094
Token and
Token ID290
Feature activation-9.62e-03

Pos logit contributions
 Jeffrey+0.046
 Tab+0.044
 Heidi+0.044
 Lex+0.044
 Raven+0.043
Neg logit contributions
washer-0.021
 together-0.020
lessness-0.018
auc-0.018
arl-0.017
Token playing
Token ID2712
Feature activation+1.68e-03

Pos logit contributions
 Jeffrey+0.016
 Raven+0.015
 Tab+0.014
 Heidi+0.014
 Gor+0.013
Neg logit contributions
arl-0.020
lessness-0.018
washer-0.017
auc-0.017
udd-0.016
Token outside
Token ID2354
Feature activation-1.16e-02

Pos logit contributions
arl+0.003
auc+0.003
lessness+0.003
washer+0.003
udd+0.003
Neg logit contributions
 Jeffrey-0.003
 Raven-0.003
 Tab-0.003
 Heidi-0.003
 Gor-0.003
Token.
Token ID13
Feature activation-7.47e-03

Pos logit contributions
 Jeffrey+0.031
 Heidi+0.029
 Raven+0.029
asha+0.028
 Tab+0.028
Neg logit contributions
arl-0.012
washer-0.012
lessness-0.012
auc-0.011
ot-0.009
Token named
Token ID3706
Feature activation-1.63e-02

Pos logit contributions
 Jeffrey+0.012
 Heidi+0.010
 Tab+0.010
 Raven+0.010
 Alexander+0.010
Neg logit contributions
arl-0.008
lessness-0.007
washer-0.007
auc-0.006
ot-0.006
Token Sam
Token ID3409
Feature activation-7.88e-03

Pos logit contributions
 Jeffrey+0.027
 Raven+0.023
 Heidi+0.023
 Tab+0.022
 gor+0.022
Neg logit contributions
arl-0.034
washer-0.024
ot-0.024
lessness-0.024
auc-0.023
Token.
Token ID13
Feature activation-2.73e-02

Pos logit contributions
 Jeffrey+0.016
 Raven+0.014
 Tab+0.014
 Heidi+0.013
 Gor+0.013
Neg logit contributions
arl-0.015
lessness-0.013
auc-0.013
washer-0.012
udd-0.011
Token Sam
Token ID3409
Feature activation-2.48e-02

Pos logit contributions
 Jeffrey+0.055
 Raven+0.054
 Tab+0.051
 Gor+0.050
 Sunshine+0.050
Neg logit contributions
arl-0.050
lessness-0.040
auc-0.038
washer-0.037
udd-0.036
Token lived
Token ID5615
Feature activation-5.64e-03

Pos logit contributions
 Jeffrey+0.064
 Tab+0.058
 Heidi+0.058
 Raven+0.056
 Ruth+0.055
Neg logit contributions
washer-0.026
arl-0.025
lessness-0.024
udd-0.023
auc-0.022
Token on
Token ID319
Feature activation-7.54e-02

Pos logit contributions
 Jeffrey+0.011
 Raven+0.010
 Heidi+0.010
 Tab+0.010
 Gor+0.009
Neg logit contributions
arl-0.011
lessness-0.009
auc-0.009
washer-0.009
udd-0.008
Token a
Token ID257
Feature activation-1.34e-02

Pos logit contributions
 Jeffrey+0.155
 Heidi+0.144
alia+0.141
 Tab+0.140
 Christopher+0.134
Neg logit contributions
arl-0.086
udd-0.066
 bad-0.061
washer-0.061
lessness-0.059
Token farm
Token ID5318
Feature activation+5.20e-03

Pos logit contributions
 Jeffrey+0.023
 Raven+0.021
 Tab+0.020
 Heidi+0.019
 Gor+0.019
Neg logit contributions
arl-0.029
lessness-0.025
auc-0.024
washer-0.024
udd-0.023
Token with
Token ID351
Feature activation-1.31e-02

Pos logit contributions
arl+0.009
washer+0.008
lessness+0.008
auc+0.007
udd+0.007
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Heidi-0.010
 Tab-0.009
 Gor-0.009
Token other
Token ID584
Feature activation-2.41e-02

Pos logit contributions
 Jeffrey+0.020
 Raven+0.018
 Tab+0.016
 Heidi+0.016
 Gor+0.016
Neg logit contributions
arl-0.031
lessness-0.027
auc-0.027
washer-0.026
udd-0.024
Token animals
Token ID4695
Feature activation+2.22e-02

Pos logit contributions
 Jeffrey+0.041
 Heidi+0.036
 Raven+0.035
 Tab+0.034
 Lib+0.034
Neg logit contributions
arl-0.049
udd-0.044
lessness-0.042
washer-0.038
auc-0.037
Token Lily
Token ID20037
Feature activation-8.04e-03

Pos logit contributions
arl+0.039
lessness+0.035
auc+0.034
washer+0.033
udd+0.031
Neg logit contributions
 Jeffrey-0.031
 Raven-0.030
 Heidi-0.028
 Tab-0.028
 Ruth-0.027
Token.
Token ID13
Feature activation+6.51e-03

Pos logit contributions
 Jeffrey+0.015
 Raven+0.014
 Tab+0.013
 Heidi+0.013
 Gor+0.013
Neg logit contributions
arl-0.017
lessness-0.014
auc-0.014
washer-0.013
udd-0.012
Token She
Token ID1375
Feature activation-2.65e-03

Pos logit contributions
arl+0.017
lessness+0.015
auc+0.015
washer+0.014
udd+0.014
Neg logit contributions
 Jeffrey-0.009
 Raven-0.008
 Tab-0.007
 Gor-0.007
 Heidi-0.007
Token loved
Token ID6151
Feature activation-1.67e-02

Pos logit contributions
 Jeffrey+0.006
 Raven+0.005
 Tab+0.005
 Heidi+0.005
 Gor+0.005
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
udd-0.004
Token going
Token ID1016
Feature activation-3.09e-02

Pos logit contributions
 Jeffrey+0.037
 Raven+0.033
 Heidi+0.032
 Tab+0.031
 Gor+0.031
Neg logit contributions
arl-0.028
auc-0.022
lessness-0.022
washer-0.022
udd-0.020
Token on
Token ID319
Feature activation-7.50e-02

Pos logit contributions
 Jeffrey+0.072
 Raven+0.065
 Heidi+0.065
 Tab+0.063
 Ruth+0.061
Neg logit contributions
arl-0.051
washer-0.041
lessness-0.039
auc-0.038
udd-0.035
Token adventures
Token ID17545
Feature activation-1.89e-02

Pos logit contributions
 Jeffrey+0.138
 Raven+0.122
 Gor+0.121
 Alexander+0.118
tons+0.118
Neg logit contributions
arl-0.114
udd-0.102
washer-0.097
auc-0.092
 divorce-0.091
Token with
Token ID351
Feature activation-1.90e-02

Pos logit contributions
 Jeffrey+0.048
 Tab+0.043
 Gor+0.043
 Stanley+0.042
 Heidi+0.042
Neg logit contributions
 together-0.020
washer-0.017
lessness-0.017
auc-0.017
arl-0.016
Token her
Token ID607
Feature activation+1.08e-02

Pos logit contributions
 Jeffrey+0.031
 Raven+0.028
 Gor+0.026
 Tab+0.025
 Heidi+0.024
Neg logit contributions
arl-0.042
lessness-0.036
auc-0.035
washer-0.034
ot-0.034
Token dog
Token ID3290
Feature activation-6.31e-03

Pos logit contributions
arl+0.022
lessness+0.019
auc+0.018
washer+0.017
udd+0.016
Neg logit contributions
 Jeffrey-0.021
 Raven-0.019
 Tab-0.018
 Gor-0.018
 Heidi-0.017
Token,
Token ID11
Feature activation-4.59e-03

Pos logit contributions
 Jeffrey+0.013
 Heidi+0.011
 Tab+0.011
 Raven+0.011
 Charles+0.011
Neg logit contributions
arl-0.011
washer-0.009
auc-0.009
udd-0.009
lessness-0.009
Token Lily
Token ID20037
Feature activation-1.22e-02

Pos logit contributions
arl+0.036
lessness+0.033
auc+0.032
washer+0.031
udd+0.028
Neg logit contributions
 Jeffrey-0.030
 Raven-0.029
 Tab-0.027
 Heidi-0.027
 Ruth-0.027
Token.
Token ID13
Feature activation+3.71e-03

Pos logit contributions
 Jeffrey+0.023
 Raven+0.021
 Tab+0.020
 Heidi+0.020
 Gor+0.020
Neg logit contributions
arl-0.025
lessness-0.021
auc-0.021
washer-0.020
udd-0.018
Token She
Token ID1375
Feature activation-6.13e-03

Pos logit contributions
arl+0.009
lessness+0.008
auc+0.008
washer+0.007
udd+0.007
Neg logit contributions
 Jeffrey-0.006
 Raven-0.005
 Tab-0.005
 Gor-0.005
 Heidi-0.004
Token loved
Token ID6151
Feature activation-2.10e-02

Pos logit contributions
 Jeffrey+0.013
 Raven+0.012
 Tab+0.011
 Heidi+0.011
 Gor+0.011
Neg logit contributions
arl-0.011
lessness-0.010
auc-0.009
washer-0.009
udd-0.008
Token going
Token ID1016
Feature activation-3.58e-02

Pos logit contributions
 Jeffrey+0.046
 Raven+0.042
 Heidi+0.040
 Tab+0.040
 Pet+0.039
Neg logit contributions
arl-0.035
washer-0.028
auc-0.028
lessness-0.027
udd-0.025
Token on
Token ID319
Feature activation-7.47e-02

Pos logit contributions
 Jeffrey+0.084
 Raven+0.076
 Heidi+0.076
 Tab+0.073
 Ruth+0.071
Neg logit contributions
arl-0.058
washer-0.047
auc-0.044
lessness-0.044
udd-0.040
Token adventures
Token ID17545
Feature activation-2.01e-02

Pos logit contributions
 Jeffrey+0.138
 Raven+0.124
 Gor+0.121
 Alexander+0.118
 Heidi+0.118
Neg logit contributions
arl-0.123
udd-0.108
washer-0.101
auc-0.097
lessness-0.094
Token with
Token ID351
Feature activation-1.98e-02

Pos logit contributions
 Jeffrey+0.051
 Tab+0.047
 Gor+0.047
 Heidi+0.046
 Pet+0.046
Neg logit contributions
 together-0.021
washer-0.018
lessness-0.017
auc-0.017
arl-0.015
Token her
Token ID607
Feature activation+5.39e-03

Pos logit contributions
 Jeffrey+0.033
 Raven+0.030
 Gor+0.028
 Tab+0.027
 Alexander+0.026
Neg logit contributions
arl-0.043
lessness-0.036
auc-0.036
ot-0.034
washer-0.034
Token family
Token ID1641
Feature activation+1.41e-02

Pos logit contributions
arl+0.011
lessness+0.009
auc+0.009
washer+0.008
udd+0.008
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Tab-0.009
 Gor-0.009
 Heidi-0.009
Token.
Token ID13
Feature activation+7.23e-03

Pos logit contributions
arl+0.015
washer+0.014
lessness+0.013
auc+0.012
ry+0.011
Neg logit contributions
 Jeffrey-0.038
 Raven-0.035
 Heidi-0.035
 Alexander-0.033
Alexander-0.033
Token.
Token ID13
Feature activation+5.68e-04

Pos logit contributions
 Jeffrey+0.022
 Raven+0.020
 Tab+0.019
 Heidi+0.019
 Gor+0.019
Neg logit contributions
arl-0.022
lessness-0.019
auc-0.018
washer-0.017
udd-0.016
Token Tim
Token ID5045
Feature activation-1.31e-02

Pos logit contributions
arl+0.001
lessness+0.001
auc+0.001
washer+0.001
udd+0.001
Neg logit contributions
 Jeffrey-0.001
 Raven-0.001
 Tab-0.001
 Heidi-0.001
 Pet-0.001
Tokenmy
Token ID1820
Feature activation-1.59e-02

Pos logit contributions
 Jeffrey+0.025
 Raven+0.023
 Tab+0.022
 Heidi+0.021
 Gor+0.021
Neg logit contributions
arl-0.026
lessness-0.023
auc-0.022
washer-0.022
udd-0.021
Token loved
Token ID6151
Feature activation-2.11e-02

Pos logit contributions
 Jeffrey+0.039
 Raven+0.035
 Tab+0.034
 Heidi+0.034
 Ruth+0.033
Neg logit contributions
arl-0.021
washer-0.020
lessness-0.019
auc-0.018
udd-0.018
Token going
Token ID1016
Feature activation-3.46e-02

Pos logit contributions
 Jeffrey+0.047
 Raven+0.042
 Heidi+0.041
 Tab+0.040
 Pet+0.040
Neg logit contributions
arl-0.035
washer-0.028
auc-0.028
lessness-0.027
udd-0.025
Token on
Token ID319
Feature activation-7.45e-02

Pos logit contributions
 Jeffrey+0.084
 Heidi+0.076
 Raven+0.075
 Tab+0.073
 Eva+0.071
Neg logit contributions
arl-0.054
washer-0.044
auc-0.041
lessness-0.040
udd-0.037
Token adventures
Token ID17545
Feature activation-2.43e-02

Pos logit contributions
 Jeffrey+0.134
 Raven+0.122
 Heidi+0.120
 Gor+0.115
 Tab+0.114
Neg logit contributions
arl-0.125
washer-0.108
udd-0.105
auc-0.100
lessness-0.096
Token with
Token ID351
Feature activation-2.50e-02

Pos logit contributions
 Jeffrey+0.059
 Heidi+0.056
 Tab+0.056
 Lex+0.055
asha+0.055
Neg logit contributions
washer-0.026
 together-0.026
auc-0.023
lessness-0.022
arl-0.021
Token his
Token ID465
Feature activation-2.41e-03

Pos logit contributions
 Jeffrey+0.042
 Raven+0.039
 Heidi+0.035
 Tab+0.035
 Gor+0.034
Neg logit contributions
arl-0.055
auc-0.046
lessness-0.046
washer-0.045
ot-0.042
Token regular
Token ID3218
Feature activation-1.11e-02

Pos logit contributions
 Jeffrey+0.005
 Raven+0.004
 Tab+0.004
 Heidi+0.004
 Gor+0.004
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
udd-0.004
Token jacket
Token ID15224
Feature activation-1.51e-02

Pos logit contributions
 Jeffrey+0.022
 Raven+0.020
 Heidi+0.019
 Tab+0.019
 Ruth+0.018
Neg logit contributions
arl-0.022
lessness-0.018
auc-0.018
washer-0.017
udd-0.016
Token Lily
Token ID20037
Feature activation-1.18e-02

Pos logit contributions
arl+0.034
lessness+0.031
auc+0.030
washer+0.029
udd+0.027
Neg logit contributions
 Jeffrey-0.028
 Raven-0.027
 Tab-0.026
 Heidi-0.026
 Ruth-0.025
Token.
Token ID13
Feature activation+4.57e-03

Pos logit contributions
 Jeffrey+0.022
 Raven+0.020
 Tab+0.019
 Heidi+0.018
 Gor+0.018
Neg logit contributions
arl-0.025
lessness-0.021
auc-0.020
washer-0.019
udd-0.018
Token She
Token ID1375
Feature activation-6.47e-03

Pos logit contributions
arl+0.012
lessness+0.011
auc+0.010
washer+0.010
udd+0.010
Neg logit contributions
 Jeffrey-0.006
 Raven-0.005
 Tab-0.005
 Gor-0.005
 Heidi-0.005
Token loved
Token ID6151
Feature activation-1.80e-02

Pos logit contributions
 Jeffrey+0.013
 Raven+0.012
 Tab+0.012
 Heidi+0.012
 Gor+0.011
Neg logit contributions
arl-0.012
lessness-0.010
auc-0.010
washer-0.010
udd-0.009
Token going
Token ID1016
Feature activation-3.33e-02

Pos logit contributions
 Jeffrey+0.040
 Raven+0.036
 Heidi+0.034
 Tab+0.034
 Charles+0.033
Neg logit contributions
arl-0.031
auc-0.024
lessness-0.024
washer-0.024
udd-0.021
Token on
Token ID319
Feature activation-7.44e-02

Pos logit contributions
 Jeffrey+0.077
 Raven+0.069
 Heidi+0.069
 Tab+0.067
 Ruth+0.065
Neg logit contributions
arl-0.055
washer-0.045
lessness-0.042
auc-0.042
udd-0.039
Token trips
Token ID13229
Feature activation-1.56e-02

Pos logit contributions
 Jeffrey+0.136
 Raven+0.120
 Gor+0.117
 Heidi+0.115
 Alexander+0.115
Neg logit contributions
arl-0.123
udd-0.108
washer-0.103
auc-0.099
lessness-0.096
Token with
Token ID351
Feature activation-1.94e-02

Pos logit contributions
 Jeffrey+0.037
 Raven+0.033
 Alexander+0.033
 Heidi+0.033
Alexander+0.032
Neg logit contributions
arl-0.021
washer-0.020
auc-0.020
lessness-0.018
ot-0.018
Token her
Token ID607
Feature activation+6.06e-03

Pos logit contributions
 Jeffrey+0.032
 Raven+0.029
 Gor+0.027
 Heidi+0.025
 Tab+0.025
Neg logit contributions
arl-0.043
lessness-0.036
auc-0.036
ot-0.034
washer-0.034
Token family
Token ID1641
Feature activation+1.28e-02

Pos logit contributions
arl+0.011
lessness+0.009
auc+0.009
washer+0.009
udd+0.008
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Tab-0.011
 Heidi-0.011
 Gor-0.011
Token.
Token ID13
Feature activation+8.91e-03

Pos logit contributions
arl+0.016
washer+0.015
lessness+0.014
auc+0.012
ry+0.012
Neg logit contributions
 Jeffrey-0.033
 Raven-0.031
 Heidi-0.030
 Alexander-0.028
 Charles-0.028
Token Lily
Token ID20037
Feature activation-1.10e-02

Pos logit contributions
arl+0.035
lessness+0.032
auc+0.031
washer+0.030
udd+0.027
Neg logit contributions
 Jeffrey-0.029
 Raven-0.028
 Tab-0.026
 Heidi-0.026
 Ruth-0.025
Token.
Token ID13
Feature activation+4.05e-03

Pos logit contributions
 Jeffrey+0.021
 Raven+0.019
 Tab+0.018
 Heidi+0.018
 Gor+0.018
Neg logit contributions
arl-0.022
lessness-0.019
auc-0.019
washer-0.018
udd-0.017
Token She
Token ID1375
Feature activation-7.51e-03

Pos logit contributions
arl+0.010
lessness+0.009
auc+0.009
washer+0.008
udd+0.008
Neg logit contributions
 Jeffrey-0.006
 Raven-0.005
 Tab-0.005
 Gor-0.005
 Heidi-0.005
Token loved
Token ID6151
Feature activation-1.76e-02

Pos logit contributions
 Jeffrey+0.016
 Raven+0.014
 Tab+0.014
 Heidi+0.013
 Gor+0.013
Neg logit contributions
arl-0.014
lessness-0.012
auc-0.011
washer-0.011
udd-0.010
Token going
Token ID1016
Feature activation-2.90e-02

Pos logit contributions
 Jeffrey+0.039
 Raven+0.035
 Heidi+0.034
 Tab+0.033
 Pet+0.033
Neg logit contributions
arl-0.029
auc-0.023
washer-0.023
lessness-0.023
udd-0.021
Token on
Token ID319
Feature activation-7.41e-02

Pos logit contributions
 Jeffrey+0.064
 Raven+0.058
 Heidi+0.057
 Tab+0.055
 Gor+0.053
Neg logit contributions
arl-0.052
washer-0.041
lessness-0.041
auc-0.041
udd-0.037
Token vacation
Token ID14600
Feature activation+8.48e-04

Pos logit contributions
 Jeffrey+0.137
 Raven+0.122
 Gor+0.119
 Heidi+0.118
 Alexander+0.117
Neg logit contributions
arl-0.122
udd-0.107
washer-0.101
auc-0.097
lessness-0.093
Token with
Token ID351
Feature activation-1.66e-02

Pos logit contributions
arl+0.001
washer+0.001
udd+0.001
auc+0.001
lessness+0.001
Neg logit contributions
 Jeffrey-0.002
 Raven-0.002
 Gor-0.002
 Heidi-0.002
 Tab-0.002
Token her
Token ID607
Feature activation+1.40e-02

Pos logit contributions
 Jeffrey+0.028
 Raven+0.024
 Gor+0.023
 Alexander+0.022
 Heidi+0.022
Neg logit contributions
arl-0.037
auc-0.030
lessness-0.030
ot-0.029
washer-0.029
Token family
Token ID1641
Feature activation+1.83e-02

Pos logit contributions
arl+0.030
auc+0.025
lessness+0.025
washer+0.023
udd+0.022
Neg logit contributions
 Jeffrey-0.025
 Raven-0.024
 Tab-0.022
 Gor-0.022
 Heidi-0.021
Token.
Token ID13
Feature activation+1.08e-02

Pos logit contributions
arl+0.026
washer+0.022
lessness+0.022
auc+0.020
ry+0.018
Neg logit contributions
 Jeffrey-0.045
 Raven-0.042
 Heidi-0.040
 Alexander-0.038
 Gor-0.038
Tokenily
Token ID813
Feature activation-3.58e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Tab+0.007
 Heidi+0.007
 Gor+0.007
Neg logit contributions
arl-0.008
lessness-0.007
auc-0.007
washer-0.006
udd-0.006
Token and
Token ID290
Feature activation-1.03e-02

Pos logit contributions
 Jeffrey+0.078
 Raven+0.073
 Tab+0.069
 Gor+0.068
 Heidi+0.067
Neg logit contributions
arl-0.056
auc-0.050
washer-0.049
udd-0.048
lessness-0.047
Token Tom
Token ID4186
Feature activation-2.63e-02

Pos logit contributions
 Jeffrey+0.018
 Raven+0.016
 Gor+0.016
 gor+0.015
 Tab+0.015
Neg logit contributions
arl-0.020
lessness-0.017
ot-0.017
bs-0.016
auc-0.016
Token were
Token ID547
Feature activation-1.79e-02

Pos logit contributions
 Jeffrey+0.065
 Raven+0.062
 Heidi+0.057
 Tab+0.057
 Ruth+0.056
Neg logit contributions
arl-0.035
auc-0.029
lessness-0.028
washer-0.028
udd-0.027
Token going
Token ID1016
Feature activation-2.29e-02

Pos logit contributions
 Jeffrey+0.034
 Raven+0.032
 Tab+0.029
 Gor+0.029
 Heidi+0.029
Neg logit contributions
arl-0.037
lessness-0.031
auc-0.030
washer-0.029
udd-0.028
Token on
Token ID319
Feature activation-7.40e-02

Pos logit contributions
 Jeffrey+0.045
 Raven+0.041
 Tab+0.038
 Heidi+0.038
 Gor+0.038
Neg logit contributions
arl-0.046
lessness-0.038
auc-0.037
washer-0.036
udd-0.034
Token a
Token ID257
Feature activation-1.10e-02

Pos logit contributions
 Jeffrey+0.144
 Gor+0.140
 Raven+0.138
 Heidi+0.128
 Pet+0.128
Neg logit contributions
arl-0.097
washer-0.086
 together-0.085
udd-0.081
auc-0.079
Token journey
Token ID7002
Feature activation+4.08e-03

Pos logit contributions
 Jeffrey+0.021
 Raven+0.019
 Tab+0.018
 Gor+0.017
 Heidi+0.017
Neg logit contributions
arl-0.023
lessness-0.019
auc-0.019
washer-0.018
udd-0.017
Token with
Token ID351
Feature activation-1.13e-02

Pos logit contributions
arl+0.007
washer+0.006
lessness+0.006
auc+0.006
ot+0.005
Neg logit contributions
 Jeffrey-0.009
 Raven-0.009
 Gor-0.008
 Heidi-0.008
 Tab-0.008
Token Mom
Token ID11254
Feature activation-2.00e-02

Pos logit contributions
 Jeffrey+0.023
 Raven+0.022
 Gor+0.020
 Tab+0.020
 Heidi+0.020
Neg logit contributions
arl-0.019
auc-0.016
lessness-0.016
washer-0.015
ot-0.015
Token and
Token ID290
Feature activation-1.89e-02

Pos logit contributions
 Jeffrey+0.042
 Raven+0.041
 Heidi+0.038
 Pet+0.037
 Tab+0.036
Neg logit contributions
arl-0.034
washer-0.031
lessness-0.029
auc-0.028
udd-0.026
Token Rose
Token ID8049
Feature activation+4.41e-03

Pos logit contributions
arl+0.036
lessness+0.032
auc+0.031
washer+0.030
udd+0.027
Neg logit contributions
 Jeffrey-0.030
 Raven-0.029
 Tab-0.028
 Heidi-0.028
 Ruth-0.027
Token.
Token ID13
Feature activation-8.47e-03

Pos logit contributions
arl+0.010
lessness+0.009
auc+0.009
washer+0.008
udd+0.007
Neg logit contributions
 Jeffrey-0.007
 Raven-0.007
 Tab-0.006
 Gor-0.006
 Heidi-0.006
Token She
Token ID1375
Feature activation-4.78e-03

Pos logit contributions
 Jeffrey+0.012
 Raven+0.011
 Tab+0.010
 Gor+0.010
 Heidi+0.010
Neg logit contributions
arl-0.021
lessness-0.019
auc-0.018
washer-0.018
udd-0.017
Token loved
Token ID6151
Feature activation-2.01e-02

Pos logit contributions
 Jeffrey+0.011
 Raven+0.010
 Tab+0.009
 Heidi+0.009
 Gor+0.009
Neg logit contributions
arl-0.008
lessness-0.007
washer-0.006
auc-0.006
udd-0.006
Token going
Token ID1016
Feature activation-3.53e-02

Pos logit contributions
 Jeffrey+0.045
 Raven+0.041
 Heidi+0.040
 Tab+0.039
 Pet+0.038
Neg logit contributions
arl-0.032
washer-0.025
auc-0.025
lessness-0.025
udd-0.023
Token on
Token ID319
Feature activation-7.40e-02

Pos logit contributions
 Jeffrey+0.084
 Heidi+0.077
 Raven+0.076
 Tab+0.074
 Eva+0.072
Neg logit contributions
arl-0.055
washer-0.045
lessness-0.042
auc-0.041
udd-0.038
Token adventures
Token ID17545
Feature activation-2.11e-02

Pos logit contributions
 Jeffrey+0.134
 Raven+0.121
 Gor+0.116
 Heidi+0.114
 Alexander+0.114
Neg logit contributions
arl-0.119
udd-0.107
washer-0.101
auc-0.096
lessness-0.094
Token and
Token ID290
Feature activation-1.64e-02

Pos logit contributions
 Jeffrey+0.052
 Gor+0.048
 Tab+0.048
 Heidi+0.047
Alexander+0.047
Neg logit contributions
 together-0.022
washer-0.019
lessness-0.019
auc-0.018
arl-0.017
Token one
Token ID530
Feature activation+3.20e-03

Pos logit contributions
 Jeffrey+0.028
 Raven+0.026
Alexander+0.024
 Heidi+0.024
 Gor+0.023
Neg logit contributions
arl-0.031
lessness-0.029
washer-0.027
auc-0.026
udd-0.026
Token day
Token ID1110
Feature activation-1.47e-02

Pos logit contributions
arl+0.006
lessness+0.005
auc+0.005
washer+0.005
udd+0.004
Neg logit contributions
 Jeffrey-0.007
 Raven-0.006
 Tab-0.006
 Heidi-0.006
 Gor-0.006
Token,
Token ID11
Feature activation-3.08e-03

Pos logit contributions
 Jeffrey+0.028
 Raven+0.026
 Gor+0.024
Alexander+0.024
 Alexander+0.023
Neg logit contributions
arl-0.029
lessness-0.026
auc-0.024
ot-0.023
washer-0.023
Token little
Token ID1310
Feature activation-5.21e-04

Pos logit contributions
arl+0.000
lessness+0.000
auc+0.000
washer+0.000
udd+0.000
Neg logit contributions
 Jeffrey-0.001
 Raven-0.000
 Tab-0.000
 Heidi-0.000
 Gor-0.000
Token girl
Token ID2576
Feature activation+4.90e-03

Pos logit contributions
 Jeffrey+0.001
 Raven+0.001
 Tab+0.001
 Heidi+0.001
 Charles+0.001
Neg logit contributions
arl-0.001
lessness-0.001
auc-0.001
washer-0.001
ot-0.001
Token who
Token ID508
Feature activation+7.80e-03

Pos logit contributions
arl+0.010
lessness+0.008
auc+0.008
washer+0.008
udd+0.007
Neg logit contributions
 Jeffrey-0.009
 Raven-0.009
 Tab-0.008
 Heidi-0.008
 Gor-0.008
Token loved
Token ID6151
Feature activation-8.79e-03

Pos logit contributions
arl+0.013
lessness+0.012
auc+0.011
washer+0.011
ot+0.010
Neg logit contributions
 Jeffrey-0.017
 Raven-0.015
 Tab-0.015
 Heidi-0.014
 Alexander-0.014
Token going
Token ID1016
Feature activation-2.92e-02

Pos logit contributions
 Jeffrey+0.019
 Raven+0.017
 Heidi+0.016
 Tab+0.016
 Gor+0.016
Neg logit contributions
arl-0.016
lessness-0.013
auc-0.013
washer-0.012
udd-0.011
Token on
Token ID319
Feature activation-7.40e-02

Pos logit contributions
 Jeffrey+0.065
 Raven+0.060
 Heidi+0.058
 Tab+0.057
 Ruth+0.055
Neg logit contributions
arl-0.051
washer-0.041
lessness-0.040
auc-0.040
udd-0.036
Token adventures
Token ID17545
Feature activation-1.11e-02

Pos logit contributions
 Jeffrey+0.145
 Raven+0.129
 Heidi+0.125
 Alexander+0.125
 Gor+0.125
Neg logit contributions
arl-0.099
washer-0.090
udd-0.089
auc-0.085
lessness-0.084
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
 Jeffrey+0.029
 Tab+0.026
 Raven+0.026
 Gor+0.026
 Heidi+0.026
Neg logit contributions
arl-0.011
lessness-0.010
washer-0.010
auc-0.010
 together-0.010
Token Today
Token ID6288
Feature activation+7.44e-04

Pos logit contributions
 Jeffrey+0.020
 Raven+0.018
 Gor+0.017
Charles+0.017
 gor+0.017
Neg logit contributions
arl-0.022
lessness-0.018
auc-0.017
ot-0.017
udd-0.016
Token,
Token ID11
Feature activation-9.22e-03

Pos logit contributions
arl+0.001
lessness+0.001
auc+0.001
washer+0.001
udd+0.001
Neg logit contributions
 Jeffrey-0.001
 Raven-0.001
 Gor-0.001
 Tab-0.001
 Alexander-0.001
Token she
Token ID673
Feature activation-2.19e-02

Pos logit contributions
 Jeffrey+0.018
 Gor+0.016
 Raven+0.016
Alexander+0.015
 Alexander+0.015
Neg logit contributions
arl-0.016
lessness-0.015
washer-0.013
auc-0.013
ot-0.013
Token concert
Token ID10010
Feature activation+2.36e-02

Pos logit contributions
arl+0.022
lessness+0.020
auc+0.018
washer+0.016
udd+0.016
Neg logit contributions
 Jeffrey-0.016
 Raven-0.015
 Tab-0.014
 Gor-0.014
 Sunshine-0.013
Token and
Token ID290
Feature activation+1.49e-02

Pos logit contributions
arl+0.060
lessness+0.051
auc+0.051
washer+0.045
udd+0.044
Neg logit contributions
 Jeffrey-0.034
 Raven-0.031
 Tab-0.028
 Sunshine-0.027
 Ruth-0.027
Token Lily
Token ID20037
Feature activation+3.56e-03

Pos logit contributions
arl+0.037
lessness+0.030
auc+0.030
washer+0.026
ares+0.026
Neg logit contributions
 Jeffrey-0.024
 Raven-0.023
 Tab-0.022
 Sunshine-0.021
 Ruth-0.021
Token danced
Token ID39480
Feature activation+2.46e-02

Pos logit contributions
arl+0.008
lessness+0.007
auc+0.007
washer+0.006
udd+0.006
Neg logit contributions
 Jeffrey-0.006
 Raven-0.006
 Tab-0.005
 Gor-0.005
 Heidi-0.005
Token and
Token ID290
Feature activation+7.60e-03

Pos logit contributions
arl+0.055
lessness+0.047
auc+0.046
washer+0.044
udd+0.041
Neg logit contributions
 Jeffrey-0.043
 Raven-0.039
 Tab-0.036
 Heidi-0.035
 Gor-0.035
Token cl
Token ID537
Feature activation+4.70e-02

Pos logit contributions
arl+0.017
auc+0.014
lessness+0.014
washer+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.014
 Raven-0.013
 Tab-0.012
 Ruth-0.011
 Sunshine-0.011
Tokenapped
Token ID6320
Feature activation+3.79e-02

Pos logit contributions
arl+0.079
lessness+0.066
auc+0.062
washer+0.056
 die+0.054
Neg logit contributions
 Jeffrey-0.094
 Raven-0.089
 Tab-0.088
 Pet-0.085
 Ruth-0.084
Token her
Token ID607
Feature activation+2.03e-02

Pos logit contributions
arl+0.090
auc+0.079
lessness+0.077
washer+0.076
oke+0.070
Neg logit contributions
 Jeffrey-0.058
 Raven-0.052
 Tab-0.048
 Heidi-0.047
 Sunshine-0.046
Token hands
Token ID2832
Feature activation-1.73e-03

Pos logit contributions
arl+0.047
auc+0.043
lessness+0.041
washer+0.039
oke+0.036
Neg logit contributions
 Jeffrey-0.031
 Raven-0.029
 Tab-0.026
 Pet-0.026
 Sunshine-0.026
Token.
Token ID13
Feature activation-9.02e-04

Pos logit contributions
 Jeffrey+0.003
 Raven+0.003
 Tab+0.002
 Gor+0.002
 Sunshine+0.002
Neg logit contributions
arl-0.004
auc-0.003
lessness-0.003
washer-0.003
udd-0.003
Token After
Token ID2293
Feature activation-1.20e-02

Pos logit contributions
 Jeffrey+0.002
 Raven+0.001
 Gor+0.001
 Tab+0.001
 Heidi+0.001
Neg logit contributions
arl-0.002
lessness-0.002
auc-0.002
washer-0.002
udd-0.001
TokenWhat
Token ID2061
Feature activation+2.05e-02

Pos logit contributions
 Jeffrey+0.006
 Raven+0.005
 Tab+0.005
 Heidi+0.005
 Gor+0.005
Neg logit contributions
arl-0.008
lessness-0.007
auc-0.007
washer-0.006
udd-0.006
Token's
Token ID338
Feature activation-2.82e-03

Pos logit contributions
arl+0.037
lessness+0.031
auc+0.030
washer+0.028
udd+0.026
Neg logit contributions
 Jeffrey-0.044
 Raven-0.041
 Tab-0.038
 Heidi-0.038
 Gor-0.038
Token wrong
Token ID2642
Feature activation-2.43e-03

Pos logit contributions
 Jeffrey+0.006
 Raven+0.006
 Heidi+0.005
 Tab+0.005
 Pet+0.005
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
udd-0.004
Token?"
Token ID1701
Feature activation-2.03e-03

Pos logit contributions
 Jeffrey+0.005
 Raven+0.005
 Tab+0.004
 Heidi+0.004
 Gor+0.004
Neg logit contributions
arl-0.004
lessness-0.004
auc-0.004
washer-0.004
udd-0.003
Token The
Token ID383
Feature activation-8.50e-03

Pos logit contributions
 Jeffrey+0.004
 Gor+0.003
 Raven+0.003
 gor+0.003
 Tab+0.003
Neg logit contributions
arl-0.004
lessness-0.003
auc-0.003
washer-0.003
ot-0.003
Token bird
Token ID6512
Feature activation+4.99e-02

Pos logit contributions
 Raven+0.012
 Jeffrey+0.012
 Gor+0.010
 Tab+0.010
 Pet+0.010
Neg logit contributions
arl-0.021
lessness-0.019
auc-0.019
washer-0.018
udd-0.017
Token softly
Token ID26625
Feature activation+1.50e-02

Pos logit contributions
arl+0.107
auc+0.087
lessness+0.084
washer+0.080
udd+0.073
Neg logit contributions
 Jeffrey-0.096
 Raven-0.095
 Gor-0.083
 Tab-0.082
 Pet-0.081
Token replied
Token ID8712
Feature activation+1.21e-02

Pos logit contributions
arl+0.032
lessness+0.027
auc+0.026
washer+0.024
udd+0.023
Neg logit contributions
 Jeffrey-0.028
 Raven-0.027
 Tab-0.025
 Pet-0.024
 Gor-0.024
Token between
Token ID1022
Feature activation-2.03e-02

Pos logit contributions
arl+0.029
lessness+0.025
auc+0.025
washer+0.024
udd+0.023
Neg logit contributions
 Jeffrey-0.018
 Raven-0.017
 Tab-0.015
 Heidi-0.015
 Gor-0.015
Token its
Token ID663
Feature activation+2.12e-02

Pos logit contributions
 Jeffrey+0.036
 Raven+0.033
 Gor+0.031
 Tab+0.030
 Pet+0.029
Neg logit contributions
arl-0.044
lessness-0.036
auc-0.033
ot-0.031
washer-0.031
Token so
Token ID523
Feature activation+9.11e-03

Pos logit contributions
arl+0.047
lessness+0.041
auc+0.040
washer+0.038
udd+0.036
Neg logit contributions
 Jeffrey-0.036
 Raven-0.033
 Tab-0.031
 Gor-0.030
 Heidi-0.030
Token.
Token ID13
Feature activation+1.18e-02

Pos logit contributions
 Jeffrey+0.001
 Raven+0.001
 Tab+0.001
 Heidi+0.001
 Gor+0.001
Neg logit contributions
arl-0.002
lessness-0.001
auc-0.001
washer-0.001
udd-0.001
Token Their
Token ID5334
Feature activation+6.83e-03

Pos logit contributions
arl+0.026
lessness+0.023
auc+0.022
washer+0.022
udd+0.021
Neg logit contributions
 Jeffrey-0.020
 Raven-0.018
 Tab-0.018
 Heidi-0.017
 Ruth-0.017
Token parents
Token ID3397
Feature activation+4.31e-02

Pos logit contributions
arl+0.016
lessness+0.014
auc+0.013
washer+0.012
udd+0.012
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Gor-0.009
 Tab-0.009
 Ruth-0.009
Token hugged
Token ID41130
Feature activation+2.12e-02

Pos logit contributions
arl+0.097
auc+0.077
lessness+0.075
leys+0.067
washer+0.066
Neg logit contributions
 Jeffrey-0.076
 Tab-0.068
 Raven-0.067
 Heidi-0.063
 Pet-0.063
Token them
Token ID606
Feature activation-2.33e-02

Pos logit contributions
arl+0.038
lessness+0.032
auc+0.031
washer+0.029
udd+0.027
Neg logit contributions
 Jeffrey-0.045
 Raven-0.042
 Tab-0.039
 Gor-0.039
 Heidi-0.039
Token and
Token ID290
Feature activation+4.72e-02

Pos logit contributions
 Jeffrey+0.039
 Raven+0.036
 Heidi+0.033
 Tab+0.033
 Gor+0.033
Neg logit contributions
arl-0.052
lessness-0.045
auc-0.043
washer-0.043
udd-0.041
Token praised
Token ID15342
Feature activation+3.44e-02

Pos logit contributions
arl+0.112
lessness+0.085
auc+0.084
ares+0.079
leys+0.079
Neg logit contributions
 Jeffrey-0.082
 Tab-0.074
 Raven-0.073
 Heidi-0.073
 Pet-0.069
Token them
Token ID606
Feature activation-1.43e-02

Pos logit contributions
arl+0.070
lessness+0.060
auc+0.059
washer+0.058
udd+0.055
Neg logit contributions
 Jeffrey-0.065
 Raven-0.058
 Tab-0.055
 Heidi-0.055
 Gor-0.054
Token.
Token ID13
Feature activation+1.06e-02

Pos logit contributions
 Jeffrey+0.026
 Raven+0.024
 Tab+0.022
 Gor+0.022
 Heidi+0.022
Neg logit contributions
arl-0.030
lessness-0.025
auc-0.025
washer-0.023
udd-0.022
Token They
Token ID1119
Feature activation+3.24e-02

Pos logit contributions
arl+0.024
lessness+0.021
auc+0.021
washer+0.020
udd+0.019
Neg logit contributions
 Jeffrey-0.017
 Raven-0.016
 Tab-0.015
 Heidi-0.014
 Pet-0.014
Token said
Token ID531
Feature activation+1.97e-02

Pos logit contributions
arl+0.072
auc+0.056
lessness+0.056
ares+0.051
washer+0.050
Neg logit contributions
 Jeffrey-0.058
 Raven-0.053
 Tab-0.052
 Gor-0.050
 Pet-0.050
Token
Token ID198
Feature activation+6.75e-03

Pos logit contributions
arl+0.007
lessness+0.006
auc+0.006
washer+0.006
udd+0.006
Neg logit contributions
 Jeffrey-0.005
 Raven-0.005
 Tab-0.004
 Heidi-0.004
 Gor-0.004
Token
Token ID198
Feature activation+5.17e-03

Pos logit contributions
arl+0.015
lessness+0.014
auc+0.013
washer+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Heidi-0.009
 Tab-0.009
 Pet-0.009
TokenThen
Token ID6423
Feature activation-1.53e-02

Pos logit contributions
arl+0.012
lessness+0.010
auc+0.010
washer+0.009
udd+0.009
Neg logit contributions
 Jeffrey-0.009
 Raven-0.008
 Tab-0.007
 Heidi-0.007
 Gor-0.007
Token,
Token ID11
Feature activation+3.94e-03

Pos logit contributions
 Jeffrey+0.033
 Tab+0.029
 Debbie+0.028
 Raven+0.028
 Heidi+0.028
Neg logit contributions
arl-0.020
lessness-0.020
pt-0.017
 die-0.017
 forgive-0.017
Token a
Token ID257
Feature activation+3.04e-02

Pos logit contributions
arl+0.007
lessness+0.006
 forgive+0.006
auc+0.006
 die+0.006
Neg logit contributions
 Jeffrey-0.007
 Tab-0.006
 Debbie-0.006
 Heidi-0.006
 Raven-0.006
Token voice
Token ID3809
Feature activation+5.76e-02

Pos logit contributions
arl+0.063
lessness+0.053
auc+0.052
washer+0.049
udd+0.047
Neg logit contributions
 Jeffrey-0.057
 Raven-0.053
 Tab-0.049
 Gor-0.048
 Heidi-0.048
Token called
Token ID1444
Feature activation+3.79e-02

Pos logit contributions
arl+0.130
lessness+0.104
udd+0.102
auc+0.100
washer+0.098
Neg logit contributions
 Jeffrey-0.103
 Raven-0.098
 Tab-0.089
 Pet-0.089
 Gor-0.088
Token out
Token ID503
Feature activation+2.58e-02

Pos logit contributions
arl+0.072
lessness+0.060
auc+0.058
washer+0.055
udd+0.051
Neg logit contributions
 Jeffrey-0.077
 Raven-0.071
 Tab-0.067
 Heidi-0.066
 Gor-0.066
Token from
Token ID422
Feature activation+2.28e-03

Pos logit contributions
arl+0.061
lessness+0.053
auc+0.053
washer+0.052
udd+0.051
Neg logit contributions
 Jeffrey-0.040
 Raven-0.037
 Gor-0.034
 Tab-0.033
 Pet-0.032
Token the
Token ID262
Feature activation+1.15e-03

Pos logit contributions
arl+0.005
lessness+0.004
auc+0.004
washer+0.003
 die+0.003
Neg logit contributions
 Jeffrey-0.004
 Tab-0.004
 Raven-0.004
 Gor-0.003
 Heidi-0.003
Token garden
Token ID11376
Feature activation+3.71e-02

Pos logit contributions
arl+0.003
lessness+0.002
auc+0.002
washer+0.002
udd+0.002
Neg logit contributions
 Jeffrey-0.002
 Raven-0.002
 Tab-0.002
 Heidi-0.002
 Gor-0.002
Token a
Token ID257
Feature activation+1.50e-02

Pos logit contributions
arl+0.030
lessness+0.026
auc+0.025
washer+0.024
udd+0.024
Neg logit contributions
 Jeffrey-0.012
 Raven-0.010
 Heidi-0.009
 Tab-0.009
 Gor-0.009
Token big
Token ID1263
Feature activation-6.38e-04

Pos logit contributions
arl+0.031
lessness+0.028
auc+0.027
washer+0.024
udd+0.024
Neg logit contributions
 Jeffrey-0.027
 Raven-0.026
 Gor-0.023
 Tab-0.023
 Heidi-0.023
Token box
Token ID3091
Feature activation+2.11e-02

Pos logit contributions
 Jeffrey+0.001
 Raven+0.001
 Tab+0.001
 Heidi+0.001
 Gor+0.001
Neg logit contributions
arl-0.001
lessness-0.001
auc-0.001
washer-0.001
udd-0.001
Token in
Token ID287
Feature activation-1.31e-02

Pos logit contributions
arl+0.051
lessness+0.044
auc+0.043
udd+0.040
washer+0.040
Neg logit contributions
 Jeffrey-0.030
 Raven-0.027
 Tab-0.025
 Heidi-0.024
 Gor-0.023
Token the
Token ID262
Feature activation+2.38e-02

Pos logit contributions
 Jeffrey+0.023
 Raven+0.021
 Tab+0.019
 Gor+0.019
 Heidi+0.019
Neg logit contributions
arl-0.026
lessness-0.023
auc-0.022
washer-0.022
ot-0.020
Token living
Token ID2877
Feature activation+5.51e-02

Pos logit contributions
arl+0.058
lessness+0.051
auc+0.048
ares+0.046
oke+0.045
Neg logit contributions
 Jeffrey-0.035
 Raven-0.033
 Tab-0.030
 Gor-0.030
 Heidi-0.029
Token room
Token ID2119
Feature activation+2.62e-02

Pos logit contributions
arl+0.147
lessness+0.125
ares+0.121
oke+0.120
bs+0.117
Neg logit contributions
 Jeffrey-0.056
 Tab-0.050
 Raven-0.049
 trees-0.048
 front-0.045
Token.
Token ID13
Feature activation-2.01e-02

Pos logit contributions
arl+0.065
lessness+0.054
auc+0.052
udd+0.049
washer+0.049
Neg logit contributions
 Jeffrey-0.037
 Raven-0.034
 Tab-0.032
 Heidi-0.030
 Gor-0.030
Token He
Token ID679
Feature activation+6.91e-03

Pos logit contributions
 Jeffrey+0.029
 Raven+0.026
 Tab+0.024
 Gor+0.024
 Heidi+0.023
Neg logit contributions
arl-0.049
lessness-0.043
auc-0.041
washer-0.041
udd-0.038
Token thought
Token ID1807
Feature activation-1.47e-02

Pos logit contributions
arl+0.014
lessness+0.012
auc+0.012
washer+0.011
udd+0.010
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Tab-0.011
 Heidi-0.011
 Gor-0.011
Token,
Token ID11
Feature activation-2.88e-03

Pos logit contributions
 Jeffrey+0.024
 Raven+0.021
 Heidi+0.019
 Tab+0.019
 Gor+0.019
Neg logit contributions
arl-0.035
lessness-0.029
auc-0.029
washer-0.027
udd-0.027
Token other
Token ID584
Feature activation+6.27e-03

Pos logit contributions
 Jeffrey+0.014
 Raven+0.013
 Charles+0.012
 Gor+0.012
 Heidi+0.012
Neg logit contributions
arl-0.014
lessness-0.012
auc-0.011
washer-0.011
ot-0.010
Token was
Token ID373
Feature activation-1.27e-02

Pos logit contributions
arl+0.011
lessness+0.009
auc+0.009
udd+0.008
washer+0.008
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
Alexander-0.011
 Heidi-0.011
 Charles-0.011
Token called
Token ID1444
Feature activation+1.55e-02

Pos logit contributions
 Jeffrey+0.025
 Raven+0.023
 Gor+0.022
 Heidi+0.021
 Tab+0.021
Neg logit contributions
arl-0.024
auc-0.019
lessness-0.019
washer-0.018
ot-0.018
Token Lily
Token ID20037
Feature activation-6.77e-03

Pos logit contributions
arl+0.036
auc+0.030
lessness+0.030
washer+0.028
ot+0.027
Neg logit contributions
 Jeffrey-0.024
 Raven-0.022
 Gor-0.020
 Heidi-0.020
 Tab-0.020
Token.
Token ID13
Feature activation+2.24e-03

Pos logit contributions
 Jeffrey+0.014
 Raven+0.013
 Tab+0.012
 Heidi+0.012
 Gor+0.012
Neg logit contributions
arl-0.012
lessness-0.010
washer-0.009
auc-0.009
udd-0.008
Token They
Token ID1119
Feature activation+5.06e-03

Pos logit contributions
arl+0.004
auc+0.003
lessness+0.003
oke+0.003
ot+0.003
Neg logit contributions
 Jeffrey-0.004
 Raven-0.004
Alexander-0.004
Charles-0.004
 eighty-0.004
Token were
Token ID547
Feature activation-2.00e-02

Pos logit contributions
arl+0.008
lessness+0.007
auc+0.007
washer+0.007
udd+0.006
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Heidi-0.010
 Tab-0.010
 Gor-0.010
Token always
Token ID1464
Feature activation-3.81e-03

Pos logit contributions
 Jeffrey+0.039
 Raven+0.037
 Heidi+0.035
Alexander+0.034
asha+0.034
Neg logit contributions
arl-0.033
lessness-0.024
washer-0.024
 sorry-0.024
ot-0.024
Token playing
Token ID2712
Feature activation-1.43e-04

Pos logit contributions
 Jeffrey+0.007
 Raven+0.007
 Tab+0.006
 Gor+0.006
 Heidi+0.006
Neg logit contributions
arl-0.008
lessness-0.006
auc-0.006
washer-0.006
udd-0.006
Token together
Token ID1978
Feature activation-1.56e-02

Pos logit contributions
 Jeffrey+0.000
 Raven+0.000
 Tab+0.000
 Gor+0.000
 Heidi+0.000
Neg logit contributions
arl-0.000
lessness-0.000
auc-0.000
washer-0.000
udd-0.000
Token but
Token ID475
Feature activation+7.71e-03

Pos logit contributions
 Jeffrey+0.028
 Raven+0.026
 Tab+0.024
 Heidi+0.024
 Gor+0.024
Neg logit contributions
arl-0.032
lessness-0.028
auc-0.026
washer-0.026
udd-0.024
Token�
Token ID129
Feature activation-1.91e-02

Pos logit contributions
 Jeffrey+0.063
 Raven+0.058
 Tab+0.054
 Heidi+0.053
 Gor+0.053
Neg logit contributions
arl-0.075
lessness-0.064
auc-0.063
washer-0.060
udd-0.056
Token�
Token ID241
Feature activation-2.29e-02

Pos logit contributions
 Jeffrey+0.032
 Raven+0.029
 Tab+0.027
 Gor+0.027
 Heidi+0.026
Neg logit contributions
arl-0.043
auc-0.034
lessness-0.034
udd-0.033
washer-0.031
TokenHere
Token ID4342
Feature activation-1.92e-02

Pos logit contributions
 Jeffrey+0.044
 Raven+0.039
 Gor+0.038
 Alexander+0.038
 Christopher+0.037
Neg logit contributions
arl-0.043
lessness-0.038
auc-0.033
ot-0.032
less-0.030
Token you
Token ID345
Feature activation+7.44e-03

Pos logit contributions
 Jeffrey+0.043
 Raven+0.038
Alexander+0.038
 Gor+0.037
 Alexander+0.037
Neg logit contributions
arl-0.026
lessness-0.024
auc-0.023
washer-0.022
ot-0.021
Token go
Token ID467
Feature activation-3.13e-03

Pos logit contributions
arl+0.015
lessness+0.014
auc+0.013
washer+0.012
udd+0.012
Neg logit contributions
 Jeffrey-0.014
 Raven-0.012
 Gor-0.012
 Tab-0.011
 Pet-0.011
Token!
Token ID0
Feature activation+5.93e-03

Pos logit contributions
 Jeffrey+0.006
 Raven+0.005
 Tab+0.005
 Heidi+0.005
 Gor+0.005
Neg logit contributions
arl-0.007
lessness-0.006
auc-0.006
washer-0.005
udd-0.005
Tokenâ
Token ID22940
Feature activation-7.68e-03

Pos logit contributions
arl+0.012
lessness+0.010
auc+0.010
washer+0.009
udd+0.009
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Tab-0.009
 Heidi-0.009
 gor-0.009
Token€
Token ID26391
Feature activation-3.05e-02

Pos logit contributions
 Jeffrey+0.013
 Raven+0.012
 Tab+0.011
 Heidi+0.011
 Gor+0.011
Neg logit contributions
arl-0.017
lessness-0.015
auc-0.014
washer-0.014
udd-0.013
Token
Token ID198
Feature activation+1.83e-02

Pos logit contributions
 Jeffrey+0.059
 Raven+0.054
 Tab+0.051
 Gor+0.050
 Heidi+0.050
Neg logit contributions
arl-0.060
lessness-0.051
washer-0.048
auc-0.048
udd-0.047
Token
Token ID198
Feature activation-2.58e-03

Pos logit contributions
arl+0.042
lessness+0.037
auc+0.037
washer+0.035
udd+0.032
Neg logit contributions
 Jeffrey-0.029
 Raven-0.027
 Heidi-0.024
 Tab-0.024
 Ruth-0.023
TokenThe
Token ID464
Feature activation-1.74e-02

Pos logit contributions
 Jeffrey+0.005
 Raven+0.004
 Gor+0.004
 Tab+0.004
 Heidi+0.004
Neg logit contributions
arl-0.006
lessness-0.005
auc-0.005
washer-0.004
udd-0.004
Token,
Token ID11
Feature activation+2.00e-02

Pos logit contributions
arl+0.083
auc+0.070
lessness+0.070
washer+0.065
udd+0.064
Neg logit contributions
 Jeffrey-0.038
 Raven-0.034
 Tab-0.032
 Heidi-0.030
 Ruth-0.030
Token Jenny
Token ID28437
Feature activation-2.51e-02

Pos logit contributions
arl+0.045
auc+0.036
lessness+0.035
washer+0.032
udd+0.032
Neg logit contributions
 Jeffrey-0.036
 Tab-0.032
 Raven-0.031
 Heidi-0.030
 Pet-0.030
Token told
Token ID1297
Feature activation+2.39e-02

Pos logit contributions
 Jeffrey+0.049
 Raven+0.046
 Heidi+0.043
 Tab+0.042
 Gor+0.042
Neg logit contributions
arl-0.043
washer-0.040
lessness-0.039
auc-0.038
udd-0.036
Token her
Token ID607
Feature activation+1.44e-02

Pos logit contributions
arl+0.064
lessness+0.057
auc+0.056
washer+0.054
udd+0.052
Neg logit contributions
 Jeffrey-0.030
 Raven-0.027
 Tab-0.024
 Heidi-0.023
 Gor-0.023
Token friends
Token ID2460
Feature activation+1.86e-02

Pos logit contributions
arl+0.032
auc+0.026
lessness+0.026
washer+0.023
udd+0.023
Neg logit contributions
 Jeffrey-0.026
 Raven-0.024
 Tab-0.022
 Gor-0.022
 Heidi-0.022
Token all
Token ID477
Feature activation+3.48e-02

Pos logit contributions
arl+0.038
lessness+0.031
auc+0.031
washer+0.029
udd+0.027
Neg logit contributions
 Jeffrey-0.036
 Raven-0.033
 Tab-0.031
 Heidi-0.031
 Gor-0.030
Token about
Token ID546
Feature activation-2.37e-03

Pos logit contributions
arl+0.078
auc+0.063
lessness+0.061
washer+0.059
udd+0.057
Neg logit contributions
 Jeffrey-0.063
 Raven-0.056
 Heidi-0.054
 Tab-0.053
 Sunshine-0.052
Token their
Token ID511
Feature activation-1.58e-02

Pos logit contributions
 Jeffrey+0.004
 Raven+0.004
 Gor+0.003
 Tab+0.003
 Heidi+0.003
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
udd-0.004
Token secret
Token ID3200
Feature activation+1.39e-02

Pos logit contributions
 Jeffrey+0.028
 Raven+0.026
 Gor+0.024
 Tab+0.024
 Heidi+0.024
Neg logit contributions
arl-0.034
lessness-0.029
auc-0.028
washer-0.027
udd-0.025
Token adventure
Token ID8855
Feature activation+1.13e-02

Pos logit contributions
arl+0.030
lessness+0.026
auc+0.025
washer+0.024
udd+0.023
Neg logit contributions
 Jeffrey-0.024
 Raven-0.022
 Tab-0.021
 Gor-0.020
 Heidi-0.020
Token.
Token ID13
Feature activation+1.44e-02

Pos logit contributions
arl+0.024
lessness+0.021
auc+0.020
washer+0.019
udd+0.018
Neg logit contributions
 Jeffrey-0.020
 Raven-0.018
 Tab-0.017
 Heidi-0.017
 Gor-0.017
Token did
Token ID750
Feature activation-1.69e-02

Pos logit contributions
arl+0.021
lessness+0.018
auc+0.017
washer+0.016
udd+0.015
Neg logit contributions
 Jeffrey-0.023
 Raven-0.021
 Tab-0.020
 Heidi-0.019
 Gor-0.019
Token not
Token ID407
Feature activation+2.32e-02

Pos logit contributions
 Jeffrey+0.032
 Raven+0.030
 Tab+0.028
 Gor+0.028
 Heidi+0.027
Neg logit contributions
arl-0.034
lessness-0.029
auc-0.028
washer-0.027
udd-0.025
Token like
Token ID588
Feature activation-4.20e-03

Pos logit contributions
arl+0.044
lessness+0.037
auc+0.037
washer+0.035
udd+0.033
Neg logit contributions
 Jeffrey-0.047
 Raven-0.043
 Heidi-0.040
 Tab-0.040
 Gor-0.040
Token dogs
Token ID6844
Feature activation+4.72e-04

Pos logit contributions
 Jeffrey+0.006
 Raven+0.006
 Tab+0.005
 Gor+0.005
 Heidi+0.005
Neg logit contributions
arl-0.010
lessness-0.009
auc-0.009
washer-0.008
udd-0.008
Token.
Token ID13
Feature activation-1.05e-02

Pos logit contributions
arl+0.001
auc+0.001
lessness+0.001
washer+0.001
udd+0.001
Neg logit contributions
 Jeffrey-0.001
 Raven-0.001
 Heidi-0.001
 Gor-0.001
 Tab-0.001
Token They
Token ID1119
Feature activation+1.07e-02

Pos logit contributions
 Jeffrey+0.013
 Raven+0.012
 Tab+0.011
 Gor+0.011
 Heidi+0.010
Neg logit contributions
arl-0.028
lessness-0.025
auc-0.024
washer-0.023
udd-0.022
Token had
Token ID550
Feature activation+1.63e-03

Pos logit contributions
arl+0.021
lessness+0.017
auc+0.017
washer+0.016
udd+0.015
Neg logit contributions
 Jeffrey-0.022
 Raven-0.020
 Tab-0.019
 Gor-0.019
 Heidi-0.019
Token sharp
Token ID7786
Feature activation-1.38e-02

Pos logit contributions
arl+0.004
lessness+0.003
auc+0.003
washer+0.003
udd+0.003
Neg logit contributions
 Jeffrey-0.003
 Raven-0.002
 Tab-0.002
 Heidi-0.002
 Gor-0.002
Token teeth
Token ID9941
Feature activation-1.30e-02

Pos logit contributions
 Jeffrey+0.022
 Raven+0.020
 Tab+0.019
 Gor+0.018
 Heidi+0.018
Neg logit contributions
arl-0.032
lessness-0.028
auc-0.027
washer-0.025
udd-0.025
Token and
Token ID290
Feature activation+7.43e-03

Pos logit contributions
 Jeffrey+0.015
 Raven+0.014
 Tab+0.012
 Heidi+0.012
 Gor+0.012
Neg logit contributions
arl-0.035
auc-0.031
lessness-0.031
washer-0.029
udd-0.027
Token rough
Token ID5210
Feature activation-1.00e-03

Pos logit contributions
arl+0.017
auc+0.015
lessness+0.015
washer+0.014
udd+0.014
Neg logit contributions
 Jeffrey-0.012
 Raven-0.011
 Heidi-0.010
 Tab-0.010
 Gor-0.010
Token So
Token ID1406
Feature activation-1.07e-02

Pos logit contributions
 Jeffrey+0.009
 Raven+0.007
 Tab+0.007
 Charles+0.007
 Gor+0.006
Neg logit contributions
arl-0.015
auc-0.013
lessness-0.012
 die-0.012
washer-0.011
Token Matt
Token ID4705
Feature activation-2.08e-02

Pos logit contributions
 Jeffrey+0.016
 Raven+0.013
 Tab+0.012
 Charles+0.012
 Gor+0.012
Neg logit contributions
arl-0.028
auc-0.024
lessness-0.023
udd-0.022
washer-0.022
Token said
Token ID531
Feature activation+7.14e-03

Pos logit contributions
 Jeffrey+0.047
 Raven+0.044
 Heidi+0.041
 Gor+0.040
 Tab+0.040
Neg logit contributions
arl-0.031
washer-0.027
lessness-0.027
auc-0.027
udd-0.025
Token,
Token ID11
Feature activation-1.11e-02

Pos logit contributions
arl+0.016
lessness+0.014
auc+0.013
washer+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.012
 Raven-0.011
 Tab-0.010
 Gor-0.010
 Heidi-0.010
Token "
Token ID366
Feature activation-1.69e-02

Pos logit contributions
 Jeffrey+0.018
 Raven+0.017
 Tab+0.016
 Gor+0.015
 Heidi+0.015
Neg logit contributions
arl-0.025
lessness-0.022
auc-0.022
washer-0.020
udd-0.020
TokenWe
Token ID1135
Feature activation+2.56e-02

Pos logit contributions
 Jeffrey+0.033
 Raven+0.030
 Heidi+0.029
 gor+0.029
 Gor+0.028
Neg logit contributions
arl-0.031
lessness-0.028
auc-0.025
washer-0.025
ot-0.024
Token should
Token ID815
Feature activation+1.44e-02

Pos logit contributions
arl+0.043
lessness+0.035
auc+0.034
washer+0.031
udd+0.029
Neg logit contributions
 Jeffrey-0.058
 Raven-0.054
 Tab-0.051
 Gor-0.051
 Heidi-0.050
Token share
Token ID2648
Feature activation-4.68e-03

Pos logit contributions
arl+0.027
lessness+0.021
auc+0.021
washer+0.019
udd+0.018
Neg logit contributions
 Jeffrey-0.030
 Raven-0.028
 Gor-0.026
 Tab-0.026
 Heidi-0.026
Token the
Token ID262
Feature activation+1.65e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Heidi+0.007
 Tab+0.007
 Gor+0.007
Neg logit contributions
arl-0.010
lessness-0.009
auc-0.009
washer-0.008
udd-0.008
Token perfume
Token ID40094
Feature activation-8.26e-03

Pos logit contributions
arl+0.035
lessness+0.030
auc+0.029
washer+0.028
leys+0.026
Neg logit contributions
 Jeffrey-0.028
 Raven-0.028
 Tab-0.027
 Pet-0.026
 Sunshine-0.025
Token.
Token ID13
Feature activation+1.69e-02

Pos logit contributions
 Jeffrey+0.016
 Raven+0.015
 Heidi+0.014
 Tab+0.014
 Pet+0.013
Neg logit contributions
arl-0.016
lessness-0.014
auc-0.014
washer-0.013
udd-0.012
Token Spot
Token ID15899
Feature activation-2.96e-02

Pos logit contributions
 Jeffrey+0.018
 Raven+0.016
 Charles+0.015
 gor+0.015
 Tab+0.015
Neg logit contributions
arl-0.024
lessness-0.019
auc-0.018
oke-0.017
washer-0.017
Token was
Token ID373
Feature activation-1.57e-02

Pos logit contributions
 Jeffrey+0.065
 Raven+0.061
 Heidi+0.058
 Tab+0.057
 gor+0.057
Neg logit contributions
arl-0.040
washer-0.039
lessness-0.037
udd-0.035
auc-0.034
Token so
Token ID523
Feature activation-1.72e-02

Pos logit contributions
 Jeffrey+0.030
 Raven+0.028
 Gor+0.026
 Tab+0.026
 Heidi+0.025
Neg logit contributions
arl-0.030
lessness-0.026
auc-0.025
washer-0.025
udd-0.023
Token happy
Token ID3772
Feature activation-9.89e-03

Pos logit contributions
 Jeffrey+0.026
 Raven+0.025
 Tab+0.024
 Gor+0.023
 Heidi+0.023
Neg logit contributions
arl-0.039
auc-0.035
lessness-0.033
leys-0.031
washer-0.031
Token,
Token ID11
Feature activation-2.32e-02

Pos logit contributions
 Jeffrey+0.021
 Raven+0.020
 Tab+0.019
 Gor+0.019
Alexander+0.019
Neg logit contributions
arl-0.015
lessness-0.014
washer-0.013
auc-0.012
udd-0.012
Token he
Token ID339
Feature activation-1.05e-02

Pos logit contributions
 Jeffrey+0.041
 Raven+0.036
 Tab+0.034
 Gor+0.034
 gor+0.033
Neg logit contributions
arl-0.048
lessness-0.043
auc-0.039
washer-0.039
udd-0.037
Token knew
Token ID2993
Feature activation+4.38e-03

Pos logit contributions
 Jeffrey+0.024
 Raven+0.022
 Heidi+0.021
 Tab+0.021
 Gor+0.020
Neg logit contributions
arl-0.017
lessness-0.014
washer-0.014
auc-0.013
udd-0.012
Token he
Token ID339
Feature activation-8.48e-03

Pos logit contributions
arl+0.008
lessness+0.007
auc+0.007
washer+0.007
udd+0.006
Neg logit contributions
 Jeffrey-0.009
 Raven-0.008
 Tab-0.007
 Gor-0.007
 Heidi-0.007
Token had
Token ID550
Feature activation-2.74e-03

Pos logit contributions
 Jeffrey+0.021
 Raven+0.020
 Tab+0.019
 Heidi+0.019
 gor+0.018
Neg logit contributions
arl-0.010
lessness-0.010
washer-0.010
auc-0.008
udd-0.008
Token to
Token ID284
Feature activation+8.07e-03

Pos logit contributions
 Jeffrey+0.006
 Raven+0.005
 Tab+0.005
 Gor+0.005
 Heidi+0.005
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
udd-0.004
Token prepare
Token ID8335
Feature activation-1.44e-02

Pos logit contributions
arl+0.013
lessness+0.009
 forgive+0.009
auc+0.009
washer+0.009
Neg logit contributions
 Jeffrey-0.019
 Raven-0.017
 Tab-0.016
 Heidi-0.016
 Gor-0.016
Token yes
Token ID3763
Feature activation+2.51e-03

Pos logit contributions
arl+0.024
auc+0.020
lessness+0.020
washer+0.019
udd+0.018
Neg logit contributions
 Jeffrey-0.019
 Raven-0.018
 Tab-0.017
 Heidi-0.016
 Gor-0.016
Token.
Token ID13
Feature activation-8.05e-03

Pos logit contributions
arl+0.006
lessness+0.005
auc+0.005
washer+0.004
udd+0.004
Neg logit contributions
 Jeffrey-0.004
 Raven-0.004
 Tab-0.004
 Gor-0.004
 Heidi-0.004
Token
Token ID198
Feature activation-1.08e-02

Pos logit contributions
 Jeffrey+0.015
 Raven+0.014
 Tab+0.013
 Gor+0.013
 Charles+0.013
Neg logit contributions
arl-0.016
lessness-0.013
auc-0.013
washer-0.012
udd-0.012
Token
Token ID198
Feature activation-2.37e-02

Pos logit contributions
 Jeffrey+0.020
 Raven+0.019
 Tab+0.017
 Heidi+0.017
 Gor+0.017
Neg logit contributions
arl-0.022
lessness-0.019
auc-0.018
washer-0.018
udd-0.016
TokenS
Token ID50
Feature activation-1.73e-02

Pos logit contributions
 Jeffrey+0.052
 Raven+0.049
 Gor+0.045
 Charles+0.044
 Tab+0.044
Neg logit contributions
arl-0.042
lessness-0.033
auc-0.030
udd-0.030
ot-0.030
Tokenue
Token ID518
Feature activation-3.12e-02

Pos logit contributions
 Jeffrey+0.024
 Raven+0.021
 Tab+0.019
 Heidi+0.019
 Gor+0.019
Neg logit contributions
arl-0.043
lessness-0.039
auc-0.038
washer-0.037
udd-0.034
Token put
Token ID1234
Feature activation-1.33e-02

Pos logit contributions
 Jeffrey+0.073
 Raven+0.068
 Heidi+0.066
 Tab+0.063
 Alexander+0.063
Neg logit contributions
washer-0.039
udd-0.038
arl-0.037
auc-0.035
ot-0.035
Token the
Token ID262
Feature activation-2.25e-02

Pos logit contributions
 Jeffrey+0.024
 Raven+0.021
 Gor+0.020
 Tab+0.020
 Heidi+0.020
Neg logit contributions
arl-0.029
lessness-0.025
auc-0.024
washer-0.023
ot-0.021
Token rock
Token ID3881
Feature activation-2.60e-02

Pos logit contributions
 Jeffrey+0.039
 Raven+0.036
 Tab+0.033
 Heidi+0.033
 Gor+0.032
Neg logit contributions
arl-0.050
lessness-0.043
auc-0.042
washer-0.040
udd-0.038
Token in
Token ID287
Feature activation-4.32e-02

Pos logit contributions
 Jeffrey+0.047
 Raven+0.042
 Tab+0.040
 Heidi+0.039
 Gor+0.039
Neg logit contributions
arl-0.055
lessness-0.048
auc-0.047
washer-0.045
udd-0.042
Token her
Token ID607
Feature activation+1.03e-02

Pos logit contributions
 Jeffrey+0.077
 Raven+0.070
 Gor+0.068
 gor+0.066
 Alexander+0.064
Neg logit contributions
arl-0.087
lessness-0.077
ot-0.070
auc-0.068
 forgive-0.065
Token bug
Token ID5434
Feature activation+8.60e-03

Pos logit contributions
 Jeffrey+0.012
 Raven+0.011
 Heidi+0.010
 Tab+0.010
 Ruth+0.010
Neg logit contributions
arl-0.009
washer-0.007
lessness-0.007
auc-0.007
ot-0.006
Token who
Token ID508
Feature activation+1.18e-02

Pos logit contributions
arl+0.016
lessness+0.013
auc+0.013
washer+0.013
udd+0.012
Neg logit contributions
 Jeffrey-0.018
 Raven-0.016
 Tab-0.016
 Heidi-0.015
 Gor-0.015
Token was
Token ID373
Feature activation-1.39e-02

Pos logit contributions
arl+0.023
lessness+0.020
auc+0.019
washer+0.018
udd+0.017
Neg logit contributions
 Jeffrey-0.023
 Raven-0.021
 Tab-0.020
 Heidi-0.020
 Gor-0.019
Token very
Token ID845
Feature activation+1.61e-03

Pos logit contributions
 Jeffrey+0.027
 Raven+0.023
 Heidi+0.023
 Tab+0.022
 Gor+0.022
Neg logit contributions
arl-0.026
lessness-0.022
auc-0.022
washer-0.021
ot-0.020
Token clean
Token ID3424
Feature activation-5.31e-03

Pos logit contributions
arl+0.003
lessness+0.003
auc+0.002
washer+0.002
udd+0.002
Neg logit contributions
 Jeffrey-0.003
 Raven-0.003
 Tab-0.003
 Gor-0.003
 Heidi-0.003
Token.
Token ID13
Feature activation-9.86e-03

Pos logit contributions
 Jeffrey+0.012
 Raven+0.011
 Tab+0.010
Alexander+0.010
 Heidi+0.010
Neg logit contributions
arl-0.008
lessness-0.008
auc-0.007
washer-0.007
udd-0.006
Token He
Token ID679
Feature activation+7.90e-05

Pos logit contributions
 Jeffrey+0.017
 Raven+0.016
Alexander+0.015
 gor+0.015
 ponies+0.014
Neg logit contributions
arl-0.018
lessness-0.016
auc-0.016
washer-0.015
ot-0.014
Token lived
Token ID5615
Feature activation+3.16e-03

Pos logit contributions
arl+0.000
lessness+0.000
auc+0.000
washer+0.000
ot+0.000
Neg logit contributions
 Jeffrey-0.000
 Raven-0.000
 Heidi-0.000
 Tab-0.000
 Alexander-0.000
Token in
Token ID287
Feature activation-4.09e-02

Pos logit contributions
arl+0.006
lessness+0.005
auc+0.005
washer+0.005
udd+0.005
Neg logit contributions
 Jeffrey-0.006
 Raven-0.005
 Tab-0.005
 Heidi-0.005
 Gor-0.005
Token the
Token ID262
Feature activation+1.22e-02

Pos logit contributions
 Jeffrey+0.070
 Raven+0.063
 Tab+0.062
 Gor+0.061
 Heidi+0.060
Neg logit contributions
arl-0.079
lessness-0.073
auc-0.066
washer-0.066
ot-0.062
Token garden
Token ID11376
Feature activation-1.08e-04

Pos logit contributions
arl+0.027
auc+0.024
lessness+0.024
washer+0.023
udd+0.022
Neg logit contributions
 Jeffrey-0.021
 Raven-0.018
 Tab-0.017
 Heidi-0.017
 Gor-0.017
Token"
Token ID1
Feature activation-7.89e-04

Pos logit contributions
 Jeffrey+0.005
 Raven+0.004
 Tab+0.004
 Gor+0.004
 Heidi+0.004
Neg logit contributions
arl-0.005
lessness-0.005
auc-0.004
washer-0.004
udd-0.004
TokenLook
Token ID8567
Feature activation+9.76e-03

Pos logit contributions
 Jeffrey+0.001
 Raven+0.001
 Heidi+0.001
 Tab+0.001
 Gor+0.001
Neg logit contributions
arl-0.002
lessness-0.002
auc-0.002
washer-0.001
udd-0.001
Token there
Token ID612
Feature activation-1.88e-02

Pos logit contributions
arl+0.020
lessness+0.017
auc+0.017
washer+0.016
udd+0.015
Neg logit contributions
 Jeffrey-0.018
 Raven-0.016
 Tab-0.015
 Heidi-0.015
 Gor-0.015
Token is
Token ID318
Feature activation-1.21e-02

Pos logit contributions
 Jeffrey+0.033
 Raven+0.030
 Tab+0.027
 Heidi+0.027
 Gor+0.027
Neg logit contributions
arl-0.041
lessness-0.035
auc-0.034
washer-0.033
udd-0.031
Token an
Token ID281
Feature activation+5.05e-03

Pos logit contributions
 Jeffrey+0.017
 Raven+0.015
 Tab+0.014
 Heidi+0.014
 Gor+0.013
Neg logit contributions
arl-0.030
lessness-0.027
auc-0.026
washer-0.025
udd-0.024
Token airplane
Token ID19401
Feature activation-1.61e-02

Pos logit contributions
arl+0.009
auc+0.008
lessness+0.008
washer+0.008
udd+0.007
Neg logit contributions
 Jeffrey-0.010
 Raven-0.009
 Heidi-0.009
 Tab-0.009
 Gor-0.009
Token!"
Token ID2474
Feature activation+5.71e-03

Pos logit contributions
 Jeffrey+0.027
 Raven+0.025
 Tab+0.023
 Gor+0.023
 Heidi+0.023
Neg logit contributions
arl-0.036
lessness-0.031
auc-0.030
washer-0.028
udd-0.027
Token said
Token ID531
Feature activation+8.46e-03

Pos logit contributions
arl+0.011
lessness+0.010
auc+0.009
washer+0.009
udd+0.008
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Tab-0.010
 Gor-0.009
 Heidi-0.009
Token the
Token ID262
Feature activation-5.45e-03

Pos logit contributions
arl+0.018
lessness+0.016
auc+0.015
washer+0.015
ot+0.014
Neg logit contributions
 Jeffrey-0.015
 Raven-0.013
 Gor-0.012
 Tab-0.012
 Heidi-0.012
Token mom
Token ID1995
Feature activation+1.88e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Tab+0.007
 Gor+0.007
 Heidi+0.007
Neg logit contributions
arl-0.012
lessness-0.011
auc-0.011
leys-0.010
udd-0.010
Tokenmy
Token ID1820
Feature activation+5.48e-03

Pos logit contributions
arl+0.044
lessness+0.038
auc+0.038
washer+0.036
udd+0.035
Neg logit contributions
 Jeffrey-0.029
 Raven-0.027
 Tab-0.024
 Gor-0.024
 Heidi-0.024
Token and
Token ID290
Feature activation+2.24e-02

Pos logit contributions
arl+0.001
lessness+0.001
auc+0.001
washer+0.001
udd+0.001
Neg logit contributions
 Jeffrey-0.000
 Raven-0.000
 Tab-0.000
 Gor-0.000
 Heidi-0.000
Token ride
Token ID6594
Feature activation+1.49e-02

Pos logit contributions
arl+0.053
lessness+0.043
auc+0.042
udd+0.040
oke+0.037
Neg logit contributions
 Jeffrey-0.039
 Raven-0.034
 Tab-0.033
 Heidi-0.032
 Gor-0.032
Token the
Token ID262
Feature activation-3.69e-04

Pos logit contributions
arl+0.030
lessness+0.026
auc+0.025
washer+0.024
udd+0.023
Neg logit contributions
 Jeffrey-0.028
 Raven-0.026
 Tab-0.024
 Gor-0.024
 Heidi-0.024
Token pony
Token ID26902
Feature activation+6.06e-03

Pos logit contributions
 Jeffrey+0.001
 Raven+0.001
 Tab+0.001
 Heidi+0.001
 Gor+0.001
Neg logit contributions
arl-0.001
lessness-0.001
auc-0.001
washer-0.001
udd-0.000
Token.
Token ID13
Feature activation+1.63e-02

Pos logit contributions
arl+0.013
lessness+0.011
auc+0.011
washer+0.010
udd+0.010
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Tab-0.009
 Gor-0.009
 Heidi-0.009
Token They
Token ID1119
Feature activation+4.49e-03

Pos logit contributions
arl+0.041
auc+0.036
lessness+0.036
washer+0.034
udd+0.033
Neg logit contributions
 Jeffrey-0.023
 Raven-0.019
 Tab-0.018
 Pet-0.018
 Heidi-0.018
Token rode
Token ID22075
Feature activation+2.09e-03

Pos logit contributions
arl+0.011
auc+0.008
lessness+0.008
udd+0.007
washer+0.007
Neg logit contributions
 Jeffrey-0.008
 Raven-0.007
 Tab-0.006
 Gor-0.006
 Pet-0.006
Token around
Token ID1088
Feature activation-1.06e-02

Pos logit contributions
arl+0.003
lessness+0.003
auc+0.003
washer+0.003
ot+0.003
Neg logit contributions
 Jeffrey-0.004
 Raven-0.004
 Gor-0.004
 Tab-0.004
 Heidi-0.004
Token the
Token ID262
Feature activation-7.69e-04

Pos logit contributions
 Jeffrey+0.017
 Raven+0.016
 Tab+0.015
 Gor+0.014
 Heidi+0.014
Neg logit contributions
arl-0.023
lessness-0.021
auc-0.020
washer-0.019
udd-0.018
Token field
Token ID2214
Feature activation+5.98e-04

Pos logit contributions
 Jeffrey+0.002
 Raven+0.002
 Tab+0.001
 Heidi+0.001
 Ruth+0.001
Neg logit contributions
arl-0.001
washer-0.001
udd-0.001
auc-0.001
lessness-0.001
Token and
Token ID290
Feature activation-2.07e-03

Pos logit contributions
arl+0.001
lessness+0.001
auc+0.001
washer+0.001
udd+0.001
Neg logit contributions
 Jeffrey-0.001
 Raven-0.001
 Tab-0.001
 Gor-0.001
 Heidi-0.001
Token very
Token ID845
Feature activation-3.34e-03

Pos logit contributions
 Jeffrey+0.067
 Raven+0.061
 Heidi+0.060
 Tab+0.058
 Pet+0.057
Neg logit contributions
arl-0.059
lessness-0.055
washer-0.053
auc-0.050
udd-0.047
Token much
Token ID881
Feature activation-7.69e-04

Pos logit contributions
 Jeffrey+0.008
 Raven+0.007
 Tab+0.007
 Heidi+0.007
 Gor+0.007
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
udd-0.004
Token.
Token ID13
Feature activation+6.28e-03

Pos logit contributions
 Jeffrey+0.001
 Raven+0.001
 Tab+0.001
 Heidi+0.001
 Gor+0.001
Neg logit contributions
arl-0.002
lessness-0.001
auc-0.001
washer-0.001
udd-0.001
Token<|endoftext|>
Token ID50256
Feature activation-3.57e-03

Pos logit contributions
arl+0.013
lessness+0.011
auc+0.011
washer+0.010
udd+0.010
Neg logit contributions
 Jeffrey-0.012
 Raven-0.011
 Tab-0.010
 Heidi-0.010
 Gor-0.010
TokenL
Token ID43
Feature activation-8.20e-03

Pos logit contributions
 Jeffrey+0.007
 Raven+0.006
 Tab+0.006
 Gor+0.006
 Heidi+0.006
Neg logit contributions
arl-0.007
lessness-0.006
auc-0.006
washer-0.006
udd-0.005
Tokenily
Token ID813
Feature activation-4.35e-02

Pos logit contributions
 Jeffrey+0.017
 Raven+0.015
 Tab+0.014
 Heidi+0.014
 Gor+0.014
Neg logit contributions
arl-0.016
lessness-0.013
auc-0.012
washer-0.012
udd-0.011
Token and
Token ID290
Feature activation-5.55e-03

Pos logit contributions
 Jeffrey+0.097
 Raven+0.090
 Tab+0.086
 Alexander+0.084
 Heidi+0.083
Neg logit contributions
arl-0.063
auc-0.057
udd-0.057
washer-0.056
lessness-0.054
Token Max
Token ID5436
Feature activation-3.25e-02

Pos logit contributions
 Jeffrey+0.010
 Raven+0.009
 Gor+0.009
 gor+0.008
 Tab+0.008
Neg logit contributions
arl-0.011
lessness-0.010
ot-0.009
auc-0.009
washer-0.009
Token were
Token ID547
Feature activation-1.78e-02

Pos logit contributions
 Jeffrey+0.080
 Raven+0.076
 Heidi+0.070
 Tab+0.070
Alexander+0.070
Neg logit contributions
arl-0.036
auc-0.032
pt-0.032
udd-0.032
lessness-0.029
Token playing
Token ID2712
Feature activation+3.37e-03

Pos logit contributions
 Jeffrey+0.035
 Raven+0.033
 Tab+0.030
 Gor+0.030
 Heidi+0.029
Neg logit contributions
arl-0.035
lessness-0.029
auc-0.028
washer-0.028
udd-0.027
Token in
Token ID287
Feature activation-1.68e-02

Pos logit contributions
arl+0.007
auc+0.006
lessness+0.006
washer+0.005
udd+0.005
Neg logit contributions
 Jeffrey-0.006
 Raven-0.006
 Tab-0.006
 Gor-0.006
 Heidi-0.006
Token!
Token ID0
Feature activation+2.41e-03

Pos logit contributions
 Jeffrey+0.040
 Raven+0.037
 Tab+0.036
 Heidi+0.035
 Gor+0.034
Neg logit contributions
arl-0.033
lessness-0.031
washer-0.028
auc-0.027
ry-0.024
Token He
Token ID679
Feature activation-8.99e-03

Pos logit contributions
arl+0.005
lessness+0.004
auc+0.004
washer+0.004
udd+0.003
Neg logit contributions
 Jeffrey-0.005
 Raven-0.004
 Tab-0.004
 Gor-0.004
 Heidi-0.004
Token put
Token ID1234
Feature activation-9.98e-03

Pos logit contributions
 Jeffrey+0.018
 Raven+0.017
 Heidi+0.016
 Tab+0.016
 Gor+0.015
Neg logit contributions
arl-0.016
lessness-0.014
auc-0.013
washer-0.013
udd-0.012
Token the
Token ID262
Feature activation-1.11e-02

Pos logit contributions
 Jeffrey+0.016
 Raven+0.014
 Tab+0.013
 Gor+0.013
 Heidi+0.013
Neg logit contributions
arl-0.023
lessness-0.020
auc-0.020
washer-0.019
udd-0.018
Token money
Token ID1637
Feature activation-1.79e-02

Pos logit contributions
 Jeffrey+0.021
 Raven+0.019
 Tab+0.018
 Heidi+0.018
 Gor+0.018
Neg logit contributions
arl-0.023
lessness-0.019
auc-0.018
washer-0.018
udd-0.016
Token in
Token ID287
Feature activation-4.06e-02

Pos logit contributions
 Jeffrey+0.031
 Raven+0.029
 Tab+0.027
 Heidi+0.026
 Gor+0.026
Neg logit contributions
arl-0.039
lessness-0.033
auc-0.032
washer-0.031
udd-0.029
Token his
Token ID465
Feature activation+9.67e-03

Pos logit contributions
 Jeffrey+0.060
 Raven+0.054
 Tab+0.049
 Gor+0.049
 Heidi+0.048
Neg logit contributions
arl-0.097
lessness-0.086
auc-0.082
washer-0.079
udd-0.074
Token pocket
Token ID10000
Feature activation+1.01e-02

Pos logit contributions
arl+0.021
lessness+0.018
auc+0.017
washer+0.016
udd+0.016
Neg logit contributions
 Jeffrey-0.017
 Raven-0.016
 Tab-0.015
 Heidi-0.015
 Gor-0.015
Token and
Token ID290
Feature activation-4.87e-03

Pos logit contributions
arl+0.028
auc+0.024
lessness+0.023
washer+0.022
ares+0.022
Neg logit contributions
 Jeffrey-0.011
 Raven-0.010
 Tab-0.009
 Heidi-0.008
 Sunshine-0.008
Token started
Token ID2067
Feature activation-2.28e-02

Pos logit contributions
 Jeffrey+0.009
 Raven+0.008
 Tab+0.008
 Heidi+0.008
 Gor+0.008
Neg logit contributions
arl-0.010
lessness-0.009
auc-0.008
washer-0.008
udd-0.008
Token running
Token ID2491
Feature activation-2.63e-05

Pos logit contributions
 Jeffrey+0.042
 Raven+0.039
 Heidi+0.036
 Gor+0.036
 Tab+0.035
Neg logit contributions
arl-0.046
lessness-0.040
auc-0.038
washer-0.037
udd-0.035
Token She
Token ID1375
Feature activation-1.59e-03

Pos logit contributions
 Jeffrey+0.028
 Raven+0.026
 Tab+0.024
 Heidi+0.023
 Gor+0.023
Neg logit contributions
arl-0.039
lessness-0.034
auc-0.034
washer-0.032
udd-0.030
Token looked
Token ID3114
Feature activation-1.33e-02

Pos logit contributions
 Jeffrey+0.004
 Raven+0.003
 Heidi+0.003
Alexander+0.003
 Tab+0.003
Neg logit contributions
washer-0.002
arl-0.002
auc-0.002
lessness-0.002
udd-0.001
Token angry
Token ID7954
Feature activation-5.32e-03

Pos logit contributions
 Jeffrey+0.028
 Raven+0.026
 Heidi+0.024
 Tab+0.024
 Gor+0.024
Neg logit contributions
arl-0.023
lessness-0.020
auc-0.019
washer-0.018
udd-0.017
Token.
Token ID13
Feature activation-1.80e-02

Pos logit contributions
 Jeffrey+0.008
 Raven+0.008
 Tab+0.007
 Heidi+0.007
 Gor+0.007
Neg logit contributions
arl-0.013
lessness-0.011
auc-0.011
washer-0.010
udd-0.010
Token "
Token ID366
Feature activation-1.58e-02

Pos logit contributions
 Jeffrey+0.031
 Raven+0.029
 Tab+0.027
 Heidi+0.026
 Gor+0.026
Neg logit contributions
arl-0.039
lessness-0.034
auc-0.033
washer-0.032
udd-0.030
TokenThis
Token ID1212
Feature activation-3.88e-02

Pos logit contributions
 Jeffrey+0.026
 Raven+0.024
 Tab+0.022
 Gor+0.022
 Heidi+0.022
Neg logit contributions
arl-0.036
lessness-0.031
auc-0.030
washer-0.029
udd-0.027
Token is
Token ID318
Feature activation-3.10e-02

Pos logit contributions
 Jeffrey+0.090
 Raven+0.086
 Pet+0.083
 Tab+0.082
tons+0.081
Neg logit contributions
arl-0.044
washer-0.042
auc-0.039
lessness-0.033
udd-0.031
Token a
Token ID257
Feature activation-2.34e-02

Pos logit contributions
 Jeffrey+0.059
 Raven+0.056
 Heidi+0.052
 Tab+0.051
Alexander+0.051
Neg logit contributions
arl-0.054
washer-0.043
auc-0.043
lessness-0.042
 die-0.038
Token private
Token ID2839
Feature activation-8.15e-03

Pos logit contributions
 Jeffrey+0.046
 Raven+0.042
 Tab+0.039
 Heidi+0.039
 Gor+0.038
Neg logit contributions
arl-0.046
lessness-0.038
auc-0.038
washer-0.036
udd-0.033
Token party
Token ID2151
Feature activation+2.74e-03

Pos logit contributions
 Jeffrey+0.016
 Raven+0.014
 Tab+0.014
 Heidi+0.014
 Pet+0.013
Neg logit contributions
arl-0.015
auc-0.013
washer-0.012
lessness-0.012
udd-0.011
Token.
Token ID13
Feature activation-8.41e-03

Pos logit contributions
arl+0.006
lessness+0.005
auc+0.005
washer+0.005
udd+0.004
Neg logit contributions
 Jeffrey-0.005
 Raven-0.005
 Tab-0.004
 Heidi-0.004
 Gor-0.004
Token an
Token ID281
Feature activation+3.38e-03

Pos logit contributions
 Jeffrey+0.024
 Raven+0.022
 Tab+0.021
 Heidi+0.020
 Gor+0.020
Neg logit contributions
arl-0.029
lessness-0.025
auc-0.024
washer-0.023
udd-0.021
Token idea
Token ID2126
Feature activation-9.46e-03

Pos logit contributions
arl+0.005
lessness+0.004
auc+0.004
washer+0.004
udd+0.003
Neg logit contributions
 Jeffrey-0.008
 Raven-0.008
 Tab-0.007
 Ruth-0.007
 Heidi-0.007
Token.
Token ID13
Feature activation+2.22e-03

Pos logit contributions
 Jeffrey+0.016
 Raven+0.014
 Tab+0.013
 Heidi+0.013
 Gor+0.013
Neg logit contributions
arl-0.022
lessness-0.018
auc-0.018
washer-0.017
udd-0.016
Token He
Token ID679
Feature activation+1.22e-02

Pos logit contributions
arl+0.005
lessness+0.004
auc+0.004
washer+0.004
ot+0.004
Neg logit contributions
 Jeffrey-0.004
 Raven-0.003
 Tab-0.003
 Gor-0.003
 gor-0.003
Token hopped
Token ID45230
Feature activation-2.80e-03

Pos logit contributions
arl+0.024
lessness+0.020
auc+0.020
washer+0.019
udd+0.017
Neg logit contributions
 Jeffrey-0.024
 Raven-0.022
 Tab-0.021
 Gor-0.020
 Heidi-0.020
Token up
Token ID510
Feature activation-3.87e-02

Pos logit contributions
 Jeffrey+0.006
 Raven+0.006
 Gor+0.005
 Tab+0.005
 gor+0.005
Neg logit contributions
arl-0.005
lessness-0.004
auc-0.004
washer-0.004
ot-0.004
Token on
Token ID319
Feature activation-4.38e-02

Pos logit contributions
 Jeffrey+0.074
 Raven+0.071
 Gor+0.067
 Tab+0.066
 Heidi+0.066
Neg logit contributions
arl-0.068
lessness-0.058
auc-0.057
washer-0.055
ot-0.051
Token the
Token ID262
Feature activation+7.40e-03

Pos logit contributions
 Jeffrey+0.078
 Raven+0.077
 Gor+0.073
 Heidi+0.071
 Tab+0.071
Neg logit contributions
arl-0.074
lessness-0.062
auc-0.061
ot-0.059
washer-0.059
Token chair
Token ID5118
Feature activation-2.47e-02

Pos logit contributions
arl+0.015
auc+0.013
lessness+0.013
washer+0.013
udd+0.011
Neg logit contributions
 Jeffrey-0.013
 Raven-0.012
 Heidi-0.012
 Tab-0.012
 Gor-0.011
Token,
Token ID11
Feature activation-1.49e-02

Pos logit contributions
 Jeffrey+0.041
 Raven+0.038
 Tab+0.035
 Gor+0.034
 Heidi+0.034
Neg logit contributions
arl-0.054
lessness-0.048
auc-0.047
washer-0.046
udd-0.043
Token then
Token ID788
Feature activation-6.85e-03

Pos logit contributions
 Jeffrey+0.027
 Raven+0.026
 Gor+0.024
 Tab+0.023
 Heidi+0.023
Neg logit contributions
arl-0.029
lessness-0.027
washer-0.025
auc-0.025
udd-0.023
Token.
Token ID13
Feature activation-1.91e-02

Pos logit contributions
 Jeffrey+0.028
 Raven+0.026
 Gor+0.025
 Heidi+0.024
 Tab+0.024
Neg logit contributions
arl-0.026
lessness-0.023
auc-0.022
washer-0.021
udd-0.019
Token She
Token ID1375
Feature activation-2.20e-02

Pos logit contributions
 Jeffrey+0.038
 Raven+0.037
 Gor+0.033
 gor+0.033
Alexander+0.033
Neg logit contributions
arl-0.035
auc-0.029
lessness-0.029
washer-0.027
udd-0.025
Token started
Token ID2067
Feature activation-5.49e-02

Pos logit contributions
 Jeffrey+0.048
 Raven+0.045
 Heidi+0.043
 Tab+0.042
Alexander+0.041
Neg logit contributions
washer-0.030
arl-0.028
lessness-0.026
auc-0.026
udd-0.026
Token to
Token ID284
Feature activation-3.44e-02

Pos logit contributions
 Jeffrey+0.122
 Raven+0.114
tons+0.109
 Stanley+0.109
 Heidi+0.109
Neg logit contributions
arl-0.075
washer-0.062
lessness-0.061
auc-0.056
ot-0.055
Token read
Token ID1100
Feature activation-2.46e-02

Pos logit contributions
 Jeffrey+0.061
 Raven+0.057
 Gor+0.054
 Heidi+0.054
 Tab+0.054
Neg logit contributions
arl-0.059
 forgive-0.051
washer-0.050
 die-0.049
lessness-0.049
Token it
Token ID340
Feature activation-4.27e-02

Pos logit contributions
 Jeffrey+0.045
 Raven+0.041
 Gor+0.039
 Tab+0.039
 Heidi+0.039
Neg logit contributions
arl-0.044
lessness-0.040
auc-0.037
washer-0.037
udd-0.034
Token aloud
Token ID32227
Feature activation-2.28e-02

Pos logit contributions
 Jeffrey+0.081
 Raven+0.075
 Tab+0.072
 Heidi+0.072
 Gor+0.070
Neg logit contributions
arl-0.077
lessness-0.069
washer-0.067
auc-0.065
udd-0.060
Token.
Token ID13
Feature activation-1.75e-02

Pos logit contributions
 Jeffrey+0.044
 Raven+0.041
 Heidi+0.039
 Gor+0.038
 Tab+0.038
Neg logit contributions
arl-0.042
lessness-0.037
auc-0.034
washer-0.034
ry-0.032
Token\n
Token ID198
Feature activation-7.94e-03

Pos logit contributions
 Jeffrey+0.036
 Raven+0.035
 Gor+0.032
 gor+0.032
Alexander+0.031
Neg logit contributions
arl-0.031
auc-0.025
lessness-0.025
washer-0.023
udd-0.022
Token\n
Token ID198
Feature activation-1.39e-02

Pos logit contributions
 Jeffrey+0.015
 Raven+0.013
 Tab+0.012
 Heidi+0.012
 Gor+0.012
Neg logit contributions
arl-0.017
lessness-0.014
auc-0.014
washer-0.013
udd-0.012
TokenM
Token ID44
Feature activation-1.80e-02

Pos logit contributions
 Jeffrey+0.024
 Raven+0.023
 Tab+0.021
 Gor+0.021
 Heidi+0.020
Neg logit contributions
arl-0.030
lessness-0.026
auc-0.025
washer-0.024
udd-0.023