TOP ACTIVATIONS
MAX = 0.112

laugh and shout , " Whe e !" They
laugh and shout , " Whe e ! That was fun
something bad happened . <|endoftext|> Cl ara liked to climb .
happily ever after ! <|endoftext|> Cl are nce was a little
stirred until it was all soft and fluffy . Then ,
smiles and shouts , " Whe e ! That was fun
and slides down . " Whe e !" "
neat and tidy all year round . One day
and slides down . " Whe e !" "
gently and said , " Whe e , Brown ie ,
and slides down . " Whe e !" "
smiled and shouted , " Whe e !" L
until next time . <|endoftext|> Cl aire was playing in the
jelly . It was all squ ashed and dull now .
y ummy cookie , all soft and sweet . Lily was
yellow with lots of pol ka dots . L
so they did . <|endoftext|> Cl aire was feeling very proud
slides down fast . " Whe e !" she shouts .
and slides down . " Whe e !" "
caught them and ate them both . The end . <|endoftext|>

MIN ACTIVATIONS
MIN = -0.114

’ t worry , Sam . I can help
. " Don 't worry , Tim ," she said .
’ s a surprise ! ”
: " Don 't worry , it will be alright ,
said " Don 't worry , I won 't hurt you
of her . " Hello , Lily ! I am the
the rat . " Hello , rat . What is your
, " Don 't worry , Lily . You have a
, " Don 't worry , Ben , it 's not
, " Don 't worry , sweet ie . It was
, " Don 't worry , Frost y . Let 's
snow man . " Hello , Mr . Snow man !"
, " Don 't worry , we can repair it !"
loud voice . " Hello , children !" she said .
, " Don 't worry , Lily . We can plan
, " Don 't worry , little bird . I will
, " Don 't worry , Tim my . We can
and said , " Hello , sir . We are trying
and said , " Hello , Rose . I am Tom
She said , " Hello , do ggy . I want

INTERVAL 0.056 - 0.112
CONTAINS 1.711%

he said , making them all jump . They smiled and
that he was going too fast and that was very stupid
big tree with a sign on it . The sign said
studying very hard and was feeling nervous . He put on
to write about flowers and stars . They want to write

INTERVAL -0.001 - 0.056
CONTAINS 55.214%

that she wanted . " Mom my , can
. They argued and argued . They did not
her favorite vegetable . One day , she was eating caul
saw big trees , pretty flowers , and fast cars .
fun . The squirrel thanked the boy and

INTERVAL -0.057 - -0.001
CONTAINS 39.368%

that noise ?" The trees replied , " It 's just
day , they cooked the sa us ages and had a
again and w add led to the pond . Lily followed
dangerous . He did not want to try it . He
She went to the living room , where her brother

INTERVAL -0.114 - -0.057
CONTAINS 3.708%

Lily wanted to spin more , but it started to rain
Tim rode the sc ooter , he went very fast .
. She said , " Can I strike the key ?"
day on , Prince Jack knew he wanted to be a
stay and watch the sunset , so she found a big
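
The four INTERVAL blocks above look like equal-width bins spanning MIN (-0.114) to MAX (0.112), and their CONTAINS percentages sum to roughly 100%. A minimal sketch of how such shares could be computed is shown here. The function name `interval_shares` and the sample inputs are hypothetical, and the binning rule is an assumption inferred from the reported edges, not the tool's confirmed method.

```python
def interval_shares(activations, n_bins=4):
    """Split [min, max] into n_bins equal-width intervals.

    Returns (edges, percents), where percents[i] is the share of
    activations that fall into the i-th interval.
    Assumption: the dashboard bins by equal width, as its edges suggest.
    """
    lo, hi = min(activations), max(activations)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for a in activations:
        if width > 0:
            # The maximum value is assigned to the last interval.
            idx = min(int((a - lo) / width), n_bins - 1)
        else:
            # All values identical: everything lands in the first interval.
            idx = 0
        counts[idx] += 1
    total = len(activations)
    percents = [100.0 * c / total for c in counts]
    return edges, percents


# Edges implied by the reported MIN and MAX (only two points are needed
# to recover the bin boundaries).
reported_edges, _ = interval_shares([-0.114, 0.112])

# Toy data, purely illustrative.
toy_edges, toy_percents = interval_shares([0, 1, 2, 3, 4, 5, 6, 7])
```

With the reported MIN and MAX, the computed boundaries land at about -0.0575, -0.001 and 0.0555, which agrees with the printed interval edges up to rounding.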
Token laugh
Token ID6487
Feature activation+5.51e-02

Pos logit contributions
acks+0.105
irmed+0.092
acked+0.090
irted+0.087
 drifted+0.087
Neg logit contributions
 why-0.081
ache-0.081
chard-0.079
cher-0.078
 sir-0.076
Token and
Token ID290
Feature activation+5.31e-02

Pos logit contributions
acks+0.116
irmed+0.104
|+0.103
 drifted+0.101
irted+0.098
Neg logit contributions
 oh-0.076
ache-0.074
 why-0.073
 lawyer-0.068
cher-0.067
Token shout
Token ID9191
Feature activation+3.44e-02

Pos logit contributions
acks+0.110
acked+0.101
igg+0.099
irmed+0.098
irted+0.098
Neg logit contributions
 oh-0.067
 why-0.066
chard-0.064
cher-0.063
 sir-0.062
Token,
Token ID11
Feature activation-4.72e-02

Pos logit contributions
acks+0.066
 drifted+0.058
irmed+0.057
|+0.055
irt+0.054
Neg logit contributions
 oh-0.053
ache-0.053
 why-0.050
 lawyer-0.050
cher-0.050
Token "
Token ID366
Feature activation-3.61e-02

Pos logit contributions
ache+0.051
 oh+0.048
cher+0.047
 lawyer+0.046
chard+0.045
Neg logit contributions
acks-0.114
 drifted-0.104
irmed-0.101
|-0.096
irt-0.096
TokenWhe
Token ID10842
Feature activation+1.12e-01

Pos logit contributions
 oh+0.060
can+0.056
ache+0.055
cher+0.054
 lawyer+0.052
Neg logit contributions
acks-0.066
irmed-0.058
 drifted-0.056
irt-0.055
irm-0.054
Tokene
Token ID68
Feature activation+4.91e-02

Pos logit contributions
acks+0.171
 drifted+0.160
irmed+0.156
 drift+0.145
 They+0.143
Neg logit contributions
 oh-0.198
 why-0.189
ister-0.179
can-0.175
 salon-0.175
Token!"
Token ID2474
Feature activation+2.13e-02

Pos logit contributions
acks+0.100
irmed+0.096
 drifted+0.094
|+0.090
irt+0.089
Neg logit contributions
ache-0.065
 oh-0.064
 lawyer-0.057
 why-0.056
cher-0.056
Token\n
Token ID198
Feature activation+2.66e-02

Pos logit contributions
acks+0.039
|+0.037
 minds+0.037
irmed+0.036
 sorts+0.036
Neg logit contributions
 why-0.029
 oh-0.028
 doctor-0.026
 sir-0.026
but-0.026
Token\n
Token ID198
Feature activation+1.98e-02

Pos logit contributions
acks+0.047
irmed+0.045
 drifted+0.043
 minds+0.041
irted+0.040
Neg logit contributions
 why-0.041
 oh-0.041
ache-0.040
 sir-0.039
 lawyer-0.038
TokenThey
Token ID2990
Feature activation+5.07e-02

Pos logit contributions
acks+0.042
irmed+0.037
 drifted+0.037
|+0.036
 minds+0.035
Neg logit contributions
 oh-0.027
ache-0.026
 why-0.025
chard-0.025
 lawyer-0.024
Token laugh
Token ID6487
Feature activation+5.72e-02

Pos logit contributions
acks+0.110
irmed+0.099
acked+0.098
irted+0.094
 drifted+0.093
Neg logit contributions
ache-0.089
 why-0.088
 salon-0.084
chard-0.082
 lawyer-0.082
Token and
Token ID290
Feature activation+5.64e-02

Pos logit contributions
acks+0.118
irmed+0.107
|+0.106
 drifted+0.104
irted+0.100
Neg logit contributions
 oh-0.079
 why-0.079
ache-0.078
 lawyer-0.073
 sir-0.071
Token shout
Token ID9191
Feature activation+3.60e-02

Pos logit contributions
acks+0.110
acked+0.102
igg+0.101
irted+0.100
irmed+0.099
Neg logit contributions
 why-0.073
chard-0.073
 sir-0.070
 ring-0.070
 oh-0.070
Token,
Token ID11
Feature activation-4.61e-02

Pos logit contributions
acks+0.062
 drifted+0.054
irmed+0.054
|+0.051
irt+0.050
Neg logit contributions
 oh-0.063
ache-0.062
 why-0.060
 lawyer-0.059
cher-0.059
Token "
Token ID366
Feature activation-3.30e-02

Pos logit contributions
ache+0.044
 oh+0.043
cher+0.040
 lawyer+0.039
chard+0.039
Neg logit contributions
acks-0.117
 drifted-0.107
irmed-0.105
|-0.101
irt-0.100
TokenWhe
Token ID10842
Feature activation+1.12e-01

Pos logit contributions
 oh+0.054
ache+0.051
can+0.050
cher+0.049
 lawyer+0.047
Neg logit contributions
acks-0.059
irmed-0.053
 drifted-0.052
irt-0.051
irm-0.050
Tokene
Token ID68
Feature activation+4.70e-02

Pos logit contributions
acks+0.174
 drifted+0.165
irmed+0.158
 drift+0.149
irm+0.142
Neg logit contributions
 oh-0.194
 why-0.188
ister-0.178
 salon-0.178
 lawyer-0.174
Token!
Token ID0
Feature activation-5.47e-03

Pos logit contributions
acks+0.095
irmed+0.092
 drifted+0.089
|+0.087
irt+0.085
Neg logit contributions
ache-0.063
 oh-0.061
 sir-0.055
 lawyer-0.055
 why-0.055
Token That
Token ID1320
Feature activation-3.87e-04

Pos logit contributions
 oh+0.008
 sir+0.008
can+0.008
 lawyer+0.008
ache+0.007
Neg logit contributions
acks-0.010
irmed-0.009
 drifted-0.009
irm-0.009
irt-0.009
Token was
Token ID373
Feature activation-8.78e-03

Pos logit contributions
 oh+0.001
ache+0.001
 why+0.001
 lawyer+0.001
 sir+0.001
Neg logit contributions
acks-0.001
irmed-0.001
 drifted-0.001
|-0.001
irt-0.001
Token fun
Token ID1257
Feature activation+2.14e-02

Pos logit contributions
ache+0.017
 oh+0.017
cher+0.016
 lawyer+0.016
chard+0.016
Neg logit contributions
acks-0.014
 drifted-0.011
irmed-0.011
|-0.011
irt-0.010
Token something
Token ID1223
Feature activation-3.49e-02

Pos logit contributions
 oh+0.002
ache+0.002
 why+0.002
 lawyer+0.002
cher+0.002
Neg logit contributions
acks-0.004
 drifted-0.003
irmed-0.003
|-0.003
irt-0.003
Token bad
Token ID2089
Feature activation+5.07e-03

Pos logit contributions
 oh+0.047
 why+0.046
ache+0.046
 lawyer+0.043
cher+0.042
Neg logit contributions
acks-0.075
irmed-0.066
 drifted-0.065
acked-0.064
|-0.064
Token happened
Token ID3022
Feature activation+3.33e-03

Pos logit contributions
acks+0.009
 drifted+0.008
irmed+0.008
|+0.008
irt+0.008
Neg logit contributions
ache-0.008
 oh-0.008
chard-0.008
cher-0.008
 lawyer-0.008
Token.
Token ID13
Feature activation+1.28e-02

Pos logit contributions
acks+0.005
 drifted+0.005
irmed+0.005
|+0.004
irt+0.004
Neg logit contributions
ache-0.006
 oh-0.006
 lawyer-0.006
cher-0.006
chard-0.006
Token<|endoftext|>
Token ID50256
Feature activation+6.77e-02

Pos logit contributions
acks+0.026
irmed+0.025
irted+0.024
irt+0.024
 drifted+0.023
Neg logit contributions
 why-0.018
 lawyer-0.018
 oh-0.017
ache-0.016
 doctor-0.016
TokenCl
Token ID2601
Feature activation+1.11e-01

Pos logit contributions
acks+0.121
irmed+0.113
 drifted+0.109
irt+0.107
irted+0.105
Neg logit contributions
 oh-0.101
 lawyer-0.100
 why-0.099
cher-0.099
ache-0.099
Tokenara
Token ID3301
Feature activation+4.51e-02

Pos logit contributions
acks+0.174
irmed+0.165
 drifted+0.157
 tails+0.154
 breaths+0.153
Neg logit contributions
ache-0.177
can-0.173
 lawyer-0.171
 sir-0.170
 why-0.169
Token liked
Token ID8288
Feature activation-6.09e-03

Pos logit contributions
acks+0.093
irmed+0.084
irt+0.082
irm+0.079
 drifted+0.078
Neg logit contributions
 why-0.057
ache-0.056
 lawyer-0.054
 oh-0.052
 office-0.050
Token to
Token ID284
Feature activation-3.49e-02

Pos logit contributions
ache+0.008
cher+0.008
 oh+0.007
 sir+0.007
 why+0.007
Neg logit contributions
acks-0.013
 drifted-0.012
irmed-0.011
|-0.011
irt-0.011
Token climb
Token ID12080
Feature activation+1.36e-02

Pos logit contributions
 oh+0.060
ache+0.059
 why+0.058
 lawyer+0.057
chard+0.056
Neg logit contributions
acks-0.061
irmed-0.054
 drifted-0.052
acked-0.050
|-0.050
Token.
Token ID13
Feature activation+9.17e-03

Pos logit contributions
acks+0.025
 drifted+0.023
irmed+0.022
|+0.021
irt+0.021
Neg logit contributions
 oh-0.022
ache-0.021
 why-0.021
chard-0.020
 lawyer-0.020
Token happily
Token ID18177
Feature activation+4.51e-02

Pos logit contributions
acks+0.009
 drifted+0.007
irmed+0.007
irt+0.007
|+0.007
Neg logit contributions
 oh-0.014
chard-0.013
ache-0.013
 why-0.013
cher-0.013
Token ever
Token ID1683
Feature activation+5.67e-02

Pos logit contributions
acks+0.090
irmed+0.079
irt+0.076
 drifted+0.075
irted+0.074
Neg logit contributions
 oh-0.067
ache-0.067
cher-0.066
 lawyer-0.063
chard-0.062
Token after
Token ID706
Feature activation+1.11e-02

Pos logit contributions
acks+0.120
irmed+0.106
 drifted+0.106
acked+0.101
irt+0.101
Neg logit contributions
 oh-0.079
ache-0.078
chard-0.074
 why-0.074
 lawyer-0.072
Token!
Token ID0
Feature activation+3.05e-03

Pos logit contributions
acks+0.022
irmed+0.020
irt+0.019
irted+0.019
 drifted+0.019
Neg logit contributions
 why-0.016
 oh-0.016
ache-0.015
 lawyer-0.015
cher-0.015
Token<|endoftext|>
Token ID50256
Feature activation+6.20e-02

Pos logit contributions
acks+0.006
irmed+0.006
irted+0.006
 breaths+0.006
irt+0.006
Neg logit contributions
 lawyer-0.004
 why-0.004
 sir-0.004
 principal-0.004
 doctor-0.004
TokenCl
Token ID2601
Feature activation+1.10e-01

Pos logit contributions
acks+0.106
irmed+0.099
irt+0.095
 drifted+0.095
irted+0.092
Neg logit contributions
can-0.097
cher-0.095
 oh-0.095
 lawyer-0.094
 why-0.093
Tokenare
Token ID533
Feature activation+4.88e-02

Pos logit contributions
acks+0.166
irmed+0.166
 breaths+0.152
 drifted+0.150
irm+0.147
Neg logit contributions
can-0.177
ber-0.173
 sir-0.172
keeper-0.169
lock-0.168
Tokennce
Token ID1198
Feature activation+1.47e-02

Pos logit contributions
acks+0.093
irmed+0.091
irm+0.085
irt+0.082
irted+0.082
Neg logit contributions
 why-0.067
can-0.062
 oh-0.061
 lawyer-0.060
cher-0.058
Token was
Token ID373
Feature activation-9.46e-04

Pos logit contributions
acks+0.029
 drifted+0.025
irmed+0.025
|+0.025
text+0.024
Neg logit contributions
 oh-0.021
 why-0.021
chard-0.021
ache-0.021
 lawyer-0.020
Token a
Token ID257
Feature activation-1.67e-02

Pos logit contributions
 oh+0.001
chard+0.001
 why+0.001
 lawyer+0.001
ache+0.001
Neg logit contributions
acks-0.002
irmed-0.002
 drifted-0.002
 minds-0.002
text-0.002
Token little
Token ID1310
Feature activation+2.29e-02

Pos logit contributions
 oh+0.025
ache+0.025
 lawyer+0.024
chard+0.024
 why+0.024
Neg logit contributions
acks-0.034
 drifted-0.029
irmed-0.029
|-0.028
acked-0.027
Token stirred
Token ID33091
Feature activation+2.95e-02

Pos logit contributions
acks+0.046
|+0.039
text+0.039
irmed+0.039
irt+0.037
Neg logit contributions
 oh-0.049
 why-0.047
ache-0.047
chard-0.045
cher-0.044
Token until
Token ID1566
Feature activation+1.45e-02

Pos logit contributions
acks+0.052
 drifted+0.046
irmed+0.045
|+0.043
irt+0.042
Neg logit contributions
 oh-0.050
ache-0.050
 why-0.048
 lawyer-0.047
chard-0.047
Token it
Token ID340
Feature activation+2.72e-02

Pos logit contributions
acks+0.031
|+0.027
irmed+0.027
irt+0.026
 drifted+0.026
Neg logit contributions
 oh-0.021
 why-0.019
chard-0.019
ache-0.019
 lawyer-0.018
Token was
Token ID373
Feature activation+8.94e-03

Pos logit contributions
acks+0.055
irmed+0.050
irt+0.048
|+0.048
 drifted+0.048
Neg logit contributions
 oh-0.038
ache-0.037
 why-0.035
cher-0.033
can-0.033
Token all
Token ID477
Feature activation+8.85e-02

Pos logit contributions
acks+0.016
irmed+0.014
|+0.014
 drifted+0.014
text+0.013
Neg logit contributions
 oh-0.015
ache-0.015
 why-0.014
chard-0.014
 lawyer-0.013
Token soft
Token ID2705
Feature activation+1.09e-01

Pos logit contributions
acks+0.178
irmed+0.161
text+0.158
|+0.155
irt+0.151
Neg logit contributions
ache-0.123
 oh-0.123
 why-0.123
chard-0.115
 worker-0.109
Token and
Token ID290
Feature activation+3.07e-02

Pos logit contributions
acks+0.210
|+0.188
irmed+0.188
 drifted+0.187
irt+0.182
Neg logit contributions
ache-0.162
 oh-0.161
 why-0.151
 lawyer-0.144
chard-0.142
Token fluffy
Token ID39145
Feature activation+9.58e-02

Pos logit contributions
acks+0.055
 drifted+0.048
irmed+0.047
|+0.045
irt+0.044
Neg logit contributions
ache-0.052
 oh-0.052
cher-0.049
 lawyer-0.049
 why-0.049
Token.
Token ID13
Feature activation+9.11e-03

Pos logit contributions
acks+0.177
irmed+0.156
|+0.156
 drifted+0.154
irt+0.151
Neg logit contributions
ache-0.154
 oh-0.152
 why-0.142
 lawyer-0.138
chard-0.137
Token Then
Token ID3244
Feature activation+9.88e-03

Pos logit contributions
acks+0.016
irmed+0.015
 drifted+0.015
irt+0.014
|+0.014
Neg logit contributions
 oh-0.015
 why-0.014
 lawyer-0.014
chard-0.014
 clerk-0.013
Token,
Token ID11
Feature activation-6.52e-02

Pos logit contributions
acks+0.021
 drifted+0.019
irmed+0.019
|+0.018
irt+0.018
Neg logit contributions
 oh-0.013
ache-0.013
 why-0.013
 lawyer-0.012
cher-0.012
Token smiles
Token ID21845
Feature activation+5.16e-02

Pos logit contributions
acks+0.032
|+0.028
acked+0.027
irmed+0.027
 drifted+0.026
Neg logit contributions
 why-0.036
 salon-0.035
ache-0.035
 oh-0.034
 doctor-0.033
Token and
Token ID290
Feature activation+3.77e-02

Pos logit contributions
acks+0.094
 drifted+0.082
irmed+0.081
|+0.077
irt+0.076
Neg logit contributions
 oh-0.086
ache-0.085
 why-0.081
 lawyer-0.081
cher-0.080
Token shouts
Token ID34757
Feature activation+2.59e-02

Pos logit contributions
acks+0.059
|+0.059
acked+0.053
alk+0.052
text+0.050
Neg logit contributions
 why-0.067
 oh-0.065
ache-0.060
 sir-0.060
cher-0.058
Token,
Token ID11
Feature activation-5.83e-02

Pos logit contributions
acks+0.054
 drifted+0.048
irmed+0.048
|+0.047
irt+0.045
Neg logit contributions
 oh-0.036
ache-0.035
 why-0.034
 lawyer-0.033
cher-0.032
Token "
Token ID366
Feature activation-3.89e-02

Pos logit contributions
ache+0.072
 lawyer+0.067
cher+0.067
 oh+0.066
 principal+0.063
Neg logit contributions
acks-0.128
 drifted-0.122
irmed-0.115
 They-0.110
 drift-0.108
TokenWhe
Token ID10842
Feature activation+1.09e-01

Pos logit contributions
 oh+0.068
ache+0.063
cher+0.060
 lawyer+0.059
can+0.059
Neg logit contributions
acks-0.069
irmed-0.059
 drifted-0.059
irt-0.058
|-0.056
Tokene
Token ID68
Feature activation+4.42e-02

Pos logit contributions
acks+0.163
 drifted+0.148
irmed+0.144
 drift+0.132
 They+0.131
Neg logit contributions
 oh-0.203
 why-0.195
cher-0.184
 salon-0.184
can-0.183
Token!
Token ID0
Feature activation-9.85e-03

Pos logit contributions
acks+0.089
irmed+0.083
 drifted+0.081
|+0.078
irt+0.078
Neg logit contributions
 oh-0.061
ache-0.061
cher-0.055
 lawyer-0.054
 why-0.054
Token That
Token ID1320
Feature activation-3.86e-03

Pos logit contributions
 oh+0.016
ache+0.015
 sir+0.015
 lawyer+0.014
 why+0.014
Neg logit contributions
acks-0.019
 drifted-0.017
irmed-0.016
|-0.015
irt-0.015
Token was
Token ID373
Feature activation-1.11e-02

Pos logit contributions
 oh+0.007
ache+0.007
 why+0.006
 lawyer+0.006
 sir+0.006
Neg logit contributions
acks-0.007
irmed-0.006
 drifted-0.006
|-0.006
irt-0.005
Token fun
Token ID1257
Feature activation+1.91e-02

Pos logit contributions
ache+0.020
 oh+0.020
cher+0.019
 lawyer+0.019
chard+0.019
Neg logit contributions
acks-0.019
 drifted-0.016
irmed-0.016
|-0.015
irt-0.015
Token and
Token ID290
Feature activation+3.52e-02

Pos logit contributions
 oh+0.016
ache+0.016
cher+0.015
 why+0.015
 lawyer+0.015
Neg logit contributions
acks-0.015
 drifted-0.013
irmed-0.013
|-0.012
acked-0.012
Token slides
Token ID19392
Feature activation+5.28e-02

Pos logit contributions
acks+0.060
 drifted+0.052
irmed+0.051
|+0.050
acked+0.049
Neg logit contributions
 oh-0.061
 why-0.059
ache-0.059
 lawyer-0.057
chard-0.057
Token down
Token ID866
Feature activation+4.31e-02

Pos logit contributions
acks+0.099
 drifted+0.086
irmed+0.085
|+0.081
irt+0.080
Neg logit contributions
 oh-0.085
ache-0.085
cher-0.080
 lawyer-0.080
 why-0.080
Token.
Token ID13
Feature activation+7.80e-03

Pos logit contributions
acks+0.076
 drifted+0.066
irmed+0.066
|+0.062
irt+0.062
Neg logit contributions
 oh-0.073
ache-0.072
 lawyer-0.070
 why-0.070
chard-0.069
Token "
Token ID366
Feature activation-3.38e-02

Pos logit contributions
acks+0.013
 drifted+0.011
irmed+0.011
|+0.010
irt+0.010
Neg logit contributions
 oh-0.015
ache-0.015
 why-0.014
 lawyer-0.014
cher-0.014
TokenWhe
Token ID10842
Feature activation+1.08e-01

Pos logit contributions
 oh+0.051
ache+0.049
 lawyer+0.047
chard+0.046
 why+0.046
Neg logit contributions
acks-0.067
 drifted-0.059
irmed-0.059
irt-0.055
|-0.055
Tokene
Token ID68
Feature activation+4.26e-02

Pos logit contributions
acks+0.160
 drifted+0.147
irmed+0.140
 drift+0.131
irm+0.126
Neg logit contributions
 oh-0.205
 why-0.198
 salon-0.184
can-0.183
 lawyer-0.183
Token!"
Token ID2474
Feature activation+1.16e-02

Pos logit contributions
acks+0.083
irmed+0.076
 drifted+0.076
irt+0.071
|+0.070
Neg logit contributions
 oh-0.063
ache-0.063
 lawyer-0.058
cher-0.058
 why-0.057
Token\n
Token ID198
Feature activation+2.33e-02

Pos logit contributions
acks+0.022
irmed+0.020
 drifted+0.020
|+0.019
irt+0.019
Neg logit contributions
 oh-0.018
 why-0.017
ache-0.017
 lawyer-0.017
 sir-0.017
Token\n
Token ID198
Feature activation+1.75e-02

Pos logit contributions
acks+0.035
 drifted+0.030
irmed+0.030
|+0.028
irt+0.027
Neg logit contributions
 oh-0.046
ache-0.045
 why-0.044
 lawyer-0.043
cher-0.043
Token"
Token ID1
Feature activation-6.00e-02

Pos logit contributions
acks+0.034
 drifted+0.029
irmed+0.029
acked+0.028
|+0.028
Neg logit contributions
ache-0.028
 oh-0.026
chard-0.026
cher-0.026
 lawyer-0.026
Token neat
Token ID15049
Feature activation+4.64e-02

Pos logit contributions
ache+0.012
 oh+0.011
 lawyer+0.011
cher+0.011
 sir+0.011
Neg logit contributions
acks-0.012
 drifted-0.011
irmed-0.010
irt-0.010
|-0.009
Token and
Token ID290
Feature activation+2.47e-02

Pos logit contributions
acks+0.072
 drifted+0.062
irmed+0.060
irt+0.055
acked+0.055
Neg logit contributions
 oh-0.088
ache-0.086
chard-0.084
 lawyer-0.083
cher-0.083
Token tidy
Token ID43044
Feature activation+2.05e-02

Pos logit contributions
acks+0.038
 drifted+0.036
irmed+0.033
 minds+0.031
 evenings+0.031
Neg logit contributions
ache-0.047
 lawyer-0.045
 principal-0.044
cher-0.043
 sir-0.043
Token all
Token ID477
Feature activation+8.92e-02

Pos logit contributions
acks+0.031
 drifted+0.026
irmed+0.025
|+0.024
irt+0.023
Neg logit contributions
 oh-0.040
ache-0.040
cher-0.039
 lawyer-0.039
chard-0.038
Token year
Token ID614
Feature activation+5.37e-02

Pos logit contributions
acks+0.180
irmed+0.167
 drifted+0.164
text+0.156
irted+0.152
Neg logit contributions
 why-0.133
 oh-0.125
chard-0.117
 worker-0.114
 lawyer-0.113
Token round
Token ID2835
Feature activation+1.08e-01

Pos logit contributions
acks+0.098
 drifted+0.085
irmed+0.084
|+0.080
irt+0.079
Neg logit contributions
 oh-0.089
ache-0.088
cher-0.084
 lawyer-0.084
 why-0.083
Token.
Token ID13
Feature activation+1.71e-02

Pos logit contributions
acks+0.189
 drifted+0.165
irmed+0.163
|+0.160
irt+0.153
Neg logit contributions
 oh-0.183
ache-0.182
 why-0.175
 lawyer-0.170
cher-0.168
Token\n
Token ID198
Feature activation+2.10e-02

Pos logit contributions
acks+0.035
irmed+0.034
 drifted+0.033
irted+0.033
irt+0.032
Neg logit contributions
 why-0.023
 lawyer-0.022
 doctor-0.022
 sir-0.021
 oh-0.021
Token\n
Token ID198
Feature activation+1.26e-02

Pos logit contributions
acks+0.034
irmed+0.029
 drifted+0.029
irt+0.027
irted+0.027
Neg logit contributions
 oh-0.038
ache-0.038
 why-0.037
 lawyer-0.037
cher-0.036
TokenOne
Token ID3198
Feature activation+2.45e-02

Pos logit contributions
acks+0.022
irmed+0.020
 drifted+0.020
irted+0.018
irt+0.018
Neg logit contributions
 oh-0.022
 lawyer-0.021
 why-0.021
ache-0.021
chard-0.020
Token day
Token ID1110
Feature activation+6.80e-03

Pos logit contributions
acks+0.033
irmed+0.027
 drifted+0.026
acked+0.024
text+0.024
Neg logit contributions
 oh-0.054
 why-0.052
 lawyer-0.052
ache-0.052
chard-0.050
Token and
Token ID290
Feature activation+3.81e-02

Pos logit contributions
acks+0.029
 drifted+0.025
irmed+0.025
|+0.024
irt+0.023
Neg logit contributions
 oh-0.027
ache-0.027
 why-0.025
 lawyer-0.025
cher-0.025
Token slides
Token ID19392
Feature activation+5.29e-02

Pos logit contributions
acks+0.066
 drifted+0.057
irmed+0.056
|+0.056
acked+0.054
Neg logit contributions
 oh-0.065
 why-0.063
ache-0.062
 salon-0.061
 lawyer-0.061
Token down
Token ID866
Feature activation+4.22e-02

Pos logit contributions
acks+0.098
 drifted+0.086
irmed+0.085
|+0.080
irt+0.080
Neg logit contributions
 oh-0.086
ache-0.086
cher-0.081
 lawyer-0.081
 why-0.081
Token.
Token ID13
Feature activation+8.75e-03

Pos logit contributions
acks+0.074
 drifted+0.064
irmed+0.064
irt+0.061
|+0.061
Neg logit contributions
 oh-0.072
ache-0.070
 lawyer-0.068
 why-0.068
chard-0.068
Token "
Token ID366
Feature activation-3.40e-02

Pos logit contributions
acks+0.016
 drifted+0.014
irmed+0.014
|+0.014
irt+0.013
Neg logit contributions
 oh-0.014
ache-0.014
 why-0.013
 lawyer-0.013
chard-0.013
TokenWhe
Token ID10842
Feature activation+1.07e-01

Pos logit contributions
 oh+0.052
ache+0.049
 lawyer+0.048
chard+0.047
cher+0.047
Neg logit contributions
acks-0.067
 drifted-0.059
irmed-0.059
irt-0.055
|-0.055
Tokene
Token ID68
Feature activation+4.28e-02

Pos logit contributions
acks+0.160
 drifted+0.148
irmed+0.141
 drift+0.133
irm+0.126
Neg logit contributions
 oh-0.202
 why-0.194
 salon-0.181
can-0.181
 lawyer-0.179
Token!"
Token ID2474
Feature activation+1.13e-02

Pos logit contributions
acks+0.083
irmed+0.078
 drifted+0.077
irt+0.072
irted+0.071
Neg logit contributions
 oh-0.062
ache-0.062
 lawyer-0.057
cher-0.057
 why-0.056
Token\n
Token ID198
Feature activation+2.31e-02

Pos logit contributions
acks+0.026
|+0.025
acked+0.023
irmed+0.023
 minds+0.023
Neg logit contributions
 oh-0.014
 why-0.013
 sir-0.012
 please-0.012
 doctor-0.012
Token\n
Token ID198
Feature activation+1.68e-02

Pos logit contributions
acks+0.039
irmed+0.035
 drifted+0.035
|+0.032
 minds+0.032
Neg logit contributions
 oh-0.039
ache-0.038
 why-0.038
 lawyer-0.037
 sir-0.037
Token"
Token ID1
Feature activation-6.04e-02

Pos logit contributions
acks+0.036
 drifted+0.032
irmed+0.032
|+0.031
irt+0.030
Neg logit contributions
 oh-0.023
ache-0.022
 lawyer-0.021
 why-0.021
cher-0.021
Token gently
Token ID15165
Feature activation+2.87e-02

Pos logit contributions
 oh+0.017
ache+0.017
cher+0.016
chard+0.016
 lawyer+0.016
Neg logit contributions
acks-0.019
 drifted-0.016
irmed-0.016
irt-0.015
|-0.015
Token and
Token ID290
Feature activation+2.37e-02

Pos logit contributions
acks+0.048
 drifted+0.040
irmed+0.039
irt+0.037
acked+0.036
Neg logit contributions
 oh-0.054
ache-0.053
cher-0.052
 lawyer-0.051
chard-0.051
Token said
Token ID531
Feature activation-2.53e-02

Pos logit contributions
 drifted+0.033
acks+0.032
irmed+0.029
 drift+0.027
irted+0.026
Neg logit contributions
ache-0.046
 lawyer-0.045
 sir-0.043
card-0.042
 oh-0.042
Token,
Token ID11
Feature activation-6.35e-02

Pos logit contributions
chard+0.049
cher+0.048
card+0.048
itor+0.047
can+0.046
Neg logit contributions
acks-0.033
 drifted-0.030
 They-0.029
text-0.028
 care-0.027
Token "
Token ID366
Feature activation-4.36e-02

Pos logit contributions
ache+0.094
cher+0.093
 oh+0.090
 principal+0.089
chard+0.088
Neg logit contributions
acks-0.113
 They-0.109
 drifted-0.106
 drift-0.099
 Soon-0.099
TokenWhe
Token ID10842
Feature activation+1.07e-01

Pos logit contributions
 oh+0.060
ache+0.059
 lawyer+0.055
cher+0.053
can+0.052
Neg logit contributions
acks-0.094
irmed-0.084
 drifted-0.081
irm-0.080
irt-0.080
Tokene
Token ID68
Feature activation+4.52e-02

Pos logit contributions
acks+0.163
 drifted+0.146
irmed+0.141
 drift+0.132
irm+0.126
Neg logit contributions
 oh-0.205
 why-0.193
 lawyer-0.183
 salon-0.183
can-0.182
Token,
Token ID11
Feature activation-6.77e-02

Pos logit contributions
acks+0.089
irmed+0.086
 drifted+0.084
irt+0.080
reci+0.079
Neg logit contributions
ache-0.063
 oh-0.062
 lawyer-0.057
cher-0.056
 sir-0.054
Token Brown
Token ID4373
Feature activation+2.01e-02

Pos logit contributions
ache+0.110
 oh+0.110
cher+0.105
chard+0.105
 lawyer+0.102
Neg logit contributions
acks-0.120
 drifted-0.108
|-0.103
irmed-0.102
text-0.099
Tokenie
Token ID494
Feature activation+1.57e-02

Pos logit contributions
acks+0.030
 drifted+0.025
irmed+0.024
|+0.023
irt+0.023
Neg logit contributions
 oh-0.040
ache-0.040
 lawyer-0.038
 why-0.038
cher-0.038
Token,
Token ID11
Feature activation-7.46e-02

Pos logit contributions
acks+0.031
irmed+0.028
 drifted+0.028
|+0.026
irt+0.026
Neg logit contributions
ache-0.024
 oh-0.023
 why-0.022
 lawyer-0.022
cher-0.022
Token and
Token ID290
Feature activation+3.51e-02

Pos logit contributions
 oh+0.016
ache+0.016
cher+0.015
 why+0.015
 lawyer+0.015
Neg logit contributions
acks-0.015
 drifted-0.013
irmed-0.013
|-0.012
acked-0.012
Token slides
Token ID19392
Feature activation+5.30e-02

Pos logit contributions
acks+0.061
 drifted+0.053
irmed+0.052
|+0.051
acked+0.050
Neg logit contributions
 oh-0.059
 why-0.058
ache-0.057
 lawyer-0.056
 salon-0.056
Token down
Token ID866
Feature activation+4.26e-02

Pos logit contributions
acks+0.100
 drifted+0.087
irmed+0.086
|+0.082
irt+0.081
Neg logit contributions
 oh-0.084
ache-0.084
 why-0.079
cher-0.079
 lawyer-0.079
Token.
Token ID13
Feature activation+8.89e-03

Pos logit contributions
acks+0.075
 drifted+0.065
irmed+0.064
|+0.061
irt+0.061
Neg logit contributions
 oh-0.073
ache-0.072
 why-0.069
 lawyer-0.069
chard-0.069
Token "
Token ID366
Feature activation-3.29e-02

Pos logit contributions
acks+0.013
 drifted+0.011
irmed+0.011
acked+0.010
irt+0.010
Neg logit contributions
ache-0.019
 oh-0.018
cher-0.018
 lawyer-0.017
chard-0.017
TokenWhe
Token ID10842
Feature activation+1.07e-01

Pos logit contributions
 oh+0.050
ache+0.047
 lawyer+0.046
chard+0.045
cher+0.045
Neg logit contributions
acks-0.065
 drifted-0.057
irmed-0.057
irt-0.053
|-0.053
Tokene
Token ID68
Feature activation+4.18e-02

Pos logit contributions
acks+0.157
 drifted+0.143
irmed+0.137
 drift+0.127
irm+0.123
Neg logit contributions
 oh-0.205
 why-0.198
 salon-0.184
 lawyer-0.183
can-0.183
Token!"
Token ID2474
Feature activation+1.26e-02

Pos logit contributions
acks+0.081
irmed+0.074
 drifted+0.073
irt+0.069
|+0.068
Neg logit contributions
 oh-0.063
ache-0.062
 lawyer-0.058
cher-0.058
 why-0.057
Token\n
Token ID198
Feature activation+2.27e-02

Pos logit contributions
acks+0.025
 drifted+0.022
irmed+0.022
|+0.022
acked+0.021
Neg logit contributions
 oh-0.019
ache-0.018
 why-0.018
 lawyer-0.017
 sir-0.017
Token\n
Token ID198
Feature activation+1.71e-02

Pos logit contributions
acks+0.034
 drifted+0.028
irmed+0.028
|+0.026
irt+0.025
Neg logit contributions
ache-0.046
 oh-0.046
cher-0.043
 lawyer-0.043
 why-0.043
Token"
Token ID1
Feature activation-5.94e-02

Pos logit contributions
acks+0.031
irmed+0.026
 drifted+0.026
acked+0.025
irt+0.025
Neg logit contributions
ache-0.030
chard-0.028
 lawyer-0.028
 oh-0.028
 why-0.028
Token smiled
Token ID13541
Feature activation+4.07e-02

Pos logit contributions
acks+0.044
irmed+0.037
irt+0.037
|+0.037
acked+0.036
Neg logit contributions
ache-0.034
 oh-0.034
 why-0.032
cher-0.031
 lawyer-0.031
Token and
Token ID290
Feature activation+3.98e-02

Pos logit contributions
acks+0.063
 drifted+0.054
irmed+0.053
irt+0.049
acked+0.048
Neg logit contributions
ache-0.078
 oh-0.078
cher-0.076
 lawyer-0.075
chard-0.075
Token shouted
Token ID17293
Feature activation+2.41e-02

Pos logit contributions
acks+0.076
|+0.064
irmed+0.063
 drifted+0.062
irt+0.062
Neg logit contributions
 oh-0.066
ache-0.063
cher-0.060
 why-0.060
chard-0.059
Token,
Token ID11
Feature activation-5.97e-02

Pos logit contributions
acks+0.047
 drifted+0.041
irmed+0.041
|+0.039
irt+0.038
Neg logit contributions
 oh-0.037
ache-0.037
 why-0.035
 lawyer-0.035
cher-0.035
Token "
Token ID366
Feature activation-3.86e-02

Pos logit contributions
ache+0.082
 lawyer+0.079
cher+0.077
 salon+0.076
 oh+0.076
Neg logit contributions
acks-0.118
 drifted-0.117
 They-0.107
irmed-0.105
 drift-0.103
TokenWhe
Token ID10842
Feature activation+1.06e-01

Pos logit contributions
 oh+0.064
ache+0.060
cher+0.057
can+0.056
 lawyer+0.054
Neg logit contributions
acks-0.072
irmed-0.062
irt-0.061
 drifted-0.061
 minds-0.058
Tokene
Token ID68
Feature activation+4.08e-02

Pos logit contributions
acks+0.163
 drifted+0.145
irmed+0.141
 drift+0.131
 They+0.128
Neg logit contributions
 oh-0.199
 why-0.189
cher-0.180
can-0.180
 salon-0.178
Token!"
Token ID2474
Feature activation+2.03e-02

Pos logit contributions
acks+0.082
irmed+0.077
 drifted+0.075
|+0.073
irt+0.072
Neg logit contributions
ache-0.057
 oh-0.057
cher-0.051
 lawyer-0.051
 why-0.050
Token\n
Token ID198
Feature activation+2.99e-02

Pos logit contributions
acks+0.031
 drifted+0.027
irmed+0.027
|+0.025
irt+0.025
Neg logit contributions
 oh-0.039
ache-0.038
 why-0.037
 lawyer-0.037
 sir-0.036
Token\n
Token ID198
Feature activation+2.37e-02

Pos logit contributions
acks+0.049
 drifted+0.043
irmed+0.043
|+0.039
irt+0.039
Neg logit contributions
 oh-0.054
ache-0.053
 why-0.052
 lawyer-0.051
cher-0.051
TokenL
Token ID43
Feature activation+3.42e-02

Pos logit contributions
acks+0.042
 drifted+0.036
irmed+0.035
|+0.034
acked+0.033
Neg logit contributions
ache-0.041
 oh-0.041
chard-0.039
cher-0.039
 lawyer-0.039
Token until
Token ID1566
Feature activation+9.05e-03

Pos logit contributions
 oh+0.019
ache+0.019
 lawyer+0.018
 why+0.018
cher+0.018
Neg logit contributions
acks-0.018
 drifted-0.015
irmed-0.015
|-0.014
irt-0.014
Token next
Token ID1306
Feature activation-9.10e-03

Pos logit contributions
acks+0.021
irmed+0.019
irt+0.018
|+0.018
acked+0.018
Neg logit contributions
 oh-0.011
chard-0.010
 why-0.010
ache-0.010
 lawyer-0.010
Token time
Token ID640
Feature activation+3.63e-02

Pos logit contributions
ache+0.017
 oh+0.017
 why+0.017
 lawyer+0.017
 sir+0.016
Neg logit contributions
acks-0.014
irmed-0.012
 drifted-0.012
|-0.012
irt-0.011
Token.
Token ID13
Feature activation+9.55e-03

Pos logit contributions
acks+0.075
 drifted+0.066
irmed+0.066
|+0.063
irt+0.062
Neg logit contributions
 oh-0.051
ache-0.051
 why-0.048
cher-0.048
 lawyer-0.048
Token<|endoftext|>
Token ID50256
Feature activation+6.20e-02

Pos logit contributions
acks+0.017
 drifted+0.015
irmed+0.015
irt+0.014
irted+0.014
Neg logit contributions
 oh-0.016
 why-0.015
 sir-0.015
 lawyer-0.015
chard-0.015
TokenCl
Token ID2601
Feature activation+1.06e-01

Pos logit contributions
acks+0.096
irmed+0.085
 drifted+0.083
irt+0.081
irted+0.078
Neg logit contributions
 oh-0.112
ache-0.110
cher-0.110
 lawyer-0.109
 why-0.108
Tokenaire
Token ID7626
Feature activation+3.17e-02

Pos logit contributions
acks+0.170
irmed+0.160
 drifted+0.150
 harmony+0.147
 tails+0.147
Neg logit contributions
can-0.167
 sir-0.167
keeper-0.164
ache-0.163
lock-0.162
Token was
Token ID373
Feature activation-6.51e-03

Pos logit contributions
acks+0.065
irmed+0.057
|+0.055
irt+0.055
 drifted+0.054
Neg logit contributions
ache-0.044
 why-0.044
 oh-0.042
 lawyer-0.041
cher-0.040
Token playing
Token ID2712
Feature activation-1.86e-02

Pos logit contributions
 oh+0.009
ache+0.009
 why+0.009
 lawyer+0.009
chard+0.009
Neg logit contributions
acks-0.013
 drifted-0.012
irmed-0.012
|-0.011
irt-0.011
Token in
Token ID287
Feature activation+4.30e-02

Pos logit contributions
 oh+0.023
ache+0.023
 why+0.021
 lawyer+0.021
cher+0.021
Neg logit contributions
acks-0.042
 drifted-0.037
irmed-0.037
|-0.036
irt-0.035
Token the
Token ID262
Feature activation+3.12e-02

Pos logit contributions
acks+0.087
irmed+0.080
 drifted+0.077
irted+0.074
irt+0.073
Neg logit contributions
 oh-0.066
 why-0.062
 lawyer-0.060
chard-0.058
 salon-0.057
Token jelly
Token ID27644
Feature activation+1.46e-02

Pos logit contributions
acks+0.072
irmed+0.063
 drifted+0.063
|+0.061
irt+0.059
Neg logit contributions
 oh-0.063
ache-0.062
 why-0.059
 lawyer-0.059
chard-0.058
Token.
Token ID13
Feature activation+6.07e-03

Pos logit contributions
acks+0.027
 drifted+0.023
irmed+0.023
|+0.021
acked+0.021
Neg logit contributions
 oh-0.024
ache-0.024
 lawyer-0.023
 why-0.023
chard-0.023
Token It
Token ID632
Feature activation+1.57e-02

Pos logit contributions
acks+0.010
 drifted+0.009
irmed+0.008
irt+0.008
|+0.008
Neg logit contributions
 oh-0.011
 why-0.011
ache-0.011
 lawyer-0.011
 sir-0.010
Token was
Token ID373
Feature activation+7.56e-03

Pos logit contributions
acks+0.034
irmed+0.029
irt+0.029
 drifted+0.028
|+0.028
Neg logit contributions
 oh-0.021
ache-0.021
 why-0.021
cher-0.019
 lawyer-0.019
Token all
Token ID477
Feature activation+8.51e-02

Pos logit contributions
acks+0.012
 drifted+0.011
irmed+0.010
irt+0.009
|+0.009
Neg logit contributions
ache-0.014
cher-0.014
 oh-0.013
 lawyer-0.013
 salon-0.013
Token squ
Token ID2809
Feature activation+1.06e-01

Pos logit contributions
acks+0.172
irmed+0.151
 drifted+0.151
irt+0.143
text+0.141
Neg logit contributions
 oh-0.125
ache-0.123
 why-0.120
chard-0.118
cher-0.116
Tokenashed
Token ID5263
Feature activation+5.91e-02

Pos logit contributions
 drifted+0.149
|+0.143
acks+0.129
 drift+0.128
Ar+0.127
Neg logit contributions
ache-0.188
 oh-0.182
 why-0.178
 stealing-0.160
chard-0.159
Token and
Token ID290
Feature activation+4.80e-02

Pos logit contributions
acks+0.126
 drifted+0.118
irmed+0.112
|+0.110
irt+0.107
Neg logit contributions
 oh-0.078
 why-0.074
ache-0.067
chard-0.066
cher-0.065
Token dull
Token ID19222
Feature activation+5.48e-02

Pos logit contributions
acks+0.094
irmed+0.082
irt+0.081
 drifted+0.080
|+0.079
Neg logit contributions
 oh-0.075
ache-0.071
 why-0.070
chard-0.068
cher-0.067
Token now
Token ID783
Feature activation+2.87e-02

Pos logit contributions
acks+0.102
 drifted+0.089
irmed+0.088
|+0.085
irt+0.084
Neg logit contributions
 oh-0.088
ache-0.088
 why-0.084
 lawyer-0.082
cher-0.082
Token.
Token ID13
Feature activation+1.15e-02

Pos logit contributions
acks+0.061
irmed+0.053
|+0.052
irt+0.052
 drifted+0.052
Neg logit contributions
 oh-0.040
ache-0.040
 why-0.039
 lawyer-0.037
cher-0.036
Token y
Token ID331
Feature activation+3.19e-02

Pos logit contributions
cher+0.032
 why+0.030
ache+0.029
but+0.029
ress+0.028
Neg logit contributions
 night-0.016
 evening-0.016
 afternoon-0.015
acks-0.015
 drifted-0.015
Tokenummy
Token ID13513
Feature activation+1.56e-02

Pos logit contributions
acks+0.061
 drifted+0.059
irmed+0.057
|+0.055
 minds+0.052
Neg logit contributions
ache-0.045
 oh-0.043
 lawyer-0.042
 sir-0.040
 why-0.040
Token cookie
Token ID19751
Feature activation+4.34e-03

Pos logit contributions
acks+0.032
 drifted+0.028
irmed+0.027
|+0.026
irt+0.026
Neg logit contributions
ache-0.022
 oh-0.022
cher-0.021
chard-0.021
 why-0.021
Token,
Token ID11
Feature activation-4.81e-02

Pos logit contributions
acks+0.008
 drifted+0.007
acked+0.006
irmed+0.006
irt+0.006
Neg logit contributions
ache-0.007
 why-0.007
 oh-0.007
chard-0.007
cher-0.007
Token all
Token ID477
Feature activation+7.65e-02

Pos logit contributions
ache+0.078
cher+0.074
 principal+0.070
 lawyer+0.070
 sir+0.070
Neg logit contributions
acks-0.087
 drifted-0.082
irmed-0.075
 They-0.074
irt-0.073
Token soft
Token ID2705
Feature activation+1.06e-01

Pos logit contributions
acks+0.148
irmed+0.131
 drifted+0.128
irt+0.121
irm+0.119
Neg logit contributions
 oh-0.122
 why-0.122
ache-0.115
chard-0.114
 lawyer-0.111
Token and
Token ID290
Feature activation+2.92e-02

Pos logit contributions
acks+0.215
 drifted+0.194
irmed+0.194
|+0.191
irt+0.189
Neg logit contributions
 oh-0.149
ache-0.147
 why-0.136
 lawyer-0.133
chard-0.129
Token sweet
Token ID6029
Feature activation+6.40e-02

Pos logit contributions
acks+0.043
 drifted+0.041
irmed+0.037
irted+0.035
 evenings+0.035
Neg logit contributions
ache-0.054
cher-0.052
 sir-0.051
 lawyer-0.051
 principal-0.051
Token.
Token ID13
Feature activation+8.05e-03

Pos logit contributions
acks+0.125
|+0.113
irmed+0.112
 drifted+0.111
irt+0.108
Neg logit contributions
 oh-0.097
ache-0.096
 why-0.091
 lawyer-0.088
 sir-0.080
Token Lily
Token ID20037
Feature activation+8.81e-03

Pos logit contributions
acks+0.014
 drifted+0.012
irmed+0.012
|+0.011
irt+0.011
Neg logit contributions
 oh-0.014
ache-0.014
 why-0.014
 lawyer-0.013
chard-0.013
Token was
Token ID373
Feature activation-8.92e-03

Pos logit contributions
acks+0.019
irmed+0.016
irt+0.016
|+0.016
 drifted+0.016
Neg logit contributions
 oh-0.012
ache-0.012
 why-0.012
 lawyer-0.011
chard-0.011
Token yellow
Token ID7872
Feature activation+6.32e-02

Pos logit contributions
acks+0.007
 drifted+0.006
irmed+0.006
|+0.006
irt+0.006
Neg logit contributions
ache-0.006
cher-0.006
 lawyer-0.006
 oh-0.006
 principal-0.006
Token with
Token ID351
Feature activation+4.65e-02

Pos logit contributions
acks+0.113
 drifted+0.097
irmed+0.096
|+0.091
irt+0.090
Neg logit contributions
ache-0.108
 oh-0.108
cher-0.104
 lawyer-0.102
 why-0.102
Token lots
Token ID6041
Feature activation+5.30e-02

Pos logit contributions
acks+0.091
irmed+0.078
|+0.076
 drifted+0.075
irt+0.073
Neg logit contributions
 oh-0.074
ache-0.069
 why-0.069
 lawyer-0.066
chard-0.064
Token of
Token ID286
Feature activation+3.50e-02

Pos logit contributions
acks+0.115
irmed+0.105
 drifted+0.104
|+0.103
irt+0.099
Neg logit contributions
ache-0.067
 oh-0.063
 why-0.062
 lawyer-0.060
cher-0.056
Token pol
Token ID755
Feature activation+5.69e-02

Pos logit contributions
acks+0.063
irmed+0.053
|+0.053
 drifted+0.052
irm+0.050
Neg logit contributions
 oh-0.059
ache-0.056
 why-0.055
 lawyer-0.053
 paramed-0.050
Tokenka
Token ID4914
Feature activation+1.05e-01

Pos logit contributions
acks+0.088
 drifted+0.083
|+0.079
 drift+0.071
irmed+0.071
Neg logit contributions
ache-0.101
 oh-0.098
 why-0.093
cher-0.091
card-0.090
Token dots
Token ID22969
Feature activation+5.59e-02

Pos logit contributions
acks+0.174
|+0.164
irmed+0.161
 drifted+0.152
irm+0.151
Neg logit contributions
ache-0.189
 oh-0.167
 lawyer-0.162
can-0.156
cher-0.155
Token.
Token ID13
Feature activation+1.38e-02

Pos logit contributions
acks+0.106
 drifted+0.093
irmed+0.092
|+0.088
irt+0.087
Neg logit contributions
 oh-0.087
ache-0.087
 why-0.082
 lawyer-0.082
cher-0.082
Token\n
Token ID198
Feature activation+1.89e-02

Pos logit contributions
acks+0.023
 drifted+0.020
irmed+0.019
|+0.018
irt+0.018
Neg logit contributions
 oh-0.025
 why-0.024
ache-0.024
chard-0.023
 lawyer-0.023
Token\n
Token ID198
Feature activation+9.83e-03

Pos logit contributions
acks+0.029
 drifted+0.025
irmed+0.025
|+0.023
irt+0.023
Neg logit contributions
 oh-0.036
ache-0.036
 why-0.034
 lawyer-0.034
cher-0.034
TokenL
Token ID43
Feature activation+2.92e-02

Pos logit contributions
acks+0.020
 drifted+0.018
irmed+0.017
|+0.016
irt+0.016
Neg logit contributions
 oh-0.015
ache-0.014
 why-0.014
 lawyer-0.013
chard-0.013
Token so
Token ID523
Feature activation+1.10e-02

Pos logit contributions
ache+0.042
 oh+0.039
 lawyer+0.038
cher+0.038
chard+0.038
Neg logit contributions
acks-0.046
 drifted-0.043
irmed-0.040
|-0.039
text-0.038
Token they
Token ID484
Feature activation+2.48e-02

Pos logit contributions
acks+0.022
 drifted+0.020
irmed+0.019
|+0.019
irt+0.018
Neg logit contributions
 oh-0.016
ache-0.016
 why-0.015
cher-0.015
 lawyer-0.015
Token did
Token ID750
Feature activation-1.36e-02

Pos logit contributions
acks+0.046
irmed+0.037
irt+0.037
irted+0.035
irts+0.035
Neg logit contributions
 why-0.042
ache-0.041
chard-0.041
cher-0.040
 oh-0.039
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
 why+0.018
 oh+0.017
ache+0.017
 lawyer+0.015
chard+0.015
Neg logit contributions
acks-0.029
irmed-0.028
irted-0.026
 drifted-0.026
|-0.026
Token<|endoftext|>
Token ID50256
Feature activation+6.08e-02

Pos logit contributions
acks+0.018
irmed+0.018
irted+0.017
 drifted+0.017
irt+0.017
Neg logit contributions
 why-0.015
 sir-0.014
 lawyer-0.014
 doctor-0.013
keeper-0.013
TokenCl
Token ID2601
Feature activation+1.05e-01

Pos logit contributions
acks+0.100
irmed+0.090
 drifted+0.089
irt+0.086
irted+0.083
Neg logit contributions
 oh-0.101
cher-0.100
 lawyer-0.100
ache-0.099
 why-0.099
Tokenaire
Token ID7626
Feature activation+3.24e-02

Pos logit contributions
acks+0.150
irmed+0.143
 drifted+0.135
 breaths+0.130
irm+0.127
Neg logit contributions
can-0.183
 sir-0.181
ache-0.178
keeper-0.176
 why-0.174
Token was
Token ID373
Feature activation-5.53e-03

Pos logit contributions
acks+0.068
irmed+0.061
|+0.059
irt+0.058
irm+0.057
Neg logit contributions
ache-0.043
 why-0.042
 lawyer-0.040
 oh-0.040
cher-0.038
Token feeling
Token ID4203
Feature activation+5.42e-02

Pos logit contributions
 oh+0.008
ache+0.008
 why+0.007
chard+0.007
 lawyer+0.007
Neg logit contributions
acks-0.012
 drifted-0.010
irmed-0.010
|-0.010
irt-0.009
Token very
Token ID845
Feature activation+2.71e-02

Pos logit contributions
acks+0.118
 drifted+0.103
irmed+0.103
|+0.101
acked+0.099
Neg logit contributions
 oh-0.073
ache-0.071
 why-0.069
chard-0.068
 lawyer-0.066
Token proud
Token ID6613
Feature activation+3.34e-02

Pos logit contributions
acks+0.057
|+0.050
irmed+0.049
 drifted+0.048
text+0.048
Neg logit contributions
 oh-0.038
 why-0.036
chard-0.036
ache-0.035
 lawyer-0.034
Token slides
Token ID19392
Feature activation+4.11e-02

Pos logit contributions
acks+0.032
|+0.028
irmed+0.027
 drifted+0.027
acked+0.027
Neg logit contributions
 why-0.030
 salon-0.030
 oh-0.028
ache-0.028
 principal-0.027
Token down
Token ID866
Feature activation+3.68e-02

Pos logit contributions
acks+0.079
 drifted+0.070
irmed+0.069
|+0.066
irt+0.065
Neg logit contributions
 oh-0.063
ache-0.063
 why-0.060
 lawyer-0.059
cher-0.059
Token fast
Token ID3049
Feature activation+7.83e-02

Pos logit contributions
acks+0.068
 drifted+0.060
irmed+0.059
|+0.056
irt+0.056
Neg logit contributions
 oh-0.059
ache-0.059
 why-0.056
 lawyer-0.056
chard-0.056
Token.
Token ID13
Feature activation+1.45e-02

Pos logit contributions
acks+0.145
 drifted+0.129
irmed+0.128
|+0.124
irt+0.121
Neg logit contributions
 oh-0.125
ache-0.123
 why-0.119
 lawyer-0.118
 salon-0.112
Token "
Token ID366
Feature activation-3.37e-02

Pos logit contributions
acks+0.020
 drifted+0.016
irmed+0.016
|+0.015
 They+0.014
Neg logit contributions
ache-0.031
 oh-0.031
cher-0.030
chard-0.029
 why-0.029
TokenWhe
Token ID10842
Feature activation+1.05e-01

Pos logit contributions
 oh+0.055
ache+0.051
 lawyer+0.051
cher+0.048
can+0.048
Neg logit contributions
acks-0.062
 drifted-0.055
irmed-0.055
irt-0.051
irted-0.051
Tokene
Token ID68
Feature activation+4.26e-02

Pos logit contributions
acks+0.156
 drifted+0.146
irmed+0.140
 drift+0.131
irm+0.125
Neg logit contributions
 oh-0.198
 why-0.190
can-0.176
 salon-0.175
 lawyer-0.174
Token!"
Token ID2474
Feature activation+1.44e-02

Pos logit contributions
acks+0.084
irmed+0.078
 drifted+0.078
irt+0.073
|+0.072
Neg logit contributions
 oh-0.061
ache-0.061
 lawyer-0.056
cher-0.055
 why-0.055
Token she
Token ID673
Feature activation+1.07e-02

Pos logit contributions
acks+0.029
irmed+0.026
 drifted+0.026
|+0.025
acked+0.025
Neg logit contributions
 oh-0.021
 why-0.020
ache-0.020
 lawyer-0.020
 sir-0.019
Token shouts
Token ID34757
Feature activation+1.59e-02

Pos logit contributions
acks+0.017
 drifted+0.015
irmed+0.015
|+0.014
acked+0.014
Neg logit contributions
 oh-0.019
 why-0.019
ache-0.018
 salon-0.018
 sir-0.018
Token.
Token ID13
Feature activation+3.22e-03

Pos logit contributions
acks+0.031
 drifted+0.027
|+0.027
irmed+0.027
acked+0.025
Neg logit contributions
 oh-0.025
ache-0.024
 why-0.024
 lawyer-0.023
 sir-0.022
Token and
Token ID290
Feature activation+3.31e-02

Pos logit contributions
 oh+0.018
ache+0.018
cher+0.017
 why+0.017
 lawyer+0.017
Neg logit contributions
acks-0.019
 drifted-0.016
irmed-0.016
|-0.015
irt-0.015
Token slides
Token ID19392
Feature activation+5.03e-02

Pos logit contributions
acks+0.057
 drifted+0.049
irmed+0.049
|+0.048
acked+0.047
Neg logit contributions
 oh-0.056
 why-0.055
ache-0.054
chard-0.053
 lawyer-0.053
Token down
Token ID866
Feature activation+3.86e-02

Pos logit contributions
acks+0.096
 drifted+0.084
irmed+0.083
|+0.080
irt+0.078
Neg logit contributions
 oh-0.079
ache-0.078
 why-0.074
 lawyer-0.074
cher-0.074
Token.
Token ID13
Feature activation+8.41e-03

Pos logit contributions
acks+0.068
 drifted+0.059
irmed+0.059
irt+0.056
|+0.056
Neg logit contributions
 oh-0.065
ache-0.064
 lawyer-0.062
 why-0.062
chard-0.062
Token "
Token ID366
Feature activation-3.46e-02

Pos logit contributions
acks+0.015
 drifted+0.013
irmed+0.013
|+0.012
irt+0.012
Neg logit contributions
 oh-0.014
ache-0.014
 why-0.014
 lawyer-0.014
chard-0.013
TokenWhe
Token ID10842
Feature activation+1.05e-01

Pos logit contributions
 oh+0.052
ache+0.049
 lawyer+0.048
chard+0.047
cher+0.047
Neg logit contributions
acks-0.069
 drifted-0.061
irmed-0.060
irt-0.057
|-0.056
Tokene
Token ID68
Feature activation+4.23e-02

Pos logit contributions
acks+0.158
 drifted+0.145
irmed+0.139
 drift+0.130
irm+0.124
Neg logit contributions
 oh-0.198
 why-0.190
can-0.178
 salon-0.177
 lawyer-0.175
Token!"
Token ID2474
Feature activation+1.08e-02

Pos logit contributions
acks+0.082
irmed+0.076
 drifted+0.076
irt+0.071
|+0.070
Neg logit contributions
 oh-0.062
ache-0.061
 lawyer-0.057
cher-0.056
 why-0.055
Token\n
Token ID198
Feature activation+2.32e-02

Pos logit contributions
acks+0.024
|+0.023
acked+0.022
irmed+0.022
 minds+0.021
Neg logit contributions
 oh-0.014
 why-0.013
 sir-0.012
 please-0.012
 doctor-0.012
Token\n
Token ID198
Feature activation+1.72e-02

Pos logit contributions
acks+0.041
irmed+0.037
 drifted+0.036
 minds+0.034
irted+0.034
Neg logit contributions
 oh-0.038
 why-0.037
ache-0.037
 lawyer-0.036
 sir-0.036
Token"
Token ID1
Feature activation-6.10e-02

Pos logit contributions
acks+0.038
 drifted+0.034
irmed+0.034
|+0.032
irt+0.031
Neg logit contributions
 oh-0.023
ache-0.022
 lawyer-0.021
 why-0.021
cher-0.020
Token caught
Token ID4978
Feature activation-9.32e-03

Pos logit contributions
acks+0.013
 drifted+0.011
irmed+0.011
|+0.010
irt+0.010
Neg logit contributions
 oh-0.011
ache-0.011
 why-0.010
cher-0.010
 lawyer-0.010
Token them
Token ID606
Feature activation+2.96e-02

Pos logit contributions
 oh+0.015
ache+0.015
 why+0.015
chard+0.014
 lawyer+0.014
Neg logit contributions
acks-0.017
 drifted-0.015
irmed-0.015
|-0.014
irt-0.014
Token and
Token ID290
Feature activation+2.70e-02

Pos logit contributions
acks+0.056
 drifted+0.049
irmed+0.048
|+0.046
irt+0.045
Neg logit contributions
ache-0.047
 oh-0.047
 why-0.045
 lawyer-0.044
cher-0.044
Token ate
Token ID15063
Feature activation+1.73e-02

Pos logit contributions
acks+0.044
 drifted+0.040
irmed+0.038
|+0.035
irted+0.035
Neg logit contributions
ache-0.049
 oh-0.048
 lawyer-0.047
cher-0.046
chard-0.046
Token them
Token ID606
Feature activation+3.50e-02

Pos logit contributions
acks+0.029
 drifted+0.024
irmed+0.024
|+0.023
acked+0.022
Neg logit contributions
 oh-0.033
 why-0.032
ache-0.030
chard-0.030
 lawyer-0.029
Token both
Token ID1111
Feature activation+1.05e-01

Pos logit contributions
acks+0.071
 drifted+0.064
irmed+0.063
acked+0.061
reci+0.060
Neg logit contributions
 oh-0.050
ache-0.048
 why-0.048
chard-0.046
cher-0.044
Token.
Token ID13
Feature activation+2.26e-02

Pos logit contributions
acks+0.232
 drifted+0.213
|+0.208
reci+0.207
Ar+0.205
Neg logit contributions
 why-0.114
 oh-0.107
ache-0.104
chard-0.103
 stealing-0.099
Token The
Token ID383
Feature activation+1.47e-02

Pos logit contributions
acks+0.044
 drifted+0.040
irmed+0.039
irt+0.038
 minds+0.037
Neg logit contributions
 oh-0.032
 why-0.031
 lawyer-0.029
 doctor-0.029
 clerk-0.029
Token end
Token ID886
Feature activation-6.03e-03

Pos logit contributions
acks+0.027
 drifted+0.025
irmed+0.024
|+0.024
irt+0.023
Neg logit contributions
 oh-0.022
cher-0.022
ache-0.022
chard-0.021
 why-0.021
Token.
Token ID13
Feature activation+2.45e-02

Pos logit contributions
 oh+0.012
ache+0.012
chard+0.011
cher+0.011
 lawyer+0.011
Neg logit contributions
acks-0.009
 drifted-0.008
irmed-0.008
|-0.007
irt-0.007
Token<|endoftext|>
Token ID50256
Feature activation+6.89e-02

Pos logit contributions
acks+0.049
irmed+0.046
 minds+0.044
 drifted+0.044
irt+0.042
Neg logit contributions
 why-0.033
 oh-0.032
 lawyer-0.030
 doctor-0.030
 dentist-0.030
Tokenâ
Token ID22940
Feature activation+1.11e-03

Pos logit contributions
 oh+0.057
ache+0.056
chard+0.053
 why+0.052
 lawyer+0.052
Neg logit contributions
acks-0.105
 drifted-0.094
irmed-0.092
|-0.090
irt-0.088
Token
Token ID26391
Feature activation-1.32e-02

Pos logit contributions
acks+0.002
 drifted+0.002
irmed+0.002
irt+0.002
irted+0.002
Neg logit contributions
ache-0.002
 oh-0.002
 why-0.002
 lawyer-0.002
 sir-0.002
Token
Token ID8151
Feature activation-4.72e-02

Pos logit contributions
 oh+0.027
ache+0.026
 why+0.025
chard+0.025
cher+0.025
Neg logit contributions
acks-0.020
 drifted-0.016
|-0.015
irmed-0.015
acked-0.014
Tokent
Token ID83
Feature activation-2.79e-02

Pos logit contributions
 oh+0.081
ache+0.077
 lawyer+0.075
 why+0.075
chard+0.074
Neg logit contributions
acks-0.087
|-0.073
 drifted-0.072
irmed-0.072
acked-0.070
Token worry
Token ID5490
Feature activation-4.72e-02

Pos logit contributions
ache+0.047
 oh+0.046
 why+0.045
 lawyer+0.044
cher+0.043
Neg logit contributions
acks-0.051
irmed-0.044
 drifted-0.044
acked-0.041
irt-0.041
Token,
Token ID11
Feature activation-1.09e-01

Pos logit contributions
chard+0.075
ache+0.075
 oh+0.073
cher+0.073
 lawyer+0.072
Neg logit contributions
acks-0.087
 drifted-0.074
irmed-0.072
irt-0.069
|-0.068
Token Sam
Token ID3409
Feature activation-2.76e-02

Pos logit contributions
ache+0.187
chard+0.178
itor+0.169
cher+0.169
 lawyer+0.168
Neg logit contributions
acks-0.174
 drifted-0.164
 They-0.149
|-0.148
text-0.147
Token.
Token ID13
Feature activation-2.91e-02

Pos logit contributions
ache+0.043
 oh+0.043
 why+0.041
cher+0.040
 lawyer+0.040
Neg logit contributions
acks-0.053
 drifted-0.046
irmed-0.046
|-0.044
irt-0.043
Token I
Token ID314
Feature activation-1.84e-03

Pos logit contributions
ache+0.057
 oh+0.054
chard+0.052
 lawyer+0.052
cher+0.052
Neg logit contributions
acks-0.045
irmed-0.038
 drifted-0.037
|-0.036
 They-0.036
Token can
Token ID460
Feature activation-7.56e-02

Pos logit contributions
ache+0.003
 oh+0.003
 why+0.003
cher+0.003
 lawyer+0.003
Neg logit contributions
acks-0.004
 drifted-0.003
irmed-0.003
|-0.003
irt-0.003
Token help
Token ID1037
Feature activation-6.71e-02

Pos logit contributions
 oh+0.113
ache+0.113
 why+0.106
cher+0.106
 lawyer+0.106
Neg logit contributions
acks-0.150
 drifted-0.132
irmed-0.130
|-0.125
irt-0.123
Token.
Token ID13
Feature activation-9.55e-03

Pos logit contributions
acks+0.019
irmed+0.017
 drifted+0.017
|+0.016
irt+0.016
Neg logit contributions
ache-0.016
 oh-0.016
 why-0.015
cher-0.015
 lawyer-0.015
Token "
Token ID366
Feature activation-4.07e-02

Pos logit contributions
 oh+0.019
ache+0.019
 why+0.018
 lawyer+0.018
cher+0.018
Neg logit contributions
acks-0.014
 drifted-0.012
irmed-0.011
|-0.011
irt-0.011
TokenDon
Token ID3987
Feature activation-4.27e-02

Pos logit contributions
 oh+0.057
ache+0.055
 lawyer+0.053
chard+0.052
 why+0.052
Neg logit contributions
acks-0.086
 drifted-0.076
irmed-0.075
irt-0.071
|-0.070
Token't
Token ID470
Feature activation-3.57e-02

Pos logit contributions
 oh+0.061
ache+0.061
chard+0.060
cher+0.056
 why+0.055
Neg logit contributions
acks-0.087
 drifted-0.077
|-0.076
text-0.074
irmed-0.072
Token worry
Token ID5490
Feature activation-4.65e-02

Pos logit contributions
 oh+0.043
ache+0.043
 why+0.041
 lawyer+0.040
cher+0.039
Neg logit contributions
acks-0.082
 drifted-0.073
irmed-0.073
acked-0.069
|-0.069
Token,
Token ID11
Feature activation-1.09e-01

Pos logit contributions
cher+0.076
chard+0.076
 oh+0.075
ache+0.074
 lawyer+0.071
Neg logit contributions
acks-0.083
 drifted-0.071
irmed-0.069
text-0.067
|-0.067
Token Tim
Token ID5045
Feature activation-1.15e-02

Pos logit contributions
ache+0.197
cher+0.189
card+0.184
chard+0.183
 salon+0.182
Neg logit contributions
acks-0.150
 drifted-0.147
 They-0.137
 drift-0.133
 night-0.130
Token,"
Token ID553
Feature activation-3.20e-03

Pos logit contributions
chard+0.022
 oh+0.021
ache+0.021
cher+0.021
 lawyer+0.020
Neg logit contributions
acks-0.018
 drifted-0.015
irmed-0.015
irt-0.014
acked-0.014
Token she
Token ID673
Feature activation-1.10e-02

Pos logit contributions
ache+0.005
 oh+0.005
cher+0.005
 lawyer+0.005
chard+0.005
Neg logit contributions
acks-0.006
 drifted-0.005
irmed-0.005
|-0.005
irt-0.005
Token said
Token ID531
Feature activation-3.53e-02

Pos logit contributions
 oh+0.018
ache+0.018
 why+0.017
cher+0.017
 lawyer+0.017
Neg logit contributions
acks-0.020
 drifted-0.017
irmed-0.017
|-0.016
irt-0.016
Token.
Token ID13
Feature activation-1.88e-02

Pos logit contributions
ache+0.073
 lawyer+0.072
 oh+0.072
 principal+0.072
 salon+0.071
Neg logit contributions
acks-0.046
 drifted-0.040
irmed-0.038
irt-0.034
irm-0.033
Tokenâ
Token ID22940
Feature activation+3.89e-02

Pos logit contributions
acks+0.020
 drifted+0.017
irmed+0.017
|+0.016
irt+0.016
Neg logit contributions
 oh-0.019
 why-0.019
ache-0.019
cher-0.018
 lawyer-0.018
Token
Token ID26391
Feature activation+2.35e-02

Pos logit contributions
acks+0.070
 drifted+0.067
irmed+0.065
irted+0.062
irt+0.060
Neg logit contributions
 oh-0.058
ache-0.057
 sir-0.056
 why-0.055
cher-0.055
Token
Token ID8151
Feature activation-1.38e-02

Pos logit contributions
acks+0.042
 drifted+0.037
irmed+0.036
irt+0.034
|+0.034
Neg logit contributions
 oh-0.039
ache-0.039
 lawyer-0.037
cher-0.037
 why-0.037
Tokens
Token ID82
Feature activation-2.26e-02

Pos logit contributions
ache+0.025
 oh+0.025
 why+0.024
cher+0.024
 lawyer+0.024
Neg logit contributions
acks-0.022
 drifted-0.019
irmed-0.019
|-0.018
irt-0.018
Token a
Token ID257
Feature activation-4.36e-02

Pos logit contributions
 oh+0.035
ache+0.035
cher+0.033
 lawyer+0.033
 why+0.033
Neg logit contributions
acks-0.043
 drifted-0.038
irmed-0.037
|-0.036
irt-0.035
Token surprise
Token ID5975
Feature activation-1.07e-01

Pos logit contributions
ache+0.078
cher+0.077
chard+0.077
 oh+0.076
 why+0.075
Neg logit contributions
acks-0.067
 drifted-0.059
irmed-0.056
irt-0.054
|-0.053
Token!
Token ID0
Feature activation-2.09e-02

Pos logit contributions
ache+0.190
chard+0.190
 oh+0.180
 lawyer+0.178
cher+0.174
Neg logit contributions
acks-0.175
 drifted-0.153
irmed-0.139
irt-0.136
text-0.136
Tokenâ
Token ID22940
Feature activation+3.34e-02

Pos logit contributions
ache+0.037
 oh+0.036
chard+0.035
cher+0.035
 lawyer+0.034
Neg logit contributions
acks-0.036
 drifted-0.030
irmed-0.030
|-0.029
irt-0.028
Token
Token ID26391
Feature activation+2.04e-02

Pos logit contributions
acks+0.063
 drifted+0.058
irmed+0.056
irted+0.054
irt+0.054
Neg logit contributions
 oh-0.050
ache-0.050
 why-0.048
cher-0.047
 sir-0.047
Token
Token ID220
Feature activation+3.03e-02

Pos logit contributions
acks+0.037
 drifted+0.032
irmed+0.032
irt+0.031
irted+0.030
Neg logit contributions
 oh-0.033
ache-0.033
 lawyer-0.031
cher-0.031
 why-0.031
Token\n
Token ID198
Feature activation+8.33e-04

Pos logit contributions
acks+0.046
 drifted+0.039
irmed+0.038
|+0.036
irt+0.035
Neg logit contributions
 oh-0.059
ache-0.059
 why-0.056
cher-0.056
 lawyer-0.056
Token:
Token ID25
Feature activation-6.41e-02

Pos logit contributions
acks+0.018
 drifted+0.016
irmed+0.016
|+0.015
irt+0.015
Neg logit contributions
 oh-0.010
ache-0.009
 why-0.009
 lawyer-0.009
cher-0.009
Token "
Token ID366
Feature activation-4.26e-02

Pos logit contributions
ache+0.116
cher+0.109
card+0.102
 salon+0.101
 oh+0.100
Neg logit contributions
acks-0.105
 They-0.095
irmed-0.089
 drifted-0.087
 Soon-0.086
TokenDon
Token ID3987
Feature activation-4.05e-02

Pos logit contributions
 oh+0.060
ache+0.057
 why+0.055
 lawyer+0.055
chard+0.055
Neg logit contributions
acks-0.090
 drifted-0.081
irmed-0.079
irt-0.074
text-0.074
Token't
Token ID470
Feature activation-3.19e-02

Pos logit contributions
ache+0.056
 oh+0.055
chard+0.054
cher+0.050
 why+0.050
Neg logit contributions
acks-0.085
 drifted-0.076
|-0.074
irmed-0.072
text-0.072
Token worry
Token ID5490
Feature activation-4.58e-02

Pos logit contributions
 oh+0.042
ache+0.042
 why+0.041
 lawyer+0.040
 sir+0.039
Neg logit contributions
acks-0.070
irmed-0.062
 drifted-0.062
acked-0.060
irted-0.059
Token,
Token ID11
Feature activation-1.06e-01

Pos logit contributions
cher+0.074
 oh+0.073
chard+0.073
ache+0.073
 lawyer+0.069
Neg logit contributions
acks-0.083
 drifted-0.071
irmed-0.069
irt-0.067
text-0.066
Token it
Token ID340
Feature activation-4.90e-03

Pos logit contributions
ache+0.193
cher+0.183
 salon+0.181
card+0.178
 principal+0.177
Neg logit contributions
acks-0.146
 drifted-0.140
 They-0.137
 drift-0.130
 night-0.126
Token will
Token ID481
Feature activation-2.03e-02

Pos logit contributions
ache+0.008
 why+0.008
 oh+0.008
cher+0.007
 lawyer+0.007
Neg logit contributions
acks-0.009
 drifted-0.008
irmed-0.008
|-0.008
irt-0.008
Token be
Token ID307
Feature activation+4.37e-04

Pos logit contributions
ache+0.030
 oh+0.030
 why+0.030
cher+0.028
 sir+0.027
Neg logit contributions
acks-0.041
irmed-0.037
 drifted-0.036
acked-0.035
irted-0.035
Token alright
Token ID23036
Feature activation-2.61e-03

Pos logit contributions
acks+0.001
 drifted+0.001
irmed+0.001
|+0.001
acked+0.001
Neg logit contributions
 oh-0.001
 why-0.001
ache-0.001
chard-0.001
cher-0.001
Token,
Token ID11
Feature activation-9.00e-02

Pos logit contributions
ache+0.005
cher+0.005
 principal+0.005
 salon+0.005
 oh+0.005
Neg logit contributions
acks-0.004
 drifted-0.003
irmed-0.003
 They-0.003
|-0.003
Token said
Token ID531
Feature activation-4.29e-02

Pos logit contributions
 oh+0.032
ache+0.032
 why+0.030
cher+0.030
 lawyer+0.030
Neg logit contributions
acks-0.039
 drifted-0.035
irmed-0.034
|-0.032
irt-0.032
Token "
Token ID366
Feature activation-5.66e-02

Pos logit contributions
ache+0.073
chard+0.070
 lawyer+0.069
 oh+0.068
cher+0.067
Neg logit contributions
acks-0.072
 drifted-0.066
irmed-0.061
 They-0.057
text-0.057
TokenDon
Token ID3987
Feature activation-4.93e-02

Pos logit contributions
 oh+0.085
ache+0.085
 why+0.079
 lawyer+0.079
cher+0.079
Neg logit contributions
acks-0.112
 drifted-0.099
irmed-0.098
|-0.093
irt-0.092
Token't
Token ID470
Feature activation-3.16e-02

Pos logit contributions
chard+0.076
 oh+0.075
ache+0.074
cher+0.069
 why+0.068
Neg logit contributions
acks-0.096
 drifted-0.083
|-0.082
text-0.080
irmed-0.077
Token worry
Token ID5490
Feature activation-4.58e-02

Pos logit contributions
 oh+0.038
ache+0.038
 why+0.036
 lawyer+0.035
 sir+0.034
Neg logit contributions
acks-0.074
irmed-0.067
 drifted-0.066
acked-0.064
irted-0.063
Token,
Token ID11
Feature activation-1.06e-01

Pos logit contributions
cher+0.077
chard+0.077
 oh+0.075
ache+0.075
 principal+0.072
Neg logit contributions
acks-0.080
 drifted-0.067
irmed-0.064
text-0.063
 They-0.062
Token I
Token ID314
Feature activation-1.10e-02

Pos logit contributions
ache+0.193
cher+0.183
 principal+0.177
chard+0.177
 salon+0.174
Neg logit contributions
acks-0.141
 drifted-0.141
 They-0.138
 drift-0.131
 night-0.126
Token won
Token ID1839
Feature activation-2.12e-02

Pos logit contributions
 oh+0.019
ache+0.018
chard+0.018
 lawyer+0.017
cher+0.017
Neg logit contributions
acks-0.020
 drifted-0.017
irmed-0.017
|-0.016
irt-0.016
Token't
Token ID470
Feature activation-1.34e-02

Pos logit contributions
ache+0.027
 oh+0.027
chard+0.025
 why+0.025
cher+0.025
Neg logit contributions
acks-0.047
 drifted-0.042
irmed-0.041
|-0.040
irt-0.039
Token hurt
Token ID5938
Feature activation-1.49e-02

Pos logit contributions
 oh+0.022
ache+0.021
chard+0.021
cher+0.021
 lawyer+0.020
Neg logit contributions
acks-0.025
 drifted-0.021
irmed-0.021
|-0.020
irt-0.020
Token you
Token ID345
Feature activation+1.07e-02

Pos logit contributions
ache+0.028
cher+0.027
 oh+0.027
chard+0.026
 lawyer+0.026
Neg logit contributions
acks-0.024
irmed-0.019
 drifted-0.019
|-0.019
irt-0.019
Token of
Token ID286
Feature activation-3.29e-03

Pos logit contributions
 oh+0.070
ache+0.070
cher+0.068
chard+0.067
 lawyer+0.064
Neg logit contributions
acks-0.075
 drifted-0.066
irmed-0.062
acked-0.060
 minds-0.060
Token her
Token ID607
Feature activation-1.96e-02

Pos logit contributions
ache+0.006
 oh+0.006
cher+0.006
 lawyer+0.006
chard+0.006
Neg logit contributions
acks-0.005
 drifted-0.005
irmed-0.004
|-0.004
irt-0.004
Token.
Token ID13
Feature activation-5.15e-03

Pos logit contributions
ache+0.035
 oh+0.034
cher+0.033
chard+0.032
 sir+0.031
Neg logit contributions
acks-0.032
 drifted-0.028
 minds-0.027
irmed-0.026
irt-0.026
Token "
Token ID366
Feature activation-4.19e-02

Pos logit contributions
ache+0.010
 oh+0.010
cher+0.010
 why+0.009
 lawyer+0.009
Neg logit contributions
acks-0.008
 drifted-0.007
irmed-0.006
|-0.006
acked-0.006
TokenHello
Token ID15496
Feature activation-4.01e-02

Pos logit contributions
 oh+0.065
ache+0.064
 lawyer+0.061
 why+0.060
cher+0.060
Neg logit contributions
acks-0.081
 drifted-0.071
irmed-0.071
|-0.067
irt-0.067
Token,
Token ID11
Feature activation-1.06e-01

Pos logit contributions
ache+0.080
cher+0.079
chard+0.078
but+0.073
 oh+0.073
Neg logit contributions
acks-0.052
 night-0.043
text-0.041
|-0.041
 turns-0.040
Token Lily
Token ID20037
Feature activation+6.07e-03

Pos logit contributions
ache+0.201
chard+0.188
cher+0.184
 oh+0.182
but+0.179
Neg logit contributions
acks-0.142
|-0.133
 They-0.129
 night-0.124
text-0.122
Token!
Token ID0
Feature activation-4.00e-02

Pos logit contributions
acks+0.009
 drifted+0.007
irmed+0.007
|+0.007
acked+0.007
Neg logit contributions
 oh-0.012
ache-0.012
chard-0.012
cher-0.011
 lawyer-0.011
Token I
Token ID314
Feature activation-4.04e-05

Pos logit contributions
ache+0.067
 oh+0.062
chard+0.060
cher+0.059
 why+0.056
Neg logit contributions
acks-0.072
|-0.067
 drifted-0.063
text-0.061
irmed-0.061
Token am
Token ID716
Feature activation-3.41e-02

Pos logit contributions
 oh+0.000
ache+0.000
 why+0.000
cher+0.000
 lawyer+0.000
Neg logit contributions
acks-0.000
 drifted-0.000
irmed-0.000
|-0.000
irt-0.000
Token the
Token ID262
Feature activation+8.60e-03

Pos logit contributions
ache+0.054
 oh+0.051
cher+0.049
chard+0.047
 why+0.047
Neg logit contributions
acks-0.063
 drifted-0.056
|-0.054
irmed-0.053
irt-0.052
Token the
Token ID262
Feature activation-1.24e-02

Pos logit contributions
 oh+0.089
ache+0.089
 why+0.085
 lawyer+0.084
cher+0.084
Neg logit contributions
acks-0.085
 drifted-0.073
irmed-0.073
|-0.069
irt-0.068
Token rat
Token ID4227
Feature activation-4.77e-02

Pos logit contributions
 oh+0.021
ache+0.020
cher+0.020
chard+0.020
 why+0.019
Neg logit contributions
acks-0.021
 drifted-0.019
irt-0.017
 minds-0.017
irmed-0.017
Token.
Token ID13
Feature activation+6.40e-03

Pos logit contributions
 oh+0.085
ache+0.083
chard+0.080
cher+0.080
 lawyer+0.079
Neg logit contributions
acks-0.081
 drifted-0.069
irmed-0.067
|-0.065
irt-0.064
Token "
Token ID366
Feature activation-3.65e-02

Pos logit contributions
acks+0.010
 drifted+0.008
irmed+0.008
|+0.008
irt+0.008
Neg logit contributions
 oh-0.012
ache-0.012
 why-0.012
 lawyer-0.012
chard-0.012
TokenHello
Token ID15496
Feature activation-3.42e-02

Pos logit contributions
 oh+0.049
ache+0.047
 lawyer+0.044
 why+0.042
chard+0.042
Neg logit contributions
acks-0.081
 drifted-0.071
irmed-0.071
irt-0.068
|-0.067
Token,
Token ID11
Feature activation-1.05e-01

Pos logit contributions
cher+0.069
chard+0.068
ache+0.068
card+0.066
 principal+0.066
Neg logit contributions
acks-0.046
 hops-0.036
text-0.036
 night-0.036
 They-0.035
Token rat
Token ID4227
Feature activation-2.98e-02

Pos logit contributions
chard+0.190
but+0.190
cher+0.181
card+0.180
ache+0.178
Neg logit contributions
 They-0.127
acks-0.127
 night-0.125
 lives-0.122
 drifted-0.120
Token.
Token ID13
Feature activation-2.29e-02

Pos logit contributions
 oh+0.058
chard+0.057
ache+0.056
cher+0.055
 salon+0.055
Neg logit contributions
acks-0.042
 drifted-0.035
irmed-0.033
|-0.032
irt-0.032
Token What
Token ID1867
Feature activation-5.28e-02

Pos logit contributions
ache+0.034
 oh+0.033
chard+0.031
cher+0.031
 lawyer+0.030
Neg logit contributions
acks-0.045
 drifted-0.040
|-0.040
irmed-0.040
text-0.037
Token is
Token ID318
Feature activation-4.23e-02

Pos logit contributions
ache+0.113
 oh+0.111
chard+0.109
cher+0.107
 lawyer+0.105
Neg logit contributions
acks-0.070
 drifted-0.057
|-0.054
irmed-0.054
text-0.052
Token your
Token ID534
Feature activation-2.17e-02

Pos logit contributions
ache+0.085
cher+0.079
 sir+0.078
 salon+0.077
 clerk+0.076
Neg logit contributions
acks-0.058
 drifted-0.051
irt-0.048
irmed-0.048
|-0.046
Token,
Token ID11
Feature activation-8.94e-02

Pos logit contributions
ache+0.057
 oh+0.057
chard+0.057
cher+0.056
 lawyer+0.056
Neg logit contributions
acks-0.077
 drifted-0.071
irmed-0.068
text-0.063
acked-0.062
Token "
Token ID366
Feature activation-5.50e-02

Pos logit contributions
 oh+0.107
cher+0.104
ache+0.102
chard+0.100
 lawyer+0.096
Neg logit contributions
acks-0.192
 drifted-0.181
 They-0.169
irmed-0.163
 drift-0.161
TokenDon
Token ID3987
Feature activation-4.10e-02

Pos logit contributions
 oh+0.073
ache+0.070
cher+0.064
can+0.063
 lawyer+0.062
Neg logit contributions
acks-0.126
irt-0.107
irmed-0.106
 drifted-0.105
alk-0.104
Token't
Token ID470
Feature activation-3.18e-02

Pos logit contributions
 oh+0.055
ache+0.054
chard+0.053
cher+0.050
 why+0.050
Neg logit contributions
acks-0.088
 drifted-0.078
|-0.077
irmed-0.075
text-0.074
Token worry
Token ID5490
Feature activation-4.40e-02

Pos logit contributions
ache+0.034
 oh+0.033
 why+0.032
 lawyer+0.030
 sir+0.029
Neg logit contributions
acks-0.079
irmed-0.071
 drifted-0.071
acked-0.069
irted-0.068
Token,
Token ID11
Feature activation-1.05e-01

Pos logit contributions
 oh+0.066
ache+0.065
chard+0.065
cher+0.064
 lawyer+0.062
Neg logit contributions
acks-0.086
 drifted-0.075
irmed-0.073
|-0.071
text-0.070
Token Lily
Token ID20037
Feature activation+2.80e-03

Pos logit contributions
ache+0.188
cher+0.184
chard+0.183
card+0.182
 oh+0.180
Neg logit contributions
acks-0.156
 drifted-0.146
|-0.136
irmed-0.133
text-0.130
Token.
Token ID13
Feature activation-2.62e-02

Pos logit contributions
acks+0.004
 drifted+0.004
irmed+0.004
|+0.003
irt+0.003
Neg logit contributions
 oh-0.005
ache-0.005
chard-0.005
 lawyer-0.005
cher-0.005
Token You
Token ID921
Feature activation+1.80e-02

Pos logit contributions
 oh+0.038
ache+0.037
 why+0.036
 sir+0.035
 lawyer+0.035
Neg logit contributions
acks-0.054
 drifted-0.048
irmed-0.047
|-0.045
irt-0.044
Token have
Token ID423
Feature activation+2.63e-02

Pos logit contributions
acks+0.037
irmed+0.031
 drifted+0.031
acked+0.030
irt+0.029
Neg logit contributions
ache-0.027
cher-0.026
 why-0.026
 oh-0.026
 lawyer-0.025
Token a
Token ID257
Feature activation-3.65e-02

Pos logit contributions
acks+0.066
irmed+0.059
 drifted+0.059
|+0.058
acked+0.057
Neg logit contributions
 oh-0.027
ache-0.026
 why-0.025
 lawyer-0.024
chard-0.023
Token,
Token ID11
Feature activation-8.90e-02

Pos logit contributions
acks+0.068
 drifted+0.060
irmed+0.060
|+0.058
irt+0.057
Neg logit contributions
 oh-0.052
ache-0.051
 why-0.049
 lawyer-0.048
chard-0.048
Token "
Token ID366
Feature activation-4.61e-02

Pos logit contributions
ache+0.140
cher+0.129
 oh+0.126
 lawyer+0.123
chard+0.122
Neg logit contributions
acks-0.161
 drifted-0.149
 They-0.137
irmed-0.137
|-0.134
TokenDon
Token ID3987
Feature activation-4.32e-02

Pos logit contributions
 oh+0.068
ache+0.066
 why+0.063
 lawyer+0.063
cher+0.063
Neg logit contributions
acks-0.095
 drifted-0.083
irmed-0.082
irt-0.078
|-0.077
Token't
Token ID470
Feature activation-3.16e-02

Pos logit contributions
ache+0.056
 oh+0.055
chard+0.053
cher+0.051
 why+0.050
Neg logit contributions
acks-0.095
 drifted-0.085
irmed-0.082
|-0.082
text-0.080
Token worry
Token ID5490
Feature activation-4.28e-02

Pos logit contributions
 oh+0.037
ache+0.036
 why+0.035
 lawyer+0.033
 sir+0.033
Neg logit contributions
acks-0.076
irmed-0.068
 drifted-0.067
acked-0.066
irted-0.064
Token,
Token ID11
Feature activation-1.05e-01

Pos logit contributions
ache+0.066
 oh+0.066
cher+0.066
chard+0.065
 lawyer+0.063
Neg logit contributions
acks-0.080
 drifted-0.070
irmed-0.068
|-0.065
irt-0.065
Token Ben
Token ID3932
Feature activation+5.03e-03

Pos logit contributions
ache+0.194
cher+0.183
chard+0.177
 principal+0.177
 salon+0.176
Neg logit contributions
acks-0.145
 drifted-0.142
 They-0.131
 drift-0.130
 night-0.127
Token,
Token ID11
Feature activation-9.57e-02

Pos logit contributions
acks+0.008
 drifted+0.007
irmed+0.007
|+0.007
irt+0.007
Neg logit contributions
 oh-0.009
ache-0.009
 why-0.009
cher-0.009
 lawyer-0.009
Token it
Token ID340
Feature activation+3.31e-03

Pos logit contributions
ache+0.160
chard+0.151
 principal+0.151
itor+0.149
 lawyer+0.148
Neg logit contributions
acks-0.151
 drifted-0.146
 drift-0.135
|-0.130
 They-0.130
Token's
Token ID338
Feature activation-3.08e-02

Pos logit contributions
acks+0.006
 drifted+0.006
irmed+0.005
|+0.005
acked+0.005
Neg logit contributions
ache-0.005
 why-0.005
 oh-0.005
cher-0.005
 lawyer-0.004
Token not
Token ID407
Feature activation+1.58e-03

Pos logit contributions
 oh+0.029
ache+0.028
 why+0.026
 lawyer+0.025
cher+0.025
Neg logit contributions
acks-0.079
 drifted-0.072
irmed-0.071
|-0.069
irt-0.068
Token,
Token ID11
Feature activation-8.90e-02

Pos logit contributions
acks+0.023
 drifted+0.019
irmed+0.018
|+0.016
irt+0.016
Neg logit contributions
ache-0.038
 oh-0.038
cher-0.038
chard-0.036
 lawyer-0.036
Token "
Token ID366
Feature activation-4.61e-02

Pos logit contributions
ache+0.140
cher+0.129
 oh+0.126
 lawyer+0.123
chard+0.122
Neg logit contributions
acks-0.161
 drifted-0.149
 They-0.137
irmed-0.137
|-0.134
TokenDon
Token ID3987
Feature activation-4.32e-02

Pos logit contributions
 oh+0.068
ache+0.066
 why+0.063
 lawyer+0.063
cher+0.063
Neg logit contributions
acks-0.095
 drifted-0.083
irmed-0.082
irt-0.078
|-0.077
Token't
Token ID470
Feature activation-3.16e-02

Pos logit contributions
ache+0.056
 oh+0.055
chard+0.053
cher+0.051
 why+0.050
Neg logit contributions
acks-0.095
 drifted-0.085
irmed-0.082
|-0.082
text-0.080
Token worry
Token ID5490
Feature activation-4.28e-02

Pos logit contributions
 oh+0.037
ache+0.036
 why+0.035
 lawyer+0.033
 sir+0.033
Neg logit contributions
acks-0.076
irmed-0.068
 drifted-0.067
acked-0.066
irted-0.064
Token,
Token ID11
Feature activation-1.05e-01

Pos logit contributions
ache+0.066
 oh+0.066
cher+0.066
chard+0.065
 lawyer+0.063
Neg logit contributions
acks-0.080
 drifted-0.070
irmed-0.068
|-0.065
irt-0.065
Token sweet
Token ID6029
Feature activation+3.02e-03

Pos logit contributions
ache+0.194
cher+0.183
chard+0.177
 principal+0.177
 salon+0.176
Neg logit contributions
acks-0.145
 drifted-0.142
 They-0.131
 drift-0.130
 night-0.127
Tokenie
Token ID494
Feature activation-1.86e-02

Pos logit contributions
acks+0.004
 drifted+0.004
irmed+0.003
irt+0.003
|+0.003
Neg logit contributions
 oh-0.006
cher-0.006
chard-0.006
ache-0.006
 lawyer-0.006
Token.
Token ID13
Feature activation-2.95e-02

Pos logit contributions
 oh+0.036
ache+0.035
chard+0.035
 lawyer+0.034
cher+0.034
Neg logit contributions
acks-0.028
 drifted-0.024
irmed-0.023
irt-0.022
|-0.021
Token It
Token ID632
Feature activation-4.36e-03

Pos logit contributions
ache+0.046
 oh+0.046
cher+0.044
chard+0.043
 lawyer+0.043
Neg logit contributions
acks-0.056
 drifted-0.049
irmed-0.048
|-0.047
irt-0.045
Token was
Token ID373
Feature activation-1.05e-02

Pos logit contributions
 why+0.007
ache+0.007
 oh+0.007
cher+0.006
 lawyer+0.006
Neg logit contributions
acks-0.009
irmed-0.007
 drifted-0.007
irt-0.007
acked-0.007
Token,
Token ID11
Feature activation-8.90e-02

Pos logit contributions
acks+0.020
 drifted+0.017
irmed+0.017
|+0.016
irt+0.016
Neg logit contributions
 oh-0.020
ache-0.020
 why-0.019
chard-0.019
 lawyer-0.019
Token "
Token ID366
Feature activation-4.61e-02

Pos logit contributions
ache+0.140
cher+0.129
 oh+0.126
 lawyer+0.123
chard+0.122
Neg logit contributions
acks-0.161
 drifted-0.149
 They-0.137
irmed-0.137
|-0.134
TokenDon
Token ID3987
Feature activation-4.32e-02

Pos logit contributions
 oh+0.068
ache+0.066
 why+0.063
 lawyer+0.063
cher+0.063
Neg logit contributions
acks-0.095
 drifted-0.083
irmed-0.082
irt-0.078
|-0.077
Token't
Token ID470
Feature activation-3.16e-02

Pos logit contributions
ache+0.056
 oh+0.055
chard+0.053
cher+0.051
 why+0.050
Neg logit contributions
acks-0.095
 drifted-0.085
irmed-0.082
|-0.082
text-0.080
Token worry
Token ID5490
Feature activation-4.28e-02

Pos logit contributions
 oh+0.037
ache+0.036
 why+0.035
 lawyer+0.033
 sir+0.033
Neg logit contributions
acks-0.076
irmed-0.068
 drifted-0.067
acked-0.066
irted-0.064
Token,
Token ID11
Feature activation-1.05e-01

Pos logit contributions
ache+0.066
 oh+0.066
cher+0.066
chard+0.065
 lawyer+0.063
Neg logit contributions
acks-0.080
 drifted-0.070
irmed-0.068
|-0.065
irt-0.065
Token Frost
Token ID15122
Feature activation+2.71e-02

Pos logit contributions
ache+0.194
cher+0.183
chard+0.177
 principal+0.177
 salon+0.176
Neg logit contributions
acks-0.145
 drifted-0.142
 They-0.131
 drift-0.130
 night-0.127
Tokeny
Token ID88
Feature activation+4.05e-03

Pos logit contributions
acks+0.045
 drifted+0.039
irmed+0.039
|+0.036
irt+0.035
Neg logit contributions
ache-0.051
 oh-0.047
 why-0.047
cher-0.047
 sir-0.046
Token.
Token ID13
Feature activation-2.53e-02

Pos logit contributions
acks+0.008
 drifted+0.007
|+0.007
irmed+0.007
acked+0.007
Neg logit contributions
ache-0.006
 why-0.006
 oh-0.006
cher-0.006
 sir-0.006
Token Let
Token ID3914
Feature activation-1.97e-02

Pos logit contributions
 oh+0.042
ache+0.042
 why+0.040
cher+0.040
 lawyer+0.040
Neg logit contributions
acks-0.046
 drifted-0.040
irmed-0.039
|-0.037
irt-0.037
Token's
Token ID338
Feature activation-5.18e-02

Pos logit contributions
 oh+0.027
 why+0.026
ache+0.026
 lawyer+0.023
 please+0.023
Neg logit contributions
acks-0.042
irmed-0.038
 drifted-0.037
acked-0.035
|-0.035
Token snow
Token ID6729
Feature activation-1.39e-02

Pos logit contributions
acks+0.010
 drifted+0.009
irt+0.008
irmed+0.008
 minds+0.008
Neg logit contributions
cher-0.009
 oh-0.009
ache-0.009
but-0.009
chard-0.008
Tokenman
Token ID805
Feature activation-2.19e-02

Pos logit contributions
 oh+0.019
 lawyer+0.018
 why+0.018
 principal+0.018
ache+0.018
Neg logit contributions
acks-0.028
 drifted-0.026
acked-0.024
|-0.024
irmed-0.023
Token.
Token ID13
Feature activation-1.37e-03

Pos logit contributions
 oh+0.035
ache+0.034
chard+0.032
 sir+0.032
 lawyer+0.032
Neg logit contributions
acks-0.039
 drifted-0.036
|-0.033
irt-0.033
irmed-0.032
Token "
Token ID366
Feature activation-4.11e-02

Pos logit contributions
 oh+0.003
ache+0.003
 why+0.003
 lawyer+0.003
chard+0.003
Neg logit contributions
acks-0.002
 drifted-0.002
irmed-0.002
irt-0.002
|-0.002
TokenHello
Token ID15496
Feature activation-3.85e-02

Pos logit contributions
 oh+0.057
ache+0.056
 lawyer+0.053
 why+0.052
cher+0.052
Neg logit contributions
acks-0.088
 drifted-0.077
irmed-0.077
irt-0.073
|-0.072
Token,
Token ID11
Feature activation-1.05e-01

Pos logit contributions
chard+0.075
cher+0.075
 salon+0.075
ache+0.074
 clerk+0.073
Neg logit contributions
acks-0.050
 night-0.045
text-0.043
 drifted-0.042
 They-0.041
Token Mr
Token ID1770
Feature activation+8.81e-03

Pos logit contributions
ache+0.209
but+0.209
chard+0.203
 salon+0.201
lock+0.197
Neg logit contributions
 night-0.115
 They-0.114
acks-0.112
text-0.111
|-0.108
Token.
Token ID13
Feature activation-4.23e-02

Pos logit contributions
acks+0.017
 drifted+0.015
irmed+0.015
|+0.014
acked+0.014
Neg logit contributions
ache-0.014
 sir-0.014
 oh-0.014
 lawyer-0.014
 why-0.013
Token Snow
Token ID7967
Feature activation-8.49e-04

Pos logit contributions
ache+0.081
chard+0.079
 oh+0.079
 why+0.077
cher+0.075
Neg logit contributions
acks-0.060
 drifted-0.056
|-0.052
irmed-0.049
text-0.049
Tokenman
Token ID805
Feature activation-3.35e-02

Pos logit contributions
ache+0.001
 oh+0.001
cher+0.001
 lawyer+0.001
 why+0.001
Neg logit contributions
acks-0.002
 drifted-0.002
irmed-0.002
|-0.002
irt-0.002
Token!"
Token ID2474
Feature activation+1.15e-03

Pos logit contributions
 oh+0.058
ache+0.057
chard+0.055
 lawyer+0.054
cher+0.054
Neg logit contributions
acks-0.058
 drifted-0.050
irmed-0.049
|-0.047
irt-0.046
Token,
Token ID11
Feature activation-8.19e-02

Pos logit contributions
chard+0.052
ache+0.051
cher+0.050
 lawyer+0.050
 principal+0.049
Neg logit contributions
acks-0.060
 drifted-0.057
irmed-0.053
 They-0.049
 drift-0.048
Token "
Token ID366
Feature activation-5.47e-02

Pos logit contributions
ache+0.117
chard+0.114
cher+0.113
 oh+0.105
 salon+0.105
Neg logit contributions
acks-0.147
 They-0.145
 drifted-0.142
 drift-0.128
 Soon-0.127
TokenDon
Token ID3987
Feature activation-4.29e-02

Pos logit contributions
 oh+0.079
ache+0.072
cher+0.070
 lawyer+0.070
 why+0.068
Neg logit contributions
acks-0.116
irmed-0.102
 drifted-0.102
irt-0.098
irted-0.096
Token't
Token ID470
Feature activation-3.34e-02

Pos logit contributions
ache+0.057
 oh+0.057
chard+0.056
cher+0.053
 why+0.052
Neg logit contributions
acks-0.092
 drifted-0.082
|-0.080
irmed-0.079
text-0.077
Token worry
Token ID5490
Feature activation-4.48e-02

Pos logit contributions
 oh+0.037
ache+0.037
 why+0.035
 lawyer+0.034
cher+0.034
Neg logit contributions
acks-0.080
 drifted-0.072
irmed-0.071
|-0.068
irt-0.068
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
chard+0.070
ache+0.069
 oh+0.069
cher+0.069
 lawyer+0.066
Neg logit contributions
acks-0.085
 drifted-0.073
irmed-0.071
irt-0.068
|-0.068
Token we
Token ID356
Feature activation+7.29e-03

Pos logit contributions
ache+0.191
chard+0.186
cher+0.183
 lawyer+0.174
card+0.174
Neg logit contributions
acks-0.154
 drifted-0.148
 They-0.134
irmed-0.133
 minds-0.132
Token can
Token ID460
Feature activation-6.97e-02

Pos logit contributions
acks+0.014
 drifted+0.012
irmed+0.012
|+0.011
irt+0.011
Neg logit contributions
 oh-0.012
ache-0.012
chard-0.011
 lawyer-0.011
cher-0.011
Token repair
Token ID9185
Feature activation-5.14e-02

Pos logit contributions
 oh+0.118
ache+0.117
cher+0.111
 lawyer+0.110
chard+0.110
Neg logit contributions
acks-0.125
 drifted-0.109
irmed-0.107
|-0.102
irt-0.101
Token it
Token ID340
Feature activation+5.63e-03

Pos logit contributions
ache+0.090
 oh+0.088
cher+0.085
 lawyer+0.084
chard+0.084
Neg logit contributions
acks-0.088
 drifted-0.075
irmed-0.074
irt-0.071
|-0.071
Token!"
Token ID2474
Feature activation+2.25e-03

Pos logit contributions
acks+0.009
 drifted+0.008
irmed+0.007
irt+0.007
|+0.007
Neg logit contributions
 oh-0.010
ache-0.010
 lawyer-0.010
chard-0.010
cher-0.010
Token loud
Token ID7812
Feature activation+9.14e-03

Pos logit contributions
 oh+0.050
ache+0.050
cher+0.049
 why+0.048
chard+0.047
Neg logit contributions
acks-0.042
 drifted-0.038
irmed-0.036
irt-0.035
 minds-0.034
Token voice
Token ID3809
Feature activation-2.71e-02

Pos logit contributions
acks+0.018
 drifted+0.016
irmed+0.015
|+0.015
irt+0.015
Neg logit contributions
 oh-0.014
ache-0.014
 why-0.013
 lawyer-0.013
cher-0.013
Token.
Token ID13
Feature activation-6.90e-03

Pos logit contributions
chard+0.060
 oh+0.058
itor+0.057
 lawyer+0.056
ache+0.056
Neg logit contributions
acks-0.029
 drifted-0.027
 drift-0.023
 night-0.020
|-0.020
Token "
Token ID366
Feature activation-4.27e-02

Pos logit contributions
ache+0.014
 oh+0.014
cher+0.013
 lawyer+0.013
 why+0.013
Neg logit contributions
acks-0.010
 drifted-0.009
irmed-0.009
|-0.008
irt-0.008
TokenHello
Token ID15496
Feature activation-3.78e-02

Pos logit contributions
ache+0.058
 oh+0.057
cher+0.054
 why+0.054
chard+0.053
Neg logit contributions
acks-0.090
 drifted-0.080
irmed-0.079
|-0.078
irt-0.075
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
ache+0.071
chard+0.070
cher+0.070
 principal+0.066
card+0.066
Neg logit contributions
acks-0.053
|-0.044
 night-0.044
text-0.044
 drifted-0.042
Token children
Token ID1751
Feature activation-1.47e-02

Pos logit contributions
ache+0.188
chard+0.182
lock+0.175
cher+0.173
 oh+0.172
Neg logit contributions
acks-0.142
|-0.138
 They-0.130
 drifted-0.128
 night-0.126
Token!"
Token ID2474
Feature activation-2.21e-03

Pos logit contributions
 oh+0.027
ache+0.027
chard+0.026
 lawyer+0.026
cher+0.026
Neg logit contributions
acks-0.024
 drifted-0.020
irmed-0.019
|-0.019
irt-0.018
Token she
Token ID673
Feature activation-1.96e-03

Pos logit contributions
 oh+0.003
ache+0.003
 why+0.003
 lawyer+0.003
chard+0.003
Neg logit contributions
acks-0.004
irmed-0.004
 drifted-0.004
irt-0.004
|-0.004
Token said
Token ID531
Feature activation-2.92e-02

Pos logit contributions
 oh+0.003
ache+0.003
 why+0.003
 lawyer+0.003
cher+0.003
Neg logit contributions
acks-0.003
 drifted-0.003
irmed-0.003
|-0.003
irt-0.003
Token.
Token ID13
Feature activation-1.08e-02

Pos logit contributions
ache+0.060
 oh+0.059
 lawyer+0.059
chard+0.058
 salon+0.058
Neg logit contributions
acks-0.038
 drifted-0.033
irmed-0.030
 drift-0.028
 They-0.027
Token,
Token ID11
Feature activation-8.85e-02

Pos logit contributions
ache+0.052
 lawyer+0.050
chard+0.050
cher+0.049
 principal+0.049
Neg logit contributions
acks-0.062
 drifted-0.059
irmed-0.055
 They-0.050
 drift-0.050
Token "
Token ID366
Feature activation-5.74e-02

Pos logit contributions
ache+0.120
chard+0.114
cher+0.112
 lawyer+0.108
 oh+0.107
Neg logit contributions
acks-0.172
 They-0.164
 drifted-0.163
 drift-0.149
 Soon-0.146
TokenDon
Token ID3987
Feature activation-4.29e-02

Pos logit contributions
 oh+0.084
ache+0.081
 lawyer+0.077
 why+0.077
cher+0.077
Neg logit contributions
acks-0.118
 drifted-0.104
irmed-0.103
irt-0.098
|-0.098
Token't
Token ID470
Feature activation-3.07e-02

Pos logit contributions
ache+0.060
 oh+0.060
chard+0.059
cher+0.056
 why+0.054
Neg logit contributions
acks-0.090
 drifted-0.079
|-0.079
text-0.076
irmed-0.075
Token worry
Token ID5490
Feature activation-4.37e-02

Pos logit contributions
 oh+0.031
ache+0.030
 why+0.029
 lawyer+0.027
 sir+0.027
Neg logit contributions
acks-0.079
irmed-0.072
 drifted-0.071
acked-0.070
irted-0.068
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
cher+0.070
chard+0.070
ache+0.069
 oh+0.068
 lawyer+0.065
Neg logit contributions
acks-0.081
 drifted-0.069
irmed-0.067
text-0.064
irt-0.064
Token Lily
Token ID20037
Feature activation+2.72e-03

Pos logit contributions
ache+0.197
cher+0.189
chard+0.186
 principal+0.179
 lawyer+0.178
Neg logit contributions
acks-0.141
 drifted-0.138
 They-0.128
 minds-0.125
 drift-0.124
Token.
Token ID13
Feature activation-3.05e-02

Pos logit contributions
acks+0.004
 drifted+0.003
irmed+0.003
irt+0.003
|+0.003
Neg logit contributions
 oh-0.006
ache-0.005
chard-0.005
 lawyer-0.005
cher-0.005
Token We
Token ID775
Feature activation+2.24e-02

Pos logit contributions
ache+0.049
 oh+0.049
cher+0.046
 lawyer+0.046
chard+0.046
Neg logit contributions
acks-0.057
 drifted-0.050
irmed-0.049
|-0.047
irt-0.046
Token can
Token ID460
Feature activation-6.64e-02

Pos logit contributions
acks+0.047
 drifted+0.041
irmed+0.040
|+0.038
acked+0.038
Neg logit contributions
ache-0.032
 oh-0.031
 why-0.031
cher-0.030
 lawyer-0.030
Token plan
Token ID1410
Feature activation-5.69e-02

Pos logit contributions
 oh+0.108
ache+0.108
 why+0.103
 lawyer+0.102
cher+0.102
Neg logit contributions
acks-0.122
 drifted-0.107
irmed-0.105
|-0.100
irt-0.099
Token,
Token ID11
Feature activation-8.30e-02

Pos logit contributions
ache+0.053
chard+0.051
 lawyer+0.051
cher+0.049
 principal+0.049
Neg logit contributions
acks-0.052
 drifted-0.048
irmed-0.045
 They-0.041
irt-0.041
Token "
Token ID366
Feature activation-5.54e-02

Pos logit contributions
ache+0.118
cher+0.109
chard+0.107
 oh+0.106
 lawyer+0.103
Neg logit contributions
acks-0.159
 They-0.150
 drifted-0.145
 drift-0.132
 Soon-0.131
TokenDon
Token ID3987
Feature activation-4.36e-02

Pos logit contributions
 oh+0.079
ache+0.074
 lawyer+0.070
 why+0.069
chard+0.069
Neg logit contributions
acks-0.118
 drifted-0.105
irmed-0.103
irt-0.098
|-0.097
Token't
Token ID470
Feature activation-3.15e-02

Pos logit contributions
chard+0.065
ache+0.065
 oh+0.065
cher+0.061
 why+0.059
Neg logit contributions
acks-0.087
 drifted-0.076
|-0.075
text-0.074
irmed-0.071
Token worry
Token ID5490
Feature activation-4.45e-02

Pos logit contributions
 oh+0.035
ache+0.034
 why+0.033
 lawyer+0.032
 sir+0.031
Neg logit contributions
acks-0.076
irmed-0.069
 drifted-0.069
acked-0.067
irted-0.066
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
cher+0.076
chard+0.075
 oh+0.072
ache+0.072
 lawyer+0.069
Neg logit contributions
acks-0.080
 drifted-0.066
irmed-0.065
text-0.063
irt-0.063
Token little
Token ID1310
Feature activation-7.23e-03

Pos logit contributions
cher+0.190
ache+0.189
chard+0.186
card+0.180
 principal+0.177
Neg logit contributions
acks-0.131
 drifted-0.129
 They-0.124
 drift-0.116
 night-0.113
Token bird
Token ID6512
Feature activation-2.23e-02

Pos logit contributions
ache+0.012
 oh+0.011
cher+0.011
chard+0.011
 why+0.011
Neg logit contributions
acks-0.013
 drifted-0.012
irmed-0.012
irt-0.011
irted-0.011
Token.
Token ID13
Feature activation-2.74e-02

Pos logit contributions
 oh+0.044
ache+0.044
chard+0.043
 lawyer+0.043
cher+0.042
Neg logit contributions
acks-0.033
 drifted-0.027
irmed-0.026
|-0.025
irt-0.025
Token I
Token ID314
Feature activation-1.88e-03

Pos logit contributions
ache+0.049
 oh+0.048
cher+0.046
chard+0.046
 lawyer+0.046
Neg logit contributions
acks-0.046
 drifted-0.040
irmed-0.040
|-0.038
irt-0.037
Token will
Token ID481
Feature activation-3.01e-02

Pos logit contributions
ache+0.003
 oh+0.003
 why+0.003
cher+0.003
 lawyer+0.002
Neg logit contributions
acks-0.004
 drifted-0.003
irmed-0.003
|-0.003
text-0.003
Token,
Token ID11
Feature activation-8.55e-02

Pos logit contributions
ache+0.057
chard+0.056
 lawyer+0.054
 salon+0.053
cher+0.053
Neg logit contributions
acks-0.061
 drifted-0.058
irmed-0.053
 They-0.050
 drift-0.049
Token "
Token ID366
Feature activation-5.38e-02

Pos logit contributions
ache+0.113
chard+0.111
 oh+0.103
cher+0.103
 salon+0.102
Neg logit contributions
acks-0.172
 drifted-0.161
 They-0.160
 drift-0.147
irmed-0.144
TokenDon
Token ID3987
Feature activation-4.32e-02

Pos logit contributions
 oh+0.079
ache+0.074
 lawyer+0.072
cher+0.072
 why+0.071
Neg logit contributions
acks-0.112
 drifted-0.098
irmed-0.098
irt-0.093
irted-0.091
Token't
Token ID470
Feature activation-3.29e-02

Pos logit contributions
 oh+0.050
ache+0.050
 why+0.046
cher+0.046
 lawyer+0.046
Neg logit contributions
acks-0.101
 drifted-0.090
irmed-0.089
|-0.086
irt-0.085
Token worry
Token ID5490
Feature activation-4.44e-02

Pos logit contributions
 oh+0.034
ache+0.034
 why+0.032
 lawyer+0.031
 sir+0.030
Neg logit contributions
acks-0.082
irmed-0.074
 drifted-0.074
acked-0.071
irted-0.070
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
ache+0.067
 oh+0.067
chard+0.066
cher+0.066
 lawyer+0.063
Neg logit contributions
acks-0.085
 drifted-0.074
irmed-0.073
|-0.069
irt-0.069
Token Tim
Token ID5045
Feature activation-6.09e-03

Pos logit contributions
ache+0.187
chard+0.177
cher+0.177
card+0.171
 principal+0.169
Neg logit contributions
acks-0.149
 drifted-0.144
 They-0.133
 minds-0.128
 drift-0.128
Tokenmy
Token ID1820
Feature activation-2.80e-02

Pos logit contributions
 oh+0.012
chard+0.011
ache+0.011
 lawyer+0.011
cher+0.011
Neg logit contributions
acks-0.009
 drifted-0.008
irmed-0.008
irt-0.007
|-0.007
Token.
Token ID13
Feature activation-3.11e-02

Pos logit contributions
 oh+0.053
ache+0.053
 lawyer+0.051
chard+0.050
cher+0.050
Neg logit contributions
acks-0.044
 drifted-0.037
irmed-0.037
|-0.034
irt-0.034
Token We
Token ID775
Feature activation+2.26e-02

Pos logit contributions
 oh+0.052
ache+0.051
 why+0.050
 lawyer+0.049
cher+0.049
Neg logit contributions
acks-0.056
 drifted-0.049
irmed-0.048
|-0.045
irt-0.045
Token can
Token ID460
Feature activation-6.64e-02

Pos logit contributions
acks+0.046
 drifted+0.040
irmed+0.039
|+0.037
acked+0.037
Neg logit contributions
ache-0.033
 oh-0.033
 why-0.032
cher-0.032
 lawyer-0.032
Token and
Token ID290
Feature activation+3.90e-03

Pos logit contributions
acks+0.114
irmed+0.096
acked+0.093
 drifted+0.092
irt+0.091
Neg logit contributions
ache-0.092
 why-0.090
 lawyer-0.086
cher-0.086
 sir-0.085
Token said
Token ID531
Feature activation-3.76e-02

Pos logit contributions
acks+0.007
 drifted+0.006
irmed+0.006
|+0.005
irt+0.005
Neg logit contributions
ache-0.007
 oh-0.007
 lawyer-0.006
cher-0.006
 why-0.006
Token,
Token ID11
Feature activation-8.33e-02

Pos logit contributions
ache+0.063
chard+0.061
cher+0.060
 lawyer+0.060
 salon+0.060
Neg logit contributions
acks-0.060
 drifted-0.056
irmed-0.052
 They-0.050
text-0.047
Token "
Token ID366
Feature activation-5.22e-02

Pos logit contributions
ache+0.106
cher+0.101
 oh+0.100
chard+0.097
 lawyer+0.095
Neg logit contributions
acks-0.175
 drifted-0.163
 They-0.157
 drift-0.148
irmed-0.148
TokenHello
Token ID15496
Feature activation-3.86e-02

Pos logit contributions
 oh+0.079
ache+0.074
cher+0.071
 lawyer+0.071
 why+0.071
Neg logit contributions
acks-0.106
 drifted-0.093
irmed-0.092
irt-0.088
text-0.087
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
ache+0.074
chard+0.070
cher+0.070
 oh+0.067
card+0.066
Neg logit contributions
acks-0.059
|-0.047
 drifted-0.047
text-0.047
 night-0.045
Token sir
Token ID15967
Feature activation-2.27e-02

Pos logit contributions
ache+0.194
chard+0.180
cher+0.177
lock+0.175
 oh+0.174
Neg logit contributions
acks-0.147
|-0.136
 They-0.132
 drifted-0.125
 night-0.122
Token.
Token ID13
Feature activation-3.06e-02

Pos logit contributions
 oh+0.048
ache+0.046
chard+0.046
cher+0.044
 lawyer+0.044
Neg logit contributions
acks-0.030
|-0.023
 drifted-0.022
acked-0.022
irt-0.021
Token We
Token ID775
Feature activation+2.48e-02

Pos logit contributions
ache+0.045
 oh+0.042
cher+0.040
chard+0.040
 lawyer+0.039
Neg logit contributions
acks-0.062
 drifted-0.055
|-0.054
irmed-0.054
acked-0.051
Token are
Token ID389
Feature activation-1.18e-02

Pos logit contributions
acks+0.048
irmed+0.041
 drifted+0.041
acked+0.038
irted+0.038
Neg logit contributions
 why-0.040
ache-0.039
cher-0.038
 lawyer-0.038
 oh-0.038
Token trying
Token ID2111
Feature activation-3.28e-02

Pos logit contributions
ache+0.019
 oh+0.018
cher+0.017
 lawyer+0.017
 why+0.017
Neg logit contributions
acks-0.022
 drifted-0.020
irmed-0.019
|-0.019
irt-0.018
Token and
Token ID290
Feature activation+3.90e-03

Pos logit contributions
 lawyer+0.006
ache+0.006
 sir+0.005
itor+0.005
 salon+0.005
Neg logit contributions
acks-0.004
 drifted-0.004
irmed-0.004
irm-0.004
irt-0.004
Token said
Token ID531
Feature activation-3.76e-02

Pos logit contributions
acks+0.007
 drifted+0.006
irmed+0.006
|+0.005
irt+0.005
Neg logit contributions
ache-0.007
 oh-0.007
 lawyer-0.006
cher-0.006
 why-0.006
Token,
Token ID11
Feature activation-8.33e-02

Pos logit contributions
ache+0.063
chard+0.061
cher+0.060
 lawyer+0.060
 salon+0.060
Neg logit contributions
acks-0.060
 drifted-0.056
irmed-0.052
 They-0.050
text-0.047
Token "
Token ID366
Feature activation-5.22e-02

Pos logit contributions
ache+0.106
cher+0.101
 oh+0.100
chard+0.097
 lawyer+0.095
Neg logit contributions
acks-0.175
 drifted-0.163
 They-0.157
 drift-0.148
irmed-0.148
TokenHello
Token ID15496
Feature activation-3.86e-02

Pos logit contributions
 oh+0.079
ache+0.074
cher+0.071
 lawyer+0.071
 why+0.071
Neg logit contributions
acks-0.106
 drifted-0.093
irmed-0.092
irt-0.088
text-0.087
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
ache+0.074
chard+0.070
cher+0.070
 oh+0.067
card+0.066
Neg logit contributions
acks-0.059
|-0.047
 drifted-0.047
text-0.047
 night-0.045
Token Rose
Token ID8049
Feature activation+4.25e-03

Pos logit contributions
ache+0.194
chard+0.180
cher+0.177
lock+0.175
 oh+0.174
Neg logit contributions
acks-0.147
|-0.136
 They-0.132
 drifted-0.125
 night-0.122
Token.
Token ID13
Feature activation-3.20e-02

Pos logit contributions
acks+0.006
 drifted+0.005
irmed+0.005
|+0.005
irt+0.005
Neg logit contributions
 oh-0.008
ache-0.008
chard-0.008
cher-0.008
 lawyer-0.008
Token I
Token ID314
Feature activation-7.26e-04

Pos logit contributions
ache+0.047
 oh+0.045
cher+0.042
chard+0.042
 lawyer+0.041
Neg logit contributions
acks-0.064
 drifted-0.057
|-0.057
irmed-0.056
text-0.053
Token am
Token ID716
Feature activation-3.23e-02

Pos logit contributions
ache+0.001
 why+0.001
 oh+0.001
cher+0.001
 lawyer+0.001
Neg logit contributions
acks-0.001
irmed-0.001
 drifted-0.001
acked-0.001
irt-0.001
Token Tom
Token ID4186
Feature activation+2.20e-02

Pos logit contributions
ache+0.052
 oh+0.050
cher+0.048
 lawyer+0.047
chard+0.047
Neg logit contributions
acks-0.059
 drifted-0.052
irmed-0.050
|-0.050
irt-0.049
Token She
Token ID1375
Feature activation+8.86e-03

Pos logit contributions
 oh+0.019
ache+0.019
 why+0.018
 lawyer+0.018
cher+0.018
Neg logit contributions
acks-0.014
 drifted-0.012
irmed-0.011
|-0.011
irt-0.011
Token said
Token ID531
Feature activation-3.60e-02

Pos logit contributions
acks+0.017
irmed+0.014
|+0.014
 drifted+0.014
irt+0.014
Neg logit contributions
ache-0.014
 why-0.014
 oh-0.014
 lawyer-0.013
cher-0.013
Token,
Token ID11
Feature activation-7.86e-02

Pos logit contributions
cher+0.061
chard+0.061
ache+0.060
card+0.058
itor+0.058
Neg logit contributions
acks-0.053
 drifted-0.053
 They-0.048
irmed-0.046
 care-0.045
Token "
Token ID366
Feature activation-5.23e-02

Pos logit contributions
ache+0.115
 oh+0.111
cher+0.111
chard+0.104
 lawyer+0.102
Neg logit contributions
acks-0.144
 They-0.137
 drifted-0.136
 drift-0.123
 Soon-0.122
TokenHello
Token ID15496
Feature activation-3.82e-02

Pos logit contributions
 oh+0.079
ache+0.075
 lawyer+0.073
cher+0.072
chard+0.071
Neg logit contributions
acks-0.106
 drifted-0.093
irmed-0.093
irt-0.089
irted-0.088
Token,
Token ID11
Feature activation-1.04e-01

Pos logit contributions
ache+0.074
chard+0.072
cher+0.071
 oh+0.067
card+0.067
Neg logit contributions
acks-0.057
 drifted-0.045
text-0.045
 They-0.044
|-0.044
Token do
Token ID466
Feature activation-5.56e-02

Pos logit contributions
ache+0.187
chard+0.183
cher+0.174
 oh+0.171
lock+0.171
Neg logit contributions
acks-0.148
 They-0.132
|-0.132
 drifted-0.127
text-0.124
Tokenggy
Token ID19970
Feature activation-4.37e-03

Pos logit contributions
ache+0.088
 oh+0.086
chard+0.082
cher+0.082
 why+0.082
Neg logit contributions
acks-0.106
 drifted-0.092
irmed-0.091
|-0.088
irt-0.086
Token.
Token ID13
Feature activation-2.82e-02

Pos logit contributions
ache+0.008
 oh+0.008
chard+0.008
 lawyer+0.008
cher+0.008
Neg logit contributions
acks-0.007
 drifted-0.006
irmed-0.006
|-0.005
irt-0.005
Token I
Token ID314
Feature activation+1.65e-04

Pos logit contributions
ache+0.043
 oh+0.040
cher+0.038
chard+0.038
 lawyer+0.037
Neg logit contributions
acks-0.056
|-0.049
 drifted-0.049
irmed-0.049
irt-0.046
Token want
Token ID765
Feature activation-4.05e-02

Pos logit contributions
acks+0.000
 drifted+0.000
irmed+0.000
acked+0.000
irt+0.000
Neg logit contributions
ache-0.000
 oh-0.000
 why-0.000
cher-0.000
 lawyer-0.000
Token he
Token ID339
Feature activation+1.03e-03

Pos logit contributions
acks+0.013
irmed+0.012
 drifted+0.011
|+0.011
irt+0.011
Neg logit contributions
 oh-0.006
 why-0.006
ache-0.006
 sir-0.005
 lawyer-0.005
Token said
Token ID531
Feature activation-3.24e-02

Pos logit contributions
acks+0.002
 drifted+0.001
irmed+0.001
|+0.001
acked+0.001
Neg logit contributions
ache-0.002
 oh-0.002
 lawyer-0.002
chard-0.002
cher-0.002
Token,
Token ID11
Feature activation-7.13e-02

Pos logit contributions
ache+0.068
 oh+0.067
 salon+0.065
 lawyer+0.065
chard+0.064
Neg logit contributions
acks-0.043
 drifted-0.036
irmed-0.033
acked-0.031
 drift-0.030
Token making
Token ID1642
Feature activation-7.81e-04

Pos logit contributions
ache+0.093
cher+0.090
 oh+0.089
 lawyer+0.084
 salon+0.082
Neg logit contributions
acks-0.147
 drifted-0.139
irmed-0.127
 drift-0.125
 They-0.123
Token them
Token ID606
Feature activation+4.95e-03

Pos logit contributions
 oh+0.001
 why+0.001
ache+0.001
 lawyer+0.001
cher+0.001
Neg logit contributions
acks-0.001
irmed-0.001
 drifted-0.001
|-0.001
irt-0.001
Token all
Token ID477
Feature activation+6.44e-02

Pos logit contributions
acks+0.010
irmed+0.009
 drifted+0.008
irted+0.008
acked+0.008
Neg logit contributions
 why-0.007
 oh-0.007
ache-0.007
 lawyer-0.007
 sir-0.007
Token jump
Token ID4391
Feature activation+3.91e-02

Pos logit contributions
acks+0.119
irmed+0.111
 drifted+0.108
text+0.106
irt+0.104
Neg logit contributions
 why-0.104
 oh-0.093
 stealing-0.088
 lawyer-0.087
chard-0.086
Token.
Token ID13
Feature activation+1.14e-03

Pos logit contributions
acks+0.079
 drifted+0.071
irmed+0.070
|+0.067
irt+0.066
Neg logit contributions
ache-0.055
 oh-0.054
cher-0.052
 why-0.052
chard-0.051
Token They
Token ID1119
Feature activation+4.31e-02

Pos logit contributions
acks+0.002
 drifted+0.002
irmed+0.002
|+0.002
irt+0.001
Neg logit contributions
 oh-0.002
ache-0.002
 why-0.002
 lawyer-0.002
chard-0.002
Token smiled
Token ID13541
Feature activation+3.66e-02

Pos logit contributions
acks+0.074
irmed+0.062
 drifted+0.060
irt+0.059
|+0.058
Neg logit contributions
ache-0.076
 oh-0.076
 why-0.075
chard-0.074
cher-0.074
Token and
Token ID290
Feature activation+3.72e-02

Pos logit contributions
acks+0.069
 drifted+0.060
irmed+0.060
|+0.058
irt+0.056
Neg logit contributions
 oh-0.058
ache-0.058
 why-0.055
 lawyer-0.054
cher-0.054
Token that
Token ID326
Feature activation-4.29e-02

Pos logit contributions
chard+0.044
cher+0.042
 oh+0.042
ache+0.042
 salon+0.040
Neg logit contributions
acks-0.058
 drifted-0.052
irmed-0.051
 They-0.050
 care-0.049
Token he
Token ID339
Feature activation+3.26e-03

Pos logit contributions
ache+0.077
card+0.072
cher+0.071
chard+0.069
alo+0.069
Neg logit contributions
acks-0.065
 drifted-0.063
 night-0.058
|-0.056
irmed-0.056
Token was
Token ID373
Feature activation-1.26e-02

Pos logit contributions
acks+0.006
 drifted+0.006
irmed+0.006
|+0.005
irt+0.005
Neg logit contributions
 oh-0.005
ache-0.005
chard-0.005
 lawyer-0.005
cher-0.005
Token going
Token ID1016
Feature activation-2.29e-02

Pos logit contributions
ache+0.018
 oh+0.017
chard+0.017
cher+0.017
 salon+0.016
Neg logit contributions
acks-0.024
 drifted-0.023
irmed-0.022
irt-0.021
|-0.021
Token too
Token ID1165
Feature activation+3.21e-02

Pos logit contributions
 oh+0.032
ache+0.032
chard+0.031
cher+0.029
 sir+0.028
Neg logit contributions
acks-0.046
 drifted-0.041
irmed-0.041
irt-0.039
|-0.038
Token fast
Token ID3049
Feature activation+6.16e-02

Pos logit contributions
acks+0.058
 drifted+0.050
irmed+0.050
|+0.048
acked+0.047
Neg logit contributions
ache-0.054
 oh-0.054
 why-0.051
 lawyer-0.051
cher-0.050
Token and
Token ID290
Feature activation+1.13e-02

Pos logit contributions
acks+0.115
 drifted+0.100
irmed+0.099
|+0.095
irt+0.094
Neg logit contributions
 oh-0.099
ache-0.099
 why-0.094
 lawyer-0.093
cher-0.093
Token that
Token ID326
Feature activation-3.09e-02

Pos logit contributions
acks+0.018
 drifted+0.018
irmed+0.017
 minds+0.016
|+0.015
Neg logit contributions
ache-0.019
alo-0.017
 oh-0.017
card-0.017
 lawyer-0.017
Token was
Token ID373
Feature activation-1.30e-02

Pos logit contributions
ache+0.049
 oh+0.045
cher+0.045
chard+0.045
 lawyer+0.044
Neg logit contributions
acks-0.057
 drifted-0.053
irmed-0.050
|-0.048
irt-0.046
Token very
Token ID845
Feature activation+1.35e-02

Pos logit contributions
ache+0.020
 oh+0.019
cher+0.018
chard+0.018
 lawyer+0.018
Neg logit contributions
acks-0.024
 drifted-0.022
irmed-0.022
irt-0.020
irm-0.020
Token stupid
Token ID8531
Feature activation+7.67e-03

Pos logit contributions
acks+0.026
irmed+0.023
 drifted+0.023
irt+0.022
irted+0.021
Neg logit contributions
ache-0.021
 oh-0.020
chard-0.019
cher-0.019
 lawyer-0.019
Token big
Token ID1263
Feature activation+1.15e-02

Pos logit contributions
cher+0.079
 why+0.076
but+0.074
ache+0.073
 oh+0.072
Neg logit contributions
acks-0.027
 night-0.026
 evening-0.024
 afternoon-0.024
 lot-0.023
Token tree
Token ID5509
Feature activation-3.19e-02

Pos logit contributions
acks+0.020
 drifted+0.016
irt+0.016
irmed+0.016
 night+0.016
Neg logit contributions
cher-0.020
ache-0.020
 oh-0.019
 why-0.018
can-0.018
Token with
Token ID351
Feature activation+4.67e-02

Pos logit contributions
ache+0.047
 oh+0.047
cher+0.044
 lawyer+0.043
 why+0.042
Neg logit contributions
acks-0.065
 drifted-0.057
irmed-0.056
|-0.055
acked-0.054
Token a
Token ID257
Feature activation-1.15e-02

Pos logit contributions
acks+0.060
 drifted+0.051
irmed+0.049
acked+0.045
irt+0.044
Neg logit contributions
ache-0.101
 oh-0.099
cher-0.098
 why-0.096
 sir-0.095
Token sign
Token ID1051
Feature activation-2.57e-02

Pos logit contributions
cher+0.028
 why+0.026
 oh+0.025
itor+0.024
ache+0.024
Neg logit contributions
acks-0.011
 drifted-0.010
 lot-0.010
 night-0.010
 evening-0.009
Token on
Token ID319
Feature activation+6.25e-02

Pos logit contributions
 oh+0.051
ache+0.050
cher+0.050
 principal+0.047
chard+0.046
Neg logit contributions
acks-0.033
 They-0.030
 drifted-0.030
acked-0.029
irt-0.027
Token it
Token ID340
Feature activation+4.05e-02

Pos logit contributions
acks+0.128
 drifted+0.113
irmed+0.112
|+0.107
irt+0.106
Neg logit contributions
 oh-0.090
ache-0.089
 why-0.084
 lawyer-0.084
chard-0.083
Token.
Token ID13
Feature activation+7.31e-03

Pos logit contributions
acks+0.053
 drifted+0.045
irmed+0.043
acked+0.040
irt+0.039
Neg logit contributions
ache-0.087
 oh-0.086
cher-0.083
 lawyer-0.083
chard-0.082
Token The
Token ID383
Feature activation+7.80e-03

Pos logit contributions
acks+0.009
 drifted+0.007
irmed+0.007
|+0.007
acked+0.007
Neg logit contributions
ache-0.017
cher-0.016
 oh-0.016
 lawyer-0.015
chard-0.015
Token sign
Token ID1051
Feature activation-2.98e-02

Pos logit contributions
acks+0.013
 They+0.011
 air+0.011
 night+0.011
 care+0.011
Neg logit contributions
ache-0.014
cher-0.013
 oh-0.012
 why-0.012
chard-0.011
Token said
Token ID531
Feature activation-2.14e-02

Pos logit contributions
ache+0.061
 oh+0.059
cher+0.058
chard+0.056
 why+0.055
Neg logit contributions
acks-0.039
 drifted-0.037
irmed-0.033
acked-0.033
 They-0.032
Token studying
Token ID11065
Feature activation-8.54e-03

Pos logit contributions
acks+0.029
 drifted+0.026
irmed+0.025
irt+0.024
|+0.024
Neg logit contributions
ache-0.020
 oh-0.019
chard-0.018
 sir-0.018
 lawyer-0.018
Token very
Token ID845
Feature activation+2.18e-02

Pos logit contributions
ache+0.014
 oh+0.014
 lawyer+0.013
cher+0.013
chard+0.013
Neg logit contributions
acks-0.015
 drifted-0.014
irmed-0.013
|-0.012
irt-0.012
Token hard
Token ID1327
Feature activation+4.75e-02

Pos logit contributions
acks+0.042
|+0.037
 drifted+0.036
acked+0.036
irmed+0.035
Neg logit contributions
ache-0.034
 oh-0.034
 why-0.034
 lawyer-0.032
 sir-0.031
Token and
Token ID290
Feature activation+2.15e-02

Pos logit contributions
acks+0.092
 drifted+0.081
irmed+0.080
|+0.076
irt+0.075
Neg logit contributions
 oh-0.073
ache-0.073
cher-0.069
 why-0.069
 lawyer-0.068
Token was
Token ID373
Feature activation-7.86e-03

Pos logit contributions
acks+0.041
 drifted+0.037
irmed+0.036
|+0.034
irt+0.033
Neg logit contributions
ache-0.033
 oh-0.032
 lawyer-0.031
cher-0.031
chard-0.031
Token feeling
Token ID4203
Feature activation+6.11e-02

Pos logit contributions
ache+0.012
 oh+0.011
 lawyer+0.011
cher+0.011
 sir+0.011
Neg logit contributions
acks-0.015
 drifted-0.014
irmed-0.013
|-0.012
irt-0.012
Token nervous
Token ID10927
Feature activation+2.57e-02

Pos logit contributions
acks+0.135
 drifted+0.118
irmed+0.118
|+0.115
acked+0.114
Neg logit contributions
 oh-0.080
ache-0.078
 why-0.077
chard-0.072
 lawyer-0.072
Token.
Token ID13
Feature activation+2.77e-03

Pos logit contributions
acks+0.039
 drifted+0.033
irmed+0.032
irt+0.029
|+0.029
Neg logit contributions
 oh-0.049
ache-0.049
chard-0.049
cher-0.048
 lawyer-0.047
Token He
Token ID679
Feature activation+4.15e-03

Pos logit contributions
acks+0.004
 drifted+0.004
irmed+0.004
irt+0.003
|+0.003
Neg logit contributions
 oh-0.005
ache-0.005
 why-0.005
 lawyer-0.005
 sir-0.005
Token put
Token ID1234
Feature activation+1.16e-02

Pos logit contributions
acks+0.008
irmed+0.007
 drifted+0.007
|+0.007
irt+0.007
Neg logit contributions
ache-0.007
 oh-0.007
 why-0.006
 lawyer-0.006
cher-0.006
Token on
Token ID319
Feature activation+3.67e-02

Pos logit contributions
acks+0.023
 drifted+0.020
irmed+0.020
|+0.019
irt+0.019
Neg logit contributions
 oh-0.018
ache-0.017
 why-0.017
cher-0.016
 lawyer-0.016
Token to
Token ID284
Feature activation-4.18e-02

Pos logit contributions
ache+0.035
 oh+0.035
cher+0.033
chard+0.032
 lawyer+0.031
Neg logit contributions
acks-0.072
 drifted-0.065
irmed-0.064
|-0.062
irt-0.062
Token write
Token ID3551
Feature activation+3.31e-04

Pos logit contributions
 oh+0.073
ache+0.072
cher+0.070
chard+0.068
 lawyer+0.068
Neg logit contributions
acks-0.073
 drifted-0.063
irmed-0.061
|-0.059
irt-0.058
Token about
Token ID546
Feature activation+4.10e-04

Pos logit contributions
acks+0.000
 drifted+0.000
irt+0.000
 They+0.000
 minds+0.000
Neg logit contributions
 salon-0.001
 oh-0.001
ache-0.001
chard-0.001
cher-0.001
Token flowers
Token ID12734
Feature activation+2.68e-02

Pos logit contributions
acks+0.001
 drifted+0.001
irmed+0.001
|+0.000
irt+0.000
Neg logit contributions
ache-0.001
 oh-0.001
cher-0.001
 lawyer-0.001
chard-0.001
Token and
Token ID290
Feature activation+4.99e-02

Pos logit contributions
acks+0.038
 drifted+0.033
irt+0.029
 They+0.029
|+0.029
Neg logit contributions
 oh-0.056
cher-0.050
itor-0.050
 clerk-0.050
 lawyer-0.050
Token stars
Token ID5788
Feature activation+5.70e-02

Pos logit contributions
acks+0.080
 drifted+0.071
irmed+0.067
|+0.064
irt+0.063
Neg logit contributions
ache-0.091
 oh-0.091
cher-0.088
 lawyer-0.086
chard-0.085
Token.
Token ID13
Feature activation+2.11e-02

Pos logit contributions
acks+0.078
 drifted+0.068
irmed+0.061
irt+0.059
|+0.057
Neg logit contributions
 oh-0.120
ache-0.113
cher-0.111
chard-0.111
 lawyer-0.111
Token They
Token ID1119
Feature activation+5.37e-02

Pos logit contributions
acks+0.022
 drifted+0.017
irmed+0.017
acked+0.015
|+0.014
Neg logit contributions
 oh-0.051
 why-0.051
 lawyer-0.050
ache-0.050
 sir-0.049
Token want
Token ID765
Feature activation-2.72e-02

Pos logit contributions
acks+0.118
acked+0.102
irmed+0.100
 drifted+0.100
text+0.099
Neg logit contributions
 why-0.072
ache-0.072
 lawyer-0.066
 oh-0.065
 sir-0.065
Token to
Token ID284
Feature activation-4.68e-02

Pos logit contributions
cher+0.039
ache+0.039
 oh+0.037
chard+0.035
 lawyer+0.034
Neg logit contributions
acks-0.054
 drifted-0.050
irt-0.046
 minds-0.046
 care-0.045
Token write
Token ID3551
Feature activation-1.99e-03

Pos logit contributions
 oh+0.078
ache+0.078
 why+0.074
 lawyer+0.074
cher+0.074
Neg logit contributions
acks-0.085
 drifted-0.073
irmed-0.073
|-0.069
irt-0.068
Token that
Token ID326
Feature activation-2.46e-02

Pos logit contributions
ache+0.048
 oh+0.046
 why+0.045
cher+0.044
chard+0.043
Neg logit contributions
acks-0.054
 drifted-0.050
irt-0.046
irmed-0.046
irted-0.045
Token she
Token ID673
Feature activation+2.03e-02

Pos logit contributions
ache+0.042
 oh+0.041
cher+0.040
 why+0.039
 sir+0.039
Neg logit contributions
acks-0.043
 drifted-0.039
irmed-0.037
|-0.036
acked-0.035
Token wanted
Token ID2227
Feature activation-3.10e-02

Pos logit contributions
acks+0.036
 drifted+0.031
irmed+0.031
|+0.029
irt+0.029
Neg logit contributions
 oh-0.035
ache-0.035
 why-0.033
 lawyer-0.033
cher-0.033
Token.
Token ID13
Feature activation+1.02e-02

Pos logit contributions
ache+0.040
 oh+0.040
cher+0.038
 lawyer+0.037
chard+0.037
Neg logit contributions
acks-0.067
 drifted-0.060
irmed-0.059
|-0.056
irt-0.056
Token
Token ID198
Feature activation+2.12e-02

Pos logit contributions
acks+0.013
irmed+0.010
 drifted+0.010
 They+0.010
|+0.010
Neg logit contributions
ache-0.023
cher-0.023
 oh-0.022
chard-0.021
 why-0.021
Token
Token ID198
Feature activation+1.33e-02

Pos logit contributions
acks+0.031
 drifted+0.026
irmed+0.025
|+0.024
irt+0.023
Neg logit contributions
ache-0.043
 oh-0.043
cher-0.041
 why-0.040
 lawyer-0.040
Token"
Token ID1
Feature activation-5.65e-02

Pos logit contributions
acks+0.021
 drifted+0.018
irmed+0.018
acked+0.017
|+0.017
Neg logit contributions
ache-0.026
cher-0.025
 oh-0.025
 sir-0.024
chard-0.024
TokenMom
Token ID29252
Feature activation-1.16e-03

Pos logit contributions
ache+0.110
cher+0.110
 why+0.102
 sir+0.101
 oh+0.097
Neg logit contributions
acks-0.080
 drifted-0.076
acked-0.072
|-0.071
 evenings-0.069
Tokenmy
Token ID1820
Feature activation-3.58e-03

Pos logit contributions
chard+0.002
 oh+0.002
cher+0.002
ache+0.002
 lawyer+0.002
Neg logit contributions
acks-0.001
 night-0.001
 drifted-0.001
acked-0.001
 minds-0.001
Token,
Token ID11
Feature activation-7.06e-02

Pos logit contributions
 oh+0.006
ache+0.006
chard+0.005
cher+0.005
 lawyer+0.005
Neg logit contributions
acks-0.007
 drifted-0.006
irmed-0.006
acked-0.005
irt-0.005
Token can
Token ID460
Feature activation-6.70e-02

Pos logit contributions
ache+0.169
cher+0.158
but+0.154
chard+0.152
ich+0.150
Neg logit contributions
 imagination-0.066
 night-0.065
 care-0.062
 turns-0.059
 They-0.059
Token.
Token ID13
Feature activation-9.54e-04

Pos logit contributions
 oh+0.062
ache+0.061
chard+0.060
 lawyer+0.060
cher+0.059
Neg logit contributions
acks-0.040
 drifted-0.033
irmed-0.032
irt-0.029
irm-0.028
Token
Token ID198
Feature activation+2.47e-02

Pos logit contributions
ache+0.002
 oh+0.002
cher+0.002
chard+0.002
 why+0.002
Neg logit contributions
acks-0.001
 drifted-0.001
irmed-0.001
 They-0.001
|-0.001
Token
Token ID198
Feature activation+1.74e-02

Pos logit contributions
acks+0.038
 drifted+0.032
irmed+0.032
|+0.030
irt+0.029
Neg logit contributions
 oh-0.048
ache-0.048
 why-0.046
 lawyer-0.046
cher-0.045
TokenThey
Token ID2990
Feature activation+3.77e-02

Pos logit contributions
acks+0.030
irmed+0.026
 drifted+0.026
acked+0.025
irt+0.024
Neg logit contributions
ache-0.030
 oh-0.029
cher-0.029
chard-0.028
 why-0.028
Token argued
Token ID7189
Feature activation+1.43e-02

Pos logit contributions
acks+0.062
 drifted+0.055
irmed+0.054
|+0.050
irt+0.049
Neg logit contributions
 oh-0.068
ache-0.067
 lawyer-0.064
cher-0.064
 why-0.064
Token and
Token ID290
Feature activation+4.39e-02

Pos logit contributions
acks+0.026
 drifted+0.023
irmed+0.022
|+0.021
irt+0.021
Neg logit contributions
ache-0.024
 oh-0.024
cher-0.022
 lawyer-0.022
chard-0.022
Token argued
Token ID7189
Feature activation+2.19e-02

Pos logit contributions
acks+0.082
irmed+0.067
 drifted+0.066
irt+0.066
acked+0.065
Neg logit contributions
 oh-0.077
ache-0.072
 why-0.072
chard-0.070
cher-0.070
Token.
Token ID13
Feature activation+6.29e-03

Pos logit contributions
acks+0.037
 drifted+0.031
irmed+0.031
irt+0.029
|+0.029
Neg logit contributions
ache-0.039
 oh-0.039
cher-0.037
chard-0.037
 lawyer-0.037
Token They
Token ID1119
Feature activation+4.20e-02

Pos logit contributions
acks+0.006
 drifted+0.004
irmed+0.004
|+0.003
irt+0.003
Neg logit contributions
 oh-0.016
ache-0.016
 why-0.016
 lawyer-0.016
chard-0.016
Token did
Token ID750
Feature activation-1.43e-02

Pos logit contributions
acks+0.077
 drifted+0.069
irmed+0.068
|+0.064
irt+0.063
Neg logit contributions
ache-0.068
 oh-0.068
 lawyer-0.064
cher-0.064
 why-0.064
Token not
Token ID407
Feature activation+3.21e-03

Pos logit contributions
ache+0.013
 oh+0.013
cher+0.012
 lawyer+0.012
chard+0.012
Neg logit contributions
acks-0.036
 drifted-0.033
irmed-0.032
irt-0.032
|-0.032
Token her
Token ID607
Feature activation+9.63e-03

Pos logit contributions
acks+0.018
 drifted+0.016
irmed+0.016
irt+0.015
|+0.015
Neg logit contributions
ache-0.018
 oh-0.018
cher-0.018
 lawyer-0.017
 sir-0.017
Token favorite
Token ID4004
Feature activation+1.20e-02

Pos logit contributions
acks+0.013
 drifted+0.011
irt+0.011
irmed+0.011
 minds+0.010
Neg logit contributions
cher-0.020
ache-0.019
 oh-0.019
 why-0.018
 lawyer-0.018
Token vegetable
Token ID20236
Feature activation-6.77e-03

Pos logit contributions
acks+0.022
 drifted+0.019
irmed+0.019
|+0.018
irt+0.017
Neg logit contributions
ache-0.020
 oh-0.020
cher-0.019
 why-0.019
 lawyer-0.019
Token.
Token ID13
Feature activation+8.33e-03

Pos logit contributions
 oh+0.014
cher+0.014
ache+0.014
 sir+0.014
 lawyer+0.013
Neg logit contributions
acks-0.009
 drifted-0.007
irmed-0.007
irt-0.006
acked-0.006
Token One
Token ID1881
Feature activation+1.68e-02

Pos logit contributions
acks+0.011
 drifted+0.010
irmed+0.010
|+0.009
irt+0.009
Neg logit contributions
 oh-0.017
 why-0.017
 lawyer-0.017
ache-0.017
chard-0.016
Token day
Token ID1110
Feature activation+4.42e-03

Pos logit contributions
acks+0.022
irmed+0.019
 drifted+0.018
|+0.016
acked+0.016
Neg logit contributions
 oh-0.037
ache-0.036
 why-0.036
 lawyer-0.036
chard-0.035
Token,
Token ID11
Feature activation-6.12e-02

Pos logit contributions
acks+0.009
 drifted+0.008
irmed+0.008
|+0.008
irt+0.008
Neg logit contributions
ache-0.006
 oh-0.006
cher-0.006
 why-0.006
 lawyer-0.006
Token she
Token ID673
Feature activation+1.34e-02

Pos logit contributions
ache+0.124
cher+0.119
 sir+0.110
card+0.109
 principal+0.108
Neg logit contributions
acks-0.090
 drifted-0.083
irmed-0.078
irted-0.074
irt-0.074
Token was
Token ID373
Feature activation-7.00e-03

Pos logit contributions
acks+0.024
irmed+0.021
 drifted+0.020
|+0.020
irt+0.019
Neg logit contributions
 oh-0.022
ache-0.022
 why-0.021
 lawyer-0.021
chard-0.021
Token eating
Token ID6600
Feature activation+1.12e-02

Pos logit contributions
cher+0.014
 sir+0.014
ache+0.014
itor+0.013
ister+0.013
Neg logit contributions
acks-0.009
 drifted-0.009
irmed-0.009
irm-0.008
irt-0.008
Token caul
Token ID46087
Feature activation+1.97e-02

Pos logit contributions
acks+0.015
 drifted+0.013
irmed+0.012
irt+0.012
irted+0.011
Neg logit contributions
cher-0.024
ache-0.023
 sir-0.022
 lawyer-0.022
 principal-0.022
Token saw
Token ID2497
Feature activation-4.62e-02

Pos logit contributions
acks+0.014
irmed+0.012
irt+0.012
 drifted+0.012
|+0.012
Neg logit contributions
 oh-0.009
 why-0.009
ache-0.009
chard-0.008
cher-0.008
Token big
Token ID1263
Feature activation+3.40e-02

Pos logit contributions
ache+0.097
cher+0.094
 oh+0.092
 lawyer+0.090
chard+0.090
Neg logit contributions
acks-0.064
 drifted-0.054
irmed-0.052
irted-0.048
irt-0.048
Token trees
Token ID7150
Feature activation+8.35e-03

Pos logit contributions
acks+0.045
 drifted+0.036
irmed+0.036
 minds+0.033
irted+0.032
Neg logit contributions
ache-0.074
cher-0.073
 oh-0.072
chard-0.071
 lawyer-0.068
Token,
Token ID11
Feature activation-3.42e-02

Pos logit contributions
acks+0.015
 drifted+0.013
irmed+0.012
|+0.012
acked+0.012
Neg logit contributions
 oh-0.015
ache-0.014
cher-0.014
 lawyer-0.014
 principal-0.013
Token pretty
Token ID2495
Feature activation+5.52e-02

Pos logit contributions
ache+0.070
cher+0.068
 oh+0.068
 lawyer+0.066
 sir+0.064
Neg logit contributions
acks-0.049
 drifted-0.042
irmed-0.041
|-0.039
irt-0.038
Token flowers
Token ID12734
Feature activation+1.07e-02

Pos logit contributions
acks+0.082
 drifted+0.069
irmed+0.067
|+0.063
acked+0.062
Neg logit contributions
 oh-0.110
ache-0.110
cher-0.107
chard-0.104
 lawyer-0.104
Token,
Token ID11
Feature activation-3.68e-02

Pos logit contributions
acks+0.019
 drifted+0.017
irmed+0.016
|+0.015
acked+0.015
Neg logit contributions
 oh-0.018
ache-0.018
cher-0.017
 lawyer-0.017
chard-0.017
Token and
Token ID290
Feature activation+2.00e-02

Pos logit contributions
ache+0.074
cher+0.073
 oh+0.072
 lawyer+0.070
 sir+0.069
Neg logit contributions
acks-0.054
 drifted-0.047
irmed-0.044
|-0.042
irt-0.041
Token fast
Token ID3049
Feature activation+6.19e-02

Pos logit contributions
acks+0.030
 drifted+0.027
irmed+0.026
|+0.024
irt+0.024
Neg logit contributions
ache-0.039
 oh-0.037
cher-0.037
 sir-0.036
 lawyer-0.036
Token cars
Token ID5006
Feature activation+2.28e-02

Pos logit contributions
acks+0.106
 drifted+0.092
irmed+0.091
|+0.090
irt+0.086
Neg logit contributions
ache-0.107
 oh-0.107
 why-0.105
 lawyer-0.103
chard-0.099
Token.
Token ID13
Feature activation+5.71e-03

Pos logit contributions
acks+0.036
 drifted+0.030
irmed+0.030
|+0.028
irt+0.027
Neg logit contributions
 oh-0.044
ache-0.044
 lawyer-0.042
 why-0.042
cher-0.041
Token fun
Token ID1257
Feature activation+1.86e-02

Pos logit contributions
acks+0.032
 drifted+0.029
irmed+0.028
 minds+0.026
irt+0.026
Neg logit contributions
ache-0.028
 oh-0.027
 lawyer-0.027
cher-0.026
 principal-0.026
Token.
Token ID13
Feature activation+6.44e-03

Pos logit contributions
acks+0.035
 drifted+0.030
irmed+0.030
|+0.029
irt+0.028
Neg logit contributions
 oh-0.030
ache-0.030
 why-0.029
 lawyer-0.028
cher-0.028
Token
Token ID220
Feature activation+4.82e-02

Pos logit contributions
acks+0.011
irmed+0.010
 drifted+0.010
irted+0.010
irt+0.010
Neg logit contributions
 sir-0.010
 why-0.010
 oh-0.010
keeper-0.010
 lawyer-0.010
Token
Token ID198
Feature activation+1.94e-02

Pos logit contributions
acks+0.078
 drifted+0.069
irmed+0.068
|+0.064
irt+0.063
Neg logit contributions
ache-0.086
 oh-0.085
 why-0.083
cher-0.082
 lawyer-0.082
Token
Token ID198
Feature activation+1.22e-02

Pos logit contributions
acks+0.032
irmed+0.029
 drifted+0.028
irt+0.027
irted+0.027
Neg logit contributions
 oh-0.034
ache-0.034
cher-0.033
 why-0.033
 lawyer-0.032
TokenThe
Token ID464
Feature activation+1.08e-03

Pos logit contributions
acks+0.024
irmed+0.021
 drifted+0.021
irt+0.020
irted+0.020
Neg logit contributions
 oh-0.019
ache-0.018
cher-0.018
 why-0.018
 lawyer-0.017
Token squirrel
Token ID33039
Feature activation-2.62e-02

Pos logit contributions
acks+0.002
 drifted+0.002
irmed+0.002
|+0.001
irted+0.001
Neg logit contributions
cher-0.002
ache-0.002
 oh-0.002
chard-0.002
 why-0.002
Token thanked
Token ID26280
Feature activation-3.80e-02

Pos logit contributions
ache+0.044
 oh+0.043
 why+0.042
cher+0.041
 lawyer+0.041
Neg logit contributions
acks-0.048
irmed-0.041
 drifted-0.040
irt-0.039
|-0.039
Token the
Token ID262
Feature activation-1.47e-02

Pos logit contributions
 oh+0.066
ache+0.065
 why+0.063
 lawyer+0.063
chard+0.062
Neg logit contributions
acks-0.067
irmed-0.058
 drifted-0.057
irt-0.054
acked-0.054
Token boy
Token ID2933
Feature activation-2.59e-02

Pos logit contributions
cher+0.025
 oh+0.024
ache+0.024
chard+0.023
 salon+0.023
Neg logit contributions
acks-0.022
 drifted-0.021
irmed-0.021
|-0.020
 minds-0.020
Token and
Token ID290
Feature activation+2.44e-02

Pos logit contributions
 why+0.038
ache+0.037
 oh+0.037
 lawyer+0.035
 sir+0.035
Neg logit contributions
acks-0.052
 drifted-0.046
irmed-0.046
|-0.044
irted-0.043
Token that
Token ID326
Feature activation-3.77e-02

Pos logit contributions
ache+0.060
cher+0.057
 oh+0.056
 lawyer+0.054
 why+0.054
Neg logit contributions
acks-0.068
 drifted-0.061
irmed-0.059
irt-0.058
|-0.057
Token noise
Token ID7838
Feature activation-1.78e-04

Pos logit contributions
cher+0.054
ache+0.054
 oh+0.052
chard+0.050
itor+0.049
Neg logit contributions
acks-0.075
 drifted-0.064
irt-0.062
 night-0.061
irmed-0.060
Token?"
Token ID1701
Feature activation+6.97e-03

Pos logit contributions
ache+0.000
cher+0.000
 oh+0.000
chard+0.000
 salon+0.000
Neg logit contributions
acks-0.000
 drifted-0.000
irt-0.000
|-0.000
acked-0.000
Token The
Token ID383
Feature activation-4.85e-03

Pos logit contributions
acks+0.011
 drifted+0.009
irmed+0.009
|+0.009
irt+0.008
Neg logit contributions
 oh-0.013
ache-0.013
 why-0.013
cher-0.013
 lawyer-0.013
Token trees
Token ID7150
Feature activation-1.77e-02

Pos logit contributions
 oh+0.007
cher+0.007
ache+0.007
 why+0.007
chard+0.007
Neg logit contributions
acks-0.008
 drifted-0.008
irmed-0.007
irt-0.007
 minds-0.007
Token replied
Token ID8712
Feature activation-1.69e-02

Pos logit contributions
 oh+0.029
ache+0.028
 why+0.027
cher+0.027
 lawyer+0.027
Neg logit contributions
acks-0.033
 drifted-0.029
irmed-0.028
|-0.027
irt-0.027
Token,
Token ID11
Feature activation-6.04e-02

Pos logit contributions
 salon+0.035
 principal+0.034
 lawyer+0.033
ache+0.033
 dentist+0.033
Neg logit contributions
acks-0.022
 drifted-0.019
 They-0.018
irmed-0.017
acked-0.016
Token "
Token ID366
Feature activation-5.20e-02

Pos logit contributions
ache+0.073
cher+0.070
 lawyer+0.068
 principal+0.068
 oh+0.067
Neg logit contributions
acks-0.129
 drifted-0.124
irmed-0.115
 They-0.114
 drift-0.111
TokenIt
Token ID1026
Feature activation+1.58e-02

Pos logit contributions
 oh+0.072
ache+0.072
 lawyer+0.067
 why+0.067
chard+0.067
Neg logit contributions
acks-0.109
 drifted-0.097
irmed-0.096
irt-0.091
|-0.091
Token's
Token ID338
Feature activation-2.03e-02

Pos logit contributions
acks+0.029
irmed+0.025
 drifted+0.025
|+0.024
irt+0.023
Neg logit contributions
 why-0.026
ache-0.026
 oh-0.026
cher-0.024
 lawyer-0.024
Token just
Token ID655
Feature activation+4.25e-02

Pos logit contributions
ache+0.032
 oh+0.032
cher+0.030
 lawyer+0.030
 why+0.030
Neg logit contributions
acks-0.038
 drifted-0.034
irmed-0.033
|-0.032
irt-0.031
Token day
Token ID1110
Feature activation-3.62e-03

Pos logit contributions
 lawyer+0.086
 oh+0.086
ache+0.084
 why+0.084
 worker+0.082
Neg logit contributions
acks-0.049
irmed-0.040
 drifted-0.039
|-0.036
irt-0.035
Token,
Token ID11
Feature activation-7.02e-02

Pos logit contributions
ache+0.005
 oh+0.005
cher+0.005
chard+0.005
 lawyer+0.005
Neg logit contributions
acks-0.007
 drifted-0.007
irmed-0.006
|-0.006
irt-0.006
Token they
Token ID484
Feature activation+3.17e-02

Pos logit contributions
 oh+0.119
ache+0.116
 why+0.113
 lawyer+0.111
cher+0.111
Neg logit contributions
acks-0.127
 drifted-0.110
irmed-0.109
|-0.104
irt-0.102
Token cooked
Token ID15847
Feature activation-2.32e-02

Pos logit contributions
acks+0.053
irmed+0.045
 drifted+0.045
|+0.042
irt+0.042
Neg logit contributions
ache-0.057
 oh-0.057
 why-0.055
 lawyer-0.054
cher-0.054
Token the
Token ID262
Feature activation-7.81e-03

Pos logit contributions
 oh+0.040
ache+0.040
 why+0.038
 lawyer+0.038
cher+0.038
Neg logit contributions
acks-0.040
 drifted-0.035
irmed-0.035
|-0.033
irt-0.032
Token sa
Token ID473
Feature activation-1.72e-02

Pos logit contributions
 oh+0.013
ache+0.013
 why+0.012
 lawyer+0.012
cher+0.012
Neg logit contributions
acks-0.015
 drifted-0.013
irmed-0.013
|-0.012
irt-0.012
Tokenus
Token ID385
Feature activation-1.74e-02

Pos logit contributions
ache+0.021
cher+0.021
 oh+0.020
 clerk+0.018
chard+0.018
Neg logit contributions
acks-0.038
 drifted-0.035
irmed-0.035
|-0.035
irm-0.033
Tokenages
Token ID1095
Feature activation-1.64e-02

Pos logit contributions
ache+0.027
 oh+0.025
cher+0.024
 lawyer+0.024
 why+0.024
Neg logit contributions
acks-0.035
irmed-0.031
 drifted-0.031
|-0.030
irted-0.029
Token and
Token ID290
Feature activation+3.17e-02

Pos logit contributions
ache+0.025
 oh+0.025
 why+0.024
cher+0.024
chard+0.023
Neg logit contributions
acks-0.032
irmed-0.028
 drifted-0.028
|-0.027
irt-0.026
Token had
Token ID550
Feature activation+3.57e-02

Pos logit contributions
acks+0.051
 drifted+0.046
irmed+0.043
|+0.041
irted+0.040
Neg logit contributions
ache-0.058
 oh-0.057
cher-0.055
 lawyer-0.055
 why-0.054
Token a
Token ID257
Feature activation-3.25e-02

Pos logit contributions
acks+0.069
irmed+0.060
 drifted+0.058
|+0.058
irt+0.056
Neg logit contributions
 oh-0.056
ache-0.055
 why-0.054
chard-0.052
 lawyer-0.052
Token again
Token ID757
Feature activation+4.58e-02

Pos logit contributions
 oh+0.008
ache+0.008
 why+0.007
 lawyer+0.007
chard+0.007
Neg logit contributions
acks-0.007
 drifted-0.006
irmed-0.006
|-0.006
irt-0.005
Token and
Token ID290
Feature activation+3.51e-02

Pos logit contributions
acks+0.078
 drifted+0.068
irmed+0.065
irt+0.061
|+0.061
Neg logit contributions
 oh-0.083
ache-0.082
cher-0.079
 lawyer-0.078
chard-0.078
Token w
Token ID266
Feature activation+2.34e-02

Pos logit contributions
acks+0.056
 drifted+0.052
irmed+0.049
irted+0.046
|+0.045
Neg logit contributions
ache-0.065
 oh-0.062
 lawyer-0.062
cher-0.060
 sir-0.059
Tokenadd
Token ID2860
Feature activation+7.49e-02

Pos logit contributions
acks+0.030
|+0.028
 harmony+0.025
 drift+0.025
 drifted+0.025
Neg logit contributions
ache-0.045
 oh-0.043
 why-0.043
cher-0.041
 worker-0.040
Tokenled
Token ID992
Feature activation+5.13e-02

Pos logit contributions
acks+0.113
|+0.098
 drifted+0.091
 evenings+0.090
 harmony+0.090
Neg logit contributions
ache-0.139
 oh-0.132
cher-0.127
 why-0.127
 worker-0.124
Token to
Token ID284
Feature activation-2.91e-02

Pos logit contributions
acks+0.105
|+0.093
 drifted+0.090
irmed+0.090
text+0.090
Neg logit contributions
 oh-0.074
 why-0.073
chard-0.067
ache-0.067
 lawyer-0.062
Token the
Token ID262
Feature activation-9.10e-03

Pos logit contributions
 oh+0.049
ache+0.047
 why+0.047
chard+0.046
 lawyer+0.045
Neg logit contributions
acks-0.053
 drifted-0.046
irmed-0.046
acked-0.043
|-0.043
Token pond
Token ID16723
Feature activation-1.99e-02

Pos logit contributions
ache+0.016
 oh+0.016
cher+0.015
 why+0.015
 lawyer+0.015
Neg logit contributions
acks-0.016
 drifted-0.014
irmed-0.013
irt-0.013
|-0.013
Token.
Token ID13
Feature activation+6.55e-03

Pos logit contributions
ache+0.035
 oh+0.035
 why+0.034
chard+0.033
cher+0.033
Neg logit contributions
acks-0.034
 drifted-0.029
irmed-0.029
|-0.027
irt-0.027
Token Lily
Token ID20037
Feature activation+1.61e-02

Pos logit contributions
acks+0.011
 drifted+0.010
irmed+0.010
|+0.009
irt+0.009
Neg logit contributions
 oh-0.011
ache-0.011
 why-0.011
 lawyer-0.011
chard-0.011
Token followed
Token ID3940
Feature activation-1.73e-02

Pos logit contributions
acks+0.029
irmed+0.025
 drifted+0.025
|+0.024
irt+0.023
Neg logit contributions
ache-0.027
 oh-0.027
 why-0.026
cher-0.025
 lawyer-0.025
Token dangerous
Token ID4923
Feature activation-4.63e-03

Pos logit contributions
acks+0.015
 drifted+0.013
irmed+0.013
|+0.013
acked+0.012
Neg logit contributions
 oh-0.011
ache-0.011
 why-0.011
chard-0.010
cher-0.010
Token.
Token ID13
Feature activation+5.80e-03

Pos logit contributions
 oh+0.009
ache+0.009
chard+0.009
 lawyer+0.009
cher+0.009
Neg logit contributions
acks-0.007
 drifted-0.006
irmed-0.006
irt-0.005
acked-0.005
Token He
Token ID679
Feature activation+8.53e-03

Pos logit contributions
acks+0.009
 drifted+0.008
irmed+0.008
|+0.007
irt+0.007
Neg logit contributions
 oh-0.011
ache-0.011
 why-0.011
cher-0.011
 lawyer-0.011
Token did
Token ID750
Feature activation-2.19e-02

Pos logit contributions
acks+0.016
 drifted+0.014
irmed+0.014
|+0.013
irt+0.013
Neg logit contributions
 oh-0.014
ache-0.014
 why-0.013
 lawyer-0.013
cher-0.013
Token not
Token ID407
Feature activation-4.39e-03

Pos logit contributions
ache+0.019
 oh+0.019
cher+0.017
 lawyer+0.017
chard+0.017
Neg logit contributions
acks-0.057
 drifted-0.052
irmed-0.051
|-0.050
irt-0.049
Token want
Token ID765
Feature activation-3.94e-02

Pos logit contributions
ache+0.008
 oh+0.008
 why+0.007
cher+0.007
 lawyer+0.007
Neg logit contributions
acks-0.008
 drifted-0.007
irmed-0.007
|-0.006
irt-0.006
Token to
Token ID284
Feature activation-4.96e-02

Pos logit contributions
ache+0.045
 oh+0.044
cher+0.042
chard+0.040
 lawyer+0.040
Neg logit contributions
acks-0.092
 drifted-0.083
irmed-0.082
|-0.079
irt-0.078
Token try
Token ID1949
Feature activation-2.27e-02

Pos logit contributions
 oh+0.087
ache+0.087
cher+0.083
 lawyer+0.082
 why+0.082
Neg logit contributions
acks-0.086
 drifted-0.074
irmed-0.073
|-0.070
irt-0.069
Token it
Token ID340
Feature activation+5.43e-03

Pos logit contributions
ache+0.043
cher+0.042
 oh+0.041
 clerk+0.041
chard+0.040
Neg logit contributions
acks-0.036
 drifted-0.029
irt-0.028
|-0.028
irmed-0.028
Token.
Token ID13
Feature activation+8.33e-03

Pos logit contributions
acks+0.009
 drifted+0.007
irmed+0.007
|+0.007
irt+0.007
Neg logit contributions
ache-0.011
 oh-0.010
 lawyer-0.010
chard-0.010
cher-0.010
Token He
Token ID679
Feature activation+1.06e-02

Pos logit contributions
acks+0.012
 drifted+0.010
irmed+0.010
|+0.009
irt+0.009
Neg logit contributions
ache-0.017
 oh-0.017
cher-0.016
 lawyer-0.016
 why-0.016
Token\n
Token ID198
Feature activation+1.78e-02

Pos logit contributions
acks+0.043
 drifted+0.037
irmed+0.037
irt+0.034
|+0.034
Neg logit contributions
 oh-0.052
ache-0.051
 why-0.050
 lawyer-0.049
chard-0.049
TokenShe
Token ID3347
Feature activation+2.24e-03

Pos logit contributions
acks+0.028
 drifted+0.024
irmed+0.024
|+0.022
irt+0.022
Neg logit contributions
 oh-0.034
ache-0.033
 why-0.032
 lawyer-0.032
chard-0.032
Token went
Token ID1816
Feature activation-8.38e-03

Pos logit contributions
acks+0.004
irmed+0.003
 drifted+0.003
|+0.003
irt+0.003
Neg logit contributions
 oh-0.004
ache-0.004
 why-0.004
 lawyer-0.003
chard-0.003
Token to
Token ID284
Feature activation-3.19e-02

Pos logit contributions
ache+0.016
 oh+0.015
 sir+0.014
keys+0.014
 lawyer+0.013
Neg logit contributions
acks-0.015
 drifted-0.012
|-0.012
 night-0.012
irmed-0.012
Token the
Token ID262
Feature activation-1.03e-02

Pos logit contributions
ache+0.059
 oh+0.059
cher+0.056
 why+0.056
 lawyer+0.056
Neg logit contributions
acks-0.052
 drifted-0.044
irmed-0.044
|-0.041
irt-0.041
Token living
Token ID2877
Feature activation-3.20e-02

Pos logit contributions
ache+0.021
cher+0.020
 oh+0.019
ver+0.019
 why+0.018
Neg logit contributions
acks-0.014
 night-0.014
 drifted-0.013
 afternoon-0.012
 evening-0.012
Token room
Token ID2119
Feature activation-2.76e-02

Pos logit contributions
ache+0.052
 oh+0.051
cher+0.049
 lawyer+0.048
 why+0.048
Neg logit contributions
acks-0.060
 drifted-0.053
irmed-0.052
|-0.050
irt-0.050
Token,
Token ID11
Feature activation-5.60e-02

Pos logit contributions
ache+0.048
 oh+0.046
cher+0.045
chard+0.044
 lawyer+0.043
Neg logit contributions
acks-0.050
 drifted-0.044
irmed-0.042
|-0.039
irt-0.039
Token where
Token ID810
Feature activation-4.65e-02

Pos logit contributions
ache+0.077
cher+0.069
 lawyer+0.067
 oh+0.066
 sir+0.065
Neg logit contributions
acks-0.118
 drifted-0.113
irmed-0.106
|-0.103
irt-0.101
Token her
Token ID607
Feature activation-3.83e-02

Pos logit contributions
ache+0.092
cher+0.088
 oh+0.086
 lawyer+0.081
 sir+0.081
Neg logit contributions
acks-0.068
 drifted-0.064
irmed-0.058
 night-0.057
|-0.056
Token brother
Token ID3956
Feature activation-2.85e-02

Pos logit contributions
cher+0.062
ache+0.062
 oh+0.060
chard+0.057
 why+0.056
Neg logit contributions
 drifted-0.066
acks-0.064
 minds-0.061
irmed-0.060
 night-0.059
Token Lily
Token ID20037
Feature activation+1.31e-02

Pos logit contributions
ache+0.118
 oh+0.114
cher+0.111
 lawyer+0.109
 why+0.109
Neg logit contributions
acks-0.109
 drifted-0.095
irmed-0.092
|-0.090
irt-0.087
Token wanted
Token ID2227
Feature activation-4.03e-02

Pos logit contributions
acks+0.024
irmed+0.021
|+0.020
irt+0.020
 drifted+0.020
Neg logit contributions
ache-0.022
 why-0.020
 oh-0.020
 lawyer-0.020
cher-0.019
Token to
Token ID284
Feature activation-5.19e-02

Pos logit contributions
cher+0.061
ache+0.059
 oh+0.054
 sir+0.054
chard+0.053
Neg logit contributions
acks-0.077
 drifted-0.071
 night-0.068
 care-0.067
irmed-0.066
Token spin
Token ID7906
Feature activation+2.04e-02

Pos logit contributions
 oh+0.097
ache+0.097
cher+0.093
chard+0.092
 lawyer+0.090
Neg logit contributions
acks-0.084
 drifted-0.072
irmed-0.068
|-0.067
irt-0.065
Token more
Token ID517
Feature activation+1.91e-02

Pos logit contributions
acks+0.038
 drifted+0.033
irmed+0.033
|+0.031
irt+0.031
Neg logit contributions
ache-0.033
 oh-0.033
 lawyer-0.031
cher-0.031
chard-0.031
Token,
Token ID11
Feature activation-6.64e-02

Pos logit contributions
acks+0.033
 drifted+0.028
irmed+0.028
|+0.026
irt+0.026
Neg logit contributions
 oh-0.034
ache-0.034
cher-0.032
 lawyer-0.032
 why-0.032
Token but
Token ID475
Feature activation-1.78e-02

Pos logit contributions
ache+0.093
cher+0.087
 lawyer+0.082
chard+0.081
itor+0.079
Neg logit contributions
acks-0.140
 drifted-0.128
irmed-0.124
irm-0.119
irt-0.119
Token it
Token ID340
Feature activation+1.51e-02

Pos logit contributions
ache+0.033
cher+0.031
 lawyer+0.028
card+0.028
itor+0.028
Neg logit contributions
 drifted-0.027
acks-0.026
irmed-0.025
 night-0.024
 evening-0.023
Token started
Token ID2067
Feature activation-1.51e-03

Pos logit contributions
acks+0.029
 drifted+0.025
irmed+0.025
|+0.024
irt+0.024
Neg logit contributions
 oh-0.024
ache-0.023
 why-0.022
 lawyer-0.022
cher-0.022
Token to
Token ID284
Feature activation-2.12e-02

Pos logit contributions
 oh+0.002
ache+0.002
 why+0.002
 lawyer+0.001
cher+0.001
Neg logit contributions
acks-0.004
 drifted-0.003
irmed-0.003
|-0.003
irt-0.003
Token rain
Token ID6290
Feature activation-6.90e-03

Pos logit contributions
ache+0.034
 oh+0.034
 why+0.033
 lawyer+0.032
chard+0.031
Neg logit contributions
acks-0.040
irmed-0.035
 drifted-0.034
acked-0.034
irted-0.033
Token Tim
Token ID5045
Feature activation+1.64e-02

Pos logit contributions
acks+0.032
irmed+0.028
 drifted+0.028
irt+0.026
|+0.026
Neg logit contributions
 oh-0.024
ache-0.023
 why-0.023
 lawyer-0.023
chard-0.022
Token rode
Token ID22075
Feature activation+2.07e-02

Pos logit contributions
acks+0.031
irmed+0.027
 drifted+0.026
irt+0.026
irted+0.025
Neg logit contributions
ache-0.025
 oh-0.024
 lawyer-0.024
 why-0.024
cher-0.023
Token the
Token ID262
Feature activation+3.74e-03

Pos logit contributions
acks+0.030
 drifted+0.025
irmed+0.025
|+0.023
acked+0.023
Neg logit contributions
ache-0.043
 oh-0.041
cher-0.040
chard-0.039
 lawyer-0.039
Token sc
Token ID629
Feature activation-3.00e-02

Pos logit contributions
acks+0.008
 drifted+0.007
irmed+0.007
|+0.007
irt+0.007
Neg logit contributions
 oh-0.005
ache-0.005
 why-0.005
cher-0.005
chard-0.005
Tokenooter
Token ID25141
Feature activation-3.09e-02

Pos logit contributions
ache+0.020
 oh+0.020
cher+0.018
 lawyer+0.018
 why+0.017
Neg logit contributions
acks-0.084
 drifted-0.077
irmed-0.076
|-0.074
irt-0.073
Token,
Token ID11
Feature activation-5.93e-02

Pos logit contributions
 oh+0.054
ache+0.053
chard+0.051
cher+0.051
 why+0.049
Neg logit contributions
acks-0.054
 drifted-0.046
irmed-0.044
acked-0.042
|-0.042
Token he
Token ID339
Feature activation+1.35e-02

Pos logit contributions
ache+0.099
cher+0.091
 sir+0.088
 oh+0.088
 lawyer+0.087
Neg logit contributions
acks-0.104
 drifted-0.097
irmed-0.093
|-0.090
 evenings-0.084
Token went
Token ID1816
Feature activation-1.13e-02

Pos logit contributions
acks+0.022
 drifted+0.021
irmed+0.020
|+0.018
acked+0.018
Neg logit contributions
 oh-0.023
ache-0.023
 sir-0.022
chard-0.022
 lawyer-0.022
Token very
Token ID845
Feature activation+6.31e-03

Pos logit contributions
 oh+0.020
ache+0.020
cher+0.017
chard+0.017
 sir+0.017
Neg logit contributions
acks-0.020
 drifted-0.017
irmed-0.016
|-0.016
irm-0.016
Token fast
Token ID3049
Feature activation+5.36e-02

Pos logit contributions
acks+0.011
 drifted+0.010
irmed+0.010
|+0.009
irt+0.009
Neg logit contributions
 oh-0.011
ache-0.011
 why-0.010
 lawyer-0.010
chard-0.010
Token.
Token ID13
Feature activation-1.76e-03

Pos logit contributions
acks+0.097
 drifted+0.085
irmed+0.084
|+0.079
irt+0.078
Neg logit contributions
 oh-0.089
ache-0.088
cher-0.084
chard-0.084
 why-0.084
Token.
Token ID13
Feature activation-1.55e-03

Pos logit contributions
acks+0.029
 drifted+0.028
 drift+0.023
irmed+0.023
acked+0.022
Neg logit contributions
 oh-0.055
 principal-0.052
cher-0.052
 lawyer-0.051
chard-0.051
Token She
Token ID1375
Feature activation+3.93e-03

Pos logit contributions
ache+0.003
 oh+0.003
cher+0.003
chard+0.003
 lawyer+0.003
Neg logit contributions
acks-0.002
 drifted-0.002
irmed-0.002
|-0.002
irt-0.002
Token said
Token ID531
Feature activation-3.32e-02

Pos logit contributions
acks+0.007
 drifted+0.006
irmed+0.006
|+0.006
irt+0.006
Neg logit contributions
 oh-0.006
ache-0.006
 why-0.006
 lawyer-0.006
cher-0.006
Token,
Token ID11
Feature activation-7.10e-02

Pos logit contributions
ache+0.070
chard+0.067
cher+0.067
 principal+0.067
 dentist+0.066
Neg logit contributions
acks-0.037
 drifted-0.035
 They-0.033
irmed-0.031
 hands-0.030
Token "
Token ID366
Feature activation-5.11e-02

Pos logit contributions
ache+0.111
cher+0.107
 oh+0.101
 salon+0.100
 principal+0.100
Neg logit contributions
acks-0.126
 drifted-0.123
 They-0.121
 Soon-0.110
 drift-0.110
TokenCan
Token ID6090
Feature activation-8.19e-02

Pos logit contributions
 oh+0.077
ache+0.076
 lawyer+0.072
 why+0.071
cher+0.071
Neg logit contributions
acks-0.103
 drifted-0.090
irmed-0.090
|-0.085
irt-0.085
Token I
Token ID314
Feature activation-1.93e-02

Pos logit contributions
ache+0.157
cher+0.146
 oh+0.140
card+0.136
chard+0.134
Neg logit contributions
acks-0.113
 drifted-0.108
|-0.107
 They-0.101
 minds-0.098
Token strike
Token ID5587
Feature activation-2.06e-02

Pos logit contributions
ache+0.039
 oh+0.039
chard+0.037
cher+0.036
 lawyer+0.036
Neg logit contributions
acks-0.028
 drifted-0.024
|-0.023
irmed-0.022
irt-0.022
Token the
Token ID262
Feature activation-1.63e-02

Pos logit contributions
ache+0.038
 oh+0.036
cher+0.035
 lawyer+0.035
chard+0.034
Neg logit contributions
acks-0.035
 drifted-0.029
irmed-0.029
irt-0.028
|-0.028
Token key
Token ID1994
Feature activation-4.77e-02

Pos logit contributions
 oh+0.023
ache+0.023
cher+0.023
 why+0.022
chard+0.021
Neg logit contributions
acks-0.030
 drifted-0.029
 minds-0.027
irmed-0.027
irt-0.027
Token?"
Token ID1701
Feature activation+1.17e-02

Pos logit contributions
 oh+0.080
ache+0.080
 paramed+0.074
 principal+0.074
 lawyer+0.073
Neg logit contributions
acks-0.082
irt-0.069
 night-0.069
|-0.069
 drifted-0.065
Token day
Token ID1110
Feature activation+2.04e-02

Pos logit contributions
 lawyer+0.039
 oh+0.038
 why+0.038
 office+0.035
 stealing+0.035
Neg logit contributions
acks-0.042
irmed-0.037
irted-0.035
irt-0.033
irm-0.033
Token on
Token ID319
Feature activation+5.08e-02

Pos logit contributions
acks+0.048
irmed+0.042
 drifted+0.042
|+0.041
irt+0.040
Neg logit contributions
 oh-0.024
ache-0.023
cher-0.022
 lawyer-0.022
 why-0.022
Token,
Token ID11
Feature activation-6.54e-02

Pos logit contributions
acks+0.126
irmed+0.115
irt+0.112
 drifted+0.111
|+0.110
Neg logit contributions
 oh-0.050
 why-0.049
 lawyer-0.048
ache-0.046
 sir-0.045
Token Prince
Token ID9005
Feature activation-2.23e-02

Pos logit contributions
 lawyer+0.093
 oh+0.091
 why+0.090
chard+0.089
ache+0.086
Neg logit contributions
acks-0.137
irmed-0.125
 drifted-0.118
acked-0.117
irted-0.115
Token Jack
Token ID3619
Feature activation+5.25e-03

Pos logit contributions
ache+0.036
 oh+0.036
 why+0.034
 lawyer+0.034
cher+0.034
Neg logit contributions
acks-0.042
 drifted-0.036
irmed-0.036
|-0.034
irt-0.034
Token knew
Token ID2993
Feature activation-6.51e-02

Pos logit contributions
acks+0.011
irmed+0.009
irt+0.009
 drifted+0.009
|+0.009
Neg logit contributions
ache-0.008
 why-0.007
 oh-0.007
 lawyer-0.007
cher-0.007
Token he
Token ID339
Feature activation-4.37e-03

Pos logit contributions
ache+0.093
chard+0.090
 oh+0.086
cher+0.085
itor+0.083
Neg logit contributions
acks-0.122
 drifted-0.121
irmed-0.109
 night-0.105
 evenings-0.104
Token wanted
Token ID2227
Feature activation-3.96e-02

Pos logit contributions
 oh+0.006
ache+0.006
 why+0.006
cher+0.006
 lawyer+0.006
Neg logit contributions
acks-0.009
 drifted-0.008
irmed-0.008
|-0.008
irt-0.008
Token to
Token ID284
Feature activation-4.78e-02

Pos logit contributions
ache+0.050
 oh+0.049
cher+0.048
chard+0.047
 sir+0.044
Neg logit contributions
acks-0.087
 drifted-0.078
irmed-0.075
 minds-0.072
irt-0.072
Token be
Token ID307
Feature activation+8.35e-04

Pos logit contributions
 oh+0.088
ache+0.087
chard+0.084
cher+0.082
 lawyer+0.081
Neg logit contributions
acks-0.078
 drifted-0.068
irmed-0.062
|-0.061
irt-0.061
Token a
Token ID257
Feature activation-4.46e-02

Pos logit contributions
acks+0.001
 drifted+0.001
irmed+0.001
irt+0.001
irm+0.001
Neg logit contributions
ache-0.001
 oh-0.001
 sir-0.001
 salon-0.001
chard-0.001
Token stay
Token ID2652
Feature activation+2.32e-02

Pos logit contributions
 oh+0.079
ache+0.078
 why+0.075
 lawyer+0.074
cher+0.074
Neg logit contributions
acks-0.073
 drifted-0.063
irmed-0.062
|-0.059
irt-0.058
Token and
Token ID290
Feature activation+1.87e-02

Pos logit contributions
acks+0.048
 drifted+0.043
irmed+0.042
|+0.040
irt+0.040
Neg logit contributions
 oh-0.033
ache-0.032
 why-0.031
cher-0.030
 lawyer-0.030
Token watch
Token ID2342
Feature activation-2.58e-03

Pos logit contributions
acks+0.025
 drifted+0.022
irmed+0.020
|+0.020
irt+0.018
Neg logit contributions
ache-0.040
 oh-0.039
cher-0.038
 lawyer-0.038
 sir-0.037
Token the
Token ID262
Feature activation-6.31e-03

Pos logit contributions
ache+0.005
 oh+0.005
cher+0.005
 lawyer+0.005
chard+0.005
Neg logit contributions
acks-0.004
 drifted-0.003
irmed-0.003
|-0.003
irt-0.003
Token sunset
Token ID26428
Feature activation+6.97e-03

Pos logit contributions
 oh+0.013
ache+0.013
 lawyer+0.013
chard+0.013
 why+0.013
Neg logit contributions
acks-0.009
irmed-0.007
 drifted-0.007
|-0.006
irt-0.006
Token,
Token ID11
Feature activation-5.78e-02

Pos logit contributions
acks+0.012
 drifted+0.010
irmed+0.010
|+0.009
irt+0.009
Neg logit contributions
 oh-0.013
ache-0.012
cher-0.012
 lawyer-0.012
chard-0.012
Token so
Token ID523
Feature activation+8.60e-03

Pos logit contributions
ache+0.061
 oh+0.061
cher+0.056
 lawyer+0.056
 why+0.055
Neg logit contributions
acks-0.140
 drifted-0.127
irmed-0.125
|-0.121
irt-0.119
Token she
Token ID673
Feature activation+5.76e-03

Pos logit contributions
acks+0.019
 drifted+0.017
irmed+0.017
|+0.016
irt+0.016
Neg logit contributions
 oh-0.011
ache-0.011
 why-0.010
 lawyer-0.010
cher-0.010
Token found
Token ID1043
Feature activation-4.98e-02

Pos logit contributions
acks+0.010
 drifted+0.009
irmed+0.009
|+0.008
irt+0.008
Neg logit contributions
 oh-0.010
ache-0.010
 why-0.009
 lawyer-0.009
cher-0.009
Token a
Token ID257
Feature activation-4.49e-02

Pos logit contributions
ache+0.095
 lawyer+0.089
 oh+0.089
cher+0.088
chard+0.087
Neg logit contributions
acks-0.078
 drifted-0.070
irmed-0.067
irt-0.061
acked-0.061
Token big
Token ID1263
Feature activation+3.42e-03

Pos logit contributions
cher+0.088
 oh+0.087
ache+0.086
 why+0.085
itor+0.082
Neg logit contributions
acks-0.055
 drifted-0.052
 night-0.051
 harmony-0.049
 evening-0.049