TOP ACTIVATIONS
MAX = 0.240

Her mom my was heart broken and missed her little girl
But George was heart - broken . He had lost his
ia is shocked and heart broken . She does not understand
. Tim my was heart broken and cried again . His
the baby deer was heart broken . She had been fooled
and her family was heart broken . Remember
. Her mom was heart broken and promised to always remember
. It 's okay . Acc idents happen . Just be
. Tweet ie was heart broken and wished she could have
. Her parents were heart broken and knew she must have
leaving his parents forever heart broken . <|endoftext|> M ia was
. The princess was heart broken and didn 't know what
anymore . Lily was heart broken and cried all day .
mother was sad and heart broken , and she reminded her
was so tired and heart broken that he just stayed there
married . Lily was heart broken and cried a lot .
before . Lily was heart broken . " It 's not
again . He was heart broken and never wanted to go
okay , sweet ie . Acc idents happen . Let 's
's okay , Fiona . Acc idents happen ." Then they

MIN ACTIVATIONS
MIN = -0.280

M ummy used her fing ern ails to gently pull the
not bigger than a fing ern ail . They also found
the dove with her fing ern ails . The dove was
lots of words and their meanings ". M olly
new words and learn their meanings . One day , Tim
all the words and their meanings in it .
. They have stories and meanings that you may not understand
saw many words and their meanings . She tried to read
lots of words and their meanings . Lily looked through the
been caught in some b ram bles and so he carefully
try to loosen the b ram bles . As he was
was pin ching the b ram bles , the raven fl
as he opened the ch ocol ates and started eating them
They knew there were crocod iles in the water , but
colorful cand ies and ch ocol ates . She said to
by telling us . Miss iles can be very dangerous .
waved it at the crocod iles . He tried to scare
not eat too many ch ocol ates and his friend learned
can 't . Cro cod iles are dangerous . They have
and started eating the ch ocol ates . The shop keeper

INTERVAL 0.110 - 0.240
CONTAINS 1.724%

that he forgot why he came to the house in the
they had lots of fun together . <|endoftext|> Ben and Mom
When the little girl woke up , she was so
time , there was a little boy named Tim my .
the old nation had built together . <|endoftext|> Once upon a

INTERVAL -0.020 - 0.110
CONTAINS 69.711%

I do !" Lily says . She hugs the robot .
up . She loved the sound of the violin and wanted
S ally cl apped her hands and laughed . “
, he felt a bit scared , but also very brave
old . He wanted to give a present to his mum

INTERVAL -0.150 - -0.020
CONTAINS 27.749%

“ Old man , why do you whistle ? It
but he does not smell them . They are
and loses his balance . He falls off his bike and
careful not to spill the y olk or the white .
The mixer waved to the clouds and asked if they would

INTERVAL -0.280 - -0.150
CONTAINS 0.816%

She also realized that asking for help was not bad ,
They liked to pick the fruits and share them with their
J enny stayed there for the rest of the day
of exciting things to pay for . She excited ly asked
One day , he went for a walk with his brain
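
NOTE ON THE INTERVALS
The four interval boundaries listed above are consistent with splitting the range from MIN (-0.280) to MAX (0.240) into four equal-width bins, and the four CONTAINS percentages add up to 100%. The sketch below reproduces that arithmetic. The equal-width assumption and the helper name bin_edges are illustrative guesses; they are not taken from the tool that produced this dump.

```python
def bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 equal-width edges between lo and hi (assumed scheme)."""
    width = (hi - lo) / n_bins
    # Round to three decimals to match the precision shown in the dump.
    return [round(lo + i * width, 3) for i in range(n_bins + 1)]


# MIN and MAX activations as listed in the dump.
edges = bin_edges(-0.280, 0.240, 4)

# CONTAINS percentages, ordered from the lowest interval to the highest.
shares = [0.816, 27.749, 69.711, 1.724]
total = round(sum(shares), 3)

print(edges)
print(total)
```

Under this reading, roughly 97% of sampled activations fall in the two middle bins, so the TOP and MIN examples come from thin tails of the distribution.
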
Token Her
Token ID2332
Feature activation+1.83e-02

Pos logit contributions
ern+0.003
zie+0.003
 meanings+0.003
ad+0.003
 herself+0.003
Neg logit contributions
broken-0.005
 divorce-0.004
burn-0.004
 helicopter-0.004
 band-0.004
Token mom
Token ID1995
Feature activation+3.77e-02

Pos logit contributions
AAAA+0.036
B+0.036
arre+0.036
 late+0.035
 divorce+0.035
Neg logit contributions
 meanings-0.017
 herself-0.015
 answers-0.015
ad-0.014
sey-0.013
Tokenmy
Token ID1820
Feature activation+4.06e-04

Pos logit contributions
 divorce+0.089
AAAA+0.086
 late+0.082
apy+0.080
 helicopter+0.078
Neg logit contributions
ad-0.029
zie-0.029
et-0.028
sey-0.024
ern-0.023
Token was
Token ID373
Feature activation+7.37e-03

Pos logit contributions
 divorce+0.001
 late+0.001
AAAA+0.001
apy+0.001
des+0.001
Neg logit contributions
ad-0.000
et-0.000
ima-0.000
i-0.000
sey-0.000
Token heart
Token ID2612
Feature activation+7.61e-02

Pos logit contributions
ever+0.011
oing+0.011
AAAA+0.011
ite+0.010
Sorry+0.010
Neg logit contributions
 herself-0.009
ad-0.009
ern-0.009
wered-0.008
 for-0.008
Tokenbroken
Token ID25826
Feature activation+2.16e-01

Pos logit contributions
AAAA+0.056
 divorce+0.051
Sorry+0.037
Uh+0.036
Quick+0.036
Neg logit contributions
et-0.171
ern-0.170
ad-0.169
ales-0.166
sey-0.166
Token and
Token ID290
Feature activation+4.90e-02

Pos logit contributions
broken+0.258
 divorce+0.252
 late+0.241
 band+0.216
 severe+0.216
Neg logit contributions
ern-0.408
iles-0.395
ales-0.391
ad-0.388
zie-0.385
Token missed
Token ID6825
Feature activation+6.89e-02

Pos logit contributions
broken+0.066
 divorce+0.065
burn+0.063
AAAA+0.062
ooming+0.061
Neg logit contributions
 meanings-0.074
ad-0.074
ern-0.073
zie-0.073
et-0.071
Token her
Token ID607
Feature activation-2.46e-02

Pos logit contributions
 divorce+0.117
broken+0.116
 late+0.113
 band+0.106
 severe+0.105
Neg logit contributions
ern-0.097
iles-0.093
ales-0.092
ad-0.089
 Lucas-0.088
Token little
Token ID1310
Feature activation+1.27e-01

Pos logit contributions
 answers+0.032
 meanings+0.031
ern+0.030
ad+0.029
et+0.029
Neg logit contributions
AAAA-0.043
Sorry-0.040
 divorce-0.038
arre-0.038
broken-0.038
Token girl
Token ID2576
Feature activation+7.21e-02

Pos logit contributions
broken+0.145
 divorce+0.144
AAAA+0.141
 late+0.134
 severe+0.132
Neg logit contributions
 answers-0.231
ern-0.220
et-0.217
ad-0.215
 meanings-0.215
TokenBut
Token ID1537
Feature activation-6.29e-02

Pos logit contributions
 divorce+0.060
broken+0.055
burn+0.052
 unst+0.050
 late+0.050
Neg logit contributions
ern-0.062
ad-0.061
iles-0.059
 Lucas-0.057
zie-0.057
Token George
Token ID4502
Feature activation-3.57e-03

Pos logit contributions
ern+0.089
ad+0.084
iles+0.084
zie+0.084
 Lucas+0.084
Neg logit contributions
broken-0.102
 divorce-0.102
 late-0.093
AAAA-0.090
 severe-0.090
Token was
Token ID373
Feature activation+1.02e-02

Pos logit contributions
ern+0.007
ales+0.007
iles+0.006
ad+0.006
 meanings+0.006
Neg logit contributions
broken-0.004
 divorce-0.004
 late-0.004
 severe-0.004
 band-0.004
Token heart
Token ID2612
Feature activation+7.49e-02

Pos logit contributions
 divorce+0.014
broken+0.014
 late+0.013
 severe+0.012
 helicopter+0.012
Neg logit contributions
ern-0.017
ad-0.017
iles-0.016
ales-0.016
zie-0.016
Token-
Token ID12
Feature activation+7.25e-02

Pos logit contributions
 divorce+0.022
AAAA+0.008
 late+0.007
broken+0.005
 helicopter+0.004
Neg logit contributions
ern-0.208
ad-0.204
et-0.202
zie-0.201
ales-0.201
Tokenbroken
Token ID25826
Feature activation+2.15e-01

Pos logit contributions
 divorce+0.104
AAAA+0.096
 million+0.093
 helicopter+0.090
 late+0.089
Neg logit contributions
ern-0.120
ad-0.117
ales-0.113
et-0.112
zie-0.108
Token.
Token ID13
Feature activation-2.34e-03

Pos logit contributions
broken+0.230
 late+0.226
 badly+0.197
 divorce+0.197
 together+0.191
Neg logit contributions
ern-0.450
ales-0.444
iles-0.443
 meanings-0.431
zie-0.425
Token He
Token ID679
Feature activation-3.43e-02

Pos logit contributions
ales+0.004
ern+0.004
iles+0.004
et+0.004
ad+0.004
Neg logit contributions
 late-0.003
 divorce-0.003
broken-0.003
 severe-0.003
 million-0.003
Token had
Token ID550
Feature activation-2.22e-02

Pos logit contributions
ern+0.065
iles+0.063
ales+0.063
ad+0.061
zie+0.060
Neg logit contributions
broken-0.041
 divorce-0.040
 late-0.037
 severe-0.035
 band-0.034
Token lost
Token ID2626
Feature activation+4.33e-02

Pos logit contributions
ern+0.035
ad+0.034
iles+0.033
zie+0.033
 meanings+0.033
Neg logit contributions
 divorce-0.032
broken-0.032
 late-0.029
 helicopter-0.029
AAAA-0.028
Token his
Token ID465
Feature activation+6.85e-03

Pos logit contributions
broken+0.060
 divorce+0.058
 late+0.057
 band+0.057
 badly+0.054
Neg logit contributions
ern-0.075
iles-0.073
ales-0.072
alia-0.071
zie-0.070
Tokenia
Token ID544
Feature activation-1.47e-01

Pos logit contributions
ern+0.018
ales+0.016
 meanings+0.016
zie+0.016
 Lucas+0.015
Neg logit contributions
 divorce-0.050
broken-0.050
 late-0.049
 band-0.048
 severe-0.047
Token is
Token ID318
Feature activation-6.62e-02

Pos logit contributions
 answers+0.204
ad+0.204
ern+0.202
et+0.196
iles+0.195
Neg logit contributions
 divorce-0.241
broken-0.236
 late-0.224
 severe-0.215
B-0.209
Token shocked
Token ID11472
Feature activation+9.75e-02

Pos logit contributions
ern+0.147
ad+0.141
et+0.138
 answers+0.138
iles+0.136
Neg logit contributions
 divorce-0.053
broken-0.049
AAAA-0.042
B-0.041
burn-0.040
Token and
Token ID290
Feature activation+4.13e-02

Pos logit contributions
broken+0.097
 late+0.097
 divorce+0.089
 band+0.084
 together+0.078
Neg logit contributions
ern-0.206
 meanings-0.195
 Lucas-0.195
zie-0.194
ales-0.193
Token heart
Token ID2612
Feature activation+7.26e-02

Pos logit contributions
 divorce+0.033
broken+0.033
burn+0.028
B+0.027
 million+0.026
Neg logit contributions
ern-0.088
 answers-0.088
ad-0.087
 meanings-0.085
 Lucas-0.084
Tokenbroken
Token ID25826
Feature activation+2.14e-01

Pos logit contributions
 divorce+0.086
broken+0.077
 late+0.072
AAAA+0.072
 helicopter+0.070
Neg logit contributions
ern-0.137
iles-0.132
ad-0.132
et-0.131
ales-0.130
Token.
Token ID13
Feature activation-1.21e-02

Pos logit contributions
 late+0.273
broken+0.268
 together+0.246
 band+0.243
 divorce+0.241
Neg logit contributions
ern-0.395
ales-0.393
zie-0.389
iles-0.376
 meanings-0.375
Token She
Token ID1375
Feature activation-5.51e-02

Pos logit contributions
ales+0.023
et+0.023
ern+0.023
iles+0.022
ad+0.022
Neg logit contributions
 late-0.015
 divorce-0.013
AAAA-0.013
broken-0.012
 sad-0.012
Token does
Token ID857
Feature activation-6.33e-02

Pos logit contributions
ern+0.086
ales+0.084
iles+0.082
zie+0.081
 Lucas+0.079
Neg logit contributions
 divorce-0.083
broken-0.082
 late-0.078
 band-0.076
AAAA-0.074
Token not
Token ID407
Feature activation-1.16e-02

Pos logit contributions
ern+0.079
ad+0.076
iles+0.071
 answers+0.071
 meanings+0.071
Neg logit contributions
broken-0.112
 divorce-0.111
 late-0.102
 helicopter-0.102
 million-0.102
Token understand
Token ID1833
Feature activation+1.86e-02

Pos logit contributions
ern+0.019
iles+0.018
 Lucas+0.018
zie+0.018
ales+0.018
Neg logit contributions
 divorce-0.017
broken-0.017
 late-0.016
 band-0.015
 severe-0.015
Token.
Token ID13
Feature activation-4.85e-03

Pos logit contributions
 divorce+0.048
broken+0.048
 late+0.044
 severe+0.042
AAAA+0.042
Neg logit contributions
ern-0.055
iles-0.052
ad-0.052
ales-0.052
zie-0.051
Token Tim
Token ID5045
Feature activation+2.15e-02

Pos logit contributions
ern+0.007
zie+0.006
ad+0.006
 meanings+0.006
ales+0.006
Neg logit contributions
broken-0.008
 divorce-0.008
burn-0.008
 severe-0.007
 band-0.007
Tokenmy
Token ID1820
Feature activation-3.04e-03

Pos logit contributions
broken+0.032
 divorce+0.032
 late+0.030
 severe+0.028
 band+0.028
Neg logit contributions
ern-0.034
iles-0.032
ales-0.032
ad-0.032
 meanings-0.031
Token was
Token ID373
Feature activation+6.67e-03

Pos logit contributions
ern+0.005
ad+0.005
 meanings+0.004
et+0.004
 Lucas+0.004
Neg logit contributions
 divorce-0.005
 late-0.004
broken-0.004
burn-0.004
 helicopter-0.004
Token heart
Token ID2612
Feature activation+7.31e-02

Pos logit contributions
 divorce+0.007
broken+0.006
AAAA+0.006
Sorry+0.006
B+0.006
Neg logit contributions
ern-0.013
ad-0.013
et-0.012
 answers-0.012
zie-0.012
Tokenbroken
Token ID25826
Feature activation+2.14e-01

Pos logit contributions
 divorce+0.028
AAAA+0.021
 late+0.012
Sorry+0.011
 helicopter+0.010
Neg logit contributions
ern-0.191
ad-0.190
et-0.189
ales-0.187
zie-0.185
Token and
Token ID290
Feature activation+4.69e-02

Pos logit contributions
broken+0.211
 late+0.210
 divorce+0.188
 band+0.173
 together+0.173
Neg logit contributions
ern-0.465
iles-0.453
ales-0.449
 meanings-0.445
zie-0.440
Token cried
Token ID16896
Feature activation+6.28e-02

Pos logit contributions
 divorce+0.059
broken+0.058
burn+0.051
AAAA+0.051
 late+0.051
Neg logit contributions
ern-0.080
ad-0.079
zie-0.078
 meanings-0.078
et-0.077
Token again
Token ID757
Feature activation+1.18e-01

Pos logit contributions
 divorce+0.118
broken+0.111
AAAA+0.109
 late+0.104
Sorry+0.104
Neg logit contributions
ern-0.072
ad-0.069
ales-0.068
et-0.067
 Lucas-0.066
Token.
Token ID13
Feature activation-4.54e-03

Pos logit contributions
broken+0.177
 late+0.168
 divorce+0.167
 severe+0.153
 band+0.152
Neg logit contributions
iles-0.191
ern-0.189
ales-0.185
zie-0.182
ad-0.181
Token His
Token ID2399
Feature activation+1.96e-02

Pos logit contributions
ern+0.007
ad+0.007
zie+0.007
 meanings+0.007
ales+0.007
Neg logit contributions
broken-0.007
 divorce-0.007
 late-0.006
 band-0.006
 helicopter-0.006
Token the
Token ID262
Feature activation-4.32e-03

Pos logit contributions
 divorce+0.059
broken+0.059
 late+0.054
 severe+0.051
 band+0.050
Neg logit contributions
ern-0.083
iles-0.080
ad-0.079
ales-0.079
zie-0.078
Token baby
Token ID5156
Feature activation+1.13e-01

Pos logit contributions
ern+0.008
ales+0.008
zie+0.008
iles+0.008
ad+0.008
Neg logit contributions
 divorce-0.005
broken-0.005
 band-0.005
 late-0.005
 helicopter-0.005
Token deer
Token ID20096
Feature activation-4.47e-02

Pos logit contributions
broken+0.141
 divorce+0.135
 late+0.126
 band+0.122
 helicopter+0.121
Neg logit contributions
ern-0.211
ales-0.199
iles-0.197
ad-0.196
 meanings-0.195
Token was
Token ID373
Feature activation+7.52e-03

Pos logit contributions
ern+0.081
iles+0.078
ales+0.078
ad+0.076
zie+0.076
Neg logit contributions
broken-0.056
 divorce-0.055
 late-0.050
 band-0.048
 helicopter-0.047
Token heart
Token ID2612
Feature activation+7.38e-02

Pos logit contributions
 divorce+0.008
broken+0.008
AAAA+0.007
Sorry+0.007
B+0.007
Neg logit contributions
ern-0.014
ad-0.014
et-0.014
 Lucas-0.013
zie-0.013
Tokenbroken
Token ID25826
Feature activation+2.13e-01

Pos logit contributions
 divorce+0.041
AAAA+0.037
Sorry+0.026
 late+0.025
 helicopter+0.022
Neg logit contributions
ad-0.181
ern-0.179
et-0.177
ales-0.175
iles-0.174
Token.
Token ID13
Feature activation-6.96e-03

Pos logit contributions
broken+0.291
 late+0.287
 divorce+0.265
 badly+0.256
 together+0.253
Neg logit contributions
ern-0.380
ales-0.372
iles-0.367
zie-0.361
 meanings-0.359
Token She
Token ID1375
Feature activation-5.23e-02

Pos logit contributions
ern+0.010
iles+0.010
ales+0.010
ad+0.010
et+0.010
Neg logit contributions
 divorce-0.011
broken-0.011
 late-0.011
 severe-0.010
 helicopter-0.010
Token had
Token ID550
Feature activation-2.38e-02

Pos logit contributions
ern+0.094
ad+0.090
iles+0.089
 meanings+0.088
ales+0.088
Neg logit contributions
 divorce-0.066
broken-0.065
 late-0.061
 severe-0.056
 helicopter-0.055
Token been
Token ID587
Feature activation+2.46e-02

Pos logit contributions
ern+0.038
ad+0.037
 meanings+0.036
iles+0.035
 answers+0.035
Neg logit contributions
broken-0.034
 divorce-0.034
 late-0.031
 helicopter-0.030
AAAA-0.030
Token fooled
Token ID35820
Feature activation+3.07e-02

Pos logit contributions
 divorce+0.033
broken+0.032
AAAA+0.030
Sorry+0.029
 severe+0.029
Neg logit contributions
ern-0.041
ad-0.039
et-0.038
 answers-0.038
iles-0.038
Token and
Token ID290
Feature activation+4.87e-02

Pos logit contributions
ern+0.001
et+0.001
ad+0.001
zie+0.001
ales+0.001
Neg logit contributions
 divorce-0.001
broken-0.001
 late-0.001
AAAA-0.001
 severe-0.001
Token her
Token ID607
Feature activation-2.54e-02

Pos logit contributions
 divorce+0.080
broken+0.080
 late+0.074
 severe+0.071
 band+0.070
Neg logit contributions
ern-0.069
iles-0.065
ad-0.065
ales-0.065
zie-0.064
Token family
Token ID1641
Feature activation+2.67e-02

Pos logit contributions
ern+0.040
ad+0.039
iles+0.039
 meanings+0.038
zie+0.038
Neg logit contributions
 divorce-0.036
broken-0.036
 late-0.035
AAAA-0.032
 severe-0.032
Token was
Token ID373
Feature activation+6.78e-03

Pos logit contributions
 divorce+0.042
AAAA+0.041
 late+0.041
broken+0.040
Sorry+0.038
Neg logit contributions
et-0.036
ad-0.035
ern-0.034
zie-0.032
ima-0.032
Token heart
Token ID2612
Feature activation+7.47e-02

Pos logit contributions
AAAA+0.011
Sorry+0.010
ever+0.010
oing+0.010
Quick+0.010
Neg logit contributions
ern-0.008
ad-0.008
et-0.008
 herself-0.008
 for-0.007
Tokenbroken
Token ID25826
Feature activation+2.13e-01

Pos logit contributions
AAAA+0.047
 divorce+0.044
Sorry+0.030
Uh+0.027
Quick+0.027
Neg logit contributions
et-0.177
ales-0.176
ad-0.176
ern-0.173
sey-0.170
Token.
Token ID13
Feature activation-5.90e-03

Pos logit contributions
broken+0.322
 late+0.310
 divorce+0.309
 band+0.279
 helicopter+0.277
Neg logit contributions
ern-0.343
iles-0.328
ales-0.325
zie-0.321
ad-0.320
Token
Token ID220
Feature activation+1.96e-02

Pos logit contributions
ern+0.010
ad+0.010
zie+0.010
iles+0.010
ales+0.010
Neg logit contributions
broken-0.008
 divorce-0.008
 late-0.007
 band-0.007
 severe-0.007
Token
Token ID198
Feature activation+3.77e-02

Pos logit contributions
 divorce+0.027
broken+0.026
 helicopter+0.023
 band+0.023
 severe+0.023
Neg logit contributions
ern-0.034
ad-0.032
ales-0.031
zie-0.031
iles-0.031
Token
Token ID198
Feature activation+3.77e-02

Pos logit contributions
 divorce+0.050
broken+0.050
 helicopter+0.044
burn+0.043
 late+0.043
Neg logit contributions
ern-0.066
ad-0.062
zie-0.061
iles-0.060
ales-0.060
TokenRemember
Token ID16676
Feature activation+1.13e-01

Pos logit contributions
burn+0.054
 divorce+0.054
 helicopter+0.051
broken+0.049
 unst+0.048
Neg logit contributions
ern-0.060
ad-0.055
 herself-0.053
wered-0.052
ori-0.051
Token.
Token ID13
Feature activation-4.37e-03

Pos logit contributions
AAAA+0.150
 sudden+0.147
Quick+0.145
Sorry+0.140
Uh+0.139
Neg logit contributions
 herself-0.096
 himself-0.090
ad-0.089
 answers-0.089
et-0.087
Token Her
Token ID2332
Feature activation+1.77e-02

Pos logit contributions
ern+0.005
 meanings+0.005
zie+0.005
ad+0.005
alia+0.005
Neg logit contributions
broken-0.008
 divorce-0.008
 helicopter-0.007
 band-0.007
 severe-0.007
Token mom
Token ID1995
Feature activation+3.57e-02

Pos logit contributions
broken+0.027
 divorce+0.027
 late+0.026
AAAA+0.024
 million+0.024
Neg logit contributions
ern-0.026
ad-0.025
iles-0.025
 meanings-0.025
zie-0.024
Token was
Token ID373
Feature activation+6.56e-03

Pos logit contributions
 divorce+0.060
 late+0.058
AAAA+0.058
broken+0.051
 helicopter+0.051
Neg logit contributions
et-0.049
ad-0.048
ern-0.046
zie-0.045
ima-0.045
Token heart
Token ID2612
Feature activation+7.39e-02

Pos logit contributions
AAAA+0.009
 divorce+0.009
oing+0.008
 Sorry+0.008
ite+0.008
Neg logit contributions
ern-0.010
ad-0.010
et-0.009
 herself-0.009
 answers-0.009
Tokenbroken
Token ID25826
Feature activation+2.13e-01

Pos logit contributions
AAAA+0.054
 divorce+0.047
 CA+0.034
Quick+0.033
Sorry+0.032
Neg logit contributions
et-0.169
ad-0.166
ern-0.165
ales-0.165
zie-0.163
Token and
Token ID290
Feature activation+4.84e-02

Pos logit contributions
broken+0.256
 divorce+0.249
 late+0.240
 band+0.215
 severe+0.213
Neg logit contributions
ern-0.403
iles-0.392
ales-0.388
ad-0.382
zie-0.380
Token promised
Token ID8072
Feature activation+1.47e-01

Pos logit contributions
broken+0.063
 divorce+0.063
AAAA+0.058
burn+0.056
 late+0.056
Neg logit contributions
ern-0.079
 meanings-0.077
ad-0.077
zie-0.076
et-0.076
Token to
Token ID284
Feature activation-1.11e-02

Pos logit contributions
 divorce+0.178
broken+0.177
 late+0.162
 severe+0.151
 band+0.149
Neg logit contributions
ern-0.274
iles-0.264
ales-0.262
ad-0.261
zie-0.257
Token always
Token ID1464
Feature activation-2.53e-02

Pos logit contributions
ern+0.017
iles+0.017
ales+0.016
 Lucas+0.016
alia+0.016
Neg logit contributions
 divorce-0.017
broken-0.017
 late-0.016
 band-0.015
 severe-0.015
Token remember
Token ID3505
Feature activation-2.23e-02

Pos logit contributions
ales+0.043
iles+0.043
ern+0.042
zie+0.041
io+0.040
Neg logit contributions
 divorce-0.035
broken-0.034
 band-0.034
 late-0.032
burn-0.030
Token.
Token ID13
Feature activation-1.45e-02

Pos logit contributions
ern+0.068
iles+0.065
ales+0.064
ad+0.064
 meanings+0.063
Neg logit contributions
broken-0.070
 divorce-0.069
 late-0.066
 severe-0.062
 band-0.062
Token It
Token ID632
Feature activation+2.18e-02

Pos logit contributions
ern+0.025
iles+0.024
ales+0.024
ad+0.024
et+0.024
Neg logit contributions
 divorce-0.019
broken-0.018
 late-0.018
AAAA-0.016
 helicopter-0.016
Token's
Token ID338
Feature activation+4.85e-02

Pos logit contributions
 divorce+0.035
broken+0.035
 late+0.033
 sudden+0.033
AAAA+0.031
Neg logit contributions
et-0.030
 answers-0.029
zie-0.029
ad-0.028
ern-0.028
Token okay
Token ID8788
Feature activation+1.23e-01

Pos logit contributions
broken+0.050
 divorce+0.050
 late+0.046
 severe+0.041
 band+0.041
Neg logit contributions
ern-0.099
iles-0.096
ad-0.095
ales-0.095
zie-0.094
Token.
Token ID13
Feature activation-1.46e-02

Pos logit contributions
broken+0.212
 divorce+0.211
AAAA+0.190
 severe+0.190
 late+0.189
Neg logit contributions
ern-0.161
iles-0.156
ad-0.154
zie-0.151
ales-0.149
Token Acc
Token ID6366
Feature activation+2.13e-01

Pos logit contributions
ern+0.024
ad+0.023
iles+0.023
ales+0.023
zie+0.023
Neg logit contributions
broken-0.021
 divorce-0.020
 late-0.018
 severe-0.018
 band-0.017
Tokenidents
Token ID3231
Feature activation+4.25e-02

Pos logit contributions
 divorce+0.339
broken+0.318
 severe+0.316
AAAA+0.310
 late+0.308
Neg logit contributions
ern-0.305
 meanings-0.294
ales-0.291
 Lucas-0.287
zie-0.286
Token happen
Token ID1645
Feature activation+6.33e-02

Pos logit contributions
 divorce+0.074
broken+0.074
 late+0.070
 severe+0.066
 band+0.066
Neg logit contributions
ern-0.056
iles-0.053
ales-0.053
ad-0.052
zie-0.051
Token.
Token ID13
Feature activation-1.43e-02

Pos logit contributions
 divorce+0.104
broken+0.103
 late+0.096
 severe+0.092
 helicopter+0.091
Neg logit contributions
ern-0.089
iles-0.085
ad-0.084
ales-0.084
zie-0.082
Token Just
Token ID2329
Feature activation+3.38e-02

Pos logit contributions
ern+0.026
iles+0.025
ad+0.025
ales+0.025
zie+0.024
Neg logit contributions
broken-0.018
 divorce-0.018
 late-0.016
 severe-0.015
 band-0.015
Token be
Token ID307
Feature activation-6.57e-02

Pos logit contributions
 divorce+0.050
broken+0.049
 late+0.047
 band+0.046
burn+0.044
Neg logit contributions
ern-0.053
iles-0.051
ales-0.051
io-0.050
alia-0.050
Token.
Token ID13
Feature activation-6.39e-03

Pos logit contributions
broken+0.076
 late+0.076
 badly+0.072
B+0.069
 unst+0.069
Neg logit contributions
ern-0.110
 meanings-0.109
zie-0.108
iles-0.108
ales-0.107
Token Tweet
Token ID18752
Feature activation-2.98e-02

Pos logit contributions
ern+0.011
ad+0.010
iles+0.010
ales+0.010
zie+0.010
Neg logit contributions
 divorce-0.009
broken-0.009
 late-0.008
 severe-0.008
 helicopter-0.007
Tokenie
Token ID494
Feature activation+3.48e-03

Pos logit contributions
ad+0.042
ern+0.042
zie+0.041
et+0.041
iles+0.041
Neg logit contributions
 divorce-0.051
 late-0.046
broken-0.045
 helicopter-0.043
AAAA-0.042
Token was
Token ID373
Feature activation+7.01e-03

Pos logit contributions
 divorce+0.006
 late+0.005
burn+0.005
broken+0.005
AAAA+0.005
Neg logit contributions
ad-0.005
et-0.005
ern-0.005
zie-0.004
 Lucas-0.004
Token heart
Token ID2612
Feature activation+7.45e-02

Pos logit contributions
 divorce+0.008
Sorry+0.008
AAAA+0.007
burn+0.007
broken+0.007
Neg logit contributions
ern-0.013
ad-0.012
et-0.012
 herself-0.011
zie-0.011
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
 divorce+0.045
AAAA+0.040
Sorry+0.028
 CA+0.023
 helicopter+0.023
Neg logit contributions
ad-0.177
et-0.175
ales-0.175
ern-0.175
zie-0.171
Token and
Token ID290
Feature activation+4.74e-02

Pos logit contributions
broken+0.224
 late+0.217
 divorce+0.194
 badly+0.182
 together+0.180
Neg logit contributions
ern-0.452
iles-0.437
ales-0.436
 meanings-0.431
zie-0.426
Token wished
Token ID16555
Feature activation+3.39e-02

Pos logit contributions
 divorce+0.054
broken+0.053
 late+0.048
 severe+0.045
 helicopter+0.044
Neg logit contributions
ern-0.090
ad-0.087
iles-0.087
ales-0.086
zie-0.086
Token she
Token ID673
Feature activation+2.38e-02

Pos logit contributions
 late+0.034
broken+0.033
 divorce+0.030
 badly+0.030
 together+0.029
Neg logit contributions
ern-0.072
ales-0.072
iles-0.070
 meanings-0.068
alia-0.067
Token could
Token ID714
Feature activation-9.04e-03

Pos logit contributions
 divorce+0.037
broken+0.036
 late+0.034
AAAA+0.032
 helicopter+0.032
Neg logit contributions
ern-0.035
ad-0.034
et-0.033
zie-0.033
 meanings-0.033
Token have
Token ID423
Feature activation+3.00e-02

Pos logit contributions
ern+0.013
ad+0.013
iles+0.012
ales+0.012
zie+0.012
Neg logit contributions
 divorce-0.015
broken-0.014
 late-0.014
AAAA-0.013
 severe-0.013
Token.
Token ID13
Feature activation-3.75e-03

Pos logit contributions
broken+0.159
 late+0.155
 divorce+0.151
 severe+0.137
 band+0.134
Neg logit contributions
ern-0.209
iles-0.202
ales-0.201
zie-0.199
 meanings-0.196
Token Her
Token ID2332
Feature activation+1.74e-02

Pos logit contributions
ern+0.006
iles+0.006
ales+0.006
et+0.006
ad+0.006
Neg logit contributions
 divorce-0.005
 late-0.005
broken-0.005
 severe-0.005
 million-0.005
Token parents
Token ID3397
Feature activation+1.81e-03

Pos logit contributions
 divorce+0.028
broken+0.028
 late+0.026
 severe+0.025
 band+0.025
Neg logit contributions
ern-0.026
ales-0.025
iles-0.025
ad-0.024
zie-0.024
Token were
Token ID547
Feature activation-2.36e-02

Pos logit contributions
 divorce+0.003
 late+0.003
broken+0.003
AAAA+0.002
 severe+0.002
Neg logit contributions
ad-0.003
ern-0.003
et-0.003
zie-0.002
 Lucas-0.002
Token heart
Token ID2612
Feature activation+7.39e-02

Pos logit contributions
ern+0.039
ad+0.037
et+0.036
 answers+0.034
iles+0.034
Neg logit contributions
 divorce-0.031
broken-0.029
AAAA-0.029
B-0.028
Sorry-0.028
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
 divorce+0.033
AAAA+0.030
Sorry+0.016
 helicopter+0.015
 late+0.015
Neg logit contributions
ern-0.188
ad-0.187
et-0.185
ales-0.184
zie-0.183
Token and
Token ID290
Feature activation+4.87e-02

Pos logit contributions
broken+0.231
 late+0.222
 divorce+0.211
 together+0.189
 band+0.188
Neg logit contributions
ern-0.440
ales-0.426
iles-0.423
zie-0.415
 meanings-0.415
Token knew
Token ID2993
Feature activation+1.09e-02

Pos logit contributions
broken+0.060
 divorce+0.059
 late+0.054
AAAA+0.052
 severe+0.051
Neg logit contributions
ern-0.086
ad-0.084
zie-0.082
 meanings-0.082
 Lucas-0.082
Token she
Token ID673
Feature activation+2.31e-02

Pos logit contributions
 divorce+0.012
broken+0.011
 late+0.011
 band+0.010
 severe+0.010
Neg logit contributions
ern-0.022
ales-0.021
iles-0.021
et-0.021
zie-0.021
Token must
Token ID1276
Feature activation-6.18e-03

Pos logit contributions
 divorce+0.029
broken+0.029
 late+0.027
 severe+0.025
 band+0.025
Neg logit contributions
ern-0.041
iles-0.040
ales-0.039
ad-0.039
zie-0.039
Token have
Token ID423
Feature activation+2.95e-02

Pos logit contributions
iles+0.010
ales+0.010
ern+0.010
 Lucas+0.010
 meanings+0.010
Neg logit contributions
 divorce-0.008
broken-0.008
 band-0.008
 late-0.008
 severe-0.007
Token leaving
Token ID4305
Feature activation-2.60e-02

Pos logit contributions
 divorce+0.002
broken+0.002
 late+0.002
AAAA+0.002
 severe+0.002
Neg logit contributions
ern-0.002
ad-0.002
et-0.002
zie-0.002
ales-0.002
Token his
Token ID465
Feature activation+8.07e-03

Pos logit contributions
ern+0.049
iles+0.045
ales+0.044
 meanings+0.044
alia+0.043
Neg logit contributions
 late-0.034
 divorce-0.033
broken-0.032
 band-0.030
 severe-0.029
Token parents
Token ID3397
Feature activation+8.77e-04

Pos logit contributions
 divorce+0.010
broken+0.010
 late+0.010
 severe+0.009
 helicopter+0.009
Neg logit contributions
ern-0.015
iles-0.014
ales-0.014
ad-0.014
et-0.013
Token forever
Token ID8097
Feature activation+5.64e-02

Pos logit contributions
AAAA+0.001
 divorce+0.001
broken+0.001
Sorry+0.001
 late+0.001
Neg logit contributions
ad-0.001
et-0.001
ern-0.001
ales-0.001
zie-0.001
Token heart
Token ID2612
Feature activation+7.43e-02

Pos logit contributions
 divorce+0.088
broken+0.087
 late+0.082
 severe+0.077
 band+0.076
Neg logit contributions
ern-0.085
iles-0.081
ad-0.080
ales-0.080
zie-0.079
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
 divorce+0.104
broken+0.094
AAAA+0.092
 late+0.091
 helicopter+0.087
Neg logit contributions
ern-0.122
ad-0.118
et-0.117
ales-0.117
iles-0.115
Token.
Token ID13
Feature activation-2.38e-03

Pos logit contributions
broken+0.274
 late+0.261
 divorce+0.244
 badly+0.238
 together+0.235
Neg logit contributions
ern-0.396
iles-0.390
ales-0.386
zie-0.376
ori-0.370
Token<|endoftext|>
Token ID50256
Feature activation+1.06e-02

Pos logit contributions
ern+0.004
iles+0.004
ad+0.004
ales+0.004
zie+0.004
Neg logit contributions
 divorce-0.003
broken-0.003
 late-0.003
 severe-0.002
 band-0.002
TokenM
Token ID44
Feature activation-1.44e-02

Pos logit contributions
 divorce+0.011
broken+0.011
 late+0.010
 severe+0.009
 band+0.009
Neg logit contributions
ern-0.021
iles-0.020
ales-0.020
ad-0.020
 meanings-0.020
Tokenia
Token ID544
Feature activation-1.46e-01

Pos logit contributions
ern+0.022
ales+0.021
iles+0.021
et+0.021
ad+0.020
Neg logit contributions
 divorce-0.022
broken-0.022
 late-0.021
 severe-0.020
 band-0.019
Token was
Token ID373
Feature activation+1.11e-02

Pos logit contributions
ern+0.260
ad+0.251
iles+0.250
ales+0.248
zie+0.246
Neg logit contributions
 divorce-0.184
broken-0.182
 late-0.166
 severe-0.156
 helicopter-0.154
Token.
Token ID13
Feature activation-5.88e-03

Pos logit contributions
ern+0.021
iles+0.020
ad+0.020
ales+0.020
zie+0.020
Neg logit contributions
 divorce-0.022
broken-0.022
 late-0.020
 severe-0.019
 band-0.019
Token The
Token ID383
Feature activation+2.87e-02

Pos logit contributions
ern+0.010
iles+0.010
ales+0.010
ad+0.010
et+0.009
Neg logit contributions
 divorce-0.008
broken-0.008
 late-0.008
 severe-0.007
 helicopter-0.007
Token princess
Token ID21752
Feature activation-3.43e-02

Pos logit contributions
 divorce+0.044
broken+0.044
 late+0.041
 severe+0.039
 helicopter+0.038
Neg logit contributions
ern-0.044
iles-0.042
ad-0.042
ales-0.041
zie-0.041
Token was
Token ID373
Feature activation+7.66e-03

Pos logit contributions
et+0.044
ad+0.043
iles+0.040
 meanings+0.040
io+0.040
Neg logit contributions
 divorce-0.059
 late-0.058
 unst-0.053
burn-0.052
AAAA-0.052
Token heart
Token ID2612
Feature activation+7.31e-02

Pos logit contributions
 divorce+0.008
broken+0.008
AAAA+0.007
Sorry+0.007
B+0.007
Neg logit contributions
ern-0.015
ad-0.014
et-0.014
 answers-0.014
iles-0.013
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
AAAA+0.050
 divorce+0.046
Sorry+0.034
Quick+0.034
 Sorry+0.027
Neg logit contributions
ad-0.171
et-0.169
ern-0.162
zie-0.161
ales-0.160
Token and
Token ID290
Feature activation+4.70e-02

Pos logit contributions
broken+0.196
 late+0.195
 divorce+0.171
 band+0.158
 badly+0.154
Neg logit contributions
ern-0.476
iles-0.457
ales-0.454
 meanings-0.451
zie-0.449
Token didn
Token ID1422
Feature activation+7.75e-02

Pos logit contributions
 divorce+0.054
broken+0.053
 late+0.048
 severe+0.045
 band+0.044
Neg logit contributions
ern-0.089
ad-0.086
iles-0.086
ales-0.085
zie-0.085
Token't
Token ID470
Feature activation+6.79e-02

Pos logit contributions
broken+0.123
 divorce+0.116
 sudden+0.111
AAAA+0.105
 band+0.105
Neg logit contributions
ad-0.117
ern-0.116
wered-0.102
zie-0.102
 Lucas-0.100
Token know
Token ID760
Feature activation-5.35e-02

Pos logit contributions
 divorce+0.103
broken+0.103
 late+0.097
 severe+0.091
 band+0.090
Neg logit contributions
ern-0.105
iles-0.101
ales-0.100
ad-0.099
zie-0.098
Token what
Token ID644
Feature activation-1.10e-01

Pos logit contributions
ern+0.082
ales+0.078
alia+0.077
iles+0.075
zie+0.074
Neg logit contributions
 divorce-0.083
broken-0.080
 late-0.077
 helicopter-0.076
 badly-0.075
Token anymore
Token ID7471
Feature activation+1.69e-02

Pos logit contributions
 divorce+0.058
broken+0.057
 late+0.053
 severe+0.050
AAAA+0.049
Neg logit contributions
ern-0.079
ad-0.076
iles-0.075
ales-0.075
zie-0.074
Token.
Token ID13
Feature activation-2.67e-03

Pos logit contributions
broken+0.025
 divorce+0.024
 late+0.023
 badly+0.022
 million+0.022
Neg logit contributions
ern-0.027
iles-0.027
ales-0.027
zie-0.026
ad-0.026
Token Lily
Token ID20037
Feature activation+3.55e-02

Pos logit contributions
ern+0.004
zie+0.004
 meanings+0.004
ad+0.004
 Lucas+0.004
Neg logit contributions
broken-0.004
 divorce-0.004
burn-0.004
 helicopter-0.004
 severe-0.004
Token was
Token ID373
Feature activation+6.95e-03

Pos logit contributions
 divorce+0.062
 late+0.059
AAAA+0.055
 helicopter+0.054
burn+0.054
Neg logit contributions
ad-0.047
ern-0.045
et-0.044
zie-0.044
 meanings-0.043
Token heart
Token ID2612
Feature activation+7.27e-02

Pos logit contributions
AAAA+0.009
Sorry+0.008
ever+0.008
oing+0.008
Quick+0.008
Neg logit contributions
 herself-0.011
ad-0.011
ern-0.011
 himself-0.010
wered-0.010
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
AAAA+0.047
 divorce+0.046
Sorry+0.031
Quick+0.029
 Sorry+0.028
Neg logit contributions
ad-0.165
et-0.165
ern-0.165
zie-0.161
sey-0.161
Token and
Token ID290
Feature activation+4.77e-02

Pos logit contributions
broken+0.231
 late+0.219
 divorce+0.218
 band+0.190
 severe+0.185
Neg logit contributions
ern-0.432
iles-0.423
ales-0.419
zie-0.408
ad-0.408
Token cried
Token ID16896
Feature activation+6.30e-02

Pos logit contributions
 divorce+0.061
broken+0.061
burn+0.055
AAAA+0.055
 late+0.054
Neg logit contributions
 meanings-0.079
ern-0.078
ad-0.078
zie-0.077
et-0.075
Token all
Token ID477
Feature activation-9.88e-02

Pos logit contributions
 divorce+0.120
AAAA+0.112
broken+0.112
Sorry+0.106
 late+0.105
Neg logit contributions
ern-0.070
ad-0.067
 meanings-0.065
et-0.065
ales-0.063
Token day
Token ID1110
Feature activation+5.35e-02

Pos logit contributions
ern+0.152
iles+0.147
ales+0.145
ad+0.143
 Lucas+0.141
Neg logit contributions
 divorce-0.151
broken-0.149
 late-0.142
 severe-0.133
 band-0.131
Token.
Token ID13
Feature activation-3.03e-03

Pos logit contributions
broken+0.076
 divorce+0.075
 late+0.071
 severe+0.066
 band+0.065
Neg logit contributions
ern-0.088
iles-0.086
ales-0.085
ad-0.084
zie-0.083
Token mother
Token ID2802
Feature activation-5.05e-03

Pos logit contributions
 divorce+0.034
broken+0.033
 band+0.032
 sudden+0.032
 sad+0.031
Neg logit contributions
ales-0.040
ern-0.037
iles-0.036
et-0.034
zie-0.034
Token was
Token ID373
Feature activation+7.77e-03

Pos logit contributions
ern+0.009
iles+0.009
ales+0.009
ad+0.009
 meanings+0.009
Neg logit contributions
broken-0.006
 divorce-0.006
 late-0.006
 band-0.005
 severe-0.005
Token sad
Token ID6507
Feature activation+1.42e-01

Pos logit contributions
 divorce+0.009
broken+0.009
 late+0.008
AAAA+0.008
 helicopter+0.008
Neg logit contributions
ern-0.014
ad-0.014
iles-0.013
et-0.013
zie-0.013
Token and
Token ID290
Feature activation+4.75e-02

Pos logit contributions
broken+0.142
 late+0.141
 divorce+0.133
 badly+0.127
 band+0.117
Neg logit contributions
ales-0.300
ern-0.298
iles-0.293
 meanings-0.287
zie-0.285
Token heart
Token ID2612
Feature activation+7.33e-02

Pos logit contributions
 divorce+0.056
broken+0.055
 late+0.052
 severe+0.047
 band+0.046
Neg logit contributions
ern-0.091
iles-0.088
ales-0.087
ad-0.086
 meanings-0.084
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
 divorce+0.077
broken+0.074
 late+0.068
AAAA+0.062
 helicopter+0.062
Neg logit contributions
ern-0.147
ad-0.141
iles-0.141
ales-0.140
zie-0.139
Token,
Token ID11
Feature activation-3.67e-04

Pos logit contributions
broken+0.238
 late+0.235
 together+0.226
 badly+0.218
 divorce+0.205
Neg logit contributions
ales-0.423
iles-0.415
ern-0.412
wered-0.395
zie-0.395
Token and
Token ID290
Feature activation+4.74e-02

Pos logit contributions
ern+0.001
iles+0.001
ales+0.001
ad+0.001
 meanings+0.001
Neg logit contributions
broken-0.000
 divorce-0.000
 late-0.000
 severe-0.000
 band-0.000
Token she
Token ID673
Feature activation+2.34e-02

Pos logit contributions
 divorce+0.061
broken+0.060
 late+0.059
 severe+0.053
 band+0.052
Neg logit contributions
ern-0.087
iles-0.083
ales-0.083
ad-0.079
io-0.078
Token reminded
Token ID14516
Feature activation+3.99e-02

Pos logit contributions
 divorce+0.031
broken+0.031
 late+0.028
 severe+0.026
 helicopter+0.026
Neg logit contributions
ern-0.040
ad-0.039
iles-0.039
ales-0.038
zie-0.038
Token her
Token ID607
Feature activation-2.43e-02

Pos logit contributions
 late+0.068
B+0.064
 divorce+0.063
 band+0.062
 together+0.061
Neg logit contributions
ern-0.056
artments-0.054
 meanings-0.053
iles-0.050
 poses-0.050
Token was
Token ID373
Feature activation+9.13e-03

Pos logit contributions
ern+0.062
iles+0.060
ales+0.060
ad+0.059
zie+0.058
Neg logit contributions
broken-0.041
 divorce-0.040
 late-0.037
 band-0.035
 severe-0.035
Token so
Token ID523
Feature activation-1.90e-02

Pos logit contributions
broken+0.010
 divorce+0.010
 late+0.009
 severe+0.008
 band+0.008
Neg logit contributions
ern-0.018
iles-0.017
ales-0.017
ad-0.017
 meanings-0.017
Token tired
Token ID10032
Feature activation+1.22e-01

Pos logit contributions
ad+0.037
ern+0.036
 Lucas+0.035
iles+0.034
 answers+0.034
Neg logit contributions
 divorce-0.019
burn-0.017
broken-0.017
 helicopter-0.016
AAAA-0.016
Token and
Token ID290
Feature activation+4.86e-02

Pos logit contributions
 late+0.137
 divorce+0.128
broken+0.128
 band+0.118
 severe+0.111
Neg logit contributions
ern-0.247
 meanings-0.247
iles-0.237
io-0.231
ales-0.230
Token heart
Token ID2612
Feature activation+7.31e-02

Pos logit contributions
 divorce+0.059
broken+0.057
 late+0.051
AAAA+0.051
burn+0.050
Neg logit contributions
ern-0.087
ad-0.085
zie-0.084
et-0.083
 Lucas-0.083
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
 divorce+0.064
broken+0.060
 late+0.055
AAAA+0.049
 helicopter+0.049
Neg logit contributions
ern-0.159
ad-0.154
iles-0.153
ales-0.153
et-0.151
Token that
Token ID326
Feature activation-9.51e-02

Pos logit contributions
broken+0.249
 late+0.239
 together+0.228
 badly+0.225
 divorce+0.211
Neg logit contributions
ern-0.415
iles-0.411
ales-0.407
zie-0.398
ori-0.396
Token he
Token ID339
Feature activation+2.92e-02

Pos logit contributions
ern+0.187
ales+0.180
iles+0.178
io+0.170
alia+0.170
Neg logit contributions
 divorce-0.114
 late-0.110
broken-0.107
 severe-0.097
 helicopter-0.097
Token just
Token ID655
Feature activation-1.62e-02

Pos logit contributions
 divorce+0.042
broken+0.040
 late+0.038
burn+0.037
AAAA+0.037
Neg logit contributions
ern-0.045
ad-0.045
et-0.044
zie-0.043
iles-0.043
Token stayed
Token ID9658
Feature activation+3.13e-02

Pos logit contributions
ern+0.032
iles+0.031
ales+0.031
io+0.030
alia+0.030
Neg logit contributions
broken-0.018
 divorce-0.018
 late-0.016
 band-0.015
 severe-0.015
Token there
Token ID612
Feature activation+6.38e-02

Pos logit contributions
 divorce+0.039
broken+0.038
 late+0.034
AAAA+0.033
 severe+0.033
Neg logit contributions
ern-0.056
ad-0.054
iles-0.053
et-0.053
ales-0.053
Token married
Token ID6405
Feature activation+6.25e-02

Pos logit contributions
AAAA+0.069
broken+0.068
 divorce+0.065
Quick+0.060
oing+0.059
Neg logit contributions
ad-0.100
 answers-0.097
ern-0.095
 meanings-0.092
 Lucas-0.092
Token.
Token ID13
Feature activation-6.61e-03

Pos logit contributions
broken+0.094
 divorce+0.094
 late+0.088
 band+0.083
 severe+0.082
Neg logit contributions
ern-0.098
iles-0.093
ales-0.092
ad-0.092
zie-0.091
Token Lily
Token ID20037
Feature activation+3.64e-02

Pos logit contributions
ern+0.012
ad+0.011
iles+0.011
ales+0.011
zie+0.011
Neg logit contributions
broken-0.009
 divorce-0.009
 late-0.008
 severe-0.007
 helicopter-0.007
Token was
Token ID373
Feature activation+6.07e-03

Pos logit contributions
 divorce+0.062
 late+0.058
broken+0.054
burn+0.053
 unst+0.053
Neg logit contributions
ad-0.049
et-0.047
ern-0.046
 meanings-0.045
 Lucas-0.045
Token heart
Token ID2612
Feature activation+7.32e-02

Pos logit contributions
 divorce+0.008
Sorry+0.008
AAAA+0.008
B+0.008
broken+0.008
Neg logit contributions
ad-0.009
ern-0.009
 herself-0.009
et-0.009
 answers-0.009
Tokenbroken
Token ID25826
Feature activation+2.12e-01

Pos logit contributions
AAAA+0.043
 divorce+0.041
Sorry+0.026
Quick+0.024
 CA+0.023
Neg logit contributions
ad-0.175
et-0.172
iddles-0.169
ales-0.169
ern-0.169
Token and
Token ID290
Feature activation+4.68e-02

Pos logit contributions
broken+0.233
 late+0.231
 divorce+0.218
 band+0.194
 together+0.190
Neg logit contributions
ern-0.432
ales-0.414
iles-0.413
zie-0.407
 meanings-0.406
Token cried
Token ID16896
Feature activation+6.24e-02

Pos logit contributions
 divorce+0.053
broken+0.053
 late+0.048
 severe+0.045
 band+0.044
Neg logit contributions
ern-0.089
iles-0.086
ad-0.086
ales-0.085
zie-0.084
Token a
Token ID257
Feature activation+4.93e-02

Pos logit contributions
 divorce+0.119
broken+0.113
AAAA+0.111
Sorry+0.106
 late+0.104
Neg logit contributions
ern-0.068
ad-0.067
ales-0.065
et-0.064
 meanings-0.064
Token lot
Token ID1256
Feature activation+1.08e-02

Pos logit contributions
 divorce+0.067
broken+0.066
 unst+0.062
burn+0.062
 Our+0.061
Neg logit contributions
 meanings-0.071
 answers-0.068
ad-0.066
ern-0.066
icity-0.066
Token.
Token ID13
Feature activation-6.52e-03

Pos logit contributions
 divorce+0.020
broken+0.019
AAAA+0.018
 sudden+0.018
 late+0.018
Neg logit contributions
ern-0.013
ad-0.013
iles-0.012
ales-0.012
et-0.012
Token before
Token ID878
Feature activation-2.10e-02

Pos logit contributions
ern+0.076
ales+0.072
iles+0.072
ad+0.071
zie+0.070
Neg logit contributions
 divorce-0.051
broken-0.050
 late-0.050
 band-0.044
 severe-0.044
Token.
Token ID13
Feature activation-6.52e-03

Pos logit contributions
ern+0.028
ad+0.026
iles+0.026
ales+0.026
zie+0.025
Neg logit contributions
 divorce-0.036
broken-0.036
 late-0.034
 severe-0.032
 helicopter-0.032
Token Lily
Token ID20037
Feature activation+3.73e-02

Pos logit contributions
ern+0.010
zie+0.009
 meanings+0.009
ad+0.009
 Lucas+0.009
Neg logit contributions
broken-0.010
 divorce-0.010
burn-0.010
 helicopter-0.009
 severe-0.009
Token was
Token ID373
Feature activation+5.16e-03

Pos logit contributions
 divorce+0.057
 late+0.053
broken+0.050
AAAA+0.049
 helicopter+0.048
Neg logit contributions
ern-0.058
ad-0.058
et-0.057
zie-0.055
alia-0.055
Token heart
Token ID2612
Feature activation+7.29e-02

Pos logit contributions
 divorce+0.005
AAAA+0.005
Sorry+0.005
B+0.005
burn+0.005
Neg logit contributions
ern-0.010
ad-0.010
et-0.009
 herself-0.009
 answers-0.009
Tokenbroken
Token ID25826
Feature activation+2.11e-01

Pos logit contributions
 divorce+0.030
AAAA+0.025
 late+0.013
Sorry+0.012
 helicopter+0.011
Neg logit contributions
ad-0.187
ern-0.187
et-0.185
ales-0.183
 Lucas-0.182
Token.
Token ID13
Feature activation-6.78e-03

Pos logit contributions
broken+0.215
 late+0.214
 divorce+0.188
 together+0.180
 badly+0.177
Neg logit contributions
ern-0.454
iles-0.450
ales-0.447
 meanings-0.432
zie-0.428
Token "
Token ID366
Feature activation+3.71e-02

Pos logit contributions
ern+0.011
iles+0.010
ad+0.010
ales+0.010
zie+0.010
Neg logit contributions
 divorce-0.010
broken-0.010
 late-0.009
 severe-0.009
 band-0.009
TokenIt
Token ID1026
Feature activation+3.40e-02

Pos logit contributions
 divorce+0.057
broken+0.053
 helicopter+0.051
 severe+0.050
 unst+0.048
Neg logit contributions
ern-0.053
ad-0.052
ori-0.049
et-0.049
ima-0.049
Token's
Token ID338
Feature activation+4.75e-02

Pos logit contributions
 divorce+0.052
broken+0.051
 sudden+0.049
 late+0.047
Sorry+0.047
Neg logit contributions
 answers-0.049
zie-0.048
et-0.047
 meanings-0.047
ad-0.046
Token not
Token ID407
Feature activation-7.41e-03

Pos logit contributions
 divorce+0.059
broken+0.057
AAAA+0.051
B+0.051
Sorry+0.051
Neg logit contributions
ern-0.084
et-0.082
ad-0.081
zie-0.080
 meanings-0.079
Token again
Token ID757
Feature activation+1.17e-01

Pos logit contributions
ern+0.187
ad+0.181
iles+0.180
et+0.179
ales+0.178
Neg logit contributions
 divorce-0.100
broken-0.098
 late-0.088
AAAA-0.083
 severe-0.082
Token.
Token ID13
Feature activation-5.85e-03

Pos logit contributions
broken+0.164
 late+0.163
 divorce+0.156
 severe+0.143
 badly+0.142
Neg logit contributions
ern-0.200
iles-0.188
 meanings-0.187
zie-0.187
ales-0.186
Token He
Token ID679
Feature activation-3.89e-02

Pos logit contributions
ern+0.010
ad+0.010
iles+0.010
ales+0.010
zie+0.010
Neg logit contributions
broken-0.008
 divorce-0.008
 late-0.007
 severe-0.007
 band-0.007
Token was
Token ID373
Feature activation+5.67e-03

Pos logit contributions
ern+0.069
ad+0.067
iles+0.066
zie+0.065
et+0.065
Neg logit contributions
 divorce-0.050
broken-0.049
 late-0.046
 severe-0.043
 helicopter-0.042
Token heart
Token ID2612
Feature activation+7.23e-02

Pos logit contributions
 divorce+0.006
Sorry+0.006
AAAA+0.006
broken+0.006
B+0.006
Neg logit contributions
ern-0.010
ad-0.010
et-0.010
zie-0.010
 answers-0.010
Tokenbroken
Token ID25826
Feature activation+2.11e-01

Pos logit contributions
 divorce+0.030
AAAA+0.021
 late+0.014
 helicopter+0.012
Sorry+0.012
Neg logit contributions
ern-0.189
ad-0.187
zie-0.185
et-0.184
ales-0.183
Token and
Token ID290
Feature activation+4.59e-02

Pos logit contributions
broken+0.252
 late+0.245
 divorce+0.235
 band+0.208
 badly+0.208
Neg logit contributions
ern-0.413
iles-0.398
ales-0.395
 meanings-0.388
zie-0.387
Token never
Token ID1239
Feature activation+3.00e-02

Pos logit contributions
broken+0.056
 divorce+0.056
burn+0.049
 late+0.049
AAAA+0.049
Neg logit contributions
ern-0.080
ad-0.080
 meanings-0.078
zie-0.077
et-0.077
Token wanted
Token ID2227
Feature activation-4.30e-06

Pos logit contributions
 divorce+0.039
broken+0.038
 late+0.035
 severe+0.034
 helicopter+0.034
Neg logit contributions
ern-0.052
ad-0.050
iles-0.050
 meanings-0.049
ales-0.049
Token to
Token ID284
Feature activation-1.30e-02

Feature not active on prev token; no predictions were affected.
Token go
Token ID467
Feature activation+9.62e-02

Pos logit contributions
ern+0.021
iles+0.021
ales+0.021
zie+0.020
 Lucas+0.020
Neg logit contributions
 divorce-0.019
broken-0.018
 late-0.017
 band-0.017
 helicopter-0.016
Token okay
Token ID8788
Feature activation+1.21e-01

Pos logit contributions
 divorce+0.027
broken+0.026
 late+0.020
 helicopter+0.018
 severe+0.017
Neg logit contributions
ern-0.121
iles-0.117
ad-0.116
ales-0.116
zie-0.115
Token,
Token ID11
Feature activation-4.00e-03

Pos logit contributions
 divorce+0.147
broken+0.146
 late+0.137
 band+0.126
 severe+0.125
Neg logit contributions
ern-0.223
ales-0.216
iles-0.214
ad-0.212
et-0.210
Token sweet
Token ID6029
Feature activation-8.97e-02

Pos logit contributions
ern+0.006
ales+0.006
iles+0.006
ad+0.006
et+0.006
Neg logit contributions
 divorce-0.006
broken-0.006
 late-0.006
 band-0.005
 helicopter-0.005
Tokenie
Token ID494
Feature activation+1.20e-03

Pos logit contributions
ern+0.148
ales+0.142
 meanings+0.140
et+0.139
artments+0.137
Neg logit contributions
broken-0.124
 divorce-0.121
 late-0.117
 band-0.114
burn-0.111
Token.
Token ID13
Feature activation-8.93e-03

Pos logit contributions
broken+0.002
 divorce+0.002
 late+0.002
 band+0.002
 severe+0.002
Neg logit contributions
ern-0.002
iles-0.002
ales-0.002
ad-0.002
 meanings-0.002
Token Acc
Token ID6366
Feature activation+2.11e-01

Pos logit contributions
ern+0.015
ad+0.014
iles+0.014
zie+0.014
ales+0.014
Neg logit contributions
broken-0.013
 divorce-0.013
 late-0.012
 severe-0.011
 band-0.011
Tokenidents
Token ID3231
Feature activation+4.06e-02

Pos logit contributions
 divorce+0.341
broken+0.318
 severe+0.316
AAAA+0.315
 late+0.313
Neg logit contributions
ern-0.297
 meanings-0.287
ales-0.282
zie-0.278
et-0.275
Token happen
Token ID1645
Feature activation+6.14e-02

Pos logit contributions
 divorce+0.071
broken+0.071
 late+0.067
 severe+0.064
 band+0.063
Neg logit contributions
ern-0.053
iles-0.050
ales-0.050
ad-0.050
 Lucas-0.049
Token.
Token ID13
Feature activation-8.91e-03

Pos logit contributions
broken+0.088
 divorce+0.087
 late+0.084
 severe+0.077
 band+0.076
Neg logit contributions
ern-0.101
ad-0.096
iles-0.095
ales-0.095
 meanings-0.093
Token Let
Token ID3914
Feature activation+1.02e-01

Pos logit contributions
ern+0.015
ad+0.014
iles+0.014
zie+0.014
 meanings+0.014
Neg logit contributions
broken-0.012
 divorce-0.012
 late-0.011
 severe-0.011
 band-0.010
Token's
Token ID338
Feature activation+4.82e-02

Pos logit contributions
broken+0.165
 divorce+0.150
 late+0.149
oing+0.146
AAAA+0.141
Neg logit contributions
zie-0.147
et-0.146
ern-0.136
 answers-0.135
ima-0.134
Token's
Token ID338
Feature activation+4.80e-02

Pos logit contributions
 divorce+0.043
broken+0.042
 late+0.039
AAAA+0.037
 sudden+0.037
Neg logit contributions
zie-0.063
ern-0.063
 answers-0.062
ad-0.062
 Lucas-0.062
Token okay
Token ID8788
Feature activation+1.21e-01

Pos logit contributions
broken+0.024
 divorce+0.024
 late+0.023
 severe+0.017
 band+0.014
Neg logit contributions
ern-0.123
iles-0.121
ales-0.121
ad-0.119
 Lucas-0.119
Token,
Token ID11
Feature activation-5.67e-04

Pos logit contributions
broken+0.145
 divorce+0.144
 late+0.134
 severe+0.124
 band+0.123
Neg logit contributions
ern-0.223
ales-0.218
iles-0.216
ad-0.214
et-0.212
Token Fiona
Token ID42207
Feature activation-7.55e-02

Pos logit contributions
ern+0.001
ales+0.001
iles+0.001
ad+0.001
et+0.001
Neg logit contributions
 divorce-0.001
broken-0.001
 late-0.001
 severe-0.001
 band-0.001
Token.
Token ID13
Feature activation-4.58e-03

Pos logit contributions
ern+0.120
iles+0.116
ales+0.114
ad+0.114
 meanings+0.111
Neg logit contributions
broken-0.111
 divorce-0.109
 late-0.103
 severe-0.097
 band-0.096
Token Acc
Token ID6366
Feature activation+2.11e-01

Pos logit contributions
ern+0.008
ad+0.007
iles+0.007
zie+0.007
ales+0.007
Neg logit contributions
broken-0.007
 divorce-0.006
 late-0.006
 severe-0.006
 band-0.006
Tokenidents
Token ID3231
Feature activation+4.00e-02

Pos logit contributions
 divorce+0.324
broken+0.302
 severe+0.297
 late+0.295
AAAA+0.292
Neg logit contributions
ern-0.319
zie-0.300
ales-0.298
 meanings-0.298
et-0.294
Token happen
Token ID1645
Feature activation+5.92e-02

Pos logit contributions
 divorce+0.064
broken+0.063
 late+0.061
 severe+0.057
AAAA+0.056
Neg logit contributions
ern-0.058
alia-0.056
ales-0.055
 Lucas-0.055
iles-0.055
Token."
Token ID526
Feature activation+4.19e-02

Pos logit contributions
broken+0.086
 divorce+0.085
 late+0.081
 severe+0.075
 band+0.074
Neg logit contributions
ern-0.096
ad-0.091
iles-0.090
ales-0.090
 meanings-0.088
Token Then
Token ID3244
Feature activation-7.54e-03

Pos logit contributions
 divorce+0.048
broken+0.048
 late+0.044
 severe+0.041
 helicopter+0.040
Neg logit contributions
ern-0.080
iles-0.077
ad-0.076
ales-0.076
zie-0.075
Token they
Token ID484
Feature activation+9.92e-02

Pos logit contributions
ern+0.014
iles+0.014
ales+0.014
 meanings+0.014
ad+0.014
Neg logit contributions
broken-0.008
 divorce-0.008
 late-0.008
 band-0.007
 sudden-0.007
Token M
Token ID337
Feature activation-2.03e-02

Pos logit contributions
ern+0.053
ales+0.051
iles+0.051
 meanings+0.050
ad+0.050
Neg logit contributions
 divorce-0.032
broken-0.031
 late-0.030
 band-0.027
 helicopter-0.027
Tokenummy
Token ID13513
Feature activation-8.07e-02

Pos logit contributions
ern+0.026
iles+0.025
ales+0.025
ad+0.024
 Lucas+0.024
Neg logit contributions
broken-0.036
 divorce-0.035
 late-0.033
 severe-0.032
 band-0.031
Token used
Token ID973
Feature activation+1.32e-02

Pos logit contributions
ern+0.139
ad+0.134
iles+0.134
ales+0.132
zie+0.131
Neg logit contributions
 divorce-0.107
broken-0.106
 late-0.098
 severe-0.092
 helicopter-0.091
Token her
Token ID607
Feature activation-2.20e-02

Pos logit contributions
 divorce+0.017
broken+0.017
 late+0.016
 band+0.016
 unst+0.015
Neg logit contributions
iles-0.023
ern-0.023
ales-0.023
zie-0.022
 meanings-0.022
Token fing
Token ID8038
Feature activation-2.16e-02

Pos logit contributions
ern+0.033
ad+0.032
 meanings+0.031
iles+0.031
et+0.031
Neg logit contributions
 divorce-0.034
broken-0.034
 late-0.032
 severe-0.030
AAAA-0.029
Tokenern
Token ID1142
Feature activation-2.80e-01

Pos logit contributions
 Lucas+0.014
 meanings+0.012
 poses+0.012
 names+0.012
iles+0.012
Neg logit contributions
broken-0.048
 badly-0.047
 divorce-0.046
Acc-0.045
 late-0.045
Tokenails
Token ID1768
Feature activation-5.86e-02

Pos logit contributions
ern+0.471
iles+0.460
 meanings+0.456
ales+0.449
ad+0.447
Neg logit contributions
broken-0.377
 divorce-0.366
 late-0.343
 band-0.328
 severe-0.320
Token to
Token ID284
Feature activation-1.01e-02

Pos logit contributions
ern+0.110
iles+0.105
ales+0.103
ad+0.102
zie+0.102
Neg logit contributions
broken-0.071
 divorce-0.069
 late-0.065
 band-0.061
 severe-0.060
Token gently
Token ID15165
Feature activation+3.26e-02

Pos logit contributions
iles+0.018
ern+0.017
alia+0.017
io+0.017
ales+0.017
Neg logit contributions
 band-0.013
broken-0.013
 divorce-0.013
 unst-0.012
 late-0.012
Token pull
Token ID2834
Feature activation-1.30e-02

Pos logit contributions
broken+0.042
 divorce+0.041
 late+0.038
 band+0.038
 unst+0.036
Neg logit contributions
iles-0.058
ern-0.057
ales-0.055
zie-0.055
 meanings-0.055
Token the
Token ID262
Feature activation+1.86e-03

Pos logit contributions
ern+0.023
iles+0.022
ales+0.022
zie+0.021
 meanings+0.021
Neg logit contributions
broken-0.017
 divorce-0.016
 late-0.016
 band-0.015
 together-0.015
Token not
Token ID407
Feature activation-7.18e-03

Pos logit contributions
broken+0.008
 late+0.008
 divorce+0.008
 severe+0.007
 sad+0.007
Neg logit contributions
ern-0.013
ales-0.013
iles-0.013
ad-0.012
 meanings-0.012
Token bigger
Token ID5749
Feature activation+3.00e-02

Pos logit contributions
ern+0.012
iles+0.012
ad+0.012
ales+0.012
et+0.012
Neg logit contributions
 divorce-0.010
broken-0.009
 late-0.008
 severe-0.008
AAAA-0.008
Token than
Token ID621
Feature activation-1.91e-01

Pos logit contributions
broken+0.045
 late+0.044
 divorce+0.044
 together+0.040
 band+0.039
Neg logit contributions
ern-0.049
zie-0.045
ad-0.044
ales-0.044
iles-0.043
Token a
Token ID257
Feature activation+4.81e-02

Pos logit contributions
ern+0.312
alia+0.282
ad+0.280
iles+0.279
zie+0.279
Neg logit contributions
 divorce-0.284
broken-0.268
 late-0.264
 helicopter-0.246
 band-0.241
Token fing
Token ID8038
Feature activation-2.00e-02

Pos logit contributions
broken+0.065
 divorce+0.064
 late+0.059
 severe+0.055
AAAA+0.054
Neg logit contributions
ern-0.081
iles-0.078
ad-0.077
et-0.076
 meanings-0.076
Tokenern
Token ID1142
Feature activation-2.80e-01

Pos logit contributions
 Lucas+0.018
 meanings+0.017
 facts+0.017
iles+0.016
 poses+0.016
Neg logit contributions
 divorce-0.039
 badly-0.037
broken-0.037
 unst-0.036
Acc-0.035
Tokenail
Token ID603
Feature activation-3.00e-02

Pos logit contributions
 meanings+0.410
ern+0.389
 Lucas+0.374
ad+0.373
iles+0.368
Neg logit contributions
broken-0.439
 divorce-0.424
 late-0.411
 band-0.379
 severe-0.372
Token.
Token ID13
Feature activation-9.77e-03

Pos logit contributions
ern+0.049
iles+0.047
ad+0.046
ales+0.046
 Lucas+0.046
Neg logit contributions
 divorce-0.042
broken-0.042
 late-0.039
 band-0.037
 helicopter-0.036
Token They
Token ID1119
Feature activation+1.05e-02

Pos logit contributions
ern+0.017
iles+0.016
ad+0.016
zie+0.016
et+0.016
Neg logit contributions
 divorce-0.012
broken-0.012
 late-0.012
AAAA-0.011
 helicopter-0.011
Token also
Token ID635
Feature activation-4.57e-02

Pos logit contributions
broken+0.012
 divorce+0.012
 band+0.011
 helicopter+0.011
 late+0.010
Neg logit contributions
ern-0.020
ales-0.019
zie-0.019
iles-0.019
ad-0.018
Token found
Token ID1043
Feature activation-2.95e-02

Pos logit contributions
ern+0.093
ales+0.089
iles+0.088
zie+0.086
 Lucas+0.083
Neg logit contributions
broken-0.048
 divorce-0.045
 destroyed-0.041
 band-0.041
 late-0.041
Token the
Token ID262
Feature activation-5.95e-03

Pos logit contributions
ern+0.005
iles+0.005
ales+0.005
 meanings+0.005
zie+0.005
Neg logit contributions
broken-0.003
 divorce-0.003
 late-0.003
 band-0.003
 severe-0.002
Token dove
Token ID38862
Feature activation-3.83e-03

Pos logit contributions
ern+0.010
iles+0.009
et+0.009
ales+0.009
ad+0.009
Neg logit contributions
 divorce-0.009
broken-0.008
 late-0.008
 severe-0.007
 million-0.007
Token with
Token ID351
Feature activation-5.09e-02

Pos logit contributions
et+0.006
ern+0.005
ales+0.005
ad+0.005
zie+0.005
Neg logit contributions
 divorce-0.006
broken-0.006
AAAA-0.006
Sorry-0.006
 late-0.006
Token her
Token ID607
Feature activation-2.69e-02

Pos logit contributions
ern+0.074
ad+0.070
iles+0.070
ales+0.069
zie+0.068
Neg logit contributions
 divorce-0.082
broken-0.082
 late-0.076
 severe-0.072
 helicopter-0.071
Token fing
Token ID8038
Feature activation-1.66e-02

Pos logit contributions
ern+0.041
ad+0.039
et+0.038
iles+0.038
ales+0.038
Neg logit contributions
 divorce-0.042
broken-0.040
 late-0.039
AAAA-0.036
 severe-0.036
Tokenern
Token ID1142
Feature activation-2.76e-01

Pos logit contributions
 Lucas+0.010
 meanings+0.009
 poses+0.008
 facts+0.008
iles+0.008
Neg logit contributions
broken-0.038
 divorce-0.038
 badly-0.038
 band-0.036
B-0.036
Tokenails
Token ID1768
Feature activation-5.52e-02

Pos logit contributions
ern+0.424
 meanings+0.419
iles+0.416
ad+0.410
 Lucas+0.408
Neg logit contributions
broken-0.405
 divorce-0.391
 late-0.371
 band-0.351
 severe-0.348
Token.
Token ID13
Feature activation-4.69e-03

Pos logit contributions
ern+0.087
iles+0.083
ales+0.081
ad+0.081
zie+0.080
Neg logit contributions
broken-0.083
 divorce-0.081
 late-0.077
 band-0.073
 severe-0.072
Token The
Token ID383
Feature activation+2.45e-02

Pos logit contributions
ern+0.006
 meanings+0.005
zie+0.005
ad+0.005
 Lucas+0.005
Neg logit contributions
 divorce-0.009
broken-0.009
 severe-0.008
burn-0.008
AAAA-0.008
Token dove
Token ID38862
Feature activation-3.84e-03

Pos logit contributions
 divorce+0.040
broken+0.039
 late+0.038
 million+0.036
 severe+0.035
Neg logit contributions
ern-0.035
iles-0.033
ad-0.033
et-0.033
ori-0.032
Token was
Token ID373
Feature activation+6.98e-03

Pos logit contributions
et+0.006
ad+0.005
ern+0.005
zie+0.005
 scales+0.005
Neg logit contributions
 divorce-0.006
 late-0.006
broken-0.005
arre-0.005
AAAA-0.005
Token lots
Token ID6041
Feature activation-4.15e-03

Pos logit contributions
ern+0.101
zie+0.093
iles+0.092
ales+0.092
alia+0.090
Neg logit contributions
 late-0.070
 divorce-0.070
broken-0.068
 band-0.066
 helicopter-0.065
Token of
Token ID286
Feature activation+9.21e-03

Pos logit contributions
ern+0.010
iles+0.009
zie+0.009
ales+0.009
ad+0.009
Neg logit contributions
broken-0.003
 divorce-0.003
 late-0.003
 band-0.002
 severe-0.002
Token words
Token ID2456
Feature activation-1.28e-01

Pos logit contributions
 late+0.012
 band+0.012
 divorce+0.011
 helicopter+0.011
 million+0.011
Neg logit contributions
ern-0.017
zie-0.016
io-0.016
ales-0.016
alia-0.016
Token and
Token ID290
Feature activation+4.68e-02

Pos logit contributions
ern+0.222
iles+0.212
zie+0.209
ales+0.207
ad+0.204
Neg logit contributions
broken-0.169
 divorce-0.167
 late-0.165
 together-0.154
 helicopter-0.151
Token their
Token ID511
Feature activation+4.40e-02

Pos logit contributions
 divorce+0.069
 late+0.067
broken+0.066
 band+0.064
 sudden+0.063
Neg logit contributions
ern-0.077
iles-0.071
ales-0.070
io-0.070
zie-0.069
Token meanings
Token ID26368
Feature activation-2.68e-01

Pos logit contributions
 divorce+0.064
 helicopter+0.063
broken+0.062
 band+0.061
 sudden+0.061
Neg logit contributions
ern-0.076
ales-0.067
alia-0.066
iles-0.065
ad-0.064
Token".
Token ID1911
Feature activation+3.61e-02

Pos logit contributions
ern+0.452
iles+0.427
zie+0.426
alia+0.426
ales+0.420
Neg logit contributions
broken-0.362
 divorce-0.349
 late-0.343
 together-0.330
 helicopter-0.322
Token\n
Token ID198
Feature activation+2.86e-02

Pos logit contributions
 divorce+0.042
broken+0.041
 late+0.040
 band+0.036
 helicopter+0.035
Neg logit contributions
ern-0.067
iles-0.066
ales-0.065
et-0.064
ad-0.064
Token\n
Token ID198
Feature activation+2.82e-02

Pos logit contributions
 divorce+0.025
broken+0.025
 late+0.025
B+0.022
 band+0.022
Neg logit contributions
ern-0.060
ales-0.060
et-0.059
iles-0.059
 meanings-0.058
TokenM
Token ID44
Feature activation-2.58e-02

Pos logit contributions
 divorce+0.034
broken+0.034
 late+0.031
 severe+0.028
 band+0.028
Neg logit contributions
ern-0.052
iles-0.050
ad-0.050
ales-0.050
zie-0.049
Tokenolly
Token ID5098
Feature activation-7.73e-02

Pos logit contributions
ern+0.033
ales+0.032
et+0.032
zie+0.031
 Lucas+0.030
Neg logit contributions
broken-0.044
 divorce-0.044
 late-0.041
 band-0.040
 severe-0.039
Token new
Token ID649
Feature activation+3.70e-02

Pos logit contributions
 late+0.067
broken+0.064
 divorce+0.061
 helicopter+0.061
 band+0.060
Neg logit contributions
ern-0.119
 meanings-0.113
ales-0.112
iles-0.111
alia-0.109
Token words
Token ID2456
Feature activation-1.24e-01

Pos logit contributions
broken+0.050
 divorce+0.050
 helicopter+0.048
 band+0.047
 late+0.047
Neg logit contributions
ern-0.063
ales-0.060
iles-0.059
ad-0.057
zie-0.057
Token and
Token ID290
Feature activation+5.15e-02

Pos logit contributions
ern+0.212
iles+0.201
ales+0.199
zie+0.196
ad+0.193
Neg logit contributions
broken-0.170
 divorce-0.166
 late-0.161
 band-0.150
 helicopter-0.149
Token learn
Token ID2193
Feature activation+5.58e-03

Pos logit contributions
 divorce+0.067
broken+0.067
 late+0.065
 band+0.063
 helicopter+0.061
Neg logit contributions
ern-0.093
iles-0.088
ales-0.086
io-0.083
ad-0.081
Token their
Token ID511
Feature activation+4.39e-02

Pos logit contributions
 divorce+0.007
broken+0.007
 late+0.007
 helicopter+0.007
 band+0.007
Neg logit contributions
ern-0.010
zie-0.009
ales-0.009
iles-0.009
alia-0.009
Token meanings
Token ID26368
Feature activation-2.67e-01

Pos logit contributions
 divorce+0.070
broken+0.067
 late+0.063
 band+0.062
 helicopter+0.061
Neg logit contributions
ern-0.068
ales-0.065
zie-0.063
iles-0.062
ad-0.062
Token.
Token ID13
Feature activation-4.37e-03

Pos logit contributions
ern+0.462
alia+0.430
zie+0.429
ales+0.427
iles+0.427
Neg logit contributions
broken-0.355
 divorce-0.340
 late-0.339
 together-0.325
 helicopter-0.311
Token One
Token ID1881
Feature activation-8.91e-02

Pos logit contributions
ern+0.008
iles+0.008
ales+0.008
et+0.008
ad+0.008
Neg logit contributions
 divorce-0.005
 late-0.005
broken-0.005
AAAA-0.004
 million-0.004
Token day
Token ID1110
Feature activation+4.54e-02

Pos logit contributions
ern+0.162
ales+0.155
iles+0.151
zie+0.150
ad+0.150
Neg logit contributions
 divorce-0.114
broken-0.110
 late-0.107
 band-0.100
 helicopter-0.099
Token,
Token ID11
Feature activation-2.52e-03

Pos logit contributions
 divorce+0.058
broken+0.057
 late+0.053
 severe+0.049
 band+0.049
Neg logit contributions
ern-0.081
iles-0.078
ales-0.077
ad-0.077
zie-0.076
Token Tim
Token ID5045
Feature activation+1.30e-02

Pos logit contributions
ern+0.005
iles+0.005
ales+0.005
ad+0.005
zie+0.005
Neg logit contributions
 divorce-0.003
broken-0.003
 late-0.002
 band-0.002
 severe-0.002
Token all
Token ID477
Feature activation-9.97e-02

Pos logit contributions
iles+0.051
ales+0.050
ern+0.049
zie+0.049
io+0.049
Neg logit contributions
 late-0.026
 million-0.026
 divorce-0.026
 band-0.024
 severe-0.023
Token the
Token ID262
Feature activation-5.06e-03

Pos logit contributions
ern+0.164
iles+0.163
io+0.157
olo+0.156
ales+0.155
Neg logit contributions
 late-0.141
 divorce-0.131
 together-0.131
 million-0.127
broken-0.123
Token words
Token ID2456
Feature activation-1.23e-01

Pos logit contributions
ern+0.009
io+0.009
ales+0.008
zie+0.008
olo+0.008
Neg logit contributions
 late-0.007
 little-0.007
 sad-0.006
 million-0.006
 lazy-0.006
Token and
Token ID290
Feature activation+4.59e-02

Pos logit contributions
ern+0.225
iles+0.217
ales+0.214
ad+0.212
zie+0.212
Neg logit contributions
 divorce-0.152
broken-0.151
 late-0.142
 helicopter-0.130
 band-0.129
Token their
Token ID511
Feature activation+4.83e-02

Pos logit contributions
 divorce+0.064
broken+0.062
 late+0.061
 band+0.058
 million+0.057
Neg logit contributions
ern-0.079
iles-0.076
ales-0.076
zie-0.072
ad-0.072
Token meanings
Token ID26368
Feature activation-2.56e-01

Pos logit contributions
 divorce+0.090
 sad+0.088
broken+0.087
 sudden+0.086
 band+0.086
Neg logit contributions
ern-0.065
ales-0.056
zie-0.054
alia-0.052
iles-0.052
Token in
Token ID287
Feature activation-3.20e-02

Pos logit contributions
ern+0.388
iles+0.371
ales+0.364
ad+0.361
zie+0.360
Neg logit contributions
broken-0.397
 divorce-0.392
 late-0.370
 severe-0.347
 helicopter-0.346
Token it
Token ID340
Feature activation+7.79e-02

Pos logit contributions
ern+0.056
iles+0.055
zie+0.054
ales+0.054
alia+0.053
Neg logit contributions
 divorce-0.040
broken-0.038
 late-0.037
 million-0.037
 helicopter-0.037
Token.
Token ID13
Feature activation-6.18e-03

Pos logit contributions
 divorce+0.098
broken+0.096
 late+0.096
 together+0.092
 band+0.086
Neg logit contributions
ern-0.141
iles-0.134
ales-0.132
ad-0.130
zie-0.129
Token\n
Token ID198
Feature activation+3.30e-02

Pos logit contributions
ern+0.010
zie+0.009
 meanings+0.009
ad+0.009
 Lucas+0.009
Neg logit contributions
broken-0.010
 divorce-0.009
 severe-0.009
burn-0.008
 helicopter-0.008
Token\n
Token ID198
Feature activation+3.30e-02

Pos logit contributions
broken+0.042
 divorce+0.041
 late+0.036
 severe+0.035
 helicopter+0.035
Neg logit contributions
ern-0.060
ad-0.057
iles-0.057
ales-0.056
zie-0.056
Token.
Token ID13
Feature activation-1.38e-02

Pos logit contributions
ern+0.141
iles+0.136
ales+0.136
ad+0.135
et+0.133
Neg logit contributions
 divorce-0.149
broken-0.148
 late-0.137
AAAA-0.134
 severe-0.130
Token They
Token ID1119
Feature activation+1.05e-02

Pos logit contributions
ern+0.024
iles+0.023
ad+0.022
ales+0.022
zie+0.022
Neg logit contributions
broken-0.019
 divorce-0.018
 late-0.017
 severe-0.016
 band-0.016
Token have
Token ID423
Feature activation+2.61e-02

Pos logit contributions
 divorce+0.014
broken+0.014
 late+0.013
 band+0.013
 severe+0.012
Neg logit contributions
ern-0.018
zie-0.017
iles-0.017
ales-0.017
 Lucas-0.017
Token stories
Token ID3923
Feature activation-8.19e-02

Pos logit contributions
 divorce+0.036
broken+0.036
 late+0.033
 band+0.032
 severe+0.031
Neg logit contributions
ern-0.044
iles-0.042
ales-0.042
ad-0.042
zie-0.042
Token and
Token ID290
Feature activation+4.06e-02

Pos logit contributions
ern+0.134
ad+0.130
iles+0.129
ales+0.128
et+0.128
Neg logit contributions
broken-0.112
 divorce-0.111
 late-0.099
AAAA-0.098
 severe-0.097
Token meanings
Token ID26368
Feature activation-2.53e-01

Pos logit contributions
 divorce+0.067
broken+0.065
 late+0.062
 band+0.060
 severe+0.059
Neg logit contributions
ern-0.059
iles-0.055
ales-0.055
zie-0.055
ad-0.054
Token that
Token ID326
Feature activation-1.04e-01

Pos logit contributions
ern+0.367
iles+0.350
ad+0.347
ales+0.346
zie+0.339
Neg logit contributions
 divorce-0.404
broken-0.402
 late-0.376
 severe-0.357
 band-0.353
Token you
Token ID345
Feature activation-1.06e-02

Pos logit contributions
ern+0.174
zie+0.159
ales+0.159
iles+0.159
ad+0.158
Neg logit contributions
 divorce-0.155
broken-0.143
 late-0.142
 band-0.135
 helicopter-0.133
Token may
Token ID743
Feature activation+1.12e-02

Pos logit contributions
ern+0.022
zie+0.021
io+0.020
alia+0.019
iles+0.019
Neg logit contributions
 divorce-0.011
broken-0.011
 band-0.011
 late-0.010
 destroyed-0.010
Token not
Token ID407
Feature activation-1.01e-02

Pos logit contributions
 divorce+0.012
 late+0.012
broken+0.011
 severe+0.011
 sad+0.011
Neg logit contributions
io-0.023
ern-0.023
ales-0.023
et-0.022
iles-0.022
Token understand
Token ID1833
Feature activation+1.65e-02

Pos logit contributions
ern+0.018
zie+0.017
io+0.017
 Lucas+0.016
iles+0.016
Neg logit contributions
 divorce-0.014
broken-0.014
 late-0.013
 band-0.012
 severe-0.012
Token saw
Token ID2497
Feature activation-3.51e-02

Pos logit contributions
ales+0.108
ern+0.106
iles+0.106
ad+0.101
zie+0.100
Neg logit contributions
broken-0.058
 divorce-0.055
 band-0.054
 late-0.052
 sudden-0.049
Token many
Token ID867
Feature activation-4.51e-02

Pos logit contributions
ern+0.061
ales+0.058
iles+0.057
zie+0.056
ad+0.055
Neg logit contributions
 divorce-0.049
broken-0.047
 late-0.046
 band-0.044
 helicopter-0.044
Token words
Token ID2456
Feature activation-1.18e-01

Pos logit contributions
ern+0.085
ales+0.079
zie+0.079
io+0.078
stack+0.076
Neg logit contributions
 million-0.057
 sudden-0.057
 band-0.057
broken-0.056
 late-0.056
Token and
Token ID290
Feature activation+4.51e-02

Pos logit contributions
ern+0.239
ales+0.231
iles+0.231
zie+0.228
 meanings+0.220
Neg logit contributions
 divorce-0.124
broken-0.123
 late-0.119
 helicopter-0.109
 together-0.108
Token their
Token ID511
Feature activation+4.54e-02

Pos logit contributions
 divorce+0.067
broken+0.066
 late+0.063
 sudden+0.062
 band+0.062
Neg logit contributions
ern-0.074
ales-0.070
zie-0.069
iles-0.069
io-0.065
Token meanings
Token ID26368
Feature activation-2.53e-01

Pos logit contributions
 divorce+0.075
 helicopter+0.075
 sad+0.074
 band+0.073
broken+0.073
Neg logit contributions
ern-0.071
ales-0.063
alia-0.061
iles-0.059
zie-0.059
Token.
Token ID13
Feature activation-8.95e-03

Pos logit contributions
ern+0.380
iles+0.362
ales+0.358
zie+0.354
ad+0.350
Neg logit contributions
broken-0.395
 divorce-0.387
 late-0.369
 severe-0.345
 helicopter-0.345
Token She
Token ID1375
Feature activation-5.40e-02

Pos logit contributions
ern+0.014
iles+0.014
ales+0.014
ad+0.013
et+0.013
Neg logit contributions
 divorce-0.013
broken-0.013
 late-0.013
 band-0.012
 helicopter-0.012
Token tried
Token ID3088
Feature activation+8.27e-02

Pos logit contributions
ales+0.113
ern+0.110
iles+0.106
zie+0.105
ad+0.100
Neg logit contributions
broken-0.057
 divorce-0.053
 band-0.052
 late-0.049
 sudden-0.048
Token to
Token ID284
Feature activation-1.62e-02

Pos logit contributions
 divorce+0.078
 late+0.071
broken+0.070
 band+0.065
 together+0.063
Neg logit contributions
ern-0.181
ales-0.179
iles-0.177
zie-0.174
ad-0.169
Token read
Token ID1100
Feature activation-3.50e-02

Pos logit contributions
iles+0.033
io+0.033
ales+0.032
alia+0.032
ern+0.032
Neg logit contributions
 band-0.018
broken-0.016
 divorce-0.015
 upset-0.015
 sad-0.014
Token lots
Token ID6041
Feature activation+5.49e-03

Pos logit contributions
ern+0.100
zie+0.095
ales+0.095
iles+0.094
alia+0.090
Neg logit contributions
 million-0.054
 late-0.052
 divorce-0.051
broken-0.049
 helicopter-0.049
Token of
Token ID286
Feature activation+1.62e-02

Pos logit contributions
 divorce+0.006
broken+0.006
 late+0.005
AAAA+0.005
 severe+0.005
Neg logit contributions
ern-0.011
ad-0.011
iles-0.011
ales-0.011
 meanings-0.011
Token words
Token ID2456
Feature activation-1.21e-01

Pos logit contributions
 million+0.022
 sudden+0.021
 late+0.021
 helicopter+0.021
 band+0.021
Neg logit contributions
ern-0.029
zie-0.028
io-0.027
ales-0.027
iles-0.027
Token and
Token ID290
Feature activation+4.88e-02

Pos logit contributions
ern+0.208
iles+0.199
ales+0.196
zie+0.195
ad+0.194
Neg logit contributions
broken-0.163
 divorce-0.162
 late-0.153
 helicopter-0.141
 band-0.140
Token their
Token ID511
Feature activation+4.92e-02

Pos logit contributions
 divorce+0.067
broken+0.065
 late+0.062
 band+0.058
 severe+0.057
Neg logit contributions
ern-0.084
iles-0.081
ales-0.080
ad-0.079
zie-0.078
Token meanings
Token ID26368
Feature activation-2.52e-01

Pos logit contributions
 divorce+0.095
broken+0.094
 sudden+0.092
 band+0.091
 helicopter+0.089
Neg logit contributions
ern-0.063
ales-0.056
zie-0.053
alia-0.052
iles-0.051
Token.
Token ID13
Feature activation-1.50e-03

Pos logit contributions
ern+0.326
ad+0.312
iles+0.306
ales+0.305
et+0.301
Neg logit contributions
 divorce-0.444
broken-0.434
 late-0.408
AAAA-0.396
 severe-0.393
Token Lily
Token ID20037
Feature activation+3.79e-02

Pos logit contributions
ern+0.002
 meanings+0.002
zie+0.002
artments+0.002
 herself+0.002
Neg logit contributions
broken-0.003
 divorce-0.003
 severe-0.003
burn-0.002
AAAA-0.002
Token looked
Token ID3114
Feature activation+8.11e-03

Pos logit contributions
 divorce+0.056
broken+0.051
 late+0.051
 severe+0.047
 helicopter+0.047
Neg logit contributions
ern-0.062
ad-0.060
et-0.058
 meanings-0.058
zie-0.058
Token through
Token ID832
Feature activation-3.16e-02

Pos logit contributions
 divorce+0.013
broken+0.013
 late+0.012
 severe+0.012
burn+0.012
Neg logit contributions
ern-0.011
et-0.011
 meanings-0.011
ales-0.010
iles-0.010
Token the
Token ID262
Feature activation-2.96e-03

Pos logit contributions
ern+0.062
iles+0.061
ales+0.060
zie+0.058
ogo+0.057
Neg logit contributions
 divorce-0.036
 late-0.036
 helicopter-0.035
broken-0.034
 sudden-0.034
Token been
Token ID587
Feature activation+2.41e-02

Pos logit contributions
ern+0.039
iles+0.037
ales+0.037
ad+0.037
zie+0.036
Neg logit contributions
 divorce-0.031
broken-0.031
 late-0.028
 severe-0.027
 band-0.026
Token caught
Token ID4978
Feature activation-5.14e-02

Pos logit contributions
broken+0.029
 divorce+0.029
 late+0.027
 severe+0.025
 band+0.024
Neg logit contributions
ern-0.045
iles-0.043
ales-0.043
ad-0.043
zie-0.043
Token in
Token ID287
Feature activation-2.86e-02

Pos logit contributions
ern+0.089
iles+0.084
ales+0.084
zie+0.083
 meanings+0.083
Neg logit contributions
broken-0.069
 divorce-0.067
 late-0.065
 severe-0.060
 band-0.059
Token some
Token ID617
Feature activation-2.90e-02

Pos logit contributions
ern+0.054
ales+0.051
zie+0.051
iles+0.051
io+0.050
Neg logit contributions
 late-0.036
 divorce-0.034
broken-0.034
 severe-0.032
 band-0.031
Token b
Token ID275
Feature activation-3.43e-02

Pos logit contributions
ern+0.047
ales+0.043
iles+0.043
ad+0.043
zie+0.043
Neg logit contributions
 divorce-0.043
broken-0.042
 late-0.040
 band-0.038
 severe-0.038
Tokenram
Token ID859
Feature activation-2.48e-01

Pos logit contributions
 Lucas+0.056
ern+0.054
 meanings+0.052
icity+0.052
zie+0.052
Neg logit contributions
broken-0.048
 late-0.044
 divorce-0.043
 severe-0.043
 badly-0.042
Tokenbles
Token ID7689
Feature activation-1.40e-01

Pos logit contributions
ern+0.350
iles+0.329
 meanings+0.326
ad+0.326
ales+0.322
Neg logit contributions
broken-0.411
 divorce-0.400
 late-0.379
 severe-0.363
 band-0.356
Token and
Token ID290
Feature activation+5.00e-02

Pos logit contributions
ern+0.208
iles+0.199
ales+0.199
ad+0.198
zie+0.193
Neg logit contributions
 divorce-0.217
broken-0.215
 late-0.200
AAAA-0.190
 severe-0.190
Token so
Token ID523
Feature activation-2.09e-02

Pos logit contributions
 divorce+0.062
broken+0.061
 late+0.057
 band+0.053
 severe+0.053
Neg logit contributions
ern-0.093
iles-0.088
ales-0.088
ad-0.086
zie-0.086
Token he
Token ID339
Feature activation+2.63e-02

Pos logit contributions
ern+0.038
ales+0.036
iles+0.036
ad+0.036
zie+0.036
Neg logit contributions
broken-0.027
 divorce-0.026
 late-0.025
 severe-0.023
 helicopter-0.023
Token carefully
Token ID7773
Feature activation-6.17e-03

Pos logit contributions
 divorce+0.032
broken+0.031
 late+0.029
 severe+0.027
 helicopter+0.026
Neg logit contributions
ern-0.049
iles-0.047
ad-0.047
ales-0.046
zie-0.046
Token try
Token ID1949
Feature activation+2.13e-02

Pos logit contributions
ern+0.014
iles+0.014
ales+0.013
ad+0.013
 meanings+0.013
Neg logit contributions
 divorce-0.013
broken-0.013
 late-0.012
 band-0.011
 severe-0.011
Token to
Token ID284
Feature activation-9.11e-03

Pos logit contributions
broken+0.021
 divorce+0.021
 late+0.020
 band+0.019
 severe+0.018
Neg logit contributions
ern-0.043
iles-0.042
 meanings-0.042
ales-0.042
zie-0.042
Token loosen
Token ID40953
Feature activation+3.45e-02

Pos logit contributions
ern+0.016
iles+0.015
 meanings+0.015
io+0.015
alia+0.015
Neg logit contributions
broken-0.012
 divorce-0.012
 band-0.011
 late-0.011
 severe-0.011
Token the
Token ID262
Feature activation+3.31e-04

Pos logit contributions
 late+0.038
 divorce+0.035
 together+0.034
 helicopter+0.034
broken+0.034
Neg logit contributions
ern-0.068
 meanings-0.065
zie-0.065
iles-0.065
ori-0.063
Token b
Token ID275
Feature activation-3.27e-02

Pos logit contributions
broken+0.000
 divorce+0.000
 late+0.000
 band+0.000
 helicopter+0.000
Neg logit contributions
ern-0.001
ales-0.001
iles-0.001
zie-0.001
ad-0.001
Tokenram
Token ID859
Feature activation-2.47e-01

Pos logit contributions
 Lucas+0.055
 meanings+0.055
zie+0.053
artments+0.051
ern+0.051
Neg logit contributions
broken-0.042
ite-0.039
 badly-0.039
oing-0.037
 severe-0.036
Tokenbles
Token ID7689
Feature activation-1.39e-01

Pos logit contributions
ern+0.355
 meanings+0.347
iles+0.335
ad+0.329
ales+0.329
Neg logit contributions
broken-0.399
 divorce-0.382
 late-0.364
 severe-0.350
 band-0.348
Token.
Token ID13
Feature activation-3.31e-03

Pos logit contributions
ern+0.225
iles+0.216
 meanings+0.213
ad+0.213
zie+0.211
Neg logit contributions
broken-0.198
 divorce-0.195
 late-0.185
 severe-0.173
 band-0.172
Token As
Token ID1081
Feature activation+7.29e-02

Pos logit contributions
ern+0.005
iles+0.005
ad+0.005
ales+0.005
zie+0.005
Neg logit contributions
 divorce-0.005
broken-0.005
 late-0.005
 severe-0.005
 helicopter-0.005
Token he
Token ID339
Feature activation+2.75e-02

Pos logit contributions
broken+0.108
 divorce+0.107
 late+0.095
AAAA+0.094
 severe+0.093
Neg logit contributions
ern-0.113
ad-0.110
iles-0.110
 Lucas-0.108
ales-0.107
Token was
Token ID373
Feature activation+1.11e-02

Pos logit contributions
 divorce+0.037
broken+0.036
 late+0.033
 severe+0.031
 helicopter+0.031
Neg logit contributions
ern-0.048
ad-0.047
iles-0.046
et-0.046
ales-0.045
Token was
Token ID373
Feature activation+1.11e-02

Pos logit contributions
 divorce+0.037
broken+0.036
 late+0.033
 severe+0.031
 helicopter+0.031
Neg logit contributions
ern-0.048
ad-0.047
iles-0.046
et-0.046
ales-0.045
Token pin
Token ID6757
Feature activation-1.68e-02

Pos logit contributions
 divorce+0.015
broken+0.015
 late+0.014
 severe+0.013
 helicopter+0.013
Neg logit contributions
ern-0.019
iles-0.018
ad-0.018
ales-0.018
zie-0.018
Tokenching
Token ID10813
Feature activation+1.14e-01

Pos logit contributions
ern+0.033
ad+0.032
iles+0.032
et+0.032
ales+0.031
Neg logit contributions
 divorce-0.019
broken-0.017
 late-0.016
 helicopter-0.015
 severe-0.015
Token the
Token ID262
Feature activation+8.55e-04

Pos logit contributions
 divorce+0.157
broken+0.156
 late+0.148
 helicopter+0.138
 band+0.138
Neg logit contributions
ern-0.194
iles-0.183
ales-0.182
zie-0.180
ad-0.179
Token b
Token ID275
Feature activation-3.22e-02

Pos logit contributions
 divorce+0.001
broken+0.001
 late+0.001
 band+0.001
 helicopter+0.001
Neg logit contributions
ern-0.002
ales-0.001
iles-0.001
ad-0.001
zie-0.001
Tokenram
Token ID859
Feature activation-2.46e-01

Pos logit contributions
 meanings+0.055
 Lucas+0.055
zie+0.053
wered+0.053
icity+0.053
Neg logit contributions
ite-0.039
broken-0.038
 badly-0.036
ended-0.034
 unst-0.034
Tokenbles
Token ID7689
Feature activation-1.39e-01

Pos logit contributions
ern+0.357
 meanings+0.343
iles+0.331
ales+0.330
ad+0.328
Neg logit contributions
broken-0.397
 divorce-0.377
 late-0.359
 severe-0.346
 band-0.344
Token,
Token ID11
Feature activation+1.69e-03

Pos logit contributions
ern+0.235
iles+0.226
ad+0.225
ales+0.224
zie+0.220
Neg logit contributions
 divorce-0.189
broken-0.187
 late-0.173
 severe-0.162
 helicopter-0.160
Token the
Token ID262
Feature activation+1.14e-03

Pos logit contributions
broken+0.003
 divorce+0.003
 late+0.002
 severe+0.002
 band+0.002
Neg logit contributions
ern-0.003
ad-0.003
iles-0.002
ales-0.002
 Lucas-0.002
Token raven
Token ID37735
Feature activation-3.44e-02

Pos logit contributions
 divorce+0.001
broken+0.001
 band+0.001
 helicopter+0.001
 late+0.001
Neg logit contributions
ern-0.002
ales-0.002
zie-0.002
iles-0.002
ad-0.002
Token fl
Token ID781
Feature activation+3.08e-02

Pos logit contributions
ern+0.059
iles+0.058
ad+0.058
et+0.058
ales+0.057
Neg logit contributions
 divorce-0.045
broken-0.042
 late-0.041
 severe-0.038
AAAA-0.037
Token as
Token ID355
Feature activation-4.64e-02

Pos logit contributions
broken+0.109
 divorce+0.106
 late+0.102
 band+0.097
 helicopter+0.097
Neg logit contributions
ern-0.121
iles-0.118
zie-0.114
ales-0.114
ad-0.113
Token he
Token ID339
Feature activation+2.12e-02

Pos logit contributions
ern+0.090
iles+0.086
ales+0.085
ad+0.085
zie+0.084
Neg logit contributions
 divorce-0.052
broken-0.052
 late-0.048
 severe-0.044
 helicopter-0.043
Token opened
Token ID4721
Feature activation+4.60e-02

Pos logit contributions
broken+0.024
 divorce+0.023
 late+0.022
 band+0.021
 severe+0.020
Neg logit contributions
ern-0.041
ales-0.039
iles-0.038
zie-0.038
ad-0.038
Token the
Token ID262
Feature activation+1.23e-03

Pos logit contributions
 late+0.056
broken+0.054
 together+0.051
 divorce+0.051
 helicopter+0.050
Neg logit contributions
iles-0.086
ern-0.086
 meanings-0.083
ales-0.082
alia-0.080
Token ch
Token ID442
Feature activation-3.49e-02

Pos logit contributions
 divorce+0.002
broken+0.002
 helicopter+0.002
 late+0.002
 band+0.002
Neg logit contributions
ern-0.002
ales-0.002
iles-0.002
ad-0.002
zie-0.002
Tokenocol
Token ID4668
Feature activation-2.44e-01

Pos logit contributions
ern+0.041
 meanings+0.039
zie+0.038
ales+0.037
 Lucas+0.036
Neg logit contributions
broken-0.065
 divorce-0.063
 band-0.062
 helicopter-0.061
 late-0.060
Tokenates
Token ID689
Feature activation-1.79e-01

Pos logit contributions
ern+0.189
 meanings+0.186
zie+0.180
 Lucas+0.172
 answers+0.172
Neg logit contributions
broken-0.520
burn-0.484
 late-0.484
 band-0.478
 together-0.473
Token and
Token ID290
Feature activation+4.39e-02

Pos logit contributions
ern+0.309
 meanings+0.296
iles+0.294
ad+0.289
zie+0.288
Neg logit contributions
broken-0.243
 divorce-0.236
 late-0.226
 band-0.210
 helicopter-0.208
Token started
Token ID2067
Feature activation+4.17e-02

Pos logit contributions
 divorce+0.056
broken+0.054
 late+0.053
 band+0.051
 helicopter+0.050
Neg logit contributions
ern-0.080
iles-0.077
ales-0.076
ad-0.074
zie-0.073
Token eating
Token ID6600
Feature activation-5.82e-02

Pos logit contributions
broken+0.050
 divorce+0.049
 late+0.047
 band+0.045
 severe+0.043
Neg logit contributions
ern-0.078
iles-0.076
ales-0.075
zie-0.074
 meanings-0.073
Token them
Token ID606
Feature activation-2.12e-02

Pos logit contributions
ern+0.108
zie+0.104
iles+0.101
alia+0.100
ales+0.100
Neg logit contributions
 late-0.070
 together-0.070
 band-0.068
 divorce-0.067
 helicopter-0.066
Token They
Token ID1119
Feature activation+2.24e-03

Pos logit contributions
ern+0.017
iles+0.017
ad+0.016
ales+0.016
et+0.016
Neg logit contributions
 divorce-0.018
 late-0.018
broken-0.018
 helicopter-0.017
B-0.016
Token knew
Token ID2993
Feature activation-1.47e-03

Pos logit contributions
broken+0.002
 band+0.002
 divorce+0.002
B+0.002
 helicopter+0.002
Neg logit contributions
ern-0.005
iles-0.004
 meanings-0.004
ad-0.004
ales-0.004
Token there
Token ID612
Feature activation+5.04e-02

Pos logit contributions
ern+0.003
iles+0.003
ad+0.003
ales+0.003
 Lucas+0.002
Neg logit contributions
broken-0.002
 divorce-0.002
 late-0.002
AAAA-0.002
 severe-0.002
Token were
Token ID547
Feature activation-3.51e-02

Pos logit contributions
 divorce+0.062
broken+0.060
 late+0.055
 helicopter+0.052
 band+0.052
Neg logit contributions
ern-0.094
iles-0.089
ales-0.089
ad-0.088
zie-0.087
Token crocod
Token ID37565
Feature activation-8.27e-02

Pos logit contributions
ern+0.064
ad+0.059
zie+0.058
ales+0.058
alia+0.058
Neg logit contributions
 divorce-0.046
 late-0.044
broken-0.042
 helicopter-0.039
 band-0.039
Tokeniles
Token ID2915
Feature activation-2.43e-01

Pos logit contributions
ern+0.015
iles+0.008
ad+0.008
ales+0.007
zie+0.006
Neg logit contributions
 divorce-0.238
broken-0.237
 late-0.229
 severe-0.223
 band-0.222
Token in
Token ID287
Feature activation-4.47e-02

Pos logit contributions
iles+0.377
ern+0.374
ales+0.367
ad+0.364
et+0.360
Neg logit contributions
 divorce-0.359
broken-0.355
 late-0.327
 severe-0.313
 band-0.308
Token the
Token ID262
Feature activation-1.26e-02

Pos logit contributions
ern+0.078
ales+0.073
 meanings+0.073
iles+0.072
zie+0.071
Neg logit contributions
 divorce-0.060
 late-0.059
broken-0.058
 helicopter-0.053
 severe-0.053
Token water
Token ID1660
Feature activation-2.09e-02

Pos logit contributions
ern+0.021
ales+0.019
 meanings+0.019
iles+0.019
ad+0.019
Neg logit contributions
 divorce-0.018
broken-0.018
 late-0.017
 helicopter-0.016
 band-0.016
Token,
Token ID11
Feature activation-1.20e-02

Pos logit contributions
ern+0.036
iles+0.034
ad+0.033
ales+0.033
 meanings+0.033
Neg logit contributions
broken-0.028
 divorce-0.028
 late-0.026
 band-0.024
 severe-0.024
Token but
Token ID475
Feature activation-8.63e-02

Pos logit contributions
ern+0.019
iles+0.018
ales+0.018
ad+0.018
zie+0.018
Neg logit contributions
 divorce-0.017
broken-0.017
 late-0.016
 severe-0.015
 band-0.015
Token colorful
Token ID20239
Feature activation-7.25e-02

Pos logit contributions
ern+0.088
zie+0.084
ad+0.081
ales+0.080
et+0.078
Neg logit contributions
broken-0.060
 divorce-0.060
 sudden-0.059
 helicopter-0.058
 band-0.057
Token cand
Token ID2658
Feature activation-1.20e-01

Pos logit contributions
ern+0.123
zie+0.116
et+0.109
ad+0.109
ales+0.108
Neg logit contributions
 divorce-0.102
broken-0.102
 helicopter-0.101
 band-0.098
 sudden-0.097
Tokenies
Token ID444
Feature activation-1.49e-02

Pos logit contributions
ern+0.271
zie+0.261
wered+0.255
 meanings+0.253
ad+0.252
Neg logit contributions
broken-0.114
 together-0.088
 band-0.081
 sudden-0.080
 severe-0.080
Token and
Token ID290
Feature activation+4.36e-02

Pos logit contributions
ern+0.030
iles+0.028
zie+0.028
 meanings+0.028
ales+0.028
Neg logit contributions
broken-0.017
 divorce-0.016
 late-0.015
 together-0.015
 band-0.014
Token ch
Token ID442
Feature activation-4.27e-02

Pos logit contributions
 sudden+0.052
broken+0.048
 divorce+0.046
 band+0.045
 million+0.044
Neg logit contributions
iles-0.085
ern-0.084
ales-0.084
iddle-0.081
et-0.081
Tokenocol
Token ID4668
Feature activation-2.41e-01

Pos logit contributions
ern+0.052
zie+0.049
 Lucas+0.046
 poses+0.046
ori+0.045
Neg logit contributions
broken-0.078
 divorce-0.072
 late-0.071
 sudden-0.070
 helicopter-0.069
Tokenates
Token ID689
Feature activation-1.76e-01

Pos logit contributions
ern+0.178
zie+0.173
 answers+0.173
 Lucas+0.172
 poses+0.171
Neg logit contributions
broken-0.522
burn-0.496
B-0.487
 together-0.484
 late-0.484
Token.
Token ID13
Feature activation-9.26e-03

Pos logit contributions
ern+0.335
zie+0.320
 meanings+0.317
iles+0.315
ori+0.310
Neg logit contributions
broken-0.224
 together-0.206
 late-0.186
 divorce-0.186
 badly-0.180
Token She
Token ID1375
Feature activation-6.00e-02

Pos logit contributions
ern+0.013
iles+0.013
ad+0.013
ales+0.013
et+0.012
Neg logit contributions
 divorce-0.015
broken-0.015
 late-0.014
 helicopter-0.013
 band-0.013
Token said
Token ID531
Feature activation-7.09e-02

Pos logit contributions
ales+0.126
iles+0.123
ern+0.122
zie+0.117
et+0.115
Neg logit contributions
broken-0.064
 sudden-0.058
 band-0.054
 divorce-0.054
 accidentally-0.051
Token to
Token ID284
Feature activation-2.07e-02

Pos logit contributions
ern+0.104
 Lucas+0.101
ad+0.100
io+0.098
iles+0.097
Neg logit contributions
broken-0.112
 divorce-0.109
 late-0.104
 severe-0.100
AAAA-0.095
Token by
Token ID416
Feature activation-4.46e-02

Pos logit contributions
ern+0.010
ad+0.009
 meanings+0.009
ales+0.009
iles+0.009
Neg logit contributions
 divorce-0.008
broken-0.008
 late-0.008
 band-0.007
 helicopter-0.007
Token telling
Token ID5149
Feature activation-2.21e-02

Pos logit contributions
alia+0.074
ern+0.073
 meanings+0.073
iles+0.072
ales+0.072
Neg logit contributions
 divorce-0.058
broken-0.057
 late-0.056
 band-0.054
AAAA-0.053
Token us
Token ID514
Feature activation-4.75e-02

Pos logit contributions
ern+0.034
alia+0.032
zie+0.031
ales+0.031
iles+0.031
Neg logit contributions
 divorce-0.035
 band-0.033
broken-0.031
 late-0.031
 helicopter-0.030
Token.
Token ID13
Feature activation-1.46e-02

Pos logit contributions
ern+0.078
iles+0.072
ales+0.071
zie+0.071
alia+0.071
Neg logit contributions
 divorce-0.070
broken-0.068
 late-0.063
 severe-0.061
 band-0.061
Token Miss
Token ID4544
Feature activation-2.15e-02

Pos logit contributions
ern+0.027
iles+0.026
ales+0.026
et+0.026
zie+0.026
Neg logit contributions
 divorce-0.017
 late-0.016
broken-0.016
AAAA-0.015
 helicopter-0.014
Tokeniles
Token ID2915
Feature activation-2.39e-01

Pos logit contributions
ern+0.030
iles+0.028
ad+0.028
ales+0.028
zie+0.027
Neg logit contributions
 divorce-0.036
broken-0.036
 late-0.033
 helicopter-0.031
 band-0.031
Token can
Token ID460
Feature activation-1.15e-02

Pos logit contributions
ern+0.415
zie+0.382
ales+0.380
 meanings+0.375
alia+0.373
Neg logit contributions
 divorce-0.316
broken-0.315
 late-0.287
 band-0.286
 helicopter-0.276
Token be
Token ID307
Feature activation-6.81e-02

Pos logit contributions
ern+0.022
 meanings+0.022
ori+0.022
artments+0.022
alia+0.022
Neg logit contributions
 we-0.013
 band-0.013
 divorce-0.012
 helicopter-0.011
 late-0.011
Token very
Token ID845
Feature activation+2.03e-03

Pos logit contributions
ern+0.138
iles+0.132
ales+0.132
ad+0.131
zie+0.130
Neg logit contributions
broken-0.072
 divorce-0.071
 late-0.066
 band-0.060
 severe-0.060
Token dangerous
Token ID4923
Feature activation-1.66e-02

Pos logit contributions
broken+0.002
 divorce+0.002
 late+0.002
 severe+0.002
 band+0.002
Neg logit contributions
ern-0.004
ales-0.004
ad-0.004
zie-0.004
iles-0.004
Token.
Token ID13
Feature activation-1.50e-02

Pos logit contributions
ern+0.029
zie+0.027
ad+0.027
 meanings+0.027
iles+0.027
Neg logit contributions
broken-0.022
 late-0.022
 divorce-0.022
 band-0.019
 helicopter-0.019
Token waved
Token ID26834
Feature activation+5.82e-02

Pos logit contributions
 divorce+0.064
broken+0.063
 late+0.058
 severe+0.055
 helicopter+0.054
Neg logit contributions
ern-0.080
ad-0.077
iles-0.076
ales-0.076
 Lucas-0.075
Token it
Token ID340
Feature activation+7.55e-02

Pos logit contributions
broken+0.082
 divorce+0.081
 late+0.077
 band+0.072
 helicopter+0.071
Neg logit contributions
ern-0.097
iles-0.094
ales-0.092
zie-0.091
 meanings-0.090
Token at
Token ID379
Feature activation-3.94e-02

Pos logit contributions
broken+0.097
 divorce+0.092
 late+0.089
 band+0.084
 helicopter+0.082
Neg logit contributions
ern-0.136
iles-0.132
ales-0.129
zie-0.129
ad-0.128
Token the
Token ID262
Feature activation-8.94e-04

Pos logit contributions
ern+0.071
 meanings+0.067
iles+0.066
alia+0.066
 poses+0.064
Neg logit contributions
 late-0.048
 divorce-0.046
 band-0.044
broken-0.044
 together-0.044
Token crocod
Token ID37565
Feature activation-8.04e-02

Pos logit contributions
ern+0.002
ales+0.002
iles+0.002
zie+0.002
ad+0.001
Neg logit contributions
 divorce-0.001
broken-0.001
 late-0.001
 band-0.001
 helicopter-0.001
Tokeniles
Token ID2915
Feature activation-2.39e-01

Pos logit contributions
ern+0.109
 meanings+0.097
zie+0.095
 Lucas+0.094
icity+0.093
Neg logit contributions
broken-0.139
 divorce-0.136
 band-0.129
 late-0.128
 helicopter-0.127
Token.
Token ID13
Feature activation-1.33e-02

Pos logit contributions
ern+0.405
 meanings+0.376
zie+0.375
alia+0.370
ori+0.365
Neg logit contributions
broken-0.332
 divorce-0.319
 late-0.309
 together-0.296
 helicopter-0.295
Token He
Token ID679
Feature activation-4.63e-02

Pos logit contributions
ern+0.022
iles+0.022
ales+0.022
et+0.021
ad+0.021
Neg logit contributions
 late-0.018
 divorce-0.018
broken-0.017
AAAA-0.015
 helicopter-0.015
Token tried
Token ID3088
Feature activation+7.91e-02

Pos logit contributions
ern+0.084
ales+0.080
iles+0.079
zie+0.079
ad+0.078
Neg logit contributions
broken-0.060
 divorce-0.058
 late-0.054
 band-0.052
 severe-0.050
Token to
Token ID284
Feature activation-1.82e-02

Pos logit contributions
 divorce+0.075
broken+0.071
 late+0.070
 band+0.066
 together+0.060
Neg logit contributions
ern-0.169
ales-0.165
iles-0.165
zie-0.162
 meanings-0.161
Token scare
Token ID19437
Feature activation-2.80e-02

Pos logit contributions
 meanings+0.033
ern+0.033
iles+0.032
alia+0.032
io+0.032
Neg logit contributions
broken-0.022
 band-0.022
 divorce-0.022
 late-0.020
 unst-0.019
Token not
Token ID407
Feature activation-1.13e-02

Pos logit contributions
iles+0.085
ales+0.084
 Lucas+0.081
 meanings+0.080
ern+0.080
Neg logit contributions
 divorce-0.062
 band-0.059
broken-0.058
 late-0.057
 severe-0.055
Token eat
Token ID4483
Feature activation-4.30e-02

Pos logit contributions
 Lucas+0.019
io+0.019
zie+0.019
iles+0.019
ern+0.019
Neg logit contributions
 divorce-0.015
broken-0.014
 late-0.014
 band-0.014
 badly-0.013
Token too
Token ID1165
Feature activation-4.75e-02

Pos logit contributions
ern+0.079
zie+0.075
ori+0.073
alia+0.073
ales+0.072
Neg logit contributions
 divorce-0.053
 late-0.053
broken-0.051
 helicopter-0.050
 band-0.049
Token many
Token ID867
Feature activation-4.50e-02

Pos logit contributions
ern+0.108
iles+0.107
ad+0.106
ales+0.105
 meanings+0.104
Neg logit contributions
broken-0.037
 divorce-0.037
burn-0.028
AAAA-0.027
 band-0.027
Token ch
Token ID442
Feature activation-4.10e-02

Pos logit contributions
ern+0.081
zie+0.077
ales+0.075
ori+0.075
alia+0.074
Neg logit contributions
 divorce-0.056
 band-0.055
broken-0.055
 helicopter-0.054
 late-0.054
Tokenocol
Token ID4668
Feature activation-2.39e-01

Pos logit contributions
ern+0.040
zie+0.036
 meanings+0.034
icity+0.034
ori+0.034
Neg logit contributions
broken-0.084
 late-0.081
 divorce-0.081
 band-0.079
 helicopter-0.079
Tokenates
Token ID689
Feature activation-1.74e-01

Pos logit contributions
ern+0.193
zie+0.178
 Lucas+0.173
 meanings+0.169
icity+0.166
Neg logit contributions
broken-0.514
 late-0.482
burn-0.479
B-0.473
 together-0.471
Token and
Token ID290
Feature activation+4.05e-02

Pos logit contributions
ern+0.312
 meanings+0.292
zie+0.292
iles+0.289
ad+0.286
Neg logit contributions
broken-0.228
 divorce-0.221
 late-0.215
 severe-0.199
 badly-0.198
Token his
Token ID465
Feature activation-4.07e-03

Pos logit contributions
 divorce+0.054
broken+0.052
 late+0.051
 band+0.047
 severe+0.047
Neg logit contributions
ern-0.073
iles-0.068
ales-0.068
zie-0.067
ad-0.066
Token friend
Token ID1545
Feature activation+6.74e-02

Pos logit contributions
ern+0.008
ales+0.007
alia+0.007
zie+0.007
ori+0.007
Neg logit contributions
 divorce-0.005
 band-0.005
broken-0.005
 helicopter-0.005
 sad-0.005
Token learned
Token ID4499
Feature activation+4.24e-02

Pos logit contributions
broken+0.077
 divorce+0.075
 late+0.070
 band+0.067
 severe+0.066
Neg logit contributions
ern-0.132
ales-0.126
iles-0.125
ad-0.122
zie-0.121
Token can
Token ID460
Feature activation-1.47e-02

Pos logit contributions
 divorce+0.086
broken+0.083
 late+0.078
 severe+0.072
 band+0.072
Neg logit contributions
ern-0.140
ad-0.134
iles-0.134
ales-0.133
zie-0.132
Token't
Token ID470
Feature activation+5.92e-02

Pos logit contributions
ern+0.030
iles+0.029
 meanings+0.029
ales+0.029
zie+0.028
Neg logit contributions
 divorce-0.015
broken-0.015
 late-0.014
 band-0.013
 severe-0.013
Token.
Token ID13
Feature activation-1.79e-02

Pos logit contributions
 divorce+0.069
broken+0.066
 late+0.065
 band+0.064
burn+0.061
Neg logit contributions
 meanings-0.110
ern-0.108
zie-0.107
 Lucas-0.104
 angles-0.104
Token Cro
Token ID9325
Feature activation-7.68e-02

Pos logit contributions
ern+0.030
iles+0.028
ales+0.028
ad+0.028
 meanings+0.028
Neg logit contributions
broken-0.026
 divorce-0.025
 late-0.022
 severe-0.022
 band-0.021
Tokencod
Token ID19815
Feature activation-1.02e-01

Pos logit contributions
 Lucas+0.123
ern+0.118
 meanings+0.115
zie+0.113
ad+0.111
Neg logit contributions
broken-0.110
 divorce-0.104
 band-0.102
 late-0.097
 severe-0.096
Tokeniles
Token ID2915
Feature activation-2.39e-01

Pos logit contributions
ern+0.119
artments+0.110
 Lucas+0.103
icity+0.099
 Mae+0.095
Neg logit contributions
 band-0.183
 divorce-0.180
B-0.177
arre-0.177
 helicopter-0.176
Token are
Token ID389
Feature activation-6.90e-02

Pos logit contributions
ern+0.399
zie+0.376
 meanings+0.372
alia+0.368
ad+0.366
Neg logit contributions
 divorce-0.325
broken-0.322
 band-0.304
 late-0.302
 helicopter-0.291
Token dangerous
Token ID4923
Feature activation-1.86e-02

Pos logit contributions
ern+0.147
ad+0.141
zie+0.139
alia+0.138
ales+0.136
Neg logit contributions
 late-0.072
 divorce-0.068
broken-0.066
 severe-0.060
 sad-0.059
Token.
Token ID13
Feature activation-1.82e-02

Pos logit contributions
ern+0.037
zie+0.035
ad+0.035
 Lucas+0.034
 meanings+0.034
Neg logit contributions
broken-0.020
 late-0.020
 divorce-0.020
 band-0.018
 helicopter-0.017
Token They
Token ID1119
Feature activation+4.70e-03

Pos logit contributions
ern+0.030
iles+0.029
ad+0.029
ales+0.029
zie+0.028
Neg logit contributions
broken-0.025
 divorce-0.025
 late-0.023
 severe-0.022
 band-0.022
Token have
Token ID423
Feature activation+2.13e-02

Pos logit contributions
 divorce+0.007
broken+0.006
 band+0.006
 late+0.006
 helicopter+0.006
Neg logit contributions
ern-0.008
zie-0.007
ad-0.007
alia-0.007
io-0.007
Token and
Token ID290
Feature activation+4.63e-02

Pos logit contributions
iles+0.055
ern+0.055
 meanings+0.053
zie+0.051
ales+0.051
Neg logit contributions
broken-0.021
 late-0.020
 together-0.019
 divorce-0.018
 band-0.017
Token started
Token ID2067
Feature activation+3.80e-02

Pos logit contributions
 divorce+0.057
broken+0.057
 late+0.054
 band+0.050
 severe+0.049
Neg logit contributions
ern-0.085
iles-0.083
ales-0.081
ad-0.081
 meanings-0.080
Token eating
Token ID6600
Feature activation-5.57e-02

Pos logit contributions
 divorce+0.037
broken+0.037
 late+0.037
 band+0.034
 severe+0.031
Neg logit contributions
iles-0.080
ern-0.078
ales-0.077
zie-0.076
 meanings-0.076
Token the
Token ID262
Feature activation-3.31e-03

Pos logit contributions
zie+0.119
ern+0.119
iles+0.115
ales+0.115
 meanings+0.113
Neg logit contributions
 together-0.059
 late-0.059
 band-0.052
 badly-0.048
 better-0.048
Token ch
Token ID442
Feature activation-4.00e-02

Pos logit contributions
ern+0.006
iles+0.006
ales+0.006
ad+0.006
zie+0.006
Neg logit contributions
 divorce-0.004
 late-0.004
broken-0.004
 band-0.004
 million-0.003
Tokenocol
Token ID4668
Feature activation-2.39e-01

Pos logit contributions
ern+0.039
zie+0.038
 meanings+0.037
 Lucas+0.036
 Fre+0.034
Neg logit contributions
broken-0.077
 band-0.076
apy-0.075
 late-0.074
 divorce-0.074
Tokenates
Token ID689
Feature activation-1.73e-01

Pos logit contributions
ern+0.183
 meanings+0.177
 Lucas+0.173
 answers+0.172
zie+0.166
Neg logit contributions
broken-0.506
burn-0.478
 late-0.476
 divorce-0.472
B-0.469
Token.
Token ID13
Feature activation-7.36e-03

Pos logit contributions
ern+0.305
 meanings+0.301
iles+0.295
zie+0.291
ad+0.289
Neg logit contributions
broken-0.225
 together-0.213
 late-0.212
 divorce-0.208
 band-0.197
Token The
Token ID383
Feature activation+2.19e-02

Pos logit contributions
iles+0.011
ern+0.011
ales+0.011
ad+0.011
et+0.011
Neg logit contributions
 divorce-0.011
 late-0.011
broken-0.011
 million-0.010
 severe-0.010
Token shop
Token ID6128
Feature activation-9.03e-03

Pos logit contributions
 band+0.029
 sudden+0.026
 divorce+0.025
 million+0.025
 little+0.024
Neg logit contributions
ales-0.045
iles-0.041
zie-0.041
ern-0.041
ad-0.039
Tokenkeeper
Token ID13884
Feature activation+1.15e-02

Pos logit contributions
ern+0.019
ales+0.019
iles+0.019
zie+0.018
ad+0.018
Neg logit contributions
broken-0.009
 divorce-0.009
 late-0.008
 band-0.008
 helicopter-0.007
Token that
Token ID326
Feature activation-9.94e-02

Pos logit contributions
 late+0.142
 divorce+0.130
broken+0.126
 band+0.119
 badly+0.112
Neg logit contributions
ern-0.248
 meanings-0.245
iles-0.233
artments-0.229
ales-0.228
Token he
Token ID339
Feature activation+2.48e-02

Pos logit contributions
ern+0.181
iles+0.173
ales+0.173
ad+0.168
 meanings+0.167
Neg logit contributions
 divorce-0.128
broken-0.125
 late-0.121
 helicopter-0.111
 severe-0.110
Token forgot
Token ID16453
Feature activation+3.94e-02

Pos logit contributions
 divorce+0.040
burn+0.036
broken+0.036
 helicopter+0.034
 late+0.034
Neg logit contributions
ern-0.034
et-0.034
ad-0.034
 Lucas-0.033
 meanings-0.032
Token why
Token ID1521
Feature activation-1.39e-01

Pos logit contributions
 divorce+0.062
broken+0.062
 late+0.060
 severe+0.056
 helicopter+0.055
Neg logit contributions
ern-0.060
iles-0.057
ales-0.055
ad-0.055
alia-0.054
Token he
Token ID339
Feature activation+2.50e-02

Pos logit contributions
ern+0.214
ad+0.207
iles+0.207
ales+0.204
zie+0.201
Neg logit contributions
 divorce-0.208
broken-0.207
 late-0.188
AAAA-0.179
 severe-0.179
Token came
Token ID1625
Feature activation+1.12e-01

Pos logit contributions
broken+0.030
 divorce+0.030
 late+0.028
 severe+0.026
 band+0.026
Neg logit contributions
ern-0.046
iles-0.044
ales-0.044
ad-0.044
zie-0.043
Token to
Token ID284
Feature activation-1.06e-02

Pos logit contributions
broken+0.128
 divorce+0.126
 late+0.121
 severe+0.109
 helicopter+0.108
Neg logit contributions
ern-0.219
iles-0.209
ales-0.207
 meanings-0.205
ad-0.205
Token the
Token ID262
Feature activation+7.83e-04

Pos logit contributions
ern+0.018
iles+0.017
ales+0.017
zie+0.016
 meanings+0.016
Neg logit contributions
 divorce-0.015
broken-0.015
 late-0.014
 band-0.014
 severe-0.013
Token house
Token ID2156
Feature activation+1.69e-02

Pos logit contributions
broken+0.001
 divorce+0.001
 late+0.001
AAAA+0.001
burn+0.001
Neg logit contributions
ern-0.001
iles-0.001
ad-0.001
 meanings-0.001
zie-0.001
Token in
Token ID287
Feature activation-2.85e-02

Pos logit contributions
 divorce+0.024
broken+0.024
 late+0.022
 severe+0.021
 band+0.021
Neg logit contributions
ern-0.028
iles-0.026
ad-0.026
ales-0.026
zie-0.026
Token the
Token ID262
Feature activation+6.76e-06

Pos logit contributions
ern+0.046
iles+0.044
ales+0.043
ad+0.043
zie+0.042
Neg logit contributions
 divorce-0.042
broken-0.042
 late-0.039
 severe-0.037
 band-0.037
Token they
Token ID484
Feature activation+9.84e-02

Pos logit contributions
 divorce+0.067
broken+0.067
 late+0.060
AAAA+0.059
 severe+0.058
Neg logit contributions
ern-0.083
ad-0.081
zie-0.080
 Lucas-0.080
ales-0.079
Token had
Token ID550
Feature activation-2.24e-02

Pos logit contributions
 divorce+0.142
broken+0.142
 late+0.131
 severe+0.124
 band+0.123
Neg logit contributions
ern-0.158
iles-0.152
ad-0.150
ales-0.150
zie-0.147
Token lots
Token ID6041
Feature activation+6.33e-03

Pos logit contributions
ern+0.037
ad+0.035
iles+0.035
ales+0.035
et+0.034
Neg logit contributions
 divorce-0.032
broken-0.032
 late-0.029
 severe-0.028
 helicopter-0.027
Token of
Token ID286
Feature activation+1.71e-02

Pos logit contributions
 divorce+0.009
broken+0.009
AAAA+0.008
burn+0.008
Sorry+0.008
Neg logit contributions
ad-0.011
iles-0.010
ern-0.010
ales-0.010
 answers-0.010
Token fun
Token ID1257
Feature activation-3.36e-02

Pos logit contributions
broken+0.026
 divorce+0.025
 late+0.023
 severe+0.022
 helicopter+0.022
Neg logit contributions
ern-0.027
iles-0.026
ad-0.026
ales-0.026
 meanings-0.026
Token together
Token ID1978
Feature activation+1.33e-01

Pos logit contributions
ern+0.066
iles+0.063
ad+0.063
ales+0.063
et+0.062
Neg logit contributions
 divorce-0.037
broken-0.037
 late-0.033
 severe-0.032
AAAA-0.030
Token.
Token ID13
Feature activation-3.74e-03

Pos logit contributions
broken+0.173
 divorce+0.170
 together+0.165
 late+0.165
 band+0.156
Neg logit contributions
ern-0.226
iles-0.221
zie-0.218
ales-0.217
 meanings-0.217
Token<|endoftext|>
Token ID50256
Feature activation+9.55e-03

Pos logit contributions
ern+0.007
iles+0.007
ad+0.007
ales+0.007
et+0.006
Neg logit contributions
 divorce-0.004
broken-0.004
 late-0.004
 severe-0.004
 million-0.004
TokenBen
Token ID11696
Feature activation-7.16e-02

Pos logit contributions
 divorce+0.013
broken+0.013
 late+0.012
 severe+0.011
 band+0.011
Neg logit contributions
ern-0.016
iles-0.016
ad-0.015
ales-0.015
zie-0.015
Token and
Token ID290
Feature activation+4.99e-02

Pos logit contributions
ern+0.143
iles+0.137
ales+0.135
 meanings+0.134
ad+0.131
Neg logit contributions
broken-0.076
 divorce-0.074
 late-0.073
 band-0.067
AAAA-0.063
Token Mom
Token ID11254
Feature activation+2.40e-02

Pos logit contributions
broken+0.089
 divorce+0.088
 late+0.081
 severe+0.079
 million+0.077
Neg logit contributions
ern-0.064
ad-0.063
iles-0.060
ales-0.060
 Lucas-0.059
Token\n
Token ID198
Feature activation+3.42e-02

Pos logit contributions
broken+0.037
 divorce+0.037
 late+0.034
 severe+0.031
 band+0.030
Neg logit contributions
ern-0.068
iles-0.066
ales-0.066
ad-0.065
et-0.065
TokenWhen
Token ID2215
Feature activation-6.29e-03

Pos logit contributions
 divorce+0.042
broken+0.040
 helicopter+0.035
 late+0.035
burn+0.034
Neg logit contributions
ern-0.065
ad-0.062
zie-0.060
ori-0.059
iles-0.059
Token the
Token ID262
Feature activation-5.98e-03

Pos logit contributions
ern+0.010
ad+0.010
iles+0.010
ales+0.010
zie+0.010
Neg logit contributions
broken-0.009
 divorce-0.009
 late-0.008
 severe-0.008
 helicopter-0.008
Token little
Token ID1310
Feature activation+1.26e-01

Pos logit contributions
ern+0.010
ad+0.009
iles+0.009
 meanings+0.009
zie+0.009
Neg logit contributions
broken-0.009
 divorce-0.008
 late-0.008
 severe-0.008
AAAA-0.007
Token girl
Token ID2576
Feature activation+6.58e-02

Pos logit contributions
broken+0.126
 divorce+0.124
 severe+0.106
 late+0.105
AAAA+0.101
Neg logit contributions
ern-0.257
ad-0.252
zie-0.248
iles-0.248
alia-0.246
Token woke
Token ID19092
Feature activation+1.20e-01

Pos logit contributions
 divorce+0.083
broken+0.083
 late+0.077
 severe+0.072
 helicopter+0.070
Neg logit contributions
ern-0.118
ad-0.114
iles-0.112
zie-0.111
ales-0.110
Token up
Token ID510
Feature activation+2.28e-02

Pos logit contributions
 divorce+0.174
broken+0.173
 late+0.164
 helicopter+0.154
 band+0.151
Neg logit contributions
ern-0.197
iles-0.192
ales-0.188
ad-0.186
 meanings-0.186
Token,
Token ID11
Feature activation-1.44e-03

Pos logit contributions
 divorce+0.029
broken+0.029
 late+0.027
 severe+0.025
 band+0.025
Neg logit contributions
ern-0.041
iles-0.039
ales-0.039
ad-0.038
 meanings-0.038
Token she
Token ID673
Feature activation+1.72e-02

Pos logit contributions
ern+0.003
iles+0.002
ales+0.002
ad+0.002
zie+0.002
Neg logit contributions
 divorce-0.002
broken-0.002
 late-0.002
 severe-0.002
 helicopter-0.002
Token was
Token ID373
Feature activation+7.94e-03

Pos logit contributions
 divorce+0.022
broken+0.022
 late+0.020
 severe+0.019
 helicopter+0.019
Neg logit contributions
ern-0.031
iles-0.029
ad-0.029
ales-0.029
zie-0.029
Token so
Token ID523
Feature activation-2.30e-02

Pos logit contributions
 divorce+0.010
broken+0.009
 late+0.009
AAAA+0.008
 severe+0.008
Neg logit contributions
ern-0.015
ad-0.014
zie-0.014
iles-0.014
et-0.014
Token time
Token ID640
Feature activation+2.92e-02

Pos logit contributions
 divorce+0.087
broken+0.086
 helicopter+0.078
 late+0.078
 severe+0.077
Neg logit contributions
ern-0.070
iles-0.067
ad-0.065
et-0.065
zie-0.065
Token,
Token ID11
Feature activation-2.61e-03

Pos logit contributions
broken+0.056
 divorce+0.054
 severe+0.051
burn+0.049
 late+0.049
Neg logit contributions
ern-0.035
iles-0.035
ad-0.035
 Lucas-0.033
zie-0.033
Token there
Token ID612
Feature activation+6.20e-02

Pos logit contributions
iles+0.003
ern+0.003
et+0.003
zie+0.003
 Mae+0.003
Neg logit contributions
broken-0.004
 divorce-0.004
 severe-0.004
burn-0.004
oing-0.004
Token was
Token ID373
Feature activation+8.11e-03

Pos logit contributions
 divorce+0.068
broken+0.067
 late+0.061
 band+0.057
AAAA+0.056
Neg logit contributions
ern-0.121
ales-0.118
iles-0.116
 meanings-0.114
ad-0.114
Token a
Token ID257
Feature activation+5.13e-02

Pos logit contributions
broken+0.017
 divorce+0.016
 severe+0.015
Sorry+0.015
 sank+0.014
Neg logit contributions
ad-0.009
ern-0.008
 herself-0.008
iles-0.008
 Mae-0.008
Token little
Token ID1310
Feature activation+1.26e-01

Pos logit contributions
 divorce+0.065
broken+0.064
 unst+0.056
 severe+0.056
arre+0.056
Neg logit contributions
iles-0.082
 Lucas-0.081
 Mae-0.079
icity-0.079
alia-0.078
Token boy
Token ID2933
Feature activation+6.10e-02

Pos logit contributions
broken+0.170
 divorce+0.165
 severe+0.145
 unst+0.141
 devastated+0.141
Neg logit contributions
iles-0.214
ad-0.204
alia-0.204
et-0.203
ern-0.199
Token named
Token ID3706
Feature activation+3.50e-02

Pos logit contributions
 divorce+0.072
broken+0.071
 late+0.064
 severe+0.061
AAAA+0.059
Neg logit contributions
ern-0.115
ad-0.111
iles-0.111
ales-0.110
et-0.109
Token Tim
Token ID5045
Feature activation+2.59e-02

Pos logit contributions
 divorce+0.045
broken+0.045
 late+0.041
 severe+0.038
 band+0.038
Neg logit contributions
ern-0.063
iles-0.060
ad-0.060
ales-0.060
zie-0.059
Tokenmy
Token ID1820
Feature activation+4.74e-03

Pos logit contributions
broken+0.033
 late+0.031
 divorce+0.031
 band+0.029
B+0.028
Neg logit contributions
ern-0.046
 Lucas-0.044
 meanings-0.044
iles-0.044
ales-0.043
Token.
Token ID13
Feature activation-3.61e-03

Pos logit contributions
 divorce+0.008
broken+0.008
 late+0.008
 severe+0.007
AAAA+0.007
Neg logit contributions
ern-0.006
ad-0.006
iles-0.006
ales-0.006
et-0.006
Token the
Token ID262
Feature activation-2.79e-03

Pos logit contributions
ern+0.247
iles+0.243
ad+0.243
 Lucas+0.239
 meanings+0.239
Neg logit contributions
broken-0.096
 divorce-0.091
 late-0.080
 severe-0.073
AAAA-0.072
Token old
Token ID1468
Feature activation-7.96e-02

Pos logit contributions
ern+0.005
iles+0.004
ad+0.004
ales+0.004
zie+0.004
Neg logit contributions
 divorce-0.004
broken-0.004
 late-0.004
 severe-0.003
 band-0.003
Token nation
Token ID3277
Feature activation+1.17e-02

Pos logit contributions
ern+0.137
iles+0.130
ales+0.130
ad+0.127
zie+0.127
Neg logit contributions
 divorce-0.110
broken-0.108
 late-0.101
 helicopter-0.096
 band-0.096
Token had
Token ID550
Feature activation-2.54e-02

Pos logit contributions
 divorce+0.015
broken+0.015
 late+0.014
 severe+0.013
 band+0.013
Neg logit contributions
ern-0.020
iles-0.019
ad-0.019
ales-0.019
zie-0.019
Token built
Token ID3170
Feature activation-1.96e-02

Pos logit contributions
ern+0.040
iles+0.039
ad+0.038
ales+0.038
zie+0.037
Neg logit contributions
 divorce-0.037
broken-0.037
 late-0.034
 severe-0.033
 helicopter-0.032
Token together
Token ID1978
Feature activation+1.24e-01

Pos logit contributions
ern+0.035
iles+0.032
ales+0.032
 meanings+0.031
zie+0.031
Neg logit contributions
broken-0.026
 divorce-0.026
 late-0.026
 together-0.023
 band-0.023
Token.
Token ID13
Feature activation-6.75e-03

Pos logit contributions
broken+0.168
 divorce+0.164
 late+0.159
 together+0.150
 band+0.147
Neg logit contributions
ern-0.215
iles-0.203
ales-0.202
zie-0.200
 meanings-0.195
Token<|endoftext|>
Token ID50256
Feature activation+4.91e-05

Pos logit contributions
ern+0.013
iles+0.012
ales+0.012
ad+0.012
et+0.012
Neg logit contributions
 divorce-0.008
broken-0.008
 late-0.007
 helicopter-0.007
 severe-0.007
TokenOnce
Token ID7454
Feature activation-7.66e-03

Pos logit contributions
 divorce+0.000
broken+0.000
 late+0.000
 band+0.000
 severe+0.000
Neg logit contributions
ern-0.000
ales-0.000
iles-0.000
ad-0.000
 meanings-0.000
Token upon
Token ID2402
Feature activation-4.85e-02

Pos logit contributions
ern+0.011
iles+0.011
ad+0.011
ales+0.011
zie+0.010
Neg logit contributions
 divorce-0.012
broken-0.012
 late-0.011
 severe-0.011
 helicopter-0.010
Token a
Token ID257
Feature activation+4.33e-02

Pos logit contributions
ad+0.063
ern+0.061
iles+0.057
 answers+0.056
 herself+0.056
Neg logit contributions
 divorce-0.086
broken-0.084
 helicopter-0.075
AAAA-0.075
burn-0.075
Token I
Token ID314
Feature activation-2.98e-02

Pos logit contributions
ern+0.009
ad+0.009
iles+0.009
ales+0.009
 meanings+0.009
Neg logit contributions
broken-0.006
 divorce-0.006
 late-0.006
 severe-0.005
AAAA-0.005
Token do
Token ID466
Feature activation-8.49e-02

Pos logit contributions
ern+0.045
zie+0.043
ales+0.043
iles+0.042
 Lucas+0.041
Neg logit contributions
 divorce-0.047
broken-0.045
 late-0.044
 band-0.044
 severe-0.041
Token!"
Token ID2474
Feature activation-3.74e-02

Pos logit contributions
zie+0.194
iles+0.189
ales+0.188
alia+0.188
ern+0.186
Neg logit contributions
 we-0.072
 together-0.069
 band-0.066
 divorce-0.065
 badly-0.063
Token Lily
Token ID20037
Feature activation+3.32e-02

Pos logit contributions
ern+0.063
ad+0.060
iles+0.059
 Lucas+0.059
ales+0.058
Neg logit contributions
broken-0.052
 divorce-0.052
 late-0.047
 severe-0.046
burn-0.044
Token says
Token ID1139
Feature activation-1.10e-01

Pos logit contributions
broken+0.049
 divorce+0.048
 late+0.047
 band+0.046
 together+0.044
Neg logit contributions
ern-0.051
ales-0.050
iles-0.050
zie-0.048
ad-0.046
Token.
Token ID13
Feature activation-5.61e-03

Pos logit contributions
ern+0.106
ad+0.104
 answers+0.097
 Lucas+0.097
et+0.094
Neg logit contributions
 divorce-0.231
AAAA-0.223
broken-0.221
 million-0.215
 severe-0.213
Token She
Token ID1375
Feature activation-5.19e-02

Pos logit contributions
ern+0.008
ad+0.008
 Lucas+0.008
iles+0.008
 meanings+0.008
Neg logit contributions
broken-0.009
 divorce-0.009
 late-0.008
 severe-0.008
 helicopter-0.008
Token hugs
Token ID40657
Feature activation+2.04e-02

Pos logit contributions
ern+0.084
ales+0.083
zie+0.079
iles+0.079
ad+0.077
Neg logit contributions
 divorce-0.073
broken-0.072
 band-0.070
 helicopter-0.068
 late-0.067
Token the
Token ID262
Feature activation-6.37e-03

Pos logit contributions
 band+0.033
 divorce+0.031
B+0.031
 late+0.030
 helicopter+0.030
Neg logit contributions
ern-0.030
ales-0.028
zie-0.028
iles-0.028
 meanings-0.027
Token robot
Token ID9379
Feature activation+3.34e-02

Pos logit contributions
ad+0.011
ern+0.011
 meanings+0.011
iles+0.011
 answers+0.011
Neg logit contributions
broken-0.008
 divorce-0.008
 late-0.007
 severe-0.007
AAAA-0.007
Token.
Token ID13
Feature activation-5.75e-03

Pos logit contributions
 divorce+0.048
broken+0.047
 late+0.043
AAAA+0.042
 severe+0.041
Neg logit contributions
ern-0.054
ad-0.052
et-0.051
ales-0.051
iles-0.051
Token up
Token ID510
Feature activation+2.50e-02

Pos logit contributions
AAAA+0.007
 divorce+0.007
Sorry+0.006
broken+0.006
 million+0.006
Neg logit contributions
 answers-0.005
ales-0.005
iles-0.005
ad-0.005
iddles-0.004
Token.
Token ID13
Feature activation-3.62e-03

Pos logit contributions
broken+0.035
 divorce+0.035
 late+0.033
 severe+0.031
 band+0.030
Neg logit contributions
ern-0.041
iles-0.040
ales-0.039
ad-0.039
zie-0.039
Token She
Token ID1375
Feature activation-5.53e-02

Pos logit contributions
iles+0.007
ales+0.006
et+0.006
io+0.006
ern+0.006
Neg logit contributions
 late-0.004
 divorce-0.004
broken-0.004
 band-0.004
 helicopter-0.004
Token loved
Token ID6151
Feature activation+1.44e-03

Pos logit contributions
ern+0.101
iles+0.098
ales+0.098
ad+0.097
zie+0.096
Neg logit contributions
broken-0.068
 divorce-0.067
 late-0.061
 band-0.058
 severe-0.057
Token the
Token ID262
Feature activation+1.16e-03

Pos logit contributions
broken+0.002
 divorce+0.002
 late+0.002
 severe+0.002
AAAA+0.002
Neg logit contributions
ern-0.002
iles-0.002
ad-0.002
 meanings-0.002
ales-0.002
Token sound
Token ID2128
Feature activation+2.71e-02

Pos logit contributions
broken+0.001
 divorce+0.001
AAAA+0.001
apy+0.001
 unst+0.001
Neg logit contributions
 answers-0.002
 meanings-0.002
ad-0.002
 poses-0.002
et-0.002
Token of
Token ID286
Feature activation+1.52e-02

Pos logit contributions
broken+0.034
 divorce+0.033
AAAA+0.030
 late+0.029
 severe+0.028
Neg logit contributions
ad-0.047
et-0.047
ern-0.047
iles-0.045
 answers-0.045
Token the
Token ID262
Feature activation+1.89e-03

Pos logit contributions
 divorce+0.023
broken+0.023
 late+0.022
 band+0.021
 helicopter+0.021
Neg logit contributions
ern-0.023
iles-0.022
ales-0.022
ad-0.022
zie-0.022
Token violin
Token ID38283
Feature activation+5.11e-02

Pos logit contributions
broken+0.002
 divorce+0.002
AAAA+0.002
 unst+0.002
 towed+0.002
Neg logit contributions
 answers-0.003
 meanings-0.003
ad-0.003
 poses-0.003
iles-0.003
Token and
Token ID290
Feature activation+4.66e-02

Pos logit contributions
 divorce+0.069
broken+0.068
 late+0.062
AAAA+0.059
 severe+0.059
Neg logit contributions
ern-0.086
ad-0.083
iles-0.082
ales-0.082
et-0.082
Token wanted
Token ID2227
Feature activation+6.87e-04

Pos logit contributions
 divorce+0.060
broken+0.059
 late+0.054
 band+0.052
 helicopter+0.051
Neg logit contributions
ern-0.085
iles-0.082
ales-0.080
ad-0.079
zie-0.078
TokenS
Token ID50
Feature activation-2.25e-02

Pos logit contributions
 divorce+0.046
broken+0.043
 helicopter+0.040
burn+0.040
 severe+0.038
Neg logit contributions
ern-0.063
ad-0.061
iles-0.059
alia-0.059
ales-0.059
Tokenally
Token ID453
Feature activation-3.66e-03

Pos logit contributions
ad+0.025
ern+0.024
ales+0.024
iles+0.024
zie+0.024
Neg logit contributions
 divorce-0.044
broken-0.044
 late-0.042
 severe-0.042
 helicopter-0.041
Token cl
Token ID537
Feature activation+4.13e-02

Pos logit contributions
ern+0.006
iles+0.006
ad+0.006
ales+0.006
zie+0.006
Neg logit contributions
broken-0.005
 divorce-0.005
 late-0.004
 severe-0.004
 band-0.004
Tokenapped
Token ID6320
Feature activation+1.33e-01

Pos logit contributions
 divorce+0.056
broken+0.051
 late+0.050
 severe+0.050
 helicopter+0.047
Neg logit contributions
ern-0.075
ad-0.074
io-0.071
iles-0.071
alia-0.070
Token her
Token ID607
Feature activation-2.28e-02

Pos logit contributions
broken+0.187
 late+0.178
 divorce+0.174
 band+0.164
 severe+0.155
Neg logit contributions
ern-0.229
iles-0.219
ales-0.217
 meanings-0.214
zie-0.210
Token hands
Token ID2832
Feature activation-1.06e-02

Pos logit contributions
ern+0.026
et+0.025
ad+0.025
iles+0.024
 answers+0.024
Neg logit contributions
 divorce-0.045
 late-0.042
broken-0.042
 severe-0.041
Sorry-0.039
Token and
Token ID290
Feature activation+4.70e-02

Pos logit contributions
ern+0.017
ad+0.017
et+0.017
ales+0.016
 Lucas+0.016
Neg logit contributions
 divorce-0.015
broken-0.014
AAAA-0.014
 severe-0.013
 late-0.013
Token laughed
Token ID13818
Feature activation+1.96e-02

Pos logit contributions
broken+0.063
 divorce+0.061
 late+0.058
 band+0.057
 sudden+0.053
Neg logit contributions
ern-0.081
iles-0.080
ales-0.078
ad-0.077
 meanings-0.076
Token.
Token ID13
Feature activation-4.97e-03

Pos logit contributions
 divorce+0.033
broken+0.033
 late+0.030
 severe+0.029
 helicopter+0.029
Neg logit contributions
ern-0.027
iles-0.026
ad-0.026
ales-0.025
zie-0.025
Token �
Token ID6184
Feature activation+3.44e-02

Pos logit contributions
ern+0.008
iles+0.008
ad+0.008
ales+0.008
et+0.008
Neg logit contributions
 late-0.007
broken-0.007
 divorce-0.007
AAAA-0.006
 helicopter-0.006
Token�
Token ID95
Feature activation-1.03e-02

Pos logit contributions
 divorce+0.052
broken+0.052
 late+0.049
AAAA+0.047
 band+0.046
Neg logit contributions
ern-0.051
iles-0.050
zie-0.049
 meanings-0.049
ales-0.049
Token,
Token ID11
Feature activation+2.00e-04

Pos logit contributions
 together+0.013
 late+0.012
broken+0.011
 while+0.010
 they+0.010
Neg logit contributions
ern-0.052
 meanings-0.052
ori-0.050
wered-0.049
ales-0.048
Token he
Token ID339
Feature activation+2.67e-02

Pos logit contributions
 divorce+0.000
broken+0.000
 late+0.000
 severe+0.000
 band+0.000
Neg logit contributions
ern-0.000
iles-0.000
ales-0.000
ad-0.000
 meanings-0.000
Token felt
Token ID2936
Feature activation-1.55e-02

Pos logit contributions
broken+0.026
 sudden+0.024
 accidentally+0.021
 band+0.021
 sank+0.021
Neg logit contributions
ern-0.059
ales-0.057
iles-0.055
 meanings-0.053
zie-0.053
Token a
Token ID257
Feature activation+5.21e-02

Pos logit contributions
ern+0.031
iles+0.030
ales+0.030
io+0.029
 meanings+0.029
Neg logit contributions
 late-0.018
broken-0.018
 divorce-0.017
 sad-0.016
 unst-0.016
Token bit
Token ID1643
Feature activation+6.95e-02

Pos logit contributions
 divorce+0.060
broken+0.059
 late+0.054
 severe+0.050
 band+0.049
Neg logit contributions
ern-0.099
iles-0.096
ad-0.095
ales-0.095
zie-0.094
Token scared
Token ID12008
Feature activation+8.17e-02

Pos logit contributions
broken+0.050
 divorce+0.049
 late+0.044
 severe+0.037
 band+0.036
Neg logit contributions
ern-0.163
iles-0.158
ales-0.157
ad-0.157
zie-0.156
Token,
Token ID11
Feature activation+5.73e-04

Pos logit contributions
 late+0.098
broken+0.091
 divorce+0.088
 badly+0.087
 sad+0.080
Neg logit contributions
 meanings-0.160
ern-0.157
iles-0.152
io-0.150
zie-0.149
Token but
Token ID475
Feature activation-7.19e-02

Pos logit contributions
 late+0.001
broken+0.001
 divorce+0.001
 sudden+0.001
 sad+0.001
Neg logit contributions
iles-0.001
ern-0.001
ales-0.001
io-0.001
alia-0.001
Token also
Token ID635
Feature activation-4.36e-02

Pos logit contributions
ern+0.136
ales+0.135
iles+0.134
 meanings+0.129
io+0.125
Neg logit contributions
 late-0.087
broken-0.085
 divorce-0.085
 upset-0.075
 sudden-0.075
Token very
Token ID845
Feature activation+1.21e-02

Pos logit contributions
ern+0.092
ales+0.092
iles+0.091
zie+0.088
io+0.086
Neg logit contributions
broken-0.045
 late-0.045
 divorce-0.043
 sad-0.038
 upset-0.038
Token brave
Token ID14802
Feature activation+3.34e-02

Pos logit contributions
broken+0.010
 late+0.009
 divorce+0.008
 sad+0.007
 upset+0.007
Neg logit contributions
ern-0.029
ales-0.029
iles-0.028
 meanings-0.028
ad-0.028
Token old
Token ID1468
Feature activation-8.10e-02

Pos logit contributions
ern+0.093
iles+0.089
ad+0.088
ales+0.087
zie+0.085
Neg logit contributions
 divorce-0.149
broken-0.149
 late-0.138
 severe-0.134
AAAA-0.134
Token.
Token ID13
Feature activation-6.73e-03

Pos logit contributions
ern+0.158
zie+0.153
iles+0.153
ales+0.152
ad+0.149
Neg logit contributions
broken-0.094
 divorce-0.086
 late-0.083
 band-0.083
 helicopter-0.080
Token He
Token ID679
Feature activation-4.22e-02

Pos logit contributions
ern+0.014
alia+0.013
ales+0.013
ocol+0.013
i+0.013
Neg logit contributions
 late-0.007
 Lots-0.007
 divorce-0.006
 helicopter-0.006
 much-0.006
Token wanted
Token ID2227
Feature activation+1.03e-04

Pos logit contributions
ales+0.090
ern+0.087
iles+0.086
 angles+0.085
zie+0.083
Neg logit contributions
broken-0.042
 late-0.040
B-0.039
 divorce-0.039
 band-0.039
Token to
Token ID284
Feature activation-1.36e-02

Pos logit contributions
 divorce+0.000
broken+0.000
 late+0.000
 band+0.000
 severe+0.000
Neg logit contributions
ern-0.000
iles-0.000
ales-0.000
ad-0.000
zie-0.000
Token give
Token ID1577
Feature activation+4.03e-03

Pos logit contributions
ern+0.022
iles+0.021
ales+0.021
zie+0.021
et+0.021
Neg logit contributions
 divorce-0.020
broken-0.019
 late-0.018
 band-0.017
 severe-0.017
Token a
Token ID257
Feature activation+4.72e-02

Pos logit contributions
 divorce+0.005
 band+0.005
 late+0.005
 helicopter+0.005
broken+0.005
Neg logit contributions
ern-0.007
iles-0.007
zie-0.007
ales-0.006
alia-0.006
Token present
Token ID1944
Feature activation+3.19e-02

Pos logit contributions
broken+0.082
 late+0.073
Sorry+0.070
AAAA+0.070
 divorce+0.070
Neg logit contributions
 answers-0.066
ern-0.059
 names-0.058
 Lucas-0.058
 meanings-0.058
Token to
Token ID284
Feature activation-1.34e-02

Pos logit contributions
 divorce+0.047
broken+0.046
 late+0.046
 band+0.044
 helicopter+0.042
Neg logit contributions
ern-0.051
iles-0.050
zie-0.049
 meanings-0.048
ales-0.047
Token his
Token ID465
Feature activation+1.24e-03

Pos logit contributions
ern+0.027
iles+0.027
alia+0.026
et+0.026
zie+0.026
Neg logit contributions
 band-0.015
 divorce-0.014
 late-0.013
 upset-0.012
 together-0.012
Token mum
Token ID25682
Feature activation-3.92e-02

Pos logit contributions
 divorce+0.001
broken+0.001
 late+0.001
 band+0.001
 helicopter+0.001
Neg logit contributions
ern-0.002
ales-0.002
iles-0.002
ad-0.002
zie-0.002
Token�
Token ID129
Feature activation+1.36e-02

Pos logit contributions
ad+0.011
ern+0.010
iles+0.010
 Lucas+0.010
ales+0.010
Neg logit contributions
 divorce-0.008
broken-0.008
 late-0.007
 band-0.007
 severe-0.007
Token�
Token ID241
Feature activation-8.07e-02

Pos logit contributions
broken+0.024
 late+0.023
 upset+0.022
 sad+0.022
 divorce+0.021
Neg logit contributions
iles-0.018
alia-0.017
 Lucas-0.016
ern-0.016
 Mae-0.016
TokenOld
Token ID19620
Feature activation-1.11e-02

Pos logit contributions
ales+0.145
ern+0.145
iles+0.144
zie+0.143
et+0.141
Neg logit contributions
 late-0.097
broken-0.096
 divorce-0.096
B-0.084
 band-0.083
Token man
Token ID582
Feature activation-2.31e-02

Pos logit contributions
ern+0.022
ales+0.021
zie+0.020
iles+0.020
 meanings+0.020
Neg logit contributions
 divorce-0.012
 late-0.012
broken-0.012
 helicopter-0.011
 band-0.011
Token,
Token ID11
Feature activation-1.88e-03

Pos logit contributions
ern+0.041
ales+0.040
iles+0.039
zie+0.038
 meanings+0.038
Neg logit contributions
broken-0.029
 divorce-0.029
 late-0.027
 band-0.026
 helicopter-0.026
Token why
Token ID1521
Feature activation-1.42e-01

Pos logit contributions
ern+0.003
ales+0.003
iles+0.003
zie+0.003
 meanings+0.003
Neg logit contributions
 divorce-0.002
broken-0.002
 late-0.002
 band-0.002
 helicopter-0.002
Token do
Token ID466
Feature activation-8.67e-02

Pos logit contributions
ern+0.264
ales+0.253
iles+0.246
zie+0.245
ad+0.242
Neg logit contributions
 divorce-0.173
broken-0.171
 late-0.163
 band-0.151
 severe-0.150
Token you
Token ID345
Feature activation-1.07e-02

Pos logit contributions
ern+0.155
iles+0.148
ales+0.147
ad+0.147
zie+0.145
Neg logit contributions
 divorce-0.110
broken-0.109
 late-0.101
 severe-0.094
 band-0.093
Token whistle
Token ID16121
Feature activation+8.46e-02

Pos logit contributions
ern+0.018
ales+0.017
zie+0.016
iles+0.016
io+0.016
Neg logit contributions
broken-0.015
 divorce-0.015
 late-0.014
 band-0.014
 severe-0.013
Token?
Token ID30
Feature activation-1.21e-02

Pos logit contributions
 divorce+0.121
broken+0.119
 late+0.109
 severe+0.106
AAAA+0.103
Neg logit contributions
ern-0.133
ad-0.131
iles-0.130
et-0.128
ales-0.127
Token It
Token ID632
Feature activation+1.93e-02

Pos logit contributions
ern+0.017
ad+0.016
 Lucas+0.016
iles+0.015
ales+0.015
Neg logit contributions
broken-0.021
 divorce-0.021
 severe-0.018
 late-0.018
burn-0.018
Token but
Token ID475
Feature activation-8.12e-02

Pos logit contributions
ern+0.020
iles+0.019
ad+0.019
ales+0.019
 meanings+0.019
Neg logit contributions
broken-0.012
 late-0.012
 divorce-0.012
 sad-0.011
 band-0.010
Token he
Token ID339
Feature activation+1.44e-02

Pos logit contributions
ern+0.177
ales+0.173
iles+0.169
ad+0.168
alia+0.167
Neg logit contributions
 divorce-0.080
 late-0.073
broken-0.069
 helicopter-0.066
 band-0.062
Token does
Token ID857
Feature activation-6.81e-02

Pos logit contributions
 sad+0.015
 band+0.015
 divorce+0.015
 sank+0.015
 we+0.014
Neg logit contributions
ern-0.029
ales-0.028
zie-0.027
ad-0.025
 meanings-0.024
Token not
Token ID407
Feature activation-1.37e-02

Pos logit contributions
ern+0.110
ales+0.107
iles+0.106
zie+0.105
ad+0.103
Neg logit contributions
broken-0.099
 divorce-0.099
 late-0.094
 severe-0.087
 band-0.086
Token smell
Token ID8508
Feature activation-7.85e-02

Pos logit contributions
ern+0.026
 meanings+0.026
zie+0.024
ales+0.024
io+0.024
Neg logit contributions
 divorce-0.016
 late-0.016
 band-0.015
broken-0.015
 upset-0.014
Token them
Token ID606
Feature activation-2.80e-02

Pos logit contributions
ern+0.128
zie+0.120
ales+0.118
 meanings+0.118
iles+0.117
Neg logit contributions
 divorce-0.118
 late-0.113
broken-0.112
 band-0.105
 helicopter-0.103
Token.
Token ID13
Feature activation-1.97e-02

Pos logit contributions
ern+0.037
iles+0.035
ales+0.034
ad+0.034
et+0.034
Neg logit contributions
 divorce-0.049
broken-0.049
 late-0.046
AAAA-0.044
 severe-0.044
Token\n
Token ID198
Feature activation+2.35e-02

Pos logit contributions
ern+0.031
iles+0.030
ad+0.030
ales+0.030
zie+0.029
Neg logit contributions
 divorce-0.029
broken-0.029
 late-0.027
 severe-0.025
 band-0.025
Token\n
Token ID198
Feature activation+2.37e-02

Pos logit contributions
 divorce+0.025
broken+0.025
 late+0.023
 severe+0.021
 band+0.020
Neg logit contributions
ern-0.047
iles-0.045
ales-0.045
ad-0.045
et-0.044
TokenThey
Token ID2990
Feature activation-2.97e-02

Pos logit contributions
 divorce+0.029
broken+0.028
 late+0.024
 helicopter+0.024
burn+0.024
Neg logit contributions
ern-0.044
iles-0.042
ad-0.042
ales-0.041
 Lucas-0.041
Token are
Token ID389
Feature activation-6.84e-02

Pos logit contributions
ern+0.061
ales+0.060
iddles+0.058
zie+0.057
io+0.056
Neg logit contributions
 sad-0.029
 divorce-0.028
 band-0.028
 upset-0.026
 broke-0.026
Token and
Token ID290
Feature activation+4.17e-02

Pos logit contributions
broken+0.106
 divorce+0.106
 late+0.101
 band+0.093
 severe+0.092
Neg logit contributions
ern-0.151
iles-0.145
ales-0.143
zie-0.143
ad-0.143
Token loses
Token ID14754
Feature activation-3.13e-03

Pos logit contributions
 divorce+0.061
broken+0.060
 late+0.059
 band+0.056
 helicopter+0.054
Neg logit contributions
ern-0.066
iles-0.064
ales-0.064
zie-0.062
ad-0.062
Token his
Token ID465
Feature activation-1.39e-03

Pos logit contributions
iles+0.005
ales+0.005
 meanings+0.005
ern+0.005
zie+0.005
Neg logit contributions
broken-0.004
 late-0.004
 band-0.004
 badly-0.004
 divorce-0.004
Token balance
Token ID5236
Feature activation-3.72e-02

Pos logit contributions
ern+0.002
ad+0.002
 answers+0.002
et+0.002
 meanings+0.002
Neg logit contributions
 divorce-0.002
broken-0.002
 late-0.002
B-0.002
 severe-0.002
Token.
Token ID13
Feature activation-1.16e-02

Pos logit contributions
ern+0.060
iles+0.059
 meanings+0.059
ales+0.058
zie+0.058
Neg logit contributions
broken-0.054
 late-0.051
 divorce-0.050
 band-0.047
 severe-0.047
Token He
Token ID679
Feature activation-4.38e-02

Pos logit contributions
ern+0.020
iles+0.019
ales+0.019
ad+0.019
et+0.019
Neg logit contributions
 divorce-0.016
broken-0.015
 late-0.015
 band-0.013
 severe-0.013
Token falls
Token ID8953
Feature activation+9.10e-02

Pos logit contributions
ern+0.077
ales+0.076
zie+0.074
iles+0.072
ad+0.071
Neg logit contributions
 divorce-0.056
broken-0.055
 band-0.053
 late-0.052
 helicopter-0.051
Token off
Token ID572
Feature activation+6.45e-02

Pos logit contributions
 divorce+0.139
broken+0.126
 severe+0.116
 sudden+0.112
Sorry+0.109
Neg logit contributions
ern-0.146
ad-0.141
et-0.139
 answers-0.135
 Lucas-0.134
Token his
Token ID465
Feature activation-4.57e-04

Pos logit contributions
 divorce+0.111
broken+0.109
 late+0.098
 severe+0.097
AAAA+0.096
Neg logit contributions
ern-0.088
ad-0.086
iles-0.082
et-0.081
 Lucas-0.081
Token bike
Token ID7161
Feature activation-3.29e-03

Pos logit contributions
ern+0.001
ad+0.001
et+0.001
iles+0.001
ales+0.001
Neg logit contributions
 divorce-0.001
broken-0.001
 late-0.001
 severe-0.001
AAAA-0.001
Token and
Token ID290
Feature activation+4.23e-02

Pos logit contributions
ern+0.006
ad+0.006
et+0.005
ales+0.005
iles+0.005
Neg logit contributions
 divorce-0.004
broken-0.004
 late-0.004
 severe-0.004
AAAA-0.004
Token careful
Token ID8161
Feature activation+3.94e-03

Pos logit contributions
ern+0.072
ales+0.072
ad+0.072
iles+0.071
zie+0.071
Neg logit contributions
broken-0.029
 late-0.026
 sad-0.025
 upset-0.023
 band-0.023
Token not
Token ID407
Feature activation-1.26e-02

Pos logit contributions
broken+0.005
 divorce+0.005
 late+0.005
 severe+0.005
 band+0.005
Neg logit contributions
ern-0.007
iles-0.006
ales-0.006
ad-0.006
zie-0.006
Token to
Token ID284
Feature activation-1.93e-02

Pos logit contributions
ern+0.023
iles+0.022
ales+0.022
ad+0.022
et+0.021
Neg logit contributions
 divorce-0.015
broken-0.015
 late-0.014
AAAA-0.013
 severe-0.013
Token spill
Token ID19431
Feature activation+1.43e-02

Pos logit contributions
ern+0.036
io+0.036
zie+0.036
iles+0.036
 meanings+0.036
Neg logit contributions
broken-0.022
 divorce-0.022
 band-0.021
 late-0.021
 badly-0.020
Token the
Token ID262
Feature activation-1.03e-02

Pos logit contributions
 divorce+0.023
broken+0.023
 late+0.021
 severe+0.020
 band+0.020
Neg logit contributions
ern-0.021
iles-0.020
ad-0.020
ales-0.020
zie-0.020
Token y
Token ID331
Feature activation-2.49e-02

Pos logit contributions
ern+0.018
ad+0.017
iles+0.017
ales+0.017
 Lucas+0.017
Neg logit contributions
 divorce-0.014
broken-0.013
 late-0.013
 band-0.012
 helicopter-0.012
Tokenolk
Token ID13597
Feature activation+1.54e-02

Pos logit contributions
ern+0.033
iles+0.030
ad+0.030
 meanings+0.030
 Lucas+0.030
Neg logit contributions
broken-0.043
 divorce-0.043
 late-0.041
 band-0.039
 severe-0.039
Token or
Token ID393
Feature activation+3.17e-02

Pos logit contributions
 divorce+0.020
broken+0.020
 late+0.019
 severe+0.018
 band+0.017
Neg logit contributions
ern-0.027
iles-0.026
ad-0.025
ales-0.025
zie-0.025
Token the
Token ID262
Feature activation-9.62e-03

Pos logit contributions
 divorce+0.045
broken+0.045
 late+0.043
 band+0.040
 severe+0.040
Neg logit contributions
ern-0.052
iles-0.049
ad-0.049
ales-0.049
zie-0.048
Token white
Token ID2330
Feature activation-6.50e-02

Pos logit contributions
ern+0.018
ales+0.017
ad+0.017
 Lucas+0.017
zie+0.017
Neg logit contributions
 divorce-0.011
broken-0.011
 band-0.011
 late-0.011
 helicopter-0.011
Token.
Token ID13
Feature activation-1.66e-02

Pos logit contributions
ern+0.108
ad+0.102
iles+0.101
zie+0.101
 meanings+0.101
Neg logit contributions
broken-0.089
 divorce-0.088
 late-0.083
 band-0.082
 helicopter-0.080
TokenThe
Token ID464
Feature activation-8.70e-03

Pos logit contributions
 divorce+0.044
broken+0.041
burn+0.039
 helicopter+0.037
 unst+0.036
Neg logit contributions
ern-0.054
ad-0.053
iles-0.052
 meanings-0.049
ori-0.049
Token mixer
Token ID33938
Feature activation+5.42e-02

Pos logit contributions
et+0.012
 answers+0.012
iles+0.012
ad+0.012
zie+0.012
Neg logit contributions
broken-0.014
 divorce-0.013
 late-0.013
 severe-0.012
 badly-0.012
Token waved
Token ID26834
Feature activation+6.02e-02

Pos logit contributions
 divorce+0.077
broken+0.075
 late+0.072
 severe+0.067
AAAA+0.066
Neg logit contributions
ern-0.088
ad-0.086
et-0.085
iles-0.084
zie-0.083
Token to
Token ID284
Feature activation-1.36e-02

Pos logit contributions
 divorce+0.087
broken+0.081
 severe+0.075
 late+0.074
AAAA+0.073
Neg logit contributions
ad-0.096
et-0.096
ern-0.094
 Lucas-0.093
iles-0.092
Token the
Token ID262
Feature activation-1.99e-03

Pos logit contributions
ern+0.025
alia+0.023
iles+0.023
 meanings+0.022
ales+0.022
Neg logit contributions
 divorce-0.018
broken-0.017
 late-0.017
 helicopter-0.016
 band-0.016
Token clouds
Token ID15114
Feature activation-5.46e-02

Pos logit contributions
ern+0.003
iles+0.003
et+0.003
ad+0.003
 answers+0.003
Neg logit contributions
broken-0.003
 divorce-0.003
 late-0.003
 severe-0.003
AAAA-0.003
Token and
Token ID290
Feature activation+4.56e-02

Pos logit contributions
ern+0.084
ad+0.084
et+0.083
ales+0.082
iles+0.081
Neg logit contributions
 divorce-0.081
broken-0.078
 late-0.073
 severe-0.073
AAAA-0.072
Token asked
Token ID1965
Feature activation+3.10e-03

Pos logit contributions
 divorce+0.067
broken+0.067
 late+0.062
 severe+0.059
 helicopter+0.058
Neg logit contributions
ern-0.072
iles-0.069
ad-0.068
ales-0.068
zie-0.067
Token if
Token ID611
Feature activation+2.19e-02

Pos logit contributions
 divorce+0.004
broken+0.004
 late+0.004
 helicopter+0.004
 band+0.004
Neg logit contributions
ern-0.005
iles-0.005
ales-0.005
ad-0.005
zie-0.005
Token they
Token ID484
Feature activation+9.48e-02

Pos logit contributions
broken+0.029
 divorce+0.028
 late+0.025
 severe+0.025
 million+0.024
Neg logit contributions
ern-0.037
iles-0.037
ad-0.037
 Lucas-0.036
ales-0.036
Token would
Token ID561
Feature activation+2.65e-02

Pos logit contributions
 divorce+0.085
broken+0.083
 late+0.078
 band+0.071
 helicopter+0.068
Neg logit contributions
ern-0.204
ales-0.193
zie-0.193
ad-0.188
iles-0.188
Token She
Token ID1375
Feature activation-6.16e-02

Pos logit contributions
ern+0.022
iles+0.022
ales+0.022
ad+0.021
et+0.021
Neg logit contributions
 divorce-0.022
broken-0.022
 late-0.021
 severe-0.019
AAAA-0.019
Token also
Token ID635
Feature activation-5.60e-02

Pos logit contributions
ern+0.117
ales+0.114
iles+0.114
zie+0.110
ad+0.110
Neg logit contributions
broken-0.072
 divorce-0.071
 late-0.065
 band-0.061
 severe-0.060
Token realized
Token ID6939
Feature activation-3.57e-03

Pos logit contributions
ern+0.115
ales+0.111
iles+0.111
zie+0.108
ad+0.106
Neg logit contributions
broken-0.059
 divorce-0.057
 late-0.054
 band-0.049
 severe-0.048
Token that
Token ID326
Feature activation-1.07e-01

Pos logit contributions
ern+0.006
ales+0.006
iles+0.006
zie+0.005
et+0.005
Neg logit contributions
 divorce-0.005
broken-0.005
 late-0.005
 helicopter-0.005
 band-0.005
Token asking
Token ID4737
Feature activation-8.35e-02

Pos logit contributions
ern+0.179
ales+0.171
iles+0.170
ad+0.168
zie+0.166
Neg logit contributions
 divorce-0.150
broken-0.146
 late-0.139
 severe-0.129
 band-0.129
Token for
Token ID329
Feature activation-1.94e-01

Pos logit contributions
ern+0.136
ales+0.123
iles+0.122
zie+0.120
et+0.117
Neg logit contributions
 divorce-0.129
 late-0.123
 band-0.121
broken-0.119
 sad-0.116
Token help
Token ID1037
Feature activation+1.20e-02

Pos logit contributions
ern+0.360
ales+0.344
zie+0.344
iles+0.340
alia+0.338
Neg logit contributions
 divorce-0.254
 late-0.239
 band-0.237
broken-0.230
 severe-0.221
Token was
Token ID373
Feature activation-1.10e-04

Pos logit contributions
broken+0.014
 divorce+0.014
 late+0.013
 band+0.012
 severe+0.012
Neg logit contributions
ern-0.023
iles-0.022
ales-0.022
 Lucas-0.022
ad-0.022
Token not
Token ID407
Feature activation-1.37e-02

Pos logit contributions
ern+0.000
ales+0.000
iles+0.000
ad+0.000
zie+0.000
Neg logit contributions
 divorce-0.000
 late-0.000
broken-0.000
 severe-0.000
 band-0.000
Token bad
Token ID2089
Feature activation+4.87e-02

Pos logit contributions
ern+0.022
iles+0.021
ad+0.021
ales+0.021
 meanings+0.020
Neg logit contributions
 divorce-0.020
broken-0.020
 late-0.018
AAAA-0.017
 severe-0.017
Token,
Token ID11
Feature activation-6.33e-03

Pos logit contributions
 divorce+0.065
broken+0.063
 late+0.062
 band+0.056
 helicopter+0.055
Neg logit contributions
ern-0.085
zie-0.081
ales-0.080
ad-0.080
iles-0.080
Token They
Token ID1119
Feature activation+1.42e-02

Pos logit contributions
ad+0.008
ern+0.008
 meanings+0.008
iles+0.008
ales+0.008
Neg logit contributions
 divorce-0.007
 late-0.007
broken-0.006
 sudden-0.006
B-0.006
Token liked
Token ID8288
Feature activation-6.36e-02

Pos logit contributions
 accidentally+0.011
broken+0.010
 band+0.010
 fixed+0.010
 broke+0.010
Neg logit contributions
ern-0.031
 poses-0.030
iles-0.029
ogo-0.029
ales-0.029
Token to
Token ID284
Feature activation-1.44e-02

Pos logit contributions
ern+0.128
ad+0.120
zie+0.120
ales+0.119
iles+0.119
Neg logit contributions
 divorce-0.068
broken-0.064
 late-0.063
 helicopter-0.059
 band-0.058
Token pick
Token ID2298
Feature activation-3.93e-02

Pos logit contributions
iles+0.029
ogo+0.029
 Lucas+0.029
ori+0.029
io+0.029
Neg logit contributions
 band-0.012
 helicopter-0.011
 happy-0.011
 sad-0.011
 accidentally-0.010
Token the
Token ID262
Feature activation-7.57e-03

Pos logit contributions
ern+0.071
zie+0.067
 meanings+0.066
ad+0.065
icity+0.064
Neg logit contributions
 late-0.047
 divorce-0.047
 together-0.046
 helicopter-0.046
 band-0.044
Token fruits
Token ID15921
Feature activation-1.57e-01

Pos logit contributions
ern+0.014
 Lucas+0.013
ad+0.012
ori+0.012
icity+0.012
Neg logit contributions
 little-0.010
 million-0.009
 helicopter-0.009
 sad-0.009
 late-0.009
Token and
Token ID290
Feature activation+4.71e-02

Pos logit contributions
ern+0.322
 meanings+0.311
iles+0.306
ad+0.305
zie+0.302
Neg logit contributions
broken-0.156
 divorce-0.154
 late-0.150
 together-0.135
 helicopter-0.131
Token share
Token ID2648
Feature activation+1.61e-03

Pos logit contributions
 divorce+0.058
 band+0.055
 we+0.055
 late+0.054
 helicopter+0.054
Neg logit contributions
ern-0.085
iles-0.081
ad-0.079
io-0.079
ori-0.079
Token them
Token ID606
Feature activation-2.32e-02

Pos logit contributions
 divorce+0.002
broken+0.002
 late+0.002
 band+0.002
 helicopter+0.002
Neg logit contributions
ern-0.003
zie-0.002
iles-0.002
ad-0.002
alia-0.002
Token with
Token ID351
Feature activation-5.16e-02

Pos logit contributions
zie+0.047
iles+0.047
ern+0.047
 meanings+0.046
ad+0.045
Neg logit contributions
broken-0.024
 together-0.022
 late-0.021
 divorce-0.020
 badly-0.020
Token their
Token ID511
Feature activation+4.59e-02

Pos logit contributions
ern+0.093
zie+0.089
ad+0.085
ales+0.085
iles+0.083
Neg logit contributions
 divorce-0.063
B-0.062
 band-0.060
 together-0.060
broken-0.059
Token\n
Token ID198
Feature activation+3.48e-02

Pos logit contributions
 divorce+0.032
broken+0.032
 late+0.031
B+0.027
 band+0.026
Neg logit contributions
ern-0.073
iles-0.073
ales-0.072
et-0.072
 meanings-0.071
TokenJ
Token ID41
Feature activation+2.02e-03

Pos logit contributions
 divorce+0.040
broken+0.039
 late+0.035
 helicopter+0.033
 severe+0.033
Neg logit contributions
ern-0.067
ad-0.065
iles-0.064
ales-0.063
zie-0.063
Tokenenny
Token ID11870
Feature activation+2.80e-02

Pos logit contributions
 divorce+0.002
broken+0.002
 late+0.002
 helicopter+0.002
 severe+0.002
Neg logit contributions
ern-0.004
ad-0.004
iles-0.004
ales-0.004
zie-0.004
Token stayed
Token ID9658
Feature activation+3.10e-02

Pos logit contributions
broken+0.033
 divorce+0.032
 late+0.028
 band+0.027
 severe+0.027
Neg logit contributions
ern-0.053
iles-0.052
ales-0.052
ad-0.050
zie-0.050
Token there
Token ID612
Feature activation+6.15e-02

Pos logit contributions
broken+0.038
 late+0.036
 divorce+0.036
 band+0.032
 severe+0.031
Neg logit contributions
ern-0.060
iles-0.058
ales-0.058
 meanings-0.057
ad-0.056
Token for
Token ID329
Feature activation-1.85e-01

Pos logit contributions
broken+0.093
 late+0.090
 divorce+0.087
 together+0.086
 band+0.081
Neg logit contributions
ern-0.099
iles-0.096
ales-0.095
ad-0.091
 meanings-0.090
Token the
Token ID262
Feature activation-2.92e-03

Pos logit contributions
ales+0.399
iles+0.398
ern+0.391
zie+0.385
io+0.380
Neg logit contributions
 late-0.180
 divorce-0.170
 band-0.151
broken-0.146
B-0.143
Token rest
Token ID1334
Feature activation+6.55e-02

Pos logit contributions
ern+0.006
ales+0.005
io+0.005
 Lucas+0.005
alia+0.005
Neg logit contributions
 divorce-0.004
 late-0.003
broken-0.003
 sudden-0.003
 band-0.003
Token of
Token ID286
Feature activation+1.59e-02

Pos logit contributions
broken+0.054
 divorce+0.052
 late+0.048
 band+0.042
 severe+0.041
Neg logit contributions
ern-0.146
 meanings-0.140
iles-0.140
ad-0.140
ales-0.139
Token the
Token ID262
Feature activation-2.23e-03

Pos logit contributions
 late+0.018
 divorce+0.017
 band+0.017
 together+0.016
broken+0.015
Neg logit contributions
ales-0.031
ern-0.030
iles-0.030
 meanings-0.030
alia-0.029
Token day
Token ID1110
Feature activation+5.37e-02

Pos logit contributions
ern+0.005
 Lucas+0.004
ales+0.004
io+0.004
alia+0.004
Neg logit contributions
 divorce-0.002
 late-0.002
broken-0.002
 sudden-0.002
 band-0.002
Token of
Token ID286
Feature activation+8.41e-03

Pos logit contributions
 divorce+0.003
 late+0.003
broken+0.003
 band+0.003
 helicopter+0.003
Neg logit contributions
ern-0.013
zie-0.013
ales-0.012
 meanings-0.012
iles-0.012
Token exciting
Token ID7895
Feature activation-1.12e-01

Pos logit contributions
 divorce+0.012
broken+0.012
 late+0.012
 band+0.011
 severe+0.011
Neg logit contributions
ern-0.013
iles-0.013
ales-0.013
ad-0.013
zie-0.013
Token things
Token ID1243
Feature activation-6.37e-02

Pos logit contributions
ern+0.197
zie+0.189
ales+0.187
 Lucas+0.186
iles+0.185
Neg logit contributions
 divorce-0.142
broken-0.138
 late-0.134
 band-0.130
 helicopter-0.129
Token to
Token ID284
Feature activation-1.74e-02

Pos logit contributions
ern+0.115
iles+0.112
ales+0.110
 meanings+0.110
zie+0.110
Neg logit contributions
 divorce-0.078
broken-0.076
 late-0.075
 together-0.069
 band-0.069
Token pay
Token ID1414
Feature activation+2.11e-02

Pos logit contributions
 meanings+0.038
alia+0.038
iles+0.037
ror+0.037
ales+0.037
Neg logit contributions
 band-0.016
 divorce-0.014
 happy-0.014
 upset-0.013
 sad-0.013
Token for
Token ID329
Feature activation-1.91e-01

Pos logit contributions
 late+0.034
 together+0.032
 divorce+0.031
broken+0.031
 unst+0.029
Neg logit contributions
ern-0.032
iles-0.031
zie-0.030
alia-0.030
wered-0.029
Token.
Token ID13
Feature activation-9.63e-03

Pos logit contributions
iles+0.315
zie+0.313
alia+0.312
ern+0.308
et+0.307
Neg logit contributions
 divorce-0.258
 late-0.253
 badly-0.249
 band-0.243
broken-0.242
Token She
Token ID1375
Feature activation-6.25e-02

Pos logit contributions
iles+0.020
ern+0.019
et+0.019
ales+0.019
ad+0.018
Neg logit contributions
 late-0.009
 divorce-0.009
 Lots-0.008
 together-0.008
broken-0.007
Token excited
Token ID6568
Feature activation+9.42e-02

Pos logit contributions
ern+0.118
ales+0.115
iles+0.114
ad+0.112
zie+0.112
Neg logit contributions
broken-0.074
 divorce-0.071
 late-0.065
 band-0.063
 sudden-0.061
Tokenly
Token ID306
Feature activation-6.42e-02

Pos logit contributions
broken+0.119
 divorce+0.118
 late+0.115
 band+0.105
 together+0.102
Neg logit contributions
ern-0.176
iles-0.158
stack-0.155
 Lucas-0.154
zie-0.153
Token asked
Token ID1965
Feature activation-5.89e-04

Pos logit contributions
ern+0.117
iles+0.113
ad+0.112
ales+0.112
zie+0.110
Neg logit contributions
 divorce-0.079
broken-0.079
 late-0.072
 severe-0.067
 band-0.066
TokenOne
Token ID3198
Feature activation-9.95e-02

Pos logit contributions
 divorce+0.060
broken+0.060
 late+0.057
 severe+0.054
 helicopter+0.054
Neg logit contributions
ern-0.047
iles-0.044
ales-0.044
ad-0.044
zie-0.043
Token day
Token ID1110
Feature activation+5.35e-02

Pos logit contributions
iles+0.125
ern+0.123
ad+0.122
et+0.121
ales+0.119
Neg logit contributions
broken-0.176
 divorce-0.175
AAAA-0.155
 severe-0.154
 million-0.153
Token,
Token ID11
Feature activation-1.52e-03

Pos logit contributions
broken+0.092
 divorce+0.091
 severe+0.082
 late+0.080
burn+0.078
Neg logit contributions
iles-0.071
ern-0.070
ad-0.070
 Lucas-0.069
ales-0.067
Token he
Token ID339
Feature activation+2.38e-02

Pos logit contributions
 Lucas+0.002
iles+0.002
ad+0.002
ern+0.002
ales+0.002
Neg logit contributions
broken-0.003
 divorce-0.003
 severe-0.002
arre-0.002
burn-0.002
Token went
Token ID1816
Feature activation+1.13e-01

Pos logit contributions
 divorce+0.045
broken+0.043
 helicopter+0.042
 severe+0.041
apy+0.041
Neg logit contributions
ad-0.026
alia-0.026
et-0.026
iles-0.025
 answers-0.025
Token for
Token ID329
Feature activation-1.84e-01

Pos logit contributions
 divorce+0.119
broken+0.117
 late+0.116
 band+0.106
 helicopter+0.100
Neg logit contributions
ern-0.229
iles-0.224
 meanings-0.222
ad-0.219
ales-0.218
Token a
Token ID257
Feature activation+5.22e-02

Pos logit contributions
ern+0.322
iles+0.308
ad+0.307
ales+0.306
zie+0.301
Neg logit contributions
 divorce-0.243
broken-0.242
 late-0.223
 severe-0.209
 band-0.206
Token walk
Token ID2513
Feature activation+7.17e-02

Pos logit contributions
broken+0.099
arre+0.090
 severe+0.087
 late+0.087
 million+0.087
Neg logit contributions
ern-0.057
 Lucas-0.052
 count-0.052
 answers-0.052
 Mae-0.051
Token with
Token ID351
Feature activation-4.85e-02

Pos logit contributions
broken+0.098
 divorce+0.097
 late+0.093
 band+0.085
 helicopter+0.084
Neg logit contributions
ern-0.122
iles-0.117
ad-0.115
ales-0.115
 meanings-0.115
Token his
Token ID465
Feature activation+4.30e-03

Pos logit contributions
ern+0.079
iles+0.074
ales+0.073
 meanings+0.072
alia+0.071
Neg logit contributions
 divorce-0.073
 late-0.071
broken-0.070
 band-0.067
 helicopter-0.064
Token brain
Token ID3632
Feature activation+4.09e-03

Pos logit contributions
 divorce+0.005
broken+0.005
 late+0.005
 band+0.005
 helicopter+0.004
Neg logit contributions
ern-0.008
ales-0.008
ad-0.008
iles-0.008
zie-0.008