TOP ACTIVATIONS
MAX = 0.075

It helps us remember when birth days and special days are
ie . They both had birth days coming up , and
promised to celebrate all her birth days together . <|endoftext|> Once
very excited because she loved birth days . On
so excited ! He loved birth days .
also liked to celebrate their birth days . Tim 's birthday
was time to celebrate their birth days . Max had a
Welcome home , Daddy !", Cher ie said happily . <|endoftext|>
and then said , " How about you mix together all
and Jane celebrated both their birth days together . They played
every year , on their birth days , they would celebrate
" Where is daddy ?", Cher ie asked her mom my
does . She learned that birth days are not about cakes
a box of pretty or nam ents that she wanted to
a big box of or nam ents . They were so
with lots of pretty or nam ents . One day ,
wrong - all the water had gone !
box full of beautiful or nam ents .
who liked to collect or nam ents . Everywhere she went
to play with me ?" Will said , " Yes ,

MIN ACTIVATIONS
MIN = -0.054

many books on his books he lf . His books he
many books on her books he lf in her room .
." The moral of the story is that it
you go , my d arl ings . Enjoy your lunch
together , but Gret a and Jing always seemed to bow
it back on the books he lf so we can find
them back on the books he lf . Then they grabbed
When she opened the books he lf , everything inside was
the books on his books he lf .
And Lucy never forgot the old woman 's words or the
books back on the books he lf , smiling proudly .
upon a time , little Su zy asked her Mom what
911 , what is your emergency ?" a voice said on
book falls off the books he lf . It makes a
is hiding behind the books he lf . The cat jumps
of fun exploring the books he lf , and his mom
more books from the books he lf . The end .
some sticks down . Tim and Uncle Jack laughed . They
The moral of the story is : With
washed and dried , Sandy and her uncle played tag in

INTERVAL 0.043 - 0.075
CONTAINS 0.005%

on the roof and sun bat he . He loved the
take care of ourselves . How about I tell you a
" Okay . How do we play ?" Jen
said , " Well , how about we make a game
went she searched for or nam ents , and the more

INTERVAL 0.011 - 0.043
CONTAINS 10.595%

her shower , she went back to her bedroom .
was sure that she would never forget it ! <|endoftext|> Once
birthday . You ruined it !" S ara and
first bee . So , the first bee invited the other
ball only moved a little bit . Tommy was disappointed his

INTERVAL -0.022 - 0.011
CONTAINS 88.509%

He was so sad that he did not fit into the
" I don 't know ," said mom . "
, " I just wanted to see how it feels .
girl 's room . The wardrobe was very dull and old
some milk and butter . Put on your coats , kids

INTERVAL -0.054 - -0.022
CONTAINS 0.891%

said , " Thank you for helping me fix my painting
never ended ! Su zie was sad that she
end . <|endoftext|> Once upon a time , there was a
horses , cows and goats . Car oline
paused and smiled at their beautiful moment . The
TokenIt
Token ID1026
Feature activation+6.14e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
s+0.002
 neither+0.002
Neg logit contributions
opl-0.002
izer-0.002
 stret-0.002
rob-0.002
ep-0.002
Token helps
Token ID5419
Feature activation-2.86e-03

Pos logit contributions
ived+0.011
iked+0.011
uttered+0.010
bs+0.010
 contain+0.010
Neg logit contributions
opl-0.010
ep-0.010
izer-0.009
Kim-0.009
rob-0.009
Token us
Token ID514
Feature activation+7.97e-03

Pos logit contributions
opl+0.004
ep+0.004
Kim+0.004
izer+0.004
coon+0.004
Neg logit contributions
ived-0.006
uttered-0.006
iked-0.005
ashed-0.005
bs-0.005
Token remember
Token ID3505
Feature activation+7.81e-03

Pos logit contributions
uttered+0.014
ived+0.014
iked+0.013
ashed+0.012
 disagreed+0.012
Neg logit contributions
opl-0.014
ep-0.013
izer-0.013
Kim-0.013
coon-0.012
Token when
Token ID618
Feature activation+1.01e-02

Pos logit contributions
ived+0.012
uttered+0.011
ashed+0.010
iked+0.010
bs+0.010
Neg logit contributions
opl-0.015
ep-0.015
Kim-0.014
izer-0.014
coon-0.013
Token birth
Token ID4082
Feature activation+7.52e-02

Pos logit contributions
ived+0.019
uttered+0.019
ashed+0.018
ared+0.018
ver+0.017
Neg logit contributions
opl-0.015
ep-0.014
Kim-0.013
 track-0.013
coon-0.013
Tokendays
Token ID12545
Feature activation+2.24e-02

Pos logit contributions
ived+0.116
uttered+0.104
ashed+0.103
bs+0.103
 goodbye+0.102
Neg logit contributions
opl-0.131
ep-0.127
Kim-0.118
father-0.116
istle-0.114
Token and
Token ID290
Feature activation+9.32e-03

Pos logit contributions
ived+0.037
uttered+0.036
iked+0.033
ared+0.031
ashed+0.031
Neg logit contributions
opl-0.042
ep-0.037
Kim-0.035
rob-0.035
istle-0.034
Token special
Token ID2041
Feature activation+9.56e-03

Pos logit contributions
ived+0.017
uttered+0.017
iked+0.015
ashed+0.015
ared+0.015
Neg logit contributions
opl-0.016
ep-0.015
Kim-0.014
 track-0.013
izer-0.013
Token days
Token ID1528
Feature activation+8.25e-03

Pos logit contributions
ived+0.017
uttered+0.016
ashed+0.015
ared+0.015
bs+0.014
Neg logit contributions
opl-0.017
ep-0.015
izer-0.015
rob-0.014
coon-0.014
Token are
Token ID389
Feature activation+5.45e-03

Pos logit contributions
ived+0.014
uttered+0.014
ashed+0.012
ared+0.012
iked+0.012
Neg logit contributions
opl-0.015
ep-0.013
Kim-0.013
izer-0.013
rob-0.012
Tokenie
Token ID494
Feature activation+1.52e-02

Pos logit contributions
ived+0.010
bs+0.010
 disagreed+0.010
uttered+0.010
s+0.009
Neg logit contributions
opl-0.026
izer-0.025
Kim-0.025
rob-0.024
Wh-0.024
Token.
Token ID13
Feature activation+1.40e-02

Pos logit contributions
ived+0.024
uttered+0.022
bs+0.021
 disagreed+0.020
 neither+0.020
Neg logit contributions
opl-0.030
rob-0.028
izer-0.028
Kim-0.027
ep-0.027
Token They
Token ID1119
Feature activation+1.44e-02

Pos logit contributions
uttered+0.012
bs+0.012
s+0.012
ived+0.012
 neither+0.011
Neg logit contributions
opl-0.036
ep-0.035
rob-0.035
izer-0.034
coon-0.033
Token both
Token ID1111
Feature activation+1.74e-02

Pos logit contributions
ived+0.021
uttered+0.020
iked+0.018
bs+0.018
ared+0.017
Neg logit contributions
ep-0.027
opl-0.027
Kim-0.027
rob-0.026
Wh-0.026
Token had
Token ID550
Feature activation+9.10e-03

Pos logit contributions
ived+0.028
uttered+0.026
iked+0.025
 plunge+0.024
bs+0.024
Neg logit contributions
opl-0.031
ep-0.030
Kim-0.029
rob-0.029
Wh-0.028
Token birth
Token ID4082
Feature activation+6.85e-02

Pos logit contributions
ived+0.013
uttered+0.013
bs+0.012
iked+0.011
 disagreed+0.011
Neg logit contributions
opl-0.018
Kim-0.018
ep-0.017
izer-0.017
rob-0.017
Tokendays
Token ID12545
Feature activation+2.06e-02

Pos logit contributions
ived+0.101
uttered+0.093
 neither+0.092
 disagreed+0.090
bs+0.088
Neg logit contributions
ep-0.122
Kim-0.120
opl-0.115
rob-0.110
izer-0.109
Token coming
Token ID2406
Feature activation+4.43e-03

Pos logit contributions
ived+0.032
uttered+0.029
bs+0.026
 plunge+0.026
iked+0.026
Neg logit contributions
opl-0.039
Kim-0.038
ep-0.037
rob-0.036
izer-0.035
Token up
Token ID510
Feature activation+8.30e-03

Pos logit contributions
ived+0.005
uttered+0.004
bs+0.004
 disagreed+0.004
iked+0.004
Neg logit contributions
Kim-0.010
opl-0.010
izer-0.010
ep-0.010
Wh-0.010
Token,
Token ID11
Feature activation-5.48e-03

Pos logit contributions
ived+0.012
uttered+0.011
bs+0.011
als+0.010
iked+0.010
Neg logit contributions
opl-0.017
Kim-0.016
rob-0.016
ep-0.016
izer-0.016
Token and
Token ID290
Feature activation+1.79e-02

Pos logit contributions
Kim+0.008
opl+0.008
ep+0.007
 track+0.007
Wh+0.007
Neg logit contributions
ived-0.011
uttered-0.011
bs-0.010
ared-0.010
iked-0.010
Token promised
Token ID8072
Feature activation+1.42e-03

Pos logit contributions
Kim+0.009
opl+0.008
ep+0.008
rob+0.008
izer+0.008
Neg logit contributions
ived-0.008
uttered-0.007
ared-0.007
iked-0.006
ver-0.006
Token to
Token ID284
Feature activation-1.62e-02

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
 disagreed+0.001
iked+0.001
Neg logit contributions
opl-0.004
Kim-0.004
izer-0.004
ep-0.003
rob-0.003
Token celebrate
Token ID10648
Feature activation+4.83e-03

Pos logit contributions
opl+0.033
Kim+0.031
izer+0.031
rob+0.031
ep+0.030
Neg logit contributions
ived-0.024
uttered-0.022
bs-0.021
iked-0.020
 disagreed-0.020
Token all
Token ID477
Feature activation+6.52e-03

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.006
iked+0.005
 disagreed+0.005
Neg logit contributions
opl-0.011
izer-0.010
rob-0.010
Kim-0.010
ep-0.010
Token her
Token ID607
Feature activation+6.78e-03

Pos logit contributions
ived+0.010
uttered+0.009
bs+0.009
iked+0.008
 disagreed+0.008
Neg logit contributions
opl-0.013
rob-0.012
izer-0.012
Kim-0.012
ep-0.012
Token birth
Token ID4082
Feature activation+6.65e-02

Pos logit contributions
ived+0.013
uttered+0.012
ared+0.012
iked+0.011
 neither+0.011
Neg logit contributions
opl-0.010
ep-0.009
parent-0.009
rob-0.009
izer-0.008
Tokendays
Token ID12545
Feature activation+6.34e-03

Pos logit contributions
ived+0.098
 neither+0.092
 disagreed+0.091
uttered+0.089
bs+0.088
Neg logit contributions
ep-0.119
opl-0.118
Kim-0.113
rob-0.112
father-0.102
Token together
Token ID1978
Feature activation+6.06e-03

Pos logit contributions
ived+0.010
uttered+0.010
bs+0.009
ared+0.009
iked+0.008
Neg logit contributions
opl-0.011
ep-0.011
Kim-0.011
rob-0.010
izer-0.010
Token.
Token ID13
Feature activation-1.03e-02

Pos logit contributions
ived+0.012
uttered+0.011
bs+0.010
iked+0.010
ashed+0.010
Neg logit contributions
opl-0.009
ep-0.008
Kim-0.008
izer-0.008
rob-0.008
Token<|endoftext|>
Token ID50256
Feature activation+4.54e-03

Pos logit contributions
opl+0.021
rob+0.021
izer+0.020
ep+0.020
coon+0.020
Neg logit contributions
ived-0.015
uttered-0.014
bs-0.013
 neither-0.013
iked-0.013
TokenOnce
Token ID7454
Feature activation+3.00e-03

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.006
s+0.005
iked+0.005
Neg logit contributions
opl-0.009
izer-0.009
ep-0.009
Kim-0.009
rob-0.009
Token very
Token ID845
Feature activation+1.43e-04

Pos logit contributions
ived+0.000
bs+0.000
uttered+0.000
als+0.000
iked+0.000
Neg logit contributions
opl-0.000
ep-0.000
Kim-0.000
izer-0.000
 stret-0.000
Token excited
Token ID6568
Feature activation+7.44e-03

Pos logit contributions
als+0.000
ived+0.000
uttered+0.000
iked+0.000
rings+0.000
Neg logit contributions
ep-0.000
opl-0.000
Kim-0.000
izer-0.000
Grand-0.000
Token because
Token ID780
Feature activation+7.98e-03

Pos logit contributions
ived+0.011
uttered+0.011
bs+0.010
iked+0.009
ver+0.009
Neg logit contributions
opl-0.015
Kim-0.014
izer-0.014
rob-0.014
ep-0.014
Token she
Token ID673
Feature activation-6.13e-03

Pos logit contributions
ived+0.009
uttered+0.009
bs+0.008
ared+0.007
ashed+0.007
Neg logit contributions
opl-0.019
Kim-0.018
rob-0.018
ep-0.018
izer-0.017
Token loved
Token ID6151
Feature activation-9.33e-03

Pos logit contributions
opl+0.013
Kim+0.012
rob+0.012
izer+0.012
ep+0.012
Neg logit contributions
ived-0.008
uttered-0.008
bs-0.007
iked-0.007
 neither-0.007
Token birth
Token ID4082
Feature activation+6.39e-02

Pos logit contributions
opl+0.018
rob+0.016
ep+0.016
Kim+0.016
izer+0.016
Neg logit contributions
ived-0.015
uttered-0.015
bs-0.014
ver-0.013
ared-0.013
Tokendays
Token ID12545
Feature activation+1.17e-02

Pos logit contributions
ived+0.084
uttered+0.077
bs+0.072
 neither+0.070
 disagreed+0.069
Neg logit contributions
opl-0.134
Kim-0.131
ep-0.130
rob-0.126
izer-0.124
Token.
Token ID13
Feature activation-1.65e-03

Pos logit contributions
ived+0.018
uttered+0.016
bs+0.016
 neither+0.015
 disagreed+0.015
Neg logit contributions
opl-0.023
izer-0.022
rob-0.022
Kim-0.022
ep-0.021
Token
Token ID198
Feature activation-6.51e-03

Pos logit contributions
opl+0.003
rob+0.003
izer+0.003
ep+0.003
coon+0.003
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
s-0.002
 disagreed-0.002
Token
Token ID198
Feature activation-2.49e-03

Pos logit contributions
opl+0.014
izer+0.013
rob+0.013
ep+0.013
coon+0.013
Neg logit contributions
ived-0.008
s-0.008
uttered-0.008
bs-0.008
ver-0.007
TokenOn
Token ID2202
Feature activation-1.83e-02

Pos logit contributions
opl+0.005
rob+0.005
izer+0.005
coon+0.004
ep+0.004
Neg logit contributions
ived-0.004
s-0.003
bs-0.003
uttered-0.003
 while-0.003
Token so
Token ID523
Feature activation-3.76e-03

Pos logit contributions
opl+0.002
ep+0.002
Kim+0.002
izer+0.002
rob+0.002
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.001
iked-0.001
als-0.001
Token excited
Token ID6568
Feature activation+1.01e-02

Pos logit contributions
opl+0.007
ep+0.007
Kim+0.007
izer+0.006
rob+0.006
Neg logit contributions
ived-0.006
uttered-0.006
als-0.005
ared-0.005
bs-0.005
Token!
Token ID0
Feature activation-4.53e-03

Pos logit contributions
ived+0.012
uttered+0.011
bs+0.010
ver+0.009
iked+0.009
Neg logit contributions
opl-0.023
Kim-0.022
rob-0.022
izer-0.022
ep-0.021
Token He
Token ID679
Feature activation-4.90e-03

Pos logit contributions
opl+0.008
Kim+0.007
izer+0.007
rob+0.007
ep+0.007
Neg logit contributions
ived-0.008
uttered-0.008
bs-0.008
 neither-0.007
 disagreed-0.007
Token loved
Token ID6151
Feature activation-7.77e-03

Pos logit contributions
ep+0.010
opl+0.010
Kim+0.010
rob+0.010
izer+0.009
Neg logit contributions
ived-0.006
uttered-0.006
bs-0.006
 neither-0.006
iked-0.005
Token birth
Token ID4082
Feature activation+6.21e-02

Pos logit contributions
opl+0.018
izer+0.017
Kim+0.017
rob+0.017
ep+0.016
Neg logit contributions
ived-0.009
uttered-0.008
bs-0.008
iked-0.008
 disagreed-0.008
Tokendays
Token ID12545
Feature activation+1.19e-02

Pos logit contributions
ived+0.074
uttered+0.068
bs+0.063
 disagreed+0.060
 neither+0.060
Neg logit contributions
opl-0.141
Kim-0.136
ep-0.133
izer-0.133
rob-0.133
Token.
Token ID13
Feature activation-2.55e-03

Pos logit contributions
ived+0.016
uttered+0.015
bs+0.014
 neither+0.014
 disagreed+0.014
Neg logit contributions
opl-0.025
izer-0.024
rob-0.023
Kim-0.023
ep-0.023
Token
Token ID220
Feature activation+5.42e-03

Pos logit contributions
opl+0.005
izer+0.005
rob+0.005
ep+0.005
Kim+0.005
Neg logit contributions
ived-0.004
uttered-0.003
bs-0.003
iked-0.003
 disagreed-0.003
Token
Token ID198
Feature activation+7.38e-03

Pos logit contributions
ived+0.007
uttered+0.007
bs+0.006
s+0.006
ver+0.006
Neg logit contributions
opl-0.012
rob-0.011
izer-0.011
coon-0.011
ep-0.010
Token
Token ID198
Feature activation+2.31e-03

Pos logit contributions
ived+0.009
s+0.009
uttered+0.009
bs+0.008
ver+0.008
Neg logit contributions
opl-0.016
izer-0.015
rob-0.015
coon-0.015
ep-0.015
Token also
Token ID635
Feature activation+6.16e-04

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.006
 disagreed+0.005
 neither+0.005
Neg logit contributions
opl-0.010
izer-0.010
rob-0.009
Kim-0.009
ep-0.009
Token liked
Token ID8288
Feature activation-5.38e-03

Pos logit contributions
ived+0.001
bs+0.001
uttered+0.001
 disagreed+0.001
 neither+0.001
Neg logit contributions
opl-0.001
izer-0.001
rob-0.001
Kim-0.001
coon-0.001
Token to
Token ID284
Feature activation+1.60e-02

Pos logit contributions
opl+0.014
Kim+0.014
ep+0.014
rob+0.014
izer+0.014
Neg logit contributions
ived-0.004
uttered-0.004
bs-0.003
iked-0.003
ared-0.003
Token celebrate
Token ID10648
Feature activation+2.80e-03

Pos logit contributions
ived+0.015
bs+0.014
 neither+0.014
 seem+0.014
uttered+0.013
Neg logit contributions
opl-0.039
rob-0.037
izer-0.037
coon-0.036
Kim-0.035
Token their
Token ID511
Feature activation+5.42e-03

Pos logit contributions
ived+0.003
s+0.003
bs+0.003
 neither+0.003
 while+0.003
Neg logit contributions
opl-0.007
Kim-0.006
izer-0.006
rob-0.006
coon-0.006
Token birth
Token ID4082
Feature activation+6.00e-02

Pos logit contributions
ived+0.008
uttered+0.008
bs+0.007
iked+0.007
 neither+0.007
Neg logit contributions
opl-0.011
izer-0.010
Kim-0.010
ep-0.010
rob-0.010
Tokendays
Token ID12545
Feature activation+6.81e-03

Pos logit contributions
ived+0.072
uttered+0.066
bs+0.062
 disagreed+0.061
 neither+0.059
Neg logit contributions
opl-0.137
Kim-0.130
ep-0.129
rob-0.128
izer-0.127
Token.
Token ID13
Feature activation+2.99e-03

Pos logit contributions
ived+0.011
uttered+0.010
bs+0.010
 disagreed+0.009
iked+0.009
Neg logit contributions
opl-0.013
Kim-0.012
rob-0.012
izer-0.012
ep-0.012
Token Tim
Token ID5045
Feature activation-8.15e-03

Pos logit contributions
uttered+0.003
ived+0.003
bs+0.003
 neither+0.003
s+0.003
Neg logit contributions
rob-0.007
opl-0.007
ep-0.007
coon-0.007
izer-0.007
Token's
Token ID338
Feature activation-5.54e-04

Pos logit contributions
opl+0.018
izer+0.017
Kim+0.017
rob+0.017
ep+0.016
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.009
 disagreed-0.009
 neither-0.009
Token birthday
Token ID10955
Feature activation+6.14e-03

Pos logit contributions
opl+0.001
ep+0.001
 stret+0.001
izer+0.001
 rug+0.001
Neg logit contributions
ived-0.001
bs-0.001
uttered-0.001
ared-0.001
 disagreed-0.001
Token was
Token ID373
Feature activation-4.59e-04

Pos logit contributions
opl+0.015
izer+0.014
Kim+0.014
rob+0.014
 stret+0.014
Neg logit contributions
uttered-0.006
ived-0.006
 disagreed-0.006
 neither-0.006
s-0.006
Token time
Token ID640
Feature activation-6.28e-03

Pos logit contributions
opl+0.001
rob+0.001
izer+0.001
Kim+0.001
coon+0.001
Neg logit contributions
ived-0.001
uttered-0.000
 neither-0.000
 disagreed-0.000
bs-0.000
Token to
Token ID284
Feature activation+8.61e-03

Pos logit contributions
opl+0.016
Kim+0.015
izer+0.015
ep+0.015
rob+0.014
Neg logit contributions
ived-0.006
uttered-0.005
bs-0.005
als-0.004
 disagreed-0.004
Token celebrate
Token ID10648
Feature activation+5.54e-03

Pos logit contributions
ived+0.008
uttered+0.007
bs+0.006
 disagreed+0.006
 neither+0.006
Neg logit contributions
opl-0.022
rob-0.021
Kim-0.021
izer-0.021
ep-0.021
Token their
Token ID511
Feature activation+1.13e-02

Pos logit contributions
ived+0.006
 neither+0.006
uttered+0.006
bs+0.006
s+0.006
Neg logit contributions
opl-0.013
Kim-0.012
izer-0.012
rob-0.012
coon-0.011
Token birth
Token ID4082
Feature activation+5.95e-02

Pos logit contributions
ived+0.022
uttered+0.020
bs+0.020
ared+0.019
ver+0.019
Neg logit contributions
opl-0.018
izer-0.016
ep-0.016
parent-0.016
 rug-0.015
Tokendays
Token ID12545
Feature activation+9.85e-03

Pos logit contributions
ived+0.081
uttered+0.076
 neither+0.074
 disagreed+0.074
bs+0.072
Neg logit contributions
opl-0.119
ep-0.115
Kim-0.114
rob-0.109
izer-0.108
Token.
Token ID13
Feature activation-5.56e-03

Pos logit contributions
ived+0.017
uttered+0.016
bs+0.015
iked+0.015
 disagreed+0.015
Neg logit contributions
opl-0.017
Kim-0.016
rob-0.016
ep-0.016
izer-0.016
Token Max
Token ID5436
Feature activation+3.78e-03

Pos logit contributions
opl+0.011
izer+0.010
Kim+0.010
ep+0.010
rob+0.010
Neg logit contributions
ived-0.008
uttered-0.008
s-0.008
bs-0.008
 neither-0.007
Token had
Token ID550
Feature activation+1.44e-02

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.005
iked+0.005
 neither+0.005
Neg logit contributions
opl-0.007
ep-0.007
izer-0.007
Kim-0.007
rob-0.007
Token a
Token ID257
Feature activation+8.27e-03

Pos logit contributions
ived+0.025
uttered+0.025
bs+0.023
ared+0.022
iked+0.022
Neg logit contributions
ep-0.025
rob-0.023
opl-0.023
Kim-0.023
izer-0.022
TokenWelcome
Token ID14618
Feature activation+1.02e-02

Pos logit contributions
opl+0.036
izer+0.033
rob+0.031
 stret+0.031
ep+0.030
Neg logit contributions
ived-0.028
uttered-0.026
 disagreed-0.024
 married-0.024
 neither-0.024
Token home
Token ID1363
Feature activation-2.78e-02

Pos logit contributions
ived+0.014
uttered+0.014
bs+0.013
iked+0.012
 neither+0.012
Neg logit contributions
Kim-0.019
opl-0.018
izer-0.018
ep-0.017
parent-0.017
Token,
Token ID11
Feature activation+3.09e-03

Pos logit contributions
opl+0.048
Kim+0.045
ep+0.044
coon+0.043
rob+0.043
Neg logit contributions
ived-0.047
uttered-0.047
bs-0.044
ashed-0.042
iked-0.041
Token Daddy
Token ID26926
Feature activation-9.87e-03

Pos logit contributions
ived+0.005
uttered+0.004
bs+0.004
iked+0.004
 disagreed+0.004
Neg logit contributions
opl-0.006
Kim-0.006
izer-0.006
ep-0.006
rob-0.006
Token!",
Token ID40754
Feature activation+4.41e-05

Pos logit contributions
opl+0.019
Kim+0.017
ep+0.017
izer+0.017
rob+0.017
Neg logit contributions
ived-0.016
uttered-0.015
bs-0.014
iked-0.014
 neither-0.013
Token Cher
Token ID19305
Feature activation+5.83e-02

Pos logit contributions
ived+0.000
uttered+0.000
ared+0.000
ashed+0.000
iked+0.000
Neg logit contributions
Kim-0.000
opl-0.000
rob-0.000
ep-0.000
izer-0.000
Tokenie
Token ID494
Feature activation+6.27e-03

Pos logit contributions
ived+0.086
uttered+0.079
bs+0.077
 neither+0.071
iked+0.071
Neg logit contributions
opl-0.117
izer-0.111
rob-0.110
Kim-0.110
ep-0.109
Token said
Token ID531
Feature activation+2.31e-03

Pos logit contributions
ived+0.010
uttered+0.009
bs+0.009
iked+0.008
ashed+0.008
Neg logit contributions
ep-0.012
opl-0.011
Kim-0.011
izer-0.011
rob-0.010
Token happily
Token ID18177
Feature activation-1.81e-04

Pos logit contributions
ived+0.003
uttered+0.003
bs+0.003
ashed+0.003
als+0.003
Neg logit contributions
Kim-0.004
opl-0.004
izer-0.004
ep-0.004
 stret-0.004
Token.
Token ID13
Feature activation-4.12e-03

Pos logit contributions
opl+0.000
Kim+0.000
izer+0.000
ep+0.000
rob+0.000
Neg logit contributions
ived-0.000
uttered-0.000
bs-0.000
 neither-0.000
 disagreed-0.000
Token<|endoftext|>
Token ID50256
Feature activation+3.88e-03

Pos logit contributions
opl+0.009
rob+0.008
izer+0.008
ep+0.008
Kim+0.008
Neg logit contributions
ived-0.006
uttered-0.006
bs-0.005
iked-0.005
s-0.005
Token and
Token ID290
Feature activation+3.62e-03

Pos logit contributions
ived+0.007
uttered+0.007
bs+0.006
 disagreed+0.006
 neither+0.006
Neg logit contributions
opl-0.008
rob-0.008
izer-0.008
Kim-0.008
ep-0.008
Token then
Token ID788
Feature activation+1.07e-02

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.005
iked+0.005
ared+0.005
Neg logit contributions
ep-0.006
Kim-0.006
opl-0.006
rob-0.006
coon-0.005
Token said
Token ID531
Feature activation+1.23e-02

Pos logit contributions
ived+0.021
uttered+0.019
 plunge+0.019
 contain+0.018
ared+0.018
Neg logit contributions
ep-0.014
Kim-0.013
Grand-0.012
et-0.012
 Tooth-0.012
Token,
Token ID11
Feature activation+8.35e-03

Pos logit contributions
ived+0.022
uttered+0.021
ared+0.020
ashed+0.019
 disagreed+0.019
Neg logit contributions
Kim-0.016
coon-0.016
cel-0.015
ep-0.015
zer-0.015
Token "
Token ID366
Feature activation+8.32e-03

Pos logit contributions
ived+0.015
uttered+0.015
bs+0.014
 disagreed+0.013
iked+0.013
Neg logit contributions
opl-0.013
Kim-0.012
izer-0.012
ep-0.012
rob-0.012
TokenHow
Token ID2437
Feature activation+5.73e-02

Pos logit contributions
ived+0.014
uttered+0.013
bs+0.013
iked+0.012
 disagreed+0.012
Neg logit contributions
opl-0.015
Kim-0.014
izer-0.014
rob-0.014
ep-0.014
Token about
Token ID546
Feature activation+3.65e-02

Pos logit contributions
ived+0.093
uttered+0.086
als+0.080
iked+0.079
ared+0.077
Neg logit contributions
ep-0.099
opl-0.099
Kim-0.096
izer-0.089
Johnny-0.086
Token you
Token ID345
Feature activation+2.26e-02

Pos logit contributions
ived+0.061
uttered+0.058
ashed+0.053
bs+0.052
ared+0.052
Neg logit contributions
ep-0.064
opl-0.063
Kim-0.060
izer-0.059
rob-0.058
Token mix
Token ID5022
Feature activation+1.89e-02

Pos logit contributions
ived+0.035
uttered+0.034
iked+0.031
bs+0.030
ashed+0.029
Neg logit contributions
opl-0.043
ep-0.040
izer-0.039
Kim-0.039
coon-0.037
Token together
Token ID1978
Feature activation+2.17e-02

Pos logit contributions
ived+0.031
uttered+0.030
iked+0.027
bs+0.026
ared+0.026
Neg logit contributions
opl-0.035
ep-0.034
izer-0.032
rob-0.032
Kim-0.032
Token all
Token ID477
Feature activation+2.06e-02

Pos logit contributions
ived+0.039
uttered+0.038
bs+0.034
iked+0.034
ashed+0.033
Neg logit contributions
opl-0.036
ep-0.035
izer-0.033
Kim-0.032
rob-0.030
Token and
Token ID290
Feature activation-7.46e-03

Pos logit contributions
uttered+0.002
ived+0.002
bs+0.002
 neither+0.002
iked+0.002
Neg logit contributions
ep-0.002
opl-0.002
izer-0.002
Kim-0.002
coon-0.002
Token Jane
Token ID12091
Feature activation+1.38e-02

Pos logit contributions
Kim+0.012
opl+0.011
ep+0.011
izer+0.010
parent+0.010
Neg logit contributions
ived-0.014
uttered-0.013
bs-0.013
ashed-0.012
iked-0.012
Token celebrated
Token ID13943
Feature activation+1.69e-03

Pos logit contributions
ived+0.019
uttered+0.019
 neither+0.018
 plunge+0.017
iked+0.017
Neg logit contributions
ep-0.026
opl-0.024
et-0.023
Kim-0.022
parent-0.022
Token both
Token ID1111
Feature activation+1.23e-02

Pos logit contributions
ived+0.003
uttered+0.002
bs+0.002
iked+0.002
ashed+0.002
Neg logit contributions
opl-0.003
izer-0.003
Kim-0.003
rob-0.003
ep-0.003
Token their
Token ID511
Feature activation+8.60e-03

Pos logit contributions
uttered+0.017
ived+0.017
bs+0.016
ared+0.015
ver+0.015
Neg logit contributions
opl-0.024
parent-0.023
Kim-0.022
ep-0.022
izer-0.021
Token birth
Token ID4082
Feature activation+5.62e-02

Pos logit contributions
ived+0.015
bs+0.015
uttered+0.015
ver+0.014
ared+0.014
Neg logit contributions
opl-0.013
parent-0.012
ep-0.012
izer-0.012
 rug-0.011
Tokendays
Token ID12545
Feature activation+2.87e-03

Pos logit contributions
ived+0.083
 neither+0.082
 disagreed+0.079
bs+0.077
ls+0.076
Neg logit contributions
ep-0.093
Kim-0.092
opl-0.090
rob-0.085
Johnny-0.081
Token together
Token ID1978
Feature activation+6.32e-03

Pos logit contributions
ived+0.005
uttered+0.005
bs+0.004
ared+0.004
iked+0.004
Neg logit contributions
opl-0.005
Kim-0.005
ep-0.005
rob-0.004
izer-0.004
Token.
Token ID13
Feature activation-9.44e-03

Pos logit contributions
ived+0.012
uttered+0.012
bs+0.011
iked+0.010
ashed+0.010
Neg logit contributions
opl-0.010
ep-0.009
izer-0.009
Kim-0.008
rob-0.008
Token They
Token ID1119
Feature activation+8.72e-03

Pos logit contributions
opl+0.019
Kim+0.018
izer+0.018
rob+0.018
ep+0.018
Neg logit contributions
ived-0.014
uttered-0.013
bs-0.012
 disagreed-0.011
iked-0.011
Token played
Token ID2826
Feature activation+5.52e-03

Pos logit contributions
ived+0.014
uttered+0.013
iked+0.012
ared+0.011
bs+0.011
Neg logit contributions
ep-0.016
opl-0.016
rob-0.015
Kim-0.015
parent-0.014
Token every
Token ID790
Feature activation+1.55e-02

Pos logit contributions
ived+0.007
uttered+0.007
ashed+0.007
bs+0.006
iked+0.006
Neg logit contributions
opl-0.008
Kim-0.008
rob-0.008
izer-0.008
ep-0.008
Token year
Token ID614
Feature activation-2.74e-04

Pos logit contributions
ived+0.032
uttered+0.030
iked+0.029
als+0.028
bs+0.028
Neg logit contributions
opl-0.023
ep-0.021
izer-0.021
rob-0.019
Kim-0.019
Token,
Token ID11
Feature activation+6.06e-03

Pos logit contributions
opl+0.000
Kim+0.000
ep+0.000
rob+0.000
parent+0.000
Neg logit contributions
ived-0.000
uttered-0.000
ashed-0.000
bs-0.000
 disagreed-0.000
Token on
Token ID319
Feature activation-1.02e-02

Pos logit contributions
ived+0.010
ashed+0.010
ared+0.009
uttered+0.009
iked+0.009
Neg logit contributions
opl-0.009
Kim-0.008
Jake-0.008
ep-0.008
ena-0.008
Token their
Token ID511
Feature activation+4.61e-03

Pos logit contributions
opl+0.021
ep+0.019
Kim+0.019
izer+0.019
rob+0.019
Neg logit contributions
ived-0.015
uttered-0.013
ashed-0.013
iked-0.012
bs-0.012
Token birth
Token ID4082
Feature activation+5.56e-02

Pos logit contributions
ived+0.008
uttered+0.008
bs+0.007
ver+0.007
iked+0.007
Neg logit contributions
opl-0.008
izer-0.007
ep-0.007
parent-0.007
Kim-0.007
Tokendays
Token ID12545
Feature activation+3.91e-03

Pos logit contributions
ived+0.086
 neither+0.081
ls+0.079
 plunge+0.077
 disagreed+0.075
Neg logit contributions
ep-0.091
Kim-0.085
opl-0.082
father-0.080
web-0.078
Token,
Token ID11
Feature activation-7.48e-03

Pos logit contributions
ived+0.006
uttered+0.005
ared+0.005
iked+0.005
bs+0.005
Neg logit contributions
opl-0.008
Kim-0.007
rob-0.007
ep-0.007
coon-0.007
Token they
Token ID484
Feature activation+3.19e-03

Pos logit contributions
opl+0.013
Kim+0.012
ep+0.012
 track+0.011
istle+0.011
Neg logit contributions
ived-0.012
ared-0.011
uttered-0.010
ashed-0.010
iked-0.010
Token would
Token ID561
Feature activation+6.50e-03

Pos logit contributions
ived+0.005
uttered+0.005
iked+0.005
 plunge+0.004
ared+0.004
Neg logit contributions
opl-0.006
ep-0.006
rob-0.005
et-0.005
istle-0.005
Token celebrate
Token ID10648
Feature activation+2.99e-04

Pos logit contributions
ived+0.010
uttered+0.010
bs+0.009
iked+0.009
ashed+0.009
Neg logit contributions
opl-0.013
izer-0.012
ep-0.012
Kim-0.011
rob-0.011
Token"
Token ID1
Feature activation-1.38e-02

Pos logit contributions
opl+0.015
rob+0.014
izer+0.014
istle+0.013
Kim+0.012
Neg logit contributions
ived-0.012
uttered-0.011
bs-0.011
s-0.011
iked-0.011
TokenWhere
Token ID8496
Feature activation+1.12e-02

Pos logit contributions
opl+0.026
izer+0.024
rob+0.023
Kim+0.022
ep+0.022
Neg logit contributions
ived-0.023
uttered-0.022
bs-0.020
 disagreed-0.020
 neither-0.020
Token is
Token ID318
Feature activation+4.44e-03

Pos logit contributions
ived+0.023
uttered+0.021
iked+0.021
ashed+0.019
als+0.019
Neg logit contributions
ep-0.016
Kim-0.014
opl-0.014
iz-0.013
coon-0.013
Token daddy
Token ID35822
Feature activation-4.97e-03

Pos logit contributions
ived+0.010
uttered+0.010
bs+0.009
ared+0.009
iked+0.009
Neg logit contributions
opl-0.006
Kim-0.005
ep-0.005
rob-0.005
izer-0.005
Token?",
Token ID35379
Feature activation-1.31e-03

Pos logit contributions
opl+0.009
ep+0.008
izer+0.008
Kim+0.008
coon+0.007
Neg logit contributions
uttered-0.009
ived-0.009
bs-0.008
iked-0.008
ashed-0.008
Token Cher
Token ID19305
Feature activation+5.41e-02

Pos logit contributions
Kim+0.002
opl+0.002
ep+0.002
izer+0.002
rob+0.002
Neg logit contributions
ived-0.002
uttered-0.002
ashed-0.002
iked-0.002
bs-0.002
Tokenie
Token ID494
Feature activation+1.41e-02

Pos logit contributions
ived+0.092
uttered+0.086
bs+0.085
iked+0.079
 neither+0.078
Neg logit contributions
opl-0.093
ep-0.091
rob-0.090
izer-0.089
Kim-0.088
Token asked
Token ID1965
Feature activation+9.83e-03

Pos logit contributions
ived+0.018
uttered+0.017
iked+0.016
bs+0.015
ashed+0.015
Neg logit contributions
ep-0.030
opl-0.029
Kim-0.028
izer-0.028
coon-0.027
Token her
Token ID607
Feature activation+1.37e-02

Pos logit contributions
ived+0.015
uttered+0.014
bs+0.014
ashed+0.013
iked+0.013
Neg logit contributions
opl-0.018
Kim-0.018
izer-0.017
ep-0.017
rob-0.017
Token mom
Token ID1995
Feature activation-3.21e-03

Pos logit contributions
ived+0.025
uttered+0.023
iked+0.022
ashed+0.022
bs+0.021
Neg logit contributions
opl-0.021
parent-0.021
ep-0.021
Kim-0.020
izer-0.020
Tokenmy
Token ID1820
Feature activation+6.23e-03

Pos logit contributions
opl+0.005
Kim+0.005
izer+0.005
ep+0.005
rob+0.005
Neg logit contributions
ived-0.006
uttered-0.006
bs-0.005
iked-0.005
 neither-0.005
Token does
Token ID857
Feature activation+2.35e-02

Pos logit contributions
ived+0.019
uttered+0.018
iked+0.016
ashed+0.016
bs+0.016
Neg logit contributions
opl-0.017
ep-0.015
Kim-0.015
izer-0.015
coon-0.015
Token.
Token ID13
Feature activation-7.06e-03

Pos logit contributions
ived+0.044
uttered+0.041
ashed+0.040
iked+0.039
ared+0.036
Neg logit contributions
opl-0.041
ep-0.038
Kim-0.037
izer-0.036
coon-0.034
Token She
Token ID1375
Feature activation-5.84e-03

Pos logit contributions
opl+0.014
rob+0.014
izer+0.014
ep+0.014
Kim+0.013
Neg logit contributions
ived-0.010
uttered-0.010
bs-0.009
 neither-0.009
iked-0.009
Token learned
Token ID4499
Feature activation+1.61e-03

Pos logit contributions
opl+0.011
ep+0.011
rob+0.010
Kim+0.010
izer+0.010
Neg logit contributions
ived-0.009
uttered-0.009
bs-0.008
iked-0.008
 neither-0.008
Token that
Token ID326
Feature activation+4.62e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.003
Kim-0.003
izer-0.003
rob-0.003
ep-0.003
Token birth
Token ID4082
Feature activation+5.41e-02

Pos logit contributions
ived+0.009
uttered+0.008
ashed+0.007
ared+0.007
iked+0.007
Neg logit contributions
opl-0.008
Kim-0.007
rob-0.007
ep-0.007
coon-0.007
Tokendays
Token ID12545
Feature activation+8.82e-03

Pos logit contributions
ived+0.084
 disagreed+0.070
uttered+0.070
iked+0.069
ashed+0.068
Neg logit contributions
opl-0.096
Kim-0.094
ep-0.092
coon-0.089
istle-0.087
Token are
Token ID389
Feature activation+5.29e-03

Pos logit contributions
ived+0.015
uttered+0.013
iked+0.013
ashed+0.012
bs+0.012
Neg logit contributions
opl-0.017
Kim-0.015
ep-0.015
rob-0.015
istle-0.015
Token not
Token ID407
Feature activation+7.86e-03

Pos logit contributions
ived+0.009
uttered+0.008
bs+0.007
iked+0.007
 disagreed+0.007
Neg logit contributions
opl-0.010
Kim-0.010
izer-0.009
ep-0.009
rob-0.009
Token about
Token ID546
Feature activation+1.70e-02

Pos logit contributions
ived+0.015
uttered+0.014
iked+0.013
bs+0.013
als+0.012
Neg logit contributions
opl-0.013
ep-0.013
Kim-0.012
izer-0.012
Wh-0.011
Token cakes
Token ID30849
Feature activation-2.28e-03

Pos logit contributions
ived+0.036
uttered+0.034
bs+0.032
ashed+0.032
iked+0.031
Neg logit contributions
opl-0.025
Kim-0.021
ep-0.021
izer-0.020
rob-0.020
Token a
Token ID257
Feature activation+5.63e-03

Pos logit contributions
ived+0.009
uttered+0.008
iked+0.007
bs+0.007
ared+0.007
Neg logit contributions
opl-0.011
Kim-0.010
ep-0.010
Wh-0.010
rob-0.010
Token box
Token ID3091
Feature activation+1.33e-02

Pos logit contributions
ived+0.009
uttered+0.009
bs+0.008
 disagreed+0.008
 neither+0.008
Neg logit contributions
opl-0.011
rob-0.010
Kim-0.010
izer-0.010
ep-0.010
Token of
Token ID286
Feature activation+1.11e-02

Pos logit contributions
ived+0.015
uttered+0.014
bs+0.013
 disagreed+0.012
iked+0.012
Neg logit contributions
opl-0.031
Kim-0.029
izer-0.029
ep-0.029
rob-0.028
Token pretty
Token ID2495
Feature activation+6.48e-03

Pos logit contributions
ived+0.019
uttered+0.018
bs+0.017
 disagreed+0.016
iked+0.016
Neg logit contributions
opl-0.020
izer-0.018
rob-0.018
Kim-0.018
ep-0.018
Token or
Token ID393
Feature activation+1.54e-02

Pos logit contributions
ived+0.012
uttered+0.011
ver+0.011
 neither+0.011
 disagreed+0.010
Neg logit contributions
opl-0.011
ep-0.010
izer-0.010
Kim-0.010
rob-0.010
Tokennam
Token ID7402
Feature activation+5.41e-02

Pos logit contributions
ived+0.025
uttered+0.022
iked+0.020
ared+0.019
ashed+0.019
Neg logit contributions
Kim-0.030
ep-0.029
opl-0.029
Wh-0.028
rob-0.028
Tokenents
Token ID658
Feature activation+2.40e-02

Pos logit contributions
ived+0.065
uttered+0.062
bs+0.058
 disagreed+0.056
 neither+0.056
Neg logit contributions
opl-0.119
ep-0.115
Kim-0.112
izer-0.109
rob-0.106
Token that
Token ID326
Feature activation+1.03e-02

Pos logit contributions
ived+0.039
uttered+0.037
iked+0.033
bs+0.033
 disagreed+0.032
Neg logit contributions
opl-0.043
ep-0.042
Kim-0.039
rob-0.039
coon-0.039
Token she
Token ID673
Feature activation+7.52e-03

Pos logit contributions
ived+0.014
uttered+0.012
iked+0.011
ared+0.011
bs+0.011
Neg logit contributions
opl-0.022
Kim-0.021
rob-0.021
ep-0.020
Wh-0.020
Token wanted
Token ID2227
Feature activation-1.63e-03

Pos logit contributions
ived+0.011
uttered+0.010
iked+0.010
 neither+0.009
bs+0.009
Neg logit contributions
ep-0.015
opl-0.014
Kim-0.014
rob-0.014
Wh-0.014
Token to
Token ID284
Feature activation+1.38e-02

Pos logit contributions
opl+0.003
Kim+0.003
ep+0.003
Wh+0.003
parent+0.003
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
ashed-0.002
ared-0.002
Token a
Token ID257
Feature activation-1.13e-02

Pos logit contributions
Kim+0.007
opl+0.007
ep+0.007
izer+0.007
rob+0.007
Neg logit contributions
ived-0.007
uttered-0.006
bs-0.005
ared-0.005
 disagreed-0.005
Token big
Token ID1263
Feature activation-7.85e-03

Pos logit contributions
opl+0.017
 stret+0.016
izer+0.016
ep+0.016
Kim+0.016
Neg logit contributions
ived-0.023
uttered-0.021
bs-0.020
iked-0.020
ared-0.020
Token box
Token ID3091
Feature activation+1.30e-02

Pos logit contributions
opl+0.013
izer+0.012
rob+0.011
Kim+0.011
ep+0.011
Neg logit contributions
ived-0.015
uttered-0.014
bs-0.014
iked-0.013
 neither-0.013
Token of
Token ID286
Feature activation+7.77e-03

Pos logit contributions
ived+0.013
uttered+0.012
bs+0.011
 disagreed+0.010
 neither+0.010
Neg logit contributions
opl-0.032
izer-0.030
Kim-0.030
rob-0.030
ep-0.030
Token or
Token ID393
Feature activation+6.35e-03

Pos logit contributions
ived+0.014
uttered+0.013
bs+0.012
ared+0.012
iked+0.012
Neg logit contributions
opl-0.014
rob-0.013
Kim-0.013
ep-0.013
izer-0.013
Tokennam
Token ID7402
Feature activation+5.28e-02

Pos logit contributions
ived+0.009
uttered+0.007
ared+0.007
iked+0.007
ashed+0.006
Neg logit contributions
Kim-0.014
ep-0.014
Wh-0.013
Grand-0.013
rob-0.013
Tokenents
Token ID658
Feature activation+2.18e-02

Pos logit contributions
ived+0.064
uttered+0.061
bs+0.058
 disagreed+0.057
 neither+0.056
Neg logit contributions
opl-0.114
ep-0.111
Kim-0.108
izer-0.105
rob-0.102
Token.
Token ID13
Feature activation+1.52e-02

Pos logit contributions
ived+0.036
uttered+0.034
bs+0.031
 disagreed+0.031
ashed+0.030
Neg logit contributions
opl-0.039
ep-0.038
rob-0.036
Kim-0.036
izer-0.035
Token They
Token ID1119
Feature activation+6.81e-03

Pos logit contributions
ived+0.026
uttered+0.024
bs+0.023
iked+0.023
 neither+0.022
Neg logit contributions
opl-0.026
Kim-0.025
izer-0.025
rob-0.024
ep-0.024
Token were
Token ID547
Feature activation+2.87e-04

Pos logit contributions
ived+0.011
uttered+0.010
bs+0.009
 plunge+0.008
iked+0.008
Neg logit contributions
ep-0.013
Kim-0.012
rob-0.011
Wh-0.011
opl-0.011
Token so
Token ID523
Feature activation-5.91e-03

Pos logit contributions
ived+0.000
bs+0.000
uttered+0.000
ared+0.000
iked+0.000
Neg logit contributions
ep-0.000
opl-0.000
Kim-0.000
izer-0.000
Wh-0.000
Token with
Token ID351
Feature activation-6.22e-03

Pos logit contributions
ived+0.021
uttered+0.019
bs+0.017
 disagreed+0.017
iked+0.017
Neg logit contributions
opl-0.032
izer-0.031
Kim-0.030
ep-0.030
rob-0.030
Token lots
Token ID6041
Feature activation+2.13e-03

Pos logit contributions
opl+0.011
Kim+0.010
ep+0.010
rob+0.010
izer+0.009
Neg logit contributions
ived-0.011
uttered-0.011
ared-0.010
ver-0.010
iked-0.010
Token of
Token ID286
Feature activation+5.98e-03

Pos logit contributions
uttered+0.003
ived+0.002
 neither+0.002
 disagreed+0.002
ver+0.002
Neg logit contributions
opl-0.005
ep-0.004
Kim-0.004
izer-0.004
rob-0.004
Token pretty
Token ID2495
Feature activation+4.25e-03

Pos logit contributions
ived+0.012
uttered+0.011
 neither+0.010
ashed+0.010
ver+0.010
Neg logit contributions
opl-0.011
rob-0.009
ep-0.009
Kim-0.009
 stret-0.008
Token or
Token ID393
Feature activation+8.53e-03

Pos logit contributions
uttered+0.008
ived+0.008
 neither+0.007
ver+0.007
 contain+0.007
Neg logit contributions
opl-0.007
ep-0.007
rob-0.007
Kim-0.006
 stret-0.006
Tokennam
Token ID7402
Feature activation+5.27e-02

Pos logit contributions
ived+0.014
uttered+0.014
iked+0.011
ashed+0.011
ared+0.011
Neg logit contributions
ep-0.016
Kim-0.016
opl-0.016
Grand-0.016
Wh-0.016
Tokenents
Token ID658
Feature activation+2.61e-02

Pos logit contributions
ived+0.066
uttered+0.066
 disagreed+0.059
bs+0.059
 neither+0.059
Neg logit contributions
opl-0.114
ep-0.109
Kim-0.105
izer-0.101
rob-0.099
Token.
Token ID13
Feature activation+9.01e-03

Pos logit contributions
ived+0.045
uttered+0.044
 disagreed+0.040
iked+0.039
 neither+0.039
Neg logit contributions
opl-0.045
ep-0.043
rob-0.041
Kim-0.040
izer-0.040
Token One
Token ID1881
Feature activation-7.70e-04

Pos logit contributions
ived+0.011
uttered+0.010
bs+0.010
s+0.009
 disagreed+0.009
Neg logit contributions
opl-0.021
izer-0.020
rob-0.019
ep-0.019
Kim-0.019
Token day
Token ID1110
Feature activation+5.50e-03

Pos logit contributions
opl+0.002
Kim+0.001
izer+0.001
rob+0.001
ep+0.001
Neg logit contributions
ived-0.001
uttered-0.001
bs-0.001
iked-0.001
 neither-0.001
Token,
Token ID11
Feature activation+1.84e-02

Pos logit contributions
 contain+0.011
uttered+0.011
ived+0.011
ared+0.010
 hesitate+0.010
Neg logit contributions
Kim-0.006
izer-0.005
rob-0.005
ep-0.005
opl-0.005
Token wrong
Token ID2642
Feature activation+1.15e-03

Pos logit contributions
opl+0.019
izer+0.017
rob+0.017
Kim+0.017
ep+0.016
Neg logit contributions
ived-0.019
uttered-0.018
bs-0.017
 disagreed-0.017
 neither-0.017
Tokenâ
Token ID22940
Feature activation-5.53e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
ashed+0.002
Neg logit contributions
opl-0.002
Kim-0.002
izer-0.002
ep-0.002
coon-0.002
Token
Token ID26391
Feature activation-1.87e-03

Pos logit contributions
ep+0.007
opl+0.007
rob+0.007
Kim+0.007
Wh+0.007
Neg logit contributions
 disagreed-0.010
ived-0.010
uttered-0.010
ared-0.010
 seem-0.010
Token
Token ID447
Feature activation+1.21e-02

Pos logit contributions
opl+0.004
Kim+0.003
izer+0.003
ep+0.003
rob+0.003
Neg logit contributions
ived-0.003
uttered-0.003
bs-0.002
iked-0.002
 plunge-0.002
Token
Token ID251
Feature activation+1.35e-02

Pos logit contributions
ived+0.016
uttered+0.015
bs+0.015
 neither+0.014
 disagreed+0.014
Neg logit contributions
opl-0.026
izer-0.024
rob-0.024
Kim-0.024
ep-0.024
Tokenall
Token ID439
Feature activation+5.26e-02

Pos logit contributions
ived+0.021
uttered+0.020
bs+0.019
 disagreed+0.018
iked+0.018
Neg logit contributions
opl-0.026
izer-0.024
Kim-0.024
rob-0.024
ep-0.024
Token the
Token ID262
Feature activation+2.13e-02

Pos logit contributions
ived+0.075
uttered+0.071
bs+0.069
iked+0.065
ashed+0.064
Neg logit contributions
ep-0.097
Kim-0.096
coon-0.094
opl-0.093
rob-0.093
Token water
Token ID1660
Feature activation+1.23e-03

Pos logit contributions
ived+0.037
uttered+0.035
bs+0.034
iked+0.033
 neither+0.032
Neg logit contributions
opl-0.037
Kim-0.035
rob-0.035
izer-0.034
coon-0.034
Token had
Token ID550
Feature activation+9.79e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.001
ver+0.001
Neg logit contributions
opl-0.002
ep-0.002
Kim-0.002
coon-0.002
izer-0.002
Token gone
Token ID3750
Feature activation-3.64e-03

Pos logit contributions
ived+0.014
uttered+0.014
bs+0.013
iked+0.013
 neither+0.012
Neg logit contributions
opl-0.019
ep-0.019
Kim-0.018
izer-0.018
Wh-0.017
Token!
Token ID0
Feature activation+2.03e-03

Pos logit contributions
opl+0.007
Kim+0.007
izer+0.007
ep+0.007
rob+0.007
Neg logit contributions
ived-0.005
uttered-0.005
bs-0.005
iked-0.004
 neither-0.004
Token box
Token ID3091
Feature activation+1.06e-02

Pos logit contributions
opl+0.021
izer+0.019
Kim+0.019
rob+0.019
ep+0.018
Neg logit contributions
ived-0.021
uttered-0.020
bs-0.019
iked-0.018
 neither-0.018
Token full
Token ID1336
Feature activation-6.53e-04

Pos logit contributions
ived+0.011
uttered+0.010
bs+0.010
iked+0.009
 disagreed+0.009
Neg logit contributions
opl-0.026
rob-0.024
izer-0.024
Kim-0.024
ep-0.024
Token of
Token ID286
Feature activation+1.08e-02

Pos logit contributions
opl+0.002
izer+0.002
Kim+0.002
ep+0.002
rob+0.002
Neg logit contributions
ived-0.000
uttered-0.000
bs-0.000
 neither-0.000
iked-0.000
Token beautiful
Token ID4950
Feature activation+1.84e-03

Pos logit contributions
ived+0.019
uttered+0.018
bs+0.016
iked+0.016
 neither+0.016
Neg logit contributions
opl-0.019
Kim-0.018
rob-0.018
izer-0.018
ep-0.018
Token or
Token ID393
Feature activation+3.84e-03

Pos logit contributions
ived+0.003
uttered+0.003
 neither+0.003
ver+0.003
bs+0.002
Neg logit contributions
opl-0.004
rob-0.003
ep-0.003
izer-0.003
Kim-0.003
Tokennam
Token ID7402
Feature activation+5.18e-02

Pos logit contributions
ived+0.006
uttered+0.005
iked+0.004
ared+0.004
bs+0.004
Neg logit contributions
Kim-0.008
ep-0.008
Wh-0.008
rob-0.008
opl-0.007
Tokenents
Token ID658
Feature activation+1.96e-02

Pos logit contributions
ived+0.060
uttered+0.056
bs+0.053
 neither+0.051
 disagreed+0.050
Neg logit contributions
opl-0.118
ep-0.112
Kim-0.111
izer-0.108
rob-0.108
Token.
Token ID13
Feature activation-2.54e-03

Pos logit contributions
ived+0.032
uttered+0.031
bs+0.029
iked+0.028
 disagreed+0.027
Neg logit contributions
opl-0.035
ep-0.033
rob-0.032
izer-0.032
Kim-0.032
Token
Token ID220
Feature activation+1.02e-02

Pos logit contributions
opl+0.004
Kim+0.004
izer+0.004
Wh+0.004
rob+0.004
Neg logit contributions
ived-0.005
uttered-0.004
iked-0.004
 neither-0.004
bs-0.004
Token
Token ID198
Feature activation-6.37e-03

Pos logit contributions
ived+0.016
uttered+0.016
bs+0.015
 neither+0.014
iked+0.014
Neg logit contributions
opl-0.019
rob-0.018
izer-0.018
Kim-0.018
ep-0.017
Token
Token ID198
Feature activation-9.50e-03

Pos logit contributions
opl+0.013
izer+0.012
rob+0.012
Kim+0.012
ep+0.011
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.009
s-0.008
 disagreed-0.008
Token who
Token ID508
Feature activation-1.95e-02

Pos logit contributions
opl+0.005
rob+0.005
izer+0.005
Kim+0.005
coon+0.005
Neg logit contributions
bs-0.004
 while-0.003
s-0.003
 neither-0.003
 to-0.003
Token liked
Token ID8288
Feature activation-1.15e-02

Pos logit contributions
opl+0.042
rob+0.039
izer+0.039
Kim+0.039
ep+0.038
Neg logit contributions
ived-0.027
uttered-0.025
bs-0.025
 disagreed-0.023
 neither-0.022
Token to
Token ID284
Feature activation+1.82e-02

Pos logit contributions
opl+0.026
Kim+0.025
ep+0.025
 stret+0.025
izer+0.025
Neg logit contributions
uttered-0.013
ived-0.013
ver-0.011
ared-0.011
bs-0.011
Token collect
Token ID2824
Feature activation+2.05e-03

Pos logit contributions
ived+0.027
uttered+0.025
bs+0.023
 neither+0.023
iked+0.022
Neg logit contributions
opl-0.037
izer-0.035
Kim-0.035
ep-0.034
rob-0.034
Token or
Token ID393
Feature activation+1.99e-02

Pos logit contributions
ived+0.003
uttered+0.003
ver+0.003
 neither+0.003
iked+0.003
Neg logit contributions
opl-0.004
rob-0.004
ep-0.004
izer-0.004
Kim-0.004
Tokennam
Token ID7402
Feature activation+5.18e-02

Pos logit contributions
ived+0.038
uttered+0.037
ver+0.031
ashed+0.030
iked+0.030
Neg logit contributions
ep-0.033
Wh-0.032
 track-0.032
Kim-0.031
Mic-0.030
Tokenents
Token ID658
Feature activation+1.91e-02

Pos logit contributions
ived+0.064
uttered+0.062
bs+0.058
 disagreed+0.056
 neither+0.055
Neg logit contributions
opl-0.113
ep-0.109
Kim-0.105
izer-0.102
rob-0.101
Token.
Token ID13
Feature activation+9.26e-03

Pos logit contributions
ived+0.036
uttered+0.035
bs+0.032
iked+0.031
 disagreed+0.031
Neg logit contributions
opl-0.030
ep-0.028
rob-0.026
Kim-0.026
izer-0.025
Token Everywhere
Token ID48201
Feature activation+1.25e-02

Pos logit contributions
ived+0.013
uttered+0.012
bs+0.012
iked+0.011
 disagreed+0.011
Neg logit contributions
opl-0.019
rob-0.018
izer-0.018
Kim-0.018
ep-0.018
Token she
Token ID673
Feature activation+1.95e-03

Pos logit contributions
ived+0.022
uttered+0.019
iked+0.018
ared+0.018
 contain+0.017
Neg logit contributions
rob-0.023
opl-0.023
Kim-0.023
ep-0.021
parent-0.020
Token went
Token ID1816
Feature activation+1.42e-02

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
 disagreed+0.002
iked+0.002
Neg logit contributions
opl-0.005
izer-0.005
Kim-0.005
rob-0.005
ep-0.004
Token to
Token ID284
Feature activation-9.59e-04

Pos logit contributions
opl+0.022
rob+0.021
Kim+0.021
ep+0.020
Wh+0.020
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.008
iked-0.008
ver-0.008
Token play
Token ID711
Feature activation-4.63e-03

Pos logit contributions
opl+0.002
Kim+0.002
izer+0.002
Wh+0.002
ep+0.002
Neg logit contributions
ived-0.002
uttered-0.001
bs-0.001
iked-0.001
 disagreed-0.001
Token with
Token ID351
Feature activation-6.56e-03

Pos logit contributions
opl+0.012
Kim+0.011
rob+0.011
izer+0.011
ep+0.011
Neg logit contributions
ived-0.004
uttered-0.004
bs-0.003
 neither-0.003
s-0.003
Token me
Token ID502
Feature activation+4.09e-03

Pos logit contributions
opl+0.009
rob+0.008
Kim+0.008
izer+0.008
ep+0.008
Neg logit contributions
ived-0.014
uttered-0.014
bs-0.013
iked-0.013
 disagreed-0.013
Token?"
Token ID1701
Feature activation+2.48e-02

Pos logit contributions
ived+0.007
uttered+0.007
bs+0.007
iked+0.006
 disagreed+0.006
Neg logit contributions
opl-0.007
ep-0.007
izer-0.006
rob-0.006
Kim-0.006
Token Will
Token ID2561
Feature activation+5.15e-02

Pos logit contributions
ived+0.028
uttered+0.026
bs+0.025
s+0.023
 neither+0.023
Neg logit contributions
opl-0.059
Kim-0.055
izer-0.055
rob-0.054
ep-0.053
Token said
Token ID531
Feature activation+1.08e-02

Pos logit contributions
ived+0.083
uttered+0.081
iked+0.079
bs+0.078
 plunge+0.075
Neg logit contributions
ep-0.088
coon-0.079
izer-0.076
rob-0.072
zer-0.070
Token,
Token ID11
Feature activation+1.05e-02

Pos logit contributions
ived+0.017
uttered+0.016
bs+0.015
ared+0.015
iked+0.014
Neg logit contributions
Kim-0.020
opl-0.019
ep-0.019
izer-0.019
coon-0.018
Token "
Token ID366
Feature activation+8.58e-03

Pos logit contributions
ived+0.018
uttered+0.017
bs+0.016
iked+0.016
 disagreed+0.016
Neg logit contributions
opl-0.019
Kim-0.017
izer-0.017
rob-0.017
ep-0.017
TokenYes
Token ID5297
Feature activation+3.96e-04

Pos logit contributions
ived+0.011
uttered+0.010
bs+0.010
iked+0.009
 disagreed+0.009
Neg logit contributions
opl-0.019
Kim-0.018
rob-0.018
izer-0.018
ep-0.017
Token,
Token ID11
Feature activation+1.14e-02

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.000
iked+0.000
 disagreed+0.000
Neg logit contributions
opl-0.001
Kim-0.001
izer-0.001
ep-0.001
rob-0.001
Token many
Token ID867
Feature activation-4.71e-03

Pos logit contributions
opl+0.007
ep+0.007
Kim+0.006
rob+0.006
izer+0.006
Neg logit contributions
ived-0.005
uttered-0.005
bs-0.005
iked-0.005
ared-0.004
Token books
Token ID3835
Feature activation+2.15e-03

Pos logit contributions
opl+0.009
ep+0.008
rob+0.008
Kim+0.008
izer+0.008
Neg logit contributions
ived-0.008
uttered-0.008
 neither-0.007
iked-0.007
bs-0.007
Token on
Token ID319
Feature activation-7.54e-03

Pos logit contributions
ived+0.003
uttered+0.003
bs+0.003
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.005
izer-0.004
Kim-0.004
rob-0.004
ep-0.004
Token his
Token ID465
Feature activation-1.53e-02

Pos logit contributions
opl+0.015
rob+0.015
izer+0.015
Kim+0.015
coon+0.014
Neg logit contributions
ived-0.010
bs-0.009
uttered-0.009
 none-0.009
s-0.009
Token books
Token ID3835
Feature activation-6.29e-03

Pos logit contributions
rob+0.032
opl+0.031
Kim+0.031
izer+0.030
coon+0.030
Neg logit contributions
ived-0.020
bs-0.020
s-0.018
 goodbye-0.018
uttered-0.018
Tokenhe
Token ID258
Feature activation-5.37e-02

Pos logit contributions
opl+0.013
izer+0.013
rob+0.013
Kim+0.013
Wh+0.012
Neg logit contributions
ived-0.008
bs-0.008
s-0.008
uttered-0.007
ver-0.007
Tokenlf
Token ID1652
Feature activation-4.46e-03

Pos logit contributions
opl+0.042
Kim+0.037
izer+0.035
rob+0.035
Wh+0.033
Neg logit contributions
ived-0.143
uttered-0.137
bs-0.134
iked-0.133
 disagreed-0.131
Token.
Token ID13
Feature activation+6.95e-03

Pos logit contributions
opl+0.008
ep+0.007
Kim+0.007
izer+0.007
rob+0.007
Neg logit contributions
ived-0.008
uttered-0.007
bs-0.007
iked-0.007
 disagreed-0.007
Token His
Token ID2399
Feature activation-2.35e-03

Pos logit contributions
ived+0.012
uttered+0.011
bs+0.011
iked+0.010
 neither+0.010
Neg logit contributions
opl-0.013
izer-0.012
rob-0.012
Kim-0.012
ep-0.012
Token books
Token ID3835
Feature activation+5.28e-04

Pos logit contributions
opl+0.004
ep+0.004
izer+0.004
Kim+0.003
parent+0.003
Neg logit contributions
ived-0.004
uttered-0.004
iked-0.004
 neither-0.004
ared-0.004
Tokenhe
Token ID258
Feature activation-3.85e-02

Pos logit contributions
ived+0.001
uttered+0.001
iked+0.001
bs+0.001
ashed+0.001
Neg logit contributions
opl-0.001
ep-0.001
Kim-0.001
rob-0.001
izer-0.001
Token many
Token ID867
Feature activation-4.69e-03

Pos logit contributions
opl+0.008
ep+0.008
Kim+0.008
rob+0.007
Wh+0.007
Neg logit contributions
ived-0.007
uttered-0.007
iked-0.006
bs-0.006
ared-0.006
Token books
Token ID3835
Feature activation+1.87e-03

Pos logit contributions
opl+0.009
ep+0.008
rob+0.008
Kim+0.008
izer+0.008
Neg logit contributions
ived-0.008
uttered-0.008
 neither-0.007
iked-0.007
ver-0.007
Token on
Token ID319
Feature activation-8.05e-03

Pos logit contributions
ived+0.003
uttered+0.003
bs+0.002
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.004
Kim-0.003
rob-0.003
izer-0.003
ep-0.003
Token her
Token ID607
Feature activation-1.54e-02

Pos logit contributions
opl+0.017
rob+0.016
izer+0.016
Kim+0.016
coon+0.016
Neg logit contributions
ived-0.010
bs-0.010
uttered-0.010
 neither-0.009
 disagreed-0.009
Token books
Token ID3835
Feature activation-7.01e-03

Pos logit contributions
rob+0.033
opl+0.033
Kim+0.032
izer+0.032
coon+0.031
Neg logit contributions
bs-0.019
ived-0.019
s-0.017
 goodbye-0.017
uttered-0.016
Tokenhe
Token ID258
Feature activation-5.28e-02

Pos logit contributions
opl+0.014
izer+0.014
rob+0.014
Kim+0.014
Wh+0.013
Neg logit contributions
ived-0.009
bs-0.009
uttered-0.009
s-0.009
ver-0.008
Tokenlf
Token ID1652
Feature activation-5.62e-03

Pos logit contributions
opl+0.038
Kim+0.033
izer+0.032
rob+0.031
Wh+0.028
Neg logit contributions
ived-0.145
uttered-0.139
bs-0.136
iked-0.134
 disagreed-0.133
Token in
Token ID287
Feature activation-8.88e-03

Pos logit contributions
opl+0.010
ep+0.009
Kim+0.009
rob+0.009
izer+0.009
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.009
iked-0.008
 disagreed-0.008
Token her
Token ID607
Feature activation-4.38e-03

Pos logit contributions
opl+0.018
rob+0.017
izer+0.017
Kim+0.017
coon+0.017
Neg logit contributions
ived-0.012
uttered-0.012
bs-0.011
 neither-0.011
 disagreed-0.011
Token room
Token ID2119
Feature activation-1.00e-02

Pos logit contributions
rob+0.009
coon+0.009
Kim+0.008
opl+0.008
izer+0.008
Neg logit contributions
bs-0.006
 tum-0.006
s-0.006
ived-0.006
 goodbye-0.006
Token.
Token ID13
Feature activation+1.35e-03

Pos logit contributions
opl+0.017
ep+0.016
Kim+0.015
izer+0.015
rob+0.015
Neg logit contributions
ived-0.018
uttered-0.017
bs-0.016
iked-0.015
 neither-0.015
Token."
Token ID526
Feature activation-1.06e-02

Pos logit contributions
opl+0.032
ep+0.028
izer+0.027
coon+0.027
Kim+0.026
Neg logit contributions
uttered-0.033
ived-0.033
iked-0.030
bs-0.029
ashed-0.029
Token
Token ID198
Feature activation-3.01e-02

Pos logit contributions
ep+0.022
opl+0.022
izer+0.021
rob+0.021
coon+0.021
Neg logit contributions
ived-0.014
uttered-0.014
iked-0.012
bs-0.012
s-0.012
Token
Token ID198
Feature activation-2.96e-02

Pos logit contributions
opl+0.066
rob+0.065
izer+0.064
ep+0.063
coon+0.062
Neg logit contributions
ived-0.035
uttered-0.033
s-0.032
bs-0.030
ver-0.029
TokenThe
Token ID464
Feature activation+4.62e-04

Pos logit contributions
opl+0.047
rob+0.045
izer+0.044
ep+0.042
coon+0.041
Neg logit contributions
ived-0.055
uttered-0.050
s-0.050
bs-0.048
iked-0.048
Token moral
Token ID6573
Feature activation+1.35e-03

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
iked+0.001
ared+0.001
Neg logit contributions
opl-0.001
Kim-0.001
izer-0.001
ep-0.001
coon-0.001
Token of
Token ID286
Feature activation-4.96e-02

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
iked+0.001
 disagreed+0.001
Neg logit contributions
opl-0.004
Kim-0.004
izer-0.003
rob-0.003
ep-0.003
Token the
Token ID262
Feature activation-3.39e-02

Pos logit contributions
 stret+0.126
Wh+0.124
opl+0.122
ep+0.122
Kim+0.121
Neg logit contributions
 neither-0.029
 goodbye-0.029
 none-0.028
 they-0.028
 some-0.026
Token story
Token ID1621
Feature activation-1.26e-04

Pos logit contributions
opl+0.060
rob+0.056
Kim+0.056
izer+0.056
ep+0.055
Neg logit contributions
ived-0.057
uttered-0.053
 disagreed-0.050
bs-0.050
 neither-0.049
Token is
Token ID318
Feature activation-5.10e-03

Pos logit contributions
opl+0.000
Kim+0.000
ep+0.000
coon+0.000
istle+0.000
Neg logit contributions
ived-0.000
ashed-0.000
bs-0.000
iked-0.000
uttered-0.000
Token that
Token ID326
Feature activation+9.56e-03

Pos logit contributions
opl+0.009
Kim+0.008
izer+0.008
coon+0.008
ep+0.008
Neg logit contributions
ived-0.009
uttered-0.008
ashed-0.008
bs-0.008
 disagreed-0.008
Token it
Token ID340
Feature activation-4.20e-03

Pos logit contributions
ived+0.015
uttered+0.014
ashed+0.014
bs+0.013
iked+0.013
Neg logit contributions
opl-0.019
Kim-0.017
ep-0.016
izer-0.016
 track-0.016
Token you
Token ID345
Feature activation+1.15e-02

Pos logit contributions
Kim+0.019
opl+0.017
ep+0.017
coon+0.017
Wh+0.016
Neg logit contributions
ived-0.015
uttered-0.014
ared-0.013
iked-0.013
bs-0.013
Token go
Token ID467
Feature activation+1.12e-02

Pos logit contributions
uttered+0.019
ived+0.019
iked+0.018
ashed+0.016
als+0.016
Neg logit contributions
opl-0.019
ep-0.017
coon-0.016
istle-0.016
izer-0.015
Token,
Token ID11
Feature activation-6.18e-03

Pos logit contributions
ived+0.015
uttered+0.015
bs+0.013
 disagreed+0.012
iked+0.012
Neg logit contributions
opl-0.025
ep-0.023
rob-0.023
izer-0.023
Kim-0.023
Token my
Token ID616
Feature activation+1.25e-03

Pos logit contributions
opl+0.012
Kim+0.012
izer+0.012
ep+0.011
coon+0.011
Neg logit contributions
ived-0.009
uttered-0.009
iked-0.008
bs-0.008
ared-0.008
Token d
Token ID288
Feature activation-8.42e-04

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
ared+0.002
Neg logit contributions
opl-0.002
izer-0.002
ep-0.002
rob-0.002
Kim-0.002
Tokenarl
Token ID7063
Feature activation-4.68e-02

Pos logit contributions
opl+0.001
izer+0.001
Kim+0.001
ep+0.001
coon+0.001
Neg logit contributions
uttered-0.002
ived-0.002
 disagreed-0.002
 neither-0.002
bs-0.002
Tokenings
Token ID654
Feature activation-1.44e-02

Pos logit contributions
opl+0.110
rob+0.103
izer+0.102
ep+0.098
Kim+0.096
Neg logit contributions
ived-0.053
uttered-0.051
bs-0.047
 disagreed-0.046
 neither-0.044
Token.
Token ID13
Feature activation-1.04e-02

Pos logit contributions
opl+0.027
Kim+0.025
rob+0.025
izer+0.025
ep+0.024
Neg logit contributions
ived-0.023
uttered-0.022
bs-0.021
iked-0.020
 disagreed-0.020
Token Enjoy
Token ID18179
Feature activation+1.38e-02

Pos logit contributions
opl+0.020
rob+0.019
izer+0.019
Kim+0.019
ep+0.019
Neg logit contributions
ived-0.016
uttered-0.015
bs-0.014
 neither-0.014
iked-0.014
Token your
Token ID534
Feature activation+5.78e-04

Pos logit contributions
uttered+0.028
ived+0.028
ver+0.025
 neither+0.025
bs+0.025
Neg logit contributions
opl-0.021
rob-0.020
ep-0.019
izer-0.019
Kim-0.017
Token lunch
Token ID9965
Feature activation-2.01e-02

Pos logit contributions
ived+0.001
uttered+0.001
ver+0.001
 neither+0.001
bs+0.001
Neg logit contributions
opl-0.001
izer-0.001
 stret-0.001
ep-0.001
rob-0.001
Token together
Token ID1978
Feature activation+4.37e-03

Pos logit contributions
opl+0.009
Kim+0.008
izer+0.008
rob+0.008
Wh+0.008
Neg logit contributions
 while-0.002
ived-0.002
uttered-0.002
s-0.002
bs-0.001
Token,
Token ID11
Feature activation-5.48e-03

Pos logit contributions
ived+0.007
uttered+0.007
bs+0.006
iked+0.006
 disagreed+0.006
Neg logit contributions
opl-0.008
ep-0.008
rob-0.008
izer-0.008
Kim-0.007
Token but
Token ID475
Feature activation-3.71e-03

Pos logit contributions
opl+0.009
izer+0.008
Kim+0.008
rob+0.007
ep+0.007
Neg logit contributions
ived-0.011
uttered-0.010
bs-0.010
iked-0.009
ared-0.009
Token Gret
Token ID35877
Feature activation+7.20e-03

Pos logit contributions
opl+0.008
Kim+0.008
rob+0.008
izer+0.008
ep+0.008
Neg logit contributions
ived-0.005
uttered-0.004
bs-0.004
iked-0.004
ashed-0.004
Tokena
Token ID64
Feature activation+2.28e-02

Pos logit contributions
ived+0.010
uttered+0.010
bs+0.009
s+0.008
ared+0.008
Neg logit contributions
opl-0.015
Kim-0.015
rob-0.014
izer-0.014
Wh-0.014
Token and
Token ID290
Feature activation-4.63e-02

Pos logit contributions
ived+0.034
uttered+0.032
iked+0.029
bs+0.029
 plunge+0.027
Neg logit contributions
opl-0.046
ep-0.043
Kim-0.042
rob-0.042
izer-0.042
Token Jing
Token ID42279
Feature activation+1.60e-02

Pos logit contributions
opl+0.091
ep+0.087
Kim+0.086
izer+0.086
rob+0.085
Neg logit contributions
ived-0.066
uttered-0.062
 neither-0.060
 disagreed-0.060
bs-0.059
Token always
Token ID1464
Feature activation+8.45e-03

Pos logit contributions
ived+0.018
uttered+0.018
iked+0.016
 neither+0.015
 plunge+0.014
Neg logit contributions
opl-0.036
rob-0.035
ep-0.034
istle-0.033
izer-0.033
Token seemed
Token ID3947
Feature activation+1.60e-02

Pos logit contributions
ived+0.012
uttered+0.011
iked+0.010
bs+0.010
als+0.009
Neg logit contributions
opl-0.018
ep-0.017
Kim-0.017
rob-0.017
izer-0.017
Token to
Token ID284
Feature activation+1.68e-02

Pos logit contributions
iked+0.027
uttered+0.026
ived+0.026
als+0.024
ared+0.023
Neg logit contributions
ep-0.023
opl-0.023
 track-0.023
 living-0.021
Kim-0.021
Token bow
Token ID9563
Feature activation+4.76e-03

Pos logit contributions
ived+0.028
uttered+0.028
iked+0.025
als+0.024
ashed+0.024
Neg logit contributions
opl-0.033
ep-0.030
Kim-0.030
izer-0.029
istle-0.029
Token it
Token ID340
Feature activation+8.23e-03

Pos logit contributions
opl+0.014
Kim+0.013
izer+0.013
rob+0.013
ep+0.013
Neg logit contributions
ived-0.007
uttered-0.007
bs-0.007
 neither-0.006
iked-0.006
Token back
Token ID736
Feature activation+6.43e-03

Pos logit contributions
ived+0.012
uttered+0.012
 disagreed+0.011
 neither+0.011
bs+0.011
Neg logit contributions
ep-0.016
opl-0.015
Kim-0.015
izer-0.014
rob-0.014
Token on
Token ID319
Feature activation-6.77e-03

Pos logit contributions
ived+0.009
uttered+0.008
 disagreed+0.008
bs+0.007
 neither+0.007
Neg logit contributions
opl-0.013
ep-0.013
Kim-0.013
rob-0.012
izer-0.012
Token the
Token ID262
Feature activation-1.83e-02

Pos logit contributions
opl+0.015
rob+0.015
Kim+0.014
izer+0.014
coon+0.014
Neg logit contributions
ived-0.008
bs-0.007
uttered-0.007
 neither-0.007
s-0.007
Token books
Token ID3835
Feature activation+2.52e-03

Pos logit contributions
izer+0.038
opl+0.038
rob+0.038
Kim+0.037
istle+0.037
Neg logit contributions
 goodbye-0.021
 while-0.020
 steps-0.019
 to-0.019
 contain-0.019
Tokenhe
Token ID258
Feature activation-4.61e-02

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.004
iked+0.004
 disagreed+0.004
Neg logit contributions
opl-0.005
izer-0.004
Kim-0.004
rob-0.004
ep-0.004
Tokenlf
Token ID1652
Feature activation-4.11e-03

Pos logit contributions
opl+0.043
Kim+0.040
izer+0.039
Wh+0.038
rob+0.037
Neg logit contributions
ived-0.112
iked-0.105
uttered-0.104
bs-0.103
s-0.102
Token so
Token ID523
Feature activation+1.28e-03

Pos logit contributions
opl+0.008
ep+0.008
rob+0.007
Kim+0.007
izer+0.007
Neg logit contributions
ived-0.006
uttered-0.006
bs-0.005
iked-0.005
 disagreed-0.005
Token we
Token ID356
Feature activation+1.14e-02

Pos logit contributions
ived+0.002
 disagreed+0.001
uttered+0.001
ared+0.001
iked+0.001
Neg logit contributions
Kim-0.003
opl-0.003
ep-0.003
Wh-0.003
rob-0.002
Token can
Token ID460
Feature activation+5.95e-03

Pos logit contributions
ived+0.020
uttered+0.018
iked+0.018
 neither+0.016
ared+0.016
Neg logit contributions
ep-0.019
opl-0.018
Kim-0.017
Wh-0.017
coon-0.017
Token find
Token ID1064
Feature activation-5.56e-03

Pos logit contributions
ived+0.008
uttered+0.008
bs+0.007
iked+0.007
 disagreed+0.007
Neg logit contributions
opl-0.013
Kim-0.012
izer-0.012
ep-0.012
rob-0.012
Token them
Token ID606
Feature activation+6.37e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
 disagreed+0.002
iked+0.002
Neg logit contributions
opl-0.002
ep-0.002
Kim-0.002
izer-0.002
rob-0.002
Token back
Token ID736
Feature activation-1.83e-03

Pos logit contributions
ived+0.010
uttered+0.009
 disagreed+0.009
bs+0.009
 neither+0.009
Neg logit contributions
ep-0.012
opl-0.011
Kim-0.011
izer-0.010
 stret-0.010
Token on
Token ID319
Feature activation-8.87e-03

Pos logit contributions
opl+0.004
ep+0.004
Kim+0.003
izer+0.003
rob+0.003
Neg logit contributions
ived-0.003
uttered-0.002
bs-0.002
 disagreed-0.002
 neither-0.002
Token the
Token ID262
Feature activation-1.84e-02

Pos logit contributions
opl+0.022
rob+0.021
Kim+0.020
izer+0.020
ep+0.020
Neg logit contributions
ived-0.009
uttered-0.008
 neither-0.008
bs-0.008
 none-0.007
Token books
Token ID3835
Feature activation+6.34e-04

Pos logit contributions
opl+0.032
rob+0.030
Kim+0.030
izer+0.029
ep+0.029
Neg logit contributions
 goodbye-0.029
 while-0.027
 contain-0.026
als-0.026
s-0.026
Tokenhe
Token ID258
Feature activation-4.58e-02

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
s+0.001
 neither+0.001
Neg logit contributions
opl-0.001
izer-0.001
rob-0.001
Kim-0.001
ep-0.001
Tokenlf
Token ID1652
Feature activation-6.00e-04

Pos logit contributions
opl+0.037
Kim+0.031
izer+0.031
rob+0.030
Wh+0.029
Neg logit contributions
ived-0.122
uttered-0.116
bs-0.113
iked-0.113
 disagreed-0.111
Token.
Token ID13
Feature activation+3.48e-03

Pos logit contributions
opl+0.001
ep+0.001
Kim+0.001
izer+0.001
rob+0.001
Neg logit contributions
ived-0.001
uttered-0.001
bs-0.001
 disagreed-0.001
 neither-0.001
Token Then
Token ID3244
Feature activation+6.54e-03

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.006
 disagreed+0.005
 neither+0.005
Neg logit contributions
opl-0.006
Kim-0.006
izer-0.005
rob-0.005
ep-0.005
Token they
Token ID484
Feature activation+9.25e-03

Pos logit contributions
ived+0.011
uttered+0.011
ared+0.010
ashed+0.010
bs+0.009
Neg logit contributions
Kim-0.011
ep-0.010
opl-0.010
Johnny-0.010
izer-0.010
Token grabbed
Token ID13176
Feature activation-6.24e-03

Pos logit contributions
ived+0.012
uttered+0.011
 neither+0.010
bs+0.010
ared+0.010
Neg logit contributions
ep-0.019
Kim-0.019
opl-0.019
rob-0.018
Wh-0.018
Token When
Token ID1649
Feature activation-7.05e-03

Pos logit contributions
opl+0.003
Kim+0.002
izer+0.002
rob+0.002
ep+0.002
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
iked-0.002
 disagreed-0.002
Token she
Token ID673
Feature activation-1.46e-02

Pos logit contributions
Kim+0.014
opl+0.013
ep+0.012
rob+0.012
izer+0.012
Neg logit contributions
ived-0.011
uttered-0.010
ared-0.010
ashed-0.009
bs-0.009
Token opened
Token ID4721
Feature activation+6.78e-04

Pos logit contributions
ep+0.027
opl+0.027
Kim+0.025
rob+0.025
coon+0.024
Neg logit contributions
ived-0.022
uttered-0.021
iked-0.019
 neither-0.019
irds-0.018
Token the
Token ID262
Feature activation-1.69e-02

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
 disagreed+0.001
iked+0.001
Neg logit contributions
opl-0.001
Kim-0.001
izer-0.001
ep-0.001
rob-0.001
Token books
Token ID3835
Feature activation+5.86e-03

Pos logit contributions
opl+0.027
Kim+0.025
izer+0.025
rob+0.024
ep+0.024
Neg logit contributions
ived-0.032
uttered-0.031
bs-0.030
 disagreed-0.029
iked-0.028
Tokenhe
Token ID258
Feature activation-4.58e-02

Pos logit contributions
ived+0.010
uttered+0.010
bs+0.009
ashed+0.009
iked+0.009
Neg logit contributions
opl-0.010
Kim-0.009
rob-0.009
ep-0.009
izer-0.009
Tokenlf
Token ID1652
Feature activation+3.89e-03

Pos logit contributions
opl+0.017
rob+0.010
izer+0.010
ep+0.009
Kim+0.008
Neg logit contributions
ived-0.145
uttered-0.141
bs-0.137
ashed-0.136
 neither-0.134
Token,
Token ID11
Feature activation+5.95e-04

Pos logit contributions
ived+0.006
bs+0.006
uttered+0.006
ashed+0.006
 married+0.005
Neg logit contributions
opl-0.006
ep-0.006
Kim-0.006
rob-0.006
coon-0.006
Token everything
Token ID2279
Feature activation+4.90e-03

Pos logit contributions
ived+0.001
ared+0.001
uttered+0.001
ashed+0.001
ashes+0.001
Neg logit contributions
Kim-0.001
rob-0.001
Wh-0.001
ep-0.001
Johnny-0.001
Token inside
Token ID2641
Feature activation+1.11e-03

Pos logit contributions
ived+0.009
iked+0.008
bs+0.008
ared+0.008
uttered+0.007
Neg logit contributions
ep-0.008
Kim-0.007
opl-0.007
coon-0.007
rob-0.007
Token was
Token ID373
Feature activation-9.44e-03

Pos logit contributions
ived+0.002
uttered+0.001
bs+0.001
iked+0.001
ashed+0.001
Neg logit contributions
opl-0.002
Kim-0.002
ep-0.002
rob-0.002
izer-0.002
Token the
Token ID262
Feature activation-5.08e-03

Pos logit contributions
ived+0.005
uttered+0.004
bs+0.004
 neither+0.004
 disagreed+0.004
Neg logit contributions
opl-0.007
Kim-0.006
izer-0.006
rob-0.006
coon-0.006
Token books
Token ID3835
Feature activation+9.96e-03

Pos logit contributions
opl+0.009
izer+0.009
rob+0.009
Kim+0.009
ep+0.009
Neg logit contributions
ived-0.008
uttered-0.008
bs-0.007
 disagreed-0.007
iked-0.007
Token on
Token ID319
Feature activation-1.41e-03

Pos logit contributions
ived+0.016
uttered+0.015
bs+0.015
iked+0.014
 disagreed+0.014
Neg logit contributions
opl-0.018
Kim-0.017
rob-0.017
izer-0.017
ep-0.017
Token his
Token ID465
Feature activation-2.03e-03

Pos logit contributions
opl+0.003
rob+0.003
izer+0.003
Kim+0.003
coon+0.003
Neg logit contributions
ived-0.001
 neither-0.001
 none-0.001
 anyone-0.001
s-0.001
Token books
Token ID3835
Feature activation+2.46e-03

Pos logit contributions
opl+0.004
rob+0.004
izer+0.004
Kim+0.004
ep+0.003
Neg logit contributions
ived-0.003
bs-0.003
uttered-0.003
s-0.003
 disagreed-0.003
Tokenhe
Token ID258
Feature activation-4.54e-02

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.003
 disagreed+0.003
 neither+0.003
Neg logit contributions
opl-0.005
izer-0.004
Kim-0.004
rob-0.004
ep-0.004
Tokenlf
Token ID1652
Feature activation+1.54e-04

Pos logit contributions
opl+0.043
izer+0.038
Kim+0.038
rob+0.037
Wh+0.036
Neg logit contributions
ived-0.111
uttered-0.105
iked-0.104
 disagreed-0.103
bs-0.103
Token.
Token ID13
Feature activation+1.52e-03

Pos logit contributions
ived+0.000
uttered+0.000
bs+0.000
iked+0.000
 disagreed+0.000
Neg logit contributions
opl-0.000
Kim-0.000
izer-0.000
rob-0.000
ep-0.000
Token " "
Token ID220
Feature activation+9.16e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.003
izer-0.003
rob-0.003
ep-0.003
Kim-0.003
Token "\n"
Token ID198
Feature activation+4.87e-03

Pos logit contributions
ived+0.012
uttered+0.012
bs+0.011
s+0.011
ver+0.011
Neg logit contributions
opl-0.020
rob-0.019
izer-0.019
coon-0.018
ep-0.018
Token "\n"
Token ID198
Feature activation+9.49e-04

Pos logit contributions
ived+0.006
s+0.006
bs+0.005
uttered+0.005
t+0.005
Neg logit contributions
opl-0.011
izer-0.011
rob-0.011
coon-0.010
ep-0.010
TokenAnd
Token ID1870
Feature activation+1.58e-02

Pos logit contributions
opl+0.002
izer+0.002
rob+0.002
ep+0.002
Kim+0.002
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
s-0.002
ver-0.002
Token Lucy
Token ID22162
Feature activation-1.55e-02

Pos logit contributions
ived+0.027
uttered+0.026
ashed+0.024
bs+0.023
ared+0.023
Neg logit contributions
Kim-0.026
opl-0.025
rob-0.024
Wh-0.023
izer-0.023
Token never
Token ID1239
Feature activation+1.67e-02

Pos logit contributions
opl+0.031
Kim+0.029
izer+0.029
rob+0.029
ep+0.028
Neg logit contributions
ived-0.023
uttered-0.022
bs-0.020
iked-0.019
 neither-0.019
Token forgot
Token ID16453
Feature activation+1.10e-02

Pos logit contributions
uttered+0.030
ived+0.026
iked+0.026
ls+0.024
apy+0.023
Neg logit contributions
opl-0.028
istle-0.026
ep-0.024
Kim-0.024
et-0.023
Token the
Token ID262
Feature activation-4.16e-03

Pos logit contributions
ived+0.018
uttered+0.017
iked+0.015
ashed+0.015
bs+0.014
Neg logit contributions
opl-0.021
Kim-0.019
ep-0.018
 stret-0.018
rob-0.017
Token old
Token ID1468
Feature activation-4.51e-02

Pos logit contributions
opl+0.008
izer+0.007
Kim+0.007
ep+0.007
rob+0.007
Neg logit contributions
ived-0.007
uttered-0.007
bs-0.006
iked-0.006
als-0.006
Token woman
Token ID2415
Feature activation-7.06e-03

Pos logit contributions
opl+0.071
izer+0.067
Kim+0.066
rob+0.066
ep+0.064
Neg logit contributions
ived-0.082
bs-0.078
uttered-0.078
 disagreed-0.077
 neither-0.075
Token's
Token ID338
Feature activation+2.86e-06

Pos logit contributions
opl+0.012
ep+0.010
izer+0.010
coon+0.010
Kim+0.010
Neg logit contributions
ived-0.013
uttered-0.013
iked-0.012
ashed-0.012
bs-0.011
Token words
Token ID2456
Feature activation-1.30e-02

Feature not active on prev token; no predictions were affected.
Token or
Token ID393
Feature activation-2.42e-03

Pos logit contributions
opl+0.023
Kim+0.020
ep+0.020
coon+0.019
rob+0.019
Neg logit contributions
ived-0.022
uttered-0.022
iked-0.020
ashed-0.020
bs-0.019
Token the
Token ID262
Feature activation-1.55e-02

Pos logit contributions
opl+0.004
Kim+0.004
ep+0.004
izer+0.004
rob+0.004
Neg logit contributions
ived-0.004
uttered-0.004
iked-0.004
ashed-0.004
ared-0.003
Token books
Token ID3835
Feature activation+1.52e-02

Pos logit contributions
opl+0.013
izer+0.012
Kim+0.012
rob+0.012
ep+0.012
Neg logit contributions
ived-0.011
uttered-0.010
bs-0.010
 disagreed-0.009
iked-0.009
Token back
Token ID736
Feature activation+1.27e-03

Pos logit contributions
ived+0.023
uttered+0.022
bs+0.020
iked+0.020
 disagreed+0.020
Neg logit contributions
opl-0.030
Kim-0.028
ep-0.027
izer-0.027
rob-0.027
Token on
Token ID319
Feature activation-6.23e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
 disagreed+0.002
 neither+0.002
Neg logit contributions
opl-0.003
ep-0.002
Kim-0.002
izer-0.002
rob-0.002
Token the
Token ID262
Feature activation-1.33e-02

Pos logit contributions
opl+0.014
rob+0.013
Kim+0.013
izer+0.013
ep+0.013
Neg logit contributions
ived-0.008
uttered-0.007
bs-0.007
 neither-0.007
 disagreed-0.007
Token books
Token ID3835
Feature activation+5.17e-03

Pos logit contributions
izer+0.027
rob+0.026
opl+0.026
istle+0.025
coon+0.025
Neg logit contributions
 goodbye-0.017
 while-0.016
 steps-0.016
 usual-0.016
 to-0.015
Tokenhe
Token ID258
Feature activation-4.49e-02

Pos logit contributions
ived+0.008
uttered+0.008
bs+0.007
 neither+0.007
 disagreed+0.007
Neg logit contributions
opl-0.010
izer-0.009
rob-0.009
Kim-0.009
ep-0.009
Tokenlf
Token ID1652
Feature activation-1.37e-03

Pos logit contributions
opl+0.045
Wh+0.042
izer+0.041
rob+0.041
Kim+0.040
Neg logit contributions
ived-0.106
uttered-0.100
iked-0.100
s-0.098
 disagreed-0.097
Token,
Token ID11
Feature activation-1.24e-02

Pos logit contributions
opl+0.003
Kim+0.002
izer+0.002
rob+0.002
ep+0.002
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
iked-0.002
 disagreed-0.002
Token smiling
Token ID16755
Feature activation+5.73e-03

Pos logit contributions
opl+0.026
Kim+0.025
ep+0.024
izer+0.024
rob+0.024
Neg logit contributions
ived-0.018
uttered-0.017
bs-0.016
ared-0.015
iked-0.015
Token proudly
Token ID23017
Feature activation+5.12e-03

Pos logit contributions
ived+0.010
uttered+0.009
bs+0.009
 plunge+0.008
 neither+0.008
Neg logit contributions
opl-0.010
ep-0.009
izer-0.009
Kim-0.009
rob-0.009
Token.
Token ID13
Feature activation-1.19e-02

Pos logit contributions
ived+0.008
uttered+0.008
bs+0.007
iked+0.007
 disagreed+0.007
Neg logit contributions
opl-0.010
izer-0.009
Kim-0.009
ep-0.009
rob-0.009
Token upon
Token ID2402
Feature activation-6.44e-03

Pos logit contributions
opl+0.001
Kim+0.001
ep+0.001
izer+0.001
rob+0.001
Neg logit contributions
ived-0.000
uttered-0.000
als-0.000
bs-0.000
iked-0.000
Token a
Token ID257
Feature activation-2.79e-02

Pos logit contributions
opl+0.014
izer+0.013
Kim+0.013
rob+0.013
ep+0.013
Neg logit contributions
ived-0.008
uttered-0.008
bs-0.007
iked-0.007
 disagreed-0.007
Token time
Token ID640
Feature activation-9.09e-03

Pos logit contributions
opl+0.064
rob+0.062
Kim+0.061
coon+0.059
izer+0.059
Neg logit contributions
ived-0.033
bs-0.030
 disagreed-0.029
uttered-0.028
 while-0.027
Token,
Token ID11
Feature activation+3.75e-03

Pos logit contributions
opl+0.019
Kim+0.018
rob+0.017
izer+0.017
ep+0.017
Neg logit contributions
ived-0.013
uttered-0.012
bs-0.012
als-0.011
ashed-0.011
Token little
Token ID1310
Feature activation-1.43e-02

Pos logit contributions
ived+0.003
uttered+0.003
bs+0.003
iked+0.003
 disagreed+0.002
Neg logit contributions
opl-0.010
Kim-0.009
izer-0.009
ep-0.009
rob-0.009
Token Su
Token ID1778
Feature activation-4.46e-02

Pos logit contributions
opl+0.027
Kim+0.025
izer+0.025
rob+0.025
ep+0.024
Neg logit contributions
ived-0.023
uttered-0.022
bs-0.021
 disagreed-0.020
 neither-0.020
Tokenzy
Token ID7357
Feature activation-2.73e-02

Pos logit contributions
rob+0.059
coon+0.059
ep+0.057
 engineer+0.055
Wh+0.055
Neg logit contributions
ared-0.081
s-0.080
ived-0.080
re-0.079
 disagreed-0.078
Token asked
Token ID1965
Feature activation-1.28e-02

Pos logit contributions
opl+0.058
rob+0.055
Kim+0.055
izer+0.055
ep+0.052
Neg logit contributions
ived-0.036
uttered-0.034
bs-0.032
 disagreed-0.031
 neither-0.030
Token her
Token ID607
Feature activation-3.32e-03

Pos logit contributions
opl+0.027
rob+0.025
izer+0.025
Kim+0.025
ep+0.024
Neg logit contributions
ived-0.018
uttered-0.017
bs-0.016
iked-0.016
 neither-0.015
Token Mom
Token ID11254
Feature activation-5.08e-03

Pos logit contributions
opl+0.008
rob+0.008
izer+0.008
Kim+0.008
ep+0.007
Neg logit contributions
ived-0.004
uttered-0.003
bs-0.003
 neither-0.003
iked-0.003
Token what
Token ID644
Feature activation+2.87e-03

Pos logit contributions
opl+0.008
rob+0.007
izer+0.007
Kim+0.007
ep+0.007
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.009
 disagreed-0.009
 neither-0.009
Token 911
Token ID16679
Feature activation-5.99e-03

Pos logit contributions
ived+0.011
uttered+0.011
bs+0.010
 neither+0.010
 disagreed+0.010
Neg logit contributions
opl-0.020
rob-0.019
izer-0.018
ep-0.018
Kim-0.017
Token,
Token ID11
Feature activation+7.21e-03

Pos logit contributions
opl+0.012
Kim+0.012
rob+0.012
izer+0.011
ep+0.011
Neg logit contributions
ived-0.009
uttered-0.008
bs-0.008
iked-0.007
 disagreed-0.007
Token what
Token ID644
Feature activation+1.71e-02

Pos logit contributions
ived+0.010
uttered+0.010
bs+0.009
iked+0.009
 disagreed+0.008
Neg logit contributions
opl-0.015
Kim-0.015
izer-0.014
ep-0.014
rob-0.014
Token is
Token ID318
Feature activation-9.73e-03

Pos logit contributions
ived+0.031
uttered+0.030
iked+0.028
ashed+0.027
ared+0.025
Neg logit contributions
ep-0.029
opl-0.028
coon-0.027
Kim-0.027
rob-0.025
Token your
Token ID534
Feature activation-8.87e-03

Pos logit contributions
opl+0.023
izer+0.022
rob+0.021
Kim+0.021
Wh+0.020
Neg logit contributions
 neither-0.010
 they-0.010
 anyone-0.010
 to-0.009
 none-0.009
Token emergency
Token ID6334
Feature activation-4.42e-02

Pos logit contributions
opl+0.016
izer+0.015
Kim+0.015
rob+0.015
ep+0.015
Neg logit contributions
ived-0.015
uttered-0.014
bs-0.013
iked-0.013
 disagreed-0.013
Token?"
Token ID1701
Feature activation+1.05e-02

Pos logit contributions
opl+0.081
izer+0.075
rob+0.075
Kim+0.074
ep+0.072
Neg logit contributions
ived-0.073
uttered-0.069
bs-0.066
 neither-0.064
 disagreed-0.064
Token a
Token ID257
Feature activation+3.06e-03

Pos logit contributions
ived+0.019
uttered+0.017
iked+0.017
ashed+0.016
ared+0.016
Neg logit contributions
Kim-0.018
opl-0.017
Wh-0.016
izer-0.016
rob-0.016
Token voice
Token ID3809
Feature activation-1.04e-02

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.004
 disagreed+0.004
 neither+0.004
Neg logit contributions
opl-0.006
rob-0.006
izer-0.006
Kim-0.006
ep-0.006
Token said
Token ID531
Feature activation+9.57e-03

Pos logit contributions
opl+0.024
izer+0.022
Kim+0.022
rob+0.022
ep+0.021
Neg logit contributions
ived-0.013
uttered-0.012
bs-0.011
 disagreed-0.011
 neither-0.010
Token on
Token ID319
Feature activation+1.99e-03

Pos logit contributions
ived+0.016
uttered+0.015
bs+0.014
iked+0.014
 disagreed+0.014
Neg logit contributions
izer-0.015
Kim-0.015
opl-0.015
ep-0.015
 stret-0.014
Token book
Token ID1492
Feature activation+6.26e-03

Pos logit contributions
ived+0.013
uttered+0.013
bs+0.012
iked+0.012
 disagreed+0.012
Neg logit contributions
opl-0.010
izer-0.009
Kim-0.009
rob-0.009
ep-0.009
Token falls
Token ID8953
Feature activation+6.18e-04

Pos logit contributions
ived+0.010
ory+0.010
iked+0.010
uttered+0.010
ashed+0.009
Neg logit contributions
opl-0.011
ep-0.011
Kim-0.010
Wh-0.009
 wire-0.009
Token off
Token ID572
Feature activation+5.98e-03

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
iked+0.000
 disagreed+0.000
Neg logit contributions
opl-0.002
Kim-0.001
izer-0.001
ep-0.001
rob-0.001
Token the
Token ID262
Feature activation-1.43e-02

Pos logit contributions
ived+0.009
uttered+0.008
bs+0.008
iked+0.007
 disagreed+0.007
Neg logit contributions
opl-0.012
izer-0.011
Kim-0.011
rob-0.011
ep-0.011
Token books
Token ID3835
Feature activation+4.73e-05

Pos logit contributions
opl+0.023
rob+0.022
Kim+0.022
izer+0.021
ep+0.021
Neg logit contributions
ived-0.026
bs-0.024
uttered-0.024
als-0.023
 disagreed-0.023
Tokenhe
Token ID258
Feature activation-4.42e-02

Pos logit contributions
ived+0.000
uttered+0.000
bs+0.000
 neither+0.000
s+0.000
Neg logit contributions
opl-0.000
rob-0.000
izer-0.000
Kim-0.000
ep-0.000
Tokenlf
Token ID1652
Feature activation-4.62e-03

Pos logit contributions
opl+0.033
Kim+0.029
rob+0.028
izer+0.027
Wh+0.025
Neg logit contributions
ived-0.121
uttered-0.115
bs-0.111
iked-0.110
 neither-0.109
Token.
Token ID13
Feature activation-1.66e-03

Pos logit contributions
opl+0.008
ep+0.007
izer+0.007
Kim+0.007
rob+0.007
Neg logit contributions
ived-0.008
uttered-0.008
bs-0.008
iked-0.007
 disagreed-0.007
Token It
Token ID632
Feature activation-4.66e-04

Pos logit contributions
opl+0.003
izer+0.003
Kim+0.003
rob+0.003
ep+0.003
Neg logit contributions
ived-0.003
uttered-0.003
bs-0.002
 disagreed-0.002
iked-0.002
Token makes
Token ID1838
Feature activation-4.41e-03

Pos logit contributions
opl+0.001
ep+0.001
Kim+0.001
rob+0.001
Wh+0.001
Neg logit contributions
iked-0.001
ived-0.001
uttered-0.001
ory-0.001
ver-0.001
Token a
Token ID257
Feature activation+3.03e-03

Pos logit contributions
opl+0.009
ep+0.008
Kim+0.008
izer+0.008
rob+0.008
Neg logit contributions
ived-0.006
uttered-0.006
bs-0.006
iked-0.006
 disagreed-0.006
Token is
Token ID318
Feature activation-6.09e-03

Pos logit contributions
opl+0.005
ep+0.005
izer+0.005
rob+0.005
coon+0.005
Neg logit contributions
ived-0.005
uttered-0.005
iked-0.005
bs-0.004
ver-0.004
Token hiding
Token ID11816
Feature activation-9.03e-03

Pos logit contributions
opl+0.013
rob+0.013
Kim+0.012
izer+0.012
coon+0.012
Neg logit contributions
ived-0.008
uttered-0.008
bs-0.007
 neither-0.007
 disagreed-0.007
Token behind
Token ID2157
Feature activation-1.21e-02

Pos logit contributions
opl+0.017
izer+0.016
Kim+0.016
ep+0.016
rob+0.016
Neg logit contributions
ived-0.015
uttered-0.014
bs-0.013
iked-0.012
 disagreed-0.012
Token the
Token ID262
Feature activation-1.35e-02

Pos logit contributions
opl+0.027
izer+0.025
Kim+0.025
rob+0.025
ep+0.024
Neg logit contributions
ived-0.016
uttered-0.015
bs-0.014
iked-0.013
 disagreed-0.013
Token books
Token ID3835
Feature activation+5.88e-03

Pos logit contributions
opl+0.025
rob+0.024
ep+0.024
izer+0.023
Kim+0.023
Neg logit contributions
ived-0.021
bs-0.020
als-0.019
s-0.019
uttered-0.019
Tokenhe
Token ID258
Feature activation-4.42e-02

Pos logit contributions
ived+0.009
uttered+0.009
bs+0.008
 neither+0.008
 disagreed+0.008
Neg logit contributions
opl-0.011
izer-0.010
Kim-0.010
rob-0.010
ep-0.010
Tokenlf
Token ID1652
Feature activation-1.73e-03

Pos logit contributions
opl+0.034
rob+0.029
Kim+0.028
izer+0.027
ep+0.025
Neg logit contributions
ived-0.120
uttered-0.115
bs-0.112
iked-0.110
 neither-0.110
Token.
Token ID13
Feature activation-8.84e-03

Pos logit contributions
opl+0.003
ep+0.003
izer+0.003
Kim+0.003
rob+0.003
Neg logit contributions
ived-0.003
uttered-0.003
bs-0.003
iked-0.003
ashed-0.003
Token The
Token ID383
Feature activation+3.30e-03

Pos logit contributions
opl+0.020
rob+0.019
izer+0.019
ep+0.019
coon+0.018
Neg logit contributions
ived-0.011
uttered-0.010
bs-0.010
 neither-0.009
s-0.009
Token cat
Token ID3797
Feature activation-2.39e-03

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.006
iked+0.006
 disagreed+0.006
Neg logit contributions
opl-0.005
izer-0.005
Kim-0.005
rob-0.005
ep-0.005
Token jumps
Token ID18045
Feature activation+7.22e-03

Pos logit contributions
opl+0.004
ep+0.004
rob+0.004
izer+0.004
coon+0.004
Neg logit contributions
ived-0.004
uttered-0.004
iked-0.004
bs-0.004
ver-0.003
Token of
Token ID286
Feature activation-6.38e-04

Pos logit contributions
ived+0.000
uttered+0.000
 neither+0.000
 disagreed+0.000
iked+0.000
Neg logit contributions
opl-0.001
Kim-0.001
rob-0.001
izer-0.001
ep-0.001
Token fun
Token ID1257
Feature activation-1.65e-02

Pos logit contributions
opl+0.001
rob+0.001
Kim+0.001
izer+0.001
ep+0.001
Neg logit contributions
ived-0.001
uttered-0.001
bs-0.001
 neither-0.001
iked-0.001
Token exploring
Token ID13504
Feature activation-3.48e-03

Pos logit contributions
opl+0.031
rob+0.029
izer+0.028
Kim+0.028
ep+0.028
Neg logit contributions
ived-0.027
uttered-0.026
bs-0.025
 disagreed-0.023
 neither-0.023
Token the
Token ID262
Feature activation-1.89e-02

Pos logit contributions
opl+0.008
Kim+0.008
izer+0.008
rob+0.008
Wh+0.008
Neg logit contributions
ived-0.004
uttered-0.003
bs-0.003
 while-0.003
s-0.003
Token books
Token ID3835
Feature activation-1.67e-03

Pos logit contributions
izer+0.035
rob+0.034
coon+0.033
Kim+0.033
opl+0.033
Neg logit contributions
s-0.025
 upon-0.025
 to-0.024
 while-0.024
als-0.023
Tokenhe
Token ID258
Feature activation-4.40e-02

Pos logit contributions
opl+0.003
izer+0.003
Kim+0.003
rob+0.003
ep+0.003
Neg logit contributions
ived-0.003
uttered-0.003
bs-0.002
iked-0.002
 disagreed-0.002
Tokenlf
Token ID1652
Feature activation-6.27e-03

Pos logit contributions
opl+0.039
Kim+0.036
Wh+0.035
rob+0.034
izer+0.034
Neg logit contributions
ived-0.112
uttered-0.106
iked-0.104
bs-0.103
s-0.102
Token,
Token ID11
Feature activation-2.30e-02

Pos logit contributions
opl+0.012
ep+0.011
rob+0.011
Kim+0.011
izer+0.011
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.009
iked-0.008
 disagreed-0.008
Token and
Token ID290
Feature activation+8.43e-03

Pos logit contributions
opl+0.042
Kim+0.038
izer+0.038
ep+0.038
rob+0.038
Neg logit contributions
ived-0.040
uttered-0.038
bs-0.036
iked-0.034
 disagreed-0.033
Token his
Token ID465
Feature activation-2.35e-03

Pos logit contributions
ived+0.013
uttered+0.013
ared+0.011
bs+0.011
iked+0.011
Neg logit contributions
opl-0.016
Kim-0.015
rob-0.015
ep-0.015
izer-0.014
Token mom
Token ID1995
Feature activation-1.33e-02

Pos logit contributions
opl+0.005
izer+0.004
ep+0.004
Kim+0.004
rob+0.004
Neg logit contributions
ived-0.004
uttered-0.004
bs-0.003
 neither-0.003
iked-0.003
Token more
Token ID517
Feature activation+9.49e-04

Pos logit contributions
ived+0.009
uttered+0.008
bs+0.008
iked+0.007
 disagreed+0.007
Neg logit contributions
opl-0.012
Kim-0.012
rob-0.012
izer-0.011
ep-0.011
Token books
Token ID3835
Feature activation+7.86e-03

Pos logit contributions
ived+0.002
uttered+0.002
 neither+0.002
ared+0.002
 disagreed+0.002
Neg logit contributions
opl-0.002
rob-0.001
izer-0.001
ep-0.001
Kim-0.001
Token from
Token ID422
Feature activation-6.61e-04

Pos logit contributions
ived+0.012
uttered+0.012
bs+0.011
iked+0.010
 disagreed+0.010
Neg logit contributions
opl-0.015
Kim-0.014
rob-0.014
izer-0.014
ep-0.014
Token the
Token ID262
Feature activation-8.60e-03

Pos logit contributions
opl+0.001
izer+0.001
Kim+0.001
rob+0.001
ep+0.001
Neg logit contributions
ived-0.001
uttered-0.001
bs-0.001
 neither-0.001
 disagreed-0.001
Token books
Token ID3835
Feature activation+2.71e-05

Pos logit contributions
opl+0.014
rob+0.013
izer+0.013
Kim+0.013
ep+0.013
Neg logit contributions
ived-0.015
uttered-0.014
bs-0.014
 disagreed-0.014
 neither-0.014
Tokenhe
Token ID258
Feature activation-4.40e-02

Pos logit contributions
ived+0.000
uttered+0.000
bs+0.000
iked+0.000
 disagreed+0.000
Neg logit contributions
opl-0.000
Kim-0.000
izer-0.000
rob-0.000
ep-0.000
Tokenlf
Token ID1652
Feature activation-4.05e-03

Pos logit contributions
opl+0.031
Kim+0.025
izer+0.025
rob+0.025
ep+0.022
Neg logit contributions
ived-0.123
uttered-0.118
bs-0.116
iked-0.113
 disagreed-0.113
Token.
Token ID13
Feature activation+1.25e-03

Pos logit contributions
opl+0.007
ep+0.006
Kim+0.006
rob+0.006
izer+0.006
Neg logit contributions
ived-0.007
uttered-0.007
bs-0.006
iked-0.006
 disagreed-0.006
Token The
Token ID383
Feature activation-2.25e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.002
Kim-0.002
izer-0.002
rob-0.002
ep-0.002
Token end
Token ID886
Feature activation-1.77e-02

Pos logit contributions
opl+0.004
izer+0.004
rob+0.004
Kim+0.004
ep+0.004
Neg logit contributions
ived-0.004
uttered-0.003
bs-0.003
 disagreed-0.003
 neither-0.003
Token.
Token ID13
Feature activation-3.85e-03

Pos logit contributions
opl+0.042
rob+0.041
izer+0.040
ep+0.038
Wh+0.037
Neg logit contributions
bs-0.020
ived-0.020
s-0.019
uttered-0.018
 while-0.018
Token some
Token ID617
Feature activation+5.94e-03

Pos logit contributions
opl+0.021
izer+0.020
Kim+0.020
rob+0.020
ep+0.020
Neg logit contributions
ived-0.012
uttered-0.011
bs-0.010
iked-0.009
 disagreed-0.009
Token sticks
Token ID16461
Feature activation+9.11e-03

Pos logit contributions
uttered+0.010
ived+0.010
iked+0.009
ver+0.009
 disagreed+0.009
Neg logit contributions
opl-0.011
ep-0.010
Kim-0.009
izer-0.009
coon-0.009
Token down
Token ID866
Feature activation+1.19e-02

Pos logit contributions
ived+0.012
uttered+0.012
iked+0.011
 disagreed+0.011
bs+0.010
Neg logit contributions
opl-0.019
ep-0.018
izer-0.017
Kim-0.017
coon-0.017
Token.
Token ID13
Feature activation-1.54e-02

Pos logit contributions
ived+0.019
uttered+0.018
bs+0.017
iked+0.016
 disagreed+0.016
Neg logit contributions
opl-0.022
Kim-0.021
izer-0.021
ep-0.020
rob-0.020
Token Tim
Token ID5045
Feature activation-2.00e-02

Pos logit contributions
opl+0.027
Kim+0.026
izer+0.026
rob+0.025
ep+0.025
Neg logit contributions
ived-0.026
uttered-0.025
bs-0.024
iked-0.023
 disagreed-0.022
Token and
Token ID290
Feature activation-4.37e-02

Pos logit contributions
opl+0.038
Kim+0.035
izer+0.035
rob+0.035
ep+0.035
Neg logit contributions
ived-0.032
uttered-0.030
bs-0.028
iked-0.027
 neither-0.027
Token Uncle
Token ID23169
Feature activation+5.62e-03

Pos logit contributions
opl+0.080
rob+0.077
izer+0.069
ep+0.069
coon+0.069
Neg logit contributions
 disagreed-0.068
ived-0.067
 neither-0.067
bs-0.064
uttered-0.061
Token Jack
Token ID3619
Feature activation+4.67e-03

Pos logit contributions
ived+0.007
uttered+0.007
bs+0.006
iked+0.006
ashed+0.005
Neg logit contributions
Kim-0.012
opl-0.012
rob-0.012
ep-0.012
izer-0.011
Token laughed
Token ID13818
Feature activation+2.36e-03

Pos logit contributions
ived+0.006
uttered+0.006
iked+0.006
 plunge+0.005
ipop+0.005
Neg logit contributions
ep-0.010
coon-0.008
opl-0.008
Kim-0.008
istle-0.008
Token.
Token ID13
Feature activation-1.72e-02

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.004
iked+0.004
 disagreed+0.004
Neg logit contributions
opl-0.004
izer-0.004
Kim-0.004
ep-0.004
rob-0.004
Token They
Token ID1119
Feature activation+3.32e-03

Pos logit contributions
opl+0.035
Kim+0.033
izer+0.033
rob+0.032
ep+0.032
Neg logit contributions
ived-0.025
uttered-0.024
bs-0.023
iked-0.021
 disagreed-0.021
Token " "
Token ID220
Feature activation+3.87e-03

Pos logit contributions
opl+0.045
izer+0.042
Kim+0.042
rob+0.042
ep+0.041
Neg logit contributions
ived-0.034
uttered-0.032
bs-0.030
iked-0.029
 neither-0.029
Token "\n"
Token ID198
Feature activation-1.57e-02

Pos logit contributions
ived+0.007
uttered+0.006
bs+0.006
 disagreed+0.006
iked+0.006
Neg logit contributions
opl-0.007
Kim-0.006
izer-0.006
rob-0.006
ep-0.006
Token "\n"
Token ID198
Feature activation-2.51e-02

Pos logit contributions
opl+0.032
izer+0.032
coon+0.030
rob+0.030
Kim+0.030
Neg logit contributions
ived-0.021
uttered-0.019
s-0.019
bs-0.018
iked-0.017
TokenThe
Token ID464
Feature activation-5.26e-03

Pos logit contributions
opl+0.039
izer+0.038
rob+0.036
coon+0.035
Kim+0.035
Neg logit contributions
ived-0.047
uttered-0.043
s-0.042
bs-0.041
iked-0.041
Token moral
Token ID6573
Feature activation-2.37e-06

Pos logit contributions
opl+0.009
izer+0.008
Kim+0.008
rob+0.008
ep+0.008
Neg logit contributions
ived-0.009
uttered-0.009
bs-0.009
iked-0.008
 neither-0.008
Token of
Token ID286
Feature activation-4.36e-02

Feature not active on prev token; no predictions were affected.
Token the
Token ID262
Feature activation-2.71e-02

Pos logit contributions
 stret+0.114
Wh+0.113
opl+0.112
Kim+0.111
ep+0.108
Neg logit contributions
 of-0.020
 it-0.018
 goodbye-0.018
 to-0.018
 they-0.017
Token story
Token ID1621
Feature activation-2.03e-03

Pos logit contributions
opl+0.048
Kim+0.045
rob+0.045
izer+0.045
ep+0.045
Neg logit contributions
ived-0.045
uttered-0.042
bs-0.041
 disagreed-0.040
iked-0.039
Token is
Token ID318
Feature activation-5.83e-03

Pos logit contributions
opl+0.003
ep+0.002
Kim+0.002
istle+0.002
izer+0.002
Neg logit contributions
ived-0.005
bs-0.004
ashed-0.004
iked-0.004
uttered-0.004
Token:
Token ID25
Feature activation-3.10e-03

Pos logit contributions
opl+0.011
Kim+0.010
izer+0.010
ep+0.009
coon+0.009
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.008
ashed-0.008
 disagreed-0.008
Token With
Token ID2080
Feature activation-2.36e-02

Pos logit contributions
opl+0.006
Kim+0.005
ep+0.005
izer+0.005
Wh+0.005
Neg logit contributions
ived-0.005
uttered-0.005
ashed-0.005
bs-0.004
iked-0.004
Token washed
Token ID18989
Feature activation-9.15e-03

Pos logit contributions
uttered+0.000
ived+0.000
als+0.000
bs+0.000
s+0.000
Neg logit contributions
ep-0.000
opl-0.000
Kim-0.000
izer-0.000
 stret-0.000
Token and
Token ID290
Feature activation+3.28e-03

Pos logit contributions
opl+0.019
Kim+0.018
izer+0.018
rob+0.018
ep+0.018
Neg logit contributions
ived-0.013
uttered-0.012
bs-0.011
iked-0.010
 disagreed-0.010
Token dried
Token ID16577
Feature activation-1.55e-02

Pos logit contributions
ived+0.006
uttered+0.005
ared+0.005
ver+0.005
iked+0.005
Neg logit contributions
Kim-0.006
ep-0.006
opl-0.006
rob-0.005
Johnny-0.005
Token,
Token ID11
Feature activation-1.48e-02

Pos logit contributions
opl+0.034
izer+0.033
Kim+0.033
rob+0.032
ep+0.032
Neg logit contributions
ived-0.020
uttered-0.018
bs-0.017
 disagreed-0.016
iked-0.016
Token Sandy
Token ID17581
Feature activation-4.70e-03

Pos logit contributions
opl+0.028
Kim+0.028
izer+0.026
ep+0.026
rob+0.026
Neg logit contributions
ived-0.022
uttered-0.021
bs-0.019
ared-0.019
iked-0.019
Token and
Token ID290
Feature activation-4.35e-02

Pos logit contributions
opl+0.009
ep+0.009
Kim+0.009
izer+0.009
rob+0.009
Neg logit contributions
ived-0.007
uttered-0.007
bs-0.006
iked-0.006
 neither-0.006
Token her
Token ID607
Feature activation-2.34e-02

Pos logit contributions
opl+0.079
rob+0.074
izer+0.073
coon+0.071
Kim+0.070
Neg logit contributions
ived-0.071
bs-0.069
 neither-0.067
uttered-0.067
 disagreed-0.066
Token uncle
Token ID7711
Feature activation+5.59e-03

Pos logit contributions
opl+0.051
rob+0.047
izer+0.047
Kim+0.047
ep+0.046
Neg logit contributions
ived-0.032
uttered-0.030
bs-0.028
 disagreed-0.027
 neither-0.027
Token played
Token ID2826
Feature activation+4.41e-05

Pos logit contributions
ived+0.008
uttered+0.007
bs+0.007
iked+0.006
ared+0.006
Neg logit contributions
Kim-0.011
ep-0.011
opl-0.010
rob-0.010
coon-0.010
Token tag
Token ID7621
Feature activation-5.08e-04

Pos logit contributions
ived+0.000
uttered+0.000
bs+0.000
 disagreed+0.000
iked+0.000
Neg logit contributions
opl-0.000
izer-0.000
Kim-0.000
rob-0.000
ep-0.000
Token in
Token ID287
Feature activation-1.97e-03

Pos logit contributions
opl+0.001
ep+0.001
rob+0.001
izer+0.001
 stret+0.001
Neg logit contributions
ived-0.001
uttered-0.001
ared-0.001
bs-0.001
iked-0.001
Token on
Token ID319
Feature activation-7.40e-03

Pos logit contributions
opl+0.008
Kim+0.008
izer+0.008
rob+0.008
coon+0.008
Neg logit contributions
ived-0.002
s-0.002
uttered-0.002
 neither-0.002
 tired-0.002
Token the
Token ID262
Feature activation-1.54e-02

Pos logit contributions
opl+0.016
Kim+0.016
izer+0.015
rob+0.015
coon+0.015
Neg logit contributions
ived-0.009
bs-0.008
uttered-0.008
 neither-0.007
 none-0.007
Token roof
Token ID9753
Feature activation+1.57e-03

Pos logit contributions
Kim+0.030
opl+0.029
rob+0.029
izer+0.027
istle+0.027
Neg logit contributions
ived-0.021
 tum-0.020
als-0.020
 plunge-0.020
 goodbye-0.020
Token and
Token ID290
Feature activation+1.26e-02

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.003
izer-0.003
Kim-0.003
rob-0.003
ep-0.003
Token sun
Token ID4252
Feature activation-7.08e-03

Pos logit contributions
ived+0.024
uttered+0.022
iked+0.020
ashed+0.019
ared+0.019
Neg logit contributions
opl-0.022
ep-0.021
Kim-0.020
 stret-0.020
izer-0.020
Tokenbat
Token ID8664
Feature activation+4.31e-02

Pos logit contributions
opl+0.015
rob+0.015
izer+0.014
Kim+0.014
ep+0.014
Neg logit contributions
ived-0.009
uttered-0.009
bs-0.008
 disagreed-0.008
iked-0.008
Tokenhe
Token ID258
Feature activation-1.51e-02

Pos logit contributions
ived+0.077
uttered+0.073
bs+0.071
 disagreed+0.067
iked+0.067
Neg logit contributions
opl-0.073
izer-0.067
rob-0.067
Kim-0.067
ep-0.066
Token.
Token ID13
Feature activation-8.41e-04

Pos logit contributions
opl+0.029
ep+0.028
rob+0.028
izer+0.027
Kim+0.026
Neg logit contributions
ived-0.024
uttered-0.022
bs-0.021
iked-0.019
 disagreed-0.019
Token He
Token ID679
Feature activation+4.68e-03

Pos logit contributions
opl+0.002
Kim+0.001
izer+0.001
rob+0.001
ep+0.001
Neg logit contributions
ived-0.001
uttered-0.001
bs-0.001
iked-0.001
 disagreed-0.001
Token loved
Token ID6151
Feature activation-8.76e-03

Pos logit contributions
ived+0.007
uttered+0.007
iked+0.006
bs+0.006
ared+0.006
Neg logit contributions
ep-0.009
opl-0.009
rob-0.009
Kim-0.008
Wh-0.008
Token the
Token ID262
Feature activation+6.49e-03

Pos logit contributions
opl+0.024
rob+0.023
Kim+0.023
izer+0.022
ep+0.022
Neg logit contributions
ived-0.007
uttered-0.007
bs-0.006
iked-0.005
 disagreed-0.005
Token take
Token ID1011
Feature activation+6.55e-03

Pos logit contributions
ived+0.003
uttered+0.003
iked+0.002
ashed+0.002
bs+0.002
Neg logit contributions
opl-0.003
ep-0.003
Kim-0.003
izer-0.003
Wh-0.002
Token care
Token ID1337
Feature activation+9.46e-03

Pos logit contributions
uttered+0.009
ived+0.009
iked+0.009
ver+0.008
 disagreed+0.007
Neg logit contributions
opl-0.013
izer-0.013
ep-0.013
rob-0.013
 stret-0.012
Token of
Token ID286
Feature activation+1.68e-02

Pos logit contributions
uttered+0.012
ived+0.012
 disagreed+0.011
iked+0.011
 argued+0.010
Neg logit contributions
opl-0.019
ep-0.019
Kim-0.018
Grand-0.018
iny-0.018
Token ourselves
Token ID6731
Feature activation+1.93e-02

Pos logit contributions
ived+0.024
uttered+0.023
bs+0.021
iked+0.020
 disagreed+0.020
Neg logit contributions
opl-0.035
izer-0.033
Kim-0.033
rob-0.032
ep-0.032
Token.
Token ID13
Feature activation-2.84e-03

Pos logit contributions
uttered+0.039
ived+0.038
iked+0.036
ashed+0.034
ared+0.033
Neg logit contributions
ep-0.026
opl-0.024
 stret-0.022
co-0.022
iny-0.022
Token How
Token ID1374
Feature activation+4.85e-02

Pos logit contributions
opl+0.006
rob+0.005
izer+0.005
Kim+0.005
ep+0.005
Neg logit contributions
ived-0.004
uttered-0.004
bs-0.004
 neither-0.004
iked-0.004
Token about
Token ID546
Feature activation+1.83e-02

Pos logit contributions
ived+0.094
uttered+0.087
iked+0.084
ashed+0.080
als+0.078
Neg logit contributions
ep-0.071
opl-0.067
Kim-0.067
 wire-0.061
iny-0.059
Token I
Token ID314
Feature activation+1.44e-05

Pos logit contributions
ived+0.030
uttered+0.027
ashed+0.026
ared+0.025
iked+0.025
Neg logit contributions
opl-0.034
izer-0.033
ep-0.033
rob-0.033
Kim-0.033
Token tell
Token ID1560
Feature activation+1.13e-03

Pos logit contributions
ived+0.000
uttered+0.000
iked+0.000
ashed+0.000
ared+0.000
Neg logit contributions
opl-0.000
Kim-0.000
izer-0.000
coon-0.000
ep-0.000
Token you
Token ID345
Feature activation+7.52e-03

Pos logit contributions
uttered+0.002
ived+0.002
iked+0.002
ashed+0.002
ared+0.002
Neg logit contributions
opl-0.001
ep-0.001
coon-0.001
Kim-0.001
parent-0.001
Token a
Token ID257
Feature activation-5.01e-03

Pos logit contributions
ived+0.014
uttered+0.014
iked+0.013
ared+0.012
ashed+0.012
Neg logit contributions
opl-0.011
ep-0.011
coon-0.010
izer-0.010
 stret-0.010
Token
Token ID198
Feature activation+5.16e-03

Pos logit contributions
ived+0.003
uttered+0.003
bs+0.003
iked+0.003
 neither+0.003
Neg logit contributions
opl-0.004
izer-0.004
rob-0.003
ep-0.003
Kim-0.003
Token
Token ID198
Feature activation+6.64e-03

Pos logit contributions
s+0.006
ived+0.006
uttered+0.005
bs+0.005
t+0.005
Neg logit contributions
opl-0.012
izer-0.012
rob-0.012
coon-0.011
 stret-0.011
Token"
Token ID1
Feature activation-8.97e-03

Pos logit contributions
ived+0.012
s+0.011
 neither+0.010
uttered+0.010
 disagreed+0.010
Neg logit contributions
opl-0.012
rob-0.011
izer-0.011
coon-0.011
 stret-0.010
TokenOkay
Token ID16454
Feature activation+1.14e-02

Pos logit contributions
Kim+0.018
opl+0.017
Wh+0.017
rob+0.017
ep+0.016
Neg logit contributions
ived-0.013
uttered-0.012
bs-0.011
iked-0.011
ared-0.011
Token.
Token ID13
Feature activation+7.31e-03

Pos logit contributions
ived+0.017
uttered+0.016
bs+0.014
iked+0.014
ashed+0.013
Neg logit contributions
opl-0.023
Kim-0.022
izer-0.022
ep-0.022
rob-0.022
Token How
Token ID1374
Feature activation+4.47e-02

Pos logit contributions
ived+0.011
uttered+0.010
bs+0.009
 disagreed+0.009
iked+0.009
Neg logit contributions
opl-0.015
Kim-0.014
izer-0.014
Wh-0.013
rob-0.013
Token do
Token ID466
Feature activation+1.41e-02

Pos logit contributions
ived+0.074
uttered+0.068
iked+0.063
als+0.062
ashed+0.061
Neg logit contributions
opl-0.078
ep-0.077
Kim-0.075
izer-0.070
istle-0.067
Token we
Token ID356
Feature activation+2.73e-03

Pos logit contributions
ived+0.024
uttered+0.023
ashed+0.022
ared+0.020
iked+0.020
Neg logit contributions
Kim-0.026
rob-0.025
opl-0.025
ep-0.025
izer-0.024
Token play
Token ID711
Feature activation-2.06e-04

Pos logit contributions
ived+0.005
uttered+0.004
iked+0.004
ashed+0.004
ed+0.004
Neg logit contributions
opl-0.005
ep-0.005
Kim-0.004
izer-0.004
istle-0.004
Token?"
Token ID1701
Feature activation+8.97e-03

Pos logit contributions
opl+0.000
izer+0.000
Kim+0.000
rob+0.000
ep+0.000
Neg logit contributions
ived-0.000
uttered-0.000
bs-0.000
iked-0.000
 disagreed-0.000
Token Jen
Token ID13364
Feature activation+3.57e-03

Pos logit contributions
ived+0.018
uttered+0.018
bs+0.017
ared+0.016
ashed+0.016
Neg logit contributions
Kim-0.013
opl-0.013
ep-0.012
rob-0.011
Wh-0.011
Token said
Token ID531
Feature activation+1.38e-02

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.005
iked+0.005
ared+0.005
Neg logit contributions
ep-0.007
Kim-0.007
opl-0.007
rob-0.007
izer-0.007
Token,
Token ID11
Feature activation+1.06e-02

Pos logit contributions
ived+0.025
uttered+0.024
ared+0.022
 contain+0.022
iked+0.021
Neg logit contributions
Kim-0.018
coon-0.017
cel-0.017
ep-0.017
zer-0.017
Token "
Token ID366
Feature activation+7.22e-03

Pos logit contributions
ived+0.017
uttered+0.016
bs+0.016
 neither+0.015
iked+0.015
Neg logit contributions
opl-0.020
izer-0.018
rob-0.018
Kim-0.018
ep-0.018
TokenWell
Token ID5779
Feature activation+1.70e-02

Pos logit contributions
ived+0.011
uttered+0.010
bs+0.010
 neither+0.009
 disagreed+0.009
Neg logit contributions
opl-0.015
izer-0.014
rob-0.013
ep-0.013
Kim-0.013
Token,
Token ID11
Feature activation+2.45e-02

Pos logit contributions
ived+0.014
 neither+0.012
 disagreed+0.011
bs+0.011
uttered+0.010
Neg logit contributions
opl-0.046
izer-0.046
rob-0.046
Wh-0.044
Kim-0.043
Token how
Token ID703
Feature activation+4.88e-02

Pos logit contributions
ived+0.038
uttered+0.037
iked+0.033
bs+0.032
ared+0.032
Neg logit contributions
Kim-0.046
opl-0.045
Wh-0.044
izer-0.044
ep-0.044
Token about
Token ID546
Feature activation+2.16e-02

Pos logit contributions
ived+0.082
uttered+0.074
iked+0.069
ared+0.068
als+0.066
Neg logit contributions
opl-0.087
ep-0.087
Kim-0.086
izer-0.079
coon-0.077
Token we
Token ID356
Feature activation+8.51e-03

Pos logit contributions
ived+0.035
uttered+0.033
bs+0.031
iked+0.029
 disagreed+0.029
Neg logit contributions
opl-0.041
Kim-0.039
ep-0.038
izer-0.038
rob-0.038
Token make
Token ID787
Feature activation+2.74e-03

Pos logit contributions
ived+0.014
uttered+0.013
iked+0.012
bs+0.012
ared+0.011
Neg logit contributions
opl-0.016
Kim-0.015
ep-0.015
izer-0.014
Wh-0.014
Token a
Token ID257
Feature activation+1.33e-02

Pos logit contributions
ived+0.004
uttered+0.003
bs+0.003
iked+0.003
 disagreed+0.003
Neg logit contributions
opl-0.006
izer-0.006
Kim-0.006
rob-0.006
ep-0.006
Token game
Token ID983
Feature activation+6.82e-03

Pos logit contributions
ived+0.021
uttered+0.020
bs+0.019
 neither+0.018
 disagreed+0.018
Neg logit contributions
opl-0.025
rob-0.024
Kim-0.023
izer-0.023
ep-0.023
Token went
Token ID1816
Feature activation+1.42e-02

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
 disagreed+0.002
iked+0.002
Neg logit contributions
opl-0.005
izer-0.005
Kim-0.005
rob-0.005
ep-0.004
Token she
Token ID673
Feature activation+9.59e-03

Pos logit contributions
ived+0.023
uttered+0.022
bs+0.019
ashed+0.019
ared+0.018
Neg logit contributions
opl-0.027
Kim-0.026
rob-0.025
coon-0.023
izer-0.023
Token searched
Token ID16499
Feature activation+1.51e-02

Pos logit contributions
ived+0.012
uttered+0.011
bs+0.010
iked+0.009
 neither+0.009
Neg logit contributions
opl-0.022
rob-0.021
Kim-0.021
izer-0.021
ep-0.020
Token for
Token ID329
Feature activation+3.44e-03

Pos logit contributions
ived+0.025
uttered+0.023
iked+0.021
 neither+0.020
ared+0.020
Neg logit contributions
opl-0.028
ep-0.027
rob-0.026
izer-0.026
Kim-0.025
Token or
Token ID393
Feature activation+1.91e-02

Pos logit contributions
ived+0.006
uttered+0.006
ashed+0.006
bs+0.005
ver+0.005
Neg logit contributions
ep-0.006
opl-0.006
Kim-0.005
Wh-0.005
rob-0.005
Tokennam
Token ID7402
Feature activation+4.78e-02

Pos logit contributions
ived+0.027
uttered+0.024
iked+0.021
ashed+0.019
als+0.019
Neg logit contributions
ep-0.042
Kim-0.040
opl-0.040
Wh-0.040
rob-0.038
Tokenents
Token ID658
Feature activation+1.96e-02

Pos logit contributions
ived+0.058
uttered+0.056
bs+0.052
 disagreed+0.051
 neither+0.051
Neg logit contributions
opl-0.105
ep-0.101
Kim-0.098
izer-0.095
rob-0.093
Token,
Token ID11
Feature activation+3.85e-03

Pos logit contributions
ived+0.031
uttered+0.029
bs+0.026
iked+0.026
 disagreed+0.025
Neg logit contributions
opl-0.037
ep-0.034
Kim-0.033
rob-0.033
izer-0.033
Token and
Token ID290
Feature activation+9.29e-03

Pos logit contributions
ived+0.006
uttered+0.006
bs+0.005
ashed+0.005
iked+0.005
Neg logit contributions
opl-0.007
Kim-0.007
rob-0.007
ep-0.007
Wh-0.007
Token the
Token ID262
Feature activation+1.16e-02

Pos logit contributions
ived+0.015
uttered+0.014
ared+0.012
iked+0.012
bs+0.012
Neg logit contributions
Kim-0.018
opl-0.017
ep-0.017
Wh-0.017
rob-0.017
Token more
Token ID517
Feature activation+7.05e-03

Pos logit contributions
ived+0.021
uttered+0.020
bs+0.018
iked+0.018
ashed+0.017
Neg logit contributions
opl-0.021
Kim-0.019
ep-0.019
rob-0.019
izer-0.019
Token her
Token ID607
Feature activation-6.06e-03

Pos logit contributions
opl+0.023
Kim+0.022
rob+0.021
ep+0.021
Wh+0.021
Neg logit contributions
ived-0.019
uttered-0.018
ared-0.017
bs-0.016
 disagreed-0.016
Token shower
Token ID14643
Feature activation-7.39e-03

Pos logit contributions
opl+0.009
izer+0.009
Kim+0.009
 stret+0.009
parent+0.009
Neg logit contributions
ived-0.010
uttered-0.010
ared-0.009
ver-0.009
iked-0.009
Token,
Token ID11
Feature activation-2.86e-03

Pos logit contributions
opl+0.014
Kim+0.014
rob+0.014
izer+0.013
ep+0.013
Neg logit contributions
ived-0.010
uttered-0.010
bs-0.009
ared-0.009
 neither-0.009
Token she
Token ID673
Feature activation+7.39e-03

Pos logit contributions
Kim+0.005
Wh+0.005
opl+0.005
ep+0.005
Johnny+0.005
Neg logit contributions
ived-0.005
ared-0.004
ashed-0.004
uttered-0.004
 disagreed-0.004
Token went
Token ID1816
Feature activation+9.00e-03

Pos logit contributions
ived+0.011
uttered+0.011
iked+0.010
 neither+0.010
ared+0.010
Neg logit contributions
ep-0.014
Kim-0.012
 stret-0.012
opl-0.012
coon-0.011
Token back
Token ID736
Feature activation+1.20e-02

Pos logit contributions
ived+0.010
uttered+0.009
bs+0.009
 disagreed+0.008
 neither+0.008
Neg logit contributions
opl-0.021
Kim-0.020
izer-0.020
rob-0.020
ep-0.020
Token to
Token ID284
Feature activation+2.36e-03

Pos logit contributions
ived+0.015
uttered+0.013
 neither+0.012
 disagreed+0.012
bs+0.012
Neg logit contributions
opl-0.026
Kim-0.025
rob-0.025
izer-0.025
ep-0.025
Token her
Token ID607
Feature activation+7.90e-03

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.004
iked+0.004
ared+0.004
Neg logit contributions
opl-0.004
izer-0.004
Kim-0.004
ep-0.004
rob-0.004
Token bedroom
Token ID14043
Feature activation+8.35e-04

Pos logit contributions
ived+0.016
uttered+0.015
bs+0.014
iked+0.014
ver+0.014
Neg logit contributions
opl-0.012
izer-0.011
Kim-0.011
ep-0.011
rob-0.010
Token.
Token ID13
Feature activation-1.27e-02

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
 neither+0.001
 disagreed+0.001
Neg logit contributions
opl-0.001
ep-0.001
Kim-0.001
izer-0.001
rob-0.001
Token
Token ID198
Feature activation-1.07e-02

Pos logit contributions
opl+0.024
Kim+0.023
izer+0.022
Wh+0.021
rob+0.021
Neg logit contributions
ived-0.021
uttered-0.019
bs-0.018
 neither-0.018
 disagreed-0.018
Token was
Token ID373
Feature activation-6.36e-03

Pos logit contributions
opl+0.014
ep+0.013
rob+0.012
Kim+0.012
izer+0.012
Neg logit contributions
ived-0.011
uttered-0.010
bs-0.009
iked-0.009
 neither-0.009
Token sure
Token ID1654
Feature activation+1.97e-02

Pos logit contributions
opl+0.012
ep+0.011
Kim+0.011
izer+0.010
rob+0.010
Neg logit contributions
ived-0.011
uttered-0.010
bs-0.010
iked-0.010
als-0.009
Token that
Token ID326
Feature activation+1.08e-02

Pos logit contributions
ived+0.031
ashed+0.028
uttered+0.028
ashes+0.026
iked+0.024
Neg logit contributions
opl-0.033
Kim-0.033
ep-0.031
istle-0.030
wen-0.030
Token she
Token ID673
Feature activation+2.51e-03

Pos logit contributions
ived+0.018
uttered+0.016
ashed+0.015
ared+0.015
iked+0.015
Neg logit contributions
opl-0.020
rob-0.018
Kim-0.018
 Tooth-0.017
ep-0.017
Token would
Token ID561
Feature activation+2.67e-03

Pos logit contributions
ived+0.004
uttered+0.003
bs+0.003
iked+0.003
 neither+0.003
Neg logit contributions
opl-0.005
ep-0.005
rob-0.005
Kim-0.005
izer-0.005
Token never
Token ID1239
Feature activation+2.30e-02

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.004
iked+0.004
 disagreed+0.004
Neg logit contributions
opl-0.005
Kim-0.005
izer-0.005
rob-0.005
ep-0.005
Token forget
Token ID6044
Feature activation+5.35e-03

Pos logit contributions
uttered+0.040
ived+0.036
iked+0.036
apped+0.034
bs+0.032
Neg logit contributions
opl-0.043
istle-0.039
Kim-0.038
Wh-0.036
coon-0.036
Token it
Token ID340
Feature activation-1.65e-02

Pos logit contributions
ived+0.008
uttered+0.008
bs+0.007
iked+0.007
ashed+0.007
Neg logit contributions
opl-0.011
Kim-0.010
izer-0.010
rob-0.010
ep-0.010
Token!
Token ID0
Feature activation-6.47e-03

Pos logit contributions
opl+0.028
Kim+0.025
izer+0.025
rob+0.025
ep+0.024
Neg logit contributions
ived-0.030
uttered-0.029
bs-0.027
iked-0.026
 disagreed-0.026
Token<|endoftext|>
Token ID50256
Feature activation+1.17e-02

Pos logit contributions
opl+0.012
Kim+0.011
izer+0.011
rob+0.011
ep+0.010
Neg logit contributions
ived-0.011
uttered-0.010
bs-0.010
 disagreed-0.009
iked-0.009
TokenOnce
Token ID7454
Feature activation-8.50e-04

Pos logit contributions
ived+0.018
uttered+0.017
bs+0.016
iked+0.015
 disagreed+0.015
Neg logit contributions
opl-0.023
izer-0.021
Kim-0.021
rob-0.021
ep-0.021
Token birthday
Token ID10955
Feature activation+8.30e-04

Pos logit contributions
opl+0.005
izer+0.005
ep+0.005
Kim+0.004
coon+0.004
Neg logit contributions
ived-0.004
uttered-0.004
bs-0.004
iked-0.004
ver-0.004
Token.
Token ID13
Feature activation-5.41e-03

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
iked+0.001
 neither+0.001
Neg logit contributions
opl-0.001
Kim-0.001
izer-0.001
rob-0.001
ep-0.001
Token You
Token ID921
Feature activation+1.75e-02

Pos logit contributions
opl+0.012
rob+0.012
ep+0.012
izer+0.011
istle+0.011
Neg logit contributions
ived-0.006
uttered-0.006
bs-0.006
s-0.005
 neither-0.005
Token ruined
Token ID20484
Feature activation-1.56e-02

Pos logit contributions
ived+0.030
iked+0.029
uttered+0.027
 goodbye+0.023
ashed+0.023
Neg logit contributions
ep-0.031
coon-0.027
Kim-0.027
opl-0.027
Wh-0.025
Token it
Token ID340
Feature activation-4.31e-03

Pos logit contributions
opl+0.034
rob+0.032
Kim+0.032
izer+0.032
ep+0.031
Neg logit contributions
ived-0.020
uttered-0.019
bs-0.018
 neither-0.017
 disagreed-0.017
Token!"
Token ID2474
Feature activation+1.19e-02

Pos logit contributions
opl+0.008
ep+0.007
izer+0.007
Kim+0.007
rob+0.007
Neg logit contributions
ived-0.007
uttered-0.007
bs-0.007
iked-0.007
ashed-0.006
Token
Token ID198
Feature activation-6.49e-03

Pos logit contributions
ived+0.019
uttered+0.018
bs+0.017
iked+0.016
 disagreed+0.016
Neg logit contributions
opl-0.022
izer-0.021
Kim-0.021
rob-0.021
ep-0.020
Token
Token ID198
Feature activation-2.99e-03

Pos logit contributions
opl+0.016
rob+0.016
izer+0.015
et+0.015
ep+0.015
Neg logit contributions
s-0.006
ived-0.006
bs-0.005
uttered-0.005
 of-0.004
TokenS
Token ID50
Feature activation+4.61e-03

Pos logit contributions
opl+0.005
rob+0.005
izer+0.005
et+0.004
cel+0.004
Neg logit contributions
ived-0.005
bs-0.005
s-0.005
uttered-0.004
 neither-0.004
Tokenara
Token ID3301
Feature activation+1.12e-02

Pos logit contributions
ived+0.008
uttered+0.008
bs+0.007
 disagreed+0.007
iked+0.007
Neg logit contributions
opl-0.008
rob-0.008
izer-0.007
Kim-0.007
ep-0.007
Token and
Token ID290
Feature activation-1.01e-02

Pos logit contributions
ived+0.020
uttered+0.019
iked+0.018
bs+0.018
 plunge+0.018
Neg logit contributions
ep-0.018
opl-0.018
Kim-0.017
izer-0.016
coon-0.016
Token first
Token ID717
Feature activation+6.57e-03

Pos logit contributions
iked+0.034
uttered+0.034
ver+0.033
ived+0.033
ared+0.032
Neg logit contributions
opl-0.019
ep-0.019
izer-0.018
coon-0.018
 Tooth-0.017
Token bee
Token ID20697
Feature activation+4.35e-03

Pos logit contributions
uttered+0.012
ver+0.012
iked+0.012
nam+0.012
ived+0.012
Neg logit contributions
izer-0.009
opl-0.008
et-0.008
ep-0.008
coon-0.008
Token.
Token ID13
Feature activation+1.63e-03

Pos logit contributions
ived+0.006
uttered+0.006
iked+0.006
 neither+0.006
bs+0.005
Neg logit contributions
opl-0.008
coon-0.008
ep-0.008
rob-0.007
izer-0.007
Token So
Token ID1406
Feature activation+6.58e-03

Pos logit contributions
ived+0.003
uttered+0.003
bs+0.002
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.003
Kim-0.003
izer-0.003
rob-0.003
Wh-0.003
Token,
Token ID11
Feature activation+1.35e-02

Pos logit contributions
ived+0.013
ashed+0.012
uttered+0.012
ared+0.012
iked+0.012
Neg logit contributions
opl-0.008
Kim-0.008
izer-0.008
ep-0.008
 track-0.007
Token the
Token ID262
Feature activation+1.95e-02

Pos logit contributions
ived+0.030
ared+0.029
ashed+0.028
uttered+0.028
iked+0.027
Neg logit contributions
opl-0.013
Now-0.013
Kim-0.012
ep-0.012
izer-0.012
Token first
Token ID717
Feature activation-2.11e-04

Pos logit contributions
iked+0.041
uttered+0.040
ived+0.040
ared+0.039
bs+0.037
Neg logit contributions
ep-0.025
opl-0.024
coon-0.023
izer-0.023
 Tooth-0.022
Token bee
Token ID20697
Feature activation+3.82e-03

Pos logit contributions
izer+0.000
opl+0.000
ep+0.000
et+0.000
coon+0.000
Neg logit contributions
nam-0.000
iked-0.000
uttered-0.000
ived-0.000
ver-0.000
Token invited
Token ID9392
Feature activation+7.32e-03

Pos logit contributions
ived+0.006
uttered+0.005
iked+0.005
bs+0.005
 neither+0.005
Neg logit contributions
opl-0.007
ep-0.007
izer-0.007
coon-0.007
Kim-0.007
Token the
Token ID262
Feature activation+6.44e-03

Pos logit contributions
ived+0.014
uttered+0.014
bs+0.014
ashed+0.014
ared+0.013
Neg logit contributions
opl-0.009
izer-0.009
coon-0.009
ep-0.009
istle-0.008
Token other
Token ID584
Feature activation+1.44e-02

Pos logit contributions
uttered+0.013
iked+0.012
ived+0.012
ared+0.012
bs+0.012
Neg logit contributions
opl-0.009
ep-0.009
coon-0.008
izer-0.008
 Tooth-0.008
Token ball
Token ID2613
Feature activation+2.56e-03

Pos logit contributions
ived+0.024
uttered+0.023
bs+0.022
iked+0.021
 disagreed+0.021
Neg logit contributions
opl-0.029
Kim-0.027
izer-0.027
rob-0.026
ep-0.026
Token only
Token ID691
Feature activation+3.80e-03

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.003
iked+0.003
 married+0.003
Neg logit contributions
opl-0.005
ep-0.005
Kim-0.005
coon-0.004
izer-0.004
Token moved
Token ID3888
Feature activation+8.53e-03

Pos logit contributions
ived+0.006
uttered+0.006
iked+0.006
bs+0.005
 plunge+0.005
Neg logit contributions
ep-0.007
et-0.006
opl-0.006
Kim-0.006
rob-0.006
Token a
Token ID257
Feature activation-2.78e-03

Pos logit contributions
ived+0.012
uttered+0.011
bs+0.011
iked+0.010
 disagreed+0.010
Neg logit contributions
opl-0.018
izer-0.017
Kim-0.017
rob-0.017
ep-0.016
Token little
Token ID1310
Feature activation+1.80e-02

Pos logit contributions
opl+0.006
izer+0.006
Kim+0.006
rob+0.006
ep+0.006
Neg logit contributions
ived-0.003
uttered-0.003
bs-0.003
iked-0.003
 disagreed-0.003
Token bit
Token ID1643
Feature activation+1.15e-02

Pos logit contributions
uttered+0.024
ived+0.023
iked+0.022
bs+0.021
ashed+0.020
Neg logit contributions
opl-0.037
izer-0.036
ep-0.034
Kim-0.034
 stret-0.034
Token.
Token ID13
Feature activation-1.23e-02

Pos logit contributions
uttered+0.018
ived+0.017
iked+0.017
ver+0.016
bs+0.015
Neg logit contributions
opl-0.020
ep-0.019
Kim-0.018
izer-0.018
coon-0.017
Token Tommy
Token ID19919
Feature activation-1.17e-03

Pos logit contributions
opl+0.024
rob+0.022
izer+0.022
Kim+0.022
ep+0.022
Neg logit contributions
ived-0.019
uttered-0.018
bs-0.017
iked-0.016
 neither-0.016
Token was
Token ID373
Feature activation-3.95e-03

Pos logit contributions
ep+0.002
opl+0.002
Kim+0.002
rob+0.002
izer+0.002
Neg logit contributions
ived-0.002
uttered-0.002
iked-0.002
bs-0.002
 plunge-0.001
Token disappointed
Token ID11679
Feature activation-9.93e-04

Pos logit contributions
opl+0.007
ep+0.007
Kim+0.006
izer+0.006
coon+0.006
Neg logit contributions
uttered-0.007
ived-0.007
iked-0.006
bs-0.006
als-0.006
Token his
Token ID465
Feature activation-9.27e-03

Pos logit contributions
opl+0.002
Kim+0.002
izer+0.002
ep+0.002
rob+0.002
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.001
iked-0.001
ashed-0.001
Token He
Token ID679
Feature activation-1.72e-03

Pos logit contributions
opl+0.026
izer+0.024
rob+0.024
Kim+0.024
ep+0.023
Neg logit contributions
ived-0.023
uttered-0.021
bs-0.020
iked-0.020
 disagreed-0.020
Token was
Token ID373
Feature activation-5.94e-03

Pos logit contributions
ep+0.003
Kim+0.003
opl+0.003
rob+0.003
Wh+0.003
Neg logit contributions
ived-0.003
uttered-0.003
iked-0.002
bs-0.002
 plunge-0.002
Token so
Token ID523
Feature activation-5.20e-03

Pos logit contributions
opl+0.010
ep+0.010
Kim+0.009
izer+0.009
 stret+0.009
Neg logit contributions
ived-0.010
uttered-0.010
bs-0.010
als-0.010
iked-0.010
Token sad
Token ID6507
Feature activation+2.33e-04

Pos logit contributions
opl+0.009
ep+0.009
Kim+0.008
Grand+0.007
 stret+0.007
Neg logit contributions
als-0.009
ived-0.008
uttered-0.008
irds-0.008
rings-0.008
Token that
Token ID326
Feature activation-1.46e-02

Pos logit contributions
ived+0.000
uttered+0.000
bs+0.000
iked+0.000
 disagreed+0.000
Neg logit contributions
opl-0.000
Kim-0.000
rob-0.000
izer-0.000
ep-0.000
Token he
Token ID339
Feature activation-6.15e-03

Pos logit contributions
opl+0.031
Kim+0.029
rob+0.028
ep+0.028
izer+0.027
Neg logit contributions
ived-0.022
uttered-0.021
bs-0.018
ashed-0.018
iked-0.018
Token did
Token ID750
Feature activation+6.03e-03

Pos logit contributions
ep+0.012
opl+0.012
Kim+0.012
coon+0.011
Wh+0.011
Neg logit contributions
ived-0.009
uttered-0.009
iked-0.008
bs-0.007
ver-0.007
Token not
Token ID407
Feature activation+3.66e-03

Pos logit contributions
ived+0.012
uttered+0.012
als+0.011
ashed+0.011
bs+0.011
Neg logit contributions
opl-0.009
ep-0.009
Kim-0.008
izer-0.008
Wh-0.008
Token fit
Token ID4197
Feature activation-8.32e-04

Pos logit contributions
uttered+0.005
ived+0.005
iked+0.004
als+0.004
s+0.004
Neg logit contributions
opl-0.008
Kim-0.008
ep-0.007
izer-0.007
coon-0.007
Token into
Token ID656
Feature activation+1.31e-03

Pos logit contributions
opl+0.002
Kim+0.002
izer+0.002
rob+0.002
ep+0.002
Neg logit contributions
ived-0.001
uttered-0.001
bs-0.001
iked-0.001
 disagreed-0.001
Token the
Token ID262
Feature activation-9.28e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
 disagreed+0.002
Neg logit contributions
opl-0.003
Kim-0.003
izer-0.003
rob-0.003
ep-0.003
Token
Token ID198
Feature activation+3.08e-03

Pos logit contributions
ived+0.004
uttered+0.004
s+0.004
bs+0.003
ver+0.003
Neg logit contributions
opl-0.007
rob-0.007
izer-0.006
coon-0.006
 stret-0.006
Token"
Token ID1
Feature activation-7.44e-03

Pos logit contributions
ived+0.006
uttered+0.006
s+0.005
bs+0.005
iked+0.005
Neg logit contributions
rob-0.005
opl-0.005
izer-0.004
coon-0.004
istle-0.004
TokenI
Token ID40
Feature activation-1.14e-03

Pos logit contributions
opl+0.013
Kim+0.012
rob+0.012
izer+0.012
ep+0.011
Neg logit contributions
ived-0.013
uttered-0.013
bs-0.012
iked-0.012
 disagreed-0.012
Token don
Token ID836
Feature activation-6.57e-03

Pos logit contributions
opl+0.002
ep+0.002
Kim+0.002
izer+0.002
rob+0.002
Neg logit contributions
ived-0.001
uttered-0.001
iked-0.001
als-0.001
 none-0.001
Token't
Token ID470
Feature activation-8.75e-03

Pos logit contributions
opl+0.016
izer+0.015
rob+0.015
Kim+0.015
ep+0.015
Neg logit contributions
ived-0.007
uttered-0.006
bs-0.006
 neither-0.005
s-0.005
Token know
Token ID760
Feature activation+6.82e-04

Pos logit contributions
opl+0.023
izer+0.022
Kim+0.022
rob+0.022
ep+0.021
Neg logit contributions
ived-0.008
uttered-0.007
bs-0.006
iked-0.006
 neither-0.006
Token,"
Token ID553
Feature activation-4.77e-03

Pos logit contributions
ived+0.001
uttered+0.001
ashed+0.001
iked+0.001
bs+0.001
Neg logit contributions
opl-0.001
ep-0.001
Kim-0.001
coon-0.001
rob-0.001
Token said
Token ID531
Feature activation+6.75e-03

Pos logit contributions
opl+0.009
Kim+0.009
izer+0.009
rob+0.009
ep+0.009
Neg logit contributions
ived-0.008
uttered-0.007
bs-0.007
iked-0.006
ashed-0.006
Token mom
Token ID1995
Feature activation-6.66e-03

Pos logit contributions
ived+0.015
uttered+0.014
bs+0.013
iked+0.013
ashed+0.013
Neg logit contributions
opl-0.009
Kim-0.008
izer-0.008
ep-0.008
rob-0.008
Token.
Token ID13
Feature activation-4.05e-03

Pos logit contributions
opl+0.014
rob+0.014
izer+0.014
Kim+0.013
ep+0.013
Neg logit contributions
ived-0.009
uttered-0.008
bs-0.008
 disagreed-0.008
 neither-0.008
Token "
Token ID366
Feature activation+2.33e-03

Pos logit contributions
opl+0.009
rob+0.009
izer+0.008
ep+0.008
Kim+0.008
Neg logit contributions
ived-0.006
uttered-0.005
 disagreed-0.005
 neither-0.005
s-0.005
Token,
Token ID11
Feature activation-1.86e-03

Pos logit contributions
ived+0.026
uttered+0.024
iked+0.022
 plunge+0.022
ashed+0.022
Neg logit contributions
opl-0.020
Kim-0.019
coon-0.018
ep-0.017
et-0.017
Token "
Token ID366
Feature activation+5.34e-03

Pos logit contributions
opl+0.003
Kim+0.003
izer+0.003
rob+0.003
ep+0.003
Neg logit contributions
ived-0.003
uttered-0.003
bs-0.003
iked-0.003
 disagreed-0.003
TokenI
Token ID40
Feature activation+4.38e-03

Pos logit contributions
ived+0.010
uttered+0.010
bs+0.009
iked+0.009
 disagreed+0.009
Neg logit contributions
opl-0.009
Kim-0.008
izer-0.008
rob-0.008
ep-0.008
Token just
Token ID655
Feature activation+8.03e-03

Pos logit contributions
ived+0.007
iked+0.007
uttered+0.006
als+0.006
nam+0.006
Neg logit contributions
opl-0.008
ep-0.008
Kim-0.008
izer-0.007
istle-0.007
Token wanted
Token ID2227
Feature activation-1.74e-03

Pos logit contributions
iked+0.012
ived+0.012
uttered+0.012
s+0.011
ls+0.011
Neg logit contributions
opl-0.015
ep-0.014
et-0.013
coon-0.013
istle-0.013
Token to
Token ID284
Feature activation+2.28e-04

Pos logit contributions
opl+0.004
Kim+0.004
ep+0.003
coon+0.003
izer+0.003
Neg logit contributions
uttered-0.002
ived-0.002
iked-0.002
ver-0.002
ashed-0.002
Token see
Token ID766
Feature activation+2.08e-03

Pos logit contributions
ived+0.000
uttered+0.000
iked+0.000
bs+0.000
ashed+0.000
Neg logit contributions
opl-0.000
izer-0.000
Kim-0.000
ep-0.000
istle-0.000
Token how
Token ID703
Feature activation+2.05e-02

Pos logit contributions
ived+0.003
uttered+0.003
iked+0.003
bs+0.003
 disagreed+0.003
Neg logit contributions
opl-0.004
izer-0.004
rob-0.004
Kim-0.004
ep-0.004
Token it
Token ID340
Feature activation+7.24e-04

Pos logit contributions
ived+0.034
iked+0.033
ver+0.032
uttered+0.032
ared+0.030
Neg logit contributions
opl-0.033
Kim-0.033
ep-0.031
Wh-0.028
Johnny-0.028
Token feels
Token ID5300
Feature activation+3.98e-03

Pos logit contributions
iked+0.001
ived+0.001
uttered+0.001
bs+0.001
ared+0.001
Neg logit contributions
ep-0.001
opl-0.001
rob-0.001
et-0.001
istle-0.001
Token.
Token ID13
Feature activation-2.23e-03

Pos logit contributions
uttered+0.008
ived+0.007
iked+0.007
ared+0.007
apped+0.006
Neg logit contributions
opl-0.006
ep-0.006
izer-0.005
coon-0.005
Kim-0.005
Token girl
Token ID2576
Feature activation-1.88e-03

Pos logit contributions
ived+0.009
uttered+0.009
bs+0.008
iked+0.008
 neither+0.008
Neg logit contributions
opl-0.008
izer-0.007
Kim-0.007
rob-0.007
ep-0.007
Token's
Token ID338
Feature activation+1.09e-02

Pos logit contributions
opl+0.003
rob+0.003
izer+0.003
Kim+0.003
coon+0.003
Neg logit contributions
ived-0.003
bs-0.003
uttered-0.003
s-0.003
 disagreed-0.003
Token room
Token ID2119
Feature activation-9.51e-04

Pos logit contributions
ived+0.019
uttered+0.019
bs+0.018
iked+0.017
 neither+0.017
Neg logit contributions
opl-0.019
rob-0.018
Kim-0.017
izer-0.017
coon-0.017
Token.
Token ID13
Feature activation+1.03e-02

Pos logit contributions
opl+0.002
Kim+0.002
izer+0.002
ep+0.001
rob+0.001
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
 disagreed-0.001
iked-0.001
Token The
Token ID383
Feature activation+1.91e-02

Pos logit contributions
ived+0.021
uttered+0.019
bs+0.018
 neither+0.018
iked+0.017
Neg logit contributions
opl-0.016
Kim-0.016
izer-0.014
Wh-0.014
ep-0.013
Token wardrobe
Token ID35593
Feature activation+9.38e-03

Pos logit contributions
ived+0.026
uttered+0.025
bs+0.024
 disagreed+0.022
iked+0.022
Neg logit contributions
opl-0.041
izer-0.038
rob-0.038
Kim-0.037
ep-0.037
Token was
Token ID373
Feature activation+5.49e-03

Pos logit contributions
ived+0.019
bs+0.016
uttered+0.016
 plunge+0.016
iked+0.015
Neg logit contributions
ep-0.014
opl-0.014
Kim-0.014
rob-0.013
coon-0.012
Token very
Token ID845
Feature activation-1.44e-04

Pos logit contributions
ived+0.009
uttered+0.008
bs+0.008
 disagreed+0.007
 neither+0.007
Neg logit contributions
opl-0.011
rob-0.010
izer-0.010
Kim-0.010
ep-0.010
Token dull
Token ID19222
Feature activation-1.38e-04

Pos logit contributions
opl+0.000
Kim+0.000
ep+0.000
izer+0.000
rob+0.000
Neg logit contributions
ived-0.000
uttered-0.000
bs-0.000
iked-0.000
 neither-0.000
Token and
Token ID290
Feature activation+4.18e-03

Pos logit contributions
opl+0.000
izer+0.000
Kim+0.000
rob+0.000
ep+0.000
Neg logit contributions
ived-0.000
uttered-0.000
bs-0.000
iked-0.000
 disagreed-0.000
Token old
Token ID1468
Feature activation+7.05e-03

Pos logit contributions
ived+0.007
uttered+0.006
iked+0.006
ared+0.006
bs+0.005
Neg logit contributions
opl-0.008
Kim-0.008
ep-0.007
rob-0.007
Wh-0.007
Token some
Token ID617
Feature activation+1.11e-02

Pos logit contributions
ep+0.004
opl+0.004
rob+0.004
Kim+0.004
izer+0.003
Neg logit contributions
ived-0.003
uttered-0.003
iked-0.003
ared-0.003
ashed-0.003
Token milk
Token ID7545
Feature activation+9.20e-03

Pos logit contributions
ived+0.021
uttered+0.021
ared+0.019
iked+0.018
ashed+0.018
Neg logit contributions
opl-0.018
ep-0.017
Kim-0.016
izer-0.016
rob-0.016
Token and
Token ID290
Feature activation+1.80e-02

Pos logit contributions
ived+0.016
uttered+0.015
iked+0.013
bs+0.013
ared+0.013
Neg logit contributions
opl-0.016
Kim-0.015
rob-0.015
izer-0.015
ep-0.015
Token butter
Token ID9215
Feature activation+1.36e-02

Pos logit contributions
ived+0.035
uttered+0.034
ared+0.031
iked+0.030
ver+0.029
Neg logit contributions
Kim-0.029
ep-0.028
opl-0.028
 stret-0.027
izer-0.026
Token.
Token ID13
Feature activation-1.68e-03

Pos logit contributions
ived+0.022
uttered+0.020
bs+0.019
iked+0.018
 disagreed+0.018
Neg logit contributions
opl-0.026
izer-0.024
rob-0.024
Kim-0.024
ep-0.024
Token Put
Token ID5930
Feature activation-1.49e-02

Pos logit contributions
opl+0.004
rob+0.003
izer+0.003
ep+0.003
Kim+0.003
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
 neither-0.002
iked-0.002
Token on
Token ID319
Feature activation-8.28e-03

Pos logit contributions
opl+0.032
izer+0.031
rob+0.031
ep+0.031
Kim+0.031
Neg logit contributions
ived-0.020
uttered-0.018
iked-0.016
bs-0.016
 disagreed-0.016
Token your
Token ID534
Feature activation+5.78e-03

Pos logit contributions
opl+0.017
Kim+0.016
izer+0.016
rob+0.016
ep+0.016
Neg logit contributions
ived-0.012
uttered-0.011
bs-0.010
iked-0.010
 disagreed-0.010
Token coats
Token ID30720
Feature activation-8.35e-03

Pos logit contributions
ived+0.009
uttered+0.008
bs+0.008
iked+0.008
 neither+0.008
Neg logit contributions
opl-0.011
izer-0.011
Kim-0.010
rob-0.010
ep-0.010
Token,
Token ID11
Feature activation-4.16e-03

Pos logit contributions
opl+0.012
ep+0.012
Kim+0.012
izer+0.012
rob+0.011
Neg logit contributions
ived-0.016
uttered-0.016
bs-0.014
iked-0.014
 disagreed-0.014
Token kids
Token ID3988
Feature activation-1.26e-03

Pos logit contributions
Kim+0.007
opl+0.007
ep+0.007
izer+0.007
 stret+0.007
Neg logit contributions
ived-0.007
uttered-0.007
iked-0.006
ared-0.006
ver-0.006
Token said
Token ID531
Feature activation-1.20e-02

Pos logit contributions
opl+0.033
Kim+0.032
rob+0.031
izer+0.030
ep+0.030
Neg logit contributions
ived-0.026
uttered-0.024
bs-0.022
iked-0.022
ared-0.021
Token,
Token ID11
Feature activation-8.50e-03

Pos logit contributions
opl+0.020
Kim+0.020
ep+0.018
rob+0.018
izer+0.018
Neg logit contributions
ived-0.018
ared-0.017
uttered-0.017
ashed-0.017
 disagreed-0.016
Token "
Token ID366
Feature activation-1.06e-02

Pos logit contributions
opl+0.016
rob+0.015
izer+0.015
Kim+0.015
ep+0.014
Neg logit contributions
ived-0.014
uttered-0.013
bs-0.012
iked-0.012
 neither-0.012
TokenThank
Token ID10449
Feature activation-8.71e-03

Pos logit contributions
opl+0.022
izer+0.021
 stret+0.020
rob+0.020
ep+0.020
Neg logit contributions
ived-0.014
uttered-0.013
als-0.013
 neither-0.012
bs-0.012
Token you
Token ID345
Feature activation-7.55e-03

Pos logit contributions
opl+0.024
izer+0.022
ep+0.022
 stret+0.022
rob+0.022
Neg logit contributions
ived-0.006
bs-0.005
als-0.005
uttered-0.005
 neither-0.005
Token for
Token ID329
Feature activation-3.22e-02

Pos logit contributions
opl+0.019
rob+0.018
izer+0.018
ep+0.017
Kim+0.017
Neg logit contributions
ived-0.008
uttered-0.007
bs-0.007
 neither-0.007
s-0.006
Token helping
Token ID5742
Feature activation-6.08e-03

Pos logit contributions
opl+0.072
rob+0.069
izer+0.065
istle+0.063
Kim+0.062
Neg logit contributions
 married-0.037
ived-0.034
 to-0.034
 everyone-0.033
 as-0.033
Token me
Token ID502
Feature activation+1.70e-03

Pos logit contributions
opl+0.009
rob+0.008
izer+0.008
Kim+0.008
ep+0.007
Neg logit contributions
ived-0.013
uttered-0.012
bs-0.012
 neither-0.011
 disagreed-0.011
Token fix
Token ID4259
Feature activation-1.27e-02

Pos logit contributions
ived+0.003
uttered+0.003
ashed+0.003
bs+0.003
iked+0.003
Neg logit contributions
opl-0.003
ep-0.003
izer-0.003
Kim-0.003
coon-0.003
Token my
Token ID616
Feature activation-2.16e-03

Pos logit contributions
opl+0.029
Kim+0.028
izer+0.028
rob+0.027
Wh+0.027
Neg logit contributions
ived-0.014
 neither-0.012
bs-0.012
uttered-0.012
s-0.012
Token painting
Token ID12036
Feature activation-1.38e-02

Pos logit contributions
opl+0.004
izer+0.003
Kim+0.003
ep+0.003
rob+0.003
Neg logit contributions
ived-0.004
uttered-0.004
bs-0.004
iked-0.004
 neither-0.004
Token never
Token ID1239
Feature activation+1.57e-02

Pos logit contributions
ived+0.005
uttered+0.004
bs+0.004
ared+0.004
ver+0.004
Neg logit contributions
Kim-0.008
opl-0.008
ep-0.008
istle-0.008
izer-0.007
Token ended
Token ID4444
Feature activation+3.02e-03

Pos logit contributions
uttered+0.024
ived+0.023
bs+0.022
iked+0.022
apy+0.021
Neg logit contributions
opl-0.025
istle-0.024
Kim-0.024
coon-0.023
ep-0.023
Token!
Token ID0
Feature activation-4.56e-03

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.004
iked+0.004
 disagreed+0.004
Neg logit contributions
opl-0.006
izer-0.006
Kim-0.006
ep-0.006
rob-0.006
Token
Token ID198
Feature activation-8.34e-03

Pos logit contributions
rob+0.010
opl+0.010
ep+0.010
izer+0.010
coon+0.009
Neg logit contributions
uttered-0.006
ived-0.006
bs-0.005
 neither-0.005
s-0.005
Token
Token ID198
Feature activation-4.35e-03

Pos logit contributions
opl+0.018
rob+0.018
izer+0.017
ep+0.016
Kim+0.016
Neg logit contributions
uttered-0.011
ived-0.010
bs-0.010
s-0.009
iked-0.008
TokenSu
Token ID5606
Feature activation-3.32e-02

Pos logit contributions
opl+0.008
rob+0.007
izer+0.007
 stret+0.006
coon+0.006
Neg logit contributions
ived-0.008
uttered-0.008
bs-0.007
s-0.007
 neither-0.007
Tokenzie
Token ID49746
Feature activation-1.87e-02

Pos logit contributions
rob+0.052
 stret+0.050
Wh+0.048
izer+0.047
ep+0.045
Neg logit contributions
ived-0.061
uttered-0.061
ared-0.058
 disagreed-0.055
owed-0.054
Token was
Token ID373
Feature activation-5.80e-03

Pos logit contributions
opl+0.033
ep+0.031
Kim+0.030
izer+0.030
rob+0.029
Neg logit contributions
ived-0.032
uttered-0.031
bs-0.029
iked-0.029
 neither-0.028
Token sad
Token ID6507
Feature activation+7.31e-03

Pos logit contributions
opl+0.010
ep+0.010
Kim+0.009
izer+0.009
Wh+0.008
Neg logit contributions
ived-0.010
uttered-0.009
bs-0.009
iked-0.009
als-0.009
Token that
Token ID326
Feature activation+2.16e-03

Pos logit contributions
ived+0.013
uttered+0.013
iked+0.012
bs+0.012
 plunge+0.012
Neg logit contributions
opl-0.011
Kim-0.010
izer-0.009
coon-0.009
 stret-0.009
Token she
Token ID673
Feature activation-5.23e-03

Pos logit contributions
ived+0.004
uttered+0.003
ashed+0.003
ared+0.003
iked+0.003
Neg logit contributions
opl-0.004
Kim-0.004
parent-0.003
Wh-0.003
izer-0.003
Token end
Token ID886
Feature activation-1.73e-02

Pos logit contributions
ived+0.007
uttered+0.007
bs+0.006
 disagreed+0.006
 neither+0.006
Neg logit contributions
opl-0.008
rob-0.008
izer-0.008
Kim-0.008
ep-0.008
Token.
Token ID13
Feature activation-1.71e-02

Pos logit contributions
rob+0.043
opl+0.042
izer+0.042
Wh+0.040
Kim+0.040
Neg logit contributions
s-0.018
bs-0.017
ived-0.015
 of-0.015
 to-0.015
Token<|endoftext|>
Token ID50256
Feature activation+6.30e-03

Pos logit contributions
opl+0.041
ep+0.041
 stret+0.040
coon+0.040
rob+0.039
Neg logit contributions
s-0.017
uttered-0.016
bs-0.016
ived-0.015
 neither-0.014
TokenOnce
Token ID7454
Feature activation-1.57e-03

Pos logit contributions
ived+0.007
uttered+0.007
s+0.006
ver+0.006
bs+0.006
Neg logit contributions
opl-0.015
ep-0.014
izer-0.014
 stret-0.013
Kim-0.013
Token upon
Token ID2402
Feature activation+7.13e-04

Pos logit contributions
opl+0.004
Kim+0.004
rob+0.004
izer+0.004
ep+0.004
Neg logit contributions
ived-0.001
uttered-0.001
bs-0.001
iked-0.001
 disagreed-0.001
Token a
Token ID257
Feature activation-2.84e-02

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
 disagreed+0.001
 neither+0.001
Neg logit contributions
opl-0.002
rob-0.002
izer-0.002
Kim-0.002
ep-0.001
Token time
Token ID640
Feature activation-1.08e-02

Pos logit contributions
opl+0.067
rob+0.065
Kim+0.064
coon+0.063
izer+0.061
Neg logit contributions
ived-0.031
bs-0.029
uttered-0.027
 disagreed-0.027
 while-0.026
Token,
Token ID11
Feature activation+1.94e-03

Pos logit contributions
opl+0.023
Kim+0.021
rob+0.021
izer+0.021
ep+0.020
Neg logit contributions
ived-0.015
uttered-0.014
bs-0.013
als-0.012
 disagreed-0.012
Token there
Token ID612
Feature activation+7.09e-03

Pos logit contributions
ived+0.002
uttered+0.002
bs+0.002
iked+0.002
als+0.002
Neg logit contributions
Kim-0.005
opl-0.005
ep-0.004
izer-0.004
rob-0.004
Token was
Token ID373
Feature activation-6.70e-03

Pos logit contributions
ived+0.010
uttered+0.009
bs+0.009
iked+0.008
 disagreed+0.008
Neg logit contributions
opl-0.015
izer-0.014
Kim-0.014
rob-0.014
ep-0.014
Token a
Token ID257
Feature activation+3.24e-03

Pos logit contributions
opl+0.018
rob+0.016
Kim+0.016
izer+0.016
ep+0.016
Neg logit contributions
ived-0.006
uttered-0.005
bs-0.005
 neither-0.005
 disagreed-0.004
Token horses
Token ID14260
Feature activation+2.03e-02

Pos logit contributions
opl+0.019
Kim+0.018
izer+0.018
rob+0.018
ep+0.017
Neg logit contributions
ived-0.015
uttered-0.014
bs-0.013
 disagreed-0.013
 neither-0.013
Token,
Token ID11
Feature activation-1.43e-02

Pos logit contributions
ived+0.036
uttered+0.035
iked+0.031
bs+0.030
ashed+0.030
Neg logit contributions
opl-0.036
izer-0.032
ep-0.032
Kim-0.031
rob-0.031
Token cows
Token ID22575
Feature activation+1.47e-02

Pos logit contributions
opl+0.025
Kim+0.023
ep+0.023
izer+0.023
rob+0.022
Neg logit contributions
ived-0.026
uttered-0.025
iked-0.022
bs-0.022
ared-0.022
Token and
Token ID290
Feature activation-8.12e-03

Pos logit contributions
ived+0.021
uttered+0.020
bs+0.018
iked+0.018
 disagreed+0.017
Neg logit contributions
opl-0.030
izer-0.028
Kim-0.028
rob-0.027
ep-0.027
Token goats
Token ID35199
Feature activation+1.78e-02

Pos logit contributions
opl+0.014
Kim+0.013
rob+0.013
ep+0.013
izer+0.012
Neg logit contributions
uttered-0.015
ived-0.015
ared-0.013
iked-0.013
ashed-0.012
Token.
Token ID13
Feature activation-2.70e-02

Pos logit contributions
ived+0.035
uttered+0.034
iked+0.030
bs+0.030
ashed+0.029
Neg logit contributions
opl-0.028
ep-0.024
izer-0.024
coon-0.024
rob-0.024
Token
Token ID220
Feature activation+3.44e-03

Pos logit contributions
opl+0.053
Kim+0.048
rob+0.047
izer+0.047
ep+0.046
Neg logit contributions
ived-0.043
uttered-0.039
bs-0.038
 disagreed-0.037
iked-0.036
Token
Token ID198
Feature activation-1.34e-02

Pos logit contributions
ived+0.005
uttered+0.005
bs+0.005
ver+0.004
s+0.004
Neg logit contributions
opl-0.007
rob-0.006
izer-0.006
Kim-0.006
ep-0.006
Token
Token ID198
Feature activation-2.35e-02

Pos logit contributions
opl+0.027
izer+0.026
rob+0.025
Kim+0.025
ep+0.025
Neg logit contributions
ived-0.019
uttered-0.018
bs-0.017
s-0.017
ver-0.016
TokenCar
Token ID9914
Feature activation-3.70e-03

Pos logit contributions
opl+0.037
izer+0.034
rob+0.034
Kim+0.034
ep+0.032
Neg logit contributions
ived-0.045
uttered-0.043
bs-0.041
 disagreed-0.040
iked-0.040
Tokenoline
Token ID14453
Feature activation+1.07e-02

Pos logit contributions
opl+0.007
Kim+0.007
izer+0.007
rob+0.007
ep+0.006
Neg logit contributions
ived-0.006
uttered-0.006
bs-0.005
iked-0.005
als-0.005
Token paused
Token ID24487
Feature activation-1.89e-03

Pos logit contributions
ived+0.001
uttered+0.001
bs+0.001
iked+0.001
 neither+0.001
Neg logit contributions
opl-0.001
ep-0.001
rob-0.001
Kim-0.001
izer-0.001
Token and
Token ID290
Feature activation+5.47e-03

Pos logit contributions
opl+0.004
Kim+0.004
izer+0.004
rob+0.004
ep+0.004
Neg logit contributions
ived-0.002
uttered-0.002
bs-0.002
 disagreed-0.002
iked-0.002
Token smiled
Token ID13541
Feature activation+4.77e-03

Pos logit contributions
ived+0.008
uttered+0.007
bs+0.007
iked+0.007
ared+0.007
Neg logit contributions
rob-0.010
Kim-0.010
ep-0.010
opl-0.010
istle-0.009
Token at
Token ID379
Feature activation+3.33e-03

Pos logit contributions
ived+0.007
uttered+0.007
bs+0.006
 disagreed+0.006
iked+0.006
Neg logit contributions
opl-0.009
Kim-0.009
izer-0.009
ep-0.008
rob-0.008
Token their
Token ID511
Feature activation+3.23e-03

Pos logit contributions
ived+0.004
uttered+0.004
bs+0.003
 neither+0.003
 disagreed+0.003
Neg logit contributions
opl-0.008
rob-0.007
izer-0.007
ep-0.007
Kim-0.007
Token beautiful
Token ID4950
Feature activation-2.22e-02

Pos logit contributions
ived+0.005
uttered+0.005
bs+0.005
iked+0.004
ver+0.004
Neg logit contributions
opl-0.006
izer-0.006
Kim-0.006
ep-0.006
rob-0.005
Token moment
Token ID2589
Feature activation-1.40e-02

Pos logit contributions
rob+0.038
opl+0.038
Kim+0.037
coon+0.036
izer+0.036
Neg logit contributions
ived-0.038
uttered-0.036
iked-0.033
bs-0.033
 disagreed-0.033
Token.
Token ID13
Feature activation-7.55e-03

Pos logit contributions
opl+0.025
Kim+0.024
izer+0.023
ep+0.023
rob+0.023
Neg logit contributions
ived-0.023
uttered-0.022
bs-0.020
 disagreed-0.020
iked-0.020
Token
Token ID198
Feature activation-7.64e-03

Pos logit contributions
opl+0.016
rob+0.015
izer+0.015
Kim+0.015
ep+0.015
Neg logit contributions
ived-0.010
uttered-0.010
bs-0.009
iked-0.009
 neither-0.009
Token
Token ID198
Feature activation-4.57e-03

Pos logit contributions
opl+0.016
rob+0.016
izer+0.015
Kim+0.015
ep+0.015
Neg logit contributions
ived-0.010
uttered-0.010
bs-0.009
s-0.009
iked-0.008
TokenThe
Token ID464
Feature activation-2.61e-06

Pos logit contributions
opl+0.009
rob+0.008
izer+0.008
 stret+0.007
ep+0.007
Neg logit contributions
ived-0.008
uttered-0.007
bs-0.007
iked-0.006
s-0.006