TOP ACTIVATIONS
MAX = 0.185

about how your actions might affect others . <|endoftext|> Once upon
about how their actions could affect other people . They learned
the things we eat can affect our dreams .
think about how his actions affect others . <|endoftext|> Once upon
not let others ' actions affect her . She promised to
of how his actions might affect those around him .
think about how my decisions affect our nation ."
before doing something that might affect others . From
of how our decisions might affect them . <|endoftext|> Once upon
a friendly cat named Wh isk ers . Wh isk ers
a black cat named Wh isk ers . Wh isk ers
Wh isk ers . Wh isk ers liked to imagine that
I 'm sorry , Wh isk ers . Let 's go
Wh isk ers . Wh isk ers loved to walk around
a lazy cat named Wh isk ers . Wh isk ers
was heading straight towards Wh isk ers . She ran as
, and he and Wh isk ers played with the new
a lazy cat named Wh isk ers . Wh isk ers
allow nature to take its course ." L
nos y cat , Wh isk ers , saw her searching

MIN ACTIVATIONS
MIN = -0.121

Jake was over joy ed and thanked his mom
robe and wear her p aj amas instead . But Lily
and changed into her p aj amas . She was so
He put on his p aj amas and asked his mom
and Tim my was over joy ed . He and his
and dressed in fresh p aj amas . That
. Tim my was over joy ed and said , "
Peter and Lucy were over joy ed by the beetle 's
put on her special p aj amas and went to bed
was still wearing her p aj amas .
, got into her p aj amas , and sn ugg
M ia was over joy ed and thanked her friend
distance . Everyone was over joy ed that the cow had
all ." Lucy was over joy ed and quickly grabbed the
The bald man was over joy ed . He thanked the
and put on your p aj amas ." Tim
king and he was over joy ed . " Thank you
girl put on her p aj amas and went to bed
and put on their p aj amas . They hug their
and put on your p aj amas ." L

INTERVAL 0.108 - 0.185
CONTAINS 0.024%

lot that day . No matter how spoiled her furry friend
everyone and everything , no matter what ! Every
, Bark y saw Wh isk ers examining a pretty flower
, Lucy !" said Wh isk ers , " Do you
Tom . It doesn 't matter who can balance a spoon

INTERVAL 0.032 - 0.108
CONTAINS 6.615%

park and Tim loved to kick the sand around . One
a happier place . The end . <|endoftext|> Once upon a
it was hers and he could not touch it . Max
her parents went to the store . There were many matt
always making discoveries and shouting out loud in joy .

INTERVAL -0.045 - 0.032
CONTAINS 91.259%

jumping , rolling , swinging , and throwing .
and sad . She cried and cried because she knew she
my ?" Ben asked . " Sure , honey
to buy some treats . Tim my wanted to buy a
who helped save him . And they all lived happily ever

INTERVAL -0.121 - -0.045
CONTAINS 2.102%

. She stumbled around , looking at everything with her big
the soup without asking . Now you won 't gain anything
something was attached to the cave . She looked closer and
, deep sigh , he watched it soar up ever higher
and thanked her mom for helping . From then on ,
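A quick consistency check on the interval summary above, as a minimal sketch: the bounds and percentages are copied from the four INTERVAL / CONTAINS headers, while the variable names are illustrative and not part of the dashboard. It assumes the four intervals are meant to partition the full activation range from MIN to MAX.

```python
# Interval bounds and coverage percentages copied from the INTERVAL headers above.
# The names below are illustrative only.
intervals = [
    ((0.108, 0.185), 0.024),
    ((0.032, 0.108), 6.615),
    ((-0.045, 0.032), 91.259),
    ((-0.121, -0.045), 2.102),
]

# Total share of activations covered by the four intervals.
total = sum(pct for _, pct in intervals)

# Sort by lower bound and confirm each interval starts where the previous one ends.
ordered = sorted(intervals, key=lambda item: item[0][0])
contiguous = all(
    abs(ordered[i][0][1] - ordered[i + 1][0][0]) < 1e-9
    for i in range(len(ordered) - 1)
)

# Overall range spanned by the intervals (compare with MIN and MAX above).
span = (ordered[0][0][0], ordered[-1][0][1])

print(f"total coverage: {total:.3f}%")
print(f"contiguous: {contiguous}, span: {span}")
```

Under that assumption the percentages should add up to 100% and the span should match the reported MIN = -0.121 and MAX = 0.185.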
Token about
Token ID546
Feature activation-7.01e-03

Pos logit contributions
 rac+0.011
iden+0.011
 M+0.010
 tr+0.010
 Oscar+0.010
Neg logit contributions
cks-0.012
ishment-0.012
 communicate-0.012
ations-0.012
en-0.011
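The record above, like every per-token record in this dump, fuses each label with its value (`Token ID546`, `Feature activation-7.01e-03`, ` rac+0.011`). A minimal sketch of splitting one such record back into fields follows. The function name, field names, and regular expressions are assumptions made for illustration, not part of any dashboard tooling, and the sketch assumes each logit line ends in a signed decimal.

```python
import re

# Hypothetical patterns for the fused lines seen in this dump.
ID_RE = re.compile(r"^Token ID(?P<id>\d+)$")
ACT_RE = re.compile(r"^Feature activation(?P<value>[+-]\d+(?:\.\d+)?e[+-]\d+)$")
# A logit line is a token string followed by a signed decimal, e.g. " rac+0.011".
LOGIT_RE = re.compile(r"^(?P<token>.*?)(?P<value>[+-]\d+\.\d+)$")


def parse_record(lines):
    """Parse the lines of a single per-token record into a dict."""
    record = {"token": None, "token_id": None, "activation": None,
              "pos": [], "neg": []}
    bucket = None  # which logit list the following lines belong to
    for line in lines:
        if not line.strip():
            continue
        id_match = ID_RE.match(line)
        act_match = ACT_RE.match(line)
        if id_match:  # checked before the generic "Token" prefix
            record["token_id"] = int(id_match.group("id"))
        elif line.startswith("Token"):
            # Keep any leading space: it is part of the token string.
            record["token"] = line[len("Token"):]
        elif act_match:
            record["activation"] = float(act_match.group("value"))
        elif line == "Pos logit contributions":
            bucket = "pos"
        elif line == "Neg logit contributions":
            bucket = "neg"
        elif bucket is not None:
            logit_match = LOGIT_RE.match(line)
            if logit_match:
                record[bucket].append(
                    (logit_match.group("token"), float(logit_match.group("value")))
                )
    return record


sample = [
    "Token about",
    "Token ID546",
    "Feature activation-7.01e-03",
    "",
    "Pos logit contributions",
    " rac+0.011",
    "iden+0.011",
    "Neg logit contributions",
    "cks-0.012",
    " communicate-0.012",
]
record = parse_record(sample)
print(record)
```

Leading spaces are preserved on purpose, since ` rac` and `rac` would be different tokens. Whether the extraction kept every leading space intact is not something the dump itself confirms.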
Token how
Token ID703
Feature activation-6.62e-02

Pos logit contributions
 rac+0.012
 M+0.011
 tr+0.011
iden+0.011
 Oscar+0.011
Neg logit contributions
immer-0.011
pole-0.011
ishment-0.011
cks-0.011
cker-0.010
Token your
Token ID534
Feature activation-4.02e-02

Pos logit contributions
iden+0.105
 rac+0.105
 Oscar+0.097
 tr+0.096
 bulldo+0.096
Neg logit contributions
 communicate-0.111
ations-0.108
cks-0.107
 generation-0.106
 anymore-0.105
Token actions
Token ID4028
Feature activation+3.40e-02

Pos logit contributions
 rac+0.050
iden+0.049
 tr+0.045
 M+0.044
 coy+0.043
Neg logit contributions
cks-0.082
 communicate-0.081
pole-0.080
en-0.080
ations-0.080
Token might
Token ID1244
Feature activation+2.08e-02

Pos logit contributions
ishment+0.068
cks+0.067
 generation+0.066
pole+0.063
immer+0.062
Neg logit contributions
 rac-0.045
 tr-0.042
 sp-0.040
iden-0.040
 M-0.037
Token affect
Token ID2689
Feature activation+1.85e-01

Pos logit contributions
ishment+0.042
cks+0.040
angles+0.039
ations+0.038
 generation+0.036
Neg logit contributions
 rac-0.029
 sp-0.026
 big-0.026
 w-0.026
 tr-0.026
Token others
Token ID1854
Feature activation+3.35e-02

Pos logit contributions
immer+0.358
ishment+0.345
pole+0.344
cks+0.339
cker+0.331
Neg logit contributions
 rac-0.268
 two-0.234
 M-0.229
 big-0.228
 Oscar-0.226
Token.
Token ID13
Feature activation-2.20e-03

Pos logit contributions
cks+0.068
ishment+0.068
pole+0.066
angles+0.066
ck+0.066
Neg logit contributions
 rac-0.040
 tr-0.036
 M-0.036
 sp-0.036
iden-0.034
Token<|endoftext|>
Token ID50256
Feature activation+4.17e-02

Pos logit contributions
 rac+0.002
iden+0.002
 tr+0.002
 bulldo+0.002
 coy+0.002
Neg logit contributions
 communicate-0.005
 generation-0.005
cks-0.005
pole-0.005
en-0.005
TokenOnce
Token ID7454
Feature activation-7.31e-03

Pos logit contributions
cks+0.089
 anymore+0.089
ations+0.088
en+0.088
 communicate+0.088
Neg logit contributions
iden-0.045
 rac-0.043
 coy-0.040
 bulldo-0.039
 tr-0.039
Token upon
Token ID2402
Feature activation-4.43e-02

Pos logit contributions
iden+0.007
 rac+0.007
 Oscar+0.006
 M+0.006
 tr+0.006
Neg logit contributions
 communicate-0.017
cks-0.016
 generation-0.016
ishment-0.016
ations-0.016
Token about
Token ID546
Feature activation-3.33e-03

Pos logit contributions
 rac+0.049
iden+0.048
 M+0.045
 tr+0.044
 Oscar+0.044
Neg logit contributions
cks-0.072
ishment-0.071
 communicate-0.069
 generation-0.068
pole-0.068
Token how
Token ID703
Feature activation-6.50e-02

Pos logit contributions
 rac+0.006
 M+0.006
 tr+0.006
 Oscar+0.006
iden+0.006
Neg logit contributions
immer-0.005
pole-0.005
cks-0.005
cker-0.005
ishment-0.005
Token their
Token ID511
Feature activation-2.11e-02

Pos logit contributions
 rac+0.103
iden+0.102
 bulldo+0.095
 bisc+0.095
 tr+0.095
Neg logit contributions
 communicate-0.107
ations-0.106
cks-0.103
 generation-0.103
 anymore-0.103
Token actions
Token ID4028
Feature activation+3.29e-02

Pos logit contributions
iden+0.025
 rac+0.025
 Oscar+0.022
 coy+0.022
 bulldo+0.022
Neg logit contributions
 communicate-0.044
cks-0.043
ations-0.043
 generation-0.043
ishment-0.042
Token could
Token ID714
Feature activation+3.10e-02

Pos logit contributions
cks+0.069
ishment+0.069
 generation+0.068
pole+0.066
ility+0.064
Neg logit contributions
 rac-0.041
iden-0.038
 tr-0.037
 sp-0.035
 M-0.034
Token affect
Token ID2689
Feature activation+1.84e-01

Pos logit contributions
ishment+0.062
cks+0.062
ility+0.059
ations+0.058
 generation+0.058
Neg logit contributions
 rac-0.042
 tr-0.039
 sp-0.038
 w-0.038
 Oscar-0.037
Token other
Token ID584
Feature activation+1.28e-02

Pos logit contributions
immer+0.341
pole+0.333
cks+0.329
cker+0.327
ishment+0.315
Neg logit contributions
 rac-0.288
 M-0.265
 Oscar-0.260
 two-0.251
 tr-0.250
Token people
Token ID661
Feature activation+3.12e-02

Pos logit contributions
cks+0.029
pole+0.028
immer+0.028
ility+0.027
cker+0.027
Neg logit contributions
 rac-0.016
 tr-0.013
iden-0.013
 M-0.013
 sp-0.012
Token.
Token ID13
Feature activation+1.34e-03

Pos logit contributions
cks+0.065
angles+0.062
ishment+0.062
immer+0.061
 generation+0.061
Neg logit contributions
 rac-0.038
iden-0.036
 tr-0.034
 M-0.033
 sp-0.032
Token They
Token ID1119
Feature activation+7.82e-03

Pos logit contributions
 communicate+0.003
cks+0.003
 generation+0.003
en+0.003
ations+0.003
Neg logit contributions
 rac-0.001
iden-0.001
 tr-0.001
 coy-0.001
 Oscar-0.001
Token learned
Token ID4499
Feature activation-2.97e-02

Pos logit contributions
cks+0.016
ishment+0.015
ations+0.015
immer+0.015
 generation+0.015
Neg logit contributions
 rac-0.010
iden-0.010
 tr-0.010
 M-0.009
 sp-0.009
Token the
Token ID262
Feature activation-9.55e-03

Pos logit contributions
cks+0.035
pole+0.035
immer+0.034
ishment+0.033
ility+0.033
Neg logit contributions
 rac-0.029
 tr-0.026
 M-0.026
iden-0.026
 Oscar-0.025
Token things
Token ID1243
Feature activation-1.91e-02

Pos logit contributions
iden+0.013
 rac+0.013
 Oscar+0.012
 coy+0.012
 tr+0.012
Neg logit contributions
 communicate-0.018
cks-0.018
ations-0.018
 generation-0.017
pole-0.017
Token we
Token ID356
Feature activation+2.07e-03

Pos logit contributions
 rac+0.022
 tr+0.021
 sp+0.021
iden+0.019
 M+0.019
Neg logit contributions
ishment-0.040
 generation-0.040
cks-0.040
immer-0.040
pole-0.039
Token eat
Token ID4483
Feature activation-7.57e-03

Pos logit contributions
cks+0.004
ishment+0.004
pole+0.004
en+0.004
 generation+0.004
Neg logit contributions
 rac-0.003
 sp-0.003
iden-0.003
 tr-0.003
 bulldo-0.003
Token can
Token ID460
Feature activation+1.11e-02

Pos logit contributions
 rac+0.010
 sp+0.010
 M+0.010
 tr+0.010
 Oscar+0.010
Neg logit contributions
pole-0.014
cks-0.014
ility-0.014
 communicate-0.014
ishment-0.014
Token affect
Token ID2689
Feature activation+1.82e-01

Pos logit contributions
ishment+0.020
cks+0.020
 generation+0.019
ility+0.019
ations+0.019
Neg logit contributions
 rac-0.017
 sp-0.015
 tr-0.015
 M-0.015
 bulldo-0.014
Token our
Token ID674
Feature activation-5.08e-02

Pos logit contributions
pole+0.317
immer+0.317
ishment+0.304
ility+0.303
cks+0.301
Neg logit contributions
 rac-0.288
 M-0.258
 Oscar-0.258
 two-0.253
 Now-0.252
Token dreams
Token ID10625
Feature activation+1.67e-03

Pos logit contributions
 rac+0.058
iden+0.052
 tr+0.051
 sp+0.050
 M+0.049
Neg logit contributions
cks-0.111
pole-0.110
ility-0.109
 communicate-0.107
immer-0.107
Token.
Token ID13
Feature activation+8.76e-04

Pos logit contributions
ishment+0.004
cks+0.003
 generation+0.003
perate+0.003
cker+0.003
Neg logit contributions
 rac-0.002
 big-0.002
 long-0.002
 sp-0.002
 M-0.002
Token
Token ID220
Feature activation-8.48e-04

Pos logit contributions
 generation+0.002
 communicate+0.002
pole+0.002
cks+0.002
en+0.002
Neg logit contributions
iden-0.001
 rac-0.001
 tr-0.001
 coy-0.001
 bulldo-0.001
Token
Token ID198
Feature activation-6.36e-03

Pos logit contributions
iden+0.001
 rac+0.001
 tr+0.001
 coy+0.001
 bulldo+0.001
Neg logit contributions
cks-0.002
 communicate-0.002
pole-0.002
ations-0.002
 generation-0.002
Token think
Token ID892
Feature activation-4.46e-03

Pos logit contributions
cks+0.002
ishment+0.002
en+0.002
pole+0.002
ations+0.002
Neg logit contributions
 rac-0.002
iden-0.002
 tr-0.002
 sp-0.002
 Oscar-0.002
Token about
Token ID546
Feature activation-3.76e-03

Pos logit contributions
 rac+0.007
iden+0.007
 Oscar+0.007
 M+0.007
 tr+0.007
Neg logit contributions
ishment-0.007
cks-0.007
 communicate-0.007
 generation-0.007
ations-0.007
Token how
Token ID703
Feature activation-6.47e-02

Pos logit contributions
 rac+0.007
 M+0.006
 tr+0.006
 Oscar+0.006
iden+0.006
Neg logit contributions
pole-0.006
immer-0.006
cks-0.006
ishment-0.006
 communicate-0.005
Token his
Token ID465
Feature activation-3.26e-03

Pos logit contributions
iden+0.105
 rac+0.103
 bisc+0.098
 coy+0.095
 tr+0.095
Neg logit contributions
 communicate-0.108
ations-0.105
cks-0.105
 anymore-0.104
en-0.101
Token actions
Token ID4028
Feature activation+3.43e-02

Pos logit contributions
iden+0.003
 rac+0.003
 Oscar+0.002
 bulldo+0.002
 coy+0.002
Neg logit contributions
 communicate-0.008
cks-0.008
ations-0.008
 generation-0.008
en-0.008
Token affect
Token ID2689
Feature activation+1.82e-01

Pos logit contributions
ishment+0.071
 generation+0.070
cks+0.068
pole+0.068
immer+0.065
Neg logit contributions
 rac-0.044
 tr-0.040
 sp-0.039
iden-0.039
 Oscar-0.036
Token others
Token ID1854
Feature activation+3.59e-02

Pos logit contributions
immer+0.334
pole+0.316
ishment+0.312
cks+0.308
cker+0.303
Neg logit contributions
 rac-0.291
 Oscar-0.266
 M-0.257
 tr-0.249
 Now-0.246
Token.
Token ID13
Feature activation+2.50e-03

Pos logit contributions
ishment+0.071
 generation+0.071
pole+0.071
cks+0.071
immer+0.070
Neg logit contributions
 rac-0.046
 sp-0.041
 tr-0.041
 M-0.040
iden-0.039
Token<|endoftext|>
Token ID50256
Feature activation+4.42e-02

Pos logit contributions
 generation+0.005
pole+0.005
cks+0.005
en+0.005
 someday+0.005
Neg logit contributions
iden-0.003
 rac-0.003
 tr-0.003
 bulldo-0.003
 sp-0.003
TokenOnce
Token ID7454
Feature activation-2.92e-03

Pos logit contributions
cks+0.094
 anymore+0.093
ations+0.093
en+0.093
 communicate+0.092
Neg logit contributions
iden-0.049
 rac-0.046
 coy-0.043
 tr-0.042
 bulldo-0.042
Token upon
Token ID2402
Feature activation-4.27e-02

Pos logit contributions
 rac+0.003
iden+0.003
 Oscar+0.003
 tr+0.003
 coy+0.003
Neg logit contributions
 communicate-0.006
cks-0.006
ations-0.006
pole-0.006
en-0.006
Token not
Token ID407
Feature activation+1.22e-02

Pos logit contributions
cks+0.003
ishment+0.003
en+0.003
pole+0.003
ations+0.003
Neg logit contributions
 rac-0.003
iden-0.003
 tr-0.003
 Oscar-0.003
 sp-0.003
Token let
Token ID1309
Feature activation-7.85e-03

Pos logit contributions
cks+0.022
ishment+0.022
 generation+0.022
ations+0.022
pole+0.021
Neg logit contributions
 rac-0.018
 tr-0.017
 Oscar-0.016
 bulldo-0.016
 sp-0.016
Token others
Token ID1854
Feature activation+3.65e-02

Pos logit contributions
 rac+0.013
iden+0.013
 Oscar+0.013
 M+0.013
 tr+0.012
Neg logit contributions
cks-0.013
pole-0.012
ishment-0.012
 communicate-0.012
 generation-0.012
Token'
Token ID6
Feature activation+3.38e-02

Pos logit contributions
pole+0.064
ishment+0.064
 generation+0.063
cks+0.063
ility+0.061
Neg logit contributions
 rac-0.056
 tr-0.052
 Oscar-0.051
 sp-0.051
iden-0.050
Token actions
Token ID4028
Feature activation+3.23e-02

Pos logit contributions
cks+0.063
pole+0.063
ility+0.059
ishment+0.059
immer+0.058
Neg logit contributions
 rac-0.052
 Oscar-0.046
iden-0.046
 tr-0.046
 bulldo-0.046
Token affect
Token ID2689
Feature activation+1.81e-01

Pos logit contributions
ishment+0.062
cks+0.062
 generation+0.061
pole+0.061
ility+0.058
Neg logit contributions
 rac-0.046
 tr-0.041
 Oscar-0.040
iden-0.039
 sp-0.039
Token her
Token ID607
Feature activation+1.87e-02

Pos logit contributions
immer+0.340
pole+0.334
cker+0.320
ility+0.317
ishment+0.316
Neg logit contributions
 rac-0.276
 Oscar-0.257
 Now-0.242
 M-0.241
 two-0.239
Token.
Token ID13
Feature activation+5.61e-03

Pos logit contributions
pole+0.040
cks+0.039
ility+0.037
immer+0.037
angles+0.036
Neg logit contributions
 rac-0.025
 tr-0.021
 M-0.021
 sp-0.021
 brothers-0.021
Token She
Token ID1375
Feature activation+1.19e-02

Pos logit contributions
 communicate+0.011
cks+0.011
 generation+0.011
en+0.011
ations+0.011
Neg logit contributions
 rac-0.007
iden-0.007
 tr-0.006
 Oscar-0.006
 coy-0.006
Token promised
Token ID8072
Feature activation-1.15e-02

Pos logit contributions
 communicate+0.021
cks+0.021
ations+0.020
pole+0.020
en+0.020
Neg logit contributions
 rac-0.018
iden-0.018
 Oscar-0.016
 tr-0.016
 coy-0.016
Token to
Token ID284
Feature activation+1.49e-03

Pos logit contributions
 rac+0.012
iden+0.011
 Oscar+0.011
 M+0.011
 tr+0.010
Neg logit contributions
cks-0.026
pole-0.026
 communicate-0.025
ishment-0.025
ations-0.025
Token of
Token ID286
Feature activation-3.34e-03

Pos logit contributions
 rac+0.009
iden+0.009
 tr+0.008
 M+0.008
 Oscar+0.008
Neg logit contributions
cks-0.014
 communicate-0.014
ishment-0.014
ations-0.014
en-0.014
Token how
Token ID703
Feature activation-6.14e-02

Pos logit contributions
 rac+0.005
 M+0.005
 tr+0.005
 sp+0.005
 Oscar+0.005
Neg logit contributions
pole-0.006
immer-0.006
ishment-0.006
cks-0.006
cker-0.006
Token his
Token ID465
Feature activation-1.89e-03

Pos logit contributions
iden+0.081
 rac+0.080
 Oscar+0.074
 tr+0.073
 bisc+0.072
Neg logit contributions
 communicate-0.119
ations-0.116
 anymore-0.115
cks-0.115
 generation-0.114
Token actions
Token ID4028
Feature activation+3.20e-02

Pos logit contributions
 rac+0.002
iden+0.002
 Oscar+0.001
 tr+0.001
 coy+0.001
Neg logit contributions
 communicate-0.004
cks-0.004
ations-0.004
en-0.004
pole-0.004
Token might
Token ID1244
Feature activation+1.64e-02

Pos logit contributions
ishment+0.068
cks+0.066
 generation+0.066
pole+0.064
ility+0.063
Neg logit contributions
 rac-0.040
iden-0.036
 tr-0.036
 sp-0.034
 Oscar-0.034
Token affect
Token ID2689
Feature activation+1.80e-01

Pos logit contributions
ishment+0.034
cks+0.030
angles+0.029
ations+0.029
 generation+0.028
Neg logit contributions
 rac-0.025
 tr-0.022
 w-0.022
 sp-0.022
 sk-0.021
Token those
Token ID883
Feature activation+8.88e-03

Pos logit contributions
immer+0.343
ishment+0.326
pole+0.323
ist+0.322
ility+0.320
Neg logit contributions
 rac-0.285
 Oscar-0.252
 M-0.247
 big-0.239
 He-0.236
Token around
Token ID1088
Feature activation+3.57e-02

Pos logit contributions
ishment+0.017
 generation+0.017
immer+0.017
cks+0.016
ility+0.016
Neg logit contributions
 rac-0.013
 tr-0.012
 sp-0.011
iden-0.011
 M-0.011
Token him
Token ID683
Feature activation-5.54e-03

Pos logit contributions
ishment+0.066
immer+0.066
pole+0.064
cks+0.062
ist+0.062
Neg logit contributions
 rac-0.055
 M-0.050
iden-0.047
 sp-0.047
 tr-0.047
Token.
Token ID13
Feature activation+1.87e-03

Pos logit contributions
 rac+0.008
 tr+0.007
 sp+0.007
 M+0.007
iden+0.007
Neg logit contributions
 generation-0.010
ishment-0.010
angles-0.010
cks-0.010
 communicate-0.010
Token
Token ID198
Feature activation-9.09e-03

Pos logit contributions
pole+0.003
cks+0.003
 generation+0.003
 anymore+0.003
en+0.003
Neg logit contributions
iden-0.003
 rac-0.003
 bisc-0.002
 bulldo-0.002
 coy-0.002
Token think
Token ID892
Feature activation-8.63e-03

Pos logit contributions
ishment+0.005
ations+0.005
cks+0.005
ility+0.005
pole+0.005
Neg logit contributions
 rac-0.004
 tr-0.004
 sp-0.004
iden-0.004
 Oscar-0.004
Token about
Token ID546
Feature activation-4.37e-03

Pos logit contributions
iden+0.013
 rac+0.013
 M+0.012
 Oscar+0.012
 tr+0.012
Neg logit contributions
cks-0.015
 communicate-0.015
ishment-0.015
 generation-0.014
ations-0.014
Token how
Token ID703
Feature activation-6.40e-02

Pos logit contributions
 rac+0.008
 M+0.007
iden+0.007
 tr+0.007
 Oscar+0.007
Neg logit contributions
pole-0.007
immer-0.007
ishment-0.007
cks-0.007
 communicate-0.006
Token my
Token ID616
Feature activation-2.97e-02

Pos logit contributions
iden+0.100
 rac+0.099
 Oscar+0.093
 bisc+0.092
 tr+0.092
Neg logit contributions
 communicate-0.107
ations-0.106
cks-0.105
 anymore-0.104
 generation-0.104
Token decisions
Token ID5370
Feature activation-1.38e-03

Pos logit contributions
 rac+0.038
iden+0.038
 tr+0.034
 Oscar+0.034
 coy+0.033
Neg logit contributions
 communicate-0.059
cks-0.059
ations-0.057
en-0.057
pole-0.057
Token affect
Token ID2689
Feature activation+1.79e-01

Pos logit contributions
 rac+0.002
 tr+0.002
iden+0.002
 sp+0.002
 M+0.002
Neg logit contributions
ishment-0.003
 generation-0.003
cks-0.003
 communicate-0.002
ations-0.002
Token our
Token ID674
Feature activation-5.25e-02

Pos logit contributions
immer+0.317
ishment+0.314
pole+0.306
cks+0.300
cker+0.295
Neg logit contributions
 rac-0.276
 M-0.250
 Oscar-0.249
 Now-0.236
 two-0.234
Token nation
Token ID3277
Feature activation+4.51e-02

Pos logit contributions
 rac+0.092
iden+0.089
 tr+0.084
 sp+0.083
 coy+0.083
Neg logit contributions
cks-0.080
pole-0.080
 communicate-0.078
immer-0.076
ility-0.076
Token."
Token ID526
Feature activation-7.57e-03

Pos logit contributions
ishment+0.086
cks+0.085
 communicate+0.082
ility+0.082
 generation+0.082
Neg logit contributions
 rac-0.061
 tr-0.057
iden-0.056
 sp-0.056
 M-0.052
Token
Token ID198
Feature activation-6.40e-03

Pos logit contributions
 rac+0.010
iden+0.009
 coy+0.009
 bulldo+0.009
 tr+0.008
Neg logit contributions
pole-0.015
 generation-0.014
 communicate-0.014
cks-0.014
ishment-0.014
Token
Token ID198
Feature activation-6.28e-03

Pos logit contributions
 rac+0.009
iden+0.009
 sp+0.008
 coy+0.008
 tr+0.008
Neg logit contributions
 communicate-0.012
cks-0.012
ck-0.011
ie-0.011
 generation-0.011
Token before
Token ID878
Feature activation+8.57e-04

Pos logit contributions
 rac+0.110
iden+0.108
 M+0.107
 tr+0.102
 Now+0.101
Neg logit contributions
cks-0.173
ck-0.160
ishment-0.160
 communicate-0.157
 fade-0.156
Token doing
Token ID1804
Feature activation-2.60e-02

Pos logit contributions
pole+0.001
 communicate+0.001
cks+0.001
ishment+0.001
en+0.001
Neg logit contributions
 rac-0.001
iden-0.001
 M-0.001
 Oscar-0.001
 bulldo-0.001
Token something
Token ID1223
Feature activation-2.12e-02

Pos logit contributions
 rac+0.043
iden+0.042
 sp+0.041
 M+0.040
 tr+0.040
Neg logit contributions
pole-0.044
cks-0.041
 generation-0.041
cker-0.041
ility-0.041
Token that
Token ID326
Feature activation+1.33e-02

Pos logit contributions
 rac+0.029
iden+0.029
 tr+0.027
 Oscar+0.026
 M+0.026
Neg logit contributions
 communicate-0.040
cks-0.040
 generation-0.040
ishment-0.039
pole-0.039
Token might
Token ID1244
Feature activation+1.27e-02

Pos logit contributions
cks+0.025
ishment+0.025
immer+0.025
 generation+0.025
en+0.025
Neg logit contributions
 rac-0.018
 M-0.017
iden-0.017
 sp-0.017
 tr-0.017
Token affect
Token ID2689
Feature activation+1.77e-01

Pos logit contributions
ishment+0.024
ations+0.024
cks+0.023
iece+0.022
pole+0.022
Neg logit contributions
 rac-0.019
 sp-0.018
 M-0.017
 big-0.017
 tr-0.017
Token others
Token ID1854
Feature activation+3.62e-02

Pos logit contributions
immer+0.376
pole+0.352
cker+0.348
ishment+0.347
iece+0.336
Neg logit contributions
 rac-0.251
 big-0.245
 two-0.220
 M-0.217
 friends-0.216
Token.
Token ID13
Feature activation+1.26e-03

Pos logit contributions
 generation+0.074
ishment+0.073
pole+0.072
immer+0.072
cks+0.071
Neg logit contributions
 rac-0.044
 sp-0.041
 tr-0.039
 M-0.039
 big-0.036
Token
Token ID198
Feature activation-6.73e-03

Pos logit contributions
 generation+0.003
cks+0.003
pole+0.002
ishment+0.002
 communicate+0.002
Neg logit contributions
iden-0.002
 rac-0.001
 tr-0.001
 bulldo-0.001
 coy-0.001
Token
Token ID198
Feature activation-6.95e-03

Pos logit contributions
iden+0.008
 rac+0.008
 tr+0.007
 coy+0.007
 Oscar+0.007
Neg logit contributions
 communicate-0.014
cks-0.014
pole-0.013
ations-0.013
 generation-0.013
TokenFrom
Token ID4863
Feature activation+3.10e-02

Pos logit contributions
 rac+0.009
iden+0.009
 Oscar+0.008
 tr+0.008
 coy+0.008
Neg logit contributions
 communicate-0.014
cks-0.014
ations-0.013
pole-0.013
 generation-0.013
Token of
Token ID286
Feature activation-7.25e-03

Pos logit contributions
 rac+0.044
iden+0.041
 M+0.037
 tr+0.037
 Oscar+0.037
Neg logit contributions
cks-0.116
 communicate-0.112
ishment-0.112
 generation-0.110
pole-0.109
Token how
Token ID703
Feature activation-6.61e-02

Pos logit contributions
 rac+0.012
 M+0.011
 Oscar+0.011
 sp+0.011
iden+0.011
Neg logit contributions
pole-0.012
immer-0.012
ishment-0.011
cks-0.011
ide-0.011
Token our
Token ID674
Feature activation-5.39e-02

Pos logit contributions
 rac+0.090
iden+0.090
 Oscar+0.081
 tr+0.081
 coy+0.081
Neg logit contributions
 communicate-0.125
cks-0.124
ations-0.122
en-0.121
pole-0.121
Token decisions
Token ID5370
Feature activation-3.96e-03

Pos logit contributions
 rac+0.083
iden+0.079
 sp+0.075
 tr+0.075
 coy+0.073
Neg logit contributions
pole-0.094
cks-0.093
immer-0.092
en-0.092
ility-0.091
Token might
Token ID1244
Feature activation+1.19e-02

Pos logit contributions
 rac+0.005
 tr+0.005
iden+0.005
 sp+0.005
 Oscar+0.005
Neg logit contributions
ishment-0.008
 generation-0.008
cks-0.008
pole-0.007
cker-0.007
Token affect
Token ID2689
Feature activation+1.76e-01

Pos logit contributions
ishment+0.022
cks+0.020
angles+0.020
ations+0.020
pole+0.019
Neg logit contributions
 rac-0.019
 sp-0.017
 tr-0.016
 coy-0.016
 w-0.016
Token them
Token ID606
Feature activation-8.15e-04

Pos logit contributions
immer+0.339
pole+0.325
ishment+0.317
cker+0.317
cks+0.315
Neg logit contributions
 rac-0.256
 big-0.231
 two-0.227
 Oscar-0.224
 M-0.216
Token.
Token ID13
Feature activation-1.24e-03

Pos logit contributions
 rac+0.001
 sp+0.001
 M+0.001
 tr+0.001
 big+0.001
Neg logit contributions
cks-0.002
ishment-0.001
angles-0.001
 generation-0.001
pole-0.001
Token<|endoftext|>
Token ID50256
Feature activation+4.30e-02

Pos logit contributions
 rac+0.001
iden+0.001
 tr+0.001
 coy+0.001
 bulldo+0.001
Neg logit contributions
 communicate-0.003
 generation-0.003
cks-0.003
pole-0.003
en-0.003
TokenOnce
Token ID7454
Feature activation-6.30e-03

Pos logit contributions
 anymore+0.087
ations+0.084
en+0.084
cks+0.081
ie+0.081
Neg logit contributions
iden-0.052
 rac-0.049
 coy-0.048
 bulldo-0.046
 tr-0.045
Token upon
Token ID2402
Feature activation-4.61e-02

Pos logit contributions
iden+0.007
 rac+0.007
 Oscar+0.006
 tr+0.006
 M+0.006
Neg logit contributions
 communicate-0.014
cks-0.014
 generation-0.014
ations-0.014
en-0.014
Token a
Token ID257
Feature activation+1.21e-03

Pos logit contributions
iden+0.009
 rac+0.009
 bulldo+0.008
 tr+0.008
 coy+0.008
Neg logit contributions
 generation-0.010
 communicate-0.010
cks-0.010
pole-0.010
en-0.010
Token friendly
Token ID8030
Feature activation-6.52e-03

Pos logit contributions
 communicate+0.002
ie+0.002
pole+0.002
 generation+0.002
cker+0.002
Neg logit contributions
iden-0.001
 Oscar-0.001
 Now-0.001
 rac-0.001
lic-0.001
Token cat
Token ID3797
Feature activation+1.11e-02

Pos logit contributions
iden+0.008
 rac+0.008
Fel+0.007
 Oscar+0.007
 bulldo+0.007
Neg logit contributions
ie-0.012
pole-0.012
 communicate-0.012
en-0.012
 generation-0.012
Token named
Token ID3706
Feature activation-2.48e-02

Pos logit contributions
ishment+0.022
ations+0.022
 communicate+0.022
cks+0.021
ility+0.021
Neg logit contributions
 rac-0.015
 M-0.014
iden-0.014
 tr-0.013
 bulldo-0.013
Token Wh
Token ID854
Feature activation+2.33e-03

Pos logit contributions
iden+0.035
 rac+0.032
 coy+0.030
 bulldo+0.029
 tr+0.029
Neg logit contributions
 anymore-0.045
 communicate-0.045
cks-0.045
 generation-0.045
pole-0.045
Tokenisk
Token ID1984
Feature activation+1.33e-01

Pos logit contributions
ishment+0.002
 communicate+0.002
 generation+0.002
cks+0.002
ations+0.002
Neg logit contributions
 rac-0.005
iden-0.005
 tr-0.005
 bulldo-0.005
 Oscar-0.005
Tokeners
Token ID364
Feature activation+1.18e-02

Pos logit contributions
ishment+0.215
 communicate+0.203
cks+0.193
 fade+0.191
 generation+0.190
Neg logit contributions
 rac-0.232
iden-0.224
 Now-0.206
 Oscar-0.205
 tr-0.205
Token.
Token ID13
Feature activation+2.58e-03

Pos logit contributions
ishment+0.023
ations+0.022
cks+0.022
ility+0.022
en+0.021
Neg logit contributions
 rac-0.017
 tr-0.016
 M-0.015
iden-0.015
 bulldo-0.015
Token Wh
Token ID854
Feature activation+2.19e-03

Pos logit contributions
 communicate+0.005
pole+0.004
 anymore+0.004
cks+0.004
en+0.004
Neg logit contributions
iden-0.004
 rac-0.003
 Oscar-0.003
 coy-0.003
 tr-0.003
Tokenisk
Token ID1984
Feature activation+1.32e-01

Pos logit contributions
 communicate+0.002
 generation+0.002
ishment+0.002
ility+0.002
cks+0.002
Neg logit contributions
 rac-0.005
iden-0.005
 bulldo-0.005
 Oscar-0.005
 coy-0.005
Tokeners
Token ID364
Feature activation+1.15e-02

Pos logit contributions
ishment+0.211
 communicate+0.198
 generation+0.190
cks+0.189
 fade+0.187
Neg logit contributions
 rac-0.230
iden-0.223
 tr-0.207
lic-0.206
 Now-0.205
Token a
Token ID257
Feature activation+3.71e-03

Pos logit contributions
iden+0.009
 rac+0.008
 bulldo+0.008
 tr+0.008
 coy+0.008
Neg logit contributions
 generation-0.009
 communicate-0.009
cks-0.009
pole-0.009
en-0.009
Token black
Token ID2042
Feature activation-1.56e-03

Pos logit contributions
 communicate+0.007
ie+0.007
 generation+0.007
pole+0.007
cker+0.007
Neg logit contributions
iden-0.004
 Oscar-0.004
 Now-0.004
 rac-0.004
 coy-0.004
Token cat
Token ID3797
Feature activation+1.13e-02

Pos logit contributions
iden+0.002
 rac+0.002
 Oscar+0.002
 coy+0.002
 bulldo+0.002
Neg logit contributions
 communicate-0.003
en-0.003
pole-0.003
 generation-0.003
ie-0.003
Token named
Token ID3706
Feature activation-2.20e-02

Pos logit contributions
ishment+0.023
ations+0.022
 communicate+0.022
ility+0.021
 generation+0.021
Neg logit contributions
 rac-0.016
 M-0.014
 tr-0.014
iden-0.014
 bulldo-0.014
Token Wh
Token ID854
Feature activation+3.37e-03

Pos logit contributions
iden+0.033
 rac+0.030
 coy+0.028
 bulldo+0.028
 tr+0.028
Neg logit contributions
cks-0.039
 communicate-0.039
 generation-0.038
pole-0.038
 anymore-0.038
Tokenisk
Token ID1984
Feature activation+1.32e-01

Pos logit contributions
ishment+0.003
 communicate+0.003
 generation+0.003
ations+0.003
pole+0.003
Neg logit contributions
 rac-0.008
 tr-0.007
iden-0.007
 Oscar-0.007
 bulldo-0.007
Tokeners
Token ID364
Feature activation+1.18e-02

Pos logit contributions
ishment+0.214
 communicate+0.200
 generation+0.192
 fade+0.191
cks+0.190
Neg logit contributions
 rac-0.230
iden-0.222
 bulldo-0.205
 Oscar-0.205
 tr-0.204
Token.
Token ID13
Feature activation+1.91e-03

Pos logit contributions
ishment+0.025
ations+0.024
ility+0.024
en+0.023
 generation+0.023
Neg logit contributions
 rac-0.015
 tr-0.014
 M-0.013
 bulldo-0.013
 sp-0.013
Token Wh
Token ID854
Feature activation+3.05e-03

Pos logit contributions
 communicate+0.003
cks+0.003
 anymore+0.003
pole+0.003
hips+0.003
Neg logit contributions
iden-0.003
 rac-0.003
 Oscar-0.003
 coy-0.003
 tr-0.002
Tokenisk
Token ID1984
Feature activation+1.31e-01

Pos logit contributions
 communicate+0.003
 generation+0.003
ishment+0.003
ility+0.003
pole+0.003
Neg logit contributions
 rac-0.007
iden-0.007
 Oscar-0.007
 bulldo-0.007
 coy-0.007
Tokeners
Token ID364
Feature activation+1.17e-02

Pos logit contributions
ishment+0.207
 communicate+0.195
 generation+0.190
ility+0.186
 fade+0.186
Neg logit contributions
 rac-0.230
iden-0.222
 tr-0.207
 Oscar-0.206
 Now-0.205
Token Wh
Token ID854
Feature activation+2.33e-03

Pos logit contributions
iden+0.035
 rac+0.032
 coy+0.030
 bulldo+0.029
 tr+0.029
Neg logit contributions
 anymore-0.045
 communicate-0.045
cks-0.045
 generation-0.045
pole-0.045
Tokenisk
Token ID1984
Feature activation+1.33e-01

Pos logit contributions
ishment+0.002
 communicate+0.002
 generation+0.002
cks+0.002
ations+0.002
Neg logit contributions
 rac-0.005
iden-0.005
 tr-0.005
 bulldo-0.005
 Oscar-0.005
Tokeners
Token ID364
Feature activation+1.18e-02

Pos logit contributions
ishment+0.215
 communicate+0.203
cks+0.193
 fade+0.191
 generation+0.190
Neg logit contributions
 rac-0.232
iden-0.224
 Now-0.206
 Oscar-0.205
 tr-0.205
Token.
Token ID13
Feature activation+2.58e-03

Pos logit contributions
ishment+0.023
ations+0.022
cks+0.022
ility+0.022
en+0.021
Neg logit contributions
 rac-0.017
 tr-0.016
 M-0.015
iden-0.015
 bulldo-0.015
Token Wh
Token ID854
Feature activation+2.19e-03

Pos logit contributions
 communicate+0.005
pole+0.004
 anymore+0.004
cks+0.004
en+0.004
Neg logit contributions
iden-0.004
 rac-0.003
 Oscar-0.003
 coy-0.003
 tr-0.003
Tokenisk
Token ID1984
Feature activation+1.32e-01

Pos logit contributions
 communicate+0.002
 generation+0.002
ishment+0.002
ility+0.002
cks+0.002
Neg logit contributions
 rac-0.005
iden-0.005
 bulldo-0.005
 Oscar-0.005
 coy-0.005
Tokeners
Token ID364
Feature activation+1.15e-02

Pos logit contributions
ishment+0.211
 communicate+0.198
 generation+0.190
cks+0.189
 fade+0.187
Neg logit contributions
 rac-0.230
iden-0.223
 tr-0.207
lic-0.206
 Now-0.205
Token liked
Token ID8288
Feature activation+1.31e-02

Pos logit contributions
ishment+0.024
cks+0.024
ility+0.024
 communicate+0.024
 generation+0.024
Neg logit contributions
 rac-0.014
iden-0.013
 tr-0.013
 sp-0.012
 M-0.012
Token to
Token ID284
Feature activation+2.09e-03

Pos logit contributions
 communicate+0.022
cks+0.021
 anymore+0.021
ations+0.020
 generation+0.020
Neg logit contributions
iden-0.021
 rac-0.019
 Oscar-0.018
 coy-0.018
lic-0.018
Token imagine
Token ID5967
Feature activation+1.16e-02

Pos logit contributions
cks+0.004
ishment+0.004
ations+0.004
pole+0.003
en+0.003
Neg logit contributions
 rac-0.003
iden-0.003
 tr-0.003
 M-0.003
 bulldo-0.003
Token that
Token ID326
Feature activation+1.57e-02

Pos logit contributions
ishment+0.020
 generation+0.020
 communicate+0.020
pole+0.020
en+0.020
Neg logit contributions
 rac-0.018
iden-0.017
 tr-0.017
 M-0.017
 sp-0.016
TokenI
Token ID40
Feature activation+2.40e-02

Pos logit contributions
cks+0.011
 communicate+0.011
ations+0.011
ishment+0.011
en+0.011
Neg logit contributions
 rac-0.005
iden-0.005
 coy-0.005
 Oscar-0.005
 tr-0.005
Token'm
Token ID1101
Feature activation-9.30e-03

Pos logit contributions
 communicate+0.037
 anymore+0.035
 die+0.035
 someday+0.035
 fade+0.035
Neg logit contributions
iden-0.042
 rac-0.039
 Water-0.039
 bulldo-0.038
 tr-0.038
Token sorry
Token ID7926
Feature activation-1.72e-02

Pos logit contributions
iden+0.015
 rac+0.014
 tr+0.013
 bulldo+0.013
 Oscar+0.013
Neg logit contributions
 communicate-0.016
cks-0.016
pole-0.015
en-0.015
ations-0.015
Token,
Token ID11
Feature activation+7.09e-03

Pos logit contributions
iden+0.031
 rac+0.030
 coy+0.029
 bulldo+0.028
 Oscar+0.028
Neg logit contributions
 anymore-0.026
 communicate-0.025
ie-0.025
en-0.025
pole-0.024
Token Wh
Token ID854
Feature activation+6.29e-04

Pos logit contributions
 communicate+0.014
cks+0.014
ations+0.014
en+0.014
 generation+0.014
Neg logit contributions
 rac-0.009
iden-0.009
 Oscar-0.008
 tr-0.008
 coy-0.008
Tokenisk
Token ID1984
Feature activation+1.32e-01

Pos logit contributions
 communicate+0.001
 generation+0.001
cks+0.001
pole+0.001
ations+0.001
Neg logit contributions
 rac-0.001
iden-0.001
 bisc-0.001
 Oscar-0.001
 bulldo-0.001
Tokeners
Token ID364
Feature activation+1.13e-02

Pos logit contributions
ishment+0.203
 communicate+0.191
cks+0.189
 generation+0.185
en+0.184
Neg logit contributions
 rac-0.239
iden-0.233
 tr-0.214
 coy-0.213
 Now-0.212
Token.
Token ID13
Feature activation+5.31e-03

Pos logit contributions
cks+0.021
 generation+0.021
ishment+0.021
 communicate+0.021
ations+0.021
Neg logit contributions
 rac-0.017
iden-0.015
 tr-0.015
 M-0.015
 Oscar-0.014
Token Let
Token ID3914
Feature activation-7.80e-03

Pos logit contributions
pole+0.009
 communicate+0.009
 generation+0.009
ishment+0.009
ility+0.009
Neg logit contributions
iden-0.008
 rac-0.007
 tr-0.007
 coy-0.007
 bulldo-0.007
Token's
Token ID338
Feature activation+3.02e-02

Pos logit contributions
 rac+0.010
iden+0.009
 M+0.009
 Oscar+0.009
 tr+0.009
Neg logit contributions
cks-0.016
 communicate-0.016
ishment-0.016
ility-0.016
 generation-0.016
Token go
Token ID467
Feature activation+2.47e-02

Pos logit contributions
cks+0.047
ations+0.045
ishment+0.044
ility+0.044
 communicate+0.043
Neg logit contributions
 rac-0.054
 tr-0.051
 M-0.051
 Oscar-0.050
iden-0.050
Token Wh
Token ID854
Feature activation+3.37e-03

Pos logit contributions
iden+0.033
 rac+0.030
 coy+0.028
 bulldo+0.028
 tr+0.028
Neg logit contributions
cks-0.039
 communicate-0.039
 generation-0.038
pole-0.038
 anymore-0.038
Tokenisk
Token ID1984
Feature activation+1.32e-01

Pos logit contributions
ishment+0.003
 communicate+0.003
 generation+0.003
ations+0.003
pole+0.003
Neg logit contributions
 rac-0.008
 tr-0.007
iden-0.007
 Oscar-0.007
 bulldo-0.007
Tokeners
Token ID364
Feature activation+1.18e-02

Pos logit contributions
ishment+0.214
 communicate+0.200
 generation+0.192
 fade+0.191
cks+0.190
Neg logit contributions
 rac-0.230
iden-0.222
 bulldo-0.205
 Oscar-0.205
 tr-0.204
Token.
Token ID13
Feature activation+1.91e-03

Pos logit contributions
ishment+0.025
ations+0.024
ility+0.024
en+0.023
 generation+0.023
Neg logit contributions
 rac-0.015
 tr-0.014
 M-0.013
 bulldo-0.013
 sp-0.013
Token Wh
Token ID854
Feature activation+3.05e-03

Pos logit contributions
 communicate+0.003
cks+0.003
 anymore+0.003
pole+0.003
hips+0.003
Neg logit contributions
iden-0.003
 rac-0.003
 Oscar-0.003
 coy-0.003
 tr-0.002
Tokenisk
Token ID1984
Feature activation+1.31e-01

Pos logit contributions
 communicate+0.003
 generation+0.003
ishment+0.003
ility+0.003
pole+0.003
Neg logit contributions
 rac-0.007
iden-0.007
 Oscar-0.007
 bulldo-0.007
 coy-0.007
Tokeners
Token ID364
Feature activation+1.17e-02

Pos logit contributions
ishment+0.207
 communicate+0.195
 generation+0.190
ility+0.186
 fade+0.186
Neg logit contributions
 rac-0.230
iden-0.222
 tr-0.207
 Oscar-0.206
 Now-0.205
Token loved
Token ID6151
Feature activation-4.59e-04

Pos logit contributions
ishment+0.025
ility+0.025
 generation+0.025
en+0.024
cks+0.024
Neg logit contributions
 rac-0.013
iden-0.013
 tr-0.013
 sp-0.012
 M-0.012
Token to
Token ID284
Feature activation+2.15e-03

Pos logit contributions
iden+0.001
 rac+0.001
 Oscar+0.001
lic+0.001
 coy+0.001
Neg logit contributions
 communicate-0.001
cks-0.001
ie-0.001
 generation-0.001
en-0.001
Token walk
Token ID2513
Feature activation+1.63e-02

Pos logit contributions
ishment+0.004
cks+0.004
ations+0.004
pole+0.004
en+0.004
Neg logit contributions
 rac-0.003
iden-0.003
 tr-0.003
 M-0.003
 bulldo-0.003
Token around
Token ID1088
Feature activation+3.55e-02

Pos logit contributions
 communicate+0.025
cks+0.024
ations+0.024
en+0.024
 generation+0.024
Neg logit contributions
iden-0.029
 rac-0.029
 Oscar-0.027
 coy-0.026
 bulldo-0.026
Token a
Token ID257
Feature activation+1.90e-03

Pos logit contributions
iden+0.010
 rac+0.010
 bulldo+0.009
 tr+0.009
 coy+0.009
Neg logit contributions
 generation-0.010
 communicate-0.010
cks-0.010
pole-0.010
en-0.010
Token lazy
Token ID16931
Feature activation-2.76e-02

Pos logit contributions
 communicate+0.004
ie+0.004
pole+0.004
 generation+0.004
cker+0.003
Neg logit contributions
iden-0.002
 Oscar-0.002
 Now-0.002
 rac-0.002
Fel-0.002
Token cat
Token ID3797
Feature activation+1.18e-02

Pos logit contributions
iden+0.035
Fel+0.032
lic+0.032
 Oscar+0.032
 rac+0.031
Neg logit contributions
ie-0.052
 anymore-0.050
 communicate-0.050
en-0.049
pole-0.049
Token named
Token ID3706
Feature activation-2.29e-02

Pos logit contributions
ishment+0.024
 communicate+0.023
ations+0.023
cks+0.023
ility+0.023
Neg logit contributions
 rac-0.016
 M-0.014
iden-0.014
 tr-0.014
 bulldo-0.014
Token Wh
Token ID854
Feature activation+2.94e-03

Pos logit contributions
iden+0.032
 rac+0.029
 coy+0.028
 bulldo+0.027
 mo+0.027
Neg logit contributions
 anymore-0.042
ie-0.041
 communicate-0.040
pole-0.040
 generation-0.040
Tokenisk
Token ID1984
Feature activation+1.31e-01

Pos logit contributions
ishment+0.003
 communicate+0.003
 generation+0.003
cks+0.003
ations+0.003
Neg logit contributions
 rac-0.007
iden-0.006
 Oscar-0.006
 tr-0.006
 bulldo-0.006
Tokeners
Token ID364
Feature activation+1.05e-02

Pos logit contributions
ishment+0.217
 communicate+0.200
cks+0.192
 fade+0.191
 generation+0.191
Neg logit contributions
 rac-0.226
iden-0.219
 Oscar-0.200
 bulldo-0.200
lic-0.199
Token.
Token ID13
Feature activation+1.69e-03

Pos logit contributions
ishment+0.021
ations+0.021
cks+0.020
ility+0.020
en+0.020
Neg logit contributions
 rac-0.014
 tr-0.013
 M-0.013
 bulldo-0.012
iden-0.012
Token Wh
Token ID854
Feature activation+2.62e-03

Pos logit contributions
 anymore+0.003
cks+0.003
 communicate+0.003
pole+0.003
en+0.003
Neg logit contributions
iden-0.003
 rac-0.002
 coy-0.002
 Oscar-0.002
 tr-0.002
Tokenisk
Token ID1984
Feature activation+1.30e-01

Pos logit contributions
 communicate+0.003
 generation+0.002
ishment+0.002
 fade+0.002
ility+0.002
Neg logit contributions
 rac-0.006
iden-0.006
 Oscar-0.006
 bulldo-0.006
 coy-0.006
Tokeners
Token ID364
Feature activation+1.06e-02

Pos logit contributions
ishment+0.211
 communicate+0.195
 generation+0.190
cks+0.187
 fade+0.187
Neg logit contributions
 rac-0.225
iden-0.218
lic-0.202
 tr-0.202
 Now-0.201
Token was
Token ID373
Feature activation-6.41e-03

Pos logit contributions
 communicate+0.004
cks+0.004
ations+0.004
en+0.004
pole+0.004
Neg logit contributions
iden-0.003
 rac-0.003
 Oscar-0.003
 tr-0.003
 coy-0.003
Token heading
Token ID9087
Feature activation+2.62e-02

Pos logit contributions
iden+0.010
 rac+0.010
 tr+0.009
 bulldo+0.009
 Oscar+0.009
Neg logit contributions
cks-0.011
 communicate-0.011
pole-0.011
 anymore-0.011
en-0.010
Token straight
Token ID3892
Feature activation+9.82e-03

Pos logit contributions
 anymore+0.033
 communicate+0.032
en+0.032
 generation+0.032
ishment+0.031
Neg logit contributions
iden-0.053
 rac-0.050
 bulldo-0.047
 Oscar-0.047
 Now-0.046
Token towards
Token ID3371
Feature activation+9.96e-03

Pos logit contributions
 communicate+0.016
cks+0.016
en+0.016
ations+0.016
ishment+0.016
Neg logit contributions
iden-0.016
 rac-0.016
 Oscar-0.014
 coy-0.014
 tr-0.014
Token Wh
Token ID854
Feature activation+4.14e-03

Pos logit contributions
 communicate+0.016
cks+0.016
pole+0.016
ations+0.016
ishment+0.015
Neg logit contributions
 rac-0.017
iden-0.016
 M-0.016
 Oscar-0.016
 tr-0.015
Tokenisk
Token ID1984
Feature activation+1.31e-01

Pos logit contributions
 communicate+0.004
 generation+0.004
cks+0.004
ishment+0.004
pole+0.004
Neg logit contributions
 rac-0.009
iden-0.009
 tr-0.009
 Oscar-0.009
 bulldo-0.009
Tokeners
Token ID364
Feature activation+1.02e-02

Pos logit contributions
ishment+0.184
 communicate+0.184
cks+0.177
 generation+0.172
en+0.171
Neg logit contributions
 rac-0.245
iden-0.243
 tr-0.226
 Oscar-0.225
 bulldo-0.223
Token.
Token ID13
Feature activation+3.66e-03

Pos logit contributions
cks+0.019
 communicate+0.019
ishment+0.018
ations+0.018
cker+0.018
Neg logit contributions
 rac-0.015
 tr-0.014
iden-0.013
 Oscar-0.013
 M-0.013
Token She
Token ID1375
Feature activation+1.28e-02

Pos logit contributions
pole+0.005
 anymore+0.005
 generation+0.005
ishment+0.005
isk+0.004
Neg logit contributions
iden-0.006
 rac-0.006
 sp-0.006
 tr-0.006
 mo-0.006
Token ran
Token ID4966
Feature activation-2.83e-02

Pos logit contributions
ck+0.020
en+0.020
 communicate+0.019
ie+0.019
pole+0.019
Neg logit contributions
iden-0.020
 Water-0.020
 rac-0.019
 bisc-0.019
 Now-0.019
Token as
Token ID355
Feature activation+6.97e-03

Pos logit contributions
iden+0.055
 rac+0.054
 bulldo+0.051
 coy+0.050
 Oscar+0.050
Neg logit contributions
en-0.036
 anymore-0.036
 communicate-0.036
pole-0.035
 generation-0.035
Token,
Token ID11
Feature activation+7.87e-03

Pos logit contributions
en+0.014
 communicate+0.014
 anymore+0.014
cks+0.014
pole+0.014
Neg logit contributions
iden-0.017
 rac-0.017
 coy-0.016
 bulldo-0.016
 tr-0.015
Token and
Token ID290
Feature activation+3.36e-03

Pos logit contributions
pole+0.016
 communicate+0.016
 anymore+0.016
cks+0.015
 generation+0.015
Neg logit contributions
iden-0.009
 rac-0.009
 bulldo-0.008
lic-0.008
 tr-0.008
Token he
Token ID339
Feature activation-1.33e-02

Pos logit contributions
pole+0.006
 communicate+0.005
cks+0.005
ck+0.005
ie+0.005
Neg logit contributions
iden-0.005
 rac-0.005
 bulldo-0.005
Fel-0.004
lic-0.004
Token and
Token ID290
Feature activation+3.53e-03

Pos logit contributions
iden+0.022
 rac+0.021
 Water+0.020
 bulldo+0.019
 Oscar+0.019
Neg logit contributions
 communicate-0.021
en-0.020
pole-0.020
ck-0.020
ie-0.020
Token Wh
Token ID854
Feature activation+4.21e-03

Pos logit contributions
 communicate+0.007
cks+0.007
pole+0.007
en+0.006
ations+0.006
Neg logit contributions
iden-0.005
 rac-0.005
 bulldo-0.004
 coy-0.004
 tr-0.004
Tokenisk
Token ID1984
Feature activation+1.31e-01

Pos logit contributions
 generation+0.004
 communicate+0.004
ishment+0.004
cks+0.004
ations+0.004
Neg logit contributions
 rac-0.010
 Oscar-0.009
 tr-0.009
iden-0.009
W-0.009
Tokeners
Token ID364
Feature activation+1.10e-02

Pos logit contributions
ishment+0.236
 communicate+0.209
 appearance+0.206
cks+0.206
 generation+0.205
Neg logit contributions
 rac-0.212
iden-0.202
lic-0.193
 Now-0.190
 tr-0.188
Token played
Token ID2826
Feature activation-8.45e-04

Pos logit contributions
ishment+0.022
cks+0.022
 generation+0.022
ations+0.022
 communicate+0.021
Neg logit contributions
 rac-0.014
iden-0.014
 tr-0.013
 bulldo-0.013
 sp-0.012
Token with
Token ID351
Feature activation-2.03e-02

Pos logit contributions
 rac+0.001
 tr+0.001
iden+0.001
 Water+0.001
 Oscar+0.001
Neg logit contributions
cks-0.001
ishment-0.001
ations-0.001
cker-0.001
en-0.001
Token the
Token ID262
Feature activation+2.08e-03

Pos logit contributions
 rac+0.032
iden+0.029
 Oscar+0.029
 M+0.029
 tr+0.028
Neg logit contributions
pole-0.036
cks-0.036
 communicate-0.036
ations-0.036
ishment-0.035
Token new
Token ID649
Feature activation+1.98e-02

Pos logit contributions
 communicate+0.004
pole+0.004
cks+0.004
ations+0.004
 generation+0.004
Neg logit contributions
iden-0.003
 rac-0.002
 Oscar-0.002
 coy-0.002
 M-0.002
Token a
Token ID257
Feature activation+2.41e-03

Pos logit contributions
iden+0.010
 rac+0.010
 bulldo+0.009
 tr+0.009
 coy+0.009
Neg logit contributions
 generation-0.012
 communicate-0.011
cks-0.011
en-0.011
pole-0.011
Token lazy
Token ID16931
Feature activation-3.02e-02

Pos logit contributions
 communicate+0.005
ie+0.005
pole+0.004
 generation+0.004
cker+0.004
Neg logit contributions
 Oscar-0.003
iden-0.003
 Now-0.003
 rac-0.003
Fel-0.002
Token cat
Token ID3797
Feature activation+8.22e-03

Pos logit contributions
iden+0.038
Fel+0.035
 Oscar+0.035
lic+0.035
 rac+0.035
Neg logit contributions
ie-0.056
 anymore-0.055
 communicate-0.054
pole-0.054
en-0.054
Token named
Token ID3706
Feature activation-2.56e-02

Pos logit contributions
ishment+0.017
ations+0.016
 communicate+0.016
cks+0.016
ility+0.016
Neg logit contributions
 rac-0.011
 M-0.010
iden-0.010
 tr-0.010
 bulldo-0.009
Token Wh
Token ID854
Feature activation+1.95e-03

Pos logit contributions
iden+0.038
 rac+0.034
 mo+0.033
 bulldo+0.033
 coy+0.032
Neg logit contributions
 anymore-0.045
 generation-0.043
 communicate-0.043
pole-0.042
ishment-0.042
Tokenisk
Token ID1984
Feature activation+1.31e-01

Pos logit contributions
ishment+0.002
 communicate+0.002
 generation+0.002
cks+0.002
ations+0.002
Neg logit contributions
 rac-0.004
iden-0.004
 tr-0.004
 Oscar-0.004
 bulldo-0.004
Tokeners
Token ID364
Feature activation+9.71e-03

Pos logit contributions
ishment+0.217
 communicate+0.202
cks+0.193
 fade+0.192
 generation+0.192
Neg logit contributions
 rac-0.224
iden-0.216
 Oscar-0.199
lic-0.198
 bulldo-0.198
Token.
Token ID13
Feature activation-3.35e-06

Pos logit contributions
ishment+0.020
ations+0.020
cks+0.020
ility+0.019
immer+0.019
Neg logit contributions
 rac-0.013
 tr-0.012
 M-0.012
 bulldo-0.011
iden-0.011
Token Wh
Token ID854
Feature activation+1.42e-03

Feature not active on previous token; no predictions were affected.
Tokenisk
Token ID1984
Feature activation+1.30e-01

Pos logit contributions
 communicate+0.001
 generation+0.001
ishment+0.001
 fade+0.001
ility+0.001
Neg logit contributions
 rac-0.003
iden-0.003
 bulldo-0.003
 Oscar-0.003
 coy-0.003
Tokeners
Token ID364
Feature activation+9.42e-03

Pos logit contributions
ishment+0.212
 communicate+0.198
 generation+0.192
 fade+0.189
cks+0.189
Neg logit contributions
 rac-0.223
iden-0.215
lic-0.200
 tr-0.199
 Now-0.199
Token allow
Token ID1249
Feature activation+3.53e-02

Pos logit contributions
cks+0.006
en+0.006
 communicate+0.006
ations+0.006
pole+0.006
Neg logit contributions
 rac-0.005
iden-0.005
 tr-0.004
 coy-0.004
 Oscar-0.004
Token nature
Token ID3450
Feature activation+3.81e-02

Pos logit contributions
pole+0.063
immer+0.060
cks+0.060
ishment+0.059
ist+0.058
Neg logit contributions
 rac-0.055
 M-0.054
 Oscar-0.052
iden-0.050
 coy-0.049
Token to
Token ID284
Feature activation+3.42e-03

Pos logit contributions
ishment+0.095
cks+0.094
ility+0.092
immer+0.092
 generation+0.092
Neg logit contributions
 rac-0.032
 tr-0.029
 M-0.028
 sp-0.027
iden-0.026
Token take
Token ID1011
Feature activation-2.67e-02

Pos logit contributions
ishment+0.007
ations+0.006
 generation+0.006
pole+0.006
cks+0.006
Neg logit contributions
 rac-0.005
 sp-0.005
 tr-0.005
 M-0.005
 big-0.004
Token its
Token ID663
Feature activation+2.52e-04

Pos logit contributions
 rac+0.038
 Water+0.035
 Now+0.035
 M+0.034
 Oscar+0.033
Neg logit contributions
pole-0.051
ishment-0.050
cks-0.048
 communicate-0.048
ations-0.048
Token course
Token ID1781
Feature activation+1.31e-01

Pos logit contributions
en+0.000
cks+0.000
ility+0.000
ishment+0.000
pole+0.000
Neg logit contributions
 rac-0.000
 sp-0.000
 tr-0.000
 M-0.000
 coy-0.000
Token."
Token ID526
Feature activation-8.10e-03

Pos logit contributions
cks+0.257
ility+0.257
 communicate+0.255
ishment+0.246
 generation+0.245
Neg logit contributions
 rac-0.163
 tr-0.161
 M-0.159
 sp-0.156
iden-0.151
Token
Token ID220
Feature activation+4.06e-03

Pos logit contributions
 rac+0.006
iden+0.006
 Oscar+0.005
 M+0.005
 tr+0.005
Neg logit contributions
 communicate-0.020
cks-0.020
ations-0.020
en-0.020
pole-0.020
Token\n
Token ID198
Feature activation-5.73e-03

Pos logit contributions
 communicate+0.010
cks+0.010
en+0.010
ations+0.010
ishment+0.010
Neg logit contributions
 rac-0.003
iden-0.003
 Oscar-0.003
 M-0.002
 tr-0.002
Token\n
Token ID198
Feature activation-5.48e-03

Pos logit contributions
 rac+0.005
iden+0.005
 tr+0.004
 coy+0.004
 bulldo+0.004
Neg logit contributions
 communicate-0.013
cks-0.013
ations-0.013
pole-0.013
 generation-0.013
TokenL
Token ID43
Feature activation+1.54e-02

Pos logit contributions
iden+0.006
 rac+0.006
 Oscar+0.006
 tr+0.006
 coy+0.006
Neg logit contributions
 communicate-0.011
cks-0.011
en-0.011
ishment-0.011
ations-0.011
Token nos
Token ID43630
Feature activation-6.24e-02

Pos logit contributions
 communicate+0.060
 anymore+0.060
ie+0.060
ishment+0.059
ations+0.058
Neg logit contributions
iden-0.041
 bulldo-0.038
 rac-0.038
 Oscar-0.036
lic-0.036
Tokeny
Token ID88
Feature activation+6.99e-02

Pos logit contributions
 rac+0.145
iden+0.142
 coy+0.135
 M+0.135
 tr+0.135
Neg logit contributions
 communicate-0.064
cks-0.060
 generation-0.058
ations-0.057
pole-0.057
Token cat
Token ID3797
Feature activation+1.07e-02

Pos logit contributions
 communicate+0.131
en+0.129
 generation+0.128
ations+0.128
ishment+0.127
Neg logit contributions
iden-0.097
 rac-0.094
 Oscar-0.088
 bulldo-0.086
 coy-0.085
Token,
Token ID11
Feature activation+6.30e-03

Pos logit contributions
 communicate+0.019
cks+0.019
ations+0.018
ishment+0.018
 generation+0.018
Neg logit contributions
 rac-0.017
iden-0.016
 tr-0.015
 Oscar-0.015
 coy-0.015
Token Wh
Token ID854
Feature activation+3.95e-03

Pos logit contributions
 communicate+0.011
cks+0.011
ations+0.011
ishment+0.011
en+0.010
Neg logit contributions
 rac-0.010
iden-0.010
 M-0.009
 Oscar-0.009
 tr-0.009
Tokenisk
Token ID1984
Feature activation+1.30e-01

Pos logit contributions
 generation+0.004
 communicate+0.004
cks+0.004
ishment+0.004
pole+0.004
Neg logit contributions
 rac-0.009
 tr-0.008
iden-0.008
Carl-0.008
 Oscar-0.008
Tokeners
Token ID364
Feature activation+1.09e-02

Pos logit contributions
ishment+0.235
 communicate+0.210
 generation+0.206
cks+0.204
 fade+0.202
Neg logit contributions
 rac-0.208
iden-0.199
lic-0.187
 tr-0.185
 Now-0.183
Token,
Token ID11
Feature activation+6.01e-03

Pos logit contributions
ishment+0.024
cks+0.024
 generation+0.023
ations+0.023
angles+0.022
Neg logit contributions
 rac-0.014
 tr-0.012
 M-0.011
 sp-0.011
 bulldo-0.011
Token saw
Token ID2497
Feature activation-7.57e-03

Pos logit contributions
 communicate+0.012
ishment+0.012
cks+0.012
 generation+0.012
ations+0.012
Neg logit contributions
 rac-0.008
iden-0.007
 M-0.007
 tr-0.007
 sp-0.007
Token her
Token ID607
Feature activation+1.87e-02

Pos logit contributions
 rac+0.014
 M+0.013
 Oscar+0.012
 Water+0.012
 sp+0.012
Neg logit contributions
ishment-0.012
 generation-0.012
 communicate-0.012
pole-0.012
immer-0.012
Token searching
Token ID10342
Feature activation-4.47e-02

Pos logit contributions
 communicate+0.034
cks+0.034
ations+0.033
en+0.033
ishment+0.033
Neg logit contributions
iden-0.027
 rac-0.027
 Oscar-0.024
 bulldo-0.024
 coy-0.024
Token\n
Token ID198
Feature activation-1.62e-02

Pos logit contributions
 rac+0.012
iden+0.011
 coy+0.010
 tr+0.010
 bulldo+0.010
Neg logit contributions
cks-0.025
 communicate-0.025
pole-0.025
ations-0.024
 generation-0.024
Token\n
Token ID198
Feature activation-1.50e-02

Pos logit contributions
 rac+0.018
iden+0.018
 bulldo+0.017
 tr+0.017
 sp+0.017
Neg logit contributions
 communicate-0.034
pole-0.033
 generation-0.033
cks-0.033
ations-0.032
TokenJake
Token ID43930
Feature activation-3.93e-02

Pos logit contributions
 rac+0.021
iden+0.020
 bulldo+0.020
 tr+0.019
 sp+0.019
Neg logit contributions
 communicate-0.028
 generation-0.027
pole-0.027
ations-0.026
cks-0.026
Token was
Token ID373
Feature activation-1.72e-02

Pos logit contributions
iden+0.048
 rac+0.048
 tr+0.043
 Oscar+0.043
 M+0.042
Neg logit contributions
 communicate-0.080
cks-0.080
ations-0.078
 generation-0.078
ishment-0.078
Token over
Token ID625
Feature activation-3.69e-03

Pos logit contributions
iden+0.026
 rac+0.026
 tr+0.024
 Oscar+0.024
 bulldo+0.023
Neg logit contributions
 communicate-0.030
cks-0.030
pole-0.029
en-0.029
 generation-0.029
Tokenjoy
Token ID2633
Feature activation-1.17e-01

Pos logit contributions
iden+0.004
 rac+0.003
 bulldo+0.003
 tr+0.003
 coy+0.003
Neg logit contributions
 communicate-0.008
en-0.008
ishment-0.008
 generation-0.008
 anymore-0.008
Tokened
Token ID276
Feature activation-3.33e-03

Pos logit contributions
 rac+0.109
iden+0.107
 Oscar+0.094
 bulldo+0.093
 coy+0.093
Neg logit contributions
 communicate-0.269
en-0.267
cks-0.267
pole-0.265
ations-0.264
Token and
Token ID290
Feature activation-7.59e-03

Pos logit contributions
 rac+0.005
iden+0.005
 Oscar+0.005
 tr+0.005
 M+0.005
Neg logit contributions
 communicate-0.006
cks-0.006
ations-0.005
 generation-0.005
pole-0.005
Token thanked
Token ID26280
Feature activation+4.05e-02

Pos logit contributions
 rac+0.011
iden+0.011
 bulldo+0.010
 Water+0.009
Fel+0.009
Neg logit contributions
 communicate-0.014
pole-0.013
en-0.013
ie-0.013
ck-0.013
Token his
Token ID465
Feature activation-1.02e-02

Pos logit contributions
cks+0.072
 communicate+0.072
pole+0.071
ations+0.070
ishment+0.070
Neg logit contributions
 rac-0.061
iden-0.061
 M-0.056
 Oscar-0.056
 tr-0.055
Token mom
Token ID1995
Feature activation-4.90e-03

Pos logit contributions
iden+0.013
 rac+0.013
lic+0.013
Spl+0.012
 bulldo+0.012
Neg logit contributions
 communicate-0.020
ishment-0.019
 generation-0.018
en-0.018
ie-0.018
Tokenrobe
Token ID25481
Feature activation+3.38e-02

Pos logit contributions
iden+0.034
 rac+0.033
 coy+0.031
 bulldo+0.031
 Oscar+0.030
Neg logit contributions
 communicate-0.045
ations-0.044
en-0.044
cks-0.044
ishment-0.043
Token and
Token ID290
Feature activation-9.73e-04

Pos logit contributions
cks+0.068
 communicate+0.064
ishment+0.062
ations+0.062
en+0.061
Neg logit contributions
 rac-0.048
iden-0.047
 tr-0.044
 M-0.043
 Oscar-0.043
Token wear
Token ID5806
Feature activation+7.42e-04

Pos logit contributions
iden+0.002
 rac+0.002
 coy+0.002
 bulldo+0.002
 Oscar+0.002
Neg logit contributions
 communicate-0.001
cks-0.001
ations-0.001
 generation-0.001
en-0.001
Token her
Token ID607
Feature activation+1.16e-02

Pos logit contributions
 communicate+0.001
cks+0.001
en+0.001
ations+0.001
 generation+0.001
Neg logit contributions
 rac-0.001
iden-0.001
 bulldo-0.001
 coy-0.001
 Oscar-0.001
Token p
Token ID279
Feature activation-6.27e-02

Pos logit contributions
ations+0.022
 communicate+0.022
 anymore+0.022
ie+0.022
en+0.022
Neg logit contributions
iden-0.015
 rac-0.014
 bulldo-0.014
 coy-0.013
 Oscar-0.013
Tokenaj
Token ID1228
Feature activation-1.15e-01

Pos logit contributions
 rac+0.063
iden+0.059
 coy+0.055
 bulldo+0.054
 Oscar+0.052
Neg logit contributions
ie-0.145
ations-0.142
ishment-0.142
cks-0.141
en-0.140
Tokenamas
Token ID17485
Feature activation-1.67e-02

Pos logit contributions
 rac+0.163
iden+0.160
 coy+0.151
 bulldo+0.148
 Oscar+0.146
Neg logit contributions
en-0.212
ations-0.212
ishment-0.208
 communicate-0.207
pole-0.206
Token instead
Token ID2427
Feature activation-4.37e-02

Pos logit contributions
 rac+0.023
iden+0.023
 M+0.022
 tr+0.022
lic+0.021
Neg logit contributions
cks-0.034
 communicate-0.032
ishment-0.032
ations-0.031
 generation-0.030
Token.
Token ID13
Feature activation-6.62e-03

Pos logit contributions
 rac+0.064
 M+0.062
iden+0.061
 tr+0.057
 Oscar+0.057
Neg logit contributions
cks-0.084
ishment-0.083
 communicate-0.082
ations-0.078
 generation-0.077
Token But
Token ID887
Feature activation-8.88e-03

Pos logit contributions
 rac+0.009
iden+0.009
 coy+0.008
 bulldo+0.008
 Oscar+0.007
Neg logit contributions
 anymore-0.013
pole-0.013
ishment-0.013
en-0.013
 communicate-0.012
Token Lily
Token ID20037
Feature activation+8.02e-03

Pos logit contributions
 M+0.009
 rac+0.007
 sp+0.007
 Oscar+0.007
 tr+0.007
Neg logit contributions
cks-0.023
 communicate-0.022
cker-0.022
ations-0.022
 generation-0.022
Token and
Token ID290
Feature activation-1.57e-03

Pos logit contributions
cks+0.045
 communicate+0.043
ility+0.041
ck+0.041
ations+0.041
Neg logit contributions
 tr-0.027
 rac-0.026
iden-0.026
 M-0.024
 sp-0.023
Token changed
Token ID3421
Feature activation+3.87e-02

Pos logit contributions
iden+0.002
 rac+0.002
 bulldo+0.002
 coy+0.002
 Oscar+0.002
Neg logit contributions
 communicate-0.003
ations-0.003
cks-0.003
 generation-0.003
en-0.003
Token into
Token ID656
Feature activation+4.87e-03

Pos logit contributions
 communicate+0.056
pole+0.056
cks+0.054
ility+0.054
 generation+0.054
Neg logit contributions
 rac-0.070
iden-0.068
 M-0.068
 tr-0.068
 Oscar-0.065
Token her
Token ID607
Feature activation+1.29e-02

Pos logit contributions
 communicate+0.008
pole+0.008
cks+0.008
ations+0.007
ishment+0.007
Neg logit contributions
 rac-0.008
iden-0.008
 tr-0.008
 M-0.008
 Oscar-0.008
Token p
Token ID279
Feature activation-6.29e-02

Pos logit contributions
 communicate+0.024
ishment+0.024
 generation+0.023
ations+0.023
cks+0.023
Neg logit contributions
iden-0.018
 rac-0.018
 Oscar-0.017
 coy-0.016
lic-0.016
Tokenaj
Token ID1228
Feature activation-1.15e-01

Pos logit contributions
 rac+0.065
iden+0.061
 Oscar+0.057
 coy+0.056
 M+0.055
Neg logit contributions
cks-0.141
ishment-0.141
ations-0.140
ie-0.139
en-0.137
Tokenamas
Token ID17485
Feature activation-1.71e-02

Pos logit contributions
 rac+0.163
iden+0.162
 Oscar+0.150
 coy+0.150
 bulldo+0.148
Neg logit contributions
en-0.213
ishment-0.210
ations-0.210
pole-0.202
cks-0.202
Token.
Token ID13
Feature activation-5.45e-03

Pos logit contributions
 rac+0.027
iden+0.026
 tr+0.025
 M+0.025
 Oscar+0.024
Neg logit contributions
cks-0.030
 communicate-0.030
ations-0.029
ishment-0.029
 generation-0.028
Token She
Token ID1375
Feature activation+4.60e-03

Pos logit contributions
iden+0.010
 rac+0.010
 bulldo+0.010
 coy+0.009
 tr+0.009
Neg logit contributions
 generation-0.008
en-0.007
hip-0.007
 anymore-0.007
ility-0.007
Token was
Token ID373
Feature activation-1.10e-02

Pos logit contributions
ie+0.008
en+0.008
 communicate+0.008
ck+0.008
 fade+0.008
Neg logit contributions
 rac-0.007
iden-0.007
 bulldo-0.007
 coy-0.006
 Water-0.006
Token so
Token ID523
Feature activation-1.13e-02

Pos logit contributions
 rac+0.015
iden+0.015
 M+0.014
 Oscar+0.014
 tr+0.014
Neg logit contributions
 communicate-0.021
cks-0.020
ations-0.020
 generation-0.020
ishment-0.020
Token He
Token ID679
Feature activation-1.18e-02

Pos logit contributions
 anymore+0.001
 generation+0.001
ishment+0.001
en+0.001
hip+0.000
Neg logit contributions
iden-0.000
 rac-0.000
Fel-0.000
 bisc-0.000
 coy-0.000
Token put
Token ID1234
Feature activation-2.74e-02

Pos logit contributions
 rac+0.020
iden+0.020
 Water+0.020
lic+0.019
 bisc+0.019
Neg logit contributions
en-0.018
ie-0.017
ck-0.017
 communicate-0.017
 fade-0.016
Token on
Token ID319
Feature activation+1.33e-02

Pos logit contributions
 rac+0.040
iden+0.040
 tr+0.037
 Oscar+0.037
 M+0.037
Neg logit contributions
 communicate-0.051
cks-0.049
 generation-0.049
pole-0.049
ations-0.048
Token his
Token ID465
Feature activation-3.08e-03

Pos logit contributions
en+0.022
cks+0.022
ations+0.022
 communicate+0.022
 generation+0.021
Neg logit contributions
iden-0.021
 rac-0.021
 coy-0.020
lic-0.020
 Oscar-0.019
Token p
Token ID279
Feature activation-5.91e-02

Pos logit contributions
iden+0.004
 rac+0.004
 Oscar+0.004
 coy+0.004
lic+0.004
Neg logit contributions
 communicate-0.006
ations-0.006
cks-0.006
 generation-0.005
ishment-0.005
Tokenaj
Token ID1228
Feature activation-1.14e-01

Pos logit contributions
 rac+0.058
iden+0.056
 Oscar+0.054
 coy+0.052
 M+0.050
Neg logit contributions
ishment-0.137
cks-0.137
ist-0.135
ations-0.134
en-0.132
Tokenamas
Token ID17485
Feature activation-1.63e-02

Pos logit contributions
 rac+0.168
iden+0.166
 coy+0.158
 Oscar+0.154
 bulldo+0.152
Neg logit contributions
en-0.208
ations-0.205
ishment-0.202
cks-0.199
 communicate-0.199
Token and
Token ID290
Feature activation+1.69e-03

Pos logit contributions
 rac+0.023
iden+0.023
 tr+0.021
 M+0.021
 Oscar+0.021
Neg logit contributions
cks-0.032
 communicate-0.031
ations-0.031
ishment-0.030
 generation-0.030
Token asked
Token ID1965
Feature activation+1.53e-02

Pos logit contributions
 communicate+0.003
ie+0.003
en+0.003
ck+0.003
ations+0.002
Neg logit contributions
iden-0.003
 rac-0.003
 coy-0.003
lic-0.003
 bulldo-0.003
Token his
Token ID465
Feature activation-3.35e-03

Pos logit contributions
 communicate+0.028
cks+0.027
en+0.027
ations+0.027
 generation+0.027
Neg logit contributions
iden-0.022
 rac-0.022
 Oscar-0.020
 coy-0.020
 tr-0.020
Token mom
Token ID1995
Feature activation-2.05e-03

Pos logit contributions
iden+0.006
lic+0.006
 Now+0.006
Fel+0.006
 Oscar+0.006
Neg logit contributions
 communicate-0.005
 anymore-0.004
 someday-0.004
ie-0.004
 appearance-0.004
Token and
Token ID290
Feature activation-1.28e-03

Pos logit contributions
cks+0.021
 communicate+0.021
ishment+0.021
ations+0.020
 fade+0.020
Neg logit contributions
iden-0.018
 rac-0.018
 M-0.017
 Oscar-0.017
 tr-0.016
Token Tim
Token ID5045
Feature activation-1.35e-02

Pos logit contributions
 rac+0.002
iden+0.002
 tr+0.002
 Oscar+0.002
 M+0.002
Neg logit contributions
cks-0.002
 communicate-0.002
ations-0.002
pole-0.002
ishment-0.002
Tokenmy
Token ID1820
Feature activation-7.45e-03

Pos logit contributions
iden+0.023
 rac+0.023
 M+0.021
 tr+0.021
 Oscar+0.020
Neg logit contributions
cks-0.022
 communicate-0.022
 generation-0.022
ility-0.021
ishment-0.021
Token was
Token ID373
Feature activation-1.16e-02

Pos logit contributions
iden+0.008
 rac+0.008
 tr+0.007
 Oscar+0.007
 bulldo+0.007
Neg logit contributions
cks-0.016
 communicate-0.016
ishment-0.016
ility-0.016
ations-0.016
Token over
Token ID625
Feature activation-3.81e-03

Pos logit contributions
iden+0.015
 rac+0.015
 M+0.013
 Oscar+0.013
 coy+0.013
Neg logit contributions
ations-0.024
 communicate-0.023
cks-0.023
ishment-0.023
en-0.023
Tokenjoy
Token ID2633
Feature activation-1.14e-01

Pos logit contributions
iden+0.004
 rac+0.004
 tr+0.004
 coy+0.004
 Oscar+0.004
Neg logit contributions
 communicate-0.008
 generation-0.008
en-0.008
cks-0.008
ishment-0.008
Tokened
Token ID276
Feature activation-2.62e-03

Pos logit contributions
 rac+0.106
iden+0.105
 Oscar+0.092
 tr+0.091
 coy+0.091
Neg logit contributions
 communicate-0.264
cks-0.262
en-0.260
ations-0.259
pole-0.259
Token.
Token ID13
Feature activation-4.37e-03

Pos logit contributions
 rac+0.004
iden+0.004
 Oscar+0.004
 tr+0.004
 M+0.004
Neg logit contributions
 communicate-0.004
cks-0.004
 generation-0.004
pole-0.004
ations-0.004
Token He
Token ID679
Feature activation-1.48e-02

Pos logit contributions
 rac+0.007
iden+0.007
Fel+0.007
 coy+0.007
Carl+0.006
Neg logit contributions
pole-0.007
 generation-0.007
ishment-0.007
en-0.007
hip-0.007
Token and
Token ID290
Feature activation-1.97e-03

Pos logit contributions
 rac+0.023
iden+0.022
 Water+0.021
 Oscar+0.021
 bulldo+0.020
Neg logit contributions
 communicate-0.025
en-0.025
pole-0.025
ie-0.024
ck-0.024
Token his
Token ID465
Feature activation-6.17e-03

Pos logit contributions
 rac+0.002
iden+0.002
 Oscar+0.002
 tr+0.002
 M+0.002
Neg logit contributions
 communicate-0.004
cks-0.004
ations-0.004
ility-0.004
 generation-0.004
Token and
Token ID290
Feature activation-7.27e-04

Pos logit contributions
 communicate+0.001
cks+0.001
pole+0.001
ations+0.001
 generation+0.001
Neg logit contributions
 rac-0.002
iden-0.002
 tr-0.002
 Oscar-0.002
 M-0.002
Token dressed
Token ID12049
Feature activation-5.14e-02

Pos logit contributions
iden+0.001
 rac+0.001
 coy+0.001
 bulldo+0.001
 Oscar+0.001
Neg logit contributions
en-0.001
 communicate-0.001
ations-0.001
ishment-0.001
ie-0.001
Token in
Token ID287
Feature activation+9.20e-03

Pos logit contributions
iden+0.089
 rac+0.088
 tr+0.086
 M+0.085
 sp+0.083
Neg logit contributions
 communicate-0.080
cks-0.079
 generation-0.078
pole-0.077
ility-0.075
Token fresh
Token ID4713
Feature activation-2.79e-02

Pos logit contributions
 communicate+0.012
cks+0.012
en+0.012
ations+0.012
 generation+0.012
Neg logit contributions
iden-0.018
 rac-0.018
 Oscar-0.017
 bulldo-0.017
 coy-0.017
Token p
Token ID279
Feature activation-6.25e-02

Pos logit contributions
iden+0.039
 rac+0.039
 coy+0.035
 bulldo+0.035
 Oscar+0.034
Neg logit contributions
en-0.053
cks-0.051
ations-0.051
 communicate-0.051
ishment-0.051
Tokenaj
Token ID1228
Feature activation-1.14e-01

Pos logit contributions
 rac+0.088
iden+0.083
 coy+0.081
 bulldo+0.077
 tr+0.076
Neg logit contributions
cks-0.120
ishment-0.119
en-0.119
ations-0.118
ie-0.116
Tokenamas
Token ID17485
Feature activation-1.52e-02

Pos logit contributions
 rac+0.164
iden+0.161
 coy+0.152
 bulldo+0.148
 Oscar+0.146
Neg logit contributions
en-0.213
ishment-0.207
ations-0.206
cks-0.200
 communicate-0.200
Token.
Token ID13
Feature activation-3.87e-03

Pos logit contributions
 rac+0.023
iden+0.022
 tr+0.021
 M+0.021
 Oscar+0.020
Neg logit contributions
cks-0.029
 communicate-0.028
ations-0.027
ishment-0.027
ck-0.026
Token↵
Token ID198
Feature activation-1.17e-02

Pos logit contributions
 rac+0.007
iden+0.007
 coy+0.007
 bulldo+0.006
Carl+0.006
Neg logit contributions
 generation-0.006
en-0.006
ishment-0.005
hip-0.005
 Him-0.005
Token↵
Token ID198
Feature activation-1.18e-02

Pos logit contributions
 rac+0.016
iden+0.015
 coy+0.014
 sp+0.014
 tr+0.014
Neg logit contributions
 communicate-0.022
ide-0.022
 generation-0.021
pole-0.021
cks-0.021
TokenThat
Token ID2504
Feature activation+1.74e-02

Pos logit contributions
 rac+0.021
iden+0.020
 Oscar+0.020
 bulldo+0.019
 coy+0.019
Neg logit contributions
 communicate-0.018
 generation-0.018
pole-0.018
hip-0.017
ations-0.017
Token.
Token ID13
Feature activation-3.50e-03

Pos logit contributions
en+0.051
 communicate+0.050
pole+0.050
ie+0.049
ations+0.049
Neg logit contributions
iden-0.047
 rac-0.045
 coy-0.043
 Water-0.043
 Oscar-0.043
Token Tim
Token ID5045
Feature activation-2.05e-02

Pos logit contributions
 rac+0.004
iden+0.004
 coy+0.004
 tr+0.004
 bulldo+0.004
Neg logit contributions
pole-0.007
 generation-0.007
 communicate-0.007
cks-0.007
en-0.007
Tokenmy
Token ID1820
Feature activation-1.20e-02

Pos logit contributions
 rac+0.036
iden+0.036
 tr+0.033
 Oscar+0.033
 M+0.033
Neg logit contributions
 communicate-0.031
cks-0.031
 generation-0.031
ations-0.030
ishment-0.030
Token was
Token ID373
Feature activation-2.83e-02

Pos logit contributions
iden+0.014
 rac+0.014
 tr+0.013
 Oscar+0.013
 bulldo+0.012
Neg logit contributions
 communicate-0.025
cks-0.025
 generation-0.024
ishment-0.024
ations-0.024
Token over
Token ID625
Feature activation-9.45e-03

Pos logit contributions
iden+0.043
 rac+0.043
 tr+0.039
 Oscar+0.039
 coy+0.039
Neg logit contributions
 communicate-0.049
cks-0.049
pole-0.048
en-0.048
ations-0.047
Tokenjoy
Token ID2633
Feature activation-1.14e-01

Pos logit contributions
iden+0.013
 rac+0.012
lic+0.012
 tr+0.011
 coy+0.011
Neg logit contributions
en-0.018
 communicate-0.018
 anymore-0.018
 generation-0.017
ishment-0.017
Tokened
Token ID276
Feature activation-1.17e-03

Pos logit contributions
 rac+0.111
iden+0.110
 Oscar+0.096
 tr+0.095
 coy+0.094
Neg logit contributions
 communicate-0.260
cks-0.258
ations-0.254
en-0.253
pole-0.253
Token and
Token ID290
Feature activation-1.20e-02

Pos logit contributions
 rac+0.002
iden+0.002
 Oscar+0.002
 tr+0.002
 M+0.002
Neg logit contributions
 communicate-0.002
cks-0.002
 generation-0.002
ations-0.002
pole-0.002
Token said
Token ID531
Feature activation+1.04e-03

Pos logit contributions
iden+0.017
 rac+0.017
 bulldo+0.016
Fel+0.015
 Oscar+0.015
Neg logit contributions
 communicate-0.022
pole-0.021
cks-0.020
en-0.020
ck-0.020
Token,
Token ID11
Feature activation-2.45e-03

Pos logit contributions
 communicate+0.002
cks+0.002
pole+0.002
en+0.002
 anymore+0.002
Neg logit contributions
 rac-0.002
iden-0.002
 coy-0.002
 bulldo-0.002
 Oscar-0.002
Token "
Token ID366
Feature activation+1.39e-04

Pos logit contributions
 rac+0.003
iden+0.003
 coy+0.002
 Oscar+0.002
 bulldo+0.002
Neg logit contributions
 communicate-0.005
pole-0.005
cks-0.005
en-0.005
 generation-0.005
TokenPeter
Token ID19727
Feature activation-1.57e-02

Pos logit contributions
 rac+0.030
iden+0.029
 bulldo+0.029
 tr+0.029
 Oscar+0.028
Neg logit contributions
 communicate-0.029
 generation-0.029
pole-0.028
cks-0.027
ations-0.027
Token and
Token ID290
Feature activation-1.09e-02

Pos logit contributions
iden+0.018
 rac+0.018
 tr+0.017
 M+0.016
 Oscar+0.016
Neg logit contributions
 communicate-0.033
cks-0.033
 generation-0.032
ishment-0.032
ility-0.032
Token Lucy
Token ID22162
Feature activation+8.55e-03

Pos logit contributions
 rac+0.015
iden+0.015
 Oscar+0.013
 bulldo+0.013
 tr+0.013
Neg logit contributions
 communicate-0.021
cks-0.020
en-0.020
ations-0.020
 generation-0.020
Token were
Token ID547
Feature activation+2.59e-02

Pos logit contributions
cks+0.017
ishment+0.017
 communicate+0.017
 generation+0.017
ations+0.017
Neg logit contributions
 rac-0.011
 tr-0.010
iden-0.010
 M-0.010
 sp-0.010
Token over
Token ID625
Feature activation-3.24e-03

Pos logit contributions
 communicate+0.047
cks+0.047
pole+0.046
ations+0.046
en+0.045
Neg logit contributions
iden-0.038
 rac-0.038
 Oscar-0.034
 tr-0.034
 coy-0.034
Tokenjoy
Token ID2633
Feature activation-1.13e-01

Pos logit contributions
iden+0.004
 rac+0.003
lic+0.003
 bulldo+0.003
 coy+0.003
Neg logit contributions
 communicate-0.007
en-0.007
 generation-0.007
 anymore-0.007
ishment-0.007
Tokened
Token ID276
Feature activation+3.40e-04

Pos logit contributions
 rac+0.085
iden+0.084
 tr+0.068
 Oscar+0.066
 coy+0.066
Neg logit contributions
 communicate-0.288
cks-0.285
ishment-0.282
ations-0.280
 generation-0.279
Token by
Token ID416
Feature activation-5.65e-02

Pos logit contributions
cks+0.001
 communicate+0.001
 generation+0.001
pole+0.001
ations+0.001
Neg logit contributions
 rac-0.001
iden-0.001
 M-0.000
 tr-0.000
 Oscar-0.000
Token the
Token ID262
Feature activation-5.95e-03

Pos logit contributions
 M+0.093
 rac+0.091
 Oscar+0.087
 tr+0.086
iden+0.085
Neg logit contributions
pole-0.094
 communicate-0.092
ations-0.092
immer-0.091
cks-0.091
Token beetle
Token ID45489
Feature activation+1.31e-02

Pos logit contributions
iden+0.010
 rac+0.010
 Oscar+0.009
 bulldo+0.009
 coy+0.009
Neg logit contributions
 communicate-0.010
ations-0.009
cks-0.009
 generation-0.009
en-0.009
Token's
Token ID338
Feature activation+2.44e-02

Pos logit contributions
cks+0.025
ishment+0.024
 communicate+0.024
 generation+0.023
en+0.023
Neg logit contributions
 rac-0.018
 tr-0.018
 M-0.017
 sp-0.016
iden-0.016
Token put
Token ID1234
Feature activation-2.80e-02

Pos logit contributions
iden+0.014
 rac+0.014
 Water+0.014
 coy+0.013
 bulldo+0.013
Neg logit contributions
en-0.013
ie-0.013
ck-0.013
 communicate-0.012
 fade-0.012
Token on
Token ID319
Feature activation+1.21e-02

Pos logit contributions
 rac+0.043
iden+0.042
 Oscar+0.040
 M+0.040
 tr+0.039
Neg logit contributions
 communicate-0.050
cks-0.048
 generation-0.048
pole-0.048
ishment-0.048
Token her
Token ID607
Feature activation+1.21e-02

Pos logit contributions
 communicate+0.012
cks+0.012
en+0.012
ations+0.012
 generation+0.012
Neg logit contributions
iden-0.027
 rac-0.027
 coy-0.025
 bulldo-0.025
 Oscar-0.025
Token special
Token ID2041
Feature activation+1.17e-02

Pos logit contributions
 communicate+0.023
cks+0.022
ations+0.022
ishment+0.022
 anymore+0.022
Neg logit contributions
iden-0.016
 rac-0.015
lic-0.015
 coy-0.014
 Now-0.014
Token p
Token ID279
Feature activation-6.19e-02

Pos logit contributions
 communicate+0.022
en+0.021
ations+0.021
cks+0.021
 generation+0.021
Neg logit contributions
iden-0.015
 rac-0.015
 Oscar-0.014
 coy-0.014
lic-0.014
Tokenaj
Token ID1228
Feature activation-1.13e-01

Pos logit contributions
 rac+0.072
iden+0.070
 coy+0.067
 M+0.061
 Oscar+0.061
Neg logit contributions
ist-0.129
ishment-0.127
cks-0.125
ie-0.123
en-0.123
Tokenamas
Token ID17485
Feature activation-1.47e-02

Pos logit contributions
 rac+0.175
iden+0.173
 coy+0.167
 bulldo+0.161
 Oscar+0.160
Neg logit contributions
en-0.200
ations-0.196
ishment-0.192
ist-0.189
 communicate-0.186
Token and
Token ID290
Feature activation-1.46e-03

Pos logit contributions
 rac+0.021
iden+0.020
 tr+0.019
 M+0.019
 Oscar+0.018
Neg logit contributions
cks-0.029
 communicate-0.028
ations-0.028
ishment-0.027
 generation-0.027
Token went
Token ID1816
Feature activation+1.32e-02

Pos logit contributions
iden+0.002
 rac+0.002
 bulldo+0.002
 coy+0.002
lic+0.002
Neg logit contributions
 communicate-0.002
ie-0.002
en-0.002
ations-0.002
 generation-0.002
Token to
Token ID284
Feature activation-3.07e-03

Pos logit contributions
en+0.018
ie+0.017
 anymore+0.016
ations+0.016
cks+0.016
Neg logit contributions
iden-0.026
 rac-0.024
 coy-0.024
 bulldo-0.023
 Oscar-0.023
Token bed
Token ID3996
Feature activation-3.20e-02

Pos logit contributions
iden+0.004
 rac+0.004
 coy+0.004
 bulldo+0.004
lic+0.004
Neg logit contributions
 communicate-0.006
en-0.006
cks-0.006
 generation-0.006
ations-0.006
Token was
Token ID373
Feature activation-9.40e-03

Pos logit contributions
 rac+0.009
iden+0.009
 Oscar+0.008
 tr+0.008
 coy+0.008
Neg logit contributions
 communicate-0.017
cks-0.017
ations-0.016
en-0.016
pole-0.016
Token still
Token ID991
Feature activation-1.08e-02

Pos logit contributions
 rac+0.015
iden+0.015
 Oscar+0.013
 tr+0.013
 coy+0.013
Neg logit contributions
 communicate-0.016
cks-0.016
ations-0.016
en-0.015
pole-0.015
Token wearing
Token ID5762
Feature activation-4.43e-02

Pos logit contributions
 rac+0.015
iden+0.015
 coy+0.013
 Oscar+0.013
 M+0.013
Neg logit contributions
 communicate-0.021
cks-0.020
 generation-0.020
en-0.020
ations-0.020
Token her
Token ID607
Feature activation+1.39e-02

Pos logit contributions
 rac+0.062
iden+0.061
 M+0.060
 tr+0.059
 sp+0.058
Neg logit contributions
cks-0.081
 communicate-0.081
pole-0.081
ations-0.080
ishment-0.079
Token p
Token ID279
Feature activation-5.96e-02

Pos logit contributions
 communicate+0.030
cks+0.030
ations+0.029
pole+0.029
ishment+0.029
Neg logit contributions
 rac-0.015
iden-0.015
 Oscar-0.013
 coy-0.013
 tr-0.013
Tokenaj
Token ID1228
Feature activation-1.13e-01

Pos logit contributions
 rac+0.053
iden+0.051
 Oscar+0.046
 bulldo+0.044
 coy+0.044
Neg logit contributions
cks-0.143
ishment-0.141
ations-0.141
en-0.139
ie-0.139
Tokenamas
Token ID17485
Feature activation-1.47e-02

Pos logit contributions
 rac+0.147
iden+0.145
 Oscar+0.132
 bulldo+0.132
 coy+0.132
Neg logit contributions
 communicate-0.221
ations-0.220
en-0.220
cks-0.219
ishment-0.218
Token.
Token ID13
Feature activation-3.63e-03

Pos logit contributions
 rac+0.020
iden+0.019
 tr+0.019
 M+0.018
 Oscar+0.018
Neg logit contributions
cks-0.031
 communicate-0.029
ishment-0.029
ations-0.028
ck-0.028
Token␣
Token ID220
Feature activation+9.37e-04

Pos logit contributions
 rac+0.006
iden+0.006
 bulldo+0.006
 coy+0.006
 Oscar+0.006
Neg logit contributions
ishment-0.005
pole-0.005
 generation-0.005
en-0.005
 anymore-0.005
Token↵
Token ID198
Feature activation-1.03e-02

Pos logit contributions
 communicate+0.002
cks+0.002
ations+0.002
en+0.002
ishment+0.002
Neg logit contributions
 rac-0.001
iden-0.001
 Oscar-0.001
 tr-0.001
 coy-0.001
Token↵
Token ID198
Feature activation-1.03e-02

Pos logit contributions
 rac+0.009
iden+0.009
 tr+0.008
 coy+0.008
 bulldo+0.008
Neg logit contributions
 communicate-0.024
cks-0.024
ations-0.024
pole-0.023
 generation-0.023
Token,
Token ID11
Feature activation+1.47e-03

Pos logit contributions
 rac+0.084
iden+0.082
 tr+0.078
 M+0.076
 sp+0.075
Neg logit contributions
cks-0.113
 communicate-0.112
ations-0.106
pole-0.106
 generation-0.105
Token got
Token ID1392
Feature activation+1.74e-02

Pos logit contributions
 communicate+0.003
cks+0.002
ations+0.002
en+0.002
 generation+0.002
Neg logit contributions
 rac-0.002
iden-0.002
 tr-0.002
 Oscar-0.002
 M-0.002
Token into
Token ID656
Feature activation+6.78e-03

Pos logit contributions
 communicate+0.026
ations+0.025
en+0.025
 generation+0.025
ie+0.025
Neg logit contributions
iden-0.030
 rac-0.030
 bulldo-0.028
 coy-0.028
 tr-0.028
Token her
Token ID607
Feature activation+1.40e-02

Pos logit contributions
 communicate+0.009
en+0.009
 anymore+0.009
ishment+0.009
cks+0.009
Neg logit contributions
iden-0.013
 rac-0.013
 bulldo-0.012
lic-0.012
 coy-0.012
Token p
Token ID279
Feature activation-6.05e-02

Pos logit contributions
ishment+0.024
 anymore+0.024
 generation+0.023
ations+0.023
ie+0.023
Neg logit contributions
iden-0.021
lic-0.020
 rac-0.020
 Oscar-0.019
 coy-0.019
Tokenaj
Token ID1228
Feature activation-1.13e-01

Pos logit contributions
 rac+0.057
iden+0.052
 Oscar+0.049
 coy+0.048
 bulldo+0.046
Neg logit contributions
ishment-0.143
cks-0.142
ations-0.141
ie-0.141
en-0.139
Tokenamas
Token ID17485
Feature activation-1.44e-02

Pos logit contributions
 rac+0.158
iden+0.155
 coy+0.145
 bulldo+0.144
 Oscar+0.143
Neg logit contributions
en-0.211
ishment-0.209
ations-0.207
pole-0.202
cks-0.201
Token,
Token ID11
Feature activation+9.80e-04

Pos logit contributions
 rac+0.021
iden+0.021
 tr+0.020
 M+0.019
 Oscar+0.019
Neg logit contributions
cks-0.027
 communicate-0.027
ations-0.026
 generation-0.026
ishment-0.026
Token and
Token ID290
Feature activation-4.55e-04

Pos logit contributions
 communicate+0.002
cks+0.002
ations+0.002
 generation+0.002
ility+0.002
Neg logit contributions
 rac-0.001
iden-0.001
 tr-0.001
 M-0.001
 Oscar-0.001
Token sn
Token ID3013
Feature activation-1.72e-02

Pos logit contributions
iden+0.001
 rac+0.001
 bulldo+0.001
 coy+0.001
 Oscar+0.001
Neg logit contributions
 communicate-0.001
en-0.001
 generation-0.001
cks-0.001
ations-0.001
Tokenugg
Token ID6837
Feature activation-3.81e-02

Pos logit contributions
 rac+0.017
iden+0.017
 coy+0.015
 tr+0.015
 bulldo+0.014
Neg logit contributions
 communicate-0.039
cks-0.039
en-0.039
ations-0.039
ishment-0.038
Token↵
Token ID198
Feature activation-1.15e-02

Pos logit contributions
 rac+0.014
iden+0.013
 coy+0.013
 bulldo+0.013
 tr+0.013
Neg logit contributions
 communicate-0.023
cks-0.023
pole-0.023
ie-0.023
 generation-0.022
TokenM
Token ID44
Feature activation-4.11e-02

Pos logit contributions
 rac+0.017
iden+0.017
 Oscar+0.016
 bulldo+0.016
 coy+0.016
Neg logit contributions
 communicate-0.021
pole-0.020
ations-0.020
cks-0.020
 generation-0.020
Tokenia
Token ID544
Feature activation-1.75e-02

Pos logit contributions
 rac+0.056
iden+0.055
 Oscar+0.052
 coy+0.051
 tr+0.051
Neg logit contributions
 communicate-0.078
cks-0.077
en-0.077
ations-0.077
pole-0.076
Token was
Token ID373
Feature activation-1.06e-02

Pos logit contributions
 rac+0.024
iden+0.023
 Oscar+0.021
 coy+0.021
 bulldo+0.021
Neg logit contributions
 communicate-0.033
cks-0.033
en-0.033
pole-0.033
ations-0.032
Token over
Token ID625
Feature activation-4.63e-03

Pos logit contributions
 rac+0.015
iden+0.014
 tr+0.013
 Oscar+0.013
 M+0.013
Neg logit contributions
 communicate-0.020
ations-0.020
cks-0.020
 generation-0.019
ishment-0.019
Tokenjoy
Token ID2633
Feature activation-1.13e-01

Pos logit contributions
 rac+0.003
iden+0.003
 Oscar+0.003
 tr+0.003
 coy+0.003
Neg logit contributions
 communicate-0.012
cks-0.012
ations-0.012
en-0.012
pole-0.012
Tokened
Token ID276
Feature activation-2.52e-03

Pos logit contributions
 rac+0.098
iden+0.097
 Oscar+0.083
 tr+0.083
 bulldo+0.082
Neg logit contributions
 communicate-0.270
cks-0.268
en-0.265
ations-0.265
pole-0.265
Token and
Token ID290
Feature activation-1.95e-03

Pos logit contributions
 rac+0.004
iden+0.004
 M+0.003
 Oscar+0.003
 tr+0.003
Neg logit contributions
cks-0.005
 communicate-0.005
 generation-0.005
ations-0.004
pole-0.004
Token thanked
Token ID26280
Feature activation+3.45e-02

Pos logit contributions
iden+0.003
 rac+0.003
 bulldo+0.003
 Oscar+0.003
 coy+0.003
Neg logit contributions
 communicate-0.003
pole-0.003
en-0.003
cks-0.003
ie-0.003
Token her
Token ID607
Feature activation+1.17e-02

Pos logit contributions
cks+0.063
ishment+0.059
pole+0.059
immer+0.059
cker+0.059
Neg logit contributions
 M-0.058
 rac-0.050
iden-0.049
 tr-0.048
 Oscar-0.048
Token friend
Token ID1545
Feature activation-7.65e-03

Pos logit contributions
 communicate+0.021
cks+0.021
pole+0.020
en+0.020
ations+0.020
Neg logit contributions
iden-0.017
 rac-0.017
 Oscar-0.015
 coy-0.015
 bulldo-0.015
Token distance
Token ID5253
Feature activation+8.55e-02

Pos logit contributions
 rac+0.013
iden+0.013
 Oscar+0.011
 tr+0.011
 coy+0.011
Neg logit contributions
 communicate-0.018
cks-0.018
ations-0.018
en-0.018
pole-0.018
Token.
Token ID13
Feature activation-8.15e-03

Pos logit contributions
 communicate+0.164
cks+0.161
 generation+0.156
ishment+0.156
ations+0.154
Neg logit contributions
 rac-0.118
 tr-0.113
iden-0.113
 M-0.112
 sp-0.106
Token Everyone
Token ID11075
Feature activation-3.78e-03

Pos logit contributions
iden+0.010
 rac+0.010
 bulldo+0.009
 tr+0.009
 coy+0.009
Neg logit contributions
 generation-0.017
pole-0.017
 communicate-0.016
en-0.016
ishment-0.016
Token was
Token ID373
Feature activation-2.97e-02

Pos logit contributions
iden+0.004
 rac+0.004
 tr+0.004
 M+0.004
 sp+0.004
Neg logit contributions
ishment-0.008
 generation-0.008
 communicate-0.008
cks-0.008
immer-0.008
Token over
Token ID625
Feature activation-9.89e-03

Pos logit contributions
 rac+0.037
iden+0.035
 coy+0.032
 Oscar+0.032
 M+0.032
Neg logit contributions
 communicate-0.060
cks-0.060
ishment-0.060
ations-0.060
 generation-0.059
Tokenjoy
Token ID2633
Feature activation-1.13e-01

Pos logit contributions
 rac+0.014
iden+0.013
 M+0.012
 Oscar+0.012
 tr+0.012
Neg logit contributions
cks-0.019
 communicate-0.019
pole-0.019
ations-0.018
ility-0.018
Tokened
Token ID276
Feature activation-8.49e-04

Pos logit contributions
iden+0.084
 rac+0.083
 tr+0.067
 Oscar+0.065
 coy+0.065
Neg logit contributions
 communicate-0.287
cks-0.284
ishment-0.282
 generation-0.278
ations-0.278
Token that
Token ID326
Feature activation+5.75e-03

Pos logit contributions
 rac+0.001
iden+0.001
 Oscar+0.001
 M+0.001
 tr+0.001
Neg logit contributions
cks-0.002
 communicate-0.002
 generation-0.002
ility-0.002
ations-0.002
Token the
Token ID262
Feature activation-1.21e-02

Pos logit contributions
cks+0.011
 communicate+0.011
ishment+0.011
ations+0.010
pole+0.010
Neg logit contributions
 rac-0.008
iden-0.008
 M-0.008
 tr-0.007
 Oscar-0.007
Token cow
Token ID9875
Feature activation-1.25e-02

Pos logit contributions
iden+0.016
 rac+0.015
 Oscar+0.014
 bulldo+0.014
 coy+0.014
Neg logit contributions
 communicate-0.023
 generation-0.023
ations-0.023
pole-0.022
ishment-0.022
Token had
Token ID550
Feature activation-2.33e-02

Pos logit contributions
 rac+0.013
iden+0.013
 tr+0.012
 M+0.012
 Oscar+0.011
Neg logit contributions
cks-0.028
 communicate-0.027
ishment-0.027
 generation-0.027
ility-0.026
Token all
Token ID477
Feature activation-9.05e-03

Pos logit contributions
 rac+0.004
 tr+0.004
 M+0.004
iden+0.004
 sp+0.004
Neg logit contributions
 communicate-0.005
cks-0.004
 generation-0.004
ishment-0.004
ility-0.004
Token."
Token ID526
Feature activation-1.50e-02

Pos logit contributions
 M+0.012
 rac+0.012
iden+0.012
 tr+0.012
 sp+0.012
Neg logit contributions
 communicate-0.017
 generation-0.016
cks-0.016
ishment-0.016
immer-0.015
Token Lucy
Token ID22162
Feature activation+3.70e-04

Pos logit contributions
iden+0.021
 rac+0.021
 coy+0.019
 bulldo+0.019
 Oscar+0.018
Neg logit contributions
 anymore-0.028
 generation-0.028
en-0.027
ie-0.027
pole-0.027
Token was
Token ID373
Feature activation-1.22e-02

Pos logit contributions
 communicate+0.001
en+0.001
pole+0.001
ishment+0.001
cks+0.001
Neg logit contributions
 rac-0.001
iden-0.001
 bulldo-0.000
 Oscar-0.000
 coy-0.000
Token over
Token ID625
Feature activation-3.86e-03

Pos logit contributions
 rac+0.017
iden+0.016
 Oscar+0.015
 coy+0.015
 M+0.015
Neg logit contributions
ations-0.023
cks-0.023
 communicate-0.023
 generation-0.023
en-0.023
Tokenjoy
Token ID2633
Feature activation-1.13e-01

Pos logit contributions
iden+0.003
 rac+0.003
 tr+0.003
 bulldo+0.003
 Oscar+0.003
Neg logit contributions
 communicate-0.009
en-0.009
 generation-0.009
ishment-0.009
cks-0.009
Tokened
Token ID276
Feature activation-2.60e-03

Pos logit contributions
 rac+0.095
iden+0.094
 Oscar+0.079
 tr+0.079
 coy+0.078
Neg logit contributions
 communicate-0.274
cks-0.271
ations-0.268
en-0.267
pole-0.267
Token and
Token ID290
Feature activation-2.09e-03

Pos logit contributions
 rac+0.004
iden+0.004
 Oscar+0.003
 M+0.003
 tr+0.003
Neg logit contributions
cks-0.005
 communicate-0.005
 generation-0.005
pole-0.005
ations-0.005
Token quickly
Token ID2952
Feature activation-2.11e-02

Pos logit contributions
iden+0.003
 rac+0.003
 bulldo+0.003
 tr+0.003
 coy+0.003
Neg logit contributions
 communicate-0.004
pole-0.004
en-0.004
cks-0.004
ishment-0.004
Token grabbed
Token ID13176
Feature activation-1.22e-02

Pos logit contributions
iden+0.030
 rac+0.030
 Oscar+0.027
 coy+0.027
 tr+0.027
Neg logit contributions
 communicate-0.038
cks-0.037
en-0.037
pole-0.037
ishment-0.036
Token the
Token ID262
Feature activation-4.94e-03

Pos logit contributions
 M+0.018
 rac+0.018
 Oscar+0.017
iden+0.017
 sp+0.017
Neg logit contributions
 communicate-0.022
pole-0.022
cks-0.022
ations-0.022
 generation-0.021
TokenThe
Token ID464
Feature activation+2.33e-02

Pos logit contributions
 rac+0.018
iden+0.018
 M+0.018
 Oscar+0.017
 bisc+0.017
Neg logit contributions
 communicate-0.016
pole-0.016
 generation-0.015
ations-0.015
ide-0.015
Token bald
Token ID28273
Feature activation-2.47e-02

Pos logit contributions
 communicate+0.053
cks+0.051
ations+0.051
ishment+0.050
 generation+0.050
Neg logit contributions
iden-0.024
 rac-0.023
 Oscar-0.021
 M-0.021
 tr-0.021
Token man
Token ID582
Feature activation+1.65e-03

Pos logit contributions
iden+0.030
 rac+0.030
 M+0.029
 Oscar+0.028
 tr+0.027
Neg logit contributions
 communicate-0.049
en-0.048
ations-0.048
pole-0.047
ishment-0.047
Token was
Token ID373
Feature activation-1.11e-02

Pos logit contributions
cks+0.003
 communicate+0.003
ishment+0.003
 generation+0.003
ations+0.003
Neg logit contributions
 rac-0.002
iden-0.002
 tr-0.002
 coy-0.002
 Oscar-0.002
Token over
Token ID625
Feature activation-2.20e-03

Pos logit contributions
 rac+0.015
iden+0.014
 coy+0.013
 Oscar+0.013
 M+0.013
Neg logit contributions
ations-0.022
ishment-0.022
 communicate-0.022
 generation-0.022
en-0.021
Tokenjoy
Token ID2633
Feature activation-1.13e-01

Pos logit contributions
iden+0.002
 rac+0.002
 tr+0.002
lic+0.002
 bulldo+0.002
Neg logit contributions
 communicate-0.005
en-0.005
 generation-0.005
 anymore-0.005
ishment-0.005
Tokened
Token ID276
Feature activation-2.88e-03

Pos logit contributions
iden+0.117
 rac+0.115
 tr+0.105
 Oscar+0.104
 bulldo+0.102
Neg logit contributions
en-0.246
cks-0.245
 communicate-0.243
 anymore-0.242
ations-0.241
Token.
Token ID13
Feature activation-4.90e-03

Pos logit contributions
iden+0.005
 rac+0.005
 tr+0.004
 bulldo+0.004
 Oscar+0.004
Neg logit contributions
 communicate-0.004
cks-0.004
en-0.004
ations-0.004
pole-0.004
Token He
Token ID679
Feature activation-1.38e-02

Pos logit contributions
iden+0.005
 rac+0.005
 bisc+0.004
Fel+0.004
 coy+0.004
Neg logit contributions
pole-0.011
ishment-0.011
 generation-0.011
en-0.011
 anymore-0.011
Token thanked
Token ID26280
Feature activation+3.58e-02

Pos logit contributions
iden+0.021
 rac+0.021
 Water+0.020
 Oscar+0.019
 bulldo+0.019
Neg logit contributions
 communicate-0.023
en-0.023
pole-0.023
ie-0.023
ck-0.023
Token the
Token ID262
Feature activation-5.30e-03

Pos logit contributions
immer+0.078
cks+0.077
pole+0.077
ations+0.075
ist+0.075
Neg logit contributions
 M-0.043
 rac-0.042
iden-0.038
 Oscar-0.038
 tr-0.036
Token and
Token ID290
Feature activation+8.34e-04

Pos logit contributions
 rac+0.037
 tr+0.036
 sp+0.035
iden+0.035
 M+0.033
Neg logit contributions
ishment-0.054
 communicate-0.053
 generation-0.053
cks-0.052
ility-0.052
Token put
Token ID1234
Feature activation-2.97e-02

Pos logit contributions
 communicate+0.001
cks+0.001
ations+0.001
 generation+0.001
en+0.001
Neg logit contributions
iden-0.001
 rac-0.001
 Oscar-0.001
 coy-0.001
 bulldo-0.001
Token on
Token ID319
Feature activation+1.15e-02

Pos logit contributions
 rac+0.045
iden+0.043
 M+0.043
 sp+0.042
 tr+0.042
Neg logit contributions
 communicate-0.054
pole-0.051
ility-0.050
cks-0.050
ishment-0.050
Token your
Token ID534
Feature activation-4.67e-02

Pos logit contributions
 communicate+0.016
cks+0.016
ations+0.015
en+0.015
 generation+0.015
Neg logit contributions
iden-0.022
 rac-0.021
 Oscar-0.020
 coy-0.020
 tr-0.020
Token p
Token ID279
Feature activation-5.95e-02

Pos logit contributions
iden+0.067
 Oscar+0.062
 rac+0.060
lic+0.059
 coy+0.057
Neg logit contributions
 communicate-0.085
 anymore-0.085
cks-0.084
 generation-0.083
ations-0.082
Tokenaj
Token ID1228
Feature activation-1.12e-01

Pos logit contributions
 rac+0.053
iden+0.051
 coy+0.046
 Oscar+0.046
 bulldo+0.042
Neg logit contributions
cks-0.142
ishment-0.141
ations-0.139
en-0.139
ie-0.139
Tokenamas
Token ID17485
Feature activation-1.57e-02

Pos logit contributions
 rac+0.161
iden+0.159
 coy+0.150
 Oscar+0.148
 bulldo+0.145
Neg logit contributions
en-0.206
ations-0.204
cks-0.201
ishment-0.200
 communicate-0.200
Token."
Token ID526
Feature activation-1.52e-02

Pos logit contributions
 rac+0.024
iden+0.024
 tr+0.023
 M+0.022
 bulldo+0.022
Neg logit contributions
cks-0.028
 communicate-0.027
ishment-0.027
ations-0.026
ility-0.026
Token↵
Token ID198
Feature activation-1.33e-02

Pos logit contributions
 rac+0.021
iden+0.020
 coy+0.020
Fel+0.019
 tr+0.019
Neg logit contributions
 anymore-0.028
en-0.027
ie-0.027
ishment-0.026
pole-0.026
Token↵
Token ID198
Feature activation-1.34e-02

Pos logit contributions
 rac+0.018
 coy+0.017
iden+0.017
 tr+0.016
 bulldo+0.016
Neg logit contributions
ie-0.024
en-0.024
 communicate-0.024
ide-0.024
 anymore-0.023
TokenTim
Token ID14967
Feature activation-7.32e-03

Pos logit contributions
 rac+0.018
iden+0.018
 Oscar+0.017
 coy+0.016
 bulldo+0.016
Neg logit contributions
 communicate-0.025
pole-0.025
ie-0.025
en-0.025
ations-0.024
Token king
Token ID5822
Feature activation-3.65e-03

Pos logit contributions
iden+0.009
 rac+0.009
 Oscar+0.009
 bulldo+0.008
 coy+0.008
Neg logit contributions
 communicate-0.007
 generation-0.007
pole-0.007
ations-0.007
ishment-0.007
Token and
Token ID290
Feature activation-6.97e-04

Pos logit contributions
 rac+0.005
iden+0.005
 tr+0.004
 M+0.004
 sp+0.004
Neg logit contributions
cks-0.007
 communicate-0.007
ishment-0.007
ations-0.007
 generation-0.007
Token he
Token ID339
Feature activation-2.08e-02

Pos logit contributions
iden+0.001
 rac+0.001
 Oscar+0.001
 tr+0.001
 coy+0.001
Neg logit contributions
 communicate-0.001
cks-0.001
ations-0.001
pole-0.001
en-0.001
Token was
Token ID373
Feature activation-1.08e-02

Pos logit contributions
iden+0.029
 rac+0.029
 Oscar+0.026
 coy+0.026
 tr+0.026
Neg logit contributions
 communicate-0.039
cks-0.039
ations-0.038
pole-0.038
en-0.038
Token over
Token ID625
Feature activation-4.02e-03

Pos logit contributions
 rac+0.016
iden+0.016
 Oscar+0.015
 tr+0.015
 coy+0.015
Neg logit contributions
 communicate-0.019
cks-0.019
ations-0.019
ishment-0.019
 generation-0.018
Tokenjoy
Token ID2633
Feature activation-1.12e-01

Pos logit contributions
iden+0.004
 rac+0.004
 tr+0.003
 Oscar+0.003
 coy+0.003
Neg logit contributions
 communicate-0.009
cks-0.009
en-0.009
 generation-0.009
ishment-0.009
Tokened
Token ID276
Feature activation-1.65e-03

Pos logit contributions
iden+0.102
 rac+0.101
 Oscar+0.088
 tr+0.088
 bulldo+0.086
Neg logit contributions
 communicate-0.263
cks-0.262
en-0.259
pole-0.258
ations-0.258
Token.
Token ID13
Feature activation-4.36e-03

Pos logit contributions
 rac+0.003
iden+0.003
 Oscar+0.002
 M+0.002
 tr+0.002
Neg logit contributions
 communicate-0.003
cks-0.003
ations-0.003
 generation-0.003
ishment-0.003
Token "
Token ID366
Feature activation+1.06e-04

Pos logit contributions
 rac+0.006
iden+0.006
 tr+0.006
 sp+0.006
 bulldo+0.006
Neg logit contributions
pole-0.008
 generation-0.008
cks-0.007
en-0.007
ishment-0.007
TokenThank
Token ID10449
Feature activation+3.59e-02

Pos logit contributions
 communicate+0.000
cks+0.000
ations+0.000
ishment+0.000
en+0.000
Neg logit contributions
 rac-0.000
iden-0.000
 Oscar-0.000
 tr-0.000
 coy-0.000
Token you
Token ID345
Feature activation-2.83e-03

Pos logit contributions
 communicate+0.056
cks+0.055
ations+0.054
pole+0.054
en+0.053
Neg logit contributions
iden-0.062
 rac-0.062
 Oscar-0.057
 tr-0.057
 coy-0.056
Token girl
Token ID2576
Feature activation-3.81e-03

Pos logit contributions
iden+0.007
 Oscar+0.007
 rac+0.007
 bulldo+0.006
Fel+0.006
Neg logit contributions
ie-0.012
ations-0.012
 communicate-0.012
en-0.012
ishment-0.012
Token put
Token ID1234
Feature activation-2.70e-02

Pos logit contributions
 rac+0.006
iden+0.006
 coy+0.005
 Oscar+0.005
 bulldo+0.005
Neg logit contributions
ie-0.007
en-0.007
pole-0.006
 communicate-0.006
ck-0.006
Token on
Token ID319
Feature activation+1.26e-02

Pos logit contributions
 rac+0.045
iden+0.044
 M+0.043
 Oscar+0.042
 tr+0.042
Neg logit contributions
 communicate-0.045
 generation-0.043
cks-0.043
ations-0.042
pole-0.042
Token her
Token ID607
Feature activation+1.27e-02

Pos logit contributions
 communicate+0.013
en+0.013
cks+0.013
 generation+0.013
ations+0.013
Neg logit contributions
iden-0.028
 rac-0.028
 coy-0.026
 bulldo-0.026
 Oscar-0.026
Token p
Token ID279
Feature activation-6.09e-02

Pos logit contributions
 communicate+0.023
cks+0.023
ations+0.023
ishment+0.022
 anymore+0.022
Neg logit contributions
iden-0.018
 rac-0.017
 coy-0.015
 Oscar-0.015
 bulldo-0.015
Tokenaj
Token ID1228
Feature activation-1.12e-01

Pos logit contributions
 rac+0.062
iden+0.060
 coy+0.054
 Oscar+0.052
 bulldo+0.052
Neg logit contributions
ishment-0.139
cks-0.138
ist-0.137
ie-0.135
en-0.135
Tokenamas
Token ID17485
Feature activation-1.38e-02

Pos logit contributions
 rac+0.166
iden+0.164
 coy+0.154
 bulldo+0.151
 Oscar+0.151
Neg logit contributions
en-0.205
ations-0.200
ishment-0.199
cks-0.195
 communicate-0.195
Token and
Token ID290
Feature activation-2.83e-04

Pos logit contributions
 rac+0.019
iden+0.018
 tr+0.018
 M+0.017
 Oscar+0.017
Neg logit contributions
cks-0.028
 communicate-0.027
ations-0.027
ishment-0.026
 generation-0.026
Token went
Token ID1816
Feature activation+1.59e-02

Pos logit contributions
iden+0.000
 rac+0.000
 bulldo+0.000
 coy+0.000
 Oscar+0.000
Neg logit contributions
en-0.000
ie-0.000
 communicate-0.000
ck-0.000
ations-0.000
Token to
Token ID284
Feature activation-1.84e-03

Pos logit contributions
en+0.018
ie+0.017
 anymore+0.017
 communicate+0.017
ations+0.017
Neg logit contributions
iden-0.034
 rac-0.032
 coy-0.031
 bulldo-0.031
 Oscar-0.031
Token bed
Token ID3996
Feature activation-3.01e-02

Pos logit contributions
iden+0.003
 rac+0.003
 coy+0.003
 bulldo+0.003
 Oscar+0.003
Neg logit contributions
 communicate-0.003
cks-0.003
en-0.003
 generation-0.003
ations-0.003
Token and
Token ID290
Feature activation-2.05e-04

Pos logit contributions
 tr+0.034
 sp+0.033
 rac+0.033
 M+0.031
 Water+0.030
Neg logit contributions
 generation-0.063
ishment-0.063
cks-0.063
ations-0.062
 communicate-0.062
Token put
Token ID1234
Feature activation-3.09e-02

Pos logit contributions
 rac+0.000
iden+0.000
 tr+0.000
 Oscar+0.000
 coy+0.000
Neg logit contributions
 communicate-0.000
cks-0.000
ations-0.000
pole-0.000
 generation-0.000
Token on
Token ID319
Feature activation+1.12e-02

Pos logit contributions
 M+0.047
 rac+0.046
 sp+0.046
 Water+0.045
 Oscar+0.045
Neg logit contributions
 communicate-0.055
pole-0.052
ility-0.052
 generation-0.052
ations-0.051
Token their
Token ID511
Feature activation-2.38e-02

Pos logit contributions
 communicate+0.015
cks+0.014
pole+0.014
ations+0.014
 generation+0.014
Neg logit contributions
 rac-0.022
iden-0.022
 tr-0.021
 Oscar-0.021
 M-0.021
Token p
Token ID279
Feature activation-6.05e-02

Pos logit contributions
iden+0.035
 rac+0.034
 Oscar+0.033
 coy+0.031
 bulldo+0.031
Neg logit contributions
cks-0.043
 communicate-0.043
ations-0.042
 generation-0.041
ishment-0.041
Tokenaj
Token ID1228
Feature activation-1.12e-01

Pos logit contributions
 rac+0.047
iden+0.045
 Oscar+0.039
 coy+0.039
 bulldo+0.037
Neg logit contributions
cks-0.154
ishment-0.151
ations-0.150
 communicate-0.150
en-0.149
Tokenamas
Token ID17485
Feature activation-1.55e-02

Pos logit contributions
iden+0.153
 rac+0.153
 coy+0.141
 Oscar+0.140
 bulldo+0.138
Neg logit contributions
en-0.213
cks-0.212
 communicate-0.212
ations-0.212
ishment-0.209
Token.
Token ID13
Feature activation-5.91e-03

Pos logit contributions
 rac+0.023
iden+0.022
 tr+0.022
 M+0.021
 Oscar+0.021
Neg logit contributions
cks-0.029
 communicate-0.028
ishment-0.028
ations-0.028
 generation-0.027
Token They
Token ID1119
Feature activation+2.29e-03

Pos logit contributions
iden+0.010
 rac+0.010
 coy+0.010
 bulldo+0.010
 tr+0.009
Neg logit contributions
 generation-0.009
ishment-0.009
en-0.009
 anymore-0.009
cks-0.009
Token hug
Token ID16225
Feature activation-7.69e-03

Pos logit contributions
cks+0.003
 communicate+0.003
ations+0.003
pole+0.003
ishment+0.003
Neg logit contributions
 rac-0.004
iden-0.004
 tr-0.004
 Oscar-0.004
 M-0.004
Token their
Token ID511
Feature activation-2.44e-02

Pos logit contributions
 M+0.013
 rac+0.012
 tr+0.011
 Oscar+0.011
 sp+0.011
Neg logit contributions
immer-0.014
cks-0.014
pole-0.014
ations-0.013
ility-0.013
Token and
Token ID290
Feature activation+7.03e-04

Pos logit contributions
 tr+0.034
 sp+0.033
 rac+0.033
 M+0.031
 Water+0.029
Neg logit contributions
ishment-0.059
 communicate-0.059
 generation-0.058
cks-0.058
ations-0.056
Token put
Token ID1234
Feature activation-2.99e-02

Pos logit contributions
cks+0.001
 communicate+0.001
ations+0.001
pole+0.001
ility+0.001
Neg logit contributions
 rac-0.001
iden-0.001
 tr-0.001
 M-0.001
 sp-0.001
Token on
Token ID319
Feature activation+1.17e-02

Pos logit contributions
 M+0.043
 rac+0.041
 Water+0.041
 sp+0.041
 Now+0.040
Neg logit contributions
 communicate-0.057
ility-0.052
ations-0.052
pole-0.052
ishment-0.052
Token your
Token ID534
Feature activation-4.71e-02

Pos logit contributions
 communicate+0.018
cks+0.017
pole+0.017
ations+0.017
ishment+0.017
Neg logit contributions
 rac-0.021
iden-0.020
 tr-0.019
 M-0.019
 Oscar-0.019
Token p
Token ID279
Feature activation-5.91e-02

Pos logit contributions
 rac+0.062
iden+0.062
 Oscar+0.056
 tr+0.055
 coy+0.055
Neg logit contributions
 communicate-0.092
cks-0.091
ations-0.089
en-0.089
pole-0.089
Tokenaj
Token ID1228
Feature activation-1.12e-01

Pos logit contributions
 rac+0.035
iden+0.035
 Oscar+0.028
 coy+0.027
 tr+0.027
Neg logit contributions
 communicate-0.157
cks-0.157
ations-0.154
ishment-0.154
en-0.154
Tokenamas
Token ID17485
Feature activation-1.49e-02

Pos logit contributions
 rac+0.142
iden+0.142
 coy+0.128
 Oscar+0.128
 bulldo+0.127
Neg logit contributions
 communicate-0.222
cks-0.220
en-0.220
ations-0.219
ishment-0.217
Token."
Token ID526
Feature activation-1.42e-02

Pos logit contributions
 rac+0.021
iden+0.021
 tr+0.020
 M+0.020
 Now+0.019
Neg logit contributions
cks-0.028
 communicate-0.028
ishment-0.027
ations-0.027
 generation-0.026
Token
Token ID198
Feature activation-1.16e-02

Pos logit contributions
iden+0.019
 rac+0.019
 coy+0.017
 bulldo+0.017
 tr+0.017
Neg logit contributions
 anymore-0.027
 communicate-0.027
en-0.027
cks-0.027
ishment-0.027
Token
Token ID198
Feature activation-1.17e-02

Pos logit contributions
iden+0.013
 rac+0.012
 coy+0.011
 tr+0.011
 bulldo+0.011
Neg logit contributions
 communicate-0.025
cks-0.025
en-0.025
 anymore-0.025
ie-0.024
TokenL
Token ID43
Feature activation+1.18e-02

Pos logit contributions
 rac+0.012
iden+0.012
 tr+0.010
 Oscar+0.010
 coy+0.010
Neg logit contributions
 communicate-0.026
cks-0.026
ations-0.026
en-0.026
pole-0.026
Token lot
Token ID1256
Feature activation+6.77e-03

Pos logit contributions
 communicate+0.004
 generation+0.004
ations+0.004
pole+0.004
en+0.004
Neg logit contributions
 rac-0.002
iden-0.002
 Oscar-0.002
 bulldo-0.002
 coy-0.002
Token that
Token ID326
Feature activation+1.33e-02

Pos logit contributions
 communicate+0.011
cks+0.011
ations+0.010
en+0.010
pole+0.010
Neg logit contributions
 rac-0.011
iden-0.011
 Oscar-0.010
 tr-0.010
 coy-0.010
Token day
Token ID1110
Feature activation-6.75e-03

Pos logit contributions
cks+0.026
ishment+0.025
 communicate+0.025
pole+0.025
ations+0.025
Neg logit contributions
 rac-0.018
iden-0.018
 M-0.017
 Oscar-0.017
 tr-0.016
Token.
Token ID13
Feature activation+7.43e-04

Pos logit contributions
 rac+0.011
iden+0.011
 Oscar+0.010
 M+0.010
 tr+0.010
Neg logit contributions
cks-0.011
 communicate-0.011
pole-0.011
ishment-0.011
ations-0.011
Token No
Token ID1400
Feature activation+4.03e-02

Pos logit contributions
 generation+0.001
 communicate+0.001
en+0.001
ishment+0.001
cks+0.001
Neg logit contributions
iden-0.001
 rac-0.001
 tr-0.001
 bulldo-0.001
 coy-0.001
Token matter
Token ID2300
Feature activation+1.22e-01

Pos logit contributions
cks+0.034
pole+0.031
 communicate+0.031
ishment+0.028
ations+0.028
Neg logit contributions
 rac-0.103
iden-0.102
 Oscar-0.097
 coy-0.096
 tr-0.096
Token how
Token ID703
Feature activation-6.52e-02

Pos logit contributions
cks+0.151
cker+0.147
 communicate+0.147
pole+0.146
ishment+0.144
Neg logit contributions
 M-0.255
 rac-0.253
 Oscar-0.239
 tr-0.237
iden-0.236
Token spoiled
Token ID34781
Feature activation+1.34e-02

Pos logit contributions
 bisc+0.095
 rac+0.095
iden+0.091
 tr+0.089
 coy+0.085
Neg logit contributions
ie-0.116
ations-0.116
 anymore-0.114
 communicate-0.110
en-0.110
Token her
Token ID607
Feature activation+1.59e-02

Pos logit contributions
cks+0.028
 communicate+0.028
pole+0.028
ishment+0.027
 generation+0.027
Neg logit contributions
 rac-0.016
iden-0.016
 Oscar-0.015
 coy-0.014
 M-0.014
Token furry
Token ID40857
Feature activation-2.30e-02

Pos logit contributions
pole+0.031
cks+0.031
ility+0.030
ations+0.030
 communicate+0.030
Neg logit contributions
 rac-0.022
iden-0.021
 coy-0.020
 M-0.020
 tr-0.020
Token friend
Token ID1545
Feature activation-1.68e-03

Pos logit contributions
 rac+0.023
iden+0.019
 coy+0.019
 Oscar+0.019
 tr+0.018
Neg logit contributions
cks-0.056
ility-0.054
pole-0.054
ishment-0.053
ations-0.053
Token everyone
Token ID2506
Feature activation+3.69e-02

Pos logit contributions
cks+0.012
pole+0.012
immer+0.012
 communicate+0.011
ishment+0.011
Neg logit contributions
 M-0.017
 rac-0.016
 Oscar-0.016
 tr-0.016
iden-0.015
Token and
Token ID290
Feature activation+1.98e-03

Pos logit contributions
ishment+0.078
 generation+0.072
cks+0.072
immer+0.071
 communicate+0.071
Neg logit contributions
 rac-0.047
 tr-0.047
 M-0.047
iden-0.045
 sp-0.043
Token everything
Token ID2279
Feature activation+4.88e-03

Pos logit contributions
cks+0.004
immer+0.003
pole+0.003
ishment+0.003
ility+0.003
Neg logit contributions
 rac-0.003
 tr-0.003
 sp-0.003
 M-0.003
 coy-0.003
Token,
Token ID11
Feature activation+3.63e-03

Pos logit contributions
ishment+0.010
 generation+0.010
cks+0.009
 communicate+0.009
cker+0.009
Neg logit contributions
 M-0.006
 tr-0.006
 sp-0.006
 rac-0.006
iden-0.006
Token no
Token ID645
Feature activation+9.28e-03

Pos logit contributions
cks+0.007
ishment+0.007
immer+0.007
ations+0.007
cker+0.007
Neg logit contributions
 rac-0.005
 M-0.005
 tr-0.005
 sp-0.005
iden-0.005
Token matter
Token ID2300
Feature activation+1.23e-01

Pos logit contributions
cks+0.012
hips+0.010
pole+0.010
immer+0.010
ations+0.009
Neg logit contributions
 rac-0.022
 tr-0.021
 M-0.020
 coy-0.020
iden-0.020
Token what
Token ID644
Feature activation-7.47e-02

Pos logit contributions
cker+0.187
ishment+0.183
 communicate+0.178
immer+0.174
pole+0.172
Neg logit contributions
 M-0.237
 rac-0.219
 tr-0.209
 Oscar-0.202
 two-0.201
Token!
Token ID0
Feature activation-2.94e-02

Pos logit contributions
 rac+0.116
iden+0.113
 M+0.107
 tr+0.107
 sp+0.104
Neg logit contributions
cks-0.131
pole-0.129
ishment-0.129
 communicate-0.129
ility-0.125
Token
Token ID198
Feature activation-9.88e-03

Pos logit contributions
 rac+0.035
iden+0.035
 coy+0.032
 bulldo+0.031
 Oscar+0.030
Neg logit contributions
 generation-0.060
 communicate-0.060
 anymore-0.060
pole-0.059
cks-0.059
Token
Token ID198
Feature activation-9.65e-03

Pos logit contributions
 rac+0.009
iden+0.009
 coy+0.008
 bulldo+0.008
 tr+0.008
Neg logit contributions
 communicate-0.023
cks-0.023
ations-0.023
 generation-0.022
pole-0.022
TokenEvery
Token ID6109
Feature activation+6.48e-02

Pos logit contributions
 rac+0.017
iden+0.017
 Oscar+0.016
 coy+0.016
 tr+0.016
Neg logit contributions
 communicate-0.014
cks-0.014
ations-0.014
pole-0.014
 generation-0.014
Token,
Token ID11
Feature activation+1.73e-03

Pos logit contributions
iden+0.016
 rac+0.015
 Oscar+0.014
 coy+0.014
 tr+0.014
Neg logit contributions
 communicate-0.011
 anymore-0.011
cks-0.010
ations-0.010
en-0.010
Token Bark
Token ID20511
Feature activation+1.36e-02

Pos logit contributions
 communicate+0.003
cks+0.003
pole+0.003
ations+0.003
 generation+0.003
Neg logit contributions
iden-0.003
 rac-0.002
 Oscar-0.002
 tr-0.002
 coy-0.002
Tokeny
Token ID88
Feature activation+6.36e-02

Pos logit contributions
 communicate+0.018
ishment+0.017
cks+0.017
 generation+0.016
ility+0.016
Neg logit contributions
 rac-0.029
iden-0.027
 coy-0.026
lic-0.026
 bulldo-0.025
Token saw
Token ID2497
Feature activation-1.12e-02

Pos logit contributions
 communicate+0.110
en+0.109
ations+0.108
ie+0.107
 generation+0.106
Neg logit contributions
iden-0.095
 rac-0.090
 Oscar-0.087
 bulldo-0.086
 tr-0.086
Token Wh
Token ID854
Feature activation-5.51e-05

Pos logit contributions
iden+0.018
 rac+0.016
 tr+0.015
 bulldo+0.015
 Oscar+0.015
Neg logit contributions
 communicate-0.020
cks-0.019
 generation-0.018
 anymore-0.018
en-0.018
Tokenisk
Token ID1984
Feature activation+1.24e-01

Pos logit contributions
 rac+0.000
 coy+0.000
 Oscar+0.000
iden+0.000
 tr+0.000
Neg logit contributions
 generation-0.000
 communicate-0.000
ishment-0.000
pole-0.000
 fade-0.000
Tokeners
Token ID364
Feature activation+6.93e-03

Pos logit contributions
ishment+0.212
 communicate+0.190
 generation+0.187
cks+0.187
 fade+0.184
Neg logit contributions
 rac-0.210
iden-0.200
lic-0.189
 tr-0.186
 coy-0.184
Token examining
Token ID17247
Feature activation-7.09e-02

Pos logit contributions
ishment+0.015
pole+0.014
cks+0.014
 generation+0.014
cker+0.014
Neg logit contributions
 rac-0.009
 sp-0.008
 tr-0.008
 bulldo-0.007
 Oscar-0.007
Token a
Token ID257
Feature activation+4.59e-04

Pos logit contributions
 rac+0.099
 M+0.092
iden+0.092
 Oscar+0.091
 tr+0.090
Neg logit contributions
 communicate-0.135
pole-0.133
cks-0.132
 generation-0.132
ility-0.131
Token pretty
Token ID2495
Feature activation+1.02e-02

Pos logit contributions
 communicate+0.001
pole+0.001
ations+0.001
 anymore+0.001
 generation+0.001
Neg logit contributions
iden-0.000
 rac-0.000
 Oscar-0.000
 Now-0.000
 coy-0.000
Token flower
Token ID15061
Feature activation-3.20e-03

Pos logit contributions
cks+0.018
 communicate+0.018
ations+0.017
ishment+0.017
 generation+0.017
Neg logit contributions
 rac-0.016
iden-0.016
 tr-0.014
 coy-0.014
 M-0.014
Token,
Token ID11
Feature activation-1.82e-03

Pos logit contributions
 communicate+0.112
cks+0.111
ations+0.109
en+0.108
pole+0.108
Neg logit contributions
 rac-0.106
iden-0.106
 Oscar-0.097
 tr-0.097
 coy-0.096
Token Lucy
Token ID22162
Feature activation-6.99e-04

Pos logit contributions
iden+0.003
Carl+0.003
 bulldo+0.003
 rac+0.003
 coy+0.003
Neg logit contributions
 communicate-0.003
pole-0.003
ie-0.003
 anymore-0.003
ations-0.003
Token!"
Token ID2474
Feature activation-1.71e-02

Pos logit contributions
 rac+0.001
iden+0.001
 tr+0.001
 Oscar+0.001
 coy+0.001
Neg logit contributions
 communicate-0.001
cks-0.001
 generation-0.001
ishment-0.001
ations-0.001
Token said
Token ID531
Feature activation-2.65e-03

Pos logit contributions
iden+0.026
 rac+0.026
 bulldo+0.024
 Oscar+0.023
Carl+0.023
Neg logit contributions
 communicate-0.029
 generation-0.028
en-0.027
ations-0.027
pole-0.027
Token Wh
Token ID854
Feature activation-4.08e-03

Pos logit contributions
 rac+0.003
iden+0.003
 Oscar+0.003
 tr+0.003
 coy+0.003
Neg logit contributions
 communicate-0.005
cks-0.005
ations-0.005
ishment-0.005
en-0.005
Tokenisk
Token ID1984
Feature activation+1.22e-01

Pos logit contributions
 rac+0.008
iden+0.008
 tr+0.007
 M+0.007
lic+0.007
Neg logit contributions
 communicate-0.006
cks-0.006
 generation-0.006
ishment-0.005
ations-0.005
Tokeners
Token ID364
Feature activation+4.53e-03

Pos logit contributions
ishment+0.170
 communicate+0.167
cks+0.164
 generation+0.162
pole+0.159
Neg logit contributions
 rac-0.228
iden-0.226
 tr-0.212
 Oscar-0.208
 coy-0.208
Token,
Token ID11
Feature activation-2.35e-03

Pos logit contributions
cks+0.008
ishment+0.008
 generation+0.008
ations+0.008
angles+0.008
Neg logit contributions
 rac-0.007
 tr-0.006
iden-0.006
 M-0.006
 coy-0.006
Token "
Token ID366
Feature activation+1.02e-03

Pos logit contributions
iden+0.002
 rac+0.002
 Oscar+0.002
 tr+0.002
 coy+0.002
Neg logit contributions
 communicate-0.005
cks-0.005
pole-0.005
en-0.005
ations-0.005
TokenDo
Token ID5211
Feature activation+4.87e-02

Pos logit contributions
cks+0.002
 communicate+0.002
ishment+0.002
ations+0.002
en+0.002
Neg logit contributions
 rac-0.001
iden-0.001
 tr-0.001
 coy-0.001
 Oscar-0.001
Token you
Token ID345
Feature activation-2.23e-03

Pos logit contributions
cks+0.082
 communicate+0.082
en+0.081
ishment+0.081
 generation+0.080
Neg logit contributions
 rac-0.077
iden-0.076
 tr-0.071
 Oscar-0.070
 coy-0.069
Token Tom
Token ID4186
Feature activation-1.14e-03

Pos logit contributions
 rac+0.004
iden+0.004
 M+0.003
 tr+0.003
 Oscar+0.003
Neg logit contributions
ations-0.008
 communicate-0.008
immer-0.008
cks-0.008
ishment-0.008
Token.
Token ID13
Feature activation+2.73e-03

Pos logit contributions
 rac+0.002
 tr+0.001
iden+0.001
 M+0.001
 sp+0.001
Neg logit contributions
ishment-0.002
 generation-0.002
ations-0.002
cks-0.002
immer-0.002
Token It
Token ID632
Feature activation+3.25e-03

Pos logit contributions
cks+0.005
 communicate+0.005
pole+0.005
 generation+0.005
en+0.005
Neg logit contributions
iden-0.004
 rac-0.004
 coy-0.004
 tr-0.004
 bulldo-0.004
Token doesn
Token ID1595
Feature activation+2.55e-02

Pos logit contributions
 communicate+0.006
ishment+0.006
 generation+0.006
cks+0.006
ations+0.006
Neg logit contributions
 rac-0.005
iden-0.004
 tr-0.004
 bulldo-0.004
 sp-0.004
Token't
Token ID470
Feature activation+4.24e-03

Pos logit contributions
 communicate+0.044
cks+0.043
ations+0.043
en+0.042
pole+0.042
Neg logit contributions
iden-0.039
 rac-0.039
 Oscar-0.036
 tr-0.036
 coy-0.035
Token matter
Token ID2300
Feature activation+1.21e-01

Pos logit contributions
ishment+0.008
ations+0.008
cks+0.008
 generation+0.007
 communicate+0.007
Neg logit contributions
 rac-0.007
iden-0.006
 sp-0.006
 tr-0.006
 M-0.006
Token who
Token ID508
Feature activation-4.26e-02

Pos logit contributions
ishment+0.190
pole+0.186
cker+0.181
immer+0.180
 generation+0.180
Neg logit contributions
 rac-0.215
 M-0.208
 tr-0.201
iden-0.192
 Oscar-0.189
Token can
Token ID460
Feature activation+9.34e-03

Pos logit contributions
 rac+0.062
 M+0.058
 sp+0.057
 tr+0.055
iden+0.055
Neg logit contributions
pole-0.078
ishment-0.077
immer-0.076
ations-0.075
ility-0.074
Token balance
Token ID5236
Feature activation+2.48e-02

Pos logit contributions
ishment+0.017
cks+0.016
ations+0.016
 generation+0.016
ility+0.016
Neg logit contributions
 rac-0.015
 tr-0.014
iden-0.013
 M-0.013
 Oscar-0.013
Token a
Token ID257
Feature activation-6.05e-03

Pos logit contributions
 generation+0.043
pole+0.043
 communicate+0.043
ishment+0.042
ility+0.042
Neg logit contributions
 rac-0.038
 tr-0.036
 Oscar-0.036
iden-0.035
 M-0.035
Token spoon
Token ID24556
Feature activation-2.25e-02

Pos logit contributions
iden+0.008
 rac+0.008
 Oscar+0.008
 coy+0.008
 Now+0.007
Neg logit contributions
 communicate-0.011
 anymore-0.011
pole-0.011
 generation-0.011
en-0.011
Token park
Token ID3952
Feature activation+4.26e-02

Pos logit contributions
iden+0.007
 Oscar+0.007
Fel+0.007
 rac+0.007
lic+0.006
Neg logit contributions
pole-0.006
 generation-0.006
 anymore-0.006
cks-0.006
 communicate-0.006
Token and
Token ID290
Feature activation-1.15e-03

Pos logit contributions
 communicate+0.079
cks+0.077
ations+0.076
ility+0.076
ishment+0.076
Neg logit contributions
 rac-0.064
iden-0.059
 tr-0.058
 sp-0.057
 M-0.057
Token Tim
Token ID5045
Feature activation-1.25e-02

Pos logit contributions
iden+0.001
 rac+0.001
 coy+0.001
 Oscar+0.001
 bulldo+0.001
Neg logit contributions
 communicate-0.002
pole-0.002
 generation-0.002
en-0.002
cks-0.002
Token loved
Token ID6151
Feature activation-4.65e-03

Pos logit contributions
iden+0.017
 rac+0.017
 Oscar+0.016
 coy+0.016
 Water+0.015
Neg logit contributions
 communicate-0.023
en-0.023
cks-0.023
pole-0.023
ations-0.023
Token to
Token ID284
Feature activation-2.12e-03

Pos logit contributions
iden+0.007
 rac+0.007
 Oscar+0.006
 coy+0.006
 tr+0.006
Neg logit contributions
 communicate-0.008
cks-0.008
en-0.008
ations-0.008
 generation-0.008
Token kick
Token ID4829
Feature activation+4.79e-02

Pos logit contributions
 rac+0.004
iden+0.004
 Oscar+0.003
 tr+0.003
 coy+0.003
Neg logit contributions
 communicate-0.003
cks-0.003
ations-0.003
en-0.003
pole-0.003
Token the
Token ID262
Feature activation-4.65e-03

Pos logit contributions
 communicate+0.077
cks+0.076
ations+0.075
pole+0.075
 generation+0.074
Neg logit contributions
 rac-0.080
iden-0.079
 tr-0.074
 Oscar-0.074
 coy-0.073
Token sand
Token ID6450
Feature activation-1.44e-02

Pos logit contributions
iden+0.008
Fel+0.008
 Oscar+0.007
 coy+0.007
 rac+0.007
Neg logit contributions
pole-0.008
 anymore-0.007
en-0.007
cks-0.007
hips-0.007
Token around
Token ID1088
Feature activation+2.81e-02

Pos logit contributions
 rac+0.022
 tr+0.021
 sp+0.021
iden+0.020
 mo+0.020
Neg logit contributions
 communicate-0.026
 generation-0.026
ility-0.025
ishment-0.025
ations-0.025
Token.
Token ID13
Feature activation-4.71e-03

Pos logit contributions
 communicate+0.051
cks+0.050
ations+0.050
ishment+0.049
immer+0.049
Neg logit contributions
 rac-0.043
 M-0.040
 tr-0.040
iden-0.040
 Oscar-0.040
Token One
Token ID1881
Feature activation+2.63e-02

Pos logit contributions
Fel+0.006
iden+0.006
 rac+0.006
us+0.006
 coy+0.005
Neg logit contributions
pole-0.009
 anymore-0.009
ishment-0.009
 generation-0.009
hip-0.009
Token a
Token ID257
Feature activation+3.62e-03

Pos logit contributions
 rac+0.007
iden+0.006
 M+0.006
 Oscar+0.006
 mo+0.006
Neg logit contributions
ishment-0.010
ations-0.009
cks-0.009
ility-0.009
 communicate-0.009
Token happier
Token ID23030
Feature activation+6.53e-02

Pos logit contributions
 communicate+0.007
 generation+0.007
pole+0.007
 anymore+0.007
en+0.007
Neg logit contributions
iden-0.004
 rac-0.004
 Oscar-0.004
 bulldo-0.004
 tr-0.004
Token place
Token ID1295
Feature activation+4.11e-02

Pos logit contributions
cks+0.113
 communicate+0.112
ations+0.109
pole+0.108
ishment+0.108
Neg logit contributions
 rac-0.104
iden-0.101
 Oscar-0.093
 coy-0.093
 tr-0.093
Token.
Token ID13
Feature activation+1.37e-03

Pos logit contributions
cks+0.096
ishment+0.093
 communicate+0.091
ations+0.089
 generation+0.089
Neg logit contributions
 rac-0.042
 M-0.038
 tr-0.038
iden-0.037
 Oscar-0.033
Token The
Token ID383
Feature activation-4.40e-04

Pos logit contributions
 generation+0.003
 anymore+0.003
 communicate+0.003
en+0.003
ie+0.003
Neg logit contributions
iden-0.002
 rac-0.002
 tr-0.002
 coy-0.002
 bulldo-0.001
Token end
Token ID886
Feature activation+6.00e-02

Pos logit contributions
iden+0.001
 rac+0.001
 Oscar+0.001
lic+0.001
 bulldo+0.001
Neg logit contributions
 communicate-0.001
ations-0.001
ie-0.001
 generation-0.001
en-0.001
Token.
Token ID13
Feature activation+1.25e-03

Pos logit contributions
 communicate+0.082
cks+0.080
en+0.078
ations+0.078
pole+0.077
Neg logit contributions
iden-0.116
 rac-0.115
 coy-0.108
 Oscar-0.108
 bulldo-0.107
Token<|endoftext|>
Token ID50256
Feature activation+4.69e-02

Pos logit contributions
 communicate+0.003
 generation+0.003
 anymore+0.003
en+0.003
cks+0.003
Neg logit contributions
iden-0.001
 rac-0.001
 tr-0.001
 coy-0.001
 bulldo-0.001
TokenOnce
Token ID7454
Feature activation-3.26e-03

Pos logit contributions
 anymore+0.095
cks+0.094
en+0.092
 communicate+0.092
ations+0.092
Neg logit contributions
iden-0.057
 rac-0.055
 coy-0.051
 bulldo-0.050
 tr-0.050
Token upon
Token ID2402
Feature activation-4.25e-02

Pos logit contributions
 rac+0.004
iden+0.004
 tr+0.003
 Oscar+0.003
 coy+0.003
Neg logit contributions
 communicate-0.007
cks-0.007
ations-0.007
pole-0.007
en-0.007
Token a
Token ID257
Feature activation+2.54e-03

Pos logit contributions
 rac+0.047
iden+0.045
 M+0.041
 tr+0.041
 Oscar+0.040
Neg logit contributions
ishment-0.091
 communicate-0.091
cks-0.091
ations-0.090
en-0.090
Token it
Token ID340
Feature activation-1.68e-02

Pos logit contributions
 communicate+0.012
 anymore+0.011
cks+0.011
en+0.011
pole+0.011
Neg logit contributions
iden-0.012
 rac-0.012
 coy-0.011
 bulldo-0.011
 Oscar-0.011
Token was
Token ID373
Feature activation-1.11e-02

Pos logit contributions
 rac+0.017
iden+0.017
 tr+0.015
 M+0.015
 Oscar+0.015
Neg logit contributions
cks-0.038
 communicate-0.038
ishment-0.037
 generation-0.037
ility-0.037
Token hers
Token ID25144
Feature activation+2.05e-02

Pos logit contributions
 rac+0.016
iden+0.015
 M+0.015
 Oscar+0.014
 tr+0.014
Neg logit contributions
 communicate-0.021
ations-0.020
cks-0.020
 generation-0.020
en-0.020
Token and
Token ID290
Feature activation-1.71e-04

Pos logit contributions
cks+0.039
 communicate+0.038
ations+0.037
ishment+0.037
ility+0.037
Neg logit contributions
 rac-0.031
iden-0.029
 tr-0.028
 M-0.027
 Oscar-0.027
Token he
Token ID339
Feature activation-1.97e-02

Pos logit contributions
 rac+0.000
iden+0.000
 tr+0.000
 M+0.000
 Oscar+0.000
Neg logit contributions
 communicate-0.000
cks-0.000
ility-0.000
ations-0.000
ishment-0.000
Token could
Token ID714
Feature activation+3.20e-02

Pos logit contributions
 rac+0.025
iden+0.025
 tr+0.022
 Oscar+0.022
 coy+0.022
Neg logit contributions
 communicate-0.039
cks-0.039
ations-0.038
ishment-0.038
 generation-0.038
Token not
Token ID407
Feature activation+9.98e-03

Pos logit contributions
ations+0.057
ishment+0.057
ility+0.056
cks+0.054
 generation+0.054
Neg logit contributions
 rac-0.050
 tr-0.047
 Oscar-0.046
 M-0.046
iden-0.045
Token touch
Token ID3638
Feature activation-6.94e-03

Pos logit contributions
ations+0.019
ishment+0.018
cks+0.018
ility+0.018
 generation+0.017
Neg logit contributions
 rac-0.014
 tr-0.014
 sp-0.013
 Oscar-0.013
 w-0.012
Token it
Token ID340
Feature activation-1.47e-02

Pos logit contributions
 M+0.010
 rac+0.010
 Oscar+0.010
 tr+0.009
iden+0.009
Neg logit contributions
 communicate-0.013
immer-0.013
cker-0.013
ility-0.013
pole-0.013
Token.
Token ID13
Feature activation+3.42e-03

Pos logit contributions
 tr+0.013
 long+0.013
 big+0.013
 sp+0.013
 M+0.013
Neg logit contributions
 communicate-0.035
ishment-0.035
 generation-0.035
cker-0.035
ations-0.033
Token Max
Token ID5436
Feature activation+3.65e-02

Pos logit contributions
 communicate+0.007
 anymore+0.006
cks+0.006
ishment+0.006
 generation+0.006
Neg logit contributions
iden-0.004
 rac-0.004
 coy-0.004
 bulldo-0.004
 tr-0.004
Token her
Token ID607
Feature activation+1.41e-02

Pos logit contributions
immer+0.000
cks+0.000
ist+0.000
ility+0.000
pole+0.000
Neg logit contributions
 rac-0.000
 M-0.000
 Oscar-0.000
 tr-0.000
 two-0.000
Token parents
Token ID3397
Feature activation+3.80e-03

Pos logit contributions
ie+0.024
 communicate+0.023
 anymore+0.022
ations+0.022
ishment+0.021
Neg logit contributions
 bulldo-0.023
lic-0.022
iden-0.022
 bisc-0.021
 rac-0.021
Token went
Token ID1816
Feature activation+1.46e-02

Pos logit contributions
cks+0.007
 communicate+0.007
ations+0.007
ishment+0.007
 generation+0.007
Neg logit contributions
iden-0.005
 rac-0.005
 tr-0.005
 Oscar-0.005
 bulldo-0.005
Token to
Token ID284
Feature activation-2.35e-03

Pos logit contributions
 communicate+0.029
cks+0.028
ations+0.028
en+0.028
 generation+0.028
Neg logit contributions
iden-0.019
 rac-0.019
 Oscar-0.017
 coy-0.017
 tr-0.017
Token the
Token ID262
Feature activation-3.11e-03

Pos logit contributions
 rac+0.003
 M+0.003
iden+0.003
 Oscar+0.003
 tr+0.003
Neg logit contributions
pole-0.004
cks-0.004
ations-0.004
ility-0.004
immer-0.004
Token store
Token ID3650
Feature activation+3.92e-02

Pos logit contributions
iden+0.006
 rac+0.006
 bulldo+0.005
 Oscar+0.005
 coy+0.005
Neg logit contributions
 communicate-0.004
 generation-0.004
ations-0.004
en-0.004
pole-0.004
Token.
Token ID13
Feature activation-4.22e-03

Pos logit contributions
cks+0.083
ility+0.080
 communicate+0.079
en+0.078
 fade+0.077
Neg logit contributions
 rac-0.050
iden-0.049
 M-0.045
 tr-0.044
 bulldo-0.043
Token There
Token ID1318
Feature activation-9.99e-04

Pos logit contributions
iden+0.006
 rac+0.006
 coy+0.005
 bulldo+0.005
 tr+0.005
Neg logit contributions
en-0.008
 communicate-0.008
ishment-0.008
 generation-0.008
cks-0.007
Token were
Token ID547
Feature activation+2.67e-02

Pos logit contributions
 rac+0.001
iden+0.001
 M+0.001
 Oscar+0.001
 tr+0.001
Neg logit contributions
cks-0.002
 communicate-0.002
ishment-0.002
ility-0.002
 generation-0.002
Token many
Token ID867
Feature activation+8.00e-03

Pos logit contributions
pole+0.047
ations+0.047
cks+0.047
 anymore+0.046
 communicate+0.046
Neg logit contributions
iden-0.038
 rac-0.035
 bulldo-0.034
 coy-0.033
 Oscar-0.033
Token matt
Token ID23963
Feature activation-6.39e-03

Pos logit contributions
ie+0.014
ations+0.014
en+0.014
 communicate+0.013
le+0.013
Neg logit contributions
iden-0.012
 rac-0.012
 bulldo-0.012
 coy-0.012
 Oscar-0.012
Token always
Token ID1464
Feature activation+3.77e-03

Pos logit contributions
cks+0.003
 communicate+0.003
ations+0.003
pole+0.003
ishment+0.003
Neg logit contributions
 rac-0.003
iden-0.003
 M-0.003
 tr-0.003
 Oscar-0.003
Token making
Token ID1642
Feature activation+1.71e-03

Pos logit contributions
cks+0.007
 communicate+0.007
pole+0.007
ations+0.007
immer+0.007
Neg logit contributions
 rac-0.006
iden-0.005
 M-0.005
 tr-0.005
 sp-0.005
Token discoveries
Token ID24362
Feature activation+6.89e-03

Pos logit contributions
cks+0.003
pole+0.003
 communicate+0.003
ility+0.003
cker+0.003
Neg logit contributions
 rac-0.003
 M-0.002
iden-0.002
 Oscar-0.002
 tr-0.002
Token and
Token ID290
Feature activation-1.60e-03

Pos logit contributions
cks+0.014
ishment+0.013
cker+0.013
 generation+0.013
ck+0.012
Neg logit contributions
 rac-0.010
 tr-0.009
iden-0.009
 M-0.009
 sp-0.008
Token shouting
Token ID19642
Feature activation-1.48e-02

Pos logit contributions
 rac+0.002
iden+0.002
 M+0.002
 tr+0.002
 coy+0.002
Neg logit contributions
cks-0.003
pole-0.003
ishment-0.003
immer-0.003
ations-0.003
Token out
Token ID503
Feature activation+4.58e-02

Pos logit contributions
 rac+0.023
iden+0.022
 M+0.022
 tr+0.022
 Oscar+0.021
Neg logit contributions
cks-0.026
 communicate-0.025
ations-0.025
ishment-0.025
 generation-0.025
Token loud
Token ID7812
Feature activation+5.75e-03

Pos logit contributions
 communicate+0.088
cks+0.088
ations+0.086
ishment+0.086
pole+0.085
Neg logit contributions
 rac-0.062
iden-0.061
 tr-0.056
 M-0.055
 coy-0.055
Token in
Token ID287
Feature activation+1.25e-02

Pos logit contributions
 communicate+0.009
cks+0.009
ations+0.009
pole+0.009
 generation+0.009
Neg logit contributions
 rac-0.010
iden-0.009
 tr-0.009
 M-0.009
 coy-0.009
Token joy
Token ID8716
Feature activation+1.43e-02

Pos logit contributions
pole+0.024
cker+0.023
 communicate+0.023
cks+0.022
ations+0.022
Neg logit contributions
 rac-0.019
 M-0.018
 sp-0.017
 coy-0.017
iden-0.016
Token.
Token ID13
Feature activation+1.32e-03

Pos logit contributions
cks+0.028
 communicate+0.027
 generation+0.027
ishment+0.026
cker+0.026
Neg logit contributions
 rac-0.019
 M-0.019
iden-0.019
 tr-0.018
 sp-0.017
Token
Token ID220
Feature activation-9.69e-04

Pos logit contributions
 communicate+0.003
 generation+0.003
cks+0.003
pole+0.003
en+0.003
Neg logit contributions
 rac-0.001
iden-0.001
 bulldo-0.001
 coy-0.001
 tr-0.001
Token jumping
Token ID14284
Feature activation-3.58e-02

Pos logit contributions
 rac+0.016
iden+0.015
 tr+0.015
 M+0.014
 sp+0.014
Neg logit contributions
 communicate-0.017
pole-0.017
ishment-0.017
 generation-0.017
cks-0.017
Token,
Token ID11
Feature activation+3.57e-03

Pos logit contributions
 rac+0.056
 tr+0.053
iden+0.053
 sp+0.050
 M+0.049
Neg logit contributions
 communicate-0.063
ishment-0.063
 generation-0.062
cks-0.062
ility-0.061
Token rolling
Token ID10708
Feature activation-4.56e-03

Pos logit contributions
 generation+0.007
 communicate+0.007
cks+0.006
immer+0.006
ility+0.006
Neg logit contributions
 rac-0.005
iden-0.005
 tr-0.005
 sp-0.005
 M-0.005
Token,
Token ID11
Feature activation+3.51e-03

Pos logit contributions
 rac+0.007
 tr+0.007
iden+0.006
 sp+0.006
 M+0.006
Neg logit contributions
 communicate-0.008
 generation-0.008
ility-0.008
pole-0.008
ishment-0.008
Token swinging
Token ID25635
Feature activation-9.13e-03

Pos logit contributions
 communicate+0.007
 generation+0.007
immer+0.007
ility+0.007
cks+0.007
Neg logit contributions
 rac-0.005
 tr-0.005
iden-0.005
 sp-0.004
 bulldo-0.004
Token,
Token ID11
Feature activation+3.42e-03

Pos logit contributions
 rac+0.013
 tr+0.012
 sp+0.012
 M+0.011
 Water+0.011
Neg logit contributions
ishment-0.019
 generation-0.018
 communicate-0.018
ility-0.018
cks-0.018
Token and
Token ID290
Feature activation+1.77e-03

Pos logit contributions
 communicate+0.007
 generation+0.007
immer+0.007
ishment+0.007
ility+0.007
Neg logit contributions
 rac-0.005
 tr-0.004
 sp-0.004
iden-0.004
 bulldo-0.004
Token throwing
Token ID9644
Feature activation+1.25e-02

Pos logit contributions
 communicate+0.003
 generation+0.003
ishment+0.003
cks+0.003
immer+0.003
Neg logit contributions
 rac-0.002
 tr-0.002
iden-0.002
 sp-0.002
 M-0.002
Token.
Token ID13
Feature activation-2.31e-03

Pos logit contributions
 generation+0.024
 communicate+0.024
pole+0.024
ility+0.024
ishment+0.023
Neg logit contributions
 rac-0.018
 tr-0.017
 sp-0.017
iden-0.016
 M-0.016
Token
Token ID198
Feature activation-1.00e-02

Pos logit contributions
iden+0.004
 rac+0.004
 coy+0.003
 bulldo+0.003
 tr+0.003
Neg logit contributions
cks-0.004
 anymore-0.004
 generation-0.004
en-0.004
ishment-0.004
Token
Token ID198
Feature activation-9.92e-03

Pos logit contributions
 rac+0.010
iden+0.010
 coy+0.009
 tr+0.009
 bulldo+0.009
Neg logit contributions
 communicate-0.022
cks-0.022
 generation-0.022
pole-0.022
en-0.022
Token and
Token ID290
Feature activation-4.48e-04

Pos logit contributions
 rac+0.005
iden+0.004
 tr+0.004
 M+0.004
 Oscar+0.004
Neg logit contributions
cks-0.006
 communicate-0.006
pole-0.006
cker-0.006
 generation-0.006
Token sad
Token ID6507
Feature activation+3.92e-02

Pos logit contributions
 rac+0.001
iden+0.001
 M+0.000
 coy+0.000
 Oscar+0.000
Neg logit contributions
cks-0.001
 communicate-0.001
 generation-0.001
pole-0.001
ations-0.001
Token.
Token ID13
Feature activation-2.99e-03

Pos logit contributions
cks+0.070
pole+0.070
 communicate+0.069
ations+0.068
 generation+0.068
Neg logit contributions
 rac-0.061
 M-0.057
 tr-0.056
iden-0.056
 Oscar-0.055
Token She
Token ID1375
Feature activation+3.29e-03

Pos logit contributions
iden+0.005
 rac+0.005
 bulldo+0.005
 coy+0.005
 Oscar+0.005
Neg logit contributions
ishment-0.004
 anymore-0.004
 generation-0.004
en-0.004
ie-0.004
Token cried
Token ID16896
Feature activation-2.00e-02

Pos logit contributions
 communicate+0.006
ie+0.006
en+0.006
ck+0.005
ishment+0.005
Neg logit contributions
iden-0.005
 rac-0.005
 Oscar-0.005
 bulldo-0.005
 Water-0.005
Token and
Token ID290
Feature activation-7.06e-04

Pos logit contributions
 rac+0.029
 M+0.027
 tr+0.026
iden+0.026
 sp+0.025
Neg logit contributions
cks-0.039
ations-0.037
 communicate-0.037
pole-0.037
ishment-0.036
Token cried
Token ID16896
Feature activation-2.00e-02

Pos logit contributions
 rac+0.001
iden+0.001
 Oscar+0.001
 tr+0.001
 coy+0.001
Neg logit contributions
 communicate-0.001
cks-0.001
ations-0.001
pole-0.001
en-0.001
Token because
Token ID780
Feature activation-1.59e-02

Pos logit contributions
 rac+0.032
 M+0.030
iden+0.030
 tr+0.029
 Oscar+0.029
Neg logit contributions
cks-0.035
ations-0.034
 communicate-0.034
pole-0.033
ishment-0.033
Token she
Token ID673
Feature activation-1.18e-02

Pos logit contributions
 rac+0.026
iden+0.025
 M+0.024
 Oscar+0.023
 tr+0.023
Neg logit contributions
cks-0.027
 communicate-0.027
pole-0.026
ishment-0.026
ations-0.026
Token knew
Token ID2993
Feature activation-1.77e-02

Pos logit contributions
iden+0.020
 rac+0.019
 Oscar+0.018
 bisc+0.018
 coy+0.018
Neg logit contributions
ie-0.019
en-0.018
 communicate-0.018
ck-0.018
 anymore-0.018
Token she
Token ID673
Feature activation-1.19e-02

Pos logit contributions
 rac+0.031
iden+0.031
 Oscar+0.029
 M+0.029
 tr+0.029
Neg logit contributions
cks-0.026
 communicate-0.026
pole-0.026
ations-0.026
ishment-0.026
Tokenmy
Token ID1820
Feature activation+4.89e-03

Pos logit contributions
 generation+0.005
 communicate+0.005
hips+0.005
ility+0.005
cks+0.005
Neg logit contributions
iden-0.002
 rac-0.002
 sp-0.002
 M-0.002
omo-0.002
Token?"
Token ID1701
Feature activation+1.90e-02

Pos logit contributions
 generation+0.010
cks+0.010
 communicate+0.010
ility+0.010
hips+0.010
Neg logit contributions
iden-0.006
 rac-0.006
 sp-0.005
 M-0.005
 bulldo-0.005
Token Ben
Token ID3932
Feature activation+1.16e-03

Pos logit contributions
 communicate+0.035
cks+0.035
ishment+0.034
pole+0.034
 generation+0.034
Neg logit contributions
iden-0.027
 rac-0.026
 tr-0.024
 bulldo-0.024
 coy-0.024
Token asked
Token ID1965
Feature activation+1.84e-02

Pos logit contributions
 communicate+0.002
cks+0.002
en+0.002
pole+0.002
ishment+0.002
Neg logit contributions
 rac-0.002
iden-0.002
 Oscar-0.002
 bulldo-0.002
 tr-0.002
Token.
Token ID13
Feature activation+5.14e-03

Pos logit contributions
 communicate+0.029
cks+0.029
 generation+0.028
ations+0.028
pole+0.028
Neg logit contributions
 rac-0.031
iden-0.031
 M-0.029
 Oscar-0.029
 tr-0.029
Token
Token ID198
Feature activation-1.10e-02

Pos logit contributions
 communicate+0.011
pole+0.011
cks+0.011
 generation+0.010
 anymore+0.010
Neg logit contributions
iden-0.006
 rac-0.006
 tr-0.005
 coy-0.005
 bulldo-0.005
Token
Token ID198
Feature activation-1.04e-02

Pos logit contributions
 rac+0.012
iden+0.012
 tr+0.011
 coy+0.011
 bulldo+0.011
Neg logit contributions
 communicate-0.023
cks-0.023
 generation-0.023
ie-0.023
en-0.023
Token"
Token ID1
Feature activation-2.52e-04

Pos logit contributions
iden+0.016
 rac+0.016
 tr+0.015
 Oscar+0.015
 coy+0.015
Neg logit contributions
 communicate-0.018
cks-0.017
 generation-0.017
pole-0.017
en-0.017
TokenSure
Token ID19457
Feature activation+2.98e-02

Pos logit contributions
 rac+0.000
iden+0.000
 Oscar+0.000
 bulldo+0.000
 coy+0.000
Neg logit contributions
 communicate-0.001
pole-0.000
cks-0.000
 generation-0.000
ations-0.000
Token,
Token ID11
Feature activation+6.10e-03

Pos logit contributions
 communicate+0.054
cks+0.053
ations+0.053
 generation+0.052
pole+0.052
Neg logit contributions
 rac-0.045
iden-0.044
 M-0.041
 tr-0.041
 Oscar-0.041
Token honey
Token ID12498
Feature activation-3.47e-02

Pos logit contributions
 communicate+0.011
cks+0.011
pole+0.010
en+0.010
ations+0.010
Neg logit contributions
iden-0.009
 rac-0.009
 coy-0.008
 bulldo-0.008
 tr-0.008
Token to
Token ID284
Feature activation+1.32e-04

Pos logit contributions
cks+0.074
 communicate+0.074
ility+0.072
en+0.071
ations+0.071
Neg logit contributions
 rac-0.056
iden-0.055
 tr-0.050
 Oscar-0.050
 M-0.050
Token buy
Token ID2822
Feature activation+3.86e-02

Pos logit contributions
cks+0.000
pole+0.000
ations+0.000
 communicate+0.000
ility+0.000
Neg logit contributions
 rac-0.000
iden-0.000
 M-0.000
 Oscar-0.000
 tr-0.000
Token some
Token ID617
Feature activation+9.28e-03

Pos logit contributions
 communicate+0.063
cks+0.062
 anymore+0.062
en+0.061
ations+0.061
Neg logit contributions
iden-0.062
 rac-0.062
 coy-0.057
 bulldo-0.057
 tr-0.055
Token treats
Token ID18432
Feature activation-1.21e-02

Pos logit contributions
 communicate+0.017
ie+0.017
 anymore+0.016
ations+0.016
pole+0.016
Neg logit contributions
iden-0.013
 rac-0.013
 coy-0.012
Fel-0.012
 Oscar-0.012
Token.
Token ID13
Feature activation+2.93e-03

Pos logit contributions
 rac+0.018
iden+0.018
 tr+0.018
 Oscar+0.017
 M+0.017
Neg logit contributions
cks-0.021
 communicate-0.021
ility-0.021
ishment-0.021
 generation-0.021
Token Tim
Token ID5045
Feature activation-6.22e-03

Pos logit contributions
 generation+0.005
pole+0.005
ishment+0.005
 anymore+0.005
en+0.005
Neg logit contributions
 rac-0.004
iden-0.004
 coy-0.004
 tr-0.004
Fel-0.004
Tokenmy
Token ID1820
Feature activation-6.70e-04

Pos logit contributions
 rac+0.013
iden+0.013
 Oscar+0.012
 coy+0.012
 tr+0.012
Neg logit contributions
 communicate-0.007
cks-0.007
en-0.007
ations-0.007
pole-0.007
Token wanted
Token ID2227
Feature activation+1.46e-04

Pos logit contributions
 rac+0.001
iden+0.001
 coy+0.001
 Water+0.001
 Oscar+0.001
Neg logit contributions
pole-0.001
ie-0.001
en-0.001
ations-0.001
 communicate-0.001
Token to
Token ID284
Feature activation+6.77e-04

Pos logit contributions
 communicate+0.000
cks+0.000
en+0.000
ations+0.000
 generation+0.000
Neg logit contributions
 rac-0.000
iden-0.000
 coy-0.000
 tr-0.000
 bulldo-0.000
Token buy
Token ID2822
Feature activation+3.90e-02

Pos logit contributions
cks+0.001
ations+0.001
 communicate+0.001
ishment+0.001
ility+0.001
Neg logit contributions
 rac-0.001
iden-0.001
 tr-0.001
 Oscar-0.001
 M-0.001
Token a
Token ID257
Feature activation-1.32e-03

Pos logit contributions
 communicate+0.072
cks+0.072
ations+0.070
 generation+0.070
ishment+0.070
Neg logit contributions
 rac-0.055
iden-0.055
 Oscar-0.051
 tr-0.050
 M-0.050
Token who
Token ID508
Feature activation-4.43e-02

Pos logit contributions
ishment+0.055
cks+0.054
 generation+0.052
 communicate+0.051
ility+0.050
Neg logit contributions
iden-0.042
 rac-0.042
 tr-0.039
 M-0.038
 sp-0.037
Token helped
Token ID4193
Feature activation+1.38e-02

Pos logit contributions
 rac+0.061
iden+0.058
 M+0.052
 Oscar+0.052
 Now+0.052
Neg logit contributions
ishment-0.087
cks-0.085
pole-0.083
immer-0.083
ility-0.082
Token save
Token ID3613
Feature activation-4.70e-02

Pos logit contributions
cks+0.022
ishment+0.021
ility+0.021
pole+0.021
immer+0.021
Neg logit contributions
 rac-0.024
 M-0.024
 Oscar-0.023
iden-0.022
 tr-0.022
Token him
Token ID683
Feature activation-7.52e-03

Pos logit contributions
 rac+0.072
 M+0.068
 Oscar+0.067
iden+0.065
 tr+0.064
Neg logit contributions
pole-0.082
cks-0.080
ility-0.080
ishment-0.079
 communicate-0.079
Token.
Token ID13
Feature activation+7.75e-04

Pos logit contributions
 rac+0.011
 tr+0.010
 sp+0.010
 M+0.010
iden+0.010
Neg logit contributions
cks-0.014
ishment-0.014
ility-0.013
 communicate-0.013
pole-0.013
Token And
Token ID843
Feature activation+3.12e-03

Pos logit contributions
 generation+0.002
pole+0.002
 communicate+0.002
en+0.002
cks+0.002
Neg logit contributions
 rac-0.001
iden-0.001
 tr-0.001
 coy-0.001
 sp-0.001
Token they
Token ID484
Feature activation+4.64e-03

Pos logit contributions
cks+0.006
 communicate+0.005
ishment+0.005
ility+0.005
ations+0.005
Neg logit contributions
 rac-0.005
iden-0.005
 M-0.004
 Oscar-0.004
 coy-0.004
Token all
Token ID477
Feature activation-7.86e-03

Pos logit contributions
cks+0.008
ishment+0.008
ations+0.008
 communicate+0.008
 generation+0.008
Neg logit contributions
 rac-0.007
iden-0.007
 tr-0.006
 Oscar-0.006
 M-0.006
Token lived
Token ID5615
Feature activation+2.11e-02

Pos logit contributions
 rac+0.012
iden+0.012
 tr+0.011
 M+0.011
 Oscar+0.011
Neg logit contributions
ishment-0.014
cks-0.014
 communicate-0.014
ations-0.013
 generation-0.013
Token happily
Token ID18177
Feature activation+1.53e-02

Pos logit contributions
 communicate+0.033
cks+0.033
ishment+0.032
en+0.032
ations+0.032
Neg logit contributions
 rac-0.036
iden-0.035
 tr-0.033
 Oscar-0.033
 M-0.033
Token ever
Token ID1683
Feature activation+3.18e-02

Pos logit contributions
 communicate+0.026
cks+0.026
ishment+0.026
pole+0.026
ations+0.026
Neg logit contributions
 rac-0.024
iden-0.023
 tr-0.021
 Oscar-0.021
 coy-0.021
Token.
Token ID13
Feature activation+1.35e-03

Pos logit contributions
 rac+0.031
 M+0.029
 sp+0.029
 tr+0.029
iden+0.028
Neg logit contributions
 communicate-0.043
 generation-0.041
ishment-0.040
ations-0.039
cks-0.039
Token She
Token ID1375
Feature activation+1.26e-02

Pos logit contributions
 generation+0.003
ishment+0.002
pole+0.002
 anymore+0.002
 communicate+0.002
Neg logit contributions
iden-0.002
 rac-0.002
 bulldo-0.002
 coy-0.002
 tr-0.002
Token stumbled
Token ID24241
Feature activation-3.82e-02

Pos logit contributions
 communicate+0.023
en+0.023
ations+0.022
 generation+0.022
cks+0.022
Neg logit contributions
iden-0.018
 rac-0.018
 Oscar-0.017
 bulldo-0.016
 coy-0.016
Token around
Token ID1088
Feature activation+3.35e-02

Pos logit contributions
iden+0.064
 rac+0.063
 bulldo+0.061
 coy+0.060
 Oscar+0.060
Neg logit contributions
en-0.060
ie-0.058
pole-0.057
 anymore-0.057
ations-0.056
Token,
Token ID11
Feature activation+4.14e-03

Pos logit contributions
cks+0.067
pole+0.065
ishment+0.063
cker+0.063
 communicate+0.063
Neg logit contributions
 rac-0.048
 M-0.047
 tr-0.045
 sp-0.044
iden-0.042
Token looking
Token ID2045
Feature activation-5.60e-02

Pos logit contributions
 communicate+0.007
cks+0.007
ations+0.007
 generation+0.007
ility+0.007
Neg logit contributions
 rac-0.006
iden-0.006
 tr-0.006
 M-0.006
 sp-0.006
Token at
Token ID379
Feature activation+3.55e-02

Pos logit contributions
 rac+0.098
iden+0.097
 tr+0.091
 Oscar+0.090
 coy+0.090
Neg logit contributions
 communicate-0.085
cks-0.085
 generation-0.082
ations-0.082
pole-0.082
Token everything
Token ID2279
Feature activation+2.11e-03

Pos logit contributions
pole+0.073
immer+0.072
cker+0.072
cks+0.071
 communicate+0.071
Neg logit contributions
 M-0.049
 sp-0.044
 rac-0.044
 tr-0.043
 Oscar-0.041
Token with
Token ID351
Feature activation-2.22e-02

Pos logit contributions
 communicate+0.004
 generation+0.004
cks+0.004
cker+0.004
ishment+0.004
Neg logit contributions
 M-0.003
 tr-0.003
 sp-0.003
iden-0.003
 rac-0.003
Token her
Token ID607
Feature activation+1.86e-02

Pos logit contributions
 rac+0.034
 M+0.032
 sp+0.032
 tr+0.032
iden+0.031
Neg logit contributions
pole-0.039
ations-0.039
cks-0.039
 communicate-0.038
ishment-0.037
Token big
Token ID1263
Feature activation-1.27e-03

Pos logit contributions
 communicate+0.036
cks+0.035
ations+0.034
en+0.034
 generation+0.034
Neg logit contributions
iden-0.025
 rac-0.025
 Oscar-0.023
 coy-0.023
 bulldo-0.023
Token the
Token ID262
Feature activation-1.88e-03

Pos logit contributions
 communicate+0.054
pole+0.053
cks+0.051
ations+0.051
 generation+0.051
Neg logit contributions
 rac-0.046
iden-0.046
 M-0.045
 Oscar-0.045
 sp-0.043
Token soup
Token ID17141
Feature activation-1.73e-02

Pos logit contributions
iden+0.003
 rac+0.003
 coy+0.002
 Oscar+0.002
 tr+0.002
Neg logit contributions
 communicate-0.004
ations-0.004
 anymore-0.003
en-0.003
ie-0.003
Token without
Token ID1231
Feature activation-8.57e-03

Pos logit contributions
 rac+0.027
iden+0.026
 M+0.026
 sp+0.025
 tr+0.025
Neg logit contributions
 communicate-0.031
cks-0.031
 generation-0.029
ishment-0.029
ations-0.029
Token asking
Token ID4737
Feature activation-2.06e-02

Pos logit contributions
 rac+0.014
iden+0.013
 Oscar+0.013
 M+0.013
 sp+0.013
Neg logit contributions
 communicate-0.014
pole-0.014
ations-0.014
hips-0.013
 generation-0.013
Token.
Token ID13
Feature activation-3.31e-03

Pos logit contributions
 rac+0.035
iden+0.035
 M+0.033
 Oscar+0.033
 tr+0.033
Neg logit contributions
 communicate-0.032
cks-0.031
pole-0.030
ations-0.030
 generation-0.030
Token Now
Token ID2735
Feature activation-5.83e-02

Pos logit contributions
 rac+0.005
iden+0.005
 tr+0.005
 coy+0.004
 bisc+0.004
Neg logit contributions
 generation-0.006
ishment-0.006
pole-0.006
en-0.006
 communicate-0.006
Token you
Token ID345
Feature activation+8.19e-04

Pos logit contributions
 rac+0.082
iden+0.082
 M+0.079
 tr+0.078
 Oscar+0.078
Neg logit contributions
cks-0.108
 communicate-0.107
 generation-0.103
ations-0.103
ishment-0.102
Token won
Token ID1839
Feature activation-4.29e-02

Pos logit contributions
 communicate+0.001
cks+0.001
en+0.001
ations+0.001
pole+0.001
Neg logit contributions
iden-0.001
 rac-0.001
 tr-0.001
 coy-0.001
 Oscar-0.001
Token't
Token ID470
Feature activation+5.82e-03

Pos logit contributions
iden+0.075
 rac+0.072
 coy+0.067
 tr+0.067
 bulldo+0.066
Neg logit contributions
 communicate-0.067
cks-0.066
en-0.066
 anymore-0.064
ations-0.064
Token gain
Token ID4461
Feature activation+1.13e-02

Pos logit contributions
cks+0.009
 communicate+0.009
ations+0.009
ishment+0.009
 generation+0.009
Neg logit contributions
 rac-0.010
iden-0.010
 Oscar-0.009
 tr-0.009
 coy-0.009
Token anything
Token ID1997
Feature activation+3.15e-02

Pos logit contributions
 communicate+0.018
pole+0.018
cks+0.018
ations+0.018
ishment+0.018
Neg logit contributions
 rac-0.018
iden-0.018
 Oscar-0.017
 M-0.017
 tr-0.017
Token something
Token ID1223
Feature activation-2.31e-02

Pos logit contributions
cks+0.019
 communicate+0.019
pole+0.019
ishment+0.018
 generation+0.018
Neg logit contributions
 rac-0.016
iden-0.016
 M-0.015
 tr-0.015
 Oscar-0.015
Token was
Token ID373
Feature activation-1.17e-02

Pos logit contributions
iden+0.029
 rac+0.029
 Oscar+0.026
 bulldo+0.025
 coy+0.025
Neg logit contributions
 communicate-0.046
cks-0.046
pole-0.045
ations-0.045
en-0.045
Token attached
Token ID7223
Feature activation-1.15e-02

Pos logit contributions
 rac+0.017
iden+0.016
 sp+0.015
 tr+0.015
 bulldo+0.015
Neg logit contributions
 communicate-0.022
ishment-0.022
 generation-0.022
ations-0.021
en-0.021
Token to
Token ID284
Feature activation-3.35e-03

Pos logit contributions
 rac+0.009
 M+0.008
 tr+0.008
iden+0.008
 sp+0.007
Neg logit contributions
 communicate-0.029
ishment-0.028
cks-0.028
 generation-0.028
ations-0.028
Token the
Token ID262
Feature activation-2.63e-03

Pos logit contributions
 M+0.004
 rac+0.003
 sp+0.003
 Y+0.003
 big+0.003
Neg logit contributions
pole-0.008
cks-0.008
ations-0.008
ishment-0.008
ist-0.008
Token cave
Token ID11527
Feature activation-5.54e-02

Pos logit contributions
iden+0.005
 Oscar+0.004
Fel+0.004
lic+0.004
 rac+0.004
Neg logit contributions
 anymore-0.004
 generation-0.004
pole-0.004
 communicate-0.004
ations-0.004
Token.
Token ID13
Feature activation-3.36e-03

Pos logit contributions
 rac+0.073
 M+0.070
 tr+0.070
 sp+0.069
iden+0.068
Neg logit contributions
 communicate-0.110
cks-0.108
 generation-0.106
ishment-0.106
ility-0.103
Token She
Token ID1375
Feature activation+3.46e-03

Pos logit contributions
 rac+0.006
iden+0.006
 coy+0.005
 bulldo+0.005
 tr+0.005
Neg logit contributions
 generation-0.005
pole-0.005
ility-0.005
 communicate-0.005
en-0.004
Token looked
Token ID3114
Feature activation+9.88e-04

Pos logit contributions
ck+0.005
ie+0.005
cker+0.005
pole+0.005
en+0.005
Neg logit contributions
 rac-0.006
iden-0.006
 Oscar-0.006
 Water-0.006
 bisc-0.005
Token closer
Token ID5699
Feature activation-1.57e-02

Pos logit contributions
 communicate+0.001
cks+0.001
en+0.001
ishment+0.001
pole+0.001
Neg logit contributions
 rac-0.002
iden-0.002
 Oscar-0.002
 bulldo-0.002
 coy-0.002
Token and
Token ID290
Feature activation-1.05e-03

Pos logit contributions
 rac+0.021
iden+0.021
 tr+0.021
 M+0.020
 sp+0.020
Neg logit contributions
 communicate-0.030
cks-0.029
 generation-0.029
ations-0.029
ility-0.028
Token,
Token ID11
Feature activation+1.07e-03

Pos logit contributions
 rac+0.063
iden+0.062
 tr+0.059
 long+0.056
 sp+0.056
Neg logit contributions
ishment-0.111
ations-0.108
 communicate-0.107
 generation-0.106
cks-0.105
Token deep
Token ID2769
Feature activation-6.01e-03

Pos logit contributions
 anything+0.002
 communicate+0.002
 generation+0.002
 someday+0.002
pole+0.002
Neg logit contributions
iden-0.002
 bulldo-0.001
lic-0.001
 rac-0.001
Fel-0.001
Token sigh
Token ID19680
Feature activation-1.82e-02

Pos logit contributions
 rac+0.009
 tr+0.008
 sp+0.008
iden+0.008
 bulldo+0.008
Neg logit contributions
 communicate-0.011
cks-0.011
 generation-0.011
pole-0.011
ations-0.011
Token,
Token ID11
Feature activation+1.65e-03

Pos logit contributions
iden+0.028
 rac+0.028
 Oscar+0.026
 coy+0.026
 tr+0.026
Neg logit contributions
 communicate-0.031
cks-0.030
en-0.030
ations-0.030
pole-0.030
Token he
Token ID339
Feature activation-1.76e-02

Pos logit contributions
 communicate+0.003
en+0.003
 generation+0.003
ishment+0.003
ations+0.003
Neg logit contributions
iden-0.002
 rac-0.002
 bulldo-0.002
 tr-0.002
lic-0.002
Token watched
Token ID7342
Feature activation-4.57e-02

Pos logit contributions
 rac+0.029
iden+0.028
 tr+0.027
 coy+0.026
 Oscar+0.026
Neg logit contributions
 communicate-0.029
cks-0.029
ishment-0.028
 generation-0.028
ations-0.028
Token it
Token ID340
Feature activation-1.22e-02

Pos logit contributions
 M+0.065
 rac+0.064
 tr+0.059
 Oscar+0.059
 Water+0.058
Neg logit contributions
ations-0.086
cks-0.086
ishment-0.086
 communicate-0.085
 generation-0.083
Token soar
Token ID48701
Feature activation+6.00e-02

Pos logit contributions
 tr+0.020
 rac+0.019
 w+0.019
 M+0.018
 sp+0.018
Neg logit contributions
ishment-0.021
cks-0.019
 generation-0.019
ations-0.019
ility-0.018
Token up
Token ID510
Feature activation+5.33e-03

Pos logit contributions
 generation+0.109
 communicate+0.109
cks+0.108
ishment+0.107
ility+0.106
Neg logit contributions
 tr-0.089
 rac-0.085
 M-0.083
 sp-0.081
 big-0.079
Token ever
Token ID1683
Feature activation+3.31e-02

Pos logit contributions
 communicate+0.011
cks+0.011
ations+0.010
ility+0.010
 generation+0.010
Neg logit contributions
 tr-0.007
 rac-0.007
 sp-0.007
 M-0.006
iden-0.006
Token higher
Token ID2440
Feature activation+1.18e-02

Pos logit contributions
 communicate+0.063
 generation+0.060
cks+0.059
hips+0.059
hip+0.055
Neg logit contributions
 tr-0.049
 sp-0.049
 M-0.045
 rac-0.045
iden-0.043
Token and
Token ID290
Feature activation+1.11e-03

Pos logit contributions
 communicate+0.007
cks+0.007
en+0.007
ishment+0.007
ations+0.007
Neg logit contributions
iden-0.007
 rac-0.007
 bulldo-0.007
 tr-0.007
 Oscar-0.007
Token thanked
Token ID26280
Feature activation+3.80e-02

Pos logit contributions
 communicate+0.002
ck+0.002
ie+0.002
ishment+0.002
pole+0.002
Neg logit contributions
iden-0.002
 rac-0.002
 bulldo-0.002
Fel-0.002
 coy-0.002
Token her
Token ID607
Feature activation+1.25e-02

Pos logit contributions
pole+0.075
cks+0.072
cker+0.072
ations+0.070
ishment+0.069
Neg logit contributions
 M-0.065
 Water-0.053
 Mom-0.053
 Y-0.053
 rac-0.052
Token mom
Token ID1995
Feature activation-3.05e-03

Pos logit contributions
 communicate+0.025
 anymore+0.024
ie+0.024
en+0.023
 fade+0.023
Neg logit contributions
iden-0.017
 rac-0.016
 Oscar-0.015
 coy-0.015
 bulldo-0.015
Token for
Token ID329
Feature activation-8.06e-03

Pos logit contributions
 M+0.004
 rac+0.004
 tr+0.004
iden+0.004
 Oscar+0.003
Neg logit contributions
cks-0.007
 communicate-0.006
ishment-0.006
ations-0.006
ck-0.006
Token helping
Token ID5742
Feature activation-4.57e-02

Pos logit contributions
 rac+0.013
iden+0.012
 Oscar+0.012
 M+0.012
 tr+0.012
Neg logit contributions
ishment-0.013
pole-0.013
 generation-0.013
en-0.013
ations-0.013
Token.
Token ID13
Feature activation-3.22e-03

Pos logit contributions
 M+0.075
 rac+0.062
 Mom+0.058
 m+0.058
 Y+0.058
Neg logit contributions
cks-0.095
pole-0.095
cker-0.088
hips-0.088
fly-0.086
Token From
Token ID3574
Feature activation-1.65e-02

Pos logit contributions
iden+0.005
 rac+0.005
 bulldo+0.004
 coy+0.004
 Oscar+0.004
Neg logit contributions
 communicate-0.006
 generation-0.006
ishment-0.006
cks-0.006
en-0.006
Token then
Token ID788
Feature activation-6.52e-03

Pos logit contributions
 rac+0.030
iden+0.029
 Oscar+0.028
 tr+0.028
 coy+0.027
Neg logit contributions
 communicate-0.024
 generation-0.024
cks-0.024
pole-0.024
ations-0.024
Token on
Token ID319
Feature activation+1.27e-02

Pos logit contributions
iden+0.010
 rac+0.010
 coy+0.009
 tr+0.009
 Oscar+0.009
Neg logit contributions
cks-0.011
 communicate-0.011
en-0.011
ations-0.011
ie-0.010
Token,
Token ID11
Feature activation+1.03e-04

Pos logit contributions
 communicate+0.024
cks+0.024
pole+0.024
ishment+0.024
 generation+0.024
Neg logit contributions
 rac-0.018
iden-0.018
 M-0.017
 Oscar-0.017
 tr-0.017