TOP ACTIVATIONS
MAX = 4.810

dragon loved it ! <|endoftext|> Lu cy was jealous of her
of wonderful memories from <|endoftext|> Lu cy was a little girl
The child said " Hello Frank ! I 'm going to
to name it " Sun ny " because it looked like
, rat ! You are my lunch !" But
ummy was so proud of Lex i . She said with
The frog said " Hi there little bunny ! What are
to be the strongest animal in the world . So they
her ears and the sun on her back . She felt
She put some lot ion on her hands and started to
name tag that said " Bob ". " This
dared to join them <|endoftext|> Ab by and Tommy were twins
The end . <|endoftext|> Lu cy was very excited to
. We don 't want Sunshine to hurt herself ." She
She put some lot ion on her hands and started to
said , " Thank you Stella ! What should we do
breeze rust le the leaves on the trees .
M opsy was happy for Fl opsy and said ,
said " No juice allowed inside ". She was sad because
Mrs Berry said " Of course , Lucy ." and gave

MIN ACTIVATIONS
MIN = -5.594

We can make it look even more special ," Lily said
end , Lily received something even better than her hoop -
. We can make it even better . We can use
, you can make something even better . The end .
work together and make them even better than before . And
quiet compared to the noisy se ag ull s outside .
to make her doll look even prett ier . She hugged
to others made them feel even better than winning a race
that he had gained something even more important - a friend
his mother gave him an even longer rope to play with
can help us make something even more special ! <|endoftext|> Once
on top to make it even taller !" The
and stickers and make them even better than before . Then
try again and make something even better . <|endoftext|> Once upon
next game and getting an even higher score . <|endoftext|> Once
others can make you feel even happier than getting what you
always be replaced with something even better . <|endoftext|> Once upon
had bought a new magnet even though the old one was
Why are you making so much noise ?" They
Bob wanted to make it even better , so he found

INTERVAL 2.209 - 4.810
CONTAINS 0.737%

Jos ie couldn ’ t find her favourite toy
, and she loved the way it tick led her toes
so it didn ’ t move . The children
pretend we 're the fish in the pond !" said John
allows on sticks and held them over the fire . The

INTERVAL -0.392 - 2.209
CONTAINS 67.066%

?" L ila and Ben were scared . They
in the park . Kitty was happy that her idea worked
't eat the toast !" but it was too late .
was too fast . Sadly , the wolf caught them both
room . Sara was the seeker and Tom was the h

INTERVAL -2.993 - -0.392
CONTAINS 32.126%

playing outside and got her bl ouse dirty . Her mom
, there was a playful dog named Spot . Spot loved
, they saw a big box in the living room .
. Her friends looked around and then Jane said
Ben says . He is scared and angry . He tries

INTERVAL -5.594 - -2.993
CONTAINS 0.071%

that the celebration would be even swe eter with his help
hurt , but we should always ask for help when we
day she was having so much fun shooting her hose that
a big poster . The teacher says they can use any
with a set of colourful cr ay ons .
Token dragon
Token ID10441
Feature activation+1.22e+00

Pos logit contributions
 Ralph+0.009
 Lex+0.009
Albert+0.009
 Mole+0.008
Mo+0.008
Neg logit contributions
auc-0.010
udd-0.010
wheel-0.009
 about-0.009
 x-0.009
Token loved
Token ID6151
Feature activation+8.23e-01

Pos logit contributions
 Mole+0.033
 Ralph+0.033
Albert+0.032
 Lex+0.032
 Amelia+0.031
Neg logit contributions
udd-0.037
auc-0.037
 about-0.035
wheel-0.034
ep-0.032
Token it
Token ID340
Feature activation+1.79e+00

Pos logit contributions
Albert+0.022
 Mole+0.022
 Lex+0.022
 Ralph+0.022
 Amelia+0.021
Neg logit contributions
udd-0.025
auc-0.025
 about-0.024
wheel-0.022
ep-0.020
Token!
Token ID0
Feature activation+6.11e-01

Pos logit contributions
 Ralph+0.055
 Mole+0.055
Albert+0.054
 Lex+0.053
 Amelia+0.052
Neg logit contributions
auc-0.049
udd-0.049
wheel-0.043
 about-0.043
iren-0.039
Token<|endoftext|>
Token ID50256
Feature activation+7.04e-02

Pos logit contributions
Albert+0.020
 mole+0.020
 Lex+0.019
sie+0.019
Lex+0.019
Neg logit contributions
 about-0.015
udd-0.015
auc-0.014
oke-0.013
wheel-0.012
TokenLu
Token ID25596
Feature activation+4.81e+00

Pos logit contributions
 Lex+0.002
Albert+0.002
 Mole+0.002
 Amelia+0.002
 chicks+0.002
Neg logit contributions
auc-0.002
udd-0.002
 about-0.002
wheel-0.002
iren-0.002
Tokency
Token ID948
Feature activation+1.41e+00

Pos logit contributions
 Mole+0.111
 Fox+0.107
 Ralph+0.104
 mole+0.103
Larry+0.101
Neg logit contributions
udd-0.150
auc-0.149
 about-0.142
 works-0.136
 fixes-0.134
Token was
Token ID373
Feature activation-1.26e+00

Pos logit contributions
 Mole+0.049
 Ralph+0.049
Albert+0.049
 Lex+0.048
 Fox+0.047
Neg logit contributions
auc-0.032
udd-0.032
wheel-0.028
 about-0.028
iren-0.025
Token jealous
Token ID19354
Feature activation-6.17e-01

Pos logit contributions
auc+0.043
udd+0.042
wheel+0.038
orting+0.037
iren+0.036
Neg logit contributions
 Ralph-0.032
 Mole-0.032
 Lex-0.031
 Amelia-0.031
 Buzz-0.031
Token of
Token ID286
Feature activation+1.87e-01

Pos logit contributions
auc+0.020
udd+0.020
ep+0.018
iren+0.018
wheel+0.017
Neg logit contributions
 Mole-0.014
 Ralph-0.014
 Fox-0.014
 Amelia-0.013
 Buzz-0.013
Token her
Token ID607
Feature activation+2.26e-01

Pos logit contributions
Albert+0.005
 Ralph+0.005
 Mole+0.005
 Lex+0.005
 Fox+0.005
Neg logit contributions
auc-0.006
udd-0.006
 about-0.005
wheel-0.005
iren-0.005
Token of
Token ID286
Feature activation-1.21e+00

Pos logit contributions
auc+0.032
udd+0.030
wheel+0.029
ep+0.026
ru+0.026
Neg logit contributions
 Ralph-0.035
 Mole-0.033
 Lex-0.033
Albert-0.032
 Amelia-0.032
Token wonderful
Token ID7932
Feature activation-5.76e-01

Pos logit contributions
auc+0.036
udd+0.035
wheel+0.032
 about+0.030
iren+0.030
Neg logit contributions
 Ralph-0.036
 Mole-0.034
Albert-0.034
 Lex-0.034
 Amelia-0.033
Token memories
Token ID9846
Feature activation-3.77e-01

Pos logit contributions
auc+0.018
udd+0.017
wheel+0.016
 about+0.015
iren+0.015
Neg logit contributions
 Ralph-0.016
Albert-0.016
 Mole-0.016
 Lex-0.016
 Amelia-0.015
Token from
Token ID422
Feature activation+9.14e-01

Pos logit contributions
auc+0.011
udd+0.011
wheel+0.010
 x+0.010
iren+0.009
Neg logit contributions
 Ralph-0.011
 Mole-0.010
 Amelia-0.010
Albert-0.010
 Albert-0.010
Token<|endoftext|>
Token ID50256
Feature activation-5.44e-01

Pos logit contributions
 Mole+0.024
Albert+0.024
 Ralph+0.024
 Lex+0.024
Larry+0.023
Neg logit contributions
auc-0.028
udd-0.027
 about-0.026
wheel-0.025
iren-0.023
TokenLu
Token ID25596
Feature activation+4.56e+00

Pos logit contributions
auc+0.018
udd+0.017
 about+0.017
wheel+0.016
iren+0.015
Neg logit contributions
 Lex-0.013
Albert-0.013
 Mole-0.013
 Ralph-0.013
 Amelia-0.013
Tokency
Token ID948
Feature activation+2.46e+00

Pos logit contributions
 Mole+0.100
 mole+0.093
 Fox+0.089
 Buzz+0.087
 Ralph+0.084
Neg logit contributions
udd-0.155
 about-0.148
auc-0.146
 writes-0.136
ep-0.132
Token was
Token ID373
Feature activation-1.71e+00

Pos logit contributions
Albert+0.104
 Mole+0.104
 Ralph+0.102
Larry+0.101
 Lex+0.101
Neg logit contributions
udd-0.039
auc-0.036
 about-0.034
wheel-0.028
 office-0.023
Token a
Token ID257
Feature activation+9.81e-01

Pos logit contributions
auc+0.056
udd+0.055
wheel+0.050
iren+0.048
orting+0.048
Neg logit contributions
 Ralph-0.045
 Lex-0.044
 Mole-0.044
 Buzz-0.043
 Amelia-0.043
Token little
Token ID1310
Feature activation+1.37e+00

Pos logit contributions
 Ralph+0.030
 Mole+0.030
Albert+0.030
 Flo+0.029
 Lex+0.029
Neg logit contributions
udd-0.026
auc-0.025
 about-0.022
wheel-0.022
 x-0.021
Token girl
Token ID2576
Feature activation+2.51e-01

Pos logit contributions
 Ralph+0.039
 Mole+0.039
Albert+0.038
 Lex+0.038
 Fox+0.037
Neg logit contributions
auc-0.041
udd-0.040
wheel-0.037
 about-0.036
iren-0.034
Token The
Token ID383
Feature activation+3.29e-01

Pos logit contributions
Albert+0.034
Pen+0.034
 Mole+0.033
opsy+0.033
 Ralph+0.033
Neg logit contributions
 about-0.038
auc-0.036
udd-0.035
iren-0.034
wheel-0.033
Token child
Token ID1200
Feature activation-6.32e-01

Pos logit contributions
 Mole+0.007
 Ralph+0.007
Albert+0.007
 Lex+0.007
 Fox+0.007
Neg logit contributions
auc-0.012
udd-0.012
wheel-0.011
 about-0.011
iren-0.010
Token said
Token ID531
Feature activation+6.26e-01

Pos logit contributions
auc+0.017
udd+0.017
 about+0.016
wheel+0.016
iren+0.014
Neg logit contributions
 Ralph-0.019
Albert-0.019
 Mole-0.019
 Lex-0.018
 Fox-0.018
Token "
Token ID366
Feature activation+1.65e+00

Pos logit contributions
 Ralph+0.020
 Mole+0.020
Albert+0.019
 Lex+0.019
 Fox+0.019
Neg logit contributions
auc-0.017
udd-0.017
wheel-0.015
 about-0.014
iren-0.013
TokenHello
Token ID15496
Feature activation+1.96e+00

Pos logit contributions
 Mole+0.041
 Ralph+0.041
 Buzz+0.039
 Flo+0.039
 Fox+0.038
Neg logit contributions
auc-0.049
udd-0.049
wheel-0.047
iren-0.044
ru-0.044
Token Frank
Token ID5278
Feature activation+4.45e+00

Pos logit contributions
Albert+0.066
Pen+0.065
 barely+0.065
 fingertips+0.064
 Mole+0.063
Neg logit contributions
 about-0.045
udd-0.040
auc-0.037
oke-0.033
orting-0.032
Token!
Token ID0
Feature activation+1.21e+00

Pos logit contributions
 barely+0.155
 fingertips+0.152
Albert+0.151
 Mole+0.149
Pen+0.148
Neg logit contributions
udd-0.089
 about-0.089
auc-0.082
oke-0.078
er-0.069
Token I
Token ID314
Feature activation+1.05e+00

Pos logit contributions
Albert+0.030
 Lex+0.030
 Ralph+0.030
Jess+0.029
Lex+0.029
Neg logit contributions
auc-0.038
 about-0.034
udd-0.034
wheel-0.033
iren-0.032
Token'm
Token ID1101
Feature activation+6.17e-01

Pos logit contributions
 Ralph+0.024
 Mole+0.024
Albert+0.023
 Lex+0.023
 Amelia+0.023
Neg logit contributions
auc-0.038
udd-0.037
wheel-0.034
 about-0.034
iren-0.032
Token going
Token ID1016
Feature activation+7.42e-01

Pos logit contributions
 Ralph+0.018
 Mole+0.018
Albert+0.018
 Lex+0.018
 Fox+0.017
Neg logit contributions
auc-0.018
udd-0.018
wheel-0.016
 about-0.015
iren-0.015
Token to
Token ID284
Feature activation+7.12e-01

Pos logit contributions
 Mole+0.022
 Ralph+0.022
Albert+0.022
 Lex+0.022
 Amelia+0.021
Neg logit contributions
udd-0.020
auc-0.020
 about-0.019
wheel-0.018
iren-0.017
Token to
Token ID284
Feature activation+1.01e+00

Pos logit contributions
 Ralph+0.000
 Mole+0.000
 Fox+0.000
Albert+0.000
 Lex+0.000
Neg logit contributions
auc-0.000
udd-0.000
wheel-0.000
iren-0.000
 about-0.000
Token name
Token ID1438
Feature activation+9.85e-01

Pos logit contributions
Albert+0.032
 Ralph+0.032
Larry+0.031
 Mole+0.031
Jess+0.030
Neg logit contributions
auc-0.023
 about-0.022
wheel-0.021
udd-0.021
 x-0.020
Token it
Token ID340
Feature activation+1.17e+00

Pos logit contributions
 Mole+0.027
Albert+0.026
 Ralph+0.025
 Lex+0.025
Larry+0.025
Neg logit contributions
 about-0.030
auc-0.028
udd-0.027
wheel-0.027
ep-0.025
Token "
Token ID366
Feature activation+9.55e-01

Pos logit contributions
 Ralph+0.043
Albert+0.043
 Mole+0.043
 Lex+0.043
 Amelia+0.042
Neg logit contributions
auc-0.024
udd-0.024
 about-0.021
wheel-0.021
iren-0.018
TokenSun
Token ID16012
Feature activation+1.92e+00

Pos logit contributions
 Ralph+0.026
 Mole+0.025
 Lex+0.024
 Amelia+0.024
Albert+0.023
Neg logit contributions
auc-0.030
udd-0.030
wheel-0.028
ep-0.026
 about-0.026
Tokenny
Token ID3281
Feature activation+4.42e+00

Pos logit contributions
Jess+0.056
Albert+0.056
 Mole+0.056
 Ralph+0.056
Larry+0.055
Neg logit contributions
udd-0.051
auc-0.048
 about-0.045
wheel-0.043
orting-0.042
Token"
Token ID1
Feature activation+1.09e-01

Pos logit contributions
Albert+0.124
Larry+0.124
Jess+0.122
 Mole+0.121
Flash+0.119
Neg logit contributions
udd-0.123
auc-0.115
 about-0.114
wheel-0.101
orting-0.099
Token because
Token ID780
Feature activation+5.56e-01

Pos logit contributions
 Ralph+0.003
 Fox+0.003
 Mole+0.003
Albert+0.003
Mo+0.003
Neg logit contributions
auc-0.003
udd-0.003
oke-0.003
orting-0.003
aj-0.003
Token it
Token ID340
Feature activation+1.01e+00

Pos logit contributions
 Mole+0.016
Larry+0.016
 fingertips+0.015
Joy+0.015
Albert+0.015
Neg logit contributions
 about-0.016
auc-0.013
wheel-0.012
udd-0.012
iren-0.012
Token looked
Token ID3114
Feature activation+1.09e+00

Pos logit contributions
 Ralph+0.032
 Mole+0.032
Albert+0.031
 Lex+0.031
 Amelia+0.030
Neg logit contributions
auc-0.025
udd-0.025
 about-0.023
wheel-0.023
ep-0.021
Token like
Token ID588
Feature activation+1.02e+00

Pos logit contributions
 Ralph+0.031
 Mole+0.031
Albert+0.030
 Lex+0.030
 Fox+0.030
Neg logit contributions
auc-0.033
udd-0.032
wheel-0.029
iren-0.028
 about-0.027
Token,
Token ID11
Feature activation+1.14e+00

Pos logit contributions
 Ralph+0.042
 Mole+0.042
Albert+0.041
 Fox+0.040
 Lex+0.040
Neg logit contributions
auc-0.041
udd-0.040
wheel-0.036
 about-0.034
iren-0.033
Token rat
Token ID4227
Feature activation+2.27e+00

Pos logit contributions
Albert+0.029
 Ralph+0.028
 Mole+0.028
 Lex+0.028
Larry+0.027
Neg logit contributions
auc-0.037
udd-0.036
 about-0.035
wheel-0.034
 x-0.032
Token!
Token ID0
Feature activation+6.18e-01

Pos logit contributions
Albert+0.073
 Lex+0.071
 Ralph+0.071
 Amelia+0.070
Jess+0.070
Neg logit contributions
auc-0.059
udd-0.058
 about-0.055
wheel-0.051
ep-0.047
Token You
Token ID921
Feature activation+1.30e+00

Pos logit contributions
 Ralph+0.015
Albert+0.015
 Mole+0.015
 Lex+0.015
 Amelia+0.015
Neg logit contributions
auc-0.020
udd-0.019
wheel-0.018
 about-0.018
iren-0.017
Token are
Token ID389
Feature activation+8.41e-01

Pos logit contributions
Jess+0.039
 Ralph+0.038
 Lex+0.038
 Mole+0.038
Albert+0.038
Neg logit contributions
auc-0.034
udd-0.034
 about-0.033
wheel-0.033
 x-0.031
Token my
Token ID616
Feature activation+4.33e+00

Pos logit contributions
 Lex+0.029
Albert+0.029
 Ralph+0.029
Jess+0.029
 Mole+0.028
Neg logit contributions
 about-0.019
 x-0.017
wheel-0.016
auc-0.015
udd-0.015
Token lunch
Token ID9965
Feature activation+1.07e+00

Pos logit contributions
Larry+0.137
Albert+0.135
Jess+0.135
 Ralph+0.130
Buzz+0.129
Neg logit contributions
 about-0.108
 x-0.104
udd-0.100
 office-0.097
 ambulance-0.091
Token!"
Token ID2474
Feature activation-1.02e+00

Pos logit contributions
 Ralph+0.026
 Mole+0.026
Albert+0.025
 Lex+0.024
 Fox+0.024
Neg logit contributions
auc-0.036
udd-0.036
wheel-0.033
 about-0.032
iren-0.031
Token
Token ID198
Feature activation-1.89e-01

Pos logit contributions
auc+0.045
udd+0.044
wheel+0.040
eping+0.038
orting+0.037
Neg logit contributions
 Ralph-0.017
 Mole-0.016
 Lex-0.015
 Fox-0.015
 Amelia-0.015
Token
Token ID198
Feature activation+1.46e-01

Pos logit contributions
auc+0.008
udd+0.008
wheel+0.007
 about+0.007
orting+0.007
Neg logit contributions
 Ralph-0.003
Albert-0.003
 Mole-0.003
 Lex-0.003
Larry-0.003
TokenBut
Token ID1537
Feature activation-8.91e-01

Pos logit contributions
Albert+0.004
 Ralph+0.004
 Mole+0.004
 Lex+0.004
Larry+0.004
Neg logit contributions
auc-0.005
udd-0.004
wheel-0.004
 about-0.004
orting-0.004
Tokenummy
Token ID13513
Feature activation-3.49e-01

Pos logit contributions
 Mole+0.052
 Ralph+0.052
Albert+0.051
 Fox+0.051
Larry+0.050
Neg logit contributions
udd-0.054
auc-0.054
 about-0.048
wheel-0.045
iren-0.044
Token was
Token ID373
Feature activation-3.73e-01

Pos logit contributions
auc+0.010
udd+0.010
wheel+0.009
iren+0.009
 about+0.009
Neg logit contributions
 Ralph-0.010
 Mole-0.010
 Lex-0.010
Albert-0.010
 Fox-0.010
Token so
Token ID523
Feature activation+9.04e-01

Pos logit contributions
auc+0.013
udd+0.013
orting+0.011
wheel+0.011
iren+0.011
Neg logit contributions
 Lex-0.009
 Ralph-0.009
 Amelia-0.009
 Mole-0.009
Albert-0.009
Token proud
Token ID6613
Feature activation-6.14e-01

Pos logit contributions
 Ralph+0.027
 Lex+0.027
 Mole+0.027
Albert+0.026
 Amelia+0.026
Neg logit contributions
auc-0.027
udd-0.026
wheel-0.023
orting-0.022
iren-0.022
Token of
Token ID286
Feature activation+2.57e+00

Pos logit contributions
auc+0.014
udd+0.014
wheel+0.012
 about+0.011
iren+0.011
Neg logit contributions
 Ralph-0.022
 Mole-0.022
Albert-0.021
 Lex-0.021
 Amelia-0.021
Token Lex
Token ID17210
Feature activation+4.27e+00

Pos logit contributions
unny+0.060
Carl+0.058
 fingertips+0.056
 total+0.055
aire+0.054
Neg logit contributions
 about-0.087
auc-0.066
orting-0.064
udd-0.063
ock-0.060
Tokeni
Token ID72
Feature activation+6.82e-01

Pos logit contributions
 total+0.113
 Mole+0.113
 barely+0.111
+0.109
 hardly+0.105
Neg logit contributions
udd-0.116
 about-0.109
auc-0.100
ier-0.097
oke-0.096
Token.
Token ID13
Feature activation+2.47e+00

Pos logit contributions
 Ralph+0.023
 Mole+0.023
Albert+0.023
 Lex+0.023
 Fox+0.022
Neg logit contributions
auc-0.017
udd-0.016
wheel-0.014
 about-0.014
iren-0.013
Token She
Token ID1375
Feature activation-5.96e-01

Pos logit contributions
Albert+0.076
Larry+0.075
 Mole+0.073
Carl+0.072
Flash+0.072
Neg logit contributions
 about-0.061
udd-0.058
auc-0.055
 explains-0.046
iren-0.044
Token said
Token ID531
Feature activation+8.53e-01

Pos logit contributions
auc+0.018
udd+0.018
wheel+0.016
 about+0.015
iren+0.015
Neg logit contributions
 Ralph-0.017
 Mole-0.016
Albert-0.016
 Lex-0.016
 Amelia-0.016
Token with
Token ID351
Feature activation+1.31e+00

Pos logit contributions
 Ralph+0.024
 Mole+0.024
 Lex+0.023
 Amelia+0.023
 Buzz+0.023
Neg logit contributions
udd-0.027
auc-0.027
wheel-0.024
iren-0.023
ep-0.022
Token The
Token ID383
Feature activation+7.89e-01

Pos logit contributions
 Ralph+0.008
Albert+0.008
 Mole+0.008
 Lex+0.008
 Amelia+0.008
Neg logit contributions
auc-0.011
udd-0.011
 about-0.010
wheel-0.010
iren-0.010
Token frog
Token ID21264
Feature activation+7.47e-01

Pos logit contributions
 Lex+0.016
Jess+0.016
 Amelia+0.015
Albert+0.015
Lex+0.014
Neg logit contributions
udd-0.027
wheel-0.027
auc-0.027
 about-0.026
 noise-0.026
Token said
Token ID531
Feature activation+1.13e+00

Pos logit contributions
Albert+0.022
 Ralph+0.022
 Lex+0.022
 Amelia+0.022
 Mole+0.022
Neg logit contributions
udd-0.020
auc-0.019
 about-0.018
wheel-0.018
ep-0.017
Token "
Token ID366
Feature activation+2.25e+00

Pos logit contributions
 Ralph+0.032
 Mole+0.032
 Fox+0.031
 Buzz+0.030
 Lex+0.030
Neg logit contributions
udd-0.037
auc-0.036
wheel-0.032
iren-0.031
ep-0.029
TokenHi
Token ID17250
Feature activation+2.11e+00

Pos logit contributions
 Ralph+0.055
 Mole+0.054
 Amelia+0.053
 Albert+0.052
 Fox+0.052
Neg logit contributions
auc-0.074
udd-0.073
wheel-0.071
iren-0.063
ep-0.063
Token there
Token ID612
Feature activation+4.20e+00

Pos logit contributions
Albert+0.072
 Lex+0.070
Jess+0.070
 barely+0.069
Larry+0.069
Neg logit contributions
 about-0.047
udd-0.045
auc-0.037
wheel-0.034
orting-0.033
Token little
Token ID1310
Feature activation+1.60e+00

Pos logit contributions
Albert+0.131
 Mole+0.128
 Lex+0.126
 Ralph+0.126
Jess+0.124
Neg logit contributions
udd-0.111
 about-0.106
auc-0.106
wheel-0.096
orting-0.085
Token bunny
Token ID44915
Feature activation+3.14e+00

Pos logit contributions
Albert+0.026
 Lex+0.025
Jess+0.024
 Ralph+0.024
 Amelia+0.024
Neg logit contributions
auc-0.062
udd-0.062
 about-0.060
wheel-0.059
 x-0.055
Token!
Token ID0
Feature activation+1.20e+00

Pos logit contributions
Albert+0.102
 Lex+0.101
Lex+0.098
Jess+0.098
 Amelia+0.097
Neg logit contributions
 about-0.074
udd-0.074
auc-0.070
wheel-0.067
orting-0.059
Token What
Token ID1867
Feature activation+7.46e-01

Pos logit contributions
Albert+0.034
 Ralph+0.032
 Lex+0.032
Jess+0.032
Lex+0.032
Neg logit contributions
auc-0.035
 about-0.032
udd-0.032
wheel-0.031
oke-0.029
Token are
Token ID389
Feature activation+1.37e+00

Pos logit contributions
 Ralph+0.022
 Mole+0.022
Albert+0.022
 Lex+0.022
 Amelia+0.021
Neg logit contributions
auc-0.021
udd-0.021
wheel-0.019
 about-0.019
iren-0.017
Token to
Token ID284
Feature activation+4.04e-01

Pos logit contributions
Albert+0.053
 Lex+0.050
 Ralph+0.050
 Mole+0.050
 Amelia+0.050
Neg logit contributions
 about-0.033
auc-0.030
udd-0.030
wheel-0.026
 x-0.023
Token be
Token ID307
Feature activation+9.49e-02

Pos logit contributions
Albert+0.012
 Ralph+0.012
 Amelia+0.012
 Mole+0.012
 Lex+0.012
Neg logit contributions
auc-0.011
udd-0.010
wheel-0.010
 about-0.010
 x-0.009
Token the
Token ID262
Feature activation+1.39e+00

Pos logit contributions
 Ralph+0.003
Albert+0.003
 Mole+0.003
 Lex+0.003
 Amelia+0.003
Neg logit contributions
udd-0.003
auc-0.003
 about-0.002
wheel-0.002
ep-0.002
Token strongest
Token ID12841
Feature activation+1.43e+00

Pos logit contributions
 Ralph+0.041
 Lex+0.041
 Mole+0.040
 Amelia+0.040
Albert+0.040
Neg logit contributions
udd-0.038
auc-0.036
 about-0.034
wheel-0.034
 x-0.031
Token animal
Token ID5044
Feature activation+1.94e+00

Pos logit contributions
Albert+0.040
 Ralph+0.040
 Mole+0.040
 Lex+0.040
 Amelia+0.039
Neg logit contributions
udd-0.042
auc-0.041
 about-0.039
wheel-0.037
iren-0.034
Token in
Token ID287
Feature activation+4.20e+00

Pos logit contributions
Albert+0.067
 Lex+0.067
 Ralph+0.066
Lex+0.065
 Mole+0.065
Neg logit contributions
 about-0.044
auc-0.044
udd-0.042
wheel-0.039
ep-0.032
Token the
Token ID262
Feature activation+3.44e-01

Pos logit contributions
 fingertips+0.157
etsy+0.155
unny+0.152
 Lex+0.149
opsy+0.146
Neg logit contributions
 about-0.076
auc-0.056
 late-0.056
ep-0.054
 accident-0.053
Token world
Token ID995
Feature activation+1.35e+00

Pos logit contributions
 Ralph+0.011
 Lex+0.011
 Mole+0.010
 Amelia+0.010
Albert+0.010
Neg logit contributions
udd-0.010
auc-0.009
 about-0.008
wheel-0.008
ep-0.007
Token.
Token ID13
Feature activation+4.73e-01

Pos logit contributions
 Ralph+0.041
 Mole+0.041
Albert+0.040
 Lex+0.039
 Fox+0.039
Neg logit contributions
auc-0.038
udd-0.038
wheel-0.034
 about-0.033
iren-0.031
Token So
Token ID1406
Feature activation+2.51e-01

Pos logit contributions
Albert+0.016
 Lex+0.015
 fingertips+0.015
Lex+0.015
Jess+0.015
Neg logit contributions
 about-0.011
udd-0.010
auc-0.010
orting-0.008
oke-0.008
Token they
Token ID484
Feature activation+6.36e-01

Pos logit contributions
Albert+0.010
Pen+0.010
 Ralph+0.009
Larry+0.009
Jess+0.009
Neg logit contributions
 about-0.004
udd-0.003
auc-0.003
orting-0.003
ep-0.003
Token her
Token ID607
Feature activation-2.10e-01

Pos logit contributions
Albert+0.104
 Mole+0.102
 chicks+0.101
 Ralph+0.101
unny+0.100
Neg logit contributions
 about-0.065
auc-0.063
udd-0.060
 accident-0.059
wheel-0.058
Token ears
Token ID11368
Feature activation+1.05e+00

Pos logit contributions
auc+0.007
udd+0.006
wheel+0.006
 about+0.006
iren+0.006
Neg logit contributions
 Ralph-0.005
 fingertips-0.005
 Lex-0.005
 Mole-0.005
Albert-0.005
Token and
Token ID290
Feature activation+9.65e-01

Pos logit contributions
 Ralph+0.028
 Lex+0.027
 Amelia+0.027
 Fox+0.026
 Buzz+0.026
Neg logit contributions
auc-0.034
udd-0.030
 x-0.027
aj-0.027
iren-0.027
Token the
Token ID262
Feature activation-1.07e+00

Pos logit contributions
 Ralph+0.022
 Mole+0.022
 Lex+0.022
 Fox+0.021
 Amelia+0.021
Neg logit contributions
auc-0.035
udd-0.034
wheel-0.030
iren-0.030
orting-0.029
Token sun
Token ID4252
Feature activation+1.20e+00

Pos logit contributions
auc+0.033
udd+0.032
 about+0.030
wheel+0.028
orting+0.027
Neg logit contributions
 Ralph-0.029
Albert-0.029
 Lex-0.028
 Mole-0.028
 Fox-0.027
Token on
Token ID319
Feature activation+4.18e+00

Pos logit contributions
 Lex+0.031
 Ralph+0.030
 Amelia+0.030
 Flo+0.029
 Mole+0.029
Neg logit contributions
auc-0.039
udd-0.038
wheel-0.033
 about-0.032
iren-0.032
Token her
Token ID607
Feature activation-4.85e-01

Pos logit contributions
Albert+0.149
 Mole+0.141
 Albert+0.138
 Ralph+0.138
Larry+0.137
Neg logit contributions
auc-0.081
 about-0.077
udd-0.067
 x-0.064
wheel-0.061
Token back
Token ID736
Feature activation-5.81e-01

Pos logit contributions
auc+0.015
udd+0.014
wheel+0.012
 about+0.012
iren+0.012
Neg logit contributions
 Ralph-0.013
Albert-0.013
 fingertips-0.013
 Mole-0.013
 Lex-0.013
Token.
Token ID13
Feature activation+1.62e+00

Pos logit contributions
auc+0.019
udd+0.019
wheel+0.016
orting+0.015
arre+0.015
Neg logit contributions
 Ralph-0.014
 Amelia-0.014
 Lex-0.014
 Flo-0.013
 Mole-0.013
Token She
Token ID1375
Feature activation-5.32e-01

Pos logit contributions
Albert+0.052
Larry+0.049
 mole+0.049
unia+0.048
 Mole+0.047
Neg logit contributions
udd-0.041
 about-0.040
auc-0.040
wheel-0.033
iren-0.030
Token felt
Token ID2936
Feature activation-3.25e-01

Pos logit contributions
auc+0.018
udd+0.017
iren+0.015
orting+0.015
wheel+0.015
Neg logit contributions
 Ralph-0.013
Albert-0.013
 Lex-0.013
 Mole-0.013
 Amelia-0.012
Token She
Token ID1375
Feature activation+4.30e-03

Pos logit contributions
 Ralph+0.024
 Mole+0.024
Albert+0.024
 Lex+0.023
Larry+0.023
Neg logit contributions
auc-0.033
udd-0.033
 about-0.030
wheel-0.030
iren-0.028
Token put
Token ID1234
Feature activation+6.94e-01

Pos logit contributions
 Ralph+0.000
 Mole+0.000
Albert+0.000
 Lex+0.000
 Amelia+0.000
Neg logit contributions
auc-0.000
udd-0.000
wheel-0.000
 about-0.000
iren-0.000
Token some
Token ID617
Feature activation-2.28e-01

Pos logit contributions
 Ralph+0.019
 Mole+0.019
Larry+0.019
Albert+0.019
Jess+0.019
Neg logit contributions
auc-0.020
udd-0.019
 about-0.018
wheel-0.018
iren-0.017
Token lot
Token ID1256
Feature activation-5.05e-01

Pos logit contributions
auc+0.007
udd+0.007
wheel+0.006
 about+0.006
iren+0.005
Neg logit contributions
 Ralph-0.007
 Mole-0.007
Albert-0.007
 Lex-0.006
 Fox-0.006
Tokenion
Token ID295
Feature activation+1.26e+00

Pos logit contributions
auc+0.013
udd+0.013
wheel+0.011
 about+0.011
iren+0.010
Neg logit contributions
 Mole-0.017
 Ralph-0.017
Albert-0.016
 Lex-0.016
 Fox-0.016
Token on
Token ID319
Feature activation+4.10e+00

Pos logit contributions
 Ralph+0.040
 Mole+0.040
Albert+0.039
 Lex+0.039
 Amelia+0.038
Neg logit contributions
auc-0.034
udd-0.033
wheel-0.030
 about-0.029
iren-0.027
Token her
Token ID607
Feature activation+1.09e+00

Pos logit contributions
 Mole+0.119
Albert+0.118
Larry+0.117
 Ralph+0.115
 mole+0.115
Neg logit contributions
auc-0.104
 about-0.103
udd-0.099
wheel-0.096
 x-0.095
Token hands
Token ID2832
Feature activation+1.59e+00

Pos logit contributions
Larry+0.032
Albert+0.032
Jess+0.032
 Ralph+0.032
 Mole+0.031
Neg logit contributions
udd-0.029
 about-0.027
auc-0.026
 x-0.026
 coll-0.025
Token and
Token ID290
Feature activation+4.83e-01

Pos logit contributions
 Ralph+0.049
 Mole+0.049
 Fox+0.048
 Amelia+0.048
 Lex+0.047
Neg logit contributions
auc-0.044
udd-0.042
wheel-0.037
iren-0.036
ep-0.034
Token started
Token ID2067
Feature activation+1.33e+00

Pos logit contributions
 Ralph+0.013
 Mole+0.013
 Lex+0.013
 Fox+0.013
Albert+0.012
Neg logit contributions
auc-0.016
udd-0.015
wheel-0.014
 about-0.013
iren-0.013
Token to
Token ID284
Feature activation+4.62e-02

Pos logit contributions
 Ralph+0.038
 Mole+0.038
Albert+0.037
 Lex+0.037
 Fox+0.036
Neg logit contributions
auc-0.040
udd-0.040
wheel-0.036
 about-0.035
iren-0.033
Token name
Token ID1438
Feature activation+5.71e-01

Pos logit contributions
 Ralph+0.028
 Mole+0.028
Albert+0.027
 Lex+0.027
 Fox+0.027
Neg logit contributions
auc-0.028
udd-0.028
wheel-0.025
 about-0.024
iren-0.023
Token tag
Token ID7621
Feature activation+7.13e-01

Pos logit contributions
 Ralph+0.016
 Mole+0.016
 Fox+0.016
 Amelia+0.016
 Lex+0.015
Neg logit contributions
udd-0.019
auc-0.017
wheel-0.015
arre-0.014
aj-0.014
Token that
Token ID326
Feature activation-2.56e-01

Pos logit contributions
 Ralph+0.020
 Mole+0.019
 Fox+0.019
 Amelia+0.019
 Lex+0.018
Neg logit contributions
udd-0.023
auc-0.022
wheel-0.020
iren-0.018
arre-0.018
Token said
Token ID531
Feature activation+5.93e-01

Pos logit contributions
udd+0.007
auc+0.007
wheel+0.006
 about+0.006
orting+0.006
Neg logit contributions
 Ralph-0.008
 Mole-0.008
 Fox-0.008
 Lex-0.008
 Amelia-0.008
Token "
Token ID366
Feature activation+1.84e+00

Pos logit contributions
 Ralph+0.015
 Mole+0.015
 Fox+0.014
 Flo+0.014
 Buzz+0.014
Neg logit contributions
udd-0.021
auc-0.020
wheel-0.018
iren-0.017
orting-0.017
TokenBob
Token ID18861
Feature activation+4.07e+00

Pos logit contributions
 Mole+0.043
 Ralph+0.043
Albert+0.043
 Lex+0.042
 Fox+0.041
Neg logit contributions
auc-0.063
udd-0.063
 about-0.057
wheel-0.057
iren-0.053
Token".
Token ID1911
Feature activation+7.99e-01

Pos logit contributions
Albert+0.108
 Mole+0.107
 Ralph+0.106
Jess+0.104
 Lex+0.104
Neg logit contributions
udd-0.122
auc-0.121
 about-0.113
wheel-0.106
ep-0.103
Token
Token ID198
Feature activation+1.47e+00

Pos logit contributions
Albert+0.021
Mo+0.020
Larry+0.020
 Mole+0.019
 Ralph+0.019
Neg logit contributions
 about-0.023
auc-0.022
udd-0.020
wheel-0.020
iren-0.020
Token
Token ID198
Feature activation+1.79e+00

Pos logit contributions
 fingertips+0.043
 Mole+0.043
 mole+0.042
 Ralph+0.042
 chicks+0.042
Neg logit contributions
 about-0.036
auc-0.036
udd-0.036
iren-0.033
her-0.032
Token"
Token ID1
Feature activation+1.33e+00

Pos logit contributions
 fingertips+0.055
 Ralph+0.049
 Fox+0.048
 Mole+0.048
 Amelia+0.048
Neg logit contributions
auc-0.045
ep-0.043
udd-0.042
iren-0.042
her-0.041
TokenThis
Token ID1212
Feature activation+2.17e-01

Pos logit contributions
 Ralph+0.032
 Mole+0.032
 Fox+0.031
 Albert+0.030
 Amelia+0.030
Neg logit contributions
auc-0.042
wheel-0.039
udd-0.038
iren-0.036
ep-0.035
Token dared
Token ID28765
Feature activation+5.08e-01

Pos logit contributions
auc+0.008
udd+0.008
iren+0.007
wheel+0.007
ep+0.006
Neg logit contributions
 Ralph-0.006
 Mole-0.006
 Fox-0.005
 Lex-0.005
 Buzz-0.005
Token to
Token ID284
Feature activation-4.02e-01

Pos logit contributions
 Mole+0.013
 Ralph+0.013
Albert+0.013
 Lex+0.013
 Amelia+0.013
Neg logit contributions
auc-0.016
udd-0.016
wheel-0.014
 about-0.014
iren-0.013
Token join
Token ID4654
Feature activation-2.79e-01

Pos logit contributions
auc+0.014
udd+0.013
wheel+0.012
 about+0.012
iren+0.011
Neg logit contributions
 Ralph-0.010
 Mole-0.010
Albert-0.010
 Lex-0.010
 Fox-0.010
Token them
Token ID606
Feature activation+9.53e-01

Pos logit contributions
auc+0.009
udd+0.009
 about+0.008
wheel+0.008
iren+0.007
Neg logit contributions
 Mole-0.007
 Ralph-0.007
Albert-0.007
 Lex-0.007
 Amelia-0.007
Token<|endoftext|>
Token ID50256
Feature activation-5.48e-01

Pos logit contributions
 Ralph+0.028
 Mole+0.028
Albert+0.027
 Fox+0.027
 Amelia+0.027
Neg logit contributions
auc-0.028
udd-0.027
wheel-0.024
iren-0.023
 about-0.023
TokenAb
Token ID4826
Feature activation+4.07e+00

Pos logit contributions
auc+0.018
udd+0.018
 about+0.017
wheel+0.016
iren+0.016
Neg logit contributions
 Lex-0.013
Albert-0.013
 Mole-0.013
 Amelia-0.013
 Ralph-0.013
Tokenby
Token ID1525
Feature activation+1.30e+00

Pos logit contributions
 Mole+0.106
 Buzz+0.099
 Fox+0.099
Albert+0.098
 Amelia+0.097
Neg logit contributions
udd-0.141
auc-0.129
ep-0.117
 about-0.115
aj-0.112
Token and
Token ID290
Feature activation+4.32e-01

Pos logit contributions
Albert+0.052
 Mole+0.051
 Ralph+0.051
 Lex+0.051
 Amelia+0.050
Neg logit contributions
udd-0.024
auc-0.022
 about-0.021
wheel-0.019
ep-0.016
Token Tommy
Token ID19919
Feature activation+1.19e+00

Pos logit contributions
 Ralph+0.011
 Mole+0.011
 Lex+0.011
Albert+0.011
 Fox+0.010
Neg logit contributions
auc-0.015
udd-0.015
wheel-0.013
 about-0.013
iren-0.012
Token were
Token ID547
Feature activation-1.62e+00

Pos logit contributions
Albert+0.029
 Ralph+0.028
 Mole+0.028
 Lex+0.027
 Fox+0.027
Neg logit contributions
udd-0.041
auc-0.040
 about-0.037
wheel-0.037
iren-0.034
Token twins
Token ID20345
Feature activation-5.66e-01

Pos logit contributions
auc+0.049
udd+0.047
wheel+0.045
iren+0.042
orting+0.040
Neg logit contributions
 Ralph-0.046
 Mole-0.045
 Lex-0.043
Albert-0.043
 Amelia-0.043
Token
Token ID198
Feature activation+3.14e-01

Pos logit contributions
 Lex+0.002
 mole+0.002
 Mole+0.002
 Ralph+0.002
 Fox+0.002
Neg logit contributions
udd-0.002
auc-0.002
 about-0.002
wheel-0.002
iren-0.002
TokenThe
Token ID464
Feature activation-1.50e-01

Pos logit contributions
 Ralph+0.009
 Mole+0.008
 Lex+0.008
 Fox+0.008
 mole+0.008
Neg logit contributions
auc-0.009
udd-0.009
wheel-0.009
 about-0.008
ep-0.008
Token end
Token ID886
Feature activation-1.66e+00

Pos logit contributions
auc+0.005
udd+0.005
wheel+0.004
 about+0.004
iren+0.004
Neg logit contributions
 Ralph-0.004
 Mole-0.004
Albert-0.004
 Lex-0.004
 Amelia-0.004
Token.
Token ID13
Feature activation-4.71e-01

Pos logit contributions
auc+0.049
udd+0.049
wheel+0.044
 about+0.042
iren+0.040
Neg logit contributions
 Ralph-0.048
 Mole-0.048
Albert-0.047
 Lex-0.046
 Fox-0.046
Token<|endoftext|>
Token ID50256
Feature activation-2.37e-01

Pos logit contributions
 about+0.012
auc+0.012
udd+0.011
wheel+0.011
oke+0.010
Neg logit contributions
Albert-0.014
 mole-0.014
 chicks-0.013
 Ralph-0.013
 Lex-0.013
TokenLu
Token ID25596
Feature activation+4.05e+00

Pos logit contributions
auc+0.007
udd+0.007
 about+0.007
wheel+0.007
iren+0.006
Neg logit contributions
 Lex-0.006
 Mole-0.006
 Ralph-0.006
Albert-0.006
 Amelia-0.006
Tokency
Token ID948
Feature activation+1.56e+00

Pos logit contributions
 Mole+0.084
 Ralph+0.079
 Fox+0.079
 mole+0.077
Larry+0.075
Neg logit contributions
udd-0.150
auc-0.138
 about-0.133
ep-0.123
ru-0.123
Token was
Token ID373
Feature activation-9.40e-01

Pos logit contributions
 Ralph+0.055
 Mole+0.055
 Lex+0.054
Albert+0.054
 Amelia+0.053
Neg logit contributions
auc-0.036
udd-0.035
wheel-0.031
 about-0.029
iren-0.028
Token very
Token ID845
Feature activation+5.38e-01

Pos logit contributions
auc+0.032
udd+0.029
wheel+0.027
iren+0.027
orting+0.026
Neg logit contributions
 Ralph-0.024
 Lex-0.024
 Mole-0.024
 Buzz-0.023
Albert-0.023
Token excited
Token ID6568
Feature activation-3.19e-01

Pos logit contributions
 Mole+0.016
 Ralph+0.016
Albert+0.015
 Lex+0.015
 Amelia+0.015
Neg logit contributions
auc-0.016
udd-0.015
wheel-0.014
 about-0.013
iren-0.013
Token to
Token ID284
Feature activation+4.98e-01

Pos logit contributions
auc+0.012
udd+0.011
wheel+0.010
iren+0.010
ep+0.010
Neg logit contributions
 Amelia-0.007
 Ralph-0.007
 Jessie-0.007
 Fox-0.007
 Mole-0.007
Token.
Token ID13
Feature activation+5.88e-01

Pos logit contributions
 Ralph+0.007
 Mole+0.007
Albert+0.007
 Fox+0.007
 Amelia+0.007
Neg logit contributions
auc-0.007
udd-0.007
wheel-0.006
iren-0.005
 x-0.005
Token We
Token ID775
Feature activation+1.01e+00

Pos logit contributions
Albert+0.016
 Ralph+0.016
 Lex+0.016
Larry+0.015
 Mole+0.015
Neg logit contributions
auc-0.017
udd-0.016
 about-0.016
wheel-0.015
iren-0.014
Token don
Token ID836
Feature activation+1.19e+00

Pos logit contributions
Jess+0.027
 Amelia+0.027
 Lex+0.027
Albert+0.027
Pen+0.026
Neg logit contributions
udd-0.030
auc-0.029
wheel-0.029
 about-0.029
ep-0.025
Token't
Token ID470
Feature activation-4.84e-01

Pos logit contributions
 Ralph+0.021
 Mole+0.021
 Lex+0.021
Albert+0.021
 Amelia+0.020
Neg logit contributions
auc-0.047
udd-0.047
wheel-0.046
 about-0.045
iren-0.043
Token want
Token ID765
Feature activation+2.02e+00

Pos logit contributions
udd+0.013
auc+0.013
wheel+0.012
 about+0.012
iren+0.010
Neg logit contributions
 Ralph-0.015
Albert-0.015
 Mole-0.015
 Amelia-0.015
 Lex-0.014
Token Sunshine
Token ID32210
Feature activation+4.05e+00

Pos logit contributions
Albert+0.078
Jess+0.075
Larry+0.074
opsy+0.073
 Mole+0.072
Neg logit contributions
 about-0.041
udd-0.031
auc-0.028
wheel-0.028
 x-0.026
Token to
Token ID284
Feature activation+1.33e+00

Pos logit contributions
Albert+0.138
 Ralph+0.135
 Lex+0.134
 Mole+0.134
Jess+0.133
Neg logit contributions
udd-0.095
 about-0.095
auc-0.086
wheel-0.081
orting-0.070
Token hurt
Token ID5938
Feature activation+1.91e+00

Pos logit contributions
 Ralph+0.038
Albert+0.038
 Mole+0.038
 Lex+0.037
 Amelia+0.037
Neg logit contributions
udd-0.038
auc-0.037
wheel-0.036
 about-0.035
iren-0.032
Token herself
Token ID5223
Feature activation+1.44e+00

Pos logit contributions
Albert+0.052
Larry+0.050
 Mole+0.050
opsy+0.050
 Ralph+0.050
Neg logit contributions
 about-0.051
auc-0.045
udd-0.045
 noise-0.041
 x-0.041
Token."
Token ID526
Feature activation+1.01e+00

Pos logit contributions
 Mole+0.035
 Ralph+0.035
Albert+0.034
 Lex+0.033
 Fox+0.033
Neg logit contributions
auc-0.049
udd-0.048
wheel-0.044
iren-0.042
 about-0.041
Token She
Token ID1375
Feature activation+9.63e-02

Pos logit contributions
 total+0.030
 fingertips+0.029
ifying+0.029
Larry+0.028
 Mole+0.028
Neg logit contributions
 about-0.025
udd-0.020
 explains-0.020
orting-0.019
 writes-0.019
Token She
Token ID1375
Feature activation+7.41e-02

Pos logit contributions
Albert+0.029
Larry+0.028
 Mole+0.028
 Ralph+0.028
Jess+0.027
Neg logit contributions
auc-0.033
udd-0.032
 about-0.030
wheel-0.029
iren-0.027
Token put
Token ID1234
Feature activation+6.60e-01

Pos logit contributions
 Ralph+0.002
 Mole+0.002
Albert+0.002
 Lex+0.002
 Amelia+0.002
Neg logit contributions
auc-0.002
udd-0.002
wheel-0.002
 about-0.002
iren-0.002
Token some
Token ID617
Feature activation-1.93e-02

Pos logit contributions
 Mole+0.017
 Ralph+0.017
Albert+0.017
Larry+0.017
Jess+0.017
Neg logit contributions
auc-0.020
udd-0.019
 about-0.018
wheel-0.018
iren-0.017
Token lot
Token ID1256
Feature activation-3.44e-01

Pos logit contributions
auc+0.001
udd+0.001
wheel+0.000
 about+0.000
iren+0.000
Neg logit contributions
 Ralph-0.001
 Mole-0.001
Albert-0.001
 Lex-0.001
 Fox-0.001
Tokenion
Token ID295
Feature activation+1.48e+00

Pos logit contributions
auc+0.008
udd+0.008
wheel+0.007
 about+0.007
iren+0.007
Neg logit contributions
 Mole-0.012
 Ralph-0.012
Albert-0.011
 Lex-0.011
 Fox-0.011
Token on
Token ID319
Feature activation+4.05e+00

Pos logit contributions
 Ralph+0.047
 Mole+0.047
Albert+0.047
 Lex+0.046
 Amelia+0.045
Neg logit contributions
auc-0.039
udd-0.039
wheel-0.034
 about-0.034
iren-0.031
Token her
Token ID607
Feature activation+9.05e-01

Pos logit contributions
 Mole+0.114
Albert+0.113
Larry+0.112
 mole+0.109
 Ralph+0.109
Neg logit contributions
 about-0.102
auc-0.102
udd-0.097
wheel-0.096
 x-0.095
Token hands
Token ID2832
Feature activation+1.72e+00

Pos logit contributions
Albert+0.026
 Ralph+0.026
Larry+0.026
 Mole+0.026
Jess+0.025
Neg logit contributions
udd-0.025
 about-0.023
auc-0.023
wheel-0.022
 x-0.022
Token and
Token ID290
Feature activation+6.37e-01

Pos logit contributions
 Ralph+0.056
 Mole+0.055
 Lex+0.054
 Fox+0.054
 Amelia+0.054
Neg logit contributions
auc-0.046
udd-0.044
wheel-0.038
iren-0.036
 x-0.035
Token started
Token ID2067
Feature activation+1.45e+00

Pos logit contributions
 Ralph+0.018
 Mole+0.017
 Lex+0.017
Albert+0.017
 Fox+0.017
Neg logit contributions
auc-0.020
udd-0.020
wheel-0.018
 about-0.017
iren-0.017
Token to
Token ID284
Feature activation+2.11e-01

Pos logit contributions
 Ralph+0.041
 Mole+0.041
Albert+0.040
 Lex+0.040
 Fox+0.039
Neg logit contributions
auc-0.043
udd-0.043
wheel-0.039
 about-0.038
iren-0.036
Token said
Token ID531
Feature activation+6.96e-01

Pos logit contributions
 Ralph+0.015
 Mole+0.015
 Lex+0.015
Albert+0.014
 Fox+0.014
Neg logit contributions
auc-0.010
udd-0.010
wheel-0.009
 about-0.008
iren-0.008
Token,
Token ID11
Feature activation+1.17e+00

Pos logit contributions
 Ralph+0.022
 Mole+0.022
Albert+0.021
 Lex+0.021
 Fox+0.021
Neg logit contributions
auc-0.019
udd-0.019
wheel-0.017
 about-0.016
iren-0.016
Token "
Token ID366
Feature activation+7.17e-01

Pos logit contributions
 Ralph+0.037
Albert+0.037
 Mole+0.037
 Lex+0.036
Larry+0.036
Neg logit contributions
auc-0.029
udd-0.029
 about-0.027
wheel-0.026
iren-0.023
TokenThank
Token ID10449
Feature activation+1.18e+00

Pos logit contributions
 Mole+0.021
 Ralph+0.020
 Fox+0.020
 Albert+0.020
 Buzz+0.019
Neg logit contributions
udd-0.020
wheel-0.019
auc-0.019
ock-0.016
orting-0.016
Token you
Token ID345
Feature activation+1.87e+00

Pos logit contributions
 Lex+0.036
Albert+0.036
 Ralph+0.036
field+0.036
 barely+0.035
Neg logit contributions
auc-0.028
udd-0.026
orting-0.025
 about-0.024
wheel-0.024
Token Stella
Token ID45856
Feature activation+4.05e+00

Pos logit contributions
Albert+0.066
 Ralph+0.066
 Lex+0.065
 Mole+0.064
Larry+0.064
Neg logit contributions
udd-0.042
auc-0.042
 about-0.039
wheel-0.036
iren-0.032
Token!
Token ID0
Feature activation+1.07e+00

Pos logit contributions
Albert+0.176
Larry+0.172
Flash+0.168
 Ralph+0.168
Jess+0.166
Neg logit contributions
 about-0.063
udd-0.060
auc-0.040
er-0.035
oke-0.033
Token What
Token ID1867
Feature activation-3.27e-01

Pos logit contributions
Albert+0.030
 Ralph+0.029
Larry+0.029
Jess+0.028
 Mole+0.028
Neg logit contributions
auc-0.031
udd-0.030
 about-0.028
wheel-0.027
orting-0.025
Token should
Token ID815
Feature activation+1.89e+00

Pos logit contributions
auc+0.010
udd+0.010
wheel+0.008
iren+0.008
orting+0.008
Neg logit contributions
 Mole-0.009
 Ralph-0.009
Albert-0.009
 Fox-0.009
 Amelia-0.009
Token we
Token ID356
Feature activation-1.47e+00

Pos logit contributions
Albert+0.051
 Ralph+0.049
Larry+0.048
Mo+0.048
 Lex+0.048
Neg logit contributions
auc-0.057
 about-0.053
wheel-0.050
udd-0.049
 x-0.049
Token do
Token ID466
Feature activation-7.76e-01

Pos logit contributions
auc+0.033
udd+0.033
wheel+0.030
 about+0.029
 x+0.025
Neg logit contributions
 Ralph-0.053
 Lex-0.052
 Mole-0.052
Albert-0.052
 Amelia-0.051
Token breeze
Token ID28633
Feature activation+1.82e-01

Pos logit contributions
auc+0.021
udd+0.021
wheel+0.019
 about+0.018
iren+0.018
Neg logit contributions
 Ralph-0.015
 Lex-0.015
 Mole-0.015
 Amelia-0.015
Albert-0.015
Token rust
Token ID17000
Feature activation-1.01e-01

Pos logit contributions
 Lex+0.004
 Flo+0.004
 Amelia+0.004
 Ralph+0.004
 Buzz+0.004
Neg logit contributions
auc-0.006
udd-0.006
 x-0.005
iren-0.005
wheel-0.005
Tokenle
Token ID293
Feature activation+1.44e+00

Pos logit contributions
auc+0.003
udd+0.003
wheel+0.003
 about+0.002
iren+0.002
Neg logit contributions
 Ralph-0.003
 Lex-0.003
 Mole-0.003
Albert-0.003
 Flo-0.003
Token the
Token ID262
Feature activation+9.98e-03

Pos logit contributions
 Ralph+0.043
 Mole+0.043
Albert+0.043
 Lex+0.042
 Amelia+0.041
Neg logit contributions
auc-0.040
udd-0.040
wheel-0.036
 about-0.035
iren-0.033
Token leaves
Token ID5667
Feature activation+3.85e-01

Pos logit contributions
 Ralph+0.000
 Mole+0.000
Albert+0.000
 Lex+0.000
 Amelia+0.000
Neg logit contributions
auc-0.000
udd-0.000
wheel-0.000
 about-0.000
iren-0.000
Token on
Token ID319
Feature activation+4.04e+00

Pos logit contributions
 Ralph+0.009
 Flo+0.009
 Lex+0.009
 Mole+0.009
Albert+0.009
Neg logit contributions
auc-0.013
udd-0.013
 x-0.011
iren-0.011
wheel-0.011
Token the
Token ID262
Feature activation-4.39e-01

Pos logit contributions
unny+0.145
Albert+0.143
 Mole+0.136
Flash+0.136
Mo+0.134
Neg logit contributions
 about-0.079
auc-0.069
her-0.059
 noise-0.059
 louder-0.059
Token trees
Token ID7150
Feature activation+1.13e+00

Pos logit contributions
auc+0.014
udd+0.014
wheel+0.013
 about+0.012
iren+0.012
Neg logit contributions
 Ralph-0.012
 Mole-0.012
Albert-0.011
 Lex-0.011
 Fox-0.011
Token.
Token ID13
Feature activation+2.02e+00

Pos logit contributions
 Ralph+0.032
 Mole+0.032
 Lex+0.031
Albert+0.031
 Amelia+0.031
Neg logit contributions
auc-0.034
udd-0.033
wheel-0.030
 about-0.029
iren-0.028
Token
Token ID220
Feature activation+9.82e-01

Pos logit contributions
Albert+0.061
 mole+0.059
Mo+0.058
 Fox+0.057
Flash+0.057
Neg logit contributions
 about-0.056
auc-0.050
udd-0.050
 explains-0.038
oke-0.038
Token\n
Token ID198
Feature activation+2.37e+00

Pos logit contributions
 Mole+0.023
 Ralph+0.023
Albert+0.023
 Lex+0.022
 Fox+0.022
Neg logit contributions
udd-0.034
auc-0.034
 about-0.031
wheel-0.030
iren-0.029
TokenM
Token ID44
Feature activation+2.06e+00

Pos logit contributions
 Ralph+0.019
 Lex+0.019
 Amelia+0.019
 Mole+0.019
 Jessie+0.018
Neg logit contributions
udd-0.022
auc-0.022
wheel-0.020
 about-0.019
ep-0.019
Tokenopsy
Token ID44522
Feature activation+1.03e-01

Pos logit contributions
 Ralph+0.042
Albert+0.042
 Amelia+0.042
 Mole+0.042
 Fox+0.041
Neg logit contributions
udd-0.077
auc-0.075
 about-0.068
wheel-0.067
orting-0.065
Token was
Token ID373
Feature activation-5.77e-02

Pos logit contributions
 Ralph+0.003
 Mole+0.003
Albert+0.003
 Lex+0.003
 Fox+0.003
Neg logit contributions
auc-0.003
udd-0.003
wheel-0.003
 about-0.003
iren-0.002
Token happy
Token ID3772
Feature activation-4.70e-01

Pos logit contributions
auc+0.002
udd+0.002
wheel+0.002
iren+0.002
orting+0.002
Neg logit contributions
 Ralph-0.002
 Mole-0.002
 Lex-0.002
Albert-0.002
 Amelia-0.001
Token for
Token ID329
Feature activation+1.62e-01

Pos logit contributions
auc+0.012
udd+0.012
wheel+0.011
 about+0.010
iren+0.010
Neg logit contributions
 Ralph-0.015
 Mole-0.015
Albert-0.015
 Lex-0.015
 Fox-0.015
Token Fl
Token ID1610
Feature activation+4.03e+00

Pos logit contributions
 Ralph+0.004
 Mole+0.004
Albert+0.004
 Lex+0.004
 Amelia+0.004
Neg logit contributions
auc-0.006
udd-0.005
wheel-0.005
 about-0.005
iren-0.005
Tokenopsy
Token ID44522
Feature activation+1.36e+00

Pos logit contributions
 Amelia+0.068
 Jessie+0.064
 Albert+0.063
Albert+0.061
Flash+0.057
Neg logit contributions
udd-0.179
auc-0.155
orting-0.146
ru-0.145
ouncing-0.145
Token and
Token ID290
Feature activation+3.11e-01

Pos logit contributions
Albert+0.052
 Ralph+0.052
 Mole+0.052
 Lex+0.051
 Amelia+0.050
Neg logit contributions
udd-0.026
auc-0.025
 about-0.024
wheel-0.023
orting-0.019
Token said
Token ID531
Feature activation+2.63e-01

Pos logit contributions
 Ralph+0.008
 Mole+0.008
 Lex+0.008
Albert+0.008
 Fox+0.008
Neg logit contributions
auc-0.010
udd-0.010
wheel-0.009
iren-0.009
 about-0.009
Token,
Token ID11
Feature activation+7.50e-01

Pos logit contributions
 Ralph+0.008
 Mole+0.008
Albert+0.008
 Lex+0.008
 Fox+0.008
Neg logit contributions
auc-0.008
udd-0.008
wheel-0.007
 about-0.006
iren-0.006
Token
Token ID6184
Feature activation+1.48e+00

Pos logit contributions
 Ralph+0.022
 Mole+0.022
Albert+0.021
 Lex+0.021
 Fox+0.021
Neg logit contributions
auc-0.022
udd-0.022
wheel-0.020
 about-0.019
iren-0.018
Token said
Token ID531
Feature activation+8.01e-01

Pos logit contributions
udd+0.010
auc+0.010
orting+0.008
wheel+0.008
iren+0.008
Neg logit contributions
 Amelia-0.010
 Mole-0.010
 Ralph-0.010
 Fox-0.010
 Lex-0.010
Token "
Token ID366
Feature activation+1.47e+00

Pos logit contributions
 Amelia+0.019
 Ralph+0.019
 Mole+0.019
 Lex+0.019
 Fox+0.019
Neg logit contributions
udd-0.029
auc-0.028
wheel-0.024
iren-0.023
orting-0.023
TokenNo
Token ID2949
Feature activation+2.27e+00

Pos logit contributions
 Mole+0.035
 Ralph+0.035
Albert+0.035
 Lex+0.035
 Amelia+0.034
Neg logit contributions
auc-0.050
udd-0.050
 about-0.045
wheel-0.045
iren-0.042
Token juice
Token ID13135
Feature activation+2.16e+00

Pos logit contributions
 Mole+0.069
 Ralph+0.069
Albert+0.069
 Lex+0.067
Larry+0.066
Neg logit contributions
udd-0.063
auc-0.063
 about-0.056
wheel-0.056
iren-0.051
Token allowed
Token ID3142
Feature activation+3.11e+00

Pos logit contributions
 Mole+0.059
 Ralph+0.059
Albert+0.057
 Lex+0.057
 Fox+0.056
Neg logit contributions
auc-0.066
udd-0.065
wheel-0.059
 about-0.057
iren-0.054
Token inside
Token ID2641
Feature activation+4.00e+00

Pos logit contributions
Albert+0.084
 Ralph+0.082
 Mole+0.080
Larry+0.080
 Lex+0.079
Neg logit contributions
auc-0.098
udd-0.095
 about-0.094
wheel-0.089
iren-0.083
Token".
Token ID1911
Feature activation-2.90e-01

Pos logit contributions
Albert+0.103
 Mole+0.102
 Ralph+0.102
Larry+0.100
 Lex+0.098
Neg logit contributions
auc-0.130
udd-0.128
 about-0.125
wheel-0.115
iren-0.108
Token She
Token ID1375
Feature activation+8.65e-01

Pos logit contributions
auc+0.011
udd+0.011
wheel+0.010
 about+0.009
iren+0.009
Neg logit contributions
 Mole-0.006
 Ralph-0.006
 Amelia-0.006
 Lex-0.006
 Fox-0.006
Token was
Token ID373
Feature activation+5.15e-01

Pos logit contributions
 Ralph+0.025
 Mole+0.025
Albert+0.024
 Lex+0.024
Larry+0.023
Neg logit contributions
udd-0.025
auc-0.025
 about-0.023
wheel-0.023
ep-0.022
Token sad
Token ID6507
Feature activation-2.78e-01

Pos logit contributions
 Mole+0.014
 Ralph+0.013
 Lex+0.013
 Amelia+0.013
 Fox+0.013
Neg logit contributions
auc-0.017
udd-0.017
orting-0.015
wheel-0.015
iren-0.014
Token because
Token ID780
Feature activation+3.31e-01

Pos logit contributions
auc+0.010
wheel+0.009
udd+0.009
ep+0.009
iren+0.008
Neg logit contributions
 Ralph-0.007
 Mole-0.007
 Fox-0.006
 Amelia-0.006
 Lex-0.006
Token Mrs
Token ID9074
Feature activation+8.68e-01

Pos logit contributions
udd+0.002
auc+0.002
 about+0.002
wheel+0.002
iren+0.002
Neg logit contributions
Albert-0.002
 Mole-0.002
 Ralph-0.002
Larry-0.002
 Lex-0.002
Token Berry
Token ID20165
Feature activation+6.67e-02

Pos logit contributions
Albert+0.027
 Ralph+0.026
Larry+0.026
 Mole+0.026
Jess+0.026
Neg logit contributions
auc-0.022
udd-0.022
 about-0.021
wheel-0.018
orting-0.018
Token said
Token ID531
Feature activation+2.58e-01

Pos logit contributions
Albert+0.002
 Ralph+0.002
 Mole+0.002
 Lex+0.002
 Amelia+0.002
Neg logit contributions
udd-0.002
auc-0.002
 about-0.001
wheel-0.001
ru-0.001
Token "
Token ID366
Feature activation+1.93e+00

Pos logit contributions
 Mole+0.007
 Ralph+0.007
 Amelia+0.007
 Lex+0.007
 Fox+0.007
Neg logit contributions
udd-0.008
auc-0.008
wheel-0.007
iren-0.007
ep-0.007
TokenOf
Token ID5189
Feature activation+2.50e+00

Pos logit contributions
 Ralph+0.045
 Mole+0.044
 Fox+0.043
Albert+0.042
 Lex+0.042
Neg logit contributions
auc-0.067
udd-0.066
wheel-0.064
 about-0.059
iren-0.057
Token course
Token ID1781
Feature activation+4.00e+00

Pos logit contributions
Albert+0.085
 Ralph+0.085
 Mole+0.085
 Albert+0.082
Larry+0.081
Neg logit contributions
auc-0.053
udd-0.051
 about-0.050
iren-0.048
wheel-0.047
Token,
Token ID11
Feature activation+8.84e-01

Pos logit contributions
Albert+0.140
Larry+0.133
 Ralph+0.132
 Mole+0.132
Mo+0.131
Neg logit contributions
 about-0.093
udd-0.086
auc-0.082
wheel-0.080
oke-0.072
Token Lucy
Token ID22162
Feature activation+3.18e+00

Pos logit contributions
Albert+0.023
 Mole+0.023
 Ralph+0.022
Larry+0.022
 Lex+0.022
Neg logit contributions
auc-0.027
udd-0.027
 about-0.027
wheel-0.025
orting-0.023
Token."
Token ID526
Feature activation+6.98e-02

Pos logit contributions
Albert+0.100
 Ralph+0.098
 Mole+0.098
Larry+0.096
 Lex+0.094
Neg logit contributions
udd-0.081
auc-0.080
 about-0.078
wheel-0.070
 office-0.063
Token and
Token ID290
Feature activation+7.36e-01

Pos logit contributions
Albert+0.002
Larry+0.002
 Ralph+0.002
 Mole+0.002
 mole+0.002
Neg logit contributions
 about-0.002
udd-0.002
auc-0.002
wheel-0.002
iren-0.002
Token gave
Token ID2921
Feature activation-5.06e-01

Pos logit contributions
Albert+0.018
 Ralph+0.017
Larry+0.017
 Mole+0.017
Mo+0.016
Neg logit contributions
 about-0.023
auc-0.022
udd-0.022
wheel-0.021
 x-0.019
Token We
Token ID775
Feature activation+1.18e-01

Pos logit contributions
udd+0.011
auc+0.010
wheel+0.009
 about+0.009
iren+0.009
Neg logit contributions
 Mole-0.008
 Ralph-0.008
Albert-0.008
 Lex-0.008
 Fox-0.008
Token can
Token ID460
Feature activation-5.01e-01

Pos logit contributions
 Ralph+0.003
 Mole+0.003
 Lex+0.003
Albert+0.003
 Amelia+0.003
Neg logit contributions
auc-0.004
udd-0.004
wheel-0.003
 about-0.003
ep-0.003
Token make
Token ID787
Feature activation-1.85e+00

Pos logit contributions
auc+0.015
udd+0.015
wheel+0.013
 about+0.013
iren+0.012
Neg logit contributions
 Ralph-0.014
 Mole-0.014
Albert-0.014
 Lex-0.014
 Fox-0.014
Token it
Token ID340
Feature activation-1.13e+00

Pos logit contributions
auc+0.067
udd+0.065
wheel+0.061
iren+0.059
orting+0.056
Neg logit contributions
 Ralph-0.042
 Amelia-0.042
Albert-0.041
 Albert-0.041
 Buzz-0.040
Token look
Token ID804
Feature activation-1.50e+00

Pos logit contributions
auc+0.038
udd+0.036
iren+0.035
oke+0.033
orting+0.032
Neg logit contributions
 Buzz-0.027
 Flo-0.026
Albert-0.025
 Albert-0.025
 Amelia-0.025
Token even
Token ID772
Feature activation-5.22e+00

Pos logit contributions
auc+0.052
iren+0.047
udd+0.046
wheel+0.044
orting+0.042
Neg logit contributions
 Buzz-0.035
Albert-0.035
 Ralph-0.035
 Amelia-0.035
 Albert-0.034
Token more
Token ID517
Feature activation-2.79e+00

Pos logit contributions
auc+0.193
iren+0.173
udd+0.172
board+0.164
ouncing+0.159
Neg logit contributions
Albert-0.100
 Buzz-0.100
 might-0.099
 Albert-0.095
 Fox-0.094
Token special
Token ID2041
Feature activation-2.12e+00

Pos logit contributions
auc+0.101
udd+0.090
wheel+0.086
iren+0.086
orting+0.080
Neg logit contributions
Albert-0.065
 Ralph-0.063
 Albert-0.063
 Lex-0.061
 Buzz-0.061
Token,"
Token ID553
Feature activation-1.39e+00

Pos logit contributions
auc+0.074
udd+0.073
wheel+0.068
iren+0.065
arre+0.061
Neg logit contributions
 Ralph-0.047
 Fox-0.046
 Albert-0.046
 Mole-0.046
Albert-0.046
Token Lily
Token ID20037
Feature activation-1.66e+00

Pos logit contributions
udd+0.045
auc+0.044
wheel+0.040
 about+0.038
iren+0.037
Neg logit contributions
 Mole-0.038
 Ralph-0.038
 Buzz-0.037
 Lex-0.036
 Fox-0.036
Token said
Token ID531
Feature activation-1.61e+00

Pos logit contributions
auc+0.054
udd+0.053
orting+0.047
iren+0.047
ooting+0.046
Neg logit contributions
 Fox-0.042
 Mole-0.041
 Ralph-0.041
 Buzz-0.041
 Amelia-0.041
Token end
Token ID886
Feature activation-2.21e+00

Pos logit contributions
auc+0.043
udd+0.042
wheel+0.039
 about+0.038
iren+0.037
Neg logit contributions
 Mole-0.026
 Ralph-0.025
 Fox-0.025
Albert-0.024
 Lex-0.024
Token,
Token ID11
Feature activation-1.77e+00

Pos logit contributions
udd+0.060
auc+0.060
wheel+0.056
ep+0.051
iren+0.049
Neg logit contributions
 Mole-0.069
 Ralph-0.066
 Buzz-0.065
 Fox-0.065
 Flo-0.064
Token Lily
Token ID20037
Feature activation-6.69e-01

Pos logit contributions
auc+0.050
udd+0.049
wheel+0.044
 about+0.043
iren+0.041
Neg logit contributions
 Mole-0.054
 Ralph-0.054
Albert-0.052
 Lex-0.052
 Fox-0.051
Token received
Token ID2722
Feature activation-1.94e+00

Pos logit contributions
auc+0.020
udd+0.019
wheel+0.018
 about+0.017
iren+0.017
Neg logit contributions
 Ralph-0.019
 Mole-0.019
Albert-0.018
 Fox-0.018
 Lex-0.018
Token something
Token ID1223
Feature activation-1.20e+00

Pos logit contributions
auc+0.068
udd+0.068
oke+0.060
wheel+0.059
iren+0.057
Neg logit contributions
 Mole-0.048
 Ralph-0.046
 Fox-0.045
 Buzz-0.044
 Lex-0.044
Token even
Token ID772
Feature activation-5.21e+00

Pos logit contributions
auc+0.038
udd+0.038
iren+0.035
ep+0.034
ru+0.032
Neg logit contributions
 Mole-0.032
 Ralph-0.031
 Buzz-0.030
 Amelia-0.029
 Fox-0.029
Token better
Token ID1365
Feature activation-1.25e+00

Pos logit contributions
auc+0.195
iren+0.188
irens+0.175
 laptop+0.171
ooting+0.167
Neg logit contributions
 Buzz-0.100
 might-0.098
 Mole-0.090
 Hop-0.088
 pet-0.088
Token than
Token ID621
Feature activation-8.51e-01

Pos logit contributions
auc+0.043
udd+0.041
iren+0.037
wheel+0.037
 x+0.035
Neg logit contributions
 Ralph-0.031
 Mole-0.031
 Fox-0.030
 Lex-0.029
 mole-0.029
Token her
Token ID607
Feature activation+4.26e-01

Pos logit contributions
auc+0.030
udd+0.030
iren+0.026
ep+0.025
wheel+0.025
Neg logit contributions
 Buzz-0.020
 Mole-0.020
 Ralph-0.020
 Fox-0.019
 Amelia-0.019
Token hoop
Token ID44792
Feature activation+6.46e-02

Pos logit contributions
 Ralph+0.014
Albert+0.014
 Mole+0.014
 Lex+0.013
 Amelia+0.013
Neg logit contributions
udd-0.011
auc-0.011
wheel-0.010
 about-0.010
iren-0.009
Token -
Token ID532
Feature activation-1.98e+00

Pos logit contributions
 Mole+0.001
 Fox+0.001
 Buzz+0.001
 Ralph+0.001
 might+0.001
Neg logit contributions
auc-0.002
udd-0.002
iren-0.002
ru-0.002
ep-0.002
Token.
Token ID13
Feature activation-2.81e-01

Pos logit contributions
udd+0.036
auc+0.035
iren+0.032
wheel+0.031
eping+0.031
Neg logit contributions
 Buzz-0.023
 Mole-0.022
 Fox-0.021
 might-0.021
 Flo-0.020
Token We
Token ID775
Feature activation+2.92e-02

Pos logit contributions
udd+0.010
auc+0.010
wheel+0.009
 about+0.009
iren+0.009
Neg logit contributions
 Mole-0.007
 Ralph-0.007
 Fox-0.006
Albert-0.006
 Lex-0.006
Token can
Token ID460
Feature activation-9.19e-01

Pos logit contributions
 Ralph+0.001
 Lex+0.001
 Amelia+0.001
Albert+0.001
 Mole+0.001
Neg logit contributions
auc-0.001
udd-0.001
wheel-0.001
 about-0.001
ep-0.001
Token make
Token ID787
Feature activation-1.25e+00

Pos logit contributions
auc+0.026
udd+0.026
wheel+0.023
 about+0.023
iren+0.021
Neg logit contributions
 Ralph-0.027
 Mole-0.027
Albert-0.027
 Lex-0.027
 Amelia-0.026
Token it
Token ID340
Feature activation-1.05e+00

Pos logit contributions
auc+0.046
udd+0.045
wheel+0.041
iren+0.041
orting+0.039
Neg logit contributions
 Ralph-0.027
 Mole-0.027
 Buzz-0.026
 Fox-0.026
Albert-0.025
Token even
Token ID772
Feature activation-5.01e+00

Pos logit contributions
auc+0.033
iren+0.031
udd+0.031
oke+0.030
bench+0.028
Neg logit contributions
 Buzz-0.027
 Mole-0.024
 top-0.024
 Fox-0.024
 might-0.024
Token better
Token ID1365
Feature activation-8.21e-01

Pos logit contributions
auc+0.168
udd+0.164
iren+0.154
orting+0.136
ouncing+0.136
Neg logit contributions
 Buzz-0.123
 Mole-0.115
 mole-0.112
 Fox-0.110
 might-0.106
Token.
Token ID13
Feature activation-5.03e-01

Pos logit contributions
auc+0.033
udd+0.032
wheel+0.030
iren+0.029
 x+0.027
Neg logit contributions
 Mole-0.015
 Fox-0.015
 Ralph-0.015
 Buzz-0.014
Albert-0.014
Token We
Token ID775
Feature activation+7.17e-02

Pos logit contributions
auc+0.017
udd+0.017
wheel+0.015
 about+0.015
iren+0.014
Neg logit contributions
 Mole-0.012
 Ralph-0.012
Albert-0.012
 Lex-0.012
 Fox-0.012
Token can
Token ID460
Feature activation-7.18e-01

Pos logit contributions
 Ralph+0.002
 Lex+0.002
 Amelia+0.002
Albert+0.002
 Flo+0.002
Neg logit contributions
auc-0.002
udd-0.002
wheel-0.002
 about-0.002
ep-0.002
Token use
Token ID779
Feature activation-1.38e-01

Pos logit contributions
auc+0.020
udd+0.020
wheel+0.019
 about+0.018
iren+0.017
Neg logit contributions
 Ralph-0.021
 Mole-0.021
Albert-0.021
 Lex-0.021
 Amelia-0.021
Token,
Token ID11
Feature activation-1.47e+00

Pos logit contributions
auc+0.055
wheel+0.053
udd+0.052
 about+0.048
iren+0.047
Neg logit contributions
 Fox-0.040
 Mole-0.040
 might-0.039
 Ralph-0.038
 Buzz-0.037
Token you
Token ID345
Feature activation-1.79e-01

Pos logit contributions
auc+0.056
udd+0.055
wheel+0.051
iren+0.050
ep+0.047
Neg logit contributions
 Ralph-0.030
 Mole-0.030
 Lex-0.029
 Amelia-0.029
 Fox-0.029
Token can
Token ID460
Feature activation-3.98e-01

Pos logit contributions
udd+0.006
auc+0.006
 about+0.006
wheel+0.005
iren+0.005
Neg logit contributions
 Ralph-0.005
 Mole-0.004
Albert-0.004
 Lex-0.004
 Amelia-0.004
Token make
Token ID787
Feature activation-1.75e+00

Pos logit contributions
udd+0.012
auc+0.012
wheel+0.011
 about+0.011
iren+0.010
Neg logit contributions
 Ralph-0.011
 Mole-0.011
Albert-0.010
 Lex-0.010
 Amelia-0.010
Token something
Token ID1223
Feature activation-1.63e+00

Pos logit contributions
auc+0.058
udd+0.054
wheel+0.052
iren+0.051
oke+0.047
Neg logit contributions
 Ralph-0.046
 Amelia-0.045
 Lex-0.044
 Mole-0.044
 Fox-0.043
Token even
Token ID772
Feature activation-5.01e+00

Pos logit contributions
auc+0.051
udd+0.049
iren+0.048
ep+0.045
wheel+0.045
Neg logit contributions
 Ralph-0.040
 Mole-0.040
 Buzz-0.039
 Amelia-0.039
 Fox-0.038
Token better
Token ID1365
Feature activation-1.35e+00

Pos logit contributions
auc+0.181
iren+0.172
udd+0.168
board+0.156
irens+0.155
Neg logit contributions
 Buzz-0.112
 might-0.109
 Fox-0.101
 Mole-0.100
 Hop-0.098
Token.
Token ID13
Feature activation-1.44e+00

Pos logit contributions
auc+0.047
udd+0.045
wheel+0.041
iren+0.041
 x+0.038
Neg logit contributions
 Ralph-0.033
 Mole-0.033
 Fox-0.032
 Amelia-0.031
 Lex-0.030
Token The
Token ID383
Feature activation-7.70e-01

Pos logit contributions
udd+0.041
 about+0.040
auc+0.039
wheel+0.035
orting+0.033
Neg logit contributions
Albert-0.044
 Ralph-0.043
Larry-0.042
 Lex-0.042
 Mole-0.041
Token end
Token ID886
Feature activation-2.63e+00

Pos logit contributions
auc+0.029
udd+0.028
wheel+0.026
 about+0.025
iren+0.025
Neg logit contributions
 Mole-0.016
Albert-0.015
 Fox-0.015
 Lex-0.015
 mole-0.015
Token.
Token ID13
Feature activation-1.86e+00

Pos logit contributions
auc+0.086
udd+0.085
wheel+0.078
iren+0.072
ep+0.071
Neg logit contributions
 Mole-0.063
 Fox-0.062
 Ralph-0.061
 Amelia-0.060
Albert-0.059
Token work
Token ID670
Feature activation-2.04e+00

Pos logit contributions
auc+0.024
udd+0.024
wheel+0.021
 about+0.021
iren+0.019
Neg logit contributions
 Ralph-0.027
 Mole-0.027
Albert-0.026
 Lex-0.026
 Amelia-0.026
Token together
Token ID1978
Feature activation+5.89e-01

Pos logit contributions
auc+0.073
udd+0.067
wheel+0.063
oke+0.058
iren+0.058
Neg logit contributions
Albert-0.046
 Ralph-0.044
 Buzz-0.044
 Amelia-0.044
 Fox-0.044
Token and
Token ID290
Feature activation-2.04e+00

Pos logit contributions
 Ralph+0.016
 Mole+0.016
Albert+0.016
 Fox+0.016
 Lex+0.016
Neg logit contributions
auc-0.018
udd-0.018
wheel-0.016
iren-0.015
 about-0.015
Token make
Token ID787
Feature activation-1.67e+00

Pos logit contributions
udd+0.078
auc+0.077
iren+0.067
wheel+0.065
orting+0.064
Neg logit contributions
 Mole-0.044
 Fox-0.042
 Ralph-0.042
Albert-0.041
 Buzz-0.041
Token them
Token ID606
Feature activation-5.34e-01

Pos logit contributions
auc+0.058
udd+0.055
wheel+0.054
iren+0.052
orting+0.049
Neg logit contributions
 Ralph-0.039
 Amelia-0.037
 Mole-0.037
Albert-0.037
 Fox-0.037
Token even
Token ID772
Feature activation-4.98e+00

Pos logit contributions
auc+0.018
udd+0.017
iren+0.017
oke+0.016
orting+0.016
Neg logit contributions
Albert-0.012
 Buzz-0.011
 both-0.011
 Albert-0.011
 Mole-0.011
Token better
Token ID1365
Feature activation-8.81e-01

Pos logit contributions
auc+0.177
udd+0.172
iren+0.163
board+0.151
orting+0.149
Neg logit contributions
 Buzz-0.099
 both-0.096
 total-0.094
 eight-0.094
 Fox-0.093
Token than
Token ID621
Feature activation-4.23e-01

Pos logit contributions
auc+0.036
udd+0.035
wheel+0.033
iren+0.031
orting+0.030
Neg logit contributions
 Mole-0.016
 Fox-0.016
 Ralph-0.015
Albert-0.015
 Buzz-0.015
Token before
Token ID878
Feature activation+1.43e+00

Pos logit contributions
udd+0.014
auc+0.014
wheel+0.012
 about+0.012
iren+0.011
Neg logit contributions
 Mole-0.011
 Ralph-0.011
Albert-0.011
 Fox-0.011
 Lex-0.011
Token.
Token ID13
Feature activation-3.73e-01

Pos logit contributions
 Ralph+0.038
Albert+0.037
 Mole+0.037
 Lex+0.037
Larry+0.036
Neg logit contributions
auc-0.045
udd-0.044
 about-0.041
wheel-0.040
iren-0.038
Token And
Token ID843
Feature activation-1.01e+00

Pos logit contributions
auc+0.011
udd+0.011
 about+0.010
wheel+0.010
iren+0.009
Neg logit contributions
 Ralph-0.011
Albert-0.010
 Mole-0.010
 Lex-0.010
Larry-0.010
Token quiet
Token ID5897
Feature activation-9.31e-01

Pos logit contributions
 Ralph+0.010
 Mole+0.010
Albert+0.010
 Lex+0.010
 Fox+0.010
Neg logit contributions
auc-0.014
udd-0.014
wheel-0.012
 about-0.012
iren-0.012
Token compared
Token ID3688
Feature activation-4.41e-01

Pos logit contributions
auc+0.029
udd+0.028
wheel+0.026
iren+0.024
 x+0.023
Neg logit contributions
 Ralph-0.026
 Mole-0.025
 Fox-0.024
 Lex-0.024
 Buzz-0.024
Token to
Token ID284
Feature activation+3.60e-01

Pos logit contributions
auc+0.014
udd+0.014
wheel+0.012
 about+0.012
iren+0.011
Neg logit contributions
 Ralph-0.012
 Mole-0.012
Albert-0.012
 Lex-0.012
 Amelia-0.011
Token the
Token ID262
Feature activation-9.10e-01

Pos logit contributions
 Ralph+0.009
 Mole+0.009
 Lex+0.009
Albert+0.009
 Amelia+0.009
Neg logit contributions
udd-0.012
auc-0.012
wheel-0.011
iren-0.010
 about-0.010
Token noisy
Token ID31210
Feature activation-1.55e+00

Pos logit contributions
udd+0.028
auc+0.028
wheel+0.025
 about+0.024
iren+0.023
Neg logit contributions
 Mole-0.025
 Ralph-0.025
Albert-0.025
 Lex-0.024
 Fox-0.024
Token se
Token ID384
Feature activation-4.94e+00

Pos logit contributions
udd+0.047
auc+0.046
wheel+0.043
 about+0.039
orting+0.039
Neg logit contributions
 Ralph-0.042
 Mole-0.042
 Fox-0.041
 mole-0.041
 Amelia-0.040
Tokenag
Token ID363
Feature activation-7.46e-01

Pos logit contributions
auc+0.104
udd+0.096
 about+0.095
 x+0.092
 typing+0.092
Neg logit contributions
Larry-0.140
Buzz-0.138
Albert-0.136
Joy-0.135
Jess-0.133
Tokenull
Token ID724
Feature activation-5.93e-01

Pos logit contributions
udd+0.020
auc+0.020
wheel+0.018
 about+0.017
iren+0.016
Neg logit contributions
 Ralph-0.023
 Mole-0.023
Albert-0.023
 Lex-0.022
Larry-0.022
Tokens
Token ID82
Feature activation-1.56e-02

Pos logit contributions
auc+0.019
udd+0.018
wheel+0.016
 x+0.016
 about+0.016
Neg logit contributions
 Ralph-0.016
 Mole-0.016
 Fox-0.015
Albert-0.015
 Lex-0.015
Token outside
Token ID2354
Feature activation+1.50e+00

Pos logit contributions
auc+0.000
udd+0.000
wheel+0.000
iren+0.000
 about+0.000
Neg logit contributions
 Mole-0.000
 Ralph-0.000
Albert-0.000
 Lex-0.000
 Fox-0.000
Token.
Token ID13
Feature activation-2.36e-01

Pos logit contributions
 Ralph+0.041
 Mole+0.040
 Fox+0.039
 Lex+0.039
Albert+0.038
Neg logit contributions
udd-0.047
auc-0.046
wheel-0.042
iren-0.039
 about-0.038
Token to
Token ID284
Feature activation-1.21e+00

Pos logit contributions
auc+0.016
udd+0.014
ru+0.013
oke+0.013
eping+0.013
Neg logit contributions
 Amelia-0.011
 might-0.011
 Fox-0.010
 wherever-0.010
 Bonnie-0.010
Token make
Token ID787
Feature activation-1.56e+00

Pos logit contributions
auc+0.043
udd+0.043
iren+0.037
oke+0.037
wheel+0.037
Neg logit contributions
 Mole-0.028
 Amelia-0.027
 Lex-0.027
 Ralph-0.027
 Fox-0.027
Token her
Token ID607
Feature activation-6.74e-01

Pos logit contributions
auc+0.061
udd+0.057
iren+0.055
wheel+0.053
oke+0.052
Neg logit contributions
 Amelia-0.031
 Ralph-0.030
 Jessie-0.029
 Lex-0.029
 Mole-0.028
Token doll
Token ID3654
Feature activation-1.00e+00

Pos logit contributions
auc+0.024
udd+0.023
iren+0.020
wheel+0.020
orting+0.020
Neg logit contributions
 Mole-0.016
 Amelia-0.016
 Lex-0.015
 Fox-0.015
 Ralph-0.015
Token look
Token ID804
Feature activation-1.30e+00

Pos logit contributions
auc+0.038
udd+0.033
iren+0.032
oke+0.032
orting+0.031
Neg logit contributions
 Amelia-0.020
 Mole-0.020
 Fox-0.019
 might-0.019
 barely-0.019
Token even
Token ID772
Feature activation-4.92e+00

Pos logit contributions
auc+0.049
iren+0.045
udd+0.042
oke+0.040
wheel+0.040
Neg logit contributions
 Amelia-0.026
 total-0.025
 both-0.025
 might-0.025
 Jessie-0.024
Token prett
Token ID46442
Feature activation-1.79e+00

Pos logit contributions
auc+0.192
iren+0.170
udd+0.165
oke+0.163
ooting+0.162
Neg logit contributions
 might-0.085
 both-0.081
 Bonnie-0.078
 eventually-0.078
 Buzz-0.077
Tokenier
Token ID959
Feature activation-1.26e+00

Pos logit contributions
auc+0.031
iren+0.030
 x+0.029
 report+0.029
udd+0.028
Neg logit contributions
Albert-0.061
 Ralph-0.061
 Mole-0.060
Bon-0.059
 Albert-0.059
Token.
Token ID13
Feature activation-5.19e-01

Pos logit contributions
auc+0.042
udd+0.041
iren+0.037
wheel+0.037
 x+0.035
Neg logit contributions
 Mole-0.031
 Ralph-0.030
 Fox-0.030
 Amelia-0.030
 Lex-0.029
Token She
Token ID1375
Feature activation-7.08e-01

Pos logit contributions
auc+0.017
udd+0.017
wheel+0.016
 about+0.015
iren+0.015
Neg logit contributions
 Ralph-0.013
 Mole-0.013
Albert-0.012
 Lex-0.012
 Amelia-0.012
Token hugged
Token ID41130
Feature activation-6.55e-01

Pos logit contributions
auc+0.023
udd+0.022
wheel+0.020
iren+0.020
 about+0.020
Neg logit contributions
 Ralph-0.018
 Mole-0.018
Albert-0.018
 Lex-0.018
 Amelia-0.017
Token to
Token ID284
Feature activation-1.14e+00

Pos logit contributions
auc+0.055
udd+0.053
wheel+0.049
iren+0.046
ep+0.045
Neg logit contributions
 Mole-0.047
 Ralph-0.046
 Fox-0.045
 Buzz-0.043
 Lex-0.043
Token others
Token ID1854
Feature activation-9.14e-01

Pos logit contributions
auc+0.043
udd+0.042
wheel+0.039
 about+0.038
iren+0.037
Neg logit contributions
 Mole-0.024
 Ralph-0.024
Albert-0.023
 Lex-0.023
 Fox-0.022
Token made
Token ID925
Feature activation-1.74e+00

Pos logit contributions
auc+0.030
udd+0.029
wheel+0.027
 about+0.026
iren+0.025
Neg logit contributions
 Mole-0.023
 Ralph-0.023
Albert-0.023
 Fox-0.022
 Lex-0.022
Token them
Token ID606
Feature activation-6.89e-01

Pos logit contributions
auc+0.065
udd+0.063
wheel+0.059
iren+0.056
 about+0.055
Neg logit contributions
 Mole-0.038
 Ralph-0.038
Albert-0.036
 Lex-0.036
 Fox-0.035
Token feel
Token ID1254
Feature activation-2.40e+00

Pos logit contributions
auc+0.023
udd+0.021
iren+0.020
wheel+0.019
 x+0.019
Neg logit contributions
 Mole-0.017
 Buzz-0.016
 Ralph-0.016
Albert-0.016
 Fox-0.016
Token even
Token ID772
Feature activation-4.91e+00

Pos logit contributions
auc+0.071
udd+0.066
iren+0.060
wheel+0.060
 x+0.056
Neg logit contributions
 Mole-0.072
 Ralph-0.070
 Fox-0.069
 Buzz-0.068
 mole-0.067
Token better
Token ID1365
Feature activation-7.50e-01

Pos logit contributions
auc+0.205
iren+0.182
udd+0.176
irens+0.172
board+0.170
Neg logit contributions
 Buzz-0.077
 might-0.075
 Mole-0.073
 mole-0.072
 Fox-0.069
Token than
Token ID621
Feature activation-6.63e-01

Pos logit contributions
auc+0.024
udd+0.023
wheel+0.022
iren+0.021
 about+0.020
Neg logit contributions
 Mole-0.020
 Ralph-0.020
 Fox-0.020
Albert-0.019
 Lex-0.019
Token winning
Token ID5442
Feature activation-2.38e-01

Pos logit contributions
auc+0.017
udd+0.017
wheel+0.015
 about+0.014
iren+0.014
Neg logit contributions
 Mole-0.022
 Ralph-0.021
Albert-0.021
 Lex-0.021
 Fox-0.021
Token a
Token ID257
Feature activation+6.30e-01

Pos logit contributions
auc+0.008
udd+0.007
wheel+0.007
iren+0.006
 about+0.006
Neg logit contributions
 Mole-0.006
 Ralph-0.006
 Fox-0.006
Albert-0.006
 Amelia-0.006
Token race
Token ID3234
Feature activation-3.22e-01

Pos logit contributions
 Mole+0.018
 Ralph+0.018
Albert+0.018
 Fox+0.018
 Amelia+0.017
Neg logit contributions
auc-0.019
udd-0.018
wheel-0.016
 about-0.016
iren-0.015
Token that
Token ID326
Feature activation-1.26e+00

Pos logit contributions
auc+0.029
udd+0.029
wheel+0.026
iren+0.025
oke+0.023
Neg logit contributions
 Ralph-0.027
 Mole-0.026
 Fox-0.026
 Lex-0.025
 Buzz-0.025
Token he
Token ID339
Feature activation-8.03e-01

Pos logit contributions
udd+0.047
auc+0.046
iren+0.040
wheel+0.039
orting+0.039
Neg logit contributions
 Mole-0.032
 Ralph-0.031
 Fox-0.030
 Buzz-0.030
 mole-0.028
Token had
Token ID550
Feature activation-1.71e-01

Pos logit contributions
auc+0.028
udd+0.028
wheel+0.025
 about+0.025
iren+0.024
Neg logit contributions
 Ralph-0.019
 Mole-0.019
Albert-0.019
 Lex-0.018
 Amelia-0.018
Token gained
Token ID8618
Feature activation-1.38e+00

Pos logit contributions
auc+0.006
udd+0.005
wheel+0.005
iren+0.005
oke+0.004
Neg logit contributions
 Ralph-0.005
 Mole-0.005
 Fox-0.004
 Lex-0.004
Albert-0.004
Token something
Token ID1223
Feature activation-8.12e-01

Pos logit contributions
auc+0.046
udd+0.045
iren+0.039
wheel+0.039
oke+0.038
Neg logit contributions
 Mole-0.037
 Ralph-0.036
 Fox-0.035
 Buzz-0.035
 Flo-0.034
Token even
Token ID772
Feature activation-4.87e+00

Pos logit contributions
auc+0.026
udd+0.025
iren+0.024
ep+0.023
wheel+0.022
Neg logit contributions
 Ralph-0.022
 Mole-0.022
 Fox-0.021
 Buzz-0.020
 Albert-0.020
Token more
Token ID517
Feature activation-1.78e+00

Pos logit contributions
auc+0.186
iren+0.173
udd+0.163
irens+0.160
 laptop+0.156
Neg logit contributions
 might-0.102
 Buzz-0.097
 Mole-0.093
 mole-0.092
 Hop-0.092
Token important
Token ID1593
Feature activation-1.49e+00

Pos logit contributions
auc+0.054
udd+0.049
iren+0.045
wheel+0.044
ep+0.040
Neg logit contributions
 Ralph-0.052
 Mole-0.051
 mole-0.050
 Fox-0.049
 Lex-0.049
Token -
Token ID532
Feature activation-2.34e+00

Pos logit contributions
auc+0.049
udd+0.048
wheel+0.042
iren+0.041
ep+0.040
Neg logit contributions
 Ralph-0.039
 Mole-0.038
 Fox-0.037
 Amelia-0.036
Albert-0.036
Token a
Token ID257
Feature activation+2.79e-01

Pos logit contributions
auc+0.081
udd+0.081
iren+0.070
wheel+0.068
oke+0.067
Neg logit contributions
 Mole-0.059
 Ralph-0.058
 Fox-0.057
 Buzz-0.055
 mole-0.054
Token friend
Token ID1545
Feature activation-5.32e-01

Pos logit contributions
 Mole+0.008
 Ralph+0.008
Albert+0.008
 Fox+0.008
 Lex+0.008
Neg logit contributions
auc-0.008
udd-0.008
wheel-0.007
 about-0.007
iren-0.007
Token his
Token ID465
Feature activation-6.62e-01

Pos logit contributions
udd+0.048
auc+0.048
wheel+0.042
iren+0.040
 about+0.040
Neg logit contributions
 Ralph-0.038
 Mole-0.037
 Fox-0.035
Albert-0.035
 Lex-0.035
Token mother
Token ID2802
Feature activation-1.54e+00

Pos logit contributions
auc+0.022
udd+0.022
wheel+0.020
 about+0.019
iren+0.019
Neg logit contributions
 Mole-0.017
 Ralph-0.017
Albert-0.016
 Fox-0.016
 Lex-0.016
Token gave
Token ID2921
Feature activation-1.65e+00

Pos logit contributions
auc+0.048
udd+0.047
wheel+0.042
iren+0.039
 about+0.039
Neg logit contributions
 Ralph-0.042
 Mole-0.042
 Fox-0.041
 Lex-0.041
 Amelia-0.041
Token him
Token ID683
Feature activation+9.58e-02

Pos logit contributions
auc+0.055
udd+0.055
wheel+0.050
 about+0.049
iren+0.047
Neg logit contributions
 Ralph-0.041
 Mole-0.041
Albert-0.040
 Lex-0.039
 Fox-0.038
Token an
Token ID281
Feature activation-2.18e+00

Pos logit contributions
 Ralph+0.003
 Mole+0.003
Albert+0.003
 Lex+0.003
 Fox+0.003
Neg logit contributions
auc-0.003
udd-0.003
wheel-0.002
 about-0.002
iren-0.002
Token even
Token ID772
Feature activation-4.80e+00

Pos logit contributions
udd+0.066
auc+0.061
orting+0.054
iren+0.053
oke+0.051
Neg logit contributions
 Mole-0.062
 mole-0.061
 Ralph-0.059
 Albert-0.058
 Lex-0.058
Token longer
Token ID2392
Feature activation-1.16e+00

Pos logit contributions
auc+0.171
udd+0.169
iren+0.162
irens+0.159
oke+0.153
Neg logit contributions
 might-0.088
 pet-0.085
 bunny-0.079
 Jessie-0.079
 Buzz-0.079
Token rope
Token ID17182
Feature activation-2.46e-01

Pos logit contributions
auc+0.038
udd+0.037
wheel+0.032
orting+0.032
iren+0.032
Neg logit contributions
 Ralph-0.028
 Lex-0.027
 Mole-0.027
 Jessie-0.027
 Amelia-0.027
Token to
Token ID284
Feature activation-8.58e-01

Pos logit contributions
udd+0.008
auc+0.008
iren+0.007
 x+0.007
wheel+0.007
Neg logit contributions
 Ralph-0.006
 Mole-0.006
 Flo-0.005
 Fox-0.005
 Buzz-0.005
Token play
Token ID711
Feature activation-9.79e-01

Pos logit contributions
auc+0.032
udd+0.032
oke+0.028
iren+0.027
 about+0.026
Neg logit contributions
 Ralph-0.019
 Mole-0.019
 Flo-0.018
 Lex-0.018
 Buzz-0.018
Token with
Token ID351
Feature activation-2.25e-01

Pos logit contributions
udd+0.032
auc+0.031
oke+0.027
iren+0.027
aj+0.026
Neg logit contributions
 Mole-0.026
 Ralph-0.025
 Buzz-0.025
 Hop-0.025
 Fox-0.024
Token can
Token ID460
Feature activation-2.69e-01

Pos logit contributions
auc+0.014
udd+0.014
wheel+0.012
iren+0.012
 about+0.011
Neg logit contributions
 Ralph-0.011
 Mole-0.011
Albert-0.011
 Fox-0.011
 Amelia-0.010
Token help
Token ID1037
Feature activation-1.23e+00

Pos logit contributions
auc+0.009
udd+0.009
wheel+0.008
 about+0.008
iren+0.008
Neg logit contributions
 Mole-0.006
Albert-0.006
 Ralph-0.006
 Fox-0.006
Larry-0.006
Token us
Token ID514
Feature activation-8.26e-01

Pos logit contributions
auc+0.045
udd+0.045
wheel+0.041
iren+0.040
 about+0.038
Neg logit contributions
 Ralph-0.027
 Mole-0.027
 Fox-0.026
Albert-0.026
 Lex-0.025
Token make
Token ID787
Feature activation-1.75e+00

Pos logit contributions
auc+0.026
udd+0.026
wheel+0.023
 about+0.022
iren+0.022
Neg logit contributions
 Mole-0.022
 Ralph-0.022
Albert-0.021
 Fox-0.021
 Lex-0.021
Token something
Token ID1223
Feature activation-1.64e+00

Pos logit contributions
auc+0.057
udd+0.055
wheel+0.053
iren+0.052
orting+0.047
Neg logit contributions
 Ralph-0.047
 Mole-0.045
 Fox-0.043
Albert-0.043
 Buzz-0.043
Token even
Token ID772
Feature activation-4.79e+00

Pos logit contributions
auc+0.049
udd+0.048
iren+0.044
wheel+0.044
ep+0.041
Neg logit contributions
 Ralph-0.046
 Mole-0.045
Albert-0.043
 Fox-0.043
 Albert-0.043
Token more
Token ID517
Feature activation-2.64e+00

Pos logit contributions
auc+0.163
udd+0.157
iren+0.155
irens+0.138
board+0.136
Neg logit contributions
 Buzz-0.118
 might-0.116
 Mole-0.113
 Fox-0.113
 Ralph-0.108
Token special
Token ID2041
Feature activation-1.57e+00

Pos logit contributions
auc+0.080
udd+0.075
iren+0.070
wheel+0.069
orting+0.064
Neg logit contributions
 Ralph-0.075
 Mole-0.072
 Fox-0.071
 Jessie-0.071
 Albert-0.070
Token!
Token ID0
Feature activation-1.77e+00

Pos logit contributions
udd+0.047
auc+0.047
wheel+0.042
iren+0.041
ep+0.038
Neg logit contributions
 Ralph-0.044
 Mole-0.044
 Fox-0.042
Albert-0.042
 Albert-0.041
Token<|endoftext|>
Token ID50256
Feature activation-4.18e-01

Pos logit contributions
udd+0.053
auc+0.053
 about+0.052
wheel+0.049
iren+0.046
Neg logit contributions
Albert-0.046
 Lex-0.046
 Ralph-0.046
 Mole-0.046
 mole-0.045
TokenOnce
Token ID7454
Feature activation+3.29e-03

Pos logit contributions
auc+0.013
udd+0.013
 about+0.012
wheel+0.012
iren+0.012
Neg logit contributions
 Lex-0.011
Albert-0.010
 Mole-0.010
 Ralph-0.010
 Amelia-0.010
Token on
Token ID319
Feature activation+2.23e+00

Pos logit contributions
 Ralph+0.006
 Mole+0.006
Albert+0.006
 Buzz+0.006
 Fox+0.006
Neg logit contributions
udd-0.007
auc-0.007
wheel-0.006
iren-0.006
 about-0.006
Token top
Token ID1353
Feature activation-5.40e-01

Pos logit contributions
 Mole+0.061
 Lex+0.061
 Ralph+0.060
Albert+0.059
 Flo+0.057
Neg logit contributions
auc-0.068
udd-0.066
 about-0.063
wheel-0.060
ep-0.057
Token to
Token ID284
Feature activation-2.78e-01

Pos logit contributions
udd+0.015
auc+0.015
wheel+0.013
orting+0.013
 about+0.012
Neg logit contributions
 Ralph-0.015
 Mole-0.015
 Buzz-0.015
Albert-0.015
 Amelia-0.014
Token make
Token ID787
Feature activation-1.68e+00

Pos logit contributions
udd+0.010
auc+0.009
wheel+0.008
 about+0.008
iren+0.008
Neg logit contributions
 Mole-0.007
 Ralph-0.007
Albert-0.007
 Buzz-0.006
 Amelia-0.006
Token it
Token ID340
Feature activation-1.00e+00

Pos logit contributions
udd+0.061
auc+0.061
wheel+0.054
iren+0.054
orting+0.052
Neg logit contributions
 Ralph-0.036
 Buzz-0.035
 Mole-0.034
Albert-0.034
 Amelia-0.034
Token even
Token ID772
Feature activation-4.76e+00

Pos logit contributions
auc+0.032
udd+0.031
iren+0.029
orting+0.027
oke+0.027
Neg logit contributions
 Buzz-0.026
 Mole-0.025
Albert-0.025
 Ralph-0.025
 Albert-0.024
Token taller
Token ID25242
Feature activation-9.38e-01

Pos logit contributions
auc+0.153
udd+0.148
iren+0.141
orting+0.124
irens+0.123
Neg logit contributions
 Buzz-0.121
 Mole-0.116
 might-0.111
 Fox-0.110
 total-0.109
Token!"
Token ID2474
Feature activation-5.37e-01

Pos logit contributions
auc+0.033
udd+0.033
wheel+0.030
iren+0.029
 x+0.028
Neg logit contributions
 Ralph-0.021
 Mole-0.021
 Buzz-0.020
 Fox-0.020
Albert-0.020
Token
Token ID198
Feature activation+2.68e-02

Pos logit contributions
auc+0.019
udd+0.018
 about+0.017
wheel+0.017
iren+0.016
Neg logit contributions
Albert-0.013
 Mole-0.013
 Ralph-0.013
 Lex-0.012
 Amelia-0.012
Token
Token ID198
Feature activation+3.96e-01

Pos logit contributions
 Lex+0.001
 Mole+0.001
 Ralph+0.001
Albert+0.001
 Amelia+0.001
Neg logit contributions
udd-0.001
auc-0.001
 about-0.001
wheel-0.001
iren-0.001
TokenThe
Token ID464
Feature activation+3.37e-01

Pos logit contributions
 Ralph+0.010
 Mole+0.010
 Lex+0.010
 Fox+0.010
 Amelia+0.010
Neg logit contributions
auc-0.013
udd-0.013
wheel-0.012
 about-0.011
iren-0.011
Token and
Token ID290
Feature activation-2.58e+00

Pos logit contributions
 Ralph+0.020
 Mole+0.020
 Fox+0.020
Albert+0.020
 Amelia+0.019
Neg logit contributions
udd-0.021
auc-0.021
wheel-0.019
iren-0.017
 about-0.017
Token stickers
Token ID28568
Feature activation-1.37e-01

Pos logit contributions
udd+0.085
auc+0.080
wheel+0.068
oke+0.067
iren+0.066
Neg logit contributions
 Fox-0.069
 Mole-0.068
Albert-0.067
 Ralph-0.066
 Lex-0.066
Token and
Token ID290
Feature activation-2.47e+00

Pos logit contributions
auc+0.004
udd+0.004
wheel+0.004
 x+0.003
iren+0.003
Neg logit contributions
 Ralph-0.004
Albert-0.004
 Amelia-0.004
 Fox-0.004
 Albert-0.003
Token make
Token ID787
Feature activation-2.18e+00

Pos logit contributions
udd+0.077
auc+0.075
wheel+0.064
orting+0.061
iren+0.061
Neg logit contributions
 Fox-0.067
 Mole-0.066
Albert-0.066
 Lex-0.065
 Ralph-0.064
Token them
Token ID606
Feature activation-1.01e+00

Pos logit contributions
auc+0.077
udd+0.076
wheel+0.070
iren+0.068
orting+0.066
Neg logit contributions
 Ralph-0.049
Albert-0.049
 Albert-0.048
 Fox-0.047
 Amelia-0.047
Token even
Token ID772
Feature activation-4.76e+00

Pos logit contributions
auc+0.035
udd+0.033
iren+0.031
oke+0.030
orting+0.030
Neg logit contributions
Albert-0.023
 Albert-0.022
 Buzz-0.022
 Mole-0.022
 Amelia-0.021
Token better
Token ID1365
Feature activation-1.34e+00

Pos logit contributions
auc+0.162
udd+0.160
iren+0.145
orting+0.136
board+0.136
Neg logit contributions
 Buzz-0.105
 Fox-0.103
Albert-0.103
 Albert-0.099
 Mole-0.099
Token than
Token ID621
Feature activation-3.79e-04

Pos logit contributions
auc+0.050
udd+0.049
wheel+0.045
iren+0.042
orting+0.041
Neg logit contributions
 Fox-0.029
Albert-0.029
 Ralph-0.029
 Mole-0.028
 Albert-0.027
Token before
Token ID878
Feature activation+1.30e+00

Pos logit contributions
auc+0.000
udd+0.000
wheel+0.000
 about+0.000
iren+0.000
Neg logit contributions
 Ralph-0.000
 Mole-0.000
Albert-0.000
 Lex-0.000
 Fox-0.000
Token.
Token ID13
Feature activation-7.17e-01

Pos logit contributions
 Mole+0.037
 Ralph+0.037
Albert+0.036
 Lex+0.036
 Amelia+0.035
Neg logit contributions
auc-0.038
udd-0.038
 about-0.035
wheel-0.034
iren-0.032
Token Then
Token ID3244
Feature activation-3.30e-01

Pos logit contributions
auc+0.022
udd+0.022
 about+0.020
wheel+0.020
iren+0.018
Neg logit contributions
 Ralph-0.019
 Mole-0.019
Albert-0.019
 Lex-0.019
 Amelia-0.018
Token try
Token ID1949
Feature activation-9.85e-01

Pos logit contributions
auc+0.125
udd+0.120
wheel+0.107
iren+0.105
oke+0.100
Neg logit contributions
 Mole-0.067
 Ralph-0.066
Albert-0.066
 Fox-0.065
 Amelia-0.063
Token again
Token ID757
Feature activation-9.59e-01

Pos logit contributions
auc+0.033
udd+0.030
wheel+0.029
iren+0.027
oke+0.026
Neg logit contributions
 Ralph-0.026
 Fox-0.026
 Mole-0.026
Albert-0.024
 Amelia-0.024
Token and
Token ID290
Feature activation-2.01e+00

Pos logit contributions
auc+0.034
udd+0.031
wheel+0.029
iren+0.028
oke+0.026
Neg logit contributions
 Fox-0.025
 Mole-0.024
 Ralph-0.024
Albert-0.023
 Lex-0.023
Token make
Token ID787
Feature activation-1.69e+00

Pos logit contributions
auc+0.078
udd+0.076
iren+0.067
aj+0.064
wheel+0.063
Neg logit contributions
 Mole-0.045
 Fox-0.043
 Ralph-0.042
 Buzz-0.041
Albert-0.040
Token something
Token ID1223
Feature activation-1.50e+00

Pos logit contributions
auc+0.059
udd+0.055
wheel+0.052
iren+0.051
oke+0.049
Neg logit contributions
 Ralph-0.042
 Mole-0.041
 Fox-0.040
 Amelia-0.039
Albert-0.039
Token even
Token ID772
Feature activation-4.75e+00

Pos logit contributions
auc+0.046
udd+0.043
iren+0.043
ep+0.041
wheel+0.040
Neg logit contributions
 Ralph-0.039
 Mole-0.039
 Buzz-0.038
 Fox-0.037
 Albert-0.037
Token better
Token ID1365
Feature activation-1.29e+00

Pos logit contributions
auc+0.178
iren+0.165
udd+0.162
irens+0.152
board+0.152
Neg logit contributions
 might-0.105
 Buzz-0.097
 Mole-0.091
 Hop-0.091
 Fox-0.090
Token.
Token ID13
Feature activation-1.89e+00

Pos logit contributions
auc+0.046
udd+0.045
wheel+0.041
iren+0.041
 x+0.037
Neg logit contributions
 Ralph-0.030
 Mole-0.030
 Fox-0.029
 Albert-0.027
Albert-0.027
Token<|endoftext|>
Token ID50256
Feature activation-4.07e-01

Pos logit contributions
udd+0.062
auc+0.062
 about+0.059
wheel+0.056
iren+0.052
Neg logit contributions
Albert-0.049
 Ralph-0.048
 Lex-0.048
 Mole-0.047
 Amelia-0.046
TokenOnce
Token ID7454
Feature activation-6.62e-02

Pos logit contributions
udd+0.011
auc+0.011
 about+0.011
wheel+0.010
iren+0.010
Neg logit contributions
 Lex-0.012
Albert-0.012
 Mole-0.011
 Ralph-0.011
 mole-0.011
Token upon
Token ID2402
Feature activation-1.50e+00

Pos logit contributions
udd+0.002
auc+0.002
 about+0.002
wheel+0.002
iren+0.002
Neg logit contributions
Albert-0.002
 Mole-0.002
 Ralph-0.002
 Lex-0.002
 Fox-0.002
Token next
Token ID1306
Feature activation-8.06e-01

Pos logit contributions
auc+0.004
udd+0.004
wheel+0.004
iren+0.004
 about+0.004
Neg logit contributions
 Ralph-0.004
 Mole-0.004
Albert-0.004
 Fox-0.004
 Amelia-0.003
Token game
Token ID983
Feature activation-8.66e-01

Pos logit contributions
auc+0.026
udd+0.025
wheel+0.022
iren+0.022
 about+0.021
Neg logit contributions
 Ralph-0.021
 Mole-0.021
 Fox-0.021
Albert-0.020
 Buzz-0.020
Token and
Token ID290
Feature activation-7.76e-01

Pos logit contributions
auc+0.027
udd+0.027
wheel+0.023
iren+0.023
eping+0.022
Neg logit contributions
 Ralph-0.023
 Mole-0.022
 Fox-0.022
 Buzz-0.022
Albert-0.021
Token getting
Token ID1972
Feature activation-1.50e+00

Pos logit contributions
auc+0.027
udd+0.026
iren+0.022
wheel+0.022
orting+0.022
Neg logit contributions
 Ralph-0.020
 Mole-0.020
 Fox-0.020
 Amelia-0.019
Albert-0.019
Token an
Token ID281
Feature activation-2.14e+00

Pos logit contributions
auc+0.051
udd+0.050
oke+0.045
iren+0.045
wheel+0.044
Neg logit contributions
 Ralph-0.036
 Mole-0.034
Albert-0.034
 Buzz-0.033
 Fox-0.033
Token even
Token ID772
Feature activation-4.75e+00

Pos logit contributions
auc+0.067
udd+0.063
iren+0.058
ows+0.053
orting+0.053
Neg logit contributions
 Ralph-0.057
 Mole-0.056
 Albert-0.054
Albert-0.054
 Amelia-0.053
Token higher
Token ID2440
Feature activation-1.16e+00

Pos logit contributions
auc+0.182
udd+0.171
iren+0.169
irens+0.160
oke+0.154
Neg logit contributions
 bunny-0.077
 Hop-0.075
 other-0.074
 Buzz-0.074
 Bonnie-0.073
Token score
Token ID4776
Feature activation-1.35e+00

Pos logit contributions
auc+0.038
udd+0.037
wheel+0.034
iren+0.032
 x+0.031
Neg logit contributions
 Ralph-0.031
 Mole-0.029
 Fox-0.028
Albert-0.028
 Lex-0.028
Token.
Token ID13
Feature activation-1.78e+00

Pos logit contributions
udd+0.045
auc+0.044
iren+0.039
 x+0.038
wheel+0.038
Neg logit contributions
 Ralph-0.032
 Buzz-0.031
 Mole-0.031
Larry-0.030
Albert-0.030
Token<|endoftext|>
Token ID50256
Feature activation-8.87e-01

Pos logit contributions
udd+0.056
 about+0.056
auc+0.054
wheel+0.049
iren+0.046
Neg logit contributions
Albert-0.048
 Lex-0.046
 Ralph-0.046
 Mole-0.046
Larry-0.045
TokenOnce
Token ID7454
Feature activation+1.14e-01

Pos logit contributions
udd+0.027
auc+0.027
 about+0.027
wheel+0.025
iren+0.023
Neg logit contributions
 Lex-0.023
 Mole-0.023
 Ralph-0.023
Albert-0.023
 mole-0.022
Token others
Token ID1854
Feature activation-1.06e+00

Pos logit contributions
auc+0.042
udd+0.042
 about+0.039
wheel+0.039
iren+0.036
Neg logit contributions
 Ralph-0.023
 Mole-0.022
Albert-0.022
 Lex-0.022
 Amelia-0.021
Token can
Token ID460
Feature activation-5.73e-01

Pos logit contributions
auc+0.034
udd+0.034
wheel+0.031
 about+0.029
iren+0.029
Neg logit contributions
 Ralph-0.028
 Mole-0.027
Albert-0.027
 Fox-0.026
 Lex-0.026
Token make
Token ID787
Feature activation-1.55e+00

Pos logit contributions
auc+0.020
udd+0.020
wheel+0.018
 about+0.018
iren+0.017
Neg logit contributions
 Mole-0.014
 Ralph-0.014
Albert-0.014
 Fox-0.013
 Lex-0.013
Token you
Token ID345
Feature activation-1.08e+00

Pos logit contributions
auc+0.053
udd+0.052
wheel+0.048
 about+0.046
iren+0.045
Neg logit contributions
 Ralph-0.038
 Mole-0.038
Albert-0.037
 Lex-0.036
 Fox-0.036
Token feel
Token ID1254
Feature activation-1.84e+00

Pos logit contributions
auc+0.040
udd+0.036
iren+0.036
oke+0.034
wheel+0.033
Neg logit contributions
 Mole-0.024
 Buzz-0.024
 Ralph-0.023
 Fox-0.023
 mole-0.023
Token even
Token ID772
Feature activation-4.73e+00

Pos logit contributions
auc+0.050
udd+0.047
wheel+0.042
iren+0.041
ep+0.038
Neg logit contributions
 Mole-0.060
 Ralph-0.059
 Fox-0.058
 Buzz-0.057
Albert-0.056
Token happier
Token ID23030
Feature activation-1.70e+00

Pos logit contributions
auc+0.188
iren+0.173
udd+0.166
irens+0.160
board+0.158
Neg logit contributions
 Buzz-0.090
 mole-0.088
 Mole-0.088
 Fox-0.085
 might-0.081
Token than
Token ID621
Feature activation-8.99e-01

Pos logit contributions
auc+0.051
udd+0.049
wheel+0.046
iren+0.043
 x+0.040
Neg logit contributions
 Ralph-0.050
 Mole-0.050
 Fox-0.049
Albert-0.048
 Lex-0.047
Token getting
Token ID1972
Feature activation-4.79e-01

Pos logit contributions
udd+0.026
auc+0.026
wheel+0.022
 about+0.021
iren+0.021
Neg logit contributions
 Mole-0.027
 Ralph-0.027
Albert-0.026
 Fox-0.026
 Lex-0.026
Token what
Token ID644
Feature activation-1.78e-01

Pos logit contributions
auc+0.013
udd+0.013
wheel+0.012
iren+0.011
orting+0.011
Neg logit contributions
 Mole-0.015
 Ralph-0.015
 Fox-0.014
 Lex-0.014
 Buzz-0.014
Token you
Token ID345
Feature activation-4.32e-01

Pos logit contributions
auc+0.006
udd+0.005
 about+0.005
wheel+0.005
iren+0.005
Neg logit contributions
 Ralph-0.005
 Mole-0.005
Albert-0.005
 Lex-0.005
 Amelia-0.005
Token always
Token ID1464
Feature activation-2.26e+00

Pos logit contributions
auc+0.007
udd+0.007
wheel+0.006
 about+0.006
iren+0.006
Neg logit contributions
 Ralph-0.006
 Mole-0.006
Albert-0.006
 Lex-0.006
 Amelia-0.006
Token be
Token ID307
Feature activation-1.03e+00

Pos logit contributions
auc+0.082
udd+0.080
wheel+0.072
iren+0.070
 about+0.069
Neg logit contributions
 Mole-0.052
 Ralph-0.052
Albert-0.051
 Fox-0.050
 Lex-0.050
Token replaced
Token ID6928
Feature activation-6.64e-01

Pos logit contributions
auc+0.031
udd+0.029
wheel+0.028
iren+0.026
orting+0.025
Neg logit contributions
 Mole-0.030
 Ralph-0.029
 Fox-0.029
 Lex-0.029
Albert-0.029
Token with
Token ID351
Feature activation-3.37e-01

Pos logit contributions
auc+0.017
udd+0.017
wheel+0.016
iren+0.014
 about+0.014
Neg logit contributions
 Ralph-0.022
 Mole-0.022
 Fox-0.021
Albert-0.021
 Amelia-0.021
Token something
Token ID1223
Feature activation-6.51e-01

Pos logit contributions
auc+0.010
udd+0.010
wheel+0.009
 about+0.009
iren+0.009
Neg logit contributions
 Ralph-0.009
 Mole-0.009
Albert-0.009
 Lex-0.009
 Fox-0.009
Token even
Token ID772
Feature activation-4.70e+00

Pos logit contributions
auc+0.019
udd+0.018
wheel+0.017
iren+0.016
ep+0.015
Neg logit contributions
 Ralph-0.019
 Mole-0.019
Albert-0.018
 Fox-0.018
 Lex-0.018
Token better
Token ID1365
Feature activation-6.39e-01

Pos logit contributions
auc+0.167
iren+0.155
udd+0.153
irens+0.142
ooting+0.140
Neg logit contributions
 might-0.112
 Buzz-0.106
 Fox-0.100
 Mole-0.100
 mole-0.099
Token.
Token ID13
Feature activation-2.18e+00

Pos logit contributions
auc+0.020
udd+0.019
wheel+0.018
iren+0.017
 about+0.016
Neg logit contributions
 Ralph-0.018
 Mole-0.018
 Fox-0.017
 Lex-0.017
Albert-0.017
Token<|endoftext|>
Token ID50256
Feature activation-4.12e-01

Pos logit contributions
udd+0.067
auc+0.066
 about+0.064
wheel+0.060
orting+0.055
Neg logit contributions
Albert-0.060
 Ralph-0.059
 Lex-0.059
 Mole-0.057
 Amelia-0.057
TokenOnce
Token ID7454
Feature activation+3.67e-02

Pos logit contributions
udd+0.011
 about+0.011
auc+0.011
wheel+0.010
iren+0.010
Neg logit contributions
 Lex-0.012
Albert-0.012
 Mole-0.012
 mole-0.012
 Ralph-0.012
Token upon
Token ID2402
Feature activation-1.41e+00

Pos logit contributions
 Ralph+0.001
 Mole+0.001
Albert+0.001
 Lex+0.001
 Fox+0.001
Neg logit contributions
udd-0.001
auc-0.001
 about-0.001
wheel-0.001
iren-0.001
Token had
Token ID550
Feature activation+7.64e-01

Pos logit contributions
auc+0.034
udd+0.033
orting+0.030
wheel+0.030
iren+0.029
Neg logit contributions
 Mole-0.028
 Fox-0.028
 Amelia-0.027
 Ralph-0.027
Albert-0.027
Token bought
Token ID5839
Feature activation-4.36e-01

Pos logit contributions
 Ralph+0.022
 Mole+0.022
Albert+0.022
 Lex+0.021
 Amelia+0.021
Neg logit contributions
auc-0.023
udd-0.022
wheel-0.020
 about-0.019
iren-0.019
Token a
Token ID257
Feature activation-9.50e-02

Pos logit contributions
auc+0.013
udd+0.013
wheel+0.012
 about+0.012
iren+0.011
Neg logit contributions
 Ralph-0.012
 Mole-0.012
Albert-0.012
 Lex-0.012
 Amelia-0.011
Token new
Token ID649
Feature activation-1.21e+00

Pos logit contributions
auc+0.003
udd+0.003
wheel+0.002
 about+0.002
iren+0.002
Neg logit contributions
 Mole-0.003
 Ralph-0.003
Albert-0.003
 Fox-0.003
 Amelia-0.003
Token magnet
Token ID19972
Feature activation+4.02e-01

Pos logit contributions
auc+0.039
udd+0.038
wheel+0.034
 about+0.033
iren+0.032
Neg logit contributions
 Mole-0.033
 Ralph-0.032
 Fox-0.032
Albert-0.031
 Amelia-0.031
Token even
Token ID772
Feature activation-4.68e+00

Pos logit contributions
 Mole+0.012
 Fox+0.012
 Ralph+0.012
 mole+0.011
 Amelia+0.011
Neg logit contributions
udd-0.012
auc-0.011
iren-0.009
wheel-0.009
ru-0.009
Token though
Token ID996
Feature activation+5.60e-01

Pos logit contributions
auc+0.170
iren+0.160
udd+0.155
ooting+0.146
irens+0.146
Neg logit contributions
 might-0.097
 mole-0.088
 Mole-0.088
 wherever-0.086
 Hop-0.085
Token the
Token ID262
Feature activation-7.49e-01

Pos logit contributions
 Ralph+0.018
 Mole+0.018
Albert+0.018
 Lex+0.017
Larry+0.017
Neg logit contributions
auc-0.015
udd-0.015
 about-0.013
wheel-0.013
iren-0.012
Token old
Token ID1468
Feature activation-2.19e+00

Pos logit contributions
auc+0.020
udd+0.020
wheel+0.018
 about+0.018
 x+0.017
Neg logit contributions
 Ralph-0.023
 Lex-0.023
Albert-0.023
 Mole-0.023
 Flo-0.023
Token one
Token ID530
Feature activation-9.50e-01

Pos logit contributions
udd+0.074
auc+0.074
wheel+0.064
 about+0.063
orting+0.061
Neg logit contributions
 Mole-0.057
 Ralph-0.055
 Fox-0.055
Albert-0.054
 mole-0.053
Token was
Token ID373
Feature activation-2.50e-01

Pos logit contributions
auc+0.035
udd+0.035
wheel+0.031
iren+0.031
 about+0.030
Neg logit contributions
 Mole-0.020
 Ralph-0.020
 Fox-0.019
 Lex-0.019
 Buzz-0.019
Token Why
Token ID4162
Feature activation-1.20e+00

Pos logit contributions
 Mole+0.007
 Ralph+0.007
Albert+0.007
 Lex+0.007
 Fox+0.007
Neg logit contributions
udd-0.009
auc-0.009
wheel-0.008
 about-0.008
iren-0.007
Token are
Token ID389
Feature activation+5.75e-01

Pos logit contributions
auc+0.043
udd+0.043
wheel+0.040
iren+0.039
ooting+0.037
Neg logit contributions
 Mole-0.025
 Ralph-0.025
 Fox-0.024
 might-0.024
 Buzz-0.023
Token you
Token ID345
Feature activation-3.39e-01

Pos logit contributions
 Ralph+0.014
Albert+0.014
 Mole+0.014
 Lex+0.014
Larry+0.013
Neg logit contributions
auc-0.019
udd-0.019
 about-0.018
wheel-0.017
iren-0.016
Token making
Token ID1642
Feature activation-2.44e+00

Pos logit contributions
auc+0.011
udd+0.010
wheel+0.009
orting+0.009
iren+0.009
Neg logit contributions
 Mole-0.009
 Fox-0.009
 Ralph-0.009
 Buzz-0.008
 Amelia-0.008
Token so
Token ID523
Feature activation-2.05e+00

Pos logit contributions
auc+0.088
udd+0.085
wheel+0.076
oke+0.075
ooting+0.075
Neg logit contributions
 Mole-0.054
 Ralph-0.054
 Fox-0.053
 Buzz-0.053
 Amelia-0.052
Token much
Token ID881
Feature activation-4.60e+00

Pos logit contributions
udd+0.066
auc+0.066
wheel+0.062
orting+0.061
 x+0.060
Neg logit contributions
 Buzz-0.039
 Mole-0.038
 both-0.037
Joy-0.037
Buzz-0.037
Token noise
Token ID7838
Feature activation-2.31e+00

Pos logit contributions
auc+0.120
udd+0.118
ooting+0.106
wheel+0.102
iece+0.102
Neg logit contributions
 Buzz-0.123
 Mole-0.122
 Fox-0.119
 mole-0.114
 Daisy-0.110
Token?"
Token ID1701
Feature activation-4.47e-01

Pos logit contributions
udd+0.075
auc+0.073
wheel+0.070
 shin+0.066
 coll+0.065
Neg logit contributions
 Fox-0.049
 Ralph-0.047
 Henry-0.046
 Mole-0.046
 Buzz-0.046
Token
Token ID198
Feature activation-1.01e-02

Pos logit contributions
auc+0.015
udd+0.014
 about+0.013
wheel+0.013
iren+0.013
Neg logit contributions
 Ralph-0.011
Albert-0.011
 Mole-0.011
 Lex-0.011
 Amelia-0.010
Token
Token ID198
Feature activation+7.40e-01

Pos logit contributions
udd+0.000
auc+0.000
 about+0.000
wheel+0.000
iren+0.000
Neg logit contributions
 Ralph-0.000
 Mole-0.000
Albert-0.000
 Lex-0.000
 Amelia-0.000
TokenThey
Token ID2990
Feature activation+6.90e-01

Pos logit contributions
 Ralph+0.024
 Mole+0.024
Albert+0.024
 Lex+0.024
 Albert+0.023
Neg logit contributions
auc-0.019
udd-0.019
wheel-0.016
iren-0.016
 about-0.016
TokenBob
Token ID18861
Feature activation+4.90e-01

Pos logit contributions
 Ralph+0.025
 Albert+0.024
 Fox+0.024
 Amelia+0.024
Albert+0.024
Neg logit contributions
auc-0.017
udd-0.016
wheel-0.016
 about-0.015
iren-0.014
Token wanted
Token ID2227
Feature activation+8.55e-02

Pos logit contributions
 Ralph+0.016
 Mole+0.016
Albert+0.015
 Lex+0.015
 Fox+0.015
Neg logit contributions
auc-0.013
udd-0.013
wheel-0.011
 about-0.011
iren-0.010
Token to
Token ID284
Feature activation-7.66e-01

Pos logit contributions
Albert+0.003
 Ralph+0.003
 Mole+0.003
 Lex+0.003
 Fox+0.003
Neg logit contributions
auc-0.002
udd-0.002
 about-0.002
wheel-0.002
 x-0.002
Token make
Token ID787
Feature activation-9.89e-01

Pos logit contributions
auc+0.022
udd+0.021
wheel+0.020
 about+0.019
 x+0.017
Neg logit contributions
 Ralph-0.023
Albert-0.023
 Lex-0.023
 Mole-0.023
 Amelia-0.022
Token it
Token ID340
Feature activation-6.16e-01

Pos logit contributions
auc+0.036
udd+0.035
wheel+0.032
iren+0.032
orting+0.030
Neg logit contributions
 Ralph-0.023
 Mole-0.022
 Buzz-0.021
 Amelia-0.021
 Fox-0.021
Token even
Token ID772
Feature activation-4.60e+00

Pos logit contributions
auc+0.021
udd+0.020
iren+0.020
oke+0.019
orting+0.018
Neg logit contributions
 Buzz-0.015
 Mole-0.014
 Ralph-0.014
 Larry-0.014
 Flo-0.013
Token better
Token ID1365
Feature activation-5.46e-01

Pos logit contributions
udd+0.169
auc+0.167
iren+0.162
ouncing+0.147
irens+0.144
Neg logit contributions
 Buzz-0.090
 Mole-0.084
 total-0.082
 both-0.081
 might-0.081
Token,
Token ID11
Feature activation-7.40e-01

Pos logit contributions
auc+0.017
udd+0.016
wheel+0.015
iren+0.014
 about+0.014
Neg logit contributions
 Ralph-0.016
 Mole-0.016
Albert-0.015
 Fox-0.015
 Lex-0.015
Token so
Token ID523
Feature activation-6.17e-01

Pos logit contributions
auc+0.025
udd+0.024
iren+0.021
wheel+0.021
ru+0.020
Neg logit contributions
 Ralph-0.020
 Mole-0.019
 Fox-0.019
Albert-0.018
 Lex-0.018
Token he
Token ID339
Feature activation-3.94e-01

Pos logit contributions
auc+0.016
udd+0.016
 about+0.015
wheel+0.014
iren+0.013
Neg logit contributions
Albert-0.019
 Ralph-0.019
 Mole-0.019
 Lex-0.019
 Amelia-0.019
Token found
Token ID1043
Feature activation-1.09e+00

Pos logit contributions
auc+0.012
udd+0.012
wheel+0.011
 about+0.011
iren+0.010
Neg logit contributions
 Ralph-0.011
 Mole-0.011
Albert-0.010
 Lex-0.010
 Fox-0.010
Token Jos
Token ID22568
Feature activation+1.80e-01

Pos logit contributions
 Mole+0.015
 Ralph+0.015
Albert+0.015
 Lex+0.015
 Fox+0.015
Neg logit contributions
auc-0.020
udd-0.020
wheel-0.018
 about-0.018
iren-0.017
Tokenie
Token ID494
Feature activation+8.91e-01

Pos logit contributions
Albert+0.003
 Ralph+0.003
Jess+0.003
 Amelia+0.003
 Lex+0.003
Neg logit contributions
auc-0.007
udd-0.007
wheel-0.006
iren-0.006
orting-0.006
Token couldn
Token ID3521
Feature activation+1.58e+00

Pos logit contributions
 Ralph+0.025
 Mole+0.025
Albert+0.025
 Lex+0.024
 Amelia+0.024
Neg logit contributions
auc-0.026
udd-0.026
wheel-0.024
 about-0.024
iren-0.022
Tokenâ
Token ID22940
Feature activation-1.10e-02

Pos logit contributions
 Ralph+0.020
 Mole+0.020
Albert+0.019
 Lex+0.019
 Amelia+0.018
Neg logit contributions
udd-0.072
auc-0.072
wheel-0.066
 about-0.066
iren-0.063
Token
Token ID26391
Feature activation+1.02e+00

Pos logit contributions
auc+0.000
udd+0.000
 about+0.000
aj+0.000
oke+0.000
Neg logit contributions
 Lex-0.000
Albert-0.000
 Ralph-0.000
Pen-0.000
 Amelia-0.000
Token
Token ID8151
Feature activation+2.50e+00

Pos logit contributions
 Mole+0.030
 Ralph+0.029
Albert+0.029
 Lex+0.028
 Fox+0.028
Neg logit contributions
udd-0.031
auc-0.030
wheel-0.028
 about-0.027
iren-0.025
Tokent
Token ID83
Feature activation+3.69e-01

Pos logit contributions
 Ralph+0.065
 Mole+0.065
Albert+0.064
 Lex+0.063
 Fox+0.062
Neg logit contributions
auc-0.080
udd-0.080
wheel-0.073
 about-0.072
iren-0.068
Token find
Token ID1064
Feature activation+9.25e-01

Pos logit contributions
 Ralph+0.010
 Mole+0.009
Albert+0.009
 Lex+0.009
 Fox+0.009
Neg logit contributions
auc-0.012
udd-0.012
wheel-0.011
 about-0.011
iren-0.010
Token her
Token ID607
Feature activation+1.29e+00

Pos logit contributions
Albert+0.026
 Mole+0.025
 Ralph+0.025
Larry+0.025
 Fox+0.024
Neg logit contributions
auc-0.028
udd-0.027
 about-0.025
wheel-0.025
iren-0.023
Token favourite
Token ID12507
Feature activation+8.17e-01

Pos logit contributions
Albert+0.037
 Ralph+0.037
 Mole+0.037
 Lex+0.036
Larry+0.036
Neg logit contributions
auc-0.037
udd-0.037
 about-0.035
wheel-0.033
 x-0.030
Token toy
Token ID13373
Feature activation+2.09e+00

Pos logit contributions
 Ralph+0.025
Albert+0.025
 Lex+0.025
 Mole+0.025
Larry+0.024
Neg logit contributions
auc-0.022
udd-0.021
 about-0.020
wheel-0.019
 x-0.017
Token,
Token ID11
Feature activation-2.79e-01

Pos logit contributions
 Ralph+0.038
 Mole+0.038
 Lex+0.037
 Buzz+0.037
 Amelia+0.036
Neg logit contributions
auc-0.040
udd-0.038
wheel-0.035
iren-0.033
oke-0.032
Token and
Token ID290
Feature activation+7.98e-01

Pos logit contributions
auc+0.009
udd+0.009
iren+0.008
wheel+0.008
oke+0.008
Neg logit contributions
 Ralph-0.008
 Buzz-0.007
 Lex-0.007
 Flo-0.007
 Amelia-0.007
Token she
Token ID673
Feature activation+1.14e+00

Pos logit contributions
 Ralph+0.020
 Mole+0.020
Albert+0.019
 Lex+0.019
 Fox+0.019
Neg logit contributions
auc-0.027
udd-0.027
wheel-0.024
 about-0.024
iren-0.023
Token loved
Token ID6151
Feature activation+1.12e+00

Pos logit contributions
 Mole+0.032
 Ralph+0.032
Albert+0.032
 Lex+0.031
 Fox+0.031
Neg logit contributions
udd-0.034
auc-0.033
 about-0.031
wheel-0.030
ep-0.028
Token the
Token ID262
Feature activation+2.38e-01

Pos logit contributions
Albert+0.030
 Mole+0.030
 Ralph+0.029
 Lex+0.028
Larry+0.028
Neg logit contributions
 about-0.034
udd-0.034
auc-0.033
wheel-0.031
iren-0.028
Token way
Token ID835
Feature activation+2.76e+00

Pos logit contributions
 Ralph+0.006
 Mole+0.006
Albert+0.006
 Lex+0.006
 Fox+0.006
Neg logit contributions
auc-0.008
udd-0.008
wheel-0.007
 about-0.007
iren-0.006
Token it
Token ID340
Feature activation+7.09e-01

Pos logit contributions
Albert+0.087
 Mole+0.086
 Ralph+0.083
Larry+0.082
 Fox+0.081
Neg logit contributions
udd-0.075
 about-0.075
auc-0.074
wheel-0.067
iren-0.059
Token tick
Token ID4378
Feature activation-4.73e-01

Pos logit contributions
 Ralph+0.018
 Mole+0.018
 Lex+0.017
Albert+0.017
 Flo+0.017
Neg logit contributions
auc-0.023
udd-0.023
wheel-0.021
 about-0.020
iren-0.020
Tokenled
Token ID992
Feature activation+8.22e-01

Pos logit contributions
auc+0.012
udd+0.012
wheel+0.011
 about+0.010
iren+0.010
Neg logit contributions
 Ralph-0.016
 Mole-0.015
 Lex-0.015
Albert-0.015
 Buzz-0.015
Token her
Token ID607
Feature activation+6.86e-01

Pos logit contributions
 Ralph+0.025
 Mole+0.025
Albert+0.025
 Lex+0.024
 Fox+0.024
Neg logit contributions
auc-0.022
udd-0.022
wheel-0.020
 about-0.020
iren-0.018
Token toes
Token ID23932
Feature activation+2.07e+00

Pos logit contributions
 Ralph+0.022
 Mole+0.022
Albert+0.022
 Lex+0.021
 Amelia+0.021
Neg logit contributions
udd-0.018
auc-0.017
 about-0.016
wheel-0.016
iren-0.014
Token so
Token ID523
Feature activation+9.55e-01

Pos logit contributions
 Ralph+0.026
 Mole+0.026
 Lex+0.025
 Fox+0.025
Albert+0.024
Neg logit contributions
auc-0.031
udd-0.031
wheel-0.028
iren-0.026
 about-0.026
Token it
Token ID340
Feature activation-2.58e-02

Pos logit contributions
Albert+0.027
 Mole+0.027
 Ralph+0.026
 Lex+0.026
 Amelia+0.026
Neg logit contributions
udd-0.027
auc-0.027
 about-0.026
wheel-0.024
ep-0.022
Token didn
Token ID1422
Feature activation+1.61e+00

Pos logit contributions
auc+0.001
udd+0.001
wheel+0.001
 about+0.001
iren+0.001
Neg logit contributions
 Mole-0.001
 Ralph-0.001
 Lex-0.001
 Fox-0.001
Albert-0.001
Tokenâ
Token ID22940
Feature activation+2.81e-01

Pos logit contributions
 Mole+0.017
 Ralph+0.016
Albert+0.016
 Lex+0.014
 Amelia+0.014
Neg logit contributions
udd-0.078
auc-0.077
wheel-0.070
 about-0.070
orting-0.068
Token
Token ID26391
Feature activation+6.47e-01

Pos logit contributions
 Mole+0.007
 Ralph+0.007
 Lex+0.007
Albert+0.007
Pen+0.007
Neg logit contributions
auc-0.009
udd-0.009
 about-0.008
wheel-0.008
iren-0.007
Token
Token ID8151
Feature activation+2.74e+00

Pos logit contributions
 Mole+0.018
 Ralph+0.018
Albert+0.018
 Lex+0.018
 Fox+0.017
Neg logit contributions
udd-0.020
auc-0.019
wheel-0.018
 about-0.018
iren-0.016
Tokent
Token ID83
Feature activation-1.17e-01

Pos logit contributions
 Ralph+0.078
Albert+0.077
 Mole+0.077
 Lex+0.075
 Fox+0.074
Neg logit contributions
auc-0.083
udd-0.082
wheel-0.076
 about-0.075
iren-0.069
Token move
Token ID1445
Feature activation+5.72e-01

Pos logit contributions
udd+0.004
auc+0.004
aj+0.004
orting+0.004
oke+0.004
Neg logit contributions
 Flo-0.002
 Buzz-0.002
 Mole-0.002
Albert-0.002
Mo-0.002
Token.
Token ID13
Feature activation+8.59e-01

Pos logit contributions
 Ralph+0.014
 Mole+0.014
 Fox+0.014
Albert+0.014
 Lex+0.013
Neg logit contributions
auc-0.019
udd-0.019
wheel-0.017
iren-0.016
 about-0.016
Token The
Token ID383
Feature activation-1.25e+00

Pos logit contributions
Albert+0.023
 mole+0.022
 Mole+0.022
 Ralph+0.022
Larry+0.021
Neg logit contributions
udd-0.026
auc-0.026
 about-0.024
wheel-0.022
orting-0.022
Token children
Token ID1751
Feature activation+1.39e-01

Pos logit contributions
udd+0.039
auc+0.039
 about+0.035
wheel+0.034
orting+0.032
Neg logit contributions
 Mole-0.033
 mole-0.032
 Amelia-0.031
 Ralph-0.031
Albert-0.031
Token pretend
Token ID16614
Feature activation+1.86e+00

Pos logit contributions
auc+0.016
 about+0.016
udd+0.016
wheel+0.016
 x+0.013
Neg logit contributions
Albert-0.024
Jess-0.023
Pen-0.023
 Lex-0.023
 Amelia-0.023
Token we
Token ID356
Feature activation-2.30e-01

Pos logit contributions
Albert+0.055
 Mole+0.054
 Lex+0.054
 Ralph+0.054
Mo+0.052
Neg logit contributions
auc-0.052
udd-0.051
 about-0.050
wheel-0.047
iren-0.042
Token're
Token ID821
Feature activation-1.49e-01

Pos logit contributions
udd+0.006
auc+0.006
 about+0.006
wheel+0.006
ep+0.005
Neg logit contributions
 Lex-0.007
 Ralph-0.007
 Amelia-0.007
Albert-0.007
 Mole-0.007
Token the
Token ID262
Feature activation+1.92e+00

Pos logit contributions
auc+0.004
udd+0.004
 about+0.004
wheel+0.004
ep+0.003
Neg logit contributions
Albert-0.004
 Ralph-0.004
 Mole-0.004
 Lex-0.004
 Amelia-0.004
Token fish
Token ID5916
Feature activation+6.88e-01

Pos logit contributions
 Amelia+0.059
 Lex+0.058
Albert+0.057
 Ralph+0.057
 Mole+0.057
Neg logit contributions
auc-0.048
udd-0.047
 about-0.044
wheel-0.043
ep-0.039
Token in
Token ID287
Feature activation+2.32e+00

Pos logit contributions
 Mole+0.019
 Ralph+0.019
Albert+0.019
 Lex+0.018
 Fox+0.018
Neg logit contributions
auc-0.021
udd-0.021
wheel-0.019
 about-0.018
iren-0.017
Token the
Token ID262
Feature activation-2.26e-01

Pos logit contributions
Albert+0.064
 Lex+0.063
 Ralph+0.063
 Mole+0.063
 Amelia+0.061
Neg logit contributions
auc-0.065
udd-0.064
 about-0.062
wheel-0.058
ep-0.055
Token pond
Token ID16723
Feature activation+4.59e-01

Pos logit contributions
udd+0.008
auc+0.008
wheel+0.007
 about+0.007
orting+0.007
Neg logit contributions
Albert-0.005
 Lex-0.005
 Ralph-0.005
 Amelia-0.005
 Mole-0.005
Token!"
Token ID2474
Feature activation-1.25e+00

Pos logit contributions
 Ralph+0.012
 Mole+0.012
Albert+0.011
 Buzz+0.011
 Fox+0.011
Neg logit contributions
auc-0.015
udd-0.015
wheel-0.014
iren-0.013
 about-0.012
Token said
Token ID531
Feature activation-7.53e-01

Pos logit contributions
auc+0.026
udd+0.025
 about+0.022
wheel+0.022
iren+0.019
Neg logit contributions
 Mole-0.047
 Ralph-0.047
Albert-0.047
 Lex-0.046
 Fox-0.045
Token John
Token ID1757
Feature activation+3.75e-01

Pos logit contributions
auc+0.023
udd+0.022
 about+0.021
wheel+0.021
iren+0.019
Neg logit contributions
Albert-0.021
 Ralph-0.020
 Mole-0.020
Larry-0.019
 Lex-0.019
Tokenallows
Token ID47205
Feature activation+1.23e+00

Pos logit contributions
auc+0.019
udd+0.019
wheel+0.017
 about+0.016
iren+0.016
Neg logit contributions
 Ralph-0.016
 Mole-0.016
Albert-0.015
 Lex-0.015
 Fox-0.015
Token on
Token ID319
Feature activation+2.41e+00

Pos logit contributions
 Ralph+0.033
 Mole+0.033
Albert+0.032
 Buzz+0.031
 Lex+0.031
Neg logit contributions
udd-0.039
auc-0.038
wheel-0.034
 x-0.032
 about-0.032
Token sticks
Token ID16461
Feature activation+1.52e+00

Pos logit contributions
 Mole+0.071
 Ralph+0.071
Albert+0.071
 Lex+0.070
 Amelia+0.068
Neg logit contributions
auc-0.067
udd-0.067
 about-0.061
wheel-0.060
iren-0.055
Token and
Token ID290
Feature activation+6.49e-01

Pos logit contributions
 Ralph+0.049
 Mole+0.047
 Fox+0.046
 Buzz+0.046
 Flo+0.046
Neg logit contributions
auc-0.041
udd-0.040
wheel-0.035
 x-0.034
iren-0.033
Token held
Token ID2714
Feature activation+1.47e+00

Pos logit contributions
 Mole+0.018
 Ralph+0.018
Albert+0.017
 Lex+0.017
 Amelia+0.017
Neg logit contributions
auc-0.020
udd-0.020
wheel-0.018
 about-0.018
iren-0.016
Token them
Token ID606
Feature activation+2.56e+00

Pos logit contributions
Albert+0.043
 Mole+0.042
 Amelia+0.041
Jess+0.041
Joy+0.041
Neg logit contributions
auc-0.040
udd-0.037
 about-0.037
ep-0.034
eping-0.032
Token over
Token ID625
Feature activation+2.14e+00

Pos logit contributions
 Mole+0.073
 Ralph+0.073
 Lex+0.072
Albert+0.072
 Amelia+0.070
Neg logit contributions
udd-0.074
auc-0.074
 about-0.068
wheel-0.066
ep-0.060
Token the
Token ID262
Feature activation+7.88e-01

Pos logit contributions
Albert+0.067
 Lex+0.065
 Mole+0.065
 Ralph+0.065
 Amelia+0.064
Neg logit contributions
auc-0.056
udd-0.055
 about-0.052
wheel-0.047
orting-0.044
Token fire
Token ID2046
Feature activation+6.97e-01

Pos logit contributions
 Ralph+0.026
 Mole+0.026
 Lex+0.025
 Amelia+0.025
Albert+0.025
Neg logit contributions
auc-0.020
udd-0.019
 about-0.017
ep-0.016
 x-0.016
Token.
Token ID13
Feature activation-1.47e-01

Pos logit contributions
 Ralph+0.021
 Mole+0.021
Albert+0.021
 Lex+0.021
 Amelia+0.021
Neg logit contributions
auc-0.019
udd-0.019
wheel-0.017
 about-0.017
iren-0.015
Token The
Token ID383
Feature activation+9.10e-01

Pos logit contributions
udd+0.004
auc+0.004
 about+0.004
wheel+0.004
iren+0.004
Neg logit contributions
Albert-0.004
 Ralph-0.004
 Mole-0.004
 Lex-0.004
 Amelia-0.004
Token?"
Token ID1701
Feature activation-1.22e+00

Pos logit contributions
auc+0.028
udd+0.027
oke+0.026
wheel+0.025
eping+0.024
Neg logit contributions
 Mole-0.020
 Fox-0.020
 Ralph-0.020
 Buzz-0.019
 mole-0.018
Token
Token ID198
Feature activation-8.51e-01

Pos logit contributions
udd+0.049
auc+0.048
wheel+0.044
 about+0.043
iren+0.041
Neg logit contributions
 Mole-0.024
 Ralph-0.024
Albert-0.023
 Fox-0.023
 Lex-0.022
Token
Token ID198
Feature activation+1.06e-01

Pos logit contributions
auc+0.032
udd+0.032
wheel+0.029
 about+0.029
iren+0.027
Neg logit contributions
 Ralph-0.018
 Mole-0.018
Albert-0.017
 Lex-0.017
 Fox-0.017
TokenL
Token ID43
Feature activation-3.01e-01

Pos logit contributions
 Ralph+0.003
 Mole+0.003
Albert+0.003
 Lex+0.003
 Fox+0.003
Neg logit contributions
auc-0.003
udd-0.003
wheel-0.003
 about-0.003
iren-0.003
Tokenila
Token ID10102
Feature activation-8.09e-01

Pos logit contributions
udd+0.008
wheel+0.007
auc+0.007
 about+0.007
 shin+0.007
Neg logit contributions
 Mole-0.009
 Ralph-0.008
Joy-0.008
 Flo-0.008
Albert-0.008
Token and
Token ID290
Feature activation+6.75e-01

Pos logit contributions
auc+0.022
udd+0.021
wheel+0.019
iren+0.018
orting+0.017
Neg logit contributions
 Ralph-0.026
 Fox-0.025
 Mole-0.025
Albert-0.025
 Lex-0.024
Token Ben
Token ID3932
Feature activation+3.10e-01

Pos logit contributions
Albert+0.023
Jess+0.022
 Ralph+0.022
 Mole+0.022
 Lex+0.021
Neg logit contributions
auc-0.015
udd-0.015
wheel-0.014
 about-0.014
 x-0.013
Token were
Token ID547
Feature activation+3.66e-01

Pos logit contributions
 Ralph+0.008
 Mole+0.008
Albert+0.008
 Lex+0.008
 Fox+0.008
Neg logit contributions
auc-0.010
udd-0.010
wheel-0.009
 about-0.009
iren-0.008
Token scared
Token ID12008
Feature activation-3.63e-01

Pos logit contributions
 Ralph+0.012
 Mole+0.012
Albert+0.011
 Lex+0.011
 Amelia+0.011
Neg logit contributions
auc-0.010
udd-0.010
wheel-0.009
 about-0.008
iren-0.008
Token.
Token ID13
Feature activation+1.77e+00

Pos logit contributions
auc+0.013
udd+0.013
wheel+0.012
ep+0.011
bench+0.011
Neg logit contributions
 Fox-0.008
 Mole-0.008
 Ralph-0.008
 Buzz-0.007
 Lex-0.007
Token They
Token ID1119
Feature activation+1.43e+00

Pos logit contributions
Albert+0.049
 Ralph+0.048
 Mole+0.048
 Lex+0.047
 Amelia+0.046
Neg logit contributions
auc-0.054
udd-0.053
 about-0.049
wheel-0.047
iren-0.045
Token in
Token ID287
Feature activation+1.71e+00

Pos logit contributions
 Ralph+0.033
 Mole+0.033
Albert+0.033
 Lex+0.032
 Amelia+0.032
Neg logit contributions
auc-0.023
udd-0.023
wheel-0.020
 about-0.020
iren-0.018
Token the
Token ID262
Feature activation-9.05e-01

Pos logit contributions
Albert+0.046
 Ralph+0.045
 Mole+0.045
 Lex+0.044
 Albert+0.043
Neg logit contributions
auc-0.053
udd-0.053
 about-0.050
wheel-0.048
iren-0.044
Token park
Token ID3952
Feature activation+6.69e-01

Pos logit contributions
auc+0.030
udd+0.027
wheel+0.027
 about+0.025
iren+0.024
Neg logit contributions
 Mole-0.024
 Ralph-0.023
Albert-0.023
 mole-0.023
 Fox-0.022
Token.
Token ID13
Feature activation-5.22e-02

Pos logit contributions
 Mole+0.018
 Ralph+0.018
 Fox+0.018
 Buzz+0.018
 Flo+0.017
Neg logit contributions
auc-0.021
udd-0.020
wheel-0.018
iren-0.018
 x-0.017
Token Kitty
Token ID21168
Feature activation+1.26e+00

Pos logit contributions
udd+0.002
 about+0.001
auc+0.001
wheel+0.001
iren+0.001
Neg logit contributions
Albert-0.002
 Ralph-0.002
Larry-0.001
 Mole-0.001
Jess-0.001
Token was
Token ID373
Feature activation-3.68e-01

Pos logit contributions
 Mole+0.040
 Ralph+0.040
Albert+0.040
 Lex+0.039
 Amelia+0.038
Neg logit contributions
udd-0.034
auc-0.033
wheel-0.030
 about-0.029
iren-0.026
Token happy
Token ID3772
Feature activation+1.21e-01

Pos logit contributions
udd+0.009
auc+0.009
wheel+0.008
 about+0.008
iren+0.007
Neg logit contributions
 Ralph-0.012
 Mole-0.012
Albert-0.012
 Lex-0.012
 Amelia-0.012
Token that
Token ID326
Feature activation+9.78e-01

Pos logit contributions
Albert+0.005
 Lex+0.005
 Ralph+0.004
 Amelia+0.004
 Mole+0.004
Neg logit contributions
 about-0.003
udd-0.003
auc-0.002
wheel-0.002
orting-0.002
Token her
Token ID607
Feature activation+7.50e-01

Pos logit contributions
Albert+0.028
 Ralph+0.028
 Mole+0.027
 Lex+0.027
 Amelia+0.027
Neg logit contributions
auc-0.028
udd-0.028
 about-0.027
wheel-0.026
iren-0.023
Token idea
Token ID2126
Feature activation+1.37e-01

Pos logit contributions
 Ralph+0.020
Albert+0.020
 Mole+0.020
 Lex+0.020
 Amelia+0.019
Neg logit contributions
udd-0.023
auc-0.023
 about-0.021
wheel-0.021
iren-0.019
Token worked
Token ID3111
Feature activation-1.79e+00

Pos logit contributions
 Mole+0.003
 Ralph+0.003
Albert+0.003
 Lex+0.003
 Fox+0.003
Neg logit contributions
auc-0.005
udd-0.005
wheel-0.004
iren-0.004
 about-0.004
Token't
Token ID470
Feature activation-3.24e-01

Pos logit contributions
 Ralph+0.018
 Mole+0.018
Albert+0.017
 Lex+0.017
 Fox+0.016
Neg logit contributions
auc-0.048
udd-0.047
wheel-0.045
 about-0.044
iren-0.042
Token eat
Token ID4483
Feature activation+3.52e-01

Pos logit contributions
auc+0.010
udd+0.010
wheel+0.010
 about+0.009
iren+0.009
Neg logit contributions
 Ralph-0.008
 Mole-0.008
Albert-0.008
 Lex-0.008
 Amelia-0.008
Token the
Token ID262
Feature activation+5.51e-03

Pos logit contributions
 Ralph+0.009
 Mole+0.009
Albert+0.009
 Lex+0.009
 Fox+0.009
Neg logit contributions
auc-0.011
udd-0.011
wheel-0.010
 about-0.010
iren-0.009
Token toast
Token ID27805
Feature activation+3.14e-01

Pos logit contributions
 Mole+0.000
 Ralph+0.000
 Fox+0.000
Albert+0.000
 Lex+0.000
Neg logit contributions
auc-0.000
udd-0.000
wheel-0.000
 about-0.000
iren-0.000
Token!"
Token ID2474
Feature activation-2.13e-01

Pos logit contributions
 Ralph+0.007
 Mole+0.007
 Buzz+0.007
Albert+0.007
 Fox+0.007
Neg logit contributions
auc-0.011
udd-0.011
iren-0.010
wheel-0.010
 x-0.009
Token but
Token ID475
Feature activation+4.01e-01

Pos logit contributions
auc+0.007
udd+0.007
 about+0.007
wheel+0.007
iren+0.006
Neg logit contributions
 Mole-0.005
 Ralph-0.005
Albert-0.005
 Lex-0.005
 Fox-0.005
Token it
Token ID340
Feature activation+3.91e-01

Pos logit contributions
Albert+0.010
 Mole+0.009
 Lex+0.009
 Ralph+0.009
Jess+0.009
Neg logit contributions
auc-0.013
udd-0.013
 about-0.012
wheel-0.012
iren-0.011
Token was
Token ID373
Feature activation+3.55e-01

Pos logit contributions
 Mole+0.010
 Ralph+0.010
Albert+0.010
 Lex+0.010
 Fox+0.010
Neg logit contributions
auc-0.013
udd-0.013
wheel-0.011
 about-0.011
iren-0.011
Token too
Token ID1165
Feature activation+1.11e+00

Pos logit contributions
 Mole+0.010
 Ralph+0.010
Albert+0.010
 Lex+0.010
 Fox+0.010
Neg logit contributions
auc-0.011
udd-0.011
wheel-0.009
iren-0.009
 about-0.009
Token late
Token ID2739
Feature activation-6.11e-01

Pos logit contributions
 Ralph+0.038
 Mole+0.038
Larry+0.037
 Lex+0.037
Albert+0.037
Neg logit contributions
auc-0.028
udd-0.025
wheel-0.024
orting-0.023
iren-0.021
Token.
Token ID13
Feature activation+7.95e-01

Pos logit contributions
udd+0.022
auc+0.022
wheel+0.020
iren+0.019
ep+0.018
Neg logit contributions
 Ralph-0.015
 Amelia-0.014
 Mole-0.014
 Fox-0.014
 Albert-0.013
Token was
Token ID373
Feature activation+6.41e-01

Pos logit contributions
 Ralph+0.002
 Mole+0.002
Albert+0.002
 Lex+0.002
 Amelia+0.002
Neg logit contributions
auc-0.003
udd-0.003
wheel-0.002
 about-0.002
iren-0.002
Token too
Token ID1165
Feature activation+1.75e+00

Pos logit contributions
 Ralph+0.019
 Mole+0.018
Albert+0.018
 Lex+0.018
 Fox+0.018
Neg logit contributions
auc-0.019
udd-0.019
wheel-0.017
 about-0.016
orting-0.016
Token fast
Token ID3049
Feature activation+5.08e-01

Pos logit contributions
 Ralph+0.051
 Mole+0.051
Albert+0.050
 Lex+0.050
 Fox+0.049
Neg logit contributions
auc-0.051
udd-0.050
wheel-0.046
 about-0.044
iren-0.042
Token.
Token ID13
Feature activation+3.87e-01

Pos logit contributions
 Ralph+0.012
 Fox+0.011
 Mole+0.011
 might+0.011
 Flo+0.011
Neg logit contributions
auc-0.018
udd-0.017
wheel-0.015
iren-0.015
 x-0.014
Token Sadly
Token ID27213
Feature activation+2.81e-02

Pos logit contributions
Albert+0.011
 Mole+0.011
 Ralph+0.011
 Lex+0.011
 mole+0.010
Neg logit contributions
auc-0.011
udd-0.011
 about-0.010
wheel-0.010
iren-0.009
Token,
Token ID11
Feature activation+2.43e-01

Pos logit contributions
 Ralph+0.001
 Mole+0.001
 Lex+0.001
 Fox+0.001
Albert+0.001
Neg logit contributions
auc-0.001
udd-0.001
wheel-0.001
iren-0.001
 about-0.001
Token the
Token ID262
Feature activation-4.05e-02

Pos logit contributions
Albert+0.007
 Ralph+0.006
 Mole+0.006
 Lex+0.006
Larry+0.006
Neg logit contributions
auc-0.007
udd-0.007
 about-0.007
wheel-0.007
 x-0.006
Token wolf
Token ID17481
Feature activation-1.32e-01

Pos logit contributions
auc+0.001
udd+0.001
wheel+0.001
 about+0.001
iren+0.001
Neg logit contributions
 Ralph-0.001
Albert-0.001
 Mole-0.001
 Lex-0.001
 Amelia-0.001
Token caught
Token ID4978
Feature activation-1.00e-01

Pos logit contributions
auc+0.004
udd+0.004
wheel+0.004
 about+0.004
iren+0.004
Neg logit contributions
 Ralph-0.003
 Mole-0.003
Albert-0.003
 Lex-0.003
 Fox-0.003
Token them
Token ID606
Feature activation+1.37e+00

Pos logit contributions
auc+0.003
udd+0.003
 about+0.003
wheel+0.003
iren+0.003
Neg logit contributions
Albert-0.003
 Mole-0.003
 Ralph-0.003
 Lex-0.002
 Amelia-0.002
Token both
Token ID1111
Feature activation+1.00e+00

Pos logit contributions
 Mole+0.044
 Ralph+0.044
Albert+0.043
 Fox+0.043
 Amelia+0.042
Neg logit contributions
auc-0.036
udd-0.034
wheel-0.030
iren-0.029
 about-0.028
Token room
Token ID2119
Feature activation+4.10e-01

Pos logit contributions
Albert+0.014
 Ralph+0.014
 Lex+0.014
 Mole+0.013
Mo+0.013
Neg logit contributions
udd-0.008
auc-0.008
 about-0.007
 office-0.007
 x-0.006
Token.
Token ID13
Feature activation+8.81e-01

Pos logit contributions
 Mole+0.011
 Ralph+0.011
 Fox+0.010
Albert+0.010
 Amelia+0.010
Neg logit contributions
auc-0.013
udd-0.013
wheel-0.012
iren-0.011
 about-0.011
Token Sara
Token ID24799
Feature activation+1.60e-01

Pos logit contributions
Albert+0.026
 Mole+0.025
Larry+0.025
 Ralph+0.025
Mo+0.025
Neg logit contributions
auc-0.025
 about-0.023
udd-0.023
wheel-0.020
iren-0.020
Token was
Token ID373
Feature activation+5.88e-02

Pos logit contributions
 Ralph+0.004
 Mole+0.004
Albert+0.004
 Lex+0.004
 Fox+0.004
Neg logit contributions
auc-0.005
udd-0.005
wheel-0.005
 about-0.005
iren-0.004
Token the
Token ID262
Feature activation+6.79e-01

Pos logit contributions
 Mole+0.002
 Ralph+0.002
Albert+0.002
 Lex+0.002
 Amelia+0.002
Neg logit contributions
auc-0.002
udd-0.002
wheel-0.002
 about-0.002
iren-0.002
Token seeker
Token ID45993
Feature activation+9.90e-01

Pos logit contributions
 Ralph+0.019
 Mole+0.019
Albert+0.018
 Lex+0.018
 Amelia+0.018
Neg logit contributions
auc-0.021
udd-0.021
wheel-0.019
 about-0.018
iren-0.017
Token and
Token ID290
Feature activation+1.69e+00

Pos logit contributions
 Ralph+0.036
 Mole+0.036
Albert+0.036
 Lex+0.035
 Fox+0.035
Neg logit contributions
auc-0.021
udd-0.021
wheel-0.018
 about-0.018
iren-0.016
Token Tom
Token ID4186
Feature activation+5.13e-01

Pos logit contributions
Albert+0.051
 Mole+0.049
 Ralph+0.048
 Fox+0.048
Mo+0.048
Neg logit contributions
auc-0.044
 about-0.043
udd-0.042
wheel-0.041
iren-0.034
Token was
Token ID373
Feature activation-2.75e-01

Pos logit contributions
Albert+0.016
 Ralph+0.015
 Mole+0.015
 Lex+0.015
 Amelia+0.015
Neg logit contributions
udd-0.014
auc-0.014
 about-0.013
wheel-0.013
ep-0.011
Token the
Token ID262
Feature activation+1.44e+00

Pos logit contributions
auc+0.008
udd+0.008
 about+0.007
wheel+0.007
iren+0.006
Neg logit contributions
Albert-0.008
 Ralph-0.008
 Mole-0.008
 Lex-0.008
 Amelia-0.008
Token h
Token ID289
Feature activation-2.73e-01

Pos logit contributions
 Ralph+0.047
Albert+0.046
 Mole+0.046
 Lex+0.046
 Amelia+0.045
Neg logit contributions
auc-0.037
udd-0.036
wheel-0.034
 about-0.033
 x-0.030
Token playing
Token ID2712
Feature activation-1.03e+00

Pos logit contributions
auc+0.035
udd+0.033
orting+0.030
wheel+0.030
iren+0.029
Neg logit contributions
 Mole-0.022
 Fox-0.021
 Lex-0.021
 Ralph-0.021
 Amelia-0.021
Token outside
Token ID2354
Feature activation+1.01e+00

Pos logit contributions
auc+0.034
udd+0.033
oke+0.029
iren+0.028
aj+0.027
Neg logit contributions
 Mole-0.028
 Buzz-0.027
 Ralph-0.027
 Fox-0.027
 Hop-0.026
Token and
Token ID290
Feature activation+5.38e-01

Pos logit contributions
 Mole+0.031
 Ralph+0.031
Albert+0.030
 Lex+0.030
 Fox+0.029
Neg logit contributions
auc-0.028
udd-0.028
wheel-0.025
 about-0.024
iren-0.023
Token got
Token ID1392
Feature activation-6.16e-01

Pos logit contributions
 Ralph+0.014
 Mole+0.014
Albert+0.014
 Lex+0.014
 Fox+0.013
Neg logit contributions
auc-0.017
udd-0.017
wheel-0.016
 about-0.015
iren-0.014
Token her
Token ID607
Feature activation+3.56e-01

Pos logit contributions
auc+0.018
udd+0.018
wheel+0.016
 about+0.015
iren+0.015
Neg logit contributions
 Mole-0.018
 Ralph-0.018
Albert-0.018
 Lex-0.018
 Fox-0.017
Token bl
Token ID698
Feature activation-1.14e+00

Pos logit contributions
 Ralph+0.012
 Mole+0.012
Albert+0.012
 Lex+0.012
Larry+0.012
Neg logit contributions
auc-0.009
udd-0.009
wheel-0.008
 about-0.007
iren-0.007
Tokenouse
Token ID1076
Feature activation+4.12e-01

Pos logit contributions
auc+0.026
udd+0.023
 about+0.023
wheel+0.022
iren+0.022
Neg logit contributions
 Mole-0.038
Albert-0.038
 Ralph-0.038
 Lex-0.037
 Fox-0.037
Token dirty
Token ID11841
Feature activation-2.85e-01

Pos logit contributions
 barely+0.008
 might+0.007
 top+0.007
 Mole+0.007
 Fox+0.007
Neg logit contributions
auc-0.016
oke-0.015
udd-0.014
iren-0.014
orting-0.013
Token.
Token ID13
Feature activation+3.47e-01

Pos logit contributions
auc+0.011
oke+0.011
udd+0.011
iren+0.010
wheel+0.010
Neg logit contributions
 Mole-0.005
 Fox-0.004
 Ralph-0.004
 Buzz-0.004
 Bonnie-0.004
Token Her
Token ID2332
Feature activation-4.48e-02

Pos logit contributions
Albert+0.010
 Ralph+0.010
 Mole+0.010
Larry+0.010
 Lex+0.010
Neg logit contributions
udd-0.010
auc-0.010
 about-0.010
wheel-0.009
iren-0.008
Token mom
Token ID1995
Feature activation-1.28e+00

Pos logit contributions
udd+0.001
auc+0.001
wheel+0.001
 about+0.001
iren+0.001
Neg logit contributions
 Ralph-0.001
 Mole-0.001
Albert-0.001
 Lex-0.001
Larry-0.001
Token,
Token ID11
Feature activation-5.32e-01

Pos logit contributions
auc+0.017
udd+0.017
 about+0.015
wheel+0.014
iren+0.013
Neg logit contributions
 Ralph-0.022
 Mole-0.021
Albert-0.021
 Lex-0.021
 Amelia-0.021
Token there
Token ID612
Feature activation+3.65e-01

Pos logit contributions
auc+0.015
udd+0.015
 about+0.014
wheel+0.013
iren+0.012
Neg logit contributions
 Ralph-0.016
 Mole-0.015
Albert-0.015
 Lex-0.015
 Amelia-0.015
Token was
Token ID373
Feature activation-1.49e+00

Pos logit contributions
Albert+0.012
 Lex+0.012
 Ralph+0.012
 Mole+0.012
 Amelia+0.012
Neg logit contributions
udd-0.009
auc-0.009
 about-0.009
wheel-0.007
ep-0.007
Token a
Token ID257
Feature activation+4.84e-01

Pos logit contributions
auc+0.037
udd+0.036
 about+0.032
wheel+0.032
iren+0.029
Neg logit contributions
 Ralph-0.050
 Mole-0.050
Albert-0.049
 Lex-0.048
 Amelia-0.048
Token playful
Token ID34264
Feature activation+1.19e+00

Pos logit contributions
 Ralph+0.014
 Mole+0.014
Albert+0.014
 Lex+0.014
 Amelia+0.014
Neg logit contributions
udd-0.014
auc-0.014
wheel-0.012
 about-0.012
 x-0.011
Token dog
Token ID3290
Feature activation-4.02e-01

Pos logit contributions
 Mole+0.027
 Ralph+0.027
Albert+0.026
 Lex+0.026
 Fox+0.026
Neg logit contributions
auc-0.042
udd-0.042
wheel-0.039
 about-0.038
iren-0.036
Token named
Token ID3706
Feature activation+6.78e-01

Pos logit contributions
auc+0.011
udd+0.010
 about+0.010
wheel+0.009
iren+0.009
Neg logit contributions
 Ralph-0.013
Albert-0.013
 Lex-0.013
 Mole-0.012
 Amelia-0.012
Token Spot
Token ID15899
Feature activation+7.89e-01

Pos logit contributions
Albert+0.023
 Lex+0.022
Jess+0.022
 Mole+0.022
 Ralph+0.022
Neg logit contributions
udd-0.016
 about-0.015
auc-0.015
wheel-0.014
 x-0.013
Token.
Token ID13
Feature activation-4.69e-01

Pos logit contributions
 Lex+0.032
 Ralph+0.032
Albert+0.032
 Amelia+0.032
Jess+0.031
Neg logit contributions
 about-0.014
udd-0.013
auc-0.012
ep-0.010
wheel-0.010
Token Spot
Token ID15899
Feature activation-5.04e-02

Pos logit contributions
udd+0.015
auc+0.015
 about+0.014
wheel+0.014
iren+0.013
Neg logit contributions
Albert-0.012
 Mole-0.012
 Ralph-0.012
 Lex-0.011
 Amelia-0.011
Token loved
Token ID6151
Feature activation+1.04e+00

Pos logit contributions
udd+0.001
auc+0.001
 about+0.001
wheel+0.001
ep+0.001
Neg logit contributions
 Ralph-0.002
 Lex-0.002
 Mole-0.002
Albert-0.002
 Amelia-0.002
Token,
Token ID11
Feature activation-9.56e-01

Pos logit contributions
auc+0.011
udd+0.011
 about+0.010
wheel+0.009
iren+0.008
Neg logit contributions
Albert-0.018
 Ralph-0.017
 Lex-0.017
 Mole-0.017
 Amelia-0.017
Token they
Token ID484
Feature activation+6.72e-01

Pos logit contributions
auc+0.022
 about+0.022
udd+0.021
wheel+0.019
iren+0.017
Neg logit contributions
Albert-0.033
Pen-0.032
 Lex-0.032
 Amelia-0.031
 Ralph-0.031
Token saw
Token ID2497
Feature activation-2.40e+00

Pos logit contributions
 Ralph+0.020
Albert+0.019
 Mole+0.019
 Lex+0.019
 Flo+0.019
Neg logit contributions
auc-0.020
udd-0.020
 about-0.018
wheel-0.017
ep-0.016
Token a
Token ID257
Feature activation+4.92e-01

Pos logit contributions
udd+0.092
wheel+0.087
auc+0.085
oke+0.082
aj+0.081
Neg logit contributions
 Mole-0.053
 Ralph-0.051
 mole-0.050
 Fox-0.049
 Buzz-0.048
Token big
Token ID1263
Feature activation-3.17e-02

Pos logit contributions
 Mole+0.014
 Ralph+0.014
Albert+0.013
 Fox+0.013
 mole+0.013
Neg logit contributions
auc-0.015
udd-0.015
wheel-0.013
 about-0.013
iren-0.012
Token box
Token ID3091
Feature activation-5.73e-01

Pos logit contributions
udd+0.001
auc+0.001
wheel+0.001
 about+0.001
ru+0.001
Neg logit contributions
 Mole-0.001
 Ralph-0.001
 mole-0.001
 Fox-0.001
 Amelia-0.001
Token in
Token ID287
Feature activation+1.67e+00

Pos logit contributions
udd+0.021
auc+0.019
aj+0.018
arre+0.017
orting+0.017
Neg logit contributions
 Mole-0.012
 mole-0.012
 Ralph-0.012
 Amelia-0.011
 Fox-0.011
Token the
Token ID262
Feature activation-1.30e+00

Pos logit contributions
 Ralph+0.046
 Mole+0.045
Albert+0.045
 Lex+0.044
 Fox+0.043
Neg logit contributions
auc-0.051
udd-0.048
 about-0.046
wheel-0.044
iren-0.043
Token living
Token ID2877
Feature activation-7.80e-01

Pos logit contributions
auc+0.034
udd+0.034
wheel+0.030
 about+0.029
iren+0.027
Neg logit contributions
 Ralph-0.042
 Mole-0.042
Albert-0.042
 Lex-0.041
 Amelia-0.040
Token room
Token ID2119
Feature activation-3.48e-01

Pos logit contributions
auc+0.017
udd+0.017
wheel+0.015
 about+0.015
iren+0.013
Neg logit contributions
 Ralph-0.028
 Mole-0.028
Albert-0.028
 Lex-0.027
 Fox-0.027
Token.
Token ID13
Feature activation+1.07e+00

Pos logit contributions
udd+0.013
auc+0.013
wheel+0.012
orting+0.011
 x+0.011
Neg logit contributions
 Mole-0.006
 Amelia-0.006
 Ralph-0.006
 barely-0.006
 Lex-0.006
Token.
Token ID13
Feature activation+7.28e-01

Pos logit contributions
Albert+0.003
 Ralph+0.003
 Mole+0.003
 Lex+0.002
Larry+0.002
Neg logit contributions
udd-0.002
auc-0.002
 about-0.002
wheel-0.002
orting-0.001
Token\n
Token ID198
Feature activation+1.10e+00

Pos logit contributions
Albert+0.019
 Ralph+0.018
 Mole+0.018
Larry+0.018
 Lex+0.018
Neg logit contributions
auc-0.022
udd-0.022
 about-0.020
wheel-0.020
iren-0.019
Token\n
Token ID198
Feature activation+9.61e-01

Pos logit contributions
 Ralph+0.026
 Mole+0.026
Albert+0.026
 Lex+0.025
 Fox+0.025
Neg logit contributions
auc-0.037
udd-0.037
wheel-0.034
 about-0.033
iren-0.032
TokenHer
Token ID9360
Feature activation+1.73e-02

Pos logit contributions
 Ralph+0.024
 Mole+0.024
Albert+0.024
 Lex+0.023
 Amelia+0.023
Neg logit contributions
auc-0.032
udd-0.032
wheel-0.029
 about-0.028
iren-0.027
Token friends
Token ID2460
Feature activation-5.99e-01

Pos logit contributions
Albert+0.000
 Ralph+0.000
 Mole+0.000
 Lex+0.000
 Amelia+0.000
Neg logit contributions
auc-0.001
udd-0.001
 about-0.001
wheel-0.001
iren-0.001
Token looked
Token ID3114
Feature activation-4.68e-01

Pos logit contributions
auc+0.016
udd+0.016
wheel+0.014
iren+0.013
 about+0.013
Neg logit contributions
 Ralph-0.019
 Mole-0.019
Albert-0.018
 Lex-0.018
 Amelia-0.018
Token around
Token ID1088
Feature activation+1.55e+00

Pos logit contributions
auc+0.013
udd+0.012
wheel+0.011
iren+0.010
orting+0.010
Neg logit contributions
 Ralph-0.015
 Mole-0.015
 Amelia-0.014
 Fox-0.014
 Lex-0.014
Token and
Token ID290
Feature activation+7.40e-01

Pos logit contributions
 Ralph+0.053
 Mole+0.053
Albert+0.052
 Lex+0.052
 Amelia+0.051
Neg logit contributions
auc-0.038
udd-0.038
wheel-0.033
 about-0.031
iren-0.030
Token then
Token ID788
Feature activation+3.02e-02

Pos logit contributions
 Ralph+0.023
 Mole+0.023
Albert+0.023
 Lex+0.022
 Amelia+0.022
Neg logit contributions
auc-0.020
udd-0.020
wheel-0.018
 about-0.017
iren-0.016
Token Jane
Token ID12091
Feature activation+2.99e-01

Pos logit contributions
 Ralph+0.001
 Mole+0.001
 Lex+0.001
 Amelia+0.001
 Fox+0.001
Neg logit contributions
udd-0.001
auc-0.001
wheel-0.001
iren-0.001
 about-0.001
Token said
Token ID531
Feature activation-1.83e-01

Pos logit contributions
 Ralph+0.009
 Mole+0.009
Albert+0.009
 Lex+0.009
 Fox+0.009
Neg logit contributions
auc-0.008
udd-0.008
wheel-0.007
 about-0.007
iren-0.007
Token Ben
Token ID3932
Feature activation-1.85e+00

Pos logit contributions
auc+0.028
udd+0.028
 about+0.025
wheel+0.025
iren+0.023
Neg logit contributions
 Mole-0.029
 Ralph-0.028
Albert-0.028
 Lex-0.028
 Fox-0.027
Token says
Token ID1139
Feature activation-1.91e+00

Pos logit contributions
auc+0.047
udd+0.046
wheel+0.040
ooting+0.037
orting+0.036
Neg logit contributions
 Ralph-0.061
 Fox-0.060
 Mole-0.059
 Buzz-0.059
 Lex-0.058
Token.
Token ID13
Feature activation+3.92e-01

Pos logit contributions
udd+0.076
auc+0.075
wheel+0.067
ep+0.066
eping+0.064
Neg logit contributions
 Buzz-0.035
 Mole-0.035
 Fox-0.034
 Ralph-0.032
 Flo-0.032
Token He
Token ID679
Feature activation-5.93e-01

Pos logit contributions
 Ralph+0.012
 Mole+0.012
Albert+0.011
 Lex+0.011
 Amelia+0.011
Neg logit contributions
auc-0.011
udd-0.011
 about-0.010
wheel-0.010
iren-0.009
Token is
Token ID318
Feature activation-6.40e-01

Pos logit contributions
auc+0.017
udd+0.016
wheel+0.015
iren+0.014
orting+0.014
Neg logit contributions
 Ralph-0.017
 Mole-0.016
Albert-0.016
 Lex-0.016
 Fox-0.016
Token scared
Token ID12008
Feature activation-4.78e-01

Pos logit contributions
auc+0.022
udd+0.021
orting+0.019
board+0.017
iren+0.017
Neg logit contributions
 Mole-0.016
 Ralph-0.016
 Buzz-0.016
 Lex-0.016
Albert-0.015
Token and
Token ID290
Feature activation+8.77e-01

Pos logit contributions
auc+0.016
udd+0.015
wheel+0.015
ep+0.013
bench+0.013
Neg logit contributions
 Mole-0.012
 Buzz-0.012
 Ralph-0.011
 Fox-0.011
 Flo-0.011
Token angry
Token ID7954
Feature activation-1.75e+00

Pos logit contributions
 Ralph+0.024
 might+0.024
 Buzz+0.023
 Mole+0.023
 Fox+0.023
Neg logit contributions
udd-0.026
auc-0.026
wheel-0.022
reating-0.022
orting-0.022
Token.
Token ID13
Feature activation+7.22e-01

Pos logit contributions
auc+0.066
udd+0.065
wheel+0.059
ep+0.057
eping+0.057
Neg logit contributions
 Mole-0.036
 Buzz-0.035
 Fox-0.035
 Ralph-0.034
 Flo-0.032
Token He
Token ID679
Feature activation-4.46e-01

Pos logit contributions
Albert+0.022
 Lex+0.022
 Ralph+0.022
 Mole+0.022
 Amelia+0.021
Neg logit contributions
auc-0.019
 about-0.018
udd-0.018
wheel-0.017
iren-0.016
Token tries
Token ID8404
Feature activation-1.45e-01

Pos logit contributions
auc+0.012
udd+0.012
wheel+0.011
 about+0.010
orting+0.010
Neg logit contributions
 Ralph-0.013
 Mole-0.013
Albert-0.013
 Lex-0.013
 Fox-0.013
Token that
Token ID326
Feature activation-5.64e-01

Pos logit contributions
 Mole+0.008
 Ralph+0.008
Albert+0.008
 Lex+0.008
 Amelia+0.008
Neg logit contributions
auc-0.007
udd-0.007
 about-0.006
wheel-0.006
iren-0.005
Token the
Token ID262
Feature activation+5.33e-02

Pos logit contributions
auc+0.017
udd+0.017
wheel+0.015
 about+0.015
iren+0.015
Neg logit contributions
 Ralph-0.016
 Mole-0.016
Albert-0.015
 Lex-0.015
 Fox-0.015
Token celebration
Token ID14395
Feature activation-1.11e+00

Pos logit contributions
 Ralph+0.002
 Mole+0.002
Albert+0.002
 Lex+0.002
 Amelia+0.001
Neg logit contributions
auc-0.002
udd-0.002
wheel-0.001
 about-0.001
iren-0.001
Token would
Token ID561
Feature activation+1.19e+00

Pos logit contributions
auc+0.036
udd+0.036
wheel+0.033
 about+0.032
iren+0.030
Neg logit contributions
 Ralph-0.029
 Mole-0.028
Albert-0.028
 Lex-0.028
 Amelia-0.027
Token be
Token ID307
Feature activation-1.45e+00

Pos logit contributions
 Ralph+0.032
 Mole+0.032
Albert+0.031
 Lex+0.031
 Amelia+0.030
Neg logit contributions
auc-0.037
udd-0.036
wheel-0.033
 about-0.033
iren-0.031
Token even
Token ID772
Feature activation-3.01e+00

Pos logit contributions
auc+0.043
udd+0.043
wheel+0.038
iren+0.036
 about+0.036
Neg logit contributions
 Ralph-0.042
 Mole-0.041
Albert-0.041
 Lex-0.041
 Fox-0.040
Token swe
Token ID3490
Feature activation+2.53e-01

Pos logit contributions
auc+0.094
udd+0.089
iren+0.086
wheel+0.076
orting+0.075
Neg logit contributions
 Buzz-0.081
 Mole-0.080
 Lex-0.079
Albert-0.079
 Jessie-0.078
Tokeneter
Token ID2357
Feature activation-6.42e-01

Pos logit contributions
 Ralph+0.008
Albert+0.008
 Mole+0.008
 Lex+0.007
 Amelia+0.007
Neg logit contributions
udd-0.007
auc-0.007
 about-0.006
wheel-0.006
orting-0.006
Token with
Token ID351
Feature activation+8.69e-01

Pos logit contributions
auc+0.020
udd+0.019
wheel+0.018
iren+0.017
ep+0.016
Neg logit contributions
 Ralph-0.018
 Mole-0.018
 Lex-0.017
 Fox-0.017
 Buzz-0.017
Token his
Token ID465
Feature activation+1.77e+00

Pos logit contributions
Albert+0.024
 Mole+0.024
 Ralph+0.023
 Lex+0.023
 Amelia+0.022
Neg logit contributions
auc-0.026
udd-0.025
 about-0.024
wheel-0.023
iren-0.021
Token help
Token ID1037
Feature activation+2.13e-02

Pos logit contributions
Albert+0.055
 Ralph+0.054
 Mole+0.053
 Lex+0.052
Mo+0.051
Neg logit contributions
udd-0.047
 about-0.045
auc-0.044
 x-0.038
wheel-0.038
Token hurt
Token ID5938
Feature activation-5.31e-01

Pos logit contributions
auc+0.029
udd+0.028
wheel+0.026
 about+0.024
iren+0.024
Neg logit contributions
 Mole-0.029
 Ralph-0.028
 Fox-0.028
 Lex-0.027
Albert-0.027
Token,
Token ID11
Feature activation-8.84e-01

Pos logit contributions
auc+0.015
udd+0.015
wheel+0.013
 about+0.013
iren+0.012
Neg logit contributions
 Mole-0.016
 Ralph-0.016
Albert-0.016
 Lex-0.015
 Fox-0.015
Token but
Token ID475
Feature activation-1.35e+00

Pos logit contributions
auc+0.030
udd+0.029
iren+0.027
wheel+0.027
ep+0.025
Neg logit contributions
 Ralph-0.022
 Mole-0.022
 Lex-0.021
 Buzz-0.021
 Fox-0.021
Token we
Token ID356
Feature activation+5.45e-01

Pos logit contributions
udd+0.051
auc+0.051
wheel+0.046
iren+0.045
 about+0.043
Neg logit contributions
 Mole-0.030
 Ralph-0.029
 Fox-0.028
 Lex-0.027
Albert-0.027
Token should
Token ID815
Feature activation-1.28e+00

Pos logit contributions
Pen+0.016
 Lex+0.016
Albert+0.016
Jess+0.016
Mo+0.015
Neg logit contributions
 about-0.016
udd-0.015
auc-0.015
wheel-0.013
arre-0.013
Token always
Token ID1464
Feature activation-3.47e+00

Pos logit contributions
auc+0.040
udd+0.040
wheel+0.036
 about+0.036
iren+0.033
Neg logit contributions
 Ralph-0.034
 Mole-0.034
Albert-0.034
 Lex-0.033
 Amelia-0.032
Token ask
Token ID1265
Feature activation+4.39e-01

Pos logit contributions
auc+0.128
udd+0.124
wheel+0.111
iren+0.107
 about+0.105
Neg logit contributions
 Mole-0.081
 Ralph-0.080
Albert-0.078
 Fox-0.077
 Lex-0.076
Token for
Token ID329
Feature activation-5.33e-02

Pos logit contributions
 Ralph+0.014
 Mole+0.014
Albert+0.014
 Lex+0.014
 Fox+0.014
Neg logit contributions
auc-0.011
udd-0.011
wheel-0.010
 about-0.010
iren-0.009
Token help
Token ID1037
Feature activation-2.17e-01

Pos logit contributions
auc+0.002
udd+0.002
wheel+0.002
 about+0.002
iren+0.001
Neg logit contributions
 Ralph-0.001
 Mole-0.001
Albert-0.001
 Lex-0.001
 Fox-0.001
Token when
Token ID618
Feature activation+6.52e-01

Pos logit contributions
auc+0.006
udd+0.006
wheel+0.005
 about+0.005
iren+0.005
Neg logit contributions
 Ralph-0.007
 Mole-0.007
Albert-0.006
 Lex-0.006
 Fox-0.006
Token we
Token ID356
Feature activation-5.15e-01

Pos logit contributions
Albert+0.016
 Ralph+0.016
 Mole+0.015
 Lex+0.015
 Amelia+0.015
Neg logit contributions
auc-0.022
udd-0.021
 about-0.020
wheel-0.020
iren-0.018
Token day
Token ID1110
Feature activation-1.23e+00

Pos logit contributions
udd+0.013
auc+0.013
wheel+0.012
iren+0.011
orting+0.011
Neg logit contributions
 Mole-0.005
 Ralph-0.005
 mole-0.005
 Buzz-0.005
Larry-0.005
Token she
Token ID673
Feature activation-2.68e-01

Pos logit contributions
udd+0.029
auc+0.029
 about+0.027
wheel+0.026
iren+0.023
Neg logit contributions
 Ralph-0.042
 Mole-0.042
Albert-0.042
 Lex-0.041
Larry-0.040
Token was
Token ID373
Feature activation-9.72e-01

Pos logit contributions
auc+0.009
udd+0.009
wheel+0.009
iren+0.009
orting+0.009
Neg logit contributions
 Ralph-0.006
 Mole-0.006
 Fox-0.006
Albert-0.005
 Lex-0.005
Token having
Token ID1719
Feature activation-2.93e-01

Pos logit contributions
auc+0.036
udd+0.034
orting+0.031
iren+0.030
wheel+0.030
Neg logit contributions
 Ralph-0.022
 Mole-0.021
 Fox-0.021
 Buzz-0.020
 barely-0.020
Token so
Token ID523
Feature activation-1.20e+00

Pos logit contributions
auc+0.008
udd+0.008
wheel+0.007
 about+0.007
iren+0.007
Neg logit contributions
 Ralph-0.009
 Mole-0.009
Albert-0.009
 Lex-0.009
 Amelia-0.009
Token much
Token ID881
Feature activation-3.27e+00

Pos logit contributions
auc+0.036
udd+0.034
wheel+0.031
 about+0.029
iren+0.029
Neg logit contributions
 Mole-0.035
 Ralph-0.034
Albert-0.034
 Lex-0.034
 Amelia-0.033
Token fun
Token ID1257
Feature activation-1.07e+00

Pos logit contributions
auc+0.116
udd+0.100
ru+0.092
oke+0.090
ep+0.088
Neg logit contributions
 Mole-0.079
 Buzz-0.079
 Ralph-0.079
 Fox-0.075
 mole-0.074
Token shooting
Token ID4395
Feature activation-2.12e-01

Pos logit contributions
auc+0.034
udd+0.034
iren+0.028
wheel+0.028
aj+0.028
Neg logit contributions
 Ralph-0.027
 Mole-0.027
 Buzz-0.026
 Fox-0.026
 Flo-0.026
Token her
Token ID607
Feature activation+2.00e-01

Pos logit contributions
auc+0.007
udd+0.007
wheel+0.006
 about+0.006
iren+0.006
Neg logit contributions
 Ralph-0.006
 Mole-0.006
Albert-0.006
 Lex-0.005
 Fox-0.005
Token hose
Token ID31489
Feature activation+5.52e-01

Pos logit contributions
 Mole+0.005
 Ralph+0.005
Albert+0.005
 Lex+0.005
 Fox+0.005
Neg logit contributions
auc-0.006
udd-0.006
wheel-0.006
 about-0.006
iren-0.005
Token that
Token ID326
Feature activation-3.34e-01

Pos logit contributions
 Mole+0.014
 Ralph+0.014
 Fox+0.014
 Lex+0.013
Albert+0.013
Neg logit contributions
auc-0.018
udd-0.017
wheel-0.015
iren-0.015
 about-0.014
Token a
Token ID257
Feature activation+2.26e-01

Pos logit contributions
udd+0.010
auc+0.009
wheel+0.008
iren+0.008
 about+0.008
Neg logit contributions
 Mole-0.007
 Ralph-0.007
 Amelia-0.006
 Lex-0.006
 Fox-0.006
Token big
Token ID1263
Feature activation-7.03e-01

Pos logit contributions
 Ralph+0.007
 Mole+0.007
Albert+0.007
 Lex+0.007
 Amelia+0.007
Neg logit contributions
auc-0.006
udd-0.006
wheel-0.005
 about-0.005
iren-0.005
Token poster
Token ID11968
Feature activation-1.33e+00

Pos logit contributions
auc+0.021
udd+0.021
wheel+0.018
 about+0.018
iren+0.017
Neg logit contributions
 Mole-0.020
 Ralph-0.020
Albert-0.019
 Fox-0.019
 Amelia-0.019
Token.
Token ID13
Feature activation-1.58e-01

Pos logit contributions
udd+0.044
auc+0.044
iren+0.037
ru+0.036
wheel+0.036
Neg logit contributions
 Amelia-0.031
 Ralph-0.031
 Fox-0.030
 Mole-0.030
 Albert-0.029
Token The
Token ID383
Feature activation-8.76e-01

Pos logit contributions
auc+0.005
udd+0.005
 about+0.004
wheel+0.004
iren+0.004
Neg logit contributions
Albert-0.004
 Ralph-0.004
 Mole-0.004
 Lex-0.004
 Amelia-0.004
Token teacher
Token ID4701
Feature activation-3.10e+00

Pos logit contributions
udd+0.028
auc+0.026
 about+0.023
ru+0.023
wheel+0.023
Neg logit contributions
 Mole-0.022
 Amelia-0.022
 mole-0.022
 Fox-0.021
 Ralph-0.021
Token says
Token ID1139
Feature activation+2.44e-01

Pos logit contributions
udd+0.078
auc+0.078
wheel+0.069
iren+0.064
ooting+0.063
Neg logit contributions
 Amelia-0.099
 Ralph-0.096
Albert-0.095
 Lex-0.094
 Albert-0.094
Token they
Token ID484
Feature activation-5.73e-02

Pos logit contributions
 Ralph+0.006
 Amelia+0.006
 Mole+0.006
 Buzz+0.005
 Fox+0.005
Neg logit contributions
udd-0.009
auc-0.009
wheel-0.008
iren-0.008
ru-0.007
Token can
Token ID460
Feature activation-4.18e-02

Pos logit contributions
auc+0.002
udd+0.002
wheel+0.002
 about+0.002
iren+0.002
Neg logit contributions
 Ralph-0.001
 Mole-0.001
Albert-0.001
 Lex-0.001
 Fox-0.001
Token use
Token ID779
Feature activation+9.08e-02

Pos logit contributions
auc+0.001
udd+0.001
wheel+0.001
 about+0.001
iren+0.001
Neg logit contributions
 Ralph-0.001
 Mole-0.001
Albert-0.001
 Lex-0.001
 Amelia-0.001
Token any
Token ID597
Feature activation-1.89e+00

Pos logit contributions
 Ralph+0.002
 Mole+0.002
 Amelia+0.002
 Fox+0.002
Albert+0.002
Neg logit contributions
udd-0.003
auc-0.003
wheel-0.003
iren-0.003
 about-0.003
Token with
Token ID351
Feature activation+3.87e-01

Pos logit contributions
udd+0.005
auc+0.005
 about+0.005
wheel+0.005
iren+0.005
Neg logit contributions
Albert-0.005
 Ralph-0.005
 Mole-0.005
 Lex-0.004
Larry-0.004
Token a
Token ID257
Feature activation+8.72e-01

Pos logit contributions
 Ralph+0.012
 Mole+0.012
Albert+0.011
 Lex+0.011
 Fox+0.011
Neg logit contributions
auc-0.011
udd-0.011
wheel-0.010
 about-0.009
iren-0.009
Token set
Token ID900
Feature activation-1.11e+00

Pos logit contributions
 Mole+0.025
 Ralph+0.025
Albert+0.024
 Lex+0.024
 Fox+0.024
Neg logit contributions
auc-0.027
udd-0.026
wheel-0.023
 about-0.023
orting-0.021
Token of
Token ID286
Feature activation-1.37e+00

Pos logit contributions
udd+0.036
auc+0.033
 writes+0.028
aj+0.028
wheel+0.028
Neg logit contributions
 Ralph-0.031
 Mole-0.031
 Flo-0.030
 Lex-0.030
 Amelia-0.029
Token colourful
Token ID44799
Feature activation-1.19e+00

Pos logit contributions
udd+0.044
auc+0.042
wheel+0.038
ouncing+0.035
orting+0.035
Neg logit contributions
 Mole-0.037
 mole-0.037
 Ralph-0.036
 Lex-0.036
 Fox-0.036
Token cr
Token ID1067
Feature activation-3.05e+00

Pos logit contributions
auc+0.037
udd+0.037
wheel+0.034
 about+0.032
orting+0.030
Neg logit contributions
 Mole-0.031
 mole-0.031
 Ralph-0.031
 Lex-0.031
 Amelia-0.031
Tokenay
Token ID323
Feature activation-7.74e-01

Pos logit contributions
auc+0.093
udd+0.093
 cheating+0.089
wheel+0.085
 about+0.085
Neg logit contributions
Albert-0.067
Jess-0.064
Mo-0.064
Larry-0.062
Lex-0.061
Tokenons
Token ID684
Feature activation+2.26e-01

Pos logit contributions
udd+0.024
auc+0.023
 about+0.020
wheel+0.020
aj+0.019
Neg logit contributions
Albert-0.019
Larry-0.019
 Mole-0.019
 Ralph-0.019
Mo-0.018
Token.
Token ID13
Feature activation-7.49e-01

Pos logit contributions
 Ralph+0.006
 Mole+0.006
 Fox+0.006
Albert+0.006
 Buzz+0.006
Neg logit contributions
auc-0.007
udd-0.007
wheel-0.006
iren-0.006
 x-0.006
Token\n
Token ID198
Feature activation-4.63e-01

Pos logit contributions
udd+0.025
auc+0.025
 about+0.024
wheel+0.023
iren+0.021
Neg logit contributions
Albert-0.018
 Mole-0.018
 Ralph-0.018
 Lex-0.017
 Amelia-0.017
Token\n
Token ID198
Feature activation+1.95e-01

Pos logit contributions
udd+0.015
auc+0.015
 about+0.014
wheel+0.014
iren+0.013
Neg logit contributions
 Mole-0.011
 Ralph-0.011
Albert-0.011
 Lex-0.011
 Amelia-0.011